Greenwashing detection case study — metrics

This document summarizes the greenwashing intent case study: transformation rule, data source, and numeric metrics (accuracy, macro F1). Goal: answer "How well does Intentum perform in at least one domain?" with a reproducible, documented result.

Case study completion status (target: 50–100 labeled public examples)

Component	Status
Labeled dataset (URL + human label)	✅ greenwashing-labeled-sources.csv — 53 rows (GenuineSustainability, ActiveGreenwashing, SelectiveDisclosure, StrategicObfuscation, UnintentionalMisrepresentation).
Transformation rule (document → BehaviorSpace)	✅ Documented and implemented: SustainabilityReporter.AnalyzeReport(text) extracts counts for `language:claim.vague`, `language:comparison.unsubstantiated`, `data:metrics.without.proof`, `data:baseline.manipulation`. (The imagery signal, `imagery:nature.without.data`, is derived from images only, not from text.)
Model (BehaviorSpace → Intent)	✅ GreenwashingIntentModel.Infer — intent name + confidence; taxonomy aligned with labels.
Metric computation (accuracy / macro F1)	✅ GreenwashingCaseStudyTests — accuracy and macro F1 on labeled examples.
Numeric summary in repo	✅ This page; values from last test run.
Document text for CSV URLs	⚠️ Partial: CSV has URLs only. Run `./scripts/download-greenwashing-sources.sh` to fetch ClientEarth + Sustainable Agency HTML into `docs/case-studies/downloaded/`. No automatic fetch for PDFs or other news URLs.
Evaluation on public CSV rows	✅ GreenwashingCaseStudyTests.GreenwashingCaseStudy_OnDownloadedHtml_ComputesAccuracyAndF1 — when downloaded HTML exists, reads CSV, maps ClientEarth URLs to local files, extracts text, runs AnalyzeReport → Infer, compares to human_label, reports accuracy/F1. Run download script first for ~10 ClientEarth rows.
Evaluation on Mendeley Excel	✅ GreenwashingCaseStudyTests.GreenwashingCaseStudy_OnMendeleyExcel_ComputesAccuracyAndF1 — when DataGreenwash greenwash.xlsx exists, reads rows (ENTITY + columns as text), runs AnalyzeReport → Infer, compares to human label (default ActiveGreenwashing), reports accuracy/F1. Cap 500 rows per run.

Conclusion: Everything needed to run the case study is in place. For the synthetic labeled set (19 examples), run GreenwashingCaseStudyTests.GreenwashingCaseStudy_ComputesAccuracyAndF1. For public data (the subset with local HTML), run ./scripts/download-greenwashing-sources.sh, then GreenwashingCaseStudyTests.GreenwashingCaseStudy_OnDownloadedHtml_ComputesAccuracyAndF1. To approach 50–100 evaluated rows, extend the CSV and download more sources.

Transformation rule (document → BehaviorSpace)

Each dimension is written in actor:action format. The greenwashing model expects these signal dimensions:
- language:claim.vague
- language:comparison.unsubstantiated
- data:metrics.without.proof
- data:baseline.manipulation
- imagery:nature.without.data
From a document (PDF, web article, report): extract counts per signal (e.g. how many vague claims, unsubstantiated comparisons, metrics without proof, baseline manipulation, nature imagery without data). Then build a BehaviorSpace by calling Observe(actor, action) once per occurrence (e.g. 3 vague claims → Observe("language", "claim.vague") three times).
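The count-to-observation step can be sketched as follows. This is a minimal Python illustration, not Intentum's .NET code: `BehaviorSpaceSketch`, `observe`, and `build_space` are hypothetical stand-ins that only record events, and the example counts are made up.

```python
from collections import Counter

# Signal dimensions listed above, in actor:action form.
DIMENSIONS = [
    "language:claim.vague",
    "language:comparison.unsubstantiated",
    "data:metrics.without.proof",
    "data:baseline.manipulation",
    "imagery:nature.without.data",
]


class BehaviorSpaceSketch:
    """Hypothetical stand-in for a BehaviorSpace; it only records events."""

    def __init__(self):
        self.events = []

    def observe(self, actor, action):
        self.events.append((actor, action))


def build_space(counts):
    """Call observe(actor, action) once per counted occurrence."""
    space = BehaviorSpaceSketch()
    for dimension in DIMENSIONS:
        actor, action = dimension.split(":", 1)
        for _ in range(counts.get(dimension, 0)):
            space.observe(actor, action)
    return space


# Made-up counts: 3 vague claims and 1 metric without proof.
counts = {"language:claim.vague": 3, "data:metrics.without.proof": 1}
space = build_space(counts)
print(Counter(space.events))
```

Dimensions with a count of zero, or missing from the counts, produce no observations in this sketch.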
Data source: You do not need to create the data yourself. Public sources are valid: company sustainability reports (PDF), news/analysis articles, academic or NGO reports (greenwashing vs genuine examples). Important: each example must have a human label (e.g. GenuineSustainability, UnintentionalMisrepresentation, SelectiveDisclosure, StrategicObfuscation, ActiveGreenwashing). Full text does not need to live in the repo — a URL + label list or extracted signal counts is enough; if licensing/attribution is required, keep a source list under docs/case-studies/.

Labeled dataset (this run)

Type: Synthetic labeled examples for reproducibility (19 examples). Each row: dimension counts (actor:action → count) and expected intent name from the model taxonomy.
Intent names (GreenwashingIntentModel): GenuineSustainability, UnintentionalMisrepresentation, SelectiveDisclosure, StrategicObfuscation, ActiveGreenwashing.
Reproduce: Run the test GreenwashingCaseStudyTests.GreenwashingCaseStudy_ComputesAccuracyAndF1; it builds BehaviorSpaces from the labeled set, runs GreenwashingIntentModel.Infer, and compares predicted intent name to human label.

Numeric summary

Synthetic labeled set (19 examples)

Metric	Value
Accuracy	0.63
Macro F1	0.63
N	19

(Values from last run of GreenwashingCaseStudyTests.GreenwashingCaseStudy_ComputesAccuracyAndF1; re-run the test to regenerate.)
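For readers who want to check the two metrics by hand, here is a minimal Python sketch of accuracy and macro F1. It assumes macro F1 is the unweighted mean of per-label F1 over every label that appears in the gold or predicted lists; the averaging convention used by the test itself is not stated on this page, so treat that as an assumption. The example labels are illustrative, not rows from the synthetic set.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 (assumed averaging convention)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never matches.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only.
gold = ["GenuineSustainability", "GenuineSustainability",
        "ActiveGreenwashing", "ActiveGreenwashing"]
pred = ["GenuineSustainability", "ActiveGreenwashing",
        "ActiveGreenwashing", "ActiveGreenwashing"]
print(accuracy(gold, pred), round(macro_f1(gold, pred), 2))
```

Under this convention, a set whose gold labels are all one class and whose predictions never hit that class scores 0 on both metrics.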

Public data (downloaded HTML, ClientEarth subset)

Metric	Value
Accuracy	0.00
Macro F1	0.00
N	9 (ClientEarth company profiles with local HTML)

(Run: ./scripts/download-greenwashing-sources.sh, then dotnet test tests/Intentum.Tests.Integration/Intentum.Tests.Integration.csproj --filter GreenwashingCaseStudy_OnDownloadedHtml_ComputesAccuracyAndF1. All 9 rows are human-labeled ActiveGreenwashing, but on the stripped HTML the model predicted a different intent for every row, hence accuracy 0.00. The pattern-based signal extraction may need tuning for NGO article content.)
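To inspect what text the signal extraction actually sees, it can help to strip a downloaded page by hand. The sketch below uses Python's standard `html.parser`; it is an assumption about what "stripped HTML" means here (tags removed, script and style content dropped), not the extraction code used by the test. `TextExtractor`, `strip_html`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the text content of html with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Made-up markup for illustration.
sample = (
    "<html><head><style>p{color:green}</style></head><body>"
    "<h1>Our pledge</h1><p>We are <b>eco-friendly</b>.</p>"
    "<script>track()</script></body></html>"
)
text = strip_html(sample)
print(text)
```

If navigation menus and footers dominate the stripped text of an NGO page, that may be one reason the extracted counts differ from what a human reader would label.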

Public data (Mendeley DataGreenwash Excel)

Metric	Value
Accuracy	0.00
Macro F1	0.00
N	500 (capped; greenwash.xlsx rows with ENTITY + columns as text)

(Run: unpack the Mendeley dataset to docs/case-studies/downloaded/DataGreenwash/, then dotnet test tests/Intentum.Tests.Integration/Intentum.Tests.Integration.csproj --filter GreenwashingCaseStudy_OnMendeleyExcel_ComputesAccuracyAndF1. The Excel files are tabular (ENTITY + scores), not report text, and all rows default to the label ActiveGreenwashing. Treat these results as indicative only; for report-style text, use the CSV + HTML evaluation or the synthetic set.)

Public data sources (URL + label)

greenwashing-sources.md — Curated list: Mendeley dataset (CC BY 4.0; download manually), ClientEarth Greenwashing Files, The Sustainable Agency article, genuine sustainability report URLs.
greenwashing-labeled-sources.csv — 50+ rows: url, human_label, source_name, notes. Use to extend the labeled set or map external sources to our taxonomy.
Download public HTML (optional): From repo root run ./scripts/download-greenwashing-sources.sh to fetch ClientEarth profile pages and the Sustainable Agency article into docs/case-studies/downloaded/ for local analysis.
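Before extending the CSV, a quick check of the label distribution can catch typos in human_label. The following Python sketch assumes the file has a header row with the four column names listed above; the sample rows and `example.com` URLs are hypothetical and are not taken from greenwashing-labeled-sources.csv.

```python
import csv
import io
from collections import Counter

# Intent names from the model taxonomy.
TAXONOMY = {
    "GenuineSustainability",
    "UnintentionalMisrepresentation",
    "SelectiveDisclosure",
    "StrategicObfuscation",
    "ActiveGreenwashing",
}

# Hypothetical rows for illustration only.
SAMPLE = """url,human_label,source_name,notes
https://example.com/a,ActiveGreenwashing,Example NGO,hypothetical row
https://example.com/b,GenuineSustainability,Example Co,hypothetical row
https://example.com/c,ActiveGreenwashing,Example NGO,hypothetical row
"""


def load_rows(handle):
    """Read labeled rows and reject labels outside the taxonomy."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        if row["human_label"] not in TAXONOMY:
            raise ValueError(f"unknown label: {row['human_label']!r}")
    return rows


rows = load_rows(io.StringIO(SAMPLE))
label_counts = Counter(row["human_label"] for row in rows)
print(label_counts)
```

To run it against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`.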

How to reproduce / extend

Synthetic (19 examples): dotnet test tests/Intentum.Tests.Integration/Intentum.Tests.Integration.csproj --filter GreenwashingCaseStudy_ComputesAccuracyAndF1. Update the "Synthetic labeled set" table above with the printed accuracy and macro F1.
Public data (downloaded HTML): Run ./scripts/download-greenwashing-sources.sh, then dotnet test ... --filter GreenwashingCaseStudy_OnDownloadedHtml_ComputesAccuracyAndF1. Update the "Public data (downloaded HTML)" table with the printed values.
Public data (Mendeley Excel): Unpack Mendeley dataset to docs/case-studies/downloaded/DataGreenwash/, then dotnet test ... --filter GreenwashingCaseStudy_OnMendeleyExcel_ComputesAccuracyAndF1. Update the "Public data (Mendeley DataGreenwash Excel)" table with the printed values.
Extend the labeled set: Add rows to greenwashing-labeled-sources.csv; if you add more download targets to the script or provide local text/counts, re-run the public-data test and update this page with the new accuracy/F1.
