tokens-for-good 0.4.1 → 0.4.3

package/README.md CHANGED
@@ -60,7 +60,11 @@ Once installed, these are available to your AI via the MCP server:
  - **OpenCode** — `init` writes `~/.config/opencode/opencode.json` and prints a cron line you can paste into `crontab -e`.
  - **Cursor / Windsurf / Devin** — `init` writes the MCP config; automation requires platform-native scheduling.
 
- ## Development
+ ## Contributing
+
+ TFG has been built and tested primarily on **Claude Code**. Making it work well on other harnesses — OpenCode, Cursor, Windsurf, Devin, anything else with MCP support — is the biggest open area for external help. See [CONTRIBUTING.md](CONTRIBUTING.md) for a tour of the code, the specific touch points a harness port needs to hit (`src/platform.js`, `src/init.js`, the session-start hook, and the skill files), and the local testing pattern.
+
+ For quick dev setup:
 
  ```bash
  git clone https://github.com/Tokens-for-Good/tokens-for-good
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "tokens-for-good",
- "version": "0.4.1",
+ "version": "0.4.3",
  "type": "module",
  "description": "Donate your spare AI tokens to research nonprofits for Fierce Philanthropy",
  "bin": {
@@ -53,8 +53,19 @@ List the top 20 negative consequences of that social problem for that population
  #### PROMPT 3 — Intermediary vs Ultimate Outcome Classification
 
  Keep all 20 items. Add a column classifying each as Intermediary or Ultimate Outcome.
- - **Intermediary:** changes in behavior or action from gains in knowledge, skills, or attitudes
- - **Ultimate:** changes in condition or life status (reduced poverty, improved health, economic stability)
+
+ **Definitions:**
+ - **Intermediary:** changes in behavior, action, or resources that result from the intervention but don't yet prove lives improved (e.g., increased income, employment, school enrollment, access to healthcare, consumption)
+ - **Ultimate:** changes in condition or life status that directly reflect well-being improvements (e.g., improved health, housing security, quality of life, food security)
+
+ **Edge cases — apply these exactly:**
+ - Getting healthcare = Intermediary. Health actually improving = Ultimate.
+ - Income going up = Intermediary. Using that income to improve housing, education, or health = Ultimate.
+ - Moving out of poverty = Intermediary. Well-being or quality of life improving because of it = Ultimate.
+ - Increased farm yield = Intermediary. Enhanced food security = Ultimate.
+ - Increased access to most anything = Intermediary (we don't know if life improved because of that access).
+ - School learning outcomes or completing school = Intermediary. Quality of life changing due to a better job from those outcomes = Ultimate.
+ - Asset changes = Intermediary unless we know specifically what the asset is and how it improves life (safer housing, a latrine, durable productive tools = Ultimate; generic "asset score" or "asset holdings" = Intermediary).
 
  Sort by Intermediary first, then Ultimate.
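Reviewer sketch: the "Sort by Intermediary first, then Ultimate" rule in this hunk amounts to a stable two-group sort that keeps all rows. The row shape and the function name below are illustrative assumptions; nothing in the package is known to implement it this way.

```javascript
// Hypothetical row shape: { consequence: string, classification: 'Intermediary' | 'Ultimate' }.
const ORDER = { Intermediary: 0, Ultimate: 1 };

function sortByClassification(rows) {
  // Copy first so the caller's array is left untouched.
  // Array.prototype.sort is stable, so rows keep their original
  // relative order inside each classification group.
  return [...rows].sort(
    (a, b) => ORDER[a.classification] - ORDER[b.classification]
  );
}
```

Because the sort only reorders, the "keep all 20 items" requirement holds as long as the input already has 20 rows.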
 
@@ -83,36 +94,33 @@ Keep the table with ALL previous columns. For each of the 20 negative consequenc
 
  Write a recommendation (2-4 sentences): lead with stance, state strongest evidence, note caveats if any.
 
- Then include this scored checklist. Base score is out of 100. Counterfactuals are extra credit (max 120).
+ **Section 2 Scorecard**
 
- Base score (out of 100):
  - [x] or [ ] a. Has Ultimate Outcome Goals (50 pts)
  - [x] or [ ] b. Measures Intermediate Outcomes (10 pts)
  - [x] or [ ] c. Measures Ultimate Outcomes (15 pts)
  - [x] or [ ] d. Shows Continual Learning & Adaptation (25 pts)
-
- Extra credit:
  - [x] or [ ] e. Measures Intermediate Counterfactual (10 pts)
  - [x] or [ ] f. Measures Ultimate Counterfactual (10 pts)
 
- **Score: [X]/100** (can exceed 100 with extra credit, max 120)
+ **Score: [X]/120**
 
- **Section 2 — The Social Problem**
+ **Section 3 — The Social Problem**
  Frame with specificity ("chronic malnutrition among children under 5 in rural sub-Saharan Africa", not just "poverty"). Include scale and cite prevalence data.
 
- **Section 3 — The Solution**
+ **Section 4 — The Solution**
  What the organization actually does (not their mission statement). Explain the theory of change: how does activity X lead to outcome Y? Be specific about the intervention.
 
- **Section 4 — Key Outputs**
+ **Section 5 — Key Outputs**
  Measured activities and direct products with specific numbers. Distinguish outputs (things produced) from outcomes (changes caused).
 
- **Section 5 — Key Intermediate Outcomes**
+ **Section 6 — Key Intermediate Outcomes**
  Measurable short-to-medium term changes. Note whether data is self-reported or independently verified. Include any counterfactual data found.
 
- **Section 6 — Key Ultimate Outcomes**
+ **Section 7 — Key Ultimate Outcomes**
  Long-term impact evidence only. This section may be thin. Do not pad it. If no ultimate outcome data exists, say so in one sentence.
 
- **Section 7 — Continual Learning & Adaptation**
+ **Section 8 — Continual Learning & Adaptation**
  Documented program changes based on evidence. "They adapted" needs specifics: what changed, based on what data, when?
 
  #### SOURCES
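Reviewer sketch: the hunk above folds the two counterfactual items into a single 120-point scorecard. The arithmetic behind "Scored checklist adds up correctly" could look like the following, using the weights from the diff. The function name and the `checked` object shape are assumptions made for illustration.

```javascript
// Weights copied from the scorecard items a-f in the diff.
const WEIGHTS = { a: 50, b: 10, c: 15, d: 25, e: 10, f: 10 };

// Highest possible total: every item checked.
const MAX_SCORE = Object.values(WEIGHTS).reduce((sum, pts) => sum + pts, 0);

// `checked` is a hypothetical shape: { a: true, b: false, ... }.
// Missing keys count as unchecked.
function scoreChecklist(checked) {
  let total = 0;
  for (const [item, points] of Object.entries(WEIGHTS)) {
    if (checked[item] === true) total += points;
  }
  return total;
}
```

A report that checks a, b, and d would score 50 + 10 + 25 = 85 out of 120 under this sketch.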
@@ -147,7 +155,7 @@ Run these checks before submitting. They are not optional.
 
  **Structure:**
  - [ ] All 5 prompt tables present and complete (20 rows each)
- - [ ] All 7 summary sections present with substantive content
+ - [ ] All 8 summary sections present with substantive content
  - [ ] SOURCES section lists every URL cited inline
  - [ ] Scored checklist adds up correctly
 
@@ -164,6 +172,7 @@ Run these checks before submitting. They are not optional.
  - [ ] Replace "leverage" with "use", "utilize" with "use"
  - [ ] Paragraphs under 4 sentences
  - [ ] No superlatives unless backed by comparative data
+ - [ ] Every acronym defined in full before first use (e.g., "Randomized Controlled Trial (RCT)" not just "RCT")
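Reviewer sketch: the new acronym rule could be approximated mechanically. The heuristic below is an assumption, not something the package ships: it treats a run of two or more capital letters as an acronym and flags it when its first appearance is not wrapped in parentheses, as in "Randomized Controlled Trial (RCT)". It does not confirm that the spelled-out form actually precedes the parentheses, and it skips plurals such as "RCTs".

```javascript
// Returns acronyms whose first occurrence is not of the form "(ABC)".
// Hypothetical helper; a heuristic only, see the caveats in the lead-in.
function findUndefinedAcronyms(text) {
  const seen = new Set();
  const flagged = [];
  for (const match of text.matchAll(/\b[A-Z]{2,}\b/g)) {
    const acronym = match[0];
    if (seen.has(acronym)) continue; // only the first occurrence matters
    seen.add(acronym);
    const before = text[match.index - 1];
    const after = text[match.index + acronym.length];
    if (!(before === '(' && after === ')')) flagged.push(acronym);
  }
  return flagged;
}
```

A human pass is still needed for acronyms the heuristic cannot judge, such as organization names that are conventionally written in capitals.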
 
  ### 5. Submit
 
@@ -31,7 +31,7 @@ Pick 3-5 citation URLs from the report (prioritize any flagged by the automated
 
  Verify:
  - [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
- - [ ] All 7 summary sections present (Sections 1-7)
+ - [ ] All 8 summary sections present (Sections 1-8)
  - [ ] SOURCES section exists with citations
  - [ ] Every factual claim has its own inline citation `[Source Name](URL)`
  - [ ] No claims cited to general overview pages when a specific report or data page exists
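Reviewer sketch: the "All 8 summary sections present (Sections 1-8)" check could be automated along these lines. It assumes each summary heading starts a line with `**Section N`, which matches the headings shown in this diff. The function name and the default count are illustrative.

```javascript
// Returns the section numbers (1..expectedCount) that have no
// "**Section N ..." heading at the start of a line.
function missingSummarySections(reportMarkdown, expectedCount = 8) {
  const found = new Set();
  for (const match of reportMarkdown.matchAll(/^\*\*Section (\d+)\b/gm)) {
    found.add(Number(match[1]));
  }
  const missing = [];
  for (let n = 1; n <= expectedCount; n++) {
    if (!found.has(n)) missing.push(n);
  }
  return missing;
}
```

An empty result means every numbered heading was found; it says nothing about whether the sections have substantive content.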
@@ -40,17 +40,14 @@ Verify:
 
  The scored checklist uses these weights. Verify the math and the evidence:
 
- Base score (out of 100):
  - a. Has Ultimate Outcome Goals (50 pts)
  - b. Measures Intermediate Outcomes (10 pts)
  - c. Measures Ultimate Outcomes (15 pts)
  - d. Shows Continual Learning & Adaptation (25 pts)
-
- Extra credit:
  - e. Measures Intermediate Counterfactual (10 pts)
  - f. Measures Ultimate Counterfactual (10 pts)
 
- **Score: X/100** (can exceed 100 with extra credit, max 120)
+ **Score: X/120**
 
  Check:
  - Are checked items supported by evidence in the report?
@@ -65,6 +62,7 @@ Check:
  - Sections that are empty or trivially short
  - Claims that contradict other parts of the report
  - Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
+ - Acronyms used before being defined in full (e.g., "RCT" without first writing "Randomized Controlled Trial (RCT)")
 
  ### 7. Assign a Score
 
@@ -0,0 +1,170 @@
+ # Research an Organization for Fierce Philanthropy
+
+ ## Your Role
+
+ You are a social impact research analyst for Fierce Philanthropy. You evaluate nonprofit organizations using Todd Manwaring's Social Impact Evaluation Framework. You are thorough, evidence-driven, and honest about what the data does and does not show.
+
+ ## Instructions
+
+ ### 1. Research the Organization
+
+ Using web search and web fetch, thoroughly research:
+
+ 1. **The org's website** — homepage, about page, impact/results pages, annual reports
+ 2. **Impact evidence** — published data, metrics, program evaluations
+ 3. **Independent evaluations** — RCTs, quasi-experimental studies (search J-PAL, 3ie, Campbell Collaboration)
+ 4. **Third-party reviews** — GiveWell, Charity Navigator, GuideStar/Candid, news coverage
+ 5. **Financial data** — ProPublica Nonprofit Explorer (search by EIN or name), Form 990
+
+ **Research rules:**
+ - Only include DIRECT results from this organization or independent measurements of it
+ - Only include measured results with citations. No anecdotes, no modeling, no evidence from other organizations.
+ - Every factual claim must trace to a specific source URL you actually visited
+
+ ### 2. Generate the Report
+
+ Follow this exact structure:
+
+ ---
+
+ ```
+ # [Org Name] - Fierce Philanthropy Research Report
+
+ **Date:** [today's date]
+ **Methodology:** Todd Manwaring's Social Impact Evaluation Framework
+ **Organization:** [Org Name]
+ ```
+
+ ---
+
+ #### PROMPT 1 — Organization and Social Problem Summary
+
+ 1. **Social Problem:** (less than 5 words)
+ 2. **Population:** (who is affected)
+ 3. **Location:** (where)
+
+ #### PROMPT 2 — Top 20 Negative Consequences
+
+ | # | Negative Consequence |
+ |---|----------------------|
+
+ List the top 20 negative consequences of that social problem for that population in that location.
+
+ #### PROMPT 3 — Intermediary vs Ultimate Outcome Classification
+
+ Keep all 20 items. Add a column classifying each as Intermediary or Ultimate Outcome.
+ - **Intermediary:** changes in behavior or action from gains in knowledge, skills, or attitudes
+ - **Ultimate:** changes in condition or life status (reduced poverty, improved health, economic stability)
+
+ Sort by Intermediary first, then Ultimate.
+
+ #### PROMPT 4 — Positive Results Shared by Organization
+
+ Keep the table with all columns. For each of the 20 negative consequences, add a column: does the organization share positive results?
+
+ - Start each cell with "Yes.", "Partial.", or "No direct results shared."
+ - When Yes or Partial: include SPECIFIC data (percentages, sample sizes, time periods, study names)
+ - Only direct results from this organization, not from other orgs or modeling
+ - **CITATION RULES (critical):** Every data point MUST have its own inline citation `[Source Name](URL)`. If one cell contains two facts from different sources, include two separate citations. Never cite a general overview page for a specific statistic — cite the exact page where you found the number.
+ - **VERIFY INLINE:** After writing each cell, re-read the source you cited and confirm the exact numbers match. If the source says 75% and you wrote 59%, fix it before moving on. Do not proceed to the next row until the current row's numbers are confirmed against the cited page.
+
+ #### PROMPT 5 — Counterfactual Results
+
+ Keep the table with ALL previous columns. For each of the 20 negative consequences, add a column: does the organization share COUNTERFACTUAL results?
+
+ - Start each cell with "Yes.", "Partial.", or "No counterfactual results."
+ - Describe study design (RCT, quasi-experimental, matched comparison), sample sizes, what the control/comparison group showed
+ - Counterfactual = comparison to what would have happened without the intervention. Before/after alone does not count.
+ - **Same citation and verify-inline rules as Prompt 4:** every data point gets its own inline citation, and confirm numbers match the source before moving to the next row.
+
+ #### SUMMARY REPORT
+
+ **Section 1 — Our Recommendation**
+
+ Write a recommendation (2-4 sentences): lead with stance, state strongest evidence, note caveats if any.
+
+ Then include this scored checklist. Base score is out of 100. Counterfactuals are extra credit (max 120).
+
+ Base score (out of 100):
+ - [x] or [ ] a. Has Ultimate Outcome Goals (50 pts)
+ - [x] or [ ] b. Measures Intermediate Outcomes (10 pts)
+ - [x] or [ ] c. Measures Ultimate Outcomes (15 pts)
+ - [x] or [ ] d. Shows Continual Learning & Adaptation (25 pts)
+
+ Extra credit:
+ - [x] or [ ] e. Measures Intermediate Counterfactual (10 pts)
+ - [x] or [ ] f. Measures Ultimate Counterfactual (10 pts)
+
+ **Score: [X]/100** (can exceed 100 with extra credit, max 120)
+
+ **Section 2 — The Social Problem**
+ Frame with specificity ("chronic malnutrition among children under 5 in rural sub-Saharan Africa", not just "poverty"). Include scale and cite prevalence data.
+
+ **Section 3 — The Solution**
+ What the organization actually does (not their mission statement). Explain the theory of change: how does activity X lead to outcome Y? Be specific about the intervention.
+
+ **Section 4 — Key Outputs**
+ Measured activities and direct products with specific numbers. Distinguish outputs (things produced) from outcomes (changes caused).
+
+ **Section 5 — Key Intermediate Outcomes**
+ Measurable short-to-medium term changes. Note whether data is self-reported or independently verified. Include any counterfactual data found.
+
+ **Section 6 — Key Ultimate Outcomes**
+ Long-term impact evidence only. This section may be thin. Do not pad it. If no ultimate outcome data exists, say so in one sentence.
+
+ **Section 7 — Continual Learning & Adaptation**
+ Documented program changes based on evidence. "They adapted" needs specifics: what changed, based on what data, when?
+
+ #### SOURCES
+
+ List all cited sources with full URLs:
+ 1. [Source Name](Full URL) - Brief description of what was cited
+ 2. ...
+
+ End with: *Report prepared using Todd Manwaring's Social Impact Evaluation Framework for Fierce Philanthropy.*
+
+ ---
+
+ ### 3. Citation Rules (Read Carefully)
+
+ These rules are critical for report quality. Poorly attributed citations are the #1 reason reports fail review.
+
+ 1. **One citation per fact.** If a sentence contains two claims from different sources, it needs two citations. Never bundle multiple facts under one link.
+
+ 2. **Cite the specific page, not a general overview.** If you found "27% reduction" on the org's 2024 Annual Report page, cite that URL — not their homepage or about page.
+
+ 3. **If you can't find a URL for a claim, don't include the claim.** No unsourced facts. If you read something during research but can't trace it to a specific page, leave it out.
+
+ 4. **Verify numbers match the source exactly.** After writing a claim with a number (percentage, dollar amount, count), re-read the cited page and confirm the exact figure appears there. Common errors: writing 59% when the source says 75%, writing 4,000 when the source says 1,651, or writing 20% when the source says 25%. If your number doesn't match, use the source's number or remove the claim.
+
+ 5. **Attribution matters.** Say "X reports that" when citing an org's own claims. Say "independent evaluation found" when citing third-party evidence. The distinction is load-bearing.
+
+ 6. **Format:** `[Source Name](URL)` inline. The SOURCES section at the end must list every URL cited in the report.
+
+ ### 4. Before-Submission Quality Checks
+
+ Run these checks before submitting. They are not optional.
+
+ **Structure:**
+ - [ ] All 5 prompt tables present and complete (20 rows each)
+ - [ ] All 7 summary sections present with substantive content
+ - [ ] SOURCES section lists every URL cited inline
+ - [ ] Scored checklist adds up correctly
+
+ **Citations:**
+ - [ ] Every factual claim has its own inline citation
+ - [ ] Spot-check at least 5 citations: visit the URL and confirm the EXACT numbers on the page match what you wrote. If the source says 132% and you wrote 136%, fix it.
+ - [ ] For any citation where the page doesn't support your claim, find the correct source or remove the claim
+ - [ ] No claims are cited to general overview pages when a specific report or data page exists
+
+ **Writing style:**
+ - [ ] No em dashes (—). Replace with periods, commas, or parentheses.
+ - [ ] No filler adjectives: seamless, robust, comprehensive, innovative, cutting-edge, holistic, game-changing
+ - [ ] No AI transitions: "It's worth noting", "Here's the thing", "Let's dive in", "Simply put"
+ - [ ] Replace "leverage" with "use", "utilize" with "use"
+ - [ ] Paragraphs under 4 sentences
+ - [ ] No superlatives unless backed by comparative data
+
+ ### 5. Submit
+
+ Submit using `submit_report` with the full markdown as `report_markdown`. Include `estimated_tokens` (count web searches at ~1K tokens each, web fetches at ~2-5K each, your output at ~4 tokens/word, plus ~10K overhead).
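Reviewer sketch: the `estimated_tokens` guidance in the last line of this file is simple arithmetic. The version below uses the stated rates. The function name, the argument names, and the 3,500-token default per fetch (a midpoint of the stated 2-5K range) are assumptions for illustration.

```javascript
// Rough token estimate following the prompt's stated rates:
// ~1K per search, ~2-5K per fetch, ~4 tokens per output word, ~10K overhead.
function estimateTokens({ searches, fetches, outputWords, tokensPerFetch = 3500 }) {
  const TOKENS_PER_SEARCH = 1000;
  const TOKENS_PER_WORD = 4;
  const OVERHEAD = 10000;
  return (
    searches * TOKENS_PER_SEARCH +
    fetches * tokensPerFetch +
    outputWords * TOKENS_PER_WORD +
    OVERHEAD
  );
}
```

For 10 searches, 20 fetches, and a 3,000-word report, this gives 10,000 + 70,000 + 12,000 + 10,000 = 102,000 tokens with the default fetch size.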
@@ -0,0 +1,69 @@
+ # Verify Citations (Standalone Re-verification)
+
+ Use this methodology when re-verifying an existing report. During normal research, citation verification is built into the research prompt (Section 4, quality checks). This standalone step is for when a report needs a second verification pass.
+
+ ## Instructions
+
+ ### 1. Read the Report
+
+ Read the full research report. Note every inline citation `[Source Name](URL)` and every factual claim.
+
+ ### 2. Test Every Citation
+
+ For each citation, visit the URL using web fetch and verify:
+
+ - **URL loads** — Is it a real page (not 404, not a redirect to a homepage)?
+ - **Content matches** — Does the page actually say what the report claims? Quote the relevant passage.
+ - **Data is accurate** — Do the numbers match?
+
+ Record each check:
+
+ | # | Citation | URL Status | Content Match | Notes |
+ |---|----------|-----------|---------------|-------|
+
+ Status values:
+ - **VALID** — URL loads and content matches
+ - **BROKEN** — 404 or page doesn't load
+ - **MISMATCH** — URL loads but doesn't support the claim
+ - **PARTIAL** — Some claims match, some don't
+ - **UNVERIFIABLE** — Paywalled or content not accessible
+
+ ### 3. Re-attribute Mismatches
+
+ For each MISMATCH or PARTIAL citation:
+ 1. Use web search to find the correct source for the claim
+ 2. If found: replace the citation URL with the correct one
+ 3. If not found anywhere: remove the claim from the report or add a caveat ("This claim could not be independently verified")
+
+ Do not leave misattributed citations in place.
+
+ ### 4. Check for Hallucinations
+
+ Search the web for claims that seem unusually specific:
+ - Statistics that don't appear in any source
+ - Named studies or RCTs that can't be found
+ - Program details that contradict other sources
+
+ ### 5. Apply Corrections
+
+ For each issue:
+
+ ```
+ ### Correction [N]
+ **Location:** [First ~10 words of the problematic passage]
+ **Problem:** [What's wrong]
+ **Fix:** [What was changed]
+ ```
+
+ ### 6. Output
+
+ Write the corrected report with a verification summary at the top:
+
+ ```markdown
+ ## Verification Summary
+ - Citations checked: X
+ - Valid: X | Broken: X | Mismatch: X | Partial: X
+ - Claims removed (unsourced): X
+ - Citations re-attributed: X
+ - Corrections applied: X
+ ```
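Reviewer sketch: the first two lines of the verification summary can be derived from the per-citation check table. The status names come from the prompt above. The `checks` array shape and both function names are assumptions for illustration.

```javascript
// Status values listed in the verification prompt.
const STATUSES = ['VALID', 'BROKEN', 'MISMATCH', 'PARTIAL', 'UNVERIFIABLE'];

// `checks` is a hypothetical array like [{ citation: '...', status: 'VALID' }, ...].
function tallyStatuses(checks) {
  const counts = Object.fromEntries(STATUSES.map((s) => [s, 0]));
  for (const { status } of checks) {
    if (!STATUSES.includes(status)) throw new Error(`Unknown status: ${status}`);
    counts[status] += 1;
  }
  return counts;
}

// Builds the first two summary lines in the format shown in the prompt.
function summaryLines(checks) {
  const c = tallyStatuses(checks);
  return [
    `- Citations checked: ${checks.length}`,
    `- Valid: ${c.VALID} | Broken: ${c.BROKEN} | Mismatch: ${c.MISMATCH} | Partial: ${c.PARTIAL}`,
  ];
}
```

The summary template in the prompt has no slot for UNVERIFIABLE, so the sketch tallies it but does not print it.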
@@ -0,0 +1,143 @@
+ # Step 3: Humanize — Claude Code Instructions
+
+ ## Inputs
+
+ - **Org name:** `{{ORG_NAME}}`
+ - **Verified report:** The verified report from Step 2 (kept in memory from the previous step)
+ - **Writing style guide:** The AI decontamination rules below
+
+ ## Purpose
+
+ Step 2 verified the facts. This step makes the report sound human. You are an editor whose only job is to remove AI writing patterns and inject natural voice. Do not change the report structure, tables, checklist items, scores, or citations. Edit the prose only.
+
+ ## Instructions
+
+ ### 1. Read the Report and Style Guide
+
+ Read the verified report (skip the verification log header, work on the content below the `---`).
+
+ The AI decontamination passes below are your checklist.
+
+ ### 2. Run Each Pass
+
+ Work through these checks in order. For each issue found, fix it and log the change.
+
+ #### Pass 1: Em Dash Removal
+ - Search for every `—` (em dash) in the content
+ - Replace each with a period (two sentences), comma, or parentheses
+ - Two short sentences almost always beat one em-dashed sentence
+ - Log count: "Removed X em dashes"
+
+ #### Pass 2: Sentence Rhythm
+ - Flag where 3+ consecutive sentences are roughly the same length (within ~5 words)
+ - Fix by splitting, combining, or varying structure
+ - Goal: rhythm should vary when read aloud. Short. Then longer. Then medium.
+ - Log: "Varied sentence rhythm in X sections"
+
+ #### Pass 3: Paragraph Cadence
+ - Flag sections where consecutive paragraphs follow the same structure (claim then explanation then example, repeated)
+ - Vary the pattern: lead with evidence sometimes, skip the explanation, open with a question
+ - Log: "Restructured X paragraphs for cadence variety"
+
+ #### Pass 4: Opening Word Diversity
+ - Scan every paragraph's first word. Flag 2+ consecutive paragraphs starting with the same word
+ - Common offenders: "The...", "This...", repeated org name, "Pawsperity..." three times in a row
+ - Rewrite at least one opener in each flagged group
+ - Log: "Diversified openings in X locations"
+
+ #### Pass 5: AI Pattern Scan
+ Check for and fix:
+ - [ ] "[Statement]. Not because X — because Y." dramatic structure
+ - [ ] "Not just X, but Y" emphasis pattern
+ - [ ] "Whether X or Y" parallel constructions
+ - [ ] "From X to Y" range statements
+ - [ ] "Here's the thing" / "Let's dive in" / "In short" / "Put simply" / "The reality is"
+ - [ ] "At its core" / "At the end of the day" / "Fundamentally" as intensifier
+ - [ ] "It's worth noting that" / "Importantly" at sentence start
+ - [ ] Overused dramatic colon reveals
+ - [ ] Overused semicolons
+ - Log each pattern found and fixed
+
+ #### Pass 6: Perfect Parallelism Breaker
+ - Find bullet lists where every bullet follows the exact same grammatical structure
+ - Vary at least one item's structure (not just words)
+ - Don't always group in threes
+ - Log: "Broke parallelism in X lists/sections"
+
+ #### Pass 7: Filler Adjective Sweep
+ Search for and remove/replace:
+ - "seamless," "robust," "comprehensive," "critical," "fundamental," "innovative," "powerful," "unique," "holistic," "cutting-edge," "game-changing," "revolutionary"
+ - "leverage" → "use", "utilize" → "use"
+ - Remove minimizers: "simply," "just," "easily"
+ - Usually the sentence is stronger without the adjective
+ - Log: "Removed X filler adjectives"
+
+ #### Pass 8: Read-Aloud Test
+ - For each Summary Report section (Sections 1-7), simulate reading aloud
+ - Flag anything that sounds stilted, overly formal, or robotically even
+ - Rewrite flagged sentences to sound like a thoughtful analyst explaining to a colleague
+ - Log: "Rewrote X sentences for natural voice"
+
+ #### Pass 9: Voice Injection
+ Add 2-3 human touches across the Summary Report sections:
+ - Brief asides showing evaluator judgment ("This is a stronger evidence base than most organizations in this space provide.")
+ - Concrete contextualization ("To put this in perspective, the WHO considers X to be the threshold for Y.")
+ - Honest assessments where evidence is ambiguous ("The data here is suggestive but not conclusive.")
+ - Do NOT overdo this. 2-3 per report max. They should feel like a thoughtful analyst's observations, not a personality transplant.
+ - Log each injection with location and what was added
+
+ ### 3. Preserve Report Structure
+
+ After all passes, verify you did NOT change:
+ - [ ] Any markdown heading (##, ###)
+ - [ ] Any table structure or table data
+ - [ ] The scored checklist items or their checked/unchecked status
+ - [ ] The score (X/100)
+ - [ ] Citation URLs or citation text inside `[brackets](links)`
+ - [ ] The SOURCES section
+ - [ ] Section separators (`---`)
+
+ ### 4. Produce Output
+
+ Keep the humanized report in memory. This is the final version that will be submitted via the `submit_report` tool.
+
+ Start the output with a change log:
+
+ ```markdown
+ <!-- Humanized: {{ORG_NAME}} | Date: [date] -->
+
+ # Humanization Log
+
+ ## Changes by Pass
+ - **Em dashes:** Removed [X] instances
+ - **Sentence rhythm:** Varied in [X] sections
+ - **Paragraph cadence:** Restructured [X] paragraphs
+ - **Opening diversity:** Fixed [X] locations
+ - **AI patterns:** Found and fixed: [list each pattern]
+ - **Parallelism:** Broke in [X] lists/sections
+ - **Filler adjectives:** Removed [X] ([list them])
+ - **Read-aloud fixes:** Rewrote [X] sentences
+ - **Voice injections:** Added [X] ([brief description of each])
+
+ ## Structure Verification
+ - [ ] Headings unchanged
+ - [ ] Tables unchanged
+ - [ ] Checklist and score unchanged
+ - [ ] Citations unchanged
+ - [ ] Sources section unchanged
+
+ ---
+
+ [Full humanized report below]
+ ```
+
+ ## Quality Checks
+
+ Before writing the output:
+ - [ ] Zero em dashes remain in the content
+ - [ ] No two consecutive paragraphs start with the same word
+ - [ ] No AI pattern from the tells list remains
+ - [ ] At least 2 voice injections added (but no more than 3)
+ - [ ] Report structure is identical to the input
+ - [ ] Content reads like a human analyst wrote it
+ - [ ] The change log accurately reflects all changes made
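Reviewer sketch: two of the quality checks in this file ("Zero em dashes remain" and "No two consecutive paragraphs start with the same word") are mechanical enough to script. The helpers below are assumptions for illustration. Paragraphs are taken to be blocks separated by blank lines, and first words are compared case-insensitively with punctuation stripped.

```javascript
// Counts em dashes (U+2014) in the text.
function countEmDashes(text) {
  return (text.match(/\u2014/g) || []).length;
}

// Flags every paragraph whose first word repeats the previous paragraph's.
// Returns entries like { paragraph: 2, word: 'the' } (1-based paragraph index).
function findRepeatedOpeners(text) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);
  const firstWords = paragraphs.map((p) =>
    p.split(/\s+/)[0].toLowerCase().replace(/[^a-z0-9']/g, '')
  );
  const repeats = [];
  for (let i = 1; i < firstWords.length; i++) {
    if (firstWords[i] !== '' && firstWords[i] === firstWords[i - 1]) {
      repeats.push({ paragraph: i + 1, word: firstWords[i] });
    }
  }
  return repeats;
}
```

Markdown headings, table rows, and list items are not filtered out here, so a real check would need to isolate prose paragraphs first.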
@@ -0,0 +1,93 @@
+ # Peer Review — Instructions
+
+ ## Purpose
+
+ You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher — you are a second pair of eyes.
+
+ ## Instructions
+
+ ### 1. Check the Automated Fact-Check Results First
+
+ If automated fact-check results are included above the report, read them before diving into the report itself. Focus on:
+ - **Red flags** — these are specific problems the automated system detected (unsupported claims, dead links, self-reported data issues)
+ - **Fact support rate** — below 70% means many claims aren't backed by their cited sources
+ - **Avg trust score** — below 50% means citations are low-quality (self-reported, blog posts, dead links)
+
+ Use these results to target your spot-checks. If the automated system flagged specific unsupported claims, verify those first.
+
+ ### 2. Read the Full Report
+
+ Read the entire report. Note the org name, the scored checklist, and the overall recommendation.
+
+ ### 3. Spot-Check Citations (3-5)
+
+ Pick 3-5 citation URLs from the report (prioritize any flagged by the automated fact-check). For each:
+ - Visit the URL using web fetch
+ - Verify the page exists (not 404)
+ - Check that the source says what the report claims
+ - If a citation is wrong, search for the correct source. If the claim can't be sourced anywhere, remove it.
+
+ ### 4. Check Report Structure
+
+ Verify:
+ - [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
+ - [ ] All 7 summary sections present (Sections 1-7)
+ - [ ] SOURCES section exists with citations
+ - [ ] Every factual claim has its own inline citation `[Source Name](URL)`
+ - [ ] No claims cited to general overview pages when a specific report or data page exists
+
+ ### 5. Evaluate Scoring
+
+ The scored checklist uses these weights. Verify the math and the evidence:
+
+ Base score (out of 100):
+ - a. Has Ultimate Outcome Goals (50 pts)
+ - b. Measures Intermediate Outcomes (10 pts)
+ - c. Measures Ultimate Outcomes (15 pts)
+ - d. Shows Continual Learning & Adaptation (25 pts)
+
+ Extra credit:
+ - e. Measures Intermediate Counterfactual (10 pts)
+ - f. Measures Ultimate Counterfactual (10 pts)
+
+ **Score: X/100** (can exceed 100 with extra credit, max 120)
+
+ Check:
+ - Are checked items supported by evidence in the report?
+ - Are unchecked items correctly unchecked (no evidence was found)?
+ - Does the score math add up?
+
+ ### 6. Look for Red Flags
+
+ - Suspiciously specific numbers with no citation
+ - Studies or evaluations that seem fabricated
+ - Copy-pasted content or generic filler
+ - Sections that are empty or trivially short
+ - Claims that contradict other parts of the report
+ - Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
+
+ ### 7. Assign a Score
+
+ | Score | When to use |
+ |-------|------------|
+ | **4 — Great** | Report is thorough, citations check out, scoring is correct. No changes needed. |
+ | **3 — Good with fixes** | Minor issues you can fix: broken citation, wrong score math, awkward phrasing, a checklist item that should be toggled, misattributed citation. **Fix the issues yourself** and submit the corrected report. |
+ | **2 — Needs redo** | Major problems: thin evidence across multiple sections, significant hallucinations, missing sections, fundamentally wrong scoring. Not fixable with minor edits. |
+ | **1 — Bad actor** | Garbage: copy-pasted nonsense, completely fabricated data, obvious gaming attempt. This flags the original author. Use sparingly and only when clearly warranted. |
+
+ ### 8. Submit Your Review
+
+ Use `submit_peer_review` with:
+ - `claim_id`: The claim ID shown above
+ - `score`: Your score (1-4)
+ - `notes`: Brief explanation of your score. Mention which citations you checked and what you found.
+ - `updated_report`: If score is 3, include the full fixed report
+
+ ## Important Rules
+
+ - Be fair. Most reports should score 3 or 4.
+ - Score 2 is for genuinely bad reports, not minor style preferences.
+ - Score 1 is for abuse. If you're unsure, use 2 instead.
+ - If you spot-check a citation and it's broken, that alone is a 3 (fix it), not a 2.
+ - Don't rewrite the report to match your style. Fix factual errors, not opinions.
+ - If the automated fact-check flagged issues, verify them. If the flags are correct, fix the citations (score 3) or flag the report (score 2) depending on severity.
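Reviewer sketch: the structure checks in this prompt expect inline `[Source Name](URL)` citations and a SOURCES section, and the research prompt requires SOURCES to list every URL cited inline. A rough cross-check is sketched below. The function names and the return shape are assumptions; the code looks for the literal `#### SOURCES` heading, and the link pattern is a heuristic that will miss URLs containing parentheses.

```javascript
// Collects the URL of every markdown link of the form [text](http...).
function extractLinkUrls(markdown) {
  const urls = [];
  for (const match of markdown.matchAll(/\[[^\]]+\]\((https?:\/\/[^\s)]+)\)/g)) {
    urls.push(match[1]);
  }
  return urls;
}

// Compares URLs cited before the "#### SOURCES" heading with those listed after it.
function urlsMissingFromSources(reportMarkdown) {
  const splitAt = reportMarkdown.indexOf('#### SOURCES');
  if (splitAt === -1) {
    // No SOURCES section at all: every inline URL counts as missing.
    return { hasSources: false, missing: [...new Set(extractLinkUrls(reportMarkdown))] };
  }
  const inline = new Set(extractLinkUrls(reportMarkdown.slice(0, splitAt)));
  const listed = new Set(extractLinkUrls(reportMarkdown.slice(splitAt)));
  return { hasSources: true, missing: [...inline].filter((url) => !listed.has(url)) };
}
```

This only confirms that cited URLs are listed. Whether each page supports its claim still needs the spot-check described in step 3.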
@@ -0,0 +1,11 @@
1
+ # Archive: Pre-Todd-v2 Prompts
2
+
3
+ **Archived:** 2026-05-07
4
+ **Reason:** Updating methodology to incorporate Todd Manwaring's revised training & prompts (PDF: "2026 04 25 Training & Prompt.pdf")
5
+ **Restore:** Copy any file back to its corresponding `pipeline/0N-*/PROMPT.md` location
6
+
7
+ ## Files
8
+ - `01-research-PROMPT.md` — Main research prompt (prompts 1-5 + summary)
9
+ - `02-verify-PROMPT.md` — Citation re-verification
10
+ - `03-humanize-PROMPT.md` — AI pattern removal
11
+ - `04-peer-review-PROMPT.md` — Peer review scoring
package/src/api-client.js CHANGED
@@ -86,11 +86,4 @@ export class ApiClient {
86
86
  return this.request('GET', '/research/impact');
87
87
  }
88
88
 
89
- async getNextAction() {
90
- return this.request('GET', '/research/next-action');
91
- }
92
-
93
- async enableSchedule() {
94
- return this.request('POST', '/research/enable-schedule');
95
- }
96
89
  }
package/src/cli.js CHANGED
@@ -1,4 +1,4 @@
1
- #!/usr/bin/env node
1
+ #!/usr/bin/env node
2
2
 
3
3
  // CLI entry point for tokens-for-good.
4
4
  // Usage:
package/src/init.js CHANGED
@@ -7,6 +7,7 @@ import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'fs';
7
7
  import { join, dirname } from 'path';
8
8
  import { homedir } from 'os';
9
9
  import { fileURLToPath } from 'url';
10
+ import { spawnSync } from 'child_process';
10
11
  import { detectPlatform } from './platform.js';
11
12
  import { loadState, saveState } from './state.js';
12
13
 
@@ -173,7 +174,16 @@ function statePath() { return homeRelative(join(homedir(), '.tokens-for-good
173
174
 
174
175
  function readJsonOrEmpty(path) {
175
176
  if (!existsSync(path)) return {};
176
- try { return JSON.parse(readFileSync(path, 'utf-8')); } catch { return {}; }
177
+ const raw = readFileSync(path, 'utf-8');
178
+ try {
179
+ return JSON.parse(raw);
180
+ } catch {
181
+ throw new Error(
182
+ `${path} exists but is not valid JSON.\n` +
183
+ `Fix or delete the file, then re-run init.\n` +
184
+ `(Tip: paste it into https://jsonlint.com to find the syntax error.)`
185
+ );
186
+ }
177
187
  }
178
188
 
179
189
  function ensureDir(path) {
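The `readJsonOrEmpty` change above stops treating a malformed config file as an empty object. Presumably this keeps a later write from replacing the user's existing settings, though the diff does not say so. A minimal sketch of the parsing half, using the hypothetical name `parseJsonOrThrow` and leaving out the file read and the missing-file case:

```javascript
// Sketch of the stricter behavior: parse the raw text and turn a syntax
// error into an actionable message instead of silently returning {}.
// parseJsonOrThrow is a hypothetical name; the real function also reads
// the file and returns {} when the file does not exist.
function parseJsonOrThrow(raw, path) {
  try {
    return JSON.parse(raw);
  } catch {
    throw new Error(
      `${path} exists but is not valid JSON.\n` +
      `Fix or delete the file, then re-run init.`
    );
  }
}

// Valid JSON parses as before.
console.log(parseJsonOrThrow('{"mcpServers": {}}', 'settings.json'));

// A trailing comma is invalid JSON and now surfaces as an error.
try {
  parseJsonOrThrow('{"mcpServers": {},}', 'settings.json');
} catch (err) {
  console.log(err.message.split('\n')[0]);
}
```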
@@ -238,15 +248,44 @@ function writeSessionStartHook() {
238
248
  matcher: '',
239
249
  hooks: [{
240
250
  type: 'command',
241
- command: IS_WINDOWS
242
- ? 'cmd /c npx -y tokens-for-good session-start-hook'
243
- : 'npx -y tokens-for-good session-start-hook',
251
+ command: hookCommand(),
244
252
  }],
245
253
  });
246
254
  }
247
255
  writeJson(abs, cfg);
248
256
  }
249
257
 
258
+ // Claude Code runs SessionStart hooks under Git Bash on Windows with a
259
+ // stripped PATH that typically does not include C:\Program Files\nodejs,
260
+ // so a bare `npx` lookup fails silently. Resolve the absolute npx path at
261
+ // init time (when the user's full PATH is available) and bake it into the
262
+ // hook command so it works regardless of Claude Code's hook-runner PATH.
263
+ function hookCommand() {
264
+ if (!IS_WINDOWS) return 'npx -y tokens-for-good session-start-hook';
265
+
266
+ const npxPath = resolveWindowsNpxPath();
267
+ // Bash accepts double-quoted paths with spaces; backslashes are escaped when the config is serialized to JSON.
268
+ return `"${npxPath}" -y tokens-for-good session-start-hook`;
269
+ }
270
+
271
+ function resolveWindowsNpxPath() {
272
+ // First try `where npx.cmd` — most reliable when PATH is correct.
273
+ try {
274
+ const r = spawnSync('where', ['npx.cmd'], { encoding: 'utf-8' });
275
+ if (r.status === 0) {
276
+ const first = r.stdout.trim().split(/\r?\n/)[0];
277
+ if (first && existsSync(first)) return first;
278
+ }
279
+ } catch { /* fall through */ }
280
+
281
+ // Fallback: npx.cmd usually sits alongside node.exe.
282
+ const alongside = join(dirname(process.execPath), 'npx.cmd');
283
+ if (existsSync(alongside)) return alongside;
284
+
285
+ // Last-resort guess — user's hook may need manual edit if this is wrong.
286
+ return 'C:\\Program Files\\nodejs\\npx.cmd';
287
+ }
288
+
250
289
  function writeSkillFile(name) {
251
290
  const src = join(PKG_ROOT, 'skills', `${name}.md`);
252
291
  const dst = join(homedir(), '.claude', 'skills', name, 'SKILL.md');
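The quoting in `hookCommand()` can be checked in isolation. This sketch assumes `writeJson` serializes with `JSON.stringify`, which the diff does not show, and it uses the last-resort path as sample input. It shows that the quotes and backslashes in the command survive a round trip through JSON without manual escaping.

```javascript
// Assumed example path; the real value is resolved at init time.
const npxPath = 'C:\\Program Files\\nodejs\\npx.cmd';
const command = `"${npxPath}" -y tokens-for-good session-start-hook`;

// JSON.stringify escapes the inner double quotes and doubles each
// backslash, so the command needs no manual escaping beforehand.
const serialized = JSON.stringify({ type: 'command', command });
console.log(serialized);

// Reading the config back yields the original command string.
const roundTripped = JSON.parse(serialized).command;
console.log(roundTripped === command);
```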
package/src/mcp-server.js CHANGED
@@ -4,7 +4,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
4
4
  import { z } from 'zod';
5
5
  import { ApiClient } from './api-client.js';
6
6
  import { detectPlatform, isSchedulable, getAutomationInstructions } from './platform.js';
7
- import { loadState, updateState, isSnoozed, hasContributedToday, markContributed, markSetupComplete } from './state.js';
7
+ import { loadState, updateState, isSnoozed, snoozeDays, hasContributedToday, markContributed, markSetupComplete } from './state.js';
8
8
  import { readFileSync, existsSync } from 'fs';
9
9
  import { join, dirname } from 'path';
10
10
  import { fileURLToPath } from 'url';
@@ -144,7 +144,7 @@ server.tool('submit_report', 'Submit a completed research report for an org you
144
144
  if (!client) return { content: [{ type: 'text', text: 'Error: TFG_API_KEY not set.' }] };
145
145
 
146
146
  try {
147
- const result = await client.submitReport(claim_id, report_markdown, null, null, model_used, PKG_VERSION);
147
+ const result = await client.submitReport(claim_id, report_markdown, estimated_tokens, null, model_used, PKG_VERSION);
148
148
  markContributed();
149
149
 
150
150
  // One-off users: first successful submit completes their initial setup,
@@ -205,7 +205,7 @@ server.tool('get_peer_review', 'Get a draft report assigned to you for peer revi
205
205
 
206
206
  server.tool('submit_peer_review', 'Submit your peer review score for a report.', {
207
207
  claim_id: z.string().describe('The claim ID of the report being reviewed'),
208
- score: z.number().min(1).max(4).describe('Score: 4=great, 3=good with fixes, 2=needs redo, 1=bad actor'),
208
+ score: z.number().int().min(1).max(4).describe('Score: 4=great, 3=good with fixes, 2=needs redo, 1=bad actor'),
209
209
  notes: z.string().optional().describe('Review notes explaining the score'),
210
210
  updated_report: z.string().optional().describe('If score is 3, the fixed version of the report'),
211
211
  }, async ({ claim_id, score, notes, updated_report }) => {
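The added `.int()` tightens the `score` schema: a fractional value such as 3.5 satisfied `min(1).max(4)` before and is rejected now. The following plain-JavaScript sketch mimics the two checks without using zod; the function names are hypothetical.

```javascript
// Plain-JS stand-ins for the two schema versions (not zod itself).
const acceptsBefore = (s) => typeof s === 'number' && s >= 1 && s <= 4;
const acceptsAfter = (s) => Number.isInteger(s) && s >= 1 && s <= 4;

// Print each candidate score with the verdict of both checks.
for (const s of [1, 3, 3.5, 4, 5]) {
  console.log(s, acceptsBefore(s), acceptsAfter(s));
}
```

Only 3.5 differs between the two columns, which matches the four-level rubric where every score is a whole number.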
@@ -270,6 +270,13 @@ server.tool('mark_setup_complete', 'Called by the /tfg-schedule skill after /sch
270
270
  return { content: [{ type: 'text', text: 'Marked setup complete. The SessionStart hook will go silent from the next session.' }] };
271
271
  });
272
272
 
273
+ server.tool('snooze', 'Snooze Tokens for Good reminders. Call this when the user says to remind them tomorrow, next week, or in N days.', {
274
+ days: z.number().int().min(1).max(365).describe('Days to snooze (1 = tomorrow, 7 = next week)'),
275
+ }, async ({ days }) => {
276
+ snoozeDays(days);
277
+ return { content: [{ type: 'text', text: `Got it — Tokens for Good will stay quiet for ${days} day${days === 1 ? '' : 's'}.` }] };
278
+ });
279
+
273
280
  // --- Prompts (session start) ---
274
281
 
275
282
  server.prompt('session_start', 'Check if you should research an org or complete a peer review', {}, async () => {
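The body of `snoozeDays` lives in `state.js` and is not part of this diff. The sketch below is therefore a guess at the shape of the behavior rather than the package's code: it computes a quiet-until timestamp from a day count and reuses the pluralization from the tool's confirmation message. The names `snoozeUntil` and `snoozeMessage` are hypothetical.

```javascript
// Hypothetical stand-in for snoozeDays: compute the timestamp until
// which reminders stay quiet. The range matches the tool's schema.
function snoozeUntil(days, now = new Date()) {
  if (!Number.isInteger(days) || days < 1 || days > 365) {
    throw new RangeError('days must be an integer from 1 to 365');
  }
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() + days * MS_PER_DAY);
}

// Same singular/plural rule as the tool's confirmation message.
function snoozeMessage(days) {
  return `Tokens for Good will stay quiet for ${days} day${days === 1 ? '' : 's'}.`;
}

const start = new Date('2026-05-07T00:00:00Z');
console.log(snoozeUntil(7, start).toISOString());
console.log(snoozeMessage(1));
console.log(snoozeMessage(7));
```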
@@ -286,18 +293,16 @@ server.prompt('session_start', 'Check if you should research an org or complete
286
293
  const state = loadState();
287
294
 
288
295
  // Check for pending peer review first
289
- if (client) {
290
- try {
291
- const review = await client.getNextPeerReview();
292
- return {
293
- messages: [{
294
- role: 'user',
295
- content: { type: 'text', text: `You have a pending peer review to complete before you can claim a new org. Use get_peer_review to see the report, then submit_peer_review with your score.` },
296
- }],
297
- };
298
- } catch {
299
- // No pending review, continue
300
- }
296
+ try {
297
+ await client.getNextPeerReview();
298
+ return {
299
+ messages: [{
300
+ role: 'user',
301
+ content: { type: 'text', text: `You have a pending peer review to complete before you can claim a new org. Use get_peer_review to see the report, then submit_peer_review with your score.` },
302
+ }],
303
+ };
304
+ } catch {
305
+ // No pending review, continue
301
306
  }
302
307
 
303
308
  if (isSnoozed()) {
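With the `if (client)` guard removed, a null `client` no longer skips the check. Instead, the method call throws a `TypeError` that the same `catch` swallows. Whether an earlier guard makes that case unreachable is not visible in this hunk. The sketch below uses a hypothetical function name and stub clients, and it assumes `getNextPeerReview` rejects when no review is pending.

```javascript
// Sketch of the new control flow with the `if (client)` guard removed.
// pendingReviewPrompt is a hypothetical name for illustration.
async function pendingReviewPrompt(client) {
  try {
    await client.getNextPeerReview(); // throws TypeError if client is null
    return 'You have a pending peer review to complete.';
  } catch {
    // Reached both when the call rejects and when client is null.
    return null;
  }
}

// Stub clients: one with a pending review, one without.
const withReview = { getNextPeerReview: async () => ({ claim_id: 'claim-123' }) };
const noReview = { getNextPeerReview: async () => { throw new Error('no review'); } };

Promise.all([withReview, noReview, null].map((c) => pendingReviewPrompt(c)))
  .then((results) => console.log(results));
```

Because the bare `catch` treats every failure as "no pending review", a network error is indistinguishable from an empty queue in this flow.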
package/src/state.js CHANGED
@@ -78,7 +78,3 @@ export function markSetupComplete() {
78
78
  updateState({ first_setup_complete: true });
79
79
  }
80
80
 
81
- export function isInitialized() {
82
- const state = loadState();
83
- return state.intended_flow !== null;
84
- }