tokens-for-good 0.4.2 → 0.4.3
- package/README.md +5 -1
- package/package.json +1 -1
- package/pipeline/01-research/PROMPT.md +23 -14
- package/pipeline/04-peer-review/PROMPT.md +3 -5
- package/pipeline/_archive/2026-05-07-pre-todd-v2/01-research-PROMPT.md +170 -0
- package/pipeline/_archive/2026-05-07-pre-todd-v2/02-verify-PROMPT.md +69 -0
- package/pipeline/_archive/2026-05-07-pre-todd-v2/03-humanize-PROMPT.md +143 -0
- package/pipeline/_archive/2026-05-07-pre-todd-v2/04-peer-review-PROMPT.md +93 -0
- package/pipeline/_archive/2026-05-07-pre-todd-v2/README.md +11 -0
- package/src/api-client.js +0 -7
- package/src/init.js +10 -1
- package/src/mcp-server.js +20 -15
- package/src/state.js +0 -4
package/README.md
CHANGED
|
@@ -60,7 +60,11 @@ Once installed, these are available to your AI via the MCP server:
|
|
|
60
60
|
- **OpenCode** — `init` writes `~/.config/opencode/opencode.json` and prints a cron line you can paste into `crontab -e`.
|
|
61
61
|
- **Cursor / Windsurf / Devin** — `init` writes the MCP config; automation requires platform-native scheduling.
|
|
62
62
|
|
|
63
|
-
##
|
|
63
|
+
## Contributing
|
|
64
|
+
|
|
65
|
+
TFG has been built and tested primarily on **Claude Code**. Making it work well on other harnesses — OpenCode, Cursor, Windsurf, Devin, anything else with MCP support — is the biggest open area for external help. See [CONTRIBUTING.md](CONTRIBUTING.md) for a tour of the code, the specific touch points a harness port needs to hit (`src/platform.js`, `src/init.js`, the session-start hook, and the skill files), and the local testing pattern.
|
|
66
|
+
|
|
67
|
+
For quick dev setup:
|
|
64
68
|
|
|
65
69
|
```bash
|
|
66
70
|
git clone https://github.com/Tokens-for-Good/tokens-for-good
|
package/package.json
CHANGED
package/pipeline/01-research/PROMPT.md
CHANGED
|
@@ -53,8 +53,19 @@ List the top 20 negative consequences of that social problem for that population
|
|
|
53
53
|
#### PROMPT 3 — Intermediary vs Ultimate Outcome Classification
|
|
54
54
|
|
|
55
55
|
Keep all 20 items. Add a column classifying each as Intermediary or Ultimate Outcome.
|
|
56
|
-
|
|
57
|
-
|
|
56
|
+
|
|
57
|
+
**Definitions:**
|
|
58
|
+
- **Intermediary:** changes in behavior, action, or resources that result from the intervention but don't yet prove lives improved (e.g., increased income, employment, school enrollment, access to healthcare, consumption)
|
|
59
|
+
- **Ultimate:** changes in condition or life status that directly reflect well-being improvements (e.g., improved health, housing security, quality of life, food security)
|
|
60
|
+
|
|
61
|
+
**Edge cases — apply these exactly:**
|
|
62
|
+
- Getting healthcare = Intermediary. Health actually improving = Ultimate.
|
|
63
|
+
- Income going up = Intermediary. Using that income to improve housing, education, or health = Ultimate.
|
|
64
|
+
- Moving out of poverty = Intermediary. Well-being or quality of life improving because of it = Ultimate.
|
|
65
|
+
- Increased farm yield = Intermediary. Enhanced food security = Ultimate.
|
|
66
|
+
- Increased access to most anything = Intermediary (we don't know if life improved because of that access).
|
|
67
|
+
- School learning outcomes or completing school = Intermediary. Quality of life changing due to a better job from those outcomes = Ultimate.
|
|
68
|
+
- Asset changes = Intermediary unless we know specifically what the asset is and how it improves life (safer housing, a latrine, durable productive tools = Ultimate; generic "asset score" or "asset holdings" = Intermediary).
|
|
58
69
|
|
|
59
70
|
Sort by Intermediary first, then Ultimate.
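The sort rule above amounts to a stable two-group ordering. A minimal Python sketch, assuming each table row is held as a `(consequence, classification)` pair; the `sort_rows` name and the sample rows are illustrative and not part of the package:

```python
# Only the two labels come from the prompt; everything else is hypothetical.
ORDER = {"Intermediary": 0, "Ultimate": 1}

def sort_rows(rows):
    """Return rows with Intermediary first, then Ultimate, keeping the original order within each group."""
    return sorted(rows, key=lambda row: ORDER[row[1]])

rows = [
    ("Enhanced food security", "Ultimate"),
    ("Increased farm yield", "Intermediary"),
    ("Improved health", "Ultimate"),
    ("Access to healthcare", "Intermediary"),
]
for consequence, label in sort_rows(rows):
    print(f"{label:<12} | {consequence}")
```

Because `sorted` is stable, rows keep their relative order inside each group, so the numbering from PROMPT 2 stays traceable.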
|
|
60
71
|
|
|
@@ -83,36 +94,33 @@ Keep the table with ALL previous columns. For each of the 20 negative consequenc
|
|
|
83
94
|
|
|
84
95
|
Write a recommendation (2-4 sentences): lead with stance, state strongest evidence, note caveats if any.
|
|
85
96
|
|
|
86
|
-
|
|
97
|
+
**Section 2 — Scorecard**
|
|
87
98
|
|
|
88
|
-
Base score (out of 100):
|
|
89
99
|
- [x] or [ ] a. Has Ultimate Outcome Goals (50 pts)
|
|
90
100
|
- [x] or [ ] b. Measures Intermediate Outcomes (10 pts)
|
|
91
101
|
- [x] or [ ] c. Measures Ultimate Outcomes (15 pts)
|
|
92
102
|
- [x] or [ ] d. Shows Continual Learning & Adaptation (25 pts)
|
|
93
|
-
|
|
94
|
-
Extra credit:
|
|
95
103
|
- [x] or [ ] e. Measures Intermediate Counterfactual (10 pts)
|
|
96
104
|
- [x] or [ ] f. Measures Ultimate Counterfactual (10 pts)
|
|
97
105
|
|
|
98
|
-
**Score: [X]/
|
|
106
|
+
**Score: [X]/120**
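The scorecard arithmetic can be checked mechanically: items a to d sum to 100 and the two counterfactual items add 10 each, for a maximum of 120. A small Python sketch, assuming checked items are passed as their letters; the `score` function and `WEIGHTS` table are illustrative names, not code from the package:

```python
# Weights copied from the scorecard; names are hypothetical.
WEIGHTS = {
    "a": 50,  # Has Ultimate Outcome Goals
    "b": 10,  # Measures Intermediate Outcomes
    "c": 15,  # Measures Ultimate Outcomes
    "d": 25,  # Shows Continual Learning & Adaptation
    "e": 10,  # Measures Intermediate Counterfactual
    "f": 10,  # Measures Ultimate Counterfactual
}
MAX_SCORE = sum(WEIGHTS.values())  # 120

def score(checked):
    """Sum the points for the checked items (an iterable of letters a-f)."""
    unknown = set(checked) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    # Using a set means a letter listed twice is only counted once.
    return sum(WEIGHTS[item] for item in set(checked))

print(f"Score: {score('abd')}/{MAX_SCORE}")  # a + b + d = 85
```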
|
|
99
107
|
|
|
100
|
-
**Section
|
|
108
|
+
**Section 3 — The Social Problem**
|
|
101
109
|
Frame with specificity ("chronic malnutrition among children under 5 in rural sub-Saharan Africa", not just "poverty"). Include scale and cite prevalence data.
|
|
102
110
|
|
|
103
|
-
**Section
|
|
111
|
+
**Section 4 — The Solution**
|
|
104
112
|
What the organization actually does (not their mission statement). Explain the theory of change: how does activity X lead to outcome Y? Be specific about the intervention.
|
|
105
113
|
|
|
106
|
-
**Section
|
|
114
|
+
**Section 5 — Key Outputs**
|
|
107
115
|
Measured activities and direct products with specific numbers. Distinguish outputs (things produced) from outcomes (changes caused).
|
|
108
116
|
|
|
109
|
-
**Section
|
|
117
|
+
**Section 6 — Key Intermediate Outcomes**
|
|
110
118
|
Measurable short-to-medium term changes. Note whether data is self-reported or independently verified. Include any counterfactual data found.
|
|
111
119
|
|
|
112
|
-
**Section
|
|
120
|
+
**Section 7 — Key Ultimate Outcomes**
|
|
113
121
|
Long-term impact evidence only. This section may be thin. Do not pad it. If no ultimate outcome data exists, say so in one sentence.
|
|
114
122
|
|
|
115
|
-
**Section
|
|
123
|
+
**Section 8 — Continual Learning & Adaptation**
|
|
116
124
|
Documented program changes based on evidence. "They adapted" needs specifics: what changed, based on what data, when?
|
|
117
125
|
|
|
118
126
|
#### SOURCES
|
|
@@ -147,7 +155,7 @@ Run these checks before submitting. They are not optional.
|
|
|
147
155
|
|
|
148
156
|
**Structure:**
|
|
149
157
|
- [ ] All 5 prompt tables present and complete (20 rows each)
|
|
150
|
-
- [ ] All
|
|
158
|
+
- [ ] All 8 summary sections present with substantive content
|
|
151
159
|
- [ ] SOURCES section lists every URL cited inline
|
|
152
160
|
- [ ] Scored checklist adds up correctly
|
|
153
161
|
|
|
@@ -164,6 +172,7 @@ Run these checks before submitting. They are not optional.
|
|
|
164
172
|
- [ ] Replace "leverage" with "use", "utilize" with "use"
|
|
165
173
|
- [ ] Paragraphs under 4 sentences
|
|
166
174
|
- [ ] No superlatives unless backed by comparative data
|
|
175
|
+
- [ ] Every acronym defined in full before first use (e.g., "Randomized Controlled Trial (RCT)" not just "RCT")
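The acronym rule lends itself to a rough automated pre-check. The Python sketch below is a heuristic under stated assumptions: an acronym is any run of two or more capital letters, and it counts as defined once it appears in parentheses, as in "Randomized Controlled Trial (RCT)". The function name is hypothetical, and the check does not confirm that a full-length form actually precedes the parentheses.

```python
import re

ACRONYM = re.compile(r"\b[A-Z]{2,}\b")

def undefined_acronyms(text):
    """Return acronyms whose first use comes before a parenthesised definition such as "(RCT)"."""
    flagged = []
    for match in ACRONYM.finditer(text):
        acronym = match.group()
        if acronym in flagged:
            continue
        definition = text.find(f"({acronym})")
        # Inside the definition, the acronym starts one character after the "(".
        if definition == -1 or match.start() < definition + 1:
            flagged.append(acronym)
    return flagged
```

Expect false positives on all-caps words that are not acronyms; treat the result as a list of places to look, not a verdict.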
|
|
167
176
|
|
|
168
177
|
### 5. Submit
|
|
169
178
|
|
package/pipeline/04-peer-review/PROMPT.md
CHANGED
|
@@ -31,7 +31,7 @@ Pick 3-5 citation URLs from the report (prioritize any flagged by the automated
|
|
|
31
31
|
|
|
32
32
|
Verify:
|
|
33
33
|
- [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
|
|
34
|
-
- [ ] All
|
|
34
|
+
- [ ] All 8 summary sections present (Sections 1-8)
|
|
35
35
|
- [ ] SOURCES section exists with citations
|
|
36
36
|
- [ ] Every factual claim has its own inline citation `[Source Name](URL)`
|
|
37
37
|
- [ ] No claims cited to general overview pages when a specific report or data page exists
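The first three structure checks can be approximated with heading matches. This Python sketch assumes the report keeps the heading shapes used in the research prompt (`#### PROMPT n`, `**Section n`, `#### SOURCES`); `missing_parts` is a hypothetical helper, and it does not count the 20 table rows or inspect citations.

```python
import re

def missing_parts(report, prompts=5, sections=8):
    """List the expected headings that do not appear in the report text."""
    found_prompts = {int(n) for n in re.findall(r"^#### PROMPT (\d+)\b", report, re.M)}
    found_sections = {int(n) for n in re.findall(r"^\*\*Section (\d+)\b", report, re.M)}
    missing = [f"PROMPT {n}" for n in range(1, prompts + 1) if n not in found_prompts]
    missing += [f"Section {n}" for n in range(1, sections + 1) if n not in found_sections]
    if not re.search(r"^#### SOURCES\b", report, re.M):
        missing.append("SOURCES")
    return missing
```

An empty list means every expected heading was found; passing `sections=7` would match the older seven-section layout.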
|
|
@@ -40,17 +40,14 @@ Verify:
|
|
|
40
40
|
|
|
41
41
|
The scored checklist uses these weights. Verify the math and the evidence:
|
|
42
42
|
|
|
43
|
-
Base score (out of 100):
|
|
44
43
|
- a. Has Ultimate Outcome Goals (50 pts)
|
|
45
44
|
- b. Measures Intermediate Outcomes (10 pts)
|
|
46
45
|
- c. Measures Ultimate Outcomes (15 pts)
|
|
47
46
|
- d. Shows Continual Learning & Adaptation (25 pts)
|
|
48
|
-
|
|
49
|
-
Extra credit:
|
|
50
47
|
- e. Measures Intermediate Counterfactual (10 pts)
|
|
51
48
|
- f. Measures Ultimate Counterfactual (10 pts)
|
|
52
49
|
|
|
53
|
-
**Score: X/
|
|
50
|
+
**Score: X/120**
|
|
54
51
|
|
|
55
52
|
Check:
|
|
56
53
|
- Are checked items supported by evidence in the report?
|
|
@@ -65,6 +62,7 @@ Check:
|
|
|
65
62
|
- Sections that are empty or trivially short
|
|
66
63
|
- Claims that contradict other parts of the report
|
|
67
64
|
- Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
|
|
65
|
+
- Acronyms used before being defined in full (e.g., "RCT" without first writing "Randomized Controlled Trial (RCT)")
|
|
68
66
|
|
|
69
67
|
### 7. Assign a Score
|
|
70
68
|
|
package/pipeline/_archive/2026-05-07-pre-todd-v2/01-research-PROMPT.md
ADDED
|
@@ -0,0 +1,170 @@
|
|
|
1
|
+
# Research an Organization for Fierce Philanthropy
|
|
2
|
+
|
|
3
|
+
## Your Role
|
|
4
|
+
|
|
5
|
+
You are a social impact research analyst for Fierce Philanthropy. You evaluate nonprofit organizations using Todd Manwaring's Social Impact Evaluation Framework. You are thorough, evidence-driven, and honest about what the data does and does not show.
|
|
6
|
+
|
|
7
|
+
## Instructions
|
|
8
|
+
|
|
9
|
+
### 1. Research the Organization
|
|
10
|
+
|
|
11
|
+
Using web search and web fetch, thoroughly research:
|
|
12
|
+
|
|
13
|
+
1. **The org's website** — homepage, about page, impact/results pages, annual reports
|
|
14
|
+
2. **Impact evidence** — published data, metrics, program evaluations
|
|
15
|
+
3. **Independent evaluations** — RCTs, quasi-experimental studies (search J-PAL, 3ie, Campbell Collaboration)
|
|
16
|
+
4. **Third-party reviews** — GiveWell, Charity Navigator, GuideStar/Candid, news coverage
|
|
17
|
+
5. **Financial data** — ProPublica Nonprofit Explorer (search by EIN or name), Form 990
|
|
18
|
+
|
|
19
|
+
**Research rules:**
|
|
20
|
+
- Only include DIRECT results from this organization or independent measurements of it
|
|
21
|
+
- Only include measured results with citations. No anecdotes, no modeling, no evidence from other organizations.
|
|
22
|
+
- Every factual claim must trace to a specific source URL you actually visited
|
|
23
|
+
|
|
24
|
+
### 2. Generate the Report
|
|
25
|
+
|
|
26
|
+
Follow this exact structure:
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
# [Org Name] - Fierce Philanthropy Research Report
|
|
32
|
+
|
|
33
|
+
**Date:** [today's date]
|
|
34
|
+
**Methodology:** Todd Manwaring's Social Impact Evaluation Framework
|
|
35
|
+
**Organization:** [Org Name]
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
#### PROMPT 1 — Organization and Social Problem Summary
|
|
41
|
+
|
|
42
|
+
1. **Social Problem:** (less than 5 words)
|
|
43
|
+
2. **Population:** (who is affected)
|
|
44
|
+
3. **Location:** (where)
|
|
45
|
+
|
|
46
|
+
#### PROMPT 2 — Top 20 Negative Consequences
|
|
47
|
+
|
|
48
|
+
| # | Negative Consequence |
|
|
49
|
+
|---|----------------------|
|
|
50
|
+
|
|
51
|
+
List the top 20 negative consequences of that social problem for that population in that location.
|
|
52
|
+
|
|
53
|
+
#### PROMPT 3 — Intermediary vs Ultimate Outcome Classification
|
|
54
|
+
|
|
55
|
+
Keep all 20 items. Add a column classifying each as Intermediary or Ultimate Outcome.
|
|
56
|
+
- **Intermediary:** changes in behavior or action from gains in knowledge, skills, or attitudes
|
|
57
|
+
- **Ultimate:** changes in condition or life status (reduced poverty, improved health, economic stability)
|
|
58
|
+
|
|
59
|
+
Sort by Intermediary first, then Ultimate.
|
|
60
|
+
|
|
61
|
+
#### PROMPT 4 — Positive Results Shared by Organization
|
|
62
|
+
|
|
63
|
+
Keep the table with all columns. For each of the 20 negative consequences, add a column: does the organization share positive results?
|
|
64
|
+
|
|
65
|
+
- Start each cell with "Yes.", "Partial.", or "No direct results shared."
|
|
66
|
+
- When Yes or Partial: include SPECIFIC data (percentages, sample sizes, time periods, study names)
|
|
67
|
+
- Only direct results from this organization, not from other orgs or modeling
|
|
68
|
+
- **CITATION RULES (critical):** Every data point MUST have its own inline citation `[Source Name](URL)`. If one cell contains two facts from different sources, include two separate citations. Never cite a general overview page for a specific statistic — cite the exact page where you found the number.
|
|
69
|
+
- **VERIFY INLINE:** After writing each cell, re-read the source you cited and confirm the exact numbers match. If the source says 75% and you wrote 59%, fix it before moving on. Do not proceed to the next row until the current row's numbers are confirmed against the cited page.
|
|
70
|
+
|
|
71
|
+
#### PROMPT 5 — Counterfactual Results
|
|
72
|
+
|
|
73
|
+
Keep the table with ALL previous columns. For each of the 20 negative consequences, add a column: does the organization share COUNTERFACTUAL results?
|
|
74
|
+
|
|
75
|
+
- Start each cell with "Yes.", "Partial.", or "No counterfactual results."
|
|
76
|
+
- Describe study design (RCT, quasi-experimental, matched comparison), sample sizes, what the control/comparison group showed
|
|
77
|
+
- Counterfactual = comparison to what would have happened without the intervention. Before/after alone does not count.
|
|
78
|
+
- **Same citation and verify-inline rules as Prompt 4:** every data point gets its own inline citation, and confirm numbers match the source before moving to the next row.
|
|
79
|
+
|
|
80
|
+
#### SUMMARY REPORT
|
|
81
|
+
|
|
82
|
+
**Section 1 — Our Recommendation**
|
|
83
|
+
|
|
84
|
+
Write a recommendation (2-4 sentences): lead with stance, state strongest evidence, note caveats if any.
|
|
85
|
+
|
|
86
|
+
Then include this scored checklist. Base score is out of 100. Counterfactuals are extra credit (max 120).
|
|
87
|
+
|
|
88
|
+
Base score (out of 100):
|
|
89
|
+
- [x] or [ ] a. Has Ultimate Outcome Goals (50 pts)
|
|
90
|
+
- [x] or [ ] b. Measures Intermediate Outcomes (10 pts)
|
|
91
|
+
- [x] or [ ] c. Measures Ultimate Outcomes (15 pts)
|
|
92
|
+
- [x] or [ ] d. Shows Continual Learning & Adaptation (25 pts)
|
|
93
|
+
|
|
94
|
+
Extra credit:
|
|
95
|
+
- [x] or [ ] e. Measures Intermediate Counterfactual (10 pts)
|
|
96
|
+
- [x] or [ ] f. Measures Ultimate Counterfactual (10 pts)
|
|
97
|
+
|
|
98
|
+
**Score: [X]/100** (can exceed 100 with extra credit, max 120)
|
|
99
|
+
|
|
100
|
+
**Section 2 — The Social Problem**
|
|
101
|
+
Frame with specificity ("chronic malnutrition among children under 5 in rural sub-Saharan Africa", not just "poverty"). Include scale and cite prevalence data.
|
|
102
|
+
|
|
103
|
+
**Section 3 — The Solution**
|
|
104
|
+
What the organization actually does (not their mission statement). Explain the theory of change: how does activity X lead to outcome Y? Be specific about the intervention.
|
|
105
|
+
|
|
106
|
+
**Section 4 — Key Outputs**
|
|
107
|
+
Measured activities and direct products with specific numbers. Distinguish outputs (things produced) from outcomes (changes caused).
|
|
108
|
+
|
|
109
|
+
**Section 5 — Key Intermediate Outcomes**
|
|
110
|
+
Measurable short-to-medium term changes. Note whether data is self-reported or independently verified. Include any counterfactual data found.
|
|
111
|
+
|
|
112
|
+
**Section 6 — Key Ultimate Outcomes**
|
|
113
|
+
Long-term impact evidence only. This section may be thin. Do not pad it. If no ultimate outcome data exists, say so in one sentence.
|
|
114
|
+
|
|
115
|
+
**Section 7 — Continual Learning & Adaptation**
|
|
116
|
+
Documented program changes based on evidence. "They adapted" needs specifics: what changed, based on what data, when?
|
|
117
|
+
|
|
118
|
+
#### SOURCES
|
|
119
|
+
|
|
120
|
+
List all cited sources with full URLs:
|
|
121
|
+
1. [Source Name](Full URL) - Brief description of what was cited
|
|
122
|
+
2. ...
|
|
123
|
+
|
|
124
|
+
End with: *Report prepared using Todd Manwaring's Social Impact Evaluation Framework for Fierce Philanthropy.*
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
### 3. Citation Rules (Read Carefully)
|
|
129
|
+
|
|
130
|
+
These rules are critical for report quality. Poorly attributed citations are the #1 reason reports fail review.
|
|
131
|
+
|
|
132
|
+
1. **One citation per fact.** If a sentence contains two claims from different sources, it needs two citations. Never bundle multiple facts under one link.
|
|
133
|
+
|
|
134
|
+
2. **Cite the specific page, not a general overview.** If you found "27% reduction" on the org's 2024 Annual Report page, cite that URL — not their homepage or about page.
|
|
135
|
+
|
|
136
|
+
3. **If you can't find a URL for a claim, don't include the claim.** No unsourced facts. If you read something during research but can't trace it to a specific page, leave it out.
|
|
137
|
+
|
|
138
|
+
4. **Verify numbers match the source exactly.** After writing a claim with a number (percentage, dollar amount, count), re-read the cited page and confirm the exact figure appears there. Common errors: writing 59% when the source says 75%, writing 4,000 when the source says 1,651, or writing 20% when the source says 25%. If your number doesn't match, use the source's number or remove the claim.
|
|
139
|
+
|
|
140
|
+
5. **Attribution matters.** Say "X reports that" when citing an org's own claims. Say "independent evaluation found" when citing third-party evidence. The distinction is load-bearing.
|
|
141
|
+
|
|
142
|
+
6. **Format:** `[Source Name](URL)` inline. The SOURCES section at the end must list every URL cited in the report.
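Rule 6 implies a cross-check between inline citations and the SOURCES list. A minimal Python sketch, assuming the report is a single markdown string with a `#### SOURCES` heading and that URLs contain no spaces or closing parentheses; `unlisted_urls` is an illustrative name, not part of the package:

```python
import re

# Matches [Source Name](URL) and captures the name and the URL.
CITATION = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def unlisted_urls(report):
    """Return URLs cited inline in the body that the SOURCES section does not list."""
    body, _, sources = report.partition("#### SOURCES")
    listed = {url for _, url in CITATION.findall(sources)}
    cited = [url for _, url in CITATION.findall(body)]
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return [url for url in dict.fromkeys(cited) if url not in listed]
```

This only covers the bookkeeping half of the rules. Whether a cited page actually supports the claim still requires reading the page.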
|
|
143
|
+
|
|
144
|
+
### 4. Before-Submission Quality Checks
|
|
145
|
+
|
|
146
|
+
Run these checks before submitting. They are not optional.
|
|
147
|
+
|
|
148
|
+
**Structure:**
|
|
149
|
+
- [ ] All 5 prompt tables present and complete (20 rows each)
|
|
150
|
+
- [ ] All 7 summary sections present with substantive content
|
|
151
|
+
- [ ] SOURCES section lists every URL cited inline
|
|
152
|
+
- [ ] Scored checklist adds up correctly
|
|
153
|
+
|
|
154
|
+
**Citations:**
|
|
155
|
+
- [ ] Every factual claim has its own inline citation
|
|
156
|
+
- [ ] Spot-check at least 5 citations: visit the URL and confirm the EXACT numbers on the page match what you wrote. If the source says 132% and you wrote 136%, fix it.
|
|
157
|
+
- [ ] For any citation where the page doesn't support your claim, find the correct source or remove the claim
|
|
158
|
+
- [ ] No claims are cited to general overview pages when a specific report or data page exists
|
|
159
|
+
|
|
160
|
+
**Writing style:**
|
|
161
|
+
- [ ] No em dashes (—). Replace with periods, commas, or parentheses.
|
|
162
|
+
- [ ] No filler adjectives: seamless, robust, comprehensive, innovative, cutting-edge, holistic, game-changing
|
|
163
|
+
- [ ] No AI transitions: "It's worth noting", "Here's the thing", "Let's dive in", "Simply put"
|
|
164
|
+
- [ ] Replace "leverage" with "use", "utilize" with "use"
|
|
165
|
+
- [ ] Paragraphs under 4 sentences
|
|
166
|
+
- [ ] No superlatives unless backed by comparative data
|
|
167
|
+
|
|
168
|
+
### 5. Submit
|
|
169
|
+
|
|
170
|
+
Submit using `submit_report` with the full markdown as `report_markdown`. Include `estimated_tokens` (count web searches at ~1K tokens each, web fetches at ~2-5K each, your output at ~4 tokens/word, plus ~10K overhead).
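The `estimated_tokens` figure is simple arithmetic over the per-item costs quoted above. A rough Python sketch; the function name and the 3,500-token default per fetch (the midpoint of the quoted 2-5K range) are assumptions made for illustration:

```python
def estimate_tokens(searches, fetches, output_words, tokens_per_fetch=3_500):
    """Rough estimated_tokens figure using the per-item costs quoted in the prompt."""
    if not 2_000 <= tokens_per_fetch <= 5_000:
        raise ValueError("tokens_per_fetch should stay within the quoted 2K-5K range")
    return (
        searches * 1_000              # ~1K tokens per web search
        + fetches * tokens_per_fetch  # ~2-5K tokens per web fetch
        + output_words * 4            # ~4 tokens per word of output
        + 10_000                      # ~10K fixed overhead
    )

print(estimate_tokens(searches=12, fetches=20, output_words=3_000))
```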
|
package/pipeline/_archive/2026-05-07-pre-todd-v2/02-verify-PROMPT.md
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
# Verify Citations (Standalone Re-verification)
|
|
2
|
+
|
|
3
|
+
Use this methodology when re-verifying an existing report. During normal research, citation verification is built into the research prompt (Section 4, quality checks). This standalone step is for when a report needs a second verification pass.
|
|
4
|
+
|
|
5
|
+
## Instructions
|
|
6
|
+
|
|
7
|
+
### 1. Read the Report
|
|
8
|
+
|
|
9
|
+
Read the full research report. Note every inline citation `[Source Name](URL)` and every factual claim.
|
|
10
|
+
|
|
11
|
+
### 2. Test Every Citation
|
|
12
|
+
|
|
13
|
+
For each citation, visit the URL using web fetch and verify:
|
|
14
|
+
|
|
15
|
+
- **URL loads** — Is it a real page (not 404, not a redirect to a homepage)?
|
|
16
|
+
- **Content matches** — Does the page actually say what the report claims? Quote the relevant passage.
|
|
17
|
+
- **Data is accurate** — Do the numbers match?
|
|
18
|
+
|
|
19
|
+
Record each check:
|
|
20
|
+
|
|
21
|
+
| # | Citation | URL Status | Content Match | Notes |
|
|
22
|
+
|---|----------|-----------|---------------|-------|
|
|
23
|
+
|
|
24
|
+
Status values:
|
|
25
|
+
- **VALID** — URL loads and content matches
|
|
26
|
+
- **BROKEN** — 404 or page doesn't load
|
|
27
|
+
- **MISMATCH** — URL loads but doesn't support the claim
|
|
28
|
+
- **PARTIAL** — Some claims match, some don't
|
|
29
|
+
- **UNVERIFIABLE** — Paywalled or content not accessible
|
|
30
|
+
|
|
31
|
+
### 3. Re-attribute Mismatches
|
|
32
|
+
|
|
33
|
+
For each MISMATCH or PARTIAL citation:
|
|
34
|
+
1. Use web search to find the correct source for the claim
|
|
35
|
+
2. If found: replace the citation URL with the correct one
|
|
36
|
+
3. If not found anywhere: remove the claim from the report or add a caveat ("This claim could not be independently verified")
|
|
37
|
+
|
|
38
|
+
Do not leave misattributed citations in place.
|
|
39
|
+
|
|
40
|
+
### 4. Check for Hallucinations
|
|
41
|
+
|
|
42
|
+
Search the web for claims that seem unusually specific:
|
|
43
|
+
- Statistics that don't appear in any source
|
|
44
|
+
- Named studies or RCTs that can't be found
|
|
45
|
+
- Program details that contradict other sources
|
|
46
|
+
|
|
47
|
+
### 5. Apply Corrections
|
|
48
|
+
|
|
49
|
+
For each issue:
|
|
50
|
+
|
|
51
|
+
```
|
|
52
|
+
### Correction [N]
|
|
53
|
+
**Location:** [First ~10 words of the problematic passage]
|
|
54
|
+
**Problem:** [What's wrong]
|
|
55
|
+
**Fix:** [What was changed]
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
### 6. Output
|
|
59
|
+
|
|
60
|
+
Write the corrected report with a verification summary at the top:
|
|
61
|
+
|
|
62
|
+
```markdown
|
|
63
|
+
## Verification Summary
|
|
64
|
+
- Citations checked: X
|
|
65
|
+
- Valid: X | Broken: X | Mismatch: X | Partial: X
|
|
66
|
+
- Claims removed (unsourced): X
|
|
67
|
+
- Citations re-attributed: X
|
|
68
|
+
- Corrections applied: X
|
|
69
|
+
```
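The first two summary lines can be tallied from the per-citation statuses recorded in step 2. A Python sketch, assuming the statuses are collected in a list of strings; `summary_lines` is a hypothetical helper. It follows the template as written, which has no slot for UNVERIFIABLE, so those citations only show up in the total.

```python
from collections import Counter

STATUSES = ("VALID", "BROKEN", "MISMATCH", "PARTIAL", "UNVERIFIABLE")

def summary_lines(statuses):
    """Build the first two lines of the verification summary from per-citation statuses."""
    unknown = set(statuses) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    counts = Counter(statuses)  # missing keys count as 0
    return [
        f"- Citations checked: {len(statuses)}",
        f"- Valid: {counts['VALID']} | Broken: {counts['BROKEN']} | "
        f"Mismatch: {counts['MISMATCH']} | Partial: {counts['PARTIAL']}",
    ]
```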
|
package/pipeline/_archive/2026-05-07-pre-todd-v2/03-humanize-PROMPT.md
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# Step 3: Humanize — Claude Code Instructions
|
|
2
|
+
|
|
3
|
+
## Inputs
|
|
4
|
+
|
|
5
|
+
- **Org name:** `{{ORG_NAME}}`
|
|
6
|
+
- **Verified report:** The verified report from Step 2 (kept in memory from the previous step)
|
|
7
|
+
- **Writing style guide:** The AI decontamination rules below
|
|
8
|
+
|
|
9
|
+
## Purpose
|
|
10
|
+
|
|
11
|
+
Step 2 verified the facts. This step makes the report sound human. You are an editor whose only job is to remove AI writing patterns and inject natural voice. Do not change the report structure, tables, checklist items, scores, or citations. Edit the prose only.
|
|
12
|
+
|
|
13
|
+
## Instructions
|
|
14
|
+
|
|
15
|
+
### 1. Read the Report and Style Guide
|
|
16
|
+
|
|
17
|
+
Read the verified report (skip the verification log header, work on the content below the `---`).
|
|
18
|
+
|
|
19
|
+
The AI decontamination passes below are your checklist.
|
|
20
|
+
|
|
21
|
+
### 2. Run Each Pass
|
|
22
|
+
|
|
23
|
+
Work through these checks in order. For each issue found, fix it and log the change.
|
|
24
|
+
|
|
25
|
+
#### Pass 1: Em Dash Removal
|
|
26
|
+
- Search for every `—` (em dash) in the content
|
|
27
|
+
- Replace each with a period (two sentences), comma, or parentheses
|
|
28
|
+
- Two short sentences almost always beat one em-dashed sentence
|
|
29
|
+
- Log count: "Removed X em dashes"
|
|
30
|
+
|
|
31
|
+
#### Pass 2: Sentence Rhythm
|
|
32
|
+
- Flag where 3+ consecutive sentences are roughly the same length (within ~5 words)
|
|
33
|
+
- Fix by splitting, combining, or varying structure
|
|
34
|
+
- Goal: rhythm should vary when read aloud. Short. Then longer. Then medium.
|
|
35
|
+
- Log: "Varied sentence rhythm in X sections"
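Pass 2 describes a measurable condition: three or more consecutive sentences whose lengths sit within about five words of each other. One way to sketch that check in Python, assuming sentences end in `.`, `!`, or `?` followed by whitespace (a naive split that will trip on abbreviations); `even_runs` and its parameters are illustrative:

```python
import re

def even_runs(paragraph, tolerance=5, min_run=3):
    """Return runs of consecutive sentences whose word counts all fall within `tolerance` of each other."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    runs, start = [], 0
    for end in range(1, len(sentences) + 1):
        window = lengths[start:end + 1]
        # Close the current run when the next sentence would break the tolerance, or at the end of the text.
        if end == len(sentences) or max(window) - min(window) > tolerance:
            if end - start >= min_run:
                runs.append(sentences[start:end])
            start = end
    return runs
```

The scan is greedy, so it reports non-overlapping runs only. That is enough to point an editor at the flat stretches.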
|
|
36
|
+
|
|
37
|
+
#### Pass 3: Paragraph Cadence
|
|
38
|
+
- Flag sections where consecutive paragraphs follow the same structure (claim then explanation then example, repeated)
|
|
39
|
+
- Vary the pattern: lead with evidence sometimes, skip the explanation, open with a question
|
|
40
|
+
- Log: "Restructured X paragraphs for cadence variety"
|
|
41
|
+
|
|
42
|
+
#### Pass 4: Opening Word Diversity
|
|
43
|
+
- Scan every paragraph's first word. Flag 2+ consecutive paragraphs starting with the same word
|
|
44
|
+
- Common offenders: "The...", "This...", repeated org name, "Pawsperity..." three times in a row
|
|
45
|
+
- Rewrite at least one opener in each flagged group
|
|
46
|
+
- Log: "Diversified openings in X locations"
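The Pass 4 scan is easy to mechanize. A minimal Python sketch, assuming the report prose has already been split into a list of paragraph strings; `repeated_openers` is a hypothetical name:

```python
def repeated_openers(paragraphs):
    """Return (index, word) for each paragraph that starts with the same word as the paragraph before it."""
    flagged = []
    previous = None
    for index, paragraph in enumerate(paragraphs):
        words = paragraph.split()
        # Compare case-insensitively and ignore punctuation stuck to the first word.
        first = words[0].strip(".,:;\"'").lower() if words else None
        if first is not None and first == previous:
            flagged.append((index, first))
        previous = first
    return flagged
```

Each flagged index marks a paragraph whose opener repeats the previous one, so rewriting any flagged paragraph in a group satisfies the "at least one" rule.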
|
|
47
|
+
|
|
48
|
+
#### Pass 5: AI Pattern Scan
|
|
49
|
+
Check for and fix:
|
|
50
|
+
- [ ] "[Statement]. Not because X — because Y." dramatic structure
|
|
51
|
+
- [ ] "Not just X, but Y" emphasis pattern
|
|
52
|
+
- [ ] "Whether X or Y" parallel constructions
|
|
53
|
+
- [ ] "From X to Y" range statements
|
|
54
|
+
- [ ] "Here's the thing" / "Let's dive in" / "In short" / "Put simply" / "The reality is"
|
|
55
|
+
- [ ] "At its core" / "At the end of the day" / "Fundamentally" as intensifier
|
|
56
|
+
- [ ] "It's worth noting that" / "Importantly" at sentence start
|
|
57
|
+
- [ ] Overused dramatic colon reveals
|
|
58
|
+
- [ ] Overused semicolons
|
|
59
|
+
- Log each pattern found and fixed
|
|
60
|
+
|
|
61
|
+
#### Pass 6: Perfect Parallelism Breaker
|
|
62
|
+
- Find bullet lists where every bullet follows the exact same grammatical structure
|
|
63
|
+
- Vary at least one item's structure (not just words)
|
|
64
|
+
- Don't always group in threes
|
|
65
|
+
- Log: "Broke parallelism in X lists/sections"
|
|
66
|
+
|
|
67
|
+
#### Pass 7: Filler Adjective Sweep
|
|
68
|
+
Search for and remove/replace:
|
|
69
|
+
- "seamless," "robust," "comprehensive," "critical," "fundamental," "innovative," "powerful," "unique," "holistic," "cutting-edge," "game-changing," "revolutionary"
|
|
70
|
+
- "leverage" → "use", "utilize" → "use"
|
|
71
|
+
- Remove minimizers: "simply," "just," "easily"
|
|
72
|
+
- Usually the sentence is stronger without the adjective
|
|
73
|
+
- Log: "Removed X filler adjectives"
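Pass 7 splits into one mechanical step (the two word swaps) and one that needs judgment (removing an adjective without breaking the sentence). The Python sketch below applies the swaps and only reports the rest. The word lists are copied from the pass; the `sweep` function is illustrative, and it handles only the lowercase base forms, leaving inflected forms such as "leveraged" for manual editing.

```python
import re

FILLERS = [
    "seamless", "robust", "comprehensive", "critical", "fundamental", "innovative",
    "powerful", "unique", "holistic", "cutting-edge", "game-changing", "revolutionary",
]
MINIMIZERS = ["simply", "just", "easily"]
SWAPS = {"leverage": "use", "utilize": "use"}

def sweep(text):
    """Apply the two word swaps and report which filler words and minimizers still need a manual edit."""
    for old, new in SWAPS.items():
        text = re.sub(rf"\b{old}\b", new, text)
    hits = [
        word for word in FILLERS + MINIMIZERS
        if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
    ]
    return text, hits
```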
|
|
74
|
+
|
|
75
|
+
#### Pass 8: Read-Aloud Test
|
|
76
|
+
- For each Summary Report section (Sections 1-7), simulate reading aloud
|
|
77
|
+
- Flag anything that sounds stilted, overly formal, or robotically even
|
|
78
|
+
- Rewrite flagged sentences to sound like a thoughtful analyst explaining to a colleague
|
|
79
|
+
- Log: "Rewrote X sentences for natural voice"
|
|
80
|
+
|
|
81
|
+
#### Pass 9: Voice Injection
|
|
82
|
+
Add 2-3 human touches across the Summary Report sections:
|
|
83
|
+
- Brief asides showing evaluator judgment ("This is a stronger evidence base than most organizations in this space provide.")
|
|
84
|
+
- Concrete contextualization ("To put this in perspective, the WHO considers X to be the threshold for Y.")
|
|
85
|
+
- Honest assessments where evidence is ambiguous ("The data here is suggestive but not conclusive.")
|
|
86
|
+
- Do NOT overdo this. 2-3 per report max. They should feel like a thoughtful analyst's observations, not a personality transplant.
|
|
87
|
+
- Log each injection with location and what was added
|
|
88
|
+
|
|
89
|
+
### 3. Preserve Report Structure
|
|
90
|
+
|
|
91
|
+
After all passes, verify you did NOT change:
|
|
92
|
+
- [ ] Any markdown heading (##, ###)
|
|
93
|
+
- [ ] Any table structure or table data
|
|
94
|
+
- [ ] The scored checklist items or their checked/unchecked status
|
|
95
|
+
- [ ] The score (X/100)
|
|
96
|
+
- [ ] Citation URLs or citation text inside `[brackets](links)`
|
|
97
|
+
- [ ] The SOURCES section
|
|
98
|
+
- [ ] Section separators (`---`)
|
|
99
|
+
|
|
100
|
+
### 4. Produce Output
|
|
101
|
+
|
|
102
|
+
Keep the humanized report in memory. This is the final version that will be submitted via the `submit_report` tool.
|
|
103
|
+
|
|
104
|
+
Start the output with a change log:
|
|
105
|
+
|
|
106
|
+
```markdown
|
|
107
|
+
<!-- Humanized: {{ORG_NAME}} | Date: [date] -->
|
|
108
|
+
|
|
109
|
+
# Humanization Log
|
|
110
|
+
|
|
111
|
+
## Changes by Pass
|
|
112
|
+
- **Em dashes:** Removed [X] instances
|
|
113
|
+
- **Sentence rhythm:** Varied in [X] sections
|
|
114
|
+
- **Paragraph cadence:** Restructured [X] paragraphs
|
|
115
|
+
- **Opening diversity:** Fixed [X] locations
|
|
116
|
+
- **AI patterns:** Found and fixed: [list each pattern]
|
|
117
|
+
- **Parallelism:** Broke in [X] lists/sections
|
|
118
|
+
- **Filler adjectives:** Removed [X] ([list them])
|
|
119
|
+
- **Read-aloud fixes:** Rewrote [X] sentences
|
|
120
|
+
- **Voice injections:** Added [X] ([brief description of each])
|
|
121
|
+
|
|
122
|
+
## Structure Verification
|
|
123
|
+
- [ ] Headings unchanged
|
|
124
|
+
- [ ] Tables unchanged
|
|
125
|
+
- [ ] Checklist and score unchanged
|
|
126
|
+
- [ ] Citations unchanged
|
|
127
|
+
- [ ] Sources section unchanged
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
[Full humanized report below]
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## Quality Checks
|
|
135
|
+
|
|
136
|
+
Before writing the output:
|
|
137
|
+
- [ ] Zero em dashes remain in the content
|
|
138
|
+
- [ ] No two consecutive paragraphs start with the same word
|
|
139
|
+
- [ ] No AI pattern from the tells list remains
|
|
140
|
+
- [ ] At least 2 voice injections added (but no more than 3)
|
|
141
|
+
- [ ] Report structure is identical to the input
|
|
142
|
+
- [ ] Content reads like a human analyst wrote it
|
|
143
|
+
- [ ] The change log accurately reflects all changes made
|
package/pipeline/_archive/2026-05-07-pre-todd-v2/04-peer-review-PROMPT.md
ADDED
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
# Peer Review — Instructions
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher — you are a second pair of eyes.
|
|
6
|
+
|
|
7
|
+
## Instructions
|
|
8
|
+
|
|
9
|
+
### 1. Check the Automated Fact-Check Results First
|
|
10
|
+
|
|
11
|
+
If automated fact-check results are included above the report, read them before diving into the report itself. Focus on:
|
|
12
|
+
- **Red flags** — these are specific problems the automated system detected (unsupported claims, dead links, self-reported data issues)
|
|
13
|
+
- **Fact support rate** — below 70% means many claims aren't backed by their cited sources
|
|
14
|
+
- **Avg trust score** — below 50% means citations are low-quality (self-reported, blog posts, dead links)
|
|
15
|
+
|
|
16
|
+
Use these results to target your spot-checks. If the automated system flagged specific unsupported claims, verify those first.
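The two thresholds above can be turned into a quick triage step. A Python sketch under stated assumptions: both rates arrive as fractions between 0 and 1, and red flags arrive as a list of strings. The `triage` function and its parameter names are hypothetical; the format of the automated fact-check output is not shown in this prompt.

```python
def triage(fact_support_rate, avg_trust_score, red_flags=()):
    """Return the reasons a report needs closer spot-checking, specific red flags first."""
    reasons = [f"red flag: {flag}" for flag in red_flags]
    if fact_support_rate < 0.70:
        reasons.append("fact support rate below 70%")
    if avg_trust_score < 0.50:
        reasons.append("avg trust score below 50%")
    return reasons
```

Red flags are listed first because the prompt asks the reviewer to verify specifically flagged claims before anything else.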
|
|
17
|
+
|
|
18
|
+
### 2. Read the Full Report
|
|
19
|
+
|
|
20
|
+
Read the entire report. Note the org name, the scored checklist, and the overall recommendation.
|
|
21
|
+
|
|
22
|
+
### 3. Spot-Check Citations (3-5)
|
|
23
|
+
|
|
24
|
+
Pick 3-5 citation URLs from the report (prioritize any flagged by the automated fact-check). For each:
|
|
25
|
+
- Visit the URL using web fetch
|
|
26
|
+
- Verify the page exists (not 404)
|
|
27
|
+
- Check that the source says what the report claims
|
|
28
|
+
- If a citation is wrong, search for the correct source. If the claim can't be sourced anywhere, remove it.
+
+### 4. Check Report Structure
+
+Verify:
+- [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
+- [ ] All 7 summary sections present (Sections 1-7)
+- [ ] SOURCES section exists with citations
+- [ ] Every factual claim has its own inline citation `[Source Name](URL)`
+- [ ] No claims cited to general overview pages when a specific report or data page exists
+
+### 5. Evaluate Scoring
+
+The scored checklist uses these weights. Verify the math and the evidence:
+
+Base score (out of 100):
+- a. Has Ultimate Outcome Goals (50 pts)
+- b. Measures Intermediate Outcomes (10 pts)
+- c. Measures Ultimate Outcomes (15 pts)
+- d. Shows Continual Learning & Adaptation (25 pts)
+
+Extra credit:
+- e. Measures Intermediate Counterfactual (10 pts)
+- f. Measures Ultimate Counterfactual (10 pts)
+
+**Score: X/100** (can exceed 100 with extra credit, max 120)
+
+Check:
+- Are checked items supported by evidence in the report?
+- Are unchecked items correctly unchecked (no evidence was found)?
+- Does the score math add up?
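
The "does the score math add up" check in step 5 is simple arithmetic over the listed weights. A minimal sketch follows, assuming the checked items are represented by their letters `a` to `f`; the function names are hypothetical and only the point values come from the prompt.

```javascript
// Weights as listed in the prompt.
const BASE_ITEMS = { a: 50, b: 10, c: 15, d: 25 }; // sums to 100
const EXTRA_CREDIT_ITEMS = { e: 10, f: 10 };       // up to 20 more, so max 120

// `checked` is an array of item letters, e.g. ['a', 'c', 'e'].
function computeScore(checked) {
  const sum = (weights) =>
    Object.entries(weights).reduce(
      (total, [item, points]) => total + (checked.includes(item) ? points : 0),
      0,
    );
  const base = sum(BASE_ITEMS);
  const extra = sum(EXTRA_CREDIT_ITEMS);
  return { base, extra, total: base + extra };
}

// True when the total stated in the report matches the checked items.
function scoreMathAddsUp(checked, reportedTotal) {
  return computeScore(checked).total === reportedTotal;
}
```

For example, a report that checks a, c, and e should state 75 (50 + 15 + 10); any other stated total is a math error that a score-3 review can fix.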
+
+### 6. Look for Red Flags
+
+- Suspiciously specific numbers with no citation
+- Studies or evaluations that seem fabricated
+- Copy-pasted content or generic filler
+- Sections that are empty or trivially short
+- Claims that contradict other parts of the report
+- Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
+
+### 7. Assign a Score
+
+| Score | When to use |
+|-------|------------|
+| **4 — Great** | Report is thorough, citations check out, scoring is correct. No changes needed. |
+| **3 — Good with fixes** | Minor issues you can fix: broken citation, wrong score math, awkward phrasing, a checklist item that should be toggled, misattributed citation. **Fix the issues yourself** and submit the corrected report. |
+| **2 — Needs redo** | Major problems: thin evidence across multiple sections, significant hallucinations, missing sections, fundamentally wrong scoring. Not fixable with minor edits. |
+| **1 — Bad actor** | Garbage: copy-pasted nonsense, completely fabricated data, obvious gaming attempt. This flags the original author. Use sparingly and only when clearly warranted. |
+
+### 8. Submit Your Review
+
+Use `submit_peer_review` with:
+- `claim_id`: The claim ID shown above
+- `score`: Your score (1-4)
+- `notes`: Brief explanation of your score. Mention which citations you checked and what you found.
+- `updated_report`: If score is 3, include the full fixed report
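
Before calling `submit_peer_review`, the arguments listed in step 8 could be sanity-checked along these lines. This is a sketch under assumptions: `validatePeerReview` is a hypothetical helper, not part of the package, and the rule that a score of 3 needs `updated_report` reflects the prompt's instruction rather than anything the tool is known to enforce.

```javascript
// Hypothetical pre-submit check for the submit_peer_review arguments.
function validatePeerReview({ claim_id, score, notes, updated_report }) {
  const errors = [];
  if (typeof claim_id !== 'string') {
    errors.push('claim_id must be a string');
  }
  // The score scale has four whole-number levels: 1, 2, 3, 4.
  if (!Number.isInteger(score) || score < 1 || score > 4) {
    errors.push('score must be an integer from 1 to 4');
  }
  if (notes !== undefined && typeof notes !== 'string') {
    errors.push('notes must be a string when provided');
  }
  // Per the prompt, a score of 3 means the reviewer fixed the report.
  if (score === 3 && !updated_report) {
    errors.push('score 3 should include the full fixed report in updated_report');
  }
  return errors;
}
```

An empty array means the arguments are consistent with the instructions above; a fractional score such as 3.5 is rejected because the scale only defines the four listed levels.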
+
+## Important Rules
+
+- Be fair. Most reports should score 3 or 4.
+- Score 2 is for genuinely bad reports, not minor style preferences.
+- Score 1 is for abuse. If you're unsure, use 2 instead.
+- If you spot-check a citation and it's broken, that alone is a 3 (fix it), not a 2.
+- Don't rewrite the report to match your style. Fix factual errors, not opinions.
+- If the automated fact-check flagged issues, verify them. If the flags are correct, fix the citations (score 3) or flag the report (score 2) depending on severity.
@@ -0,0 +1,11 @@
+# Archive: Pre-Todd-v2 Prompts
+
+**Archived:** 2026-05-07
+**Reason:** Updating methodology to incorporate Todd Manwaring's revised training & prompts (PDF: "2026 04 25 Training & Prompt.pdf")
+**Restore:** Copy any file back to its corresponding `pipeline/0N-*/PROMPT.md` location
+
+## Files
+- `01-research-PROMPT.md` — Main research prompt (prompts 1-5 + summary)
+- `02-verify-PROMPT.md` — Citation re-verification
+- `03-humanize-PROMPT.md` — AI pattern removal
+- `04-peer-review-PROMPT.md` — Peer review scoring
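
The restore rule maps each archived file name onto a `pipeline/0N-*/PROMPT.md` path. A minimal sketch of that mapping is below; `restoreTarget` is a hypothetical helper, and it assumes each stage directory is named after the archived file's prefix (as `pipeline/01-research/` and `pipeline/04-peer-review/` are in this diff's file list).

```javascript
// Hypothetical helper: map an archived prompt file name to its restore path.
// '01-research-PROMPT.md' -> 'pipeline/01-research/PROMPT.md'
function restoreTarget(archivedName) {
  const match = /^(\d{2}-[a-z-]+)-PROMPT\.md$/.exec(archivedName);
  if (!match) return null; // not an archived prompt (e.g. the archive README)
  return `pipeline/${match[1]}/PROMPT.md`;
}
```

The returned path is relative to the package root; copying the file there is left to the caller.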
package/src/api-client.js
CHANGED
@@ -86,11 +86,4 @@ export class ApiClient {
     return this.request('GET', '/research/impact');
   }
 
-  async getNextAction() {
-    return this.request('GET', '/research/next-action');
-  }
-
-  async enableSchedule() {
-    return this.request('POST', '/research/enable-schedule');
-  }
 }
package/src/init.js
CHANGED
@@ -174,7 +174,16 @@ function statePath() { return homeRelative(join(homedir(), '.tokens-for-good
 
 function readJsonOrEmpty(path) {
   if (!existsSync(path)) return {};
-
+  const raw = readFileSync(path, 'utf-8');
+  try {
+    return JSON.parse(raw);
+  } catch {
+    throw new Error(
+      `${path} exists but is not valid JSON.\n` +
+      `Fix or delete the file, then re-run init.\n` +
+      `(Tip: paste it into https://jsonlint.com to find the syntax error.)`
+    );
+  }
 }
 
 function ensureDir(path) {
package/src/mcp-server.js
CHANGED
@@ -4,7 +4,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { z } from 'zod';
 import { ApiClient } from './api-client.js';
 import { detectPlatform, isSchedulable, getAutomationInstructions } from './platform.js';
-import { loadState, updateState, isSnoozed, hasContributedToday, markContributed, markSetupComplete } from './state.js';
+import { loadState, updateState, isSnoozed, snoozeDays, hasContributedToday, markContributed, markSetupComplete } from './state.js';
 import { readFileSync, existsSync } from 'fs';
 import { join, dirname } from 'path';
 import { fileURLToPath } from 'url';
@@ -144,7 +144,7 @@ server.tool('submit_report', 'Submit a completed research report for an org you
   if (!client) return { content: [{ type: 'text', text: 'Error: TFG_API_KEY not set.' }] };
 
   try {
-    const result = await client.submitReport(claim_id, report_markdown,
+    const result = await client.submitReport(claim_id, report_markdown, estimated_tokens, null, model_used, PKG_VERSION);
     markContributed();
 
     // One-off users: first successful submit completes their initial setup,
@@ -205,7 +205,7 @@ server.tool('get_peer_review', 'Get a draft report assigned to you for peer revi
 
 server.tool('submit_peer_review', 'Submit your peer review score for a report.', {
   claim_id: z.string().describe('The claim ID of the report being reviewed'),
-  score: z.number().min(1).max(4).describe('Score: 4=great, 3=good with fixes, 2=needs redo, 1=bad actor'),
+  score: z.number().int().min(1).max(4).describe('Score: 4=great, 3=good with fixes, 2=needs redo, 1=bad actor'),
   notes: z.string().optional().describe('Review notes explaining the score'),
   updated_report: z.string().optional().describe('If score is 3, the fixed version of the report'),
 }, async ({ claim_id, score, notes, updated_report }) => {
@@ -270,6 +270,13 @@ server.tool('mark_setup_complete', 'Called by the /tfg-schedule skill after /sch
   return { content: [{ type: 'text', text: 'Marked setup complete. The SessionStart hook will go silent from the next session.' }] };
 });
 
+server.tool('snooze', 'Snooze Tokens for Good reminders. Call this when the user says to remind them tomorrow, next week, or in N days.', {
+  days: z.number().int().min(1).max(365).describe('Days to snooze (1 = tomorrow, 7 = next week)'),
+}, async ({ days }) => {
+  snoozeDays(days);
+  return { content: [{ type: 'text', text: `Got it — Tokens for Good will stay quiet for ${days} day${days === 1 ? '' : 's'}.` }] };
+});
+
 // --- Prompts (session start) ---
 
 server.prompt('session_start', 'Check if you should research an org or complete a peer review', {}, async () => {
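
The new `snooze` tool delegates to `snoozeDays`, and the session-start prompt consults `isSnoozed`; neither implementation appears in this diff. Purely as an illustration of how a day-based snooze might be stored and checked, here is a sketch with hypothetical names (`snoozeUntil`, `isSnoozedAt`, and a `snoozedUntil` state field). It should not be read as the package's actual `state.js` logic.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical: compute the ISO timestamp at which reminders resume.
function snoozeUntil(now, days) {
  return new Date(now.getTime() + days * MS_PER_DAY).toISOString();
}

// Hypothetical: true while the stored timestamp is still in the future.
// `state.snoozedUntil` is an assumed field name.
function isSnoozedAt(state, now) {
  return Boolean(state.snoozedUntil) && new Date(state.snoozedUntil) > now;
}
```

Passing `now` explicitly keeps both functions deterministic, which makes the snooze window easy to test without touching the system clock.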
@@ -286,18 +293,16 @@ server.prompt('session_start', 'Check if you should research an org or complete
   const state = loadState();
 
   // Check for pending peer review first
-
-
-
-
-
-
-
-
-
-
-      // No pending review, continue
-    }
+  try {
+    await client.getNextPeerReview();
+    return {
+      messages: [{
+        role: 'user',
+        content: { type: 'text', text: `You have a pending peer review to complete before you can claim a new org. Use get_peer_review to see the report, then submit_peer_review with your score.` },
+      }],
+    };
+  } catch {
+    // No pending review, continue
   }
 
   if (isSnoozed()) {
|