tokens-for-good 0.3.6 → 0.3.8

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "tokens-for-good",
-   "version": "0.3.6",
+   "version": "0.3.8",
    "type": "module",
    "description": "Donate your spare AI tokens to research nonprofits for Fierce Philanthropy",
    "bin": {
@@ -66,6 +66,7 @@ Keep the table with all columns. For each of the 20 negative consequences, add a
  - When Yes or Partial: include SPECIFIC data (percentages, sample sizes, time periods, study names)
  - Only direct results from this organization, not from other orgs or modeling
  - **CITATION RULES (critical):** Every data point MUST have its own inline citation `[Source Name](URL)`. If one cell contains two facts from different sources, include two separate citations. Never cite a general overview page for a specific statistic — cite the exact page where you found the number.
+ - **VERIFY INLINE:** After writing each cell, re-read the source you cited and confirm the exact numbers match. If the source says 75% and you wrote 59%, fix it before moving on. Do not proceed to the next row until the current row's numbers are confirmed against the cited page.
 
  #### PROMPT 5 — Counterfactual Results
 
@@ -74,7 +75,7 @@ Keep the table with ALL previous columns. For each of the 20 negative consequenc
  - Start each cell with "Yes.", "Partial.", or "No counterfactual results."
  - Describe study design (RCT, quasi-experimental, matched comparison), sample sizes, what the control/comparison group showed
  - Counterfactual = comparison to what would have happened without the intervention. Before/after alone does not count.
- - **Same citation rules as Prompt 4:** every data point gets its own inline citation to the specific page.
+ - **Same citation and verify-inline rules as Prompt 4:** every data point gets its own inline citation, and confirm numbers match the source before moving to the next row.
 
  #### SUMMARY REPORT
 
@@ -134,7 +135,7 @@ These rules are critical for report quality. Poorly attributed citations are the
 
  3. **If you can't find a URL for a claim, don't include the claim.** No unsourced facts. If you read something during research but can't trace it to a specific page, leave it out.
 
- 4. **Verify before citing.** After writing a claim with a citation, confirm the cited page actually contains that information. If it doesn't, find the correct source or remove the claim.
+ 4. **Verify numbers match the source exactly.** After writing a claim with a number (percentage, dollar amount, count), re-read the cited page and confirm the exact figure appears there. Common errors: writing 59% when the source says 75%, writing 4,000 when the source says 1,651, or writing 20% when the source says 25%. If your number doesn't match, use the source's number or remove the claim.
 
  5. **Attribution matters.** Say "X reports that" when citing an org's own claims. Say "independent evaluation found" when citing third-party evidence. The distinction is load-bearing.
 
@@ -152,7 +153,7 @@ Run these checks before submitting. They are not optional.
 
  **Citations:**
  - [ ] Every factual claim has its own inline citation
- - [ ] Spot-check at least 5 citations: visit the URL and confirm the page says what you claim
+ - [ ] Spot-check at least 5 citations: visit the URL and confirm the EXACT numbers on the page match what you wrote. If the source says 132% and you wrote 136%, fix it.
  - [ ] For any citation where the page doesn't support your claim, find the correct source or remove the claim
  - [ ] No claims are cited to general overview pages when a specific report or data page exists
 
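
The "verify numbers match the source exactly" rule added in these hunks can be approximated mechanically. The sketch below is illustrative only and is not part of the package: `findUnmatchedFigures` and `escapeRegExp` are hypothetical helpers, and the check assumes the cited page has already been fetched as plain text. It reports every figure in a claim that does not appear verbatim in that text.

```javascript
// Hypothetical helpers, not part of tokens-for-good.

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the figures in `claim` (e.g. "59%", "4,000", "$1.5") that do not
// appear as standalone figures in `sourceText`.
function findUnmatchedFigures(claim, sourceText) {
  const figures = claim.match(/\$?\d+(?:,\d{3})*(?:\.\d+)?%?/g) ?? [];
  return figures.filter((figure) => {
    // Guard both sides so "20%" is not treated as found inside "120%",
    // and "4" is not treated as found inside "4,000" or "4.5".
    const exact = new RegExp(
      '(?<![\\d.,$])' + escapeRegExp(figure) + '(?![\\d%]|[.,]\\d)'
    );
    return !exact.test(sourceText);
  });
}

const unmatched = findUnmatchedFigures(
  'Retention rose to 59% among 4,000 students',
  'Retention rose to 75% among 1,651 students'
);
console.log(unmatched);
```

A check like this only catches figures that are absent from the page. It cannot tell whether a figure that is present refers to the same quantity, so it supplements re-reading the source rather than replacing it.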
@@ -1,67 +1,86 @@
- # Step 4: Peer Review -- Claude Code Instructions
-
- ## Inputs
-
- - **Report to review:** Provided by the `get_peer_review` MCP tool
- - **Research guidance:** The same methodology from step 1
- - **Writing style guide:** The same decontamination rules from step 3
+ # Peer Review Instructions
 
  ## Purpose
 
- You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher -- you are a second pair of eyes.
+ You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher you are a second pair of eyes.
 
  ## Instructions
 
- ### 1. Read the Full Report
+ ### 1. Check the Automated Fact-Check Results First
+
+ If automated fact-check results are included above the report, read them before diving into the report itself. Focus on:
+ - **Red flags** — these are specific problems the automated system detected (unsupported claims, dead links, self-reported data issues)
+ - **Fact support rate** — below 70% means many claims aren't backed by their cited sources
+ - **Avg trust score** — below 50% means citations are low-quality (self-reported, blog posts, dead links)
+
+ Use these results to target your spot-checks. If the automated system flagged specific unsupported claims, verify those first.
+
+ ### 2. Read the Full Report
 
- Read the entire report carefully. Note the org name, the scored checklist, and the overall recommendation.
+ Read the entire report. Note the org name, the scored checklist, and the overall recommendation.
 
- ### 2. Spot-Check Citations (3-5)
+ ### 3. Spot-Check Citations (3-5)
 
- Pick 3-5 citation URLs from the report. For each:
+ Pick 3-5 citation URLs from the report (prioritize any flagged by the automated fact-check). For each:
  - Visit the URL using web fetch
  - Verify the page exists (not 404)
  - Check that the source says what the report claims
+ - If a citation is wrong, search for the correct source. If the claim can't be sourced anywhere, remove it.
 
- ### 3. Check Report Structure
+ ### 4. Check Report Structure
 
  Verify:
- - [ ] All 5 prompt sections present (PROMPT 1-5)
+ - [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
  - [ ] All 7 summary sections present (Sections 1-7)
  - [ ] SOURCES section exists with citations
- - [ ] Tables in Prompts 2-5 have content
- - [ ] Scored checklist is present with score calculated correctly
+ - [ ] Every factual claim has its own inline citation `[Source Name](URL)`
+ - [ ] No claims cited to general overview pages when a specific report or data page exists
+
+ ### 5. Evaluate Scoring
+
+ The scored checklist uses these weights. Verify the math and the evidence:
+
+ Base score (out of 100):
+ - a. Has Ultimate Outcome Goals (50 pts)
+ - b. Measures Intermediate Outcomes (10 pts)
+ - c. Measures Ultimate Outcomes (15 pts)
+ - d. Shows Continual Learning & Adaptation (25 pts)
+
+ Extra credit:
+ - e. Measures Intermediate Counterfactual (10 pts)
+ - f. Measures Ultimate Counterfactual (10 pts)
 
- ### 4. Evaluate Scoring
+ **Score: X/100** (can exceed 100 with extra credit, max 120)
 
- Compare the checklist against the evidence:
+ Check:
  - Are checked items supported by evidence in the report?
  - Are unchecked items correctly unchecked (no evidence was found)?
- - Does the score math add up (checked items x weights = stated score)?
+ - Does the score math add up?
 
- ### 5. Look for Red Flags
+ ### 6. Look for Red Flags
 
  - Suspiciously specific numbers with no citation
  - Studies or evaluations that seem fabricated
  - Copy-pasted content or generic filler
  - Sections that are empty or trivially short
  - Claims that contradict other parts of the report
+ - Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
 
- ### 6. Assign a Score
+ ### 7. Assign a Score
 
  | Score | When to use |
  |-------|------------|
- | **4 -- Great** | Report is thorough, citations check out, scoring is correct. No changes needed. |
- | **3 -- Good with fixes** | Minor issues you can fix: broken citation, wrong score math, awkward phrasing, a checklist item that should be toggled. **Fix the issues yourself** and submit the corrected report. |
- | **2 -- Needs redo** | Major problems: thin evidence across multiple sections, significant hallucinations, missing sections, fundamentally wrong scoring. Not fixable with minor edits. |
- | **1 -- Bad actor** | Garbage: copy-pasted nonsense, completely fabricated data, obvious gaming attempt. This flags the original author. Use sparingly and only when clearly warranted. |
+ | **4 Great** | Report is thorough, citations check out, scoring is correct. No changes needed. |
+ | **3 Good with fixes** | Minor issues you can fix: broken citation, wrong score math, awkward phrasing, a checklist item that should be toggled, misattributed citation. **Fix the issues yourself** and submit the corrected report. |
+ | **2 Needs redo** | Major problems: thin evidence across multiple sections, significant hallucinations, missing sections, fundamentally wrong scoring. Not fixable with minor edits. |
+ | **1 Bad actor** | Garbage: copy-pasted nonsense, completely fabricated data, obvious gaming attempt. This flags the original author. Use sparingly and only when clearly warranted. |
 
- ### 7. Submit Your Review
+ ### 8. Submit Your Review
 
  Use `submit_peer_review` with:
- - `claim_id`: The claim ID from `get_peer_review`
+ - `claim_id`: The claim ID shown above
  - `score`: Your score (1-4)
- - `notes`: Brief explanation of your score
+ - `notes`: Brief explanation of your score. Mention which citations you checked and what you found.
  - `updated_report`: If score is 3, include the full fixed report
 
  ## Important Rules
@@ -71,3 +90,4 @@ Use `submit_peer_review` with:
  - Score 1 is for abuse. If you're unsure, use 2 instead.
  - If you spot-check a citation and it's broken, that alone is a 3 (fix it), not a 2.
  - Don't rewrite the report to match your style. Fix factual errors, not opinions.
+ - If the automated fact-check flagged issues, verify them. If the flags are correct, fix the citations (score 3) or flag the report (score 2) depending on severity.
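
The new "Evaluate Scoring" step asks the reviewer to verify the score math against the listed weights (base items a to d totalling 100, extra credit e and f worth 10 each, maximum 120). A minimal sketch of that arithmetic follows. The weights come from the instructions above; the function names and the input shape (an array of checked item letters) are assumptions made for illustration, not the package's API.

```javascript
// Weights copied from the peer review instructions. The Map keys are the
// checklist letters; everything else here is a hypothetical illustration.
const CHECKLIST_WEIGHTS = new Map([
  ['a', 50], // Has Ultimate Outcome Goals
  ['b', 10], // Measures Intermediate Outcomes
  ['c', 15], // Measures Ultimate Outcomes
  ['d', 25], // Shows Continual Learning & Adaptation
  ['e', 10], // Extra credit: Measures Intermediate Counterfactual
  ['f', 10], // Extra credit: Measures Ultimate Counterfactual
]);

// Sum the weights of the checked items, counting each item at most once.
function computeChecklistScore(checkedItems) {
  let total = 0;
  for (const item of new Set(checkedItems)) {
    if (!CHECKLIST_WEIGHTS.has(item)) {
      throw new Error(`Unknown checklist item: ${item}`);
    }
    total += CHECKLIST_WEIGHTS.get(item);
  }
  return total;
}

// True when the score stated in the report matches the checked items.
function scoreMathAddsUp(checkedItems, statedScore) {
  return computeChecklistScore(checkedItems) === statedScore;
}

console.log(computeChecklistScore(['a', 'd', 'e']));
console.log(scoreMathAddsUp(['a', 'b', 'c', 'd', 'e', 'f'], 120));
```

Under the rubric above, a mismatch found this way is "wrong score math", which is listed as a fixable issue (score 3) rather than grounds for a redo.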
package/src/mcp-server.js CHANGED
@@ -160,7 +160,7 @@ server.tool('get_methodology', 'Get the full research methodology, verification
  });
 
  server.tool('submit_report', 'Submit a completed research report for an org you claimed. You MUST include estimated_tokens.', {
-   claim_id: z.number().describe('The claim ID from claim_org'),
+   claim_id: z.string().describe('The claim ID from claim_org'),
    report_markdown: z.string().describe('The full research report in markdown'),
    estimated_tokens: z.number().describe('Estimated total tokens used: count web searches (~1K each), web fetches (~2-5K each), report output (~4 tokens/word), plus ~10K overhead'),
    model_used: z.string().optional().describe('The model that generated this report'),
@@ -189,9 +189,25 @@ server.tool('get_peer_review', 'Get a draft report assigned to you for peer revi
    } catch {
      peerMethodology = 'Score 1-4: 4=Great, 3=Good with fixes (submit corrected version), 2=Needs redo, 1=Bad actor.';
    }
-   const factCheckNote = result.automated_review
-     ? `\n\nAutomated Fact-Check: ${result.automated_review.status}${result.automated_review.summary ? ` — Quality: ${result.automated_review.summary.overall_quality}, Fact support: ${Math.round(result.automated_review.summary.fact_support_rate * 100)}%, Avg trust: ${Math.round(result.automated_review.summary.avg_trust_score * 100)}%` : ''}`
-     : '';
+   let factCheckNote = '';
+   if (result.automated_review?.summary) {
+     const s = result.automated_review.summary;
+     const lines = [
+       `\n\n## Automated Fact-Check Results`,
+       `Quality: ${s.overall_quality} | Fact support: ${Math.round(s.fact_support_rate * 100)}% | Avg trust: ${Math.round(s.avg_trust_score * 100)}%`,
+       `Facts checked: ${result.automated_review.facts_checked}/${result.automated_review.facts_extracted} | Citations rated: ${result.automated_review.citations_rated}`,
+     ];
+     if (s.red_flags?.length > 0) {
+       lines.push(`\nRed flags:\n${s.red_flags.map(f => ` - ${f}`).join('\n')}`);
+     }
+     if (s.strengths?.length > 0) {
+       lines.push(`\nStrengths:\n${s.strengths.map(f => ` - ${f}`).join('\n')}`);
+     }
+     lines.push(`\nUse these results to focus your spot-checks on flagged areas.`);
+     factCheckNote = lines.join('\n');
+   } else if (result.automated_review) {
+     factCheckNote = `\n\nAutomated Fact-Check: ${result.automated_review.status} (no summary available yet)`;
+   }
    return {
      content: [{ type: 'text', text: `Peer review assigned:\nOrg: ${result.org.name}\nAuthor: ${result.author}\nClaim ID: ${result.claim_id}${factCheckNote}\n\n---\n\n${peerMethodology}\n\n---\n\n${result.report_markdown}\n\n---\n\nUse submit_peer_review with your score and notes.` }],
    };
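
The summary fields rendered by this hunk line up with the thresholds in the peer review instructions: a fact support rate below 70% and an average trust score below 50% are treated as warning signs. The sketch below shows one way a reviewer-side script might apply them. `triageFactCheck` is a hypothetical helper, and the assumption that both rates are fractions between 0 and 1 is inferred from the `* 100` in the code above rather than confirmed by any schema.

```javascript
// Hypothetical helper, not part of the package. Assumes `summary` has the
// shape used in the hunk above: fact_support_rate and avg_trust_score as
// fractions in [0, 1], plus an optional red_flags array of strings.
function triageFactCheck(summary) {
  const warnings = [];
  if (summary.fact_support_rate < 0.7) {
    warnings.push('Fact support below 70%: many claims may not be backed by their cited sources');
  }
  if (summary.avg_trust_score < 0.5) {
    warnings.push('Avg trust below 50%: citations may be low-quality');
  }
  // Flagged items are the ones the instructions say to verify first.
  const verifyFirst = summary.red_flags ?? [];
  return { warnings, verifyFirst };
}

const triage = triageFactCheck({
  fact_support_rate: 0.65,
  avg_trust_score: 0.8,
  red_flags: ['Dead link in Prompt 4'],
});
console.log(triage.warnings.length, triage.verifyFirst);
```

The comparisons are strict, so a report sitting exactly at 70% support or 50% trust raises no warning, matching the "below" wording in the instructions.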