tokens-for-good 0.3.6 → 0.3.7
- package/package.json +1 -1
- package/pipeline/04-peer-review/PROMPT.md +48 -28
- package/src/mcp-server.js +20 -4
package/package.json
CHANGED

package/pipeline/04-peer-review/PROMPT.md
CHANGED
```diff
@@ -1,67 +1,86 @@
-#
-
-## Inputs
-
-- **Report to review:** Provided by the `get_peer_review` MCP tool
-- **Research guidance:** The same methodology from step 1
-- **Writing style guide:** The same decontamination rules from step 3
+# Peer Review — Instructions
 
 ## Purpose
 
-You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher
+You are reviewing another contributor's research report. Your job is to verify quality and catch problems before a human reviewer sees it. You are NOT the original researcher — you are a second pair of eyes.
 
 ## Instructions
 
-### 1.
+### 1. Check the Automated Fact-Check Results First
+
+If automated fact-check results are included above the report, read them before diving into the report itself. Focus on:
+- **Red flags** — these are specific problems the automated system detected (unsupported claims, dead links, self-reported data issues)
+- **Fact support rate** — below 70% means many claims aren't backed by their cited sources
+- **Avg trust score** — below 50% means citations are low-quality (self-reported, blog posts, dead links)
+
+Use these results to target your spot-checks. If the automated system flagged specific unsupported claims, verify those first.
+
```
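The thresholds added in step 1 are simple enough to apply mechanically. The following is a hypothetical helper, not part of tokens-for-good. It assumes the summary fields that `mcp-server.js` reads: `fact_support_rate` and `avg_trust_score` as fractions from 0 to 1, and `red_flags` as an array of strings.

```javascript
// Hypothetical helper (not part of tokens-for-good): turns an automated
// fact-check summary into reviewer warnings using the step 1 thresholds.
// Assumes rates are fractions in [0, 1], as implied by the
// `Math.round(rate * 100)` formatting in mcp-server.js.
const FACT_SUPPORT_MIN = 0.7; // below 70%: many claims not backed by their sources
const AVG_TRUST_MIN = 0.5;    // below 50%: citations are low-quality

function triageFactCheck(summary) {
  const warnings = [];
  if (typeof summary.fact_support_rate === 'number' && summary.fact_support_rate < FACT_SUPPORT_MIN) {
    warnings.push(`Low fact support (${Math.round(summary.fact_support_rate * 100)}%)`);
  }
  if (typeof summary.avg_trust_score === 'number' && summary.avg_trust_score < AVG_TRUST_MIN) {
    warnings.push(`Low average trust (${Math.round(summary.avg_trust_score * 100)}%)`);
  }
  // Each red flag is a concrete problem to verify first during spot-checks.
  for (const flag of summary.red_flags ?? []) {
    warnings.push(`Red flag: ${flag}`);
  }
  return warnings;
}
```

For example, with illustrative values `{ fact_support_rate: 0.62, avg_trust_score: 0.55, red_flags: ['Dead link in SOURCES'] }`, the helper returns two warnings. The first is the low fact support, and the second is the red flag. The trust score stays above the 50% line, so it produces no warning.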
```diff
+### 2. Read the Full Report
 
-Read the entire report
+Read the entire report. Note the org name, the scored checklist, and the overall recommendation.
 
-###
+### 3. Spot-Check Citations (3-5)
 
-Pick 3-5 citation URLs from the report. For each:
+Pick 3-5 citation URLs from the report (prioritize any flagged by the automated fact-check). For each:
 - Visit the URL using web fetch
 - Verify the page exists (not 404)
 - Check that the source says what the report claims
+- If a citation is wrong, search for the correct source. If the claim can't be sourced anywhere, remove it.
 
```
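Step 3 asks reviewers to prioritize citations flagged by the automated fact-check. One way to build that shortlist is sketched below under two assumptions. First, citations use the inline `[Source Name](URL)` form from step 4. Second, a red flag names the offending URL somewhere in its text, since the red flag format is not specified in this diff. The function names are illustrative.

```javascript
// Hypothetical helpers: shortlist up to 5 citation URLs for spot-checking,
// flagged ones first. Assumes inline citations of the form [Source Name](URL).
// URLs that contain ')' are cut short by this simple pattern.
const CITATION_RE = /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g;

function extractCitations(markdown) {
  const seen = new Set();
  const citations = [];
  for (const match of markdown.matchAll(CITATION_RE)) {
    const url = match[2];
    if (!seen.has(url)) { // keep the first occurrence of each URL only
      seen.add(url);
      citations.push({ name: match[1], url });
    }
  }
  return citations;
}

function pickSpotChecks(markdown, redFlags = [], max = 5) {
  const citations = extractCitations(markdown);
  // Assumption: a citation counts as flagged if its URL appears verbatim in a red flag.
  const isFlagged = (c) => redFlags.some((flag) => flag.includes(c.url));
  const flagged = citations.filter(isFlagged);
  const others = citations.filter((c) => !isFlagged(c));
  return [...flagged, ...others].slice(0, max);
}
```

The shortlist only decides the order. Each URL still needs a manual fetch to confirm that it loads and that it supports the claim.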
```diff
-###
+### 4. Check Report Structure
 
 Verify:
-- [ ] All 5 prompt sections present (PROMPT 1-5)
+- [ ] All 5 prompt sections present (PROMPT 1-5) with 20 rows each
 - [ ] All 7 summary sections present (Sections 1-7)
 - [ ] SOURCES section exists with citations
-- [ ]
-- [ ]
+- [ ] Every factual claim has its own inline citation `[Source Name](URL)`
+- [ ] No claims cited to general overview pages when a specific report or data page exists
+
```
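Part of the step 4 checklist can be pre-screened with plain string matching. This hypothetical helper assumes that the report labels its parts literally as `PROMPT 1` to `PROMPT 5`, `Section 1` to `Section 7`, and `SOURCES`. It does not count the 20 rows per prompt or judge citation quality, so the manual checklist still applies.

```javascript
// Hypothetical pre-check for step 4: lists expected labels that never appear
// in the report. The label wording is an assumption about the report template.
function missingSections(markdown) {
  const expected = [
    ...[1, 2, 3, 4, 5].map((n) => `PROMPT ${n}`),
    ...[1, 2, 3, 4, 5, 6, 7].map((n) => `Section ${n}`),
    'SOURCES',
  ];
  return expected.filter((label) => !markdown.includes(label));
}
```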
```diff
+### 5. Evaluate Scoring
+
+The scored checklist uses these weights. Verify the math and the evidence:
+
+Base score (out of 100):
+- a. Has Ultimate Outcome Goals (50 pts)
+- b. Measures Intermediate Outcomes (10 pts)
+- c. Measures Ultimate Outcomes (15 pts)
+- d. Shows Continual Learning & Adaptation (25 pts)
+
+Extra credit:
+- e. Measures Intermediate Counterfactual (10 pts)
+- f. Measures Ultimate Counterfactual (10 pts)
 
-
+**Score: X/100** (can exceed 100 with extra credit, max 120)
 
-
+Check:
 - Are checked items supported by evidence in the report?
 - Are unchecked items correctly unchecked (no evidence was found)?
-- Does the score math add up
+- Does the score math add up?
 
```
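The step 5 weights make the score a plain sum over the checked items. Items a-d total 100, and e and f add 20 more, which gives the 120 maximum. Below is a minimal sketch for re-checking a report's arithmetic. Representing the checklist as an array of item letters is an assumption made for illustration.

```javascript
// Checklist weights from step 5: a-d form the base score (100),
// e and f are extra credit (up to 20 more, max 120).
const WEIGHTS = { a: 50, b: 10, c: 15, d: 25, e: 10, f: 10 };

// `checked` lists the letters of items marked as met, e.g. ['a', 'b', 'd'].
// Duplicates are counted once; unknown letters are rejected.
function computeScore(checked) {
  let score = 0;
  for (const item of new Set(checked)) {
    if (!Object.prototype.hasOwnProperty.call(WEIGHTS, item)) {
      throw new Error(`Unknown checklist item: ${item}`);
    }
    score += WEIGHTS[item];
  }
  return score;
}
```

For instance, `computeScore(['a', 'b', 'd'])` returns 85. A report with exactly those three items checked should therefore show **Score: 85/100**.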
```diff
-###
+### 6. Look for Red Flags
 
 - Suspiciously specific numbers with no citation
 - Studies or evaluations that seem fabricated
 - Copy-pasted content or generic filler
 - Sections that are empty or trivially short
 - Claims that contradict other parts of the report
+- Em dashes, filler adjectives (robust, comprehensive, innovative), AI transitions
 
-###
+### 7. Assign a Score
 
 | Score | When to use |
 |-------|------------|
-| **4
-| **3
-| **2
-| **1
+| **4 — Great** | Report is thorough, citations check out, scoring is correct. No changes needed. |
+| **3 — Good with fixes** | Minor issues you can fix: broken citation, wrong score math, awkward phrasing, a checklist item that should be toggled, misattributed citation. **Fix the issues yourself** and submit the corrected report. |
+| **2 — Needs redo** | Major problems: thin evidence across multiple sections, significant hallucinations, missing sections, fundamentally wrong scoring. Not fixable with minor edits. |
+| **1 — Bad actor** | Garbage: copy-pasted nonsense, completely fabricated data, obvious gaming attempt. This flags the original author. Use sparingly and only when clearly warranted. |
 
-###
+### 8. Submit Your Review
 
 Use `submit_peer_review` with:
-- `claim_id`: The claim ID
+- `claim_id`: The claim ID shown above
 - `score`: Your score (1-4)
-- `notes`: Brief explanation of your score
+- `notes`: Brief explanation of your score. Mention which citations you checked and what you found.
 - `updated_report`: If score is 3, include the full fixed report
 
 ## Important Rules
@@ -71,3 +90,4 @@ Use `submit_peer_review` with:
 - Score 1 is for abuse. If you're unsure, use 2 instead.
 - If you spot-check a citation and it's broken, that alone is a 3 (fix it), not a 2.
 - Don't rewrite the report to match your style. Fix factual errors, not opinions.
+- If the automated fact-check flagged issues, verify them. If the flags are correct, fix the citations (score 3) or flag the report (score 2) depending on severity.
```
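Steps 7 and 8 imply a few constraints on the `submit_peer_review` arguments. The server-side schema for that tool is not part of this diff, so the following is only a hypothetical client-side check. Treating `claim_id` as a non-empty string mirrors the `submit_report` schema and is an assumption here.

```javascript
// Hypothetical pre-submission check for submit_peer_review arguments.
// Encodes the prompt's rules: score is an integer from 1 to 4, notes explain
// the score, and a score of 3 must include the full corrected report.
function validatePeerReview({ claim_id, score, notes, updated_report }) {
  const errors = [];
  if (typeof claim_id !== 'string' || claim_id.length === 0) {
    errors.push('claim_id must be the claim ID shown in the assignment');
  }
  if (!Number.isInteger(score) || score < 1 || score > 4) {
    errors.push('score must be an integer from 1 to 4');
  }
  if (typeof notes !== 'string' || notes.trim().length === 0) {
    errors.push('notes should explain the score and list the citations checked');
  }
  if (score === 3 && (typeof updated_report !== 'string' || updated_report.trim().length === 0)) {
    errors.push('a score of 3 requires updated_report with the full fixed report');
  }
  return errors;
}
```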
package/src/mcp-server.js
CHANGED
```diff
@@ -160,7 +160,7 @@ server.tool('get_methodology', 'Get the full research methodology, verification
 });
 
 server.tool('submit_report', 'Submit a completed research report for an org you claimed. You MUST include estimated_tokens.', {
-claim_id: z.
+claim_id: z.string().describe('The claim ID from claim_org'),
 report_markdown: z.string().describe('The full research report in markdown'),
 estimated_tokens: z.number().describe('Estimated total tokens used: count web searches (~1K each), web fetches (~2-5K each), report output (~4 tokens/word), plus ~10K overhead'),
 model_used: z.string().optional().describe('The model that generated this report'),
```
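The `estimated_tokens` description spells out a rough formula. A sketch of that arithmetic follows. The function name and the 3.5K default per fetch, which is the midpoint of the stated 2-5K range, are assumptions.

```javascript
// Rough token estimate following the estimated_tokens description:
// ~1K per web search, ~2-5K per web fetch, ~4 tokens per report word, ~10K overhead.
// tokensPerFetch defaults to 3500 (midpoint of 2-5K); this default is an assumption.
function estimateTokens({ searches, fetches, reportWords, tokensPerFetch = 3500 }) {
  const SEARCH_TOKENS = 1000;
  const TOKENS_PER_WORD = 4;
  const OVERHEAD = 10000;
  return searches * SEARCH_TOKENS
    + fetches * tokensPerFetch
    + reportWords * TOKENS_PER_WORD
    + OVERHEAD;
}
```

With 10 searches, 8 fetches, and a 3,000-word report, this returns 60,000. Using 2K or 5K per fetch instead gives a range of 48,000 to 72,000.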
```diff
@@ -189,9 +189,25 @@ server.tool('get_peer_review', 'Get a draft report assigned to you for peer revi
 } catch {
 peerMethodology = 'Score 1-4: 4=Great, 3=Good with fixes (submit corrected version), 2=Needs redo, 1=Bad actor.';
 }
-
-
-
+let factCheckNote = '';
+if (result.automated_review?.summary) {
+const s = result.automated_review.summary;
+const lines = [
+`\n\n## Automated Fact-Check Results`,
+`Quality: ${s.overall_quality} | Fact support: ${Math.round(s.fact_support_rate * 100)}% | Avg trust: ${Math.round(s.avg_trust_score * 100)}%`,
+`Facts checked: ${result.automated_review.facts_checked}/${result.automated_review.facts_extracted} | Citations rated: ${result.automated_review.citations_rated}`,
+];
+if (s.red_flags?.length > 0) {
+lines.push(`\nRed flags:\n${s.red_flags.map(f => ` - ${f}`).join('\n')}`);
+}
+if (s.strengths?.length > 0) {
+lines.push(`\nStrengths:\n${s.strengths.map(f => ` - ${f}`).join('\n')}`);
+}
+lines.push(`\nUse these results to focus your spot-checks on flagged areas.`);
+factCheckNote = lines.join('\n');
+} else if (result.automated_review) {
+factCheckNote = `\n\nAutomated Fact-Check: ${result.automated_review.status} (no summary available yet)`;
+}
 return {
 content: [{ type: 'text', text: `Peer review assigned:\nOrg: ${result.org.name}\nAuthor: ${result.author}\nClaim ID: ${result.claim_id}${factCheckNote}\n\n---\n\n${peerMethodology}\n\n---\n\n${result.report_markdown}\n\n---\n\nUse submit_peer_review with your score and notes.` }],
 };
```