npm - tokens-for-good - Versions diffs - 0.4.13 → 0.4.15 - Mend

tokens-for-good 0.4.13 → 0.4.15

Files changed (5)

package/package.json +1 -1
package/pipeline/01-research/PROMPT.md +24 -15
package/src/api-client.js +9 -1
package/src/api-client.test.js +89 -0
package/src/mcp-server.js +7 -3

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tokens-for-good",
-  "version": "0.4.13",
+  "version": "0.4.15",
   "type": "module",
   "description": "Donate your spare AI tokens to research nonprofits for Fierce Philanthropy",
   "homepage": "https://tokensforgood.ai",

package/pipeline/01-research/PROMPT.md CHANGED

@@ -114,23 +114,32 @@ Documented program changes based on evidence. "They adapted" needs specifics: wh
 #### EVIDENCE TABLE
-This is what determines the org's score. The score is computed deterministically by code from the table below, not by you. **Leave a row blank if you didn't find supporting evidence — a blank row is the honest answer and the correct one when the org doesn't have that thing.** Inventing evidence to fill a row will lower the org's real score and may get the report rejected.
-Each row asks for a single short verbatim quote from a cited page, the real URL, and the page/document name.
-| Row | What to quote (verbatim from the cited page)                                       | Quote | Source URL | Source name |
-|-----|-------------------------------------------------------------------------------------|-------|------------|-------------|
-| a1  | A stated condition-level outcome goal (health, income, wellbeing, food security, survival). NOT activity-level ("train X people") and NOT access-level ("expand financial access") — those are intermediary. | | | |
-| a2  | A number or percentage attached to the goal in a1 (e.g. "reduce stunting by 30%"). The same quote as a1 is fine if it already contains the number. | | | |
-| a3  | A target population AND a target year for the goal (e.g. "children under 5 in Ghana by 2030"). Same quote as a1/a2 is fine if it covers both. | | | |
-| b   | An intermediate outcome the org MEASURED, with a number (e.g. "78% of trained CHWs retained at 24 months"). | | | |
-| c   | An ultimate outcome the org MEASURED, with a number (e.g. "27% reduction in under-five mortality"). | | | |
-| d   | A documented program change the org made BASED ON outcome data. The quote should make both the data and the change concrete (e.g. "In 2022 we moved to blended training after retention dropped to 45%"). | | | |
-| e   | An intermediate result measured with a comparison or control group. Name the design (RCT, quasi-experimental, matched comparison). Before/after alone does not count. | | | |
-| f   | An ultimate result measured with a comparison or control group. Same design rules as e. | | | |
+The score is computed deterministically by code from this table, not by you. Leave a row blank if you didn't find supporting evidence — a blank row is the honest answer when the org doesn't have that thing. Inventing evidence will lower the org's real score and may get the report rejected.
+**What each row asks for** (read this before filling in the table):
+- **a1** — A stated condition-level outcome goal (health, income, wellbeing, food security, survival). NOT activity-level ("train X people") and NOT access-level ("expand financial access") — those are intermediary.
+- **a2** — A number or percentage attached to the goal in a1 (e.g. "reduce stunting by 30%"). The same quote as a1 is fine if it already contains the number.
+- **a3** — A target population AND a target year for the goal (e.g. "children under 5 in Ghana by 2030"). Same quote as a1/a2 is fine if it covers both.
+- **b** — An intermediate outcome the org MEASURED, with a number (e.g. "78% of trained CHWs retained at 24 months").
+- **c** — An ultimate outcome the org MEASURED, with a number (e.g. "27% reduction in under-five mortality").
+- **d** — A documented program change the org made BASED ON outcome data. The quote should make both the data and the change concrete (e.g. "In 2022 we moved to blended training after retention dropped to 45%").
+- **e** — An intermediate result measured with a comparison or control group. Name the design (RCT, quasi-experimental, matched comparison). Before/after alone does not count.
+- **f** — An ultimate result measured with a comparison or control group. Same design rules as e.
+| Row | Quote (verbatim from the cited page) | Source URL | Source name |
+|-----|--------------------------------------|------------|-------------|
+| a1  |                                      |            |             |
+| a2  |                                      |            |             |
+| a3  |                                      |            |             |
+| b   |                                      |            |             |
+| c   |                                      |            |             |
+| d   |                                      |            |             |
+| e   |                                      |            |             |
+| f   |                                      |            |             |
 **Rules for the EVIDENCE TABLE:**
-- The quoted text must appear verbatim on the cited page. Substring matching on the page body is how the score is verified.
+- The quoted text must appear verbatim on the cited page. A separate fact-check pass verifies your quotes against the page bodies after submission; invented or paraphrased quotes get the report flagged.
 - Use the real URL of the specific page that contains the quote. Not the org homepage. Not `example.com`.
 - A blank row is the correct answer when the evidence doesn't exist. Do not invent.
 - One row, one quote, one URL. Don't bundle two facts under one citation.
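
The verbatim rule above (and the removed line that mentioned substring matching on the page body) suggests a check along the following lines. This is only a sketch: the diff does not include the fact-check code, so the function name `quoteAppearsVerbatim` and the whitespace-collapsing step are assumptions made for illustration, not the project's actual verifier.

```javascript
// Hypothetical check, not taken from the package: does `quote` appear
// verbatim in `pageBody`, ignoring differences in runs of whitespace?
function quoteAppearsVerbatim(quote, pageBody) {
  const collapse = (s) => s.replace(/\s+/g, ' ').trim();
  const needle = collapse(quote);
  // An empty quote corresponds to a blank row; there is nothing to verify.
  if (needle === '') return false;
  return collapse(pageBody).includes(needle);
}

const page = 'In 2022 we moved to blended training\nafter retention dropped to 45%.';
console.log(quoteAppearsVerbatim('retention dropped to 45%', page)); // true
console.log(quoteAppearsVerbatim('retention fell to 45%', page));    // false (paraphrase)
```

Under this sketch a paraphrase fails even when it preserves the meaning, which is consistent with the rule that paraphrased quotes get the report flagged.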

package/src/api-client.js CHANGED

@@ -61,10 +61,18 @@ export class ApiClient {
   }
   async submitReport(claimId, reportMarkdown, tokenUsage = null, metrics = null, modelUsed = null, promptVersion = null, disagreementRows = null) {
+    // The MCP tool surface accepts `estimated_tokens` as a plain number, but
+    // the API validates `token_usage` as `nullable|array` and reads
+    // `token_usage.total_tokens` for leaderboard accounting. Wrap a bare
+    // number into the shape the server expects so MCP submits don't 422.
+    const normalizedTokenUsage = typeof tokenUsage === 'number'
+      ? { total_tokens: tokenUsage }
+      : tokenUsage;
     return this.request('POST', '/research/submit', {
       claim_id: claimId,
       report_markdown: reportMarkdown,
-      token_usage: tokenUsage,
+      token_usage: normalizedTokenUsage,
       metrics: metrics,
       model_used: modelUsed,
       prompt_version: promptVersion,
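
The normalization added in this hunk can be read in isolation. A minimal sketch, assuming a hypothetical standalone helper named `normalizeTokenUsage` (the package inlines this expression inside `submitReport`; the helper name is not part of the package):

```javascript
// Hypothetical helper; the package inlines this logic in submitReport.
function normalizeTokenUsage(tokenUsage) {
  // A bare number becomes { total_tokens: N }; anything else
  // (an object, null) is passed through unchanged.
  return typeof tokenUsage === 'number'
    ? { total_tokens: tokenUsage }
    : tokenUsage;
}

console.log(JSON.stringify(normalizeTokenUsage(12345)));
// {"total_tokens":12345}
console.log(JSON.stringify(normalizeTokenUsage({ total_tokens: 42, input_tokens: 30 })));
// {"total_tokens":42,"input_tokens":30}
console.log(JSON.stringify(normalizeTokenUsage(null)));
// null
```

Note that `typeof NaN === 'number'`, so a `NaN` input would also be wrapped and would serialize as `{"total_tokens":null}`.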

package/src/api-client.test.js ADDED

@@ -0,0 +1,89 @@
+// Regression tests for ApiClient. The MCP layer passes `estimated_tokens` as
+// a plain number; the Laravel API validates token_usage as `nullable|array`
+// and reads `token_usage.total_tokens` for leaderboard accounting. If we
+// stop normalizing the shape, every MCP submit silently 422s.
+import { test } from 'node:test';
+import assert from 'node:assert/strict';
+import { ApiClient } from './api-client.js';
+function withMockFetch(fn) {
+  const original = globalThis.fetch;
+  const calls = [];
+  globalThis.fetch = async (url, opts) => {
+    calls.push({ url, opts });
+    return new Response(JSON.stringify({ success: true, org_name: 'Test' }), {
+      status: 200,
+      headers: { 'Content-Type': 'application/json' },
+    });
+  };
+  return fn(calls).finally(() => {
+    globalThis.fetch = original;
+  });
+}
+test('submitReport wraps a numeric token_usage into {total_tokens: N}', async () => {
+  await withMockFetch(async (calls) => {
+    const client = new ApiClient('tfg_test_key');
+    await client.submitReport('claim-uuid', 'report body', 12345);
+    const body = JSON.parse(calls[0].opts.body);
+    assert.deepEqual(
+      body.token_usage,
+      { total_tokens: 12345 },
+      'numeric token usage must be wrapped so Laravel `nullable|array` accepts it'
+    );
+  });
+});
+test('submitReport passes an array-shaped token_usage through untouched', async () => {
+  await withMockFetch(async (calls) => {
+    const client = new ApiClient('tfg_test_key');
+    const usage = { total_tokens: 42, input_tokens: 30, output_tokens: 12 };
+    await client.submitReport('claim-uuid', 'report body', usage);
+    const body = JSON.parse(calls[0].opts.body);
+    assert.deepEqual(body.token_usage, usage);
+  });
+});
+test('submitReport leaves null token_usage as null', async () => {
+  await withMockFetch(async (calls) => {
+    const client = new ApiClient('tfg_test_key');
+    await client.submitReport('claim-uuid', 'report body', null);
+    const body = JSON.parse(calls[0].opts.body);
+    assert.equal(body.token_usage, null);
+  });
+});
+test('submitReport forwards disagreement_rows and prompt_version', async () => {
+  await withMockFetch(async (calls) => {
+    const client = new ApiClient('tfg_test_key');
+    await client.submitReport(
+      'claim-uuid',
+      'report body',
+      100,
+      null,
+      'sonnet-4-6',
+      'v3',
+      ['a1', 'b']
+    );
+    const body = JSON.parse(calls[0].opts.body);
+    assert.equal(body.prompt_version, 'v3');
+    assert.deepEqual(body.disagreement_rows, ['a1', 'b']);
+  });
+});
+test('request() returns null on 204 No Content (consolidation queue empty)', async () => {
+  const original = globalThis.fetch;
+  globalThis.fetch = async () => new Response(null, { status: 204 });
+  try {
+    const client = new ApiClient('tfg_test_key');
+    const result = await client.getNextConsolidation();
+    assert.equal(result, null);
+  } finally {
+    globalThis.fetch = original;
+  }
+});

package/src/mcp-server.js CHANGED

@@ -143,12 +143,13 @@ server.tool('submit_report', 'Submit a completed research report (or a consolida
   report_markdown: z.string().describe('The full research report in markdown'),
   estimated_tokens: z.number().describe('Estimated total tokens used: count web searches (~1K each), web fetches (~2-5K each), report output (~4 tokens/word), plus ~10K overhead'),
   model_used: z.string().optional().describe('The model that generated this report'),
+  prompt_version: z.string().optional().describe('Methodology version: "v3" for the EVIDENCE TABLE flow (default), "v2" for the legacy scorecard flow.'),
   disagreement_rows: z.array(z.enum(['a1', 'a2', 'a3', 'b', 'c', 'd', 'e', 'f'])).optional().describe('Consolidation-only: EVIDENCE TABLE row keys where the two researchers materially disagreed. >=3 auto-triggers a 3rd researcher.'),
-}, async ({ claim_id, report_markdown, estimated_tokens, model_used, disagreement_rows }) => {
+}, async ({ claim_id, report_markdown, estimated_tokens, model_used, prompt_version, disagreement_rows }) => {
   if (!client) return { content: [{ type: 'text', text: 'Error: TFG_API_KEY not set.' }] };
   try {
-    const result = await client.submitReport(claim_id, report_markdown, estimated_tokens, null, model_used, PKG_VERSION, disagreement_rows);
+    const result = await client.submitReport(claim_id, report_markdown, estimated_tokens, null, model_used, prompt_version, disagreement_rows);
     markContributed();
     // One-off users: first successful submit completes their initial setup,
@@ -278,8 +279,11 @@ server.tool('my_impact', 'See your personal contribution stats, tier, and histor
     const result = await client.getImpact();
     const c = result.contributor;
+    // Older server builds omit github_handle from the impact response — fall
+    // back to display_name so we never print "@undefined".
+    const who = c.github_handle ? `@${c.github_handle}` : (c.display_name || 'you');
     return {
-      content: [{ type: 'text', text: `Your Impact (@${c.github_handle}):\n\nTier: ${c.tier}\nOrgs researched: ${c.total_orgs}\nAcceptance rate: ${c.acceptance_rate}%\nAutomation: ${c.has_schedule ? 'Active' : 'Not set up'}\n\nRecent:\n${result.claims?.slice(0, 5).map(cl => `  ${cl.organization?.name || 'Unknown'} - ${cl.status}`).join('\n') || 'None'}` }],
+      content: [{ type: 'text', text: `Your Impact (${who}):\n\nTier: ${c.tier}\nOrgs researched: ${c.total_orgs}\nAcceptance rate: ${c.acceptance_rate}%\nAutomation: ${c.has_schedule ? 'Active' : 'Not set up'}\n\nRecent:\n${result.claims?.slice(0, 5).map(cl => `  ${cl.organization?.name || 'Unknown'} - ${cl.status}`).join('\n') || 'None'}` }],
     };
   } catch (err) {
     return { content: [{ type: 'text', text: `Error: ${err.message}` }] };
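
The `who` fallback in the `my_impact` hunk can be exercised on its own. A minimal sketch, assuming a hypothetical helper named `contributorLabel` that wraps the same expression (the package computes it inline):

```javascript
// Hypothetical helper mirroring the `who` expression from the hunk above.
function contributorLabel(c) {
  // Prefer the GitHub handle, then the display name, then a generic label.
  return c.github_handle ? `@${c.github_handle}` : (c.display_name || 'you');
}

console.log(contributorLabel({ github_handle: 'octocat' })); // @octocat
console.log(contributorLabel({ display_name: 'Ada' }));      // Ada
console.log(contributorLabel({}));                           // you
```

Because the check is a truthiness test, an empty-string `github_handle` also falls back to the display name rather than printing a bare `@`.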