freshcontext-mcp 0.3.16 → 0.3.17

package/RISKS.md DELETED
@@ -1,137 +0,0 @@
1
- # RISKS — FreshContext DAR engine and ingestion pipeline
2
-
3
- Known algorithmic and data-integrity edge cases in `worker/src/intelligence.ts` and the cron ingestion path. Last reviewed: 2026-05-01.
4
-
5
- ---
6
-
7
- ## Active risks (not yet mitigated)
8
-
9
- These are real and unguarded. Tracked in CLAUDE.md "Things Pending".
10
-
11
- ### 1. Frozen signal paradox
12
-
13
- When `publishedAt` is `null`, `applyDecay` falls back to `t = halfLifeHours`. The result is `R_t = R_0 · e^(−ln 2) = R_0 / 2` exactly: half the base score, deterministically. Because the fallback age never advances, such signals stay pinned at half their base score forever, even after cron recompute, and never decay further.
14
-
15
- - **Where:** `applyDecay`, intelligence.ts:172
16
- - **Impact:** Signals without an extractable publication date accumulate as permanent middle-tier results.
17
- - **Mitigation pending:** Hard floor (`R_t < 5` → mark expired) plus lazy decay at read time.
18
-
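The fallback described in risk 1 can be reproduced in a few lines. The sketch below is a hypothetical reconstruction, not the code at intelligence.ts:172: the `applyDecay` signature, its parameter names, and the 14-hour half-life are assumptions chosen for illustration.

```typescript
// Hypothetical reconstruction; the real applyDecay may differ in shape and detail.
function applyDecay(
  baseScore: number,
  publishedAt: string | null,
  halfLifeHours: number,
  nowMs: number,
): number {
  const lambda = Math.LN2 / halfLifeHours; // decay constant per hour
  // Fallback from the text: an unknown age is treated as exactly one half-life.
  const ageHours =
    publishedAt === null
      ? halfLifeHours
      : Math.max(0, (nowMs - new Date(publishedAt).getTime()) / 3600000);
  return baseScore * Math.exp(-lambda * ageHours);
}

const DAY_MS = 24 * 3600000;
const t0 = Date.parse("2026-05-01T00:00:00Z");

// Undated signal: same score at ingest and 30 days later.
const undatedNow = applyDecay(80, null, 14, t0);
const undatedLater = applyDecay(80, null, 14, t0 + 30 * DAY_MS);

// Dated signal: the score keeps falling as time passes.
const datedNow = applyDecay(80, "2026-05-01T00:00:00Z", 14, t0);
const datedLater = applyDecay(80, "2026-05-01T00:00:00Z", 14, t0 + 30 * DAY_MS);

console.log({ undatedNow, undatedLater, datedNow, datedLater });
```

Because the fallback age is a constant, the undated score is identical on every recompute, while the dated score continues to decay.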
19
- ### 2. No hard floor on R_t
20
-
21
- `is_relevant` uses `R_t >= 35`. Below 35, signals stay in the DB. Below 5 they're effectively dead but unflagged. Storage grows monotonically.
22
-
23
- - **Where:** intelligence.ts:255, feed query at worker.ts:942
24
- - **Mitigation pending:** Add `R_t < 5 → is_expired = 1` and exclude expired signals from the feed query.
25
-
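A minimal sketch of the proposed floor, assuming a three-state classification. The 35 and 5 thresholds come from the text above; the `classifySignal` name and the `dormant` label are hypothetical and do not appear in the source.

```typescript
type SignalState = "relevant" | "dormant" | "expired";

const RELEVANT_MIN = 35; // existing is_relevant threshold, per the text
const EXPIRY_FLOOR = 5;  // proposed hard floor, per the pending mitigation

// Hypothetical helper: maps a decayed score to the state a feed query could filter on.
function classifySignal(rt: number): SignalState {
  if (rt >= RELEVANT_MIN) return "relevant";
  if (rt < EXPIRY_FLOOR) return "expired"; // would set is_expired = 1
  return "dormant"; // below the relevance bar but not yet expired
}

const states = [72, 35, 20, 5, 4.9].map(classifySignal);
console.log(states);
```

Rows classified as `expired` are the ones the mitigation would exclude from the feed query and, eventually, from storage.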
26
- ### 3. Lazy decay missing
27
-
28
- `rt_score` is computed at ingest, then recomputed by the cron every 6h. Reads return the stored value, so feed responses can be up to 6h stale.
29
-
30
- - **Where:** Feed query reads `sr.rt_score` directly, no recompute.
31
- - **Mitigation pending:** Recompute decay at read time, or shorten the cron interval.
32
-
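To put a number on the staleness: under the exponential decay form used in risk 1, a stored score that is `s` hours old overstates the true score by a factor of `2^(s / halfLife)`. The helper below is only an illustration. Its name is an assumption, and the 14-hour half-life is borrowed from the HN figure quoted in the demo's `data.json`.

```typescript
// Ratio between a stored score and the score recomputed staleHours later.
function overstatementFactor(staleHours: number, halfLifeHours: number): number {
  return Math.pow(2, staleHours / halfLifeHours);
}

// Worst case for a 6h cron interval on a source with a 14h half-life.
const worstCase = overstatementFactor(6, 14);
console.log(worstCase.toFixed(2)); // roughly 1.35
```

For sources with half-lives measured in weeks or years the same 6h gap is negligible, so a read-time recompute matters mostly for fast-decaying sources.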
33
- ### 4. Re-ignition gap
34
-
35
- `isDuplicate(48)` skips fingerprints seen in the last 48h. A story that trends → dies → re-trends within 48h is dropped at ingest. After 48h, a new ingestion is allowed and scored fresh.
36
-
37
- - **Where:** intelligence.ts:333
38
- - **Mitigation pending:** Detect re-ignition by comparing the dedup window to the original signal's age and decay state.
39
-
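One way the pending mitigation could be approached is sketched below, under stated assumptions: a duplicate fingerprint inside the dedup window is treated as a re-ignition only if the original signal has already decayed below the relevance threshold. The `PriorSignal` shape and the `isReignition` name are hypothetical; only the 35 threshold comes from this file.

```typescript
// Hypothetical record of the first sighting of a fingerprint.
interface PriorSignal {
  baseScore: number;
  ingestedAtMs: number;
  halfLifeHours: number;
}

// Returns true when the earlier signal has decayed past the relevance bar,
// so a new sighting is more likely a re-trend than a plain duplicate.
function isReignition(prior: PriorSignal, nowMs: number, relevantMin = 35): boolean {
  const ageHours = Math.max(0, (nowMs - prior.ingestedAtMs) / 3600000);
  const currentScore = prior.baseScore * Math.pow(0.5, ageHours / prior.halfLifeHours);
  return currentScore < relevantMin;
}

const first: PriorSignal = { baseScore: 80, ingestedAtMs: 0, halfLifeHours: 14 };
const seenAfter2h = isReignition(first, 2 * 3600000);   // original still fresh
const seenAfter28h = isReignition(first, 28 * 3600000); // two half-lives later
console.log({ seenAfter2h, seenAfter28h });
```

Both sightings fall inside the 48h window, but only the second would bypass the dedup skip under this rule.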
40
- ### 5. CPU timeout in cron loop
41
-
42
- The DAR functions are O(n) on content length and individually cheap. The cron loop processes signals sequentially. At ~1k signals this is well under Workers CPU limits, but the failure mode at scale is unbounded loop time, not per-signal cost.
43
-
44
- - **Mitigation pending:** Batch processing with explicit per-batch CPU budget tracking.
45
-
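The batching idea can be sketched as a loop that checks a budget between batches and reports how far it got. Everything here is an assumption for illustration: the function name, the batch size, and the injectable clock. How CPU time should actually be measured inside a Worker is left open.

```typescript
// Processes items in batches and stops once the budget is spent.
// Returns the number of items handled so a later run can resume from that index.
function processWithBudget<T>(
  items: T[],
  handle: (item: T) => void,
  budgetMs: number,
  batchSize: number,
  clock: () => number = Date.now, // injectable so the budget check can be tested
): number {
  const start = clock();
  let done = 0;
  while (done < items.length) {
    if (clock() - start >= budgetMs) break; // budget exhausted, stop cleanly
    const end = Math.min(done + batchSize, items.length);
    for (const item of items.slice(done, end)) handle(item);
    done = end;
  }
  return done;
}

// Fake clock that advances 10ms per reading, to make the example deterministic.
let fakeTime = 0;
const handled: number[] = [];
const processed = processWithBudget(
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  (n) => { handled.push(n); },
  25,
  3,
  () => (fakeTime += 10),
);
console.log(processed, handled);
```

The budget is only checked between batches, so the batch size bounds how far a run can overshoot.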
46
- ---
47
-
48
- ## Documented behaviors (by design, not bugs)
49
-
50
- These are intentional but worth knowing.
51
-
52
- ### 6. Future dates in `applyDecay` score as freshest
53
-
54
- `extractPublishedAt` filters future dates upstream. If any future code path calls `applyDecay` directly with a future-dated string, `t` is clamped to 0 by `Math.max(0, …)` and the signal scores as freshest possible.
55
-
56
- - **Where:** `applyDecay`, intelligence.ts:176
57
- - **Why this is OK currently:** Only `scoreSignal` calls `applyDecay`, and it pre-filters via `extractPublishedAt`. Defense-in-depth would add an explicit reject in `applyDecay` itself.
58
-
59
- ### 7. Semantic fingerprint truncates titles to 80 characters
60
-
61
- `semanticFingerprint` slices the normalised title to 80 chars before hashing. Two articles whose titles diverge only after the 80th character will collide on fingerprint and dedupe. Most CMSs produce titles well under 80 chars; this is a tradeoff for fingerprint stability across whitespace/encoding noise.
62
-
63
- - **Where:** intelligence.ts:318
64
-
65
- ### 8. Empty and whitespace-only content collide on a single fingerprint
66
-
67
- Both reduce to the input string `"||"` and hash to the same 16-char fingerprint. If garbage signals leak past adapter validation, the second one is silently dropped at `isDuplicate`. Storage-wise benign, but a future debugging trap (you'll see one phantom "empty signal" in the DB and never know how many were ingested).
68
-
69
- - **Mitigation if needed:** Short-circuit fingerprint to `null` when title + url + date are all empty, and have the cron skip those.
70
-
71
- ### 9. `parseStoredProfile` degrades silently on garbage JSON
72
-
73
- If `targets` or `skills` columns contain malformed JSON, `safeParse` falls back to comma-split. Garbage tokens enter as profile keywords. They won't match real content, but no error is raised.
74
-
75
- - **Where:** intelligence.ts:273
76
- - **Mitigation if needed:** Tighten profile validation at write time; D1 can't enforce JSON shape, so this would have to be a write-path check.
77
-
78
- ### 10. Excluded signals still complete the scoring pipeline
79
-
80
- `scoreSignal` zeroes the score on exclusion match but still computes the fingerprint and audit signature, and the cron presumably inserts the row with `is_relevant=0`. If exclusion-matching content is high-volume, that's pure storage cost.
81
-
82
- - **Mitigation if needed:** Skip insertion when `R_0 == 0`.
83
-
84
- ### 11. `ha_pri_sig` is integrity, not authentication
85
-
86
- `PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1"` is hardcoded in a public repo. Anyone with the source can forge a valid `ha_pri_sig` for any `(resultId, contentHash)` pair. The signature proves the pair was hashed together at some point — it does not prove "scored by this engine".
87
-
88
- - **Where:** intelligence.ts:195
89
- - **Why this matters:** METHODOLOGY.md describes the signature as proving provenance. That's accurate against accidental tampering, not against an adversary. If a customer ever needs cryptographic provenance, the salt would need to move to a secret binding (env var) and signatures would need to be reissued.
90
-
91
- ### 12. Trailing-slash URLs do not collapse
92
-
93
- `https://example.com/conf` and `https://example.com/conf/` produce different fingerprints. Most sites canonicalize one or the other, but not all. Minor dedup miss.
94
-
95
- - **Mitigation if needed:** Strip trailing `/` from `u.pathname` in `semanticFingerprint`.
96
-
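A sketch of URL canonicalization that combines the trailing-slash strip suggested here with the tracking-parameter removal described under "Resolved 2026-05-01". It is not the `semanticFingerprint` implementation; `canonicalizeUrl` and `TRACKING_PATTERNS` are hypothetical names, and the pattern list is assumed from the parameters this file names.

```typescript
// Assumed list, taken from the tracking parameters named in this file.
const TRACKING_PATTERNS: RegExp[] = [/^utm_/, /^mc_/, /^fbclid$/, /^gclid$/, /^igshid$/];

function canonicalizeUrl(rawUrl: string): string {
  const u = new URL(rawUrl);

  // Collect keys first so deletion does not interfere with iteration.
  const keys: string[] = [];
  u.searchParams.forEach((_value, key) => { keys.push(key); });
  for (const key of keys) {
    if (TRACKING_PATTERNS.some((p) => p.test(key))) u.searchParams.delete(key);
  }

  // Strip trailing slashes, but leave the bare root path "/" alone.
  if (u.pathname.length > 1) u.pathname = u.pathname.replace(/\/+$/, "");
  return u.toString();
}

const withSlash = canonicalizeUrl("https://example.com/conf/");
const withoutSlash = canonicalizeUrl("https://example.com/conf");
const idOne = canonicalizeUrl("https://example.com/post?id=1&utm_source=hn");
const idTwo = canonicalizeUrl("https://example.com/post?id=2&utm_source=reddit");
console.log({ withSlash, withoutSlash, idOne, idTwo });
```

The scheme is left untouched, so `http` and `https` variants would still produce different fingerprints, matching the behavior this file lists as intended.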
97
- ---
98
-
99
- ## Resolved 2026-05-01
100
-
101
- Stress-test pass on 2026-05-01 surfaced the following data-integrity bugs and fixed them in-place:
102
-
103
- - **Duplicate keywords inflated R_0** — `calculateBaseScore` filtered raw `targets` / `skills` arrays without deduping. A profile with duplicate entries (`["typescript","typescript","typescript"]`) inflated the score by +15 per duplicate, capped at +35. Fixed by deduping via `new Set` before matching. Verified: dupe and single profiles now score identically (R_0=55 each).
104
-
105
- - **Malformed dates rolled silently** — `extractPublishedAt` accepted dates like `2024-02-30` because JS `new Date('2024-02-30')` rolls to Mar 1 instead of returning Invalid Date. Fixed by adding a round-trip check: reject if `new Date(ts).toISOString().slice(0,10) !== originalString`. Verified: `2024-02-30` → null, `2024-02-29` (valid leap day) → `2024-02-29`.
106
-
107
- - **Querystring stripping was too aggressive** — `semanticFingerprint` stripped *all* querystrings, causing `?id=1` and `?id=2` to collide. Fixed by parsing the URL and removing only known tracking params (`utm_*`, `fbclid`, `gclid`, `mc_*`, `igshid`); legitimate query identifiers are preserved. Verified: `?id=1` vs `?id=2` now differ; `?utm_source=hn` vs `?utm_source=reddit` still collide as intended.
108
-
109
- - **Hidden 50-char content threshold killed legitimate short signals** — `calculateBaseScore` had a `raw.length < 50 → −40` penalty grouped with `[ERROR]` and `"not found"` checks. The early `< 20` reject is the real floor; the 50-char clause silently zeroed legitimate short content like "OpenAI launches Atlas browser typescript" (40 chars). Removed. Verified: that content now scores R_0=58 (was 15).
110
-
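The round-trip check from the malformed-dates fix can be illustrated in isolation. This is a minimal sketch assuming `YYYY-MM-DD` input; `parseStrictDate` is a hypothetical name, and the real `extractPublishedAt` also handles extraction and future-date filtering, which are omitted here.

```typescript
// Accepts a YYYY-MM-DD string only if it survives a parse and re-serialize round trip.
function parseStrictDate(s: string): string | null {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(s)) return null;
  const ts = Date.parse(s); // date-only ISO strings are parsed as UTC
  if (isNaN(ts)) return null; // e.g. month 13
  // A rolled date (Feb 30 becoming Mar 1) no longer matches the original string.
  return new Date(ts).toISOString().slice(0, 10) === s ? s : null;
}

const rolled = parseStrictDate("2024-02-30");
const leapDay = parseStrictDate("2024-02-29");
const badMonth = parseStrictDate("2024-13-45");
console.log({ rolled, leapDay, badMonth });
```

The check holds whether the engine rolls the date forward or returns Invalid Date, since both paths end in `null`.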
111
- ---
112
-
113
- ## Behaviors confirmed working as intended (2026-05-01)
114
-
115
- - Future dates filtered at extraction (`2030-01-01` → `null`)
116
- - Bad month/day filtered (`2024-13-45` → `null` via `isNaN`)
117
- - Multiple dates → newest valid wins
118
- - Case-insensitive matching (`TYPESCRIPT` matches target `typescript`)
119
- - Punctuation/case differences in titles → same fingerprint (good dedup)
120
- - `http` vs `https` → different fingerprints (defensive)
121
- - `utm_*` variants → same fingerprint (intentional dedup, preserved post-fix)
122
- - Score capped at 100 and floored at 0
123
- - Unknown adapter falls back to default lambda (0.001)
124
- - Float underflow → 0 silently (no NaN/Infinity)
125
-
126
- ---
127
-
128
- ## How to re-run the stress test
129
-
130
- The probe script is not committed (deleted after each run to keep the working tree clean). To regenerate:
131
-
132
- 1. Create `worker/probe.ts` that imports the pure functions from `./src/intelligence`.
133
- 2. Feed adversarial inputs: empty / oversized content, malformed/future/rolled dates, duplicate target keywords, UTM-only URL diffs, title-collision boundaries, exclusion matches.
134
- 3. Run with `cd worker && npx tsx probe.ts`.
135
- 4. Delete the probe script after.
136
-
137
- A live load test was also run on 2026-05-01: 755 requests across `/health`, `/debug/db`, `/v1/intel/feed/default`. Zero errors. p50 latencies: `/health` 180ms, `/debug/db` 0.8–1.2s, `/v1/intel/feed/` 0.6–0.9s at concurrency 10–20. No cliffs were found at the tested loads; finding real cliffs would require a faster load generator than `xargs + curl.exe` on Windows.
package/cleanup.ps1 DELETED
@@ -1,99 +0,0 @@
1
- # cleanup.ps1 — One-time repo cleanup for freshcontext-mcp
2
- # Run from the repo root: powershell -ExecutionPolicy Bypass -File cleanup.ps1
3
- # Safe: only moves files into _archive/ subfolders. No deletions.
4
-
5
- $ErrorActionPreference = "Stop"
6
- $repo = $PSScriptRoot  # assumes the script sits in the repo root; avoids a hardcoded user path
7
- Set-Location $repo
8
-
9
- Write-Host "=== FreshContext repo cleanup ===" -ForegroundColor Cyan
10
- Write-Host "Repo: $repo" -ForegroundColor Gray
11
- Write-Host ""
12
-
13
- # Helper: move with git mv if tracked, plain move otherwise
14
- function Move-RepoFile {
15
- param([string]$From, [string]$ToDir)
16
- if (-not (Test-Path $From)) {
17
- Write-Host " SKIP (not found): $From" -ForegroundColor DarkGray
18
- return
19
- }
20
- $filename = Split-Path $From -Leaf
21
- $to = Join-Path $ToDir $filename
22
-
23
- # Check if file is tracked by git
24
- $tracked = git ls-files --error-unmatch $From 2>$null
25
- if ($LASTEXITCODE -eq 0) {
26
- git mv $From $to | Out-Null
27
- Write-Host " git mv $filename -> $ToDir/" -ForegroundColor Green
28
- } else {
29
- Move-Item -Path $From -Destination $to -Force
30
- Write-Host " move $filename -> $ToDir/" -ForegroundColor Yellow
31
- }
32
- }
33
-
34
- # --- Session saves -> _archive/sessions/ ---
35
- Write-Host "Moving session saves..." -ForegroundColor Cyan
36
- $sessions = @(
37
- "SESSION_SAVE_V3.md",
38
- "SESSION_SAVE_V4.md",
39
- "SESSION_SAVE_V5.md",
40
- "SESSION_SAVE_V5b.md",
41
- "SESSION_SAVE_V6.md",
42
- "SESSION_SAVE_V7.md",
43
- "SESSION_SAVE_V8.md",
44
- "SESSION_SAVE_V9.md",
45
- "SESSION_SAVE_V9b.md",
46
- "SESSION_SAVE_ARCHITECTURE_V1.md",
47
- "SESSION_SAVE_ARCHITECTURE_V2.md",
48
- "CONTEXT_SKILL.md"
49
- )
50
- foreach ($f in $sessions) {
51
- Move-RepoFile -From $f -ToDir "_archive\sessions"
52
- }
53
-
54
- # --- Superseded architecture plans -> _archive/architecture/ ---
55
- Write-Host ""
56
- Write-Host "Moving superseded architecture plans..." -ForegroundColor Cyan
57
- $architecture = @(
58
- "ARCHITECTURE_UPGRADE_CHECKLIST.md",
59
- "ARCHITECTURE_UPGRADE_ROADMAP_V1.md"
60
- )
61
- foreach ($f in $architecture) {
62
- Move-RepoFile -From $f -ToDir "_archive\architecture"
63
- }
64
-
65
- # --- Launch drafts -> _archive/launch-drafts/ ---
66
- Write-Host ""
67
- Write-Host "Moving launch drafts..." -ForegroundColor Cyan
68
- $drafts = @(
69
- "LAUNCH_POSTS_V9.md",
70
- "LAUNCH_POSTS_TODAY.md",
71
- "HN_THROWAWAY_FRIDAY.md"
72
- )
73
- foreach ($f in $drafts) {
74
- Move-RepoFile -From $f -ToDir "_archive\launch-drafts"
75
- }
76
-
77
- # --- Untracked junk: keep locally but make sure git ignores them ---
78
- Write-Host ""
79
- Write-Host "Cleaning git index of newly-ignored files..." -ForegroundColor Cyan
80
- $ignoredButTracked = @("backup.sql", "mcp-publisher.exe")
81
- foreach ($f in $ignoredButTracked) {
82
- if (Test-Path $f) {
83
- $tracked = git ls-files --error-unmatch $f 2>$null
84
- if ($LASTEXITCODE -eq 0) {
85
- git rm --cached $f | Out-Null
86
- Write-Host " git rm --cached $f (file kept locally)" -ForegroundColor Yellow
87
- } else {
88
- Write-Host " $f already untracked" -ForegroundColor DarkGray
89
- }
90
- }
91
- }
92
-
93
- Write-Host ""
94
- Write-Host "=== Cleanup complete ===" -ForegroundColor Cyan
95
- Write-Host ""
96
- Write-Host "Next steps:" -ForegroundColor White
97
- Write-Host " 1. Review changes: git status" -ForegroundColor Gray
98
- Write-Host " 2. Commit: git commit -m 'chore: archive session saves + tighten gitignore + clean repo root'" -ForegroundColor Gray
99
- Write-Host " 3. Push: git push origin main" -ForegroundColor Gray
package/demo/README.md DELETED
@@ -1,70 +0,0 @@
1
- # FreshContext — Live Demo
2
-
3
- > **Same model. Same retrieval set. Same query. Two completely different answers — because one of them remembered when its sources were written.**
4
-
5
- A 5-document demonstration of why semantic-only retrieval gives 2026 systems 2022 answers — and what it costs to fix it.
6
-
7
- ## What's in this folder
8
-
9
- | File | Purpose |
10
- |------|---------|
11
- | `index.html` | The shareable demo. Self-contained, no server needed. Open in any browser. |
12
- | `data.json` | The mock retrieval set (5 documents, mixed timestamps, semantic scores). |
13
- | `generate.mjs` | Calls the live Anthropic API to regenerate the two answers and writes them back into `data.json`. |
14
- | `README.md` | This file. |
15
-
16
- ## View the demo
17
-
18
- Open `index.html` in any browser. R<sub>t</sub> is computed live on page load, so the math stays current as documents age.
19
-
20
- To share it as a link:
21
-
22
- - **Cloudflare Pages:** drop the `demo/` folder into a Pages project — done.
23
- - **GitHub Pages:** push `demo/` to a `gh-pages` branch.
24
- - **Static host:** any S3, Netlify, or Vercel deploy works. No build step.
25
-
26
- ## Verify the answers are real
27
-
28
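- The two answers in `index.html` are pre-baked, and `data.json` marks them as illustrative outputs that have never been regenerated. To check that a real model produces the same divergence, regenerate them with a real Claude API call: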
29
-
30
- ```powershell
31
- # PowerShell
32
- $env:ANTHROPIC_API_KEY = "sk-ant-..."
33
- node generate.mjs
34
- ```
35
-
36
- ```bash
37
- # bash / zsh
38
- ANTHROPIC_API_KEY=sk-ant-... node generate.mjs
39
- ```
40
-
41
- The script:
42
-
43
- 1. Loads `data.json`
44
- 2. Computes R<sub>t</sub> for every document
45
- 3. Builds two prompts — one with the top-3 by R<sub>0</sub> (semantic), one with the top-3 by R<sub>t</sub> (decay-adjusted)
46
- 4. Calls the model named by the `MODEL` env var (default `claude-sonnet-4-5-20250929`) for both
47
- 5. Prints both answers side-by-side
48
-
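Steps 1 to 3 can be checked without an API key. The sketch below hardcodes the ids, dates, base scores, `now`, and `lambda_per_hour` from `data.json` and ranks the documents both ways. It is written in TypeScript for illustration and is not the `generate.mjs` code; the `Doc` type and the helper names are assumptions.

```typescript
interface Doc { id: string; publishedAt: string; baseScore: number }

const LAMBDA_PER_HOUR = 0.0001;
const NOW_MS = Date.parse("2026-05-08T00:00:00Z");

const docs: Doc[] = [
  { id: "doc_2022_langchain_blog", publishedAt: "2022-08-15T00:00:00Z", baseScore: 92 },
  { id: "doc_2023_reddit_localllama", publishedAt: "2023-01-20T00:00:00Z", baseScore: 81 },
  { id: "doc_2024_arxiv_paper", publishedAt: "2024-06-12T00:00:00Z", baseScore: 78 },
  { id: "doc_2025_langchain_late", publishedAt: "2025-09-30T00:00:00Z", baseScore: 86 },
  { id: "doc_2026_x_post", publishedAt: "2026-02-08T00:00:00Z", baseScore: 80 },
];

// R_t = R_0 * exp(-lambda * age in hours)
function decayed(d: Doc): number {
  const ageHours = (NOW_MS - Date.parse(d.publishedAt)) / 3600000;
  return d.baseScore * Math.exp(-LAMBDA_PER_HOUR * ageHours);
}

// Sorts a copy in descending score order and keeps the first three ids.
function top3(score: (d: Doc) => number): string[] {
  return docs.slice().sort((a, b) => score(b) - score(a)).slice(0, 3).map((d) => d.id);
}

const bySemantic = top3((d) => d.baseScore);
const byDecay = top3(decayed);
const halfLifeHours = Math.LN2 / LAMBDA_PER_HOUR; // about 6931 hours

console.log({ bySemantic, byDecay, halfLifeHours });
```

With these inputs the semantic ranking keeps the 2022, 2025, and 2023 documents, while the decay-adjusted ranking keeps the 2026, 2025, and 2024 documents, which matches the two `context_label` values in `data.json`.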
49
- You'll see the same kind of divergence the demo shows. Different versions of Claude will phrase it differently, but the *direction* of the change is structurally guaranteed: whatever the top-3 says, that's what the model anchors on.
50
-
51
- ## What this demonstrates
52
-
53
- - **It's not a model problem.** Claude isn't wrong in the baseline — it faithfully summarized stale context.
54
- - **It's not an embedding problem.** Cosine similarity scores were correct.
55
- - **It's a context-engineering problem.** Retrieval ranks correctly along one axis (semantic similarity) and ignores another axis that matters in production (temporal validity).
56
-
57
- > Most RAG pipelines rank context correctly semantically but incorrectly temporally.
58
-
59
- ## Run it against your own data
60
-
61
- Replace `data.json` with your own retrieval output. The shape is documented inline. The HTML and the script will pick up your new query, your documents, your timestamps. The math doesn't change.
62
-
63
- ## Where this comes from
64
-
65
- - Repo: <https://github.com/PrinceGabriel-lgtm/freshcontext-mcp>
66
- - Spec: <https://freshcontext-site.pages.dev>
67
- - npm: `npm install freshcontext-mcp`
68
- - Live API: `https://freshcontext-mcp.gimmanuel73.workers.dev/v1/intel/feed/default`
69
-
70
- Built by Immanuel Gabriel · Grootfontein, Namibia · MIT licensed.
package/demo/data.json DELETED
@@ -1,88 +0,0 @@
1
- {
2
- "query": "What's the recommended way to chunk documents for RAG in 2026?",
3
- "now": "2026-05-08T00:00:00Z",
4
- "decay": {
5
- "lambda_per_hour": 0.0001,
6
- "halflife_hours": 6931,
7
- "halflife_days_human": "≈ 9.5 months",
8
- "note": "Single decay constant across sources for demo clarity. The live engine uses source-specific lambdas (HN: 14h half-life, blogs: 29 days, papers: 1.6 years)."
9
- },
10
- "documents": [
11
- {
12
- "id": "doc_2022_langchain_blog",
13
- "source": "blog.langchain.dev",
14
- "title": "RAG Chunking: The Complete Guide",
15
- "published_at": "2022-08-15T00:00:00Z",
16
- "base_score": 92,
17
- "content": "For most RAG applications, the recommended approach is fixed-size chunking with 1000 tokens per chunk and 200 token overlap. Use RecursiveCharacterTextSplitter with this configuration. This works for almost any document type and is the LangChain default. Studies show fixed-size chunking is sufficient for 95% of use cases when paired with a good embedding model like ada-002.",
18
- "why_high_semantic_score": "Authoritative source, dense keyword match (RAG, chunking, recommended, LangChain), confident tone."
19
- },
20
- {
21
- "id": "doc_2023_reddit_localllama",
22
- "source": "reddit.com/r/LocalLLaMA",
23
- "title": "What chunk size do you use for RAG?",
24
- "published_at": "2023-01-20T00:00:00Z",
25
- "base_score": 81,
26
- "content": "Most people use 512 or 1024 tokens with 10-20% overlap. Don't overthink it. Just use the LangChain RecursiveCharacterTextSplitter and tune chunk_size to match your embedding model's context window. Higher overlap = more storage, marginal accuracy gains.",
27
- "why_high_semantic_score": "Strong community signal (high upvotes), direct answer to query, confirms authoritative source above."
28
- },
29
- {
30
- "id": "doc_2024_arxiv_paper",
31
- "source": "arxiv.org/abs/2410.xxxxx",
32
- "title": "Semantic Chunking vs Fixed-Size: An Empirical Study",
33
- "published_at": "2024-06-12T00:00:00Z",
34
- "base_score": 78,
35
- "content": "We evaluate semantic chunking against fixed-size baselines across 12 retrieval tasks. Semantic chunking outperforms fixed-size in 73% of cases, particularly for technical documents. However, the computational overhead of semantic chunking is 4.2x higher.",
36
- "why_high_semantic_score": "Empirical evidence, scientific framing, but slightly off-axis from the direct query."
37
- },
38
- {
39
- "id": "doc_2025_langchain_late",
40
- "source": "blog.langchain.com",
41
- "title": "Late Chunking and Contextual Retrieval",
42
- "published_at": "2025-09-30T00:00:00Z",
43
- "base_score": 86,
44
- "content": "Late chunking with embedding-aware boundaries (bge-late-chunk, jina-late) outperforms both fixed-size and semantic chunking. Combine with contextual retrieval per Anthropic's research for state-of-the-art accuracy. The fixed-size 1000-token recommendation from earlier guides is now considered a baseline at best.",
45
- "why_high_semantic_score": "Same authoritative source as #1, strong keyword match, more recent."
46
- },
47
- {
48
- "id": "doc_2026_x_post",
49
- "source": "x.com/jerryjliu0",
50
- "title": "RAG failures we see in 2026",
51
- "published_at": "2026-02-08T00:00:00Z",
52
- "base_score": 80,
53
- "content": "Top RAG failure mode in production right now: teams still using fixed-size chunking from 2022 tutorials. Move to late-interaction or contextual retrieval. The math has changed; the 'best practice' docs haven't caught up. If your retrieval is still ranking by cosine similarity alone, you're shipping 2022 results in 2026.",
54
- "why_high_semantic_score": "Practitioner expertise, direct claim about query topic, but social-media format slightly reduces baseline semantic confidence."
55
- }
56
- ],
57
- "baked_answers": {
58
- "_note": "These are the two answers the demo displays by default. Anyone landing on the page sees them — no API key required. To regenerate from a real Claude call, run `node generate.mjs` with ANTHROPIC_API_KEY set; the script overwrites these two fields. Last regenerated: never (illustrative outputs).",
59
- "_last_regenerated": null,
60
- "_last_model": null,
61
- "stale": {
62
- "context_label": "2022 LangChain blog · 2025 LangChain blog · 2023 Reddit",
63
- "intro": "For RAG pipelines, the recommended approach is <strong>fixed-size chunking</strong>:",
64
- "bullets": [
65
- "1,000 tokens per chunk with 200-token overlap",
66
- "Use <span class=\"inline-code\">RecursiveCharacterTextSplitter</span> — the LangChain default",
67
- "Pair with a strong embedding model like <span class=\"inline-code\">ada-002</span>",
68
- "Tune <span class=\"inline-code\">chunk_size</span> to match your embedding model's context window"
69
- ],
70
- "outro": "This approach is widely confirmed by both authoritative documentation and community consensus.",
71
- "verdict_class": "bad",
72
- "verdict_text": "⚠ This is 2022 advice. ada-002 is no longer SOTA. Fixed-size chunking has been superseded."
73
- },
74
- "fresh": {
75
- "context_label": "2026 X post · 2025 LangChain blog · 2024 arXiv paper",
76
- "intro": "In 2026, the current state-of-the-art is <strong>late chunking with embedding-aware boundaries</strong>:",
77
- "bullets": [
78
- "Use late-chunking models like <span class=\"inline-code\">bge-late-chunk</span> or <span class=\"inline-code\">jina-late</span>",
79
- "Combine with <strong>contextual retrieval</strong> (per Anthropic's research) for SOTA accuracy",
80
- "Avoid fixed-size 1000-token chunking — it's now considered a baseline at best",
81
- "For technical documents, semantic chunking remains viable (73% win rate over fixed-size per the 2024 study), but carries 4.2× compute overhead"
82
- ],
83
- "outro": "Common 2026 production failure: teams still applying 2022-era tutorial recommendations.",
84
- "verdict_class": "good",
85
- "verdict_text": "✓ Current as of May 2026. Reflects 2025–2026 consensus while preserving the 2024 empirical context."
86
- }
87
- }
88
- }
package/demo/generate.mjs DELETED
@@ -1,199 +0,0 @@
1
- /**
2
- * generate.mjs — regenerate the demo's two answers from a real Claude call,
3
- * then write them back into data.json so the HTML picks them up automatically.
4
- *
5
- * The demo works WITHOUT this script. The baked_answers in data.json are
6
- * displayed by default — no API key required to view the page. This script
7
- * exists so anyone who wants to verify the math actually changes Claude's
8
- * answer can plug in their own key and prove it.
9
- *
10
- * Run:
11
- * PowerShell: $env:ANTHROPIC_API_KEY="sk-ant-..."; node generate.mjs
12
- * bash/zsh: ANTHROPIC_API_KEY=sk-ant-... node generate.mjs
13
- *
14
- * Optional env:
15
- * MODEL — model string, defaults to claude-sonnet-4-5-20250929
16
- * DRY_RUN=1 — print the prompts and exit, don't call the API
17
- * NO_SAVE=1 — call the API and print, but don't write data.json
18
- */
19
-
20
- import fs from 'fs/promises';
21
- import path from 'path';
22
- import { fileURLToPath } from 'url';
23
-
24
- const __dirname = path.dirname(fileURLToPath(import.meta.url));
25
- const DATA_PATH = path.join(__dirname, 'data.json');
26
-
27
- const API_KEY = process.env.ANTHROPIC_API_KEY;
28
- const MODEL = process.env.MODEL ?? 'claude-sonnet-4-5-20250929';
29
- const DRY_RUN = process.env.DRY_RUN === '1';
30
- const NO_SAVE = process.env.NO_SAVE === '1';
31
-
32
- if (!API_KEY && !DRY_RUN) {
33
- console.error('ERROR: Set ANTHROPIC_API_KEY env var, or run with DRY_RUN=1 to print prompts only.\n');
34
- console.error(' PowerShell: $env:ANTHROPIC_API_KEY="sk-ant-..."');
35
- console.error(' bash/zsh: export ANTHROPIC_API_KEY=sk-ant-...');
36
- process.exit(1);
37
- }
38
-
39
- // ─── Load data and compute R_t ──────────────────────────────────────────────
40
-
41
- const raw = await fs.readFile(DATA_PATH, 'utf8');
42
- const data = JSON.parse(raw);
43
- const now = new Date(data.now).getTime();
44
- const lambda = data.decay.lambda_per_hour;
45
-
46
- const docs = data.documents.map(d => {
47
- const published = new Date(d.published_at).getTime();
48
- const age_hours = (now - published) / (1000 * 60 * 60);
49
- const r_t = d.base_score * Math.exp(-lambda * age_hours);
50
- return { ...d, age_hours, r_t };
51
- });
52
-
53
- const baselineTop3 = [...docs].sort((a, b) => b.base_score - a.base_score).slice(0, 3);
54
- const freshTop3 = [...docs].sort((a, b) => b.r_t - a.r_t).slice(0, 3);
55
-
56
- // ─── Build prompts ──────────────────────────────────────────────────────────
57
-
58
- function buildPrompt(label, contextDocs) {
59
- const formatted = contextDocs.map((d, i) =>
60
- `[Document ${i + 1}]\nSource: ${d.source}\nPublished: ${d.published_at.slice(0, 10)}\nTitle: ${d.title}\nContent: ${d.content}`
61
- ).join('\n\n');
62
- return `You are answering a developer's technical question using the retrieved documents below. Be concrete and cite specific recommendations from the context. Format your answer as:
63
-
64
- 1. A one-line lede starting with "For RAG pipelines..." or "In 2026..." that names the recommended approach in bold-able terms.
65
- 2. A short bullet list (3-5 bullets) of specific recommendations with code spans where relevant.
66
- 3. A one-line closing note.
67
-
68
- Keep total length under 120 words. Output plain text — the demo's renderer will format the bullets.
69
-
70
- QUERY: ${data.query}
71
-
72
- RETRIEVED CONTEXT (top 3 by ${label}):
73
-
74
- ${formatted}
75
-
76
- Answer the query using these documents.`;
77
- }
78
-
79
- // ─── Call the API ───────────────────────────────────────────────────────────
80
-
81
- async function ask(prompt) {
82
- if (DRY_RUN) {
83
- console.log('--- DRY RUN: would have sent the following prompt ---');
84
- console.log(prompt);
85
- console.log('--- end prompt ---');
86
- return '[DRY_RUN: no API call made]';
87
- }
88
- const res = await fetch('https://api.anthropic.com/v1/messages', {
89
- method: 'POST',
90
- headers: {
91
- 'Content-Type': 'application/json',
92
- 'x-api-key': API_KEY,
93
- 'anthropic-version': '2023-06-01',
94
- },
95
- body: JSON.stringify({
96
- model: MODEL,
97
- max_tokens: 600,
98
- messages: [{ role: 'user', content: prompt }],
99
- }),
100
- });
101
- if (!res.ok) {
102
- const err = await res.text();
103
- console.error(`Anthropic API error (${res.status}):`, err);
104
- process.exit(1);
105
- }
106
- const json = await res.json();
107
- return json.content[0].text;
108
- }
109
-
110
- // ─── Convert raw model output → structured answer object ───────────────────
111
- // The model is asked for a lede + bullets + closing. Parse what came back into
112
- // the shape data.json's baked_answers expects. If parsing fails on a real run,
113
- // the script falls back to dumping the raw text into the `intro` field so the
114
- // page still renders something readable.
115
-
116
- function parseAnswer(text) {
117
- const lines = text.split('\n').map(l => l.trim()).filter(Boolean);
118
- const bullets = [];
119
- const otherLines = [];
120
- for (const line of lines) {
121
- const m = line.match(/^[-*•]\s+(.+)$/) || line.match(/^\d+\.\s+(.+)$/);
122
- if (m) bullets.push(m[1].trim());
123
- else otherLines.push(line);
124
- }
125
- const intro = otherLines[0] ?? text;
126
- const outro = otherLines.length > 1 ? otherLines[otherLines.length - 1] : '';
127
- if (bullets.length === 0) {
128
- return { intro: text, bullets: [], outro: '' };
129
- }
130
- return { intro, bullets, outro };
131
- }
132
-
133
- // ─── Run both, print side-by-side ──────────────────────────────────────────
134
-
135
- console.log(`Model: ${MODEL}`);
136
- console.log(`Query: "${data.query}"`);
137
- console.log(`Now: ${data.now}`);
138
- if (DRY_RUN) console.log('Mode: DRY_RUN — no API calls');
139
- if (NO_SAVE) console.log('Mode: NO_SAVE — won\'t write data.json');
140
- console.log();
141
-
142
- console.log('━'.repeat(72));
143
- console.log(' WITHOUT FreshContext — top 3 by semantic similarity (R₀)');
144
- console.log('━'.repeat(72));
145
- baselineTop3.forEach((d, i) =>
146
- console.log(` ${i + 1}. ${d.source.padEnd(28)} ${d.published_at.slice(0,10)} R₀=${d.base_score}`)
147
- );
148
- console.log();
149
- const baselineRaw = await ask(buildPrompt('semantic similarity (R₀)', baselineTop3));
150
- console.log(baselineRaw);
151
-
152
- console.log('\n' + '━'.repeat(72));
153
- console.log(' WITH FreshContext — top 3 by decay-adjusted relevancy (Rₜ)');
154
- console.log('━'.repeat(72));
155
- freshTop3.forEach((d, i) =>
156
- console.log(` ${i + 1}. ${d.source.padEnd(28)} ${d.published_at.slice(0,10)} Rₜ=${d.r_t.toFixed(1)}`)
157
- );
158
- console.log();
159
- const freshRaw = await ask(buildPrompt('decay-adjusted relevancy (Rₜ)', freshTop3));
160
- console.log(freshRaw);
161
-
162
- console.log('\n' + '━'.repeat(72));
163
- console.log(' Same model. Same retrieval set. Same query.');
164
- console.log(' Only the temporal layer changed.');
165
- console.log('━'.repeat(72));
166
-
167
- // ─── Persist to data.json (unless suppressed) ──────────────────────────────
168
-
169
- if (DRY_RUN || NO_SAVE) {
170
- console.log('\n(skipped writing data.json)');
171
- process.exit(0);
172
- }
173
-
174
- const baselineParsed = parseAnswer(baselineRaw);
175
- const freshParsed = parseAnswer(freshRaw);
176
-
177
- data.baked_answers.stale = {
178
- context_label: baselineTop3.map(d => `${d.published_at.slice(0,4)} ${d.source.split('.')[0]}`).join(' · '),
179
- intro: baselineParsed.intro,
180
- bullets: baselineParsed.bullets,
181
- outro: baselineParsed.outro,
182
- verdict_class: 'bad',
183
- verdict_text: data.baked_answers.stale.verdict_text, // preserve human-curated verdict
184
- };
185
- data.baked_answers.fresh = {
186
- context_label: freshTop3.map(d => `${d.published_at.slice(0,4)} ${d.source.split('.')[0]}`).join(' · '),
187
- intro: freshParsed.intro,
188
- bullets: freshParsed.bullets,
189
- outro: freshParsed.outro,
190
- verdict_class: 'good',
191
- verdict_text: data.baked_answers.fresh.verdict_text, // preserve human-curated verdict
192
- };
193
- data.baked_answers._last_regenerated = new Date().toISOString();
194
- data.baked_answers._last_model = MODEL;
195
- data.baked_answers._note = `Live-regenerated answers from ${MODEL}. Same model, same retrieval set, same query — only the temporal layer changed. Re-run \`node generate.mjs\` any time to refresh.`;
196
-
197
- await fs.writeFile(DATA_PATH, JSON.stringify(data, null, 2) + '\n', 'utf8');
198
- console.log(`\n✓ Wrote ${DATA_PATH}`);
199
- console.log(' Open index.html — the answers in the page now reflect this run.');