freshcontext-mcp 0.3.16 → 0.3.17
- package/.env.example +3 -0
- package/README.md +6 -6
- package/dist/adapters/finance.js +87 -101
- package/dist/adapters/gdelt.js +1 -1
- package/dist/adapters/gebiz.js +1 -1
- package/dist/adapters/hackernews.js +43 -13
- package/dist/adapters/productHunt.js +8 -4
- package/dist/adapters/repoSearch.js +1 -1
- package/dist/adapters/secFilings.js +1 -1
- package/dist/security.js +1 -1
- package/dist/server.js +10 -10
- package/dist/tools/freshnessStamp.js +23 -3
- package/freshcontext.schema.json +1 -1
- package/package.json +14 -7
- package/server.json +3 -3
- package/.github/workflows/publish.yml +0 -32
- package/RESEARCH.md +0 -487
- package/RISKS.md +0 -137
- package/cleanup.ps1 +0 -99
- package/demo/README.md +0 -70
- package/demo/data.json +0 -88
- package/demo/generate.mjs +0 -199
- package/demo/index.html +0 -513
- package/demo/logo-export.html +0 -61
- package/demo/logo.svg +0 -23
- package/freshcontext-validate.js +0 -196
- package/time-check.ps1 +0 -46
package/RISKS.md
DELETED
@@ -1,137 +0,0 @@
# RISKS — FreshContext DAR engine and ingestion pipeline

Known algorithmic and data-integrity edge cases in `worker/src/intelligence.ts` and the cron ingestion path. Last reviewed: 2026-05-01.

---

## Active risks (not yet mitigated)

These are real and unguarded. Tracked in CLAUDE.md "Things Pending".

### 1. Frozen signal paradox

When `publishedAt` is `null`, `applyDecay` falls back to `t = halfLifeHours`. The result is `R_t = R_0 · e^(−ln 2) = R_0 / 2` exactly — half the base score, deterministically. Such signals are pinned at half their base score forever, even after cron recompute. Permanent "stable" entropy.

- **Where:** `applyDecay`, intelligence.ts:172
- **Impact:** Signals without an extractable publication date accumulate as permanent middle-tier results.
- **Mitigation pending:** Hard floor (`R_t < 5` → mark expired) plus lazy decay at read time.
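
The frozen-signal behaviour can be reproduced with a minimal sketch. The `applyDecay` below is a hypothetical reimplementation built only from the description above; its signature, the `nowMs` parameter, and the 14-hour half-life are assumptions, not the code in `intelligence.ts`. It shows that an undated signal scores half its base score no matter how much time passes, while a dated one decays toward zero.

```javascript
// Hypothetical sketch of exponential decay with the null-date fallback.
// Signature and parameter names are assumptions for illustration only.
function applyDecay(baseScore, publishedAt, halfLifeHours, nowMs) {
  const lambda = Math.LN2 / halfLifeHours; // decay constant per hour
  let ageHours;
  if (publishedAt === null) {
    // Fallback described in the text: treat the signal as exactly one half-life old.
    ageHours = halfLifeHours;
  } else {
    // Clamp to 0 so a future date cannot produce a negative age.
    ageHours = Math.max(0, (nowMs - new Date(publishedAt).getTime()) / 3600000);
  }
  return baseScore * Math.exp(-lambda * ageHours);
}

const DAY_MS = 24 * 3600000;
const t0 = Date.parse('2026-05-01T00:00:00Z');

// Undated signal: same score today and 30 days from now.
const undatedNow = applyDecay(80, null, 14, t0);
const undatedLater = applyDecay(80, null, 14, t0 + 30 * DAY_MS);

// Dated signal: effectively zero after 30 days at a 14-hour half-life.
const datedLater = applyDecay(80, '2026-05-01T00:00:00Z', 14, t0 + 30 * DAY_MS);

console.log({ undatedNow, undatedLater, datedLater });
```

Because the fallback age is tied to the half-life rather than to the clock, no amount of cron recomputation moves the undated score, which is why a floor or an ingest-time timestamp is needed to let such signals expire.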

### 2. No hard floor on R_t

`is_relevant` uses `R_t >= 35`. Below 35, signals stay in the DB. Below 5 they're effectively dead but unflagged. Storage grows monotonically.

- **Where:** intelligence.ts:255, feed query at worker.ts:942
- **Mitigation pending:** Add `R_t < 5 → is_expired = 1` and exclude expired signals from the feed query.
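
A sketch of how the pending floor might look. Only the thresholds 35 and 5 and the column names `is_relevant`, `is_expired`, and `rt_score` come from the text above; `classifySignal`, `visibleInFeed`, and the in-memory row shape are hypothetical stand-ins for whatever the worker and the feed query actually do.

```javascript
const RELEVANCE_THRESHOLD = 35; // existing cut-off for is_relevant
const EXPIRY_FLOOR = 5;         // proposed floor below which a signal is expired

// Derive both flags from a decayed score.
function classifySignal(rtScore) {
  return {
    is_relevant: rtScore >= RELEVANCE_THRESHOLD ? 1 : 0,
    is_expired: rtScore < EXPIRY_FLOOR ? 1 : 0,
  };
}

// Stand-in for the feed query: drop rows the floor marks as expired.
function visibleInFeed(rows) {
  return rows.filter((row) => classifySignal(row.rt_score).is_expired === 0);
}

const rows = [
  { id: 'a', rt_score: 62 },
  { id: 'b', rt_score: 12 },
  { id: 'c', rt_score: 3.2 },
];
console.log(visibleInFeed(rows).map((row) => row.id)); // 'c' is filtered out
```

Rows between 5 and 35 stay visible to this filter but are not relevant; whether the real feed also filters on `is_relevant` is not stated in the source.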

### 3. Lazy decay missing

`rt_score` is computed at ingest, then recomputed by the cron every 6h. Reads return the stored value, so feed responses can be up to 6h stale.

- **Where:** Feed query reads `sr.rt_score` directly, no recompute.
- **Mitigation pending:** Recompute decay at read time, or shorten the cron interval.
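
To put a number on the staleness: with pure exponential decay, a score stored `Δ` hours ago overstates the current score by a factor of `2^(Δ / halfLife)`. The sketch below assumes that model and uses a 14-hour half-life as an example (the figure the demo's `data.json` note quotes for HN); `staleFactor` and `decayAtRead` are hypothetical helper names.

```javascript
// Ratio of the true read-time score to the stored score,
// assuming pure exponential decay since the last recompute.
function staleFactor(hoursSinceRecompute, halfLifeHours) {
  return Math.pow(0.5, hoursSinceRecompute / halfLifeHours);
}

// Lazy decay: adjust the stored value at read time instead of trusting it.
function decayAtRead(storedScore, hoursSinceRecompute, halfLifeHours) {
  return storedScore * staleFactor(hoursSinceRecompute, halfLifeHours);
}

// Worst case for a 6h cron and a 14h half-life.
const worstCase = staleFactor(6, 14);
console.log(worstCase.toFixed(3));               // about 0.743
console.log(decayAtRead(40, 6, 14).toFixed(1));  // stored 40 is really about 29.7
```

In this example a signal stored at 40, above the relevance threshold of 35, would already be below it at read time, so the staleness can change which signals the feed returns and not only their order.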

### 4. Re-ignition gap

`isDuplicate(48)` skips fingerprints seen in the last 48h. A story that trends → dies → re-trends within 48h is dropped at ingest. After 48h, a new ingestion is allowed and scored fresh.

- **Where:** intelligence.ts:333
- **Mitigation pending:** Detect re-ignition by comparing the dedup window to the original signal's age and decay state.

### 5. CPU timeout in cron loop

The DAR functions are O(n) on content length and individually cheap. The cron loop processes signals sequentially. At ~1k signals this is well under Workers CPU limits, but the failure mode at scale is unbounded loop time, not per-signal cost.

- **Mitigation pending:** Batch processing with explicit per-batch CPU budget tracking.

---

## Documented behaviors (by design, not bugs)

These are intentional but worth knowing.

### 6. Future dates in `applyDecay` score as freshest

`extractPublishedAt` filters future dates upstream. If any future code path calls `applyDecay` directly with a future-dated string, `t` is clamped to 0 by `Math.max(0, …)` and the signal scores as freshest possible.

- **Where:** `applyDecay`, intelligence.ts:176
- **Why this is OK currently:** Only `scoreSignal` calls `applyDecay`, and it pre-filters via `extractPublishedAt`. Defense-in-depth would add an explicit reject in `applyDecay` itself.

### 7. Semantic fingerprint truncates titles to 80 characters

`semanticFingerprint` slices the normalised title to 80 chars before hashing. Two articles whose titles diverge only after the 80th character will collide on fingerprint and dedupe. Most CMSs produce titles well under 80 chars; this is a tradeoff for fingerprint stability across whitespace/encoding noise.

- **Where:** intelligence.ts:318
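
The collision is easy to see on the pre-hash key alone. The sketch below is an assumption-laden stand-in: `titleKey` and its normalisation (lowercase, collapse non-alphanumerics to a space) are hypothetical, and only the 80-character slice comes from the description above. Hashing is omitted because equal keys hash equally under any hash.

```javascript
// Hypothetical normalisation followed by the 80-character slice.
function titleKey(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ') // collapse punctuation and whitespace runs
    .trim()
    .slice(0, 80);               // everything past this point is ignored
}

// Two distinct articles whose titles share a long prefix (83 characters).
const base = 'Quarterly report on the state of open source vector databases and retrieval tooling';
const first = titleKey(base + ', part 1');
const second = titleKey(base + ', part 2');

console.log(first === second);                        // keys collide
console.log(titleKey('Part 1') === titleKey('Part 2')); // short titles do not
```

Series posts and templated report titles are the realistic trigger here, since they put the distinguishing token at the end of the title.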

### 8. Empty and whitespace-only content collide on a single fingerprint

Both reduce to the input string `"||"` and hash to the same 16-char fingerprint. If garbage signals leak past adapter validation, the second one is silently dropped at `isDuplicate`. Storage-wise benign, but a future debugging trap (you'll see one phantom "empty signal" in the DB and never know how many were ingested).

- **Mitigation if needed:** Short-circuit fingerprint to `null` when title + url + date are all empty, and have the cron skip those.

### 9. `parseStoredProfile` degrades silently on garbage JSON

If `targets` or `skills` columns contain malformed JSON, `safeParse` falls back to comma-split. Garbage tokens enter as profile keywords. They won't match real content, but no error is raised.

- **Where:** intelligence.ts:273
- **Mitigation if needed:** Tighten profile validation at write time; D1 can't enforce JSON shape, so this would have to be a write-path check.
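
A small sketch of the silent fallback, assuming `safeParse` behaves as described (try JSON, otherwise split on commas). The body below is a hypothetical reconstruction, not the function in `intelligence.ts`; it exists to show what a garbage token looks like once it has entered the profile.

```javascript
// Hypothetical reconstruction: JSON array if possible, comma-split otherwise.
function safeParse(raw) {
  try {
    const parsed = JSON.parse(raw);
    if (Array.isArray(parsed)) return parsed.map(String);
  } catch (err) {
    // Malformed JSON: fall through to the comma-split path without raising.
  }
  return String(raw)
    .split(',')
    .map((token) => token.trim())
    .filter(Boolean);
}

const clean = safeParse('["typescript","rust"]');
const broken = safeParse('["typescript",rust'); // missing quote and bracket

console.log(clean);  // two usable keywords
console.log(broken); // first token keeps the stray bracket and quotes
```

The broken input yields the keyword `["typescript"` with its bracket and quotes attached, so the user's intended `typescript` target stops matching and nothing reports it. A write-path check could reject any value that does not parse as a JSON array of strings.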

### 10. Excluded signals still complete the scoring pipeline

`scoreSignal` zeroes the score on exclusion match but still computes the fingerprint and audit signature, and the cron presumably inserts the row with `is_relevant=0`. If exclusion-matching content is high-volume, that's pure storage cost.

- **Mitigation if needed:** Skip insertion when `R_0 == 0`.

### 11. `ha_pri_sig` is integrity, not authentication

`PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1"` is hardcoded in a public repo. Anyone with the source can forge a valid `ha_pri_sig` for any `(resultId, contentHash)` pair. The signature proves the pair was hashed together at some point — it does not prove "scored by this engine".

- **Where:** intelligence.ts:195
- **Why this matters:** METHODOLOGY.md describes the signature as proving provenance. That's accurate against accidental tampering, not against an adversary. If a customer ever needs cryptographic provenance, the salt would need to move to a secret binding (env var) and signatures would need to be reissued.

### 12. Trailing-slash URLs do not collapse

`https://example.com/conf` and `https://example.com/conf/` produce different fingerprints. Most sites canonicalize one or the other, but not all. Minor dedup miss.

- **Mitigation if needed:** Strip trailing `/` from `u.pathname` in `semanticFingerprint`.
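
The suggested mitigation could be sketched with the standard `URL` class. `canonicalisePath` is a hypothetical helper name, and how its result would feed into `semanticFingerprint` is not shown in the source.

```javascript
// Strip trailing slashes from the path, but leave the bare root "/" alone.
function canonicalisePath(rawUrl) {
  const u = new URL(rawUrl);
  u.pathname = u.pathname.replace(/\/+$/, '') || '/';
  return u.toString();
}

const withSlash = canonicalisePath('https://example.com/conf/');
const withoutSlash = canonicalisePath('https://example.com/conf');
const root = canonicalisePath('https://example.com/');

console.log(withSlash === withoutSlash); // both forms now agree
console.log(root);                       // root URL is unchanged
```

This trades one kind of miss for another: on the rare site where `/conf` and `/conf/` serve different pages, the two would now dedupe against each other.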

---

## Resolved 2026-05-01

Stress-test pass on 2026-05-01 surfaced the following data-integrity bugs and fixed them in-place:

- **Duplicate keywords inflated R_0** — `calculateBaseScore` filtered raw `targets` / `skills` arrays without deduping. A profile with duplicate entries (`["typescript","typescript","typescript"]`) inflated the score by +15 per duplicate, capped at +35. Fixed by deduping via `new Set` before matching. Verified: dupe and single profiles now score identically (R_0=55 each).

- **Malformed dates rolled silently** — `extractPublishedAt` accepted dates like `2024-02-30` because JS `new Date('2024-02-30')` rolls to Mar 1 instead of returning Invalid Date. Fixed by adding a round-trip check: reject if `new Date(ts).toISOString().slice(0,10) !== originalString`. Verified: `2024-02-30` → null, `2024-02-29` (valid leap day) → `2024-02-29`.

- **Querystring stripping was too aggressive** — `semanticFingerprint` stripped *all* querystrings, causing `?id=1` and `?id=2` to collide. Fixed by parsing the URL and removing only known tracking params (`utm_*`, `fbclid`, `gclid`, `mc_*`, `igshid`); legitimate query identifiers are preserved. Verified: `?id=1` vs `?id=2` now differ; `?utm_source=hn` vs `?utm_source=reddit` still collide as intended.

- **Hidden 50-char content threshold killed legitimate short signals** — `calculateBaseScore` had a `raw.length < 50 → −40` penalty grouped with `[ERROR]` and `"not found"` checks. The early `< 20` reject is the real floor; the 50-char clause silently zeroed legitimate short content like "OpenAI launches Atlas browser typescript" (40 chars). Removed. Verified: that content now scores R_0=58 (was 15).
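
Two of the fixes above, the date round-trip check and the selective query-string stripping, can be sketched in a few lines. The function names `isRealCalendarDate` and `stripTrackingParams` are hypothetical; the round-trip comparison and the list of tracking parameters are taken from the bullets, and everything else is an assumption about how they might be wired up.

```javascript
// Round-trip check: a YYYY-MM-DD string is accepted only if parsing it and
// formatting it back yields the same string. Rolled dates fail the comparison;
// unparseable ones fail the NaN guard (toISOString would throw on them).
function isRealCalendarDate(dateString) {
  const ts = Date.parse(dateString);
  if (Number.isNaN(ts)) return false;
  return new Date(ts).toISOString().slice(0, 10) === dateString;
}

const TRACKING_EXACT = new Set(['fbclid', 'gclid', 'igshid']);
const TRACKING_PREFIX = /^(utm_|mc_)/;

// Remove only known tracking parameters; keep identifiers such as ?id=1.
function stripTrackingParams(rawUrl) {
  const u = new URL(rawUrl);
  // Copy the keys first so deleting does not disturb the iteration.
  for (const key of [...u.searchParams.keys()]) {
    if (TRACKING_EXACT.has(key) || TRACKING_PREFIX.test(key)) {
      u.searchParams.delete(key);
    }
  }
  return u.toString();
}

console.log(isRealCalendarDate('2024-02-29')); // valid leap day
console.log(isRealCalendarDate('2024-02-30')); // rejected
console.log(stripTrackingParams('https://example.com/a?id=1&utm_source=hn'));
```

The round-trip check only works for date-only strings in `YYYY-MM-DD` form, which are parsed as UTC; a string with a time or offset would need its own comparison.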

---

## Behaviors confirmed working as intended (2026-05-01)

- Future dates filtered at extraction (`2030-01-01` → `null`)
- Bad month/day filtered (`2024-13-45` → `null` via `isNaN`)
- Multiple dates → newest valid wins
- Case-insensitive matching (`TYPESCRIPT` matches target `typescript`)
- Punctuation/case differences in titles → same fingerprint (good dedup)
- `http` vs `https` → different fingerprints (defensive)
- `utm_*` variants → same fingerprint (intentional dedup, preserved post-fix)
- Score capped at 100 and floored at 0
- Unknown adapter falls back to default lambda (0.001)
- Float underflow → 0 silently (no NaN/Infinity)

---

## How to re-run the stress test

The probe script is not committed (deleted after each run to keep the working tree clean). To regenerate:

1. Create `worker/probe.ts` that imports the pure functions from `./src/intelligence`.
2. Feed adversarial inputs: empty / oversized content, malformed/future/rolled dates, duplicate target keywords, UTM-only URL diffs, title-collision boundaries, exclusion matches.
3. Run with `cd worker && npx tsx probe.ts`.
4. Delete the probe script after.

A live load test was also run on 2026-05-01: 755 requests across `/health`, `/debug/db`, `/v1/intel/feed/default`. Zero errors. p50 latencies: `/health` 180ms, `/debug/db` 0.8–1.2s, `/v1/intel/feed/` 0.6–0.9s at concurrency 10–20. No cliffs found at tested loads; the limiting factor was the client, so finding real cliffs needs something faster than `xargs + curl.exe` on Windows.
package/cleanup.ps1
DELETED
@@ -1,99 +0,0 @@
# cleanup.ps1 — One-time repo cleanup for freshcontext-mcp
# Run from the repo root: powershell -ExecutionPolicy Bypass -File cleanup.ps1
# Safe: only moves files into _archive/ subfolders. No deletions.

$ErrorActionPreference = "Stop"
$repo = "C:\Users\Immanuel Gabriel\Downloads\freshcontext-mcp"
Set-Location $repo

Write-Host "=== FreshContext repo cleanup ===" -ForegroundColor Cyan
Write-Host "Repo: $repo" -ForegroundColor Gray
Write-Host ""

# Helper: move with git mv if tracked, plain move otherwise
function Move-RepoFile {
    param([string]$From, [string]$ToDir)
    if (-not (Test-Path $From)) {
        Write-Host " SKIP (not found): $From" -ForegroundColor DarkGray
        return
    }
    $filename = Split-Path $From -Leaf
    $to = Join-Path $ToDir $filename

    # Check if file is tracked by git
    $tracked = git ls-files --error-unmatch $From 2>$null
    if ($LASTEXITCODE -eq 0) {
        git mv $From $to | Out-Null
        Write-Host " git mv $filename -> $ToDir/" -ForegroundColor Green
    } else {
        Move-Item -Path $From -Destination $to -Force
        Write-Host " move $filename -> $ToDir/" -ForegroundColor Yellow
    }
}

# --- Session saves -> _archive/sessions/ ---
Write-Host "Moving session saves..." -ForegroundColor Cyan
$sessions = @(
    "SESSION_SAVE_V3.md",
    "SESSION_SAVE_V4.md",
    "SESSION_SAVE_V5.md",
    "SESSION_SAVE_V5b.md",
    "SESSION_SAVE_V6.md",
    "SESSION_SAVE_V7.md",
    "SESSION_SAVE_V8.md",
    "SESSION_SAVE_V9.md",
    "SESSION_SAVE_V9b.md",
    "SESSION_SAVE_ARCHITECTURE_V1.md",
    "SESSION_SAVE_ARCHITECTURE_V2.md",
    "CONTEXT_SKILL.md"
)
foreach ($f in $sessions) {
    Move-RepoFile -From $f -ToDir "_archive\sessions"
}

# --- Superseded architecture plans -> _archive/architecture/ ---
Write-Host ""
Write-Host "Moving superseded architecture plans..." -ForegroundColor Cyan
$architecture = @(
    "ARCHITECTURE_UPGRADE_CHECKLIST.md",
    "ARCHITECTURE_UPGRADE_ROADMAP_V1.md"
)
foreach ($f in $architecture) {
    Move-RepoFile -From $f -ToDir "_archive\architecture"
}

# --- Launch drafts -> _archive/launch-drafts/ ---
Write-Host ""
Write-Host "Moving launch drafts..." -ForegroundColor Cyan
$drafts = @(
    "LAUNCH_POSTS_V9.md",
    "LAUNCH_POSTS_TODAY.md",
    "HN_THROWAWAY_FRIDAY.md"
)
foreach ($f in $drafts) {
    Move-RepoFile -From $f -ToDir "_archive\launch-drafts"
}

# --- Untracked junk: keep locally but make sure git ignores them ---
Write-Host ""
Write-Host "Cleaning git index of newly-ignored files..." -ForegroundColor Cyan
$ignoredButTracked = @("backup.sql", "mcp-publisher.exe")
foreach ($f in $ignoredButTracked) {
    if (Test-Path $f) {
        $tracked = git ls-files --error-unmatch $f 2>$null
        if ($LASTEXITCODE -eq 0) {
            git rm --cached $f | Out-Null
            Write-Host " git rm --cached $f (file kept locally)" -ForegroundColor Yellow
        } else {
            Write-Host " $f already untracked" -ForegroundColor DarkGray
        }
    }
}

Write-Host ""
Write-Host "=== Cleanup complete ===" -ForegroundColor Cyan
Write-Host ""
Write-Host "Next steps:" -ForegroundColor White
Write-Host " 1. Review changes: git status" -ForegroundColor Gray
Write-Host " 2. Commit: git commit -m 'chore: archive session saves + tighten gitignore + clean repo root'" -ForegroundColor Gray
Write-Host " 3. Push: git push origin main" -ForegroundColor Gray
package/demo/README.md
DELETED
@@ -1,70 +0,0 @@
# FreshContext — Live Demo

> **Same model. Same retrieval set. Same query. Two completely different answers — because one of them remembered when its sources were written.**

A 5-document demonstration of why semantic-only retrieval gives 2026 systems 2022 answers — and what it costs to fix it.

## What's in this folder

| File | Purpose |
|------|---------|
| `index.html` | The shareable demo. Self-contained, no server needed. Open in any browser. |
| `data.json` | The mock retrieval set (5 documents, mixed timestamps, semantic scores). |
| `generate.mjs` | Calls the live Anthropic API to regenerate the two answers — proves they aren't hand-written. |
| `README.md` | This file. |

## View the demo

Open `index.html` in any browser. R<sub>t</sub> is computed live on page load, so the math stays current as documents age.

To share it as a link:

- **Cloudflare Pages:** drop the `demo/` folder into a Pages project — done.
- **GitHub Pages:** push `demo/` to a `gh-pages` branch.
- **Static host:** any S3, Netlify, or Vercel deploy works. No build step.

## Verify the answers are real

The two answers in `index.html` are pre-baked. To prove they aren't hand-written, regenerate them with a real Claude API call:

```powershell
# PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
node generate.mjs
```

```bash
# bash / zsh
ANTHROPIC_API_KEY=sk-ant-... node generate.mjs
```

The script:

1. Loads `data.json`
2. Computes R<sub>t</sub> for every document
3. Builds two prompts — one with the top-3 by R<sub>0</sub> (semantic), one with the top-3 by R<sub>t</sub> (decay-adjusted)
4. Calls the configured model for both (`generate.mjs` defaults to `claude-sonnet-4-5-20250929`; override with the `MODEL` env var)
5. Prints both answers side-by-side

You'll see the same kind of divergence the demo shows. Different versions of Claude will phrase it differently, but the *direction* of the change is structurally guaranteed: whatever the top-3 says, that's what the model anchors on.

## What this demonstrates

- **It's not a model problem.** Claude isn't wrong in the baseline — it faithfully summarized stale context.
- **It's not an embedding problem.** Cosine similarity scores were correct.
- **It's a context-engineering problem.** Retrieval ranks correctly along one axis (semantic similarity) and ignores another axis that matters in production (temporal validity).

> Most RAG pipelines rank context correctly semantically but incorrectly temporally.

## Run it against your own data

Replace `data.json` with your own retrieval output. The shape is documented inline. The HTML and the script will pick up your new query, your documents, your timestamps. The math doesn't change.

## Where this comes from

- Repo: <https://github.com/PrinceGabriel-lgtm/freshcontext-mcp>
- Spec: <https://freshcontext-site.pages.dev>
- npm: `npm install freshcontext-mcp`
- Live API: `https://freshcontext-mcp.gimmanuel73.workers.dev/v1/intel/feed/default`

Built by Immanuel Gabriel · Grootfontein, Namibia · MIT licensed.
package/demo/data.json
DELETED
@@ -1,88 +0,0 @@
{
  "query": "What's the recommended way to chunk documents for RAG in 2026?",
  "now": "2026-05-08T00:00:00Z",
  "decay": {
    "lambda_per_hour": 0.0001,
    "halflife_hours": 6931,
    "halflife_days_human": "≈ 9.5 months",
    "note": "Single decay constant across sources for demo clarity. The live engine uses source-specific lambdas (HN: 14h half-life, blogs: 29 days, papers: 1.6 years)."
  },
  "documents": [
    {
      "id": "doc_2022_langchain_blog",
      "source": "blog.langchain.dev",
      "title": "RAG Chunking: The Complete Guide",
      "published_at": "2022-08-15T00:00:00Z",
      "base_score": 92,
      "content": "For most RAG applications, the recommended approach is fixed-size chunking with 1000 tokens per chunk and 200 token overlap. Use RecursiveCharacterTextSplitter with this configuration. This works for almost any document type and is the LangChain default. Studies show fixed-size chunking is sufficient for 95% of use cases when paired with a good embedding model like ada-002.",
      "why_high_semantic_score": "Authoritative source, dense keyword match (RAG, chunking, recommended, LangChain), confident tone."
    },
    {
      "id": "doc_2023_reddit_localllama",
      "source": "reddit.com/r/LocalLLaMA",
      "title": "What chunk size do you use for RAG?",
      "published_at": "2023-01-20T00:00:00Z",
      "base_score": 81,
      "content": "Most people use 512 or 1024 tokens with 10-20% overlap. Don't overthink it. Just use the LangChain RecursiveCharacterTextSplitter and tune chunk_size to match your embedding model's context window. Higher overlap = more storage, marginal accuracy gains.",
      "why_high_semantic_score": "Strong community signal (high upvotes), direct answer to query, confirms authoritative source above."
    },
    {
      "id": "doc_2024_arxiv_paper",
      "source": "arxiv.org/abs/2410.xxxxx",
      "title": "Semantic Chunking vs Fixed-Size: An Empirical Study",
      "published_at": "2024-06-12T00:00:00Z",
      "base_score": 78,
      "content": "We evaluate semantic chunking against fixed-size baselines across 12 retrieval tasks. Semantic chunking outperforms fixed-size in 73% of cases, particularly for technical documents. However, the computational overhead of semantic chunking is 4.2x higher.",
      "why_high_semantic_score": "Empirical evidence, scientific framing, but slightly off-axis from the direct query."
    },
    {
      "id": "doc_2025_langchain_late",
      "source": "blog.langchain.com",
      "title": "Late Chunking and Contextual Retrieval",
      "published_at": "2025-09-30T00:00:00Z",
      "base_score": 86,
      "content": "Late chunking with embedding-aware boundaries (bge-late-chunk, jina-late) outperforms both fixed-size and semantic chunking. Combine with contextual retrieval per Anthropic's research for state-of-the-art accuracy. The fixed-size 1000-token recommendation from earlier guides is now considered a baseline at best.",
      "why_high_semantic_score": "Same authoritative source as #1, strong keyword match, more recent."
    },
    {
      "id": "doc_2026_x_post",
      "source": "x.com/jerryjliu0",
      "title": "RAG failures we see in 2026",
      "published_at": "2026-02-08T00:00:00Z",
      "base_score": 80,
      "content": "Top RAG failure mode in production right now: teams still using fixed-size chunking from 2022 tutorials. Move to late-interaction or contextual retrieval. The math has changed; the 'best practice' docs haven't caught up. If your retrieval is still ranking by cosine similarity alone, you're shipping 2022 results in 2026.",
      "why_high_semantic_score": "Practitioner expertise, direct claim about query topic, but social-media format slightly reduces baseline semantic confidence."
    }
  ],
  "baked_answers": {
    "_note": "These are the two answers the demo displays by default. Anyone landing on the page sees them — no API key required. To regenerate from a real Claude call, run `node generate.mjs` with ANTHROPIC_API_KEY set; the script overwrites these two fields. Last regenerated: never (illustrative outputs).",
    "_last_regenerated": null,
    "_last_model": null,
    "stale": {
      "context_label": "2022 LangChain blog · 2025 LangChain blog · 2023 Reddit",
      "intro": "For RAG pipelines, the recommended approach is <strong>fixed-size chunking</strong>:",
      "bullets": [
        "1,000 tokens per chunk with 200-token overlap",
        "Use <span class=\"inline-code\">RecursiveCharacterTextSplitter</span> — the LangChain default",
        "Pair with a strong embedding model like <span class=\"inline-code\">ada-002</span>",
        "Tune <span class=\"inline-code\">chunk_size</span> to match your embedding model's context window"
      ],
      "outro": "This approach is widely confirmed by both authoritative documentation and community consensus.",
      "verdict_class": "bad",
      "verdict_text": "⚠ This is 2022 advice. ada-002 is no longer SOTA. Fixed-size chunking has been superseded."
    },
    "fresh": {
      "context_label": "2026 X post · 2025 LangChain blog · 2024 arXiv paper",
      "intro": "In 2026, the current state-of-the-art is <strong>late chunking with embedding-aware boundaries</strong>:",
      "bullets": [
        "Use late-chunking models like <span class=\"inline-code\">bge-late-chunk</span> or <span class=\"inline-code\">jina-late</span>",
        "Combine with <strong>contextual retrieval</strong> (per Anthropic's research) for SOTA accuracy",
        "Avoid fixed-size 1000-token chunking — it's now considered a baseline at best",
        "For technical documents, semantic chunking remains viable (73% win rate over fixed-size per the 2024 study), but carries 4.2× compute overhead"
      ],
      "outro": "Common 2026 production failure: teams still applying 2022-era tutorial recommendations.",
      "verdict_class": "good",
      "verdict_text": "✓ Current as of May 2026. Reflects 2025–2026 consensus while preserving the 2024 empirical context."
    }
  }
}
package/demo/generate.mjs
DELETED
|
@@ -1,199 +0,0 @@
|
|
|
1
|
-
/**
|
|
2
|
-
* generate.mjs — regenerate the demo's two answers from a real Claude call,
|
|
3
|
-
* then write them back into data.json so the HTML picks them up automatically.
|
|
4
|
-
*
|
|
5
|
-
* The demo works WITHOUT this script. The baked_answers in data.json are
|
|
6
|
-
* displayed by default — no API key required to view the page. This script
|
|
7
|
-
* exists so anyone who wants to verify the math actually changes Claude's
|
|
8
|
-
* answer can plug in their own key and prove it.
|
|
9
|
-
*
|
|
10
|
-
* Run:
|
|
11
|
-
* PowerShell: $env:ANTHROPIC_API_KEY="sk-ant-..."; node generate.mjs
|
|
12
|
-
* bash/zsh: ANTHROPIC_API_KEY=sk-ant-... node generate.mjs
|
|
13
|
-
*
|
|
14
|
-
* Optional env:
|
|
15
|
-
* MODEL — model string, defaults to claude-sonnet-4-5-20250929
|
|
16
|
-
* DRY_RUN=1 — print the prompts and exit, don't call the API
|
|
17
|
-
* NO_SAVE=1 — call the API and print, but don't write data.json
|
|
18
|
-
*/
|
|
19
|
-
|
|
20
|
-
import fs from 'fs/promises';
|
|
21
|
-
import path from 'path';
|
|
22
|
-
import { fileURLToPath } from 'url';
|
|
23
|
-
|
|
24
|
-
const __dirname = path.dirname(fileURLToPath(import.meta.url));
|
|
25
|
-
const DATA_PATH = path.join(__dirname, 'data.json');
|
|
26
|
-
|
|
27
|
-
const API_KEY = process.env.ANTHROPIC_API_KEY;
|
|
28
|
-
const MODEL = process.env.MODEL ?? 'claude-sonnet-4-5-20250929';
|
|
29
|
-
const DRY_RUN = process.env.DRY_RUN === '1';
|
|
30
|
-
const NO_SAVE = process.env.NO_SAVE === '1';
|
|
31
|
-
|
|
32
|
-
if (!API_KEY && !DRY_RUN) {
|
|
33
|
-
console.error('ERROR: Set ANTHROPIC_API_KEY env var, or run with DRY_RUN=1 to print prompts only.\n');
|
|
34
|
-
console.error(' PowerShell: $env:ANTHROPIC_API_KEY="sk-ant-..."');
|
|
35
|
-
console.error(' bash/zsh: export ANTHROPIC_API_KEY=sk-ant-...');
|
|
36
|
-
process.exit(1);
|
|
37
|
-
}
|
|
38
|
-
|
|
39
|
-
// ─── Load data and compute R_t ──────────────────────────────────────────────
|
|
40
|
-
|
|
41
|
-
const raw = await fs.readFile(DATA_PATH, 'utf8');
|
|
42
|
-
const data = JSON.parse(raw);
|
|
43
|
-
const now = new Date(data.now).getTime();
|
|
44
|
-
const lambda = data.decay.lambda_per_hour;
|
|
45
|
-
|
|
46
|
-
const docs = data.documents.map(d => {
|
|
47
|
-
const published = new Date(d.published_at).getTime();
|
|
48
|
-
const age_hours = (now - published) / (1000 * 60 * 60);
|
|
49
|
-
const r_t = d.base_score * Math.exp(-lambda * age_hours);
|
|
50
|
-
return { ...d, age_hours, r_t };
|
|
51
|
-
});
|
|
52
|
-
|
|
53
|
-
const baselineTop3 = [...docs].sort((a, b) => b.base_score - a.base_score).slice(0, 3);
|
|
54
|
-
const freshTop3 = [...docs].sort((a, b) => b.r_t - a.r_t).slice(0, 3);
|
|
55
|
-
|
|
56
|
-
// ─── Build prompts ──────────────────────────────────────────────────────────
|
|
57
|
-
|
|
58
|
-
function buildPrompt(label, contextDocs) {
|
|
59
|
-
const formatted = contextDocs.map((d, i) =>
|
|
60
|
-
`[Document ${i + 1}]\nSource: ${d.source}\nPublished: ${d.published_at.slice(0, 10)}\nTitle: ${d.title}\nContent: ${d.content}`
|
|
61
|
-
).join('\n\n');
|
|
62
|
-
return `You are answering a developer's technical question using the retrieved documents below. Be concrete and cite specific recommendations from the context. Format your answer as:
|
|
63
|
-
|
|
64
|
-
1. A one-line lede starting with "For RAG pipelines..." or "In 2026..." that names the recommended approach in bold-able terms.
|
|
65
|
-
2. A short bullet list (3-5 bullets) of specific recommendations with code spans where relevant.
|
|
66
|
-
3. A one-line closing note.
|
|
67
|
-
|
|
68
|
-
Keep total length under 120 words. Output plain text — the demo's renderer will format the bullets.
|
|
69
|
-
|
|
70
|
-
QUERY: ${data.query}
|
|
71
|
-
|
|
72
|
-
RETRIEVED CONTEXT (top 3 by ${label}):
|
|
73
|
-
|
|
74
|
-
${formatted}
|
|
75
|
-
|
|
76
|
-
Answer the query using these documents.`;
|
|
77
|
-
}
|
|
78
|
-
|
|
79
|
-
// ─── Call the API ───────────────────────────────────────────────────────────
|
|
80
|
-
|
|
81
|
-
async function ask(prompt) {
|
|
82
|
-
if (DRY_RUN) {
|
|
83
|
-
console.log('--- DRY RUN: would have sent the following prompt ---');
|
|
84
|
-
console.log(prompt);
|
|
85
|
-
console.log('--- end prompt ---');
|
|
86
|
-
return '[DRY_RUN: no API call made]';
|
|
87
|
-
}
|
|
88
|
-
const res = await fetch('https://api.anthropic.com/v1/messages', {
|
|
89
|
-
method: 'POST',
|
|
90
|
-
headers: {
|
|
91
|
-
'Content-Type': 'application/json',
|
|
92
|
-
'x-api-key': API_KEY,
|
|
93
|
-
'anthropic-version': '2023-06-01',
|
|
94
|
-
},
|
|
95
|
-
body: JSON.stringify({
|
|
96
|
-
model: MODEL,
|
|
97
|
-
max_tokens: 600,
|
|
98
|
-
messages: [{ role: 'user', content: prompt }],
|
|
99
|
-
}),
|
|
100
|
-
});
|
|
101
|
-
if (!res.ok) {
|
|
102
|
-
const err = await res.text();
|
|
103
|
-
console.error(`Anthropic API error (${res.status}):`, err);
|
|
104
|
-
process.exit(1);
|
|
105
|
-
}
|
|
106
|
-
const json = await res.json();
|
|
107
|
-
return json.content[0].text;
|
|
108
|
-
}
|
|
109
|
-
|
-// ─── Convert raw model output → structured answer object ───────────────────
-// The model is asked for a lede + bullets + closing. Parse what came back into
-// the shape data.json's baked_answers expects. If parsing fails on a real run,
-// the script falls back to dumping the raw text into the `intro` field so the
-// page still renders something readable.
-
-function parseAnswer(text) {
-  const lines = text.split('\n').map(l => l.trim()).filter(Boolean);
-  const bullets = [];
-  const otherLines = [];
-  for (const line of lines) {
-    const m = line.match(/^[-*•]\s+(.+)$/) || line.match(/^\d+\.\s+(.+)$/);
-    if (m) bullets.push(m[1].trim());
-    else otherLines.push(line);
-  }
-  const intro = otherLines[0] ?? text;
-  const outro = otherLines.length > 1 ? otherLines[otherLines.length - 1] : '';
-  if (bullets.length === 0) {
-    return { intro: text, bullets: [], outro: '' };
-  }
-  return { intro, bullets, outro };
-}
-
-// ─── Run both, print side-by-side ──────────────────────────────────────────
-
-console.log(`Model: ${MODEL}`);
-console.log(`Query: "${data.query}"`);
-console.log(`Now: ${data.now}`);
-if (DRY_RUN) console.log('Mode: DRY_RUN — no API calls');
-if (NO_SAVE) console.log('Mode: NO_SAVE — won\'t write data.json');
-console.log();
-
-console.log('━'.repeat(72));
-console.log(' WITHOUT FreshContext — top 3 by semantic similarity (R₀)');
-console.log('━'.repeat(72));
-baselineTop3.forEach((d, i) =>
-  console.log(` ${i + 1}. ${d.source.padEnd(28)} ${d.published_at.slice(0,10)} R₀=${d.base_score}`)
-);
-console.log();
-const baselineRaw = await ask(buildPrompt('semantic similarity (R₀)', baselineTop3));
-console.log(baselineRaw);
-
-console.log('\n' + '━'.repeat(72));
-console.log(' WITH FreshContext — top 3 by decay-adjusted relevancy (Rₜ)');
-console.log('━'.repeat(72));
-freshTop3.forEach((d, i) =>
-  console.log(` ${i + 1}. ${d.source.padEnd(28)} ${d.published_at.slice(0,10)} Rₜ=${d.r_t.toFixed(1)}`)
-);
-console.log();
-const freshRaw = await ask(buildPrompt('decay-adjusted relevancy (Rₜ)', freshTop3));
-console.log(freshRaw);
-
-console.log('\n' + '━'.repeat(72));
-console.log(' Same model. Same retrieval set. Same query.');
-console.log(' Only the temporal layer changed.');
-console.log('━'.repeat(72));
-
-// ─── Persist to data.json (unless suppressed) ──────────────────────────────
-
-if (DRY_RUN || NO_SAVE) {
-  console.log('\n(skipped writing data.json)');
-  process.exit(0);
-}
-
-const baselineParsed = parseAnswer(baselineRaw);
-const freshParsed = parseAnswer(freshRaw);
-
-data.baked_answers.stale = {
-  context_label: baselineTop3.map(d => `${d.published_at.slice(0,4)} ${d.source.split('.')[0]}`).join(' · '),
-  intro: baselineParsed.intro,
-  bullets: baselineParsed.bullets,
-  outro: baselineParsed.outro,
-  verdict_class: 'bad',
-  verdict_text: data.baked_answers.stale.verdict_text, // preserve human-curated verdict
-};
-data.baked_answers.fresh = {
-  context_label: freshTop3.map(d => `${d.published_at.slice(0,4)} ${d.source.split('.')[0]}`).join(' · '),
-  intro: freshParsed.intro,
-  bullets: freshParsed.bullets,
-  outro: freshParsed.outro,
-  verdict_class: 'good',
-  verdict_text: data.baked_answers.fresh.verdict_text, // preserve human-curated verdict
-};
-data.baked_answers._last_regenerated = new Date().toISOString();
-data.baked_answers._last_model = MODEL;
-data.baked_answers._note = `Live-regenerated answers from ${MODEL}. Same model, same retrieval set, same query — only the temporal layer changed. Re-run \`node generate.mjs\` any time to refresh.`;
-
-await fs.writeFile(DATA_PATH, JSON.stringify(data, null, 2) + '\n', 'utf8');
-console.log(`\n✓ Wrote ${DATA_PATH}`);
-console.log(' Open index.html — the answers in the page now reflect this run.');