freshcontext-mcp 0.3.15 → 0.3.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.env.example ADDED
@@ -0,0 +1,8 @@
1
+ # freshcontext-mcp environment variables
2
+ # Copy to .env and fill in
3
+
4
+ # Optional: GitHub Personal Access Token (increases rate limits for GitHub API fallback)
5
+ GITHUB_TOKEN=
6
+
7
+ # Optional: Proxy URL if needed for certain extractions
8
+ # PROXY_URL=http://user:pass@host:port
package/README.md CHANGED
@@ -10,13 +10,44 @@ That's the problem freshcontext fixes.
10
10
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
11
11
  [![MCP Registry](https://img.shields.io/badge/MCP%20Registry-Listed-blue)](https://registry.modelcontextprotocol.io)
12
12
 
13
+ > **Live demo:** [freshcontext-mcp.gimmanuel73.workers.dev/demo](https://freshcontext-mcp.gimmanuel73.workers.dev/demo) — same model, same query, two completely different answers. Only the temporal layer changed.
14
+
13
15
  ---
14
16
 
15
- ## The Standard
17
+ ## The problem
18
+
19
+ Large language models retrieve web data semantically. Cosine similarity finds the documents that match a query best — but cosine doesn't know when a document was written.
20
+
21
+ So a 2022 blog post and a 2026 paper can score nearly identically. The model gets a context window full of stale documents and faithfully summarizes 2022 advice for a 2026 question.
22
+
23
+ That's not hallucination. That's correct summarization of corrupted retrieval.
16
24
 
17
- FreshContext is a **data freshness layer for AI agents** an open standard and reference implementation that makes retrieved data trustworthy.
25
+ > **Most RAG pipelines rank context correctly semantically but incorrectly temporally.**
18
26
 
19
- Every piece of web data an AI agent retrieves has an age. Most tools ignore it. FreshContext surfaces it — wrapping every result in a structured envelope that carries three guarantees:
27
+ ---
28
+
29
+ ## The layer
30
+
31
+ FreshContext is a **temporal correction layer for retrieval systems**. One math correction applied before context reaches the LLM:
32
+
33
+ ```
34
+ R_t = R_0 · e^(−λt)
35
+ ```
36
+
37
+ - `R_0` — base semantic relevancy (whatever your retriever already gives you)
38
+ - `λ` — source-specific decay constant (HN ≈14h half-life, blogs ≈29d, academic papers ≈1.6y)
39
+ - `t` — hours elapsed since publication
40
+ - `R_t` — decay-adjusted relevancy at query time
41
+
42
+ That's the whole fix. No model swap. No re-embedding. No re-indexing. The layer drops onto whatever retrieval pipeline you already have.
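The decay constant follows directly from a source's half-life (λ = ln 2 / t½), so the quoted half-lives can be checked in a few lines. A minimal sketch in plain JavaScript — the function names here are illustrative, not the package's API:

```javascript
// λ is derived from a source's half-life: after one half-life,
// R_t is exactly half of R_0.
function lambdaFromHalfLife(halfLifeHours) {
  return Math.log(2) / halfLifeHours; // per-hour decay constant
}

// R_t = R_0 · e^(−λt)
function decayAdjusted(r0, lambda, hoursElapsed) {
  return r0 * Math.exp(-lambda * hoursElapsed);
}

// An HN story the retriever scored 90, now 14 hours old
// (one half-life), decays to half its base relevancy:
const rt = decayAdjusted(90, lambdaFromHalfLife(14), 14);
console.log(rt.toFixed(1)); // → 45.0
```

Note how the correction re-ranks without touching embeddings: a stale high-similarity document simply loses score at read time.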
43
+
44
+ **The layer is the product.** The 20 adapters shipped with this repo are reference implementations demonstrating compatibility — useful, but commodities. The DAR engine, the freshness envelope, and the FreshContext Specification are the moat.
45
+
46
+ ---
47
+
48
+ ## The standard
49
+
50
+ Every FreshContext-compatible response wraps content in a structured envelope:
20
51
 
21
52
  ```
22
53
  [FRESHCONTEXT]
@@ -31,101 +62,74 @@ Confidence: high
31
62
 
32
63
  **When** it was retrieved. **Where** it came from. **How confident** we are the date is accurate.
33
64
 
34
- The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md)
65
+ The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md) · [Read the methodology](./METHODOLOGY.md)
66
+
67
+ ---
68
+
69
+ ## The intelligence feed
70
+
71
+ Beyond the per-call envelope, the production FreshContext deployment exposes a continuous, decay-scored, deduplicated feed:
72
+
73
+ ```
74
+ GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
75
+ ```
76
+
77
+ Every signal is stamped with `base_score`, `rt_score`, `entropy_level` (low / stable / high), `ha_pri_sig` (SHA-256 provenance), `semantic_fingerprint` (cross-adapter dedup), and `published_at`. Ready for direct LLM or agent consumption — no synthesis required.
78
+
79
+ Production endpoint: `https://freshcontext-mcp.gimmanuel73.workers.dev`
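A minimal consumer of the feed might look like the sketch below — `buildFeedUrl` and `fetchFeed` are hypothetical helper names, assuming only the documented query parameters and Node 18+'s global `fetch`:

```javascript
const BASE = "https://freshcontext-mcp.gimmanuel73.workers.dev";

// Build the feed URL from the documented query parameters.
function buildFeedUrl(base, profileId, { limit = 20, minRt = 0 } = {}) {
  return `${base}/v1/intel/feed/${encodeURIComponent(profileId)}?limit=${limit}&min_rt=${minRt}`;
}

// Fetch signals ranked by rt_score, ready for agent consumption.
async function fetchFeed(profileId, opts) {
  const res = await fetch(buildFeedUrl(BASE, profileId, opts));
  if (!res.ok) throw new Error(`feed request failed: ${res.status}`);
  return res.json();
}
```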
35
80
 
36
81
  ---
37
82
 
38
- ## 20 tools. No API keys.
83
+ ## Reference adapters
84
+
85
+ The repo ships 20 adapters demonstrating how to make any data source FreshContext-compatible. Useful as drop-in tools, but the value is the layer above them.
39
86
 
40
87
  ### Intelligence
41
- | Tool | What it gets you |
88
+ | Adapter | What it returns |
42
89
  |---|---|
43
90
  | `extract_github` | README, stars, forks, language, topics, last commit |
44
91
  | `extract_hackernews` | Top stories or search results with scores and timestamps |
45
92
  | `extract_scholar` | Research papers — titles, authors, years, snippets |
46
- | `extract_arxiv` | arXiv papers via official API — more reliable than Scholar |
93
+ | `extract_arxiv` | arXiv papers via official API |
47
94
  | `extract_reddit` | Posts and community sentiment from any subreddit |
48
95
 
49
96
  ### Competitive research
50
- | Tool | What it gets you |
97
+ | Adapter | What it returns |
51
98
  |---|---|
52
- | `extract_yc` | YC company listings by keyword — who's funded in your space |
99
+ | `extract_yc` | YC company listings by keyword |
53
100
  | `extract_producthunt` | Recent launches by topic |
54
101
  | `search_repos` | GitHub repos ranked by stars with activity signals |
55
102
  | `package_trends` | npm and PyPI metadata — version history, release cadence |
56
103
 
57
104
  ### Market data
58
- | Tool | What it gets you |
105
+ | Adapter | What it returns |
59
106
  |---|---|
60
107
  | `extract_finance` | Live stock data — price, market cap, P/E, 52w range. Up to 5 tickers. |
61
- | `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" — every listing dated |
108
+ | `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" |
62
109
 
63
110
  ### Composites — multiple sources, one call
64
- | Tool | Sources | What it gets you |
111
+ | Adapter | Sources | Purpose |
65
112
  |---|---|---|
66
113
  | `extract_landscape` | 6 | YC + GitHub + HN + Reddit + Product Hunt + npm in parallel |
67
114
  | `extract_idea_landscape` | 6 | HN + YC + GitHub + Jobs + npm + Product Hunt — full idea validation |
68
115
  | `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub + changelog |
69
116
  | `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
70
- | `extract_company_landscape` | 5 | **The full picture on any company** — see below |
117
+ | `extract_company_landscape` | 5 | The full picture on any company |
71
118
 
72
119
  ### Unique — not available in any other MCP server
73
- | Tool | Source | What it gets you |
120
+ | Adapter | Source | What it returns |
74
121
  |---|---|---|
75
- | `extract_changelog` | GitHub Releases API / npm / auto-discover | Update history from any repo, package, or website |
122
+ | `extract_changelog` | GitHub Releases / npm / auto-discover | Update history from any repo, package, or website |
76
123
  | `extract_govcontracts` | USASpending.gov | US federal contract awards — company, amount, agency, period |
77
124
  | `extract_sec_filings` | SEC EDGAR | 8-K filings — legally mandated material event disclosures |
78
- | `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, every country, 15-min updates |
79
- | `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset, no auth |
125
+ | `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, 15-min updates |
126
+ | `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset |
80
127
 
81
128
  ---
82
129
 
83
- ## extract_idea_landscape
84
-
85
- Built for the moment before you start building. Six sources fired in parallel to answer: *should I build this?*
130
+ ## Quick start
86
131
 
87
- 1. **Hacker News** — what are developers actively complaining about (pain signal)
88
- 2. **YC Companies** — who has already received funding in this space (funding signal)
89
- 3. **GitHub** — how crowded the open source landscape is (crowding signal)
90
- 4. **Job listings** — companies hiring around this problem = real budget = real market (market signal)
91
- 5. **npm / PyPI** — ecosystem adoption and release velocity (ecosystem signal)
92
- 6. **Product Hunt** — what just launched and how the market received it (launch signal)
93
-
94
- ```
95
- Use extract_idea_landscape with idea "data freshness for AI agents"
96
- ```
97
-
98
- ---
99
-
100
- ## extract_company_landscape
101
-
102
- The most complete single-call company analysis available in any MCP server. Five sources fired in parallel:
103
-
104
- 1. **SEC EDGAR** — what did they legally just disclose (8-K filings)
105
- 2. **USASpending.gov** — who is giving them government money
106
- 3. **GDELT** — what is global news saying right now
107
- 4. **Changelog** — are they actually shipping product
108
- 5. **Yahoo Finance** — what is the market pricing in
109
-
110
- ```
111
- Use extract_company_landscape with company "Palantir" and ticker "PLTR"
112
- ```
113
-
114
- Real output from March 2026:
115
-
116
- > **Q4 2025:** Revenue $1.407B (+70% YoY). US commercial +137%. Rule of 40 score: **127%**.
117
- > **Federal contracts:** $292.7M Army Maven Smart System · $252.5M CDAO · $145M ICE · $130M Air Force · more
118
- > **SEC filing:** Q4 earnings 8-K filed Feb 3, 2026 — GAAP net income $609M, 43% margin
119
- > **GDELT:** ICE/Medicaid data controversy, UK MoD security warning, NHS opposition — all timestamped
120
- > **PLTR:** ~$154–157 · Market cap ~$370B · P/E 244x · 52w range $66 → $207
121
-
122
- Bloomberg Terminal doesn't read commit history as a company health signal. FreshContext does.
123
-
124
- ---
125
-
126
- ## Quick Start
127
-
128
- ### Option A — Cloud (no install)
132
+ ### Cloud (no install)
129
133
 
130
134
  Add to your Claude Desktop config and restart:
131
135
 
@@ -147,9 +151,7 @@ Restart Claude. Done.
147
151
 
148
152
  > Prefer a guided setup? Visit **[freshcontext-site.pages.dev](https://freshcontext-site.pages.dev)** — 3 steps, no terminal.
149
153
 
150
- ---
151
-
152
- ### Option B — Local (full Playwright)
154
+ ### Local (full Playwright)
153
155
 
154
156
  **Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org))
155
157
 
@@ -187,16 +189,14 @@ Add to Claude Desktop config:
187
189
  }
188
190
  ```
189
191
 
190
- ---
191
-
192
- ### Troubleshooting (Mac)
192
+ #### Mac troubleshooting
193
193
 
194
194
  **"command not found: node"** — Use the full path:
195
195
  ```bash
196
196
  which node # copy this output, replace "node" in config
197
197
  ```
198
198
 
199
- **Config file doesn't exist** — Create it:
199
+ **Config file doesn't exist:**
200
200
  ```bash
201
201
  mkdir -p ~/Library/Application\ Support/Claude
202
202
  touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
@@ -216,19 +216,7 @@ Returns funding signal, pain signal, crowding signal, market signal, ecosystem s
216
216
  ```
217
217
  Use extract_company_landscape with company "Palantir" and ticker "PLTR"
218
218
  ```
219
- SEC filings + federal contracts + global news + changelog + market data. The complete picture.
220
-
221
- **Is anyone already building what you're building?**
222
- ```
223
- Use extract_landscape with topic "cashflow prediction saas"
224
- ```
225
- Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
226
-
227
- **What's Singapore's government procuring right now?**
228
- ```
229
- Use extract_gebiz with url "artificial intelligence"
230
- ```
231
- Returns live tenders from the Ministry of Finance open dataset — agency, amount, closing date, all timestamped.
219
+ SEC filings + federal contracts + global news + changelog + market data.
232
220
 
233
221
  **Did that company just disclose something material?**
234
222
  ```
@@ -236,18 +224,6 @@ Use extract_sec_filings with url "Palantir Technologies"
236
224
  ```
237
225
  8-K filings are legally mandated within 4 business days of any material event — CEO change, acquisition, breach, major contract.
238
226
 
239
- **What is global news saying about a company right now?**
240
- ```
241
- Use extract_gdelt with url "Palantir"
242
- ```
243
- 100+ languages, every country, updated every 15 minutes. Surfaces what Western sources miss.
244
-
245
- **Which companies just won US government contracts in AI?**
246
- ```
247
- Use extract_govcontracts with url "artificial intelligence"
248
- ```
249
- Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
250
-
251
227
  **Is this dependency still actively maintained?**
252
228
  ```
253
229
  Use extract_changelog with url "https://github.com/org/repo"
@@ -256,64 +232,51 @@ Returns the last 8 releases with exact dates. If the last release was 18 months
256
232
 
257
233
  ---
258
234
 
259
- ## How freshness works
260
-
261
- Most AI tools retrieve data silently. No timestamp, no signal, no way for the agent to know how old it is.
235
+ ## Deployment & infrastructure
262
236
 
263
- FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
237
+ The reference implementation runs on Cloudflare's global edge:
264
238
 
265
- - `retrieved_at` exact ISO timestamp of the fetch
266
- - `content_date` — best estimate of when the content was originally published
267
- - `freshness_confidence` `high`, `medium`, or `low` based on signal quality
268
- - `freshness_score` numeric 0–100 with domain-specific decay rates (financial data at 5.0, academic papers at 0.3)
269
- - `adapter` which source the data came from
270
-
271
- When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, FreshContext tells you why.
272
-
273
- ---
239
+ | Endpoint | Method | Purpose |
240
+ |---|---|---|
241
+ | `/` | GET | Service info + endpoint list |
242
+ | `/health` | GET | Liveness check |
243
+ | `/mcp` | POST | MCP JSON-RPC transport |
244
+ | `/demo` | GET | Live before/after demo (no API key required) |
245
+ | `/briefing` | GET | Latest stored briefing |
246
+ | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
247
+ | `/watched-queries` | GET | List all watched queries |
274
248
 
275
- ## Security
249
+ - **D1 database** — 18 watched queries running on 6-hour cron with relevancy scoring
250
+ - **KV-backed rate limiting** — 60 req/min per IP across all edge nodes
251
+ - **Defensive valves** — clock-skew rejection (5-minute tolerance), hard floor at R_t < 5, lazy decay at read time
252
+ - **Provenance** — Ha-Pri SHA-256 audit signatures on every signal
253
+ - **Schema migrations** — promise-gated, idempotent, run on first request after deploy
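The fixed-window limit described above can be sketched in a few lines — here a plain `Map` stands in for the KV namespace, so this is an in-memory illustration rather than the deployed implementation:

```javascript
// Fixed-window rate limiter: 60 requests per 60-second window per IP.
function makeRateLimiter({ limit = 60, windowSec = 60 } = {}) {
  const counts = new Map(); // windowKey -> request count
  return function allow(ip, nowMs = Date.now()) {
    // One counter per (IP, time-window) pair.
    const windowKey = `${ip}:${Math.floor(nowMs / 1000 / windowSec)}`;
    const n = (counts.get(windowKey) ?? 0) + 1;
    counts.set(windowKey, n);
    return n <= limit;
  };
}
```

In the real deployment the counter lives in KV with a TTL, so expired windows clean themselves up and the limit holds across edge nodes.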
276
254
 
277
- - Input sanitization and domain allowlists on all adapters
278
- - SSRF prevention (blocked private IP ranges)
279
- - KV-backed global rate limiting: 60 req/min per IP across all edge nodes
280
- - No credentials required — all public data sources
255
+ Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`
281
256
 
282
257
  ---
283
258
 
284
259
  ## Roadmap
285
260
 
286
- - [x] 20 tools across intelligence, competitive research, market data, and composites
287
- - [x] `extract_changelog` update cadence from any repo, package, or website
288
- - [x] `extract_govcontracts` US federal contract intelligence via USASpending.gov
289
- - [x] `extract_sec_filings` SEC EDGAR 8-K material event filings
290
- - [x] `extract_gdelt` GDELT global news intelligence (100+ languages)
291
- - [x] `extract_gebiz`Singapore Government procurement via data.gov.sg
292
- - [x] `extract_company_landscape` 5-source company intelligence composite
293
- - [x] `extract_idea_landscape`6-source idea validation composite
294
- - [x] `freshness_score` numeric metric (0–100) with domain-specific decay rates
295
- - [x] Cloudflare Workers deploymentglobal edge with KV caching and rate limiting
296
- - [x] D1 database — 18 watched queries running on 6-hour cron with relevancy scoring
297
- - [x] Listed on official MCP Registry
298
- - [x] Listed on Apify Store
299
- - [x] FreshContext Specification v1.1 published (MIT) — composite adapters, decay rate table, compatibility levels
300
- - [x] GitHub Actions CI/CD — auto-publish to npm on every push
301
- - [x] **DAR engine** — exponential decay scoring with proprietary λ constants (v0.3.15)
302
- - [x] **Ha-Pri audit signatures** — SHA-256 provenance stamps on every signal
303
- - [x] **Semantic deduplication** — cross-adapter fingerprinting
304
- - [x] **Intelligence feed endpoint** — `/v1/intel/feed/:profile_id`
305
- - [x] **METHODOLOGY.md** — formal IP documentation
261
+ - [x] FreshContext Specification v1.1 published (MIT, open standard)
262
+ - [x] DAR engine with proprietary λ constants (v0.3.15)
263
+ - [x] Ha-Pri audit signatures on every signal
264
+ - [x] Semantic deduplication via fingerprinting
265
+ - [x] Live before/after demo at `/demo`
266
+ - [x] METHODOLOGY.md — formal IP and engineering documentation
267
+ - [x] 20 reference adapters across intelligence, competitive research, market data, and composites
268
+ - [x] Cloudflare Workers deployment — global edge, KV cache, KV rate limiting
269
+ - [x] Listed on official MCP Registry, Apify Store, npm
270
+ - [x] GitHub Actions CI/CD — auto-publish on every push
306
271
  - [ ] Webhook triggers — push high-entropy signals on threshold
307
- - [ ] Domain-specific watched queries for mining/industrial sector
308
- - [ ] Subscription tier with profile customization
309
- - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
310
272
  - [ ] Dashboard — React frontend for the D1 intelligence pipeline
273
+ - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
311
274
 
312
275
  ---
313
276
 
314
277
  ## Contributing
315
278
 
316
- PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.
279
+ PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and [`FRESHCONTEXT_SPEC.md`](./FRESHCONTEXT_SPEC.md) for the contract any adapter must fulfill.
317
280
 
318
281
  If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
319
282
 
@@ -331,63 +294,3 @@ MIT
331
294
  ---
332
295
 
333
296
  **Also on:** [Apify Store](https://apify.com/prince_gabriel/freshcontext-mcp) · [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
334
-
335
- ---
336
-
337
- ## The Intelligence Layer (v0.3.15)
338
-
339
- FreshContext is no longer just a pull tool. The infrastructure now runs a continuous **Decay-Adjusted Relevancy (DAR)** engine that scores every signal with exponential decay and provenance signatures.
340
-
341
- ### The math
342
-
343
- ```
344
- R_t = R_0 · e^(-λt)
345
- ```
346
-
347
- - `R_0` — base semantic score against your profile (0–100)
348
- - `λ` — source-specific decay constant (per hour)
349
- - `t` — hours since the content was published
350
- - `R_t` — final relevancy at query time
351
-
352
- Source half-lives are calibrated empirically: Hacker News ≈14h, Reddit ≈3d, jobs ≈6d, GitHub ≈5mo, academic papers ≈1.6y.
353
-
354
- ### What every signal carries
355
-
356
- Every row in the D1 ledger is stamped with:
357
-
358
- - `base_score` — R_0, semantic match against profile
359
- - `rt_score` — R_t, decay-adjusted relevancy
360
- - `entropy_level` — `low` / `stable` / `high` on the decay curve
361
- - `ha_pri_sig` — SHA-256 provenance signature (tamper-evident)
362
- - `semantic_fingerprint` — cross-adapter deduplication hash
363
- - `published_at` — extracted content publication date
364
-
365
- ### The intelligence feed
366
-
367
- ```
368
- GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
369
- ```
370
-
371
- Returns scored, deduplicated, provenance-stamped signals ranked by R_t — ready for direct consumption by any LLM or agent. No synthesis needed.
372
-
373
- ### Methodology
374
-
375
- The full data collection, scoring, and provenance methodology is formally documented in [METHODOLOGY.md](./METHODOLOGY.md) — written as an audit trail for acquirers, integrators, and regulators. Version 1.1, April 2026.
376
-
377
- ---
378
-
379
- ## Live endpoints
380
-
381
- | Endpoint | Method | Purpose |
382
- |---|---|---|
383
- | `/` | GET | Service info + endpoint list |
384
- | `/health` | GET | Liveness check |
385
- | `/mcp` | POST | MCP JSON-RPC transport |
386
- | `/briefing` | GET | Latest stored briefing |
387
- | `/briefing/now` | POST | Force scrape + synthesize |
388
- | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
389
- | `/watched-queries` | GET | List all watched queries |
390
- | `/debug/db` | GET | D1 counts + DAR engine coverage |
391
- | `/debug/scrape` | GET | Run a single adapter raw |
392
-
393
- Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`