freshcontext-mcp 0.3.15 → 0.3.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.env.example ADDED
@@ -0,0 +1,8 @@
1
+ # freshcontext-mcp environment variables
2
+ # Copy to .env and fill in
3
+
4
+ # Optional: GitHub Personal Access Token (increases rate limits for GitHub API fallback)
5
+ GITHUB_TOKEN=
6
+
7
+ # Optional: Proxy URL if needed for certain extractions
8
+ # PROXY_URL=http://user:pass@host:port
package/README.md CHANGED
@@ -10,13 +10,44 @@ That's the problem freshcontext fixes.
10
10
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
11
11
  [![MCP Registry](https://img.shields.io/badge/MCP%20Registry-Listed-blue)](https://registry.modelcontextprotocol.io)
12
12
 
13
+ > **Live demo:** [freshcontext-mcp.gimmanuel73.workers.dev/demo](https://freshcontext-mcp.gimmanuel73.workers.dev/demo) — same model, same query, two completely different answers. Only the temporal layer changed.
14
+
13
15
  ---
14
16
 
15
- ## The Standard
17
+ ## The problem
18
+
19
+ Large language models retrieve web data semantically. Cosine similarity finds the documents that match a query best — but cosine doesn't know when a document was written.
20
+
21
+ So a 2022 blog post and a 2026 paper can score nearly identically. The model gets a context window full of stale documents and faithfully summarizes 2022 advice for a 2026 question.
22
+
23
+ That's not hallucination. That's correct summarization of corrupted retrieval.
16
24
 
17
- FreshContext is a **data freshness layer for AI agents** an open standard and reference implementation that makes retrieved data trustworthy.
25
+ > **Most RAG pipelines rank context correctly semantically but incorrectly temporally.**
18
26
 
19
- Every piece of web data an AI agent retrieves has an age. Most tools ignore it. FreshContext surfaces it — wrapping every result in a structured envelope that carries three guarantees:
27
+ ---
28
+
29
+ ## The layer
30
+
31
+ FreshContext is a **temporal correction layer for retrieval systems**. One math correction applied before context reaches the LLM:
32
+
33
+ ```
34
+ R_t = R_0 · e^(−λt)
35
+ ```
36
+
37
+ - `R_0` — base semantic relevancy (whatever your retriever already gives you)
38
+ - `λ` — source-specific decay constant (HN ≈14h half-life, blogs ≈29d, academic papers ≈1.6y)
39
+ - `t` — hours elapsed since publication
40
+ - `R_t` — decay-adjusted relevancy at query time
41
+
42
+ That's the whole fix. No model swap. No re-embedding. No re-indexing. The layer drops onto whatever retrieval pipeline you already have.
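The decay constant follows directly from a source's half-life (λ = ln 2 / t½), so the quoted half-lives can be checked in a few lines. A minimal sketch in plain JavaScript — the function names here are illustrative, not the package's API:

```javascript
// λ is derived from a source's half-life: after one half-life,
// R_t is exactly half of R_0.
function lambdaFromHalfLife(halfLifeHours) {
  return Math.log(2) / halfLifeHours; // per-hour decay constant
}

// R_t = R_0 · e^(−λt)
function decayAdjusted(r0, lambda, hoursElapsed) {
  return r0 * Math.exp(-lambda * hoursElapsed);
}

// An HN story the retriever scored 90, now 14 hours old
// (one half-life), decays to half its base relevancy:
const rt = decayAdjusted(90, lambdaFromHalfLife(14), 14);
console.log(rt.toFixed(1)); // → 45.0
```

Note how the correction re-ranks without touching embeddings: a stale high-similarity document simply loses score at read time.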
43
+
44
+ **The layer is the product.** The 20 adapters shipped with this repo are reference implementations demonstrating compatibility — useful, but commodities. The DAR engine, the freshness envelope, and the FreshContext Specification are the moat.
45
+
46
+ ---
47
+
48
+ ## The standard
49
+
50
+ Every FreshContext-compatible response wraps content in a structured envelope:
20
51
 
21
52
  ```
22
53
  [FRESHCONTEXT]
@@ -31,101 +62,74 @@ Confidence: high
31
62
 
32
63
  **When** it was retrieved. **Where** it came from. **How confident** we are the date is accurate.
33
64
 
34
- The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md)
65
+ The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md) · [Read the methodology](./METHODOLOGY.md)
66
+
67
+ ---
68
+
69
+ ## The intelligence feed
70
+
71
+ Beyond the per-call envelope, the production FreshContext deployment exposes a continuous, decay-scored, deduplicated feed:
72
+
73
+ ```
74
+ GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
75
+ ```
76
+
77
+ Every signal is stamped with `base_score`, `rt_score`, `entropy_level` (low / stable / high), `ha_pri_sig` (SHA-256 provenance), `semantic_fingerprint` (cross-adapter dedup), and `published_at`. Ready for direct LLM or agent consumption — no synthesis required.
78
+
79
+ Production endpoint: `https://freshcontext-mcp.gimmanuel73.workers.dev`
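A minimal consumer of the feed might look like the sketch below — `buildFeedUrl` and `fetchFeed` are hypothetical helper names, assuming only the documented query parameters and Node 18+'s global `fetch`:

```javascript
const BASE = "https://freshcontext-mcp.gimmanuel73.workers.dev";

// Build the feed URL from the documented query parameters.
function buildFeedUrl(base, profileId, { limit = 20, minRt = 0 } = {}) {
  return `${base}/v1/intel/feed/${encodeURIComponent(profileId)}?limit=${limit}&min_rt=${minRt}`;
}

// Fetch signals ranked by rt_score, ready for agent consumption.
async function fetchFeed(profileId, opts) {
  const res = await fetch(buildFeedUrl(BASE, profileId, opts));
  if (!res.ok) throw new Error(`feed request failed: ${res.status}`);
  return res.json();
}
```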
35
80
 
36
81
  ---
37
82
 
38
- ## 20 tools. No API keys.
83
+ ## Reference adapters
84
+
85
+ The repo ships 20 adapters demonstrating how to make any data source FreshContext-compatible. Useful as drop-in tools, but the value is the layer above them.
39
86
 
40
87
  ### Intelligence
41
- | Tool | What it gets you |
88
+ | Adapter | What it returns |
42
89
  |---|---|
43
90
  | `extract_github` | README, stars, forks, language, topics, last commit |
44
91
  | `extract_hackernews` | Top stories or search results with scores and timestamps |
45
92
  | `extract_scholar` | Research papers — titles, authors, years, snippets |
46
- | `extract_arxiv` | arXiv papers via official API — more reliable than Scholar |
93
+ | `extract_arxiv` | arXiv papers via official API |
47
94
  | `extract_reddit` | Posts and community sentiment from any subreddit |
48
95
 
49
96
  ### Competitive research
50
- | Tool | What it gets you |
97
+ | Adapter | What it returns |
51
98
  |---|---|
52
- | `extract_yc` | YC company listings by keyword — who's funded in your space |
99
+ | `extract_yc` | YC company listings by keyword |
53
100
  | `extract_producthunt` | Recent launches by topic |
54
101
  | `search_repos` | GitHub repos ranked by stars with activity signals |
55
102
  | `package_trends` | npm and PyPI metadata — version history, release cadence |
56
103
 
57
104
  ### Market data
58
- | Tool | What it gets you |
105
+ | Adapter | What it returns |
59
106
  |---|---|
60
107
  | `extract_finance` | Live stock data — price, market cap, P/E, 52w range. Up to 5 tickers. |
61
- | `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" — every listing dated |
108
+ | `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" |
62
109
 
63
110
  ### Composites — multiple sources, one call
64
- | Tool | Sources | What it gets you |
111
+ | Adapter | Sources | Purpose |
65
112
  |---|---|---|
66
113
  | `extract_landscape` | 6 | YC + GitHub + HN + Reddit + Product Hunt + npm in parallel |
67
114
  | `extract_idea_landscape` | 6 | HN + YC + GitHub + Jobs + npm + Product Hunt — full idea validation |
68
115
  | `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub + changelog |
69
116
  | `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
70
- | `extract_company_landscape` | 5 | **The full picture on any company** — see below |
117
+ | `extract_company_landscape` | 5 | The full picture on any company |
71
118
 
72
119
  ### Unique — not available in any other MCP server
73
- | Tool | Source | What it gets you |
120
+ | Adapter | Source | What it returns |
74
121
  |---|---|---|
75
- | `extract_changelog` | GitHub Releases API / npm / auto-discover | Update history from any repo, package, or website |
122
+ | `extract_changelog` | GitHub Releases / npm / auto-discover | Update history from any repo, package, or website |
76
123
  | `extract_govcontracts` | USASpending.gov | US federal contract awards — company, amount, agency, period |
77
124
  | `extract_sec_filings` | SEC EDGAR | 8-K filings — legally mandated material event disclosures |
78
- | `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, every country, 15-min updates |
79
- | `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset, no auth |
125
+ | `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, 15-min updates |
126
+ | `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset |
80
127
 
81
128
  ---
82
129
 
83
- ## extract_idea_landscape
84
-
85
- Built for the moment before you start building. Six sources fired in parallel to answer: *should I build this?*
130
+ ## Quick start
86
131
 
87
- 1. **Hacker News** — what are developers actively complaining about (pain signal)
88
- 2. **YC Companies** — who has already received funding in this space (funding signal)
89
- 3. **GitHub** — how crowded the open source landscape is (crowding signal)
90
- 4. **Job listings** — companies hiring around this problem = real budget = real market (market signal)
91
- 5. **npm / PyPI** — ecosystem adoption and release velocity (ecosystem signal)
92
- 6. **Product Hunt** — what just launched and how the market received it (launch signal)
93
-
94
- ```
95
- Use extract_idea_landscape with idea "data freshness for AI agents"
96
- ```
97
-
98
- ---
99
-
100
- ## extract_company_landscape
101
-
102
- The most complete single-call company analysis available in any MCP server. Five sources fired in parallel:
103
-
104
- 1. **SEC EDGAR** — what did they legally just disclose (8-K filings)
105
- 2. **USASpending.gov** — who is giving them government money
106
- 3. **GDELT** — what is global news saying right now
107
- 4. **Changelog** — are they actually shipping product
108
- 5. **Yahoo Finance** — what is the market pricing in
109
-
110
- ```
111
- Use extract_company_landscape with company "Palantir" and ticker "PLTR"
112
- ```
113
-
114
- Real output from March 2026:
115
-
116
- > **Q4 2025:** Revenue $1.407B (+70% YoY). US commercial +137%. Rule of 40 score: **127%**.
117
- > **Federal contracts:** $292.7M Army Maven Smart System · $252.5M CDAO · $145M ICE · $130M Air Force · more
118
- > **SEC filing:** Q4 earnings 8-K filed Feb 3, 2026 — GAAP net income $609M, 43% margin
119
- > **GDELT:** ICE/Medicaid data controversy, UK MoD security warning, NHS opposition — all timestamped
120
- > **PLTR:** ~$154–157 · Market cap ~$370B · P/E 244x · 52w range $66 → $207
121
-
122
- Bloomberg Terminal doesn't read commit history as a company health signal. FreshContext does.
123
-
124
- ---
125
-
126
- ## Quick Start
127
-
128
- ### Option A — Cloud (no install)
132
+ ### Cloud (no install)
129
133
 
130
134
  Add to your Claude Desktop config and restart:
131
135
 
@@ -147,9 +151,7 @@ Restart Claude. Done.
147
151
 
148
152
  > Prefer a guided setup? Visit **[freshcontext-site.pages.dev](https://freshcontext-site.pages.dev)** — 3 steps, no terminal.
149
153
 
150
- ---
151
-
152
- ### Option B — Local (full Playwright)
154
+ ### Local (full Playwright)
153
155
 
154
156
  **Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org))
155
157
 
@@ -187,16 +189,14 @@ Add to Claude Desktop config:
187
189
  }
188
190
  ```
189
191
 
190
- ---
191
-
192
- ### Troubleshooting (Mac)
192
+ #### Mac troubleshooting
193
193
 
194
194
  **"command not found: node"** — Use the full path:
195
195
  ```bash
196
196
  which node # copy this output, replace "node" in config
197
197
  ```
198
198
 
199
- **Config file doesn't exist** — Create it:
199
+ **Config file doesn't exist:**
200
200
  ```bash
201
201
  mkdir -p ~/Library/Application\ Support/Claude
202
202
  touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
@@ -216,19 +216,7 @@ Returns funding signal, pain signal, crowding signal, market signal, ecosystem s
216
216
  ```
217
217
  Use extract_company_landscape with company "Palantir" and ticker "PLTR"
218
218
  ```
219
- SEC filings + federal contracts + global news + changelog + market data. The complete picture.
220
-
221
- **Is anyone already building what you're building?**
222
- ```
223
- Use extract_landscape with topic "cashflow prediction saas"
224
- ```
225
- Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
226
-
227
- **What's Singapore's government procuring right now?**
228
- ```
229
- Use extract_gebiz with url "artificial intelligence"
230
- ```
231
- Returns live tenders from the Ministry of Finance open dataset — agency, amount, closing date, all timestamped.
219
+ SEC filings + federal contracts + global news + changelog + market data.
232
220
 
233
221
  **Did that company just disclose something material?**
234
222
  ```
@@ -236,18 +224,6 @@ Use extract_sec_filings with url "Palantir Technologies"
236
224
  ```
237
225
  8-K filings are legally mandated within 4 business days of any material event — CEO change, acquisition, breach, major contract.
238
226
 
239
- **What is global news saying about a company right now?**
240
- ```
241
- Use extract_gdelt with url "Palantir"
242
- ```
243
- 100+ languages, every country, updated every 15 minutes. Surfaces what Western sources miss.
244
-
245
- **Which companies just won US government contracts in AI?**
246
- ```
247
- Use extract_govcontracts with url "artificial intelligence"
248
- ```
249
- Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
250
-
251
227
  **Is this dependency still actively maintained?**
252
228
  ```
253
229
  Use extract_changelog with url "https://github.com/org/repo"
@@ -256,64 +232,51 @@ Returns the last 8 releases with exact dates. If the last release was 18 months
256
232
 
257
233
  ---
258
234
 
259
- ## How freshness works
260
-
261
- Most AI tools retrieve data silently. No timestamp, no signal, no way for the agent to know how old it is.
235
+ ## Deployment & infrastructure
262
236
 
263
- FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
237
+ The reference implementation runs on Cloudflare's global edge:
264
238
 
265
- - `retrieved_at` exact ISO timestamp of the fetch
266
- - `content_date` — best estimate of when the content was originally published
267
- - `freshness_confidence` `high`, `medium`, or `low` based on signal quality
268
- - `freshness_score` numeric 0–100 with domain-specific decay rates (financial data at 5.0, academic papers at 0.3)
269
- - `adapter` which source the data came from
270
-
271
- When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, FreshContext tells you why.
272
-
273
- ---
239
+ | Endpoint | Method | Purpose |
240
+ |---|---|---|
241
+ | `/` | GET | Service info + endpoint list |
242
+ | `/health` | GET | Liveness check |
243
+ | `/mcp` | POST | MCP JSON-RPC transport |
244
+ | `/demo` | GET | Live before/after demo (no API key required) |
245
+ | `/briefing` | GET | Latest stored briefing |
246
+ | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
247
+ | `/watched-queries` | GET | List all watched queries |
274
248
 
275
- ## Security
249
+ - **D1 database** — 18 watched queries running on 6-hour cron with relevancy scoring
250
+ - **KV-backed rate limiting** — 60 req/min per IP across all edge nodes
251
+ - **Defensive valves** — clock-skew rejection (5-minute tolerance), hard floor at R_t < 5, lazy decay at read time
252
+ - **Provenance** — Ha-Pri SHA-256 audit signatures on every signal
253
+ - **Schema migrations** — promise-gated, idempotent, run on first request after deploy
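The fixed-window limit described above can be sketched in a few lines — here a plain `Map` stands in for the KV namespace, so this is an in-memory illustration rather than the deployed implementation:

```javascript
// Fixed-window rate limiter: 60 requests per 60-second window per IP.
function makeRateLimiter({ limit = 60, windowSec = 60 } = {}) {
  const counts = new Map(); // windowKey -> request count
  return function allow(ip, nowMs = Date.now()) {
    // One counter per (IP, time-window) pair.
    const windowKey = `${ip}:${Math.floor(nowMs / 1000 / windowSec)}`;
    const n = (counts.get(windowKey) ?? 0) + 1;
    counts.set(windowKey, n);
    return n <= limit;
  };
}
```

In the real deployment the counter lives in KV with a TTL, so expired windows clean themselves up and the limit holds across edge nodes.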
276
254
 
277
- - Input sanitization and domain allowlists on all adapters
278
- - SSRF prevention (blocked private IP ranges)
279
- - KV-backed global rate limiting: 60 req/min per IP across all edge nodes
280
- - No credentials required — all public data sources
255
+ Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`
281
256
 
282
257
  ---
283
258
 
284
259
  ## Roadmap
285
260
 
286
- - [x] 20 tools across intelligence, competitive research, market data, and composites
287
- - [x] `extract_changelog` update cadence from any repo, package, or website
288
- - [x] `extract_govcontracts` US federal contract intelligence via USASpending.gov
289
- - [x] `extract_sec_filings` SEC EDGAR 8-K material event filings
290
- - [x] `extract_gdelt` GDELT global news intelligence (100+ languages)
291
- - [x] `extract_gebiz`Singapore Government procurement via data.gov.sg
292
- - [x] `extract_company_landscape` 5-source company intelligence composite
293
- - [x] `extract_idea_landscape`6-source idea validation composite
294
- - [x] `freshness_score` numeric metric (0–100) with domain-specific decay rates
295
- - [x] Cloudflare Workers deploymentglobal edge with KV caching and rate limiting
296
- - [x] D1 database — 18 watched queries running on 6-hour cron with relevancy scoring
297
- - [x] Listed on official MCP Registry
298
- - [x] Listed on Apify Store
299
- - [x] FreshContext Specification v1.1 published (MIT) — composite adapters, decay rate table, compatibility levels
300
- - [x] GitHub Actions CI/CD — auto-publish to npm on every push
301
- - [x] **DAR engine** — exponential decay scoring with proprietary λ constants (v0.3.15)
302
- - [x] **Ha-Pri audit signatures** — SHA-256 provenance stamps on every signal
303
- - [x] **Semantic deduplication** — cross-adapter fingerprinting
304
- - [x] **Intelligence feed endpoint** — `/v1/intel/feed/:profile_id`
305
- - [x] **METHODOLOGY.md** — formal IP documentation
261
+ - [x] FreshContext Specification v1.1 published (MIT, open standard)
262
+ - [x] DAR engine with proprietary λ constants (v0.3.15)
263
+ - [x] Ha-Pri audit signatures on every signal
264
+ - [x] Semantic deduplication via fingerprinting
265
+ - [x] Live before/after demo at `/demo`
266
+ - [x] METHODOLOGY.md — formal IP and engineering documentation
267
+ - [x] 20 reference adapters across intelligence, competitive research, market data, and composites
268
+ - [x] Cloudflare Workers deployment — global edge, KV cache, KV rate limiting
269
+ - [x] Listed on official MCP Registry, Apify Store, npm
270
+ - [x] GitHub Actions CI/CD — auto-publish on every push
306
271
  - [ ] Webhook triggers — push high-entropy signals on threshold
307
- - [ ] Domain-specific watched queries for mining/industrial sector
308
- - [ ] Subscription tier with profile customization
309
- - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
310
272
  - [ ] Dashboard — React frontend for the D1 intelligence pipeline
273
+ - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
311
274
 
312
275
  ---
313
276
 
314
277
  ## Contributing
315
278
 
316
- PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.
279
+ PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and [`FRESHCONTEXT_SPEC.md`](./FRESHCONTEXT_SPEC.md) for the contract any adapter must fulfill.
317
280
 
318
281
  If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
319
282
 
@@ -331,63 +294,3 @@ MIT
331
294
  ---
332
295
 
333
296
  **Also on:** [Apify Store](https://apify.com/prince_gabriel/freshcontext-mcp) · [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
334
-
335
- ---
336
-
337
- ## The Intelligence Layer (v0.3.15)
338
-
339
- FreshContext is no longer just a pull tool. The infrastructure now runs a continuous **Decay-Adjusted Relevancy (DAR)** engine that scores every signal with exponential decay and provenance signatures.
340
-
341
- ### The math
342
-
343
- ```
344
- R_t = R_0 · e^(-λt)
345
- ```
346
-
347
- - `R_0` — base semantic score against your profile (0–100)
348
- - `λ` — source-specific decay constant (per hour)
349
- - `t` — hours since the content was published
350
- - `R_t` — final relevancy at query time
351
-
352
- Source half-lives are calibrated empirically: Hacker News ≈14h, Reddit ≈3d, jobs ≈6d, GitHub ≈5mo, academic papers ≈1.6y.
353
-
354
- ### What every signal carries
355
-
356
- Every row in the D1 ledger is stamped with:
357
-
358
- - `base_score` — R_0, semantic match against profile
359
- - `rt_score` — R_t, decay-adjusted relevancy
360
- - `entropy_level` — `low` / `stable` / `high` on the decay curve
361
- - `ha_pri_sig` — SHA-256 provenance signature (tamper-evident)
362
- - `semantic_fingerprint` — cross-adapter deduplication hash
363
- - `published_at` — extracted content publication date
364
-
365
- ### The intelligence feed
366
-
367
- ```
368
- GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
369
- ```
370
-
371
- Returns scored, deduplicated, provenance-stamped signals ranked by R_t — ready for direct consumption by any LLM or agent. No synthesis needed.
372
-
373
- ### Methodology
374
-
375
- The full data collection, scoring, and provenance methodology is formally documented in [METHODOLOGY.md](./METHODOLOGY.md) — written as an audit trail for acquirers, integrators, and regulators. Version 1.1, April 2026.
376
-
377
- ---
378
-
379
- ## Live endpoints
380
-
381
- | Endpoint | Method | Purpose |
382
- |---|---|---|
383
- | `/` | GET | Service info + endpoint list |
384
- | `/health` | GET | Liveness check |
385
- | `/mcp` | POST | MCP JSON-RPC transport |
386
- | `/briefing` | GET | Latest stored briefing |
387
- | `/briefing/now` | POST | Force scrape + synthesize |
388
- | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
389
- | `/watched-queries` | GET | List all watched queries |
390
- | `/debug/db` | GET | D1 counts + DAR engine coverage |
391
- | `/debug/scrape` | GET | Run a single adapter raw |
392
-
393
- Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`