freshcontext-mcp 0.3.13 → 0.3.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/METHODOLOGY.md ADDED
@@ -0,0 +1,277 @@
1
+ # FreshContext Data Intelligence Methodology
2
+ **Version 1.1 — April 2026**
3
+ *Authored by Immanuel Gabriel (Prince Gabriel) — Grootfontein, Namibia*
4
+
5
+ ---
6
+
7
+ ## What This Document Is
8
+
9
+ This document formally describes the data collection, scoring, and provenance methodology underlying the FreshContext intelligence platform.
10
+
11
+ It exists for three audiences:
12
+
13
+ 1. **Technical integrators** — teams embedding FreshContext into their agent infrastructure who need to understand what the data represents and how it is scored.
14
+ 2. **Acquirers and licensing partners** — entities evaluating FreshContext as an asset, who need to audit the methodology that makes the data defensible.
15
+ 3. **Regulators and auditors** — who may need to verify that the platform's data claims are substantiated by documented, reproducible methodology.
16
+
17
+ ---
18
+
19
+ ## Section 1: Data Collection
20
+
21
+ ### 1.1 Architecture
22
+
23
+ FreshContext operates a continuous data collection pipeline running on Cloudflare's global edge infrastructure. The pipeline executes every 6 hours via a scheduled cron job and queries 18 watched query definitions stored in the platform's D1 database.
24
+
25
+ Each watched query specifies:
26
+ - **Adapter** — the data source to query (e.g., `hackernews`, `jobs`, `reposearch`)
27
+ - **Query** — the search term or URL
28
+ - **User ID** — the profile this query serves
29
+ - **Filters** — optional parameters (location, exclusion terms, etc.)
30
+
31
+ ### 1.2 Adapters
32
+
33
+ FreshContext implements 11 production adapters. The primary sources are summarised below:
34
+
35
+ | Adapter | Source | Auth Required | Update Frequency |
36
+ |---|---|---|---|
37
+ | `hackernews` | Hacker News Algolia API | None | Real-time |
38
+ | `jobs` | Remotive API | None | Continuous |
39
+ | `reposearch` | GitHub Search API | Optional (rate limit) | Real-time |
40
+ | `github` | GitHub Repository API | Optional | Real-time |
41
+ | `reddit` | Reddit JSON API | None | Real-time |
42
+ | `yc` | YC Open Source API | None | Per batch cycle |
43
+ | `packagetrends` | npm Registry + npm Downloads API | None | Per publish |
44
+ | `finance` | Yahoo Finance API | None | Market hours |
46
+
47
+ All adapters operate exclusively on **publicly accessible data**. No credentials are required or used for data access. All fetch requests include a `User-Agent` header identifying the FreshContext crawler.
48
+
49
+ ### 1.3 Content Hash Deduplication
50
+
51
+ Before any signal is stored, the platform computes a 32-bit rolling hash of the raw content. If the most recent stored result for a given watched query carries an identical hash, the current result is discarded. This prevents storing unchanged content across cron cycles.
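The dedup check can be sketched as follows. FNV-1a stands in for the platform's internal 32-bit rolling hash, whose exact construction is not specified here:

```typescript
// Illustrative 32-bit content hash (FNV-1a as a stand-in; the platform's
// actual rolling hash is internal and may differ).
function contentHash32(raw: string): string {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < raw.length; i++) {
    h ^= raw.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Skip storage when the newest stored result for this watched query
// carries an identical hash.
function isDuplicate(raw: string, lastStoredHash: string | null): boolean {
  return lastStoredHash !== null && contentHash32(raw) === lastStoredHash;
}
```

Comparing against only the most recent stored hash keeps the check O(1) per cron cycle, at the cost of re-storing content that flips back to an earlier state.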
52
+
53
+ ### 1.4 Semantic Deduplication
54
+
55
+ Beyond exact-match deduplication, FreshContext implements semantic deduplication to prevent the same underlying story from appearing as multiple signals when it is covered by multiple sources (e.g., the same GitHub release appearing on both HN and Reddit).
56
+
57
+ The semantic fingerprint is computed as follows:
58
+
59
+ 1. Extract the first canonical URL from the raw content
60
+ 2. Extract the first ISO 8601 publication date from the raw content
61
+ 3. Extract and normalise the first substantive line (title) — lowercased, punctuation stripped, truncated to 80 characters
62
+ 4. Concatenate: `normalised_title|canonical_url|publication_date`
63
+ 5. Compute SHA-256 of the concatenated string
64
+ 6. Retain the first 16 hex characters as the fingerprint
65
+
66
+ If any signal stored within the preceding 48 hours carries an identical fingerprint, the new result is discarded. The 48-hour window is configurable.
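The five steps above can be sketched in TypeScript. The URL and date extraction regexes below are simplified stand-ins for the platform's internal parsers:

```typescript
import { createHash } from "node:crypto";

function semanticFingerprint(raw: string): string {
  // 1. First canonical URL (simplified: first http(s) token)
  const url = raw.match(/https?:\/\/\S+/)?.[0] ?? "";
  // 2. First ISO 8601 publication date
  const date = raw.match(/\d{4}-\d{2}-\d{2}/)?.[0] ?? "";
  // 3. First substantive line, lowercased, punctuation stripped, 80 chars
  const firstLine = raw.split("\n").find((l) => l.trim().length > 0) ?? "";
  const title = firstLine
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .trim()
    .slice(0, 80);
  // 4-6. Concatenate, SHA-256, keep the first 16 hex characters
  const concat = `${title}|${url}|${date}`;
  return createHash("sha256").update(concat).digest("hex").slice(0, 16);
}
```

Because the title is normalised before hashing, cosmetic differences (case, punctuation) between two sources covering the same story collapse to one fingerprint.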
67
+
68
+ ---
69
+
70
+ ## Section 2: Temporal Scoring — The DAR Engine
71
+
72
+ ### 2.1 Overview
73
+
74
+ The Decay-Adjusted Relevancy (DAR) engine scores every collected signal on two axes:
75
+
76
+ - **R_0 (Base Score)** — semantic relevancy of the content against the user's profile, independent of time
77
+ - **R_t (Decay-Adjusted Score)** — R_0 adjusted for how much time has elapsed since the content was published
78
+
79
+ The final stored `rt_score` is what drives signal ranking in briefings and the intelligence feed.
80
+
81
+ ### 2.2 Base Score Calculation (R_0)
82
+
83
+ R_0 is computed by matching content against the user profile:
84
+
85
+ ```
86
+ R_0 = baseline (40)
87
+ + vital_keyword_matches × 15 [capped at +35]
88
+ + skill_keyword_matches × 3 [capped at +15]
89
+ + location_accessibility_bonus [+8 if remote/accessible]
90
+ - error_penalty [−40 if content is empty/error]
91
+ ```
92
+
93
+ Vital keywords are drawn from the `targets` field of the user profile — job titles, company names, and technology domains the user is specifically tracking.
94
+
95
+ Skill keywords are drawn from the `skills` field — the user's technical competencies. A match here adds relevancy signal but at lower weight than a direct target match.
96
+
97
+ The location accessibility bonus is applied when the content explicitly mentions "remote", "worldwide", "anywhere", or the user's stated location. This is not a geographic filter — it is a signal boost for content that is accessible to the user regardless of their physical location.
98
+
99
+ **Hard exclusions:** If any term from the `exclusion_terms` list appears in the content, R_0 is forced to zero. The result is still stored (for audit purposes) but marked `is_relevant = 0`.
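A sketch of the R_0 computation, with keyword matching reduced to case-insensitive substring checks (the `Profile` shape is illustrative, not the platform's actual type):

```typescript
interface Profile {
  targets: string[];         // vital keywords
  skills: string[];          // skill keywords
  location: string;
  exclusion_terms: string[];
}

function baseScore(content: string, p: Profile): number {
  const text = content.toLowerCase();
  // Hard exclusion: any exclusion term forces R_0 to zero.
  if (p.exclusion_terms.some((t) => text.includes(t.toLowerCase()))) return 0;
  // Empty/error content: baseline 40 minus the 40-point error penalty.
  if (text.trim().length === 0) return 0;
  const vital = p.targets.filter((t) => text.includes(t.toLowerCase())).length;
  const skill = p.skills.filter((s) => text.includes(s.toLowerCase())).length;
  const accessible = ["remote", "worldwide", "anywhere", p.location.toLowerCase()]
    .some((w) => w !== "" && text.includes(w));
  const r0 = 40
    + Math.min(vital * 15, 35)   // vital matches, capped at +35
    + Math.min(skill * 3, 15)    // skill matches, capped at +15
    + (accessible ? 8 : 0);      // location accessibility bonus
  return Math.min(r0, 100);
}
```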
100
+
101
+ ### 2.3 Decay Function (R_t)
102
+
103
+ ```
104
+ R_t = R_0 · e^(-λt)
105
+ ```
106
+
107
+ Where:
108
+ - `λ` = source-specific decay constant (per hour)
109
+ - `t` = hours elapsed since `published_at`
110
+
111
+ If `published_at` cannot be extracted from the content, `t` is assumed to equal one half-life for that source (conservative assumption — signal is treated as partially decayed but not dead).
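In code, with the half-life fallback matching the conservative assumption above:

```typescript
// R_t = R_0 * e^(-lambda * t); a missing published_at defaults t to one
// half-life (ln 2 / lambda), i.e. the score is halved.
function rtScore(
  r0: number,
  lambda: number,
  hoursSincePublished: number | null,
): number {
  const t = hoursSincePublished ?? Math.log(2) / lambda;
  return r0 * Math.exp(-lambda * t);
}
```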
112
+
113
+ ### 2.4 Source Decay Constants (λ)
114
+
115
+ These constants represent the platform's proprietary calibration of how quickly signals from each source class lose intelligence value:
116
+
117
+ | Source | λ (per hour) | Half-life |
118
+ |---|---|---|
119
+ | Hacker News | 0.050 | ~14 hours |
120
+ | Reddit | 0.010 | ~3 days |
121
+ | Product Hunt | 0.010 | ~3 days |
122
+ | Job listings | 0.005 | ~6 days |
123
+ | Financial data | 0.001 | ~29 days |
124
+ | YC companies | 0.001 | ~29 days |
125
+ | Package trends | 0.0005 | ~58 days |
126
+ | GitHub repositories | 0.0002 | ~5 months |
127
+ | Academic papers | 0.00005 | ~1.6 years |
128
+
129
+ These constants are calibrated against observed information decay rates across source types. They are the platform's primary trade secret and are not exposed in API responses.
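The half-lives in the table follow directly from t½ = ln 2 / λ, which makes the published pairs easy to sanity-check:

```typescript
// Half-life in hours from a per-hour decay constant.
const halfLifeHours = (lambda: number): number => Math.log(2) / lambda;

// Checking against the table above:
//   Hacker News:  lambda = 0.050  -> ~13.9 hours ("~14 hours")
//   Job listings: lambda = 0.005  -> ~5.8 days   ("~6 days")
//   GitHub repos: lambda = 0.0002 -> ~4.8 months ("~5 months")
```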
130
+
131
+ ### 2.5 Entropy Classification
132
+
133
+ Each signal is classified into one of three entropy states based on its position on the decay curve:
134
+
135
+ | State | Condition | Interpretation |
136
+ |---|---|---|
137
+ | `low` | `t < half_life / 2` | Signal near peak value — act now |
138
+ | `stable` | `t < 1.5 × half_life` | Usable signal — monitor |
139
+ | `high` | `t ≥ 1.5 × half_life` | Significantly degraded — verify before acting |
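The classification in the table can be expressed directly, using the same thresholds:

```typescript
type Entropy = "low" | "stable" | "high";

// Position on the decay curve, measured in hours since publication.
function entropyLevel(tHours: number, lambda: number): Entropy {
  const halfLife = Math.log(2) / lambda;
  if (tHours < halfLife / 2) return "low";        // near peak value
  if (tHours < 1.5 * halfLife) return "stable";   // usable, monitor
  return "high";                                  // verify before acting
}
```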
140
+
141
+ ### 2.6 Relevancy Threshold
142
+
143
+ Signals with `rt_score < 35` are stored with `is_relevant = 0`. They remain in the database for audit and historical analysis but are excluded from briefings and the intelligence feed by default. The threshold is configurable per profile.
144
+
145
+ ---
146
+
147
+ ## Section 3: Provenance and Auditability
148
+
149
+ ### 3.1 The Ha-Pri Audit Signature
150
+
151
+ Every signal stored in the FreshContext database carries a `ha_pri_sig` — a SHA-256 audit signature computed as:
152
+
153
+ ```
154
+ SHA-256( result_id + ":" + content_hash + ":" + "FRESHCONTEXT_DAR_V1" )
155
+ ```
156
+
157
+ This signature serves three purposes:
158
+
159
+ 1. **Tamper detection** — the signature binds the content hash to the result ID and the engine version. Any modification to the stored content would invalidate the signature.
160
+ 2. **Provenance chain** — every row in the `scrape_results` table is cryptographically linked to the moment it was scored by the DAR engine.
161
+ 3. **Licensing audit** — when FreshContext data is provided to a third party under licence, the `ha_pri_sig` column provides an immutable record of exactly what was delivered and when.
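The signature formula above is a few lines in code (assuming Node's `crypto`; `FRESHCONTEXT_DAR_V1` is the engine-version tag from the formula):

```typescript
import { createHash } from "node:crypto";

// SHA-256( result_id + ":" + content_hash + ":" + "FRESHCONTEXT_DAR_V1" )
function haPriSig(resultId: string, contentHash: string): string {
  return createHash("sha256")
    .update(`${resultId}:${contentHash}:FRESHCONTEXT_DAR_V1`)
    .digest("hex"); // 64 hex characters
}
```

Recomputing the signature from a stored row and comparing it to the stored `ha_pri_sig` is the tamper check: any drift in content hash, ID, or engine version changes the digest.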
162
+
163
+ ### 3.2 D1 Historical Ledger
164
+
165
+ The `scrape_results` table functions as a **Contextual Ledger** — not merely a cache, but a time-series record of intelligence signals with full provenance.
166
+
167
+ Key properties of the ledger:
168
+ - Every row is immutable once written (no UPDATE operations on scored rows)
169
+ - Every row carries a `scraped_at` timestamp with second precision
170
+ - Every row carries a `published_at` date extracted from content (where available)
171
+ - The ledger accumulates continuously at 6-hour intervals regardless of active user sessions
172
+ - The ledger enables time-travel queries: "what was the intelligence landscape for topic X at date Y?"
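A time-travel query against the ledger might look like this. The D1 binding (`env.DB`) is hypothetical; column names are those of the `scrape_results` table:

```typescript
// "What was the intelligence landscape for topic X at date Y?"
const TIME_TRAVEL_SQL = `
  SELECT id, adapter, query, rt_score, entropy_level, published_at
  FROM scrape_results
  WHERE query LIKE ?1
    AND scraped_at <= ?2
    AND is_relevant = 1
  ORDER BY rt_score DESC
  LIMIT 20
`;

// e.g. env.DB.prepare(TIME_TRAVEL_SQL).bind("%mcp%", "2026-03-01T00:00:00Z").all()
```

Because scored rows are never updated, the result set for a fixed cutoff timestamp is reproducible, which is what makes such queries meaningful as an audit tool.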
173
+
174
+ ### 3.3 Schema Reference
175
+
176
+ ```sql
177
+ scrape_results (
178
+ id TEXT PRIMARY KEY, -- sr_{timestamp}_{random}
179
+ watched_query_id TEXT, -- FK → watched_queries.id
180
+ adapter TEXT, -- source adapter name
181
+ query TEXT, -- the search term used
182
+ raw_content TEXT, -- scraped content (max 8000 chars)
183
+ result_hash TEXT, -- 32-bit rolling hash of raw_content
184
+ semantic_fingerprint TEXT, -- 16-char SHA-256 of normalised title|url|date
185
+ is_new INTEGER, -- 1 until consumed by briefing
186
+ scraped_at TEXT, -- ISO 8601 UTC timestamp
187
+ published_at TEXT, -- extracted content publication date
188
+ relevancy_score INTEGER, -- = round(rt_score), 0-100
189
+ is_relevant INTEGER, -- 1 if rt_score >= 35, else 0
190
+ base_score INTEGER, -- R_0 semantic score, 0-100
191
+ rt_score REAL, -- R_t decay-adjusted score, 0-100
192
+ ha_pri_sig TEXT, -- SHA-256 audit signature (64 hex chars)
193
+ entropy_level TEXT -- 'low' | 'stable' | 'high'
194
+ )
195
+ ```
196
+
197
+ ---
198
+
199
+ ## Section 4: The Intelligence Feed
200
+
201
+ ### 4.1 Endpoint
202
+
203
+ ```
204
+ GET /v1/intel/feed/{profile_id}
205
+ ```
206
+
207
+ Optional parameters:
208
+ - `limit` — maximum signals to return (default: 20)
209
+ - `min_rt` — minimum rt_score filter (default: 0)
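A client-side sketch (the base URL is a placeholder for your deployment; only the `limit` and `min_rt` parameters from the list above are assumed):

```typescript
// Build the feed URL; defaults mirror the documented ones.
function feedUrl(base: string, profileId: string, limit = 20, minRt = 0): string {
  return `${base}/v1/intel/feed/${encodeURIComponent(profileId)}?limit=${limit}&min_rt=${minRt}`;
}

// Fetch and parse the feed (any fetch-capable runtime).
async function intelFeed(base: string, profileId: string): Promise<unknown> {
  const res = await fetch(feedUrl(base, profileId));
  if (!res.ok) throw new Error(`feed request failed: ${res.status}`);
  return res.json();
}
```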
210
+
211
+ ### 4.2 Response Structure
212
+
213
+ ```json
214
+ {
215
+ "feed_metadata": {
216
+ "profile_id": "default",
217
+ "generated_at": "2026-04-14T09:00:00Z",
218
+ "signal_count": 18,
219
+ "version": "freshcontext-1.1"
220
+ },
221
+ "signals": [
222
+ {
223
+ "signal_id": "sr_1744628412_a3f7b",
224
+ "source": "hackernews",
225
+ "label": "HN: MCP Servers",
226
+ "content": {
227
+ "preview": "...",
228
+ "url": "mcp server 2026"
229
+ },
230
+ "intelligence_stamps": {
231
+ "scraped_at": "2026-04-14T08:12:00Z",
232
+ "published_at": "2026-04-14",
233
+ "base_score": 78,
234
+ "rt_score": 61.4,
235
+ "entropy_level": "stable",
236
+ "ha_pri_sig": "a3f7b2c1d4e5f6a7b8c9d0e1f2a3b4c5..."
237
+ }
238
+ }
239
+ ]
240
+ }
241
+ ```
242
+
243
+ ### 4.3 LLM Integration
244
+
245
+ The intelligence feed is designed to be consumed directly by any language model or AI agent without modification. The `intelligence_stamps` block gives the agent everything it needs to reason about data freshness:
246
+
247
+ - `rt_score` — a single number representing current signal value
248
+ - `entropy_level` — human-readable decay state
249
+ - `published_at` — the actual content date (not the retrieval date)
250
+ - `ha_pri_sig` — provenance reference the agent can cite
251
+
252
+ This is the core value proposition: **AI agents get grounded, timestamped, scored intelligence rather than undated web content of unknown age.**
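On the consumer side, a minimal triage pass before prompting might look like this. Field names are taken from the response structure above; the threshold of 35 mirrors the relevancy cutoff in Section 2.6:

```typescript
interface Stamps {
  rt_score: number;
  entropy_level: "low" | "stable" | "high";
  published_at: string;
}

interface Signal {
  signal_id: string;
  intelligence_stamps: Stamps;
}

// Drop degraded or low-value signals, then rank by decay-adjusted score
// so the freshest, most relevant signals lead the prompt.
function triage(signals: Signal[], minRt = 35): Signal[] {
  return signals
    .filter((s) =>
      s.intelligence_stamps.rt_score >= minRt &&
      s.intelligence_stamps.entropy_level !== "high")
    .sort((a, b) => b.intelligence_stamps.rt_score - a.intelligence_stamps.rt_score);
}
```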
253
+
254
+ ---
255
+
256
+ ## Section 5: Asset Summary
257
+
258
+ For acquirers, investors, and licensing partners:
259
+
260
+ **What FreshContext owns:**
261
+
262
+ 1. **The FreshContext Specification v1.1** (MIT licence, open standard) — defines the envelope format, confidence levels, and structured JSON form. Timestamped in the public GitHub repository.
263
+
264
+ 2. **The DAR Engine** (proprietary) — the exponential decay scoring methodology with source-specific λ constants. These constants are not published and constitute trade secret IP.
265
+
266
+ 3. **The Semantic Fingerprinting Method** (proprietary) — the three-field normalisation and SHA-256 fingerprinting approach for cross-adapter deduplication.
267
+
268
+ 4. **The Ha-Pri Audit Signature scheme** (proprietary) — the provenance binding method that makes the historical ledger tamper-evident.
269
+
270
+ 5. **The Historical D1 Ledger** (data asset) — the continuously accumulating time-series dataset. As of the date of this document, the ledger has been running since early 2026 with 6-hour collection intervals across 18 watched queries. The dataset grows in defensibility with every passing day.
271
+
272
+ 6. **The Reference Implementation** — `freshcontext-mcp@0.3.15`, listed on the official MCP Registry and npm. Deployed globally on Cloudflare's edge infrastructure.
273
+
274
+ ---
275
+
276
+ *"The work isn't gone. It's just waiting to be continued."*
277
+ *— Prince Gabriel, Grootfontein, Namibia*
package/README.md CHANGED
@@ -8,12 +8,15 @@ That's the problem freshcontext fixes.
8
8
 
9
9
  [![npm version](https://img.shields.io/npm/v/freshcontext-mcp)](https://www.npmjs.com/package/freshcontext-mcp)
10
10
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
11
+ [![MCP Registry](https://img.shields.io/badge/MCP%20Registry-Listed-blue)](https://registry.modelcontextprotocol.io)
11
12
 
12
13
  ---
13
14
 
14
- ## What it does
15
+ ## The Standard
15
16
 
16
- Every MCP server returns data. freshcontext returns data **plus when it was retrieved and how confident that date is** wrapped in a FreshContext envelope:
17
+ FreshContext is a **data freshness layer for AI agents** — an open standard and reference implementation that makes retrieved data trustworthy.
18
+
19
+ Every piece of web data an AI agent retrieves has an age. Most tools ignore it. FreshContext surfaces it — wrapping every result in a structured envelope that carries three guarantees:
17
20
 
18
21
  ```
19
22
  [FRESHCONTEXT]
@@ -26,11 +29,13 @@ Confidence: high
26
29
  [/FRESHCONTEXT]
27
30
  ```
28
31
 
29
- Claude now knows the difference between something from this morning and something from two years ago. You do too.
32
+ **When** it was retrieved. **Where** it came from. **How confident** we are the date is accurate.
33
+
34
+ The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md)
30
35
 
31
36
  ---
32
37
 
33
- ## 13 tools. No API keys.
38
+ ## 20 tools. No API keys.
34
39
 
35
40
  ### Intelligence
36
41
  | Tool | What it gets you |
@@ -52,22 +57,69 @@ Claude now knows the difference between something from this morning and somethin
52
57
  ### Market data
53
58
  | Tool | What it gets you |
54
59
  |---|---|
55
- | `extract_finance` | Live stock data — price, market cap, P/E, 52w range |
60
+ | `extract_finance` | Live stock data — price, market cap, P/E, 52w range. Up to 5 tickers. |
61
+ | `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" — every listing dated |
62
+
63
+ ### Composites — multiple sources, one call
64
+ | Tool | Sources | What it gets you |
65
+ |---|---|---|
66
+ | `extract_landscape` | 6 | YC + GitHub + HN + Reddit + Product Hunt + npm in parallel |
67
+ | `extract_idea_landscape` | 6 | HN + YC + GitHub + Jobs + npm + Product Hunt — full idea validation |
68
+ | `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub + changelog |
69
+ | `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
70
+ | `extract_company_landscape` | 5 | **The full picture on any company** — see below |
71
+
72
+ ### Unique — not available in any other MCP server
73
+ | Tool | Source | What it gets you |
74
+ |---|---|---|
75
+ | `extract_changelog` | GitHub Releases API / npm / auto-discover | Update history from any repo, package, or website |
76
+ | `extract_govcontracts` | USASpending.gov | US federal contract awards — company, amount, agency, period |
77
+ | `extract_sec_filings` | SEC EDGAR | 8-K filings — legally mandated material event disclosures |
78
+ | `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, every country, 15-min updates |
79
+ | `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset, no auth |
56
80
 
57
- ### Composite
58
- | Tool | What it gets you |
59
- |---|---|
60
- | `extract_landscape` | One call. YC + GitHub + HN + Reddit + Product Hunt + npm in parallel. Full timestamped picture. |
81
+ ---
61
82
 
62
- ### Update intelligence — unique to FreshContext
63
- | Tool | What it gets you |
64
- |---|---|
65
- | `extract_changelog` | Update history from any GitHub repo, npm package, or website. Accepts a GitHub URL (uses the Releases API), an npm package name, or any website URL — auto-discovers `/changelog`, `/releases`, and `CHANGELOG.md`. Returns version numbers, release dates, and entry content, all timestamped. Use this to check if a dependency is still actively maintained, or to find out exactly when a feature shipped before referencing it. |
83
+ ## extract_idea_landscape
66
84
 
67
- ### Government intelligence — unique to FreshContext
68
- | Tool | What it gets you |
69
- |---|---|
70
- | `extract_govcontracts` | US federal contract awards pulled live from USASpending.gov — the official US Treasury database, updated daily. Search by company name, keyword, or NAICS code. Returns award amounts, awarding agency, period of performance, and contract description, all timestamped. A company that just won a $10M DoD contract is actively hiring and spending — that is a buying intent signal no other MCP server surfaces. |
85
+ Built for the moment before you start building. Six sources fired in parallel to answer: *should I build this?*
86
+
87
+ 1. **Hacker News** — what are developers actively complaining about (pain signal)
88
+ 2. **YC Companies** — who has already received funding in this space (funding signal)
89
+ 3. **GitHub** — how crowded the open source landscape is (crowding signal)
90
+ 4. **Job listings** — companies hiring around this problem = real budget = real market (market signal)
91
+ 5. **npm / PyPI** — ecosystem adoption and release velocity (ecosystem signal)
92
+ 6. **Product Hunt** — what just launched and how the market received it (launch signal)
93
+
94
+ ```
95
+ Use extract_idea_landscape with idea "data freshness for AI agents"
96
+ ```
97
+
98
+ ---
99
+
100
+ ## extract_company_landscape
101
+
102
+ The most complete single-call company analysis available in any MCP server. Five sources fired in parallel:
103
+
104
+ 1. **SEC EDGAR** — what did they legally just disclose (8-K filings)
105
+ 2. **USASpending.gov** — who is giving them government money
106
+ 3. **GDELT** — what is global news saying right now
107
+ 4. **Changelog** — are they actually shipping product
108
+ 5. **Yahoo Finance** — what is the market pricing in
109
+
110
+ ```
111
+ Use extract_company_landscape with company "Palantir" and ticker "PLTR"
112
+ ```
113
+
114
+ Real output from March 2026:
115
+
116
+ > **Q4 2025:** Revenue $1.407B (+70% YoY). US commercial +137%. Rule of 40 score: **127%**.
117
+ > **Federal contracts:** $292.7M Army Maven Smart System · $252.5M CDAO · $145M ICE · $130M Air Force · more
118
+ > **SEC filing:** Q4 earnings 8-K filed Feb 3, 2026 — GAAP net income $609M, 43% margin
119
+ > **GDELT:** ICE/Medicaid data controversy, UK MoD security warning, NHS opposition — all timestamped
120
+ > **PLTR:** ~$154–157 · Market cap ~$370B · P/E 244x · 52w range $66 → $207
121
+
122
+ Bloomberg Terminal doesn't read commit history as a company health signal. FreshContext does.
71
123
 
72
124
  ---
73
125
 
@@ -154,35 +206,53 @@ touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
154
206
 
155
207
  ## Usage examples
156
208
 
209
+ **Should I build this idea?**
210
+ ```
211
+ Use extract_idea_landscape with idea "procurement intelligence saas"
212
+ ```
213
+ Returns funding signal, pain signal, crowding signal, market signal, ecosystem signal, and launch signal — all timestamped.
214
+
215
+ **Full company intelligence in one call:**
216
+ ```
217
+ Use extract_company_landscape with company "Palantir" and ticker "PLTR"
218
+ ```
219
+ SEC filings + federal contracts + global news + changelog + market data. The complete picture.
220
+
157
221
  **Is anyone already building what you're building?**
158
222
  ```
159
223
  Use extract_landscape with topic "cashflow prediction saas"
160
224
  ```
161
225
  Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
162
226
 
163
- **What's the community actually saying right now?**
227
+ **What's Singapore's government procuring right now?**
164
228
  ```
165
- Use extract_reddit on r/MachineLearning
166
- Use extract_hackernews to search "mcp server 2026"
229
+ Use extract_gebiz with url "artificial intelligence"
167
230
  ```
231
+ Returns live tenders from the Ministry of Finance open dataset — agency, amount, closing date, all timestamped.
168
232
 
169
- **Did that company actually ship recently?**
233
+ **Did that company just disclose something material?**
170
234
  ```
171
- Use extract_github on https://github.com/some-org/some-repo
235
+ Use extract_sec_filings with url "Palantir Technologies"
172
236
  ```
173
- Check `Published` vs `Retrieved`. If the gap is 18 months, Claude will tell you.
237
+ 8-K filings are legally mandated within 4 business days of any material event — CEO change, acquisition, breach, major contract.
174
238
 
175
- **Is this dependency still actively maintained?**
239
+ **What is global news saying about a company right now?**
176
240
  ```
177
- Use extract_changelog with url "https://github.com/org/repo"
241
+ Use extract_gdelt with url "Palantir"
178
242
  ```
179
- Returns the last 8 releases with exact dates. If the last release was 18 months ago, you'll know before you pin the version.
243
+ 100+ languages, every country, updated every 15 minutes. Surfaces what Western sources miss.
180
244
 
181
- **Which companies just won government contracts in AI?**
245
+ **Which companies just won US government contracts in AI?**
182
246
  ```
183
247
  Use extract_govcontracts with url "artificial intelligence"
184
248
  ```
185
- Returns the largest recent federal contract awards matching that keyword — company name, amount, agency, and award date. Pure buying intent signal.
249
+ Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
250
+
251
+ **Is this dependency still actively maintained?**
252
+ ```
253
+ Use extract_changelog with url "https://github.com/org/repo"
254
+ ```
255
+ Returns the last 8 releases with exact dates. If the last release was 18 months ago, you'll know before you pin the version.
186
256
 
187
257
  ---
188
258
 
@@ -190,14 +260,15 @@ Returns the largest recent federal contract awards matching that keyword — com
190
260
 
191
261
  Most AI tools retrieve data silently. No timestamp, no signal, no way for the agent to know how old it is.
192
262
 
193
- freshcontext treats **retrieval time as first-class metadata**. Every adapter returns:
263
+ FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
194
264
 
195
265
  - `retrieved_at` — exact ISO timestamp of the fetch
196
266
  - `content_date` — best estimate of when the content was originally published
197
267
  - `freshness_confidence` — `high`, `medium`, or `low` based on signal quality
268
+ - `freshness_score` — numeric 0–100 with domain-specific decay rates (financial data at 5.0, academic papers at 0.3)
198
269
  - `adapter` — which source the data came from
199
270
 
200
- When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, freshcontext tells you why.
271
+ When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, FreshContext tells you why.
201
272
 
202
273
  ---
203
274
 
@@ -212,28 +283,111 @@ When confidence is `high`, the date came from a structured field (API, metadata)
212
283
 
213
284
  ## Roadmap
214
285
 
215
- - [x] GitHub, HN, Scholar, YC, Reddit, Product Hunt, Finance, arXiv adapters
216
- - [x] `extract_landscape` — 6-source composite tool
217
- - [x] Cloudflare Workers deployment
218
- - [x] KV-backed global rate limiting
219
- - [x] Listed on official MCP Registry
286
+ - [x] 20 tools across intelligence, competitive research, market data, and composites
220
287
  - [x] `extract_changelog` — update cadence from any repo, package, or website
221
288
  - [x] `extract_govcontracts` — US federal contract intelligence via USASpending.gov
289
+ - [x] `extract_sec_filings` — SEC EDGAR 8-K material event filings
290
+ - [x] `extract_gdelt` — GDELT global news intelligence (100+ languages)
291
+ - [x] `extract_gebiz` — Singapore Government procurement via data.gov.sg
292
+ - [x] `extract_company_landscape` — 5-source company intelligence composite
293
+ - [x] `extract_idea_landscape` — 6-source idea validation composite
294
+ - [x] `freshness_score` numeric metric (0–100) with domain-specific decay rates
295
+ - [x] Cloudflare Workers deployment — global edge with KV caching and rate limiting
296
+ - [x] D1 database — 18 watched queries running on 6-hour cron with relevancy scoring
297
+ - [x] Listed on official MCP Registry
222
298
  - [x] Listed on Apify Store
223
- - [x] FreshContext Specification v1.0 published
224
- - [ ] TTL-based caching layer
225
- - [ ] `freshness_score` numeric metric (0–100)
226
- - [ ] `extract_devto` — developer article sentiment
227
- - [ ] `extract_npm_releases` — package release velocity
299
+ - [x] FreshContext Specification v1.1 published (MIT) — composite adapters, decay rate table, compatibility levels
300
+ - [x] GitHub Actions CI/CD — auto-publish to npm on every push
301
+ - [x] **DAR engine** — exponential decay scoring with proprietary λ constants (v0.3.15)
302
+ - [x] **Ha-Pri audit signatures** — SHA-256 provenance stamps on every signal
303
+ - [x] **Semantic deduplication** — cross-adapter fingerprinting
304
+ - [x] **Intelligence feed endpoint** — `/v1/intel/feed/:profile_id`
305
+ - [x] **METHODOLOGY.md** — formal IP documentation
306
+ - [ ] Webhook triggers — push high-entropy signals on threshold
307
+ - [ ] Domain-specific watched queries for mining/industrial sector
308
+ - [ ] Subscription tier with profile customization
309
+ - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
310
+ - [ ] Dashboard — React frontend for the D1 intelligence pipeline
228
311
 
229
312
  ---
230
313
 
231
314
  ## Contributing
232
315
 
233
- PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern.
316
+ PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.
317
+
318
+ If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
234
319
 
235
320
  ---
236
321
 
237
322
  ## License
238
323
 
239
324
  MIT
325
+
326
+ ---
327
+
328
+ *Built by Prince Gabriel — Grootfontein, Namibia 🇳🇦*
329
+ *"The work isn't gone. It's just waiting to be continued."*
330
+
331
+ ---
332
+
333
+ **Also on:** [Apify Store](https://apify.com/prince_gabriel/freshcontext-mcp) · [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
334
+
335
+ ---
336
+
337
+ ## The Intelligence Layer (v0.3.15)
338
+
339
+ FreshContext is no longer just a pull tool. The infrastructure now runs a continuous **Decay-Adjusted Relevancy (DAR)** engine that scores every signal with exponential decay and provenance signatures.
340
+
341
+ ### The math
342
+
343
+ ```
344
+ R_t = R_0 · e^(-λt)
345
+ ```
346
+
347
+ - `R_0` — base semantic score against your profile (0–100)
348
+ - `λ` — source-specific decay constant (per hour)
349
+ - `t` — hours since the content was published
350
+ - `R_t` — final relevancy at query time
351
+
352
+ Source half-lives are calibrated empirically: Hacker News ≈14h, Reddit ≈3d, jobs ≈6d, GitHub ≈5mo, academic papers ≈1.6y.
353
+
354
+ ### What every signal carries
355
+
356
+ Every row in the D1 ledger is stamped with:
357
+
358
+ - `base_score` — R_0, semantic match against profile
359
+ - `rt_score` — R_t, decay-adjusted relevancy
360
+ - `entropy_level` — `low` / `stable` / `high` on the decay curve
361
+ - `ha_pri_sig` — SHA-256 provenance signature (tamper-evident)
362
+ - `semantic_fingerprint` — cross-adapter deduplication hash
363
+ - `published_at` — extracted content publication date
364
+
365
+ ### The intelligence feed
366
+
367
+ ```
368
+ GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
369
+ ```
370
+
371
+ Returns scored, deduplicated, provenance-stamped signals ranked by R_t — ready for direct consumption by any LLM or agent. No synthesis needed.
372
+
373
+ ### Methodology
374
+
375
+ The full data collection, scoring, and provenance methodology is formally documented in [METHODOLOGY.md](./METHODOLOGY.md) — written as an audit trail for acquirers, integrators, and regulators. Version 1.1, April 2026.
376
+
377
+ ---
378
+
379
+ ## Live endpoints
380
+
381
+ | Endpoint | Method | Purpose |
382
+ |---|---|---|
383
+ | `/` | GET | Service info + endpoint list |
384
+ | `/health` | GET | Liveness check |
385
+ | `/mcp` | POST | MCP JSON-RPC transport |
386
+ | `/briefing` | GET | Latest stored briefing |
387
+ | `/briefing/now` | POST | Force scrape + synthesize |
388
+ | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
389
+ | `/watched-queries` | GET | List all watched queries |
390
+ | `/debug/db` | GET | D1 counts + DAR engine coverage |
391
+ | `/debug/scrape` | GET | Run a single adapter raw |
392
+
393
+ Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`