freshcontext-mcp 0.3.14 → 0.3.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/METHODOLOGY.md ADDED
@@ -0,0 +1,277 @@
1
+ # FreshContext Data Intelligence Methodology
2
+ **Version 1.1 — April 2026**
3
+ *Authored by Immanuel Gabriel (Prince Gabriel) — Grootfontein, Namibia*
4
+
5
+ ---
6
+
7
+ ## What This Document Is
8
+
9
+ This document formally describes the data collection, scoring, and provenance methodology underlying the FreshContext intelligence platform.
10
+
11
+ It exists for three audiences:
12
+
13
+ 1. **Technical integrators** — teams embedding FreshContext into their agent infrastructure who need to understand what the data represents and how it is scored.
14
+ 2. **Acquirers and licensing partners** — entities evaluating FreshContext as an asset, who need to audit the methodology that makes the data defensible.
15
+ 3. **Regulators and auditors** — who may need to verify that the platform's data claims are substantiated by documented, reproducible methodology.
16
+
17
+ ---
18
+
19
+ ## Section 1: Data Collection
20
+
21
+ ### 1.1 Architecture
22
+
23
+ FreshContext operates a continuous data collection pipeline on Cloudflare's global edge infrastructure. A scheduled cron job runs the pipeline every 6 hours, executing the 18 watched queries defined in the platform's D1 database.
24
+
25
+ Each watched query specifies:
26
+ - **Adapter** — the data source to query (e.g., `hackernews`, `jobs`, `reposearch`)
27
+ - **Query** — the search term or URL
28
+ - **User ID** — the profile this query serves
29
+ - **Filters** — optional parameters (location, exclusion terms, etc.)
30
+
31
+ ### 1.2 Adapters
32
+
33
+ FreshContext implements 11 production adapters. Those used by the collection pipeline include:
34
+
35
+ | Adapter | Source | Auth Required | Update Frequency |
36
+ |---|---|---|---|
37
+ | `hackernews` | Hacker News Algolia API | None | Real-time |
38
+ | `jobs` | Remotive API | None | Continuous |
39
+ | `reposearch` | GitHub Search API | Optional (rate limit) | Real-time |
40
+ | `github` | GitHub Repository API | Optional | Real-time |
41
+ | `reddit` | Reddit JSON API | None | Real-time |
42
+ | `yc` | YC Open Source API | None | Per batch cycle |
43
+ | `packagetrends` | npm Registry + npm Downloads API | None | Per publish |
44
+ | `finance` | Yahoo Finance API | None | Market hours |
46
+
47
+ All adapters operate exclusively on **publicly accessible data**. No credentials are required for data access; the GitHub adapters optionally accept a token to raise API rate limits. All fetch requests include a `User-Agent` header identifying the FreshContext crawler.
48
+
49
+ ### 1.3 Content Hash Deduplication
50
+
51
+ Before any signal is stored, the platform computes a 32-bit rolling hash of the raw content. If the most recent stored result for a given watched query carries an identical hash, the current result is discarded. This prevents storing unchanged content across cron cycles.
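The exact-match check can be sketched in TypeScript as follows. The document does not specify which 32-bit hash the platform uses, so this sketch assumes a simple polynomial string hash (`h * 31 + charCode`, wrapped to 32 bits) stored as 8 hex characters; `hash32` and `isUnchanged` are hypothetical names.

```typescript
// Hypothetical stand-in for the platform's 32-bit content hash: a simple
// polynomial string hash (h = h * 31 + charCode), wrapped to 32 bits.
function hash32(content: string): string {
  let h = 0;
  for (let i = 0; i < content.length; i++) {
    // (h << 5) - h === h * 31; `| 0` keeps the value in signed 32-bit range
    h = ((h << 5) - h + content.charCodeAt(i)) | 0;
  }
  // Store as 8 unsigned hex characters, e.g. in the `result_hash` column
  return (h >>> 0).toString(16).padStart(8, "0");
}

// True when the new content matches the most recent stored result for the
// same watched query, i.e. the new result should be discarded.
function isUnchanged(newContent: string, latestStoredHash: string | null): boolean {
  return latestStoredHash !== null && hash32(newContent) === latestStoredHash;
}
```

Only the most recent stored hash is compared, so a 32-bit collision would at worst cause one changed result to be skipped for a single cycle.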
52
+
53
+ ### 1.4 Semantic Deduplication
54
+
55
+ Beyond exact-match deduplication, FreshContext implements semantic deduplication to prevent the same underlying story from appearing as multiple signals when it is covered by multiple sources (e.g., the same GitHub release appearing in both HN and Reddit).
56
+
57
+ The semantic fingerprint is computed as follows:
58
+
59
+ 1. Extract the first canonical URL from the raw content
60
+ 2. Extract the first ISO 8601 publication date from the raw content
61
+ 3. Extract and normalise the first substantive line (title) — lowercased, punctuation stripped, truncated to 80 characters
62
+ 4. Concatenate: `normalised_title|canonical_url|publication_date`
63
+ 5. Compute SHA-256 of the concatenated string
64
+ 6. Retain the first 16 hex characters as the fingerprint
65
+
66
+ If any signal stored within the preceding 48 hours carries an identical fingerprint, the new result is discarded. The 48-hour window is configurable.
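A minimal sketch of the six fingerprint steps and the 48-hour window check, using Node's `node:crypto` module (a Cloudflare Worker would use the asynchronous `crypto.subtle.digest` instead). The URL and date regular expressions, the rule that the "first substantive line" is the first non-empty line, the ASCII-only normalisation, and the names `semanticFingerprint` and `isSemanticDuplicate` are illustrative assumptions, not the platform's code.

```typescript
import { createHash } from "node:crypto";

// Illustrative extraction patterns (assumptions, not the platform's own).
const URL_RE = /https?:\/\/[^\s)"'<>]+/;
const ISO_DATE_RE = /\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2})?(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)?/;

// Step 3: first non-empty line, lowercased, keeping only ASCII letters,
// digits and whitespace, truncated to 80 characters.
function normaliseTitle(raw: string): string {
  const firstLine = raw.split("\n").map((l) => l.trim()).find((l) => l.length > 0) ?? "";
  return firstLine.toLowerCase().replace(/[^a-z0-9\s]/g, "").trim().slice(0, 80);
}

// Steps 1-6: title|url|date, SHA-256, first 16 hex characters.
function semanticFingerprint(raw: string): string {
  const url = raw.match(URL_RE)?.[0] ?? "";
  const date = raw.match(ISO_DATE_RE)?.[0] ?? "";
  const key = `${normaliseTitle(raw)}|${url}|${date}`;
  return createHash("sha256").update(key).digest("hex").slice(0, 16);
}

interface StoredFingerprint {
  fingerprint: string;
  scrapedAt: Date;
}

// Discard the new result if the same fingerprint was stored within the window.
function isSemanticDuplicate(
  fingerprint: string,
  recent: StoredFingerprint[],
  now: Date,
  windowHours = 48,
): boolean {
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;
  return recent.some((r) => r.fingerprint === fingerprint && r.scrapedAt.getTime() >= cutoff);
}
```

Because punctuation and case are stripped before hashing, "Show HN: Foo 1.0 released!" and "show hn foo 10 released" with the same URL and date produce the same fingerprint.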
67
+
68
+ ---
69
+
70
+ ## Section 2: Temporal Scoring — The DAR Engine
71
+
72
+ ### 2.1 Overview
73
+
74
+ The Decay-Adjusted Relevancy (DAR) engine computes two scores for every collected signal:
75
+
76
+ - **R_0 (Base Score)** — semantic relevancy of the content against the user's profile, independent of time
77
+ - **R_t (Decay-Adjusted Score)** — R_0 adjusted for how much time has elapsed since the content was published
78
+
79
+ The final stored `rt_score` is what drives signal ranking in briefings and the intelligence feed.
80
+
81
+ ### 2.2 Base Score Calculation (R_0)
82
+
83
+ R_0 is computed by matching content against the user profile:
84
+
85
+ ```
86
+ R_0 = baseline (40)
87
+ + vital_keyword_matches × 15 [capped at +35]
88
+ + skill_keyword_matches × 3 [capped at +15]
89
+ + location_accessibility_bonus [+8 if remote/accessible]
90
+ - error_penalty [−40 if content is empty/error]
91
+ ```
92
+
93
+ Vital keywords are drawn from the `targets` field of the user profile — job titles, company names, and technology domains the user is specifically tracking.
94
+
95
+ Skill keywords are drawn from the `skills` field — the user's technical competencies. A match here adds relevancy signal but at lower weight than a direct target match.
96
+
97
+ The location accessibility bonus is applied when the content explicitly mentions "remote", "worldwide", "anywhere", or the user's stated location. This is not a geographic filter — it is a signal boost for content that is accessible to the user regardless of their physical location.
98
+
99
+ **Hard exclusions:** If any term from the `exclusion_terms` list appears in the content, R_0 is forced to zero. The result is still stored (for audit purposes) but marked `is_relevant = 0`.
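The R_0 rules above translate into a short function. This is a sketch under stated assumptions: keywords are matched as case-insensitive substrings, each distinct keyword counts once, and the `Profile` shape and `computeBaseScore` name are hypothetical.

```typescript
interface Profile {
  targets: string[];        // vital keywords
  skills: string[];         // skill keywords
  location?: string;        // user's stated location
  exclusionTerms: string[]; // hard exclusions
}

// Number of distinct keywords that appear (case-insensitively) in the text.
function countMatches(text: string, keywords: string[]): number {
  return keywords.filter((k) => k.trim() !== "" && text.includes(k.toLowerCase())).length;
}

function computeBaseScore(content: string, profile: Profile, isError = false): number {
  const text = content.toLowerCase();

  // Hard exclusion: any exclusion term forces R_0 to zero.
  if (profile.exclusionTerms.some((t) => t.trim() !== "" && text.includes(t.toLowerCase()))) {
    return 0;
  }

  let score = 40; // baseline
  score += Math.min(countMatches(text, profile.targets) * 15, 35); // vital keywords
  score += Math.min(countMatches(text, profile.skills) * 3, 15);   // skill keywords

  // Location accessibility bonus
  const accessTerms = ["remote", "worldwide", "anywhere"];
  if (profile.location) accessTerms.push(profile.location.toLowerCase());
  if (accessTerms.some((t) => text.includes(t))) score += 8;

  // Error penalty for empty or failed fetches
  if (isError || text.trim() === "") score -= 40;

  return Math.max(0, Math.min(100, score));
}
```

Substring matching is the simplest reading of "keyword match"; a short skill such as `go` would also match inside `google`, so a word-boundary match may be preferable in practice.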
100
+
101
+ ### 2.3 Decay Function (R_t)
102
+
103
+ ```
104
+ R_t = R_0 · e^(-λt)
105
+ ```
106
+
107
+ Where:
108
+ - `λ` = source-specific decay constant (per hour)
109
+ - `t` = hours elapsed since `published_at`
110
+
111
+ If `published_at` cannot be extracted from the content, `t` is assumed to equal one half-life for that source (conservative assumption — signal is treated as partially decayed but not dead).
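The decay step, including the half-life fallback for a missing date, might look like this. Clamping future-dated content to `t = 0` is an added assumption, and `decayAdjustedScore` is a hypothetical name.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// R_t = R_0 * e^(-λt), with λ per hour and t in hours since publication.
function decayAdjustedScore(
  r0: number,
  lambdaPerHour: number,
  publishedAt: Date | null,
  now: Date,
): number {
  const halfLifeHours = Math.LN2 / lambdaPerHour;
  const t =
    publishedAt === null
      ? halfLifeHours // no date: assume one half-life has elapsed, so R_t = R_0 / 2
      : Math.max(0, (now.getTime() - publishedAt.getTime()) / MS_PER_HOUR); // future dates clamp to 0 (assumption)
  return r0 * Math.exp(-lambdaPerHour * t);
}
```

For example, a Hacker News item (λ = 0.050) with R_0 = 80 published 10 hours before scoring gets R_t = 80 · e^(−0.5) ≈ 48.5; the same item with no extractable date gets exactly half, 40.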
112
+
113
+ ### 2.4 Source Decay Constants (λ)
114
+
115
+ These constants represent the platform's proprietary calibration of how quickly signals from each source class lose intelligence value:
116
+
117
+ | Source | λ (per hour) | Half-life |
118
+ |---|---|---|
119
+ | Hacker News | 0.050 | ~14 hours |
120
+ | Reddit | 0.010 | ~3 days |
121
+ | Product Hunt | 0.010 | ~3 days |
122
+ | Job listings | 0.005 | ~6 days |
123
+ | Financial data | 0.001 | ~29 days |
124
+ | YC companies | 0.001 | ~29 days |
125
+ | Package trends | 0.0005 | ~58 days |
126
+ | GitHub repositories | 0.0002 | ~5 months |
127
+ | Academic papers | 0.00005 | ~1.6 years |
128
+
129
+ These constants are calibrated against observed information decay rates across source types. They are the platform's primary trade secret and are not exposed in API responses.
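The half-life column follows directly from the decay function: setting R_t equal to half of R_0 and solving for t gives t½ = ln 2 / λ. Three rows of the table, worked through:

```math
\begin{aligned}
R_0\,e^{-\lambda t_{1/2}} &= \tfrac{1}{2}R_0 \;\Longrightarrow\; t_{1/2} = \frac{\ln 2}{\lambda} \\
\text{Hacker News: } t_{1/2} &= \frac{0.6931}{0.050\ \text{h}^{-1}} \approx 13.9\ \text{h} \\
\text{Job listings: } t_{1/2} &= \frac{0.6931}{0.005\ \text{h}^{-1}} \approx 138.6\ \text{h} \approx 5.8\ \text{days} \\
\text{GitHub: } t_{1/2} &= \frac{0.6931}{0.0002\ \text{h}^{-1}} \approx 3466\ \text{h} \approx 144\ \text{days}
\end{aligned}
```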
130
+
131
+ ### 2.5 Entropy Classification
132
+
133
+ Each signal is classified into one of three entropy states based on its position on the decay curve:
134
+
135
+ | State | Condition | Interpretation |
136
+ |---|---|---|
137
+ | `low` | `t < half_life / 2` | Signal near peak value — act now |
138
+ | `stable` | `half_life / 2 ≤ t < 1.5 × half_life` | Usable signal — monitor |
139
+ | `high` | `t ≥ 1.5 × half_life` | Significantly degraded — verify before acting |
140
+
141
+ ### 2.6 Relevancy Threshold
142
+
143
+ Signals with `rt_score < 35` are stored with `is_relevant = 0`. They remain in the database for audit and historical analysis but are excluded from briefings and the intelligence feed by default. The threshold is configurable per profile.
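The entropy states and the relevancy threshold reduce to two comparisons. A sketch with hypothetical function names, using the default threshold of 35:

```typescript
type EntropyLevel = "low" | "stable" | "high";

// Position on the decay curve, with t and the half-life both in hours.
function classifyEntropy(tHours: number, halfLifeHours: number): EntropyLevel {
  if (tHours < halfLifeHours / 2) return "low";
  if (tHours < 1.5 * halfLifeHours) return "stable";
  return "high";
}

// is_relevant = 1 when rt_score reaches the (per-profile) threshold.
function isRelevant(rtScore: number, threshold = 35): boolean {
  return rtScore >= threshold;
}
```

Because R_t = R_0 · 2^(−t / t½), the `low`/`stable` boundary corresponds to R_t ≈ 0.71 · R_0 and the `stable`/`high` boundary to R_t ≈ 0.35 · R_0. A signal with only the baseline R_0 = 40 and no extractable date is scored at one half-life (R_t = 20), so it is stored with `is_relevant = 0` under the default threshold.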
144
+
145
+ ---
146
+
147
+ ## Section 3: Provenance and Auditability
148
+
149
+ ### 3.1 The Ha-Pri Audit Signature
150
+
151
+ Every signal stored in the FreshContext database carries a `ha_pri_sig` — a SHA-256 audit signature computed as:
152
+
153
+ ```
154
+ SHA-256( result_id + ":" + content_hash + ":" + "FRESHCONTEXT_DAR_V1" )
155
+ ```
156
+
157
+ This signature serves three purposes:
158
+
159
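+ 1. **Tamper detection** — the signature binds the content hash to the result ID and the engine version. Modifying the stored content without recomputing its hash and signature produces a mismatch when the signature is re-derived. Because the signature is an unkeyed SHA-256 digest, it exposes accidental or partial edits; it does not authenticate who wrote the row.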
160
+ 2. **Provenance chain** — every row in the `scrape_results` table is cryptographically linked to the moment it was scored by the DAR engine.
161
+ 3. **Licensing audit** — when FreshContext data is provided to a third party under licence, the `ha_pri_sig` column provides an immutable record of exactly what was delivered and when.
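Computing and re-checking a signature is a single hash call. This sketch assumes the `content_hash` input is the row's `result_hash` column and uses Node's `node:crypto` (a Worker would use `crypto.subtle.digest`); `haPriSig` and `verifyHaPriSig` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

const ENGINE_VERSION = "FRESHCONTEXT_DAR_V1";

// SHA-256( result_id + ":" + content_hash + ":" + engine version ), as 64 hex chars.
function haPriSig(resultId: string, contentHash: string): string {
  return createHash("sha256").update(`${resultId}:${contentHash}:${ENGINE_VERSION}`).digest("hex");
}

// Audit check: re-derive the signature from stored fields and compare.
function verifyHaPriSig(resultId: string, contentHash: string, storedSig: string): boolean {
  return haPriSig(resultId, contentHash) === storedSig;
}
```

Re-deriving the signature from `id` and `result_hash` and comparing it with `ha_pri_sig` is the audit check the tamper-detection point describes.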
162
+
163
+ ### 3.2 D1 Historical Ledger
164
+
165
+ The `scrape_results` table functions as a **Contextual Ledger** — not merely a cache, but a time-series record of intelligence signals with full provenance.
166
+
167
+ Key properties of the ledger:
168
+ - Every row's content and scores are immutable once written (no UPDATE operations on scored fields; only the `is_new` flag changes, when a briefing consumes the row)
169
+ - Every row carries a `scraped_at` timestamp with second precision
170
+ - Every row carries a `published_at` date extracted from content (where available)
171
+ - The ledger accumulates continuously at 6-hour intervals regardless of active user sessions
172
+ - The ledger enables time-travel queries: "what was the intelligence landscape for topic X at date Y?"
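A time-travel query is a filter on `scraped_at`. The in-memory sketch below assumes every `scraped_at` uses the same ISO 8601 UTC format (for example `2026-04-14T08:12:00Z`), so plain string comparison orders timestamps correctly, and it matches "topic X" as a substring of the stored `query`; against D1 the same filter would be a `WHERE scraped_at <= ? AND is_relevant = 1` clause. `LedgerRow` and `landscapeAsOf` are hypothetical names.

```typescript
interface LedgerRow {
  query: string;
  scraped_at: string;  // ISO 8601 UTC, same format on every row
  rt_score: number;
  is_relevant: number; // 1 or 0
}

// "What was the intelligence landscape for topic X at date Y?"
// Keeps relevant rows for the topic scraped on or before `asOf`, newest first.
function landscapeAsOf(rows: LedgerRow[], topic: string, asOf: string): LedgerRow[] {
  const needle = topic.toLowerCase();
  return rows
    .filter(
      (r) => r.is_relevant === 1 && r.query.toLowerCase().includes(needle) && r.scraped_at <= asOf,
    )
    .sort((a, b) => (a.scraped_at < b.scraped_at ? 1 : a.scraped_at > b.scraped_at ? -1 : 0));
}
```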
173
+
174
+ ### 3.3 Schema Reference
175
+
176
+ ```sql
177
+ CREATE TABLE scrape_results (
178
+ id TEXT PRIMARY KEY, -- sr_{timestamp}_{random}
179
+ watched_query_id TEXT, -- FK → watched_queries.id
180
+ adapter TEXT, -- source adapter name
181
+ query TEXT, -- the search term used
182
+ raw_content TEXT, -- scraped content (max 8000 chars)
183
+ result_hash TEXT, -- 32-bit rolling hash of raw_content
184
+ semantic_fingerprint TEXT, -- first 16 hex chars of SHA-256 of normalised title|url|date
185
+ is_new INTEGER, -- 1 until consumed by briefing
186
+ scraped_at TEXT, -- ISO 8601 UTC timestamp
187
+ published_at TEXT, -- extracted content publication date
188
+ relevancy_score INTEGER, -- = round(rt_score), 0-100
189
+ is_relevant INTEGER, -- 1 if rt_score >= 35, else 0
190
+ base_score INTEGER, -- R_0 semantic score, 0-100
191
+ rt_score REAL, -- R_t decay-adjusted score, 0-100
192
+ ha_pri_sig TEXT, -- SHA-256 audit signature (64 hex chars)
193
+ entropy_level TEXT -- 'low' | 'stable' | 'high'
194
+ );
195
+ ```
196
+
197
+ ---
198
+
199
+ ## Section 4: The Intelligence Feed
200
+
201
+ ### 4.1 Endpoint
202
+
203
+ ```
204
+ GET /v1/intel/feed/{profile_id}
205
+ ```
206
+
207
+ Optional parameters:
208
+ - `limit` — maximum signals to return (default: 20)
209
+ - `min_rt` — minimum rt_score filter (default: 0)
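A client only needs to build the URL and parse JSON. A minimal sketch using the standard `URL` and `fetch` globals (Node 18+ or a Worker); `buildFeedUrl` and `fetchFeed` are hypothetical helpers, and the query parameter names are the two listed above.

```typescript
interface FeedOptions {
  limit?: number; // server default: 20
  minRt?: number; // server default: 0
}

// Builds /v1/intel/feed/{profile_id} with optional limit and min_rt parameters.
function buildFeedUrl(baseUrl: string, profileId: string, opts: FeedOptions = {}): string {
  const url = new URL(`/v1/intel/feed/${encodeURIComponent(profileId)}`, baseUrl);
  if (opts.limit !== undefined) url.searchParams.set("limit", String(opts.limit));
  if (opts.minRt !== undefined) url.searchParams.set("min_rt", String(opts.minRt));
  return url.toString();
}

// Fetches and parses the feed; throws on a non-2xx response.
async function fetchFeed(baseUrl: string, profileId: string, opts: FeedOptions = {}): Promise<unknown> {
  const res = await fetch(buildFeedUrl(baseUrl, profileId, opts));
  if (!res.ok) throw new Error(`feed request failed: HTTP ${res.status}`);
  return res.json();
}
```

For example, `fetchFeed("https://freshcontext-mcp.gimmanuel73.workers.dev", "default", { limit: 5, minRt: 50 })` requests at most five signals with `rt_score` of 50 or more.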
210
+
211
+ ### 4.2 Response Structure
212
+
213
+ ```json
214
+ {
215
+ "feed_metadata": {
216
+ "profile_id": "default",
217
+ "generated_at": "2026-04-14T09:00:00Z",
218
+ "signal_count": 18,
219
+ "version": "freshcontext-1.1"
220
+ },
221
+ "signals": [
222
+ {
223
+ "signal_id": "sr_1744628412_a3f7b",
224
+ "source": "hackernews",
225
+ "label": "HN: MCP Servers",
226
+ "content": {
227
+ "preview": "...",
228
+ "url": "mcp server 2026"
229
+ },
230
+ "intelligence_stamps": {
231
+ "scraped_at": "2026-04-14T08:12:00Z",
232
+ "published_at": "2026-04-14",
233
+ "base_score": 78,
234
+ "rt_score": 61.4,
235
+ "entropy_level": "stable",
236
+ "ha_pri_sig": "a3f7b2c1d4e5f6a7b8c9d0e1f2a3b4c5..."
237
+ }
238
+ }
239
+ ]
240
+ }
241
+ ```
242
+
243
+ ### 4.3 LLM Integration
244
+
245
+ The intelligence feed is designed to be consumed directly by any language model or AI agent without modification. The `intelligence_stamps` block gives the agent everything it needs to reason about data freshness:
246
+
247
+ - `rt_score` — a single number representing current signal value
248
+ - `entropy_level` — human-readable decay state
249
+ - `published_at` — the actual content date (not the retrieval date)
250
+ - `ha_pri_sig` — provenance reference the agent can cite
251
+
252
+ This is the core value proposition: **AI agents get grounded, timestamped, scored intelligence rather than undated web content of unknown age.**
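One way an agent harness might turn the feed into prompt context, sketched with types inferred from the sample response above (a nullable `published_at` is an assumption). Dropping `high`-entropy signals is a design choice for this sketch; an agent able to verify sources might instead keep them and flag them. `Signal` and `toAgentContext` are hypothetical names.

```typescript
interface IntelligenceStamps {
  scraped_at: string;
  published_at: string | null;
  base_score: number;
  rt_score: number;
  entropy_level: "low" | "stable" | "high";
  ha_pri_sig: string;
}

interface Signal {
  signal_id: string;
  source: string;
  label: string;
  content: { preview: string; url: string };
  intelligence_stamps: IntelligenceStamps;
}

// Turns feed signals into a compact, citable context block for an LLM prompt:
// drops `high`-entropy signals, ranks by rt_score, and cites a signature prefix.
function toAgentContext(signals: Signal[]): string {
  return signals
    .filter((s) => s.intelligence_stamps.entropy_level !== "high")
    .sort((a, b) => b.intelligence_stamps.rt_score - a.intelligence_stamps.rt_score)
    .map((s) => {
      const st = s.intelligence_stamps;
      const meta = `published ${st.published_at ?? "unknown"}, R_t ${st.rt_score.toFixed(1)}, ${st.entropy_level}, sig ${st.ha_pri_sig.slice(0, 12)}`;
      return `- [${s.source}] ${s.label} (${meta})\n  ${s.content.preview}`;
    })
    .join("\n");
}
```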
253
+
254
+ ---
255
+
256
+ ## Section 5: Asset Summary
257
+
258
+ For acquirers, investors, and licensing partners:
259
+
260
+ **What FreshContext owns:**
261
+
262
+ 1. **The FreshContext Specification v1.1** (MIT licence, open standard) — defines the envelope format, confidence levels, and structured JSON form. Timestamped in the public GitHub repository.
263
+
264
+ 2. **The DAR Engine** (proprietary) — the exponential decay scoring methodology with source-specific λ constants. These constants are not exposed in API responses and constitute trade secret IP.
265
+
266
+ 3. **The Semantic Fingerprinting Method** (proprietary) — the three-field normalisation and SHA-256 fingerprinting approach for cross-adapter deduplication.
267
+
268
+ 4. **The Ha-Pri Audit Signature scheme** (proprietary) — the provenance binding method that makes the historical ledger tamper-evident.
269
+
270
+ 5. **The Historical D1 Ledger** (data asset) — the continuously accumulating time-series dataset. As of the date of this document, the ledger has been running since early 2026 with 6-hour collection intervals across 18 watched queries. The dataset grows in defensibility with every passing day.
271
+
272
+ 6. **The Reference Implementation** — `freshcontext-mcp@0.3.15`, listed on the official MCP Registry and npm. Deployed globally on Cloudflare's edge infrastructure.
273
+
274
+ ---
275
+
276
+ *"The work isn't gone. It's just waiting to be continued."*
277
+ *— Prince Gabriel, Grootfontein, Namibia*
package/README.md CHANGED
@@ -8,12 +8,15 @@ That's the problem freshcontext fixes.
8
8
 
9
9
  [![npm version](https://img.shields.io/npm/v/freshcontext-mcp)](https://www.npmjs.com/package/freshcontext-mcp)
10
10
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
11
+ [![MCP Registry](https://img.shields.io/badge/MCP%20Registry-Listed-blue)](https://registry.modelcontextprotocol.io)
11
12
 
12
13
  ---
13
14
 
14
- ## What it does
15
+ ## The Standard
15
16
 
16
- Every MCP server returns data. freshcontext returns data **plus when it was retrieved and how confident that date is** wrapped in a FreshContext envelope:
17
+ FreshContext is a **data freshness layer for AI agents**: an open standard and reference implementation that makes retrieved data trustworthy.
18
+
19
+ Every piece of web data an AI agent retrieves has an age. Most tools ignore it. FreshContext surfaces it — wrapping every result in a structured envelope that carries three guarantees:
17
20
 
18
21
  ```
19
22
  [FRESHCONTEXT]
@@ -26,11 +29,13 @@ Confidence: high
26
29
  [/FRESHCONTEXT]
27
30
  ```
28
31
 
29
- Claude now knows the difference between something from this morning and something from two years ago. You do too.
32
+ **When** it was retrieved. **Where** it came from. **How confident** we are the date is accurate.
33
+
34
+ The FreshContext Specification v1.1 is published as an open standard under MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md)
30
35
 
31
36
  ---
32
37
 
33
- ## 19 tools. No API keys.
38
+ ## 20 tools. No API keys.
34
39
 
35
40
  ### Intelligence
36
41
  | Tool | What it gets you |
@@ -53,13 +58,14 @@ Claude now knows the difference between something from this morning and somethin
53
58
  | Tool | What it gets you |
54
59
  |---|---|
55
60
  | `extract_finance` | Live stock data — price, market cap, P/E, 52w range. Up to 5 tickers. |
56
- | `search_jobs` | Remote job listings from Remotive + HN "Who is Hiring" — every listing dated |
61
+ | `search_jobs` | Remote job listings from Remotive, RemoteOK, and HN "Who is Hiring" — every listing dated |
57
62
 
58
63
  ### Composites — multiple sources, one call
59
64
  | Tool | Sources | What it gets you |
60
65
  |---|---|---|
61
66
  | `extract_landscape` | 6 | YC + GitHub + HN + Reddit + Product Hunt + npm in parallel |
62
- | `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub repos + changelog |
67
+ | `extract_idea_landscape` | 6 | HN + YC + GitHub + Jobs + npm + Product Hunt — full idea validation |
68
+ | `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub + changelog |
63
69
  | `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
64
70
  | `extract_company_landscape` | 5 | **The full picture on any company** — see below |
65
71
 
@@ -74,6 +80,23 @@ Claude now knows the difference between something from this morning and somethin
74
80
 
75
81
  ---
76
82
 
83
+ ## extract_idea_landscape
84
+
85
+ Built for the moment before you start building. Six sources fired in parallel to answer: *should I build this?*
86
+
87
+ 1. **Hacker News** — what are developers actively complaining about (pain signal)
88
+ 2. **YC Companies** — who has already received funding in this space (funding signal)
89
+ 3. **GitHub** — how crowded the open source landscape is (crowding signal)
90
+ 4. **Job listings** — companies hiring around this problem = real budget = real market (market signal)
91
+ 5. **npm / PyPI** — ecosystem adoption and release velocity (ecosystem signal)
92
+ 6. **Product Hunt** — what just launched and how the market received it (launch signal)
93
+
94
+ ```
95
+ Use extract_idea_landscape with idea "data freshness for AI agents"
96
+ ```
97
+
98
+ ---
99
+
77
100
  ## extract_company_landscape
78
101
 
79
102
  The most complete single-call company analysis available in any MCP server. Five sources fired in parallel:
@@ -88,7 +111,7 @@ The most complete single-call company analysis available in any MCP server. Five
88
111
  Use extract_company_landscape with company "Palantir" and ticker "PLTR"
89
112
  ```
90
113
 
91
- Real output from March 26, 2026:
114
+ Real output from March 2026:
92
115
 
93
116
  > **Q4 2025:** Revenue $1.407B (+70% YoY). US commercial +137%. Rule of 40 score: **127%**.
94
117
  > **Federal contracts:** $292.7M Army Maven Smart System · $252.5M CDAO · $145M ICE · $130M Air Force · more
@@ -96,7 +119,7 @@ Real output from March 26, 2026:
96
119
  > **GDELT:** ICE/Medicaid data controversy, UK MoD security warning, NHS opposition — all timestamped
97
120
  > **PLTR:** ~$154–157 · Market cap ~$370B · P/E 244x · 52w range $66 → $207
98
121
 
99
- Bloomberg Terminal doesn't read commit history as a company health signal. This does.
122
+ Bloomberg Terminal doesn't read commit history as a company health signal. FreshContext does.
100
123
 
101
124
  ---
102
125
 
@@ -183,11 +206,11 @@ touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
183
206
 
184
207
  ## Usage examples
185
208
 
186
- **Is anyone already building what you're building?**
209
+ **Should I build this idea?**
187
210
  ```
188
- Use extract_landscape with topic "cashflow prediction saas"
211
+ Use extract_idea_landscape with idea "procurement intelligence saas"
189
212
  ```
190
- Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
213
+ Returns funding signal, pain signal, crowding signal, market signal, ecosystem signal, and launch signal — all timestamped.
191
214
 
192
215
  **Full company intelligence in one call:**
193
216
  ```
@@ -195,6 +218,12 @@ Use extract_company_landscape with company "Palantir" and ticker "PLTR"
195
218
  ```
196
219
  SEC filings + federal contracts + global news + changelog + market data. The complete picture.
197
220
 
221
+ **Is anyone already building what you're building?**
222
+ ```
223
+ Use extract_landscape with topic "cashflow prediction saas"
224
+ ```
225
+ Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
226
+
198
227
  **What's Singapore's government procuring right now?**
199
228
  ```
200
229
  Use extract_gebiz with url "artificial intelligence"
@@ -207,17 +236,17 @@ Use extract_sec_filings with url "Palantir Technologies"
207
236
  ```
208
237
  8-K filings are legally mandated within 4 business days of any material event — CEO change, acquisition, breach, major contract.
209
238
 
210
- **What is global news saying about a company?**
239
+ **What is global news saying about a company right now?**
211
240
  ```
212
241
  Use extract_gdelt with url "Palantir"
213
242
  ```
214
243
  100+ languages, every country, updated every 15 minutes. Surfaces what Western sources miss.
215
244
 
216
- **What's the community actually saying right now?**
245
+ **Which companies just won US government contracts in AI?**
217
246
  ```
218
- Use extract_reddit on r/MachineLearning
219
- Use extract_hackernews to search "mcp server 2026"
247
+ Use extract_govcontracts with url "artificial intelligence"
220
248
  ```
249
+ Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
221
250
 
222
251
  **Is this dependency still actively maintained?**
223
252
  ```
@@ -225,31 +254,21 @@ Use extract_changelog with url "https://github.com/org/repo"
225
254
  ```
226
255
  Returns the last 8 releases with exact dates. If the last release was 18 months ago, you'll know before you pin the version.
227
256
 
228
- **Which companies just won government contracts in AI?**
229
- ```
230
- Use extract_govcontracts with url "artificial intelligence"
231
- ```
232
- Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
233
-
234
257
  ---
235
258
 
236
259
  ## How freshness works
237
260
 
238
261
  Most AI tools retrieve data silently. No timestamp, no signal, no way for the agent to know how old it is.
239
262
 
240
- freshcontext treats **retrieval time as first-class metadata**. Every adapter returns:
263
+ FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
241
264
 
242
265
  - `retrieved_at` — exact ISO timestamp of the fetch
243
266
  - `content_date` — best estimate of when the content was originally published
244
267
  - `freshness_confidence` — `high`, `medium`, or `low` based on signal quality
245
- - `freshness_score` — numeric 0–100 score with domain-specific decay rates
268
+ - `freshness_score` — numeric 0–100 score with domain-specific decay rates (e.g., 5.0 for financial data, 0.3 for academic papers)
246
269
  - `adapter` — which source the data came from
247
270
 
248
- When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, freshcontext tells you why.
249
-
250
- The FreshContext Specification v1.0 is published as an open standard under MIT license. Any tool or agent that wraps retrieved data in the `[FRESHCONTEXT]` envelope is FreshContext-compatible.
251
-
252
- → [Read the spec](./FRESHCONTEXT_SPEC.md)
271
+ When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, FreshContext tells you why.
253
272
 
254
273
  ---
255
274
 
@@ -264,27 +283,31 @@ The FreshContext Specification v1.0 is published as an open standard under MIT l
264
283
 
265
284
  ## Roadmap
266
285
 
267
- - [x] GitHub, HN, Scholar, YC, Reddit, Product Hunt, Finance, arXiv, Jobs adapters
268
- - [x] `extract_landscape` — 6-source composite tool
286
+ - [x] 20 tools across intelligence, competitive research, market data, and composites
269
287
  - [x] `extract_changelog` — update cadence from any repo, package, or website
270
288
  - [x] `extract_govcontracts` — US federal contract intelligence via USASpending.gov
271
289
  - [x] `extract_sec_filings` — SEC EDGAR 8-K material event filings
272
290
  - [x] `extract_gdelt` — GDELT global news intelligence (100+ languages)
273
291
  - [x] `extract_gebiz` — Singapore Government procurement via data.gov.sg
274
- - [x] `extract_gov_landscape` — gov contracts + HN + GitHub + changelog composite
275
- - [x] `extract_finance_landscape` — finance + HN + Reddit + GitHub + changelog composite
276
292
  - [x] `extract_company_landscape` — 5-source company intelligence composite
293
+ - [x] `extract_idea_landscape` — 6-source idea validation composite
277
294
  - [x] `freshness_score` numeric metric (0–100) with domain-specific decay rates
278
- - [x] Cloudflare Workers deployment — global edge with KV caching
279
- - [x] D1 database — 18 watched queries running on 6-hour cron
295
+ - [x] Cloudflare Workers deployment — global edge with KV caching and rate limiting
296
+ - [x] D1 database — 18 watched queries running on 6-hour cron with relevancy scoring
280
297
  - [x] Listed on official MCP Registry
281
298
  - [x] Listed on Apify Store
282
- - [x] FreshContext Specification v1.0 published
299
+ - [x] FreshContext Specification v1.1 published (MIT) — composite adapters, decay rate table, compatibility levels
283
300
  - [x] GitHub Actions CI/CD — auto-publish to npm on every push
301
+ - [x] **DAR engine** — exponential decay scoring with proprietary λ constants (v0.3.15)
302
+ - [x] **Ha-Pri audit signatures** — SHA-256 provenance stamps on every signal
303
+ - [x] **Semantic deduplication** — cross-adapter fingerprinting
304
+ - [x] **Intelligence feed endpoint** — `/v1/intel/feed/:profile_id`
305
+ - [x] **METHODOLOGY.md** — formal IP documentation
306
+ - [ ] Webhook triggers — push high-entropy signals on threshold
307
+ - [ ] Domain-specific watched queries for mining/industrial sector
308
+ - [ ] Subscription tier with profile customization
284
309
  - [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
285
- - [ ] TTL-based caching layer
286
310
  - [ ] Dashboard — React frontend for the D1 intelligence pipeline
287
- - [ ] Synthesis endpoint — `/briefing/now` AI-generated intelligence briefings
288
311
 
289
312
  ---
290
313
 
@@ -292,6 +315,8 @@ The FreshContext Specification v1.0 is published as an open standard under MIT l
292
315
 
293
316
  PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.
294
317
 
318
+ If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
319
+
295
320
  ---
296
321
 
297
322
  ## License
@@ -302,3 +327,67 @@ MIT
302
327
 
303
328
  *Built by Prince Gabriel — Grootfontein, Namibia 🇳🇦*
304
329
  *"The work isn't gone. It's just waiting to be continued."*
330
+
331
+ ---
332
+
333
+ **Also on:** [Apify Store](https://apify.com/prince_gabriel/freshcontext-mcp) · [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
334
+
335
+ ---
336
+
337
+ ## The Intelligence Layer (v0.3.15)
338
+
339
+ FreshContext is no longer just a pull tool. The infrastructure now runs a continuous **Decay-Adjusted Relevancy (DAR)** engine that scores every collected signal with exponential decay and stamps it with a provenance signature.
340
+
341
+ ### The math
342
+
343
+ ```
344
+ R_t = R_0 · e^(-λt)
345
+ ```
346
+
347
+ - `R_0` — base semantic score against your profile (0–100)
348
+ - `λ` — source-specific decay constant (per hour)
349
+ - `t` — hours since the content was published
350
+ - `R_t` — final relevancy at query time
351
+
352
+ Source half-lives are calibrated empirically: Hacker News ≈14h, Reddit ≈3d, jobs ≈6d, GitHub ≈5mo, academic papers ≈1.6y.
353
+
354
+ ### What every signal carries
355
+
356
+ Every row in the D1 ledger is stamped with:
357
+
358
+ - `base_score` — R_0, semantic match against profile
359
+ - `rt_score` — R_t, decay-adjusted relevancy
360
+ - `entropy_level` — `low` / `stable` / `high` on the decay curve
361
+ - `ha_pri_sig` — SHA-256 provenance signature (tamper-evident)
362
+ - `semantic_fingerprint` — cross-adapter deduplication hash
363
+ - `published_at` — extracted content publication date
364
+
365
+ ### The intelligence feed
366
+
367
+ ```
368
+ GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
369
+ ```
370
+
371
+ Returns scored, deduplicated, provenance-stamped signals ranked by R_t — ready for direct consumption by any LLM or agent. No synthesis needed.
372
+
373
+ ### Methodology
374
+
375
+ The full data collection, scoring, and provenance methodology is formally documented in [METHODOLOGY.md](./METHODOLOGY.md) — written as an audit trail for acquirers, integrators, and regulators. Version 1.1, April 2026.
376
+
377
+ ---
378
+
379
+ ## Live endpoints
380
+
381
+ | Endpoint | Method | Purpose |
382
+ |---|---|---|
383
+ | `/` | GET | Service info + endpoint list |
384
+ | `/health` | GET | Liveness check |
385
+ | `/mcp` | POST | MCP JSON-RPC transport |
386
+ | `/briefing` | GET | Latest stored briefing |
387
+ | `/briefing/now` | POST | Force scrape + synthesize |
388
+ | `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
389
+ | `/watched-queries` | GET | List all watched queries |
390
+ | `/debug/db` | GET | D1 counts + DAR engine coverage |
391
+ | `/debug/scrape` | GET | Run a single adapter and return its raw output |
392
+
393
+ Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`
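The `/mcp` endpoint speaks MCP's JSON-RPC. A minimal sketch of a `tools/call` request, assuming the tool argument names mirror the usage examples above (`idea`, `company`, `ticker`) and that this Worker accepts a request without a prior `initialize` handshake, which a full MCP client would normally send first. `toolCall` and `callTool` are hypothetical helpers; the response is returned as text because a Streamable HTTP server may reply with either JSON or an SSE stream.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Builds an MCP JSON-RPC `tools/call` request body.
function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// POSTs the request to the Worker's /mcp endpoint and returns the raw body.
async function callTool(baseUrl: string, req: ToolCallRequest): Promise<string> {
  const res = await fetch(new URL("/mcp", baseUrl), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients advertise both JSON and SSE responses
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`MCP request failed: HTTP ${res.status}`);
  return res.text();
}
```

For example: `callTool("https://freshcontext-mcp.gimmanuel73.workers.dev", toolCall(1, "extract_company_landscape", { company: "Palantir", ticker: "PLTR" }))`.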