freshcontext-mcp 0.3.18 → 0.3.20
- package/FRESHCONTEXT_SPEC.md +317 -0
- package/METHODOLOGY.md +381 -0
- package/README.md +87 -19
- package/dist/adapters/arxiv.js +2 -1
- package/dist/adapters/changelog.js +4 -2
- package/dist/adapters/finance.js +1 -1
- package/dist/adapters/gdelt.js +1 -1
- package/dist/adapters/gebiz.js +1 -1
- package/dist/adapters/reddit.js +11 -4
- package/dist/adapters/repoSearch.js +1 -1
- package/dist/adapters/secFilings.js +1 -1
- package/dist/core/envelope.js +9 -1
- package/dist/security.js +3 -1
- package/dist/server.js +40 -2
- package/dist/tools/evaluateContext.js +146 -0
- package/docs/CLIENT_SETUP.md +166 -0
- package/docs/CODEX_MCP_USAGE.md +7 -7
- package/docs/CORE_API.md +12 -8
- package/docs/CORE_MCP_BOUNDARY.md +106 -0
- package/docs/FUTURE_LANES.md +196 -0
- package/docs/HA_PRI_V2_DESIGN.md +7 -1
- package/docs/HA_PRI_V2_PRODUCTION_ENFORCEMENT_PLAN.md +414 -0
- package/docs/RELEASE_INTEGRITY.md +2 -0
- package/docs/RELEASE_NOTES.md +22 -5
- package/docs/SIGNAL_CONTRACT.md +213 -17
- package/docs/SOURCE_PROFILES.md +3 -3
- package/package-script-guard.mjs +76 -28
- package/package.json +14 -7
- package/server.json +3 -3
- package/docs/OPERATIONAL_DEMO_RUNBOOK.md +0 -458
package/METHODOLOGY.md
ADDED
|
@@ -0,0 +1,381 @@
|
|
|
1
|
+
# FreshContext Data Intelligence Methodology
|
|
2
|
+
**Version 1.2 — May 2026**
|
|
3
|
+
*Authored by Immanuel Gabriel (Prince Gabriel) — Grootfontein, Namibia*
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## What This Document Is
|
|
8
|
+
|
|
9
|
+
This document formally describes the data collection, scoring, ranking, storage, and provenance methodology underlying FreshContext.
|
|
10
|
+
|
|
11
|
+
It exists for four audiences:
|
|
12
|
+
|
|
13
|
+
1. **Technical integrators** — teams embedding FreshContext into their agent infrastructure who need to understand what the data represents and how it is scored.
|
|
14
|
+
2. **Agent/retrieval system builders** — teams designing retrieval pipelines that need temporal relevance instead of undated context.
|
|
15
|
+
3. **Auditors and reviewers** — people verifying that timestamped AI context is represented honestly and reproducibly.
|
|
16
|
+
4. **Future licensing or platform partners** — entities evaluating FreshContext as infrastructure, who need to audit the methodology that makes the data defensible.
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Section 1: Core Methodology and Source Collection
|
|
21
|
+
|
|
22
|
+
### 1.1 Architecture
|
|
23
|
+
|
|
24
|
+
The FreshContext Core methodology describes the signal contract and temporal scoring primitives that can be used by MCP servers, APIs, CLIs, dashboards, agents, or internal retrieval systems.
|
|
25
|
+
|
|
26
|
+
The Core methodology covers:
|
|
27
|
+
|
|
28
|
+
- **Signal schema** — source, content, timestamps, confidence, adapter identity
|
|
29
|
+
- **Source/provenance** — where the observation came from and how it was retrieved
|
|
30
|
+
- **Published/content date** — when the source claims the content became true or available
|
|
31
|
+
- **Retrieved timestamp** — when FreshContext observed the content
|
|
32
|
+
- **Confidence** — how reliable the timestamp extraction is
|
|
33
|
+
- **Decay-Adjusted Relevancy (DAR)** — temporal utility after source-specific decay
|
|
34
|
+
- **Failure honesty** — failed adapters must not be promoted as fresh successful context
|
|
35
|
+
- **Ranking/explain primitives** — fields that let agents and systems explain why a signal ranked where it did
|
|
36
|
+
|
|
37
|
+
FreshContext also supports a Store/Ledger methodology for systems that persist recurring signals over time. The production Worker implementation uses Cloudflare runtime pieces for MCP transport, KV cache policy, rate limiting, D1 persistence, feeds, cron collection, and deployment concerns. Those runtime concerns are implementation layers, not requirements for every FreshContext-compatible system.
|
|
38
|
+
|
|
39
|
+
### 1.2 Store / Ledger Collection Layer
|
|
40
|
+
|
|
41
|
+
The Store/Ledger methodology describes a continuous data collection pipeline that can run on Cloudflare's global edge infrastructure. A deployment may execute scheduled collection via cron and query watched definitions stored in D1 or another durable store.
|
|
42
|
+
|
|
43
|
+
Each watched query specifies:
|
|
44
|
+
- **Adapter** — the data source to query (e.g., `hackernews`, `jobs`, `reposearch`)
|
|
45
|
+
- **Query** — the search term or URL
|
|
46
|
+
- **User ID** — the profile this query serves
|
|
47
|
+
- **Filters** — optional parameters (location, exclusion terms, etc.)
|
|
48
|
+
|
|
49
|
+
This D1 cron ledger is one implementation layer and a future direction for the Store. It is not required for every FreshContext-compatible envelope implementation.
|
|
50
|
+
|
|
51
|
+
### 1.3 Example Adapter / Source Classes
|
|
52
|
+
|
|
53
|
+
FreshContext currently has:
|
|
54
|
+
|
|
55
|
+
- A reference MCP implementation with `evaluate_context` and 21 read-only reference adapters
|
|
56
|
+
- Separate feed products such as Fresh HN Feed and Fresh Jobs Feed
|
|
57
|
+
- A Store/Ledger methodology for systems that collect recurring signals over time
|
|
58
|
+
|
|
59
|
+
The following table describes example source classes used by FreshContext implementations. Not every source class is necessarily collected by every cron/feed deployment.
|
|
60
|
+
|
|
61
|
+
| Adapter class | Source | Auth Required | Typical Update Frequency |
|
|
62
|
+
|---|---|---|---|
|
|
63
|
+
| `hackernews` | Hacker News Algolia API | None | Real-time |
|
|
64
|
+
| `jobs` | Remotive API | None | Continuous |
|
|
65
|
+
| `reposearch` | GitHub Search API | Optional (rate limit) | Real-time |
|
|
66
|
+
| `github` | GitHub Repository API | Optional | Real-time |
|
|
67
|
+
| `reddit` | Reddit JSON API | None | Real-time |
|
|
68
|
+
| `yc` | YC Open Source API | None | Per batch cycle |
|
|
69
|
+
| `packagetrends` | npm Registry + npm Downloads API | None | Per publish |
|
|
70
|
+
| `finance` | Stooq quote API | None | Market hours / quote feed cadence |
|
|
71
|
+
| `producthunt` | Product Hunt launch data | Token when API-backed | Launch cadence |
|
|
72
|
+
| `changelog` | GitHub Releases / npm package metadata | Optional | Per release |
|
|
73
|
+
| `arxiv` / `scholar` | Academic sources | None | Publication cadence |
|
|
74
|
+
| `gdelt` | GDELT global news | None | 15-minute feed cadence |
|
|
75
|
+
| `govcontracts` / `gebiz` | Government procurement datasets | None | Dataset cadence |
|
|
76
|
+
| `sec_filings` | SEC EDGAR filings | None | Filing cadence |
|
|
77
|
+
|
|
78
|
+
FreshContext adapters operate on publicly accessible or publicly documented data sources. Most reference adapters require no credentials. Some APIs may optionally use tokens for rate limits or official API access, but FreshContext-compatible adapters should not require private user data unless explicitly documented by the implementation. All fetch requests include a `User-Agent` header identifying the FreshContext crawler where the runtime/source supports it.
|
|
79
|
+
|
|
80
|
+
### 1.4 Content Hash Deduplication
|
|
81
|
+
|
|
82
|
+
Before any signal is stored, the platform computes a 32-bit rolling hash of the raw content. If the most recent stored result for a given watched query carries an identical hash, the current result is discarded. This prevents storing unchanged content across cron cycles.
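The exact-match check above can be sketched as follows. This is a minimal illustration only: the text does not say which 32-bit rolling hash is used, so the djb2-style function `rolling_hash_32` and the helper `should_store` are hypothetical stand-ins, not the package's implementation.

```python
from typing import Optional


def rolling_hash_32(text: str) -> str:
    """Hypothetical 32-bit rolling hash (djb2-style); the real function may differ."""
    h = 5381
    for ch in text:
        # Mix in each character and keep the state within 32 bits.
        h = ((h * 33) ^ ord(ch)) & 0xFFFFFFFF
    return format(h, "08x")


def should_store(raw_content: str, last_stored_hash: Optional[str]) -> bool:
    """Store only when the content differs from the most recent stored result."""
    return rolling_hash_32(raw_content) != last_stored_hash


previous = rolling_hash_32("Show HN: FreshContext")
print(should_store("Show HN: FreshContext", previous))      # unchanged content
print(should_store("Show HN: FreshContext 0.3", previous))  # changed content
```

A 32-bit hash is cheap to compute and compare per cron cycle, but it can collide, which is one reason to treat it as a change detector rather than an integrity guarantee.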
|
|
83
|
+
|
|
84
|
+
### 1.5 Semantic Deduplication
|
|
85
|
+
|
|
86
|
+
Beyond exact-match deduplication, FreshContext implements semantic deduplication to prevent the same underlying story appearing as multiple signals because it was covered by multiple sources (e.g., the same GitHub release appearing in both HN and Reddit).
|
|
87
|
+
|
|
88
|
+
The semantic fingerprint is computed as follows:
|
|
89
|
+
|
|
90
|
+
1. Extract the first canonical URL from the raw content
|
|
91
|
+
2. Extract the first ISO 8601 publication date from the raw content
|
|
92
|
+
3. Extract and normalise the first substantive line (title) — lowercased, punctuation stripped, truncated to 80 characters
|
|
93
|
+
4. Concatenate: `normalised_title|canonical_url|publication_date`
|
|
94
|
+
5. Compute SHA-256 of the concatenated string
|
|
95
|
+
6. Retain the first 16 hex characters as the fingerprint
|
|
96
|
+
|
|
97
|
+
If any signal stored within the preceding 48 hours carries an identical fingerprint, the new result is discarded. The 48-hour window is configurable.
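The six fingerprint steps can be sketched as below. The regular expressions, the "first non-empty line" reading of "first substantive line", and the function names are assumptions made for illustration; URL canonicalisation is not shown and the first URL found is used as-is.

```python
import hashlib
import re
import string

# Assumed patterns: the first http(s) URL and the first YYYY-MM-DD date in the content.
URL_RE = re.compile(r"https?://\S+")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def normalise_title(raw_content: str) -> str:
    """Lowercase the first non-empty line, strip ASCII punctuation, cut to 80 chars."""
    for line in raw_content.splitlines():
        stripped = line.strip()
        if stripped:
            table = str.maketrans("", "", string.punctuation)
            return stripped.lower().translate(table)[:80]
    return ""


def semantic_fingerprint(raw_content: str) -> str:
    """SHA-256 over `title|url|date`, truncated to the first 16 hex characters."""
    url_match = URL_RE.search(raw_content)
    date_match = DATE_RE.search(raw_content)
    url = url_match.group(0) if url_match else ""
    date = date_match.group(0) if date_match else ""
    key = f"{normalise_title(raw_content)}|{url}|{date}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


hn = "FreshContext 0.3.20 Released!\nhttps://example.com/r 2026-05-24"
reddit = "freshcontext 0320 released\nvia https://example.com/r on 2026-05-24"
print(semantic_fingerprint(hn) == semantic_fingerprint(reddit))
```

Under these assumptions, two signals collapse to one fingerprint only when the normalised title, the URL, and the date all agree, so a reworded headline about the same release would still be stored separately.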
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
## Section 2: Temporal Scoring — The DAR Engine
|
|
102
|
+
|
|
103
|
+
### 2.1 Overview
|
|
104
|
+
|
|
105
|
+
The Decay-Adjusted Relevancy (DAR) engine scores every collected signal on two axes:
|
|
106
|
+
|
|
107
|
+
- **R_0 (Base Score)** — semantic relevancy of the content against the user's profile, independent of time
|
|
108
|
+
- **R_t (Decay-Adjusted Score)** — R_0 adjusted for how much time has elapsed since the content was published
|
|
109
|
+
|
|
110
|
+
The final stored `rt_score` is what drives signal ranking in briefings and the intelligence feed.
|
|
111
|
+
|
|
112
|
+
FreshContext measures temporal utility, not truth. A source can be valid and still have low utility for the current query if it is stale. A source can be fresh but low-confidence if its timestamp is missing, malformed, inferred, or contradicted.
|
|
113
|
+
|
|
114
|
+
### 2.2 Base Score Calculation (R_0)
|
|
115
|
+
|
|
116
|
+
R_0 is the starting relevance or utility before temporal decay. In the Store/Feed implementation, R_0 is computed by matching content against the user profile:
|
|
117
|
+
|
|
118
|
+
```
|
|
119
|
+
R_0 = baseline (40)
|
|
120
|
+
+ vital_keyword_matches × 15 [capped at +35]
|
|
121
|
+
+ skill_keyword_matches × 3 [capped at +15]
|
|
122
|
+
+ location_accessibility_bonus [+8 if remote/accessible]
|
|
123
|
+
- error_penalty [−40 if content is empty/error]
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Vital keywords are drawn from the `targets` field of the user profile — job titles, company names, and technology domains the user is specifically tracking.
|
|
127
|
+
|
|
128
|
+
Skill keywords are drawn from the `skills` field — the user's technical competencies. A match here adds relevancy signal but at lower weight than a direct target match.
|
|
129
|
+
|
|
130
|
+
The location accessibility bonus is applied when the content explicitly mentions "remote", "worldwide", "anywhere", or the user's stated location. This is not a geographic filter — it is a signal boost for content that is accessible to the user regardless of their physical location.
|
|
131
|
+
|
|
132
|
+
**Hard exclusions:** If any term from the `exclusion_terms` list appears in the content, R_0 is forced to zero. The result is still stored (for audit purposes) but marked `is_relevant = 0`.
|
|
133
|
+
|
|
134
|
+
This profile formula is a Store/Feed implementation example, not the only possible way to produce base relevance. For Core/MCP envelope scoring, R_0 may be normalised to 100. For feed/ranking systems, R_0 may come from semantic relevance, profile relevance, adapter-specific relevance, or another documented scoring layer.
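The Store/Feed profile formula in this section can be sketched as a small function. The case-insensitive substring matching, the empty-content test for the error penalty, and the function and parameter names are assumptions for illustration; the weights and caps are taken from the formula above.

```python
from typing import Iterable


def base_score(content: str, targets: Iterable[str], skills: Iterable[str],
               exclusion_terms: Iterable[str], user_location: str = "") -> int:
    """Sketch of R_0 for the Store/Feed profile formula (matching rules assumed)."""
    text = content.lower()
    # Hard exclusion: any exclusion term forces R_0 to zero.
    if any(term.lower() in text for term in exclusion_terms):
        return 0
    score = 40  # baseline
    vital = sum(1 for kw in targets if kw.lower() in text)
    score += min(vital * 15, 35)   # vital keyword matches, capped at +35
    skill = sum(1 for kw in skills if kw.lower() in text)
    score += min(skill * 3, 15)    # skill keyword matches, capped at +15
    access_terms = ["remote", "worldwide", "anywhere"]
    if user_location:
        access_terms.append(user_location.lower())
    if any(term in text for term in access_terms):
        score += 8                 # location accessibility bonus
    if not text.strip():
        score -= 40                # error penalty; only empty content is detected here
    return max(0, min(score, 100))


print(base_score("Remote MCP engineer role using TypeScript",
                 targets=["mcp"], skills=["typescript"], exclusion_terms=[]))
```

With one target match, one skill match, and the accessibility bonus, the example evaluates to 40 + 15 + 3 + 8 = 66.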
|
|
135
|
+
|
|
136
|
+
### 2.3 Context-Conditioned Utility
|
|
137
|
+
|
|
138
|
+
FreshContext scoring is context-conditioned. The same signal can have different utility depending on the user, query, agent, platform, or workflow requesting it.
|
|
139
|
+
|
|
140
|
+
In the Store/Feed implementation, this context is represented by `R_0`, the base relevance or utility score before temporal decay. `R_0` may be computed from profile targets, query terms, semantic relevance, adapter-specific relevance, or another documented scoring layer.
|
|
141
|
+
|
|
142
|
+
The DAR function then applies temporal pressure:
|
|
143
|
+
|
|
144
|
+
```
|
|
145
|
+
R_t = R_0 · e^(-λt)
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
This means FreshContext does not treat freshness as a standalone ranking signal. A fresh but irrelevant signal should not outrank an older but highly relevant signal unless the source policy and use case justify it.
|
|
149
|
+
|
|
150
|
+
FreshContext Core exposes a pure context utility primitive for this direction:
|
|
151
|
+
|
|
152
|
+
```
|
|
153
|
+
U(q, s, t) = R(q, s) · e^(-λt) · C_date · C_status
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
Where:
|
|
157
|
+
- `q` is the requester context: user, query, agent, platform, or workflow
|
|
158
|
+
- `s` is the signal or database record
|
|
159
|
+
- `R(q, s)` is contextual relevance between the request and the signal
|
|
160
|
+
- `λ` is the source-specific decay constant
|
|
161
|
+
- `t` is signal age
|
|
162
|
+
- `C_date` is a timestamp-confidence factor
|
|
163
|
+
- `C_status` is a failure/partial/success factor
|
|
164
|
+
|
|
165
|
+
This is an extension of the DAR methodology, not a replacement for it. The purpose is to support systems where FreshContext runs over databases, feeds, retrieved documents, or agent memory and ranks information by both relevance and temporal utility. It does not imply vector search, multi-agent orchestration, or a hosted context store.
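As a rough sketch, the utility primitive is a product of four factors. The function name and the numeric encoding of `C_date` and `C_status` (for example, 0.0 for a failed result) are assumptions for illustration; the text defines the factors but not their values.

```python
import math


def context_utility(relevance: float, decay_lambda: float, age_hours: float,
                    c_date: float, c_status: float) -> float:
    """U(q, s, t) = R(q, s) * e^(-lambda * t) * C_date * C_status."""
    return relevance * math.exp(-decay_lambda * age_hours) * c_date * c_status


# A relevant signal with a trusted timestamp, one Reddit half-life old (about 69.3 h).
print(context_utility(100.0, 0.010, 69.3147, c_date=1.0, c_status=1.0))
# The same signal from a failed adapter contributes nothing (assumed C_status = 0.0).
print(context_utility(100.0, 0.010, 69.3147, c_date=1.0, c_status=0.0))
```

Because the factors multiply, a failed status or an untrusted timestamp pulls utility down regardless of how fresh or relevant the signal looks.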
|
|
166
|
+
|
|
167
|
+
### 2.4 Decay Function (R_t)
|
|
168
|
+
|
|
169
|
+
```
|
|
170
|
+
R_t = R_0 · e^(-λt)
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
Where:
|
|
174
|
+
- `λ` = source-specific decay constant (per hour)
|
|
175
|
+
- `t` = hours elapsed since `published_at` / `content_date`
|
|
176
|
+
- `R_t` = current temporal utility score
|
|
177
|
+
|
|
178
|
+
If `published_at` / `content_date` cannot be extracted, the system must not pretend the signal is fresh. Core-compatible envelope scoring SHOULD use `freshness_score: null` and low confidence. Store/feed systems MAY apply a conservative fallback assumption, such as one source half-life, but must mark confidence low and explain the assumption.
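A minimal sketch of the decay step, including the missing-timestamp rule, might look like this. The function name is hypothetical, returning `None` mirrors the `freshness_score: null` behaviour described above, and clamping future-dated content to age zero is an added assumption.

```python
import math
from datetime import datetime, timezone
from typing import Optional


def decay_adjusted_score(r0: float, decay_lambda: float,
                         published_at: Optional[datetime],
                         now: datetime) -> Optional[float]:
    """R_t = R_0 * e^(-lambda * t), with t in hours since publication."""
    if published_at is None:
        # No extractable date: do not pretend the signal is fresh.
        return None
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    return r0 * math.exp(-decay_lambda * age_hours)


now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
published = datetime(2026, 5, 23, 22, 0, tzinfo=timezone.utc)  # 14 hours earlier
print(decay_adjusted_score(100.0, 0.050, published, now))
print(decay_adjusted_score(100.0, 0.050, None, now))
```

A Store/feed system that prefers the conservative fallback could substitute one half-life for the missing age, provided it also records low confidence and the assumption it made.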
|
|
179
|
+
|
|
180
|
+
### 2.5 Source Decay Constants (λ)
|
|
181
|
+
|
|
182
|
+
These constants are reference/default calibration values for how quickly signals from each source class lose temporal utility:
|
|
183
|
+
|
|
184
|
+
| Source | λ (per hour) | Half-life |
|
|
185
|
+
|---|---|---|
|
|
186
|
+
| Hacker News | 0.050 | ~14 hours |
|
|
187
|
+
| Reddit | 0.010 | ~3 days |
|
|
188
|
+
| Product Hunt | 0.010 | ~3 days |
|
|
189
|
+
| Job listings | 0.005 | ~6 days |
|
|
190
|
+
| Financial data | 0.001 | ~29 days |
|
|
191
|
+
| YC companies | 0.001 | ~29 days |
|
|
192
|
+
| Package trends | 0.0005 | ~58 days |
|
|
193
|
+
| GitHub repositories | 0.0002 | ~5 months |
|
|
194
|
+
| Academic papers | 0.00005 | ~1.6 years |
|
|
195
|
+
|
|
196
|
+
These constants are reference defaults used by the FreshContext methodology and may be tuned by implementation. Hosted or private deployments may use calibrated variants per source, query type, or user profile. The calibration process and production tuning may be proprietary, even when public reference defaults are documented.
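The half-life column follows from the decay function: setting `e^(-λt) = 1/2` gives `t = ln(2) / λ`. The short check below recomputes a few rows; the helper name is an assumption.

```python
import math


def half_life_hours(decay_lambda: float) -> float:
    """Hours until R_t falls to half of R_0 for a per-hour decay constant."""
    return math.log(2) / decay_lambda


for source, lam in [("Hacker News", 0.050), ("Job listings", 0.005),
                    ("Academic papers", 0.00005)]:
    hours = half_life_hours(lam)
    print(f"{source}: {hours:.1f} h (about {hours / 24:.1f} days)")
```

The same relation works in reverse when tuning: choosing a target half-life `h` in hours fixes `λ = ln(2) / h`.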
|
|
197
|
+
|
|
198
|
+
### 2.6 Entropy Classification
|
|
199
|
+
|
|
200
|
+
Each signal is classified into one of three entropy states based on its position on the decay curve:
|
|
201
|
+
|
|
202
|
+
| State | Condition | Interpretation |
|
|
203
|
+
|---|---|---|
|
|
204
|
+
| `low` | `t < half_life / 2` | Signal near peak value — act now |
|
|
205
|
+
| `stable` | `half_life / 2 ≤ t < 1.5 × half_life` | Usable signal — monitor |
|
|
206
|
+
| `high` | `t ≥ 1.5 × half_life` | Significantly degraded — verify before acting |
|
|
207
|
+
|
|
208
|
+
Entropy labels describe signal decay state, not confidence level. A high-entropy signal may still be factually accurate, but it has lost temporal utility for current retrieval unless reinforced by newer evidence.
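The classification can be sketched directly from the table, deriving the half-life from the source's decay constant. The function name is an assumption, and the conditions are evaluated in table order so that `stable` covers ages from half a half-life up to 1.5 half-lives.

```python
import math


def entropy_level(age_hours: float, decay_lambda: float) -> str:
    """Map signal age to 'low', 'stable', or 'high' using the half-life thresholds."""
    half_life = math.log(2) / decay_lambda
    if age_hours < half_life / 2:
        return "low"
    if age_hours < 1.5 * half_life:
        return "stable"
    return "high"


# Hacker News default (lambda = 0.050 per hour, half-life about 13.9 hours).
for age in (5, 10, 24):
    print(age, entropy_level(age, 0.050))
```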
|
|
209
|
+
|
|
210
|
+
### 2.7 Relevancy Threshold
|
|
211
|
+
|
|
212
|
+
Signals with `rt_score < 35` are stored with `is_relevant = 0`. They remain in the database for audit and historical analysis but are excluded from briefings and the intelligence feed by default. The threshold is configurable per profile.
|
|
213
|
+
|
|
214
|
+
### 2.8 Failure Honesty
|
|
215
|
+
|
|
216
|
+
Failed adapters must not be promoted by freshness scoring. Empty, blocked, timeout, malformed, rate-limited, access-denied, or error-only outputs reduce R_0 or mark the signal status as failed/unknown.
|
|
217
|
+
|
|
218
|
+
A failed result should not receive high confidence. A failed result should not produce `Score: 100/100`. Partial composites should preserve successful upstream results while marking failures explicitly.
|
|
219
|
+
|
|
220
|
+
---
|
|
221
|
+
|
|
222
|
+
## Section 3: FreshContext Store / Ledger Methodology
|
|
223
|
+
|
|
224
|
+
### 3.1 The Ha-Pri Audit Signature
|
|
225
|
+
|
|
226
|
+
Every signal stored in a FreshContext Store/Ledger deployment carries a `ha_pri_sig` — a SHA-256 audit signature computed as:
|
|
227
|
+
|
|
228
|
+
```
|
|
229
|
+
SHA-256( result_id + ":" + content_hash + ":" + "FRESHCONTEXT_DAR_V1" )
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
In Ha-Pri v1, this signature is a provenance stamp and audit reference for stored signals. It binds the result ID, the current content hash, and the engine version. It is not yet a full tamper-enforcement system: the current `content_hash` source is the existing rolling `result_hash`, and signatures are not recomputed on read to reject modified rows.
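The v1 formula can be reproduced in a few lines. The UTF-8 encoding and the helper names are assumptions for illustration; the hex output matches the 64-character form given in the schema reference. The `matches_v1_formula` helper shows how an auditor could recompute a stamp offline, which is not something the v1 read path does.

```python
import hashlib

ENGINE_VERSION = "FRESHCONTEXT_DAR_V1"


def ha_pri_sig(result_id: str, content_hash: str) -> str:
    """SHA-256 over `result_id:content_hash:FRESHCONTEXT_DAR_V1`, as 64 hex characters."""
    material = f"{result_id}:{content_hash}:{ENGINE_VERSION}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def matches_v1_formula(result_id: str, content_hash: str, stored_sig: str) -> bool:
    """Offline audit check: does the stored stamp match the v1 formula?"""
    return ha_pri_sig(result_id, content_hash) == stored_sig


sig = ha_pri_sig("sr_1744628412_a3f7b", "1a2b3c4d")
print(len(sig), matches_v1_formula("sr_1744628412_a3f7b", "1a2b3c4d", sig))
```

Since the signed `content_hash` is the rolling `result_hash` rather than a hash of canonical content, a match shows consistency with the stored hash, not that the raw content is unmodified.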
|
|
233
|
+
|
|
234
|
+
Ha-Pri v1 serves three purposes:
|
|
235
|
+
|
|
236
|
+
1. **Provenance reference** — the signature binds the result ID, current rolling content hash, and engine/version marker so the stored signal can be audited against the v1 formula.
|
|
237
|
+
2. **Scoring lineage** — the signature records the scoring/signature formula used when the row was written.
|
|
238
|
+
3. **Licensing / audit reference** — when FreshContext data is provided to a third party under licence, the `ha_pri_sig` column gives a stable reference for what was stored and delivered.
|
|
239
|
+
|
|
240
|
+
Ha-Pri v1 is not hard tamper enforcement. It is not recomputed on read, it signs the existing rolling hash (`result_hash`) rather than a canonical content SHA-256, and it does not reject rows. Ha-Pri v2 is the planned, additive path for stronger verification.
|
|
241
|
+
|
|
242
|
+
Future Ha-Pri v2 may add canonical content SHA-256, stronger canonicalization, and explicit verification/rejection on read. That hardening is separate from the current v1 provenance stamp.
|
|
243
|
+
|
|
244
|
+
Ha-Pri v1 is the provenance layer and the foundation for a stronger integrity layer, while DAR and context-conditioned utility are the ranking/scoring layer.
|
|
245
|
+
|
|
246
|
+
### 3.2 D1 Historical Ledger
|
|
247
|
+
|
|
248
|
+
The `scrape_results` table functions as a **Contextual Ledger** — not merely a cache, but a time-series record of intelligence signals with full provenance.
|
|
249
|
+
|
|
250
|
+
This Store/Ledger methodology is not required for basic FreshContext-compatible envelope implementations. It is the methodology for systems that persist recurring signals and want auditability over time.
|
|
251
|
+
|
|
252
|
+
Key properties of the ledger:
|
|
253
|
+
- Scored signal material is treated as immutable once written; consumption metadata such as `is_new` may be updated
|
|
254
|
+
- Every row carries a `scraped_at` timestamp with second precision
|
|
255
|
+
- Every row carries a `published_at` date extracted from content (where available)
|
|
256
|
+
- The ledger accumulates on a 6-hour collection cycle, regardless of active user sessions
|
|
257
|
+
- The ledger enables time-travel queries: "what was the intelligence landscape for topic X at date Y?"
|
|
258
|
+
|
|
259
|
+
### 3.3 Schema Reference
|
|
260
|
+
|
|
261
|
+
```sql
|
|
262
|
+
scrape_results (
|
|
263
|
+
id TEXT PRIMARY KEY, -- sr_{timestamp}_{random}
|
|
264
|
+
watched_query_id TEXT, -- FK → watched_queries.id
|
|
265
|
+
adapter TEXT, -- source adapter name
|
|
266
|
+
query TEXT, -- the search term used
|
|
267
|
+
raw_content TEXT, -- scraped content (max 8000 chars)
|
|
268
|
+
result_hash TEXT, -- 32-bit rolling hash of raw_content
|
|
269
|
+
semantic_fingerprint TEXT, -- first 16 hex chars of SHA-256 over normalised title|url|date
|
|
270
|
+
is_new INTEGER, -- 1 until consumed by briefing
|
|
271
|
+
scraped_at TEXT, -- ISO 8601 UTC timestamp
|
|
272
|
+
published_at TEXT, -- extracted content publication date
|
|
273
|
+
relevancy_score INTEGER, -- = round(rt_score), 0-100
|
|
274
|
+
is_relevant INTEGER, -- 1 if rt_score >= 35, else 0
|
|
275
|
+
base_score INTEGER, -- R_0 semantic score, 0-100
|
|
276
|
+
rt_score REAL, -- R_t decay-adjusted score, 0-100
|
|
277
|
+
ha_pri_sig TEXT, -- SHA-256 audit signature (64 hex chars)
|
|
278
|
+
entropy_level TEXT -- 'low' | 'stable' | 'high'
|
|
279
|
+
)
|
|
280
|
+
```
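A time-travel query over this schema could look like the sketch below. It uses Python's built-in `sqlite3` with an in-memory table holding only a subset of the columns; the sample rows and the `landscape_as_of` helper are invented for illustration, and a D1 deployment would run the equivalent SQL through its own bindings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE scrape_results (
           id TEXT PRIMARY KEY, adapter TEXT, query TEXT,
           scraped_at TEXT, rt_score REAL, is_relevant INTEGER)"""
)
conn.executemany(
    "INSERT INTO scrape_results VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("sr_1", "hackernews", "mcp server", "2026-04-13T06:00:00Z", 61.4, 1),
        ("sr_2", "reddit", "mcp server", "2026-04-13T12:00:00Z", 20.0, 0),
        ("sr_3", "hackernews", "mcp server", "2026-04-14T06:00:00Z", 70.2, 1),
    ],
)


def landscape_as_of(db: sqlite3.Connection, query: str, as_of_iso: str,
                    limit: int = 20) -> list:
    """Relevant signals for a query as they were stored at or before a point in time."""
    cur = db.execute(
        "SELECT id, adapter, scraped_at, rt_score FROM scrape_results "
        "WHERE query = ? AND scraped_at <= ? AND is_relevant = 1 "
        "ORDER BY rt_score DESC LIMIT ?",
        (query, as_of_iso, limit),
    )
    return cur.fetchall()


print(landscape_as_of(conn, "mcp server", "2026-04-14T00:00:00Z"))
```

The string comparison on `scraped_at` relies on every timestamp using the same ISO 8601 UTC format. The returned `rt_score` is the value recorded at write time, not a score re-decayed to the query date.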
|
|
281
|
+
|
|
282
|
+
---
|
|
283
|
+
|
|
284
|
+
## Section 4: The Intelligence Feed
|
|
285
|
+
|
|
286
|
+
### 4.1 Endpoint
|
|
287
|
+
|
|
288
|
+
```
|
|
289
|
+
GET /v1/intel/feed/{profile_id}
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
Optional parameters:
|
|
293
|
+
- `limit` — maximum signals to return (default: 20)
|
|
294
|
+
- `min_rt` — minimum rt_score filter (default: 0)
|
|
295
|
+
|
|
296
|
+
### 4.2 Response Structure
|
|
297
|
+
|
|
298
|
+
```json
|
|
299
|
+
{
|
|
300
|
+
"feed_metadata": {
|
|
301
|
+
"profile_id": "default",
|
|
302
|
+
"generated_at": "2026-04-14T09:00:00Z",
|
|
303
|
+
"signal_count": 18,
|
|
304
|
+
"version": "freshcontext-1.2"
|
|
305
|
+
},
|
|
306
|
+
"signals": [
|
|
307
|
+
{
|
|
308
|
+
"signal_id": "sr_1744628412_a3f7b",
|
|
309
|
+
"source": "hackernews",
|
|
310
|
+
"label": "HN: MCP Servers",
|
|
311
|
+
"content": {
|
|
312
|
+
"preview": "...",
|
|
313
|
+
"url": "mcp server 2026"
|
|
314
|
+
},
|
|
315
|
+
"intelligence_stamps": {
|
|
316
|
+
"scraped_at": "2026-04-14T08:12:00Z",
|
|
317
|
+
"published_at": "2026-04-14",
|
|
318
|
+
"base_score": 78,
|
|
319
|
+
"rt_score": 61.4,
|
|
320
|
+
"entropy_level": "stable",
|
|
321
|
+
"ha_pri_sig": "a3f7b2c1d4e5f6a7b8c9d0e1f2a3b4c5..."
|
|
322
|
+
}
|
|
323
|
+
}
|
|
324
|
+
]
|
|
325
|
+
}
|
|
326
|
+
```
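One way a consumer might turn this response into context lines for a model is sketched below. The selection policy (skip signals below a minimum `rt_score` and skip `high` entropy) and the function name are assumptions for illustration, not behaviour defined by the feed.

```python
import json
from typing import List


def context_lines(feed_json: str, min_rt: float = 35.0) -> List[str]:
    """Keep usable signals and render each with its freshness and provenance stamps."""
    feed = json.loads(feed_json)
    lines = []
    for signal in feed.get("signals", []):
        stamps = signal.get("intelligence_stamps", {})
        rt = stamps.get("rt_score")
        if rt is None or rt < min_rt:
            continue  # below the assumed usefulness threshold
        if stamps.get("entropy_level") == "high":
            continue  # significantly degraded; verify before acting
        lines.append(
            f"[{signal.get('source')}] {signal.get('label')} | "
            f"published {stamps.get('published_at')} | rt_score {rt} | "
            f"entropy {stamps.get('entropy_level')} | sig {stamps.get('ha_pri_sig')}"
        )
    return lines
```

Passing the response body from the feed endpoint to `context_lines` yields one line per retained signal, each carrying the publication date and the `ha_pri_sig` the agent can cite.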
|
|
327
|
+
|
|
328
|
+
### 4.3 LLM Integration
|
|
329
|
+
|
|
330
|
+
The intelligence feed is designed to be consumed directly by any language model or AI agent without modification. The `intelligence_stamps` block gives the agent everything it needs to reason about data freshness:
|
|
331
|
+
|
|
332
|
+
- `rt_score` — a single number representing current signal value
|
|
333
|
+
- `entropy_level` — human-readable decay state
|
|
334
|
+
- `published_at` — the actual content date (not the retrieval date)
|
|
335
|
+
- `ha_pri_sig` — provenance reference the agent can cite
|
|
336
|
+
|
|
337
|
+
This is the core value proposition: **AI agents get grounded, timestamped, scored intelligence rather than undated web content of unknown age.**
|
|
338
|
+
|
|
339
|
+
MCP is one interface over this methodology, not the whole system. The same scoring, timestamp, confidence, and provenance primitives can support APIs, CLIs, npm packages, dashboards, agents, and internal services.
|
|
340
|
+
|
|
341
|
+
---
|
|
342
|
+
|
|
343
|
+
## Section 5: Asset Summary
|
|
344
|
+
|
|
345
|
+
For technical integrators, auditors, and future platform partners:
|
|
346
|
+
|
|
347
|
+
**What FreshContext owns:**
|
|
348
|
+
|
|
349
|
+
1. **The FreshContext Specification v1.2** (MIT licence, open standard) — defines the envelope format, confidence levels, structured JSON form, freshness score behavior, and failure-honesty requirements. Timestamped in the public GitHub repository.
|
|
350
|
+
|
|
351
|
+
2. **The DAR Engine** — the exponential decay scoring methodology with source-specific λ reference defaults and calibrated production tuning.
|
|
352
|
+
|
|
353
|
+
3. **The Semantic Fingerprinting Method** — the three-field normalisation and SHA-256 fingerprinting approach for cross-adapter deduplication.
|
|
354
|
+
|
|
355
|
+
4. **The Ha-Pri Audit Signature scheme** — the provenance stamp and audit reference that binds stored row material to the current v1 formula; stronger tamper-evidence is the future additive v2 path.
|
|
356
|
+
|
|
357
|
+
5. **The Store / Ledger design** — support for recurring watched queries, historical signal accumulation, D1-backed storage, and time-series auditability.
|
|
358
|
+
|
|
359
|
+
6. **The Reference Implementation** — `freshcontext-mcp@0.3.19`, the `evaluate_context` MCP interface, and 21 read-only reference adapters, listed on npm and the MCP Registry. The hosted Worker endpoint is a separate deployment surface.
|
|
360
|
+
|
|
361
|
+
---
|
|
362
|
+
|
|
363
|
+
## Changelog
|
|
364
|
+
|
|
365
|
+
### Version 1.2 — May 2026
|
|
366
|
+
- Clarified Core methodology vs Store/Ledger methodology.
|
|
367
|
+
- Preserved DAR as the mathematical scoring backbone.
|
|
368
|
+
- Updated reference implementation language for `evaluate_context` plus 21 MCP reference adapters.
|
|
369
|
+
- Reframed source decay constants as reference defaults/calibration values.
|
|
370
|
+
- Added failure-honesty methodology.
|
|
371
|
+
- Added context-conditioned utility as a Core scoring primitive.
|
|
372
|
+
- Clarified missing timestamp behavior.
|
|
373
|
+
- Clarified MCP as one interface, not the whole system.
|
|
374
|
+
|
|
375
|
+
### Version 1.1 — April 2026
|
|
376
|
+
- Existing methodology version.
|
|
377
|
+
|
|
378
|
+
---
|
|
379
|
+
|
|
380
|
+
*"The work isn't gone. It's just waiting to be continued."*
|
|
381
|
+
*— Prince Gabriel, Grootfontein, Namibia*
|
package/README.md
CHANGED
|
@@ -6,7 +6,7 @@ Claude had no idea. It presented everything with the same confidence.
|
|
|
6
6
|
|
|
7
7
|
That's the problem freshcontext fixes.
|
|
8
8
|
|
|
9
|
-
This repository is the integrated FreshContext MCP
|
|
9
|
+
This repository is the integrated FreshContext Core/MCP package. FreshContext is the context judgment layer between retrieval and reasoning. Core is the reusable engine that scores, ranks, explains, and turns candidate context into decision-ready context; MCP is the first live host/interface over that engine.
|
|
10
10
|
|
|
11
11
|
[](https://www.npmjs.com/package/freshcontext-mcp)
|
|
12
12
|
[](https://opensource.org/licenses/MIT)
|
|
@@ -30,7 +30,16 @@ That's not hallucination. That's correct summarization of corrupted retrieval.
|
|
|
30
30
|
|
|
31
31
|
## The layer
|
|
32
32
|
|
|
33
|
-
FreshContext is
|
|
33
|
+
FreshContext is **context integrity infrastructure for AI agents and retrieval systems**. It sits between retrieval and reasoning:
|
|
34
|
+
|
|
35
|
+
```text
|
|
36
|
+
candidate context
|
|
37
|
+
-> FreshContext Core
|
|
38
|
+
-> decision-ready context
|
|
39
|
+
-> model / agent / app
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
FreshContext evaluates freshness, source profile, confidence, utility, provenance material, and failure honesty before context reaches the LLM. The temporal core uses Decay-Adjusted Relevancy:
|
|
34
43
|
|
|
35
44
|
```
|
|
36
45
|
R_t = R_0 · e^(−λt)
|
|
@@ -41,9 +50,9 @@ R_t = R_0 · e^(−λt)
|
|
|
41
50
|
- `t` — hours elapsed since publication
|
|
42
51
|
- `R_t` — decay-adjusted relevancy at query time
|
|
43
52
|
|
|
44
|
-
That's the
|
|
53
|
+
That's the core correction. No model swap. No re-embedding. No re-indexing. The layer drops onto whatever retrieval pipeline you already have.
|
|
45
54
|
|
|
46
|
-
**The layer is the product.** The
|
|
55
|
+
**The layer is the product.** The named adapters shipped with this repo demonstrate compatibility across different source classes. The DAR engine, the freshness envelope, Source Profiles, and the FreshContext Specification are the moat.
|
|
47
56
|
|
|
48
57
|
---
|
|
49
58
|
|
|
@@ -70,17 +79,62 @@ The FreshContext Specification v1.2 is published as an open standard under MIT l
|
|
|
70
79
|
|
|
71
80
|
## Architecture boundary
|
|
72
81
|
|
|
73
|
-
FreshContext Core is the reusable center of the current integrated package. It owns freshness scoring, envelope formatting, failure guards, shared types, rank/explain primitives, and the context-conditioned utility primitive.
|
|
82
|
+
FreshContext Core is the reusable center of the current integrated package. It owns signal normalization, freshness scoring, Source Profiles, decision output, envelope formatting, failure guards, shared types, rank/explain primitives, and the context-conditioned utility primitive.
|
|
74
83
|
|
|
75
|
-
MCP is the primary reference/interface implementation over Core. Claude Desktop is supported, but not required. The
|
|
84
|
+
MCP is the primary reference/interface implementation over Core. Claude Desktop is supported, but not required. The MCP tool surface exposes named reference adapters and a live interface for using the system.
|
|
76
85
|
|
|
77
86
|
The production Cloudflare Worker now uses Core-backed envelope generation. Worker-specific concerns remain outside Core: MCP transport, runtime guards, KV cache policy, cache metadata injection, JSON parse/replace cache helpers, D1 feeds, cron, rate limiting, and Store/feed scoring/provenance.
|
|
78
87
|
|
|
88
|
+
See [Core / MCP Boundary](./docs/CORE_MCP_BOUNDARY.md) for the current package boundary and the staged path toward a future standalone Core package.
|
|
89
|
+
|
|
79
90
|
---
|
|
80
91
|
|
|
81
|
-
##
|
|
92
|
+
## Primary MCP interface
|
|
82
93
|
|
|
83
|
-
|
|
94
|
+
The clearest MCP path is `evaluate_context`.
|
|
95
|
+
|
|
96
|
+
It accepts candidate context from any retriever, agent, database, local script, note parser, or adapter output:
|
|
97
|
+
|
|
98
|
+
```json
|
|
99
|
+
{
|
|
100
|
+
"profile": "academic_research",
|
|
101
|
+
"intent": "citation_check",
|
|
102
|
+
"signals": [
|
|
103
|
+
{
|
|
104
|
+
"title": "Example source",
|
|
105
|
+
"content": "Candidate context text...",
|
|
106
|
+
"source": "https://example.com/source",
|
|
107
|
+
"source_type": "arxiv",
|
|
108
|
+
"published_at": "2026-05-24T12:00:00.000Z",
|
|
109
|
+
"retrieved_at": "2026-05-24T13:00:00.000Z",
|
|
110
|
+
"semantic_score": 0.92
|
|
111
|
+
}
|
|
112
|
+
]
|
|
113
|
+
}
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
FreshContext returns decision-first output:
|
|
117
|
+
|
|
118
|
+
- Decision
|
|
119
|
+
- Meaning
|
|
120
|
+
- Action
|
|
121
|
+
- Warnings
|
|
122
|
+
- Source
|
|
123
|
+
- Freshness
|
|
124
|
+
- Rank score
|
|
125
|
+
- Utility
|
|
126
|
+
- Confidence
|
|
127
|
+
- Why
|
|
128
|
+
|
|
129
|
+
`evaluate_context` does not fetch URLs, crawl, scrape, browse, read folders, or call adapters. It only evaluates candidate context the caller provides.
|
|
130
|
+
|
|
131
|
+
Current boundary: `evaluate_context` is part of the published npm/local stdio MCP server and has been verified on the hosted Cloudflare Worker MCP endpoint at `0.3.19 / 22 tools`. The Worker remains a separate deployment surface, so future package interfaces should be re-verified remotely before being claimed live.
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Advanced Worker/feed surface
|
|
136
|
+
|
|
137
|
+
Beyond the per-call Core/MCP paths, the production Worker deployment exposes a continuous, decay-scored, deduplicated feed. This is an advanced deployment surface, not the required way to use FreshContext Core:
|
|
84
138
|
|
|
85
139
|
```
|
|
86
140
|
GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
|
|
@@ -94,7 +148,7 @@ Production endpoint: `https://freshcontext-mcp.gimmanuel73.workers.dev`
|
|
|
94
148
|
|
|
95
149
|
## Reference adapters
|
|
96
150
|
|
|
97
|
-
The repo ships
|
|
151
|
+
The repo ships named reference adapters that demonstrate how different source classes can become FreshContext-compatible. Each adapter keeps its own name because it represents a source boundary; the adapter count is operational proof, not the product headline.
|
|
98
152
|
|
|
99
153
|
### Intelligence
|
|
100
154
|
| Adapter | What it returns |
|
|
@@ -128,7 +182,7 @@ The repo ships 21 tools demonstrating how to make any data source FreshContext-c
|
|
|
128
182
|
| `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
|
|
129
183
|
| `extract_company_landscape` | 5 | The full picture on any company |
|
|
130
184
|
|
|
131
|
-
###
|
|
185
|
+
### Official, regulatory, and procurement sources
|
|
132
186
|
| Adapter | Source | What it returns |
|
|
133
187
|
|---|---|---|
|
|
134
188
|
| `extract_changelog` | GitHub Releases / npm / auto-discover | Update history from any repo, package, or website |
|
|
@@ -141,6 +195,8 @@ The repo ships 21 tools demonstrating how to make any data source FreshContext-c
|
|
|
141
195
|
|
|
142
196
|
## Quick start
|
|
143
197
|
|
|
198
|
+
For Claude Desktop, Codex, `npx`, global npm, and source-checkout setup, see the concise [client setup guide](./docs/CLIENT_SETUP.md).
|
|
199
|
+
|
|
144
200
|
### Cloud (no install)
|
|
145
201
|
|
|
146
202
|
Add to your Claude Desktop config and restart:
|
|
@@ -290,6 +346,14 @@ Minimal shape:
|
|
|
290
346
|
|
|
291
347
|
This local demo does not fetch URLs, crawl, or read folders. It evaluates candidate context you provide and returns decision-first output: Decision, Meaning, Action, Warnings, and supporting metrics.
|
|
292
348
|
|
|
349
|
+
In an MCP client, use `evaluate_context` when you already have candidate context from another retriever, database, agent, or script:
|
|
350
|
+
|
|
351
|
+
```text
|
|
352
|
+
Use evaluate_context with profile "academic_research", intent "citation_check", and these candidate signals: [...]
|
|
353
|
+
```
|
|
354
|
+
|
|
355
|
+
Use the named reference adapters when you want FreshContext's current MCP package to fetch public source examples for you.
|
|
356
|
+
|
|
293
357
|
**Should I build this idea?**
|
|
294
358
|
```
|
|
295
359
|
Use extract_idea_landscape with idea "procurement intelligence saas"
|
|
@@ -325,7 +389,7 @@ The reference implementation runs on Cloudflare's global edge:
|
|
|
325
389
|
| `/` | GET | Service info + endpoint list |
|
|
326
390
|
| `/health` | GET | Liveness check |
|
|
327
391
|
| `/mcp` | POST | MCP JSON-RPC transport |
|
|
328
|
-
| `/demo` | GET | Live before/after demo (no
|
|
392
|
+
| `/demo` | GET | Live before/after demo (no auth token required) |
|
|
329
393
|
| `/briefing` | GET | Latest stored briefing |
|
|
330
394
|
| `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
|
|
331
395
|
| `/watched-queries` | GET | List all watched queries |
|
|
@@ -343,26 +407,31 @@ Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`
|
|
|
343
407
|
## Roadmap
|
|
344
408
|
|
|
345
409
|
- [x] FreshContext Specification v1.2 published (MIT, open standard)
|
|
346
|
-
- [x] DAR engine with
|
|
410
|
+
- [x] DAR engine with source-specific lambda constants (v0.3.19)
|
|
347
411
|
- [x] Ha-Pri v1 provenance signatures on stored signals
|
|
348
412
|
- [x] Semantic deduplication via fingerprinting
|
|
349
413
|
- [x] Live before/after demo at `/demo`
|
|
350
|
-
- [x] METHODOLOGY.md —
|
|
351
|
-
- [x]
|
|
414
|
+
- [x] METHODOLOGY.md — methodology and engineering documentation
|
|
415
|
+
- [x] Named reference adapters across intelligence, competitive research, market data, and composites
|
|
416
|
+
- [x] Generic MCP `evaluate_context` tool for caller-provided candidate context
|
|
352
417
|
- [x] Core-backed envelope generation shared by npm/MCP and the Cloudflare Worker
|
|
353
418
|
- [x] Cloudflare Workers deployment — global edge, KV cache, KV rate limiting
|
|
354
|
-
- [x]
|
|
355
|
-
- [
|
|
419
|
+
- [x] Published on npm and listed for MCP usage; Apify/feed assets are separated from the normal MCP runtime package
|
|
420
|
+
- [x] Ha-Pri v2 Core helper and deterministic golden vectors
|
|
421
|
+
- [x] Ha-Pri v2 production-enforcement design document
|
|
422
|
+
- [ ] Ha-Pri v2 Worker/D1 production enforcement
|
|
356
423
|
- [x] GitHub Actions release workflow — manual or `v*` tag-triggered npm publish path
|
|
357
424
|
- [ ] Webhook triggers — push high-entropy signals on threshold
|
|
358
425
|
- [ ] Dashboard — React frontend for the D1 intelligence pipeline
|
|
359
426
|
- [ ] GKG upgrade for `extract_gdelt` — tone scores, goldstein scale, event codes
|
|
360
427
|
|
|
428
|
+
Future work is organized in [FreshContext Future Lanes](./docs/FUTURE_LANES.md). Roadmap items are not live product claims until implemented and validated.
|
|
429
|
+
|
|
361
430
|
---
|
|
362
431
|
|
|
363
432
|
## Contributing
|
|
364
433
|
|
|
365
|
-
PRs welcome. New adapters are
|
|
434
|
+
PRs welcome. The highest-value contributions improve the caller-provided context path, decision output, host integrations, and FreshContext-compatible signal quality. New reference adapters are useful when they preserve source boundaries and emit timestamped, failure-honest context — see `src/adapters/` for examples and [`FRESHCONTEXT_SPEC.md`](./FRESHCONTEXT_SPEC.md) for the compatibility contract.
|
|
366
435
|
|
|
367
436
|
If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
|
|
368
437
|
|
|
@@ -377,7 +446,6 @@ If you're building something FreshContext-compatible, open an issue and we'll ad
|
|
|
377
446
|
- [Dependency diligence notes](./docs/DEPENDENCY_DILIGENCE.md)
|
|
378
447
|
- [Release integrity notes](./docs/RELEASE_INTEGRITY.md)
|
|
379
448
|
- [Release notes](./docs/RELEASE_NOTES.md)
|
|
380
|
-
- [Operational demo runbook](./docs/OPERATIONAL_DEMO_RUNBOOK.md)
|
|
381
449
|
|
|
382
450
|
---
|
|
383
451
|
|
|
@@ -392,4 +460,4 @@ MIT
|
|
|
392
460
|
|
|
393
461
|
---
|
|
394
462
|
|
|
395
|
-
**Also on:** [
|
|
463
|
+
**Also on:** [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
|
package/dist/adapters/arxiv.js
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
1
|
+
import { validateUrl } from "../security.js";
|
|
1
2
|
const USER_AGENT = "freshcontext-mcp/0.1.7 (https://github.com/PrinceGabriel-lgtm/freshcontext-mcp)";
|
|
2
3
|
const DEFAULT_ARXIV_SIGNAL_SCORE = 0.8;
|
|
3
4
|
function buildArxivApiUrl(input, maxResults = 10) {
|
|
@@ -6,7 +7,7 @@ function buildArxivApiUrl(input, maxResults = 10) {
|
|
|
6
7
|
? Math.max(1, Math.min(Math.trunc(maxResults), 50))
|
|
7
8
|
: 10;
|
|
8
9
|
return trimmed.startsWith("http")
|
|
9
|
-
? trimmed
|
|
10
|
+
? validateUrl(trimmed, "arxiv")
|
|
10
11
|
: `https://export.arxiv.org/api/query?search_query=all:${encodeURIComponent(trimmed)}&start=0&max_results=${safeMaxResults}&sortBy=relevance&sortOrder=descending`;
|
|
11
12
|
}
|
|
12
13
|
async function fetchArxivXml(apiUrl) {
|
package/dist/adapters/changelog.js
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
1
|
+
import { validateUrl } from "../security.js";
|
|
1
2
|
/**
|
|
2
3
|
* Changelog adapter — extracts update history from any product or repo.
|
|
3
4
|
*
|
|
@@ -193,7 +194,8 @@ export async function changelogAdapter(options) {
|
|
|
193
194
|
return fetchNpmChangelog(input, maxLength);
|
|
194
195
|
}
|
|
195
196
|
// GitHub repo URL → use releases API
|
|
196
|
-
const
|
|
197
|
+
const safeInput = validateUrl(input, "changelog");
|
|
198
|
+
const ghMatch = safeInput.match(/github\.com\/([^/]+)\/([^/?\s]+)/);
|
|
197
199
|
if (ghMatch) {
|
|
198
200
|
try {
|
|
199
201
|
return await fetchGitHubReleases(ghMatch[1], ghMatch[2], maxLength);
|
|
@@ -203,5 +205,5 @@ export async function changelogAdapter(options) {
|
|
|
203
205
|
}
|
|
204
206
|
}
|
|
205
207
|
// Any other URL → discover changelog
|
|
206
|
-
return discoverChangelog(
|
|
208
|
+
return discoverChangelog(safeInput, maxLength);
|
|
207
209
|
}
|
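The GitHub branch of the changelog adapter above hinges on a single regular expression. Here is a minimal sketch of what that pattern captures, assuming the input has already passed URL validation. The behavior of `validateUrl` is not shown in this diff and is not reproduced here, and `parseGitHubRepo` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper for illustration; only the pattern is copied from the diff.
function parseGitHubRepo(url) {
  // Group 1: owner (everything up to the next "/").
  // Group 2: repo (stops at "/", "?", or whitespace).
  const match = url.match(/github\.com\/([^/]+)\/([^/?\s]+)/);
  return match ? { owner: match[1], repo: match[2] } : null;
}

// Example inputs: a plain repo URL, a deeper path, and a non-GitHub URL.
const plain = parseGitHubRepo("https://github.com/PrinceGabriel-lgtm/freshcontext-mcp");
const deep = parseGitHubRepo("https://github.com/foo/bar/releases?page=2");
const other = parseGitHubRepo("https://example.com/changelog");
```

Note that the pattern does not strip a trailing `.git` or a `#fragment`, so an input such as `https://github.com/foo/bar.git` yields the repo name `bar.git`.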
package/dist/adapters/finance.js
CHANGED
|
@@ -30,7 +30,7 @@ async function fetchStooqQuote(ticker) {
|
|
|
30
30
|
const url = `https://stooq.com/q/l/?s=${encodeURIComponent(stooqSymbol.toLowerCase())}&f=sd2t2ohlcv&h&e=json`;
|
|
31
31
|
const res = await fetch(url, {
|
|
32
32
|
headers: {
|
|
33
|
-
"User-Agent": "freshcontext-mcp/0.3.
|
|
33
|
+
"User-Agent": "freshcontext-mcp/0.3.19",
|
|
34
34
|
"Accept": "application/json",
|
|
35
35
|
},
|
|
36
36
|
});
|
package/dist/adapters/gdelt.js
CHANGED
|
@@ -11,7 +11,7 @@
|
|
|
11
11
|
*/
|
|
12
12
|
const HEADERS = {
|
|
13
13
|
"Accept": "application/json",
|
|
14
|
-
"User-Agent": "freshcontext-mcp/0.3.
|
|
14
|
+
"User-Agent": "freshcontext-mcp/0.3.19 (https://github.com/PrinceGabriel-lgtm/freshcontext-mcp)",
|
|
15
15
|
};
|
|
16
16
|
function parseGdeltDate(raw) {
|
|
17
17
|
if (!raw)
|
package/dist/adapters/gebiz.js
CHANGED
|
@@ -20,7 +20,7 @@ const DATASET_ID = "d_acde1106003906a75c3fa052592f2fcb";
|
|
|
20
20
|
const BASE_URL = "https://data.gov.sg/api/action/datastore_search";
|
|
21
21
|
const HEADERS = {
|
|
22
22
|
"Accept": "application/json",
|
|
23
|
-
"User-Agent": "freshcontext-mcp/0.3.
|
|
23
|
+
"User-Agent": "freshcontext-mcp/0.3.19 (https://github.com/PrinceGabriel-lgtm/freshcontext-mcp)",
|
|
24
24
|
};
|
|
25
25
|
function formatDate(raw) {
|
|
26
26
|
if (!raw)
|