freshcontext-mcp 0.3.13 → 0.3.15
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.actor/Dockerfile +7 -4
- package/.actor/actor.json +1 -1
- package/CONTEXT_SKILL.md +84 -0
- package/FRESHCONTEXT_SPEC.md +80 -6
- package/HANDOFF.md +220 -91
- package/METHODOLOGY.md +277 -0
- package/README.md +195 -41
- package/SESSION_SAVE_V5.md +121 -0
- package/SESSION_SAVE_V6.md +194 -0
- package/SESSION_SAVE_V9.md +170 -0
- package/dist/apify.js +133 -0
- package/dist/server.js +92 -46
- package/freshcontext-validate.js +196 -0
- package/freshcontext.schema.json +103 -0
- package/input_schema.json +16 -17
- package/package.json +2 -2
- package/server.json +3 -3
package/METHODOLOGY.md
ADDED

# FreshContext Data Intelligence Methodology

**Version 1.1 — April 2026**

*Authored by Immanuel Gabriel (Prince Gabriel) — Grootfontein, Namibia*

---

## What This Document Is

This document formally describes the data collection, scoring, and provenance methodology underlying the FreshContext intelligence platform.

It exists for three audiences:

1. **Technical integrators** — teams embedding FreshContext into their agent infrastructure who need to understand what the data represents and how it is scored.
2. **Acquirers and licensing partners** — entities evaluating FreshContext as an asset, who need to audit the methodology that makes the data defensible.
3. **Regulators and auditors** — who may need to verify that the platform's data claims are substantiated by documented, reproducible methodology.

---

## Section 1: Data Collection

### 1.1 Architecture

FreshContext operates a continuous data collection pipeline on Cloudflare's global edge infrastructure. The pipeline executes every 6 hours via a scheduled cron job and runs the 18 watched query definitions stored in the platform's D1 database.

Each watched query specifies:

- **Adapter** — the data source to query (e.g., `hackernews`, `jobs`, `reposearch`)
- **Query** — the search term or URL
- **User ID** — the profile this query serves
- **Filters** — optional parameters (location, exclusion terms, etc.)
### 1.2 Adapters

FreshContext implements 11 production adapters. The primary sources include:

| Adapter | Source | Auth Required | Update Frequency |
|---|---|---|---|
| `hackernews` | Hacker News Algolia API (full-text search) | None | Real-time |
| `jobs` | Remotive API | None | Continuous |
| `reposearch` | GitHub Search API | Optional (raises rate limit) | Real-time |
| `github` | GitHub Repository API | Optional | Real-time |
| `reddit` | Reddit JSON API | None | Real-time |
| `yc` | YC Open Source API | None | Per batch cycle |
| `packagetrends` | npm Registry + npm Downloads API | None | Per publish |
| `finance` | Yahoo Finance API | None | Market hours |

All adapters operate exclusively on **publicly accessible data**. No credentials are required or used for data access. All fetch requests include a `User-Agent` header identifying the FreshContext crawler.

### 1.3 Content Hash Deduplication

Before any signal is stored, the platform computes a 32-bit rolling hash of the raw content. If the most recent stored result for the same watched query carries an identical hash, the current result is discarded. This prevents storing unchanged content across cron cycles.
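The dedup check fits in a few lines. The methodology only says "32-bit rolling hash" without naming the function, so FNV-1a is used below purely as a stand-in:

```javascript
// The methodology specifies a "32-bit rolling hash" without naming the
// function; FNV-1a is a stand-in here — the production hash may differ.
function contentHash32(text) {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return h.toString(16).padStart(8, "0");
}

// Store only when the content actually changed since the last cron cycle.
function shouldStore(previousHash, rawContent) {
  return contentHash32(rawContent) !== previousHash;
}
```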

### 1.4 Semantic Deduplication

Beyond exact-match deduplication, FreshContext applies semantic deduplication so that the same underlying story does not appear as multiple signals when several sources cover it (e.g., the same GitHub release surfacing on both HN and Reddit).

The semantic fingerprint is computed as follows:

1. Extract the first canonical URL from the raw content
2. Extract the first ISO 8601 publication date from the raw content
3. Extract and normalise the first substantive line (the title) — lowercased, punctuation stripped, truncated to 80 characters
4. Concatenate: `normalised_title|canonical_url|publication_date`
5. Compute the SHA-256 of the concatenated string
6. Retain the first 16 hex characters as the fingerprint

If any signal stored within the preceding 48 hours carries an identical fingerprint, the new result is discarded. The 48-hour window is configurable.

---

## Section 2: Temporal Scoring — The DAR Engine

### 2.1 Overview

The Decay-Adjusted Relevancy (DAR) engine scores every collected signal on two axes:

- **R_0 (Base Score)** — semantic relevancy of the content against the user's profile, independent of time
- **R_t (Decay-Adjusted Score)** — R_0 adjusted for how much time has elapsed since the content was published

The final stored `rt_score` is what drives signal ranking in briefings and the intelligence feed.

### 2.2 Base Score Calculation (R_0)

R_0 is computed by matching content against the user profile:

```
R_0 = baseline (40)
    + vital_keyword_matches × 15    [capped at +35]
    + skill_keyword_matches × 3     [capped at +15]
    + location_accessibility_bonus  [+8 if remote/accessible]
    − error_penalty                 [−40 if content is empty/error]
```

Vital keywords are drawn from the `targets` field of the user profile — job titles, company names, and technology domains the user is specifically tracking.

Skill keywords are drawn from the `skills` field — the user's technical competencies. A match here adds relevancy signal, but at a lower weight than a direct target match.

The location accessibility bonus is applied when the content explicitly mentions "remote", "worldwide", "anywhere", or the user's stated location. This is not a geographic filter — it is a signal boost for content that is accessible to the user regardless of their physical location.

**Hard exclusions:** If any term from the `exclusion_terms` list appears in the content, R_0 is forced to zero. The result is still stored (for audit purposes) but marked `is_relevant = 0`.
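The rules above can be sketched as follows. The profile shape (`targets`, `skills`, `location`, `exclusion_terms`) is inferred from the text, and plain substring matching is an assumption — the production matcher may be more sophisticated:

```javascript
// Sketch of the R_0 rules in Section 2.2. Profile shape and substring
// matching are assumptions; the production matcher may differ.
function baseScore(content, profile) {
  const text = content.toLowerCase();
  if (!text.trim()) return 0; // error penalty: 40 baseline − 40 = 0
  if (profile.exclusion_terms.some((t) => text.includes(t.toLowerCase()))) {
    return 0; // hard exclusion: R_0 forced to zero
  }
  const matches = (terms) =>
    terms.filter((t) => text.includes(t.toLowerCase())).length;
  let score = 40; // baseline
  score += Math.min(matches(profile.targets) * 15, 35); // vital keywords, capped at +35
  score += Math.min(matches(profile.skills) * 3, 15); // skill keywords, capped at +15
  const accessible = ["remote", "worldwide", "anywhere", profile.location.toLowerCase()];
  if (accessible.some((w) => text.includes(w))) score += 8; // accessibility bonus
  return score;
}
```

With a profile tracking `acme` and `typescript`, a listing mentioning "Remote TypeScript role at Acme" scores 40 + 15 + 3 + 8 = 66.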

### 2.3 Decay Function (R_t)

```
R_t = R_0 · e^(−λt)
```

Where:

- `λ` = source-specific decay constant (per hour)
- `t` = hours elapsed since `published_at`

If `published_at` cannot be extracted from the content, `t` is assumed to equal one half-life for that source — a conservative assumption under which the signal is treated as partially decayed but not dead.
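A minimal sketch of the decay step, including the missing-date fallback (`t` = one half-life, so R_t = R_0 / 2):

```javascript
// R_t = R_0 · e^(−λt), with the Section 2.3 fallback when the
// publication date could not be extracted.
function rtScore(r0, lambdaPerHour, hoursSincePublished) {
  const t =
    hoursSincePublished ?? Math.LN2 / lambdaPerHour; // fallback: one half-life
  return r0 * Math.exp(-lambdaPerHour * t);
}
```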

### 2.4 Source Decay Constants (λ)

These constants represent the platform's proprietary calibration of how quickly signals from each source class lose intelligence value:

| Source | λ (per hour) | Half-life |
|---|---|---|
| Hacker News | 0.050 | ~14 hours |
| Reddit | 0.010 | ~3 days |
| Product Hunt | 0.010 | ~3 days |
| Job listings | 0.005 | ~6 days |
| Financial data | 0.001 | ~29 days |
| YC companies | 0.001 | ~29 days |
| Package trends | 0.0005 | ~58 days |
| GitHub repositories | 0.0002 | ~5 months |
| Academic papers | 0.00005 | ~1.6 years |

These constants are calibrated against observed information decay rates across source types. They are the platform's primary trade secret and are not exposed in API responses.
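The half-life column follows directly from the decay constant via t½ = ln(2)/λ, which is easy to check against the table:

```javascript
// t½ = ln(2) / λ — reproduces the half-life column (λ per hour).
const halfLifeHours = (lambdaPerHour) => Math.LN2 / lambdaPerHour;

console.log(halfLifeHours(0.05).toFixed(1)); // 13.9 h — Hacker News, "~14 hours"
console.log((halfLifeHours(0.001) / 24).toFixed(1)); // 28.9 days — financial data
console.log((halfLifeHours(0.00005) / 24 / 365).toFixed(2)); // 1.58 years — academic papers
```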

### 2.5 Entropy Classification

Each signal is classified into one of three entropy states based on its position on the decay curve:

| State | Condition | Interpretation |
|---|---|---|
| `low` | `t < half_life / 2` | Signal near peak value — act now |
| `stable` | `t < 1.5 × half_life` | Usable signal — monitor |
| `high` | `t ≥ 1.5 × half_life` | Significantly degraded — verify before acting |
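The classification above as a sketch:

```javascript
// Entropy state from hours since publication and the source half-life,
// per the Section 2.5 table.
function entropyLevel(hoursSincePublished, halfLifeHours) {
  if (hoursSincePublished < halfLifeHours / 2) return "low";
  if (hoursSincePublished < 1.5 * halfLifeHours) return "stable";
  return "high";
}
```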

### 2.6 Relevancy Threshold

Signals with `rt_score < 35` are stored with `is_relevant = 0`. They remain in the database for audit and historical analysis but are excluded from briefings and the intelligence feed by default. The threshold is configurable per profile.

---

## Section 3: Provenance and Auditability

### 3.1 The Ha-Pri Audit Signature

Every signal stored in the FreshContext database carries a `ha_pri_sig` — a SHA-256 audit signature computed as:

```
SHA-256( result_id + ":" + content_hash + ":" + "FRESHCONTEXT_DAR_V1" )
```

This signature serves three purposes:

1. **Tamper detection** — the signature binds the content hash to the result ID and the engine version. Any modification to the stored content would invalidate the signature.
2. **Provenance chain** — every row in the `scrape_results` table is cryptographically linked to the moment it was scored by the DAR engine.
3. **Licensing audit** — when FreshContext data is provided to a third party under licence, the `ha_pri_sig` column provides an immutable record of exactly what was delivered and when.

### 3.2 D1 Historical Ledger

The `scrape_results` table functions as a **Contextual Ledger** — not merely a cache, but a time-series record of intelligence signals with full provenance.

Key properties of the ledger:

- Every row is immutable once written (no UPDATE operations on scored rows)
- Every row carries a `scraped_at` timestamp with second precision
- Every row carries a `published_at` date extracted from content (where available)
- The ledger accumulates continuously at 6-hour intervals regardless of active user sessions
- The ledger enables time-travel queries: "what was the intelligence landscape for topic X at date Y?"

### 3.3 Schema Reference

```sql
scrape_results (
  id                   TEXT PRIMARY KEY,  -- sr_{timestamp}_{random}
  watched_query_id     TEXT,              -- FK → watched_queries.id
  adapter              TEXT,              -- source adapter name
  query                TEXT,              -- the search term used
  raw_content          TEXT,              -- scraped content (max 8000 chars)
  result_hash          TEXT,              -- 32-bit rolling hash of raw_content
  semantic_fingerprint TEXT,              -- 16-char SHA-256 of normalised title|url|date
  is_new               INTEGER,           -- 1 until consumed by briefing
  scraped_at           TEXT,              -- ISO 8601 UTC timestamp
  published_at         TEXT,              -- extracted content publication date
  relevancy_score      INTEGER,           -- = round(rt_score), 0-100
  is_relevant          INTEGER,           -- 1 if rt_score >= 35, else 0
  base_score           INTEGER,           -- R_0 semantic score, 0-100
  rt_score             REAL,              -- R_t decay-adjusted score, 0-100
  ha_pri_sig           TEXT,              -- SHA-256 audit signature (64 hex chars)
  entropy_level        TEXT               -- 'low' | 'stable' | 'high'
)
```
---

## Section 4: The Intelligence Feed

### 4.1 Endpoint

```
GET /v1/intel/feed/{profile_id}
```

Optional parameters:

- `limit` — maximum signals to return (default: 20)
- `min_rt` — minimum `rt_score` filter (default: 0)

### 4.2 Response Structure

```json
{
  "feed_metadata": {
    "profile_id": "default",
    "generated_at": "2026-04-14T09:00:00Z",
    "signal_count": 18,
    "version": "freshcontext-1.1"
  },
  "signals": [
    {
      "signal_id": "sr_1744628412_a3f7b",
      "source": "hackernews",
      "label": "HN: MCP Servers",
      "content": {
        "preview": "...",
        "url": "mcp server 2026"
      },
      "intelligence_stamps": {
        "scraped_at": "2026-04-14T08:12:00Z",
        "published_at": "2026-04-14",
        "base_score": 78,
        "rt_score": 61.4,
        "entropy_level": "stable",
        "ha_pri_sig": "a3f7b2c1d4e5f6a7b8c9d0e1f2a3b4c5..."
      }
    }
  ]
}
```

### 4.3 LLM Integration

The intelligence feed is designed to be consumed directly by any language model or AI agent without modification. The `intelligence_stamps` block gives the agent everything it needs to reason about data freshness:

- `rt_score` — a single number representing current signal value
- `entropy_level` — human-readable decay state
- `published_at` — the actual content date (not the retrieval date)
- `ha_pri_sig` — provenance reference the agent can cite

This is the core value proposition: **AI agents get grounded, timestamped, scored intelligence rather than undated web content of unknown age.**
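As a sketch of how a consumer might act on these stamps — drop degraded and sub-threshold signals, then rank by decay-adjusted score. The field shapes are taken from the response structure in Section 4.2; the default of 35 mirrors the relevancy threshold in Section 2.6, and the filtering policy itself is a hypothetical client choice, not part of the API:

```javascript
// Hypothetical feed consumer: keep actionable signals only,
// ranked by rt_score.
function actionableSignals(feed, minRt = 35) {
  return feed.signals
    .filter((s) => s.intelligence_stamps.rt_score >= minRt)
    .filter((s) => s.intelligence_stamps.entropy_level !== "high")
    .sort((a, b) => b.intelligence_stamps.rt_score - a.intelligence_stamps.rt_score)
    .map((s) => `${s.label} (rt=${s.intelligence_stamps.rt_score})`);
}
```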

---

## Section 5: Asset Summary

For acquirers, investors, and licensing partners:

**What FreshContext owns:**

1. **The FreshContext Specification v1.1** (MIT licence, open standard) — defines the envelope format, confidence levels, and structured JSON form. Timestamped in the public GitHub repository.

2. **The DAR Engine** (proprietary) — the exponential decay scoring methodology with source-specific λ constants. The constants are not published and constitute trade-secret IP.

3. **The Semantic Fingerprinting Method** (proprietary) — the three-field normalisation and SHA-256 fingerprinting approach for cross-adapter deduplication.

4. **The Ha-Pri Audit Signature scheme** (proprietary) — the provenance binding method that makes the historical ledger tamper-evident.

5. **The Historical D1 Ledger** (data asset) — the continuously accumulating time-series dataset. As of the date of this document, the ledger has been running since early 2026 with 6-hour collection intervals across 18 watched queries. The dataset grows in defensibility with every passing day.

6. **The Reference Implementation** — `freshcontext-mcp@0.3.15`, listed on the official MCP Registry and npm. Deployed globally on Cloudflare's edge infrastructure.

---

*"The work isn't gone. It's just waiting to be continued."*

*— Prince Gabriel, Grootfontein, Namibia*
package/README.md
CHANGED

@@ -8,12 +8,15 @@ That's the problem freshcontext fixes.
 
 [](https://www.npmjs.com/package/freshcontext-mcp)
 [](https://opensource.org/licenses/MIT)
+[](https://registry.modelcontextprotocol.io)
 
 ---
 
-##
+## The Standard
 
-
+FreshContext is a **data freshness layer for AI agents** — an open standard and reference implementation that makes retrieved data trustworthy.
+
+Every piece of web data an AI agent retrieves has an age. Most tools ignore it. FreshContext surfaces it — wrapping every result in a structured envelope that carries three guarantees:
 
 ```
 [FRESHCONTEXT]

@@ -26,11 +29,13 @@ Confidence: high
 [/FRESHCONTEXT]
 ```
 
-
+**When** it was retrieved. **Where** it came from. **How confident** we are that the date is accurate.
+
+The FreshContext Specification v1.1 is published as an open standard under the MIT license. Any tool, agent, or system that wraps retrieved data in this envelope is FreshContext-compatible. → [Read the spec](./FRESHCONTEXT_SPEC.md)
 
 ---
 
-##
+## 20 tools. No API keys.
 
 ### Intelligence
 | Tool | What it gets you |

@@ -52,22 +57,69 @@ Claude now knows the difference between something from this morning and somethin
 ### Market data
 | Tool | What it gets you |
 |---|---|
-| `extract_finance` | Live stock data — price, market cap, P/E, 52w range |
+| `extract_finance` | Live stock data — price, market cap, P/E, 52w range. Up to 5 tickers. |
+| `search_jobs` | Remote job listings from Remotive, RemoteOK, HN "Who is Hiring" — every listing dated |
+
+### Composites — multiple sources, one call
+| Tool | Sources | What it gets you |
+|---|---|---|
+| `extract_landscape` | 6 | YC + GitHub + HN + Reddit + Product Hunt + npm in parallel |
+| `extract_idea_landscape` | 6 | HN + YC + GitHub + Jobs + npm + Product Hunt — full idea validation |
+| `extract_gov_landscape` | 4 | Gov contracts + HN + GitHub + changelog |
+| `extract_finance_landscape` | 5 | Finance + HN + Reddit + GitHub + changelog |
+| `extract_company_landscape` | 5 | **The full picture on any company** — see below |
+
+### Unique — not available in any other MCP server
+| Tool | Source | What it gets you |
+|---|---|---|
+| `extract_changelog` | GitHub Releases API / npm / auto-discover | Update history from any repo, package, or website |
+| `extract_govcontracts` | USASpending.gov | US federal contract awards — company, amount, agency, period |
+| `extract_sec_filings` | SEC EDGAR | 8-K filings — legally mandated material event disclosures |
+| `extract_gdelt` | GDELT Project | Global news intelligence — 100+ languages, every country, 15-min updates |
+| `extract_gebiz` | data.gov.sg | Singapore Government procurement tenders — open dataset, no auth |
 
-
-| Tool | What it gets you |
-|---|---|
-| `extract_landscape` | One call. YC + GitHub + HN + Reddit + Product Hunt + npm in parallel. Full timestamped picture. |
+---
 
-
-| Tool | What it gets you |
-|---|---|
-| `extract_changelog` | Update history from any GitHub repo, npm package, or website. Accepts a GitHub URL (uses the Releases API), an npm package name, or any website URL — auto-discovers `/changelog`, `/releases`, and `CHANGELOG.md`. Returns version numbers, release dates, and entry content, all timestamped. Use this to check if a dependency is still actively maintained, or to find out exactly when a feature shipped before referencing it. |
+## extract_idea_landscape
 
-
-
-
-
+Built for the moment before you start building. Six sources fired in parallel to answer: *should I build this?*
+
+1. **Hacker News** — what are developers actively complaining about (pain signal)
+2. **YC Companies** — who has already received funding in this space (funding signal)
+3. **GitHub** — how crowded the open source landscape is (crowding signal)
+4. **Job listings** — companies hiring around this problem = real budget = real market (market signal)
+5. **npm / PyPI** — ecosystem adoption and release velocity (ecosystem signal)
+6. **Product Hunt** — what just launched and how the market received it (launch signal)
+
+```
+Use extract_idea_landscape with idea "data freshness for AI agents"
+```
+
+---
+
+## extract_company_landscape
+
+The most complete single-call company analysis available in any MCP server. Five sources fired in parallel:
+
+1. **SEC EDGAR** — what did they legally just disclose (8-K filings)
+2. **USASpending.gov** — who is giving them government money
+3. **GDELT** — what is global news saying right now
+4. **Changelog** — are they actually shipping product
+5. **Yahoo Finance** — what is the market pricing in
+
+```
+Use extract_company_landscape with company "Palantir" and ticker "PLTR"
+```
+
+Real output from March 2026:
+
+> **Q4 2025:** Revenue $1.407B (+70% YoY). US commercial +137%. Rule of 40 score: **127%**.
+> **Federal contracts:** $292.7M Army Maven Smart System · $252.5M CDAO · $145M ICE · $130M Air Force · more
+> **SEC filing:** Q4 earnings 8-K filed Feb 3, 2026 — GAAP net income $609M, 43% margin
+> **GDELT:** ICE/Medicaid data controversy, UK MoD security warning, NHS opposition — all timestamped
+> **PLTR:** ~$154–157 · Market cap ~$370B · P/E 244x · 52w range $66 → $207
+
+Bloomberg Terminal doesn't read commit history as a company health signal. FreshContext does.
 
 ---
 
@@ -154,35 +206,53 @@ touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
 
 ## Usage examples
 
+**Should I build this idea?**
+```
+Use extract_idea_landscape with idea "procurement intelligence saas"
+```
+Returns funding signal, pain signal, crowding signal, market signal, ecosystem signal, and launch signal — all timestamped.
+
+**Full company intelligence in one call:**
+```
+Use extract_company_landscape with company "Palantir" and ticker "PLTR"
+```
+SEC filings + federal contracts + global news + changelog + market data. The complete picture.
+
 **Is anyone already building what you're building?**
 ```
 Use extract_landscape with topic "cashflow prediction saas"
 ```
 Returns who's funded, what's trending, what repos exist, what packages are moving — all timestamped.
 
-**What's
+**What's Singapore's government procuring right now?**
 ```
-Use
-Use extract_hackernews to search "mcp server 2026"
+Use extract_gebiz with url "artificial intelligence"
 ```
+Returns live tenders from the Ministry of Finance open dataset — agency, amount, closing date, all timestamped.
 
-**Did that company
+**Did that company just disclose something material?**
 ```
-Use
+Use extract_sec_filings with url "Palantir Technologies"
 ```
-
+8-K filings are legally mandated within 4 business days of any material event — CEO change, acquisition, breach, major contract.
 
-**
+**What is global news saying about a company right now?**
 ```
-Use
+Use extract_gdelt with url "Palantir"
 ```
-
+100+ languages, every country, updated every 15 minutes. Surfaces what Western sources miss.
 
-**Which companies just won government contracts in AI?**
+**Which companies just won US government contracts in AI?**
 ```
 Use extract_govcontracts with url "artificial intelligence"
 ```
-
+Largest recent federal contract awards matching that keyword — company, amount, agency, award date.
+
+**Is this dependency still actively maintained?**
+```
+Use extract_changelog with url "https://github.com/org/repo"
+```
+Returns the last 8 releases with exact dates. If the last release was 18 months ago, you'll know before you pin the version.
 
 ---
 
@@ -190,14 +260,15 @@ Returns the largest recent federal contract awards matching that keyword — com
 
 Most AI tools retrieve data silently. No timestamp, no signal, no way for the agent to know how old it is.
 
-
+FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
 
 - `retrieved_at` — exact ISO timestamp of the fetch
 - `content_date` — best estimate of when the content was originally published
 - `freshness_confidence` — `high`, `medium`, or `low` based on signal quality
+- `freshness_score` — numeric 0–100 with domain-specific decay rates (financial data at 5.0, academic papers at 0.3)
 - `adapter` — which source the data came from
 
-When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`,
+When confidence is `high`, the date came from a structured field (API, metadata). When it's `medium` or `low`, FreshContext tells you why.
 
 ---
 
@@ -212,28 +283,111 @@ When confidence is `high`, the date came from a structured field (API, metadata)
 
 ## Roadmap
 
-- [x]
-- [x] `extract_landscape` — 6-source composite tool
-- [x] Cloudflare Workers deployment
-- [x] KV-backed global rate limiting
-- [x] Listed on official MCP Registry
+- [x] 20 tools across intelligence, competitive research, market data, and composites
 - [x] `extract_changelog` — update cadence from any repo, package, or website
 - [x] `extract_govcontracts` — US federal contract intelligence via USASpending.gov
+- [x] `extract_sec_filings` — SEC EDGAR 8-K material event filings
+- [x] `extract_gdelt` — GDELT global news intelligence (100+ languages)
+- [x] `extract_gebiz` — Singapore Government procurement via data.gov.sg
+- [x] `extract_company_landscape` — 5-source company intelligence composite
+- [x] `extract_idea_landscape` — 6-source idea validation composite
+- [x] `freshness_score` numeric metric (0–100) with domain-specific decay rates
+- [x] Cloudflare Workers deployment — global edge with KV caching and rate limiting
+- [x] D1 database — 18 watched queries running on a 6-hour cron with relevancy scoring
+- [x] Listed on official MCP Registry
 - [x] Listed on Apify Store
-- [x] FreshContext Specification v1.
-- [
-- [
-- [
-- [
+- [x] FreshContext Specification v1.1 published (MIT) — composite adapters, decay rate table, compatibility levels
+- [x] GitHub Actions CI/CD — auto-publish to npm on every push
+- [x] **DAR engine** — exponential decay scoring with proprietary λ constants (v0.3.15)
+- [x] **Ha-Pri audit signatures** — SHA-256 provenance stamps on every signal
+- [x] **Semantic deduplication** — cross-adapter fingerprinting
+- [x] **Intelligence feed endpoint** — `/v1/intel/feed/:profile_id`
+- [x] **METHODOLOGY.md** — formal IP documentation
+- [ ] Webhook triggers — push high-entropy signals on threshold
+- [ ] Domain-specific watched queries for the mining/industrial sector
+- [ ] Subscription tier with profile customization
+- [ ] GKG upgrade for `extract_gdelt` — tone scores, Goldstein scale, event codes
+- [ ] Dashboard — React frontend for the D1 intelligence pipeline
 
 ---
 
 ## Contributing
 
-PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern.
+PRs welcome. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.
+
+If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.
 
 ---
 
 ## License
 
 MIT
+
+---
+
+*Built by Prince Gabriel — Grootfontein, Namibia 🇳🇦*
+*"The work isn't gone. It's just waiting to be continued."*
+
+---
+
+**Also on:** [Apify Store](https://apify.com/prince_gabriel/freshcontext-mcp) · [MCP Registry](https://registry.modelcontextprotocol.io) · [npm](https://www.npmjs.com/package/freshcontext-mcp)
+
+---
+
+## The Intelligence Layer (v0.3.15)
+
+FreshContext is no longer just a pull tool. The infrastructure now runs a continuous **Decay-Adjusted Relevancy (DAR)** engine that scores every signal with exponential decay and provenance signatures.
+
+### The math
+
+```
+R_t = R_0 · e^(−λt)
+```
+
+- `R_0` — base semantic score against your profile (0–100)
+- `λ` — source-specific decay constant (per hour)
+- `t` — hours since the content was published
+- `R_t` — final relevancy at query time
+
+Source half-lives are calibrated empirically: Hacker News ≈14h, Reddit ≈3d, jobs ≈6d, GitHub ≈5mo, academic papers ≈1.6y.
+
+### What every signal carries
+
+Every row in the D1 ledger is stamped with:
+
+- `base_score` — R_0, semantic match against profile
+- `rt_score` — R_t, decay-adjusted relevancy
+- `entropy_level` — `low` / `stable` / `high` on the decay curve
+- `ha_pri_sig` — SHA-256 provenance signature (tamper-evident)
+- `semantic_fingerprint` — cross-adapter deduplication hash
+- `published_at` — extracted content publication date
+
+### The intelligence feed
+
+```
+GET /v1/intel/feed/:profile_id?limit=20&min_rt=0
+```
+
+Returns scored, deduplicated, provenance-stamped signals ranked by R_t — ready for direct consumption by any LLM or agent. No synthesis needed.
+
+### Methodology
+
+The full data collection, scoring, and provenance methodology is formally documented in [METHODOLOGY.md](./METHODOLOGY.md) — written as an audit trail for acquirers, integrators, and regulators. Version 1.1, April 2026.
+
+---
+
+## Live endpoints
+
+| Endpoint | Method | Purpose |
+|---|---|---|
+| `/` | GET | Service info + endpoint list |
+| `/health` | GET | Liveness check |
+| `/mcp` | POST | MCP JSON-RPC transport |
+| `/briefing` | GET | Latest stored briefing |
+| `/briefing/now` | POST | Force scrape + synthesize |
+| `/v1/intel/feed/:profile_id` | GET | DAR-scored intelligence feed |
+| `/watched-queries` | GET | List all watched queries |
+| `/debug/db` | GET | D1 counts + DAR engine coverage |
+| `/debug/scrape` | GET | Run a single adapter raw |
+
+Production: `https://freshcontext-mcp.gimmanuel73.workers.dev`
|