freshcontext-mcp 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,8 @@
{
  "actorSpecification": 1,
  "name": "freshcontext-mcp",
  "title": "FreshContext MCP",
  "version": "0.3.1",
  "input": "../input_schema.json",
  "output": "./output_schema.json"
}
@@ -0,0 +1,13 @@
{
  "actorOutputSchemaVersion": 1,
  "title": "FreshContext MCP Output",
  "description": "Timestamped web intelligence results wrapped in FreshContext envelopes.",
  "properties": {
    "results": {
      "type": "string",
      "title": "Results",
      "description": "FreshContext envelopes with content, source URL, retrieval timestamp, and freshness confidence.",
      "template": "{{links.apiDefaultDatasetUrl}}/items"
    }
  }
}
@@ -0,0 +1,178 @@
# The FreshContext Specification

**Version 1.0 — March 2026**
*Authored by Immanuel Gabriel (Prince Gabriel) — Grootfontein, Namibia*

---

## What This Is

The FreshContext Specification defines a standard envelope format for AI-retrieved web data.

It exists to solve one problem: **AI models present stale data with the same confidence as fresh data, and users have no way to tell the difference.**

FreshContext fixes this by wrapping every piece of retrieved content in a structured envelope that carries three guarantees:

1. **When** the data was retrieved (exact ISO 8601 timestamp)
2. **Where** it came from (canonical source URL)
3. **How confident** we are that the content date is accurate (freshness confidence)

Any tool, agent, or system that implements this spec is **FreshContext-compatible**.

---

## The Envelope Format

Every FreshContext-compatible response MUST wrap its content in the following envelope:

```
[FRESHCONTEXT]
Source: <canonical_url>
Published: <content_date_or_"unknown">
Retrieved: <iso8601_timestamp>
Confidence: <high|medium|low>
---
<content>
[/FRESHCONTEXT]
```

### Field Definitions

| Field | Required | Format | Description |
|---|---|---|---|
| `Source` | Yes | Valid URL | The canonical URL of the original source |
| `Published` | Yes | ISO 8601 date or `"unknown"` | Best estimate of when the content was originally published |
| `Retrieved` | Yes | ISO 8601 datetime with timezone | Exact timestamp when this data was fetched |
| `Confidence` | Yes | `high`, `medium`, or `low` | Confidence level of the `Published` date estimate |
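
The envelope is simple enough to emit and parse with a few lines of code. A minimal sketch (`buildEnvelope` and `parseEnvelope` are illustrative helper names, not spec-mandated APIs):

```javascript
// Build the text envelope from the four required fields plus content.
// Helper names are illustrative; only the envelope layout is spec-defined.
function buildEnvelope({ source, published, retrieved, confidence, content }) {
  return [
    "[FRESHCONTEXT]",
    `Source: ${source}`,
    `Published: ${published ?? "unknown"}`,
    `Retrieved: ${retrieved}`,
    `Confidence: ${confidence}`,
    "---",
    content,
    "[/FRESHCONTEXT]",
  ].join("\n");
}

// Parse an envelope back into its fields; returns null if malformed.
function parseEnvelope(text) {
  const m = text.match(
    /\[FRESHCONTEXT\]\nSource: (.+)\nPublished: (.+)\nRetrieved: (.+)\nConfidence: (high|medium|low)\n---\n([\s\S]*?)\n\[\/FRESHCONTEXT\]/
  );
  if (!m) return null;
  const [, source, published, retrieved, confidence, content] = m;
  return { source, published, retrieved, confidence, content };
}
```

A round trip through these two helpers preserves all four fields and the wrapped content.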

---

## Confidence Levels

### `high`
The publication date was sourced from a structured, machine-readable field — an API response, HTML metadata tag, RSS feed, or official timestamp. The date is reliable.

*Examples: GitHub API `pushed_at`, arXiv submission date, Hacker News `created_at`*

### `medium`
The publication date was inferred from page signals — visible date strings, URL patterns, or content heuristics. Likely correct but not guaranteed.

*Examples: Blog post date parsed from HTML, URL containing `/2025/03/`, footer copyright year*

### `low`
No reliable date signal was found. The date is an estimate based on indirect signals or is entirely unknown.

*Examples: Static page with no date, scraped content with no metadata, cached result of unknown age*
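
The three levels map directly onto *how* the date was obtained. A minimal sketch of that mapping (the date-source category names are assumptions for illustration, not spec-defined values):

```javascript
// Map how a publication date was obtained to a spec confidence level.
// The source-category strings here are illustrative, not part of the spec.
function confidenceFor(dateSource) {
  switch (dateSource) {
    case "api_field":     // e.g. GitHub `pushed_at`, Hacker News `created_at`
    case "html_metadata": // e.g. an article:published_time meta tag
    case "rss_feed":
      return "high";      // structured, machine-readable field
    case "visible_date":  // date string parsed from page text
    case "url_pattern":   // e.g. /2025/03/ in the URL path
      return "medium";    // inferred from page signals
    default:
      return "low";       // no reliable date signal found
  }
}
```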

---

## Structured Form (JSON)

Implementations MAY additionally expose freshness metadata as structured JSON alongside the text envelope:

```json
{
  "freshcontext": {
    "source_url": "https://github.com/owner/repo",
    "content_date": "2026-03-05",
    "retrieved_at": "2026-03-16T09:19:00.000Z",
    "freshness_confidence": "high",
    "adapter": "github",
    "freshness_score": 94
  },
  "content": "..."
}
```

### `freshness_score` (optional)

A numeric representation of data freshness from 0–100, calculated as:

```
freshness_score = max(0, 100 - (days_since_retrieved × decay_rate))
```

Where `decay_rate` defaults to `1.5` for general web content. Implementations MAY use domain-specific decay rates (e.g., financial data decays faster than academic papers).

| Score | Interpretation |
|---|---|
| 90–100 | Retrieved within hours — treat as current |
| 70–89 | Retrieved within days — reliable for most uses |
| 50–69 | Retrieved within weeks — verify before acting |
| Below 50 | Retrieved more than a month ago — use with caution |
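
The formula above translates directly into code. A sketch (function name and the rounding choice are assumptions, not spec-mandated):

```javascript
// freshness_score = max(0, 100 - days_since_retrieved * decay_rate)
// decay_rate defaults to 1.5, the spec's value for general web content.
function freshnessScore(retrievedAt, now = new Date(), decayRate = 1.5) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const days = (now.getTime() - new Date(retrievedAt).getTime()) / msPerDay;
  return Math.max(0, Math.round(100 - days * decayRate));
}
```

At the default rate a result loses about 1.5 points per day, so it crosses the "verify before acting" threshold of 50 after roughly 33 days.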

---

## Adapter Contract

Any data source that feeds into a FreshContext-compatible system is called an **adapter**. Adapters MUST:

1. Return raw content plus a `content_date` (or `null` if unknown)
2. Set a `freshness_confidence` level based on how the date was determined
3. Never fabricate or forward-date content timestamps
4. Clearly identify which source system produced the data via the `adapter` field

Adapters SHOULD:

- Prefer structured API sources over scraped content when both are available
- Log retrieval errors without silently returning cached or stale data
- Surface rate-limit or access-denied errors explicitly rather than returning empty content
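
The MUST rules above amount to a small result shape plus a validity check. A sketch, using the spec's structured-JSON field names (the `validateAdapterResult` helper itself is illustrative, not spec-mandated):

```javascript
// Check an adapter result against the four MUST rules above.
// Returns a list of violations; an empty list means the result conforms.
function validateAdapterResult(result, now = new Date()) {
  const errors = [];
  // MUST 1: raw content plus a content_date (null is allowed).
  if (typeof result.raw !== "string") errors.push("missing raw content");
  // MUST 2: a valid confidence level.
  if (!["high", "medium", "low"].includes(result.freshness_confidence))
    errors.push("freshness_confidence must be high|medium|low");
  // MUST 3: never forward-date content timestamps.
  if (result.content_date !== null && new Date(result.content_date) > now)
    errors.push("content_date is forward-dated");
  // MUST 4: identify the source system.
  if (!result.adapter) errors.push("adapter field must identify the source system");
  return errors;
}
```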

---

## Why This Matters for AI Agents

Large language models have no internal clock. When an agent retrieves web data, it cannot distinguish between something published this morning and something published three years ago — unless that information is explicitly surfaced.

Without FreshContext (or equivalent):
- An agent recommending job listings may recommend roles that no longer exist
- An agent summarising market trends may cite conditions from a previous cycle
- An agent checking a competitor's pricing may act on outdated information

With FreshContext:
- Every piece of retrieved data carries its own timestamp
- The agent can reason about data age before acting
- Users can see exactly how fresh their AI's information is

---

## Compatibility

A tool, server, or API is **FreshContext-compatible** if:

- Its responses include the `[FRESHCONTEXT]...[/FRESHCONTEXT]` envelope, OR
- Its responses include the structured JSON form with `freshcontext.retrieved_at` and `freshcontext.freshness_confidence` fields

Partial implementations that include only `retrieved_at` without `freshness_confidence` are considered **FreshContext-aware** but not fully compatible.
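
For the structured JSON form, the compatible / aware / neither distinction above is mechanical. A sketch (the classifier function is illustrative, not part of the spec):

```javascript
// Classify a structured JSON response against the compatibility rules above.
// Returns "compatible", "aware", or "none"; helper name is illustrative.
function classifyCompatibility(response) {
  const fc = response && response.freshcontext;
  if (!fc || !fc.retrieved_at) return "none";
  if (["high", "medium", "low"].includes(fc.freshness_confidence)) return "compatible";
  return "aware"; // retrieved_at present but freshness_confidence missing
}
```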

---

## Reference Implementation

The canonical reference implementation of this specification is:

**freshcontext-mcp** — an MCP server with 11 adapters covering GitHub, Hacker News, Google Scholar, arXiv, Reddit, YC Companies, Product Hunt, npm/PyPI, financial markets, job search, and a composite landscape tool.

- npm: `freshcontext-mcp`
- GitHub: https://github.com/PrinceGabriel-lgtm/freshcontext-mcp
- Cloud endpoint: `https://freshcontext-mcp.gimmanuel73.workers.dev/mcp`

---

## Versioning

This document is version 1.0 of the FreshContext Specification.

Future versions will be tagged in this repository. Breaking changes to the envelope format will increment the major version. Additive changes (new optional fields, new confidence levels) will increment the minor version.

---

## License

This specification is published under the MIT License.
Implementations may be proprietary or open source.
Attribution to the FreshContext Specification is appreciated but not required.

---

*"The work isn't gone. It's just waiting to be continued."*
*— Prince Gabriel, Grootfontein, Namibia*
package/ROADMAP.md ADDED
@@ -0,0 +1,174 @@
# FreshContext Roadmap

> *This document describes what FreshContext is becoming — not just what it is today.*
> *Built by Prince Gabriel — Grootfontein, Namibia 🇳🇦*

---

## Where We Are Today

FreshContext is a working, deployed, monetized web intelligence engine for AI agents.

**What's live and functional right now:**

- 11 MCP adapters — GitHub, Hacker News, Google Scholar, arXiv, Reddit, YC Companies, Product Hunt, npm/PyPI trends, finance, job search, and `extract_landscape` (all 6 sources in one call)
- Cloudflare Worker deployed globally at the edge with KV caching and rate limiting
- D1 database with 18 active watched queries running on a 6-hour cron schedule
- `GET /briefing` and `POST /briefing/now` endpoints for scheduled AI synthesis (synthesis paused pending Anthropic credits — infrastructure fully built)
- Listed on npm (`freshcontext-mcp@0.3.1`) and the official MCP Registry
- Published FreshContext Specification v1.0 — the standard this project is authoring
- Apify Store listing pending approval (account under manual review)

---

## Layer 5 — Dashboard (Next Build)

**Status: Designed, not yet built**

A React frontend that makes the intelligence pipeline visible and beautiful.

The dashboard pulls from live endpoints already built:

- `GET /briefing` → renders the latest AI-generated briefing with per-adapter sections
- `POST /briefing/now` → force-triggers a fresh synthesis on demand
- `GET /watched-queries` → manage what topics are being monitored
- User profile editor → update skills, targets, and context that shape briefing personalization
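
The endpoint list above is enough to sketch the dashboard's data layer. A sketch only: the base URL is the deployed Worker, but the response shapes are assumptions, not a documented API contract:

```javascript
// Illustrative client for the briefing endpoints listed above.
// The JSON response shape is an assumption for this sketch.
const BASE = "https://freshcontext-mcp.gimmanuel73.workers.dev";

// Pure helper so the routing is testable without network access.
function briefingUrl(base, action) {
  return action === "refresh" ? `${base}/briefing/now` : `${base}/briefing`;
}

// GET /briefing → latest AI-generated briefing
async function getLatestBriefing() {
  const res = await fetch(briefingUrl(BASE, "latest"));
  if (!res.ok) throw new Error(`briefing fetch failed: ${res.status}`);
  return res.json();
}

// POST /briefing/now → force-trigger a fresh synthesis on demand
async function forceRefresh() {
  const res = await fetch(briefingUrl(BASE, "refresh"), { method: "POST" });
  if (!res.ok) throw new Error(`refresh failed: ${res.status}`);
  return res.json();
}
```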

**Design targets:**
- Freshness confidence indicators on every source card (high/medium/low with color coding)
- Briefing history timeline showing how signal has evolved over time
- Watched query manager — add, pause, delete, and score queries by signal quality
- "Force refresh" button with live streaming output

**Deployment:** Cloudflare Pages — stays entirely within the Cloudflare free tier ecosystem.

---

## Layer 6 — Personalization Engine

**Status: Schema designed in D1, logic not yet built**

The `user_profiles` table already exists in D1 with fields for skills, certifications, targets, location, and context. The synthesis prompt already uses this data. What's missing is the user-facing surface:

- Onboarding flow — build your profile in the app in under 3 minutes
- Multiple profiles — team mode where each member gets their own briefing
- Custom briefing schedules — not just every 6h, but user-defined intervals
- Notification delivery — push briefings to Slack, email, or SMS via webhook

---

## Layer 7 — Watched Query Intelligence

**Status: Data accumulating, intelligence layer not yet built**

Every query run leaves a result in `scrape_results`. Over time this becomes a dataset with genuine historical value. The intelligence layer turns it into signal:

- **Relevance scoring** — each result is scored against the user profile (0–100) before inclusion in briefings
- **Deduplication** — same story appearing on HN and Reddit counts as one signal, not two
- **Query performance scoring** — which watched queries are generating signal vs. noise? Surface the top performers.
- **Smart suggestions** — "Based on your profile, you should also watch: mcp server rust, cloudflare workers ai"
- **Trend detection** — alert when a topic spikes across multiple adapters simultaneously
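
Cross-adapter deduplication as described above can be sketched with simple URL normalization: the same article linked from HN and Reddit shares its canonical host and path. The normalization rules here are illustrative assumptions, not the shipped logic:

```javascript
// Count the same story once across adapters by normalizing its source URL,
// falling back to a normalized title. Rules are illustrative, not shipped code.
function storyKey(result) {
  try {
    const u = new URL(result.source_url);
    return `${u.hostname.replace(/^www\./, "")}${u.pathname.replace(/\/$/, "")}`;
  } catch {
    return (result.title ?? "").toLowerCase().replace(/\s+/g, " ").trim();
  }
}

// Keep the first occurrence of each story as the signal.
function dedupe(results) {
  const seen = new Map();
  for (const r of results) {
    const key = storyKey(r);
    if (!seen.has(key)) seen.set(key, r);
  }
  return [...seen.values()];
}
```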

---

## Layer 8 — New Adapters

**Status: Planned, prioritised by acquisition value**

These adapters extend FreshContext into new intelligence categories with zero API key requirements:

| Adapter | Source | What it adds |
|---|---|---|
| `extract_devto` | dev.to public API | Developer article sentiment with clean publish dates |
| `extract_changelog` | Any `/changelog` or `/releases` URL | Track any product's update cadence |
| `extract_crunchbase_free` | Crunchbase public feed | Funding announcements with date signals |
| `extract_govcontracts` | USASpending.gov API | Government contract awards — unique GTM signal |
| `extract_npm_releases` | npm registry API | Package release velocity and adoption signals |
| `extract_twitter_trends` | Nitter public endpoints | Real-time trending topics with no auth |
| `extract_linkedin_jobs` | LinkedIn public job search | Job freshness — the origin story, completed |

The `extract_changelog` and `extract_govcontracts` adapters are not available in any other MCP server. They represent a genuine capability gap in the market.

---

## Layer 9 — The Freshness Score Standard

**Status: Spec written (FRESHCONTEXT_SPEC.md), numeric score not yet implemented**

The FreshContext Specification v1.0 defines an optional `freshness_score` field (0–100) calculated as:

```
freshness_score = max(0, 100 - (days_since_retrieved × decay_rate))
```

Domain-specific decay rates will allow different categories of data to age at appropriate speeds:

| Category | Decay Rate | Half-life |
|---|---|---|
| Financial data | 5.0 | ~10 days |
| Job listings | 3.0 | ~17 days |
| News / HN | 2.0 | ~25 days |
| GitHub repos | 1.0 | ~50 days |
| Academic papers | 0.3 | ~167 days |

Once implemented, agents can filter results by `freshness_score > threshold` instead of relying on string confidence levels. This makes FreshContext usable as a query parameter, not just a label.
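
Under that linear formula, the half-life column is just the number of days for a fresh result to fall from 100 to 50, i.e. `50 / decay_rate`. A quick check that the table is internally consistent:

```javascript
// Half-life = days for freshness_score to drop from 100 to 50 = 50 / decay_rate.
function halfLifeDays(decayRate) {
  return 50 / decayRate;
}

// The roadmap's decay rates per category.
const decayRates = {
  financial: 5.0,
  jobs: 3.0,
  news: 2.0,
  github: 1.0,
  academic: 0.3,
};
```

Each table entry matches: financial data at rate 5.0 halves in 10 days, news at 2.0 in 25 days, academic papers at 0.3 in about 167 days.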

---

## Layer 10 — API + Monetization Infrastructure

**Status: Pricing designed, billing not yet built**

The monetization architecture planned for FreshContext:

**Free tier**
- 1 user profile
- 5 watched queries
- Daily briefings
- All 11 adapters via MCP

**Pro ($19/month)**
- Unlimited watched queries
- 6-hour briefings
- All adapters including new ones
- Freshness score on every result
- API access (100k calls/month)

**Team ($79/month)**
- Multiple user profiles
- Shared briefing feed
- Slack / email delivery
- 500k API calls/month
- Priority support

**Enterprise (custom)**
- Dedicated Cloudflare Worker deployment
- Custom adapter development
- SLA-backed uptime
- White-label briefing output

**Billing implementation:** Lemon Squeezy (Namibia-compatible, merchant-of-record, no Stripe required)

---

## The Bigger Picture

FreshContext started as a fix for a personal problem — AI giving stale job listings with no warning. It's becoming something more structural: a **data freshness layer for the AI agent ecosystem.**

Every agent needs to know how old its data is. Right now, none of them do — not reliably, not with a standard format, not with a confidence signal. FreshContext is the first project to address this as a named, specified, open standard with a working reference implementation.

The opportunity is to become the layer that other AI tools plug into when they need grounded, timestamped intelligence — not a scraper, not a search engine, but the envelope that makes retrieved data trustworthy.

**The unfair advantage:** One developer, Cloudflare's global edge, a working spec, and a dataset that grows every six hours whether or not anyone is watching. The longer FreshContext runs, the more historical signal accumulates, and the harder it becomes to replicate from scratch.

---

## Contribution

The FreshContext Specification is open. New adapters are the highest-value contribution — see `src/adapters/` for the pattern and `FRESHCONTEXT_SPEC.md` for the contract any adapter must fulfill.

If you're building something FreshContext-compatible, open an issue and we'll add you to the ecosystem list.

---

*"The work isn't gone. It's just waiting to be continued."*
@@ -0,0 +1,41 @@
{
  "actorSpecification": 1,
  "fields": [
    {
      "fieldId": "adapter",
      "fieldType": "String",
      "title": "Adapter",
      "description": "The source adapter used to retrieve the data (e.g. github, hackernews, reddit, yc, scholar)."
    },
    {
      "fieldId": "source_url",
      "fieldType": "String",
      "title": "Source URL",
      "description": "The URL of the original source the data was retrieved from."
    },
    {
      "fieldId": "content",
      "fieldType": "String",
      "title": "Content",
      "description": "The retrieved content from the source, truncated to max_length characters."
    },
    {
      "fieldId": "retrieved_at",
      "fieldType": "String",
      "title": "Retrieved at",
      "description": "ISO 8601 timestamp of when FreshContext fetched this data. Always reflects the actual retrieval time."
    },
    {
      "fieldId": "content_date",
      "fieldType": "String",
      "title": "Content date",
      "description": "Best estimate of when the original content was published. Null if unknown."
    },
    {
      "fieldId": "freshness_confidence",
      "fieldType": "String",
      "title": "Freshness confidence",
      "description": "Confidence level of the content_date estimate. One of: high (from structured API/metadata), medium (inferred from page signals), low (estimated or unknown)."
    }
  ]
}
@@ -0,0 +1,207 @@
/**
 * Changelog adapter — extracts update history from any product or repo.
 *
 * Accepts:
 * - Any URL: https://example.com → auto-discovers /changelog, /releases, /CHANGELOG.md
 * - GitHub repo URL: https://github.com/owner/repo → uses Releases API
 * - Direct changelog URL: https://example.com/changelog
 * - npm package name: e.g. "freshcontext-mcp" → fetches from npm registry
 *
 * What it returns:
 * - Most recent changelog entries with dates
 * - Version numbers when available
 * - Content of each entry (truncated)
 * - freshness_confidence based on how the date was sourced
 *
 * Why this matters for AI agents:
 * Agents checking "is this tool still maintained?" or "did they ship X feature?"
 * need to know WHEN changes happened — not just that they happened.
 * This adapter makes update cadence a first-class signal.
 */
const CHANGELOG_PATHS = [
    "/changelog",
    "/CHANGELOG",
    "/CHANGELOG.md",
    "/CHANGELOG.txt",
    "/releases",
    "/blog/changelog",
    "/blog/releases",
    "/updates",
    "/whats-new",
    "/what-s-new",
    "/release-notes",
];
function sanitize(s) {
    return s.replace(/[^\x20-\x7E\n]/g, "").trim();
}
// ─── GitHub Releases API ──────────────────────────────────────────────────────
async function fetchGitHubReleases(owner, repo, maxLength) {
    const res = await fetch(`https://api.github.com/repos/${owner}/${repo}/releases?per_page=10`, { headers: { "Accept": "application/vnd.github.v3+json", "User-Agent": "freshcontext-mcp" } });
    if (!res.ok)
        throw new Error(`GitHub releases API error: ${res.status}`);
    const releases = await res.json();
    if (!releases.length)
        throw new Error("No releases found");
    const stable = releases.filter((r) => !r.prerelease && !r.draft);
    const items = stable.length ? stable : releases;
    const raw = items
        .slice(0, 8)
        .map((r, i) => {
            const body = sanitize(r.body ?? "").slice(0, 500);
            return [
                `[${i + 1}] ${r.tag_name}${r.name && r.name !== r.tag_name ? ` — ${r.name}` : ""}`,
                `Released: ${r.published_at?.slice(0, 10) ?? "unknown"}`,
                body ? `\n${body}` : "(no release notes)",
            ].join("\n");
        })
        .join("\n\n")
        .slice(0, maxLength);
    const newest = items[0]?.published_at ?? null;
    return { raw, content_date: newest, freshness_confidence: "high" };
}
// ─── npm Registry ─────────────────────────────────────────────────────────────
async function fetchNpmChangelog(packageName, maxLength) {
    const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(packageName)}`);
    if (!res.ok)
        throw new Error(`npm registry error: ${res.status}`);
    const data = await res.json();
    const times = data.time ?? {};
    const versions = Object.keys(times)
        .filter((k) => k !== "created" && k !== "modified" && /^\d/.test(k))
        .sort((a, b) => new Date(times[b]).getTime() - new Date(times[a]).getTime())
        .slice(0, 10);
    const latest = data["dist-tags"]?.latest ?? versions[0];
    const raw = [
        `Package: ${data.name}`,
        `Description: ${data.description ?? "N/A"}`,
        `Latest: ${latest} (${times[latest]?.slice(0, 10) ?? "unknown"})`,
        ``,
        `Recent versions:`,
        ...versions.map((v) => ` ${v} — ${times[v]?.slice(0, 10) ?? "unknown"}`),
    ].join("\n").slice(0, maxLength);
    const newest = versions[0] ? times[versions[0]] : null;
    return { raw, content_date: newest ?? null, freshness_confidence: newest ? "high" : "medium" };
}
// ─── Browser-based changelog discovery ───────────────────────────────────────
async function discoverChangelog(baseUrl, maxLength) {
    const { chromium } = await import("playwright");
    // Strip trailing slash and path — we want the root for discovery
    const urlObj = new URL(baseUrl);
    // If the URL already looks like a changelog page, go directly
    const isDirectChangelog = CHANGELOG_PATHS.some((p) => urlObj.pathname.toLowerCase().includes(p.replace("/", "")));
    const targetUrls = isDirectChangelog
        ? [baseUrl]
        : [baseUrl, ...CHANGELOG_PATHS.map((p) => `${urlObj.origin}${p}`)];
    const browser = await chromium.launch({ headless: true });
    for (const url of targetUrls) {
        const page = await browser.newPage();
        try {
            const res = await page.goto(url, { waitUntil: "domcontentloaded", timeout: 15000 });
            if (!res || !res.ok()) {
                await page.close();
                continue;
            }
            // Check if we landed on a real page with content
            const content = await page.evaluate(`(function() {
                // Try to find changelog-like content
                var selectors = [
                    'article', 'main', '.changelog', '.releases', '.release-notes',
                    '[class*="changelog"]', '[class*="release"]', '[id*="changelog"]',
                    '[id*="release"]', '.prose', '.content', '.markdown-body'
                ];

                var el = null;
                for (var i = 0; i < selectors.length; i++) {
                    el = document.querySelector(selectors[i]);
                    if (el && el.innerText && el.innerText.length > 100) break;
                }

                if (!el) el = document.body;

                var text = el ? el.innerText : '';

                // Extract dates — look for version/date patterns
                var datePattern = /\\b(20\\d{2}[-/.](0[1-9]|1[0-2])[-/.](0[1-9]|[12]\\d|3[01]))\\b/g;
                var versionPattern = /v?\\d+\\.\\d+(\\.\\d+)?(-\\w+)?/g;

                var dates = (text.match(datePattern) || []).slice(0, 5);
                var versions = (text.match(versionPattern) || []).slice(0, 5);

                // Truncate to first 3000 chars of meaningful content
                var truncated = text
                    .split('\\n')
                    .filter(function(l) { return l.trim().length > 0; })
                    .slice(0, 60)
                    .join('\\n');

                return {
                    text: truncated,
                    dates: dates,
                    versions: versions,
                    title: document.title,
                    url: window.location.href,
                    hasContent: text.length > 200
                };
            })`);
            const result = content;
            if (!result.hasContent) {
                await page.close();
                continue;
            }
            // Check if this actually looks like a changelog
            const looksLikeChangelog = result.url.toLowerCase().includes("changelog") ||
                result.url.toLowerCase().includes("release") ||
                result.url.toLowerCase().includes("update") ||
                result.title.toLowerCase().includes("changelog") ||
                result.title.toLowerCase().includes("release") ||
                result.dates.length > 0 ||
                result.versions.length > 1;
            if (!looksLikeChangelog && url !== baseUrl) {
                await page.close();
                continue;
            }
            await browser.close();
            const raw = [
                `Source: ${result.url}`,
                `Title: ${result.title}`,
                result.versions.length ? `Versions found: ${result.versions.join(", ")}` : null,
                result.dates.length ? `Dates found: ${result.dates.join(", ")}` : null,
                ``,
                sanitize(result.text),
            ].filter(Boolean).join("\n").slice(0, maxLength);
            // Best date is the first/most recent date found
            const newestDate = result.dates.length > 0
                ? result.dates.sort().reverse()[0]
                : null;
            const confidence = result.dates.length > 0 ? "medium" : "low";
            return { raw, content_date: newestDate, freshness_confidence: confidence };
        }
        catch {
            await page.close();
            continue;
        }
    }
    await browser.close();
    throw new Error(`No changelog found at ${baseUrl} or common changelog paths`);
}
// ─── Main export ──────────────────────────────────────────────────────────────
export async function changelogAdapter(options) {
    const input = (options.url ?? "").trim();
    const maxLength = options.maxLength ?? 6000;
    // npm package name (no http, no dots at start, no slashes)
    if (!input.startsWith("http") && !input.includes("/") && input.length > 0) {
        return fetchNpmChangelog(input, maxLength);
    }
    // GitHub repo URL → use releases API
    const ghMatch = input.match(/github\.com\/([^/]+)\/([^/?\s]+)/);
    if (ghMatch) {
        try {
            return await fetchGitHubReleases(ghMatch[1], ghMatch[2], maxLength);
        }
        catch {
            // Fall through to browser scrape if API fails
        }
    }
    // Any other URL → discover changelog
    return discoverChangelog(input, maxLength);
}
package/dist/server.js CHANGED
@@ -9,6 +9,7 @@ import { ycAdapter } from "./adapters/yc.js";
  import { repoSearchAdapter } from "./adapters/repoSearch.js";
  import { packageTrendsAdapter } from "./adapters/packageTrends.js";
  import { jobsAdapter } from "./adapters/jobs.js";
+ import { changelogAdapter } from "./adapters/changelog.js";
  import { stampFreshness, formatForLLM } from "./tools/freshnessStamp.js";
  import { formatSecurityError } from "./security.js";
  const server = new McpServer({
@@ -182,6 +183,24 @@ server.registerTool("search_jobs", {
          return { content: [{ type: "text", text: formatSecurityError(err) }] };
      }
  });
+ // ─── Tool: extract_changelog ────────────────────────────────────────────────
+ server.registerTool("extract_changelog", {
+     description: "Extract update history from any product, repo, or package. Accepts a GitHub URL (uses Releases API), an npm package name, or any website URL (auto-discovers /changelog, /releases, /CHANGELOG.md). Returns version numbers, release dates, and entry content — all timestamped. Use this to check if a tool is actively maintained, when a feature shipped, or how fast a team moves.",
+     inputSchema: z.object({
+         url: z.string().describe("GitHub repo URL (https://github.com/owner/repo), npm package name (e.g. 'freshcontext-mcp'), or any website URL (https://example.com). Auto-discovers changelog paths."),
+         max_length: z.number().optional().default(6000).describe("Max content length"),
+     }),
+     annotations: { readOnlyHint: true, openWorldHint: true },
+ }, async ({ url, max_length }) => {
+     try {
+         const result = await changelogAdapter({ url, maxLength: max_length });
+         const ctx = stampFreshness(result, { url, maxLength: max_length }, "changelog");
+         return { content: [{ type: "text", text: formatForLLM(ctx) }] };
+     }
+     catch (err) {
+         return { content: [{ type: "text", text: formatSecurityError(err) }] };
+     }
+ });
  // ─── Start ───────────────────────────────────────────────────────────────────
  async function main() {
      const transport = new StdioServerTransport();
@@ -0,0 +1,49 @@
{
  "title": "FreshContext MCP Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "tool": {
      "title": "Tool",
      "type": "string",
      "description": "The FreshContext tool to run.",
      "enum": [
        "extract_github",
        "extract_hackernews",
        "extract_scholar",
        "extract_arxiv",
        "extract_reddit",
        "extract_yc",
        "extract_producthunt",
        "search_repos",
        "package_trends",
        "extract_finance",
        "extract_landscape"
      ],
      "default": "extract_landscape",
      "editor": "select"
    },
    "url": {
      "title": "URL",
      "type": "string",
      "description": "URL to extract from. Required for: extract_github, extract_hackernews, extract_scholar, extract_reddit. E.g. https://github.com/owner/repo",
      "editor": "textfield"
    },
    "query": {
      "title": "Query",
      "type": "string",
      "description": "Search query. Required for: extract_landscape, search_repos, extract_yc, extract_producthunt, package_trends, extract_finance.",
      "editor": "textfield"
    },
    "max_length": {
      "title": "Max content length",
      "type": "integer",
      "description": "Maximum characters returned per result. Default: 6000.",
      "default": 6000,
      "minimum": 500,
      "maximum": 20000,
      "editor": "number"
    }
  },
  "required": ["tool"]
}
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
      "name": "freshcontext-mcp",
      "mcpName": "io.github.PrinceGabriel-lgtm/freshcontext",
-     "version": "0.3.1",
+     "version": "0.3.2",
      "description": "Real-time web extraction MCP server with freshness timestamps for AI agents",
      "keywords": [
          "mcp",