freshcontext-mcp 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,60 +1,92 @@
1
1
  # freshcontext-mcp
2
2
 
3
- > Real-time web extraction MCP server with guaranteed freshness timestamps for AI agents.
3
+ > Timestamped web intelligence for AI agents. Every result is wrapped in a **FreshContext envelope**, so your agent always knows *when* it's looking at data, not just *what*.
4
+
5
+ [![npm version](https://img.shields.io/npm/v/freshcontext-mcp)](https://www.npmjs.com/package/freshcontext-mcp)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+
8
+ ---
4
9
 
5
10
  ## The Problem
6
11
 
7
- LLMs hallucinate recency. They'll cite a 2022 job posting as "current" or recall outdated API docs as if they're live. This happens because they have no reliable signal for *when* data was retrieved vs. when it was published.
12
+ LLMs hallucinate recency. They'll cite a 2022 job posting as "current", recall outdated API docs as if they're live, or tell you a project is active when it hasn't been touched in two years. This happens because they have no reliable signal for *when* data was retrieved vs. when it was published.
8
13
 
9
- ## The Fix
14
+ Existing MCP servers return raw content. No timestamp. No confidence signal. No way for the agent to know if it's looking at something from this morning or three years ago.
10
15
 
11
- Every piece of data extracted by `freshcontext-mcp` is wrapped in a `FreshContext` envelope:
16
+ ## The Fix: FreshContext Envelope
12
17
 
13
- ```json
14
- {
15
- "content": "...",
16
- "source_url": "https://github.com/owner/repo",
17
- "content_date": "2024-11-03",
18
- "retrieved_at": "2026-03-02T10:14:00Z",
19
- "freshness_confidence": "high",
20
- "adapter": "github"
21
- }
18
+ Every piece of data extracted by `freshcontext-mcp` is wrapped in a structured envelope:
19
+
20
+ ```
21
+ [FRESHCONTEXT]
22
+ Source: https://github.com/owner/repo
23
+ Published: 2024-11-03
24
+ Retrieved: 2026-03-03T10:14:00Z
25
+ Confidence: high
26
+ ---
27
+ ... content ...
28
+ [/FRESHCONTEXT]
22
29
  ```
23
30
 
24
- The AI agent always knows *when it's looking at*, not just *what*.
31
+ The AI agent always knows **when it's looking at data**, not just what the data says. This is the difference between a hallucinated recency claim and a verifiable one.
32
+
33
+ ---
34
+
35
+ ## Tools
36
+
37
+ ### 🔬 Intelligence Tools
25
38
 
26
- ## Adapters
39
+ | Tool | Description |
40
+ |---|---|
41
+ | `extract_github` | README, stars, forks, language, topics, last commit from any GitHub repo |
42
+ | `extract_hackernews` | Top stories or search results from HN with scores and timestamps |
43
+ | `extract_scholar` | Research paper titles, authors, years, and snippets from Google Scholar |
27
44
 
28
- | Adapter | Tool Name | What it extracts |
29
- |---|---|---|
30
- | GitHub | `extract_github` | README, stars, forks, last commit, topics |
31
- | Google Scholar | `extract_scholar` | Titles, authors, years, snippets |
32
- | Hacker News | `extract_hackernews` | Top stories, scores, post timestamps |
45
+ ### 🚀 Competitive Intelligence Tools
33
46
 
34
- ## Setup
47
+ | Tool | Description |
48
+ |---|---|
49
+ | `extract_yc` | Scrape YC company listings by keyword: find who's funded in your space |
50
+ | `search_repos` | Search GitHub for similar/competing repos, ranked by stars with activity signals |
51
+ | `package_trends` | npm and PyPI package metadata: version history, release cadence, last updated |
52
+
53
+ ### 🗺️ Composite Tool
54
+
55
+ | Tool | Description |
56
+ |---|---|
57
+ | `extract_landscape` | **One call. Full picture.** Queries YC startups + GitHub repos + HN sentiment + package ecosystem simultaneously. Returns a unified landscape report. |
58
+
59
+ ---
60
+
61
+ ## Quick Start
62
+
63
+ ### Install via npm
35
64
 
36
65
  ```bash
37
- git clone https://github.com/YOUR_USERNAME/freshcontext-mcp
38
- cd freshcontext-mcp
39
- npm install
40
- npx playwright install chromium
41
- npm run build
66
+ npx freshcontext-mcp
42
67
  ```
43
68
 
44
- ## Test locally
69
+ ### Or clone and run locally
45
70
 
46
71
  ```bash
47
- npm run inspect
72
+ git clone https://github.com/PrinceGabriel-lgtm/freshcontext-mcp
73
+ cd freshcontext-mcp
74
+ npm install
75
+ npx playwright install chromium
76
+ npm run build
48
77
  ```
49
78
 
50
- ## Connect to Claude
79
+ ### Connect to Claude Desktop
51
80
 
52
81
  Add to your `claude_desktop_config.json`:
53
82
 
83
+ **Mac:** `~/Library/Application Support/Claude/claude_desktop_config.json`
84
+ **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
85
+
54
86
  ```json
55
87
  {
56
88
  "mcpServers": {
57
- "freshcontext": {
89
+ "freshcontext-local": {
58
90
  "command": "node",
59
91
  "args": ["/absolute/path/to/freshcontext-mcp/dist/server.js"]
60
92
  }
@@ -62,10 +94,131 @@ Add to your `claude_desktop_config.json`:
62
94
  }
63
95
  ```
64
96
 
97
+ Restart Claude Desktop. You'll see the freshcontext tools available in your session.
98
+
99
+ ### Or use the Cloudflare edge deployment (no install needed)
100
+
101
+ ```json
102
+ {
103
+ "mcpServers": {
104
+ "freshcontext-cloud": {
105
+ "command": "npx",
106
+ "args": ["-y", "mcp-remote", "https://freshcontext-worker.gimmanuel73.workers.dev/mcp"]
107
+ }
108
+ }
109
+ }
110
+ ```
111
+
112
+ ---
113
+
114
+ ## Usage Examples
115
+
116
+ ### Check if anyone is already building what you're building
117
+
118
+ ```
119
+ Use extract_landscape with topic "cashflow prediction mcp"
120
+ ```
121
+
122
+ Returns a unified report: who's funded (YC), what's trending (HN), what repos exist (GitHub), what packages are active (npm/PyPI). All timestamped.
123
+
124
+ ### Analyse a specific repo
125
+
126
+ ```
127
+ Use extract_github on https://github.com/anthropics/anthropic-sdk-python
128
+ ```
129
+
130
+ ### Find research papers on a topic
131
+
132
+ ```
133
+ Use extract_scholar on https://scholar.google.com/scholar?q=llm+context+freshness
134
+ ```
135
+
136
+ ### Check package ecosystem health
137
+
138
+ ```
139
+ Use package_trends with packages "npm:@modelcontextprotocol/sdk,pypi:langchain"
140
+ ```
141
+
142
+ ---
143
+
144
+ ## Why FreshContext?
145
+
146
+ Most AI agents retrieve data but don't timestamp it. This creates a silent failure mode: the agent presents stale information with the same confidence as fresh information. The user has no way to know the difference.
147
+
148
+ FreshContext treats **retrieval time as first-class metadata**. Every adapter returns:
149
+
150
+ - `retrieved_at`: exact ISO timestamp of when the data was fetched
151
+ - `content_date`: best estimate of when the content was originally published
152
+ - `freshness_confidence`: `high`, `medium`, or `low` based on signal quality
153
+ - `adapter`: which source the data came from
154
+
155
+ This makes freshness **verifiable**, not assumed.
156
+
157
+ ---
158
+
159
+ ## Deployment
160
+
161
+ ### Local (Playwright-based)
162
+ Uses headless Chromium via Playwright. Full browser rendering for JavaScript-heavy sites.
163
+
164
+ ### Cloud (Cloudflare Workers)
165
+ The `worker/` directory contains a Cloudflare Workers deployment using the Browser Rendering REST API. No Playwright dependency; it runs at the edge globally.
166
+
167
+ ```bash
168
+ cd worker
169
+ npm install
170
+ npx wrangler secret put CF_API_TOKEN
171
+ npx wrangler deploy
172
+ ```
173
+
174
+ ---
175
+
176
+ ## Project Structure
177
+
178
+ ```
179
+ freshcontext-mcp/
180
+ ├── src/
181
+ │   ├── server.ts              # MCP server, all tool registrations
182
+ │   ├── types.ts               # FreshContext interfaces
183
+ │   ├── adapters/
184
+ │   │   ├── github.ts          # GitHub repo extraction
185
+ │   │   ├── hackernews.ts      # HN front page + Algolia API
186
+ │   │   ├── scholar.ts         # Google Scholar scraping
187
+ │   │   ├── yc.ts              # YC company directory
188
+ │   │   ├── repoSearch.ts      # GitHub Search API
189
+ │   │   └── packageTrends.ts   # npm + PyPI registries
190
+ │   └── tools/
191
+ │       └── freshnessStamp.ts  # FreshContext envelope builder
192
+ └── worker/                    # Cloudflare Workers deployment
193
+     └── src/worker.ts
194
+ ```
195
+
196
+ ---
197
+
65
198
  ## Roadmap
66
199
 
67
- - [ ] Twitter/X public feed adapter
68
- - [ ] Dev.to / Hashnode adapter
69
- - [ ] Supabase changelog adapter
70
- - [ ] Cloudflare Worker deployment
71
- - [ ] Caching layer with TTL
200
+ - [x] GitHub adapter
201
+ - [x] Hacker News adapter
202
+ - [x] Google Scholar adapter
203
+ - [x] YC startup scraper
204
+ - [x] GitHub repo search
205
+ - [x] npm/PyPI package trends
206
+ - [x] `extract_landscape` composite tool
207
+ - [x] Cloudflare Workers deployment
208
+ - [ ] Product Hunt launches adapter
209
+ - [ ] Crunchbase/funding signals adapter
210
+ - [ ] TTL-based caching layer
211
+ - [ ] `freshness_score` numeric metric
212
+ - [ ] Webhook support for real-time updates
213
+
214
+ ---
215
+
216
+ ## Contributing
217
+
218
+ PRs welcome. New adapters are the highest-value contribution; see the existing adapters in `src/adapters/` for the pattern. Each adapter returns `{ raw, content_date, freshness_confidence }`.
219
+
220
+ ---
221
+
222
+ ## License
223
+
224
+ MIT
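The text envelope that the new README describes can be sketched as a small formatter. This is an illustration only: the package's real builder lives in `src/tools/freshnessStamp.ts`, which this diff does not show, so the `Envelope` interface, the `formatEnvelope` function, and the `"unknown"` fallback for a missing date are all assumptions. The field names follow the README (`retrieved_at`, `content_date`, `freshness_confidence`) and the 0.1.1 JSON example (`source_url`).

```typescript
// Hypothetical shape, modelled on the fields named in the README.
type Confidence = "high" | "medium" | "low";

interface Envelope {
  content: string;
  source_url: string;
  content_date: string | null; // best estimate of publication date, if any
  retrieved_at: string;        // ISO 8601 timestamp of the fetch
  freshness_confidence: Confidence;
}

// Render the envelope in the text layout shown in the README.
// The "unknown" placeholder is an assumption, not documented behaviour.
function formatEnvelope(env: Envelope): string {
  return [
    "[FRESHCONTEXT]",
    `Source: ${env.source_url}`,
    `Published: ${env.content_date ?? "unknown"}`,
    `Retrieved: ${env.retrieved_at}`,
    `Confidence: ${env.freshness_confidence}`,
    "---",
    env.content,
    "[/FRESHCONTEXT]",
  ].join("\n");
}

const example = formatEnvelope({
  content: "... content ...",
  source_url: "https://github.com/owner/repo",
  content_date: "2024-11-03",
  retrieved_at: new Date("2026-03-03T10:14:00Z").toISOString(),
  freshness_confidence: "high",
});
console.log(example);
```

Keeping `Published` and `Retrieved` on separate lines is what lets a reader, or a model, compare the two dates directly instead of inferring recency from the content.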
@@ -1,5 +1,8 @@
1
1
  import { chromium } from "playwright";
2
+ import { validateUrl } from "../security.js";
2
3
  export async function githubAdapter(options) {
4
+ const safeUrl = validateUrl(options.url, "github");
5
+ options = { ...options, url: safeUrl };
3
6
  const browser = await chromium.launch({ headless: true });
4
7
  const page = await browser.newPage();
5
8
  // Spoof a real browser UA to avoid bot detection
@@ -1,6 +1,8 @@
1
1
  import { chromium } from "playwright";
2
+ import { validateUrl } from "../security.js";
2
3
  export async function hackerNewsAdapter(options) {
3
- // If it's an Algolia API URL or search query, use the REST API directly (no browser)
4
+ // Validate URL: allow both HN and Algolia domains
5
+ validateUrl(options.url, "hackernews");
4
6
  const url = options.url;
5
7
  if (url.includes("hn.algolia.com/api/") || url.startsWith("hn-search:")) {
6
8
  const query = url.startsWith("hn-search:")
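A point worth verifying in the hunk above: `validateUrl` now runs before the `hn-search:` branch. To the WHATWG URL parser, `hn-search:query` is a syntactically valid URL whose protocol is `hn-search:`, so a check that permits only `http:` and `https:` would reject it before the Algolia branch is reached. The sketch below reproduces only the protocol check from `security.js`; `checkProtocol` and `tryCheck` are hypothetical helper names, not part of the package.

```typescript
// Minimal reproduction of the protocol check performed by validateUrl.
// checkProtocol is a hypothetical name used only for this illustration.
function checkProtocol(rawUrl: string): string {
  const parsed = new URL(rawUrl.trim());
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Protocol not allowed: ${parsed.protocol}`);
  }
  return parsed.toString();
}

// Return either the normalized URL or the error message, for easy printing.
function tryCheck(rawUrl: string): string {
  try {
    return checkProtocol(rawUrl);
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

console.log(tryCheck("https://news.ycombinator.com/")); // https://news.ycombinator.com/
console.log(tryCheck("hn-search:mcp"));                 // Protocol not allowed: hn-search:
```

If that reading is right, one option is to test for the `hn-search:` prefix first and pass only real URLs to `validateUrl`. Whether the published adapter handles this elsewhere is not visible in this diff.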
@@ -1,8 +1,8 @@
1
+ import { sanitizePackages } from "../security.js";
1
2
  // Uses npm registry API + PyPI JSON API (no auth needed)
2
3
  export async function packageTrendsAdapter(options) {
3
- // options.url is the package name or a comma-separated list
4
- // e.g. "langchain" or "npm:langchain" or "pypi:langchain"
5
- const raw_input = options.url.replace(/^https?:\/\//, "").trim();
4
+ // Sanitize package input
5
+ const raw_input = sanitizePackages(options.url.replace(/^https?:\/\//, "").trim());
6
6
  // Parse ecosystem prefix
7
7
  const parts = raw_input.split(",").map((s) => s.trim());
8
8
  const results = [];
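For orientation, `sanitizePackages` (defined in `security.js`, shown later in this diff) drops every character outside a small allowed set rather than rejecting the input. The sketch below copies only that replace step; `stripDisallowed` is a hypothetical name, and the length checks of the real function are omitted.

```typescript
// Same character class as the replace() call in sanitizePackages.
// stripDisallowed is a hypothetical name used only for this illustration.
function stripDisallowed(input: string): string {
  return input.trim().replace(/[^a-zA-Z0-9@/._\-,:]/g, "");
}

// Spaces after the comma are removed, so the later split(",") sees clean names.
console.log(stripDisallowed("npm:@modelcontextprotocol/sdk, pypi:langchain"));
// npm:@modelcontextprotocol/sdk,pypi:langchain

// Shell-style characters are removed silently; the surrounding text is joined.
console.log(stripDisallowed("npm:left-pad, pypi:requests$(id)"));
// npm:left-pad,pypi:requestsid
```

Because disallowed characters are removed rather than reported, a malformed name becomes a different, well-formed name. Callers that prefer a hard failure could compare the input with the cleaned result and throw when they differ.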
@@ -1,8 +1,9 @@
1
+ import { sanitizeQuery } from "../security.js";
1
2
  // Uses GitHub Search API (no auth needed for basic search)
2
3
  export async function repoSearchAdapter(options) {
3
- // options.url is treated as the search query string
4
- // e.g. "mcp server typescript" or a full GitHub search URL
5
- let query = options.url;
4
+ // Sanitize query input
5
+ const query_input = sanitizeQuery(options.url);
6
+ let query = query_input;
6
7
  // If it's a full URL, extract the query param
7
8
  try {
8
9
  const parsed = new URL(options.url);
@@ -1,5 +1,8 @@
1
1
  import { chromium } from "playwright";
2
+ import { validateUrl } from "../security.js";
2
3
  export async function scholarAdapter(options) {
4
+ const safeUrl = validateUrl(options.url, "scholar");
5
+ options = { ...options, url: safeUrl };
3
6
  const browser = await chromium.launch({ headless: true });
4
7
  const page = await browser.newPage();
5
8
  await page.setExtraHTTPHeaders({
@@ -1,5 +1,8 @@
1
1
  import { chromium } from "playwright";
2
+ import { validateUrl } from "../security.js";
2
3
  export async function ycAdapter(options) {
4
+ const safeUrl = validateUrl(options.url, "yc");
5
+ options = { ...options, url: safeUrl };
3
6
  const browser = await chromium.launch({ headless: true });
4
7
  const page = await browser.newPage();
5
8
  // YC company directory is React-rendered; wait for network to settle
@@ -0,0 +1,117 @@
1
+ /**
2
+ * freshcontext-mcp security module
3
+ * Input sanitization, domain allowlists, and request validation
4
+ */
5
+ // ─── Allowed domains per adapter ────────────────────────────────────────────
6
+ export const ALLOWED_DOMAINS = {
7
+ github: ["github.com", "raw.githubusercontent.com"],
8
+ scholar: ["scholar.google.com"],
9
+ hackernews: ["news.ycombinator.com", "hn.algolia.com"],
10
+ yc: ["www.ycombinator.com", "ycombinator.com"],
11
+ repoSearch: [], // uses GitHub API directly, no browser
12
+ packageTrends: [], // uses npm/PyPI APIs directly, no browser
13
+ };
14
+ // ─── Blocked IP ranges and internal hostnames ────────────────────────────────
15
+ const BLOCKED_PATTERNS = [
16
+ /^localhost$/i,
17
+ /^127\.\d+\.\d+\.\d+$/,
18
+ /^10\.\d+\.\d+\.\d+$/,
19
+ /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
20
+ /^192\.168\.\d+\.\d+$/,
21
+ /^169\.254\.\d+\.\d+$/, // AWS metadata
22
+ /^0\.0\.0\.0$/,
23
+ /^::1$/,
24
+ /^fc00:/i,
25
+ /^fe80:/i,
26
+ ];
27
+ // ─── Max length limits ────────────────────────────────────────────────────────
28
+ export const MAX_URL_LENGTH = 500;
29
+ export const MAX_QUERY_LENGTH = 200;
30
+ export const MAX_PACKAGES_LENGTH = 300;
31
+ // ─── Validation errors ───────────────────────────────────────────────────────
32
+ export class SecurityError extends Error {
33
+ constructor(message) {
34
+ super(message);
35
+ this.name = "SecurityError";
36
+ }
37
+ }
38
+ // ─── URL validator ───────────────────────────────────────────────────────────
39
+ export function validateUrl(rawUrl, adapterName) {
40
+ // Length check
41
+ if (!rawUrl || rawUrl.trim().length === 0) {
42
+ throw new SecurityError("URL cannot be empty");
43
+ }
44
+ if (rawUrl.length > MAX_URL_LENGTH) {
45
+ throw new SecurityError(`URL exceeds maximum length of ${MAX_URL_LENGTH} characters`);
46
+ }
47
+ // Must be a valid URL
48
+ let parsed;
49
+ try {
50
+ parsed = new URL(rawUrl.trim());
51
+ }
52
+ catch {
53
+ throw new SecurityError(`Invalid URL format: ${rawUrl}`);
54
+ }
55
+ // Must use http or https
56
+ if (!["http:", "https:"].includes(parsed.protocol)) {
57
+ throw new SecurityError(`Protocol not allowed: ${parsed.protocol}. Only http/https permitted.`);
58
+ }
59
+ const hostname = parsed.hostname.toLowerCase();
60
+ // Block internal/private IPs and hostnames
61
+ for (const pattern of BLOCKED_PATTERNS) {
62
+ if (pattern.test(hostname)) {
63
+ throw new SecurityError(`Access to internal/private addresses is not permitted: ${hostname}`);
64
+ }
65
+ }
66
+ // Domain allowlist check (skip if allowlist is empty, meaning no browser is used)
67
+ const allowedDomains = ALLOWED_DOMAINS[adapterName];
68
+ if (allowedDomains && allowedDomains.length > 0) {
69
+ const isAllowed = allowedDomains.some((domain) => hostname === domain || hostname.endsWith(`.${domain}`));
70
+ if (!isAllowed) {
71
+ throw new SecurityError(`Domain not allowed for ${adapterName} adapter: ${hostname}. ` +
72
+ `Allowed domains: ${allowedDomains.join(", ")}`);
73
+ }
74
+ }
75
+ return parsed.toString();
76
+ }
77
+ // ─── Query string sanitizer ──────────────────────────────────────────────────
78
+ export function sanitizeQuery(query, maxLength = MAX_QUERY_LENGTH) {
79
+ if (!query || query.trim().length === 0) {
80
+ throw new SecurityError("Query cannot be empty");
81
+ }
82
+ const trimmed = query.trim().slice(0, maxLength);
83
+ // Strip null bytes and control characters
84
+ const cleaned = trimmed.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "");
85
+ if (cleaned.length === 0) {
86
+ throw new SecurityError("Query contains no valid characters after sanitization");
87
+ }
88
+ return cleaned;
89
+ }
90
+ // ─── Package name sanitizer ──────────────────────────────────────────────────
91
+ export function sanitizePackages(input) {
92
+ if (!input || input.trim().length === 0) {
93
+ throw new SecurityError("Package name cannot be empty");
94
+ }
95
+ if (input.length > MAX_PACKAGES_LENGTH) {
96
+ throw new SecurityError(`Package input exceeds maximum length of ${MAX_PACKAGES_LENGTH} characters`);
97
+ }
98
+ // Only allow valid npm/PyPI package name characters, commas, colons (for npm:/pypi: prefix)
99
+ const cleaned = input
100
+ .trim()
101
+ .replace(/[^a-zA-Z0-9@/._\-,:]/g, "")
102
+ .slice(0, MAX_PACKAGES_LENGTH);
103
+ if (cleaned.length === 0) {
104
+ throw new SecurityError("Package name contains no valid characters after sanitization");
105
+ }
106
+ return cleaned;
107
+ }
108
+ // ─── Error formatter ─────────────────────────────────────────────────────────
109
+ export function formatSecurityError(err) {
110
+ if (err instanceof SecurityError) {
111
+ return `[Security] ${err.message}`;
112
+ }
113
+ if (err instanceof Error) {
114
+ return `[Error] ${err.message}`;
115
+ }
116
+ return "[Error] Unknown error occurred";
117
+ }
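One detail of the blocklist above may be worth a second look. The WHATWG `URL` API returns IPv6 hosts with their square brackets, so `hostname` for `http://[::1]/` is `[::1]` rather than `::1`, and anchored patterns such as `/^::1$/` would not match it. The sketch below checks this with a subset of the patterns; `isBlockedHost` and its bracket-stripping step are assumptions made for illustration, not the package's code. In the published module the per-adapter allowlist still rejects such hosts for every adapter that calls `validateUrl`, so this concerns the blocklist's coverage rather than a demonstrated bypass.

```typescript
// Subset of BLOCKED_PATTERNS from security.js, copied for the demonstration.
const BLOCKED_SUBSET: RegExp[] = [
  /^localhost$/i,
  /^127\.\d+\.\d+\.\d+$/,
  /^::1$/,
  /^fe80:/i,
];

// Hypothetical helper: strip the brackets that URL.hostname keeps around
// IPv6 literals before testing the patterns.
function isBlockedHost(hostname: string): boolean {
  const bare = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  return BLOCKED_SUBSET.some((pattern) => pattern.test(bare));
}

const loopbackHost = new URL("http://[::1]/").hostname;
console.log(loopbackHost);                                     // [::1]
console.log(BLOCKED_SUBSET.some((p) => p.test(loopbackHost))); // false
console.log(isBlockedHost(loopbackHost));                      // true
```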
package/dist/server.js CHANGED
@@ -8,6 +8,7 @@ import { ycAdapter } from "./adapters/yc.js";
8
8
  import { repoSearchAdapter } from "./adapters/repoSearch.js";
9
9
  import { packageTrendsAdapter } from "./adapters/packageTrends.js";
10
10
  import { stampFreshness, formatForLLM } from "./tools/freshnessStamp.js";
11
+ import { formatSecurityError } from "./security.js";
11
12
  const server = new McpServer({
12
13
  name: "freshcontext-mcp",
13
14
  version: "0.1.0",
@@ -21,9 +22,14 @@ server.registerTool("extract_github", {
21
22
  }),
22
23
  annotations: { readOnlyHint: true, openWorldHint: true },
23
24
  }, async ({ url, max_length }) => {
24
- const result = await githubAdapter({ url, maxLength: max_length });
25
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "github");
26
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
25
+ try {
26
+ const result = await githubAdapter({ url, maxLength: max_length });
27
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "github");
28
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
29
+ }
30
+ catch (err) {
31
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
32
+ }
27
33
  });
28
34
  // ─── Tool: extract_scholar ───────────────────────────────────────────────────
29
35
  server.registerTool("extract_scholar", {
@@ -34,9 +40,14 @@ server.registerTool("extract_scholar", {
34
40
  }),
35
41
  annotations: { readOnlyHint: true, openWorldHint: true },
36
42
  }, async ({ url, max_length }) => {
37
- const result = await scholarAdapter({ url, maxLength: max_length });
38
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "google_scholar");
39
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
43
+ try {
44
+ const result = await scholarAdapter({ url, maxLength: max_length });
45
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "google_scholar");
46
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
47
+ }
48
+ catch (err) {
49
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
50
+ }
40
51
  });
41
52
  // ─── Tool: extract_hackernews ────────────────────────────────────────────────
42
53
  server.registerTool("extract_hackernews", {
@@ -47,9 +58,14 @@ server.registerTool("extract_hackernews", {
47
58
  }),
48
59
  annotations: { readOnlyHint: true, openWorldHint: true },
49
60
  }, async ({ url, max_length }) => {
50
- const result = await hackerNewsAdapter({ url, maxLength: max_length });
51
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "hackernews");
52
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
61
+ try {
62
+ const result = await hackerNewsAdapter({ url, maxLength: max_length });
63
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "hackernews");
64
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
65
+ }
66
+ catch (err) {
67
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
68
+ }
53
69
  });
54
70
  // ─── Tool: extract_yc ──────────────────────────────────────────────────────────
55
71
  server.registerTool("extract_yc", {
@@ -60,9 +76,14 @@ server.registerTool("extract_yc", {
60
76
  }),
61
77
  annotations: { readOnlyHint: true, openWorldHint: true },
62
78
  }, async ({ url, max_length }) => {
63
- const result = await ycAdapter({ url, maxLength: max_length });
64
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "ycombinator");
65
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
79
+ try {
80
+ const result = await ycAdapter({ url, maxLength: max_length });
81
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "ycombinator");
82
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
83
+ }
84
+ catch (err) {
85
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
86
+ }
66
87
  });
67
88
  // ─── Tool: search_repos ──────────────────────────────────────────────────────
68
89
  server.registerTool("search_repos", {
@@ -73,9 +94,14 @@ server.registerTool("search_repos", {
73
94
  }),
74
95
  annotations: { readOnlyHint: true, openWorldHint: true },
75
96
  }, async ({ query, max_length }) => {
76
- const result = await repoSearchAdapter({ url: query, maxLength: max_length });
77
- const ctx = stampFreshness(result, { url: query, maxLength: max_length }, "github_search");
78
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
97
+ try {
98
+ const result = await repoSearchAdapter({ url: query, maxLength: max_length });
99
+ const ctx = stampFreshness(result, { url: query, maxLength: max_length }, "github_search");
100
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
101
+ }
102
+ catch (err) {
103
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
104
+ }
79
105
  });
80
106
  // ─── Tool: package_trends ────────────────────────────────────────────────────
81
107
  server.registerTool("package_trends", {
@@ -86,9 +112,14 @@ server.registerTool("package_trends", {
86
112
  }),
87
113
  annotations: { readOnlyHint: true, openWorldHint: true },
88
114
  }, async ({ packages, max_length }) => {
89
- const result = await packageTrendsAdapter({ url: packages, maxLength: max_length });
90
- const ctx = stampFreshness(result, { url: packages, maxLength: max_length }, "package_registry");
91
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
115
+ try {
116
+ const result = await packageTrendsAdapter({ url: packages, maxLength: max_length });
117
+ const ctx = stampFreshness(result, { url: packages, maxLength: max_length }, "package_registry");
118
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
119
+ }
120
+ catch (err) {
121
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
122
+ }
92
123
  });
93
124
  // ─── Tool: extract_landscape ─────────────────────────────────────────────────
94
125
  server.registerTool("extract_landscape", {
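The six handlers above repeat the same try/catch. A sketch of how that could be factored out follows, together with the optional `isError` flag that MCP tool results allow for signalling a failed call. `ToolResult`, `safeTool`, and the simplified `formatError` are hypothetical names for this illustration; the published code uses `formatSecurityError` and returns the error text without `isError`.

```typescript
// Simplified stand-in for the MCP tool result shape (text content only).
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Simplified stand-in for formatSecurityError from security.js.
function formatError(err: unknown): string {
  return err instanceof Error ? `[Error] ${err.message}` : "[Error] Unknown error occurred";
}

// Hypothetical wrapper: run a handler body and turn any throw into a text result.
async function safeTool(run: () => Promise<string>): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await run() }] };
  } catch (err) {
    return { content: [{ type: "text", text: formatError(err) }], isError: true };
  }
}

// Usage: the callback stands in for "call adapter, stamp, format".
safeTool(async () => {
  throw new Error("Domain not allowed for github adapter: example.org");
}).then((result) => console.log(result.isError, result.content[0].text));
```

Setting `isError` gives the client a machine-readable failure signal in addition to the `[Security]` or `[Error]` text prefix; whether that suits this server is a design choice the diff does not settle.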
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "freshcontext-mcp",
3
- "version": "0.1.1",
3
+ "version": "0.1.3",
4
4
  "description": "Real-time web extraction MCP server with freshness timestamps for AI agents",
5
5
  "keywords": [
6
6
  "mcp",
@@ -1,7 +1,11 @@
1
1
  import { chromium } from "playwright";
2
2
  import { AdapterResult, ExtractOptions } from "../types.js";
3
+ import { validateUrl } from "../security.js";
3
4
 
4
5
  export async function githubAdapter(options: ExtractOptions): Promise<AdapterResult> {
6
+ const safeUrl = validateUrl(options.url, "github");
7
+ options = { ...options, url: safeUrl };
8
+
5
9
  const browser = await chromium.launch({ headless: true });
6
10
  const page = await browser.newPage();
7
11
 
@@ -1,8 +1,10 @@
1
1
  import { chromium } from "playwright";
2
2
  import { AdapterResult, ExtractOptions } from "../types.js";
3
+ import { validateUrl } from "../security.js";
3
4
 
4
5
  export async function hackerNewsAdapter(options: ExtractOptions): Promise<AdapterResult> {
5
- // If it's an Algolia API URL or search query, use the REST API directly (no browser)
6
+ // Validate URL: allow both HN and Algolia domains
7
+ validateUrl(options.url, "hackernews");
6
8
  const url = options.url;
7
9
 
8
10
  if (url.includes("hn.algolia.com/api/") || url.startsWith("hn-search:")) {
@@ -1,10 +1,10 @@
1
1
  import { AdapterResult, ExtractOptions } from "../types.js";
2
+ import { sanitizePackages } from "../security.js";
2
3
 
3
4
  // Uses npm registry API + PyPI JSON API (no auth needed)
4
5
  export async function packageTrendsAdapter(options: ExtractOptions): Promise<AdapterResult> {
5
- // options.url is the package name or a comma-separated list
6
- // e.g. "langchain" or "npm:langchain" or "pypi:langchain"
7
- const raw_input = options.url.replace(/^https?:\/\//, "").trim();
6
+ // Sanitize package input
7
+ const raw_input = sanitizePackages(options.url.replace(/^https?:\/\//, "").trim());
8
8
 
9
9
  // Parse ecosystem prefix
10
10
  const parts = raw_input.split(",").map((s) => s.trim());
@@ -1,10 +1,11 @@
1
1
  import { AdapterResult, ExtractOptions } from "../types.js";
2
+ import { sanitizeQuery } from "../security.js";
2
3
 
3
4
  // Uses GitHub Search API (no auth needed for basic search)
4
5
  export async function repoSearchAdapter(options: ExtractOptions): Promise<AdapterResult> {
5
- // options.url is treated as the search query string
6
- // e.g. "mcp server typescript" or a full GitHub search URL
7
- let query = options.url;
6
+ // Sanitize query input
7
+ const query_input = sanitizeQuery(options.url);
8
+ let query = query_input;
8
9
 
9
10
  // If it's a full URL, extract the query param
10
11
  try {
@@ -1,7 +1,11 @@
1
1
  import { chromium } from "playwright";
2
2
  import { AdapterResult, ExtractOptions } from "../types.js";
3
+ import { validateUrl } from "../security.js";
3
4
 
4
5
  export async function scholarAdapter(options: ExtractOptions): Promise<AdapterResult> {
6
+ const safeUrl = validateUrl(options.url, "scholar");
7
+ options = { ...options, url: safeUrl };
8
+
5
9
  const browser = await chromium.launch({ headless: true });
6
10
  const page = await browser.newPage();
7
11
 
@@ -1,7 +1,11 @@
1
1
  import { chromium } from "playwright";
2
2
  import { AdapterResult, ExtractOptions } from "../types.js";
3
+ import { validateUrl } from "../security.js";
3
4
 
4
5
  export async function ycAdapter(options: ExtractOptions): Promise<AdapterResult> {
6
+ const safeUrl = validateUrl(options.url, "yc");
7
+ options = { ...options, url: safeUrl };
8
+
5
9
  const browser = await chromium.launch({ headless: true });
6
10
  const page = await browser.newPage();
7
11
 
@@ -0,0 +1,161 @@
1
+ /**
2
+ * freshcontext-mcp security module
3
+ * Input sanitization, domain allowlists, and request validation
4
+ */
5
+
6
+ // ─── Allowed domains per adapter ────────────────────────────────────────────
7
+
8
+ export const ALLOWED_DOMAINS: Record<string, string[]> = {
9
+ github: ["github.com", "raw.githubusercontent.com"],
10
+ scholar: ["scholar.google.com"],
11
+ hackernews: ["news.ycombinator.com", "hn.algolia.com"],
12
+ yc: ["www.ycombinator.com", "ycombinator.com"],
13
+ repoSearch: [], // uses GitHub API directly, no browser
14
+ packageTrends: [], // uses npm/PyPI APIs directly, no browser
15
+ };
16
+
17
+ // ─── Blocked IP ranges and internal hostnames ────────────────────────────────
18
+
19
+ const BLOCKED_PATTERNS = [
20
+ /^localhost$/i,
21
+ /^127\.\d+\.\d+\.\d+$/,
22
+ /^10\.\d+\.\d+\.\d+$/,
23
+ /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
24
+ /^192\.168\.\d+\.\d+$/,
25
+ /^169\.254\.\d+\.\d+$/, // AWS metadata
26
+ /^0\.0\.0\.0$/,
27
+ /^::1$/,
28
+ /^fc00:/i,
29
+ /^fe80:/i,
30
+ ];
31
+
32
+ // ─── Max length limits ────────────────────────────────────────────────────────
33
+
34
+ export const MAX_URL_LENGTH = 500;
35
+ export const MAX_QUERY_LENGTH = 200;
36
+ export const MAX_PACKAGES_LENGTH = 300;
37
+
38
+ // ─── Validation errors ───────────────────────────────────────────────────────
39
+
40
+ export class SecurityError extends Error {
41
+ constructor(message: string) {
42
+ super(message);
43
+ this.name = "SecurityError";
44
+ }
45
+ }
46
+
47
+ // ─── URL validator ───────────────────────────────────────────────────────────
48
+
49
+ export function validateUrl(
50
+ rawUrl: string,
51
+ adapterName: keyof typeof ALLOWED_DOMAINS
52
+ ): string {
53
+ // Length check
54
+ if (!rawUrl || rawUrl.trim().length === 0) {
55
+ throw new SecurityError("URL cannot be empty");
56
+ }
57
+ if (rawUrl.length > MAX_URL_LENGTH) {
58
+ throw new SecurityError(
59
+ `URL exceeds maximum length of ${MAX_URL_LENGTH} characters`
60
+ );
61
+ }
62
+
63
+ // Must be a valid URL
64
+ let parsed: URL;
65
+ try {
66
+ parsed = new URL(rawUrl.trim());
67
+ } catch {
68
+ throw new SecurityError(`Invalid URL format: ${rawUrl}`);
69
+ }
70
+
71
+ // Must use http or https
72
+ if (!["http:", "https:"].includes(parsed.protocol)) {
73
+ throw new SecurityError(
74
+ `Protocol not allowed: ${parsed.protocol}. Only http/https permitted.`
75
+ );
76
+ }
77
+
78
+ const hostname = parsed.hostname.toLowerCase();
79
+
80
+ // Block internal/private IPs and hostnames
81
+ for (const pattern of BLOCKED_PATTERNS) {
82
+ if (pattern.test(hostname)) {
83
+ throw new SecurityError(
84
+ `Access to internal/private addresses is not permitted: ${hostname}`
85
+ );
86
+ }
87
+ }
88
+
89
+ // Domain allowlist check (skip if allowlist is empty, meaning no browser is used)
90
+ const allowedDomains = ALLOWED_DOMAINS[adapterName];
91
+ if (allowedDomains && allowedDomains.length > 0) {
92
+ const isAllowed = allowedDomains.some(
93
+ (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
94
+ );
95
+ if (!isAllowed) {
96
+ throw new SecurityError(
97
+ `Domain not allowed for ${adapterName} adapter: ${hostname}. ` +
98
+ `Allowed domains: ${allowedDomains.join(", ")}`
99
+ );
100
+ }
101
+ }
102
+
103
+ return parsed.toString();
104
+ }
105
+
106
+ // ─── Query string sanitizer ──────────────────────────────────────────────────
107
+
108
+ export function sanitizeQuery(query: string, maxLength = MAX_QUERY_LENGTH): string {
109
+ if (!query || query.trim().length === 0) {
110
+ throw new SecurityError("Query cannot be empty");
111
+ }
112
+
113
+ const trimmed = query.trim().slice(0, maxLength);
114
+
115
+ // Strip null bytes and control characters
116
+ const cleaned = trimmed.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "");
117
+
118
+ if (cleaned.length === 0) {
119
+ throw new SecurityError("Query contains no valid characters after sanitization");
120
+ }
121
+
122
+ return cleaned;
123
+ }
124
+
125
+ // ─── Package name sanitizer ──────────────────────────────────────────────────
126
+
127
+ export function sanitizePackages(input: string): string {
128
+ if (!input || input.trim().length === 0) {
129
+ throw new SecurityError("Package name cannot be empty");
130
+ }
131
+
132
+ if (input.length > MAX_PACKAGES_LENGTH) {
133
+ throw new SecurityError(
134
+ `Package input exceeds maximum length of ${MAX_PACKAGES_LENGTH} characters`
135
+ );
136
+ }
137
+
138
+ // Only allow valid npm/PyPI package name characters, commas, colons (for npm:/pypi: prefix)
139
+ const cleaned = input
140
+ .trim()
141
+ .replace(/[^a-zA-Z0-9@/._\-,:]/g, "")
142
+ .slice(0, MAX_PACKAGES_LENGTH);
143
+
144
+ if (cleaned.length === 0) {
145
+ throw new SecurityError("Package name contains no valid characters after sanitization");
146
+ }
147
+
148
+ return cleaned;
149
+ }
150
+
151
+ // ─── Error formatter ─────────────────────────────────────────────────────────
152
+
153
+ export function formatSecurityError(err: unknown): string {
154
+ if (err instanceof SecurityError) {
155
+ return `[Security] ${err.message}`;
156
+ }
157
+ if (err instanceof Error) {
158
+ return `[Error] ${err.message}`;
159
+ }
160
+ return "[Error] Unknown error occurred";
161
+ }
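To make the new validation rules in security.ts concrete, here is a minimal, self-contained sketch of the allowlist suffix match and the two character filters. It is not the package's code: `isAllowedHost` and `filterPackageChars` are hypothetical names introduced for illustration, the `SecurityError` class is a stand-in for the one defined earlier in the file, and the `MAX_QUERY_LENGTH` value is assumed because the real constant is not visible in this hunk.

```typescript
// Stand-in for the SecurityError class defined earlier in security.ts (assumed shape).
class SecurityError extends Error {}

// Assumed value for illustration; the real limit is not shown in this hunk.
const MAX_QUERY_LENGTH = 200;

// Hypothetical helper using the same rule as the allowlist check:
// the hostname must equal an allowed domain or be a true subdomain of it.
function isAllowedHost(hostname: string, allowedDomains: string[]): boolean {
  return allowedDomains.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
  );
}

// Mirrors sanitizeQuery from the diff: trim, truncate, strip control characters.
function sanitizeQuery(query: string, maxLength = MAX_QUERY_LENGTH): string {
  if (!query || query.trim().length === 0) {
    throw new SecurityError("Query cannot be empty");
  }
  const trimmed = query.trim().slice(0, maxLength);
  const cleaned = trimmed.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "");
  if (cleaned.length === 0) {
    throw new SecurityError("Query contains no valid characters after sanitization");
  }
  return cleaned;
}

// Hypothetical helper isolating the character filter used by sanitizePackages.
function filterPackageChars(input: string): string {
  return input.trim().replace(/[^a-zA-Z0-9@/._\-,:]/g, "");
}

// The leading dot in the suffix test is what rejects look-alike hosts.
console.log(isAllowedHost("api.github.com", ["github.com"]));
console.log(isAllowedHost("evilgithub.com", ["github.com"]));
console.log(sanitizeQuery("  mcp\x00 server\x07  "));
console.log(filterPackageChars("npm:express, pypi:flask$(id)"));
```

Note that the package filter removes disallowed characters rather than rejecting the input, so shell metacharacters are dropped while the surrounding letters survive.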
package/src/server.ts CHANGED
@@ -8,6 +8,7 @@ import { ycAdapter } from "./adapters/yc.js";
8
8
  import { repoSearchAdapter } from "./adapters/repoSearch.js";
9
9
  import { packageTrendsAdapter } from "./adapters/packageTrends.js";
10
10
  import { stampFreshness, formatForLLM } from "./tools/freshnessStamp.js";
11
+ import { SecurityError, formatSecurityError } from "./security.js";
11
12
 
12
13
  const server = new McpServer({
13
14
  name: "freshcontext-mcp",
@@ -27,9 +28,13 @@ server.registerTool(
27
28
  annotations: { readOnlyHint: true, openWorldHint: true },
28
29
  },
29
30
  async ({ url, max_length }) => {
30
- const result = await githubAdapter({ url, maxLength: max_length });
31
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "github");
32
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
31
+ try {
32
+ const result = await githubAdapter({ url, maxLength: max_length });
33
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "github");
34
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
35
+ } catch (err) {
36
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
37
+ }
33
38
  }
34
39
  );
35
40
 
@@ -46,9 +51,13 @@ server.registerTool(
46
51
  annotations: { readOnlyHint: true, openWorldHint: true },
47
52
  },
48
53
  async ({ url, max_length }) => {
49
- const result = await scholarAdapter({ url, maxLength: max_length });
50
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "google_scholar");
51
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
54
+ try {
55
+ const result = await scholarAdapter({ url, maxLength: max_length });
56
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "google_scholar");
57
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
58
+ } catch (err) {
59
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
60
+ }
52
61
  }
53
62
  );
54
63
 
@@ -65,9 +74,13 @@ server.registerTool(
65
74
  annotations: { readOnlyHint: true, openWorldHint: true },
66
75
  },
67
76
  async ({ url, max_length }) => {
68
- const result = await hackerNewsAdapter({ url, maxLength: max_length });
69
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "hackernews");
70
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
77
+ try {
78
+ const result = await hackerNewsAdapter({ url, maxLength: max_length });
79
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "hackernews");
80
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
81
+ } catch (err) {
82
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
83
+ }
71
84
  }
72
85
  );
73
86
 
@@ -84,9 +97,13 @@ server.registerTool(
84
97
  annotations: { readOnlyHint: true, openWorldHint: true },
85
98
  },
86
99
  async ({ url, max_length }) => {
87
- const result = await ycAdapter({ url, maxLength: max_length });
88
- const ctx = stampFreshness(result, { url, maxLength: max_length }, "ycombinator");
89
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
100
+ try {
101
+ const result = await ycAdapter({ url, maxLength: max_length });
102
+ const ctx = stampFreshness(result, { url, maxLength: max_length }, "ycombinator");
103
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
104
+ } catch (err) {
105
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
106
+ }
90
107
  }
91
108
  );
92
109
 
@@ -103,9 +120,13 @@ server.registerTool(
103
120
  annotations: { readOnlyHint: true, openWorldHint: true },
104
121
  },
105
122
  async ({ query, max_length }) => {
106
- const result = await repoSearchAdapter({ url: query, maxLength: max_length });
107
- const ctx = stampFreshness(result, { url: query, maxLength: max_length }, "github_search");
108
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
123
+ try {
124
+ const result = await repoSearchAdapter({ url: query, maxLength: max_length });
125
+ const ctx = stampFreshness(result, { url: query, maxLength: max_length }, "github_search");
126
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
127
+ } catch (err) {
128
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
129
+ }
109
130
  }
110
131
  );
111
132
 
@@ -122,9 +143,13 @@ server.registerTool(
122
143
  annotations: { readOnlyHint: true, openWorldHint: true },
123
144
  },
124
145
  async ({ packages, max_length }) => {
125
- const result = await packageTrendsAdapter({ url: packages, maxLength: max_length });
126
- const ctx = stampFreshness(result, { url: packages, maxLength: max_length }, "package_registry");
127
- return { content: [{ type: "text", text: formatForLLM(ctx) }] };
146
+ try {
147
+ const result = await packageTrendsAdapter({ url: packages, maxLength: max_length });
148
+ const ctx = stampFreshness(result, { url: packages, maxLength: max_length }, "package_registry");
149
+ return { content: [{ type: "text", text: formatForLLM(ctx) }] };
150
+ } catch (err) {
151
+ return { content: [{ type: "text", text: formatSecurityError(err) }] };
152
+ }
128
153
  }
129
154
  );
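The handlers above all repeat the same try/catch around an adapter call. As a sketch only, the pattern could be factored into one wrapper. `withSecurityHandling` and `ToolResult` are hypothetical names that do not appear in the package, and `SecurityError` is again a local stand-in so the example runs on its own; `formatSecurityError` is copied from the security.ts diff.

```typescript
// Stand-in for the SecurityError class exported by security.ts (assumed shape).
class SecurityError extends Error {}

// Hypothetical local type covering only the fields the handlers return.
type ToolResult = { content: { type: "text"; text: string }[] };

// Same logic as formatSecurityError in the security.ts diff.
function formatSecurityError(err: unknown): string {
  if (err instanceof SecurityError) {
    return `[Security] ${err.message}`;
  }
  if (err instanceof Error) {
    return `[Error] ${err.message}`;
  }
  return "[Error] Unknown error occurred";
}

// Hypothetical wrapper: runs a text-producing task and converts any thrown
// value into a text result instead of letting the rejection propagate.
async function withSecurityHandling(run: () => Promise<string>): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await run() }] };
  } catch (err) {
    return { content: [{ type: "text", text: formatSecurityError(err) }] };
  }
}

// Usage: a blocked request becomes a readable message for the client.
withSecurityHandling(async () => {
  throw new SecurityError("Domain not allowed");
}).then((result) => console.log(result.content[0].text));
```

With such a wrapper, each handler body would reduce to the adapter call plus `formatForLLM`, at the cost of one extra level of indirection.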
130
155