freshcontext-mcp 0.1.6 β†’ 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -41,31 +41,41 @@ The AI agent always knows **when it's looking at data**, not just what the data
  | `extract_github` | README, stars, forks, language, topics, last commit from any GitHub repo |
  | `extract_hackernews` | Top stories or search results from HN with scores and timestamps |
  | `extract_scholar` | Research paper titles, authors, years, and snippets from Google Scholar |
+ | `extract_reddit` | Posts and community sentiment from any subreddit or Reddit search |
 
  ### πŸš€ Competitive Intelligence Tools
 
  | Tool | Description |
  |---|---|
  | `extract_yc` | Scrape YC company listings by keyword β€” find who's funded in your space |
+ | `extract_producthunt` | Recent Product Hunt launches by keyword or topic |
  | `search_repos` | Search GitHub for similar/competing repos, ranked by stars with activity signals |
  | `package_trends` | npm and PyPI package metadata β€” version history, release cadence, last updated |
 
+ ### πŸ“ˆ Market Data
+
+ | Tool | Description |
+ |---|---|
+ | `extract_finance` | Live stock data via Yahoo Finance β€” price, market cap, P/E, 52w range, sector, company summary |
+
  ### πŸ—ΊοΈ Composite Tool
 
  | Tool | Description |
  |---|---|
- | `extract_landscape` | **One call. Full picture.** Queries YC startups + GitHub repos + HN sentiment + package ecosystem simultaneously. Returns a unified landscape report. |
+ | `extract_landscape` | **One call. Full picture.** Queries YC + GitHub + HN + npm/PyPI simultaneously. Returns a unified timestamped landscape report. |
 
  ---
 
  ## Quick Start
 
- ### Option A β€” Cloud (no install, works immediately)
+ ### Option A β€” Cloud (recommended, no install needed)
+
+ Visit **[freshcontext-site.pages.dev](https://freshcontext-site.pages.dev)** for a guided 3-step install with copy-paste config. No terminal, no downloads, no antivirus alerts.
 
- No Node, no Playwright, nothing to install. Just add this to your Claude Desktop config and restart.
+ Or add this manually to your Claude Desktop config and restart:
 
- **Mac:** open `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** open `%APPDATA%\Claude\claude_desktop_config.json`
+ **Mac:** `~/Library/Application Support/Claude/claude_desktop_config.json`
+ **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
 
  ```json
  {
@@ -80,11 +90,11 @@ No Node, no Playwright, nothing to install. Just add this to your Claude Desktop
 
  Restart Claude Desktop. The freshcontext tools will appear in your session.
 
- > **Note:** If `claude_desktop_config.json` doesn't exist yet, create it with the content above.
+ > If `claude_desktop_config.json` doesn't exist yet, create it with the content above.
 
  ---
 
- ### Option B β€” Local (full Playwright, faster for heavy use)
+ ### Option B β€” Local (full Playwright, for heavy use)
 
  **Prerequisites:** Node.js 18+ ([nodejs.org](https://nodejs.org))
 
@@ -98,7 +108,7 @@ npm run build
 
  Then add to your Claude Desktop config:
 
- **Mac** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
+ **Mac:**
  ```json
  {
    "mcpServers": {
@@ -110,7 +120,7 @@ Then add to your Claude Desktop config:
    }
  ```
 
- **Windows** (`%APPDATA%\Claude\claude_desktop_config.json`):
+ **Windows:**
  ```json
  {
    "mcpServers": {
@@ -122,58 +132,48 @@ Then add to your Claude Desktop config:
  }
  ```
 
- Restart Claude Desktop.
-
  ---
 
  ### Troubleshooting (Mac)
 
- **"command not found: node"** β€” Node isn't on your PATH inside Claude Desktop's environment. Use the full path:
+ **"command not found: node"** β€” Node isn't on Claude Desktop's PATH. Use the full path:
  ```bash
  which node   # copy this output
  ```
- Then replace `"command": "node"` with `"command": "/usr/local/bin/node"` (or whatever `which node` returned).
+ Replace `"command": "node"` with `"command": "/usr/local/bin/node"` (or whatever `which node` returned).
 
- **"npx: command not found"** β€” Same issue. Run `which npx` and use the full path for Option A:
- ```json
- "command": "/usr/local/bin/npx"
- ```
+ **"npx: command not found"** β€” Same fix. Run `which npx` and use the full path.
 
- **Config file doesn't exist** β€” Create it. On Mac:
+ **Config file doesn't exist** β€” Create it:
  ```bash
  mkdir -p ~/Library/Application\ Support/Claude
  touch ~/Library/Application\ Support/Claude/claude_desktop_config.json
  ```
- Then paste the config JSON above into it.
 
  ---
 
  ## Usage Examples
 
  ### Check if anyone is already building what you're building
-
  ```
  Use extract_landscape with topic "cashflow prediction mcp"
  ```
-
  Returns a unified report: who's funded (YC), what's trending (HN), what repos exist (GitHub), what packages are active (npm/PyPI). All timestamped.
 
- ### Analyse a specific repo
-
+ ### Get community sentiment on a topic
  ```
- Use extract_github on https://github.com/anthropics/anthropic-sdk-python
+ Use extract_reddit with url "r/MachineLearning"
+ Use extract_hackernews with url "https://hn.algolia.com/api/v1/search?query=mcp+server&tags=story"
  ```
 
- ### Find research papers on a topic
-
+ ### Check a company's stock
  ```
- Use extract_scholar on https://scholar.google.com/scholar?q=llm+context+freshness
+ Use extract_finance with url "NVDA,MSFT,GOOG"
  ```
 
- ### Check package ecosystem health
-
+ ### Find what just launched in your space
  ```
- Use package_trends with packages "npm:@modelcontextprotocol/sdk,pypi:langchain"
+ Use extract_producthunt with url "AI developer tools"
  ```
 
  ---
@@ -189,24 +189,14 @@ FreshContext treats **retrieval time as first-class metadata**. Every adapter re
  - `freshness_confidence` β€” `high`, `medium`, or `low` based on signal quality
  - `adapter` β€” which source the data came from
 
- This makes freshness **verifiable**, not assumed.
-
  ---
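
> Editor's note: the freshness envelope described above can be captured in a small type. This is a hedged sketch for TypeScript consumers β€” the `FreshContextResult` and `isFreshEnough` names are illustrative; only the `raw`, `content_date`, and `freshness_confidence` fields (plus the `adapter` label) appear in this diff's sources.

```typescript
// Sketch of the result envelope the adapters return, per the fields listed
// above and the AdapterResult shape visible in the adapter sources.
type FreshnessConfidence = "high" | "medium" | "low";

interface FreshContextResult {
  raw: string;                          // extracted text payload
  content_date: string | null;          // ISO date of the newest content found
  freshness_confidence: FreshnessConfidence;
  adapter: string;                      // which source the data came from
}

// A consumer can gate on freshness instead of trusting the payload blindly.
function isFreshEnough(r: FreshContextResult, maxAgeDays: number): boolean {
  if (!r.content_date || r.freshness_confidence === "low") return false;
  const ageMs = Date.now() - new Date(r.content_date).getTime();
  return ageMs <= maxAgeDays * 24 * 60 * 60 * 1000;
}
```

This is what "verifiable, not assumed" buys: the caller can reject stale or low-confidence data before handing it to the model.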
 
- ## Deployment
-
- ### Local (Playwright-based)
- Uses headless Chromium via Playwright. Full browser rendering for JavaScript-heavy sites.
+ ## Security
 
- ### Cloud (Cloudflare Workers)
- The `worker/` directory contains a Cloudflare Workers deployment. No Playwright dependency β€” runs at the edge globally.
-
- ```bash
- cd worker
- npm install
- npx wrangler secret put API_KEY
- npx wrangler deploy
- ```
+ - Input sanitization and domain allowlists on all adapters
+ - SSRF prevention (blocked private IP ranges)
+ - KV-backed global rate limiting: 60 requests/minute per IP across all edge nodes
+ - No credentials required for public data sources
 
  ---
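
> Editor's note: a minimal sketch of the private-range blocking the SSRF bullet refers to. The real check lives in `security.ts`, which this diff does not show, so `isPrivateHost` is a hypothetical illustration of the technique, not the package's actual code.

```typescript
// Reject hostnames that resolve-by-inspection to private/loopback/link-local
// IPv4 ranges, so an adapter cannot be pointed at internal infrastructure.
function isPrivateHost(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "0.0.0.0") return true;
  const octets = hostname.split(".").map(Number);
  if (octets.length === 4 && octets.every((n) => Number.isInteger(n) && n >= 0 && n <= 255)) {
    const [a, b] = octets;
    if (a === 10 || a === 127) return true;           // 10.0.0.0/8, loopback
    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
    if (a === 192 && b === 168) return true;          // 192.168.0.0/16
    if (a === 169 && b === 254) return true;          // link-local
  }
  return false;
}
```

A production check would also resolve DNS before fetching, since `evil.example` can point an A record at 10.0.0.1.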
 
@@ -217,17 +207,20 @@ freshcontext-mcp/
  β”œβ”€β”€ src/
  β”‚   β”œβ”€β”€ server.ts        # MCP server, all tool registrations
  β”‚   β”œβ”€β”€ types.ts         # FreshContext interfaces
- β”‚   β”œβ”€β”€ security.ts      # Input validation, domain allowlists
+ β”‚   β”œβ”€β”€ security.ts      # Input validation, domain allowlists, SSRF prevention
  β”‚   β”œβ”€β”€ adapters/
  β”‚   β”‚   β”œβ”€β”€ github.ts
  β”‚   β”‚   β”œβ”€β”€ hackernews.ts
  β”‚   β”‚   β”œβ”€β”€ scholar.ts
  β”‚   β”‚   β”œβ”€β”€ yc.ts
  β”‚   β”‚   β”œβ”€β”€ repoSearch.ts
- β”‚   β”‚   └── packageTrends.ts
+ β”‚   β”‚   β”œβ”€β”€ packageTrends.ts
+ β”‚   β”‚   β”œβ”€β”€ reddit.ts
+ β”‚   β”‚   β”œβ”€β”€ productHunt.ts
+ β”‚   β”‚   └── finance.ts
  β”‚   └── tools/
  β”‚       └── freshnessStamp.ts
- └── worker/              # Cloudflare Workers deployment
+ └── worker/              # Cloudflare Workers deployment (all 10 tools)
      └── src/worker.ts
  ```
 
@@ -243,9 +236,11 @@
  - [x] npm/PyPI package trends
  - [x] `extract_landscape` composite tool
  - [x] Cloudflare Workers deployment
- - [x] Worker auth + rate limiting + domain allowlists
- - [ ] Product Hunt launches adapter
- - [ ] Finance/market data adapter
+ - [x] Worker auth + KV-backed global rate limiting
+ - [x] Reddit community sentiment adapter
+ - [x] Product Hunt launches adapter
+ - [x] Yahoo Finance market data adapter
+ - [ ] `extract_arxiv` β€” structured arXiv API (more reliable than Scholar)
  - [ ] TTL-based caching layer
  - [ ] `freshness_score` numeric metric
 
@@ -0,0 +1,66 @@
+ /**
+  * arXiv adapter β€” uses the official arXiv API (no scraping, no auth needed).
+  * Accepts a search query or a direct arXiv API URL.
+  * Docs: https://arxiv.org/help/api/user-manual
+  */
+ export async function arxivAdapter(options) {
+   const input = options.url.trim();
+   // Build API URL β€” if they pass a plain query, construct it
+   const apiUrl = input.startsWith("http")
+     ? input
+     : `https://export.arxiv.org/api/query?search_query=all:${encodeURIComponent(input)}&start=0&max_results=10&sortBy=relevance&sortOrder=descending`;
+   const res = await fetch(apiUrl, {
+     headers: { "User-Agent": "freshcontext-mcp/0.1.7 (https://github.com/PrinceGabriel-lgtm/freshcontext-mcp)" },
+   });
+   if (!res.ok)
+     throw new Error(`arXiv API error: ${res.status} ${res.statusText}`);
+   const xml = await res.text();
+   // Parse the Atom XML response
+   const entries = [...xml.matchAll(/<entry>([\s\S]*?)<\/entry>/g)];
+   if (!entries.length) {
+     return { raw: "No results found for this query.", content_date: null, freshness_confidence: "low" };
+   }
+   const getTag = (block, tag) => {
+     const m = block.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)<\\/${tag}>`, "i"));
+     return m ? m[1].trim().replace(/\s+/g, " ") : "";
+   };
+   const getAttr = (block, tag, attr) => {
+     const m = block.match(new RegExp(`<${tag}[^>]*${attr}="([^"]*)"`, "i"));
+     return m ? m[1].trim() : "";
+   };
+   const papers = entries.map((match, i) => {
+     const block = match[1];
+     const title = getTag(block, "title").replace(/\n/g, " ");
+     const summary = getTag(block, "summary").slice(0, 300).replace(/\n/g, " ");
+     const published = getTag(block, "published").slice(0, 10); // YYYY-MM-DD
+     const updated = getTag(block, "updated").slice(0, 10);
+     const id = getTag(block, "id").replace("http://arxiv.org/abs/", "https://arxiv.org/abs/");
+     // Authors β€” can be multiple
+     const authorMatches = [...block.matchAll(/<author>([\s\S]*?)<\/author>/g)];
+     const authors = authorMatches
+       .map(a => getTag(a[1], "name"))
+       .filter(Boolean)
+       .slice(0, 4)
+       .join(", ");
+     // Categories
+     const primaryCat = getAttr(block, "arxiv:primary_category", "term") ||
+       getAttr(block, "category", "term");
+     return [
+       `[${i + 1}] ${title}`,
+       `Authors: ${authors || "Unknown"}`,
+       `Published: ${published}${updated !== published ? ` (updated ${updated})` : ""}`,
+       primaryCat ? `Category: ${primaryCat}` : null,
+       `Abstract: ${summary}…`,
+       `Link: ${id}`,
+     ].filter(Boolean).join("\n");
+   });
+   const raw = papers.join("\n\n").slice(0, options.maxLength ?? 6000);
+   // Most recent publication date
+   const dates = entries
+     .map(m => getTag(m[1], "published").slice(0, 10))
+     .filter(Boolean)
+     .sort()
+     .reverse();
+   const content_date = dates[0] ?? null;
+   return { raw, content_date, freshness_confidence: content_date ? "high" : "medium" };
+ }
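
> Editor's note: the adapter's regex-based Atom parsing can be exercised against a hand-written `<entry>` fragment. This standalone sketch reuses the `getTag` helper verbatim; the sample entry is invented data, not a real API response.

```typescript
// The adapter's tag extractor: grab inner text, trim, collapse whitespace.
const getTag = (block: string, tag: string): string => {
  const m = block.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)<\\/${tag}>`, "i"));
  return m ? m[1].trim().replace(/\s+/g, " ") : "";
};

// Hand-written Atom fragment with a multi-line title, as arXiv returns them.
const sampleFeed = `<entry>
  <title>Attention Is All
    You Need</title>
  <published>2017-06-12T17:57:34Z</published>
  <author><name>Ashish Vaswani</name></author>
</entry>`;

const entries = [...sampleFeed.matchAll(/<entry>([\s\S]*?)<\/entry>/g)];
const block = entries[0][1];
const title = getTag(block, "title");                      // β†’ "Attention Is All You Need"
const published = getTag(block, "published").slice(0, 10); // β†’ "2017-06-12"
const author = getTag(block, "name");                      // β†’ "Ashish Vaswani"
```

The whitespace collapse is what makes multi-line `<title>` and `<summary>` elements safe to emit as single report lines.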
package/dist/server.js CHANGED
@@ -124,10 +124,10 @@ server.registerTool("package_trends", {
  });
  // ─── Tool: extract_landscape ─────────────────────────────────────────────────
  server.registerTool("extract_landscape", {
-   description: "Composite intelligence tool. Given a project idea or keyword, simultaneously queries YC startups, GitHub repos, HN sentiment, and package activity to answer: Who is building this? Is it funded? What's getting traction? Returns a unified timestamped landscape report.",
+   description: "Composite intelligence tool. Given a project idea or keyword, simultaneously queries YC startups, GitHub repos, HN, Reddit, Product Hunt, and package registries to answer: Who is building this? Is it funded? What's getting traction? Returns a unified 6-source timestamped landscape report.",
    inputSchema: z.object({
      topic: z.string().describe("Your project idea or keyword e.g. 'mcp server' or 'cashflow prediction'"),
-     max_length: z.number().optional().default(8000),
+     max_length: z.number().optional().default(10000),
    }),
    annotations: { readOnlyHint: true, openWorldHint: true },
  }, async ({ topic, max_length }) => {
package/package.json CHANGED
@@ -1,6 +1,7 @@
  {
    "name": "freshcontext-mcp",
-   "version": "0.1.6",
+   "mcpName": "io.github.PrinceGabriel-lgtm/freshcontext",
+   "version": "0.1.8",
    "description": "Real-time web extraction MCP server with freshness timestamps for AI agents",
    "keywords": [
      "mcp",
@@ -49,3 +50,6 @@
    }
  }
 
+
+
+
package/server.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "$schema": "https://static.modelcontextprotocol.io/schemas/2025-07-09/server.schema.json",
+   "name": "io.github.PrinceGabriel-lgtm/freshcontext",
+   "description": "Real-time web intelligence for AI agents. 11 tools, no API keys. GitHub, HN, Reddit, arXiv & more.",
+   "repository": {
+     "url": "https://github.com/PrinceGabriel-lgtm/freshcontext-mcp",
+     "source": "github"
+   },
+   "version": "0.1.7",
+   "website_url": "https://freshcontext-site.pages.dev",
+   "packages": [
+     {
+       "registry_type": "npm",
+       "registry_base_url": "https://registry.npmjs.org",
+       "identifier": "freshcontext-mcp",
+       "version": "0.1.7",
+       "transport": {
+         "type": "stdio"
+       }
+     }
+   ],
+   "remotes": [
+     {
+       "type": "streamable-http",
+       "url": "https://freshcontext-mcp.gimmanuel73.workers.dev/mcp"
+     }
+   ]
+ }
package/.env.example DELETED
@@ -1,8 +0,0 @@
- # freshcontext-mcp environment variables
- # Copy to .env and fill in
-
- # Optional: GitHub Personal Access Token (increases rate limits for GitHub API fallback)
- GITHUB_TOKEN=
-
- # Optional: Proxy URL if needed for certain extractions
- # PROXY_URL=http://user:pass@host:port
@@ -1,159 +0,0 @@
- import { AdapterResult, ExtractOptions } from "../types.js";
-
- /**
-  * Finance adapter β€” Yahoo Finance public API, no auth required.
-  * Accepts:
-  *  - A ticker symbol e.g. "AAPL" or "MSFT,GOOG"
-  *  - A company name e.g. "Apple" (will search for ticker first)
-  *  - Comma-separated tickers for comparison
-  */
-
- interface YahooQuote {
-   symbol: string;
-   shortName?: string;
-   longName?: string;
-   regularMarketPrice?: number;
-   regularMarketChange?: number;
-   regularMarketChangePercent?: number;
-   marketCap?: number;
-   regularMarketVolume?: number;
-   fiftyTwoWeekHigh?: number;
-   fiftyTwoWeekLow?: number;
-   trailingPE?: number;
-   dividendYield?: number;
-   currency?: string;
-   exchangeName?: string;
-   regularMarketTime?: number;
-   longBusinessSummary?: string;
-   sector?: string;
-   industry?: string;
-   fullTimeEmployees?: number;
-   website?: string;
- }
-
- function formatMarketCap(cap: number | undefined): string {
-   if (!cap) return "N/A";
-   if (cap >= 1e12) return `$${(cap / 1e12).toFixed(2)}T`;
-   if (cap >= 1e9) return `$${(cap / 1e9).toFixed(2)}B`;
-   if (cap >= 1e6) return `$${(cap / 1e6).toFixed(2)}M`;
-   return `$${cap.toLocaleString()}`;
- }
-
- function formatChange(change: number | undefined, pct: number | undefined): string {
-   if (change === undefined || pct === undefined) return "N/A";
-   const sign = change >= 0 ? "+" : "";
-   return `${sign}${change.toFixed(2)} (${sign}${pct.toFixed(2)}%)`;
- }
-
- export async function financeAdapter(options: ExtractOptions): Promise<AdapterResult> {
-   const input = options.url.trim();
-
-   // Support comma-separated tickers
-   const rawTickers = input
-     .split(",")
-     .map((t) => t.trim().toUpperCase())
-     .filter(Boolean)
-     .slice(0, 5); // max 5 at once
-
-   const results: string[] = [];
-   let latestTimestamp: number | null = null;
-
-   for (const ticker of rawTickers) {
-     try {
-       const quoteData = await fetchQuote(ticker);
-       if (quoteData) {
-         results.push(formatQuote(quoteData));
-         if (quoteData.regularMarketTime) {
-           latestTimestamp = Math.max(latestTimestamp ?? 0, quoteData.regularMarketTime);
-         }
-       }
-     } catch (err) {
-       results.push(`[${ticker}] Error: ${err instanceof Error ? err.message : String(err)}`);
-     }
-   }
-
-   const raw = results.join("\n\n─────────────────────────────\n\n").slice(0, options.maxLength ?? 5000);
-   const content_date = latestTimestamp
-     ? new Date(latestTimestamp * 1000).toISOString()
-     : new Date().toISOString();
-
-   return { raw, content_date, freshness_confidence: "high" };
- }
-
- async function fetchQuote(ticker: string): Promise<YahooQuote | null> {
-   // v7 quote endpoint β€” public, no auth
-   const quoteUrl = `https://query1.finance.yahoo.com/v7/finance/quote?symbols=${encodeURIComponent(ticker)}&fields=shortName,longName,regularMarketPrice,regularMarketChange,regularMarketChangePercent,marketCap,regularMarketVolume,fiftyTwoWeekHigh,fiftyTwoWeekLow,trailingPE,dividendYield,currency,exchangeName,regularMarketTime`;
-
-   const quoteRes = await fetch(quoteUrl, {
-     headers: {
-       "User-Agent": "Mozilla/5.0 (compatible; freshcontext-mcp/0.1.5)",
-       "Accept": "application/json",
-     },
-   });
-
-   if (!quoteRes.ok) throw new Error(`Yahoo Finance API error: ${quoteRes.status}`);
-
-   const quoteJson = await quoteRes.json() as {
-     quoteResponse?: { result?: YahooQuote[] };
-   };
-
-   const quote = quoteJson?.quoteResponse?.result?.[0];
-   if (!quote) throw new Error(`No data found for ticker: ${ticker}`);
-
-   // Optionally fetch company summary (v11 quoteSummary)
-   try {
-     const summaryUrl = `https://query1.finance.yahoo.com/v11/finance/quoteSummary/${encodeURIComponent(ticker)}?modules=assetProfile`;
-     const summaryRes = await fetch(summaryUrl, {
-       headers: { "User-Agent": "Mozilla/5.0 (compatible; freshcontext-mcp/0.1.5)" },
-     });
-     if (summaryRes.ok) {
-       const summaryJson = await summaryRes.json() as {
-         quoteSummary?: { result?: Array<{ assetProfile?: YahooQuote }> };
-       };
-       const profile = summaryJson?.quoteSummary?.result?.[0]?.assetProfile;
-       if (profile) {
-         Object.assign(quote, {
-           longBusinessSummary: profile.longBusinessSummary,
-           sector: profile.sector,
-           industry: profile.industry,
-           fullTimeEmployees: profile.fullTimeEmployees,
-           website: profile.website,
-         });
-       }
-     }
-   } catch {
-     // Summary is optional β€” continue without it
-   }
-
-   return quote;
- }
-
- function formatQuote(q: YahooQuote): string {
-   const lines = [
-     `${q.symbol} β€” ${q.longName ?? q.shortName ?? "Unknown"}`,
-     `Exchange: ${q.exchangeName ?? "N/A"} Β· Currency: ${q.currency ?? "USD"}`,
-     "",
-     `Price: ${q.regularMarketPrice !== undefined ? `$${q.regularMarketPrice.toFixed(2)}` : "N/A"}`,
-     `Change: ${formatChange(q.regularMarketChange, q.regularMarketChangePercent)}`,
-     `Market Cap: ${formatMarketCap(q.marketCap)}`,
-     `Volume: ${q.regularMarketVolume?.toLocaleString() ?? "N/A"}`,
-     `52w High: ${q.fiftyTwoWeekHigh !== undefined ? `$${q.fiftyTwoWeekHigh.toFixed(2)}` : "N/A"}`,
-     `52w Low: ${q.fiftyTwoWeekLow !== undefined ? `$${q.fiftyTwoWeekLow.toFixed(2)}` : "N/A"}`,
-     `P/E Ratio: ${q.trailingPE !== undefined ? q.trailingPE.toFixed(2) : "N/A"}`,
-     `Div Yield: ${q.dividendYield !== undefined ? `${(q.dividendYield * 100).toFixed(2)}%` : "N/A"}`,
-   ];
-
-   if (q.sector || q.industry) {
-     lines.push("");
-     if (q.sector) lines.push(`Sector: ${q.sector}`);
-     if (q.industry) lines.push(`Industry: ${q.industry}`);
-     if (q.fullTimeEmployees) lines.push(`Employees: ${q.fullTimeEmployees.toLocaleString()}`);
-     if (q.website) lines.push(`Website: ${q.website}`);
-   }
-
-   if (q.longBusinessSummary) {
-     lines.push("", "About:", q.longBusinessSummary.slice(0, 500) + (q.longBusinessSummary.length > 500 ? "…" : ""));
-   }
-
-   return lines.join("\n");
- }
@@ -1,54 +0,0 @@
- import { chromium } from "playwright";
- import { AdapterResult, ExtractOptions } from "../types.js";
- import { validateUrl } from "../security.js";
-
- export async function githubAdapter(options: ExtractOptions): Promise<AdapterResult> {
-   const safeUrl = validateUrl(options.url, "github");
-   options = { ...options, url: safeUrl };
-
-   const browser = await chromium.launch({ headless: true });
-   const page = await browser.newPage();
-
-   // Spoof a real browser UA to avoid bot detection
-   await page.setExtraHTTPHeaders({
-     "User-Agent":
-       "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
-   });
-
-   await page.goto(options.url, { waitUntil: "domcontentloaded", timeout: 20000 });
-
-   // Extract key repo signals β€” no inner functions to avoid esbuild __name injection
-   const data = await page.evaluate(`(function() {
-     var readme = (document.querySelector('[data-target="readme-toc.content"]') || document.querySelector('.markdown-body') || {}).textContent || null;
-     var starsEl = document.querySelector('[id="repo-stars-counter-star"]') || document.querySelector('.Counter.js-social-count');
-     var stars = starsEl ? starsEl.textContent.trim() : null;
-     var forksEl = document.querySelector('[id="repo-network-counter"]');
-     var forks = forksEl ? forksEl.textContent.trim() : null;
-     var commitEl = document.querySelector('relative-time');
-     var lastCommit = commitEl ? commitEl.getAttribute('datetime') : null;
-     var descEl = document.querySelector('.f4.my-3');
-     var description = descEl ? descEl.textContent.trim() : null;
-     var topics = Array.from(document.querySelectorAll('.topic-tag')).map(function(t) { return t.textContent.trim(); });
-     var langEl = document.querySelector('.color-fg-default.text-bold.mr-1');
-     var language = langEl ? langEl.textContent.trim() : null;
-     return { readme: readme, stars: stars, forks: forks, lastCommit: lastCommit, description: description, topics: topics, language: language };
-   })()`);
-   const typedData = data as { readme: string | null; stars: string | null; forks: string | null; lastCommit: string | null; description: string | null; topics: string[]; language: string | null };
-
-   await browser.close();
-
-   const raw = [
-     `Description: ${typedData.description ?? "N/A"}`,
-     `Stars: ${typedData.stars ?? "N/A"} | Forks: ${typedData.forks ?? "N/A"}`,
-     `Language: ${typedData.language ?? "N/A"}`,
-     `Last commit: ${typedData.lastCommit ?? "N/A"}`,
-     `Topics: ${typedData.topics?.join(", ") ?? "none"}`,
-     `\n--- README ---\n${typedData.readme ?? "No README found"}`,
-   ].join("\n");
-
-   return {
-     raw,
-     content_date: typedData.lastCommit ?? null,
-     freshness_confidence: typedData.lastCommit ? "high" : "medium",
-   };
- }
@@ -1,95 +0,0 @@
- import { chromium } from "playwright";
- import { AdapterResult, ExtractOptions } from "../types.js";
- import { validateUrl } from "../security.js";
-
- export async function hackerNewsAdapter(options: ExtractOptions): Promise<AdapterResult> {
-   // Validate URL β€” allow both HN and Algolia domains
-   validateUrl(options.url, "hackernews");
-   const url = options.url;
-
-   if (url.includes("hn.algolia.com/api/") || url.startsWith("hn-search:")) {
-     const query = url.startsWith("hn-search:")
-       ? url.replace("hn-search:", "").trim()
-       : url;
-
-     const apiUrl = url.includes("hn.algolia.com/api/")
-       ? url
-       : `https://hn.algolia.com/api/v1/search?query=${encodeURIComponent(query)}&tags=story&hitsPerPage=20`;
-
-     const res = await fetch(apiUrl);
-     if (!res.ok) throw new Error(`HN Algolia API error: ${res.status}`);
-     const data = await res.json() as {
-       hits: Array<{
-         title: string;
-         url: string | null;
-         points: number;
-         num_comments: number;
-         author: string;
-         created_at: string;
-         objectID: string;
-       }>;
-     };
-
-     const raw = data.hits
-       .map((r, i) =>
-         [
-           `[${i + 1}] ${r.title ?? "Untitled"}`,
-           `URL: ${r.url ?? `https://news.ycombinator.com/item?id=${r.objectID}`}`,
-           `Score: ${r.points} points | ${r.num_comments} comments`,
-           `Author: ${r.author} | Posted: ${r.created_at}`,
-         ].join("\n")
-       )
-       .join("\n\n")
-       .slice(0, options.maxLength ?? 4000);
-
-     const newest = data.hits.map((r) => r.created_at).sort().reverse()[0] ?? null;
-     return { raw, content_date: newest, freshness_confidence: newest ? "high" : "medium" };
-   }
-
-   // Default: browser-based scrape for HN front page or search pages
-   const browser = await chromium.launch({ headless: true });
-   const page = await browser.newPage();
-
-   await page.goto(url, { waitUntil: "domcontentloaded", timeout: 20000 });
-
-   const data = await page.evaluate(`(function() {
-     var items = Array.from(document.querySelectorAll('.athing')).slice(0, 20);
-     var results = items.map(function(el) {
-       var titleLineEl = el.querySelector('.titleline > a');
-       var title = titleLineEl ? titleLineEl.textContent.trim() : null;
-       var link = titleLineEl ? titleLineEl.getAttribute('href') : null;
-       var subtext = el.nextElementSibling;
-       var scoreEl = subtext ? subtext.querySelector('.score') : null;
-       var score = scoreEl ? scoreEl.textContent.trim() : null;
-       var ageEl = subtext ? subtext.querySelector('.age') : null;
-       var age = ageEl ? ageEl.getAttribute('title') : null;
-       var anchors = subtext ? subtext.querySelectorAll('a') : [];
-       var commentLink = anchors.length > 0 ? anchors[anchors.length - 1].textContent.trim() : null;
-       return { title: title, link: link, score: score, age: age, commentLink: commentLink };
-     });
-     return results;
-   })()`);
-
-   await browser.close();
-
-   const typedData = data as Array<{ title: string | null; link: string | null; score: string | null; age: string | null; commentLink: string | null }>;
-
-   const raw = typedData
-     .map((r, i) =>
-       [
-         `[${i + 1}] ${r.title ?? "Untitled"}`,
-         `URL: ${r.link ?? "N/A"}`,
-         `Score: ${r.score ?? "N/A"} | ${r.commentLink ?? ""}`,
-         `Posted: ${r.age ?? "unknown"}`,
-       ].join("\n")
-     )
-     .join("\n\n");
-
-   const newestDate = typedData.map((r) => r.age).filter(Boolean).sort().reverse()[0] ?? null;
-
-   return {
-     raw,
-     content_date: newestDate,
-     freshness_confidence: newestDate ? "high" : "medium",
-   };
- }