@cyanheads/mcp-ts-core 0.6.10 → 0.6.12
- package/CLAUDE.md +1 -1
- package/README.md +1 -1
- package/biome.json +1 -1
- package/changelog/0.6.x/0.6.11.md +23 -0
- package/changelog/0.6.x/0.6.12.md +15 -0
- package/dist/logs/combined.log +4 -4
- package/dist/logs/error.log +4 -4
- package/dist/utils/index.d.ts +1 -1
- package/dist/utils/index.d.ts.map +1 -1
- package/dist/utils/index.js +1 -1
- package/dist/utils/index.js.map +1 -1
- package/dist/utils/parsing/htmlExtractor.d.ts +146 -0
- package/dist/utils/parsing/htmlExtractor.d.ts.map +1 -0
- package/dist/utils/parsing/htmlExtractor.js +171 -0
- package/dist/utils/parsing/htmlExtractor.js.map +1 -0
- package/dist/utils/parsing/index.d.ts +1 -0
- package/dist/utils/parsing/index.d.ts.map +1 -1
- package/dist/utils/parsing/index.js +1 -0
- package/dist/utils/parsing/index.js.map +1 -1
- package/package.json +15 -5
- package/skills/field-test/SKILL.md +205 -82
- package/skills/report-issue-framework/SKILL.md +54 -4
- package/skills/report-issue-local/SKILL.md +60 -5
package/CLAUDE.md
CHANGED
@@ -1,6 +1,6 @@
 # Agent Protocol
 
-**Package:** `@cyanheads/mcp-ts-core` · **Version:** 0.6.
+**Package:** `@cyanheads/mcp-ts-core` · **Version:** 0.6.12
 **npm:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) · **Docker:** [ghcr.io/cyanheads/mcp-ts-core](https://ghcr.io/cyanheads/mcp-ts-core)
 
 > **Developer note:** Never assume. Read related files and docs before making changes. Read full file content for context. Never edit a file before reading it.
package/README.md
CHANGED
@@ -5,7 +5,7 @@
 
 <div align="center">
 
-[](./CHANGELOG.md) [](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/docs/specification/2025-11-25/changelog.mdx) [](https://modelcontextprotocol.io/) [](./LICENSE)
 
 [](https://www.typescriptlang.org/) [](https://bun.sh/)
 
package/biome.json
CHANGED
package/changelog/0.6.x/0.6.11.md
CHANGED
@@ -0,0 +1,23 @@
+---
+summary: Add HtmlExtractor Tier 3 utility — wraps defuddle + linkedom for extracting main article content and metadata from raw HTML into Markdown or cleaned HTML
+breaking: false
+---
+
+# 0.6.11 — 2026-04-23
+
+Adds `HtmlExtractor`, a new Tier 3 parsing utility for turning raw HTML into clean article content plus best-effort metadata. Built for MCP servers that wrap scholarly or article APIs and need to hand page content to an LLM without hand-rolling extraction.
+
+## Added
+
+- **`HtmlExtractor` (`htmlExtractor` singleton)** in `src/utils/parsing/htmlExtractor.ts` — wraps [`defuddle`](https://github.com/kepano/defuddle) (modern Readability successor, powers Obsidian Web Clipper) together with [`linkedom`](https://github.com/WebReflection/linkedom) for DOM parsing. Exported from `@cyanheads/mcp-ts-core/utils`. One method:
+- `extract(html, options?, context?)` — returns `{ title?, author?, description?, content, domain?, favicon?, image?, language?, metaTags?, parseTime?, published?, schemaOrgData?, site?, wordCount? }`. Only `content` is guaranteed; all other fields are best-effort based on what the source page exposes.
+- **Options** on `ExtractArticleOptions`: `format` (`'markdown' | 'html'`, defaults to `'markdown'`), `url`, `contentSelector`, `removeImages`, `debug`, `language`, `useAsync` (off by default — keeps extraction local and deterministic; opt in to allow Defuddle's third-party API fallbacks for SPAs like Twitter).
+- **Peer dependencies** (both optional): `defuddle ^0.18.1`, `linkedom ^0.18.12`. Install with `bun add defuddle linkedom`. Cloudflare Workers note: linkedom works in Workers but adds ~150KB minified plus entity tables — factor into your Worker size budget. JSDOM is also supported via defuddle's node entry but is not the default.
+- **Tests** at `tests/unit/utils/parsing/htmlExtractor.test.ts` covering clean articles, metadata extraction, boilerplate removal (nav/sidebar/footer), markdown vs HTML output, `contentSelector` override, `removeImages`, empty input (→ `ValidationError`), SPA shells, and malformed HTML.
+
+## Changed
+
+- **`skills/field-test`** rewritten to v2.0 — pivots from "use the MCP tools already connected in your client" to "start the HTTP server locally and drive it with curl + JSON-RPC." Adds a reusable bash helper (`mcp_start`/`mcp_init`/`mcp_call`/`mcp_stop`) that persists PID/URL/session state across tool invocations, replaces the per-definition category matrix with a universal battery (happy path, parity, input error) plus trigger-gated situational categories so it scales to large servers, and tightens the report format (summary paragraph → grouped findings → numbered cherry-pick options).
+- **`biome.json`** `$schema` URL bumped to `2.4.13` to match the `@biomejs/biome` version already in devDependencies.
+
+Resolves [#46](https://github.com/cyanheads/mcp-ts-core/issues/46).
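The rewritten `skills/field-test` drives a locally started HTTP server with curl and JSON-RPC. The sketch below shows what those requests carry: the `initialize` handshake, the follow-up `notifications/initialized`, a `tools/call`, and the headers a Streamable HTTP client sends.

Some details are assumptions for illustration. The helper names are hypothetical and unrelated to the skill's bash helpers (`mcp_start`/`mcp_init`/`mcp_call`/`mcp_stop`). The protocol version string is taken from the spec revision the README links to.

```typescript
// Hypothetical sketch: request bodies and headers a curl-driven field test
// might send to an MCP server over Streamable HTTP. These helpers are not
// part of @cyanheads/mcp-ts-core or the field-test skill.

interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcNotification {
  jsonrpc: '2.0';
  method: string;
  params?: Record<string, unknown>;
}

// Assumption: the protocol revision linked from the README badge.
const PROTOCOL_VERSION = '2025-11-25';

function buildInitialize(id: number): JsonRpcRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'initialize',
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: 'field-test-sketch', version: '0.0.0' },
    },
  };
}

// Sent once after a successful initialize; notifications carry no id.
function buildInitialized(): JsonRpcNotification {
  return { jsonrpc: '2.0', method: 'notifications/initialized' };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Clients accept both JSON and SSE responses; once the server returns a
// session id from initialize, it is echoed on every later request.
function buildHeaders(sessionId?: string, bearerToken?: string): Record<string, string> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    Accept: 'application/json, text/event-stream',
  };
  if (sessionId) headers['Mcp-Session-Id'] = sessionId;
  if (bearerToken) headers['Authorization'] = `Bearer ${bearerToken}`;
  return headers;
}
```

Each body would be `JSON.stringify`-ed and POSTed to the server's `/mcp` endpoint, the path recorded in this release's error logs. The bearer token only matters when auth is enabled, as the `Missing or invalid Authorization header` log entries illustrate. The skill itself performs these steps with curl and persists the session id between calls.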
package/changelog/0.6.x/0.6.12.md
CHANGED
@@ -0,0 +1,15 @@
+---
+summary: Enrich report-issue-framework and report-issue-local skills with Writing Well-Structured Issues guidance and an expanded feature-request template
+breaking: false
+---
+
+# 0.6.12 — 2026-04-23
+
+Thickens the two issue-filing skills so feature requests land as scannable, self-contained proposals instead of abstract "wouldn't it be nice" framings. No code changes.
+
+## Changed
+
+- **`skills/report-issue-framework/SKILL.md`** bumped to v1.2. Adds a **Writing Well-Structured Issues** section with nine bullets covering concrete openings, inline library links, `owner/repo#N` cross-repo refs, `Related: #N` provenance, philosophy-before-tradeoff framing, Markdown tables for comparisons, explicit `Scope` vs `Out of scope`, `Depends on:` ordering, and a generalized "skip collaborator-framing sign-offs" rule that works for both maintainer and external-contributor contexts. Expands the feature-request CLI template from three fields to a richer structure: concrete opening paragraph → `Related:` → `## Proposal` → `### Proposed API` → optional `Flow` (with a `trigger → resolve → fetch → degrade` hint) → optional `Design / Tradeoffs` (philosophy line + tradeoff table) → `Scope` → `Out of scope` → optional `Dependencies` → `Alternatives considered`. Adds a one-line cross-link to the `github-cli` skill for general `gh` workflows outside issue filing.
+- **`skills/report-issue-local/SKILL.md`** bumped to v1.2 with the same treatment, adapted for server context — `### Proposed behavior` instead of `### Proposed API`, examples scoped to tools/services/resources, and a `Dependencies` section that hints at cross-repo refs to upstream framework issues.
+
+Resolves [#48](https://github.com/cyanheads/mcp-ts-core/issues/48).
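The expanded feature-request template is an ordered list of sections, some of them optional. This is a minimal sketch of assembling an issue body in that order and dropping optional sections that are empty. The `FeatureRequest` shape and `renderIssueBody` are hypothetical; neither skill defines them. The heading levels below `## Proposal` for `Flow`, `Scope` and the other sections are also assumed.

```typescript
// Hypothetical input shape for a feature request; the skills describe these
// sections in prose and do not define this type.
interface FeatureRequest {
  opening: string; // concrete opening paragraph
  related?: string[]; // e.g. ['#46'] or cross-repo 'owner/repo#N'
  proposal: string;
  proposed: string; // body under "Proposed API" / "Proposed behavior"
  flow?: string; // e.g. 'trigger → resolve → fetch → degrade'
  design?: { philosophy: string; tradeoffs: Array<[option: string, pro: string, con: string]> };
  scope: string[];
  outOfScope: string[];
  dependsOn?: string[];
  alternatives: string;
}

const bullets = (items: string[]): string => items.map((item) => `- ${item}`).join('\n');

// Emits sections in the order the v1.2 template lists them, skipping empty
// optional ones. Heading levels below "## Proposal" are an assumption.
function renderIssueBody(req: FeatureRequest, target: 'framework' | 'server' = 'framework'): string {
  const parts: string[] = [req.opening];
  if (req.related?.length) parts.push(`Related: ${req.related.join(', ')}`);
  parts.push('## Proposal', req.proposal);
  parts.push(target === 'framework' ? '### Proposed API' : '### Proposed behavior', req.proposed);
  if (req.flow) parts.push('### Flow', req.flow);
  if (req.design) {
    const rows = req.design.tradeoffs.map(([option, pro, con]) => `| ${option} | ${pro} | ${con} |`);
    const table = ['| Option | Pro | Con |', '| --- | --- | --- |', ...rows].join('\n');
    // Philosophy line first, then the tradeoff table.
    parts.push('### Design / Tradeoffs', req.design.philosophy, table);
  }
  parts.push('### Scope', bullets(req.scope), '### Out of scope', bullets(req.outOfScope));
  if (req.dependsOn?.length) parts.push('### Dependencies', `Depends on: ${req.dependsOn.join(', ')}`);
  parts.push('### Alternatives considered', req.alternatives);
  return parts.join('\n\n');
}
```

The `target` switch mirrors the one difference the changelog calls out between the framework and local skills: `### Proposed API` versus `### Proposed behavior`.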
package/dist/logs/combined.log
CHANGED
@@ -1,4 +1,4 @@
-{"level":50,"time":
-{"level":50,"time":
-{"level":50,"time":
-{"level":50,"time":
+{"level":50,"time":1776966178178,"env":"testing","version":"0.0.0-test","pid":78759,"requestId":"44GLL-MUSY2","timestamp":"2026-04-23T17:42:58.177Z","operation":"HandleToolRequest","input":{"message":"blocked"},"critical":false,"errorCode":-32005,"originalErrorType":"McpError","finalErrorType":"McpError","sessionId":"c554df94160cff7175e42850d8f0f46eefbfcfdb4190cf2f23abdb6a21e0962d","toolName":"scoped_echo","tenantId":"authz-tenant","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"errorData":{"sessionId":"c554df94160cff7175e42850d8f0f46eefbfcfdb4190cf2f23abdb6a21e0962d","toolName":"scoped_echo","input":{"message":"blocked"},"requestId":"44GLL-MUSY2","timestamp":"2026-04-23T17:42:58.177Z","tenantId":"authz-tenant","operation":"HandleToolRequest","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"originalErrorName":"McpError","originalMessage":"Insufficient permissions.","originalStack":"McpError: Insufficient permissions.\n at forbidden (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:84:58)\n at withRequiredScopes (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/authUtils.js:61:15)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:68:17)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Insufficient permissions.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:101:42)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)","msg":"Error in tool:scoped_echo: Insufficient permissions."}
+{"level":50,"time":1776966179159,"env":"testing","version":"0.6.12","pid":78798,"requestId":"JM7M8-2BAH5","timestamp":"2026-04-23T17:42:59.158Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"JM7M8-2BAH5","timestamp":"2026-04-23T17:42:59.158Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
+{"level":50,"time":1776966179173,"env":"testing","version":"0.6.12","pid":78798,"requestId":"OIV34-PX46Q","timestamp":"2026-04-23T17:42:59.173Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"OIV34-PX46Q","timestamp":"2026-04-23T17:42:59.173Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Token has expired.","originalStack":"McpError: Token has expired.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at handleJoseVerifyError (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/claimParser.js:56:11)\n at verify (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/strategies/jwtStrategy.js:91:13)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Token has expired.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Token has expired."}
+{"level":50,"time":1776966179179,"env":"testing","version":"0.6.12","pid":78798,"requestId":"LRG26-T34QU","timestamp":"2026-04-23T17:42:59.179Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"GET","errorData":{"path":"/mcp","method":"GET","requestId":"LRG26-T34QU","timestamp":"2026-04-23T17:42:59.179Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
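The regenerated log files are newline-delimited JSON in what appears to be pino's format; `level` 50 is pino's `error` level. This is a small sketch for tallying such a file by level, `errorCode` and `msg`. It relies only on fields visible in these entries. The `summarizeLog` name and the key format are hypothetical. Truncated lines, like the removed entries in this hunk, are skipped rather than treated as errors.

```typescript
// Sketch: tally newline-delimited JSON log entries by level, errorCode and msg.
// Field names come from the entries in this diff; other fields are ignored.

interface LogEntry {
  level: number;
  msg?: string;
  errorCode?: number;
}

// pino's default numeric levels.
const LEVEL_NAMES: Record<number, string> = {
  10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal',
};

function summarizeLog(ndjson: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue;
    let entry: LogEntry;
    try {
      entry = JSON.parse(line) as LogEntry;
    } catch {
      continue; // truncated or partial line, e.g. '{"level":50,"time":'
    }
    const level = LEVEL_NAMES[entry.level] ?? String(entry.level);
    const key = `${level} ${entry.errorCode ?? '-'} ${entry.msg ?? ''}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Grouping on `msg` plus `errorCode` separates the permission failure (-32005) from the authentication failures (-32006) recorded here.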
package/dist/logs/error.log
CHANGED
@@ -1,4 +1,4 @@
-{"level":50,"time":
-{"level":50,"time":
-{"level":50,"time":
-{"level":50,"time":
+{"level":50,"time":1776966178178,"env":"testing","version":"0.0.0-test","pid":78759,"requestId":"44GLL-MUSY2","timestamp":"2026-04-23T17:42:58.177Z","operation":"HandleToolRequest","input":{"message":"blocked"},"critical":false,"errorCode":-32005,"originalErrorType":"McpError","finalErrorType":"McpError","sessionId":"c554df94160cff7175e42850d8f0f46eefbfcfdb4190cf2f23abdb6a21e0962d","toolName":"scoped_echo","tenantId":"authz-tenant","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"errorData":{"sessionId":"c554df94160cff7175e42850d8f0f46eefbfcfdb4190cf2f23abdb6a21e0962d","toolName":"scoped_echo","input":{"message":"blocked"},"requestId":"44GLL-MUSY2","timestamp":"2026-04-23T17:42:58.177Z","tenantId":"authz-tenant","operation":"HandleToolRequest","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"originalErrorName":"McpError","originalMessage":"Insufficient permissions.","originalStack":"McpError: Insufficient permissions.\n at forbidden (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:84:58)\n at withRequiredScopes (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/authUtils.js:61:15)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:68:17)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Insufficient permissions.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:101:42)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)","msg":"Error in tool:scoped_echo: Insufficient permissions."}
+{"level":50,"time":1776966179159,"env":"testing","version":"0.6.12","pid":78798,"requestId":"JM7M8-2BAH5","timestamp":"2026-04-23T17:42:59.158Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"JM7M8-2BAH5","timestamp":"2026-04-23T17:42:59.158Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
+{"level":50,"time":1776966179173,"env":"testing","version":"0.6.12","pid":78798,"requestId":"OIV34-PX46Q","timestamp":"2026-04-23T17:42:59.173Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"OIV34-PX46Q","timestamp":"2026-04-23T17:42:59.173Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Token has expired.","originalStack":"McpError: Token has expired.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at handleJoseVerifyError (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/claimParser.js:56:11)\n at verify (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/strategies/jwtStrategy.js:91:13)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Token has expired.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Token has expired."}
+{"level":50,"time":1776966179179,"env":"testing","version":"0.6.12","pid":78798,"requestId":"LRG26-T34QU","timestamp":"2026-04-23T17:42:59.179Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"GET","errorData":{"path":"/mcp","method":"GET","requestId":"LRG26-T34QU","timestamp":"2026-04-23T17:42:59.179Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
package/dist/utils/index.d.ts
CHANGED
@@ -13,7 +13,7 @@ export { type ChatMessage, countChatTokens, countTokens, type ModelHeuristics, }
 export { type FetchWithTimeoutOptions, fetchWithTimeout } from './network/fetchWithTimeout.js';
 export { type RetryOptions, withRetry } from './network/retry.js';
 export { DEFAULT_PAGINATION_CONFIG, decodeCursor, encodeCursor, extractCursor, type PaginatedResult, type PaginationState, paginateArray, } from './pagination/pagination.js';
-export { type AddPageOptions, Allow, CsvParser, csvParser, type DrawImageOptions, type DrawTextOptions, dateParser, type EmbedImageOptions, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, FrontmatterParser, type FrontmatterResult, frontmatterParser, JsonParser, jsonParser, type PageRange, type PdfMetadata, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, type SetMetadataOptions, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
+export { type AddPageOptions, Allow, CsvParser, csvParser, type DrawImageOptions, type DrawTextOptions, dateParser, type EmbedImageOptions, type ExtractArticleOptions, type ExtractArticleResult, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, FrontmatterParser, type FrontmatterResult, frontmatterParser, HtmlExtractor, htmlExtractor, JsonParser, jsonParser, type PageRange, type PdfMetadata, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, type SetMetadataOptions, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
 export { type Job, SchedulerService, schedulerService } from './scheduling/scheduler.js';
 export { type EntityPrefixConfig, generateRequestContextId, generateUUID, type HtmlSanitizeConfig, type IdGenerationOptions, IdGenerator, idGenerator, type PathSanitizeOptions, type RateLimitConfig, type RateLimitEntry, RateLimiter, Sanitization, type SanitizedPathInfo, type SanitizeStringOptions, sanitization, sanitizeInputForLogging, } from './security/index.js';
 export { ATTR_CODE_FUNCTION_NAME, ATTR_CODE_NAMESPACE, ATTR_GEN_AI_REQUEST_MAX_TOKENS, ATTR_GEN_AI_REQUEST_MODEL, ATTR_GEN_AI_REQUEST_STREAMING, ATTR_GEN_AI_REQUEST_TEMPERATURE, ATTR_GEN_AI_REQUEST_TOP_P, ATTR_GEN_AI_RESPONSE_MODEL, ATTR_GEN_AI_SYSTEM, ATTR_GEN_AI_TOKEN_TYPE, ATTR_GEN_AI_USAGE_INPUT_TOKENS, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, ATTR_GEN_AI_USAGE_TOTAL_TOKENS, ATTR_MCP_AUTH_FAILURE_REASON, ATTR_MCP_AUTH_METHOD, ATTR_MCP_AUTH_OUTCOME, ATTR_MCP_AUTH_SCOPES, ATTR_MCP_AUTH_SUBJECT, ATTR_MCP_CLIENT_ID, ATTR_MCP_ERROR_CLASSIFIED_CODE, ATTR_MCP_GRAPH_DURATION_MS, ATTR_MCP_GRAPH_OPERATION, ATTR_MCP_GRAPH_SUCCESS, ATTR_MCP_RESOURCE_DURATION_MS, ATTR_MCP_RESOURCE_ERROR_CODE, ATTR_MCP_RESOURCE_MIME_TYPE, ATTR_MCP_RESOURCE_SIZE_BYTES, ATTR_MCP_RESOURCE_SUCCESS, ATTR_MCP_RESOURCE_URI, ATTR_MCP_SESSION_EVENT, ATTR_MCP_SPEECH_DURATION_MS, ATTR_MCP_SPEECH_INPUT_BYTES, ATTR_MCP_SPEECH_OPERATION, ATTR_MCP_SPEECH_OUTPUT_BYTES, ATTR_MCP_SPEECH_PROVIDER, ATTR_MCP_SPEECH_SUCCESS, ATTR_MCP_STORAGE_DURATION_MS, ATTR_MCP_STORAGE_KEY_COUNT, ATTR_MCP_STORAGE_OPERATION, ATTR_MCP_STORAGE_SUCCESS, ATTR_MCP_TASK_STATUS, ATTR_MCP_TASK_STORE_TYPE, ATTR_MCP_TENANT_ID, ATTR_MCP_TOOL_BATCH_FAILED, ATTR_MCP_TOOL_BATCH_SUCCEEDED, ATTR_MCP_TOOL_DURATION_MS, ATTR_MCP_TOOL_ERROR_CATEGORY, ATTR_MCP_TOOL_ERROR_CODE, ATTR_MCP_TOOL_INPUT_BYTES, ATTR_MCP_TOOL_NAME, ATTR_MCP_TOOL_OUTPUT_BYTES, ATTR_MCP_TOOL_PARTIAL_SUCCESS, ATTR_MCP_TOOL_SUCCESS, } from './telemetry/attributes.js';
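This hunk adds `HtmlExtractor`, `htmlExtractor`, `ExtractArticleOptions` and `ExtractArticleResult` to the `utils` barrel. The 0.6.11 changelog says only `content` is guaranteed on the result; every other field is best-effort, so consumers should not assume any metadata is present.

Here is a minimal sketch of that defensive handling. `ArticleLike` is a local stand-in covering a subset of the documented fields, and its field types are assumptions. `toLlmDocument` is a hypothetical helper, not part of the package.

```typescript
// Local stand-in for part of the documented result shape. The real
// ExtractArticleResult may type these fields differently (assumption).
interface ArticleLike {
  content: string; // the only guaranteed field
  title?: string;
  author?: string;
  published?: string; // assumed to be a string
  site?: string;
  domain?: string;
  wordCount?: number;
}

// Builds an LLM-friendly document, emitting each metadata line only when the
// source page actually exposed it.
function toLlmDocument(result: ArticleLike): string {
  const header: string[] = [];
  if (result.title) header.push(`# ${result.title}`);
  const byline = [result.author, result.site ?? result.domain, result.published]
    .filter((part): part is string => Boolean(part))
    .join(' · ');
  if (byline) header.push(byline);
  if (result.wordCount !== undefined) header.push(`(${result.wordCount} words)`);
  return [...header, result.content.trim()].join('\n\n');
}
```

In a tool handler the extractor's result would be passed straight in, e.g. `toLlmDocument(await htmlExtractor.extract(html, { url }))`. The changelog does not say whether `extract` returns a promise, but `await` is harmless either way. If the real `ExtractArticleResult` types a field differently (for example `published`), adjust the stand-in interface to match.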
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAGH,OAAO,EACL,KAAK,SAAS,EACd,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,aAAa,EACb,UAAU,EACV,KAAK,iBAAiB,EACtB,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EACd,KAAK,qBAAqB,EAC1B,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,KAAK,QAAQ,EACb,KAAK,SAAS,EACd,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAE7F,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AACxE,YAAY,EACV,gBAAgB,EAChB,YAAY,EACZ,mBAAmB,EACnB,YAAY,GACb,MAAM,mCAAmC,CAAC;AAE3C,OAAO,EAAE,MAAM,EAAE,MAAM,EAAE,KAAK,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAExE,OAAO,EACL,KAAK,WAAW,EAChB,KAAK,0BAA0B,EAC/B,KAAK,cAAc,EACnB,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AAEtC,OAAO,EAAE,KAAK,mBAAmB,EAAE,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAE9E,OAAO,EACL,KAAK,WAAW,EAChB,eAAe,EACf,WAAW,EACX,KAAK,eAAe,GACrB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EAAE,KAAK,uBAAuB,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAE,KAAK,YAAY,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAElE,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EACb,KAAK,eAAe,EACpB,KAAK,eAAe,EACpB,aAAa,GACd,MAAM,4BAA4B,CAAC;AAEpC,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,EACL,SAAS,EACT,SAAS,EACT,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,UAAU,EACV,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,EACjB,UAAU,EACV,UAAU,EACV,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EACT,KAAK,kBAAkB,EACvB,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAE5B,OAAO,EAAE,KAAK,GAAG,EAAE,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AAEzF,OAAO,EACL,KAAK,kBAAkB,EACvB,wBAAwB,EACxB,YAAY,EACZ,KAAK,kBAAkB,EACvB,KAAK,mBAAmB,EACxB,WAAW,EACX,WAAW,EACX,KAAK,mBAAmB,EACxB,KAAK,eAAe,EACpB,KAAK,cAAc,EACnB,WAAW,EACX,YAAY,EACZ,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBAAyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,
EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAAqB,GACtB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AAExC,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAEhC,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EACZ,KAAK,eAAe,EACpB,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAGH,OAAO,EACL,KAAK,SAAS,EACd,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,aAAa,EACb,UAAU,EACV,KAAK,iBAAiB,EACtB,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EACd,KAAK,qBAAqB,EAC1B,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,KAAK,QAAQ,EACb,KAAK,SAAS,EACd,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAE7F,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AACxE,YAAY,EACV,gBAAgB,EAChB,YAAY,EACZ,mBAAmB,EACnB,YAAY,GACb,MAAM,mCAAmC,CAAC;AAE3C,OAAO,EAAE,MAAM,EAAE,MAAM,EAAE,KAAK,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAExE,OAAO,EACL,KAAK,WAAW,EAChB,KAAK,0BAA0B,EAC/B,KAAK,cAAc,EACnB,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AAEtC,OAAO,EAAE,KAAK,mBAAmB,EAAE,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAE9E,OAAO,EACL,KAAK,WAAW,EAChB,eAAe,EACf,WAAW,EACX,KAAK,eAAe,GACrB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EAAE,KAAK,uBAAuB,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAE,KAAK,YAAY,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAElE,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EACb,KAAK,eAAe,EACpB,KAAK,eAAe,EACpB,aAAa,GACd,MAAM,4BAA4B,CAAC;AAEpC,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,EACL,SAAS,EACT,SAAS,EACT,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,UAAU,EACV,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,KAAK,oBAAoB,EACzB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,EACjB,aAAa,EACb,aAAa,EACb,UAAU,EACV,UAAU,EACV,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EACT,KAAK,kBAAkB,EACvB,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAE5B,OAAO,EAAE,KAAK,GAAG,EAAE,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AAEzF,OAAO,EACL,KAAK,kBAAkB,EACvB,wBAAwB,EACxB,YAAY,EACZ,KAAK,kBAAkB,EACvB,KAAK,mBAAmB,EACxB,WAAW,EACX,WAAW,EACX,KAAK,mBAAmB,EACxB,KAAK,eAAe,EACpB,KAAK,cAAc,EACnB,WAAW,EACX,YAAY,EACZ,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBA
AyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAAqB,GACtB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AAExC,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAEhC,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EACZ,KAAK,eAAe,EACpB,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}

package/dist/utils/index.js
CHANGED
@@ -22,7 +22,7 @@ export { withRetry } from './network/retry.js';
 // Pagination
 export { DEFAULT_PAGINATION_CONFIG, decodeCursor, encodeCursor, extractCursor, paginateArray, } from './pagination/pagination.js';
 // Parsing
-export { Allow, CsvParser, csvParser, dateParser, FrontmatterParser, frontmatterParser, JsonParser, jsonParser, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
+export { Allow, CsvParser, csvParser, dateParser, FrontmatterParser, frontmatterParser, HtmlExtractor, htmlExtractor, JsonParser, jsonParser, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
 // Scheduling
 export { SchedulerService, schedulerService } from './scheduling/scheduler.js';
 // Security

package/dist/utils/index.js.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,aAAa;AACb,OAAO,EAGL,aAAa,EAEb,aAAa,EACb,UAAU,EAEV,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EAGd,aAAa,EAIb,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAC/B,WAAW;AACX,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAC7F,gBAAgB;AAChB,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AAOxE,SAAS;AACT,OAAO,EAAE,MAAM,EAAE,MAAM,EAAoB,MAAM,sBAAsB,CAAC;AACxE,kBAAkB;AAClB,OAAO,EAIL,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AACtC,UAAU;AACV,OAAO,EAA4B,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAC9E,iBAAiB;AACjB,OAAO,EAEL,eAAe,EACf,WAAW,GAEZ,MAAM,2BAA2B,CAAC;AACnC,UAAU;AACV,OAAO,EAAgC,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAqB,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAClE,aAAa;AACb,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EAGb,aAAa,GACd,MAAM,4BAA4B,CAAC;AACpC,UAAU;AACV,OAAO,EAEL,KAAK,EACL,SAAS,EACT,SAAS,EAGT,UAAU,
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,aAAa;AACb,OAAO,EAGL,aAAa,EAEb,aAAa,EACb,UAAU,EAEV,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EAGd,aAAa,EAIb,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAC/B,WAAW;AACX,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAC7F,gBAAgB;AAChB,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AAOxE,SAAS;AACT,OAAO,EAAE,MAAM,EAAE,MAAM,EAAoB,MAAM,sBAAsB,CAAC;AACxE,kBAAkB;AAClB,OAAO,EAIL,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AACtC,UAAU;AACV,OAAO,EAA4B,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAC9E,iBAAiB;AACjB,OAAO,EAEL,eAAe,EACf,WAAW,GAEZ,MAAM,2BAA2B,CAAC;AACnC,UAAU;AACV,OAAO,EAAgC,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAqB,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAClE,aAAa;AACb,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EAGb,aAAa,GACd,MAAM,4BAA4B,CAAC;AACpC,UAAU;AACV,OAAO,EAEL,KAAK,EACL,SAAS,EACT,SAAS,EAGT,UAAU,EAOV,iBAAiB,EAEjB,iBAAiB,EACjB,aAAa,EACb,aAAa,EACb,UAAU,EACV,UAAU,EAGV,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EAET,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAC5B,aAAa;AACb,OAAO,EAAY,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AACzF,WAAW;AACX,OAAO,EAEL,wBAAwB,EACxB,YAAY,EAGZ,WAAW,EACX,WAAW,EAIX,WAAW,EACX,YAAY,EAGZ,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAC7B,iCAAiC;AACjC,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBAAyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAA
qB,GACtB,MAAM,2BAA2B,CAAC;AACnC,8BAA8B;AAC9B,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AACxC,sBAAsB;AACtB,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAChC,oBAAoB;AACpB,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EAEZ,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAC9B,cAAc;AACd,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}

package/dist/utils/parsing/htmlExtractor.d.ts
@@ -0,0 +1,146 @@
+import { type RequestContext } from '../../utils/internal/requestContext.js';
+/**
+ * Options for HTML article extraction.
+ */
+export interface ExtractArticleOptions {
+    /**
+     * CSS selector to use as the main content element, bypassing auto-detection.
+     * If the selector does not match any element, Defuddle falls back to
+     * auto-detection.
+     */
+    contentSelector?: string;
+    /**
+     * Enable Defuddle's debug logging and bypass div flattening. Useful when
+     * diagnosing why a specific page extracts poorly. Defaults to `false`.
+     */
+    debug?: boolean;
+    /**
+     * Output format for `content`. `'markdown'` converts to Markdown (the common
+     * case for LLM-bound text), `'html'` returns cleaned HTML. Defaults to
+     * `'markdown'`.
+     */
+    format?: 'html' | 'markdown';
+    /**
+     * Preferred language for extraction and transcript selection (BCP 47, e.g.
+     * `'en'`, `'fr'`, `'ja'`).
+     */
+    language?: string;
+    /**
+     * Strip all images from the extracted content. Defaults to `false`.
+     */
+    removeImages?: boolean;
+    /**
+     * URL of the page being parsed. Passed to Defuddle for site-specific
+     * extractors and resolved link rewriting.
+     */
+    url?: string;
+    /**
+     * Allow Defuddle's async extractors to fetch from third-party APIs
+     * (e.g. FxTwitter) when no local content is available in the HTML.
+     * Defaults to `false` to keep extraction fully local and deterministic.
+     */
+    useAsync?: boolean;
+}
+/**
+ * Result of HTML article extraction.
+ *
+ * All fields except `content` are best-effort — they may be undefined if the
+ * source page does not provide the corresponding metadata.
+ */
+export interface ExtractArticleResult {
+    /** Article author, if detected. */
+    author?: string;
+    /** Cleaned main content, either as Markdown or HTML depending on `format`. */
+    content: string;
+    /** Description or summary of the article, if present in page metadata. */
+    description?: string;
+    /** Domain of the source page (e.g. `'example.com'`), if derivable. */
+    domain?: string;
+    /** URL of the source site's favicon, if detected. */
+    favicon?: string;
+    /** URL of the article's primary image, if detected. */
+    image?: string;
+    /** Page language in BCP 47 format (e.g. `'en'`, `'en-US'`), if detected. */
+    language?: string;
+    /** Meta tags extracted from the page head, keyed by name. */
+    metaTags?: Record<string, string>;
+    /** Time `defuddle` spent parsing, in milliseconds. */
+    parseTime?: number;
+    /** Publication date string, if detected. Format is source-dependent. */
+    published?: string;
+    /** Raw schema.org data extracted from the page, if present. */
+    schemaOrgData?: unknown;
+    /** Site name, if detected (e.g. from Open Graph `og:site_name`). */
+    site?: string;
+    /** Article title, if detected. */
+    title?: string;
+    /** Word count of the extracted content, as reported by `defuddle`. */
+    wordCount?: number;
+}
+/**
+ * Utility class for extracting main article content from raw HTML.
+ *
+ * Lazily loads `defuddle` and `linkedom` on first use — both are optional peer
+ * dependencies (`bun add defuddle linkedom`). Returns cleaned main content
+ * plus best-effort metadata: title, author, description, Open Graph fields,
+ * schema.org data, word count.
+ *
+ * Does not guarantee structure beyond "main content of the page." For quirky
+ * pages, malformed HTML, or SPA shells with minimal server-rendered content,
+ * the result may be sparse — callers should degrade gracefully.
+ */
+export declare class HtmlExtractor {
+    /**
+     * Extracts the main article content from an HTML string.
+     *
+     * Async due to lazy loading of `defuddle` and `linkedom`, and because
+     * Defuddle's node entry is itself async (supports async fallback extractors
+     * gated by `useAsync`).
+     *
+     * @param html - Raw HTML string to extract from.
+     * @param options - Optional extraction options (format, URL, content selector, etc.).
+     * @param context - Optional `RequestContext` for correlated logging and error metadata.
+     * @returns Extracted content and metadata. Only `content` is guaranteed to
+     * be present; all other fields are best-effort.
+     * @throws {McpError} With `ConfigurationError` if `defuddle` or `linkedom` is not installed.
+     * @throws {McpError} With `ValidationError` if the HTML string is empty after trimming,
+     * or if `defuddle` fails to parse the page.
+     *
+     * @example
+     * ```typescript
+     * import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
+     *
+     * const html = await fetch('https://example.com/article').then((r) => r.text());
+     * const result = await htmlExtractor.extract(html, {
+     *   url: 'https://example.com/article',
+     *   format: 'markdown',
+     * });
+     *
+     * console.log(result.title);
+     * console.log(result.content);
+     * ```
+     */
+    extract(html: string, options?: ExtractArticleOptions, context?: RequestContext): Promise<ExtractArticleResult>;
+}
+/**
+ * Singleton instance of {@link HtmlExtractor}.
+ *
+ * Prefer this over constructing a new `HtmlExtractor` directly. Lazily loads
+ * `defuddle` and `linkedom` on first call, so there is no startup cost if
+ * HTML extraction is never used.
+ *
+ * @example
+ * ```typescript
+ * import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
+ *
+ * const article = await htmlExtractor.extract(rawHtml, {
+ *   url: 'https://example.com/post',
+ *   format: 'markdown',
+ * });
+ *
+ * // Hand the content + metadata to the LLM
+ * llm.prompt({ title: article.title, body: article.content });
+ * ```
+ */
+export declare const htmlExtractor: HtmlExtractor;
+//# sourceMappingURL=htmlExtractor.d.ts.map
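
Only `content` is guaranteed on `ExtractArticleResult`. A caller that forwards extraction output to a model should therefore treat every other field as optional. It should also handle an empty body, which is the SPA-shell case the class docs warn about. The following is a minimal consumer-side sketch that assumes a locally declared subset of the result type. `ArticleLike` and `toPromptText` are hypothetical names and are not package exports.

```typescript
// Hypothetical subset of ExtractArticleResult: only `content` is required,
// mirroring the declaration above.
interface ArticleLike {
  content: string;
  title?: string;
  author?: string;
  site?: string;
  published?: string;
}

// Hypothetical helper: builds a header from whichever metadata fields are
// present, then appends the body. Missing fields are simply left out.
function toPromptText(article: ArticleLike): string {
  const header: string[] = [];
  if (article.title) header.push(`# ${article.title}`);

  // Join whatever byline parts exist; undefined or empty parts are dropped.
  const byline = [article.author, article.site, article.published]
    .filter((part): part is string => Boolean(part))
    .join(' | ');
  if (byline) header.push(byline);

  // An empty body means extraction found nothing usable.
  const body = article.content.trim();
  if (!body) {
    return [...header, '_No extractable article content._'].join('\n\n');
  }
  return [...header, body].join('\n\n');
}
```

The placeholder text for an empty body is an arbitrary choice. The point is that the caller decides explicitly what a sparse result means instead of forwarding an empty string.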

package/dist/utils/parsing/htmlExtractor.d.ts.map
@@ -0,0 +1 @@
+
{"version":3,"file":"htmlExtractor.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/htmlExtractor.ts"],"names":[],"mappings":"AAgCA,OAAO,EAAE,KAAK,cAAc,EAAyB,MAAM,oCAAoC,CAAC;AAqChG;;GAEG;AACH,MAAM,WAAW,qBAAqB;IACpC;;;;OAIG;IACH,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB;;;OAGG;IACH,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB;;;;OAIG;IACH,MAAM,CAAC,EAAE,MAAM,GAAG,UAAU,CAAC;IAC7B;;;OAGG;IACH,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB;;OAEG;IACH,YAAY,CAAC,EAAE,OAAO,CAAC;IACvB;;;OAGG;IACH,GAAG,CAAC,EAAE,MAAM,CAAC;IACb;;;;OAIG;IACH,QAAQ,CAAC,EAAE,OAAO,CAAC;CACpB;AAED;;;;;GAKG;AACH,MAAM,WAAW,oBAAoB;IACnC,mCAAmC;IACnC,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,8EAA8E;IAC9E,OAAO,EAAE,MAAM,CAAC;IAChB,0EAA0E;IAC1E,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,sEAAsE;IACtE,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,qDAAqD;IACrD,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,uDAAuD;IACvD,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,4EAA4E;IAC5E,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,6DAA6D;IAC7D,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IAClC,sDAAsD;IACtD,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,wEAAwE;IACxE,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,+DAA+D;IAC/D,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,oEAAoE;IACpE,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,kCAAkC;IAClC,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,sEAAsE;IACtE,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAED;;;;;;;;;;;GAWG;AACH,qBAAa,aAAa;IACxB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;OA6BG;IACG,OAAO,CACX,IAAI,EAAE,MAAM,EACZ,OAAO,CAAC,EAAE,qBAAqB,EAC/B,OAAO,CAAC,EAAE,cAAc,GACvB,OAAO,CAAC,oBAAoB,CAAC;CA4EjC;AAED;;;;;;;;;;;;;;;;;;;GAmBG;AACH,eAAO,MAAM,aAAa,eAAsB,CAAC"}
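
The declarations above say that `defuddle` and `linkedom` load lazily on first call, so extraction that is never used costs nothing at startup. The package's internal `lazyImport` helper is not part of this diff. What follows is only a sketch of the general pattern, under two assumptions:

- The loader memoizes the import promise.
- A failed import is rethrown with the install hint attached.

The `lazyLoad` name, the retry-after-failure behavior, and the plain `Error` type are all hypothetical. Per the `@throws` notes, the real code surfaces a missing dependency as an `McpError` with `ConfigurationError`.

```typescript
// Hypothetical stand-in for an internal lazy-import helper. Returns a loader
// that runs `importer` at most once and hands every caller the same promise.
function lazyLoad<T>(importer: () => Promise<T>, installHint: string): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = importer().catch((err: unknown) => {
        // Forget the failed attempt so a later call can retry (e.g. after install).
        cached = undefined;
        const reason = err instanceof Error ? err.message : String(err);
        throw new Error(`${installHint} (${reason})`);
      });
    }
    return cached;
  };
}

// Usage in the spirit of the compiled module (the module names are real
// packages, but this wiring is illustrative only):
// const getLinkedom = lazyLoad(() => import('linkedom'), 'Install "linkedom": bun add linkedom');
```

Caching the promise rather than the resolved module means that concurrent first calls share a single import.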

package/dist/utils/parsing/htmlExtractor.js
@@ -0,0 +1,171 @@
+import { McpError, validationError } from '../../types-global/errors.js';
+import { lazyImport } from '../../utils/internal/lazyImport.js';
+import { logger } from '../../utils/internal/logger.js';
+import { requestContextService } from '../../utils/internal/requestContext.js';
+const getDefuddle = lazyImport(() => import('defuddle/node'), 'Install "defuddle" to use HTML article extraction: bun add defuddle linkedom');
+const getLinkedom = lazyImport(() => import('linkedom'), 'Install "linkedom" to use HTML article extraction: bun add defuddle linkedom');
+/** Flattens defuddle's `MetaTagItem[]` into a `Record<string, string>` keyed
+ * by `name ?? property`. Items without a usable key or content are skipped;
+ * returns `undefined` if nothing usable is left so callers can omit the
+ * field from the result entirely. */
+function flattenMetaTags(tags) {
+    if (!tags || tags.length === 0)
+        return;
+    const out = {};
+    for (const tag of tags) {
+        const key = tag.name ?? tag.property;
+        if (!key || tag.content == null)
+            continue;
+        out[key] = tag.content;
+    }
+    return Object.keys(out).length > 0 ? out : undefined;
+}
+/**
+ * Utility class for extracting main article content from raw HTML.
+ *
+ * Lazily loads `defuddle` and `linkedom` on first use — both are optional peer
+ * dependencies (`bun add defuddle linkedom`). Returns cleaned main content
+ * plus best-effort metadata: title, author, description, Open Graph fields,
+ * schema.org data, word count.
+ *
+ * Does not guarantee structure beyond "main content of the page." For quirky
+ * pages, malformed HTML, or SPA shells with minimal server-rendered content,
+ * the result may be sparse — callers should degrade gracefully.
+ */
+export class HtmlExtractor {
+    /**
+     * Extracts the main article content from an HTML string.
+     *
+     * Async due to lazy loading of `defuddle` and `linkedom`, and because
+     * Defuddle's node entry is itself async (supports async fallback extractors
+     * gated by `useAsync`).
+     *
+     * @param html - Raw HTML string to extract from.
+     * @param options - Optional extraction options (format, URL, content selector, etc.).
+     * @param context - Optional `RequestContext` for correlated logging and error metadata.
+     * @returns Extracted content and metadata. Only `content` is guaranteed to
+     * be present; all other fields are best-effort.
+     * @throws {McpError} With `ConfigurationError` if `defuddle` or `linkedom` is not installed.
+     * @throws {McpError} With `ValidationError` if the HTML string is empty after trimming,
+     * or if `defuddle` fails to parse the page.
+     *
+     * @example
+     * ```typescript
+     * import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
+     *
+     * const html = await fetch('https://example.com/article').then((r) => r.text());
+     * const result = await htmlExtractor.extract(html, {
+     *   url: 'https://example.com/article',
+     *   format: 'markdown',
+     * });
+     *
+     * console.log(result.title);
+     * console.log(result.content);
+     * ```
+     */
+    async extract(html, options, context) {
+        const logContext = context ??
+            requestContextService.createRequestContext({
+                operation: 'HtmlExtractor.extract',
+            });
+        const trimmed = html.trim();
+        if (!trimmed) {
+            throw validationError('HTML string is empty.', context);
+        }
+        const [{ Defuddle }, { parseHTML }] = await Promise.all([getDefuddle(), getLinkedom()]);
+        const format = options?.format ?? 'markdown';
+        const defuddleOptions = {
+            markdown: format === 'markdown',
+            useAsync: options?.useAsync ?? false,
+            ...(options?.contentSelector !== undefined && {
+                contentSelector: options.contentSelector,
+            }),
+            ...(options?.removeImages !== undefined && {
+                removeImages: options.removeImages,
+            }),
+            ...(options?.debug !== undefined && { debug: options.debug }),
+            ...(options?.language !== undefined && { language: options.language }),
+        };
+        logger.debug('Extracting article content from HTML.', {
+            ...logContext,
+            byteLength: trimmed.length,
+            format,
+            hasUrl: Boolean(options?.url),
+            hasContentSelector: Boolean(options?.contentSelector),
+        });
+        try {
+            const { document } = parseHTML(trimmed);
+            const result = await Defuddle(document, options?.url, defuddleOptions);
+            logger.debug('Successfully extracted article.', {
+                ...logContext,
+                wordCount: result.wordCount,
+                titlePresent: Boolean(result.title),
+                parseTimeMs: result.parseTime,
+            });
+            const out = { content: result.content ?? '' };
+            if (result.title)
+                out.title = result.title;
+            if (result.author)
+                out.author = result.author;
+            if (result.description)
+                out.description = result.description;
+            if (result.domain)
+                out.domain = result.domain;
+            if (result.favicon)
+                out.favicon = result.favicon;
+            if (result.image)
+                out.image = result.image;
+            if (result.language)
+                out.language = result.language;
+            if (result.published)
+                out.published = result.published;
+            if (result.site)
+                out.site = result.site;
+            if (typeof result.parseTime === 'number')
+                out.parseTime = result.parseTime;
+            if (typeof result.wordCount === 'number')
+                out.wordCount = result.wordCount;
+            if (result.schemaOrgData)
+                out.schemaOrgData = result.schemaOrgData;
+            const metaTags = flattenMetaTags(result.metaTags);
+            if (metaTags)
+                out.metaTags = metaTags;
+            return out;
+        }
+        catch (e) {
+            if (e instanceof McpError)
+                throw e;
+            const error = e instanceof Error ? e : new Error(String(e));
+            logger.error('Failed to extract article from HTML.', {
+                ...logContext,
+                errorDetails: error.message,
+            });
+            throw validationError(`Failed to extract article from HTML: ${error.message}`, {
+                ...context,
+                rawError: error.stack ?? String(error),
+            });
+        }
+    }
+}
+/**
+ * Singleton instance of {@link HtmlExtractor}.
+ *
+ * Prefer this over constructing a new `HtmlExtractor` directly. Lazily loads
+ * `defuddle` and `linkedom` on first call, so there is no startup cost if
+ * HTML extraction is never used.
+ *
+ * @example
+ * ```typescript
+ * import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
+ *
+ * const article = await htmlExtractor.extract(rawHtml, {
+ *   url: 'https://example.com/post',
+ *   format: 'markdown',
+ * });
+ *
+ * // Hand the content + metadata to the LLM
+ * llm.prompt({ title: article.title, body: article.content });
+ * ```
+ */
+export const htmlExtractor = new HtmlExtractor();
+//# sourceMappingURL=htmlExtractor.js.map
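
`extract` forwards optional settings to Defuddle with the `...(value !== undefined && { key: value })` idiom. When the guard is false, the spread operand is `false`, which contributes no properties. Unset options therefore never reach Defuddle as explicit `undefined` keys. The guard tests `!== undefined` rather than truthiness, so an explicit `debug: false` is still forwarded. The following is a standalone illustration; the `Opts` type and `buildOptions` function are hypothetical.

```typescript
// Hypothetical options shape, loosely modeled on the `defuddleOptions` object above.
interface Opts {
  markdown: boolean;
  language?: string;
  debug?: boolean;
}

interface Input {
  format?: 'html' | 'markdown';
  language?: string;
  debug?: boolean;
}

function buildOptions(input: Input): Opts {
  const format = input.format ?? 'markdown';
  return {
    markdown: format === 'markdown',
    // Spreading `false` adds nothing, so absent inputs leave no key behind.
    ...(input.language !== undefined && { language: input.language }),
    // `!== undefined` (not truthiness) keeps an explicit `false`.
    ...(input.debug !== undefined && { debug: input.debug }),
  };
}
```

Omitting the key matters when the downstream library uses its own defaults for missing options. Passing `undefined` explicitly can behave differently from omitting the key, for example when the library merges options with `Object.assign`.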

package/dist/utils/parsing/htmlExtractor.js.map
@@ -0,0 +1 @@
+
{"version":3,"file":"htmlExtractor.js","sourceRoot":"","sources":["../../../src/utils/parsing/htmlExtractor.ts"],"names":[],"mappings":"AA6BA,OAAO,EAAE,QAAQ,EAAE,eAAe,EAAE,MAAM,0BAA0B,CAAC;AACrE,OAAO,EAAE,UAAU,EAAE,MAAM,gCAAgC,CAAC;AAC5D,OAAO,EAAE,MAAM,EAAE,MAAM,4BAA4B,CAAC;AACpD,OAAO,EAAuB,qBAAqB,EAAE,MAAM,oCAAoC,CAAC;AAEhG,MAAM,WAAW,GAAG,UAAU,CAC5B,GAAG,EAAE,CAAC,MAAM,CAAC,eAAe,CAAC,EAC7B,8EAA8E,CAC/E,CAAC;AAEF,MAAM,WAAW,GAAG,UAAU,CAC5B,GAAG,EAAE,CAAC,MAAM,CAAC,UAAU,CAAC,EACxB,8EAA8E,CAC/E,CAAC;AAUF;;;sCAGsC;AACtC,SAAS,eAAe,CACtB,IAAuC;IAEvC,IAAI,CAAC,IAAI,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO;IACvC,MAAM,GAAG,GAA2B,EAAE,CAAC;IACvC,KAAK,MAAM,GAAG,IAAI,IAAI,EAAE,CAAC;QACvB,MAAM,GAAG,GAAG,GAAG,CAAC,IAAI,IAAI,GAAG,CAAC,QAAQ,CAAC;QACrC,IAAI,CAAC,GAAG,IAAI,GAAG,CAAC,OAAO,IAAI,IAAI;YAAE,SAAS;QAC1C,GAAG,CAAC,GAAG,CAAC,GAAG,GAAG,CAAC,OAAO,CAAC;IACzB,CAAC;IACD,OAAO,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,SAAS,CAAC;AACvD,CAAC;AAkFD;;;;;;;;;;;GAWG;AACH,MAAM,OAAO,aAAa;IACxB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;OA6BG;IACH,KAAK,CAAC,OAAO,CACX,IAAY,EACZ,OAA+B,EAC/B,OAAwB;QAExB,MAAM,UAAU,GACd,OAAO;YACP,qBAAqB,CAAC,oBAAoB,CAAC;gBACzC,SAAS,EAAE,uBAAuB;aACnC,CAAC,CAAC;QAEL,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,EAAE,CAAC;QAC5B,IAAI,CAAC,OAAO,EAAE,CAAC;YACb,MAAM,eAAe,CAAC,uBAAuB,EAAE,OAAO,CAAC,CAAC;QAC1D,CAAC;QAED,MAAM,CAAC,EAAE,QAAQ,EAAE,EAAE,EAAE,SAAS,EAAE,CAAC,GAAG,MAAM,OAAO,CAAC,GAAG,CAAC,CAAC,WAAW,EAAE,EAAE,WAAW,EAAE,CAAC,CAAC,CAAC;QAExF,MAAM,MAAM,GAAG,OAAO,EAAE,MAAM,IAAI,UAAU,CAAC;QAC7C,MAAM,eAAe,GAAoB;YACvC,QAAQ,EAAE,MAAM,KAAK,UAAU;YAC/B,QAAQ,EAAE,OAAO,EAAE,QAAQ,IAAI,KAAK;YACpC,GAAG,CAAC,OAAO,EAAE,eAAe,KAAK,SAAS,IAAI;gBAC5C,eAAe,EAAE,OAAO,CAAC,eAAe;aACzC,CAAC;YACF,GAAG,CAAC,OAAO,EAAE,YAAY,KAAK,SAAS,IAAI;gBACzC,YAAY,EAAE,OAAO,CAAC,YAAY;aACnC,CAAC;YACF,GAAG,CAAC,OAAO,EAAE,KAAK,KAAK,SAAS,IAAI,EAAE,KAAK,EAAE,OAAO,CAAC,KAAK,EAAE,CAAC;YAC7D,GAAG,CAAC,OAAO,EAAE,QAAQ,KAAK,SAAS,IAAI,EAAE,QAAQ,EAAE,OAAO,CAAC,QAAQ,EAAE,CAAC;SACvE,CAAC;QAEF,MAAM,CAAC,KAAK,CAAC,
uCAAuC,EAAE;YACpD,GAAG,UAAU;YACb,UAAU,EAAE,OAAO,CAAC,MAAM;YAC1B,MAAM;YACN,MAAM,EAAE,OAAO,CAAC,OAAO,EAAE,GAAG,CAAC;YAC7B,kBAAkB,EAAE,OAAO,CAAC,OAAO,EAAE,eAAe,CAAC;SACtD,CAAC,CAAC;QAEH,IAAI,CAAC;YACH,MAAM,EAAE,QAAQ,EAAE,GAAG,SAAS,CAAC,OAAO,CAAC,CAAC;YACxC,MAAM,MAAM,GAAG,MAAM,QAAQ,CAAC,QAAQ,EAAE,OAAO,EAAE,GAAG,EAAE,eAAe,CAAC,CAAC;YAEvE,MAAM,CAAC,KAAK,CAAC,iCAAiC,EAAE;gBAC9C,GAAG,UAAU;gBACb,SAAS,EAAE,MAAM,CAAC,SAAS;gBAC3B,YAAY,EAAE,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC;gBACnC,WAAW,EAAE,MAAM,CAAC,SAAS;aAC9B,CAAC,CAAC;YAEH,MAAM,GAAG,GAAyB,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,IAAI,EAAE,EAAE,CAAC;YACpE,IAAI,MAAM,CAAC,KAAK;gBAAE,GAAG,CAAC,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC;YAC3C,IAAI,MAAM,CAAC,MAAM;gBAAE,GAAG,CAAC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC;YAC9C,IAAI,MAAM,CAAC,WAAW;gBAAE,GAAG,CAAC,WAAW,GAAG,MAAM,CAAC,WAAW,CAAC;YAC7D,IAAI,MAAM,CAAC,MAAM;gBAAE,GAAG,CAAC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC;YAC9C,IAAI,MAAM,CAAC,OAAO;gBAAE,GAAG,CAAC,OAAO,GAAG,MAAM,CAAC,OAAO,CAAC;YACjD,IAAI,MAAM,CAAC,KAAK;gBAAE,GAAG,CAAC,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC;YAC3C,IAAI,MAAM,CAAC,QAAQ;gBAAE,GAAG,CAAC,QAAQ,GAAG,MAAM,CAAC,QAAQ,CAAC;YACpD,IAAI,MAAM,CAAC,SAAS;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YACvD,IAAI,MAAM,CAAC,IAAI;gBAAE,GAAG,CAAC,IAAI,GAAG,MAAM,CAAC,IAAI,CAAC;YACxC,IAAI,OAAO,MAAM,CAAC,SAAS,KAAK,QAAQ;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YAC3E,IAAI,OAAO,MAAM,CAAC,SAAS,KAAK,QAAQ;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YAC3E,IAAI,MAAM,CAAC,aAAa;gBAAE,GAAG,CAAC,aAAa,GAAG,MAAM,CAAC,aAAa,CAAC;YACnE,MAAM,QAAQ,GAAG,eAAe,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC;YAClD,IAAI,QAAQ;gBAAE,GAAG,CAAC,QAAQ,GAAG,QAAQ,CAAC;YACtC,OAAO,GAAG,CAAC;QACb,CAAC;QAAC,OAAO,CAAU,EAAE,CAAC;YACpB,IAAI,CAAC,YAAY,QAAQ;gBAAE,MAAM,CAAC,CAAC;YACnC,MAAM,KAAK,GAAG,CAAC,YAAY,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,CAAC;YAC5D,MAAM,CAAC,KAAK,CAAC,sCAAsC,EAAE;gBACnD,GAAG,UAAU;gBACb,YAAY,EAAE,KAAK,CAAC,OAAO;aAC5B,CAAC,CAAC;YACH,MAAM,eAAe,CAAC,wCAAwC,KAAK,CAAC,OAAO,EAAE,EAAE;gBAC7E,GAAG,OAAO;gBACV,QAAQ,EAAE,KAAK,CAAC,K
AAK,IAAI,MAAM,CAAC,KAAK,CAAC;aACvC,CAAC,CAAC;QACL,CAAC;IACH,CAAC;CACF;AAED;;;;;;;;;;;;;;;;;;;GAmBG;AACH,MAAM,CAAC,MAAM,aAAa,GAAG,IAAI,aAAa,EAAE,CAAC"}

package/dist/utils/parsing/index.d.ts
CHANGED
@@ -7,6 +7,7 @@
 export { CsvParser, csvParser } from './csvParser.js';
 export { dateParser, parseDateString, parseDateStringDetailed } from './dateParser.js';
 export { FrontmatterParser, type FrontmatterResult, frontmatterParser, } from './frontmatterParser.js';
+export { type ExtractArticleOptions, type ExtractArticleResult, HtmlExtractor, htmlExtractor, } from './htmlExtractor.js';
 export { Allow, JsonParser, jsonParser } from './jsonParser.js';
 export { type AddPageOptions, type DrawImageOptions, type DrawTextOptions, type EmbedImageOptions, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, type PageRange, type PdfMetadata, PdfParser, pdfParser, type SetMetadataOptions, } from './pdfParser.js';
 export { thinkBlockRegex } from './thinkBlock.js';

package/dist/utils/parsing/index.d.ts.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,SAAS,EACT,KAAK,kBAAkB,GACxB,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EACL,KAAK,qBAAqB,EAC1B,KAAK,oBAAoB,EACzB,aAAa,EACb,aAAa,GACd,MAAM,oBAAoB,CAAC;AAC5B,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,SAAS,EACT,KAAK,kBAAkB,GACxB,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}

package/dist/utils/parsing/index.js
CHANGED
@@ -7,6 +7,7 @@
 export { CsvParser, csvParser } from './csvParser.js';
 export { dateParser, parseDateString, parseDateStringDetailed } from './dateParser.js';
 export { FrontmatterParser, frontmatterParser, } from './frontmatterParser.js';
+export { HtmlExtractor, htmlExtractor, } from './htmlExtractor.js';
 export { Allow, JsonParser, jsonParser } from './jsonParser.js';
 export { PdfParser, pdfParser, } from './pdfParser.js';
 export { thinkBlockRegex } from './thinkBlock.js';

package/dist/utils/parsing/index.js.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EAEjB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EAUL,SAAS,EACT,SAAS,GAEV,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EAEjB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAGL,aAAa,EACb,aAAa,GACd,MAAM,oBAAoB,CAAC;AAC5B,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EAUL,SAAS,EACT,SAAS,GAEV,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}