@cyanheads/mcp-ts-core 0.6.9 → 0.6.11
- package/CLAUDE.md +3 -3
- package/README.md +2 -2
- package/biome.json +1 -1
- package/changelog/0.6.x/0.6.10.md +21 -0
- package/changelog/0.6.x/0.6.11.md +23 -0
- package/dist/logs/combined.log +4 -0
- package/dist/logs/error.log +4 -0
- package/dist/logs/interactions.log +0 -0
- package/dist/utils/index.d.ts +1 -1
- package/dist/utils/index.d.ts.map +1 -1
- package/dist/utils/index.js +1 -1
- package/dist/utils/index.js.map +1 -1
- package/dist/utils/parsing/htmlExtractor.d.ts +146 -0
- package/dist/utils/parsing/htmlExtractor.d.ts.map +1 -0
- package/dist/utils/parsing/htmlExtractor.js +171 -0
- package/dist/utils/parsing/htmlExtractor.js.map +1 -0
- package/dist/utils/parsing/index.d.ts +1 -0
- package/dist/utils/parsing/index.d.ts.map +1 -1
- package/dist/utils/parsing/index.js +1 -0
- package/dist/utils/parsing/index.js.map +1 -1
- package/package.json +19 -9
- package/skills/field-test/SKILL.md +205 -82
- package/skills/polish-docs-meta/references/package-meta.md +1 -1
- package/skills/release-and-publish/SKILL.md +150 -0
- package/skills/setup/SKILL.md +64 -14
- package/skills/release/SKILL.md +0 -142
package/CLAUDE.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Agent Protocol
|
|
2
2
|
|
|
3
|
-
**Package:** `@cyanheads/mcp-ts-core` · **Version:** 0.6.
|
|
3
|
+
**Package:** `@cyanheads/mcp-ts-core` · **Version:** 0.6.11
|
|
4
4
|
**npm:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) · **Docker:** [ghcr.io/cyanheads/mcp-ts-core](https://ghcr.io/cyanheads/mcp-ts-core)
|
|
5
5
|
|
|
6
6
|
> **Developer note:** Never assume. Read related files and docs before making changes. Read full file content for context. Never edit a file before reading it.
|
|
@@ -468,7 +468,7 @@ Detailed method signatures, options, and examples live in skill files. Read the
|
|
|
468
468
|
| `polish-docs-meta` | `skills/polish-docs-meta/SKILL.md` | Finalize docs, README, metadata, and agent protocol for shipping |
|
|
469
469
|
| `report-issue-framework` | `skills/report-issue-framework/SKILL.md` | File a bug or feature request against `@cyanheads/mcp-ts-core` via `gh` CLI |
|
|
470
470
|
| `report-issue-local` | `skills/report-issue-local/SKILL.md` | File a bug or feature request against this server's own repo via `gh` CLI |
|
|
471
|
-
| `release` | `skills/release/SKILL.md` |
|
|
471
|
+
| `release-and-publish` | `skills/release-and-publish/SKILL.md` | Post-wrapup ship workflow: verification gate, push, publish to npm/MCP Registry/GHCR |
|
|
472
472
|
| `maintenance` | `skills/maintenance/SKILL.md` | Dependency updates, housekeeping tasks |
|
|
473
473
|
| `migrate-mcp-ts-template` | `skills/migrate-mcp-ts-template/SKILL.md` | Migrate legacy template fork to package dependency |
|
|
474
474
|
|
|
@@ -563,7 +563,7 @@ Pre-release versions (`0.6.0-beta.1`, `0.6.0-rc.1`, etc.) are consolidated as `#
|
|
|
563
563
|
|
|
564
564
|
## Publishing
|
|
565
565
|
|
|
566
|
-
Run the `release` skill
|
|
566
|
+
Run the `release-and-publish` skill — it runs the verification gate (`devcheck`, `rebuild`, `test:all`), pushes commits and tags, and publishes to every applicable destination. The full reference:
|
|
567
567
|
|
|
568
568
|
```bash
|
|
569
569
|
bun publish --access public
|
package/README.md
CHANGED
|
@@ -5,7 +5,7 @@
|
|
|
5
5
|
|
|
6
6
|
<div align="center">
|
|
7
7
|
|
|
8
|
-
[](./CHANGELOG.md) [](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/docs/specification/2025-11-25/changelog.mdx) [](https://modelcontextprotocol.io/) [](./LICENSE)
|
|
9
9
|
|
|
10
10
|
[](https://www.typescriptlang.org/) [](https://bun.sh/)
|
|
11
11
|
|
|
@@ -43,7 +43,7 @@ bun install
|
|
|
43
43
|
|
|
44
44
|
You get a scaffolded project with `CLAUDE.md`, Agent Skills, and a `src/` tree ready for your tools. Infrastructure — transports, auth, storage, telemetry, lifecycle, linting — lives in `node_modules`. What's left is domain: which APIs to wrap, which workflows to expose.
|
|
45
45
|
|
|
46
|
-
Start your coding agent (Claude Code, Codex, Cursor), describe the system you want to expose, and it drives the build. The included skills cover the full cycle: `setup`, `design-mcp-server`, scaffolding, testing, release
|
|
46
|
+
Start your coding agent (Claude Code, Codex, Cursor), describe the system you want to expose, and it drives the build. The included skills cover the full cycle: `setup`, `design-mcp-server`, scaffolding, testing, `release-and-publish`.
|
|
47
47
|
|
|
48
48
|
### What you get
|
|
49
49
|
|
package/biome.json
CHANGED
package/changelog/0.6.x/0.6.10.md
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: Rename release skill to release-and-publish with an end-to-end ship workflow, expand setup skill scaffolding docs, and bump @cloudflare/workers-types, @supabase/supabase-js, and vite
|
|
3
|
+
breaking: false
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# 0.6.10 — 2026-04-23
|
|
7
|
+
|
|
8
|
+
Rework the release workflow into a post-wrapup publisher and expand the setup skill to match what `init` actually scaffolds. No library API changes.
|
|
9
|
+
|
|
10
|
+
## Changed
|
|
11
|
+
|
|
12
|
+
- **`release` skill renamed to `release-and-publish`** and rewritten as a post-wrapup shipping workflow (version bumps, changelog, commits, and tagging are assumed done by the git wrapup protocol). The skill now runs the verification gate (`devcheck`, `rebuild`, `test:all`), pushes commits and tags, then publishes to npm, the MCP Registry, and GHCR — halting on the first non-zero exit and reporting partial state so the user can resume manually. Audience flipped from `internal` to `external`; version bumped to `2.0`.
|
|
13
|
+
- **`setup` skill expanded** — project tree listing now reflects everything `init` actually creates (configs, Dockerfile, `.vscode/`, `server.json`, starter tests, app-tool + app-resource pair). Added a "Changelog Convention" section and a "Next Steps" progression through `design-mcp-server` → scaffolding skills → `add-test` → `field-test` → `polish-docs-meta` → `maintenance`. Version bumped to `1.5`.
|
|
14
|
+
- **`publish-mcp` script** now logs into the MCP Publisher with a Keychain-stored GitHub PAT before publishing: `mcp-publisher login github -token "$(security find-generic-password -a "$USER" -s mcp-publisher-github-pat -w)" && mcp-publisher publish`. Matches the flow documented in the `release-and-publish` skill.
|
|
15
|
+
- **Agent protocol references updated** — `CLAUDE.md`, `AGENTS.md`, `README.md`, and `skills/polish-docs-meta/references/package-meta.md` now point at `release-and-publish` instead of the old `release` skill.
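The halt-on-first-failure behaviour described for `release-and-publish` can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `Step`, `ShipReport`, and `runShipWorkflow` are hypothetical names, and each step is modelled as a function returning an exit code rather than a real shell command.

```typescript
// Hypothetical types and names for illustration only; none of them come from
// the package or the skill file.
type Step = { name: string; run: () => number }; // run() returns an exit code

interface ShipReport {
  completed: string[];
  failed?: { name: string; exitCode: number };
  skipped: string[];
}

// Runs steps in order and stops at the first non-zero exit code, recording
// what finished, what failed, and what was never attempted.
function runShipWorkflow(steps: Step[]): ShipReport {
  const result: ShipReport = { completed: [], skipped: [] };
  let index = 0;
  for (const step of steps) {
    const exitCode = step.run();
    if (exitCode !== 0) {
      result.failed = { name: step.name, exitCode };
      result.skipped = steps.slice(index + 1).map((s) => s.name);
      return result;
    }
    result.completed.push(step.name);
    index++;
  }
  return result;
}

// Simulated run: the verification gate and push succeed, npm publish fails.
const report = runShipWorkflow([
  { name: 'devcheck', run: () => 0 },
  { name: 'rebuild', run: () => 0 },
  { name: 'test:all', run: () => 0 },
  { name: 'push', run: () => 0 },
  { name: 'publish:npm', run: () => 1 },
  { name: 'publish:mcp-registry', run: () => 0 },
  { name: 'publish:ghcr', run: () => 0 },
]);
console.log(JSON.stringify(report, null, 2));
```

Returning a report instead of throwing keeps the partial state (completed, failed, skipped) in one place, which is what a user would need in order to resume the remaining steps manually.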
|
|
16
|
+
|
|
17
|
+
## Dependencies
|
|
18
|
+
|
|
19
|
+
- `@cloudflare/workers-types` `^4.20260422.1` → `^4.20260423.1`
|
|
20
|
+
- `@supabase/supabase-js` `^2.104.0` → `^2.104.1`
|
|
21
|
+
- `vite` `8.0.9` → `8.0.10`
|
package/changelog/0.6.x/0.6.11.md
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: Add HtmlExtractor Tier 3 utility — wraps defuddle + linkedom for extracting main article content and metadata from raw HTML into Markdown or cleaned HTML
|
|
3
|
+
breaking: false
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# 0.6.11 — 2026-04-23
|
|
7
|
+
|
|
8
|
+
Adds `HtmlExtractor`, a new Tier 3 parsing utility for turning raw HTML into clean article content plus best-effort metadata. Built for MCP servers that wrap scholarly or article APIs and need to hand page content to an LLM without hand-rolling extraction.
|
|
9
|
+
|
|
10
|
+
## Added
|
|
11
|
+
|
|
12
|
+
- **`HtmlExtractor` (`htmlExtractor` singleton)** in `src/utils/parsing/htmlExtractor.ts` — wraps [`defuddle`](https://github.com/kepano/defuddle) (modern Readability successor, powers Obsidian Web Clipper) together with [`linkedom`](https://github.com/WebReflection/linkedom) for DOM parsing. Exported from `@cyanheads/mcp-ts-core/utils`. One method:
|
|
13
|
+
- `extract(html, options?, context?)` — returns `{ title?, author?, description?, content, domain?, favicon?, image?, language?, metaTags?, parseTime?, published?, schemaOrgData?, site?, wordCount? }`. Only `content` is guaranteed; all other fields are best-effort based on what the source page exposes.
|
|
14
|
+
- **Options** on `ExtractArticleOptions`: `format` (`'markdown' | 'html'`, defaults to `'markdown'`), `url`, `contentSelector`, `removeImages`, `debug`, `language`, `useAsync` (off by default — keeps extraction local and deterministic; opt in to allow Defuddle's third-party API fallbacks for SPAs like Twitter).
|
|
15
|
+
- **Peer dependencies** (both optional): `defuddle ^0.18.1`, `linkedom ^0.18.12`. Install with `bun add defuddle linkedom`. Cloudflare Workers note: linkedom works in Workers but adds ~150KB minified plus entity tables — factor into your Worker size budget. JSDOM is also supported via defuddle's node entry but is not the default.
|
|
16
|
+
- **Tests** at `tests/unit/utils/parsing/htmlExtractor.test.ts` covering clean articles, metadata extraction, boilerplate removal (nav/sidebar/footer), markdown vs HTML output, `contentSelector` override, `removeImages`, empty input (→ `ValidationError`), SPA shells, and malformed HTML.
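The documented defaults and the "only `content` is guaranteed" contract can be sketched without the library. The interfaces below are local mirrors of the field names listed above; the value types for `url`, `contentSelector`, `removeImages`, `debug`, `language`, `title`, and `wordCount` are assumptions, and `withDocumentedDefaults` and `summarizeArticle` are hypothetical helpers, not package exports.

```typescript
// Local mirror of the option names from the changelog. Value types other than
// `format` are assumptions; the real ExtractArticleOptions may differ.
interface ExtractArticleOptionsSketch {
  format?: 'markdown' | 'html';
  url?: string;
  contentSelector?: string;
  removeImages?: boolean;
  debug?: boolean;
  language?: string;
  useAsync?: boolean;
}

interface ResolvedOptionsSketch extends ExtractArticleOptionsSketch {
  format: 'markdown' | 'html';
  useAsync: boolean;
}

// Hypothetical helper: applies the two defaults the changelog documents
// (format defaults to 'markdown', useAsync is off by default).
function withDocumentedDefaults(
  options: ExtractArticleOptionsSketch = {},
): ResolvedOptionsSketch {
  return {
    ...options,
    format: options.format ?? 'markdown',
    useAsync: options.useAsync ?? false,
  };
}

// Subset of the documented result shape: only `content` is required.
interface ExtractArticleResultSketch {
  content: string;
  title?: string;
  wordCount?: number;
}

// Hypothetical consumer that treats every field except `content` as optional.
function summarizeArticle(result: ExtractArticleResultSketch): string {
  const title = result.title ?? '(untitled)';
  const words =
    result.wordCount !== undefined
      ? `${result.wordCount} words`
      : 'word count unavailable';
  return `${title}: ${words}, ${result.content.length} characters of content`;
}

const resolved = withDocumentedDefaults({ removeImages: true });
const summary = summarizeArticle({ content: '# Hello\n\nBody text.' });
console.log(resolved, summary);
```

In a real server the options object would be passed as the second argument to `htmlExtractor.extract(html, options)`, and callers would apply the same fallback handling to any metadata field the source page does not expose.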
|
|
17
|
+
|
|
18
|
+
## Changed
|
|
19
|
+
|
|
20
|
+
- **`skills/field-test`** rewritten to v2.0 — pivots from "use the MCP tools already connected in your client" to "start the HTTP server locally and drive it with curl + JSON-RPC." Adds a reusable bash helper (`mcp_start`/`mcp_init`/`mcp_call`/`mcp_stop`) that persists PID/URL/session state across tool invocations, replaces the per-definition category matrix with a universal battery (happy path, parity, input error) plus trigger-gated situational categories so it scales to large servers, and tightens the report format (summary paragraph → grouped findings → numbered cherry-pick options).
|
|
21
|
+
- **`biome.json`** `$schema` URL bumped to `2.4.13` to match the `@biomejs/biome` version already in devDependencies.
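Driving a server "with curl + JSON-RPC" comes down to posting JSON-RPC 2.0 envelopes to the HTTP endpoint. The sketch below builds the body for an MCP `tools/call` request; `toolCallRequest` and the tool name `echo_example` are hypothetical, and the skill's own bash helpers (`mcp_call` and friends) are not reproduced here.

```typescript
// JSON-RPC 2.0 request envelope. The params shape follows the MCP tools/call
// convention of a tool name plus an arguments object.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical helper: builds a tools/call request for a named tool.
function toolCallRequest(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// A happy-path call and an input-error call for the same (hypothetical) tool.
const happyPath = JSON.stringify(
  toolCallRequest(1, 'echo_example', { message: 'hello' }),
);
const inputError = JSON.stringify(toolCallRequest(2, 'echo_example', {}));
console.log(happyPath);
console.log(inputError);
```

Each string is what would be sent as the POST body; pairing a valid call with a deliberately incomplete one mirrors the happy path and input error categories of the universal battery.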
|
|
22
|
+
|
|
23
|
+
Resolves [#46](https://github.com/cyanheads/mcp-ts-core/issues/46).
|
package/dist/logs/combined.log
|
@@ -0,0 +1,4 @@
|
|
|
1
|
+
{"level":50,"time":1776964260716,"env":"testing","version":"0.6.11","pid":41913,"requestId":"39HTQ-Z1TY2","timestamp":"2026-04-23T17:11:00.716Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"39HTQ-Z1TY2","timestamp":"2026-04-23T17:11:00.716Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
|
|
2
|
+
{"level":50,"time":1776964260730,"env":"testing","version":"0.6.11","pid":41913,"requestId":"96UGI-8URT9","timestamp":"2026-04-23T17:11:00.730Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"96UGI-8URT9","timestamp":"2026-04-23T17:11:00.730Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Token has expired.","originalStack":"McpError: Token has expired.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at handleJoseVerifyError (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/claimParser.js:56:11)\n at verify (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/strategies/jwtStrategy.js:91:13)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Token has expired.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Token has expired."}
|
|
3
|
+
{"level":50,"time":1776964260733,"env":"testing","version":"0.6.11","pid":41913,"requestId":"Y7DPA-0GN8D","timestamp":"2026-04-23T17:11:00.733Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"GET","errorData":{"path":"/mcp","method":"GET","requestId":"Y7DPA-0GN8D","timestamp":"2026-04-23T17:11:00.733Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
|
|
4
|
+
{"level":50,"time":1776964261975,"env":"testing","version":"0.0.0-test","pid":41951,"requestId":"I9RC9-PC6JL","timestamp":"2026-04-23T17:11:01.974Z","operation":"HandleToolRequest","input":{"message":"blocked"},"critical":false,"errorCode":-32005,"originalErrorType":"McpError","finalErrorType":"McpError","sessionId":"461886e20f7bb81bf0c81e4dc87a34cd9e982327ed1faaed8d7c15b148320462","toolName":"scoped_echo","tenantId":"authz-tenant","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"errorData":{"sessionId":"461886e20f7bb81bf0c81e4dc87a34cd9e982327ed1faaed8d7c15b148320462","toolName":"scoped_echo","input":{"message":"blocked"},"requestId":"I9RC9-PC6JL","timestamp":"2026-04-23T17:11:01.974Z","tenantId":"authz-tenant","operation":"HandleToolRequest","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"originalErrorName":"McpError","originalMessage":"Insufficient permissions.","originalStack":"McpError: Insufficient permissions.\n at forbidden (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:84:58)\n at withRequiredScopes (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/authUtils.js:61:15)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:68:17)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Insufficient permissions.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:101:42)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)","msg":"Error in tool:scoped_echo: Insufficient permissions."}
|
package/dist/logs/error.log
|
@@ -0,0 +1,4 @@
|
|
|
1
|
+
{"level":50,"time":1776964260716,"env":"testing","version":"0.6.11","pid":41913,"requestId":"39HTQ-Z1TY2","timestamp":"2026-04-23T17:11:00.716Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"39HTQ-Z1TY2","timestamp":"2026-04-23T17:11:00.716Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
|
|
2
|
+
{"level":50,"time":1776964260730,"env":"testing","version":"0.6.11","pid":41913,"requestId":"96UGI-8URT9","timestamp":"2026-04-23T17:11:00.730Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"POST","errorData":{"path":"/mcp","method":"POST","requestId":"96UGI-8URT9","timestamp":"2026-04-23T17:11:00.730Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Token has expired.","originalStack":"McpError: Token has expired.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at handleJoseVerifyError (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/claimParser.js:56:11)\n at verify (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/strategies/jwtStrategy.js:91:13)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Token has expired.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Token has expired."}
|
|
3
|
+
{"level":50,"time":1776964260733,"env":"testing","version":"0.6.11","pid":41913,"requestId":"Y7DPA-0GN8D","timestamp":"2026-04-23T17:11:00.733Z","operation":"httpErrorHandler","critical":false,"errorCode":-32006,"originalErrorType":"McpError","finalErrorType":"McpError","path":"/mcp","method":"GET","errorData":{"path":"/mcp","method":"GET","requestId":"Y7DPA-0GN8D","timestamp":"2026-04-23T17:11:00.733Z","operation":"httpErrorHandler","originalErrorName":"McpError","originalMessage":"Missing or invalid Authorization header. Bearer scheme required.","originalStack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at unauthorized (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:86:61)\n at authMiddleware (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/authMiddleware.js:64:19)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpTransport.js:119:22)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:22:23)\n at cors2 (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/middleware/cors/index.js:82:11)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Missing or invalid Authorization header. Bearer scheme required.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/http/httpErrorHandler.js:59:39)\n at dispatch (/Users/casey/Developer/github/mcp-ts-core/node_modules/hono/dist/compose.js:26:25)\n at processTicksAndRejections (native:7:39)","msg":"Error in httpTransport: Missing or invalid Authorization header. Bearer scheme required."}
|
|
4
|
+
{"level":50,"time":1776964261975,"env":"testing","version":"0.0.0-test","pid":41951,"requestId":"I9RC9-PC6JL","timestamp":"2026-04-23T17:11:01.974Z","operation":"HandleToolRequest","input":{"message":"blocked"},"critical":false,"errorCode":-32005,"originalErrorType":"McpError","finalErrorType":"McpError","sessionId":"461886e20f7bb81bf0c81e4dc87a34cd9e982327ed1faaed8d7c15b148320462","toolName":"scoped_echo","tenantId":"authz-tenant","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"errorData":{"sessionId":"461886e20f7bb81bf0c81e4dc87a34cd9e982327ed1faaed8d7c15b148320462","toolName":"scoped_echo","input":{"message":"blocked"},"requestId":"I9RC9-PC6JL","timestamp":"2026-04-23T17:11:01.974Z","tenantId":"authz-tenant","operation":"HandleToolRequest","auth":{"sub":"authz-user","scopes":["tool:other:read"],"clientId":"authz-client","tenantId":"authz-tenant"},"originalErrorName":"McpError","originalMessage":"Insufficient permissions.","originalStack":"McpError: Insufficient permissions.\n at forbidden (/Users/casey/Developer/github/mcp-ts-core/dist/types-global/errors.js:84:58)\n at withRequiredScopes (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/transports/auth/lib/authUtils.js:61:15)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:68:17)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)"},"stack":"McpError: Insufficient permissions.\n at handleError (/Users/casey/Developer/github/mcp-ts-core/dist/utils/internal/error-handler/errorHandler.js:168:23)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/dist/mcp-server/tools/utils/toolHandlerFactory.js:101:42)\n at executeToolHandler (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:231:34)\n at <anonymous> (/Users/casey/Developer/github/mcp-ts-core/node_modules/@modelcontextprotocol/sdk/dist/esm/server/mcp.js:126:43)\n at processTicksAndRejections (native:7:39)","msg":"Error in tool:scoped_echo: Insufficient permissions."}
|
package/dist/logs/interactions.log
|
File without changes
|
package/dist/utils/index.d.ts
CHANGED
|
@@ -13,7 +13,7 @@ export { type ChatMessage, countChatTokens, countTokens, type ModelHeuristics, }
|
|
|
13
13
|
export { type FetchWithTimeoutOptions, fetchWithTimeout } from './network/fetchWithTimeout.js';
|
|
14
14
|
export { type RetryOptions, withRetry } from './network/retry.js';
|
|
15
15
|
export { DEFAULT_PAGINATION_CONFIG, decodeCursor, encodeCursor, extractCursor, type PaginatedResult, type PaginationState, paginateArray, } from './pagination/pagination.js';
|
|
16
|
-
export { type AddPageOptions, Allow, CsvParser, csvParser, type DrawImageOptions, type DrawTextOptions, dateParser, type EmbedImageOptions, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, FrontmatterParser, type FrontmatterResult, frontmatterParser, JsonParser, jsonParser, type PageRange, type PdfMetadata, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, type SetMetadataOptions, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
|
|
16
|
+
export { type AddPageOptions, Allow, CsvParser, csvParser, type DrawImageOptions, type DrawTextOptions, dateParser, type EmbedImageOptions, type ExtractArticleOptions, type ExtractArticleResult, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, FrontmatterParser, type FrontmatterResult, frontmatterParser, HtmlExtractor, htmlExtractor, JsonParser, jsonParser, type PageRange, type PdfMetadata, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, type SetMetadataOptions, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
|
|
17
17
|
export { type Job, SchedulerService, schedulerService } from './scheduling/scheduler.js';
|
|
18
18
|
export { type EntityPrefixConfig, generateRequestContextId, generateUUID, type HtmlSanitizeConfig, type IdGenerationOptions, IdGenerator, idGenerator, type PathSanitizeOptions, type RateLimitConfig, type RateLimitEntry, RateLimiter, Sanitization, type SanitizedPathInfo, type SanitizeStringOptions, sanitization, sanitizeInputForLogging, } from './security/index.js';
|
|
19
19
|
export { ATTR_CODE_FUNCTION_NAME, ATTR_CODE_NAMESPACE, ATTR_GEN_AI_REQUEST_MAX_TOKENS, ATTR_GEN_AI_REQUEST_MODEL, ATTR_GEN_AI_REQUEST_STREAMING, ATTR_GEN_AI_REQUEST_TEMPERATURE, ATTR_GEN_AI_REQUEST_TOP_P, ATTR_GEN_AI_RESPONSE_MODEL, ATTR_GEN_AI_SYSTEM, ATTR_GEN_AI_TOKEN_TYPE, ATTR_GEN_AI_USAGE_INPUT_TOKENS, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, ATTR_GEN_AI_USAGE_TOTAL_TOKENS, ATTR_MCP_AUTH_FAILURE_REASON, ATTR_MCP_AUTH_METHOD, ATTR_MCP_AUTH_OUTCOME, ATTR_MCP_AUTH_SCOPES, ATTR_MCP_AUTH_SUBJECT, ATTR_MCP_CLIENT_ID, ATTR_MCP_ERROR_CLASSIFIED_CODE, ATTR_MCP_GRAPH_DURATION_MS, ATTR_MCP_GRAPH_OPERATION, ATTR_MCP_GRAPH_SUCCESS, ATTR_MCP_RESOURCE_DURATION_MS, ATTR_MCP_RESOURCE_ERROR_CODE, ATTR_MCP_RESOURCE_MIME_TYPE, ATTR_MCP_RESOURCE_SIZE_BYTES, ATTR_MCP_RESOURCE_SUCCESS, ATTR_MCP_RESOURCE_URI, ATTR_MCP_SESSION_EVENT, ATTR_MCP_SPEECH_DURATION_MS, ATTR_MCP_SPEECH_INPUT_BYTES, ATTR_MCP_SPEECH_OPERATION, ATTR_MCP_SPEECH_OUTPUT_BYTES, ATTR_MCP_SPEECH_PROVIDER, ATTR_MCP_SPEECH_SUCCESS, ATTR_MCP_STORAGE_DURATION_MS, ATTR_MCP_STORAGE_KEY_COUNT, ATTR_MCP_STORAGE_OPERATION, ATTR_MCP_STORAGE_SUCCESS, ATTR_MCP_TASK_STATUS, ATTR_MCP_TASK_STORE_TYPE, ATTR_MCP_TENANT_ID, ATTR_MCP_TOOL_BATCH_FAILED, ATTR_MCP_TOOL_BATCH_SUCCEEDED, ATTR_MCP_TOOL_DURATION_MS, ATTR_MCP_TOOL_ERROR_CATEGORY, ATTR_MCP_TOOL_ERROR_CODE, ATTR_MCP_TOOL_INPUT_BYTES, ATTR_MCP_TOOL_NAME, ATTR_MCP_TOOL_OUTPUT_BYTES, ATTR_MCP_TOOL_PARTIAL_SUCCESS, ATTR_MCP_TOOL_SUCCESS, } from './telemetry/attributes.js';
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAGH,OAAO,EACL,KAAK,SAAS,EACd,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,aAAa,EACb,UAAU,EACV,KAAK,iBAAiB,EACtB,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EACd,KAAK,qBAAqB,EAC1B,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,KAAK,QAAQ,EACb,KAAK,SAAS,EACd,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAE7F,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AACxE,YAAY,EACV,gBAAgB,EAChB,YAAY,EACZ,mBAAmB,EACnB,YAAY,GACb,MAAM,mCAAmC,CAAC;AAE3C,OAAO,EAAE,MAAM,EAAE,MAAM,EAAE,KAAK,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAExE,OAAO,EACL,KAAK,WAAW,EAChB,KAAK,0BAA0B,EAC/B,KAAK,cAAc,EACnB,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AAEtC,OAAO,EAAE,KAAK,mBAAmB,EAAE,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAE9E,OAAO,EACL,KAAK,WAAW,EAChB,eAAe,EACf,WAAW,EACX,KAAK,eAAe,GACrB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EAAE,KAAK,uBAAuB,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAE,KAAK,YAAY,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAElE,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EACb,KAAK,eAAe,EACpB,KAAK,eAAe,EACpB,aAAa,GACd,MAAM,4BAA4B,CAAC;AAEpC,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,EACL,SAAS,EACT,SAAS,EACT,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,UAAU,EACV,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,EACjB,UAAU,EACV,UAAU,EACV,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EACT,KAAK,kBAAkB,EACvB,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAE5B,OAAO,EAAE,KAAK,GAAG,EAAE,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AAEzF,OAAO,EACL,KAAK,kBAAkB,EACvB,wBAAwB,EACxB,YAAY,EACZ,KAAK,kBAAkB,EACvB,KAAK,mBAAmB,EACxB,WAAW,EACX,WAAW,EACX,KAAK,mBAAmB,EACxB,KAAK,eAAe,EACpB,KAAK,cAAc,EACnB,WAAW,EACX,YAAY,EACZ,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBAAyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,
EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAAqB,GACtB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AAExC,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAEhC,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EACZ,KAAK,eAAe,EACpB,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAGH,OAAO,EACL,KAAK,SAAS,EACd,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,aAAa,EACb,UAAU,EACV,KAAK,iBAAiB,EACtB,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EACd,KAAK,qBAAqB,EAC1B,KAAK,UAAU,EACf,aAAa,EACb,KAAK,oBAAoB,EACzB,KAAK,QAAQ,EACb,KAAK,SAAS,EACd,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAE7F,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AACxE,YAAY,EACV,gBAAgB,EAChB,YAAY,EACZ,mBAAmB,EACnB,YAAY,GACb,MAAM,mCAAmC,CAAC;AAE3C,OAAO,EAAE,MAAM,EAAE,MAAM,EAAE,KAAK,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAExE,OAAO,EACL,KAAK,WAAW,EAChB,KAAK,0BAA0B,EAC/B,KAAK,cAAc,EACnB,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AAEtC,OAAO,EAAE,KAAK,mBAAmB,EAAE,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAE9E,OAAO,EACL,KAAK,WAAW,EAChB,eAAe,EACf,WAAW,EACX,KAAK,eAAe,GACrB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EAAE,KAAK,uBAAuB,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAE,KAAK,YAAY,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAElE,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EACb,KAAK,eAAe,EACpB,KAAK,eAAe,EACpB,aAAa,GACd,MAAM,4BAA4B,CAAC;AAEpC,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,EACL,SAAS,EACT,SAAS,EACT,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,UAAU,EACV,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,KAAK,oBAAoB,EACzB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,EACjB,aAAa,EACb,aAAa,EACb,UAAU,EACV,UAAU,EACV,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EACT,KAAK,kBAAkB,EACvB,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAE5B,OAAO,EAAE,KAAK,GAAG,EAAE,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AAEzF,OAAO,EACL,KAAK,kBAAkB,EACvB,wBAAwB,EACxB,YAAY,EACZ,KAAK,kBAAkB,EACvB,KAAK,mBAAmB,EACxB,WAAW,EACX,WAAW,EACX,KAAK,mBAAmB,EACxB,KAAK,eAAe,EACpB,KAAK,cAAc,EACnB,WAAW,EACX,YAAY,EACZ,KAAK,iBAAiB,EACtB,KAAK,qBAAqB,EAC1B,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBA
AyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAAqB,GACtB,MAAM,2BAA2B,CAAC;AAEnC,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AAExC,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAEhC,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EACZ,KAAK,eAAe,EACpB,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}
|
package/dist/utils/index.js
CHANGED
|
@@ -22,7 +22,7 @@ export { withRetry } from './network/retry.js';
|
|
|
22
22
|
// Pagination
|
|
23
23
|
export { DEFAULT_PAGINATION_CONFIG, decodeCursor, encodeCursor, extractCursor, paginateArray, } from './pagination/pagination.js';
|
|
24
24
|
// Parsing
|
|
25
|
-
export { Allow, CsvParser, csvParser, dateParser, FrontmatterParser, frontmatterParser, JsonParser, jsonParser, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
|
|
25
|
+
export { Allow, CsvParser, csvParser, dateParser, FrontmatterParser, frontmatterParser, HtmlExtractor, htmlExtractor, JsonParser, jsonParser, PdfParser, parseDateString, parseDateStringDetailed, pdfParser, thinkBlockRegex, XmlParser, xmlParser, YamlParser, yamlParser, } from './parsing/index.js';
|
|
26
26
|
// Scheduling
|
|
27
27
|
export { SchedulerService, schedulerService } from './scheduling/scheduler.js';
|
|
28
28
|
// Security
|
package/dist/utils/index.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,aAAa;AACb,OAAO,EAGL,aAAa,EAEb,aAAa,EACb,UAAU,EAEV,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EAGd,aAAa,EAIb,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAC/B,WAAW;AACX,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAC7F,gBAAgB;AAChB,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AAOxE,SAAS;AACT,OAAO,EAAE,MAAM,EAAE,MAAM,EAAoB,MAAM,sBAAsB,CAAC;AACxE,kBAAkB;AAClB,OAAO,EAIL,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AACtC,UAAU;AACV,OAAO,EAA4B,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAC9E,iBAAiB;AACjB,OAAO,EAEL,eAAe,EACf,WAAW,GAEZ,MAAM,2BAA2B,CAAC;AACnC,UAAU;AACV,OAAO,EAAgC,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAqB,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAClE,aAAa;AACb,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EAGb,aAAa,GACd,MAAM,4BAA4B,CAAC;AACpC,UAAU;AACV,OAAO,EAEL,KAAK,EACL,SAAS,EACT,SAAS,EAGT,UAAU,
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/utils/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,aAAa;AACb,OAAO,EAGL,aAAa,EAEb,aAAa,EACb,UAAU,EAEV,IAAI,EACJ,eAAe,EACf,QAAQ,EACR,QAAQ,EACR,cAAc,EAGd,aAAa,EAIb,cAAc,EACd,aAAa,EACb,SAAS,GACV,MAAM,uBAAuB,CAAC;AAC/B,WAAW;AACX,OAAO,EAAE,mBAAmB,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,wBAAwB,CAAC;AAC7F,gBAAgB;AAChB,OAAO,EAAE,YAAY,EAAE,MAAM,0CAA0C,CAAC;AAOxE,SAAS;AACT,OAAO,EAAE,MAAM,EAAE,MAAM,EAAoB,MAAM,sBAAsB,CAAC;AACxE,kBAAkB;AAClB,OAAO,EAIL,qBAAqB,GACtB,MAAM,8BAA8B,CAAC;AACtC,UAAU;AACV,OAAO,EAA4B,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAC9E,iBAAiB;AACjB,OAAO,EAEL,eAAe,EACf,WAAW,GAEZ,MAAM,2BAA2B,CAAC;AACnC,UAAU;AACV,OAAO,EAAgC,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAC/F,OAAO,EAAqB,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAClE,aAAa;AACb,OAAO,EACL,yBAAyB,EACzB,YAAY,EACZ,YAAY,EACZ,aAAa,EAGb,aAAa,GACd,MAAM,4BAA4B,CAAC;AACpC,UAAU;AACV,OAAO,EAEL,KAAK,EACL,SAAS,EACT,SAAS,EAGT,UAAU,EAOV,iBAAiB,EAEjB,iBAAiB,EACjB,aAAa,EACb,aAAa,EACb,UAAU,EACV,UAAU,EAGV,SAAS,EACT,eAAe,EACf,uBAAuB,EACvB,SAAS,EAET,eAAe,EACf,SAAS,EACT,SAAS,EACT,UAAU,EACV,UAAU,GACX,MAAM,oBAAoB,CAAC;AAC5B,aAAa;AACb,OAAO,EAAY,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,2BAA2B,CAAC;AACzF,WAAW;AACX,OAAO,EAEL,wBAAwB,EACxB,YAAY,EAGZ,WAAW,EACX,WAAW,EAIX,WAAW,EACX,YAAY,EAGZ,YAAY,EACZ,uBAAuB,GACxB,MAAM,qBAAqB,CAAC;AAC7B,iCAAiC;AACjC,OAAO,EACL,uBAAuB,EACvB,mBAAmB,EACnB,8BAA8B,EAC9B,yBAAyB,EACzB,6BAA6B,EAC7B,+BAA+B,EAC/B,yBAAyB,EACzB,0BAA0B,EAC1B,kBAAkB,EAClB,sBAAsB,EACtB,8BAA8B,EAC9B,+BAA+B,EAC/B,8BAA8B,EAC9B,4BAA4B,EAC5B,oBAAoB,EACpB,qBAAqB,EACrB,oBAAoB,EACpB,qBAAqB,EACrB,kBAAkB,EAClB,8BAA8B,EAC9B,0BAA0B,EAC1B,wBAAwB,EACxB,sBAAsB,EACtB,6BAA6B,EAC7B,4BAA4B,EAC5B,2BAA2B,EAC3B,4BAA4B,EAC5B,yBAAyB,EACzB,qBAAqB,EACrB,sBAAsB,EACtB,2BAA2B,EAC3B,2BAA2B,EAC3B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,uBAAuB,EACvB,4BAA4B,EAC5B,0BAA0B,EAC1B,0BAA0B,EAC1B,wBAAwB,EACxB,oBAAoB,EACpB,wBAAwB,EACxB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,yBAAyB,EACzB,4BAA4B,EAC5B,wBAAwB,EACxB,yBAAyB,EACzB,kBAAkB,EAClB,0BAA0B,EAC1B,6BAA6B,EAC7B,qBAA
qB,GACtB,MAAM,2BAA2B,CAAC;AACnC,8BAA8B;AAC9B,OAAO,EACL,uBAAuB,EACvB,GAAG,EACH,qBAAqB,GACtB,MAAM,gCAAgC,CAAC;AACxC,sBAAsB;AACtB,OAAO,EACL,aAAa,EACb,eAAe,EACf,mBAAmB,EACnB,QAAQ,GACT,MAAM,wBAAwB,CAAC;AAChC,oBAAoB;AACpB,OAAO,EACL,gBAAgB,EAChB,4BAA4B,EAC5B,kBAAkB,EAClB,wBAAwB,EACxB,YAAY,EAEZ,QAAQ,GACT,MAAM,sBAAsB,CAAC;AAC9B,cAAc;AACd,OAAO,EAAE,eAAe,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC"}
|
package/dist/utils/parsing/htmlExtractor.d.ts
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
import { type RequestContext } from '../../utils/internal/requestContext.js';
|
|
2
|
+
/**
|
|
3
|
+
* Options for HTML article extraction.
|
|
4
|
+
*/
|
|
5
|
+
export interface ExtractArticleOptions {
|
|
6
|
+
/**
|
|
7
|
+
* CSS selector to use as the main content element, bypassing auto-detection.
|
|
8
|
+
* If the selector does not match any element, Defuddle falls back to
|
|
9
|
+
* auto-detection.
|
|
10
|
+
*/
|
|
11
|
+
contentSelector?: string;
|
|
12
|
+
/**
|
|
13
|
+
* Enable Defuddle's debug logging and bypass div flattening. Useful when
|
|
14
|
+
* diagnosing why a specific page extracts poorly. Defaults to `false`.
|
|
15
|
+
*/
|
|
16
|
+
debug?: boolean;
|
|
17
|
+
/**
|
|
18
|
+
* Output format for `content`. `'markdown'` converts to Markdown (the common
|
|
19
|
+
* case for LLM-bound text), `'html'` returns cleaned HTML. Defaults to
|
|
20
|
+
* `'markdown'`.
|
|
21
|
+
*/
|
|
22
|
+
format?: 'html' | 'markdown';
|
|
23
|
+
/**
|
|
24
|
+
* Preferred language for extraction and transcript selection (BCP 47, e.g.
|
|
25
|
+
* `'en'`, `'fr'`, `'ja'`).
|
|
26
|
+
*/
|
|
27
|
+
language?: string;
|
|
28
|
+
/**
|
|
29
|
+
* Strip all images from the extracted content. Defaults to `false`.
|
|
30
|
+
*/
|
|
31
|
+
removeImages?: boolean;
|
|
32
|
+
/**
|
|
33
|
+
* URL of the page being parsed. Passed to Defuddle for site-specific
|
|
34
|
+
* extractors and resolved link rewriting.
|
|
35
|
+
*/
|
|
36
|
+
url?: string;
|
|
37
|
+
/**
|
|
38
|
+
* Allow Defuddle's async extractors to fetch from third-party APIs
|
|
39
|
+
* (e.g. FxTwitter) when no local content is available in the HTML.
|
|
40
|
+
* Defaults to `false` to keep extraction fully local and deterministic.
|
|
41
|
+
*/
|
|
42
|
+
useAsync?: boolean;
|
|
43
|
+
}
|
|
44
|
+
/**
|
|
45
|
+
* Result of HTML article extraction.
|
|
46
|
+
*
|
|
47
|
+
* All fields except `content` are best-effort — they may be undefined if the
|
|
48
|
+
* source page does not provide the corresponding metadata.
|
|
49
|
+
*/
|
|
50
|
+
export interface ExtractArticleResult {
|
|
51
|
+
/** Article author, if detected. */
|
|
52
|
+
author?: string;
|
|
53
|
+
/** Cleaned main content, either as Markdown or HTML depending on `format`. */
|
|
54
|
+
content: string;
|
|
55
|
+
/** Description or summary of the article, if present in page metadata. */
|
|
56
|
+
description?: string;
|
|
57
|
+
/** Domain of the source page (e.g. `'example.com'`), if derivable. */
|
|
58
|
+
domain?: string;
|
|
59
|
+
/** URL of the source site's favicon, if detected. */
|
|
60
|
+
favicon?: string;
|
|
61
|
+
/** URL of the article's primary image, if detected. */
|
|
62
|
+
image?: string;
|
|
63
|
+
/** Page language in BCP 47 format (e.g. `'en'`, `'en-US'`), if detected. */
|
|
64
|
+
language?: string;
|
|
65
|
+
/** Meta tags extracted from the page head, keyed by name. */
|
|
66
|
+
metaTags?: Record<string, string>;
|
|
67
|
+
/** Time `defuddle` spent parsing, in milliseconds. */
|
|
68
|
+
parseTime?: number;
|
|
69
|
+
/** Publication date string, if detected. Format is source-dependent. */
|
|
70
|
+
published?: string;
|
|
71
|
+
/** Raw schema.org data extracted from the page, if present. */
|
|
72
|
+
schemaOrgData?: unknown;
|
|
73
|
+
/** Site name, if detected (e.g. from Open Graph `og:site_name`). */
|
|
74
|
+
site?: string;
|
|
75
|
+
/** Article title, if detected. */
|
|
76
|
+
title?: string;
|
|
77
|
+
/** Word count of the extracted content, as reported by `defuddle`. */
|
|
78
|
+
wordCount?: number;
|
|
79
|
+
}
|
|
80
|
+
/**
|
|
81
|
+
* Utility class for extracting main article content from raw HTML.
|
|
82
|
+
*
|
|
83
|
+
* Lazily loads `defuddle` and `linkedom` on first use — both are optional peer
|
|
84
|
+
* dependencies (`bun add defuddle linkedom`). Returns cleaned main content
|
|
85
|
+
* plus best-effort metadata: title, author, description, Open Graph fields,
|
|
86
|
+
* schema.org data, word count.
|
|
87
|
+
*
|
|
88
|
+
* Does not guarantee structure beyond "main content of the page." For quirky
|
|
89
|
+
* pages, malformed HTML, or SPA shells with minimal server-rendered content,
|
|
90
|
+
* the result may be sparse — callers should degrade gracefully.
|
|
91
|
+
*/
|
|
92
|
+
export declare class HtmlExtractor {
|
|
93
|
+
/**
|
|
94
|
+
* Extracts the main article content from an HTML string.
|
|
95
|
+
*
|
|
96
|
+
* Async due to lazy loading of `defuddle` and `linkedom`, and because
|
|
97
|
+
* Defuddle's node entry is itself async (supports async fallback extractors
|
|
98
|
+
* gated by `useAsync`).
|
|
99
|
+
*
|
|
100
|
+
* @param html - Raw HTML string to extract from.
|
|
101
|
+
* @param options - Optional extraction options (format, URL, content selector, etc.).
|
|
102
|
+
* @param context - Optional `RequestContext` for correlated logging and error metadata.
|
|
103
|
+
* @returns Extracted content and metadata. Only `content` is guaranteed to
|
|
104
|
+
* be present; all other fields are best-effort.
|
|
105
|
+
* @throws {McpError} With `ConfigurationError` if `defuddle` or `linkedom` is not installed.
|
|
106
|
+
* @throws {McpError} With `ValidationError` if the HTML string is empty after trimming,
|
|
107
|
+
* or if `defuddle` fails to parse the page.
|
|
108
|
+
*
|
|
109
|
+
* @example
|
|
110
|
+
* ```typescript
|
|
111
|
+
* import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
|
|
112
|
+
*
|
|
113
|
+
* const html = await fetch('https://example.com/article').then((r) => r.text());
|
|
114
|
+
* const result = await htmlExtractor.extract(html, {
|
|
115
|
+
* url: 'https://example.com/article',
|
|
116
|
+
* format: 'markdown',
|
|
117
|
+
* });
|
|
118
|
+
*
|
|
119
|
+
* console.log(result.title);
|
|
120
|
+
* console.log(result.content);
|
|
121
|
+
* ```
|
|
122
|
+
*/
|
|
123
|
+
extract(html: string, options?: ExtractArticleOptions, context?: RequestContext): Promise<ExtractArticleResult>;
|
|
124
|
+
}
|
|
125
|
+
/**
|
|
126
|
+
* Singleton instance of {@link HtmlExtractor}.
|
|
127
|
+
*
|
|
128
|
+
* Prefer this over constructing a new `HtmlExtractor` directly. Lazily loads
|
|
129
|
+
* `defuddle` and `linkedom` on first call, so there is no startup cost if
|
|
130
|
+
* HTML extraction is never used.
|
|
131
|
+
*
|
|
132
|
+
* @example
|
|
133
|
+
* ```typescript
|
|
134
|
+
* import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
|
|
135
|
+
*
|
|
136
|
+
* const article = await htmlExtractor.extract(rawHtml, {
|
|
137
|
+
* url: 'https://example.com/post',
|
|
138
|
+
* format: 'markdown',
|
|
139
|
+
* });
|
|
140
|
+
*
|
|
141
|
+
* // Hand the content + metadata to the LLM
|
|
142
|
+
* llm.prompt({ title: article.title, body: article.content });
|
|
143
|
+
* ```
|
|
144
|
+
*/
|
|
145
|
+
export declare const htmlExtractor: HtmlExtractor;
|
|
146
|
+
//# sourceMappingURL=htmlExtractor.d.ts.map
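The declaration above says only `content` is guaranteed and that callers should degrade gracefully when a page extracts sparsely. The following is a minimal sketch of what that could look like on the consuming side. `ArticleLike`, `describeArticle`, `isSparse`, and the 200-character threshold are all hypothetical names and values chosen for illustration; none of them are part of the package.

```typescript
// Local mirror of a few fields from the ExtractArticleResult declaration,
// so this sketch compiles without the package installed (assumption: the
// field names and optionality match the declaration shown in the diff).
interface ArticleLike {
  content: string;
  title?: string;
  author?: string;
  published?: string;
  wordCount?: number;
}

// Hypothetical helper: builds a one-line summary from whatever metadata
// happens to be present, falling back to a placeholder title.
function describeArticle(article: ArticleLike): string {
  const parts: string[] = [article.title ?? 'Untitled'];
  if (article.author) parts.push(`by ${article.author}`);
  if (article.published) parts.push(`published ${article.published}`);
  if (typeof article.wordCount === 'number') parts.push(`${article.wordCount} words`);
  return parts.join(' | ');
}

// Hypothetical heuristic: treat very short content as a likely SPA shell or
// failed extraction. The threshold is arbitrary and would need tuning.
function isSparse(article: ArticleLike, minChars = 200): boolean {
  return article.content.trim().length < minChars;
}

const full: ArticleLike = {
  content: 'x'.repeat(500),
  title: 'Post',
  author: 'Ada',
  wordCount: 80,
};
const bare: ArticleLike = { content: '' };

console.log(describeArticle(full), isSparse(full));
console.log(describeArticle(bare), isSparse(bare));
```

A caller might check `isSparse` before handing the content to a downstream consumer and fall back to the raw HTML or a retry when the check trips.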
|
package/dist/utils/parsing/htmlExtractor.d.ts.map
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"htmlExtractor.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/htmlExtractor.ts"],"names":[],"mappings":"AAgCA,OAAO,EAAE,KAAK,cAAc,EAAyB,MAAM,oCAAoC,CAAC;AAqChG;;GAEG;AACH,MAAM,WAAW,qBAAqB;IACpC;;;;OAIG;IACH,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB;;;OAGG;IACH,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB;;;;OAIG;IACH,MAAM,CAAC,EAAE,MAAM,GAAG,UAAU,CAAC;IAC7B;;;OAGG;IACH,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB;;OAEG;IACH,YAAY,CAAC,EAAE,OAAO,CAAC;IACvB;;;OAGG;IACH,GAAG,CAAC,EAAE,MAAM,CAAC;IACb;;;;OAIG;IACH,QAAQ,CAAC,EAAE,OAAO,CAAC;CACpB;AAED;;;;;GAKG;AACH,MAAM,WAAW,oBAAoB;IACnC,mCAAmC;IACnC,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,8EAA8E;IAC9E,OAAO,EAAE,MAAM,CAAC;IAChB,0EAA0E;IAC1E,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,sEAAsE;IACtE,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,qDAAqD;IACrD,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,uDAAuD;IACvD,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,4EAA4E;IAC5E,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,6DAA6D;IAC7D,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IAClC,sDAAsD;IACtD,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,wEAAwE;IACxE,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,+DAA+D;IAC/D,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,oEAAoE;IACpE,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,kCAAkC;IAClC,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,sEAAsE;IACtE,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAED;;;;;;;;;;;GAWG;AACH,qBAAa,aAAa;IACxB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;OA6BG;IACG,OAAO,CACX,IAAI,EAAE,MAAM,EACZ,OAAO,CAAC,EAAE,qBAAqB,EAC/B,OAAO,CAAC,EAAE,cAAc,GACvB,OAAO,CAAC,oBAAoB,CAAC;CA4EjC;AAED;;;;;;;;;;;;;;;;;;;GAmBG;AACH,eAAO,MAAM,aAAa,eAAsB,CAAC"}
|
package/dist/utils/parsing/htmlExtractor.js
|
@@ -0,0 +1,171 @@
|
|
|
1
|
+
import { McpError, validationError } from '../../types-global/errors.js';
|
|
2
|
+
import { lazyImport } from '../../utils/internal/lazyImport.js';
|
|
3
|
+
import { logger } from '../../utils/internal/logger.js';
|
|
4
|
+
import { requestContextService } from '../../utils/internal/requestContext.js';
|
|
5
|
+
const getDefuddle = lazyImport(() => import('defuddle/node'), 'Install "defuddle" to use HTML article extraction: bun add defuddle linkedom');
|
|
6
|
+
const getLinkedom = lazyImport(() => import('linkedom'), 'Install "linkedom" to use HTML article extraction: bun add defuddle linkedom');
|
|
7
|
+
/** Flattens defuddle's `MetaTagItem[]` into a `Record<string, string>` keyed
|
|
8
|
+
* by `name ?? property`. Items without a usable key or content are skipped;
|
|
9
|
+
* returns `undefined` if nothing usable is left so callers can omit the
|
|
10
|
+
* field from the result entirely. */
|
|
11
|
+
function flattenMetaTags(tags) {
|
|
12
|
+
if (!tags || tags.length === 0)
|
|
13
|
+
return;
|
|
14
|
+
const out = {};
|
|
15
|
+
for (const tag of tags) {
|
|
16
|
+
const key = tag.name ?? tag.property;
|
|
17
|
+
if (!key || tag.content == null)
|
|
18
|
+
continue;
|
|
19
|
+
out[key] = tag.content;
|
|
20
|
+
}
|
|
21
|
+
return Object.keys(out).length > 0 ? out : undefined;
|
|
22
|
+
}
|
|
23
|
+
/**
|
|
24
|
+
* Utility class for extracting main article content from raw HTML.
|
|
25
|
+
*
|
|
26
|
+
* Lazily loads `defuddle` and `linkedom` on first use — both are optional peer
|
|
27
|
+
* dependencies (`bun add defuddle linkedom`). Returns cleaned main content
|
|
28
|
+
* plus best-effort metadata: title, author, description, Open Graph fields,
|
|
29
|
+
* schema.org data, word count.
|
|
30
|
+
*
|
|
31
|
+
* Does not guarantee structure beyond "main content of the page." For quirky
|
|
32
|
+
* pages, malformed HTML, or SPA shells with minimal server-rendered content,
|
|
33
|
+
* the result may be sparse — callers should degrade gracefully.
|
|
34
|
+
*/
|
|
35
|
+
export class HtmlExtractor {
|
|
36
|
+
/**
|
|
37
|
+
* Extracts the main article content from an HTML string.
|
|
38
|
+
*
|
|
39
|
+
* Async due to lazy loading of `defuddle` and `linkedom`, and because
|
|
40
|
+
* Defuddle's node entry is itself async (supports async fallback extractors
|
|
41
|
+
* gated by `useAsync`).
|
|
42
|
+
*
|
|
43
|
+
* @param html - Raw HTML string to extract from.
|
|
44
|
+
* @param options - Optional extraction options (format, URL, content selector, etc.).
|
|
45
|
+
* @param context - Optional `RequestContext` for correlated logging and error metadata.
|
|
46
|
+
* @returns Extracted content and metadata. Only `content` is guaranteed to
|
|
47
|
+
* be present; all other fields are best-effort.
|
|
48
|
+
* @throws {McpError} With `ConfigurationError` if `defuddle` or `linkedom` is not installed.
|
|
49
|
+
* @throws {McpError} With `ValidationError` if the HTML string is empty after trimming,
|
|
50
|
+
* or if `defuddle` fails to parse the page.
|
|
51
|
+
*
|
|
52
|
+
* @example
|
|
53
|
+
* ```typescript
|
|
54
|
+
* import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
|
|
55
|
+
*
|
|
56
|
+
* const html = await fetch('https://example.com/article').then((r) => r.text());
|
|
57
|
+
* const result = await htmlExtractor.extract(html, {
|
|
58
|
+
* url: 'https://example.com/article',
|
|
59
|
+
* format: 'markdown',
|
|
60
|
+
* });
|
|
61
|
+
*
|
|
62
|
+
* console.log(result.title);
|
|
63
|
+
* console.log(result.content);
|
|
64
|
+
* ```
|
|
65
|
+
*/
|
|
66
|
+
async extract(html, options, context) {
|
|
67
|
+
const logContext = context ??
|
|
68
|
+
requestContextService.createRequestContext({
|
|
69
|
+
operation: 'HtmlExtractor.extract',
|
|
70
|
+
});
|
|
71
|
+
const trimmed = html.trim();
|
|
72
|
+
if (!trimmed) {
|
|
73
|
+
throw validationError('HTML string is empty.', context);
|
|
74
|
+
}
|
|
75
|
+
const [{ Defuddle }, { parseHTML }] = await Promise.all([getDefuddle(), getLinkedom()]);
|
|
76
|
+
const format = options?.format ?? 'markdown';
|
|
77
|
+
const defuddleOptions = {
|
|
78
|
+
markdown: format === 'markdown',
|
|
79
|
+
useAsync: options?.useAsync ?? false,
|
|
80
|
+
...(options?.contentSelector !== undefined && {
|
|
81
|
+
contentSelector: options.contentSelector,
|
|
82
|
+
}),
|
|
83
|
+
...(options?.removeImages !== undefined && {
|
|
84
|
+
removeImages: options.removeImages,
|
|
85
|
+
}),
|
|
86
|
+
...(options?.debug !== undefined && { debug: options.debug }),
|
|
87
|
+
...(options?.language !== undefined && { language: options.language }),
|
|
88
|
+
};
|
|
89
|
+
logger.debug('Extracting article content from HTML.', {
|
|
90
|
+
...logContext,
|
|
91
|
+
byteLength: trimmed.length,
|
|
92
|
+
format,
|
|
93
|
+
hasUrl: Boolean(options?.url),
|
|
94
|
+
hasContentSelector: Boolean(options?.contentSelector),
|
|
95
|
+
});
|
|
96
|
+
try {
|
|
97
|
+
const { document } = parseHTML(trimmed);
|
|
98
|
+
const result = await Defuddle(document, options?.url, defuddleOptions);
|
|
99
|
+
logger.debug('Successfully extracted article.', {
|
|
100
|
+
...logContext,
|
|
101
|
+
wordCount: result.wordCount,
|
|
102
|
+
titlePresent: Boolean(result.title),
|
|
103
|
+
parseTimeMs: result.parseTime,
|
|
104
|
+
});
|
|
105
|
+
const out = { content: result.content ?? '' };
|
|
106
|
+
if (result.title)
|
|
107
|
+
out.title = result.title;
|
|
108
|
+
if (result.author)
|
|
109
|
+
out.author = result.author;
|
|
110
|
+
if (result.description)
|
|
111
|
+
out.description = result.description;
|
|
112
|
+
if (result.domain)
|
|
113
|
+
out.domain = result.domain;
|
|
114
|
+
if (result.favicon)
|
|
115
|
+
out.favicon = result.favicon;
|
|
116
|
+
if (result.image)
|
|
117
|
+
out.image = result.image;
|
|
118
|
+
if (result.language)
|
|
119
|
+
out.language = result.language;
|
|
120
|
+
if (result.published)
|
|
121
|
+
out.published = result.published;
|
|
122
|
+
if (result.site)
|
|
123
|
+
out.site = result.site;
|
|
124
|
+
if (typeof result.parseTime === 'number')
|
|
125
|
+
out.parseTime = result.parseTime;
|
|
126
|
+
if (typeof result.wordCount === 'number')
|
|
127
|
+
out.wordCount = result.wordCount;
|
|
128
|
+
if (result.schemaOrgData)
|
|
129
|
+
out.schemaOrgData = result.schemaOrgData;
|
|
130
|
+
const metaTags = flattenMetaTags(result.metaTags);
|
|
131
|
+
if (metaTags)
|
|
132
|
+
out.metaTags = metaTags;
|
|
133
|
+
return out;
|
|
134
|
+
}
|
|
135
|
+
catch (e) {
|
|
136
|
+
if (e instanceof McpError)
|
|
137
|
+
throw e;
|
|
138
|
+
const error = e instanceof Error ? e : new Error(String(e));
|
|
139
|
+
logger.error('Failed to extract article from HTML.', {
|
|
140
|
+
...logContext,
|
|
141
|
+
errorDetails: error.message,
|
|
142
|
+
});
|
|
143
|
+
throw validationError(`Failed to extract article from HTML: ${error.message}`, {
|
|
144
|
+
...context,
|
|
145
|
+
rawError: error.stack ?? String(error),
|
|
146
|
+
});
|
|
147
|
+
}
|
|
148
|
+
}
|
|
149
|
+
}
|
|
150
|
+
/**
|
|
151
|
+
* Singleton instance of {@link HtmlExtractor}.
|
|
152
|
+
*
|
|
153
|
+
* Prefer this over constructing a new `HtmlExtractor` directly. Lazily loads
|
|
154
|
+
* `defuddle` and `linkedom` on first call, so there is no startup cost if
|
|
155
|
+
* HTML extraction is never used.
|
|
156
|
+
*
|
|
157
|
+
* @example
|
|
158
|
+
* ```typescript
|
|
159
|
+
* import { htmlExtractor } from '../../utils/parsing/htmlExtractor.js';
|
|
160
|
+
*
|
|
161
|
+
* const article = await htmlExtractor.extract(rawHtml, {
|
|
162
|
+
* url: 'https://example.com/post',
|
|
163
|
+
* format: 'markdown',
|
|
164
|
+
* });
|
|
165
|
+
*
|
|
166
|
+
* // Hand the content + metadata to the LLM
|
|
167
|
+
* llm.prompt({ title: article.title, body: article.content });
|
|
168
|
+
* ```
|
|
169
|
+
*/
|
|
170
|
+
export const htmlExtractor = new HtmlExtractor();
|
|
171
|
+
//# sourceMappingURL=htmlExtractor.js.map
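The `flattenMetaTags` helper in the compiled file above is untyped and has a few edge cases that are easy to miss. Below is a typed, standalone sketch of the same logic with sample input. `MetaTagLike` is an assumed shape for illustration only; the real `MetaTagItem` type belongs to defuddle and may differ.

```typescript
// Assumed shape of one meta tag entry (hypothetical; not defuddle's own type).
interface MetaTagLike {
  name?: string | null;
  property?: string | null;
  content?: string | null;
}

function flattenMetaTags(
  tags: readonly MetaTagLike[] | undefined,
): Record<string, string> | undefined {
  if (!tags || tags.length === 0) return undefined;
  const out: Record<string, string> = {};
  for (const tag of tags) {
    // `??` falls through only on null/undefined, so an empty-string `name`
    // does NOT fall back to `property`; the entry is skipped instead.
    const key = tag.name ?? tag.property;
    if (!key || tag.content == null) continue;
    out[key] = tag.content; // later duplicates overwrite earlier ones
  }
  // Return undefined rather than {} so callers can omit the field entirely.
  return Object.keys(out).length > 0 ? out : undefined;
}

const flat = flattenMetaTags([
  { name: 'description', content: 'A post' },
  { property: 'og:title', content: 'Hello' },
  { name: 'description', content: 'Overwritten' },
  { content: 'no key' },
  { name: 'empty', content: null },
]);
console.log(flat);

const skipped = flattenMetaTags([{ name: '', property: 'og:x', content: 'y' }]);
console.log(skipped);
```

Two behaviors follow from the logic as written: when a key repeats, the last entry wins, and an entry whose `name` is an empty string is dropped even if it has a usable `property`.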
|
package/dist/utils/parsing/htmlExtractor.js.map
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"htmlExtractor.js","sourceRoot":"","sources":["../../../src/utils/parsing/htmlExtractor.ts"],"names":[],"mappings":"AA6BA,OAAO,EAAE,QAAQ,EAAE,eAAe,EAAE,MAAM,0BAA0B,CAAC;AACrE,OAAO,EAAE,UAAU,EAAE,MAAM,gCAAgC,CAAC;AAC5D,OAAO,EAAE,MAAM,EAAE,MAAM,4BAA4B,CAAC;AACpD,OAAO,EAAuB,qBAAqB,EAAE,MAAM,oCAAoC,CAAC;AAEhG,MAAM,WAAW,GAAG,UAAU,CAC5B,GAAG,EAAE,CAAC,MAAM,CAAC,eAAe,CAAC,EAC7B,8EAA8E,CAC/E,CAAC;AAEF,MAAM,WAAW,GAAG,UAAU,CAC5B,GAAG,EAAE,CAAC,MAAM,CAAC,UAAU,CAAC,EACxB,8EAA8E,CAC/E,CAAC;AAUF;;;sCAGsC;AACtC,SAAS,eAAe,CACtB,IAAuC;IAEvC,IAAI,CAAC,IAAI,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO;IACvC,MAAM,GAAG,GAA2B,EAAE,CAAC;IACvC,KAAK,MAAM,GAAG,IAAI,IAAI,EAAE,CAAC;QACvB,MAAM,GAAG,GAAG,GAAG,CAAC,IAAI,IAAI,GAAG,CAAC,QAAQ,CAAC;QACrC,IAAI,CAAC,GAAG,IAAI,GAAG,CAAC,OAAO,IAAI,IAAI;YAAE,SAAS;QAC1C,GAAG,CAAC,GAAG,CAAC,GAAG,GAAG,CAAC,OAAO,CAAC;IACzB,CAAC;IACD,OAAO,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,SAAS,CAAC;AACvD,CAAC;AAkFD;;;;;;;;;;;GAWG;AACH,MAAM,OAAO,aAAa;IACxB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;OA6BG;IACH,KAAK,CAAC,OAAO,CACX,IAAY,EACZ,OAA+B,EAC/B,OAAwB;QAExB,MAAM,UAAU,GACd,OAAO;YACP,qBAAqB,CAAC,oBAAoB,CAAC;gBACzC,SAAS,EAAE,uBAAuB;aACnC,CAAC,CAAC;QAEL,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,EAAE,CAAC;QAC5B,IAAI,CAAC,OAAO,EAAE,CAAC;YACb,MAAM,eAAe,CAAC,uBAAuB,EAAE,OAAO,CAAC,CAAC;QAC1D,CAAC;QAED,MAAM,CAAC,EAAE,QAAQ,EAAE,EAAE,EAAE,SAAS,EAAE,CAAC,GAAG,MAAM,OAAO,CAAC,GAAG,CAAC,CAAC,WAAW,EAAE,EAAE,WAAW,EAAE,CAAC,CAAC,CAAC;QAExF,MAAM,MAAM,GAAG,OAAO,EAAE,MAAM,IAAI,UAAU,CAAC;QAC7C,MAAM,eAAe,GAAoB;YACvC,QAAQ,EAAE,MAAM,KAAK,UAAU;YAC/B,QAAQ,EAAE,OAAO,EAAE,QAAQ,IAAI,KAAK;YACpC,GAAG,CAAC,OAAO,EAAE,eAAe,KAAK,SAAS,IAAI;gBAC5C,eAAe,EAAE,OAAO,CAAC,eAAe;aACzC,CAAC;YACF,GAAG,CAAC,OAAO,EAAE,YAAY,KAAK,SAAS,IAAI;gBACzC,YAAY,EAAE,OAAO,CAAC,YAAY;aACnC,CAAC;YACF,GAAG,CAAC,OAAO,EAAE,KAAK,KAAK,SAAS,IAAI,EAAE,KAAK,EAAE,OAAO,CAAC,KAAK,EAAE,CAAC;YAC7D,GAAG,CAAC,OAAO,EAAE,QAAQ,KAAK,SAAS,IAAI,EAAE,QAAQ,EAAE,OAAO,CAAC,QAAQ,EAAE,CAAC;SACvE,CAAC;QAEF,MAAM,CAAC,KAAK,CAAC,
uCAAuC,EAAE;YACpD,GAAG,UAAU;YACb,UAAU,EAAE,OAAO,CAAC,MAAM;YAC1B,MAAM;YACN,MAAM,EAAE,OAAO,CAAC,OAAO,EAAE,GAAG,CAAC;YAC7B,kBAAkB,EAAE,OAAO,CAAC,OAAO,EAAE,eAAe,CAAC;SACtD,CAAC,CAAC;QAEH,IAAI,CAAC;YACH,MAAM,EAAE,QAAQ,EAAE,GAAG,SAAS,CAAC,OAAO,CAAC,CAAC;YACxC,MAAM,MAAM,GAAG,MAAM,QAAQ,CAAC,QAAQ,EAAE,OAAO,EAAE,GAAG,EAAE,eAAe,CAAC,CAAC;YAEvE,MAAM,CAAC,KAAK,CAAC,iCAAiC,EAAE;gBAC9C,GAAG,UAAU;gBACb,SAAS,EAAE,MAAM,CAAC,SAAS;gBAC3B,YAAY,EAAE,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC;gBACnC,WAAW,EAAE,MAAM,CAAC,SAAS;aAC9B,CAAC,CAAC;YAEH,MAAM,GAAG,GAAyB,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,IAAI,EAAE,EAAE,CAAC;YACpE,IAAI,MAAM,CAAC,KAAK;gBAAE,GAAG,CAAC,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC;YAC3C,IAAI,MAAM,CAAC,MAAM;gBAAE,GAAG,CAAC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC;YAC9C,IAAI,MAAM,CAAC,WAAW;gBAAE,GAAG,CAAC,WAAW,GAAG,MAAM,CAAC,WAAW,CAAC;YAC7D,IAAI,MAAM,CAAC,MAAM;gBAAE,GAAG,CAAC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC;YAC9C,IAAI,MAAM,CAAC,OAAO;gBAAE,GAAG,CAAC,OAAO,GAAG,MAAM,CAAC,OAAO,CAAC;YACjD,IAAI,MAAM,CAAC,KAAK;gBAAE,GAAG,CAAC,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC;YAC3C,IAAI,MAAM,CAAC,QAAQ;gBAAE,GAAG,CAAC,QAAQ,GAAG,MAAM,CAAC,QAAQ,CAAC;YACpD,IAAI,MAAM,CAAC,SAAS;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YACvD,IAAI,MAAM,CAAC,IAAI;gBAAE,GAAG,CAAC,IAAI,GAAG,MAAM,CAAC,IAAI,CAAC;YACxC,IAAI,OAAO,MAAM,CAAC,SAAS,KAAK,QAAQ;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YAC3E,IAAI,OAAO,MAAM,CAAC,SAAS,KAAK,QAAQ;gBAAE,GAAG,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,CAAC;YAC3E,IAAI,MAAM,CAAC,aAAa;gBAAE,GAAG,CAAC,aAAa,GAAG,MAAM,CAAC,aAAa,CAAC;YACnE,MAAM,QAAQ,GAAG,eAAe,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC;YAClD,IAAI,QAAQ;gBAAE,GAAG,CAAC,QAAQ,GAAG,QAAQ,CAAC;YACtC,OAAO,GAAG,CAAC;QACb,CAAC;QAAC,OAAO,CAAU,EAAE,CAAC;YACpB,IAAI,CAAC,YAAY,QAAQ;gBAAE,MAAM,CAAC,CAAC;YACnC,MAAM,KAAK,GAAG,CAAC,YAAY,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,CAAC;YAC5D,MAAM,CAAC,KAAK,CAAC,sCAAsC,EAAE;gBACnD,GAAG,UAAU;gBACb,YAAY,EAAE,KAAK,CAAC,OAAO;aAC5B,CAAC,CAAC;YACH,MAAM,eAAe,CAAC,wCAAwC,KAAK,CAAC,OAAO,EAAE,EAAE;gBAC7E,GAAG,OAAO;gBACV,QAAQ,EAAE,KAAK,CAAC,K
AAK,IAAI,MAAM,CAAC,KAAK,CAAC;aACvC,CAAC,CAAC;QACL,CAAC;IACH,CAAC;CACF;AAED;;;;;;;;;;;;;;;;;;;GAmBG;AACH,MAAM,CAAC,MAAM,aAAa,GAAG,IAAI,aAAa,EAAE,CAAC"}
|
package/dist/utils/parsing/index.d.ts
CHANGED
|
@@ -7,6 +7,7 @@
|
|
|
7
7
|
export { CsvParser, csvParser } from './csvParser.js';
|
|
8
8
|
export { dateParser, parseDateString, parseDateStringDetailed } from './dateParser.js';
|
|
9
9
|
export { FrontmatterParser, type FrontmatterResult, frontmatterParser, } from './frontmatterParser.js';
|
|
10
|
+
export { type ExtractArticleOptions, type ExtractArticleResult, HtmlExtractor, htmlExtractor, } from './htmlExtractor.js';
|
|
10
11
|
export { Allow, JsonParser, jsonParser } from './jsonParser.js';
|
|
11
12
|
export { type AddPageOptions, type DrawImageOptions, type DrawTextOptions, type EmbedImageOptions, type ExtractTextOptions, type ExtractTextResult, type FillFormOptions, type PageRange, type PdfMetadata, PdfParser, pdfParser, type SetMetadataOptions, } from './pdfParser.js';
|
|
12
13
|
export { thinkBlockRegex } from './thinkBlock.js';
|
package/dist/utils/parsing/index.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,SAAS,EACT,KAAK,kBAAkB,GACxB,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EACjB,KAAK,iBAAiB,EACtB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EACL,KAAK,qBAAqB,EAC1B,KAAK,oBAAoB,EACzB,aAAa,EACb,aAAa,GACd,MAAM,oBAAoB,CAAC;AAC5B,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EACL,KAAK,cAAc,EACnB,KAAK,gBAAgB,EACrB,KAAK,eAAe,EACpB,KAAK,iBAAiB,EACtB,KAAK,kBAAkB,EACvB,KAAK,iBAAiB,EACtB,KAAK,eAAe,EACpB,KAAK,SAAS,EACd,KAAK,WAAW,EAChB,SAAS,EACT,SAAS,EACT,KAAK,kBAAkB,GACxB,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
|
package/dist/utils/parsing/index.js
CHANGED
|
@@ -7,6 +7,7 @@
|
|
|
7
7
|
export { CsvParser, csvParser } from './csvParser.js';
|
|
8
8
|
export { dateParser, parseDateString, parseDateStringDetailed } from './dateParser.js';
|
|
9
9
|
export { FrontmatterParser, frontmatterParser, } from './frontmatterParser.js';
|
|
10
|
+
export { HtmlExtractor, htmlExtractor, } from './htmlExtractor.js';
|
|
10
11
|
export { Allow, JsonParser, jsonParser } from './jsonParser.js';
|
|
11
12
|
export { PdfParser, pdfParser, } from './pdfParser.js';
|
|
12
13
|
export { thinkBlockRegex } from './thinkBlock.js';
|
package/dist/utils/parsing/index.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EAEjB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EAUL,SAAS,EACT,SAAS,GAEV,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/utils/parsing/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,eAAe,EAAE,uBAAuB,EAAE,MAAM,iBAAiB,CAAC;AACvF,OAAO,EACL,iBAAiB,EAEjB,iBAAiB,GAClB,MAAM,wBAAwB,CAAC;AAChC,OAAO,EAGL,aAAa,EACb,aAAa,GACd,MAAM,oBAAoB,CAAC;AAC5B,OAAO,EAAE,KAAK,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC;AAChE,OAAO,EAUL,SAAS,EACT,SAAS,GAEV,MAAM,gBAAgB,CAAC;AACxB,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,SAAS,EAAE,SAAS,EAAE,MAAM,gBAAgB,CAAC;AACtD,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,iBAAiB,CAAC"}
|