@cyanheads/eur-lex-mcp-server 0.2.1 → 0.4.0
- package/AGENTS.md +2 -2
- package/CLAUDE.md +2 -2
- package/README.md +6 -5
- package/changelog/0.3.x/0.3.0.md +26 -0
- package/changelog/0.4.x/0.4.0.md +20 -0
- package/dist/config/server-config.d.ts.map +1 -1
- package/dist/config/server-config.js +4 -2
- package/dist/config/server-config.js.map +1 -1
- package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.d.ts +13 -0
- package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.d.ts.map +1 -1
- package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.js +140 -27
- package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.js.map +1 -1
- package/dist/mcp-server/tools/definitions/index.d.ts +13 -0
- package/dist/mcp-server/tools/definitions/index.d.ts.map +1 -1
- package/dist/services/eurlex-content/eurlex-content-service.d.ts +47 -10
- package/dist/services/eurlex-content/eurlex-content-service.d.ts.map +1 -1
- package/dist/services/eurlex-content/eurlex-content-service.js +148 -32
- package/dist/services/eurlex-content/eurlex-content-service.js.map +1 -1
- package/dist/services/eurlex-content/html-to-markdown.d.ts +29 -0
- package/dist/services/eurlex-content/html-to-markdown.d.ts.map +1 -0
- package/dist/services/eurlex-content/html-to-markdown.js +172 -0
- package/dist/services/eurlex-content/html-to-markdown.js.map +1 -0
- package/package.json +4 -2
- package/server.json +7 -7
package/AGENTS.md
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
# Developer Protocol
|
|
2
2
|
|
|
3
3
|
**Server:** eur-lex-mcp-server
|
|
4
|
-
**Version:** 0.
|
|
5
|
-
**Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.
|
|
4
|
+
**Version:** 0.4.0
|
|
5
|
+
**Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.10`
|
|
6
6
|
**Engines:** Bun ≥1.3.0, Node ≥24.0.0
|
|
7
7
|
**MCP SDK:** `@modelcontextprotocol/sdk` ^1.29.0
|
|
8
8
|
**Zod:** ^4.4.3
|
package/CLAUDE.md
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
# Developer Protocol
|
|
2
2
|
|
|
3
3
|
**Server:** eur-lex-mcp-server
|
|
4
|
-
**Version:** 0.
|
|
5
|
-
**Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.
|
|
4
|
+
**Version:** 0.4.0
|
|
5
|
+
**Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.10`
|
|
6
6
|
**Engines:** Bun ≥1.3.0, Node ≥24.0.0
|
|
7
7
|
**MCP SDK:** `@modelcontextprotocol/sdk` ^1.29.0
|
|
8
8
|
**Zod:** ^4.4.3
|
package/README.md
CHANGED
|
@@ -7,7 +7,7 @@
|
|
|
7
7
|
|
|
8
8
|
<div align="center">
|
|
9
9
|
|
|
10
|
-
[](./CHANGELOG.md) [](./LICENSE) [](https://github.com/users/cyanheads/packages/container/package/eur-lex-mcp-server) [](https://modelcontextprotocol.io/) [](https://www.npmjs.com/package/@cyanheads/eur-lex-mcp-server) [](https://www.typescriptlang.org/) [](https://bun.sh/)
|
|
11
11
|
|
|
12
12
|
</div>
|
|
13
13
|
|
|
@@ -34,7 +34,7 @@ Seven tools covering EU legal research — document discovery, content retrieval
|
|
|
34
34
|
| Tool | Description |
|
|
35
35
|
|:-----|:------------|
|
|
36
36
|
| `eurlex_search_documents` | Search EU legislation, treaties, and preparatory acts across the CELLAR corpus. Filters by document type, date range, EuroVoc concept, author institution, and in-force status. |
|
|
37
|
-
| `eurlex_get_document` | Fetch structured metadata and full text (HTML or Formex4 XML) for a work by CELEX number or ELI URI. |
|
|
37
|
+
| `eurlex_get_document` | Fetch structured metadata and full text (HTML, Markdown, or Formex4 XML) for a work by CELEX number or ELI URI. |
|
|
38
38
|
| `eurlex_lookup_celex` | Resolve an EU legal citation — a CELEX number or an ELI URI — to the canonical CELLAR work. |
|
|
39
39
|
| `eurlex_get_cases` | Search CJEU and General Court case law — judgments, orders, and Advocate General opinions — by case number, party name, subject, or date range. |
|
|
40
40
|
| `eurlex_get_relations` | Traverse the CELLAR relationship graph: amendment chain, consolidated versions, legal basis, citation network, and national transposition measures. |
|
|
@@ -61,7 +61,8 @@ Fetch the notice and full text of an EU legal act.
|
|
|
61
61
|
|
|
62
62
|
- Accepts CELEX numbers (e.g., `32016R0679`) or ELI URIs
|
|
63
63
|
- Returns structured metadata: title, date, document type, author institution, legal basis, EuroVoc subjects, in-force flag
|
|
64
|
-
- Full text in HTML (default) or Formex4 XML
|
|
64
|
+
- Full text in HTML (default), Markdown, or Formex4 XML — `format: "markdown"` converts the act body to clean Markdown server-side (recitals and numbered points as readable text, genuine data tables as GFM)
|
|
65
|
+
- Content shaping for large acts: `content_mode` `"paged"` (default) returns a bounded character window (`offset` + `limit`) with `content_chars_total` and `has_more` so you can page to the end; `"full"` returns the whole body in one call; `"metadata_only"` skips the body
|
|
65
66
|
- Supports all 24 official EU languages; defaults to English with automatic fallback when a translation is unavailable
|
|
66
67
|
- Older acts and some CJEU judgments may lack English translations
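The paging contract described in the bullets above can be exercised from a client roughly as follows. This is a minimal sketch, not the package's own client code: `PageFetcher` and `readWholeAct` are hypothetical names, and the fetcher stands in for however your MCP client invokes `eurlex_get_document`. Only the field names (`offset`, `limit`, `content`, `content_offset`, `content_chars_returned`, `has_more`) are taken from the diff.

```typescript
// Subset of the eurlex_get_document output used for paging (field names from the diff).
interface DocumentPage {
  content?: string;
  content_offset?: number;
  content_chars_returned?: number;
  has_more: boolean;
}

// Hypothetical callback: wraps whatever call your MCP client uses to invoke the tool.
type PageFetcher = (args: { offset: number; limit: number }) => Promise<DocumentPage>;

// Fetch contiguous windows until has_more is false, then join them into one body.
async function readWholeAct(fetchPage: PageFetcher, limit = 25_000): Promise<string> {
  const parts: string[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage({ offset, limit });
    if (page.content) parts.push(page.content);
    if (!page.has_more) break;
    // The next window starts where this one ended.
    const next = (page.content_offset ?? offset) + (page.content_chars_returned ?? 0);
    if (next <= offset) break; // stop if the cursor did not advance
    offset = next;
  }
  return parts.join('');
}
```

For a single round trip, requesting `content_mode: "full"` avoids the loop entirely; the loop is mainly useful when each response has to stay within a size budget.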
|
|
67
68
|
|
|
@@ -140,7 +141,7 @@ EUR-Lex-specific:
|
|
|
140
141
|
|
|
141
142
|
- No API key required — both CELLAR SPARQL and EUR-Lex REST content endpoints are publicly accessible
|
|
142
143
|
- `CellarSparqlService` POSTs `application/x-www-form-urlencoded` SPARQL with CDM prefix declarations built in; server-side LIMIT enforcement (max 100) prevents Virtuoso timeout abuse
|
|
143
|
-
- `EurLexContentService` fetches
|
|
144
|
+
- `EurLexContentService` fetches act text from the CELLAR content-negotiation resolver (`/resource/celex/{CELEX}` with `Accept` / `Accept-Language` headers); HTML and Formex4 XML pass through, Markdown is converted server-side from the HTML body
|
|
144
145
|
- Virtuoso error classification: HTTP 200 with `Virtuoso 37000 Error` body is parsed and re-raised as `ServiceUnavailable` (transient/timeout) or `InvalidParams` (syntax error)
|
|
145
146
|
- Language fallback on document fetch: if the requested language is unavailable, retries with English; returns metadata-only with a note when English also fails
|
|
146
147
|
- Typed error contracts on every tool — structured `reason` codes let agents branch on outcomes without parsing text
|
|
@@ -275,7 +276,7 @@ All configuration is validated at startup via Zod schemas in `src/config/server-
|
|
|
275
276
|
| Variable | Description | Default |
|
|
276
277
|
|:---------|:------------|:--------|
|
|
277
278
|
| `CELLAR_SPARQL_ENDPOINT` | CELLAR SPARQL endpoint URL override (e.g., for a local Virtuoso mirror). | `http://publications.europa.eu/webapi/rdf/sparql` |
|
|
278
|
-
| `EURLEX_CONTENT_BASE_URL` |
|
|
279
|
+
| `EURLEX_CONTENT_BASE_URL` | EU Publications Office CELLAR content resolver base URL override. | `http://publications.europa.eu` |
|
|
279
280
|
| `SPARQL_QUERY_TIMEOUT_MS` | Client-side timeout for SPARQL requests in milliseconds. | `55000` |
|
|
280
281
|
| `MAX_SPARQL_RESULTS` | Enforced ceiling on LIMIT in all generated SPARQL queries. | `100` |
|
|
281
282
|
| `MCP_TRANSPORT_TYPE` | Transport: `stdio` or `http`. | `stdio` |
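As an illustration of how the `EURLEX_CONTENT_BASE_URL` override and its default interact, here is a dependency-free sketch. The package itself validates configuration with a Zod schema (`z.string().url().default(...)`); the function below is a hypothetical stand-in that uses the WHATWG `URL` constructor for validation, and `resolveContentBaseUrl` is not a name from the package.

```typescript
// Default taken from the 0.4.0 README table.
const DEFAULT_CONTENT_BASE_URL = 'http://publications.europa.eu';

// Hypothetical helper: return the override when set and well-formed, else the default.
function resolveContentBaseUrl(env: Record<string, string | undefined>): string {
  const raw = env.EURLEX_CONTENT_BASE_URL?.trim();
  if (!raw) return DEFAULT_CONTENT_BASE_URL;
  new URL(raw); // throws TypeError when the override is not a valid absolute URL
  return raw;
}
```

Failing at startup on a malformed override, rather than on the first content fetch, matches the README's statement that configuration is validated at startup.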
|
|
package/changelog/0.3.x/0.3.0.md
ADDED
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: "eurlex_get_document re-sources act text from the EU Publications Office CELLAR resolver and refuses AWS WAF bot-challenge stubs (previously surfaced as content); adds content_mode/offset/limit body pagination with content_* navigation fields and removes the 8,000-char text cut"
|
|
3
|
+
breaking: false
|
|
4
|
+
security: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# 0.3.0 — 2026-06-30
|
|
8
|
+
|
|
9
|
+
## Added
|
|
10
|
+
|
|
11
|
+
- **`content_mode`, `offset`, `limit` inputs on `eurlex_get_document`** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — `content_mode` selects how much body to return: `"paged"` (default) yields a bounded `[offset, offset+limit)` window, `"full"` the entire body in one call, `"metadata_only"` skips the content fetch. `limit` defaults to 25,000 characters, capped at 100,000.
|
|
12
|
+
- **`content_*` navigation output fields** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — `content_mode`, `content_offset`, `content_chars_returned`, `content_chars_total`, and `has_more` let a client page contiguous windows to the end of an act and reconstruct the full body. Paging past the end returns an empty window with `has_more: false`, not an error.
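A small worked example may make the window semantics concrete. The sketch below mirrors the behavior described in the two entries above (a `[offset, offset+limit)` window, with an empty window rather than an error when paging past the end); `sliceWindow` and `ContentWindow` are illustrative names, not exports of the package.

```typescript
// Navigation fields named in the changelog entry above.
interface ContentWindow {
  content?: string;
  content_offset: number;
  content_chars_returned: number;
  content_chars_total: number;
  has_more: boolean;
}

// Illustrative helper: cut a bounded window out of a full body.
function sliceWindow(body: string, offset: number, limit: number): ContentWindow {
  const total = body.length;
  const start = Math.min(offset, total); // clamp so over-paging yields an empty window
  const text = body.slice(start, start + limit);
  const result: ContentWindow = {
    content_offset: start,
    content_chars_returned: text.length,
    content_chars_total: total,
    has_more: start + text.length < total,
  };
  if (text.length > 0) result.content = text; // content is omitted for an empty window
  return result;
}
```

With a 10-character body, `sliceWindow(body, 8, 5)` returns the last two characters with `has_more: false`, and `sliceWindow(body, 50, 5)` returns no `content`, `content_offset: 10`, and `content_chars_returned: 0`.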
|
|
13
|
+
|
|
14
|
+
## Changed
|
|
15
|
+
|
|
16
|
+
- **Act full text re-sourced from CELLAR content negotiation** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — `EurLexContentService` fetches `publications.europa.eu/resource/celex/{CELEX}` (`application/xhtml+xml` → `text/html` for HTML, Formex 4 for XML) instead of the WAF-fronted `eur-lex.europa.eu` legal-content endpoint. `Accept-Language` maps EUR-Lex two-letter codes to the ISO 639-2/T forms CELLAR requires; a multi-part Formex `300` response is treated as unavailable rather than reconstructed.
|
|
17
|
+
- **`EURLEX_CONTENT_BASE_URL` default is now `http://publications.europa.eu`** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — was `https://eur-lex.europa.eu`. The env var name is unchanged; `server.json`, `README.md`, and `.env.example` updated to match the CELLAR content-negotiation source.
|
|
18
|
+
- **`content_available` distinguishes "not requested" from "unavailable"** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — now `false` in `"metadata_only"` mode (no fetch attempted); pair it with `content_mode` to tell the two apart.
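The content-negotiation request described in the first entry above could be assembled along these lines. This is a sketch under stated assumptions: the language map is deliberately partial, `buildContentRequest` is a hypothetical helper rather than a method of `EurLexContentService`, and the `Accept` value is passed in by the caller because the exact header the service sends is not shown in the diff.

```typescript
// Partial map from EUR-Lex two-letter codes to ISO 639-2/T codes; extend as needed.
const ISO_639_2T: Record<string, string> = {
  EN: 'eng',
  FR: 'fra',
  DE: 'deu',
  ES: 'spa',
  IT: 'ita',
};

interface ContentRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: build the resolver URL and negotiation headers for one act.
function buildContentRequest(
  baseUrl: string,
  celex: string,
  language: string,
  accept: string,
): ContentRequest {
  const code = ISO_639_2T[language.toUpperCase()];
  if (!code) throw new Error(`No ISO 639-2/T mapping for language ${language}`);
  return {
    // Strip trailing slashes from the base so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, '')}/resource/celex/${encodeURIComponent(celex)}`,
    headers: { Accept: accept, 'Accept-Language': code },
  };
}
```

For example, `buildContentRequest('http://publications.europa.eu', '32016R0679', 'de', 'application/xhtml+xml')` targets `/resource/celex/32016R0679` with `Accept-Language: deu`.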
|
|
19
|
+
|
|
20
|
+
## Removed
|
|
21
|
+
|
|
22
|
+
- **The 8,000-character `format()` content cut** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — the text view and `structuredContent.content` honor the same `content_mode` window, so there is no separate downstream truncation.
|
|
23
|
+
|
|
24
|
+
## Fixed
|
|
25
|
+
|
|
26
|
+
- **AWS WAF bot-challenge stub surfaced as act content** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — `eurlex_get_document` returned the JavaScript bot-challenge interstitial (the `awswaf`/`gokuProps` stub) as `content` with `content_available: true` for every CELEX. A response carrying a WAF challenge signature is now detected and raised as a `content_challenge` / `ServiceUnavailable` error rather than reported as available content.
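One way to picture the fix: inspect the response body for the challenge markers before treating it as act content. The sketch below is an assumption about the general shape of such a check, not the package's detection logic (which is not shown in this diff). It uses only the two marker strings the entry names, `awswaf` and `gokuProps`, and `checkContent` is a hypothetical helper.

```typescript
// Marker strings named in the changelog entry; a real check may use a stricter signature.
const WAF_MARKERS = ['awswaf', 'gokuProps'];

type ContentCheck =
  | { ok: true; body: string }
  | { ok: false; reason: 'content_challenge' };

// Hypothetical helper: refuse bodies that look like a bot-challenge interstitial.
function checkContent(body: string): ContentCheck {
  const challenged = WAF_MARKERS.some((marker) => body.includes(marker));
  return challenged ? { ok: false, reason: 'content_challenge' } : { ok: true, body };
}
```

Returning a structured `reason` instead of the raw body lets a caller map the outcome to a `ServiceUnavailable` error rather than reporting `content_available: true` for a page that contains no act text.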
|
|
package/changelog/0.4.x/0.4.0.md
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: "eurlex_get_document adds an opt-in markdown format — server-side HTML→Markdown of the act body with GFM data tables — and bumps @cyanheads/mcp-ts-core to ^0.10.10, refreshing the lockfile to clear transitive advisories in hono, js-yaml, and esbuild"
|
|
3
|
+
breaking: false
|
|
4
|
+
security: true
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# 0.4.0 — 2026-06-30
|
|
8
|
+
|
|
9
|
+
## Added
|
|
10
|
+
|
|
11
|
+
- **`markdown` value on the `eurlex_get_document` `format` input** ([#13](https://github.com/cyanheads/eur-lex-mcp-server/issues/13)) — opt-in server-side HTML→Markdown of the act body; `html` stays the default and `content_format` reports `"markdown"`. The CONVEX numbering layout tables EUR-Lex uses for recitals, article paragraphs, and lettered/roman points flatten to inline-marked text (`(1) The protection of natural persons…`) instead of unreadable two-column rows; genuine `oj-table` data tables convert to GFM. Composes with the existing `content_mode`/`offset`/`limit` pagination.
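To show the two output shapes this entry describes, the sketch below works on cell strings that have already been extracted from the HTML. It is not the package's converter (which, per the changelog, builds on `node-html-markdown` and `node-html-parser`); `flattenNumberedRow` and `toGfmTable` are hypothetical helpers for illustration only.

```typescript
// Flatten one numbering-layout row (marker cell + text cell) into inline-marked text.
function flattenNumberedRow(marker: string, text: string): string {
  return `${marker.trim()} ${text.trim().replace(/\s+/g, ' ')}`;
}

// Render a genuine data table as a GFM table, escaping pipes inside cells.
function toGfmTable(header: string[], rows: string[][]): string {
  const cell = (s: string) => s.trim().replace(/\s+/g, ' ').replace(/\|/g, '\\|');
  const line = (cells: string[]) => `| ${cells.map(cell).join(' | ')} |`;
  return [line(header), line(header.map(() => '---')), ...rows.map(line)].join('\n');
}
```

The first helper turns a row such as `["(1)", "The protection of natural persons…"]` into a single readable line, while the second keeps real tabular data in table form.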
|
|
12
|
+
|
|
13
|
+
## Changed
|
|
14
|
+
|
|
15
|
+
- **`@cyanheads/mcp-ts-core` `^0.10.9 → ^0.10.10`** — framework patch; lockfile refreshed.
|
|
16
|
+
- **New runtime dependencies** — `node-html-markdown` `^2.0.0` and `node-html-parser` `^8.0.4` power the Markdown conversion.
|
|
17
|
+
|
|
18
|
+
## Security
|
|
19
|
+
|
|
20
|
+
- **Transitive advisories cleared** — the framework bump and lockfile refresh resolved advisories in transitive dependencies: `hono` `4.12.26 → 4.12.27`, `js-yaml` `3.14.2 → 3.15.0`, and `esbuild` `0.28.0` (removed — `vite` `8.0.16 → 8.1.2` switched to rolldown, dropping the esbuild peer). `bun audit` reports no advisories.
|
|
package/dist/config/server-config.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"server-config.d.ts","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAG3C,QAAA,MAAM,kBAAkB;;;;;
|
|
1
|
+
{"version":3,"file":"server-config.d.ts","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAG3C,QAAA,MAAM,kBAAkB;;;;;iBA4BtB,CAAC;AAEH,MAAM,MAAM,YAAY,GAAG,CAAC,CAAC,KAAK,CAAC,OAAO,kBAAkB,CAAC,CAAC;AAI9D,wBAAgB,eAAe,IAAI,YAAY,CAQ9C"}
|
|
package/dist/config/server-config.js
CHANGED
|
@@ -14,8 +14,10 @@ const ServerConfigSchema = z.object({
|
|
|
14
14
|
eurLexContentBaseUrl: z
|
|
15
15
|
.string()
|
|
16
16
|
.url()
|
|
17
|
-
.default('
|
|
18
|
-
.describe('
|
|
17
|
+
.default('http://publications.europa.eu')
|
|
18
|
+
.describe('Base URL of the EU Publications Office CELLAR content-negotiation resolver, which serves ' +
|
|
19
|
+
'act text via /resource/celex/{CELEX}. Replaces the WAF-protected eur-lex.europa.eu ' +
|
|
20
|
+
'legal-content endpoint, which now returns an AWS WAF bot-challenge stub (issue #16).'),
|
|
19
21
|
sparqlQueryTimeoutMs: z.coerce
|
|
20
22
|
.number()
|
|
21
23
|
.int()
|
|
package/dist/config/server-config.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"server-config.js","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAC3C,OAAO,EAAE,cAAc,EAAE,MAAM,+BAA+B,CAAC;AAE/D,MAAM,kBAAkB,GAAG,CAAC,CAAC,MAAM,CAAC;IAClC,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,iDAAiD,CAAC;SAC1D,QAAQ,CAAC,uCAAuC,CAAC;IACpD,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,
|
|
1
|
+
{"version":3,"file":"server-config.js","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAC3C,OAAO,EAAE,cAAc,EAAE,MAAM,+BAA+B,CAAC;AAE/D,MAAM,kBAAkB,GAAG,CAAC,CAAC,MAAM,CAAC;IAClC,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,iDAAiD,CAAC;SAC1D,QAAQ,CAAC,uCAAuC,CAAC;IACpD,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,+BAA+B,CAAC;SACxC,QAAQ,CACP,2FAA2F;QACzF,qFAAqF;QACrF,sFAAsF,CACzF;IACH,oBAAoB,EAAE,CAAC,CAAC,MAAM;SAC3B,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,OAAO,CAAC,MAAM,CAAC;SACf,QAAQ,CAAC,yDAAyD,CAAC;IACtE,gBAAgB,EAAE,CAAC,CAAC,MAAM;SACvB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,GAAG,CAAC,GAAG,CAAC;SACR,OAAO,CAAC,GAAG,CAAC;SACZ,QAAQ,CAAC,2DAA2D,CAAC;CACzE,CAAC,CAAC;AAIH,IAAI,OAAiC,CAAC;AAEtC,MAAM,UAAU,eAAe;IAC7B,OAAO,KAAK,cAAc,CAAC,kBAAkB,EAAE;QAC7C,oBAAoB,EAAE,wBAAwB;QAC9C,oBAAoB,EAAE,yBAAyB;QAC/C,oBAAoB,EAAE,yBAAyB;QAC/C,gBAAgB,EAAE,oBAAoB;KACvC,CAAC,CAAC;IACH,OAAO,OAAO,CAAC;AACjB,CAAC"}
|
|
package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.d.ts
CHANGED
|
@@ -11,7 +11,15 @@ export declare const eurlex_get_document: import("@cyanheads/mcp-ts-core").ToolD
|
|
|
11
11
|
format: z.ZodDefault<z.ZodEnum<{
|
|
12
12
|
html: "html";
|
|
13
13
|
xml: "xml";
|
|
14
|
+
markdown: "markdown";
|
|
14
15
|
}>>;
|
|
16
|
+
content_mode: z.ZodDefault<z.ZodEnum<{
|
|
17
|
+
full: "full";
|
|
18
|
+
metadata_only: "metadata_only";
|
|
19
|
+
paged: "paged";
|
|
20
|
+
}>>;
|
|
21
|
+
offset: z.ZodDefault<z.ZodNumber>;
|
|
22
|
+
limit: z.ZodDefault<z.ZodNumber>;
|
|
15
23
|
}, z.core.$strip>, z.ZodObject<{
|
|
16
24
|
celex_number: z.ZodString;
|
|
17
25
|
work_uri: z.ZodOptional<z.ZodString>;
|
|
@@ -23,7 +31,12 @@ export declare const eurlex_get_document: import("@cyanheads/mcp-ts-core").ToolD
|
|
|
23
31
|
eurovoc_subjects: z.ZodOptional<z.ZodArray<z.ZodString>>;
|
|
24
32
|
in_force: z.ZodOptional<z.ZodBoolean>;
|
|
25
33
|
content: z.ZodOptional<z.ZodString>;
|
|
34
|
+
content_mode: z.ZodString;
|
|
26
35
|
content_available: z.ZodBoolean;
|
|
36
|
+
content_offset: z.ZodOptional<z.ZodNumber>;
|
|
37
|
+
content_chars_returned: z.ZodOptional<z.ZodNumber>;
|
|
38
|
+
content_chars_total: z.ZodOptional<z.ZodNumber>;
|
|
39
|
+
has_more: z.ZodBoolean;
|
|
27
40
|
language: z.ZodString;
|
|
28
41
|
language_fallback: z.ZodOptional<z.ZodString>;
|
|
29
42
|
content_format: z.ZodString;
|
|
package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"eurlex-get-document.tool.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAQ,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;
|
|
1
|
+
{"version":3,"file":"eurlex-get-document.tool.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAQ,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AA2BjE,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cAma9B,CAAC"}
|
|
package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.js
CHANGED
|
@@ -8,17 +8,29 @@ import { ENG_LANGUAGE_URI, resolveCorporateBodyLabel, resolveResourceTypeLabel,
|
|
|
8
8
|
import { CellarSparqlService, getCellarSparqlService, } from '../../../services/cellar-sparql/cellar-sparql-service.js';
|
|
9
9
|
import { escapeSparqlLiteral, resolveEliToWork } from '../../../services/cellar-sparql/eli-resolution.js';
|
|
10
10
|
import { getEurLexContentService, } from '../../../services/eurlex-content/eurlex-content-service.js';
|
|
11
|
+
/**
|
|
12
|
+
* Default character window returned for body content in "paged" mode — bounds a
|
|
13
|
+
* single call while keeping small acts whole. The tail of a larger act is never
|
|
14
|
+
* lost: page forward with `offset`, or request `content_mode: "full"`.
|
|
15
|
+
*/
|
|
16
|
+
const DEFAULT_CONTENT_LIMIT = 25_000;
|
|
17
|
+
/** Hard ceiling on one paged window. Use `content_mode: "full"` for the whole body in a single call. */
|
|
18
|
+
const MAX_CONTENT_LIMIT = 100_000;
|
|
11
19
|
export const eurlex_get_document = tool('eurlex_get_document', {
|
|
12
20
|
title: 'Get EU Document',
|
|
13
21
|
description: 'Fetch the notice (metadata) and full text of an EU act by CELEX number or ELI URI. ' +
|
|
14
22
|
'Returns structured metadata — title, date, document type, author institution, legal basis, EuroVoc subjects — ' +
|
|
15
|
-
'plus the HTML or Formex4 XML
|
|
23
|
+
'plus the act content as HTML, Markdown, or Formex4 XML in the requested language. ' +
|
|
16
24
|
'Defaults to English (EN); not all works have content in all 24 official EU languages, ' +
|
|
17
25
|
'especially older acts pre-2004 EU enlargement. ' +
|
|
18
26
|
'If the requested language is unavailable, the server automatically falls back to English and notes the fallback. ' +
|
|
19
27
|
'CELEX format: {sector}{year}{type}{number} e.g. 32016R0679 for GDPR. ' +
|
|
20
28
|
'Use eurlex_lookup_celex to validate an identifier before calling this tool. ' +
|
|
21
|
-
'HTML
|
|
29
|
+
'HTML returns the full act text as served by EUR-Lex; markdown converts that HTML to clean Markdown server-side ' +
|
|
30
|
+
'(recitals and numbered points as readable text, genuine data tables as GFM tables); XML returns Formex4 for structured processing. ' +
|
|
31
|
+
'Large bodies are bounded per call but never lost: content_mode "paged" (default) returns a character window ' +
|
|
32
|
+
'(offset + limit) alongside content_chars_total and has_more, so you can page to the end and reconstruct the whole act; ' +
|
|
33
|
+
'content_mode "full" returns the entire body in one call; content_mode "metadata_only" returns metadata with no body and skips the content fetch.',
|
|
22
34
|
annotations: { readOnlyHint: true, idempotentHint: true, openWorldHint: true },
|
|
23
35
|
input: z.object({
|
|
24
36
|
celex_number: z
|
|
@@ -39,9 +51,33 @@ export const eurlex_get_document = tool('eurlex_get_document', {
|
|
|
39
51
|
.describe('Language code for document content (ISO 639-1 uppercase, e.g. EN, FR, DE). ' +
|
|
40
52
|
'Defaults to EN. Falls back to EN if the requested language is unavailable.'),
|
|
41
53
|
format: z
|
|
42
|
-
.enum(['html', 'xml'])
|
|
54
|
+
.enum(['html', 'xml', 'markdown'])
|
|
43
55
|
.default('html')
|
|
44
|
-
.describe('Content format: "html" for
|
|
56
|
+
.describe('Content format: "html" for the act text as served by EUR-Lex (default); ' +
|
|
57
|
+
'"markdown" for that HTML converted to clean Markdown server-side ' +
|
|
58
|
+
'(recitals and numbered points as readable text, genuine data tables as GFM); ' +
|
|
59
|
+
'"xml" for Formex4 XML structured format.'),
|
|
60
|
+
content_mode: z
|
|
61
|
+
.enum(['metadata_only', 'paged', 'full'])
|
|
62
|
+
.default('paged')
|
|
63
|
+
.describe('How much of the document body to return. "paged" (default) returns a bounded character window — see offset/limit; ' +
|
|
64
|
+
'"full" returns the entire body in one call (large acts can be hundreds of KB); ' +
|
|
65
|
+
'"metadata_only" returns metadata with no body and skips the content fetch. offset and limit apply only to "paged".'),
|
|
66
|
+
offset: z
|
|
67
|
+
.number()
|
|
68
|
+
.int()
|
|
69
|
+
.min(0)
|
|
70
|
+
.default(0)
|
|
71
|
+
.describe('Character offset into the full document body where the returned window starts ("paged" mode only). ' +
|
|
72
|
+
'Page forward by setting offset = content_offset + content_chars_returned from the previous call.'),
|
|
73
|
+
limit: z
|
|
74
|
+
.number()
|
|
75
|
+
.int()
|
|
76
|
+
.min(1)
|
|
77
|
+
.max(MAX_CONTENT_LIMIT)
|
|
78
|
+
.default(DEFAULT_CONTENT_LIMIT)
|
|
79
|
+
.describe(`Maximum characters of body content to return in this window ("paged" mode only). Default ${DEFAULT_CONTENT_LIMIT}, max ${MAX_CONTENT_LIMIT}. ` +
|
|
80
|
+
'For the entire body in one response, use content_mode "full" instead of a large limit.'),
|
|
45
81
|
}),
|
|
46
82
|
output: z.object({
|
|
47
83
|
celex_number: z.string().describe('Confirmed CELEX number for the retrieved work.'),
|
|
@@ -71,14 +107,44 @@ export const eurlex_get_document = tool('eurlex_get_document', {
|
|
|
71
107
|
content: z
|
|
72
108
|
.string()
|
|
73
109
|
.optional()
|
|
74
|
-
.describe('
|
|
75
|
-
|
|
110
|
+
.describe('Body content of the act in the requested format and language. In "paged" mode this is a character window ' +
|
|
111
|
+
'(see content_offset / content_chars_returned / has_more); in "full" mode the entire body; ' +
|
|
112
|
+
'omitted in "metadata_only" mode, when the window is empty (offset past the end), or when content is unavailable.'),
|
|
113
|
+
content_mode: z
|
|
114
|
+
.string()
|
|
115
|
+
.describe('Content mode applied to this response: "metadata_only", "paged", or "full".'),
|
|
116
|
+
content_available: z
|
|
117
|
+
.boolean()
|
|
118
|
+
.describe('Whether body content was fetched from EUR-Lex. False in "metadata_only" mode (no fetch attempted) — ' +
|
|
119
|
+
'use content_mode to distinguish "not requested" from "unavailable upstream".'),
|
|
120
|
+
content_offset: z
|
|
121
|
+
.number()
|
|
122
|
+
.int()
|
|
123
|
+
.optional()
|
|
124
|
+
.describe('Character offset where the returned content window begins. Present when a body was fetched and available.'),
|
|
125
|
+
content_chars_returned: z
|
|
126
|
+
.number()
|
|
127
|
+
.int()
|
|
128
|
+
.optional()
|
|
129
|
+
.describe('Number of body characters returned in this response (equals content length). Present when a body was fetched and available.'),
|
|
130
|
+
content_chars_total: z
|
|
131
|
+
.number()
|
|
132
|
+
.int()
|
|
133
|
+
.optional()
|
|
134
|
+
.describe('Total character length of the full document body. Present when content was fetched and available; ' +
|
|
135
|
+
'use with content_offset to page through the entire act.'),
|
|
136
|
+
has_more: z
|
|
137
|
+
.boolean()
|
|
138
|
+
.describe('True when body content exists beyond the returned window. Page forward with offset = content_offset + content_chars_returned, ' +
|
|
139
|
+
'or request content_mode "full" for the entire act in one call. Always false in "metadata_only" mode.'),
|
|
76
140
|
language: z.string().describe('Language code of the returned content.'),
|
|
77
141
|
language_fallback: z
|
|
78
142
|
.string()
|
|
79
143
|
.optional()
|
|
80
144
|
.describe('Human-readable note explaining the fallback that occurred (e.g. "Requested FR content unavailable; returned EN"). Present only when a fallback happened.'),
|
|
81
|
-
content_format: z
|
|
145
|
+
content_format: z
|
|
146
|
+
.string()
|
|
147
|
+
.describe('Format of the returned content: "html", "markdown", or "xml".'),
|
|
82
148
|
}),
|
|
83
149
|
errors: [
|
|
84
150
|
{
|
|
@@ -183,12 +249,18 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
|
|
|
183
249
|
const inForceStr = CellarSparqlService.bindingValue(first, 'inForce');
|
|
184
250
|
const inForce = inForceStr !== undefined ? inForceStr === 'true' : undefined;
|
|
185
251
|
const authorUri = CellarSparqlService.bindingValue(first, 'author');
|
|
186
|
-
// Step 2:
|
|
187
|
-
|
|
252
|
+
// Step 2: assemble metadata, then shape the body per content_mode. The body
|
|
253
|
+
// is one navigable mechanism — "metadata_only" skips the fetch entirely,
|
|
254
|
+
// "full" returns the whole body, and "paged" returns a bounded
|
|
255
|
+
// [offset, offset+limit) window with content_chars_total + has_more so the
|
|
256
|
+
// tail is always reachable. The same shaped `content` feeds both
|
|
257
|
+
// structuredContent and format(); there is no separate truncation downstream.
|
|
188
258
|
const result = {
|
|
189
259
|
celex_number: confirmedCelex,
|
|
190
|
-
|
|
191
|
-
|
|
260
|
+
content_mode: input.content_mode,
|
|
261
|
+
content_available: false,
|
|
262
|
+
has_more: false,
|
|
263
|
+
language,
|
|
192
264
|
content_format: format,
|
|
193
265
|
};
|
|
194
266
|
if (workUri)
|
|
@@ -207,11 +279,36 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
|
|
|
207
279
|
result.eurovoc_subjects = [...eurovocConcepts];
|
|
208
280
|
if (typeof inForce === 'boolean')
|
|
209
281
|
result.in_force = inForce;
|
|
210
|
-
if (
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
282
|
+
if (input.content_mode !== 'metadata_only') {
|
|
283
|
+
const contentResult = await contentSvc.fetchContent(celexNumber, language, format, ctx);
|
|
284
|
+
result.content_available = contentResult.contentAvailable;
|
|
285
|
+
result.language = contentResult.language;
|
|
286
|
+
if (contentResult.languageFallback) {
|
|
287
|
+
result.language_fallback = contentResult.languageFallback;
|
|
288
|
+
}
|
|
289
|
+
if (contentResult.contentAvailable && contentResult.content) {
|
|
290
|
+
const full = contentResult.content;
|
|
291
|
+
const total = full.length;
|
|
292
|
+
result.content_chars_total = total;
|
|
293
|
+
if (input.content_mode === 'full') {
|
|
294
|
+
result.content = full;
|
|
295
|
+
result.content_offset = 0;
|
|
296
|
+
result.content_chars_returned = total;
|
|
297
|
+
result.has_more = false;
|
|
298
|
+
}
|
|
299
|
+
else {
|
|
300
|
+
// Bounded [offset, offset+limit) window over the full body. offset is
|
|
301
|
+
// clamped to the body length so over-paging returns an empty window
|
|
302
|
+
// (has_more false) rather than erroring.
|
|
303
|
+
const offset = Math.min(input.offset, total);
|
|
304
|
+
const windowText = full.slice(offset, offset + input.limit);
|
|
305
|
+
result.content_offset = offset;
|
|
306
|
+
result.content_chars_returned = windowText.length;
|
|
307
|
+
result.has_more = offset + windowText.length < total;
|
|
308
|
+
if (windowText.length > 0)
|
|
309
|
+
result.content = windowText;
|
|
310
|
+
}
|
|
311
|
+
}
|
|
215
312
|
}
|
|
216
313
|
return result;
|
|
217
314
|
},
|
|
@@ -238,22 +335,38 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
|
|
|
238
335
|
lines.push(`**Language:** ${result.language} | **Format:** ${result.content_format}`);
|
|
239
336
|
if (result.language_fallback)
|
|
240
337
|
lines.push(`*Note: ${result.language_fallback}*`);
|
|
241
|
-
|
|
242
|
-
|
|
338
|
+
// Body rendering honors the same window as structuredContent.content — the
|
|
339
|
+
// shaped content is emitted verbatim with a navigation line; no second cut.
|
|
340
|
+
if (result.content_mode === 'metadata_only') {
|
|
243
341
|
lines.push('');
|
|
244
|
-
lines.push('
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
const
|
|
248
|
-
if (result.content
|
|
249
|
-
|
|
250
|
-
|
|
342
|
+
lines.push('*Body omitted (content_mode "metadata_only"). Request content_mode "paged" or "full" to retrieve the text.*');
|
|
343
|
+
}
|
|
344
|
+
else if (result.content_available) {
|
|
345
|
+
const total = result.content_chars_total ?? result.content?.length ?? 0;
|
|
346
|
+
if (result.content) {
|
|
347
|
+
const start = result.content_offset ?? 0;
|
|
348
|
+
const returned = result.content_chars_returned ?? result.content.length;
|
|
349
|
+
const end = start + returned;
|
|
350
|
+
if (result.content_mode === 'full') {
|
|
351
|
+
lines.push(`**Content** (full): full body — ${returned} of ${total} characters.`);
|
|
352
|
+
}
|
|
353
|
+
else {
|
|
354
|
+
lines.push(`**Content** (${result.content_mode}): characters ${start}–${end} of ${total} (${returned} returned).` +
|
|
355
|
+
(result.has_more
|
|
356
|
+
? ` More available — page forward with offset=${end}, or content_mode="full" for the entire act.`
|
|
357
|
+
: ' End of document.'));
|
|
358
|
+
}
|
|
359
|
+
lines.push('');
|
|
360
|
+
lines.push('---');
|
|
361
|
+
lines.push('');
|
|
362
|
+
lines.push(result.content);
|
|
251
363
|
}
|
|
252
364
|
else {
|
|
253
|
-
lines.push(
|
|
365
|
+
lines.push('');
|
|
366
|
+
lines.push(`*No content at offset ${result.content_offset ?? 0} — past the end of the ${total}-character body. Lower offset to read.*`);
|
|
254
367
|
}
|
|
255
368
|
}
|
|
256
|
-
else
|
|
369
|
+
else {
|
|
257
370
|
lines.push('');
|
|
258
371
|
lines.push('*Document content is not available for this work in the requested language.*');
|
|
259
372
|
}
|
|
package/dist/mcp-server/tools/definitions/eurlex-get-document.tool.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"eurlex-get-document.tool.js","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAE,IAAI,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AACjE,OAAO,EACL,gBAAgB,EAChB,yBAAyB,EACzB,wBAAwB,GACzB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EACL,mBAAmB,EACnB,sBAAsB,GACvB,MAAM,mDAAmD,CAAC;AAC3D,OAAO,EAAE,mBAAmB,EAAE,gBAAgB,EAAE,MAAM,4CAA4C,CAAC;AACnG,OAAO,EAGL,uBAAuB,GACxB,MAAM,qDAAqD,CAAC;AAE7D,MAAM,CAAC,MAAM,mBAAmB,GAAG,IAAI,CAAC,qBAAqB,EAAE;IAC7D,KAAK,EAAE,iBAAiB;IACxB,WAAW,EACT,qFAAqF;QACrF,gHAAgH;QAChH,kEAAkE;QAClE,wFAAwF;QACxF,iDAAiD;QACjD,mHAAmH;QACnH,uEAAuE;QACvE,8EAA8E;QAC9E,4GAA4G;IAC9G,WAAW,EAAE,EAAE,YAAY,EAAE,IAAI,EAAE,cAAc,EAAE,IAAI,EAAE,aAAa,EAAE,IAAI,EAAE;IAC9E,KAAK,EAAE,CAAC,CAAC,MAAM,CAAC;QACd,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,+DAA+D;YAC7D,iDAAiD,CACpD;QACH,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,4EAA4E;YAC1E,2EAA2E;YAC3E,iDAAiD,CACpD;QACH,QAAQ,EAAE,CAAC;aACR,MAAM,EAAE;aACR,KAAK,CAAC,iBAAiB,CAAC;aACxB,OAAO,CAAC,IAAI,CAAC;aACb,QAAQ,CACP,6EAA6E;YAC3E,4EAA4E,CAC/E;QACH,MAAM,EAAE,CAAC;aACN,IAAI,CAAC,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;aACrB,OAAO,CAAC,MAAM,CAAC;aACf,QAAQ,CACP,mGAAmG,CACpG;KACJ,CAAC;IACF,MAAM,EAAE,CAAC,CAAC,MAAM,CAAC;QACf,YAAY,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACnF,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,kBAAkB,CAAC;QAC5D,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,uFAAuF,CACxF;QACH,IAAI,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACtF,aAAa,EAAE,CAAC;aACb,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,mGAAmG,CACpG;QACH,kBAAkB,EAAE,CAAC;aAClB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,oIAAoI,CACrI;QACH,WAAW,EAAE,CAAC;aACX,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,2CAA2C,CAAC,CAAC;aACvE,QAAQ,EAAE;aACV,QAAQ,CAAC,iCAAiC,CAAC;QAC9C,gBAAgB,EAAE,CAAC;aAChB,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,sBAAsB,CAAC
,CAAC;aAClD,QAAQ,EAAE;aACV,QAAQ,CAAC,kCAAkC,CAAC;QAC/C,QAAQ,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACnF,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CAAC,oEAAoE,CAAC;QACjF,iBAAiB,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,sDAAsD,CAAC;QAC/F,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACvE,iBAAiB,EAAE,CAAC;aACjB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,0JAA0J,CAC3J;QACH,cAAc,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,kDAAkD,CAAC;KACxF,CAAC;IAEF,MAAM,EAAE;QACN;YACE,MAAM,EAAE,yBAAyB;YACjC,IAAI,EAAE,gBAAgB,CAAC,eAAe;YACtC,IAAI,EAAE,8DAA8D;YACpE,QAAQ,EAAE,iDAAiD;SAC5D;QACD;YACE,MAAM,EAAE,WAAW;YACnB,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,sFAAsF;YAC5F,QAAQ,EAAE,wEAAwE;SACnF;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,qFAAqF;YAC3F,QAAQ,EACN,gGAAgG;SACnG;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,kBAAkB;YACzC,IAAI,EAAE,wEAAwE;YAC9E,QAAQ,EACN,oFAAoF;SACvF;KACF;IAED,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG;QACtB,MAAM,SAAS,GAAG,sBAAsB,EAAE,CAAC;QAC3C,MAAM,UAAU,GAAG,uBAAuB,EAAE,CAAC;QAE7C,2EAA2E;QAC3E,0EAA0E;QAC1E,6EAA6E;QAC7E,qEAAqE;QACrE,4EAA4E;QAC5E,oEAAoE;QACpE,MAAM,UAAU,GAAG,KAAK,CAAC,YAAY,EAAE,IAAI,EAAE,CAAC;QAC9C,MAAM,QAAQ,GAAG,KAAK,CAAC,OAAO,EAAE,IAAI,EAAE,CAAC;QAEvC,IAAI,WAAmB,CAAC;QACxB,IAAI,QAAQ,IAAI,CAAC,UAAU,EAAE,CAAC;YAC5B,MAAM,OAAO,GAAG,MAAM,gBAAgB,CAAC,SAAS,EAAE,QAAQ,EAAE,GAAG,CAAC,CAAC;YACjE,MAAM,aAAa,GAAG,OAAO,IAAI,mBAAmB,CAAC,YAAY,CAAC,OAAO,EAAE,aAAa,CAAC,CAAC;YAC1F,IAAI,CAAC,aAAa,EAAE,CAAC;gBACnB,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,iCAAiC,QAAQ,EAAE,EAAE;oBACvE,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;iBAChC,CAAC,CAAC;YACL,CAAC;YACD,WAAW,GAAG,aAAa,CAAC;QAC9B,CAAC;aAAM,IAAI,UAAU,IAAI,CAAC,QAAQ,EAAE,CAAC;YACnC,WAAW,GAAG,UAAU,CAAC;QAC3B,CAAC;aAAM,CAAC;YACN,MAAM,GAAG,CAAC,IAAI,CACZ,yBAAyB,EACzB,UAAU;gBACR,CAAC,CAAC,wDAAwD;gBAC1D,CAAC,CAAC,yCAAyC,EAC7C,EAAE,GAAG,GAAG,CAAC,WAAW,CAAC,yBAAyB,CAAC,EAAE,CAClD,CAAC;QACJ,CAAC;QAED,MAAM,QAAQ,GAAG,CAAC,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,CAAC,WAAW,EAA
E,IAAI,IAAI,CAAmB,CAAC;QACjF,MAAM,MAAM,GAAG,KAAK,CAAC,MAAuB,CAAC;QAC7C,MAAM,eAAe,GAAG,mBAAmB,CAAC,WAAW,CAAC,CAAC;QAEzD,oCAAoC;QACpC,MAAM,UAAU,GAAG;;;gCAGS,eAAe;;;;;0CAKL,gBAAgB;;;;;;;WAO/C,CAAC;QAER,MAAM,YAAY,GAAG,MAAM,SAAS,CAAC,KAAK,CAAC,UAAU,EAAE,GAAG,CAAC,CAAC;QAC5D,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,yBAAyB,EAAE,EAAE,WAAW,EAAE,WAAW,EAAE,YAAY,CAAC,MAAM,EAAE,CAAC,CAAC;QAE3F,IAAI,YAAY,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC9B,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,mCAAmC,WAAW,EAAE,EAAE;gBAC5E,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;aAChC,CAAC,CAAC;QACL,CAAC;QAED,kDAAkD;QAClD,MAAM,KAAK,GAAG,YAAY,CAAC,CAAC,CAAC,CAAC;QAC9B,MAAM,UAAU,GAAG,IAAI,GAAG,EAAU,CAAC;QACrC,MAAM,eAAe,GAAG,IAAI,GAAG,EAAU,CAAC;QAC1C,KAAK,MAAM,CAAC,IAAI,YAAY,EAAE,CAAC;YAC7B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,YAAY,CAAC,CAAC;YAC7D,IAAI,EAAE;gBAAE,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;YAC3B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,SAAS,CAAC,CAAC;YAC1D,IAAI,EAAE;gBAAE,eAAe,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;QAClC,CAAC;QAED,MAAM,OAAO,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAChE,MAAM,cAAc,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,aAAa,CAAC,IAAI,WAAW,CAAC;QAC7F,MAAM,YAAY,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QACrE,MAAM,IAAI,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAC7D,MAAM,KAAK,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAC/D,MAAM,UAAU,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,SAAS,CAAC,CAAC;QACtE,MAAM,OAAO,GAAG,UAAU,KAAK,SAAS,CAAC,CAAC,CAAC,UAAU,KAAK,MAAM,CAAC,CAAC,CAAC,SAAS,CAAC;QAC7E,MAAM,SAAS,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,QAAQ,CAAC,CAAC;QAEpE,yDAAyD;QACzD,MAAM,aAAa,GAAG,MAAM,UAAU,CAAC,YAAY,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;QAExF,MAAM,MAAM,GAeR;YACF,YAAY,EAAE,cAAc;YAC5B,iBAAiB,EAAE,aAAa,CAAC,gBAAgB;YACjD,QAAQ,EAAE,aAAa,CAAC,QAAQ;YAChC,cAAc,EAAE,MAAM;SACvB,CAAC;QAEF,IAAI,OAAO;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QACvC,IAAI,KAAK;YAAE,MAAM,CAAC,KAAK,GAAG,KAAK,CAAC;QAChC,IAAI,IAAI;YAAE,MAAM,CAAC,IAAI,GAAG,IAAI,CAAC;QAC7B,IAAI,YAAY;YAAE,MAAM,CAAC,aAAa,GAAG,wB
AAwB,CAAC,YAAY,CAAC,CAAC;QAChF,IAAI,SAAS;YAAE,MAAM,CAAC,kBAAkB,GAAG,yBAAyB,CAAC,SAAS,CAAC,CAAC;QAChF,IAAI,UAAU,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,WAAW,GAAG,CAAC,GAAG,UAAU,CAAC,CAAC;QAC9D,IAAI,eAAe,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,gBAAgB,GAAG,CAAC,GAAG,eAAe,CAAC,CAAC;QAC7E,IAAI,OAAO,OAAO,KAAK,SAAS;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QAC5D,IAAI,aAAa,CAAC,gBAAgB,IAAI,aAAa,CAAC,OAAO,EAAE,CAAC;YAC5D,MAAM,CAAC,OAAO,GAAG,aAAa,CAAC,OAAO,CAAC;QACzC,CAAC;QACD,IAAI,aAAa,CAAC,gBAAgB,EAAE,CAAC;YACnC,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;QAC5D,CAAC;QAED,OAAO,MAAM,CAAC;IAChB,CAAC;IAED,MAAM,EAAE,CAAC,MAAM,EAAE,EAAE;QACjB,MAAM,KAAK,GAAa;YACtB,MAAM,MAAM,CAAC,YAAY,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,EAAE,IAAI;SACzE,CAAC;QACF,IAAI,MAAM,CAAC,IAAI;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,IAAI,MAAM,CAAC,aAAa;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,aAAa,EAAE,CAAC,CAAC;QAC1E,IAAI,MAAM,CAAC,kBAAkB;YAAE,KAAK,CAAC,IAAI,CAAC,eAAe,MAAM,CAAC,kBAAkB,EAAE,CAAC,CAAC;QACtF,IAAI,OAAO,MAAM,CAAC,QAAQ,KAAK,SAAS;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACzF,IAAI,MAAM,CAAC,QAAQ;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACpE,IAAI,MAAM,CAAC,WAAW,IAAI,MAAM,CAAC,WAAW,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACxD,KAAK,CAAC,IAAI,CAAC,oBAAoB,MAAM,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAClE,CAAC;QACD,IAAI,MAAM,CAAC,gBAAgB,IAAI,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClE,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,gBAAgB,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,QAAQ,CAAC,CAAC,CAAC,EAAE,EAAE,CACvK,CAAC;QACJ,CAAC;QACD,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,kBAAkB,MAAM,CAAC,cAAc,EAAE,CAAC,CAAC;QACtF,IAAI,MAAM,CAAC,iBAAiB;YAAE,KAAK,CAAC,IAAI,CAAC,UAAU,MAAM,CAAC,iBAAiB,GAAG,CAAC,CAAC;QAChF,KAAK,CAAC,IAAI,CAAC,0BAA0B,MAAM,CAAC,iBAAiB,EAAE,CAAC,CAAC;QACjE,IAAI,MAAM,CAAC,iBAAiB,IAAI,MAAM,
CAAC,OAAO,EAAE,CAAC;YAC/C,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YAClB,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,kDAAkD;YAClD,MAAM,MAAM,GAAG,IAAI,CAAC;YACpB,IAAI,MAAM,CAAC,OAAO,CAAC,MAAM,GAAG,MAAM,EAAE,CAAC;gBACnC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC,EAAE,MAAM,CAAC,CAAC,CAAC;gBAC5C,KAAK,CAAC,IAAI,CACR,2BAA2B,MAAM,CAAC,OAAO,CAAC,MAAM,yDAAyD,CAC1G,CAAC;YACJ,CAAC;iBAAM,CAAC;gBACN,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;YAC7B,CAAC;QACH,CAAC;aAAM,IAAI,CAAC,MAAM,CAAC,iBAAiB,EAAE,CAAC;YACrC,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,8EAA8E,CAAC,CAAC;QAC7F,CAAC;QACD,OAAO,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,IAAI,EAAE,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IACpD,CAAC;CACF,CAAC,CAAC"}
|
|
1
|
+
{"version":3,"file":"eurlex-get-document.tool.js","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAE,IAAI,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AACjE,OAAO,EACL,gBAAgB,EAChB,yBAAyB,EACzB,wBAAwB,GACzB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EACL,mBAAmB,EACnB,sBAAsB,GACvB,MAAM,mDAAmD,CAAC;AAC3D,OAAO,EAAE,mBAAmB,EAAE,gBAAgB,EAAE,MAAM,4CAA4C,CAAC;AACnG,OAAO,EAGL,uBAAuB,GACxB,MAAM,qDAAqD,CAAC;AAE7D;;;;GAIG;AACH,MAAM,qBAAqB,GAAG,MAAM,CAAC;AAErC,wGAAwG;AACxG,MAAM,iBAAiB,GAAG,OAAO,CAAC;AAElC,MAAM,CAAC,MAAM,mBAAmB,GAAG,IAAI,CAAC,qBAAqB,EAAE;IAC7D,KAAK,EAAE,iBAAiB;IACxB,WAAW,EACT,qFAAqF;QACrF,gHAAgH;QAChH,oFAAoF;QACpF,wFAAwF;QACxF,iDAAiD;QACjD,mHAAmH;QACnH,uEAAuE;QACvE,8EAA8E;QAC9E,iHAAiH;QACjH,qIAAqI;QACrI,8GAA8G;QAC9G,yHAAyH;QACzH,kJAAkJ;IACpJ,WAAW,EAAE,EAAE,YAAY,EAAE,IAAI,EAAE,cAAc,EAAE,IAAI,EAAE,aAAa,EAAE,IAAI,EAAE;IAC9E,KAAK,EAAE,CAAC,CAAC,MAAM,CAAC;QACd,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,+DAA+D;YAC7D,iDAAiD,CACpD;QACH,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,4EAA4E;YAC1E,2EAA2E;YAC3E,iDAAiD,CACpD;QACH,QAAQ,EAAE,CAAC;aACR,MAAM,EAAE;aACR,KAAK,CAAC,iBAAiB,CAAC;aACxB,OAAO,CAAC,IAAI,CAAC;aACb,QAAQ,CACP,6EAA6E;YAC3E,4EAA4E,CAC/E;QACH,MAAM,EAAE,CAAC;aACN,IAAI,CAAC,CAAC,MAAM,EAAE,KAAK,EAAE,UAAU,CAAC,CAAC;aACjC,OAAO,CAAC,MAAM,CAAC;aACf,QAAQ,CACP,0EAA0E;YACxE,mEAAmE;YACnE,+EAA+E;YAC/E,0CAA0C,CAC7C;QACH,YAAY,EAAE,CAAC;aACZ,IAAI,CAAC,CAAC,eAAe,EAAE,OAAO,EAAE,MAAM,CAAC,CAAC;aACxC,OAAO,CAAC,OAAO,CAAC;aAChB,QAAQ,CACP,oHAAoH;YAClH,iFAAiF;YACjF,oHAAoH,CACvH;QACH,MAAM,EAAE,CAAC;aACN,MAAM,EAAE;aACR,GAAG,EAAE;aACL,GAAG,CAAC,CAAC,CAAC;aACN,OAAO,CAAC,CAAC,CAAC;aACV,QAAQ,CACP,qGAAqG;YACnG,kGAAkG,CACrG;QACH,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,GAAG,EAAE;aACL,GAAG,CAAC,CAAC,CAAC;aACN,GAAG,CAAC,iBAAiB,CAAC;aACtB,OAAO,CAAC,qBAAqB,CAAC;aAC9B,QAAQ,CACP,4FAA4F,qBAAqB,SAAS,iBAAiB,IAAI;YAC7I,wFAAwF,CAC3F;KACJ,CAAC;IACF,MAAM,EAAE,CAAC,
CAAC,MAAM,CAAC;QACf,YAAY,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACnF,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,kBAAkB,CAAC;QAC5D,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,uFAAuF,CACxF;QACH,IAAI,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACtF,aAAa,EAAE,CAAC;aACb,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,mGAAmG,CACpG;QACH,kBAAkB,EAAE,CAAC;aAClB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,oIAAoI,CACrI;QACH,WAAW,EAAE,CAAC;aACX,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,2CAA2C,CAAC,CAAC;aACvE,QAAQ,EAAE;aACV,QAAQ,CAAC,iCAAiC,CAAC;QAC9C,gBAAgB,EAAE,CAAC;aAChB,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,sBAAsB,CAAC,CAAC;aAClD,QAAQ,EAAE;aACV,QAAQ,CAAC,kCAAkC,CAAC;QAC/C,QAAQ,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACnF,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,2GAA2G;YACzG,4FAA4F;YAC5F,kHAAkH,CACrH;QACH,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,CAAC,6EAA6E,CAAC;QAC1F,iBAAiB,EAAE,CAAC;aACjB,OAAO,EAAE;aACT,QAAQ,CACP,sGAAsG;YACpG,8EAA8E,CACjF;QACH,cAAc,EAAE,CAAC;aACd,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,2GAA2G,CAC5G;QACH,sBAAsB,EAAE,CAAC;aACtB,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,6HAA6H,CAC9H;QACH,mBAAmB,EAAE,CAAC;aACnB,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,oGAAoG;YAClG,yDAAyD,CAC5D;QACH,QAAQ,EAAE,CAAC;aACR,OAAO,EAAE;aACT,QAAQ,CACP,gIAAgI;YAC9H,sGAAsG,CACzG;QACH,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACvE,iBAAiB,EAAE,CAAC;aACjB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,0JAA0J,CAC3J;QACH,cAAc,EAAE,CAAC;aACd,MAAM,EAAE;aACR,QAAQ,CAAC,+DAA+D,CAAC;KAC7E,CAAC;IAEF,MAAM,EAAE;QACN;YACE,MAAM,EAAE,yBAAyB;YACjC,IAAI,EAAE,gBAAgB,CAAC,eAAe;YACtC,IAAI,EAAE,8DAA8D;YACpE,QAAQ,EAAE,iDAAiD;SAC5D;QACD;YACE,MAAM,EAAE,WAAW;YACnB,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,sFAAsF;YAC5F,QAAQ,EAAE,wEAAwE;SACnF;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,qFAAqF;YAC3F,QAAQ,EACN,gGAAgG;SACnG;
QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,kBAAkB;YACzC,IAAI,EAAE,wEAAwE;YAC9E,QAAQ,EACN,oFAAoF;SACvF;KACF;IAED,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG;QACtB,MAAM,SAAS,GAAG,sBAAsB,EAAE,CAAC;QAC3C,MAAM,UAAU,GAAG,uBAAuB,EAAE,CAAC;QAE7C,2EAA2E;QAC3E,0EAA0E;QAC1E,6EAA6E;QAC7E,qEAAqE;QACrE,4EAA4E;QAC5E,oEAAoE;QACpE,MAAM,UAAU,GAAG,KAAK,CAAC,YAAY,EAAE,IAAI,EAAE,CAAC;QAC9C,MAAM,QAAQ,GAAG,KAAK,CAAC,OAAO,EAAE,IAAI,EAAE,CAAC;QAEvC,IAAI,WAAmB,CAAC;QACxB,IAAI,QAAQ,IAAI,CAAC,UAAU,EAAE,CAAC;YAC5B,MAAM,OAAO,GAAG,MAAM,gBAAgB,CAAC,SAAS,EAAE,QAAQ,EAAE,GAAG,CAAC,CAAC;YACjE,MAAM,aAAa,GAAG,OAAO,IAAI,mBAAmB,CAAC,YAAY,CAAC,OAAO,EAAE,aAAa,CAAC,CAAC;YAC1F,IAAI,CAAC,aAAa,EAAE,CAAC;gBACnB,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,iCAAiC,QAAQ,EAAE,EAAE;oBACvE,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;iBAChC,CAAC,CAAC;YACL,CAAC;YACD,WAAW,GAAG,aAAa,CAAC;QAC9B,CAAC;aAAM,IAAI,UAAU,IAAI,CAAC,QAAQ,EAAE,CAAC;YACnC,WAAW,GAAG,UAAU,CAAC;QAC3B,CAAC;aAAM,CAAC;YACN,MAAM,GAAG,CAAC,IAAI,CACZ,yBAAyB,EACzB,UAAU;gBACR,CAAC,CAAC,wDAAwD;gBAC1D,CAAC,CAAC,yCAAyC,EAC7C,EAAE,GAAG,GAAG,CAAC,WAAW,CAAC,yBAAyB,CAAC,EAAE,CAClD,CAAC;QACJ,CAAC;QAED,MAAM,QAAQ,GAAG,CAAC,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,CAAC,WAAW,EAAE,IAAI,IAAI,CAAmB,CAAC;QACjF,MAAM,MAAM,GAAG,KAAK,CAAC,MAAuB,CAAC;QAC7C,MAAM,eAAe,GAAG,mBAAmB,CAAC,WAAW,CAAC,CAAC;QAEzD,oCAAoC;QACpC,MAAM,UAAU,GAAG;;;gCAGS,eAAe;;;;;0CAKL,gBAAgB;;;;;;;WAO/C,CAAC;QAER,MAAM,YAAY,GAAG,MAAM,SAAS,CAAC,KAAK,CAAC,UAAU,EAAE,GAAG,CAAC,CAAC;QAC5D,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,yBAAyB,EAAE,EAAE,WAAW,EAAE,WAAW,EAAE,YAAY,CAAC,MAAM,EAAE,CAAC,CAAC;QAE3F,IAAI,YAAY,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC9B,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,mCAAmC,WAAW,EAAE,EAAE;gBAC5E,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;aAChC,CAAC,CAAC;QACL,CAAC;QAED,kDAAkD;QAClD,MAAM,KAAK,GAAG,YAAY,CAAC,CAAC,CAAC,CAAC;QAC9B,MAAM,UAAU,GAAG,IAAI,GAAG,EAAU,CAAC;QACrC,MAAM,eAAe,GAAG,IAAI,GAAG,EAAU,CAAC;QAC1C,KAAK,MAAM,CAAC,IAAI,YAAY,EAAE,CAAC;YAC7B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,YAAY,CAAC,CAAC;YAC7D,IAAI,EAAE;gBAAE,UAAU,CAAC,GAAG,CAAC,EA
AE,CAAC,CAAC;YAC3B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,SAAS,CAAC,CAAC;YAC1D,IAAI,EAAE;gBAAE,eAAe,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;QAClC,CAAC;QAED,MAAM,OAAO,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAChE,MAAM,cAAc,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,aAAa,CAAC,IAAI,WAAW,CAAC;QAC7F,MAAM,YAAY,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QACrE,MAAM,IAAI,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAC7D,MAAM,KAAK,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAC/D,MAAM,UAAU,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,SAAS,CAAC,CAAC;QACtE,MAAM,OAAO,GAAG,UAAU,KAAK,SAAS,CAAC,CAAC,CAAC,UAAU,KAAK,MAAM,CAAC,CAAC,CAAC,SAAS,CAAC;QAC7E,MAAM,SAAS,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,QAAQ,CAAC,CAAC;QAEpE,4EAA4E;QAC5E,yEAAyE;QACzE,+DAA+D;QAC/D,2EAA2E;QAC3E,iEAAiE;QACjE,8EAA8E;QAC9E,MAAM,MAAM,GAoBR;YACF,YAAY,EAAE,cAAc;YAC5B,YAAY,EAAE,KAAK,CAAC,YAAY;YAChC,iBAAiB,EAAE,KAAK;YACxB,QAAQ,EAAE,KAAK;YACf,QAAQ;YACR,cAAc,EAAE,MAAM;SACvB,CAAC;QAEF,IAAI,OAAO;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QACvC,IAAI,KAAK;YAAE,MAAM,CAAC,KAAK,GAAG,KAAK,CAAC;QAChC,IAAI,IAAI;YAAE,MAAM,CAAC,IAAI,GAAG,IAAI,CAAC;QAC7B,IAAI,YAAY;YAAE,MAAM,CAAC,aAAa,GAAG,wBAAwB,CAAC,YAAY,CAAC,CAAC;QAChF,IAAI,SAAS;YAAE,MAAM,CAAC,kBAAkB,GAAG,yBAAyB,CAAC,SAAS,CAAC,CAAC;QAChF,IAAI,UAAU,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,WAAW,GAAG,CAAC,GAAG,UAAU,CAAC,CAAC;QAC9D,IAAI,eAAe,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,gBAAgB,GAAG,CAAC,GAAG,eAAe,CAAC,CAAC;QAC7E,IAAI,OAAO,OAAO,KAAK,SAAS;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QAE5D,IAAI,KAAK,CAAC,YAAY,KAAK,eAAe,EAAE,CAAC;YAC3C,MAAM,aAAa,GAAG,MAAM,UAAU,CAAC,YAAY,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;YACxF,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;YAC1D,MAAM,CAAC,QAAQ,GAAG,aAAa,CAAC,QAAQ,CAAC;YACzC,IAAI,aAAa,CAAC,gBAAgB,EAAE,CAAC;gBACnC,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;YAC5D,CAAC;YAED,IAAI,aAAa,CAAC,gBAAgB,IAAI,aAAa,CAAC,OAAO,EAAE,CAAC;gBAC5D,MAAM,IAAI,GAAG,aAAa,CAAC,OAAO,CAAC;gBACnC,MAAM,KAAK,GAAG,IAAI,CAAC,MAAM,CAAC;gBAC1B,MAAM,CAAC,mBAAmB,GAAG,KAAK,C
AAC;gBAEnC,IAAI,KAAK,CAAC,YAAY,KAAK,MAAM,EAAE,CAAC;oBAClC,MAAM,CAAC,OAAO,GAAG,IAAI,CAAC;oBACtB,MAAM,CAAC,cAAc,GAAG,CAAC,CAAC;oBAC1B,MAAM,CAAC,sBAAsB,GAAG,KAAK,CAAC;oBACtC,MAAM,CAAC,QAAQ,GAAG,KAAK,CAAC;gBAC1B,CAAC;qBAAM,CAAC;oBACN,sEAAsE;oBACtE,oEAAoE;oBACpE,yCAAyC;oBACzC,MAAM,MAAM,GAAG,IAAI,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;oBAC7C,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE,MAAM,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC;oBAC5D,MAAM,CAAC,cAAc,GAAG,MAAM,CAAC;oBAC/B,MAAM,CAAC,sBAAsB,GAAG,UAAU,CAAC,MAAM,CAAC;oBAClD,MAAM,CAAC,QAAQ,GAAG,MAAM,GAAG,UAAU,CAAC,MAAM,GAAG,KAAK,CAAC;oBACrD,IAAI,UAAU,CAAC,MAAM,GAAG,CAAC;wBAAE,MAAM,CAAC,OAAO,GAAG,UAAU,CAAC;gBACzD,CAAC;YACH,CAAC;QACH,CAAC;QAED,OAAO,MAAM,CAAC;IAChB,CAAC;IAED,MAAM,EAAE,CAAC,MAAM,EAAE,EAAE;QACjB,MAAM,KAAK,GAAa;YACtB,MAAM,MAAM,CAAC,YAAY,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,EAAE,IAAI;SACzE,CAAC;QACF,IAAI,MAAM,CAAC,IAAI;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,IAAI,MAAM,CAAC,aAAa;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,aAAa,EAAE,CAAC,CAAC;QAC1E,IAAI,MAAM,CAAC,kBAAkB;YAAE,KAAK,CAAC,IAAI,CAAC,eAAe,MAAM,CAAC,kBAAkB,EAAE,CAAC,CAAC;QACtF,IAAI,OAAO,MAAM,CAAC,QAAQ,KAAK,SAAS;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACzF,IAAI,MAAM,CAAC,QAAQ;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACpE,IAAI,MAAM,CAAC,WAAW,IAAI,MAAM,CAAC,WAAW,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACxD,KAAK,CAAC,IAAI,CAAC,oBAAoB,MAAM,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAClE,CAAC;QACD,IAAI,MAAM,CAAC,gBAAgB,IAAI,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClE,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,gBAAgB,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,QAAQ,CAAC,CAAC,CAAC,EAAE,EAAE,CACvK,CAAC;QACJ,CAAC;QACD,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,kBAAkB,MAAM,CAAC,cAAc,EAAE,CAAC,CAAC;QACtF,IAAI,MAAM,CAAC,iBAAiB;YAAE,KAAK,CAAC,IAAI,CAAC,U
AAU,MAAM,CAAC,iBAAiB,GAAG,CAAC,CAAC;QAEhF,2EAA2E;QAC3E,4EAA4E;QAC5E,IAAI,MAAM,CAAC,YAAY,KAAK,eAAe,EAAE,CAAC;YAC5C,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CACR,6GAA6G,CAC9G,CAAC;QACJ,CAAC;aAAM,IAAI,MAAM,CAAC,iBAAiB,EAAE,CAAC;YACpC,MAAM,KAAK,GAAG,MAAM,CAAC,mBAAmB,IAAI,MAAM,CAAC,OAAO,EAAE,MAAM,IAAI,CAAC,CAAC;YACxE,IAAI,MAAM,CAAC,OAAO,EAAE,CAAC;gBACnB,MAAM,KAAK,GAAG,MAAM,CAAC,cAAc,IAAI,CAAC,CAAC;gBACzC,MAAM,QAAQ,GAAG,MAAM,CAAC,sBAAsB,IAAI,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC;gBACxE,MAAM,GAAG,GAAG,KAAK,GAAG,QAAQ,CAAC;gBAC7B,IAAI,MAAM,CAAC,YAAY,KAAK,MAAM,EAAE,CAAC;oBACnC,KAAK,CAAC,IAAI,CAAC,mCAAmC,QAAQ,OAAO,KAAK,cAAc,CAAC,CAAC;gBACpF,CAAC;qBAAM,CAAC;oBACN,KAAK,CAAC,IAAI,CACR,gBAAgB,MAAM,CAAC,YAAY,iBAAiB,KAAK,IAAI,GAAG,OAAO,KAAK,KAAK,QAAQ,aAAa;wBACpG,CAAC,MAAM,CAAC,QAAQ;4BACd,CAAC,CAAC,8CAA8C,GAAG,8CAA8C;4BACjG,CAAC,CAAC,mBAAmB,CAAC,CAC3B,CAAC;gBACJ,CAAC;gBACD,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;gBAClB,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;YAC7B,CAAC;iBAAM,CAAC;gBACN,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,cAAc,IAAI,CAAC,0BAA0B,KAAK,yCAAyC,CAC5H,CAAC;YACJ,CAAC;QACH,CAAC;aAAM,CAAC;YACN,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,8EAA8E,CAAC,CAAC;QAC7F,CAAC;QACD,OAAO,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,IAAI,EAAE,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IACpD,CAAC;CACF,CAAC,CAAC"}
|
|
@@ -71,7 +71,15 @@ export declare const allToolDefinitions: (import("@cyanheads/mcp-ts-core").ToolD
|
|
|
71
71
|
format: import("zod").ZodDefault<import("zod").ZodEnum<{
|
|
72
72
|
html: "html";
|
|
73
73
|
xml: "xml";
|
|
74
|
+
markdown: "markdown";
|
|
74
75
|
}>>;
|
|
76
|
+
content_mode: import("zod").ZodDefault<import("zod").ZodEnum<{
|
|
77
|
+
full: "full";
|
|
78
|
+
metadata_only: "metadata_only";
|
|
79
|
+
paged: "paged";
|
|
80
|
+
}>>;
|
|
81
|
+
offset: import("zod").ZodDefault<import("zod").ZodNumber>;
|
|
82
|
+
limit: import("zod").ZodDefault<import("zod").ZodNumber>;
|
|
75
83
|
}, import("zod/v4/core").$strip>, import("zod").ZodObject<{
|
|
76
84
|
celex_number: import("zod").ZodString;
|
|
77
85
|
work_uri: import("zod").ZodOptional<import("zod").ZodString>;
|
|
@@ -83,7 +91,12 @@ export declare const allToolDefinitions: (import("@cyanheads/mcp-ts-core").ToolD
|
|
|
83
91
|
eurovoc_subjects: import("zod").ZodOptional<import("zod").ZodArray<import("zod").ZodString>>;
|
|
84
92
|
in_force: import("zod").ZodOptional<import("zod").ZodBoolean>;
|
|
85
93
|
content: import("zod").ZodOptional<import("zod").ZodString>;
|
|
94
|
+
content_mode: import("zod").ZodString;
|
|
86
95
|
content_available: import("zod").ZodBoolean;
|
|
96
|
+
content_offset: import("zod").ZodOptional<import("zod").ZodNumber>;
|
|
97
|
+
content_chars_returned: import("zod").ZodOptional<import("zod").ZodNumber>;
|
|
98
|
+
content_chars_total: import("zod").ZodOptional<import("zod").ZodNumber>;
|
|
99
|
+
has_more: import("zod").ZodBoolean;
|
|
87
100
|
language: import("zod").ZodString;
|
|
88
101
|
language_fallback: import("zod").ZodOptional<import("zod").ZodString>;
|
|
89
102
|
content_format: import("zod").ZodString;
|
|
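The new `offset` and `limit` inputs and the `content_offset`, `content_chars_returned`, `content_chars_total`, and `has_more` outputs suggest character-window paging over the fetched body. The following is a minimal sketch of that idea, not the package's implementation: `pageContent` and `readAll` are hypothetical helpers, and the clamping and `has_more` rules are assumptions inferred from the field names alone.

```typescript
// Hypothetical helpers for illustration only. Field names mirror the output
// schema shown in the diff; the slicing rules themselves are assumptions.
interface PageResult {
  content?: string;
  content_offset: number;
  content_chars_returned: number;
  content_chars_total: number;
  has_more: boolean;
}

function pageContent(body: string, offset: number, limit: number): PageResult {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const total = body.length;
  // Clamp the offset so a request past the end yields an empty window.
  const start = Math.min(Math.max(offset, 0), total);
  const chunk = body.slice(start, start + limit);
  const result: PageResult = {
    content_offset: start,
    content_chars_returned: chunk.length,
    content_chars_total: total,
    has_more: start + chunk.length < total,
  };
  // Omit `content` entirely when the window is empty.
  if (chunk.length > 0) result.content = chunk;
  return result;
}

// Walk a body page by page until has_more turns false.
function readAll(body: string, limit: number): string {
  let offset = 0;
  let out = "";
  for (;;) {
    const page = pageContent(body, offset, limit);
    out += page.content ?? "";
    if (!page.has_more) return out;
    offset = page.content_offset + page.content_chars_returned;
  }
}
```

A caller would request the next page at `content_offset + content_chars_returned` for as long as `has_more` is true.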
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAUH,eAAO,MAAM,kBAAkB
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAUH,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;iBAQ9B,CAAC"}
|
|
@@ -1,14 +1,37 @@
|
|
|
1
1
|
/**
|
|
2
|
-
* @fileoverview EurLexContentService — HTTP client for
|
|
3
|
-
*
|
|
4
|
-
*
|
|
2
|
+
* @fileoverview EurLexContentService — HTTP client for EU act full-text content.
|
|
3
|
+
*
|
|
4
|
+
* Sources content from the EU Publications Office CELLAR content-negotiation
|
|
5
|
+
* resolver (`publications.europa.eu/resource/celex/{CELEX}`) — the same host the
|
|
6
|
+
* metadata SPARQL pipeline already queries — rather than the legacy
|
|
7
|
+
* `eur-lex.europa.eu` legal-content endpoint, which is now fronted by an AWS WAF
|
|
8
|
+
* that returns a JavaScript bot-challenge stub instead of the act text (issue #16).
|
|
9
|
+
*
|
|
10
|
+
* Content negotiation:
|
|
11
|
+
* - `Accept`: HTML acts vary by document family — OJ legislation exposes
|
|
12
|
+
* `application/xhtml+xml`, CJEU judgments expose `text/html`, so the HTML path
|
|
13
|
+
* tries both. The XML path requests Formex 4 (`application/xml;type=fmx4`),
|
|
14
|
+
* which CELLAR serves directly for single-part acts and returns HTTP 300
|
|
15
|
+
* (multiple manifestation streams) for multi-part OJ acts — treated as
|
|
16
|
+
* unavailable rather than reconstructed.
|
|
17
|
+
* - `Accept-Language`: CELLAR requires an ISO 639-2/T (three-letter) code and
|
|
18
|
+
* 400s on a missing one or on a bibliographic 639-2/B code (`ger`, `fre`);
|
|
19
|
+
* EUR-Lex two-letter codes are mapped before the request.
|
|
20
|
+
*
|
|
21
|
+
* Defense in depth: any response carrying an AWS WAF challenge signature is
|
|
22
|
+
* refused (never surfaced as content) and raised as a ServiceUnavailable error,
|
|
23
|
+
* so a challenge stub can never again be reported as `contentAvailable: true`.
|
|
5
24
|
* @module services/eurlex-content/eurlex-content-service
|
|
6
25
|
*/
|
|
7
26
|
import type { Context } from '@cyanheads/mcp-ts-core';
|
|
8
27
|
import type { AppConfig } from '@cyanheads/mcp-ts-core/config';
|
|
9
28
|
import type { StorageService } from '@cyanheads/mcp-ts-core/storage';
|
|
10
29
|
import type { ServerConfig } from '../../config/server-config.js';
|
|
11
|
-
|
|
30
|
+
/**
|
|
31
|
+
* Output formats a caller can request. `markdown` is not served by EUR-Lex — it is
|
|
32
|
+
* rendered server-side from the HTML body (see {@link WireFormat}).
|
|
33
|
+
*/
|
|
34
|
+
export type ContentFormat = 'html' | 'xml' | 'markdown';
|
|
12
35
|
/** Language codes supported by EUR-Lex (24 official EU languages). */
|
|
13
36
|
export type EurLexLanguage = 'EN' | 'FR' | 'DE' | 'ES' | 'IT' | 'PL' | 'PT' | 'NL' | 'CS' | 'DA' | 'EL' | 'ET' | 'FI' | 'HU' | 'LT' | 'LV' | 'MT' | 'RO' | 'SK' | 'SL' | 'SV' | 'BG' | 'HR' | 'GA';
|
|
14
37
|
export interface FetchContentResult {
|
|
@@ -24,21 +47,35 @@ export declare class EurLexContentService {
|
|
|
24
47
|
private readonly timeoutMs;
|
|
25
48
|
constructor(_config: AppConfig, _storage: StorageService, serverConfig: ServerConfig);
|
|
26
49
|
/**
|
|
27
|
-
* Build the
|
|
28
|
-
* Pattern: /
|
|
50
|
+
* Build the CELLAR content-negotiation URL for a CELEX number.
|
|
51
|
+
* Pattern: /resource/celex/{CELEX} (format + language come from request headers).
|
|
29
52
|
*/
|
|
30
|
-
buildContentUrl(celexNumber: string
|
|
53
|
+
buildContentUrl(celexNumber: string): string;
|
|
31
54
|
/**
|
|
32
55
|
* Fetch the full text content of an EU act by CELEX number.
|
|
33
56
|
* If the requested language is unavailable, falls back to English.
|
|
34
57
|
* Returns `contentAvailable: false` with an empty string if both attempts fail.
|
|
58
|
+
*
|
|
59
|
+
* Throws ServiceUnavailable if the content host returns an AWS WAF bot-challenge
|
|
60
|
+
* stub — a challenge is never reported as available content.
|
|
35
61
|
*/
|
|
36
62
|
fetchContent(celexNumber: string, language: EurLexLanguage, format: ContentFormat, ctx: Context): Promise<FetchContentResult>;
|
|
37
63
|
/**
|
|
38
|
-
*
|
|
39
|
-
* empty
|
|
64
|
+
* Resolve content for one language by trying each `Accept` variant for the
|
|
65
|
+
* format. Returns the first non-empty body, or null when none of the variants
|
|
66
|
+
* yield content (so the caller can fall back to English). Throws when a variant
|
|
67
|
+
* returns a bot-challenge stub.
|
|
68
|
+
*/
|
|
69
|
+
private fetchForLanguage;
|
|
70
|
+
/**
|
|
71
|
+
* Single content-negotiation request for one `Accept`/`Accept-Language` pair.
|
|
72
|
+
* Non-2xx responses (404 = no datastream of that type, 300 = multi-part Formex,
|
|
73
|
+
* 4xx/5xx) and network failures resolve to `none` so callers can try the next
|
|
74
|
+
* variant or language. A WAF challenge body resolves to `challenge`. The inner
|
|
75
|
+
* function only throws on a `fetch` rejection, so `withRetry` retries transient
|
|
76
|
+
* network errors but never a 404 or a challenge.
|
|
40
77
|
*/
|
|
41
|
-
private
|
|
78
|
+
private fetchVariant;
|
|
42
79
|
}
|
|
43
80
|
export declare function initEurLexContentService(config: AppConfig, storage: StorageService, serverConfig: ServerConfig): void;
|
|
44
81
|
export declare function getEurLexContentService(): EurLexContentService;
|
|
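The doc comments on `fetchForLanguage` and `fetchVariant` describe three outcomes per request: a usable body, `none` (non-2xx or too short), and `challenge` (an AWS WAF stub). A minimal sketch of that classification follows. `VariantOutcome`, `classifyResponse`, and `resolveVariants` are hypothetical names, the order of the checks is an assumption, and a plain `Error` stands in for the ServiceUnavailable error the comments mention. The marker list and length threshold are copied from the compiled service further down in this diff.

```typescript
// Hypothetical sketch of the outcome model described in the doc comments.
type VariantOutcome =
  | { kind: "content"; body: string }
  | { kind: "none" }
  | { kind: "challenge" };

// Values as they appear in the compiled service in this diff.
const CHALLENGE_MARKERS = ["awswaf", "gokuprops"];
const MIN_CONTENT_LENGTH = 100;

function classifyResponse(status: number, body: string): VariantOutcome {
  // Assumption: any non-2xx status (404, 300, 4xx/5xx) counts as "no content".
  if (status < 200 || status >= 300) return { kind: "none" };
  // Only the head of the body is scanned, case-insensitively.
  const head = body.slice(0, 4096).toLowerCase();
  if (CHALLENGE_MARKERS.some((marker) => head.includes(marker))) {
    return { kind: "challenge" };
  }
  if (body.trim().length < MIN_CONTENT_LENGTH) return { kind: "none" };
  return { kind: "content", body };
}

// Returns the first body, null when every variant is empty, and throws on a challenge.
function resolveVariants(outcomes: VariantOutcome[]): string | null {
  for (const outcome of outcomes) {
    if (outcome.kind === "challenge") {
      throw new Error("content host returned a bot-challenge stub");
    }
    if (outcome.kind === "content") return outcome.body;
  }
  return null;
}
```

Returning `null` instead of throwing for the `none` case is what would let a caller retry in English, as the `fetchContent` comment describes.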
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"eurlex-content-service.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA
|
|
1
|
+
{"version":3,"file":"eurlex-content-service.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AAEH,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,wBAAwB,CAAC;AACtD,OAAO,KAAK,EAAE,SAAS,EAAE,MAAM,+BAA+B,CAAC;AAE/D,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,gCAAgC,CAAC;AAErE,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,2BAA2B,CAAC;AAG9D;;;GAGG;AACH,MAAM,MAAM,aAAa,GAAG,MAAM,GAAG,KAAK,GAAG,UAAU,CAAC;AAQxD,sEAAsE;AACtE,MAAM,MAAM,cAAc,GACtB,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,CAAC;AA0ET,MAAM,WAAW,kBAAkB;IACjC,OAAO,EAAE,MAAM,CAAC;IAChB,gBAAgB,EAAE,OAAO,CAAC;IAC1B,MAAM,EAAE,aAAa,CAAC;IACtB,QAAQ,EAAE,cAAc,CAAC;IACzB,6CAA6C;IAC7C,gBAAgB,CAAC,EAAE,MAAM,CAAC;CAC3B;AAED,qBAAa,oBAAoB;IAC/B,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;gBAEvB,OAAO,EAAE,SAAS,EAAE,QAAQ,EAAE,cAAc,EAAE,YAAY,EAAE,YAAY;IAKpF;;;OAGG;IACH,eAAe,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM;IAI5C;;;;;;;OAOG;IACG,YAAY,CAChB,WAAW,EAAE,MAAM,EACnB,QAAQ,EAAE,cAAc,EACxB,MAAM,EAAE,aAAa,EACrB,GAAG,EAAE,OAAO,GACX,OAAO,CAAC,kBAAkB,CAAC;IA2B9B;;;;;OAKG;YACW,gBAAgB;IAgC9B;;;;;;;OAOG;IACH,OAAO,CAAC,YAAY;CA+BrB;AAMD,wBAAgB,wBAAwB,CACtC,MAAM,EAAE,SAAS,EACjB,OAAO,EAAE,cAAc,EACvB,YAAY,EAAE,YAAY,GACzB,IAAI,CAEN;AAED,wBAAgB,uBAAuB,IAAI,oBAAoB,CAO9D"}
|
|
@@ -1,11 +1,94 @@
|
|
|
1
1
|
/**
|
|
2
|
-
* @fileoverview EurLexContentService — HTTP client for
|
|
3
|
-
*
|
|
4
|
-
*
|
|
2
|
+
* @fileoverview EurLexContentService — HTTP client for EU act full-text content.
|
|
3
|
+
*
|
|
4
|
+
* Sources content from the EU Publications Office CELLAR content-negotiation
|
|
5
|
+
* resolver (`publications.europa.eu/resource/celex/{CELEX}`) — the same host the
|
|
6
|
+
* metadata SPARQL pipeline already queries — rather than the legacy
|
|
7
|
+
* `eur-lex.europa.eu` legal-content endpoint, which is now fronted by an AWS WAF
|
|
8
|
+
* that returns a JavaScript bot-challenge stub instead of the act text (issue #16).
|
|
9
|
+
*
|
|
10
|
+
* Content negotiation:
|
|
11
|
+
* - `Accept`: HTML acts vary by document family — OJ legislation exposes
|
|
12
|
+
* `application/xhtml+xml`, CJEU judgments expose `text/html`, so the HTML path
|
|
13
|
+
* tries both. The XML path requests Formex 4 (`application/xml;type=fmx4`),
|
|
14
|
+
* which CELLAR serves directly for single-part acts and returns HTTP 300
|
|
15
|
+
* (multiple manifestation streams) for multi-part OJ acts — treated as
|
|
16
|
+
* unavailable rather than reconstructed.
|
|
17
|
+
* - `Accept-Language`: CELLAR requires an ISO 639-2/T (three-letter) code and
|
|
18
|
+
* 400s on a missing one or on a bibliographic 639-2/B code (`ger`, `fre`);
|
|
19
|
+
* EUR-Lex two-letter codes are mapped before the request.
|
|
20
|
+
*
|
|
21
|
+
* Defense in depth: any response carrying an AWS WAF challenge signature is
|
|
22
|
+
* refused (never surfaced as content) and raised as a ServiceUnavailable error,
|
|
23
|
+
* so a challenge stub can never again be reported as `contentAvailable: true`.
|
|
5
24
|
* @module services/eurlex-content/eurlex-content-service
|
|
6
25
|
*/
|
|
7
26
|
import { serviceUnavailable } from '@cyanheads/mcp-ts-core/errors';
|
|
8
27
|
import { withRetry } from '@cyanheads/mcp-ts-core/utils';
|
|
28
|
+
import { htmlToMarkdown } from './html-to-markdown.js';
|
|
29
|
+
/**
|
|
30
|
+
* Map EUR-Lex two-letter language codes to the ISO 639-2/T (terminological,
|
|
31
|
+
* three-letter) codes CELLAR's content-negotiation resolver accepts in
|
|
32
|
+
* `Accept-Language`. CELLAR rejects bibliographic 639-2/B codes (`ger`, `fre`,
|
|
33
|
+
* `dut`, …), so the terminological forms (`deu`, `fra`, `nld`, …) are used.
|
|
34
|
+
*/
|
|
35
|
+
const LANGUAGE_TO_ISO_639_2 = {
|
|
36
|
+
EN: 'eng',
|
|
37
|
+
FR: 'fra',
|
|
38
|
+
DE: 'deu',
|
|
39
|
+
ES: 'spa',
|
|
40
|
+
IT: 'ita',
|
|
41
|
+
PL: 'pol',
|
|
42
|
+
PT: 'por',
|
|
43
|
+
NL: 'nld',
|
|
44
|
+
CS: 'ces',
|
|
45
|
+
DA: 'dan',
|
|
46
|
+
EL: 'ell',
|
|
47
|
+
ET: 'est',
|
|
48
|
+
FI: 'fin',
|
|
49
|
+
HU: 'hun',
|
|
50
|
+
LT: 'lit',
|
|
51
|
+
LV: 'lav',
|
|
52
|
+
MT: 'mlt',
|
|
53
|
+
RO: 'ron',
|
|
54
|
+
SK: 'slk',
|
|
55
|
+
SL: 'slv',
|
|
56
|
+
SV: 'swe',
|
|
57
|
+
BG: 'bul',
|
|
58
|
+
HR: 'hrv',
|
|
59
|
+
GA: 'gle',
|
|
60
|
+
};
|
|
61
|
+
/**
|
|
62
|
+
* `Accept` values tried per format, in order. HTML resolves to `application/xhtml+xml`
|
|
63
|
+
* for OJ legislation and `text/html` for CJEU judgments; the first to return a body
|
|
64
|
+
* wins. XML requests Formex 4 only.
|
|
65
|
+
*/
|
|
66
|
+
const ACCEPT_BY_FORMAT = {
|
|
67
|
+
html: ['application/xhtml+xml', 'text/html'],
|
|
68
|
+
xml: ['application/xml;type=fmx4'],
|
|
69
|
+
};
|
|
70
|
+
/**
|
|
71
|
+
* Render a fetched wire body into the requested output format. `html`/`xml` pass
|
|
72
|
+
* through verbatim; `markdown` is converted server-side from the HTML body.
|
|
73
|
+
*/
|
|
74
|
+
function renderContent(body, format) {
|
|
75
|
+
return format === 'markdown' ? htmlToMarkdown(body) : body;
|
|
76
|
+
}
|
|
77
|
+
/**
|
|
78
|
+
* AWS WAF bot-challenge signatures. `awswaf` matches the challenge.js host
|
|
79
|
+
* (`token.awswaf.com`), the cookie-domain list, and the `AwsWafIntegration`
|
|
80
|
+
* calls; `gokuprops` matches the per-request challenge blob. Both are
|
|
81
|
+
* WAF-specific and never appear in legitimate EU legal text. Matched
|
|
82
|
+
* case-insensitively against the response head.
|
|
83
|
+
*/
|
|
84
|
+
const CHALLENGE_MARKERS = ['awswaf', 'gokuprops'];
|
|
85
|
+
/** Bodies shorter than this (after trimming) are treated as empty/unavailable. */
|
|
86
|
+
const MIN_CONTENT_LENGTH = 100;
|
|
87
|
+
/** True when a response body carries an AWS WAF bot-challenge signature. */
|
|
88
|
+
function isChallengeResponse(body) {
|
|
89
|
+
const head = body.slice(0, 4096).toLowerCase();
|
|
90
|
+
return CHALLENGE_MARKERS.some((marker) => head.includes(marker));
|
|
91
|
+
}
|
|
9
92
|
export class EurLexContentService {
|
|
10
93
|
baseUrl;
|
|
11
94
|
timeoutMs;
|
|
@@ -14,29 +97,34 @@ export class EurLexContentService {
|
|
|
14
97
|
this.timeoutMs = serverConfig.sparqlQueryTimeoutMs;
|
|
15
98
|
}
|
|
16
99
|
/**
|
|
17
|
-
* Build the
|
|
18
|
-
* Pattern: /
|
|
100
|
+
* Build the CELLAR content-negotiation URL for a CELEX number.
|
|
101
|
+
* Pattern: /resource/celex/{CELEX} (format + language come from request headers).
|
|
19
102
|
*/
|
|
20
|
-
buildContentUrl(celexNumber
|
|
21
|
-
|
|
22
|
-
return `${this.baseUrl}/legal-content/${language}/TXT/${fmt}/?uri=CELEX:${celexNumber}`;
|
|
103
|
+
buildContentUrl(celexNumber) {
|
|
104
|
+
return `${this.baseUrl}/resource/celex/${encodeURIComponent(celexNumber)}`;
|
|
23
105
|
}
|
|
24
106
|
/**
|
|
25
107
|
* Fetch the full text content of an EU act by CELEX number.
|
|
26
108
|
* If the requested language is unavailable, falls back to English.
|
|
27
109
|
* Returns `contentAvailable: false` with an empty string if both attempts fail.
|
|
110
|
+
*
|
|
111
|
+
* Throws ServiceUnavailable if the content host returns an AWS WAF bot-challenge
|
|
112
|
+
* stub — a challenge is never reported as available content.
|
|
28
113
|
*/
|
|
29
114
|
async fetchContent(celexNumber, language, format, ctx) {
|
|
30
|
-
|
|
115
|
+
// `markdown` is rendered from the HTML body, so it is fetched as HTML; the
|
|
116
|
+
// returned `format` still reports `markdown` and `renderContent` converts.
|
|
117
|
+
const wireFormat = format === 'markdown' ? 'html' : format;
|
|
118
|
+
const primary = await this.fetchForLanguage(celexNumber, language, wireFormat, ctx);
|
|
31
119
|
if (primary !== null) {
|
|
32
|
-
return { content: primary, language, format, contentAvailable: true };
|
|
120
|
+
return { content: renderContent(primary, format), language, format, contentAvailable: true };
|
|
33
121
|
}
|
|
34
|
-
// Language fallback: try English if primary language failed
|
|
122
|
+
// Language fallback: try English if primary language failed.
|
|
35
123
|
if (language !== 'EN') {
|
|
36
|
-
const fallback = await this.
|
|
124
|
+
const fallback = await this.fetchForLanguage(celexNumber, 'EN', wireFormat, ctx);
|
|
37
125
|
if (fallback !== null) {
|
|
38
126
|
return {
|
|
39
|
-
content: fallback,
|
|
127
|
+
content: renderContent(fallback, format),
|
|
40
128
|
language: 'EN',
|
|
41
129
|
format,
|
|
42
130
|
contentAvailable: true,
|
|
@@ -47,36 +135,64 @@ export class EurLexContentService {
|
|
|
47
135
|
return { content: '', language, format, contentAvailable: false };
|
|
48
136
|
}
|
|
49
137
|
/**
|
|
50
|
-
*
|
|
51
|
-
* empty
|
|
138
|
+
* Resolve content for one language by trying each `Accept` variant for the
|
|
139
|
+
* format. Returns the first non-empty body, or null when none of the variants
|
|
140
|
+
* yield content (so the caller can fall back to English). Throws when a variant
|
|
141
|
+
* returns a bot-challenge stub.
|
|
52
142
|
*/
|
|
53
|
-
async
|
|
54
|
-
const
|
|
143
|
+
async fetchForLanguage(celexNumber, language, format, ctx) {
|
|
144
|
+
const isoLanguage = LANGUAGE_TO_ISO_639_2[language];
|
|
145
|
+
if (!isoLanguage)
|
|
146
|
+
return null;
|
|
147
|
+
for (const accept of ACCEPT_BY_FORMAT[format]) {
|
|
148
|
+
const outcome = await this.fetchVariant(celexNumber, accept, isoLanguage, ctx);
|
|
149
|
+
if (outcome.kind === 'challenge') {
|
|
150
|
+
throw serviceUnavailable(`The EU content endpoint returned a bot-challenge interstitial instead of the act text for CELEX ${celexNumber}.`, {
|
|
151
|
+
celexNumber,
|
|
152
|
+
reason: 'content_challenge',
|
|
153
|
+
recovery: {
|
|
154
|
+
hint: 'The content host is behind a WAF/bot challenge. Retry shortly; metadata remains ' +
|
|
155
|
+
'reachable via content_mode "metadata_only". A persistent challenge means ' +
|
|
156
|
+
'EURLEX_CONTENT_BASE_URL points at a WAF-protected host rather than the EU ' +
|
|
157
|
+
'Publications Office CELLAR resolver.',
|
|
158
|
+
},
|
|
159
|
+
});
|
|
160
|
+
}
|
|
161
|
+
if (outcome.kind === 'content')
|
|
162
|
+
return outcome.text;
|
|
163
|
+
}
|
|
164
|
+
return null;
|
|
165
|
+
}
|
|
166
|
+
/**
|
|
167
|
+
* Single content-negotiation request for one `Accept`/`Accept-Language` pair.
|
|
168
|
+
* Non-2xx responses (404 = no datastream of that type, 300 = multi-part Formex,
|
|
169
|
+
* 4xx/5xx) and network failures resolve to `none` so callers can try the next
|
|
170
|
+
* variant or language. A WAF challenge body resolves to `challenge`. The inner
|
|
171
|
+
* function only throws on a `fetch` rejection, so `withRetry` retries transient
|
|
172
|
+
* network errors but never a 404 or a challenge.
|
|
173
|
+
*/
|
|
174
|
+
fetchVariant(celexNumber, accept, isoLanguage, ctx) {
|
|
175
|
+
const url = this.buildContentUrl(celexNumber);
|
|
55
176
|
return withRetry(async () => {
|
|
56
177
|
const response = await fetch(url, {
|
|
57
|
-
headers: { Accept:
|
|
178
|
+
headers: { Accept: accept, 'Accept-Language': isoLanguage },
|
|
58
179
|
signal: AbortSignal.timeout(this.timeoutMs),
|
|
59
180
|
redirect: 'follow',
|
|
60
181
|
});
|
|
61
|
-
if (
|
|
62
|
-
return
|
|
63
|
-
}
|
|
182
|
+
if (!response.ok)
|
|
183
|
+
return { kind: 'none' };
|
|
64
184
|
const text = await response.text();
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
if (text.trim().length < 100) {
|
|
71
|
-
return null;
|
|
72
|
-
}
|
|
73
|
-
return text;
|
|
185
|
+
if (isChallengeResponse(text))
|
|
186
|
+
return { kind: 'challenge' };
|
|
187
|
+
if (text.trim().length < MIN_CONTENT_LENGTH)
|
|
188
|
+
return { kind: 'none' };
|
|
189
|
+
return { kind: 'content', text };
|
|
74
190
|
}, {
|
|
75
|
-
operation: 'EurLexContentService.
|
|
191
|
+
operation: 'EurLexContentService.fetchVariant',
|
|
76
192
|
baseDelayMs: 1000,
|
|
77
193
|
maxRetries: 2,
|
|
78
194
|
signal: ctx.signal,
|
|
79
|
-
}).catch(() =>
|
|
195
|
+
}).catch(() => ({ kind: 'none' }));
|
|
80
196
|
}
|
|
81
197
|
}
|
|
82
198
|
// --- Init/accessor pattern ---
|
|
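Note on the hunks above: a minimal sketch of how the outcome classification and the fallback order fit together, assuming only what this diff shows. `classifyBody` and `attemptOrder` are hypothetical helper names introduced for illustration; the package does this inline in `fetchVariant` and in `fetchContent`/`fetchForLanguage`. The language map is trimmed to two entries, and its `EN` entry is an assumption because it lies outside the lines shown here.

```javascript
// Illustrative sketch only: helper names are hypothetical, constants mirror the diff.
const CHALLENGE_MARKERS = ['awswaf', 'gokuprops'];
const MIN_CONTENT_LENGTH = 100;
const ACCEPT_BY_FORMAT = {
  html: ['application/xhtml+xml', 'text/html'],
  xml: ['application/xml;type=fmx4'],
};
// Trimmed map; the EN entry is assumed (it is not visible in this part of the diff).
const LANGUAGE_TO_ISO_639_2 = { EN: 'eng', DA: 'dan' };

/** Classify one response the way fetchVariant does: none | challenge | content. */
function classifyBody(ok, text) {
  if (!ok) return { kind: 'none' };
  // Only the first 4096 characters are scanned, case-insensitively.
  const head = text.slice(0, 4096).toLowerCase();
  if (CHALLENGE_MARKERS.some((marker) => head.includes(marker))) {
    return { kind: 'challenge' };
  }
  if (text.trim().length < MIN_CONTENT_LENGTH) return { kind: 'none' };
  return { kind: 'content', text };
}

/** List the (Accept-Language, Accept) pairs in the order they would be tried. */
function attemptOrder(language, format) {
  // Markdown is rendered from HTML, so it is requested as HTML on the wire.
  const wireFormat = format === 'markdown' ? 'html' : format;
  const languages = language === 'EN' ? ['EN'] : [language, 'EN'];
  const attempts = [];
  for (const lang of languages) {
    const iso = LANGUAGE_TO_ISO_639_2[lang];
    if (!iso) continue; // unknown code: fetchForLanguage returns null for it
    for (const accept of ACCEPT_BY_FORMAT[wireFormat]) {
      attempts.push({ acceptLanguage: iso, accept });
    }
  }
  return attempts;
}
```

`attemptOrder` lists the worst case. In the service the loop stops at the first `content` outcome, and a `challenge` outcome throws instead of moving on to the next variant or to English.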
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"eurlex-content-service.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA
|
|
1
|
+
{"version":3,"file":"eurlex-content-service.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AAIH,OAAO,EAAE,kBAAkB,EAAE,MAAM,+BAA+B,CAAC;AAEnE,OAAO,EAAE,SAAS,EAAE,MAAM,8BAA8B,CAAC;AAEzD,OAAO,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAyCvD;;;;;GAKG;AACH,MAAM,qBAAqB,GAAmC;IAC5D,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;CACV,CAAC;AAEF;;;;GAIG;AACH,MAAM,gBAAgB,GAA0C;IAC9D,IAAI,EAAE,CAAC,uBAAuB,EAAE,WAAW,CAAC;IAC5C,GAAG,EAAE,CAAC,2BAA2B,CAAC;CACnC,CAAC;AAEF;;;GAGG;AACH,SAAS,aAAa,CAAC,IAAY,EAAE,MAAqB;IACxD,OAAO,MAAM,KAAK,UAAU,CAAC,CAAC,CAAC,cAAc,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;AAC7D,CAAC;AAED;;;;;;GAMG;AACH,MAAM,iBAAiB,GAAG,CAAC,QAAQ,EAAE,WAAW,CAAC,CAAC;AAElD,kFAAkF;AAClF,MAAM,kBAAkB,GAAG,GAAG,CAAC;AAE/B,4EAA4E;AAC5E,SAAS,mBAAmB,CAAC,IAAY;IACvC,MAAM,IAAI,GAAG,IAAI,CAAC,KAAK,CAAC,CAAC,EAAE,IAAI,CAAC,CAAC,WAAW,EAAE,CAAC;IAC/C,OAAO,iBAAiB,CAAC,IAAI,CAAC,CAAC,MAAM,EAAE,EAAE,CAAC,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,CAAC,CAAC;AACnE,CAAC;AAcD,MAAM,OAAO,oBAAoB;IACd,OAAO,CAAS;IAChB,SAAS,CAAS;IAEnC,YAAY,OAAkB,EAAE,QAAwB,EAAE,YAA0B;QAClF,IAAI,CAAC,OAAO,GAAG,YAAY,CAAC,oBAAoB,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACpE,IAAI,CAAC,SAAS,GAAG,YAAY,CAAC,oBAAoB,CAAC;IACrD,CAAC;IAED;;;OAGG;IACH,eAAe,CAAC,WAAmB;QACjC,OAAO,GAAG,IAAI,CAAC,OAAO,mBAAmB,kBAAkB,CAAC,WAAW,CAAC,EAAE,CAAC;IAC7E,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,YAAY,CAChB,WAAmB,EACnB,QAAwB,EACxB,MAAqB,EACrB,GAAY;QAEZ,2EAA2E;QAC3E,2EAA2E;QAC3E,MAAM,UAAU,GAAe,MAAM,KAAK,UAAU,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,MAA
M,CAAC;QAEvE,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,EAAE,GAAG,CAAC,CAAC;QACpF,IAAI,OAAO,KAAK,IAAI,EAAE,CAAC;YACrB,OAAO,EAAE,OAAO,EAAE,aAAa,CAAC,OAAO,EAAE,MAAM,CAAC,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,IAAI,EAAE,CAAC;QAC/F,CAAC;QAED,6DAA6D;QAC7D,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;YACtB,MAAM,QAAQ,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,WAAW,EAAE,IAAI,EAAE,UAAU,EAAE,GAAG,CAAC,CAAC;YACjF,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;gBACtB,OAAO;oBACL,OAAO,EAAE,aAAa,CAAC,QAAQ,EAAE,MAAM,CAAC;oBACxC,QAAQ,EAAE,IAAI;oBACd,MAAM;oBACN,gBAAgB,EAAE,IAAI;oBACtB,gBAAgB,EAAE,sBAAsB,QAAQ,yCAAyC;iBAC1F,CAAC;YACJ,CAAC;QACH,CAAC;QAED,OAAO,EAAE,OAAO,EAAE,EAAE,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;IACpE,CAAC;IAED;;;;;OAKG;IACK,KAAK,CAAC,gBAAgB,CAC5B,WAAmB,EACnB,QAAwB,EACxB,MAAkB,EAClB,GAAY;QAEZ,MAAM,WAAW,GAAG,qBAAqB,CAAC,QAAQ,CAAC,CAAC;QACpD,IAAI,CAAC,WAAW;YAAE,OAAO,IAAI,CAAC;QAE9B,KAAK,MAAM,MAAM,IAAI,gBAAgB,CAAC,MAAM,CAAC,EAAE,CAAC;YAC9C,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,YAAY,CAAC,WAAW,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,CAAC,CAAC;YAC/E,IAAI,OAAO,CAAC,IAAI,KAAK,WAAW,EAAE,CAAC;gBACjC,MAAM,kBAAkB,CACtB,mGAAmG,WAAW,GAAG,EACjH;oBACE,WAAW;oBACX,MAAM,EAAE,mBAAmB;oBAC3B,QAAQ,EAAE;wBACR,IAAI,EACF,kFAAkF;4BAClF,2EAA2E;4BAC3E,4EAA4E;4BAC5E,sCAAsC;qBACzC;iBACF,CACF,CAAC;YACJ,CAAC;YACD,IAAI,OAAO,CAAC,IAAI,KAAK,SAAS;gBAAE,OAAO,OAAO,CAAC,IAAI,CAAC;QACtD,CAAC;QACD,OAAO,IAAI,CAAC;IACd,CAAC;IAED;;;;;;;OAOG;IACK,YAAY,CAClB,WAAmB,EACnB,MAAc,EACd,WAAmB,EACnB,GAAY;QAEZ,MAAM,GAAG,GAAG,IAAI,CAAC,eAAe,CAAC,WAAW,CAAC,CAAC;QAE9C,OAAO,SAAS,CACd,KAAK,IAA2B,EAAE;YAChC,MAAM,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAChC,OAAO,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,iBAAiB,EAAE,WAAW,EAAE;gBAC3D,MAAM,EAAE,WAAW,CAAC,OAAO,CAAC,IAAI,CAAC,SAAS,CAAC;gBAC3C,QAAQ,EAAE,QAAQ;aACnB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE;gBAAE,OAAO,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC;YAE1C,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,IAAI,mBAAmB,CAAC,IAAI,CAAC;gBAAE,OAAO,EAAE,IAAI,EAAE,WAAW,EAAE,CAAC;YAC5D,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,kBAAkB;gBAA
E,OAAO,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC;YACrE,OAAO,EAAE,IAAI,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC;QACnC,CAAC,EACD;YACE,SAAS,EAAE,mCAAmC;YAC9C,WAAW,EAAE,IAAI;YACjB,UAAU,EAAE,CAAC;YACb,MAAM,EAAE,GAAG,CAAC,MAAM;SACnB,CACF,CAAC,KAAK,CAAC,GAAiB,EAAE,CAAC,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC,CAAC,CAAC;IAClD,CAAC;CACF;AAED,gCAAgC;AAEhC,IAAI,QAA0C,CAAC;AAE/C,MAAM,UAAU,wBAAwB,CACtC,MAAiB,EACjB,OAAuB,EACvB,YAA0B;IAE1B,QAAQ,GAAG,IAAI,oBAAoB,CAAC,MAAM,EAAE,OAAO,EAAE,YAAY,CAAC,CAAC;AACrE,CAAC;AAED,MAAM,UAAU,uBAAuB;IACrC,IAAI,CAAC,QAAQ,EAAE,CAAC;QACd,MAAM,IAAI,KAAK,CACb,mFAAmF,CACpF,CAAC;IACJ,CAAC;IACD,OAAO,QAAQ,CAAC;AAClB,CAAC"}
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* @fileoverview Server-side HTML→Markdown conversion for EU act bodies.
|
|
3
|
+
*
|
|
4
|
+
* EUR-Lex/CELLAR serves acts as CONVEX-generated XHTML in which the numbered
|
|
5
|
+
* structure — recitals, article paragraphs, lettered/roman points — is laid out
|
|
6
|
+
* in two-column tables: a narrow marker column (`(1)`, `(a)`, `1.1.`, `—`) beside
|
|
7
|
+
* a wide ~96% prose column. A naive HTML→Markdown pass turns each of these into an
|
|
8
|
+
* unreadable two-column GFM row (`| (1) | The protection of natural persons… |`).
|
|
9
|
+
*
|
|
10
|
+
* This module pre-processes the parsed DOM before conversion:
|
|
11
|
+
* - strips non-body chrome (`<head>`, inline `<style>`/`<script>`, the OJ
|
|
12
|
+
* masthead table, separators, dead intra-document fragment links);
|
|
13
|
+
* - flattens the numbering layout tables into inline-marked block text
|
|
14
|
+
* (`(1) The protection of natural persons…`), recursing innermost-first so
|
|
15
|
+
* nested points collapse cleanly;
|
|
16
|
+
* - preserves genuine data tables — CONVEX tags them `class="oj-table"` — so
|
|
17
|
+
* node-html-markdown renders them as real GFM tables.
|
|
18
|
+
*
|
|
19
|
+
* The conversion produces the full Markdown body; windowing/pagination is applied
|
|
20
|
+
* downstream by the caller (a paged window may land mid-structure — acceptable).
|
|
21
|
+
* @module services/eurlex-content/html-to-markdown
|
|
22
|
+
*/
|
|
23
|
+
/**
|
|
24
|
+
* Convert an EU act XHTML/HTML body to clean Markdown. Numbering layout tables
|
|
25
|
+
* become inline-marked text; genuine data tables become GFM tables; no raw HTML
|
|
26
|
+
* leaks through. Returns the full converted body.
|
|
27
|
+
*/
|
|
28
|
+
export declare function htmlToMarkdown(html: string): string;
|
|
29
|
+
//# sourceMappingURL=html-to-markdown.d.ts.map
|
|
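Note on the "inline-marked block text" described in the doc comment above: a minimal sketch of the marker-prefix step, modelled on `flattenNumberingTable` in `html-to-markdown.js`. `prefixMarker` is a hypothetical name (the module inlines this logic) and the sample input is made up.

```javascript
// Illustrative sketch only: `prefixMarker` is a hypothetical helper name.
function escapeHtml(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

/** Prefix an ordinal marker such as "(1)" onto a prose cell's inner HTML. */
function prefixMarker(marker, proseInnerHtml) {
  const inner = proseInnerHtml.trim();
  if (!marker) return inner;
  // If the prose starts with a <p ...> tag, inject the marker right after it
  // so marker and prose end up in the same paragraph.
  const openTag = inner.match(/^<p\b[^>]*>/i);
  return openTag
    ? inner.slice(0, openTag[0].length) + `${escapeHtml(marker)} ` + inner.slice(openTag[0].length)
    : `<p>${escapeHtml(marker)}</p>${inner}`;
}

console.log(prefixMarker('(1)', '<p class="sample">The protection of natural persons…</p>'));
// prints: <p class="sample">(1) The protection of natural persons…</p>
```

When the prose cell does not open with a `<p>` tag, the marker gets its own paragraph ahead of the cell content instead.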
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"html-to-markdown.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/html-to-markdown.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AA2BH;;;;GAIG;AACH,wBAAgB,cAAc,CAAC,IAAI,EAAE,MAAM,GAAG,MAAM,CAMnD"}
|
|
@@ -0,0 +1,172 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* @fileoverview Server-side HTML→Markdown conversion for EU act bodies.
|
|
3
|
+
*
|
|
4
|
+
* EUR-Lex/CELLAR serves acts as CONVEX-generated XHTML in which the numbered
|
|
5
|
+
* structure — recitals, article paragraphs, lettered/roman points — is laid out
|
|
6
|
+
* in two-column tables: a narrow marker column (`(1)`, `(a)`, `1.1.`, `—`) beside
|
|
7
|
+
* a wide ~96% prose column. A naive HTML→Markdown pass turns each of these into an
|
|
8
|
+
* unreadable two-column GFM row (`| (1) | The protection of natural persons… |`).
|
|
9
|
+
*
|
|
10
|
+
* This module pre-processes the parsed DOM before conversion:
|
|
11
|
+
* - strips non-body chrome (`<head>`, inline `<style>`/`<script>`, the OJ
|
|
12
|
+
* masthead table, separators, dead intra-document fragment links);
|
|
13
|
+
* - flattens the numbering layout tables into inline-marked block text
|
|
14
|
+
* (`(1) The protection of natural persons…`), recursing innermost-first so
|
|
15
|
+
* nested points collapse cleanly;
|
|
16
|
+
* - preserves genuine data tables — CONVEX tags them `class="oj-table"` — so
|
|
17
|
+
* node-html-markdown renders them as real GFM tables.
|
|
18
|
+
*
|
|
19
|
+
* The conversion produces the full Markdown body; windowing/pagination is applied
|
|
20
|
+
* downstream by the caller (a paged window may land mid-structure — acceptable).
|
|
21
|
+
* @module services/eurlex-content/html-to-markdown
|
|
22
|
+
*/
|
|
23
|
+
import { NodeHtmlMarkdown } from 'node-html-markdown';
|
|
24
|
+
import { HTMLElement, NodeType, parse } from 'node-html-parser';
|
|
25
|
+
/** Chrome removed wholesale before conversion (head/style/script/link/separators). */
|
|
26
|
+
const CHROME_SELECTOR = 'head, style, script, link, hr';
|
|
27
|
+
/**
|
|
28
|
+
* The OJ masthead table (date | language | "Official Journal of the European
|
|
29
|
+
* Union" | "L 119/1") is identified by these CONVEX paragraph classes and dropped
|
|
30
|
+
* so it never leads the body.
|
|
31
|
+
*/
|
|
32
|
+
const MASTHEAD_SELECTOR = '.oj-hd-ti, .oj-hd-oj, .oj-hd-date, .oj-hd-lg';
|
|
33
|
+
/**
|
|
34
|
+
* Lead-column width (percent) at or below which a class-less table is treated as a
|
|
35
|
+
* numbering layout (its first column holds a `4%` marker), not tabular data. Every
|
|
36
|
+
* observed genuine data table leads with a column ≥ 20%; numbering tables lead with
|
|
37
|
+
* `4%` (single marker) or `4%`/`4%` (nested marker). The OJ masthead's `10%` lead
|
|
38
|
+
* also falls under this floor, a harmless extra guard since it is stripped above.
|
|
39
|
+
*/
|
|
40
|
+
const LAYOUT_LEAD_COL_MAX_PCT = 10;
|
|
41
|
+
/** Max length of a first-cell string still considered an ordinal marker (col-less fallback). */
|
|
42
|
+
const MARKER_MAX_LEN = 6;
|
|
43
|
+
/**
|
|
44
|
+
* Convert an EU act XHTML/HTML body to clean Markdown. Numbering layout tables
|
|
45
|
+
* become inline-marked text; genuine data tables become GFM tables; no raw HTML
|
|
46
|
+
* leaks through. Returns the full converted body.
|
|
47
|
+
*/
|
|
48
|
+
export function htmlToMarkdown(html) {
|
|
49
|
+
const root = parse(html, { comment: false });
|
|
50
|
+
stripChrome(root);
|
|
51
|
+
const body = root.querySelector('body') ?? root;
|
|
52
|
+
flattenLayoutTables(body);
|
|
53
|
+
return NodeHtmlMarkdown.translate(body.innerHTML).trim();
|
|
54
|
+
}
|
|
55
|
+
/** Remove document chrome and neutralize dead intra-document links in place. */
|
|
56
|
+
function stripChrome(root) {
|
|
57
|
+
for (const node of root.querySelectorAll(CHROME_SELECTOR))
|
|
58
|
+
node.remove();
|
|
59
|
+
for (const table of root.querySelectorAll('table')) {
|
|
60
|
+
if (table.querySelector(MASTHEAD_SELECTOR))
|
|
61
|
+
table.remove();
|
|
62
|
+
}
|
|
63
|
+
// Intra-document fragment anchors (footnote refs, internal cross-refs) don't
|
|
64
|
+
// survive as Markdown links — keep the visible text, drop the dead href.
|
|
65
|
+
for (const anchor of root.querySelectorAll('a')) {
|
|
66
|
+
const href = anchor.getAttribute('href') ?? '';
|
|
67
|
+
if (href === '' || href.startsWith('#')) {
|
|
68
|
+
anchor.replaceWith(parse(`<span>${anchor.innerHTML}</span>`));
|
|
69
|
+
}
|
|
70
|
+
}
|
|
71
|
+
}
|
|
72
|
+
/**
|
|
73
|
+
* Flatten numbering layout tables to inline-marked block text, innermost-first so
|
|
74
|
+
* a nested point is already collapsed when its parent row is rebuilt. Genuine data
|
|
75
|
+
* tables are left intact for GFM conversion.
|
|
76
|
+
*/
|
|
77
|
+
function flattenLayoutTables(node) {
|
|
78
|
+
for (const child of [...node.childNodes]) {
|
|
79
|
+
if (child instanceof HTMLElement)
|
|
80
|
+
flattenLayoutTables(child);
|
|
81
|
+
}
|
|
82
|
+
if (isTag(node, 'table') && !isGenuineDataTable(node)) {
|
|
83
|
+
node.replaceWith(parse(flattenNumberingTable(node)));
|
|
84
|
+
}
|
|
85
|
+
}
|
|
86
|
+
/**
|
|
87
|
+
* Whether a table carries tabular data (→ GFM) rather than numbering layout
|
|
88
|
+
* (→ flattened text). CONVEX marks real tables `class="oj-table"`; for class-less
|
|
89
|
+
* tables, a wide lead column (or, when no `<col>` widths exist, rows that don't all
|
|
90
|
+
* begin with a short ordinal marker) signals genuine data.
|
|
91
|
+
*/
|
|
92
|
+
function isGenuineDataTable(table) {
|
|
93
|
+
if (/\boj-table\b/.test(table.getAttribute('class') ?? ''))
|
|
94
|
+
return true;
|
|
95
|
+
const leadPct = leadColWidthPct(table);
|
|
96
|
+
if (leadPct !== null)
|
|
97
|
+
return leadPct > LAYOUT_LEAD_COL_MAX_PCT;
|
|
98
|
+
return !allRowsLeadWithMarker(table);
|
|
99
|
+
}
|
|
100
|
+
/** Width (percent) of the table's own first `<col>`, or null when absent/unparseable. */
|
|
101
|
+
function leadColWidthPct(table) {
|
|
102
|
+
const col = directChildrenByTag(table, ['col'])[0];
|
|
103
|
+
const match = (col?.getAttribute('width') ?? '').match(/^(\d+(?:\.\d+)?)\s*%/);
|
|
104
|
+
return match ? Number(match[1]) : null;
|
|
105
|
+
}
|
|
106
|
+
/** True when every row's first cell is empty or a short ordinal marker (no `<col>` widths). */
|
|
107
|
+
function allRowsLeadWithMarker(table) {
|
|
108
|
+
const rows = directRows(table);
|
|
109
|
+
if (rows.length === 0)
|
|
110
|
+
return false;
|
|
111
|
+
return rows.every((row) => {
|
|
112
|
+
const first = directCells(row)[0];
|
|
113
|
+
if (!first)
|
|
114
|
+
return true;
|
|
115
|
+
const text = first.text.trim();
|
|
116
|
+
return text === '' || (text.length <= MARKER_MAX_LEN && !/\s/.test(text));
|
|
117
|
+
});
|
|
118
|
+
}
|
|
119
|
+
/**
|
|
120
|
+
* Rebuild a numbering layout table as a `<div>` of block rows: the marker cell(s)
|
|
121
|
+
* are prefixed inline onto the prose cell so each row reads `(1) prose…`. The prose
|
|
122
|
+
* cell's inner HTML is preserved verbatim, so inline markup and any nested genuine
|
|
123
|
+
* tables (already-flattened nested points) carry through unchanged.
|
|
124
|
+
*/
|
|
125
|
+
function flattenNumberingTable(table) {
|
|
126
|
+
const blocks = [];
|
|
127
|
+
for (const row of directRows(table)) {
|
|
128
|
+
const cells = directCells(row);
|
|
129
|
+
const prose = cells.at(-1);
|
|
130
|
+
if (!prose)
|
|
131
|
+
continue;
|
|
132
|
+
const marker = cells
|
|
133
|
+
.slice(0, -1)
|
|
134
|
+
.map((cell) => cell.text.trim())
|
|
135
|
+
.filter(Boolean)
|
|
136
|
+
.join(' ')
|
|
137
|
+
.replace(/\s+/g, ' ');
|
|
138
|
+
let inner = prose.innerHTML.trim();
|
|
139
|
+
if (marker) {
|
|
140
|
+
const openTag = inner.match(/^<p\b[^>]*>/i);
|
|
141
|
+
inner = openTag
|
|
142
|
+
? inner.slice(0, openTag[0].length) +
|
|
143
|
+
`${escapeHtml(marker)} ` +
|
|
144
|
+
inner.slice(openTag[0].length)
|
|
145
|
+
: `<p>${escapeHtml(marker)}</p>${inner}`;
|
|
146
|
+
}
|
|
147
|
+
if (inner)
|
|
148
|
+
blocks.push(inner);
|
|
149
|
+
}
|
|
150
|
+
return `<div>${blocks.join('\n')}</div>`;
|
|
151
|
+
}
|
|
152
|
+
function isTag(node, tag) {
|
|
153
|
+
return node instanceof HTMLElement && node.rawTagName?.toLowerCase() === tag;
|
|
154
|
+
}
|
|
155
|
+
function directChildrenByTag(node, tags) {
|
|
156
|
+
return node.childNodes.filter((child) => child.nodeType === NodeType.ELEMENT_NODE &&
|
|
157
|
+
tags.includes(child.rawTagName?.toLowerCase()));
|
|
158
|
+
}
|
|
159
|
+
/** A table's own rows (direct `<tr>`, plus those under its direct sections) — never nested tables'. */
|
|
160
|
+
function directRows(table) {
|
|
161
|
+
const sections = directChildrenByTag(table, ['tbody', 'thead', 'tfoot']);
|
|
162
|
+
const rows = sections.flatMap((section) => directChildrenByTag(section, ['tr']));
|
|
163
|
+
rows.push(...directChildrenByTag(table, ['tr']));
|
|
164
|
+
return rows;
|
|
165
|
+
}
|
|
166
|
+
function directCells(row) {
|
|
167
|
+
return directChildrenByTag(row, ['td', 'th']);
|
|
168
|
+
}
|
|
169
|
+
function escapeHtml(value) {
|
|
170
|
+
return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
|
|
171
|
+
}
|
|
172
|
+
//# sourceMappingURL=html-to-markdown.js.map
|
|
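Note on the table heuristic in the file above: a minimal sketch of the data-table versus numbering-layout decision, with the DOM replaced by a plain object so it runs without `node-html-parser`. `looksLikeDataTable`, `parsePct`, `isMarker` and the `{ className, leadColWidth, firstCells }` shape are hypothetical stand-ins; the thresholds are the ones in the diff.

```javascript
// Illustrative sketch only: function names and the input shape are hypothetical.
const LAYOUT_LEAD_COL_MAX_PCT = 10;
const MARKER_MAX_LEN = 6;

/** Parse a <col width="..."> value such as "4%"; null when absent or not a percentage. */
function parsePct(width) {
  const match = (width ?? '').match(/^(\d+(?:\.\d+)?)\s*%/);
  return match ? Number(match[1]) : null;
}

/** True when a first-cell string is empty or a short, whitespace-free ordinal marker. */
function isMarker(text) {
  const t = text.trim();
  return t === '' || (t.length <= MARKER_MAX_LEN && !/\s/.test(t));
}

/** Decide whether a table is kept as a GFM table (true) or flattened to text (false). */
function looksLikeDataTable(table) {
  // 1. An explicit oj-table class always wins.
  if (/\boj-table\b/.test(table.className ?? '')) return true;
  // 2. Otherwise a parseable lead-column width decides.
  const leadPct = parsePct(table.leadColWidth);
  if (leadPct !== null) return leadPct > LAYOUT_LEAD_COL_MAX_PCT;
  // 3. Fallback without widths: a layout table leads every row with a marker.
  const cells = table.firstCells ?? [];
  if (cells.length === 0) return true; // no rows, so nothing to flatten
  return !cells.every(isMarker);
}
```

The checks run in that order, so a table with class `oj-table` is kept as tabular data even if its first column is narrow, and the marker check only applies when no percentage width can be read.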
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"html-to-markdown.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/html-to-markdown.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AAEH,OAAO,EAAE,gBAAgB,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,EAAE,WAAW,EAAa,QAAQ,EAAE,KAAK,EAAE,MAAM,kBAAkB,CAAC;AAE3E,sFAAsF;AACtF,MAAM,eAAe,GAAG,+BAA+B,CAAC;AAExD;;;;GAIG;AACH,MAAM,iBAAiB,GAAG,8CAA8C,CAAC;AAEzE;;;;;;GAMG;AACH,MAAM,uBAAuB,GAAG,EAAE,CAAC;AAEnC,gGAAgG;AAChG,MAAM,cAAc,GAAG,CAAC,CAAC;AAEzB;;;;GAIG;AACH,MAAM,UAAU,cAAc,CAAC,IAAY;IACzC,MAAM,IAAI,GAAG,KAAK,CAAC,IAAI,EAAE,EAAE,OAAO,EAAE,KAAK,EAAE,CAAC,CAAC;IAC7C,WAAW,CAAC,IAAI,CAAC,CAAC;IAClB,MAAM,IAAI,GAAG,IAAI,CAAC,aAAa,CAAC,MAAM,CAAC,IAAI,IAAI,CAAC;IAChD,mBAAmB,CAAC,IAAI,CAAC,CAAC;IAC1B,OAAO,gBAAgB,CAAC,SAAS,CAAC,IAAI,CAAC,SAAS,CAAC,CAAC,IAAI,EAAE,CAAC;AAC3D,CAAC;AAED,gFAAgF;AAChF,SAAS,WAAW,CAAC,IAAiB;IACpC,KAAK,MAAM,IAAI,IAAI,IAAI,CAAC,gBAAgB,CAAC,eAAe,CAAC;QAAE,IAAI,CAAC,MAAM,EAAE,CAAC;IACzE,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,gBAAgB,CAAC,OAAO,CAAC,EAAE,CAAC;QACnD,IAAI,KAAK,CAAC,aAAa,CAAC,iBAAiB,CAAC;YAAE,KAAK,CAAC,MAAM,EAAE,CAAC;IAC7D,CAAC;IACD,6EAA6E;IAC7E,yEAAyE;IACzE,KAAK,MAAM,MAAM,IAAI,IAAI,CAAC,gBAAgB,CAAC,GAAG,CAAC,EAAE,CAAC;QAChD,MAAM,IAAI,GAAG,MAAM,CAAC,YAAY,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC;QAC/C,IAAI,IAAI,KAAK,EAAE,IAAI,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;YACxC,MAAM,CAAC,WAAW,CAAC,KAAK,CAAC,SAAS,MAAM,CAAC,SAAS,SAAS,CAAC,CAAC,CAAC;QAChE,CAAC;IACH,CAAC;AACH,CAAC;AAED;;;;GAIG;AACH,SAAS,mBAAmB,CAAC,IAAiB;IAC5C,KAAK,MAAM,KAAK,IAAI,CAAC,GAAG,IAAI,CAAC,UAAU,CAAC,EAAE,CAAC;QACzC,IAAI,KAAK,YAAY,WAAW;YAAE,mBAAmB,CAAC,KAAK,CAAC,CAAC;IAC/D,CAAC;IACD,IAAI,KAAK,CAAC,IAAI,EAAE,OAAO,CAAC,IAAI,CAAC,kBAAkB,CAAC,IAAI,CAAC,EAAE,CAAC;QACtD,IAAI,CAAC,WAAW,CAAC,KAAK,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACvD,CAAC;AACH,CAAC;AAED;;;;;GAKG;AACH,SAAS,kBAAkB,CAAC,KAAkB;IAC5C,IAAI,cAAc,CAAC,IAAI,CAAC,KAAK,CAAC,YAAY,CAAC,OAAO,CAAC,IAAI,EAAE,CAAC;QAAE,OAAO,IAAI,CAAC;IACxE,MAAM,OAAO,GAAG,eAAe,CAAC,KAAK,CAAC,CAAC;IACvC,IAAI,OAAO,KAAK,IAAI;QAAE,OAAO,OA
AO,GAAG,uBAAuB,CAAC;IAC/D,OAAO,CAAC,qBAAqB,CAAC,KAAK,CAAC,CAAC;AACvC,CAAC;AAED,yFAAyF;AACzF,SAAS,eAAe,CAAC,KAAkB;IACzC,MAAM,GAAG,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;IACnD,MAAM,KAAK,GAAG,CAAC,GAAG,EAAE,YAAY,CAAC,OAAO,CAAC,IAAI,EAAE,CAAC,CAAC,KAAK,CAAC,sBAAsB,CAAC,CAAC;IAC/E,OAAO,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;AACzC,CAAC;AAED,+FAA+F;AAC/F,SAAS,qBAAqB,CAAC,KAAkB;IAC/C,MAAM,IAAI,GAAG,UAAU,CAAC,KAAK,CAAC,CAAC;IAC/B,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO,KAAK,CAAC;IACpC,OAAO,IAAI,CAAC,KAAK,CAAC,CAAC,GAAG,EAAE,EAAE;QACxB,MAAM,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC;QAClC,IAAI,CAAC,KAAK;YAAE,OAAO,IAAI,CAAC;QACxB,MAAM,IAAI,GAAG,KAAK,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QAC/B,OAAO,IAAI,KAAK,EAAE,IAAI,CAAC,IAAI,CAAC,MAAM,IAAI,cAAc,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC5E,CAAC,CAAC,CAAC;AACL,CAAC;AAED;;;;;GAKG;AACH,SAAS,qBAAqB,CAAC,KAAkB;IAC/C,MAAM,MAAM,GAAa,EAAE,CAAC;IAC5B,KAAK,MAAM,GAAG,IAAI,UAAU,CAAC,KAAK,CAAC,EAAE,CAAC;QACpC,MAAM,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;QAC/B,MAAM,KAAK,GAAG,KAAK,CAAC,EAAE,CAAC,CAAC,CAAC,CAAC,CAAC;QAC3B,IAAI,CAAC,KAAK;YAAE,SAAS;QACrB,MAAM,MAAM,GAAG,KAAK;aACjB,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;aACZ,GAAG,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;aAC/B,MAAM,CAAC,OAAO,CAAC;aACf,IAAI,CAAC,GAAG,CAAC;aACT,OAAO,CAAC,MAAM,EAAE,GAAG,CAAC,CAAC;QACxB,IAAI,KAAK,GAAG,KAAK,CAAC,SAAS,CAAC,IAAI,EAAE,CAAC;QACnC,IAAI,MAAM,EAAE,CAAC;YACX,MAAM,OAAO,GAAG,KAAK,CAAC,KAAK,CAAC,cAAc,CAAC,CAAC;YAC5C,KAAK,GAAG,OAAO;gBACb,CAAC,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC;oBACjC,GAAG,UAAU,CAAC,MAAM,CAAC,GAAG;oBACxB,KAAK,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC;gBAChC,CAAC,CAAC,MAAM,UAAU,CAAC,MAAM,CAAC,OAAO,KAAK,EAAE,CAAC;QAC7C,CAAC;QACD,IAAI,KAAK;YAAE,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;IAChC,CAAC;IACD,OAAO,QAAQ,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,QAAQ,CAAC;AAC3C,CAAC;AAED,SAAS,KAAK,CAAC,IAAU,EAAE,GAAW;IACpC,OAAO,IAAI,YAAY,WAAW,I
AAI,IAAI,CAAC,UAAU,EAAE,WAAW,EAAE,KAAK,GAAG,CAAC;AAC/E,CAAC;AAED,SAAS,mBAAmB,CAAC,IAAiB,EAAE,IAAuB;IACrE,OAAO,IAAI,CAAC,UAAU,CAAC,MAAM,CAC3B,CAAC,KAAK,EAAwB,EAAE,CAC9B,KAAK,CAAC,QAAQ,KAAK,QAAQ,CAAC,YAAY;QACxC,IAAI,CAAC,QAAQ,CAAE,KAAqB,CAAC,UAAU,EAAE,WAAW,EAAE,CAAC,CAClE,CAAC;AACJ,CAAC;AAED,uGAAuG;AACvG,SAAS,UAAU,CAAC,KAAkB;IACpC,MAAM,QAAQ,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,OAAO,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC,CAAC;IACzE,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,CAAC,OAAO,EAAE,EAAE,CAAC,mBAAmB,CAAC,OAAO,EAAE,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACjF,IAAI,CAAC,IAAI,CAAC,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACjD,OAAO,IAAI,CAAC;AACd,CAAC;AAED,SAAS,WAAW,CAAC,GAAgB;IACnC,OAAO,mBAAmB,CAAC,GAAG,EAAE,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC,CAAC;AAChD,CAAC;AAED,SAAS,UAAU,CAAC,KAAa;IAC/B,OAAO,KAAK,CAAC,OAAO,CAAC,IAAI,EAAE,OAAO,CAAC,CAAC,OAAO,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC,OAAO,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC;AAClF,CAAC"}
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@cyanheads/eur-lex-mcp-server",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.4.0",
|
|
4
4
|
"mcpName": "io.github.cyanheads/eur-lex-mcp-server",
|
|
5
5
|
"description": "Search EU legislation, CJEU case law, and treaties; traverse the CELLAR relationship graph; resolve EuroVoc concepts via MCP. STDIO or Streamable HTTP.",
|
|
6
6
|
"type": "module",
|
|
@@ -89,7 +89,9 @@
|
|
|
89
89
|
"access": "public"
|
|
90
90
|
},
|
|
91
91
|
"dependencies": {
|
|
92
|
-
"@cyanheads/mcp-ts-core": "^0.10.
|
|
92
|
+
"@cyanheads/mcp-ts-core": "^0.10.10",
|
|
93
|
+
"node-html-markdown": "^2.0.0",
|
|
94
|
+
"node-html-parser": "^8.0.4",
|
|
93
95
|
"pino-pretty": "^13.1.3",
|
|
94
96
|
"zod": "^4.4.3"
|
|
95
97
|
},
|
package/server.json
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
"url": "https://github.com/cyanheads/eur-lex-mcp-server",
|
|
7
7
|
"source": "github"
|
|
8
8
|
},
|
|
9
|
-
"version": "0.
|
|
9
|
+
"version": "0.4.0",
|
|
10
10
|
"remotes": [
|
|
11
11
|
{
|
|
12
12
|
"type": "streamable-http",
|
|
@@ -19,7 +19,7 @@
|
|
|
19
19
|
"registryBaseUrl": "https://registry.npmjs.org",
|
|
20
20
|
"identifier": "@cyanheads/eur-lex-mcp-server",
|
|
21
21
|
"runtimeHint": "bun",
|
|
22
|
-
"version": "0.
|
|
22
|
+
"version": "0.4.0",
|
|
23
23
|
"packageArguments": [
|
|
24
24
|
{
|
|
25
25
|
"type": "positional",
|
|
@@ -40,10 +40,10 @@
|
|
|
40
40
|
},
|
|
41
41
|
{
|
|
42
42
|
"name": "EURLEX_CONTENT_BASE_URL",
|
|
43
|
-
"description": "
|
|
43
|
+
"description": "Base URL of the EU Publications Office CELLAR content-negotiation resolver that serves act full text; override e.g. for a local mirror.",
|
|
44
44
|
"format": "string",
|
|
45
45
|
"isRequired": false,
|
|
46
|
-
"default": "
|
|
46
|
+
"default": "http://publications.europa.eu"
|
|
47
47
|
},
|
|
48
48
|
{
|
|
49
49
|
"name": "SPARQL_QUERY_TIMEOUT_MS",
|
|
@@ -76,7 +76,7 @@
|
|
|
76
76
|
"registryBaseUrl": "https://registry.npmjs.org",
|
|
77
77
|
"identifier": "@cyanheads/eur-lex-mcp-server",
|
|
78
78
|
"runtimeHint": "bun",
|
|
79
|
-
"version": "0.
|
|
79
|
+
"version": "0.4.0",
|
|
80
80
|
"packageArguments": [
|
|
81
81
|
{
|
|
82
82
|
"type": "positional",
|
|
@@ -138,10 +138,10 @@
|
|
|
138
138
|
},
|
|
139
139
|
{
|
|
140
140
|
"name": "EURLEX_CONTENT_BASE_URL",
|
|
141
|
-
"description": "
|
|
141
|
+
"description": "Base URL of the EU Publications Office CELLAR content-negotiation resolver that serves act full text; override e.g. for a local mirror.",
|
|
142
142
|
"format": "string",
|
|
143
143
|
"isRequired": false,
|
|
144
|
-
"default": "
|
|
144
|
+
"default": "http://publications.europa.eu"
|
|
145
145
|
},
|
|
146
146
|
{
|
|
147
147
|
"name": "SPARQL_QUERY_TIMEOUT_MS",
|