@cyanheads/eur-lex-mcp-server 0.2.1 → 0.4.0

package/AGENTS.md CHANGED
@@ -1,8 +1,8 @@
  # Developer Protocol
 
  **Server:** eur-lex-mcp-server
- **Version:** 0.2.1
- **Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.9`
+ **Version:** 0.4.0
+ **Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.10`
  **Engines:** Bun ≥1.3.0, Node ≥24.0.0
  **MCP SDK:** `@modelcontextprotocol/sdk` ^1.29.0
  **Zod:** ^4.4.3
package/CLAUDE.md CHANGED
@@ -1,8 +1,8 @@
  # Developer Protocol
 
  **Server:** eur-lex-mcp-server
- **Version:** 0.2.1
- **Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.9`
+ **Version:** 0.4.0
+ **Framework:** [@cyanheads/mcp-ts-core](https://www.npmjs.com/package/@cyanheads/mcp-ts-core) `^0.10.10`
  **Engines:** Bun ≥1.3.0, Node ≥24.0.0
  **MCP SDK:** `@modelcontextprotocol/sdk` ^1.29.0
  **Zod:** ^4.4.3
package/README.md CHANGED
@@ -7,7 +7,7 @@
 
  <div align="center">
 
- [![Version](https://img.shields.io/badge/Version-0.2.1-blue.svg?style=flat-square)](./CHANGELOG.md) [![License](https://img.shields.io/badge/License-Apache%202.0-orange.svg?style=flat-square)](./LICENSE) [![Docker](https://img.shields.io/badge/Docker-ghcr.io-2496ED?style=flat-square&logo=docker&logoColor=white)](https://github.com/users/cyanheads/packages/container/package/eur-lex-mcp-server) [![MCP SDK](https://img.shields.io/badge/MCP%20SDK-^1.29.0-green.svg?style=flat-square)](https://modelcontextprotocol.io/) [![npm](https://img.shields.io/npm/v/@cyanheads/eur-lex-mcp-server?style=flat-square&logo=npm&logoColor=white)](https://www.npmjs.com/package/@cyanheads/eur-lex-mcp-server) [![TypeScript](https://img.shields.io/badge/TypeScript-^6.0.3-3178C6.svg?style=flat-square)](https://www.typescriptlang.org/) [![Bun](https://img.shields.io/badge/Bun-v1.3.11-blueviolet.svg?style=flat-square)](https://bun.sh/)
+ [![Version](https://img.shields.io/badge/Version-0.4.0-blue.svg?style=flat-square)](./CHANGELOG.md) [![License](https://img.shields.io/badge/License-Apache%202.0-orange.svg?style=flat-square)](./LICENSE) [![Docker](https://img.shields.io/badge/Docker-ghcr.io-2496ED?style=flat-square&logo=docker&logoColor=white)](https://github.com/users/cyanheads/packages/container/package/eur-lex-mcp-server) [![MCP SDK](https://img.shields.io/badge/MCP%20SDK-^1.29.0-green.svg?style=flat-square)](https://modelcontextprotocol.io/) [![npm](https://img.shields.io/npm/v/@cyanheads/eur-lex-mcp-server?style=flat-square&logo=npm&logoColor=white)](https://www.npmjs.com/package/@cyanheads/eur-lex-mcp-server) [![TypeScript](https://img.shields.io/badge/TypeScript-^6.0.3-3178C6.svg?style=flat-square)](https://www.typescriptlang.org/) [![Bun](https://img.shields.io/badge/Bun-v1.3.11-blueviolet.svg?style=flat-square)](https://bun.sh/)
 
  </div>
 
@@ -34,7 +34,7 @@ Seven tools covering EU legal research — document discovery, content retrieval
  | Tool | Description |
  |:-----|:------------|
  | `eurlex_search_documents` | Search EU legislation, treaties, and preparatory acts across the CELLAR corpus. Filters by document type, date range, EuroVoc concept, author institution, and in-force status. |
- | `eurlex_get_document` | Fetch structured metadata and full text (HTML or Formex4 XML) for a work by CELEX number or ELI URI. |
+ | `eurlex_get_document` | Fetch structured metadata and full text (HTML, Markdown, or Formex4 XML) for a work by CELEX number or ELI URI. |
  | `eurlex_lookup_celex` | Resolve an EU legal citation — a CELEX number or an ELI URI — to the canonical CELLAR work. |
  | `eurlex_get_cases` | Search CJEU and General Court case law — judgments, orders, and Advocate General opinions — by case number, party name, subject, or date range. |
  | `eurlex_get_relations` | Traverse the CELLAR relationship graph: amendment chain, consolidated versions, legal basis, citation network, and national transposition measures. |
@@ -61,7 +61,8 @@ Fetch the notice and full text of an EU legal act.
 
  - Accepts CELEX numbers (e.g., `32016R0679`) or ELI URIs
  - Returns structured metadata: title, date, document type, author institution, legal basis, EuroVoc subjects, in-force flag
- - Full text in HTML (default) or Formex4 XML
+ - Full text in HTML (default), Markdown, or Formex4 XML — `format: "markdown"` converts the act body to clean Markdown server-side (recitals and numbered points as readable text, genuine data tables as GFM)
+ - Content shaping for large acts: `content_mode` `"paged"` (default) returns a bounded character window (`offset` + `limit`) with `content_chars_total` and `has_more` so you can page to the end; `"full"` returns the whole body in one call; `"metadata_only"` skips the body
  - Supports all 24 official EU languages; defaults to English with automatic fallback when a translation is unavailable
  - Older acts and some CJEU judgments may lack English translations
 
@@ -140,7 +141,7 @@ EUR-Lex-specific:
 
  - No API key required — both CELLAR SPARQL and EUR-Lex REST content endpoints are publicly accessible
  - `CellarSparqlService` POSTs `application/x-www-form-urlencoded` SPARQL with CDM prefix declarations built in; server-side LIMIT enforcement (max 100) prevents Virtuoso timeout abuse
- - `EurLexContentService` fetches full HTML or Formex4 XML via the canonical `legal-content/{LANG}/TXT/` URL pattern (CELLAR work URIs return 400 on direct GET; this avoids that)
+ - `EurLexContentService` fetches act text from the CELLAR content-negotiation resolver (`/resource/celex/{CELEX}` with `Accept` / `Accept-Language` headers); HTML and Formex4 XML pass through, Markdown is converted server-side from the HTML body
  - Virtuoso error classification: HTTP 200 with `Virtuoso 37000 Error` body is parsed and re-raised as `ServiceUnavailable` (transient/timeout) or `InvalidParams` (syntax error)
  - Language fallback on document fetch: if the requested language is unavailable, retries with English; returns metadata-only with a note when English also fails
  - Typed error contracts on every tool — structured `reason` codes let agents branch on outcomes without parsing text
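The content-negotiation request described in this hunk can be sketched as follows. This is a minimal illustration, not the package's code: `buildCellarHtmlRequest`, `ContentRequest`, and the `ISO_639_2T` table are hypothetical names, the language table is a small subset, and using `application/xhtml+xml` as the first `Accept` value (with `text/html` as a later retry) is an assumption drawn from the changelog wording.

```typescript
// Hypothetical helper, assumed for illustration only; not the server's EurLexContentService.
type ContentRequest = { url: string; headers: Record<string, string> };

// Illustrative subset: EUR-Lex two-letter codes mapped to ISO 639-2/T codes.
const ISO_639_2T: Record<string, string> = {
  EN: "eng",
  FR: "fra",
  DE: "deu",
  ES: "spa",
  IT: "ita",
};

function buildCellarHtmlRequest(
  baseUrl: string,
  celex: string,
  language: string,
): ContentRequest {
  const code = ISO_639_2T[language.toUpperCase()];
  if (code === undefined) {
    throw new Error(`No ISO 639-2/T mapping for language "${language}"`);
  }
  // Drop trailing slashes so the path is joined exactly once.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/resource/celex/${encodeURIComponent(celex)}`,
    // Assumption: first attempt asks for XHTML; a caller could retry with text/html.
    headers: { Accept: "application/xhtml+xml", "Accept-Language": code },
  };
}
```

The returned object can be passed to any HTTP client; keeping the builder free of network calls makes the URL and header mapping easy to check in isolation.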
@@ -275,7 +276,7 @@ All configuration is validated at startup via Zod schemas in `src/config/server-
  | Variable | Description | Default |
  |:---------|:------------|:--------|
  | `CELLAR_SPARQL_ENDPOINT` | CELLAR SPARQL endpoint URL override (e.g., for a local Virtuoso mirror). | `http://publications.europa.eu/webapi/rdf/sparql` |
- | `EURLEX_CONTENT_BASE_URL` | EUR-Lex content API base URL override. | `https://eur-lex.europa.eu` |
+ | `EURLEX_CONTENT_BASE_URL` | EU Publications Office CELLAR content resolver base URL override. | `http://publications.europa.eu` |
  | `SPARQL_QUERY_TIMEOUT_MS` | Client-side timeout for SPARQL requests in milliseconds. | `55000` |
  | `MAX_SPARQL_RESULTS` | Enforced ceiling on LIMIT in all generated SPARQL queries. | `100` |
  | `MCP_TRANSPORT_TYPE` | Transport: `stdio` or `http`. | `stdio` |
@@ -0,0 +1,26 @@
+ ---
+ summary: "eurlex_get_document re-sources act text from the EU Publications Office CELLAR resolver and refuses AWS WAF bot-challenge stubs (previously surfaced as content); adds content_mode/offset/limit body pagination with content_* navigation fields and removes the 8,000-char text cut"
+ breaking: false
+ security: false
+ ---
+
+ # 0.3.0 — 2026-06-30
+
+ ## Added
+
+ - **`content_mode`, `offset`, `limit` inputs on `eurlex_get_document`** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — `content_mode` selects how much body to return: `"paged"` (default) yields a bounded `[offset, offset+limit)` window, `"full"` the entire body in one call, `"metadata_only"` skips the content fetch. `limit` defaults to 25,000 characters, capped at 100,000.
+ - **`content_*` navigation output fields** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — `content_mode`, `content_offset`, `content_chars_returned`, `content_chars_total`, and `has_more` let a client page contiguous windows to the end of an act and reconstruct the full body. Paging past the end returns an empty window with `has_more: false`, not an error.
+
+ ## Changed
+
+ - **Act full text re-sourced from CELLAR content negotiation** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — `EurLexContentService` fetches `publications.europa.eu/resource/celex/{CELEX}` (`application/xhtml+xml` → `text/html` for HTML, Formex 4 for XML) instead of the WAF-fronted `eur-lex.europa.eu` legal-content endpoint. `Accept-Language` maps EUR-Lex two-letter codes to the ISO 639-2/T forms CELLAR requires; a multi-part Formex `300` response is treated as unavailable rather than reconstructed.
+ - **`EURLEX_CONTENT_BASE_URL` default is now `http://publications.europa.eu`** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — was `https://eur-lex.europa.eu`. The env var name is unchanged; `server.json`, `README.md`, and `.env.example` updated to match the CELLAR content-negotiation source.
+ - **`content_available` distinguishes "not requested" from "unavailable"** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — now `false` in `"metadata_only"` mode (no fetch attempted); pair it with `content_mode` to tell the two apart.
+
+ ## Removed
+
+ - **The 8,000-character `format()` content cut** ([#12](https://github.com/cyanheads/eur-lex-mcp-server/issues/12)) — the text view and `structuredContent.content` honor the same `content_mode` window, so there is no separate downstream truncation.
+
+ ## Fixed
+
+ - **AWS WAF bot-challenge stub surfaced as act content** ([#16](https://github.com/cyanheads/eur-lex-mcp-server/issues/16)) — `eurlex_get_document` returned the JavaScript bot-challenge interstitial (the `awswaf`/`gokuProps` stub) as `content` with `content_available: true` for every CELEX. A response carrying a WAF challenge signature is now detected and raised as a `content_challenge` / `ServiceUnavailable` error rather than reported as available content.
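The challenge detection named in the "Fixed" entry could look roughly like the sketch below. Only the `awswaf` and `gokuProps` markers and the `content_challenge` reason come from the changelog; the function and class names, the case-sensitive substring match, and the error shape are assumptions, and the real service may check different or additional signals.

```typescript
// Markers quoted in the changelog entry; the matching strategy here is an assumption.
const WAF_SIGNATURES = ["awswaf", "gokuProps"] as const;

// Returns true when the body carries any known bot-challenge marker.
function looksLikeWafChallenge(body: string): boolean {
  return WAF_SIGNATURES.some((signature) => body.includes(signature));
}

// Hypothetical error type; the server raises its own ServiceUnavailable error instead.
class ContentChallengeError extends Error {
  readonly reason = "content_challenge";
  constructor(celex: string) {
    super(`Upstream returned a bot-challenge page instead of content for ${celex}`);
    this.name = "ContentChallengeError";
  }
}

// Pass real content through unchanged; refuse challenge stubs.
function assertNotChallenge(celex: string, body: string): string {
  if (looksLikeWafChallenge(body)) {
    throw new ContentChallengeError(celex);
  }
  return body;
}
```

Failing loudly here is the point of the fix: a challenge page is syntactically valid HTML, so without a check it would be indistinguishable from act text downstream.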
@@ -0,0 +1,20 @@
+ ---
+ summary: "eurlex_get_document adds an opt-in markdown format — server-side HTML→Markdown of the act body with GFM data tables — and bumps @cyanheads/mcp-ts-core to ^0.10.10, refreshing the lockfile to clear transitive advisories in hono, js-yaml, and esbuild"
+ breaking: false
+ security: true
+ ---
+
+ # 0.4.0 — 2026-06-30
+
+ ## Added
+
+ - **`markdown` value on the `eurlex_get_document` `format` input** ([#13](https://github.com/cyanheads/eur-lex-mcp-server/issues/13)) — opt-in server-side HTML→Markdown of the act body; `html` stays the default and `content_format` reports `"markdown"`. The CONVEX numbering layout tables EUR-Lex uses for recitals, article paragraphs, and lettered/roman points flatten to inline-marked text (`(1) The protection of natural persons…`) instead of unreadable two-column rows; genuine `oj-table` data tables convert to GFM. Composes with the existing `content_mode`/`offset`/`limit` pagination.
+
+ ## Changed
+
+ - **`@cyanheads/mcp-ts-core` `^0.10.9 → ^0.10.10`** — framework patch; lockfile refreshed.
+ - **New runtime dependencies** — `node-html-markdown` `^2.0.0` and `node-html-parser` `^8.0.4` power the Markdown conversion.
+
+ ## Security
+
+ - **Transitive advisories cleared** — the framework bump and lockfile refresh resolved advisories in transitive dependencies: `hono` `4.12.26 → 4.12.27`, `js-yaml` `3.14.2 → 3.15.0`, and `esbuild` `0.28.0` (removed — `vite` `8.0.16 → 8.1.2` switched to rolldown, dropping the esbuild peer). `bun audit` reports no advisories.
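To make the two Markdown behaviors in the 0.4.0 "Added" entry concrete, here is a dependency-free sketch operating on already-extracted cell text. It is not the package's converter (which the changelog says is built on `node-html-markdown` and `node-html-parser`); `flattenNumberedRow`, `toGfmTable`, and `escapeCell` are hypothetical names, and treating the first row as the header is an assumption.

```typescript
// Collapse whitespace and escape pipes so a cell cannot break the table grid.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\s+/g, " ").trim();
}

// A two-column numbering layout row ("(1)" | "text") becomes one readable line.
function flattenNumberedRow(marker: string, text: string): string {
  return `${marker.trim()} ${text.trim()}`;
}

// Render rows of cell text as a GFM table; the first row is assumed to be the header.
function toGfmTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((row) => row.length));
  // Pad short rows so every line has the same number of columns.
  const padded = rows.map((row) =>
    Array.from({ length: width }, (_, i) => escapeCell(row[i] ?? "")),
  );
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  const header = padded[0] ?? [];
  const separator = line(Array.from({ length: width }, () => "---"));
  return [line(header), separator, ...padded.slice(1).map(line)].join("\n");
}
```

Separating the two paths mirrors the distinction the changelog draws: layout tables carry numbering and read better as text, while data tables carry a grid that is worth preserving.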
@@ -1 +1 @@
- {"version":3,"file":"server-config.d.ts","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAG3C,QAAA,MAAM,kBAAkB;;;;;iBAwBtB,CAAC;AAEH,MAAM,MAAM,YAAY,GAAG,CAAC,CAAC,KAAK,CAAC,OAAO,kBAAkB,CAAC,CAAC;AAI9D,wBAAgB,eAAe,IAAI,YAAY,CAQ9C"}
+ {"version":3,"file":"server-config.d.ts","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAG3C,QAAA,MAAM,kBAAkB;;;;;iBA4BtB,CAAC;AAEH,MAAM,MAAM,YAAY,GAAG,CAAC,CAAC,KAAK,CAAC,OAAO,kBAAkB,CAAC,CAAC;AAI9D,wBAAgB,eAAe,IAAI,YAAY,CAQ9C"}
@@ -14,8 +14,10 @@ const ServerConfigSchema = z.object({
  eurLexContentBaseUrl: z
  .string()
  .url()
- .default('https://eur-lex.europa.eu')
- .describe('EUR-Lex content API base URL'),
+ .default('http://publications.europa.eu')
+ .describe('Base URL of the EU Publications Office CELLAR content-negotiation resolver, which serves ' +
+ 'act text via /resource/celex/{CELEX}. Replaces the WAF-protected eur-lex.europa.eu ' +
+ 'legal-content endpoint, which now returns an AWS WAF bot-challenge stub (issue #16).'),
  sparqlQueryTimeoutMs: z.coerce
  .number()
  .int()
@@ -1 +1 @@
- {"version":3,"file":"server-config.js","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAC3C,OAAO,EAAE,cAAc,EAAE,MAAM,+BAA+B,CAAC;AAE/D,MAAM,kBAAkB,GAAG,CAAC,CAAC,MAAM,CAAC;IAClC,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,iDAAiD,CAAC;SAC1D,QAAQ,CAAC,uCAAuC,CAAC;IACpD,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,2BAA2B,CAAC;SACpC,QAAQ,CAAC,8BAA8B,CAAC;IAC3C,oBAAoB,EAAE,CAAC,CAAC,MAAM;SAC3B,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,OAAO,CAAC,MAAM,CAAC;SACf,QAAQ,CAAC,yDAAyD,CAAC;IACtE,gBAAgB,EAAE,CAAC,CAAC,MAAM;SACvB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,GAAG,CAAC,GAAG,CAAC;SACR,OAAO,CAAC,GAAG,CAAC;SACZ,QAAQ,CAAC,2DAA2D,CAAC;CACzE,CAAC,CAAC;AAIH,IAAI,OAAiC,CAAC;AAEtC,MAAM,UAAU,eAAe;IAC7B,OAAO,KAAK,cAAc,CAAC,kBAAkB,EAAE;QAC7C,oBAAoB,EAAE,wBAAwB;QAC9C,oBAAoB,EAAE,yBAAyB;QAC/C,oBAAoB,EAAE,yBAAyB;QAC/C,gBAAgB,EAAE,oBAAoB;KACvC,CAAC,CAAC;IACH,OAAO,OAAO,CAAC;AACjB,CAAC"}
+ {"version":3,"file":"server-config.js","sourceRoot":"","sources":["../../src/config/server-config.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AAC3C,OAAO,EAAE,cAAc,EAAE,MAAM,+BAA+B,CAAC;AAE/D,MAAM,kBAAkB,GAAG,CAAC,CAAC,MAAM,CAAC;IAClC,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,iDAAiD,CAAC;SAC1D,QAAQ,CAAC,uCAAuC,CAAC;IACpD,oBAAoB,EAAE,CAAC;SACpB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,OAAO,CAAC,+BAA+B,CAAC;SACxC,QAAQ,CACP,2FAA2F;QACzF,qFAAqF;QACrF,sFAAsF,CACzF;IACH,oBAAoB,EAAE,CAAC,CAAC,MAAM;SAC3B,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,OAAO,CAAC,MAAM,CAAC;SACf,QAAQ,CAAC,yDAAyD,CAAC;IACtE,gBAAgB,EAAE,CAAC,CAAC,MAAM;SACvB,MAAM,EAAE;SACR,GAAG,EAAE;SACL,QAAQ,EAAE;SACV,GAAG,CAAC,GAAG,CAAC;SACR,OAAO,CAAC,GAAG,CAAC;SACZ,QAAQ,CAAC,2DAA2D,CAAC;CACzE,CAAC,CAAC;AAIH,IAAI,OAAiC,CAAC;AAEtC,MAAM,UAAU,eAAe;IAC7B,OAAO,KAAK,cAAc,CAAC,kBAAkB,EAAE;QAC7C,oBAAoB,EAAE,wBAAwB;QAC9C,oBAAoB,EAAE,yBAAyB;QAC/C,oBAAoB,EAAE,yBAAyB;QAC/C,gBAAgB,EAAE,oBAAoB;KACvC,CAAC,CAAC;IACH,OAAO,OAAO,CAAC;AACjB,CAAC"}
@@ -11,7 +11,15 @@ export declare const eurlex_get_document: import("@cyanheads/mcp-ts-core").ToolD
  format: z.ZodDefault<z.ZodEnum<{
  html: "html";
  xml: "xml";
+ markdown: "markdown";
  }>>;
+ content_mode: z.ZodDefault<z.ZodEnum<{
+ full: "full";
+ metadata_only: "metadata_only";
+ paged: "paged";
+ }>>;
+ offset: z.ZodDefault<z.ZodNumber>;
+ limit: z.ZodDefault<z.ZodNumber>;
  }, z.core.$strip>, z.ZodObject<{
  celex_number: z.ZodString;
  work_uri: z.ZodOptional<z.ZodString>;
@@ -23,7 +31,12 @@ export declare const eurlex_get_document: import("@cyanheads/mcp-ts-core").ToolD
  eurovoc_subjects: z.ZodOptional<z.ZodArray<z.ZodString>>;
  in_force: z.ZodOptional<z.ZodBoolean>;
  content: z.ZodOptional<z.ZodString>;
+ content_mode: z.ZodString;
  content_available: z.ZodBoolean;
+ content_offset: z.ZodOptional<z.ZodNumber>;
+ content_chars_returned: z.ZodOptional<z.ZodNumber>;
+ content_chars_total: z.ZodOptional<z.ZodNumber>;
+ has_more: z.ZodBoolean;
  language: z.ZodString;
  language_fallback: z.ZodOptional<z.ZodString>;
  content_format: z.ZodString;
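With the new output fields, a client can tell apart the cases the changelog calls "not requested" and "unavailable". The sketch below is one possible reading of those fields; `GetDocumentResult` is a trimmed, hand-written stand-in for the tool's output type, and `bodyStatus` and the `BodyStatus` labels are hypothetical names introduced for illustration.

```typescript
// Trimmed stand-in for the tool output; only the fields used below are listed.
type GetDocumentResult = {
  content_mode: string;
  content_available: boolean;
  content?: string;
  has_more: boolean;
};

type BodyStatus = "not_requested" | "unavailable" | "empty_window" | "available";

function bodyStatus(result: GetDocumentResult): BodyStatus {
  // "metadata_only" never attempts a fetch, so content_available is false by design.
  if (result.content_mode === "metadata_only") return "not_requested";
  // A fetch was attempted but no body came back from upstream.
  if (!result.content_available) return "unavailable";
  // Content exists, but the requested offset was at or past the end of the body.
  return result.content === undefined ? "empty_window" : "available";
}
```

Checking `content_mode` before `content_available` matters, because `content_available: false` alone is ambiguous between the first two cases.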
@@ -1 +1 @@
- {"version":3,"file":"eurlex-get-document.tool.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAQ,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AAiBjE,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cAgS9B,CAAC"}
+ {"version":3,"file":"eurlex-get-document.tool.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAQ,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AA2BjE,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cAma9B,CAAC"}
@@ -8,17 +8,29 @@ import { ENG_LANGUAGE_URI, resolveCorporateBodyLabel, resolveResourceTypeLabel,
  import { CellarSparqlService, getCellarSparqlService, } from '../../../services/cellar-sparql/cellar-sparql-service.js';
  import { escapeSparqlLiteral, resolveEliToWork } from '../../../services/cellar-sparql/eli-resolution.js';
  import { getEurLexContentService, } from '../../../services/eurlex-content/eurlex-content-service.js';
+ /**
+ * Default character window returned for body content in "paged" mode — bounds a
+ * single call while keeping small acts whole. The tail of a larger act is never
+ * lost: page forward with `offset`, or request `content_mode: "full"`.
+ */
+ const DEFAULT_CONTENT_LIMIT = 25_000;
+ /** Hard ceiling on one paged window. Use `content_mode: "full"` for the whole body in a single call. */
+ const MAX_CONTENT_LIMIT = 100_000;
  export const eurlex_get_document = tool('eurlex_get_document', {
  title: 'Get EU Document',
  description: 'Fetch the notice (metadata) and full text of an EU act by CELEX number or ELI URI. ' +
  'Returns structured metadata — title, date, document type, author institution, legal basis, EuroVoc subjects — ' +
- 'plus the HTML or Formex4 XML content in the requested language. ' +
+ 'plus the act content as HTML, Markdown, or Formex4 XML in the requested language. ' +
  'Defaults to English (EN); not all works have content in all 24 official EU languages, ' +
  'especially older acts pre-2004 EU enlargement. ' +
  'If the requested language is unavailable, the server automatically falls back to English and notes the fallback. ' +
  'CELEX format: {sector}{year}{type}{number} e.g. 32016R0679 for GDPR. ' +
  'Use eurlex_lookup_celex to validate an identifier before calling this tool. ' +
- 'HTML format returns the full act text suitable for reading; XML returns Formex4 for structured processing.',
+ 'HTML returns the full act text as served by EUR-Lex; markdown converts that HTML to clean Markdown server-side ' +
+ '(recitals and numbered points as readable text, genuine data tables as GFM tables); XML returns Formex4 for structured processing. ' +
+ 'Large bodies are bounded per call but never lost: content_mode "paged" (default) returns a character window ' +
+ '(offset + limit) alongside content_chars_total and has_more, so you can page to the end and reconstruct the whole act; ' +
+ 'content_mode "full" returns the entire body in one call; content_mode "metadata_only" returns metadata with no body and skips the content fetch.',
  annotations: { readOnlyHint: true, idempotentHint: true, openWorldHint: true },
  input: z.object({
  celex_number: z
@@ -39,9 +51,33 @@ export const eurlex_get_document = tool('eurlex_get_document', {
  .describe('Language code for document content (ISO 639-1 uppercase, e.g. EN, FR, DE). ' +
  'Defaults to EN. Falls back to EN if the requested language is unavailable.'),
  format: z
- .enum(['html', 'xml'])
+ .enum(['html', 'xml', 'markdown'])
  .default('html')
- .describe('Content format: "html" for readable HTML text (default), "xml" for Formex4 XML structured format.'),
+ .describe('Content format: "html" for the act text as served by EUR-Lex (default); ' +
+ '"markdown" for that HTML converted to clean Markdown server-side ' +
+ '(recitals and numbered points as readable text, genuine data tables as GFM); ' +
+ '"xml" for Formex4 XML structured format.'),
+ content_mode: z
+ .enum(['metadata_only', 'paged', 'full'])
+ .default('paged')
+ .describe('How much of the document body to return. "paged" (default) returns a bounded character window — see offset/limit; ' +
+ '"full" returns the entire body in one call (large acts can be hundreds of KB); ' +
+ '"metadata_only" returns metadata with no body and skips the content fetch. offset and limit apply only to "paged".'),
+ offset: z
+ .number()
+ .int()
+ .min(0)
+ .default(0)
+ .describe('Character offset into the full document body where the returned window starts ("paged" mode only). ' +
+ 'Page forward by setting offset = content_offset + content_chars_returned from the previous call.'),
+ limit: z
+ .number()
+ .int()
+ .min(1)
+ .max(MAX_CONTENT_LIMIT)
+ .default(DEFAULT_CONTENT_LIMIT)
+ .describe(`Maximum characters of body content to return in this window ("paged" mode only). Default ${DEFAULT_CONTENT_LIMIT}, max ${MAX_CONTENT_LIMIT}. ` +
+ 'For the entire body in one response, use content_mode "full" instead of a large limit.'),
  }),
  output: z.object({
  celex_number: z.string().describe('Confirmed CELEX number for the retrieved work.'),
@@ -71,14 +107,44 @@ export const eurlex_get_document = tool('eurlex_get_document', {
  content: z
  .string()
  .optional()
- .describe('Full text content of the act in the requested format and language.'),
- content_available: z.boolean().describe('Whether document content was successfully retrieved.'),
+ .describe('Body content of the act in the requested format and language. In "paged" mode this is a character window ' +
+ '(see content_offset / content_chars_returned / has_more); in "full" mode the entire body; ' +
+ 'omitted in "metadata_only" mode, when the window is empty (offset past the end), or when content is unavailable.'),
+ content_mode: z
+ .string()
+ .describe('Content mode applied to this response: "metadata_only", "paged", or "full".'),
+ content_available: z
+ .boolean()
+ .describe('Whether body content was fetched from EUR-Lex. False in "metadata_only" mode (no fetch attempted) — ' +
+ 'use content_mode to distinguish "not requested" from "unavailable upstream".'),
+ content_offset: z
+ .number()
+ .int()
+ .optional()
+ .describe('Character offset where the returned content window begins. Present when a body was fetched and available.'),
+ content_chars_returned: z
+ .number()
+ .int()
+ .optional()
+ .describe('Number of body characters returned in this response (equals content length). Present when a body was fetched and available.'),
+ content_chars_total: z
+ .number()
+ .int()
+ .optional()
+ .describe('Total character length of the full document body. Present when content was fetched and available; ' +
+ 'use with content_offset to page through the entire act.'),
+ has_more: z
+ .boolean()
+ .describe('True when body content exists beyond the returned window. Page forward with offset = content_offset + content_chars_returned, ' +
+ 'or request content_mode "full" for the entire act in one call. Always false in "metadata_only" mode.'),
  language: z.string().describe('Language code of the returned content.'),
  language_fallback: z
  .string()
  .optional()
  .describe('Human-readable note explaining the fallback that occurred (e.g. "Requested FR content unavailable; returned EN"). Present only when a fallback happened.'),
- content_format: z.string().describe('Format of the returned content: "html" or "xml".'),
+ content_format: z
+ .string()
+ .describe('Format of the returned content: "html", "markdown", or "xml".'),
  }),
  errors: [
  {
@@ -183,12 +249,18 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
  const inForceStr = CellarSparqlService.bindingValue(first, 'inForce');
  const inForce = inForceStr !== undefined ? inForceStr === 'true' : undefined;
  const authorUri = CellarSparqlService.bindingValue(first, 'author');
- // Step 2: Fetch document content via EUR-Lex content API
- const contentResult = await contentSvc.fetchContent(celexNumber, language, format, ctx);
+ // Step 2: assemble metadata, then shape the body per content_mode. The body
+ // is one navigable mechanism: "metadata_only" skips the fetch entirely,
+ // "full" returns the whole body, and "paged" returns a bounded
+ // [offset, offset+limit) window with content_chars_total + has_more so the
+ // tail is always reachable. The same shaped `content` feeds both
+ // structuredContent and format(); there is no separate truncation downstream.
  const result = {
  celex_number: confirmedCelex,
- content_available: contentResult.contentAvailable,
- language: contentResult.language,
+ content_mode: input.content_mode,
+ content_available: false,
+ has_more: false,
+ language,
  content_format: format,
  };
  if (workUri)
@@ -207,11 +279,36 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
  result.eurovoc_subjects = [...eurovocConcepts];
  if (typeof inForce === 'boolean')
  result.in_force = inForce;
- if (contentResult.contentAvailable && contentResult.content) {
- result.content = contentResult.content;
- }
- if (contentResult.languageFallback) {
- result.language_fallback = contentResult.languageFallback;
+ if (input.content_mode !== 'metadata_only') {
+ const contentResult = await contentSvc.fetchContent(celexNumber, language, format, ctx);
+ result.content_available = contentResult.contentAvailable;
+ result.language = contentResult.language;
+ if (contentResult.languageFallback) {
+ result.language_fallback = contentResult.languageFallback;
+ }
+ if (contentResult.contentAvailable && contentResult.content) {
+ const full = contentResult.content;
+ const total = full.length;
+ result.content_chars_total = total;
+ if (input.content_mode === 'full') {
+ result.content = full;
+ result.content_offset = 0;
+ result.content_chars_returned = total;
+ result.has_more = false;
+ }
+ else {
+ // Bounded [offset, offset+limit) window over the full body. offset is
+ // clamped to the body length so over-paging returns an empty window
+ // (has_more false) rather than erroring.
+ const offset = Math.min(input.offset, total);
+ const windowText = full.slice(offset, offset + input.limit);
+ result.content_offset = offset;
+ result.content_chars_returned = windowText.length;
+ result.has_more = offset + windowText.length < total;
+ if (windowText.length > 0)
+ result.content = windowText;
+ }
+ }
  }
  return result;
  },
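From the client side, the paging contract in this hunk (advance by `content_offset + content_chars_returned` until `has_more` is false) can be exercised with a loop like the one below. This is a sketch under assumptions: `FetchPage`, `readWholeBody`, and `fakeServer` are hypothetical names, and `fakeServer` is an in-memory stand-in that mimics the windowing shown above instead of calling the real tool.

```typescript
// Subset of the tool output relevant to paging.
type Page = {
  content?: string;
  content_offset?: number;
  content_chars_returned?: number;
  has_more: boolean;
};

// Hypothetical transport: a real client would call eurlex_get_document here.
type FetchPage = (offset: number, limit: number) => Promise<Page>;

async function readWholeBody(fetchPage: FetchPage, limit = 25_000): Promise<string> {
  const parts: string[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, limit);
    if (page.content) parts.push(page.content);
    if (!page.has_more) break;
    const returned = page.content_chars_returned ?? 0;
    if (returned === 0) break; // defensive: never loop without making progress
    offset = (page.content_offset ?? offset) + returned;
  }
  return parts.join("");
}

// In-memory stand-in that applies the same clamp-and-slice windowing.
function fakeServer(body: string): FetchPage {
  return async (offset, limit) => {
    const start = Math.min(offset, body.length);
    const text = body.slice(start, start + limit);
    const page: Page = {
      content_offset: start,
      content_chars_returned: text.length,
      has_more: start + text.length < body.length,
    };
    if (text.length > 0) page.content = text;
    return page;
  };
}
```

For example, `readWholeBody(fakeServer(body), 3)` walks a body in three-character windows and joins them back together; with the real tool, `content_mode: "full"` avoids the loop entirely when a single large response is acceptable.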
@@ -238,22 +335,38 @@ SELECT ?work ?celexNumber ?type ?date ?title ?inForce ?author ?legalBasis ?eurov
  lines.push(`**Language:** ${result.language} | **Format:** ${result.content_format}`);
  if (result.language_fallback)
  lines.push(`*Note: ${result.language_fallback}*`);
- lines.push(`**Content Available:** ${result.content_available}`);
- if (result.content_available && result.content) {
+ // Body rendering honors the same window as structuredContent.content: the
+ // shaped content is emitted verbatim with a navigation line; no second cut.
+ if (result.content_mode === 'metadata_only') {
  lines.push('');
- lines.push('---');
- lines.push('');
- // Truncate very large content for format() output
- const maxLen = 8000;
- if (result.content.length > maxLen) {
- lines.push(result.content.slice(0, maxLen));
- lines.push(`\n*[Content truncated ${result.content.length} chars total. Use the CELEX number to fetch directly.]*`);
+ lines.push('*Body omitted (content_mode "metadata_only"). Request content_mode "paged" or "full" to retrieve the text.*');
+ }
+ else if (result.content_available) {
+ const total = result.content_chars_total ?? result.content?.length ?? 0;
+ if (result.content) {
+ const start = result.content_offset ?? 0;
+ const returned = result.content_chars_returned ?? result.content.length;
+ const end = start + returned;
+ if (result.content_mode === 'full') {
+ lines.push(`**Content** (full): full body — ${returned} of ${total} characters.`);
+ }
+ else {
+ lines.push(`**Content** (${result.content_mode}): characters ${start}–${end} of ${total} (${returned} returned).` +
+ (result.has_more
+ ? ` More available — page forward with offset=${end}, or content_mode="full" for the entire act.`
+ : ' End of document.'));
+ }
+ lines.push('');
+ lines.push('---');
+ lines.push('');
+ lines.push(result.content);
  }
  else {
- lines.push(result.content);
+ lines.push('');
+ lines.push(`*No content at offset ${result.content_offset ?? 0} — past the end of the ${total}-character body. Lower offset to read.*`);
  }
  }
- else if (!result.content_available) {
+ else {
  lines.push('');
  lines.push('*Document content is not available for this work in the requested language.*');
  }
@@ -1 +1 @@
- {"version":3,"file":"eurlex-get-document.tool.js","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAE,IAAI,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AACjE,OAAO,EACL,gBAAgB,EAChB,yBAAyB,EACzB,wBAAwB,GACzB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EACL,mBAAmB,EACnB,sBAAsB,GACvB,MAAM,mDAAmD,CAAC;AAC3D,OAAO,EAAE,mBAAmB,EAAE,gBAAgB,EAAE,MAAM,4CAA4C,CAAC;AACnG,OAAO,EAGL,uBAAuB,GACxB,MAAM,qDAAqD,CAAC;AAE7D,MAAM,CAAC,MAAM,mBAAmB,GAAG,IAAI,CAAC,qBAAqB,EAAE;IAC7D,KAAK,EAAE,iBAAiB;IACxB,WAAW,EACT,qFAAqF;QACrF,gHAAgH;QAChH,kEAAkE;QAClE,wFAAwF;QACxF,iDAAiD;QACjD,mHAAmH;QACnH,uEAAuE;QACvE,8EAA8E;QAC9E,4GAA4G;IAC9G,WAAW,EAAE,EAAE,YAAY,EAAE,IAAI,EAAE,cAAc,EAAE,IAAI,EAAE,aAAa,EAAE,IAAI,EAAE;IAC9E,KAAK,EAAE,CAAC,CAAC,MAAM,CAAC;QACd,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,+DAA+D;YAC7D,iDAAiD,CACpD;QACH,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,4EAA4E;YAC1E,2EAA2E;YAC3E,iDAAiD,CACpD;QACH,QAAQ,EAAE,CAAC;aACR,MAAM,EAAE;aACR,KAAK,CAAC,iBAAiB,CAAC;aACxB,OAAO,CAAC,IAAI,CAAC;aACb,QAAQ,CACP,6EAA6E;YAC3E,4EAA4E,CAC/E;QACH,MAAM,EAAE,CAAC;aACN,IAAI,CAAC,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;aACrB,OAAO,CAAC,MAAM,CAAC;aACf,QAAQ,CACP,mGAAmG,CACpG;KACJ,CAAC;IACF,MAAM,EAAE,CAAC,CAAC,MAAM,CAAC;QACf,YAAY,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACnF,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,kBAAkB,CAAC;QAC5D,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,uFAAuF,CACxF;QACH,IAAI,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACtF,aAAa,EAAE,CAAC;aACb,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,mGAAmG,CACpG;QACH,kBAAkB,EAAE,CAAC;aAClB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,oIAAoI,CACrI;QACH,WAAW,EAAE,CAAC;aACX,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,2CAA2C,CAAC,CAAC;aACvE,QAAQ,EAAE;aACV,QAAQ,CAAC,iCAAiC,CAAC;QAC9C,gBAAgB,EAAE,CAAC;aAChB,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,sBAAsB,CA
AC,CAAC;aAClD,QAAQ,EAAE;aACV,QAAQ,CAAC,kCAAkC,CAAC;QAC/C,QAAQ,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACnF,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CAAC,oEAAoE,CAAC;QACjF,iBAAiB,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,sDAAsD,CAAC;QAC/F,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACvE,iBAAiB,EAAE,CAAC;aACjB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,0JAA0J,CAC3J;QACH,cAAc,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,kDAAkD,CAAC;KACxF,CAAC;IAEF,MAAM,EAAE;QACN;YACE,MAAM,EAAE,yBAAyB;YACjC,IAAI,EAAE,gBAAgB,CAAC,eAAe;YACtC,IAAI,EAAE,8DAA8D;YACpE,QAAQ,EAAE,iDAAiD;SAC5D;QACD;YACE,MAAM,EAAE,WAAW;YACnB,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,sFAAsF;YAC5F,QAAQ,EAAE,wEAAwE;SACnF;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,qFAAqF;YAC3F,QAAQ,EACN,gGAAgG;SACnG;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,kBAAkB;YACzC,IAAI,EAAE,wEAAwE;YAC9E,QAAQ,EACN,oFAAoF;SACvF;KACF;IAED,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG;QACtB,MAAM,SAAS,GAAG,sBAAsB,EAAE,CAAC;QAC3C,MAAM,UAAU,GAAG,uBAAuB,EAAE,CAAC;QAE7C,2EAA2E;QAC3E,0EAA0E;QAC1E,6EAA6E;QAC7E,qEAAqE;QACrE,4EAA4E;QAC5E,oEAAoE;QACpE,MAAM,UAAU,GAAG,KAAK,CAAC,YAAY,EAAE,IAAI,EAAE,CAAC;QAC9C,MAAM,QAAQ,GAAG,KAAK,CAAC,OAAO,EAAE,IAAI,EAAE,CAAC;QAEvC,IAAI,WAAmB,CAAC;QACxB,IAAI,QAAQ,IAAI,CAAC,UAAU,EAAE,CAAC;YAC5B,MAAM,OAAO,GAAG,MAAM,gBAAgB,CAAC,SAAS,EAAE,QAAQ,EAAE,GAAG,CAAC,CAAC;YACjE,MAAM,aAAa,GAAG,OAAO,IAAI,mBAAmB,CAAC,YAAY,CAAC,OAAO,EAAE,aAAa,CAAC,CAAC;YAC1F,IAAI,CAAC,aAAa,EAAE,CAAC;gBACnB,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,iCAAiC,QAAQ,EAAE,EAAE;oBACvE,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;iBAChC,CAAC,CAAC;YACL,CAAC;YACD,WAAW,GAAG,aAAa,CAAC;QAC9B,CAAC;aAAM,IAAI,UAAU,IAAI,CAAC,QAAQ,EAAE,CAAC;YACnC,WAAW,GAAG,UAAU,CAAC;QAC3B,CAAC;aAAM,CAAC;YACN,MAAM,GAAG,CAAC,IAAI,CACZ,yBAAyB,EACzB,UAAU;gBACR,CAAC,CAAC,wDAAwD;gBAC1D,CAAC,CAAC,yCAAyC,EAC7C,EAAE,GAAG,GAAG,CAAC,WAAW,CAAC,yBAAyB,CAAC,EAAE,CAClD,CAAC;QACJ,CAAC;QAED,MAAM,QAAQ,GAAG,CAAC,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,CAAC,WAAW,E
AAE,IAAI,IAAI,CAAmB,CAAC;QACjF,MAAM,MAAM,GAAG,KAAK,CAAC,MAAuB,CAAC;QAC7C,MAAM,eAAe,GAAG,mBAAmB,CAAC,WAAW,CAAC,CAAC;QAEzD,oCAAoC;QACpC,MAAM,UAAU,GAAG;;;gCAGS,eAAe;;;;;0CAKL,gBAAgB;;;;;;;WAO/C,CAAC;QAER,MAAM,YAAY,GAAG,MAAM,SAAS,CAAC,KAAK,CAAC,UAAU,EAAE,GAAG,CAAC,CAAC;QAC5D,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,yBAAyB,EAAE,EAAE,WAAW,EAAE,WAAW,EAAE,YAAY,CAAC,MAAM,EAAE,CAAC,CAAC;QAE3F,IAAI,YAAY,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC9B,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,mCAAmC,WAAW,EAAE,EAAE;gBAC5E,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;aAChC,CAAC,CAAC;QACL,CAAC;QAED,kDAAkD;QAClD,MAAM,KAAK,GAAG,YAAY,CAAC,CAAC,CAAC,CAAC;QAC9B,MAAM,UAAU,GAAG,IAAI,GAAG,EAAU,CAAC;QACrC,MAAM,eAAe,GAAG,IAAI,GAAG,EAAU,CAAC;QAC1C,KAAK,MAAM,CAAC,IAAI,YAAY,EAAE,CAAC;YAC7B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,YAAY,CAAC,CAAC;YAC7D,IAAI,EAAE;gBAAE,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;YAC3B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,SAAS,CAAC,CAAC;YAC1D,IAAI,EAAE;gBAAE,eAAe,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;QAClC,CAAC;QAED,MAAM,OAAO,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAChE,MAAM,cAAc,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,aAAa,CAAC,IAAI,WAAW,CAAC;QAC7F,MAAM,YAAY,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QACrE,MAAM,IAAI,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAC7D,MAAM,KAAK,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAC/D,MAAM,UAAU,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,SAAS,CAAC,CAAC;QACtE,MAAM,OAAO,GAAG,UAAU,KAAK,SAAS,CAAC,CAAC,CAAC,UAAU,KAAK,MAAM,CAAC,CAAC,CAAC,SAAS,CAAC;QAC7E,MAAM,SAAS,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,QAAQ,CAAC,CAAC;QAEpE,yDAAyD;QACzD,MAAM,aAAa,GAAG,MAAM,UAAU,CAAC,YAAY,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;QAExF,MAAM,MAAM,GAeR;YACF,YAAY,EAAE,cAAc;YAC5B,iBAAiB,EAAE,aAAa,CAAC,gBAAgB;YACjD,QAAQ,EAAE,aAAa,CAAC,QAAQ;YAChC,cAAc,EAAE,MAAM;SACvB,CAAC;QAEF,IAAI,OAAO;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QACvC,IAAI,KAAK;YAAE,MAAM,CAAC,KAAK,GAAG,KAAK,CAAC;QAChC,IAAI,IAAI;YAAE,MAAM,CAAC,IAAI,GAAG,IAAI,CAAC;QAC7B,IAAI,YAAY;YAAE,MAAM,CAAC,aAAa,GAAG,
wBAAwB,CAAC,YAAY,CAAC,CAAC;QAChF,IAAI,SAAS;YAAE,MAAM,CAAC,kBAAkB,GAAG,yBAAyB,CAAC,SAAS,CAAC,CAAC;QAChF,IAAI,UAAU,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,WAAW,GAAG,CAAC,GAAG,UAAU,CAAC,CAAC;QAC9D,IAAI,eAAe,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,gBAAgB,GAAG,CAAC,GAAG,eAAe,CAAC,CAAC;QAC7E,IAAI,OAAO,OAAO,KAAK,SAAS;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QAC5D,IAAI,aAAa,CAAC,gBAAgB,IAAI,aAAa,CAAC,OAAO,EAAE,CAAC;YAC5D,MAAM,CAAC,OAAO,GAAG,aAAa,CAAC,OAAO,CAAC;QACzC,CAAC;QACD,IAAI,aAAa,CAAC,gBAAgB,EAAE,CAAC;YACnC,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;QAC5D,CAAC;QAED,OAAO,MAAM,CAAC;IAChB,CAAC;IAED,MAAM,EAAE,CAAC,MAAM,EAAE,EAAE;QACjB,MAAM,KAAK,GAAa;YACtB,MAAM,MAAM,CAAC,YAAY,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,EAAE,IAAI;SACzE,CAAC;QACF,IAAI,MAAM,CAAC,IAAI;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,IAAI,MAAM,CAAC,aAAa;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,aAAa,EAAE,CAAC,CAAC;QAC1E,IAAI,MAAM,CAAC,kBAAkB;YAAE,KAAK,CAAC,IAAI,CAAC,eAAe,MAAM,CAAC,kBAAkB,EAAE,CAAC,CAAC;QACtF,IAAI,OAAO,MAAM,CAAC,QAAQ,KAAK,SAAS;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACzF,IAAI,MAAM,CAAC,QAAQ;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACpE,IAAI,MAAM,CAAC,WAAW,IAAI,MAAM,CAAC,WAAW,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACxD,KAAK,CAAC,IAAI,CAAC,oBAAoB,MAAM,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAClE,CAAC;QACD,IAAI,MAAM,CAAC,gBAAgB,IAAI,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClE,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,gBAAgB,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,QAAQ,CAAC,CAAC,CAAC,EAAE,EAAE,CACvK,CAAC;QACJ,CAAC;QACD,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,kBAAkB,MAAM,CAAC,cAAc,EAAE,CAAC,CAAC;QACtF,IAAI,MAAM,CAAC,iBAAiB;YAAE,KAAK,CAAC,IAAI,CAAC,UAAU,MAAM,CAAC,iBAAiB,GAAG,CAAC,CAAC;QAChF,KAAK,CAAC,IAAI,CAAC,0BAA0B,MAAM,CAAC,iBAAiB,EAAE,CAAC,CAAC;QACjE,IAAI,MAAM,CAAC,iBAAiB,IAAI,MAA
M,CAAC,OAAO,EAAE,CAAC;YAC/C,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YAClB,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,kDAAkD;YAClD,MAAM,MAAM,GAAG,IAAI,CAAC;YACpB,IAAI,MAAM,CAAC,OAAO,CAAC,MAAM,GAAG,MAAM,EAAE,CAAC;gBACnC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC,EAAE,MAAM,CAAC,CAAC,CAAC;gBAC5C,KAAK,CAAC,IAAI,CACR,2BAA2B,MAAM,CAAC,OAAO,CAAC,MAAM,yDAAyD,CAC1G,CAAC;YACJ,CAAC;iBAAM,CAAC;gBACN,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;YAC7B,CAAC;QACH,CAAC;aAAM,IAAI,CAAC,MAAM,CAAC,iBAAiB,EAAE,CAAC;YACrC,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,8EAA8E,CAAC,CAAC;QAC7F,CAAC;QACD,OAAO,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,IAAI,EAAE,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IACpD,CAAC;CACF,CAAC,CAAC"}
1
+ {"version":3,"file":"eurlex-get-document.tool.js","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/eurlex-get-document.tool.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,OAAO,EAAE,IAAI,EAAE,CAAC,EAAE,MAAM,wBAAwB,CAAC;AACjD,OAAO,EAAE,gBAAgB,EAAE,MAAM,+BAA+B,CAAC;AACjE,OAAO,EACL,gBAAgB,EAChB,yBAAyB,EACzB,wBAAwB,GACzB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EACL,mBAAmB,EACnB,sBAAsB,GACvB,MAAM,mDAAmD,CAAC;AAC3D,OAAO,EAAE,mBAAmB,EAAE,gBAAgB,EAAE,MAAM,4CAA4C,CAAC;AACnG,OAAO,EAGL,uBAAuB,GACxB,MAAM,qDAAqD,CAAC;AAE7D;;;;GAIG;AACH,MAAM,qBAAqB,GAAG,MAAM,CAAC;AAErC,wGAAwG;AACxG,MAAM,iBAAiB,GAAG,OAAO,CAAC;AAElC,MAAM,CAAC,MAAM,mBAAmB,GAAG,IAAI,CAAC,qBAAqB,EAAE;IAC7D,KAAK,EAAE,iBAAiB;IACxB,WAAW,EACT,qFAAqF;QACrF,gHAAgH;QAChH,oFAAoF;QACpF,wFAAwF;QACxF,iDAAiD;QACjD,mHAAmH;QACnH,uEAAuE;QACvE,8EAA8E;QAC9E,iHAAiH;QACjH,qIAAqI;QACrI,8GAA8G;QAC9G,yHAAyH;QACzH,kJAAkJ;IACpJ,WAAW,EAAE,EAAE,YAAY,EAAE,IAAI,EAAE,cAAc,EAAE,IAAI,EAAE,aAAa,EAAE,IAAI,EAAE;IAC9E,KAAK,EAAE,CAAC,CAAC,MAAM,CAAC;QACd,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,+DAA+D;YAC7D,iDAAiD,CACpD;QACH,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,4EAA4E;YAC1E,2EAA2E;YAC3E,iDAAiD,CACpD;QACH,QAAQ,EAAE,CAAC;aACR,MAAM,EAAE;aACR,KAAK,CAAC,iBAAiB,CAAC;aACxB,OAAO,CAAC,IAAI,CAAC;aACb,QAAQ,CACP,6EAA6E;YAC3E,4EAA4E,CAC/E;QACH,MAAM,EAAE,CAAC;aACN,IAAI,CAAC,CAAC,MAAM,EAAE,KAAK,EAAE,UAAU,CAAC,CAAC;aACjC,OAAO,CAAC,MAAM,CAAC;aACf,QAAQ,CACP,0EAA0E;YACxE,mEAAmE;YACnE,+EAA+E;YAC/E,0CAA0C,CAC7C;QACH,YAAY,EAAE,CAAC;aACZ,IAAI,CAAC,CAAC,eAAe,EAAE,OAAO,EAAE,MAAM,CAAC,CAAC;aACxC,OAAO,CAAC,OAAO,CAAC;aAChB,QAAQ,CACP,oHAAoH;YAClH,iFAAiF;YACjF,oHAAoH,CACvH;QACH,MAAM,EAAE,CAAC;aACN,MAAM,EAAE;aACR,GAAG,EAAE;aACL,GAAG,CAAC,CAAC,CAAC;aACN,OAAO,CAAC,CAAC,CAAC;aACV,QAAQ,CACP,qGAAqG;YACnG,kGAAkG,CACrG;QACH,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,GAAG,EAAE;aACL,GAAG,CAAC,CAAC,CAAC;aACN,GAAG,CAAC,iBAAiB,CAAC;aACtB,OAAO,CAAC,qBAAqB,CAAC;aAC9B,QAAQ,CACP,4FAA4F,qBAAqB,SAAS,iBAAiB,IAAI;YAC7I,wFAAwF,CAC3F;KACJ,CAAC;IACF,MAAM,EAAE,CAA
C,CAAC,MAAM,CAAC;QACf,YAAY,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACnF,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,kBAAkB,CAAC;QAC5D,KAAK,EAAE,CAAC;aACL,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,uFAAuF,CACxF;QACH,IAAI,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,gDAAgD,CAAC;QACtF,aAAa,EAAE,CAAC;aACb,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,mGAAmG,CACpG;QACH,kBAAkB,EAAE,CAAC;aAClB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,oIAAoI,CACrI;QACH,WAAW,EAAE,CAAC;aACX,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,2CAA2C,CAAC,CAAC;aACvE,QAAQ,EAAE;aACV,QAAQ,CAAC,iCAAiC,CAAC;QAC9C,gBAAgB,EAAE,CAAC;aAChB,KAAK,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,sBAAsB,CAAC,CAAC;aAClD,QAAQ,EAAE;aACV,QAAQ,CAAC,kCAAkC,CAAC;QAC/C,QAAQ,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACnF,OAAO,EAAE,CAAC;aACP,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,2GAA2G;YACzG,4FAA4F;YAC5F,kHAAkH,CACrH;QACH,YAAY,EAAE,CAAC;aACZ,MAAM,EAAE;aACR,QAAQ,CAAC,6EAA6E,CAAC;QAC1F,iBAAiB,EAAE,CAAC;aACjB,OAAO,EAAE;aACT,QAAQ,CACP,sGAAsG;YACpG,8EAA8E,CACjF;QACH,cAAc,EAAE,CAAC;aACd,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,2GAA2G,CAC5G;QACH,sBAAsB,EAAE,CAAC;aACtB,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,6HAA6H,CAC9H;QACH,mBAAmB,EAAE,CAAC;aACnB,MAAM,EAAE;aACR,GAAG,EAAE;aACL,QAAQ,EAAE;aACV,QAAQ,CACP,oGAAoG;YAClG,yDAAyD,CAC5D;QACH,QAAQ,EAAE,CAAC;aACR,OAAO,EAAE;aACT,QAAQ,CACP,gIAAgI;YAC9H,sGAAsG,CACzG;QACH,QAAQ,EAAE,CAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,wCAAwC,CAAC;QACvE,iBAAiB,EAAE,CAAC;aACjB,MAAM,EAAE;aACR,QAAQ,EAAE;aACV,QAAQ,CACP,0JAA0J,CAC3J;QACH,cAAc,EAAE,CAAC;aACd,MAAM,EAAE;aACR,QAAQ,CAAC,+DAA+D,CAAC;KAC7E,CAAC;IAEF,MAAM,EAAE;QACN;YACE,MAAM,EAAE,yBAAyB;YACjC,IAAI,EAAE,gBAAgB,CAAC,eAAe;YACtC,IAAI,EAAE,8DAA8D;YACpE,QAAQ,EAAE,iDAAiD;SAC5D;QACD;YACE,MAAM,EAAE,WAAW;YACnB,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,sFAAsF;YAC5F,QAAQ,EAAE,wEAAwE;SACnF;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,QAAQ;YAC/B,IAAI,EAAE,qFAAqF;YAC3F,QAAQ,EACN,gGAAgG;SACn
G;QACD;YACE,MAAM,EAAE,sBAAsB;YAC9B,IAAI,EAAE,gBAAgB,CAAC,kBAAkB;YACzC,IAAI,EAAE,wEAAwE;YAC9E,QAAQ,EACN,oFAAoF;SACvF;KACF;IAED,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG;QACtB,MAAM,SAAS,GAAG,sBAAsB,EAAE,CAAC;QAC3C,MAAM,UAAU,GAAG,uBAAuB,EAAE,CAAC;QAE7C,2EAA2E;QAC3E,0EAA0E;QAC1E,6EAA6E;QAC7E,qEAAqE;QACrE,4EAA4E;QAC5E,oEAAoE;QACpE,MAAM,UAAU,GAAG,KAAK,CAAC,YAAY,EAAE,IAAI,EAAE,CAAC;QAC9C,MAAM,QAAQ,GAAG,KAAK,CAAC,OAAO,EAAE,IAAI,EAAE,CAAC;QAEvC,IAAI,WAAmB,CAAC;QACxB,IAAI,QAAQ,IAAI,CAAC,UAAU,EAAE,CAAC;YAC5B,MAAM,OAAO,GAAG,MAAM,gBAAgB,CAAC,SAAS,EAAE,QAAQ,EAAE,GAAG,CAAC,CAAC;YACjE,MAAM,aAAa,GAAG,OAAO,IAAI,mBAAmB,CAAC,YAAY,CAAC,OAAO,EAAE,aAAa,CAAC,CAAC;YAC1F,IAAI,CAAC,aAAa,EAAE,CAAC;gBACnB,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,iCAAiC,QAAQ,EAAE,EAAE;oBACvE,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;iBAChC,CAAC,CAAC;YACL,CAAC;YACD,WAAW,GAAG,aAAa,CAAC;QAC9B,CAAC;aAAM,IAAI,UAAU,IAAI,CAAC,QAAQ,EAAE,CAAC;YACnC,WAAW,GAAG,UAAU,CAAC;QAC3B,CAAC;aAAM,CAAC;YACN,MAAM,GAAG,CAAC,IAAI,CACZ,yBAAyB,EACzB,UAAU;gBACR,CAAC,CAAC,wDAAwD;gBAC1D,CAAC,CAAC,yCAAyC,EAC7C,EAAE,GAAG,GAAG,CAAC,WAAW,CAAC,yBAAyB,CAAC,EAAE,CAClD,CAAC;QACJ,CAAC;QAED,MAAM,QAAQ,GAAG,CAAC,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,CAAC,WAAW,EAAE,IAAI,IAAI,CAAmB,CAAC;QACjF,MAAM,MAAM,GAAG,KAAK,CAAC,MAAuB,CAAC;QAC7C,MAAM,eAAe,GAAG,mBAAmB,CAAC,WAAW,CAAC,CAAC;QAEzD,oCAAoC;QACpC,MAAM,UAAU,GAAG;;;gCAGS,eAAe;;;;;0CAKL,gBAAgB;;;;;;;WAO/C,CAAC;QAER,MAAM,YAAY,GAAG,MAAM,SAAS,CAAC,KAAK,CAAC,UAAU,EAAE,GAAG,CAAC,CAAC;QAC5D,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,yBAAyB,EAAE,EAAE,WAAW,EAAE,WAAW,EAAE,YAAY,CAAC,MAAM,EAAE,CAAC,CAAC;QAE3F,IAAI,YAAY,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC9B,MAAM,GAAG,CAAC,IAAI,CAAC,WAAW,EAAE,mCAAmC,WAAW,EAAE,EAAE;gBAC5E,GAAG,GAAG,CAAC,WAAW,CAAC,WAAW,CAAC;aAChC,CAAC,CAAC;QACL,CAAC;QAED,kDAAkD;QAClD,MAAM,KAAK,GAAG,YAAY,CAAC,CAAC,CAAC,CAAC;QAC9B,MAAM,UAAU,GAAG,IAAI,GAAG,EAAU,CAAC;QACrC,MAAM,eAAe,GAAG,IAAI,GAAG,EAAU,CAAC;QAC1C,KAAK,MAAM,CAAC,IAAI,YAAY,EAAE,CAAC;YAC7B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,YAAY,CAAC,CAAC;YAC7D,IAAI,EAAE;gBAAE,UAAU,CAAC,GAAG,CAAC,
EAAE,CAAC,CAAC;YAC3B,MAAM,EAAE,GAAG,mBAAmB,CAAC,YAAY,CAAC,CAAC,EAAE,SAAS,CAAC,CAAC;YAC1D,IAAI,EAAE;gBAAE,eAAe,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;QAClC,CAAC;QAED,MAAM,OAAO,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAChE,MAAM,cAAc,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,aAAa,CAAC,IAAI,WAAW,CAAC;QAC7F,MAAM,YAAY,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QACrE,MAAM,IAAI,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,MAAM,CAAC,CAAC;QAC7D,MAAM,KAAK,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAC/D,MAAM,UAAU,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,SAAS,CAAC,CAAC;QACtE,MAAM,OAAO,GAAG,UAAU,KAAK,SAAS,CAAC,CAAC,CAAC,UAAU,KAAK,MAAM,CAAC,CAAC,CAAC,SAAS,CAAC;QAC7E,MAAM,SAAS,GAAG,mBAAmB,CAAC,YAAY,CAAC,KAAK,EAAE,QAAQ,CAAC,CAAC;QAEpE,4EAA4E;QAC5E,yEAAyE;QACzE,+DAA+D;QAC/D,2EAA2E;QAC3E,iEAAiE;QACjE,8EAA8E;QAC9E,MAAM,MAAM,GAoBR;YACF,YAAY,EAAE,cAAc;YAC5B,YAAY,EAAE,KAAK,CAAC,YAAY;YAChC,iBAAiB,EAAE,KAAK;YACxB,QAAQ,EAAE,KAAK;YACf,QAAQ;YACR,cAAc,EAAE,MAAM;SACvB,CAAC;QAEF,IAAI,OAAO;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QACvC,IAAI,KAAK;YAAE,MAAM,CAAC,KAAK,GAAG,KAAK,CAAC;QAChC,IAAI,IAAI;YAAE,MAAM,CAAC,IAAI,GAAG,IAAI,CAAC;QAC7B,IAAI,YAAY;YAAE,MAAM,CAAC,aAAa,GAAG,wBAAwB,CAAC,YAAY,CAAC,CAAC;QAChF,IAAI,SAAS;YAAE,MAAM,CAAC,kBAAkB,GAAG,yBAAyB,CAAC,SAAS,CAAC,CAAC;QAChF,IAAI,UAAU,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,WAAW,GAAG,CAAC,GAAG,UAAU,CAAC,CAAC;QAC9D,IAAI,eAAe,CAAC,IAAI,GAAG,CAAC;YAAE,MAAM,CAAC,gBAAgB,GAAG,CAAC,GAAG,eAAe,CAAC,CAAC;QAC7E,IAAI,OAAO,OAAO,KAAK,SAAS;YAAE,MAAM,CAAC,QAAQ,GAAG,OAAO,CAAC;QAE5D,IAAI,KAAK,CAAC,YAAY,KAAK,eAAe,EAAE,CAAC;YAC3C,MAAM,aAAa,GAAG,MAAM,UAAU,CAAC,YAAY,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;YACxF,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;YAC1D,MAAM,CAAC,QAAQ,GAAG,aAAa,CAAC,QAAQ,CAAC;YACzC,IAAI,aAAa,CAAC,gBAAgB,EAAE,CAAC;gBACnC,MAAM,CAAC,iBAAiB,GAAG,aAAa,CAAC,gBAAgB,CAAC;YAC5D,CAAC;YAED,IAAI,aAAa,CAAC,gBAAgB,IAAI,aAAa,CAAC,OAAO,EAAE,CAAC;gBAC5D,MAAM,IAAI,GAAG,aAAa,CAAC,OAAO,CAAC;gBACnC,MAAM,KAAK,GAAG,IAAI,CAAC,MAAM,CAAC;gBAC1B,MAAM,CAAC,mBAAmB,GAAG,KAAK
,CAAC;gBAEnC,IAAI,KAAK,CAAC,YAAY,KAAK,MAAM,EAAE,CAAC;oBAClC,MAAM,CAAC,OAAO,GAAG,IAAI,CAAC;oBACtB,MAAM,CAAC,cAAc,GAAG,CAAC,CAAC;oBAC1B,MAAM,CAAC,sBAAsB,GAAG,KAAK,CAAC;oBACtC,MAAM,CAAC,QAAQ,GAAG,KAAK,CAAC;gBAC1B,CAAC;qBAAM,CAAC;oBACN,sEAAsE;oBACtE,oEAAoE;oBACpE,yCAAyC;oBACzC,MAAM,MAAM,GAAG,IAAI,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;oBAC7C,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE,MAAM,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC;oBAC5D,MAAM,CAAC,cAAc,GAAG,MAAM,CAAC;oBAC/B,MAAM,CAAC,sBAAsB,GAAG,UAAU,CAAC,MAAM,CAAC;oBAClD,MAAM,CAAC,QAAQ,GAAG,MAAM,GAAG,UAAU,CAAC,MAAM,GAAG,KAAK,CAAC;oBACrD,IAAI,UAAU,CAAC,MAAM,GAAG,CAAC;wBAAE,MAAM,CAAC,OAAO,GAAG,UAAU,CAAC;gBACzD,CAAC;YACH,CAAC;QACH,CAAC;QAED,OAAO,MAAM,CAAC;IAChB,CAAC;IAED,MAAM,EAAE,CAAC,MAAM,EAAE,EAAE;QACjB,MAAM,KAAK,GAAa;YACtB,MAAM,MAAM,CAAC,YAAY,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,EAAE,IAAI;SACzE,CAAC;QACF,IAAI,MAAM,CAAC,IAAI;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,IAAI,MAAM,CAAC,aAAa;YAAE,KAAK,CAAC,IAAI,CAAC,aAAa,MAAM,CAAC,aAAa,EAAE,CAAC,CAAC;QAC1E,IAAI,MAAM,CAAC,kBAAkB;YAAE,KAAK,CAAC,IAAI,CAAC,eAAe,MAAM,CAAC,kBAAkB,EAAE,CAAC,CAAC;QACtF,IAAI,OAAO,MAAM,CAAC,QAAQ,KAAK,SAAS;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACzF,IAAI,MAAM,CAAC,QAAQ;YAAE,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,EAAE,CAAC,CAAC;QACpE,IAAI,MAAM,CAAC,WAAW,IAAI,MAAM,CAAC,WAAW,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACxD,KAAK,CAAC,IAAI,CAAC,oBAAoB,MAAM,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAClE,CAAC;QACD,IAAI,MAAM,CAAC,gBAAgB,IAAI,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClE,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,gBAAgB,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,MAAM,MAAM,CAAC,gBAAgB,CAAC,MAAM,GAAG,CAAC,QAAQ,CAAC,CAAC,CAAC,EAAE,EAAE,CACvK,CAAC;QACJ,CAAC;QACD,KAAK,CAAC,IAAI,CAAC,iBAAiB,MAAM,CAAC,QAAQ,kBAAkB,MAAM,CAAC,cAAc,EAAE,CAAC,CAAC;QACtF,IAAI,MAAM,CAAC,iBAAiB;YAAE,KAAK,CAAC,IAAI,CAAC
,UAAU,MAAM,CAAC,iBAAiB,GAAG,CAAC,CAAC;QAEhF,2EAA2E;QAC3E,4EAA4E;QAC5E,IAAI,MAAM,CAAC,YAAY,KAAK,eAAe,EAAE,CAAC;YAC5C,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CACR,6GAA6G,CAC9G,CAAC;QACJ,CAAC;aAAM,IAAI,MAAM,CAAC,iBAAiB,EAAE,CAAC;YACpC,MAAM,KAAK,GAAG,MAAM,CAAC,mBAAmB,IAAI,MAAM,CAAC,OAAO,EAAE,MAAM,IAAI,CAAC,CAAC;YACxE,IAAI,MAAM,CAAC,OAAO,EAAE,CAAC;gBACnB,MAAM,KAAK,GAAG,MAAM,CAAC,cAAc,IAAI,CAAC,CAAC;gBACzC,MAAM,QAAQ,GAAG,MAAM,CAAC,sBAAsB,IAAI,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC;gBACxE,MAAM,GAAG,GAAG,KAAK,GAAG,QAAQ,CAAC;gBAC7B,IAAI,MAAM,CAAC,YAAY,KAAK,MAAM,EAAE,CAAC;oBACnC,KAAK,CAAC,IAAI,CAAC,mCAAmC,QAAQ,OAAO,KAAK,cAAc,CAAC,CAAC;gBACpF,CAAC;qBAAM,CAAC;oBACN,KAAK,CAAC,IAAI,CACR,gBAAgB,MAAM,CAAC,YAAY,iBAAiB,KAAK,IAAI,GAAG,OAAO,KAAK,KAAK,QAAQ,aAAa;wBACpG,CAAC,MAAM,CAAC,QAAQ;4BACd,CAAC,CAAC,8CAA8C,GAAG,8CAA8C;4BACjG,CAAC,CAAC,mBAAmB,CAAC,CAC3B,CAAC;gBACJ,CAAC;gBACD,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;gBAClB,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;YAC7B,CAAC;iBAAM,CAAC;gBACN,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACf,KAAK,CAAC,IAAI,CACR,yBAAyB,MAAM,CAAC,cAAc,IAAI,CAAC,0BAA0B,KAAK,yCAAyC,CAC5H,CAAC;YACJ,CAAC;QACH,CAAC;aAAM,CAAC;YACN,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;YACf,KAAK,CAAC,IAAI,CAAC,8EAA8E,CAAC,CAAC;QAC7F,CAAC;QACD,OAAO,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,IAAI,EAAE,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IACpD,CAAC;CACF,CAAC,CAAC"}
@@ -71,7 +71,15 @@ export declare const allToolDefinitions: (import("@cyanheads/mcp-ts-core").ToolD
71
71
  format: import("zod").ZodDefault<import("zod").ZodEnum<{
72
72
  html: "html";
73
73
  xml: "xml";
74
+ markdown: "markdown";
74
75
  }>>;
76
+ content_mode: import("zod").ZodDefault<import("zod").ZodEnum<{
77
+ full: "full";
78
+ metadata_only: "metadata_only";
79
+ paged: "paged";
80
+ }>>;
81
+ offset: import("zod").ZodDefault<import("zod").ZodNumber>;
82
+ limit: import("zod").ZodDefault<import("zod").ZodNumber>;
75
83
  }, import("zod/v4/core").$strip>, import("zod").ZodObject<{
76
84
  celex_number: import("zod").ZodString;
77
85
  work_uri: import("zod").ZodOptional<import("zod").ZodString>;
@@ -83,7 +91,12 @@ export declare const allToolDefinitions: (import("@cyanheads/mcp-ts-core").ToolD
83
91
  eurovoc_subjects: import("zod").ZodOptional<import("zod").ZodArray<import("zod").ZodString>>;
84
92
  in_force: import("zod").ZodOptional<import("zod").ZodBoolean>;
85
93
  content: import("zod").ZodOptional<import("zod").ZodString>;
94
+ content_mode: import("zod").ZodString;
86
95
  content_available: import("zod").ZodBoolean;
96
+ content_offset: import("zod").ZodOptional<import("zod").ZodNumber>;
97
+ content_chars_returned: import("zod").ZodOptional<import("zod").ZodNumber>;
98
+ content_chars_total: import("zod").ZodOptional<import("zod").ZodNumber>;
99
+ has_more: import("zod").ZodBoolean;
87
100
  language: import("zod").ZodString;
88
101
  language_fallback: import("zod").ZodOptional<import("zod").ZodString>;
89
102
  content_format: import("zod").ZodString;
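The new `offset` and `limit` inputs, together with the `content_offset`, `content_chars_returned`, `content_chars_total` and `has_more` outputs, point to character-based windowing over the fetched body. The following is a minimal sketch of that idea, not the package's implementation: `pageContent` and `PageWindow` are hypothetical names, and the clamping behaviour is an assumption.

```typescript
/** Hypothetical shape mirroring the paging fields added to the output schema. */
interface PageWindow {
  content?: string;
  content_offset: number;
  content_chars_returned: number;
  content_chars_total: number;
  has_more: boolean;
}

/**
 * Slice `body` into a character window.
 * Assumption: offsets and limits count UTF-16 code units, as String.prototype.slice does.
 */
function pageContent(body: string, offset: number, limit: number): PageWindow {
  const total = body.length;
  // Clamp so an offset past the end yields an empty window instead of an error.
  const start = Math.min(Math.max(offset, 0), total);
  const chunk = body.slice(start, start + limit);
  const page: PageWindow = {
    content_offset: start,
    content_chars_returned: chunk.length,
    content_chars_total: total,
    has_more: start + chunk.length < total,
  };
  // Leave `content` unset when the window is empty, matching its optional type.
  if (chunk.length > 0) page.content = chunk;
  return page;
}
```

A caller could keep requesting `offset = content_offset + content_chars_returned` until `has_more` is false.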
@@ -1 +1 @@
1
- {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAUH,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;iBAQ9B,CAAC"}
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/mcp-server/tools/definitions/index.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAUH,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;iBAQ9B,CAAC"}
@@ -1,14 +1,37 @@
1
1
  /**
2
- * @fileoverview EurLexContentService — HTTP client for the EUR-Lex REST content API.
3
- * Fetches full HTML or XML text of EU legal acts via the legal-content URL pattern.
4
- * Document content is NOT available via CELLAR work URI content negotiation (returns 400).
2
+ * @fileoverview EurLexContentService — HTTP client for EU act full-text content.
3
+ *
4
+ * Sources content from the EU Publications Office CELLAR content-negotiation
5
+ * resolver (`publications.europa.eu/resource/celex/{CELEX}`) — the same host the
6
+ * metadata SPARQL pipeline already queries — rather than the legacy
7
+ * `eur-lex.europa.eu` legal-content endpoint, which is now fronted by an AWS WAF
8
+ * that returns a JavaScript bot-challenge stub instead of the act text (issue #16).
9
+ *
10
+ * Content negotiation:
11
+ * - `Accept`: HTML acts vary by document family — OJ legislation exposes
12
+ * `application/xhtml+xml`, CJEU judgments expose `text/html`, so the HTML path
13
+ * tries both. The XML path requests Formex 4 (`application/xml;type=fmx4`),
14
+ * which CELLAR serves directly for single-part acts and returns HTTP 300
15
+ * (multiple manifestation streams) for multi-part OJ acts — treated as
16
+ * unavailable rather than reconstructed.
17
+ * - `Accept-Language`: CELLAR requires an ISO 639-2/T (three-letter) code and
18
+ * 400s on a missing one or on a bibliographic 639-2/B code (`ger`, `fre`);
19
+ * EUR-Lex two-letter codes are mapped before the request.
20
+ *
21
+ * Defense in depth: any response carrying an AWS WAF challenge signature is
22
+ * refused (never surfaced as content) and raised as a ServiceUnavailable error,
23
+ * so a challenge stub can never again be reported as `contentAvailable: true`.
5
24
  * @module services/eurlex-content/eurlex-content-service
6
25
  */
7
26
  import type { Context } from '@cyanheads/mcp-ts-core';
8
27
  import type { AppConfig } from '@cyanheads/mcp-ts-core/config';
9
28
  import type { StorageService } from '@cyanheads/mcp-ts-core/storage';
10
29
  import type { ServerConfig } from '../../config/server-config.js';
11
- export type ContentFormat = 'html' | 'xml';
30
+ /**
31
+ * Output formats a caller can request. `markdown` is not served by EUR-Lex — it is
32
+ * rendered server-side from the HTML body (see {@link WireFormat}).
33
+ */
34
+ export type ContentFormat = 'html' | 'xml' | 'markdown';
12
35
  /** Language codes supported by EUR-Lex (24 official EU languages). */
13
36
  export type EurLexLanguage = 'EN' | 'FR' | 'DE' | 'ES' | 'IT' | 'PL' | 'PT' | 'NL' | 'CS' | 'DA' | 'EL' | 'ET' | 'FI' | 'HU' | 'LT' | 'LV' | 'MT' | 'RO' | 'SK' | 'SL' | 'SV' | 'BG' | 'HR' | 'GA';
14
37
  export interface FetchContentResult {
@@ -24,21 +47,35 @@ export declare class EurLexContentService {
24
47
  private readonly timeoutMs;
25
48
  constructor(_config: AppConfig, _storage: StorageService, serverConfig: ServerConfig);
26
49
  /**
27
- * Build the EUR-Lex legal-content URL for a CELEX number.
28
- * Pattern: /legal-content/{LANG}/TXT/{FORMAT}/?uri=CELEX:{celex}
50
+ * Build the CELLAR content-negotiation URL for a CELEX number.
51
+ * Pattern: /resource/celex/{CELEX} (format + language come from request headers).
29
52
  */
30
- buildContentUrl(celexNumber: string, language: EurLexLanguage, format: ContentFormat): string;
53
+ buildContentUrl(celexNumber: string): string;
31
54
  /**
32
55
  * Fetch the full text content of an EU act by CELEX number.
33
56
  * If the requested language is unavailable, falls back to English.
34
57
  * Returns `contentAvailable: false` with an empty string if both attempts fail.
58
+ *
59
+ * Throws ServiceUnavailable if the content host returns an AWS WAF bot-challenge
60
+ * stub — a challenge is never reported as available content.
35
61
  */
36
62
  fetchContent(celexNumber: string, language: EurLexLanguage, format: ContentFormat, ctx: Context): Promise<FetchContentResult>;
37
63
  /**
38
- * Attempt to fetch content for a specific language. Returns null on non-200 or
39
- * empty/redirect response rather than throwing; callers handle fallback logic.
64
+ * Resolve content for one language by trying each `Accept` variant for the
65
+ * format. Returns the first non-empty body, or null when none of the variants
66
+ * yield content (so the caller can fall back to English). Throws when a variant
67
+ * returns a bot-challenge stub.
68
+ */
69
+ private fetchForLanguage;
70
+ /**
71
+ * Single content-negotiation request for one `Accept`/`Accept-Language` pair.
72
+ * Non-2xx responses (404 = no datastream of that type, 300 = multi-part Formex,
73
+ * 4xx/5xx) and network failures resolve to `none` so callers can try the next
74
+ * variant or language. A WAF challenge body resolves to `challenge`. The inner
75
+ * function only throws on a `fetch` rejection, so `withRetry` retries transient
76
+ * network errors but never a 404 or a challenge.
40
77
  */
41
- private tryFetch;
78
+ private fetchVariant;
42
79
  }
43
80
  export declare function initEurLexContentService(config: AppConfig, storage: StorageService, serverConfig: ServerConfig): void;
44
81
  export declare function getEurLexContentService(): EurLexContentService;
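The revised `buildContentUrl` signature drops `language` and `format` because, with CELLAR content negotiation, both travel in request headers rather than in the URL. A hedged sketch of what such a request might look like follows. `buildNegotiatedRequest`, `NegotiatedRequest` and the `https://` base URL are assumptions made for illustration, and the language table is only a small excerpt.

```typescript
/** Assumed CELLAR host, based on the resolver pattern named in the file overview. */
const CELLAR_BASE_URL = 'https://publications.europa.eu';

/** Excerpt of a two-letter to ISO 639-2/T (terminological) mapping. */
const ISO_639_2T: Record<string, string> = { EN: 'eng', FR: 'fra', DE: 'deu', NL: 'nld' };

interface NegotiatedRequest {
  url: string;
  headers: Record<string, string>;
}

/**
 * Hypothetical helper: the URL carries only the CELEX number, while the
 * format and language are expressed through Accept and Accept-Language.
 */
function buildNegotiatedRequest(
  celexNumber: string,
  language: string,
  accept: string,
): NegotiatedRequest {
  const iso = ISO_639_2T[language];
  if (iso === undefined) {
    throw new Error(`No ISO 639-2/T code known for ${language}`);
  }
  return {
    url: `${CELLAR_BASE_URL}/resource/celex/${encodeURIComponent(celexNumber)}`,
    headers: { Accept: accept, 'Accept-Language': iso },
  };
}
```

For HTML, a caller would presumably issue this once per `Accept` variant (`application/xhtml+xml`, then `text/html`) and keep the first non-empty body.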
@@ -1 +1 @@
1
- {"version":3,"file":"eurlex-content-service.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,wBAAwB,CAAC;AACtD,OAAO,KAAK,EAAE,SAAS,EAAE,MAAM,+BAA+B,CAAC;AAE/D,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,gCAAgC,CAAC;AAErE,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,2BAA2B,CAAC;AAE9D,MAAM,MAAM,aAAa,GAAG,MAAM,GAAG,KAAK,CAAC;AAE3C,sEAAsE;AACtE,MAAM,MAAM,cAAc,GACtB,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,CAAC;AAET,MAAM,WAAW,kBAAkB;IACjC,OAAO,EAAE,MAAM,CAAC;IAChB,gBAAgB,EAAE,OAAO,CAAC;IAC1B,MAAM,EAAE,aAAa,CAAC;IACtB,QAAQ,EAAE,cAAc,CAAC;IACzB,6CAA6C;IAC7C,gBAAgB,CAAC,EAAE,MAAM,CAAC;CAC3B;AAED,qBAAa,oBAAoB;IAC/B,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;gBAEvB,OAAO,EAAE,SAAS,EAAE,QAAQ,EAAE,cAAc,EAAE,YAAY,EAAE,YAAY;IAKpF;;;OAGG;IACH,eAAe,CAAC,WAAW,EAAE,MAAM,EAAE,QAAQ,EAAE,cAAc,EAAE,MAAM,EAAE,aAAa,GAAG,MAAM;IAK7F;;;;OAIG;IACG,YAAY,CAChB,WAAW,EAAE,MAAM,EACnB,QAAQ,EAAE,cAAc,EACxB,MAAM,EAAE,aAAa,EACrB,GAAG,EAAE,OAAO,GACX,OAAO,CAAC,kBAAkB,CAAC;IAuB9B;;;OAGG;YACW,QAAQ;CA0CvB;AAMD,wBAAgB,wBAAwB,CACtC,MAAM,EAAE,SAAS,EACjB,OAAO,EAAE,cAAc,EACvB,YAAY,EAAE,YAAY,GACzB,IAAI,CAEN;AAED,wBAAgB,uBAAuB,IAAI,oBAAoB,CAO9D"}
1
+ {"version":3,"file":"eurlex-content-service.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AAEH,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,wBAAwB,CAAC;AACtD,OAAO,KAAK,EAAE,SAAS,EAAE,MAAM,+BAA+B,CAAC;AAE/D,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,gCAAgC,CAAC;AAErE,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,2BAA2B,CAAC;AAG9D;;;GAGG;AACH,MAAM,MAAM,aAAa,GAAG,MAAM,GAAG,KAAK,GAAG,UAAU,CAAC;AAQxD,sEAAsE;AACtE,MAAM,MAAM,cAAc,GACtB,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,GACJ,IAAI,CAAC;AA0ET,MAAM,WAAW,kBAAkB;IACjC,OAAO,EAAE,MAAM,CAAC;IAChB,gBAAgB,EAAE,OAAO,CAAC;IAC1B,MAAM,EAAE,aAAa,CAAC;IACtB,QAAQ,EAAE,cAAc,CAAC;IACzB,6CAA6C;IAC7C,gBAAgB,CAAC,EAAE,MAAM,CAAC;CAC3B;AAED,qBAAa,oBAAoB;IAC/B,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;gBAEvB,OAAO,EAAE,SAAS,EAAE,QAAQ,EAAE,cAAc,EAAE,YAAY,EAAE,YAAY;IAKpF;;;OAGG;IACH,eAAe,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM;IAI5C;;;;;;;OAOG;IACG,YAAY,CAChB,WAAW,EAAE,MAAM,EACnB,QAAQ,EAAE,cAAc,EACxB,MAAM,EAAE,aAAa,EACrB,GAAG,EAAE,OAAO,GACX,OAAO,CAAC,kBAAkB,CAAC;IA2B9B;;;;;OAKG;YACW,gBAAgB;IAgC9B;;;;;;;OAOG;IACH,OAAO,CAAC,YAAY;CA+BrB;AAMD,wBAAgB,wBAAwB,CACtC,MAAM,EAAE,SAAS,EACjB,OAAO,EAAE,cAAc,EACvB,YAAY,EAAE,YAAY,GACzB,IAAI,CAEN;AAED,wBAAgB,uBAAuB,IAAI,oBAAoB,CAO9D"}
@@ -1,11 +1,94 @@
1
1
  /**
2
- * @fileoverview EurLexContentService — HTTP client for the EUR-Lex REST content API.
3
- * Fetches full HTML or XML text of EU legal acts via the legal-content URL pattern.
4
- * Document content is NOT available via CELLAR work URI content negotiation (returns 400).
2
+ * @fileoverview EurLexContentService — HTTP client for EU act full-text content.
3
+ *
4
+ * Sources content from the EU Publications Office CELLAR content-negotiation
5
+ * resolver (`publications.europa.eu/resource/celex/{CELEX}`) — the same host the
6
+ * metadata SPARQL pipeline already queries — rather than the legacy
7
+ * `eur-lex.europa.eu` legal-content endpoint, which is now fronted by an AWS WAF
8
+ * that returns a JavaScript bot-challenge stub instead of the act text (issue #16).
9
+ *
10
+ * Content negotiation:
11
+ * - `Accept`: HTML acts vary by document family — OJ legislation exposes
12
+ * `application/xhtml+xml`, CJEU judgments expose `text/html`, so the HTML path
13
+ * tries both. The XML path requests Formex 4 (`application/xml;type=fmx4`),
14
+ * which CELLAR serves directly for single-part acts and returns HTTP 300
15
+ * (multiple manifestation streams) for multi-part OJ acts — treated as
16
+ * unavailable rather than reconstructed.
17
+ * - `Accept-Language`: CELLAR requires an ISO 639-2/T (three-letter) code and
18
+ * 400s on a missing one or on a bibliographic 639-2/B code (`ger`, `fre`);
19
+ * EUR-Lex two-letter codes are mapped before the request.
20
+ *
21
+ * Defense in depth: any response carrying an AWS WAF challenge signature is
22
+ * refused (never surfaced as content) and raised as a ServiceUnavailable error,
23
+ * so a challenge stub can never again be reported as `contentAvailable: true`.
5
24
  * @module services/eurlex-content/eurlex-content-service
6
25
  */
7
26
  import { serviceUnavailable } from '@cyanheads/mcp-ts-core/errors';
8
27
  import { withRetry } from '@cyanheads/mcp-ts-core/utils';
28
+ import { htmlToMarkdown } from './html-to-markdown.js';
29
+ /**
30
+ * Map EUR-Lex two-letter language codes to the ISO 639-2/T (terminological,
31
+ * three-letter) codes CELLAR's content-negotiation resolver accepts in
32
+ * `Accept-Language`. CELLAR rejects bibliographic 639-2/B codes (`ger`, `fre`,
33
+ * `dut`, …), so the terminological forms (`deu`, `fra`, `nld`, …) are used.
34
+ */
35
+ const LANGUAGE_TO_ISO_639_2 = {
36
+ EN: 'eng',
37
+ FR: 'fra',
38
+ DE: 'deu',
39
+ ES: 'spa',
40
+ IT: 'ita',
41
+ PL: 'pol',
42
+ PT: 'por',
43
+ NL: 'nld',
44
+ CS: 'ces',
45
+ DA: 'dan',
46
+ EL: 'ell',
47
+ ET: 'est',
48
+ FI: 'fin',
49
+ HU: 'hun',
50
+ LT: 'lit',
51
+ LV: 'lav',
52
+ MT: 'mlt',
53
+ RO: 'ron',
54
+ SK: 'slk',
55
+ SL: 'slv',
56
+ SV: 'swe',
57
+ BG: 'bul',
58
+ HR: 'hrv',
59
+ GA: 'gle',
60
+ };
61
+ /**
62
+ * `Accept` values tried per format, in order. HTML resolves to `application/xhtml+xml`
63
+ * for OJ legislation and `text/html` for CJEU judgments; the first to return a body
64
+ * wins. XML requests Formex 4 only.
65
+ */
66
+ const ACCEPT_BY_FORMAT = {
67
+ html: ['application/xhtml+xml', 'text/html'],
68
+ xml: ['application/xml;type=fmx4'],
69
+ };
70
+ /**
71
+ * Render a fetched wire body into the requested output format. `html`/`xml` pass
72
+ * through verbatim; `markdown` is converted server-side from the HTML body.
73
+ */
74
+ function renderContent(body, format) {
75
+ return format === 'markdown' ? htmlToMarkdown(body) : body;
76
+ }
77
+ /**
78
+ * AWS WAF bot-challenge signatures. `awswaf` matches the challenge.js host
79
+ * (`token.awswaf.com`), the cookie-domain list, and the `AwsWafIntegration`
80
+ * calls; `gokuprops` matches the per-request challenge blob. Both are
81
+ * WAF-specific and never appear in legitimate EU legal text. Matched
82
+ * case-insensitively against the response head.
83
+ */
84
+ const CHALLENGE_MARKERS = ['awswaf', 'gokuprops'];
85
+ /** Bodies shorter than this (after trimming) are treated as empty/unavailable. */
86
+ const MIN_CONTENT_LENGTH = 100;
87
+ /** True when a response body carries an AWS WAF bot-challenge signature. */
88
+ function isChallengeResponse(body) {
89
+ const head = body.slice(0, 4096).toLowerCase();
90
+ return CHALLENGE_MARKERS.some((marker) => head.includes(marker));
91
+ }
9
92
  export class EurLexContentService {
10
93
  baseUrl;
11
94
  timeoutMs;
@@ -14,29 +97,34 @@ export class EurLexContentService {
14
97
  this.timeoutMs = serverConfig.sparqlQueryTimeoutMs;
15
98
  }
16
99
  /**
17
- * Build the EUR-Lex legal-content URL for a CELEX number.
18
- * Pattern: /legal-content/{LANG}/TXT/{FORMAT}/?uri=CELEX:{celex}
100
+ * Build the CELLAR content-negotiation URL for a CELEX number.
101
+ * Pattern: /resource/celex/{CELEX} (format + language come from request headers).
19
102
  */
20
- buildContentUrl(celexNumber, language, format) {
21
- const fmt = format === 'xml' ? 'XML' : 'HTML';
22
- return `${this.baseUrl}/legal-content/${language}/TXT/${fmt}/?uri=CELEX:${celexNumber}`;
103
+ buildContentUrl(celexNumber) {
104
+ return `${this.baseUrl}/resource/celex/${encodeURIComponent(celexNumber)}`;
23
105
  }
24
106
  /**
25
107
  * Fetch the full text content of an EU act by CELEX number.
26
108
  * If the requested language is unavailable, falls back to English.
27
109
  * Returns `contentAvailable: false` with an empty string if both attempts fail.
110
+ *
111
+ * Throws ServiceUnavailable if the content host returns an AWS WAF bot-challenge
112
+ * stub — a challenge is never reported as available content.
28
113
  */
29
114
  async fetchContent(celexNumber, language, format, ctx) {
30
- const primary = await this.tryFetch(celexNumber, language, format, ctx);
115
+ // `markdown` is rendered from the HTML body, so it is fetched as HTML; the
116
+ // returned `format` still reports `markdown` and `renderContent` converts.
117
+ const wireFormat = format === 'markdown' ? 'html' : format;
118
+ const primary = await this.fetchForLanguage(celexNumber, language, wireFormat, ctx);
31
119
  if (primary !== null) {
32
- return { content: primary, language, format, contentAvailable: true };
120
+ return { content: renderContent(primary, format), language, format, contentAvailable: true };
33
121
  }
34
- // Language fallback: try English if primary language failed
122
+ // Language fallback: try English if primary language failed.
35
123
  if (language !== 'EN') {
36
- const fallback = await this.tryFetch(celexNumber, 'EN', format, ctx);
124
+ const fallback = await this.fetchForLanguage(celexNumber, 'EN', wireFormat, ctx);
37
125
  if (fallback !== null) {
38
126
  return {
39
- content: fallback,
127
+ content: renderContent(fallback, format),
40
128
  language: 'EN',
41
129
  format,
42
130
  contentAvailable: true,
@@ -47,36 +135,64 @@ export class EurLexContentService {
47
135
  return { content: '', language, format, contentAvailable: false };
48
136
  }
49
137
  /**
50
- * Attempt to fetch content for a specific language. Returns null on non-200 or
51
- * empty/redirect response rather than throwing; callers handle fallback logic.
138
+ * Resolve content for one language by trying each `Accept` variant for the
139
+ * format. Returns the first non-empty body, or null when none of the variants
140
+ * yield content (so the caller can fall back to English). Throws when a variant
141
+ * returns a bot-challenge stub.
52
142
  */
53
- async tryFetch(celexNumber, language, format, ctx) {
54
- const url = this.buildContentUrl(celexNumber, language, format);
143
+ async fetchForLanguage(celexNumber, language, format, ctx) {
144
+ const isoLanguage = LANGUAGE_TO_ISO_639_2[language];
145
+ if (!isoLanguage)
146
+ return null;
147
+ for (const accept of ACCEPT_BY_FORMAT[format]) {
148
+ const outcome = await this.fetchVariant(celexNumber, accept, isoLanguage, ctx);
149
+ if (outcome.kind === 'challenge') {
150
+ throw serviceUnavailable(`The EU content endpoint returned a bot-challenge interstitial instead of the act text for CELEX ${celexNumber}.`, {
151
+ celexNumber,
152
+ reason: 'content_challenge',
153
+ recovery: {
154
+ hint: 'The content host is behind a WAF/bot challenge. Retry shortly; metadata remains ' +
155
+ 'reachable via content_mode "metadata_only". A persistent challenge means ' +
156
+ 'EURLEX_CONTENT_BASE_URL points at a WAF-protected host rather than the EU ' +
157
+ 'Publications Office CELLAR resolver.',
158
+ },
159
+ });
160
+ }
161
+ if (outcome.kind === 'content')
162
+ return outcome.text;
163
+ }
164
+ return null;
165
+ }
166
+ /**
167
+ * Single content-negotiation request for one `Accept`/`Accept-Language` pair.
168
+ * Non-2xx responses (404 = no datastream of that type, 300 = multi-part Formex,
169
+ * 4xx/5xx) and network failures resolve to `none` so callers can try the next
170
+ * variant or language. A WAF challenge body resolves to `challenge`. The inner
171
+ * function only throws on a `fetch` rejection, so `withRetry` retries transient
172
+ * network errors but never a 404 or a challenge.
173
+ */
174
+ fetchVariant(celexNumber, accept, isoLanguage, ctx) {
175
+ const url = this.buildContentUrl(celexNumber);
55
176
  return withRetry(async () => {
56
177
  const response = await fetch(url, {
57
- headers: { Accept: format === 'xml' ? 'application/xml' : 'text/html' },
178
+ headers: { Accept: accept, 'Accept-Language': isoLanguage },
58
179
  signal: AbortSignal.timeout(this.timeoutMs),
59
180
  redirect: 'follow',
60
181
  });
61
- if (response.status === 404 || response.status === 302 || !response.ok) {
62
- return null;
63
- }
182
+ if (!response.ok)
183
+ return { kind: 'none' };
64
184
  const text = await response.text();
65
- // Detect HTML error pages masquerading as success (e.g. rate-limit pages)
66
- if (format === 'xml' && /^\s*<(!DOCTYPE\s+html|html[\s>])/i.test(text)) {
67
- throw serviceUnavailable('EUR-Lex returned HTML instead of XML content.');
68
- }
69
- // Empty content means unavailable
70
- if (text.trim().length < 100) {
71
- return null;
72
- }
73
- return text;
185
+ if (isChallengeResponse(text))
186
+ return { kind: 'challenge' };
187
+ if (text.trim().length < MIN_CONTENT_LENGTH)
188
+ return { kind: 'none' };
189
+ return { kind: 'content', text };
74
190
  }, {
75
- operation: 'EurLexContentService.tryFetch',
191
+ operation: 'EurLexContentService.fetchVariant',
76
192
  baseDelayMs: 1000,
77
193
  maxRetries: 2,
78
194
  signal: ctx.signal,
79
- }).catch(() => null); // Treat fetch failures as unavailable for language fallback
195
+ }).catch(() => ({ kind: 'none' }));
80
196
  }
81
197
  }
82
198
  // --- Init/accessor pattern ---
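Reviewer note: the request plan implied by `fetchContent`, `fetchForLanguage`, and `buildContentUrl` above can be sketched without any network access. This is a minimal illustration, not the package's code: `planAttempts` is a hypothetical helper, the language table is abbreviated to four entries, and the base URL is simply the new default from `server.json`.

```javascript
// Abbreviated copies of the lookup tables introduced in this diff.
const LANGUAGE_TO_ISO_639_2 = { EN: 'eng', FR: 'fra', DE: 'deu', NL: 'nld' };
const ACCEPT_BY_FORMAT = {
  html: ['application/xhtml+xml', 'text/html'],
  xml: ['application/xml;type=fmx4'],
};

// Hypothetical helper: list the header combinations that would be tried, in order.
function planAttempts(baseUrl, celexNumber, language, format) {
  // Markdown is rendered from the HTML body, so it is requested as HTML on the wire.
  const wireFormat = format === 'markdown' ? 'html' : format;
  const url = `${baseUrl}/resource/celex/${encodeURIComponent(celexNumber)}`;
  // Requested language first, then the English fallback (skipped when already EN).
  const languages = language === 'EN' ? ['EN'] : [language, 'EN'];
  const attempts = [];
  for (const lang of languages) {
    const iso = LANGUAGE_TO_ISO_639_2[lang];
    if (!iso) continue; // unmapped language: nothing to request
    for (const accept of ACCEPT_BY_FORMAT[wireFormat]) {
      attempts.push({ url, headers: { Accept: accept, 'Accept-Language': iso } });
    }
  }
  return attempts;
}

const attempts = planAttempts('http://publications.europa.eu', '32016R0679', 'DE', 'markdown');
console.log(attempts.length); // 4: two Accept variants for German, two for English
console.log(attempts[0]);
```

In the real service the loop stops at the first variant that returns a usable body, so the list above is an upper bound on the requests made per call (before retries).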
@@ -1 +1 @@
1
- {"version":3,"file":"eurlex-content-service.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAIH,OAAO,EAAE,kBAAkB,EAAE,MAAM,+BAA+B,CAAC;AAEnE,OAAO,EAAE,SAAS,EAAE,MAAM,8BAA8B,CAAC;AAyCzD,MAAM,OAAO,oBAAoB;IACd,OAAO,CAAS;IAChB,SAAS,CAAS;IAEnC,YAAY,OAAkB,EAAE,QAAwB,EAAE,YAA0B;QAClF,IAAI,CAAC,OAAO,GAAG,YAAY,CAAC,oBAAoB,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACpE,IAAI,CAAC,SAAS,GAAG,YAAY,CAAC,oBAAoB,CAAC;IACrD,CAAC;IAED;;;OAGG;IACH,eAAe,CAAC,WAAmB,EAAE,QAAwB,EAAE,MAAqB;QAClF,MAAM,GAAG,GAAG,MAAM,KAAK,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC;QAC9C,OAAO,GAAG,IAAI,CAAC,OAAO,kBAAkB,QAAQ,QAAQ,GAAG,eAAe,WAAW,EAAE,CAAC;IAC1F,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,YAAY,CAChB,WAAmB,EACnB,QAAwB,EACxB,MAAqB,EACrB,GAAY;QAEZ,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,QAAQ,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;QACxE,IAAI,OAAO,KAAK,IAAI,EAAE,CAAC;YACrB,OAAO,EAAE,OAAO,EAAE,OAAO,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,IAAI,EAAE,CAAC;QACxE,CAAC;QAED,4DAA4D;QAC5D,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;YACtB,MAAM,QAAQ,GAAG,MAAM,IAAI,CAAC,QAAQ,CAAC,WAAW,EAAE,IAAI,EAAE,MAAM,EAAE,GAAG,CAAC,CAAC;YACrE,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;gBACtB,OAAO;oBACL,OAAO,EAAE,QAAQ;oBACjB,QAAQ,EAAE,IAAI;oBACd,MAAM;oBACN,gBAAgB,EAAE,IAAI;oBACtB,gBAAgB,EAAE,sBAAsB,QAAQ,yCAAyC;iBAC1F,CAAC;YACJ,CAAC;QACH,CAAC;QAED,OAAO,EAAE,OAAO,EAAE,EAAE,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;IACpE,CAAC;IAED;;;OAGG;IACK,KAAK,CAAC,QAAQ,CACpB,WAAmB,EACnB,QAAwB,EACxB,MAAqB,EACrB,GAAY;QAEZ,MAAM,GAAG,GAAG,IAAI,CAAC,eAAe,CAAC,WAAW,EAAE,QAAQ,EAAE,MAAM,CAAC,CAAC;QAEhE,OAAO,SAAS,CACd,KAAK,IAAI,EAAE;YACT,MAAM,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAChC,OAAO,EAAE,EAAE,MAAM,EAAE,MAAM,KAAK,KAAK,CAAC,CAAC,CAAC,iBAAiB,CAAC,CAAC,CAAC,WAAW,EAAE;gBACvE,MAAM,EAAE,WAAW,CAAC,OAAO,CAAC,IAAI,CAAC,SAAS,CAAC;gBAC3C,QAAQ,EAAE,QAAQ;aACnB,CAAC,CAAC;YAEH,IAAI,QAAQ,CAAC,MAAM,KAAK,GAAG,IAAI,QAAQ,CAAC,MAAM,KAAK,GAAG,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACvE,OAAO,IAAI,CAAC;YACd
,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YAEnC,0EAA0E;YAC1E,IAAI,MAAM,KAAK,KAAK,IAAI,mCAAmC,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;gBACvE,MAAM,kBAAkB,CAAC,+CAA+C,CAAC,CAAC;YAC5E,CAAC;YAED,kCAAkC;YAClC,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;gBAC7B,OAAO,IAAI,CAAC;YACd,CAAC;YAED,OAAO,IAAI,CAAC;QACd,CAAC,EACD;YACE,SAAS,EAAE,+BAA+B;YAC1C,WAAW,EAAE,IAAI;YACjB,UAAU,EAAE,CAAC;YACb,MAAM,EAAE,GAAG,CAAC,MAAM;SACnB,CACF,CAAC,KAAK,CAAC,GAAG,EAAE,CAAC,IAAI,CAAC,CAAC,CAAC,4DAA4D;IACnF,CAAC;CACF;AAED,gCAAgC;AAEhC,IAAI,QAA0C,CAAC;AAE/C,MAAM,UAAU,wBAAwB,CACtC,MAAiB,EACjB,OAAuB,EACvB,YAA0B;IAE1B,QAAQ,GAAG,IAAI,oBAAoB,CAAC,MAAM,EAAE,OAAO,EAAE,YAAY,CAAC,CAAC;AACrE,CAAC;AAED,MAAM,UAAU,uBAAuB;IACrC,IAAI,CAAC,QAAQ,EAAE,CAAC;QACd,MAAM,IAAI,KAAK,CACb,mFAAmF,CACpF,CAAC;IACJ,CAAC;IACD,OAAO,QAAQ,CAAC;AAClB,CAAC"}
1
+ {"version":3,"file":"eurlex-content-service.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/eurlex-content-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AAIH,OAAO,EAAE,kBAAkB,EAAE,MAAM,+BAA+B,CAAC;AAEnE,OAAO,EAAE,SAAS,EAAE,MAAM,8BAA8B,CAAC;AAEzD,OAAO,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAyCvD;;;;;GAKG;AACH,MAAM,qBAAqB,GAAmC;IAC5D,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;IACT,EAAE,EAAE,KAAK;CACV,CAAC;AAEF;;;;GAIG;AACH,MAAM,gBAAgB,GAA0C;IAC9D,IAAI,EAAE,CAAC,uBAAuB,EAAE,WAAW,CAAC;IAC5C,GAAG,EAAE,CAAC,2BAA2B,CAAC;CACnC,CAAC;AAEF;;;GAGG;AACH,SAAS,aAAa,CAAC,IAAY,EAAE,MAAqB;IACxD,OAAO,MAAM,KAAK,UAAU,CAAC,CAAC,CAAC,cAAc,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;AAC7D,CAAC;AAED;;;;;;GAMG;AACH,MAAM,iBAAiB,GAAG,CAAC,QAAQ,EAAE,WAAW,CAAC,CAAC;AAElD,kFAAkF;AAClF,MAAM,kBAAkB,GAAG,GAAG,CAAC;AAE/B,4EAA4E;AAC5E,SAAS,mBAAmB,CAAC,IAAY;IACvC,MAAM,IAAI,GAAG,IAAI,CAAC,KAAK,CAAC,CAAC,EAAE,IAAI,CAAC,CAAC,WAAW,EAAE,CAAC;IAC/C,OAAO,iBAAiB,CAAC,IAAI,CAAC,CAAC,MAAM,EAAE,EAAE,CAAC,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,CAAC,CAAC;AACnE,CAAC;AAcD,MAAM,OAAO,oBAAoB;IACd,OAAO,CAAS;IAChB,SAAS,CAAS;IAEnC,YAAY,OAAkB,EAAE,QAAwB,EAAE,YAA0B;QAClF,IAAI,CAAC,OAAO,GAAG,YAAY,CAAC,oBAAoB,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACpE,IAAI,CAAC,SAAS,GAAG,YAAY,CAAC,oBAAoB,CAAC;IACrD,CAAC;IAED;;;OAGG;IACH,eAAe,CAAC,WAAmB;QACjC,OAAO,GAAG,IAAI,CAAC,OAAO,mBAAmB,kBAAkB,CAAC,WAAW,CAAC,EAAE,CAAC;IAC7E,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,YAAY,CAChB,WAAmB,EACnB,QAAwB,EACxB,MAAqB,EACrB,GAAY;QAEZ,2EAA2E;QAC3E,2EAA2E;QAC3E,MAAM,UAAU,GAAe,MAAM,KAAK,UAAU,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,M
AAM,CAAC;QAEvE,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,EAAE,GAAG,CAAC,CAAC;QACpF,IAAI,OAAO,KAAK,IAAI,EAAE,CAAC;YACrB,OAAO,EAAE,OAAO,EAAE,aAAa,CAAC,OAAO,EAAE,MAAM,CAAC,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,IAAI,EAAE,CAAC;QAC/F,CAAC;QAED,6DAA6D;QAC7D,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;YACtB,MAAM,QAAQ,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,WAAW,EAAE,IAAI,EAAE,UAAU,EAAE,GAAG,CAAC,CAAC;YACjF,IAAI,QAAQ,KAAK,IAAI,EAAE,CAAC;gBACtB,OAAO;oBACL,OAAO,EAAE,aAAa,CAAC,QAAQ,EAAE,MAAM,CAAC;oBACxC,QAAQ,EAAE,IAAI;oBACd,MAAM;oBACN,gBAAgB,EAAE,IAAI;oBACtB,gBAAgB,EAAE,sBAAsB,QAAQ,yCAAyC;iBAC1F,CAAC;YACJ,CAAC;QACH,CAAC;QAED,OAAO,EAAE,OAAO,EAAE,EAAE,EAAE,QAAQ,EAAE,MAAM,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;IACpE,CAAC;IAED;;;;;OAKG;IACK,KAAK,CAAC,gBAAgB,CAC5B,WAAmB,EACnB,QAAwB,EACxB,MAAkB,EAClB,GAAY;QAEZ,MAAM,WAAW,GAAG,qBAAqB,CAAC,QAAQ,CAAC,CAAC;QACpD,IAAI,CAAC,WAAW;YAAE,OAAO,IAAI,CAAC;QAE9B,KAAK,MAAM,MAAM,IAAI,gBAAgB,CAAC,MAAM,CAAC,EAAE,CAAC;YAC9C,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,YAAY,CAAC,WAAW,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,CAAC,CAAC;YAC/E,IAAI,OAAO,CAAC,IAAI,KAAK,WAAW,EAAE,CAAC;gBACjC,MAAM,kBAAkB,CACtB,mGAAmG,WAAW,GAAG,EACjH;oBACE,WAAW;oBACX,MAAM,EAAE,mBAAmB;oBAC3B,QAAQ,EAAE;wBACR,IAAI,EACF,kFAAkF;4BAClF,2EAA2E;4BAC3E,4EAA4E;4BAC5E,sCAAsC;qBACzC;iBACF,CACF,CAAC;YACJ,CAAC;YACD,IAAI,OAAO,CAAC,IAAI,KAAK,SAAS;gBAAE,OAAO,OAAO,CAAC,IAAI,CAAC;QACtD,CAAC;QACD,OAAO,IAAI,CAAC;IACd,CAAC;IAED;;;;;;;OAOG;IACK,YAAY,CAClB,WAAmB,EACnB,MAAc,EACd,WAAmB,EACnB,GAAY;QAEZ,MAAM,GAAG,GAAG,IAAI,CAAC,eAAe,CAAC,WAAW,CAAC,CAAC;QAE9C,OAAO,SAAS,CACd,KAAK,IAA2B,EAAE;YAChC,MAAM,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAChC,OAAO,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,iBAAiB,EAAE,WAAW,EAAE;gBAC3D,MAAM,EAAE,WAAW,CAAC,OAAO,CAAC,IAAI,CAAC,SAAS,CAAC;gBAC3C,QAAQ,EAAE,QAAQ;aACnB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE;gBAAE,OAAO,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC;YAE1C,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,IAAI,mBAAmB,CAAC,IAAI,CAAC;gBAAE,OAAO,EAAE,IAAI,EAAE,WAAW,EAAE,CAAC;YAC5D,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,kBAAkB;gB
AAE,OAAO,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC;YACrE,OAAO,EAAE,IAAI,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC;QACnC,CAAC,EACD;YACE,SAAS,EAAE,mCAAmC;YAC9C,WAAW,EAAE,IAAI;YACjB,UAAU,EAAE,CAAC;YACb,MAAM,EAAE,GAAG,CAAC,MAAM;SACnB,CACF,CAAC,KAAK,CAAC,GAAiB,EAAE,CAAC,CAAC,EAAE,IAAI,EAAE,MAAM,EAAE,CAAC,CAAC,CAAC;IAClD,CAAC;CACF;AAED,gCAAgC;AAEhC,IAAI,QAA0C,CAAC;AAE/C,MAAM,UAAU,wBAAwB,CACtC,MAAiB,EACjB,OAAuB,EACvB,YAA0B;IAE1B,QAAQ,GAAG,IAAI,oBAAoB,CAAC,MAAM,EAAE,OAAO,EAAE,YAAY,CAAC,CAAC;AACrE,CAAC;AAED,MAAM,UAAU,uBAAuB;IACrC,IAAI,CAAC,QAAQ,EAAE,CAAC;QACd,MAAM,IAAI,KAAK,CACb,mFAAmF,CACpF,CAAC;IACJ,CAAC;IACD,OAAO,QAAQ,CAAC;AAClB,CAAC"}
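Reviewer note: the three-way outcome that `fetchVariant` returns in eurlex-content-service.js (`none`, `challenge`, `content`) can be isolated as a pure function for testing. The sketch below assumes the same constants as the diff; `classifyBody` and the sample bodies are hypothetical and only illustrate the order of the checks.

```javascript
const CHALLENGE_MARKERS = ['awswaf', 'gokuprops'];
const MIN_CONTENT_LENGTH = 100;

// Hypothetical helper mirroring the checks in the diff: HTTP status first,
// then the WAF signature, then the minimum-length guard.
function classifyBody(ok, body) {
  if (!ok) return { kind: 'none' };
  const head = body.slice(0, 4096).toLowerCase();
  if (CHALLENGE_MARKERS.some((marker) => head.includes(marker))) {
    return { kind: 'challenge' };
  }
  if (body.trim().length < MIN_CONTENT_LENGTH) return { kind: 'none' };
  return { kind: 'content', text: body };
}

// Made-up sample bodies, for illustration only.
const stub = '<html><script>window.gokuProps = {};</script></html>';
const act = '<html><body>' + 'Article text. '.repeat(20) + '</body></html>';

console.log(classifyBody(true, stub).kind);  // challenge
console.log(classifyBody(true, 'tiny').kind); // none
console.log(classifyBody(true, act).kind);    // content
console.log(classifyBody(false, act).kind);   // none
```

Because the challenge check runs before the length check, a short challenge stub is reported as `challenge` rather than silently treated as empty content, which appears to be the point of the "defense in depth" comment in the module header.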
@@ -0,0 +1,29 @@
1
+ /**
2
+ * @fileoverview Server-side HTML→Markdown conversion for EU act bodies.
3
+ *
4
+ * EUR-Lex/CELLAR serves acts as CONVEX-generated XHTML in which the numbered
5
+ * structure — recitals, article paragraphs, lettered/roman points — is laid out
6
+ * in two-column tables: a narrow marker column (`(1)`, `(a)`, `1.1.`, `—`) beside
7
+ * a wide ~96% prose column. A naive HTML→Markdown pass turns each of these into an
8
+ * unreadable two-column GFM row (`| (1) | The protection of natural persons… |`).
9
+ *
10
+ * This module pre-processes the parsed DOM before conversion:
11
+ * - strips non-body chrome (`<head>`, inline `<style>`/`<script>`, the OJ
12
+ * masthead table, separators, dead intra-document fragment links);
13
+ * - flattens the numbering layout tables into inline-marked block text
14
+ * (`(1) The protection of natural persons…`), recursing innermost-first so
15
+ * nested points collapse cleanly;
16
+ * - preserves genuine data tables — CONVEX tags them `class="oj-table"` — so
17
+ * node-html-markdown renders them as real GFM tables.
18
+ *
19
+ * The conversion produces the full Markdown body; windowing/pagination is applied
20
+ * downstream by the caller (a paged window may land mid-structure — acceptable).
21
+ * @module services/eurlex-content/html-to-markdown
22
+ */
23
+ /**
24
+ * Convert an EU act XHTML/HTML body to clean Markdown. Numbering layout tables
25
+ * become inline-marked text; genuine data tables become GFM tables; no raw HTML
26
+ * leaks through. Returns the full converted body.
27
+ */
28
+ export declare function htmlToMarkdown(html: string): string;
29
+ //# sourceMappingURL=html-to-markdown.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"html-to-markdown.d.ts","sourceRoot":"","sources":["../../../src/services/eurlex-content/html-to-markdown.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AA2BH;;;;GAIG;AACH,wBAAgB,cAAc,CAAC,IAAI,EAAE,MAAM,GAAG,MAAM,CAMnD"}
@@ -0,0 +1,172 @@
1
+ /**
2
+ * @fileoverview Server-side HTML→Markdown conversion for EU act bodies.
3
+ *
4
+ * EUR-Lex/CELLAR serves acts as CONVEX-generated XHTML in which the numbered
5
+ * structure — recitals, article paragraphs, lettered/roman points — is laid out
6
+ * in two-column tables: a narrow marker column (`(1)`, `(a)`, `1.1.`, `—`) beside
7
+ * a wide ~96% prose column. A naive HTML→Markdown pass turns each of these into an
8
+ * unreadable two-column GFM row (`| (1) | The protection of natural persons… |`).
9
+ *
10
+ * This module pre-processes the parsed DOM before conversion:
11
+ * - strips non-body chrome (`<head>`, inline `<style>`/`<script>`, the OJ
12
+ * masthead table, separators, dead intra-document fragment links);
13
+ * - flattens the numbering layout tables into inline-marked block text
14
+ * (`(1) The protection of natural persons…`), recursing innermost-first so
15
+ * nested points collapse cleanly;
16
+ * - preserves genuine data tables — CONVEX tags them `class="oj-table"` — so
17
+ * node-html-markdown renders them as real GFM tables.
18
+ *
19
+ * The conversion produces the full Markdown body; windowing/pagination is applied
20
+ * downstream by the caller (a paged window may land mid-structure — acceptable).
21
+ * @module services/eurlex-content/html-to-markdown
22
+ */
23
+ import { NodeHtmlMarkdown } from 'node-html-markdown';
24
+ import { HTMLElement, NodeType, parse } from 'node-html-parser';
25
+ /** Chrome removed wholesale before conversion (head/style/script/link/separators). */
26
+ const CHROME_SELECTOR = 'head, style, script, link, hr';
27
+ /**
28
+ * The OJ masthead table (date | language | "Official Journal of the European
29
+ * Union" | "L 119/1") is identified by these CONVEX paragraph classes and dropped
30
+ * so it never leads the body.
31
+ */
32
+ const MASTHEAD_SELECTOR = '.oj-hd-ti, .oj-hd-oj, .oj-hd-date, .oj-hd-lg';
33
+ /**
34
+ * Lead-column width (percent) at or below which a class-less table is treated as a
35
+ * numbering layout (its first column holds a `4%` marker), not tabular data. Every
36
+ * observed genuine data table leads with a column ≥ 20%; numbering tables lead with
37
+ * `4%` (single marker) or `4%`/`4%` (nested marker). The OJ masthead's `10%` lead
38
+ * also falls within this limit, a harmless extra guard since it is stripped above.
39
+ */
40
+ const LAYOUT_LEAD_COL_MAX_PCT = 10;
41
+ /** Max length of a first-cell string still considered an ordinal marker (col-less fallback). */
42
+ const MARKER_MAX_LEN = 6;
43
+ /**
44
+ * Convert an EU act XHTML/HTML body to clean Markdown. Numbering layout tables
45
+ * become inline-marked text; genuine data tables become GFM tables; no raw HTML
46
+ * leaks through. Returns the full converted body.
47
+ */
48
+ export function htmlToMarkdown(html) {
49
+ const root = parse(html, { comment: false });
50
+ stripChrome(root);
51
+ const body = root.querySelector('body') ?? root;
52
+ flattenLayoutTables(body);
53
+ return NodeHtmlMarkdown.translate(body.innerHTML).trim();
54
+ }
55
+ /** Remove document chrome and neutralize dead intra-document links in place. */
56
+ function stripChrome(root) {
57
+ for (const node of root.querySelectorAll(CHROME_SELECTOR))
58
+ node.remove();
59
+ for (const table of root.querySelectorAll('table')) {
60
+ if (table.querySelector(MASTHEAD_SELECTOR))
61
+ table.remove();
62
+ }
63
+ // Intra-document fragment anchors (footnote refs, internal cross-refs) don't
64
+ // survive as Markdown links — keep the visible text, drop the dead href.
65
+ for (const anchor of root.querySelectorAll('a')) {
66
+ const href = anchor.getAttribute('href') ?? '';
67
+ if (href === '' || href.startsWith('#')) {
68
+ anchor.replaceWith(parse(`<span>${anchor.innerHTML}</span>`));
69
+ }
70
+ }
71
+ }
72
+ /**
73
+ * Flatten numbering layout tables to inline-marked block text, innermost-first so
74
+ * a nested point is already collapsed when its parent row is rebuilt. Genuine data
75
+ * tables are left intact for GFM conversion.
76
+ */
77
+ function flattenLayoutTables(node) {
78
+ for (const child of [...node.childNodes]) {
79
+ if (child instanceof HTMLElement)
80
+ flattenLayoutTables(child);
81
+ }
82
+ if (isTag(node, 'table') && !isGenuineDataTable(node)) {
83
+ node.replaceWith(parse(flattenNumberingTable(node)));
84
+ }
85
+ }
86
+ /**
87
+ * Whether a table carries tabular data (→ GFM) rather than numbering layout
88
+ * (→ flattened text). CONVEX marks real tables `class="oj-table"`; for class-less
89
+ * tables, a wide lead column (or, when no `<col>` widths exist, rows that don't all
90
+ * begin with a short ordinal marker) signals genuine data.
91
+ */
92
+ function isGenuineDataTable(table) {
93
+ if (/\boj-table\b/.test(table.getAttribute('class') ?? ''))
94
+ return true;
95
+ const leadPct = leadColWidthPct(table);
96
+ if (leadPct !== null)
97
+ return leadPct > LAYOUT_LEAD_COL_MAX_PCT;
98
+ return !allRowsLeadWithMarker(table);
99
+ }
100
+ /** Width (percent) of the table's own first `<col>`, or null when absent/unparseable. */
101
+ function leadColWidthPct(table) {
102
+ const col = directChildrenByTag(table, ['col'])[0];
103
+ const match = (col?.getAttribute('width') ?? '').match(/^(\d+(?:\.\d+)?)\s*%/);
104
+ return match ? Number(match[1]) : null;
105
+ }
106
+ /** True when every row's first cell is empty or a short ordinal marker (no `<col>` widths). */
107
+ function allRowsLeadWithMarker(table) {
108
+ const rows = directRows(table);
109
+ if (rows.length === 0)
110
+ return false;
111
+ return rows.every((row) => {
112
+ const first = directCells(row)[0];
113
+ if (!first)
114
+ return true;
115
+ const text = first.text.trim();
116
+ return text === '' || (text.length <= MARKER_MAX_LEN && !/\s/.test(text));
117
+ });
118
+ }
119
+ /**
120
+ * Rebuild a numbering layout table as a `<div>` of block rows: the marker cell(s)
121
+ * are prefixed inline onto the prose cell so each row reads `(1) prose…`. The prose
122
+ * cell's inner HTML is preserved verbatim, so inline markup and any nested genuine
123
+ * tables (already-flattened nested points) carry through unchanged.
124
+ */
125
+ function flattenNumberingTable(table) {
126
+ const blocks = [];
127
+ for (const row of directRows(table)) {
128
+ const cells = directCells(row);
129
+ const prose = cells.at(-1);
130
+ if (!prose)
131
+ continue;
132
+ const marker = cells
133
+ .slice(0, -1)
134
+ .map((cell) => cell.text.trim())
135
+ .filter(Boolean)
136
+ .join(' ')
137
+ .replace(/\s+/g, ' ');
138
+ let inner = prose.innerHTML.trim();
139
+ if (marker) {
140
+ const openTag = inner.match(/^<p\b[^>]*>/i);
141
+ inner = openTag
142
+ ? inner.slice(0, openTag[0].length) +
143
+ `${escapeHtml(marker)} ` +
144
+ inner.slice(openTag[0].length)
145
+ : `<p>${escapeHtml(marker)}</p>${inner}`;
146
+ }
147
+ if (inner)
148
+ blocks.push(inner);
149
+ }
150
+ return `<div>${blocks.join('\n')}</div>`;
151
+ }
152
+ function isTag(node, tag) {
153
+ return node instanceof HTMLElement && node.rawTagName?.toLowerCase() === tag;
154
+ }
155
+ function directChildrenByTag(node, tags) {
156
+ return node.childNodes.filter((child) => child.nodeType === NodeType.ELEMENT_NODE &&
157
+ tags.includes(child.rawTagName?.toLowerCase()));
158
+ }
159
+ /** A table's own rows (direct `<tr>`, plus those under its direct sections) — never nested tables'. */
160
+ function directRows(table) {
161
+ const sections = directChildrenByTag(table, ['tbody', 'thead', 'tfoot']);
162
+ const rows = sections.flatMap((section) => directChildrenByTag(section, ['tr']));
163
+ rows.push(...directChildrenByTag(table, ['tr']));
164
+ return rows;
165
+ }
166
+ function directCells(row) {
167
+ return directChildrenByTag(row, ['td', 'th']);
168
+ }
169
+ function escapeHtml(value) {
170
+ return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
171
+ }
172
+ //# sourceMappingURL=html-to-markdown.js.map
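Reviewer note: the data-table versus layout-table decision in `isGenuineDataTable` above depends on `node-html-parser` nodes, but the rule itself can be sketched on plain objects. The shape `{ className, leadColWidth, firstCells }` and the helper `parseLeadPct` are assumptions made for this illustration; only the constants and the order of checks come from the diff.

```javascript
const LAYOUT_LEAD_COL_MAX_PCT = 10;
const MARKER_MAX_LEN = 6;

// Parse a `<col width="4%">` style value; null when absent or not a percentage.
function parseLeadPct(width) {
  const match = (width ?? '').match(/^(\d+(?:\.\d+)?)\s*%/);
  return match ? Number(match[1]) : null;
}

// Hypothetical plain-object version of the heuristic:
// 1. an explicit `oj-table` class wins;
// 2. otherwise a lead column wider than the limit means data;
// 3. otherwise fall back to whether every first cell looks like a marker.
function isGenuineDataTable({ className = '', leadColWidth, firstCells = [] }) {
  if (/\boj-table\b/.test(className)) return true;
  const leadPct = parseLeadPct(leadColWidth);
  if (leadPct !== null) return leadPct > LAYOUT_LEAD_COL_MAX_PCT;
  if (firstCells.length === 0) return true; // no rows: not a numbering layout
  const allMarkers = firstCells.every((text) => {
    const trimmed = text.trim();
    return trimmed === '' || (trimmed.length <= MARKER_MAX_LEN && !/\s/.test(trimmed));
  });
  return !allMarkers;
}

console.log(isGenuineDataTable({ className: 'oj-table' }));        // true
console.log(isGenuineDataTable({ leadColWidth: '4%' }));           // false
console.log(isGenuineDataTable({ leadColWidth: '25%' }));          // true
console.log(isGenuineDataTable({ firstCells: ['(a)', '(b)'] }));   // false
```

Note that a lead column of exactly `10%` is treated as layout, since the comparison is strictly greater-than.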
@@ -0,0 +1 @@
1
+ {"version":3,"file":"html-to-markdown.js","sourceRoot":"","sources":["../../../src/services/eurlex-content/html-to-markdown.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AAEH,OAAO,EAAE,gBAAgB,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,EAAE,WAAW,EAAa,QAAQ,EAAE,KAAK,EAAE,MAAM,kBAAkB,CAAC;AAE3E,sFAAsF;AACtF,MAAM,eAAe,GAAG,+BAA+B,CAAC;AAExD;;;;GAIG;AACH,MAAM,iBAAiB,GAAG,8CAA8C,CAAC;AAEzE;;;;;;GAMG;AACH,MAAM,uBAAuB,GAAG,EAAE,CAAC;AAEnC,gGAAgG;AAChG,MAAM,cAAc,GAAG,CAAC,CAAC;AAEzB;;;;GAIG;AACH,MAAM,UAAU,cAAc,CAAC,IAAY;IACzC,MAAM,IAAI,GAAG,KAAK,CAAC,IAAI,EAAE,EAAE,OAAO,EAAE,KAAK,EAAE,CAAC,CAAC;IAC7C,WAAW,CAAC,IAAI,CAAC,CAAC;IAClB,MAAM,IAAI,GAAG,IAAI,CAAC,aAAa,CAAC,MAAM,CAAC,IAAI,IAAI,CAAC;IAChD,mBAAmB,CAAC,IAAI,CAAC,CAAC;IAC1B,OAAO,gBAAgB,CAAC,SAAS,CAAC,IAAI,CAAC,SAAS,CAAC,CAAC,IAAI,EAAE,CAAC;AAC3D,CAAC;AAED,gFAAgF;AAChF,SAAS,WAAW,CAAC,IAAiB;IACpC,KAAK,MAAM,IAAI,IAAI,IAAI,CAAC,gBAAgB,CAAC,eAAe,CAAC;QAAE,IAAI,CAAC,MAAM,EAAE,CAAC;IACzE,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,gBAAgB,CAAC,OAAO,CAAC,EAAE,CAAC;QACnD,IAAI,KAAK,CAAC,aAAa,CAAC,iBAAiB,CAAC;YAAE,KAAK,CAAC,MAAM,EAAE,CAAC;IAC7D,CAAC;IACD,6EAA6E;IAC7E,yEAAyE;IACzE,KAAK,MAAM,MAAM,IAAI,IAAI,CAAC,gBAAgB,CAAC,GAAG,CAAC,EAAE,CAAC;QAChD,MAAM,IAAI,GAAG,MAAM,CAAC,YAAY,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC;QAC/C,IAAI,IAAI,KAAK,EAAE,IAAI,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;YACxC,MAAM,CAAC,WAAW,CAAC,KAAK,CAAC,SAAS,MAAM,CAAC,SAAS,SAAS,CAAC,CAAC,CAAC;QAChE,CAAC;IACH,CAAC;AACH,CAAC;AAED;;;;GAIG;AACH,SAAS,mBAAmB,CAAC,IAAiB;IAC5C,KAAK,MAAM,KAAK,IAAI,CAAC,GAAG,IAAI,CAAC,UAAU,CAAC,EAAE,CAAC;QACzC,IAAI,KAAK,YAAY,WAAW;YAAE,mBAAmB,CAAC,KAAK,CAAC,CAAC;IAC/D,CAAC;IACD,IAAI,KAAK,CAAC,IAAI,EAAE,OAAO,CAAC,IAAI,CAAC,kBAAkB,CAAC,IAAI,CAAC,EAAE,CAAC;QACtD,IAAI,CAAC,WAAW,CAAC,KAAK,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACvD,CAAC;AACH,CAAC;AAED;;;;;GAKG;AACH,SAAS,kBAAkB,CAAC,KAAkB;IAC5C,IAAI,cAAc,CAAC,IAAI,CAAC,KAAK,CAAC,YAAY,CAAC,OAAO,CAAC,IAAI,EAAE,CAAC;QAAE,OAAO,IAAI,CAAC;IACxE,MAAM,OAAO,GAAG,eAAe,CAAC,KAAK,CAAC,CAAC;IACvC,IAAI,OAAO,KAAK,IAAI;QAAE,OAAO,
OAAO,GAAG,uBAAuB,CAAC;IAC/D,OAAO,CAAC,qBAAqB,CAAC,KAAK,CAAC,CAAC;AACvC,CAAC;AAED,yFAAyF;AACzF,SAAS,eAAe,CAAC,KAAkB;IACzC,MAAM,GAAG,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;IACnD,MAAM,KAAK,GAAG,CAAC,GAAG,EAAE,YAAY,CAAC,OAAO,CAAC,IAAI,EAAE,CAAC,CAAC,KAAK,CAAC,sBAAsB,CAAC,CAAC;IAC/E,OAAO,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;AACzC,CAAC;AAED,+FAA+F;AAC/F,SAAS,qBAAqB,CAAC,KAAkB;IAC/C,MAAM,IAAI,GAAG,UAAU,CAAC,KAAK,CAAC,CAAC;IAC/B,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO,KAAK,CAAC;IACpC,OAAO,IAAI,CAAC,KAAK,CAAC,CAAC,GAAG,EAAE,EAAE;QACxB,MAAM,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC;QAClC,IAAI,CAAC,KAAK;YAAE,OAAO,IAAI,CAAC;QACxB,MAAM,IAAI,GAAG,KAAK,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QAC/B,OAAO,IAAI,KAAK,EAAE,IAAI,CAAC,IAAI,CAAC,MAAM,IAAI,cAAc,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC5E,CAAC,CAAC,CAAC;AACL,CAAC;AAED;;;;;GAKG;AACH,SAAS,qBAAqB,CAAC,KAAkB;IAC/C,MAAM,MAAM,GAAa,EAAE,CAAC;IAC5B,KAAK,MAAM,GAAG,IAAI,UAAU,CAAC,KAAK,CAAC,EAAE,CAAC;QACpC,MAAM,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;QAC/B,MAAM,KAAK,GAAG,KAAK,CAAC,EAAE,CAAC,CAAC,CAAC,CAAC,CAAC;QAC3B,IAAI,CAAC,KAAK;YAAE,SAAS;QACrB,MAAM,MAAM,GAAG,KAAK;aACjB,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;aACZ,GAAG,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;aAC/B,MAAM,CAAC,OAAO,CAAC;aACf,IAAI,CAAC,GAAG,CAAC;aACT,OAAO,CAAC,MAAM,EAAE,GAAG,CAAC,CAAC;QACxB,IAAI,KAAK,GAAG,KAAK,CAAC,SAAS,CAAC,IAAI,EAAE,CAAC;QACnC,IAAI,MAAM,EAAE,CAAC;YACX,MAAM,OAAO,GAAG,KAAK,CAAC,KAAK,CAAC,cAAc,CAAC,CAAC;YAC5C,KAAK,GAAG,OAAO;gBACb,CAAC,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC;oBACjC,GAAG,UAAU,CAAC,MAAM,CAAC,GAAG;oBACxB,KAAK,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC;gBAChC,CAAC,CAAC,MAAM,UAAU,CAAC,MAAM,CAAC,OAAO,KAAK,EAAE,CAAC;QAC7C,CAAC;QACD,IAAI,KAAK;YAAE,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;IAChC,CAAC;IACD,OAAO,QAAQ,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,QAAQ,CAAC;AAC3C,CAAC;AAED,SAAS,KAAK,CAAC,IAAU,EAAE,GAAW;IACpC,OAAO,IAAI,YAAY,WAAW
,IAAI,IAAI,CAAC,UAAU,EAAE,WAAW,EAAE,KAAK,GAAG,CAAC;AAC/E,CAAC;AAED,SAAS,mBAAmB,CAAC,IAAiB,EAAE,IAAuB;IACrE,OAAO,IAAI,CAAC,UAAU,CAAC,MAAM,CAC3B,CAAC,KAAK,EAAwB,EAAE,CAC9B,KAAK,CAAC,QAAQ,KAAK,QAAQ,CAAC,YAAY;QACxC,IAAI,CAAC,QAAQ,CAAE,KAAqB,CAAC,UAAU,EAAE,WAAW,EAAE,CAAC,CAClE,CAAC;AACJ,CAAC;AAED,uGAAuG;AACvG,SAAS,UAAU,CAAC,KAAkB;IACpC,MAAM,QAAQ,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,OAAO,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC,CAAC;IACzE,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,CAAC,OAAO,EAAE,EAAE,CAAC,mBAAmB,CAAC,OAAO,EAAE,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACjF,IAAI,CAAC,IAAI,CAAC,GAAG,mBAAmB,CAAC,KAAK,EAAE,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACjD,OAAO,IAAI,CAAC;AACd,CAAC;AAED,SAAS,WAAW,CAAC,GAAgB;IACnC,OAAO,mBAAmB,CAAC,GAAG,EAAE,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC,CAAC;AAChD,CAAC;AAED,SAAS,UAAU,CAAC,KAAa;IAC/B,OAAO,KAAK,CAAC,OAAO,CAAC,IAAI,EAAE,OAAO,CAAC,CAAC,OAAO,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC,OAAO,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC;AAClF,CAAC"}
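Reviewer note: the row rebuild in `flattenNumberingTable` (html-to-markdown.js) comes down to splicing the marker into the prose cell's first `<p>`. The following is a string-only sketch under that assumption; `prefixMarker` is a hypothetical name, and the sample markup is invented for the example.

```javascript
function escapeHtml(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: join the marker cells and inline them at the start of
// the prose cell's opening <p>, or emit a separate <p> when there is none.
function prefixMarker(markerCells, proseHtml) {
  const marker = markerCells
    .map((cell) => cell.trim())
    .filter(Boolean)
    .join(' ')
    .replace(/\s+/g, ' ');
  const inner = proseHtml.trim();
  if (!marker) return inner;
  const openTag = inner.match(/^<p\b[^>]*>/i);
  return openTag
    ? inner.slice(0, openTag[0].length) + `${escapeHtml(marker)} ` + inner.slice(openTag[0].length)
    : `<p>${escapeHtml(marker)}</p>${inner}`;
}

console.log(prefixMarker(['(1)'], '<p class="sample">Recital text</p>'));
// <p class="sample">(1) Recital text</p>
console.log(prefixMarker(['', '(a)'], 'plain text'));
// <p>(a)</p>plain text
```

Keeping the marker inside the existing paragraph is what lets the later Markdown pass emit `(1) Recital text` on one line instead of a two-cell table row.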
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@cyanheads/eur-lex-mcp-server",
3
- "version": "0.2.1",
3
+ "version": "0.4.0",
4
4
  "mcpName": "io.github.cyanheads/eur-lex-mcp-server",
5
5
  "description": "Search EU legislation, CJEU case law, and treaties; traverse the CELLAR relationship graph; resolve EuroVoc concepts via MCP. STDIO or Streamable HTTP.",
6
6
  "type": "module",
@@ -89,7 +89,9 @@
89
89
  "access": "public"
90
90
  },
91
91
  "dependencies": {
92
- "@cyanheads/mcp-ts-core": "^0.10.9",
92
+ "@cyanheads/mcp-ts-core": "^0.10.10",
93
+ "node-html-markdown": "^2.0.0",
94
+ "node-html-parser": "^8.0.4",
93
95
  "pino-pretty": "^13.1.3",
94
96
  "zod": "^4.4.3"
95
97
  },
package/server.json CHANGED
@@ -6,7 +6,7 @@
6
6
  "url": "https://github.com/cyanheads/eur-lex-mcp-server",
7
7
  "source": "github"
8
8
  },
9
- "version": "0.2.1",
9
+ "version": "0.4.0",
10
10
  "remotes": [
11
11
  {
12
12
  "type": "streamable-http",
@@ -19,7 +19,7 @@
19
19
  "registryBaseUrl": "https://registry.npmjs.org",
20
20
  "identifier": "@cyanheads/eur-lex-mcp-server",
21
21
  "runtimeHint": "bun",
22
- "version": "0.2.1",
22
+ "version": "0.4.0",
23
23
  "packageArguments": [
24
24
  {
25
25
  "type": "positional",
@@ -40,10 +40,10 @@
40
40
  },
41
41
  {
42
42
  "name": "EURLEX_CONTENT_BASE_URL",
43
- "description": "EUR-Lex content API base URL override.",
43
+ "description": "Base URL of the EU Publications Office CELLAR content-negotiation resolver that serves act full text; override e.g. for a local mirror.",
44
44
  "format": "string",
45
45
  "isRequired": false,
46
- "default": "https://eur-lex.europa.eu"
46
+ "default": "http://publications.europa.eu"
47
47
  },
48
48
  {
49
49
  "name": "SPARQL_QUERY_TIMEOUT_MS",
@@ -76,7 +76,7 @@
76
76
  "registryBaseUrl": "https://registry.npmjs.org",
77
77
  "identifier": "@cyanheads/eur-lex-mcp-server",
78
78
  "runtimeHint": "bun",
79
- "version": "0.2.1",
79
+ "version": "0.4.0",
80
80
  "packageArguments": [
81
81
  {
82
82
  "type": "positional",
@@ -138,10 +138,10 @@
138
138
  },
139
139
  {
140
140
  "name": "EURLEX_CONTENT_BASE_URL",
141
- "description": "EUR-Lex content API base URL override.",
141
+ "description": "Base URL of the EU Publications Office CELLAR content-negotiation resolver that serves act full text; override e.g. for a local mirror.",
142
142
  "format": "string",
143
143
  "isRequired": false,
144
- "default": "https://eur-lex.europa.eu"
144
+ "default": "http://publications.europa.eu"
145
145
  },
146
146
  {
147
147
  "name": "SPARQL_QUERY_TIMEOUT_MS",