@aborruso/ckan-mcp-server 0.4.99 → 0.4.106

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/LOG.md +72 -0
  2. package/dist/index.js +436 -87
  3. package/dist/worker.js +207 -185
  4. package/package.json +1 -1
package/LOG.md CHANGED
@@ -1,5 +1,77 @@
  # LOG
 
+ ## 2026-05-31
+
+ ### v0.4.106
+
+ - Security fix (GHSA-g84h-j7jj-x32p): block the `ip6-localhost` and `ip6-loopback` SSRF bypass. These hostname aliases are present in `/etc/hosts` on Linux and resolve to `::1`, but they bypassed the existing SSRF filter (GHSA-3xm7-qw7j-qc8v). Replaced the single `localhost` check with a blocked-hostname `Set`; 2 new unit tests added
+ - Reported by: hibrian827
+
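The v0.4.106 entry describes swapping a single `localhost` comparison for a set of blocked hostnames. A minimal sketch of that idea follows; the names `BLOCKED_HOSTNAMES` and `isBlockedHostname` are hypothetical and not taken from the package, and the sketch covers only the hostname-alias check, not IP-literal filtering.

```typescript
// Hypothetical sketch: identifiers are illustrative, not the package's code.
const BLOCKED_HOSTNAMES: ReadonlySet<string> = new Set([
  "localhost",
  "ip6-localhost", // alias for ::1 in /etc/hosts on Linux
  "ip6-loopback",  // alias for ::1 in /etc/hosts on Linux
]);

function isBlockedHostname(serverUrl: string): boolean {
  // URL parsing lowercases the hostname of http(s) URLs,
  // so "IP6-Localhost" and "ip6-localhost" compare equal here.
  const hostname = new URL(serverUrl).hostname;
  return BLOCKED_HOSTNAMES.has(hostname);
}
```

A set lookup keeps the check a single expression as more aliases are added, which is presumably why the entry mentions replacing the one-off comparison.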
+ ## 2026-05-25
+
+ ### v0.4.105
+
+ - Fix (scoring): `ckan_find_relevant_datasets` now scores `holder_name` (DCAT-AP_IT `dct:rightsHolder`) and `publisher_name` (`dct:publisher`) as distinct weighted fields, separate from `organization`
+ - Rationale: on federated catalogs (e.g. `dati.gov.it`, but the pattern applies to any portal harvesting from sub-publishers), `organization` is the harvesting catalog (e.g. `regione-puglia`), NOT the data owner. Queries like "datasets from Comune di Lecce" previously scored 0 on the owner field when the dataset was harvested via Regione Puglia or a local action group, missing the actual `rightsHolder`
+ - The fields are read from `extras[]` (the authoritative DCAT-AP_IT location on Italian portals) with fallback to root-level. On dati.gov.it, `package_search` exposes `holder_name` and `publisher_name` both in `extras[]` (correct DCAT values) and at root (often overwritten by the harvester with the organization name); reading only the root would be wrong. The root-level fallback preserves correct behavior on non-DCAT-AP_IT portals (data.gov, open.canada.ca)
+ - Bug surfaced on real-world Puglia datasets: `defibrillatori-esterni` (extras.holder=Comune di Mesagne, root.holder=GAL Terra dei Messapi, organization=GAL Terra dei Messapi) and `defibrillatori-dae-progetto-comune-cardioprotetto` (extras.holder=Comune di Lecce, organization=Regione Puglia)
+ - Defaults: `holder=4` (peer with `title`, as the actual institutional owner per DCAT-AP_IT), `publisher=2` (lower because it is sometimes a technical role like "Redazione OD" rather than the institution)
+ - API: `weights` object accepts two new optional fields (`holder`, `publisher`); backward-compatible: clients that do not set them get the improved scoring by default
+ - Types: added `holder_name?: string` and `publisher_name?: string` to `CkanPackage` interface (previously accessed via index signature)
+ - Added internal helper `readDcatExtra(dataset, key)` that encapsulates the extras-first, root-fallback lookup
+ - Score breakdown markdown and JSON outputs include `holder` and `publisher` per dataset
+ - Validated against live `package_search` responses on dati.gov.it: defibrillatori-esterni (Mesagne) 6 → 12, comune-cardioprotetto (Lecce) 10 → 13
+
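The extras-first, root-fallback lookup described for `readDcatExtra(dataset, key)` can be sketched as below. Only the function name and argument order come from the log; the body, the `DatasetLike` and `CkanExtra` types, and the treatment of empty strings as missing are assumptions. The sample record reuses the `defibrillatori-esterni` values quoted in the entry.

```typescript
// Assumed shapes, following CKAN's usual `extras: [{ key, value }]` layout.
interface CkanExtra {
  key: string;
  value: string;
}

interface DatasetLike {
  extras?: CkanExtra[];
  [field: string]: unknown;
}

// Sketch only: the real helper may normalize or validate differently.
function readDcatExtra(dataset: DatasetLike, key: string): string | undefined {
  // 1. Prefer extras[], where the DCAT-AP_IT value is authoritative.
  const extra = dataset.extras?.find((e) => e.key === key);
  if (extra && extra.value.trim() !== "") return extra.value;

  // 2. Fall back to the root-level field (non-DCAT-AP_IT portals).
  const root = dataset[key];
  return typeof root === "string" && root.trim() !== "" ? root : undefined;
}

// Values taken from the `defibrillatori-esterni` example in the log entry.
const defibrillatoriEsterni: DatasetLike = {
  holder_name: "GAL Terra dei Messapi", // root value, overwritten by the harvester
  extras: [{ key: "holder_name", value: "Comune di Mesagne" }],
};

const holder = readDcatExtra(defibrillatoriEsterni, "holder_name");
```

Here `holder` resolves to the `extras[]` value rather than the harvester-overwritten root value, which is the behavior the fix relies on.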
+ ## 2026-05-20
+
+ ### v0.4.104
+
+ - Source portal DataStore fallback + LLM error hints (see 2026-05-18 entries below)
+
+ ## 2026-05-18
+
+ - Source portal DataStore fallback: `ckan_list_resources` now probes the source portal when a resource has `datastore_active=false` and its download URL belongs to a different CKAN instance (harvested dataset pattern). Adds `source_datastore_active` and `source_portal_url` fields to output. New `check_source_portal` parameter (default `true`) to skip extra HTTP calls. New `extractSourcePortal()` utility in `url-generator.ts`. Scoped to CKAN-to-CKAN harvesting (detects `/resource/{uuid}/` URL pattern). 12 new tests → 399 total.
+ - LLM error hints: add `CkanApiError` class to `makeCkanRequest` (carries `status` + `action`); add `formatCkanError()` with hint table mapping HTTP status/action → actionable suggestion for the LLM (404 datastore → `ckan_package_show`, 404 package → `ckan_package_search`, 400 SQL → check columns, 503 → retry, etc.)
+ - Replace raw `error.message` interpolation in all tool catch blocks (datastore, package, organization, group, analyze, portal-discovery, quality) with `formatCkanError()`
+ - Replace fragile string-match in `organization.ts` (`includes('CKAN API error (500)')`) with `error instanceof CkanApiError && error.status === 500`
+ - Tests: 9 new unit tests for `CkanApiError` and `formatCkanError` → 387 total (381 pass, 6 skipped)
+
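The error-hint entries above can be illustrated with a typed error plus a small lookup. The class and function names and the four status-to-hint pairs come from the log; the constructor signature, the `hintFor` helper, the action-name matching rules, and the hint wording are assumptions made for this sketch.

```typescript
// Sketch: carries the HTTP status and CKAN action, as the log describes.
class CkanApiError extends Error {
  readonly status: number;
  readonly action: string;

  constructor(message: string, status: number, action: string) {
    super(message);
    this.name = "CkanApiError";
    this.status = status;
    this.action = action;
    // Keeps `instanceof` working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: how actions are matched to hints is an assumption.
function hintFor(status: number, action: string): string | undefined {
  if (status === 404 && action.startsWith("datastore")) {
    return "resource may have no DataStore table; inspect it with `ckan_package_show`";
  }
  if (status === 404 && action.startsWith("package")) {
    return "dataset not found; look it up with `ckan_package_search`";
  }
  if (status === 400 && action === "datastore_search_sql") {
    return "check that the column names in the SQL exist";
  }
  if (status === 503) return "portal temporarily unavailable; retry later";
  return undefined;
}

function formatCkanError(error: unknown): string {
  if (error instanceof CkanApiError) {
    const hint = hintFor(error.status, error.action);
    return hint ? `${error.message}\nHint: ${hint}` : error.message;
  }
  // Non-CKAN errors pass through without a hint.
  return error instanceof Error ? error.message : String(error);
}
```

Checking `error instanceof CkanApiError && error.status === 500` against a class like this is what lets the `organization.ts` change drop the string match on the message text.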
+ ## 2026-04-22
+
+ - Security: add optional domain allowlist (`CKAN_ALLOWED_DOMAINS` env var) in `validateServerUrl()`; blocks requests to unlisted public domains (CERT-AgID MCP recommendation)
+ - Security: add structured audit logging to stderr for all `makeCkanRequest` calls in Node modes (stdio + http); fields: `ts`, `server`, `action`, `cache_hit`, query params (CERT-AgID MCP recommendation)
+ - Tests: 8 new tests (4 allowlist, 3 audit log) → 383 total (377 pass, 6 skipped)
+
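As a rough illustration of the allowlist entry, the sketch below checks a server URL against a parsed list. The log does not state the format of `CKAN_ALLOWED_DOMAINS`, so the comma-separated format, exact hostname matching, and "empty list means no restriction" behavior are all assumptions, as are the function names.

```typescript
// Assumption: the variable holds a comma-separated list of hostnames.
function parseAllowedDomains(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((domain) => domain.trim().toLowerCase())
      .filter((domain) => domain.length > 0),
  );
}

// Assumption: an empty allowlist disables the check, since the log calls it optional.
function isAllowedServer(serverUrl: string, allowed: Set<string>): boolean {
  if (allowed.size === 0) return true;
  // Exact hostname match only; subdomain handling is left out of this sketch.
  return allowed.has(new URL(serverUrl).hostname);
}

const allowed = parseAllowedDomains("dati.gov.it, data.gov");
```

With this list, `isAllowedServer("https://dati.gov.it/opendata", allowed)` passes while a URL on any unlisted host is rejected.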
+ ## 2026-04-14
+
+ ### v0.4.102
+
+ - Add `cache_hit` field to worker telemetry log entries
+ - `getLastCacheHit()` exported from `src/utils/http.ts`; `worker.ts` logs after response with `cache_hit: true/false`
+ - Enables cache hit rate analysis in `worker_events_flat.jsonl`
+
+ ### v0.4.101
+
+ - Add per-portal upstream rate limiter (`src/utils/rate-limiter.ts`)
+ - Token bucket algorithm, one independent bucket per hostname
+ - Integration in `makeCkanRequest`: cache hit bypasses limiter; only upstream fetches consume tokens
+ - Config: `CKAN_RATE_LIMIT_ENABLED`, `CKAN_RATE_LIMIT_RPS` (default 5), `CKAN_RATE_LIMIT_BURST` (default 10), `CKAN_RATE_LIMIT_MAX_WAIT_MS` (default 5000)
+ - Per-call bypass via `opts.rateLimit: false`; disabled by default in Vitest runs
+ - `RateLimitError` thrown when wait exceeds `maxWaitMs` (includes hostname + wait ms)
+ - 13 new unit tests; suite now 370 tests, all green
+ - OpenSpec: `add-upstream-rate-limiter`
+
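The per-hostname token bucket from v0.4.101 can be sketched as follows. The defaults (5 requests per second, burst 10, maximum wait 5000 ms) and the `RateLimitError` name come from the log; the `HostRateLimiter` class, its `reserve` method, and the explicit `nowMs` argument (used here to keep the example deterministic) are assumptions.

```typescript
class RateLimitError extends Error {
  readonly hostname: string;
  readonly waitMs: number;

  constructor(hostname: string, waitMs: number) {
    super(`Rate limit for ${hostname}: would wait ${waitMs} ms`);
    this.name = "RateLimitError";
    this.hostname = hostname;
    this.waitMs = waitMs;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface Bucket {
  tokens: number;
  updatedAt: number;
}

// Hypothetical sketch of a token bucket with one bucket per hostname.
class HostRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly rps: number;
  private readonly burst: number;
  private readonly maxWaitMs: number;

  constructor(rps = 5, burst = 10, maxWaitMs = 5000) {
    this.rps = rps;
    this.burst = burst;
    this.maxWaitMs = maxWaitMs;
  }

  /** Returns the delay in ms the caller should wait before sending (0 = send now). */
  reserve(hostname: string, nowMs: number): number {
    const bucket = this.buckets.get(hostname) ?? { tokens: this.burst, updatedAt: nowMs };

    // Refill in proportion to elapsed time, capped at the burst size.
    const elapsedSec = (nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.burst, bucket.tokens + elapsedSec * this.rps);
    bucket.updatedAt = nowMs;
    this.buckets.set(hostname, bucket);

    // A negative balance represents requests that are already queued.
    const deficit = 1 - bucket.tokens;
    const waitMs = deficit > 0 ? Math.ceil((deficit * 1000) / this.rps) : 0;
    if (waitMs > this.maxWaitMs) throw new RateLimitError(hostname, waitMs);

    bucket.tokens -= 1;
    return waitMs;
  }
}
```

In this sketch a caller would sleep for the returned delay before issuing the upstream fetch; cache hits would skip `reserve` entirely, matching the entry's note that only upstream fetches consume tokens.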
+ ### v0.4.100
+
+ - Add read-through HTTP cache layer in `makeCkanRequest` (`src/utils/cache.ts`)
+ - Backends: Cloudflare Cache API on Workers, bounded in-memory LRU on Node
+ - Action-based TTL: metadata 300s, datastore 60s, status 3600s; fallback configurable
+ - Env vars: `CKAN_CACHE_ENABLED`, `CKAN_CACHE_TTL_DEFAULT`, `CKAN_CACHE_MAX_ENTRIES`, `CKAN_CACHE_MAX_ENTRY_BYTES`
+ - Per-call bypass via `makeCkanRequest(..., { cache: false })`; errors and oversize payloads never cached
+ - 29 new unit tests; suite now 357 tests, all green
+ - Measured locally: 348 ms cache miss → 13 ms cache hit on identical `package_search` calls (~27×)
+ - OpenSpec: `add-http-cache-layer` (proposal + design + spec)
+
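For the Node backend, a bounded in-memory LRU with action-based TTLs might look like the sketch below. The three TTL values come from the log; which CKAN actions count as "metadata", "datastore", or "status", and every identifier here (`ttlSecondsFor`, `LruTtlCache`, `readThrough`), are assumptions. The clock is passed in explicitly to keep the example deterministic.

```typescript
// Assumed mapping from action name to the TTL classes named in the log.
function ttlSecondsFor(action: string, fallbackSeconds: number): number {
  if (action === "status_show") return 3600;
  if (action.startsWith("datastore_")) return 60;
  if (action.startsWith("package_") || action.startsWith("organization_")) return 300;
  return fallbackSeconds;
}

interface Entry<V> {
  value: V;
  expiresAt: number;
}

class LruTtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= nowMs) return undefined; // expired: stays deleted
    this.entries.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number, nowMs: number): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Read-through: serve from cache, otherwise fetch upstream and store the result.
async function readThrough<V>(
  cache: LruTtlCache<V>,
  action: string,
  key: string,
  nowMs: number,
  fallbackSeconds: number,
  fetchUpstream: () => Promise<V>,
): Promise<V> {
  const hit = cache.get(key, nowMs);
  if (hit !== undefined) return hit;
  const value = await fetchUpstream(); // a rejected fetch propagates and is never cached
  cache.set(key, value, ttlSecondsFor(action, fallbackSeconds), nowMs);
  return value;
}
```

The sketch omits the per-entry size cap (`CKAN_CACHE_MAX_ENTRY_BYTES`) and the Cloudflare Cache API backend mentioned in the entry.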
  ## 2026-04-09
 
  ### v0.4.99