@aborruso/ckan-mcp-server 0.4.99 → 0.4.105

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/LOG.md +65 -0
  2. package/dist/index.js +430 -86
  3. package/dist/worker.js +207 -185
  4. package/package.json +1 -1
package/LOG.md CHANGED
@@ -1,5 +1,70 @@
  # LOG
 
+ ## 2026-05-25
+
+ ### v0.4.105
+
+ - Fix (scoring): `ckan_find_relevant_datasets` now scores `holder_name` (DCAT-AP_IT `dct:rightsHolder`) and `publisher_name` (`dct:publisher`) as distinct weighted fields, separate from `organization`
+ - Rationale: on federated catalogs (e.g. `dati.gov.it`, but the pattern applies to any portal harvesting from sub-publishers), `organization` is the harvesting catalog (e.g. `regione-puglia`), NOT the data owner. Queries like "datasets from Comune di Lecce" previously scored 0 on the owner field when the dataset was harvested via Regione Puglia or a local action group, missing the actual `rightsHolder`
+ - The fields are read from `extras[]` (the authoritative DCAT-AP_IT location on Italian portals) with fallback to root-level. On dati.gov.it, `package_search` exposes `holder_name` and `publisher_name` both in `extras[]` (correct DCAT values) and at root (often overwritten by the harvester with the organization name); reading only the root would be wrong. The root-level fallback preserves correct behavior on non-DCAT-AP_IT portals (data.gov, open.canada.ca)
+ - Bug surfaced on real-world Puglia datasets: `defibrillatori-esterni` (extras.holder=Comune di Mesagne, root.holder=GAL Terra dei Messapi, organization=GAL Terra dei Messapi) and `defibrillatori-dae-progetto-comune-cardioprotetto` (extras.holder=Comune di Lecce, organization=Regione Puglia)
+ - Defaults: `holder=4` (peer with `title`: the actual institutional owner per DCAT-AP_IT), `publisher=2` (lower because sometimes a technical role like "Redazione OD" rather than the institution)
+ - API: `weights` object accepts two new optional fields (`holder`, `publisher`); backward-compatible, so clients not setting them get the improved scoring by default
+ - Types: added `holder_name?: string` and `publisher_name?: string` to `CkanPackage` interface (previously accessed via index signature)
+ - Added internal helper `readDcatExtra(dataset, key)` that encapsulates the extras-first, root-fallback lookup
+ - Score breakdown markdown and JSON outputs include `holder` and `publisher` per dataset
+ - Validated against live `package_search` responses on dati.gov.it: defibrillatori-esterni (Mesagne) 6 → 12, comune-cardioprotetto (Lecce) 10 → 13
+
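Reviewer note: the extras-first, root-fallback lookup described for v0.4.105 can be sketched as below. This is only an illustration under stated assumptions: the log names `readDcatExtra(dataset, key)` but does not show its body, so the implementation, the `CkanExtra` and `DcatKey` types, and the trimmed-down `CkanPackage` shape are hypothetical. The sample values come from the `defibrillatori-esterni` case quoted in the log.

```typescript
// Hypothetical minimal shapes; the real CkanPackage interface has many more fields.
interface CkanExtra {
  key: string;
  value: string;
}

interface CkanPackage {
  holder_name?: string;
  publisher_name?: string;
  extras?: CkanExtra[];
}

type DcatKey = "holder_name" | "publisher_name";

// Assumed implementation: prefer the value in extras[], fall back to the root field.
function readDcatExtra(dataset: CkanPackage, key: DcatKey): string | undefined {
  const fromExtras = dataset.extras?.find((e) => e.key === key)?.value;
  if (fromExtras !== undefined && fromExtras.trim() !== "") {
    return fromExtras;
  }
  return dataset[key];
}

// Harvested dataset: the root value was overwritten by the harvester,
// while extras[] still carries the DCAT-AP_IT rights holder.
const harvested: CkanPackage = {
  holder_name: "GAL Terra dei Messapi",
  extras: [{ key: "holder_name", value: "Comune di Mesagne" }],
};

// Portal without DCAT-AP_IT extras: only the root field exists.
const plain: CkanPackage = { publisher_name: "Example Agency" };

console.log(readDcatExtra(harvested, "holder_name")); // "Comune di Mesagne"
console.log(readDcatExtra(plain, "publisher_name")); // "Example Agency"
```

Reading `extras[]` first is what lets a query for the owning municipality match the harvested record; the root fallback keeps the helper usable on portals that do not populate `extras[]`.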
+ ## 2026-05-20
+
+ ### v0.4.104
+
+ - Source portal DataStore fallback + LLM error hints (see 2026-05-18 entries below)
+
+ ## 2026-05-18
+
+ - Source portal DataStore fallback: `ckan_list_resources` now probes the source portal when a resource has `datastore_active=false` and its download URL belongs to a different CKAN instance (harvested dataset pattern). Adds `source_datastore_active` and `source_portal_url` fields to output. New `check_source_portal` parameter (default `true`) to skip extra HTTP calls. New `extractSourcePortal()` utility in `url-generator.ts`. Scoped to CKAN-to-CKAN harvesting (detects `/resource/{uuid}/` URL pattern). 12 new tests → 399 total.
+ - LLM error hints: add `CkanApiError` class to `makeCkanRequest` (carries `status` + `action`); add `formatCkanError()` with hint table mapping HTTP status/action → actionable suggestion for the LLM (404 datastore → `ckan_package_show`, 404 package → `ckan_package_search`, 400 SQL → check columns, 503 → retry, etc.)
+ - Replace raw `error.message` interpolation in all tool catch blocks (datastore, package, organization, group, analyze, portal-discovery, quality) with `formatCkanError()`
+ - Replace fragile string-match in `organization.ts` (`includes('CKAN API error (500)')`) with `error instanceof CkanApiError && error.status === 500`
+ - Tests: 9 new unit tests for `CkanApiError` and `formatCkanError` → 387 total (381 pass, 6 skipped)
+
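Reviewer note: the typed-error and hint-table idea from the 2026-05-18 entries might look roughly like this. It is a sketch, not the package's code: the log confirms only that `CkanApiError` carries `status` and `action` and that `formatCkanError()` maps status/action pairs to suggestions. The constructor signature, the `Hint` shape, the prefix matching, and the hint wording are all assumptions.

```typescript
// Assumed shape: the log only states that the error carries `status` and `action`.
class CkanApiError extends Error {
  constructor(
    message: string,
    public readonly status: number,
    public readonly action: string,
  ) {
    super(message);
    this.name = "CkanApiError";
    // Keeps `instanceof` reliable when compiling to older targets.
    Object.setPrototypeOf(this, CkanApiError.prototype);
  }
}

// Hypothetical hint table; entries mirror the examples listed in the log.
interface Hint {
  status: number;
  actionPrefix?: string; // undefined matches any action
  hint: string;
}

const HINTS: Hint[] = [
  { status: 404, actionPrefix: "datastore_", hint: "Resource may not be in the DataStore; inspect it with ckan_package_show." },
  { status: 404, actionPrefix: "package_", hint: "Dataset not found; look it up with ckan_package_search." },
  { status: 400, actionPrefix: "datastore_search_sql", hint: "Check that the column names in the SQL exist." },
  { status: 503, hint: "Portal temporarily unavailable; retry later." },
];

function formatCkanError(error: unknown): string {
  if (error instanceof CkanApiError) {
    const match = HINTS.find(
      (h) =>
        h.status === error.status &&
        (h.actionPrefix === undefined || error.action.startsWith(h.actionPrefix)),
    );
    return match ? `${error.message}\nHint: ${match.hint}` : error.message;
  }
  return error instanceof Error ? error.message : String(error);
}

const notFound = new CkanApiError("CKAN API error (404)", 404, "datastore_search");
console.log(formatCkanError(notFound));
```

Checking `error instanceof CkanApiError && error.status === 500` relies on structured fields rather than on the exact wording of the message, which is the fragility the `organization.ts` change removes.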
+ ## 2026-04-22
+
+ - Security: add optional domain allowlist (`CKAN_ALLOWED_DOMAINS` env var) in `validateServerUrl()`; blocks requests to unlisted public domains (CERT-AgID MCP recommendation)
+ - Security: add structured audit logging to stderr for all `makeCkanRequest` calls in Node modes (stdio + http); fields: `ts`, `server`, `action`, `cache_hit`, query params (CERT-AgID MCP recommendation)
+ - Tests: 8 new tests (4 allowlist, 3 audit log) → 383 total (377 pass, 6 skipped)
+
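Reviewer note: a domain allowlist check of the kind described for `CKAN_ALLOWED_DOMAINS` could be sketched as follows. The log does not document the variable's format or how `validateServerUrl()` applies it, so this example assumes a comma-separated list of exact hostnames and treats an unset or empty value as "no restriction". The helper names `parseAllowlist` and `isHostAllowed` are hypothetical, and the raw value is passed in as a parameter so the sketch does not depend on any runtime's environment API.

```typescript
// Assumption: comma-separated hostnames; unset or empty disables the check.
function parseAllowlist(raw: string | undefined): Set<string> | null {
  if (raw === undefined) return null;
  const hosts = raw
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
  return hosts.length > 0 ? new Set(hosts) : null;
}

// Exact hostname match only; subdomain wildcards are out of scope for this sketch.
function isHostAllowed(serverUrl: string, allowlist: Set<string> | null): boolean {
  if (allowlist === null) return true;
  const host = new URL(serverUrl).hostname.toLowerCase();
  return allowlist.has(host);
}

const allowlist = parseAllowlist("dati.gov.it, data.gov");
console.log(isHostAllowed("https://dati.gov.it/opendata", allowlist)); // true
console.log(isHostAllowed("https://example.org", allowlist)); // false
```

Comparing parsed hostnames rather than substrings of the URL avoids accepting addresses such as `https://dati.gov.it.example.org`.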
+ ## 2026-04-14
+
+ ### v0.4.102
+
+ - Add `cache_hit` field to worker telemetry log entries
+ - `getLastCacheHit()` exported from `src/utils/http.ts`; `worker.ts` logs after response with `cache_hit: true/false`
+ - Enables cache hit rate analysis in `worker_events_flat.jsonl`
+
+ ### v0.4.101
+
+ - Add per-portal upstream rate limiter (`src/utils/rate-limiter.ts`)
+ - Token bucket algorithm, one independent bucket per hostname
+ - Integration in `makeCkanRequest`: cache hit bypasses limiter; only upstream fetches consume tokens
+ - Config: `CKAN_RATE_LIMIT_ENABLED`, `CKAN_RATE_LIMIT_RPS` (default 5), `CKAN_RATE_LIMIT_BURST` (default 10), `CKAN_RATE_LIMIT_MAX_WAIT_MS` (default 5000)
+ - Per-call bypass via `opts.rateLimit: false`; disabled by default in Vitest runs
+ - `RateLimitError` thrown when wait exceeds `maxWaitMs` (includes hostname + wait ms)
+ - 13 new unit tests; suite now 370 tests, all green
+ - OpenSpec: `add-upstream-rate-limiter`
+
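Reviewer note: a per-hostname token bucket like the one listed for v0.4.101 can be sketched as below. The defaults (5 requests per second, burst of 10, 5000 ms maximum wait) come from the log; everything else is an assumption, including the `HostRateLimiter` class, its `reserve()` method, the injectable clock, and the `RateLimitError` constructor. The real limiter presumably also performs the wait itself; this sketch only computes it.

```typescript
// Assumed shape: the log says the error includes the hostname and the wait in ms.
class RateLimitError extends Error {
  constructor(public readonly hostname: string, public readonly waitMs: number) {
    super(`Rate limit for ${hostname}: required wait of ${waitMs} ms exceeds the maximum`);
    this.name = "RateLimitError";
    Object.setPrototypeOf(this, RateLimitError.prototype);
  }
}

interface Bucket {
  tokens: number; // may go negative while callers are waiting for their slot
  last: number; // timestamp of the last refill, in ms
}

class HostRateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private rps = 5,
    private burst = 10,
    private maxWaitMs = 5000,
    private now: () => number = Date.now, // injectable clock for tests
  ) {}

  // Reserves one token for `hostname` and returns how long the caller should
  // wait before sending the request. Throws if that wait exceeds maxWaitMs.
  reserve(hostname: string): number {
    const t = this.now();
    const bucket = this.buckets.get(hostname) ?? { tokens: this.burst, last: t };

    // Refill in proportion to elapsed time, capped at the burst size.
    bucket.tokens = Math.min(this.burst, bucket.tokens + ((t - bucket.last) / 1000) * this.rps);
    bucket.last = t;
    this.buckets.set(hostname, bucket);

    const waitMs = bucket.tokens >= 1 ? 0 : Math.ceil(((1 - bucket.tokens) * 1000) / this.rps);
    if (waitMs > this.maxWaitMs) {
      throw new RateLimitError(hostname, waitMs);
    }
    bucket.tokens -= 1;
    return waitMs;
  }
}

const limiter = new HostRateLimiter();
console.log(limiter.reserve("dati.gov.it")); // 0: a new bucket starts full
```

Keeping one bucket per hostname means a burst against one portal does not delay requests to another, and serving cache hits before calling `reserve()` would match the log's note that only upstream fetches consume tokens.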
+ ### v0.4.100
+
+ - Add read-through HTTP cache layer in `makeCkanRequest` (`src/utils/cache.ts`)
+ - Backends: Cloudflare Cache API on Workers, bounded in-memory LRU on Node
+ - Action-based TTL: metadata 300s, datastore 60s, status 3600s; fallback configurable
+ - Env vars: `CKAN_CACHE_ENABLED`, `CKAN_CACHE_TTL_DEFAULT`, `CKAN_CACHE_MAX_ENTRIES`, `CKAN_CACHE_MAX_ENTRY_BYTES`
+ - Per-call bypass via `makeCkanRequest(..., { cache: false })`; errors and oversize payloads never cached
+ - 29 new unit tests; suite now 357 tests, all green
+ - Measured locally: 348 ms cache miss → 13 ms cache hit on identical `package_search` calls (~27×)
+ - OpenSpec: `add-http-cache-layer` (proposal + design + spec)
+
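Reviewer note: the Node-side behaviour described for v0.4.100 (bounded in-memory LRU, TTL chosen by action, read-through on miss) can be illustrated with the sketch below. The TTL values are the ones in the log; the mapping from action names to the "metadata", "datastore", and "status" classes is a guess, and `LruTtlCache`, `ttlFor`, and `readThrough` are hypothetical names rather than exports of `src/utils/cache.ts`. The oversize-payload check and the Cloudflare backend are omitted.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // absolute time in ms
}

// Bounded LRU built on Map insertion order: the first key is the least recently used.
class LruTtlCache<T> {
  private map = new Map<string, Entry<T>>();

  constructor(private maxEntries: number, private now: () => number = Date.now) {}

  get(key: string): T | undefined {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    this.map.delete(key);
    if (entry.expiresAt <= this.now()) return undefined; // expired: drop it
    this.map.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key: string, value: T, ttlSeconds: number): void {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}

// TTLs from the log; which actions fall in each class is an assumption.
function ttlFor(action: string, fallbackSeconds: number): number {
  if (action === "status_show") return 3600;
  if (action.startsWith("datastore_")) return 60;
  if (/^(package|organization|group)_/.test(action)) return 300;
  return fallbackSeconds;
}

// Read-through: return a cached value if present, otherwise fetch and store it.
async function readThrough<T>(
  cache: LruTtlCache<T>,
  action: string,
  key: string,
  fetcher: () => Promise<T>,
  fallbackSeconds: number,
): Promise<T> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await fetcher(); // a rejected fetch propagates and is never cached
  cache.set(key, value, ttlFor(action, fallbackSeconds));
  return value;
}
```

Because the store happens only after `fetcher()` resolves, failed upstream calls leave the cache untouched, which is consistent with the log's statement that errors are never cached.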
  ## 2026-04-09
 
  ### v0.4.99