@aborruso/ckan-mcp-server 0.4.98 → 0.4.105
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LOG.md +71 -0
- package/dist/index.js +442 -87
- package/dist/worker.js +207 -185
- package/package.json +1 -1
package/LOG.md
CHANGED
@@ -1,5 +1,76 @@
 # LOG

+## 2026-05-25
+
+### v0.4.105
+
+- Fix (scoring): `ckan_find_relevant_datasets` now scores `holder_name` (DCAT-AP_IT `dct:rightsHolder`) and `publisher_name` (`dct:publisher`) as distinct weighted fields, separate from `organization`
+- Rationale: on federated catalogs (e.g. `dati.gov.it`, but the pattern applies to any portal harvesting from sub-publishers), `organization` is the harvesting catalog (e.g. `regione-puglia`), NOT the data owner. Queries like "datasets from Comune di Lecce" previously scored 0 on the owner field when the dataset was harvested via Regione Puglia or a local action group, missing the actual `rightsHolder`
+- The fields are read from `extras[]` (the authoritative DCAT-AP_IT location on Italian portals) with fallback to root-level. On dati.gov.it, `package_search` exposes `holder_name` and `publisher_name` both in `extras[]` (correct DCAT values) and at root (often overwritten by the harvester with the organization name); reading only the root would be wrong. The root-level fallback preserves correct behavior on non-DCAT-AP_IT portals (data.gov, open.canada.ca)
+- Bug surfaced on real-world Puglia datasets: `defibrillatori-esterni` (extras.holder=Comune di Mesagne, root.holder=GAL Terra dei Messapi, organization=GAL Terra dei Messapi) and `defibrillatori-dae-progetto-comune-cardioprotetto` (extras.holder=Comune di Lecce, organization=Regione Puglia)
+- Defaults: `holder=4` (peer with `title` — actual institutional owner per DCAT-AP_IT), `publisher=2` (lower because sometimes a technical role like "Redazione OD" rather than the institution)
+- API: `weights` object accepts two new optional fields (`holder`, `publisher`); backward-compatible — clients not setting them get the improved scoring by default
+- Types: added `holder_name?: string` and `publisher_name?: string` to `CkanPackage` interface (previously accessed via index signature)
+- Added internal helper `readDcatExtra(dataset, key)` that encapsulates the extras-first, root-fallback lookup
+- Score breakdown markdown and JSON outputs include `holder` and `publisher` per dataset
+- Validated against live `package_search` responses on dati.gov.it: defibrillatori-esterni (Mesagne) 6 → 12, comune-cardioprotetto (Lecce) 10 → 13
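The extras-first, root-fallback lookup described in the v0.4.105 entry can be sketched roughly as follows. This is an illustration, not the package's code: the `Dataset` shape, the `matches` helper and `ownerScore` are assumptions made for the example. Only the helper name `readDcatExtra`, the field names and the default weights (`holder=4`, `publisher=2`) come from the changelog.

```typescript
// Hypothetical, trimmed-down dataset shape; the real CkanPackage has many more fields.
interface CkanExtra { key: string; value: string }
interface Dataset {
  organization?: { title?: string };
  holder_name?: string;
  publisher_name?: string;
  extras?: CkanExtra[];
}

type DcatKey = "holder_name" | "publisher_name";

// Extras-first, root-fallback lookup, as the changelog describes it.
function readDcatExtra(dataset: Dataset, key: DcatKey): string | undefined {
  const fromExtras = dataset.extras?.find((e) => e.key === key)?.value;
  if (fromExtras !== undefined && fromExtras.trim() !== "") return fromExtras;
  return dataset[key]; // root-level fallback for portals without DCAT-AP_IT extras
}

// Default weights stated in the changelog.
const weights = { holder: 4, publisher: 2 };

// Toy matcher (assumption): 1 if every query term occurs in the field, else 0.
function matches(field: string | undefined, query: string): number {
  if (!field) return 0;
  const haystack = field.toLowerCase();
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return terms.length > 0 && terms.every((t) => haystack.includes(t)) ? 1 : 0;
}

function ownerScore(dataset: Dataset, query: string): { holder: number; publisher: number } {
  return {
    holder: weights.holder * matches(readDcatExtra(dataset, "holder_name"), query),
    publisher: weights.publisher * matches(readDcatExtra(dataset, "publisher_name"), query),
  };
}

// Mirrors the `defibrillatori-esterni` case: the root value was overwritten by the harvester.
const defibrillatori: Dataset = {
  organization: { title: "GAL Terra dei Messapi" },
  holder_name: "GAL Terra dei Messapi",
  extras: [{ key: "holder_name", value: "Comune di Mesagne" }],
};
const score = ownerScore(defibrillatori, "Comune di Mesagne");
```

Reading only the root field here would credit the harvesting organization. The extras-first order is what lets a query for the actual rights holder score on the `holder` field.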
+
+## 2026-05-20
+
+### v0.4.104
+
+- Source portal DataStore fallback + LLM error hints (see 2026-05-18 entries below)
+
+## 2026-05-18
+
+- Source portal DataStore fallback: `ckan_list_resources` now probes the source portal when a resource has `datastore_active=false` and its download URL belongs to a different CKAN instance (harvested dataset pattern). Adds `source_datastore_active` and `source_portal_url` fields to output. New `check_source_portal` parameter (default `true`) to skip extra HTTP calls. New `extractSourcePortal()` utility in `url-generator.ts`. Scoped to CKAN-to-CKAN harvesting (detects `/resource/{uuid}/` URL pattern). 12 new tests → 399 total.
+- LLM error hints: add `CkanApiError` class to `makeCkanRequest` (carries `status` + `action`); add `formatCkanError()` with hint table mapping HTTP status/action → actionable suggestion for the LLM (404 datastore → `ckan_package_show`, 404 package → `ckan_package_search`, 400 SQL → check columns, 503 → retry, etc.)
+- Replace raw `error.message` interpolation in all tool catch blocks (datastore, package, organization, group, analyze, portal-discovery, quality) with `formatCkanError()`
+- Replace fragile string-match in `organization.ts` (`includes('CKAN API error (500)')`) with `error instanceof CkanApiError && error.status === 500`
+- Tests: 9 new unit tests for `CkanApiError` and `formatCkanError` → 387 total (381 pass, 6 skipped)
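The typed-error and hint-table approach from the 2026-05-18 entries might look something like the sketch below. The constructor signature, the `HINTS` table layout, the action-prefix matching and the exact hint wording are assumptions for illustration. The changelog only confirms that `CkanApiError` carries `status` and `action`, and that `formatCkanError()` maps status/action pairs to suggestions.

```typescript
// Error carrying the HTTP status and the CKAN action (constructor shape is an assumption).
class CkanApiError extends Error {
  readonly status: number;
  readonly action: string;
  constructor(status: number, action: string, message: string) {
    super(message);
    this.name = "CkanApiError";
    this.status = status;
    this.action = action;
    // Keeps `instanceof` reliable when compiled down to older targets.
    Object.setPrototypeOf(this, CkanApiError.prototype);
  }
}

// Hypothetical hint table: first entry whose status (and action prefix, if any) matches wins.
interface Hint { status: number; actionPrefix?: string; hint: string }
const HINTS: Hint[] = [
  { status: 400, actionPrefix: "datastore_search_sql", hint: "Check that the column names used in the SQL exist." },
  { status: 404, actionPrefix: "datastore_", hint: "Resource may not be in the DataStore; inspect it with ckan_package_show." },
  { status: 404, actionPrefix: "package_", hint: "Dataset not found; look it up with ckan_package_search." },
  { status: 503, hint: "Portal temporarily unavailable; retry later." },
];

function formatCkanError(error: unknown): string {
  if (error instanceof CkanApiError) {
    const { status, action, message } = error;
    const match = HINTS.find(
      (h) => h.status === status && (h.actionPrefix === undefined || action.startsWith(h.actionPrefix)),
    );
    const base = `CKAN API error (${status}) on ${action}: ${message}`;
    return match ? `${base}\nHint: ${match.hint}` : base;
  }
  // Non-CKAN errors pass through with their plain message.
  return error instanceof Error ? error.message : String(error);
}
```

A catch block can then branch on the type rather than on message text, for example `error instanceof CkanApiError && error.status === 500`, which is the check the changelog says replaced the string match in `organization.ts`.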
+
+## 2026-04-22
+
+- Security: add optional domain allowlist (`CKAN_ALLOWED_DOMAINS` env var) in `validateServerUrl()`; blocks requests to unlisted public domains — CERT-AgID MCP recommendation
+- Security: add structured audit logging to stderr for all `makeCkanRequest` calls in Node modes (stdio + http); fields: `ts`, `server`, `action`, `cache_hit`, query params — CERT-AgID MCP recommendation
+- Tests: 8 new tests (4 allowlist, 3 audit log) → 383 total (377 pass, 6 skipped)
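As a rough illustration of the optional allowlist idea, the sketch below checks a server URL against a list of permitted hostnames. The comma-separated format, the exact-hostname matching and both function names are assumptions; the changelog only states that `CKAN_ALLOWED_DOMAINS` exists and is enforced in `validateServerUrl()`.

```typescript
// Parse a raw env value such as "dati.gov.it, data.gov" (comma-separated format is an assumption).
function parseAllowedDomains(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
}

// Hypothetical check: exact hostname match against the allowlist.
function isServerAllowed(serverUrl: string, allowed: string[]): boolean {
  // The allowlist is optional: when it is empty, no restriction applies.
  if (allowed.length === 0) return true;
  const host = new URL(serverUrl).hostname.toLowerCase();
  return allowed.includes(host);
}

const allowed = parseAllowedDomains("dati.gov.it, data.gov");
const okPortal = isServerAllowed("https://dati.gov.it/opendata", allowed);
const blockedPortal = isServerAllowed("https://example.org", allowed);
```

Whether subdomains of a listed domain should also pass is a policy decision this sketch leaves out.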
+
+## 2026-04-14
+
+### v0.4.102
+
+- Add `cache_hit` field to worker telemetry log entries
+- `getLastCacheHit()` exported from `src/utils/http.ts`; `worker.ts` logs after response with `cache_hit: true/false`
+- Enables cache hit rate analysis in `worker_events_flat.jsonl`
+
+### v0.4.101
+
+- Add per-portal upstream rate limiter (`src/utils/rate-limiter.ts`)
+- Token bucket algorithm, one independent bucket per hostname
+- Integration in `makeCkanRequest`: cache hit bypasses limiter; only upstream fetches consume tokens
+- Config: `CKAN_RATE_LIMIT_ENABLED`, `CKAN_RATE_LIMIT_RPS` (default 5), `CKAN_RATE_LIMIT_BURST` (default 10), `CKAN_RATE_LIMIT_MAX_WAIT_MS` (default 5000)
+- Per-call bypass via `opts.rateLimit: false`; disabled by default in Vitest runs
+- `RateLimitError` thrown when wait exceeds `maxWaitMs` (includes hostname + wait ms)
+- 13 new unit tests; suite now 370 tests, all green
+- OpenSpec: `add-upstream-rate-limiter`
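A per-hostname token bucket along the lines of the v0.4.101 entry could be sketched as below. The class name `HostRateLimiter`, the `reserve` method and the explicit `now` argument (used here so the behaviour is deterministic) are assumptions for illustration. The defaults (5 requests per second, burst 10, max wait 5000 ms) and the `RateLimitError` name come from the changelog.

```typescript
// Thrown when the required wait exceeds maxWaitMs; carries hostname and wait, as in the changelog.
class RateLimitError extends Error {
  readonly hostname: string;
  readonly waitMs: number;
  constructor(hostname: string, waitMs: number) {
    super(`Rate limit for ${hostname}: would need to wait ${waitMs} ms`);
    this.name = "RateLimitError";
    this.hostname = hostname;
    this.waitMs = waitMs;
    Object.setPrototypeOf(this, RateLimitError.prototype);
  }
}

interface Bucket { tokens: number; last: number }

// Hypothetical limiter: one independent token bucket per hostname.
class HostRateLimiter {
  private buckets = new Map<string, Bucket>();
  private rps: number;
  private burst: number;
  private maxWaitMs: number;

  constructor(rps = 5, burst = 10, maxWaitMs = 5000) {
    this.rps = rps;
    this.burst = burst;
    this.maxWaitMs = maxWaitMs;
  }

  // Reserves one token and returns how many ms the caller should wait before fetching.
  reserve(hostname: string, now: number): number {
    const bucket = this.buckets.get(hostname) ?? { tokens: this.burst, last: now };
    // Refill in proportion to elapsed time, capped at the burst size.
    bucket.tokens = Math.min(this.burst, bucket.tokens + ((now - bucket.last) / 1000) * this.rps);
    bucket.last = now;
    // Time until one whole token is available (0 if one is available now).
    const waitMs = bucket.tokens >= 1 ? 0 : Math.ceil(((1 - bucket.tokens) * 1000) / this.rps);
    if (waitMs > this.maxWaitMs) throw new RateLimitError(hostname, waitMs);
    bucket.tokens -= 1; // may go negative: the token is borrowed from the future
    this.buckets.set(hostname, bucket);
    return waitMs;
  }
}
```

In a request wrapper, the cache lookup would come first and `reserve` would run only on a miss, so cache hits never consume tokens, matching the integration note in the changelog.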
+
+### v0.4.100
+
+- Add read-through HTTP cache layer in `makeCkanRequest` (`src/utils/cache.ts`)
+- Backends: Cloudflare Cache API on Workers, bounded in-memory LRU on Node
+- Action-based TTL: metadata 300s, datastore 60s, status 3600s; fallback configurable
+- Env vars: `CKAN_CACHE_ENABLED`, `CKAN_CACHE_TTL_DEFAULT`, `CKAN_CACHE_MAX_ENTRIES`, `CKAN_CACHE_MAX_ENTRY_BYTES`
+- Per-call bypass via `makeCkanRequest(..., { cache: false })`; errors and oversize payloads never cached
+- 29 new unit tests; suite now 357 tests, all green
+- Measured locally: 348 ms cache miss → 13 ms cache hit on identical `package_search` calls (~27×)
+- OpenSpec: `add-http-cache-layer` (proposal + design + spec)
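The read-through cache with action-based TTLs and a bounded in-memory LRU (the Node backend in the v0.4.100 entry) could be sketched as follows. The names `ttlForAction`, `LruCache` and `cachedRequest` are hypothetical, as is the mapping of action names to the three TTL classes. The TTL values (300 s, 60 s, 3600 s) are the ones listed in the changelog.

```typescript
// Action-based TTL in seconds; which actions count as "metadata" is an assumption.
function ttlForAction(action: string, fallbackSeconds: number): number {
  if (action.startsWith("datastore_")) return 60;
  if (action === "status_show") return 3600;
  if (/^(package|organization|group|tag)_/.test(action)) return 300;
  return fallbackSeconds;
}

// Bounded LRU built on Map insertion order: the first key is the least recently used.
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= now) return undefined; // expired: drop it
    this.entries.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number, now: number): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Read-through wrapper: serve from cache, otherwise fetch and store.
async function cachedRequest<V>(
  cache: LruCache<V>,
  action: string,
  key: string,
  fetcher: () => Promise<V>,
  now: number = Date.now(),
): Promise<{ value: V; cacheHit: boolean }> {
  const hit = cache.get(key, now);
  if (hit !== undefined) return { value: hit, cacheHit: true };
  const value = await fetcher(); // if this throws, nothing is cached
  cache.set(key, value, ttlForAction(action, 300), now);
  return { value, cacheHit: false };
}
```

Returning a `cacheHit` flag alongside the value is one way to feed the `cache_hit` telemetry field mentioned under v0.4.102. The package itself exposes this through `getLastCacheHit()`, whose internals are not shown in the changelog.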
+
+## 2026-04-09
+
+### v0.4.99
+
+- Fix `resolveSearchQuery`: `*:*` and fielded queries (e.g. `title:X`) are no longer wrapped in `text:(...)` when auto-detected parser override is "text" — `escapeSolrQuery` was turning `*:*` into `\*\:\*`, returning 0 results on portals like dati.comune.messina.it
+
 ## 2026-04-05

 - Fix `ckan_package_search`: quoted phrases (e.g. `"aree protette"`) now work inside `text:(...)` wrapping — removed `"` from `escapeSolrQuery` special chars so phrase queries are preserved
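The two Solr query fixes (v0.4.99 and 2026-04-05) can be illustrated together with a small sketch. The function names match the changelog, but the bodies, the parser type and the fielded-query regex are assumptions for illustration, not the package's implementation.

```typescript
// Escape Solr special characters, deliberately leaving double quotes alone
// so that phrase queries such as "aree protette" survive the wrapping.
function escapeSolrQuery(query: string): string {
  return query.replace(/[+\-!(){}[\]^~*?:\\\/]|&&|\|\|/g, "\\$&");
}

// Hypothetical resolver: wrap in text:(...) only for plain free-text queries.
function resolveSearchQuery(query: string, parser: "default" | "text"): string {
  const q = query.trim();
  const isMatchAll = q === "*:*";
  // Heuristic (assumption): a leading `field:` prefix marks a fielded query, e.g. title:X.
  const isFielded = /^[A-Za-z_][\w.]*:/.test(q);
  if (parser !== "text" || isMatchAll || isFielded) return q;
  return `text:(${escapeSolrQuery(q)})`;
}

const matchAll = resolveSearchQuery("*:*", "text");
const fielded = resolveSearchQuery("title:acqua", "text");
const phrase = resolveSearchQuery('"aree protette"', "text");
const plain = resolveSearchQuery("acqua potabile", "text");
```

Without the match-all and fielded-query guards, `*:*` would pass through `escapeSolrQuery` and come out as `\*\:\*`, which is the zero-result behaviour the v0.4.99 entry describes.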