@aborruso/ckan-mcp-server 0.4.17 → 0.4.18

Files changed (87)
  1. package/LOG.md +59 -0
  2. package/README.md +104 -34
  3. package/dist/index.js +161 -45
  4. package/dist/worker.js +42 -42
  5. package/package.json +12 -1
  6. package/.devin/wiki.json +0 -273
  7. package/CLAUDE.md +0 -398
  8. package/PRD.md +0 -999
  9. package/REFACTORING.md +0 -238
  10. package/examples/langgraph/01_basic_workflow.py +0 -277
  11. package/examples/langgraph/02_data_exploration.py +0 -366
  12. package/examples/langgraph/README.md +0 -719
  13. package/examples/langgraph/metadata_quality.py +0 -299
  14. package/examples/langgraph/requirements.txt +0 -12
  15. package/examples/langgraph/setup.sh +0 -32
  16. package/examples/langgraph/test_setup.py +0 -106
  17. package/openspec/AGENTS.md +0 -456
  18. package/openspec/changes/add-ckan-analyze-dataset-structure/proposal.md +0 -17
  19. package/openspec/changes/add-ckan-analyze-dataset-structure/specs/ckan-insights/spec.md +0 -7
  20. package/openspec/changes/add-ckan-analyze-dataset-structure/tasks.md +0 -6
  21. package/openspec/changes/add-ckan-analyze-dataset-updates/proposal.md +0 -17
  22. package/openspec/changes/add-ckan-analyze-dataset-updates/specs/ckan-insights/spec.md +0 -7
  23. package/openspec/changes/add-ckan-analyze-dataset-updates/tasks.md +0 -6
  24. package/openspec/changes/add-ckan-audit-tool/proposal.md +0 -17
  25. package/openspec/changes/add-ckan-audit-tool/specs/ckan-insights/spec.md +0 -7
  26. package/openspec/changes/add-ckan-audit-tool/tasks.md +0 -6
  27. package/openspec/changes/add-ckan-dataset-insights/proposal.md +0 -17
  28. package/openspec/changes/add-ckan-dataset-insights/specs/ckan-insights/spec.md +0 -7
  29. package/openspec/changes/add-ckan-dataset-insights/tasks.md +0 -6
  30. package/openspec/changes/add-ckan-host-allowlist-env/design.md +0 -38
  31. package/openspec/changes/add-ckan-host-allowlist-env/proposal.md +0 -16
  32. package/openspec/changes/add-ckan-host-allowlist-env/specs/ckan-request-allowlist/spec.md +0 -15
  33. package/openspec/changes/add-ckan-host-allowlist-env/specs/cloudflare-deployment/spec.md +0 -11
  34. package/openspec/changes/add-ckan-host-allowlist-env/tasks.md +0 -12
  35. package/openspec/changes/add-escape-text-query/proposal.md +0 -12
  36. package/openspec/changes/add-escape-text-query/specs/ckan-search/spec.md +0 -11
  37. package/openspec/changes/add-escape-text-query/tasks.md +0 -8
  38. package/openspec/changes/add-mqa-quality-tool/proposal.md +0 -21
  39. package/openspec/changes/add-mqa-quality-tool/specs/ckan-quality/spec.md +0 -71
  40. package/openspec/changes/add-mqa-quality-tool/tasks.md +0 -29
  41. package/openspec/changes/archive/2026-01-08-add-mcp-resources/design.md +0 -115
  42. package/openspec/changes/archive/2026-01-08-add-mcp-resources/proposal.md +0 -52
  43. package/openspec/changes/archive/2026-01-08-add-mcp-resources/specs/mcp-resources/spec.md +0 -92
  44. package/openspec/changes/archive/2026-01-08-add-mcp-resources/tasks.md +0 -56
  45. package/openspec/changes/archive/2026-01-08-expand-test-coverage-specs/design.md +0 -355
  46. package/openspec/changes/archive/2026-01-08-expand-test-coverage-specs/proposal.md +0 -161
  47. package/openspec/changes/archive/2026-01-08-expand-test-coverage-specs/tasks.md +0 -162
  48. package/openspec/changes/archive/2026-01-08-translate-project-to-english/proposal.md +0 -115
  49. package/openspec/changes/archive/2026-01-08-translate-project-to-english/specs/documentation-language/spec.md +0 -32
  50. package/openspec/changes/archive/2026-01-08-translate-project-to-english/tasks.md +0 -115
  51. package/openspec/changes/archive/2026-01-10-add-ckan-find-relevant-datasets/proposal.md +0 -17
  52. package/openspec/changes/archive/2026-01-10-add-ckan-find-relevant-datasets/specs/ckan-insights/spec.md +0 -7
  53. package/openspec/changes/archive/2026-01-10-add-ckan-find-relevant-datasets/tasks.md +0 -6
  54. package/openspec/changes/archive/2026-01-10-add-cloudflare-workers/design.md +0 -734
  55. package/openspec/changes/archive/2026-01-10-add-cloudflare-workers/proposal.md +0 -183
  56. package/openspec/changes/archive/2026-01-10-add-cloudflare-workers/specs/cloudflare-deployment/spec.md +0 -389
  57. package/openspec/changes/archive/2026-01-10-add-cloudflare-workers/tasks.md +0 -519
  58. package/openspec/changes/archive/2026-01-15-add-mcp-prompts/proposal.md +0 -13
  59. package/openspec/changes/archive/2026-01-15-add-mcp-prompts/specs/mcp-prompts/spec.md +0 -22
  60. package/openspec/changes/archive/2026-01-15-add-mcp-prompts/tasks.md +0 -10
  61. package/openspec/changes/archive/2026-01-15-add-mcp-resource-filters/proposal.md +0 -13
  62. package/openspec/changes/archive/2026-01-15-add-mcp-resource-filters/specs/mcp-resources/spec.md +0 -38
  63. package/openspec/changes/archive/2026-01-15-add-mcp-resource-filters/tasks.md +0 -10
  64. package/openspec/changes/archive/2026-01-19-update-repo-owner-ondata/proposal.md +0 -13
  65. package/openspec/changes/archive/2026-01-19-update-repo-owner-ondata/specs/repository-metadata/spec.md +0 -14
  66. package/openspec/changes/archive/2026-01-19-update-repo-owner-ondata/tasks.md +0 -12
  67. package/openspec/changes/archive/2026-01-19-update-search-parser-config/proposal.md +0 -13
  68. package/openspec/changes/archive/2026-01-19-update-search-parser-config/specs/ckan-insights/spec.md +0 -11
  69. package/openspec/changes/archive/2026-01-19-update-search-parser-config/specs/ckan-search/spec.md +0 -11
  70. package/openspec/changes/archive/2026-01-19-update-search-parser-config/tasks.md +0 -6
  71. package/openspec/changes/archive/add-automated-tests/design.md +0 -324
  72. package/openspec/changes/archive/add-automated-tests/proposal.md +0 -167
  73. package/openspec/changes/archive/add-automated-tests/specs/automated-testing/spec.md +0 -143
  74. package/openspec/changes/archive/add-automated-tests/tasks.md +0 -132
  75. package/openspec/project.md +0 -115
  76. package/openspec/specs/ckan-insights/spec.md +0 -23
  77. package/openspec/specs/ckan-search/spec.md +0 -16
  78. package/openspec/specs/cloudflare-deployment/spec.md +0 -344
  79. package/openspec/specs/documentation-language/spec.md +0 -32
  80. package/openspec/specs/mcp-prompts/spec.md +0 -26
  81. package/openspec/specs/mcp-resources/spec.md +0 -120
  82. package/openspec/specs/repository-metadata/spec.md +0 -19
  83. package/private/commenti-privati.yaml +0 -14
  84. package/testo.md +0 -12
  85. package/web-gui/PRD.md +0 -158
  86. package/web-gui/public/index.html +0 -883
  87. package/wrangler.toml +0 -6
package/PRD.md DELETED
@@ -1,999 +0,0 @@
1
- # Product Requirements Document (PRD)
2
-
3
- ## CKAN MCP Server
4
-
5
- **Version**: 0.4.12
6
- **Last Updated**: 2026-01-17
7
- **Author**: onData
8
- **Status**: Production
9
-
10
- ---
11
-
12
- ## 1. Executive Summary
13
-
14
- CKAN MCP Server is a Model Context Protocol (MCP) server that enables AI agents (like Claude Desktop) to interact with over 500 CKAN-based open data portals worldwide. The server exposes MCP tools to search datasets, explore organizations, query tabular data, and access complete metadata.
15
-
16
- ### 1.1 Problem Statement
17
-
18
- AI agents lack native capabilities to:
19
- - Discover and search datasets in open data portals
20
- - Query structured metadata from government datasets
21
- - Execute queries on tabular data published on CKAN portals
22
- - Explore public organizations and their open data production
23
-
24
- ### 1.2 Solution
25
-
26
- An MCP server that exposes tools to interact with CKAN API v3, enabling AI agents to:
27
- - Search datasets with advanced Solr queries and relevance ranking
28
- - Get complete metadata for datasets and resources
29
- - Explore organizations, groups, and tags
30
- - Query DataStore with filters, sorting, and SQL queries
31
- - Analyze statistics through faceting
32
- - Global deployment on Cloudflare Workers for worldwide edge access
33
-
34
- **Distribution Strategy**: Multi-platform deployment:
35
- - **npm registry**: Global installation with `npm install -g @aborruso/ckan-mcp-server`
36
- - **Cloudflare Workers**: Global edge deployment (https://ckan-mcp-server.andy-pr.workers.dev)
37
- - **Self-hosted**: HTTP server mode for custom infrastructure
38
-
39
- ---
40
-
41
- ## 2. Target Audience
42
-
43
- ### 2.1 Primary Users
44
-
45
- - **Data Scientist & Analyst**: Research and analysis of public datasets
46
- - **Civic Hacker & Developer**: Application development on open data
47
- - **Researcher & Journalist**: Investigation and analysis of government data
48
- - **Public Administration**: Exploration of open data catalogs
49
-
50
- ### 2.2 AI Agent Use Cases
51
-
52
- - **Claude Desktop**: Native integration via MCP configuration
53
- - **Other MCP clients**: Any MCP protocol-compatible client
54
- - **Automation**: Scripts and workflows requiring CKAN access
55
-
56
- ---
57
-
58
- ## 3. Core Requirements
59
-
60
- ### 3.1 Functional Requirements
61
-
62
- #### FR-1: Dataset Search
63
- - **Priority**: High
64
- - **Description**: Search datasets on any CKAN server using Solr syntax
65
- - **Acceptance Criteria**:
66
- - Full-text query support (q parameter)
67
- - Advanced filters (fq parameter)
68
- - Faceting for statistics (organization, tags, formats)
69
- - Pagination (start/rows)
70
- - Sorting (sort parameter)
71
- - Output in Markdown or JSON format
72
- - **Implementation Status**: ✅ Implemented (`ckan_package_search`)
73
-
74
- #### FR-2: Dataset Details
75
- - **Priority**: High
76
- - **Description**: Get complete metadata for a specific dataset
77
- - **Acceptance Criteria**:
78
- - Search by ID or name
79
- - Basic metadata (title, description, author, license)
80
- - Resource list with details (format, size, URL, DataStore status)
81
- - Organization and tags
82
- - Custom extra fields
83
- - Optional tracking statistics
84
- - **Implementation Status**: ✅ Implemented (`ckan_package_show`)
85
-
86
- #### FR-3: Organization Discovery
87
- - **Priority**: Medium
88
- - **Description**: Explore organizations publishing datasets
89
- - **Acceptance Criteria**:
90
- - List all organizations (with/without full details)
91
- - Search by name pattern
92
- - Sorting and pagination
93
- - Dataset count per organization
94
- - Complete organization details with dataset list
95
- - **Implementation Status**: ✅ Implemented (`ckan_organization_list`, `ckan_organization_show`, `ckan_organization_search`)
96
-
97
- #### FR-4: DataStore Query
98
- - **Priority**: High
99
- - **Description**: Query tabular data in CKAN DataStore with standard queries and SQL
100
- - **Acceptance Criteria**:
101
- - Query by resource_id
102
- - Key-value filters
103
- - Full-text search (q parameter)
104
- - Sorting and field selection
105
- - Pagination (limit/offset)
106
- - Distinct values
107
- - SQL queries with SELECT, WHERE, JOIN, GROUP BY
108
- - **Implementation Status**: ✅ Implemented (`ckan_datastore_search`, `ckan_datastore_search_sql`)
109
-
110
- #### FR-5: Tag Management
111
- - **Priority**: Medium
112
- - **Description**: Explore available tags in CKAN portals
113
- - **Acceptance Criteria**:
114
- - List all tags with dataset count
115
- - Search by name pattern
116
- - Pagination and sorting
117
- - Faceting with vocabularies
118
- - **Implementation Status**: ✅ Implemented (`ckan_tag_list`)
119
-
120
- #### FR-6: Group Management
121
- - **Priority**: Medium
122
- - **Description**: Explore thematic groups in CKAN portals
123
- - **Acceptance Criteria**:
124
- - List all groups
125
- - Search by pattern
126
- - Group details with included datasets
127
- - Sorting and pagination
128
- - **Implementation Status**: ✅ Implemented (`ckan_group_list`, `ckan_group_show`, `ckan_group_search`)
129
-
130
- #### FR-7: AI-Powered Dataset Discovery
131
- - **Priority**: High
132
- - **Description**: Search datasets with AI-based relevance ranking
133
- - **Acceptance Criteria**:
134
- - Natural language queries
135
- - Scoring based on title/description/tags match
136
- - Automatic relevance ranking
137
- - Output with score visibility
138
- - **Implementation Status**: ✅ Implemented (`ckan_find_relevant_datasets`)
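Reviewer note: FR-7 describes scoring on title, description, and tag matches, but gives no formula. The sketch below shows one way such a ranking could work. The weights, the whitespace tokenizer, and the names `DatasetSummary`, `scoreDataset`, and `rankDatasets` are all assumptions made for illustration; they are not taken from `ckan_find_relevant_datasets`.

```typescript
// Hypothetical relevance scoring: the weights and tokenization are
// assumptions, not the values used by the package.
interface DatasetSummary {
  name: string;
  title: string;
  notes: string; // CKAN keeps the dataset description in `notes`
  tags: string[];
}

const WEIGHTS = { title: 4, tags: 2, notes: 1 }; // assumed weights

// Lowercase and split on whitespace, dropping empty tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

// Sum the weight of every field in which each query term appears.
function scoreDataset(query: string, ds: DatasetSummary): number {
  const title = new Set(tokenize(ds.title));
  const notes = new Set(tokenize(ds.notes));
  const tags = new Set<string>();
  for (const tag of ds.tags) {
    for (const token of tokenize(tag)) tags.add(token);
  }
  let score = 0;
  for (const term of tokenize(query)) {
    if (title.has(term)) score += WEIGHTS.title;
    if (tags.has(term)) score += WEIGHTS.tags;
    if (notes.has(term)) score += WEIGHTS.notes;
  }
  return score;
}

// Return dataset names with scores, highest score first.
function rankDatasets(
  query: string,
  datasets: DatasetSummary[]
): Array<{ name: string; score: number }> {
  return datasets
    .map((ds) => ({ name: ds.name, score: scoreDataset(query, ds) }))
    .sort((a, b) => b.score - a.score);
}
```

Returning the score with each name matches the "output with score visibility" criterion, so a client can see why one dataset outranks another.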
139
-
140
- #### FR-8: Server Status Check
141
- - **Priority**: Low
142
- - **Description**: Check availability and version of a CKAN server
143
- - **Acceptance Criteria**:
144
- - Server connection verification
145
- - CKAN version information
146
- - Site title and URL
147
- - **Implementation Status**: ✅ Implemented (`ckan_status_show`)
148
-
149
- ### 3.2 Non-Functional Requirements
150
-
151
- #### NFR-1: Performance
152
- - **Response Time**: HTTP timeout at 30 seconds
153
- - **Throughput**: Limited by remote CKAN server APIs
154
- - **Scalability**:
155
- - Stateless, can handle multiple parallel requests
156
- - Cloudflare Workers: global edge deployment with cold start < 60ms
157
- - Workers free tier: 100,000 requests/day
158
- - **Bundle Size**: ~420KB (135KB gzipped)
159
-
160
- #### NFR-2: Reliability
161
- - **Error Handling**:
162
- - HTTP error management (404, 500, timeout)
163
- - Input validation with Zod strict schemas
164
- - Descriptive error messages
165
- - **Availability**: Depends on remote CKAN server availability
166
-
167
- #### NFR-3: Usability
168
- - **Output Format**:
169
- - Markdown for human readability (default)
170
- - JSON for programmatic processing
171
- - **Character Limit**: Automatic truncation at 50,000 characters
172
- - **Documentation**:
173
- - Complete README with examples
174
- - EXAMPLES.md with advanced use cases
175
- - HTML readme on worker root endpoint
176
- - Complete deployment guide
177
-
178
- #### NFR-4: Compatibility
179
- - **CKAN Versions**: API v3 (compatible with CKAN 2.x and 3.x)
180
- - **Node.js**: >= 18.0.0 (for local installation)
181
- - **Transport Modes**:
182
- - stdio (default) for local integration
183
- - HTTP for remote access
184
- - Cloudflare Workers for global edge deployment
185
- - **Runtimes**:
186
- - Node.js (local/self-hosted)
187
- - Cloudflare Workers (browser runtime, Web Standards API)
188
-
189
- #### NFR-5: Security
190
- - **Authentication**: Not supported (public endpoints only)
191
- - **Read-Only**: All tools are read-only, no data modification
192
- - **Input Validation**: Strict schema validation with Zod
193
-
194
- ---
195
-
196
- ## 4. Technical Architecture
197
-
198
- ### 4.1 Technology Stack
199
-
200
- **Runtime**:
201
- - Node.js >= 18.0.0 (local/self-hosted)
202
- - Cloudflare Workers (browser runtime, edge deployment)
203
- - TypeScript (ES2022)
204
-
205
- **Dependencies**:
206
- - `@modelcontextprotocol/sdk@^1.0.4` - MCP protocol implementation
207
- - `axios@^1.7.2` - HTTP client
208
- - `zod@^3.23.8` - Schema validation
209
- - `express@^4.19.2` - HTTP server (HTTP mode, optional)
210
-
211
- **Build Tools**:
212
- - `esbuild@^0.27.2` - Ultra-fast bundler (~50ms)
213
- - `typescript@^5.4.5` - Type checking and editor support
214
- - `wrangler@^4.58.0` - Cloudflare Workers CLI
215
-
216
- **Test Framework**:
217
- - `vitest@^4.0.16` - Test runner (191 tests, 100% passing)
218
-
219
- ### 4.2 Architecture Diagram
220
-
221
- ```
222
- ┌─────────────────────────────────────────────────────┐
223
- │ MCP Client │
224
- │ (Claude Desktop, etc.) │
225
- └─────────────┬───────────────────────────────────────┘
226
-
227
- │ MCP Protocol (stdio, HTTP, or Workers)
228
-
229
- ┌─────────────▼───────────────────────────────────────┐
230
- │ CKAN MCP Server │
231
- │ (Node.js or Workers runtime) │
232
- │ ┌───────────────────────────────────────────────┐ │
233
- │ │ MCP Tool Registry (13 tools) │ │
234
- │ │ - ckan_package_search │ │
235
- │ │ - ckan_package_show │ │
236
- │ │ - ckan_find_relevant_datasets │ │
237
- │ │ - ckan_organization_list/show/search │ │
238
- │ │ - ckan_group_list/show/search │ │
239
- │ │ - ckan_tag_list │ │
240
- │ │ - ckan_datastore_search │ │
241
- │ │ - ckan_datastore_search_sql │ │
242
- │ │ - ckan_status_show │ │
243
- │ └───────────┬───────────────────────────────────┘ │
244
- │ │ │
245
- │ ┌───────────▼───────────────────────────────────┐ │
246
- │ │ HTTP Client (axios/fetch) │ │
247
- │ │ - Timeout: 30s │ │
248
- │ │ - User-Agent: CKAN-MCP-Server/0.4.x │ │
249
- │ │ - Portal config with search parser override │ │
250
- │ └───────────┬───────────────────────────────────┘ │
251
- └──────────────┼───────────────────────────────────────┘
252
-
253
- │ HTTPS
254
-
255
- ┌──────────────▼───────────────────────────────────────┐
256
- │ CKAN Servers (worldwide) │
257
- │ - dati.gov.it (IT) │
258
- │ - data.gov (US) │
259
- │ - open.canada.ca (CA) │
260
- │ - data.gov.uk (UK) │
261
- │ - data.europa.eu (EU) │
262
- │ - 500+ other CKAN portals │
263
- └──────────────────────────────────────────────────────┘
264
- ```
265
-
266
- ### 4.3 Component Description
267
-
268
- #### MCP Tool Registry
269
- Registers available MCP tools with:
270
- - Input schema (Zod validation)
271
- - Output format (Markdown/JSON)
272
- - MCP annotations (readonly, idempotent, etc.)
273
- - Handler function
274
-
275
- #### HTTP Client Layer
276
- - Normalizes server URL (removes trailing slash)
277
- - Builds API endpoint: `{server_url}/api/3/action/{action}`
278
- - Handles timeout and errors
279
- - Validates response (`success: true`)
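Reviewer note: the HTTP client steps listed above (normalize the URL, build the action endpoint, check `success: true`) can be sketched as two small pure functions. The names `buildActionUrl`, `unwrapResult`, and `CkanEnvelope` are hypothetical and not taken from the package source; the endpoint pattern and the `success` flag come from the text above.

```typescript
// Shape of a CKAN Action API response envelope (only the fields used here).
interface CkanEnvelope<T> {
  success: boolean;
  result?: T;
  error?: { message?: string };
}

// Strip trailing slashes, then append the API v3 action path.
function buildActionUrl(serverUrl: string, action: string): string {
  const base = serverUrl.replace(/\/+$/, "");
  return `${base}/api/3/action/${action}`;
}

// Return the payload, or throw when the server did not report success.
function unwrapResult<T>(body: CkanEnvelope<T>): T {
  if (body.success !== true || body.result === undefined) {
    throw new Error(body.error?.message ?? "CKAN API returned success: false");
  }
  return body.result;
}
```

Keeping these steps free of network calls makes them easy to test offline; the actual request and the 30-second timeout would wrap around them.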
280
-
281
- #### Output Formatter
282
- - Markdown: Tables, sections, formatting for readability
283
- - JSON: Structured output with `structuredContent`
284
- - Truncation: Limits output to CHARACTER_LIMIT (50000)
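Reviewer note: a minimal sketch of the truncation step, assuming the limit counts the final string including a trailing notice. The function name `truncateOutput` and the notice wording are assumptions; only the `CHARACTER_LIMIT` value of 50000 comes from the text above.

```typescript
const CHARACTER_LIMIT = 50000; // value stated in the PRD

// Cut the text so the result, including the notice, never exceeds the limit.
function truncateOutput(text: string, limit: number = CHARACTER_LIMIT): string {
  if (text.length <= limit) return text;
  const notice = "\n\n[Output truncated]"; // assumed wording
  const keep = Math.max(0, limit - notice.length);
  return text.slice(0, keep) + notice;
}
```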
285
-
286
- ---
287
-
288
- ## 5. MCP Tools Specification
289
-
290
- ### 5.1 ckan_package_search
291
-
292
- **Purpose**: Search datasets with advanced Solr queries
293
-
294
- **Input Parameters**:
295
- ```typescript
296
- {
297
- server_url: string (required) // CKAN server base URL
298
- q: string (default: "*:*") // Solr query
299
- fq?: string // Filter query
300
- rows: number (default: 10) // Results per page (max 1000)
301
- start: number (default: 0) // Pagination offset
302
- sort?: string // E.g.: "metadata_modified desc"
303
- facet_field?: string[] // Fields for faceting
304
- facet_limit: number (default: 50) // Max values per facet
305
- include_drafts: boolean (default: false)
306
- response_format: "markdown" | "json" (default: "markdown")
307
- }
308
- ```
309
-
310
- **Output**:
311
- - Total results count
312
- - Array of datasets with basic metadata
313
- - Facets (if requested)
314
- - Pagination links
315
-
316
- **Solr Query Examples**:
317
- - `q: "popolazione"` - Full-text search
318
- - `q: "title:covid"` - Search in field
319
- - `q: "tags:sanità"` - Tag filter
320
- - `fq: "organization:comune-palermo"` - Organization filter
321
- - `fq: "res_format:CSV"` - Resource format filter
322
-
323
- ### 5.2 ckan_package_show
324
-
325
- **Purpose**: Complete details of a dataset
326
-
327
- **Input Parameters**:
328
- ```typescript
329
- {
330
- server_url: string (required)
331
- id: string (required) // Dataset ID or name
332
- include_tracking: boolean (default: false)
333
- response_format: "markdown" | "json"
334
- }
335
- ```
336
-
337
- **Output**:
338
- - Complete metadata (title, description, author, license)
339
- - Organization
340
- - Tags and groups
341
- - Resource list with details (format, size, URL, DataStore status)
342
- - Custom extra fields
343
-
344
- ### 5.3 ckan_organization_list
345
-
346
- **Purpose**: List organizations
347
-
348
- **Input Parameters**:
349
- ```typescript
350
- {
351
- server_url: string (required)
352
- all_fields: boolean (default: false)
353
- sort: string (default: "name asc")
354
- limit: number (default: 100) // 0 for count only
355
- offset: number (default: 0)
356
- response_format: "markdown" | "json"
357
- }
358
- ```
359
-
360
- **Output**:
361
- - Array of organizations (names or complete objects)
362
- - If `limit=0`: count of organizations with datasets
363
-
364
- ### 5.4 ckan_organization_show
365
-
366
- **Purpose**: Specific organization details
367
-
368
- **Input Parameters**:
369
- ```typescript
370
- {
371
- server_url: string (required)
372
- id: string (required) // Organization ID or name
373
- include_datasets: boolean (default: true)
374
- include_users: boolean (default: false)
375
- response_format: "markdown" | "json"
376
- }
377
- ```
378
-
379
- **Output**:
380
- - Organization details
381
- - Dataset list (optional)
382
- - User list with roles (optional)
383
-
384
- ### 5.5 ckan_organization_search
385
-
386
- **Purpose**: Search organizations by pattern
387
-
388
- **Input Parameters**:
389
- ```typescript
390
- {
391
- server_url: string (required)
392
- pattern: string (required) // Pattern (automatic wildcards)
393
- response_format: "markdown" | "json"
394
- }
395
- ```
396
-
397
- **Output**:
398
- - List of matching organizations
399
- - Dataset count per organization
400
- - Total datasets
401
-
402
- **Implementation**: Uses `package_search` with `organization:*{pattern}*` and faceting
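Reviewer note: the implementation line above mentions `package_search` with a wildcard `organization:*{pattern}*` query plus faceting. A sketch of how such request parameters might be assembled follows. Placing the wildcard in `q`, the `rows` value of 0, and the facet limit of 500 are assumptions, as are the helper names; the PRD does not state these details.

```typescript
// Build package_search parameters for an organization pattern search.
// Assumption: the wildcard goes in `q` and only facets are requested.
function buildOrganizationSearchParams(pattern: string): Record<string, string> {
  return {
    q: `organization:*${pattern}*`,
    rows: "0", // facet counts only, no dataset records
    "facet.field": JSON.stringify(["organization"]), // sent as a JSON-encoded list
    "facet.limit": "500", // assumed cap on returned organizations
  };
}

// Encode a parameter map as a URL query string.
function toQueryString(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
}
```

The organization facet in the response then carries the dataset count per matching organization, which is what the output list above describes.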
403
-
404
- ### 5.6 ckan_datastore_search
405
-
406
- **Purpose**: Query tabular data in DataStore
407
-
408
- **Input Parameters**:
409
- ```typescript
410
- {
411
- server_url: string (required)
412
- resource_id: string (required)
413
- q?: string // Full-text search
414
- filters?: Record<string, any> // Key-value filters
415
- limit: number (default: 100) // Max 32000
416
- offset: number (default: 0)
417
- fields?: string[] // Fields to return
418
- sort?: string // E.g.: "anno desc"
419
- distinct: boolean (default: false)
420
- response_format: "markdown" | "json"
421
- }
422
- ```
423
-
424
- **Output**:
425
- - Total records count
426
- - Fields metadata (type, id)
427
- - Records (max 50 in markdown for readability)
428
- - Pagination info
429
-
430
- **Limitations**:
431
- - Not all resources have active DataStore
432
- - Max 32000 records per query
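Reviewer note: because a single DataStore query is capped at 32000 records, reading a larger resource means paging with `limit` and `offset`. The helper below, with the hypothetical name `planPages`, only computes the page plan; issuing each `ckan_datastore_search` call is left to the caller.

```typescript
const MAX_DATASTORE_LIMIT = 32000; // per-query cap stated in the PRD

// Split `total` records into limit/offset pairs no larger than the cap.
function planPages(
  total: number,
  pageSize: number = MAX_DATASTORE_LIMIT
): Array<{ limit: number; offset: number }> {
  const size = Math.min(Math.max(1, Math.floor(pageSize)), MAX_DATASTORE_LIMIT);
  const pages: Array<{ limit: number; offset: number }> = [];
  for (let offset = 0; offset < total; offset += size) {
    pages.push({ limit: Math.min(size, total - offset), offset });
  }
  return pages;
}
```

The total can be taken from the record count returned by the first query.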
433
-
434
- ### 5.7 ckan_status_show
435
-
436
- **Purpose**: Check CKAN server status
437
-
438
- **Input Parameters**:
439
- ```typescript
440
- {
441
- server_url: string (required)
442
- }
443
- ```
444
-
445
- **Output**:
446
- - Online status
447
- - CKAN version
448
- - Site title
449
- - Site URL
450
-
451
- ---
452
-
453
- ## 6. Supported CKAN Portals
454
-
455
- The server can connect to **any public CKAN server**. Main portals:
456
-
457
- | Country | Portal | URL |
458
- |---------|--------|-----|
459
- | 🇮🇹 Italy | Portale Nazionale Dati Aperti | https://www.dati.gov.it/opendata |
460
- | 🇺🇸 USA | Data.gov | https://catalog.data.gov |
461
- | 🇨🇦 Canada | Open Government | https://open.canada.ca/data |
462
- | 🇬🇧 UK | Data.gov.uk | https://data.gov.uk |
463
- | 🇪🇺 EU | European Data Portal | https://data.europa.eu |
464
- | 🌍 Demo | CKAN Official Demo | https://demo.ckan.org |
465
-
466
- **Compatibility**:
467
- - CKAN API v3 (CKAN 2.x and 3.x)
468
- - Over 500 portals worldwide
469
-
470
- ---
471
-
472
- ## 7. Installation & Deployment
473
-
474
- ### 7.1 Prerequisites
475
-
476
- - Node.js >= 18.0.0
477
- - npm or yarn
478
-
479
- ### 7.2 Installation
480
-
481
- #### Option 1: npm Package (Recommended - PLANNED)
482
-
483
- **Global Installation**:
484
- ```bash
485
- npm install -g ckan-mcp-server
486
- ```
487
-
488
- **Local Installation**:
489
- ```bash
490
- npm install ckan-mcp-server
491
- ```
492
-
493
- **npx (No Installation)**:
494
- ```bash
495
- npx ckan-mcp-server
496
- ```
497
-
498
- > **Note**: Publishing to the npm registry is planned to enable simple installation, similar to PyPI for Python. Installation currently requires cloning the repository.
499
-
500
- #### Option 2: From Source (Current)
501
-
502
- ```bash
503
- git clone https://github.com/ondata/ckan-mcp-server
504
- cd ckan-mcp-server
505
- npm install
506
- npm run build
507
- ```
508
-
509
- ### 7.3 Usage Modes
510
-
511
- #### stdio Mode (Default)
512
- For integration with Claude Desktop and other local MCP clients:
513
-
514
- ```bash
515
- npm start
516
- ```
517
-
518
- **Claude Desktop Configuration** (`claude_desktop_config.json`):
519
-
520
- *After npm publication*:
521
- ```json
522
- {
523
- "mcpServers": {
524
- "ckan": {
525
- "command": "npx",
526
- "args": ["ckan-mcp-server"]
527
- }
528
- }
529
- }
530
- ```
531
-
532
- *Current (from source)*:
533
- ```json
534
- {
535
- "mcpServers": {
536
- "ckan": {
537
- "command": "node",
538
- "args": ["/path/to/ckan-mcp-server/dist/index.js"]
539
- }
540
- }
541
- }
542
- ```
543
-
544
- #### HTTP Mode
545
- For remote access via HTTP:
546
-
547
- ```bash
548
- TRANSPORT=http PORT=3000 npm start
549
- ```
550
-
551
- Server available at: `http://localhost:3000/mcp`
552
-
553
- **Test HTTP endpoint**:
554
- ```bash
555
- curl -X POST http://localhost:3000/mcp \
556
- -H "Content-Type: application/json" \
557
- -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
558
- ```
559
-
560
- ### 7.4 Build System
561
-
562
- The project uses **esbuild** (not tsc) for:
563
- - Ultra-fast build (~4ms vs minutes with tsc)
564
- - Minimal memory usage (important in WSL)
565
- - Automatic bundling with tree-shaking
566
-
567
- ```bash
568
- npm run build # Build with esbuild
569
- npm run watch # Watch mode
570
- npm run dev # Build + run
571
- ```
572
-
573
- ---
574
-
575
- ## 8. Use Cases & Examples
576
-
577
- ### 8.1 Use Case 1: Dataset Discovery
578
-
579
- **Scenario**: A data scientist searches for datasets on Italian population
580
-
581
- ```typescript
582
- // Step 1: Search datasets
583
- ckan_package_search({
584
- server_url: "https://www.dati.gov.it/opendata",
585
- q: "popolazione",
586
- rows: 20,
587
- sort: "metadata_modified desc"
588
- })
589
-
590
- // Step 2: Get details of interesting dataset
591
- ckan_package_show({
592
- server_url: "https://www.dati.gov.it/opendata",
593
- id: "popolazione-residente-2023"
594
- })
595
- ```
596
-
597
- ### 8.2 Use Case 2: Organization Analysis
598
-
599
- **Scenario**: Analyze regional open data production
600
-
601
- ```typescript
602
- // Step 1: Search for regional organizations
603
- ckan_organization_search({
604
- server_url: "https://www.dati.gov.it/opendata",
605
- pattern: "regione"
606
- })
607
-
608
- // Step 2: Analyze datasets of a region
609
- ckan_organization_show({
610
- server_url: "https://www.dati.gov.it/opendata",
611
- id: "regione-siciliana",
612
- include_datasets: true
613
- })
614
-
615
- // Step 3: Search for organization-specific datasets
616
- ckan_package_search({
617
- server_url: "https://www.dati.gov.it/opendata",
618
- fq: "organization:regione-siciliana",
619
- facet_field: ["tags", "res_format"],
620
- rows: 50
621
- })
622
- ```
623
-
624
- ### 8.3 Use Case 3: Data Analysis with DataStore
625
-
626
- **Scenario**: Analyze COVID-19 tabular data
627
-
628
- ```typescript
629
- // Step 1: Search COVID datasets
630
- ckan_package_search({
631
- server_url: "https://www.dati.gov.it/opendata",
632
- q: "covid",
633
- fq: "res_format:CSV"
634
- })
635
-
636
- // Step 2: Get details and resource_id
637
- ckan_package_show({
638
- server_url: "https://www.dati.gov.it/opendata",
639
- id: "covid-19-italia"
640
- })
641
-
642
- // Step 3: Query DataStore
643
- ckan_datastore_search({
644
- server_url: "https://www.dati.gov.it/opendata",
645
- resource_id: "abc-123-def",
646
- filters: { "regione": "Sicilia" },
647
- sort: "data desc",
648
- limit: 100
649
- })
650
- ```
651
-
652
- ### 8.4 Use Case 4: Statistical Analysis with Faceting
653
-
654
- **Scenario**: Analyze dataset distribution by format and organization
655
-
656
- ```typescript
657
- // Format statistics
658
- ckan_package_search({
659
- server_url: "https://www.dati.gov.it/opendata",
660
- facet_field: ["res_format"],
661
- facet_limit: 100,
662
- rows: 0 // Only facets, no results
663
- })
664
-
665
- // Organization statistics
666
- ckan_package_search({
667
- server_url: "https://www.dati.gov.it/opendata",
668
- facet_field: ["organization"],
669
- facet_limit: 50,
670
- rows: 0
671
- })
672
-
673
- // Tag distribution
674
- ckan_package_search({
675
- server_url: "https://www.dati.gov.it/opendata",
676
- facet_field: ["tags"],
677
- facet_limit: 50,
678
- rows: 0
679
- })
680
- ```
681
-
682
- ---
683
-
684
- ## 9. Limitations & Constraints
685
-
686
- ### 9.1 Current Limitations
687
-
688
- 1. **Read-Only**:
689
- - Does not support dataset creation/modification
690
- - Only public endpoints (no authentication)
691
-
692
- 2. **Character Limit**:
693
- - Output truncated at 50,000 characters
694
- - Hardcoded, not configurable
695
-
696
- 3. **No Caching**:
697
- - Every request makes a fresh HTTP call
698
- - Cloudflare Workers can use edge cache (optional)
699
-
700
- 4. **DataStore Limitations**:
701
- - Not all resources have active DataStore
702
- - Max 32,000 records per query
703
- - Depends on CKAN server configuration
704
-
705
- 5. **SQL Support Limitations**:
706
- - `ckan_datastore_search_sql` works only if the portal exposes the SQL endpoint
707
- - Some portals disable SQL for security reasons
708
- - Workers runtime supports SQL queries without limitations
709
-
710
- 6. **Timeout**:
711
- - Fixed 30 seconds for HTTP request
712
- - Cloudflare Workers has stricter timeout (10s per fetch)
713
-
714
- 7. **Locale**:
715
- - Dates formatted in ISO `YYYY-MM-DD`
716
- - Not parameterized
717
-
718
- ### 9.2 External Dependencies
719
-
720
- - **Network**: Requires internet connection
721
- - **CKAN Server Availability**: Depends on remote server availability
722
- - **CKAN API Compatibility**: Requires CKAN API v3
723
-
724
- ### 9.3 Known Issues
725
-
726
- - Cloudflare Workers has stricter timeout (10s) compared to Node.js (30s)
727
- - Some CKAN portals have non-standard configurations that might require workarounds
728
-
729
- ---
730
-
731
- ## 10. Future Enhancements
732
-
733
- ### 10.1 Completed Features
734
-
735
- #### ✅ npm Package Publication (v0.3.2+)
736
- - Published on npm registry: `@aborruso/ckan-mcp-server`
737
- - Global installation: `npm install -g @aborruso/ckan-mcp-server`
738
- - Executable CLI: `ckan-mcp-server`
739
- - Semantic versioning (semver)
740
-
741
- #### ✅ SQL Query Support (v0.4.4)
742
- - Implemented `ckan_datastore_search_sql`
743
- - Full support for SELECT, WHERE, JOIN, GROUP BY
744
- - Requires portals with active DataStore SQL
745
-
746
- #### ✅ AI-Powered Discovery (v0.4.6)
747
- - Tool `ckan_find_relevant_datasets`
748
- - Relevance ranking with scoring
749
- - Natural language queries
750
-
751
- #### ✅ Tags and Groups (v0.4.3)
752
- - Tool `ckan_tag_list` with faceting
753
- - Tool `ckan_group_list`, `ckan_group_show`, `ckan_group_search`
754
- - Full support for taxonomy exploration
755
-
756
- #### ✅ Cloudflare Workers Deployment (v0.4.0)
757
- - Global edge deployment: https://ckan-mcp-server.andy-pr.workers.dev
758
- - Free tier: 100k requests/day
759
- - Cold start < 60ms
760
- - Complete documentation in DEPLOYMENT.md
761
-
762
- #### ✅ Portal Search Parser Configuration (v0.4.7)
763
- - Per-portal query parser override
764
- - Handling portals with restrictive parsers
765
- - URL generator for browse/search links
766
-
767
- ### 10.2 Planned Features
768
-
769
- #### High Priority
770
-
771
- - [ ] **Authentication Support**
772
- - API key for private endpoints
773
- - OAuth for portals that support it
774
-
775
- - [ ] **Caching Layer**
776
- - Cache frequent results in Workers KV
777
- - Configurable TTL
778
- - Invalidation strategy
779
-
780
- #### Medium Priority
781
-
782
- - [ ] **Advanced DataStore Features**
783
- - Support for aggregations
784
- - JOIN between resources
785
- - Computed fields
786
-
787
- - [ ] **Batch Operations**
788
- - Multiple parallel queries
789
- - Bulk export
790
-
791
- - [ ] **Configuration**
792
- - Configurable timeout
793
- - Configurable character limit
794
- - Configurable locale
795
-
796
- #### Low Priority
797
-
798
- - [ ] **Write Operations** (if required)
799
- - Create/update dataset
800
- - Upload resources
801
- - Requires authentication
802
-
803
- - [ ] **Advanced Filtering**
804
- - Spatial filters (geo queries)
805
- - Temporal filters (date ranges)
806
-
807
- - [ ] **Export Formats**
808
- - CSV export
809
- - Excel export
810
- - Graph visualization data
811
-
812
- ### 10.3 Distribution & Deployment
813
-
814
- ✅ **Completed**:
815
- - npm registry publication: `@aborruso/ckan-mcp-server`
816
- - Global installation: `npm install -g @aborruso/ckan-mcp-server`
817
- - CLI command: `ckan-mcp-server`
818
- - GitHub Releases with semantic tags
819
- - Cloudflare Workers deployment
820
-
821
- **Future**:
822
- - [ ] Docker image (optional)
823
- - [ ] Kubernetes deployment examples
824
-
825
- ### 10.4 Testing & Quality
826
-
827
- ✅ **Current State**:
828
- - 191 unit and integration tests (100% passing)
829
- - vitest test runner
830
- - Coverage for all 13 tools
831
- - Fixtures for offline testing
832
-
833
- **Future**:
834
- - [ ] Performance benchmarks
835
- - [ ] E2E tests with live CKAN server
836
- - [ ] Load testing on Workers
837
-
838
- ### 10.5 Documentation
839
-
840
- ✅ **Current State**:
841
- - Complete README with examples
842
- - EXAMPLES.md with advanced use cases
843
- - DEPLOYMENT.md with release workflow
844
- - HTML README served at the Worker root URL
845
- - Updated PRD.md
846
-
847
- **Future**:
848
- - [ ] OpenAPI/Swagger spec for HTTP mode
849
- - [ ] Video tutorial
850
- - [ ] Best practices guide for query optimization
851
-
852
- ---
853
-
854
- ## 11. Success Metrics
855
-
856
- ### 11.1 Technical Metrics
857
-
858
- - **Build Time**: ~50-70ms (esbuild Node.js + Workers)
859
- - **Bundle Size**: ~420KB (~135KB gzipped)
860
- - **Memory Usage**: < 50MB runtime (Node.js), Workers limits apply
861
- - **Response Time**: < 30s (CKAN API timeout), < 10s (Workers)
862
- - **Cold Start**: < 60ms (Cloudflare Workers)
863
- - **Test Suite**: 191 tests (100% passing)
864
-
865
- ### 11.2 Distribution Metrics
866
-
867
- ✅ **Achieved**:
868
- - npm package published: `@aborruso/ckan-mcp-server`
869
- - Installation time: < 1 minute
870
- - GitHub releases with semantic versioning
871
- - Cloudflare Workers deployment live
872
-
873
- **Future tracking**:
874
- - npm weekly/monthly downloads
875
- - Workers request count (100k/day free tier)
876
- - Installation success rate
877
-
878
- ### 11.3 Usage Metrics
879
-
880
- - Number of MCP tool calls per session
881
- - Most used tools
882
- - Average results per search
883
- - Error rate by tool
884
- - Server coverage (unique CKAN servers used)
885
-
886
- ### 11.4 Quality Metrics
887
-
888
- - Zero known security vulnerabilities
889
- - Clarity of error messages
890
- - Documentation completeness
891
- - User satisfaction (GitHub issues/feedback)
892
-
893
- ---
894
-
895
- ## 12. References
896
-
897
- ### 12.1 Documentation
898
-
899
- - [CKAN API Documentation](https://docs.ckan.org/en/latest/api/)
900
- - [MCP Protocol Specification](https://modelcontextprotocol.io/)
901
- - [Apache Solr Query Syntax](https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html)
902
-
903
- ### 12.2 Related Resources
904
-
905
- - [CKAN Official Site](https://ckan.org/)
906
- - [onData Community](https://www.ondata.it/)
907
- - [dati.gov.it](https://www.dati.gov.it/opendata/)
908
-
909
- ### 12.3 Code Repository
910
-
911
- - **GitHub**: https://github.com/aborruso/ckan-mcp-server
912
- - **npm**: https://www.npmjs.com/package/@aborruso/ckan-mcp-server
913
- - **Live Demo**: https://ckan-mcp-server.andy-pr.workers.dev
914
- - **License**: MIT License
915
- - **Author**: Andrea Borruso (@aborruso)
916
- - **Community**: onData
917
-
918
- ---
919
-
920
- ## 13. Appendix
921
-
922
- ### 13.1 Glossary
923
-
924
- - **CKAN**: Comprehensive Knowledge Archive Network - open source platform for open data portals
925
- - **MCP**: Model Context Protocol - protocol for integrating AI agents with external tools
926
- - **Solr**: Apache Solr - full-text search engine used by CKAN
927
- - **DataStore**: CKAN feature for SQL-like queries on tabular data
928
- - **Faceting**: Counts of matching datasets grouped by field value (e.g. organization, tags, format), used to analyze how results are distributed
929
- - **Package**: CKAN term for "dataset"
930
- - **Resource**: File or API endpoint associated with a dataset
931
-
932
- ### 13.2 Solr Query Syntax Quick Reference
933
-
934
- ```
935
- # Full-text search
936
- q: "popolazione"
937
-
938
- # Field search
939
- q: "title:covid"
940
- q: "notes:sanità"
941
-
942
- # Boolean operators
943
- q: "popolazione AND sicilia"
944
- q: "popolazione OR abitanti"
945
- q: "popolazione NOT censimento"
946
-
947
- # Wildcard
948
- q: "popola*"
949
- q: "*salute*"
950
-
951
- # Filter query (no score impact)
952
- fq: "organization:comune-palermo"
953
- fq: "res_format:CSV"
954
-
955
- # Date range
956
- fq: "metadata_modified:[2023-01-01T00:00:00Z TO *]"
957
- fq: "metadata_created:[NOW-7DAYS TO NOW]"
958
- ```
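
As a sketch of how the `q` and `fq` values above could reach a portal, the snippet below builds a URL for the standard CKAN Action API endpoint `package_search`. The helper `buildPackageSearchUrl` and the `PackageSearchParams` interface are hypothetical names used for illustration; they are not the package's actual implementation.

```typescript
// Hypothetical parameter shape; q, fq, rows and start are standard package_search parameters.
interface PackageSearchParams {
  q?: string;
  fq?: string;
  rows?: number;
  start?: number;
}

// Hypothetical helper: assembles the request URL without performing the request.
function buildPackageSearchUrl(serverUrl: string, params: PackageSearchParams): string {
  const base = serverUrl.replace(/\/+$/, "");
  const pairs: string[] = [];
  if (params.q !== undefined) pairs.push(`q=${encodeURIComponent(params.q)}`);
  if (params.fq !== undefined) pairs.push(`fq=${encodeURIComponent(params.fq)}`);
  if (params.rows !== undefined) pairs.push(`rows=${params.rows}`);
  if (params.start !== undefined) pairs.push(`start=${params.start}`);
  const query = pairs.length > 0 ? `?${pairs.join("&")}` : "";
  return `${base}/api/3/action/package_search${query}`;
}

// Boolean query restricted to datasets that have CSV resources.
const searchUrl = buildPackageSearchUrl("https://www.dati.gov.it/opendata", {
  q: "popolazione AND sicilia",
  fq: "res_format:CSV",
  rows: 10,
});
```

Encoding each value with `encodeURIComponent` keeps Solr syntax characters such as `:`, `[` and spaces intact on the server side.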
959
-
960
- ### 13.3 Response Format Examples
961
-
962
- **Markdown Output** (human-readable):
963
- ```markdown
964
- # CKAN Package Search Results
965
-
966
- **Server**: https://www.dati.gov.it/opendata
967
- **Query**: popolazione
968
- **Total Results**: 1234
969
-
970
- ## Datasets
971
-
972
- ### Popolazione Residente 2023
973
- - **ID**: `abc-123-def`
974
- - **Organization**: ISTAT
975
- - **Tags**: popolazione, demografia, censimento
976
- ...
977
- ```
978
-
979
- **JSON Output** (machine-readable):
980
- ```json
981
- {
982
- "count": 1234,
983
- "results": [
984
- {
985
- "id": "abc-123-def",
986
- "name": "popolazione-residente-2023",
987
- "title": "Popolazione Residente 2023",
988
- "organization": { "name": "istat", "title": "ISTAT" }
989
- }
990
- ]
991
- }
992
- ```
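
The relationship between the two formats can be sketched as a small renderer that turns the JSON shape into the Markdown layout shown above. The type names and the function `renderSearchMarkdown` are assumptions made for this example, and only the fields visible in the JSON sample are modeled (tags are omitted because the sample does not include them).

```typescript
// Types limited to the fields visible in the JSON sample above.
interface Organization { name: string; title: string }
interface Dataset { id: string; name: string; title: string; organization?: Organization }
interface PackageSearchResult { count: number; results: Dataset[] }

// Hypothetical renderer: produces the human-readable Markdown layout.
function renderSearchMarkdown(server: string, query: string, result: PackageSearchResult): string {
  const lines: string[] = [
    "# CKAN Package Search Results",
    "",
    `**Server**: ${server}`,
    `**Query**: ${query}`,
    `**Total Results**: ${result.count}`,
    "",
    "## Datasets",
    "",
  ];
  result.results.forEach((dataset) => {
    lines.push(`### ${dataset.title}`);
    lines.push("- **ID**: `" + dataset.id + "`");
    // Organization is optional: some datasets have none.
    if (dataset.organization !== undefined) {
      lines.push(`- **Organization**: ${dataset.organization.title}`);
    }
    lines.push("");
  });
  return lines.join("\n");
}

const sample: PackageSearchResult = {
  count: 1234,
  results: [
    {
      id: "abc-123-def",
      name: "popolazione-residente-2023",
      title: "Popolazione Residente 2023",
      organization: { name: "istat", title: "ISTAT" },
    },
  ],
};
const markdown = renderSearchMarkdown("https://www.dati.gov.it/opendata", "popolazione", sample);
```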
993
-
994
- ---
995
-
996
- **Document Version**: 1.0.0
997
- **Created**: 2026-01-08
998
- **Status**: Approved
999
- **Next Review**: 2026-04-08