npm - mcp-chilegob-dataset - Versions diffs - 0.1.0 → 0.2.0 - Mend

mcp-chilegob-dataset 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md CHANGED

@@ -20,12 +20,12 @@ El **Model Context Protocol (MCP)** es un estándar abierto que permite a los as
 ### Con Claude Desktop (recomendado)
-**Paso 1** — Abrí la configuración de Claude Desktop.
+**Paso 1** — Abre la configuración de Claude Desktop.
 En Mac: `~/Library/Application Support/Claude/claude_desktop_config.json`
 En Windows: `%APPDATA%\Claude\claude_desktop_config.json`
-**Paso 2** — Agregá estas líneas:
+**Paso 2** — Agrega estas líneas:
 ```json
 {
@@ -38,12 +38,12 @@ En Windows: `%APPDATA%\Claude\claude_desktop_config.json`
 }
 ```
-**Paso 3** — Reiniciá Claude Desktop. La primera vez descarga el paquete automáticamente.
+**Paso 3** — Reinicia Claude Desktop. La primera vez descarga el paquete automáticamente.
-Eso es todo. Podés pedirle a Claude cosas como:
+Eso es todo. Puedes pedirle a Claude cosas como:
-> *"Buscá datasets sobre educación en datos.gob.cl"*
-> *"Mostrá los datos del dataset de matrícula universitaria"*
+> *"Busca datasets sobre educación en datos.gob.cl"*
+> *"Muestra los datos del dataset de matrícula universitaria"*
 > *"¿Qué datasets de salud hay disponibles?"*
 ---
@@ -110,13 +110,21 @@ Obtiene los metadatos completos de un dataset: descripción, recursos, etiquetas
 }
 ```
-> Verificá el campo `datastore_available` antes de usar `get_resource_data`. Los recursos sin datastore deben descargarse desde su `url`.
+> Verifica el campo `datastore_available` antes de usar `get_resource_data`. Los recursos sin datastore deben descargarse desde su `url`.
 ---
 ### `get_resource_data`
-Lee filas tabulares de un recurso con el datastore CKAN habilitado. Soporta paginación.
+Lee filas tabulares de un recurso CKAN. Intenta primero el datastore y, si no está disponible, descarga y parsea el archivo directamente. Soporta paginación.
+**Estrategia de obtención de datos (automática):**
+1. **Datastore CKAN** — acceso estructurado y rápido; disponible solo en algunos recursos.
+2. **Descarga directa** — si el datastore no está habilitado, la herramienta descarga el archivo y lo parsea en memoria:
+   - `CSV` / `TSV` — parseado nativo, devuelve filas y columnas.
+   - `JSON` — parseado nativo si el contenido es un array de objetos.
+   - `XLS`, `PDF` y otros formatos binarios — no se parsean; se devuelve la URL directa para que el usuario los descargue.
 **Parámetros:**
@@ -126,10 +134,11 @@ Lee filas tabulares de un recurso con el datastore CKAN habilitado. Soporta pagi
 | `limit` | number | ❌ | 50 | Filas a devolver (1–500) |
 | `offset` | number | ❌ | 0 | Desplazamiento para paginación |
-**Ejemplo de respuesta:**
+**Ejemplo de respuesta (datastore):**
 ```json
 {
+  "source": "datastore",
   "total": 42,
   "returned": 3,
   "offset": 0,
@@ -144,7 +153,36 @@ Lee filas tabulares de un recurso con el datastore CKAN habilitado. Soporta pagi
 }
 ```
-> Si el recurso no tiene datastore habilitado, la herramienta devuelve un mensaje descriptivo indicando cómo acceder al archivo directamente.
+**Ejemplo de respuesta (archivo CSV descargado):**
+```json
+{
+  "source": "file",
+  "format": "CSV",
+  "url": "https://datosabiertos.mineduc.cl/...",
+  "total": 1500,
+  "returned": 50,
+  "offset": 0,
+  "fields": [
+    { "id": "Region", "type": "text" }
+  ],
+  "records": [
+    { "Region": "METROPOLITANA" }
+  ]
+}
+```
+**Ejemplo de respuesta (formato no parseable):**
+```json
+{
+  "source": "file",
+  "parseable": false,
+  "format": "XLS",
+  "url": "https://datosabiertos.mineduc.cl/archivo.xls",
+  "message": "This resource is a XLS file and cannot be parsed automatically. Download it directly from the URL above."
+}
+```
 ---
@@ -174,17 +212,17 @@ npm install
 ### Probar localmente (MCP Inspector)
-Con el servidor corriendo (`npm run dev`), abrí otra terminal y ejecutá:
+Con el servidor corriendo (`npm run dev`), abre otra terminal y ejecuta:
 ```bash
 npx @modelcontextprotocol/inspector
 ```
-Se abre una interfaz web en `http://localhost:6274`. Configurá:
+Se abre una interfaz web en `http://localhost:6274`. Configura:
 - **Transport type**: `Streamable HTTP`
 - **URL**: `http://localhost:3000/mcp`
-Desde ahí podés llamar a cualquier herramienta de forma interactiva y ver las respuestas.
+Desde ahí puedes llamar a cualquier herramienta de forma interactiva y ver las respuestas.
 ### Arquitectura
@@ -206,8 +244,8 @@ src/
 ### Cómo agregar nuevas herramientas
-1. Creá `src/tools/tu-herramienta.ts`
-2. Importala y registrala en `src/server.ts`
+1. Crea `src/tools/tu-herramienta.ts`
+2. Impórtala y regístrala en `src/server.ts`
 ```typescript
 // src/tools/tu-herramienta.ts
@@ -225,7 +263,7 @@ export function registerTuHerramienta(server: McpServer): void {
       }),
     },
     async ({ parametro }) => {
-      // tu lógica acá
+      // tu lógica aquí
       return {
         content: [{ type: 'text', text: JSON.stringify({ resultado: parametro }) }],
       }
@@ -238,7 +276,9 @@ export function registerTuHerramienta(server: McpServer): void {
 ## Limitaciones conocidas
-- **Disponibilidad del datastore** — No todos los recursos tienen datastore habilitado en CKAN. Los archivos (CSV, XLS, PDF) deben descargarse desde su URL directa.
+- **Disponibilidad del datastore** — No todos los recursos tienen datastore habilitado en CKAN. `get_resource_data` intenta automáticamente descargar el archivo (CSV, TSV, JSON); los formatos binarios (XLS, PDF) requieren descarga manual desde la URL devuelta.
+- **Archivos grandes** — La descarga directa carga el archivo completo en memoria antes de paginar. Para archivos muy grandes (>100 MB) esto puede ser lento o fallar.
+- **Encoding** — Los archivos CSV de datos.gob.cl pueden estar en ISO-8859-1 (Latin-1). La herramienta intenta leerlos como UTF-8; si los caracteres aparecen corruptos, descarga el archivo directamente.
 - **Sin caché** — Cada llamada hace una solicitud en vivo a datos.gob.cl. No hay límites de tasa documentados.
 - **Paquetes en alpha** — `@modelcontextprotocol/hono` y `@modelcontextprotocol/server` están en versión alpha.
@@ -253,7 +293,7 @@ Las contribuciones son bienvenidas. Algunas ideas:
 - [ ] Caché en memoria para reducir llamadas a la API
 - [ ] MCP Resources con URI templates (`datos-gob-cl://dataset/{id}`)
-Por favor, abrí un issue antes de enviar un PR grande.
+Por favor, abre un issue antes de enviar un PR grande.
 ---
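
The response examples in the README hunks above expose `total`, `returned`, and `offset`. A minimal sketch of how a caller might compute the next `offset` from those three fields follows. The `PageInfo` type and the `nextOffset` helper are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical shape: only the three paging fields shown in the README examples.
interface PageInfo {
  total: number
  returned: number
  offset: number
}

// Returns the offset for the next call, or null when no rows remain.
function nextOffset(page: PageInfo): number | null {
  const next = page.offset + page.returned
  // An empty page also ends the iteration, which guards against looping forever.
  return page.returned > 0 && next < page.total ? next : null
}
```

A caller would pass the returned value as `offset` in the next `get_resource_data` call and stop when the helper returns `null`.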

package/dist/ckan.js CHANGED

@@ -28,3 +28,70 @@ export async function getResourceData(resourceId, limit = 50, offset = 0) {
         offset,
     });
 }
+export async function getResource(resourceId) {
+    return ckanAction('resource_show', { id: resourceId });
+}
+const PARSEABLE_FORMATS = new Set(['CSV', 'TSV', 'JSON']);
+export async function fetchAndParseFile(url, format, limit, offset) {
+    const normalizedFormat = format.toUpperCase().trim();
+    if (!PARSEABLE_FORMATS.has(normalizedFormat)) {
+        throw new Error(`FORMAT_NOT_PARSEABLE:${normalizedFormat}:${url}`);
+    }
+    const response = await fetch(url);
+    if (!response.ok) {
+        throw new Error(`Failed to fetch file: ${response.status} ${response.statusText}`);
+    }
+    if (normalizedFormat === 'JSON') {
+        const json = await response.json();
+        const rows = Array.isArray(json)
+            ? json
+            : [{ data: json }];
+        const page = rows.slice(offset, offset + limit);
+        const fields = page.length > 0
+            ? Object.keys(page[0]).map(key => ({ id: key, type: 'text' }))
+            : [];
+        return { fields, records: page, total: rows.length, source: 'file' };
+    }
+    // CSV / TSV
+    const text = await response.text();
+    const separator = normalizedFormat === 'TSV' ? '\t' : ',';
+    const lines = text.split(/\r?\n/).filter(l => l.trim() !== '');
+    if (lines.length === 0) {
+        return { fields: [], records: [], total: 0, source: 'file' };
+    }
+    const headers = parseDelimitedLine(lines[0], separator);
+    const dataLines = lines.slice(1);
+    const page = dataLines.slice(offset, offset + limit);
+    const records = page.map(line => {
+        const values = parseDelimitedLine(line, separator);
+        return Object.fromEntries(headers.map((h, i) => [h, values[i] ?? '']));
+    });
+    const fields = headers.map(h => ({ id: h, type: 'text' }));
+    return { fields, records, total: dataLines.length, source: 'file' };
+}
+function parseDelimitedLine(line, separator) {
+    const result = [];
+    let current = '';
+    let inQuotes = false;
+    for (let i = 0; i < line.length; i++) {
+        const char = line[i];
+        if (char === '"') {
+            if (inQuotes && line[i + 1] === '"') {
+                current += '"';
+                i++;
+            }
+            else {
+                inQuotes = !inQuotes;
+            }
+        }
+        else if (char === separator && !inQuotes) {
+            result.push(current);
+            current = '';
+        }
+        else {
+            current += char;
+        }
+    }
+    result.push(current);
+    return result;
+}
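
The CSV/TSV branch above splits the text on line breaks before it calls `parseDelimitedLine`, so a quoted field that contains a line break would likely end up as two separate rows. The sketch below shows one possible alternative that tracks quote state while splitting. `splitCsvRecords` is a hypothetical helper, not something the package ships.

```typescript
// Hypothetical helper: split CSV text into records, keeping line breaks
// that occur inside double-quoted fields.
function splitCsvRecords(text: string): string[] {
  const records: string[] = []
  let current = ''
  let inQuotes = false
  for (let i = 0; i < text.length; i++) {
    const char = text[i]
    if (char === '"') {
      // An escaped quote ("") toggles twice, so the state ends up unchanged.
      inQuotes = !inQuotes
      current += char
    } else if ((char === '\n' || char === '\r') && !inQuotes) {
      // Treat \r\n as a single line break.
      if (char === '\r' && text[i + 1] === '\n') i++
      // Skip blank records, mirroring the filter used in the diff.
      if (current.trim() !== '') records.push(current)
      current = ''
    } else {
      current += char
    }
  }
  if (current.trim() !== '') records.push(current)
  return records
}
```

Each returned record still carries its quotes, so it could be passed to a field splitter such as `parseDelimitedLine` unchanged.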

package/dist/tools/resource.js CHANGED

@@ -1,40 +1,86 @@
 import { z } from 'zod';
-import { getResourceData } from '../ckan.js';
+import { getResourceData, getResource, fetchAndParseFile } from '../ckan.js';
 export function registerResourceTool(server) {
     server.registerTool('get_resource_data', {
         title: 'Get Resource Data',
-        description: 'Read tabular data from a CKAN resource. Only works for resources with datastore enabled (check datastore_available from get_dataset). Supports pagination via limit and offset.',
+        description: 'Read tabular data from a CKAN resource. Tries the CKAN datastore first; if unavailable, automatically downloads and parses the file (CSV, TSV, JSON). For XLS, PDF and other binary formats it returns the direct download URL. Supports pagination via limit and offset.',
         inputSchema: z.object({
-            resource_id: z.string().describe('Resource UUID from a dataset\'s resources list (use get_dataset to obtain it)'),
+            resource_id: z.string().describe("Resource UUID from a dataset's resources list (use get_dataset to obtain it)"),
             limit: z.number().int().min(1).max(500).default(50).optional().describe('Rows to return (default: 50, max: 500)'),
             offset: z.number().int().min(0).default(0).optional().describe('Row offset for pagination (default: 0)'),
         }),
     }, async ({ resource_id, limit, offset }) => {
+        const effectiveLimit = limit ?? 50;
+        const effectiveOffset = offset ?? 0;
+        // Attempt 1: CKAN datastore
         try {
-            const result = await getResourceData(resource_id, limit ?? 50, offset ?? 0);
+            const result = await getResourceData(resource_id, effectiveLimit, effectiveOffset);
             return {
                 content: [{
                         type: 'text',
                         text: JSON.stringify({
+                            source: 'datastore',
                             total: result.total,
                             returned: result.records.length,
-                            offset: offset ?? 0,
+                            offset: effectiveOffset,
                             fields: result.fields,
                             records: result.records,
                         }, null, 2),
                     }],
             };
         }
-        catch (error) {
-            const message = error instanceof Error ? error.message : String(error);
-            const isNoDatastore = message.toLowerCase().includes('datastore') || message.includes('404') || message.includes('NOT FOUND');
+        catch (datastoreError) {
+            const dsMessage = datastoreError instanceof Error ? datastoreError.message : String(datastoreError);
+            const isNoDatastore = dsMessage.toLowerCase().includes('datastore') ||
+                dsMessage.includes('404') ||
+                dsMessage.toUpperCase().includes('NOT FOUND');
+            if (!isNoDatastore) {
+                return {
+                    content: [{ type: 'text', text: `Error: ${dsMessage}` }],
+                    isError: true,
+                };
+            }
+        }
+        // Attempt 2: direct file download
+        try {
+            const resource = await getResource(resource_id);
+            const result = await fetchAndParseFile(resource.url, resource.format, effectiveLimit, effectiveOffset);
             return {
                 content: [{
                         type: 'text',
-                        text: isNoDatastore
-                            ? `Datastore not available for resource "${resource_id}". This resource may be a file (CSV, XLS, PDF) without an activated datastore. Use the resource URL from get_dataset to download it directly.`
-                            : `Error: ${message}`,
+                        text: JSON.stringify({
+                            source: 'file',
+                            format: resource.format,
+                            url: resource.url,
+                            total: result.total,
+                            returned: result.records.length,
+                            offset: effectiveOffset,
+                            fields: result.fields,
+                            records: result.records,
+                        }, null, 2),
                     }],
+            };
+        }
+        catch (fileError) {
+            const fileMessage = fileError instanceof Error ? fileError.message : String(fileError);
+            // Format not parseable — return the URL so the AI can guide the user
+            if (fileMessage.startsWith('FORMAT_NOT_PARSEABLE:')) {
+                const [, fmt, url] = fileMessage.split(':');
+                return {
+                    content: [{
+                            type: 'text',
+                            text: JSON.stringify({
+                                source: 'file',
+                                parseable: false,
+                                format: fmt,
+                                url,
+                                message: `This resource is a ${fmt} file and cannot be parsed automatically. Download it directly from the URL above.`,
+                            }, null, 2),
+                        }],
+                };
+            }
+            return {
+                content: [{ type: 'text', text: `Error reading file: ${fileMessage}` }],
                 isError: true,
             };
         }
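
The handler above recovers the format and URL with `fileMessage.split(':')`. Because a URL such as `https://...` contains `:` itself, the destructured `url` appears to receive only the scheme (`https`). The sketch below shows one way to split on the first separator after the prefix instead. `parseNotParseable` and the `example.org` URL are hypothetical and used only for illustration.

```typescript
// Hypothetical helper: extract format and URL from a
// "FORMAT_NOT_PARSEABLE:<format>:<url>" message without truncating the URL.
function parseNotParseable(message: string): { format: string; url: string } | null {
  const prefix = 'FORMAT_NOT_PARSEABLE:'
  if (!message.startsWith(prefix)) return null
  const rest = message.slice(prefix.length)
  // Split only at the first colon; everything after it belongs to the URL.
  const sep = rest.indexOf(':')
  if (sep === -1) return null
  return { format: rest.slice(0, sep), url: rest.slice(sep + 1) }
}

// Illustrative input with a placeholder URL.
const exampleMessage = 'FORMAT_NOT_PARSEABLE:XLS:https://example.org/archivo.xls'
// What plain split(':') leaves in the third slot.
const naiveUrl = exampleMessage.split(':')[2]
const parsed = parseNotParseable(exampleMessage)
```

A dedicated error class carrying `format` and `url` as properties would avoid encoding data in the message at all; that is a design alternative, not what the diff does.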

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "mcp-chilegob-dataset",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "MCP server exposing Chile's open government dataset portal (datos.gob.cl / CKAN API v3)",
   "type": "module",
   "license": "MIT",

package/src/ckan.ts CHANGED

@@ -20,10 +20,21 @@ export interface CkanResource {
   datastore_active: boolean
 }
+export interface CkanResourceDetail {
+  id: string
+  name: string
+  format: string
+  url: string
+  datastore_active: boolean
+  mimetype: string | null
+  size: number | null
+}
 export interface CkanDatastoreResult {
   fields: { id: string; type: string }[]
   records: Record<string, unknown>[]
   total: number
+  source?: 'datastore' | 'file'
 }
 async function ckanAction<T>(action: string, params: Record<string, unknown>): Promise<T> {
@@ -65,3 +76,90 @@ export async function getResourceData(
     offset,
   })
 }
+export async function getResource(resourceId: string): Promise<CkanResourceDetail> {
+  return ckanAction<CkanResourceDetail>('resource_show', { id: resourceId })
+}
+const PARSEABLE_FORMATS = new Set(['CSV', 'TSV', 'JSON'])
+export async function fetchAndParseFile(
+  url: string,
+  format: string,
+  limit: number,
+  offset: number
+): Promise<CkanDatastoreResult> {
+  const normalizedFormat = format.toUpperCase().trim()
+  if (!PARSEABLE_FORMATS.has(normalizedFormat)) {
+    throw new Error(
+      `FORMAT_NOT_PARSEABLE:${normalizedFormat}:${url}`
+    )
+  }
+  const response = await fetch(url)
+  if (!response.ok) {
+    throw new Error(`Failed to fetch file: ${response.status} ${response.statusText}`)
+  }
+  if (normalizedFormat === 'JSON') {
+    const json = await response.json() as unknown
+    const rows: Record<string, unknown>[] = Array.isArray(json)
+      ? (json as Record<string, unknown>[])
+      : [{ data: json }]
+    const page = rows.slice(offset, offset + limit)
+    const fields = page.length > 0
+      ? Object.keys(page[0]).map(key => ({ id: key, type: 'text' }))
+      : []
+    return { fields, records: page, total: rows.length, source: 'file' }
+  }
+  // CSV / TSV
+  const text = await response.text()
+  const separator = normalizedFormat === 'TSV' ? '\t' : ','
+  const lines = text.split(/\r?\n/).filter(l => l.trim() !== '')
+  if (lines.length === 0) {
+    return { fields: [], records: [], total: 0, source: 'file' }
+  }
+  const headers = parseDelimitedLine(lines[0], separator)
+  const dataLines = lines.slice(1)
+  const page = dataLines.slice(offset, offset + limit)
+  const records = page.map(line => {
+    const values = parseDelimitedLine(line, separator)
+    return Object.fromEntries(headers.map((h, i) => [h, values[i] ?? '']))
+  })
+  const fields = headers.map(h => ({ id: h, type: 'text' }))
+  return { fields, records, total: dataLines.length, source: 'file' }
+}
+function parseDelimitedLine(line: string, separator: string): string[] {
+  const result: string[] = []
+  let current = ''
+  let inQuotes = false
+  for (let i = 0; i < line.length; i++) {
+    const char = line[i]
+    if (char === '"') {
+      if (inQuotes && line[i + 1] === '"') {
+        current += '"'
+        i++
+      } else {
+        inQuotes = !inQuotes
+      }
+    } else if (char === separator && !inQuotes) {
+      result.push(current)
+      current = ''
+    } else {
+      current += char
+    }
+  }
+  result.push(current)
+  return result
+}
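
`fetchAndParseFile` reads the body with `response.text()`, which decodes it as UTF-8, and the README notes that some CSV files may be ISO-8859-1. A minimal sketch of a decode step that tries strict UTF-8 first and falls back to Latin-1 follows. `decodeCsvBytes` is a hypothetical helper, and the sketch assumes a runtime whose `TextDecoder` supports the `windows-1252` label (the WHATWG mapping for ISO-8859-1), as official Node.js builds with full ICU do.

```typescript
// Hypothetical helper: decode bytes as UTF-8, falling back to Latin-1
// (windows-1252) when the bytes are not valid UTF-8.
function decodeCsvBytes(bytes: Uint8Array): string {
  try {
    // fatal: true makes the decoder throw instead of emitting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes)
  } catch {
    // Assumption: non-UTF-8 input is Latin-1, as the README suggests.
    return new TextDecoder('windows-1252').decode(bytes)
  }
}
```

With this approach the caller would obtain the bytes via `new Uint8Array(await response.arrayBuffer())` rather than `response.text()`. The fallback is a heuristic: any other legacy encoding would still decode incorrectly.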

package/src/tools/resource.ts CHANGED

@@ -1,44 +1,97 @@
 import type { McpServer } from '@modelcontextprotocol/server'
 import { z } from 'zod'
-import { getResourceData } from '../ckan.js'
+import { getResourceData, getResource, fetchAndParseFile } from '../ckan.js'
 export function registerResourceTool(server: McpServer): void {
   server.registerTool(
     'get_resource_data',
     {
       title: 'Get Resource Data',
-      description: 'Read tabular data from a CKAN resource. Only works for resources with datastore enabled (check datastore_available from get_dataset). Supports pagination via limit and offset.',
+      description:
+        'Read tabular data from a CKAN resource. Tries the CKAN datastore first; if unavailable, automatically downloads and parses the file (CSV, TSV, JSON). For XLS, PDF and other binary formats it returns the direct download URL. Supports pagination via limit and offset.',
       inputSchema: z.object({
-        resource_id: z.string().describe('Resource UUID from a dataset\'s resources list (use get_dataset to obtain it)'),
+        resource_id: z.string().describe("Resource UUID from a dataset's resources list (use get_dataset to obtain it)"),
         limit: z.number().int().min(1).max(500).default(50).optional().describe('Rows to return (default: 50, max: 500)'),
         offset: z.number().int().min(0).default(0).optional().describe('Row offset for pagination (default: 0)'),
       }),
     },
     async ({ resource_id, limit, offset }) => {
+      const effectiveLimit = limit ?? 50
+      const effectiveOffset = offset ?? 0
+      // Attempt 1: CKAN datastore
       try {
-        const result = await getResourceData(resource_id, limit ?? 50, offset ?? 0)
+        const result = await getResourceData(resource_id, effectiveLimit, effectiveOffset)
         return {
           content: [{
             type: 'text',
             text: JSON.stringify({
+              source: 'datastore',
               total: result.total,
               returned: result.records.length,
-              offset: offset ?? 0,
+              offset: effectiveOffset,
               fields: result.fields,
               records: result.records,
             }, null, 2),
           }],
         }
-      } catch (error) {
-        const message = error instanceof Error ? error.message : String(error)
-        const isNoDatastore = message.toLowerCase().includes('datastore') || message.includes('404') || message.includes('NOT FOUND')
+      } catch (datastoreError) {
+        const dsMessage = datastoreError instanceof Error ? datastoreError.message : String(datastoreError)
+        const isNoDatastore =
+          dsMessage.toLowerCase().includes('datastore') ||
+          dsMessage.includes('404') ||
+          dsMessage.toUpperCase().includes('NOT FOUND')
+        if (!isNoDatastore) {
+          return {
+            content: [{ type: 'text', text: `Error: ${dsMessage}` }],
+            isError: true,
+          }
+        }
+      }
+      // Attempt 2: direct file download
+      try {
+        const resource = await getResource(resource_id)
+        const result = await fetchAndParseFile(resource.url, resource.format, effectiveLimit, effectiveOffset)
         return {
           content: [{
             type: 'text',
-            text: isNoDatastore
-              ? `Datastore not available for resource "${resource_id}". This resource may be a file (CSV, XLS, PDF) without an activated datastore. Use the resource URL from get_dataset to download it directly.`
-              : `Error: ${message}`,
+            text: JSON.stringify({
+              source: 'file',
+              format: resource.format,
+              url: resource.url,
+              total: result.total,
+              returned: result.records.length,
+              offset: effectiveOffset,
+              fields: result.fields,
+              records: result.records,
+            }, null, 2),
           }],
+        }
+      } catch (fileError) {
+        const fileMessage = fileError instanceof Error ? fileError.message : String(fileError)
+        // Format not parseable — return the URL so the AI can guide the user
+        if (fileMessage.startsWith('FORMAT_NOT_PARSEABLE:')) {
+          const [, fmt, url] = fileMessage.split(':')
+          return {
+            content: [{
+              type: 'text',
+              text: JSON.stringify({
+                source: 'file',
+                parseable: false,
+                format: fmt,
+                url,
+                message: `This resource is a ${fmt} file and cannot be parsed automatically. Download it directly from the URL above.`,
+              }, null, 2),
+            }],
+          }
+        }
+        return {
+          content: [{ type: 'text', text: `Error reading file: ${fileMessage}` }],
           isError: true,
         }
       }