npm - newskit-mcp-server - Versions diffs - 1.0.0 - Mend

newskit-mcp-server 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/CONTRIBUTING.md ADDED

@@ -0,0 +1,57 @@
+# Contributing to NewsKit MCP Server
+Thank you for your interest in contributing! This document provides guidelines for contributing to this project.
+## Development Setup
+1. Clone the repository:
+```bash
+git clone https://github.com/CodeAKrome/newskit-mcp-server.git
+cd newskit-mcp-server
+```
+2. Install Node.js dependencies:
+```bash
+npm install
+```
+3. Install Python dependencies:
+```bash
+pip install chromadb sentence-transformers pandas numpy scikit-learn
+```
+4. Build the project:
+```bash
+npm run build
+```
+## Making Changes
+1. Create a new branch for your feature or bug fix
+2. Make your changes
+3. Test thoroughly
+4. Update documentation if needed
+5. Submit a pull request
+## Code Style
+- TypeScript: Use strict mode, follow existing patterns
+- Python: Follow PEP 8 style guide
+- Keep functions focused and well-documented
+## Testing
+Before submitting a PR:
+- Test all four tools (categorize_articles, load_articles, search_similar, get_categories)
+- Verify the build compiles without errors
+- Test with sample data
+## Pull Request Process
+1. Ensure your PR description clearly describes the problem and solution
+2. Reference any related issues
+3. Wait for review and address feedback
+## Questions?
+Open an issue for questions or discussion.
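For the "Test with sample data" step, a small fixture in the README's input format is enough. The sketch below is not part of the package: `write_sample_tsv` and `preview_articles` are hypothetical helpers that write the two example rows from the README under an `article_id`/`title` header, then read the file back in roughly the shape the bridge's `load` command prints (total count plus the first `limit` rows). It uses the standard-library `csv` module instead of pandas, so it only approximates `file_handler.load_articles`, whose source is not in this diff.

```python
import csv
import os
import tempfile

# The two example rows from the README's "Input Format" section.
SAMPLE_ROWS = [
    ("abc123", "Venezuela releases over 100 political prisoners"),
    ("def456", "Seahawks advance to Super Bowl with thrilling win"),
]
REQUIRED_COLUMNS = {"article_id", "title"}


def write_sample_tsv(path: str) -> None:
    """Write a tab-separated file with the header the tools expect."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["article_id", "title"])
        writer.writerows(SAMPLE_ROWS)


def preview_articles(path: str, limit: int = 50) -> dict:
    """Return {"count", "articles"} like the bridge's `load` command (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"TSV is missing columns: {sorted(missing)}")
        rows = list(reader)
    return {
        "count": len(rows),
        "articles": [{"article_id": r["article_id"], "title": r["title"]} for r in rows[:limit]],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "articles.tsv")
        write_sample_tsv(path)
        print(preview_articles(path, limit=1))
```

The file written here can be passed as `inputPath` to `load_articles` or `categorize_articles` when testing a local build.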

package/Dockerfile ADDED

@@ -0,0 +1,63 @@
+# Multi-stage build for NewsKit MCP Server
+FROM node:20-slim AS builder
+WORKDIR /app
+# Copy package files
+COPY package*.json ./
+COPY tsconfig.json ./
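+# Install dependencies; skip lifecycle scripts, because "prepare" would run the build before src/ is copied
+RUN npm ci --ignore-scripts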
+# Copy source code
+COPY src/ ./src/
+# Build the project
+RUN npm run build
+# Production stage
+FROM node:20-slim
+WORKDIR /app
+# Install Python and required packages
+RUN apt-get update && apt-get install -y \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+# Install Python dependencies
+RUN pip3 install --break-system-packages \
+    chromadb \
+    sentence-transformers \
+    pandas \
+    numpy \
+    scikit-learn
+# Copy package files
+COPY package*.json ./
+# Install production Node.js dependencies; skip "prepare", which needs the TypeScript devDependency
+RUN npm ci --omit=dev --ignore-scripts
+# Copy built files from builder
+COPY --from=builder /app/build ./build
+# Copy Python bridge
+COPY python_bridge.py ./
+# Set executable permissions
+RUN chmod +x build/index.js
+# Create directory for ChromaDB
+RUN mkdir -p /app/chroma_db
+# Set environment variables
+ENV NODE_ENV=production
+ENV PYTHONPATH=/app
+# Expose no ports (uses stdio transport)
+# Run the server
+ENTRYPOINT ["node", "build/index.js"]

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 NewsKit Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,174 @@
+# NewsKit MCP Server
+An MCP server for intelligent news article categorization using embeddings and clustering. Automatically groups similar articles together and generates human-readable category names.
+## Features
+- **Semantic Categorization**: Uses sentence-transformers to generate embeddings and DBSCAN clustering to group similar articles
+- **ChromaDB Integration**: Stores article embeddings for fast semantic search
+- **Automatic Category Naming**: Uses TF-IDF to extract keywords and generate descriptive category names
+- **Configurable Parameters**: Adjust similarity thresholds and minimum cluster sizes to fine-tune results
+- **Search Capability**: Find semantically similar articles using natural language queries
+## Tools
+### categorize_articles
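+Run the full categorization pipeline on a TSV file of news articles: embed the titles, store them in ChromaDB, cluster them, name each cluster, and write the results to a JSON file.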
+**Parameters:**
+- `inputPath` (required): Path to TSV file with `article_id` and `title` columns
+- `outputPath` (optional): Output JSON file path (default: `categories.json`)
+- `minClusterSize` (optional): Minimum articles per category (default: 2)
+- `similarityThreshold` (optional): Cosine similarity threshold 0-1 (default: 0.75)
+- `persistDir` (optional): ChromaDB storage directory (default: `./chroma_db`)
+**Example:**
+```json
+{
+  "inputPath": "/path/to/articles.tsv",
+  "outputPath": "/path/to/categories.json",
+  "similarityThreshold": 0.8,
+  "minClusterSize": 3
+}
+```
+### load_articles
+Preview articles from a TSV file without categorizing.
+**Parameters:**
+- `inputPath` (required): Path to TSV file
+- `limit` (optional): Maximum articles to return (default: 50)
+### search_similar
+Search for semantically similar articles using natural language queries.
+**Parameters:**
+- `query` (required): Search query text
+- `persistDir` (optional): ChromaDB directory (default: `./chroma_db`)
+- `nResults` (optional): Number of results (default: 5, max: 20)
+### get_categories
+Display categorized results from a JSON output file.
+**Parameters:**
+- `resultsPath` (required): Path to categories.json file
+## Installation
+### Prerequisites
+- Node.js 18 or higher
+- Python 3.8 or higher
+- Python dependencies: `pip install chromadb sentence-transformers pandas numpy scikit-learn`
+### From NPM
+```bash
+npm install -g newskit-mcp-server
+```
+### From Source
+```bash
+git clone https://github.com/CodeAKrome/newskit-mcp-server.git
+cd newskit-mcp-server
+npm install
+npm run build
+```
+## Configuration
+Add to your MCP settings file:
+```json
+{
+  "mcpServers": {
+    "newskit": {
+      "command": "node",
+      "args": ["/path/to/newskit-mcp-server/build/index.js"],
+      "disabled": false,
+      "alwaysAllow": [],
+      "disabledTools": []
+    }
+  }
+}
+```
+Or if installed via npm:
+```json
+{
+  "mcpServers": {
+    "newskit": {
+      "command": "npx",
+      "args": ["newskit-mcp-server"],
+      "disabled": false
+    }
+  }
+}
+```
+## Input Format
+The input TSV file should have a header row with these two columns:
+- `article_id`: Unique identifier for the article
+- `title`: Article title text
+Example:
+```tsv
+article_id	title
+abc123	Venezuela releases over 100 political prisoners
+def456	Seahawks advance to Super Bowl with thrilling win
+```
+## Output Format
+The output JSON file contains:
+```json
+{
+  "categories": [
+    {
+      "category_id": 1,
+      "category_name": "Venezuela / Prisoners",
+      "article_count": 3,
+      "articles": [
+        {"article_id": "abc123", "title": "Venezuela releases..."}
+      ]
+    }
+  ],
+  "uncategorized": [
+    {"article_id": "xyz789", "title": "Unique article..."}
+  ]
+}
+```
+## Tuning Guide
+| Goal | Parameter Adjustment |
+|------|---------------------|
+| Broader, looser categories (fewer articles left uncategorized) | Lower `similarityThreshold` (try 0.65) |
+| Tighter, more specific categories (more articles left uncategorized) | Raise `similarityThreshold` (try 0.85) |
+| Only major categories | Raise `minClusterSize` (try 5) |
+| Include smaller clusters | Lower `minClusterSize` (the default is 2) |
+## Architecture
+- **TypeScript MCP Server**: Provides the tool interface via stdio transport
+- **Python Bridge**: Interfaces with ML libraries (sentence-transformers, scikit-learn)
+- **ChromaDB**: Vector database for embedding storage and similarity search
+- **Sentence-Transformers**: all-MiniLM-L6-v2 model for generating embeddings
+- **DBSCAN**: Clustering algorithm for grouping similar articles
+- **TF-IDF**: Keyword extraction for automatic category naming
+## License
+MIT License - See LICENSE file for details
+## Contributing
+Contributions welcome! Please read CONTRIBUTING.md for guidelines.
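The README says category names come from TF-IDF keywords, and its output example shows names such as "Venezuela / Prisoners". The `categorizer` module that does this is not included in this diff, so what follows is only a sketch under assumptions: each cluster's titles are treated as one document, terms are scored by term frequency times log(N / df) over the N clusters, and the top two terms are joined with " / ". `name_clusters`, `tokenize`, and the short stopword list are hypothetical. scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its scores would differ.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; real pipelines use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "more", "of", "on", "over", "the", "to", "with"}


def tokenize(title: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed (digits are dropped)."""
    return [tok for tok in re.findall(r"[a-z]+", title.lower()) if tok not in STOPWORDS]


def name_clusters(clusters: list[list[str]], top_k: int = 2) -> list[str]:
    """Name each cluster of titles after its top-k TF-IDF terms, joined with " / "."""
    # One "document" per cluster: term counts over all of its titles.
    docs = [Counter(tok for title in titles for tok in tokenize(title)) for titles in clusters]
    n_docs = len(docs)
    # Document frequency: how many clusters contain each term.
    df = Counter(term for doc in docs for term in doc)
    names = []
    for doc in docs:
        scores = {term: tf * math.log(n_docs / df[term]) for term, tf in doc.items()}
        # Highest score first; ties broken alphabetically so the output is deterministic.
        top = sorted(scores, key=lambda term: (-scores[term], term))[:top_k]
        names.append(" / ".join(term.capitalize() for term in top))
    return names


if __name__ == "__main__":
    # Toy titles for illustration only.
    clusters = [
        ["Venezuela releases political prisoners", "Venezuela frees more prisoners"],
        ["Seahawks advance to Super Bowl", "Seahawks win thrilling Super Bowl"],
    ]
    print(name_clusters(clusters))  # ['Prisoners / Venezuela', 'Bowl / Seahawks']
```

Because IDF is computed across clusters, a word shared by every cluster scores zero and never appears in a name, which is the usual reason for preferring TF-IDF over raw frequency here.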

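How `similarityThreshold` and `minClusterSize` act on the clusters follows from how the bridge calls the clusterer: `cmd_categorize` passes `eps=1 - similarity_threshold` and `min_samples=min_cluster_size` to `cluster_articles(method="dbscan")`. Assuming the clustering uses cosine distance (1 minus cosine similarity), which that `eps` mapping implies but this diff does not show, the plain-Python DBSCAN below illustrates the effect. It is a textbook DBSCAN written for clarity, not the package's implementation (the package depends on scikit-learn, which provides `sklearn.cluster.DBSCAN`); `dbscan` and `cosine_distance` are hypothetical names.

```python
from __future__ import annotations

import math

NOISE = -1  # scikit-learn's DBSCAN uses the same label for noise


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(vectors: list[list[float]], similarity_threshold: float = 0.75,
           min_cluster_size: int = 2) -> list[int]:
    """Label each vector with a cluster id (0, 1, ...) or NOISE."""
    eps = 1.0 - similarity_threshold  # the mapping cmd_categorize uses
    n = len(vectors)
    # eps-neighbourhoods; each point counts itself, as min_samples does in scikit-learn.
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_cluster_size:
            labels[i] = NOISE  # may be claimed later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_cluster_size:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Toy 2-D vectors: two tight pairs plus one vector between them.
    vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95], [1.0, 1.0]]
    print(dbscan(vectors, similarity_threshold=0.75))  # [0, 0, 1, 1, -1]
    print(dbscan(vectors, similarity_threshold=0.70))  # [0, 0, 0, 0, 0]
```

In this toy data, lowering the threshold from 0.75 to 0.70 turns the in-between vector into a core point that bridges both pairs, so two categories become one: a lower threshold broadens categories and leaves fewer points as noise, but it does not necessarily produce more categories.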
package/build/index.js ADDED

@@ -0,0 +1,237 @@
+#!/usr/bin/env node
+import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
+import { z } from "zod";
+import { spawn } from "child_process";
+import * as path from "path";
+import * as fs from "fs";
+import { fileURLToPath } from "url";
+// Get __dirname equivalent in ES modules
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+// Path to the Python script bridge
+const PYTHON_BRIDGE_PATH = path.join(__dirname, "..", "python_bridge.py");
+const SRC_DIR = path.join(__dirname, "..", "..", "src");
+/**
+ * Execute Python script with arguments and return JSON output
+ */
+async function runPythonBridge(args) {
+    return new Promise((resolve, reject) => {
+        const pythonProcess = spawn("python3", [PYTHON_BRIDGE_PATH, ...args], {
+            cwd: SRC_DIR,
+            env: { ...process.env, PYTHONPATH: SRC_DIR }
+        });
+        let stdout = "";
+        let stderr = "";
+        pythonProcess.stdout.on("data", (data) => {
+            stdout += data.toString();
+        });
+        pythonProcess.stderr.on("data", (data) => {
+            stderr += data.toString();
+        });
+        pythonProcess.on("close", (code) => {
+            if (code !== 0) {
+                reject(new Error(`Python bridge failed: ${stderr || stdout}`));
+            }
+            else {
+                try {
+                    // The bridge prints its JSON result last, so search from the end
+                    const lines = stdout.trim().split("\n").reverse();
+                    const jsonLine = lines.find(line => line.startsWith("{") || line.startsWith("["));
+                    if (jsonLine) {
+                        resolve(JSON.parse(jsonLine));
+                    }
+                    else {
+                        resolve({ success: true, output: stdout.trim() });
+                    }
+                }
+                catch (e) {
+                    resolve({ success: true, output: stdout.trim() });
+                }
+            }
+        });
+        pythonProcess.on("error", (error) => {
+            reject(new Error(`Failed to start Python: ${error.message}`));
+        });
+    });
+}
+// Create MCP server
+const server = new McpServer({
+    name: "newskit-mcp-server",
+    version: "1.0.0"
+});
+// Tool: Categorize articles from TSV file
+server.tool("categorize_articles", {
+    inputPath: z.string().describe("Path to input TSV file with article_id and title columns"),
+    outputPath: z.string().optional().describe("Path to output JSON file (default: categories.json)"),
+    minClusterSize: z.number().int().min(1).optional().describe("Minimum articles per category (default: 2)"),
+    similarityThreshold: z.number().min(0).max(1).optional().describe("Cosine similarity threshold 0-1 (default: 0.75)"),
+    persistDir: z.string().optional().describe("ChromaDB storage directory (default: ./chroma_db)")
+}, async ({ inputPath, outputPath, minClusterSize, similarityThreshold, persistDir }) => {
+    try {
+        // Validate input file exists
+        if (!fs.existsSync(inputPath)) {
+            return {
+                content: [{ type: "text", text: `Error: Input file not found: ${inputPath}` }],
+                isError: true
+            };
+        }
+        const args = [
+            "categorize",
+            "--input", inputPath,
+            "--output", outputPath || "categories.json",
+            "--min-cluster-size", String(minClusterSize || 2),
+            // `??` keeps an explicit 0 instead of replacing it with the default
+            "--similarity-threshold", String(similarityThreshold ?? 0.75),
+            "--persist-dir", persistDir || "./chroma_db"
+        ];
+        const result = await runPythonBridge(args);
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Categorization complete!\n\nResults:\n${JSON.stringify(result, null, 2)}`
+                }
+            ]
+        };
+    }
+    catch (error) {
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Categorization failed: ${error instanceof Error ? error.message : String(error)}`
+                }
+            ],
+            isError: true
+        };
+    }
+});
+// Tool: Load and view articles from TSV
+server.tool("load_articles", {
+    inputPath: z.string().describe("Path to input TSV file with article_id and title columns"),
+    limit: z.number().int().min(1).optional().describe("Maximum number of articles to return (default: 50)")
+}, async ({ inputPath, limit }) => {
+    try {
+        if (!fs.existsSync(inputPath)) {
+            return {
+                content: [{ type: "text", text: `Error: Input file not found: ${inputPath}` }],
+                isError: true
+            };
+        }
+        const args = [
+            "load",
+            "--input", inputPath,
+            "--limit", String(limit || 50)
+        ];
+        const result = await runPythonBridge(args);
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Articles loaded (${result.count} total):\n\n${JSON.stringify(result.articles, null, 2)}`
+                }
+            ]
+        };
+    }
+    catch (error) {
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Failed to load articles: ${error instanceof Error ? error.message : String(error)}`
+                }
+            ],
+            isError: true
+        };
+    }
+});
+// Tool: Search similar articles in ChromaDB
+server.tool("search_similar", {
+    query: z.string().describe("Search query to find similar articles"),
+    persistDir: z.string().optional().describe("ChromaDB storage directory (default: ./chroma_db)"),
+    nResults: z.number().int().min(1).max(20).optional().describe("Number of results to return (default: 5)")
+}, async ({ query, persistDir, nResults }) => {
+    try {
+        const args = [
+            "search",
+            "--query", query,
+            "--persist-dir", persistDir || "./chroma_db",
+            "--n-results", String(nResults || 5)
+        ];
+        const result = await runPythonBridge(args);
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Search results for "${query}":\n\n${JSON.stringify(result, null, 2)}`
+                }
+            ]
+        };
+    }
+    catch (error) {
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Search failed: ${error instanceof Error ? error.message : String(error)}`
+                }
+            ],
+            isError: true
+        };
+    }
+});
+// Tool: Get categories from results file
+server.tool("get_categories", {
+    resultsPath: z.string().describe("Path to the categories.json results file")
+}, async ({ resultsPath }) => {
+    try {
+        if (!fs.existsSync(resultsPath)) {
+            return {
+                content: [{ type: "text", text: `Error: Results file not found: ${resultsPath}` }],
+                isError: true
+            };
+        }
+        const content = fs.readFileSync(resultsPath, "utf-8");
+        const data = JSON.parse(content);
+        let output = "Categories:\n\n";
+        if (data.categories) {
+            for (const cat of data.categories) {
+                output += `Category ${cat.category_id}: "${cat.category_name}" (${cat.article_count} articles)\n`;
+                for (const article of cat.articles) {
+                    output += `  - ${article.article_id}: ${article.title}\n`;
+                }
+                output += "\n";
+            }
+        }
+        if (data.uncategorized && data.uncategorized.length > 0) {
+            output += `Uncategorized: ${data.uncategorized.length} articles\n`;
+            for (const article of data.uncategorized.slice(0, 10)) {
+                output += `  - ${article.article_id}: ${article.title}\n`;
+            }
+            if (data.uncategorized.length > 10) {
+                output += `  ... and ${data.uncategorized.length - 10} more\n`;
+            }
+        }
+        return {
+            content: [{ type: "text", text: output }]
+        };
+    }
+    catch (error) {
+        return {
+            content: [
+                {
+                    type: "text",
+                    text: `Failed to read categories: ${error instanceof Error ? error.message : String(error)}`
+                }
+            ],
+            isError: true
+        };
+    }
+});
+// Start the server
+async function main() {
+    const transport = new StdioServerTransport();
+    await server.connect(transport);
+    console.error("NewsKit MCP server running on stdio");
+}
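+main().catch((error) => {
+    console.error("Fatal error starting NewsKit MCP server:", error);
+    process.exit(1);
+});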

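`get_categories` does not call Python; it only reads the JSON file and formats it. For inspecting a `categories.json` outside an MCP client, here is the same formatting rule ported to Python as a hypothetical `render_categories` function: a quoted header per category, one `  - id: title` line per article, and at most 10 uncategorized articles followed by a count of the rest. It follows the handler above but is not shipped with the package.

```python
import json


def render_categories(data: dict, max_uncategorized: int = 10) -> str:
    """Format a categories.json payload the way the get_categories tool does."""
    out = "Categories:\n\n"
    for cat in data.get("categories") or []:
        out += f'Category {cat["category_id"]}: "{cat["category_name"]}" ({cat["article_count"]} articles)\n'
        for article in cat["articles"]:
            out += f'  - {article["article_id"]}: {article["title"]}\n'
        out += "\n"
    uncategorized = data.get("uncategorized") or []
    if uncategorized:
        out += f"Uncategorized: {len(uncategorized)} articles\n"
        for article in uncategorized[:max_uncategorized]:
            out += f'  - {article["article_id"]}: {article["title"]}\n'
        if len(uncategorized) > max_uncategorized:
            out += f"  ... and {len(uncategorized) - max_uncategorized} more\n"
    return out


if __name__ == "__main__":
    import sys

    # Usage: python render_categories.py categories.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            print(render_categories(json.load(f)), end="")
```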
package/package.json ADDED

@@ -0,0 +1,47 @@
+{
+  "name": "newskit-mcp-server",
+  "version": "1.0.0",
+  "description": "MCP server for intelligent news article categorization using embeddings and clustering",
+  "main": "build/index.js",
+  "type": "module",
+  "bin": {
+    "newskit-mcp-server": "build/index.js"
+  },
+  "scripts": {
+    "build": "tsc && node -e \"require('fs').chmodSync('build/index.js', '755')\"",
+    "dev": "ts-node --esm src/index.ts",
+    "prepare": "npm run build"
+  },
+  "keywords": [
+    "mcp",
+    "mcp-server",
+    "news",
+    "categorization",
+    "clustering",
+    "embeddings",
+    "nlp",
+    "machine-learning"
+  ],
+  "author": "",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/CodeAKrome/newskit-mcp-server.git"
+  },
+  "bugs": {
+    "url": "https://github.com/CodeAKrome/newskit-mcp-server/issues"
+  },
+  "homepage": "https://github.com/CodeAKrome/newskit-mcp-server#readme",
+  "engines": {
+    "node": ">=18.0.0"
+  },
+  "dependencies": {
+    "@modelcontextprotocol/sdk": "^1.25.3",
+    "zod": "^4.3.6"
+  },
+  "devDependencies": {
+    "@types/node": "^25.1.0",
+    "ts-node": "^10.9.2",
+    "typescript": "^5.9.3"
+  }
+}

package/python_bridge.py ADDED

@@ -0,0 +1,189 @@
+#!/usr/bin/env python3
+"""
+Python bridge script for NewsKit MCP server.
+This script provides a command-line interface to the NewsKit functionality
+that can be called from the TypeScript MCP server.
+"""
+import argparse
+import json
+import sys
+import os
+# Add the parent directory to Python path to find src
+parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+src_dir = os.path.join(parent_dir, 'src')
+sys.path.insert(0, parent_dir)
+sys.path.insert(0, src_dir)
+from file_handler import load_articles, export_results
+from embeddings import generate_embeddings
+from chroma_manager import initialize_chroma, add_articles_to_chroma
+from categorizer import cluster_articles, generate_category_names, format_results
+from chromadb.config import Settings
+import config
+import chromadb
+def cmd_categorize(args):
+    """Run the full categorization pipeline."""
+    try:
+        # Step 1: Load articles
+        articles_df = load_articles(args.input)
+        if articles_df.empty:
+            print(json.dumps({"error": "No articles found in input file"}))
+            return 1
+        # Step 2: Generate embeddings
+        titles = articles_df['title'].tolist()
+        embeddings = generate_embeddings(titles)
+        # Step 3: Initialize ChromaDB
+        collection = initialize_chroma(
+            collection_name=config.COLLECTION_NAME,
+            persist_dir=args.persist_dir
+        )
+        # Step 4: Add articles to ChromaDB
+        add_articles_to_chroma(collection, articles_df, embeddings)
+        # Step 5: Cluster articles
+        cluster_labels = cluster_articles(
+            embeddings,
+            method="dbscan",
+            min_samples=args.min_cluster_size,
+            eps=1 - args.similarity_threshold
+        )
+        # Step 6: Generate category names
+        categories = generate_category_names(
+            articles_df,
+            cluster_labels,
+            args.min_cluster_size
+        )
+        # Step 7: Format and export results
+        results = format_results(categories, articles_df, cluster_labels)
+        export_results(results, args.output)
+        # Output summary
+        summary = {
+            "success": True,
+            "total_articles": len(articles_df),
+            "categories_count": len(results['categories']),
+            "uncategorized_count": len(results['uncategorized']),
+            "output_file": args.output
+        }
+        print(json.dumps(summary))
+        return 0
+    except Exception as e:
+        print(json.dumps({"error": str(e)}))
+        return 1
+def cmd_load(args):
+    """Load and return articles from TSV file."""
+    try:
+        articles_df = load_articles(args.input)
+        # Convert to list of dicts
+        articles = articles_df.head(args.limit).to_dict('records')
+        result = {
+            "count": len(articles_df),
+            "articles": [
+                {"article_id": str(a['article_id']), "title": a['title']}
+                for a in articles
+            ]
+        }
+        print(json.dumps(result))
+        return 0
+    except Exception as e:
+        print(json.dumps({"error": str(e)}))
+        return 1
+def cmd_search(args):
+    """Search for similar articles in ChromaDB."""
+    try:
+        # Initialize ChromaDB client
+        settings = Settings(anonymized_telemetry=False, allow_reset=True)
+        client = chromadb.PersistentClient(path=args.persist_dir, settings=settings)
+        # Get the collection
+        try:
+            collection = client.get_collection(name=config.COLLECTION_NAME)
+        except Exception:
+            print(json.dumps({"error": "Collection not found. Run categorization first."}))
+            return 1
+        # Query the collection
+        results = collection.query(
+            query_texts=[args.query],
+            n_results=args.n_results
+        )
+        # Format results
+        formatted_results = []
+        if results['ids'] and len(results['ids']) > 0:
+            for i in range(len(results['ids'][0])):
+                formatted_results.append({
+                    "article_id": results['ids'][0][i],
+                    "title": results['documents'][0][i] if results['documents'] else "",
+                    "metadata": results['metadatas'][0][i] if results['metadatas'] else {},
+                    "distance": results['distances'][0][i] if results['distances'] else None
+                })
+        print(json.dumps({
+            "query": args.query,
+            "results": formatted_results
+        }))
+        return 0
+    except Exception as e:
+        print(json.dumps({"error": str(e)}))
+        return 1
+def main():
+    parser = argparse.ArgumentParser(description="NewsKit Python Bridge")
+    subparsers = parser.add_subparsers(dest='command', help='Command to run')
+    # Categorize command
+    categorize_parser = subparsers.add_parser('categorize', help='Categorize articles')
+    categorize_parser.add_argument('--input', required=True, help='Input TSV file path')
+    categorize_parser.add_argument('--output', default='categories.json', help='Output JSON file path')
+    categorize_parser.add_argument('--min-cluster-size', type=int, default=2, help='Minimum cluster size')
+    categorize_parser.add_argument('--similarity-threshold', type=float, default=0.75, help='Similarity threshold')
+    categorize_parser.add_argument('--persist-dir', default='./chroma_db', help='ChromaDB persistence directory')
+    # Load command
+    load_parser = subparsers.add_parser('load', help='Load articles from TSV')
+    load_parser.add_argument('--input', required=True, help='Input TSV file path')
+    load_parser.add_argument('--limit', type=int, default=50, help='Maximum articles to return')
+    # Search command
+    search_parser = subparsers.add_parser('search', help='Search similar articles')
+    search_parser.add_argument('--query', required=True, help='Search query')
+    search_parser.add_argument('--persist-dir', default='./chroma_db', help='ChromaDB persistence directory')
+    search_parser.add_argument('--n-results', type=int, default=5, help='Number of results')
+    args = parser.parse_args()
+    if args.command == 'categorize':
+        return cmd_categorize(args)
+    elif args.command == 'load':
+        return cmd_load(args)
+    elif args.command == 'search':
+        return cmd_search(args)
+    else:
+        parser.print_help()
+        return 1
+if __name__ == "__main__":
+    sys.exit(main())
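Step 7 of `cmd_categorize` turns cluster labels into the output JSON via `format_results`, whose source is not in this diff. A sketch under assumptions: DBSCAN's noise label -1 becomes `uncategorized`, clusters smaller than `min_cluster_size` are also moved there (the bridge passes `min_cluster_size` to `generate_category_names`, which suggests such filtering), and `category_id` counts from 1 as in the README's output example. `group_by_label` is hypothetical, and category names are left as placeholders because naming is a separate step.

```python
from __future__ import annotations

from collections import defaultdict

NOISE = -1


def group_by_label(articles: list[dict], labels: list[int], min_cluster_size: int = 2) -> dict:
    """Build the {"categories", "uncategorized"} structure from cluster labels.

    `articles` is a list of {"article_id", "title"} dicts aligned with `labels`.
    """
    clusters = defaultdict(list)
    uncategorized = []
    for article, label in zip(articles, labels):
        if label == NOISE:
            uncategorized.append(article)
        else:
            clusters[label].append(article)
    categories = []
    for label in sorted(clusters):
        members = clusters[label]
        if len(members) < min_cluster_size:
            uncategorized.extend(members)  # too small to be a category (assumption)
            continue
        categories.append({
            "category_id": len(categories) + 1,
            "category_name": f"Cluster {label}",  # hypothetical placeholder name
            "article_count": len(members),
            "articles": members,
        })
    return {"categories": categories, "uncategorized": uncategorized}


if __name__ == "__main__":
    arts = [{"article_id": i, "title": t} for i, t in [("a", "A1"), ("b", "A2"), ("c", "Lone")]]
    print(group_by_label(arts, [0, 0, NOISE]))
```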

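`cmd_search` delegates ranking to `collection.query`, which uses ChromaDB's approximate nearest-neighbour index. To show what the `distance` field means, here is an exact, brute-force stand-in that ranks stored vectors by cosine distance (1 minus cosine similarity). This is a swapped-in technique for illustration, not what ChromaDB runs, and `rank_by_cosine` is a hypothetical name. A ChromaDB collection uses squared L2 distance unless it was created with the cosine space, and `initialize_chroma` is not in this diff, so the distances the tool reports may be on a different scale.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query: list[float], store: dict[str, list[float]],
                   n_results: int = 5) -> list[tuple[str, float]]:
    """Exact nearest neighbours by cosine distance, nearest first."""
    n_results = max(1, min(n_results, 20))  # same bounds as the tool's zod schema
    scored = [(article_id, 1.0 - cosine_similarity(query, vec)) for article_id, vec in store.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))  # distance, then id for stable ties
    return scored[:n_results]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; all-MiniLM-L6-v2 produces 384-dimensional vectors.
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for article_id, distance in rank_by_cosine([1.0, 0.1], store, n_results=2):
        print(article_id, round(distance, 3))
```

Brute force costs one comparison per stored article per query, which is fine for a few thousand titles; an index such as ChromaDB's trades exactness for speed on larger collections.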
package/src/index.ts ADDED

@@ -0,0 +1,273 @@
+#!/usr/bin/env node
+import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
+import { z } from "zod";
+import { spawn } from "child_process";
+import * as path from "path";
+import * as fs from "fs";
+import { fileURLToPath } from "url";
+// Get __dirname equivalent in ES modules
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+// Path to the Python script bridge
+const PYTHON_BRIDGE_PATH = path.join(__dirname, "..", "python_bridge.py");
+const SRC_DIR = path.join(__dirname, "..", "..", "src");
+/**
+ * Execute Python script with arguments and return JSON output
+ */
+async function runPythonBridge(args: string[]): Promise<any> {
+  return new Promise((resolve, reject) => {
+    const pythonProcess = spawn("python3", [PYTHON_BRIDGE_PATH, ...args], {
+      cwd: SRC_DIR,
+      env: { ...process.env, PYTHONPATH: SRC_DIR }
+    });
+    let stdout = "";
+    let stderr = "";
+    pythonProcess.stdout.on("data", (data) => {
+      stdout += data.toString();
+    });
+    pythonProcess.stderr.on("data", (data) => {
+      stderr += data.toString();
+    });
+    pythonProcess.on("close", (code) => {
+      if (code !== 0) {
+        reject(new Error(`Python bridge failed: ${stderr || stdout}`));
+      } else {
+        try {
+          // The bridge prints its JSON result last, so search from the end
+          const lines = stdout.trim().split("\n").reverse();
+          const jsonLine = lines.find(line => line.startsWith("{") || line.startsWith("["));
+          if (jsonLine) {
+            resolve(JSON.parse(jsonLine));
+          } else {
+            resolve({ success: true, output: stdout.trim() });
+          }
+        } catch (e) {
+          resolve({ success: true, output: stdout.trim() });
+        }
+      }
+    });
+    pythonProcess.on("error", (error) => {
+      reject(new Error(`Failed to start Python: ${error.message}`));
+    });
+  });
+}
+// Create MCP server
+const server = new McpServer({
+  name: "newskit-mcp-server",
+  version: "1.0.0"
+});
+// Tool: Categorize articles from TSV file
+server.tool(
+  "categorize_articles",
+  {
+    inputPath: z.string().describe("Path to input TSV file with article_id and title columns"),
+    outputPath: z.string().optional().describe("Path to output JSON file (default: categories.json)"),
+    minClusterSize: z.number().int().min(1).optional().describe("Minimum articles per category (default: 2)"),
+    similarityThreshold: z.number().min(0).max(1).optional().describe("Cosine similarity threshold 0-1 (default: 0.75)"),
+    persistDir: z.string().optional().describe("ChromaDB storage directory (default: ./chroma_db)")
+  },
+  async ({ inputPath, outputPath, minClusterSize, similarityThreshold, persistDir }) => {
+    try {
+      // Validate input file exists
+      if (!fs.existsSync(inputPath)) {
+        return {
+          content: [{ type: "text", text: `Error: Input file not found: ${inputPath}` }],
+          isError: true
+        };
+      }
+      const args = [
+        "categorize",
+        "--input", inputPath,
+        "--output", outputPath || "categories.json",
+        "--min-cluster-size", String(minClusterSize || 2),
+        // `??` keeps an explicit 0 instead of replacing it with the default
+        "--similarity-threshold", String(similarityThreshold ?? 0.75),
+        "--persist-dir", persistDir || "./chroma_db"
+      ];
+      const result = await runPythonBridge(args);
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Categorization complete!\n\nResults:\n${JSON.stringify(result, null, 2)}`
+          }
+        ]
+      };
+    } catch (error) {
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Categorization failed: ${error instanceof Error ? error.message : String(error)}`
+          }
+        ],
+        isError: true
+      };
+    }
+  }
+);
+// Tool: Load and view articles from TSV
+server.tool(
+  "load_articles",
+  {
+    inputPath: z.string().describe("Path to input TSV file with article_id and title columns"),
+    limit: z.number().int().min(1).optional().describe("Maximum number of articles to return (default: 50)")
+  },
+  async ({ inputPath, limit }) => {
+    try {
+      if (!fs.existsSync(inputPath)) {
+        return {
+          content: [{ type: "text", text: `Error: Input file not found: ${inputPath}` }],
+          isError: true
+        };
+      }
+      const args = [
+        "load",
+        "--input", inputPath,
+        "--limit", String(limit || 50)
+      ];
+      const result = await runPythonBridge(args);
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Articles loaded (${result.count} total):\n\n${JSON.stringify(result.articles, null, 2)}`
+          }
+        ]
+      };
+    } catch (error) {
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Failed to load articles: ${error instanceof Error ? error.message : String(error)}`
+          }
+        ],
+        isError: true
+      };
+    }
+  }
+);
+// Tool: Search similar articles in ChromaDB
+server.tool(
+  "search_similar",
+  {
+    query: z.string().describe("Search query to find similar articles"),
+    persistDir: z.string().optional().describe("ChromaDB storage directory (default: ./chroma_db)"),
+    nResults: z.number().int().min(1).max(20).optional().describe("Number of results to return (default: 5)")
+  },
+  async ({ query, persistDir, nResults }) => {
+    try {
+      const args = [
+        "search",
+        "--query", query,
+        "--persist-dir", persistDir || "./chroma_db",
+        "--n-results", String(nResults || 5)
+      ];
+      const result = await runPythonBridge(args);
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Search results for "${query}":\n\n${JSON.stringify(result, null, 2)}`
+          }
+        ]
+      };
+    } catch (error) {
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Search failed: ${error instanceof Error ? error.message : String(error)}`
+          }
+        ],
+        isError: true
+      };
+    }
+  }
+);
+// Tool: Get categories from results file
+server.tool(
+  "get_categories",
+  {
+    resultsPath: z.string().describe("Path to the categories.json results file")
+  },
+  async ({ resultsPath }) => {
+    try {
+      if (!fs.existsSync(resultsPath)) {
+        return {
+          content: [{ type: "text", text: `Error: Results file not found: ${resultsPath}` }],
+          isError: true
+        };
+      }
+      const content = fs.readFileSync(resultsPath, "utf-8");
+      const data = JSON.parse(content);
+      let output = "Categories:\n\n";
+      if (data.categories) {
+        for (const cat of data.categories) {
+          output += `Category ${cat.category_id}: "${cat.category_name}" (${cat.article_count} articles)\n`;
+          for (const article of cat.articles) {
+            output += `  - ${article.article_id}: ${article.title}\n`;
+          }
+          output += "\n";
+        }
+      }
+      if (data.uncategorized && data.uncategorized.length > 0) {
+        output += `Uncategorized: ${data.uncategorized.length} articles\n`;
+        for (const article of data.uncategorized.slice(0, 10)) {
+          output += `  - ${article.article_id}: ${article.title}\n`;
+        }
+        if (data.uncategorized.length > 10) {
+          output += `  ... and ${data.uncategorized.length - 10} more\n`;
+        }
+      }
+      return {
+        content: [{ type: "text", text: output }]
+      };
+    } catch (error) {
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Failed to read categories: ${error instanceof Error ? error.message : String(error)}`
+          }
+        ],
+        isError: true
+      };
+    }
+  }
+);
+// Start the server
+async function main() {
+  const transport = new StdioServerTransport();
+  await server.connect(transport);
+  console.error("NewsKit MCP server running on stdio");
+}
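+main().catch((error) => {
+  console.error("Fatal error starting NewsKit MCP server:", error);
+  process.exit(1);
+});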

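`runPythonBridge` treats the bridge's stdout as a small protocol: each command prints a single JSON line, and anything else is wrapped as `{ success: true, output }`. A test harness that drives `python_bridge.py` directly needs the same rule. The hypothetical `parse_bridge_stdout` below applies it in Python, scanning from the last line so that stray log lines printed earlier do not shadow the result, and falling back to the raw-output wrapper on malformed JSON as the TypeScript does.

```python
import json


def parse_bridge_stdout(stdout: str):
    """Return the last JSON object/array line in `stdout`, else a raw-output wrapper."""
    lines = stdout.strip().split("\n")
    for line in reversed(lines):
        if line.startswith("{") or line.startswith("["):
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                break  # mirror the TypeScript fallback on malformed JSON
    return {"success": True, "output": stdout.strip()}


if __name__ == "__main__":
    print(parse_bridge_stdout('Loading model...\n{"count": 2, "articles": []}\n'))
```

Paired with `subprocess.run([sys.executable, "python_bridge.py", "load", "--input", path], capture_output=True, text=True)`, this reproduces what the MCP tools see, provided the `src/` Python modules the bridge imports are on its path.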
package/tsconfig.json ADDED

@@ -0,0 +1,16 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "Node16",
+    "moduleResolution": "Node16",
+    "outDir": "./build",
+    "rootDir": "./src",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "forceConsistentCasingInFileNames": true,
+    "resolveJsonModule": true
+  },
+  "include": ["src/**/*"],
+  "exclude": ["node_modules"]
+}