npm - uer-mcp - Versions diffs - 4.1.0 → 4.2.1 - Mend

uer-mcp 4.1.0 → 4.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/README.md +136 -94
package/package.json +15 -13
package/python/README.md +136 -94
package/python/pyproject.toml +2 -2
package/python/src/uer/evaluation/__init__.py +4 -0
package/python/src/uer/llm/config_guide.py +361 -0
package/python/src/uer/llm/config_registry.py +275 -0
package/python/src/uer/llm/gateway.py +579 -21
package/python/src/uer/security/__init__.py +13 -0
package/python/src/uer/security/prompt_injection.py +293 -0
package/python/src/uer/server.py +316 -0
package/python/uv.lock +1 -1

package/README.md CHANGED Viewed

@@ -3,16 +3,21 @@
   # Universal Expert Registry
-  [![npm version](https://badge.fury.io/js/uer-mcp.svg)](https://www.npmjs.com/package/uer-mcp)
+  [![npm version](https://img.shields.io/npm/v/uer-mcp)](https://www.npmjs.com/package/uer-mcp)
+  [![npm](https://img.shields.io/npm/dm/uer-mcp)](https://www.npmjs.com/package/uer-mcp)
+  [![npm bundle size](https://img.shields.io/bundlephobia/min/uer-mcp)](https://www.npmjs.com/package/uer-mcp)
   [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-  **ASI-Level Experts, Infinite Memory, Any Client**
+  **Multi-Provider LLM Gateway • S3-Compatible Storage • MCP Tool Orchestration**
+  > ⚠️ **Development Status**: This is a hackathon proof-of-concept. While the architecture supports 100+ LLM providers via LiteLLM, only a subset (Anthropic, Cerebras, OpenAI, Gemini, LM Studio, Ollama) have been extensively tested. Version numbers track feature implementation progress, not production readiness. If you encounter issues with other providers, please [open an issue](https://github.com/margusmartsepp/UER/issues).
 </div>
 ---
 **Standard config** works in most MCP clients:
-> 💡 **Quick Start**: Get a free Gemini API key at [aistudio.google.com/api-keys](https://aistudio.google.com/api-keys)
+> **Quick Start**: Get a free Cerebras API key at [cloud.cerebras.ai/platform](https://cloud.cerebras.ai/platform) under apikeys or use LM Studio (100% free, local)
 ```json
 {
   "mcpServers": {
@@ -20,16 +25,20 @@
       "command": "npx",
       "args": ["uer-mcp@latest"],
       "env": {
-        "GEMINI_API_KEY": "your-key-here"
+        // Specific provider key(s)
+        "CEREBRAS_API_KEY": "your-key-here",
+        "GEMINI_API_KEY": "your-key-here", // etc
+        // LM Studio (optional) - local models
+        "LM_STUDIO_API_BASE": "http://localhost:1234/v1"
       }
     }
   }
 }
 ```
-> **📦 Storage is optional**: This config works immediately for LLM and MCP features. For storage/context features, see [Storage Configuration Options](#storage-configuration-options) below.
+> **Storage is optional**: This config works immediately for LLM and MCP features. For storage/context features, see [Storage Configuration Options](#storage-configuration-options) below.
-> **⚠️ Required**: Add at least one API key to the `env` section. See [CONFIGURATION.md](CONFIGURATION.md) for all provider links and detailed setup.
+> **Required**: Add at least one API key to the `env` section. See [CONFIGURATION.md](CONFIGURATION.md) for all provider links and detailed setup.
 [<img src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Server&color=0098FF" alt="Install in VS Code">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img alt="Install in VS Code Insiders" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Server&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img src="https://cursor.com/deeplink/mcp-install-dark.svg" alt="Install in Cursor">](https://cursor.com/en/install-mcp?name=UER&config=eyJjb21tYW5kIjoibnB4IHVlci1tY3BAbGF0ZXN0In0%3D) [<img src="https://img.shields.io/badge/Windsurf-Windsurf?style=flat-square&label=Install%20Server&color=0B7A8F" alt="Install in Windsurf">](https://windsurf.com)
@@ -43,24 +52,24 @@ For Claude Desktop, Goose, Codex, Amp, and other clients, see [CONFIGURATION.md]
 ---
 An MCP server that provides:
-1. **Universal LLM Access** - Call any LLM (Claude, GPT, Gemini, Bedrock, Azure, local models) through LiteLLM
-2. **MCP Tool Orchestration** - Connect to 1000+ MCP servers (filesystem, databases, browsers, etc.)
-3. **Shared Memory/Context** - Break context window limits via external storage with URI references
-4. **Subagent Delegation** - Spawn subagents with full chat history, not just single messages
+1. **Multi-Provider LLM Access** - Call 100+ LLM providers (Anthropic, OpenAI, Google, Azure, AWS Bedrock, local models) through LiteLLM
+2. **MCP Tool Integration** - Connect to other MCP servers for extended functionality
+3. **S3-Compatible Storage** - Store context and data in MinIO, AWS S3, or other S3-compatible backends
+4. **Prompt Injection Detection** - Basic content validation and security warnings
 ## Why This Exists
-LLMs have fundamental limitations:
-- **Single message I/O**: 32-64k tokens max
-- **Context window**: 200k-2M tokens
-- **No persistent memory**: Forget between sessions
-- **No expert access**: Can't use specialized tools
+MCP clients often need:
+- **Multiple LLM providers** - Different models for different tasks
+- **Persistent storage** - Save context between sessions
+- **Tool integration** - Connect to specialized MCP servers
+- **Configuration flexibility** - Support cloud and self-hosted solutions
-Traditional multi-agent approaches waste tokens by copying full context to each subagent. This registry solves it by:
-- Storing context externally (unlimited)
-- Passing URI references instead of full data (50 tokens vs 50k)
-- Building complete chat histories for subagents
-- Persisting across sessions
+UER provides:
+- Unified interface to 100+ LLM providers via LiteLLM
+- S3-compatible storage for context and data
+- MCP client for calling other MCP servers
+- Support for enterprise clouds (Azure, AWS, GCP) and self-hosted (Ollama, LM Studio)
 ## Architecture
@@ -80,9 +89,9 @@ graph TB
         subgraph litellm["LiteLLM Gateway"]
             C1["100+ LLM providers"]
-            C2["Native MCP Gateway"]
-            C3["A2A Protocol support"]
-            C4["Cost tracking, rate limiting, fallbacks"]
+            C2["Model routing"]
+            C3["Error handling"]
+            C4["Response formatting"]
         end
         subgraph store["Context Store"]
@@ -142,10 +151,10 @@ llm_call(model="ollama/llama3.1:8b-instruct-q4_K_M", messages=[...])
 ```
 Features included:
-- Automatic fallbacks between providers
-- Cost tracking per request
-- Rate limit handling with retries
-- Tool/function calling across all providers
+- Unified interface across providers
+- Support for cloud and self-hosted models
+- Automatic model detection and caching
+- Error handling and response formatting
 ### 2. MCP Tool Integration
@@ -161,28 +170,25 @@ mcp_call(server="postgres", tool="query", args={"sql": "SELECT * FROM users"})
 mcp_call(server="context7", tool="search", args={"query": "LiteLLM API reference"})
 ```
-### 3. Shared Context (The Killer Feature)
+### 3. S3-Compatible Storage
-Store data externally, pass URI references:
+Store data in S3-compatible backends:
 ```python
-# Store large document (200k tokens) in S3-compatible storage
-put("s3://uer-context/analysis/doc_001.json", {"content": large_document})
-# Pass only URI to subagent (50 tokens!)
-delegate(
-    model="anthropic/claude-sonnet-4-5-20250929",
-    task="Analyze the document",
-    context_refs=["s3://uer-context/analysis/doc_001.json"]
+# Store data in MinIO, AWS S3, or other S3-compatible storage
+storage_put(
+    key="analysis/doc_001.json",
+    content={"content": large_document},
+    bucket="uer-context"
 )
-# Subagent retrieves full content from storage
-# Result stored back to S3
-# Parent retrieves summary only
+# Retrieve data
+data = storage_get(
+    key="analysis/doc_001.json",
+    bucket="uer-context"
+)
 ```
-**Token savings: 99.9%** for multi-agent workflows.
 **Storage backends:**
 - **Local:** MinIO (S3-compatible, Docker-based)
 - **Cloud:** AWS S3, Azure Blob Storage, NetApp StorageGRID
@@ -263,33 +269,14 @@ With storage disabled:
 The server will start successfully without storage, and LLMs won't see storage-related tools in their tool list.
-### 4. Full Chat History for Subagents
+### 4. Prompt Injection Detection
-Build complete conversation context, not just single messages:
+Basic content validation and security warnings:
 ```python
-delegate(
-    model="openai/gpt-5-mini",
-    messages=[
-        {"role": "system", "content": "You are a code reviewer..."},
-        {"role": "user", "content": "Review this code for security issues"},
-        {"role": "assistant", "content": "I'll analyze the code..."},
-        {"role": "user", "content": "Focus on SQL injection risks"}
-    ],
-    tools=[...],  # MCP tools available to subagent
-    context_refs=["registry://context/codebase"]  # Large context via URI
-)
-```
-### 5. Continuation Across Sessions
-Complex tasks can span multiple messages and sessions:
-```
-Message 1: Start analysis → Progress: 20% → {{continuation: registry://plan/001}}
-Message 2: Continue → Progress: 60% → {{continuation: registry://plan/001}}
-[Next day]
-Message 3: Continue → Complete! Here's your report...
+# Detects potential prompt injection patterns
+# Provides risk assessment and warnings
+# Helps identify suspicious content in user inputs
 ```
 ## Usage
@@ -327,12 +314,15 @@ User: "Ask both Gemini and Claude Sonnet to write a haiku about programming"
 → Returns both haikus for comparison
 ```
-**3. Store and Share Context:**
+**3. Store and Retrieve Data:**
 ```
-User: "Store this document in the registry and have Gemini summarize it"
-→ put("registry://context/doc", {...})
-→ delegate(model="gemini/gemini-3-flash-preview", context_refs=["registry://context/doc"])
-→ Returns: Summary without re-sending full document
+User: "Store this configuration in S3"
+→ storage_put(key="config/settings.json", content={...})
+→ Returns: Confirmation with storage details
+User: "Retrieve the configuration"
+→ storage_get(key="config/settings.json")
+→ Returns: Configuration data
 ```
 ## Troubleshooting
@@ -364,36 +354,88 @@ User: "Store this document in the registry and have Gemini summarize it"
 | Tool | Description |
 |------|-------------|
 | `llm_call` | Call any LLM via LiteLLM (100+ providers) |
+| `llm_list_models` | List available models from configured providers |
+| `llm_config_guide` | Get configuration help for LLM providers |
 | `mcp_call` | Call any configured MCP server tool |
-| `put` | Store data/context in registry |
-| `get` | Retrieve data/context from registry |
-| `search` | Search MCP servers, skills, or stored context |
-| `delegate` | Spawn subagent with full chat history |
-| `subscribe` | Watch for async results |
-| `cancel` | Cancel subscription or execution |
+| `mcp_list_tools` | List available MCP tools |
+| `mcp_servers` | List configured MCP servers |
+| `storage_put` | Store data in S3-compatible storage |
+| `storage_get` | Retrieve data from storage |
+| `storage_list` | List stored objects |
+| `storage_delete` | Delete stored objects |
 ## LiteLLM Integration
 This project uses [LiteLLM](https://github.com/BerriAI/litellm) as the unified LLM gateway, providing:
 - **100+ LLM providers** through single interface
-- **Native MCP Gateway** with permission management
-- **A2A Protocol** for agent-to-agent communication
-- **Cost tracking** per request with spend reports
-- **Rate limiting** with automatic retries
-- **Fallbacks** between providers on failure
-- **Tool/function calling** normalized across providers
-### Supported Providers
-| Provider | Model Examples |
-|----------|---------------|
-| Anthropic | `anthropic/claude-sonnet-4-5-20250929`, `anthropic/claude-opus-4-5-20251101` |
-| OpenAI | `openai/gpt-5.2`, `openai/gpt-5-mini`, `openai/gpt-5.2-codex` |
-| Google | `gemini/gemini-3-flash-preview`, `gemini/gemini-3-pro-preview` |
-| Azure | `azure/gpt-4-deployment` |
-| AWS Bedrock | `bedrock/anthropic.claude-3-sonnet` |
-| Local | `ollama/llama3.1:8b-instruct-q4_K_M`, `lm_studio/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF` |
+- **Unified API format** across all providers
+- **Support for cloud and self-hosted models**
+- **Automatic model detection** and caching
+- **Error handling** and response formatting
+### Provider & Model Discovery
+**Find supported providers and models:**
+- 📖 **[PROVIDERS.md](PROVIDERS.md)** - Complete guide to LiteLLM provider integrations and configuration
+- 🌐 **[LiteLLM Provider Docs](https://docs.litellm.ai/docs/providers/)** - Official documentation for all 100+ providers
+- 🔧 **`llm_list_models` tool** - Query available models from your configured providers
+- 🔧 **`llm_config_guide` tool** - Get configuration help for specific providers
+### Supported Providers (Examples)
+| Provider | Model Examples | Testing Status |
+|----------|---------------|----------------|
+| Anthropic | `anthropic/claude-sonnet-4-5-20250929`, `anthropic/claude-opus-4-5-20251101` | ✅ Tested |
+| Cerebras | `cerebras/llama-3.3-70b`, `cerebras/qwen-3-235b-a22b-instruct-2507` | ✅ Tested |
+| OpenAI | `openai/gpt-4o`, `openai/o3-mini` | ✅ Tested |
+| Google | `gemini/gemini-2.5-flash`, `gemini/gemini-2.0-flash-exp` | ✅ Tested |
+| LM Studio | `lm_studio/meta-llama-3.1-8b-instruct` (local) | ✅ Tested |
+| Ollama | `ollama/llama3.1:8b-instruct-q4_K_M` (local) | ✅ Tested |
+| Azure | `azure/gpt-4-deployment` | ⚠️ Untested |
+| AWS Bedrock | `bedrock/anthropic.claude-3-sonnet` | ⚠️ Untested |
+| Cohere | `cohere_chat/command-r-plus` | ⚠️ Untested |
+| Together AI | `together_ai/meta-llama/Llama-3-70b-chat-hf` | ⚠️ Untested |
+**Testing Status:**
+- ✅ **Tested**: Verified during development with live API queries and model caching
+- ⚠️ **Untested**: Supported via LiteLLM but not extensively tested. May require minor adjustments. Please [report issues](https://github.com/margusmartsepp/UER/issues) if you encounter problems.
+**Note:** Model names change frequently. Use the discovery tools above to find current models.
+### Advanced Configuration
+**Multi-Instance Providers:**
+LiteLLM supports multiple instances of the same provider (e.g., multiple Azure deployments). Configure via environment variables:
+```bash
+# Multiple Azure deployments
+AZURE_API_KEY="key1"
+AZURE_API_BASE="https://endpoint1.openai.azure.com"
+AZURE_API_VERSION="2023-05-15"
+# Use model format: azure/<deployment-name>
+# Example: azure/gpt-4-deployment
+```
+**Generic Provider Support:**
+Any provider with a configured API key will be detected automatically. If we don't have a specific query implementation, example models will be provided. Supported providers include:
+- Cohere (`COHERE_API_KEY`)
+- Together AI (`TOGETHERAI_API_KEY`)
+- Replicate (`REPLICATE_API_KEY`)
+- Hugging Face (`HUGGINGFACE_API_KEY`)
+- And 90+ more - see [LiteLLM docs](https://docs.litellm.ai/docs/providers/)
+**Fallback Chains:**
+LiteLLM supports automatic fallbacks. Configure via model list:
+```python
+# In your LLM call, specify fallback models
+model="gpt-4o"  # Primary
+fallbacks=["claude-sonnet-4-5", "gemini-2.5-flash"]  # Fallbacks
+```
+See [PROVIDERS.md](PROVIDERS.md) for detailed configuration examples.
 ## Project Structure

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "uer-mcp",
-  "version": "4.1.0",
-  "description": "Universal Expert Registry - Multi-agent safety monitoring, sandbagging detection, and simulation framework with 100+ LLM providers",
+  "version": "4.2.1",
+  "description": "[Hackathon Proof-of-Concept] Universal Expert Registry - Multi-provider LLM gateway, S3-compatible storage, and MCP tool orchestration. Tested with Anthropic, Cerebras, OpenAI, Gemini, LM Studio, Ollama.",
   "main": "index.js",
   "bin": {
     "uer-mcp": "bin/uer-mcp.js"
@@ -51,15 +51,17 @@
   "mcp": {
     "displayName": "Universal Expert Registry",
     "icon": "img/uer.jpg",
-    "description": "Multi-agent safety monitoring, sandbagging detection, and simulation framework. Access 100+ LLM providers, connect to 1000+ MCP servers, and manage unlimited context with external storage.",
+    "description": "[Proof-of-Concept] Multi-provider LLM gateway with 100+ providers via LiteLLM. Extensively tested: Anthropic, Cerebras, OpenAI, Gemini, LM Studio, Ollama. Other providers may need adjustments. Version numbers track features, not production readiness.",
     "features": [
       "Multi-Agent Safety Monitoring - 15+ behavior patterns (AgentVerse, sycophancy, deception, sandbagging)",
       "Sandbagging Detection - Multi-method detection with consistency testing and capability elicitation",
       "Multi-Agent Simulation - Full conversation orchestration with personas, audit trails, and manipulation detection",
       "Universal LLM Access - Call any LLM through LiteLLM (Claude, GPT, Gemini, Bedrock, Azure, local models)",
       "MCP Tool Orchestration - Connect to 1000+ MCP servers (filesystem, databases, browsers, etc.)",
-      "Shared Memory/Context - Break context window limits via external storage with URI references",
-      "Subagent Delegation - Spawn subagents with full chat history and behavior monitoring"
+      "S3-Compatible Storage - Persistent context storage with MinIO, AWS S3, or Azure Blob",
+      "Prompt Injection Detection - Basic content validation and security warnings",
+      "LM Studio Support - Local model hosting with OpenAI-compatible API",
+      "Model Query & Caching - Automatic model detection for Anthropic, Cerebras, OpenAI, Gemini"
     ],
     "tools": [
       {
@@ -91,20 +93,20 @@
         "description": "Quick sandbagging screening test"
       },
       {
-        "name": "put",
-        "description": "Store data in external context storage"
+        "name": "storage_put",
+        "description": "Store data in S3-compatible storage"
       },
       {
-        "name": "get",
-        "description": "Retrieve data from external context storage"
+        "name": "storage_get",
+        "description": "Retrieve data from S3-compatible storage"
       },
       {
-        "name": "delegate",
-        "description": "Delegate tasks to subagents with full context"
+        "name": "llm_list_models",
+        "description": "List available models from configured providers"
       },
       {
-        "name": "search",
-        "description": "Search stored context and knowledge"
+        "name": "llm_config_guide",
+        "description": "Get configuration help for LLM providers"
       }
     ],
     "configuration": {