uer-mcp 1.0.3 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,553 +1,637 @@
1
- <div align="center">
2
- <img src="img/uer.jpg" alt="UER Logo" width="200"/>
3
-
4
- # Universal Expert Registry
5
-
6
- [![npm version](https://badge.fury.io/js/uer-mcp.svg)](https://www.npmjs.com/package/uer-mcp)
7
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
-
9
- **ASI-Level Experts, Infinite Memory, Any Client**
10
- </div>
11
-
12
- ---
13
-
14
- **Standard config** works in most MCP clients:
15
- > 💡 **Quick Start**: Get a free Gemini API key at [aistudio.google.com/api-keys](https://aistudio.google.com/api-keys)
16
- ```json
17
- {
18
- "mcpServers": {
19
- "uer": {
20
- "command": "npx",
21
- "args": ["uer-mcp@latest"],
22
- "env": {
23
- "GEMINI_API_KEY": "your-key-here"
24
- }
25
- }
26
- }
27
- }
28
- ```
29
-
30
-
31
-
32
- > **⚠️ Required**: Add at least one API key to the `env` section. See [CONFIGURATION.md](CONFIGURATION.md) for all provider links and detailed setup.
33
-
34
- [<img src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Server&color=0098FF" alt="Install in VS Code">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img alt="Install in VS Code Insiders" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Server&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img src="https://cursor.com/deeplink/mcp-install-dark.svg" alt="Install in Cursor">](https://cursor.com/en/install-mcp?name=UER&config=eyJjb21tYW5kIjoibnB4IHVlci1tY3BAbGF0ZXN0In0%3D) [<img src="https://img.shields.io/badge/Windsurf-Windsurf?style=flat-square&label=Install%20Server&color=0B7A8F" alt="Install in Windsurf">](https://windsurf.com)
35
-
36
- <details>
37
- <summary>Other MCP Clients</summary>
38
-
39
- For Claude Desktop, Goose, Codex, Amp, and other clients, see [CONFIGURATION.md](CONFIGURATION.md) for detailed setup instructions.
40
-
41
- </details>
42
-
43
- ---
44
-
45
- An MCP server that provides:
46
- 1. **Universal LLM Access** - Call any LLM (Claude, GPT, Gemini, Bedrock, Azure, local models) through LiteLLM
47
- 2. **MCP Tool Orchestration** - Connect to 1000+ MCP servers (filesystem, databases, browsers, etc.)
48
- 3. **Shared Memory/Context** - Break context window limits via external storage with URI references
49
- 4. **Subagent Delegation** - Spawn subagents with full chat history, not just single messages
50
-
51
- ## Why This Exists
52
-
53
- LLMs have fundamental limitations:
54
- - **Single message I/O**: 32-64k tokens max
55
- - **Context window**: 200k-2M tokens
56
- - **No persistent memory**: Forget between sessions
57
- - **No expert access**: Can't use specialized tools
58
-
59
- Traditional multi-agent approaches waste tokens by copying full context to each subagent. This registry solves it by:
60
- - Storing context externally (unlimited)
61
- - Passing URI references instead of full data (50 tokens vs 50k)
62
- - Building complete chat histories for subagents
63
- - Persisting across sessions
64
-
65
- ## Architecture
66
-
67
- ```mermaid
68
- graph TB
69
- subgraph clients["MCP Clients"]
70
- A1["Cursor"]
71
- A2["Claude Desktop"]
72
- A3["ChatGPT"]
73
- A4["VS Code"]
74
- A5["JetBrains"]
75
- end
76
-
77
- subgraph uer["UER - Universal Expert Registry"]
78
- direction TB
79
- B["MCP Tools<br/>llm_call, mcp_call, put, get, delegate, search"]
80
-
81
- subgraph litellm["LiteLLM Gateway"]
82
- C1["100+ LLM providers"]
83
- C2["Native MCP Gateway"]
84
- C3["A2A Protocol support"]
85
- C4["Cost tracking, rate limiting, fallbacks"]
86
- end
87
-
88
- subgraph store["Context Store"]
89
- D1["Local: SQLite"]
90
- D2["Cloud: Firebase"]
91
- end
92
-
93
- B --> litellm
94
- B --> store
95
- end
96
-
97
- subgraph providers["LLM Providers"]
98
- E1["Anthropic"]
99
- E2["OpenAI"]
100
- E3["Google"]
101
- E4["Azure"]
102
- E5["AWS Bedrock"]
103
- E6["Local: Ollama"]
104
- end
105
-
106
- subgraph mcpservers["MCP Servers"]
107
- F1["Filesystem"]
108
- F2["PostgreSQL"]
109
- F3["Slack"]
110
- F4["Browser"]
111
- F5["GitHub"]
112
- F6["1000+ more..."]
113
- end
114
-
115
- subgraph knowledge["Knowledge Sources"]
116
- G1["Context7"]
117
- G2["Company docs"]
118
- G3["Guidelines"]
119
- G4["Standards"]
120
- end
121
-
122
- clients -->|MCP Protocol| B
123
- litellm --> providers
124
- litellm --> mcpservers
125
- litellm --> knowledge
126
- ```
127
-
128
- ## Key Features
129
-
130
- ### 1. Universal LLM Access via LiteLLM
131
-
132
- Call any LLM with a single interface:
133
-
134
- ```python
135
- # All use the same interface - just change the model string
136
- llm_call(model="anthropic/claude-sonnet-4-5-20250929", messages=[...])
137
- llm_call(model="openai/gpt-5.2", messages=[...])
138
- llm_call(model="gemini/gemini-3-flash-preview", messages=[...])
139
- llm_call(model="bedrock/anthropic.claude-3-sonnet", messages=[...])
140
- llm_call(model="azure/gpt-4-deployment", messages=[...])
141
- llm_call(model="ollama/llama3.1:8b-instruct-q4_K_M", messages=[...])
142
- ```
143
-
144
- Features included:
145
- - Automatic fallbacks between providers
146
- - Cost tracking per request
147
- - Rate limit handling with retries
148
- - Tool/function calling across all providers
149
-
150
- ### 2. MCP Tool Integration
151
-
152
- Connect to any MCP server:
153
-
154
- ```python
155
- # List available MCP tools
156
- search(type="mcp")
157
-
158
- # Call MCP tools directly
159
- mcp_call(server="filesystem", tool="read_file", args={"path": "/data/report.txt"})
160
- mcp_call(server="postgres", tool="query", args={"sql": "SELECT * FROM users"})
161
- mcp_call(server="context7", tool="search", args={"query": "LiteLLM API reference"})
162
- ```
163
-
164
- ### 3. Shared Context (The Killer Feature)
165
-
166
- Store data externally, pass URI references:
167
-
168
- ```python
169
- # Store large document (200k tokens)
170
- put("registry://context/doc_001", {"content": large_document})
171
-
172
- # Pass only URI to subagent (50 tokens!)
173
- delegate(
174
- model="anthropic/claude-sonnet-4-5-20250929",
175
- task="Analyze the document",
176
- context_refs=["registry://context/doc_001"]
177
- )
178
-
179
- # Subagent retrieves full content from registry
180
- # Result stored back to registry
181
- # Parent retrieves summary only
182
- ```
183
-
184
- **Token savings: 99.9%** for multi-agent workflows.
185
-
186
- ### 4. Full Chat History for Subagents
187
-
188
- Build complete conversation context, not just single messages:
189
-
190
- ```python
191
- delegate(
192
- model="openai/gpt-5-mini",
193
- messages=[
194
- {"role": "system", "content": "You are a code reviewer..."},
195
- {"role": "user", "content": "Review this code for security issues"},
196
- {"role": "assistant", "content": "I'll analyze the code..."},
197
- {"role": "user", "content": "Focus on SQL injection risks"}
198
- ],
199
- tools=[...], # MCP tools available to subagent
200
- context_refs=["registry://context/codebase"] # Large context via URI
201
- )
202
- ```
203
-
204
- ### 5. Continuation Across Sessions
205
-
206
- Complex tasks can span multiple messages and sessions:
207
-
208
- ```
209
- Message 1: Start analysis → Progress: 20% → {{continuation: registry://plan/001}}
210
- Message 2: Continue → Progress: 60% → {{continuation: registry://plan/001}}
211
- [Next day]
212
- Message 3: Continue → Complete! Here's your report...
213
- ```
214
-
215
- ## Usage
216
-
217
- ### Test Your Setup
218
-
219
- Try this in Claude Desktop:
220
-
221
- ```
222
- "Use the llm_call tool to call Gemini 3 Flash and ask it to explain what an MCP server is in one sentence."
223
- ```
224
-
225
- Expected behavior:
226
- - Claude will use the `llm_call` tool
227
- - Call `gemini/gemini-3-flash-preview`
228
- - Return Gemini's response
229
-
230
- ### Example Usage Scenarios
231
-
232
- **1. Call Different LLMs:**
233
- ```
234
- User: "Use llm_call to ask Gemini what the capital of France is"
235
- Calls gemini/gemini-3-flash-preview
236
- Returns: "Paris"
237
-
238
- User: "Now ask Claude Sonnet the same question"
239
- → Calls anthropic/claude-sonnet-4-5-20250929
240
- Returns: "Paris"
241
- ```
242
-
243
- **2. Compare LLM Responses:**
244
- ```
245
- User: "Ask both Gemini and Claude Sonnet to write a haiku about programming"
246
- Uses llm_call twice with different models
247
- Returns both haikus for comparison
248
- ```
249
-
250
- **3. Store and Share Context:**
251
- ```
252
- User: "Store this document in the registry and have Gemini summarize it"
253
- → put("registry://context/doc", {...})
254
- → delegate(model="gemini/gemini-3-flash-preview", context_refs=["registry://context/doc"])
255
- → Returns: Summary without re-sending full document
256
- ```
257
-
258
- ## Troubleshooting
259
-
260
- ### "MCP server not found" or "No tools available"
261
-
262
- 1. Check that `claude_desktop_config.json` is in the correct location
263
- 2. Verify the `--directory` path is correct (use absolute path)
264
- 3. Ensure you've restarted Claude Desktop after configuration
265
- 4. Check Claude Desktop logs: `%APPDATA%\Claude\logs\` (Windows) or `~/Library/Logs/Claude/` (Mac)
266
-
267
- ### "API key invalid" errors
268
-
269
- 1. Verify your API key is correct and active
270
- 2. Check you're using the right key for the right provider
271
- 3. For Gemini, ensure the key starts with `AIza`
272
- 4. For Anthropic, ensure the key starts with `sk-ant-`
273
- 5. For OpenAI, ensure the key starts with `sk-`
274
-
275
- ### "Model not found" errors
276
-
277
- 1. Ensure you have an API key configured for that provider
278
- 2. Check the model name is correct (use LiteLLM format: `provider/model`)
279
- 3. Verify the model is available in your region/tier
280
-
281
-
282
- ## Tools Reference
283
-
284
- | Tool | Description |
285
- |------|-------------|
286
- | `llm_call` | Call any LLM via LiteLLM (100+ providers) |
287
- | `mcp_call` | Call any configured MCP server tool |
288
- | `put` | Store data/context in registry |
289
- | `get` | Retrieve data/context from registry |
290
- | `search` | Search MCP servers, skills, or stored context |
291
- | `delegate` | Spawn subagent with full chat history |
292
- | `subscribe` | Watch for async results |
293
- | `cancel` | Cancel subscription or execution |
294
-
295
- ## LiteLLM Integration
296
-
297
- This project uses [LiteLLM](https://github.com/BerriAI/litellm) as the unified LLM gateway, providing:
298
-
299
- - **100+ LLM providers** through single interface
300
- - **Native MCP Gateway** with permission management
301
- - **A2A Protocol** for agent-to-agent communication
302
- - **Cost tracking** per request with spend reports
303
- - **Rate limiting** with automatic retries
304
- - **Fallbacks** between providers on failure
305
- - **Tool/function calling** normalized across providers
306
-
307
- ### Supported Providers
308
-
309
- | Provider | Model Examples |
310
- |----------|---------------|
311
- | Anthropic | `anthropic/claude-sonnet-4-5-20250929`, `anthropic/claude-opus-4-5-20251101` |
312
- | OpenAI | `openai/gpt-5.2`, `openai/gpt-5-mini`, `openai/gpt-5.2-codex` |
313
- | Google | `gemini/gemini-3-flash-preview`, `gemini/gemini-3-pro-preview` |
314
- | Azure | `azure/gpt-4-deployment` |
315
- | AWS Bedrock | `bedrock/anthropic.claude-3-sonnet` |
316
- | Local | `ollama/llama3.1:8b-instruct-q4_K_M`, `lm_studio/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF` |
317
-
318
- ## Project Structure
319
-
320
- ```
321
- UER/
322
- ├── README.md # This file
323
- ├── ADR.plan.md # Architecture Decision Record
324
- ├── TODO.md # Implementation checklist
325
- ├── pyproject.toml
326
-
327
- ├── src/
328
- │ ├── server.py # MCP server entry point
329
- │ ├── llm/
330
- │ │ └── gateway.py # LiteLLM wrapper
331
- │ ├── mcp/
332
- │ │ └── client.py # MCP client for calling other servers
333
- │ ├── storage/
334
- │ │ ├── base.py # Storage protocol
335
- │ │ └── local.py # SQLite + filesystem
336
- │ ├── tools/
337
- │ │ ├── llm_call.py # LLM invocation tool
338
- │ │ ├── mcp_call.py # MCP tool invocation
339
- │ │ ├── crud.py # put/get/search
340
- │ │ └── delegate.py # Subagent delegation
341
- │ └── models/
342
- │ ├── context.py # Context/blob schemas
343
- │ └── message.py # Chat message schemas
344
-
345
- └── config/
346
- └── litellm_config.yaml
347
- ```
348
-
349
- ## Dependencies
350
-
351
- ```toml
352
- [project]
353
- dependencies = [
354
- "mcp>=1.0.0",
355
- "litellm>=1.77.0",
356
- "pydantic>=2.0.0",
357
- "httpx>=0.25.0",
358
- ]
359
- ```
360
-
361
- ## Datasets & Testing
362
-
363
- UER includes scripts to download and test manipulation detection datasets.
364
-
365
- ### Quick Start: Download All Datasets
366
-
367
- **One command downloads everything:**
368
-
369
- ```bash
370
- python seed_datasets.py
371
- ```
372
-
373
- This downloads:
374
- - **WMDP Benchmark:** 3,668 questions (Bio: 1,273, Chem: 408, Cyber: 1,987)
375
- - **WildChat Sample:** 10,000 real conversations (162 MB)
376
- - **lm-evaluation-harness:** Evaluation framework
377
-
378
- **Time:** ~5-10 minutes depending on internet speed.
379
-
380
- ### Run Tests
381
-
382
- **Test for Sandbagging:**
383
- ```bash
384
- cd context/scripts
385
- python test_wmdp.py --model gemini/gemini-3-flash-preview --limit 50
386
- ```
387
-
388
- **Test for Sycophancy:**
389
- ```bash
390
- python test_sycophancy.py --models gemini
391
- ```
392
-
393
- **Results saved to:** `context/datasets/results/`
394
-
395
- ### Dataset Details
396
-
397
- | Dataset | Size | Purpose | Location |
398
- |---------|------|---------|----------|
399
- | **WMDP Benchmark** | 3,668 questions (2.2 MB) | Sandbagging detection | `context/datasets/wmdp_questions/` |
400
- | **WildChat** | 10k conversations (162 MB) | Real-world sycophancy | `context/datasets/wildchat/` |
401
- | **lm-evaluation-harness** | Framework | Standard LLM evaluation | `context/datasets/lm-evaluation-harness/` |
402
-
403
- All datasets are gitignored. Run `seed_datasets.py` to download locally.
404
-
405
- ## Hackathon Context
406
-
407
- This project was built for the **[AI Manipulation Hackathon](https://apartresearch.com/sprints/ai-manipulation-hackathon-2026-01-09-to-2026-01-11)** organized by [Apart Research](https://apartresearch.com/).
408
-
409
- ### Event Details
410
-
411
- - **Dates:** January 9-11, 2026
412
- - **Theme:** Measuring, detecting, and defending against AI manipulation
413
- - **Participants:** 500+ builders worldwide
414
- - **Prizes:** $2,000 in cash prizes
415
- - **Workshop:** Winners present at IASEAI workshop in Paris (February 26, 2026)
416
-
417
- ### The Challenge
418
-
419
- AI systems are mastering deception, sycophancy, sandbagging, and psychological exploitation at scale, while our ability to detect, measure, and counter these behaviors remains dangerously underdeveloped. This hackathon brings together builders to prototype practical systems that address this critical AI safety challenge.
420
-
421
- ### How UER Addresses AI Manipulation
422
-
423
- The Universal Expert Registry provides infrastructure for:
424
-
425
- 1. **Multi-Model Testing** - Compare responses across providers to detect inconsistencies and manipulation patterns
426
- 2. **Persistent Context** - Track conversation history across sessions to identify behavioral shifts
427
- 3. **Tool Integration** - Connect manipulation detection tools via MCP protocol
428
- 4. **Subagent Orchestration** - Deploy specialized agents for red-teaming and safety testing
429
- 5. **Transparent Logging** - Full visibility into LLM calls, costs, and behaviors
430
-
431
- ### Team
432
-
433
- **The Risk Takers** - Building practical tools for AI safety and transparency.
434
-
435
- ### Hackathon Resources
436
-
437
- The hackathon provides extensive research and tools for understanding AI manipulation. Below are key resources organized by track.
438
-
439
- #### General Introduction
440
-
441
- **[AI deception: A survey of examples, risks, and potential solutions](https://arxiv.org/abs/2308.14752)** (Park et al., 2024)
442
- Comprehensive survey defining AI deception and documenting manipulation via sycophancy, strategic behavior, and safety test evasion. Covers both near-term risks (fraud, election tampering) and long-term risks (losing control of AI systems).
443
-
444
- **[Artificial Intelligence Index Report 2025](https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf)** (Stanford HAI)
445
- Annual assessment of global AI progress including responsible AI and system risks. Provides regulatory trends and technical challenges context.
446
-
447
- **[Towards Understanding Sycophancy in Language Models](https://arxiv.org/abs/2310.13548)** (Sharma et al., 2024)
448
- Documents that state-of-the-art AI assistants consistently exhibit sycophancy, with human preference data driving this behavior. Demonstrates inverse scaling: larger models become MORE sycophantic.
449
-
450
- #### Track 1: Measurement & Evaluation
451
-
452
- **[AI Sandbagging: Language Models can Strategically Underperform on Evaluations](https://arxiv.org/abs/2406.07358)** (van der Weij et al., 2024)
453
- Demonstrates GPT-4 and Claude 3 Opus strategically underperforming on dangerous capability evaluations. Models can "password-lock" capabilities and target specific scores.
454
-
455
- **[Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models](https://arxiv.org/abs/2501.00002)** (Tice et al., 2024)
456
- Novel sandbagging detection via weight noise injection. Provides practical, model-agnostic detection tool.
457
-
458
- **[The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning](https://arxiv.org/abs/2403.03218)** (Li et al., 2024)
459
- Dataset of 3,668 multiple-choice questions for evaluating hazardous knowledge. Widely used in sandbagging and capability evaluation research.
460
-
461
- **UER Application:** Use `llm_call` to test multiple models against WMDP benchmark, comparing responses to detect sandbagging patterns.
462
-
463
- #### Track 2: Real-World Analysis
464
-
465
- **[Recent Frontier Models Are Reward Hacking](https://metr.org/blog/2025-06-05-recent-reward-hacking/)** (METR, 2025)
466
- Documents reward hacking in production-like settings: o3 hacked evaluation software, Claude models subverted tasks. Shows models demonstrate awareness their behavior violates intentions.
467
-
468
- **[Reward Hacking in Reinforcement Learning](https://lilianweng.github.io/posts/2024-11-28-reward-hacking/)** (Lilian Weng, 2024)
469
- Comprehensive overview with real-world examples: summarization gaming ROUGE, coding tampering tests, recommendation algorithms optimizing engagement over welfare.
470
-
471
- **[Reward hacking behavior can generalize across tasks](https://arxiv.org/abs/2406.06393)** (Denison et al., 2024)
472
- Training on reward hacking leads to 1.3x-2.0x increase on held-out datasets. Documents how manipulation generalizes.
473
-
474
- **UER Application:** Use persistent context storage with `put`/`get` to track behavioral shifts across sessions and detect manipulation patterns.
475
-
476
- #### Track 3: Mitigations
477
-
478
- **[Chain of Thought Monitoring for Misbehavior](https://openai.com/index/chain-of-thought-monitoring/)** (OpenAI, 2025)
479
- Chain-of-thought monitoring can detect reward hacking. GPT-4o can monitor stronger o3-mini model. Demonstrates transparency-based mitigation promise and pitfalls.
480
-
481
- **[From shortcuts to sabotage: natural emergent misalignment from reward hacking](https://www.anthropic.com/research/emergent-misalignment-reward-hacking)** (Anthropic, 2025)
482
- Reward hacking training causes downstream misalignment (alignment faking, sabotage). Shows interventions can break semantic links.
483
-
484
- **[The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning](https://arxiv.org/abs/2403.03218)** (Li et al., 2024)
485
- Develops RMU unlearning method removing hazardous knowledge while preserving general capabilities.
486
-
487
- **UER Application:** Integrate mitigation tools via `mcp_call` to test interventions across multiple models simultaneously.
488
-
489
- #### Track 4: Open Track (Multi-Agent & Emergent Behavior)
490
-
491
- **[AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors](https://arxiv.org/abs/2308.10848)** (Chen et al., 2024)
492
- Demonstrates emergent social behaviors in multi-agent systems: volunteer behaviors, conformity, destructive behaviors.
493
-
494
- **[Emergence in Multi-Agent Systems: A Safety Perspective](https://arxiv.org/abs/2406.12411)** (2024)
495
- Investigates how specification insufficiency leads to emergent manipulative behavior when agents' learned priors conflict.
496
-
497
- **[School of Reward Hacks: Hacking Harmless Tasks Generalizes to Misalignment](https://arxiv.org/abs/2501.00003)** (2024)
498
- Training on "harmless" reward hacking causes generalization to concerning behaviors including shutdown avoidance and alignment faking.
499
-
500
- **UER Application:** Use `delegate` to orchestrate multi-agent studies with different models, tracking emergent manipulation behaviors via shared context.
501
-
502
- #### Open Datasets & Tools
503
-
504
- | Resource | Type | Link |
505
- |----------|------|------|
506
- | **WMDP Benchmark** | Dataset + Code | [github.com/centerforaisafety/wmdp](https://github.com/centerforaisafety/wmdp) |
507
- | **WildChat Dataset** | 1M ChatGPT conversations | [huggingface.co/datasets/allenai/WildChat](https://huggingface.co/datasets/allenai/WildChat) |
508
- | **lm-evaluation-harness** | Evaluation framework | [github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
509
- | **METR Task Environments** | Autonomous AI tasks | [github.com/METR/task-standard](https://github.com/METR/task-standard) |
510
- | **TransformerLens** | Interpretability library | [github.com/neelnanda-io/TransformerLens](https://github.com/neelnanda-io/TransformerLens) |
511
- | **AgentVerse Framework** | Multi-agent collaboration | [github.com/OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) |
512
- | **Multi-Agent Particle Envs** | OpenAI environments | [github.com/openai/multiagent-particle-envs](https://github.com/openai/multiagent-particle-envs) |
513
- | **School of Reward Hacks** | Training dataset | [github.com/aypan17/reward-hacking](https://github.com/aypan17/reward-hacking) |
514
- | **NetLogo** | Agent-based modeling | [ccl.northwestern.edu/netlogo](https://ccl.northwestern.edu/netlogo/) |
515
-
516
- #### Project Scoping Advice
517
-
518
- Based on successful hackathon retrospectives:
519
-
520
- **Focus on MVP, Not Production** (2-day timeline):
521
- - Day 1: Set up environment, implement core functionality, basic pipeline
522
- - Day 2: Add 1-2 key features, create demo, prepare presentation
523
-
524
- **Use Mock/Simulated Data** instead of real APIs:
525
- - Synthetic datasets (WMDP, WildChat, School of Reward Hacks)
526
- - Pre-recorded samples
527
- - Simulation environments (METR, AgentVerse)
528
-
529
- **Leverage Pre-trained Models** - Don't train from scratch:
530
- - OpenAI/Anthropic APIs via UER's `llm_call`
531
- - Hugging Face pre-trained models
532
- - Existing detection tools as starting points
533
-
534
- **Clear Success Criteria** - Define "working":
535
- - **Benchmarks:** Evaluates 3+ models on 50+ test cases with documented methodology
536
- - **Detection:** Identifies manipulation in 10+ examples with >70% accuracy
537
- - **Analysis:** Documents patterns across 100+ deployment examples with clear taxonomy
538
- - **Mitigation:** Demonstrates measurable improvement on 3+ manipulation metrics
539
-
540
- ## Related Projects
541
-
542
- - [LiteLLM](https://github.com/BerriAI/litellm) - Unified LLM gateway
543
- - [MCP Registry](https://registry.modelcontextprotocol.io) - Official MCP server directory
544
- - [Context7](https://github.com/upstash/context7) - Library documentation MCP
545
- - [Apart Research](https://apartresearch.com/) - AI safety research and hackathons
546
-
547
- ## License
548
-
549
- MIT
550
-
551
- ---
552
-
553
- *Built for the AI Manipulation Hackathon by The Risk Takers team*
1
+ <div align="center">
2
+ <img src="img/uer.jpg" alt="UER Logo" width="200"/>
3
+
4
+ # Universal Expert Registry
5
+
6
+ [![npm version](https://badge.fury.io/js/uer-mcp.svg)](https://www.npmjs.com/package/uer-mcp)
7
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
+
9
+ **ASI-Level Experts, Infinite Memory, Any Client**
10
+ </div>
11
+
12
+ ---
13
+
14
+ **Standard config** works in most MCP clients:
15
+ > 💡 **Quick Start**: Get a free Gemini API key at [aistudio.google.com/api-keys](https://aistudio.google.com/api-keys)
16
+ ```json
17
+ {
18
+ "mcpServers": {
19
+ "uer": {
20
+ "command": "npx",
21
+ "args": ["uer-mcp@latest"],
22
+ "env": {
23
+ "GEMINI_API_KEY": "your-key-here"
24
+ }
25
+ }
26
+ }
27
+ }
28
+ ```
29
+
30
+ > **📦 Storage is optional**: This config works immediately for LLM and MCP features. For storage/context features, see [Storage Configuration Options](#storage-configuration-options) below.
31
+
32
+ > **⚠️ Required**: Add at least one API key to the `env` section. See [CONFIGURATION.md](CONFIGURATION.md) for all provider links and detailed setup.
33
+
34
+ [<img src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Server&color=0098FF" alt="Install in VS Code">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img alt="Install in VS Code Insiders" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Server&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522uer%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522uer-mcp%2540latest%2522%255D%257D) [<img src="https://cursor.com/deeplink/mcp-install-dark.svg" alt="Install in Cursor">](https://cursor.com/en/install-mcp?name=UER&config=eyJjb21tYW5kIjoibnB4IHVlci1tY3BAbGF0ZXN0In0%3D) [<img src="https://img.shields.io/badge/Windsurf-Windsurf?style=flat-square&label=Install%20Server&color=0B7A8F" alt="Install in Windsurf">](https://windsurf.com)
35
+
36
+ <details>
37
+ <summary>Other MCP Clients</summary>
38
+
39
+ For Claude Desktop, Goose, Codex, Amp, and other clients, see [CONFIGURATION.md](CONFIGURATION.md) for detailed setup instructions.
40
+
41
+ </details>
42
+
43
+ ---
44
+
45
+ An MCP server that provides:
46
+ 1. **Universal LLM Access** - Call any LLM (Claude, GPT, Gemini, Bedrock, Azure, local models) through LiteLLM
47
+ 2. **MCP Tool Orchestration** - Connect to 1000+ MCP servers (filesystem, databases, browsers, etc.)
48
+ 3. **Shared Memory/Context** - Break context window limits via external storage with URI references
49
+ 4. **Subagent Delegation** - Spawn subagents with full chat history, not just single messages
50
+
51
+ ## Why This Exists
52
+
53
+ LLMs have fundamental limitations:
54
+ - **Single message I/O**: 32-64k tokens max
55
+ - **Context window**: 200k-2M tokens
56
+ - **No persistent memory**: Forget between sessions
57
+ - **No expert access**: Can't use specialized tools
58
+
59
+ Traditional multi-agent approaches waste tokens by copying full context to each subagent. This registry solves it by:
60
+ - Storing context externally (unlimited)
61
+ - Passing URI references instead of full data (50 tokens vs 50k)
62
+ - Building complete chat histories for subagents
63
+ - Persisting across sessions
64
+
65
+ ## Architecture
66
+
67
+ ```mermaid
68
+ graph TB
69
+ subgraph clients["MCP Clients"]
70
+ A1["Cursor"]
71
+ A2["Claude Desktop"]
72
+ A3["ChatGPT"]
73
+ A4["VS Code"]
74
+ A5["JetBrains"]
75
+ end
76
+
77
+ subgraph uer["UER - Universal Expert Registry"]
78
+ direction TB
79
+ B["MCP Tools<br/>llm_call, mcp_call, put, get, delegate, search"]
80
+
81
+ subgraph litellm["LiteLLM Gateway"]
82
+ C1["100+ LLM providers"]
83
+ C2["Native MCP Gateway"]
84
+ C3["A2A Protocol support"]
85
+ C4["Cost tracking, rate limiting, fallbacks"]
86
+ end
87
+
88
+ subgraph store["Context Store"]
89
+ D1["Local: MinIO"]
90
+ D2["Cloud: AWS S3, Azure, NetApp"]
91
+ end
92
+
93
+ B --> litellm
94
+ B --> store
95
+ end
96
+
97
+ subgraph providers["LLM Providers"]
98
+ E1["Anthropic"]
99
+ E2["OpenAI"]
100
+ E3["Google"]
101
+ E4["Azure"]
102
+ E5["AWS Bedrock"]
103
+ E6["Local: Ollama"]
104
+ end
105
+
106
+ subgraph mcpservers["MCP Servers"]
107
+ F1["Filesystem"]
108
+ F2["PostgreSQL"]
109
+ F3["Slack"]
110
+ F4["Browser"]
111
+ F5["GitHub"]
112
+ F6["1000+ more..."]
113
+ end
114
+
115
+ subgraph knowledge["Knowledge Sources"]
116
+ G1["Context7"]
117
+ G2["Company docs"]
118
+ G3["Guidelines"]
119
+ G4["Standards"]
120
+ end
121
+
122
+ clients -->|MCP Protocol| B
123
+ litellm --> providers
124
+ litellm --> mcpservers
125
+ litellm --> knowledge
126
+ ```
127
+
128
+ ## Key Features
129
+
130
+ ### 1. Universal LLM Access via LiteLLM
131
+
132
+ Call any LLM with a single interface:
133
+
134
+ ```python
135
+ # All use the same interface - just change the model string
136
+ llm_call(model="anthropic/claude-sonnet-4-5-20250929", messages=[...])
137
+ llm_call(model="openai/gpt-5.2", messages=[...])
138
+ llm_call(model="gemini/gemini-3-flash-preview", messages=[...])
139
+ llm_call(model="bedrock/anthropic.claude-3-sonnet", messages=[...])
140
+ llm_call(model="azure/gpt-4-deployment", messages=[...])
141
+ llm_call(model="ollama/llama3.1:8b-instruct-q4_K_M", messages=[...])
142
+ ```
143
+
144
+ Features included:
145
+ - Automatic fallbacks between providers
146
+ - Cost tracking per request
147
+ - Rate limit handling with retries
148
+ - Tool/function calling across all providers
149
+
150
+ ### 2. MCP Tool Integration
151
+
152
+ Connect to any MCP server:
153
+
154
+ ```python
155
+ # List available MCP tools
156
+ search(type="mcp")
157
+
158
+ # Call MCP tools directly
159
+ mcp_call(server="filesystem", tool="read_file", args={"path": "/data/report.txt"})
160
+ mcp_call(server="postgres", tool="query", args={"sql": "SELECT * FROM users"})
161
+ mcp_call(server="context7", tool="search", args={"query": "LiteLLM API reference"})
162
+ ```
163
+
164
+ ### 3. Shared Context (The Killer Feature)
165
+
166
+ Store data externally, pass URI references:
167
+
168
+ ```python
169
+ # Store large document (200k tokens) in S3-compatible storage
170
+ put("s3://uer-context/analysis/doc_001.json", {"content": large_document})
171
+
172
+ # Pass only URI to subagent (50 tokens!)
173
+ delegate(
174
+ model="anthropic/claude-sonnet-4-5-20250929",
175
+ task="Analyze the document",
176
+ context_refs=["s3://uer-context/analysis/doc_001.json"]
177
+ )
178
+
179
+ # Subagent retrieves full content from storage
180
+ # Result stored back to S3
181
+ # Parent retrieves summary only
182
+ ```
183
+
184
+ **Token savings: 99.9%** for multi-agent workflows.
185
+
186
+ **Storage backends:**
187
+ - **Local:** MinIO (S3-compatible, Docker-based)
188
+ - **Cloud:** AWS S3, Azure Blob Storage, NetApp StorageGRID
189
+ - **Features:** Versioning, WORM compliance, Jinja2 templates, Claude Skills API support
190
+
191
+ See [docs/ADR-002-S3-Storage-Architecture.md](docs/ADR-002-S3-Storage-Architecture.md) for details.
192
+
193
+ #### Storage Configuration Options
194
+
195
+ UER supports three deployment scenarios for storage:
196
+
197
+ **Option 1: Docker MinIO (Recommended for Development)**
198
+
199
+ If you have Docker installed, start MinIO with one command:
200
+
201
+ ```bash
202
+ docker-compose up -d
203
+ ```
204
+
205
+ This starts MinIO on `localhost:9000` with default credentials (`minioadmin`/`minioadmin`). UER will automatically connect and create the required buckets (`uer-context`, `uer-skills`, `uer-templates`) on first use.
206
+
207
+ Access the MinIO console at `http://localhost:9001` to browse stored objects.
208
+
209
+ **Option 2: Custom S3-Compatible Storage**
210
+
211
+ For production or if you don't use Docker, configure your own S3-compatible storage:
212
+
213
+ ```json
214
+ {
215
+ "mcpServers": {
216
+ "uer": {
217
+ "command": "npx",
218
+ "args": ["uer-mcp@latest"],
219
+ "env": {
220
+ "GEMINI_API_KEY": "your-key-here",
221
+ "STORAGE_BACKEND": "minio",
222
+ "MINIO_ENDPOINT": "your-minio-server.com:9000",
223
+ "MINIO_ACCESS_KEY": "your-access-key",
224
+ "MINIO_SECRET_KEY": "your-secret-key",
225
+ "MINIO_SECURE": "true"
226
+ }
227
+ }
228
+ }
229
+ }
230
+ ```
231
+
232
+ Supports any S3-compatible storage:
233
+ - **MinIO** (self-hosted)
234
+ - **AWS S3** (use `S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_REGION`)
235
+ - **NetApp StorageGRID**
236
+ - **Wasabi**, **Backblaze B2**, **DigitalOcean Spaces**
237
+
238
+ **Option 3: Disabled Storage (LLM/MCP Only)**
239
+
240
+ If you only need LLM and MCP features without storage:
241
+
242
+ ```json
243
+ {
244
+ "mcpServers": {
245
+ "uer": {
246
+ "command": "npx",
247
+ "args": ["uer-mcp@latest"],
248
+ "env": {
249
+ "GEMINI_API_KEY": "your-key-here",
250
+ "STORAGE_ENABLED": "false"
251
+ }
252
+ }
253
+ }
254
+ }
255
+ ```
256
+
257
+ With storage disabled:
258
+ - ✅ `llm_call` - Call any LLM
259
+ - ✅ `mcp_call`, `mcp_list_tools`, `mcp_servers` - MCP orchestration
260
+ - Storage tools (`storage_put`, `storage_get`, etc.) - Not available
261
+ - ❌ Skills tools (`skill_create`, `skill_get`, etc.) - Not available
262
+ - Template tools (`template_render`, etc.) - Not available
263
+
264
+ The server will start successfully without storage, and LLMs won't see storage-related tools in their tool list.
265
+
266
+ ### 4. Full Chat History for Subagents
267
+
268
+ Build complete conversation context, not just single messages:
269
+
270
+ ```python
271
+ delegate(
272
+ model="openai/gpt-5-mini",
273
+ messages=[
274
+ {"role": "system", "content": "You are a code reviewer..."},
275
+ {"role": "user", "content": "Review this code for security issues"},
276
+ {"role": "assistant", "content": "I'll analyze the code..."},
277
+ {"role": "user", "content": "Focus on SQL injection risks"}
278
+ ],
279
+ tools=[...], # MCP tools available to subagent
280
+ context_refs=["registry://context/codebase"] # Large context via URI
281
+ )
282
+ ```
283
+
284
+ ### 5. Continuation Across Sessions
285
+
286
+ Complex tasks can span multiple messages and sessions:
287
+
288
+ ```
289
+ Message 1: Start analysis Progress: 20% → {{continuation: registry://plan/001}}
290
+ Message 2: Continue Progress: 60% {{continuation: registry://plan/001}}
291
+ [Next day]
292
+ Message 3: Continue Complete! Here's your report...
293
+ ```
294
+
295
+ ## Usage
296
+
297
+ ### Test Your Setup
298
+
299
+ Try this in Claude Desktop:
300
+
301
+ ```
302
+ "Use the llm_call tool to call Gemini 3 Flash and ask it to explain what an MCP server is in one sentence."
303
+ ```
304
+
305
+ Expected behavior:
306
+ - Claude will use the `llm_call` tool
307
+ - Call `gemini/gemini-3-flash-preview`
308
+ - Return Gemini's response
309
+
310
+ ### Example Usage Scenarios
311
+
312
+ **1. Call Different LLMs:**
313
+ ```
314
+ User: "Use llm_call to ask Gemini what the capital of France is"
315
+ Calls gemini/gemini-3-flash-preview
316
+ Returns: "Paris"
317
+
318
+ User: "Now ask Claude Sonnet the same question"
319
+ → Calls anthropic/claude-sonnet-4-5-20250929
320
+ → Returns: "Paris"
321
+ ```
322
+
323
+ **2. Compare LLM Responses:**
324
+ ```
325
+ User: "Ask both Gemini and Claude Sonnet to write a haiku about programming"
326
+ → Uses llm_call twice with different models
327
+ Returns both haikus for comparison
328
+ ```
329
+
330
+ **3. Store and Share Context:**
331
+ ```
332
+ User: "Store this document in the registry and have Gemini summarize it"
333
+ put("registry://context/doc", {...})
334
+ delegate(model="gemini/gemini-3-flash-preview", context_refs=["registry://context/doc"])
335
+ Returns: Summary without re-sending full document
336
+ ```
337
+
338
+ ## Troubleshooting
339
+
340
+ ### "MCP server not found" or "No tools available"
341
+
342
+ 1. Check that `claude_desktop_config.json` is in the correct location
343
+ 2. Verify the `--directory` path is correct (use absolute path)
344
+ 3. Ensure you've restarted Claude Desktop after configuration
345
+ 4. Check Claude Desktop logs: `%APPDATA%\Claude\logs\` (Windows) or `~/Library/Logs/Claude/` (Mac)
346
+
347
+ ### "API key invalid" errors
348
+
349
+ 1. Verify your API key is correct and active
350
+ 2. Check you're using the right key for the right provider
351
+ 3. For Gemini, ensure the key starts with `AIza`
352
+ 4. For Anthropic, ensure the key starts with `sk-ant-`
353
+ 5. For OpenAI, ensure the key starts with `sk-`
354
+
355
+ ### "Model not found" errors
356
+
357
+ 1. Ensure you have an API key configured for that provider
358
+ 2. Check the model name is correct (use LiteLLM format: `provider/model`)
359
+ 3. Verify the model is available in your region/tier
360
+
361
+
362
+ ## Tools Reference
363
+
364
+ | Tool | Description |
365
+ |------|-------------|
366
+ | `llm_call` | Call any LLM via LiteLLM (100+ providers) |
367
+ | `mcp_call` | Call any configured MCP server tool |
368
+ | `put` | Store data/context in registry |
369
+ | `get` | Retrieve data/context from registry |
370
+ | `search` | Search MCP servers, skills, or stored context |
371
+ | `delegate` | Spawn subagent with full chat history |
372
+ | `subscribe` | Watch for async results |
373
+ | `cancel` | Cancel subscription or execution |
374
+
375
+ ## LiteLLM Integration
376
+
377
+ This project uses [LiteLLM](https://github.com/BerriAI/litellm) as the unified LLM gateway, providing:
378
+
379
+ - **100+ LLM providers** through single interface
380
+ - **Native MCP Gateway** with permission management
381
+ - **A2A Protocol** for agent-to-agent communication
382
+ - **Cost tracking** per request with spend reports
383
+ - **Rate limiting** with automatic retries
384
+ - **Fallbacks** between providers on failure
385
+ - **Tool/function calling** normalized across providers
386
+
387
+ ### Supported Providers
388
+
389
+ | Provider | Model Examples |
390
+ |----------|---------------|
391
+ | Anthropic | `anthropic/claude-sonnet-4-5-20250929`, `anthropic/claude-opus-4-5-20251101` |
392
+ | OpenAI | `openai/gpt-5.2`, `openai/gpt-5-mini`, `openai/gpt-5.2-codex` |
393
+ | Google | `gemini/gemini-3-flash-preview`, `gemini/gemini-3-pro-preview` |
394
+ | Azure | `azure/gpt-4-deployment` |
395
+ | AWS Bedrock | `bedrock/anthropic.claude-3-sonnet` |
396
+ | Local | `ollama/llama3.1:8b-instruct-q4_K_M`, `lm_studio/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF` |
397
+
398
+ ## Project Structure
399
+
400
+ ```
401
+ UER/
402
+ ├── README.md # This file
403
+ ├── ADR.plan.md # Architecture Decision Record
404
+ ├── TODO.md # Implementation checklist
405
+ ├── pyproject.toml
406
+
407
+ ├── src/
408
+ │ ├── server.py # MCP server entry point
409
+ │ ├── llm/
410
+ │ │ └── gateway.py # LiteLLM wrapper
411
+ │ ├── mcp/
412
+ │ │ └── client.py # MCP client for calling other servers
413
+ │ ├── storage/
414
+ │ │ ├── base.py # S3-compatible storage protocol
415
+ │ │ ├── minio_backend.py # MinIO backend (local)
416
+ │ │ ├── s3_backend.py # AWS S3 backend (cloud)
417
+ │ │ ├── manager.py # Storage manager
418
+ │ │ ├── skills.py # Claude Skills API support
419
+ │ │ └── templates.py # Jinja2 template rendering
420
+ │ ├── tools/
421
+ │ │ ├── llm_call.py # LLM invocation tool
422
+ │ │ ├── mcp_call.py # MCP tool invocation
423
+ │ │ ├── storage_tools.py # put/get/list/delete
424
+ │ │ └── delegate.py # Subagent delegation
425
+ │ └── models/
426
+ │ ├── storage.py # Storage schemas (ObjectMetadata, Retention)
427
+ │ └── message.py # Chat message schemas
428
+
429
+ └── config/
430
+ └── litellm_config.yaml
431
+ ```
432
+
433
+ ## Dependencies
434
+
435
+ ```toml
436
+ [project]
437
+ dependencies = [
438
+ "mcp>=1.0.0",
439
+ "litellm>=1.77.0",
440
+ "pydantic>=2.0.0",
441
+ "httpx>=0.25.0",
442
+ ]
443
+ ```
444
+
445
+ ## Datasets & Testing
446
+
447
+ UER includes scripts to download and test manipulation detection datasets.
448
+
449
+ ### Quick Start: Download All Datasets
450
+
451
+ **One command downloads everything:**
452
+
453
+ ```bash
454
+ python seed_datasets.py
455
+ ```
456
+
457
+ This downloads:
458
+ - **WMDP Benchmark:** 3,668 questions (Bio: 1,273, Chem: 408, Cyber: 1,987)
459
+ - **WildChat Sample:** 10,000 real conversations (162 MB)
460
+ - **lm-evaluation-harness:** Evaluation framework
461
+
462
+ **Time:** ~5-10 minutes depending on internet speed.
463
+
464
+ ### Run Tests
465
+
466
+ **Test for Sandbagging:**
467
+ ```bash
468
+ cd context/scripts
469
+ python test_wmdp.py --model gemini/gemini-3-flash-preview --limit 50
470
+ ```
471
+
472
+ **Test for Sycophancy:**
473
+ ```bash
474
+ python test_sycophancy.py --models gemini
475
+ ```
476
+
477
+ **Results saved to:** `context/datasets/results/`
478
+
479
+ ### Dataset Details
480
+
481
+ | Dataset | Size | Purpose | Location |
482
+ |---------|------|---------|----------|
483
+ | **WMDP Benchmark** | 3,668 questions (2.2 MB) | Sandbagging detection | `context/datasets/wmdp_questions/` |
484
+ | **WildChat** | 10k conversations (162 MB) | Real-world sycophancy | `context/datasets/wildchat/` |
485
+ | **lm-evaluation-harness** | Framework | Standard LLM evaluation | `context/datasets/lm-evaluation-harness/` |
486
+
487
+ All datasets are gitignored. Run `seed_datasets.py` to download locally.
488
+
489
+ ## Hackathon Context
490
+
491
+ This project was built for the **[AI Manipulation Hackathon](https://apartresearch.com/sprints/ai-manipulation-hackathon-2026-01-09-to-2026-01-11)** organized by [Apart Research](https://apartresearch.com/).
492
+
493
+ ### Event Details
494
+
495
+ - **Dates:** January 9-11, 2026
496
+ - **Theme:** Measuring, detecting, and defending against AI manipulation
497
+ - **Participants:** 500+ builders worldwide
498
+ - **Prizes:** $2,000 in cash prizes
499
+ - **Workshop:** Winners present at IASEAI workshop in Paris (February 26, 2026)
500
+
501
+ ### The Challenge
502
+
503
+ AI systems are mastering deception, sycophancy, sandbagging, and psychological exploitation at scale, while our ability to detect, measure, and counter these behaviors remains dangerously underdeveloped. This hackathon brings together builders to prototype practical systems that address this critical AI safety challenge.
504
+
505
+ ### How UER Addresses AI Manipulation
506
+
507
+ The Universal Expert Registry provides infrastructure for:
508
+
509
+ 1. **Multi-Model Testing** - Compare responses across providers to detect inconsistencies and manipulation patterns
510
+ 2. **Persistent Context** - Track conversation history across sessions to identify behavioral shifts
511
+ 3. **Tool Integration** - Connect manipulation detection tools via MCP protocol
512
+ 4. **Subagent Orchestration** - Deploy specialized agents for red-teaming and safety testing
513
+ 5. **Transparent Logging** - Full visibility into LLM calls, costs, and behaviors
514
+
515
+ ### Team
516
+
517
+ **The Risk Takers** - Building practical tools for AI safety and transparency.
518
+
519
+ ### Hackathon Resources
520
+
521
+ The hackathon provides extensive research and tools for understanding AI manipulation. Below are key resources organized by track.
522
+
523
+ #### General Introduction
524
+
525
+ **[AI deception: A survey of examples, risks, and potential solutions](https://arxiv.org/abs/2308.14752)** (Park et al., 2024)
526
+ Comprehensive survey defining AI deception and documenting manipulation via sycophancy, strategic behavior, and safety test evasion. Covers both near-term risks (fraud, election tampering) and long-term risks (losing control of AI systems).
527
+
528
+ **[Artificial Intelligence Index Report 2025](https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf)** (Stanford HAI)
529
+ Annual assessment of global AI progress including responsible AI and system risks. Provides regulatory trends and technical challenges context.
530
+
531
+ **[Towards Understanding Sycophancy in Language Models](https://arxiv.org/abs/2310.13548)** (Sharma et al., 2024)
532
+ Documents that state-of-the-art AI assistants consistently exhibit sycophancy, with human preference data driving this behavior. Demonstrates inverse scaling: larger models become MORE sycophantic.
533
+
534
+ #### Track 1: Measurement & Evaluation
535
+
536
+ **[AI Sandbagging: Language Models can Strategically Underperform on Evaluations](https://arxiv.org/abs/2406.07358)** (van der Weij et al., 2024)
537
+ Demonstrates GPT-4 and Claude 3 Opus strategically underperforming on dangerous capability evaluations. Models can "password-lock" capabilities and target specific scores.
538
+
539
+ **[Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models](https://arxiv.org/abs/2501.00002)** (Tice et al., 2024)
540
+ Novel sandbagging detection via weight noise injection. Provides practical, model-agnostic detection tool.
541
+
542
+ **[The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning](https://arxiv.org/abs/2403.03218)** (Li et al., 2024)
543
+ Dataset of 3,668 multiple-choice questions for evaluating hazardous knowledge. Widely used in sandbagging and capability evaluation research.
544
+
545
+ **UER Application:** Use `llm_call` to test multiple models against WMDP benchmark, comparing responses to detect sandbagging patterns.
546
+
547
+ #### Track 2: Real-World Analysis
548
+
549
+ **[Recent Frontier Models Are Reward Hacking](https://metr.org/blog/2025-06-05-recent-reward-hacking/)** (METR, 2025)
550
+ Documents reward hacking in production-like settings: o3 hacked evaluation software, Claude models subverted tasks. Shows models demonstrate awareness their behavior violates intentions.
551
+
552
+ **[Reward Hacking in Reinforcement Learning](https://lilianweng.github.io/posts/2024-11-28-reward-hacking/)** (Lilian Weng, 2024)
553
+ Comprehensive overview with real-world examples: summarization gaming ROUGE, coding tampering tests, recommendation algorithms optimizing engagement over welfare.
554
+
555
+ **[Reward hacking behavior can generalize across tasks](https://arxiv.org/abs/2406.06393)** (Denison et al., 2024)
556
+ Training on reward hacking leads to 1.3x-2.0x increase on held-out datasets. Documents how manipulation generalizes.
557
+
558
+ **UER Application:** Use persistent context storage with `put`/`get` to track behavioral shifts across sessions and detect manipulation patterns.
559
+
560
+ #### Track 3: Mitigations
561
+
562
+ **[Chain of Thought Monitoring for Misbehavior](https://openai.com/index/chain-of-thought-monitoring/)** (OpenAI, 2025)
563
+ Chain-of-thought monitoring can detect reward hacking. GPT-4o can monitor stronger o3-mini model. Demonstrates transparency-based mitigation promise and pitfalls.
564
+
565
+ **[From shortcuts to sabotage: natural emergent misalignment from reward hacking](https://www.anthropic.com/research/emergent-misalignment-reward-hacking)** (Anthropic, 2025)
566
+ Reward hacking training causes downstream misalignment (alignment faking, sabotage). Shows interventions can break semantic links.
567
+
568
+ **[The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning](https://arxiv.org/abs/2403.03218)** (Li et al., 2024)
569
+ Develops RMU unlearning method removing hazardous knowledge while preserving general capabilities.
570
+
571
+ **UER Application:** Integrate mitigation tools via `mcp_call` to test interventions across multiple models simultaneously.
572
+
573
+ #### Track 4: Open Track (Multi-Agent & Emergent Behavior)
574
+
575
+ **[AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors](https://arxiv.org/abs/2308.10848)** (Chen et al., 2024)
576
+ Demonstrates emergent social behaviors in multi-agent systems: volunteer behaviors, conformity, destructive behaviors.
577
+
578
+ **[Emergence in Multi-Agent Systems: A Safety Perspective](https://arxiv.org/abs/2406.12411)** (2024)
579
+ Investigates how specification insufficiency leads to emergent manipulative behavior when agents' learned priors conflict.
580
+
581
+ **[School of Reward Hacks: Hacking Harmless Tasks Generalizes to Misalignment](https://arxiv.org/abs/2501.00003)** (2024)
582
+ Training on "harmless" reward hacking causes generalization to concerning behaviors including shutdown avoidance and alignment faking.
583
+
584
+ **UER Application:** Use `delegate` to orchestrate multi-agent studies with different models, tracking emergent manipulation behaviors via shared context.
585
+
586
+ #### Open Datasets & Tools
587
+
588
+ | Resource | Type | Link |
589
+ |----------|------|------|
590
+ | **WMDP Benchmark** | Dataset + Code | [github.com/centerforaisafety/wmdp](https://github.com/centerforaisafety/wmdp) |
591
+ | **WildChat Dataset** | 1M ChatGPT conversations | [huggingface.co/datasets/allenai/WildChat](https://huggingface.co/datasets/allenai/WildChat) |
592
+ | **lm-evaluation-harness** | Evaluation framework | [github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
593
+ | **METR Task Environments** | Autonomous AI tasks | [github.com/METR/task-standard](https://github.com/METR/task-standard) |
594
+ | **TransformerLens** | Interpretability library | [github.com/neelnanda-io/TransformerLens](https://github.com/neelnanda-io/TransformerLens) |
595
+ | **AgentVerse Framework** | Multi-agent collaboration | [github.com/OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) |
596
+ | **Multi-Agent Particle Envs** | OpenAI environments | [github.com/openai/multiagent-particle-envs](https://github.com/openai/multiagent-particle-envs) |
597
+ | **School of Reward Hacks** | Training dataset | [github.com/aypan17/reward-hacking](https://github.com/aypan17/reward-hacking) |
598
+ | **NetLogo** | Agent-based modeling | [ccl.northwestern.edu/netlogo](https://ccl.northwestern.edu/netlogo/) |
599
+
600
+ #### Project Scoping Advice
601
+
602
+ Based on successful hackathon retrospectives:
603
+
604
+ **Focus on MVP, Not Production** (2-day timeline):
605
+ - Day 1: Set up environment, implement core functionality, basic pipeline
606
+ - Day 2: Add 1-2 key features, create demo, prepare presentation
607
+
608
+ **Use Mock/Simulated Data** instead of real APIs:
609
+ - Synthetic datasets (WMDP, WildChat, School of Reward Hacks)
610
+ - Pre-recorded samples
611
+ - Simulation environments (METR, AgentVerse)
612
+
613
+ **Leverage Pre-trained Models** - Don't train from scratch:
614
+ - OpenAI/Anthropic APIs via UER's `llm_call`
615
+ - Hugging Face pre-trained models
616
+ - Existing detection tools as starting points
617
+
618
+ **Clear Success Criteria** - Define "working":
619
+ - **Benchmarks:** Evaluates 3+ models on 50+ test cases with documented methodology
620
+ - **Detection:** Identifies manipulation in 10+ examples with >70% accuracy
621
+ - **Analysis:** Documents patterns across 100+ deployment examples with clear taxonomy
622
+ - **Mitigation:** Demonstrates measurable improvement on 3+ manipulation metrics
623
+
624
+ ## Related Projects
625
+
626
+ - [LiteLLM](https://github.com/BerriAI/litellm) - Unified LLM gateway
627
+ - [MCP Registry](https://registry.modelcontextprotocol.io) - Official MCP server directory
628
+ - [Context7](https://github.com/upstash/context7) - Library documentation MCP
629
+ - [Apart Research](https://apartresearch.com/) - AI safety research and hackathons
630
+
631
+ ## License
632
+
633
+ MIT
634
+
635
+ ---
636
+
637
+ *Built for the AI Manipulation Hackathon by The Risk Takers team*