npm - limbo-ai - Versions diffs - 1.24.8 → 1.25.0 - Mend

limbo-ai 1.24.8 → 1.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

package/README.md +116 -150
package/cli.js +23 -16
package/docker-compose.test.yml +22 -0
package/evals/cases/create-reminder.json +22 -0
package/evals/cases/hard-ambiguous-request.json +12 -0
package/evals/cases/hard-complex-note.json +17 -0
package/evals/cases/hard-synthesize-knowledge.json +33 -0
package/evals/cases/medium-note-type-inference.json +16 -0
package/evals/cases/medium-person-multiple-facts.json +16 -0
package/evals/cases/medium-search-implicit.json +13 -0
package/evals/cases/multi-step-remember-and-search.json +24 -0
package/evals/cases/read-note-by-id.json +22 -0
package/evals/cases/remember-fact.json +15 -0
package/evals/cases/reminder-timezone.json +23 -0
package/evals/cases/search-existing-note.json +27 -0
package/evals/cases/update-map.json +28 -0
package/evals/cases/web-search.json +22 -0
package/evals/cli.js +477 -0
package/evals/docker-compose.eval.yml +43 -0
package/evals/judge/rubrics.json +10 -0
package/evals/lib/judge.js +69 -0
package/evals/lib/mcp-log.js +62 -0
package/evals/lib/scorer.js +153 -0
package/evals/lib/vault-diff.js +59 -0
package/evals/results/.gitkeep +0 -0
package/evals/results/baseline.json +662 -0
package/evals/results/history/.gitkeep +0 -0
package/evals/results/history/run-1774559258082.json +662 -0
package/evals/results/history/run-1774559485256.json +662 -0
package/evals/results/history/run-1774559674855.json +662 -0
package/evals/results/latest.json +662 -0
package/evals/test/scorer.test.js +180 -0
package/evals/vault-seed/maps/.gitkeep +0 -0
package/evals/vault-seed/notes/.gitkeep +0 -0
package/evals/vault-seed/notes/eval-seed-birthday.md +10 -0
package/mcp-server/index.js +30 -10
package/mcp-server/test/eval-logging.test.js +254 -0
package/package.json +3 -2
package/setup-server/server.js +14 -10
package/test/cli-auth.test.js +21 -15
package/test/setup-server.test.js +14 -7
package/test/zeroclaw-migration.test.js +3 -3

package/README.md CHANGED

@@ -1,143 +1,104 @@
 # Limbo
-A personal memory agent. Captures ideas, remembers things, and connects knowledge across time — running quietly in a Docker container, accessible via Telegram or the ZeroClaw gateway.
+[![npm](https://img.shields.io/npm/v/limbo-ai?color=blue&label=release)](https://www.npmjs.com/package/limbo-ai)
+[![build](https://img.shields.io/github/actions/workflow/status/TomasWard1/limbo/ci.yml?branch=staging&label=build)](https://github.com/TomasWard1/limbo/actions)
+[![license](https://img.shields.io/badge/license-MIT-green)](./LICENSE)
+[![platform](https://img.shields.io/badge/platform-linux%20%7C%20macOS-lightgrey)](.)
+[![docker](https://img.shields.io/badge/docker-%E2%9C%93-blue)](https://github.com/TomasWard1/limbo/pkgs/container/limbo)
-## What it is
+A personal memory agent. Captures ideas, remembers things, and connects knowledge across time — running in a Docker container, accessible via Telegram or the ZeroClaw gateway.
-Limbo is a second brain with a conversational interface. It stores atomic notes in a local vault, searches them semantically, and maintains Maps of Content (MOCs) to keep knowledge navigable. It is not a general-purpose assistant — it is a memory system.
-**Agent personality:** defined in `workspace/IDENTITY.md` and `workspace/SOUL.md`, baked into the image at build time.
+Limbo is a second brain with a conversational interface. It stores atomic notes in a local vault, searches them semantically, and maintains Maps of Content (MOCs) to keep knowledge navigable.
 ---
-## Hardware Requirements
-Limbo runs as a single Docker container (~35 MB RAM at idle). The main resource cost is Docker and the host OS, not Limbo itself.
-| Tier | RAM | vCPU | Disk | Notes |
-|------|-----|------|------|-------|
-| Minimum | 512 MB | 1 | 1 GB | Needs swap configured |
-| Recommended | 1 GB | 1 | 5 GB | Comfortable for Limbo alone |
-| With other services | 2 GB | 1 | 10 GB | Room for reverse proxy, monitoring, etc. |
+## Install
-> Limbo's container uses ~35 MB at rest and peaks around ~70 MB during cold starts. CPU usage is negligible — short bursts of 5-7% when processing messages.
+> Limbo is designed to run on a VPS (always-on, accessible from anywhere). A $5/month Ubuntu server is all you need.
----
+### 1. Provision a server
-## Quick Start
+Any Ubuntu/Debian VPS with 1 GB+ RAM.
-Requires [Docker Desktop](https://docs.docker.com/get-docker/) and Node.js 18+.
+### 2. Run the installer
-```sh
-npx limbo-ai start
+```bash
+curl -fsSL https://raw.githubusercontent.com/TomasWard1/limbo/main/scripts/install.sh | bash
 ```
-This will:
-1. Prompt for your API key (Anthropic or OpenAI)
-2. Write `~/.limbo/.env` and `~/.limbo/docker-compose.yml`
-3. Pull the latest Limbo image and start the container
+This installs Docker, Node.js, and the Limbo CLI.
-Limbo binds to `127.0.0.1:18789`.
-### Agent Installation
-AI agents can install Limbo non-interactively using CLI flags:
+### 3. Start Limbo
 ```bash
-npx limbo-ai start --provider openrouter --api-key sk-or-v1-xxx --model auto
+limbo start
 ```
-**Required flags:**
-| Flag | Description |
-|------|-------------|
-| `--provider` | `openai`, `anthropic`, or `openrouter` |
-| `--api-key` | Your provider API key |
-**Optional flags:**
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--model` | Provider default | Model name (e.g. `anthropic/claude-sonnet-4-6`) |
-| `--language` | `en` | CLI language (`en` or `es`) |
-Headless mode skips Telegram setup. To add Telegram later, run `npx limbo-ai start --reconfigure`.
+The setup wizard walks you through:
+- [ ] Choose a language (English / Español)
+- [ ] Select a provider (Anthropic, OpenAI, OpenRouter)
+- [ ] Authenticate (API key or Claude/ChatGPT subscription)
+- [ ] Pick a model
+- [ ] Connect Telegram (optional but recommended)
+- [ ] Enable voice messages and web search (optional)
+- [ ] Review and confirm
-> **Note:** Subscription-based auth (ChatGPT/Codex, Claude Code) requires interactive setup because it involves browser-based OAuth or token pasting. Use `npx limbo-ai start` without flags for subscription auth.
+Once complete, Limbo restarts and is ready to use.
-### Available commands
+### 4. Update
-```sh
-npx limbo-ai@latest start        # Install and start (default if no command given)
-npx limbo-ai@latest stop         # Stop the container
-npx limbo-ai@latest update       # Pull latest image and restart
-npx limbo-ai@latest status       # Show container status
-npx limbo-ai@latest logs         # Tail container logs
-npx limbo-ai@latest start --reconfigure   # Change API keys or settings
-npx limbo-ai@latest config               # Configure optional features (voice, web-search)
+```bash
+limbo update
 ```
----
-## Optional Features
-Limbo supports optional features that can be enabled during the setup wizard (step 7) or anytime via the CLI.
-### Voice Messages
+Pulls the latest image and restarts. Vault data is persisted and not affected.
-Transcribe Telegram voice notes using [Groq](https://groq.com) Whisper. Requires a Groq API key (`gsk_...`).
-```sh
-npx limbo-ai@latest config voice --enable --api-key gsk_xxx
-npx limbo-ai@latest config voice --status
-npx limbo-ai@latest config voice --disable
-```
+---
-### Web Search
+## Local Install (macOS/Linux)
-Give Limbo real-time web search via the [Brave Search API](https://brave.com/search/api/). Requires a Brave API key (`BSA...`).
+If you prefer running locally instead of a VPS:
-```sh
-npx limbo-ai@latest config web-search --enable --api-key BSAxxx
-npx limbo-ai@latest config web-search --status
-npx limbo-ai@latest config web-search --disable
+```bash
+npx limbo-ai start
 ```
-Both features store API keys as Docker secrets and toggle config sections in the container on restart.
+Requires [Docker Desktop](https://docs.docker.com/get-docker/) and Node.js 18+. Binds to `127.0.0.1:18789`.
 ---
-## Updating
+## Commands
 ```sh
-npx limbo-ai@latest update
+limbo start                  # Install and start (enters wizard on first run)
+limbo stop                   # Stop the container
+limbo update                 # Pull latest image and restart
+limbo status                 # Show container status
+limbo logs                   # Tail container logs
+limbo start --reconfigure    # Re-run the setup wizard
+limbo config voice --enable --api-key gsk_xxx   # Enable voice transcription
+limbo config web-search --enable --api-key BSA_xxx  # Enable web search
 ```
-Pulls the latest Limbo image and restarts the container. Your vault data is persisted in the `limbo-data` Docker volume and is not affected.
 ---
 ## Connecting
-There are two ways to connect: **talk to Limbo** (conversational, with its personality and memory logic) or **use the vault directly** (raw tool access from another agent).
-### Talk to Limbo
+### Telegram (recommended)
-#### Telegram (recommended)
+The setup wizard walks you through creating a Telegram bot and pairing it. Message your bot and Limbo responds — full agent with personality, memory logic, and vault tools.
-During setup (`npx limbo-ai start`), the wizard will walk you through creating a Telegram bot via BotFather and pairing it. Message your bot and Limbo will respond — full agent with personality, memory logic, and vault tools.
+### ZeroClaw gateway
-#### ZeroClaw gateway
-Any [ZeroClaw](https://github.com/zeroclaw-labs/zeroclaw)-compatible chat client can connect to:
+Any [ZeroClaw](https://github.com/zeroclaw-labs/zeroclaw)-compatible client can connect via WebSocket:
 ```
 ws://localhost:18789
 ```
-This gives you a conversational session with Limbo, same as Telegram but over WebSocket.
-### Use the vault from another agent
+### MCP (for other AI agents)
-If you want another AI agent (like Claude Code) to read and write to Limbo's vault directly — without going through Limbo's personality or reasoning — add it as an MCP server:
+Add Limbo as an MCP server to give another agent direct vault access:
 ```json
 {
@@ -149,43 +110,43 @@ If you want another AI agent (like Claude Code) to read and write to Limbo's vau
 }
 ```
-This exposes the 4 vault tools (`vault_search`, `vault_read`, `vault_write_note`, `vault_update_map`) as MCP tools in the connecting agent. The agent operates on the vault directly — Limbo's LLM is not involved.
+This exposes 4 vault tools (`vault_search`, `vault_read`, `vault_write_note`, `vault_update_map`). The connecting agent operates on the vault directly — Limbo's LLM is not involved.
 ---
-## Environment Variables
+## Optional Features
-Managed automatically by `npx limbo-ai start`, stored in `~/.limbo/.env`.
+Enable during the setup wizard or anytime via CLI.
-| Variable | Required | Default | Description |
-|----------|----------|---------|-------------|
-| `AUTH_MODE` | no | `api-key` | `api-key` or `subscription` |
-| `OPENAI_API_KEY` | no* | — | OpenAI API key for `MODEL_PROVIDER=openai` |
-| `ANTHROPIC_API_KEY` | no* | — | Anthropic API key for `MODEL_PROVIDER=anthropic` |
-| `LLM_API_KEY` | no | — | Legacy generic key path for older installs |
-| `MODEL_PROVIDER` | no | `anthropic` | Model provider: `anthropic`, `openai`, or `openai-codex` |
-| `MODEL_NAME` | no | `claude-opus-4-6` | Model name (e.g. `claude-opus-4-6`, `claude-sonnet-4-6`, `gpt-5.4`) |
-| `TELEGRAM_ENABLED` | no | `false` | Enable Telegram bot integration |
-| `TELEGRAM_BOT_TOKEN` | no | — | Telegram bot token (required if `TELEGRAM_ENABLED=true`) |
-| `VOICE_ENABLED` | no | `false` | Enable voice transcription (requires Groq API key as Docker secret) |
-| `WEB_SEARCH_ENABLED` | no | `false` | Enable web search (requires Brave API key as Docker secret) |
+### Voice Messages
-> \* API keys are required only for `AUTH_MODE=api-key`. Subscription auth uses ZeroClaw auth profiles instead.
+Transcribe Telegram voice notes using [Groq](https://groq.com) Whisper.
----
+```sh
+limbo config voice --enable --api-key gsk_xxx
+limbo config voice --disable
+```
+### Web Search
-## MCP Tools
+Real-time web search via [Brave Search API](https://brave.com/search/api/).
-Limbo exposes 4 tools via the `limbo-vault` MCP server:
+```sh
+limbo config web-search --enable --api-key BSAxxx
+limbo config web-search --disable
+```
+---
-| Tool | Description |
-|------|-------------|
-| `vault_search` | Search notes by regex or keyword |
-| `vault_read` | Read a note by ID (returns raw markdown + frontmatter) |
-| `vault_write_note` | Create or overwrite a note with structured frontmatter |
-| `vault_update_map` | Append entries to a Map of Content (MOC) |
+## Hardware Requirements
+| Tier | RAM | vCPU | Disk |
+|------|-----|------|------|
+| Minimum | 512 MB | 1 | 1 GB |
+| Recommended | 1 GB | 1 | 5 GB |
+| With other services | 2 GB | 1 | 10 GB |
-Full tool specs in `workspace/TOOLS.md`.
+Limbo uses ~35 MB at rest, peaks ~70 MB during cold starts. CPU usage is negligible.
 ---
@@ -211,54 +172,59 @@ Full tool specs in `workspace/TOOLS.md`.
 └─────────────────────────────────────────┘
 ```
-- **ZeroClaw** — lightweight Rust runtime (~5MB RAM) that handles client connections, routes to the LLM, manages Telegram, and integrates MCP tools natively
-- **MCP server** — Node.js server providing vault read/write tools (spawned by ZeroClaw, no mcporter needed)
-- **Vault** — plain markdown files with YAML frontmatter, persisted in a named Docker volume
-- **Migrations** — lightweight Node.js migration runner for vault schema changes
+- **ZeroClaw** — Rust runtime (~5 MB RAM) handling connections, LLM routing, Telegram, and MCP tools
+- **MCP server** — Node.js vault read/write tools, spawned by ZeroClaw
+- **Vault** — plain markdown with YAML frontmatter, persisted in a Docker volume
-**Data directory layout** (in `/data` volume):
+---
-```
-/data/
-  vault/      # markdown notes
-  db/         # sqlite (future use)
-  logs/       # startup and runtime logs
-  backups/    # snapshots
-  memory/     # agent memory
-  config/
-    USER.md   # per-user persona file (generated at runtime)
+## Agent Installation (headless)
+For CI/CD or automated provisioning:
+```bash
+npx limbo-ai start --provider anthropic --api-key sk-ant-xxx --model claude-sonnet-4-6
 ```
----
+| Flag | Required | Default | Description |
+|------|----------|---------|-------------|
+| `--provider` | yes | — | `anthropic`, `openai`, or `openrouter` |
+| `--api-key` | yes | — | Provider API key |
+| `--model` | no | Provider default | Model name |
+| `--language` | no | `en` | `en` or `es` |
-## Development Setup
+Headless mode skips Telegram. Add it later with `limbo start --reconfigure`.
-### Prerequisites
+> Subscription auth (Claude Code, ChatGPT Plus) requires the interactive wizard.
-- Docker + Docker Compose
-- Node.js 22+ (for local MCP server dev)
+---
-### Run MCP server locally
+## Environment Variables
-```sh
-cd mcp-server
-npm install
-VAULT_PATH=./dev-vault node index.js
-```
+Managed by `limbo start`, stored in `~/.limbo/.env`.
-### Build image locally
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `AUTH_MODE` | `api-key` | `api-key` or `subscription` |
+| `MODEL_PROVIDER` | `anthropic` | `anthropic`, `openai`, `openai-codex`, or `openrouter` |
+| `MODEL_NAME` | `claude-sonnet-4-6` | Model to use |
+| `TELEGRAM_ENABLED` | `false` | Enable Telegram integration |
+| `VOICE_ENABLED` | `false` | Enable Groq voice transcription |
+| `WEB_SEARCH_ENABLED` | `false` | Enable Brave web search |
-```sh
-docker build -t limbo:dev .
-docker compose up -d
-```
+---
-### Run migrations standalone
+## Development
 ```sh
-node migrations/index.js
-```
+# Run MCP server locally
+cd mcp-server && npm install && VAULT_PATH=./dev-vault node index.js
----
+# Build image locally
+docker build -t limbo:dev . && docker compose up -d
+# Run tests
+npm test
+```
 See [CONTRIBUTING.md](./CONTRIBUTING.md) for release and deployment process.
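
The headless flags in the README table above lend themselves to up-front validation before any Docker work starts. The following is a minimal sketch, assuming flags arrive as `--name value` pairs; `parseHeadlessFlags` is a hypothetical name, not the CLI's actual parser, and the allowed values are copied from the table.

```javascript
// Hypothetical validator for the headless-install flags documented in the
// README table (not the limbo-ai CLI's real implementation).
const PROVIDERS = ['anthropic', 'openai', 'openrouter'];
const LANGUAGES = ['en', 'es'];
const KNOWN_FLAGS = new Set(['provider', 'api-key', 'model', 'language']);

function parseHeadlessFlags(argv) {
  const flags = { language: 'en' }; // documented default for --language
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('--')) continue; // skip positionals such as "start"
    const key = arg.slice(2);
    if (!KNOWN_FLAGS.has(key)) throw new Error(`Unknown flag --${key}`);
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`Missing value for --${key}`);
    }
    flags[key] = value;
    i++; // consume the value we just read
  }
  // Required flags per the table.
  if (!flags.provider) throw new Error('--provider is required');
  if (!PROVIDERS.includes(flags.provider)) {
    throw new Error(`Unsupported provider: ${flags.provider}`);
  }
  if (!flags['api-key']) throw new Error('--api-key is required');
  if (!LANGUAGES.includes(flags.language)) {
    throw new Error(`Unsupported language: ${flags.language}`);
  }
  // An absent --model stays undefined, meaning "use the provider default".
  return flags;
}
```

Rejecting unknown flags early keeps a typo like `--api_key` from silently falling through to an interactive prompt that an automated provisioner cannot answer.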

package/cli.js CHANGED

@@ -1297,38 +1297,45 @@ function writeAuthProfilesToDocker(store) {
 }
 function buildCodexAuthProfile(profile) {
-  const profileId = profile.email ? `openai-codex:${profile.email}` : 'openai-codex:default';
+  const profileName = profile.email || 'default';
+  const profileId = `openai-codex:${profileName}`;
+  const now = new Date().toISOString();
   return {
-    version: 1,
+    schema_version: 1,
+    updated_at: now,
+    active_profiles: { 'openai-codex': profileId },
     profiles: {
       [profileId]: {
-        type: 'oauth',
         provider: 'openai-codex',
-        access: profile.access,
-        refresh: profile.refresh,
-        expires: profile.expires,
-        accountId: profile.accountId,
+        profile_name: profileName,
+        kind: 'oauth',
+        account_id: profile.accountId || null,
+        access_token: profile.access,
+        refresh_token: profile.refresh,
+        expires_at: new Date(profile.expires).toISOString(),
+        created_at: now,
+        updated_at: now,
       },
     },
-    order: {},
-    lastGood: {},
-    usageStats: {},
   };
 }
 function buildAnthropicAuthProfile(token) {
+  const now = new Date().toISOString();
   return {
-    version: 1,
+    schema_version: 1,
+    updated_at: now,
+    active_profiles: { anthropic: 'anthropic:default' },
     profiles: {
-      'anthropic:token': {
-        type: 'token',
+      'anthropic:default': {
         provider: 'anthropic',
+        profile_name: 'default',
+        kind: 'token',
         token,
+        created_at: now,
+        updated_at: now,
       },
     },
-    order: { anthropic: ['anthropic:token'] },
-    lastGood: {},
-    usageStats: {},
   };
 }
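
The 1.25.0 builders move the auth-profile store from the `version`/`type`/`access` layout to a `schema_version`/`kind`/`access_token` layout with an explicit `active_profiles` map. If an install still holds a store written by 1.24.8, it would presumably need converting. The sketch below is a hypothetical migration (`migrateAuthStore` is not part of the package) whose field mapping is read directly off this hunk; whether ZeroClaw accepts exactly this output is an assumption.

```javascript
// Hypothetical: convert a store produced by the 1.24.8 builders into the
// shape produced by the 1.25.0 builders shown in this hunk.
function migrateAuthStore(legacy, now = new Date()) {
  const ts = now.toISOString();
  const out = { schema_version: 1, updated_at: ts, active_profiles: {}, profiles: {} };

  for (const [oldId, p] of Object.entries(legacy.profiles || {})) {
    // 1.25.0 always names token profiles "default"; OAuth profiles keep the
    // suffix after "<provider>:" (an e-mail address, or "default").
    const name = p.type === 'token'
      ? 'default'
      : oldId.slice(p.provider.length + 1) || 'default';
    const id = `${p.provider}:${name}`;
    const base = { provider: p.provider, profile_name: name, created_at: ts, updated_at: ts };

    if (p.type === 'oauth') {
      out.profiles[id] = {
        ...base,
        kind: 'oauth',
        account_id: p.accountId || null,
        access_token: p.access,
        refresh_token: p.refresh,
        // Assumes legacy `expires` is a value Date accepts (e.g. epoch ms),
        // as the 1.25.0 builder does with new Date(profile.expires).
        expires_at: new Date(p.expires).toISOString(),
      };
    } else if (p.type === 'token') {
      out.profiles[id] = { ...base, kind: 'token', token: p.token };
    } else {
      continue; // unknown legacy type: skip rather than guess a mapping
    }
    // The first converted profile for each provider becomes the active one.
    if (!out.active_profiles[p.provider]) out.active_profiles[p.provider] = id;
  }
  return out;
}
```

The legacy `order`, `lastGood`, and `usageStats` maps have no counterpart in the new builders, so the sketch drops them.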

package/docker-compose.test.yml ADDED

@@ -0,0 +1,22 @@
+# Local testing — setup persists across restarts.
+# Start:  docker compose -f docker-compose.test.yml up -d
+# Logs:   docker compose -f docker-compose.test.yml logs -f
+# Stop:   docker compose -f docker-compose.test.yml down
+# Reset:  docker compose -f docker-compose.test.yml down -v  (wipes setup)
+services:
+  limbo:
+    image: limbo:rag-pdf-test
+    restart: "no"
+    ports:
+      - "127.0.0.1:18789:18789"
+    volumes:
+      - limbo-test-data:/data
+      - limbo-test-state:/home/limbo/.zeroclaw
+    tmpfs:
+      - /tmp:size=100M
+volumes:
+  limbo-test-data:
+    name: limbo-test-data
+  limbo-test-state:
+    name: limbo-test-state

package/evals/cases/create-reminder.json ADDED

@@ -0,0 +1,22 @@
+{
+  "name": "create-reminder",
+  "description": "User asks Limbo to set a reminder — should create a cron job, not a vault note",
+  "input": "Recordame mañana a las 9am que tengo que llamar al banco",
+  "assertions": [
+    {
+      "type": "cron_created",
+      "pattern": "banco|bank"
+    },
+    {
+      "type": "response_matches",
+      "pattern": "(?i)(reminder|recordatorio|avisarte|cron|programado|mañana)"
+    }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": [
+    "cron",
+    "reminder"
+  ],
+  "difficulty": "easy"
+}
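
Several cases write case-insensitive patterns with a leading `(?i)`, an inline flag borrowed from PCRE and Python. JavaScript's `RegExp` rejects that syntax with a `SyntaxError` (engines that support regexp modifiers only accept the scoped `(?i:...)` form), so a Node-based scorer has to translate it. `scorer.js` is not shown in this diff; the helper below is a hypothetical sketch of one way to do it. Patterns without the prefix, such as `banco|bank` in this case, stay case-sensitive.

```javascript
// Hypothetical: translate a leading PCRE-style "(?i)" into the JS "i" flag,
// since `new RegExp('(?i)...')` throws a SyntaxError.
const INLINE_CASE_INSENSITIVE = '(?i)';

function compilePattern(pattern) {
  if (pattern.startsWith(INLINE_CASE_INSENSITIVE)) {
    return new RegExp(pattern.slice(INLINE_CASE_INSENSITIVE.length), 'i');
  }
  return new RegExp(pattern); // no prefix: match case-sensitively
}

// A response_matches-style check: true if the pattern occurs anywhere.
function responseMatches(response, pattern) {
  return compilePattern(pattern).test(response);
}
```

Only a prefix at the very start is handled; an inline flag in the middle of a pattern would still throw, which is arguably the right failure for a malformed case file.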

package/evals/cases/hard-ambiguous-request.json ADDED

@@ -0,0 +1,12 @@
+{
+  "name": "hard-ambiguous-request",
+  "description": "User gives a vague save command with no clear content — model should ask for clarification",
+  "input": "Guardate esto para después",
+  "assertions": [
+    { "type": "response_matches", "pattern": "(?i)(qu[eé]|what|cu[aá]l|especific|clarif|decime|contame|refer|exactamente|guardar)" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["ambiguity", "clarification"],
+  "difficulty": "hard"
+}

package/evals/cases/hard-complex-note.json ADDED

@@ -0,0 +1,17 @@
+{
+  "name": "hard-complex-note",
+  "description": "User describes a conversation with multiple perspectives and an action item — note should capture all of it",
+  "input": "Ayer hablé con Laura del tema de migrar a Kubernetes. Ella dice que no vale la pena para nuestro scale, yo creo que sí. Quedamos en revisar los números la semana que viene.",
+  "assertions": [
+    { "type": "tool_called", "tool": "vault_write_note" },
+    { "type": "param_match", "tool": "vault_write_note", "key": "type", "pattern": "decision|insight|meeting|project" },
+    { "type": "vault_note_created", "pattern": "(?i)laura" },
+    { "type": "vault_note_created", "pattern": "(?i)kubernetes|k8s" },
+    { "type": "vault_note_created", "pattern": "(?i)(no vale la pena|not worth|scale)" },
+    { "type": "vault_note_created", "pattern": "(?i)(revisar|review|números|numbers|semana)" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["tool-calling", "vault_write_note", "complex-content"],
+  "difficulty": "hard"
+}
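
This case pairs `tool_called` with `param_match`, which inspects the arguments of a logged tool call. Below is a hypothetical evaluator for both, assuming the MCP log (`mcp-log.js`, not shown) has already been parsed into `{ tool, params }` records and that `param_match` passes when any call to the named tool qualifies. The `param_match` patterns in these cases carry no `(?i)` prefix, so they are compiled as-is.

```javascript
// Hypothetical evaluators over parsed MCP log records of the form
// { tool: string, params: object }; the real log format is an assumption.
function checkToolCalled(calls, assertion) {
  return calls.some((c) => c.tool === assertion.tool);
}

// Passes if ANY call to `assertion.tool` has a string parameter
// `assertion.key` that matches `assertion.pattern`.
function checkParamMatch(calls, assertion) {
  const re = new RegExp(assertion.pattern);
  return calls.some(
    (c) =>
      c.tool === assertion.tool &&
      c.params != null &&
      typeof c.params[assertion.key] === 'string' &&
      re.test(c.params[assertion.key])
  );
}
```

The patterns are unanchored, so `decision` would also accept a value like `indecision`; wrapping them as `^(?:...)$` is the stricter alternative if note types are a closed set.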

package/evals/cases/hard-synthesize-knowledge.json ADDED

@@ -0,0 +1,33 @@
+{
+  "name": "hard-synthesize-knowledge",
+  "description": "Multi-step: save two person notes, then ask a broad question that requires searching and synthesizing both",
+  "steps": [
+    {
+      "input": "Acordate que Martín es diseñador UX y trabaja en Mercado Libre",
+      "assertions": [
+        { "type": "tool_called", "tool": "vault_write_note" },
+        { "type": "vault_note_created", "pattern": "(?i)mart[ií]n" }
+      ]
+    },
+    {
+      "input": "Guardá que Sofía es data scientist en Globant y la conozco del secundario",
+      "assertions": [
+        { "type": "tool_called", "tool": "vault_write_note" },
+        { "type": "vault_note_created", "pattern": "(?i)sof[ií]a" }
+      ]
+    },
+    {
+      "input": "Qué sabes de las personas que conozco?",
+      "assertions": [
+        { "type": "tool_called", "tool": "vault_search" },
+        { "type": "response_matches", "pattern": "(?i)mart[ií]n" },
+        { "type": "response_matches", "pattern": "(?i)sof[ií]a" },
+        { "type": "response_matches", "pattern": "(?i)(mercado libre|globant)" }
+      ]
+    }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["multi-step", "vault_write_note", "vault_search", "synthesis"],
+  "difficulty": "hard"
+}
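
Cases come in two shapes: single-turn (top-level `input` plus `assertions`) and multi-turn (a `steps` array, as here). A runner can normalize both into steps and execute them in order within one conversation. The sketch below is hypothetical: `send` and `check` are injected stand-ins for the agent call and the assertion evaluator, kept synchronous for brevity, whereas a real agent call would be asynchronous.

```javascript
// Hypothetical: turn either case shape into a list of steps.
function toSteps(evalCase) {
  if (Array.isArray(evalCase.steps)) return evalCase.steps;
  return [{ input: evalCase.input, assertions: evalCase.assertions || [] }];
}

// Run every step in order; a run passes only if every assertion in every
// step passes. Failures record which step and assertion type failed.
function runCaseOnce(evalCase, send, check) {
  const failures = [];
  toSteps(evalCase).forEach((step, i) => {
    const turn = send(step.input); // e.g. { response, toolCalls }
    for (const a of step.assertions) {
      if (!check(a, turn)) failures.push({ step: i, type: a.type });
    }
  });
  return { passed: failures.length === 0, failures };
}
```

Evaluating later steps even after an earlier one fails gives a fuller failure report; stopping at the first failing step would be cheaper but hides whether retrieval worked on its own.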

package/evals/cases/medium-note-type-inference.json ADDED

@@ -0,0 +1,16 @@
+{
+  "name": "medium-note-type-inference",
+  "description": "User describes a team decision — the note type should be 'decision', not 'fact'",
+  "input": "Hoy decidimos con el equipo que vamos a usar PostgreSQL en vez de MongoDB para el proyecto nuevo",
+  "assertions": [
+    { "type": "tool_called", "tool": "vault_write_note" },
+    { "type": "param_match", "tool": "vault_write_note", "key": "type", "pattern": "decision" },
+    { "type": "vault_note_created", "pattern": "(?i)postgresql|postgres" },
+    { "type": "vault_note_created", "pattern": "(?i)mongodb|mongo" },
+    { "type": "response_matches", "pattern": "(?i)(guardé|guardado|anotado|decisión|decision)" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["tool-calling", "vault_write_note", "type-inference"],
+  "difficulty": "medium"
+}
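
`vault_note_created` assertions only make sense against notes that did not exist before the turn; otherwise the seeded `eval-seed-birthday.md` could satisfy them. A minimal sketch of that idea, assuming the vault is snapshotted before and after each turn as `{ [relativePath]: contents }`; `vault-diff.js` is not shown, so these helper names are hypothetical.

```javascript
// Hypothetical: list notes present after the turn but not before it.
function createdNotes(before, after) {
  return Object.keys(after)
    .filter((path) => !(path in before))
    .map((path) => ({ path, content: after[path] }));
}

// Passes if some newly created note (body or frontmatter) matches `regex`.
// Pass a non-global RegExp: test() on a /g regex is stateful (lastIndex).
function noteCreatedMatching(before, after, regex) {
  return createdNotes(before, after).some((n) => regex.test(n.content));
}
```

Edits to pre-existing notes are deliberately ignored here; a case that expects an update (such as `update-map`) would need a separate "modified" check.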

package/evals/cases/medium-person-multiple-facts.json ADDED

@@ -0,0 +1,16 @@
+{
+  "name": "medium-person-multiple-facts",
+  "description": "User mentions a person with multiple facts in one message — should create a person note capturing all details",
+  "input": "Mi viejo se llama Carlos, es ingeniero y vive en Córdoba",
+  "assertions": [
+    { "type": "tool_called", "tool": "vault_write_note" },
+    { "type": "param_match", "tool": "vault_write_note", "key": "type", "pattern": "person" },
+    { "type": "vault_note_created", "pattern": "(?i)carlos" },
+    { "type": "vault_note_created", "pattern": "(?i)ingeniero|engineer" },
+    { "type": "vault_note_created", "pattern": "(?i)c[oó]rdoba" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["tool-calling", "vault_write_note", "type-inference"],
+  "difficulty": "medium"
+}
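
Every case shown here declares `runs` and `pass_threshold`, which suggests a case is run repeatedly and scored by the fraction of passing runs. A hypothetical aggregation under that reading (the defaults of 1 and 1.0 for missing fields are assumptions):

```javascript
// Hypothetical: aggregate per-run results for one case. With runs: 1 and
// pass_threshold: 1.0, as in these cases, the single run must pass.
function aggregateRuns(evalCase, runResults) {
  const runs = evalCase.runs ?? 1;
  const threshold = evalCase.pass_threshold ?? 1.0;
  if (runResults.length !== runs) {
    throw new Error(`expected ${runs} results, got ${runResults.length}`);
  }
  const passedRuns = runResults.filter((r) => r.passed).length;
  const passRate = passedRuns / runs;
  return { passRate, passed: passRate >= threshold };
}
```

Raising `runs` while lowering `pass_threshold` (for example 3 runs at 0.66) would tolerate one nondeterministic model failure per case instead of failing the whole case.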

package/evals/cases/medium-search-implicit.json ADDED

@@ -0,0 +1,13 @@
+{
+  "name": "medium-search-implicit",
+  "description": "User asks a broad question about people in tech — should search the vault and return relevant results",
+  "input": "Qué sabes sobre la gente que trabaja en tech?",
+  "assertions": [
+    { "type": "tool_called", "tool": "vault_search" },
+    { "type": "response_matches", "pattern": "(?i)(no encontr|no tengo|no hay|nothing|google|engineer|ML|machine learning|birthday|cumpleaños)" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["tool-calling", "vault_search", "retrieval"],
+  "difficulty": "medium"
+}

package/evals/cases/multi-step-remember-and-search.json ADDED

@@ -0,0 +1,24 @@
+{
+  "name": "multi-step-remember-and-search",
+  "description": "Two-message flow: save a note, then search for it",
+  "steps": [
+    {
+      "input": "Recordame que Alice trabaja en Google como ML engineer",
+      "assertions": [
+        { "type": "tool_called", "tool": "vault_write_note" },
+        { "type": "param_match", "tool": "vault_write_note", "key": "type", "pattern": "person" }
+      ]
+    },
+    {
+      "input": "Que sabes sobre Alice?",
+      "assertions": [
+        { "type": "tool_called", "tool": "vault_search" },
+        { "type": "response_matches", "pattern": "(?i)(google|ML|machine learning)" }
+      ]
+    }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["multi-step", "vault_write_note", "vault_search"],
+  "difficulty": "easy"
+}

package/evals/cases/read-note-by-id.json ADDED

@@ -0,0 +1,22 @@
+{
+  "name": "read-note-by-id",
+  "description": "User asks to read a specific note (tests vault_read tool selection)",
+  "input": "Lee la nota eval-seed-birthday",
+  "assertions": [
+    {
+      "type": "tool_called",
+      "tool": "vault_read"
+    },
+    {
+      "type": "response_matches",
+      "pattern": "(?i)(march 15|tomas|birthday)"
+    }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": [
+    "tool-calling",
+    "vault_read"
+  ],
+  "difficulty": "easy"
+}

package/evals/cases/remember-fact.json ADDED

@@ -0,0 +1,15 @@
+{
+  "name": "remember-fact",
+  "description": "User asks Limbo to remember a personal fact (tests vault_write_note)",
+  "input": "Acordate que soy alérgico al maní",
+  "assertions": [
+    { "type": "tool_called", "tool": "vault_write_note" },
+    { "type": "param_match", "tool": "vault_write_note", "key": "type", "pattern": "fact|preference" },
+    { "type": "vault_note_created", "pattern": "alerg|mani|peanut" },
+    { "type": "response_matches", "pattern": "(?i)(guardé|guardado|saved|anotado|alérgico|maní)" }
+  ],
+  "runs": 1,
+  "pass_threshold": 1.0,
+  "tags": ["tool-calling", "vault_write_note"],
+  "difficulty": "easy"
+}
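
The `vault_note_created` pattern in this case, `alerg|mani|peanut`, has no `(?i)` prefix and no accented alternatives. A note that repeats the user's wording verbatim ("alérgico al maní") contains none of the three as a literal substring, because of the `é` and `í`. Whether `scorer.js` folds accents is not visible in this diff. The following is a hypothetical accent-insensitive matcher using Unicode NFD decomposition:

```javascript
// Hypothetical: NFD splits "é" into "e" plus a combining acute accent
// (U+0301); removing the combining-diacritics block (U+0300 to U+036F)
// leaves unaccented base letters for Spanish text (á→a, ñ→n, ü→u).
function foldAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Fold both the pattern and the text so accented alternatives in the
// pattern (e.g. "c[oó]rdoba") still line up with the folded text.
function matchesFolded(text, pattern, flags = '') {
  return new RegExp(foldAccents(pattern), flags).test(foldAccents(text));
}
```

Folding happens only at match time, so the vault keeps the user's original spelling.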