PyPI - duckbrain - Versions diffs - 0.1.1__tar.gz - Mend

duckbrain 0.1.1__tar.gz

Files changed (14)

duckbrain-0.1.1/PKG-INFO +438 -0
duckbrain-0.1.1/README.md +411 -0
duckbrain-0.1.1/pyproject.toml +62 -0
duckbrain-0.1.1/src/duckbrain/__init__.py +38 -0
duckbrain-0.1.1/src/duckbrain/indexer.py +180 -0
duckbrain-0.1.1/src/duckbrain/py.typed +0 -0
duckbrain-0.1.1/src/duckbrain/scanner.py +140 -0
duckbrain-0.1.1/src/duckbrain/server.py +95 -0
duckbrain-0.1.1/src/duckbrain/tools/__init__.py +0 -0
duckbrain-0.1.1/src/duckbrain/tools/vault_info.py +29 -0
duckbrain-0.1.1/src/duckbrain/tools/vault_read.py +72 -0
duckbrain-0.1.1/src/duckbrain/tools/vault_search.py +38 -0
duckbrain-0.1.1/src/duckbrain/tools/vault_write.py +31 -0
duckbrain-0.1.1/src/duckbrain/writer.py +253 -0

duckbrain-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,438 @@
+Metadata-Version: 2.4
+Name: duckbrain
+Version: 0.1.1
+Summary: DuckDB-backed MCP memory server for Obsidian vaults — structured search, read, and write access for AI coding agents.
+Keywords: mcp,obsidian,memory,knowledge-base,duckdb,ai-agent
+Author: Tim Hiebenthal
+Author-email: Tim Hiebenthal <timhiebenthal@gmail.com>
+License-Expression: MIT
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Dist: duckdb>=1.5.3
+Requires-Dist: mcp[cli]>=1.27.1
+Requires-Dist: python-dotenv>=1.2.2
+Requires-Dist: pyyaml>=6.0.3
+Requires-Python: >=3.10
+Project-URL: Homepage, https://github.com/timhiebenthal/duckbrain
+Project-URL: Repository, https://github.com/timhiebenthal/duckbrain
+Project-URL: Issues, https://github.com/timhiebenthal/duckbrain/issues
+Description-Content-Type: text/markdown
+# DuckBrain
+<p align="center">
+  <img src="https://raw.githubusercontent.com/timhiebenthal/duckbrain/main/logo/logo_writing_white_bg.png" alt="DuckBrain" width="500" />
+</p>
+DuckDB-backed MCP memory server for Obsidian vaults. Gives AI coding agents structured read and write access to your personal wiki — with full-text search, frontmatter-aware indexing, and automatic index/log updates. Built on the principle that your vault filesystem should be the single source of truth, not a database hidden behind an API.
+## What it solves
+Existing agent memory tools (MemSearch, Open Brain, Mem0, Supermemory) treat memory as unstructured text blobs. If you maintain a [Karpathy-style LLM wiki](https://x.com/karpathy/status/1889054630119760374) in Obsidian with typed pages (entities, concepts, sources, synthesis), YAML frontmatter, tags, and wikilinks — none of those tools understand your vault's structure.
+DuckBrain fills that gap. It reads your vault as-is and writes new pages following your vault's schema, so your wiki stays a single source of truth on the filesystem.
+## How it works (Architecture)
+```
+┌──────────────────┐     MCP stdio     ┌─────────────────────────────────┐
+│    AI Agent      │ ◄──────────────►  │      DuckBrain MCP Server       │
+│                  │                   │                                 │
+│  Claude Code     │                   │  vault_info  ──┐                │
+│  OpenCode        │                   │  vault_search ─┤  DuckDB FTS    │
+│  Cursor          │                   │  vault_read  ──┤  Filesystem    │
+│  Hermes          │                   │  vault_write ──┘  Filesystem    │
+└──────────────────┘                   └────────┬────────┬───────────────┘
+                                                │        │
+                    query ┌─────────────────────┘        └── read/write ──┐
+             (full index) ▼                                               ▼  (single file)
+         ┌──────────────────────┐                              ┌───────────────────────────┐
+         │  DuckDB (in-memory)  │                              │    Your Obsidian Vault    │
+         │                      │                              │                           │
+         │  pages (in-memory    │    rebuilt from scratch      │  wiki/entities/           │
+         │  rebuilt every search)│    on every query           │  wiki/concepts/           │
+         │  ┌───────────────┐   │                              │  wiki/sources/            │
+         │  │ filepath      │   │                              │  wiki/synthesis/          │
+         │  │ title         │   │                              │  daily/                   │
+         │  │ kind          │   │                              │  wiki/index.md            │
+         │  │ tags          │   │                              │  wiki/log.md              │
+         │  │ body          │   │                              │                           │
+         │  │ created       │   │                              │  plain markdown on disk   │
+         │  │ updated       │   │                              │                           │
+         │  └───────────────┘   │                              │                           │
+         │                      │                              │                           │
+         │  BM25 search query:  │                              │                           │
+         │  SELECT ...          │                              │                           │
+         │  FROM pages p        │                              │                           │
+         │  WHERE fts_match_bm25│                              │                           │
+         │    (p.filepath,      │                              │                           │
+         │     'segfault')      │                              │                           │
+         │  AND kind='concept'  │                              │                           │
+         │  ORDER BY score DESC │                              │                           │
+         └──────────────────────┘                              └───────────────────────────┘
+```
+- **Reads** your vault files directly — no index to sync, no watchers, no duplicate storage
+- **Searches** via DuckDB full-text search (BM25 ranking), with the index rebuilt fresh from disk on every query
+- **Writes** new pages with correct YAML frontmatter, auto-updating your index and log
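The "rebuild, then rank" flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: it swaps DuckDB's FTS extension for a hand-written BM25 scorer so it runs with the standard library alone, and the names `tokenize`, `bm25_search`, and `PAGES` are hypothetical, not part of DuckBrain.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(pages: list[dict], query: str, k1: float = 1.2, b: float = 0.75) -> list[tuple[float, str]]:
    """Rank pages against the query; returns (score, filepath) pairs, best first.

    Term statistics are recomputed from `pages` on every call, mirroring the
    "rebuilt on every query" design rather than keeping a persistent index.
    """
    docs = [tokenize(p["title"] + " " + p["body"]) for p in pages]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    terms = set(tokenize(query))
    results = []
    for page, doc in zip(pages, docs):
        counts = Counter(doc)
        score = 0.0
        for term in terms:
            tf = counts[term]
            if tf == 0:
                continue
            df = sum(1 for d in docs if term in d)  # documents containing the term
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        if score > 0:
            results.append((score, page["filepath"]))
    return sorted(results, reverse=True)


# Hypothetical pages standing in for files read from the vault.
PAGES = [
    {"filepath": "wiki/concepts/segfault-debugging.md", "title": "Segfault Debugging",
     "body": "Trace a segfault with gdb"},
    {"filepath": "wiki/entities/duckdb.md", "title": "DuckDB",
     "body": "Embedded analytical database"},
    {"filepath": "daily/2026-05-28.md", "title": "Daily Log",
     "body": "Fixed a segfault in the parser after a long afternoon of bisecting commits"},
]

for score, filepath in bm25_search(PAGES, "segfault"):
    print(f"{score:.3f}  {filepath}")
```

The short page that mentions the term twice ranks above the longer page that mentions it once, which is the length-normalised behaviour BM25 is meant to give.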
+## Requirements
+- Python 3.10+
+- [uv](https://docs.astral.sh/uv/) (package manager)
+- An Obsidian vault structured with a `wiki/` directory containing:
+  - `wiki/entities/` — people, orgs, products, tools
+  - `wiki/concepts/` — ideas, frameworks, theories
+  - `wiki/sources/` — one summary per ingested source
+  - `wiki/synthesis/` — cross-cutting analysis
+  - `wiki/index.md` — page catalog with `## Entities`, `## Concepts`, `## Sources`, `## Synthesis` sections
+  - `wiki/log.md` — append-only chronological record
+- Pages should use YAML frontmatter: `title`, `item-type`, `tags`, `created`, `updated`
+This follows the schema defined for [LLM wikis](https://x.com/karpathy/status/1889054630119760374). If your vault uses a different structure, DuckBrain works with it — but index/log updates expect the section headers above.
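To make the expected page layout concrete, here is a minimal sketch of splitting a page into frontmatter fields and body. It is a simplification: the package metadata lists `pyyaml` as a dependency, while this stand-in handles only flat `key: value` lines and inline `[a, b]` lists so that it needs no third-party install. The function name `split_frontmatter` and the sample page are hypothetical.

```python
from __future__ import annotations


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Return (fields, body) for a markdown page with a leading '---' block.

    Pages without a complete frontmatter block come back as ({}, text).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # opening delimiter without a closing one
    fields: dict = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'key: value'
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [a, b, c]
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    return fields, "\n".join(lines[end + 1:])


PAGE = """---
title: Agent Memory Systems
item-type: concept
tags: [agent-memory, taxonomy, ai]
created: 2026-05-28
updated: 2026-05-28
---
# Agent Memory Systems
"""

fields, body = split_frontmatter(PAGE)
print(fields["item-type"], fields["tags"])
print(body)
```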
+## Quick Start
+```bash
+pip install duckbrain
+```
+That's it. Now connect your AI agent (see below). You don't run DuckBrain yourself; the agent spawns it as needed.
+*(Optional: verify the install by running `duckbrain` — it'll fail with "VAULT_PATH not set", which confirms it's working.)*
+### Installing from source (for contributors)
+```bash
+git clone https://github.com/timhiebenthal/duckbrain.git
+cd duckbrain
+uv sync         # installs project + dev dependencies in a virtual environment
+```
+This requires [uv](https://docs.astral.sh/uv/) (the Python package manager used for development). End users should use `pip install duckbrain` above.
+*(Optional: to verify the install, run `VAULT_PATH="/path/to/your/vault" uv run duckbrain`. It will appear to hang — that's correct, it's waiting on stdio. Press Ctrl+C to stop.)*
+## Connecting to Agents
+MCP stdio transport means the agent spawns DuckBrain as a child process when it starts. You don't need a separate terminal or a running server. Just add this to your MCP config:
+```json
+{
+  "duckbrain": {
+    "command": "uv",
+    "args": ["run", "duckbrain"],
+    "env": {
+      "VAULT_PATH": "/path/to/your/obsidian/vault"
+    }
+  }
+}
+```
+Where to put it:
+| Agent | Config file | Top-level key |
+|-------|-------------|---------------|
+| Claude Code | `~/.claude.json` or `.mcp.json` | `mcpServers` |
+| OpenCode | `opencode.json` | `mcp` |
+| Cursor | `.cursor/mcp.json` | `mcpServers` |
+| Hermes Agent | `mcp.json` | `mcpServers` |
+Example for Claude Code:
+```json
+{
+  "mcpServers": {
+    "duckbrain": {
+      "command": "uv",
+      "args": ["run", "duckbrain"],
+      "env": {
+        "VAULT_PATH": "/path/to/your/obsidian/vault"
+      }
+    }
+  }
+}
+```
+> **Tip:** Instead of hardcoding the path in every config, set `VAULT_PATH` once in your shell profile (`~/.bashrc`, `~/.zshrc`, or `~/.config/fish/config.fish`) and reference it in the config with your agent's env-var syntax:
+>
+> - OpenCode: `"VAULT_PATH": "{env:VAULT_PATH}"`
+> - Claude Code: `"VAULT_PATH": "${VAULT_PATH}"`
+Make sure `uv` is on your `PATH`.
+### Auto-Writing Session Learnings
+There are two ways to make your agent write learnings to the vault: instructions (works everywhere) or hooks (automatic, agent-native).
+#### Approach 1: Instructions (all agents)
+Add this to the appropriate instructions file. The agent reads it on startup and follows it during the session. **Tested with OpenCode.**
+**Claude Code** — add to `CLAUDE.md`:
+```markdown
+## Session Learnings
+After debugging, diving into rabbit holes, or completing significant work,
+save what you learned so you don't repeat mistakes:
+- Use vault_write(kind="daily", title="...", content="...", tags=["..."])
+  to append to today's daily note.
+- For reusable knowledge, use vault_write(kind="concept", title="...",
+  content="...", tags=["..."]) to create a wiki page.
+```
+**OpenCode** — add to your config's `instructions` field (`opencode.json`):
+```json
+"instructions": ["~/.config/opencode/LEARNINGS.md"]
+```
+Then create `~/.config/opencode/LEARNINGS.md` (or wherever you prefer — any path the config can reach):
+```markdown
+## Session Learnings
+When you encounter problems, debug issues, or discover non-obvious solutions,
+save the learning to the vault so it's available in future sessions:
+- Append to today's daily note:
+  vault_write(kind="daily", title="short summary", content="what you learned", tags=["debugging", "learned"])
+- For reusable concepts/patterns worth revisiting:
+  vault_write(kind="concept", title="Concept Name", content="explanation", tags=["relevant", "tags"])
+Do this proactively — don't wait to be asked. A learning saved is a bug not repeated.
+```
+**Cursor** — add to `.cursorrules`:
+```markdown
+## Session Learnings
+After debugging or completing work, save learnings via DuckBrain:
+- vault_write(kind="daily", title="<summary>", content="<details>", tags=[])
+- Use kind="concept" for reusable knowledge.
+```
+#### Approach 2: Hooks (automatic, no prompt engineering needed)
+Hooks run shell commands at specific lifecycle points — no instructions needed, they fire deterministically. **⚠️ Not tested with DuckBrain yet.**
+**Claude Code** — supports a full [hooks system](https://code.claude.com/docs/en/hooks) including `SessionEnd` (fires when a session terminates). Add to `.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "SessionEnd": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "duckbrain-save-session --transcript-from-stdin"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+The `SessionEnd` hook receives a JSON payload on stdin that includes a `transcript_path` field pointing at the session transcript, not the transcript text itself. A wrapper script could read that file, pipe it through an LLM to extract learnings, then call `vault_write`. See [`agent-memory-mcp`](https://github.com/ipiton/agent-memory-mcp) for a production example of this pattern.
+**Cursor** — supports [hooks](https://cursor.com/docs/hooks.md) including `sessionEnd`, `postToolUse`, and `stop` via `.cursor/hooks.json`. However, `sessionEnd` is **not available in cloud agents** (local IDE only), and MCP execution hooks (`beforeMCPExecution`/`afterMCPExecution`) are **not yet wired for cloud agents**. Usable for local development, not for cloud-based Cursor sessions.
+**.cursor/hooks.json** (local IDE only):
+```json
+{
+  "hooks": {
+    "stop": [
+      {
+        "type": "command",
+        "command": "duckbrain-save-session --reason stop"
+      }
+    ]
+  }
+}
+```
+### How It Works
+During a session, the agent encounters a problem, debugs it, and resolves it:
+```
+> vault_search("duckbrain daily write")
+> vault_read(filepath="wiki/...")
+Agent debugs, fixes, learns something...
+> vault_write(
+    kind="daily",
+    title="vault_write daily kind doesn't support filepath-based reads",
+    content="When vault_search returns filepaths, the agent may try to Read files
+    directly. vault_read should accept filepath as well as title to close this gap.",
+    tags=["duckbrain", "debugging", "learned"]
+  )
+```
+The learning is now in `daily/2026-05-28.md`. Tomorrow when you ask "how do I read vault pages by path?", the agent searches the vault, finds your note, and recalls the solution.
+## Tools
+### `vault_info`
+Get a summary of your vault's structure.
+```
+> vault_info()
+→ {
+    entities: 38,
+    concepts: 38,
+    sources: 33,
+    synthesis: 9,
+    available_tags: ["agent-memory", "ai", "duckdb", "mcp", ...],
+    last_modified: "2026-05-28"
+  }
+```
+No parameters. Useful for agents to discover what's in the vault before searching.
+### `vault_search`
+Full-text search over all wiki pages.
+```
+> vault_search("agent memory", kind="concept")
+→ [
+    { title: "Agent Memory Systems", kind: "concept",
+      filepath: "wiki/concepts/agent-memory-systems.md",
+      snippet: "A 6-level taxonomy of Claude Code memory approaches..." },
+    ...
+  ]
+```
+Parameters:
+- `query` (required) — search text, BM25-ranked
+- `kind` (optional) — filter to `entity`, `concept`, `source`, `synthesis`, or `daily`
+- `tags` (optional) — filter to pages whose tags contain the given text (substring match)
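The `kind` and `tags` filters can be pictured as a post-filter over candidate pages. The sketch below rests on two assumptions the README does not state: that `tags` is passed as a list, and that every requested tag must match at least one page tag. `filter_pages` and `CANDIDATES` are hypothetical names.

```python
from __future__ import annotations


def filter_pages(pages: list[dict], kind: str | None = None, tags: list[str] | None = None) -> list[dict]:
    """Keep pages of the given kind whose tags contain every requested substring."""
    wanted = [t.lower() for t in (tags or [])]
    kept = []
    for page in pages:
        if kind is not None and page["kind"] != kind:
            continue
        page_tags = [t.lower() for t in page["tags"]]
        # Assumption: each requested tag must be a substring of some page tag.
        if all(any(w in t for t in page_tags) for w in wanted):
            kept.append(page)
    return kept


CANDIDATES = [
    {"title": "Agent Memory Systems", "kind": "concept", "tags": ["agent-memory", "taxonomy", "ai"]},
    {"title": "DuckDB", "kind": "entity", "tags": ["duckdb", "database"]},
    {"title": "DuckDB FTS Memory", "kind": "concept", "tags": ["agent-memory", "duckdb"]},
]

for page in filter_pages(CANDIDATES, kind="concept", tags=["memory", "duck"]):
    print(page["title"])
```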
+### `vault_read`
+Read a page by title or filepath. Returns full markdown content with metadata.
+```
+> vault_read(title="Agent Memory Systems")
+→ {
+    title: "Agent Memory Systems", kind: "concept",
+    filepath: "wiki/concepts/agent-memory-systems.md",
+    content: "# Agent Memory Systems\n\nA 6-level taxonomy...",
+    tags: ["agent-memory", "taxonomy", "ai"],
+    created: "2026-05-28", updated: "2026-05-28"
+  }
+```
+Parameters:
+- `title` (optional) — page title to look up (case-insensitive)
+- `filepath` (optional) — relative path from vault_search results (e.g. `wiki/concepts/foo.md`)
+Use after `vault_search` to get full page content. Pass `filepath` from search results directly.
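A lookup that accepts either parameter might resemble the following sketch. It works on an in-memory list of page records rather than the real vault, and the precedence (an explicit `filepath` wins over `title`) is an assumption, as is the name `find_page`.

```python
from __future__ import annotations


def find_page(pages: list[dict], title: str | None = None, filepath: str | None = None) -> dict | None:
    """Return the first page matching filepath (exact) or title (case-insensitive)."""
    if filepath is not None:
        return next((p for p in pages if p["filepath"] == filepath), None)
    if title is not None:
        needle = title.casefold()
        return next((p for p in pages if p["title"].casefold() == needle), None)
    raise ValueError("pass either title or filepath")


VAULT_PAGES = [
    {"title": "Agent Memory Systems", "filepath": "wiki/concepts/agent-memory-systems.md"},
    {"title": "DuckDB", "filepath": "wiki/entities/duckdb.md"},
]

print(find_page(VAULT_PAGES, title="agent memory systems"))
print(find_page(VAULT_PAGES, filepath="wiki/entities/duckdb.md"))
```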
+### `vault_write`
+Create a new wiki page or append to today's daily note, with automatic index and log updates.
+```
+> vault_write(
+    kind="concept",
+    title="DuckDB FTS Memory",
+    content="# DuckDB FTS Memory\n\nHow DuckDB serves as a memory layer...",
+    tags=["agent-memory", "duckdb"]
+  )
+→ { success: true, filepath: "wiki/concepts/duckdb-fts-memory.md" }
+```
+For daily notes (session learnings, debugging logs):
+```
+> vault_write(
+    kind="daily",
+    title="Debugging vault_read filepath",
+    content="When search returns filepaths, agents try to Read files directly.",
+    tags=["duckbrain", "debugging"]
+  )
+→ { success: true, filepath: "daily/2026-05-28.md" }
+```
+For wiki pages (entity|concept|source|synthesis), this automatically:
+1. Writes the markdown file to the correct wiki subdirectory
+2. Generates YAML frontmatter with title, item-type, tags, dates
+3. Appends an entry to `wiki/index.md` in the right section
+4. Appends a dated entry to `wiki/log.md`
+For daily notes, the behavior differs:
+1. Appends to `daily/YYYY-MM-DD.md` (creates the file if today's doesn't exist yet)
+2. Writes no YAML frontmatter, just a `## heading` followed by the content
+3. Does NOT update `wiki/index.md` (daily notes aren't wiki pages)
+4. Appends a dated entry to `wiki/log.md`
+Parameters:
+- `kind` (required) — `entity`, `concept`, `source`, `synthesis`, or `daily`
+- `title` (required) — page title (or section heading for daily entries)
+- `content` (required) — markdown body (without frontmatter)
+- `tags` (required) — list of tag strings
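The two write paths can be sketched as follows. This is an illustration, not the code in `writer.py`: the slug rule is inferred from the example filepath `duckdb-fts-memory.md`, the frontmatter layout is a guess at a reasonable format, and the index and log updates are left out for brevity. `slugify`, `write_page`, and `SUBDIRS` are hypothetical names.

```python
from __future__ import annotations

import re
import tempfile
from datetime import date
from pathlib import Path

# Wiki kinds map to subdirectories of wiki/; "daily" is handled separately.
SUBDIRS = {"entity": "entities", "concept": "concepts", "source": "sources", "synthesis": "synthesis"}


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def write_page(vault: Path, kind: str, title: str, content: str, tags: list[str], today: date) -> Path:
    """Create a wiki page with frontmatter, or append a section to today's daily note."""
    stamp = today.isoformat()
    if kind == "daily":
        path = vault / "daily" / f"{stamp}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Append mode creates the file on first use and keeps earlier entries.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"## {title}\n\n{content}\n\n")
        return path
    path = vault / "wiki" / SUBDIRS[kind] / f"{slugify(title)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    frontmatter = (
        "---\n"
        f"title: {title}\n"
        f"item-type: {kind}\n"
        f"tags: [{', '.join(tags)}]\n"
        f"created: {stamp}\n"
        f"updated: {stamp}\n"
        "---\n\n"
    )
    path.write_text(frontmatter + content + "\n", encoding="utf-8")
    return path


# Usage against a throwaway directory instead of a real vault.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    day = date(2026, 5, 28)
    page = write_page(vault, "concept", "DuckDB FTS Memory", "# DuckDB FTS Memory", ["agent-memory", "duckdb"], day)
    rel = page.relative_to(vault).as_posix()
    page_text = page.read_text(encoding="utf-8")
    write_page(vault, "daily", "First learning", "Details", [], day)
    daily = write_page(vault, "daily", "Second learning", "More details", [], day)
    daily_rel = daily.relative_to(vault).as_posix()
    daily_text = daily.read_text(encoding="utf-8")

print(rel)
print(daily_rel)
```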
+## Vault Path
+Set via the `VAULT_PATH` environment variable (or the `env` field in your MCP config — no need for both).
+For local development, copy `.env.example` to `.env` and set your path:
+```
+VAULT_PATH=/path/to/your/obsidian/vault
+```
+If you use WSL2 with your vault on Windows, set it to the WSL mount path (e.g., `/mnt/c/Users/you/Documents/obsidian/my-vault`).
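Startup validation of the vault path could be sketched like this. It reads only the process environment (the package also depends on `python-dotenv` for `.env` files, which this stdlib-only sketch skips), and the function name and the second error message are assumptions; only the "VAULT_PATH not set" wording comes from the README.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def resolve_vault_path(env: Mapping[str, str] | None = None) -> Path:
    """Return the vault directory named by VAULT_PATH, or raise if it is unusable."""
    env = os.environ if env is None else env
    raw = env.get("VAULT_PATH", "").strip()
    if not raw:
        raise RuntimeError("VAULT_PATH not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        # Hypothetical message; the real server may word this differently.
        raise RuntimeError(f"VAULT_PATH is not a directory: {path}")
    return path


try:
    print(resolve_vault_path())
except RuntimeError as exc:
    print(f"error: {exc}")
```

Passing a mapping instead of relying on `os.environ` keeps the function easy to test without touching the real environment.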
+## Performance
+- FTS index rebuilt fresh from disk on every query — ~90 pages in under a second
+- Write operations complete in <500ms
+- The search index lives entirely in memory; there is no persistent DuckDB database file
+- Zero network calls, zero external services
+## Limitations (v1)
+- No update or delete operations (only create)
+- No vector embeddings or semantic search
+- No page deduplication check before writing
+- ~1s per search at current scale; at 500+ pages, incremental indexing would be needed
+## Under Consideration
+Ideas we're exploring but not committing to yet — as we use the tool and understand what matters, some of these may get built. Open an issue to discuss.
+- **Temporal decay (recency bias)** — boost search results from recently created or updated pages. Older knowledge fades unless explicitly referenced.
+- **Vector embeddings / semantic search** — cover the ~20% recall gap that BM25 can't reach (concepts with different wording). Could integrate MemSearch or local embeddings.
+- **Update and delete operations** — allow agents to edit or remove existing pages, not just create.
+- **Incremental indexing** — INSERT single pages into the FTS index instead of full rebuild, keeping search fast at 500+ pages.
+- **Page deduplication** — detect when a page with the same title already exists before writing.
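As one possible shape for the temporal-decay idea listed above, a BM25 score could be multiplied by an exponential decay on page age. Nothing here is implemented in DuckBrain; the half-life value and the name `decayed_score` are purely illustrative.

```python
from datetime import date


def decayed_score(bm25_score: float, updated: date, today: date, half_life_days: float = 90.0) -> float:
    """Halve the score for every `half_life_days` since the page was last updated."""
    age_days = max((today - updated).days, 0)  # pages dated in the future count as age 0
    return bm25_score * 0.5 ** (age_days / half_life_days)


today = date(2026, 5, 28)
print(decayed_score(2.0, date(2026, 5, 28), today))  # updated today: score unchanged
print(decayed_score(2.0, date(2026, 2, 27), today))  # 90 days old: score halved
```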
+## Inspirations
+This project stands on the shoulders of several ideas and tools:
+- **[Andrej Karpathy's LLM wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)** — the idea that a personal markdown wiki, co-maintained by humans and AI agents, compounds into a persistent knowledge base. The vault schema (entities, concepts, sources, synthesis, daily log) is directly inspired by this.
+- **[DuckDB](https://duckdb.org/)** — the embedded analytical database that makes full-text search over flat files viable without a server, index sync, or persistent storage. The decision to use in-memory FTS instead of a vector database was a deliberate trade-off for simplicity.
+- **[Obsidian](https://obsidian.md/)** — the local-first, markdown-native note-taking tool that treats your files as the truth. DuckBrain exists because Obsidian vaults deserve tooling that respects the filesystem.
+- **[MemSearch](https://github.com/zilliztech/memsearch)** and **[Open Brain (OB1)](https://github.com/NateBJones-Projects/OB1)** — early experiments in cross-tool agent memory that demonstrated the *need* for structured vault write-back while choosing different architectures. Their strengths and gaps directly informed DuckBrain's design.
+- **[Agent Memory Systems (6-level taxonomy)](https://www.youtube.com/watch?v=UHVFcUzAGlM)** — Simon Scrapes' comprehensive comparison of Claude Code memory approaches provided the framework for understanding where DuckBrain fits in the ecosystem (Level 6: cross-tool MCP with dedicated server).
+- **[trellis-datamodel](https://github.com/timhiebenthal/trellis-datamodel)** — the same author's data modeling tool whose CI/CD patterns were borrowed for this project's repository readiness.
+- **[mondayDB 3 — Solving HTAP for a Trillion-Table System](https://engineering.monday.com/mondaydb-3-solving-htap-for-a-trillion-table-system/)** — monday.com's engineering blog on their DuckDB-powered CQRS read serving layer at production scale. Proved that DuckDB in-process with per-tenant file isolation is a viable architecture — the same pattern DuckBrain applies at personal-wiki scale.
+The core decision — **build, don't integrate** — came from a [structured comparison](https://github.com/timhiebenthal/duckbrain/blob/main/specs/2026-05-28-duckdb-memory-mcp/spec.md) of 7 existing tools. All failed on one requirement: vault schema-aware write-back. Rather than fork or extend, DuckBrain started from first principles: what's the simplest thing that gives agents structured read/write access to an Obsidian vault? The answer was DuckDB + MCP + ~500 lines of Python.
+## License
+MIT