duckbrain 0.1.1__tar.gz

@@ -0,0 +1,438 @@
1
+ Metadata-Version: 2.4
2
+ Name: duckbrain
3
+ Version: 0.1.1
4
+ Summary: DuckDB-backed MCP memory server for Obsidian vaults — structured search, read, and write access for AI coding agents.
5
+ Keywords: mcp,obsidian,memory,knowledge-base,duckdb,ai-agent
6
+ Author: Tim Hiebenthal
7
+ Author-email: Tim Hiebenthal <timhiebenthal@gmail.com>
8
+ License-Expression: MIT
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Classifier: Programming Language :: Python :: 3.13
17
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
18
+ Requires-Dist: duckdb>=1.5.3
19
+ Requires-Dist: mcp[cli]>=1.27.1
20
+ Requires-Dist: python-dotenv>=1.2.2
21
+ Requires-Dist: pyyaml>=6.0.3
22
+ Requires-Python: >=3.10
23
+ Project-URL: Homepage, https://github.com/timhiebenthal/duckbrain
24
+ Project-URL: Repository, https://github.com/timhiebenthal/duckbrain
25
+ Project-URL: Issues, https://github.com/timhiebenthal/duckbrain/issues
26
+ Description-Content-Type: text/markdown
27
+
28
+ # DuckBrain
29
+
30
+ <p align="center">
31
+ <img src="https://raw.githubusercontent.com/timhiebenthal/duckbrain/main/logo/logo_writing_white_bg.png" alt="DuckBrain" width="500" />
32
+ </p>
33
+
34
+ DuckDB-backed MCP memory server for Obsidian vaults. Gives AI coding agents structured read and write access to your personal wiki — with full-text search, frontmatter-aware indexing, and automatic index/log updates. Built on the principle that your vault filesystem should be the single source of truth, not a database hidden behind an API.
35
+
36
+ ## What it solves
37
+
38
+ Existing agent memory tools (MemSearch, Open Brain, Mem0, Supermemory) treat memory as unstructured text blobs. If you maintain a [Karpathy-style LLM wiki](https://x.com/karpathy/status/1889054630119760374) in Obsidian with typed pages (entities, concepts, sources, synthesis), YAML frontmatter, tags, and wikilinks — none of those tools understand your vault's structure.
39
+
40
+ DuckBrain fills that gap. It reads your vault as-is and writes new pages following your vault's schema, so your wiki stays a single source of truth on the filesystem.
41
+
42
+ ## How it works (Architecture)
43
+
44
+ ```
45
+ ┌──────────────────┐ MCP stdio ┌─────────────────────────────────┐
46
+ │ AI Agent │ ◄──────────────► │ DuckBrain MCP Server │
47
+ │ │ │ │
48
+ │ Claude Code │ │ vault_info ──┐ │
49
+ │ OpenCode │ │ vault_search ─┤ DuckDB FTS │
50
+ │ Cursor │ │ vault_read ──┤ Filesystem │
51
+ │ Hermes │ │ vault_write ──┘ Filesystem │
52
+ └──────────────────┘ └────────┬────────┬───────────────┘
53
+ │ │
54
+ query ┌─────────────────────┘ └── read/write ──┐
55
+ (full index) ▼ ▼ (single file)
56
+ ┌──────────────────────┐ ┌───────────────────────────┐
57
+ │ DuckDB (in-memory) │ │ Your Obsidian Vault │
58
+ │ │ │ │
59
+ │ pages (in-memory │ rebuilt from scratch │ wiki/entities/ │
60
+ │ rebuilt every search)│ on every query │ wiki/concepts/ │
61
+ │ ┌───────────────┐ │ │ wiki/sources/ │
62
+ │ │ filepath │ │ │ wiki/synthesis/ │
63
+ │ │ title │ │ │ daily/ │
64
+ │ │ kind │ │ │ wiki/index.md │
65
+ │ │ tags │ │ │ wiki/log.md │
66
+ │ │ body │ │ │ │
67
+ │ │ created │ │ │ plain markdown on disk │
68
+ │ │ updated │ │ │ │
69
+ │ └───────────────┘ │ │ │
70
+ │ │ │ │
71
+ │ BM25 search query: │ │ │
72
+ │ SELECT ... │ │ │
73
+ │ FROM pages p │ │ │
74
+ │ WHERE fts_match_bm25│ │ │
75
+ │ (p.filepath, │ │ │
76
+ │ 'segfault') │ │ │
77
+ │ AND kind='concept' │ │ │
78
+ │ ORDER BY score DESC │ │ │
79
+ └──────────────────────┘ └───────────────────────────┘
80
+ ```
81
+
82
+ - **Reads** your vault files directly — no index to sync, no watchers, no duplicate storage
83
+ - **Searches** via DuckDB full-text search (BM25 ranking), rebuilt fresh from disk on every query
84
+ - **Writes** new pages with correct YAML frontmatter, auto-updating your index and log
85
+
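To make "rebuilt fresh on every query" and "BM25 ranking" concrete, here is a minimal pure-Python sketch. It is a stand-in for illustration only: DuckBrain delegates indexing and scoring to DuckDB's full-text search, and the names `tokenize` and `bm25_search`, the page dict layout, and the `k1`/`b` defaults are assumptions rather than anything taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(pages, query, kind=None, k1=1.2, b=0.75):
    """Rank pages against a query with Okapi BM25.

    `pages` is a list of dicts with 'filepath', 'kind' and 'body' keys
    (a hypothetical layout). All statistics are recomputed on every call,
    mirroring the "no persistent index" idea described above.
    """
    docs = [(page, tokenize(page["body"])) for page in pages]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = (sum(len(tokens) for _, tokens in docs) / n_docs) or 1.0

    # Document frequency: in how many pages does each term occur?
    doc_freq = Counter()
    for _, tokens in docs:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    results = []
    for page, tokens in docs:
        if kind is not None and page["kind"] != kind:
            continue  # optional filter on page type
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        if score > 0:
            results.append((score, page["filepath"]))

    # Highest score first, like ORDER BY score DESC in the diagram.
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

Because nothing is cached between calls, edits made in Obsidian are visible to the next search without any sync step; the cost is that every query pays for a full pass over the vault.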
86
+ ## Requirements
87
+
88
+ - Python 3.10+
89
+ - [uv](https://docs.astral.sh/uv/) (package manager)
90
+ - An Obsidian vault structured with a `wiki/` directory containing:
91
+ - `wiki/entities/` — people, orgs, products, tools
92
+ - `wiki/concepts/` — ideas, frameworks, theories
93
+ - `wiki/sources/` — one summary per ingested source
94
+ - `wiki/synthesis/` — cross-cutting analysis
95
+ - `wiki/index.md` — page catalog with `## Entities`, `## Concepts`, `## Sources`, `## Synthesis` sections
96
+ - `wiki/log.md` — append-only chronological record
97
+ - Pages should use YAML frontmatter: `title`, `item-type`, `tags`, `created`, `updated`
98
+
99
+ This follows the schema defined for [LLM wikis](https://x.com/karpathy/status/1889054630119760374). If your vault uses a different structure, DuckBrain works with it — but index/log updates expect the section headers above.
100
+
101
+ ## Quick Start
102
+
103
+ ```bash
104
+ pip install duckbrain
105
+ ```
106
+
107
+ That's it. Now connect your AI agent (see below). You don't run DuckBrain yourself; the agent spawns it as needed.
108
+
109
+ *(Optional: verify the install by running `duckbrain` — it'll fail with "VAULT_PATH not set", which confirms it's working.)*
110
+
111
+ ### Installing from source (for contributors)
112
+
113
+ ```bash
114
+ git clone https://github.com/timhiebenthal/duckbrain.git
115
+ cd duckbrain
116
+ uv sync # installs project + dev dependencies in a virtual environment
117
+ ```
118
+
119
+ This requires [uv](https://docs.astral.sh/uv/) (the Python package manager used for development). End users should use `pip install duckbrain` above.
120
+
121
+ *(Optional: to verify the install, run `VAULT_PATH="/path/to/your/vault" uv run duckbrain`. It will appear to hang — that's correct, it's waiting on stdio. Press Ctrl+C to stop.)*
122
+
123
+ ## Connecting to Agents
124
+
125
+ MCP stdio transport means the agent spawns DuckBrain as a child process when it starts. You don't need a separate terminal or a running server. Just add this to your MCP config:
126
+
127
+ ```json
128
+ {
129
+ "duckbrain": {
130
+ "command": "uv",
131
+ "args": ["run", "duckbrain"],
132
+ "env": {
133
+ "VAULT_PATH": "/path/to/your/obsidian/vault"
134
+ }
135
+ }
136
+ }
137
+ ```
138
+
139
+ Where to put it:
140
+
141
+ | Agent | Config file | Top-level key |
142
+ |-------|-------------|---------------|
143
+ | Claude Code | `~/.claude.json` (user scope) or `.mcp.json` (project scope) | `mcpServers` |
144
+ | OpenCode | `opencode.json` | `mcp` |
145
+ | Cursor | `.cursor/mcp.json` | `mcpServers` |
146
+ | Hermes Agent | `mcp.json` | `mcpServers` |
147
+
148
+ Example for Claude Code:
149
+ ```json
150
+ {
151
+ "mcpServers": {
152
+ "duckbrain": {
153
+ "command": "uv",
154
+ "args": ["run", "duckbrain"],
155
+ "env": {
156
+ "VAULT_PATH": "/path/to/your/obsidian/vault"
157
+ }
158
+ }
159
+ }
160
+ }
161
+ ```
162
+
163
+ > **Tip:** Instead of hardcoding the path in every config, set `VAULT_PATH` once in your shell profile (`~/.bashrc`, `~/.zshrc`, or `~/.config/fish/config.fish`) and reference it in the config with your agent's env-var syntax:
164
+ >
165
+ > - OpenCode: `"VAULT_PATH": "{env:VAULT_PATH}"`
166
+ > - Claude Code: `"VAULT_PATH": "${VAULT_PATH}"`
167
+
168
+ These configs launch DuckBrain through `uv`, so make sure `uv` is on your `PATH`.
169
+
170
+ ### Auto-Writing Session Learnings
171
+
172
+ There are two ways to make your agent write learnings to the vault: instructions (works everywhere) or hooks (automatic, agent-native).
173
+
174
+ #### Approach 1: Instructions (all agents)
175
+
176
+ Add this to the appropriate instructions file. The agent reads it on startup and follows it during the session. **Tested with OpenCode.**
177
+
178
+ **Claude Code** — add to `CLAUDE.md`:
179
+
180
+ ```markdown
181
+ ## Session Learnings
182
+
183
+ After debugging, diving into rabbit holes, or completing significant work,
184
+ save what you learned so you don't repeat mistakes:
185
+
186
+ - Use vault_write(kind="daily", title="...", content="...", tags=["..."])
187
+ to append to today's daily note.
188
+ - For reusable knowledge, use vault_write(kind="concept", title="...",
189
+ content="...", tags=["..."]) to create a wiki page.
190
+ ```
191
+
192
+ **OpenCode** — add to your config's `instructions` field (`opencode.json`):
193
+
194
+ ```json
195
+ "instructions": ["~/.config/opencode/LEARNINGS.md"]
196
+ ```
197
+
198
+ Then create `~/.config/opencode/LEARNINGS.md` (or wherever you prefer — any path the config can reach):
199
+
200
+ ```markdown
201
+ ## Session Learnings
202
+
203
+ When you encounter problems, debug issues, or discover non-obvious solutions,
204
+ save the learning to the vault so it's available in future sessions:
205
+
206
+ - Append to today's daily note:
207
+ vault_write(kind="daily", title="short summary", content="what you learned", tags=["debugging", "learned"])
208
+
209
+ - For reusable concepts/patterns worth revisiting:
210
+ vault_write(kind="concept", title="Concept Name", content="explanation", tags=["relevant", "tags"])
211
+
212
+ Do this proactively — don't wait to be asked. A learning saved is a bug not repeated.
213
+ ```
214
+
215
+ **Cursor** — add to `.cursorrules`:
216
+
217
+ ```markdown
218
+ ## Session Learnings
219
+
220
+ After debugging or completing work, save learnings via DuckBrain:
221
+ - vault_write(kind="daily", title="<summary>", content="<details>", tags=[])
222
+ - Use kind="concept" for reusable knowledge.
223
+ ```
224
+
225
+ #### Approach 2: Hooks (automatic, no prompt engineering needed)
226
+
227
+ Hooks run shell commands at specific lifecycle points — no instructions needed, they fire deterministically. **⚠️ Not tested with DuckBrain yet.**
228
+
229
+ **Claude Code** — supports a full [hooks system](https://code.claude.com/docs/en/hooks) including `SessionEnd` (fires when a session terminates). Add to `.claude/settings.json`:
230
+
231
+ ```json
232
+ {
233
+ "hooks": {
234
+ "SessionEnd": [
235
+ { "hooks": [
236
+ { "type": "command",
237
+ "command": "duckbrain-save-session --transcript-from-stdin" }
238
+ ] }
239
+ ]
240
+ }
241
+ }
242
+ ```
243
+
244
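+ The `SessionEnd` hook receives a JSON payload on stdin that includes a `transcript_path` field pointing to the session transcript, not the transcript text itself. A wrapper script (such as the `duckbrain-save-session` command in the example) could read that file, pipe it through an LLM to extract learnings, then call `vault_write`. See [`agent-memory-mcp`](https://github.com/ipiton/agent-memory-mcp) for a production example of this pattern.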
245
+
246
+ **Cursor** — supports [hooks](https://cursor.com/docs/hooks.md) including `sessionEnd`, `postToolUse`, and `stop` via `.cursor/hooks.json`. However, `sessionEnd` is **not available in cloud agents** (local IDE only), and MCP execution hooks (`beforeMCPExecution`/`afterMCPExecution`) are **not yet wired for cloud agents**. Usable for local development, not for cloud-based Cursor sessions.
247
+
248
+ **.cursor/hooks.json** (local IDE only):
249
+ ```json
250
+ {
251
+ "hooks": {
252
+ "stop": [
253
+ {
254
+ "type": "command",
255
+ "command": "duckbrain-save-session --reason stop"
256
+ }
257
+ ]
258
+ }
259
+ }
260
+ ```
261
+
262
+ ### How It Works
263
+
264
+ During a session, the agent encounters a problem, debugs it, and resolves it:
265
+
266
+ ```
267
+ > vault_search("duckbrain daily write")
268
+ > vault_read(filepath="wiki/...")
269
+
270
+ Agent debugs, fixes, learns something...
271
+
272
+ > vault_write(
273
+ kind="daily",
274
+ title="vault_write daily kind doesn't support filepath-based reads",
275
+ content="When vault_search returns filepaths, the agent may try to Read files
276
+ directly. vault_read should accept filepath as well as title to close this gap.",
277
+ tags=["duckbrain", "debugging", "learned"]
278
+ )
279
+ ```
280
+
281
+ The learning is now in `daily/2026-05-28.md`. Tomorrow when you ask "how do I read vault pages by path?", the agent searches the vault, finds your note, and recalls the solution.
282
+
283
+ ## Tools
284
+
285
+ ### `vault_info`
286
+
287
+ Get a summary of your vault's structure.
288
+
289
+ ```
290
+ > vault_info()
291
+ → {
292
+ entities: 38,
293
+ concepts: 38,
294
+ sources: 33,
295
+ synthesis: 9,
296
+ available_tags: ["agent-memory", "ai", "duckdb", "mcp", ...],
297
+ last_modified: "2026-05-28"
298
+ }
299
+ ```
300
+
301
+ No parameters. Useful for agents to discover what's in the vault before searching.
302
+
303
+ ### `vault_search`
304
+
305
+ Full-text search over all wiki pages.
306
+
307
+ ```
308
+ > vault_search("agent memory", kind="concept")
309
+ → [
310
+ { title: "Agent Memory Systems", kind: "concept",
311
+ filepath: "wiki/concepts/agent-memory-systems.md",
312
+ snippet: "A 6-level taxonomy of Claude Code memory approaches..." },
313
+ ...
314
+ ]
315
+ ```
316
+
317
+ Parameters:
318
+ - `query` (required) — search text, BM25-ranked
319
+ - `kind` (optional) — filter to `entity`, `concept`, `source`, `synthesis`, or `daily`
320
+ - `tags` (optional) — filter by tag substring matches
321
+
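The `tags` filter is described as a substring match. One plausible reading is sketched below; the helper name `matches_tags`, the case-insensitive comparison, and the choice that every requested string must match are assumptions, since the README does not say whether multiple tags combine with AND or OR.

```python
def matches_tags(page_tags, wanted):
    """Return True if every wanted string is a substring of some page tag.

    Assumption: comparison is case-insensitive and filters combine with AND.
    An empty `wanted` list matches every page.
    """
    lowered = [tag.lower() for tag in page_tags]
    return all(any(w.lower() in tag for tag in lowered) for w in wanted)
```

Under this reading, `tags=["memory"]` would match a page tagged `agent-memory`, which is convenient for agents that only remember part of a tag name.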
322
+ ### `vault_read`
323
+
324
+ Read a page by title or filepath. Returns full markdown content with metadata.
325
+
326
+ ```
327
+ > vault_read(title="Agent Memory Systems")
328
+ → {
329
+ title: "Agent Memory Systems", kind: "concept",
330
+ filepath: "wiki/concepts/agent-memory-systems.md",
331
+ content: "# Agent Memory Systems\n\nA 6-level taxonomy...",
332
+ tags: ["agent-memory", "taxonomy", "ai"],
333
+ created: "2026-05-28", updated: "2026-05-28"
334
+ }
335
+ ```
336
+
337
+ Parameters:
338
+ - `title` (optional) — page title to look up (case-insensitive)
339
+ - `filepath` (optional) — relative path from vault_search results (e.g. `wiki/concepts/foo.md`)
340
+
341
+ Use after `vault_search` to get full page content. Pass `filepath` from search results directly.
342
+
343
+ ### `vault_write`
344
+
345
+ Create a new wiki page or append to today's daily note, with automatic index and log updates.
346
+
347
+ ```
348
+ > vault_write(
349
+ kind="concept",
350
+ title="DuckDB FTS Memory",
351
+ content="# DuckDB FTS Memory\n\nHow DuckDB serves as a memory layer...",
352
+ tags=["agent-memory", "duckdb"]
353
+ )
354
+ → { success: true, filepath: "wiki/concepts/duckdb-fts-memory.md" }
355
+ ```
356
+
357
+ For daily notes (session learnings, debugging logs):
358
+ ```
359
+ > vault_write(
360
+ kind="daily",
361
+ title="Debugging vault_read filepath",
362
+ content="When search returns filepaths, agents try to Read files directly.",
363
+ tags=["duckbrain", "debugging"]
364
+ )
365
+ → { success: true, filepath: "daily/2026-05-28.md" }
366
+ ```
367
+
368
+ For wiki pages (entity|concept|source|synthesis), this automatically:
369
+ 1. Writes the markdown file to the correct wiki subdirectory
370
+ 2. Generates YAML frontmatter with title, item-type, tags, dates
371
+ 3. Appends an entry to `wiki/index.md` in the right section
372
+ 4. Appends a dated entry to `wiki/log.md`
373
+
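Steps 1 and 2 above (choosing the file path and generating frontmatter) can be sketched roughly as follows. This is not the project's code: `slugify`, `render_page`, and the `WIKI_DIRS` mapping are hypothetical names, and the slug rule is inferred only from the example where "DuckDB FTS Memory" becomes `duckdb-fts-memory.md`. The index and log updates (steps 3 and 4) are left out because their entry format is not documented here.

```python
import json
import re
from datetime import date

# Hypothetical mapping from page kind to wiki subdirectory.
WIKI_DIRS = {
    "entity": "entities",
    "concept": "concepts",
    "source": "sources",
    "synthesis": "synthesis",
}


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def render_page(kind, title, content, tags, today=None):
    """Return (relative filepath, full file text) for a new wiki page."""
    today = today or date.today()
    filepath = f"wiki/{WIKI_DIRS[kind]}/{slugify(title)}.md"
    # JSON strings and lists are valid YAML, so json.dumps gives safe quoting
    # for titles containing colons or quotes.
    frontmatter = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"item-type: {kind}",
        f"tags: {json.dumps(list(tags))}",
        f"created: {today.isoformat()}",
        f"updated: {today.isoformat()}",
        "---",
    ])
    return filepath, f"{frontmatter}\n\n{content.rstrip()}\n"
```

Keeping the rendering separate from the file write makes it easy to check the generated frontmatter before anything touches the vault.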
374
+ For daily notes, this automatically:
375
+ 1. Appends to `daily/YYYY-MM-DD.md` (creates the file if today's doesn't exist yet)
376
+ 2. Writes no YAML frontmatter, only a `## heading` followed by the content
377
+ 3. Leaves `wiki/index.md` untouched (daily notes aren't wiki pages)
378
+ 4. Appends a dated entry to `wiki/log.md`
379
+
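The daily-note behaviour (append to `daily/YYYY-MM-DD.md`, creating it if needed, with a `## heading` per entry) might look like the sketch below. The function name `append_daily` and the exact spacing between entries are assumptions; the log update in step 4 is omitted.

```python
from datetime import date
from pathlib import Path


def append_daily(vault, title, content, today=None):
    """Append a '## title' section to today's daily note and return its path."""
    today = today or date.today()
    path = Path(vault) / "daily" / f"{today.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file on first use and appends afterwards.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"## {title}\n\n{content.rstrip()}\n\n")
    return path
```

Opening in append mode means earlier entries from the same day are never rewritten, which matches the append-only character of the daily log.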
380
+ Parameters:
381
+ - `kind` (required) — `entity`, `concept`, `source`, `synthesis`, or `daily`
382
+ - `title` (required) — page title (or section heading for daily entries)
383
+ - `content` (required) — markdown body (without frontmatter)
384
+ - `tags` (required) — list of tag strings
385
+
386
+ ## Vault Path
387
+
388
+ Set via the `VAULT_PATH` environment variable (or the `env` field in your MCP config — no need for both).
389
+
390
+ For local development, copy `.env.example` to `.env` and set your path:
391
+
392
+ ```
393
+ VAULT_PATH=/path/to/your/obsidian/vault
394
+ ```
395
+
396
+ If you use WSL2 with your vault on Windows, set it to the WSL mount path (e.g., `/mnt/c/Users/you/Documents/obsidian/my-vault`).
397
+
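A server-side check for `VAULT_PATH` could be as small as the sketch below. It is illustrative only: `resolve_vault_path` is a hypothetical helper, the directory check is an added assumption, and loading a `.env` file (which the project handles via `python-dotenv`) is deliberately left out so the example stays standard-library only.

```python
import os
from pathlib import Path


def resolve_vault_path(env=None):
    """Read VAULT_PATH from the environment and validate it.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("VAULT_PATH", "").strip()
    if not raw:
        raise RuntimeError("VAULT_PATH not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"VAULT_PATH is not a directory: {path}")
    return path
```

Failing fast at startup surfaces a misconfigured path in the agent's MCP logs instead of as an empty search result later.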
398
+ ## Performance
399
+
400
+ - FTS index rebuilt fresh from disk on every query — ~90 pages in under a second
401
+ - Write operations complete in <500ms
402
+ - Everything is in-memory — no persistent DuckDB database file
403
+ - Zero network calls, zero external services
404
+
405
+ ## Limitations (v1)
406
+
407
+ - No update or delete operations (only creating pages and appending to daily notes)
408
+ - No vector embeddings or semantic search
409
+ - No page deduplication check before writing
410
+ - ~1s per search at current scale; at 500+ pages, incremental indexing would be needed
411
+
412
+ ## Under Consideration
413
+
414
+ Ideas we're exploring but not committing to yet — as we use the tool and understand what matters, some of these may get built. Open an issue to discuss.
415
+
416
+ - **Temporal decay (recency bias)** — boost search results from recently created or updated pages. Older knowledge fades unless explicitly referenced.
417
+ - **Vector embeddings / semantic search** — cover the ~20% recall gap that BM25 can't reach (concepts with different wording). Could integrate MemSearch or local embeddings.
418
+ - **Update and delete operations** — allow agents to edit or remove existing pages, not just create.
419
+ - **Incremental indexing** — INSERT single pages into the FTS index instead of full rebuild, keeping search fast at 500+ pages.
420
+ - **Page deduplication** — detect when a page with the same title already exists before writing.
421
+
422
+ ## Inspirations
423
+
424
+ This project stands on the shoulders of several ideas and tools:
425
+
426
+ - **[Andrej Karpathy's LLM wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)** — the idea that a personal markdown wiki, co-maintained by humans and AI agents, compounds into a persistent knowledge base. The vault schema (entities, concepts, sources, synthesis, daily log) is directly inspired by this.
427
+ - **[DuckDB](https://duckdb.org/)** — the embedded analytical database that makes full-text search over flat files viable without a server, index sync, or persistent storage. The decision to use in-memory FTS instead of a vector database was a deliberate trade-off for simplicity.
428
+ - **[Obsidian](https://obsidian.md/)** — the local-first, markdown-native note-taking tool that treats your files as the truth. DuckBrain exists because Obsidian vaults deserve tooling that respects the filesystem.
429
+ - **[MemSearch](https://github.com/zilliztech/memsearch)** and **[Open Brain (OB1)](https://github.com/NateBJones-Projects/OB1)** — early experiments in cross-tool agent memory that demonstrated the *need* for structured vault write-back while choosing different architectures. Their strengths and gaps directly informed DuckBrain's design.
430
+ - **[Agent Memory Systems (6-level taxonomy)](https://www.youtube.com/watch?v=UHVFcUzAGlM)** — Simon Scrapes' comprehensive comparison of Claude Code memory approaches provided the framework for understanding where DuckBrain fits in the ecosystem (Level 6: cross-tool MCP with dedicated server).
431
+ - **[trellis-datamodel](https://github.com/timhiebenthal/trellis-datamodel)** — the same author's data modeling tool whose CI/CD patterns were borrowed for this project's repository readiness.
432
+ - **[mondayDB 3 — Solving HTAP for a Trillion-Table System](https://engineering.monday.com/mondaydb-3-solving-htap-for-a-trillion-table-system/)** — monday.com's engineering blog on their DuckDB-powered CQRS read serving layer at production scale. Proved that DuckDB in-process with per-tenant file isolation is a viable architecture — the same pattern DuckBrain applies at personal-wiki scale.
433
+
434
+ The core decision — **build, don't integrate** — came from a [structured comparison](https://github.com/timhiebenthal/duckbrain/blob/main/specs/2026-05-28-duckdb-memory-mcp/spec.md) of 7 existing tools. All failed on one requirement: vault schema-aware write-back. Rather than fork or extend, DuckBrain started from first principles: what's the simplest thing that gives agents structured read/write access to an Obsidian vault? The answer was DuckDB + MCP + ~500 lines of Python.
435
+
436
+ ## License
437
+
438
+ MIT