membot 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.claude/skills/membot.md ADDED
@@ -0,0 +1,137 @@
1
+ ---
2
+ name: membot
3
+ description: Persistent, versioned context store for AI agents — ingest, search, read, and write knowledge via the membot CLI or MCP server
4
+ trigger: when the user wants to remember, recall, or search project knowledge, ingest documents into a long-lived store, or surface relevant context for a task
5
+ ---
6
+
7
+ # membot — Persistent Context for Agents
8
+
9
+ You have access to a long-lived context store via `membot`. Files (markdown, PDFs, DOCX, HTML, URLs, agent notes) are ingested, converted to markdown, chunked, embedded locally, and indexed in DuckDB with hybrid search (semantic + BM25). Every artifact is addressed by a virtual `logical_path`. Every change creates a new immutable version — nothing is overwritten in place.
10
+
11
+ Use this workflow:
12
+
13
+ ## 1. Discover what's already there
14
+
15
+ Before ingesting, check whether the knowledge already exists.
16
+
17
+ ```bash
18
+ membot tree # synthesised directory tree of logical_paths
19
+ membot ls # one row per current file (size, mime, refresh status)
20
+ membot ls docs/ # filter by prefix
21
+ membot search "<question>" # hybrid search (semantic + keyword)
22
+ ```
23
+
24
+ `search` is the primary discovery tool — prefer it over scanning files.
25
+
26
+ ## 2. Ingest
27
+
28
+ ```bash
29
+ membot add ./README.md # single file
30
+ membot add ./docs # recursive directory walk
31
+ membot add "docs/**/*.md" # glob
32
+ membot add https://example.com/spec.pdf # URL (auto-converted to markdown)
33
+ membot add "inline:Decision: use X because Y" # literal text
34
+ membot add ./docs --refresh-frequency 24h # auto-refresh every day
35
+ ```
36
+
37
+ Each entry becomes a new version under its own `logical_path`. PDFs/DOCX/HTML are converted to markdown; images get vision captions; original bytes are kept and reachable via `membot read --bytes`.
38
+
39
+ ## 3. Read
40
+
41
+ ```bash
42
+ membot read <logical_path> # current markdown surrogate
43
+ membot read <logical_path> --bytes # original bytes (base64) — PDF/DOCX/image as ingested
44
+ membot read <logical_path> --version <ts> # historical snapshot
45
+ membot info <logical_path> # metadata only (no content)
46
+ membot versions <logical_path> # every version, newest first
47
+ membot diff <logical_path> --a <ts> [--b <ts>] # unified diff between versions
48
+ ```
49
+
50
+ Defaults to the current (non-tombstoned) version. Pass `--version` only when you need history.
51
+
52
+ ## 4. Write your own notes
53
+
54
+ Persist agent-authored summaries, decisions, or synthesised context so they survive across conversations:
55
+
56
+ ```bash
57
+ membot write notes/decision-2026-05.md --content "Decided to ..."
58
+ ```
59
+
60
+ Inline writes create a new `(logical_path, version_id)` row just like file ingests — `membot versions` lists them, `membot diff` compares them. To mirror an external doc that should re-fetch over time, use `membot add <url> --refresh-frequency` instead.
61
+
62
+ ## 5. Refresh, rename, delete, prune
63
+
64
+ ```bash
65
+ membot refresh <logical_path> # re-read source; new version only if bytes changed
66
+ membot refresh # refresh all rows whose schedule has elapsed
67
+ membot mv old/path new/path # rename (history preserved under both)
68
+ membot rm <logical_path> # tombstone (history still queryable)
69
+ membot prune --before <iso-ts> # drop non-current versions older than cutoff (irreversible)
70
+ ```
71
+
72
+ Tombstones hide a path from `ls` / `tree` / `search` but `versions` and `read --version <ts>` still work. Pruning is the only way to actually remove data.
73
+
74
+ ## Versioning rules
75
+
76
+ - Defaults always operate on the current, non-tombstoned version.
77
+ - Pass an explicit `--version <timestamp>` (from `membot versions`) to read or diff history.
78
+ - `add` (when source bytes have changed), refresh-with-changes, `write`, and `mv` each create a new version; the previous version is preserved. Re-running `add` against an unchanged source is a no-op (status `unchanged`, same `version_id`); pass `--force` (or `force=true` via the MCP tool) to create a new version.
79
+ - Mutating an existing version is not possible — corrections are new versions.
80
+
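The rules above can be pictured as a tiny append-only store. This is an illustrative model only — membot's actual schema and version-id format are not shown here:

```python
from itertools import count

store = []        # rows: (logical_path, version_id, content, tombstoned) — never mutated
_vid = count(1)   # stand-in for timestamp version ids

def current(path):
    rows = [r for r in store if r[0] == path]
    if not rows:
        return None
    latest = max(rows, key=lambda r: r[1])
    return None if latest[3] else latest   # a tombstone hides the path

def write(path, content):
    cur = current(path)
    if cur and cur[2] == content:
        return cur[1]                      # unchanged source -> no-op, same version_id
    vid = next(_vid)
    store.append((path, vid, content, False))
    return vid

def rm(path):
    store.append((path, next(_vid), "", True))  # tombstone row; history remains

v1 = write("notes/a.md", "draft")
v2 = write("notes/a.md", "final")              # new version; v1 preserved
assert v1 != v2
assert write("notes/a.md", "final") == v2      # re-run is a no-op
rm("notes/a.md")
assert current("notes/a.md") is None           # hidden from default reads
assert len([r for r in store if r[0] == "notes/a.md"]) == 3  # history intact
```

Corrections follow the same shape: a new `write` with fixed content becomes the current version, and the flawed version stays queryable in history.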
81
+ ## When to use this skill
82
+
83
+ - The user asks to remember, recall, save, or look up something across conversations.
84
+ - You need project-specific context (specs, decisions, transcripts, rendered docs) that's too large to fit in the prompt.
85
+ - You need to ingest a document (PDF, DOCX, HTML, URL) and reason over it.
86
+ - You're producing a summary or decision that should survive past this conversation.
87
+
88
+ ## When NOT to use this skill
89
+
90
+ - Reading a file the user just pointed at — use the regular file-read tool unless they want it persisted.
91
+ - Storing secrets, credentials, or anything that shouldn't sit in `~/.membot/index.duckdb`.
92
+ - Quick scratch state for the current turn — keep that in the conversation.
93
+
94
+ ## MCP server
95
+
96
+ `membot serve` exposes the same operations as MCP tools (`membot_add`, `membot_search`, etc.) over stdio (default) or HTTP (`--http <port>`). When connected, prefer the MCP tools over shelling out — they return structured `outputSchema` data with `version_id` echoed on every read.
97
+
98
+ ## Available commands
99
+
100
+ | Command | Purpose |
101
+ | ------------------------------------- | ------------------------------------------------------------------------------ |
102
+ | `membot add <source>` | Ingest file, directory, glob, URL, or `inline:<text>`. Skips unchanged sources; pass `--force` to re-ingest |
103
+ | `membot ls [prefix]` | List current files (size, mime, refresh status) |
104
+ | `membot tree [prefix]` | Render the synthesised logical-path tree |
105
+ | `membot read <path>` | Read current markdown surrogate (or `--bytes` for original) |
106
+ | `membot write <path> --content <txt>` | Write inline agent-authored markdown as a new version |
107
+ | `membot search <query>` | Hybrid search (semantic + BM25); add `--include-history` to search older versions |
108
+ | `membot info <path>` | Inspect metadata (source, fetcher, refresh schedule, digests) without content |
109
+ | `membot versions <path>` | List every version newest-first with version_id and change notes |
110
+ | `membot diff <path> --a <ts>` | Unified diff between two versions |
111
+ | `membot mv <old> <new>` | Rename a logical_path (history preserved) |
112
+ | `membot rm <path>` | Tombstone a logical_path (history still queryable) |
113
+ | `membot refresh [path]` | Re-read source; create new version only if bytes changed |
114
+ | `membot prune --before <ts>` | Permanently drop non-current versions older than cutoff (irreversible) |
115
+ | `membot serve` | Start MCP server (stdio default, `--http <port>` for HTTP) |
116
+ | `membot reindex` | Rebuild the FTS keyword index over current chunks |
117
+
118
+ ## Output formats
119
+
120
+ - TTY → spinners, colors, tables. `--no-color` disables ANSI.
121
+ - Piped, `--json`, `CI=true`, or `NO_COLOR` → JSON to stdout, structured logs to stderr, no ANSI bytes.
122
+ - Use `--json` when parsing output programmatically (it's automatic when piped, but explicit is safer).
123
+ - Use `--verbose` if a command fails unexpectedly.
124
+
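When consuming `--json` output from a script, parse stdout and ignore stderr (structured logs go there). The payload shape below is a hypothetical sketch — the field names are assumptions, so inspect real output before depending on them:

```python
import json

# Hypothetical `membot search --json` payload; field names are illustrative only.
raw = '{"results": [{"logical_path": "docs/refresh.md", "score": 0.82}, {"logical_path": "notes/a.md", "score": 0.41}]}'

hits = json.loads(raw)["results"]
best = max(hits, key=lambda h: h["score"])   # pick the top-scoring hit
print(best["logical_path"])                   # -> docs/refresh.md
```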
125
+ ## Troubleshooting
126
+
127
+ - **"ingest failed: unsupported mime"** → Add a converter or pass `--bytes` to keep the original; the LLM fallback only runs when `ANTHROPIC_API_KEY` is set.
128
+ - **"refresh failed: auth"** → The original fetch used an authenticated mcpx tool; re-auth via `mcpx auth <server>`.
129
+ - **Search returns nothing** → Confirm the file ingested with `membot info <path>`; if needed, run `membot reindex` to rebuild the FTS keyword index.
130
+ - **Stale results after manual DB edits** → `membot reindex`.
131
+ - **Two paths point at the same content** → `membot mv` doesn't merge; tombstone one with `membot rm`.
132
+
133
+ ## Configuration
134
+
135
+ - Data lives in `~/.membot/index.duckdb` (override via `MEMBOT_HOME`).
136
+ - Optional `ANTHROPIC_API_KEY` enables LLM fallback for messy/binary input. Without it, conversion degrades to deterministic native output.
137
+ - Config file: `~/.membot/config.json` (see `membot --help` for the global flags).
package/.cursor/rules/membot.mdc ADDED
@@ -0,0 +1,137 @@
1
+ ---
2
+ description: Persistent, versioned context store for AI agents — ingest, search, read, and write knowledge via the membot CLI or MCP server
3
+ globs:
4
+ alwaysApply: true
5
+ ---
6
+
7
+ # membot — Persistent Context for Agents
8
+
9
+ You have access to a long-lived context store via `membot`. Files (markdown, PDFs, DOCX, HTML, URLs, agent notes) are ingested, converted to markdown, chunked, embedded locally, and indexed in DuckDB with hybrid search (semantic + BM25). Every artifact is addressed by a virtual `logical_path`. Every change creates a new immutable version — nothing is overwritten in place.
10
+
11
+ Use this workflow:
12
+
13
+ ## 1. Discover what's already there
14
+
15
+ Before ingesting, check whether the knowledge already exists.
16
+
17
+ ```bash
18
+ membot tree # synthesised directory tree of logical_paths
19
+ membot ls # one row per current file (size, mime, refresh status)
20
+ membot ls docs/ # filter by prefix
21
+ membot search "<question>" # hybrid search (semantic + keyword)
22
+ ```
23
+
24
+ `search` is the primary discovery tool — prefer it over scanning files.
25
+
26
+ ## 2. Ingest
27
+
28
+ ```bash
29
+ membot add ./README.md # single file
30
+ membot add ./docs # recursive directory walk
31
+ membot add "docs/**/*.md" # glob
32
+ membot add https://example.com/spec.pdf # URL (auto-converted to markdown)
33
+ membot add "inline:Decision: use X because Y" # literal text
34
+ membot add ./docs --refresh-frequency 24h # auto-refresh every day
35
+ ```
36
+
37
+ Each entry becomes a new version under its own `logical_path`. PDFs/DOCX/HTML are converted to markdown; images get vision captions; original bytes are kept and reachable via `membot read --bytes`.
38
+
39
+ ## 3. Read
40
+
41
+ ```bash
42
+ membot read <logical_path> # current markdown surrogate
43
+ membot read <logical_path> --bytes # original bytes (base64) — PDF/DOCX/image as ingested
44
+ membot read <logical_path> --version <ts> # historical snapshot
45
+ membot info <logical_path> # metadata only (no content)
46
+ membot versions <logical_path> # every version, newest first
47
+ membot diff <logical_path> --a <ts> [--b <ts>] # unified diff between versions
48
+ ```
49
+
50
+ Defaults to the current (non-tombstoned) version. Pass `--version` only when you need history.
51
+
52
+ ## 4. Write your own notes
53
+
54
+ Persist agent-authored summaries, decisions, or synthesised context so they survive across conversations:
55
+
56
+ ```bash
57
+ membot write notes/decision-2026-05.md --content "Decided to ..."
58
+ ```
59
+
60
+ Inline writes create a new `(logical_path, version_id)` row just like file ingests — `membot versions` lists them, `membot diff` compares them. To mirror an external doc that should re-fetch over time, use `membot add <url> --refresh-frequency` instead.
61
+
62
+ ## 5. Refresh, rename, delete, prune
63
+
64
+ ```bash
65
+ membot refresh <logical_path> # re-read source; new version only if bytes changed
66
+ membot refresh # refresh all rows whose schedule has elapsed
67
+ membot mv old/path new/path # rename (history preserved under both)
68
+ membot rm <logical_path> # tombstone (history still queryable)
69
+ membot prune --before <iso-ts> # drop non-current versions older than cutoff (irreversible)
70
+ ```
71
+
72
+ Tombstones hide a path from `ls` / `tree` / `search` but `versions` and `read --version <ts>` still work. Pruning is the only way to actually remove data.
73
+
74
+ ## Versioning rules
75
+
76
+ - Defaults always operate on the current, non-tombstoned version.
77
+ - Pass an explicit `--version <timestamp>` (from `membot versions`) to read or diff history.
78
+ - `add` (when source bytes have changed), refresh-with-changes, `write`, and `mv` each create a new version; the previous version is preserved. Re-running `add` against an unchanged source is a no-op (status `unchanged`, same `version_id`); pass `--force` (or `force=true` via the MCP tool) to create a new version.
79
+ - Mutating an existing version is not possible — corrections are new versions.
80
+
81
+ ## When to use this rule
82
+
83
+ - The user asks to remember, recall, save, or look up something across conversations.
84
+ - You need project-specific context (specs, decisions, transcripts, rendered docs) that's too large to fit in the prompt.
85
+ - You need to ingest a document (PDF, DOCX, HTML, URL) and reason over it.
86
+ - You're producing a summary or decision that should survive past this conversation.
87
+
88
+ ## When NOT to use this rule
89
+
90
+ - Reading a file the user just pointed at — use the regular file-read tool unless they want it persisted.
91
+ - Storing secrets, credentials, or anything that shouldn't sit in `~/.membot/index.duckdb`.
92
+ - Quick scratch state for the current turn — keep that in the conversation.
93
+
94
+ ## MCP server
95
+
96
+ `membot serve` exposes the same operations as MCP tools (`membot_add`, `membot_search`, etc.) over stdio (default) or HTTP (`--http <port>`). When connected, prefer the MCP tools over shelling out — they return structured `outputSchema` data with `version_id` echoed on every read.
97
+
98
+ ## Available commands
99
+
100
+ | Command | Purpose |
101
+ | ------------------------------------- | ------------------------------------------------------------------------------ |
102
+ | `membot add <source>` | Ingest file, directory, glob, URL, or `inline:<text>`. Skips unchanged sources; pass `--force` to re-ingest |
103
+ | `membot ls [prefix]` | List current files (size, mime, refresh status) |
104
+ | `membot tree [prefix]` | Render the synthesised logical-path tree |
105
+ | `membot read <path>` | Read current markdown surrogate (or `--bytes` for original) |
106
+ | `membot write <path> --content <txt>` | Write inline agent-authored markdown as a new version |
107
+ | `membot search <query>` | Hybrid search (semantic + BM25); add `--include-history` to search older versions |
108
+ | `membot info <path>` | Inspect metadata (source, fetcher, refresh schedule, digests) without content |
109
+ | `membot versions <path>` | List every version newest-first with version_id and change notes |
110
+ | `membot diff <path> --a <ts>` | Unified diff between two versions |
111
+ | `membot mv <old> <new>` | Rename a logical_path (history preserved) |
112
+ | `membot rm <path>` | Tombstone a logical_path (history still queryable) |
113
+ | `membot refresh [path]` | Re-read source; create new version only if bytes changed |
114
+ | `membot prune --before <ts>` | Permanently drop non-current versions older than cutoff (irreversible) |
115
+ | `membot serve` | Start MCP server (stdio default, `--http <port>` for HTTP) |
116
+ | `membot reindex` | Rebuild the FTS keyword index over current chunks |
117
+
118
+ ## Output formats
119
+
120
+ - TTY → spinners, colors, tables. `--no-color` disables ANSI.
121
+ - Piped, `--json`, `CI=true`, or `NO_COLOR` → JSON to stdout, structured logs to stderr, no ANSI bytes.
122
+ - Use `--json` when parsing output programmatically (it's automatic when piped, but explicit is safer).
123
+ - Use `--verbose` if a command fails unexpectedly.
124
+
125
+ ## Troubleshooting
126
+
127
+ - **"ingest failed: unsupported mime"** → Add a converter or pass `--bytes` to keep the original; the LLM fallback only runs when `ANTHROPIC_API_KEY` is set.
128
+ - **"refresh failed: auth"** → The original fetch used an authenticated mcpx tool; re-auth via `mcpx auth <server>`.
129
+ - **Search returns nothing** → Confirm the file ingested with `membot info <path>`; if needed, run `membot reindex` to rebuild the FTS keyword index.
130
+ - **Stale results after manual DB edits** → `membot reindex`.
131
+ - **Two paths point at the same content** → `membot mv` doesn't merge; tombstone one with `membot rm`.
132
+
133
+ ## Configuration
134
+
135
+ - Data lives in `~/.membot/index.duckdb` (override via `MEMBOT_HOME`).
136
+ - Optional `ANTHROPIC_API_KEY` enables LLM fallback for messy/binary input. Without it, conversion degrades to deterministic native output.
137
+ - Config file: `~/.membot/config.json` (see `membot --help` for the global flags).
package/README.md ADDED
@@ -0,0 +1,126 @@
1
+ # membot
2
+
3
+ > Versioned context store with hybrid search for AI agents. Stdio + HTTP MCP server and CLI.
4
+
5
+ [![npm](https://img.shields.io/npm/v/membot.svg)](https://www.npmjs.com/package/membot)
6
+ [![license](https://img.shields.io/npm/l/membot.svg)](./LICENSE)
7
+
8
+ `membot` is a single-binary CLI and MCP server that gives AI agents a persistent, versioned, searchable context store. Files (markdown, PDFs, DOCX, HTML, URLs, agent-authored notes) are ingested, converted to markdown, chunked, embedded **locally** with `@huggingface/transformers` (WASM, no cloud calls), and indexed in DuckDB with hybrid search (semantic vector + BM25). Every change creates a new version — nothing is overwritten in place.
9
+
10
+ - **Local everything** — embeddings run on your machine; data lives in `~/.membot/index.duckdb`.
11
+ - **One mental model** — every artifact (markdown, PDF, image, audio) becomes a markdown surrogate that flows through the same chunk → embed → search pipeline.
12
+ - **Append-only versioning** — every ingest, refresh, or write creates a new `(logical_path, version_id)` row. History is queryable; nothing is mutated.
13
+ - **Two surfaces, one source of truth** — every operation is exposed identically as a CLI subcommand and an MCP tool. The agent sees `membot_search`; you see `membot search`.
14
+
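The hybrid ranking can be sketched with reciprocal rank fusion, a common way to merge a vector ranking with a BM25 ranking. membot doesn't document its actual fusion method, so treat RRF here as an assumption, not the real implementation:

```python
def rrf(rankings, k=60):
    # Reciprocal rank fusion: each list contributes 1/(k + rank) per document.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["docs/refresh.md", "docs/search.md", "notes/a.md"]  # vector ranking
keyword  = ["docs/search.md", "README.md", "docs/refresh.md"]   # BM25 ranking
fused = rrf([semantic, keyword])
print(fused[0])  # -> docs/search.md (ranked high by both lists)
```

Documents that appear near the top of both lists outrank documents that dominate only one, which is the behavior hybrid search is after.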
15
+ ## Install
16
+
17
+ ```bash
18
+ bun install -g membot
19
+ # or
20
+ npm install -g membot
21
+ ```
22
+
23
+ This pulls in DuckDB's per-platform native bindings alongside membot. The build externalizes `@duckdb/*` (those `.node` bindings can't be embedded by `bun build --compile`), so a global npm/bun install is the supported path.
24
+
25
+ ## Quick start
26
+
27
+ ```bash
28
+ membot add ./docs # ingest a directory recursively
29
+ membot add https://example.com/spec.pdf # ingest a URL (auto-converted to markdown)
30
+ membot ls # list current files
31
+ membot search "how does refresh work?" # hybrid search
32
+ membot read docs/refresh.md # read the markdown surrogate
33
+ membot serve # expose the same operations as MCP tools (stdio)
34
+ ```
35
+
36
+ ## Use with Claude Code or Cursor
37
+
38
+ `membot skill install` drops the agent skill into the right place so that Claude Code or Cursor knows **when** to call `membot`.
39
+
40
+ ```bash
41
+ membot skill install --claude # writes ./.claude/skills/membot.md (project)
42
+ membot skill install --cursor # writes ./.cursor/rules/membot.mdc (project)
43
+ membot skill install --claude --global # writes ~/.claude/skills/membot.md
44
+ membot skill install --claude --cursor -f # both, overwrite if present
45
+ ```
46
+
47
+ The skill files describe the discover → ingest → search → read → write workflow and the versioning rules. You can re-run with `--force` to refresh after upgrading membot.
48
+
49
+ ## Commands
50
+
51
+ | Command | Description |
52
+ | ------------------------------- | --------------------------------------------------------------------------------- |
53
+ | `membot add <source>` | Ingest a file, directory, glob, URL, or `inline:<text>`. No-op on unchanged source bytes; pass `--force` to re-ingest |
54
+ | `membot ls [prefix]` | List current files (size, mime, refresh status) |
55
+ | `membot tree [prefix]` | Render the synthesised logical-path tree |
56
+ | `membot read <path>` | Read the markdown surrogate (or `--bytes` for original bytes, base64) |
57
+ | `membot search <query>` | Hybrid search (semantic + BM25); `--include-history` searches older versions |
58
+ | `membot info <path>` | Inspect metadata (source, fetcher, schedule, digests) without content |
59
+ | `membot versions <path>` | List every version newest-first |
60
+ | `membot diff <path> --a <ts> [--b <ts>]` | Unified diff between two versions |
61
+ | `membot write <path>` | Write inline agent-authored markdown as a new version |
62
+ | `membot mv <from> <to>` | Rename a logical_path (history preserved under both) |
63
+ | `membot rm <path>` | Tombstone a logical_path (history still queryable) |
64
+ | `membot refresh [path]` | Re-read source; new version only if bytes changed |
65
+ | `membot prune --before <ts>` | Permanently drop non-current versions older than cutoff (irreversible) |
66
+ | `membot serve` | Run the MCP server (stdio default; `--http <port>` for HTTP) |
67
+ | `membot reindex` | Rebuild the FTS keyword index over current chunks |
68
+ | `membot mcpx <subcommand>` | Forward to the bundled `mcpx` CLI for managing remote MCP servers |
69
+ | `membot skill install` | Install the Claude Code / Cursor agent skill |
70
+
71
+ Run `membot <command> --help` for full flags and arguments. Every command produces JSON when piped, when `--json` is set, or when `CI=true`.
72
+
73
+ ## MCP server
74
+
75
+ `membot serve` exposes every operation as an MCP tool. Stdio is the default; pass `--http <port>` for streamable HTTP.
76
+
77
+ **Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
78
+
79
+ ```json
80
+ {
81
+ "mcpServers": {
82
+ "membot": {
83
+ "command": "membot",
84
+ "args": ["serve"]
85
+ }
86
+ }
87
+ }
88
+ ```
89
+
90
+ **Streamable HTTP** (any MCP client that speaks HTTP):
91
+
92
+ ```bash
93
+ membot serve --http 3000
94
+ # tool endpoint: http://localhost:3000/mcp
95
+ ```
96
+
97
+ Add `--watch` (and optional `--tick <sec>`) to also run the refresh daemon, which re-reads any file whose `refresh_frequency` has elapsed.
98
+
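For MCP clients configured via JSON, an HTTP server entry typically looks like the fragment below. The exact key names vary by client, so this is a sketch rather than a definitive config:

```json
{
  "mcpServers": {
    "membot": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```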
99
+ ## Configuration
100
+
101
+ - **Data directory:** `~/.membot/` (override with `MEMBOT_HOME=/path` or `--config <path>`).
102
+ - `~/.membot/index.duckdb` — all content, blobs, chunks, embeddings, and metadata.
103
+ - `~/.membot/models/` — cached embedding model weights (`Xenova/bge-small-en-v1.5`, 384-dim).
104
+ - `~/.membot/logs/` — daemon logs when running `serve --watch`.
105
+ - **Config file:** `~/.membot/config.json` (optional; defaults are sane).
106
+ - **Environment variables:**
107
+ - `ANTHROPIC_API_KEY` — optional. Enables LLM fallback for messy / scanned input (vision captions for images, last-resort markdown conversion). Without it, the pipeline degrades to deterministic native conversion.
108
+ - `MEMBOT_HOME` — override the data directory.
109
+ - `NO_COLOR`, `CI`, `FORCE_COLOR` — standard output controls.
110
+
111
+ ## Development
112
+
113
+ ```bash
114
+ bun install
115
+ bun run dev <args> # run from source
116
+ bun test # full test suite (real ephemeral DuckDB per test)
117
+ bun run lint # biome + tsc
118
+ bun run format # biome --write
119
+ bun run build # compile a standalone binary into dist/membot
120
+ ```
121
+
122
+ Architecture, design constraints, and reference projects are documented in [`docs/plan.md`](./docs/plan.md) and [`CLAUDE.md`](./CLAUDE.md).
123
+
124
+ ## License
125
+
126
+ MIT © Evan Tahler
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "membot",
3
- "version": "0.1.0",
3
+ "version": "0.1.2",
4
4
  "description": "Versioned context store with hybrid search for AI agents. Stdio + HTTP MCP server and CLI.",
5
5
  "type": "module",
6
6
  "exports": {
@@ -16,6 +16,8 @@
16
16
  "src",
17
17
  "patches",
18
18
  "scripts",
19
+ ".claude",
20
+ ".cursor",
19
21
  "README.md",
20
22
  "LICENSE"
21
23
  ],
@@ -24,7 +26,7 @@
24
26
  "test": "bun test",
25
27
  "lint": "biome ci . && tsc --noEmit",
26
28
  "format": "biome check --write .",
27
- "prebuild": "bash scripts/apply-transformers-patch.sh",
29
+ "prebuild": "bash scripts/apply-patches.sh",
28
30
  "build": "bun build --compile --minify --sourcemap --external '@duckdb/*' ./src/cli.ts --outfile dist/membot"
29
31
  },
30
32
  "keywords": [
@@ -39,7 +41,7 @@
39
41
  "bun"
40
42
  ],
41
43
  "license": "MIT",
42
- "author": "Evan Tahler <evan@arcade.dev>",
44
+ "author": "Evan Tahler <evan@evantahler.com>",
43
45
  "repository": {
44
46
  "type": "git",
45
47
  "url": "https://github.com/evantahler/membot.git"
package/patches/@evantahler%2Fmcpx@0.21.4.patch ADDED
@@ -0,0 +1,44 @@
1
+ diff --git a/src/search/onnx-wasm-paths.ts b/src/search/onnx-wasm-paths.ts
2
+ --- a/src/search/onnx-wasm-paths.ts
3
+ +++ b/src/search/onnx-wasm-paths.ts
4
+ @@ -1,31 +1,9 @@
5
+ -// Embed the onnxruntime-web WASM runtime files into the compiled binary
6
+ -// (`bun build --compile`) so they survive in a single-binary distribution
7
+ -// where the user has no node_modules.
8
+ -//
9
+ -// This file is loaded **dynamically** by semantic.ts. The relative paths
10
+ -// only resolve in the local repo / compiled binary; for npm/bun-installed
11
+ -// mcpx the parent directory layout is different (deps are hoisted), the
12
+ -// dynamic import throws, and we fall back to letting transformers.js
13
+ -// load WASM via its default mechanism — which works fine because in
14
+ -// that environment node_modules exists and onnxruntime-web is reachable
15
+ -// through normal module resolution.
16
+ -
17
+ -// The relative `../../node_modules/...` paths only resolve from the local repo
18
+ -// layout (and inside `bun build --compile`). When this file is shipped via npm,
19
+ -// deps are hoisted, so consumer `tsc` runs hit TS2307. The `ts-ignore` directive
20
+ -// below silences that for consumers; we avoid the stricter `expect-error` form
21
+ -// because in the local repo the path resolves fine and there would be no error
22
+ -// to expect. At runtime the dynamic import in semantic.ts is wrapped in
23
+ -// try/catch and falls back to transformers.js's default WASM loader (issue #85).
24
+ -// biome-ignore lint/suspicious/noTsIgnore: must stay as ts-ignore per comment above
25
+ -// @ts-ignore - dynamic-only import
26
+ -import wasmMjsPath from "../../node_modules/onnxruntime-web/dist/ort-wasm-simd-threaded.asyncify.mjs" with {
27
+ - type: "file",
28
+ -};
29
+ -// biome-ignore lint/suspicious/noTsIgnore: must stay as ts-ignore per comment above
30
+ -// @ts-ignore - dynamic-only import
31
+ -import wasmBinPath from "../../node_modules/onnxruntime-web/dist/ort-wasm-simd-threaded.asyncify.wasm" with {
32
+ - type: "file",
33
+ -};
34
+ -
35
+ -export { wasmBinPath, wasmMjsPath };
36
+ +// PATCHED (membot): upstream mcpx ships static `with { type: "file" }` imports
37
+ +// of onnxruntime-web WASM assets via `../../node_modules/...`, which only
38
+ +// resolves when mcpx is built standalone. When consumed as an npm dep those
39
+ +// paths are unreachable and `bun build --compile` fails at build time. membot
40
+ +// never invokes mcpx's semantic search (only `mcpx.exec()` for URL fetching),
41
+ +// so we stub the exports — semantic.ts wraps the dynamic import in try/catch
42
+ +// and falls back to transformers.js's default WASM loader.
43
+ +export const wasmMjsPath = "";
44
+ +export const wasmBinPath = "";
package/scripts/apply-patches.sh ADDED
@@ -0,0 +1,49 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ # Apply node_modules patches imperatively. We don't use package.json's
5
+ # `patchedDependencies` field because that field, when present in a published
6
+ # package, breaks `bun install` from a tarball.
7
+ #
8
+ # Each patch is gated by a marker file inside its target so reruns are no-ops.
9
+
10
+ apply_patch() {
11
+ local patch="$1" target="$2" marker_name="$3"
12
+ local marker="$target/$marker_name"
13
+
14
+ if [ ! -d "$target" ]; then
15
+ echo "error: $target not found — run \`bun install\` first" >&2
16
+ exit 1
17
+ fi
18
+ if [ ! -f "$patch" ]; then
19
+ echo "error: $patch not found" >&2
20
+ exit 1
21
+ fi
22
+ if [ -f "$marker" ]; then
23
+ echo "patch $patch already applied — skipping"
24
+ return 0
25
+ fi
26
+
27
+ echo "Applying $patch to $target..."
28
+ git apply --directory="$target" "$patch"
29
+ touch "$marker"
30
+ }
31
+
32
+ # @huggingface/transformers — replace static `import 'onnxruntime-node'` with a
33
+ # stub so `bun build --compile` produces a binary using the WASM backend
34
+ # (onnxruntime-web) instead of onnxruntime-node, whose native bindings can't be
35
+ # bundled into a single-binary distribution.
36
+ apply_patch \
37
+ "patches/@huggingface%2Ftransformers@4.2.0.patch" \
38
+ "node_modules/@huggingface/transformers" \
39
+ ".membot-transformers-patch-applied"
40
+
41
+ # @evantahler/mcpx — stub `src/search/onnx-wasm-paths.ts` whose static
42
+ # `with { type: "file" }` imports use a relative path that only resolves in
43
+ # mcpx's own repo layout. When mcpx is consumed as an npm dep those paths are
44
+ # unreachable and `bun build --compile` fails at build time. membot never
45
+ # invokes mcpx's semantic search, so the stubbed exports are safe.
46
+ apply_patch \
47
+ "patches/@evantahler%2Fmcpx@0.21.4.patch" \
48
+ "node_modules/@evantahler/mcpx" \
49
+ ".membot-mcpx-patch-applied"
package/src/cli.ts CHANGED
@@ -7,6 +7,7 @@ import { registerCheckUpdateCommand } from "./commands/check-update.ts";
7
7
  import { registerMcpxCommand } from "./commands/mcpx.ts";
8
8
  import { registerReindexCommand } from "./commands/reindex.ts";
9
9
  import { registerServeCommand } from "./commands/serve.ts";
10
+ import { registerSkillCommand } from "./commands/skill.ts";
10
11
  import { registerUpgradeCommand } from "./commands/upgrade.ts";
11
12
  import type { BuildContextOptions } from "./context.ts";
12
13
  import { mountAsCommanderCommand } from "./mount/commander.ts";
@@ -57,6 +58,7 @@ for (const op of OPERATIONS) {
57
58
  registerServeCommand(program);
58
59
  registerReindexCommand(program);
59
60
  registerMcpxCommand(program);
61
+ registerSkillCommand(program);
60
62
  registerCheckUpdateCommand(program);
61
63
  registerUpgradeCommand(program);
62
64
 
package/src/commands/skill.ts ADDED
@@ -0,0 +1,131 @@
1
+ import { existsSync, mkdirSync, writeFileSync } from "node:fs";
2
+ import { homedir } from "node:os";
3
+ import { join, resolve } from "node:path";
4
+ import type { Command } from "commander";
5
+ import claudeSkill from "../../.claude/skills/membot.md" with { type: "text" };
6
+ import cursorRule from "../../.cursor/rules/membot.mdc" with { type: "text" };
7
+ import { HelpfulError, isHelpfulError, mapKindToExit } from "../errors.ts";
8
+ import { renderCliError } from "../mount/commander.ts";
9
+ import { logger } from "../output/logger.ts";
10
+ import { detectMode, setMode } from "../output/tty.ts";
11
+
12
+ interface SkillTarget {
13
+ agentLabel: string;
14
+ scopeLabel: string;
15
+ dir: string;
16
+ filename: string;
17
+ content: string;
18
+ }
19
+
20
+ interface SkillInstallOptions {
21
+ claude?: boolean;
22
+ cursor?: boolean;
23
+ global?: boolean;
24
+ project?: boolean;
25
+ force?: boolean;
26
+ }
27
+
28
+ /**
29
+ * `membot skill install [--claude] [--cursor] [--global|--project] [-f]`
30
+ *
31
+ * Drop the membot agent skill into the right location for Claude Code
32
+ * (`.claude/skills/membot.md`) or Cursor (`.cursor/rules/membot.mdc`),
33
+ * either in the current project (default) or in the user's home directory
34
+ * (`--global`). Both flags can be combined to install for both targets at
35
+ * once. The skill files are bundled into the binary via Bun text imports
36
+ * so this works in the compiled distribution as well as in `bun run`.
37
+ */
38
+ export function registerSkillCommand(program: Command): void {
39
+ const skill = program.command("skill").description("Install agent skills (Claude Code, Cursor)");
40
+
41
+ skill
42
+ .command("install")
43
+ .description(
44
+ "Install the membot skill into Claude Code (.claude/skills/membot.md) and/or Cursor (.cursor/rules/membot.mdc)",
45
+ )
46
+ .option("--claude", "install for Claude Code")
47
+ .option("--cursor", "install for Cursor")
48
+ .option("--global", "install to the user's home directory (default: project)")
49
+ .option("--project", "install to the current working directory (default)")
50
+ .option("-f, --force", "overwrite if the skill file already exists")
51
+ .action((opts: SkillInstallOptions) => {
52
+ const globalOpts = program.optsWithGlobals<{ json?: boolean; verbose?: boolean; color?: boolean }>();
53
+ setMode(
54
+ detectMode({
55
+ json: globalOpts.json,
56
+ verbose: globalOpts.verbose,
57
+ noColor: globalOpts.color === false,
58
+ }),
59
+ );
60
+ try {
61
+ install(opts);
62
+ } catch (err) {
63
+ renderCliError(err);
64
+ process.exit(isHelpfulError(err) ? mapKindToExit(err.kind) : 1);
65
+ }
66
+ });
67
+ }
68
+
69
+ /**
70
+ * Resolve and write every requested skill file. Throws `HelpfulError` on
71
+ * any input or conflict failure so the mount-style error renderer can
72
+ * surface a uniform JSON / colorized message.
73
+ */
74
+ function install(opts: SkillInstallOptions): void {
75
+ if (!opts.claude && !opts.cursor) {
76
+ throw new HelpfulError({
77
+ kind: "input_error",
78
+ message: "no agent target specified",
79
+ hint: "Pass --claude, --cursor, or both — e.g. `membot skill install --claude`",
80
+ });
81
+ }
82
+
83
+ const targets = computeTargets(opts);
84
+ for (const target of targets) {
85
+ const dest = join(target.dir, target.filename);
86
+ if (existsSync(dest) && !opts.force) {
87
+ throw new HelpfulError({
88
+ kind: "conflict",
89
+ message: `${dest} already exists`,
90
+ hint: "Re-run with --force to overwrite",
91
+ });
92
+ }
93
+ mkdirSync(target.dir, { recursive: true });
94
+ writeFileSync(dest, target.content, "utf-8");
95
+ logger.info(`installed ${target.agentLabel} skill (${target.scopeLabel}): ${dest}`);
96
+ }
97
+ }
98
+
99
+ /**
100
+ * Materialise the (agent × scope) cartesian product of install targets the
101
+ * user asked for. Default scope is project when neither --global nor
102
+ * --project is passed; passing both installs to both locations.
103
+ */
104
+ function computeTargets(opts: SkillInstallOptions): SkillTarget[] {
105
+ const scopes: { label: string; resolveDir: (rel: string) => string }[] = [];
106
+ if (opts.global) scopes.push({ label: "global", resolveDir: (rel) => join(homedir(), rel) });
107
+ if (opts.project || !opts.global) scopes.push({ label: "project", resolveDir: (rel) => resolve(rel) });
108
+
109
+ const targets: SkillTarget[] = [];
110
+ for (const scope of scopes) {
111
+ if (opts.claude) {
112
+ targets.push({
113
+ agentLabel: "Claude Code",
114
+ scopeLabel: scope.label,
115
+ dir: scope.resolveDir(".claude/skills"),
116
+ filename: "membot.md",
117
+ content: claudeSkill,
118
+ });
119
+ }
120
+ if (opts.cursor) {
121
+ targets.push({
122
+ agentLabel: "Cursor",
123
+ scopeLabel: scope.label,
124
+ dir: scope.resolveDir(".cursor/rules"),
125
+ filename: "membot.mdc",
126
+ content: cursorRule,
127
+ });
128
+ }
129
+ }
130
+ return targets;
131
+ }
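The agent × scope expansion in `computeTargets` can be sketched independently of the filesystem — a minimal model (hypothetical `expandTargets`, plain string prefixes instead of resolved paths) showing which files each flag combination produces:

```typescript
// Sketch of the agent × scope expansion behind `membot skill install`.
// Paths are plain strings here; the real command resolves them through
// node:path and the user's home directory.
interface Opts {
  claude?: boolean;
  cursor?: boolean;
  global?: boolean;
  project?: boolean;
}

function expandTargets(opts: Opts): string[] {
  const scopes: { label: string; prefix: string }[] = [];
  // --global adds the home-directory scope; project is the default and
  // is kept unless --global alone was passed.
  if (opts.global) scopes.push({ label: "global", prefix: "~" });
  if (opts.project || !opts.global) scopes.push({ label: "project", prefix: "." });

  const out: string[] = [];
  for (const scope of scopes) {
    if (opts.claude) out.push(`${scope.prefix}/.claude/skills/membot.md`);
    if (opts.cursor) out.push(`${scope.prefix}/.cursor/rules/membot.mdc`);
  }
  return out;
}
```

Passing neither scope flag yields only the project-relative files; passing both `--global` and `--project` doubles each agent's target.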
@@ -31,6 +31,15 @@ function isModelCached(model: string): boolean {
31
31
  * Lazily load (and cache) the feature-extraction pipeline for a model. Loading
32
32
  * is expensive (downloads weights on first run, ~100s of ms to instantiate
33
33
  * ONNX), so we hold one promise per model name for the life of the process.
34
+ *
35
+ * Try `wasm` first, fall back to `cpu` on "Unsupported device". The transformers
36
+ * patch (applied for `bun build --compile` and via `bun run prebuild` for local
37
+ * dev) registers `wasm` as a supported device backed by onnxruntime-web — that's
38
+ * mandatory for the single-binary build because native bindings can't be
39
+ * bundled. When the package is unpatched (npm-installed membot, or `bun dev`
40
+ * before `prebuild`), `wasm` is rejected and we fall back to the default `cpu`
41
+ * device, which uses the onnxruntime-node native bindings that ship with the
42
+ * unpatched package.
34
43
  */
35
44
  async function getPipeline(model: string): Promise<FeatureExtractionPipeline> {
36
45
  let p = pipelinePromises.get(model);
@@ -40,9 +49,15 @@ async function getPipeline(model: string): Promise<FeatureExtractionPipeline> {
40
49
  } else {
41
50
  logger.info(`embedder: loading model ${model} (first run, downloading weights)`);
42
51
  }
43
- // device: "wasm" matches what our transformers patch supports — the
44
- // default ("cpu") errors out because the patch removes onnxruntime-node.
45
- p = pipeline("feature-extraction", model, { device: "wasm" }) as Promise<FeatureExtractionPipeline>;
52
+ p = (async () => {
53
+ try {
54
+ return (await pipeline("feature-extraction", model, { device: "wasm" })) as FeatureExtractionPipeline;
55
+ } catch (err) {
56
+ if (!String((err as Error)?.message ?? "").includes("Unsupported device")) throw err;
57
+ logger.debug("embedder: wasm backend unavailable, falling back to cpu (onnxruntime-node)");
58
+ return (await pipeline("feature-extraction", model, { device: "cpu" })) as FeatureExtractionPipeline;
59
+ }
60
+ })();
46
61
  pipelinePromises.set(model, p);
47
62
  }
48
63
  return p;
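The try-wasm-then-cpu logic above is an instance of a general load-with-fallback pattern. A minimal sketch, with a hypothetical `loadWithFallback` helper standing in for the inline IIFE and `load` standing in for transformers' `pipeline()`:

```typescript
// Try the preferred backend first; only when it fails with a
// recognizable "Unsupported device" error, retry with the fallback.
// Any other error propagates unchanged.
async function loadWithFallback<T>(
  load: (device: string) => Promise<T>,
  preferred: string,
  fallback: string,
): Promise<T> {
  try {
    return await load(preferred);
  } catch (err) {
    if (!String((err as Error)?.message ?? "").includes("Unsupported device")) throw err;
    return load(fallback);
  }
}
```

Matching on the error message rather than an error class keeps the check working across the patched and unpatched builds of the dependency, at the cost of being sensitive to upstream wording changes.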
@@ -21,13 +21,14 @@ export interface IngestInput {
21
21
  refresh_frequency?: string;
22
22
  fetcher_hint?: string;
23
23
  change_note?: string;
24
+ force?: boolean;
24
25
  }
25
26
 
26
27
  export interface IngestEntryResult {
27
28
  source_path: string;
28
29
  logical_path: string;
29
30
  version_id: string | null;
30
- status: "ok" | "failed";
31
+ status: "ok" | "unchanged" | "failed";
31
32
  error?: string;
32
33
  mime_type: string | null;
33
34
  size_bytes: number;
@@ -39,6 +40,7 @@ export interface IngestResult {
39
40
  ingested: IngestEntryResult[];
40
41
  total: number;
41
42
  ok: number;
43
+ unchanged: number;
42
44
  failed: number;
43
45
  }
44
46
 
@@ -57,14 +59,15 @@ export async function ingest(input: IngestInput, ctx: AppContext): Promise<Inges
57
59
  });
58
60
 
59
61
  const refreshSec = parseDuration(input.refresh_frequency);
62
+ const force = input.force === true;
60
63
 
61
64
  if (resolved.kind === "inline") {
62
65
  return ingestInline(resolved.text, input, ctx, refreshSec);
63
66
  }
64
67
  if (resolved.kind === "url") {
65
- return ingestUrl(resolved.url, input, ctx, refreshSec);
68
+ return ingestUrl(resolved.url, input, ctx, refreshSec, force);
66
69
  }
67
- return ingestLocalFiles(resolved, input, ctx, refreshSec);
70
+ return ingestLocalFiles(resolved, input, ctx, refreshSec, force);
68
71
  }
69
72
 
70
73
  /** Ingest a single inline blob (source_type='inline'). */
@@ -119,6 +122,7 @@ async function ingestUrl(
119
122
  input: IngestInput,
120
123
  ctx: AppContext,
121
124
  refreshSec: number | null,
125
+ force: boolean,
122
126
  ): Promise<IngestResult> {
123
127
  const mcpxAdapter = ctx.mcpx
124
128
  ? {
@@ -151,6 +155,15 @@ async function ingestUrl(
151
155
  result.fetcher = fetched.fetcher;
152
156
  result.source_sha256 = fetched.sha256;
153
157
 
158
+ if (!force) {
159
+ const cur = await getCurrent(ctx.db, logicalPath);
160
+ if (cur && cur.source_sha256 === fetched.sha256) {
161
+ result.status = "unchanged";
162
+ result.version_id = cur.version_id;
163
+ return summarize([result]);
164
+ }
165
+ }
166
+
154
167
  const versionId = await pipelineForBytes(ctx, {
155
168
  logicalPath,
156
169
  bytes: fetched.bytes,
@@ -181,6 +194,7 @@ async function ingestLocalFiles(
181
194
  input: IngestInput,
182
195
  ctx: AppContext,
183
196
  refreshSec: number | null,
197
+ force: boolean,
184
198
  ): Promise<IngestResult> {
185
199
  if (resolved.entries.length === 0) {
186
200
  throw new HelpfulError({
@@ -213,6 +227,16 @@ async function ingestLocalFiles(
213
227
  result.size_bytes = local.sizeBytes;
214
228
  result.source_sha256 = local.sha256;
215
229
 
230
+ if (!force) {
231
+ const cur = await getCurrent(ctx.db, logicalPath);
232
+ if (cur && cur.source_sha256 === local.sha256) {
233
+ result.status = "unchanged";
234
+ result.version_id = cur.version_id;
235
+ results.push(result);
236
+ continue;
237
+ }
238
+ }
239
+
216
240
  const versionId = await pipelineForBytes(ctx, {
217
241
  logicalPath,
218
242
  bytes: local.bytes,
@@ -236,7 +260,10 @@ async function ingestLocalFiles(
236
260
  }
237
261
  results.push(result);
238
262
  }
239
- ctx.progress.done(`ingested ${results.filter((r) => r.status === "ok").length}/${results.length}`);
263
+ const okCount = results.filter((r) => r.status === "ok").length;
264
+ const unchangedCount = results.filter((r) => r.status === "unchanged").length;
265
+ const suffix = unchangedCount > 0 ? ` (${unchangedCount} unchanged)` : "";
266
+ ctx.progress.done(`ingested ${okCount}/${results.length}${suffix}`);
240
267
 
241
268
  return summarize(results);
242
269
  }
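The unchanged-skip added in both ingest paths boils down to one decision: hash the incoming bytes and compare against the current version's recorded digest. A self-contained sketch (hypothetical `decideIngest`, with `CurrentVersion` as a simplified stand-in for the DB row returned by `getCurrent`):

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for the current-version row.
interface CurrentVersion {
  version_id: string;
  source_sha256: string;
}

// Hash the source bytes and skip the (expensive) convert/chunk/embed
// pipeline when the digest matches the current version, unless forced.
function decideIngest(
  bytes: Uint8Array,
  current: CurrentVersion | null,
  force = false,
): { status: "ok" | "unchanged"; sha256: string } {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  if (!force && current && current.source_sha256 === sha256) {
    return { status: "unchanged", sha256 };
  }
  return { status: "ok", sha256 };
}
```

Comparing content hashes rather than mtimes makes the skip robust to `touch`-style metadata changes and to re-downloads of identical URL content.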
@@ -428,12 +455,14 @@ export function parseDuration(input: string | null | undefined): number | null {
428
455
  /** Roll a list of per-entry results into the top-level summary shape. */
429
456
  function summarize(entries: IngestEntryResult[]): IngestResult {
430
457
  let ok = 0;
458
+ let unchanged = 0;
431
459
  let failed = 0;
432
460
  for (const e of entries) {
433
461
  if (e.status === "ok") ok += 1;
462
+ else if (e.status === "unchanged") unchanged += 1;
434
463
  else failed += 1;
435
464
  }
436
- return { ingested: entries, total: entries.length, ok, failed };
465
+ return { ingested: entries, total: entries.length, ok, unchanged, failed };
437
466
  }
438
467
 
439
468
  function errorMessage(err: unknown): string {
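The `summarize` change above is a single-pass three-way rollup. A stripped-down sketch over bare status strings (hypothetical `summarizeStatuses`; the real function carries the full per-entry results):

```typescript
type Status = "ok" | "unchanged" | "failed";

// One pass over per-entry statuses, counting each bucket for the
// top-level summary shape.
function summarizeStatuses(statuses: Status[]): {
  total: number;
  ok: number;
  unchanged: number;
  failed: number;
} {
  let ok = 0;
  let unchanged = 0;
  let failed = 0;
  for (const s of statuses) {
    if (s === "ok") ok += 1;
    else if (s === "unchanged") unchanged += 1;
    else failed += 1;
  }
  return { total: statuses.length, ok, unchanged, failed };
}
```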
@@ -43,10 +43,12 @@ export async function resolveSource(source: string, options: ResolveOptions = {}
43
43
  }
44
44
 
45
45
  const followSymlinks = options.followSymlinks !== false;
46
- const includeMatchers = (options.include ?? "**/*")
47
- .split(",")
48
- .map((g) => g.trim())
49
- .filter(Boolean);
46
+ const userIncludes = options.include
47
+ ? options.include
48
+ .split(",")
49
+ .map((g) => g.trim())
50
+ .filter(Boolean)
51
+ : [];
50
52
  const excludeMatchers = [
51
53
  ...DEFAULT_EXCLUDES,
52
54
  ...(options.exclude ?? "")
@@ -57,9 +59,14 @@ export async function resolveSource(source: string, options: ResolveOptions = {}
57
59
 
58
60
  if (isGlob(source)) {
59
61
  const base = globBase(source);
62
+ const remainder = globRemainder(source);
60
63
  try {
61
64
  const realBase = await realpath(base);
62
- return walk(realBase, [source, ...includeMatchers], excludeMatchers, followSymlinks);
65
+ // Source glob acts as a hard filter; user includes (if any) further
66
+ // narrow the result via AND. Pass them as a separate matcher so the
67
+ // two sets aren't picomatch-OR'd together.
68
+ const extraIncludes = userIncludes.length > 0 ? [userIncludes] : [];
69
+ return walk(realBase, [remainder], excludeMatchers, followSymlinks, extraIncludes);
63
70
  } catch (err) {
64
71
  throw asHelpful(
65
72
  err,
@@ -93,7 +100,8 @@ export async function resolveSource(source: string, options: ResolveOptions = {}
93
100
 
94
101
  if (st.isDirectory()) {
95
102
  const realBase = await realpath(abs);
96
- return walk(realBase, includeMatchers, excludeMatchers, followSymlinks);
103
+ const dirIncludes = userIncludes.length > 0 ? userIncludes : ["**/*"];
104
+ return walk(realBase, dirIncludes, excludeMatchers, followSymlinks);
97
105
  }
98
106
 
99
107
  throw new HelpfulError({
@@ -120,22 +128,40 @@ export function globBase(glob: string): string {
120
128
  return base.length === 0 || !isAbsolute(base) ? resolve(base || ".") : base;
121
129
  }
122
130
 
131
+ /**
132
+ * Take the wildcard portion of a glob — everything from the first segment
133
+ * containing a wildcard onward. We strip the static prefix so the matcher
134
+ * runs against entry paths relative to `globBase`. Without this, a glob like
135
+ * `docs/star-star/star.md` never matches anything under base=`docs/`, since
136
+ * walk() exposes `sub/file.md` to picomatch, not `docs/sub/file.md`.
137
+ */
138
+ export function globRemainder(glob: string): string {
139
+ const parts = glob.split(sep);
140
+ const wildcardIdx = parts.findIndex((p) => /[*?[\]{}!]/.test(p));
141
+ if (wildcardIdx === -1) return glob;
142
+ return parts.slice(wildcardIdx).join(sep);
143
+ }
144
+
123
145
  /**
124
146
  * Recursively walk `base`, returning files matched by `includes` and not
125
147
  * matched by `excludes`. Both globsets match against the entry's path
126
148
  * relative to `base`. Symlinks are followed when `followSymlinks` is true,
127
- * with cycles detected via a realpath cache.
149
+ * with cycles detected via a realpath cache. `extraIncludeSets` is a list
150
+ * of additional include groups, each ANDed onto the primary `includes` —
151
+ * use it when two filters must both match (e.g. source glob + --include).
128
152
  */
129
153
  async function walk(
130
154
  base: string,
131
155
  includes: string[],
132
156
  excludes: string[],
133
157
  followSymlinks: boolean,
158
+ extraIncludeSets: string[][] = [],
134
159
  ): Promise<ResolvedSource> {
135
160
  const seen = new Set<string>();
136
161
  const entries: ResolvedLocalEntry[] = [];
137
162
 
138
163
  const isInclude = picomatch(includes, { dot: false, nocase: false });
164
+ const extraMatchers = extraIncludeSets.map((set) => picomatch(set, { dot: false, nocase: false }));
139
165
  const isExclude = excludes.length ? picomatch(excludes, { dot: false }) : null;
140
166
 
141
167
  const queue: string[] = [base];
@@ -174,6 +200,7 @@ async function walk(
174
200
  const relForMatch = rel.length === 0 ? (cur.split(sep).pop() ?? cur) : rel;
175
201
  if (isExclude?.(relForMatch)) continue;
176
202
  if (!isInclude(relForMatch)) continue;
203
+ if (extraMatchers.some((m) => !m(relForMatch))) continue;
177
204
  entries.push({ absPath: real, relPath: relForMatch });
178
205
  }
179
206
 
@@ -14,11 +14,16 @@ export const addOperation = defineOperation({
14
14
  - a glob pattern (e.g. "docs/**/*.md")
15
15
  - a URL (fetched via mcpx if configured, otherwise plain HTTP)
16
16
  - "inline:<text>" literal
17
- PDF, DOCX, HTML, images, and other binaries are converted to markdown — native libraries first, vision/OCR for images, LLM fallback for messy or scanned input. Original bytes are kept in the blobs table; \`membot_read bytes=true\` returns them. Setting \`refresh_frequency\` enables automatic refresh from the daemon. Each ingested file becomes a NEW version under its own logical_path; existing versions stay queryable via membot_versions. Directory/glob ingests stream one file at a time — partial failures do not abort the rest; the response lists per-entry status.`,
17
+ PDF, DOCX, HTML, images, and other binaries are converted to markdown — native libraries first, vision/OCR for images, LLM fallback for messy or scanned input. Original bytes are kept in the blobs table; \`membot_read bytes=true\` returns them. Setting \`refresh_frequency\` enables automatic refresh from the daemon. By default, re-ingesting an unchanged source (same source_sha256 as the current version) is a no-op and reports \`status: "unchanged"\`; pass \`force=true\` to always create a new version. Each newly-ingested file becomes a new version under its own logical_path; existing versions stay queryable via membot_versions. Directory/glob ingests stream one file at a time — partial failures do not abort the rest; the response lists per-entry status.`,
18
18
  inputSchema: z.object({
19
19
  source: z.string().describe("Local path, directory, glob, URL, or `inline:<text>` literal"),
20
20
  logical_path: z.string().optional().describe("Destination logical_path (single source) or prefix (directory/glob)"),
21
- include: z.string().optional().describe("Glob include filter (comma-separated for multiple); default `**/*`"),
21
+ include: z
22
+ .string()
23
+ .optional()
24
+ .describe(
25
+ "Glob include filter (comma-separated for multiple). Defaults to `**/*` for directory sources, or the source pattern itself when source is a glob.",
26
+ ),
22
27
  exclude: z.string().optional().describe("Glob exclude filter (comma-separated for multiple)"),
23
28
  follow_symlinks: z
24
29
  .boolean()
@@ -30,6 +35,10 @@ PDF, DOCX, HTML, images, and other binaries are converted to markdown — native
30
35
  .optional()
31
36
  .describe("Free-form hint passed to mcpx tool search (e.g. 'firecrawl', 'github', 'google docs', 'http')"),
32
37
  change_note: z.string().optional().describe("Free-text note attached to the new version"),
38
+ force: z
39
+ .boolean()
40
+ .optional()
41
+ .describe("Re-ingest even when source bytes are unchanged. Default skips and reports `unchanged`."),
33
42
  }),
34
43
  outputSchema: z.object({
35
44
  ingested: z.array(
@@ -37,7 +46,7 @@ PDF, DOCX, HTML, images, and other binaries are converted to markdown — native
37
46
  source_path: z.string(),
38
47
  logical_path: z.string(),
39
48
  version_id: z.string().nullable(),
40
- status: z.enum(["ok", "failed"]),
49
+ status: z.enum(["ok", "unchanged", "failed"]),
41
50
  error: z.string().optional(),
42
51
  mime_type: z.string().nullable(),
43
52
  size_bytes: z.number(),
@@ -47,23 +56,27 @@ PDF, DOCX, HTML, images, and other binaries are converted to markdown — native
47
56
  ),
48
57
  total: z.number(),
49
58
  ok: z.number(),
59
+ unchanged: z.number(),
50
60
  failed: z.number(),
51
61
  }),
52
62
  cli: {
53
63
  positional: ["source"],
54
- aliases: { logical_path: "-p", refresh_frequency: "-r", change_note: "-m" },
64
+ aliases: { logical_path: "-p", refresh_frequency: "-r", change_note: "-m", force: "-f" },
55
65
  },
56
66
  console_formatter: (result) => {
57
67
  const lines = result.ingested.map((e) => {
58
68
  if (e.status === "ok") {
59
69
  return `${colors.green("✓")} ${colors.cyan(e.logical_path)} ${colors.dim(`(${e.fetcher}, ${e.size_bytes}B)`)}`;
60
70
  }
71
+ if (e.status === "unchanged") {
72
+ return `${colors.dim("≡")} ${colors.cyan(e.logical_path)} ${colors.dim("(unchanged)")}`;
73
+ }
61
74
  return `${colors.red("✗")} ${e.source_path} ${colors.dim(e.error ?? "")}`;
62
75
  });
63
- const summary = result.failed
64
- ? `${colors.green(`added ${result.ok}`)}, ${colors.red(`failed ${result.failed}`)}`
65
- : colors.green(`added ${result.ok}`);
66
- return `${lines.join("\n")}\n${summary}`;
76
+ const parts: string[] = [colors.green(`added ${result.ok}`)];
77
+ if (result.unchanged > 0) parts.push(colors.dim(`unchanged ${result.unchanged}`));
78
+ if (result.failed > 0) parts.push(colors.red(`failed ${result.failed}`));
79
+ return `${lines.join("\n")}\n${parts.join(", ")}`;
67
80
  },
68
81
  handler: async (input, ctx) => ingest(input, ctx),
69
82
  });
@@ -0,0 +1,9 @@
1
+ declare module "*.md" {
2
+ const content: string;
3
+ export default content;
4
+ }
5
+
6
+ declare module "*.mdc" {
7
+ const content: string;
8
+ export default content;
9
+ }
@@ -1,35 +0,0 @@
1
- #!/usr/bin/env bash
2
- set -euo pipefail
3
-
4
- # Apply the @huggingface/transformers patch to node_modules so that
5
- # `bun build --compile` produces a binary using the WASM backend
6
- # (onnxruntime-web) instead of onnxruntime-node, whose native bindings
7
- # can't be bundled into a single-binary distribution.
8
- #
9
- # We apply the patch imperatively (rather than via package.json
10
- # `patchedDependencies`) because that field, when present in a
11
- # published package, breaks `bun install` from a tarball.
12
-
13
- PATCH="patches/@huggingface%2Ftransformers@4.2.0.patch"
14
- TARGET="node_modules/@huggingface/transformers"
15
- MARKER="$TARGET/.membot-transformers-patch-applied"
16
-
17
- if [ ! -d "$TARGET" ]; then
18
- echo "error: $TARGET not found — run \`bun install\` first" >&2
19
- exit 1
20
- fi
21
-
22
- if [ ! -f "$PATCH" ]; then
23
- echo "error: $PATCH not found" >&2
24
- exit 1
25
- fi
26
-
27
- if [ -f "$MARKER" ]; then
28
- echo "transformers patch already applied — skipping"
29
- exit 0
30
- fi
31
-
32
- echo "Applying transformers patch ($PATCH) to $TARGET..."
33
- git apply --directory="$TARGET" "$PATCH"
34
- touch "$MARKER"
35
- echo "Patch applied."