@cerefox/memory 0.4.3 → 0.5.0

package/AGENT_GUIDE.md ADDED
@@ -0,0 +1,462 @@
1
+ # How AI Agents Use Cerefox
2
+
3
+ Reference guide for AI agents interacting with the Cerefox knowledge base.
4
+ Read this before your first interaction. For a minimal quick reference, see `AGENT_QUICK_REFERENCE.md`.
5
+
6
+ ---
7
+
8
+ ## What Cerefox Is
9
+
10
+ Cerefox is a persistent, shared knowledge base that multiple AI agents can read and write.
11
+ Knowledge written by one agent (or a human) is immediately searchable by any other agent.
12
+ It is not a message bus -- it is curated, versioned, searchable memory backed by Postgres + pgvector.
13
+
14
+ ## Two ways to interact with Cerefox
15
+
16
+ You'll be using **one** of these — whichever your user (or the harness) has configured:
17
+
18
+ 1. **MCP tools (default)** — ten named tools (`cerefox_search`, `cerefox_ingest`, …, `cerefox_get_help`) exposed by either a local MCP server (`@cerefox/memory` via npm, or `cerefox mcp` as a soft wrapper) or the remote `cerefox-mcp` Edge Function. Tool names and parameters are documented in **The 10 Tools** below. This is the recommended path for purpose-built agent clients.
19
+ 2. **Shell CLI (Bash tool)** — the same operations exposed as a local `uv run cerefox …` command, invoked via your Bash tool. Used when your user prefers not to install/configure an MCP server. The semantics are identical; only the surface differs. See **Using Cerefox via the CLI** near the bottom of this guide for the MCP-tool → CLI-command mapping and the small list of behavioural differences.
20
+
21
+ If you're not sure which mode you're in: check whether `cerefox_search` shows up in your tool list. If yes, use MCP. If no, use the CLI: the path to the Cerefox checkout is typically recorded in `CLAUDE.md`, `AGENTS.md`, or an equivalent project memory file; ask your user if you can't find it there.
22
+
23
+ The rest of this guide is written around the MCP tool names, since those are stable across both modes. The CLI section maps each tool name to its CLI command.
24
+
25
+ ### Self-help via MCP
26
+
27
+ If you have MCP access and you're uncertain about any convention in this guide, call **`cerefox_get_help`** — it returns the contents of `AGENT_QUICK_REFERENCE.md` (the same conventions, rules, and workflow snippets) as MCP-native text, no file-system reads required.
28
+
29
+ - No arguments → full reference + an index of `## H2` topics.
30
+ - `topic: "tools"` (or any case-insensitive H2 substring) → just that section.
31
+ - `topic: "made-up-name"` → an "unknown topic" message plus the available-topics list.
32
+
33
+ The tool is intentionally MCP-only so an agent that has been dropped into Cerefox without filesystem access (e.g. a remote MCP client) can still bootstrap its own conventions. Treat it as a fallback: this guide and `AGENT_QUICK_REFERENCE.md` are the canonical surface; `cerefox_get_help` is the in-band escape hatch.
34
+
35
+ ---
36
+
37
+ ## The 10 Tools
38
+
39
+ ### cerefox_search
40
+
41
+ Find documents using hybrid search (full-text + semantic vector similarity).
42
+
43
+ | Parameter | Required | Description |
44
+ |-----------|----------|-------------|
45
+ | `query` | Yes | Natural language search query. 3-8 focused keywords work best. |
46
+ | `match_count` | No | Max documents to return (default 5). |
47
+ | `project_name` | No | Filter to a specific project by name. |
48
+ | `metadata_filter` | No | JSON object for filtering by metadata (AND semantics). Example: `{"type": "decision-log"}` |
49
+ | `max_bytes` | No | Response size budget in bytes (default 200000). |
50
+ | `requestor` | No | Your agent name for attribution. Always set this. |
51
+
52
+ **Results format**: Each result shows `## Title [id: <uuid>] (score: X.XXX)` followed by content.
53
+ Save the `document_id` from `[id: ...]` -- you need it for `cerefox_get_document` and `cerefox_ingest` updates.
54
+
55
+ For large documents, results may be partial (`is_partial` flag). Use `cerefox_get_document` with the ID to get the full text.
56
+
57
+ **Rule**: Always search before answering questions about stored knowledge. Always search before ingesting to check for duplicates.
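The result header format described above can be parsed mechanically. A minimal Python sketch, assuming each result header sits on its own line exactly as `## Title [id: <uuid>] (score: X.XXX)`; the names `parse_search_headers` and `Hit`, and the sample text, are illustrative and not part of Cerefox.

```python
import re
from typing import NamedTuple

# Assumed header shape, taken from the "Results format" note:
#   ## Title [id: <uuid>] (score: X.XXX)
HEADER_RE = re.compile(
    r"^## (?P<title>.+?) \[id: (?P<id>[0-9a-fA-F-]{36})\] \(score: (?P<score>[0-9.]+)\)\s*$"
)

class Hit(NamedTuple):
    title: str
    document_id: str
    score: float

def parse_search_headers(text: str) -> list[Hit]:
    """Collect (title, document_id, score) from every result header line."""
    hits = []
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            hits.append(Hit(m["title"], m["id"], float(m["score"])))
    return hits

# Hypothetical search output used only to exercise the parser.
sample = (
    "## OAuth 2.1 Design Document [id: c937b70f-77af-43d3-b9bc-9f31e0d2041d] (score: 0.812)\n"
    "Body text...\n"
)
hits = parse_search_headers(sample)
```

Keeping the parsed `document_id` next to the title makes the later `cerefox_get_document` and ID-based `cerefox_ingest` calls straightforward.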
58
+
59
+ ---
60
+
61
+ ### cerefox_ingest
62
+
63
+ Save a new document or update an existing one.
64
+
65
+ | Parameter | Required | Description |
66
+ |-----------|----------|-------------|
67
+ | `title` | Yes | Descriptive, stable title (e.g., "OAuth 2.1 Design Document", not "doc1"). |
68
+ | `content` | Yes | Markdown content. Use H1/H2/H3 headings -- the chunker uses them for segmentation. |
69
+ | `document_id` | No | UUID of an existing document to update. When provided, updates that document directly regardless of `update_if_exists`. Returns an error if the document does not exist. Workflow: search → note the `[id: ...]` → pass here. |
70
+ | `update_if_exists` | No | When `true`, updates the document with the same title (versions the old content). Default `false`. Ignored when `document_id` is provided. |
71
+ | `project_name` | No | **Single** project name (created if absent). On update: **non-destructive add** — ensures this membership exists, preserves others. See "Project membership semantics" below. |
72
+ | `project_names` | No | **List** of project names (each created if absent). On update: **destructive replace** — sets the document's full project set to exactly this list. Use when you want to set multiple projects at once, or deliberately change the membership list. Wins over `project_name` when both are passed. |
73
+ | `metadata` | No | Arbitrary JSON. Use at minimum: `type` and `status`. |
74
+ | `author` | No | Your agent name for audit attribution. Always set this. |
75
+ | `source` | No | Origin label (default "agent"). |
76
+
77
+ **The update workflow (preferred -- ID-based)**:
78
+ 1. Search for the document. Note the `[id: abc123]` in the result.
79
+ 2. Call `cerefox_ingest` with `document_id: "abc123"` and the new content.
80
+ 3. The old content is automatically versioned and recoverable.
81
+
82
+ **The update workflow (fallback -- title-based)**:
83
+ 1. Search for the document first.
84
+ 2. Call `cerefox_ingest` with the **exact same title** and `update_if_exists: true`.
85
+ 3. If you use a different title, a **new** document is created (the old one remains). This is almost never what you want when revising.
86
+
87
+ **Deduplication**: Content is SHA-256 hashed. Identical content is skipped (no re-indexing). Metadata-only changes update metadata without creating a version.
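To illustrate the deduplication idea, here is a small Python sketch of a hash comparison. It assumes the hash is taken over the UTF-8 bytes of the Markdown content with no normalization; how Cerefox actually prepares the content before hashing is not stated here, and the function names are hypothetical.

```python
import hashlib

def content_hash(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded content (the encoding is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def needs_reindex(stored_hash: str, new_content: str) -> bool:
    """True when the new content differs from what the stored hash describes."""
    return content_hash(new_content) != stored_hash

old = "# Notes\n\nFirst draft.\n"
stored = content_hash(old)
unchanged = needs_reindex(stored, old)            # identical content: skip
changed = needs_reindex(stored, old + "More.\n")  # different content: re-index
```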
88
+
89
+ **What to ingest**: Distilled summaries, decisions with rationale, curated insights. Not raw dumps, logs, or transcripts. Use Markdown headings for structure.
90
+
91
+ #### Project membership semantics
92
+
93
+ This is subtle but important — a document can belong to multiple projects (many-to-many), and an operator may have curated the project list via the web UI. **You must not silently strip their work when updating content.** The rules:
94
+
95
+ | What you pass on update | What happens to memberships |
96
+ |---|---|
97
+ | `project_name: "X"` (singular) | **Non-destructive add.** Ensures the doc is in project X. Other memberships untouched. |
98
+ | `project_names: ["X", "Y"]` (list) | **Destructive replace.** Sets the doc's project set to exactly `{X, Y}`. Other memberships are removed. Use when you want this. |
99
+ | Neither | **No change** to project memberships. |
100
+
101
+ **Rule of thumb**: if you just want to ensure a doc is *associated with* a project, use singular `project_name`. If you want to *change* the project list, use `project_names`. If you don't know which you want, use singular. If you need to change the list without writing content, use the dedicated `cerefox_set_document_projects` tool, which makes the destructive-replace intent explicit.
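The membership rules in the table can be modelled as a small pure function. This is a sketch of the documented semantics only, not Cerefox's implementation; the function name is hypothetical and name matching is simplified to exact string equality.

```python
from typing import Optional

def memberships_after_update(
    current: list[str],
    project_name: Optional[str] = None,
    project_names: Optional[list[str]] = None,
) -> list[str]:
    """Model the update rules from the membership table (not Cerefox's actual code)."""
    if project_names is not None:
        # List form: destructive replace; it wins over project_name.
        return list(project_names)
    if project_name is not None and project_name not in current:
        # Singular form: non-destructive add.
        return current + [project_name]
    # Neither passed (or already a member): no change.
    return list(current)

current = ["Alpha", "Beta"]
added = memberships_after_update(current, project_name="Gamma")
replaced = memberships_after_update(current, project_names=["Gamma"])
untouched = memberships_after_update(current)
both = memberships_after_update(current, project_name="Alpha", project_names=["Delta"])
```

The `replaced` case is the one to watch: passing a one-element list drops every other membership.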
102
+
103
+ ---
104
+
105
+ ### cerefox_get_document
106
+
107
+ Retrieve the complete text of a document by its UUID.
108
+
109
+ | Parameter | Required | Description |
110
+ |-----------|----------|-------------|
111
+ | `document_id` | Yes | UUID from search results `[id: ...]`. |
112
+ | `version_id` | No | UUID of an archived version (from `cerefox_list_versions`). |
113
+ | `requestor` | No | Your agent name. |
114
+
115
+ Use this when search returns partial results, or to read a previous version before restoring it.
116
+
117
+ ---
118
+
119
+ ### cerefox_list_versions
120
+
121
+ Show version history of a document.
122
+
123
+ | Parameter | Required | Description |
124
+ |-----------|----------|-------------|
125
+ | `document_id` | Yes | UUID of the document. |
126
+ | `requestor` | No | Your agent name. |
127
+
128
+ Returns: `version_number`, `version_id`, `source`, `chunk_count`, `total_chars`, `created_at`.
129
+
130
+ **To restore an old version**: retrieve it with `cerefox_get_document(document_id, version_id=<target>)`, then re-ingest with `cerefox_ingest(title=<same>, content=<old>, update_if_exists=true)`.
131
+
132
+ ---
133
+
134
+ ### cerefox_list_metadata_keys
135
+
136
+ Discover which metadata keys are in use across the knowledge base.
137
+
138
+ | Parameter | Required | Description |
139
+ |-----------|----------|-------------|
140
+ | `requestor` | No | Your agent name. |
141
+
142
+ Returns each key with document count and example values. Call this before constructing `metadata_filter` for search.
143
+
144
+ ---
145
+
146
+ ### cerefox_metadata_search
147
+
148
+ Find documents by metadata criteria without a text search query.
149
+
150
+ | Parameter | Required | Description |
151
+ |-----------|----------|-------------|
152
+ | `metadata_filter` | Yes | JSON key-value pairs (AND semantics). Example: `{"type": "decision-log"}` |
153
+ | `project_name` | No | Restrict to a project. |
154
+ | `include_content` | No | Include full text (default false). |
155
+ | `limit` | No | Max results (default 10). |
156
+ | `updated_since` | No | ISO-8601 timestamp. Only docs updated on/after. |
157
+ | `created_since` | No | ISO-8601 timestamp. Only docs created on/after. |
158
+ | `max_bytes` | No | Response size budget when `include_content` is true. |
159
+ | `requestor` | No | Your agent name. |
160
+
161
+ Use for browsing by category, catching up on recent changes (`updated_since`), or finding all documents of a specific type.
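The AND semantics of `metadata_filter` can be pictured with a client-side sketch in Python. The real matching happens on the server and may treat nested values or arrays differently; this sketch assumes flat key-value equality, and the documents shown are made up.

```python
def matches_filter(metadata: dict, metadata_filter: dict) -> bool:
    """AND semantics: every key in the filter must be present with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )

# Hypothetical documents, used only to show which ones pass the filter.
docs = [
    {"title": "A", "metadata": {"type": "decision-log", "status": "active"}},
    {"title": "B", "metadata": {"type": "decision-log", "status": "draft"}},
    {"title": "C", "metadata": {"type": "research", "status": "active"}},
]
flt = {"type": "decision-log", "status": "active"}
selected = [d["title"] for d in docs if matches_filter(d["metadata"], flt)]
```

Adding a key to the filter can only narrow the result set, never widen it.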
162
+
163
+ ---
164
+
165
+ ### cerefox_list_projects
166
+
167
+ List all projects with names, IDs, and descriptions.
168
+
169
+ | Parameter | Required | Description |
170
+ |-----------|----------|-------------|
171
+ | `requestor` | No | Your agent name. |
172
+
173
+ Call once per session to discover available projects before filtering search results by `project_name`.
174
+
175
+ ---
176
+
177
+ ### cerefox_set_document_projects
178
+
179
+ Set the document's project memberships to EXACTLY the given list. **Destructive replace.** Any existing memberships not in the list are removed. Content is untouched. Logged as `update-metadata` in the audit log.
180
+
181
+ | Parameter | Required | Description |
182
+ |-----------|----------|-------------|
183
+ | `document_id` | Yes | UUID of the document. Get from a prior `cerefox_search` result (the `[id: ...]` tag). |
184
+ | `project_names` | Yes | Explicit list of project names. Each created if absent (case-insensitive lookup). Empty list = clear all memberships. Order is preserved. |
185
+ | `author` | No | Your agent name for audit attribution. |
186
+
187
+ **Use cases**:
188
+ - You want to change project membership without rewriting the document body. This tool is faster and clearer than calling `cerefox_ingest` again.
189
+ - You want to add a doc to multiple projects in one call (cleaner than N separate `cerefox_ingest` calls).
190
+ - You want to *remove* a project from a doc's set (use the list of remaining projects without the one to drop).
191
+ - An operator asked you to consolidate or clean up a doc's project list.
192
+
193
+ **Use `cerefox_ingest` with `project_names` instead** if you're updating the content anyway — same destructive-replace semantics, one call instead of two.
194
+
195
+ **Never use this tool to "just ensure X is in the list"** — that's what `cerefox_ingest` with singular `project_name` does, non-destructively. If you call this tool with only one name, you will REMOVE the document from every other project it was in.
196
+
197
+ ---
198
+
199
+ ### cerefox_get_audit_log
200
+
201
+ Query the immutable audit log of all write operations.
202
+
203
+ | Parameter | Required | Description |
204
+ |-----------|----------|-------------|
205
+ | `document_id` | No | Filter by document UUID. |
206
+ | `author` | No | Filter by author name. |
207
+ | `operation` | No | Filter by type: create, update-content, update-metadata, delete, restore. |
208
+ | `since` | No | ISO timestamp lower bound. |
209
+ | `limit` | No | Max entries (default 50, max 200). |
210
+ | `requestor` | No | Your agent name. |
211
+
212
+ ---
213
+
214
+ ### cerefox_get_help
215
+
216
+ Retrieve Cerefox conventions and quick reference content over MCP — the same content as `AGENT_QUICK_REFERENCE.md` in the repo. Designed for agents who lack filesystem access (remote MCP) or just want an in-band refresher.
217
+
218
+ | Parameter | Required | Description |
219
+ |-----------|----------|-------------|
220
+ | `topic` | No | Case-insensitive substring match against `## H2` section titles. Omit to get the full reference plus a section index. |
221
+ | `requestor` | No | Your agent name (recorded with `access_path = "remote-mcp"` or `"local-mcp"`). |
222
+
223
+ **Behaviour:**
224
+ - No `topic` → full quick-reference markdown + an `## Available topics` index.
225
+ - `topic: "tools"` → just the `## Tools` section (no index footer).
226
+ - `topic` matches nothing → `No help topic matched "<topic>"` + available-topics list.
227
+
228
+ Cheap and idempotent. Call it any time you're uncertain about a convention (link forms, project-membership semantics, identity flags, etc.).
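The three behaviours listed above amount to a case-insensitive substring lookup over `## H2` titles. A rough Python sketch follows; the function names are hypothetical, the real tool's output format may differ in detail, and returning the first matching section when several match is an assumption.

```python
import re
from typing import Optional

def split_h2_sections(markdown: str) -> dict[str, str]:
    """Map each '## Title' heading to its section text (heading included)."""
    sections: dict[str, str] = {}
    title = None
    for line in markdown.splitlines():
        m = re.match(r"^## (.+)$", line)
        if m:
            title = m.group(1).strip()
            sections[title] = line
        elif title is not None:
            sections[title] += "\n" + line
    return sections

def get_help(markdown: str, topic: Optional[str] = None) -> str:
    """No topic: full text plus index. Topic: first matching section, else a notice."""
    sections = split_h2_sections(markdown)
    index = "## Available topics\n" + "\n".join(f"- {t}" for t in sections)
    if topic is None:
        return markdown.rstrip() + "\n\n" + index
    for title, body in sections.items():
        if topic.lower() in title.lower():
            return body.rstrip()
    return f'No help topic matched "{topic}"\n\n' + index

# Hypothetical reference text standing in for AGENT_QUICK_REFERENCE.md.
reference = "# Quick Reference\n\n## Tools\nsearch, ingest\n\n## Essential Rules\nSearch first.\n"
tools_section = get_help(reference, "TOOLS")
unknown = get_help(reference, "made-up-name")
full = get_help(reference)
```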
229
+
230
+ ---
231
+
232
+ ## Key Workflows
233
+
234
+ ### Search then update (ID-based -- preferred)
235
+
236
+ ```
237
+ 1. cerefox_search("topic") -- find relevant docs, note [id: uuid]
238
+ 2. cerefox_get_document(id) -- get full text if partial
239
+ 3. cerefox_ingest(title, content, -- update by document ID (deterministic)
240
+ document_id="uuid")
241
+ ```
242
+
243
+ ### Search then update (title-based -- fallback)
244
+
245
+ ```
246
+ 1. cerefox_search("topic") -- find relevant docs
247
+ 2. cerefox_get_document(id) -- get full text if partial
248
+ 3. cerefox_ingest(title, content, -- update with same title
249
+ update_if_exists=true)
250
+ ```
251
+
252
+ ### Save new knowledge
253
+
254
+ ```
255
+ 1. cerefox_search("topic") -- check if it already exists
256
+ 2. If not found: cerefox_ingest(title, content, project_name, metadata)
257
+ 3. If found: cerefox_ingest(same_title, new_content, document_id="uuid")
258
+ ```
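The branch between "not found" and "found" can be expressed as one argument builder. The sketch below, in Python, only assembles the argument object for `cerefox_ingest` using parameter names from the table earlier in this guide; how the call is actually sent depends on your MCP client, and `build_ingest_args` is a hypothetical helper.

```python
from typing import Optional

def build_ingest_args(
    title: str,
    content: str,
    author: str,
    existing_id: Optional[str] = None,
    project_name: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Assemble cerefox_ingest arguments; update by ID when a prior search found the doc."""
    args: dict = {"title": title, "content": content, "author": author}
    if existing_id is not None:
        args["document_id"] = existing_id  # ID-based update (preferred)
    if project_name is not None:
        args["project_name"] = project_name  # singular: non-destructive add on update
    if metadata is not None:
        args["metadata"] = metadata
    return args

new_doc = build_ingest_args(
    "OAuth 2.1 Design Document", "# OAuth\n\nNotes.", "my-agent",
    project_name="Cerefox", metadata={"type": "design-doc", "status": "draft"},
)
update = build_ingest_args(
    "OAuth 2.1 Design Document", "# OAuth\n\nRevised.", "my-agent",
    existing_id="c937b70f-77af-43d3-b9bc-9f31e0d2041d",
)
```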
259
+
260
+ ### Catch up on recent changes
261
+
262
+ ```
263
+ 1. cerefox_metadata_search(metadata_filter={"type": "decision-log"},
264
+ updated_since="2026-03-28T00:00:00Z")
265
+ 2. Review what other agents or the user have written since your last session
266
+ ```
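The `updated_since` value is an ISO-8601 timestamp, so it can be computed rather than typed by hand. A minimal Python sketch, assuming a UTC timestamp with a trailing `Z` is accepted (the form used in the example above); the helper name and the seven-day window are illustrative.

```python
from datetime import datetime, timedelta, timezone

def updated_since_arg(now: datetime, days_back: int) -> str:
    """ISO-8601 UTC timestamp `days_back` days before `now`, with a trailing Z."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days_back)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")

# A fixed "now" keeps the example deterministic; use datetime.now(timezone.utc) in practice.
now = datetime(2026, 4, 4, 0, 0, 0, tzinfo=timezone.utc)
args = {
    "metadata_filter": {"type": "decision-log"},
    "updated_since": updated_since_arg(now, 7),
}
```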
267
+
268
+ ---
269
+
270
+ ## Rules
271
+
272
+ 1. **Always search before ingesting.** Check for existing documents on the topic.
273
+ 2. **Prefer `document_id` for updates** -- pass the UUID from search results to update a specific document. Use `update_if_exists: true` as a fallback when you don't have the ID.
274
+ 3. **Always set `author`/`requestor`** to your agent name for attribution.
275
+ 4. **Use the `document_id` from search results** for `cerefox_get_document`, `cerefox_list_versions`, and targeted `cerefox_ingest` updates.
276
+ 5. **Add metadata**: at minimum `type` (e.g., "research", "decision-log") and `status` ("active", "draft").
277
+ 6. **Write structured Markdown** with H1/H2/H3 headings. The chunker uses heading structure.
278
+ 7. **Distill, don't dump.** Summaries > transcripts. Decisions > discussions. Insights > raw data.
279
+
280
+ ---
281
+
282
+ ## Metadata Conventions
283
+
284
+ | Key | Purpose | Example values |
285
+ |-----|---------|---------------|
286
+ | `type` | Document category | `decision-log`, `design-doc`, `research`, `agent-guide`, `vision-document` |
287
+ | `status` | Lifecycle state | `active`, `draft`, `archived`, `research-complete` |
288
+ | `author` | Creator name | `claude-code`, `archiver`, `user` |
289
+ | `tags` | Topic keywords (JSON array string) | `["architecture", "MCP", "memory"]` |
290
+
291
+ Call `cerefox_list_metadata_keys` for the current list -- conventions evolve.
292
+
293
+ ---
294
+
295
+ ## Writing linkable content
296
+
297
+ Documents you ingest may contain markdown links to other Cerefox documents. The Cerefox web UI intercepts these links at click time and resolves them to the target document. The resolution happens entirely in the browser; the stored markdown is untouched.
298
+
299
+ ### The rule for agents: use document UUIDs
300
+
301
+ **For any cross-reference you author, use the target document's UUID.** Period.
302
+
303
+ ```markdown
304
+ [Opportunity Index](c937b70f-77af-43d3-b9bc-9f31e0d2041d)
305
+ ```
306
+
307
+ UUIDs are the only link form that is fully reliable:
308
+
309
+ - **Stable**: survives title changes. If the target gets renamed, the link still resolves.
310
+ - **Unambiguous**: a UUID matches exactly one document. No "multiple matches" popover, no surprise navigations.
311
+ - **Encoding-safe**: no spaces, no colons, no parentheses, no characters that the markdown parser, URL sanitizer, or HTML attribute layer will trip over.
312
+ - **Discoverable**: every `cerefox_search` result includes `[id: <uuid>]` after the title. Every `cerefox_ingest` response returns the document_id. **Capture and use these IDs.**
313
+
314
+ ### Workflow
315
+
316
+ ```
317
+ 1. cerefox_search "topic" → result includes [id: abc123]
318
+ 2. In your written content, link as: [Topic Name](abc123)
319
+ 3. Done.
320
+ ```
321
+
322
+ If you're writing about a document you haven't searched for yet, search for it first, grab the ID, then write the link. Don't guess by title — searching costs one tool call and gives you the stable link form.
323
+
324
+ ### Other link forms (best-effort, NOT for agent-authored content)
325
+
326
+ The resolver also accepts three other link forms, but **agents should not write them**. They exist primarily for repo-ingested files (where the source markdown naturally uses paths) and as best-effort fallbacks for legacy or human-authored content.
327
+
328
+ | Form | Example | Reliable for agents? |
329
+ |---|---|---|
330
+ | Repo-relative path | `[Quickstart](docs/guides/quickstart.md)` | Only when the target has a `source_path` from repo ingest. Don't construct manually. |
331
+ | Basename only | `[Quickstart](quickstart.md)` | Same — best-effort path fallback. Don't construct manually. |
332
+ | Angle-bracket title | `[Career Coach](<Career Coach: Lisa Nichols>)` | **Fragile**. Breaks on titles containing colons, parentheses, ampersands, brackets, or other punctuation. Web UI's URL sanitizer strips suspicious-looking URLs (e.g. anything before a `:` that looks like a scheme) → link silently navigates to current page. **Never use this form in agent-authored content.** |
333
+
334
+ If you're tempted to write `[Title With Spaces](<Title With Spaces>)` because you don't have the ID, **do an extra `cerefox_search` and use the ID instead**. The one extra tool call is much cheaper than the user encountering a broken link.
335
+
336
+ ### Always set meaningful link text
337
+
338
+ The `[Link Text](target)` syntax has two halves:
339
+
340
+ - **Link text** (`[…]`): what the human reader sees. Use the actual title.
341
+ - **Target** (`(…)`): what the resolver consumes. Always a UUID for agent-authored content.
342
+
343
+ Bad: `[c937b70f-77af-...](c937b70f-77af-...)` — opaque to the reader.
344
+ Good: `[Job Hunting - Opportunity Index](c937b70f-77af-43d3-b9bc-9f31e0d2041d)`.
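One way to make the UUID rule hard to break is to build links through a helper that rejects anything else. A Python sketch, using the standard `uuid` module for validation; `make_doc_link` is a hypothetical helper, and it does not escape special characters in the link text.

```python
import uuid
from typing import Optional

def make_doc_link(link_text: str, document_id: str, anchor: Optional[str] = None) -> str:
    """Return [link_text](uuid) or [link_text](uuid#anchor); reject non-UUID targets."""
    canonical = str(uuid.UUID(document_id))  # raises ValueError if not a UUID
    if not link_text.strip():
        raise ValueError("link text must not be empty")
    target = canonical if anchor is None else f"{canonical}#{anchor}"
    return f"[{link_text}]({target})"

link = make_doc_link(
    "Job Hunting - Opportunity Index", "c937b70f-77af-43d3-b9bc-9f31e0d2041d"
)
try:
    make_doc_link("Career Coach", "Career Coach: Lisa Nichols")
    rejected = False
except ValueError:
    rejected = True  # a title is not a valid target
```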
345
+
346
+ ### What you don't need to do
347
+
348
+ - **You don't need to escape `#` anchors.** `[Section](abc123#configuration)` works — the resolver splits the anchor off and reattaches it to the target document URL.
349
+ - **You don't need to handle external URLs.** Links starting with `http://`, `https://`, `mailto:`, etc. pass through unchanged and open in a new tab.
350
+ - **You don't need to handle absolute SPA paths.** Links starting with `/` (e.g. `/search?q=foo`) pass through to the SPA router unchanged.
351
+ - **You don't need to create relation rows** for these links. The resolver does not populate the relation graph at this stage — that's a separate, future feature. If you want explicit relations between documents, use `cerefox_set_relation` when it ships.
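The pass-through rules above suggest a simple classification of link targets. The Python sketch below is a rough approximation for checking your own output before ingesting; it is not the web UI's resolver, it recognises only the `http://`, `https://`, and `mailto:` schemes, and the category labels are invented for illustration.

```python
import re
import uuid

def classify_target(target: str) -> str:
    """Rough classification of a markdown link target, following the rules listed above."""
    base = target.split("#", 1)[0]  # the anchor is split off before resolution
    if re.match(r"^(https?://|mailto:)", target, re.IGNORECASE):
        return "external"       # passes through unchanged
    if target.startswith("/"):
        return "spa-path"       # handed to the SPA router unchanged
    try:
        uuid.UUID(base)
        return "document-uuid"  # the reliable form for agent-authored links
    except ValueError:
        return "best-effort"    # path or title forms; not for agent-authored content

kinds = [
    classify_target("https://example.com"),
    classify_target("/search?q=foo"),
    classify_target("c937b70f-77af-43d3-b9bc-9f31e0d2041d#configuration"),
    classify_target("quickstart.md"),
]
```

Any target that comes back as `best-effort` in agent-authored content is a candidate for an extra search and a UUID link.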
352
+
353
+ ### A note on agents on Path C (CLI via Bash tool)
354
+
355
+ If you're using Cerefox via the local CLI (Path C from `connect-agents.md`), the same writing conventions apply. The web UI is where resolution happens; the CLI is just how you wrote the content. A user reading your ingested document later in the web UI gets clickable behaviour for free — **as long as you authored the links by UUID**.
356
+
357
+ ---
358
+
359
+ ## Governance
360
+
361
+ - **Review status**: agent writes set `pending_review`; human edits set `approved`. Both are searchable.
362
+ - **Soft delete**: deleted documents go to trash (recoverable). They are excluded from search. You can soft-delete via MCP (`cerefox_delete_document` if your client exposes it) or CLI (`cerefox delete-doc --yes --author <you> --author-type agent`).
363
+ - **Permanent purge and restore-from-trash are web-UI-only**, by design. If you decide to delete something, **tell the user explicitly** that you soft-deleted it and that they can review or restore it via the Cerefox web UI. You cannot un-do your own soft-delete from agent code; only the human can. See [`docs/guides/access-paths.md` → Destructive operations and the trust model](docs/guides/access-paths.md#destructive-operations-and-the-trust-model).
364
+ - **Versioning**: every content update, whether by `document_id` or by `update_if_exists`, creates an archived version. Old content is always recoverable.
365
+ - **Audit log**: all write operations are recorded with author, timestamp, and size changes.
366
+
367
+ This is a human-on-the-loop model: agents write and soft-delete freely with full audit attribution; humans review the trash, restore mistakes, and decide when to purge.
368
+
369
+ ---
370
+
371
+ ## Using Cerefox via the CLI
372
+
373
+ Read this section only if you do **not** have MCP tools available (no `cerefox_search` in your tool list) and your user has pointed you at a local Cerefox checkout. The semantics of every operation are identical to MCP — only the calling surface differs. The conventions above (when to search, when to ingest, metadata rules, ID-based update workflow, governance) all still apply.
374
+
375
+ ### Setup
376
+
377
+ Your user will have told you where their Cerefox checkout lives (commonly `/Users/<name>/src/cerefox`, but check `CLAUDE.md` / `AGENTS.md` / project memory for the exact path). Run every command from that directory, or use `cd /path/to/cerefox && uv run cerefox …` in your Bash tool call.
378
+
379
+ If a command fails with `command not found: cerefox`, run it as `uv run cerefox <subcommand>` (the project's `uv` environment provides the binary).
380
+
381
+ > Full per-flag reference lives in [`docs/guides/cli.md`](docs/guides/cli.md). The mapping table below is the agent-facing summary. **CLI flag names mirror MCP parameter names**, converted to kebab-case (e.g. `match_count` becomes `--match-count`); short forms like `--project`, `--filter`, `--count`, `--update`, `--version` are accepted as aliases.
382
+
383
+ ### MCP tool ↔ CLI command mapping
384
+
385
+ | MCP tool | CLI command |
386
+ |---|---|
387
+ | `cerefox_search(query, match_count, project_name, metadata_filter, requestor)` | `uv run cerefox search "<query>" --match-count N --project-name <n> --metadata-filter '<json>' --requestor <name>` (also `--mode`, `--alpha`, `--min-score` — CLI-only) |
388
+ | `cerefox_ingest(title, content, project_name, metadata, update_if_exists, document_id, source, author, author_type)` (file) | `uv run cerefox ingest <path> --title <t> --project-name <n> --metadata '<json>' --update-if-exists\|--document-id <uuid> --source <s> --author <a> --author-type user\|agent` |
389
+ | `cerefox_ingest(...)` (paste) | `printf '%s' "<content>" \| uv run cerefox ingest --paste --title "<title>"` (same flags) |
390
+ | `cerefox_get_document(document_id, version_id, requestor)` | `uv run cerefox get-doc <document-id> --version-id <vid> --requestor <name>` |
391
+ | `cerefox_list_versions(document_id, requestor)` | `uv run cerefox list-versions <document-id> --requestor <name>` |
392
+ | `cerefox_list_projects(requestor)` | `uv run cerefox list-projects --requestor <name>` |
393
+ | `cerefox_list_metadata_keys()` | `uv run cerefox list-metadata-keys` |
394
+ | `cerefox_metadata_search(metadata_filter, project_name, updated_since, created_since, limit, include_content, requestor)` | `uv run cerefox metadata-search --metadata-filter '<json>' --project-name <n> --updated-since <iso> --created-since <iso> --limit N --include-content --requestor <name>` |
395
+ | `cerefox_get_audit_log(document_id, author, operation, since, until, limit, requestor)` | `uv run cerefox get-audit-log --document-id <id> --author <a> --operation <op> --since <iso> --until <iso> --limit N --json --requestor <name>` |
396
+
397
+ ### Caller-identity flags (set these the same way you would on MCP)
398
+
399
+ You **MUST** identify yourself on every CLI invocation, exactly as you do via MCP:
400
+
401
+ - **Writes** (`ingest`, `ingest-dir`): set `--author "<your-agent-name>" --author-type "agent"`. The `author_type=agent` value auto-routes the write to `pending_review` (governance signal), matching the MCP path.
402
+ - **Reads** (`search`, `get-doc`, `list-versions`, `list-projects`, `metadata-search`, `get-audit-log`): set `--requestor "<your-agent-name>"`.
403
+
404
+ Alternative: have your user set `CEREFOX_AUTHOR_NAME`, `CEREFOX_AUTHOR_TYPE`, `CEREFOX_REQUESTOR_NAME` in their `.env` once. The CLI picks them up automatically — see [`docs/guides/cli.md`](docs/guides/cli.md) for the precedence rules.
405
+
406
+ ### Behavioural differences worth knowing
407
+
408
+ 1. **CLI output is human-formatted by default.** `cerefox search` returns a numbered, indented text block with title, score, and a 300-char preview per result. To extract document IDs reliably, parse the `Doc: <title> (<source>)` lines or fall back to `cerefox list-docs` for a clean tabular listing. `cerefox get-doc <id>` prints raw Markdown to stdout. **For scripted access to audit data**, use `cerefox get-audit-log --json` — one JSON object per line, ideal for piping to `jq`.
409
+
410
+ 2. **Every invocation is independent.** With MCP, your tool framework can pass `requestor` once per session. With the CLI, every command is a separate process — pass `--requestor` / `--author` / `--author-type` on every relevant invocation, or set the env-var defaults once at the start.
411
+
412
+ 3. **Errors come back on stderr with a non-zero exit code.** Check both — a successful command prints results on stdout and exits 0; a failure prints to stderr and exits non-zero.
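If you drive the CLI from a script rather than by hand, the exit-code rule in point 3 can be enforced in one wrapper. A Python sketch using `subprocess.run`; `run_cli` is a hypothetical helper, and the commands below are stand-ins so the example runs without a Cerefox checkout.

```python
import subprocess
import sys

def run_cli(argv: list[str]) -> str:
    """Run a command, return stdout on success, raise with stderr on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout

# Stand-in commands; in practice argv would look like
# ["uv", "run", "cerefox", "search", "OAuth design", "--requestor", "claude-code"].
ok = run_cli([sys.executable, "-c", "print('result')"])
try:
    run_cli([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"])
    failure = None
except RuntimeError as exc:
    failure = str(exc)
```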
413
+
414
+ ### Quick patterns
415
+
416
+ **Search before answering:**
417
+ ```bash
418
+ uv run cerefox search "OAuth design notes" --match-count 5 --requestor "claude-code"
419
+ ```
420
+
421
+ **Search then read full content of a hit:**
422
+ ```bash
423
+ uv run cerefox search "OAuth design" --match-count 3 --requestor "claude-code"
424
+ # Note the [n] entries. Pick one and grab the doc id from `list-docs` or the result preview.
425
+ uv run cerefox get-doc <document-id> --requestor "claude-code"
426
+ ```
427
+
428
+ **Ingest a note (agent identity):**
429
+ ```bash
430
+ printf '# Title\n\nBody markdown with H2s for chunking.\n' \
431
+ | uv run cerefox ingest --paste \
432
+ --title "Stable Title" \
433
+ --project-name "Cerefox" \
434
+ --metadata '{"type":"decision-log","status":"active"}' \
435
+ --author "claude-code" --author-type "agent"
436
+ ```
437
+
438
+ **ID-based update (preferred — deterministic):**
439
+ ```bash
440
+ # Step 1: search and note the [id: abc12345-...] in the result
441
+ uv run cerefox search "the exact doc" --match-count 1 --requestor "claude-code"
442
+
443
+ # Step 2: update by ID
444
+ printf '...new content...' \
445
+ | uv run cerefox ingest --paste \
446
+ --title "Exact Same Title" \
447
+ --document-id "abc12345-..." \
448
+ --author "claude-code" --author-type "agent"
449
+ ```
450
+
451
+ **Title-based update (fallback when ID isn't available):**
452
+ ```bash
453
+ printf '...new content...' \
454
+ | uv run cerefox ingest --paste --title "Exact Same Title" --update-if-exists \
455
+ --author "claude-code" --author-type "agent"
456
+ ```
457
+
458
+ **Audit-log access (scripted, JSON):**
459
+ ```bash
460
+ uv run cerefox get-audit-log --json --limit 200 --requestor "claude-code" \
461
+ | jq 'select(.author_type == "agent")'
462
+ ```
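The same filter can be applied without `jq`. A Python sketch, assuming the `--json` output is one JSON object per line as described earlier and that entries carry an `author_type` field as in the `jq` expression; the other field names in the sample are guesses for illustration.

```python
import json

def agent_entries(jsonl: str) -> list[dict]:
    """Parse one-JSON-object-per-line output and keep entries written by agents."""
    entries = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
    return [e for e in entries if e.get("author_type") == "agent"]

# Hypothetical audit-log lines; real entries will have more fields.
sample_output = "\n".join([
    json.dumps({"operation": "create", "author": "claude-code", "author_type": "agent"}),
    json.dumps({"operation": "update-content", "author": "user", "author_type": "user"}),
])
agents = agent_entries(sample_output)
```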
@@ -0,0 +1,76 @@
1
+ # Cerefox Knowledge Base -- Agent Quick Reference
2
+
3
+ Cerefox is a persistent, shared knowledge base. You have **10 MCP tools** (9 of them have CLI equivalents — `cerefox_get_help` is MCP-only). For the full guide, search Cerefox for "How AI Agents Use Cerefox" or call `cerefox_get_help` to retrieve this content over MCP.
4
+
5
+ ## Tools
6
+
7
+ | Tool | Purpose | Key params |
8
+ |------|---------|------------|
9
+ | `cerefox_search` | Find documents (hybrid FTS + semantic) | `query` (required), `project_name`, `metadata_filter`, `requestor` |
10
+ | `cerefox_ingest` | Save or update a document | `title`, `content` (required), `document_id` (update by ID), `update_if_exists`, `project_name` (single, non-destructive add on update), `project_names` (list, destructive replace on update), `metadata`, `author` |
11
+ | `cerefox_get_document` | Get full document by ID | `document_id` (required) |
12
+ | `cerefox_list_versions` | Version history of a document | `document_id` (required) |
13
+ | `cerefox_metadata_search` | Find docs by metadata (no text query) | `metadata_filter` (required), `include_content`, `updated_since` |
14
+ | `cerefox_list_metadata_keys` | Discover available metadata keys | (none required) |
15
+ | `cerefox_list_projects` | List all projects | (none required) |
16
+ | `cerefox_set_document_projects` | Set doc's project memberships to exactly the given list (destructive replace; metadata-only, no content change) | `document_id`, `project_names` (required) |
17
+ | `cerefox_get_audit_log` | Query write operation history | `document_id`, `author`, `operation`, `since` |
18
+ | `cerefox_get_help` | Retrieve Cerefox conventions (this reference) over MCP. **Call this whenever uncertain.** | `topic` (optional, case-insensitive H2 substring match) |
19
+
20
+ ## Essential Rules
21
+
22
+ 1. **Search before ingesting** -- check if the document exists first.
23
+ 2. **Prefer ID-based updates** -- pass `document_id` from search results for deterministic updates. Falls back to title-matching with `update_if_exists: true`.
24
+ 3. **Set `author`/`requestor`** to your name on every call (e.g., "Claude Code", "archiver"). On MCP, pass as parameters. On CLI, pass `--author`/`--author-type`/`--requestor` flags, or rely on `CEREFOX_AUTHOR_NAME`/`CEREFOX_AUTHOR_TYPE`/`CEREFOX_REQUESTOR_NAME` env vars set in the user's `.env`.
25
+ 4. **Use `document_id` from search results** `[id: uuid]` for `cerefox_get_document` and `cerefox_list_versions`.
26
+ 5. **Add metadata** -- at minimum `type` ("decision-log", "research", "design-doc") and `status` ("active", "draft").
27
+ 6. **Write structured Markdown** with H1/H2/H3 headings for good chunking and search.
28
+ 7. **Deletes are soft (recoverable); purge is web-UI-only.** If you decide to delete, tell the user (`I soft-deleted X -- recoverable from the Cerefox web UI trash`). By design, you cannot undo your own delete from agent code.
29
+ 8. **Cross-doc links inside content**: **always use `[Text](document-uuid)`.** UUIDs are the only fully reliable link form: stable across title changes, never ambiguous, no encoding gotchas. Every `cerefox_search` result shows `[id: <uuid>]` after the title; grab it and use it. Title-based linking (`[Text](<Title With Spaces>)`) is fragile: it breaks on colons, parens, ampersands, and brackets, and silently navigates to the wrong page. **Don't write title-based links**; do an extra search to get the UUID instead. Repo-path forms (`[Text](docs/path.md)`) exist for repo-ingested files; don't construct them manually. See `AGENT_GUIDE.md → Writing linkable content` for the full rule.
30
+ 9. **Project memberships — non-destructive by default**: on `cerefox_ingest` updates, **`project_name` (singular) is a non-destructive add** (ensures membership, preserves others). Use **`project_names` (list)** when you want to set the doc's full project set in one call (destructive replace). For metadata-only project changes without writing content, use **`cerefox_set_document_projects(document_id, project_names)`** — that tool is the destructive-replace contract made explicit. Never call `cerefox_set_document_projects` with a single name when you mean "add" — that would REMOVE the doc from all other projects. When in doubt, use `cerefox_ingest` with singular `project_name`.
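The difference between the singular and list forms in rule 9 can be modelled as plain set operations. This is a minimal sketch for reasoning about the rule, not Cerefox code: `apply_project_update` is a hypothetical helper, and its handling of the "both parameters" and "neither parameter" cases is an assumption of the sketch, since the guide does not specify them.

```python
from typing import Iterable, Optional, Set


def apply_project_update(
    current: Iterable[str],
    project_name: Optional[str] = None,
    project_names: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return the membership set implied by rule 9 (illustrative model only)."""
    if project_name is not None and project_names is not None:
        # The guide does not say what happens when both are passed,
        # so this sketch refuses to guess.
        raise ValueError("pass project_name or project_names, not both")
    if project_names is not None:
        # Destructive replace: the result is exactly the given list.
        return set(project_names)
    if project_name is not None:
        # Non-destructive add: keep existing memberships, ensure this one.
        return set(current) | {project_name}
    # Assumption: with neither parameter, memberships are left untouched.
    return set(current)
```

With a document in `alpha` and `beta`, passing `project_name="gamma"` yields all three projects, while passing `project_names=["gamma"]` leaves only `gamma`. The second case is the accidental removal the rule warns about.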
31
+
32
+ ## Update Workflow (ID-based -- preferred)
33
+
34
+ ```
35
+ search("topic") -> find doc [id: abc123] -> get_document(abc123) -> modify ->
36
+ ingest(title="Same Title", content="...", document_id="abc123", author="my-agent")
37
+ ```
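The "find doc `[id: ...]`" step above amounts to pulling the UUID out of the search output. A rough sketch follows, assuming results arrive as text with the `[id: <uuid>]` marker described in the rules; the exact layout of a result line is not specified here, and `extract_document_ids` is a hypothetical helper name. Note that `abc123` in the workflow is shorthand, whereas this pattern only accepts full UUIDs.

```python
import re
from typing import List

# Standard 8-4-4-4-12 hex UUID inside the "[id: ...]" marker.
_ID_MARKER = re.compile(
    r"\[id:\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\]"
)


def extract_document_ids(search_output: str) -> List[str]:
    """Return every UUID found in "[id: <uuid>]" markers, lower-cased."""
    return [match.lower() for match in _ID_MARKER.findall(search_output)]
```

Anything that is not a well-formed UUID is skipped, so a malformed marker never ends up in a `document_id` parameter.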
38
+
39
+ ## Update Workflow (title-based -- fallback)
40
+
41
+ ```
42
+ search("topic") -> find doc -> modify ->
43
+ ingest(title="Same Title", content="...", update_if_exists=true, author="my-agent")
44
+ ```
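The two update workflows differ only in which parameter accompanies `title` and `content`. As a hedged sketch, the choice can be expressed as one function that builds the `cerefox_ingest` arguments; `build_update_args` is a hypothetical name, and how the resulting dictionary is sent to the tool depends on your MCP client.

```python
from typing import Any, Dict, Optional


def build_update_args(
    title: str,
    content: str,
    author: str,
    document_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build cerefox_ingest arguments, preferring the ID-based update."""
    args: Dict[str, Any] = {"title": title, "content": content, "author": author}
    if document_id:
        # Preferred path: deterministic update of a known document.
        args["document_id"] = document_id
    else:
        # Fallback path: match on title and update if a match exists.
        args["update_if_exists"] = True
    return args
```

Only parameter names from the tool table are used. The sketch never sends both `document_id` and `update_if_exists`, which keeps the intent of each call unambiguous.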
45
+
46
+ ## Catch-Up Workflow
47
+
48
+ ```
49
+ metadata_search(metadata_filter={"type": "decision-log"}, updated_since="2026-03-28T00:00:00Z")
50
+ ```
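The `updated_since` value above is a UTC timestamp with a `Z` suffix. One way to produce it from a Python `datetime` is sketched below; `catch_up_args` is a hypothetical helper, and the assumption that the tool expects exactly this second-precision format rests only on the example above.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def catch_up_args(doc_type: str, since: datetime) -> Dict[str, Any]:
    """Build cerefox_metadata_search arguments for a catch-up query."""
    if since.tzinfo is None:
        # A naive datetime would be interpreted as local time; reject it
        # rather than silently shifting the cutoff.
        raise ValueError("since must be timezone-aware")
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"metadata_filter": {"type": doc_type}, "updated_since": stamp}
```

Calling `catch_up_args("decision-log", datetime(2026, 3, 28, tzinfo=timezone.utc))` reproduces the arguments shown in the workflow.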
51
+
52
+ ## CLI fallback (when MCP is unavailable)
53
+
54
+ If `cerefox_search` is not in your tool list, your user has likely installed the Cerefox CLI. From v0.5 onward the canonical invocation is plain **`cerefox <subcommand>`** (installed via `npm install -g @cerefox/memory`). The legacy `uv run cerefox <subcommand>` (Python CLI in a Cerefox checkout) still works through v0.7 but emits a deprecation banner.
55
+
56
+ Same operations, same conventions. Full reference: [`docs/guides/cli.md`](docs/guides/cli.md). CLI flag names mirror MCP parameter names, with underscores replaced by hyphens (e.g. `metadata_filter` ↔ `--metadata-filter`); short forms (`--filter`, `--project`, `--count`, `--update`, `--version`) work as aliases.
57
+
58
+ | MCP tool | CLI (v0.5+ canonical) |
59
+ |---|---|
60
+ | `cerefox_search` | `cerefox search "<q>" --requestor "<your-name>"` |
61
+ | `cerefox_ingest` (paste) | `printf '...' \| cerefox ingest --paste --title "<t>" --author "<your-name>" --author-type agent` |
62
+ | `cerefox_ingest` (update by ID) | `printf '...' \| cerefox ingest --paste --title "<t>" --document-id "<uuid>" --author "<your-name>" --author-type agent` |
63
+ | `cerefox_get_document` | `cerefox get-doc <id> --version-id <vid> --requestor "<your-name>"` |
64
+ | `cerefox_list_versions` | `cerefox list-versions <id> --requestor "<your-name>"` |
65
+ | `cerefox_list_projects` | `cerefox list-projects --requestor "<your-name>"` |
66
+ | `cerefox_list_metadata_keys` | `cerefox list-metadata-keys` |
67
+ | `cerefox_metadata_search` | `cerefox metadata-search --metadata-filter '<json>' --requestor "<your-name>"` |
68
+ | `cerefox_set_document_projects` | _MCP-only; a CLI command will be added in a future release. Until then, run via MCP if available._ |
69
+ | `cerefox_get_audit_log` | `cerefox get-audit-log --requestor "<your-name>"` (add `--json` for scripted access) |
70
+ | `cerefox_get_help` | `cerefox docs agent-quick-reference --print` (or `cerefox docs --list` for the full bundled-docs index) |
71
+
72
+ **Set identity on every call**, exactly as you would on MCP:
73
+ - Writes (`ingest`, `ingest-dir`): `--author "<your-name>" --author-type agent`
74
+ - Reads: `--requestor "<your-name>"`
75
+
76
+ Or have your user set `CEREFOX_AUTHOR_NAME` / `CEREFOX_AUTHOR_TYPE` / `CEREFOX_REQUESTOR_NAME` in their `.env` to apply defaults once.
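When driving the CLI from a script, building the argument list explicitly avoids shell-quoting problems and makes it hard to forget the identity flags. The following is a minimal sketch using only the subcommands and flags listed in the table above; `search_argv` and `ingest_argv` are hypothetical helper names, not part of Cerefox.

```python
from typing import List, Optional


def search_argv(query: str, requestor: str) -> List[str]:
    """Argument list for a read: identity goes in --requestor."""
    return ["cerefox", "search", query, "--requestor", requestor]


def ingest_argv(
    title: str,
    author: str,
    document_id: Optional[str] = None,
) -> List[str]:
    """Argument list for a paste-mode write: identity goes in --author."""
    argv = ["cerefox", "ingest", "--paste", "--title", title]
    if document_id is not None:
        # Update by ID, mirroring the "update by ID" row of the table.
        argv += ["--document-id", document_id]
    argv += ["--author", author, "--author-type", "agent"]
    return argv
```

The list can then be passed to `subprocess.run(argv, input=content, text=True, check=True)`, with the document body supplied on standard input as in the `printf '...' | cerefox ingest --paste` rows.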