@cerefox/memory 0.4.2 → 0.5.0

@@ -0,0 +1,460 @@
1
+ # Cerefox Configuration Reference
2
+
3
+ All settings use the `CEREFOX_` environment variable prefix and can be set in a `.env` file (location resolved per the rule below) or as actual environment variables.
4
+
5
+ Copy `.env.example` to `.env` to get started:
6
+ ```bash
7
+ cp .env.example .env
8
+ ```
9
+
10
+ ## Where Cerefox looks for `.env` (v0.3.0+)
11
+
12
+ The location is resolved at process start. The list is in precedence order, highest first:
13
+
14
+ 1. **`CEREFOX_CONFIG_DIR`** environment variable — explicit override; supports `~` expansion.
15
+ 2. **`./.env`** in the current working directory — dev mode. Wins for anyone running `cd /path/to/cerefox && uv run cerefox …`.
16
+ 3. **`~/.cerefox/.env`** — the user-state root; default for installed setups.
17
+
18
+ For most contributors, option 2 (repo-local `.env`) wins automatically. For an installed CLI (no repo checkout), option 3 is the default. Use option 1 to point a single machine at multiple Cerefox knowledge bases:
19
+
20
+ ```bash
21
+ CEREFOX_CONFIG_DIR=~/.cerefox-work cerefox search "…"
22
+ CEREFOX_CONFIG_DIR=~/.cerefox-personal cerefox search "…"
23
+ ```
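The precedence rule above can be sketched in a few lines of Python. This is an illustration only, not Cerefox's actual resolver: the function name `resolve_env_file` is hypothetical, and it assumes that `CEREFOX_CONFIG_DIR` names a directory containing `.env` and that rule 2 applies only when `./.env` exists.

```python
from pathlib import Path
from typing import Mapping


def resolve_env_file(environ: Mapping[str, str], cwd: Path, home: Path) -> Path:
    """Return the .env path picked by the three-step precedence rule (sketch)."""
    # 1. Explicit override; a leading "~" is expanded against the home directory.
    override = environ.get("CEREFOX_CONFIG_DIR")
    if override:
        if override == "~" or override.startswith("~/"):
            override = str(home) + override[1:]
        return Path(override) / ".env"
    # 2. Repo-local .env in the current working directory (dev mode).
    #    Assumption: this rule is skipped when the file does not exist.
    local = cwd / ".env"
    if local.is_file():
        return local
    # 3. User-state root, the default for installed setups.
    return home / ".cerefox" / ".env"


if __name__ == "__main__":
    import os
    print(resolve_env_file(os.environ, Path.cwd(), Path.home()))
```

Passing the environment, working directory, and home directory as arguments keeps the rule easy to exercise without touching the real process state.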
24
+
25
+ Full rule documented in [`docs/specs/polish-and-distribution-design.md` §7](../specs/polish-and-distribution-design.md).
26
+
27
+ ---
28
+
29
+ ## Supabase / Database
30
+
31
+ | Variable | Default | Required | Description |
32
+ |----------|---------|----------|-------------|
33
+ | `CEREFOX_SUPABASE_URL` | `""` | For app | Supabase project URL. Found in: Project Settings → API → Project URL |
34
+ | `CEREFOX_SUPABASE_KEY` | `""` | For app | New **secret key** (`sb_secret_…`) from Project Settings → API Keys → Secret key. Legacy `service_role` JWT also works. **Keep secret.** See [`setup-supabase.md` → Supabase API keys (2026)](setup-supabase.md#supabase-api-keys-2026). |
35
+ | `CEREFOX_SUPABASE_ANON_KEY` | `""` | For Edge Functions / e2e | **Legacy anon JWT** (`eyJ…`), under "Legacy" in Project Settings → API Keys. Used as Bearer token for Edge Function / MCP / GPT Action calls. The new `sb_publishable_…` key fails at the Edge Function gateway and cannot replace this. See [`setup-supabase.md`](setup-supabase.md#supabase-api-keys-2026). |
36
+ | `CEREFOX_DATABASE_URL` | `""` | For scripts | Direct Postgres URL for deployment scripts. **Use the Session Pooler** (port `5432`) — Transaction Pooler (`6543`) does not support DDL. Username must include the project-ref suffix (`postgres.<project-ref>`). Append `?sslmode=require`. See [`setup-supabase.md` → Connection pooling (2026)](setup-supabase.md#connection-pooling-2026). |
37
+
38
+ **When each is needed:**
39
+ - `CEREFOX_SUPABASE_URL` + `CEREFOX_SUPABASE_KEY` — used by the Python app (ingestion, search, CLI, web UI) via supabase-py
40
+ - `CEREFOX_DATABASE_URL` — used only by the deployment scripts (psycopg2 direct connection)
41
+
42
+ ---
43
+
44
+ ## Embeddings
45
+
46
+ Cerefox uses cloud-based embedding APIs. Local models (mpnet, Ollama) are not supported — they require large downloads, fail on some hardware, and add installation complexity.
47
+
48
+ | Variable | Default | Description |
49
+ |----------|---------|-------------|
50
+ | `CEREFOX_EMBEDDER` | `openai` | Embedding provider. Valid values: `openai`, `fireworks` |
51
+
52
+ ### OpenAI (default, recommended)
53
+
54
+ | Variable | Default | Description |
55
+ |----------|---------|-------------|
56
+ | `OPENAI_API_KEY` | `""` | OpenAI API key. Also accepted as `CEREFOX_OPENAI_API_KEY`. Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys). |
57
+ | `CEREFOX_OPENAI_BASE_URL` | `https://api.openai.com/v1` | API base URL. Override for proxies or OpenAI-compatible providers. |
58
+ | `CEREFOX_OPENAI_EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model. |
59
+ | `CEREFOX_OPENAI_EMBEDDING_DIMENSIONS` | `768` | Output dimensions. Must match the database schema (VECTOR(768)). |
60
+
61
+ For cost estimates see `docs/guides/operational-cost.md`.
62
+
63
+ ### Fireworks AI (alternative, lower cost)
64
+
65
+ | Variable | Default | Description |
66
+ |----------|---------|-------------|
67
+ | `CEREFOX_FIREWORKS_API_KEY` | `""` | Fireworks AI API key. |
68
+ | `CEREFOX_FIREWORKS_BASE_URL` | `https://api.fireworks.ai/inference/v1` | Fireworks API base URL. |
69
+ | `CEREFOX_FIREWORKS_EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Fireworks model. Must natively output 768-dim vectors. |
70
+
71
+ To use Fireworks:
72
+ ```env
73
+ CEREFOX_EMBEDDER=fireworks
74
+ CEREFOX_FIREWORKS_API_KEY=fw_...
75
+ ```
76
+
77
+ ### Edge Functions (for agents)
78
+
79
+ The `cerefox-search` and `cerefox-ingest` Supabase Edge Functions handle embeddings server-side -- agents don't need to set up any embedder locally. The Edge Functions read `OPENAI_API_KEY` from the Supabase project's secrets. See `docs/guides/connect-agents.md`.
80
+
81
+ ### Embedding API retry
82
+
83
+ All embedding API calls (Python `CloudEmbedder` and Edge Functions) include automatic retry with exponential backoff for transient failures:
84
+
85
+ - **3 attempts** with backoff: 500ms, 1s, 2s
86
+ - **Retried**: HTTP 5xx server errors, network timeouts, connection failures
87
+ - **Not retried**: HTTP 4xx client errors (invalid API key, bad request)
88
+ - **Logged**: every retry attempt is logged with the failure reason and attempt number
89
+
90
+ This handles intermittent OpenAI API errors (500s) that would otherwise cause search or ingestion failures. The retry logic is consistent across both the Python path (local MCP, web UI, CLI) and the Edge Function path (remote MCP, GPT Actions).
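As a rough illustration of the retry policy described above, the sketch below retries transient failures with doubling delays and lets client errors propagate at once. It is not the `CloudEmbedder` code: the exception classes and `call_with_retry` are hypothetical names, and because the text does not say how the three listed delays map onto three attempts, this version sleeps only between attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for HTTP 5xx responses, timeouts, and connection failures."""


class ClientError(Exception):
    """Stand-in for HTTP 4xx responses, which are never retried."""


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            delay = base_delay * 2 ** (attempt - 1)  # 0.5 s, then 1 s, ...
            log(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)
    raise RuntimeError("attempts must be at least 1")
```

`ClientError` is deliberately not caught, so an invalid API key fails on the first call instead of being retried. The `sleep` and `log` parameters are injectable so the policy can be checked without real waiting.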
91
+
92
+ ---
93
+
94
+ ## Chunking
95
+
96
+ | Variable | Default | Description |
97
+ |----------|---------|-------------|
98
+ | `CEREFOX_MAX_CHUNK_CHARS` | `4000` | Maximum characters per chunk before splitting at paragraph boundaries |
99
+ | `CEREFOX_MIN_CHUNK_CHARS` | `100` | Minimum chunk size. Chunks smaller than this are merged into the preceding chunk |
100
+
101
+ **Tuning advice:**
102
+ - Smaller `MAX_CHUNK_CHARS` → more precise chunk retrieval, but more DB rows and more embedding calls
103
+ - Larger `MAX_CHUNK_CHARS` → fewer chunks, coarser retrieval
104
+ - Default (4000) is a good balance for typical markdown notes
105
+ - Heading-bounded chunks are always kept whole regardless of size — `MIN_CHUNK_CHARS` only affects paragraph-level splits within oversized sections
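To make the interaction of the two settings concrete, here is a simplified sketch of paragraph-level splitting inside one oversized section. It is an assumption-laden illustration, not the Cerefox chunker: `split_section` is a hypothetical name, paragraphs are taken to be separated by blank lines, and heading detection is left out entirely.

```python
from typing import List


def split_section(text: str, max_chars: int = 4000, min_chars: int = 100) -> List[str]:
    """Split one section at paragraph boundaries, then merge undersized chunks."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it exceeds the limit
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)

    merged: List[str] = []
    for chunk in chunks:
        if merged and len(chunk) < min_chars:
            # Too small to stand alone: fold into the preceding chunk.
            merged[-1] = f"{merged[-1]}\n\n{chunk}"
        else:
            merged.append(chunk)
    return merged
```

In this sketch a single paragraph longer than `max_chars` stays whole, and a merge can push a chunk slightly past the limit. Whether the real chunker behaves the same way in those edge cases is not stated here.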
106
+
107
+ ---
108
+
109
+ ## Retrieval
110
+
111
+ | Variable | Default | Description |
112
+ |----------|---------|-------------|
113
+ | `CEREFOX_MAX_RESPONSE_BYTES` | `200000` | Maximum bytes in a single search response (local MCP path). See explanation below. |
114
+ | `CEREFOX_MIN_SEARCH_SCORE` | `0.50` | Minimum cosine similarity for hybrid and semantic search results (0.0–1.0). In **hybrid search**, chunks that matched the FTS keyword operator (`@@`) always pass through regardless of their vector score — the threshold only filters vector-only results. In **semantic search**, all results are filtered. The pure **FTS search** mode is unaffected. Increase for stricter precision; decrease for wider recall. |
115
+
116
+ ### Metadata filter
117
+
118
+ The `metadata_filter` search parameter (available in all search modes, all access paths) performs **server-side JSONB containment filtering** before vector ranking. It is not a configuration variable — it is passed per request.
119
+
120
+ - Filters are expressed as a JSON object: `{"type": "decision", "status": "active"}`
121
+ - All key-value pairs must match (AND semantics via PostgreSQL `@>` operator)
122
+ - Uses the existing GIN index on `cerefox_documents.metadata` — no additional schema changes needed
123
+ - `NULL` filter = no restriction (backwards-compatible default)
124
+ - Discover available keys via `cerefox_list_metadata_keys` MCP tool or `cerefox list-metadata-keys` CLI
125
+
126
+ Access paths:
127
+ - **MCP tool**: `metadata_filter` argument on `cerefox_search`
128
+ - **CLI**: `cerefox search "query" --metadata-filter '{"type": "decision"}'` (alias: `--filter`, `-f`)
129
+ - **Web UI**: Metadata Filter section (collapsible) in the Knowledge Browser
130
+ - **GPT Actions**: `metadata_filter` field in `searchKnowledgeBase` request body (schema v1.4.0)
131
+ - **HTTP API**: `metadata_filter` JSON key in the `cerefox-search` Edge Function POST body
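The AND semantics of the containment filter can be mimicked in plain Python for intuition. This is only a sketch of what the `@>` operator does for objects with scalar values. The helper names are hypothetical, JSON arrays are not modeled, and scalars are compared with Python equality rather than JSONB rules.

```python
from typing import Any, Mapping, Optional


def contains(metadata: Any, wanted: Any) -> bool:
    """True when `metadata` contains every key-value pair in `wanted`."""
    if isinstance(wanted, Mapping):
        return isinstance(metadata, Mapping) and all(
            key in metadata and contains(metadata[key], value)
            for key, value in wanted.items()
        )
    return metadata == wanted


def matches_filter(metadata: Mapping[str, Any], metadata_filter: Optional[Mapping[str, Any]]) -> bool:
    """A missing filter (NULL in SQL) places no restriction on the document."""
    if metadata_filter is None:
        return True
    return contains(metadata, metadata_filter)


doc = {"type": "decision", "status": "active", "owner": "ops"}
print(matches_filter(doc, {"type": "decision", "status": "active"}))
print(matches_filter(doc, {"type": "decision", "status": "retired"}))
```

Extra keys on the document, such as `owner` here, never prevent a match. Only the keys named in the filter are checked.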
132
+
133
+ **Score threshold guidance (OpenAI text-embedding-3-small):**
134
+
135
+ | Score | Meaning |
136
+ |-------|---------|
137
+ | 0.0 – 0.20 | Noise floor — unrelated content |
138
+ | 0.20 – 0.45 | Weak/tangential overlap — same domain, different topic |
139
+ | 0.45 – 0.70 | Genuine semantic match — related concepts, paraphrases |
140
+ | 0.70 – 1.0 | High similarity — near-duplicate or very direct answer |
141
+
142
+ Recommended values:
143
+ - `0.50` (default) — filters noise, keeps genuine results
144
+ - `0.40`–`0.45` — wider recall; useful for small corpora or exploratory search
145
+ - `0.70`–`0.80` — high precision; only very close semantic matches
146
+ - `0.0` — disable filtering entirely (returns all RPC results, not recommended)
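The way the threshold interacts with the three search modes can be sketched as a small filter. The names below (`Hit`, `apply_min_score`, the `mode` strings) are hypothetical, and treating the threshold as inclusive is an assumption. The real filtering happens inside the search RPCs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    chunk_id: str
    score: float      # cosine similarity of the vector match
    fts_match: bool   # True when the chunk matched the FTS keyword operator


def apply_min_score(hits: List[Hit], mode: str, min_score: float = 0.50) -> List[Hit]:
    """Filter hits the way the MIN_SEARCH_SCORE description reads, per mode."""
    if mode == "fts":
        return list(hits)  # pure FTS search is unaffected by the threshold
    if mode == "hybrid":
        # Keyword matches always pass; only vector-only results are filtered.
        return [h for h in hits if h.fts_match or h.score >= min_score]
    if mode == "semantic":
        return [h for h in hits if h.score >= min_score]
    raise ValueError(f"unknown search mode: {mode}")
```

With a threshold of `0.0` every branch keeps all hits, which matches the "disable filtering entirely" entry in the list above.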
147
+
148
+ ### Response size limits
149
+
150
+ Response size limits are **opt-in per call** — they apply only on the MCP and Edge Function
151
+ paths where an AI agent's context window matters. The web UI and CLI always return all results
152
+ with no truncation.
153
+
154
+ | Path | Default limit | Ceiling | How to change |
155
+ |------|--------------|---------|---------------|
156
+ | Web UI / CLI | None | None | — |
157
+ | Local MCP server (`cerefox mcp`) | `CEREFOX_MAX_RESPONSE_BYTES` | Same | `.env` |
158
+ | Remote MCP / Edge Function | 200 000 bytes | 200 000 bytes | Agent passes `max_bytes` |
159
+
160
+ **`CEREFOX_MAX_RESPONSE_BYTES`** sets the default and ceiling for the local MCP server. Agents
161
+ can pass a smaller `max_bytes` in the `cerefox_search` tool call; larger values are silently
162
+ capped at this setting.
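The default-and-ceiling behaviour amounts to taking the smaller of the two values. A minimal sketch, assuming a hypothetical helper name and no validation of negative or zero requests:

```python
from typing import Optional


def effective_max_bytes(requested: Optional[int], configured: int = 200_000) -> int:
    """Resolve the byte budget for one search call on the local MCP path."""
    if requested is None:
        return configured            # the setting acts as the default
    return min(requested, configured)  # and as the ceiling for larger requests


print(effective_max_bytes(None))
print(effective_max_bytes(50_000))
print(effective_max_bytes(1_000_000))
```

How the response is trimmed once the budget is known is not covered by this sketch.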
163
+
164
+ **Why 200 000 as the default?** At the default `match_count=5` and small-to-big threshold of
165
+ 20 000 chars, the worst case is 5 × 20 KB ≈ 100 KB — comfortably under 200 KB. The limit
166
+ protects against high `match_count` + large documents without cutting legitimate results at
167
+ defaults. (The original 65 KB default was driven by the Supabase MCP protocol limit, which no
168
+ longer applies.)
169
+
170
+ **Agent `max_bytes` parameter**: pass this when your model's context window is limited:
171
+ - MCP tool: `{"query": "...", "max_bytes": 50000}`
172
+ - Edge Function body: `{"query": "...", "max_bytes": 50000}`
173
+
174
+ See `docs/guides/response-limits.md` for the full guide including behaviour details and examples.
175
+
176
+ ### RPC-level retrieval parameters
177
+
178
+ Two retrieval parameters are configured directly in `src/cerefox/db/rpcs.sql` rather than in `.env`. They follow the same convention as `OPENAI_MODEL` and `EMBEDDING_DIMENSIONS` in the Edge Functions: they are system-level tuning knobs that rarely change, and changing them requires a SQL re-deploy (`python scripts/db_deploy.py`) rather than a restart.
179
+
180
+ | Parameter | Default | Location | Description |
181
+ |-----------|---------|----------|-------------|
182
+ | `p_small_to_big_threshold` | `20000` chars | `rpcs.sql` — `cerefox_search_docs` | Documents larger than this return matched chunks + neighbours instead of the full document. Set to `0` to always return full content. |
183
+ | `p_context_window` | `1` | `rpcs.sql` — `cerefox_search_docs` | Neighbour chunks on each side of each matched chunk. `N=1` → up to 3 contiguous chunks per hit. `N=0` → matched chunks only. `N=2` → up to 5. |
184
+
185
+ To change these values, edit the `DEFAULT` values in `cerefox_search_docs` in `src/cerefox/db/rpcs.sql` and redeploy:
186
+ ```bash
187
+ python scripts/db_deploy.py
188
+ ```
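The `p_context_window` arithmetic in the table above (N neighbours on each side, so up to 2N + 1 chunks per hit) can be checked with a short sketch. The function name and the zero-based chunk indexing are assumptions made for illustration. The real logic lives in the SQL function.

```python
from typing import Iterable, List


def expand_hits(matched: Iterable[int], context_window: int, chunk_count: int) -> List[int]:
    """Return the chunk indices to include for a document with `chunk_count` chunks."""
    selected = set()
    for index in matched:
        low = max(0, index - context_window)                  # clamp at the first chunk
        high = min(chunk_count - 1, index + context_window)   # clamp at the last chunk
        selected.update(range(low, high + 1))
    return sorted(selected)


print(expand_hits([4], context_window=1, chunk_count=10))
print(expand_hits([0], context_window=1, chunk_count=10))
```

Overlapping windows from adjacent hits collapse into one contiguous run, which is why the table says "up to" 3 or 5 chunks per hit.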
189
+
190
+ ---
191
+
192
+ ## Versioning
193
+
194
+ Cerefox automatically archives previous document content whenever a document is updated with new content. Archived chunks are preserved and searchable via the versioning API, but excluded from live search results.
195
+
196
+ | Variable | Default | Description |
197
+ |----------|---------|-------------|
198
+ | `CEREFOX_VERSION_RETENTION_HOURS` | `48` | How many hours to keep archived document versions. Versions older than this are lazily deleted the next time the same document is updated. Always keeps at least the most recent version regardless of age. |
199
+ | `CEREFOX_VERSION_CLEANUP_ENABLED` | `true` | When `true`, old versions are lazily deleted during updates (respecting `VERSION_RETENTION_HOURS`). Versions marked as `archived` are always protected. When `false`, all versions are retained indefinitely (immutable mode). |
200
+
201
+ **How versioning works:**
202
+
203
+ When a document's content changes during ingestion, Cerefox calls the `cerefox_snapshot_version` database function before writing new chunks. This function:
204
+ 1. Creates a version record in `cerefox_document_versions`
205
+ 2. Moves all current chunks to that version (by setting their `version_id`)
206
+ 3. If `CEREFOX_VERSION_CLEANUP_ENABLED` is `true`, deletes stale versions older than `CEREFOX_VERSION_RETENTION_HOURS` (skipping archived versions)
207
+
208
+ Metadata-only updates (same content, different title or project) do **not** create a new version.
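The cleanup rule in step 3 can be expressed as a small selection function. This sketch is a reading of the description above, not the body of `cerefox_snapshot_version`: the `Version` dataclass and `versions_to_delete` are hypothetical, and "most recent" is taken to mean the version with the latest creation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Version:
    version_id: str
    created_at: datetime
    archived: bool = False


def versions_to_delete(
    versions: List[Version],
    now: datetime,
    retention_hours: int = 48,
    cleanup_enabled: bool = True,
) -> List[Version]:
    """Pick the stale versions that lazy cleanup would remove."""
    if not cleanup_enabled or not versions:
        return []  # immutable mode: everything is retained
    cutoff = now - timedelta(hours=retention_hours)
    newest = max(versions, key=lambda v: v.created_at)
    return [
        v for v in versions
        if v is not newest          # the most recent version is always kept
        and not v.archived          # archived versions are always protected
        and v.created_at < cutoff   # only versions older than the retention window
    ]
```

Because cleanup is lazy, nothing in this sketch runs on a timer. It would be evaluated only when the same document is updated again.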
209
+
210
+ To view and retrieve previous versions:
211
+ ```bash
212
+ uv run cerefox list-versions <document-id>
213
+ uv run cerefox get-doc <document-id> --version <version-id>
214
+ ```
215
+
216
+ ---
217
+
218
+ ## Storage & Backup
219
+
220
+ | Variable | Default | Description |
221
+ |----------|---------|-------------|
222
+ | `CEREFOX_BACKUP_DIR` | dev mode: `./backups` · user-state mode: `~/.cerefox/backups` (v0.3.0+) | Local directory where file system backups are stored. Created automatically if it doesn't exist. The default tracks the resolved config dir — dev users see no change. |
223
+ | `CEREFOX_VERSION_RETENTION_HOURS` | `48` | How long to retain archived document versions (hours). This is the same setting described under Versioning. The most recent version is always kept regardless of this setting. |
224
+
225
+ ---
226
+
227
+ ## Logging
228
+
229
+ | Variable | Default | Description |
230
+ |----------|---------|-------------|
231
+ | `CEREFOX_LOG_LEVEL` | `INFO` | Python logging level. Valid values: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` |
232
+
233
+ Set to `DEBUG` during development to see detailed operation logs.
234
+
235
+ ---
236
+
237
+ ## Example: Minimal Production `.env`
238
+
239
+ ```bash
240
+ # Required
241
+ CEREFOX_SUPABASE_URL=https://abcdefghijkl.supabase.co
242
+ CEREFOX_SUPABASE_KEY=eyJhbGciOiJIUzI1NiIs...
243
+
244
+ # Required for scripts only
245
+ CEREFOX_DATABASE_URL=postgresql://postgres.abcdefghijkl:MyPassword@aws-1-us-east-1.pooler.supabase.com:5432/postgres?sslmode=require
246
+
247
+ # Embeddings — OpenAI (default)
248
+ OPENAI_API_KEY=sk-...
249
+
250
+ # All other settings use defaults
251
+ ```
252
+
253
+ ## Example: Fireworks Embedder `.env`
254
+
255
+ ```bash
256
+ CEREFOX_SUPABASE_URL=https://abcdefghijkl.supabase.co
257
+ CEREFOX_SUPABASE_KEY=eyJhbGciOiJIUzI1NiIs...
258
+ CEREFOX_DATABASE_URL=postgresql://...
259
+
260
+ CEREFOX_EMBEDDER=fireworks
261
+ CEREFOX_FIREWORKS_API_KEY=fw_...
262
+ ```
263
+
264
+ ---
265
+
266
+ ## Changing the embedding model
267
+
268
+ Cerefox has **two independent access paths**, each with its own embedding configuration:
269
+
270
+ | Path | Where embedding happens | Config location |
271
+ |------|------------------------|-----------------|
272
+ | Local MCP server + CLI | Python `CloudEmbedder` | `.env` (`CEREFOX_OPENAI_EMBEDDING_MODEL`, etc.) |
273
+ | Edge Functions (GPT Actions, curl) | TypeScript constants in Edge Function code | Hardcoded in `supabase/functions/*/index.ts` |
274
+
275
+ When you change the embedding model, **both paths must be updated and kept in sync** — they must use the same model and dimensions, or search results will be incoherent (queries embedded by one model won't match chunks embedded by another).
276
+
277
+ ### Step 1 — Update `.env`
278
+
279
+ Change `CEREFOX_OPENAI_EMBEDDING_MODEL` and `CEREFOX_OPENAI_EMBEDDING_DIMENSIONS` to the new values.
280
+
281
+ ### Step 2 — Re-embed all stored chunks
282
+
283
+ ```bash
284
+ uv run cerefox reindex
285
+ ```
286
+
287
+ This re-embeds every chunk in the database using the model now configured in `.env`.
288
+ Reindexing preserves document IDs and project assignments. Run it before using the new model for searches.
289
+
290
+ ### Step 3 — Update and redeploy the Edge Functions (if you use them)
291
+
292
+ The Edge Functions have the model hardcoded as TypeScript constants. Edit both files:
293
+
294
+ ```
295
+ supabase/functions/cerefox-search/index.ts (lines ~29–30)
296
+ supabase/functions/cerefox-ingest/index.ts (lines ~25–26)
297
+ ```
298
+
299
+ Change:
300
+ ```typescript
301
+ const OPENAI_MODEL = "text-embedding-3-small"; // ← update this
302
+ const EMBEDDING_DIMENSIONS = 768; // ← and this if dimensions change
303
+ ```
304
+
305
+ Then redeploy via the Supabase CLI:
306
+ ```bash
307
+ supabase functions deploy cerefox-search
308
+ supabase functions deploy cerefox-ingest
309
+ ```
310
+
311
+ Or redeploy through the Supabase Dashboard → Edge Functions → Deploy.
312
+
313
+ > **If you only use the local MCP server** (Claude Desktop, ChatGPT Desktop, Cursor), Step 3 is
314
+ > optional — the Edge Functions are only used for GPT Actions and direct HTTP access.
315
+
316
+ > **Future improvement**: the Edge Functions will be updated to read model config from Supabase
317
+ > secrets, eliminating the need to edit TypeScript and redeploy when the model changes.
318
+
319
+ ---
320
+
321
+ ## Usage Tracking
322
+
323
+ Cerefox can optionally log all operations (both reads and writes) across all access paths.
324
+ This includes search, metadata search, get document, list versions, get audit log, list
325
+ metadata keys, list projects, and ingest. This data feeds the analytics page and CSV export.
326
+
327
+ **Usage tracking is opt-in and disabled by default.** No data is collected until you explicitly
328
+ enable it.
329
+
330
+ ### How it works
331
+
332
+ A `cerefox_config` table in Postgres stores runtime configuration as key-value pairs. The only
333
+ key currently in use is `usage_tracking_enabled`. Every usage logging call goes through the
334
+ `cerefox_log_usage` RPC, which checks this config value first:
335
+
336
+ - If `usage_tracking_enabled` is `"true"` -- the RPC inserts a row into `cerefox_usage_log`
337
+ - If `usage_tracking_enabled` is anything else (including missing) -- the RPC returns immediately without inserting
338
+
339
+ The check happens **inside Postgres on every call**. All callers (Edge Functions, MCP tools,
340
+ Python routes, CLI) call `cerefox_log_usage` unconditionally -- the RPC decides whether to
341
+ actually log. Callers never wait for the logging result or handle errors from it
342
+ (fire-and-forget).
343
+
344
+ This means:
345
+ - **No redeploy needed** to toggle tracking on or off -- just change the config value
346
+ - **No performance impact when disabled** -- the RPC exits immediately
347
+ - **One implementation** -- the check is in the RPC, not duplicated across callers
348
+
349
+ ### Enabling and disabling
350
+
351
+ **Via CLI:**
352
+ ```bash
353
+ # Enable
354
+ cerefox config-set usage_tracking_enabled true
355
+
356
+ # Disable
357
+ cerefox config-set usage_tracking_enabled false
358
+
359
+ # Check current state
360
+ cerefox config-get usage_tracking_enabled
361
+ ```
362
+
363
+ **Via REST API:**
364
+ ```bash
365
+ # Enable
366
+ curl -X PUT http://localhost:8000/api/v1/config/usage_tracking_enabled \
367
+ -H 'Content-Type: application/json' -d '{"value": "true"}'
368
+
369
+ # Read
370
+ curl http://localhost:8000/api/v1/config/usage_tracking_enabled
371
+ ```
372
+
373
+ ### What gets logged
374
+
375
+ Each usage log entry records:
376
+
377
+ | Field | Description |
378
+ |-------|-------------|
379
+ | `operation` | What was called: `search`, `metadata_search`, `get_document`, `list_versions`, `get_audit_log`, `list_metadata_keys`, `list_projects`, `ingest` |
380
+ | `access_path` | Where the call came from: `remote-mcp`, `local-mcp`, `edge-function`, `webapp`, `cli` |
381
+ | `requestor` | Who made the call: agent name (e.g., "Claude Code", "mcp-agent") or "user" for webapp/CLI |
382
+ | `document_id` | Optional: which document was accessed (for get_document, list_versions) |
383
+ | `project_id` | Optional: which project was filtered on |
384
+ | `query_text` | The search query or metadata filter |
385
+ | `result_count` | Number of results returned |
386
+ | `extra` | Flexible JSONB for additional context |
387
+
388
+ The `access_path` is set by the caller layer (not the end user):
389
+ - Edge Functions set `"edge-function"` (GPT Actions, direct HTTP callers)
390
+ - `cerefox-mcp` tool handlers set `"remote-mcp"` (Claude Code, Cursor, Claude Desktop)
391
+ - Python REST routes set `"webapp"` (the web UI)
392
+ - Local MCP server sets `"local-mcp"`
393
+ - CLI sets `"cli"` for search, get-doc, and list-versions commands
394
+
395
+ ### Viewing and exporting usage data
396
+
397
+ **REST API endpoints:**
398
+ - `GET /api/v1/usage-log` -- filtered list of entries (params: start, end, operation, access_path, requestor, project_id, limit)
399
+ - `GET /api/v1/usage-log/summary` -- aggregated stats (by day, operation, access path, top documents, top requestors)
400
+ - `GET /api/v1/usage-log/export.csv` -- CSV download with all columns
401
+
402
+ **CLI:**
403
+ ```bash
404
+ cerefox config-get usage_tracking_enabled
405
+ ```
406
+
407
+ ---
408
+
409
+ ## Requestor Identity Enforcement
410
+
411
+ By default, the `requestor` parameter on MCP read tools (and `author` on ingest) is
412
+ optional. When omitted, it defaults to `"mcp-agent"`. This means the usage log shows
413
+ `"mcp-agent"` for all calls that don't explicitly identify themselves, making analytics
414
+ less useful in multi-agent setups.
415
+
416
+ You can optionally enforce caller identification so that all MCP tool calls must include
417
+ a requestor/author identity. Calls without identity receive a JSON-RPC `-32602` error
418
+ with a helpful message telling the agent what to provide.
419
+
420
+ ### Enabling enforcement
421
+
422
+ ```bash
423
+ # Require all MCP tool calls to include requestor/author
424
+ cerefox config-set require_requestor_identity true
425
+
426
+ # Optionally override the default naming format (regex)
427
+ # Default: ^[a-zA-Z0-9_:.\- ]+$ (letters, numbers, underscores, colons, dots, hyphens, spaces)
428
+ cerefox config-set requestor_identity_format "^[a-z]+:[a-z]+$"
429
+ ```
430
+
431
+ ### Format examples
432
+
433
+ | Format regex | Allows | Use case |
434
+ |-------------|--------|----------|
435
+ | `^[a-zA-Z0-9_:.\- ]+$` | Letters, numbers, underscores, colons, dots, hyphens, spaces | **Default** -- covers "Claude Code", "mcp-agent", "personal:steward", "user" |
436
+ | `^[a-z]+:[a-z]+$` | `conclave:agent` format only | Multi-conclave setups (e.g., `personal:steward`) |
437
+ | (empty string) | Any non-empty string | No format restriction |
438
+
439
+ The format is applied to both `requestor` (read tools) and `author` (ingest).
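The enforcement rules can be sketched as a single check. This is an illustration under stated assumptions, not the MCP handler itself: `check_identity` and the error messages are hypothetical, and the format is assumed to be checked only while enforcement is enabled. The default regex and the `-32602` code come from the text above.

```python
import re
from typing import Optional

DEFAULT_FORMAT = r"^[a-zA-Z0-9_:.\- ]+$"
INVALID_PARAMS = -32602  # JSON-RPC "Invalid params"


def check_identity(value: Optional[str], required: bool, fmt: str = DEFAULT_FORMAT) -> Optional[dict]:
    """Return None when the call is accepted, or a JSON-RPC style error object."""
    if not required:
        return None  # default state: identity is optional
    if not value:
        return {"code": INVALID_PARAMS, "message": "Provide a requestor (or author) identity."}
    # An empty format string means any non-empty identity is accepted.
    if fmt and re.search(fmt, value) is None:
        return {"code": INVALID_PARAMS, "message": f"Identity must match {fmt}"}
    return None


print(check_identity("Claude Code", required=True))
print(check_identity("Claude Code", required=True, fmt=r"^[a-z]+:[a-z]+$"))
```

The same check would apply to `author` on ingest, since the format covers both parameters.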
440
+
441
+ ### Disabling enforcement
442
+
443
+ ```bash
444
+ cerefox config-set require_requestor_identity false
445
+ ```
446
+
447
+ When disabled, the requestor parameter remains optional with the `"mcp-agent"` default.
448
+ This is the default state -- no configuration needed for backward compatibility.
449
+
450
+ ---
451
+
452
+ ## Checking Your Configuration
453
+
454
+ Run the status script to verify everything is connected:
455
+
456
+ ```bash
457
+ uv run python scripts/db_status.py
458
+ ```
459
+
460
+ If it exits successfully (code 0), your configuration is correct.