@bodhi-ventures/aiocs 0.1.0

@@ -0,0 +1,157 @@
1
+ # AI Agent JSON CLI And Daemon Design
2
+
3
+ ## Summary
4
+
5
+ `aiocs` will add a machine-oriented JSON contract across the CLI and a first-class long-running daemon mode for scheduled refreshes. The CLI remains human-friendly by default, but every command will support a global `--json` flag so agents can consume one stable structured payload instead of parsing text. The daemon will live inside the same binary as `docs daemon`, and a Docker image will run that command, which loops at an environment-configured cadence.
6
+
7
+ ## Goals
8
+
9
+ - make every one-shot CLI command safe for direct agent use
10
+ - keep a single canonical implementation path inside the existing CLI
11
+ - avoid introducing a separate HTTP service or second control plane
12
+ - support a long-running local container that keeps the shared catalog warm
13
+
14
+ ## Non-Goals
15
+
16
+ - MCP in this change
17
+ - remote registries or distributed scheduling
18
+ - a second machine API beyond the CLI
19
+
20
+ ## JSON Output Contract
21
+
22
+ ### Scope
23
+
24
+ `--json` is a root-level global flag that applies to every CLI command.
25
+
26
+ ### One-shot commands
27
+
28
+ These commands emit exactly one JSON document to stdout:
29
+
30
+ - `source upsert`
31
+ - `source list`
32
+ - `fetch`
33
+ - `refresh due`
34
+ - `snapshot list`
35
+ - `project link`
36
+ - `project unlink`
37
+ - `search`
38
+ - `show`
39
+
40
+ ### Envelope
41
+
42
+ Every one-shot command returns this envelope on success:
43
+
44
+ ```json
45
+ {
46
+ "ok": true,
47
+ "command": "source.list",
48
+ "data": {}
49
+ }
50
+ ```
51
+
52
+ Failures also emit a single JSON document to stdout, with exit code `1`:
53
+
54
+ ```json
55
+ {
56
+ "ok": false,
57
+ "command": "search",
58
+ "error": {
59
+ "message": "No linked project scope found. Use --source or --all."
60
+ }
61
+ }
62
+ ```
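The two envelopes above can be modelled as one discriminated union. The following is a minimal sketch, not the package's actual code: the type and function names are hypothetical, and the exit code `0` on success is an assumption (the text above only specifies exit code `1` for failures).

```typescript
// Hypothetical names; only the field names (ok, command, data, error.message)
// come from the envelope examples above.
type SuccessEnvelope<T> = { ok: true; command: string; data: T };
type FailureEnvelope = { ok: false; command: string; error: { message: string } };
type Envelope<T> = SuccessEnvelope<T> | FailureEnvelope;

function successEnvelope<T>(command: string, data: T): SuccessEnvelope<T> {
  return { ok: true, command, data };
}

function failureEnvelope(command: string, err: unknown): FailureEnvelope {
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, command, error: { message } };
}

// Produce the single JSON document for stdout plus the process exit code.
// Exit code 0 on success is an assumption of this sketch.
function render<T>(envelope: Envelope<T>): { stdout: string; exitCode: number } {
  return { stdout: JSON.stringify(envelope), exitCode: envelope.ok ? 0 : 1 };
}
```

Routing every command through one `render`-style function is one way to keep response and error emission in a single CLI output path.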
63
+
64
+ ### Command payloads
65
+
66
+ - `source.upsert`: upserted source metadata
67
+ - `source.list`: array of sources with due/snapshot fields
68
+ - `fetch`: array of per-source fetch results, even for a single source
69
+ - `refresh.due`: array of per-source fetch results; empty array when nothing is due
70
+ - `snapshot.list`: array of snapshots
71
+ - `project.link`: canonical project path and linked source ids
72
+ - `project.unlink`: canonical project path and removed scope
73
+ - `search`: array of chunk results
74
+ - `show`: one chunk result
75
+
76
+ ### Daemon exception
77
+
78
+ `docs daemon` is long-running, so a single final JSON document is the wrong shape. In JSON mode it will emit newline-delimited JSON event objects to stdout, one event per lifecycle action. This is the one intended exception to the single-document rule.
79
+
80
+ ## Daemon Design
81
+
82
+ ### Command
83
+
84
+ Add `docs daemon`.
85
+
86
+ ### Responsibilities
87
+
88
+ - ensure config and data directories exist
89
+ - optionally bootstrap source specs from configured directories
90
+ - optionally run an immediate refresh cycle on startup
91
+ - loop forever:
92
+ - upsert any source spec files from configured directories
93
+ - run refresh for due sources
94
+ - sleep until the next cycle
95
+
96
+ ### Environment variables
97
+
98
+ - `AIOCS_DAEMON_INTERVAL_MINUTES`
99
+ - must be a positive integer; defaults to `60`
100
+ - `AIOCS_DAEMON_FETCH_ON_START`
101
+ - `true` by default
102
+ - `AIOCS_SOURCE_SPEC_DIRS`
103
+ - comma-separated list of directories to scan for `.yaml`, `.yml`, and `.json` source specs
104
+ - default points at the bundled `sources/` directory in the image and local repo
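The environment variables above could be parsed along these lines. This is a sketch under assumptions: the function and type names are hypothetical, accepting only the literal strings `true` and `false` for `AIOCS_DAEMON_FETCH_ON_START` is an assumption, and the default spec directory is passed in by the caller rather than hard-coded.

```typescript
type Env = Record<string, string | undefined>;

interface DaemonConfig {
  intervalMinutes: number;
  fetchOnStart: boolean;
  sourceSpecDirs: string[];
}

// Hypothetical parser: throws on invalid values so the daemon can fail fast at startup.
function parseDaemonEnv(env: Env, defaultSpecDir: string): DaemonConfig {
  const rawInterval = (env["AIOCS_DAEMON_INTERVAL_MINUTES"] ?? "60").trim();
  if (!/^[1-9]\d*$/.test(rawInterval)) {
    throw new Error(`AIOCS_DAEMON_INTERVAL_MINUTES must be a positive integer, got "${rawInterval}"`);
  }

  // Assumption: only "true" and "false" (case-insensitive) are accepted.
  const rawFetch = (env["AIOCS_DAEMON_FETCH_ON_START"] ?? "true").trim().toLowerCase();
  if (rawFetch !== "true" && rawFetch !== "false") {
    throw new Error(`AIOCS_DAEMON_FETCH_ON_START must be "true" or "false", got "${rawFetch}"`);
  }

  // Comma-separated list; empty entries are dropped.
  const sourceSpecDirs = (env["AIOCS_SOURCE_SPEC_DIRS"] ?? defaultSpecDir)
    .split(",")
    .map((dir) => dir.trim())
    .filter((dir) => dir.length > 0);

  return {
    intervalMinutes: Number(rawInterval),
    fetchOnStart: rawFetch === "true",
    sourceSpecDirs,
  };
}
```

Taking the environment as a plain record instead of reading `process.env` directly keeps the parser easy to unit test.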
105
+
106
+ ### Logging
107
+
108
+ - human mode: concise single-line operational logs
109
+ - JSON mode: one JSON event per line with `event`, `timestamp`, and event-specific fields
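A JSON-mode log line could be produced as follows. This is only an illustration: the function name, the example event name, and the ISO 8601 timestamp format are assumptions; the text above fixes only the `event` and `timestamp` field names.

```typescript
// Hypothetical helper: returns one line of newline-delimited JSON (without the trailing newline).
// `event` and `timestamp` are written last so event-specific fields cannot overwrite them.
function formatDaemonEvent(
  event: string,
  fields: Record<string, unknown>,
  now: Date,
): string {
  return JSON.stringify({ ...fields, event, timestamp: now.toISOString() });
}
```

A caller would write `formatDaemonEvent("cycle.started", { dueSources: 2 }, new Date())` followed by a newline to stdout, where `cycle.started` and `dueSources` are made-up names for the example.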
110
+
111
+ ### Failure model
112
+
113
+ - invalid env config fails fast at startup
114
+ - invalid source spec files fail the cycle and are logged explicitly
115
+ - a fetch failure for one source does not kill the daemon process; only invalid startup config is fatal
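Per-source isolation in the failure model can be sketched as a loop that catches each error and records it instead of rethrowing. The names below are hypothetical, and `fetchOne` is synchronous only to keep the sketch short; a real fetch would be asynchronous.

```typescript
type FetchOutcome =
  | { sourceId: string; ok: true }
  | { sourceId: string; ok: false; error: string };

// Hypothetical cycle runner: one failing source is recorded, the rest still run.
function runRefreshCycle(
  sourceIds: string[],
  fetchOne: (sourceId: string) => void,
): FetchOutcome[] {
  return sourceIds.map((sourceId): FetchOutcome => {
    try {
      fetchOne(sourceId);
      return { sourceId, ok: true };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { sourceId, ok: false, error };
    }
  });
}
```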
116
+
117
+ ## Docker Design
118
+
119
+ ### Image
120
+
121
+ Ship a Dockerfile that builds `aiocs`, includes the bundled `sources/` directory, and runs:
122
+
123
+ ```bash
124
+ ./dist/cli.js daemon
125
+ ```
126
+
127
+ ### Runtime contract
128
+
129
+ - mount persistent data to `/root/.aiocs/data` or provide `AIOCS_DATA_DIR`
130
+ - optional config mount for `/root/.aiocs/config`
131
+ - source specs available from bundled `/app/sources` by default
132
+ - allow overriding `AIOCS_SOURCE_SPEC_DIRS` with mounted custom directories
133
+
134
+ ### Compose
135
+
136
+ Ship a compose example that:
137
+
138
+ - builds the image locally
139
+ - mounts a persistent volume for the data directory
140
+ - sets `AIOCS_DAEMON_INTERVAL_MINUTES`
141
+ - optionally mounts a host directory of custom source specs
142
+
143
+ ## Testing Strategy
144
+
145
+ - CLI tests for `--json` across representative commands and failure paths
146
+ - unit tests for daemon env parsing and cycle behavior
147
+ - integration tests for daemon bootstrap + due refresh behavior with a short injected interval
148
+ - existing CLI/fetch regression suite stays green in human mode
149
+
150
+ ## Risks And Mitigations
151
+
152
+ - daemon JSON logs differ from one-shot JSON
153
+ - mitigate by documenting daemon as the explicit streaming exception
154
+ - source spec drift inside long-running containers
155
+ - mitigate by re-upserting source specs each cycle
156
+ - duplicated output logic across commands
157
+ - mitigate by centralizing response/error emission in one CLI output path
@@ -0,0 +1,423 @@
1
+ # aiocs Hybrid Search Design
2
+
3
+ Date: 2026-03-28
4
+
5
+ ## Goal
6
+
7
+ Add hybrid retrieval to `aiocs` so agents get better docs results for fuzzy and conceptual queries without weakening the current source/snapshot semantics.
8
+
9
+ This design keeps `aiocs` as the canonical docs system:
10
+
11
+ - `aiocs` owns fetching, normalization, chunking, snapshots, canaries, diffs, and ranking policy
12
+ - SQLite FTS5 remains the primary lexical index and source of truth
13
+ - a dedicated `aiocs-qdrant` container stores only derived embedding vectors for `aiocs`
14
+ - local Ollama generates embeddings
15
+ - SocratiCode remains separate and may later consume selected `aiocs` snapshots, but it is not the runtime for `aiocs` hybrid search
16
+
17
+ ## Non-Goals
18
+
19
+ - No replacement of SQLite FTS5 with vector-only search
20
+ - No reuse of SocratiCode's Qdrant collection or deployment
21
+ - No new hosted service or remote dependency
22
+ - No cross-repo code search in this phase
23
+ - No backup/restore of vector state; vectors are rebuildable
24
+
25
+ ## Why This Shape
26
+
27
+ The current `aiocs` search path in [catalog.ts](/Users/jmucha/repos/mandex/aiocs/src/catalog/catalog.ts) is pure FTS5 BM25 over the latest successful snapshots. That is excellent for exact docs lookups, versioned terms, and API names. It is weaker for:
28
+
29
+ - synonym-heavy prompts
30
+ - conceptual questions
31
+ - vague agent prompts
32
+ - recall across wording shifts in docs
33
+
34
+ Hybrid retrieval is the right improvement, but the ranking policy must remain docs-aware. That is why `aiocs` itself should own the hybrid query plan instead of delegating search semantics to a generic vector layer.
35
+
36
+ ## Recommended Architecture
37
+
38
+ ### 1. Canonical storage remains SQLite
39
+
40
+ SQLite remains the system of record for:
41
+
42
+ - sources
43
+ - snapshots
44
+ - pages
45
+ - chunks
46
+ - project links
47
+ - fetch/canary/daemon metadata
48
+
49
+ Add embedding-specific metadata tables in the same catalog:
50
+
51
+ - `embedding_models`
52
+ - `embedding_jobs`
53
+ - `embedding_state`
54
+
55
+ These track derived vector work, not source content.
56
+
57
+ ### 2. Dedicated `aiocs-qdrant`
58
+
59
+ Ship a separate Qdrant container in `aiocs/docker-compose.yml`:
60
+
61
+ - service name: `aiocs-qdrant`
62
+ - dedicated persistent volume
63
+ - default local URL from `aiocs` runtime
64
+
65
+ This container is strictly for `aiocs`. It must not share collections or lifecycle with SocratiCode.
66
+
67
+ ### 3. Ollama as embedding provider
68
+
69
+ Use Ollama locally for embeddings, with explicit config:
70
+
71
+ - provider: `ollama`
72
+ - model: configurable
73
+ - default model chosen from the local setup you already use for embeddings
74
+
75
+ `aiocs` should own its own embedding config even if the model matches SocratiCode.
76
+
77
+ ### 4. Hybrid retrieval strategy
78
+
79
+ Search modes:
80
+
81
+ - `lexical`
82
+ - `hybrid`
83
+ - `semantic`
84
+ - `auto`
85
+
86
+ Default: `auto`
87
+
88
+ Behavior:
89
+
90
+ - if vector infra is healthy and the target scope has embeddings, use hybrid
91
+ - otherwise fall back to lexical
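The `auto` behavior above amounts to a small resolution step before the query runs. A minimal sketch, with hypothetical type and function names:

```typescript
type SearchMode = "lexical" | "hybrid" | "semantic" | "auto";
type EffectiveMode = Exclude<SearchMode, "auto">;

interface VectorStatus {
  infraHealthy: boolean;       // vector infra reachable and healthy
  scopeHasEmbeddings: boolean; // the target scope has embeddings
}

// Explicit modes pass through unchanged; only `auto` is resolved.
function resolveSearchMode(requested: SearchMode, status: VectorStatus): EffectiveMode {
  if (requested !== "auto") return requested;
  return status.infraHealthy && status.scopeHasEmbeddings ? "hybrid" : "lexical";
}
```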
92
+
93
+ Hybrid query plan:
94
+
95
+ 1. Run FTS5 BM25 against SQLite
96
+ 2. Run vector similarity search in Qdrant
97
+ 3. Fuse result sets with Reciprocal Rank Fusion
98
+ 4. Return the existing `aiocs` chunk shape plus hybrid metadata
99
+
100
+ RRF is preferred over weighted score mixing because:
101
+
102
+ - BM25 and cosine/dot-product scores are not directly comparable
103
+ - RRF is robust across model swaps
104
+ - RRF is simple and stable for agents
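For reference, Reciprocal Rank Fusion scores each chunk as the sum of `1 / (k + rank)` over every ranked list it appears in, using 1-based ranks, so only positions matter and raw BM25 or similarity scores are never compared. The sketch below assumes numeric chunk ids and a hypothetical function name; ties are broken by chunk id purely to keep the output deterministic.

```typescript
interface FusedResult {
  chunkId: number;
  score: number;
}

// Each input ranking is a list of chunk ids ordered best-first.
function reciprocalRankFusion(rankings: number[][], k = 60): FusedResult[] {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([chunkId, score]) => ({ chunkId, score })).sort(
    (a, b) => b.score - a.score || a.chunkId - b.chunkId,
  );
}
```

With a lexical ranking `[1, 2, 3]` and a vector ranking `[3, 1, 4]`, chunk `1` (ranks 1 and 2) comes out ahead of chunk `3` (ranks 3 and 1), followed by the chunks that appear in only one list.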
105
+
106
+ ## Data Model
107
+
108
+ ### SQLite additions
109
+
110
+ #### `embedding_models`
111
+
112
+ Tracks the embedding configuration currently in use.
113
+
114
+ Columns:
115
+
116
+ - `id`
117
+ - `provider`
118
+ - `model`
119
+ - `dimension`
120
+ - `distance_metric`
121
+ - `created_at`
122
+ - `active`
123
+
124
+ #### `embedding_state`
125
+
126
+ Tracks per-chunk embedding lifecycle.
127
+
128
+ Columns:
129
+
130
+ - `chunk_id`
131
+ - `source_id`
132
+ - `snapshot_id`
133
+ - `embedding_model_id`
134
+ - `content_hash`
135
+ - `vector_id`
136
+ - `status` (`pending`, `embedded`, `stale`, `failed`)
137
+ - `last_embedded_at`
138
+ - `last_error`
139
+
140
+ This avoids guessing whether a vector is current.
141
+
142
+ #### `embedding_jobs`
143
+
144
+ Persistent queue for background embedding work.
145
+
146
+ Columns:
147
+
148
+ - `id`
149
+ - `source_id`
150
+ - `snapshot_id`
151
+ - `job_type` (`snapshot_latest`, `snapshot_remove`, `reindex_model`)
152
+ - `status` (`pending`, `running`, `failed`, `completed`)
153
+ - `attempt_count`
154
+ - `last_error`
155
+ - `created_at`
156
+ - `started_at`
157
+ - `finished_at`
158
+
159
+ This queue is important because embedding is slower and more failure-prone than lexical indexing.
160
+
161
+ ### Qdrant payload
162
+
163
+ Each vector point stores:
164
+
165
+ - `chunk_id`
166
+ - `source_id`
167
+ - `snapshot_id`
168
+ - `page_url`
169
+ - `page_title`
170
+ - `section_title`
171
+ - `embedding_model_id`
172
+ - `content_hash`
173
+ - `is_latest_snapshot`
174
+
175
+ The point id should be stable and deterministic per chunk/model, for example:
176
+
177
+ - `${embedding_model_id}:${chunk_id}:${content_hash}`
178
+
179
+ That makes reindexing idempotent.
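A small sketch of why a deterministic key makes re-embedding idempotent: writing the same chunk, model, and content hash twice lands on the same key. All names here are hypothetical, and the in-memory `Map` merely stands in for the vector store. One caveat to verify: as far as I know, Qdrant accepts only unsigned integers and UUIDs as point ids, so a composite string like this would probably need to be mapped to a UUID (for example a name-based one) before being used as the actual point id.

```typescript
interface PointKeyParts {
  embeddingModelId: string;
  chunkId: number;
  contentHash: string;
}

// Deterministic composite key following the `${embedding_model_id}:${chunk_id}:${content_hash}` pattern.
function pointKey({ embeddingModelId, chunkId, contentHash }: PointKeyParts): string {
  return `${embeddingModelId}:${chunkId}:${contentHash}`;
}

// Stand-in for a vector store upsert: the same key overwrites instead of duplicating.
function upsertVector(store: Map<string, number[]>, parts: PointKeyParts, vector: number[]): void {
  store.set(pointKey(parts), vector);
}
```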
180
+
181
+ ## Indexing Lifecycle
182
+
183
+ ### Snapshot write path
184
+
185
+ When `recordSuccessfulSnapshot()` writes chunks:
186
+
187
+ 1. normal SQLite snapshot/page/chunk write happens first
188
+ 2. `aiocs` marks the new latest snapshot as requiring embeddings
189
+ 3. an embedding job is enqueued
190
+
191
+ The fetch path must not block on embedding completion. Search must continue working lexically even with zero vectors.
192
+
193
+ ### Latest-only vector policy
194
+
195
+ For this phase, vectors should be generated only for the latest successful snapshot per source.
196
+
197
+ Rationale:
198
+
199
+ - minimizes vector volume
200
+ - aligns with how current `search()` already targets latest successful snapshots by default
201
+ - avoids wasting GPU/CPU on historical snapshots rarely used in normal retrieval
202
+
203
+ Historical snapshot diffs remain SQLite-only.
204
+
205
+ If a source gets a new latest snapshot:
206
+
207
+ - mark prior latest vectors as stale
208
+ - enqueue cleanup/remove for stale vectors
209
+ - enqueue embedding for the new latest snapshot
210
+
211
+ ### Embedding worker
212
+
213
+ Add an embedding worker loop to the daemon process:
214
+
215
+ - fetch/canary cycle remains unchanged in purpose
216
+ - after refresh work, process a bounded number of embedding jobs
217
+ - retry failed jobs with capped attempts and backoff
218
+
219
+ The daemon becomes the single operational background process for both freshness and vector health.
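"Capped attempts and backoff" could look like the following. The limits shown (8 attempts, 1 second base, 60 second ceiling) and the function name are illustrative assumptions, not values from this design; `attemptCount` corresponds to the `attempt_count` column on `embedding_jobs`.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  baseMs: number;
  capMs: number;
}

// Illustrative defaults only.
const DEFAULT_RETRY_POLICY: RetryPolicy = { maxAttempts: 8, baseMs: 1000, capMs: 60000 };

// Returns the delay before the next retry, or null once the attempt cap is reached.
function nextRetryDelayMs(
  attemptCount: number,
  policy: RetryPolicy = DEFAULT_RETRY_POLICY,
): number | null {
  if (attemptCount >= policy.maxAttempts) return null;
  // Exponential backoff: base * 2^attempts, bounded by the ceiling.
  return Math.min(policy.capMs, policy.baseMs * Math.pow(2, attemptCount));
}
```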
220
+
221
+ ## Query Path
222
+
223
+ ### Lexical path
224
+
225
+ Keep the existing query path as-is for:
226
+
227
+ - `searchMode=lexical`
228
+ - `searchMode=auto` when vectors are unavailable
229
+
230
+ ### Semantic path
231
+
232
+ For `searchMode=semantic`:
233
+
234
+ 1. embed the query with Ollama
235
+ 2. query Qdrant with the same scope constraints
236
+ 3. fetch result chunk records from SQLite by `chunk_id`
237
+ 4. return ordered results
238
+
239
+ ### Hybrid path
240
+
241
+ For `searchMode=hybrid`:
242
+
243
+ 1. run lexical query for top `N`
244
+ 2. run semantic query for top `K`
245
+ 3. fuse with RRF
246
+ 4. fetch canonical chunk data from SQLite
247
+ 5. return result rows with mode metadata
248
+
249
+ Initial defaults:
250
+
251
+ - BM25 candidate window: `40`
252
+ - vector candidate window: `40`
253
+ - final page size: existing `limit`
254
+ - RRF `k`: `60`
255
+
256
+ These should be defined as configurable values, but not exposed for user tuning in the first release.
257
+
258
+ ## Filtering and Invariants
259
+
260
+ All source/snapshot/project scoping remains authoritative in `aiocs`, not in Qdrant.
261
+
262
+ The runtime must:
263
+
264
+ - resolve project scope in SQLite first
265
+ - resolve latest snapshot ids in SQLite first
266
+ - constrain vector retrieval to those snapshot ids
267
+
268
+ This preserves the current guarantee that source/project/snapshot filters are exact.
269
+
270
+ Qdrant is a retrieval backend, not a source of truth.
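One way to enforce the third invariant is to build the vector filter only from snapshot ids that SQLite has already resolved. The sketch below assumes Qdrant's payload filter shape (`must` with a `match.any` condition) as I understand it, and that `snapshot_id` is stored in the point payload as listed earlier; the function name and the choice to return `null` for an empty scope are assumptions.

```typescript
type SnapshotId = string | number;

interface SnapshotFilter {
  must: { key: "snapshot_id"; match: { any: SnapshotId[] } }[];
}

// Returns null when the resolved scope is empty, so the caller can skip the
// vector query entirely instead of sending an unconstrained search.
function buildSnapshotFilter(latestSnapshotIds: SnapshotId[]): SnapshotFilter | null {
  if (latestSnapshotIds.length === 0) return null;
  return {
    must: [{ key: "snapshot_id", match: { any: latestSnapshotIds.slice() } }],
  };
}
```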
271
+
272
+ ## CLI and MCP Changes
273
+
274
+ ### CLI
275
+
276
+ Extend `docs search` with:
277
+
278
+ - `--mode lexical|hybrid|semantic|auto`
279
+
280
+ Add operational commands:
281
+
282
+ - `docs embeddings status`
283
+ - `docs embeddings backfill [source-id|all]`
284
+ - `docs embeddings clear [source-id|all]`
285
+
286
+ Optional later:
287
+
288
+ - `docs embeddings doctor`
289
+
290
+ ### MCP
291
+
292
+ Extend `search` input with `mode`.
293
+
294
+ Add tools:
295
+
296
+ - `embeddings_status`
297
+ - `embeddings_backfill`
298
+ - `embeddings_clear`
299
+
300
+ The existing JSON envelope remains unchanged.
301
+
302
+ ## Docker and Runtime
303
+
304
+ ### `docker-compose.yml`
305
+
306
+ Add:
307
+
308
+ - `aiocs-qdrant`
309
+ - volume for Qdrant storage
310
+ - daemon env vars for Qdrant/Ollama config
311
+
312
+ The daemon container should depend on Qdrant health, not just startup ordering.
313
+
314
+ ### Config
315
+
316
+ Add environment variables:
317
+
318
+ - `AIOCS_SEARCH_MODE_DEFAULT`
319
+ - `AIOCS_QDRANT_URL`
320
+ - `AIOCS_QDRANT_COLLECTION`
321
+ - `AIOCS_EMBEDDING_PROVIDER`
322
+ - `AIOCS_OLLAMA_BASE_URL`
323
+ - `AIOCS_OLLAMA_EMBEDDING_MODEL`
324
+ - `AIOCS_EMBEDDING_BATCH_SIZE`
325
+ - `AIOCS_EMBEDDING_JOB_LIMIT_PER_CYCLE`
326
+
327
+ Defaults should make local Docker + local Ollama work without extra ceremony.
328
+
329
+ ## Doctor and Health
330
+
331
+ Extend `doctor` with new checks:
332
+
333
+ - `qdrant`
334
+ - `embedding-provider`
335
+ - `embedding-coverage`
336
+ - `embedding-backlog`
337
+
338
+ Examples:
339
+
340
+ - pass: vectors are healthy and mostly current
341
+ - warn: lexical search works, but vectors are unavailable or backlog is growing
342
+ - fail: `searchMode=hybrid` default is configured but vector infra is broken
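The three example outcomes can be read as a small decision function. This sketch uses hypothetical names and covers only the cases listed above; treating a broken vector backend as `warn` for every default mode other than `hybrid` is an assumption.

```typescript
type CheckStatus = "pass" | "warn" | "fail";
type DefaultSearchMode = "lexical" | "hybrid" | "semantic" | "auto";

interface VectorHealthInput {
  vectorInfraHealthy: boolean;
  backlogGrowing: boolean;
  defaultMode: DefaultSearchMode;
}

function vectorHealthStatus(input: VectorHealthInput): CheckStatus {
  if (!input.vectorInfraHealthy) {
    // A hybrid default cannot be served without vectors; other modes can still search lexically.
    return input.defaultMode === "hybrid" ? "fail" : "warn";
  }
  return input.backlogGrowing ? "warn" : "pass";
}
```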
343
+
344
+ ## Backups
345
+
346
+ Backups remain SQLite/config only.
347
+
348
+ Do not export Qdrant state in `backup export`.
349
+
350
+ After `backup import`:
351
+
352
+ - mark embeddings stale
353
+ - enqueue re-embedding for current latest snapshots
354
+
355
+ This keeps backup semantics simple and avoids trying to synchronize two storage engines.
356
+
357
+ ## Testing Strategy
358
+
359
+ ### Unit
360
+
361
+ - query mode parsing
362
+ - embedding config validation
363
+ - RRF fusion
364
+ - deterministic vector id generation
365
+ - embedding-state transitions
366
+
367
+ ### Integration
368
+
369
+ - snapshot creation enqueues embedding work
370
+ - daemon processes embedding jobs
371
+ - hybrid search falls back cleanly when vector infra is absent
372
+ - hybrid search respects source/project/snapshot filters
373
+ - `backup import` triggers rebuild behavior
374
+
375
+ ### Docker/runtime
376
+
377
+ - compose config includes dedicated Qdrant service
378
+ - doctor reports degraded state when Qdrant is unreachable
379
+
380
+ ## Migration Strategy
381
+
382
+ 1. add schema and config surfaces
383
+ 2. add Qdrant/Ollama client integration
384
+ 3. add embedding queue and daemon worker
385
+ 4. add hybrid query mode
386
+ 5. add doctor/docs/tests
387
+
388
+ This keeps lexical search live throughout the rollout.
389
+
390
+ ## Risks
391
+
392
+ ### 1. Ranking regressions
393
+
394
+ Mitigation:
395
+
396
+ - lexical remains available
397
+ - `auto` falls back safely
398
+ - RRF instead of fragile score blending
399
+
400
+ ### 2. Embedding backlog growth
401
+
402
+ Mitigation:
403
+
404
+ - latest-only vector policy
405
+ - bounded jobs per cycle
406
+ - explicit backlog health checks
407
+
408
+ ### 3. Vector/schema drift
409
+
410
+ Mitigation:
411
+
412
+ - embedding model registry in SQLite
413
+ - deterministic point ids
414
+ - explicit stale/rebuild lifecycle
415
+
416
+ ## Recommended Next Step
417
+
418
+ Turn this design into an execution plan and implement it in phases, starting with:
419
+
420
+ 1. embedding config + schema
421
+ 2. dedicated Qdrant runtime
422
+ 3. daemon embedding worker
423
+ 4. hybrid search mode
package/docs/README.md ADDED
@@ -0,0 +1,12 @@
1
+ # Docs
2
+
3
+ Keep durable project documentation here.
4
+
5
+ Good candidates:
6
+
7
+ - architecture notes
8
+ - domain concepts
9
+ - operational runbooks
10
+ - decisions worth preserving across sessions
11
+ - Codex integration guidance in `codex-integration.md`
12
+ - reusable agent examples under `examples/codex-agents/`
@@ -0,0 +1,125 @@
1
+ # Codex Integration
2
+
3
+ Use `aiocs` as the local-first documentation system for Codex. The best results come from treating `aiocs` as the authoritative docs runtime and only falling back to live browsing when the catalog is missing, stale, or explicitly bypassed by the user.
4
+
5
+ ## Recommended setup
6
+
7
+ Install the CLI and MCP binary globally:
8
+
9
+ ```bash
10
+ npm install -g @bodhi-ventures/aiocs
11
+ docs --version
12
+ aiocs-mcp
13
+ ```
14
+
15
+ The `aiocs-mcp` process is an MCP stdio server, so running it directly will wait for MCP clients instead of printing interactive help. The useful validation commands are:
16
+
17
+ ```bash
18
+ docs --json doctor
19
+ docs --json init --no-fetch
20
+ ```
21
+
22
+ ## How Codex should use aiocs
23
+
24
+ 1. Prefer `aiocs` before live browsing when the requested docs may already exist locally.
25
+ 2. Prefer MCP through `aiocs-mcp` when Codex can use it.
26
+ 3. Fall back to `docs --json ...` only when MCP is unavailable.
27
+ 4. Check `source_list` before assuming a source is missing or stale.
28
+ 5. Default to `search mode=auto`.
29
+ 6. Use `mode=lexical` for exact identifiers, endpoint names, headings, and error strings.
30
+ 7. Prefer `refresh due <source-id>` over a forced `fetch <source-id>` when the source already exists.
31
+ 8. Use MCP `batch` when multiple list/search/show or search/diff/coverage steps are needed.
32
+ 9. Cite `sourceId`, `snapshotId`, and `pageUrl` when they materially improve traceability.
33
+
34
+ ## Automatic use in Codex
35
+
36
+ Codex does not automatically invoke a custom subagent just because one exists. The primary automatic-use mechanism is the `aiocs` skill itself.
37
+
38
+ To make Codex discover `aiocs` automatically on this machine, expose the skill in the global Codex skill directory:
39
+
40
+ ```bash
41
+ mkdir -p ~/.codex/skills
42
+ ln -sfn /Users/jmucha/repos/mandex/aiocs/skills/aiocs ~/.codex/skills/aiocs
43
+ ```
44
+
45
+ Once that symlink exists, Codex can load the `aiocs` skill directly from the global skills catalog and prefer local docs without you explicitly calling a subagent.
46
+
47
+ ## Subagent options
48
+
49
+ There are two supported subagent patterns:
50
+
51
+ - Repo example for development and debugging:
52
+ [`docs/examples/codex-agents/aiocs-docs-specialist.example.toml`](/Users/jmucha/repos/mandex/aiocs/docs/examples/codex-agents/aiocs-docs-specialist.example.toml)
53
+ - Install-ready global agent definition:
54
+ [`/Users/jmucha/repos/ai-skills/agents/aiocs-docs-specialist.toml`](/Users/jmucha/repos/ai-skills/agents/aiocs-docs-specialist.toml)
55
+
56
+ The repo example is intentionally development-oriented and uses a checkout-local MCP command. The global agent points at the globally installed `aiocs-mcp` binary.
57
+
58
+ To expose the install-ready global agent to Codex on this machine:
59
+
60
+ ```bash
61
+ mkdir -p ~/.codex/agents
62
+ ln -sfn /Users/jmucha/repos/ai-skills/agents/aiocs-docs-specialist.toml ~/.codex/agents/aiocs-docs-specialist.toml
63
+ ```
64
+
65
+ ## Suggested Codex flows
66
+
67
+ Health and bootstrap:
68
+
69
+ ```bash
70
+ docs --json doctor
71
+ docs --json init --no-fetch
72
+ ```
73
+
74
+ Local docs lookup:
75
+
76
+ ```bash
77
+ docs --json source list
78
+ docs --json search "maker flow" --source hyperliquid --mode auto
79
+ docs --json show 42
80
+ ```
81
+
82
+ Missing or stale sources:
83
+
84
+ ```bash
85
+ # user-managed source specs live here
86
+ #   ~/.aiocs/sources
87
+
88
+ docs --json source upsert ~/.aiocs/sources/my-source.yaml
89
+ docs --json refresh due my-source
90
+ ```
91
+
92
+ Drift, change, and completeness:
93
+
94
+ ```bash
95
+ docs --json canary hyperliquid
96
+ docs --json diff hyperliquid
97
+ docs --json verify coverage hyperliquid /absolute/path/to/reference.md
98
+ ```
99
+
100
+ Catalog maintenance:
101
+
102
+ ```bash
103
+ docs --json refresh due hyperliquid
104
+ docs --json embeddings status
105
+ docs --json backup export /absolute/path/to/backup
106
+ ```
107
+
108
+ ## MCP-first guidance
109
+
110
+ If a Codex agent has access to the `aiocs-mcp` server, prefer these MCP tools over shelling out:
111
+
112
+ - `doctor`
113
+ - `init`
114
+ - `source_list`
115
+ - `source_upsert`
116
+ - `search`
117
+ - `show`
118
+ - `canary`
119
+ - `refresh_due`
120
+ - `diff_snapshots`
121
+ - `verify_coverage`
122
+ - `embeddings_status`
123
+ - `batch`
124
+
125
+ The CLI remains the fallback and should always be invoked with `--json` for agent use. For normal answering flows, avoid `fetch all`; use targeted due refresh or explicit user-approved force fetches.