@bodhi-ventures/aiocs 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 jmucha
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,488 @@
1
+ # aiocs
2
+
3
+ Local-only documentation fetch, versioning, and search CLI for AI agents.
4
+
5
+ ## What it does
6
+
7
+ - fetches docs from websites with Playwright
8
+ - supports authenticated sources via environment-backed headers and cookies
9
+ - runs lightweight canaries to detect source drift before full refreshes
10
+ - normalizes them into Markdown
11
+ - stores immutable local snapshots in a shared catalog
12
+ - diffs snapshots to show what changed between fetches
13
+ - indexes heading-aware chunks in SQLite FTS5
14
+ - adds optional hybrid retrieval with local Ollama embeddings and a dedicated Qdrant vector index
15
+ - links docs sources to local projects for scoped search
16
+ - exports and imports manifest-backed backups for `~/.aiocs`
17
+
18
+ All state is local. By default, data lives under `~/.aiocs`:
19
+
20
+ - data: `~/.aiocs/data`
21
+ - config: `~/.aiocs/config`
22
+
23
+ For testing or local overrides, set:
24
+
25
+ - `AIOCS_DATA_DIR`
26
+ - `AIOCS_CONFIG_DIR`
27
+
28
+ ## Install
29
+
30
+ ```bash
31
+ npm install -g @bodhi-ventures/aiocs
32
+ docs --version
33
+ ```
34
+
35
+ For repository development:
36
+
37
+ ```bash
38
+ pnpm install
39
+ pnpm build
40
+ ```
41
+
42
+ Run the CLI during development with:
43
+
44
+ ```bash
45
+ pnpm dev -- --help
46
+ ```
47
+
48
+ Or after build:
49
+
50
+ ```bash
51
+ ./dist/cli.js --help
52
+ ```
53
+
54
+ For AI agents, prefer the root-level `--json` flag for one-shot commands:
55
+
56
+ ```bash
57
+ docs --json version
58
+ docs --json doctor
59
+ docs --json init --no-fetch
60
+ pnpm dev -- --json source list
61
+ pnpm dev -- --json search "maker flow" --source hyperliquid
62
+ pnpm dev -- --json show 42
63
+ ```
64
+
65
+ `--json` emits exactly one JSON document to stdout with this envelope:
66
+
67
+ ```json
68
+ {
69
+ "ok": true,
70
+ "command": "search",
71
+ "data": {
72
+ "total": 0,
73
+ "limit": 20,
74
+ "offset": 0,
75
+ "hasMore": false,
76
+ "results": []
77
+ }
78
+ }
79
+ ```
80
+
81
+ Failures still exit with status `1`, but emit a JSON error document instead of human text:
82
+
83
+ ```json
84
+ {
85
+ "ok": false,
86
+ "command": "show",
87
+ "error": {
88
+ "code": "CHUNK_NOT_FOUND",
89
+ "message": "Chunk 42 not found"
90
+ }
91
+ }
92
+ ```
93
+
94
+ The full stable JSON contract lives in [docs/json-contract.md](./docs/json-contract.md).
95
+
96
+ ## Release
97
+
98
+ Stable releases are tag-driven. Bump `package.json.version`, commit the change, then create and push a matching stable tag:
99
+
100
+ ```bash
101
+ git add package.json
102
+ git commit -m "release: vX.Y.Z"
103
+ git tag vX.Y.Z
104
+ git push origin main
105
+ git push origin vX.Y.Z
106
+ ```
107
+
108
+ GitHub Actions publishes `@bodhi-ventures/aiocs` publicly to npm and creates the GitHub release only from pushed tags matching `vX.Y.Z`. The workflow validates that the tag exactly matches `package.json.version` and is safe to rerun after partial success.
109
+
110
+ ## Codex integration
111
+
112
+ For Codex-first setup, automatic-use guidance, MCP recommendations, and subagent examples, see [docs/codex-integration.md](./docs/codex-integration.md).
113
+
114
+ ## Built-in sources
115
+
116
+ Initial source specs are shipped in `sources/`:
117
+
118
+ - `synthetix`
119
+ - `hyperliquid`
120
+ - `lighter`
121
+ - `nado`
122
+ - `ethereal`
123
+
124
+ Bootstrap them in one command:
125
+
126
+ ```bash
127
+ docs init --no-fetch
128
+ docs --json init --no-fetch
129
+ ```
130
+
131
+ Validate the machine before bootstrapping:
132
+
133
+ ```bash
134
+ docs doctor
135
+ docs --json doctor
136
+ ```
137
+
138
+ ## Workflow
139
+
140
+ Register a source:
141
+
142
+ ```bash
143
+ mkdir -p ~/.aiocs/sources
144
+ cp /path/to/source.yaml ~/.aiocs/sources/my-source.yaml
145
+ pnpm dev -- source upsert ~/.aiocs/sources/my-source.yaml
146
+ pnpm dev -- source upsert /path/to/source.yaml
147
+ pnpm dev -- source list
148
+ ```
149
+
150
+ Fetch and snapshot docs:
151
+
152
+ ```bash
153
+ pnpm dev -- refresh due hyperliquid
154
+ pnpm dev -- snapshot list hyperliquid
155
+ pnpm dev -- refresh due
156
+ ```
157
+
158
+ Force fetch remains available for explicit maintenance:
159
+
160
+ ```bash
161
+ pnpm dev -- fetch hyperliquid
162
+ pnpm dev -- fetch all
163
+ ```
164
+
165
+ Link docs to a local project:
166
+
167
+ ```bash
168
+ pnpm dev -- project link /absolute/path/to/project hyperliquid lighter
169
+ pnpm dev -- project unlink /absolute/path/to/project lighter
170
+ ```
171
+
172
+ Search and inspect results:
173
+
174
+ ```bash
175
+ pnpm dev -- search "maker flow" --source hyperliquid
176
+ pnpm dev -- search "maker flow" --source hyperliquid --mode lexical
177
+ pnpm dev -- search "maker flow" --source hyperliquid --mode hybrid
178
+ pnpm dev -- search "maker flow" --source hyperliquid --mode semantic
179
+ pnpm dev -- search "maker flow" --all
180
+ pnpm dev -- search "maker flow" --source hyperliquid --limit 5 --offset 0
181
+ pnpm dev -- show 42
182
+ pnpm dev -- canary hyperliquid
183
+ pnpm dev -- diff hyperliquid
184
+ pnpm dev -- embeddings status
185
+ pnpm dev -- embeddings backfill all
186
+ pnpm dev -- embeddings run
187
+ pnpm dev -- backup export /absolute/path/to/backup
188
+ pnpm dev -- verify coverage hyperliquid /absolute/path/to/reference.md
189
+ ```
190
+
191
+ When `docs search` runs inside a linked project, it automatically scopes to that project's linked sources unless `--source` or `--all` is provided.
192
+
193
+ For agents, the intended decision order is:
194
+
195
+ 1. check `source list` or scoped `search` first
196
+ 2. if the source exists and is due, run `refresh due <source-id>`
197
+ 3. if the source is missing but worth reusing, add a spec under `~/.aiocs/sources`, then upsert and refresh only that source
198
+ 4. avoid `fetch all` unless the user explicitly asks or the daemon is doing maintenance
199
+
200
+ ### Hybrid search
201
+
202
+ `aiocs` keeps SQLite FTS5/BM25 as the canonical lexical index and adds an optional hybrid layer:
203
+
204
+ - `--mode lexical`: lexical search only
205
+ - `--mode hybrid`: BM25 plus vector recall fused with reciprocal-rank fusion
206
+ - `--mode semantic`: vector-only recall over the latest indexed snapshots
207
+ - `--mode auto`: default; uses hybrid only when the vector layer is healthy and current for the requested scope
208
+
209
+ Vector state is derived from the catalog, not a second source of truth. If Ollama or Qdrant is unavailable, `auto` degrades back to lexical search.
210
+
211
+ ### Authenticated sources
212
+
213
+ Source specs can reference secrets from the environment without storing raw values in YAML:
214
+
215
+ ```yaml
216
+ auth:
217
+ headers:
218
+ - name: authorization
219
+ valueFromEnv: AIOCS_DOCS_TOKEN
220
+ hosts:
221
+ - docs.example.com
222
+ include:
223
+ - https://docs.example.com/private/**
224
+ cookies:
225
+ - name: session
226
+ valueFromEnv: AIOCS_DOCS_SESSION
227
+ domain: docs.example.com
228
+ path: /
229
+ ```
230
+
231
+ Header secrets are scoped per entry. If `hosts` is omitted, the header applies to the source `allowedHosts`; `include` can further narrow it to specific URL patterns.
232
+
233
+ ### Canary checks
234
+
235
+ Canaries execute the real extraction strategy without creating snapshots. They are intended to catch selector/copy-markdown drift before a full refresh degrades silently.
236
+
237
+ ```yaml
238
+ canary:
239
+ everyHours: 6
240
+ checks:
241
+ - url: https://docs.example.com/start
242
+ expectedTitle: Private Docs Start
243
+ expectedText: Secret market structure docs
244
+ minMarkdownLength: 40
245
+ ```
246
+
247
+ If `canary` is omitted, `aiocs` defaults to a lightweight canary against the first `startUrl`.
248
+
249
+ ### Backups
250
+
251
+ `backup export` creates a manifest-backed directory snapshot. The catalog database is exported with SQLite's native backup mechanism so the backup stays consistent even if `aiocs` is reading or writing the catalog while the export runs.
252
+
253
+ Backups intentionally include only the canonical `~/.aiocs` data/config state. The Qdrant vector index is treated as derived state and is rebuilt from the restored catalog after `backup import`.
254
+
255
+ ## JSON command reference
256
+
257
+ All one-shot commands support `--json`:
258
+
259
+ - `version`
260
+ - `init`
261
+ - `doctor`
262
+ - `source upsert`
263
+ - `source list`
264
+ - `fetch`
265
+ - `canary`
266
+ - `refresh due`
267
+ - `snapshot list`
268
+ - `diff`
269
+ - `project link`
270
+ - `project unlink`
271
+ - `backup export`
272
+ - `backup import`
273
+ - `embeddings status`
274
+ - `embeddings backfill`
275
+ - `embeddings clear`
276
+ - `embeddings run`
277
+ - `search`
278
+ - `verify coverage`
279
+ - `show`
280
+
281
+ Representative examples:
282
+
283
+ ```bash
284
+ pnpm dev -- --json doctor
285
+ pnpm dev -- --json init --no-fetch
286
+ pnpm dev -- --json source list
287
+ pnpm dev -- --json source upsert sources/hyperliquid.yaml
288
+ pnpm dev -- --json refresh due hyperliquid
289
+ pnpm dev -- --json canary hyperliquid
290
+ pnpm dev -- --json refresh due
291
+ pnpm dev -- --json diff hyperliquid
292
+ pnpm dev -- --json embeddings status
293
+ pnpm dev -- --json embeddings backfill all
294
+ pnpm dev -- --json embeddings clear hyperliquid
295
+ pnpm dev -- --json embeddings run
296
+ pnpm dev -- --json project link /absolute/path/to/project hyperliquid lighter
297
+ pnpm dev -- --json snapshot list hyperliquid
298
+ pnpm dev -- --json backup export /absolute/path/to/backup
299
+ pnpm dev -- --json verify coverage hyperliquid /absolute/path/to/reference.md
300
+ ```
301
+
302
+ For multi-result commands like `fetch`, `refresh due`, and `search`, `data` contains structured collections rather than line-by-line output:
303
+
304
+ ```json
305
+ {
306
+ "ok": true,
307
+ "command": "search",
308
+ "data": {
309
+ "query": "maker flow",
310
+ "total": 42,
311
+ "limit": 20,
312
+ "offset": 0,
313
+ "hasMore": true,
314
+ "modeRequested": "auto",
315
+ "modeUsed": "hybrid",
316
+ "results": []
317
+ }
318
+ }
319
+ ```
320
+
321
+ ## Daemon
322
+
323
+ `aiocs` ships a first-class long-running refresh process:
324
+
325
+ ```bash
326
+ pnpm dev -- daemon
327
+ ./dist/cli.js daemon
328
+ ```
329
+
330
+ The daemon bootstraps source specs from the configured directories, refreshes due sources, sleeps for the configured interval, and repeats.
331
+ Configured source spec directories are treated as the daemon’s source of truth:
332
+
333
+ - if a managed source spec changes, the source is made due immediately in the same daemon cycle
334
+ - if a managed source spec is removed from disk, the source is removed from the catalog on the next bootstrap
335
+ - if `AIOCS_SOURCE_SPEC_DIRS` is explicitly set but resolves to missing or empty directories, the daemon fails fast instead of silently idling
336
+ - due canaries run independently from full fetch schedules so drift is caught earlier than the next full snapshot refresh
337
+ - daemon heartbeat state is persisted in the local catalog and surfaced through `docs doctor`
338
+ - queued embedding jobs are processed in the same daemon cycle after fetches complete
339
+
340
+ Environment variables:
341
+
342
+ - `AIOCS_DAEMON_INTERVAL_MINUTES`
343
+ - positive integer, defaults to `60`
344
+ - `AIOCS_DAEMON_FETCH_ON_START`
345
+ - `true` by default
346
+ - accepted values: `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`
347
+ - `AIOCS_SOURCE_SPEC_DIRS`
348
+ - comma-separated list of source spec directories
349
+ - defaults to `~/.aiocs/sources`, the bundled `sources/` path, plus `/app/sources` inside Docker when present
350
+
351
+ For local agents, the daemon keeps the shared catalog under `~/.aiocs` warm while agents continue to use the normal CLI with `--json`.
352
+
353
+ ### Daemon JSON mode
354
+
355
+ `docs daemon --json` is intentionally different from one-shot commands. Because it is long-running, it emits one JSON event per line:
356
+
357
+ ```bash
358
+ ./dist/cli.js --json daemon
359
+ ```
360
+
361
+ Example event stream:
362
+
363
+ ```json
364
+ {"type":"daemon.started","intervalMinutes":60,"fetchOnStart":true,"sourceSpecDirs":["/app/sources"]}
365
+ {"type":"daemon.cycle.started","reason":"startup","startedAt":"2026-03-26T00:00:00.000Z"}
366
+ {"type":"daemon.cycle.completed","reason":"startup","result":{"canaryDueSourceIds":[],"dueSourceIds":[],"bootstrapped":{"processedSpecCount":5,"sources":[]},"canaried":[],"canaryFailed":[],"refreshed":[],"failed":[],"embedded":[],"embeddingFailed":[]}}
367
+ ```
368
+
369
+ ## MCP server
370
+
371
+ `aiocs` also ships an MCP server binary for tool-native agent integrations:
372
+
373
+ ```bash
374
+ aiocs-mcp
375
+ pnpm dev:mcp
376
+ ```
377
+
378
+ The MCP server exposes the same shared operations as the CLI without shell parsing:
379
+
380
+ - `version`
381
+ - `doctor`
382
+ - `init`
383
+ - `source_upsert`
384
+ - `source_list`
385
+ - `canary`
386
+ - `fetch`
387
+ - `refresh_due`
388
+ - `snapshot_list`
389
+ - `diff_snapshots`
390
+ - `project_link`
391
+ - `project_unlink`
392
+ - `embeddings_status`
393
+ - `embeddings_backfill`
394
+ - `embeddings_clear`
395
+ - `embeddings_run`
396
+ - `backup_export`
397
+ - `backup_import`
398
+ - `search`
399
+ - `show`
400
+ - `verify_coverage`
401
+ - `batch`
402
+
403
+ ## Release automation
404
+
405
+ The repo ships two GitHub Actions workflows:
406
+
407
+ - [ci.yml](./.github/workflows/ci.yml): validation for lint, tests, build, pack, and Docker smoke coverage
408
+ - [release.yml](./.github/workflows/release.yml): tag-driven stable release flow that validates the tagged package state, publishes to npm, and creates a GitHub release
409
+
410
+ The release workflow is triggered only by pushed stable tags matching `vX.Y.Z` and expects `NPM_TOKEN` in repository secrets. The release job is retryable: if `@bodhi-ventures/aiocs@X.Y.Z` already exists on npm or the GitHub release already exists for `vX.Y.Z`, the workflow skips the completed publication step and finishes the remaining one.
411
+
412
+ Successful MCP results use an envelope:
413
+
414
+ ```json
415
+ {
416
+ "ok": true,
417
+ "data": {
418
+ "name": "@bodhi-ventures/aiocs",
419
+ "version": "0.1.0"
420
+ }
421
+ }
422
+ ```
423
+
424
+ Failed MCP results use the same machine-readable error shape:
425
+
426
+ ```json
427
+ {
428
+ "ok": false,
429
+ "error": {
430
+ "code": "CHUNK_NOT_FOUND",
431
+ "message": "Chunk 42 not found"
432
+ }
433
+ }
434
+ ```
435
+
436
+ ## Docker
437
+
438
+ The repo ships a long-running Docker service for scheduled refreshes.
439
+
440
+ Build and start it with:
441
+
442
+ ```bash
443
+ docker compose up --build -d
444
+ ```
445
+
446
+ The compose file:
447
+
448
+ - runs `docs daemon` as the container entrypoint
449
+ - bind-mounts `${HOME}/.aiocs` into `/root/.aiocs` so the container shares the same local catalog defaults as the host CLI
450
+ - bind-mounts `./sources` into `/app/sources` so source spec edits are picked up without rebuilding
451
+ - runs a dedicated `aiocs-qdrant` container for vector search
452
+ - points the daemon at host Ollama with `AIOCS_OLLAMA_BASE_URL` (defaults to `http://host.docker.internal:11434` in Compose)
453
+
454
+ Override cadence with environment variables when starting compose:
455
+
456
+ ```bash
457
+ AIOCS_DAEMON_INTERVAL_MINUTES=15 docker compose up --build -d
458
+ ```
459
+
460
+ ## Source spec shape
461
+
462
+ Each source spec is YAML or JSON and must define:
463
+
464
+ - `id`
465
+ - `label`
466
+ - `startUrls`
467
+ - `allowedHosts`
468
+ - `discovery.include`
469
+ - `discovery.exclude`
470
+ - `discovery.maxPages`
471
+ - `extract`
472
+ - `normalize`
473
+ - `schedule.everyHours`
474
+
475
+ Supported extraction strategies:
476
+
477
+ - `clipboardButton`
478
+ - `selector`
479
+ - `readability`
480
+
481
+ ## Verification
482
+
483
+ ```bash
484
+ pnpm lint
485
+ pnpm test
486
+ pnpm build
487
+ npm pack --dry-run
488
+ ```