@c3-oss/prosa 0.6.1 → 0.8.0

package/README.md DELETED
@@ -1,655 +0,0 @@
1
- # prosa
2
-
3
- `prosa` is a local-first CLI for compiling, searching, auditing, and exporting
4
- agent session histories.
5
-
6
- It imports local histories from Codex CLI, Claude Code, Gemini CLI, and Cursor
7
- into one canonical bundle so you can search across tools, inspect prior work,
8
- export readable transcripts, and run analytical queries without giving up the
9
- original raw data.
10
-
11
- ## What it does
12
-
13
- - Imports session histories from multiple agent CLIs into a single local store.
14
- - Preserves raw source files and raw records for future re-processing.
15
- - Normalizes sessions, messages, tool calls, tool results, artifacts, and graph
16
- edges into SQLite tables.
17
- - Builds searchable derived indexes over messages, commands, paths, and result
18
- previews.
19
- - Lists and filters sessions by source and timestamp.
20
- - Exports individual sessions as Markdown.
21
- - Exports canonical tables to Parquet for DuckDB analytics.
22
- - Runs built-in analytics reports over Parquet with DuckDB.
23
- - Provides an Ink-based terminal UI for browsing sessions and search results.
24
- - Serves a read-only MCP server over the local bundle for agent memory access.
25
-
26
- `prosa` is early software, but the main CLI surfaces described below are
27
- implemented.
28
-
29
- ## Quick start
30
-
31
- From this repository:
32
-
33
- ```bash
34
- devbox shell
35
- pnpm install
36
- pnpm build
37
- ```
38
-
39
- During development, run commands through SWC:
40
-
41
- ```bash
42
- pnpm dev -- init
43
- pnpm dev -- compile codex
44
- pnpm dev -- sessions
45
- pnpm dev -- search "terraform"
46
- ```
47
-
48
- After building or installing the package, use the `prosa` binary:
49
-
50
- ```bash
51
- prosa init
52
- prosa compile-all
53
-
54
- prosa sessions --source codex --since 2026-01-01
55
- prosa search "package.json"
56
- prosa export session <session-id> --format markdown --out session.md
57
- prosa export parquet
58
- prosa query duckdb "select source_tool, count(*) from sessions group by 1"
59
- prosa analytics tools --refresh
60
- prosa tui
61
- prosa mcp serve
62
- ```
63
-
64
- By default, the bundle is stored at `~/.prosa`. Override it with `--store` or
65
- the `PROSA_STORE` environment variable:
66
-
67
- ```bash
68
- PROSA_STORE=/tmp/prosa-demo prosa init
69
- prosa sessions --store /tmp/prosa-demo
70
- ```
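The override rule above can be sketched as a small shell function. This is only an illustration: `resolve_store` is a hypothetical helper, not part of `prosa`, and the precedence it encodes (an explicit `--store` value first, then `PROSA_STORE`, then `~/.prosa`) is an assumption, since the README does not say which wins when both are set.

```bash
# Hypothetical helper, not part of prosa: choose the bundle directory.
# Assumed precedence: explicit --store value, then PROSA_STORE, then ~/.prosa.
resolve_store() {
  local flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${PROSA_STORE:-}" ]; then
    printf '%s\n' "$PROSA_STORE"
  else
    printf '%s\n' "$HOME/.prosa"
  fi
}
```

Calling `resolve_store` with no argument and no `PROSA_STORE` set falls back to the default location under `$HOME`.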
71
-
72
- ## Supported sources
73
-
74
- `prosa compile` imports one source at a time. If `--sessions-path` is omitted,
75
- the provider default is used:
76
-
77
- ```bash
78
- prosa compile codex [--sessions-path <path>]
79
- prosa compile claude [--sessions-path <path>]
80
- prosa compile gemini [--sessions-path <path>]
81
- prosa compile cursor [--sessions-path <path>]
82
- prosa compile-all [--verbose] [--json-logs]
83
- ```
84
-
85
- Supported importers:
86
-
87
- | Source | Typical path | Imported files |
88
- |---|---|---|
89
- | Codex CLI | `~/.codex/sessions` | Recursive `.jsonl` session files |
90
- | Claude Code | `~/.claude/projects` | Project JSONL files and subagent JSONL files |
91
- | Gemini CLI | `~/.gemini/tmp` | `chats/session-*.json` snapshots |
92
- | Cursor | `~/.cursor/chats` | `store.db` SQLite agent stores |
93
-
94
- Imports are idempotent for already-seen source files. Each import reports counts
95
- for source files, sessions, messages, tool calls, tool results, artifacts,
96
- edges, and errors.
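One common way to make imports idempotent is to record a content hash per source file and skip files whose hash is already known. The sketch below shows that idea only; it is not `prosa`'s implementation. `LEDGER` and `import_once` are hypothetical names, and `sha256sum` stands in for whatever idempotency key the real importer uses.

```bash
# Hypothetical sketch: skip source files whose content hash is already recorded.
# LEDGER and import_once are illustrative; sha256sum is a stand-in key.
LEDGER="$(mktemp)"

import_once() {
  local file="$1" digest
  digest="$(sha256sum "$file" | cut -d' ' -f1)"
  if grep -qx "$digest" "$LEDGER"; then
    # Same bytes were seen before, so nothing is re-imported.
    echo "skipped"
  else
    printf '%s\n' "$digest" >> "$LEDGER"
    echo "imported"
  fi
}
```

Running `import_once` twice on the same unchanged file reports `imported` the first time and `skipped` the second.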
97
-
98
- `prosa compile` always disables FTS5 triggers during the import loop and
99
- rebuilds the FTS5 index in bulk at the end (mirroring how the Tantivy sidecar
100
- is rebuilt). Sidecars stay in sync without a manual step. For recovery, the
101
- standalone `prosa index fts5` command is still available.
102
-
103
- ## CLI reference
104
-
105
- ### `prosa init`
106
-
107
- Initialize a bundle:
108
-
109
- ```bash
110
- prosa init
111
- prosa init --store /path/to/bundle
112
- ```
113
-
114
- If the bundle already exists, `init` exits with an error unless
115
- `--force-existing` is passed:
116
-
117
- ```bash
118
- prosa init --force-existing
119
- ```
120
-
121
- ### `prosa compile`
122
-
123
- Import session histories into the bundle:
124
-
125
- ```bash
126
- prosa compile codex
127
- prosa compile claude
128
- prosa compile gemini
129
- prosa compile cursor
130
- ```
131
-
132
- Override a provider source path:
133
-
134
- ```bash
135
- prosa compile codex --sessions-path ~/custom/codex/sessions
136
- ```
137
-
138
- Import every supported provider with default paths:
139
-
140
- ```bash
141
- prosa compile-all
142
- ```
143
-
144
- Options:
145
-
146
- | Option | Description |
147
- |---|---|
148
- | `--sessions-path <path>` | Root of the selected provider's session history |
149
- | `--store <path>` | Bundle directory |
150
- | `--verbose` | Emit debug logs during compilation |
151
- | `--json-logs` | Emit raw JSON logs instead of pretty logs |
152
-
153
- `prosa compile-all` accepts only the logging flags. It uses provider defaults and
154
- the normal `PROSA_STORE` environment variable when the bundle path must be
155
- overridden.
156
-
157
- ### `prosa index`
158
-
159
- Build or inspect derived search indexes:
160
-
161
- ```bash
162
- prosa index status
163
- prosa index fts5
164
- prosa index tantivy
165
- ```
166
-
167
- `fts5` is the default SQLite full-text index. `prosa compile` rebuilds it in
168
- bulk at the end of every import; `prosa index fts5` is a standalone recovery
169
- path that repopulates the index from `search_docs`.
170
-
171
- `tantivy` is an optional sidecar search index. Build it before searching with
172
- `--engine tantivy`:
173
-
174
- ```bash
175
- prosa index tantivy
176
- prosa search "migration error" --engine tantivy
177
- ```
178
-
179
- `index status` supports machine-readable output:
180
-
181
- ```bash
182
- prosa index status --output-format json
183
- ```
184
-
185
- ### `prosa sessions`
186
-
187
- List sessions in the bundle:
188
-
189
- ```bash
190
- prosa sessions
191
- prosa sessions --source claude
192
- prosa sessions --since 2026-01-01
193
- prosa sessions --until 2026-02-01
194
- prosa sessions --limit 100
195
- ```
196
-
197
- Count sessions with the same filters:
198
-
199
- ```bash
200
- prosa sessions count
201
- prosa sessions count --source cursor --since 2026-01-01
202
- ```
203
-
204
- Session list output includes timestamp, source tool, a 12-char `session_id`
205
- prefix, model, message count, tool call count, and title by default. Use
206
- `--columns all` to include `cwd_initial`, `source_session_id`,
207
- `parent_session_id`, `is_subagent`, `git_branch_initial`, `model_first`,
208
- `status`, `timeline_confidence`, and `end_ts`. Pass a CSV list to pick a
209
- subset (`--columns start_ts,session_id,title`).
210
-
211
- Output formats:
212
-
213
- ```bash
214
- prosa sessions --output-format table
215
- prosa sessions --output-format json
216
- prosa sessions --output-format csv
217
- prosa sessions --columns all
218
- prosa sessions --columns start_ts,session_id,title
219
- ```
220
-
221
- `table` and `interactive` outputs are width-aware: long values are truncated
222
- with `…` to fit the terminal (or 200 columns when piped). `json` and `csv`
223
- always emit full values. Use `prosa tui` for the interactive browser.
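The truncation behaviour described above can be approximated in a few lines of shell. `truncate_cell` is a hypothetical helper, not `prosa` code; it assumes ASCII input and a width of at least 2, and it treats `…` as one column.

```bash
# Hypothetical helper: shorten a cell to at most $2 characters, ending with "…".
# Assumes ASCII input and width >= 2.
truncate_cell() {
  local text="$1" width="$2"
  if [ "${#text}" -le "$width" ]; then
    # Value already fits, so print it unchanged.
    printf '%s\n' "$text"
  else
    # Keep width-1 characters and spend the last column on the ellipsis.
    printf '%s…\n' "$(printf '%s' "$text" | cut -c1-"$((width - 1))")"
  fi
}
```

A caller could pass the terminal width, or 200 when output is piped, as the second argument.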
224
-
225
- ### `prosa search`
226
-
227
- Search messages, tool calls, paths, commands, and result previews:
228
-
229
- ```bash
230
- prosa search "terraform"
231
- prosa search "package.json" --limit 20
232
- prosa search "failed migration" --output-format json
233
- prosa search "schema update" --engine fts5
234
- prosa search "schema update" --engine tantivy
235
- ```
236
-
237
- The default engine is `fts5`. The Tantivy engine requires a sidecar index:
238
-
239
- ```bash
240
- prosa index tantivy
241
- prosa search "indexing" --engine tantivy
242
- ```
243
-
244
- Search output includes timestamp, role, tool name, session ID, and a snippet.
245
-
246
- ### `prosa export session`
247
-
248
- Export a single session as Markdown:
249
-
250
- ```bash
251
- prosa export session <session-id> --format markdown
252
- prosa export session <session-id> --format markdown --out transcript.md
253
- ```
254
-
255
- Markdown exports include source metadata, prosa and source session IDs,
256
- timestamps, working directory, branch, model span, timeline confidence,
257
- messages, and related tool calls.
258
-
259
- Large outputs are not intended to be dumped wholesale into Markdown. The export
260
- renders useful previews while the full bytes remain in the content-addressed
261
- object store.
262
-
263
- ### `prosa export parquet`
264
-
265
- Export canonical SQLite tables to Parquet:
266
-
267
- ```bash
268
- prosa export parquet
269
- prosa export parquet --out /tmp/prosa-parquet
270
- ```
271
-
272
- The export writes one `.parquet` file per canonical table plus a manifest. These
273
- files are derived analytics snapshots, not the source of truth.
274
-
275
- Exported tables include:
276
-
277
- ```text
278
- objects, source_files, import_batches, raw_records, import_errors,
279
- uncertainties, projects, sessions, turns, events, messages, content_blocks,
280
- tool_calls, tool_results, artifacts, edges, search_docs
281
- ```
282
-
283
- ### `prosa query duckdb`
284
-
285
- Run DuckDB SQL over exported Parquet tables:
286
-
287
- ```bash
288
- prosa export parquet
289
- prosa query duckdb "select source_tool, count(*) from sessions group by 1"
290
- ```
291
-
292
- Use a custom Parquet directory:
293
-
294
- ```bash
295
- prosa query duckdb \
296
- --parquet-dir /tmp/prosa-parquet \
297
- "select tool_name, count(*) from tool_calls group by 1 order by 2 desc"
298
- ```
299
-
300
- Output formats:
301
-
302
- ```bash
303
- prosa query duckdb "select count(*) as n from sessions" --output-format json
304
- prosa query duckdb "select * from sessions limit 10" --output-format csv
305
- ```
306
-
307
- `prosa query duckdb` also exposes derived analytics views:
308
-
309
- ```text
310
- session_facts, tool_usage_facts, error_facts, model_usage, project_activity
311
- ```
312
-
313
- See [`docs/recipes/duckdb.md`](./docs/recipes/duckdb.md) for copy-pasteable
314
- queries.
315
-
316
- ### `prosa analytics`
317
-
318
- Run built-in reports over exported Parquet files:
319
-
320
- ```bash
321
- prosa analytics sessions --refresh
322
- prosa analytics tools --source codex
323
- prosa analytics errors --output-format json
324
- prosa analytics models --since 2026-01-01
325
- prosa analytics projects --project /Users/me/app
326
- ```
327
-
328
- Reports require Parquet files. Add `--refresh` to export Parquet before running
329
- the report. All reports support `--store`, `--parquet-dir`, `--source`,
330
- `--since`, `--until`, `--limit`, `--output-format table|json|csv`, and
331
- `--columns <list>` for column selection.
332
-
333
- Table output is curated to fit a normal terminal: `analytics sessions` shows
334
- 9 columns by default (drops `source_file_path`, `session_id`,
335
- `source_session_id`, `tool_result_count`, `tool_duration_ms`, and
336
- `timeline_confidence`), `analytics projects` drops `project_path`, and
337
- `analytics errors` drops `session_id` and the full `message` (the shorter
338
- `preview` keeps the signal). Use `--columns all` to get every column the SQL
339
- returns, or `--columns col1,col2` to pick specific ones:
340
-
341
- ```bash
342
- prosa analytics sessions --columns all
343
- prosa analytics sessions --columns start_ts,project_name,source_file_path
344
- prosa analytics errors --columns all # includes the full `message`
345
- ```
346
-
347
- `json` and `csv` output always include every column regardless of `--columns`.
348
-
349
- Additional filters:
350
-
351
- ```bash
352
- prosa analytics tools --tool-name Bash --errors-only
353
- prosa analytics tools --canonical-type shell
354
- prosa analytics errors --category tool_result
355
- prosa analytics models --model gpt-5.4
356
- ```
357
-
358
- ### `prosa tui`
359
-
360
- Open the Ink-based interactive explorer:
361
-
362
- ```bash
363
- prosa tui
364
- prosa tui --store /path/to/bundle
365
- ```
366
-
367
- Key bindings:
368
-
369
- | Key | Action |
370
- |---|---|
371
- | `j` / `k` or arrows | Move selection or scroll detail view |
372
- | `Enter` | Open the selected session |
373
- | `/` | Search |
374
- | `s` | Cycle source filter |
375
- | `R` | Reload |
376
- | `Esc` | Return to the session list |
377
- | `gg` / `G` | Jump to top or bottom |
378
- | `Ctrl-d` / `Ctrl-u` | Half-page down or up |
379
- | `q` | Quit from the session list |
380
-
381
- ### `prosa mcp serve`
382
-
383
- Start a local read-only MCP server over the bundle. The default transport is
384
- stdio, suitable for MCP clients that launch a command through `npx` or a local
385
- binary:
386
-
387
- ```bash
388
- prosa mcp serve
389
- npx @c3-oss/prosa mcp serve
390
- prosa mcp serve --transport stdio
391
- prosa mcp serve --search-engine tantivy
392
- ```
393
-
394
- Example MCP client command config:
395
-
396
- ```json
397
- {
398
- "command": "npx",
399
- "args": ["@c3-oss/prosa", "mcp", "serve"]
400
- }
401
- ```
402
-
403
- In stdio mode, stdout is reserved for MCP JSON-RPC frames. Do not expect normal
404
- human-readable startup logs on stdout.
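The practical consequence is that a client reading stdout should see nothing but frames. The stand-in below illustrates the separation; `fake_stdio_server` is hypothetical, and routing diagnostics to stderr is an assumption about how such a server would usually behave, not something the README states.

```bash
# Hypothetical stand-in for a stdio MCP server, not prosa itself.
# Assumption: diagnostics go to stderr so stdout carries only JSON-RPC frames.
fake_stdio_server() {
  echo "server starting" >&2
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"result":{}}'
}
```

Redirecting with `2>server.log` keeps the log text out of whatever consumes the frames on stdout.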
405
-
406
- To expose MCP over HTTP Streamable transport, pass `--transport http`:
407
-
408
- ```bash
409
- prosa mcp serve --transport http
410
- prosa mcp serve --transport http --host 127.0.0.1 --port 7331 --path /mcp
411
- prosa mcp serve --transport http --search-engine tantivy
412
- ```
413
-
414
- By default, HTTP mode listens at:
415
-
416
- ```text
417
- http://127.0.0.1:7331/mcp
418
- ```
419
-
420
- Registered MCP tools (six in total):
421
-
422
- | Tool | Purpose |
423
- |---|---|
424
- | `search` | Full-text search over messages, commands, paths, diffs, and previews. Optional `engine`, `field_kind`, `since`/`until`, `raw`, `limit`. |
425
- | `sessions` | Without `session_id`, lists candidates filtered by source/time/limit. With `session_id`, opens it: `format=detail` (default) returns metadata + timeline, `format=summary` returns the row only, `format=markdown` renders the transcript. |
426
- | `tool_calls` | Audit commands and tool usage by tool_name, canonical_type, session_id, errors_only, time bounds. When `path_substring` is set, also returns matching artifacts. |
427
- | `analytics` | Built-in aggregate reports backed by SQLite views: `report=sessions\|tools\|errors\|models\|projects` with the matching filters. |
428
- | `artifact` | Fetch full text for an `artifact_id`. Binary artifacts return a placeholder. |
429
- | `compile` | Without args, returns a status snapshot (search index health). With `source` (and optional `sessions_path`), imports that provider into the bundle. |
430
-
431
- Registered MCP prompts include:
432
-
433
- | Prompt | Purpose |
434
- |---|---|
435
- | `investigate_prior_work` | Search prior work on a topic and cite evidence |
436
- | `find_file_history` | Investigate history for a file or path |
437
- | `audit_tool_failures` | Group and explain failed tool calls |
438
-
439
- Five tools are read-only; `compile` is dual-mode (status without args, mutating import with args). All tools use the same services as the CLI.
440
-
441
- ## Common workflows
442
-
443
- ### Import everything local
444
-
445
- ```bash
446
- prosa init --force-existing
447
- prosa compile-all
448
- prosa index status
449
- ```
450
-
451
- ### Find prior work on a topic
452
-
453
- ```bash
454
- prosa search "auth middleware"
455
- prosa sessions --source codex --limit 20
456
- prosa export session <session-id> --format markdown
457
- ```
458
-
459
- ### Audit failed or suspicious tool usage
460
-
461
- Use the built-in analytics report for quick aggregates:
462
-
463
- ```bash
464
- prosa analytics tools --refresh --errors-only
465
- prosa analytics errors --output-format json
466
- ```
467
-
468
- Use MCP `tool_calls` for the richest session-level filtering, or query
469
- Parquet directly when you need custom SQL:
470
-
471
- ```bash
472
- prosa export parquet
473
- prosa query duckdb "
474
- select tool_name, status, count(*) as n
475
- from tool_calls
476
- group by 1, 2
477
- order by n desc
478
- "
479
- ```
480
-
481
- ### Summarize a custom session store through MCP
482
-
483
- After compiling a non-default sessions path, use MCP `analytics report=sessions`
484
- with `source_path_substring` to keep analysis inside prosa instead of reading
485
- the source JSONL directly. This is useful for stores such as
486
- `~/.codex-mz/sessions` that share the same provider name as the default Codex
487
- store.
488
-
489
- ### Search faster with a sidecar index
490
-
491
- ```bash
492
- prosa index tantivy
493
- prosa search "slow test" --engine tantivy
494
- prosa index status
495
- ```
496
-
497
- ### Keep an isolated test bundle
498
-
499
- ```bash
500
- prosa init --store /tmp/prosa-demo
501
- prosa compile codex --store /tmp/prosa-demo
502
- prosa tui --store /tmp/prosa-demo
503
- ```
504
-
505
- ## Bundle layout
506
-
507
- A bundle is a local directory, defaulting to `~/.prosa`:
508
-
509
- ```text
510
- ~/.prosa/
511
- manifest.json
512
- prosa.sqlite
513
- prosa.lock
514
- objects/
515
- blake3/...
516
- raw/
517
- sources/
518
- search/
519
- tantivy/
520
- exports/
521
- parquet/
522
- ```
523
-
524
- The layers are:
525
-
526
- | Layer | Contents |
527
- |---|---|
528
- | Raw immutable layer | Preserved source files, raw records, import batches, import errors |
529
- | Canonical projection | Projects, sessions, turns, events, messages, blocks, tool calls, tool results, artifacts, edges |
530
- | Derived read surfaces | `search_docs`, SQLite FTS5, Tantivy sidecar, Markdown, Parquet |
531
-
532
- SQLite is the canonical catalog. Large payloads such as raw records, tool
533
- outputs, diffs, and JSON payloads are stored in the content-addressed object
534
- store. Object IDs use BLAKE3 and object bytes are compressed with zstd.
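A content-addressed store derives each object's name from its bytes, so identical payloads are stored once. The sketch below shows the idea with deliberate substitutions: `sha256sum` stands in for BLAKE3, `gzip` stands in for zstd, and the flat directory layout is illustrative. `CAS_DIR`, `cas_put`, and `cas_get` are hypothetical names, not `prosa` internals.

```bash
# Hypothetical sketch of a content-addressed store.
# Stand-ins: sha256sum instead of BLAKE3, gzip instead of zstd; flat layout is illustrative.
CAS_DIR="$(mktemp -d)"

cas_put() {
  # Reads bytes on stdin, prints the object ID, writes the compressed object once.
  local tmp id
  tmp="$(mktemp)"
  cat > "$tmp"
  id="$(sha256sum "$tmp" | cut -d' ' -f1)"
  [ -e "$CAS_DIR/$id" ] || gzip -c "$tmp" > "$CAS_DIR/$id"
  rm -f "$tmp"
  printf '%s\n' "$id"
}

cas_get() {
  # Decompresses the object named by ID $1 to stdout.
  gzip -dc "$CAS_DIR/$1"
}
```

Putting the same bytes twice yields the same ID and a single file on disk, which is what lets SQLite rows reference large payloads by ID.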
535
-
536
- Raw source files are preserved so future importer versions can rebuild better
537
- projections without re-reading the original tool history directories.
538
-
539
- Search indexes, Markdown exports, and Parquet files are derived. Do not treat
540
- them as the source of truth.
541
-
542
- ## Development
543
-
544
- Requirements:
545
-
546
- - Node.js 22.15.1 through 26.x
547
- - pnpm
548
- - devbox, recommended for the local shell
549
-
550
- Useful commands:
551
-
552
- ```bash
553
- pnpm install
554
- pnpm dev -- <command>
555
- just build
556
- just test
557
- just lint
558
- just typecheck
559
- pnpm build
560
- pnpm test
561
- pnpm test:watch
562
- pnpm test:coverage
563
- pnpm typecheck
564
- pnpm lint
565
- pnpm lint:fix
566
- pnpm format
567
- pnpm clean
568
- ```
569
-
570
- Examples:
571
-
572
- ```bash
573
- pnpm dev -- init --store /tmp/prosa-dev
574
- pnpm dev -- compile codex --store /tmp/prosa-dev
575
- pnpm dev -- sessions --store /tmp/prosa-dev --output-format json
576
- ```
577
-
578
- Project layout:
579
-
580
- | Path | Purpose |
581
- |---|---|
582
- | `src/cli/commands/` | CLI command implementations |
583
- | `src/core/` | Bundle, schema, CAS, domain IDs, ingest helpers |
584
- | `src/importers/` | Codex, Claude, Gemini, and Cursor importers |
585
- | `src/services/` | Sessions, search, indexing, exports |
586
- | `src/mcp/` | MCP server, tools, and prompts |
587
- | `src/tui/` | Ink terminal UI |
588
- | `test/` | Vitest tests and fixtures |
589
- | `docs/` | Architecture and source-format references |
590
-
591
- ## Documentation
592
-
593
- `docs/` holds the architecture and source-format references. Start with
594
- [`docs/README.md`](./docs/README.md) for an index. Highlights:
595
-
596
- | Doc | Purpose |
597
- |---|---|
598
- | [`docs/architecture/bundle-format.md`](./docs/architecture/bundle-format.md) | Bundle layout, full SQLite schema, CAS, idempotency keys |
599
- | [`docs/architecture/import-pipeline.md`](./docs/architecture/import-pipeline.md) | How `compile` walks sources, stages CAS, commits, and rebuilds indexes |
600
- | [`docs/architecture/search-engines.md`](./docs/architecture/search-engines.md) | FTS5 default vs. Tantivy sidecar |
601
- | [`docs/sources/codex.md`](./docs/sources/codex.md) | `~/.codex/sessions/` JSONL format |
602
- | [`docs/sources/claude-code.md`](./docs/sources/claude-code.md) | `~/.claude/projects/` JSONL + artifacts |
603
- | [`docs/sources/cursor.md`](./docs/sources/cursor.md) | `~/.cursor/chats/**/store.db` SQLite |
604
- | [`docs/sources/gemini.md`](./docs/sources/gemini.md) | `~/.gemini/tmp/` JSON |
605
-
606
- ## Releasing
607
-
608
- `prosa` uses Changesets for local npm releases to the official npm registry.
609
- The package is published publicly as `@c3-oss/prosa`.
610
-
611
- Create a changeset for user-facing changes:
612
-
613
- ```bash
614
- just changeset
615
- ```
616
-
617
- Apply pending changesets to `package.json` and `CHANGELOG.md`:
618
-
619
- ```bash
620
- just version-packages
621
- ```
622
-
623
- Build and publish:
624
-
625
- ```bash
626
- just release
627
- ```
628
-
629
- Publishing requires an npm account authenticated locally with permission to
630
- publish public packages under the `@c3-oss` scope. Do not run `just release`
631
- unless you intend to publish to `https://registry.npmjs.org/`.
632
-
633
- ## Status and limitations
634
-
635
- - Version is currently `0.1.0`.
636
- - Source formats for agent tools can change; importers preserve raw bytes so
637
- projections can be improved later.
638
- - FTS5 is available by default; Tantivy search requires `prosa index tantivy`
639
- before use.
640
- - `prosa query duckdb` requires Parquet exports. Run `prosa export parquet`
641
- after importing or re-importing data.
642
- - Markdown export is optimized for readable transcripts and previews, not for
643
- dumping every stored byte inline.
644
- - The default store may contain private local history. Be careful before
645
- sharing exports, Parquet snapshots, or the bundle itself.
646
-
647
- ## Why this exists
648
-
649
- Every agent CLI keeps history in its own format. Searching across tools is
650
- painful, auditing tool calls is harder, and exporting human-readable transcripts
651
- is inconsistent.
652
-
653
- `prosa` reduces that fragmentation into one queryable local store while
654
- preserving provenance and raw source bytes for auditability and future
655
- re-processing.