@c3-oss/prosa 0.1.0

package/README.md ADDED
@@ -0,0 +1,566 @@
1
+ # prosa
2
+
3
+ `prosa` is a local-first CLI for compiling, searching, auditing, and exporting
4
+ agent session histories.
5
+
6
+ It imports local histories from Codex CLI, Claude Code, Gemini CLI, and Cursor
7
+ into one canonical bundle so you can search across tools, inspect prior work,
8
+ export readable transcripts, and run analytical queries without giving up the
9
+ original raw data.
10
+
11
+ ## What it does
12
+
13
+ - Imports session histories from multiple agent CLIs into a single local store.
14
+ - Preserves raw source files and raw records for future re-processing.
15
+ - Normalizes sessions, messages, tool calls, tool results, artifacts, and graph
16
+ edges into SQLite tables.
17
+ - Builds searchable derived indexes over messages, commands, paths, and result
18
+ previews.
19
+ - Lists and filters sessions by source and timestamp.
20
+ - Exports individual sessions as Markdown.
21
+ - Exports canonical tables to Parquet for DuckDB analytics.
22
+ - Provides an Ink-based terminal UI for browsing sessions and search results.
23
+ - Serves a read-only MCP server over the local bundle for agent memory access.
24
+
25
+ `prosa` is early software, but the main CLI surfaces described below are
26
+ implemented.
27
+
28
+ ## Quick start
29
+
30
+ From this repository:
31
+
32
+ ```bash
33
+ devbox shell
34
+ pnpm install
35
+ pnpm build
36
+ ```
37
+
38
+ During development, run commands through SWC:
39
+
40
+ ```bash
41
+ pnpm dev -- init
42
+ pnpm dev -- compile --codex ~/.codex/sessions
43
+ pnpm dev -- sessions
44
+ pnpm dev -- search "terraform"
45
+ ```
46
+
47
+ After building or installing the package, use the `prosa` binary:
48
+
49
+ ```bash
50
+ prosa init
51
+ prosa compile \
52
+ --codex ~/.codex/sessions \
53
+ --claude ~/.claude/projects \
54
+ --gemini ~/.gemini/tmp \
55
+ --cursor ~/.cursor/chats
56
+
57
+ prosa sessions --source codex --since 2026-01-01
58
+ prosa search "package.json"
59
+ prosa export session <session-id> --format markdown --out session.md
60
+ prosa export parquet
61
+ prosa query duckdb "select source_tool, count(*) from sessions group by 1"
62
+ prosa tui
63
+ prosa mcp serve
64
+ ```
65
+
66
+ By default, the bundle is stored at `~/.prosa`. Override it with `--store` or
67
+ the `PROSA_STORE` environment variable:
68
+
69
+ ```bash
70
+ PROSA_STORE=/tmp/prosa-demo prosa init
71
+ prosa sessions --store /tmp/prosa-demo
72
+ ```
73
+
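The lookup order between `--store`, `PROSA_STORE`, and the default is not spelled out above. A minimal sketch of one plausible resolution order follows, assuming the flag wins over the environment variable, which wins over `~/.prosa`. The function `resolveStore` and its parameters are hypothetical and are not part of the `prosa` API.

```typescript
// Hypothetical helper, for illustration only; prosa's real lookup may differ.
// Assumed precedence: --store flag, then PROSA_STORE, then <home>/.prosa.
function resolveStore(
  flag: string | undefined,
  env: Record<string, string | undefined>,
  home: string,
): string {
  // An explicit, non-empty --store value is used as-is.
  if (flag !== undefined && flag !== "") return flag;
  // Otherwise fall back to a non-empty PROSA_STORE value.
  const fromEnv = env["PROSA_STORE"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  // Finally, use the documented default location.
  return home + "/.prosa";
}
```

Passing the environment and home directory as arguments keeps the sketch free of global state, so each branch can be checked in isolation.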
74
+ ## Supported sources
75
+
76
+ `prosa compile` accepts one or more source roots:
77
+
78
+ ```bash
79
+ prosa compile [--codex <path>] [--claude <path>] [--gemini <path>] [--cursor <path>]
80
+ ```
81
+
82
+ Supported importers:
83
+
84
+ | Source | Typical path | Imported files |
85
+ |---|---|---|
86
+ | Codex CLI | `~/.codex/sessions` | Recursive `.jsonl` session files |
87
+ | Claude Code | `~/.claude/projects` | Project JSONL files and subagent JSONL files |
88
+ | Gemini CLI | `~/.gemini/tmp` | `chats/session-*.json` snapshots |
89
+ | Cursor | `~/.cursor/chats` | `store.db` SQLite agent stores |
90
+
91
+ Imports are idempotent for already-seen source files. Each import reports counts
92
+ for source files, sessions, messages, tool calls, tool results, artifacts,
93
+ edges, and errors.
94
+
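To make the idempotency claim concrete, here is a small sketch of an import loop that skips files it has already seen. It assumes each source file carries some stable identity, called `fingerprint` here. That field, the `importOnce` function, and the report shape are illustrative assumptions, not the importer's actual code.

```typescript
// Hypothetical types: `fingerprint` stands in for whatever identity
// prosa records for a source file (for example, a content hash).
interface SourceFile {
  path: string;
  fingerprint: string;
}

interface ImportReport {
  imported: number;
  skipped: number;
}

// Import each file at most once; files whose fingerprint is already
// in `seen` are counted as skipped and otherwise left untouched.
function importOnce(files: SourceFile[], seen: Set<string>): ImportReport {
  const report: ImportReport = { imported: 0, skipped: 0 };
  for (const file of files) {
    if (seen.has(file.fingerprint)) {
      report.skipped += 1;
      continue;
    }
    seen.add(file.fingerprint);
    report.imported += 1;
  }
  return report;
}
```

Running the same batch twice against the same `seen` set imports everything the first time and nothing the second time, which is the behavior "idempotent for already-seen source files" describes.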
95
+ For large imports, you can defer FTS5 index updates and rebuild later:
96
+
97
+ ```bash
98
+ prosa compile --codex ~/.codex/sessions --defer-index
99
+ prosa index fts5
100
+ ```
101
+
102
+ ## CLI reference
103
+
104
+ ### `prosa init`
105
+
106
+ Initialize a bundle:
107
+
108
+ ```bash
109
+ prosa init
110
+ prosa init --store /path/to/bundle
111
+ ```
112
+
113
+ If the bundle already exists, `init` exits with an error unless
114
+ `--force-existing` is passed:
115
+
116
+ ```bash
117
+ prosa init --force-existing
118
+ ```
119
+
120
+ ### `prosa compile`
121
+
122
+ Import session histories into the bundle:
123
+
124
+ ```bash
125
+ prosa compile --codex ~/.codex/sessions
126
+ prosa compile --claude ~/.claude/projects
127
+ prosa compile --gemini ~/.gemini/tmp
128
+ prosa compile --cursor ~/.cursor/chats
129
+ ```
130
+
131
+ Import multiple tools in one run:
132
+
133
+ ```bash
134
+ prosa compile \
135
+ --codex ~/.codex/sessions \
136
+ --claude ~/.claude/projects \
137
+ --gemini ~/.gemini/tmp \
138
+ --cursor ~/.cursor/chats
139
+ ```
140
+
141
+ Options:
142
+
143
+ | Option | Description |
144
+ |---|---|
145
+ | `--codex <path>` | Root of Codex CLI sessions |
146
+ | `--claude <path>` | Root of Claude Code projects |
147
+ | `--gemini <path>` | Root of Gemini CLI temp/session data |
148
+ | `--cursor <path>` | Root of Cursor agent stores |
149
+ | `--store <path>` | Bundle directory |
150
+ | `--defer-index` | Skip immediate FTS5 updates; run `prosa index fts5` later |
151
+
152
+ ### `prosa index`
153
+
154
+ Build or inspect derived search indexes:
155
+
156
+ ```bash
157
+ prosa index status
158
+ prosa index fts5
159
+ prosa index tantivy
160
+ ```
161
+
162
+ `fts5` is the default SQLite full-text index. It is updated during normal
163
+ imports unless `--defer-index` is used.
164
+
165
+ `tantivy` is an optional sidecar search index. Build it before searching with
166
+ `--engine tantivy`:
167
+
168
+ ```bash
169
+ prosa index tantivy
170
+ prosa search "migration error" --engine tantivy
171
+ ```
172
+
173
+ `index status` supports machine-readable output:
174
+
175
+ ```bash
176
+ prosa index status --output-format json
177
+ ```
178
+
179
+ ### `prosa sessions`
180
+
181
+ List sessions in the bundle:
182
+
183
+ ```bash
184
+ prosa sessions
185
+ prosa sessions --source claude
186
+ prosa sessions --since 2026-01-01
187
+ prosa sessions --until 2026-02-01
188
+ prosa sessions --limit 100
189
+ ```
190
+
191
+ Count sessions with the same filters:
192
+
193
+ ```bash
194
+ prosa sessions count
195
+ prosa sessions count --source cursor --since 2026-01-01
196
+ ```
197
+
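The filters shared by the list and count commands can be pictured as a simple predicate over session rows. The sketch below is an assumption-laden illustration: it treats `--since` as inclusive and `--until` as exclusive, sorts newest first, and uses hypothetical field names (`sessionId`, `sourceTool`, `startedAt`). The real command may define these bounds and columns differently.

```typescript
// Hypothetical row and filter shapes, for illustration only.
interface SessionRow {
  sessionId: string;
  sourceTool: string;
  startedAt: string; // ISO 8601 timestamp
}

interface SessionFilter {
  source?: string;
  since?: string; // assumed inclusive lower bound
  until?: string; // assumed exclusive upper bound
  limit?: number;
}

function filterSessions(rows: SessionRow[], f: SessionFilter): SessionRow[] {
  // Date-only ISO strings such as "2026-01-01" parse as UTC midnight.
  const since = f.since !== undefined ? Date.parse(f.since) : -Infinity;
  const until = f.until !== undefined ? Date.parse(f.until) : Infinity;
  return rows
    .filter((r) => f.source === undefined || r.sourceTool === f.source)
    .filter((r) => {
      const t = Date.parse(r.startedAt);
      return t >= since && t < until;
    })
    // filter() returned a fresh array, so sorting here does not mutate `rows`.
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt))
    .slice(0, f.limit !== undefined ? f.limit : rows.length);
}

// Counting with the same filters is the length of the unlimited result.
function countSessions(rows: SessionRow[], f: SessionFilter): number {
  return filterSessions(rows, { source: f.source, since: f.since, until: f.until }).length;
}
```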
198
+ Session list output includes timestamp, source tool, `session_id`, model,
199
+ message count, tool call count, initial working directory, and title.
200
+
201
+ Output formats:
202
+
203
+ ```bash
204
+ prosa sessions --output-format table
205
+ prosa sessions --output-format json
206
+ prosa sessions --output-format csv
207
+ ```
208
+
209
+ The `interactive` output format is accepted by headless commands and currently
210
+ renders as a table. Use `prosa tui` for the interactive browser.
211
+
212
+ ### `prosa search`
213
+
214
+ Search messages, tool calls, paths, commands, and result previews:
215
+
216
+ ```bash
217
+ prosa search "terraform"
218
+ prosa search "package.json" --limit 20
219
+ prosa search "failed migration" --output-format json
220
+ prosa search "schema update" --engine fts5
221
+ prosa search "schema update" --engine tantivy
222
+ ```
223
+
224
+ The default engine is `fts5`. The Tantivy engine requires a sidecar index:
225
+
226
+ ```bash
227
+ prosa index tantivy
228
+ prosa search "indexing" --engine tantivy
229
+ ```
230
+
231
+ Search output includes timestamp, role, tool name, session ID, and a snippet.
232
+
233
+ ### `prosa export session`
234
+
235
+ Export a single session as Markdown:
236
+
237
+ ```bash
238
+ prosa export session <session-id> --format markdown
239
+ prosa export session <session-id> --format markdown --out transcript.md
240
+ ```
241
+
242
+ Markdown exports include source metadata, prosa and source session IDs,
243
+ timestamps, working directory, branch, model span, timeline confidence,
244
+ messages, and related tool calls.
245
+
246
+ Large outputs are not intended to be dumped wholesale into Markdown. The export
247
+ renders useful previews while the full bytes remain in the content-addressed
248
+ object store.
249
+
250
+ ### `prosa export parquet`
251
+
252
+ Export canonical SQLite tables to Parquet:
253
+
254
+ ```bash
255
+ prosa export parquet
256
+ prosa export parquet --out /tmp/prosa-parquet
257
+ ```
258
+
259
+ The export writes one `.parquet` file per canonical table plus a manifest. These
260
+ files are derived analytics snapshots, not the source of truth.
261
+
262
+ Exported tables include:
263
+
264
+ ```text
265
+ objects, source_files, import_batches, raw_records, import_errors,
266
+ uncertainties, projects, sessions, turns, events, messages, content_blocks,
267
+ tool_calls, tool_results, artifacts, edges, search_docs
268
+ ```
269
+
270
+ ### `prosa query duckdb`
271
+
272
+ Run DuckDB SQL over exported Parquet tables:
273
+
274
+ ```bash
275
+ prosa export parquet
276
+ prosa query duckdb "select source_tool, count(*) from sessions group by 1"
277
+ ```
278
+
279
+ Use a custom Parquet directory:
280
+
281
+ ```bash
282
+ prosa query duckdb \
283
+ --parquet-dir /tmp/prosa-parquet \
284
+ "select tool_name, count(*) from tool_calls group by 1 order by 2 desc"
285
+ ```
286
+
287
+ Output formats:
288
+
289
+ ```bash
290
+ prosa query duckdb "select count(*) as n from sessions" --output-format json
291
+ prosa query duckdb "select * from sessions limit 10" --output-format csv
292
+ ```
293
+
294
+ ### `prosa tui`
295
+
296
+ Open the Ink-based interactive explorer:
297
+
298
+ ```bash
299
+ prosa tui
300
+ prosa tui --store /path/to/bundle
301
+ ```
302
+
303
+ Key bindings:
304
+
305
+ | Key | Action |
306
+ |---|---|
307
+ | `j` / `k` or arrows | Move selection or scroll detail view |
308
+ | `Enter` | Open the selected session |
309
+ | `/` | Search |
310
+ | `s` | Cycle source filter |
311
+ | `R` | Reload |
312
+ | `Esc` | Return to the session list |
313
+ | `gg` / `G` | Jump to top or bottom |
314
+ | `Ctrl-d` / `Ctrl-u` | Half-page down or up |
315
+ | `q` | Quit from the session list |
316
+
317
+ ### `prosa mcp serve`
318
+
319
+ Start a local read-only MCP server over the bundle. The default transport is
320
+ stdio, suitable for MCP clients that launch a command through `npx` or a local
321
+ binary:
322
+
323
+ ```bash
324
+ prosa mcp serve
325
+ npx @c3-oss/prosa mcp serve
326
+ prosa mcp serve --transport stdio
327
+ prosa mcp serve --search-engine tantivy
328
+ ```
329
+
330
+ Example MCP client command config:
331
+
332
+ ```json
333
+ {
334
+ "command": "npx",
335
+ "args": ["@c3-oss/prosa", "mcp", "serve"]
336
+ }
337
+ ```
338
+
339
+ In stdio mode, stdout is reserved for MCP JSON-RPC frames. Do not expect normal
340
+ human-readable startup logs on stdout.
341
+
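For readers unfamiliar with what a frame on stdout looks like, the sketch below builds a JSON-RPC 2.0 request and serializes it as a single newline-terminated line, which is how the MCP stdio transport delimits messages. `tools/list` is a standard MCP method; the `encodeFrame` helper is a hypothetical name used only for this example, and a real client would first complete the MCP `initialize` handshake.

```typescript
// Minimal JSON-RPC 2.0 request shape.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical helper: serialize one message as one line.
// JSON.stringify without an indent argument emits no raw newlines;
// newlines inside string values are escaped as \n.
function encodeFrame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Ask the server which tools it exposes.
const listTools: JsonRpcRequest = { jsonrpc: "2.0", id: 1, method: "tools/list" };
const frame = encodeFrame(listTools);
```

Because every byte on stdout is parsed this way, any stray log line written there would corrupt the stream, which is why diagnostics should not be expected on stdout in stdio mode.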
342
+ To expose MCP over the Streamable HTTP transport, pass `--transport http`:
343
+
344
+ ```bash
345
+ prosa mcp serve --transport http
346
+ prosa mcp serve --transport http --host 127.0.0.1 --port 7331 --path /mcp
347
+ prosa mcp serve --transport http --search-engine tantivy
348
+ ```
349
+
350
+ By default, HTTP mode listens at:
351
+
352
+ ```text
353
+ http://127.0.0.1:7331/mcp
354
+ ```
355
+
356
+ Registered MCP tools include:
357
+
358
+ | Tool | Purpose |
359
+ |---|---|
360
+ | `list_sessions` | List recent sessions with optional source/date filters |
361
+ | `get_session` | Return metadata and timeline events for one session |
362
+ | `search_sessions` | Full-text search over indexed session content |
363
+ | `export_session_markdown` | Render a selected session as Markdown |
364
+ | `list_tool_calls` | Audit commands and tool usage |
365
+ | `find_touched_files` | Find sessions that touched a file/path |
366
+ | `get_artifact` | Retrieve stored artifact text when available |
367
+ | `index_status` | Show derived search index status |
368
+
369
+ Registered MCP prompts include:
370
+
371
+ | Prompt | Purpose |
372
+ |---|---|
373
+ | `investigate_prior_work` | Search prior work on a topic and cite evidence |
374
+ | `find_file_history` | Investigate history for a file or path |
375
+ | `audit_tool_failures` | Group and explain failed tool calls |
376
+
377
+ All MCP tools are read-only and use the same services as the CLI.
378
+
379
+ ## Common workflows
380
+
381
+ ### Import everything local
382
+
383
+ ```bash
384
+ prosa init --force-existing
385
+ prosa compile \
386
+ --codex ~/.codex/sessions \
387
+ --claude ~/.claude/projects \
388
+ --gemini ~/.gemini/tmp \
389
+ --cursor ~/.cursor/chats
390
+ prosa index status
391
+ ```
392
+
393
+ ### Find prior work on a topic
394
+
395
+ ```bash
396
+ prosa search "auth middleware"
397
+ prosa sessions --source codex --limit 20
398
+ prosa export session <session-id> --format markdown
399
+ ```
400
+
401
+ ### Audit failed or suspicious tool usage
402
+
403
+ Use the MCP `list_tool_calls` tool for the richest tool-call filtering, or query Parquet:
404
+
405
+ ```bash
406
+ prosa export parquet
407
+ prosa query duckdb "
408
+ select tool_name, status, count(*) as n
409
+ from tool_calls
410
+ group by 1, 2
411
+ order by n desc
412
+ "
413
+ ```
414
+
415
+ ### Search faster with a sidecar index
416
+
417
+ ```bash
418
+ prosa index tantivy
419
+ prosa search "slow test" --engine tantivy
420
+ prosa index status
421
+ ```
422
+
423
+ ### Keep an isolated test bundle
424
+
425
+ ```bash
426
+ prosa init --store /tmp/prosa-demo
427
+ prosa compile --codex ~/.codex/sessions --store /tmp/prosa-demo
428
+ prosa tui --store /tmp/prosa-demo
429
+ ```
430
+
431
+ ## Bundle layout
432
+
433
+ A bundle is a local directory, defaulting to `~/.prosa`:
434
+
435
+ ```text
436
+ ~/.prosa/
437
+ manifest.json
438
+ prosa.sqlite
439
+ prosa.lock
440
+ objects/
441
+ blake3/...
442
+ raw/
443
+ sources/
444
+ search/
445
+ tantivy/
446
+ exports/
447
+ parquet/
448
+ ```
449
+
450
+ The layers are:
451
+
452
+ | Layer | Contents |
453
+ |---|---|
454
+ | Raw immutable layer | Preserved source files, raw records, import batches, import errors |
455
+ | Canonical projection | Projects, sessions, turns, events, messages, blocks, tool calls, tool results, artifacts, edges |
456
+ | Derived read surfaces | `search_docs`, SQLite FTS5, Tantivy sidecar, Markdown, Parquet |
457
+
458
+ SQLite is the canonical catalog. Large payloads such as raw records, tool
459
+ outputs, diffs, and JSON payloads are stored in the content-addressed object
460
+ store. Object IDs use BLAKE3 and object bytes are compressed with zstd.
461
+
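The idea behind a content-addressed object store can be sketched in a few lines: the ID of an object is derived from its bytes, so storing the same payload twice yields the same ID and only one copy. To stay dependency-free, this sketch swaps in a 32-bit FNV-1a hash for BLAKE3 and skips zstd compression entirely. `ObjectStore` and `fnv1a` are hypothetical names, and nothing here reflects the bundle's actual on-disk format.

```typescript
// Stand-in hash: 32-bit FNV-1a over UTF-16 code units.
// The real bundle uses BLAKE3; this is only for illustration.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to an unsigned value and left-pad to 8 hex digits.
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical in-memory content-addressed store.
class ObjectStore {
  private objects = new Map<string, string>();

  // Store a payload and return its content-derived ID.
  // Storing identical content again is a no-op that returns the same ID.
  put(content: string): string {
    const id = fnv1a(content);
    if (!this.objects.has(id)) {
      this.objects.set(id, content);
    }
    return id;
  }

  get(id: string): string | undefined {
    return this.objects.get(id);
  }

  count(): number {
    return this.objects.size;
  }
}
```

With this layout, catalog rows only need to hold the object ID, which is why large payloads can live outside SQLite while the catalog stays small.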
462
+ Raw source files are preserved so future importer versions can rebuild better
463
+ projections without re-reading the original tool history directories.
464
+
465
+ Search indexes, Markdown exports, and Parquet files are derived. Do not treat
466
+ them as the source of truth.
467
+
468
+ ## Development
469
+
470
+ Requirements:
471
+
472
+ - Node.js 22 or newer
473
+ - pnpm
474
+ - devbox, recommended for the local shell
475
+
476
+ Useful commands:
477
+
478
+ ```bash
479
+ pnpm install
480
+ pnpm dev -- <command>
481
+ just build
482
+ just test
483
+ just lint
484
+ just typecheck
485
+ pnpm build
486
+ pnpm test
487
+ pnpm test:watch
488
+ pnpm test:coverage
489
+ pnpm typecheck
490
+ pnpm lint
491
+ pnpm lint:fix
492
+ pnpm format
493
+ pnpm clean
494
+ ```
495
+
496
+ Examples:
497
+
498
+ ```bash
499
+ pnpm dev -- init --store /tmp/prosa-dev
500
+ pnpm dev -- compile --codex ~/.codex/sessions --store /tmp/prosa-dev
501
+ pnpm dev -- sessions --store /tmp/prosa-dev --output-format json
502
+ ```
503
+
504
+ Project layout:
505
+
506
+ | Path | Purpose |
507
+ |---|---|
508
+ | `src/cli/commands/` | CLI command implementations |
509
+ | `src/core/` | Bundle, schema, CAS, domain IDs, ingest helpers |
510
+ | `src/importers/` | Codex, Claude, Gemini, and Cursor importers |
511
+ | `src/services/` | Sessions, search, indexing, exports |
512
+ | `src/mcp/` | MCP server, tools, and prompts |
513
+ | `src/tui/` | Ink terminal UI |
514
+ | `test/` | Vitest tests and fixtures |
515
+ | `docs/` | Design and recovery notes |
516
+
517
+ ## Releasing
518
+
519
+ `prosa` uses Changesets for local npm releases to the official npm registry.
520
+ The package is published publicly as `@c3-oss/prosa`.
521
+
522
+ Create a changeset for user-facing changes:
523
+
524
+ ```bash
525
+ just changeset
526
+ ```
527
+
528
+ Apply pending changesets to `package.json` and `CHANGELOG.md`:
529
+
530
+ ```bash
531
+ just version-packages
532
+ ```
533
+
534
+ Build and publish:
535
+
536
+ ```bash
537
+ just release
538
+ ```
539
+
540
+ Publishing requires an npm account authenticated locally with permission to
541
+ publish public packages under the `@c3-oss` scope. Do not run `just release`
542
+ unless you intend to publish to `https://registry.npmjs.org/`.
543
+
544
+ ## Status and limitations
545
+
546
+ - Version is currently `0.1.0`.
547
+ - Source formats for agent tools can change; importers preserve raw bytes so
548
+ projections can be improved later.
549
+ - FTS5 is available by default; Tantivy search requires `prosa index tantivy`
550
+ before use.
551
+ - `prosa query duckdb` requires Parquet exports. Run `prosa export parquet`
552
+ after importing or re-importing data.
553
+ - Markdown export is optimized for readable transcripts and previews, not for
554
+ dumping every stored byte inline.
555
+ - The default store may contain private local history. Be careful before
556
+ sharing exports, Parquet snapshots, or the bundle itself.
557
+
558
+ ## Why this exists
559
+
560
+ Every agent CLI keeps history in its own format. Searching across tools is
561
+ painful, auditing tool calls is harder, and exporting human-readable transcripts
562
+ is inconsistent.
563
+
564
+ `prosa` consolidates those fragmented histories into one queryable local store
565
+ while preserving provenance and raw source bytes for auditability and future
566
+ re-processing.
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env node
2
+ declare function runCli(argv: readonly string[]): Promise<void>;
3
+
4
+ export { runCli };