@c3-oss/prosa 0.3.2 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -19,6 +19,7 @@ original raw data.
  - Lists and filters sessions by source and timestamp.
  - Exports individual sessions as Markdown.
  - Exports canonical tables to Parquet for DuckDB analytics.
+ - Runs built-in analytics reports over Parquet with DuckDB.
  - Provides an Ink-based terminal UI for browsing sessions and search results.
  - Serves a read-only MCP server over the local bundle for agent memory access.
 
@@ -55,6 +56,7 @@ prosa search "package.json"
  prosa export session <session-id> --format markdown --out session.md
  prosa export parquet
  prosa query duckdb "select source_tool, count(*) from sessions group by 1"
+ prosa analytics tools --refresh
  prosa tui
  prosa mcp serve
  ```
@@ -93,12 +95,10 @@ Imports are idempotent for already-seen source files. Each import reports counts
  for source files, sessions, messages, tool calls, tool results, artifacts,
  edges, and errors.
 
- For large imports, you can defer FTS5 index updates and rebuild later:
-
- ```bash
- prosa compile codex --defer-index
- prosa index fts5
- ```
+ `prosa compile` always disables FTS5 triggers during the import loop and
+ rebuilds the FTS5 index in bulk at the end (mirroring how the Tantivy sidecar
+ is rebuilt). Sidecars stay in sync without a manual step. For recovery, the
+ standalone `prosa index fts5` command is still available.
 
  ## CLI reference
 
@@ -147,7 +147,6 @@ Options:
  |---|---|
  | `--sessions-path <path>` | Root of the selected provider's session history |
  | `--store <path>` | Bundle directory |
- | `--defer-index` | Skip immediate FTS5 updates; run `prosa index fts5` later |
  | `--verbose` | Emit debug logs during compilation |
  | `--json-logs` | Emit raw JSON logs instead of pretty logs |
 
@@ -165,8 +164,9 @@ prosa index fts5
  prosa index tantivy
  ```
 
- `fts5` is the default SQLite full-text index. It is updated during normal
- imports unless `--defer-index` is used.
+ `fts5` is the default SQLite full-text index. `prosa compile` rebuilds it in
+ bulk at the end of every import; `prosa index fts5` is a standalone recovery
+ path that repopulates the index from `search_docs`.
 
  `tantivy` is an optional sidecar search index. Build it before searching with
  `--engine tantivy`:
@@ -201,8 +201,12 @@ prosa sessions count
  prosa sessions count --source cursor --since 2026-01-01
  ```
 
- Session list output includes timestamp, source tool, `session_id`, model,
- message count, tool call count, initial working directory, and title.
+ Session list output includes timestamp, source tool, a 12-char `session_id`
+ prefix, model, message count, tool call count, and title by default. Use
+ `--columns all` to include `cwd_initial`, `source_session_id`,
+ `parent_session_id`, `is_subagent`, `git_branch_initial`, `model_first`,
+ `status`, `timeline_confidence`, and `end_ts`. Pass a CSV list to pick a
+ subset (`--columns start_ts,session_id,title`).
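The `--columns` rules for the session list can be sketched in Python. This is an illustration under assumptions, not prosa's implementation: the default column identifiers other than `start_ts`, `session_id`, and `title` are guessed, the function names are hypothetical, and rejecting unknown names is this sketch's own choice rather than documented behavior.

```python
from typing import Optional

SESSION_ID_PREFIX = 12  # characters of session_id shown by default

# Hypothetical identifiers for the default columns; only start_ts,
# session_id, and title appear in the README's examples.
DEFAULT_COLUMNS = [
    "start_ts", "source_tool", "session_id", "model",
    "message_count", "tool_call_count", "title",
]

# Extra columns that `--columns all` adds, as listed in the README.
EXTRA_COLUMNS = [
    "cwd_initial", "source_session_id", "parent_session_id", "is_subagent",
    "git_branch_initial", "model_first", "status", "timeline_confidence",
    "end_ts",
]


def resolve_columns(spec: Optional[str] = None) -> list[str]:
    """Map a --columns value (None, "all", or a comma-separated list) to names."""
    if spec is None:
        return list(DEFAULT_COLUMNS)
    if spec == "all":
        return DEFAULT_COLUMNS + EXTRA_COLUMNS
    known = set(DEFAULT_COLUMNS) | set(EXTRA_COLUMNS)
    picked = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in picked if name not in known]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")
    return picked  # keep the order the user asked for


def short_session_id(session_id: str) -> str:
    """Default table display: the first 12 characters of the session_id."""
    return session_id[:SESSION_ID_PREFIX]
```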
 
  Output formats:
 
@@ -210,10 +214,13 @@ Output formats:
  prosa sessions --output-format table
  prosa sessions --output-format json
  prosa sessions --output-format csv
+ prosa sessions --columns all
+ prosa sessions --columns start_ts,session_id,title
  ```
 
- The `interactive` output format is accepted by headless commands and currently
- renders as a table. Use `prosa tui` for the interactive browser.
+ `table` and `interactive` outputs are width-aware: long values are truncated
+ with `…` to fit the terminal (or 200 columns when piped). `json` and `csv`
+ always emit full values. Use `prosa tui` for the interactive browser.
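The per-cell truncation rule can be sketched in Python. This is a minimal sketch, assuming a cut keeps the first `width - 1` characters and appends `…`; how prosa divides the available width among columns is not described in this README, so the sketch handles one cell only, and `output_width` and `truncate` are hypothetical names.

```python
import shutil
import sys
from typing import Optional

PIPED_WIDTH = 200  # width the README documents for non-terminal output
ELLIPSIS = "…"


def output_width(is_tty: Optional[bool] = None) -> int:
    """Terminal width for interactive output, or a fixed width when piped."""
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    if is_tty:
        return shutil.get_terminal_size().columns
    return PIPED_WIDTH


def truncate(value: object, width: int) -> str:
    """Shorten value to at most `width` characters, marking cuts with an ellipsis."""
    text = str(value)
    if width <= 0:
        return ""
    if len(text) <= width:
        return text
    return text[: width - 1] + ELLIPSIS
```

The sketch counts code points; a table renderer that must align wide characters (for example CJK text) would need to measure display cells instead.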
 
  ### `prosa search`
 
@@ -297,6 +304,57 @@ prosa query duckdb "select count(*) as n from sessions" --output-format json
  prosa query duckdb "select * from sessions limit 10" --output-format csv
  ```
 
+ `prosa query duckdb` also exposes derived analytics views:
+
+ ```text
+ session_facts, tool_usage_facts, error_facts, model_usage, project_activity
+ ```
+
+ See [`docs/recipes/duckdb.md`](./docs/recipes/duckdb.md) for copy-pasteable
+ queries.
+
+ ### `prosa analytics`
+
+ Run built-in reports over exported Parquet files:
+
+ ```bash
+ prosa analytics sessions --refresh
+ prosa analytics tools --source codex
+ prosa analytics errors --output-format json
+ prosa analytics models --since 2026-01-01
+ prosa analytics projects --project /Users/me/app
+ ```
+
+ Reports require Parquet files. Add `--refresh` to export Parquet before running
+ the report. All reports support `--store`, `--parquet-dir`, `--source`,
+ `--since`, `--until`, `--limit`, `--output-format table|json|csv`, and
+ `--columns <list>` for column selection.
+
+ Table output is curated to fit a normal terminal: `analytics sessions` shows
+ 9 columns by default (drops `source_file_path`, `session_id`,
+ `source_session_id`, `tool_result_count`, `tool_duration_ms`, and
+ `timeline_confidence`), `analytics projects` drops `project_path`, and
+ `analytics errors` drops `session_id` and the full `message` (the shorter
+ `preview` keeps the signal). Use `--columns all` to get every column the SQL
+ returns, or `--columns col1,col2` to pick specific ones:
+
+ ```bash
+ prosa analytics sessions --columns all
+ prosa analytics sessions --columns start_ts,project_name,source_file_path
+ prosa analytics errors --columns all # includes the full `message`
+ ```
+
+ `json` and `csv` output always include every column regardless of `--columns`.
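The table-curation rules can be condensed into one Python function. The drop lists are copied from the paragraph on curated table output; the function name `visible_columns`, and its choice to silently skip requested names the query did not return, are assumptions of this sketch rather than documented prosa behavior.

```python
from typing import Optional

# Columns the README says each report hides from default table output.
TABLE_DROPS = {
    "sessions": {
        "source_file_path", "session_id", "source_session_id",
        "tool_result_count", "tool_duration_ms", "timeline_confidence",
    },
    "projects": {"project_path"},
    "errors": {"session_id", "message"},
}


def visible_columns(report: str, sql_columns: list[str],
                    output_format: str = "table",
                    spec: Optional[str] = None) -> list[str]:
    """Pick the columns to render for one analytics report."""
    if output_format in ("json", "csv") or spec == "all":
        return list(sql_columns)  # machine formats and `all` keep everything
    if spec:
        wanted = [c.strip() for c in spec.split(",") if c.strip()]
        return [c for c in wanted if c in sql_columns]  # unknown names skipped
    drops = TABLE_DROPS.get(report, set())
    return [c for c in sql_columns if c not in drops]
```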
+
+ Additional filters:
+
+ ```bash
+ prosa analytics tools --tool-name Bash --errors-only
+ prosa analytics tools --canonical-type shell
+ prosa analytics errors --category tool_result
+ prosa analytics models --model gpt-5.4
+ ```
+
  ### `prosa tui`
 
  Open the Ink-based interactive explorer:
@@ -359,18 +417,16 @@ By default, HTTP mode listens at:
  http://127.0.0.1:7331/mcp
  ```
 
- Registered MCP tools include:
+ Registered MCP tools (six in total):
 
  | Tool | Purpose |
  |---|---|
- | `list_sessions` | List recent sessions with optional source/date filters |
- | `get_session` | Return metadata and timeline events for one session |
- | `search_sessions` | Full-text search over indexed session content |
- | `export_session_markdown` | Render a selected session as Markdown |
- | `list_tool_calls` | Audit commands and tool usage |
- | `find_touched_files` | Find sessions that touched a file/path |
- | `get_artifact` | Retrieve stored artifact text when available |
- | `index_status` | Show derived search index status |
+ | `search` | Full-text search over messages, commands, paths, diffs, and previews. Optional `engine`, `field_kind`, `since`/`until`, `raw`, `limit`. |
+ | `sessions` | Without `session_id`, lists candidates filtered by source/time/limit. With `session_id`, opens it: `format=detail` (default) returns metadata + timeline, `format=summary` returns the row only, `format=markdown` renders the transcript. |
+ | `tool_calls` | Audit commands and tool usage by `tool_name`, `canonical_type`, `session_id`, `errors_only`, and time bounds. When `path_substring` is set, also returns matching artifacts. |
+ | `analytics` | Built-in aggregate reports backed by SQLite views: `report=sessions\|tools\|errors\|models\|projects` with the matching filters. |
+ | `artifact` | Fetch full text for an `artifact_id`. Binary artifacts return a placeholder. |
+ | `compile` | Without args, returns a status snapshot (search index health). With `source` (and optional `sessions_path`), imports that provider into the bundle. |
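For clients that talk to the server directly, each row of the tools table corresponds to an MCP `tools/call` request. The Python sketch below only builds JSON-RPC 2.0 payloads for the `sessions` tool; the listing-filter names `source` and `limit` are assumed from the table's wording, the helper names are hypothetical, and a real client must first complete the MCP `initialize` handshake (or use an MCP SDK) before sending these over stdio or to `http://127.0.0.1:7331/mcp`.

```python
import itertools
import json
from typing import Any, Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


def sessions_request(session_id: Optional[str] = None, fmt: str = "detail",
                     **filters: Any) -> str:
    """List sessions when session_id is None, otherwise open one session."""
    if session_id is None:
        return tools_call("sessions", filters)  # e.g. source=..., limit=...
    # fmt is one of "detail" (default), "summary", or "markdown".
    return tools_call("sessions", {"session_id": session_id, "format": fmt})
```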
 
  Registered MCP prompts include:
 
@@ -380,7 +436,7 @@ Registered MCP prompts include:
  | `find_file_history` | Investigate history for a file or path |
  | `audit_tool_failures` | Group and explain failed tool calls |
 
- All MCP tools are read-only and use the same services as the CLI.
+ Five tools are read-only; `compile` is dual-mode (status without args, mutating import with args). All tools use the same services as the CLI.
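Because `compile` is the only tool that can write to the bundle, an agent host may want to gate it behind a confirmation. The following is a hypothetical client-side guard in Python, based only on the behavior described above. It conservatively treats any `compile` call that carries arguments as an import, since the README documents only the no-argument status case and the `source` import case.

```python
from collections.abc import Mapping
from typing import Any

# Tools the README describes as read-only.
READ_ONLY_TOOLS = {"search", "sessions", "tool_calls", "analytics", "artifact"}


def is_mutating(tool: str, arguments: Mapping[str, Any]) -> bool:
    """Return True when a prosa MCP tool call may modify the bundle."""
    if tool in READ_ONLY_TOOLS:
        return False
    if tool == "compile":
        # No arguments: status snapshot. Any argument: assume an import.
        return bool(arguments)
    raise ValueError(f"unknown prosa MCP tool: {tool}")
```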
 
  ## Common workflows
 
@@ -402,7 +458,15 @@ prosa export session <session-id> --format markdown
 
  ### Audit failed or suspicious tool usage
 
- Use MCP `list_tool_calls` for the richest tool-call filtering, or query Parquet:
+ Use the built-in analytics reports for quick aggregates:
+
+ ```bash
+ prosa analytics tools --refresh --errors-only
+ prosa analytics errors --output-format json
+ ```
+
+ Use MCP `tool_calls` for the richest session-level filtering, or query
+ Parquet directly when you need custom SQL:
 
  ```bash
  prosa export parquet
@@ -414,6 +478,14 @@ prosa query duckdb "
  "
  ```
 
+ ### Summarize a custom session store through MCP
+
+ After compiling a non-default sessions path, use MCP `analytics report=sessions`
+ with `source_path_substring` to keep analysis inside prosa instead of reading
+ the source JSONL directly. This is useful for stores such as
+ `~/.codex-mz/sessions` that share the same provider name as the default Codex
+ store.
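A hedged Python sketch of that request body. It assumes the `analytics` tool also accepts `source` as one of its "matching filters"; only `report` and `source_path_substring` are named in this README, and the helper name is hypothetical. The substring `/.codex-mz/` is what separates the custom store from the default Codex store, because both would match `source="codex"`.

```python
import json
from typing import Any


def analytics_sessions_call(path_substring: str, request_id: int = 1,
                            **filters: Any) -> dict[str, Any]:
    """Build an MCP `tools/call` request for the analytics sessions report."""
    arguments = {"report": "sessions", "source_path_substring": path_substring}
    arguments.update(filters)  # assumed extra filters such as source=...
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "analytics", "arguments": arguments},
    }


request = analytics_sessions_call("/.codex-mz/", source="codex")
print(json.dumps(request, indent=2))
```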
+
  ### Search faster with a sidecar index
 
  ```bash