@c3-oss/prosa 0.3.2 → 0.5.0

package/README.md CHANGED
@@ -19,6 +19,7 @@ original raw data.
  - Lists and filters sessions by source and timestamp.
  - Exports individual sessions as Markdown.
  - Exports canonical tables to Parquet for DuckDB analytics.
+ - Runs built-in analytics reports over Parquet with DuckDB.
  - Provides an Ink-based terminal UI for browsing sessions and search results.
  - Serves a read-only MCP server over the local bundle for agent memory access.
 
@@ -55,6 +56,7 @@ prosa search "package.json"
  prosa export session <session-id> --format markdown --out session.md
  prosa export parquet
  prosa query duckdb "select source_tool, count(*) from sessions group by 1"
+ prosa analytics tools --refresh
  prosa tui
  prosa mcp serve
  ```
@@ -93,12 +95,10 @@ Imports are idempotent for already-seen source files. Each import reports counts
  for source files, sessions, messages, tool calls, tool results, artifacts,
  edges, and errors.
 
- For large imports, you can defer FTS5 index updates and rebuild later:
-
- ```bash
- prosa compile codex --defer-index
- prosa index fts5
- ```
+ `prosa compile` always disables FTS5 triggers during the import loop and
+ rebuilds the FTS5 index in bulk at the end (mirroring how the Tantivy sidecar
+ is rebuilt). Sidecars stay in sync without a manual step. For recovery, the
+ standalone `prosa index fts5` command is still available.
 
  ## CLI reference
 
@@ -147,7 +147,6 @@ Options:
  |---|---|
  | `--sessions-path <path>` | Root of the selected provider's session history |
  | `--store <path>` | Bundle directory |
- | `--defer-index` | Skip immediate FTS5 updates; run `prosa index fts5` later |
  | `--verbose` | Emit debug logs during compilation |
  | `--json-logs` | Emit raw JSON logs instead of pretty logs |
 
@@ -165,8 +164,9 @@ prosa index fts5
  prosa index tantivy
  ```
 
- `fts5` is the default SQLite full-text index. It is updated during normal
- imports unless `--defer-index` is used.
+ `fts5` is the default SQLite full-text index. `prosa compile` rebuilds it in
+ bulk at the end of every import; `prosa index fts5` is a standalone recovery
+ path that repopulates the index from `search_docs`.
 
  `tantivy` is an optional sidecar search index. Build it before searching with
  `--engine tantivy`:
@@ -297,6 +297,40 @@ prosa query duckdb "select count(*) as n from sessions" --output-format json
  prosa query duckdb "select * from sessions limit 10" --output-format csv
  ```
 
+ `prosa query duckdb` also exposes derived analytics views:
+
+ ```text
+ session_facts, tool_usage_facts, error_facts, model_usage, project_activity
+ ```
+
+ See [`docs/recipes/duckdb.md`](./docs/recipes/duckdb.md) for copy-pasteable
+ queries.
+
+ ### `prosa analytics`
+
+ Run built-in reports over exported Parquet files:
+
+ ```bash
+ prosa analytics sessions --refresh
+ prosa analytics tools --source codex
+ prosa analytics errors --output-format json
+ prosa analytics models --since 2026-01-01
+ prosa analytics projects --project /Users/me/app
+ ```
+
+ Reports require Parquet files. Add `--refresh` to export Parquet before running
+ the report. All reports support `--store`, `--parquet-dir`, `--source`,
+ `--since`, `--until`, `--limit`, and `--output-format table|json|csv`.
+
+ Additional filters:
+
+ ```bash
+ prosa analytics tools --tool-name Bash --errors-only
+ prosa analytics tools --canonical-type shell
+ prosa analytics errors --category tool_result
+ prosa analytics models --model gpt-5.4
+ ```
+
  ### `prosa tui`
 
  Open the Ink-based interactive explorer:
@@ -359,18 +393,16 @@ By default, HTTP mode listens at:
  http://127.0.0.1:7331/mcp
  ```
 
- Registered MCP tools include:
+ Registered MCP tools (six in total):
 
  | Tool | Purpose |
  |---|---|
- | `list_sessions` | List recent sessions with optional source/date filters |
- | `get_session` | Return metadata and timeline events for one session |
- | `search_sessions` | Full-text search over indexed session content |
- | `export_session_markdown` | Render a selected session as Markdown |
- | `list_tool_calls` | Audit commands and tool usage |
- | `find_touched_files` | Find sessions that touched a file/path |
- | `get_artifact` | Retrieve stored artifact text when available |
- | `index_status` | Show derived search index status |
+ | `search` | Full-text search over messages, commands, paths, diffs, and previews. Optional `engine`, `field_kind`, `since`/`until`, `raw`, `limit`. |
+ | `sessions` | Without `session_id`, lists candidates filtered by source/time/limit. With `session_id`, opens it: `format=detail` (default) returns metadata + timeline, `format=summary` returns the row only, `format=markdown` renders the transcript. |
+ | `tool_calls` | Audit commands and tool usage by tool_name, canonical_type, session_id, errors_only, time bounds. When `path_substring` is set, also returns matching artifacts. |
+ | `analytics` | Built-in aggregate reports backed by SQLite views: `report=sessions\|tools\|errors\|models\|projects` with the matching filters. |
+ | `artifact` | Fetch full text for an `artifact_id`. Binary artifacts return a placeholder. |
+ | `compile` | Without args, returns a status snapshot (search index health). With `source` (and optional `sessions_path`), imports that provider into the bundle. |
 
  Registered MCP prompts include:
 
@@ -380,7 +412,7 @@ Registered MCP prompts include:
  | `find_file_history` | Investigate history for a file or path |
  | `audit_tool_failures` | Group and explain failed tool calls |
 
- All MCP tools are read-only and use the same services as the CLI.
+ Five tools are read-only; `compile` is dual-mode (status without args, mutating import with args). All tools use the same services as the CLI.
 
  ## Common workflows
 
@@ -402,7 +434,15 @@ prosa export session <session-id> --format markdown
 
  ### Audit failed or suspicious tool usage
 
- Use MCP `list_tool_calls` for the richest tool-call filtering, or query Parquet:
+ Use the built-in analytics report for quick aggregates:
+
+ ```bash
+ prosa analytics tools --refresh --errors-only
+ prosa analytics errors --output-format json
+ ```
+
+ Use MCP `tool_calls` for the richest session-level filtering, or query
+ Parquet directly when you need custom SQL:
 
  ```bash
  prosa export parquet
@@ -414,6 +454,14 @@ prosa query duckdb "
  "
  ```
 
+ ### Summarize a custom session store through MCP
+
+ After compiling a non-default sessions path, use MCP `analytics report=sessions`
+ with `source_path_substring` to keep analysis inside prosa instead of reading
+ the source JSONL directly. This is useful for stores such as
+ `~/.codex-mz/sessions` that share the same provider name as the default Codex
+ store.
+
  ### Search faster with a sidecar index
 
  ```bash