@threadbase-sh/scanner 0.7.2 → 0.8.0

package/README.md CHANGED
@@ -9,33 +9,38 @@ Combines the best parts of four independent scanner implementations (VS Code, El
 
  ## Features
 
+ - **Persistent SQLite index** (default) — durable metadata/search index with incremental byte-offset updates: after the first scan, a grown conversation file is re-read for only its appended bytes. Opt out with `persistent: false` for a pure in-memory scan.
  - **Deep discovery** — `**/*.jsonl` glob finds all conversations including subagents (1,472 conversations vs 351-497 from individual scanners)
  - **Full metadata extraction** — session ID, project, git branch, model, tool names, teammate/subagent detection
- - **Full-text search** — FlexSearch-powered indexing across all metadata fields
+ - **Full-text search** — SQLite FTS5 (persistent) or FlexSearch (in-memory) across content and metadata
+ - **File watching** — optional chokidar watcher with a periodic-rescan correctness backstop, emitting change events
+ - **Bounded conversation paging** — read a message window without parsing the whole file, via byte-offset checkpoints
  - **Configurable content tiers** — `standard` (200/5K) and `full` (1,200/50K) preview/snippet limits, extensible
  - **Multiple views** — flat, tree (parent + subagents), grouped (by team)
  - **Filtering** — by project, account, time range, conversation type (conversations/subagents/teammates)
  - **5 sort modes** — recent, oldest, messages-desc, messages-asc, alphabetical
  - **Pagination** — limit/offset on all operations
+ - **Multi-provider** — index Threadbase/Claude history and local OpenAI Codex CLI sessions through one normalized pipeline (Codex is opt-in; in-memory path only — see below)
  - **Multi-profile** — scan multiple Claude config directories
  - **LRU caching** — metadata and conversation caches for fast repeated access
  - **Git branch detection** — reads `.git/HEAD` with parent directory walking
 
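The "incremental byte-offset updates" bullet in the Features list can be illustrated with a minimal sketch. It assumes newline-delimited JSONL files and a caller that remembers the last byte offset it processed. `readAppended` is a hypothetical helper written for this note, not part of the package API, and the package's real implementation may differ.

```typescript
import { appendFileSync, closeSync, fstatSync, mkdtempSync, openSync, readSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical helper (not the package's API): read only the bytes appended
// to `filePath` since `lastOffset` and return the complete lines found.
function readAppended(filePath: string, lastOffset: number): { lines: string[]; nextOffset: number } {
  const fd = openSync(filePath, 'r')
  try {
    const size = fstatSync(fd).size
    // A file smaller than the stored offset was truncated or rewritten: restart at byte 0.
    const start = size < lastOffset ? 0 : lastOffset
    const buf = Buffer.alloc(size - start)
    let filled = 0
    while (filled < buf.length) {
      const n = readSync(fd, buf, filled, buf.length - filled, start + filled)
      if (n === 0) break
      filled += n
    }
    // Stop at the last newline so a half-written trailing record is retried on the next call.
    const end = buf.subarray(0, filled).lastIndexOf(0x0a) + 1
    const lines = buf.subarray(0, end).toString('utf8').split('\n').filter((l) => l.length > 0)
    return { lines, nextOffset: start + end }
  } finally {
    closeSync(fd)
  }
}

// Usage: the second call reads only what was appended after the first scan.
const demoFile = join(mkdtempSync(join(tmpdir(), 'tb-demo-')), 'conv.jsonl')
writeFileSync(demoFile, '{"n":1}\n{"n":2}\n')
const first = readAppended(demoFile, 0)
appendFileSync(demoFile, '{"n":3}\n{"n":') // the last record is still being written
const second = readAppended(demoFile, first.nextOffset)
console.log(first.lines.length, second.lines, second.nextOffset)
```

Returning the offset of the last complete line, rather than the file size, means an unterminated trailing record is never counted as processed.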
  ## Installation
 
- This package is consumed from a public GitHub repo, not published to npm.
+ ```bash
+ npm install @threadbase-sh/scanner
+ ```
 
- To use it in your project, add it as a git URL dependency in your `package.json`:
+ **Requires Node.js 18 or later.** The package uses `better-sqlite3` (a native module) for its persistent index; prebuilt binaries are downloaded for common platforms, with a node-gyp fallback otherwise.
 
- ```json
- "dependencies": {
-   "@threadbase/scanner": "github:RonenMars/threadbase-scanner#v0.3.0"
- }
- ```
+ ### Persistent vs. in-memory
 
- Then run `npm install`. npm will clone this repo at tag `v0.3.0`, run its `prepare` script to build `dist/`, and make the package available under `node_modules/@threadbase/scanner/`.
+ By default the scanner maintains a durable SQLite index at `~/.config/threadbase-scanner/index.db`, so repeated scans only re-read files that changed and search/list queries are indexed. To opt out of the native dependency and use the legacy in-memory path, construct with `persistent: false` (or pass `--no-persist` to the CLI):
 
- **Requires Node.js 18 or later.**
+ ```typescript
+ const scanner = new ConversationScanner({ persistent: false }) // in-memory, no DB
+ const scanner2 = new ConversationScanner({ persistent: { dbPath: '/tmp/tb.db' } }) // custom DB
+ ```
 
  ## Library Usage
 
@@ -94,6 +99,68 @@ const result = await scanner.scan({
 
  // Reuse the scanner instance for cached lookups
  const conv = await scanner.getConversation(someId)
+
+ // Bounded page — reads only the requested window (persistent mode seeks from a
+ // checkpoint instead of parsing the whole file)
+ const page = await scanner.getConversationPage(someId, { limit: 50 })
+
+ // Collision-safe sessionId lookup (session ids are not unique)
+ const all = scanner.getConversationsBySessionId('sess-123')
+
+ // Release the SQLite connection when done
+ scanner.close()
+ ```
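The "seeks from a checkpoint" comment above can be made concrete with a rough sketch of checkpoint-based paging. It assumes one message per JSONL line and a checkpoint recorded every `stride` lines. `buildCheckpoints` and `readPage` are hypothetical names for this illustration and say nothing about how `getConversationPage` is actually implemented.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, readSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical sketch: at index time, record the byte offset at which every
// `stride`-th line starts, so checkpoints[k] is the offset of line k * stride.
function buildCheckpoints(filePath: string, stride: number): number[] {
  const data = readFileSync(filePath)
  const checkpoints = [0] // line 0 starts at byte 0
  let line = 0
  for (let i = 0; i < data.length; i++) {
    if (data[i] !== 0x0a) continue
    line++
    if (line % stride === 0 && i + 1 < data.length) checkpoints.push(i + 1)
  }
  return checkpoints
}

// Read `limit` lines starting at line `offset`, seeking to the nearest
// checkpoint at or before `offset` instead of parsing from byte 0.
// An unterminated final line is treated as incomplete and not returned.
function readPage(filePath: string, checkpoints: number[], stride: number, offset: number, limit: number): string[] {
  const k = Math.min(Math.floor(offset / stride), checkpoints.length - 1)
  let skip = offset - k * stride // lines to discard after the seek
  let position = checkpoints[k]
  const chunk = Buffer.alloc(64 * 1024)
  let pending: Buffer = Buffer.alloc(0) // start of a line that spans two chunks
  const page: string[] = []
  const fd = openSync(filePath, 'r')
  try {
    while (page.length < limit) {
      const n = readSync(fd, chunk, 0, chunk.length, position)
      if (n === 0) break
      position += n
      let data: Buffer = Buffer.concat([pending, chunk.subarray(0, n)])
      let nl = data.indexOf(0x0a)
      while (page.length < limit && nl !== -1) {
        if (skip > 0) skip--
        else page.push(data.subarray(0, nl).toString('utf8'))
        data = data.subarray(nl + 1)
        nl = data.indexOf(0x0a)
      }
      pending = data
    }
  } finally {
    closeSync(fd)
  }
  return page
}

// Usage: ten one-line messages, a checkpoint every 4 lines, then a 3-line window at line 5.
const pagedFile = join(mkdtempSync(join(tmpdir(), 'tb-page-')), 'conv.jsonl')
writeFileSync(pagedFile, Array.from({ length: 10 }, (_, i) => `msg-${i}\n`).join(''))
const checkpoints = buildCheckpoints(pagedFile, 4)
const window = readPage(pagedFile, checkpoints, 4, 5, 3)
console.log(checkpoints, window)
```

A larger stride keeps the checkpoint table small at the cost of skipping more lines after each seek.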
+
+ ### Scanning Codex CLI history (providers)
+
+ The scanner can index local **OpenAI Codex CLI** rollout sessions alongside the
+ default Threadbase/Claude history, normalizing both into the same
+ `ConversationMeta` model. Codex support is **opt-in**: pass `providers` and the
+ explicit `codexRoots` to discover under (no home directory is scanned by
+ default).
+
+ ```typescript
+ const scanner = new ConversationScanner()
+
+ const result = await scanner.scan({
+   providers: ['claude-code', 'codex-cli'],
+   codexRoots: ['~/.codex/sessions'], // expand ~ yourself, or pass an absolute path
+ })
+
+ // Each meta carries its source provider
+ for (const c of result.conversations) {
+   console.log(c.provider) // 'claude-code' | 'codex-cli'
+ }
+
+ // Search across both, or filter to one provider
+ const codexHits = await scanner.search('refactor', { provider: 'codex-cli' })
+ ```
+
+ `codexRoots` entries must be absolute paths — expand `~` before passing them
+ (e.g. ``join(homedir(), '.codex/sessions')``). Codex metas also set `kind`
+ (`'conversation'` | `'task'`) and `externalSessionId` (the Codex-native session
+ id) when available.
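Since the added text requires absolute `codexRoots` entries, a small sketch of the `~` expansion it mentions may help. `toCodexRoot` is a hypothetical helper introduced here for illustration; only `homedir`, `join`, and `isAbsolute` are real Node.js calls.

```typescript
import { homedir } from 'node:os'
import { isAbsolute, join } from 'node:path'

// Hypothetical helper (not part of the package): expand a leading "~" and
// reject anything that is still relative afterwards.
function toCodexRoot(p: string): string {
  const expanded = p === '~' ? homedir() : p.startsWith('~/') ? join(homedir(), p.slice(2)) : p
  if (!isAbsolute(expanded)) {
    throw new Error(`codexRoots entry must be an absolute path: ${p}`)
  }
  return expanded
}

// Usage: build the list first, then pass it as `codexRoots` to scanner.scan().
const codexRoots = ['~/.codex/sessions'].map(toCodexRoot)
console.log(codexRoots)
```

Failing fast on a relative entry is a choice made for this sketch; the README does not say how the scanner itself treats one.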
+
+ > **⚠️ In-memory only (for now).** Codex support runs through the legacy
+ > in-memory scan path — the SQLite persistent engine indexes Threadbase/Claude
+ > files only. Requesting `codex-cli` (via `providers` or `codexRoots`)
+ > automatically routes that scan/search through the in-memory path, even on a
+ > scanner constructed in persistent mode. Threadbase-only scans are unaffected
+ > and still use SQLite. Persistent-mode Codex indexing is a planned follow-up.
+
+ ### Watching for changes (persistent mode)
+
+ ```typescript
+ const scanner = new ConversationScanner() // persistent by default
+
+ scanner.on('change', ({ filePath, meta }) => {
+   // meta is the fresh ConversationMeta, or null if the file was removed
+   refreshUI(meta)
+ })
+
+ await scanner.watch() // filesystem watcher + periodic rescan backstop
+ // ... later
+ await scanner.unwatch()
  ```
 
  ### View modes
@@ -263,6 +330,9 @@ Every scanned conversation produces a `ConversationMeta` with the full superset
  | `isTeammate` | boolean | VS Code |
  | `teamName` | string \| null | VS Code |
  | `toolNames` | string[] | CLI |
+ | `provider` | `'claude-code' \| 'codex-cli'` | Provider that produced the meta |
+ | `kind` | `'conversation' \| 'task'` | Codex (optional) |
+ | `externalSessionId` | string | Codex-native session id (optional) |
 
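The three fields added to the table can be consumed as in the sketch below. `MetaSubset` is a local stand-in declared only for this example, and it assumes that "optional" means the property may be absent. The real `ConversationMeta` carries many more fields.

```typescript
// Local stand-in for the three new fields (assumption: optional means the property may be absent).
type Provider = 'claude-code' | 'codex-cli'

interface MetaSubset {
  provider: Provider
  kind?: 'conversation' | 'task'
  externalSessionId?: string
}

// Count metas per provider.
function countByProvider(metas: MetaSubset[]): Record<Provider, number> {
  const counts: Record<Provider, number> = { 'claude-code': 0, 'codex-cli': 0 }
  for (const m of metas) counts[m.provider]++
  return counts
}

// Collect the Codex-native session ids of Codex metas marked as tasks.
function codexTaskIds(metas: MetaSubset[]): string[] {
  return metas
    .filter((m) => m.provider === 'codex-cli' && m.kind === 'task')
    .flatMap((m) => (m.externalSessionId ? [m.externalSessionId] : []))
}

// Usage with made-up sample data.
const sample: MetaSubset[] = [
  { provider: 'claude-code' },
  { provider: 'codex-cli', kind: 'task', externalSessionId: 'example-id-1' },
  { provider: 'codex-cli', kind: 'conversation' },
]
console.log(countByProvider(sample), codexTaskIds(sample))
```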
  ## Development