@threadbase-sh/scanner 0.7.2 → 0.8.0
- package/README.md +80 -10
- package/dist/cli.js +2059 -338
- package/dist/cli.js.map +1 -1
- package/dist/index.cjs +2114 -376
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +154 -6
- package/dist/index.d.ts +154 -6
- package/dist/index.js +2111 -377
- package/dist/index.js.map +1 -1
- package/package.json +4 -1
package/README.md
@@ -9,33 +9,38 @@ Combines the best parts of four independent scanner implementations (VS Code, El
 
 ## Features
 
+- **Persistent SQLite index** (default) — durable metadata/search index with incremental byte-offset updates: after the first scan, a grown conversation file is re-read for only its appended bytes. Opt out with `persistent: false` for a pure in-memory scan.
 - **Deep discovery** — `**/*.jsonl` glob finds all conversations including subagents (1,472 conversations vs 351-497 from individual scanners)
 - **Full metadata extraction** — session ID, project, git branch, model, tool names, teammate/subagent detection
-- **Full-text search** — FlexSearch-
+- **Full-text search** — SQLite FTS5 (persistent) or FlexSearch (in-memory) across content and metadata
+- **File watching** — optional chokidar watcher with a periodic-rescan correctness backstop, emitting change events
+- **Bounded conversation paging** — read a message window without parsing the whole file, via byte-offset checkpoints
 - **Configurable content tiers** — `standard` (200/5K) and `full` (1,200/50K) preview/snippet limits, extensible
 - **Multiple views** — flat, tree (parent + subagents), grouped (by team)
 - **Filtering** — by project, account, time range, conversation type (conversations/subagents/teammates)
 - **5 sort modes** — recent, oldest, messages-desc, messages-asc, alphabetical
 - **Pagination** — limit/offset on all operations
+- **Multi-provider** — index Threadbase/Claude history and local OpenAI Codex CLI sessions through one normalized pipeline (Codex is opt-in; in-memory path only — see below)
 - **Multi-profile** — scan multiple Claude config directories
 - **LRU caching** — metadata and conversation caches for fast repeated access
 - **Git branch detection** — reads `.git/HEAD` with parent directory walking
 
 ## Installation
 
-
+```bash
+npm install @threadbase-sh/scanner
+```
 
-
+**Requires Node.js 18 or later.** The package uses `better-sqlite3` (a native module) for its persistent index; prebuilt binaries are downloaded for common platforms, with a node-gyp fallback otherwise.
 
-
-"dependencies": {
-  "@threadbase/scanner": "github:RonenMars/threadbase-scanner#v0.3.0"
-}
-```
+### Persistent vs. in-memory
 
-
+By default the scanner maintains a durable SQLite index at `~/.config/threadbase-scanner/index.db`, so repeated scans only re-read files that changed and search/list queries are indexed. To opt out of the native dependency and use the legacy in-memory path, construct with `persistent: false` (or pass `--no-persist` to the CLI):
 
-
+```typescript
+const scanner = new ConversationScanner({ persistent: false }) // in-memory, no DB
+const scanner2 = new ConversationScanner({ persistent: { dbPath: '/tmp/tb.db' } }) // custom DB
+```
 
 ## Library Usage
 
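Reviewer note on the "incremental byte-offset updates" feature added in the hunk above: the diff does not show how the package implements it, so the following is only a minimal sketch of the general technique, assuming newline-delimited JSONL files that are appended to. `readAppendedLines` is a hypothetical helper, not an export of `@threadbase-sh/scanner`; it uses only Node.js built-ins.

```typescript
import { appendFileSync, closeSync, fstatSync, mkdtempSync, openSync, readSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical helper (an assumption, not the package's API): returns the complete
// lines appended since `offset`, plus the offset to store for the next call.
function readAppendedLines(filePath: string, offset: number): { lines: string[]; nextOffset: number } {
  const fd = openSync(filePath, 'r')
  try {
    const { size } = fstatSync(fd)
    // A file smaller than the stored offset was truncated or replaced: start over.
    const start = size < offset ? 0 : offset
    const length = size - start
    const buf = Buffer.alloc(length)
    let read = 0
    while (read < length) {
      const n = readSync(fd, buf, read, length - read, start + read)
      if (n === 0) break
      read += n
    }
    const chunk = buf.subarray(0, read)
    // Consume only up to the last newline so a half-written trailing line is
    // left for the next call.
    const lastNewline = chunk.lastIndexOf(0x0a)
    if (lastNewline === -1) return { lines: [], nextOffset: start }
    const complete = chunk.subarray(0, lastNewline + 1).toString('utf8')
    const lines = complete.split('\n').filter((l) => l.length > 0)
    return { lines, nextOffset: start + lastNewline + 1 }
  } finally {
    closeSync(fd)
  }
}

// Usage: the first call reads both lines; after an append, only the new line is read.
const file = join(mkdtempSync(join(tmpdir(), 'tb-')), 'session.jsonl')
writeFileSync(file, '{"n":1}\n{"n":2}\n')
const first = readAppendedLines(file, 0)
appendFileSync(file, '{"n":3}\n')
const second = readAppendedLines(file, first.nextOffset)
```

Cutting at the last newline byte keeps the stored offset on a line boundary, which also avoids splitting a multi-byte UTF-8 character between two reads.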
@@ -94,6 +99,68 @@ const result = await scanner.scan({
 
 // Reuse the scanner instance for cached lookups
 const conv = await scanner.getConversation(someId)
+
+// Bounded page — reads only the requested window (persistent mode seeks from a
+// checkpoint instead of parsing the whole file)
+const page = await scanner.getConversationPage(someId, { limit: 50 })
+
+// Collision-safe sessionId lookup (session ids are not unique)
+const all = scanner.getConversationsBySessionId('sess-123')
+
+// Release the SQLite connection when done
+scanner.close()
+```
+
+### Scanning Codex CLI history (providers)
+
+The scanner can index local **OpenAI Codex CLI** rollout sessions alongside the
+default Threadbase/Claude history, normalizing both into the same
+`ConversationMeta` model. Codex support is **opt-in**: pass `providers` and the
+explicit `codexRoots` to discover under (no home directory is scanned by
+default).
+
+```typescript
+const scanner = new ConversationScanner()
+
+const result = await scanner.scan({
+  providers: ['claude-code', 'codex-cli'],
+  codexRoots: ['~/.codex/sessions'], // expand ~ yourself, or pass an absolute path
+})
+
+// Each meta carries its source provider
+for (const c of result.conversations) {
+  console.log(c.provider) // 'claude-code' | 'codex-cli'
+}
+
+// Search across both, or filter to one provider
+const codexHits = await scanner.search('refactor', { provider: 'codex-cli' })
+```
+
+`codexRoots` entries must be absolute paths — expand `~` before passing them
+(e.g. ``join(homedir(), '.codex/sessions')``). Codex metas also set `kind`
+(`'conversation'` | `'task'`) and `externalSessionId` (the Codex-native session
+id) when available.
+
+> **⚠️ In-memory only (for now).** Codex support runs through the legacy
+> in-memory scan path — the SQLite persistent engine indexes Threadbase/Claude
+> files only. Requesting `codex-cli` (via `providers` or `codexRoots`)
+> automatically routes that scan/search through the in-memory path, even on a
+> scanner constructed in persistent mode. Threadbase-only scans are unaffected
+> and still use SQLite. Persistent-mode Codex indexing is a planned follow-up.
+
+### Watching for changes (persistent mode)
+
+```typescript
+const scanner = new ConversationScanner() // persistent by default
+
+scanner.on('change', ({ filePath, meta }) => {
+  // meta is the fresh ConversationMeta, or null if the file was removed
+  refreshUI(meta)
+})
+
+await scanner.watch() // filesystem watcher + periodic rescan backstop
+// ... later
+await scanner.unwatch()
 ```
 
 ### View modes
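Reviewer note on the Codex hunk above: the added README text states that `codexRoots` entries must be absolute paths and that `~` has to be expanded by the caller, while its own example passes a literal `'~/.codex/sessions'`. A minimal sketch of the expansion step, assuming only Node.js built-ins, could look like this. `expandHome` is a hypothetical helper and is not exported by `@threadbase-sh/scanner`.

```typescript
import { homedir } from 'node:os'
import { join, resolve } from 'node:path'

// Hypothetical helper (an assumption, not part of the package): expands a leading
// "~" to the home directory and resolves anything else to an absolute path.
function expandHome(p: string, home: string = homedir()): string {
  if (p === '~') return home
  if (p.startsWith('~/')) return join(home, p.slice(2))
  return resolve(p) // relative paths are resolved against the current working directory
}

// Absolute roots, ready to be passed as the `codexRoots` option shown in the hunk above.
const codexRoots = ['~/.codex/sessions'].map((p) => expandHome(p))
```

The second parameter exists only so the helper can be exercised without depending on the real home directory.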
@@ -263,6 +330,9 @@ Every scanned conversation produces a `ConversationMeta` with the full superset
 | `isTeammate` | boolean | VS Code |
 | `teamName` | string \| null | VS Code |
 | `toolNames` | string[] | CLI |
+| `provider` | `'claude-code' \| 'codex-cli'` | Provider that produced the meta |
+| `kind` | `'conversation' \| 'task'` | Codex (optional) |
+| `externalSessionId` | string | Codex-native session id (optional) |
 
 ## Development
 