@threadbase-sh/scanner 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Ronen Mars
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,304 @@
1
+ # @threadbase/scanner
2
+
3
+ Unified Claude Code conversation history scanner.
4
+
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
6
+ [![Node.js](https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg)](https://nodejs.org)
7
+
8
+ Combines the best parts of four independent scanner implementations (VS Code, Electron, IntelliJ, CLI) into a single TypeScript package.
9
+
10
+ ## Features
11
+
12
+ - **Deep discovery** — `**/*.jsonl` glob finds all conversations including subagents (1,472 conversations vs 351-497 from individual scanners)
13
+ - **Full metadata extraction** — session ID, project, git branch, model, tool names, teammate/subagent detection
14
+ - **Full-text search** — FlexSearch-powered indexing across all metadata fields
15
+ - **Configurable content tiers** — `standard` (200/5K) and `full` (1,200/50K) preview/snippet limits, extensible
16
+ - **Multiple views** — flat, tree (parent + subagents), grouped (by team)
17
+ - **Filtering** — by project, account, time range, conversation type (conversations/subagents/teammates)
18
+ - **5 sort modes** — recent, oldest, messages-desc, messages-asc, alphabetical
19
+ - **Pagination** — limit/offset on all operations
20
+ - **Multi-profile** — scan multiple Claude config directories
21
+ - **LRU caching** — metadata and conversation caches for fast repeated access
22
+ - **Git branch detection** — reads `.git/HEAD` with parent directory walking
23
+
24
+ ## Installation
25
+
26
+ This package is consumed from a public GitHub repo, not published to npm.
27
+
28
+ To use it in your project, add it as a git URL dependency in your `package.json`:
29
+
30
+ ```json
31
+ "dependencies": {
32
+ "@threadbase/scanner": "github:RonenMars/threadbase-scanner#v0.3.0"
33
+ }
34
+ ```
35
+
36
+ Then run `npm install`. npm will clone this repo at tag `v0.3.0`, run its `prepare` script to build `dist/`, and make the package available under `node_modules/@threadbase/scanner/`.
37
+
38
+ **Requires Node.js 18 or later.**
39
+
40
+ ## Library Usage
41
+
42
+ ```typescript
43
+ import { scan, search, getConversation, ConversationScanner } from '@threadbase/scanner'
44
+
45
+ // Quick scan with defaults
46
+ const result = await scan()
47
+ console.log(`Found ${result.total} conversations`)
48
+
49
+ // Scan with options
50
+ const filtered = await scan({
51
+ sort: 'recent',
52
+ since: '7d',
53
+ project: 'my-app',
54
+ include: 'conversations', // exclude subagents/teammates
55
+ tier: 'full', // larger previews
56
+ limit: 20,
57
+ offset: 0,
58
+ })
59
+
60
+ // Full-text search
61
+ const results = await search('authentication bug', {
62
+ limit: 10,
63
+ project: 'backend',
64
+ })
65
+
66
+ for (const r of results) {
67
+ console.log(r.meta.projectName, r.matches[0]?.snippet)
68
+ }
69
+
70
+ // Load full conversation
71
+ const conv = await getConversation(results[0].meta.id)
72
+ for (const msg of conv.messages) {
73
+ console.log(`[${msg.role}] ${msg.text.slice(0, 100)}`)
74
+ }
75
+ ```
76
+
77
+ ### Using the class directly
78
+
79
+ ```typescript
80
+ import { ConversationScanner } from '@threadbase/scanner'
81
+
82
+ const scanner = new ConversationScanner({ conversationCacheSize: 10 })
83
+
84
+ // Scan with progress and batch callbacks
85
+ const result = await scanner.scan({
86
+ onProgress: (scanned, total) => console.log(`${scanned}/${total}`),
87
+ onBatch: (metas) => {
88
+ // Incrementally update UI as batches complete
89
+ for (const meta of metas) {
90
+ addToList(meta)
91
+ }
92
+ },
93
+ })
94
+
95
+ // Reuse the scanner instance for cached lookups
96
+ const conv = await scanner.getConversation(someId)
97
+ ```
98
+
99
+ ### View modes
100
+
101
+ ```typescript
102
+ // Flat (default) — all conversations in a single list
103
+ await scan({ view: 'flat' })
104
+
105
+ // Tree — parent conversations with nested subagents
106
+ await scan({ view: 'tree' })
107
+ // Returns TreeConversation[] with .subagents array
108
+
109
+ // Grouped — conversations grouped by team name
110
+ await scan({ view: 'grouped' })
111
+ // Returns { [teamName: string]: ConversationMeta[] }
112
+ ```
113
+
114
+ ### Custom content tiers
115
+
116
+ ```typescript
117
+ await scan({
118
+ tier: 'compact',
119
+ tiers: {
120
+ compact: { name: 'compact', previewMax: 50, snippetMax: 500 },
121
+ },
122
+ })
123
+ ```
124
+
125
+ ### Shared default scanner
126
+
127
+ The convenience functions `scan`, `search`, and `getConversation` share a lazy module-level `ConversationScanner` so the FlexSearch index and conversation LRU survive across calls. A first `scan()` warms state; a subsequent `search()` reuses the already-built index instead of re-walking the filesystem.
128
+
129
+ ```typescript
130
+ import { scan, search, getConversation, resetDefaultScanner } from '@threadbase/scanner'
131
+
132
+ await scan({ profiles }) // warms the shared scanner
133
+ await search('auth', { profiles }) // hits the in-memory index — no re-scan
134
+ await getConversation(id) // LRU hit on subsequent calls for the same id
135
+
136
+ // Drop shared state (e.g. between tests, or to force a fresh scan)
137
+ resetDefaultScanner()
138
+ ```
139
+
140
+ To run isolated state (parallel scans with different options, multi-tenant hosts, etc.) pass an explicit scanner as the optional third parameter:
141
+
142
+ ```typescript
143
+ import { ConversationScanner, scan, search } from '@threadbase/scanner'
144
+
145
+ const work = new ConversationScanner()
146
+ const personal = new ConversationScanner()
147
+
148
+ await scan({ profiles: workProfiles }, work)
149
+ await scan({ profiles: personalProfiles }, personal)
150
+
151
+ const results = await search('query', { limit: 20 }, work)
152
+ ```
153
+
154
+ The shared scanner does **not** auto-refresh: it reflects the filesystem at the time of the first scan. Call `resetDefaultScanner()` (or `scan()` again) when you need to pick up newly-created `.jsonl` files.
155
+
156
+ ### Logging
157
+
158
+ The library uses [pino](https://getpino.io) internally and ships with a default **silent** logger, so embedding it produces no console output unless you opt in.
159
+
160
+ ```typescript
161
+ import pino from 'pino'
162
+ import { setLogger, createLogger } from '@threadbase/scanner'
163
+
164
+ // Use your own pino instance
165
+ setLogger(pino({ level: 'info' }))
166
+
167
+ // Or build one from options
168
+ setLogger(createLogger({ level: 'debug' }))
169
+ ```
170
+
171
+ The CLI installs a `pino-pretty` transport on stderr at level `info` by default. Override with the `LOG_LEVEL` env var:
172
+
173
+ ```bash
174
+ LOG_LEVEL=debug threadbase-scanner scan
175
+ LOG_LEVEL=silent threadbase-scanner list --json
176
+ ```
177
+
178
+ Log events the scanner emits include `scan: start` / `scan: complete` (with timings + counts), `search: start` / `search: complete`, batched discovery summaries, parse-failure warnings, and `getConversation` cache-hit traces. Previously-swallowed errors (broken JSONL, inaccessible files, missing config dirs) now surface as `warn`-level events with structured context — useful for diagnosing why a particular conversation didn't show up.
179
+
180
+ ### Profiles
181
+
182
+ ```typescript
183
+ import { loadProfiles, saveProfiles } from '@threadbase/scanner'
184
+
185
+ // Load from ~/.config/threadbase-scanner/profiles.json
186
+ const profiles = await loadProfiles('~/.config/threadbase-scanner')
187
+
188
+ // Scan specific profiles
189
+ await scan({
190
+ profiles: [
191
+ { id: 'work', label: 'Work', configDir: '~/.claude-work', enabled: true },
192
+ { id: 'personal', label: 'Personal', configDir: '~/.claude', enabled: true },
193
+ ],
194
+ })
195
+ ```
196
+
197
+ ## CLI Usage
198
+
199
+ ```bash
200
+ # Install globally
201
+ npm install -g @threadbase/scanner
202
+
203
+ # Scan all conversations
204
+ threadbase-scanner scan
205
+
206
+ # List recent conversations
207
+ threadbase-scanner list --limit 20 --sort recent
208
+
209
+ # List with filters
210
+ threadbase-scanner list --since 7d --project my-app --include conversations
211
+
212
+ # Full-text search
213
+ threadbase-scanner search "fix bug" --limit 10
214
+
215
+ # Show a full conversation (prefix match on session ID)
216
+ threadbase-scanner show 879dd66c
217
+
218
+ # JSON output (for piping)
219
+ threadbase-scanner list --json | jq '.conversations[].projectName'
220
+
221
+ # Profile management
222
+ threadbase-scanner profiles list
223
+ threadbase-scanner profiles add work ~/.claude-work
224
+ threadbase-scanner profiles remove work
225
+ ```
226
+
227
+ ### CLI Flags
228
+
229
+ | Flag | Commands | Description |
230
+ |---|---|---|
231
+ | `--limit, -l` | list, search | Max results (default: 20) |
232
+ | `--offset` | list, search | Skip N results (default: 0) |
233
+ | `--sort, -s` | list, search | `recent\|oldest\|messages-desc\|messages-asc\|alpha` |
234
+ | `--since` | list, search | Time filter: `7d`, `2w`, `24h`, `2024-01-15` |
235
+ | `--project, -p` | list, search | Filter by project name/path |
236
+ | `--account, -a` | list, search | Filter by profile account |
237
+ | `--include` | list | `all\|conversations\|subagents\|teammates` |
238
+ | `--tier` | list, scan | Content tier: `standard\|full` |
239
+ | `--json` | all | JSON output |
240
+
241
+ ## ConversationMeta Fields
242
+
243
+ Every scanned conversation produces a `ConversationMeta` with the full superset of fields from all four original scanners:
244
+
245
+ | Field | Type | Origin |
246
+ |---|---|---|
247
+ | `id` | string | All |
248
+ | `filePath` | string | All |
249
+ | `sessionId` | string | All |
250
+ | `sessionName` | string | All |
251
+ | `projectPath` | string | All |
252
+ | `projectName` | string | All |
253
+ | `account` | string | All |
254
+ | `timestamp` | string (ISO-8601) | All |
255
+ | `messageCount` | number | All |
256
+ | `lastMessageSender` | `'user' \| 'assistant'` | Electron/VS Code/IntelliJ |
257
+ | `preview` | string | All (tier-dependent) |
258
+ | `contentSnippet` | string | Electron/VS Code/IntelliJ (tier-dependent) |
259
+ | `gitBranch` | string \| null | IntelliJ/CLI |
260
+ | `model` | string \| null | IntelliJ |
261
+ | `isSubagent` | boolean | VS Code |
262
+ | `parentSessionId` | string \| null | VS Code |
263
+ | `isTeammate` | boolean | VS Code |
264
+ | `teamName` | string \| null | VS Code |
265
+ | `toolNames` | string[] | CLI |
266
+
267
+ ## Development
268
+
269
+ ```bash
270
+ npm install
271
+ npm test # run tests
272
+ npm run build # build ESM + CJS + types
273
+ npm run lint # type check
274
+ ```
275
+
276
+ ## Contributing
277
+
278
+ Small bugfixes and parser improvements are welcome. For design changes, please open an issue first to discuss the shape before opening a PR.
279
+
280
+ - Use conventional commits (`feat:`, `fix:`, `chore:`, etc.) — see [`CLAUDE.md`](./CLAUDE.md) for project conventions.
281
+ - Run `npm run lint && npm test` before opening a PR.
282
+ - New features need an integration or e2e test in `__tests__/`; new parser cases need a fixture in `__fixtures__/`.
283
+
284
+ ## Architecture
285
+
286
+ ```
287
+ src/
288
+ index.ts Public API exports + standalone functions
289
+ types.ts All interfaces (ConversationMeta, ScanOptions, etc.)
290
+ scanner.ts ConversationScanner class (main orchestrator)
291
+ discovery.ts File discovery (fast-glob + exclusions)
292
+ parser.ts JSONL parsing (meta + full conversation)
293
+ indexer.ts FlexSearch-based search indexing
294
+ filters.ts Sort, since-filter, include, pagination
295
+ cache.ts LRU cache
296
+ git.ts Git branch detection
297
+ profiles.ts Profile management
298
+ tags.ts System tag cleaning
299
+ tiers.ts Content tier definitions
300
+ logger.ts Pino-based logger seam (silent by default)
301
+ cli/
302
+ index.ts CLI entry point (commander)
303
+ commands/ list, search, show, scan, profiles
304
+ ```