inspecto 1.1.0 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +150 -67
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -6,12 +6,97 @@
6
6
 
7
7
  **Claude Code session quality analyzer — grade sessions, detect regressions, catch cache bugs.**
8
8
 
9
- > CLI to grade Claude Code sessions — 750+ weekly downloads in first week.
9
+ > CLI to grade Claude Code sessions — 915+ total downloads.
10
10
 
11
+ ---
12
+
13
+ ## What's New in v1.1.0
14
+
15
+ 915+ downloads in. Thank you. Here's everything that shipped.
16
+
17
+ ### 5 new quality metrics — 7 → 12
18
+
19
+ | # | Metric | What it measures |
20
+ |---|---|---|
21
+ | **M8** | Subagent overhead | Whether Claude is delegating work to subagents efficiently — and whether those subagents are doing meaningful work |
22
+ | **M9** | Tool error rate | How often Claude's tool calls return errors. High rates mean Claude is calling tools with bad arguments or on paths that don't exist |
23
+ | **M10** | Thinking utilization | Whether extended thinking is actually being used on turns that warrant it. Low utilization on complex sessions often predicts high retry density |
24
+ | **M11** | MCP usage | Informational count of MCP tool calls (web search, web fetch, custom servers) per session |
25
+ | **M12** | Session cost | Estimated USD cost from real token usage — output, cache creation, and cache reads, priced at current Sonnet rates |
26
+
27
+ All 12 metrics are pure functions of your local session files. No data leaves your machine.
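M12's estimate can be sketched as a sum over per-turn usage counters. This sketch assumes records shaped like the Anthropic Messages API `usage` object, which has the field names cited under Key technical details. It skips `input_tokens`, as that section describes, and prices output, cache creation, and cache reads per million tokens. `EXAMPLE_RATES` holds illustrative Sonnet-tier figures, not values read from inspecto, so substitute current published pricing.

```typescript
// Sketch: estimated session cost from per-turn usage counters.
// input_tokens is deliberately ignored (the README calls it a streaming
// placeholder); billable input = cache reads + cache creation.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// USD per million tokens. Illustrative values only, not taken from inspecto.
interface Rates {
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

const EXAMPLE_RATES: Rates = { output: 15, cacheWrite: 3.75, cacheRead: 0.3 };

function sessionCostUsd(turns: Usage[], rates: Rates = EXAMPLE_RATES): number {
  let cost = 0;
  for (const u of turns) {
    cost +=
      (u.output_tokens * rates.output +
        u.cache_creation_input_tokens * rates.cacheWrite +
        u.cache_read_input_tokens * rates.cacheRead) /
      1e6;
  }
  return cost;
}

const turns: Usage[] = [
  { input_tokens: 4, output_tokens: 2000, cache_creation_input_tokens: 10000, cache_read_input_tokens: 200000 },
];
console.log(sessionCostUsd(turns).toFixed(4)); // "0.1275"
```

Passing `rates` as a parameter keeps the price table out of the metric itself, so a different model tier only needs a different `Rates` object.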
28
+
29
+ ### Watch mode
30
+
31
+ ```bash
32
+ npx inspecto watch
33
+ ```
34
+
35
+ Streams live metric updates as Claude Code writes to the active session. Grade updates every time a new assistant turn lands. Hit Ctrl-C to exit.
36
+
37
+ ### Per-project config
38
+
39
+ Drop `.inspecto.json` in your repo root to override thresholds and weights for your team:
40
+
41
+ ```json
42
+ {
43
+   "thresholds": {
44
+     "tokensPerEdit": { "healthy": 8000, "warning": 15000 },
45
+     "sessionCost": { "healthy": 5.00, "warning": 10.00 }
46
+   },
47
+   "weights": {
48
+     "cacheHitRate": 0.20,
49
+     "taskCompletion": 0.20
50
+   }
51
+ }
52
+ ```
53
+
54
+ Validate your config at any time:
55
+
56
+ ```bash
57
+ npx inspecto config validate
58
+ ```
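The `healthy`/`warning` pairs in `.inspecto.json` suggest a three-way classification. Here is a minimal sketch under several assumptions. Lower is better for `tokensPerEdit` and `sessionCost`, matching the `≤` cutoffs in the metrics table, and higher is better for metrics such as `cacheHitRate`. Values past `warning` land in a third bucket called `critical` here. That name is hypothetical, since the sample output only shows `healthy` and `warning`. `classify` is illustrative, not inspecto's API.

```typescript
// Sketch: classify a metric value with a { healthy, warning } pair such as
// the ones in .inspecto.json. "critical" is a hypothetical third status.
type Status = "healthy" | "warning" | "critical";
type Direction = "lower-is-better" | "higher-is-better";

interface Threshold {
  healthy: number;
  warning: number;
}

function classify(value: number, t: Threshold, dir: Direction): Status {
  if (dir === "lower-is-better") {
    if (value <= t.healthy) return "healthy";
    if (value <= t.warning) return "warning";
    return "critical";
  }
  // Higher is better: here the healthy bound is the larger number.
  if (value >= t.healthy) return "healthy";
  if (value >= t.warning) return "warning";
  return "critical";
}

// Values from the example .inspecto.json.
const tokensPerEdit: Threshold = { healthy: 8000, warning: 15000 };
const sessionCost: Threshold = { healthy: 5.0, warning: 10.0 };

console.log(classify(3218, tokensPerEdit, "lower-is-better")); // healthy
console.log(classify(12000, tokensPerEdit, "lower-is-better")); // warning
console.log(classify(12.5, sessionCost, "lower-is-better")); // critical
```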
59
+
60
+ ### 2–3× faster trend and compare
61
+
62
+ Computed grades are now cached in `~/.claude/inspecto-cache.db` (SQLite via `node:sqlite`). Cache key is `sha256(path:mtime)` — it invalidates automatically when Claude Code appends to a session. Re-runs over unchanged sessions skip parsing entirely. Up to 16 sessions are also parsed in parallel, so a 300-session history finishes in seconds instead of minutes.
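Parsing up to 16 sessions in parallel (described as semaphore-limited under Key technical details) can be sketched with a small counting semaphore. `Semaphore`, `mapLimited`, and the commented-out `parseAndGradeSession` are hypothetical names; inspecto's own concurrency helper may be structured differently.

```typescript
// Sketch: a counting semaphore that caps how many async tasks run at once,
// plus a map helper built on it. Names are hypothetical, not inspecto's API.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No free slot: park until release() hands one over.
    await new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // transfer the slot directly to the oldest waiter
    else this.available++;
  }
}

// Runs fn over every item with at most `limit` calls in flight; results keep input order.
async function mapLimited<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>) {
  const sem = new Semaphore(limit);
  return Promise.all(
    items.map(async (item) => {
      await sem.acquire();
      try {
        return await fn(item);
      } finally {
        sem.release();
      }
    }),
  );
}

// Usage (hypothetical parser): grade session files 16 at a time.
// const grades = await mapLimited(sessionPaths, 16, parseAndGradeSession);
```

Handing the slot straight to a waiter in `release()`, instead of incrementing the counter and letting waiters race for it, keeps the in-flight count from ever exceeding `limit`.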
63
+
64
+ ### CI exit codes
65
+
66
+ `audit`, `trend`, and `cache-check` now exit 1 on failures. Drop into any pipeline:
67
+
68
+ ```bash
69
+ npx inspecto trend --since 14d # exits 1 on any regression
70
+ npx inspecto audit --no-fail # warns but never blocks
71
+ ```
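The exit-code rules in this README reduce to a small pure function: `audit` exits 1 on D/F grades, `trend` on any regression, `cache-check` on any anomaly, `compare` never, and `--no-fail` suppresses all of them. This sketch uses hypothetical result shapes. It treats every D-range grade as failing, including D+. The previous README drew the line below D+ (score < 67), so that boundary is an assumption.

```typescript
// Sketch of the documented CI exit-code rules. Result shapes are hypothetical.
type CommandResult =
  | { command: "audit"; grade: string } // e.g. "A+", "B-", "D", "F"
  | { command: "trend"; regressions: number }
  | { command: "cache-check"; anomalies: number }
  | { command: "compare" };

function exitCode(result: CommandResult, noFail: boolean): 0 | 1 {
  if (noFail) return 0; // --no-fail always exits 0
  switch (result.command) {
    case "audit":
      // Any D-range or F grade fails (assumption: D+ counts as a D grade).
      return /^[DF]/.test(result.grade) ? 1 : 0;
    case "trend":
      return result.regressions > 0 ? 1 : 0;
    case "cache-check":
      return result.anomalies > 0 ? 1 : 0;
    case "compare":
      return 0; // comparison is informational
  }
}

console.log(exitCode({ command: "audit", grade: "B+" }, false)); // 0
console.log(exitCode({ command: "trend", regressions: 2 }, false)); // 1
console.log(exitCode({ command: "trend", regressions: 2 }, true)); // 0
```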
11
72
 
73
+ ### Other additions
12
74
 
13
- > AMD's AI director manually analyzed 7,000 Claude Code sessions to prove it got worse.
14
- > `inspecto` automates that analysis for every developer.
75
+ - **`inspecto list`** lets you discover projects and sessions before running `audit` or `compare`, so you no longer have to guess project slugs for `--project`.
76
+ - **CSV export** — `--format csv` on `audit` and `trend` for dashboards, spreadsheets, and log aggregators.
77
+ - **Subagent session aggregation** — inspecto now reads subagent JSONL files (`{sessionId}/subagents/agent-*.jsonl`) and merges them into the parent session. Multi-agent sessions were previously graded with large gaps in tool calls and token usage.
78
+ - **Format version detection** — inspecto now reads the `version` field on every JSONL record and warns when the format differs from what it was built against. Unknown record types are surfaced in the output rather than silently dropped.
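Subagent aggregation can be pictured as tagging each subagent turn with an id and interleaving it into the parent's turn list. This sketch makes three assumptions rather than following inspecto's documented behavior: the `Turn` shape is trimmed to two fields, turns are ordered by timestamp, and `agentId` comes from the `agent-*.jsonl` file name.

```typescript
// Sketch: merge subagent turns into the parent session's turn list.
// Turn shape, timestamp ordering, and agentId-from-filename are assumptions.
interface Turn {
  timestamp: string; // ISO 8601
  outputTokens: number;
  agentId?: string; // undefined for the parent session's own turns
}

// ".../<sessionId>/subagents/agent-7f2e.jsonl" -> "7f2e" (POSIX separators assumed)
function agentIdFromPath(path: string): string | undefined {
  const match = /\/subagents\/agent-([^/]+)\.jsonl$/.exec(path);
  return match ? match[1] : undefined;
}

function mergeSubagentTurns(parent: Turn[], subagentFiles: Map<string, Turn[]>): Turn[] {
  const merged: Turn[] = [...parent];
  subagentFiles.forEach((turns, path) => {
    const agentId = agentIdFromPath(path);
    for (const t of turns) merged.push({ ...t, agentId });
  });
  // Interleave chronologically; Array.prototype.sort is stable in modern engines.
  return merged.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

const parent: Turn[] = [
  { timestamp: "2026-04-11T18:00:00Z", outputTokens: 100 },
  { timestamp: "2026-04-11T18:05:00Z", outputTokens: 50 },
];
const subagents = new Map<string, Turn[]>([
  [
    "/home/me/.claude/projects/my-app/31f3f224/subagents/agent-7f2e.jsonl",
    [{ timestamp: "2026-04-11T18:02:00Z", outputTokens: 30 }],
  ],
]);
const mergedTurns = mergeSubagentTurns(parent, subagents);
console.log(mergedTurns.map((t) => t.agentId ?? "parent")); // [ 'parent', '7f2e', 'parent' ]
```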
79
+
80
+ ---
81
+
82
+ ## Why I built this
83
+
84
+ In the 30 days before this tool existed:
85
+
86
+ - **Apr 7, 2026** — A Reddit post about Claude Code's declining quality hit 1,060 upvotes
87
+ - **Apr 6, 2026** — AMD's Director of AI filed a GitHub issue with data from 6,852 sessions proving Claude Code reads code 3x less before editing and rewrites entire files 2x more often than before
88
+ - **Mar 31, 2026** — Claude Code's source leaked via npm, revealing 2 cache bugs that silently inflate costs 10-20x
89
+ - **Mar 26, 2026** — Users on the $100/mo plan reported burning through limits in 90 minutes instead of 5 hours
90
+
91
+ AMD's AI director manually analyzed nearly 7,000 sessions to prove it got worse. That shouldn't require manual analysis.
92
+
93
+ The tools that track token spending tell you *what* you used. `inspecto` tells you *whether it was worth it*.
94
+
95
+ ---
96
+
97
+ ## What it does
98
+
99
+ `inspecto` reads the JSONL session logs Claude Code already writes to `~/.claude/projects/` and grades every session across 12 quality metrics — no API key, no telemetry, fully offline.
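The format version detection listed in the v1.1.0 notes amounts to a pass over parsed records that reports unexpected `version` values and unrecognized `type` values instead of discarding them. In this sketch, `EXPECTED_MAJOR`, `KNOWN_TYPES`, and comparing only the major version are hypothetical choices; the README does not say which versions or record types inspecto expects.

```typescript
// Sketch: surface format drift in parsed JSONL records instead of dropping it.
// EXPECTED_MAJOR and KNOWN_TYPES are hypothetical placeholders.
interface RawRecord {
  type?: string;
  version?: string;
}

interface FormatReport {
  unexpectedVersions: string[]; // distinct versions whose major part differs
  unknownTypes: Map<string, number>; // record type -> occurrence count
}

const EXPECTED_MAJOR = "2"; // hypothetical
const KNOWN_TYPES = new Set(["user", "assistant", "summary"]); // hypothetical subset

function checkFormat(records: RawRecord[]): FormatReport {
  const unexpected = new Set<string>();
  const unknownTypes = new Map<string, number>();
  for (const r of records) {
    if (r.version !== undefined && r.version.split(".")[0] !== EXPECTED_MAJOR) {
      unexpected.add(r.version);
    }
    const type = r.type ?? "(missing)";
    if (!KNOWN_TYPES.has(type)) {
      unknownTypes.set(type, (unknownTypes.get(type) ?? 0) + 1);
    }
  }
  return { unexpectedVersions: Array.from(unexpected), unknownTypes };
}

const report = checkFormat([
  { type: "user", version: "2.1.0" },
  { type: "assistant", version: "2.1.0" },
  { type: "progress", version: "3.0.0" },
]);
console.log(report.unexpectedVersions); // [ '3.0.0' ]
console.log(report.unknownTypes.get("progress")); // 1
```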
15
100
 
16
101
  | | `ccusage` | `claude-usage` | `Claude-Code-Usage-Monitor` | **`inspecto`** |
17
102
  |---|---|---|---|---|
@@ -29,10 +114,38 @@ The others answer *"how much did I spend?"*
29
114
 
30
115
  <img width="427" height="338" alt="Screenshot 2026-04-11 at 6 00 37 PM" src="https://github.com/user-attachments/assets/81777511-dd45-4ae0-8382-8e008dd98a7a" />
31
116
 
117
+ ### The 12 quality metrics
118
+
119
+ Each metric is a pure function computed from your local session files.
120
+
121
+ | # | Metric | What it detects | Healthy |
122
+ |---|---|---|---|
123
+ | **M1** | Reads-before-edit ratio | Claude editing without reading context first | ≥ 4.0 |
124
+ | **M2** | Rewrite ratio | Full-file rewrites instead of surgical edits | ≤ 0.25 |
125
+ | **M3** | Cache hit rate | Prompt cache bugs inflating token costs | ≥ 0.50 |
126
+ | **M4** | Task completion | "I'll do X" promises without follow-through | ≥ 0.90 |
127
+ | **M5** | Retry density | User repeating themselves (proxy for misunderstanding) | ≤ 0.10 |
128
+ | **M6** | Tool diversity | Over-reliance on a narrow set of tools (Shannon entropy) | ≥ 0.60 |
129
+ | **M7** | Tokens per edit | Token cost per productive action | ≤ 5,000 |
130
+ | **M8** | Subagent overhead | Fraction of token work delegated to subagents | < 0.60 |
131
+ | **M9** | Tool error rate | Rate of tool calls returning errors | ≤ 5% |
132
+ | **M10** | Thinking utilization | Fraction of tool-using turns with extended thinking | ≥ 30% |
133
+ | **M11** | MCP usage | Count of MCP tool turns (informational) | — |
134
+ | **M12** | Session cost | Total estimated session cost | ≤ $2.00 |
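M6 is described only as Shannon entropy over tool usage with a healthy floor of 0.60, which implies a value normalized to [0, 1]. A minimal sketch follows, assuming normalization by ln(k) for k distinct tools used (Pielou's evenness). inspecto may instead normalize against a fixed tool catalog.

```typescript
// Sketch: tool diversity as Shannon entropy normalized to [0, 1].
// Assumption: divide by ln(k), k = number of distinct tools in the session.
function toolDiversity(toolCalls: string[]): number {
  const counts = new Map<string, number>();
  for (const name of toolCalls) counts.set(name, (counts.get(name) ?? 0) + 1);

  const k = counts.size;
  if (k <= 1) return 0; // zero or one tool: no diversity to measure

  let entropy = 0;
  counts.forEach((count) => {
    const p = count / toolCalls.length;
    entropy -= p * Math.log(p);
  });
  return entropy / Math.log(k);
}

console.log(toolDiversity(["Read", "Edit", "Bash", "Grep"]).toFixed(2)); // "1.00"
console.log(toolDiversity(["Bash", "Bash", "Bash", "Bash", "Bash", "Bash", "Bash", "Read"]).toFixed(2)); // "0.54"
```

One consequence of dividing by ln(k): a session that alternates evenly between just two tools also scores 1.0, which is an argument for a fixed-catalog denominator if narrow-but-even usage should be flagged.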
135
+
136
+ ### How it works
137
+
138
+ Claude Code writes one JSONL session file per conversation to `~/.claude/projects/{project}/{sessionId}.jsonl`. Each line is a JSON record — user messages, assistant responses (streamed as multiple chunks), tool calls, and tool results. Subagent sessions land in `{sessionId}/subagents/agent-*.jsonl`.
139
+
140
+ `inspecto` streams these files line-by-line (never loading 100MB+ files into memory), merges streaming chunks by `message.id`, aggregates subagent turns into the parent session, extracts tool-use patterns and token usage, and computes the 12 metrics above.
141
+
142
+ The composite grade is a weighted average mapped to a letter grade from **A+** to **F**.
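The composite grade can be sketched as a weighted average of per-metric scores followed by a cutoff table. This sketch makes three assumptions. Each metric has already been scored on a 0 to 100 scale. Weights are renormalized over the metrics present. The cutoffs follow the common US scale, which agrees with the D+ = 67 boundary in the previous README. None of this is confirmed as inspecto's exact scoring.

```typescript
// Sketch: weighted-average composite score mapped to a letter grade.
// Per-metric 0-100 scores and these cutoffs are assumptions.
const CUTOFFS: Array<[number, string]> = [
  [97, "A+"], [93, "A"], [90, "A-"],
  [87, "B+"], [83, "B"], [80, "B-"],
  [77, "C+"], [73, "C"], [70, "C-"],
  [67, "D+"], [63, "D"], [60, "D-"],
];

function compositeScore(scores: Record<string, number>, weights: Record<string, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const name of Object.keys(weights)) {
    if (!(name in scores)) continue; // skip metrics with no score this session
    weighted += scores[name] * weights[name];
    totalWeight += weights[name];
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

function letterGrade(score: number): string {
  for (const [min, letter] of CUTOFFS) {
    if (score >= min) return letter;
  }
  return "F";
}

const score = compositeScore(
  { cacheHitRate: 90, taskCompletion: 80, retryDensity: 70 },
  { cacheHitRate: 0.2, taskCompletion: 0.2, retryDensity: 0.1 },
);
console.log(score, letterGrade(score)); // 82 B-
```

Dividing by `totalWeight` keeps the result on the 0 to 100 scale even when the weights do not sum to 1.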
32
143
 
33
144
  ---
34
145
 
35
- ## Install
146
+ ## How to use it
147
+
148
+ ### Install
36
149
 
37
150
  ```bash
38
151
  npm install -g inspecto
@@ -48,8 +161,6 @@ Requires Node.js >= 22 (uses the built-in `node:sqlite` module). Works on macOS,
48
161
 
49
162
  ---
50
163
 
51
- ## Usage
52
-
53
164
  ### Grade your most recent session
54
165
 
55
166
  ```bash
@@ -57,7 +168,7 @@ npx inspecto
57
168
  ```
58
169
 
59
170
  ```
60
- inspecto v1.0.13 — Claude Code Session Quality Analyzer
171
+ inspecto v1.1.0 — Claude Code Session Quality Analyzer
61
172
 
62
173
  Session: 31f3f224 | my-app | 47 min | claude-opus-4-6
63
174
 
@@ -72,9 +183,23 @@ npx inspecto
72
183
  Retry density 0.08 ✓ healthy
73
184
  Tool diversity 0.52 ⚠ warning
74
185
  Tokens/useful-edit 3,218 ✓ healthy
186
+ Subagent overhead 0.41 ✓ healthy
187
+ Tool error rate 0.03 ✓ healthy
188
+ Thinking utilization 0.44 ✓ healthy
189
+ MCP usage 7 ✓ healthy
190
+ Session cost $1.24 ✓ healthy
75
191
  ...
76
192
  ```
77
193
 
194
+ ### Watch a session live
195
+
196
+ ```bash
197
+ npx inspecto watch
198
+ npx inspecto watch --project my-app
199
+ ```
200
+
201
+ Clears and re-renders the full audit report every time Claude Code writes a new turn. Useful during long sessions to catch degradation before it compounds.
202
+
78
203
  ### Detect regressions over time
79
204
 
80
205
  ```bash
@@ -124,15 +249,15 @@ npx inspecto compare --projects my-app,api-gateway,shared-lib
124
249
 
125
250
  ### Manage the grade cache
126
251
 
127
- `inspecto trend` and `inspecto compare` cache computed grade results in `~/.claude/inspecto-cache.db` so re-runs over the same sessions are near-instant. The cache is keyed by file path + mtime, so it invalidates automatically when Claude Code writes new data to a session.
128
-
129
252
  ```bash
130
253
  npx inspecto cache clear # delete the cache file (~/.claude/inspecto-cache.db)
131
254
  ```
132
255
 
256
+ `inspecto trend` and `inspecto compare` cache computed grades in `~/.claude/inspecto-cache.db`. The cache is keyed by file path + mtime and invalidates automatically when Claude Code writes new data.
257
+
133
258
  ---
134
259
 
135
- ## CI integration
260
+ ### CI integration
136
261
 
137
262
  `inspecto` exits with a non-zero code when quality drops below acceptable thresholds:
138
263
 
@@ -143,7 +268,7 @@ npx inspecto cache clear # delete the cache file (~/.claude/inspecto-cache.db)
143
268
  | `inspecto cache-check` | Any session has a cache anomaly |
144
269
  | `inspecto compare` | Never (comparison is informational) |
145
270
 
146
- Use `--no-fail` on any command to always exit 0 — useful for scripts that want the output without failing the pipeline:
271
+ Use `--no-fail` to always exit 0:
147
272
 
148
273
  ```bash
149
274
  # In a pre-push hook — warn but don't block
@@ -158,9 +283,7 @@ npx inspecto audit --format csv --no-fail >> metrics.csv
158
283
 
159
284
  ---
160
285
 
161
- ## Output formats
162
-
163
- All commands default to terminal output with color and tables. For scripting:
286
+ ### Output formats
164
287
 
165
288
  ```bash
166
289
  # Structured JSON
@@ -172,13 +295,13 @@ npx inspecto audit --format csv
172
295
  npx inspecto trend --format csv
173
296
  ```
174
297
 
175
- `audit --format csv` produces one row per metric: `name,value,status,label`
298
+ `audit --format csv` produces one row per metric: `name,value,status,label`
176
299
 
177
- `trend --format csv` produces one row per metric: `name,recentAvg,fullAvg,changePercent,status`
300
+ `trend --format csv` produces one row per metric: `name,recentAvg,fullAvg,changePercent,status`
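The `audit` CSV layout (`name,value,status,label`) can be emitted with RFC 4180-style quoting, so a label containing a comma or quote does not shift columns. `AuditRow` and the sample rows are hypothetical. The README does not say whether inspecto prints a header row, though this sketch includes one. It also joins rows with `\n` rather than the CRLF the RFC specifies.

```typescript
// Sketch: `audit --format csv` style output with RFC 4180-style quoting.
// AuditRow and the sample rows are hypothetical.
interface AuditRow {
  name: string;
  value: number;
  status: string;
  label: string;
}

// Quote a field that contains a comma, double quote, or line break,
// doubling any embedded quotes.
function csvField(field: string): string {
  return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

function auditToCsv(rows: AuditRow[]): string {
  const lines = rows.map((r) =>
    [r.name, String(r.value), r.status, r.label].map(csvField).join(","),
  );
  return ["name,value,status,label", ...lines].join("\n");
}

const sampleRows: AuditRow[] = [
  { name: "retryDensity", value: 0.08, status: "healthy", label: "Retry density" },
  { name: "toolDiversity", value: 0.52, status: "warning", label: "Tool diversity, below 0.60" },
];
console.log(auditToCsv(sampleRows));
// name,value,status,label
// retryDensity,0.08,healthy,Retry density
// toolDiversity,0.52,warning,"Tool diversity, below 0.60"
```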
178
301
 
179
302
  ---
180
303
 
181
- ## Global options
304
+ ### Global options
182
305
 
183
306
  | Flag | Commands | Description |
184
307
  |---|---|---|
@@ -189,33 +312,13 @@ npx inspecto trend --format csv
189
312
  | `--project <name>` | `audit`, `trend`, `compare`, `list` | Filter to a specific project |
190
313
  | `--since <duration>` | `trend`, `cache-check` | Time range (e.g., `7d`, `14d`, `30d`) |
191
314
  | `--sessions` | `list` | Show sessions view instead of projects view |
315
+ | `--interval <ms>` | `watch` | Polling interval fallback in ms (default: 2000) |
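The `--since` values shown in this table (`7d`, `14d`, `30d`) suggest a simple day-count parser. This sketch handles only the `d` suffix that appears in this README and rejects anything else rather than guessing at other units; `parseSince` and `sinceCutoff` are hypothetical names.

```typescript
// Sketch: convert a --since value like "7d", "14d", or "30d" into a cutoff
// instant. Only the day suffix shown in this README is handled; other units
// are deliberately rejected rather than guessed.
const DAY_MS = 24 * 60 * 60 * 1000;

function parseSince(value: string): number {
  const match = /^(\d+)d$/.exec(value.trim());
  if (!match) {
    throw new Error(`Invalid --since value "${value}" (expected e.g. 7d, 14d, 30d)`);
  }
  return Number(match[1]) * DAY_MS;
}

// Sessions last modified after this instant would fall inside the window.
function sinceCutoff(value: string, now: Date = new Date()): Date {
  return new Date(now.getTime() - parseSince(value));
}

console.log(sinceCutoff("14d", new Date("2026-04-15T00:00:00Z")).toISOString()); // 2026-04-01T00:00:00.000Z
```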
192
316
 
193
317
  ---
194
318
 
195
- ## The 12 Quality Metrics
196
-
197
- Each metric is a pure function computed from your local session files. No data leaves your machine.
198
-
199
- | # | Metric | What it detects | Healthy |
200
- |---|---|---|---|
201
- | **M1** | Reads-before-edit ratio | Claude editing without reading context first | ≥ 4.0 |
202
- | **M2** | Rewrite ratio | Full-file rewrites instead of surgical edits | ≤ 0.25 |
203
- | **M3** | Cache hit rate | Prompt cache bugs inflating token costs | ≥ 0.50 |
204
- | **M4** | Task completion | "I'll do X" promises without follow-through | ≥ 0.90 |
205
- | **M5** | Retry density | User repeating themselves (proxy for misunderstanding) | ≤ 0.10 |
206
- | **M6** | Tool diversity | Over-reliance on a narrow set of tools (Shannon entropy) | ≥ 0.60 |
207
- | **M7** | Tokens per edit | Token cost per productive action | ≤ 5,000 |
208
- | **M8** | Subagent overhead | Fraction of turns delegated to subagents | < 0.60 |
209
- | **M9** | Tool error rate | Rate of tool calls returning errors | ≤ 5% |
210
- | **M10** | Thinking utilization | Fraction of turns using extended thinking | ≥ 30% |
211
- | **M11** | MCP usage | Count of MCP tool turns (informational) | — |
212
- | **M12** | Session cost | Total estimated session cost | ≤ $2.00 |
213
-
214
- ---
215
-
216
- ## What to do about it
319
+ ### What to do about it
217
320
 
218
- Once inspecto shows you where your sessions are degrading, here's how to fix each metric:
321
+ Once inspecto shows you where sessions are degrading, here's how to fix each metric:
219
322
 
220
323
  | Metric | Symptom | Fix |
221
324
  |---|---|---|
@@ -226,6 +329,7 @@ Once inspecto shows you where your sessions are degrading, here's how to fix eac
226
329
  | **Retry density high** | You're repeating yourself — Claude keeps misunderstanding | You're probably under-specifying. Provide a concrete example of the output you want in the first message. If retries persist across sessions, the root cause is usually a missing `CLAUDE.md` or a context window that's too wide. |
227
330
  | **Tool diversity low** | Claude over-relies on a narrow tool set (e.g. only Bash) | Prompt explicitly: *"Use the most specific tool available. Prefer Read over Bash for file reads. Prefer Edit over Write for modifications."* This is also a sign of a degraded model — track it over time with `inspecto trend`. |
228
331
  | **Tokens/edit high** | High token burn per productive action | Shorten your context. Close irrelevant files in the IDE, trim `CLAUDE.md` to essentials, and use `--project` to scope sessions to one repo at a time. |
332
+ | **Subagent overhead high** | Subagents are doing most of the token work | Run `inspecto audit` on the root session, which now aggregates subagent turns automatically. If overhead is still high, your orchestration prompts may be spawning agents that duplicate work. |
229
333
  | **Tool error rate high** | Claude's tool calls are frequently failing | Usually means Claude is passing bad arguments or calling tools on files that don't exist. Add stricter preconditions in `CLAUDE.md`: *"Verify a file exists before reading it. Verify a path before writing."* |
230
334
  | **Thinking utilization low** | Extended thinking is rarely being used | For complex tasks, prompt Claude to think before acting: *"Think carefully before making any changes."* Low thinking utilization often correlates with shallow analysis and increased retry density. |
231
335
  | **Session cost high** | Spending more than expected per session | Scope sessions narrowly — one task, one repo. Use `--project` to avoid scanning large unrelated projects. Frequent cache misses compound cost; check `cache-check` if cost spiked unexpectedly. |
@@ -234,29 +338,6 @@ Once inspecto shows you where your sessions are degrading, here's how to fix eac
234
338
 
235
339
  ---
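Retry density (M5) is described as a proxy for the user repeating themselves, and the `utils/` folder lists a Levenshtein helper. One plausible sketch compares each user message with the previous one by normalized edit distance. The 0.8 similarity cutoff, the lowercase/whitespace normalization, and dividing by the total number of user messages are all assumptions.

```typescript
// Sketch: retry density as the share of user messages that closely repeat the
// previous user message. Cutoff and normalization are assumptions.
function levenshtein(a: string, b: string): number {
  // Two-row dynamic programming: prev holds distances for the previous
  // prefix of a, curr for the current one.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

function retryDensity(userMessages: string[], cutoff = 0.8): number {
  if (userMessages.length === 0) return 0;
  let retries = 0;
  for (let i = 1; i < userMessages.length; i++) {
    if (similarity(normalize(userMessages[i]), normalize(userMessages[i - 1])) >= cutoff) retries++;
  }
  return retries / userMessages.length;
}

console.log(retryDensity(["fix the login bug", "Fix the login bug!", "now add tests", "add a README"])); // 0.25
```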
236
340
 
237
- ## How it works
238
-
239
- Claude Code writes one JSONL session file per conversation to `~/.claude/projects/{project}/{sessionId}.jsonl`. Each line is a JSON record — user messages, assistant responses (streamed as multiple chunks), tool calls, and tool results.
240
-
241
- `inspecto` streams these files line-by-line (never loading 100MB+ files into memory), merges streaming chunks by `message.id`, extracts tool-use patterns and token usage, and computes the 12 metrics above.
242
-
243
- The composite grade is a weighted average mapped to a letter grade from **A+** to **F**. Grades below **D+** (score < 67) trigger a non-zero exit code in CI mode.
244
-
245
- ---
246
-
247
- ## Why this exists
248
-
249
- In the last 30 days before this tool was built:
250
-
251
- - **Apr 7, 2026** — A Reddit post about Claude Code's declining quality hit 1,060 upvotes
252
- - **Apr 6, 2026** — AMD's Director of AI filed a GitHub issue with data from 6,852 sessions proving Claude Code reads code 3x less before editing and rewrites entire files 2x more often than before
253
- - **Mar 31, 2026** — Claude Code's source leaked via npm, revealing 2 cache bugs that silently inflate costs 10-20x
254
- - **Mar 26, 2026** — Users on the $100/mo plan reported burning through limits in 90 minutes instead of 5 hours
255
-
256
- The tools that track token spending tell you *what* you used. `inspecto` tells you *whether it was worth it*.
257
-
258
- ---
259
-
260
341
  ## Development
261
342
 
262
343
  ```bash
@@ -272,11 +353,11 @@ Architecture:
272
353
 
273
354
  ```
274
355
  src/
275
- ├── parser/ # Streaming JSONL reader + session builder (merges streaming chunks)
356
+ ├── parser/ # Streaming JSONL reader + session builder (merges streaming chunks, aggregates subagents)
276
357
  ├── metrics/ # 12 pure-function quality metrics + composite grader
277
358
  ├── anomaly/ # Baseline computation + regression detection + cache anomaly
278
359
  ├── reporter/ # Terminal (chalk + cli-table3), JSON, and CSV output modes
279
- ├── commands/ # audit, trend, cache-check, compare, list
360
+ ├── commands/ # audit, trend, cache-check, compare, list, watch
280
361
  ├── cache/ # SQLite grade-result cache (node:sqlite, ~/.claude/inspecto-cache.db)
281
362
  ├── config/ # .inspecto.json config loader + per-metric threshold/weight overrides
282
363
  └── utils/ # Levenshtein, paths, duration parsing, formatting, concurrency helper
@@ -284,12 +365,14 @@ src/
284
365
 
285
366
  Key technical details:
286
367
  - **Streaming parse**: `readline` + `createReadStream` — never loads full files into memory
287
- - **Chunk deduplication**: Assistant responses come as multiple JSONL records sharing `message.id`; content blocks are merged and only the final chunk's `output_tokens` is used
288
- - **No external APIs**: All analysis is local. No network calls. Works offline
368
+ - **Subagent aggregation**: discovers `{sessionId}/subagents/agent-*.jsonl`, tags each turn with `agentId`, and merges into the parent session's turn list
369
+ - **Chunk deduplication**: assistant responses come as multiple JSONL records sharing `message.id`; content blocks are merged and only the final chunk's `output_tokens` is used
370
+ - **No external APIs**: all analysis is local. No network calls. Works offline
289
371
  - **Real token cost**: `input_tokens` is always a streaming placeholder — actual input = `cache_read_input_tokens + cache_creation_input_tokens`
290
372
  - **Concurrency**: `trend` and `compare` parse up to 16 session files in parallel (semaphore-limited) so large histories don't block
291
373
  - **Grade cache**: computed `GradeResult` objects are persisted in `~/.claude/inspecto-cache.db` (SQLite via `node:sqlite`). Cache key = `sha256(path:mtime)`. Re-runs over unchanged sessions skip parsing entirely — typically 2–3× faster
292
374
  - **CI exit codes**: `audit` exits 1 on D/F grades, `trend` exits 1 on any regression, `cache-check` exits 1 on any anomaly. All suppressed by `--no-fail`
375
+ - **Format resilience**: reads the `version` field on every JSONL record and warns when the format diverges from the expected version. Unknown record types are surfaced in output rather than silently dropped
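The chunk deduplication rule above can be sketched as follows: merge content blocks that share `message.id`, and keep only the final chunk's `output_tokens`. The record shape is trimmed to the fields this README names. Concatenating content blocks in arrival order is an assumption about what "merged" means.

```typescript
// Sketch: merge streamed assistant chunks by message.id.
// Record shape is trimmed to the fields this README mentions.
interface ContentBlock {
  type: string; // e.g. "text", "tool_use", "thinking"
  [key: string]: unknown;
}

interface AssistantRecord {
  message: {
    id: string;
    content: ContentBlock[];
    usage: { output_tokens: number };
  };
}

interface MergedMessage {
  id: string;
  content: ContentBlock[];
  outputTokens: number;
}

function mergeChunks(records: AssistantRecord[]): MergedMessage[] {
  const byId = new Map<string, MergedMessage>(); // Map preserves first-seen order
  for (const { message } of records) {
    const existing = byId.get(message.id);
    if (existing) {
      existing.content.push(...message.content);
      // Later chunks overwrite: only the final chunk's output_tokens counts.
      existing.outputTokens = message.usage.output_tokens;
    } else {
      byId.set(message.id, {
        id: message.id,
        content: [...message.content],
        outputTokens: message.usage.output_tokens,
      });
    }
  }
  return Array.from(byId.values());
}

const records: AssistantRecord[] = [
  { message: { id: "msg_1", content: [{ type: "thinking" }], usage: { output_tokens: 5 } } },
  { message: { id: "msg_1", content: [{ type: "text", text: "Reading the file" }], usage: { output_tokens: 40 } } },
  { message: { id: "msg_1", content: [{ type: "tool_use", name: "Read" }], usage: { output_tokens: 120 } } },
  { message: { id: "msg_2", content: [{ type: "text", text: "Done" }], usage: { output_tokens: 30 } } },
];
const merged = mergeChunks(records);
const totalOutput = merged.reduce((sum, m) => sum + m.outputTokens, 0);
console.log(merged.length, totalOutput); // 2 150
```

Assuming each chunk's `output_tokens` is a running total, summing it across every chunk here would report 195 instead of 150, which is the overcount the final-chunk rule avoids.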
293
376
 
294
377
  ---
295
378
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "inspecto",
3
- "version": "1.1.0",
3
+ "version": "1.1.2",
4
4
  "description": "inspecto — Claude Code session quality analyzer. Grade sessions, detect regressions, catch cache bugs.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",