inspecto 1.1.0 → 1.1.1

Files changed (2)
  1. package/README.md +97 -13
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -6,9 +6,7 @@
6
6
 
7
7
  **Claude Code session quality analyzer — grade sessions, detect regressions, catch cache bugs.**
8
8
 
9
- > CLI to grade Claude Code sessions — 750+ weekly downloads in first week.
10
-
11
-
9
+ > CLI to grade Claude Code sessions — 915+ total downloads. v1.1.0 is out with 5 new metrics, watch mode, per-project config, a grade cache, and CI exit codes.
12
10
 
13
11
  > AMD's AI director manually analyzed 7,000 Claude Code sessions to prove it got worse.
14
12
  > `inspecto` automates that analysis for every developer.
@@ -29,6 +27,74 @@ The others answer *"how much did I spend?"*
29
27
 
30
28
  <img width="427" height="338" alt="Screenshot 2026-04-11 at 6 00 37 PM" src="https://github.com/user-attachments/assets/81777511-dd45-4ae0-8382-8e008dd98a7a" />
31
29
 
30
+ ---
31
+
32
+ ## What's New in v1.1.0
33
+
34
+ 900 downloads in. Thank you. Here's everything that shipped.
35
+
36
+ ### 5 new quality metrics — 7 → 12
37
+
38
+ | # | Metric | What it measures |
39
+ |---|---|---|
40
+ | **M8** | Subagent overhead | Whether Claude is delegating work to subagents efficiently — and whether those subagents are doing meaningful work |
41
+ | **M9** | Tool error rate | How often Claude's tool calls return errors. High rates mean Claude is calling tools with bad arguments or on paths that don't exist |
42
+ | **M10** | Thinking utilization | Whether extended thinking is actually being used on turns that warrant it. Low utilization on complex sessions often predicts high retry density |
43
+ | **M11** | MCP usage | Informational count of MCP tool calls (web search, web fetch, custom servers) per session |
44
+ | **M12** | Session cost | Estimated USD cost from real token usage — output, cache creation, and cache reads, priced at current Sonnet rates |
45
+
46
+ All 12 metrics are pure functions of your local session files. No data leaves your machine.
47
+
48
+ ### Watch mode
49
+
50
+ ```bash
51
+ npx inspecto watch
52
+ ```
53
+
54
+ Streams live metric updates as Claude Code writes to the active session. Grade updates every time a new assistant turn lands. Hit Ctrl-C to exit.
55
+
56
+ ### Per-project config
57
+
58
+ Drop `.inspecto.json` in your repo root to override thresholds and weights for your team:
59
+
60
+ ```json
61
+ {
62
+ "thresholds": {
63
+ "tokensPerEdit": { "healthy": 8000, "warning": 15000 },
64
+ "sessionCost": { "healthy": 5.00, "warning": 10.00 }
65
+ },
66
+ "weights": {
67
+ "cacheHitRate": 0.20,
68
+ "taskCompletion": 0.20
69
+ }
70
+ }
71
+ ```
72
+
73
+ Validate your config at any time:
74
+
75
+ ```bash
76
+ npx inspecto config validate
77
+ ```
78
+
79
+ ### 2–3× faster trend and compare
80
+
81
+ Computed grades are now cached in `~/.claude/inspecto-cache.db` (SQLite via `node:sqlite`). Cache key is `sha256(path:mtime)` — it invalidates automatically when Claude Code appends to a session. Re-runs over unchanged sessions skip parsing entirely. Up to 16 sessions are also parsed in parallel, so a 300-session history finishes in seconds instead of minutes.
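
The cache key described above can be sketched roughly as follows. This is a minimal illustration, not inspecto's confirmed implementation: the helper names `gradeCacheKey` and `gradeCacheKeyForFile` are hypothetical, and encoding the mtime as integer milliseconds is an assumption.

```typescript
import { createHash } from "node:crypto";
import { statSync } from "node:fs";

// Hypothetical helper: hash "path:mtime" so the key changes whenever the
// session file is appended to (which bumps its modification time).
// Assumption: mtime is encoded as integer milliseconds.
function gradeCacheKey(path: string, mtimeMs: number): string {
  return createHash("sha256")
    .update(`${path}:${Math.trunc(mtimeMs)}`)
    .digest("hex");
}

// Convenience wrapper that reads the current mtime from disk.
function gradeCacheKeyForFile(path: string): string {
  return gradeCacheKey(path, statSync(path).mtimeMs);
}
```

Because the key depends only on the path and the modification time, an unchanged file maps to the same key on every run, and a lookup hit can return the stored grade without reopening the session.
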
82
+
83
+ ### CI exit codes
84
+
85
+ `audit`, `trend`, and `cache-check` now exit 1 on failures. Drop into any pipeline:
86
+
87
+ ```bash
88
+ npx inspecto trend --since 14d # exits 1 on any regression
89
+ npx inspecto audit --no-fail # warns but never blocks
90
+ ```
91
+
92
+ ### Other additions
93
+
94
+ - **`inspecto list`** — discover your projects and sessions before running audit or compare. No more guessing project slugs for `--project`.
95
+ - **CSV export** — `--format csv` on `audit` and `trend` for dashboards, spreadsheets, and log aggregators.
96
+ - **Subagent session aggregation** — inspecto now reads subagent JSONL files (`{sessionId}/subagents/agent-*.jsonl`) and merges them into the parent session. Multi-agent sessions were previously graded with large gaps in tool calls and token usage.
97
+ - **Format version detection** — inspecto now reads the `version` field on every JSONL record and warns when the format differs from what it was built against. Unknown record types are surfaced in the output rather than silently dropped.
32
98
 
33
99
  ---
34
100
 
@@ -57,7 +123,7 @@ npx inspecto
57
123
  ```
58
124
 
59
125
  ```
60
- inspecto v1.0.13 — Claude Code Session Quality Analyzer
126
+ inspecto v1.1.0 — Claude Code Session Quality Analyzer
61
127
 
62
128
  Session: 31f3f224 | my-app | 47 min | claude-opus-4-6
63
129
 
@@ -72,9 +138,23 @@ npx inspecto
72
138
  Retry density 0.08 ✓ healthy
73
139
  Tool diversity 0.52 ⚠ warning
74
140
  Tokens/useful-edit 3,218 ✓ healthy
141
+ Subagent overhead 0.41 ✓ healthy
142
+ Tool error rate 0.03 ✓ healthy
143
+ Thinking utilization 0.44 ✓ healthy
144
+ MCP usage 7 ✓ healthy
145
+ Session cost $1.24 ✓ healthy
75
146
  ...
76
147
  ```
77
148
 
149
+ ### Watch a session live
150
+
151
+ ```bash
152
+ npx inspecto watch
153
+ npx inspecto watch --project my-app
154
+ ```
155
+
156
+ Clears and re-renders the full audit report every time Claude Code writes a new turn. Useful during long sessions to catch degradation before it compounds.
157
+
78
158
  ### Detect regressions over time
79
159
 
80
160
  ```bash
@@ -189,6 +269,7 @@ npx inspecto trend --format csv
189
269
  | `--project <name>` | `audit`, `trend`, `compare`, `list` | Filter to a specific project |
190
270
  | `--since <duration>` | `trend`, `cache-check` | Time range (e.g., `7d`, `14d`, `30d`) |
191
271
  | `--sessions` | `list` | Show sessions view instead of projects view |
272
+ | `--interval <ms>` | `watch` | Polling interval fallback in ms (default: 2000) |
192
273
 
193
274
  ---
194
275
 
@@ -205,9 +286,9 @@ Each metric is a pure function computed from your local session files. No data l
205
286
  | **M5** | Retry density | User repeating themselves (proxy for misunderstanding) | ≤ 0.10 |
206
287
  | **M6** | Tool diversity | Over-reliance on a narrow set of tools (Shannon entropy) | ≥ 0.60 |
207
288
  | **M7** | Tokens per edit | Token cost per productive action | ≤ 5,000 |
208
- | **M8** | Subagent overhead | Fraction of turns delegated to subagents | < 0.60 |
289
+ | **M8** | Subagent overhead | Fraction of token work delegated to subagents | < 0.60 |
209
290
  | **M9** | Tool error rate | Rate of tool calls returning errors | ≤ 5% |
210
- | **M10** | Thinking utilization | Fraction of turns using extended thinking | ≥ 30% |
291
+ | **M10** | Thinking utilization | Fraction of tool-using turns with extended thinking | ≥ 30% |
211
292
  | **M11** | MCP usage | Count of MCP tool turns (informational) | — |
212
293
  | **M12** | Session cost | Total estimated session cost | ≤ $2.00 |
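
As an illustration of the Shannon-entropy idea behind M6, here is a minimal sketch. The function name `toolDiversity`, the `availableTools` parameter, and the choice to normalize by the log of the number of available tools are all assumptions; the README only states that the metric is based on Shannon entropy.

```typescript
// Hypothetical sketch: entropy of the tool-call distribution, scaled to [0, 1]
// by the maximum entropy achievable with `availableTools` distinct tools.
// The normalization is an assumption, not inspecto's documented formula.
function toolDiversity(toolCalls: string[], availableTools: number): number {
  if (toolCalls.length === 0 || availableTools < 2) return 0;

  // Count how often each tool name appears.
  const counts = new Map<string, number>();
  for (const name of toolCalls) {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }

  // H = -sum(p * log2(p)) over the observed tools.
  let entropy = 0;
  counts.forEach((n) => {
    const p = n / toolCalls.length;
    entropy -= p * Math.log2(p);
  });

  return Math.min(1, entropy / Math.log2(availableTools));
}
```

Under these assumptions, a session that calls four tools equally often out of four available scores 1, while a session that uses only Bash scores 0.
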
213
294
 
@@ -226,6 +307,7 @@ Once inspecto shows you where your sessions are degrading, here's how to fix eac
226
307
  | **Retry density high** | You're repeating yourself — Claude keeps misunderstanding | You're probably under-specifying. Provide a concrete example of the output you want in the first message. If retries persist across sessions, the root cause is usually a missing `CLAUDE.md` or a context window that's too wide. |
227
308
  | **Tool diversity low** | Claude over-relies on a narrow tool set (e.g. only Bash) | Prompt explicitly: *"Use the most specific tool available. Prefer Read over Bash for file reads. Prefer Edit over Write for modifications."* This is also a sign of a degraded model — track it over time with `inspecto trend`. |
228
309
  | **Tokens/edit high** | High token burn per productive action | Shorten your context. Close irrelevant files in the IDE, trim `CLAUDE.md` to essentials, and use `--project` to scope sessions to one repo at a time. |
310
+ | **Subagent overhead high** | Subagents doing most of the work but graded separately | Run `inspecto audit` on the root session — it now aggregates subagent turns automatically. If overhead is still high, your orchestration prompts may be spawning agents that duplicate work. |
229
311
  | **Tool error rate high** | Claude's tool calls are frequently failing | Usually means Claude is passing bad arguments or calling tools on files that don't exist. Add stricter preconditions in `CLAUDE.md`: *"Verify a file exists before reading it. Verify a path before writing."* |
230
312
  | **Thinking utilization low** | Extended thinking is rarely being used | For complex tasks, prompt Claude to think before acting: *"Think carefully before making any changes."* Low thinking utilization often correlates with shallow analysis and increased retry density. |
231
313
  | **Session cost high** | Spending more than expected per session | Scope sessions narrowly — one task, one repo. Use `--project` to avoid scanning large unrelated projects. Frequent cache misses compound cost; check `cache-check` if cost spiked unexpectedly. |
@@ -236,9 +318,9 @@ Once inspecto shows you where your sessions are degrading, here's how to fix eac
236
318
 
237
319
  ## How it works
238
320
 
239
- Claude Code writes one JSONL session file per conversation to `~/.claude/projects/{project}/{sessionId}.jsonl`. Each line is a JSON record — user messages, assistant responses (streamed as multiple chunks), tool calls, and tool results.
321
+ Claude Code writes one JSONL session file per conversation to `~/.claude/projects/{project}/{sessionId}.jsonl`. Each line is a JSON record — user messages, assistant responses (streamed as multiple chunks), tool calls, and tool results. Subagent sessions land in `{sessionId}/subagents/agent-*.jsonl`.
240
322
 
241
- `inspecto` streams these files line-by-line (never loading 100MB+ files into memory), merges streaming chunks by `message.id`, extracts tool-use patterns and token usage, and computes the 12 metrics above.
323
+ `inspecto` streams these files line-by-line (never loading 100MB+ files into memory), merges streaming chunks by `message.id`, aggregates subagent turns into the parent session, extracts tool-use patterns and token usage, and computes the 12 metrics above.
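
The streaming and chunk-merging steps described above might look roughly like this. It is a sketch under stated assumptions: the record shape (`message.content`, `message.usage.output_tokens`) and the names `SessionRecord`, `MergedMessage`, `addChunk`, and `readMergedMessages` are illustrative, and only `message.id`, `output_tokens`, `readline`, and `createReadStream` come from the README.

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Assumed record shape; real session records carry more fields.
interface SessionRecord {
  type?: string;
  message?: {
    id?: string;
    content?: unknown[];
    usage?: { output_tokens?: number };
  };
}

interface MergedMessage {
  id: string;
  content: unknown[];
  outputTokens: number;
}

// Fold one streamed chunk into the message that shares its message.id.
function addChunk(byId: Map<string, MergedMessage>, rec: SessionRecord): void {
  const id = rec.message?.id;
  if (!id) return; // records without a message id are skipped in this sketch
  const merged: MergedMessage = byId.get(id) ?? { id, content: [], outputTokens: 0 };
  merged.content.push(...(rec.message?.content ?? []));
  // Keep the latest chunk's output_tokens instead of summing across chunks.
  merged.outputTokens = rec.message?.usage?.output_tokens ?? merged.outputTokens;
  byId.set(id, merged);
}

// Read a JSONL file one line at a time so the raw file is never fully buffered.
async function readMergedMessages(path: string): Promise<MergedMessage[]> {
  const byId = new Map<string, MergedMessage>();
  const lines = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() === "") continue;
    addChunk(byId, JSON.parse(line) as SessionRecord);
  }
  return Array.from(byId.values());
}
```

Merging incrementally keeps memory proportional to the number of distinct messages rather than to the size of the file on disk.
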
242
324
 
243
325
  The composite grade is a weighted average mapped to a letter grade from **A+** to **F**. Grades below **D+** (score < 67) trigger a non-zero exit code in CI mode.
244
326
 
@@ -246,7 +328,7 @@ The composite grade is a weighted average mapped to a letter grade from **A+** t
246
328
 
247
329
  ## Why this exists
248
330
 
249
- In the last 30 days before this tool was built:
331
+ In the 30 days before this tool was built:
250
332
 
251
333
  - **Apr 7, 2026** — A Reddit post about Claude Code's declining quality hit 1,060 upvotes
252
334
  - **Apr 6, 2026** — AMD's Director of AI filed a GitHub issue with data from 6,852 sessions proving Claude Code reads code 3x less before editing and rewrites entire files 2x more often than before
@@ -272,11 +354,11 @@ Architecture:
272
354
 
273
355
  ```
274
356
  src/
275
- ├── parser/ # Streaming JSONL reader + session builder (merges streaming chunks)
357
+ ├── parser/ # Streaming JSONL reader + session builder (merges streaming chunks, aggregates subagents)
276
358
  ├── metrics/ # 12 pure-function quality metrics + composite grader
277
359
  ├── anomaly/ # Baseline computation + regression detection + cache anomaly
278
360
  ├── reporter/ # Terminal (chalk + cli-table3), JSON, and CSV output modes
279
- ├── commands/ # audit, trend, cache-check, compare, list
361
+ ├── commands/ # audit, trend, cache-check, compare, list, watch
280
362
  ├── cache/ # SQLite grade-result cache (node:sqlite, ~/.claude/inspecto-cache.db)
281
363
  ├── config/ # .inspecto.json config loader + per-metric threshold/weight overrides
282
364
  └── utils/ # Levenshtein, paths, duration parsing, formatting, concurrency helper
@@ -284,12 +366,14 @@ src/
284
366
 
285
367
  Key technical details:
286
368
  - **Streaming parse**: `readline` + `createReadStream` — never loads full files into memory
287
- - **Chunk deduplication**: Assistant responses come as multiple JSONL records sharing `message.id`; content blocks are merged and only the final chunk's `output_tokens` is used
288
- - **No external APIs**: All analysis is local. No network calls. Works offline
369
+ - **Subagent aggregation**: discovers `{sessionId}/subagents/agent-*.jsonl`, tags each turn with `agentId`, and merges into the parent session's turn list
370
+ - **Chunk deduplication**: assistant responses come as multiple JSONL records sharing `message.id`; content blocks are merged and only the final chunk's `output_tokens` is used
371
+ - **No external APIs**: all analysis is local. No network calls. Works offline
289
372
  - **Real token cost**: `input_tokens` is always a streaming placeholder — actual input = `cache_read_input_tokens + cache_creation_input_tokens`
290
373
  - **Concurrency**: `trend` and `compare` parse up to 16 session files in parallel (semaphore-limited) so large histories don't block
291
374
  - **Grade cache**: computed `GradeResult` objects are persisted in `~/.claude/inspecto-cache.db` (SQLite via `node:sqlite`). Cache key = `sha256(path:mtime)`. Re-runs over unchanged sessions skip parsing entirely — typically 2–3× faster
292
375
  - **CI exit codes**: `audit` exits 1 on D/F grades, `trend` exits 1 on any regression, `cache-check` exits 1 on any anomaly. All suppressed by `--no-fail`
376
+ - **Format resilience**: reads the `version` field on every JSONL record and warns when the format diverges from the expected version. Unknown record types are surfaced in output rather than silently dropped
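
The bounded parallelism mentioned in the Concurrency bullet can be approximated with a small worker pool. Note that this is a worker-pool variant rather than an explicit semaphore, and `mapLimit` is a hypothetical helper name, not a function confirmed to exist in inspecto.

```typescript
// Hypothetical helper: apply `worker` to every item with at most `limit`
// calls in flight at once, preserving input order in the results.
async function mapLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

With a limit of 16, a call such as `mapLimit(sessionPaths, 16, parseSession)` would keep at most 16 files open at a time; `sessionPaths` and `parseSession` are placeholder names for illustration.
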
293
377
 
294
378
  ---
295
379
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "inspecto",
3
- "version": "1.1.0",
3
+ "version": "1.1.1",
4
4
  "description": "inspecto — Claude Code session quality analyzer. Grade sessions, detect regressions, catch cache bugs.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",