@uxcontinuum/ccaudit 1.0.3 → 1.1.1

Files changed (3)
  1. package/README.md +77 -37
  2. package/index.js +83 -7
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -1,12 +1,32 @@
  # ccaudit
 
- A diagnostic for your Claude Code setup. Run it, get graded across five dimensions, find the specific fixes.
+ A diagnostic for your Claude Code setup. Three things at once:
+
+ 1. **A fun report card** you can screenshot and share.
+ 2. **A hygiene linter** that surfaces what's missing.
+ 3. **A discovery tool** that shows you which parts of Claude Code you are not using yet.
 
  ```bash
  npx @uxcontinuum/ccaudit
  ```
 
- Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
+ Zero install. Zero dependencies. No network calls. Reads `~/.claude/` on your machine and outputs a grade card.
+
+ ## Why this exists
+
+ Most Claude Code users are running on a fraction of the surface area. No hooks installed. No skills configured. No MCP servers. No idea what their token cost per shipped feature is. No concept of how often their agent fails on first try.
+
+ The hype is on the model. The actual constraint is everything around the model. The scaffolding.
+
+ ccaudit grades the scaffolding.
+
+ ## What the grade is and isn't
+
+ This is a **hygiene and discovery audit**, not an outcomes audit. It measures whether your Claude Code setup is **set up well** and **uses what's available**, not whether your specific outputs are good.
+
+ Think of it as a linter for your AI workflow. Passing lint doesn't guarantee your code is good. Failing lint usually means something is missing. Same here: a high grade doesn't mean Claude is shipping perfect work for you. A low grade usually means there's surface area of Claude Code you haven't unlocked yet.
+
+ The grade can be gamed (install five no-op hooks, auto-title every session, scrub "just" from your prompts). Don't bother. The findings under the grade are the value, not the letter.
 
  ## What you get
 
@@ -15,38 +35,50 @@ Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
  CCAUDIT your Claude Code report card
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
- OVERALL GRADE D+ (67/100)
+ OVERALL GRADE C+ (79/100)
 
- Hook coverage B- ████████████████░░░░
- 1 PreToolUse, 1 PostToolUse, 0 UserPromptSubmit hook(s).
+ Hook coverage A+ ████████████████████
+ 1 PostToolUse, 2 Stop, 1 PreToolUse, autoMemory plugin.
 
  Project hygiene (human) F ████████░░░░░░░░░░░░
- 0% of your human sessions are titled. Avg prompt: 2350 chars.
- → Title your sessions. Untitled sessions are unsearchable history.
+ 0% titled, launched from 10 distinct working dirs.
 
- Tool balance (human) F ███████████░░░░░░░░░
- Bash 73%, Edit+Write 10%, Read 10%, Grep+Glob 2%, Agent/Task 0%.
- → You are running things, not editing things. Use Edit/Write more.
+ Tool balance (human) D+ ██████████████░░░░░░
+ Bash 73%, Edit+Write 10% (3,536 calls), Read 10%.
 
  Prompt tells C ███████████████░░░░░
- You said "just" 10243 times across 19199 prompts (53%).
- → The word "just" telegraphs that you think the task is simple. It is not.
+ Said "just" 10,236 times across 19,192 prompts (53%).
+
+ Output signals B ████████████████░░░░
+ Tool error rate 4.2%, median session length 8 messages.
 
  Pipeline ops (agent sessions) B █████████████████░░░
- 3253 agent-spawned sessions, 26.93M output tokens.
+ 3,253 agent-spawned sessions, 26.93M output tokens.
  ```
 
  ## What it checks
 
- | Dimension | Source |
- |-----------|--------|
- | Hook coverage | `~/.claude/settings.json` (PreToolUse, PostToolUse, UserPromptSubmit hook counts) |
- | Project hygiene | session titles, average prompt length, untitled rate (human sessions only) |
- | Tool balance | distribution across Bash, Edit/Write, Read, Grep/Glob, Agent/Task (human sessions only) |
- | Prompt tells | "just" frequency, "please" frequency, total prompt count |
- | Pipeline ops | agent-spawned session stats: count, token spend, hook coverage relative to volume |
+ | Dimension | What it measures | What it cannot see |
+ |-----------|------------------|-------------------|
+ | Hook coverage | Hooks configured in `~/.claude/settings.json` across all event types, plus `autoMemoryEnabled` plugin flag | Whether the hooks actually do anything useful |
+ | Project hygiene | Custom titles, auto-slugs, CWD diversity, prompt length | Whether your titles describe the work accurately |
+ | Tool balance | Distribution across Bash, Edit, Read, Grep, Agent. Adaptive: high Bash% is okay if absolute Edit volume is also high | Whether each tool call accomplished the goal |
+ | Prompt tells | Frequency of hedge words ("just", "please"), prompt clarity heuristics | Whether your prompts produce good outputs |
+ | Output signals | Tool-call error rate, median session length, within-session retry patterns | Whether your shipped code works in production |
+ | Pipeline ops | Agent-spawned session count, token spend, hook coverage relative to volume | Whether your pipeline ships features that don't break |
+
+ It separates human-driven sessions from agent-spawned worktrees via three signals (`isSidechain`, `userType`, UUID/hex dir-name pattern). Operator grade and pipeline grade get scored independently against different rubrics.
 
- It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-suffix dirs in your projects folder). Your operator-grade and your pipeline-grade get scored independently against different rubrics.
+ ## Cross-platform support
+
+ Works on:
+ - macOS (`$HOME/.claude/`)
+ - Linux (`$HOME/.claude/`)
+ - Windows / WSL (`%USERPROFILE%\.claude\` or `$HOME/.claude/`)
+ - VPS / non-default home (uses Node's `os.homedir()`)
+ - Running as root with users in `/home/*` (scans all)
+
+ Tested against setups ranging from "brand new install with zero sessions" to "20,000 sessions and 4,000 agent worktrees."
 
  ## Install
 
@@ -54,7 +86,7 @@ It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-s
  # Run once without installing
  npx @uxcontinuum/ccaudit
 
- # Or globally
+ # Or install globally
  npm i -g @uxcontinuum/ccaudit
  ccaudit
  ```
@@ -64,34 +96,42 @@ Requires Node 14+. No other dependencies.
  ## Options
 
  ```bash
- ccaudit # full report, last 30 days of activity
+ ccaudit # full report, last 30 days
  ccaudit --days 7 # just last week
  ccaudit --days 365 # full year
+ ccaudit --json # programmatic output, anonymized
  ccaudit --no-color # plain text for copying
  ```
 
- ## How it grades
+ ## Privacy
+
+ Reads `~/.claude/` on your machine. Outputs to stdout. No network calls, no telemetry, no opt-in submission. The `--json` output is anonymized (no prompts, no slugs, no CWD strings, just aggregate counts and percentages).
 
- Each dimension produces a 0-100 score and a letter grade (A+ through F). The overall grade is the mean of the dimension scores. The rubric weights:
+ ## Honest disclaimers
 
- - Hook coverage is hard-floored at 35 if you have zero hooks. Anything could happen overnight.
- - Project hygiene scales linearly with titled-session percentage and penalizes both ultra-terse (<80 chars) and wall-of-text (>1500 chars) average prompts.
- - Tool balance penalizes Bash dominance above 65% and rewards healthy editing (10-55% Edit+Write).
- - Prompt tells subtract for high "just" frequency. "Just" telegraphs that you think the task is simple. It usually is not.
- - Pipeline ops rewards low tokens-per-session and penalizes running an agent pipeline without runtime hooks.
+ - The grade is opinionated, not objective.
+ - The rubric will change as the tool matures.
+ - High grade ≠ good outputs. Low grade ≠ bad outputs. The grade is about **scaffolding and feature coverage**, not results.
+ - The tool ships with a built-in nudge toward [Continuum Sprint](https://uxcontinuum.com/sprint) when it surfaces 2+ failing dimensions. That is intentional. If your setup is genuinely broken in two places, a 2-week sprint is often what fixes it. Ignore the nudge if you don't want it.
 
- The grade is opinionated, not objective. Read it as a diagnostic, not a judgment.
+ ## The story behind this
 
- ## Why this exists
+ Karpathy keeps saying we're entering vibe coding. Software you write in English while AI generates the code. He is not wrong about where this is going.
 
- There is no public benchmark for "is my Claude Code setup any good." People burn weeks reading other people's CLAUDE.md files trying to figure out what they're doing wrong. This tool answers that question in 30 seconds.
+ I bought in six months ago. Built a multi-agent pipeline. Started shipping production code through it. Six weeks of recent data: 333 PRs, $1,132 in tokens, $3.40 per shipped PR.
 
- If the audit flags two or more dimensions, the fix is usually a few days of work, not a rebuild. [Continuum](https://continuum.build) runs structured 2-week sprints for setups that need them.
+ Then I ran ccaudit on myself, expecting an A.
 
- ## Privacy
+ I got a B-.
 
- Reads `~/.claude/` on your machine. Outputs to stdout. Makes no network calls. No telemetry, no analytics, no opt-in submission (yet).
+ The findings were valid. The reason I assumed A was that I had been optimizing the agents and ignoring the room they live in. Almost everyone running Claude Code is doing the same thing. The hype is on the model. The constraint is the scaffolding.
+
+ If you want to know what your scaffolding looks like graded:
+
+ ```bash
+ npx @uxcontinuum/ccaudit
+ ```
 
  ---
 
- Built by [Matt Turley](https://uxcontinuum.com) / [Continuum](https://continuum.build).
+ Built by [Matt Turley](https://uxcontinuum.com).
package/index.js CHANGED
@@ -75,6 +75,14 @@ function projDirName(filePath) {
    return idx >= 0 && idx + 1 < parts.length ? parts[idx + 1] : '';
  }
 
+ // Cheap fingerprint of a tool_use input. Used to detect within-session retries.
+ function fpToolUse(name, input) {
+   const key = typeof input === 'object' && input
+     ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
+     : String(input ?? '');
+   return name + '::' + String(key).slice(0, 200);
+ }
+
  function parseSession(filePath, cutoffMs) {
    let lines;
    try { lines = fs.readFileSync(filePath, 'utf8').split('\n'); } catch (_) { return null; }
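A quick way to see what the fingerprint treats as "the same call" is to run the helper on a few inputs. The sketch below copies `fpToolUse` from the hunk above; the tool inputs (including the `timeout`, `old_string`, and `new_string` fields) are made up for illustration and are not taken from the package.

```javascript
// Copied from the hunk above so the example runs on its own.
function fpToolUse(name, input) {
  const key = typeof input === 'object' && input
    ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
    : String(input ?? '');
  return name + '::' + String(key).slice(0, 200);
}

// Hypothetical tool inputs, invented for this example.
const a = fpToolUse('Bash', { command: 'npm test' });
const b = fpToolUse('Bash', { command: 'npm test', timeout: 5000 });
const c = fpToolUse('Edit', { file_path: 'src/app.js', old_string: 'x', new_string: 'y' });
const d = fpToolUse('Edit', { file_path: 'src/app.js', old_string: 'y', new_string: 'z' });

console.log(a);                       // Bash::npm test
console.log(a === b);                 // true: only `command` is used, other fields are ignored
console.log(c === d);                 // true: two different edits to one file share a fingerprint
console.log(fpToolUse('Task', null)); // Task::
```

One consequence worth noting: because `file_path` wins over the rest of the input, successive distinct edits to the same file appear to count as retries under this fingerprint.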
@@ -92,6 +100,8 @@ function parseSession(filePath, cutoffMs) {
    let entrypoint = null;
    let claudeVersion = null;
    let messageCount = 0;
+   let toolErrors = 0;
+   const fpCounts = new Map(); // tool_use fingerprint → count, for retry detection
 
    for (const raw of lines) {
      if (!raw) continue;
@@ -119,7 +129,11 @@ function parseSession(filePath, cutoffMs) {
      if (msg.type === 'user') {
        const c = msg.message?.content;
        if (Array.isArray(c)) {
-         for (const b of c) if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+         for (const b of c) {
+           if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+           // tool_result blocks appear in user messages. is_error true = the tool call failed.
+           if (b?.type === 'tool_result' && b.is_error === true) toolErrors++;
+         }
        } else if (typeof c === 'string' && c.trim()) {
          userPrompts.push(c.trim());
        }
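The error check added in this hunk can be tried in isolation. This is a minimal sketch, not the package's code: `countToolErrors` is a hypothetical helper name, and the message content is invented sample data shaped like the blocks the parser reads.

```javascript
// Counts failed tool results in one user message, using the same
// strict `is_error === true` test as the hunk above.
function countToolErrors(content) {
  if (!Array.isArray(content)) return 0;
  let n = 0;
  for (const b of content) {
    if (b?.type === 'tool_result' && b.is_error === true) n++;
  }
  return n;
}

// Hypothetical message content, invented for this example.
const content = [
  { type: 'text', text: 'try again' },
  { type: 'tool_result', tool_use_id: 'toolu_01', is_error: true, content: 'exit code 1' },
  { type: 'tool_result', tool_use_id: 'toolu_02', content: 'ok' },
  { type: 'tool_result', tool_use_id: 'toolu_03', is_error: 'true' },
];

console.log(countToolErrors(content));       // 1
console.log(countToolErrors('plain string')); // 0
```

Only a boolean `true` counts. A missing flag or a string `'true'` is treated as success, so the rate leans toward undercounting if a log ever stores the flag in another form.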
@@ -128,7 +142,13 @@ function parseSession(filePath, cutoffMs) {
      if (msg.type === 'assistant') {
        const c = msg.message?.content;
        if (Array.isArray(c)) {
-         for (const b of c) if (b?.type === 'tool_use') toolCalls.push(b.name || 'unknown');
+         for (const b of c) {
+           if (b?.type === 'tool_use') {
+             toolCalls.push(b.name || 'unknown');
+             const fp = fpToolUse(b.name || 'unknown', b.input);
+             fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);
+           }
+         }
        }
        const u = msg.message?.usage;
        if (u) {
@@ -141,14 +161,15 @@ function parseSession(filePath, cutoffMs) {
    if (!timestamps.length) return null;
 
    const projDir = projDirName(filePath);
-   // Multi-signal agent detector. Any of these is sufficient:
-   // - isSidechain: subagent inside another Claude session
-   // - userType non-external: internal automation invocation
-   // - dir-name matches UUID/hex pattern: orchestrator-spawned worktree
    const isAgent = isSidechain ||
      (userType && userType !== 'external') ||
      fallbackAgentDirGuess(projDir);
 
+   // Within-session retries: any fingerprint that fired >1 time. Count the
+   // excess fires beyond the first as retries.
+   let retries = 0;
+   for (const c of fpCounts.values()) if (c > 1) retries += (c - 1);
+
    return {
      projDir,
      isAgent,
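The retry count is easier to check against a concrete session. The sketch below wraps the two loops from the diff (the `fpCounts` update and the excess count) in a hypothetical `countRetries` helper; the fingerprint strings are invented sample data.

```javascript
// Tallies fingerprints, then counts every occurrence beyond the first
// as a retry, mirroring the logic added to parseSession.
function countRetries(fingerprints) {
  const fpCounts = new Map();
  for (const fp of fingerprints) fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);

  let retries = 0;
  for (const c of fpCounts.values()) if (c > 1) retries += (c - 1);
  return retries;
}

// Hypothetical session: `npm test` runs three times, everything else once.
const session = [
  'Bash::npm test',
  'Edit::src/app.js',
  'Bash::npm test',
  'Bash::npm test',
  'Read::README.md',
];

console.log(countRetries(session)); // 2
console.log(countRetries([]));      // 0
```

The count ignores ordering and outcome, so a test command that is rerun after each successful edit is tallied the same way as one rerun after a failure.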
@@ -157,6 +178,8 @@ function parseSession(filePath, cutoffMs) {
      cwd: cwd || '',
      userPrompts,
      toolCalls,
+     toolErrors,
+     retries,
      timestamps,
      outputTokens,
      inputTokens,
@@ -283,6 +306,19 @@ function aggregate(sessions) {
    const outputTokens = subset.reduce((n, s) => n + s.outputTokens, 0);
    const inputTokens = subset.reduce((n, s) => n + s.inputTokens, 0);
 
+   const toolErrorsTotal = subset.reduce((n, s) => n + (s.toolErrors || 0), 0);
+   const retriesTotal = subset.reduce((n, s) => n + (s.retries || 0), 0);
+   const toolErrorRate = tools.length ? (100 * toolErrorsTotal / tools.length) : 0;
+   const retriesPerSession = subset.length ? (retriesTotal / subset.length) : 0;
+
+   // Median session length (message count). Cheaper proxy for first-shot success.
+   const lengths = subset.map(s => s.messageCount).sort((a, b) => a - b);
+   const medianLen = lengths.length
+     ? (lengths.length % 2 === 1
+       ? lengths[(lengths.length - 1) / 2]
+       : Math.round((lengths[lengths.length / 2 - 1] + lengths[lengths.length / 2]) / 2))
+     : 0;
+
    return {
      sessions: subset.length,
      prompts: prompts.length,
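To make the three aggregates concrete, here is a small worked example under simplifying assumptions: each session carries a numeric `toolCalls` count (the real code uses the length of a flattened `tools` array), and `summarize`, `median`, and `round1` are hypothetical names. The one-decimal rounding matches what the following hunk applies before returning the stats.

```javascript
// Median with the same odd/even handling as the diff; even-length inputs
// average the two middle values and round to an integer.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  if (!n) return 0;
  return n % 2 === 1
    ? sorted[(n - 1) / 2]
    : Math.round((sorted[n / 2 - 1] + sorted[n / 2]) / 2);
}

const round1 = (x) => Math.round(x * 10) / 10;

function summarize(sessions) {
  const totalTools = sessions.reduce((n, s) => n + s.toolCalls, 0);
  const toolErrorsTotal = sessions.reduce((n, s) => n + (s.toolErrors || 0), 0);
  const retriesTotal = sessions.reduce((n, s) => n + (s.retries || 0), 0);
  return {
    toolErrorRate: round1(totalTools ? (100 * toolErrorsTotal) / totalTools : 0),
    retriesPerSession: round1(sessions.length ? retriesTotal / sessions.length : 0),
    medianSessionLength: median(sessions.map((s) => s.messageCount)),
  };
}

// Hypothetical sessions, invented for this example.
const stats = summarize([
  { messageCount: 4, toolCalls: 10, toolErrors: 1, retries: 0 },
  { messageCount: 12, toolCalls: 30, toolErrors: 2, retries: 3 },
  { messageCount: 7, toolCalls: 8, toolErrors: 0, retries: 2 },
  { messageCount: 40, toolCalls: 72, toolErrors: 2, retries: 5 },
]);

console.log(stats);
// { toolErrorRate: 4.2, retriesPerSession: 2.5, medianSessionLength: 10 }
```

Here 5 errors over 120 calls gives 4.17%, reported as 4.2, and the median of 4, 7, 12, 40 is 9.5, which rounds up to 10.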
@@ -300,6 +336,11 @@ function aggregate(sessions) {
      outputTokens,
      inputTokens,
      totalTools: tools.length,
+     toolErrorRate: Math.round(toolErrorRate * 10) / 10,
+     toolErrorsTotal,
+     retriesTotal,
+     retriesPerSession: Math.round(retriesPerSession * 10) / 10,
+     medianSessionLength: medianLen,
    };
  };
 
@@ -443,7 +484,37 @@ function grade(stats, setup) {
      });
    }
 
-   // 5. Agent pipeline grade (only if agent sessions exist).
+   // 5. Output signals (human sessions only). Best available local proxy for
+   // whether your sessions actually produce results vs grinding. Three inputs:
+   // - tool error rate (lower = cleaner runs)
+   // - retries per session (lower = first-shot success)
+   // - median session length (very long = stuck, very short = trivial)
+   if (human.sessions && human.totalTools > 0) {
+     let oScore = 80;
+     if (human.toolErrorRate > 15) oScore -= 18;
+     else if (human.toolErrorRate > 8) oScore -= 10;
+     else if (human.toolErrorRate > 4) oScore -= 4;
+
+     if (human.retriesPerSession > 6) oScore -= 12;
+     else if (human.retriesPerSession > 3) oScore -= 6;
+
+     if (human.medianSessionLength > 100) oScore -= 15; // genuinely stuck
+     else if (human.medianSessionLength > 50) oScore -= 6; // long grinds
+     else if (human.medianSessionLength >= 2 && human.medianSessionLength <= 20) oScore += 4; // healthy
+
+     oScore = Math.max(0, Math.min(100, oScore));
+
+     dims.push({
+       name: 'Output signals',
+       score: oScore,
+       detail: `Tool error rate ${human.toolErrorRate}%, ${human.retriesPerSession} retries per session, median session ${human.medianSessionLength} messages.`,
+       fix: human.toolErrorRate > 15
+         ? 'Your tool error rate is high. Sessions are fighting the environment more than producing output.'
+         : null,
+     });
+   }
+
+   // 6. Agent pipeline grade (only if agent sessions exist).
    if (agent.sessions) {
      let aScore = 75;
      if (agent.sessions > 50) aScore += 8;
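The Output signals rubric is small enough to exercise by hand. The sketch below lifts the thresholds from the hunk above into a standalone function; `outputSignalsScore` is a hypothetical name, and the three input sets are invented to hit the typical, best, and worst cases.

```javascript
// Same thresholds and adjustments as the "Output signals" block in grade().
function outputSignalsScore({ toolErrorRate, retriesPerSession, medianSessionLength }) {
  let score = 80;
  if (toolErrorRate > 15) score -= 18;
  else if (toolErrorRate > 8) score -= 10;
  else if (toolErrorRate > 4) score -= 4;

  if (retriesPerSession > 6) score -= 12;
  else if (retriesPerSession > 3) score -= 6;

  if (medianSessionLength > 100) score -= 15;
  else if (medianSessionLength > 50) score -= 6;
  else if (medianSessionLength >= 2 && medianSessionLength <= 20) score += 4;

  return Math.max(0, Math.min(100, score));
}

const typical = outputSignalsScore({ toolErrorRate: 4.2, retriesPerSession: 2.5, medianSessionLength: 10 });
const best = outputSignalsScore({ toolErrorRate: 0, retriesPerSession: 0, medianSessionLength: 8 });
const worst = outputSignalsScore({ toolErrorRate: 20, retriesPerSession: 7, medianSessionLength: 120 });

console.log(typical); // 80: minus 4 for errors above 4%, plus 4 for a healthy median
console.log(best);    // 84
console.log(worst);   // 35
```

With these thresholds the score can only land between 35 and 84, so the clamp to 0..100 never takes effect, and a median between 21 and 50 messages (or below 2) leaves the score unchanged.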
@@ -604,6 +675,11 @@ if (hasFlag('--json')) {
      },
      output_tokens: stats.human.outputTokens,
      input_tokens: stats.human.inputTokens,
+     tool_error_rate_pct: stats.human.toolErrorRate,
+     tool_errors_total: stats.human.toolErrorsTotal,
+     retries_total: stats.human.retriesTotal,
+     retries_per_session: stats.human.retriesPerSession,
+     median_session_length: stats.human.medianSessionLength,
    } : null,
    agent: stats.agent ? {
      sessions: stats.agent.sessions,
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@uxcontinuum/ccaudit",
-   "version": "1.0.3",
+   "version": "1.1.1",
    "description": "A diagnostic for your Claude Code setup. Reads ~/.claude/ locally, grades you across hook coverage, project hygiene, tool balance, prompt tells, and pipeline ops. Zero install: npx @uxcontinuum/ccaudit",
    "main": "index.js",
    "bin": {