npm - @uxcontinuum/ccaudit - Versions diffs - 1.0.3 → 1.1.1 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,12 +1,32 @@
 # ccaudit
-A diagnostic for your Claude Code setup. Run it, get graded across five dimensions, find the specific fixes.
+A diagnostic for your Claude Code setup. Three things at once:
+1. **A fun report card** you can screenshot and share.
+2. **A hygiene linter** that surfaces what's missing.
+3. **A discovery tool** that shows you which parts of Claude Code you are not using yet.
 ```bash
 npx @uxcontinuum/ccaudit
 ```
-Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
+Zero install. Zero dependencies. No network calls. Reads `~/.claude/` on your machine and outputs a grade card.
+## Why this exists
+Most Claude Code users are running on a fraction of the surface area. No hooks installed. No skills configured. No MCP servers. No idea what their token cost per shipped feature is. No concept of how often their agent fails on first try.
+The hype is on the model. The actual constraint is everything around the model. The scaffolding.
+ccaudit grades the scaffolding.
+## What the grade is and isn't
+This is a **hygiene and discovery audit**, not an outcomes audit. It measures whether your Claude Code setup is **set up well** and **uses what's available**, not whether your specific outputs are good.
+Think of it as a linter for your AI workflow. Passing lint doesn't guarantee your code is good. Failing lint usually means something is missing. Same here: a high grade doesn't mean Claude is shipping perfect work for you. A low grade usually means there's surface area of Claude Code you haven't unlocked yet.
+The grade can be gamed (install five no-op hooks, auto-title every session, scrub "just" from your prompts). Don't bother. The findings under the grade are the value, not the letter.
 ## What you get
@@ -15,38 +35,50 @@ Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
   CCAUDIT  your Claude Code report card
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-  OVERALL GRADE   D+   (67/100)
+  OVERALL GRADE   C+   (79/100)
-  Hook coverage                     B-   ████████████████░░░░
-    1 PreToolUse, 1 PostToolUse, 0 UserPromptSubmit hook(s).
+  Hook coverage                     A+   ████████████████████
+    1 PostToolUse, 2 Stop, 1 PreToolUse, autoMemory plugin.
   Project hygiene (human)            F   ████████░░░░░░░░░░░░
-    0% of your human sessions are titled. Avg prompt: 2350 chars.
-    → Title your sessions. Untitled sessions are unsearchable history.
+    0% titled, launched from 10 distinct working dirs.
-  Tool balance (human)               F   ███████████░░░░░░░░░
-    Bash 73%, Edit+Write 10%, Read 10%, Grep+Glob 2%, Agent/Task 0%.
-    → You are running things, not editing things. Use Edit/Write more.
+  Tool balance (human)              D+   ██████████████░░░░░░
+    Bash 73%, Edit+Write 10% (3,536 calls), Read 10%.
   Prompt tells                       C   ███████████████░░░░░
-    You said "just" 10243 times across 19199 prompts (53%).
-    → The word "just" telegraphs that you think the task is simple. It is not.
+    Said "just" 10,236 times across 19,192 prompts (53%).
+  Output signals                     B   ████████████████░░░░
+    Tool error rate 4.2%, median session length 8 messages.
   Pipeline ops (agent sessions)      B   █████████████████░░░
-    3253 agent-spawned sessions, 26.93M output tokens.
+    3,253 agent-spawned sessions, 26.93M output tokens.
 ```
 ## What it checks
-| Dimension | Source |
-|-----------|--------|
-| Hook coverage | `~/.claude/settings.json` (PreToolUse, PostToolUse, UserPromptSubmit hook counts) |
-| Project hygiene | session titles, average prompt length, untitled rate (human sessions only) |
-| Tool balance | distribution across Bash, Edit/Write, Read, Grep/Glob, Agent/Task (human sessions only) |
-| Prompt tells | "just" frequency, "please" frequency, total prompt count |
-| Pipeline ops | agent-spawned session stats: count, token spend, hook coverage relative to volume |
+| Dimension | What it measures | What it cannot see |
+|-----------|------------------|-------------------|
+| Hook coverage | Hooks configured in `~/.claude/settings.json` across all event types, plus `autoMemoryEnabled` plugin flag | Whether the hooks actually do anything useful |
+| Project hygiene | Custom titles, auto-slugs, CWD diversity, prompt length | Whether your titles describe the work accurately |
+| Tool balance | Distribution across Bash, Edit, Read, Grep, Agent. Adaptive: high Bash% is okay if absolute Edit volume is also high | Whether each tool call accomplished the goal |
+| Prompt tells | Frequency of hedge words ("just", "please"), prompt clarity heuristics | Whether your prompts produce good outputs |
+| Output signals | Tool-call error rate, median session length, within-session retry patterns | Whether your shipped code works in production |
+| Pipeline ops | Agent-spawned session count, token spend, hook coverage relative to volume | Whether your pipeline ships features that don't break |
+It separates human-driven sessions from agent-spawned worktrees via three signals (`isSidechain`, `userType`, UUID/hex dir-name pattern). Operator grade and pipeline grade get scored independently against different rubrics.
-It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-suffix dirs in your projects folder). Your operator-grade and your pipeline-grade get scored independently against different rubrics.
+## Cross-platform support
+Works on:
+- macOS (`$HOME/.claude/`)
+- Linux (`$HOME/.claude/`)
+- Windows / WSL (`%USERPROFILE%\.claude\` or `$HOME/.claude/`)
+- VPS / non-default home (uses Node's `os.homedir()`)
+- Running as root with users in `/home/*` (scans all)
+Tested against setups ranging from "brand new install with zero sessions" to "20,000 sessions and 4,000 agent worktrees."
 ## Install
@@ -54,7 +86,7 @@ It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-s
 # Run once without installing
 npx @uxcontinuum/ccaudit
-# Or globally
+# Or install globally
 npm i -g @uxcontinuum/ccaudit
 ccaudit
 ```
@@ -64,34 +96,42 @@ Requires Node 14+. No other dependencies.
 ## Options
 ```bash
-ccaudit                 # full report, last 30 days of activity
+ccaudit                 # full report, last 30 days
 ccaudit --days 7        # just last week
 ccaudit --days 365      # full year
+ccaudit --json          # programmatic output, anonymized
 ccaudit --no-color      # plain text for copying
 ```
-## How it grades
+## Privacy
+Reads `~/.claude/` on your machine. Outputs to stdout. No network calls, no telemetry, no opt-in submission. The `--json` output is anonymized (no prompts, no slugs, no CWD strings, just aggregate counts and percentages).
-Each dimension produces a 0-100 score and a letter grade (A+ through F). The overall grade is the mean of the dimension scores. The rubric weights:
+## Honest disclaimers
-- Hook coverage is hard-floored at 35 if you have zero hooks. Anything could happen overnight.
-- Project hygiene scales linearly with titled-session percentage and penalizes both ultra-terse (<80 chars) and wall-of-text (>1500 chars) average prompts.
-- Tool balance penalizes Bash dominance above 65% and rewards healthy editing (10-55% Edit+Write).
-- Prompt tells subtract for high "just" frequency. "Just" telegraphs that you think the task is simple. It usually is not.
-- Pipeline ops rewards low tokens-per-session and penalizes running an agent pipeline without runtime hooks.
+- The grade is opinionated, not objective.
+- The rubric will change as the tool matures.
+- High grade ≠ good outputs. Low grade ≠ bad outputs. The grade is about **scaffolding and feature coverage**, not results.
+- The tool ships with a built-in nudge toward [Continuum Sprint](https://uxcontinuum.com/sprint) when it surfaces 2+ failing dimensions. That is intentional. If your setup is genuinely broken in two places, a 2-week sprint is often what fixes it. Ignore the nudge if you don't want it.
-The grade is opinionated, not objective. Read it as a diagnostic, not a judgment.
+## The story behind this
-## Why this exists
+Karpathy keeps saying we're entering vibe coding. Software you write in English while AI generates the code. He is not wrong about where this is going.
-There is no public benchmark for "is my Claude Code setup any good." People burn weeks reading other people's CLAUDE.md files trying to figure out what they're doing wrong. This tool answers that question in 30 seconds.
+I bought in six months ago. Built a multi-agent pipeline. Started shipping production code through it. Six weeks of recent data: 333 PRs, $1,132 in tokens, $3.40 per shipped PR.
-If the audit flags two or more dimensions, the fix is usually a few days of work, not a rebuild. [Continuum](https://continuum.build) runs structured 2-week sprints for setups that need them.
+Then I ran ccaudit on myself, expecting an A.
-## Privacy
+I got a B-.
-Reads `~/.claude/` on your machine. Outputs to stdout. Makes no network calls. No telemetry, no analytics, no opt-in submission (yet).
+The findings were valid. The reason I assumed A was that I had been optimizing the agents and ignoring the room they live in. Almost everyone running Claude Code is doing the same thing. The hype is on the model. The constraint is the scaffolding.
+If you want to know what your scaffolding looks like graded:
+```bash
+npx @uxcontinuum/ccaudit
+```
 ---
-Built by [Matt Turley](https://uxcontinuum.com) / [Continuum](https://continuum.build).
+Built by [Matt Turley](https://uxcontinuum.com).
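The README says Hook coverage counts hooks configured in `~/.claude/settings.json` across all event types. The counting code is not part of this diff, so what follows is only a minimal sketch. It assumes the settings file uses Claude Code's hooks layout: `hooks`, then an event name, then an array of matcher groups, each with its own `hooks` array. `countHooksByEvent` and `readLocalHookCounts` are hypothetical names, not ccaudit functions. The path is resolved with `os.homedir()`, which the README says the tool uses.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: count configured hook entries per event type.
// Assumed layout:
//   { "hooks": { "<Event>": [ { "matcher": "...", "hooks": [ { "type": "command", ... } ] } ] } }
function countHooksByEvent(settings) {
  const counts = {};
  const hooks = (settings && typeof settings.hooks === 'object' && settings.hooks) || {};
  for (const [event, groups] of Object.entries(hooks)) {
    if (!Array.isArray(groups)) continue;
    // A matcher group can hold several hook entries, so sum them all.
    counts[event] = groups.reduce(
      (n, g) => n + (Array.isArray(g && g.hooks) ? g.hooks.length : 0), 0);
  }
  return counts;
}

// Hypothetical helper: read ~/.claude/settings.json and count its hooks.
// Returns {} when the file is missing or is not valid JSON.
function readLocalHookCounts() {
  const file = path.join(os.homedir(), '.claude', 'settings.json');
  try {
    return countHooksByEvent(JSON.parse(fs.readFileSync(file, 'utf8')));
  } catch (_) {
    return {};
  }
}

const example = {
  hooks: {
    PostToolUse: [{ matcher: 'Edit|Write', hooks: [{ type: 'command', command: 'npm run lint' }] }],
    Stop: [{ hooks: [{ type: 'command', command: 'a' }, { type: 'command', command: 'b' }] }],
  },
};
console.log(countHooksByEvent(example)); // { PostToolUse: 1, Stop: 2 }
```

The README's "What it cannot see" column still applies here: a count like this shows that hooks exist, not what they do.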

package/index.js CHANGED

@@ -75,6 +75,14 @@ function projDirName(filePath) {
   return idx >= 0 && idx + 1 < parts.length ? parts[idx + 1] : '';
 }
+// Cheap fingerprint of a tool_use input. Used to detect within-session retries.
+function fpToolUse(name, input) {
+  const key = typeof input === 'object' && input
+    ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
+    : String(input ?? '');
+  return name + '::' + String(key).slice(0, 200);
+}
 function parseSession(filePath, cutoffMs) {
   let lines;
   try { lines = fs.readFileSync(filePath, 'utf8').split('\n'); } catch (_) { return null; }
@@ -92,6 +100,8 @@ function parseSession(filePath, cutoffMs) {
   let entrypoint   = null;
   let claudeVersion = null;
   let messageCount = 0;
+  let toolErrors   = 0;
+  const fpCounts = new Map(); // tool_use fingerprint → count, for retry detection
   for (const raw of lines) {
     if (!raw) continue;
@@ -119,7 +129,11 @@ function parseSession(filePath, cutoffMs) {
     if (msg.type === 'user') {
       const c = msg.message?.content;
       if (Array.isArray(c)) {
-        for (const b of c) if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+        for (const b of c) {
+          if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+          // tool_result blocks appear in user messages. is_error true = the tool call failed.
+          if (b?.type === 'tool_result' && b.is_error === true) toolErrors++;
+        }
       } else if (typeof c === 'string' && c.trim()) {
         userPrompts.push(c.trim());
       }
@@ -128,7 +142,13 @@ function parseSession(filePath, cutoffMs) {
     if (msg.type === 'assistant') {
       const c = msg.message?.content;
       if (Array.isArray(c)) {
-        for (const b of c) if (b?.type === 'tool_use') toolCalls.push(b.name || 'unknown');
+        for (const b of c) {
+          if (b?.type === 'tool_use') {
+            toolCalls.push(b.name || 'unknown');
+            const fp = fpToolUse(b.name || 'unknown', b.input);
+            fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);
+          }
+        }
       }
       const u = msg.message?.usage;
       if (u) {
@@ -141,14 +161,15 @@ function parseSession(filePath, cutoffMs) {
   if (!timestamps.length) return null;
   const projDir = projDirName(filePath);
-  // Multi-signal agent detector. Any of these is sufficient:
-  //   - isSidechain: subagent inside another Claude session
-  //   - userType non-external: internal automation invocation
-  //   - dir-name matches UUID/hex pattern: orchestrator-spawned worktree
   const isAgent = isSidechain ||
                   (userType && userType !== 'external') ||
                   fallbackAgentDirGuess(projDir);
+  // Within-session retries: any fingerprint that fired >1 time. Count the
+  // excess fires beyond the first as retries.
+  let retries = 0;
+  for (const c of fpCounts.values()) if (c > 1) retries += (c - 1);
   return {
     projDir,
     isAgent,
@@ -157,6 +178,8 @@ function parseSession(filePath, cutoffMs) {
     cwd: cwd || '',
     userPrompts,
     toolCalls,
+    toolErrors,
+    retries,
     timestamps,
     outputTokens,
     inputTokens,
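In 1.1.1, a within-session retry is counted whenever the same tool-call fingerprint fires more than once. The sketch below copies `fpToolUse` from the diff and wraps the counting loop from `parseSession` in a standalone function. `countRetries` and the sample blocks are illustrative and not part of the package.

```javascript
// Copied from the diff: the fingerprint is the tool name plus the most
// identifying input field (command, file_path, path, pattern, else the JSON),
// truncated to 200 characters.
function fpToolUse(name, input) {
  const key = typeof input === 'object' && input
    ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
    : String(input ?? '');
  return name + '::' + String(key).slice(0, 200);
}

// Hypothetical wrapper around the same logic parseSession uses:
// every fire of a fingerprint after the first counts as one retry.
function countRetries(toolUseBlocks) {
  const fpCounts = new Map();
  for (const b of toolUseBlocks) {
    const fp = fpToolUse(b.name || 'unknown', b.input);
    fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);
  }
  let retries = 0;
  for (const c of fpCounts.values()) if (c > 1) retries += c - 1;
  return retries;
}

const blocks = [
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Edit', input: { file_path: 'src/a.js', old_string: 'x', new_string: 'y' } },
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Read', input: { file_path: 'src/a.js' } },
];
console.log(countRetries(blocks)); // 2 (three identical Bash calls)
```

Only the identifying field is kept, so two `Edit` calls on the same file with different `old_string` values share one fingerprint. A deliberate re-run, such as running the tests again after an edit, also counts the same as a retry after a failure. The metric is a coarse proxy, and the README labels it as a heuristic ("within-session retry patterns").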
@@ -283,6 +306,19 @@ function aggregate(sessions) {
     const outputTokens = subset.reduce((n, s) => n + s.outputTokens, 0);
     const inputTokens  = subset.reduce((n, s) => n + s.inputTokens, 0);
+    const toolErrorsTotal = subset.reduce((n, s) => n + (s.toolErrors || 0), 0);
+    const retriesTotal    = subset.reduce((n, s) => n + (s.retries || 0), 0);
+    const toolErrorRate   = tools.length ? (100 * toolErrorsTotal / tools.length) : 0;
+    const retriesPerSession = subset.length ? (retriesTotal / subset.length) : 0;
+    // Median session length (message count). Cheaper proxy for first-shot success.
+    const lengths = subset.map(s => s.messageCount).sort((a, b) => a - b);
+    const medianLen = lengths.length
+      ? (lengths.length % 2 === 1
+          ? lengths[(lengths.length - 1) / 2]
+          : Math.round((lengths[lengths.length / 2 - 1] + lengths[lengths.length / 2]) / 2))
+      : 0;
     return {
       sessions: subset.length,
       prompts: prompts.length,
@@ -300,6 +336,11 @@ function aggregate(sessions) {
       outputTokens,
       inputTokens,
       totalTools: tools.length,
+      toolErrorRate: Math.round(toolErrorRate * 10) / 10,
+      toolErrorsTotal,
+      retriesTotal,
+      retriesPerSession: Math.round(retriesPerSession * 10) / 10,
+      medianSessionLength: medianLen,
     };
   };
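The `aggregate` hunk adds two summary statistics. The first is an error rate: `tool_result` blocks with `is_error === true` as a percentage of all tool calls, rounded to one decimal. The second is the median message count per session. The helper names `medianLength` and `toolErrorRatePct` are illustrative. Their bodies follow the arithmetic in the diff, including rounding the even-length median to an integer.

```javascript
// Median message count: the middle value for odd lengths, the rounded mean
// of the two middle values for even lengths, 0 for no sessions.
function medianLength(counts) {
  const xs = counts.slice().sort((a, b) => a - b); // copy, then sort numerically
  if (!xs.length) return 0;
  const mid = xs.length / 2;
  return xs.length % 2 === 1
    ? xs[(xs.length - 1) / 2]
    : Math.round((xs[mid - 1] + xs[mid]) / 2);
}

// Failed tool results as a percentage of tool calls, one decimal place.
function toolErrorRatePct(errors, toolCalls) {
  return toolCalls ? Math.round((100 * errors / toolCalls) * 10) / 10 : 0;
}

console.log(medianLength([12, 3, 8, 40])); // 10
console.log(toolErrorRatePct(21, 500));    // 4.2
```

The numeric comparator matters here. Without it, `Array.prototype.sort` compares the values as strings, so `[12, 3, 8, 40]` would sort as `[12, 3, 40, 8]`.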
@@ -443,7 +484,37 @@ function grade(stats, setup) {
     });
   }
-  // 5. Agent pipeline grade (only if agent sessions exist).
+  // 5. Output signals (human sessions only). Best available local proxy for
+  // whether your sessions actually produce results vs grinding. Three inputs:
+  //   - tool error rate (lower = cleaner runs)
+  //   - retries per session (lower = first-shot success)
+  //   - median session length (very long = stuck, very short = trivial)
+  if (human.sessions && human.totalTools > 0) {
+    let oScore = 80;
+    if (human.toolErrorRate > 15) oScore -= 18;
+    else if (human.toolErrorRate > 8) oScore -= 10;
+    else if (human.toolErrorRate > 4) oScore -= 4;
+    if (human.retriesPerSession > 6) oScore -= 12;
+    else if (human.retriesPerSession > 3) oScore -= 6;
+    if      (human.medianSessionLength > 100) oScore -= 15; // genuinely stuck
+    else if (human.medianSessionLength > 50)  oScore -= 6;  // long grinds
+    else if (human.medianSessionLength >= 2 && human.medianSessionLength <= 20) oScore += 4; // healthy
+    oScore = Math.max(0, Math.min(100, oScore));
+    dims.push({
+      name: 'Output signals',
+      score: oScore,
+      detail: `Tool error rate ${human.toolErrorRate}%, ${human.retriesPerSession} retries per session, median session ${human.medianSessionLength} messages.`,
+      fix: human.toolErrorRate > 15
+        ? 'Your tool error rate is high. Sessions are fighting the environment more than producing output.'
+        : null,
+    });
+  }
+  // 6. Agent pipeline grade (only if agent sessions exist).
   if (agent.sessions) {
     let aScore = 75;
     if (agent.sessions > 50) aScore += 8;
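The new Output signals dimension starts at 80 and adjusts for error rate, retries and median session length. The function below transcribes those thresholds from the `grade()` hunk so individual inputs can be checked. `outputSignalsScore` is an illustrative name. The README sample reports only the error rate (4.2%) and median length (8), so the retries value in the first call is an assumption.

```javascript
// Output signals score, transcribed from the 1.1.1 grade() hunk.
function outputSignalsScore({ toolErrorRate, retriesPerSession, medianSessionLength }) {
  let s = 80;
  if (toolErrorRate > 15) s -= 18;       // high tool error rate
  else if (toolErrorRate > 8) s -= 10;
  else if (toolErrorRate > 4) s -= 4;
  if (retriesPerSession > 6) s -= 12;    // frequent repeated calls
  else if (retriesPerSession > 3) s -= 6;
  if (medianSessionLength > 100) s -= 15; // "genuinely stuck"
  else if (medianSessionLength > 50) s -= 6; // "long grinds"
  else if (medianSessionLength >= 2 && medianSessionLength <= 20) s += 4; // "healthy"
  return Math.max(0, Math.min(100, s));
}

// README sample values; 2 retries per session is an assumed value.
console.log(outputSignalsScore({ toolErrorRate: 4.2, retriesPerSession: 2, medianSessionLength: 8 }));    // 80
console.log(outputSignalsScore({ toolErrorRate: 20, retriesPerSession: 7, medianSessionLength: 120 }));   // 35
```

With these thresholds the score can only fall between 35 (every penalty) and 84 (no penalties plus the healthy-length bonus). The clamp to 0..100 never actually applies to this dimension. Only an error rate above 15% produces a `fix` message.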
@@ -604,6 +675,11 @@ if (hasFlag('--json')) {
       },
       output_tokens: stats.human.outputTokens,
       input_tokens: stats.human.inputTokens,
+      tool_error_rate_pct: stats.human.toolErrorRate,
+      tool_errors_total: stats.human.toolErrorsTotal,
+      retries_total: stats.human.retriesTotal,
+      retries_per_session: stats.human.retriesPerSession,
+      median_session_length: stats.human.medianSessionLength,
     } : null,
     agent: stats.agent ? {
       sessions: stats.agent.sessions,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@uxcontinuum/ccaudit",
-  "version": "1.0.3",
+  "version": "1.1.1",
   "description": "A diagnostic for your Claude Code setup. Reads ~/.claude/ locally, grades you across hook coverage, project hygiene, tool balance, prompt tells, and pipeline ops. Zero install: npx @uxcontinuum/ccaudit",
   "main": "index.js",
   "bin": {