@uxcontinuum/ccaudit 1.0.3 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,12 +1,20 @@
 # ccaudit
-A diagnostic for your Claude Code setup. Run it, get graded across five dimensions, find the specific fixes.
+A diagnostic for your Claude Code setup. Mostly for fun, partly genuinely useful. It reads `~/.claude/` locally and grades you across hook coverage, project hygiene, tool balance, prompt tells, and pipeline ops.
 ```bash
 npx @uxcontinuum/ccaudit
 ```
-Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
+Zero install, zero dependencies, no network calls.
+## What the grade is and isn't
+This is a hygiene audit, not an outcomes audit. It measures whether your Claude Code setup is **set up well**, not whether your outputs are good.
+Think of it as a linter for your AI workflow. Passing lint doesn't guarantee your code is good. Failing lint usually means something is missing. Same here: a high grade doesn't mean Claude is shipping perfect work for you; a low grade usually means the scaffolding around your AI is sparse.
+The grade can be gamed (install five no-op hooks, auto-title every session, scrub "just" from your prompts). Don't bother. The findings under the grade are the value, not the letter.
 ## What you get
@@ -15,38 +23,50 @@ Reads `~/.claude/` locally. Zero dependencies. Nothing leaves your machine.
   CCAUDIT  your Claude Code report card
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-  OVERALL GRADE   D+   (67/100)
+  OVERALL GRADE   C+   (79/100)
-  Hook coverage                     B-   ████████████████░░░░
-    1 PreToolUse, 1 PostToolUse, 0 UserPromptSubmit hook(s).
+  Hook coverage                     A+   ████████████████████
+    1 PostToolUse, 2 Stop, 1 PreToolUse, autoMemory plugin.
   Project hygiene (human)            F   ████████░░░░░░░░░░░░
-    0% of your human sessions are titled. Avg prompt: 2350 chars.
-    → Title your sessions. Untitled sessions are unsearchable history.
+    0% titled, launched from 10 distinct working dirs.
-  Tool balance (human)               F   ███████████░░░░░░░░░
-    Bash 73%, Edit+Write 10%, Read 10%, Grep+Glob 2%, Agent/Task 0%.
-    → You are running things, not editing things. Use Edit/Write more.
+  Tool balance (human)              D+   ██████████████░░░░░░
+    Bash 73%, Edit+Write 10% (3,536 calls), Read 10%.
   Prompt tells                       C   ███████████████░░░░░
-    You said "just" 10243 times across 19199 prompts (53%).
-    → The word "just" telegraphs that you think the task is simple. It is not.
+    Said "just" 10,236 times across 19,192 prompts (53%).
+  Output signals                     B   ████████████████░░░░
+    Tool error rate 4.2%, median session length 8 messages.
   Pipeline ops (agent sessions)      B   █████████████████░░░
-    3253 agent-spawned sessions, 26.93M output tokens.
+    3,253 agent-spawned sessions, 26.93M output tokens.
 ```
 ## What it checks
-| Dimension | Source |
-|-----------|--------|
-| Hook coverage | `~/.claude/settings.json` (PreToolUse, PostToolUse, UserPromptSubmit hook counts) |
-| Project hygiene | session titles, average prompt length, untitled rate (human sessions only) |
-| Tool balance | distribution across Bash, Edit/Write, Read, Grep/Glob, Agent/Task (human sessions only) |
-| Prompt tells | "just" frequency, "please" frequency, total prompt count |
-| Pipeline ops | agent-spawned session stats: count, token spend, hook coverage relative to volume |
+| Dimension | What it measures | What it cannot see |
+|-----------|------------------|-------------------|
+| Hook coverage | Hooks configured in `~/.claude/settings.json` across all event types, plus `autoMemoryEnabled` plugin flag | Whether the hooks actually do anything useful |
+| Project hygiene | Custom titles, auto-slugs, CWD diversity, prompt length | Whether your titles describe the work accurately |
+| Tool balance | Distribution across Bash, Edit, Read, Grep, Agent. Adaptive: high Bash% is okay if absolute Edit volume is also high | Whether each tool call accomplished the goal |
+| Prompt tells | Frequency of hedge words ("just", "please"), prompt clarity heuristics | Whether your prompts produce good outputs |
+| Output signals | Tool-call error rate, median session length, within-session retry patterns | Whether your shipped code works in production |
+| Pipeline ops | Agent-spawned session count, token spend, hook coverage relative to volume | Whether your pipeline ships features that don't break |
+It separates human-driven sessions from agent-spawned worktrees via three signals (`isSidechain`, `userType`, UUID/hex dir-name pattern). Operator grade and pipeline grade get scored independently against different rubrics.
+## Cross-platform support
+Works on:
+- macOS (`$HOME/.claude/`)
+- Linux (`$HOME/.claude/`)
+- Windows / WSL (`%USERPROFILE%\.claude\` or `$HOME/.claude/`)
+- VPS / non-default home (uses Node's `os.homedir()`)
+- Running as root with users in `/home/*` (scans all)
-It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-suffix dirs in your projects folder). Your operator-grade and your pipeline-grade get scored independently against different rubrics.
+Tested against setups ranging from "brand new install with zero sessions" to "20,000 sessions and 4,000 agent worktrees."
 ## Install
@@ -54,7 +74,7 @@ It separates human-driven sessions from agent-spawned worktrees (UUID and ULID-s
 # Run once without installing
 npx @uxcontinuum/ccaudit
-# Or globally
+# Or install globally
 npm i -g @uxcontinuum/ccaudit
 ccaudit
 ```
@@ -64,34 +84,24 @@ Requires Node 14+. No other dependencies.
 ## Options
 ```bash
-ccaudit                 # full report, last 30 days of activity
+ccaudit                 # full report, last 30 days
 ccaudit --days 7        # just last week
 ccaudit --days 365      # full year
+ccaudit --json          # programmatic output, anonymized
 ccaudit --no-color      # plain text for copying
 ```
-## How it grades
-Each dimension produces a 0-100 score and a letter grade (A+ through F). The overall grade is the mean of the dimension scores. The rubric weights:
-- Hook coverage is hard-floored at 35 if you have zero hooks. Anything could happen overnight.
-- Project hygiene scales linearly with titled-session percentage and penalizes both ultra-terse (<80 chars) and wall-of-text (>1500 chars) average prompts.
-- Tool balance penalizes Bash dominance above 65% and rewards healthy editing (10-55% Edit+Write).
-- Prompt tells subtract for high "just" frequency. "Just" telegraphs that you think the task is simple. It usually is not.
-- Pipeline ops rewards low tokens-per-session and penalizes running an agent pipeline without runtime hooks.
-The grade is opinionated, not objective. Read it as a diagnostic, not a judgment.
-## Why this exists
-There is no public benchmark for "is my Claude Code setup any good." People burn weeks reading other people's CLAUDE.md files trying to figure out what they're doing wrong. This tool answers that question in 30 seconds.
+## Privacy
-If the audit flags two or more dimensions, the fix is usually a few days of work, not a rebuild. [Continuum](https://continuum.build) runs structured 2-week sprints for setups that need them.
+Reads `~/.claude/` on your machine. Outputs to stdout. No network calls, no telemetry, no opt-in submission. The `--json` output is anonymized (no prompts, no slugs, no CWD strings, just aggregate counts and percentages).
-## Privacy
+## Honest disclaimers
-Reads `~/.claude/` on your machine. Outputs to stdout. Makes no network calls. No telemetry, no analytics, no opt-in submission (yet).
+- The grade is opinionated, not objective.
+- The rubric will change as the tool matures.
+- High grade ≠ good outputs. Low grade ≠ bad outputs. The grade is about scaffolding, not results.
+- The tool ships with a built-in nudge toward [Continuum Sprint](https://uxcontinuum.com/sprint) when it surfaces 2+ failing dimensions. That's intentional. If your setup is genuinely broken in two places, a 2-week sprint is often what fixes it. Ignore the nudge if you don't want it.
 ---
-Built by [Matt Turley](https://uxcontinuum.com) / [Continuum](https://continuum.build).
+Built by [Matt Turley](https://uxcontinuum.com).

package/index.js CHANGED

@@ -75,6 +75,14 @@ function projDirName(filePath) {
   return idx >= 0 && idx + 1 < parts.length ? parts[idx + 1] : '';
 }
+// Cheap fingerprint of a tool_use input. Used to detect within-session retries.
+function fpToolUse(name, input) {
+  const key = typeof input === 'object' && input
+    ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
+    : String(input ?? '');
+  return name + '::' + String(key).slice(0, 200);
+}
 function parseSession(filePath, cutoffMs) {
   let lines;
   try { lines = fs.readFileSync(filePath, 'utf8').split('\n'); } catch (_) { return null; }
@@ -92,6 +100,8 @@ function parseSession(filePath, cutoffMs) {
   let entrypoint   = null;
   let claudeVersion = null;
   let messageCount = 0;
+  let toolErrors   = 0;
+  const fpCounts = new Map(); // tool_use fingerprint → count, for retry detection
   for (const raw of lines) {
     if (!raw) continue;
@@ -119,7 +129,11 @@ function parseSession(filePath, cutoffMs) {
     if (msg.type === 'user') {
       const c = msg.message?.content;
       if (Array.isArray(c)) {
-        for (const b of c) if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+        for (const b of c) {
+          if (b?.type === 'text' && b.text?.trim()) userPrompts.push(b.text.trim());
+          // tool_result blocks appear in user messages. is_error true = the tool call failed.
+          if (b?.type === 'tool_result' && b.is_error === true) toolErrors++;
+        }
       } else if (typeof c === 'string' && c.trim()) {
         userPrompts.push(c.trim());
       }
@@ -128,7 +142,13 @@ function parseSession(filePath, cutoffMs) {
     if (msg.type === 'assistant') {
       const c = msg.message?.content;
       if (Array.isArray(c)) {
-        for (const b of c) if (b?.type === 'tool_use') toolCalls.push(b.name || 'unknown');
+        for (const b of c) {
+          if (b?.type === 'tool_use') {
+            toolCalls.push(b.name || 'unknown');
+            const fp = fpToolUse(b.name || 'unknown', b.input);
+            fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);
+          }
+        }
       }
       const u = msg.message?.usage;
       if (u) {
@@ -141,14 +161,15 @@ function parseSession(filePath, cutoffMs) {
   if (!timestamps.length) return null;
   const projDir = projDirName(filePath);
-  // Multi-signal agent detector. Any of these is sufficient:
-  //   - isSidechain: subagent inside another Claude session
-  //   - userType non-external: internal automation invocation
-  //   - dir-name matches UUID/hex pattern: orchestrator-spawned worktree
   const isAgent = isSidechain ||
                   (userType && userType !== 'external') ||
                   fallbackAgentDirGuess(projDir);
+  // Within-session retries: any fingerprint that fired >1 time. Count the
+  // excess fires beyond the first as retries.
+  let retries = 0;
+  for (const c of fpCounts.values()) if (c > 1) retries += (c - 1);
   return {
     projDir,
     isAgent,
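
The retry counting added in the hunks above can be tried in isolation. The following is a minimal sketch, not the package's own entry point: `fpToolUse` is copied from the diff, while `countRetries` and the `sample` calls are hypothetical names introduced only for illustration.

```javascript
// fpToolUse is copied from the diff above.
function fpToolUse(name, input) {
  const key = typeof input === 'object' && input
    ? (input.command || input.file_path || input.path || input.pattern || JSON.stringify(input))
    : String(input ?? '');
  return name + '::' + String(key).slice(0, 200);
}

// Hypothetical helper: mirrors the fpCounts / retries logic in parseSession.
function countRetries(toolUses) {
  const fpCounts = new Map(); // fingerprint -> number of times it fired
  for (const { name, input } of toolUses) {
    const fp = fpToolUse(name || 'unknown', input);
    fpCounts.set(fp, (fpCounts.get(fp) || 0) + 1);
  }
  // Every fire beyond the first for a given fingerprint counts as one retry.
  let retries = 0;
  for (const c of fpCounts.values()) if (c > 1) retries += c - 1;
  return retries;
}

// Hypothetical session: three identical Bash calls, two different edits to
// the same file, and one read.
const sample = [
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Bash', input: { command: 'npm test' } },
  { name: 'Edit', input: { file_path: 'src/a.js', old_string: 'x', new_string: 'y' } },
  { name: 'Edit', input: { file_path: 'src/a.js', old_string: 'y', new_string: 'z' } },
  { name: 'Read', input: { file_path: 'src/a.js' } },
];

console.log(countRetries(sample)); // 3
```

Because the fingerprint takes the first of `command`, `file_path`, `path`, or `pattern`, two different edits to the same file share the fingerprint `Edit::src/a.js` and count as one retry. The sample therefore yields 3 (two repeated `npm test` runs plus one repeated edit target), which suggests the metric is closer to "repeat touches" than strict retries.
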
@@ -157,6 +178,8 @@ function parseSession(filePath, cutoffMs) {
     cwd: cwd || '',
     userPrompts,
     toolCalls,
+    toolErrors,
+    retries,
     timestamps,
     outputTokens,
     inputTokens,
@@ -283,6 +306,19 @@ function aggregate(sessions) {
     const outputTokens = subset.reduce((n, s) => n + s.outputTokens, 0);
     const inputTokens  = subset.reduce((n, s) => n + s.inputTokens, 0);
+    const toolErrorsTotal = subset.reduce((n, s) => n + (s.toolErrors || 0), 0);
+    const retriesTotal    = subset.reduce((n, s) => n + (s.retries || 0), 0);
+    const toolErrorRate   = tools.length ? (100 * toolErrorsTotal / tools.length) : 0;
+    const retriesPerSession = subset.length ? (retriesTotal / subset.length) : 0;
+    // Median session length (message count). Cheaper proxy for first-shot success.
+    const lengths = subset.map(s => s.messageCount).sort((a, b) => a - b);
+    const medianLen = lengths.length
+      ? (lengths.length % 2 === 1
+          ? lengths[(lengths.length - 1) / 2]
+          : Math.round((lengths[lengths.length / 2 - 1] + lengths[lengths.length / 2]) / 2))
+      : 0;
     return {
       sessions: subset.length,
       prompts: prompts.length,
@@ -300,6 +336,11 @@ function aggregate(sessions) {
       outputTokens,
       inputTokens,
       totalTools: tools.length,
+      toolErrorRate: Math.round(toolErrorRate * 10) / 10,
+      toolErrorsTotal,
+      retriesTotal,
+      retriesPerSession: Math.round(retriesPerSession * 10) / 10,
+      medianSessionLength: medianLen,
     };
   };
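
The median and rounding arithmetic added to `aggregate` can be checked with a few small inputs. This is a sketch under the assumption that the logic is lifted out unchanged; `medianLength` and `roundedErrorRate` are hypothetical function names, not identifiers from the package.

```javascript
// Hypothetical helper: same median rule as the diff (even-length lists take
// the rounded mean of the two middle values, empty lists give 0).
function medianLength(lengths) {
  const sorted = [...lengths].sort((a, b) => a - b);
  const n = sorted.length;
  if (!n) return 0;
  return n % 2 === 1
    ? sorted[(n - 1) / 2]
    : Math.round((sorted[n / 2 - 1] + sorted[n / 2]) / 2);
}

// Hypothetical helper: errors per 100 tool calls, rounded to one decimal,
// with the same divide-by-zero guard as the diff.
function roundedErrorRate(toolErrorsTotal, totalTools) {
  const rate = totalTools ? (100 * toolErrorsTotal / totalTools) : 0;
  return Math.round(rate * 10) / 10;
}

console.log(medianLength([3, 40, 8]));    // 8
console.log(medianLength([4, 7, 8, 30])); // 8 (7.5 rounds up)
console.log(roundedErrorRate(21, 500));   // 4.2
console.log(roundedErrorRate(0, 0));      // 0
```

Note that the error rate divides tool errors by tool calls within the same session subset, so a session with no tool calls contributes nothing to either side.
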
@@ -443,7 +484,37 @@ function grade(stats, setup) {
     });
   }
-  // 5. Agent pipeline grade (only if agent sessions exist).
+  // 5. Output signals (human sessions only). Best available local proxy for
+  // whether your sessions actually produce results vs grinding. Three inputs:
+  //   - tool error rate (lower = cleaner runs)
+  //   - retries per session (lower = first-shot success)
+  //   - median session length (very long = stuck, very short = trivial)
+  if (human.sessions && human.totalTools > 0) {
+    let oScore = 80;
+    if (human.toolErrorRate > 15) oScore -= 18;
+    else if (human.toolErrorRate > 8) oScore -= 10;
+    else if (human.toolErrorRate > 4) oScore -= 4;
+    if (human.retriesPerSession > 6) oScore -= 12;
+    else if (human.retriesPerSession > 3) oScore -= 6;
+    if      (human.medianSessionLength > 100) oScore -= 15; // genuinely stuck
+    else if (human.medianSessionLength > 50)  oScore -= 6;  // long grinds
+    else if (human.medianSessionLength >= 2 && human.medianSessionLength <= 20) oScore += 4; // healthy
+    oScore = Math.max(0, Math.min(100, oScore));
+    dims.push({
+      name: 'Output signals',
+      score: oScore,
+      detail: `Tool error rate ${human.toolErrorRate}%, ${human.retriesPerSession} retries per session, median session ${human.medianSessionLength} messages.`,
+      fix: human.toolErrorRate > 15
+        ? 'Your tool error rate is high. Sessions are fighting the environment more than producing output.'
+        : null,
+    });
+  }
+  // 6. Agent pipeline grade (only if agent sessions exist).
   if (agent.sessions) {
     let aScore = 75;
     if (agent.sessions > 50) aScore += 8;
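
To see how the new "Output signals" rubric behaves, the thresholds from the hunk above can be wrapped in a standalone function. This is a sketch only: `outputSignalsScore` is a hypothetical name, and the input objects are made-up values, not real audit results.

```javascript
// Hypothetical wrapper around the thresholds shown in the diff.
function outputSignalsScore({ toolErrorRate, retriesPerSession, medianSessionLength }) {
  let score = 80;

  // Tool error rate: lower means cleaner runs.
  if (toolErrorRate > 15) score -= 18;
  else if (toolErrorRate > 8) score -= 10;
  else if (toolErrorRate > 4) score -= 4;

  // Retries per session: lower means more first-shot success.
  if (retriesPerSession > 6) score -= 12;
  else if (retriesPerSession > 3) score -= 6;

  // Median session length: very long sessions are penalized, 2-20 gets a bonus.
  if (medianSessionLength > 100) score -= 15;
  else if (medianSessionLength > 50) score -= 6;
  else if (medianSessionLength >= 2 && medianSessionLength <= 20) score += 4;

  return Math.max(0, Math.min(100, score));
}

console.log(outputSignalsScore({ toolErrorRate: 4.2, retriesPerSession: 1, medianSessionLength: 8 }));  // 80
console.log(outputSignalsScore({ toolErrorRate: 0, retriesPerSession: 0, medianSessionLength: 10 }));   // 84
console.log(outputSignalsScore({ toolErrorRate: 20, retriesPerSession: 7, medianSessionLength: 120 })); // 35
```

With these thresholds the score appears to range from 35 to 84: the base is 80, the only bonus is +4, and the three largest penalties sum to 45. The clamp to 0-100 therefore never triggers under the current rubric, and the `fix` message only appears once the tool error rate exceeds 15%.
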
@@ -604,6 +675,11 @@ if (hasFlag('--json')) {
       },
       output_tokens: stats.human.outputTokens,
       input_tokens: stats.human.inputTokens,
+      tool_error_rate_pct: stats.human.toolErrorRate,
+      tool_errors_total: stats.human.toolErrorsTotal,
+      retries_total: stats.human.retriesTotal,
+      retries_per_session: stats.human.retriesPerSession,
+      median_session_length: stats.human.medianSessionLength,
     } : null,
     agent: stats.agent ? {
       sessions: stats.agent.sessions,

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@uxcontinuum/ccaudit",
-  "version": "1.0.3",
+  "version": "1.1.0",
   "description": "A diagnostic for your Claude Code setup. Reads ~/.claude/ locally, grades you across hook coverage, project hygiene, tool balance, prompt tells, and pipeline ops. Zero install: npx @uxcontinuum/ccaudit",

   "main": "index.js",
   "bin": {