npm - claudex-setup - Versions diffs - 1.16.0 → 1.16.2 - Mend

claudex-setup 1.16.0 → 1.16.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/CHANGELOG.md +21 -0
package/README.md +52 -23
package/bin/cli.js +92 -5
package/content/launch-posts.md +159 -92
package/package.json +2 -2
package/src/activity.js +195 -1
package/src/analyze.js +11 -6
package/src/audit.js +49 -14
package/src/context.js +106 -0
package/src/deep-review.js +95 -68
package/src/domain-packs.js +13 -4
package/src/index.js +4 -0
package/src/mcp-packs.js +16 -0
package/src/secret-patterns.js +30 -0
package/src/techniques.js +4 -2
package/src/watch.js +170 -42

package/content/launch-posts.md CHANGED

@@ -1,159 +1,226 @@
-# Launch Posts — Ready to Publish
+# Launch Posts — Proof-Backed Distribution Assets
+**Status:** Complete — every asset below is anchored in measured proof, a canonical artifact, or a verified runtime surface
+**Date:** 2026-04-03
+## Shared Proof Anchors
+Use these links as the canonical sources behind public claims:
+- Proof artifact index: https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
+- CLAUDEX self-dogfood trace: https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/claudex-self-dogfood-proof-trace-2026-04-03.md
+- VTCLE case study: https://github.com/DnaFin/claudex/blob/main/research/case-study-vtcle-2026-04-03.md
+- Social case study: https://github.com/DnaFin/claudex/blob/main/research/case-study-social-2026-04-03.md
+- Polymiro case study: https://github.com/DnaFin/claudex/blob/main/research/case-study-polymiro-2026-04-03.md
+- Public proof metrics source: https://github.com/DnaFin/claudex/blob/main/research/claudex-proof-metrics-source-2026-04-03.md
+Measured-result boundary to preserve:
+- before/after scores were measured with `claudex-setup@1.10.3` on `2026-04-03`
+- current npm latest is `1.16.1`
+- current product surface is `85 checks`
 ## Post 1: Reddit r/ClaudeAI
-**Title:** I built a tool that audits your project for Claude Code optimization — scores you 0-100
+**Title:** I built a CLI that audits your Claude Code setup — 85 checks, measured on 4 real repos
 **Body:**
-After cataloging 1,107 Claude Code entries and verifying 948 of them with evidence, I built a CLI that checks if your project is actually set up to get the most out of Claude Code. It runs 84 checks across CLAUDE.md, hooks, commands, agents, diagrams, and more.
+I built a zero-dependency CLI that audits how well a repo is set up for Claude Code.
-I tested it on 4 real repos. A FastAPI marketing engine went from 46 to 64. A React Native social app went from 40 to 48. A Polymiro project jumped from 35 to 48. The CLAUDEX catalog repo itself: 62 to 90.
+It checks `85` things across `CLAUDE.md`, hooks, commands, agents, skills, MCP config, permissions, diagrams, and verification loops.
-```
+Measured on `2026-04-03` with `claudex-setup@1.10.3`:
+- CLAUDEX: `62 -> 90`
+- VTCLE: `46 -> 64`
+- Social: `40 -> 48`
+- Polymiro: `35 -> 48`
+```bash
 npx claudex-setup
 ```
-Then `npx claudex-setup setup` auto-creates everything that's missing, tailored to your stack (React, Python, TypeScript, etc).
+It starts trust-first:
+- audit first
+- plan / suggest-only before writes
+- apply only what you approve
+- rollback artifacts for every applied batch
-Zero dependencies. No API keys. Runs entirely local.
+Zero dependencies. No API keys. Runs local.
 GitHub: https://github.com/DnaFin/claudex-setup
-Would love feedback!
+Proof and case studies:
+- https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
+- https://github.com/DnaFin/claudex/blob/main/research/case-study-vtcle-2026-04-03.md
+- https://github.com/DnaFin/claudex/blob/main/research/case-study-social-2026-04-03.md
+- https://github.com/DnaFin/claudex/blob/main/research/case-study-polymiro-2026-04-03.md
+Would love feedback on what checks or rollout surfaces are still missing.
+**Evidence anchor:** proof artifact index + 3 external case studies + current proof source
 ---
 ## Post 2: Reddit r/ChatGPTCoding
-**Title:** Your Claude Code project is probably running at 10% efficiency. Here's how to check.
+**Title:** Most Claude Code repos are missing the safety layer, not the model
 **Body:**
-I spent weeks cataloging every Claude Code feature, technique, and best practice — 1,107 total, 948 verified with real evidence.
-Turns out most projects are missing basic stuff that makes a huge difference:
-- No CLAUDE.md (Claude doesn't know your project conventions)
-- No hooks (no auto-lint, no auto-test)
-- No custom commands (repeating the same prompts manually)
-- No Mermaid diagrams (wasting 73% more tokens on prose descriptions)
-I tested 4 real repos with 84 checks. Before optimization: scores ranged from 35 to 62. After: 48 to 90. A VTCLE FastAPI project jumped from 46→64 just from adding missing hooks and commands.
-```
+The interesting problem with Claude Code is not "can it write code?".
+It's "is the repo actually set up so Claude can work safely and predictably?".
+I built `claudex-setup` to audit that surface:
+- `85` checks
+- zero dependencies
+- local-only by default
+- trust-first flow: audit -> plan -> apply -> rollback
+Measured on 4 real repos:
+- FastAPI repo: `46 -> 64`
+- React Native repo: `40 -> 48`
+- Python/Docker repo: `35 -> 48`
+- research engine repo: `62 -> 90`
+```bash
 npx claudex-setup
 ```
-Scores your project 0-100, tells you exactly what to fix, and can auto-apply everything.
+The most common misses were not exotic:
+- no deny rules
+- no secrets protection
+- no mermaid architecture
+- no hooks registered in settings
+Proof:
+https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
-Free, open source, zero dependencies: https://github.com/DnaFin/claudex-setup
+**Evidence anchor:** measured before/after traces + common gap summary from public proof set
 ---
 ## Post 3: Dev.to Article
-**Title:** 1,107 Claude Code Entries: What I Learned Building the Most Comprehensive Catalog
+**Title:** What 4 Real Repos Taught Me About Claude Code Readiness
 **Body (excerpt):**
-I set out to catalog every single Claude Code capability, technique, and best practice. After repeated research cycles, I have 1,107 entries — 948 verified with real evidence. I packaged this into an 84-check CLI audit and tested it on 4 real projects — scores before optimization ranged from 35 to 62, and after: 48 to 90.
+I tested `claudex-setup` on 4 real repos and the pattern was clear:
-Here are the top 10 things most developers are missing:
+- the best teams still miss permission deny rules
+- mature repos often have hooks in files but not actually registered
+- non-standard settings formats are a real adoption trap
+- shared `settings.json` matters more than personal local overrides
-1. **CLAUDE.md** — Claude reads this at the start of every session. Without it, Claude doesn't know your build commands, code style, or project rules.
+Measured on `2026-04-03` with `claudex-setup@1.10.3`:
+- CLAUDEX: `62 -> 90`
+- VTCLE: `46 -> 64`
+- Social: `40 -> 48`
+- Polymiro: `35 -> 48`
-2. **Mermaid diagrams** — A Mermaid architecture diagram saves 73% tokens compared to describing your project in prose.
+The product today is strongest as:
-3. **Hooks** — Auto-lint after every edit. Auto-test before every commit. Hooks fire 100% of the time, CLAUDE.md rules fire ~80%.
+`audit -> plan -> safe apply -> governance -> benchmark`
-4. **Custom commands** — `/test`, `/deploy`, `/review` — package your repeated workflows.
+Not a code generator. Not an MCP installer. A trust layer for Claude Code repos.
-5. **Verification loops** — Tell Claude how to verify its own work. Include test commands in CLAUDE.md.
+Proof packet:
+https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
-6. **Path-specific rules** — Different conventions for frontend vs backend files.
-7. **XML tags** — `<constraints>`, `<validation>` in CLAUDE.md = unambiguous instructions.
-8. **Custom agents** — Security reviewer, test writer — specialized subagents for focused tasks.
-9. **Skills** — Domain-specific workflows that load on demand, not every session.
-10. **MCP servers** — Connect Claude to your database, ticket system, Slack.
-I packaged this into a CLI that checks your project:
-```
-npx claudex-setup
-```
-Full catalog: https://github.com/DnaFin/claudex-setup
+**Evidence anchor:** proof artifact index + case-study docs + current proof source
 ---
 ## Post 4: Twitter/X Thread
 **Tweet 1:**
-I cataloged 1,107 Claude Code entries and verified 948 of them with evidence.
-Most projects use less than 5% of what Claude Code can do.
+I built a zero-dependency CLI that audits Claude Code readiness across `85` checks.
-Tested on 4 real repos — a React Native app scored 40, a FastAPI engine scored 46. After auto-setup: 48 and 64. Here's the free 84-check audit:
+Measured on 4 real repos:
+- `62 -> 90`
+- `46 -> 64`
+- `40 -> 48`
+- `35 -> 48`
-npx claudex-setup
+`npx claudex-setup`
-Thread 🧵👇
+Proof: github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
 **Tweet 2:**
-The #1 thing you're probably missing: CLAUDE.md
-It's a file Claude reads at the start of every session. Without it, Claude doesn't know your:
-- Build commands
-- Code style
-- Testing framework
-- Project architecture
+The most common misses were boring and important:
+- no deny rules
+- no secrets protection
+- no mermaid diagram
+- no hooks registered in settings
-Takes 2 minutes to create. Impact: massive.
+It is much more "trust layer" than "AI magic".
 **Tweet 3:**
-#2: Mermaid diagrams in CLAUDE.md
-A few hundred tokens of Mermaid syntax conveys what takes thousands of tokens in prose.
-73% token savings = faster responses, lower cost, better context.
+What it does well today:
+- audit first
+- suggest / plan before writes
+- apply selectively
+- emit rollback artifacts
+- benchmark on isolated copy
 **Tweet 4:**
-#3: Hooks > CLAUDE.md rules
-CLAUDE.md instructions = ~80% compliance
-Hooks = 100% enforcement
+Best result so far:
+- CLAUDEX self-dogfood: `62 -> 90`
-Auto-lint after edits. Block commits without tests. Prevent force-push.
+Best external proof:
+- VTCLE: `46 -> 64`
-Hooks are deterministic. Instructions are advisory.
+Case studies:
+- github.com/DnaFin/claudex/blob/main/research/case-study-vtcle-2026-04-03.md
+- github.com/DnaFin/claudex/blob/main/research/case-study-social-2026-04-03.md
+- github.com/DnaFin/claudex/blob/main/research/case-study-polymiro-2026-04-03.md
 **Tweet 5:**
-Want to check your project in 10 seconds?
+Measured results were captured on `claudex-setup@1.10.3` on `2026-04-03`.
+Current npm latest is `1.16.1`, so exact scores can move slightly, but the proof packet is explicit about that boundary.
-npx claudex-setup
-Scores 0-100. Shows what's missing. Auto-fixes with `setup`.
-Free. Open source. Zero dependencies.
-https://github.com/DnaFin/claudex-setup
+**Evidence anchor:** proof artifact index + per-repo traces
 ---
 ## Post 5: Hacker News (Show HN)
-**Title:** Show HN: claudex-setup – Audit any project for Claude Code optimization (1,107 entries)
+**Title:** Show HN: claudex-setup — audit Claude Code readiness with 85 checks
 **Body:**
-I built a CLI tool that scores your project against Claude Code best practices.
-After researching 1,107 entries (948 verified with evidence), I tested 4 real repos with 84 checks. Scores before optimization: 35–62. After auto-setup: 48–90. The biggest jump was a research catalog repo going from 62 to 90.
-npx claudex-setup → audit (84 checks, 0-100 score)
-npx claudex-setup setup → auto-fix
-Detects your stack (React, Python, TS, Rust, Go, etc) and tailors recommendations.
-Zero dependencies, no API keys, runs locally.
-https://github.com/DnaFin/claudex-setup
+I built a CLI that audits how well a repo is set up for Claude Code.
+This is not a code-quality linter and not an MCP installer.
+It focuses on Claude workflow quality:
+- `CLAUDE.md`
+- hooks
+- commands
+- agents
+- skills
+- MCP config
+- permissions / deny rules
+- diagrams
+- verification loops
+Core workflow:
+- `npx claudex-setup`
+- `npx claudex-setup suggest-only`
+- `npx claudex-setup plan`
+- `npx claudex-setup apply`
+- `npx claudex-setup benchmark`
+Measured on 4 real repos on `2026-04-03` with `claudex-setup@1.10.3`:
+- CLAUDEX: `62 -> 90`
+- VTCLE: `46 -> 64`
+- Social: `40 -> 48`
+- Polymiro: `35 -> 48`
+Trust decisions that mattered:
+- zero dependencies
+- audit before write
+- rollback artifacts
+- cross-platform Node hooks
+- explicit proof packets instead of vague claims
+Proof packet:
+https://github.com/DnaFin/claudex/blob/main/research/proof-artifacts/README.md
+**Evidence anchor:** proof artifact index + current npm proof source

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "claudex-setup",
-  "version": "1.16.0",
-  "description": "Score your repo's Claude Code setup against 84 checks. See gaps, apply fixes selectively with rollback, govern hooks and permissions, and benchmark impact — without breaking existing config.",
+  "version": "1.16.2",
+  "description": "Score your repo's Claude Code setup against 85 checks. See gaps, apply fixes selectively with rollback, govern hooks and permissions, and benchmark impact — without breaking existing config.",
   "main": "src/index.js",
   "bin": {
     "claudex-setup": "bin/cli.js"

package/src/activity.js CHANGED

@@ -21,10 +21,12 @@ function ensureArtifactDirs(dir) {
   const activityDir = path.join(root, 'activity');
   const rollbackDir = path.join(root, 'rollbacks');
   const snapshotDir = path.join(root, 'snapshots');
+  const outcomesDir = path.join(root, 'outcomes');
   fs.mkdirSync(activityDir, { recursive: true });
   fs.mkdirSync(rollbackDir, { recursive: true });
   fs.mkdirSync(snapshotDir, { recursive: true });
-  return { root, activityDir, rollbackDir, snapshotDir };
+  fs.mkdirSync(outcomesDir, { recursive: true });
+  return { root, activityDir, rollbackDir, snapshotDir, outcomesDir };
 }
 function writeJson(filePath, payload) {
@@ -322,6 +324,192 @@ function exportTrendReport(dir) {
   return lines.join('\n');
 }
+function readOutcomeIndex(dir) {
+  const indexPath = path.join(dir, '.claude', 'claudex-setup', 'outcomes', 'index.json');
+  if (!fs.existsSync(indexPath)) return [];
+  try {
+    const entries = JSON.parse(fs.readFileSync(indexPath, 'utf8'));
+    return Array.isArray(entries) ? entries : [];
+  } catch {
+    return [];
+  }
+}
+function updateOutcomeIndex(outcomesDir, record) {
+  const indexPath = path.join(outcomesDir, 'index.json');
+  let entries = [];
+  if (fs.existsSync(indexPath)) {
+    try {
+      entries = JSON.parse(fs.readFileSync(indexPath, 'utf8'));
+      if (!Array.isArray(entries)) entries = [];
+    } catch {
+      entries = [];
+    }
+  }
+  entries.push(record);
+  const MAX_INDEX_ENTRIES = 500;
+  if (entries.length > MAX_INDEX_ENTRIES) {
+    entries = entries.slice(entries.length - MAX_INDEX_ENTRIES);
+  }
+  fs.writeFileSync(indexPath, JSON.stringify(entries, null, 2), 'utf8');
+}
+function normalizeOutcomeStatus(value) {
+  const normalized = `${value || ''}`.trim().toLowerCase();
+  if (!['accepted', 'rejected', 'deferred'].includes(normalized)) {
+    throw new Error('feedback status must be one of: accepted, rejected, deferred');
+  }
+  return normalized;
+}
+function normalizeOutcomeEffect(value) {
+  const normalized = `${value || ''}`.trim().toLowerCase();
+  if (!['positive', 'neutral', 'negative'].includes(normalized)) {
+    throw new Error('feedback effect must be one of: positive, neutral, negative');
+  }
+  return normalized;
+}
+function recordRecommendationOutcome(dir, payload) {
+  const key = `${payload.key || ''}`.trim();
+  if (!key) {
+    throw new Error('feedback requires a recommendation key');
+  }
+  const status = normalizeOutcomeStatus(payload.status);
+  const effect = normalizeOutcomeEffect(payload.effect || 'neutral');
+  const scoreDelta = Number.isFinite(payload.scoreDelta) ? payload.scoreDelta : (
+    payload.scoreDelta === null || payload.scoreDelta === undefined || payload.scoreDelta === ''
+      ? null
+      : Number(payload.scoreDelta)
+  );
+  if (scoreDelta !== null && !Number.isFinite(scoreDelta)) {
+    throw new Error('feedback scoreDelta must be a number when provided');
+  }
+  const id = timestampId();
+  const { outcomesDir } = ensureArtifactDirs(dir);
+  const filePath = path.join(outcomesDir, `${id}.json`);
+  const record = {
+    id,
+    createdAt: new Date().toISOString(),
+    key,
+    status,
+    effect,
+    source: `${payload.source || 'manual-cli'}`.trim() || 'manual-cli',
+    notes: `${payload.notes || ''}`.trim(),
+    scoreDelta,
+  };
+  writeJson(filePath, record);
+  updateOutcomeIndex(outcomesDir, {
+    ...record,
+    relativePath: path.relative(dir, filePath),
+  });
+  return {
+    id,
+    filePath,
+    relativePath: path.relative(dir, filePath),
+    record,
+  };
+}
+function summarizeOutcomeEntries(entries = []) {
+  const byKey = {};
+  for (const entry of entries) {
+    if (!entry || !entry.key) continue;
+    const bucket = byKey[entry.key] || {
+      key: entry.key,
+      total: 0,
+      accepted: 0,
+      rejected: 0,
+      deferred: 0,
+      positive: 0,
+      neutral: 0,
+      negative: 0,
+      scoreDeltaTotal: 0,
+      scoreDeltaCount: 0,
+      latestAt: null,
+    };
+    bucket.total += 1;
+    if (bucket[entry.status] !== undefined) bucket[entry.status] += 1;
+    if (bucket[entry.effect] !== undefined) bucket[entry.effect] += 1;
+    if (Number.isFinite(entry.scoreDelta)) {
+      bucket.scoreDeltaTotal += entry.scoreDelta;
+      bucket.scoreDeltaCount += 1;
+    }
+    if (!bucket.latestAt || new Date(entry.createdAt) > new Date(bucket.latestAt)) {
+      bucket.latestAt = entry.createdAt;
+    }
+    byKey[entry.key] = bucket;
+  }
+  for (const bucket of Object.values(byKey)) {
+    bucket.avgScoreDelta = bucket.scoreDeltaCount > 0
+      ? Number((bucket.scoreDeltaTotal / bucket.scoreDeltaCount).toFixed(2))
+      : null;
+    bucket.evidenceClass = bucket.total > 0 ? 'measured' : 'estimated';
+  }
+  return {
+    totalEntries: entries.length,
+    byKey,
+    keys: Object.keys(byKey).sort(),
+  };
+}
+function getRecommendationOutcomeSummary(dir) {
+  return summarizeOutcomeEntries(readOutcomeIndex(dir));
+}
+function getRecommendationAdjustment(summaryByKey, key) {
+  const bucket = summaryByKey && summaryByKey[key];
+  if (!bucket) return 0;
+  let adjustment = 0;
+  adjustment += bucket.accepted * 2;
+  adjustment += bucket.positive * 3;
+  adjustment -= bucket.rejected * 3;
+  adjustment -= bucket.negative * 4;
+  if (Number.isFinite(bucket.avgScoreDelta)) {
+    if (bucket.avgScoreDelta > 0) adjustment += Math.min(4, Math.round(bucket.avgScoreDelta / 4));
+    if (bucket.avgScoreDelta < 0) adjustment -= Math.min(4, Math.round(Math.abs(bucket.avgScoreDelta) / 4));
+  }
+  if (adjustment > 8) return 8;
+  if (adjustment < -8) return -8;
+  return adjustment;
+}
+function formatRecommendationOutcomeSummary(dir) {
+  const summary = getRecommendationOutcomeSummary(dir);
+  if (summary.totalEntries === 0) {
+    return 'No recommendation outcomes recorded yet. Use `npx claudex-setup feedback --key permissionDeny --status accepted --effect positive` after a real run.';
+  }
+  const lines = [
+    'Recommendation outcome summary:',
+    '',
+  ];
+  for (const key of summary.keys) {
+    const bucket = summary.byKey[key];
+    const avg = Number.isFinite(bucket.avgScoreDelta) ? ` | avg score delta ${bucket.avgScoreDelta >= 0 ? '+' : ''}${bucket.avgScoreDelta}` : '';
+    const adjustment = getRecommendationAdjustment(summary.byKey, key);
+    lines.push(`  ${key}: total ${bucket.total} | accepted ${bucket.accepted} | rejected ${bucket.rejected} | deferred ${bucket.deferred} | positive ${bucket.positive} | negative ${bucket.negative}${avg} | ranking ${adjustment >= 0 ? '+' : ''}${adjustment}`);
+  }
+  return lines.join('\n');
+}
 module.exports = {
   ensureArtifactDirs,
   writeActivityArtifact,
@@ -332,4 +520,10 @@ module.exports = {
   compareLatest,
   formatHistory,
   exportTrendReport,
+  readOutcomeIndex,
+  recordRecommendationOutcome,
+  summarizeOutcomeEntries,
+  getRecommendationOutcomeSummary,
+  getRecommendationAdjustment,
+  formatRecommendationOutcomeSummary,
 };
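
To make the ranking arithmetic in the new `getRecommendationAdjustment` easier to follow, here is a minimal worked sketch. The function body is copied from the added lines above. The bucket values are hypothetical, and the keys `noisyCheck` and `freshCheck` are invented for illustration (only `permissionDeny` appears in the diff). The buckets are assumed to have the shape produced by `summarizeOutcomeEntries`.

```javascript
// Copied from the added lines in package/src/activity.js.
function getRecommendationAdjustment(summaryByKey, key) {
  const bucket = summaryByKey && summaryByKey[key];
  if (!bucket) return 0;
  let adjustment = 0;
  adjustment += bucket.accepted * 2;
  adjustment += bucket.positive * 3;
  adjustment -= bucket.rejected * 3;
  adjustment -= bucket.negative * 4;
  if (Number.isFinite(bucket.avgScoreDelta)) {
    if (bucket.avgScoreDelta > 0) adjustment += Math.min(4, Math.round(bucket.avgScoreDelta / 4));
    if (bucket.avgScoreDelta < 0) adjustment -= Math.min(4, Math.round(Math.abs(bucket.avgScoreDelta) / 4));
  }
  if (adjustment > 8) return 8;
  if (adjustment < -8) return -8;
  return adjustment;
}

// Hypothetical summary buckets (values invented for this example).
const summaryByKey = {
  permissionDeny: { accepted: 2, positive: 1, rejected: 0, negative: 0, avgScoreDelta: 6 },
  noisyCheck: { accepted: 0, positive: 0, rejected: 1, negative: 1, avgScoreDelta: -3 },
  freshCheck: { accepted: 1, positive: 0, rejected: 0, negative: 0, avgScoreDelta: null },
};

// 2*2 + 1*3 = 7, plus min(4, round(6/4)) = 2, gives 9, which is capped at 8.
const boosted = getRecommendationAdjustment(summaryByKey, 'permissionDeny');
// -(1*3) - (1*4) = -7, minus min(4, round(3/4)) = 1, gives -8 (exactly at the floor).
const demoted = getRecommendationAdjustment(summaryByKey, 'noisyCheck');
// A null average is skipped by Number.isFinite, so only the accepted count applies: 2.
const mild = getRecommendationAdjustment(summaryByKey, 'freshCheck');
// Keys with no recorded outcomes get no adjustment.
const unknown = getRecommendationAdjustment(summaryByKey, 'neverSeen');

console.log({ boosted, demoted, mild, unknown });
```

Because the result is clamped to the range -8 to +8, feedback can nudge a recommendation's rank but the adjustment itself stays bounded no matter how many outcomes accumulate.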

package/src/analyze.js CHANGED

@@ -244,12 +244,15 @@ function toGaps(results) {
 }
 function toRecommendations(auditResult) {
-  const failed = auditResult.results
-    .filter(r => r.passed === false)
-    .sort((a, b) => {
-      const order = { critical: 3, high: 2, medium: 1, low: 0 };
-      return (order[b.impact] || 0) - (order[a.impact] || 0);
-    });
+  const failed = auditResult.results.filter(r => r.passed === false);
+  const topActionOrder = new Map((auditResult.topNextActions || []).map((item, index) => [item.key, index]));
+  failed.sort((a, b) => {
+    const rankedA = topActionOrder.has(a.key) ? topActionOrder.get(a.key) : Number.MAX_SAFE_INTEGER;
+    const rankedB = topActionOrder.has(b.key) ? topActionOrder.get(b.key) : Number.MAX_SAFE_INTEGER;
+    if (rankedA !== rankedB) return rankedA - rankedB;
+    const order = { critical: 3, high: 2, medium: 1, low: 0 };
+    return (order[b.impact] || 0) - (order[a.impact] || 0);
+  });
   return failed.slice(0, 10).map((r, index) => ({
     priority: index + 1,
@@ -259,6 +262,8 @@ function toRecommendations(auditResult) {
     module: moduleFromCategory(r.category),
     risk: riskFromImpact(r.impact),
     why: r.fix,
+    evidenceClass: (auditResult.topNextActions || []).find(item => item.key === r.key)?.evidenceClass || 'estimated',
+    rankingAdjustment: (auditResult.topNextActions || []).find(item => item.key === r.key)?.rankingAdjustment || 0,
   }));
 }
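
The reordering introduced in `toRecommendations` can be sketched in isolation as follows. This is an illustration under stated assumptions, not the package's code: `sortFailedChecks` is a hypothetical helper name, the sample checks are invented, and the sketch sorts a copy whereas the diff sorts `failed` in place. The comparator itself mirrors the added lines: checks listed in `topNextActions` come first in that list's order, and everything else falls back to impact order.

```javascript
// Hypothetical helper that isolates the comparator from the new toRecommendations().
function sortFailedChecks(failed, topNextActions = []) {
  // Map each top-action key to its position in the list.
  const topActionOrder = new Map(topNextActions.map((item, index) => [item.key, index]));
  const order = { critical: 3, high: 2, medium: 1, low: 0 };
  // Sort a copy so the caller's array is left untouched (the diff sorts in place).
  return [...failed].sort((a, b) => {
    const rankedA = topActionOrder.has(a.key) ? topActionOrder.get(a.key) : Number.MAX_SAFE_INTEGER;
    const rankedB = topActionOrder.has(b.key) ? topActionOrder.get(b.key) : Number.MAX_SAFE_INTEGER;
    if (rankedA !== rankedB) return rankedA - rankedB;
    // Neither check is ranked (or both share a rank): higher impact first.
    return (order[b.impact] || 0) - (order[a.impact] || 0);
  });
}

// Invented sample data: four failed checks, two of which are top next actions.
const failed = [
  { key: 'a', impact: 'low' },
  { key: 'b', impact: 'critical' },
  { key: 'c', impact: 'high' },
  { key: 'd', impact: 'medium' },
];
const topNextActions = [{ key: 'd' }, { key: 'a' }];

const sortedKeys = sortFailedChecks(failed, topNextActions).map(r => r.key);
console.log(sortedKeys); // [ 'd', 'a', 'b', 'c' ]
```

Note that a low-impact check listed in `topNextActions` now outranks an unlisted critical one, which is the behavioral change relative to the removed impact-only sort.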