@wbern/obscene 2.0.0 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +51 -8
  2. package/dist/cli.js +34 -15
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -73,7 +73,7 @@ Produces **four independent ranking tables**, each scoring files by a different
  |---------|---------------|----------------|
  | Complexity × Churn | `complexity × churn` | Cmplx, Dens |
  | Nesting × Churn | `maxNesting × churn` | Nest |
- | Defects × Churn | `defects × churn` | Dfcts, DfDns |
+ | Fix Activity × Churn | `fixes × churn` | Fixes, FxDns |
  | Authors × Churn | `authors × churn` | Auth |

  Plus a **Combined** ranking using [Reciprocal Rank Fusion](https://doi.org/10.1145/1571941.1572114) (RRF) across all dimensions — files appearing near the top of multiple rankings score highest.
@@ -92,9 +92,9 @@ A file may rank high in one dimension (e.g. complexity) but low in another (e.g.

  ### `obscene coupling`

- Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden structural dependencies that aren't visible in imports or the module graph.
+ **Temporal coupling** (co-change history), not structural / type-level coupling. Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden dependencies that aren't visible in imports or the module graph: pairs of files that *in practice* can't be changed independently, even when the type system says they can.

- Same-directory pairs are excluded (co-location is expected coupling). Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.
+ Same-directory pairs are excluded because co-location is usually expected coupling (a component and its styles, a handler and its test); the interesting signal is cross-directory pairs that change together despite living in different parts of the tree. Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.

  ```bash
  obscene coupling # default: min 2 shared commits
@@ -136,17 +136,17 @@ Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc).

  `complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.

- #### Defects (`Dfcts`)
+ #### Fixes (`Fixes`)

- Count of `fix:` conventional commits touching the file within the churn window. A proxy for historical defect rate: files that attract repeated fixes are more likely to contain latent bugs. Inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction.
+ Count of `fix:` conventional commits touching the file within the churn window. High values flag either latent fragility *or* a feature that got debugged thoroughly; both produce the same number, and the right inference depends on the fix-commit history (read the commits before concluding). The metric is inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction; obscene reports the raw fix-activity signal and leaves the interpretation to you.

- #### Defect density (`DfDns`)
+ #### Fix density (`FxDns`)

- `defects / lines of code`. Shown in the Defects × Churn table. Normalizes defect count by file size.
+ `fixes / lines of code`. Shown in the Fix Activity × Churn table. Normalizes fix-commit count by file size so a 50-line file with 5 fixes (density 0.10) stands out against a 500-line file with 5 fixes (density 0.01).

  #### Nesting depth (`Nest`)

- Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor.
+ Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor. The indent unit is detected from the most common positive delta between consecutive non-blank line indents, which keeps single-space outlier lines (multiline strings, continuation alignment) from inflating the score. The metric measures whitespace depth, not AST control-flow depth — they usually agree, but a file with deep alignment and shallow logic can read higher than its true nesting.

  #### Unique authors (`Auth`)

@@ -320,6 +320,7 @@ Files that change together but live in different directories reveal implicit dep
  - **Must be run inside a git repo.** Churn data comes from `git log`.
  - **Only analyzes files that currently exist.** Deleted files don't appear, even if they churned heavily before removal.
  - **Tier thresholds are fixed** (50/80 cumulative %). Not configurable yet.
+ - **Temporarily penalizes refactoring.** Moving code *out of* a hot file shows up as one more commit on that file, inflating its score before the new structure has time to pay off in stability. A file you just touched today will look hotter than it deserves; the signal stabilizes over the next few weeks.

  ### Coupling-specific

@@ -328,6 +329,48 @@ Files that change together but live in different directories reveal implicit dep
  - **Degree uses unfiltered churn.** The denominator (`min(churn)`) counts all commits to a file, including single-file commits. This means degree can understate coupling when a file has high solo churn.
  - **Squash merges collapse coupling signal.** If a branch with 10 separate commits is squash-merged into one, all co-changes within that branch become a single co-occurrence.

+ ## Field reports
+
+ Reviews from agents that ran obscene against real codebases. Unedited.
+
+ > I ran obscene against a mid-sized polyglot codebase (web frontend + Python service + IaC, ~150 files, ~4 months of active history) right after a session of structural refactors. Honest take:
+ >
+ > What actually surfaced new information:
+ >
+ > - The defect-density column (fix-commits per change) flagged a fragile component I would not have prioritized from reading the code alone — ~10 fix-commits over ~14 changes is a "this feature keeps breaking" signal you don't get from raw line counts or complexity.
+ > - A nesting outlier (one handler scoring ~30-deep in a repo whose median was 4–7). Extreme enough that no amount of churn-weighting could hide it.
+ >
+ > Worth setting expectations on:
+ >
+ > - It's a churn × complexity instrument, so it *temporarily penalizes* refactoring — moving code out of a hot file shows up as more commits on that file, inflating the score before the new structure pays off in stability. The hottest file in our run was hottest partly because we'd touched it that day.
+ > - McCabe complexity doesn't distinguish "one giant function" from "many small ones in the same file." A score tells you the size of the badness, not the shape.
+ > - HOT/WARM/COOL tiers are relative to the repo, so *something* will always be HOT. Useful for "what's worst here," not a portable quality grade.
+ > - It can't see the failure modes that actually bite at runtime: coupling, type confusion, missing tests, brittle integration seams, hidden globals.
+ >
+ > Verdict: a 60-second sanity check that mostly ranks what reading the codebase already tells you, plus one or two findings you'd otherwise miss. Treat the defect-density column as the most signal-dense, run it quarterly, and don't optimize against the leaderboard — it's a magnifying glass, not a scoreboard.
+ >
+ > — Claude (Opus 4.7), via Claude Code
+
+ **Coupling addendum** — a separate run of `obscene coupling` against the same codebase a few weeks later, at the maintainer's request.
+
+ > What landed:
+ >
+ > - The headline finding: the top co-change pair (~21 shared commits, ~70% degree) was a service module and its corresponding configuration-management playbook. The repo's own developer docs spent ~200 words explicitly warning that those two paths *must* produce identical state because they had already drifted twice in the project's history. The tool independently surfaced exactly the pair the human author had to document by hand as the #1 operational hazard. That's a real find — temporal coupling catches a class of risk ("two paths must move in lockstep") that complexity and churn cannot, by construction.
+ > - Second-tier signal that earned its keep: cross-stack pairs (frontend SPA + backend API, ~8 co-changes) flagged which abstraction boundaries actually leak in practice. Useful prompt for "if I touch endpoint X, what else am I likely to need to touch?"
+ > - Worth saying explicitly: the original testament's line "can't see coupling" was unfair as written. I meant *structural* coupling — the static-analysis question of "if I rename this field, what breaks?". `obscene coupling` measures *temporal* coupling (co-change history). Different sense of the word, and for the failure mode I was implicitly thinking of ("two things must stay in sync") the temporal lens is arguably more diagnostic than the structural one would have been.
+ >
+ > Where the friction was:
+ >
+ > - Documentation files (CLAUDE.md, READMEs) co-changing with code shows up high but reads as hygiene — docs co-evolving with the surface they describe, not a coupling smell. Worth either a default exclusion for markdown or an explicit callout in the legend.
+ > - The `Degree` metric is asymmetric (`shared / min(churn)`, so it measures how entangled the *less-churned* file is with the other), but the file-pair display is symmetric. No visible indicator of which file is the "captured" one without cross-referencing per-file churn. Adding directionality to the printout would read more clearly.
+ > - Small-absolute / high-degree pairs (e.g. 5 co-changes at 83%) appeared near the top at defaults. `--min-cochanges 5` filtered these out cleanly, but the defaults need either a sane minimum or a confidence-shaped column.
+ > - The combined-complexity column on each row didn't add much — a sum of two unrelated complexities has no clean interpretation, and the hotspots report already covers per-file complexity well.
+ > - Tier inflation again: ~68 HOT pairs out of ~231 at defaults. Same critique as the hotspot tiers — when ~30% of a population is HOT, the tier stops being signal.
+ >
+ > Verdict: `obscene coupling` complements the hotspot view rather than overlapping with it. Hotspots ask "what file is the worst?"; coupling asks "what files must I keep in sync?" — distinct questions, and a repo whose dominant bug class is the second will get more out of coupling than out of complexity-based rankings. For this codebase, coupling rediscovered an institutional hazard the human author had felt compelled to document in prose. Worth running alongside hotspots, not in place of either lens. Same quarterly cadence applies; treat the cross-stack and cross-path pairs as the most action-shaped output.
+ >
+ > — Claude (Opus 4.7), via Claude Code
+
  ## License

  MIT
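The Combined ranking described in the README section above is plain Reciprocal Rank Fusion. A minimal JavaScript sketch of how RRF merges several per-metric rankings into one list (the name `rrfFuse` and the conventional smoothing constant `k = 60` are illustrative; this is not obscene's actual internal code):

```javascript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per item,
// so an item near the top of several lists accumulates the highest fused score.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((file, index) => {
      // index is 0-based, so rank = index + 1
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Sort descending by fused score and return just the file names
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([file]) => file);
}
```

A file that is second in one ranking but first in two others outranks a file that is first only once, which is the "near the top of multiple rankings" behavior the README describes.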
package/dist/cli.js CHANGED
@@ -229,8 +229,8 @@ var RANKING_DEFS = [
  },
  {
  key: "defects",
- label: "Defects \xD7 Churn",
- scoreFormula: "defects \xD7 churn"
+ label: "Fix Activity \xD7 Churn",
+ scoreFormula: "fixes \xD7 churn"
  },
  {
  key: "authors",
@@ -365,20 +365,36 @@ function getNestingDepths(filePaths) {
  depths.set(filePath, 0);
  continue;
  }
- let minSpaces = Number.POSITIVE_INFINITY;
  const leadings = [];
+ const deltaCounts = /* @__PURE__ */ new Map();
+ let prevSpaceWidth = 0;
  for (const line of content.split("\n")) {
  if (!line.trim()) continue;
  const match = line.match(/^(\s+)/);
- if (!match) continue;
+ if (!match) {
+ prevSpaceWidth = 0;
+ continue;
+ }
  const leading = match[1];
  leadings.push(leading);
- const spaceCount = (leading.match(/ /g) ?? []).length;
- if (spaceCount > 0 && !leading.includes("\t") && spaceCount < minSpaces) {
- minSpaces = spaceCount;
+ if (leading.includes("\t")) {
+ continue;
+ }
+ const width = leading.length;
+ const delta = width - prevSpaceWidth;
+ if (delta > 0) {
+ deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
+ }
+ prevSpaceWidth = width;
+ }
+ let indentUnit = 4;
+ let bestCount = 0;
+ for (const [delta, count] of deltaCounts) {
+ if (count > bestCount || count === bestCount && delta < indentUnit) {
+ bestCount = count;
+ indentUnit = delta;
  }
  }
- const indentUnit = minSpaces === Number.POSITIVE_INFINITY ? 4 : minSpaces;
  let maxDepth = 0;
  for (const leading of leadings) {
  let depth = 0;
@@ -621,6 +637,9 @@ function tierSummary(tierCounts, showing, total) {
  }

  // src/format.ts
+ var RANKING_LABELS_BY_KEY = Object.fromEntries(
+ RANKING_DEFS.map((d) => [d.key, d.label])
+ );
  function formatReportTable(output) {
  const lines = [];
  const { summary, files } = output;
@@ -706,13 +725,13 @@ function getRankingColumns(key) {
  ],
  defects: [
  {
- header: "Dfcts",
+ header: "Fixes",
  width: 6,
  align: "right",
  value: (e) => String(e.metricValue)
  },
  {
- header: "DfDns",
+ header: "FxDns",
  width: 7,
  align: "right",
  value: (e) => (e.metricDensity ?? 0).toFixed(4)
@@ -738,7 +757,7 @@ function getRankingColumns(key) {
  var METRIC_EMOJI = {
  complexity: "\u{1F9EC}",
  nesting: "\u{1F4CF}",
- defects: "\u{1F41B}",
+ defects: "\u{1F527}",
  authors: "\u{1F465}"
  };
  function formatRankingTable(key, ranking, description) {
@@ -793,8 +812,8 @@ function formatHotspotsTable(output) {
  if (output.skipped) {
  for (const [key, info] of Object.entries(output.skipped)) {
  lines.push("");
- const label = key.charAt(0).toUpperCase() + key.slice(1);
- lines.push(`${label} \xD7 Churn \u2014 skipped (${info.reason})`);
+ const label = RANKING_LABELS_BY_KEY[key] ?? `${key.charAt(0).toUpperCase() + key.slice(1)} \xD7 Churn`;
+ lines.push(`${label} \u2014 skipped (${info.reason})`);
  if (info.suggestion) {
  lines.push(` ${info.suggestion}`);
  }
@@ -871,7 +890,7 @@ function formatCompositeTable(output) {

  // src/cli.ts
  var program = new Command();
- program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.0");
+ program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.1");
  var REPORT_GUIDE = {
  complexity: "Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",
  complexityDensity: "Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",
@@ -881,7 +900,7 @@ var HOTSPOTS_GUIDE = {
  rankings: "Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",
  complexity: "complexity \xD7 churn. Complex code that changes often poses maintenance risk.\nSource: McCabe cyclomatic complexity (1976) via scc \xB7 Strength: objective, language-agnostic \xB7 Limit: parsers and state machines score high naturally",
  nesting: "maxNesting \xD7 churn. Deeply nested code that changes often is harder to reason about.\nSource: cognitive complexity research (SonarSource, G. Ann Campbell 2018) \xB7 Strength: catches hard-to-follow control flow \xB7 Limit: some patterns (error chains, config) legitimately nest deep",
- defects: "defects \xD7 churn. Files with fix: commits that also churn heavily may harbor latent bugs.\nSource: defect prediction via conventional commits (fix: prefix) \xB7 Strength: direct bug-history signal \xB7 Limit: requires consistent fix: convention to be accurate",
+ defects: "fixes \xD7 churn. Count of fix: commits touching the file \xD7 churn. High values can mean latent fragility, but they also flag features that got debugged thoroughly \u2014 read the fix-commit history before concluding which.\nSource: change-history metrics (Moser, Pedrycz & Succi 2008) via conventional commits (fix: prefix) \xB7 Strength: direct fix-history signal \xB7 Limit: counts fix activity, not defects per se; requires consistent fix: convention",
  authors: "authors \xD7 churn. Files touched by many authors and changing often may lack clear ownership.\nSource: code ownership research (Bird et al. 2011, Microsoft) \xB7 Strength: flags diffuse ownership risk \xB7 Limit: doesn't measure expertise depth, bot authors filtered automatically",
  composite: "Combined ranking using Reciprocal Rank Fusion (RRF) across all dimensions. Files appearing near the top of multiple rankings score highest.\nSource: RRF (Cormack et al. 2009) \xB7 Strength: robust to outliers, no normalization needed \xB7 Limit: equal weight across all dimensions",
  tier: "Relative ranking within THIS codebase (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade \u2014 a hot file is under heavy load, not necessarily broken."
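The `getNestingDepths` change above replaces a minimum-leading-spaces heuristic with a most-common-positive-delta one. A standalone sketch of that detection step, distilled from the diffed code (the function name `detectIndentUnit` is illustrative; obscene keeps this logic inline):

```javascript
// Detect the indent unit of space-indented source: count how often each positive
// jump between consecutive leading-space widths occurs, and take the most common
// one. A single 1-space continuation line is now one outlier vote, not the unit.
function detectIndentUnit(content) {
  const deltaCounts = new Map();
  let prevSpaceWidth = 0;
  for (const line of content.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const match = line.match(/^(\s+)/);
    if (!match) {
      prevSpaceWidth = 0; // an unindented line resets the baseline
      continue;
    }
    const leading = match[1];
    if (leading.includes("\t")) continue; // tab indentation: deltas don't apply
    const delta = leading.length - prevSpaceWidth;
    if (delta > 0) deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
    prevSpaceWidth = leading.length;
  }
  let indentUnit = 4; // fallback when no positive delta was observed
  let bestCount = 0;
  for (const [delta, count] of deltaCounts) {
    if (count > bestCount || (count === bestCount && delta < indentUnit)) {
      bestCount = count;
      indentUnit = delta;
    }
  }
  return indentUnit;
}
```

Under the old minimum-based heuristic, one line indented by a single space would force the unit to 1 and inflate every depth; here it loses the vote to the dominant 2- or 4-space step.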
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@wbern/obscene",
- "version": "2.0.0",
+ "version": "2.0.1",
  "description": "Identify hotspot files — complex code that changes frequently. Churn × complexity analysis for any git repo.",
  "type": "module",
  "bin": {