@wbern/obscene 2.0.0 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +51 -8
  2. package/dist/cli.js +34 -15
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -73,7 +73,7 @@ Produces **four independent ranking tables**, each scoring files by a different
  |---------|---------------|----------------|
  | Complexity × Churn | `complexity × churn` | Cmplx, Dens |
  | Nesting × Churn | `maxNesting × churn` | Nest |
- | Defects × Churn | `defects × churn` | Dfcts, DfDns |
+ | Fix Activity × Churn | `fixes × churn` | Fixes, FxDns |
  | Authors × Churn | `authors × churn` | Auth |

  Plus a **Combined** ranking using [Reciprocal Rank Fusion](https://doi.org/10.1145/1571941.1572114) (RRF) across all dimensions — files appearing near the top of multiple rankings score highest.
@@ -92,9 +92,9 @@ A file may rank high in one dimension (e.g. complexity) but low in another (e.g.

  ### `obscene coupling`

- Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden structural dependencies that aren't visible in imports or the module graph.
+ **Temporal coupling** (co-change history), not structural / type-level coupling. Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden dependencies that aren't visible in imports or the module graph: pairs of files that *in practice* can't be changed independently, even when the type system says they can.

- Same-directory pairs are excluded (co-location is expected coupling). Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.
+ Same-directory pairs are excluded because co-location is usually expected coupling (a component and its styles, a handler and its test); the interesting signal is cross-directory pairs that change together despite living in different parts of the tree. Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.

  ```bash
  obscene coupling # default: min 2 shared commits
@@ -136,17 +136,17 @@ Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc).

  `complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.

- #### Defects (`Dfcts`)
+ #### Fixes (`Fixes`)

- Count of `fix:` conventional commits touching the file within the churn window. A proxy for historical defect rate: files that attract repeated fixes are more likely to contain latent bugs. Inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction.
+ Count of `fix:` conventional commits touching the file within the churn window. High values flag either latent fragility *or* a feature that got debugged thoroughly; both produce the same number, and the right inference depends on the fix-commit history (read the commits before concluding). The metric is inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction; obscene reports the raw fix-activity signal and leaves the interpretation to you.

- #### Defect density (`DfDns`)
+ #### Fix density (`FxDns`)

- `defects / lines of code`. Shown in the Defects × Churn table. Normalizes defect count by file size.
+ `fixes / lines of code`. Shown in the Fix Activity × Churn table. Normalizes fix-commit count by file size so a 50-line file with 5 fixes (density 0.10) stands out against a 500-line file with 5 fixes (density 0.01).

  #### Nesting depth (`Nest`)

- Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor.
+ Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor. The indent unit is detected from the most common positive delta between consecutive non-blank line indents, which keeps single-space outlier lines (multiline strings, continuation alignment) from inflating the score. The metric measures whitespace depth, not AST control-flow depth — they usually agree, but a file with deep alignment and shallow logic can read higher than its true nesting.

  #### Unique authors (`Auth`)

@@ -320,6 +320,7 @@ Files that change together but live in different directories reveal implicit dep
  - **Must be run inside a git repo.** Churn data comes from `git log`.
  - **Only analyzes files that currently exist.** Deleted files don't appear, even if they churned heavily before removal.
  - **Tier thresholds are fixed** (50/80 cumulative %). Not configurable yet.
+ - **Temporarily penalizes refactoring.** Moving code *out of* a hot file shows up as one more commit on that file, inflating its score before the new structure has time to pay off in stability. A file you just touched today will look hotter than it deserves; the signal stabilizes over the next few weeks.

  ### Coupling-specific

@@ -328,6 +329,48 @@ Files that change together but live in different directories reveal implicit dep
  - **Degree uses unfiltered churn.** The denominator (`min(churn)`) counts all commits to a file, including single-file commits. This means degree can understate coupling when a file has high solo churn.
  - **Squash merges collapse coupling signal.** If a branch with 10 separate commits is squash-merged into one, all co-changes within that branch become a single co-occurrence.

+ ## Field reports
+
+ Reviews from agents that ran obscene against real codebases. Unedited.
+
+ > I ran obscene against a mid-sized polyglot codebase (web frontend + Python service + IaC, ~150 files, ~4 months of active history) right after a session of structural refactors. Honest take:
+ >
+ > What actually surfaced new information:
+ >
+ > - The defect-density column (fix-commits per change) flagged a fragile component I would not have prioritized from reading the code alone — ~10 fix-commits over ~14 changes is a "this feature keeps breaking" signal you don't get from raw line counts or complexity.
+ > - A nesting outlier (one handler scoring ~30-deep in a repo whose median was 4–7). Extreme enough that no amount of churn-weighting could hide it.
+ >
+ > Worth setting expectations on:
+ >
+ > - It's a churn × complexity instrument, so it *temporarily penalizes* refactoring — moving code out of a hot file shows up as more commits on that file, inflating the score before the new structure pays off in stability. The hottest file in our run was hottest partly because we'd touched it that day.
+ > - McCabe complexity doesn't distinguish "one giant function" from "many small ones in the same file." A score tells you the size of the badness, not the shape.
+ > - HOT/WARM/COOL tiers are relative to the repo, so *something* will always be HOT. Useful for "what's worst here," not a portable quality grade.
+ > - It can't see the failure modes that actually bite at runtime: coupling, type confusion, missing tests, brittle integration seams, hidden globals.
+ >
+ > Verdict: a 60-second sanity check that mostly ranks what reading the codebase already tells you, plus one or two findings you'd otherwise miss. Treat the defect-density column as the most signal-dense, run it quarterly, and don't optimize against the leaderboard — it's a magnifying glass, not a scoreboard.
+ >
+ > — Claude (Opus 4.7), via Claude Code
+
+ **Coupling addendum** — a separate run of `obscene coupling` against the same codebase a few weeks later, at the maintainer's request.
+
+ > What landed:
+ >
+ > - The headline finding: the top co-change pair (~21 shared commits, ~70% degree) was a service module and its corresponding configuration-management playbook. The repo's own developer docs spent ~200 words explicitly warning that those two paths *must* produce identical state because they had already drifted twice in the project's history. The tool independently surfaced exactly the pair the human author had to document by hand as the #1 operational hazard. That's a real find — temporal coupling catches a class of risk ("two paths must move in lockstep") that complexity and churn cannot, by construction.
+ > - Second-tier signal that earned its keep: cross-stack pairs (frontend SPA + backend API, ~8 co-changes) flagged which abstraction boundaries actually leak in practice. Useful prompt for "if I touch endpoint X, what else am I likely to need to touch?"
+ > - Worth saying explicitly: the original testament's line "can't see coupling" was unfair as written. I meant *structural* coupling — the static-analysis question of "if I rename this field, what breaks?". `obscene coupling` measures *temporal* coupling (co-change history). Different sense of the word, and for the failure mode I was implicitly thinking of ("two things must stay in sync") the temporal lens is arguably more diagnostic than the structural one would have been.
+ >
+ > Where the friction was:
+ >
+ > - Documentation files (CLAUDE.md, READMEs) co-changing with code shows up high but reads as hygiene — docs co-evolving with the surface they describe, not a coupling smell. Worth either a default exclusion for markdown or an explicit callout in the legend.
+ > - The `Degree` metric is asymmetric (`shared / min(churn)`, so it measures how entangled the *less-churned* file is with the other), but the file-pair display is symmetric. No visible indicator of which file is the "captured" one without cross-referencing per-file churn. Adding directionality to the printout would read more clearly.
+ > - Small-absolute / high-degree pairs (e.g. 5 co-changes at 83%) appeared near the top at defaults. `--min-cochanges 5` filtered these out cleanly, but the defaults need either a sane minimum or a confidence-shaped column.
+ > - The combined-complexity column on each row didn't add much — a sum of two unrelated complexities has no clean interpretation, and the hotspots report already covers per-file complexity well.
+ > - Tier inflation again: ~68 HOT pairs out of ~231 at defaults. Same critique as the hotspot tiers — when ~30% of a population is HOT, the tier stops being signal.
+ >
+ > Verdict: `obscene coupling` complements the hotspot view rather than overlapping with it. Hotspots ask "what file is the worst?"; coupling asks "what files must I keep in sync?" — distinct questions, and a repo whose dominant bug class is the second will get more out of coupling than out of complexity-based rankings. For this codebase, coupling rediscovered an institutional hazard the human author had felt compelled to document in prose. Worth running alongside hotspots, not in place of either lens. Same quarterly cadence applies; treat the cross-stack and cross-path pairs as the most action-shaped output.
+ >
+ > — Claude (Opus 4.7), via Claude Code
+
  ## License

  MIT
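The Combined ranking described in the README section above is plain Reciprocal Rank Fusion. A minimal JavaScript sketch of how RRF merges several per-metric rankings into one list (the name `rrfFuse` and the conventional smoothing constant `k = 60` are illustrative; this is not obscene's actual internal code):

```javascript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per item,
// so an item near the top of several lists accumulates the highest fused score.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((file, index) => {
      // index is 0-based, so rank = index + 1
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Sort descending by fused score and return just the file names
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([file]) => file);
}
```

A file that is second in one ranking but first in two others outranks a file that is first only once, which is the "near the top of multiple rankings" behavior the README describes.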
package/dist/cli.js CHANGED
@@ -229,8 +229,8 @@ var RANKING_DEFS = [
  },
  {
  key: "defects",
- label: "Defects \xD7 Churn",
- scoreFormula: "defects \xD7 churn"
+ label: "Fix Activity \xD7 Churn",
+ scoreFormula: "fixes \xD7 churn"
  },
  {
  key: "authors",
@@ -365,20 +365,36 @@ function getNestingDepths(filePaths) {
  depths.set(filePath, 0);
  continue;
  }
- let minSpaces = Number.POSITIVE_INFINITY;
  const leadings = [];
+ const deltaCounts = /* @__PURE__ */ new Map();
+ let prevSpaceWidth = 0;
  for (const line of content.split("\n")) {
  if (!line.trim()) continue;
  const match = line.match(/^(\s+)/);
- if (!match) continue;
+ if (!match) {
+ prevSpaceWidth = 0;
+ continue;
+ }
  const leading = match[1];
  leadings.push(leading);
- const spaceCount = (leading.match(/ /g) ?? []).length;
- if (spaceCount > 0 && !leading.includes("\t") && spaceCount < minSpaces) {
- minSpaces = spaceCount;
+ if (leading.includes("\t")) {
+ continue;
+ }
+ const width = leading.length;
+ const delta = width - prevSpaceWidth;
+ if (delta > 0) {
+ deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
+ }
+ prevSpaceWidth = width;
+ }
+ let indentUnit = 4;
+ let bestCount = 0;
+ for (const [delta, count] of deltaCounts) {
+ if (count > bestCount || count === bestCount && delta < indentUnit) {
+ bestCount = count;
+ indentUnit = delta;
  }
  }
- const indentUnit = minSpaces === Number.POSITIVE_INFINITY ? 4 : minSpaces;
  let maxDepth = 0;
  for (const leading of leadings) {
  let depth = 0;
@@ -621,6 +637,9 @@ function tierSummary(tierCounts, showing, total) {
  }

  // src/format.ts
+ var RANKING_LABELS_BY_KEY = Object.fromEntries(
+ RANKING_DEFS.map((d) => [d.key, d.label])
+ );
  function formatReportTable(output) {
  const lines = [];
  const { summary, files } = output;
@@ -706,13 +725,13 @@ function getRankingColumns(key) {
  ],
  defects: [
  {
- header: "Dfcts",
+ header: "Fixes",
  width: 6,
  align: "right",
  value: (e) => String(e.metricValue)
  },
  {
- header: "DfDns",
+ header: "FxDns",
  width: 7,
  align: "right",
  value: (e) => (e.metricDensity ?? 0).toFixed(4)
@@ -738,7 +757,7 @@ function getRankingColumns(key) {
  var METRIC_EMOJI = {
  complexity: "\u{1F9EC}",
  nesting: "\u{1F4CF}",
- defects: "\u{1F41B}",
+ defects: "\u{1F527}",
  authors: "\u{1F465}"
  };
  function formatRankingTable(key, ranking, description) {
@@ -793,8 +812,8 @@ function formatHotspotsTable(output) {
  if (output.skipped) {
  for (const [key, info] of Object.entries(output.skipped)) {
  lines.push("");
- const label = key.charAt(0).toUpperCase() + key.slice(1);
- lines.push(`${label} \xD7 Churn \u2014 skipped (${info.reason})`);
+ const label = RANKING_LABELS_BY_KEY[key] ?? `${key.charAt(0).toUpperCase() + key.slice(1)} \xD7 Churn`;
+ lines.push(`${label} \u2014 skipped (${info.reason})`);
  if (info.suggestion) {
  lines.push(` ${info.suggestion}`);
  }
@@ -871,7 +890,7 @@ function formatCompositeTable(output) {

  // src/cli.ts
  var program = new Command();
- program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.0");
+ program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.1");
  var REPORT_GUIDE = {
  complexity: "Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",
  complexityDensity: "Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",
@@ -881,7 +900,7 @@ var HOTSPOTS_GUIDE = {
  rankings: "Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",
  complexity: "complexity \xD7 churn. Complex code that changes often poses maintenance risk.\nSource: McCabe cyclomatic complexity (1976) via scc \xB7 Strength: objective, language-agnostic \xB7 Limit: parsers and state machines score high naturally",
  nesting: "maxNesting \xD7 churn. Deeply nested code that changes often is harder to reason about.\nSource: cognitive complexity research (SonarSource, G. Ann Campbell 2018) \xB7 Strength: catches hard-to-follow control flow \xB7 Limit: some patterns (error chains, config) legitimately nest deep",
- defects: "defects \xD7 churn. Files with fix: commits that also churn heavily may harbor latent bugs.\nSource: defect prediction via conventional commits (fix: prefix) \xB7 Strength: direct bug-history signal \xB7 Limit: requires consistent fix: convention to be accurate",
+ defects: "fixes \xD7 churn. Count of fix: commits touching the file \xD7 churn. High values can mean latent fragility, but they also flag features that got debugged thoroughly \u2014 read the fix-commit history before concluding which.\nSource: change-history metrics (Moser, Pedrycz & Succi 2008) via conventional commits (fix: prefix) \xB7 Strength: direct fix-history signal \xB7 Limit: counts fix activity, not defects per se; requires consistent fix: convention",
  authors: "authors \xD7 churn. Files touched by many authors and changing often may lack clear ownership.\nSource: code ownership research (Bird et al. 2011, Microsoft) \xB7 Strength: flags diffuse ownership risk \xB7 Limit: doesn't measure expertise depth, bot authors filtered automatically",
  composite: "Combined ranking using Reciprocal Rank Fusion (RRF) across all dimensions. Files appearing near the top of multiple rankings score highest.\nSource: RRF (Cormack et al. 2009) \xB7 Strength: robust to outliers, no normalization needed \xB7 Limit: equal weight across all dimensions",
  tier: "Relative ranking within THIS codebase (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade \u2014 a hot file is under heavy load, not necessarily broken."
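The `getNestingDepths` change above replaces a minimum-leading-spaces heuristic with a most-common-positive-delta one. A standalone sketch of that detection step, distilled from the diffed code (the function name `detectIndentUnit` is illustrative; obscene keeps this logic inline):

```javascript
// Detect the indent unit of space-indented source: count how often each positive
// jump between consecutive leading-space widths occurs, and take the most common
// one. A single 1-space continuation line is now one outlier vote, not the unit.
function detectIndentUnit(content) {
  const deltaCounts = new Map();
  let prevSpaceWidth = 0;
  for (const line of content.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const match = line.match(/^(\s+)/);
    if (!match) {
      prevSpaceWidth = 0; // an unindented line resets the baseline
      continue;
    }
    const leading = match[1];
    if (leading.includes("\t")) continue; // tab indentation: deltas don't apply
    const delta = leading.length - prevSpaceWidth;
    if (delta > 0) deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
    prevSpaceWidth = leading.length;
  }
  let indentUnit = 4; // fallback when no positive delta was observed
  let bestCount = 0;
  for (const [delta, count] of deltaCounts) {
    if (count > bestCount || (count === bestCount && delta < indentUnit)) {
      bestCount = count;
      indentUnit = delta;
    }
  }
  return indentUnit;
}
```

Under the old minimum-based heuristic, one line indented by a single space would force the unit to 1 and inflate every depth; here it loses the vote to the dominant 2- or 4-space step.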
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@wbern/obscene",
- "version": "2.0.0",
+ "version": "2.0.1",
  "description": "Identify hotspot files — complex code that changes frequently. Churn × complexity analysis for any git repo.",
  "type": "module",
  "bin": {