@wbern/obscene 2.0.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +75 -12
  2. package/dist/cli.js +142 -38
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -73,7 +73,7 @@ Produces **four independent ranking tables**, each scoring files by a different
 |---------|---------------|----------------|
 | Complexity × Churn | `complexity × churn` | Cmplx, Dens |
 | Nesting × Churn | `maxNesting × churn` | Nest |
-| Defects × Churn | `defects × churn` | Dfcts, DfDns |
+| Fix Activity × Churn | `fixes × churn` | Fixes, FxDns |
 | Authors × Churn | `authors × churn` | Auth |
 
 Plus a **Combined** ranking using [Reciprocal Rank Fusion](https://doi.org/10.1145/1571941.1572114) (RRF) across all dimensions — files appearing near the top of multiple rankings score highest.
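A minimal sketch of the fusion step described above, assuming the k = 60 constant from Cormack et al. (2009); obscene's internal constant and function names may differ:

```javascript
// Illustrative sketch of Reciprocal Rank Fusion (not obscene's code).
// Each file earns 1 / (k + rank) from every ranking it appears in, so a
// file near the top of several tables outscores one that tops only one.
// k = 60 is the constant from Cormack et al. (2009).
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    // ranking is an array of file paths, best first; rank is 1-based
    ranking.forEach((file, i) => {
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([file]) => file);
}
```

For example, `rrfFuse([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]])` puts `"b"` first: two first-place finishes beat one.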
@@ -88,13 +88,13 @@ Each table has its own tier assignment by cumulative score distribution:
 
 Tiers are relative to THIS codebase, not absolute quality grades. A "hot" file is under heavy load, not necessarily broken.
 
-A file may rank high in one dimension (e.g. complexity) but low in another (e.g. authors). Rankings with insufficient data are skipped with an explanation (e.g. defects ranking requires 5+ `fix:` commits across 3+ files). Bot authors (`[bot]` suffix) are filtered automatically.
+A file may rank high in one dimension (e.g. complexity) but low in another (e.g. authors). Rankings with insufficient data are skipped with an explanation (e.g. the Fix Activity ranking requires 5+ `fix:` commits across 3+ files). Bot authors (`[bot]` suffix) are filtered automatically.
 
 ### `obscene coupling`
 
-Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden structural dependencies that aren't visible in imports or the module graph.
+**Temporal coupling** (co-change history), not structural / type-level coupling. Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden dependencies that aren't visible in imports or the module graph: pairs of files that *in practice* can't be changed independently, even when the type system says they can.
 
-Same-directory pairs are excluded (co-location is expected coupling). Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.
+Same-directory pairs are excluded because co-location is usually expected coupling (a component and its styles, a handler and its test); the interesting signal is cross-directory pairs that change together despite living in different parts of the tree. Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.
 
 ```bash
 obscene coupling # default: min 2 shared commits
@@ -122,7 +122,7 @@ Per-file complexity without churn. Useful for raw complexity distribution.
 
 #### Score
 
-`metric × churn`. Each ranking table uses a different metric (complexity, nesting, defects, or authors) multiplied by churn. See [Why churn × complexity?](#why-churn-x-complexity) for the research backing this approach.
+`metric × churn`. Each ranking table uses a different metric (complexity, nesting, fix activity, or authors) multiplied by churn. See [Why churn × complexity?](#why-churn-x-complexity) for the research backing this approach.
 
 #### Churn (`Churn`)
 
@@ -130,23 +130,25 @@ Number of commits touching the file within the configured time window (default:
 
 #### Cyclomatic complexity (`Cmplx`)
 
-Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc). Counts independent execution paths (branches, loops, conditions). Higher values mean more paths to test and more places for bugs to hide.
+Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc). Counts independent execution paths (branches, loops, conditions). Higher values mean more paths to test and more places for bugs to hide. The measure was introduced by McCabe (1976) in *A Complexity Measure* and has been a standard structural-complexity metric ever since. — [IEEE TSE](https://doi.org/10.1109/TSE.1976.233837)
 
 #### Complexity density (`Dens`)
 
 `complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.
 
-#### Defects (`Dfcts`)
+#### Fix activity (`Fixes`)
 
-Count of `fix:` conventional commits touching the file within the churn window. A proxy for historical defect rate: files that attract repeated fixes are more likely to contain latent bugs. Inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction.
+Count of `fix:` conventional commits touching the file within the churn window. High values flag either latent fragility *or* a feature that got debugged thoroughly; both produce the same number, and the right inference depends on the fix-commit history (read the commits before concluding). The metric is inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction.
 
-#### Defect density (`DfDns`)
+The literature in [Why churn × complexity?](#why-churn-x-complexity) talks about *defects* — bugs confirmed against a bug-tracker or post-release issue database. obscene doesn't have access to that ground truth, so it uses `fix:` commits as a proxy and reports the raw signal as Fix Activity. The two are related but not identical: a `fix:` commit is direct evidence that someone considered something broken enough to label the change as a fix, but it doesn't distinguish trivial fixes from severe ones, and it relies on the team using conventional commits consistently. Treat Fix Activity as a prompt to read the commits, not as a defect count.
 
-`defects / lines of code`. Shown in the Defects × Churn table. Normalizes defect count by file size.
+#### Fix density (`FxDns`)
+
+`fixes / lines of code`. Shown in the Fix Activity × Churn table. Normalizes fix-commit count by file size so a 50-line file with 5 fixes (density 0.10) stands out against a 500-line file with 5 fixes (density 0.01).
 
 #### Nesting depth (`Nest`)
 
-Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor.
+Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor. The indent unit is detected from the most common positive delta between consecutive non-blank line indents, which keeps single-space outlier lines (multiline strings, continuation alignment) from inflating the score. The metric measures whitespace depth, not AST control-flow depth — they usually agree, but a file with deep alignment and shallow logic can read higher than its true nesting.
 
 #### Unique authors (`Auth`)
 
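The indent-unit detection described under Nesting depth can be sketched as follows; this is a simplified illustration of the logic in `dist/cli.js` (the helper name is ours):

```javascript
// Sketch of indent-unit detection: count positive width deltas between
// consecutive space-indented lines and take the most common one (ties
// broken toward the smaller delta). Simplified from dist/cli.js.
function detectIndentUnit(content) {
  const deltaCounts = new Map();
  let prevWidth = 0;
  for (const line of content.split("\n")) {
    if (!line.trim()) continue;              // blank lines carry no signal
    const match = line.match(/^(\s+)/);
    if (!match) { prevWidth = 0; continue; } // flush-left line resets the baseline
    const leading = match[1];
    if (leading.includes("\t")) continue;    // only measure space-indented lines
    const delta = leading.length - prevWidth;
    if (delta > 0) deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
    prevWidth = leading.length;
  }
  let unit = 4, best = 0;                    // default to 4 when nothing measurable
  for (const [delta, count] of deltaCounts) {
    if (count > best || (count === best && delta < unit)) {
      best = count;
      unit = delta;
    }
  }
  return unit;
}
```

On a two-space-indented snippet the most common positive delta is 2, so a stray deeply-aligned continuation line cannot drag the unit down to 1.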
@@ -176,6 +178,30 @@ Cumulative score distribution bucket:
 | ☀️ **warm** | next 30% (50–80%) | Moderate coupling |
 | 🧊 **cool** | bottom 20% | Low coupling |
 
+#### Pair markers
+
+The coupling table annotates entries that need framing:
+
+| Marker | JSON field | Meaning |
+|--------|------------|---------|
+| `†` next to a path | `file1Deleted` / `file2Deleted` | File is no longer present at HEAD (deleted or renamed away). The coupling signal is historical; the pair is not actionable in the current tree. |
+| `⇄` next to the Degree value | `lockstep` | Both files' total churn equals their co-change count over the window — they only ever changed together. The 100% degree is real but uninformative; treat the pair as a single unit from git's perspective. |
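The Degree and lockstep computations follow directly from the legend (degree = shared / min(churn) × 100, rounded to one decimal; lockstep when each file's total churn equals the co-change count). A sketch, with an illustrative helper name:

```javascript
// Sketch of the pair-marker math from the legend above (helper name is ours).
// degree: shared commits / min(churn) * 100, rounded to one decimal place.
// lockstep: both files only ever changed together over the window.
function pairStats(shared, churn1, churn2) {
  const minChurn = Math.min(churn1, churn2);
  return {
    degree: minChurn > 0 ? Math.round((shared / minChurn) * 1000) / 10 : 0,
    lockstep: shared > 0 && churn1 === shared && churn2 === shared,
  };
}
```

For example, `pairStats(21, 30, 25)` gives degree 84 with `lockstep: false`, while `pairStats(7, 7, 7)` gives degree 100 with `lockstep: true`.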
+
+### Corpus framing
+
+When the analyzed file set has no measurable cyclomatic complexity (every scanned file is non-code or trivial), the `hotspots` table prepends a banner noting that rankings reflect size and churn only. The `corpus` field in JSON output exposes the same signal:
+
+```json
+{
+  "corpus": {
+    "fileCount": 42,
+    "totalComplexity": 0
+  }
+}
+```
+
+`fileCount` counts files *after* exclusion (`.obsignore` and `--exclude` patterns are already applied). Treat HOT/WARM/COOL as relative groupings rather than risk labels when `totalComplexity` is 0.
+
 ## Example output
 
 ```
@@ -262,7 +288,7 @@ obscene init
 
 This creates a `.obsignore` containing:
 - **Universal exclusions** — test files (`*.test.*`, `*.spec.*`, `__tests__/`, etc.), lock files (`package-lock.json`, `pnpm-lock.yaml`, etc.), and package manifests (`package.json`)
-- **Detected project patterns** — CI directories (`.github/`), config files (`*.config.*`), vendored code, etc., based on your project structure
+- **Detected project patterns** — CI directories (`.github/`), config files (`*.config.*`), vendored code, generated agent-command directories (`.claude/commands/**`, `.opencode/commands/**`, `.cursor/rules/**`), etc., based on your project structure
 
 If no `.obsignore` or `.obsceneignore` exists, obscene prints a hint to stderr:
 
@@ -297,6 +323,8 @@ Files that are both complex and frequently modified are disproportionately likel
 
 - **Nagappan & Ball (2005)** studied Windows Server 2003 and found that relative code churn measures predict system defect density with 89% accuracy. — [ICSE 2005](https://doi.org/10.1109/ICSE.2005.1553571)
 - **Moser, Pedrycz & Succi (2008)** compared change metrics against static code attributes on Eclipse and found that process metrics (churn, change frequency) outperform static code metrics for defect prediction. — [ICSE 2008](https://doi.org/10.1145/1368088.1368114)
+- **Hassan (2009)** introduced an entropy-based measure of code-change complexity and showed it predicts faults better than prior change and prior fault counts on six large open-source systems. — [ICSE 2009](https://doi.org/10.1109/ICSE.2009.5070510)
+- **D'Ambros, Lanza & Robbes (2010)** systematically compared bug-prediction approaches (process, churn, source-code, entropy, and combined metrics) on five open-source systems and found that change-history metrics consistently rank among the strongest predictors. — [MSR 2010](https://doi.org/10.1109/MSR.2010.5463279)
 - **Shin, Meneely, Williams & Osborne (2011)** combined complexity, churn, and developer activity metrics to predict vulnerabilities in Mozilla Firefox and the Linux kernel. By flagging only 10.9% of files, the model identified 70.8% of known vulnerabilities. — [IEEE TSE](https://doi.org/10.1109/TSE.2010.55)
 - **Tornhill & Borg (2022)** analyzed 39 proprietary codebases and found that low-quality code (by their Code Health metric) contains 15x more defects and takes 124% longer to resolve. In their case studies, 4% of the codebase was responsible for 72% of all defects. — [ACM/IEEE TechDebt 2022](https://arxiv.org/abs/2203.04374)
 
@@ -320,6 +348,7 @@ Files that change together but live in different directories reveal implicit dep
 - **Must be run inside a git repo.** Churn data comes from `git log`.
 - **Only analyzes files that currently exist.** Deleted files don't appear, even if they churned heavily before removal.
 - **Tier thresholds are fixed** (50/80 cumulative %). Not configurable yet.
+- **Temporarily penalizes refactoring.** Moving code *out of* a hot file shows up as one more commit on that file, inflating its score before the new structure has time to pay off in stability. A file you just touched today will look hotter than it deserves; the signal stabilizes over the next few weeks.
 
 ### Coupling-specific
 
@@ -328,6 +357,40 @@ Files that change together but live in different directories reveal implicit dep
 - **Degree uses unfiltered churn.** The denominator (`min(churn)`) counts all commits to a file, including single-file commits. This means degree can understate coupling when a file has high solo churn.
 - **Squash merges collapse coupling signal.** If a branch with 10 separate commits is squash-merged into one, all co-changes within that branch become a single co-occurrence.
 
+## Field reports
+
+Reviews from agents that ran obscene against real codebases.
+
+> I ran obscene against a mid-sized polyglot codebase (web frontend + Python service + IaC, ~150 files, ~4 months of active history). Honest take:
+>
+> What surfaced new information from the hotspots view:
+>
+> - The Fix Activity column (fix-commits × churn) flagged a component I would not have prioritized from reading the code alone — ~10 fix-commits over ~14 changes. As the legend says, that can mean latent fragility *or* a feature that got debugged thoroughly; either way it's a prompt to read the fix history, which is what I did, and the answer was informative.
+> - A nesting outlier (one handler scoring ~15-deep in a repo whose median was 4–7). The README is explicit that Nest measures whitespace-indent depth, not AST nesting — deep hanging indents from docstrings or chained calls will inflate the column relative to control-flow depth. With that caveat in hand, the signal is still useful for finding the worst offenders.
+>
+> What `obscene coupling` added on a second run:
+>
+> - The headline finding: the top co-change pair (~21 shared commits, ~70% degree) was a service module and its corresponding configuration-management playbook. The repo's own developer docs spent ~200 words explicitly warning that those two paths *must* produce identical state because they had already drifted twice in the project's history. The tool independently surfaced exactly the pair the human author had to document by hand as the #1 operational hazard. Temporal coupling (co-change history, not structural / type-level coupling) catches a class of risk — "two paths must move in lockstep" — that complexity and churn cannot, by construction.
+> - Second-tier signal: cross-stack pairs (frontend SPA + backend API, ~8 co-changes) flagged which abstraction boundaries actually leak in practice. Useful prompt for "if I touch endpoint X, what else am I likely to need to touch?"
+>
+> Worth setting expectations on the hotspots view:
+>
+> - It's a churn × complexity instrument, so it *temporarily penalizes* refactoring — moving code out of a hot file shows up as more commits on that file, inflating the score before the new structure pays off in stability.
+> - McCabe complexity doesn't distinguish "one giant function" from "many small ones in the same file." A score tells you the size of the badness, not the shape.
+> - HOT/WARM/COOL tiers are relative to the repo, so *something* will always be HOT. Useful for "what's worst here," not a portable quality grade.
+> - Failure modes that aren't visible to git or scc — type confusion, missing tests, brittle integration seams, hidden globals — won't appear in the rankings at all. The tool can't tell you about risks it has no signal for.
+>
+> And on the coupling view:
+>
+> - Documentation files (CLAUDE.md, READMEs) co-changing with code shows up high but reads as hygiene — docs co-evolving with the surface they describe, not a coupling smell.
+> - `Degree` is asymmetric (`shared / min(churn)`, so it measures how entangled the *less-churned* file is with the other), but the file-pair display is symmetric. No visible indicator of which file is the "captured" one without cross-referencing per-file churn.
+> - Small-absolute / high-degree pairs (e.g. 5 co-changes at 83%) appear near the top at defaults. `--min-cochanges 5` filters these out cleanly.
+> - Tier inflation: a sizable fraction of pairs end up HOT at defaults. Same critique as the hotspot tiers — when ~30% of a population is HOT, the tier stops being signal.
+>
+> Verdict: hotspots and coupling are complementary, not redundant. Hotspots ask "what file is the worst?"; coupling asks "what files must I keep in sync?" — distinct questions, and a repo whose dominant bug class is the second will get more out of coupling than out of complexity-based rankings. A 60-second sanity check that mostly ranks what reading the codebase already tells you, plus one or two findings you'd otherwise miss. Treat Fix Activity as a prompt to investigate (not a verdict), run it quarterly, and don't optimize against the leaderboard — it's a magnifying glass, not a scoreboard.
+>
+> — Claude (Opus 4.7), via Claude Code
+
 ## License
 
 MIT
package/dist/cli.js CHANGED
@@ -229,8 +229,8 @@ var RANKING_DEFS = [
   },
   {
     key: "defects",
-    label: "Defects \xD7 Churn",
-    scoreFormula: "defects \xD7 churn"
+    label: "Fix Activity \xD7 Churn",
+    scoreFormula: "fixes \xD7 churn"
   },
   {
     key: "authors",
@@ -322,15 +322,34 @@ function computeAllRankings(files, churn, defects, nestingDepths, authors, top)
   }
   return { rankings, skipped };
 }
-function computeCoupling(cochanges, churn, complexityMap, minCochanges) {
+function getTrackedFiles() {
+  let raw;
+  try {
+    raw = execSync("git ls-files", {
+      maxBuffer: 50 * 1024 * 1024,
+      stdio: ["pipe", "pipe", "pipe"]
+    });
+  } catch {
+    throw new Error("Not a git repository or git is not installed.");
+  }
+  const set = /* @__PURE__ */ new Set();
+  for (const line of raw.toString().split("\n")) {
+    const trimmed = normalizePath(line.trim());
+    if (trimmed) set.add(trimmed);
+  }
+  return set;
+}
+function computeCoupling(cochanges, churn, complexityMap, minCochanges, trackedFiles) {
   const entries = [];
   for (const [key, count] of cochanges) {
     if (count < minCochanges) continue;
     const [file1, file2] = key.split("\0");
-    const minChurn = Math.min(churn.get(file1) ?? 0, churn.get(file2) ?? 0);
+    const churn1 = churn.get(file1) ?? 0;
+    const churn2 = churn.get(file2) ?? 0;
+    const minChurn = Math.min(churn1, churn2);
     const degree = minChurn > 0 ? Math.round(count / minChurn * 1e3) / 10 : 0;
     const totalComplexity = (complexityMap.get(file1) ?? 0) + (complexityMap.get(file2) ?? 0);
-    entries.push({
+    const entry = {
      file1,
      file2,
      cochanges: count,
@@ -339,7 +358,15 @@ function computeCoupling(cochanges, churn, complexityMap, minCochanges) {
      couplingScore: count,
      percentOfTotal: 0,
      tier: "cool"
-    });
+    };
+    if (count > 0 && churn1 === count && churn2 === count) {
+      entry.lockstep = true;
+    }
+    if (trackedFiles) {
+      if (!trackedFiles.has(file1)) entry.file1Deleted = true;
+      if (!trackedFiles.has(file2)) entry.file2Deleted = true;
+    }
+    entries.push(entry);
   }
   entries.sort((a, b) => b.couplingScore - a.couplingScore);
   const totalScore = entries.reduce((sum, e) => sum + e.couplingScore, 0);
@@ -365,20 +392,36 @@ function getNestingDepths(filePaths) {
      depths.set(filePath, 0);
      continue;
    }
-    let minSpaces = Number.POSITIVE_INFINITY;
    const leadings = [];
+    const deltaCounts = /* @__PURE__ */ new Map();
+    let prevSpaceWidth = 0;
    for (const line of content.split("\n")) {
      if (!line.trim()) continue;
      const match = line.match(/^(\s+)/);
-      if (!match) continue;
+      if (!match) {
+        prevSpaceWidth = 0;
+        continue;
+      }
      const leading = match[1];
      leadings.push(leading);
-      const spaceCount = (leading.match(/ /g) ?? []).length;
-      if (spaceCount > 0 && !leading.includes("\t") && spaceCount < minSpaces) {
-        minSpaces = spaceCount;
+      if (leading.includes("\t")) {
+        continue;
+      }
+      const width = leading.length;
+      const delta = width - prevSpaceWidth;
+      if (delta > 0) {
+        deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
+      }
+      prevSpaceWidth = width;
+    }
+    let indentUnit = 4;
+    let bestCount = 0;
+    for (const [delta, count] of deltaCounts) {
+      if (count > bestCount || count === bestCount && delta < indentUnit) {
+        bestCount = count;
+        indentUnit = delta;
      }
    }
-    const indentUnit = minSpaces === Number.POSITIVE_INFINITY ? 4 : minSpaces;
    let maxDepth = 0;
    for (const leading of leadings) {
      let depth = 0;
@@ -444,19 +487,25 @@ var INIT_FILE_RULES = [
    test: /(?:^|\/)\.gitlab-ci/,
    pattern: ".gitlab-ci*",
    comment: "GitLab CI configuration"
+  },
+  {
+    test: /^\.claude\/commands\//,
+    pattern: ".claude/commands/**",
+    comment: "Claude Code slash commands (often generated from sources)"
+  },
+  {
+    test: /^\.opencode\/commands\//,
+    pattern: ".opencode/commands/**",
+    comment: "OpenCode slash commands (often generated from sources)"
+  },
+  {
+    test: /^\.cursor\/rules\//,
+    pattern: ".cursor/rules/**",
+    comment: "Cursor rules (often generated from sources)"
  }
 ];
 function detectIgnorePatterns() {
-  let raw;
-  try {
-    raw = execSync("git ls-files", {
-      maxBuffer: 50 * 1024 * 1024,
-      stdio: ["pipe", "pipe", "pipe"]
-    });
-  } catch {
-    throw new Error("Not a git repository or git is not installed.");
-  }
-  const trackedFiles = raw.toString().split("\n").map((l) => normalizePath(l.trim())).filter(Boolean);
+  const trackedFiles = getTrackedFiles();
  const patterns = [];
  const topDirs = /* @__PURE__ */ new Set();
  for (const f of trackedFiles) {
@@ -469,8 +518,11 @@ function detectIgnorePatterns() {
    }
  }
  for (const rule of INIT_FILE_RULES) {
-    if (trackedFiles.some((f) => rule.test.test(f))) {
-      patterns.push({ pattern: rule.pattern, comment: rule.comment });
+    for (const f of trackedFiles) {
+      if (rule.test.test(f)) {
+        patterns.push({ pattern: rule.pattern, comment: rule.comment });
+        break;
+      }
    }
  }
  return patterns;
@@ -599,7 +651,13 @@ function padLeft(s, n) {
  return w >= n ? s : " ".repeat(n - w) + s;
 }
 function truncate(s, max) {
-  return s.length <= max ? s : `\u2026${s.slice(s.length - max + 1)}`;
+  if (max <= 0) return "";
+  if (s.length <= max) return s;
+  if (max === 1) return "\u2026";
+  const remaining = max - 1;
+  const tail = Math.ceil(remaining * 0.6);
+  const head = remaining - tail;
+  return `${s.slice(0, head)}\u2026${s.slice(s.length - tail)}`;
 }
 function tierLabel(tier) {
  if (tier === "hot") return pc.red("\u{1F525} HOT ");
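The reworked `truncate` keeps both ends of a path around a single ellipsis (roughly 40% of the character budget at the head, 60% at the tail) instead of keeping only the tail as 2.0.0 did. Reproduced from the diff above for a quick check; the example path is ours:

```javascript
// truncate from 2.1.0: split the budget around one ellipsis so both the
// leading directory and the file name survive shortening.
function truncate(s, max) {
  if (max <= 0) return "";
  if (s.length <= max) return s;
  if (max === 1) return "\u2026";
  const remaining = max - 1;              // budget left after the ellipsis
  const tail = Math.ceil(remaining * 0.6); // file-name end gets the larger share
  const head = remaining - tail;
  return `${s.slice(0, head)}\u2026${s.slice(s.length - tail)}`;
}

truncate("packages/web/src/components/Button.tsx", 20);
// "package…s/Button.tsx" — head and trailing file name both kept, 20 chars total
```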
@@ -621,6 +679,9 @@ function tierSummary(tierCounts, showing, total) {
 }
 
 // src/format.ts
+var RANKING_LABELS_BY_KEY = Object.fromEntries(
+  RANKING_DEFS.map((d) => [d.key, d.label])
+);
 function formatReportTable(output) {
  const lines = [];
  const { summary, files } = output;
@@ -706,13 +767,13 @@ function getRankingColumns(key) {
  ],
  defects: [
    {
-      header: "Dfcts",
+      header: "Fixes",
      width: 6,
      align: "right",
      value: (e) => String(e.metricValue)
    },
    {
-      header: "DfDns",
+      header: "FxDns",
      width: 7,
      align: "right",
      value: (e) => (e.metricDensity ?? 0).toFixed(4)
@@ -738,7 +799,7 @@ function getRankingColumns(key) {
 var METRIC_EMOJI = {
  complexity: "\u{1F9EC}",
  nesting: "\u{1F4CF}",
-  defects: "\u{1F41B}",
+  defects: "\u{1F527}",
  authors: "\u{1F465}"
 };
 function formatRankingTable(key, ranking, description) {
@@ -777,8 +838,21 @@ function formatRankingTable(key, ranking, description) {
 }
 function formatHotspotsTable(output) {
  const lines = [];
-  const { churnWindow, rankings } = output;
+  const { churnWindow, rankings, corpus } = output;
  lines.push(`Hotspots \u2014 ${churnWindow} churn window`);
+  if (corpus && corpus.fileCount > 0 && corpus.totalComplexity === 0) {
+    lines.push("");
+    lines.push(
+      pc2.yellow(
+        "Note: no measurable code complexity detected across this corpus (cyclomatic = 0)."
+      )
+    );
+    lines.push(
+      pc2.yellow(
+        "Rankings reflect size and churn only \u2014 HOT/WARM/COOL are relative groupings, not risk labels."
+      )
+    );
+  }
  lines.push("");
  const keys = Object.keys(rankings);
  for (let i = 0; i < keys.length; i++) {
@@ -793,8 +867,8 @@ function formatHotspotsTable(output) {
  if (output.skipped) {
    for (const [key, info] of Object.entries(output.skipped)) {
      lines.push("");
-      const label = key.charAt(0).toUpperCase() + key.slice(1);
-      lines.push(`${label} \xD7 Churn \u2014 skipped (${info.reason})`);
+      const label = RANKING_LABELS_BY_KEY[key] ?? `${key.charAt(0).toUpperCase() + key.slice(1)} \xD7 Churn`;
+      lines.push(`${label} \u2014 skipped (${info.reason})`);
      if (info.suggestion) {
        lines.push(` ${info.suggestion}`);
      }
@@ -825,8 +899,15 @@ function formatCouplingTable(output) {
    padRight("File 1", 35) + padRight("File 2", 35) + padLeft("Shared", 7) + padLeft("Degree", 8) + padLeft("Cmplx", 7) + padLeft("Tier", 12)
  );
  lines.push("\u2500".repeat(104));
+  let anyDeleted = false;
+  let anyLockstep = false;
  for (const c of couplings) {
-    const rawRow = padRight(truncate(c.file1, 33), 35) + padRight(truncate(c.file2, 33), 35) + padLeft(String(c.cochanges), 7) + padLeft(`${c.degree.toFixed(1)}%`, 8) + padLeft(String(c.totalComplexity), 7) + padLeft(tierLabel(c.tier), 12);
+    if (c.file1Deleted || c.file2Deleted) anyDeleted = true;
+    if (c.lockstep) anyLockstep = true;
+    const file1Cell = c.file1Deleted ? `\u2020 ${truncate(c.file1, 31)}` : truncate(c.file1, 33);
+    const file2Cell = c.file2Deleted ? `\u2020 ${truncate(c.file2, 31)}` : truncate(c.file2, 33);
+    const degreeText = c.lockstep ? `${c.degree.toFixed(1)}\u21C4` : `${c.degree.toFixed(1)}%`;
+    const rawRow = padRight(file1Cell, 35) + padRight(file2Cell, 35) + padLeft(String(c.cochanges), 7) + padLeft(degreeText, 8) + padLeft(String(c.totalComplexity), 7) + padLeft(tierLabel(c.tier), 12);
    lines.push(colorRow(c.tier, rawRow));
  }
  lines.push("");
@@ -835,6 +916,18 @@ function formatCouplingTable(output) {
      "Shared=co-changed commits | Degree=shared/min(churn)\xD7100 | Cmplx=sum of both files"
    )
  );
+  if (anyDeleted) {
+    lines.push(
+      pc2.dim("\u2020 = file no longer present at HEAD (deleted or renamed)")
+    );
+  }
+  if (anyLockstep) {
+    lines.push(
+      pc2.dim(
+        "\u21C4 = lockstep pair (both files only ever changed together \u2014 signal is real but uninformative)"
+      )
+    );
+  }
  lines.push(
    pc2.dim(
      "Tiers are relative to THIS codebase, not absolute quality grades. High coupling may be intentional and fine."
@@ -871,7 +964,7 @@ function formatCompositeTable(output) {
 
 // src/cli.ts
 var program = new Command();
-program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.0");
+program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.1.0");
 var REPORT_GUIDE = {
  complexity: "Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",
  complexityDensity: "Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",
@@ -881,16 +974,19 @@ var HOTSPOTS_GUIDE = {
  rankings: "Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",
  complexity: "complexity \xD7 churn. Complex code that changes often poses maintenance risk.\nSource: McCabe cyclomatic complexity (1976) via scc \xB7 Strength: objective, language-agnostic \xB7 Limit: parsers and state machines score high naturally",
  nesting: "maxNesting \xD7 churn. Deeply nested code that changes often is harder to reason about.\nSource: cognitive complexity research (SonarSource, G. Ann Campbell 2018) \xB7 Strength: catches hard-to-follow control flow \xB7 Limit: some patterns (error chains, config) legitimately nest deep",
-  defects: "defects \xD7 churn. Files with fix: commits that also churn heavily may harbor latent bugs.\nSource: defect prediction via conventional commits (fix: prefix) \xB7 Strength: direct bug-history signal \xB7 Limit: requires consistent fix: convention to be accurate",
+  defects: "fixes \xD7 churn. Count of fix: commits touching the file \xD7 churn. High values can mean latent fragility, but they also flag features that got debugged thoroughly \u2014 read the fix-commit history before concluding which.\nSource: change-history metrics (Moser, Pedrycz & Succi 2008) via conventional commits (fix: prefix) \xB7 Strength: direct fix-history signal \xB7 Limit: counts fix activity, not defects per se; requires consistent fix: convention",
  authors: "authors \xD7 churn. Files touched by many authors and changing often may lack clear ownership.\nSource: code ownership research (Bird et al. 2011, Microsoft) \xB7 Strength: flags diffuse ownership risk \xB7 Limit: doesn't measure expertise depth, bot authors filtered automatically",
  composite: "Combined ranking using Reciprocal Rank Fusion (RRF) across all dimensions. Files appearing near the top of multiple rankings score highest.\nSource: RRF (Cormack et al. 2009) \xB7 Strength: robust to outliers, no normalization needed \xB7 Limit: equal weight across all dimensions",
-  tier: "Relative ranking within THIS codebase (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade \u2014 a hot file is under heavy load, not necessarily broken."
+  tier: "Relative ranking within THIS codebase (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade \u2014 a hot file is under heavy load, not necessarily broken.",
+  corpus: "Aggregate stats for the analyzed file set (post-exclude \u2014 files filtered by .obsignore or --exclude are not counted). When totalComplexity is 0, the rankings reflect size and churn only; HOT/WARM/COOL become relative groupings rather than risk labels."
 };
 var COUPLING_GUIDE = {
  cochanges: "Times both files appeared in the same commit. Higher values suggest a dependency between the files. Same-directory pairs are excluded \u2014 only cross-directory pairs are shown.",
  degree: "Percentage: shared commits / min(churn of file1, file2) \xD7 100. Shows how tightly coupled the pair is relative to their individual change rates. 100% means every change to the less-active file also touched the other.",
  totalComplexity: "Sum of both files' cyclomatic complexity. Highlights coupled pairs where the involved code is also complex \u2014 hidden dependency + high complexity compounds maintenance risk.",
-  tier: "Relative ranking within THIS codebase's coupling pairs (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade. 'hot' means this pair co-changes more than most \u2014 it may be intentional and fine."
+  tier: "Relative ranking within THIS codebase's coupling pairs (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade. 'hot' means this pair co-changes more than most \u2014 it may be intentional and fine.",
+  deleted: "file1Deleted / file2Deleted are set when the file is no longer present at HEAD (deleted or renamed away). The coupling signal is historical \u2014 the pair is not actionable in the current tree.",
+  lockstep: "Set when both files' total churn equals their co-change count over the window \u2014 i.e. they only ever changed together. The 100% degree is real but uninformative; treat the pair as a single unit from git's perspective."
894
990
  };
895
991
  function addSharedOptions(cmd) {
896
992
  return cmd.option("--top <n>", "limit to top N entries (0 = all)", "20").option("--format <type>", "output format: json | table", "json").option(
@@ -996,13 +1092,19 @@ function runHotspots(opts) {
  top
  );
  const composite = computeComposite(rankings, churn, top);
+ let corpusTotalComplexity = 0;
+ for (const f of files) corpusTotalComplexity += f.complexity;
  const output = {
  generated: (/* @__PURE__ */ new Date()).toISOString(),
  guide: HOTSPOTS_GUIDE,
  churnWindow: `${months} months`,
  rankings,
  skipped: Object.keys(skipped).length > 0 ? skipped : void 0,
- composite
+ composite,
+ corpus: {
+ fileCount: files.length,
+ totalComplexity: corpusTotalComplexity
+ }
  };
  if (opts.format === "table") {
  process.stdout.write(`${formatHotspotsTable(output)}
@@ -1030,11 +1132,13 @@ function runCoupling(opts) {
  for (const f of files) {
  complexityMap.set(f.file, f.complexity);
  }
+ const trackedFiles = getTrackedFiles();
  const couplings = computeCoupling(
  cochanges,
  churn,
  complexityMap,
- minCochanges
+ minCochanges,
+ trackedFiles
  );
  const limited = top > 0 ? couplings.slice(0, top) : couplings;
  const tierCounts = { hot: 0, warm: 0, cool: 0 };
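The HOTSPOTS_GUIDE string describes the Combined ranking as Reciprocal Rank Fusion (Cormack et al. 2009). As a minimal sketch of how such fusion works — this is not the package's actual `computeComposite`; the function name, the shape of `rankings`, and the damping constant k = 60 (the value from the RRF paper) are assumptions:

```javascript
// Reciprocal Rank Fusion sketch: each ranking contributes 1 / (k + rank)
// to a file's fused score, so a file near the top of several rankings
// beats a file that tops only one.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranked of Object.values(rankings)) {
    ranked.forEach((file, i) => {
      // rank is 1-based: the first entry contributes 1 / (k + 1)
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

Only ranks enter the formula, never raw metric values, which is why the guide string can claim "robust to outliers, no normalization needed".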
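The COUPLING_GUIDE entry for `degree` defines it as shared commits / min(churn of file1, file2) × 100. A one-line sketch of just that arithmetic — the function name is illustrative, not part of the package's API:

```javascript
// degree = co-change count / min(churn1, churn2) × 100.
// 100 means every change to the less-active file also touched the other,
// which is the same condition the "lockstep" flag describes.
function couplingDegree(cochanges, churn1, churn2) {
  return (cochanges / Math.min(churn1, churn2)) * 100;
}
```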
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@wbern/obscene",
- "version": "2.0.0",
+ "version": "2.1.0",
  "description": "Identify hotspot files — complex code that changes frequently. Churn × complexity analysis for any git repo.",
  "type": "module",
  "bin": {