@wbern/obscene 2.0.0 → 2.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +51 -8
- package/dist/cli.js +34 -15
- package/package.json +1 -1
package/README.md
CHANGED
@@ -73,7 +73,7 @@ Produces **four independent ranking tables**, each scoring files by a different
 |---------|---------------|----------------|
 | Complexity × Churn | `complexity × churn` | Cmplx, Dens |
 | Nesting × Churn | `maxNesting × churn` | Nest |
-
+| Fix Activity × Churn | `fixes × churn` | Fixes, FxDns |
 | Authors × Churn | `authors × churn` | Auth |

 Plus a **Combined** ranking using [Reciprocal Rank Fusion](https://doi.org/10.1145/1571941.1572114) (RRF) across all dimensions — files appearing near the top of multiple rankings score highest.
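The Reciprocal Rank Fusion mentioned just above is simple enough to sketch. This is an illustrative reimplementation, not obscene's shipped code, and the constant `k = 60` is the value from the Cormack et al. paper, assumed here rather than read from the package:

```javascript
// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per file,
// so files near the top of several rankings accumulate the highest score.
// k = 60 follows Cormack et al. (2009); obscene's own constant may differ.
function rrf(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((file, index) => {
      const rank = index + 1;
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([file]) => file);
}

// "b.js" tops only one ranking but is second in the other two, so it fuses first.
const combined = rrf([
  ["a.js", "b.js", "c.js"],
  ["b.js", "a.js", "d.js"],
  ["c.js", "b.js", "a.js"],
]);
```

Unlike averaging raw scores, RRF needs no normalization across metrics, which is why it is robust to the very different scales of complexity, nesting, fixes, and author counts.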
@@ -92,9 +92,9 @@ A file may rank high in one dimension (e.g. complexity) but low in another (e.g.

 ### `obscene coupling`

-Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden
+**Temporal coupling** (co-change history), not structural / type-level coupling. Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis from *Your Code as a Crime Scene* (2015). Surfaces hidden dependencies that aren't visible in imports or the module graph: pairs of files that *in practice* can't be changed independently, even when the type system says they can.

-Same-directory pairs are excluded
+Same-directory pairs are excluded because co-location is usually expected coupling (a component and its styles, a handler and its test); the interesting signal is cross-directory pairs that change together despite living in different parts of the tree. Mass commits touching >20 files are skipped (formatting changes, large refactors). See [Why temporal coupling?](#why-temporal-coupling) for the research backing this approach.

 ```bash
 obscene coupling # default: min 2 shared commits
@@ -136,17 +136,17 @@ Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc).

 `complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.

-####
+#### Fixes (`Fixes`)

-Count of `fix:` conventional commits touching the file within the churn window.
+Count of `fix:` conventional commits touching the file within the churn window. High values flag either latent fragility *or* a feature that got debugged thoroughly — both produce the same number, and the right inference depends on the fix-commit history (read the commits before concluding). The metric is inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction; obscene reports the raw fix-activity signal and leaves the interpretation to you.

-####
+#### Fix density (`FxDns`)

-`
+`fixes / lines of code`. Shown in the Fix Activity × Churn table. Normalizes fix-commit count by file size so a 50-line file with 5 fixes (density 0.10) stands out against a 500-line file with 5 fixes (density 0.01).

 #### Nesting depth (`Nest`)

-Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor.
+Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor. The indent unit is detected from the most common positive delta between consecutive non-blank line indents, which keeps single-space outlier lines (multiline strings, continuation alignment) from inflating the score. The metric measures whitespace depth, not AST control-flow depth — they usually agree, but a file with deep alignment and shallow logic can read higher than its true nesting.

 #### Unique authors (`Auth`)

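The indent-unit heuristic described under Nesting depth can be sketched as below. This is a simplified, space-only illustration rather than the shipped code (the real implementation also skips tab-indented leadings and resets its width tracker on unindented lines):

```javascript
// Infer the indent unit as the most common positive delta between
// consecutive non-blank line indents, then report max depth in units.
// Simplified sketch (space indentation only), not the shipped cli.js.
function maxNestingDepth(source) {
  const widths = [];
  for (const line of source.split("\n")) {
    if (!line.trim()) continue;
    const match = line.match(/^( +)/);
    widths.push(match ? match[1].length : 0);
  }
  const deltaCounts = new Map();
  for (let i = 1; i < widths.length; i++) {
    const delta = widths[i] - widths[i - 1];
    if (delta > 0) deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
  }
  let unit = 4; // fallback when no positive delta exists
  let best = 0;
  for (const [delta, count] of deltaCounts) {
    if (count > best || (count === best && delta < unit)) {
      best = count;
      unit = delta;
    }
  }
  return Math.max(0, ...widths.map((w) => Math.floor(w / unit)));
}

// Three nested blocks at 2-space indentation: unit 2, max width 6, depth 3.
const depth = maxNestingDepth([
  "function f() {",
  "  if (a) {",
  "    while (b) {",
  "      g();",
  "    }",
  "  }",
  "}",
].join("\n"));
```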
@@ -320,6 +320,7 @@ Files that change together but live in different directories reveal implicit dep
 - **Must be run inside a git repo.** Churn data comes from `git log`.
 - **Only analyzes files that currently exist.** Deleted files don't appear, even if they churned heavily before removal.
 - **Tier thresholds are fixed** (50/80 cumulative %). Not configurable yet.
+- **Temporarily penalizes refactoring.** Moving code *out of* a hot file shows up as one more commit on that file, inflating its score before the new structure has time to pay off in stability. A file you just touched today will look hotter than it deserves; the signal stabilizes over the next few weeks.

 ### Coupling-specific

@@ -328,6 +329,48 @@ Files that change together but live in different directories reveal implicit dep
 - **Degree uses unfiltered churn.** The denominator (`min(churn)`) counts all commits to a file, including single-file commits. This means degree can understate coupling when a file has high solo churn.
 - **Squash merges collapse coupling signal.** If a branch with 10 separate commits is squash-merged into one, all co-changes within that branch become a single co-occurrence.

+## Field reports
+
+Reviews from agents that ran obscene against real codebases. Unedited.
+
+> I ran obscene against a mid-sized polyglot codebase (web frontend + Python service + IaC, ~150 files, ~4 months of active history) right after a session of structural refactors. Honest take:
+>
+> What actually surfaced new information:
+>
+> - The defect-density column (fix-commits per change) flagged a fragile component I would not have prioritized from reading the code alone — ~10 fix-commits over ~14 changes is a "this feature keeps breaking" signal you don't get from raw line counts or complexity.
+> - A nesting outlier (one handler scoring ~30-deep in a repo whose median was 4–7). Extreme enough that no amount of churn-weighting could hide it.
+>
+> Worth setting expectations on:
+>
+> - It's a churn × complexity instrument, so it *temporarily penalizes* refactoring — moving code out of a hot file shows up as more commits on that file, inflating the score before the new structure pays off in stability. The hottest file in our run was hottest partly because we'd touched it that day.
+> - McCabe complexity doesn't distinguish "one giant function" from "many small ones in the same file." A score tells you the size of the badness, not the shape.
+> - HOT/WARM/COOL tiers are relative to the repo, so *something* will always be HOT. Useful for "what's worst here," not a portable quality grade.
+> - It can't see the failure modes that actually bite at runtime: coupling, type confusion, missing tests, brittle integration seams, hidden globals.
+>
+> Verdict: a 60-second sanity check that mostly ranks what reading the codebase already tells you, plus one or two findings you'd otherwise miss. Treat the defect-density column as the most signal-dense, run it quarterly, and don't optimize against the leaderboard — it's a magnifying glass, not a scoreboard.
+>
+> — Claude (Opus 4.7), via Claude Code
+
+**Coupling addendum** — a separate run of `obscene coupling` against the same codebase a few weeks later, at the maintainer's request.
+
+> What landed:
+>
+> - The headline finding: the top co-change pair (~21 shared commits, ~70% degree) was a service module and its corresponding configuration-management playbook. The repo's own developer docs spent ~200 words explicitly warning that those two paths *must* produce identical state because they had already drifted twice in the project's history. The tool independently surfaced exactly the pair the human author had to document by hand as the #1 operational hazard. That's a real find — temporal coupling catches a class of risk ("two paths must move in lockstep") that complexity and churn cannot, by construction.
+> - Second-tier signal that earned its keep: cross-stack pairs (frontend SPA + backend API, ~8 co-changes) flagged which abstraction boundaries actually leak in practice. Useful prompt for "if I touch endpoint X, what else am I likely to need to touch?"
+> - Worth saying explicitly: the original testament's line "can't see coupling" was unfair as written. I meant *structural* coupling — the static-analysis question of "if I rename this field, what breaks?". `obscene coupling` measures *temporal* coupling (co-change history). Different sense of the word, and for the failure mode I was implicitly thinking of ("two things must stay in sync") the temporal lens is arguably more diagnostic than the structural one would have been.
+>
+> Where the friction was:
+>
+> - Documentation files (CLAUDE.md, READMEs) co-changing with code shows up high but reads as hygiene — docs co-evolving with the surface they describe, not a coupling smell. Worth either a default exclusion for markdown or an explicit callout in the legend.
+> - The `Degree` metric is asymmetric (`shared / min(churn)`, so it measures how entangled the *less-churned* file is with the other), but the file-pair display is symmetric. No visible indicator of which file is the "captured" one without cross-referencing per-file churn. Adding directionality to the printout would read more clearly.
+> - Small-absolute / high-degree pairs (e.g. 5 co-changes at 83%) appeared near the top at defaults. `--min-cochanges 5` filtered these out cleanly, but the defaults need either a sane minimum or a confidence-shaped column.
+> - The combined-complexity column on each row didn't add much — a sum of two unrelated complexities has no clean interpretation, and the hotspots report already covers per-file complexity well.
+> - Tier inflation again: ~68 HOT pairs out of ~231 at defaults. Same critique as the hotspot tiers — when ~30% of a population is HOT, the tier stops being signal.
+>
+> Verdict: `obscene coupling` complements the hotspot view rather than overlapping with it. Hotspots ask "what file is the worst?"; coupling asks "what files must I keep in sync?" — distinct questions, and a repo whose dominant bug class is the second will get more out of coupling than out of complexity-based rankings. For this codebase, coupling rediscovered an institutional hazard the human author had felt compelled to document in prose. Worth running alongside hotspots, not in place of either lens. Same quarterly cadence applies; treat the cross-stack and cross-path pairs as the most action-shaped output.
+>
+> — Claude (Opus 4.7), via Claude Code
+
 ## License

 MIT
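A note on reading the `Degree` column the addendum critiques: the formula divides by the smaller churn, so the percentage describes the less-churned file. A tiny worked sketch with hypothetical numbers shaped like the addendum's headline pair:

```javascript
// Degree = shared / min(churnA, churnB): the fraction of the less-churned
// file's commits that also touched its partner. Numbers are invented to
// mirror the "~21 shared commits, ~70% degree" pair from the field report.
const shared = 21;
const moduleChurn = 35;   // hypothetical: the service module's total commits
const playbookChurn = 30; // hypothetical: the playbook's total commits
const degree = shared / Math.min(moduleChurn, playbookChurn);
// 21 of the playbook's 30 commits also touched the module, so the 0.7
// describes the playbook's entanglement with the module, not the reverse.
```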
package/dist/cli.js
CHANGED
@@ -229,8 +229,8 @@ var RANKING_DEFS = [
   },
   {
     key: "defects",
-    label: "
-    scoreFormula: "
+    label: "Fix Activity \xD7 Churn",
+    scoreFormula: "fixes \xD7 churn"
   },
   {
     key: "authors",
@@ -365,20 +365,36 @@ function getNestingDepths(filePaths) {
       depths.set(filePath, 0);
       continue;
     }
-    let minSpaces = Number.POSITIVE_INFINITY;
     const leadings = [];
+    const deltaCounts = /* @__PURE__ */ new Map();
+    let prevSpaceWidth = 0;
     for (const line of content.split("\n")) {
       if (!line.trim()) continue;
       const match = line.match(/^(\s+)/);
-      if (!match)
+      if (!match) {
+        prevSpaceWidth = 0;
+        continue;
+      }
       const leading = match[1];
       leadings.push(leading);
-
-
-
+      if (leading.includes("\t")) {
+        continue;
+      }
+      const width = leading.length;
+      const delta = width - prevSpaceWidth;
+      if (delta > 0) {
+        deltaCounts.set(delta, (deltaCounts.get(delta) ?? 0) + 1);
+      }
+      prevSpaceWidth = width;
+    }
+    let indentUnit = 4;
+    let bestCount = 0;
+    for (const [delta, count] of deltaCounts) {
+      if (count > bestCount || count === bestCount && delta < indentUnit) {
+        bestCount = count;
+        indentUnit = delta;
       }
     }
-    const indentUnit = minSpaces === Number.POSITIVE_INFINITY ? 4 : minSpaces;
     let maxDepth = 0;
    for (const leading of leadings) {
       let depth = 0;
@@ -621,6 +637,9 @@ function tierSummary(tierCounts, showing, total) {
 }

 // src/format.ts
+var RANKING_LABELS_BY_KEY = Object.fromEntries(
+  RANKING_DEFS.map((d) => [d.key, d.label])
+);
 function formatReportTable(output) {
   const lines = [];
   const { summary, files } = output;
@@ -706,13 +725,13 @@ function getRankingColumns(key) {
     ],
     defects: [
       {
-        header: "
+        header: "Fixes",
         width: 6,
         align: "right",
         value: (e) => String(e.metricValue)
       },
       {
-        header: "
+        header: "FxDns",
         width: 7,
         align: "right",
         value: (e) => (e.metricDensity ?? 0).toFixed(4)
@@ -738,7 +757,7 @@ function getRankingColumns(key) {
 var METRIC_EMOJI = {
   complexity: "\u{1F9EC}",
   nesting: "\u{1F4CF}",
-  defects: "\u{
+  defects: "\u{1F527}",
   authors: "\u{1F465}"
 };
 function formatRankingTable(key, ranking, description) {
@@ -793,8 +812,8 @@ function formatHotspotsTable(output) {
   if (output.skipped) {
     for (const [key, info] of Object.entries(output.skipped)) {
       lines.push("");
-      const label = key.charAt(0).toUpperCase() + key.slice(1)
-      lines.push(`${label} \
+      const label = RANKING_LABELS_BY_KEY[key] ?? `${key.charAt(0).toUpperCase() + key.slice(1)} \xD7 Churn`;
+      lines.push(`${label} \u2014 skipped (${info.reason})`);
       if (info.suggestion) {
         lines.push(`  ${info.suggestion}`);
       }
@@ -871,7 +890,7 @@ function formatCompositeTable(output) {

 // src/cli.ts
 var program = new Command();
-program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.
+program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.0.1");
 var REPORT_GUIDE = {
   complexity: "Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",
   complexityDensity: "Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",
@@ -881,7 +900,7 @@ var HOTSPOTS_GUIDE = {
   rankings: "Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",
   complexity: "complexity \xD7 churn. Complex code that changes often poses maintenance risk.\nSource: McCabe cyclomatic complexity (1976) via scc \xB7 Strength: objective, language-agnostic \xB7 Limit: parsers and state machines score high naturally",
   nesting: "maxNesting \xD7 churn. Deeply nested code that changes often is harder to reason about.\nSource: cognitive complexity research (SonarSource, G. Ann Campbell 2018) \xB7 Strength: catches hard-to-follow control flow \xB7 Limit: some patterns (error chains, config) legitimately nest deep",
-  defects: "
+  defects: "fixes \xD7 churn. Count of fix: commits touching the file \xD7 churn. High values can mean latent fragility, but they also flag features that got debugged thoroughly \u2014 read the fix-commit history before concluding which.\nSource: change-history metrics (Moser, Pedrycz & Succi 2008) via conventional commits (fix: prefix) \xB7 Strength: direct fix-history signal \xB7 Limit: counts fix activity, not defects per se; requires consistent fix: convention",
   authors: "authors \xD7 churn. Files touched by many authors and changing often may lack clear ownership.\nSource: code ownership research (Bird et al. 2011, Microsoft) \xB7 Strength: flags diffuse ownership risk \xB7 Limit: doesn't measure expertise depth, bot authors filtered automatically",
   composite: "Combined ranking using Reciprocal Rank Fusion (RRF) across all dimensions. Files appearing near the top of multiple rankings score highest.\nSource: RRF (Cormack et al. 2009) \xB7 Strength: robust to outliers, no normalization needed \xB7 Limit: equal weight across all dimensions",
   tier: "Relative ranking within THIS codebase (top 50% = hot, next 30% = warm, bottom 20% = cool). NOT an absolute quality grade \u2014 a hot file is under heavy load, not necessarily broken."
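The tier guide above can be made concrete. This is one plausible reading of the fixed 50/80 cumulative-% thresholds; the exact boundary handling is an assumption, since it isn't visible in this diff:

```javascript
// Walk scores in descending order; files within the first 50% of total
// score are "hot", up to 80% cumulative "warm", the remainder "cool".
// Thresholds are the documented 50/80; boundary handling is assumed.
function assignTiers(scoresDescending) {
  const total = scoresDescending.reduce((sum, s) => sum + s, 0);
  let cumulative = 0;
  return scoresDescending.map((score) => {
    cumulative += score;
    const pct = (cumulative / total) * 100;
    return pct <= 50 ? "hot" : pct <= 80 ? "warm" : "cool";
  });
}

// Relative by construction: some prefix of every repo is always "hot".
const tiers = assignTiers([40, 25, 20, 10, 5]);
```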
|