@wbern/obscene 0.2.0 → 0.3.0

Files changed (3)
  1. package/README.md +91 -8
  2. package/dist/cli.js +171 -8
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -57,6 +57,8 @@ obscene --format table # human-readable table
57
57
  obscene --top 50 --months 6 # more results, longer window
58
58
  obscene --top 0 # all files
59
59
  obscene report # raw complexity (no churn)
60
+ obscene coupling # temporal coupling analysis
61
+ obscene coupling --min-cochanges 1 --format table
60
62
  obscene --exclude "*.generated.*"
61
63
  obscene | jq '.hotspots[0]' # pipe-friendly
62
64
  ```
@@ -73,6 +75,18 @@ Scores each file by `complexity × commits` over a time window, then assigns tie
73
75
  | **watch** | next 30% (50–80%) | Keep an eye on these |
74
76
  | **stable** | bottom 20% | Low risk |
75
77
 
78
+ ### `obscene coupling`
79
+
80
+ Detects files that frequently change together in the same commit but live in different directories — Tornhill's "temporal coupling" analysis. Surfaces hidden structural dependencies that aren't visible in the code itself.
81
+
82
+ Same-directory pairs are excluded (co-location is expected coupling). Mass commits touching >20 files are skipped (formatting changes, large refactors).
83
+
84
+ ```bash
85
+ obscene coupling # default: min 2 shared commits
86
+ obscene coupling --min-cochanges 1 # include single co-occurrences
87
+ obscene coupling --format table --top 10 # human-readable, top 10
88
+ ```
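The two filtering rules described above (skip same-directory pairs, skip commits touching more than 20 files) can be sketched roughly as follows. The helper names `dirOf` and `crossDirPairs` are hypothetical and exist only for illustration; the package's own logic lives in `getCoChanges` in `dist/cli.js`.

```javascript
// Illustrative sketch only: dirOf and crossDirPairs are hypothetical names,
// not part of the package's API.
const MAX_FILES = 20; // commits touching more files than this are skipped

// Directory part of a path, or "" for files at the repository root.
function dirOf(path) {
  const i = path.lastIndexOf("/");
  return i === -1 ? "" : path.slice(0, i);
}

// Returns the cross-directory file pairs that a single commit contributes.
function crossDirPairs(files) {
  const unique = [...new Set(files)].sort();
  if (unique.length < 2 || unique.length > MAX_FILES) return [];
  const pairs = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      // Same-directory pairs are treated as expected coupling and dropped.
      if (dirOf(unique[i]) !== dirOf(unique[j])) {
        pairs.push([unique[i], unique[j]]);
      }
    }
  }
  return pairs;
}

const pairs = crossDirPairs([
  "src/api/user.ts",
  "src/api/auth.ts",
  "src/ui/login.tsx",
]);
console.log(pairs);
```

Sorting the files before pairing keeps each pair in a stable order, so the same two files always map to the same key regardless of the order git lists them in.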
89
+
76
90
  ### `obscene report`
77
91
 
78
92
  Per-file complexity without churn. Useful for raw complexity distribution.
@@ -84,22 +98,80 @@ Per-file complexity without churn. Useful for raw complexity distribution.
84
98
  | `--top <n>` | `20` | Limit results (0 = all) |
85
99
  | `--months <n>` | `3` | Churn window in months |
86
100
  | `--format <type>` | `json` | `json` or `table` |
101
+ | `--min-cochanges <n>` | `2` | Minimum shared commits (coupling only) |
87
102
  | `--exclude <patterns...>` | — | Additional exclusion patterns |
88
103
 
104
+ ## Metrics
105
+
106
+ Each hotspot row includes the following metrics:
107
+
108
+ ### Hotspot score (`Score`)
109
+
110
+ `complexity × churn`. The core ranking metric — files that are both complex and frequently modified bubble to the top. See [Why churn × complexity?](#why-churn-x-complexity) for the research backing this approach.
111
+
112
+ ### Churn (`Churn`)
113
+
114
+ Number of commits touching the file within the configured time window (default: 3 months). Measures how actively the file is being modified.
115
+
116
+ ### Cyclomatic complexity (`Cmplx`)
117
+
118
+ Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc). Counts independent execution paths (branches, loops, conditions). Higher values mean more paths to test and more places for bugs to hide.
119
+
120
+ ### Complexity density (`Dens`)
121
+
122
+ `complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.
123
+
124
+ ### Defects (`Dfcts`)
125
+
126
+ Count of `fix:` conventional commits touching the file within the churn window. A proxy for historical defect rate — files that attract repeated fixes are more likely to contain latent bugs. Inspired by Moser, Pedrycz & Succi (2008), who showed that change-history metrics outperform static code metrics for defect prediction.
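How the package recognises a `fix:` commit is not visible in this diff. As a rough sketch, assuming the subject line is matched against the Conventional Commits prefix (optional scope, optional `!`), counting could look like this. The pattern `FIX_SUBJECT` and the function `countFixCommits` are assumptions for illustration, not the package's code.

```javascript
// Assumed pattern: "fix", an optional "(scope)", an optional "!", then ":".
// This is a guess at the matching rule, not taken from the package.
const FIX_SUBJECT = /^fix(\([^)]*\))?!?:/;

// Counts commit subjects that look like conventional "fix" commits.
function countFixCommits(subjects) {
  return subjects.filter((s) => FIX_SUBJECT.test(s)).length;
}

const sample = [
  "fix: handle empty git log",
  "fix(cli): parse --top as a number",
  "feat: add coupling command",
  "fixup typo in README", // no colon after "fix", so it is not counted
];
console.log(countFixCommits(sample));
```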
127
+
128
+ ### Defect density (`defectDensity`, JSON only)
129
+
130
+ `defects / lines of code`. Not shown in table output due to column width, but available in JSON. Normalizes defect count by file size.
131
+
132
+ ### Nesting depth (`Nest`)
133
+
134
+ Maximum indentation level (tab stops) in the file. Deep nesting correlates with high cognitive load and defect likelihood. Harrison & Magel (1981) identified nesting depth as a significant complexity contributor.
135
+
136
+ ### Unique authors (`Auth`)
137
+
138
+ Number of distinct git authors who committed to the file within the churn window. Files touched by many authors may lack clear ownership and accumulate inconsistent patterns. Kamei et al. (2013) found developer count to be a significant predictor of defect-introducing changes.
139
+
140
+ ### Shared commits (`Shared`, coupling only)
141
+
142
+ Number of commits where both files in a pair were modified together. The core ranking metric for temporal coupling — higher values indicate stronger hidden dependencies between files in different directories.
143
+
144
+ ### Coupling degree (`Degree`, coupling only)
145
+
146
+ `shared commits / min(churn of file1, churn of file2) × 100`. What percentage of the less-active file's changes also involved the other file. A degree of 100% means every change to the less-active file also touched the other file.
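A small worked example of the degree formula, as a sketch. The function name `couplingDegree` is hypothetical; the rounding to one decimal place and the zero fallback follow the expression used in `computeCoupling` in `dist/cli.js`.

```javascript
// couplingDegree is a hypothetical name used for illustration.
// shared: commits where both files changed; churn1/churn2: commits per file.
function couplingDegree(shared, churn1, churn2) {
  const minChurn = Math.min(churn1, churn2);
  // Percentage rounded to one decimal place; 0 when no churn is recorded.
  return minChurn > 0 ? Math.round((shared / minChurn) * 1000) / 10 : 0;
}

// File A changed in 4 commits, file B in 12, and 3 commits touched both:
// 3 of the less-active file's 4 changes also involved the other file.
console.log(couplingDegree(3, 4, 12));
```

Dividing by the smaller churn means a rarely changed file that always moves with a busy one still shows a high degree, which is the asymmetry the README describes.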
147
+
148
+ ### Tier
149
+
150
+ Cumulative score distribution bucket:
151
+
152
+ | Tier | Range | Meaning |
153
+ |------|-------|---------|
154
+ | **danger** | top 50% of total score | Refactor candidates |
155
+ | **watch** | next 30% (50–80%) | Keep an eye on these |
156
+ | **stable** | bottom 20% | Low risk |
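The cumulative bucketing can be sketched as below. The threshold values 0.5 and 0.8 are assumptions read off the table ranges above (the constants `DANGER_CUMULATIVE` and `WATCH_CUMULATIVE` are referenced in `dist/cli.js`, but their values are not part of this diff), and `assignTiers` is a hypothetical name.

```javascript
// Threshold values are assumed from the README ranges (50% and 80%).
const DANGER_CUMULATIVE = 0.5;
const WATCH_CUMULATIVE = 0.8;

// assignTiers is a hypothetical helper: sorts scores descending and buckets
// each entry by the running share of the total score it reaches.
function assignTiers(scores) {
  const sorted = [...scores].sort((a, b) => b - a);
  const total = sorted.reduce((sum, s) => sum + s, 0);
  let cumulative = 0;
  return sorted.map((score) => {
    cumulative += score;
    const share = total > 0 ? cumulative / total : 1;
    let tier = "stable";
    if (share <= DANGER_CUMULATIVE) tier = "danger";
    else if (share <= WATCH_CUMULATIVE) tier = "watch";
    return { score, tier };
  });
}

const tiers = assignTiers([40, 10, 20, 20, 10]);
console.log(tiers.map((t) => `${t.score}:${t.tier}`).join(" "));
```

One consequence of bucketing on the running total in this sketch: an entry that alone accounts for more than half of the total score is already past the first threshold, so it is labelled `watch` (or `stable`) rather than `danger`.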
157
+
89
158
  ## Example output
90
159
 
91
160
  ```
92
- Hotspots — 3 months churn window | Total score: 35452
161
+ Hotspots — 3 months churn window | Total score: 35,452
93
162
  Tiers: 3 danger, 13 watch, 194 stable
94
163
  Showing: 5 of 210
95
164
 
96
- File Score % Churn Cmplx Density Tier
97
- ──────────────────────────────────────────────────────────────────────────────────────
98
- src/utils/effect-generator.ts 8296 23.4 68 122 0.12 DANGER
99
- src/services/game-engine.ts 4284 12.1 51 84 0.09 DANGER
100
- src/components/board-renderer.tsx 2940 8.3 42 70 0.11 DANGER
101
- src/hooks/use-game-state.ts 1320 3.7 33 40 0.08 WATCH
102
- src/utils/move-validator.ts 945 2.7 27 35 0.06 WATCH
165
+ File Score % Churn Cmplx Dens Dfcts Nest Auth Tier
166
+ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
167
+ src/utils/effect-generator.ts 8,296 23.4 68 122 0.12 5 6 4 DANGER
168
+ src/services/game-engine.ts 4,284 12.1 51 84 0.09 3 4 3 DANGER
169
+ src/components/board-renderer.tsx 2,940 8.3 42 70 0.11 2 5 3 DANGER
170
+ src/hooks/use-game-state.ts 1,320 3.7 33 40 0.08 1 3 2 WATCH
171
+ src/utils/move-validator.ts 945 2.7 27 35 0.06 0 2 1 WATCH
172
+
173
+ Score=complexity×churn | Dens=complexity/code | Dfcts=fix commits | Nest=max indent depth | Auth=unique authors
174
+ Docs: https://github.com/wbern/obscene#metrics
103
175
  ```
104
176
 
105
177
  ## Supported languages
@@ -110,6 +182,17 @@ Any language [scc supports](https://github.com/boyter/scc#features) — 200+ lan
110
182
 
111
183
  Test and generated files are excluded automatically: `*.test.*`, `*.spec.*`, `__tests__/`, `__mocks__/`, `*.stories.*`, `*.d.ts`, and similar patterns. scc also skips generated files by default (`--no-gen`).
112
184
 
185
+ ## Why churn x complexity?
186
+
187
+ Files that are both complex and frequently modified are disproportionately likely to contain defects. This is backed by decades of empirical software engineering research:
188
+
189
+ - **Nagappan & Ball (2005)** studied Windows Server 2003 and found that relative code churn measures predict system defect density with 89% accuracy. — [ICSE 2005](https://doi.org/10.1109/ICSE.2005.1553571)
190
+ - **Moser, Pedrycz & Succi (2008)** compared change metrics against static code attributes on Eclipse and found that process metrics (churn, change frequency) outperform static code metrics for defect prediction. — [ICSE 2008](https://doi.org/10.1145/1368088.1368114)
191
+ - **Shin, Meneely, Williams & Osborne (2011)** combined complexity, churn, and developer activity metrics to predict vulnerabilities in Mozilla Firefox and the Linux kernel. By flagging only 10.9% of files, the model identified 70.8% of known vulnerabilities. — [IEEE TSE](https://doi.org/10.1109/TSE.2010.55)
192
+ - **Tornhill & Borg (2022)** analyzed 39 proprietary codebases and found that low-quality code (by their Code Health metric) contains 15x more defects and takes 124% longer to resolve. In their case studies, 4% of the codebase was responsible for 72% of all defects. — [ACM/IEEE TechDebt 2022](https://arxiv.org/abs/2203.04374)
193
+
194
+ The general approach was popularized by Adam Tornhill's *Your Code as a Crime Scene* (2015), which applies forensic analysis techniques to version control history.
195
+
113
196
  ## Limitations
114
197
 
115
198
  - **Churn = commit count**, not lines changed. A one-line typo fix counts the same as a 500-line rewrite.
package/dist/cli.js CHANGED
@@ -129,6 +129,82 @@ function getAuthors(months) {
129
129
  }
130
130
  return counts;
131
131
  }
132
+ var MAX_FILES_PER_COMMIT = 20;
133
+ function getCoChanges(months, excludes = []) {
134
+ const patterns = [...DEFAULT_EXCLUDES, ...excludes.map(globToRegex)];
135
+ let raw;
136
+ try {
137
+ raw = execSync(
138
+ `git log --since="${months} months ago" --format="COMMIT_SEP%n" --name-only`,
139
+ { maxBuffer: 50 * 1024 * 1024, stdio: ["pipe", "pipe", "pipe"] }
140
+ );
141
+ } catch {
142
+ throw new Error("Not a git repository or git is not installed.");
143
+ }
144
+ const cochanges = /* @__PURE__ */ new Map();
145
+ const commits = raw.toString().split("COMMIT_SEP\n");
146
+ for (const commit of commits) {
147
+ if (!commit.trim()) continue;
148
+ const seen = /* @__PURE__ */ new Set();
149
+ for (const line of commit.split("\n")) {
150
+ const trimmed = normalizePath(line.trim());
151
+ if (!trimmed) continue;
152
+ if (!isExcluded(trimmed, patterns)) {
153
+ seen.add(trimmed);
154
+ }
155
+ }
156
+ const files = [...seen];
157
+ if (files.length < 2 || files.length > MAX_FILES_PER_COMMIT) continue;
158
+ for (let i = 0; i < files.length; i++) {
159
+ for (let j = i + 1; j < files.length; j++) {
160
+ const [a, b] = files[i] < files[j] ? [files[i], files[j]] : [files[j], files[i]];
161
+ const dirA = a.includes("/") ? a.slice(0, a.lastIndexOf("/")) : "";
162
+ const dirB = b.includes("/") ? b.slice(0, b.lastIndexOf("/")) : "";
163
+ if (dirA === dirB) continue;
164
+ const key = `${a}\0${b}`;
165
+ cochanges.set(key, (cochanges.get(key) ?? 0) + 1);
166
+ }
167
+ }
168
+ }
169
+ return cochanges;
170
+ }
171
+ function computeCoupling(cochanges, churn, complexityMap, minCochanges) {
172
+ const entries = [];
173
+ for (const [key, count] of cochanges) {
174
+ if (count < minCochanges) continue;
175
+ const [file1, file2] = key.split("\0");
176
+ const minChurn = Math.min(churn.get(file1) ?? 0, churn.get(file2) ?? 0);
177
+ const degree = minChurn > 0 ? Math.round(count / minChurn * 1e3) / 10 : 0;
178
+ const totalComplexity = (complexityMap.get(file1) ?? 0) + (complexityMap.get(file2) ?? 0);
179
+ entries.push({
180
+ file1,
181
+ file2,
182
+ cochanges: count,
183
+ degree,
184
+ totalComplexity,
185
+ couplingScore: count,
186
+ percentOfTotal: 0,
187
+ tier: "stable"
188
+ });
189
+ }
190
+ entries.sort((a, b) => b.couplingScore - a.couplingScore);
191
+ const totalScore = entries.reduce((sum, e) => sum + e.couplingScore, 0);
192
+ if (totalScore === 0) return [];
193
+ let cumulative = 0;
194
+ for (const entry of entries) {
195
+ entry.percentOfTotal = Math.round(entry.couplingScore / totalScore * 1e3) / 10;
196
+ cumulative += entry.couplingScore;
197
+ const cumulativeShare = cumulative / totalScore;
198
+ if (cumulativeShare <= DANGER_CUMULATIVE) {
199
+ entry.tier = "danger";
200
+ } else if (cumulativeShare <= WATCH_CUMULATIVE) {
201
+ entry.tier = "watch";
202
+ } else {
203
+ entry.tier = "stable";
204
+ }
205
+ }
206
+ return entries;
207
+ }
132
208
  function getNestingDepths(filePaths) {
133
209
  const depths = /* @__PURE__ */ new Map();
134
210
  for (const filePath of filePaths) {
@@ -231,23 +307,58 @@ function formatHotspotsTable(output) {
231
307
  lines.push(
232
308
  `Hotspots \u2014 ${churnWindow} churn window | Total score: ${totalScore.toLocaleString()}`
233
309
  );
234
- lines.push(
235
- `Tiers: ${tierCounts.danger} danger, ${tierCounts.watch} watch, ${tierCounts.stable} stable`
236
- );
237
- lines.push(`Showing: ${output.showing} of ${output.totalHotspots}`);
238
- lines.push("");
310
+ pushTierSummary(lines, tierCounts, output.showing, output.totalHotspots);
239
311
  lines.push(
240
312
  padRight("File", 50) + padLeft("Score", 8) + padLeft("%", 7) + padLeft("Churn", 7) + padLeft("Cmplx", 7) + padLeft("Dens", 7) + padLeft("Dfcts", 6) + padLeft("Nest", 6) + padLeft("Auth", 6) + padLeft("Tier", 8)
241
313
  );
242
314
  lines.push("\u2500".repeat(112));
243
315
  for (const h of hotspots) {
244
- const tierLabel = h.tier === "danger" ? "DANGER" : h.tier === "watch" ? "WATCH" : "stable";
245
316
  lines.push(
246
- padRight(truncate(h.file, 48), 50) + padLeft(h.hotspotScore.toLocaleString(), 8) + padLeft(h.percentOfTotal.toFixed(1), 7) + padLeft(String(h.churn), 7) + padLeft(String(h.complexity), 7) + padLeft(h.complexityDensity.toFixed(2), 7) + padLeft(String(h.defects), 6) + padLeft(String(h.maxNesting), 6) + padLeft(String(h.authors), 6) + padLeft(tierLabel, 8)
317
+ padRight(truncate(h.file, 48), 50) + padLeft(h.hotspotScore.toLocaleString(), 8) + padLeft(h.percentOfTotal.toFixed(1), 7) + padLeft(String(h.churn), 7) + padLeft(String(h.complexity), 7) + padLeft(h.complexityDensity.toFixed(2), 7) + padLeft(String(h.defects), 6) + padLeft(String(h.maxNesting), 6) + padLeft(String(h.authors), 6) + padLeft(tierLabel(h.tier), 8)
318
+ );
319
+ }
320
+ lines.push("");
321
+ lines.push(
322
+ "Score=complexity\xD7churn | Dens=complexity/code | Dfcts=fix commits | Nest=max indent depth | Auth=unique authors"
323
+ );
324
+ lines.push("Docs: https://github.com/wbern/obscene#metrics");
325
+ return lines.join("\n");
326
+ }
327
+ function formatCouplingTable(output) {
328
+ const lines = [];
329
+ const { tierCounts, totalScore, churnWindow, couplings } = output;
330
+ lines.push(
331
+ `Coupling \u2014 ${churnWindow} churn window | Min shared: ${output.minCochanges} | Total score: ${totalScore.toLocaleString()}`
332
+ );
333
+ pushTierSummary(lines, tierCounts, output.showing, output.totalCouplings);
334
+ lines.push(
335
+ padRight("File 1", 35) + padRight("File 2", 35) + padLeft("Shared", 7) + padLeft("Degree", 8) + padLeft("Cmplx", 7) + padLeft("Tier", 8)
336
+ );
337
+ lines.push("\u2500".repeat(100));
338
+ for (const c of couplings) {
339
+ lines.push(
340
+ padRight(truncate(c.file1, 33), 35) + padRight(truncate(c.file2, 33), 35) + padLeft(String(c.cochanges), 7) + padLeft(`${c.degree.toFixed(1)}%`, 8) + padLeft(String(c.totalComplexity), 7) + padLeft(tierLabel(c.tier), 8)
247
341
  );
248
342
  }
343
+ lines.push("");
344
+ lines.push(
345
+ "Shared=co-changed commits | Degree=shared/min(churn)\xD7100 | Cmplx=sum of both files"
346
+ );
347
+ lines.push("Docs: https://github.com/wbern/obscene#metrics");
249
348
  return lines.join("\n");
250
349
  }
350
+ function pushTierSummary(lines, tierCounts, showing, total) {
351
+ lines.push(
352
+ `Tiers: ${tierCounts.danger} danger, ${tierCounts.watch} watch, ${tierCounts.stable} stable`
353
+ );
354
+ lines.push(`Showing: ${showing} of ${total}`);
355
+ lines.push("");
356
+ }
357
+ function tierLabel(tier) {
358
+ if (tier === "danger") return "DANGER";
359
+ if (tier === "watch") return "WATCH";
360
+ return "stable";
361
+ }
251
362
  function padRight(s, n) {
252
363
  return s.length >= n ? s : s + " ".repeat(n - s.length);
253
364
  }
@@ -260,7 +371,7 @@ function truncate(s, max) {
260
371
 
261
372
  // src/cli.ts
262
373
  var program = new Command();
263
- program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("0.2.0");
374
+ program.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("0.3.0");
264
375
  function addSharedOptions(cmd) {
265
376
  return cmd.option("--top <n>", "limit to top N entries (0 = all)", "20").option("--format <type>", "output format: json | table", "json").option(
266
377
  "--exclude <patterns...>",
@@ -285,6 +396,17 @@ addSharedOptions(
285
396
  exitWithError(err);
286
397
  }
287
398
  });
399
+ addSharedOptions(
400
+ program.command("coupling").description(
401
+ "temporal coupling \u2014 files that change together across directories"
402
+ )
403
+ ).option("--months <n>", "churn window in months", "3").option("--min-cochanges <n>", "minimum shared commits to include", "2").action((opts) => {
404
+ try {
405
+ runCoupling(opts);
406
+ } catch (err) {
407
+ exitWithError(err);
408
+ }
409
+ });
288
410
  function runReport(opts) {
289
411
  const top = parseInt(opts.top, 10);
290
412
  const files = runScc(opts.exclude);
@@ -353,6 +475,47 @@ function runHotspots(opts) {
353
475
  `);
354
476
  }
355
477
  }
478
+ function runCoupling(opts) {
479
+ const top = parseInt(opts.top, 10);
480
+ const months = parseInt(opts.months, 10);
481
+ const minCochanges = parseInt(opts.minCochanges, 10);
482
+ const files = runScc(opts.exclude);
483
+ const churn = getChurn(months);
484
+ const cochanges = getCoChanges(months, opts.exclude);
485
+ const complexityMap = /* @__PURE__ */ new Map();
486
+ for (const f of files) {
487
+ complexityMap.set(f.file, f.complexity);
488
+ }
489
+ const couplings = computeCoupling(
490
+ cochanges,
491
+ churn,
492
+ complexityMap,
493
+ minCochanges
494
+ );
495
+ const limited = top > 0 ? couplings.slice(0, top) : couplings;
496
+ const tierCounts = { danger: 0, watch: 0, stable: 0 };
497
+ for (const c of couplings) {
498
+ tierCounts[c.tier]++;
499
+ }
500
+ const totalScore = couplings.reduce((sum, c) => sum + c.couplingScore, 0);
501
+ const output = {
502
+ generated: (/* @__PURE__ */ new Date()).toISOString(),
503
+ churnWindow: `${months} months`,
504
+ minCochanges,
505
+ totalScore,
506
+ tierCounts,
507
+ totalCouplings: couplings.length,
508
+ showing: limited.length,
509
+ couplings: limited
510
+ };
511
+ if (opts.format === "table") {
512
+ process.stdout.write(`${formatCouplingTable(output)}
513
+ `);
514
+ } else {
515
+ process.stdout.write(`${JSON.stringify(output, null, 2)}
516
+ `);
517
+ }
518
+ }
356
519
  function exitWithError(err) {
357
520
  const message = err instanceof Error ? err.message : String(err);
358
521
  process.stderr.write(`Error: ${message}
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@wbern/obscene",
3
- "version": "0.2.0",
3
+ "version": "0.3.0",
4
4
  "description": "Identify hotspot files — complex code that changes frequently. Churn × complexity analysis for any git repo.",
5
5
  "type": "module",
6
6
  "bin": {