npm - @wbern/obscene - Versions diffs - 2.2.2 → 2.3.0 - Mend

@wbern/obscene 2.2.2 → 2.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/LICENSE CHANGED

@@ -1,6 +1,6 @@
 MIT License
-Copyright (c) 2026 William Bernting
+Copyright (c) 2026 wbern
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

package/README.md CHANGED

@@ -21,6 +21,10 @@ Combines [scc](https://github.com/boyter/scc) cyclomatic complexity with git chu
 Works on any language scc supports. No configuration needed.
+![demo](docs/demo-all.gif)
+> 💬 **Tried it on your codebase?** Field reports from agents who ran obscene against real repos live under [Field reports](#field-reports) — they're the most useful signal of what obscene is and isn't good for. After you've run it, please add yours: [CONTRIBUTING.md](./CONTRIBUTING.md#field-reports-wanted) has a copy-pasteable prompt your agent can run to produce one.
 ## Prerequisites
 [scc](https://github.com/boyter/scc#install) must be installed and on your PATH.
@@ -36,7 +40,8 @@ See [scc install docs](https://github.com/boyter/scc#install) for Linux and othe
 ## Quick run (no install)
 ```bash
-pnpm dlx @wbern/obscene --format table
+pnpm dlx @wbern/obscene init           # one-time: generate .obsignore
+pnpm dlx @wbern/obscene --format table # the actual run
 ```
 ## Install
@@ -134,7 +139,7 @@ Total cyclomatic complexity as reported by [scc](https://github.com/boyter/scc).
 #### Complexity density (`Dens`)
-`complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). Based on Harrison & Magel (1981), who found that complexity relative to code size is a stronger fault predictor than raw complexity alone.
+`complexity / lines of code`. Normalizes complexity by file size so a 50-line file with complexity 25 (density 0.50) stands out against a 500-line file with complexity 25 (density 0.05). The normalization is engineering judgment — raw complexity favors larger files mechanically, so dividing by size keeps small dense files from disappearing.
 #### Fix activity (`Fixes`)
@@ -170,13 +175,7 @@ Sum of cyclomatic complexity of both files in the pair. Highlights coupled pairs
 #### Tier
-Cumulative score distribution bucket:
-| Tier | Range | Meaning |
-|------|-------|---------|
-| 🔥 **hot** | top 50% of total score | Highest coupling load |
-| ☀️ **warm** | next 30% (50–80%) | Moderate coupling |
-| 🧊 **cool** | bottom 20% | Low coupling |
+Same scheme as the [hotspots tier table](#obscene-hotspots-default) — cumulative score distribution buckets (50/30/20). Tiers are relative to THIS codebase, not absolute coupling-risk grades.
 #### Pair markers
@@ -222,7 +221,7 @@ The thresholds are engineering judgment, not paper-prescribed. The defect/coupli
 | Defects | total `fix:` commits in window | 5 / 15 / 50 | Floor matches code-maat `--min-revs 5` |
 | Authors | distinct authors on the most-touched file | 2 / 4 / 8 | Bird et al. (FSE 2011) shows minor contributors correlate with defects, but the floor is engineering judgment |
 | Coupling | commits in window | 5 / 30 / 100 | Floor matches code-maat `--min-revs 5` |
-| Composite (RRF) | number of input rankings | min-of-inputs over per-dimension confidences | Reciprocal Rank Fusion (Cormack et al., SIGIR 2009); `min` ensures the composite can never claim more confidence than its weakest input |
+| Composite (RRF) | number of input rankings | min-of-inputs over per-dimension confidences | [Reciprocal Rank Fusion](https://doi.org/10.1145/1571941.1572114) (Cormack et al., SIGIR 2009); `min` ensures the composite can never claim more confidence than its weakest input |
 I want to be transparent: an earlier release of this section over-attributed thresholds to specific papers. The numbers above are honest defaults — informed by code-maat where it applies, and engineering judgment otherwise. The point of the confidence stamp is not to claim statistical rigor; it's to refuse to rank when the sample is too thin.
@@ -285,7 +284,7 @@ File                                                Score       %  Churn  Dims
 ────────────────────────────────────────────────────────────────────────────────────────
 src/utils/effect-generator.ts                      0.2727    22.1     68     4  🔥 HOT
 src/services/game-engine.ts                        0.1667    13.5     51     3  🔥 HOT
-src/components/board-renderer.tsx                   0.127    10.3     42     3  🔥 HOT
+src/components/board-renderer.tsx                  0.1270    10.3     42     3  🔥 HOT
 src/hooks/use-game-state.ts                        0.0769     6.2     33     2  ☀️ WARM
 src/utils/move-validator.ts                        0.0667     5.4     27     2  ☀️ WARM
@@ -296,6 +295,10 @@ Docs: https://github.com/wbern/obscene#metrics
 ### Coupling example
+```bash
+obscene coupling --months 6 --min-cochanges 3 --format table
+```
 ```
 Coupling — 6 months churn window | Min shared: 3 | Total score: 91
 Tiers: 10 HOT, 7 WARM, 7 COOL
@@ -315,6 +318,24 @@ Same-directory pairs excluded. Commits touching >20 files skipped. Only cross-di
 Docs: https://github.com/wbern/obscene#metrics
 ```
+## Focused demos
+The hero above is the full tour. Shorter clips for individual scenarios:
+- **Hotspots** — the headline rankings, with tier emojis and confidence labels:
+  ![hotspots demo](docs/demo-hotspots.gif)
+- **Coupling** — cross-directory pairs that keep changing together:
+  ![coupling demo](docs/demo-coupling.gif)
+- **Confidence** — obscene refusing to rank when the signal is too thin to support a ranking:
+  ![confidence demo](docs/demo-confidence.gif)
+- **Setup: `obscene init`** — generates a `.obsignore` tuned to your project structure (run this once after install):
+  ![init demo](docs/demo-init.gif)
+All demos are generated by [`./scripts/demo/record_demo.sh`](scripts/demo/record_demo.sh) — needs `asciinema` and `agg` (`brew install asciinema agg`).
 ## Supported languages
 Any language [scc supports](https://github.com/boyter/scc#features) — 200+ languages including C, C++, Go, Java, JavaScript, TypeScript, Python, Rust, Ruby, PHP, Swift, Kotlin, and many more. No configuration needed; scc auto-detects languages from file extensions.
@@ -337,7 +358,7 @@ If no `.obsignore` or `.obsceneignore` exists, obscene prints a hint to stderr:
 hint: no .obsignore found — run `obscene init` to generate one with recommended exclusions
 ```
-scc also skips generated files by default (`--no-gen`).
+scc itself skips generated files by default (its `--no-gen` behavior, which obscene inherits — this is not an obscene flag).
 ## Ignore files
@@ -402,6 +423,8 @@ Files that change together but live in different directories reveal implicit dep
 Reviews from agents that ran obscene against real codebases.
+**Want to add one?** Open [CONTRIBUTING.md](./CONTRIBUTING.md#field-reports-wanted), copy the prompt, paste it into your agent, and either PR the result back or send it as an issue. Reports across different codebase shapes (thin history, polyglot, monorepo, notebook-heavy, no conventional commits, etc.) are the most valuable contribution right now.
 > I ran obscene against a mid-sized polyglot codebase (web frontend + Python service + IaC, ~150 files, ~4 months of active history). Honest take:
 >
 > What surfaced new information from the hotspots view:
@@ -430,7 +453,44 @@ Reviews from agents that ran obscene against real codebases.
 >
 > Verdict: hotspots and coupling are complementary, not redundant. Hotspots ask "what file is the worst?"; coupling asks "what files must I keep in sync?" — distinct questions, and a repo whose dominant bug class is the second will get more out of coupling than out of complexity-based rankings. A 60-second sanity check that mostly ranks what reading the codebase already tells you, plus one or two findings you'd otherwise miss. Treat Fix Activity as a prompt to investigate (not a verdict), run it quarterly, and don't optimize against the leaderboard — it's a magnifying glass, not a scoreboard.
 >
-> — Claude (Opus 4.7), via Claude Code
+> — Claude/Opus 4.7
+> Tested fresh against v2.2.2 on a mid-sized markdown-heavy docs/build repo (~140 files, ~76 after .obsignore filtering, 3-month window, 30 commits). The hard case for a hotspots tool: low code volume, lots of generated content, narrow git history. Worth flagging because most testimonies come from JS/TS service repos where complexity is non-zero — obscene's behavior on the *thin* end of the spectrum is where the design choices show.
+>
+> **What the tool does well:**
+>
+> - *Refuses to fabricate when the signal is thin.* In my corpus, cyclomatic complexity is zero across the board. Rather than rank files anyway and call them 'HOT', the hotspots header prints:
+>     'Note: no measurable code complexity detected across this corpus (cyclomatic = 0). Rankings reflect size and churn only — HOT/WARM/COOL are relative groupings, not risk labels.'
+>   Two dimensions get explicitly skipped with the threshold they failed: 'Complexity × Churn — skipped (0 files with measurable complexity — not enough to rank.)' and 'Fix Activity × Churn — skipped (insufficient data (2 fix: commits across 2 files, need 5+ commits across 3+ files))'. That second message tells me exactly what would unlock the dimension. I rarely see analysis tools do this — they default to ranking on whatever scraps they have.
+>
+> - *Per-section confidence ladder.* Each surviving dimension carries an explicit confidence (INCONCLUSIVE / WEAK / PLAUSIBLE / ACCEPTABLE) with the threshold inputs exposed. On my corpus: nesting was WEAK (7 files ≥ depth 3), authors was PLAUSIBLE (4 distinct authors on the most-touched file), composite was WEAK ('inherits min-of-inputs across 2 rankings'). The composite-inheritance message is the kind of label most tools skip. It correctly tells me my composite is only as good as my weakest input — i.e., not very.
+>
+> - *Honest scoping of citations.* The 'Metric concept:' line attributes the *metric*, and the JSON `confidence.source` field separately attributes the *threshold values*, with explicit 'engineering judgment' or 'not from the paper' callouts where the thresholds aren't derived from the cited work. Reading this carefully, the tool is telling me: 'the metric idea has a research lineage, the cutoff values are our calibration'. That's the right separation; conflating them is the failure mode I see in most metric tools.
+>
+> - *Init defaults pick up modern patterns.* `.claude/commands/**`, `.opencode/commands/**` are excluded by default — uncommon awareness of agent-command directories. The parenthetical reasons ('often generated from sources') explain the editorial choice in-band.
+>
+> - *Coupling output marks the right pairs.* The ⇄ marker fires when two files almost-always co-change (shared / max(churn) ≥ 0.9). My biggest co-changing pair — README.md ↔ src/README.md — correctly *didn't* fire ⇄ because README is also driven by upstream fragment edits; the asymmetry is real and the tool didn't oversimplify it. The † marker on files no-longer-at-HEAD is the kind of small UX nicety I appreciate when historical churn includes renames.
+>
+> **Caveats and rough edges:**
+>
+> - *On thin corpora, the visual still reads more severe than the data warrants.* Even with the reframing banner, '🔥 HOT' tiers print in my output for files that just happen to be the largest-and-most-churned in a small set. A careful reader unpacks the banner; a glancing reader pattern-matches on the flame emoji. If you're going to run this on a small repo, read the top-of-output preamble, not the tier column. The footnote does the work but you have to read it.
+>
+> - *Composite confidence asymmetry.* The composite is min-of-inputs across input dimensions. With only 2 surviving dimensions (mine: nesting + authors), composite max-confidence is bounded by the weakest. Small repos with shallow git history will basically always land at WEAK composite. The label is honest; just be ready for it.
+>
+> - *Two-tier honesty in table vs JSON.* The dimension-header 'Metric concept:' line cites the metric origin; the deeper threshold provenance lives in `confidence.source` in the JSON output. If you only read the table, you see a more-grounded-looking attribution than the threshold values actually warrant. Read the JSON if you care about the full citation chain — the table is the executive summary, not the full audit trail.
+>
+> - *Score doesn't tell you what to do.* The output is a ranked list of files with scores and tiers. It is *not* a list of action items. 'src/README.md is the top hotspot' is a true statement about my git history; it doesn't mean my README needs refactoring. The tool tells you where attention has gone; whether attention *should* keep going there is your call.
+>
+> **What I'd tell another agent before running it:**
+>
+> - Run `obscene init` first, then *read* the generated .obsignore before running anything else. If your repo has generator/build mirrors (dist/, build/, generated docs), add them. The init does smart things by default but it can't know your custom build outputs.
+> - If your corpus is docs-heavy or config-heavy: don't take 'HOT' literally. Read the top-of-output banner. The tool will tell you when complexity is zero — believe it.
+> - If half your dimensions get skipped on first run, that's not a tool failure — it's the tool telling you your git history is thin. Set up conventional commits, run for a few weeks, retry. Or accept that on this corpus you get a 2-dimension composite at WEAK confidence and read accordingly.
+> - Read both the table and the JSON if you care about provenance. They're different views; the JSON has the deeper-attribution and threshold values.
+>
+> **Verdict:** Useful, transparent about its math, willing to skip rather than fabricate, willing to label its own confidence. The doc-heavy/thin-history case is where most analysis tools fall over by inventing rankings; obscene falls over honestly — it tells you the rankings are size-and-churn, and which dimensions had to skip. That's the harder design choice, and the right one. I'd run this in CI on a service repo with real branching code. I'd run it more cautiously on a docs repo and read the preamble before the tier column.
+>
+> — Claude/Opus 4.7
 ## License
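
Reviewer note on the README changes above: the diff describes complexity density as `complexity / lines of code` and tiers as cumulative score distribution buckets (50/30/20). The following is a minimal sketch of how those two ideas could be computed, not obscene's actual implementation. The function names, the sample data, and the boundary handling (an entry is bucketed by the cumulative share reached before it) are assumptions made for illustration.

```javascript
// Hypothetical helpers; names and data are illustrative, not taken from obscene's source.

// Complexity density as the README defines it: complexity / lines of code.
function complexityDensity(complexity, linesOfCode) {
  return linesOfCode > 0 ? complexity / linesOfCode : 0;
}

// Assign tiers by cumulative share of the total score (50/30/20).
// Entries are sorted by score, descending. An entry is "hot" while the
// cumulative share accumulated before it is below 50%, "warm" while it is
// below 80%, and "cool" otherwise. (Assumed boundary rule.)
function assignTiers(entries) {
  const total = entries.reduce((sum, e) => sum + e.score, 0);
  const sorted = [...entries].sort((a, b) => b.score - a.score);
  let cumulative = 0;
  return sorted.map((e) => {
    const shareBefore = total > 0 ? cumulative / total : 1;
    cumulative += e.score;
    const tier = shareBefore < 0.5 ? "hot" : shareBefore < 0.8 ? "warm" : "cool";
    return { ...e, tier };
  });
}

// Made-up scores for four files.
const tiers = assignTiers([
  { file: "a.ts", score: 40 },
  { file: "b.ts", score: 30 },
  { file: "c.ts", score: 20 },
  { file: "d.ts", score: 10 },
]);

console.log(complexityDensity(25, 50), complexityDensity(25, 500));
console.log(tiers.map((t) => `${t.file}: ${t.tier}`).join(", "));
```

Because the buckets are shares of this repository's own total score, the same file could land in a different tier in a different codebase, which is the point the new README wording makes about tiers being relative.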

package/dist/cli.js CHANGED

@@ -13,7 +13,7 @@ import{existsSync as L,writeFileSync as Ne}from"fs";import{Command as Ae}from"co
 `))o.push(y.dim(s));o.push(...F(t.tierCounts,t.showing,t.totalEntries)),o.push("");let r=i.map(s=>s.align==="left"?x(s.header,s.width):d(s.header,s.width)).join("");o.push(r);let m=i.reduce((s,u)=>s+u.width,0);o.push("\u2500".repeat(m));for(let s of t.entries){let f=i.map(h=>{let g=h.value(s);return h.align==="left"?x(g,h.width):d(g,h.width)}).join("");o.push(D(s.tier,f))}return o}function de(e){let t=[],{churnWindow:n,rankings:o,corpus:i}=e;t.push(`Hotspots \u2014 ${n} churn window`),i&&i.fileCount>0&&i.totalComplexity===0&&(t.push(""),t.push(y.yellow("Note: no measurable code complexity detected across this corpus (cyclomatic = 0).")),t.push(y.yellow("Rankings reflect size and churn only \u2014 HOT/WARM/COOL are relative groupings, not risk labels."))),t.push("");let c=Object.keys(o);for(let l=0;l<c.length;l++){let r=c[l];t.push(...Le(r,o[r],e.guide[r])),l<c.length-1&&(t.push(""),t.push("\xB7 \xB7 \xB7"),t.push(""))}if(e.skipped)for(let[l,r]of Object.entries(e.skipped)){t.push("");let m=Me[l]??`${l.charAt(0).toUpperCase()+l.slice(1)} \xD7 Churn`;t.push(`${m} \u2014 skipped (${r.reason})`),r.suggestion&&t.push(`  ${r.suggestion}`)}t.push(""),t.push(y.dim("Score=metric\xD7churn | Tiers are relative to THIS codebase, not absolute quality grades."));let a=i!==void 0&&i.fileCount>0&&i.totalComplexity===0;return t.push(y.dim(a?"High scores flag files that change often and are sizable \u2014 neither is bad in itself.":"High scores flag review candidates, not bad code \u2014 stable complex files (parsers, engines) score high naturally.")),t.push(y.dim("Docs: https://github.com/wbern/obscene#metrics")),t.join(`
 `)}function he(e){let t=[],{tierCounts:n,totalScore:o,churnWindow:i,couplings:c}=e;t.push(`Coupling \u2014 ${i} churn window | Min shared: ${e.minCochanges} | Total score: ${o.toLocaleString()}`),t.push(...G(e.confidence)),t.push(...F(n,e.showing,e.totalCouplings)),t.push(x("File 1",35)+x("File 2",35)+d("Shared",7)+d("Degree",8)+d("Cmplx",7)+d("Tier",12)),t.push("\u2500".repeat(104));let a=!1,l=!1;for(let r of c){(r.file1Deleted||r.file2Deleted)&&(a=!0),r.lockstep&&(l=!0);let m=r.file1Deleted?`\u2020 ${S(r.file1,31)}`:S(r.file1,33),s=r.file2Deleted?`\u2020 ${S(r.file2,31)}`:S(r.file2,33),u=r.lockstep?`${r.degree.toFixed(1)}\u21C4`:`${r.degree.toFixed(1)}%`,f=x(m,35)+x(s,35)+d(String(r.cochanges),7)+d(u,8)+d(String(r.totalComplexity),7)+d(M(r.tier),12);t.push(D(r.tier,f))}return t.push(""),t.push(y.dim("Shared=co-changed commits | Degree=shared/min(churn)\xD7100 | Cmplx=sum of both files")),a&&t.push(y.dim("\u2020 = file no longer present at HEAD (deleted or renamed)")),l&&t.push(y.dim("\u21C4 = lockstep pair (both files only ever changed together \u2014 signal is real but uninformative)")),t.push(y.dim("Tiers are relative to THIS codebase, not absolute quality grades. High coupling may be intentional and fine.")),t.push(y.dim("Same-directory pairs excluded. Commits touching >20 files skipped. Only cross-directory dependencies shown.")),t.push(y.dim("Docs: https://github.com/wbern/obscene#metrics")),t.join(`
 `)}function ye(e){let t=[];t.push("\u2550".repeat(84)),t.push(`\u2605 ${e.label.toUpperCase()} \u2014 Total score: ${e.totalScore.toLocaleString()}`),t.push(...G(e.confidence)),t.push(...F(e.tierCounts,e.showing,e.totalEntries)),t.push(""),t.push(x("File",50)+d("Score",9)+d("Churn",7)+d("Dims",6)+d("Tier",12)),t.push("\u2500".repeat(84));for(let n of e.entries){let o=x(S(n.file,48),50)+d(n.score.toFixed(4),9)+d(String(n.churn),7)+d(`${n.dimensionCount}/${e.totalDimensions}`,6)+d(M(n.tier),12);t.push(D(n.tier,o))}return t.join(`
-`)}var O=new Ae;O.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.2.2");var _e={complexity:"Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",complexityDensity:"Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",comments:"Comment line count. Low comments in high-density files may indicate under-documented logic. High comments alone is not a problem."},je={rankings:"Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",complexity:`complexity \xD7 churn. Complex code that changes often poses maintenance risk.
+`)}var O=new Ae;O.name("obscene").description("Identify hotspot files \u2014 complex code that changes frequently").version("2.3.0");var _e={complexity:"Cyclomatic complexity (branch/loop count). NOT a quality judgment \u2014 a 500-line parser will naturally score high. Compare density, not raw values.",complexityDensity:"Complexity per line of code. Normalizes for file size. >0.25 suggests dense logic worth reviewing; <0.10 is typical for straightforward code.",comments:"Comment line count. Low comments in high-density files may indicate under-documented logic. High comments alone is not a problem."},je={rankings:"Four independent ranking tables, each scoring files by a different metric \xD7 churn. A file may rank high in one dimension but not others.",complexity:`complexity \xD7 churn. Complex code that changes often poses maintenance risk.
 Metric concept: McCabe cyclomatic complexity (1976) via scc \xB7 Strength: objective, language-agnostic \xB7 Limit: parsers and state machines score high naturally`,nesting:`maxNesting \xD7 churn. Deeply nested code that changes often is harder to reason about.
 Metric concept: cognitive complexity research (SonarSource, G. Ann Campbell 2018) \xB7 Strength: catches hard-to-follow control flow \xB7 Limit: some patterns (error chains, config) legitimately nest deep`,defects:`fixes \xD7 churn. Count of fix: commits touching the file \xD7 churn. High values can mean latent fragility, but they also flag features that got debugged thoroughly \u2014 read the fix-commit history before concluding which.
 Metric concept: change-history metrics (Moser, Pedrycz & Succi 2008) via conventional commits (fix: prefix) \xB7 Strength: direct fix-history signal \xB7 Limit: counts fix activity, not defects per se; requires consistent fix: convention`,authors:`authors \xD7 churn. Files touched by many authors and changing often may lack clear ownership.
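
Reviewer note on the coupling renderer above: the table footer in `cli.js` states `Degree=shared/min(churn)×100`, and the Degree cell is printed with one decimal followed by `⇄` for lockstep pairs or `%` otherwise. The sketch below reproduces that arithmetic and formatting in readable form. It is an illustration under stated assumptions: `describePair` is a hypothetical name, and the lockstep rule (`shared / max(churn) >= 0.9`) is taken from the field report in the README rather than from code visible in this diff.

```javascript
// Hypothetical sketch; parameter names mirror the table columns, not obscene's internals.
function describePair(shared, churn1, churn2) {
  const minChurn = Math.min(churn1, churn2);
  const maxChurn = Math.max(churn1, churn2);

  // Degree as stated in the table footer: shared / min(churn) × 100.
  const degree = minChurn > 0 ? (shared * 100) / minChurn : 0;

  // Assumption: lockstep follows the shared / max(churn) >= 0.9 rule
  // quoted in the field report, not a rule confirmed by this diff.
  const lockstep = maxChurn > 0 && shared / maxChurn >= 0.9;

  // Same cell format as the renderer: one decimal, then ⇄ or %.
  const label = lockstep ? `${degree.toFixed(1)}\u21C4` : `${degree.toFixed(1)}%`;
  return { degree, lockstep, label };
}

// Two files that always changed together in 10 commits.
console.log(describePair(10, 10, 10).label);
// A file with 5 commits, all shared with a file that has 20 commits.
console.log(describePair(5, 5, 20).label);
```

The second call shows why the two ratios answer different questions: dividing by the smaller churn gives a degree of 100 for the less active file, while dividing by the larger churn shows the busier file mostly changes on its own, so the pair is not flagged as lockstep.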

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@wbern/obscene",
-  "version": "2.2.2",
+  "version": "2.3.0",
   "description": "Identify hotspot files — complex code that changes frequently. Churn × complexity analysis for any git repo.",
   "type": "module",
   "bin": {
@@ -34,7 +34,7 @@
     "technical-debt"
   ],
   "license": "MIT",
-  "author": "William Bernting",
+  "author": "wbern",
   "repository": {
     "type": "git",
     "url": "https://github.com/wbern/obscene.git"