unslop 0.1.5 → 0.1.7

Files changed (2)
  1. package/README.md +148 -64
  2. package/package.json +6 -6
package/README.md CHANGED
@@ -2,8 +2,6 @@
 
  Standalone CLI tool for detecting duplicated code, dead code, inlined utilities, and semantic anti-patterns in AI-generated codebases. No AI/LLM in the detection pipeline — deterministic analysis only.
 
- Like `react-doctor` but for de-slopping entire codebases.
-
  ## Install
 
  ```bash
@@ -55,36 +53,160 @@ Use `--changed-only` to focus on uncommitted files and prioritize reuse against
 
  ## What It Detects
 
- | Category | Engine | Reliability |
- |----------|--------|:-----------:|
- | Identical constants across files | Clone | 99% |
- | Copy-pasted functions (same names) | Clone | 99% |
- | Copy-pasted functions (renamed params) | Clone | 95% |
- | Reformatted JSX components | Clone | 90% |
- | Similar functions (small edits) | Clone | 80% |
- | Equivalent regex patterns | Clone | 95% |
- | Cross-package export matches | Clone | 99% |
- | `a>b?a:b` `Math.max` | Oxlint | 99% |
- | Dead branches | Oxlint | 90% |
- | Unreachable code | Oxlint | 99% |
- | Inlined utilities | Oxlint | 95% |
- | Dead exports | Clone | 95% |
- | Complexity budget breaches | Practices | 90% |
- | Boundary violations (`packages -> apps`, cross-app deps, cycles) | Practices | 90-98% |
- | Ignored Go errors | Practices | 93% |
- | Go context leaks | Practices | 87% |
- | React broad barrel re-exports | Practices | 78-86% |
+ ### Quick Reference
+
+ | Category | Engine | Reliability |
+ | -------------------------------------- | --------- | :---------: |
+ | Identical constants across files | Clone | 99% |
+ | Copy-pasted functions (same names) | Clone | 99% |
+ | Copy-pasted functions (renamed params) | Clone | 95% |
+ | Reformatted JSX components | Clone | 90% |
+ | Similar functions (small edits) | Clone | 80% |
+ | Equivalent regex patterns | Clone | 95% |
+ | Cross-package export matches | Clone | 99% |
+ | `a>b?a:b` `Math.max` | Oxlint | 99% |
+ | Dead branches | Oxlint | 90% |
+ | Unreachable code | Oxlint | 99% |
+ | Inlined utilities | Oxlint | 95% |
+ | Dead exports | Clone | 95% |
+ | Complexity budget breaches | Practices | 90% |
+
+ ## Rules Reference
+
+ 10 rules across 3 engines. Use `unslop --list-rules` to see defaults for your config.
+
+ ---
+
+ ### Engine 1: Clone Detection (3 rules)
+
+ Tree-sitter parses source into a CST, then a language plugin normalizes it to a language-agnostic tree with alpha-renamed identifiers (`a`, `b`, `c` instead of real names). All three rules operate on these normalized trees.
+
+ #### `exact-clone` — Tier A, Error
+
+ SHA-256 fingerprint of the entire normalized tree. Two fragments with identical hashes are exact clones.
+
+ - **Algorithm**: deterministic S-expression serialization `(Kind:Label child1 child2 ...)` → SHA-256. O(n) grouping by hash.
+ - **Minimum size**: 50 tokens, 8 nodes.
+ - **Filters**: same-file duplicates, import-linked pairs, rule boilerplate scaffolding (<30 line span in rules scaffold paths), repeated framework idioms (3+ occurrences with diverse names across 2+ files).
+
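The `exact-clone` steps described above (serialize, hash, group) can be sketched as follows. This is a minimal illustration, not unslop's implementation: the `Node` type, the helper names, and the fragment labels are hypothetical, and the minimum-size checks and filters are left out.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical normalized tree node (kind, alpha-renamed label, children)."""
    kind: str
    label: str = ""
    children: list = field(default_factory=list)


def serialize(node):
    """Deterministic S-expression: (Kind:Label child1 child2 ...)."""
    head = f"{node.kind}:{node.label}"
    parts = [head] + [serialize(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


def fingerprint(node):
    """SHA-256 hex digest of the serialized tree."""
    return hashlib.sha256(serialize(node).encode("utf-8")).hexdigest()


def group_exact_clones(fragments):
    """Group (name, tree) pairs by fingerprint in one pass; keep groups of 2+."""
    groups = defaultdict(list)
    for name, tree in fragments:
        groups[fingerprint(tree)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Because identifiers are already alpha-renamed before hashing, two functions that differ only in parameter names serialize to the same string and land in the same group.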
+ #### `near-clone` — Tier A, Warning
+
+ Finds near-miss duplicates via suffix tree / LCS analysis on linearized token sequences.
+
+ - **Linearization**: pre-order traversal of normalized tree → flat token sequence with `^` sentinel tokens marking end-of-children.
+ - **Bucketing**: logarithmic buckets by token count (20–39 → bucket 0, 40–79 → bucket 1, etc.) to avoid O(n²). Each sequence placed in adjacent bucket +1 for cross-boundary matches.
+ - **Algorithm**: rolling DP longest common substring (two rows, O(n) space), capped at **220,000 DP cells** per pair. Falls back to bounded longest common subsequence for edited clones with insertions/deletions.
+ - **Threshold**: **80% similarity** (`matchLen / max(len(A), len(B))`).
+ - **Limits**: max 10,000 comparisons, min 50 tokens.
+ - **Post-processing**: deterministic one-to-one matching per file pair — sorts by similarity descending, prefers same-name pairs, greedy assignment (each fragment matched at most once per file pair). Excludes declarative clones (different names + no control flow or ≤3 statements).
+
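As a rough illustration of the bucketing and the two-row longest-common-substring step listed above, here is a small sketch. The function names are hypothetical, and treating an over-budget pair as "skip" is an assumption: the README only says the DP is capped at 220,000 cells. The subsequence fallback and post-processing are not shown.

```python
def bucket(token_count, base=20):
    """Logarithmic bucket: 20-39 -> 0, 40-79 -> 1, 80-159 -> 2, ..."""
    if token_count < base:
        return None  # too small to bucket
    return (token_count // base).bit_length() - 1


def longest_common_substring(a, b, max_cells=220_000):
    """Length of the longest common contiguous run, using two DP rows.

    Returns None when the pair would exceed the cell budget
    (assumed behavior for this sketch).
    """
    if len(a) * len(b) > max_cells:
        return None
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b):
    """matchLen / max(len(A), len(B)); 0.0 for empty or over-budget pairs."""
    if not a or not b:
        return 0.0
    match_len = longest_common_substring(a, b)
    if match_len is None:
        return 0.0
    return match_len / max(len(a), len(b))
```

With the documented 80% threshold, a pair would be reported when `similarity(a, b) >= 0.8`.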
+ #### `structural-similarity` — Tier A, Warning
+
+ PQ-gram tree profiling for structural similarity between same-kind fragments.
+
+ - **Algorithm**: PQ-gram with **p=2** ancestors, **q=3** siblings. Each gram = stem of p ancestor labels + base of q sibling labels. Similarity via **Sørensen–Dice coefficient**: `2 * |intersection| / (|A| + |B|)`.
+ - **Eligibility**: same fragment Kind only (function-to-function, class-to-class), different files, min 8 nodes, size ratio ≤ 1.8×.
+ - **Thresholds** (variable by file type):
+ - Both production files: **70%**
+ - One test + one production: **86%**
+ - Both test files: **90%**
+ - **Limits**: max 50,000 comparisons. Profiles cached per fragment.
+
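The PQ-gram profile and Dice comparison above can be sketched like this. It follows the common textbook formulation, in which the stem is the node plus its nearest ancestors (p labels in total, padded with `*`); whether unslop builds its stems exactly this way is an assumption. Trees are plain `(label, children)` tuples, and all function names are hypothetical.

```python
from collections import Counter


def pq_profile(tree, p=2, q=3):
    """Multiset of pq-grams for a tree given as (label, [children])."""
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = (ancestors + (label,))[-p:]
        if not children:
            profile[stem + ("*",) * q] += 1  # leaf: base is all padding
            return
        window = ("*",) * q
        for child in children:
            window = window[1:] + (child[0],)  # slide over sibling labels
            profile[stem + window] += 1
            visit(child, stem)
        for _ in range(q - 1):  # trailing padded windows
            window = window[1:] + ("*",)
            profile[stem + window] += 1

    visit(tree, ("*",) * p)
    return profile


def dice(a, b):
    """Sorensen-Dice on multisets: 2 * |A ∩ B| / (|A| + |B|)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 2 * sum((a & b).values()) / total


def threshold(a_is_test, b_is_test):
    """Similarity threshold by file type, per the list above."""
    if a_is_test and b_is_test:
        return 0.90
    if a_is_test or b_is_test:
        return 0.86
    return 0.70
```

A pair would then be flagged when `dice(pq_profile(t1), pq_profile(t2)) >= threshold(...)`, after the eligibility checks.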
+ ---
+
+ ### Engine 2: Linter Integration (5 rules)
+
+ Wraps external linters (e.g. oxlint) run as subprocesses with configurable timeouts. Parses JSON output into findings. Non-fatal — a missing linter produces a warning, not an error.
+
+ #### `inlined-utility` — Tier A, Warning
+
+ Hand-rolled code that could use a standard library or utility call (e.g. `a > b ? a : b` → `Math.max`).
+
+ #### `dead-code` — Tier A, Error, Gateable
+
+ Unreachable code blocks that can never execute.
+
+ #### `dead-branch` — Tier A, Warning
+
+ Conditional branches (if/else, ternary) that can never be taken.
+
+ #### `duplication` — Tier A, Warning
+
+ Duplicate imports, exports, or repeated patterns detected by the linter.
+
+ #### `dead-export` — Tier A, Warning
+
+ Exported symbols that nothing imports.
+
+ ---
+
+ ### Engine 3: Best Practices (1 rule)
+
+ In-process deterministic checks — pattern matching and metric computation. No external dependencies.
+
+ #### `complexity-budget` — Tier A, Warning
+
+ Flags functions that exceed thresholds on **2+ of 3 metrics** simultaneously.
+
+ | Metric | Strict | Balanced (default) | Lenient |
+ | -------------------------------------- | -----: | -----------------: | ------: |
+ | Decision count (if/loop/switch/select) | ≥10 | ≥12 | ≥14 |
+ | Max nesting depth | ≥3 | ≥4 | ≥5 |
+ | Source lines | ≥75 | ≥90 | ≥110 |
+
+ - Functions under 20 lines are always excluded.
+ - **Headroom** for special paths: tooling/infrastructure paths get +20 decisions, +3 nesting, +80 lines. Script functions (`/scripts/`, `.mjs`/`.cjs`, `main`/`walk`) get +14/+2/+40. Orchestration functions (`/cmd/`, `/routes/`, `/handlers/`, names starting with `handle`/`route`/`serve`) get +2/+1/+25.
+ - Also fires on **single-metric extreme outliers**: decisions ≥ threshold+8, nesting ≥ threshold+2, or lines ≥ threshold+40 (when some complexity is also present).
+
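The core "2+ of 3 metrics" check from the table above might look like the sketch below. The function and table names are hypothetical. Path-based headroom and the single-metric outlier rule are deliberately left out, since the README does not spell out what "some complexity is also present" means.

```python
# (decisions, nesting, lines) thresholds per profile, copied from the table above.
THRESHOLDS = {
    "strict": (10, 3, 75),
    "balanced": (12, 4, 90),
    "lenient": (14, 5, 110),
}


def exceeds_budget(decisions, nesting, lines, profile="balanced"):
    """True when a function breaches at least two of the three metrics."""
    if lines < 20:
        return False  # functions under 20 lines are always excluded
    max_decisions, max_nesting, max_lines = THRESHOLDS[profile]
    breaches = (
        (decisions >= max_decisions)
        + (nesting >= max_nesting)
        + (lines >= max_lines)
    )
    return breaches >= 2
```

For example, under the balanced profile a 50-line function with 12 decisions and nesting depth 4 is flagged, while the same function at nesting depth 3 is not.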
+ ---
+
+ ### Slop Score
+
+ A 0–100 weighted composite of findings and structural smell metrics.
+
+ **Category weights** (base, before profile multiplier):
+
+ | Category | Weight | Cap |
+ | --------------------- | -----: | --: |
+ | exact-clone | 9 | 18 |
+ | near-clone | 6 | 18 |
+ | structural-similarity | 5 | 18 |
+ | dead-code | 5 | 18 |
+ | complexity-budget | 5 | 14 |
+ | inlined-utility | 4 | 18 |
+ | duplication | 4 | 18 |
+ | dead-export | 4 | 18 |
+ | dead-branch | 3 | 18 |
+
+ **Profile multipliers**: strict = 1.2×, balanced = 1.0×, lenient = 0.8×.
+
+ **Per-finding formula**: `points = weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1)))`. Category sums are capped, then total finding points capped at 70.
+
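To make the per-finding formula and the two capping steps concrete, here is a small worked sketch. The names are hypothetical, findings are modeled as `(category, confidence, locations)` tuples, and it assumes the per-category caps are not themselves scaled by the profile multiplier, which the README does not state.

```python
from collections import defaultdict

# category -> (weight, cap), copied from the table above.
WEIGHTS = {
    "exact-clone": (9, 18),
    "near-clone": (6, 18),
    "structural-similarity": (5, 18),
    "dead-code": (5, 18),
    "complexity-budget": (5, 14),
    "inlined-utility": (4, 18),
    "duplication": (4, 18),
    "dead-export": (4, 18),
    "dead-branch": (3, 18),
}
PROFILE_MULTIPLIER = {"strict": 1.2, "balanced": 1.0, "lenient": 0.8}


def finding_points(category, confidence, locations, profile="balanced"):
    """Points for one finding, following the per-finding formula."""
    weight, _cap = WEIGHTS[category]
    confidence_factor = 0.6 + 0.4 * confidence
    location_factor = 1.0 + min(0.5, 0.1 * (locations - 1))
    return weight * PROFILE_MULTIPLIER[profile] * confidence_factor * location_factor


def findings_score(findings, profile="balanced"):
    """Sum per category, cap each category, then cap the total at 70."""
    per_category = defaultdict(float)
    for category, confidence, locations in findings:
        per_category[category] += finding_points(category, confidence, locations, profile)
    capped = sum(min(total, WEIGHTS[cat][1]) for cat, total in per_category.items())
    return min(capped, 70.0)
```

A single `exact-clone` finding with confidence 1.0 across two locations scores 9 × 1.0 × 1.0 × 1.1 = 9.9 points under the balanced profile; three such findings sum to 29.7 and are capped at 18.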
+ **Smell metrics** (up to 30 points):
+
+ - **wrapper-function-density** (45% weight) — single-statement wrapper functions as a ratio of total functions.
+ - **trivial-declaration-density** (35% weight) — declarations with ≤6 nodes and no control flow.
+ - **reused-fragment-names** (20% weight) — function/class/variable names appearing in 3+ distinct files.
+
+ **Score bands**:
+
+ | Range | Band |
+ | -----: | -------- |
+ | 0–14 | Minimal |
+ | 15–34 | Low |
+ | 35–59 | Moderate |
+ | 60–79 | High |
+ | 80–100 | Severe |
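Mapping a score to its band is a simple range lookup. A minimal sketch follows, with hypothetical names, assuming fractional scores fall into the band of the range start just below them.

```python
from bisect import bisect_right

# Lower bound of each band, copied from the table above.
BAND_STARTS = [0, 15, 35, 60, 80]
BAND_NAMES = ["Minimal", "Low", "Moderate", "High", "Severe"]


def score_band(score):
    """Return the band name for a slop score in the range 0-100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return BAND_NAMES[bisect_right(BAND_STARTS, score) - 1]
```

For instance, a score of 42 falls in the Moderate band, and 35 (the `max_score` used in the gate example further down in this README) is the lowest Moderate score.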
 
  ## Architecture
 
- Four artifact engines run in parallel and are evaluated through a rule catalog:
+ Three artifact engines run in parallel and are evaluated through a rule catalog:
 
  1. **Clone Detection** (in-process) — tree-sitter parsing → pre-normalization (cached) → alpha-renaming → hash/suffix-tree/PQ-gram comparison
  2. **Semantic Analysis** (subprocess) — external linter diagnostics
- 3. **Best Practices** (in-process) — deterministic clean-code and architecture checks
- 4. **Repo Policy** (in-process) — repository policy signals
+ 3. **Best Practices** (in-process) — deterministic clean-code checks
 
- Each rule is now a first-class catalog entry with defaults and per-rule controls.
+ Each rule is a first-class catalog entry with defaults and per-rule controls.
 
  Built on Semgrep's model: **tree-sitter CST → language-specific normalizer → generic normalized tree → language-agnostic analysis**.
 
@@ -149,8 +271,6 @@ rules:
  exact-clone:
  severity: error
  gateable: true
- barrel-import:
- enabled: false
  complexity-budget:
  paths:
  include: ["apps/**"]
@@ -158,7 +278,7 @@ rules:
 
  gates:
  max_score: 35
- fail_on_rules: ["exact-clone", "boundary-violation-cycle"]
+ fail_on_rules: ["exact-clone"]
  fail_on_tiers: ["A"]
 
  slop_score:
@@ -166,42 +286,6 @@ slop_score:
  top_contributors: 5
  ```
 
- See `/Users/danieledeling/projects/go/unslop/docs/config-v2.md` and `/Users/danieledeling/projects/go/unslop/docs/rules.md` for full details.
-
- ## v2 Breaking Change
-
- Configuration is now strict v2:
-
- - `version: 2` is required when using a config file
- - unknown top-level keys fail parsing
- - unknown non-legacy rule IDs fail parsing
- - removed legacy rule IDs are silently ignored in config/CLI rule selectors
- - unknown fields under `rules.<id>` fail parsing
- - there is no v1 compatibility bridge
-
- Example terminal section (`--verbose`):
-
- ```text
- ── high-priority (2) ──
-
- [exact-clone] Exact duplicate: parseConfig
- ./src/a.ts:10
- ./src/b.ts:12
-
- ── low-priority (1) ──
-
- [exact-clone] Exact duplicate: makeTempDir
- ./src/foo.test.ts:5
- ./src/bar.test.ts:9
-
- ── slop-score ──
-
- Slop Score: 42/100 (Moderate)
- Top contributors:
- exact-clone: +18.4 (3 findings)
- wrapper-function-density: +8.7 (31% of functions)
- ```
-
  ## Development
 
  ```bash
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "unslop",
- "version": "0.1.5",
+ "version": "0.1.7",
  "description": "Detect duplicated code, dead code, and anti-patterns in AI-generated codebases",
  "bin": {
  "unslop": "bin.js"
@@ -9,11 +9,11 @@
  "bin.js"
  ],
  "optionalDependencies": {
- "@unslop/darwin-arm64": "0.1.5",
- "@unslop/darwin-x64": "0.1.5",
- "@unslop/linux-arm64": "0.1.5",
- "@unslop/linux-x64": "0.1.5",
- "@unslop/win32-x64": "0.1.5"
+ "@unslop/darwin-arm64": "0.1.7",
+ "@unslop/darwin-x64": "0.1.7",
+ "@unslop/linux-arm64": "0.1.7",
+ "@unslop/linux-x64": "0.1.7",
+ "@unslop/win32-x64": "0.1.7"
  },
  "license": "Apache-2.0",
  "repository": {