unslop 0.1.4 → 0.1.6

Files changed (2)
  1. package/README.md +196 -85
  2. package/package.json +6 -6
package/README.md CHANGED
@@ -2,8 +2,6 @@
2
2
 
3
3
  Standalone CLI tool for detecting duplicated code, dead code, inlined utilities, and semantic anti-patterns in AI-generated codebases. No AI/LLM in the detection pipeline — deterministic analysis only.
4
4
 
5
- Like `react-doctor` but for de-slopping entire codebases.
6
-
7
5
  ## Install
8
6
 
9
7
  ```bash
@@ -42,6 +40,9 @@ unslop --format json .
42
40
  # SARIF output (for CI integration)
43
41
  unslop --format sarif .
44
42
 
43
+ # List built-in rules and defaults
44
+ unslop --list-rules
45
+
45
46
  # With config file
46
47
  unslop --config .unslop.yaml .
47
48
  ```
@@ -52,34 +53,160 @@ Use `--changed-only` to focus on uncommitted files and prioritize reuse against
52
53
 
53
54
  ## What It Detects
54
55
 
55
- | Category | Engine | Reliability |
56
- |----------|--------|:-----------:|
57
- | Identical constants across files | Clone | 99% |
58
- | Copy-pasted functions (same names) | Clone | 99% |
59
- | Copy-pasted functions (renamed params) | Clone | 95% |
60
- | Reformatted JSX components | Clone | 90% |
61
- | Similar functions (small edits) | Clone | 80% |
62
- | Equivalent regex patterns | Clone | 95% |
63
- | Cross-package export matches | Clone | 99% |
64
- | `a>b?a:b` → `Math.max` | Oxlint | 99% |
65
- | Dead branches | Oxlint | 90% |
66
- | Unreachable code | Oxlint | 99% |
67
- | Inlined utilities | Oxlint | 95% |
68
- | Dead exports | Clone | 95% |
69
- | Complexity budget breaches | Practices | 90% |
70
- | Boundary violations (`packages -> apps`, cross-app deps, cycles) | Practices | 90-98% |
71
- | Generic naming entropy | Practices | 74% |
72
- | Ignored Go errors | Practices | 93% |
73
- | Go context leaks | Practices | 87% |
74
- | React broad barrel re-exports | Practices | 78-86% |
56
+ ### Quick Reference
57
+
58
+ | Category | Engine | Reliability |
59
+ | -------------------------------------- | --------- | :---------: |
60
+ | Identical constants across files | Clone | 99% |
61
+ | Copy-pasted functions (same names) | Clone | 99% |
62
+ | Copy-pasted functions (renamed params) | Clone | 95% |
63
+ | Reformatted JSX components | Clone | 90% |
64
+ | Similar functions (small edits) | Clone | 80% |
65
+ | Equivalent regex patterns | Clone | 95% |
66
+ | Cross-package export matches | Clone | 99% |
67
+ | `a>b?a:b` → `Math.max` | Oxlint | 99% |
68
+ | Dead branches | Oxlint | 90% |
69
+ | Unreachable code | Oxlint | 99% |
70
+ | Inlined utilities | Oxlint | 95% |
71
+ | Dead exports | Clone | 95% |
72
+ | Complexity budget breaches | Practices | 90% |
73
+
74
+ ## Rules Reference
75
+
76
+ 10 rules across 3 engines. Use `unslop --list-rules` to see defaults for your config.
77
+
78
+ ---
79
+
80
+ ### Engine 1: Clone Detection (3 rules)
81
+
82
+ Tree-sitter parses source into a CST, then a language plugin normalizes it to a language-agnostic tree with alpha-renamed identifiers (`a`, `b`, `c` instead of real names). All three rules operate on these normalized trees.
83
+
84
+ #### `exact-clone` — Tier A, Error
85
+
86
+ SHA-256 fingerprint of the entire normalized tree. Two fragments with identical hashes are exact clones.
87
+
88
+ - **Algorithm**: deterministic S-expression serialization `(Kind:Label child1 child2 ...)` → SHA-256. O(n) grouping by hash.
89
+ - **Minimum size**: 50 tokens, 8 nodes.
90
+ - **Filters**: same-file duplicates, import-linked pairs, rule boilerplate scaffolding (<30 line span in rules scaffold paths), repeated framework idioms (3+ occurrences with diverse names across 2+ files).
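
The fingerprinting step described for `exact-clone` can be sketched as follows. This is a minimal illustration, not the tool's code: the `Node` class, the fragment names, and the exact whitespace of the serialization are assumptions; only the `(Kind:Label child1 child2 ...)` shape and the use of SHA-256 come from the README text.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical normalized tree node: a kind, an alpha-renamed label, children."""
    kind: str
    label: str = ""
    children: list = field(default_factory=list)


def serialize(node: Node) -> str:
    """Deterministic S-expression of the form (Kind:Label child1 child2 ...)."""
    parts = [f"{node.kind}:{node.label}"]
    parts.extend(serialize(child) for child in node.children)
    return "(" + " ".join(parts) + ")"


def fingerprint(node: Node) -> str:
    """SHA-256 hex digest of the serialized tree."""
    return hashlib.sha256(serialize(node).encode("utf-8")).hexdigest()


def group_exact_clones(fragments: dict) -> list:
    """Group fragment names by fingerprint in one pass; keep groups of 2 or more."""
    groups = defaultdict(list)
    for name, tree in fragments.items():
        groups[fingerprint(tree)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Grouping by hash is what keeps this step linear in the number of fragments: no pairwise comparison is needed. The minimum-size checks and the filters listed above are not modeled here.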
91
+
92
+ #### `near-clone` — Tier A, Warning
93
+
94
+ Finds near-miss duplicates via suffix tree / LCS analysis on linearized token sequences.
95
+
96
+ - **Linearization**: pre-order traversal of normalized tree → flat token sequence with `^` sentinel tokens marking end-of-children.
97
+ - **Bucketing**: logarithmic buckets by token count (20–39 → bucket 0, 40–79 → bucket 1, etc.) to avoid O(n²) all-pairs comparison. Each sequence is also placed in the adjacent bucket (+1) so that matches across bucket boundaries are not missed.
98
+ - **Algorithm**: rolling DP longest common substring (two rows, O(n) space), capped at **220,000 DP cells** per pair. Falls back to bounded longest common subsequence for edited clones with insertions/deletions.
99
+ - **Threshold**: **80% similarity** (`matchLen / max(len(A), len(B))`).
100
+ - **Limits**: max 10,000 comparisons, min 50 tokens.
101
+ - **Post-processing**: deterministic one-to-one matching per file pair — sorts by similarity descending, prefers same-name pairs, greedy assignment (each fragment matched at most once per file pair). Excludes declarative clones (different names + no control flow or ≤3 statements).
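
A rough sketch of the bucketing and the two-row longest-common-substring pass for `near-clone`, under stated assumptions: the function names are hypothetical, the bucket formula assumes the ranges keep doubling past the two listed, and treating the 220,000-cell cap as "skip the pair" is a guess at how the cap is applied. The bounded-subsequence fallback and the post-processing are not modeled.

```python
from typing import Optional


def bucket_index(token_count: int) -> int:
    """Logarithmic bucket: 20-39 -> 0, 40-79 -> 1, then doubling (assumed)."""
    if token_count < 20:
        raise ValueError("sequence too short to bucket")
    return (token_count // 20).bit_length() - 1


def longest_common_substring(a: list, b: list, max_cells: int = 220_000) -> Optional[int]:
    """Length of the longest contiguous token run shared by a and b.

    Keeps only two DP rows, so memory is O(len(b)). Returns None when the
    full table would exceed max_cells (assumed interpretation of the cap).
    """
    if len(a) * len(b) > max_cells:
        return None
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: list, b: list) -> Optional[float]:
    """matchLen / max(len(A), len(B)), or None if the pair was skipped."""
    match = longest_common_substring(a, b)
    if match is None or not a or not b:
        return None
    return match / max(len(a), len(b))
```

With the 80% threshold, two 10-token sequences sharing a 6-token run score 0.6 and would not be reported by this pass alone.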
102
+
103
+ #### `structural-similarity` — Tier A, Warning
104
+
105
+ PQ-gram tree profiling for structural similarity between same-kind fragments.
106
+
107
+ - **Algorithm**: PQ-gram with **p=2** ancestors, **q=3** siblings. Each gram = stem of p ancestor labels + base of q sibling labels. Similarity via **Sørensen–Dice coefficient**: `2 * |intersection| / (|A| + |B|)`.
108
+ - **Eligibility**: same fragment Kind only (function-to-function, class-to-class), different files, min 8 nodes, size ratio ≤ 1.8×.
109
+ - **Thresholds** (variable by file type):
110
+   - Both production files: **70%**
111
+   - One test + one production: **86%**
112
+   - Both test files: **90%**
113
+ - **Limits**: max 50,000 comparisons. Profiles cached per fragment.
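
The PQ-gram comparison can be illustrated with the textbook profile construction, which may differ in detail from what the tool does. Trees are represented here as hypothetical `(label, [children])` tuples, `*` is used as the padding label, and the profile is treated as a multiset so that repeated grams count more than once.

```python
from collections import Counter


def pq_profile(tree, p: int = 2, q: int = 3) -> Counter:
    """Multiset of pq-grams: p ancestor labels (stem) + q sibling labels (base)."""
    grams = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = (ancestors + (label,))[-p:]
        base = ("*",) * q
        if not children:
            grams[stem + base] += 1
            return
        for child in children:
            # Slide the sibling window one position to the right.
            base = base[1:] + (child[0],)
            grams[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):
            # Pad out the window after the last child.
            base = base[1:] + ("*",)
            grams[stem + base] += 1

    visit(tree, ("*",) * p)
    return grams


def dice(a: Counter, b: Counter) -> float:
    """Sorensen-Dice coefficient: 2 * |intersection| / (|A| + |B|)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0
    return 2 * sum((a & b).values()) / total


def threshold(a_is_test: bool, b_is_test: bool) -> float:
    """Similarity threshold by file type, using the values listed above."""
    if a_is_test and b_is_test:
        return 0.90
    if a_is_test or b_is_test:
        return 0.86
    return 0.70
```

For example, two three-node trees that differ in one leaf label share 2 of their 6 grams each, giving a Dice score of 1/3, well below every threshold.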
114
+
115
+ ---
116
+
117
+ ### Engine 2: Linter Integration (5 rules)
118
+
119
+ Wraps external linters (e.g. oxlint) run as subprocesses with configurable timeouts. Parses JSON output into findings. Non-fatal — a missing linter produces a warning, not an error.
120
+
121
+ #### `inlined-utility` — Tier A, Warning
122
+
123
+ Hand-rolled code that could use a standard library or utility call (e.g. `a > b ? a : b` → `Math.max`).
124
+
125
+ #### `dead-code` — Tier A, Error, Gateable
126
+
127
+ Unreachable code blocks that can never execute.
128
+
129
+ #### `dead-branch` — Tier A, Warning
130
+
131
+ Conditional branches (if/else, ternary) that can never be taken.
132
+
133
+ #### `duplication` — Tier A, Warning
134
+
135
+ Duplicate imports, exports, or repeated patterns detected by the linter.
136
+
137
+ #### `dead-export` — Tier A, Warning
138
+
139
+ Exported symbols that nothing imports.
140
+
141
+ ---
142
+
143
+ ### Engine 3: Best Practices (1 rule)
144
+
145
+ In-process deterministic checks — pattern matching and metric computation. No external dependencies.
146
+
147
+ #### `complexity-budget` — Tier A, Warning
148
+
149
+ Flags functions that exceed thresholds on **2+ of 3 metrics** simultaneously.
150
+
151
+ | Metric | Strict | Balanced (default) | Lenient |
152
+ | -------------------------------------- | -----: | -----------------: | ------: |
153
+ | Decision count (if/loop/switch/select) | ≥10 | ≥12 | ≥14 |
154
+ | Max nesting depth | ≥3 | ≥4 | ≥5 |
155
+ | Source lines | ≥75 | ≥90 | ≥110 |
156
+
157
+ - Functions under 20 lines are always excluded.
158
+ - **Headroom** for special paths: tooling/infrastructure paths get +20 decisions, +3 nesting, +80 lines. Script functions (`/scripts/`, `.mjs`/`.cjs`, `main`/`walk`) get +14/+2/+40. Orchestration functions (`/cmd/`, `/routes/`, `/handlers/`, names starting with `handle`/`route`/`serve`) get +2/+1/+25.
159
+ - Also fires on **single-metric extreme outliers**: decisions ≥ threshold+8, nesting ≥ threshold+2, or lines ≥ threshold+40 (when some complexity is also present).
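
The 2-of-3 check with headroom might look roughly like this. The function name and the `path_kind` labels are hypothetical; the thresholds and headroom values are taken from the table and bullets above. The single-metric outlier rule is left out because the "some complexity is also present" condition is not specified in enough detail to model.

```python
# (decisions, nesting, lines) thresholds per profile, from the table above.
PROFILES = {
    "strict": (10, 3, 75),
    "balanced": (12, 4, 90),
    "lenient": (14, 5, 110),
}

# Headroom added to each threshold for special paths (labels are illustrative).
HEADROOM = {
    None: (0, 0, 0),
    "tooling": (20, 3, 80),
    "script": (14, 2, 40),
    "orchestration": (2, 1, 25),
}


def exceeds_budget(decisions: int, nesting: int, lines: int,
                   profile: str = "balanced", path_kind=None) -> bool:
    """True when at least 2 of the 3 metrics meet or exceed their limits."""
    if lines < 20:
        return False  # short functions are always excluded
    limits = [t + h for t, h in zip(PROFILES[profile], HEADROOM[path_kind])]
    breaches = sum(value >= limit
                   for value, limit in zip((decisions, nesting, lines), limits))
    return breaches >= 2
```

Under the balanced profile, a 50-line function with 12 decisions and nesting depth 4 is flagged, while the same function with nesting depth 3 is not.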
160
+
161
+ ---
162
+
163
+ ### Slop Score
164
+
165
+ A 0–100 weighted composite of findings and structural smell metrics.
166
+
167
+ **Category weights** (base, before profile multiplier):
168
+
169
+ | Category | Weight | Cap |
170
+ | --------------------- | -----: | --: |
171
+ | exact-clone | 9 | 18 |
172
+ | near-clone | 6 | 18 |
173
+ | structural-similarity | 5 | 18 |
174
+ | dead-code | 5 | 18 |
175
+ | complexity-budget | 5 | 14 |
176
+ | inlined-utility | 4 | 18 |
177
+ | duplication | 4 | 18 |
178
+ | dead-export | 4 | 18 |
179
+ | dead-branch | 3 | 18 |
180
+
181
+ **Profile multipliers**: strict = 1.2×, balanced = 1.0×, lenient = 0.8×.
182
+
183
+ **Per-finding formula**: `points = weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1)))`. Category sums are capped, then total finding points capped at 70.
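
As a worked illustration of the per-finding formula, the sketch below computes finding points and applies the caps. The function names and the `(category, confidence, locations)` tuple shape are hypothetical, and it assumes the per-category caps are applied exactly as listed, without scaling by profile.

```python
# category -> (weight, cap), from the table above.
WEIGHTS = {
    "exact-clone": (9, 18),
    "near-clone": (6, 18),
    "structural-similarity": (5, 18),
    "dead-code": (5, 18),
    "complexity-budget": (5, 14),
    "inlined-utility": (4, 18),
    "duplication": (4, 18),
    "dead-export": (4, 18),
    "dead-branch": (3, 18),
}
MULTIPLIERS = {"strict": 1.2, "balanced": 1.0, "lenient": 0.8}


def finding_points(category: str, confidence: float, locations: int,
                   profile: str = "balanced") -> float:
    """weight x multiplier x (0.6 + 0.4 x confidence) x location bonus."""
    weight, _cap = WEIGHTS[category]
    location_bonus = 1.0 + min(0.5, 0.1 * (locations - 1))
    return weight * MULTIPLIERS[profile] * (0.6 + 0.4 * confidence) * location_bonus


def total_finding_points(findings: list, profile: str = "balanced") -> float:
    """Sum per category, cap each category, then cap the total at 70."""
    sums = {}
    for category, confidence, locations in findings:
        sums[category] = sums.get(category, 0.0) + finding_points(
            category, confidence, locations, profile)
    capped = sum(min(WEIGHTS[cat][1], pts) for cat, pts in sums.items())
    return min(70.0, capped)
```

For instance, one `exact-clone` finding with confidence 1.0 at two locations under the balanced profile is worth 9 × 1.0 × 1.0 × 1.1 = 9.9 points; three of them sum to 29.7 and are capped at 18.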
184
+
185
+ **Smell metrics** (up to 30 points):
186
+
187
+ - **wrapper-function-density** (45% weight) — single-statement wrapper functions as a ratio of total functions.
188
+ - **trivial-declaration-density** (35% weight) — declarations with ≤6 nodes and no control flow.
189
+ - **reused-fragment-names** (20% weight) — function/class/variable names appearing in 3+ distinct files.
190
+
191
+ **Score bands**:
192
+
193
+ | Range | Band |
194
+ | -----: | -------- |
195
+ | 0–14 | Minimal |
196
+ | 15–34 | Low |
197
+ | 35–59 | Moderate |
198
+ | 60–79 | High |
199
+ | 80–100 | Severe |
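
The band lookup is a simple range search. A minimal sketch, with a hypothetical function name; how fractional scores are rounded before banding is not stated, so this version simply assigns a score to the band whose lower bound it has reached.

```python
from bisect import bisect_right

# Lower bounds of Low, Moderate, High, Severe; anything below 15 is Minimal.
LOWER_BOUNDS = [15, 35, 60, 80]
BANDS = ["Minimal", "Low", "Moderate", "High", "Severe"]


def band(score: float) -> str:
    """Map a 0-100 slop score to its band name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return BANDS[bisect_right(LOWER_BOUNDS, score)]
```

A score of 42 falls in the Moderate band.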
75
200
 
76
201
  ## Architecture
77
202
 
78
- Three engines run in parallel:
203
+ Three artifact engines run in parallel and are evaluated through a rule catalog:
79
204
 
80
205
  1. **Clone Detection** (in-process) — tree-sitter parsing → pre-normalization (cached) → alpha-renaming → hash/suffix-tree/PQ-gram comparison
81
- 2. **Semantic Analysis** (subprocess) — delegates to oxlint, clippy, ruff, etc. for language-specific pattern detection
82
- 3. **Best Practices** (in-process) — deterministic clean-code and architecture checks (complexity, boundaries, naming, Go hygiene, React barrels)
206
+ 2. **Semantic Analysis** (subprocess) — external linter diagnostics
207
+ 3. **Best Practices** (in-process) — deterministic clean-code checks
208
+
209
+ Each rule is a first-class catalog entry with defaults and per-rule controls.
83
210
 
84
211
  Built on Semgrep's model: **tree-sitter CST → language-specific normalizer → generic normalized tree → language-agnostic analysis**.
85
212
 
@@ -106,73 +233,57 @@ SDK stability policy: `pkg/sdk/STABILITY.md`.
106
233
 
107
234
  ## Configuration
108
235
 
109
- Create `.unslop.yaml` in your project root:
236
+ Create `.unslop.yaml` in your project root. v2 requires `version: 2`:
110
237
 
111
238
  ```yaml
112
- # Minimum token count for clone detection
113
- min_tokens: 50
114
-
115
- # Similarity threshold for near-miss detection (0.0-1.0)
116
- similarity_threshold: 0.8
117
-
118
- # Max near-miss comparisons
119
- max_suffix_pairs: 10000
120
-
121
- # Max structural (PQ-gram) comparisons
122
- max_pqgram_pairs: 50000
123
-
124
- # External linters
125
- linters:
126
- oxlint:
239
+ version: 2
240
+
241
+ analysis:
242
+ ignore:
243
+ - "vendor/"
244
+ - "node_modules/"
245
+ extensions: [".ts", ".tsx", ".go"]
246
+ languages: ["typescript", "go"]
247
+ changed_only: false
248
+
249
+ engines:
250
+ clone:
251
+ min_tokens: 50
252
+ similarity_threshold: 0.8
253
+ max_suffix_pairs: 10000
254
+ max_pqgram_pairs: 50000
255
+ linters:
256
+ oxlint:
257
+ enabled: true
258
+ command: oxlint
259
+ args: ["--format", "json"]
260
+ timeout: "60s"
261
+ practices:
127
262
  enabled: true
128
- command: oxlint
129
- args: ["--format", "json"]
130
-
131
- # Paths to ignore (in addition to .gitignore)
132
- ignore:
133
- - "vendor/"
134
- - "node_modules/"
135
- - "*.generated.ts"
263
+ profile: balanced
264
+ ignore_tests: true
265
+ max_findings_per_rule: 200
266
+
267
+ rules:
268
+ defaults:
269
+ severity: warning
270
+ gateable: false
271
+ exact-clone:
272
+ severity: error
273
+ gateable: true
274
+ complexity-budget:
275
+ paths:
276
+ include: ["apps/**"]
277
+ exclude: ["scripts/**"]
278
+
279
+ gates:
280
+ max_score: 35
281
+ fail_on_rules: ["exact-clone"]
282
+ fail_on_tiers: ["A"]
136
283
 
137
- # Slop score configuration
138
284
  slop_score:
139
- # strict | balanced | lenient
140
285
  profile: balanced
141
- # number of top contributors shown in terminal output (1-10)
142
286
  top_contributors: 5
143
-
144
- # Best-practices configuration
145
- best_practices:
146
- enabled: true
147
- # strict | balanced | lenient
148
- profile: balanced
149
- # ignore test-like paths for best-practices findings
150
- ignore_tests: true
151
- # deterministic per-rule cap (20-1000)
152
- max_findings_per_rule: 200
153
- ```
154
-
155
- Example terminal section (`--verbose`):
156
-
157
- ```text
158
- ── high-priority (2) ──
159
-
160
- [exact-clone] Exact duplicate: parseConfig
161
- ./src/a.ts:10
162
- ./src/b.ts:12
163
-
164
- ── low-priority (1) ──
165
-
166
- [exact-clone] Exact duplicate: makeTempDir
167
- ./src/foo.test.ts:5
168
- ./src/bar.test.ts:9
169
-
170
- ── slop-score ──
171
-
172
- Slop Score: 42/100 (Moderate)
173
- Top contributors:
174
- exact-clone: +18.4 (3 findings)
175
- wrapper-function-density: +8.7 (31% of functions)
176
287
  ```
177
288
 
178
289
  ## Development
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "unslop",
3
- "version": "0.1.4",
3
+ "version": "0.1.6",
4
4
  "description": "Detect duplicated code, dead code, and anti-patterns in AI-generated codebases",
5
5
  "bin": {
6
6
  "unslop": "bin.js"
@@ -9,11 +9,11 @@
9
9
  "bin.js"
10
10
  ],
11
11
  "optionalDependencies": {
12
- "@unslop/darwin-arm64": "0.1.4",
13
- "@unslop/darwin-x64": "0.1.4",
14
- "@unslop/linux-arm64": "0.1.4",
15
- "@unslop/linux-x64": "0.1.4",
16
- "@unslop/win32-x64": "0.1.4"
12
+ "@unslop/darwin-arm64": "0.1.6",
13
+ "@unslop/darwin-x64": "0.1.6",
14
+ "@unslop/linux-arm64": "0.1.6",
15
+ "@unslop/linux-x64": "0.1.6",
16
+ "@unslop/win32-x64": "0.1.6"
17
17
  },
18
18
  "license": "Apache-2.0",
19
19
  "repository": {