unslop 0.1.4 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +196 -85
package/package.json +6 -6

package/README.md CHANGED

@@ -2,8 +2,6 @@
 Standalone CLI tool for detecting duplicated code, dead code, inlined utilities, and semantic anti-patterns in AI-generated codebases. No AI/LLM in the detection pipeline — deterministic analysis only.
-Like `react-doctor` but for de-slopping entire codebases.
 ## Install
 ```bash
@@ -42,6 +40,9 @@ unslop --format json .
 # SARIF output (for CI integration)
 unslop --format sarif .
+# List built-in rules and defaults
+unslop --list-rules
 # With config file
 unslop --config .unslop.yaml .
 ```
@@ -52,34 +53,160 @@ Use `--changed-only` to focus on uncommitted files and prioritize reuse against
 ## What It Detects
-| Category | Engine | Reliability |
-|----------|--------|:-----------:|
-| Identical constants across files | Clone | 99% |
-| Copy-pasted functions (same names) | Clone | 99% |
-| Copy-pasted functions (renamed params) | Clone | 95% |
-| Reformatted JSX components | Clone | 90% |
-| Similar functions (small edits) | Clone | 80% |
-| Equivalent regex patterns | Clone | 95% |
-| Cross-package export matches | Clone | 99% |
-| `a>b?a:b` → `Math.max` | Oxlint | 99% |
-| Dead branches | Oxlint | 90% |
-| Unreachable code | Oxlint | 99% |
-| Inlined utilities | Oxlint | 95% |
-| Dead exports | Clone | 95% |
-| Complexity budget breaches | Practices | 90% |
-| Boundary violations (`packages -> apps`, cross-app deps, cycles) | Practices | 90-98% |
-| Generic naming entropy | Practices | 74% |
-| Ignored Go errors | Practices | 93% |
-| Go context leaks | Practices | 87% |
-| React broad barrel re-exports | Practices | 78-86% |
+### Quick Reference
+| Category                               | Engine    | Reliability |
+| -------------------------------------- | --------- | :---------: |
+| Identical constants across files       | Clone     |     99%     |
+| Copy-pasted functions (same names)     | Clone     |     99%     |
+| Copy-pasted functions (renamed params) | Clone     |     95%     |
+| Reformatted JSX components             | Clone     |     90%     |
+| Similar functions (small edits)        | Clone     |     80%     |
+| Equivalent regex patterns              | Clone     |     95%     |
+| Cross-package export matches           | Clone     |     99%     |
+| `a>b?a:b` → `Math.max`                 | Oxlint    |     99%     |
+| Dead branches                          | Oxlint    |     90%     |
+| Unreachable code                       | Oxlint    |     99%     |
+| Inlined utilities                      | Oxlint    |     95%     |
+| Dead exports                           | Clone     |     95%     |
+| Complexity budget breaches             | Practices |     90%     |
+## Rules Reference
+9 rules across 3 engines. Use `unslop --list-rules` to see defaults for your config.
+---
+### Engine 1: Clone Detection (3 rules)
+Tree-sitter parses source into a CST, then a language plugin normalizes it to a language-agnostic tree with alpha-renamed identifiers (`a`, `b`, `c` instead of real names). All three rules operate on these normalized trees.
+#### `exact-clone` — Tier A, Error
+SHA-256 fingerprint of the entire normalized tree. Two fragments with identical hashes are exact clones.
+- **Algorithm**: deterministic S-expression serialization `(Kind:Label child1 child2 ...)` → SHA-256. O(n) grouping by hash.
+- **Minimum size**: 50 tokens, 8 nodes.
+- **Filters**: same-file duplicates, import-linked pairs, rule boilerplate scaffolding (<30 line span in rules scaffold paths), repeated framework idioms (3+ occurrences with diverse names across 2+ files).
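The fingerprinting step described for `exact-clone` can be sketched as follows. This is a minimal illustration under stated assumptions, not unslop's actual code: the `Node` class and the `serialize`, `fingerprint`, and `group_exact_clones` helpers are hypothetical names, and the minimum-size check and the filters listed above are left out.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical normalized tree node (identifiers already alpha-renamed)."""
    kind: str
    label: str = ""
    children: list = field(default_factory=list)


def serialize(node):
    """Deterministic S-expression: (Kind:Label child1 child2 ...)."""
    parts = [f"{node.kind}:{node.label}"]
    parts.extend(serialize(child) for child in node.children)
    return "(" + " ".join(parts) + ")"


def fingerprint(node):
    """SHA-256 hex digest of the serialized tree."""
    return hashlib.sha256(serialize(node).encode("utf-8")).hexdigest()


def group_exact_clones(fragments):
    """Group fragment ids by fingerprint in one pass; keep groups of 2+."""
    groups = defaultdict(list)
    for fragment_id, tree in fragments.items():
        groups[fingerprint(tree)].append(fragment_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Because grouping is a single dictionary pass over the hashes, no pairwise tree comparison is needed, which is what keeps this step linear in the number of fragments.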
+#### `near-clone` — Tier A, Warning
+Finds near-miss duplicates via suffix tree / LCS analysis on linearized token sequences.
+- **Linearization**: pre-order traversal of normalized tree → flat token sequence with `^` sentinel tokens marking end-of-children.
+- **Bucketing**: logarithmic buckets by token count (20–39 → bucket 0, 40–79 → bucket 1, etc.) to avoid an O(n²) all-pairs comparison. Each sequence is also placed in the next bucket up (+1) so that matches across a bucket boundary are still found.
+- **Algorithm**: rolling DP longest common substring (two rows, O(n) space), capped at **220,000 DP cells** per pair. Falls back to bounded longest common subsequence for edited clones with insertions/deletions.
+- **Threshold**: **80% similarity** (`matchLen / max(len(A), len(B))`).
+- **Limits**: max 10,000 comparisons, min 50 tokens.
+- **Post-processing**: deterministic one-to-one matching per file pair — sorts by similarity descending, prefers same-name pairs, greedy assignment (each fragment matched at most once per file pair). Excludes declarative clones (different names + no control flow or ≤3 statements).
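The bucketing, two-row DP, and similarity ratio above can be sketched like this. It is a rough illustration, not unslop's implementation: the function names are hypothetical, skipping a pair that exceeds the cell budget is an assumption about how the 220,000-cell cap is applied, and the subsequence fallback and post-processing are omitted.

```python
def bucket_index(token_count, base=20):
    """Logarithmic bucket: 20-39 -> 0, 40-79 -> 1, 80-159 -> 2, ..."""
    if token_count < base:
        return None  # below the smallest bucket
    index, upper = 0, base * 2
    while token_count >= upper:
        upper *= 2
        index += 1
    return index


def longest_common_substring(a, b, max_cells=220_000):
    """Length of the longest common contiguous run, using two DP rows.

    Returns None when len(a) * len(b) exceeds the cell budget
    (assumption: over-budget pairs are simply skipped).
    """
    if len(a) * len(b) > max_cells:
        return None
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr  # only the previous row is ever needed
    return best


def similarity(a, b):
    """matchLen / max(len(A), len(B)), or 0.0 if the pair was skipped."""
    match_len = longest_common_substring(a, b)
    if match_len is None or not a or not b:
        return 0.0
    return match_len / max(len(a), len(b))


def is_near_clone(a, b, threshold=0.8):
    return similarity(a, b) >= threshold
```

For example, two 10-token sequences that share an 8-token run score 8 / 10 = 0.8, which just meets the documented threshold.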
+#### `structural-similarity` — Tier A, Warning
+PQ-gram tree profiling for structural similarity between same-kind fragments.
+- **Algorithm**: PQ-gram with **p=2** ancestors, **q=3** siblings. Each gram = stem of p ancestor labels + base of q sibling labels. Similarity via **Sørensen–Dice coefficient**: `2 * |intersection| / (|A| + |B|)`.
+- **Eligibility**: same fragment Kind only (function-to-function, class-to-class), different files, min 8 nodes, size ratio ≤ 1.8×.
+- **Thresholds** (variable by file type):
+  - Both production files: **70%**
+  - One test + one production: **86%**
+  - Both test files: **90%**
+- **Limits**: max 50,000 comparisons. Profiles cached per fragment.
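As a rough sketch of the profile comparison above, the code below builds a PQ-gram profile and scores two profiles with the Sørensen–Dice coefficient. It assumes the common PQ-gram formulation with `*` as the padding label, and trees given as `(label, children)` tuples; both are assumptions for illustration, as are the function names. The eligibility checks and comparison limits are omitted.

```python
from collections import Counter


def pq_profile(tree, p=2, q=3):
    """Multiset of PQ-grams: p ancestor labels (stem) + q sibling labels (base).

    `tree` is a (label, [children]) tuple; "*" pads missing positions.
    """
    grams = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)  # shift this node into the stem
        base = ("*",) * q
        if not children:
            grams[stem + base] += 1
            return
        for child in children:
            base = base[1:] + (child[0],)  # slide the sibling window
            grams[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):  # flush the window with padding
            base = base[1:] + ("*",)
            grams[stem + base] += 1

    visit(tree, ("*",) * p)
    return grams


def dice(profile_a, profile_b):
    """2 * |intersection| / (|A| + |B|) over multisets of grams."""
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        return 0.0
    shared = sum((profile_a & profile_b).values())
    return 2 * shared / total


def similarity_threshold(a_is_test, b_is_test):
    """Thresholds from the list above, chosen by file type."""
    if a_is_test and b_is_test:
        return 0.90
    if a_is_test or b_is_test:
        return 0.86
    return 0.70
```

`Counter & Counter` keeps the minimum count of each gram, which gives the multiset intersection the Dice formula needs.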
+---
+### Engine 2: Linter Integration (5 rules)
+Wraps external linters (e.g. oxlint) run as subprocesses with configurable timeouts. Parses JSON output into findings. Non-fatal — a missing linter produces a warning, not an error.
+#### `inlined-utility` — Tier A, Warning
+Hand-rolled code that could use a standard library or utility call (e.g. `a > b ? a : b` → `Math.max`).
+#### `dead-code` — Tier A, Error, Gateable
+Unreachable code blocks that can never execute.
+#### `dead-branch` — Tier A, Warning
+Conditional branches (if/else, ternary) that can never be taken.
+#### `duplication` — Tier A, Warning
+Duplicate imports, exports, or repeated patterns detected by the linter.
+#### `dead-export` — Tier A, Warning
+Exported symbols that nothing imports.
+---
+### Engine 3: Best Practices (1 rule)
+In-process deterministic checks — pattern matching and metric computation. No external dependencies.
+#### `complexity-budget` — Tier A, Warning
+Flags functions that exceed thresholds on **2+ of 3 metrics** simultaneously.
+| Metric                                 | Strict | Balanced (default) | Lenient |
+| -------------------------------------- | -----: | -----------------: | ------: |
+| Decision count (if/loop/switch/select) |    ≥10 |                ≥12 |     ≥14 |
+| Max nesting depth                      |     ≥3 |                 ≥4 |      ≥5 |
+| Source lines                           |    ≥75 |                ≥90 |    ≥110 |
+- Functions under 20 lines are always excluded.
+- **Headroom** for special paths: tooling/infrastructure paths get +20 decisions, +3 nesting, +80 lines. Script functions (`/scripts/`, `.mjs`/`.cjs`, `main`/`walk`) get +14/+2/+40. Orchestration functions (`/cmd/`, `/routes/`, `/handlers/`, names starting with `handle`/`route`/`serve`) get +2/+1/+25.
+- Also fires on **single-metric extreme outliers**: decisions ≥ threshold+8, nesting ≥ threshold+2, or lines ≥ threshold+40 (when some complexity is also present).
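The core "2 of 3 metrics" check can be sketched as below. The thresholds come from the table above; the function name and the `headroom` tuple are hypothetical, and the single-metric outlier rule is deliberately left out because the README does not define "some complexity" precisely.

```python
# (decisions, nesting depth, source lines) per profile, from the table above
THRESHOLDS = {
    "strict": (10, 3, 75),
    "balanced": (12, 4, 90),
    "lenient": (14, 5, 110),
}


def breaches_budget(decisions, nesting, lines, profile="balanced",
                    headroom=(0, 0, 0)):
    """True when at least two of the three metrics meet their threshold.

    `headroom` raises each threshold, e.g. (2, 1, 25) for orchestration
    functions. Functions under 20 lines are never flagged.
    """
    if lines < 20:
        return False
    limits = [base + extra for base, extra in zip(THRESHOLDS[profile], headroom)]
    exceeded = sum(
        value >= limit
        for value, limit in zip((decisions, nesting, lines), limits)
    )
    return exceeded >= 2
```

Under the balanced profile, a 50-line function with 12 decisions and nesting depth 4 is flagged, while the same function treated as an orchestration handler with `headroom=(2, 1, 25)` is not.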
+---
+### Slop Score
+A 0–100 weighted composite of findings and structural smell metrics.
+**Category weights** (base, before profile multiplier):
+| Category              | Weight | Cap |
+| --------------------- | -----: | --: |
+| exact-clone           |      9 |  18 |
+| near-clone            |      6 |  18 |
+| structural-similarity |      5 |  18 |
+| dead-code             |      5 |  18 |
+| complexity-budget     |      5 |  14 |
+| inlined-utility       |      4 |  18 |
+| duplication           |      4 |  18 |
+| dead-export           |      4 |  18 |
+| dead-branch           |      3 |  18 |
+**Profile multipliers**: strict = 1.2×, balanced = 1.0×, lenient = 0.8×.
+**Per-finding formula**: `points = weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1)))`. Category sums are capped, then total finding points capped at 70.
+**Smell metrics** (up to 30 points):
+- **wrapper-function-density** (45% weight) — single-statement wrapper functions as a ratio of total functions.
+- **trivial-declaration-density** (35% weight) — declarations with ≤6 nodes and no control flow.
+- **reused-fragment-names** (20% weight) — function/class/variable names appearing in 3+ distinct files.
+**Score bands**:
+|  Range | Band     |
+| -----: | -------- |
+|   0–14 | Minimal  |
+|  15–34 | Low      |
+|  35–59 | Moderate |
+|  60–79 | High     |
+| 80–100 | Severe   |
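The finding side of the score can be sketched as follows, using the weights, caps, multipliers, and per-finding formula given above. Several details are assumptions made for illustration: category caps are applied unscaled by the profile multiplier, fractional scores fall into the band whose lower bound they have reached, and the smell-metric points (up to 30) are not modelled because the README does not say how the ratios map to points.

```python
from collections import defaultdict

# category -> (weight, cap), from the table above
WEIGHTS = {
    "exact-clone": (9, 18),
    "near-clone": (6, 18),
    "structural-similarity": (5, 18),
    "dead-code": (5, 18),
    "complexity-budget": (5, 14),
    "inlined-utility": (4, 18),
    "duplication": (4, 18),
    "dead-export": (4, 18),
    "dead-branch": (3, 18),
}
PROFILE_MULTIPLIER = {"strict": 1.2, "balanced": 1.0, "lenient": 0.8}


def finding_points(category, confidence, locations, profile="balanced"):
    """points = weight x multiplier x confidence factor x location factor."""
    weight, _cap = WEIGHTS[category]
    confidence_factor = 0.6 + 0.4 * confidence
    location_factor = 1.0 + min(0.5, 0.1 * (locations - 1))
    return weight * PROFILE_MULTIPLIER[profile] * confidence_factor * location_factor


def findings_score(findings, profile="balanced"):
    """Sum per category, cap each category, then cap the total at 70.

    `findings` is an iterable of (category, confidence, locations) tuples.
    """
    sums = defaultdict(float)
    for category, confidence, locations in findings:
        sums[category] += finding_points(category, confidence, locations, profile)
    capped = sum(min(total, WEIGHTS[cat][1]) for cat, total in sums.items())
    return min(capped, 70.0)


def band(score):
    """Map a 0-100 score to its band name."""
    if score < 15:
        return "Minimal"
    if score < 35:
        return "Low"
    if score < 60:
        return "Moderate"
    if score < 80:
        return "High"
    return "Severe"
```

With these assumptions, a fully confident `exact-clone` finding at two locations contributes 9 × 1.0 × 1.0 × 1.1 = 9.9 points, and three such findings hit the category cap of 18.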
 ## Architecture
-Three engines run in parallel:
+Three artifact engines run in parallel and are evaluated through a rule catalog:
 1. **Clone Detection** (in-process) — tree-sitter parsing → pre-normalization (cached) → alpha-renaming → hash/suffix-tree/PQ-gram comparison
-2. **Semantic Analysis** (subprocess) — delegates to oxlint, clippy, ruff, etc. for language-specific pattern detection
-3. **Best Practices** (in-process) — deterministic clean-code and architecture checks (complexity, boundaries, naming, Go hygiene, React barrels)
+2. **Semantic Analysis** (subprocess) — external linter diagnostics
+3. **Best Practices** (in-process) — deterministic clean-code checks
+Each rule is a first-class catalog entry with defaults and per-rule controls.
 Built on Semgrep's model: **tree-sitter CST → language-specific normalizer → generic normalized tree → language-agnostic analysis**.
@@ -106,73 +233,57 @@ SDK stability policy: `pkg/sdk/STABILITY.md`.
 ## Configuration
-Create `.unslop.yaml` in your project root:
+Create `.unslop.yaml` in your project root. The v2 config format requires `version: 2`:
 ```yaml
-# Minimum token count for clone detection
-min_tokens: 50
-# Similarity threshold for near-miss detection (0.0-1.0)
-similarity_threshold: 0.8
-# Max near-miss comparisons
-max_suffix_pairs: 10000
-# Max structural (PQ-gram) comparisons
-max_pqgram_pairs: 50000
-# External linters
-linters:
-  oxlint:
+version: 2
+analysis:
+  ignore:
+    - "vendor/"
+    - "node_modules/"
+  extensions: [".ts", ".tsx", ".go"]
+  languages: ["typescript", "go"]
+  changed_only: false
+engines:
+  clone:
+    min_tokens: 50
+    similarity_threshold: 0.8
+    max_suffix_pairs: 10000
+    max_pqgram_pairs: 50000
+  linters:
+    oxlint:
+      enabled: true
+      command: oxlint
+      args: ["--format", "json"]
+      timeout: "60s"
+  practices:
     enabled: true
-    command: oxlint
-    args: ["--format", "json"]
-# Paths to ignore (in addition to .gitignore)
-ignore:
-  - "vendor/"
-  - "node_modules/"
-  - "*.generated.ts"
+    profile: balanced
+    ignore_tests: true
+    max_findings_per_rule: 200
+rules:
+  defaults:
+    severity: warning
+    gateable: false
+  exact-clone:
+    severity: error
+    gateable: true
+  complexity-budget:
+    paths:
+      include: ["apps/**"]
+      exclude: ["scripts/**"]
+gates:
+  max_score: 35
+  fail_on_rules: ["exact-clone"]
+  fail_on_tiers: ["A"]
-# Slop score configuration
 slop_score:
-  # strict | balanced | lenient
   profile: balanced
-  # number of top contributors shown in terminal output (1-10)
   top_contributors: 5
-# Best-practices configuration
-best_practices:
-  enabled: true
-  # strict | balanced | lenient
-  profile: balanced
-  # ignore test-like paths for best-practices findings
-  ignore_tests: true
-  # deterministic per-rule cap (20-1000)
-  max_findings_per_rule: 200
-```
-Example terminal section (`--verbose`):
-```text
-── high-priority (2) ──
-  [exact-clone] Exact duplicate: parseConfig
-    ./src/a.ts:10
-    ./src/b.ts:12
-── low-priority (1) ──
-  [exact-clone] Exact duplicate: makeTempDir
-    ./src/foo.test.ts:5
-    ./src/bar.test.ts:9
-── slop-score ──
-  Slop Score: 42/100 (Moderate)
-  Top contributors:
-    exact-clone: +18.4 (3 findings)
-    wrapper-function-density: +8.7 (31% of functions)
 ```
 ## Development

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "unslop",
-  "version": "0.1.4",
+  "version": "0.1.6",
   "description": "Detect duplicated code, dead code, and anti-patterns in AI-generated codebases",
   "bin": {
     "unslop": "bin.js"
@@ -9,11 +9,11 @@
     "bin.js"
   ],
   "optionalDependencies": {
-    "@unslop/darwin-arm64": "0.1.4",
-    "@unslop/darwin-x64": "0.1.4",
-    "@unslop/linux-arm64": "0.1.4",
-    "@unslop/linux-x64": "0.1.4",
-    "@unslop/win32-x64": "0.1.4"
+    "@unslop/darwin-arm64": "0.1.6",
+    "@unslop/darwin-x64": "0.1.6",
+    "@unslop/linux-arm64": "0.1.6",
+    "@unslop/linux-x64": "0.1.6",
+    "@unslop/win32-x64": "0.1.6"
   },
   "license": "Apache-2.0",
   "repository": {