dslop 1.0.0

Files changed (3)
  1. package/README.md +113 -0
  2. package/dist/index.js +6691 -0
  3. package/package.json +49 -0
package/README.md ADDED
@@ -0,0 +1,113 @@
+ # dslop
+
+ **D**etect **S**imilar/**L**ines **O**f **P**rogramming - A fast duplicate code detector.
+
+ ## Installation
+
+ ```bash
+ bun install
+ ```
+
+ ## Usage
+
+ ```bash
+ # Scan current directory
+ bun run index.ts .
+
+ # Scan specific directory with options
+ bun run index.ts ./src -m 6 -s 80
+
+ # Output as JSON
+ bun run index.ts . --json
+ ```
+
+ ## Options
+
+ | Option | Short | Default | Description |
+ |--------|-------|---------|-------------|
+ | `--min-lines` | `-m` | 4 | Minimum block size in lines |
+ | `--similarity` | `-s` | 70 | Minimum similarity threshold (0-100) |
+ | `--extensions` | `-e` | ts,tsx,js,jsx | File extensions to scan |
+ | `--ignore` | `-i` | node_modules,dist,... | Patterns to ignore |
+ | `--no-normalize` | | | Disable string/number normalization |
+ | `--json` | | | Output as JSON |
+ | `--help` | `-h` | | Show help |
+ | `--version` | `-v` | | Show version |
+
+ ## How It Works
+
+ 1. **Scanning**: Recursively scans files matching the specified extensions
+ 2. **Block Extraction**: Extracts code blocks using a sliding window approach at multiple granularities
+ 3. **Normalization**: Replaces string literals, numbers, and colors with placeholders for structural comparison
+ 4. **Hash Grouping**: Groups exact duplicates by hash for fast matching
+ 5. **Similarity Matching**: Uses Jaccard similarity on line sets for near-duplicates
+ 6. **Filtering**: Removes overlapping blocks and deduplicates groups
+
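The normalization, hash grouping, and Jaccard steps listed above can be sketched roughly as follows. This is a minimal illustration, not dslop's actual code: the placeholder tokens, the `Block` shape, the regular expressions, the function names, and the choice of FNV-1a as the hash are all assumptions, since the README does not document them.

```typescript
// Hypothetical block shape; the real structure used by dslop is not documented.
interface Block {
  file: string;
  startLine: number;
  lines: string[];
}

// Replace literals with placeholders so that only the structure is compared.
// The placeholder tokens (<COLOR>, <STR>, <NUM>) are assumptions.
function normalizeLine(line: string): string {
  return line
    .trim()
    // Quoted hex colors first, so the generic string rule does not swallow them.
    .replace(/(["'])#[0-9a-fA-F]{3,8}\1/g, "<COLOR>")
    // Single- and double-quoted string literals (template literals are ignored here).
    .replace(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g, "<STR>")
    // Standalone integer and decimal literals.
    .replace(/\b\d+(?:\.\d+)?\b/g, "<NUM>");
}

// 32-bit FNV-1a, used here only as a small dependency-free stand-in hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Exact duplicates: blocks whose normalized text hashes to the same key.
function groupExactDuplicates(blocks: Block[]): Map<string, Block[]> {
  const groups = new Map<string, Block[]>();
  for (const block of blocks) {
    const key = fnv1a(block.lines.map(normalizeLine).join("\n"));
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(block);
    } else {
      groups.set(key, [block]);
    }
  }
  return groups;
}

// Near duplicates: |A ∩ B| / |A ∪ B| over the sets of distinct lines.
function jaccardSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((line) => {
    if (setB.has(line)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}
```

Under this definition, two 5-line blocks of distinct lines that differ in a single line share 4 of 6 distinct lines, about 67%, which falls just below the default `--similarity` threshold of 70.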
+ ## Configuration
+
+ All detection parameters are configurable in `src/constants.ts`:
+
+ ```typescript
+ // Block extraction
+ MAX_BLOCK_SIZE = 100 // Maximum lines per block
+ BLOCK_SIZE_MULTIPLIER = 1.5 // Growth factor for multi-size extraction
+ MIN_MEANINGFUL_LINE_RATIO = 0.6 // Skip blocks with too many comments/whitespace
+
+ // Detection
+ SIZE_BUCKET_DIVISOR = 5 // Group blocks by ~5 line buckets
+ MAX_BLOCKS_FOR_SIMILARITY = 10000 // Skip similarity for large codebases
+ GROUP_OVERLAP_THRESHOLD = 0.5 // Dedup threshold
+
+ // Output
+ MAX_GROUPS_DETAILED = 20 // Max groups to show in detail
+ MAX_MATCHES_IN_SUMMARY = 5 // Max file matches per group
+ ```
+
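To make the block extraction constants above more concrete, here is one way they might interact in a multi-size sliding window. This is a sketch under stated assumptions rather than the package's implementation: the growth rule (multiply by the factor and round up), the comment-detection heuristic, and the names `blockSizes`, `hasEnoughCode`, and `extractBlocks` are hypothetical.

```typescript
// Values copied from the Configuration listing above.
const MAX_BLOCK_SIZE = 100;
const BLOCK_SIZE_MULTIPLIER = 1.5;
const MIN_MEANINGFUL_LINE_RATIO = 0.6;

// Hypothetical shape of an extracted block (1-based, inclusive line range).
interface ExtractedBlock {
  startLine: number;
  endLine: number;
  lines: string[];
}

// Assumed growth rule: start at the minimum size and multiply, rounding up,
// until the maximum block size is exceeded.
function blockSizes(minLines: number): number[] {
  const sizes: number[] = [];
  if (minLines < 1) return sizes; // guard against a loop that never grows
  for (
    let size = minLines;
    size <= MAX_BLOCK_SIZE;
    size = Math.ceil(size * BLOCK_SIZE_MULTIPLIER)
  ) {
    sizes.push(size);
  }
  return sizes;
}

// Assumed heuristic: blank lines and lines that start a comment are not meaningful.
function isMeaningful(line: string): boolean {
  const trimmed = line.trim();
  return trimmed.length > 0 && !/^(\/\/|\/\*|\*)/.test(trimmed);
}

// Keep a block only if enough of its lines are actual code.
function hasEnoughCode(lines: string[]): boolean {
  if (lines.length === 0) return false;
  const meaningful = lines.filter(isMeaningful).length;
  return meaningful / lines.length >= MIN_MEANINGFUL_LINE_RATIO;
}

// Slide a window of each size over the file, one line at a time.
function extractBlocks(fileLines: string[], minLines: number): ExtractedBlock[] {
  const blocks: ExtractedBlock[] = [];
  for (const size of blockSizes(minLines)) {
    for (let start = 0; start + size <= fileLines.length; start++) {
      const lines = fileLines.slice(start, start + size);
      if (hasEnoughCode(lines)) {
        blocks.push({ startLine: start + 1, endLine: start + size, lines });
      }
    }
  }
  return blocks;
}
```

With the default `--min-lines` of 4, this assumed rule produces window sizes of 4, 6, 9, 14, 21, 32, 48, and 72 lines; the real tool may round or step differently.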
+ ## Example Output
+
+ ```
+ Scanning ./src...
+ Extensions: ts, tsx, js, jsx
+ Min block size: 4 lines
+ Similarity threshold: 70%
+ Normalization: enabled
+
+ Scanned 81 files (15,672 lines) in 129ms
+ Extracted 13,920 code blocks
+
+ Found 500 duplicate groups in 51ms
+
+ ────────────────────────────────────────────────────────────────────────────────
+ DUPLICATE CODE DETECTED
+ ────────────────────────────────────────────────────────────────────────────────
+
+ Group 1 │ EXACT │ 28 lines × 2 occurrences = 56 lines of duplication
+
+ ├─ QuarterlyWinnerMessagePreview.tsx:197-224
+ ├─ CoachEnteredMessagePreview.tsx:113-140
+
+ Code preview:
+ │ <View style={{
+ │ backgroundColor: "white",
+ │ ...
+
+ SUMMARY
+ ────────────────────────────────────────────────────────────────────────────────
+ Total duplicate groups: 500
+ Exact matches: 450
+ Similar matches: 50
+ Files affected: 65
+ Total duplicate lines: 12,340
+ Average similarity: 95%
+ ```
+
+ ## Build
+
+ ```bash
+ # Create standalone binary
+ bun build --compile ./index.ts --outfile dslop
+ ```
+
+ ## License
+
+ MIT