dslop 1.0.0
# dslop

**D**etect **S**imilar/**L**ines **O**f **P**rogramming - A fast duplicate code detector.

## Installation

Install dependencies with Bun:

```bash
bun install
```

## Usage

```bash
# Scan the current directory
bun run index.ts .

# Scan a specific directory with a 6-line minimum block size and an 80% similarity threshold
bun run index.ts ./src -m 6 -s 80

# Output results as JSON
bun run index.ts . --json
```

## Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--min-lines` | `-m` | 4 | Minimum block size, in lines |
| `--similarity` | `-s` | 70 | Minimum similarity threshold, as a percentage (0-100) |
| `--extensions` | `-e` | ts,tsx,js,jsx | Comma-separated file extensions to scan |
| `--ignore` | `-i` | node_modules,dist,... | Comma-separated patterns to ignore |
| `--no-normalize` | | | Disable normalization of string and number literals |
| `--json` | | | Output results as JSON |
| `--help` | `-h` | | Show help |
| `--version` | `-v` | | Show version |

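The options map naturally onto Node's built-in `util.parseArgs`, which Bun also implements. The sketch below is a hypothetical parser, not dslop's own argument handling: `parseOptions`, `DslopOptions`, splitting `--extensions` and `--ignore` on commas, range-checking the numeric flags, and falling back to `.` when no directory is given are all assumptions. Because the table elides part of the `--ignore` default, the sketch only uses the entries it names.

```typescript
import { parseArgs } from "node:util";

// Hypothetical parsed-options shape; the field names are illustrative.
interface DslopOptions {
  dir: string;
  minLines: number;
  similarity: number; // percentage, 0-100
  extensions: string[];
  ignore: string[];
  normalize: boolean;
  json: boolean;
  help: boolean;
  version: boolean;
}

// Split a comma-separated flag value, trimming entries and dropping empty ones.
function splitList(value: string): string[] {
  return value.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

function parseOptions(argv: string[]): DslopOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the directory to scan
    options: {
      "min-lines": { type: "string", short: "m" },
      similarity: { type: "string", short: "s" },
      extensions: { type: "string", short: "e" },
      ignore: { type: "string", short: "i" },
      "no-normalize": { type: "boolean" },
      json: { type: "boolean" },
      help: { type: "boolean", short: "h" },
      version: { type: "boolean", short: "v" },
    },
  });

  // Defaults come from the Options table.
  const minLines = Number(values["min-lines"] ?? "4");
  const similarity = Number(values.similarity ?? "70");
  if (!Number.isInteger(minLines) || minLines < 1) {
    throw new Error(`--min-lines must be a positive integer, got "${values["min-lines"]}"`);
  }
  if (!(similarity >= 0 && similarity <= 100)) {
    throw new Error(`--similarity must be between 0 and 100, got "${values.similarity}"`);
  }

  return {
    dir: positionals[0] ?? ".", // assumption: fall back to the current directory
    minLines,
    similarity,
    extensions: splitList(values.extensions ?? "ts,tsx,js,jsx"),
    // The table elides the rest of the default ignore list ("..."), so only the named entries appear here.
    ignore: splitList(values.ignore ?? "node_modules,dist"),
    normalize: !values["no-normalize"],
    json: values.json ?? false,
    help: values.help ?? false,
    version: values.version ?? false,
  };
}
```

For example, `parseOptions(["./src", "-m", "6", "-s", "80"])` yields `minLines: 6` and `similarity: 80`, with the remaining fields taking the table's defaults.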
## How It Works

1. **Scanning**: Recursively scans the target directory for files with the specified extensions, skipping paths that match the ignore patterns.
2. **Block Extraction**: Extracts overlapping code blocks with a sliding window, repeated at multiple block sizes.
3. **Normalization**: Replaces string literals, numbers, and colors with placeholders so that blocks are compared by structure rather than by literal values.
4. **Hash Grouping**: Groups exact duplicates by hash for fast matching.
5. **Similarity Matching**: Finds near-duplicates using Jaccard similarity on the sets of lines in each block.
6. **Filtering**: Removes overlapping blocks and deduplicates groups.

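Steps 2 to 5 can be sketched in a few small functions. This is a simplified illustration under stated assumptions, not dslop's source: the type and function names (`Block`, `normalizeLine`, `extractBlocks`, `groupExact`, `jaccard`), the placeholder regexes, treating `//` lines as comments, and keying groups by a SHA-1 of the normalized text are all assumptions. Color normalization and the overlap filtering of step 6 are omitted.

```typescript
import { createHash } from "node:crypto";

// A run of consecutive lines from one file (hypothetical type for this sketch).
interface Block {
  file: string;
  start: number; // 1-based first line
  end: number; // 1-based last line, inclusive
  lines: string[]; // normalized, trimmed lines
}

// Step 3: replace string literals and numbers with placeholders.
// These regexes are illustrative; they ignore multi-line template literals, regex literals, etc.
function normalizeLine(line: string): string {
  return line
    .trim()
    .replace(/(["'`])(?:\\.|(?!\1)[^\\])*\1/g, '"STR"')
    .replace(/\b\d+(?:\.\d+)?\b/g, "NUM");
}

// Step 2: slide a window of each size over the file, one line at a time,
// skipping windows where too few lines are code rather than blanks or comments.
function extractBlocks(file: string, source: string, sizes: number[], minMeaningfulRatio = 0.6): Block[] {
  const lines = source.split("\n").map(normalizeLine);
  const blocks: Block[] = [];
  for (const size of sizes) {
    for (let i = 0; i + size <= lines.length; i++) {
      const chunk = lines.slice(i, i + size);
      const meaningful = chunk.filter((l) => l !== "" && !l.startsWith("//")).length;
      if (meaningful / size < minMeaningfulRatio) continue;
      blocks.push({ file, start: i + 1, end: i + size, lines: chunk });
    }
  }
  return blocks;
}

// Step 4: blocks with identical normalized text share a hash key.
function groupExact(blocks: Block[]): Block[][] {
  const groups = new Map<string, Block[]>();
  for (const block of blocks) {
    const key = createHash("sha1").update(block.lines.join("\n")).digest("hex");
    const group = groups.get(key);
    if (group) group.push(block);
    else groups.set(key, [block]);
  }
  // Only keys seen more than once are duplicates.
  return [...groups.values()].filter((g) => g.length > 1);
}

// Step 5: Jaccard similarity |A ∩ B| / |A ∪ B| over each block's set of lines.
function jaccard(a: Block, b: Block): number {
  const setA = new Set(a.lines);
  const setB = new Set(b.lines);
  let intersection = 0;
  for (const line of setA) if (setB.has(line)) intersection++;
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 1 : intersection / union;
}
```

Two blocks that differ only in their literals, such as `let n = 42;` and `let n = 7;`, normalize to the same text and land in the same hash group. `jaccard` scores the rest: line sets `{a, b, c}` and `{a, b, d}` share 2 of 4 distinct lines, for a similarity of 0.5.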
## Configuration

Detection parameters that are not exposed as CLI options are defined as constants in `src/constants.ts`:

```typescript
// Block extraction
export const MAX_BLOCK_SIZE = 100; // Maximum lines per block
export const BLOCK_SIZE_MULTIPLIER = 1.5; // Growth factor between successive window sizes
export const MIN_MEANINGFUL_LINE_RATIO = 0.6; // Skip blocks in which fewer than 60% of lines are code (not comments or blank lines)

// Detection
export const SIZE_BUCKET_DIVISOR = 5; // Bucket blocks by size in steps of ~5 lines
export const MAX_BLOCKS_FOR_SIMILARITY = 10000; // Limit beyond which similarity matching is skipped, for large codebases
export const GROUP_OVERLAP_THRESHOLD = 0.5; // Overlap ratio used when deduplicating groups

// Output
export const MAX_GROUPS_DETAILED = 20; // Maximum number of groups shown in detail
export const MAX_MATCHES_IN_SUMMARY = 5; // Maximum file matches listed per group
```

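One plausible reading of how `MAX_BLOCK_SIZE`, `BLOCK_SIZE_MULTIPLIER`, and `SIZE_BUCKET_DIVISOR` interact is sketched below. The round-up rule, the floor-based bucket formula, and the helpers `windowSizes`, `sizeBucket`, and `bucketBySize` are assumptions, not documented dslop behavior.

```typescript
// Values copied from the configuration above.
const MAX_BLOCK_SIZE = 100;
const BLOCK_SIZE_MULTIPLIER = 1.5;
const SIZE_BUCKET_DIVISOR = 5;

// Assumed schedule: start at the minimum block size and multiply by BLOCK_SIZE_MULTIPLIER,
// rounding up so each step adds at least one line, until MAX_BLOCK_SIZE is exceeded.
function windowSizes(minLines: number): number[] {
  const sizes: number[] = [];
  for (let size = Math.max(1, minLines); size <= MAX_BLOCK_SIZE; size = Math.ceil(size * BLOCK_SIZE_MULTIPLIER)) {
    sizes.push(size);
  }
  return sizes;
}

// Assumed bucketing: blocks of 0-4 lines share bucket 0, 5-9 lines bucket 1, and so on.
function sizeBucket(lineCount: number): number {
  return Math.floor(lineCount / SIZE_BUCKET_DIVISOR);
}

// Group block lengths by bucket; similarity candidates would only be drawn from one bucket.
function bucketBySize(lengths: number[]): Map<number, number[]> {
  const buckets = new Map<number, number[]>();
  for (const len of lengths) {
    const key = sizeBucket(len);
    const bucket = buckets.get(key) ?? [];
    bucket.push(len);
    buckets.set(key, bucket);
  }
  return buckets;
}
```

Under these assumptions, the default `--min-lines` of 4 produces windows of 4, 6, 9, 14, 21, 32, 48, and 72 lines, and comparing only blocks from the same bucket keeps a short block from ever being scored against a much longer one.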
## Example Output

```
Scanning ./src...
Extensions: ts, tsx, js, jsx
Min block size: 4 lines
Similarity threshold: 70%
Normalization: enabled

Scanned 81 files (15,672 lines) in 129ms
Extracted 13,920 code blocks

Found 500 duplicate groups in 51ms

────────────────────────────────────────────────────────────────────────────────
DUPLICATE CODE DETECTED
────────────────────────────────────────────────────────────────────────────────

Group 1 │ EXACT │ 28 lines × 2 occurrences = 56 lines of duplication

├─ QuarterlyWinnerMessagePreview.tsx:197-224
├─ CoachEnteredMessagePreview.tsx:113-140

Code preview:
│ <View style={{
│ backgroundColor: "white",
│ ...

SUMMARY
────────────────────────────────────────────────────────────────────────────────
Total duplicate groups: 500
Exact matches: 450
Similar matches: 50
Files affected: 65
Total duplicate lines: 12,340
Average similarity: 95%
```

## Build

```bash
# Create standalone binary
bun build --compile ./index.ts --outfile dslop
```

## License

MIT