dslop 1.4.0 → 1.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +24 -0
- package/README.md +26 -0
- package/dist/index.cjs +52003 -0
- package/package.json +9 -5
- package/dist/index.js +0 -7215
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,29 @@
 # Changelog
 
+## v1.5.0
+
+[compare changes](https://github.com/turf-sports/dslop/compare/v1.4.1...v1.5.0)
+
+### 🚀 Enhancements
+
+- Add AST-based duplicate detection ([f813c14](https://github.com/turf-sports/dslop/commit/f813c14))
+
+### ❤️ Contributors
+
+- Siddharth Sharma <sharmasiddharthcs@gmail.com>
+
+## v1.4.1
+
+[compare changes](https://github.com/turf-sports/dslop/compare/v1.4.0...v1.4.1)
+
+### 📖 Documentation
+
+- Add technical details and limitations ([3598291](https://github.com/turf-sports/dslop/commit/3598291))
+
+### ❤️ Contributors
+
+- Siddharth Sharma <sharmasiddharthcs@gmail.com>
+
 ## v1.4.0
 
 [compare changes](https://github.com/turf-sports/dslop/compare/v1.3.1...v1.4.0)
package/README.md
CHANGED
@@ -34,6 +34,32 @@ dslop --all --cross-package # cross-package dupes (monorepos)
 | `--cross-package` | only show dupes across packages |
 | `--json` | json output |
 
+## How it works
+
+**Block extraction:** Sliding window over source files. Extracts overlapping blocks at sizes 4, 6, 9, 13... lines. For blocks <10 lines, step=1 (every line). Larger blocks use step=blockSize/2.
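
The block-extraction rule in the added README text can be sketched as follows. This is an illustration only, not dslop's code: the `Block` type, `stepFor`, and `extractBlocks` are hypothetical names, the size list is limited to the four sizes the README spells out, and rounding `blockSize/2` down is an assumption.

```typescript
// Hypothetical types and names; dslop's real implementation may differ.
interface Block {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  text: string;
}

// Step rule from the README: blocks under 10 lines slide one line at a time,
// larger blocks slide by half their size (rounding down is an assumption).
function stepFor(blockSize: number): number {
  return blockSize < 10 ? 1 : Math.floor(blockSize / 2);
}

// Slide a window of each size over the file and collect every full window.
function extractBlocks(source: string, sizes: number[] = [4, 6, 9, 13]): Block[] {
  const lines = source.split("\n");
  const blocks: Block[] = [];
  for (const size of sizes) {
    const step = stepFor(size);
    for (let start = 0; start + size <= lines.length; start += step) {
      blocks.push({
        startLine: start + 1,
        endLine: start + size,
        text: lines.slice(start, start + size).join("\n"),
      });
    }
  }
  return blocks;
}
```

Under these assumptions a 13-line file yields 10 + 8 + 5 + 1 = 24 blocks, which shows how quickly the overlapping windows multiply and why the README later lists memory as a limitation.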
+
+**Normalization:** Before hashing, code is normalized:
+- String literals → `"<STRING>"`
+- Numbers → `<NUMBER>`
+- Whitespace collapsed
+- Comments preserved (intentional - comments often indicate copy-paste)
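
A minimal sketch of the four normalization rules listed above, assuming a purely regex-based pass. The function name `normalize` and the exact patterns are hypothetical, and the README does not say how dslop tokenizes literals. In particular this sketch also rewrites literals that appear inside comments, and an apostrophe in a comment can be mistaken for the start of a string.

```typescript
// Hypothetical regex-based normalizer; not dslop's actual implementation.
function normalize(code: string): string {
  return code
    // Replace double-, single-, and backtick-quoted literals (escapes allowed).
    .replace(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`/g, '"<STRING>"')
    // Replace standalone numeric literals such as 42 or 3.14 (not digits in names like x1).
    .replace(/\b\d+(?:\.\d+)?\b/g, "<NUMBER>")
    // Collapse every run of whitespace, including newlines, to a single space.
    .replace(/\s+/g, " ")
    .trim();
  // Comments are deliberately left in place, matching the README's stated tradeoff.
}
```

With this sketch, `f("a", 1)` and `f('b',   22)` both normalize to `f("<STRING>", <NUMBER>)`, so they would hash identically.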
+
+**Matching:** Normalized blocks are hashed. Exact hash matches = exact duplicates. For similar (non-exact) matches, uses character-level similarity on a sample of blocks per hash bucket.
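
The two matching stages might look roughly like the sketch below. Everything named here is an assumption: the README specifies neither the hash function nor the similarity metric, so an FNV-1a-style hash and a character-bigram Dice coefficient stand in for them, and how dslop samples candidates for similar matching is not modeled.

```typescript
// Stand-in hash (FNV-1a style over UTF-16 code units); dslop's hash is not documented here.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Bucket normalized blocks by hash; any bucket with 2+ members is an exact-duplicate group.
// Returns the indices of the blocks in each group.
function groupExactDuplicates(normalizedBlocks: string[]): number[][] {
  const buckets = new Map<string, number[]>();
  normalizedBlocks.forEach((text, index) => {
    const key = fnv1a(text);
    const bucket = buckets.get(key);
    if (bucket) bucket.push(index);
    else buckets.set(key, [index]);
  });
  return Array.from(buckets.values()).filter((bucket) => bucket.length > 1);
}

// Stand-in character-level similarity: Dice coefficient over character bigrams, in [0, 1].
function charSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < a.length - 1; i++) {
    const gram = a.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const gram = b.slice(i, i + 2);
    const remaining = counts.get(gram) ?? 0;
    if (remaining > 0) {
      shared++;
      counts.set(gram, remaining - 1);
    }
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}
```

Hash bucketing keeps exact matching close to linear in the number of blocks. The pairwise similarity check is the expensive part, which is presumably why the README mentions sampling.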
+
+**Declaration detection** (`--all` mode): Regex-based extraction of types, interfaces, functions, classes. Compares by name similarity (Levenshtein + word overlap) and content similarity.
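
The name-similarity half of declaration detection could be approximated as below. The README names the two ingredients (Levenshtein and word overlap) but not how they are combined, so the equal weighting, the Jaccard form of word overlap, and all function names are assumptions for illustration.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost);
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// Split camelCase, PascalCase, and snake_case identifiers into lowercase words.
function splitWords(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

// Word overlap as a Jaccard index over the two word sets (an assumed definition).
function wordOverlap(a: string, b: string): number {
  const wordsA = new Set(splitWords(a));
  const wordsB = new Set(splitWords(b));
  let shared = 0;
  wordsA.forEach((word) => {
    if (wordsB.has(word)) shared++;
  });
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Assumed combination: average of normalized edit similarity and word overlap.
function nameSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  const editSimilarity = longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
  return (editSimilarity + wordOverlap(a, b)) / 2;
}
```

For example, `getUserName` and `fetchUserName` share two of four distinct words, giving a word overlap of 0.5 even though their edit distance is nonzero.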
+
+**Changed-line filtering** (default mode): Parses `git diff` output to get exact line ranges. Only reports duplicates where your changed lines match code elsewhere.
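
One way to recover exact changed-line ranges from unified `git diff` output is sketched below. It is a hedged illustration, not dslop's parser: `LineRange` and `changedRanges` are hypothetical names, the input is assumed to be standard unified-diff text, and which `git diff` flags dslop passes is not stated in the README.

```typescript
interface LineRange {
  start: number; // 1-based, inclusive, on the new side of the diff
  end: number;
}

// Walk unified-diff text and collect, per file, the new-side line ranges that were added.
function changedRanges(diff: string): Map<string, LineRange[]> {
  const result = new Map<string, LineRange[]>();
  let ranges: LineRange[] | undefined;
  let newLine = 0;
  let inHunk = false;
  for (const line of diff.split("\n")) {
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (line.startsWith("+++ ")) {
      // New-side file header, e.g. "+++ b/src/a.ts"; strip the "b/" prefix.
      const path = line.slice(4).replace(/^b\//, "");
      ranges = [];
      result.set(path, ranges);
      inHunk = false;
    } else if (hunk) {
      // The hunk header gives the first new-side line number of the hunk.
      newLine = Number(hunk[1]);
      inHunk = true;
    } else if (inHunk && ranges) {
      if (line.startsWith("+")) {
        // Added line: extend the previous range if contiguous, else start a new one.
        const last = ranges[ranges.length - 1];
        if (last && last.end === newLine - 1) last.end = newLine;
        else ranges.push({ start: newLine, end: newLine });
        newLine++;
      } else if (line.startsWith(" ")) {
        newLine++; // context line: exists on the new side but is unchanged
      }
      // "-" lines and "\ No newline" markers do not advance the new-side counter.
    }
  }
  return result;
}
```

A duplicate finder can then keep only those matches whose block overlaps one of these ranges, which is the filtering behavior the README describes for default mode.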
+
+## Limitations
+
+- **Text-based, not AST:** Doesn't understand code structure. A reformatted function won't match the original. Two semantically identical functions with different variable names won't match.
+- **TypeScript/JavaScript focused:** Default extensions are ts/tsx/js/jsx. Works on any text but tuned for JS-like syntax.
+- **No cross-language:** Won't detect a Python function duplicated in TypeScript.
+- **Comments affect matching:** Intentional tradeoff. Copy-pasted code often includes comments.
+- **Declaration detection is regex:** Can miss edge cases like multi-line generics or decorators.
+- **Minimum 4 lines:** Shorter duplicates ignored to reduce noise. Use `-m 2` for stricter.
+- **Memory:** Loads all blocks in memory. Very large codebases (>1M lines) may be slow.
+
 ## License
 
 MIT