dslop 1.5.1 → 1.5.2

package/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
  # Changelog
 
+ ## v1.5.2
+
+ [compare changes](https://github.com/turf-sports/dslop/compare/v1.5.1...v1.5.2)
+
+ ### 📖 Documentation
+
+ - Update README with AST-based detection details ([2bdb4f4](https://github.com/turf-sports/dslop/commit/2bdb4f4))
+
+ ### ❤️ Contributors
+
+ - Siddharth Sharma <sharmasiddharthcs@gmail.com>
+
  ## v1.5.1
 
  [compare changes](https://github.com/turf-sports/dslop/compare/v1.5.0...v1.5.1)
package/README.md CHANGED
@@ -36,26 +36,42 @@ dslop --all --cross-package # cross-package dupes (monorepos)
 
  ## How it works
 
- **Block extraction:** Sliding window over source files. Extracts overlapping blocks at sizes 4, 6, 9, 13... lines. For blocks <10 lines, step=1 (every line). Larger blocks use step=blockSize/2.
+ dslop uses two detection methods in parallel:
 
- **Normalization:** Before hashing, code is normalized:
+ ### 1. AST-based detection (functions/classes)
+
+ Parses TypeScript/JavaScript with Babel to extract functions and classes. Normalizes the AST by replacing all identifiers with generic placeholders (`$0`, `$1`, etc.), preserving only the code structure.
+
+ **This catches:**
+ - Functions with identical logic but different variable names
+ - Renamed copies of existing functions
+ - Structurally identical classes
+
+ Example: `calculateSum(numbers)` and `computeTotal(items)` with the same loop structure will match.
+
+ ### 2. Text-based detection (code blocks)
+
+ Sliding window over source files extracts overlapping blocks at sizes 4, 6, 9, 13... lines. Before hashing, code is normalized:
  - String literals → `"<STRING>"`
  - Numbers → `<NUMBER>`
  - Whitespace collapsed
  - Comments preserved (intentional - comments often indicate copy-paste)
 
- **Matching:** Normalized blocks are hashed. Exact hash matches = exact duplicates. For similar (non-exact) matches, uses character-level similarity on a sample of blocks per hash bucket.
+ Exact hash matches = exact duplicates. For similar (non-exact) matches, uses character-level similarity.
+
+ ### Changed-line filtering (default mode)
+
+ Parses `git diff` output to get exact line ranges of your changes. Only reports duplicates where your changed lines match code elsewhere in the codebase.
 
- **Declaration detection** (`--all` mode): Regex-based extraction of types, interfaces, functions, classes. Compares by name similarity (Levenshtein + word overlap) and content similarity.
+ ### Declaration detection (`--all` mode)
 
- **Changed-line filtering** (default mode): Parses `git diff` output to get exact line ranges. Only reports duplicates where your changed lines match code elsewhere.
+ Regex-based extraction of types, interfaces, enums. Compares by name similarity (Levenshtein + word overlap) and content similarity.
 
  ## Limitations
 
- - **Text-based, not AST:** Doesn't understand code structure. A reformatted function won't match the original. Two semantically identical functions with different variable names won't match.
- - **TypeScript/JavaScript focused:** Default extensions are ts/tsx/js/jsx. Works on any text but tuned for JS-like syntax.
+ - **TypeScript/JavaScript only for AST:** AST parsing uses Babel with TS/JSX plugins. Other languages fall back to text-based only.
  - **No cross-language:** Won't detect a Python function duplicated in TypeScript.
- - **Comments affect matching:** Intentional tradeoff. Copy-pasted code often includes comments.
+ - **Comments affect text matching:** Intentional tradeoff. Copy-pasted code often includes comments.
  - **Declaration detection is regex:** Can miss edge cases like multi-line generics or decorators.
  - **Minimum 4 lines:** Shorter duplicates ignored to reduce noise. Use `-m 2` for stricter.
  - **Memory:** Loads all blocks in memory. Very large codebases (>1M lines) may be slow.
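
Note on the README hunk above: the text-based detection it describes (normalize, then match fixed-size blocks from a sliding window) can be sketched roughly as follows. This is an illustration under stated assumptions, not dslop's implementation: `normalizeLine` and `findDuplicateBlocks` are hypothetical names, the regexes are simplified, only one block size (4, step 1) is used, and blocks are grouped by their normalized text where the README says a hash is used.

```javascript
// Hypothetical sketch of text-based duplicate detection; not dslop's code.

// Normalize one line: string literals -> "<STRING>", numbers -> <NUMBER>,
// whitespace collapsed. Comments are left untouched, as the README describes.
function normalizeLine(line) {
  return line
    .replace(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g, '"<STRING>"')
    .replace(/\b\d+(?:\.\d+)?\b/g, "<NUMBER>")
    .replace(/\s+/g, " ")
    .trim();
}

// Slide a window of `blockSize` lines over the source (step 1) and group
// windows whose normalized text is identical. Returns, for each duplicated
// block, the 1-based starting line numbers of its occurrences.
function findDuplicateBlocks(source, blockSize = 4) {
  const lines = source.split("\n").map(normalizeLine);
  const buckets = new Map(); // normalized block text -> start lines
  for (let start = 0; start + blockSize <= lines.length; start++) {
    const key = lines.slice(start, start + blockSize).join("\n");
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(start + 1);
  }
  return [...buckets.values()].filter((starts) => starts.length > 1);
}
```

With this sketch, two four-line blocks that differ only in their string and number literals land in the same bucket, while a block whose identifiers were renamed does not, which is the gap the AST-based method is described as covering.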
package/dist/index.cjs CHANGED
@@ -51724,7 +51724,7 @@ async function scanDirectory(targetPath, options, enableAST = true) {
  }
 
  // index.ts
- var VERSION = process.env.npm_package_version || "1.5.1";
+ var VERSION = process.env.npm_package_version || "1.5.2";
  function parseDiffOutput(diff, cwd) {
  const changes = /* @__PURE__ */ new Map();
  let currentFile = null;
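
Note on the hunk above: the context lines show only the opening of `parseDiffOutput`; its body is not part of this diff. As a rough illustration of the changed-line filtering idea from the README, the sketch below collects the new-file line numbers of added lines from unified `git diff` output. The name `parseChangedLines`, its single-argument signature, and its return shape are assumptions made for this example and may differ from what dslop actually does.

```javascript
// Hypothetical sketch: map each changed file to the line numbers (in the new
// version of the file) of lines added by a unified diff. Not dslop's code.
function parseChangedLines(diff) {
  const changes = new Map(); // file path -> added line numbers
  let currentFile = null;
  let newLine = 0; // current line number in the new file
  let inHunk = false;

  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // Start of a new file section.
      inHunk = false;
      currentFile = null;
      continue;
    }
    if (!inHunk && line.startsWith("+++ ")) {
      // "+++ b/path" names the new file; "/dev/null" means it was deleted.
      const path = line.slice(4).trim();
      currentFile = path === "/dev/null" ? null : path.replace(/^b\//, "");
      if (currentFile && !changes.has(currentFile)) changes.set(currentFile, []);
      continue;
    }
    // Hunk header: "@@ -oldStart,oldCount +newStart,newCount @@"
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (hunk) {
      newLine = Number(hunk[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk || !currentFile) continue;

    if (line.startsWith("+")) {
      changes.get(currentFile).push(newLine); // added line
      newLine++;
    } else if (line.startsWith(" ")) {
      newLine++; // context line exists in the new file too
    }
    // Removed lines ("-") do not advance the new-file counter.
  }
  return changes;
}
```

A duplicate report could then be limited to matches whose line range overlaps one of the collected line numbers for that file.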
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "dslop",
- "version": "1.5.1",
+ "version": "1.5.2",
  "description": "Detect Similar/Duplicate Lines Of Programming - Find code duplication in your codebase",
  "type": "module",
  "main": "./dist/index.cjs",