dslop 1.5.1 → 1.6.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
  # Changelog
 
+ ## v1.6.0
+
+ [compare changes](https://github.com/turf-sports/dslop/compare/v1.5.2...v1.6.0)
+
+ ### 🚀 Enhancements
+
+ - Smarter defaults - auto-switch to full scan when no changes ([fb918a8](https://github.com/turf-sports/dslop/commit/fb918a8))
+
+ ### ❤️ Contributors
+
+ - Siddharth Sharma <sharmasiddharthcs@gmail.com>
+
+ ## v1.5.2
+
+ [compare changes](https://github.com/turf-sports/dslop/compare/v1.5.1...v1.5.2)
+
+ ### 📖 Documentation
+
+ - Update README with AST-based detection details ([2bdb4f4](https://github.com/turf-sports/dslop/commit/2bdb4f4))
+
+ ### ❤️ Contributors
+
+ - Siddharth Sharma <sharmasiddharthcs@gmail.com>
+
  ## v1.5.1
 
  [compare changes](https://github.com/turf-sports/dslop/compare/v1.5.0...v1.5.1)
package/README.md CHANGED
@@ -6,7 +6,7 @@ Find duplicate code in your codebase.
  npx dslop
  ```
 
- By default, checks your branch changes (committed + uncommitted) against the codebase.
+ By default, checks your branch changes against the codebase. If no changes found, automatically does a full scan.
 
  ## Install
 
@@ -17,17 +17,18 @@ npm i -g dslop
  ## Usage
 
  ```bash
- dslop # check your PR/branch for dupes
- dslop --all # scan entire codebase
- dslop ./src -m 6 -s 80 # 6 line min, 80% similarity
- dslop --all --cross-package # cross-package dupes (monorepos)
+ dslop # check PR changes (or full scan if none)
+ dslop ./apps/web # scan apps/web (full if no changes there)
+ dslop -c # changes only, exit if none found
+ dslop --cross-package # cross-package dupes (monorepos)
  ```
 
  ## Options
 
  | Flag | Description |
  |------|-------------|
- | `-a, --all` | scan entire codebase (default: uncommitted only) |
+ | `-a, --all` | force full codebase scan |
+ | `-c, --changes` | force changes-only mode (exit if no changes) |
  | `-m, --min-lines` | min lines per block (default: 4) |
  | `-s, --similarity` | similarity threshold 0-100 (default: 70) |
  | `-e, --extensions` | file extensions (default: ts,tsx,js,jsx) |
@@ -36,27 +37,40 @@ dslop --all --cross-package # cross-package dupes (monorepos)
 
  ## How it works
 
- **Block extraction:** Sliding window over source files. Extracts overlapping blocks at sizes 4, 6, 9, 13... lines. For blocks <10 lines, step=1 (every line). Larger blocks use step=blockSize/2.
+ dslop uses two detection methods in parallel:
 
- **Normalization:** Before hashing, code is normalized:
+ ### 1. AST-based detection (functions/classes)
+
+ Parses TypeScript/JavaScript with Babel to extract functions and classes. Normalizes the AST by replacing all identifiers with generic placeholders (`$0`, `$1`, etc.), preserving only the code structure.
+
+ **This catches:**
+ - Functions with identical logic but different variable names
+ - Renamed copies of existing functions
+ - Structurally identical classes
+
+ Example: `calculateSum(numbers)` and `computeTotal(items)` with the same loop structure will match.
+
+ ### 2. Text-based detection (code blocks)
+
+ Sliding window over source files extracts overlapping blocks at sizes 4, 6, 9, 13... lines. Before hashing, code is normalized:
  - String literals → `"<STRING>"`
  - Numbers → `<NUMBER>`
  - Whitespace collapsed
  - Comments preserved (intentional - comments often indicate copy-paste)
 
- **Matching:** Normalized blocks are hashed. Exact hash matches = exact duplicates. For similar (non-exact) matches, uses character-level similarity on a sample of blocks per hash bucket.
+ Exact hash matches = exact duplicates. For similar (non-exact) matches, uses character-level similarity.
 
- **Declaration detection** (`--all` mode): Regex-based extraction of types, interfaces, functions, classes. Compares by name similarity (Levenshtein + word overlap) and content similarity.
+ ### Smart defaults
 
- **Changed-line filtering** (default mode): Parses `git diff` output to get exact line ranges. Only reports duplicates where your changed lines match code elsewhere.
+ 1. If you have branch changes checks those against the codebase
+ 2. If no changes found → automatically scans the entire target path
+ 3. Use `-c` to force changes-only mode (useful in CI)
 
  ## Limitations
 
- - **Text-based, not AST:** Doesn't understand code structure. A reformatted function won't match the original. Two semantically identical functions with different variable names won't match.
- - **TypeScript/JavaScript focused:** Default extensions are ts/tsx/js/jsx. Works on any text but tuned for JS-like syntax.
+ - **TypeScript/JavaScript only for AST:** AST parsing uses Babel with TS/JSX plugins. Other languages fall back to text-based only.
  - **No cross-language:** Won't detect a Python function duplicated in TypeScript.
- - **Comments affect matching:** Intentional tradeoff. Copy-pasted code often includes comments.
- - **Declaration detection is regex:** Can miss edge cases like multi-line generics or decorators.
+ - **Comments affect text matching:** Intentional tradeoff. Copy-pasted code often includes comments.
  - **Minimum 4 lines:** Shorter duplicates ignored to reduce noise. Use `-m 2` for stricter.
  - **Memory:** Loads all blocks in memory. Very large codebases (>1M lines) may be slow.
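
Reviewer note on the README changes above: the two normalization ideas can be illustrated with a rough sketch. This is not dslop's implementation. The README says the tool parses with Babel, whereas the code below uses plain regular expressions as a stand-in, so it will mishandle template literals, regex literals, quotes inside comments, and numeric forms such as `1e5`. The names `normalizeText`, `normalizeIdentifiers`, and `KEYWORDS` are hypothetical.

```javascript
// Text-based pass (assumed shape): mask literals and collapse whitespace so
// that blocks differing only in literal values or layout compare equal.
function normalizeText(block) {
  return block
    .replace(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g, '"<STRING>"')
    .replace(/\b\d+(?:\.\d+)?\b/g, "<NUMBER>")
    .replace(/\s+/g, " ")
    .trim();
}

// Assumption: a short keyword list stands in for a parser's knowledge of
// which tokens are identifiers. A real AST pass would not need this.
const KEYWORDS = new Set([
  "function", "return", "for", "of", "in", "let", "const", "var",
  "if", "else", "while", "class", "new", "this",
]);

// Structure-only pass (token-level approximation of the AST idea): each
// distinct identifier becomes $0, $1, ... in order of first appearance.
function normalizeIdentifiers(source) {
  const placeholders = new Map();
  return source.replace(/[A-Za-z_$][\w$]*/g, (name) => {
    if (KEYWORDS.has(name)) return name;
    if (!placeholders.has(name)) {
      placeholders.set(name, `$${placeholders.size}`);
    }
    return placeholders.get(name);
  });
}

// The README's own example: same loop structure, different names.
const original =
  "function calculateSum(numbers) { let total = 0; for (const n of numbers) total += n; return total; }";
const renamed =
  "function computeTotal(items) { let sum = 0; for (const x of items) sum += x; return sum; }";

console.log(normalizeIdentifiers(original) === normalizeIdentifiers(renamed)); // true
```

The text pass leaves comments untouched, which matches the README's stated tradeoff that comments take part in text matching.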
 
package/dist/index.cjs CHANGED
@@ -51724,7 +51724,7 @@ async function scanDirectory(targetPath, options, enableAST = true) {
  }
 
  // index.ts
- var VERSION = process.env.npm_package_version || "1.5.1";
+ var VERSION = process.env.npm_package_version || "1.6.0";
  function parseDiffOutput(diff, cwd) {
  const changes = /* @__PURE__ */ new Map();
  let currentFile = null;
@@ -51812,13 +51812,15 @@ dslop - Detect Similar/Duplicate Lines Of Programming
  Usage:
  dslop [path] [options]
 
- By default, checks your branch changes (committed + uncommitted) against the codebase.
+ By default, checks your branch changes against the codebase.
+ If no changes are found, automatically scans the entire path.
 
  Arguments:
  path Directory to scan (default: current directory)
 
  Options:
- -a, --all Scan entire codebase (not just uncommitted changes)
+ -a, --all Force full codebase scan
+ -c, --changes Force changes-only mode (exit if no changes found)
  -m, --min-lines <n> Minimum block size in lines (default: ${DEFAULT_MIN_LINES})
  -s, --similarity <n> Minimum similarity threshold 0-100 (default: ${Math.round(DEFAULT_SIMILARITY * 100)})
  -e, --extensions <s> File extensions to scan (default: ${DEFAULT_EXTENSIONS.join(",")})
@@ -51830,10 +51832,10 @@ Options:
  -v, --version Show version
 
  Examples:
- dslop Check your PR/branch changes for duplicates
- dslop --all Scan entire codebase
- dslop ./src -m 6 -s 80 Scan src with 6 line min, 80% similarity
- dslop --all --cross-package Cross-package duplicates in entire codebase
+ dslop Check your PR changes (or full scan if none)
+ dslop ./apps/web Scan apps/web (full scan if no changes there)
+ dslop -c Check PR changes only, exit if none
+ dslop --cross-package Cross-package duplicates
  `);
  }
  function showVersion() {
@@ -51850,6 +51852,7 @@ async function main() {
  normalize: { type: "boolean", default: true },
  "no-normalize": { type: "boolean", default: false },
  all: { type: "boolean", short: "a", default: false },
+ changes: { type: "boolean", short: "c", default: false },
  "cross-package": { type: "boolean", default: false },
  json: { type: "boolean", default: false },
  help: { type: "boolean", short: "h", default: false },
@@ -51871,9 +51874,11 @@ async function main() {
  const extensions = values.extensions.split(",").map((e) => e.trim());
  const ignorePatterns = values.ignore.split(",").map((p) => p.trim());
  const normalize2 = !values["no-normalize"];
- const scanAll = values.all;
+ const forceAll = values.all;
+ const forceChanges = values.changes;
  const crossPackage = values["cross-package"];
  const jsonOutput = values.json;
+ const hasExplicitPath = positionals.length > 0;
  if (minLines < 2) {
  console.error("Error: --min-lines must be at least 2");
  process.exit(1);
@@ -51888,32 +51893,47 @@ async function main() {
  minLines,
  normalize: normalize2
  };
- const changedLines = !scanAll ? getChangedLines(targetPath) : null;
- if (!scanAll && changedLines?.size === 0) {
- console.log("\nNo changes found. Use --all to scan entire codebase.");
- process.exit(0);
+ let scanAll = forceAll;
+ let changedLines = null;
+ if (!forceAll) {
+ changedLines = getChangedLines(targetPath);
+ if (changedLines.size === 0) {
+ if (forceChanges) {
+ console.log("\nNo changes found on current branch.");
+ process.exit(0);
+ }
+ scanAll = true;
+ if (!jsonOutput) {
+ console.log(`
+ No changes detected${hasExplicitPath ? ` in ${targetPath}` : ""}, defaulting to full scan...`);
+ }
+ }
  }
- console.log(`
+ if (!jsonOutput) {
+ console.log(`
  Scanning ${targetPath}...`);
- if (!scanAll && changedLines) {
- console.log(` Mode: checking changed lines in ${changedLines.size} files`);
- } else {
- console.log(` Mode: full codebase scan`);
- }
- console.log(` Extensions: ${extensions.join(", ")}`);
- console.log(` Min block size: ${minLines} lines`);
- console.log(` Similarity threshold: ${Math.round(similarity * 100)}%`);
- if (crossPackage) {
- console.log(` Cross-package: enabled`);
+ if (!scanAll && changedLines && changedLines.size > 0) {
+ console.log(` Mode: checking changed lines in ${changedLines.size} files`);
+ } else {
+ console.log(` Mode: full codebase scan`);
+ }
+ console.log(` Extensions: ${extensions.join(", ")}`);
+ console.log(` Min block size: ${minLines} lines`);
+ console.log(` Similarity threshold: ${Math.round(similarity * 100)}%`);
+ if (crossPackage) {
+ console.log(` Cross-package: enabled`);
+ }
+ console.log();
  }
- console.log();
  try {
  const startTime = performance.now();
  const { blocks, declarations, astBlocks, fileCount, totalLines } = await scanDirectory(targetPath, scanOptions, true);
  const scanTime = performance.now() - startTime;
- console.log(`Scanned ${fileCount} files (${totalLines.toLocaleString()} lines) in ${Math.round(scanTime)}ms`);
- console.log(`Extracted ${blocks.length.toLocaleString()} blocks, ${astBlocks.length.toLocaleString()} functions/classes
+ if (!jsonOutput) {
+ console.log(`Scanned ${fileCount} files (${totalLines.toLocaleString()} lines) in ${Math.round(scanTime)}ms`);
+ console.log(`Extracted ${blocks.length.toLocaleString()} blocks, ${astBlocks.length.toLocaleString()} functions/classes
  `);
+ }
  if (blocks.length === 0 && astBlocks.length === 0) {
  console.log("No code found to analyze.");
  process.exit(0);
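
Reviewer note on the `main()` hunk above: the new branching reduces to a small decision table, restated here as a pure function for readability. This is a sketch of the logic as it reads in the diff, not code from the package. `resolveScanMode` is a hypothetical name, and `changedFileCount` stands in for `changedLines.size`.

```javascript
// Returns which path main() would take for a given flag combination.
//   "full"    -> scan the whole target path
//   "changes" -> check only changed lines
//   "exit"    -> print "No changes found on current branch." and exit 0
function resolveScanMode({ forceAll, forceChanges, changedFileCount }) {
  // -a short-circuits: getChangedLines() is never called in this case.
  if (forceAll) return "full";
  // Changes exist, so the default and -c behave the same way.
  if (changedFileCount > 0) return "changes";
  // No changes: -c exits, otherwise 1.6.0 falls back to a full scan
  // (1.5.1 exited here and told the user to pass --all).
  return forceChanges ? "exit" : "full";
}

console.log(resolveScanMode({ forceAll: false, forceChanges: false, changedFileCount: 0 })); // full
console.log(resolveScanMode({ forceAll: false, forceChanges: true, changedFileCount: 0 }));  // exit
```

One consequence visible in the diff: when both `-a` and `-c` are passed, `-a` takes effect, because the changed-line lookup is skipped entirely. The hunk also gates the progress output on `!jsonOutput`, so `--json` runs no longer print the scan banner.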
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "dslop",
- "version": "1.5.1",
+ "version": "1.6.0",
  "description": "Detect Similar/Duplicate Lines Of Programming - Find code duplication in your codebase",
  "type": "module",
  "main": "./dist/index.cjs",