@aiready/pattern-detect 0.7.11 → 0.7.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +81 -558
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -2,635 +2,158 @@
2
2
 
3
3
  > **Semantic duplicate pattern detection for AI-generated code**
4
4
 
5
- When AI tools generate code without awareness of existing patterns in your codebase, you end up with semantically similar but syntactically different implementations. This tool finds those patterns and quantifies their cost.
5
+ Finds semantically similar but syntactically different code patterns that waste AI context and confuse models.
6
6
 
7
- ## 🎯 Why This Tool?
7
+ ## 🚀 Quick Start
8
8
 
9
- ### The AI Code Problem
10
-
11
- AI coding assistants (GitHub Copilot, ChatGPT, Claude) generate functionally similar code in different ways because:
12
- - No awareness of existing patterns in your codebase
13
- - Different AI models have different coding styles
14
- - Team members use AI tools with varying contexts
15
- - AI can't see your full codebase (context window limits)
16
-
17
- ### What Makes Us Different?
18
-
19
- | Feature | jscpd | @aiready/pattern-detect |
20
- |---------|-------|------------------------|
21
- | Detection Method | Byte-level exact matching | Semantic similarity |
22
- | Pattern Types | Generic blocks | Categorized (API, validators, utils, etc.) |
23
- | Token Cost | ❌ No | ✅ Yes - shows AI context waste |
24
- | Refactoring Suggestions | ❌ Generic | ✅ Specific to pattern type |
25
- | Output Formats | Text/JSON | Console/JSON/HTML with rich formatting |
26
-
27
- #### How We Differ (and When to Use Each)
28
-
29
- - **Semantic intent vs exact clones**: jscpd flags copy-paste or near-duplicates; we detect functionally similar code even when structure differs (e.g., two API handlers with different frameworks).
30
- - **Pattern typing**: We classify duplicates into `api-handler`, `validator`, `utility`, `component`, etc., so teams can prioritize coherent refactors.
31
- - **AI context cost**: We estimate tokens wasted to quantify impact on AI tools (larger context, higher cost, more confusion).
32
- - **Refactoring guidance**: We propose targeted fixes per pattern type (e.g., extract middleware or create base handler).
33
- - **Performance profile**: We use Jaccard similarity with candidate filtering; ~2–3s for ~500 blocks on medium repos.
34
-
35
- Recommended workflow:
36
- - Run **jscpd** in CI to enforce low clone percentage (blocking).
37
- - Run **@aiready/pattern-detect** to surface semantic duplicates and token waste (advisory), feeding a refactoring backlog.
38
- - Use both for comprehensive hygiene: jscpd for exact clones; AIReady for intent-level duplication that AI tends to reintroduce.
39
-
40
- ## 🚀 Installation
9
+ **Recommended: Use the unified CLI** (includes pattern detection + more tools):
41
10
 
42
11
  ```bash
43
- npm install -g @aiready/pattern-detect
44
-
45
- # Or use directly with npx
46
- npx @aiready/pattern-detect ./src
12
+ npm install -g @aiready/cli
13
+ aiready patterns ./src
47
14
  ```
48
15
 
49
- ## 📊 Usage
50
-
51
- ### CLI
16
+ **Or use this package directly:**
52
17
 
53
18
  ```bash
54
- # Basic usage
19
+ npm install -g @aiready/pattern-detect
55
20
  aiready-patterns ./src
56
-
57
- # Adjust sensitivity
58
- aiready-patterns ./src --similarity 0.9
59
-
60
- # Only look at larger patterns
61
- aiready-patterns ./src --min-lines 10
62
-
63
- # Filter by severity (focus on critical issues first)
64
- aiready-patterns ./src --severity critical # Only >95% similar
65
- aiready-patterns ./src --severity high # Only >90% similar
66
- aiready-patterns ./src --severity medium # Only >80% similar
67
-
68
- # Include test files (excluded by default)
69
- aiready-patterns ./src --include-tests
70
-
71
- # Memory optimization for large codebases
72
- aiready-patterns ./src --max-blocks 1000 --batch-size 200
73
-
74
- # Export to JSON
75
- aiready-patterns ./src --output json --output-file report.json
76
-
77
- # Generate HTML report
78
- aiready-patterns ./src --output html
79
- ```
80
-
81
- #### Presets (quick copy/paste)
82
-
83
- ```bash
84
- # Speed-first (large repos)
85
- aiready-patterns ./src \
86
- --min-shared-tokens 12 \
87
- --max-candidates 60 \
88
- --max-blocks 300
89
-
90
- # Coverage-first (more findings)
91
- aiready-patterns ./src \
92
- --min-shared-tokens 6 \
93
- --max-candidates 150
94
-
95
- # Short-block focus (helpers/utilities)
96
- aiready-patterns ./src \
97
- --min-lines 5 \
98
- --min-shared-tokens 6 \
99
- --max-candidates 120 \
100
- --exclude "**/test/**"
101
-
102
- # Deep dive with streaming (comprehensive detection)
103
- aiready-patterns ./src \
104
- --no-approx \
105
- --stream-results
106
21
  ```
107
22
 
108
- ### Configuration
109
-
110
- Create an `aiready.json` or `aiready.config.json` file in your project root to persist settings:
111
-
112
- ```json
113
- {
114
- "scan": {
115
- "include": ["**/*.{ts,tsx,js,jsx}"],
116
- "exclude": ["**/test/**", "**/*.test.*"]
117
- },
118
- "tools": {
119
- "pattern-detect": {
120
- "minSimilarity": 0.5,
121
- "minLines": 8,
122
- "approx": false,
123
- "batchSize": 200,
124
- "severity": "high",
125
- "includeTests": false
126
- }
127
- }
128
- }
129
- ```
23
+ ## 🎯 What It Does
130
24
 
131
- CLI options override config file settings.
25
+ AI tools generate similar code in different ways because they lack awareness of your codebase patterns. This tool:
132
26
 
133
- ### Programmatic API
134
-
135
- ```typescript
136
- import { analyzePatterns, generateSummary } from '@aiready/pattern-detect';
137
-
138
- const results = await analyzePatterns({
139
- rootDir: './src',
140
- minSimilarity: 0.85, // 85% similar
141
- minLines: 5,
142
- include: ['**/*.ts', '**/*.tsx'],
143
- exclude: ['**/*.test.ts', '**/node_modules/**'],
144
- });
145
-
146
- const summary = generateSummary(results);
147
-
148
- console.log(`Found ${summary.totalPatterns} duplicate patterns`);
149
- console.log(`Token cost: ${summary.totalTokenCost} tokens wasted`);
150
- console.log(`Pattern breakdown:`, summary.patternsByType);
151
- ```
152
-
153
- ## 🔍 Real-World Example
154
-
155
- ### Before Analysis
156
-
157
- Two API handlers that were written by AI on different days:
158
-
159
- ```typescript
160
- // File: src/api/users.ts
161
- app.get('/api/users/:id', async (request, response) => {
162
- const user = await db.users.findOne({ id: request.params.id });
163
- if (!user) {
164
- return response.status(404).json({ error: 'User not found' });
165
- }
166
- response.json(user);
167
- });
168
-
169
- // File: src/api/posts.ts
170
- router.get('/posts/:id', async (req, res) => {
171
- const post = await database.posts.findOne({ id: req.params.id });
172
- if (!post) {
173
- res.status(404).send({ message: 'Post not found' });
174
- return;
175
- }
176
- res.json(post);
177
- });
178
- ```
27
+ - **Semantic detection**: Finds functionally similar code (not just copy-paste)
28
+ - **Pattern classification**: Groups duplicates by type (API handlers, validators, utilities, etc.)
29
+ - **Token cost analysis**: Shows wasted AI context budget
30
+ - **Refactoring guidance**: Suggests specific fixes per pattern type
179
31
 
180
- ### Analysis Output
32
+ ### Example Output
181
33
 
182
34
  ```
183
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
184
- PATTERN ANALYSIS SUMMARY
185
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
186
-
187
35
  📁 Files analyzed: 47
188
36
  ⚠ Duplicate patterns found: 23
189
37
  💰 Token cost (wasted): 8,450
190
38
 
191
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
192
- PATTERNS BY TYPE
193
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
194
-
195
- 🌐 api-handler 12
196
- ✓ validator 8
197
- 🔧 utility 3
198
-
199
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
200
- TOP DUPLICATE PATTERNS
201
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
39
+ 🌐 api-handler 12 patterns
40
+ validator 8 patterns
41
+ 🔧 utility 3 patterns
202
42
 
203
43
  1. 87% 🌐 api-handler
204
- src/api/users.ts:15
205
- ↔ src/api/posts.ts:22
44
+ src/api/users.ts:15 ↔ src/api/posts.ts:22
206
45
  432 tokens wasted
207
-
208
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
209
- CRITICAL ISSUES (>95% similar)
210
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
211
-
212
- ● src/utils/validators.ts:15
213
- validator pattern 97% similar to src/utils/checks.ts (125 tokens wasted)
214
- → Consolidate validation logic into shared schema validators (Zod/Yup) (CRITICAL: Nearly identical code)
215
- ```
216
-
217
- ### Suggested Refactoring
218
-
219
- Create a generic handler:
220
-
221
- ```typescript
222
- // utils/apiHandler.ts
223
- export const createResourceHandler = (resourceName: string, findFn: Function) => {
224
- return async (req: Request, res: Response) => {
225
- const item = await findFn({ id: req.params.id });
226
- if (!item) {
227
- return res.status(404).json({ error: `${resourceName} not found` });
228
- }
229
- res.json(item);
230
- };
231
- };
232
-
233
- // src/api/users.ts
234
- app.get('/api/users/:id', createResourceHandler('User', db.users.findOne));
235
-
236
- // src/api/posts.ts
237
- router.get('/posts/:id', createResourceHandler('Post', database.posts.findOne));
46
+ → Create generic handler function
238
47
  ```
239
48
 
240
- **Result:** Reduced from 432 tokens to ~100 tokens in AI context.
241
-
242
- ## ⚙️ Configuration
49
+ ## ⚙️ Key Options
243
50
 
244
- ### Common Options
245
-
246
- | Option | Description | Default |
247
- |--------|-------------|---------|
248
- | `minSimilarity` | Similarity threshold (0-1). Default `0.40` (Jaccard). Raise for only obvious duplicates; lower to catch more | `0.40` |
249
- | `minSimilarity` | Similarity threshold (0-1). Default `0.40` (Jaccard). Raise for only obvious duplicates; lower to catch more | `0.40` |
250
- | `minLines` | Minimum lines to consider a pattern | `5` |
251
- | `maxBlocks` | Maximum code blocks to analyze (prevents OOM) | `500` |
252
- | `include` | File patterns to include | `['**/*.{ts,tsx,js,jsx,py,java}']` |
253
- | `exclude` | File patterns to exclude | See below |
254
-
255
- ### Exclude Patterns (Default)
256
-
257
- By default, these patterns are excluded:
258
51
  ```bash
259
- # Dependencies
260
- **/node_modules/**
261
-
262
- # Build outputs
263
- **/dist/**, **/build/**, **/out/**, **/output/**, **/target/**, **/bin/**, **/obj/**
264
-
265
- # Framework-specific build dirs
266
- **/.next/**, **/.nuxt/**, **/.vuepress/**, **/.cache/**, **/.turbo/**
267
-
268
- # Test and coverage
269
- **/coverage/**, **/.nyc_output/**, **/.jest/**
270
-
271
- # Version control and IDE
272
- **/.git/**, **/.svn/**, **/.hg/**, **/.vscode/**, **/.idea/**, **/*.swp, **/*.swo
273
-
274
- # Build artifacts and minified files
275
- **/*.min.js, **/*.min.css, **/*.bundle.js, **/*.tsbuildinfo
276
-
277
- # Logs and temporary files
278
- **/logs/**, **/*.log, **/.DS_Store
279
- ```
280
-
281
- Override with `--exclude` flag:
282
- ```bash
283
- # Exclude test files and generated code
284
- aiready-patterns ./src --exclude "**/test/**,**/generated/**,**/__snapshots__/**"
285
-
286
- # Add to defaults (comma-separated)
287
- aiready-patterns ./src --exclude "**/node_modules/**,**/dist/**,**/build/**,**/*.spec.ts"
288
- ```
289
-
290
- ## 📈 Understanding the Output
291
-
292
- ### Severity Levels
293
-
294
- - **CRITICAL (>95% similar)**: Nearly identical code - refactor immediately
295
- - **MAJOR (>90% similar)**: Very similar - refactor soon
296
- - **MINOR (>85% similar)**: Similar - consider refactoring
297
-
298
- ### Pattern Types
299
-
300
- - **🌐 api-handler**: REST API endpoints, route handlers
301
- - **✓ validator**: Input validation, schema checks
302
- - **🔧 utility**: Pure utility functions
303
- - **📦 class-method**: Class methods with similar logic
304
- - **⚛️ component**: UI components (React, Vue, etc.)
305
- - **ƒ function**: Generic functions
306
-
307
- ### Token Cost
308
-
309
- Estimated tokens wasted when AI tools process duplicate code:
310
- - Increases context window usage
311
- - Higher API costs for AI-powered tools
312
- - Slower analysis and generation
313
- - More potential for AI confusion
314
-
315
- ## 🎓 Best Practices
316
-
317
- 1. **Run regularly**: Integrate into CI/CD to catch new duplicates early
318
- 2. **Start with high similarity**: Use `--similarity 0.9` to find obvious wins
319
- 3. **Focus on critical issues**: Fix >95% similar patterns first
320
- 4. **Use pattern types**: Prioritize refactoring by category (API handlers → validators → utilities)
321
- 5. **Export reports**: Generate HTML reports for team reviews
322
-
323
- ## ⚠️ Performance & Memory
324
-
325
- ### Algorithm Complexity
326
-
327
- **Jaccard Similarity**: **O(B × C × T)** where:
328
- - B = number of blocks
329
- - C = average candidates per block (~100)
330
- - T = average tokens per block (~50)
331
- - **O(T) per comparison** instead of O(N²)
332
- - **Default threshold: 0.40** (comprehensive detection including tests and helpers)
333
-
334
- ### Performance Benchmarks
335
-
336
- | Repo Size | Blocks | Analysis Time |
337
- |-----------|--------|--------------|
338
- | Small (<100 files) | ~50 | <1s |
339
- | Medium (100-500 files) | ~500 | ~2s |
340
- | Large (500+ files) | ~500 (capped) | ~2s |
341
-
342
- **Example:** 828 code blocks → limited to 500 → **2.4s** analysis time
343
-
344
- ### Tuning Options
345
-
346
- ```bash
347
- # Default (40% threshold - comprehensive detection)
348
- aiready-patterns ./src
349
-
350
- # Higher threshold for only obvious duplicates
351
- aiready-patterns ./src --similarity 0.65
352
-
353
- # Lower threshold for more potential duplicates
354
- aiready-patterns ./src --similarity 0.55
52
+ # Basic usage
53
+ aiready patterns ./src
355
54
 
356
- # Approximate mode is default (fast, with candidate filtering)
357
- aiready-patterns ./src
55
+ # Focus on obvious duplicates
56
+ aiready patterns ./src --similarity 0.9
358
57
 
359
- # Exact mode with progress tracking (shows % and ETA)
360
- aiready-patterns ./src --no-approx --stream-results
58
+ # Include smaller patterns
59
+ aiready patterns ./src --min-lines 3
361
60
 
362
- # Maximum speed (aggressive filtering)
363
- aiready-patterns ./src --min-shared-tokens 12 --min-lines 10
61
+ # Export results
62
+ aiready patterns ./src --output json --output-file report.json
364
63
  ```
365
64
 
366
- ## 🎛️ Tuning Playbook
367
-
368
- Use these presets to quickly balance precision, recall, and runtime:
369
-
370
- - Speed-first (large repos):
371
- - `aiready-patterns ./src --min-shared-tokens 12 --max-candidates 60 --max-blocks 300`
372
- - Cuts weak candidates early; best for fast, iterative scans.
373
-
374
- - Coverage-first (more findings):
375
- - `aiready-patterns ./src --min-shared-tokens 6 --max-candidates 150`
376
- - Expands candidate pool; expect more results and longer runtime.
377
-
378
- - Short-block focus (helpers/utilities):
379
- - `aiready-patterns ./src --min-lines 5 --min-shared-tokens 6 --max-candidates 120`
380
- - Better recall for small functions; consider `--exclude "**/test/**"` to reduce noise.
381
-
382
- ### Minimum Lines vs Min Shared Tokens
383
-
384
- - `minLines` filters which blocks are extracted; lower values include smaller functions that have fewer tokens overall.
385
- - Smaller blocks naturally share fewer tokens; to avoid missing true matches when `minLines` is low (≤5–6), consider lowering `minSharedTokens` by 1–2.
386
- - Recommended pairs:
387
- - `minLines 5–6` → `minSharedTokens 6–8` (recall-friendly; watch noise)
388
- - `minLines 8–10` → `minSharedTokens 8–10` (precision-first)
389
- - Default balance: `minLines=5`, `minSharedTokens=8` works well for most repos. Reduce `minSharedTokens` only when you specifically want to catch more short helpers.
65
+ ## 🎛️ Tuning Guide
390
66
 
391
- ## 🎯 Parameter Tuning Guide
67
+ ### Main Parameters
392
68
 
393
- ### When You Get Too Few Results
69
+ | Parameter | Default | Effect | Use When |
70
+ |-----------|---------|--------|----------|
71
+ | `--similarity` | `0.4` | Similarity threshold (0-1) | Want more/less sensitive detection |
72
+ | `--min-lines` | `5` | Minimum lines per pattern | Include/exclude small functions |
73
+ | `--min-shared-tokens` | `8` | Tokens that must match | Control comparison strictness |
394
74
 
395
- If the tool finds fewer duplicates than expected, try these adjustments in order:
75
+ ### Quick Tuning Scenarios
396
76
 
397
- **1. Lower similarity threshold** (most effective)
77
+ **Want more results?** (catch subtle duplicates)
398
78
  ```bash
399
- # Default: 0.4, try lowering to find more potential duplicates
400
- aiready-patterns ./src --similarity 0.3 # More sensitive
401
- aiready-patterns ./src --similarity 0.2 # Very sensitive (may include noise)
402
- ```
403
- *Tradeoff: More results but potentially more false positives*
79
+ # Lower similarity threshold
80
+ aiready patterns ./src --similarity 0.3
404
81
 
405
- **2. Reduce minimum lines**
406
- ```bash
407
- # Default: 5, try lowering to catch smaller functions/utilities
408
- aiready-patterns ./src --min-lines 3 # Include very small functions
409
- aiready-patterns ./src --min-lines 1 # Include almost everything
410
- ```
411
- *Tradeoff: More results but slower analysis and more noise*
82
+ # Include smaller functions
83
+ aiready patterns ./src --min-lines 3
412
84
 
413
- **3. Lower shared tokens threshold**
414
- ```bash
415
- # Default: 8, try lowering to expand candidate pool
416
- aiready-patterns ./src --min-shared-tokens 5 # More candidates
417
- aiready-patterns ./src --min-shared-tokens 3 # Many more candidates
85
+ # Both together
86
+ aiready patterns ./src --similarity 0.3 --min-lines 3
418
87
  ```
419
- *Tradeoff: More results but slower analysis*
420
88
 
421
- **4. Include test files**
89
+ **Want fewer but higher quality results?** (focus on obvious duplicates)
422
90
  ```bash
423
- aiready-patterns ./src --include-tests
424
- ```
425
- *Tradeoff: More results but may include test-specific patterns*
91
+ # Higher similarity threshold
92
+ aiready patterns ./src --similarity 0.8
426
93
 
427
- **5. Increase max candidates per block**
428
- ```bash
429
- # Default: 100, try increasing for more thorough search
430
- aiready-patterns ./src --max-candidates 200 # More thorough
94
+ # Larger patterns only
95
+ aiready patterns ./src --min-lines 10
431
96
  ```
432
- *Tradeoff: Slower analysis but more comprehensive*
433
-
434
- ### When Analysis is Too Slow
435
97
 
436
- If the tool takes too long to run, try these optimizations in order:
437
-
438
- **1. Increase minimum lines** (most effective)
98
+ **Analysis too slow?** (optimize for speed)
439
99
  ```bash
440
- # Default: 5, try increasing to focus on substantial functions
441
- aiready-patterns ./src --min-lines 10 # Focus on larger functions
442
- aiready-patterns ./src --min-lines 15 # Only major functions
443
- ```
444
- *Tradeoff: Faster but may miss small but important duplicates*
100
+ # Focus on substantial functions
101
+ aiready patterns ./src --min-lines 10
445
102
 
446
- **2. Increase shared tokens threshold**
447
- ```bash
448
- # Default: 8, try increasing to reduce candidate pool
449
- aiready-patterns ./src --min-shared-tokens 12 # Fewer candidates
450
- aiready-patterns ./src --min-shared-tokens 15 # Much fewer candidates
103
+ # Reduce comparison candidates
104
+ aiready patterns ./src --min-shared-tokens 12
451
105
  ```
452
- *Tradeoff: Faster but may miss some duplicates*
453
106
 
454
- **3. Reduce max candidates per block**
455
- ```bash
456
- # Default: 100, try reducing for faster analysis
457
- aiready-patterns ./src --max-candidates 50 # Faster
458
- aiready-patterns ./src --max-candidates 20 # Much faster
459
- ```
460
- *Tradeoff: Faster but may miss some duplicates*
107
+ ### Parameter Tradeoffs
461
108
 
462
- **4. Increase similarity threshold**
463
- ```bash
464
- # Default: 0.4, try increasing to reduce comparisons
465
- aiready-patterns ./src --similarity 0.6 # Fewer but more obvious duplicates
466
- ```
467
- *Tradeoff: Faster but fewer results*
109
+ | Adjustment | More Results | Faster | Higher Quality | Tradeoff |
110
+ |------------|--------------|--------|----------------|----------|
111
+ | Lower `--similarity` | | | ❌ | More false positives |
112
+ | Lower `--min-lines` | | | | Includes trivial duplicates |
113
+ | Higher `--similarity` | ❌ | ✅ | ✅ | Misses subtle duplicates |
114
+ | Higher `--min-lines` | ❌ | ✅ | ✅ | Misses small but important patterns |
468
115
 
469
- **5. Analyze by module/directory**
470
- ```bash
471
- # Instead of analyzing the entire repo, analyze specific directories
472
- aiready-patterns ./src/api --min-lines 8
473
- aiready-patterns ./src/components --min-lines 8
474
- ```
475
- *Tradeoff: Need to run multiple commands but each is faster*
476
-
477
- ### When You Get Too Many False Positives
116
+ ### Common Workflows
478
117
 
479
- If the results include many irrelevant duplicates:
480
-
481
- **1. Increase similarity threshold**
118
+ **First run** (broad discovery):
482
119
  ```bash
483
- # Default: 0.4, try increasing for more accurate matches
484
- aiready-patterns ./src --similarity 0.6 # More accurate
485
- aiready-patterns ./src --similarity 0.8 # Very accurate
120
+ aiready patterns ./src # Default settings
486
121
  ```
487
- *Tradeoff: Fewer results but higher quality*
488
122
 
489
- **2. Increase minimum lines**
123
+ **Focus on critical issues** (production ready):
490
124
  ```bash
491
- # Default: 5, try increasing to focus on substantial duplicates
492
- aiready-patterns ./src --min-lines 10 # Larger patterns only
125
+ aiready patterns ./src --similarity 0.8 --min-lines 8
493
126
  ```
494
- *Tradeoff: Fewer results but more significant ones*
495
127
 
496
- **3. Use severity filtering**
128
+ **Catch everything** (comprehensive audit):
497
129
  ```bash
498
- aiready-patterns ./src --severity high # Only >90% similar
499
- aiready-patterns ./src --severity critical # Only >95% similar
130
+ aiready patterns ./src --similarity 0.3 --min-lines 3
500
131
  ```
501
- *Tradeoff: Fewer results but highest quality*
502
132
 
503
- **4. Exclude specific patterns**
133
+ **Performance optimization** (large codebases):
504
134
  ```bash
505
- # Exclude generated files, migrations, etc.
506
- aiready-patterns ./src --exclude "**/migrations/**,**/generated/**"
135
+ aiready patterns ./src --min-lines 10 --min-shared-tokens 10
507
136
  ```
508
- *Tradeoff: Fewer results but more relevant ones*
509
-
510
- ### Quick Troubleshooting Reference
511
-
512
- | Problem | Symptom | Solution | Tradeoff |
513
- |---------|---------|----------|----------|
514
- | **No results** | "No duplicate patterns detected" | Lower `--similarity` to 0.3 | More noise |
515
- | **Few results** | <5 duplicates found | Lower `--min-lines` to 3 | Slower analysis |
516
- | **Too slow** | Takes >30 seconds | Increase `--min-lines` to 10 | Misses small duplicates |
517
- | **Too many results** | 100+ duplicates | Increase `--similarity` to 0.6 | Misses subtle duplicates |
518
- | **False positives** | Many irrelevant matches | Use `--severity critical` | Fewer results |
519
- | **Memory issues** | Out of memory error | Analyze by directory | Multiple commands needed |
520
-
521
- **CLI Options:**
522
- - `--stream-results` - Output duplicates as found (enabled by default)
523
- - `--no-approx` - Disable approximate mode (slower, O(B²) complexity, use with caution)
524
- - `--min-lines N` - Filter blocks smaller than N lines (default 5)
525
-
526
- ### Controlling Analysis Scope
527
137
 
528
- The tool analyzes **all extracted code blocks** by default. Control scope using:
138
+ **Use the unified CLI** for all AIReady tools:
529
139
 
530
- **1. `--min-lines` (primary filter):**
531
- - Filters blocks during extraction (most efficient)
532
- - Higher values = focus on substantial functions
533
- - Lower values = catch smaller utility duplicates
534
-
535
- **2. `--no-approx` mode (use with caution):**
536
- - Disables approximate mode (candidate pre-filtering)
537
- - O(B²) complexity - compares every block to every other block
538
- - **Automatic safety limit:** 500K comparisons (~1000 blocks max)
539
- - Shows warning when used with >500 blocks
540
- - Approximate mode (default) is recommended for all use cases
541
-
542
- **Examples:**
543
140
  ```bash
544
- # Focus on substantial functions only
545
- aiready-patterns ./src --min-lines 15
546
-
547
- # Comprehensive scan of all functions (recommended)
548
- aiready-patterns ./src --min-lines 5
549
-
550
- # Quick scan of major duplicates
551
- aiready-patterns ./src --min-lines 20
552
- ```
553
-
554
- **Recommendations by codebase size:**
141
+ npm install -g @aiready/cli
555
142
 
556
- | Repo Size | Files | Strategy | Expected Time |
557
- |-----------|-------|----------|---------------|
558
- | **Small** | <100 | Use defaults | <1s ✅ |
559
- | **Medium** | 100-500 | Use defaults | 1-5s ✅ |
560
- | **Large** | 500-1,000 | Use defaults or `--min-lines 10` | 3-10s ✅ |
561
- | **Very Large** | 1,000-5,000 | `--min-lines 15` or analyze by module | 5-20s ⚠️ |
562
- | **Super Large** | 5,000+ | **Analyze by module** (see below) | 10-60s per module ⚠️ |
143
+ # Pattern detection
144
+ aiready patterns ./src
563
145
 
564
- ### Analyzing Very Large Repositories
146
+ # Context analysis (token costs, fragmentation)
147
+ aiready context ./src
565
148
 
566
- For repos with 1,000+ files, use modular analysis:
567
-
568
- ```bash
569
- # Analyze by top-level directory
570
- for dir in src/*/; do
571
- echo "Analyzing $dir"
572
- aiready-patterns "$dir" --min-lines 10
573
- done
574
-
575
- # Or focus on specific high-value areas
576
- aiready-patterns ./src/api --min-lines 10
577
- aiready-patterns ./src/core --min-lines 10
578
- aiready-patterns ./src/services --min-lines 10
579
-
580
- # For super large repos (5K+ files), increase thresholds
581
- aiready-patterns ./src/backend --min-lines 20 --similarity 0.50
149
+ # Full codebase analysis
150
+ aiready scan ./src
582
151
  ```
583
152
 
584
- **Why modular analysis?**
585
- - Ensures comprehensive coverage (100% of each module)
586
- - Avoids hitting comparison budget limits
587
- - Provides focused, actionable results per module
588
- - Better for CI/CD integration (parallel jobs)
589
-
590
- **Progress Indicators:**
591
- - **Approx mode**: Shows blocks processed + duplicates found
592
- - **Exact mode**: Shows % complete, ETA, and comparisons processed
593
- - **Stream mode**: Prints each duplicate immediately when found (enabled by default)
594
-
595
- ## 🔧 CI/CD Integration
596
-
597
- ### GitHub Actions
598
-
599
- ```yaml
600
- name: Pattern Detection
601
-
602
- on: [pull_request]
603
-
604
- jobs:
605
- detect-patterns:
606
- runs-on: ubuntu-latest
607
- steps:
608
- - uses: actions/checkout@v3
609
- - uses: actions/setup-node@v3
610
- - run: npx @aiready/pattern-detect ./src --output json --output-file patterns.json
611
- - name: Check for critical issues
612
- run: |
613
- CRITICAL=$(jq '.summary.topDuplicates | map(select(.similarity > 0.95)) | length' patterns.json)
614
- if [ "$CRITICAL" -gt "0" ]; then
615
- echo "Found $CRITICAL critical duplicate patterns"
616
- exit 1
617
- fi
618
- ```
619
-
620
- ## 🤝 Contributing
621
-
622
- We welcome contributions! This tool is part of the [AIReady](https://github.com/aiready/aiready) ecosystem.
623
-
624
- ## 📝 License
625
-
626
- MIT - See LICENSE file
627
-
628
- ## 🔗 Related Tools (Coming Soon)
629
-
630
- - **@aiready/context-analyzer** - Analyze token costs and context fragmentation
631
- - **@aiready/doc-drift** - Track documentation freshness
632
- - **@aiready/consistency** - Check naming pattern consistency
153
+ **Individual packages:**
154
+ - [**@aiready/cli**](https://www.npmjs.com/package/@aiready/cli) - Unified CLI with all tools
155
+ - [**@aiready/context-analyzer**](https://www.npmjs.com/package/@aiready/context-analyzer) - Context window cost analysis
633
156
 
634
157
  ---
635
158
 
636
- **Made with 💙 by the AIReady team** | [Docs](https://aiready.dev/docs) | [GitHub](https://github.com/aiready/aiready)
159
+ **Made with 💙 by the AIReady team** | [GitHub](https://github.com/caopengau/aiready)
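
Note on the removed performance text: the 0.7.11 README described the detector as using Jaccard similarity over tokens with a default threshold of `0.40`, and described `--no-approx` as an all-pairs comparison with a 500K-comparison safety limit (about 1000 blocks). The sketch below illustrates both ideas under stated assumptions. The function names `tokenize`, `jaccard`, and `allPairsComparisons` are hypothetical, and the tokenizer is a simple stand-in, since the package's actual tokenizer and candidate filtering are not shown in this diff. The two sample snippets are the handler bodies from the removed "Real-World Example".

```typescript
// Hypothetical tokenizer (an assumption): split on anything that is not a
// letter, digit, or underscore. The package's real tokenizer may differ.
function tokenize(code: string): Set<string> {
  const tokens = code.split(/[^A-Za-z0-9_]+/).filter((t) => t.length > 0);
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, a value in the range [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // treat two empty blocks as identical
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return shared / union;
}

// Number of unordered block pairs an all-pairs comparison has to visit.
function allPairsComparisons(blocks: number): number {
  return (blocks * (blocks - 1)) / 2;
}

// Handler bodies taken from the removed README example.
const userHandler = `
  const user = await db.users.findOne({ id: request.params.id });
  if (!user) return response.status(404).json({ error: 'User not found' });
  response.json(user);
`;
const postHandler = `
  const post = await database.posts.findOne({ id: req.params.id });
  if (!post) { res.status(404).send({ message: 'Post not found' }); return; }
  res.json(post);
`;

const DEFAULT_THRESHOLD = 0.4; // documented default for --similarity
const score = jaccard(tokenize(userHandler), tokenize(postHandler));
console.log(`similarity=${score.toFixed(2)} flagged=${score >= DEFAULT_THRESHOLD}`);
console.log(`pairs for 1000 blocks: ${allPairsComparisons(1000)}`);
```

For 1000 blocks the pair count is 1000 × 999 / 2 = 499,500, which is consistent with the documented 500K limit corresponding to roughly 1000 blocks. The score printed for the two handlers depends entirely on the stand-in tokenizer, so it should not be read as what the tool itself would report.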
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@aiready/pattern-detect",
3
- "version": "0.7.11",
3
+ "version": "0.7.12",
4
4
  "description": "Semantic duplicate pattern detection for AI-generated code - finds similar implementations that waste AI context tokens",
5
5
  "main": "./dist/index.js",
6
6
  "module": "./dist/index.mjs",
@@ -45,7 +45,7 @@
45
45
  "dependencies": {
46
46
  "commander": "^14.0.0",
47
47
  "chalk": "^5.3.0",
48
- "@aiready/core": "0.3.6"
48
+ "@aiready/core": "0.3.7"
49
49
  },
50
50
  "devDependencies": {
51
51
  "tsup": "^8.3.5",