@aiready/pattern-detect 0.7.11 → 0.7.12
- package/README.md +81 -558
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -2,635 +2,158 @@
|
|
|
2
2
|
|
|
3
3
|
> **Semantic duplicate pattern detection for AI-generated code**
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Finds semantically similar but syntactically different code patterns that waste AI context and confuse models.
|
|
6
6
|
|
|
7
|
-
##
|
|
7
|
+
## 🚀 Quick Start
|
|
8
8
|
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
AI coding assistants (GitHub Copilot, ChatGPT, Claude) generate functionally similar code in different ways because:
|
|
12
|
-
- No awareness of existing patterns in your codebase
|
|
13
|
-
- Different AI models have different coding styles
|
|
14
|
-
- Team members use AI tools with varying contexts
|
|
15
|
-
- AI can't see your full codebase (context window limits)
|
|
16
|
-
|
|
17
|
-
### What Makes Us Different?
|
|
18
|
-
|
|
19
|
-
| Feature | jscpd | @aiready/pattern-detect |
|
|
20
|
-
|---------|-------|------------------------|
|
|
21
|
-
| Detection Method | Byte-level exact matching | Semantic similarity |
|
|
22
|
-
| Pattern Types | Generic blocks | Categorized (API, validators, utils, etc.) |
|
|
23
|
-
| Token Cost | ❌ No | ✅ Yes - shows AI context waste |
|
|
24
|
-
| Refactoring Suggestions | ❌ Generic | ✅ Specific to pattern type |
|
|
25
|
-
| Output Formats | Text/JSON | Console/JSON/HTML with rich formatting |
|
|
26
|
-
|
|
27
|
-
#### How We Differ (and When to Use Each)
|
|
28
|
-
|
|
29
|
-
- **Semantic intent vs exact clones**: jscpd flags copy-paste or near-duplicates; we detect functionally similar code even when structure differs (e.g., two API handlers with different frameworks).
|
|
30
|
-
- **Pattern typing**: We classify duplicates into `api-handler`, `validator`, `utility`, `component`, etc., so teams can prioritize coherent refactors.
|
|
31
|
-
- **AI context cost**: We estimate tokens wasted to quantify impact on AI tools (larger context, higher cost, more confusion).
|
|
32
|
-
- **Refactoring guidance**: We propose targeted fixes per pattern type (e.g., extract middleware or create base handler).
|
|
33
|
-
- **Performance profile**: We use Jaccard similarity with candidate filtering; ~2–3s for ~500 blocks on medium repos.
|
|
34
|
-
|
|
35
|
-
Recommended workflow:
|
|
36
|
-
- Run **jscpd** in CI to enforce low clone percentage (blocking).
|
|
37
|
-
- Run **@aiready/pattern-detect** to surface semantic duplicates and token waste (advisory), feeding a refactoring backlog.
|
|
38
|
-
- Use both for comprehensive hygiene: jscpd for exact clones; AIReady for intent-level duplication that AI tends to reintroduce.
|
|
39
|
-
|
|
40
|
-
## 🚀 Installation
|
|
9
|
+
**Recommended: Use the unified CLI** (includes pattern detection + more tools):
|
|
41
10
|
|
|
42
11
|
```bash
|
|
43
|
-
npm install -g @aiready/
|
|
44
|
-
|
|
45
|
-
# Or use directly with npx
|
|
46
|
-
npx @aiready/pattern-detect ./src
|
|
12
|
+
npm install -g @aiready/cli
|
|
13
|
+
aiready patterns ./src
|
|
47
14
|
```
|
|
48
15
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
### CLI
|
|
16
|
+
**Or use this package directly:**
|
|
52
17
|
|
|
53
18
|
```bash
|
|
54
|
-
|
|
19
|
+
npm install -g @aiready/pattern-detect
|
|
55
20
|
aiready-patterns ./src
|
|
56
|
-
|
|
57
|
-
# Adjust sensitivity
|
|
58
|
-
aiready-patterns ./src --similarity 0.9
|
|
59
|
-
|
|
60
|
-
# Only look at larger patterns
|
|
61
|
-
aiready-patterns ./src --min-lines 10
|
|
62
|
-
|
|
63
|
-
# Filter by severity (focus on critical issues first)
|
|
64
|
-
aiready-patterns ./src --severity critical # Only >95% similar
|
|
65
|
-
aiready-patterns ./src --severity high # Only >90% similar
|
|
66
|
-
aiready-patterns ./src --severity medium # Only >80% similar
|
|
67
|
-
|
|
68
|
-
# Include test files (excluded by default)
|
|
69
|
-
aiready-patterns ./src --include-tests
|
|
70
|
-
|
|
71
|
-
# Memory optimization for large codebases
|
|
72
|
-
aiready-patterns ./src --max-blocks 1000 --batch-size 200
|
|
73
|
-
|
|
74
|
-
# Export to JSON
|
|
75
|
-
aiready-patterns ./src --output json --output-file report.json
|
|
76
|
-
|
|
77
|
-
# Generate HTML report
|
|
78
|
-
aiready-patterns ./src --output html
|
|
79
|
-
```
|
|
80
|
-
|
|
81
|
-
#### Presets (quick copy/paste)
|
|
82
|
-
|
|
83
|
-
```bash
|
|
84
|
-
# Speed-first (large repos)
|
|
85
|
-
aiready-patterns ./src \
|
|
86
|
-
--min-shared-tokens 12 \
|
|
87
|
-
--max-candidates 60 \
|
|
88
|
-
--max-blocks 300
|
|
89
|
-
|
|
90
|
-
# Coverage-first (more findings)
|
|
91
|
-
aiready-patterns ./src \
|
|
92
|
-
--min-shared-tokens 6 \
|
|
93
|
-
--max-candidates 150
|
|
94
|
-
|
|
95
|
-
# Short-block focus (helpers/utilities)
|
|
96
|
-
aiready-patterns ./src \
|
|
97
|
-
--min-lines 5 \
|
|
98
|
-
--min-shared-tokens 6 \
|
|
99
|
-
--max-candidates 120 \
|
|
100
|
-
--exclude "**/test/**"
|
|
101
|
-
|
|
102
|
-
# Deep dive with streaming (comprehensive detection)
|
|
103
|
-
aiready-patterns ./src \
|
|
104
|
-
--no-approx \
|
|
105
|
-
--stream-results
|
|
106
21
|
```
|
|
107
22
|
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
Create an `aiready.json` or `aiready.config.json` file in your project root to persist settings:
|
|
111
|
-
|
|
112
|
-
```json
|
|
113
|
-
{
|
|
114
|
-
"scan": {
|
|
115
|
-
"include": ["**/*.{ts,tsx,js,jsx}"],
|
|
116
|
-
"exclude": ["**/test/**", "**/*.test.*"]
|
|
117
|
-
},
|
|
118
|
-
"tools": {
|
|
119
|
-
"pattern-detect": {
|
|
120
|
-
"minSimilarity": 0.5,
|
|
121
|
-
"minLines": 8,
|
|
122
|
-
"approx": false,
|
|
123
|
-
"batchSize": 200,
|
|
124
|
-
"severity": "high",
|
|
125
|
-
"includeTests": false
|
|
126
|
-
}
|
|
127
|
-
}
|
|
128
|
-
}
|
|
129
|
-
```
|
|
23
|
+
## 🎯 What It Does
|
|
130
24
|
|
|
131
|
-
|
|
25
|
+
AI tools generate similar code in different ways because they lack awareness of your codebase patterns. This tool:
|
|
132
26
|
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
const results = await analyzePatterns({
|
|
139
|
-
rootDir: './src',
|
|
140
|
-
minSimilarity: 0.85, // 85% similar
|
|
141
|
-
minLines: 5,
|
|
142
|
-
include: ['**/*.ts', '**/*.tsx'],
|
|
143
|
-
exclude: ['**/*.test.ts', '**/node_modules/**'],
|
|
144
|
-
});
|
|
145
|
-
|
|
146
|
-
const summary = generateSummary(results);
|
|
147
|
-
|
|
148
|
-
console.log(`Found ${summary.totalPatterns} duplicate patterns`);
|
|
149
|
-
console.log(`Token cost: ${summary.totalTokenCost} tokens wasted`);
|
|
150
|
-
console.log(`Pattern breakdown:`, summary.patternsByType);
|
|
151
|
-
```
|
|
152
|
-
|
|
153
|
-
## 🔍 Real-World Example
|
|
154
|
-
|
|
155
|
-
### Before Analysis
|
|
156
|
-
|
|
157
|
-
Two API handlers that were written by AI on different days:
|
|
158
|
-
|
|
159
|
-
```typescript
|
|
160
|
-
// File: src/api/users.ts
|
|
161
|
-
app.get('/api/users/:id', async (request, response) => {
|
|
162
|
-
const user = await db.users.findOne({ id: request.params.id });
|
|
163
|
-
if (!user) {
|
|
164
|
-
return response.status(404).json({ error: 'User not found' });
|
|
165
|
-
}
|
|
166
|
-
response.json(user);
|
|
167
|
-
});
|
|
168
|
-
|
|
169
|
-
// File: src/api/posts.ts
|
|
170
|
-
router.get('/posts/:id', async (req, res) => {
|
|
171
|
-
const post = await database.posts.findOne({ id: req.params.id });
|
|
172
|
-
if (!post) {
|
|
173
|
-
res.status(404).send({ message: 'Post not found' });
|
|
174
|
-
return;
|
|
175
|
-
}
|
|
176
|
-
res.json(post);
|
|
177
|
-
});
|
|
178
|
-
```
|
|
27
|
+
- **Semantic detection**: Finds functionally similar code (not just copy-paste)
|
|
28
|
+
- **Pattern classification**: Groups duplicates by type (API handlers, validators, utilities, etc.)
|
|
29
|
+
- **Token cost analysis**: Shows wasted AI context budget
|
|
30
|
+
- **Refactoring guidance**: Suggests specific fixes per pattern type
|
|
179
31
|
|
|
180
|
-
###
|
|
32
|
+
### Example Output
|
|
181
33
|
|
|
182
34
|
```
|
|
183
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
184
|
-
PATTERN ANALYSIS SUMMARY
|
|
185
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
186
|
-
|
|
187
35
|
📁 Files analyzed: 47
|
|
188
36
|
⚠ Duplicate patterns found: 23
|
|
189
37
|
💰 Token cost (wasted): 8,450
|
|
190
38
|
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
🌐 api-handler 12
|
|
196
|
-
✓ validator 8
|
|
197
|
-
🔧 utility 3
|
|
198
|
-
|
|
199
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
200
|
-
TOP DUPLICATE PATTERNS
|
|
201
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
39
|
+
🌐 api-handler 12 patterns
|
|
40
|
+
✓ validator 8 patterns
|
|
41
|
+
🔧 utility 3 patterns
|
|
202
42
|
|
|
203
43
|
1. 87% 🌐 api-handler
|
|
204
|
-
src/api/users.ts:15
|
|
205
|
-
↔ src/api/posts.ts:22
|
|
44
|
+
src/api/users.ts:15 ↔ src/api/posts.ts:22
|
|
206
45
|
432 tokens wasted
|
|
207
|
-
|
|
208
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
209
|
-
CRITICAL ISSUES (>95% similar)
|
|
210
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
211
|
-
|
|
212
|
-
● src/utils/validators.ts:15
|
|
213
|
-
validator pattern 97% similar to src/utils/checks.ts (125 tokens wasted)
|
|
214
|
-
→ Consolidate validation logic into shared schema validators (Zod/Yup) (CRITICAL: Nearly identical code)
|
|
215
|
-
```
|
|
216
|
-
|
|
217
|
-
### Suggested Refactoring
|
|
218
|
-
|
|
219
|
-
Create a generic handler:
|
|
220
|
-
|
|
221
|
-
```typescript
|
|
222
|
-
// utils/apiHandler.ts
|
|
223
|
-
export const createResourceHandler = (resourceName: string, findFn: Function) => {
|
|
224
|
-
return async (req: Request, res: Response) => {
|
|
225
|
-
const item = await findFn({ id: req.params.id });
|
|
226
|
-
if (!item) {
|
|
227
|
-
return res.status(404).json({ error: `${resourceName} not found` });
|
|
228
|
-
}
|
|
229
|
-
res.json(item);
|
|
230
|
-
};
|
|
231
|
-
};
|
|
232
|
-
|
|
233
|
-
// src/api/users.ts
|
|
234
|
-
app.get('/api/users/:id', createResourceHandler('User', db.users.findOne));
|
|
235
|
-
|
|
236
|
-
// src/api/posts.ts
|
|
237
|
-
router.get('/posts/:id', createResourceHandler('Post', database.posts.findOne));
|
|
46
|
+
→ Create generic handler function
|
|
238
47
|
```
|
|
239
48
|
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
## ⚙️ Configuration
|
|
49
|
+
## ⚙️ Key Options
|
|
243
50
|
|
|
244
|
-
### Common Options
|
|
245
|
-
|
|
246
|
-
| Option | Description | Default |
|
|
247
|
-
|--------|-------------|---------|
|
|
248
|
-
| `minSimilarity` | Similarity threshold (0-1). Default `0.40` (Jaccard). Raise for only obvious duplicates; lower to catch more | `0.40` |
|
|
249
|
-
| `minSimilarity` | Similarity threshold (0-1). Default `0.40` (Jaccard). Raise for only obvious duplicates; lower to catch more | `0.40` |
|
|
250
|
-
| `minLines` | Minimum lines to consider a pattern | `5` |
|
|
251
|
-
| `maxBlocks` | Maximum code blocks to analyze (prevents OOM) | `500` |
|
|
252
|
-
| `include` | File patterns to include | `['**/*.{ts,tsx,js,jsx,py,java}']` |
|
|
253
|
-
| `exclude` | File patterns to exclude | See below |
|
|
254
|
-
|
|
255
|
-
### Exclude Patterns (Default)
|
|
256
|
-
|
|
257
|
-
By default, these patterns are excluded:
|
|
258
51
|
```bash
|
|
259
|
-
#
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
# Build outputs
|
|
263
|
-
**/dist/**, **/build/**, **/out/**, **/output/**, **/target/**, **/bin/**, **/obj/**
|
|
264
|
-
|
|
265
|
-
# Framework-specific build dirs
|
|
266
|
-
**/.next/**, **/.nuxt/**, **/.vuepress/**, **/.cache/**, **/.turbo/**
|
|
267
|
-
|
|
268
|
-
# Test and coverage
|
|
269
|
-
**/coverage/**, **/.nyc_output/**, **/.jest/**
|
|
270
|
-
|
|
271
|
-
# Version control and IDE
|
|
272
|
-
**/.git/**, **/.svn/**, **/.hg/**, **/.vscode/**, **/.idea/**, **/*.swp, **/*.swo
|
|
273
|
-
|
|
274
|
-
# Build artifacts and minified files
|
|
275
|
-
**/*.min.js, **/*.min.css, **/*.bundle.js, **/*.tsbuildinfo
|
|
276
|
-
|
|
277
|
-
# Logs and temporary files
|
|
278
|
-
**/logs/**, **/*.log, **/.DS_Store
|
|
279
|
-
```
|
|
280
|
-
|
|
281
|
-
Override with `--exclude` flag:
|
|
282
|
-
```bash
|
|
283
|
-
# Exclude test files and generated code
|
|
284
|
-
aiready-patterns ./src --exclude "**/test/**,**/generated/**,**/__snapshots__/**"
|
|
285
|
-
|
|
286
|
-
# Add to defaults (comma-separated)
|
|
287
|
-
aiready-patterns ./src --exclude "**/node_modules/**,**/dist/**,**/build/**,**/*.spec.ts"
|
|
288
|
-
```
|
|
289
|
-
|
|
290
|
-
## 📈 Understanding the Output
|
|
291
|
-
|
|
292
|
-
### Severity Levels
|
|
293
|
-
|
|
294
|
-
- **CRITICAL (>95% similar)**: Nearly identical code - refactor immediately
|
|
295
|
-
- **MAJOR (>90% similar)**: Very similar - refactor soon
|
|
296
|
-
- **MINOR (>85% similar)**: Similar - consider refactoring
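The severity bands in this removed list can be read as simple cut-offs on the similarity score. Below is a minimal TypeScript sketch, assuming strict greater-than comparisons as the list is written. `Severity` and `classifySeverity` are hypothetical names used for illustration and are not part of the package's documented API.

```typescript
// Hypothetical helper: maps a similarity score in [0, 1] to the severity
// bands listed in the README text above. Not the package's actual code.
type Severity = "critical" | "major" | "minor" | "none";

function classifySeverity(similarity: number): Severity {
  if (similarity > 0.95) return "critical"; // nearly identical code
  if (similarity > 0.9) return "major"; // very similar
  if (similarity > 0.85) return "minor"; // similar
  return "none"; // below every band in the list
}
```

With these assumed cut-offs, the 97% validator match shown in the removed example output would be classified as critical.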
|
|
297
|
-
|
|
298
|
-
### Pattern Types
|
|
299
|
-
|
|
300
|
-
- **🌐 api-handler**: REST API endpoints, route handlers
|
|
301
|
-
- **✓ validator**: Input validation, schema checks
|
|
302
|
-
- **🔧 utility**: Pure utility functions
|
|
303
|
-
- **📦 class-method**: Class methods with similar logic
|
|
304
|
-
- **⚛️ component**: UI components (React, Vue, etc.)
|
|
305
|
-
- **ƒ function**: Generic functions
|
|
306
|
-
|
|
307
|
-
### Token Cost
|
|
308
|
-
|
|
309
|
-
Estimated tokens wasted when AI tools process duplicate code:
|
|
310
|
-
- Increases context window usage
|
|
311
|
-
- Higher API costs for AI-powered tools
|
|
312
|
-
- Slower analysis and generation
|
|
313
|
-
- More potential for AI confusion
|
|
314
|
-
|
|
315
|
-
## 🎓 Best Practices
|
|
316
|
-
|
|
317
|
-
1. **Run regularly**: Integrate into CI/CD to catch new duplicates early
|
|
318
|
-
2. **Start with high similarity**: Use `--similarity 0.9` to find obvious wins
|
|
319
|
-
3. **Focus on critical issues**: Fix >95% similar patterns first
|
|
320
|
-
4. **Use pattern types**: Prioritize refactoring by category (API handlers → validators → utilities)
|
|
321
|
-
5. **Export reports**: Generate HTML reports for team reviews
|
|
322
|
-
|
|
323
|
-
## ⚠️ Performance & Memory
|
|
324
|
-
|
|
325
|
-
### Algorithm Complexity
|
|
326
|
-
|
|
327
|
-
**Jaccard Similarity**: **O(B × C × T)** where:
|
|
328
|
-
- B = number of blocks
|
|
329
|
-
- C = average candidates per block (~100)
|
|
330
|
-
- T = average tokens per block (~50)
|
|
331
|
-
- **O(T) per comparison** instead of O(N²)
|
|
332
|
-
- **Default threshold: 0.40** (comprehensive detection including tests and helpers)
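The complexity note in this removed section describes comparing blocks by Jaccard similarity over their tokens, with candidates filtered before comparison. The TypeScript sketch below shows one way that could look. The tokenizer, the function names, and the way the `--min-shared-tokens` and `--similarity` values are applied are all assumptions made for illustration, not the package's actual implementation.

```typescript
// Hypothetical sketch; names and tokenization rules are assumptions.

/** Split source text into a set of lowercase identifier and number tokens. */
function tokenize(code: string): Set<string> {
  const matches = code.toLowerCase().match(/[a-z_$][a-z0-9_$]*|\d+/g) ?? [];
  return new Set(matches);
}

/** Count the tokens present in both sets. */
function countShared(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const token of a) {
    if (b.has(token)) shared++;
  }
  return shared;
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0 when both sets are empty. */
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = countShared(a, b);
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

/** Cheap shared-token pre-filter, then the similarity threshold. */
function isCandidateDuplicate(
  a: Set<string>,
  b: Set<string>,
  minSharedTokens = 8, // assumed to correspond to --min-shared-tokens
  minSimilarity = 0.4, // assumed to correspond to --similarity
): boolean {
  if (countShared(a, b) < minSharedTokens) return false;
  return jaccard(a, b) >= minSimilarity;
}
```

Each comparison walks one token set once, which is in line with the O(T) per-comparison cost stated in the list.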
|
|
333
|
-
|
|
334
|
-
### Performance Benchmarks
|
|
335
|
-
|
|
336
|
-
| Repo Size | Blocks | Analysis Time |
|
|
337
|
-
|-----------|--------|--------------|
|
|
338
|
-
| Small (<100 files) | ~50 | <1s |
|
|
339
|
-
| Medium (100-500 files) | ~500 | ~2s |
|
|
340
|
-
| Large (500+ files) | ~500 (capped) | ~2s |
|
|
341
|
-
|
|
342
|
-
**Example:** 828 code blocks → limited to 500 → **2.4s** analysis time
|
|
343
|
-
|
|
344
|
-
### Tuning Options
|
|
345
|
-
|
|
346
|
-
```bash
|
|
347
|
-
# Default (40% threshold - comprehensive detection)
|
|
348
|
-
aiready-patterns ./src
|
|
349
|
-
|
|
350
|
-
# Higher threshold for only obvious duplicates
|
|
351
|
-
aiready-patterns ./src --similarity 0.65
|
|
352
|
-
|
|
353
|
-
# Lower threshold for more potential duplicates
|
|
354
|
-
aiready-patterns ./src --similarity 0.55
|
|
52
|
+
# Basic usage
|
|
53
|
+
aiready patterns ./src
|
|
355
54
|
|
|
356
|
-
#
|
|
357
|
-
aiready
|
|
55
|
+
# Focus on obvious duplicates
|
|
56
|
+
aiready patterns ./src --similarity 0.9
|
|
358
57
|
|
|
359
|
-
#
|
|
360
|
-
aiready
|
|
58
|
+
# Include smaller patterns
|
|
59
|
+
aiready patterns ./src --min-lines 3
|
|
361
60
|
|
|
362
|
-
#
|
|
363
|
-
aiready
|
|
61
|
+
# Export results
|
|
62
|
+
aiready patterns ./src --output json --output-file report.json
|
|
364
63
|
```
|
|
365
64
|
|
|
366
|
-
## 🎛️ Tuning
|
|
367
|
-
|
|
368
|
-
Use these presets to quickly balance precision, recall, and runtime:
|
|
369
|
-
|
|
370
|
-
- Speed-first (large repos):
|
|
371
|
-
- `aiready-patterns ./src --min-shared-tokens 12 --max-candidates 60 --max-blocks 300`
|
|
372
|
-
- Cuts weak candidates early; best for fast, iterative scans.
|
|
373
|
-
|
|
374
|
-
- Coverage-first (more findings):
|
|
375
|
-
- `aiready-patterns ./src --min-shared-tokens 6 --max-candidates 150`
|
|
376
|
-
- Expands candidate pool; expect more results and longer runtime.
|
|
377
|
-
|
|
378
|
-
- Short-block focus (helpers/utilities):
|
|
379
|
-
- `aiready-patterns ./src --min-lines 5 --min-shared-tokens 6 --max-candidates 120`
|
|
380
|
-
- Better recall for small functions; consider `--exclude "**/test/**"` to reduce noise.
|
|
381
|
-
|
|
382
|
-
### Minimum Lines vs Min Shared Tokens
|
|
383
|
-
|
|
384
|
-
- `minLines` filters which blocks are extracted; lower values include smaller functions that have fewer tokens overall.
|
|
385
|
-
- Smaller blocks naturally share fewer tokens; to avoid missing true matches when `minLines` is low (≤5–6), consider lowering `minSharedTokens` by 1–2.
|
|
386
|
-
- Recommended pairs:
|
|
387
|
-
- `minLines 5–6` → `minSharedTokens 6–8` (recall-friendly; watch noise)
|
|
388
|
-
- `minLines 8–10` → `minSharedTokens 8–10` (precision-first)
|
|
389
|
-
- Default balance: `minLines=5`, `minSharedTokens=8` works well for most repos. Reduce `minSharedTokens` only when you specifically want to catch more short helpers.
|
|
65
|
+
## 🎛️ Tuning Guide
|
|
390
66
|
|
|
391
|
-
|
|
67
|
+
### Main Parameters
|
|
392
68
|
|
|
393
|
-
|
|
69
|
+
| Parameter | Default | Effect | Use When |
|
|
70
|
+
|-----------|---------|--------|----------|
|
|
71
|
+
| `--similarity` | `0.4` | Similarity threshold (0-1) | Want more/less sensitive detection |
|
|
72
|
+
| `--min-lines` | `5` | Minimum lines per pattern | Include/exclude small functions |
|
|
73
|
+
| `--min-shared-tokens` | `8` | Tokens that must match | Control comparison strictness |
|
|
394
74
|
|
|
395
|
-
|
|
75
|
+
### Quick Tuning Scenarios
|
|
396
76
|
|
|
397
|
-
**
|
|
77
|
+
**Want more results?** (catch subtle duplicates)
|
|
398
78
|
```bash
|
|
399
|
-
#
|
|
400
|
-
aiready
|
|
401
|
-
aiready-patterns ./src --similarity 0.2 # Very sensitive (may include noise)
|
|
402
|
-
```
|
|
403
|
-
*Tradeoff: More results but potentially more false positives*
|
|
79
|
+
# Lower similarity threshold
|
|
80
|
+
aiready patterns ./src --similarity 0.3
|
|
404
81
|
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
# Default: 5, try lowering to catch smaller functions/utilities
|
|
408
|
-
aiready-patterns ./src --min-lines 3 # Include very small functions
|
|
409
|
-
aiready-patterns ./src --min-lines 1 # Include almost everything
|
|
410
|
-
```
|
|
411
|
-
*Tradeoff: More results but slower analysis and more noise*
|
|
82
|
+
# Include smaller functions
|
|
83
|
+
aiready patterns ./src --min-lines 3
|
|
412
84
|
|
|
413
|
-
|
|
414
|
-
|
|
415
|
-
# Default: 8, try lowering to expand candidate pool
|
|
416
|
-
aiready-patterns ./src --min-shared-tokens 5 # More candidates
|
|
417
|
-
aiready-patterns ./src --min-shared-tokens 3 # Many more candidates
|
|
85
|
+
# Both together
|
|
86
|
+
aiready patterns ./src --similarity 0.3 --min-lines 3
|
|
418
87
|
```
|
|
419
|
-
*Tradeoff: More results but slower analysis*
|
|
420
88
|
|
|
421
|
-
**
|
|
89
|
+
**Want fewer but higher quality results?** (focus on obvious duplicates)
|
|
422
90
|
```bash
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
*Tradeoff: More results but may include test-specific patterns*
|
|
91
|
+
# Higher similarity threshold
|
|
92
|
+
aiready patterns ./src --similarity 0.8
|
|
426
93
|
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
# Default: 100, try increasing for more thorough search
|
|
430
|
-
aiready-patterns ./src --max-candidates 200 # More thorough
|
|
94
|
+
# Larger patterns only
|
|
95
|
+
aiready patterns ./src --min-lines 10
|
|
431
96
|
```
|
|
432
|
-
*Tradeoff: Slower analysis but more comprehensive*
|
|
433
|
-
|
|
434
|
-
### When Analysis is Too Slow
|
|
435
97
|
|
|
436
|
-
|
|
437
|
-
|
|
438
|
-
**1. Increase minimum lines** (most effective)
|
|
98
|
+
**Analysis too slow?** (optimize for speed)
|
|
439
99
|
```bash
|
|
440
|
-
#
|
|
441
|
-
aiready
|
|
442
|
-
aiready-patterns ./src --min-lines 15 # Only major functions
|
|
443
|
-
```
|
|
444
|
-
*Tradeoff: Faster but may miss small but important duplicates*
|
|
100
|
+
# Focus on substantial functions
|
|
101
|
+
aiready patterns ./src --min-lines 10
|
|
445
102
|
|
|
446
|
-
|
|
447
|
-
|
|
448
|
-
# Default: 8, try increasing to reduce candidate pool
|
|
449
|
-
aiready-patterns ./src --min-shared-tokens 12 # Fewer candidates
|
|
450
|
-
aiready-patterns ./src --min-shared-tokens 15 # Much fewer candidates
|
|
103
|
+
# Reduce comparison candidates
|
|
104
|
+
aiready patterns ./src --min-shared-tokens 12
|
|
451
105
|
```
|
|
452
|
-
*Tradeoff: Faster but may miss some duplicates*
|
|
453
106
|
|
|
454
|
-
|
|
455
|
-
```bash
|
|
456
|
-
# Default: 100, try reducing for faster analysis
|
|
457
|
-
aiready-patterns ./src --max-candidates 50 # Faster
|
|
458
|
-
aiready-patterns ./src --max-candidates 20 # Much faster
|
|
459
|
-
```
|
|
460
|
-
*Tradeoff: Faster but may miss some duplicates*
|
|
107
|
+
### Parameter Tradeoffs
|
|
461
108
|
|
|
462
|
-
|
|
463
|
-
|
|
464
|
-
|
|
465
|
-
|
|
466
|
-
|
|
467
|
-
|
|
109
|
+
| Adjustment | More Results | Faster | Higher Quality | Tradeoff |
|
|
110
|
+
|------------|--------------|--------|----------------|----------|
|
|
111
|
+
| Lower `--similarity` | ✅ | ❌ | ❌ | More false positives |
|
|
112
|
+
| Lower `--min-lines` | ✅ | ❌ | ❌ | Includes trivial duplicates |
|
|
113
|
+
| Higher `--similarity` | ❌ | ✅ | ✅ | Misses subtle duplicates |
|
|
114
|
+
| Higher `--min-lines` | ❌ | ✅ | ✅ | Misses small but important patterns |
|
|
468
115
|
|
|
469
|
-
|
|
470
|
-
```bash
|
|
471
|
-
# Instead of analyzing the entire repo, analyze specific directories
|
|
472
|
-
aiready-patterns ./src/api --min-lines 8
|
|
473
|
-
aiready-patterns ./src/components --min-lines 8
|
|
474
|
-
```
|
|
475
|
-
*Tradeoff: Need to run multiple commands but each is faster*
|
|
476
|
-
|
|
477
|
-
### When You Get Too Many False Positives
|
|
116
|
+
### Common Workflows
|
|
478
117
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
**1. Increase similarity threshold**
|
|
118
|
+
**First run** (broad discovery):
|
|
482
119
|
```bash
|
|
483
|
-
# Default
|
|
484
|
-
aiready-patterns ./src --similarity 0.6 # More accurate
|
|
485
|
-
aiready-patterns ./src --similarity 0.8 # Very accurate
|
|
120
|
+
aiready patterns ./src # Default settings
|
|
486
121
|
```
|
|
487
|
-
*Tradeoff: Fewer results but higher quality*
|
|
488
122
|
|
|
489
|
-
**
|
|
123
|
+
**Focus on critical issues** (production ready):
|
|
490
124
|
```bash
|
|
491
|
-
|
|
492
|
-
aiready-patterns ./src --min-lines 10 # Larger patterns only
|
|
125
|
+
aiready patterns ./src --similarity 0.8 --min-lines 8
|
|
493
126
|
```
|
|
494
|
-
*Tradeoff: Fewer results but more significant ones*
|
|
495
127
|
|
|
496
|
-
**
|
|
128
|
+
**Catch everything** (comprehensive audit):
|
|
497
129
|
```bash
|
|
498
|
-
aiready
|
|
499
|
-
aiready-patterns ./src --severity critical # Only >95% similar
|
|
130
|
+
aiready patterns ./src --similarity 0.3 --min-lines 3
|
|
500
131
|
```
|
|
501
|
-
*Tradeoff: Fewer results but highest quality*
|
|
502
132
|
|
|
503
|
-
**
|
|
133
|
+
**Performance optimization** (large codebases):
|
|
504
134
|
```bash
|
|
505
|
-
|
|
506
|
-
aiready-patterns ./src --exclude "**/migrations/**,**/generated/**"
|
|
135
|
+
aiready patterns ./src --min-lines 10 --min-shared-tokens 10
|
|
507
136
|
```
|
|
508
|
-
*Tradeoff: Fewer results but more relevant ones*
|
|
509
|
-
|
|
510
|
-
### Quick Troubleshooting Reference
|
|
511
|
-
|
|
512
|
-
| Problem | Symptom | Solution | Tradeoff |
|
|
513
|
-
|---------|---------|----------|----------|
|
|
514
|
-
| **No results** | "No duplicate patterns detected" | Lower `--similarity` to 0.3 | More noise |
|
|
515
|
-
| **Few results** | <5 duplicates found | Lower `--min-lines` to 3 | Slower analysis |
|
|
516
|
-
| **Too slow** | Takes >30 seconds | Increase `--min-lines` to 10 | Misses small duplicates |
|
|
517
|
-
| **Too many results** | 100+ duplicates | Increase `--similarity` to 0.6 | Misses subtle duplicates |
|
|
518
|
-
| **False positives** | Many irrelevant matches | Use `--severity critical` | Fewer results |
|
|
519
|
-
| **Memory issues** | Out of memory error | Analyze by directory | Multiple commands needed |
|
|
520
|
-
|
|
521
|
-
**CLI Options:**
|
|
522
|
-
- `--stream-results` - Output duplicates as found (enabled by default)
|
|
523
|
-
- `--no-approx` - Disable approximate mode (slower, O(B²) complexity, use with caution)
|
|
524
|
-
- `--min-lines N` - Filter blocks smaller than N lines (default 5)
|
|
525
|
-
|
|
526
|
-
### Controlling Analysis Scope
|
|
527
137
|
|
|
528
|
-
|
|
138
|
+
**Use the unified CLI** for all AIReady tools:
|
|
529
139
|
|
|
530
|
-
**1. `--min-lines` (primary filter):**
|
|
531
|
-
- Filters blocks during extraction (most efficient)
|
|
532
|
-
- Higher values = focus on substantial functions
|
|
533
|
-
- Lower values = catch smaller utility duplicates
|
|
534
|
-
|
|
535
|
-
**2. `--no-approx` mode (use with caution):**
|
|
536
|
-
- Disables approximate mode (candidate pre-filtering)
|
|
537
|
-
- O(B²) complexity - compares every block to every other block
|
|
538
|
-
- **Automatic safety limit:** 500K comparisons (~1000 blocks max)
|
|
539
|
-
- Shows warning when used with >500 blocks
|
|
540
|
-
- Approximate mode (default) is recommended for all use cases
|
|
541
|
-
|
|
542
|
-
**Examples:**
|
|
543
140
|
```bash
|
|
544
|
-
|
|
545
|
-
aiready-patterns ./src --min-lines 15
|
|
546
|
-
|
|
547
|
-
# Comprehensive scan of all functions (recommended)
|
|
548
|
-
aiready-patterns ./src --min-lines 5
|
|
549
|
-
|
|
550
|
-
# Quick scan of major duplicates
|
|
551
|
-
aiready-patterns ./src --min-lines 20
|
|
552
|
-
```
|
|
553
|
-
|
|
554
|
-
**Recommendations by codebase size:**
|
|
141
|
+
npm install -g @aiready/cli
|
|
555
142
|
|
|
556
|
-
|
|
557
|
-
|
|
558
|
-
| **Small** | <100 | Use defaults | <1s ✅ |
|
|
559
|
-
| **Medium** | 100-500 | Use defaults | 1-5s ✅ |
|
|
560
|
-
| **Large** | 500-1,000 | Use defaults or `--min-lines 10` | 3-10s ✅ |
|
|
561
|
-
| **Very Large** | 1,000-5,000 | `--min-lines 15` or analyze by module | 5-20s ⚠️ |
|
|
562
|
-
| **Super Large** | 5,000+ | **Analyze by module** (see below) | 10-60s per module ⚠️ |
|
|
143
|
+
# Pattern detection
|
|
144
|
+
aiready patterns ./src
|
|
563
145
|
|
|
564
|
-
|
|
146
|
+
# Context analysis (token costs, fragmentation)
|
|
147
|
+
aiready context ./src
|
|
565
148
|
|
|
566
|
-
|
|
567
|
-
|
|
568
|
-
```bash
|
|
569
|
-
# Analyze by top-level directory
|
|
570
|
-
for dir in src/*/; do
|
|
571
|
-
echo "Analyzing $dir"
|
|
572
|
-
aiready-patterns "$dir" --min-lines 10
|
|
573
|
-
done
|
|
574
|
-
|
|
575
|
-
# Or focus on specific high-value areas
|
|
576
|
-
aiready-patterns ./src/api --min-lines 10
|
|
577
|
-
aiready-patterns ./src/core --min-lines 10
|
|
578
|
-
aiready-patterns ./src/services --min-lines 10
|
|
579
|
-
|
|
580
|
-
# For super large repos (5K+ files), increase thresholds
|
|
581
|
-
aiready-patterns ./src/backend --min-lines 20 --similarity 0.50
|
|
149
|
+
# Full codebase analysis
|
|
150
|
+
aiready scan ./src
|
|
582
151
|
```
|
|
583
152
|
|
|
584
|
-
**
|
|
585
|
-
-
|
|
586
|
-
-
|
|
587
|
-
- Provides focused, actionable results per module
|
|
588
|
-
- Better for CI/CD integration (parallel jobs)
|
|
589
|
-
|
|
590
|
-
**Progress Indicators:**
|
|
591
|
-
- **Approx mode**: Shows blocks processed + duplicates found
|
|
592
|
-
- **Exact mode**: Shows % complete, ETA, and comparisons processed
|
|
593
|
-
- **Stream mode**: Prints each duplicate immediately when found (enabled by default)
|
|
594
|
-
|
|
595
|
-
## 🔧 CI/CD Integration
|
|
596
|
-
|
|
597
|
-
### GitHub Actions
|
|
598
|
-
|
|
599
|
-
```yaml
|
|
600
|
-
name: Pattern Detection
|
|
601
|
-
|
|
602
|
-
on: [pull_request]
|
|
603
|
-
|
|
604
|
-
jobs:
|
|
605
|
-
detect-patterns:
|
|
606
|
-
runs-on: ubuntu-latest
|
|
607
|
-
steps:
|
|
608
|
-
- uses: actions/checkout@v3
|
|
609
|
-
- uses: actions/setup-node@v3
|
|
610
|
-
- run: npx @aiready/pattern-detect ./src --output json --output-file patterns.json
|
|
611
|
-
- name: Check for critical issues
|
|
612
|
-
run: |
|
|
613
|
-
CRITICAL=$(jq '.summary.topDuplicates | map(select(.similarity > 0.95)) | length' patterns.json)
|
|
614
|
-
if [ "$CRITICAL" -gt "0" ]; then
|
|
615
|
-
echo "Found $CRITICAL critical duplicate patterns"
|
|
616
|
-
exit 1
|
|
617
|
-
fi
|
|
618
|
-
```
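The `jq` filter in the removed workflow counts entries in `summary.topDuplicates` whose `similarity` exceeds 0.95. The same check could be sketched in TypeScript for a Node-based CI step. Only the `summary.topDuplicates[].similarity` path is taken from the workflow. The `PatternReport` type and the `countCritical` function are hypothetical names, and no other report fields are assumed.

```typescript
// Hypothetical types covering only the fields the jq filter reads.
interface DuplicateEntry {
  similarity: number;
}

interface PatternReport {
  summary: { topDuplicates: DuplicateEntry[] };
}

/** Count duplicates strictly above the threshold (0.95 in the workflow). */
function countCritical(report: PatternReport, threshold = 0.95): number {
  return report.summary.topDuplicates.filter((d) => d.similarity > threshold)
    .length;
}

// Possible usage in a CI script (assumes patterns.json was written earlier):
//   import { readFileSync } from "node:fs";
//   const report = JSON.parse(readFileSync("patterns.json", "utf8"));
//   if (countCritical(report) > 0) process.exit(1);
```

Keeping the counting logic in a pure function makes it straightforward to unit test without running the CLI.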
|
|
619
|
-
|
|
620
|
-
## 🤝 Contributing
|
|
621
|
-
|
|
622
|
-
We welcome contributions! This tool is part of the [AIReady](https://github.com/aiready/aiready) ecosystem.
|
|
623
|
-
|
|
624
|
-
## 📝 License
|
|
625
|
-
|
|
626
|
-
MIT - See LICENSE file
|
|
627
|
-
|
|
628
|
-
## 🔗 Related Tools (Coming Soon)
|
|
629
|
-
|
|
630
|
-
- **@aiready/context-analyzer** - Analyze token costs and context fragmentation
|
|
631
|
-
- **@aiready/doc-drift** - Track documentation freshness
|
|
632
|
-
- **@aiready/consistency** - Check naming pattern consistency
|
|
153
|
+
**Individual packages:**
|
|
154
|
+
- [**@aiready/cli**](https://www.npmjs.com/package/@aiready/cli) - Unified CLI with all tools
|
|
155
|
+
- [**@aiready/context-analyzer**](https://www.npmjs.com/package/@aiready/context-analyzer) - Context window cost analysis
|
|
633
156
|
|
|
634
157
|
---
|
|
635
158
|
|
|
636
|
-
**Made with 💙 by the AIReady team** | [
|
|
159
|
+
**Made with 💙 by the AIReady team** | [GitHub](https://github.com/caopengau/aiready)
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@aiready/pattern-detect",
|
|
3
|
-
"version": "0.7.11",
|
|
3
|
+
"version": "0.7.12",
|
|
4
4
|
"description": "Semantic duplicate pattern detection for AI-generated code - finds similar implementations that waste AI context tokens",
|
|
5
5
|
"main": "./dist/index.js",
|
|
6
6
|
"module": "./dist/index.mjs",
|
|
@@ -45,7 +45,7 @@
|
|
|
45
45
|
"dependencies": {
|
|
46
46
|
"commander": "^14.0.0",
|
|
47
47
|
"chalk": "^5.3.0",
|
|
48
|
-
"@aiready/core": "0.3.
|
|
48
|
+
"@aiready/core": "0.3.7"
|
|
49
49
|
},
|
|
50
50
|
"devDependencies": {
|
|
51
51
|
"tsup": "^8.3.5",
|