codecritique 1.0.0

Files changed (40)
  1. package/LICENSE +21 -0
  2. package/README.md +1145 -0
  3. package/package.json +98 -0
  4. package/src/content-retrieval.js +747 -0
  5. package/src/custom-documents.js +597 -0
  6. package/src/embeddings/cache-manager.js +364 -0
  7. package/src/embeddings/constants.js +40 -0
  8. package/src/embeddings/database.js +921 -0
  9. package/src/embeddings/errors.js +208 -0
  10. package/src/embeddings/factory.js +447 -0
  11. package/src/embeddings/file-processor.js +851 -0
  12. package/src/embeddings/model-manager.js +337 -0
  13. package/src/embeddings/similarity-calculator.js +97 -0
  14. package/src/embeddings/types.js +113 -0
  15. package/src/feedback-loader.js +384 -0
  16. package/src/index.js +1418 -0
  17. package/src/llm.js +123 -0
  18. package/src/pr-history/analyzer.js +579 -0
  19. package/src/pr-history/bot-detector.js +123 -0
  20. package/src/pr-history/cli-utils.js +204 -0
  21. package/src/pr-history/comment-processor.js +549 -0
  22. package/src/pr-history/database.js +819 -0
  23. package/src/pr-history/github-client.js +629 -0
  24. package/src/project-analyzer.js +955 -0
  25. package/src/rag-analyzer.js +2764 -0
  26. package/src/rag-review.js +566 -0
  27. package/src/technology-keywords.json +753 -0
  28. package/src/utils/command.js +48 -0
  29. package/src/utils/constants.js +263 -0
  30. package/src/utils/context-inference.js +364 -0
  31. package/src/utils/document-detection.js +105 -0
  32. package/src/utils/file-validation.js +271 -0
  33. package/src/utils/git.js +232 -0
  34. package/src/utils/language-detection.js +170 -0
  35. package/src/utils/logging.js +24 -0
  36. package/src/utils/markdown.js +132 -0
  37. package/src/utils/mobilebert-tokenizer.js +141 -0
  38. package/src/utils/pr-chunking.js +276 -0
  39. package/src/utils/string-utils.js +28 -0
  40. package/src/zero-shot-classifier-open.js +392 -0
package/README.md ADDED
@@ -0,0 +1,1145 @@
1
+ # CodeCritique
2
+
3
+ A self-hosted, AI-powered code review tool using **RAG (Retrieval-Augmented Generation)** with local embeddings and Anthropic Claude for intelligent, context-aware code analysis. Supports any programming language with specialized features for JavaScript/TypeScript projects.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Overview](#overview)
8
+ - [Installation](#installation)
9
+ - [Quick Start](#quick-start)
10
+ - [GitHub Actions Integration](#github-actions-integration)
11
+ - [Commands Reference](#commands-reference)
12
+ - [RAG Architecture](#rag-architecture)
13
+ - [Configuration](#configuration)
14
+ - [Output Formats](#output-formats)
15
+ - [Error Handling & Troubleshooting](#error-handling--troubleshooting)
16
+ - [Contributing](#contributing)
17
+ - [License](#license)
18
+
19
+ ## Overview
20
+
21
+ ### How RAG Powers Intelligent Code Review
22
+
23
+ CodeCritique uses **Retrieval-Augmented Generation (RAG)** to provide context-aware code analysis by combining:
24
+
25
+ - **Local embeddings** (via FastEmbed) for understanding your codebase patterns
26
+ - **Vector similarity search** to find relevant code examples and documentation
27
+ - **Historical PR analysis** to learn from past code review patterns
28
+ - **Custom document integration** for project-specific guidelines
29
+ - **LLM-powered analysis** (Anthropic Claude) with rich contextual information
30
+
31
+ This RAG-based approach provides more accurate, project-specific code reviews than generic static analysis tools.
32
+
33
+ ### Key Features
34
+
35
+ - **🔍 Context-Aware Analysis**: Understands your codebase patterns and conventions
36
+ - **🌐 Universal Language Support**: Works with any programming language
37
+ - **⚡ Local Embeddings**: Uses FastEmbed for fast, privacy-respecting semantic search
38
+ - **📚 Custom Guidelines**: Integrate your team's coding standards and documentation
39
+ - **🔄 PR History Learning**: Learns from past code review patterns in your repository
40
+ - **📊 Multiple Output Formats**: Text, JSON, and Markdown output for flexible integration
41
+ - **🔧 Git Integration**: Analyze specific files, patterns, or branch differences
42
+ - **🚀 Easy Setup**: Works via npx in any project type
43
+
44
+ ### Benefits
45
+
46
+ - **Reduced Review Time**: Automate repetitive aspects of code review
47
+ - **Consistent Standards**: Enforce coding standards uniformly across the codebase
48
+ - **Learning from History**: Leverage patterns from previous code reviews
49
+ - **Project-Specific**: Understands your codebase's unique patterns and conventions
50
+ - **Actionable Feedback**: Provides specific, constructive suggestions
51
+
52
+ ## Installation
53
+
54
+ ### Prerequisites
55
+
56
+ - **Node.js** v22.0.0 or higher
57
+ - **Git** (for diff-based analysis)
58
+ - **Anthropic API key** (for LLM analysis)
59
+
60
+ ### API Key Setup
61
+
62
+ Set up your Anthropic API key using one of these methods:
63
+
64
+ #### Option 1: Environment Variable
65
+
66
+ ```bash
67
+ export ANTHROPIC_API_KEY=your_anthropic_api_key
68
+ ```
69
+
70
+ #### Option 2: .env File
71
+
72
+ Create a `.env` file in your project directory:
73
+
74
+ ```env
75
+ ANTHROPIC_API_KEY=your_anthropic_api_key
76
+ ```
77
+
78
+ #### Option 3: Inline with Command
79
+
80
+ ```bash
81
+ ANTHROPIC_API_KEY=your_key npx codecritique analyze --file app.py
82
+ ```
83
+
84
+ ### Installation Options
85
+
86
+ > **Note**: This tool is currently in development and not yet published to npm. You'll need to run it locally for now.
87
+
88
+ #### Option 1: Run Locally (Current Method)
89
+
90
+ 1. **Clone the repository**:
91
+
92
+ ```bash
93
+ git clone https://github.com/cosmocoder/CodeCritique.git
94
+ cd CodeCritique
95
+ ```
96
+
97
+ 2. **Install dependencies**:
98
+
99
+ ```bash
100
+ npm install
101
+ ```
102
+
103
+ 3. **Run the tool**:
104
+
105
+ ```bash
106
+ # Analyze a single file
107
+ node src/index.js analyze --file path/to/file.py
108
+
109
+ # Or use npm script (if available)
110
+ npm start -- analyze --file path/to/file.py
111
+ ```
112
+
113
+ **Alternative: Using the Shell Script Wrapper (Recommended for non-JS projects)**
114
+
115
+ For easier integration with non-JavaScript projects, you can use the provided shell script wrapper:
116
+
117
+ 1. **Copy the wrapper script** to your project:
118
+
119
+ ```bash
120
+ # From the CodeCritique repository
121
+ cp src/codecritique.sh /path/to/your/project/codecritique.sh
122
+ chmod +x /path/to/your/project/codecritique.sh
123
+ ```
124
+
125
+ 2. **Use the wrapper** (automatically handles environment setup):
126
+
127
+ ```bash
128
+ # The script will automatically:
129
+ # - Check for Node.js installation
130
+ # - Load .env file if present
131
+ # - Verify ANTHROPIC_API_KEY
132
+ # - Try global installation first, then fall back to npx
133
+
134
+ ./codecritique.sh analyze --file path/to/file.py
135
+ ./codecritique.sh embeddings:generate --directory src
136
+ ```
137
+
138
+ 3. **Environment setup** (the script handles this automatically):
139
+ - Creates/uses `.env` file in your project directory
140
+ - Validates Node.js v22.0.0+ requirement
141
+ - Provides helpful error messages for missing dependencies
142
+
143
+ #### Option 2: Using npx (Future - Once Published)
144
+
145
+ ```bash
146
+ # This will be available once the tool is published to npm
147
+ npx codecritique analyze --file path/to/file.py
148
+ ```
149
+
150
+ #### Option 3: Global Installation (Future - Once Published)
151
+
152
+ ```bash
153
+ # This will be available once the tool is published to npm
154
+ npm install -g codecritique
155
+ codecritique analyze --file path/to/file.py
156
+ ```
157
+
158
+ ## Quick Start
159
+
160
+ Follow this three-step workflow for optimal code review results:
161
+
162
+ ### Step 1: Generate Embeddings (Required)
163
+
164
+ **Generate embeddings for your codebase first** - this is essential for context-aware analysis:
165
+
166
+ ```bash
167
+ # Generate embeddings for the src directory
168
+ npx codecritique embeddings:generate --directory src
169
+
170
+ # Generate for specific files or patterns
171
+ npx codecritique embeddings:generate --files "src/**/*.ts" "lib/*.js"
172
+
173
+ # Generate with exclusions (recommended for large codebases)
174
+ npx codecritique embeddings:generate --directory src --exclude "**/*.test.js" "**/*.spec.js"
175
+ ```
176
+
177
+ ### Step 2: Analyze PR History (Optional)
178
+
179
+ **Enhance reviews with historical context** by analyzing past PR comments. This step requires a GitHub token:
180
+
181
+ #### Prerequisites for PR History Analysis
182
+
183
+ You must set a `GITHUB_TOKEN` environment variable with repository access permissions:
184
+
185
+ ```bash
186
+ # Set GitHub token (required for PR history analysis)
187
+ export GITHUB_TOKEN=your_github_token_here
188
+
189
+ # Or add to .env file
190
+ echo "GITHUB_TOKEN=your_github_token_here" >> .env
191
+ ```
192
+
193
+ #### Run PR History Analysis
194
+
195
+ ```bash
196
+ # Analyze PR history for current project (auto-detects GitHub repo)
197
+ npx codecritique pr-history:analyze
198
+
199
+ # Analyze specific repository
200
+ npx codecritique pr-history:analyze --repository owner/repo
201
+
202
+ # Analyze with date range
203
+ npx codecritique pr-history:analyze --since 2024-01-01 --until 2024-12-31
204
+ ```
205
+
206
+ ### Step 3: Analyze Code (Final Step)
207
+
208
+ **Now perform the actual code review** with rich context from embeddings and PR history:
209
+
210
+ #### Basic Analysis
211
+
212
+ ```bash
213
+ # Analyze a single file
214
+ npx codecritique analyze --file src/components/Button.tsx
215
+
216
+ # Analyze files matching patterns
217
+ npx codecritique analyze --files "src/**/*.ts" "lib/*.js"
218
+
219
+ # Analyze changes in feature-branch vs main branch (auto-detects base branch)
220
+ npx codecritique analyze --diff-with feature-branch
221
+ ```
222
+
223
+ #### Using with Custom Guidelines
224
+
225
+ ```bash
226
+ # Include your team's coding standards
227
+ npx codecritique analyze \
228
+ --file src/utils/validation.ts \
229
+ --doc "Engineering Guidelines:./docs/guidelines.md" \
230
+ --doc "API Standards:./docs/api-standards.md"
231
+ ```
232
+
233
+ #### Non-JavaScript Projects
234
+
235
+ ```bash
236
+ # Python project
237
+ cd /path/to/python/project
238
+ npx codecritique analyze --file app.py
239
+
240
+ # Ruby project
241
+ npx codecritique analyze --files "**/*.rb"
242
+
243
+ # Any language with git diff
244
+ npx codecritique analyze --diff-with feature-branch
245
+ ```
246
+
247
+ ## GitHub Actions Integration
248
+
249
+ This project provides **two reusable GitHub Actions** that can be used in any repository for automated AI-powered code review:
250
+
251
+ 1. **🧠 Generate Embeddings Action** - Creates semantic embeddings for your codebase
252
+ 2. **🔍 PR Review Action** - Performs AI-powered code reviews on pull requests
253
+
254
+ These actions can be used independently or together for a complete AI code review workflow in your CI/CD pipeline.
255
+
256
+ ---
257
+
258
+ ### 🧠 Generate Embeddings Action
259
+
260
+ **Action Path:** `cosmocoder/CodeCritique/.github/actions/generate-embeddings@main`
261
+
262
+ This action generates semantic embeddings for your codebase, enabling context-aware code analysis. The embeddings are stored as GitHub Actions artifacts and can be reused across workflow runs. It is recommended to generate embeddings for your project every time the `main` branch is updated.
263
+
264
+ #### Basic Usage
265
+
266
+ ```yaml
267
+ name: Generate Code Embeddings
268
+
269
+ on:
270
+   push:
270
+     branches:
271
+       - main
272
+
273
+ jobs:
274
+   generate-embeddings:
275
+     name: Generate Code Embeddings
276
+     runs-on: ubuntu-latest
277
+     permissions:
278
+       contents: read
279
+       actions: read # needed for downloading artifacts
280
+
281
+     steps:
282
+       - name: Checkout Target Repository
283
+         uses: actions/checkout@v4
284
+
285
+       - name: Generate Embeddings
286
+         uses: cosmocoder/CodeCritique/.github/actions/generate-embeddings@main
287
+         with:
288
+           verbose: true
290
+ ```
291
+
292
+ #### Input Parameters
293
+
294
+ | Parameter | Description | Required | Default |
295
+ | --------------------------- | ------------------------------------------------------- | -------- | ---------------- |
296
+ | `files` | Specific files or patterns to process (space-separated) | No | `''` (all files) |
297
+ | `concurrency` | Number of concurrent embedding requests | No | Auto-detected |
298
+ | `exclude` | Patterns to exclude (space-separated glob patterns) | No | `''` |
299
+ | `exclude-file` | File containing patterns to exclude (one per line) | No | `''` |
300
+ | `verbose` | Show verbose output | No | `false` |
301
+ | `embeddings-retention-days` | Number of days to retain embedding artifacts | No | `30` |
302
+
303
+ #### Advanced Configuration Examples
304
+
305
+ ##### Processing Specific Files
306
+
307
+ ```yaml
308
+ - name: Generate Embeddings for TypeScript Files
309
+   uses: cosmocoder/CodeCritique/.github/actions/generate-embeddings@main
310
+   with:
311
+     files: 'src/**/*.ts src/**/*.tsx'
312
+     exclude: '**/*.test.ts **/*.spec.ts'
313
+     verbose: true
314
+ ```
315
+
316
+ ##### High Performance Setup
317
+
318
+ ```yaml
319
+ - name: Generate Embeddings (High Performance)
320
+   uses: cosmocoder/CodeCritique/.github/actions/generate-embeddings@main
321
+   with:
322
+     concurrency: 20
323
+     embeddings-retention-days: 60
324
+ ```
325
+
326
+ ---
327
+
328
+ ### 🔍 PR Review Action
329
+
330
+ **Action Path:** `cosmocoder/CodeCritique/.github/actions/pr-review@main`
331
+
332
+ This action performs AI-powered code reviews on pull requests using Anthropic Claude models. It automatically downloads any available embeddings to provide context-aware analysis and posts review comments directly to the PR.
333
+
334
+ The action includes intelligent feedback tracking that monitors user reactions and replies to review comments. When users dismiss suggestions (through reactions like 👎 or replies with keywords like "disagree", "ignore", or "not relevant"), the action automatically resolves those conversation threads and avoids reposting similar issues in subsequent runs on the same PR, creating a more streamlined review experience.
335
+
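The dismissal check described above can be sketched roughly as follows. This is only an illustration, not the action's actual code: `isDismissed`, the input shape (`reactions`, `replies`), and the exact keyword list are assumptions based on the examples in the paragraph. The `'-1'` value is how the GitHub API encodes a 👎 reaction.

```javascript
// Hypothetical helper: decide whether a review comment was dismissed by a user.
// The keyword list mirrors the README examples; the real list may differ.
const DISMISS_KEYWORDS = ['disagree', 'ignore', 'not relevant'];

function isDismissed({ reactions = [], replies = [] }) {
  // A thumbs-down reaction ('-1' in the GitHub API) counts as a dismissal.
  if (reactions.includes('-1')) return true;

  // So does any reply containing one of the keywords (case-insensitive).
  return replies.some((reply) => {
    const text = reply.toLowerCase();
    return DISMISS_KEYWORDS.some((keyword) => text.includes(keyword));
  });
}
```

A plain substring match like this is deliberately simple; it would also match words such as "ignored", which may or may not be what a real implementation wants.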
336
+ #### Basic Usage
337
+
338
+ ```yaml
339
+ name: AI PR Review
340
+
341
+ on:
342
+   pull_request:
342
+     types: [opened, synchronize, reopened]
343
+
344
+ jobs:
345
+   pr-review:
346
+     name: AI PR Review
347
+     runs-on: ubuntu-latest
348
+     permissions:
349
+       contents: write # needed for marking conversations as resolved
350
+       pull-requests: write # needed for posting comments
351
+       actions: read # needed for downloading artifacts
352
+
353
+     steps:
354
+       - name: ⬇️ Checkout repo
355
+         uses: actions/checkout@v4
356
+
357
+       - name: Set up main branch for diff analysis
358
+         run: git fetch --no-tags --prune origin main:main
359
+
360
+       - name: Code Review
361
+         uses: cosmocoder/CodeCritique/.github/actions/pr-review@main
362
+         with:
363
+           verbose: true
364
+           anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
366
+ ```
367
+
368
+ #### Required Setup
369
+
370
+ 1. **Anthropic API Key**: Store your Anthropic API key as a repository secret named `ANTHROPIC_API_KEY`
371
+ 2. **Permissions**: The workflow must have `contents: write`, `actions: read`, and `pull-requests: write` permissions
372
+ 3. **Git Setup**: Ensure the base branch is available for diff analysis (see example above)
373
+
374
+ #### Input Parameters
375
+
376
+ | Parameter | Description | Required | Default |
377
+ | ------------------- | ---------------------------------------------------- | -------- | -------------------- |
378
+ | `anthropic-api-key` | Anthropic API key for Claude models | **Yes** | - |
379
+ | `skip-label` | Label name to skip AI review | No | `ai-review-disabled` |
380
+ | `verbose` | Show verbose output | No | `false` |
381
+ | `model` | LLM model to use (e.g., `claude-sonnet-4-20250514`) | No | Auto-selected |
382
+ | `max-tokens` | Maximum tokens for LLM response | No | Auto-calculated |
383
+ | `concurrency` | Concurrency for processing multiple files | No | `3` |
384
+ | `custom-docs` | Custom documents (format: `"title:path,title:path"`) | No | `''` |
385
+
386
+ > **Note**: The action uses sensible defaults for all review parameters. It always:
387
+ >
388
+ > - Uses JSON output format for parsing results
389
+ > - Posts both individual comments and summary comments to PRs
390
+ > - Limits to 25 comments maximum
391
+ > - Tracks feedback to improve future reviews
392
+ > - Uses optimal temperature and similarity thresholds
393
+
394
+ #### Output Values
395
+
396
+ The action provides several outputs that can be used in subsequent workflow steps:
397
+
398
+ | Output | Description |
399
+ | ------------------------ | -------------------------------------- |
400
+ | `comments-posted` | Number of review comments posted |
401
+ | `issues-found` | Total number of issues found |
402
+ | `files-analyzed` | Number of files analyzed |
403
+ | `analysis-time` | Time taken for analysis (seconds) |
404
+ | `embedding-cache-hit` | Whether embeddings were found and used |
405
+ | `review-score` | Overall review score (0-100) |
406
+ | `security-issues` | Number of security issues found |
407
+ | `performance-issues` | Number of performance issues found |
408
+ | `maintainability-issues` | Number of maintainability issues found |
409
+ | `review-report-path` | Path to the detailed review report |
410
+
411
+ #### Advanced Configuration Examples
412
+
413
+ ##### Skipping Reviews with Labels
414
+
415
+ You can skip AI reviews for specific PRs by adding a label. This is useful when:
416
+
417
+ - You want to merge urgent hotfixes without waiting for AI review
418
+ - The PR contains only documentation or configuration changes
419
+ - You're making experimental changes that don't need review
420
+
421
+ By default, the action checks for the `ai-review-disabled` label, but you can customize this:
422
+
423
+ ```yaml
424
+ - name: AI Code Review (Customizable Skip)
425
+   uses: cosmocoder/CodeCritique/.github/actions/pr-review@main
426
+   with:
427
+     anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
428
+     skip-label: 'no-ai-review' # Custom label name
429
+ ```
430
+
431
+ When a PR has the skip label, the workflow will exit early with a message:
432
+
433
+ ```
434
+ ⏭️ Skipping AI review - PR has 'ai-review-disabled' label
435
+ ```
436
+
437
+ To use this feature:
438
+
439
+ 1. Add the label to your repository (e.g., create a label named `ai-review-disabled`)
440
+ 2. Add the label to any PR you want to skip
441
+ 3. The action will automatically detect it and skip the review
442
+
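The skip-label check amounts to a simple lookup over the PR's labels. The sketch below assumes the labels arrive as objects with a `name` field, as in GitHub's pull request payloads. `shouldSkipReview` is a hypothetical name and not part of the action's interface.

```javascript
// Hypothetical helper: return true when the PR carries the skip label.
// `prLabels` is assumed to be an array of { name } objects.
function shouldSkipReview(prLabels, skipLabel = 'ai-review-disabled') {
  return prLabels.some((label) => label.name === skipLabel);
}

const labels = [{ name: 'bug' }, { name: 'ai-review-disabled' }];
if (shouldSkipReview(labels)) {
  console.log("⏭️ Skipping AI review - PR has 'ai-review-disabled' label");
}
```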
443
+ ##### Custom Model and Performance Settings
444
+
445
+ ```yaml
446
+ - name: AI Code Review with Custom Settings
447
+   uses: cosmocoder/CodeCritique/.github/actions/pr-review@main
447
+   with:
448
+     anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
449
+     model: 'claude-3-5-sonnet-20241022'
450
+     max-tokens: '4000'
451
+     concurrency: '5'
452
+     verbose: true
454
+ ```
455
+
456
+ ##### With Custom Documentation
457
+
458
+ ```yaml
459
+ - name: AI Code Review with Team Guidelines
460
+   uses: cosmocoder/CodeCritique/.github/actions/pr-review@main
460
+   with:
461
+     anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
462
+     custom-docs: 'Style Guide:./docs/style-guide.md,API Standards:./docs/api-standards.md'
463
+     verbose: true
465
+ ```
466
+
467
+ ---
468
+
469
+ ## Commands Reference
470
+
471
+ ### analyze
472
+
473
+ Analyze code using a RAG (Retrieval-Augmented Generation) approach with dynamic context retrieval.
474
+
475
+ ```bash
476
+ codecritique analyze [options]
477
+ ```
478
+
479
+ #### Options
480
+
481
+ | Option | Description | Default |
482
+ | -------------------------- | --------------------------------------------------------------------------------------- | ------- |
483
+ | `-b, --diff-with <branch>` | Analyze files changed in the specified branch compared to the base branch (main/master) | - |
484
+ | `-f, --files <files...>` | Specific files or glob patterns to review | - |
485
+ | `--file <file>` | Analyze a single file | - |
486
+ | `-d, --directory <dir>` | Working directory for git operations (use with --diff-with) | - |
487
+ | `-o, --output <format>` | Output format (text, json, markdown) | `text` |
488
+ | `--no-color` | Disable colored output | `false` |
489
+ | `--verbose` | Show verbose output | `false` |
491
+ | `--model <model>` | LLM model to use (e.g., claude-sonnet-4-20250514) | `claude-sonnet-4-20250514` |
492
+ | `--temperature <number>` | LLM temperature | `0.2` |
493
+ | `--max-tokens <number>` | LLM max tokens | `8192` |
494
+ | `--similarity-threshold <number>` | Threshold for finding similar code examples | `0.6` |
495
+ | `--max-examples <number>` | Max similar code examples to use | `5` |
496
+ | `--concurrency <number>` | Concurrency for processing multiple files | `3` |
497
+ | `--doc <specs...>` | Custom documents to provide to LLM (format: "Title:./path/to/file.md") | - |
498
+
499
+ #### Examples
500
+
501
+ ```bash
502
+ # Analyze a single file
503
+ codecritique analyze --file src/components/Button.tsx
504
+
505
+ # Analyze multiple files with patterns
506
+ codecritique analyze --files "src/**/*.tsx" "lib/*.js"
507
+
508
+ # Analyze changes in feature-branch vs main branch (auto-detects base branch)
509
+ codecritique analyze --diff-with feature-branch
510
+
511
+ # Analyze with custom documentation
512
+ codecritique analyze --file src/utils/validation.ts \
513
+ --doc "Engineering Guidelines:./docs/guidelines.md"
514
+
515
+ # Analyze with custom LLM settings
516
+ codecritique analyze --file app.py \
517
+ --temperature 0.1 \
518
+ --max-tokens 4096 \
519
+ --similarity-threshold 0.7
520
+
521
+ # Analyze changes in specific directory
522
+ codecritique analyze --diff-with feature-branch --directory /path/to/repo
523
+
524
+ # Output as JSON
525
+ codecritique analyze --files "src/**/*.ts" --output json > review.json
526
+ ```
527
+
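A report written with `--output json` can be consumed by a small script, for example to gate a CI job. The sketch below takes an already-parsed report object and uses the `summary` field names from the JSON sample in the Output Formats section. `exitCodeForReport` and `maxIssues` are hypothetical and not CodeCritique options.

```javascript
// Sketch: derive a CI exit code from a parsed `--output json` report.
// `maxIssues` is a hypothetical threshold chosen by the caller.
function exitCodeForReport(report, maxIssues = 0) {
  const summary = report.summary ?? {};
  const totalIssues = summary.totalIssues ?? 0;
  const errorFiles = summary.errorFiles ?? 0;

  // Fail when files could not be analyzed or too many issues were reported.
  return errorFiles > 0 || totalIssues > maxIssues ? 1 : 0;
}

const sample = {
  summary: { totalFilesReviewed: 3, filesWithIssues: 2, totalIssues: 5, skippedFiles: 0, errorFiles: 0 },
};
console.log(exitCodeForReport(sample)); // 1
console.log(exitCodeForReport(sample, 10)); // 0
```

In a real script the object would come from something like `JSON.parse` over the contents of `review.json`.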
528
+ ### embeddings:generate
529
+
530
+ Generate embeddings for the codebase to enable context-aware analysis.
531
+
532
+ ```bash
533
+ codecritique embeddings:generate [options]
534
+ ```
535
+
536
+ #### Options
537
+
538
+ | Option | Description | Default |
539
+ | ---------------------------- | ------------------------------------------------------------------------------ | ------- |
540
+ | `-d, --directory <dir>` | Directory to process | `.` |
541
+ | `-f, --files <files...>` | Specific files or patterns to process | - |
542
+ | `-c, --concurrency <number>` | Number of concurrent embedding requests | `10` |
543
+ | `--verbose` | Show verbose output | `false` |
544
+ | `--exclude <patterns...>` | Patterns to exclude (e.g., "**/\*.test.js" "docs/**") | - |
545
+ | `--exclude-file <file>` | File containing patterns to exclude (one per line) | - |
546
+ | `--no-gitignore` | Disable automatic exclusion of files in .gitignore | `false` |
547
+ | `--max-lines` | Maximum lines per code file that will be considered when generating embeddings | `1000` |
548
+ | `--force-analysis` | Force regeneration of project analysis summary (bypasses cache) | `false` |
549
+
550
+ #### Examples
551
+
552
+ ```bash
553
+ # Generate embeddings for current directory
554
+ codecritique embeddings:generate
555
+
556
+ # Generate for specific directory
557
+ codecritique embeddings:generate --directory src
558
+
559
+ # Generate for specific files
560
+ codecritique embeddings:generate --files "src/**/*.tsx" "lib/*.js"
561
+
562
+ # Exclude test files and docs
563
+ codecritique embeddings:generate --exclude "**/*.test.js" "**/*.spec.js" "docs/**"
564
+
565
+ # Use exclusion file
566
+ codecritique embeddings:generate --exclude-file exclusion-patterns.txt
567
+
568
+ # Process without gitignore exclusions
569
+ codecritique embeddings:generate --no-gitignore
570
+
571
+ # High concurrency for large codebases
572
+ codecritique embeddings:generate --concurrency 20 --verbose
573
+
574
+ # Force regeneration of project analysis (useful after major codebase changes)
575
+ codecritique embeddings:generate --force-analysis --verbose
576
+
577
+ # Combine force analysis with specific directory processing
578
+ codecritique embeddings:generate --directory src --force-analysis
579
+ ```
580
+
581
+ ### embeddings:stats
582
+
583
+ Show statistics about stored embeddings.
584
+
585
+ ```bash
586
+ codecritique embeddings:stats [options]
587
+ ```
588
+
589
+ #### Options
590
+
591
+ | Option | Description | Default |
592
+ | ----------------------- | -------------------------------------------------------------------------------- | ------- |
593
+ | `-d, --directory <dir>` | Directory of the project to show stats for (shows all projects if not specified) | - |
594
+
595
+ #### Examples
596
+
597
+ ```bash
598
+ # Show stats for all projects
599
+ codecritique embeddings:stats
600
+
601
+ # Show stats for specific project
602
+ codecritique embeddings:stats --directory /path/to/project
603
+ ```
604
+
605
+ ### embeddings:clear
606
+
607
+ Clear stored embeddings for the current project.
608
+
609
+ ```bash
610
+ codecritique embeddings:clear [options]
611
+ ```
612
+
613
+ #### Options
614
+
615
+ | Option | Description | Default |
616
+ | ----------------------- | ------------------------------------------------ | ------- |
617
+ | `-d, --directory <dir>` | Directory of the project to clear embeddings for | `.` |
618
+
619
+ #### Examples
620
+
621
+ ```bash
622
+ # Clear embeddings for current project
623
+ codecritique embeddings:clear
624
+
625
+ # Clear embeddings for specific project
626
+ codecritique embeddings:clear --directory /path/to/project
627
+ ```
628
+
629
+ ### embeddings:clear-all
630
+
631
+ Clear ALL stored embeddings (affects all projects - use with caution).
632
+
633
+ ```bash
634
+ codecritique embeddings:clear-all
635
+ ```
636
+
637
+ **Warning**: This command clears embeddings for all projects on the machine.
638
+
639
+ ### pr-history:analyze
640
+
641
+ Analyze PR comment history for the current project or specified repository.
642
+
643
+ ```bash
644
+ codecritique pr-history:analyze [options]
645
+ ```
646
+
647
+ #### Options
648
+
649
+ | Option | Description | Default |
650
+ | ------------------------- | ------------------------------------------------------------------- | ------- |
651
+ | `-d, --directory <dir>` | Project directory to analyze (auto-detects GitHub repo) | `.` |
652
+ | `-r, --repository <repo>` | GitHub repository in format "owner/repo" (overrides auto-detection) | - |
653
+ | `-t, --token <token>` | GitHub API token (or set GITHUB_TOKEN env var) | - |
654
+ | `--since <date>` | Only analyze PRs since this date (ISO format) | - |
655
+ | `--until <date>` | Only analyze PRs until this date (ISO format) | - |
656
+ | `--limit <number>` | Limit number of PRs to analyze | - |
657
+ | `--resume` | Resume interrupted analysis | `false` |
658
+ | `--clear` | Clear existing data before analysis | `false` |
659
+ | `--concurrency <number>` | Number of concurrent requests | `2` |
660
+ | `--batch-size <number>` | Batch size for processing | `50` |
661
+ | `--verbose` | Show verbose output | `false` |
662
+
663
+ #### Examples
664
+
665
+ ```bash
666
+ # Analyze current project (auto-detect repo)
667
+ codecritique pr-history:analyze
668
+
669
+ # Analyze specific repository
670
+ codecritique pr-history:analyze --repository owner/repo --token ghp_xxx
671
+
672
+ # Analyze with date range
673
+ codecritique pr-history:analyze --since 2024-01-01 --until 2024-12-31
674
+
675
+ # Clear existing data and re-analyze
676
+ codecritique pr-history:analyze --clear --limit 100
677
+
678
+ # Resume interrupted analysis
679
+ codecritique pr-history:analyze --resume
680
+ ```
681
+
682
+ ### pr-history:status
683
+
684
+ Check PR analysis status for the current project or specified repository.
685
+
686
+ ```bash
687
+ codecritique pr-history:status [options]
688
+ ```
689
+
690
+ #### Options
691
+
692
+ | Option | Description | Default |
693
+ | ------------------------- | ------------------------------------------------------------------- | ------- |
694
+ | `-d, --directory <dir>` | Project directory to check status for | `.` |
695
+ | `-r, --repository <repo>` | GitHub repository in format "owner/repo" (overrides auto-detection) | - |
696
+
697
+ #### Examples
698
+
699
+ ```bash
700
+ # Check status for current project
701
+ codecritique pr-history:status
702
+
703
+ # Check status for specific repository
704
+ codecritique pr-history:status --repository owner/repo
705
+ ```
706
+
707
+ ### pr-history:clear
708
+
709
+ Clear PR analysis data for the current project or specified repository.
710
+
711
+ ```bash
712
+ codecritique pr-history:clear [options]
713
+ ```
714
+
715
+ #### Options
716
+
717
+ | Option | Description | Default |
718
+ | ------------------------- | ------------------------------------------------------------------- | ------- |
719
+ | `-d, --directory <dir>` | Project directory to clear data for | `.` |
720
+ | `-r, --repository <repo>` | GitHub repository in format "owner/repo" (overrides auto-detection) | - |
721
+ | `--force` | Skip confirmation prompts | `false` |
722
+
723
+ #### Examples
724
+
725
+ ```bash
726
+ # Clear data for current project (with confirmation)
727
+ codecritique pr-history:clear
728
+
729
+ # Clear data for specific repository without confirmation
730
+ codecritique pr-history:clear --repository owner/repo --force
731
+ ```
732
+
733
+ ## RAG Architecture
734
+
735
+ ### How RAG Works
736
+
737
+ The Retrieval-Augmented Generation (RAG) approach enhances traditional AI code review by providing rich context:
738
+
739
+ ```mermaid
740
+ graph TD
741
+ A[Code Input] --> B[File Analysis]
742
+ B --> C[Context Retrieval]
743
+ C --> D[Similar Code Examples]
744
+ C --> E[Relevant Documentation]
745
+ C --> F[PR History Patterns]
746
+ C --> G[Custom Guidelines]
747
+ D --> H[LLM Analysis]
748
+ E --> H
749
+ F --> H
750
+ G --> H
751
+ H --> I[Contextualized Review]
752
+ ```
753
+
754
+ ### Components
755
+
756
+ 1. **Embedding Engine**: Uses FastEmbed to generate vector representations of code and documentation
757
+ 2. **Vector Database**: LanceDB stores embeddings for fast similarity search
758
+ 3. **Context Retrieval**: Finds relevant code examples, documentation, and historical patterns
759
+ 4. **LLM Integration**: Anthropic Claude analyzes code with rich contextual information
760
+ 5. **PR History Analyzer**: Learns from past code review patterns in your repository
761
+
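The context-retrieval step can be pictured as a top-k similarity search over stored vectors. The sketch below is a brute-force illustration only: the tool itself delegates search to LanceDB, and the use of cosine similarity here is an assumption. The defaults mirror the documented `--similarity-threshold` (`0.6`) and `--max-examples` (`5`) options. `cosineSimilarity`, `retrieveSimilar`, and the `{ id, vector }` entry shape are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return up to `maxExamples` entries scoring at least `threshold`, best first.
function retrieveSimilar(queryVector, entries, { threshold = 0.6, maxExamples = 5 } = {}) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .filter((entry) => entry.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxExamples);
}
```

A linear scan like this is fine for a handful of vectors; a vector database exists precisely to avoid it at scale.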
762
+ ### Benefits of RAG
763
+
764
+ - **Project-Specific**: Understands your codebase's unique patterns
765
+ - **Learning**: Improves recommendations based on historical data
766
+ - **Comprehensive**: Considers code, docs, and review history together
767
+ - **Efficient**: Local embeddings provide fast context retrieval
768
+ - **Privacy**: Embeddings are generated and stored locally; only the code under review and its retrieved context are sent to the Anthropic API for LLM analysis
769
+
770
+ ## Configuration
771
+
772
+ ### Custom Documents
773
+
774
+ Integrate your team's guidelines and documentation:
775
+
776
+ ```bash
777
+ codecritique analyze --file src/component.tsx \
778
+ --doc "Engineering Guidelines:./docs/engineering.md" \
779
+ --doc "React Standards:./docs/react-guide.md" \
780
+ --doc "API Guidelines:./docs/api-standards.md"
781
+ ```
782
+
783
+ Document format: `"Title:./path/to/file.md"`
784
+
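One plausible way to read such a spec is to split it at the first colon, so that paths containing further colons stay intact. This is a sketch under that assumption, not the tool's actual parser; `parseDocSpec` is a hypothetical name.

```javascript
// Split a "Title:./path/to/file.md" spec at the first colon.
function parseDocSpec(spec) {
  const idx = spec.indexOf(':');
  // Reject specs with no colon, an empty title, or an empty path.
  if (idx <= 0 || idx === spec.length - 1) {
    throw new Error(`Invalid document spec "${spec}", expected "Title:./path/to/file.md"`);
  }
  return { title: spec.slice(0, idx).trim(), path: spec.slice(idx + 1).trim() };
}

console.log(parseDocSpec('Engineering Guidelines:./docs/engineering.md'));
```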
785
+ ### Embedding Exclusions
786
+
787
+ #### Using exclusion files
788
+
789
+ Create a file containing exclusion patterns (one per line) and reference it with `--exclude-file`:
790
+
791
+ ```
792
+ # Example: exclusion-patterns.txt
793
+ # Exclude test files
794
+ **/*.test.js
795
+ **/*.spec.js
796
+ **/*.test.ts
797
+ **/*.spec.ts
798
+
799
+ # Exclude build outputs
800
+ dist/
801
+ build/
802
+ *.min.js
803
+
804
+ # Exclude dependencies
805
+ node_modules/
806
+ vendor/
807
+ ```
808
+
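The file format above (one pattern per line, `#` for comments, blank lines as separators) could be read with a few lines of JavaScript. This is a sketch under those assumptions; `parseExclusionPatterns` is a hypothetical name, and the CLI's own loader may treat edge cases differently.

```javascript
// Hypothetical helper: turns the text of an exclusion file into a list of glob patterns.
function parseExclusionPatterns(text) {
  return text
    .split(/\r?\n/) // accept both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#')); // drop blanks and comments
}
```

Applied to the example file, this returns the nine patterns from `**/*.test.js` through `vendor/` and none of the comment lines.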
809
+ #### Using command-line exclusions
810
+
811
+ ```bash
812
+ codecritique embeddings:generate \
813
+ --exclude "**/*.test.js" "dist/**" "node_modules/**"
814
+ ```
815
+
816
+ ### Environment Variables
817
+
818
+ ```env
819
+ # Required
820
+ ANTHROPIC_API_KEY=your_anthropic_api_key
821
+
822
+ # Optional for PR history analysis
823
+ GITHUB_TOKEN=your_github_token
824
+
825
+ # Optional debugging
826
+ DEBUG=true
827
+ VERBOSE=true
828
+ ```
829
+
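A script that wraps the CLI may want to check these variables before starting a long run. The helper below is a sketch; `missingEnvVars` is a hypothetical name, and the default list reflects only the variable marked as required above.

```javascript
// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingEnvVars(env, required = ['ANTHROPIC_API_KEY']) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}
```

Calling `missingEnvVars(process.env)` returns an empty array when everything required is set. Pass `['ANTHROPIC_API_KEY', 'GITHUB_TOKEN']` as the second argument when PR history analysis is also needed.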
830
+ ## Output Formats
831
+
832
+ ### Text (Default)
833
+
834
+ Human-readable colored output for terminal usage:
835
+
836
+ ```
837
+ ===== AI Code Review Summary =====
838
+ Files Analyzed: 3
839
+ Files with Issues: 2
840
+ Total Issues Found: 5
841
+
842
+ ===== Review for src/components/Button.tsx =====
843
+ Summary: Component has naming inconsistency and missing prop validation
844
+
845
+ Issues:
846
+ [MAJOR] (Lines: 5) Component name 'ButtonComponent' doesn't match filename 'Button'
847
+ Suggestion: Rename component to 'Button' or update file name
848
+
849
+ [MINOR] (Lines: 12-15) Missing prop type validation
850
+ Suggestion: Add PropTypes or TypeScript interface
851
+
852
+ Positives:
853
+ - Good use of semantic HTML elements
854
+ - Proper accessibility attributes
855
+ ```
856
+
857
+ ### JSON
858
+
859
+ Structured output for programmatic processing:
860
+
861
+ ```json
862
+ {
863
+ "summary": {
864
+ "totalFilesReviewed": 3,
865
+ "filesWithIssues": 2,
866
+ "totalIssues": 5,
867
+ "skippedFiles": 0,
868
+ "errorFiles": 0
869
+ },
870
+ "details": [
871
+ {
872
+ "filePath": "src/components/Button.tsx",
873
+ "success": true,
874
+ "language": "typescript",
875
+ "review": {
876
+ "summary": "Component has naming inconsistency and missing prop validation",
877
+ "issues": [
878
+ {
879
+ "severity": "major",
880
+ "description": "Component name 'ButtonComponent' doesn't match filename 'Button'",
881
+ "lineNumbers": [5],
882
+ "suggestion": "Rename component to 'Button' or update file name"
883
+ }
884
+ ],
885
+ "positives": ["Good use of semantic HTML elements", "Proper accessibility attributes"]
886
+ }
887
+ }
888
+ ]
889
+ }
890
+ ```
891
+
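As an example of programmatic processing, the sketch below tallies issues by severity from a parsed report. It relies only on the fields shown in the sample above (`details`, `review.issues`, `severity`). The function name is hypothetical, and severity values other than `major` and `minor` are not assumed.

```javascript
// Hypothetical helper: counts issues per severity in a parsed JSON report.
function countIssuesBySeverity(report) {
  const counts = {};
  for (const file of report.details ?? []) {
    // Files that failed or were skipped may have no review object.
    for (const issue of file.review?.issues ?? []) {
      counts[issue.severity] = (counts[issue.severity] ?? 0) + 1;
    }
  }
  return counts;
}
```

A CI step could, for instance, parse the report with `JSON.parse` and fail the build when `countIssuesBySeverity(report).major` is greater than zero.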
892
+ ### Markdown
893
+
894
+ Documentation-friendly format:
895
+
896
+ ```markdown
897
+ # AI Code Review Results (RAG Approach)
898
+
899
+ ## Summary
900
+
901
+ - **Files Analyzed:** 3
902
+ - **Files with Issues:** 2
903
+ - **Total Issues Found:** 5
904
+
905
+ ## Detailed Review per File
906
+
907
+ ### src/components/Button.tsx
908
+
909
+ **Summary:** Component has naming inconsistency and missing prop validation
910
+
911
+ **Issues Found (2):**
912
+
913
+ - **[MAJOR] 🔥 (Lines: 5)**: Component name 'ButtonComponent' doesn't match filename 'Button'
914
+ - **[MINOR] 💡 (Lines: 12-15)**: Missing prop type validation
915
+
916
+ **Positives Found (2):**
917
+
918
+ - Good use of semantic HTML elements
919
+ - Proper accessibility attributes
920
+ ```
921
+
922
+ ## Error Handling & Troubleshooting
923
+
924
+ ### Common Issues
925
+
926
+ #### API Key Issues
927
+
928
+ **Error**: `ANTHROPIC_API_KEY not found in environment variables`
929
+
930
+ **Solution**:
931
+
932
+ ```bash
933
+ # Set environment variable
934
+ export ANTHROPIC_API_KEY=your_api_key
935
+
936
+ # Or add it to a .env file (>> appends, so existing entries are kept)
937
+ echo "ANTHROPIC_API_KEY=your_api_key" >> .env
938
+ ```
939
+
940
+ #### Git Repository Issues
941
+
942
+ **Error**: `Not a git repository`
943
+
944
+ **Solution**: Ensure you're in a git repository when using `--diff-with`:
945
+
946
+ ```bash
947
+ git init # If needed
948
+ git add .
949
+ git commit -m "Initial commit"
950
+ ```
951
+
952
+ #### File Not Found
953
+
954
+ **Error**: `File not found: path/to/file.js`
955
+
956
+ **Solution**: Check the file path and confirm that the file exists:
957
+
958
+ ```bash
959
+ # Use absolute path
960
+ codecritique analyze --file /full/path/to/file.js
961
+
962
+ # Or verify that a relative path resolves from the current directory
963
+ ls path/to/file.js
964
+ ```
965
+
966
+ #### Embedding Generation Issues
967
+
968
+ **Error**: `Failed to generate embeddings`
969
+
970
+ **Solutions**:
971
+
972
+ ```bash
973
+ # Clear existing embeddings and regenerate
974
+ codecritique embeddings:clear
975
+ codecritique embeddings:generate --verbose
976
+
977
+ # Reduce concurrency for memory issues
978
+ codecritique embeddings:generate --concurrency 5
979
+
980
+ # Exclude problematic files
981
+ codecritique embeddings:generate --exclude "large-files/**"
982
+ ```
983
+
984
+ #### Memory Issues
985
+
986
+ **Error**: `JavaScript heap out of memory`
987
+
988
+ **Solutions**:
989
+
990
+ ```bash
991
+ # Increase Node.js memory limit
992
+ export NODE_OPTIONS="--max-old-space-size=4096"
993
+
994
+ # Process fewer files at once
995
+ codecritique embeddings:generate --concurrency 3
996
+
997
+ # Exclude large files
998
+ codecritique embeddings:generate --exclude "**/*.min.js" "dist/**"
999
+ ```
1000
+
1001
+ ### Debugging
1002
+
1003
+ Enable verbose output for detailed logging:
1004
+
1005
+ ```bash
1006
+ codecritique analyze --file app.py --verbose
1007
+ ```
1008
+
1009
+ Enable debug mode:
1010
+
1011
+ ```bash
1012
+ DEBUG=true codecritique analyze --file app.py
1013
+ ```
1014
+
1015
+ ### Performance Optimization
1016
+
1017
+ 1. **Generate embeddings first** for better context:
1018
+
1019
+ ```bash
1020
+ codecritique embeddings:generate
1021
+ codecritique analyze --files "src/**/*.ts"
1022
+ ```
1023
+
1024
+ 2. **Use exclusion patterns** to skip irrelevant files:
1025
+
1026
+ ```bash
1027
+ codecritique embeddings:generate --exclude "**/*.test.js" "dist/**"
1028
+ ```
1029
+
1030
+ 3. **Adjust concurrency** based on system resources:
1031
+
1032
+ ```bash
1033
+ # For powerful machines
1034
+ codecritique embeddings:generate --concurrency 20
1035
+
1036
+ # For resource-constrained environments
1037
+ codecritique embeddings:generate --concurrency 3
1038
+ ```
1039
+
1040
+ ## Dependencies
1041
+
1042
+ ### Core Dependencies
1043
+
1044
+ - **[@anthropic-ai/sdk](https://www.npmjs.com/package/@anthropic-ai/sdk)** `0.55.0` - Anthropic Claude API integration
1045
+ - **[@lancedb/lancedb](https://www.npmjs.com/package/@lancedb/lancedb)** `0.19.0` - Vector database for embeddings
1046
+ - **[fastembed](https://www.npmjs.com/package/fastembed)** `^1.14.4` - Local embedding generation
1047
+ - **[commander](https://www.npmjs.com/package/commander)** `^11.0.0` - CLI framework
1048
+ - **[chalk](https://www.npmjs.com/package/chalk)** `^5.3.0` - Terminal colors
1049
+ - **[glob](https://www.npmjs.com/package/glob)** `^10.3.0` - File pattern matching
1050
+
1051
+ ### Optional Dependencies
1052
+
1053
+ - **[@octokit/rest](https://www.npmjs.com/package/@octokit/rest)** `21.1.1` - GitHub API (for PR history analysis)
1054
+ - **[dotenv](https://www.npmjs.com/package/dotenv)** `16.5.0` - Environment variable loading
1055
+
1056
+ ### Development Dependencies
1057
+
1058
+ - **[eslint](https://www.npmjs.com/package/eslint)** `9.29.0` - Code linting
1059
+ - **[prettier](https://www.npmjs.com/package/prettier)** `3.5.3` - Code formatting
1060
+ - **[typescript](https://www.npmjs.com/package/typescript)** `5.8.3` - TypeScript support
1061
+
1062
+ ## Contributing
1063
+
1064
+ We welcome contributions! Please follow these guidelines:
1065
+
1066
+ ### Development Setup
1067
+
1068
+ 1. **Clone the repository**:
1069
+
1070
+ ```bash
1071
+ git clone https://github.com/cosmocoder/CodeCritique.git
1072
+ cd CodeCritique
1073
+ ```
1074
+
1075
+ 2. **Install dependencies**:
1076
+
1077
+ ```bash
1078
+ npm install
1079
+ ```
1080
+
1081
+ 3. **Set up environment**:
1082
+
1083
+ ```bash
1084
+ cp .env.example .env
1085
+ # Add your API keys to .env
1086
+ ```
1087
+
1088
+ 4. **Run locally**:
1089
+ ```bash
1090
+ node src/index.js analyze --file test-file.js
1091
+ ```
1092
+
1093
+ ### Code Standards
1094
+
1095
+ - **ESLint**: Follow the configured ESLint rules
1096
+ - **Prettier**: Code is automatically formatted
1097
+ - **TypeScript**: Type definitions for better code quality
1098
+
1099
+ ### Testing
1100
+
1101
+ ```bash
1102
+ # Run linting
1103
+ npm run lint
1104
+
1105
+ # Run formatting
1106
+ npm run prettier
1107
+
1108
+ # Run tests
1109
+ npm run test
1110
+
1111
+ # Check for unused dependencies
1112
+ npm run knip
1113
+ ```
1114
+
1115
+ ### Submitting Changes
1116
+
1117
+ 1. **Fork the repository**
1118
+ 2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
1119
+ 3. **Make your changes**
1120
+ 4. **Run lint and format checks**: `npm run lint && npm run prettier:ci`
1121
+ 5. **Commit your changes**: `git commit -m 'Add amazing feature'`
1122
+ 6. **Push to the branch**: `git push origin feature/amazing-feature`
1123
+ 7. **Open a Pull Request**
1124
+
1125
+ ### Areas for Contribution
1126
+
1127
+ - **Language Support**: Add specialized rules for new programming languages
1128
+ - **LLM Providers**: Integrate additional LLM providers (OpenAI, etc.)
1129
+ - **Output Formats**: Add new output formats (XML, SARIF, etc.)
1130
+ - **Performance**: Optimize embedding generation and search
1131
+ - **Documentation**: Improve documentation and examples
1132
+ - **Testing**: Add comprehensive test coverage
1133
+
1134
+ ### Reporting Issues
1135
+
1136
+ Please use GitHub Issues to report bugs or request features. Include:
1137
+
1138
+ - **System information** (OS, Node.js version)
1139
+ - **Command used** and **full error message**
1140
+ - **Expected vs actual behavior**
1141
+ - **Minimal reproduction case**
1142
+
1143
+ ## License
1144
+
1145
+ MIT License - see [LICENSE](LICENSE) file for details.