ai-codeindex 0.7.0__py3-none-any.whl

@@ -0,0 +1,966 @@
1
+ Metadata-Version: 2.4
2
+ Name: ai-codeindex
3
+ Version: 0.7.0
4
+ Summary: AI-native code indexing tool for large codebases
5
+ Project-URL: Homepage, https://github.com/yourusername/codeindex
6
+ Project-URL: Documentation, https://github.com/yourusername/codeindex
7
+ Project-URL: Repository, https://github.com/yourusername/codeindex
8
+ Project-URL: Changelog, https://github.com/yourusername/codeindex/blob/master/CHANGELOG.md
9
+ Author-email: codeindex contributors <noreply@github.com>
10
+ Maintainer-email: codeindex team <noreply@github.com>
11
+ License: MIT
12
+ License-File: LICENSE
13
+ Keywords: ai,code,code-analysis,documentation,index,llm,tree-sitter
14
+ Classifier: Development Status :: 3 - Alpha
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Requires-Python: >=3.10
22
+ Requires-Dist: click>=8.0
23
+ Requires-Dist: pyyaml>=6.0
24
+ Requires-Dist: rich>=13.0
25
+ Requires-Dist: tree-sitter-php>=0.23
26
+ Requires-Dist: tree-sitter-python>=0.21
27
+ Requires-Dist: tree-sitter>=0.21
28
+ Provides-Extra: dev
29
+ Requires-Dist: pytest-bdd>=7.0; extra == 'dev'
30
+ Requires-Dist: pytest-cov>=4.0; extra == 'dev'
31
+ Requires-Dist: pytest>=7.0; extra == 'dev'
32
+ Requires-Dist: ruff>=0.1; extra == 'dev'
33
+ Description-Content-Type: text/markdown
34
+
35
+ # codeindex
36
+
37
+ [![PyPI version](https://badge.fury.io/py/ai-codeindex.svg)](https://badge.fury.io/py/ai-codeindex)
38
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
39
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
40
+ [![Tests](https://github.com/yourusername/codeindex/workflows/Tests/badge.svg)](https://github.com/yourusername/codeindex/actions)
41
+
42
+ **AI-native code indexing tool for large codebases.**
43
+
44
+ codeindex automatically generates intelligent documentation (`README_AI.md`) for your directories using tree-sitter parsing and external AI CLIs. Perfect for understanding large codebases, onboarding new developers, and maintaining living documentation.
45
+
46
+ ---
47
+
48
+ ## ✨ Features
49
+
50
+ - 🚀 **AI-Powered Documentation**: Generate comprehensive README files using Claude, GPT, or any AI CLI
51
+ - 🌳 **Tree-sitter Parsing**: Accurate symbol extraction (classes, functions, methods, imports) for Python & PHP
52
+ - ⚡ **Parallel Scanning**: Scan multiple directories concurrently for fast indexing
53
+ - 🎯 **Smart Filtering**: Include/exclude patterns with glob support
54
+ - 🔧 **Flexible Integration**: Works with any AI CLI tool via configurable commands
55
+ - 📊 **Coverage Tracking**: Check which directories have been indexed
56
+ - 🎨 **Fallback Mode**: Generate basic documentation without AI
57
+ - 🎯 **KISS Universal Description** (v0.4.0+): Language-agnostic, zero-assumption module descriptions
58
+ - 🏗️ **Modular Architecture** (v0.3.1+): Clean, maintainable 6-module CLI design
59
+ - 🔄 **Adaptive Symbols** (v0.2.0+): Dynamic symbol extraction (5-150 per file based on size)
60
+ - 📈 **Technical Debt Analysis** (v0.3.0+): Detect code quality issues and complexity metrics
61
+ - 🔍 **Symbol Indexing** (v0.1.2+): Global symbol search and project-wide navigation
62
+ - 🛣️ **Framework Route Extraction** (v0.5.0+): Auto-detect and extract routes from web frameworks
63
+ - **ThinkPHP**: Convention-based routing with line numbers and PHPDoc descriptions
64
+ - **Laravel**: (Coming soon) Explicit route definitions
65
+ - **FastAPI**: (Coming soon) Decorator-based routes
66
+ - **Django**: (Coming soon) URL patterns
67
+ - 📝 **AI Docstring Extraction** (v0.4.0+, Epic 9): Multi-language documentation normalization
68
+ - **Hybrid mode**: Selective AI processing (<$1 per 250 directories)
69
+ - **All-AI mode**: Maximum quality for critical projects
70
+ - **Language support**: PHP (PHPDoc + inline comments), Python (coming soon)
71
+ - **Mixed language**: Normalize Chinese + English comments to clean English
72
+
73
+ ---
74
+
75
+ ## 📦 Installation
76
+
77
+ ### Using pipx (Recommended)
78
+
79
+ ```bash
80
+ pipx install ai-codeindex
81
+ ```
82
+
83
+ ### Using pip
84
+
85
+ ```bash
86
+ pip install ai-codeindex
87
+ ```
88
+
89
+ ### From Source
90
+
91
+ ```bash
92
+ git clone https://github.com/yourusername/codeindex.git
93
+ cd codeindex
94
+ pip install -e .
95
+ ```
96
+
97
+ ---
98
+
99
+ ## 🚀 Quick Start
100
+
101
+ ### 1. Initialize Configuration
102
+
103
+ ```bash
104
+ cd /your/project
105
+ codeindex init
106
+ ```
107
+
108
+ This creates `.codeindex.yaml` in your project.
109
+
110
+ ### 2. Configure AI CLI
111
+
112
+ Edit `.codeindex.yaml`:
113
+
114
+ ```yaml
115
+ # AI CLI command to use for generating documentation
116
+ ai_command: 'claude -p "{prompt}" --allowedTools "Read"'
117
+
118
+ # List of patterns to include for scanning
119
+ include:
120
+ - src/
121
+
122
+ # List of patterns to exclude from scanning
123
+ exclude:
124
+ - "**/test/**"
125
+ - "**/__pycache__/**"
126
+
127
+ # Supported languages
128
+ languages:
129
+ - python
130
+ - php
131
+
132
+ # Output filename
133
+ output_file: "README_AI.md"
134
+ ```
135
+
136
+ **Other AI CLI examples:**
137
+ ```yaml
138
+ # OpenAI
139
+ ai_command: 'openai chat "{prompt}" --model gpt-4'
140
+
141
+ # Gemini
142
+ ai_command: 'gemini "{prompt}"'
143
+
144
+ # Custom script
145
+ ai_command: '/path/to/my-ai-wrapper.sh "{prompt}"'
146
+ ```
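How codeindex expands the `{prompt}` placeholder internally is not shown in this README. As a rough sketch only, a wrapper script could split the command template into arguments first and substitute the prompt afterwards, so that quotes or spaces inside the prompt cannot change the argument structure. `build_argv` is a hypothetical helper, not part of codeindex.

```python
import shlex


def build_argv(ai_command: str, prompt: str) -> list[str]:
    """Turn an ai_command template into an argv list (hypothetical helper)."""
    # Split the template before substituting, so the prompt text is never
    # re-parsed as shell syntax.
    tokens = shlex.split(ai_command)
    return [token.replace("{prompt}", prompt) for token in tokens]


argv = build_argv('claude -p "{prompt}" --allowedTools "Read"', 'Summarize the "auth" module')
print(argv)
```

A list built this way could be passed to `subprocess.run(argv)` without `shell=True`.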
147
+
148
+ ### 3. Scan a Directory
149
+
150
+ ```bash
151
+ # Scan single directory
152
+ codeindex scan ./src/auth
153
+
154
+ # Preview prompt without executing
155
+ codeindex scan ./src/auth --dry-run
156
+
157
+ # Generate without AI (fallback mode)
158
+ codeindex scan ./src/auth --fallback
159
+ ```
160
+
161
+ **💡 Pro Tip**: When scanning web framework directories (like `Application/Admin/Controller` for ThinkPHP), codeindex automatically:
162
+ - ✅ Detects the framework
163
+ - ✅ Extracts routes with line numbers
164
+ - ✅ Includes method descriptions from PHPDoc/docstrings
165
+ - ✅ Generates route tables in README_AI.md
166
+
167
+ ### 4. Batch Processing
168
+
169
+ ```bash
170
+ # Scan all directories (generates SmartWriter READMEs)
171
+ codeindex scan-all
172
+
173
+ # Traditional batch processing (for AI-enhanced docs); the two commands below are alternatives, run only one
174
+ codeindex list-dirs | xargs -P 4 -I {} codeindex scan {}
175
+ codeindex list-dirs | parallel -j 4 codeindex scan {}
176
+ ```
177
+
178
+ **Example output:**
179
+ ```
180
+ 📝 Generating READMEs (SmartWriter)...
181
+ ✓ Application ( 50KB)
182
+ ✓ Admin ( 20KB)
183
+ ✓ api ( 15KB)
184
+ → Completed: 3/3 directories
185
+ ```
186
+
187
+ ### 5. Generate Structured Data (JSON)
188
+
189
+ **NEW in v0.5.0**: For tool integration (e.g., LoomGraph, custom scripts, CI/CD pipelines), generate machine-readable JSON output.
190
+
191
+ ```bash
192
+ # Single directory
193
+ codeindex scan ./src --output json
194
+
195
+ # Entire project
196
+ codeindex scan-all --output json > parse_results.json
197
+
198
+ # View formatted JSON
199
+ codeindex scan ./src --output json | jq .
200
+ ```
201
+
202
+ **JSON Output Structure**:
203
+
204
+ ```json
205
+ {
206
+ "success": true,
207
+ "results": [
208
+ {
209
+ "file": "src/parser.py",
210
+ "symbols": [
211
+ {
212
+ "name": "Parser",
213
+ "kind": "class",
214
+ "signature": "class Parser:",
215
+ "line_start": 15,
216
+ "line_end": 120
217
+ }
218
+ ],
219
+ "imports": [
220
+ {"module": "pathlib", "names": ["Path"], "is_from": true}
221
+ ],
222
+ "error": null
223
+ }
224
+ ],
225
+ "summary": {
226
+ "total_files": 1,
227
+ "total_symbols": 1,
228
+ "total_imports": 1,
229
+ "errors": 0
230
+ }
231
+ }
232
+ ```
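A consumer of this output might look like the sketch below. It assumes only the field names shown in the structure above; `list_symbols` is an illustrative name, not a codeindex API.

```python
import json


def list_symbols(payload: dict) -> list[str]:
    """Return 'file:line kind name' strings for every symbol in a scan result."""
    lines = []
    for result in payload.get("results", []):
        for symbol in result.get("symbols", []):
            lines.append(
                f'{result["file"]}:{symbol["line_start"]} {symbol["kind"]} {symbol["name"]}'
            )
    return lines


# Sample payload copied from the structure documented above.
sample = json.loads("""
{
  "success": true,
  "results": [
    {
      "file": "src/parser.py",
      "symbols": [
        {"name": "Parser", "kind": "class", "signature": "class Parser:",
         "line_start": 15, "line_end": 120}
      ],
      "imports": [{"module": "pathlib", "names": ["Path"], "is_from": true}],
      "error": null
    }
  ],
  "summary": {"total_files": 1, "total_symbols": 1, "total_imports": 1, "errors": 0}
}
""")

for line in list_symbols(sample):
    print(line)  # src/parser.py:15 class Parser
```

In a real pipeline the payload would come from `parse_results.json` (or from stdin) rather than an inline string.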
233
+
234
+ **Error Handling**:
235
+
236
+ When errors occur, the JSON response includes structured error information:
237
+
238
+ ```json
239
+ {
240
+ "success": false,
241
+ "error": {
242
+ "code": "DIRECTORY_NOT_FOUND",
243
+ "message": "Directory does not exist: /path/to/dir",
244
+ "detail": null
245
+ },
246
+ "results": [],
247
+ "summary": {
248
+ "total_files": 0,
249
+ "errors": 1
250
+ }
251
+ }
252
+ ```
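A script consuming the JSON may want to fail fast on such a response. The following is a minimal sketch assuming the `success` and `error` fields shown above; `ScanError` and `require_success` are hypothetical names introduced for illustration.

```python
class ScanError(RuntimeError):
    """Raised when a scan payload reports success == false (hypothetical)."""


def require_success(payload: dict) -> dict:
    """Return the payload unchanged, or raise ScanError with the reported code."""
    if payload.get("success"):
        return payload
    error = payload.get("error") or {}
    code = error.get("code", "UNKNOWN")
    message = error.get("message", "no message provided")
    raise ScanError(f"{code}: {message}")


# Error payload copied from the example above.
failed = {
    "success": False,
    "error": {
        "code": "DIRECTORY_NOT_FOUND",
        "message": "Directory does not exist: /path/to/dir",
        "detail": None,
    },
    "results": [],
    "summary": {"total_files": 0, "errors": 1},
}

try:
    require_success(failed)
except ScanError as exc:
    print(exc)  # DIRECTORY_NOT_FOUND: Directory does not exist: /path/to/dir
```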
253
+
254
+ **Use Cases**:
255
+ - 🔌 **Tool Integration**: Feed parse results to visualization tools like LoomGraph
256
+ - 🤖 **CI/CD Pipelines**: Validate code structure in automated workflows
257
+ - 📊 **Analytics**: Analyze codebase metrics across versions
258
+ - 🧪 **Testing**: Verify expected code structure in tests
259
+
260
+ ### 6. Check Status
261
+
262
+ ```bash
263
+ codeindex status
264
+ ```
265
+
266
+ **Output:**
267
+ ```
268
+ Indexing Status
269
+ ───────────────────────────────────────
270
+ ✅ src/auth/
271
+ ✅ src/utils/
272
+ ⚠️ src/api/ (no README_AI.md)
273
+ ✅ src/db/
274
+
275
+ Indexed: 3/4 (75%)
276
+ ```
277
+
278
+ ### 7. Generate Symbol Indexes (v0.1.2+)
279
+
280
+ **Global symbol index** - Find any class/function across your codebase:
281
+
282
+ ```bash
283
+ # Generate PROJECT_SYMBOLS.md (global symbol index)
284
+ codeindex symbols
285
+
286
+ # Generate PROJECT_INDEX.md (module overview)
287
+ codeindex index
288
+
289
+ # Analyze git changes and affected directories
290
+ codeindex affected --since HEAD~5 --until HEAD
291
+ codeindex affected --json # For scripting/CI
292
+ ```
293
+
294
+ **What you get:**
295
+
296
+ **PROJECT_SYMBOLS.md** provides:
297
+ - Quick class/function lookup across all files
298
+ - Cross-file references and imports
299
+ - Symbol locations with line numbers
300
+ - Grouped by directory
301
+
302
+ **PROJECT_INDEX.md** provides:
303
+ - Module overview with descriptions
304
+ - Directory structure
305
+ - Entry points and CLI commands
306
+ - Generated from README_AI.md files
307
+
308
+ **Affected analysis** helps with incremental updates:
309
+ - Shows which directories changed in git commits
310
+ - Suggests which README_AI.md files need regeneration
311
+ - JSON output for CI/CD integration
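The idea behind the affected analysis can be illustrated with a short sketch: map each changed file to its parent directory and deduplicate. This is an assumption about the general approach, not the actual implementation of `codeindex affected`; `affected_dirs` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def affected_dirs(changed_files: list[str]) -> list[str]:
    """Map changed file paths to the sorted set of directories containing them."""
    dirs = {str(PurePosixPath(path).parent) for path in changed_files}
    return sorted(dirs)


# In practice the list might come from `git diff --name-only HEAD~5 HEAD`.
changed = ["src/auth/login.py", "src/auth/token.py", "src/db/models.py"]
print(affected_dirs(changed))  # ['src/auth', 'src/db']
```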
312
+
313
+ ### 8. Analyze Technical Debt (v0.3.0+)
314
+
315
+ **NEW in v0.3.0**: Detect code quality issues and technical debt patterns.
316
+
317
+ ```bash
318
+ # Analyze directory for technical debt
319
+ codeindex tech-debt ./src
320
+
321
+ # Output formats
322
+ codeindex tech-debt ./src --format console # Human-readable (default)
323
+ codeindex tech-debt ./src --format markdown # Documentation
324
+ codeindex tech-debt ./src --format json # API/scripting
325
+
326
+ # Save to file
327
+ codeindex tech-debt ./src --output debt_report.md
328
+
329
+ # Recursive analysis
330
+ codeindex tech-debt ./src --recursive
331
+
332
+ # Quiet mode (minimal output)
333
+ codeindex tech-debt ./src --quiet
334
+ ```
335
+
336
+ **What it detects:**
337
+ - 🔴 **Super large files** (>5000 lines) - CRITICAL
338
+ - 🟡 **Large files** (>2000 lines) - HIGH
339
+ - 🔴 **God Classes** (>50 methods) - CRITICAL
340
+ - 🟡 **Symbol overload** (>100 symbols) - CRITICAL
341
+ - 🟠 **High noise ratio** (>50% low-quality symbols) - HIGH
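The thresholds above can be read as a simple rule table. The sketch below encodes them directly; it assumes the two file-size rules are mutually exclusive, and every issue name except `super_large_file` (which appears in the example report) is invented for illustration.

```python
def classify_file(
    lines: int, symbols: int, max_class_methods: int, noise_ratio: float
) -> list[tuple[str, str]]:
    """Return (severity, issue) pairs using the thresholds listed above."""
    issues = []
    # File size: only the more severe of the two rules is reported (assumption).
    if lines > 5000:
        issues.append(("CRITICAL", "super_large_file"))
    elif lines > 2000:
        issues.append(("HIGH", "large_file"))  # hypothetical issue name
    if max_class_methods > 50:
        issues.append(("CRITICAL", "god_class"))  # hypothetical issue name
    if symbols > 100:
        issues.append(("CRITICAL", "symbol_overload"))  # hypothetical issue name
    if noise_ratio > 0.5:
        issues.append(("HIGH", "high_noise_ratio"))  # hypothetical issue name
    return issues


print(classify_file(lines=6000, symbols=40, max_class_methods=10, noise_ratio=0.1))
# [('CRITICAL', 'super_large_file')]
```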
342
+
343
+ **Example output:**
344
+ ```
345
+ ══════════════════════════════════════
346
+ Technical Debt Report
347
+ ══════════════════════════════════════
348
+
349
+ Summary
350
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
351
+ Files analyzed: 15
352
+ Issues found: 3
353
+ Quality Score: 78.3/100
354
+
355
+ Severity Breakdown
356
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
357
+ CRITICAL: 1
358
+ HIGH: 2
359
+ MEDIUM: 0
360
+ LOW: 0
361
+
362
+ File Details
363
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
364
+
365
+ 📄 src/models/user.py (Quality: 70.0)
366
+ 🔴 CRITICAL - super_large_file
367
+ File has 6000 lines (threshold: 5000)
368
+ → Split into 3-5 smaller files
369
+ ```
370
+
371
+ ### 9. Framework Route Extraction (v0.5.0+)
372
+
373
+ **NEW in v0.5.0**: Automatically detect and extract routes from web frameworks with line numbers and descriptions.
374
+
375
+ When scanning Controller/View directories, codeindex identifies the web framework and extracts route information. Routes are rendered as markdown tables in your `README_AI.md` files.
376
+
377
+ #### Supported Frameworks
378
+
379
+ | Framework | Language | Status | Features |
380
+ |-----------|----------|--------|----------|
381
+ | **ThinkPHP** | PHP | ✅ Stable | Line numbers, PHPDoc descriptions, module-based routing |
382
+ | **Laravel** | PHP | 🔄 Coming v0.6.0 | Named routes, route groups, middleware |
383
+ | **FastAPI** | Python | 🔄 Coming v0.6.0 | Path operations, dependencies, tags |
384
+ | **Django** | Python | 🔄 Coming v0.6.0 | URL patterns, namespaces, view classes |
385
+
386
+ #### Example Output
387
+
388
+ **ThinkPHP Controller** (`Application/Admin/Controller/UserController.php`):
389
+
390
+ ```php
391
+ class UserController {
392
+ /**
393
+ * Get user list with pagination
394
+ */
395
+ public function index() {
396
+ // ...
397
+ }
398
+
399
+ /**
400
+ * 创建新用户
401
+ */
402
+ public function create() {
403
+ // ...
404
+ }
405
+ }
406
+ ```
407
+
408
+ **Generated Route Table** in `README_AI.md`:
409
+
410
+ ```markdown
411
+ ## Routes (ThinkPHP)
412
+
413
+ | URL | Controller | Action | Location | Description |
414
+ |-----|------------|--------|----------|-------------|
415
+ | `/admin/user/index` | UserController | index | `UserController.php:12` | Get user list with pagination |
416
+ | `/admin/user/create` | UserController | create | `UserController.php:20` | 创建新用户 |
417
+ ```
418
+
419
+ #### How It Works
420
+
421
+ 1. **Auto-Detection**: Scans directory structure to detect web frameworks
422
+ 2. **Symbol Extraction**: Parses controllers/views using tree-sitter
423
+ 3. **Route Inference**: Applies framework-specific routing conventions
424
+ 4. **Documentation Extraction**: Extracts docstrings/PHPDoc comments
425
+ 5. **Table Generation**: Formats as markdown table in README_AI.md
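The route inference step (step 3) can be sketched for the ThinkPHP example above, where `Application/Admin/Controller/UserController.php` and the method `index` yield `/admin/user/index`. The function below assumes the `<app>/<Module>/Controller/<Name>Controller.php` layout and is a hypothetical illustration, not the extractor shipped with codeindex.

```python
from pathlib import PurePosixPath


def infer_route(controller_path: str, action: str) -> str:
    """Infer a ThinkPHP-style URL of the form /<module>/<controller>/<action>."""
    path = PurePosixPath(controller_path)
    parts = path.parts
    # Assumed layout: <app>/<Module>/Controller/<Name>Controller.php
    module = parts[parts.index("Controller") - 1]
    controller = path.stem.removesuffix("Controller")  # "UserController" -> "User"
    return f"/{module.lower()}/{controller.lower()}/{action}"


print(infer_route("Application/Admin/Controller/UserController.php", "index"))
# /admin/user/index
```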
426
+
427
+ **Features:**
428
+ - ✅ **Line Numbers**: Clickable `file:line` locations
429
+ - ✅ **Descriptions**: From PHPDoc/docstrings (auto-truncated to 60 chars)
430
+ - ✅ **Multi-language**: Supports Chinese and English descriptions
431
+ - ✅ **Smart Filtering**: Only public methods, excludes magic methods
432
+ - ✅ **Zero Configuration**: Just scan, routes auto-appear
433
+
434
+ #### Usage
435
+
436
+ ```bash
437
+ # Routes are automatically extracted when scanning
438
+ codeindex scan-all
439
+
440
+ # Or scan specific controller directory
441
+ codeindex scan ./Application/Admin/Controller
442
+ ```
443
+
444
+ No configuration needed! Routes are detected and extracted automatically.
445
+
446
+ #### For Developers
447
+
448
+ Want to add support for your favorite framework? See [CLAUDE.md](CLAUDE.md#framework-route-extraction) for the complete developer guide on creating custom route extractors.
449
+
450
+ ---
451
+
452
+ ## 🎯 What's New in v0.4.0
453
+
454
+ ### KISS Universal Description Generator
455
+
456
+ **Story 4.4.5** introduces a new approach to module descriptions: each description is built from the symbol names found in the directory, with no assumptions about language or business domain.
457
+
458
+ **Before (v0.3.x):**
459
+ ```markdown
460
+ | Admin/Controller | 后台管理模块:系统管理和配置功能 | ← Generic, unhelpful
461
+ | Agent/Controller | 用户管理相关的控制器目录 | ← Can't differentiate
462
+ | Retail/Marketing | Module directory | ← No information
463
+ ```
464
+
465
+ **After (v0.4.0):**
466
+ ```markdown
467
+ | Admin/Controller | Admin/Controller: 36 modules (AdminJurUsers, Permission, SystemConfig, ...) |
468
+ | Agent/Controller | Agent/Controller: 13 modules (Agent, Commission, Withdrawal, ...) |
469
+ | Retail/Marketing | Retail/Marketing: 3 modules (BigWheel, Coupon, Lottery, ...) |
470
+ ```
471
+
472
+ **Benefits:**
473
+ - ✅ **Universal**: Works for all languages (Python, PHP, Java, Go, TypeScript, Rust...)
474
+ - ✅ **Specific**: Lists actual module/class names instead of generic descriptions
475
+ - ✅ **Differentiated**: Each directory description is unique
476
+ - ✅ **Traceable**: Original symbol names preserved, easy to search
477
+ - ✅ **Zero maintenance**: No hardcoded business domain keywords to maintain
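The description format shown in the "After" example can be reproduced with a few lines. This is a sketch under the assumption that a fixed number of names is listed followed by `...`; the real generator may choose how many names to show differently, and `describe_directory` is a hypothetical name.

```python
def describe_directory(path: str, module_names: list[str], max_names: int = 3) -> str:
    """Build a 'path: N modules (A, B, C, ...)' description from symbol names."""
    if not module_names:
        return f"{path}: 0 modules"
    shown = ", ".join(module_names[:max_names])
    return f"{path}: {len(module_names)} modules ({shown}, ...)"


print(describe_directory("Retail/Marketing", ["BigWheel", "Coupon", "Lottery"]))
# Retail/Marketing: 3 modules (BigWheel, Coupon, Lottery, ...)
```

Because the text is derived only from names already present in the code, no keyword list per business domain is needed.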
478
+
479
+ **Validation Results:**
480
+ - PHP Project (ThinkPHP 5.0): ⭐⭐⭐⭐⭐
481
+ - Python Project (codeindex itself): ⭐⭐⭐⭐⭐
482
+ - Code reduction: -78 lines (-17%)
483
+ - Test coverage: 299 passed, 1 skipped
484
+
485
+ **Example: PROJECT_INDEX.md**
486
+ ```markdown
487
+ | Path | Purpose |
488
+ |------|---------|
489
+ | `src/codeindex/` | src/codeindex: 28 modules (adaptive_config, ai_helper, parser, scanner, ...) |
490
+ | `tests/` | codeindex/tests: 24 modules (test_adaptive_config, test_parser, ...) |
491
+ ```
492
+
493
+ See [docs/evaluation/story-4.4.5-kiss-validation.md](docs/evaluation/story-4.4.5-kiss-validation.md) for detailed validation report.
494
+
495
+ ---
496
+
497
+ ## 🏗️ Architecture Improvements (v0.3.1)
498
+
499
+ codeindex v0.3.1 features a **completely refactored CLI architecture** with specialized modules following the Single Responsibility Principle:
500
+
501
+ ### Modular CLI Design
502
+
503
+ ```
504
+ src/codeindex/
505
+ ├── cli.py (36 lines, -97%) # Main entry point
506
+ ├── cli_common.py (10 lines) # Shared utilities
507
+ ├── cli_scan.py (587 lines) # Scanning operations
508
+ ├── cli_config.py (97 lines) # Configuration management
509
+ ├── cli_symbols.py (226 lines) # Symbol indexing
510
+ └── cli_tech_debt.py (238 lines) # Technical debt analysis
511
+ ```
512
+
513
+ ### Benefits
514
+
515
+ - ✅ **Easier to maintain**: Each module has a single, clear responsibility
516
+ - ✅ **Better code organization**: 1062 lines → 36 lines in main CLI entry point
517
+ - ✅ **100% backward compatible**: All commands and options preserved
518
+ - ✅ **Zero breaking changes**: All 263 tests passing
519
+ - ✅ **Extensible**: Easy to add new commands without bloating main file
520
+
521
+ ---
522
+
523
+ ## 📖 Documentation
524
+
525
+ ### User Guides
526
+ - **[Getting Started](docs/guides/getting-started.md)** - Detailed installation and setup
527
+ - **[Configuration Guide](docs/guides/configuration.md)** - All config options explained
528
+ - **[Configuration Changelog](docs/guides/configuration-changelog.md)** - Version-by-version config changes
529
+ - **[Advanced Usage](docs/guides/advanced-usage.md)** - Parallel scanning, custom prompts
530
+ - **[Git Hooks Integration](docs/guides/git-hooks-integration.md)** - Automated code quality checks
531
+
532
+ ### Developer Guides
533
+ - **[Contributing](docs/guides/contributing.md)** - Development setup and guidelines
534
+ - **[Requirements Workflow](docs/development/requirements-workflow.md)** - Planning, issues, and development process
535
+ - **[CLAUDE.md](CLAUDE.md)** - Quick reference for AI Code and developers
536
+ - **[Architecture](docs/architecture/)** - Design decisions and ADRs
537
+
538
+ ### Planning
539
+ - **[Strategic Roadmap](docs/planning/ROADMAP.md)** - Long-term vision and priorities
540
+ - **[Changelog](CHANGELOG.md)** - Version history and breaking changes
541
+
542
+ ---
543
+
544
+ ## ⚙️ Configuration Reference
545
+
546
+ ### Complete `.codeindex.yaml`
547
+
548
+ ```yaml
549
+ codeindex: 1
550
+
551
+ # AI CLI command (required)
552
+ ai_command: 'claude -p "{prompt}" --allowedTools "Read"'
553
+
554
+ # Directory patterns
555
+ include:
556
+ - src/ # Include all subdirectories recursively
557
+ - modules/
558
+
559
+ exclude:
560
+ - "**/test/**"
561
+ - "**/__pycache__/**"
562
+ - "**/node_modules/**"
563
+
564
+ # Language support
565
+ languages:
566
+ - python
567
+ - php
568
+
569
+ # Output settings
570
+ output_file: "README_AI.md"
571
+ parallel_workers: 8
572
+ batch_size: 50
573
+
574
+ # Smart indexing (generates tiered documentation)
575
+ indexing:
576
+ max_readme_size: 51200
577
+ root_level: "overview"
578
+ module_level: "navigation"
579
+ leaf_level: "detailed"
580
+
581
+ # Adaptive symbol extraction (v0.2.0+)
582
+ symbols:
583
+ adaptive_symbols:
584
+ enabled: true # Enable dynamic symbol limits based on file size
585
+ min_symbols: 5 # Minimum symbols for tiny files
586
+ max_symbols: 150 # Maximum symbols for huge files
587
+ thresholds: # File size thresholds (lines)
588
+ tiny: 100 # <100 lines → 5 symbols
589
+ small: 500 # 100-500 lines → 15 symbols
590
+ medium: 1500 # 500-1500 lines → 30 symbols
591
+ large: 3000 # 1500-3000 lines → 50 symbols
592
+ xlarge: 5000 # 3000-5000 lines → 80 symbols
593
+ huge: 8000 # 5000-8000 lines → 120 symbols
594
+ mega: null # >8000 lines → 150 symbols
595
+ limits: # Symbol limits per category
596
+ tiny: 5
597
+ small: 15
598
+ medium: 30
599
+ large: 50
600
+ xlarge: 80
601
+ huge: 120
602
+ mega: 150
603
+
604
+ # Incremental updates
605
+ incremental:
606
+ enabled: true
607
+ thresholds:
608
+ skip_lines: 5
609
+ current_only: 50
610
+ suggest_full: 200
611
+
612
+ # Git Hooks configuration (v0.7.0+, Story 6)
613
+ hooks:
614
+ post_commit:
615
+ mode: auto # auto | disabled | async | sync | prompt
616
+ max_dirs_sync: 2 # Auto mode: ≤2 dirs = sync, >2 = async
617
+ enabled: true # Master switch
618
+ log_file: ~/.codeindex/hooks/post-commit.log
619
+ ```
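The adaptive symbol settings above amount to a lookup from file length to symbol limit. The sketch below copies the thresholds and limits from the config and assumes each threshold is an exclusive upper bound, as the inline comments suggest; `symbol_limit` is an illustrative name.

```python
# Thresholds (exclusive upper bound, in lines) and limits copied from the config above.
THRESHOLDS = [
    ("tiny", 100),
    ("small", 500),
    ("medium", 1500),
    ("large", 3000),
    ("xlarge", 5000),
    ("huge", 8000),
]
LIMITS = {
    "tiny": 5, "small": 15, "medium": 30, "large": 50,
    "xlarge": 80, "huge": 120, "mega": 150,
}


def symbol_limit(line_count: int) -> int:
    """Return the symbol limit for a file with the given number of lines."""
    for category, upper_bound in THRESHOLDS:
        if line_count < upper_bound:
            return LIMITS[category]
    return LIMITS["mega"]  # "mega" has no upper bound (threshold is null)


print(symbol_limit(50), symbol_limit(2000), symbol_limit(9000))  # 5 50 150
```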
620
+
621
+ **Hooks Modes**:
622
+ - `auto` (default): Chooses per commit based on how many directories are affected: sync for up to `max_dirs_sync` directories, async for more
623
+ - `disabled`: Completely disabled
624
+ - `async`: Always non-blocking (background updates)
625
+ - `sync`: Always blocking (immediate updates)
626
+ - `prompt`: Reminder only, no auto-execution
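How the modes and `max_dirs_sync` interact can be sketched as a small decision function. It relies only on the config comments above (`≤2 dirs = sync, >2 = async`) and treats `enabled: false` as equivalent to `disabled`, which is an assumption; `resolve_hook_mode` is a hypothetical name.

```python
def resolve_hook_mode(
    mode: str, changed_dirs: int, max_dirs_sync: int = 2, enabled: bool = True
) -> str:
    """Resolve the configured post_commit mode to the behavior for one commit."""
    if not enabled:
        return "disabled"  # master switch off (assumed to behave like "disabled")
    if mode == "auto":
        return "sync" if changed_dirs <= max_dirs_sync else "async"
    if mode in {"disabled", "async", "sync", "prompt"}:
        return mode
    raise ValueError(f"unknown hook mode: {mode!r}")


print(resolve_hook_mode("auto", changed_dirs=2))  # sync
print(resolve_hook_mode("auto", changed_dirs=5))  # async
```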
627
+
628
+ See [Git Hooks Integration Guide](docs/guides/git-hooks-integration.md) for detailed configuration.
629
+
630
+ ---
631
+
632
+ ## 🤖 Claude Code Integration
633
+
634
+ codeindex generates `README_AI.md` files that are perfect for [Claude Code](https://claude.ai/code) to understand your project architecture. By adding a `CLAUDE.md` file to your project, you can guide Claude Code to use these indexes effectively.
635
+
636
+ ### Why Use CLAUDE.md?
637
+
638
+ Without guidance, Claude Code might:
639
+ - ❌ Blindly search through all source files (slow and inefficient)
640
+ - ❌ Miss important architectural context
641
+ - ❌ Use Glob/Grep instead of semantic understanding
642
+
643
+ With `CLAUDE.md`, Claude Code will:
644
+ - ✅ Read `README_AI.md` files first (fast and structured)
645
+ - ✅ Understand your project architecture before diving into code
646
+ - ✅ Use Serena MCP tools for precise symbol navigation
647
+
648
+ ### Quick Setup
649
+
650
+ **1. Copy the template to your project:**
651
+
652
+ ```bash
653
+ # After running codeindex scan-all
654
+ cp examples/CLAUDE.md.template CLAUDE.md
655
+ ```
656
+
657
+ **2. Customize the project-specific sections:**
658
+
659
+ Edit the "Project Specific Configuration" section in your `CLAUDE.md` to document your project structure, key components, and development guidelines.
660
+
661
+ **3. Commit and push:**
662
+
663
+ ```bash
664
+ git add CLAUDE.md README_AI.md **/README_AI.md
665
+ git commit -m "docs: add Claude Code integration"
666
+ ```
667
+
668
+ ### What's Included in the Template
669
+
670
+ The template includes guidance for Claude Code to:
671
+
672
+ 1. **Prioritize README_AI.md files** when understanding architecture
673
+ 2. **Use Serena MCP tools** (find_symbol, find_referencing_symbols) for precise navigation
674
+ 3. **Follow a structured workflow**: README → find_symbol → read source → analyze dependencies
675
+ 4. **Avoid inefficient patterns** like Glob/Grep searches
676
+
677
+ ### Example Workflow
678
+
679
+ After setup, when you ask Claude Code about your project:
680
+
681
+ ```
682
+ ❌ Without CLAUDE.md:
683
+ You: "Where is the authentication module?"
684
+ Claude: [Uses Glob to search for "auth*"]
685
+ [Scans 50 files, wastes time]
686
+
687
+ ✅ With CLAUDE.md:
688
+ You: "Where is the authentication module?"
689
+ Claude: [Reads /src/README_AI.md]
690
+ [Reads /src/auth/README_AI.md]
691
+ "The authentication module is in src/auth/authenticator.py:15
692
+ with UserAuthenticator class..."
693
+ ```
694
+
695
+ ### Advanced Integration: MCP Skills
696
+
697
+ codeindex also includes MCP skills for Claude Code:
698
+
699
+ | Skill | Description |
700
+ |-------|-------------|
701
+ | `/mo:arch` | Query code architecture using README_AI.md indexes |
702
+ | `/mo:index` | Generate repository index with codeindex |
703
+
704
+ **Install skills:**
705
+
706
+ ```bash
707
+ # Navigate to codeindex directory
708
+ cd /path/to/codeindex
709
+
710
+ # Run install script
711
+ ./skills/install.sh
712
+ ```
713
+
714
+ ### For Git Hooks Users (v0.5.0+)
715
+
716
+ If you're using **codeindex Git Hooks**, help your AI Code CLI understand how hooks work:
717
+
718
+ **Method 1: Let AI Code read the guide** ⭐️ (Recommended)
719
+
720
+ ```bash
721
+ # In your project directory, run:
722
+ codeindex docs show-ai-guide
723
+ ```
724
+
725
+ Then tell your AI:
726
+ ```
727
+ User: "Read the output above and update my CLAUDE.md with Git Hooks documentation"
728
+ AI Code: [Reads the guide]
729
+ [Understands Git Hooks]
730
+ [Updates your CLAUDE.md/AGENTS.md]
731
+ ✅ Done!
732
+ ```
733
+
734
+ **Method 2: Direct AI integration**
735
+
736
+ ```
737
+ User: "Help my AI CLI understand codeindex Git Hooks"
738
+ AI Code: [User runs: codeindex docs show-ai-guide]
739
+ [AI reads output]
740
+ [Updates CLAUDE.md with Git Hooks section]
741
+ ✅ Done! Future AI sessions will know about hooks.
742
+ ```
743
+
744
+ **What the guide contains:**
745
+ - Complete Git Hooks functionality explanation
746
+ - Pre-commit and post-commit behaviors
747
+ - Ready-to-use section template for your CLAUDE.md
748
+ - Troubleshooting and common scenarios
749
+ - Expected behaviors (auto-commits are normal!)
750
+
751
+ **Why this matters**: Your AI CLI needs to know that post-commit will create auto-commits (normal behavior) and that lint failures will block commits (by design).
752
+
753
+ ### Full Documentation
754
+
755
+ - **User Guide**: [docs/guides/claude-code-integration.md](docs/guides/claude-code-integration.md)
756
+ - **Git Hooks Guide**: [docs/guides/git-hooks-integration.md](docs/guides/git-hooks-integration.md)
757
+ - **AI Integration**: [examples/ai-integration-guide.md](examples/ai-integration-guide.md)
758
+ - **Template File**: [examples/CLAUDE.md.template](examples/CLAUDE.md.template)
759
+ - **Skills Documentation**: [skills/README.md](skills/README.md)
760
+
761
+ ---
762
+
763
+ ## 🎯 Use Cases
764
+
765
+ ### 📚 Code Understanding
766
+ Generate comprehensive documentation for legacy codebases to help new developers onboard faster.
767
+
768
+ ### 🔍 Codebase Navigation
769
+ Create structured overviews of large projects (10,000+ files) for efficient exploration.
770
+
771
+ ### 🤖 AI Agent Integration
772
+ Use generated indexes with tools like Claude Code or Cursor for better code context.
773
+
774
+ ### 📝 Living Documentation
775
+ Keep documentation up-to-date by regenerating README_AI.md files as code changes.
776
+
777
+ ---
778
+
779
+ ## 🛠️ How It Works
780
+
781
+ ```
782
+ Directory → Scanner → Parser (tree-sitter) → Smart Writer → README_AI.md (≤50KB)
783
+ ```
784
+
785
+ 1. **Scanner**: Walks directories, filters by config patterns
786
+ 2. **Parser**: Extracts symbols (classes, functions, imports) using tree-sitter
787
+ 3. **Smart Writer**: Generates tiered documentation with size limits
788
+ 4. **Output**: Optimized `README_AI.md` for AI consumption
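The scanner's filtering step can be illustrated with standard-library glob matching. The sketch below assumes that `include` entries are directory prefixes and `exclude` entries are glob patterns, as in the sample config; codeindex's actual matching rules may differ, and `is_included` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_included(path: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a POSIX-style relative path should be scanned."""
    # Exclusions win over inclusions. Note that fnmatch's "*" also matches "/".
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(path.startswith(prefix) for prefix in include)


include = ["src/"]
exclude = ["**/test/**", "**/__pycache__/**"]
print(is_included("src/auth/login.py", include, exclude))   # True
print(is_included("src/test/test_a.py", include, exclude))  # False
```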
789
+
790
+ ---
791
+
792
+ ## 📐 Smart Indexing Architecture
793
+
794
+ codeindex generates **tiered documentation** optimized for AI agents:
795
+
796
+ ```
797
+ Project Root/
798
+ ├── PROJECT_INDEX.md (~10KB) # Overview level
799
+ │ └── Module list + descriptions
800
+
801
+ ├── Module/
802
+ │ └── README_AI.md (~30KB) # Navigation level
803
+ │ ├── Grouped files by type
804
+ │ └── Key classes summary
805
+
806
+ └── LeafDir/
807
+ └── README_AI.md (≤50KB) # Detailed level
808
+ ├── Full symbol info
809
+ └── Dependencies
810
+ ```
811
+
812
+ ### Configuration
813
+
814
+ ```yaml
815
+ indexing:
816
+ max_readme_size: 51200 # 50KB limit
817
+ symbols:
818
+ max_per_file: 15
819
+ include_visibility: [public, protected]
820
+ exclude_patterns: ["get*", "set*"]
821
+ grouping:
822
+ by: suffix
823
+ patterns:
824
+ Controller: "HTTP handlers"
825
+ Service: "Business logic"
826
+ Model: "Data models"
827
+ ```
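The `grouping` setting with `by: suffix` suggests that symbols are bucketed by the ending of their name. A minimal sketch of that idea follows, using the three patterns from the config; the `"Other"` fallback bucket and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Suffix -> label, copied from the `patterns` section above.
PATTERNS = {
    "Controller": "HTTP handlers",
    "Service": "Business logic",
    "Model": "Data models",
}


def group_by_suffix(class_names: list[str]) -> dict[str, list[str]]:
    """Group class names under the label of the first matching suffix."""
    groups = defaultdict(list)
    for name in class_names:
        label = next(
            (desc for suffix, desc in PATTERNS.items() if name.endswith(suffix)),
            "Other",  # assumed fallback for names without a known suffix
        )
        groups[label].append(name)
    return dict(groups)


print(group_by_suffix(["UserController", "AuthService", "UserModel", "Helper"]))
```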
828
+
829
+ ---
830
+
831
+ ## 🤖 AI Coder Integration
832
+
833
+ ### For Claude Code Users
834
+
835
+ Add this to your project's `CLAUDE.md`:
836
+
837
+ ```markdown
838
+ ## Code Index
839
+
840
+ This project uses codeindex for AI-friendly documentation.
841
+
842
+ ### How to Read Code Index
843
+
844
+ 1. **Start with overview**: Read `PROJECT_INDEX.md` or root `README_AI.md` to understand project structure
845
+ 2. **Locate module**: Find the relevant module from the module list
846
+ 3. **Deep dive**: Read module's `README_AI.md` for file/symbol details
847
+ 4. **Read source**: Open specific files when you need implementation details
848
+
849
+ ### Index Files
850
+
851
+ - `README_AI.md` - Directory-level documentation (≤50KB each)
852
+ - Each directory with source code has its own README_AI.md
853
+
854
+ ### Example Workflow
855
+
856
+ Task: "Fix user authentication bug"
857
+ 1. Read root README_AI.md → Find Auth/User module
858
+ 2. Read Auth/README_AI.md → Find AuthService.php
859
+ 3. Read AuthService.php → Understand implementation
860
+ ```
861
+
862
+ ### Usage Tips
863
+
864
+ - **Token efficient**: Each README is ≤50KB, suitable for LLM context
865
+ - **Progressive loading**: Start from overview, drill down as needed
866
+ - **Keep indexes updated**: Run `codeindex scan-all --fallback` after major changes
867
+
868
+ ### CLAUDE.md Template
869
+
870
+ Copy the template to your project:
871
+
872
+ ```bash
873
+ cp /path/to/codeindex/examples/CLAUDE.md.template your-project/CLAUDE.md
874
+ ```
875
+
876
+ Or see [examples/CLAUDE.md.template](examples/CLAUDE.md.template) for the full template.
877
+
878
+ ---
879
+
880
+ ## 🌍 Language Support
881
+
882
+ | Language | Status | Parser | Features |
883
+ |----------------|-----------------|-------------|----------|
884
+ | Python | ✅ Supported | tree-sitter | Classes, functions, methods, imports, docstrings |
885
+ | PHP | ✅ Supported | tree-sitter | Classes (extends/implements), methods (visibility, static, return types), properties, functions |
886
+ | TypeScript/JS | 🚧 Coming Soon | tree-sitter | - |
887
+ | Java | 🚧 Planned | tree-sitter | - |
888
+ | Go | 🚧 Planned | tree-sitter | - |
889
+ | Rust | 🚧 Planned | tree-sitter | - |
890
+
891
+ ---
892
+
893
+ ## 🤝 Contributing
894
+
895
+ We welcome contributions! See [CONTRIBUTING.md](docs/guides/contributing.md) for:
896
+
897
+ - Development setup
898
+ - TDD workflow
899
+ - Code style guidelines
900
+ - How to add new languages
901
+ - Release process
902
+
903
+ ### Quick Start for Contributors
904
+
905
+ ```bash
906
+ # Clone and install
907
+ git clone https://github.com/yourusername/codeindex.git
908
+ cd codeindex
909
+ pip install -e ".[dev]"
910
+
911
+ # Run tests
912
+ pytest
913
+
914
+ # Lint and format
915
+ ruff check src/
916
+ ruff format src/
917
+ ```
918
+
919
+ ---
920
+
921
+ ## 📊 Roadmap
922
+
923
+ See [2025 Q1 Roadmap](docs/planning/roadmap/2025-Q1.md) for detailed plans.
924
+
925
+ **Upcoming:**
926
+ - Multi-language support (TypeScript, Java, Go)
927
+ - MCP service integration for Claude Code
928
+ - Incremental indexing (only scan changed files)
929
+ - Performance optimizations
930
+ - Plugin system for custom AI providers
931
+
932
+ ---
933
+
934
+ ## 📄 License
935
+
936
+ MIT License - see [LICENSE](LICENSE) file for details.
937
+
938
+ ---
939
+
940
+ ## 🙏 Acknowledgments
941
+
942
+ - [tree-sitter](https://tree-sitter.github.io/) - Fast, incremental parsing
943
+ - [Claude CLI](https://github.com/anthropics/claude-cli) - AI integration inspiration
944
+ - All contributors and users
945
+
946
+ ---
947
+
948
+ ## 📞 Support
949
+
950
+ - **Questions**: [GitHub Discussions](https://github.com/yourusername/codeindex/discussions)
951
+ - **Bugs**: [GitHub Issues](https://github.com/yourusername/codeindex/issues)
952
+ - **Feature Requests**: [GitHub Issues](https://github.com/yourusername/codeindex/issues/new?labels=enhancement)
953
+
954
+ ---
955
+
956
+ ## ⭐ Star History
957
+
958
+ If you find codeindex useful, please star the repository to show your support!
959
+
960
+ [![Star History Chart](https://api.star-history.com/svg?repos=yourusername/codeindex&type=Date)](https://star-history.com/#yourusername/codeindex&Date)
961
+
962
+ ---
963
+
964
+ <p align="center">
965
+ Made with ❤️ by the codeindex team
966
+ </p>