cicada-mcp 0.2.0__py3-none-any.whl → 0.3.0__py3-none-any.whl

This diff compares the contents of two publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between those versions as they appear in the public registry.
Files changed (62)
  1. cicada/_version_hash.py +4 -0
  2. cicada/cli.py +6 -748
  3. cicada/commands.py +1255 -0
  4. cicada/dead_code/__init__.py +1 -0
  5. cicada/{find_dead_code.py → dead_code/finder.py} +2 -1
  6. cicada/dependency_analyzer.py +147 -0
  7. cicada/entry_utils.py +92 -0
  8. cicada/extractors/base.py +9 -9
  9. cicada/extractors/call.py +17 -20
  10. cicada/extractors/common.py +64 -0
  11. cicada/extractors/dependency.py +117 -235
  12. cicada/extractors/doc.py +2 -49
  13. cicada/extractors/function.py +10 -14
  14. cicada/extractors/keybert.py +228 -0
  15. cicada/extractors/keyword.py +191 -0
  16. cicada/extractors/module.py +6 -10
  17. cicada/extractors/spec.py +8 -56
  18. cicada/format/__init__.py +20 -0
  19. cicada/{ascii_art.py → format/ascii_art.py} +1 -1
  20. cicada/format/formatter.py +1145 -0
  21. cicada/git_helper.py +134 -7
  22. cicada/indexer.py +322 -89
  23. cicada/interactive_setup.py +251 -323
  24. cicada/interactive_setup_helpers.py +302 -0
  25. cicada/keyword_expander.py +437 -0
  26. cicada/keyword_search.py +208 -422
  27. cicada/keyword_test.py +383 -16
  28. cicada/mcp/__init__.py +10 -0
  29. cicada/mcp/entry.py +17 -0
  30. cicada/mcp/filter_utils.py +107 -0
  31. cicada/mcp/pattern_utils.py +118 -0
  32. cicada/{mcp_server.py → mcp/server.py} +819 -73
  33. cicada/mcp/tools.py +473 -0
  34. cicada/pr_finder.py +2 -3
  35. cicada/pr_indexer/indexer.py +3 -2
  36. cicada/setup.py +167 -35
  37. cicada/tier.py +225 -0
  38. cicada/utils/__init__.py +9 -2
  39. cicada/utils/fuzzy_match.py +54 -0
  40. cicada/utils/index_utils.py +9 -0
  41. cicada/utils/path_utils.py +18 -0
  42. cicada/utils/text_utils.py +52 -1
  43. cicada/utils/tree_utils.py +47 -0
  44. cicada/version_check.py +99 -0
  45. cicada/watch_manager.py +320 -0
  46. cicada/watcher.py +431 -0
  47. cicada_mcp-0.3.0.dist-info/METADATA +541 -0
  48. cicada_mcp-0.3.0.dist-info/RECORD +70 -0
  49. cicada_mcp-0.3.0.dist-info/entry_points.txt +4 -0
  50. cicada/formatter.py +0 -864
  51. cicada/keybert_extractor.py +0 -286
  52. cicada/lightweight_keyword_extractor.py +0 -290
  53. cicada/mcp_entry.py +0 -683
  54. cicada/mcp_tools.py +0 -291
  55. cicada_mcp-0.2.0.dist-info/METADATA +0 -735
  56. cicada_mcp-0.2.0.dist-info/RECORD +0 -53
  57. cicada_mcp-0.2.0.dist-info/entry_points.txt +0 -4
  58. cicada/{dead_code_analyzer.py → dead_code/analyzer.py} +0 -0
  59. cicada/{colors.py → format/colors.py} +0 -0
  60. {cicada_mcp-0.2.0.dist-info → cicada_mcp-0.3.0.dist-info}/WHEEL +0 -0
  61. {cicada_mcp-0.2.0.dist-info → cicada_mcp-0.3.0.dist-info}/licenses/LICENSE +0 -0
  62. {cicada_mcp-0.2.0.dist-info → cicada_mcp-0.3.0.dist-info}/top_level.txt +0 -0
@@ -1,735 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: cicada-mcp
3
- Version: 0.2.0
4
- Summary: An Elixir module search MCP server
5
- Author-email: wende <wende@hey.com>
6
- Maintainer-email: wende <wende@hey.com>
7
- License-Expression: MIT
8
- Project-URL: Homepage, https://github.com/wende/cicada
9
- Project-URL: Repository, https://github.com/wende/cicada
10
- Project-URL: Issues, https://github.com/wende/cicada/issues
11
- Project-URL: Changelog, https://github.com/wende/cicada/blob/main/CHANGELOG.md
12
- Project-URL: Documentation, https://github.com/wende/cicada#readme
13
- Keywords: elixir,phoenix,mcp,model-context-protocol,code-search,developer-tools,git-history,code-intelligence,ai-assistant
14
- Classifier: Development Status :: 4 - Beta
15
- Classifier: Intended Audience :: Developers
16
- Classifier: Operating System :: OS Independent
17
- Classifier: Programming Language :: Python :: 3
18
- Classifier: Programming Language :: Python :: 3.10
19
- Classifier: Programming Language :: Python :: 3.11
20
- Classifier: Programming Language :: Python :: 3.12
21
- Classifier: Topic :: Software Development :: Code Generators
22
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
- Classifier: Topic :: Software Development :: Version Control :: Git
24
- Classifier: Topic :: Text Processing :: Indexing
25
- Classifier: Framework :: Pytest
26
- Requires-Python: >=3.10
27
- Description-Content-Type: text/markdown
28
- License-File: LICENSE
29
- Requires-Dist: mcp>=0.1.0
30
- Requires-Dist: pyyaml>=6.0
31
- Requires-Dist: tree-sitter>=0.20.0
32
- Requires-Dist: tree-sitter-elixir>=0.1.0
33
- Requires-Dist: gitpython>=3.1.0
34
- Requires-Dist: keybert>=0.8.0
35
- Requires-Dist: lemminflect>=0.2.3
36
- Requires-Dist: rank-bm25>=0.2.2
37
- Requires-Dist: simple-term-menu>=1.6.0
38
- Requires-Dist: tomli>=2.0.0; python_version < "3.11"
39
- Requires-Dist: gensim>=4.4.0
40
- Dynamic: license-file
41
-
42
- <div align="center">
43
-
44
- <img src="https://raw.githubusercontent.com/wende/cicada/main/public/cicada.png" alt="CICADA Logo" width="400"/>
45
-
46
- # CICADA
47
-
48
- ### **C**ode **I**ntelligence: **C**ontextual **A**nalysis, **D**iscovery, and **A**ttribution
49
-
50
- *Coding Agents search blindly. Be their guide.*
51
-
52
- [![Python Version](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
53
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
54
- [![codecov](https://codecov.io/gh/wende/cicada/branch/main/graph/badge.svg)](https://codecov.io/gh/wende/cicada)
55
- [![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io)
56
- [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
57
- [![Elixir](https://img.shields.io/badge/Elixir-Support-purple.svg)](https://elixir-lang.org/)
58
- [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
59
-
60
- > 🎉 **Version 0.2.0 Released!** Enhanced AI-powered keyword search with **15-25x faster** incremental indexing. [What's New →](#whats-new-in-v020)
61
-
62
- [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en-US/install-mcp?name=cicada&config=eyJjb21tYW5kIjoidXZ4IGNpY2FkYS1tY3AgLiJ9)
63
-
64
- [Installation](#installation) •
65
- [Quick Start](#quick-start) •
66
- [Configuration](#configuration) •
67
- [MCP Tools](#mcp-tools) •
68
- [Contributing](#contributing)
69
-
70
- </div>
71
-
72
- ---
73
-
74
- ## Overview
75
-
76
- CICADA is a Model Context Protocol (MCP) server that provides AI coding assistants with deep code intelligence. **Currently supports Elixir projects**, with Python and TypeScript support planned for future releases. It indexes your codebase using tree-sitter AST parsing and provides instant access to modules, functions, call sites, and PR attribution.
77
-
78
- <div align="center">
79
- <table>
80
- <tr>
81
- <td align="center"><b>Without CICADA</b></td>
82
- <td align="center"><b>With CICADA</b></td>
83
- </tr>
84
- <tr>
85
- <td><img src="https://raw.githubusercontent.com/wende/cicada/main/public/no-cicada-demo-trimmed.gif" alt="Demo without CICADA" width="450"/></td>
86
- <td><img src="https://raw.githubusercontent.com/wende/cicada/main/public/cicada-demo-extended-clean-trimmed%20copy.gif" alt="Demo with CICADA" width="450"/></td>
87
- </tr>
88
- <tr>
89
- <td align="center">3,127 tokens • 52.84s</td>
90
- <td align="center">550 tokens • 35.04s</td>
91
- </tr>
92
- <tr>
93
- <td colspan="2" align="center"><b>82.4% fewer tokens • 33.7% faster</b></td>
94
- </tr>
95
- </table>
96
- </div>
97
-
98
- ## What's New in v0.2.0
99
-
100
- ### 🤖 Enhanced AI Keyword Extraction
101
-
102
- AI-powered semantic search is now production-ready with advanced NLP capabilities:
103
-
104
- - **BERT Integration**: KeyBERT-based keyword extraction for superior semantic understanding
105
- - **Configurable Model Tiers**: Choose between `fast`, `regular`, or `large` models to balance speed and accuracy
106
- - **Smart Wildcard Search**: Use patterns like `create*` or `*_user` to find related concepts
107
- - **Improved Relevance Scoring**: Better ranking of search results by semantic relevance
108
-
109
- ```bash
110
- # Index with enhanced AI keyword extraction
111
- cicada index --nlp --fast
112
-
113
- # Search by concept, not just exact names
114
- # AI will find: create_user, user_creation, new_user_account, etc.
115
- ```
116
-
117
- ### ⚡ Incremental Indexing - Lightning Fast Updates
118
-
119
- Say goodbye to slow reindexing! v0.2.0 introduces intelligent change detection that makes reindexing **15-25x faster**:
120
-
121
- - **🚀 15-25x Speedup**: Only processes files that actually changed (MD5 hash-based detection)
122
- - **💾 Interrupt Safety**: Ctrl-C gracefully saves progress - resume anytime without data loss
123
- - **🎯 Perfect for AI Search**: Keyword extraction drops from 48.7s to 2.1s for typical updates
124
- - **🔄 Zero Configuration**: Works automatically out of the box
125
-
126
- ```bash
127
- # First run: full index + hash computation (~12s for 200 files)
128
- cicada index --nlp
129
-
130
- # Subsequent runs: lightning fast incremental updates
131
- # Changed 5 files? Only 2.1s instead of 48.7s!
132
- cicada index --nlp
133
- ```
134
-
135
- **Performance Benchmark** (200-file Phoenix app, 5 files changed):
136
-
137
- | Operation | Before v0.2.0 | v0.2.0 Incremental | Speedup |
138
- |-----------|---------------|-------------------|---------|
139
- | Code indexing only | 12.3s | 0.8s | **15.4x faster** |
140
- | With AI keyword extraction | 48.7s | 2.1s | **23.2x faster** |
141
-
142
- ### 🛡️ Production-Ready Features
143
-
144
- - **Graceful Interruption**: Press Ctrl-C to cleanly save progress mid-indexing
145
- - **Resume Capability**: Interrupted? Just run the same command again to continue
146
- - **Smart Merging**: Automatically merges incremental changes with existing index
147
- - **Backward Compatible**: Seamlessly upgrades from v0.1.x with no breaking changes
148
-
149
- ### Migration from v0.1.x
150
-
151
- ✅ **Zero Breaking Changes** - v0.2.0 is fully backward compatible
152
- ✅ **Automatic Upgrade** - Just install and run `cicada index` as usual
153
- ✅ **Graceful Fallback** - Missing hashes? Performs full index once automatically
154
-
155
- ```bash
156
- # Update to v0.2.0
157
- uv tool install git+https://github.com/wende/cicada.git@latest --force
158
-
159
- # Run indexer - automatically enables incremental mode
160
- cicada index --nlp
161
-
162
- # Need to switch keyword extraction methods? Use --full for consistency
163
- cicada index --rag --fast --full
164
- ```
165
-
166
- **[Read the complete incremental indexing guide →](docs/INCREMENTAL_INDEXING.md)**
167
-
168
- ---
169
-
170
- ### Key Features
171
-
172
- - **AST-aware code search** - Find function definitions with full signatures, types, and documentation—no implementation bloat
173
- - **Intelligent call site tracking** - Resolve aliases and track where functions are actually invoked across the codebase
174
- - **PR attribution & review context** - Discover which pull request introduced any line and view historical code review discussions inline
175
- - **Function evolution tracking** - See when functions were created, how often they’re modified, and their complete git history
176
- - **Semantic module analysis** - Understand module dependencies, imports, and relationships beyond text matching
177
- - **MCP integration** - Provide AI coding assistants with structured code intelligence, not raw text
178
-
179
- ## Installation
180
-
181
- ### Recommended: Permanent Installation
182
-
183
- **Installing UV:**
184
- ```bash
185
- curl -LsSf https://astral.sh/uv/install.sh | sh
186
- # or: brew install uv
187
- ```
188
-
189
- **Install Cicada permanently for best experience:**
190
-
191
- ```bash
192
- # Step 1: Install once
193
- uv tool install cicada-mcp
194
-
195
- # Step 2: Setup in each project (one command per project)
196
- cd /path/to/your/elixir/project
197
- cicada claude # or: cicada cursor, cicada vs
198
- ```
199
-
200
- **That's it!** The setup command:
201
- - Indexes your codebase with keyword extraction
202
- - Stores all files in `~/.cicada/projects/<hash>/` (outside your repo)
203
- - Creates only an MCP config file in your repo (`.mcp.json` for Claude Code)
204
- - Configures the MCP server automatically
205
-
206
- **After setup:**
207
- 1. Restart your editor
208
- 2. Start coding with AI-powered Elixir intelligence!
209
-
210
- **Available commands after installation:**
211
- - `cicada [claude|cursor|vs]` - One-command setup per project
212
- - `cicada-mcp` - MCP server (auto-started by editor)
213
- - `cicada index` - Re-index code with custom options (--nlp or --rag)
214
- - `cicada index-pr` - Index pull requests for PR attribution
215
- - `cicada find-dead-code` - Find potentially unused functions
216
-
217
- ### Try Before Installing
218
-
219
- Want to test Cicada first? Use `uvx` for a quick trial:
220
-
221
- ```bash
222
- cd /path/to/your/elixir/project
223
-
224
- # For Claude Code
225
- uvx --from cicada-mcp cicada claude
226
-
227
- # For Cursor
228
- uvx --from cicada-mcp cicada cursor
229
-
230
- # For VS Code
231
- uvx --from cicada-mcp cicada vs
232
- ```
233
-
234
- **Note:** `uvx` is perfect for trying Cicada, but **permanent installation is recommended** because:
235
- - ✅ Faster MCP server startup (no temporary environment creation)
236
- - ✅ Access to all CLI commands (`cicada index`, `cicada index-pr`)
237
- - ✅ Fine-tuned keyword extraction with lemminflect or BERT models
238
- - ✅ PR indexing features
239
- - ✅ Custom re-indexing options
240
-
241
- Once you're convinced, install permanently with `uv tool install` above!
242
-
243
- ### Quick Setup for Cursor and Claude Code
244
-
245
- **For Cursor:**
246
-
247
- Click the install button at the top of this README or visit:
248
- [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en-US/install-mcp?name=cicada&config=eyJjb21tYW5kIjoidXZ4IGNpY2FkYS1tY3AgLiJ9)
249
-
250
- **For Claude Code:**
251
-
252
- ```bash
253
- # Option 1: Using claude mcp add command
254
- claude mcp add cicada -- uvx cicada-mcp ./path/to/your/codebase
255
-
256
- # Option 2: Using setup script
257
- uvx --from cicada-mcp cicada claude
258
- ```
259
-
260
- **Then for both editors,** run these commands in your codebase to generate keyword lookup and GitHub PR lookup databases:
261
-
262
- ```bash
263
- # Generate keyword lookup database
264
- uvx --from cicada-mcp cicada-index .
265
-
266
- # Generate GitHub PR lookup database
267
- uvx --from cicada-mcp cicada-index-pr .
268
- ```
269
-
270
- ---
271
-
272
- ## Quick Start
273
-
274
- After installation, ask your AI coding assistant:
275
-
276
- ```
277
- "What functions are in the MyApp.User module?"
278
- "Show me where authenticate/2 is called"
279
- "Which PR introduced line 42 of user.ex?"
280
- "Show me all PRs that modified the User module with their review comments"
281
- "Find all usages of Repo.insert/2"
282
- "What's the git history of the authenticate function?"
283
- ```
284
-
285
- **For PR features**, first run:
286
- ```bash
287
- cicada index-pr .
288
- ```
289
-
290
- ---
291
-
292
- ## Configuration
293
-
294
- ### Automatic Configuration
295
-
296
- The new simplified workflow stores all generated files outside your repository:
297
-
298
- **Storage Structure:**
299
- ```
300
- ~/.cicada/
301
- projects/
302
- <repo-hash>/
303
- config.yaml # MCP server configuration
304
- index.json # Code index with keywords
305
- pr_index.json # PR attribution data (optional)
306
- hashes.json # For incremental indexing
307
- ```
308
-
309
- **Your Repository (Clean!):**
310
- ```
311
- your-project/
312
- .mcp.json # Only this file is added (for Claude Code)
313
- # or .cursor/mcp.json for Cursor
314
- # or .vscode/settings.json for VS Code
315
- ```
316
-
317
- **Generated MCP Config (Claude Code example):**
318
- ```json
319
- {
320
- "mcpServers": {
321
- "cicada": {
322
- "command": "cicada-mcp",
323
- "env": {
324
- "CICADA_REPO_PATH": "/path/to/project",
325
- "CICADA_CONFIG_DIR": "/home/user/.cicada/projects/<hash>"
326
- }
327
- }
328
- }
329
- }
330
- ```
331
-
332
- ✅ Fast startup, no paths, portable!
333
-
334
- **Migration tip from v0.1.x:** If you have the old Python-based config, run:
335
- ```bash
336
- uv tool install git+https://github.com/wende/cicada.git@v0.2.0 --force
337
- cicada claude # Re-run to get optimized config
338
- ```
339
-
340
- ### Re-indexing
341
-
342
- After code changes, re-run the setup command:
343
-
344
- ```bash
345
- # Re-index for Claude Code
346
- uvx --from cicada-mcp cicada claude
347
-
348
- # Or if permanently installed
349
- cicada claude
350
- ```
351
-
352
- This will:
353
- - Detect changed files (incremental indexing)
354
- - Update the index with new/modified code
355
- - Keep your existing MCP configuration
356
-
357
- ### Optional: PR Attribution
358
-
359
- Index pull requests for PR-related features:
360
-
361
- ```bash
362
- # After permanent installation
363
- cicada index-pr .
364
-
365
- # Or with uvx
366
- uvx --from cicada-mcp cicada-index-pr .
367
- ```
368
-
369
- # Clean rebuild (re-index everything from scratch)
370
- cicada index-pr . --clean
371
- ```
372
-
373
- **See also:** [PR Indexing Documentation](docs/PR_INDEXING.md)
374
-
375
- ---
376
-
377
- ## MCP Tools
378
-
379
- CICADA provides 9 specialized tools for AI assistants to understand and navigate your codebase. For complete technical documentation including parameters and return formats, see [MCP Tools Reference](docs/MCP-Tools-Reference.md).
380
-
381
- ### Core Search Tools
382
-
383
- **`search_module`** - Find modules and view all their functions
384
- - Search by exact module name or file path
385
- - View function signatures with type specs
386
- - Filter public/private functions
387
- - Output in Markdown or JSON
388
-
389
- **`search_function`** - Locate function definitions and track usage
390
- - Search by function name, arity, or full module path
391
- - See where functions are called with line numbers
392
- - View actual code usage examples
393
- - Filter for test files only
394
-
395
- **`search_module_usage`** - Track module dependencies
396
- - Find all aliases and imports
397
- - See all function calls to a module
398
- - Understand module relationships
399
- - Map dependencies across codebase
400
-
401
- ### Git History & Attribution Tools
402
-
403
- **`find_pr_for_line`** - Identify which PR introduced any line of code
404
- - Line-level PR attribution via git blame
405
- - Author and commit information
406
- - Direct links to GitHub PRs
407
- - Requires: GitHub CLI + PR index
408
-
409
- **`get_file_pr_history`** - View complete PR history for a file
410
- - All PRs that modified the file
411
- - PR descriptions and metadata
412
- - Code review comments with line numbers
413
- - Requires: GitHub CLI + PR index
414
-
415
- **`get_commit_history`** - Track file and function evolution over time
416
- - Complete commit history for files
417
- - Function-level tracking (follows refactors)
418
- - Creation and modification timeline
419
- - Requires: `.gitattributes` configuration
420
-
421
- **`get_blame`** - Show line-by-line code ownership
422
- - Grouped authorship display
423
- - Commit details for each author
424
- - Code snippets with context
425
-
426
- ### Advanced Features
427
-
428
- **`search_by_keywords`** (EXPERIMENTAL) - Semantic documentation search
429
- - Find code by concepts, not just names
430
- - Wildcard pattern matching (`create*`, `*_user`)
431
- - NLP-extracted keywords from docs
432
- - Relevance scoring
433
- - Requires: Index built with `--nlp` or `--rag`
434
-
435
- **`find_dead_code`** - Identify potentially unused functions
436
- - Three confidence levels (high, medium, low)
437
- - Smart detection of callbacks and behaviors
438
- - Recognition of dynamic call patterns
439
- - Module-level grouping with line numbers
440
- - Excludes test files and `@impl` functions
441
-
442
- ---
443
-
444
- **See also:** [Complete MCP Tools Reference](docs/MCP-Tools-Reference.md) for detailed specifications
445
-
446
- ---
447
-
448
- ## CLI Tools
449
-
450
- CICADA provides several command-line tools for setup, indexing, and analysis:
451
-
452
- ### Setup & Configuration
453
-
454
- **`cicada`** - Initialize CICADA in your project
455
- ```bash
456
- cicada # Setup in current directory
457
- cicada /path/to/other/project # Setup in different directory
458
- ```
459
- - Generates `.mcp.json` configuration
460
- - Creates `.cicada/` directory
461
- - Installs Elixir dependencies
462
- - Configures git attributes for function tracking
463
-
464
- ### Indexing Tools
465
-
466
- **`cicada index`** - Index Elixir codebase
467
- ```bash
468
- cicada index # Index current directory
469
- cicada index --nlp # Use NLP keyword extraction (lemminflect)
470
- cicada index --rag # Use BERT-based keyword extraction
471
- ```
472
- - Parses all Elixir files using tree-sitter
473
- - Extracts modules, functions, and call sites
474
- - Resolves aliases for accurate tracking
475
- - Optional keyword extraction for semantic search
476
-
477
- **`cicada index-pr`** - Index GitHub pull requests
478
- ```bash
479
- cicada index-pr . # Index PRs for current repo
480
- cicada index-pr . --clean # Full rebuild from scratch
481
- ```
482
- - Requires GitHub CLI (`gh`) authenticated
483
- - Indexes PR metadata and review comments
484
- - Incremental updates by default
485
- - Enables PR attribution features
486
-
487
- ### Analysis Tools
488
-
489
- **`cicada find-dead-code`** - Find unused functions (CLI version)
490
- ```bash
491
- cicada find-dead-code # Show high confidence only
492
- cicada find-dead-code --min-confidence low # Show all candidates
493
- cicada find-dead-code --format json # JSON output
494
- cicada find-dead-code --index path/to/index.json
495
- ```
496
- - Analyzes function usage across codebase
497
- - Categorizes by confidence level
498
- - Available as both CLI tool and MCP tool
499
-
500
- ---
501
-
502
- ## Roadmap
503
-
504
- ### v0.2.0 (Released - October 2025) ✅
505
- - **Enhanced AI Keyword Extraction** - Production-ready semantic search
506
- - BERT integration with KeyBERT for superior keyword extraction
507
- - Configurable model tiers (fast, regular, large)
508
- - Wildcard pattern support (`create*`, `*_user`)
509
- - Improved relevance scoring
510
- - **Incremental Indexing** - 15-25x faster reindexing
511
- - MD5-based change detection
512
- - Processes only modified files
513
- - Interrupt-safe with graceful Ctrl-C handling
514
- - Resume capability for interrupted indexes
515
- - **Production Hardening**
516
- - Signal handlers (SIGINT, SIGTERM)
517
- - Partial progress saving
518
- - Automatic hash storage and management
519
-
520
- ### v0.1.1 (Released - October 2025) ✅
521
- - Module and function search
522
- - Call site tracking with alias resolution
523
- - PR attribution via git blame + GitHub
524
- - PR review comments with line mapping
525
- - File PR history with descriptions
526
- - GraphQL-based PR indexing (30x faster)
527
- - Function usage examples with code snippets
528
- - Git commit history tracking with precise function tracking
529
- - Function evolution metadata (creation, modifications, frequency)
530
- - Git blame integration with line-by-line authorship
531
- - Test file filtering
532
- - Multiple output formats (markdown, JSON)
533
- - Intelligent .mcp.json auto-configuration
534
- - `uv tool install` support
535
- - **Automatic version update checking** - Notifies users when newer versions are available
536
- - **NLP Keyword search** (EXPERIMENTAL) - Basic semantic search across documentation
537
-
538
- ### v0.3 (Potential Future Enhancements)
539
- - Enhanced keyword search with BM25 ranking
540
- - Directory tree hashing for faster change detection
541
- - Caching optimizations for large codebases
542
-
543
- ### Long Term (Stretch Goals)
544
- - Multi-language support (Python, TypeScript)
545
- - Semantic code search
546
- - Real-time incremental indexing
547
- - Web UI for exploration
548
-
549
- ### Out of Scope (Non-Goals)
550
- These features are explicitly **not planned**:
551
- - Fuzzy search / "did you mean" suggestions (grep is sufficient)
552
- - Function similarity algorithms or recommendations
553
- - Confidence scoring systems
554
- - Multi-repository support (single repo focus)
555
- - Alternative function suggestions (bang/non-bang variants)
556
-
557
- ---
558
-
559
- ## Design Decisions
560
-
561
- CICADA prioritizes simplicity and reliability over complexity:
562
-
563
- ### Intentional Constraints
564
- - **Exact name matching only** - Use grep/ripgrep for fuzzy searches; keeping CICADA focused
565
- - **Direct call tracking** - Tracks explicit function calls; comprehensive call graphs add complexity without enough value
566
- - **Manual documentation search** - Documentation indexing planned for v0.1
567
- - **No AI/ML features** - No similarity algorithms, confidence scoring, or recommendations; deterministic results only
568
-
569
- These are deliberate design choices to keep CICADA fast, predictable, and maintainable.
570
-
571
- ---
572
-
573
- ## Contributing
574
-
575
- ### Development Setup
576
-
577
- ```bash
578
- # Clone your fork
579
- git clone https://github.com/wende/cicada.git
580
- cd cicada
581
-
582
- # Using uv (recommended)
583
- uv sync
584
-
585
- # Or traditional venv (legacy)
586
- python -m venv venv
587
- source venv/bin/activate # On Windows: venv\Scripts\activate
588
- pip install -e ".[dev]"
589
-
590
- # Run tests
591
- pytest
592
- ```
593
-
594
- ### Testing
595
-
596
- ```bash
597
- # Run all tests
598
- pytest
599
-
600
- # Run specific test files
601
- pytest tests/test_parser.py
602
- pytest tests/test_search_function.py
603
-
604
- # Run with coverage (terminal report)
605
- pytest --cov=cicada --cov-report=term-missing
606
-
607
- # Generate HTML coverage report
608
- pytest --cov=cicada --cov-report=html
609
- # Open htmlcov/index.html in your browser
610
-
611
- # Run with coverage and see which lines need tests
612
- pytest --cov=cicada --cov-report=term-missing --cov-report=html
613
-
614
- # Check coverage and fail if below threshold (e.g., 80%)
615
- pytest --cov=cicada --cov-fail-under=80
616
- ```
617
-
618
- ### Code Style
619
-
620
- This project uses:
621
- - **black** for code formatting
622
- - **pytest** for testing
623
- - **type hints** where appropriate
624
-
625
- Before submitting a PR:
626
- ```bash
627
- # Format code
628
- black cicada tests
629
-
630
- # Run tests
631
- pytest
632
-
633
- # Check types (if using mypy)
634
- mypy cicada
635
- ```
636
-
637
- ### Reporting Issues
638
-
639
- When reporting bugs or requesting features:
640
-
641
- 1. Check existing [Issues](https://github.com/wende/cicada/issues)
642
- 2. If not found, create a new issue with:
643
- - Clear description
644
- - Steps to reproduce (for bugs)
645
- - Expected vs actual behavior
646
- - Your environment (OS, Python version, Elixir version)
647
-
648
- ---
649
-
650
- ## Troubleshooting
651
-
652
- ### "Index file not found"
653
-
654
- Run the indexer first:
655
- ```bash
656
- cicada index /path/to/project
657
- ```
658
-
659
- ### "Module not found"
660
-
661
- Use the exact module name as it appears in code (e.g., `MyApp.User`, not `User`).
662
-
663
- ### MCP Server Won't Connect
664
-
665
- 1. Verify `.mcp.json` exists in your project root
666
- 2. Check that all paths in `.mcp.json` are absolute
667
- 3. Ensure `index.json` was created successfully
668
- 4. Restart your MCP client (Claude Code, Cline, etc.)
669
- 5. Check your MCP client logs for errors
670
-
671
- ### PR Features Not Working
672
-
673
- PR features require the GitHub CLI and a PR index:
674
-
675
- ```bash
676
- # Install GitHub CLI
677
- brew install gh # macOS
678
- # or visit https://cli.github.com/
679
-
680
- # Authenticate
681
- gh auth login
682
-
683
- # Index PRs (first time or after new PRs)
684
- cicada index-pr .
685
-
686
- # Clean rebuild (re-index everything from scratch)
687
- cicada index-pr . --clean
688
- ```
689
-
690
- **Common issues:**
691
- - "No PR index found" → Run `cicada index-pr .`
692
- - "Not a GitHub repository" → Ensure repo has GitHub remote
693
- - Slow indexing → Incremental updates are used by default
694
-
695
- #### Uninstall
696
-
697
- Remove CICADA from a project:
698
-
699
- ```bash
700
- rm -rf .cicada/ .mcp.json
701
- # Restart your MCP client
702
- ```
703
-
704
- ---
705
-
706
- ## Credits
707
-
708
- ### Built With
709
-
710
- - [Tree-sitter](https://tree-sitter.github.io/) - Incremental parsing system
711
- - [tree-sitter-elixir](https://github.com/elixir-lang/tree-sitter-elixir) - Elixir grammar
712
- - [MCP](https://modelcontextprotocol.io/) - Model Context Protocol
713
- - [GitHub CLI](https://cli.github.com/) - PR attribution
714
-
715
- ---
716
-
717
- ## License
718
-
719
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
720
-
721
- ---
722
-
723
- ## Acknowledgments
724
-
725
- - The Anthropic team for Claude Code and MCP
726
- - The Elixir community for tree-sitter-elixir
727
- - All contributors who help improve CICADA
728
-
729
- ---
730
-
731
- <div align="center">
732
-
733
- **[⬆ back to top](#cicada)**
734
-
735
- </div>