code-finder 0.1.0__tar.gz → 0.1.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- code_finder-0.1.1/CODE_FINDER_README.md +167 -0
- code_finder-0.1.1/PKG-INFO +192 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/pyproject.toml +2 -2
- code_finder-0.1.1/src/code_finder.egg-info/PKG-INFO +192 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/code_finder.egg-info/SOURCES.txt +1 -0
- code_finder-0.1.0/PKG-INFO +0 -823
- code_finder-0.1.0/src/code_finder.egg-info/PKG-INFO +0 -823
- {code_finder-0.1.0 → code_finder-0.1.1}/README.md +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/setup.cfg +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/__init__.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/agentic_integration.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/ast_chunker.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/config.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/context_manager.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/embeddings.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/embeddings_interface.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/enhanced_ast_chunker.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/explorer.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/explorer_with_context.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/indexer.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/markdown_chunker.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/mode_handler.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/query_metrics.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/question_generator.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/readme_extractor.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/repository_adapter.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/search.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/__init__.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/_cli_common.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/_index_manager.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/api_surface.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/evidence_retrieval.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/skills/grounded_review.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/__init__.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/editor_agent.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/llm_synthesizer.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/logic_explainer.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/multi_review_pipeline.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/prompt_builder.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/providers.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/claude_context/synthesis/validators.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/code_finder.egg-info/dependency_links.txt +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/code_finder.egg-info/entry_points.txt +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/code_finder.egg-info/requires.txt +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/src/code_finder.egg-info/top_level.txt +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/tests/test_all_components.py +0 -0
- {code_finder-0.1.0 → code_finder-0.1.1}/tests/test_docstring_indexer.py +0 -0

@@ -0,0 +1,167 @@

# code-finder

AST-based code indexing and hybrid search (BM25 + vector) for retrieving code evidence from repositories. Built to answer natural-language questions about a codebase with ranked, source-grounded results.

> **Import name**: The package installs as `code-finder` but the Python import is `claude_context`, not `code_finder`.

## Install

```bash
pip install code-finder
```

Or run ephemerally without installing:

```bash
uv run --with code-finder code-finder-evidence --repo /path/to/repo --query "how does auth work?"
```

## What it does

code-finder parses source code into AST-aware chunks, embeds them with a local sentence-transformer model, and stores them in a Milvus Lite vector database. At query time it combines BM25 keyword search with vector similarity search (reciprocal rank fusion) to return the most relevant code snippets for a natural-language question.
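
Reciprocal rank fusion (RRF) scores each chunk by summing 1 / (k + rank) over every ranked list it appears in. Chunks that both retrievers rank highly rise to the top, and BM25 scores never have to be normalized against cosine similarities. The minimal sketch below applies the technique to two hypothetical lists of chunk IDs. The constant k = 60 is the value commonly used in the RRF literature. It is an assumption here, not necessarily what code-finder uses. `reciprocal_rank_fusion` is an illustrative name and is not part of the package.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of IDs with reciprocal rank fusion.

    A document at 1-based rank r in a list contributes 1 / (k + r) to its
    fused score; contributions are summed across all lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs, ranked best-first by each retriever.
bm25_hits = ["auth.login", "auth.logout", "config.load"]
vector_hits = ["auth.session", "auth.login", "auth.logout"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.5f}")
```

Both retrievers found `auth.login` and `auth.logout`, so they outrank `auth.session`. This happens even though `auth.session` was the vector search's top hit.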

Three capabilities are exposed as both CLI commands and Python functions:

| Capability | CLI command | What it returns |
|---|---|---|
| **Code evidence retrieval** | `code-finder-evidence` | Ranked code snippets matching a query |
| **Code-grounded review** | `code-finder-review` | Per-claim verdicts for a draft document |
| **API surface extraction** | `code-finder-api-surface` | Public classes, functions, and signatures |

## CLI usage

### Code evidence retrieval

Search a repo with a natural-language question:

```bash
code-finder-evidence \
  --repo /path/to/repo \
  --query "how does authentication work?" \
  --limit 5
```

Filter by chunk type or file path:

```bash
code-finder-evidence \
  --repo /path/to/repo \
  --query "error handling" \
  --filter-types function,method \
  --filter-paths src/auth,src/config
```

Force a re-index after code changes:

```bash
code-finder-evidence --repo /path/to/repo --query "config loading" --reindex
```

### Code-grounded review

Validate a draft document's factual claims against the source code:

```bash
code-finder-review \
  --repo /path/to/repo \
  --draft docs/getting-started.md
```

Each claim gets a verdict: `supported`, `partially_supported`, `unsupported`, or `no_evidence_found`.
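
If you post-process review output yourself, you may want to validate verdict strings and tally them. The sketch below makes a hypothetical assumption: each claim's verdict has already been pulled out as a plain string. The structure of the report that `grounded_review` returns is not documented here, so the `Verdict` enum and both helpers are illustrative only.

```python
from collections import Counter
from enum import Enum


class Verdict(str, Enum):
    """The four verdict values listed above (hypothetical enum, not package API)."""

    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially_supported"
    UNSUPPORTED = "unsupported"
    NO_EVIDENCE_FOUND = "no_evidence_found"


def summarize_verdicts(verdicts: list[str]) -> dict[str, int]:
    """Count verdicts; Verdict(v) raises ValueError for any unknown string."""
    counts = Counter(Verdict(v) for v in verdicts)
    # Report every verdict value, including those with a zero count.
    return {v.value: counts.get(v, 0) for v in Verdict}


def needs_attention(verdicts: list[str]) -> bool:
    """True if any claim is not fully supported by the code."""
    return any(Verdict(v) is not Verdict.SUPPORTED for v in verdicts)


sample = ["supported", "unsupported", "supported", "no_evidence_found"]
print(summarize_verdicts(sample))
print(needs_attention(sample))
```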

### API surface extraction

Extract the public API from source files. This is deterministic (no LLM, no indexing):

```bash
code-finder-api-surface --target src/mypackage/

# Single file
code-finder-api-surface --target src/mypackage/client.py

# Include private members
code-finder-api-surface --target src/mypackage/ --include-private
```
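
Deterministic API surface extraction amounts to walking a syntax tree and keeping the public definitions. The sketch below illustrates the idea for Python with the standard-library `ast` module. It is not code-finder's own implementation; the package lists tree-sitter for AST parsing. The `extract_public_api` helper and its privacy rule are assumptions: a leading underscore marks a name as private, except for dunder methods.

```python
import ast

_FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _is_public(name: str) -> bool:
    # Assumed rule: dunder methods are public; other leading-underscore names are private.
    if name.startswith("__") and name.endswith("__"):
        return True
    return not name.startswith("_")


def _signature(fn) -> str:
    """Render a function node as 'def name(args) -> ret'."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    sig = f"{prefix} {fn.name}({ast.unparse(fn.args)})"
    if fn.returns is not None:
        sig += f" -> {ast.unparse(fn.returns)}"
    return sig


def extract_public_api(source: str, include_private: bool = False) -> list[str]:
    """List top-level function and class signatures, plus each class's methods."""
    visible = (lambda name: True) if include_private else _is_public
    surface: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, _FUNC_TYPES) and visible(node.name):
            surface.append(_signature(node))
        elif isinstance(node, ast.ClassDef) and visible(node.name):
            surface.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, _FUNC_TYPES) and visible(item.name):
                    surface.append("    " + _signature(item))
    return surface


SAMPLE = '''
class Client:
    def __init__(self, url: str) -> None: ...
    def get(self, path: str, timeout: float = 5.0) -> bytes: ...
    def _retry(self): ...

def connect(url: str) -> Client: ...
def _helper(): ...
'''

for line in extract_public_api(SAMPLE):
    print(line)
```

Nothing is embedded or ranked here, so the same input always yields the same output.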

## Python API

```python
from claude_context.skills.evidence_retrieval import retrieve_evidence

results = retrieve_evidence(
    repo_path="/path/to/repo",
    query="how does hybrid search combine BM25 and vector results?",
    limit=10,
    filter_types=["function", "method"],
    filter_paths=["src/auth", "src/config"],
)

for r in results:
    print(f"{r['file_path']}:{r['start_line']} ({r['combined_score']:.3f})")
    print(f" {r['signature']}")
```

```python
from claude_context.skills.grounded_review import grounded_review

report = grounded_review(
    repo_path="/path/to/repo",
    draft_path="docs/getting-started.md",
    max_evidence_per_claim=5,
)
```

```python
from claude_context.skills.api_surface import extract_api_surface

surface = extract_api_surface(
    target_path="src/mypackage/",
    languages=["python"],
    include_private=False,
    include_docstrings=True,
)
```

## Index caching

On first run, code-finder builds an index of the repository (AST chunking + embeddings). This takes 1-3 minutes depending on repo size. The index is cached at:

```
{repo}/.vibe2doc/index.db
```

Subsequent runs reuse the cached index. Pass `--reindex` (CLI) or `reindex=True` (Python) after significant code changes. API surface extraction does not use the index.

## Filtering

### Path filtering

Restrict results to specific directories using `--filter-paths` (CLI) or `filter_paths` (Python). Paths are relative to the repo root:

```bash
code-finder-evidence --repo /path/to/repo --query "auth" --filter-paths src/auth,src/middleware
```
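
The exact matching rule for path filters is not spelled out here. One plausible semantics is component-wise prefix matching, sketched below as an assumption: `src/auth` matches `src/auth/login.py` but not `src/authz/policy.py`. The `matches_path_filter` helper is hypothetical. The comma splitting mirrors the `--filter-paths` value format used in the CLI examples.

```python
from pathlib import PurePosixPath


def matches_path_filter(file_path: str, filter_paths: list[str]) -> bool:
    """Return True if file_path lies under any filter path (component-wise).

    Paths are treated as POSIX-style and relative to the repo root.
    An empty filter list matches everything.
    """
    if not filter_paths:
        return True
    parts = PurePosixPath(file_path).parts
    for prefix in filter_paths:
        prefix_parts = PurePosixPath(prefix).parts
        # Compare whole path components, so "src/auth" does not match "src/authz".
        if parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False


# Parse a comma-separated --filter-paths value.
filters = [p.strip() for p in "src/auth,src/middleware".split(",") if p.strip()]
candidates = ["src/auth/login.py", "src/authz/policy.py", "src/middleware/cors.py", "tests/test_auth.py"]
for candidate in candidates:
    print(candidate, matches_path_filter(candidate, filters))
```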

### Type filtering

Restrict to specific chunk types: `function`, `method`, `class`, `module`, `import`, `decorator`.
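
The sketch below shows where chunk types such as `function`, `method`, and `class` come from. It splits Python source into AST-aligned chunks with the standard-library `ast` module, then keeps only the requested types, in the spirit of `--filter-types function,method`. code-finder itself lists tree-sitter for parsing and also emits `module`, `import`, and `decorator` chunks. The `Chunk` fields and the `chunk_python` and `filter_by_type` helpers are hypothetical.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """A hypothetical AST-aligned chunk of source code."""

    chunk_type: str  # "function", "method", or "class" in this sketch
    name: str
    start_line: int
    end_line: int
    text: str


def chunk_python(source: str) -> list[Chunk]:
    """Emit one chunk per top-level function, class, and method (class chunks include their methods)."""
    chunks: list[Chunk] = []

    def add(node, chunk_type: str) -> None:
        text = ast.get_source_segment(source, node) or ""
        chunks.append(Chunk(chunk_type, node.name, node.lineno, node.end_lineno, text))

    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            add(node, "function")
        elif isinstance(node, ast.ClassDef):
            add(node, "class")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    add(item, "method")
    return chunks


def filter_by_type(chunks: list[Chunk], types: list[str]) -> list[Chunk]:
    """Keep only chunks whose type is in `types`."""
    wanted = set(types)
    return [c for c in chunks if c.chunk_type in wanted]


SOURCE = """\
class Store:
    def save(self, key, value):
        self.data[key] = value

def load_config(path):
    return open(path).read()
"""

picked = filter_by_type(chunk_python(SOURCE), ["function", "method"])
for c in picked:
    print(f"{c.chunk_type:8} {c.name} (lines {c.start_line}-{c.end_line})")
```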

### Language filtering

Restrict results to specific languages: `python`, `javascript`, `typescript`, `go`, and others.

## Supported languages

Python, JavaScript, TypeScript, and Go are AST-parsed via tree-sitter. code-finder also indexes Markdown, JSON, YAML, TOML, HTML, CSS, shell scripts, SQL, and other text formats.

## Used by

[redhat-docs-agent-tools](https://gitlab.cee.redhat.com/ccs-internal-tools/redhat-docs-agent-tools) uses code-finder as the backend for its `code-evidence`, `grounded-review`, and `api-surface` skills. If you're using those skills, code-finder is installed automatically as a dependency.

## Origin

code-finder was built from a fork of [claude-context](https://github.com/zilliztech/claude-context/) by Zilliz, which provides Milvus-backed code search for Claude. It was extended within [vibe2doc](https://gitlab.cee.redhat.com/dobrenna/vibe2doc) with enhanced AST chunking, path filtering, grounded review, and API surface extraction, then extracted as a standalone package. The vibe2doc README describes the full doc generation workflow; this package provides only the code analysis and search layer.

## License

Apache-2.0

@@ -0,0 +1,192 @@
Metadata-Version: 2.4
Name: code-finder
Version: 0.1.1
Summary: Code evidence retrieval and grounded review for documentation workflows. AST chunking, hybrid search (BM25 + vector), and API surface extraction.
License-Expression: Apache-2.0
Keywords: documentation,code analysis,code evidence,semantic search,ast,embeddings
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pymilvus>=2.3.0
Requires-Dist: milvus-lite>=2.3.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: rank-bm25
Requires-Dist: numpy>=1.24.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: tree-sitter
Requires-Dist: tree-sitter-python
Requires-Dist: tree-sitter-javascript
Requires-Dist: tree-sitter-typescript
Requires-Dist: tree-sitter-go
Provides-Extra: synthesis
Requires-Dist: anthropic>=0.34.0; extra == "synthesis"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"

@@ -4,9 +4,9 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "code-finder"
-version = "0.1.
+version = "0.1.1"
 description = "Code evidence retrieval and grounded review for documentation workflows. AST chunking, hybrid search (BM25 + vector), and API surface extraction."
-readme = "
+readme = "CODE_FINDER_README.md"
 requires-python = ">=3.10"
 license = "Apache-2.0"
 keywords = [