java-codebase-rag 0.2.0__py3-none-any.whl → 0.2.2__py3-none-any.whl

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: java-codebase-rag
- Version: 0.2.0
+ Version: 0.2.2
  Summary: MCP server for semantic + structural search over Java codebases
  Author: HumanBean17
  License-Expression: MIT
@@ -18,6 +18,7 @@ Classifier: Topic :: Software Development :: Libraries
  Requires-Python: >=3.11
  Description-Content-Type: text/markdown
  License-File: LICENSE
+ Requires-Dist: cocoindex[lancedb]<2,>=1.0.0a43
  Requires-Dist: kuzu<0.12,>=0.11.3
  Requires-Dist: lancedb<0.31,>=0.25.3
  Requires-Dist: mcp<2,>=1.27.0
@@ -26,6 +27,7 @@ Requires-Dist: pathspec<2,>=1.0.4
  Requires-Dist: pyarrow<24,>=23.0.1
  Requires-Dist: PyYAML<7,>=6.0.3
  Requires-Dist: sentence-transformers<6,>=5.4.0
+ Requires-Dist: transformers<=5.5.3,>=4.48.3
  Requires-Dist: tree-sitter<0.26,>=0.25.2
  Requires-Dist: tree-sitter-java<0.24,>=0.23.5
  Requires-Dist: unidiff<1,>=0.7.3
@@ -50,6 +52,24 @@ For the design rationale, the GPS metaphor, and the full ontology, see [`docs/pa
 
  ---
 
+ ## Why this exists
+
+ Generic code-search tools (grep, ctags, vector-only RAG) hit a ceiling on real Java microservice estates: they find files but lose the structure that makes a Spring/JAX-RS system navigable. This project is built around five choices that target that gap.
+
+ - **Hybrid RAG + GraphRAG, not either-or.** Semantic recall (LanceDB chunk vectors) and structural navigation (Kuzu property graph) are composed in one surface. `search` finds candidate nodes by meaning; `neighbors` walks the exact edge you care about (`CALLS`, `IMPLEMENTS`, `INJECTS`, `DECLARES_ROUTE`, …). The agent picks the right primitive per step instead of being forced into pure-vector or pure-symbol search.
+
+ - **A Java-tuned role model.** Symbols are labelled with stereotypes inferred from Spring and JAX-RS conventions — `CONTROLLER`, `SERVICE`, `REPOSITORY`, `CLIENT`, `PRODUCER`, `MAPPER`, `DTO`. Agents can ask "list controllers" or "who injects this repository" directly, instead of grep-ing for `@RestController` and hoping for the best. Roles drive both filtering (`find` with a `NodeFilter`) and ranking.
+
+ - **Ranking specialized for Java codebases.** The composite ranker is aware of role, microservice, and FQN structure — not a generic BM25. A search for `"chat ingress"` surfaces controllers before utility classes; a search scoped to one microservice doesn't drown in matches from the other 19. Defaults are tuned on the bank-chat fixture and exposed in `docs/CONFIGURATION.md` for per-repo overrides.
+
+ - **Cross-service resolution + system-level navigation.** `HTTP_CALLS` and `ASYNC_CALLS` edges connect Clients and Producers in one microservice to Routes and Handlers in another, resolved at index time from URL/topic strings + Spring `@FeignClient` / `RestTemplate` conventions. `/who-hits-route`, `/trace-request-flow`, and `/impact-of` use these to answer questions a single-service tool fundamentally can't — "who calls this REST endpoint from outside this service", "trace this Kafka message end-to-end", "if I change this DTO, which services break".
+
+ - **Brownfield annotations as a first-class override.** Real Java estates have hand-rolled HTTP clients, dynamic topic names, reflection-heavy routing. `@CodebaseHttpRoute`, `@CodebaseAsyncRoute`, `@CodebaseHttpClient`, and `@CodebaseProducer` let you pin the truth in source. They have **exclusive priority** — when a symbol is annotated, framework-convention inference is skipped entirely. You get a correct graph on legacy code without rewriting it.
+
+ The rest of this README is the install, walkthrough, and tool cheat sheet for putting that to work.
+
+ ---
+
  ## Install
 
  ```bash
@@ -57,6 +77,7 @@ pip install java-codebase-rag
  ```
 
  Python **3.11+** required. After install, `java-codebase-rag --help` should print the CLI groups.
+ The package includes the CocoIndex lifecycle dependency used by `init`, `increment`, `reprocess`, and `erase`.
 
  > **Stability disclaimer.** This package does **not** promise backward compatibility. MCP tool contracts, env vars, Lance/Kuzu schemas, config files, and Python APIs may change without a deprecation period. Track `main` and rebuild indexes when ontology or embedding settings change.
 
@@ -132,9 +153,9 @@ See [`mcp.json.example`](./mcp.json.example) for the same shape in `.mcp.json` (
 
  Pick **one** of two options (not both — they cover the same navigation intents):
 
- 1. **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** (recommended for most) — standalone MCP operating manual. Copy-paste the `BEGIN`/`END` block into your project's `QWEN.md`, `CLAUDE.md`, or `AGENTS.md`. Contains: five-tool reference, `NodeFilter` / edge taxonomy, ontology glossary, recovery playbook, and inline slash-style aliases (`/callers`, `/callees`, `/routes`, etc.) as prompt templates. Self-contained — no external file dependencies.
+ 1. **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** (recommended for most) — standalone MCP operating manual. Copy-paste the `BEGIN`/`END` block into your project's `QWEN.md`, `CLAUDE.md`, or `AGENTS.md`. Contains: five-tool reference, `NodeFilter` / edge taxonomy, ontology glossary, recovery playbook, and navigation patterns. Self-contained — no external file dependencies.
 
- 2. **[`skills/`](./skills/)** (for hosts with skill discovery) — 15 shipped `SKILL.md` files. If your MCP host supports skill discovery (Claude Code, Qwen Code, Cursor), the same navigation intents are available as discoverable `/` commands. Tier 1 = deterministic MCP chains (`/callers`, `/callees`, `/routes`, `/controllers`, `/clients`, `/producers`, `/handlers`, `/who-hits-route`, `/implements`, `/injects`, `/nl`). Tier 2 = bounded workflows (`/explain-feature`, `/impact-of`, `/trace-request-flow`, `/mini-map`). See [`skills/README.md`](./skills/README.md) for the full index.
+ 2. **[`/explore-codebase`](./skills/explore-codebase/SKILL.md)** (for hosts with skill discovery) — single self-contained skill with the complete operating manual. If your MCP host supports skill discovery (Claude Code, Qwen Code, Cursor), load `/explore-codebase` to get the full tool reference, edge taxonomy, decision tree, and recovery playbook in one shot.
 
  Also: **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project.
 
@@ -154,7 +175,7 @@ Full schemas, `NodeFilter` / `EdgeFilter` semantics, and the hints contract live
 
  ### Three-layer architecture
 
- Layer 1 (storage) → Layer 2 (5 MCP tools) → Layer 3 (skills). Navigation skills in [`skills/`](./skills/) wrap the MCP tools into deterministic chains (Tier 1) and bounded workflows (Tier 2). See the [architecture diagram in `skills/README.md`](./skills/README.md#three-layer-architecture).
+ Layer 1 (storage) → Layer 2 (5 MCP tools) → Layer 3 (skill). The [`/explore-codebase`](./skills/explore-codebase/SKILL.md) skill provides the full operating manual for Layer 2. See the [architecture diagram in `skills/README.md`](./skills/README.md#three-layer-architecture).
 
  ---
 
@@ -197,7 +218,7 @@ Run `java-codebase-rag --help` to list grouped subcommands. Operator playbook wi
  | [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md) | Environment variables, project YAML, graph ontology, brownfield overrides, ignore patterns. |
  | [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) | CLI operator playbook: workflows, exit codes, env alignment. |
  | [`docs/EDGE-NAVIGATION.md`](./docs/EDGE-NAVIGATION.md) | MCP-traversable edges, directions, dot-key composition. |
- | [`skills/`](./skills/) | 15 navigation and workflow skills for hosts with skill discovery (alternative to copy-pasting AGENT-GUIDE). See [`skills/README.md`](./skills/README.md). |
+ | [`skills/`](./skills/) | Single `/explore-codebase` skill complete MCP operating manual for hosts with skill discovery (alternative to copy-pasting AGENT-GUIDE). See [`skills/README.md`](./skills/README.md). |
  | [`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md) | 7-phase agent-driven verification after indexing your project. |
  | [`docs/CODEBASE_REQUIREMENTS.md`](./docs/CODEBASE_REQUIREMENTS.md) | Assumptions about your Java repo + per-file edit map for non-conforming codebases. |
  | [`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md) | Optional proposal orchestration workflow (single-command autopilot, planning bundles, automated execution/review loops). |
@@ -214,7 +235,7 @@ python3 -m venv .venv
  .venv/bin/pip install -r requirements.txt
  ```
 
- The `cocoindex` package is **only** needed for lifecycle commands that run the indexer (`init`, `increment`, `reprocess`, `erase`). Search and MCP navigation work without it.
+ The `cocoindex` package powers lifecycle commands that run the indexer (`init`, `increment`, `reprocess`, `erase`). Search and MCP navigation do not invoke it directly.
 
  The default embedding model is `sentence-transformers/all-MiniLM-L6-v2` (downloaded on first `init`). Override via the `EMBEDDING_MODEL` env var — see [`docs/CONFIGURATION.md` §1](./docs/CONFIGURATION.md#1-environment-variables).
 
@@ -19,9 +19,9 @@ java_codebase_rag/cli.py,sha256=hCjlmAXkS80noTX_bxm6BMiLIYEz_P5xfrw9C7LvkBE,2767
  java_codebase_rag/cli_progress.py,sha256=Vtio3RqJ3LkRoNpxrv8iGbEiX4klkTlJX-mR4l6oeBM,1586
  java_codebase_rag/config.py,sha256=h07zJrV8QoLv9hIhJZ2JgUI0Rh6uPBZUiPkGDEmTg_w,11687
  java_codebase_rag/pipeline.py,sha256=QyKNCrBsjdFU71N9Xygti-DdtMQQsrZ8aySisux46lI,5311
- java_codebase_rag-0.2.0.dist-info/licenses/LICENSE,sha256=gxvtiHtuviR_q8ZAjWw-QTcF3DyPzg6ZY-lQrr8OPpw,1068
- java_codebase_rag-0.2.0.dist-info/METADATA,sha256=NviBdzC9KvG34qsD-GJP-0EotAPWe4kpp8yagOBdRnc,12560
- java_codebase_rag-0.2.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
- java_codebase_rag-0.2.0.dist-info/entry_points.txt,sha256=mVVQJa0n73OWfhHXYCDoPRrWin_LJhH2Rn0CkJ2iax4,101
- java_codebase_rag-0.2.0.dist-info/top_level.txt,sha256=5aIYoMkvJvvfXvf4iHn2OeSIM7PZXP-0j94eNESnwMw,242
- java_codebase_rag-0.2.0.dist-info/RECORD,,
+ java_codebase_rag-0.2.2.dist-info/licenses/LICENSE,sha256=gxvtiHtuviR_q8ZAjWw-QTcF3DyPzg6ZY-lQrr8OPpw,1068
+ java_codebase_rag-0.2.2.dist-info/METADATA,sha256=VWpfMNxxjvuY2x-rJviWa4pv-OlkX7R93ew-IkFyzjM,15112
+ java_codebase_rag-0.2.2.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
+ java_codebase_rag-0.2.2.dist-info/entry_points.txt,sha256=mVVQJa0n73OWfhHXYCDoPRrWin_LJhH2Rn0CkJ2iax4,101
+ java_codebase_rag-0.2.2.dist-info/top_level.txt,sha256=5aIYoMkvJvvfXvf4iHn2OeSIM7PZXP-0j94eNESnwMw,242
+ java_codebase_rag-0.2.2.dist-info/RECORD,,