codetex 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- codetex-0.1.0/.github/workflows/workflow.yml +69 -0
- codetex-0.1.0/.gitignore +7 -0
- codetex-0.1.0/CLAUDE.md +64 -0
- codetex-0.1.0/PKG-INFO +20 -0
- codetex-0.1.0/README.md +103 -0
- codetex-0.1.0/docs/superpowers/plans/2026-04-08-claude-code-provider.md +788 -0
- codetex-0.1.0/docs/superpowers/specs/2026-04-08-claude-code-provider-design.md +96 -0
- codetex-0.1.0/pyproject.toml +56 -0
- codetex-0.1.0/src/codetex/__init__.py +3 -0
- codetex-0.1.0/src/codetex/claude_code.py +69 -0
- codetex-0.1.0/src/codetex/cli.py +149 -0
- codetex-0.1.0/src/codetex/git.py +97 -0
- codetex-0.1.0/src/codetex/ignore.py +83 -0
- codetex-0.1.0/src/codetex/indexer.py +344 -0
- codetex-0.1.0/src/codetex/llm.py +174 -0
- codetex-0.1.0/src/codetex/markdown.py +68 -0
- codetex-0.1.0/src/codetex/models.py +63 -0
- codetex-0.1.0/src/codetex/parser.py +645 -0
- codetex-0.1.0/src/codetex/provider.py +29 -0
- codetex-0.1.0/src/codetex/server.py +86 -0
- codetex-0.1.0/src/codetex/state.py +108 -0
- codetex-0.1.0/tests/__init__.py +0 -0
- codetex-0.1.0/tests/test_claude_code.py +174 -0
- codetex-0.1.0/tests/test_git.py +90 -0
- codetex-0.1.0/tests/test_ignore.py +51 -0
- codetex-0.1.0/tests/test_indexer.py +192 -0
- codetex-0.1.0/tests/test_llm.py +62 -0
- codetex-0.1.0/tests/test_markdown.py +63 -0
- codetex-0.1.0/tests/test_models.py +48 -0
- codetex-0.1.0/tests/test_parser.py +113 -0
- codetex-0.1.0/tests/test_provider.py +16 -0
- codetex-0.1.0/tests/test_state.py +85 -0
- codetex-0.1.0/uv.lock +1979 -0
codetex-0.1.0/.github/workflows/workflow.yml
ADDED
@@ -0,0 +1,69 @@
+name: Publish to PyPI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  release:
+    types: [published]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Install dependencies
+        run: uv sync
+
+      - name: Lint
+        run: uv run ruff check src/ tests/
+
+      - name: Type check
+        run: uv run mypy src/
+
+      - name: Test
+        run: uv run pytest
+
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Build package
+        run: uv build
+
+      - name: Upload dist artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+  publish:
+    needs: [test, build]
+    runs-on: ubuntu-latest
+    environment: release
+    permissions:
+      id-token: write
+    steps:
+      - name: Download dist artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
codetex-0.1.0/.gitignore
ADDED
codetex-0.1.0/CLAUDE.md
ADDED
@@ -0,0 +1,64 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**codetex** is a standalone CLI/MCP tool that indexes Git repositories and generates a single `.codetex/SUMMARY.md` file with architecture overviews and per-file summaries optimized for LLM agent consumption. Supports incremental updates via git diff.
+
+## Commands
+
+```bash
+uv sync  # install dependencies
+uv run pytest  # run all tests
+uv run pytest tests/test_git.py  # run one test file
+uv run pytest tests/test_git.py::test_is_git_repo_true  # run single test
+uv run ruff check src/ tests/  # lint
+uv run ruff format src/ tests/  # format
+uv run mypy src/  # type check (strict mode)
+uv run codetex index <repo-path> [--folder <path>] [--dry-run] [--force] [--provider claude-code|anthropic] [--max-concurrent N]
+uv run codetex serve  # start MCP server
+```
+
+## Architecture
+
+Single-command CLI (Typer) + MCP server (FastMCP). Both entry points call the same `index()` async function in `indexer.py`.
+
+```
+CLI (cli.py)  ──┐
+                ├──▶ indexer.py ─── index() decides mode:
+MCP (server.py)─┘                   │
+                      _full_index() or _incremental_sync()
+                            │                     │
+                 git.list_tracked_files  git.diff_name_status
+                            │                     │
+                   IgnoreFilter.filter   filter changed files
+                            │                     │
+                      parse_file() ◄──────────────┘
+                            │
+            LLM Tier 2: batch summarize files
+                            │
+        LLM Tier 1: single call for repo overview
+                            │
+         write .codetex/SUMMARY.md + state.json
+```
+
+**LLM provider abstraction:** `LLMProviderBase` protocol in `provider.py` with two implementations: `AnthropicProvider` (SDK-based, in `llm.py`) and `ClaudeCodeProvider` (shells out to `claude -p`, in `claude_code.py`). Default is `claude-code`. The indexer receives a provider instance — it has no knowledge of which backend is in use.
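The provider abstraction described in that paragraph can be illustrated with a small sketch. The contents of `provider.py` are not shown in this diff, so `SummaryProvider`, its `complete()` method, and `EchoProvider` are hypothetical stand-ins rather than the package's real `LLMProviderBase` interface. The sketch only shows a caller that depends on a protocol instead of a concrete backend.

```python
import asyncio
from typing import Protocol


class SummaryProvider(Protocol):
    """Hypothetical stand-in for the LLMProviderBase protocol."""

    async def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Fake backend for illustration; it satisfies the protocol structurally."""

    async def complete(self, prompt: str) -> str:
        # Returns the first 20 characters of the prompt instead of calling an LLM.
        return f"SUMMARY: {prompt[:20]}"


async def summarize(provider: SummaryProvider, text: str) -> str:
    # The caller depends only on the protocol, never on a concrete backend.
    return await provider.complete(text)


result = asyncio.run(summarize(EchoProvider(), "def main(): pass"))
```

Swapping `EchoProvider` for another class with the same `complete()` signature would leave `summarize()` unchanged, which is the property the paragraph attributes to the indexer.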
+
+**Two-tier LLM strategy:** Tier 2 summarizes individual files (batched, concurrency-limited via semaphore). Tier 1 generates a single repo-level overview from all file summaries. Both tiers run on every index, even incremental (Tier 1 rebuilds from merged state).
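A minimal sketch of the two-tier flow, assuming an `asyncio.Semaphore` is what caps Tier 2 concurrency. The function names `summarize_file` and `run_tiers` are hypothetical, and both LLM calls are faked so the example runs offline. It is not the code in `indexer.py`.

```python
import asyncio


async def summarize_file(path: str, limit: asyncio.Semaphore) -> str:
    # Hypothetical Tier 2 call: the semaphore caps in-flight requests.
    async with limit:
        await asyncio.sleep(0)  # stands in for the LLM round trip
        return f"{path}: summary"


async def run_tiers(paths: list[str], max_concurrent: int = 2) -> str:
    limit = asyncio.Semaphore(max_concurrent)
    # Tier 2: one summary per file; gather returns results in input order.
    file_summaries = await asyncio.gather(
        *(summarize_file(p, limit) for p in paths)
    )
    # Tier 1: a single call over all file summaries (faked here as a join).
    return "OVERVIEW\n" + "\n".join(file_summaries)


overview = asyncio.run(run_tiers(["a.py", "b.py", "c.py"]))
```

Because `asyncio.gather` preserves input order, the Tier 1 input is deterministic even though the Tier 2 calls may complete in any order.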
+
+**Incremental sync:** Anchored on commit SHA in `state.json`. Only changed files get re-parsed and re-summarized; old summaries are merged via `state.merge_state()`.
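The merge step can be pictured as a dictionary overlay. The real `state.merge_state()` signature is not visible in this diff, so `merge_summaries` and its plain `dict[str, str]` state are assumptions made only for illustration.

```python
def merge_summaries(
    old: dict[str, str],
    new: dict[str, str],
    deleted: set[str],
) -> dict[str, str]:
    """Hypothetical merge: keep old entries, drop deleted paths, overlay new ones."""
    merged = {path: text for path, text in old.items() if path not in deleted}
    merged.update(new)
    return merged


previous = {"a.py": "old a", "b.py": "old b", "c.py": "old c"}
merged = merge_summaries(previous, {"a.py": "new a", "d.py": "new d"}, {"b.py"})
```

Here `a.py` is replaced, `b.py` is dropped, `c.py` is carried over untouched, and `d.py` is added, so only two files would need a fresh LLM call.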
+
+**Parser two-path strategy:** Tree-sitter for supported languages (Python, JS/TS, Go, Rust, Java, C/C++), regex fallback otherwise. Python gets full treatment (params, docstrings, base classes); other languages get name + first-line signature. Grammars and tiktoken encoder are lazy-loaded and cached at module level.
+
+**Ignore filtering:** Uses `pathspec` library (gitignore semantics). Merges built-in excludes + `.gitignore` + `.codetexignore`. Also rejects files >2MB or with binary content (null bytes in first 8KB).
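The size and binary checks from that paragraph can be sketched with the standard library alone. The `pathspec` pattern matching is left out here, `should_skip` is a hypothetical name, and reading "2MB" as 2 MiB is an assumption.

```python
from pathlib import Path
import tempfile

MAX_BYTES = 2 * 1024 * 1024  # size cap from the text, read as 2 MiB (assumption)
SNIFF_BYTES = 8 * 1024       # only the first 8KB is inspected for null bytes


def should_skip(path: Path) -> bool:
    """Return True for oversized files or files that look binary."""
    if path.stat().st_size > MAX_BYTES:
        return True
    with path.open("rb") as handle:
        return b"\x00" in handle.read(SNIFF_BYTES)


with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "a.py"
    text_file.write_text("print('hi')\n")
    binary_file = Path(tmp) / "a.bin"
    binary_file.write_bytes(b"\x89PNG\x00\x01")
    results = (should_skip(text_file), should_skip(binary_file))
```

Checking the size first avoids opening very large files at all, and limiting the read to a fixed prefix keeps the binary check cheap.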
+
+## Conventions
+
+- All pipeline code is async; CLI wraps with `asyncio.run()`
+- No SQLite, no embeddings, no vector search
+- State tracked in `.codetex/state.json`, output in `.codetex/SUMMARY.md`
+- Data models are plain dataclasses in `models.py` (no logic)
+- `asyncio_mode = "strict"` in pytest — async tests require explicit `@pytest.mark.asyncio`
+- Tests use `tmp_path` and inline `subprocess.run()` for git setup — no mocking library
+- LLM responses parsed by line prefix (`ROLE:`, `SUMMARY:`, `INTERFACES:`, `DEPENDENCIES:`)
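The last convention in the list above, parsing LLM responses by line prefix, could look roughly like this. The four prefixes come from the text. The function name `parse_response`, the choice to ignore unknown lines, and the single-line values are assumptions; the package's actual parser may handle multi-line fields differently.

```python
PREFIXES = ("ROLE", "SUMMARY", "INTERFACES", "DEPENDENCIES")


def parse_response(text: str) -> dict[str, str]:
    """Collect the value after each known prefix; other lines are ignored."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in PREFIXES:
            fields[key.strip()] = value.strip()
    return fields


reply = "ROLE: CLI entry point\nSUMMARY: Parses flags.\nnoise\nDEPENDENCIES: typer"
parsed = parse_response(reply)
```

A prefix-based format like this tolerates extra chatter in the model output, since any line without a recognized prefix is simply skipped.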
codetex-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,20 @@
+Metadata-Version: 2.4
+Name: codetex
+Version: 0.1.0
+Summary: LLM-friendly repo summarizer — indexes codebases into markdown summaries with incremental git-diff updates
+Requires-Python: >=3.12
+Requires-Dist: anthropic>=0.40
+Requires-Dist: fastmcp>=2.0
+Requires-Dist: pathspec>=0.12
+Requires-Dist: rich>=13.0
+Requires-Dist: tiktoken>=0.7
+Requires-Dist: tree-sitter-c
+Requires-Dist: tree-sitter-cpp
+Requires-Dist: tree-sitter-go
+Requires-Dist: tree-sitter-java
+Requires-Dist: tree-sitter-javascript
+Requires-Dist: tree-sitter-python
+Requires-Dist: tree-sitter-rust
+Requires-Dist: tree-sitter-typescript
+Requires-Dist: tree-sitter>=0.24
+Requires-Dist: typer>=0.9
codetex-0.1.0/README.md
ADDED
@@ -0,0 +1,103 @@
+# codetex
+
+LLM-friendly repo summarizer — indexes codebases into a single markdown summary with incremental git-diff updates.
+
+codetex scans a Git repository, extracts code structure using tree-sitter (with regex fallback), and uses a two-tier LLM pipeline to generate a `.codetex/SUMMARY.md` file optimized for LLM agent consumption. Subsequent runs only re-process changed files.
+
+## Installation
+
+Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/).
+
+```bash
+git clone <repo-url>
+cd codetex
+uv sync
+```
+
+You'll need an `ANTHROPIC_API_KEY` environment variable set for LLM summarization.
+
+## Usage
+
+### CLI
+
+```bash
+# Index a repository (generates .codetex/SUMMARY.md)
+codetex index <repo-path>
+
+# Index only a subfolder
+codetex index <repo-path> --folder src/
+
+# Preview what would be indexed without calling the LLM
+codetex index <repo-path> --dry-run
+
+# Force full re-index (ignore incremental state)
+codetex index <repo-path> --force
+```
+
+On subsequent runs, codetex detects the previous commit SHA in `.codetex/state.json` and only re-summarizes files that changed — saving time and tokens.
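One way to work out which files changed since the stored SHA is to parse the output of `git diff --name-status`, which prints a status letter and one or two tab-separated paths per line. The sketch below assumes that approach. `parse_name_status` is a hypothetical helper, not the package's `git.diff_name_status`, and the sample output is hand-written rather than captured from a real repository.

```python
def parse_name_status(output: str) -> dict[str, list[str]]:
    """Group paths from `git diff --name-status` output by what needs doing."""
    changes: dict[str, list[str]] = {"changed": [], "deleted": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status.startswith("D"):
            changes["deleted"].append(paths[0])
        elif status.startswith(("R", "C")):
            # Renames and copies list the old path first, then the new path.
            if status.startswith("R"):
                changes["deleted"].append(paths[0])
            changes["changed"].append(paths[1])
        else:
            changes["changed"].append(paths[0])
    return changes


sample = "M\tsrc/a.py\nA\tsrc/b.py\nD\tsrc/c.py\nR100\tsrc/old.py\tsrc/new.py\n"
changes = parse_name_status(sample)
```

Treating a rename as a deletion of the old path plus a change to the new one means the stale summary is dropped and the file is re-summarized under its new name.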
+
+### MCP Server
+
+```bash
+codetex serve
+```
+
+Starts a [Model Context Protocol](https://modelcontextprotocol.io/) server over stdio, exposing two tools:
+
+- **`index_repo(path, folder?)`** — index a repository
+- **`get_summary(path)`** — retrieve the generated summary
+
+## How It Works
+
+```
+CLI / MCP
+    │
+    ▼
+indexer.index() ─── decides full vs incremental
+    │
+    ├── git.list_tracked_files / git.diff_name_status
+    ├── IgnoreFilter (.gitignore + .codetexignore + defaults)
+    ├── parser.parse_file (tree-sitter or regex fallback)
+    │
+    ├── Tier 2: batch-summarize individual files (concurrent, semaphore-limited)
+    ├── Tier 1: generate repo-level overview from all file summaries
+    │
+    └── write .codetex/SUMMARY.md + state.json
+```
+
+**Tier 2** summarizes each file independently (role, summary, interfaces, dependencies). **Tier 1** takes all file summaries and produces a single architectural overview. Both tiers run on every index — incremental runs merge old summaries with new ones before rebuilding Tier 1.
+
+### Supported Languages
+
+Tree-sitter parsing (full extraction — functions, classes, imports, parameters):
+Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
+
+Regex fallback (symbol names + import detection):
+All other text files
+
+### Ignore Rules
+
+Files are excluded by: built-in defaults (`.git`, `node_modules`, `__pycache__`, etc.), `.gitignore` patterns, `.codetexignore` patterns, size >2MB, or binary content.
+
+To exclude additional files from indexing, create a `.codetexignore` file in the repository root. It uses the same syntax as `.gitignore`:
+
+```
+# .codetexignore
+docs/
+*.generated.ts
+vendor/
+```
+
+## Development
+
+```bash
+uv sync  # install dependencies
+uv run pytest  # run tests
+uv run ruff check src/ tests/  # lint
+uv run ruff format src/ tests/  # format
+uv run mypy src/  # type check (strict)
+```
+
+## License
+
+MIT