codetex 0.1.0__tar.gz

@@ -0,0 +1,69 @@
+ name: Publish to PyPI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+   release:
+     types: [published]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Install dependencies
+         run: uv sync
+
+       - name: Lint
+         run: uv run ruff check src/ tests/
+
+       - name: Type check
+         run: uv run mypy src/
+
+       - name: Test
+         run: uv run pytest
+
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Build package
+         run: uv build
+
+       - name: Upload dist artifacts
+         uses: actions/upload-artifact@v4
+         with:
+           name: dist
+           path: dist/
+
+   publish:
+     needs: [test, build]
+     runs-on: ubuntu-latest
+     environment: release
+     permissions:
+       id-token: write
+     steps:
+       - name: Download dist artifacts
+         uses: actions/download-artifact@v4
+         with:
+           name: dist
+           path: dist/
+
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,7 @@
+ .codetex/
+ __pycache__/
+ *.pyc
+ .mypy_cache/
+ .ruff_cache/
+ .pytest_cache/
+ .superpowers/
@@ -0,0 +1,64 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ **codetex** is a standalone CLI/MCP tool that indexes Git repositories and generates a single `.codetex/SUMMARY.md` file with architecture overviews and per-file summaries optimized for LLM agent consumption. Supports incremental updates via git diff.
+
+ ## Commands
+
+ ```bash
+ uv sync # install dependencies
+ uv run pytest # run all tests
+ uv run pytest tests/test_git.py # run one test file
+ uv run pytest tests/test_git.py::test_is_git_repo_true # run single test
+ uv run ruff check src/ tests/ # lint
+ uv run ruff format src/ tests/ # format
+ uv run mypy src/ # type check (strict mode)
+ uv run codetex index <repo-path> [--folder <path>] [--dry-run] [--force] [--provider claude-code|anthropic] [--max-concurrent N]
+ uv run codetex serve # start MCP server
+ ```
+
+ ## Architecture
+
+ Single-command CLI (Typer) + MCP server (FastMCP). Both entry points call the same `index()` async function in `indexer.py`.
+
+ ```
+ CLI (cli.py) ──┐
+ ├──▶ indexer.py ─── index() decides mode:
+ MCP (server.py)─┘ │
+ _full_index() or _incremental_sync()
+ │ │
+ git.list_tracked_files git.diff_name_status
+ │ │
+ IgnoreFilter.filter filter changed files
+ │ │
+ parse_file() ◄────────────┘
+
+ LLM Tier 2: batch summarize files
+
+ LLM Tier 1: single call for repo overview
+
+ write .codetex/SUMMARY.md + state.json
+ ```
+
+ **LLM provider abstraction:** `LLMProviderBase` protocol in `provider.py` with two implementations: `AnthropicProvider` (SDK-based, in `llm.py`) and `ClaudeCodeProvider` (shells out to `claude -p`, in `claude_code.py`). Default is `claude-code`. The indexer receives a provider instance — it has no knowledge of which backend is in use.
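Reviewer note: the provider abstraction described above can be pictured with a structural `typing.Protocol`. This is only a sketch under assumptions. The names `LLMProvider`, `EchoProvider`, `complete`, and `summarize` are hypothetical and are not taken from `provider.py`, whose real method names are not shown in this diff.

```python
import asyncio
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical stand-in for the LLMProviderBase protocol."""

    async def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Toy backend for illustration only; it upper-cases the prompt."""

    async def complete(self, prompt: str) -> str:
        return prompt.upper()


async def summarize(provider: LLMProvider, text: str) -> str:
    # The caller depends only on the protocol, never on a concrete backend.
    return await provider.complete(f"summarize: {text}")


result = asyncio.run(summarize(EchoProvider(), "hello"))
print(result)
```

Because the protocol is structural, a backend needs no shared base class; any object with a matching `complete` coroutine can be passed in.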
+
+ **Two-tier LLM strategy:** Tier 2 summarizes individual files (batched, concurrency-limited via semaphore). Tier 1 generates a single repo-level overview from all file summaries. Both tiers run on every index, even incremental (Tier 1 rebuilds from merged state).
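Reviewer note: a minimal sketch of semaphore-limited fan-out for the Tier 2 step, using only `asyncio`. The helper names (`summarize_all`, `fake_summarize`) and the `stats` counter are hypothetical; the package's actual batching logic is not visible in this diff.

```python
import asyncio


async def summarize_all(paths, summarize_one, max_concurrent=4):
    """Run summarize_one over paths with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:  # blocks once max_concurrent tasks hold a slot
            return path, await summarize_one(path)

    pairs = await asyncio.gather(*(guarded(p) for p in paths))
    return dict(pairs)


# Fake summarizer that records the peak number of concurrent calls.
stats = {"in_flight": 0, "peak": 0}


async def fake_summarize(path):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # stands in for an LLM round trip
    stats["in_flight"] -= 1
    return f"summary of {path}"


files = [f"f{i}.py" for i in range(10)]
summaries = asyncio.run(summarize_all(files, fake_summarize, max_concurrent=3))
print(len(summaries), stats["peak"])
```

The semaphore caps in-flight calls without limiting how many tasks are created, which keeps the gather call simple.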
+
+ **Incremental sync:** Anchored on commit SHA in `state.json`. Only changed files get re-parsed and re-summarized; old summaries are merged via `state.merge_state()`.
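Reviewer note: the incremental path can be sketched as two small steps: classify the lines of `git diff --name-status` output, then merge fresh summaries into the stored ones. Both functions below (`parse_name_status`, `merge_summaries`) are hypothetical illustrations and are not the package's `git.diff_name_status` or `state.merge_state()`. Treating a rename as "delete old path, re-summarize new path" is an assumption.

```python
def parse_name_status(output: str) -> dict[str, list[str]]:
    """Split `git diff --name-status` output into changed and deleted paths."""
    changed: list[str] = []
    deleted: list[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("D"):
            deleted.append(fields[1])
        elif status.startswith(("R", "C")):
            # Renames and copies list the old path first, then the new path.
            if status.startswith("R"):
                deleted.append(fields[1])
            changed.append(fields[2])
        else:  # A (added), M (modified), T (type change), ...
            changed.append(fields[1])
    return {"changed": changed, "deleted": deleted}


def merge_summaries(old: dict, new: dict, deleted: list[str]) -> dict:
    """Keep untouched summaries, drop deleted paths, overlay fresh ones."""
    gone = set(deleted)
    merged = {path: s for path, s in old.items() if path not in gone}
    merged.update(new)
    return merged


diff_output = "M\tsrc/a.py\nA\tsrc/b.py\nD\tsrc/c.py\nR100\tsrc/old.py\tsrc/new.py\n"
delta = parse_name_status(diff_output)
old = {"src/a.py": "old a", "src/c.py": "old c",
       "src/old.py": "old", "src/keep.py": "keep"}
new = {path: f"fresh {path}" for path in delta["changed"]}
merged = merge_summaries(old, new, delta["deleted"])
print(sorted(merged))
```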
+
+ **Parser two-path strategy:** Tree-sitter for supported languages (Python, JS/TS, Go, Rust, Java, C/C++), regex fallback otherwise. Python gets full treatment (params, docstrings, base classes); other languages get name + first-line signature. Grammars and tiktoken encoder are lazy-loaded and cached at module level.
+
+ **Ignore filtering:** Uses `pathspec` library (gitignore semantics). Merges built-in excludes + `.gitignore` + `.codetexignore`. Also rejects files >2MB or with binary content (null bytes in first 8KB).
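Reviewer note: the size and binary checks mentioned above can be sketched with the standard library alone. The function name `is_indexable` and both constants are hypothetical, and reading "2MB" as 2 × 1024 × 1024 bytes is an assumption. The `pathspec` pattern matching is deliberately left out.

```python
import tempfile
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024  # assumed meaning of the ">2MB" cap
SNIFF_BYTES = 8 * 1024       # only the first 8 KB is inspected for null bytes


def is_indexable(path: Path) -> bool:
    """Return True when the file is small enough and looks like text."""
    if path.stat().st_size > MAX_BYTES:
        return False
    with path.open("rb") as handle:
        head = handle.read(SNIFF_BYTES)
    return b"\x00" not in head


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ok.py").write_text("print('hi')\n")
    (root / "blob.bin").write_bytes(b"\x89PNG\x00\x00")
    (root / "huge.txt").write_bytes(b"a" * (MAX_BYTES + 1))
    results = {p.name: is_indexable(p) for p in sorted(root.iterdir())}

print(results)
```

Checking the size first avoids opening large files at all, and reading a fixed-size prefix keeps the binary sniff cheap.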
+
+ ## Conventions
+
+ - All pipeline code is async; CLI wraps with `asyncio.run()`
+ - No SQLite, no embeddings, no vector search
+ - State tracked in `.codetex/state.json`, output in `.codetex/SUMMARY.md`
+ - Data models are plain dataclasses in `models.py` (no logic)
+ - `asyncio_mode = "strict"` in pytest — async tests require explicit `@pytest.mark.asyncio`
+ - Tests use `tmp_path` and inline `subprocess.run()` for git setup — no mocking library
+ - LLM responses parsed by line prefix (`ROLE:`, `SUMMARY:`, `INTERFACES:`, `DEPENDENCIES:`)
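Reviewer note: the line-prefix convention in the last bullet could be parsed roughly as follows. This is a sketch, not the package's parser. The function name `parse_response` is hypothetical, and folding unprefixed continuation lines into the preceding field is an assumption the diff neither confirms nor rules out.

```python
PREFIXES = ("ROLE", "SUMMARY", "INTERFACES", "DEPENDENCIES")


def parse_response(text: str) -> dict[str, str]:
    """Collect prefixed fields from an LLM reply into a lowercase-keyed dict."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        for prefix in PREFIXES:
            if stripped.startswith(prefix + ":"):
                current = prefix.lower()
                fields[current] = stripped[len(prefix) + 1:].strip()
                break
        else:
            # Assumption: an unprefixed, non-empty line continues the last field.
            if current is not None and stripped:
                fields[current] = (fields[current] + " " + stripped).strip()
    return fields


reply = (
    "ROLE: Git helpers\n"
    "SUMMARY: Wraps git subprocess calls\n"
    "used by the indexer.\n"
    "INTERFACES: is_git_repo, list_tracked_files\n"
    "DEPENDENCIES: subprocess\n"
)
parsed = parse_response(reply)
print(parsed)
```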
codetex-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,20 @@
+ Metadata-Version: 2.4
+ Name: codetex
+ Version: 0.1.0
+ Summary: LLM-friendly repo summarizer — indexes codebases into markdown summaries with incremental git-diff updates
+ Requires-Python: >=3.12
+ Requires-Dist: anthropic>=0.40
+ Requires-Dist: fastmcp>=2.0
+ Requires-Dist: pathspec>=0.12
+ Requires-Dist: rich>=13.0
+ Requires-Dist: tiktoken>=0.7
+ Requires-Dist: tree-sitter-c
+ Requires-Dist: tree-sitter-cpp
+ Requires-Dist: tree-sitter-go
+ Requires-Dist: tree-sitter-java
+ Requires-Dist: tree-sitter-javascript
+ Requires-Dist: tree-sitter-python
+ Requires-Dist: tree-sitter-rust
+ Requires-Dist: tree-sitter-typescript
+ Requires-Dist: tree-sitter>=0.24
+ Requires-Dist: typer>=0.9
@@ -0,0 +1,103 @@
+ # codetex
+
+ LLM-friendly repo summarizer — indexes codebases into a single markdown summary with incremental git-diff updates.
+
+ codetex scans a Git repository, extracts code structure using tree-sitter (with regex fallback), and uses a two-tier LLM pipeline to generate a `.codetex/SUMMARY.md` file optimized for LLM agent consumption. Subsequent runs only re-process changed files.
+
+ ## Installation
+
+ Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/).
+
+ ```bash
+ git clone <repo-url>
+ cd codetex
+ uv sync
+ ```
+
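+ You'll need an `ANTHROPIC_API_KEY` environment variable set for LLM summarization when using the `anthropic` provider; the default `claude-code` provider shells out to the `claude` CLI instead.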
+
+ ## Usage
+
+ ### CLI
+
+ ```bash
+ # Index a repository (generates .codetex/SUMMARY.md)
+ codetex index <repo-path>
+
+ # Index only a subfolder
+ codetex index <repo-path> --folder src/
+
+ # Preview what would be indexed without calling the LLM
+ codetex index <repo-path> --dry-run
+
+ # Force full re-index (ignore incremental state)
+ codetex index <repo-path> --force
+ ```
+
+ On subsequent runs, codetex detects the previous commit SHA in `.codetex/state.json` and only re-summarizes files that changed — saving time and tokens.
+
+ ### MCP Server
+
+ ```bash
+ codetex serve
+ ```
+
+ Starts a [Model Context Protocol](https://modelcontextprotocol.io/) server over stdio, exposing two tools:
+
+ - **`index_repo(path, folder?)`** — index a repository
+ - **`get_summary(path)`** — retrieve the generated summary
+
+ ## How It Works
+
+ ```
+ CLI / MCP
+
+
+ indexer.index() ─── decides full vs incremental
+
+ ├── git.list_tracked_files / git.diff_name_status
+ ├── IgnoreFilter (.gitignore + .codetexignore + defaults)
+ ├── parser.parse_file (tree-sitter or regex fallback)
+
+ ├── Tier 2: batch-summarize individual files (concurrent, semaphore-limited)
+ ├── Tier 1: generate repo-level overview from all file summaries
+
+ └── write .codetex/SUMMARY.md + state.json
+ ```
+
+ **Tier 2** summarizes each file independently (role, summary, interfaces, dependencies). **Tier 1** takes all file summaries and produces a single architectural overview. Both tiers run on every index — incremental runs merge old summaries with new ones before rebuilding Tier 1.
+
+ ### Supported Languages
+
+ Tree-sitter parsing (full extraction — functions, classes, imports, parameters):
+ Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
+
+ Regex fallback (symbol names + import detection):
+ All other text files
+
+ ### Ignore Rules
+
+ Files are excluded by: built-in defaults (`.git`, `node_modules`, `__pycache__`, etc.), `.gitignore` patterns, `.codetexignore` patterns, size >2MB, or binary content.
+
+ To exclude additional files from indexing, create a `.codetexignore` file in the repository root. It uses the same syntax as `.gitignore`:
+
+ ```
+ # .codetexignore
+ docs/
+ *.generated.ts
+ vendor/
+ ```
+
+ ## Development
+
+ ```bash
+ uv sync # install dependencies
+ uv run pytest # run tests
+ uv run ruff check src/ tests/ # lint
+ uv run ruff format src/ tests/ # format
+ uv run mypy src/ # type check (strict)
+ ```
+
+ ## License
+
+ MIT