codectx 0.1.2__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. codectx-0.2.0/.gitignore +44 -0
  2. codectx-0.2.0/ARCHITECTURE.md +251 -0
  3. codectx-0.2.0/CONTEXT.md +1055 -0
  4. codectx-0.2.0/DECISIONS.md +261 -0
  5. codectx-0.2.0/LICENSE +21 -0
  6. codectx-0.2.0/PKG-INFO +252 -0
  7. codectx-0.2.0/PLAN.md +145 -0
  8. codectx-0.2.0/README.md +198 -0
  9. codectx-0.2.0/benchmark.png +0 -0
  10. codectx-0.2.0/docs/astro.config.mjs +73 -0
  11. codectx-0.2.0/docs/build_output.txt +381 -0
  12. codectx-0.2.0/docs/bun.lock +1470 -0
  13. codectx-0.2.0/docs/package.json +25 -0
  14. codectx-0.2.0/docs/src/content/docs/advanced/dependency-graph.md +22 -0
  15. codectx-0.2.0/docs/src/content/docs/advanced/ranking-system.md +40 -0
  16. codectx-0.2.0/docs/src/content/docs/advanced/token-compression.md +26 -0
  17. codectx-0.2.0/docs/src/content/docs/community/contributing.md +51 -0
  18. codectx-0.2.0/docs/src/content/docs/community/faq.md +22 -0
  19. codectx-0.2.0/docs/src/content/docs/comparison.md +30 -0
  20. codectx-0.2.0/docs/src/content/docs/getting-started/basic-usage.md +62 -0
  21. codectx-0.2.0/docs/src/content/docs/getting-started/installation.md +61 -0
  22. codectx-0.2.0/docs/src/content/docs/getting-started/quick-start.mdx +43 -0
  23. codectx-0.2.0/docs/src/content/docs/guides/best-practices.md +33 -0
  24. codectx-0.2.0/docs/src/content/docs/guides/configuration.md +52 -0
  25. codectx-0.2.0/docs/src/content/docs/guides/using-context-effectively.md +33 -0
  26. codectx-0.2.0/docs/src/content/docs/index.mdx +31 -0
  27. codectx-0.2.0/docs/src/content/docs/introduction/what-is-codectx.md +21 -0
  28. codectx-0.2.0/docs/src/content/docs/introduction/why-it-exists.md +19 -0
  29. codectx-0.2.0/docs/src/content/docs/reference/architecture-overview.md +32 -0
  30. codectx-0.2.0/docs/src/content/docs/reference/cli-reference.md +115 -0
  31. codectx-0.2.0/docs/src/content.config.ts +7 -0
  32. codectx-0.2.0/docs/src/env.d.ts +2 -0
  33. codectx-0.2.0/docs/src/styles/custom.css +18 -0
  34. codectx-0.2.0/docs/tsconfig.json +9 -0
  35. {codectx-0.1.2 → codectx-0.2.0}/pyproject.toml +3 -3
  36. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/__init__.py +1 -1
  37. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/cli.py +33 -35
  38. codectx-0.2.0/src/codectx/compressor/tiered.py +705 -0
  39. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/config/defaults.py +28 -24
  40. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/graph/builder.py +33 -26
  41. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/graph/resolver.py +1 -1
  42. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/output/formatter.py +94 -43
  43. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/output/sections.py +2 -0
  44. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/languages.py +5 -2
  45. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/treesitter.py +6 -5
  46. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/ranker/git_meta.py +3 -2
  47. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/ranker/scorer.py +40 -11
  48. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/ranker/semantic.py +1 -1
  49. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/walker.py +4 -0
  50. {codectx-0.1.2 → codectx-0.2.0}/tests/test_compressor.py +19 -2
  51. {codectx-0.1.2 → codectx-0.2.0}/tests/test_integration.py +2 -3
  52. {codectx-0.1.2 → codectx-0.2.0}/tests/test_scorer.py +71 -0
  53. {codectx-0.1.2 → codectx-0.2.0}/tests/test_walker.py +35 -0
  54. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_cache_export.py +1 -1
  55. codectx-0.2.0/tests/unit/test_call_paths.py +161 -0
  56. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_formatter_coverage.py +0 -7
  57. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_formatter_sections.py +49 -6
  58. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_git_meta.py +51 -0
  59. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_resolver.py +13 -0
  60. codectx-0.2.0/tests/unit/test_safety.py +27 -0
  61. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_semantic.py +31 -0
  62. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_summarizer.py +3 -2
  63. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_treesitter.py +1 -1
  64. {codectx-0.1.2 → codectx-0.2.0}/uv.lock +882 -810
  65. codectx-0.1.2/.gitignore +0 -18
  66. codectx-0.1.2/ARCHITECTURE.md +0 -112
  67. codectx-0.1.2/CLAUDE.md +0 -90
  68. codectx-0.1.2/CONTEXT.md +0 -982
  69. codectx-0.1.2/DECISIONS.md +0 -81
  70. codectx-0.1.2/PKG-INFO +0 -389
  71. codectx-0.1.2/PLAN.md +0 -53
  72. codectx-0.1.2/README.md +0 -336
  73. codectx-0.1.2/requirements.txt +0 -112
  74. codectx-0.1.2/src/codectx/compressor/tiered.py +0 -299
  75. {codectx-0.1.2 → codectx-0.2.0}/.github/workflows/ci.yml +0 -0
  76. {codectx-0.1.2 → codectx-0.2.0}/.github/workflows/codeql.yml +0 -0
  77. {codectx-0.1.2 → codectx-0.2.0}/.github/workflows/publish.yml +0 -0
  78. {codectx-0.1.2 → codectx-0.2.0}/.python-version +0 -0
  79. {codectx-0.1.2 → codectx-0.2.0}/main.py +0 -0
  80. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/cache.py +0 -0
  81. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/compressor/__init__.py +0 -0
  82. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/compressor/budget.py +0 -0
  83. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/compressor/summarizer.py +0 -0
  84. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/config/__init__.py +0 -0
  85. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/config/loader.py +0 -0
  86. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/graph/__init__.py +0 -0
  87. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/ignore.py +0 -0
  88. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/output/__init__.py +0 -0
  89. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/__init__.py +0 -0
  90. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/base.py +0 -0
  91. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/go.scm +0 -0
  92. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/java.scm +0 -0
  93. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/javascript.scm +0 -0
  94. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/python.scm +0 -0
  95. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/rust.scm +0 -0
  96. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/parser/queries/typescript.scm +0 -0
  97. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/ranker/__init__.py +0 -0
  98. {codectx-0.1.2 → codectx-0.2.0}/src/codectx/safety.py +0 -0
  99. {codectx-0.1.2 → codectx-0.2.0}/tests/__init__.py +0 -0
  100. {codectx-0.1.2 → codectx-0.2.0}/tests/test_ignore.py +0 -0
  101. {codectx-0.1.2 → codectx-0.2.0}/tests/test_parser.py +0 -0
  102. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/__init__.py +0 -0
  103. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_cache_wiring.py +0 -0
  104. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_cli.py +0 -0
  105. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_cycles.py +0 -0
  106. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_multi_root.py +0 -0
  107. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_queries.py +0 -0
  108. {codectx-0.1.2 → codectx-0.2.0}/tests/unit/test_semantic_mock.py +0 -0
@@ -0,0 +1,44 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+ venv/
+ .pytest_cache
+ .codectx_cache
+ .ruff_cache
+ .mypy_cache
+
+ .vscode
+ .idea
+ .coverage
+ node_modules/
+ .astro/
+
+ # Logs
+ logs
+ *.log
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+ pnpm-debug.log*
+ lerna-debug.log*
+
+ dist
+ dist-ssr
+ *.local
+
+ # Editor directories and files
+ .vscode/*
+ .DS_Store
+ *.suo
+ *.ntvs*
+ *.njsproj
+ *.sln
+ *.sw?
+ .env
@@ -0,0 +1,251 @@
+ # Architecture
+
+ ## Overview
+
+ codectx processes repositories through a structured analysis pipeline that ranks code by importance, compresses it to fit a token budget, and emits a structured markdown document optimized for AI systems.
+
+ The pipeline consists of six stages: file discovery, parsing, graph construction, ranking, compression, and formatting.
+
+ ## Pipeline
+
+ ### Stage 1: Walker
+
+ **Purpose:** Discover repository files while respecting ignore rules.
+
+ The Walker recursively traverses the filesystem from the repository root and applies ignore rules in order:
+
+ 1. `ALWAYS_IGNORE` — built-in patterns (`.git`, `__pycache__`, `.venv`, etc.)
+ 2. `.gitignore` — Git standard ignore rules
+ 3. `.ctxignore` — codectx-specific ignore rules
+
+ The tool uses `pathspec` with `gitwildmatch` semantics so that ignore patterns are interpreted the way Git interprets them.
+
+ **Output:** `List[Path]` of files to analyze.
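A minimal, standard-library sketch of the ignore layering described above. It is a simplified stand-in, not codectx's Walker: `fnmatch` globbing does not implement `gitwildmatch` semantics (no `!` negation, no anchoring, and a trailing `/` is simply stripped), and the names `walk_repo`, `load_patterns`, `is_ignored` and the three-entry `ALWAYS_IGNORE` set are illustrative assumptions.

```python
import fnmatch
import os
from pathlib import Path

# Illustrative subset of built-in ignores; the real ALWAYS_IGNORE list may differ.
ALWAYS_IGNORE = {".git", "__pycache__", ".venv"}


def load_patterns(root: Path, name: str) -> list[str]:
    """Read non-blank, non-comment lines from an ignore file, if it exists."""
    path = root / name
    if not path.is_file():
        return []
    stripped = (ln.strip() for ln in path.read_text(encoding="utf-8").splitlines())
    return [ln for ln in stripped if ln and not ln.startswith("#")]


def is_ignored(rel: str, patterns: list[str]) -> bool:
    # Simplified glob matching (NOT gitwildmatch): test the relative path and basename.
    name = rel.rsplit("/", 1)[-1]
    return any(
        fnmatch.fnmatch(rel, p.rstrip("/")) or fnmatch.fnmatch(name, p.rstrip("/"))
        for p in patterns
    )


def walk_repo(root: Path) -> list[Path]:
    # Built-ins are checked first, then patterns from .gitignore and .ctxignore.
    patterns = load_patterns(root, ".gitignore") + load_patterns(root, ".ctxignore")
    files: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root).as_posix()
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames
            if d not in ALWAYS_IGNORE and not is_ignored(prefix + d, patterns)
        )
        for f in sorted(filenames):
            if not is_ignored(prefix + f, patterns):
                files.append(Path(dirpath) / f)
    return files
```

Pruning `dirnames` in place is what keeps directories such as `.venv` from being traversed at all, which matters more for speed than the per-file checks.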
+
+ ### Stage 2: Parser
+
+ **Purpose:** Extract imports, symbols, and metadata from source files.
+
+ The Parser processes files in parallel using `ProcessPoolExecutor` (CPU-bound) and `ThreadPoolExecutor` (I/O-bound). For each file:
+
+ 1. Detect language from file extension
+ 2. Parse AST using tree-sitter
+ 3. Extract:
+ - Import statements (list of import strings)
+ - Top-level symbols (functions, classes, methods)
+ - Docstrings per symbol
+ - Code structure metadata
+
+ Tree-sitter provides a unified parsing interface across all supported languages: Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, and Ruby.
+
+ **Output:** `Dict[Path, ParseResult]` where each `ParseResult` contains imports, symbols, and source text.
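The per-file extraction can be sketched for Python sources with the standard-library `ast` module. This swaps in `ast` for tree-sitter purely to keep the example dependency-free, and `ParseResult`'s fields, `parse_file` and `parse_files` are hypothetical names. The sketch defaults to `ThreadPoolExecutor` so it runs anywhere; the architecture above uses `ProcessPoolExecutor` for the CPU-bound parse, which can be passed as `executor_cls` provided the calling script guards its entry point with `if __name__ == "__main__":`.

```python
import ast
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParseResult:
    """Hypothetical container mirroring the fields listed above."""
    imports: list[str] = field(default_factory=list)
    symbols: list[str] = field(default_factory=list)
    docstrings: dict[str, str] = field(default_factory=dict)
    source: str = ""


def _add_symbol(result: ParseResult, name: str, node: ast.AST) -> None:
    result.symbols.append(name)
    doc = ast.get_docstring(node)
    if doc:
        result.docstrings[name] = doc


def parse_file(path: Path) -> ParseResult:
    source = path.read_text(encoding="utf-8")
    tree = ast.parse(source)  # stand-in for a tree-sitter parse
    result = ParseResult(source=source)
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Import):
            result.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay distinguishable.
            result.imports.append("." * (node.level or 0) + (node.module or ""))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            _add_symbol(result, node.name, node)
        elif isinstance(node, ast.ClassDef):
            _add_symbol(result, node.name, node)
            for item in node.body:  # methods defined directly on the class
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    _add_symbol(result, f"{node.name}.{item.name}", item)
    return result


def parse_files(
    paths: list[Path], executor_cls: type[Executor] = ThreadPoolExecutor
) -> dict[Path, ParseResult]:
    # pool.map preserves input order, so zip pairs each path with its result.
    with executor_cls() as pool:
        return dict(zip(paths, pool.map(parse_file, paths)))
```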
+
+ ### Stage 3: Dependency Graph
+
+ **Purpose:** Build a directed graph representing module relationships.
+
+ The Graph Builder processes parse results to construct a `rustworkx.PyDiGraph`:
+
+ 1. For each import statement, resolve the import string to a file path using per-language import resolvers
+ 2. Create nodes for files and edges for import relationships
+ 3. Compute graph metrics:
+ - **Fan-in** — in-degree per node (how many files import this module)
+ - **Fan-out** — out-degree per node (how many modules this file imports)
+ - **Strongly connected components** — detect cyclic dependencies
+
+ The graph enables ranking algorithms to identify important modules based on structural position.
+
+ **Output:** `rustworkx.PyDiGraph` with computed metrics.
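The graph metrics above can be illustrated without rustworkx using a plain adjacency map. This is a sketch under that assumption, not codectx's builder: `build_graph`, `fan_in_out` and the recursive Tarjan implementation are illustrative. With rustworkx itself, `PyDiGraph.in_degree`/`out_degree` and `rustworkx.strongly_connected_components` cover the same ground.

```python
def build_graph(imports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Adjacency map: file -> set of files it imports (targets already resolved)."""
    graph: dict[str, set[str]] = {node: set() for node in imports}
    for src, targets in imports.items():
        for dst in targets:
            graph[src].add(dst)
            graph.setdefault(dst, set())  # imported-only files still get a node
    return graph


def fan_in_out(graph: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    fan_out = {node: len(targets) for node, targets in graph.items()}
    fan_in = dict.fromkeys(graph, 0)
    for targets in graph.values():
        for dst in targets:
            fan_in[dst] += 1
    return fan_in, fan_out


def strongly_connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Tarjan's algorithm. Recursive, so very deep graphs may need an iterative version."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    components: list[set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component: set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components
```

Any component with more than one file is an import cycle; singleton components are ordinary acyclic modules.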
+
+ ### Stage 4: Ranker
+
+ **Purpose:** Score files by importance using multiple signals.
+
+ The Ranker computes a composite importance score for each file:
+
+ ```
+ score = (0.40 × git_frequency)
+ + (0.40 × fan_in)
+ + (0.10 × recency)
+ + (0.10 × entry_proximity)
+ ```
+
+ **Git Frequency (0.40):** Number of commits touching the file. Frequently modified files are typically more important.
+
+ **Fan-in (0.40):** Normalized in-degree. Files imported by many other modules are critical interfaces and score higher.
+
+ **Recency (0.10):** Derived from days since last modification; more recently modified files score higher.
+
+ **Entry Proximity (0.10):** Graph distance from identified entry points. Files closer to main execution paths rank higher.
+
+ Scores are normalized to the `[0.0, 1.0]` range for uniform compression tier assignment. Semantic searches (`--query`) add a fifth signal at 20% weight and rescale the other four to a combined 80%.
+
+ **Output:** `Dict[Path, float]` mapping file paths to scores.
+
+ ### Stage 5: Compressor
+
+ **Purpose:** Fit code content within a token budget.
+
+ The Compressor assigns content tiers based on score percentiles:
+
+ - **Tier 1** (Top 15%) — AST-driven structured summaries or full source code for true entry points
+ - **Tier 2** (Next 30%) — Function signatures and docstrings only
+ - **Tier 3** (Remaining) — One-line summaries
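Percentile-based tier assignment can be sketched by ranking files by score and cutting at 15% and 45% of the ranked list. How codectx rounds the cut points for small repositories is not stated; this sketch uses strict `<` comparisons against `fraction * n`, and `assign_tiers` is a hypothetical name.

```python
def assign_tiers(
    scores: dict[str, float], tier1_frac: float = 0.15, tier2_frac: float = 0.30
) -> dict[str, int]:
    """Top 15% of ranks -> tier 1, next 30% -> tier 2, everything else -> tier 3."""
    ranked = sorted(scores, key=lambda path: scores[path], reverse=True)
    n = len(ranked)
    tiers: dict[str, int] = {}
    for i, path in enumerate(ranked):  # i is the 0-based rank
        if i < tier1_frac * n:
            tiers[path] = 1
        elif i < (tier1_frac + tier2_frac) * n:
            tiers[path] = 2
        else:
            tiers[path] = 3
    return tiers
```

Because the returned dict is built in descending score order, iterating it yields Tier 1, then Tier 2, then Tier 3, each by score descending.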
+
+ An optional Summarizer step (`--llm` extras) runs on Tier 3 files, generating summaries of their functions before output formatting.
+
+ Files are emitted in order: Tier 1 by score descending, then Tier 2, then Tier 3.
+
+ If total token count exceeds the budget:
+
+ 1. Drop all Tier 3 files
+ 2. Truncate Tier 2 content (keep only signatures, remove docstrings)
+ 3. Truncate Tier 1 content (reduce line count progressively)
+ 4. If still over budget, drop lowest-scored Tier 1 files
+
+ This is a hard constraint. The tool does not emit context that exceeds the token limit.
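The four fallback steps can be sketched as a single function. Assumptions: a whitespace word count stands in for `tiktoken`, the `Entry` record and its `signatures` field are hypothetical, "progressive" truncation is modeled as repeatedly halving each Tier 1 file down to `min_lines`, and a final fallback (not among the listed steps) trims any remaining files so the limit really is hard.

```python
from dataclasses import dataclass, replace


@dataclass
class Entry:
    """Hypothetical per-file record; field names are illustrative."""
    path: str
    score: float
    tier: int
    text: str        # content currently planned for output
    signatures: str  # signature-only rendering used for Tier 2 in step 2


def count_tokens(text: str) -> int:
    # Crude stand-in for tiktoken: count whitespace-separated words.
    return len(text.split())


def total(entries: list[Entry]) -> int:
    return sum(count_tokens(e.text) for e in entries)


def enforce_budget(entries: list[Entry], budget: int, min_lines: int = 3) -> list[Entry]:
    entries = sorted(entries, key=lambda e: e.score, reverse=True)
    # 1. Drop all Tier 3 files.
    if total(entries) > budget:
        entries = [e for e in entries if e.tier != 3]
    # 2. Tier 2: keep only signatures, dropping docstrings.
    if total(entries) > budget:
        entries = [replace(e, text=e.signatures) if e.tier == 2 else e for e in entries]
    # 3. Repeatedly halve each Tier 1 file, never below min_lines.
    while total(entries) > budget:
        shrunk = False
        for i, e in enumerate(entries):
            lines = e.text.splitlines()
            if e.tier == 1 and len(lines) > min_lines:
                keep = max(min_lines, len(lines) // 2)
                entries[i] = replace(e, text="\n".join(lines[:keep]))
                shrunk = True
        if not shrunk:
            break
    # 4. Drop the lowest-scored Tier 1 files until the output fits.
    while total(entries) > budget and any(e.tier == 1 for e in entries):
        entries.remove(min((e for e in entries if e.tier == 1), key=lambda e: e.score))
    # Assumed final fallback: drop lowest-scored remaining files (list is sorted).
    while entries and total(entries) > budget:
        entries.pop()
    return entries
```

Working on new `Entry` objects via `dataclasses.replace` leaves the caller's inputs untouched, so the same parsed content can be re-fit against a different budget.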
+
+ **Output:** `Dict[Path, CompressedContent]` and usage statistics.
+
+ ### Stage 6: Formatter
+
+ **Purpose:** Emit structured markdown optimized for AI agents.
+
+ The Formatter writes sections in fixed order:
+
+ 1. **ARCHITECTURE** — High-level project structure derived from files
+ 2. **ENTRY_POINTS** — Main files and public interfaces with full source
+ 3. **SYMBOL_INDEX** — Index of symbols (functions, classes, methods) and where they appear in the codebase
+ 4. **IMPORTANT_CALL_PATHS** — Key call chains through the codebase, traced step by step
+ 5. **CORE_MODULES** — High-scoring modules with structured summaries
+ 6. **SUPPORTING_MODULES** — Mid-scoring modules with signatures and docstrings
+ 7. **DEPENDENCY_GRAPH** — Mermaid diagram of module relationships
+ 8. **RANKED_FILES** — Files sorted by score, with their token cost
+ 9. **PERIPHERY** — Low-scoring files with one-line summaries
+
+ Each section is preceded by a Markdown heading and terminated with metadata (token count, file count).
+
+ **Output:** Markdown string suitable for writing to disk as `CONTEXT.md`.
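A fixed section order is easiest to enforce by iterating a constant list rather than the input mapping. The heading level, the italic metadata line and the skip-empty-sections behavior below are assumptions for illustration; codectx's `formatter.py` may render these differently, and `format_context` is a hypothetical name.

```python
SECTION_ORDER = (
    "ARCHITECTURE",
    "ENTRY_POINTS",
    "SYMBOL_INDEX",
    "IMPORTANT_CALL_PATHS",
    "CORE_MODULES",
    "SUPPORTING_MODULES",
    "DEPENDENCY_GRAPH",
    "RANKED_FILES",
    "PERIPHERY",
)


def format_context(sections: dict[str, tuple[str, int, int]]) -> str:
    """Render sections in the fixed order; each value is (body, tokens, files)."""
    parts = []
    for name in SECTION_ORDER:  # order comes from the constant, not the dict
        if name not in sections:
            continue  # assumption: sections without content are omitted
        body, tokens, files = sections[name]
        parts.append(f"## {name}\n\n{body.rstrip()}\n\n_tokens: {tokens}, files: {files}_\n")
    return "\n".join(parts)
```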
+
+ ## Data Flow Diagram
+
+ ```
+ File System
+
+ ├─→ [Walker]
+ │ ├ Respects .gitignore
+ │ ├ Respects .ctxignore
+ │ └ Output: List[Path]
+
+ ├─→ [Parser] (Parallel)
+ │ ├ Per-language extraction
+ │ ├ tree-sitter AST processing
+ │ └ Output: Dict[Path, ParseResult]
+
+ ├─→ [Graph Builder]
+ │ ├ Resolve imports
+ │ ├ Construct DiGraph
+ │ └ Output: rustworkx.PyDiGraph
+
+ ├─→ [Git Metadata] (Parallel)
+ │ ├ Commit frequency per file
+ │ ├ Recency (last modification)
+ │ └ Output: Dict[Path, GitMeta]
+
+ ├─→ [Ranker]
+ │ ├ Composite scoring
+ │ ├ Normalize to [0.0, 1.0]
+ │ └ Output: Dict[Path, float]
+
+ ├─→ [Compressor]
+ │ ├ Tier assignment
+ │ ├ Token budget enforcement
+ │ ├ [Optional: AI Summarizer hooks]
+ │ └ Output: Dict[Path, CompressedContent]
+
+ └─→ [Formatter]
+ ├ Section organization
+ ├ Markdown generation
+ └ Output: CONTEXT.md
+ ```
+
+ ## Caching
+
+ The tool caches expensive computations:
+
+ **Cache key:** `(file_path, file_hash, git_commit_sha)`
+
+ **Cached items:**
+ - Parsed AST and extracted symbols per file
+ - Git metadata (frequency, recency)
+
+ **Cache location:** `.codectx_cache/` at repository root (gitignored)
+
+ **Invalidation:** Cache entries are invalidated when file content changes or HEAD commit changes.
+
+ This enables fast incremental updates in watch mode.
+
+ ## Incremental Mode
+
+ When running `codectx watch .`, the tool:
+
+ 1. Monitors the filesystem with `watchfiles`
+ 2. On file change:
+ - Reparse only affected files
+ - Rebuild graph for changed nodes and dependents
+ - Re-rank affected subgraph
+ - Recompress to budget
+ - Re-emit output
+
+ Because only affected files are reprocessed, this is much faster than rerunning the full analysis on every change.
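Finding the files to reprocess after a change amounts to a reverse-graph traversal. Whether codectx follows dependents transitively or only one level deep is not stated; this hypothetical `affected_files` assumes the transitive case. In watch mode its `changed` input would come from `watchfiles.watch(...)`, which yields sets of `(Change, path)` tuples.

```python
from collections import deque


def affected_files(imports: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed files plus every file that (transitively) imports one."""
    # Reverse the import graph: module -> files that import it.
    importers: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for dst in targets:
            importers.setdefault(dst, set()).add(src)
    seen = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk up the import chain
        node = queue.popleft()
        for parent in importers.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```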
+
+ ## Token Budget Enforcement
+
+ Token counting uses `tiktoken`, OpenAI's tokenizer library. Counts are exact for OpenAI models that use the selected encoding; for other model families, including Anthropic's, they are approximations.
+
+ Budget enforcement is hard: the tool does not emit context exceeding the specified limit.
+
+ Consumption order:
+
+ 1. Fixed overhead (section headers, metadata) — typically 500–1000 tokens
+ 2. Tier 1 files by score descending (AST Summaries / Full source)
+ 3. Tier 2 files by score descending (signatures only)
+ 4. Tier 3 files by score descending (one-line summaries)
+
+ Files omitted due to budget are logged with a note in the output.
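The consumption order can be sketched as a greedy fill that pays the fixed overhead first and then walks files by tier and descending score. Assumptions: the `cl100k_base` encoding, the roughly-4-characters-per-token fallback when `tiktoken` is unavailable, and skipping an oversized file while still trying later, smaller ones; `make_counter` and `fill_budget` are hypothetical names.

```python
from typing import Callable


def make_counter() -> Callable[[str], int]:
    """Use tiktoken when it is usable; otherwise fall back to a rough estimate."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption
        return lambda text: len(enc.encode(text, disallowed_special=()))
    except Exception:  # not installed, or encoding data unavailable offline
        return lambda text: max(1, len(text) // 4)  # ~4 characters per token


def fill_budget(
    files: list[tuple[str, int, float, str]],
    budget: int,
    overhead: int,
    count: Callable[[str], int],
) -> tuple[list[str], list[str]]:
    """files holds (path, tier, score, text). Returns (kept, omitted) paths."""
    used = overhead  # section headers and metadata are paid for first
    kept: list[str] = []
    omitted: list[str] = []
    # Tier ascending, then score descending within each tier.
    for path, tier, score, text in sorted(files, key=lambda f: (f[1], -f[2])):
        cost = count(text)
        if used + cost <= budget:
            kept.append(path)
            used += cost
        else:
            omitted.append(path)  # reported in the output, not silently dropped
    return kept, omitted
```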
+
+ ## Language Support
+
+ The Parser uses tree-sitter for universal AST extraction. Each language requires:
+
+ 1. **tree-sitter grammar** — provided by `tree-sitter-LANGUAGE` package
+ 2. **Import resolver** — per-language logic to resolve import strings to file paths
+
+ Currently supported:
+
+ - **Python**
+ - **TypeScript/JavaScript**
+ - **Go**
+ - **Rust**
+ - **Java**
+ - **C/C++**
+ - **Ruby**
+
+ Adding a language requires implementing a resolver in `src/codectx/graph/resolver.py` and adding the grammar dependency to `pyproject.toml`.
+
+ ## Configuration
+
+ Configuration is applied in this precedence order:
+
+ 1. **CLI flags** (highest priority)
+ 2. **`.codectx.toml`** in repository root
+ 3. **Built-in defaults** (lowest priority)
+
+ Example `.codectx.toml`:
+
+ ```toml
+ [codectx]
+ token_budget = 120000
+ output_file = "CONTEXT.md"
+ extra_ignore = ["**/generated/**", "*.draft.py"]
+ ```