codectx 0.1.3__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. {codectx-0.1.3 → codectx-0.2.0}/ARCHITECTURE.md +35 -54
  2. {codectx-0.1.3 → codectx-0.2.0}/CONTEXT.md +372 -328
  3. {codectx-0.1.3 → codectx-0.2.0}/DECISIONS.md +1 -1
  4. codectx-0.2.0/PKG-INFO +252 -0
  5. codectx-0.2.0/PLAN.md +145 -0
  6. codectx-0.2.0/README.md +198 -0
  7. codectx-0.2.0/benchmark.png +0 -0
  8. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/advanced/dependency-graph.md +4 -0
  9. codectx-0.2.0/docs/src/content/docs/advanced/ranking-system.md +40 -0
  10. codectx-0.2.0/docs/src/content/docs/advanced/token-compression.md +26 -0
  11. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/comparison.md +1 -1
  12. codectx-0.2.0/docs/src/content/docs/getting-started/basic-usage.md +62 -0
  13. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/getting-started/installation.md +14 -3
  14. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/getting-started/quick-start.mdx +6 -2
  15. codectx-0.2.0/docs/src/content/docs/guides/configuration.md +52 -0
  16. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/introduction/what-is-codectx.md +2 -0
  17. codectx-0.2.0/docs/src/content/docs/reference/architecture-overview.md +32 -0
  18. codectx-0.2.0/docs/src/content/docs/reference/cli-reference.md +115 -0
  19. {codectx-0.1.3 → codectx-0.2.0}/pyproject.toml +2 -3
  20. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/__init__.py +1 -1
  21. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/config/defaults.py +15 -21
  22. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/graph/builder.py +33 -26
  23. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/output/formatter.py +5 -1
  24. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/languages.py +5 -2
  25. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/ranker/scorer.py +12 -1
  26. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/walker.py +4 -0
  27. {codectx-0.1.3 → codectx-0.2.0}/tests/test_scorer.py +71 -0
  28. {codectx-0.1.3 → codectx-0.2.0}/tests/test_walker.py +35 -0
  29. codectx-0.2.0/tests/unit/test_call_paths.py +161 -0
  30. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_git_meta.py +51 -0
  31. codectx-0.2.0/tests/unit/test_safety.py +27 -0
  32. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_semantic.py +31 -0
  33. {codectx-0.1.3 → codectx-0.2.0}/uv.lock +875 -820
  34. codectx-0.1.3/PKG-INFO +0 -300
  35. codectx-0.1.3/PLAN.md +0 -174
  36. codectx-0.1.3/README.md +0 -245
  37. codectx-0.1.3/docs/src/content/docs/advanced/ranking-system.md +0 -31
  38. codectx-0.1.3/docs/src/content/docs/advanced/token-compression.md +0 -22
  39. codectx-0.1.3/docs/src/content/docs/getting-started/basic-usage.md +0 -44
  40. codectx-0.1.3/docs/src/content/docs/guides/configuration.md +0 -40
  41. codectx-0.1.3/docs/src/content/docs/reference/architecture-overview.md +0 -18
  42. codectx-0.1.3/docs/src/content/docs/reference/cli-reference.md +0 -37
  43. codectx-0.1.3/requirements.txt +0 -115
  44. {codectx-0.1.3 → codectx-0.2.0}/.github/workflows/ci.yml +0 -0
  45. {codectx-0.1.3 → codectx-0.2.0}/.github/workflows/codeql.yml +0 -0
  46. {codectx-0.1.3 → codectx-0.2.0}/.github/workflows/publish.yml +0 -0
  47. {codectx-0.1.3 → codectx-0.2.0}/.gitignore +0 -0
  48. {codectx-0.1.3 → codectx-0.2.0}/.python-version +0 -0
  49. {codectx-0.1.3 → codectx-0.2.0}/LICENSE +0 -0
  50. {codectx-0.1.3 → codectx-0.2.0}/docs/astro.config.mjs +0 -0
  51. {codectx-0.1.3 → codectx-0.2.0}/docs/build_output.txt +0 -0
  52. {codectx-0.1.3 → codectx-0.2.0}/docs/bun.lock +0 -0
  53. {codectx-0.1.3 → codectx-0.2.0}/docs/package.json +0 -0
  54. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/community/contributing.md +0 -0
  55. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/community/faq.md +0 -0
  56. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/guides/best-practices.md +0 -0
  57. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/guides/using-context-effectively.md +0 -0
  58. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/index.mdx +0 -0
  59. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content/docs/introduction/why-it-exists.md +0 -0
  60. {codectx-0.1.3 → codectx-0.2.0}/docs/src/content.config.ts +0 -0
  61. {codectx-0.1.3 → codectx-0.2.0}/docs/src/env.d.ts +0 -0
  62. {codectx-0.1.3 → codectx-0.2.0}/docs/src/styles/custom.css +0 -0
  63. {codectx-0.1.3 → codectx-0.2.0}/docs/tsconfig.json +0 -0
  64. {codectx-0.1.3 → codectx-0.2.0}/main.py +0 -0
  65. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/cache.py +0 -0
  66. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/cli.py +0 -0
  67. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/compressor/__init__.py +0 -0
  68. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/compressor/budget.py +0 -0
  69. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/compressor/summarizer.py +0 -0
  70. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/compressor/tiered.py +0 -0
  71. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/config/__init__.py +0 -0
  72. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/config/loader.py +0 -0
  73. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/graph/__init__.py +0 -0
  74. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/graph/resolver.py +0 -0
  75. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/ignore.py +0 -0
  76. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/output/__init__.py +0 -0
  77. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/output/sections.py +0 -0
  78. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/__init__.py +0 -0
  79. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/base.py +0 -0
  80. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/go.scm +0 -0
  81. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/java.scm +0 -0
  82. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/javascript.scm +0 -0
  83. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/python.scm +0 -0
  84. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/rust.scm +0 -0
  85. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/queries/typescript.scm +0 -0
  86. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/parser/treesitter.py +0 -0
  87. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/ranker/__init__.py +0 -0
  88. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/ranker/git_meta.py +0 -0
  89. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/ranker/semantic.py +0 -0
  90. {codectx-0.1.3 → codectx-0.2.0}/src/codectx/safety.py +0 -0
  91. {codectx-0.1.3 → codectx-0.2.0}/tests/__init__.py +0 -0
  92. {codectx-0.1.3 → codectx-0.2.0}/tests/test_compressor.py +0 -0
  93. {codectx-0.1.3 → codectx-0.2.0}/tests/test_ignore.py +0 -0
  94. {codectx-0.1.3 → codectx-0.2.0}/tests/test_integration.py +0 -0
  95. {codectx-0.1.3 → codectx-0.2.0}/tests/test_parser.py +0 -0
  96. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/__init__.py +0 -0
  97. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_cache_export.py +0 -0
  98. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_cache_wiring.py +0 -0
  99. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_cli.py +0 -0
  100. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_cycles.py +0 -0
  101. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_formatter_coverage.py +0 -0
  102. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_formatter_sections.py +0 -0
  103. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_multi_root.py +0 -0
  104. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_queries.py +0 -0
  105. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_resolver.py +0 -0
  106. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_semantic_mock.py +0 -0
  107. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_summarizer.py +0 -0
  108. {codectx-0.1.3 → codectx-0.2.0}/tests/unit/test_treesitter.py +0 -0
@@ -64,21 +64,21 @@ The graph enables ranking algorithms to identify important modules based on stru
  The Ranker computes a composite importance score for each file:
 
  ```
- score = (0.35 × git_frequency)
- + (0.35 × fan_in)
- + (0.20 × recency)
+ score = (0.40 × git_frequency)
+ + (0.40 × fan_in)
+ + (0.10 × recency)
  + (0.10 × entry_proximity)
  ```
 
- **Git Frequency (0.35):** Commit count touching the file. Frequently-modified files are typically more important.
+ **Git Frequency (0.40):** Commit count touching the file. Frequently-modified files are typically more important.
 
- **Fan-in (0.35):** Inverse-normalized in-degree. Files imported by many other modules are critical interfaces.
+ **Fan-in (0.40):** Inverse-normalized in-degree. Files imported by many other modules are critical interfaces.
 
- **Recency (0.20):** Days since last modification. Recently active files are prioritized.
+ **Recency (0.10):** Days since last modification. Recently active files are prioritized.
 
  **Entry Proximity (0.10):** Graph distance from identified entry points. Files close to main execution paths rank higher.
 
- Scores are normalized to `[0.0, 1.0]` range for uniform compression tier assignment.
+ Scores are normalized to `[0.0, 1.0]` range for uniform compression tier assignment. Semantic searches (`--query`) inject a 5th signal at 20% weight and rescale the other four to 80%.
 
  **Output:** `Dict[Path, float]` mapping file paths to scores.
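
As a rough illustration of the 0.2.0 weighting described in this hunk, the sketch below combines four signals with the new weights and applies the 80%/20% rescale when a semantic signal is present. It is not taken from `scorer.py`: the function name `composite_score`, the dictionary layout, and the assumption that every signal is already normalized to `[0.0, 1.0]` with higher meaning more important are all hypothetical.

```python
from typing import Dict, Optional

# Weights as listed in the 0.2.0 side of the hunk above.
BASE_WEIGHTS: Dict[str, float] = {
    "git_frequency": 0.40,
    "fan_in": 0.40,
    "recency": 0.10,
    "entry_proximity": 0.10,
}
# Share given to the semantic signal when a query is supplied (20% per the hunk).
SEMANTIC_WEIGHT = 0.20


def composite_score(signals: Dict[str, float], semantic: Optional[float] = None) -> float:
    """Combine pre-normalized signals (assumed to lie in [0.0, 1.0]) into one score."""
    base = sum(weight * signals[name] for name, weight in BASE_WEIGHTS.items())
    if semantic is None:
        return base
    # With a query, the four base signals are rescaled to 80% of the total
    # and the semantic signal contributes the remaining 20%.
    return (1.0 - SEMANTIC_WEIGHT) * base + SEMANTIC_WEIGHT * semantic
```

Because the base weights sum to 1.0, the combined score stays within `[0.0, 1.0]` in both modes as long as each input does.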
 
@@ -86,11 +86,13 @@ Scores are normalized to `[0.0, 1.0]` range for uniform compression tier assignm
 
  **Purpose:** Fit code content within a token budget.
 
- The Compressor assigns content tiers based on scores:
+ The Compressor assigns content tiers based on scored percentiles:
 
- - **Tier 1** (score ≥ 0.7) — Full source code
- - **Tier 2** (0.3 ≤ score < 0.7) — Function signatures and docstrings only
- - **Tier 3** (score < 0.3) — One-line summary
+ - **Tier 1** (Top 15%) — AST-driven structured summaries or full source code for true entry points
+ - **Tier 2** (Next 30%) — Function signatures and docstrings only
+ - **Tier 3** (Remaining) — One-line summaries
+
+ A Summarizer step (`--llm` extras) runs specifically evaluating `Tier 3` code mapping out detailed functions implicitly before output mapping.
 
  Files are emitted in order: Tier 1 by score descending, then Tier 2, then Tier 3.
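
The move from fixed score thresholds to percentiles in this hunk can be sketched as follows. This is only an illustration under stated assumptions: the helper name `assign_tiers`, rounding the cut-offs up, and breaking score ties by path are hypothetical choices, not behavior confirmed by the package.

```python
from typing import Dict


def assign_tiers(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each path to tier 1, 2, or 3 by its rank among all scored files."""
    # Highest score first; ties broken by path so the result is deterministic.
    ranked = sorted(scores, key=lambda path: (-scores[path], path))
    n = len(ranked)
    # Integer ceiling of 15% and 45% of n (top 15%, then the next 30%).
    tier1_end = (15 * n + 99) // 100
    tier2_end = (45 * n + 99) // 100
    tiers: Dict[str, int] = {}
    for index, path in enumerate(ranked):
        if index < tier1_end:
            tiers[path] = 1
        elif index < tier2_end:
            tiers[path] = 2
        else:
            tiers[path] = 3
    return tiers
```

Unlike the 0.1.3 thresholds, a percentile split always yields some Tier 1 files regardless of how the scores are distributed.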
 
@@ -111,13 +113,15 @@ This is a hard constraint. The tool does not emit context that exceeds the token
 
  The Formatter writes sections in fixed order:
 
- 1. **ARCHITECTURE** — High-level project structure
- 2. **DEPENDENCY_GRAPH** — Mermaid diagram of module relationships
- 3. **ENTRY_POINTS** — Main files and public interfaces with full source
- 4. **CORE_MODULES** — High-scoring modules with full source
- 5. **SUPPORTING_MODULES** — Mid-scoring modules with signatures and docstrings
- 6. **PERIPHERY** — Low-scoring files with one-line summaries
- 7. **RECENT_CHANGES** — Optional diff section (if `--since` flag provided)
+ 1. **ARCHITECTURE** — High-level project structure derived from files
+ 2. **ENTRY_POINTS** — Main files and public interfaces with full source
+ 3. **SYMBOL_INDEX** — Identifies references and mappings across the codebase
+ 4. **IMPORTANT_CALL_PATHS** — Tracks deep operational flows sequentially
+ 5. **CORE_MODULES** — High-scoring modules with structured logic constraints
+ 6. **SUPPORTING_MODULES** — Mid-scoring modules with signatures and docstrings
+ 7. **DEPENDENCY_GRAPH** — Mermaid diagram of module relationships
+ 8. **RANKED_FILES** — Sorted layout tracking cost algorithms
+ 9. **PERIPHERY** — Low-scoring files with one-line summaries
 
  Each section is preceded by a Markdown heading and terminated with metadata (token count, file count).
 
@@ -156,6 +160,7 @@ File System
  ├─→ [Compressor]
  │ ├ Tier assignment
  │ ├ Token budget enforcement
+ │ ├ [Optional: AI Summarizer hooks]
  │ └ Output: Dict[Path, CompressedContent]
 
  └─→ [Formatter]
@@ -203,7 +208,7 @@ Budget enforcement is hard: the tool does not emit context exceeding the specifi
  Consumption order:
 
  1. Fixed overhead (section headers, metadata) — typically 500–1000 tokens
- 2. Tier 1 files by score descending (full source)
+ 2. Tier 1 files by score descending (AST Summaries / Full source)
  3. Tier 2 files by score descending (signatures only)
  4. Tier 3 files by score descending (one-line summaries)
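
The consumption order above could be sketched as a greedy pass over the files, shown here only to make the ordering concrete. The function `select_within_budget` and the tuple layout are hypothetical, and skipping a file that does not fit is an assumption; the package may handle overflow differently (for example by demoting the file to a cheaper tier).

```python
from typing import List, Tuple

# (path, tier, score, token_cost): a hypothetical record layout for this sketch.
FileEntry = Tuple[str, int, float, int]


def select_within_budget(files: List[FileEntry], token_budget: int, overhead: int) -> List[str]:
    """Return paths in emission order without exceeding the token budget."""
    # Fixed overhead (headers, metadata) is consumed first.
    remaining = token_budget - overhead
    # Then Tier 1, Tier 2, Tier 3, each by score descending.
    ordered = sorted(files, key=lambda entry: (entry[1], -entry[2]))
    emitted: List[str] = []
    for path, _tier, _score, cost in ordered:
        # Assumption: a file that does not fit is skipped rather than truncated.
        if cost <= remaining:
            emitted.append(path)
            remaining -= cost
    return emitted
```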
 
@@ -218,11 +223,13 @@ The Parser uses tree-sitter for universal AST extraction. Each language requires
 
  Currently supported:
 
- - **Python** — `import X`, `from X import Y`
- - **TypeScript/JavaScript** — `import * from "X"`, `require("X")`
- - **Go** — `import "X"`
- - **Rust** — `use X::{Y, Z}`
- - **Java** — `import X.Y;`
+ - **Python**
+ - **TypeScript/JavaScript**
+ - **Go**
+ - **Rust**
+ - **Java**
+ - **C/C++**
+ - **Ruby**
 
  Adding a language requires implementing a resolver in `src/codectx/graph/resolver.py` and adding the grammar dependency to `pyproject.toml`.
 
@@ -231,40 +238,14 @@ Adding a language requires implementing a resolver in `src/codectx/graph/resolve
  Configuration is applied in this precedence order:
 
  1. **CLI flags** (highest priority)
- 2. **`.contextcraft.toml`** in repository root
+ 2. **`.codectx.toml`** in repository root
  3. **Built-in defaults** (lowest priority)
 
- Example `.contextcraft.toml`:
+ Example `.codectx.toml`:
 
  ```toml
  [codectx]
  token_budget = 120000
- output = "CONTEXT.md"
- include_patterns = ["src/**", "lib/**"]
- exclude_patterns = ["tests/**", "*.test.py"]
+ output_file = "CONTEXT.md"
+ extra_ignore = ["**/generated/**", "*.draft.py"]
  ```
-
- ## Parallelism Strategy
-
- **CPU-bound tasks (Parser):** `ProcessPoolExecutor` — parsing and AST extraction leverages tree-sitter C extension.
-
- **I/O-bound tasks (Git metadata, file I/O):** `ThreadPoolExecutor` — reading git history and source files is I/O-bound.
-
- **Sync tasks:** Graph construction, ranking, and compression are single-threaded because they are fast and maintain simple state.
-
- This mixed-executor approach balances CPU and I/O contention.
-
- ## Performance Characteristics
-
- On a typical 10k-file repository:
-
- - **Walker:** ~500ms (filesystem traversal)
- - **Parser:** ~2-5s (parallel tree-sitter parsing)
- - **Graph Builder:** ~100ms (import resolution)
- - **Ranker:** ~200ms (scoring and normalization)
- - **Compressor:** ~50ms (tier assignment)
- - **Formatter:** ~100ms (markdown generation)
-
- **Total:** ~3-6 seconds for full analysis.
-
- Incremental mode (watch) is typically 5-10x faster because it processes only changed files.
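
The configuration precedence stated in this hunk (CLI flags, then `.codectx.toml`, then built-in defaults) amounts to a layered merge, sketched below. The function `resolve_config` is hypothetical and is not the package's `loader.py`; the default values in the usage note are placeholders, and treating `None` as "flag not given" is an assumption. The file layer would come from parsing the `[codectx]` table of `.codectx.toml`.

```python
from typing import Any, Dict


def resolve_config(
    defaults: Dict[str, Any],
    file_values: Dict[str, Any],
    cli_flags: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge three layers so that later layers override earlier ones."""
    merged = dict(defaults)      # lowest priority
    merged.update(file_values)   # values from the [codectx] table
    # Assumption: a CLI flag left unset is represented as None and ignored.
    merged.update({key: value for key, value in cli_flags.items() if value is not None})
    return merged
```

For example, with placeholder defaults `{"token_budget": 100000, "output_file": "CONTEXT.md"}`, a file layer setting `token_budget = 120000`, and a CLI layer of `{"token_budget": None, "output_file": "OUT.md"}`, the result keeps the file's budget and the CLI's output path.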