@pmaddire/gcie 0.1.4 → 0.1.5
- package/AGENT.md +6 -2
- package/GCIE_USAGE.md +212 -153
- package/README.md +30 -8
- package/bench_questions.py +69 -0
- package/cli/app.py +198 -162
- package/cli/commands/adaptation.py +341 -0
- package/cli/commands/context.py +682 -34
- package/cli/commands/context_slices.py +1322 -601
- package/cli/commands/setup.py +86 -72
- package/context/architecture_slicer.py +2 -1
- package/llm_context/snippet_selector.py +1 -1
- package/package.json +1 -1
- package/retrieval/hybrid_retriever.py +9 -1
package/AGENT.md
CHANGED
|
@@ -1,10 +1,14 @@
|
|
|
1
1
|
# AGENT.md
|
|
2
2
|
|
|
3
|
-
Agent Operating Instructions for GraphCode Intelligence Engine (GCIE)
|
|
3
|
+
Agent Operating Instructions for GraphCode Intelligence Engine (GCIE)
|
|
4
4
|
|
|
5
5
|
This file provides persistent architectural context for coding agents working on this repository.
|
|
6
6
|
|
|
7
|
-
Agents must read this file before performing any development tasks.
|
|
7
|
+
Agents must read this file before performing any development tasks.
|
|
8
|
+
|
|
9
|
+
Agent trigger:
|
|
10
|
+
Use GCIE for context lookup before reading files or making edits.
|
|
11
|
+
Follow `GCIE_USAGE.md` for the adaptive retrieval workflow and mode-routing rules.
|
|
8
12
|
|
|
9
13
|
---
|
|
10
14
|
|
package/GCIE_USAGE.md
CHANGED
|
@@ -1,231 +1,290 @@
|
|
|
1
|
-
|
|
1
|
+
# GCIE Agent Usage (Portable, Accuracy-First)
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Trigger line for agent instructions:
|
|
4
|
+
`Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.`
|
|
4
5
|
|
|
5
6
|
## Goal
|
|
6
7
|
|
|
7
|
-
Retrieve the smallest useful context
|
|
8
|
+
Retrieve the smallest useful context without sacrificing edit safety.
|
|
8
9
|
|
|
9
10
|
Priority order:
|
|
10
11
|
1. accuracy (must-have coverage)
|
|
11
12
|
2. full-hit reliability
|
|
12
13
|
3. token efficiency
|
|
13
14
|
|
|
14
|
-
##
|
|
15
|
+
## Core Rules
|
|
15
16
|
|
|
16
|
-
1.
|
|
17
|
-
-
|
|
18
|
-
-
|
|
19
|
-
|
|
20
|
-
|
|
17
|
+
1. Do not trade recall for token savings.
|
|
18
|
+
2. Stop retrieval as soon as must-have categories are covered.
|
|
19
|
+
3. Adapt per task family, not per one-off query.
|
|
20
|
+
4. Keep defaults portable; keep repo-specific learning in `.gcie` state.
|
|
21
|
+
|
|
22
|
+
## Commands (Tool-Synced)
|
|
23
|
+
|
|
24
|
+
Primary retrieval:
|
|
25
|
+
```powershell
|
|
26
|
+
gcie.cmd context <path> "<query>" --intent <edit|debug|refactor|explore> --budget <auto|int> --mode <basic|adaptive>
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Sliced retrieval:
|
|
30
|
+
```powershell
|
|
31
|
+
gcie.cmd context-slices <path> "<query>" --intent <edit|debug|refactor|explore> --profile <low|recall|adaptive>
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
Adaptive profile state:
|
|
35
|
+
```powershell
|
|
36
|
+
gcie.cmd adaptive-profile .
|
|
37
|
+
gcie.cmd adaptive-profile . --clear
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
Post-init adaptation pipeline:
|
|
41
|
+
```powershell
|
|
42
|
+
gcie.cmd adapt . --benchmark-size 10 --efficiency-iterations 5 --clear-profile
|
|
43
|
+
```
|
|
21
44
|
|
|
22
|
-
|
|
45
|
+
One-shot setup + adaptation:
|
|
23
46
|
```powershell
|
|
24
|
-
gcie.cmd
|
|
47
|
+
gcie.cmd setup . --adapt --adapt-benchmark-size 10 --adapt-efficiency-iterations 5
|
|
25
48
|
```
|
|
26
49
|
|
|
27
|
-
|
|
50
|
+
Setup and index:
|
|
51
|
+
```powershell
|
|
52
|
+
gcie.cmd setup .
|
|
53
|
+
gcie.cmd index .
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
Direct verification:
|
|
57
|
+
```powershell
|
|
58
|
+
rg -n "symbol1|symbol2|symbol3" <likely files or subtree>
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
## Must-Have Coverage Gate (Required)
|
|
62
|
+
|
|
63
|
+
Context is sufficient only when all needed categories are present:
|
|
64
|
+
- implementation file(s)
|
|
65
|
+
- wiring/orchestration/caller file(s)
|
|
66
|
+
- validation surface when risk is non-trivial (test/spec/schema/config/contract/CLI surface)
|
|
67
|
+
|
|
68
|
+
If any must-have file is missing, retrieval is incomplete.
|
|
69
|
+
|
|
70
|
+
If a must-have file appears only as compact/skeleton context, re-query that file explicitly (pin or targeted query) before editing.
|
|
71
|
+
|
|
72
|
+
Note: tests/spec files are often excluded by default. Add `--include-tests` only when test context is required.
|
|
73
|
+
|
|
74
|
+
## Query Construction (Portable, High-Signal)
|
|
75
|
+
|
|
76
|
+
Preferred pattern:
|
|
77
|
+
`<file-a> <file-b> <function/component> <route/flag> <state/config-key>`
|
|
28
78
|
|
|
29
|
-
|
|
79
|
+
Rules:
|
|
80
|
+
1. Use file-first, symbol-heavy phrasing.
|
|
81
|
+
2. Include explicit file paths when known.
|
|
82
|
+
3. Include 2-6 distinctive symbols.
|
|
83
|
+
4. Add caller/entry anchor when target is indirect.
|
|
84
|
+
5. Avoid natural-language question phrasing.
|
|
30
85
|
|
|
31
|
-
|
|
86
|
+
Example:
|
|
87
|
+
- Bad: `How does architecture routing decide when to fall back?`
|
|
88
|
+
- Good: `context/context_router.py context/fallback_evaluator.py architecture routing fallback confidence`
|
|
32
89
|
|
|
33
90
|
## Retrieval Modes (Adaptive Router)
|
|
34
91
|
|
|
35
|
-
Use three modes and
|
|
92
|
+
Use three modes and route by observed outcomes:
|
|
36
93
|
|
|
37
|
-
1. `
|
|
38
|
-
2. `
|
|
39
|
-
3. `direct-file-check`
|
|
94
|
+
1. `slicer-first`
|
|
95
|
+
2. `plain-context-first`
|
|
96
|
+
3. `direct-file-check`
|
|
40
97
|
|
|
41
|
-
|
|
98
|
+
Slicer-first:
|
|
42
99
|
```powershell
|
|
43
|
-
gcie.cmd context <path> "<query>" --
|
|
100
|
+
gcie.cmd context-slices <path> "<query>" --profile low --intent <intent>
|
|
44
101
|
```
|
|
45
102
|
|
|
46
|
-
|
|
103
|
+
Plain-context-first:
|
|
47
104
|
```powershell
|
|
48
|
-
gcie.cmd context
|
|
105
|
+
gcie.cmd context <path> "<query>" --mode basic --intent <intent> --budget auto
|
|
49
106
|
```
|
|
50
107
|
|
|
51
|
-
Direct-file-check
|
|
108
|
+
Direct-file-check:
|
|
52
109
|
```powershell
|
|
53
|
-
rg -n "<
|
|
110
|
+
rg -n "<symbols>" <files-or-subtree>
|
|
54
111
|
```
|
|
55
112
|
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
113
|
+
Routing policy:
|
|
114
|
+
1. Start new repos in `slicer-first` bootstrap mode.
|
|
115
|
+
2. If must-have coverage is incomplete after one slicer pass, switch that task to `plain-context-first`.
|
|
116
|
+
3. If a task family misses with slicer 2+ times in calibration, set that family default to `plain-context-first`.
|
|
117
|
+
4. Keep slicer for families where it is both accurate and cheaper.
|
|
118
|
+
5. If two GCIE attempts still miss required files, use `direct-file-check` and mark family `manual-verify-required` until recalibrated.
|
|
61
119
|
|
|
62
|
-
|
|
63
|
-
- default all families to `plain-context-first`
|
|
64
|
-
- after first 10-20 tasks, promote individual families to `slicer-first` only if benchmarked better
|
|
65
|
-
- keep a family on plain-context if slicer is more expensive with no accuracy gain
|
|
120
|
+
## Scope and Budget Baseline (Portable)
|
|
66
121
|
|
|
67
|
-
|
|
122
|
+
Scope rule:
|
|
123
|
+
1. Use the smallest path scope that still contains expected files.
|
|
124
|
+
2. Use repo root `.` only for true cross-layer recovery.
|
|
125
|
+
3. If explicit targets cluster in one subtree, subtree scope is usually better than root.
|
|
68
126
|
|
|
69
|
-
|
|
127
|
+
Profile ladder (concrete, portable):
|
|
128
|
+
1. `context-slices --profile low`
|
|
129
|
+
2. if miss: `context-slices --profile recall`
|
|
130
|
+
3. if miss: `context-slices --profile recall --pin <missing-file>`
|
|
131
|
+
4. if miss: `rg` direct check and targeted file retrieval
|
|
70
132
|
|
|
71
|
-
|
|
72
|
-
-
|
|
73
|
-
-
|
|
74
|
-
-
|
|
133
|
+
Plain-context budget baseline:
|
|
134
|
+
- `auto`: simple same-layer or strong single-file lookup
|
|
135
|
+
- `900`: same-family two-file lookup
|
|
136
|
+
- `1100`: backend/config pair or same-layer backend pair
|
|
137
|
+
- `1150`: cross-layer UI/API flow
|
|
138
|
+
- `1300-1400`: explicit multi-hop chain
|
|
75
139
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
-
|
|
79
|
-
2. Use `.gcie/architecture.md` as GCIE-owned working architecture map.
|
|
80
|
-
3. Refresh `.gcie/architecture.md` and `.gcie/architecture_index.json` when structural changes happen:
|
|
81
|
-
- new subsystem
|
|
82
|
-
- major module split/merge
|
|
83
|
-
- interface/boundary change
|
|
84
|
-
- dependency-direction change
|
|
85
|
-
- active work-area shift
|
|
86
|
-
4. Do not overwrite user-owned docs unless explicitly asked.
|
|
140
|
+
Gap-fill baseline:
|
|
141
|
+
- general implementation/wiring file: `900`
|
|
142
|
+
- small entry/orchestrator file: `500`
|
|
87
143
|
|
|
88
|
-
|
|
89
|
-
- if architecture slice confidence is low or required mappings are stale/missing, fallback to plain `context` automatically
|
|
90
|
-
- record fallback reason in `.gcie/context_config.json` when bypassing slicer mode
|
|
144
|
+
## Adaptive Recovery Order (One Change At A Time)
|
|
91
145
|
|
|
92
|
-
|
|
146
|
+
When retrieval is weak, apply in this exact order:
|
|
93
147
|
|
|
94
|
-
|
|
148
|
+
1. Query upgrade: add explicit files, symbols, caller/entry anchor
|
|
149
|
+
2. Scope correction: subtree vs root
|
|
150
|
+
3. One profile/budget escalation
|
|
151
|
+
4. Targeted gap-fill for only missing must-have file(s)
|
|
152
|
+
5. Multi-hop decomposition only if still incomplete
|
|
95
153
|
|
|
96
|
-
|
|
97
|
-
-
|
|
98
|
-
- `900`: same-family two-file lookup, frontend-local component lookup
|
|
99
|
-
- `1100`: backend/config pair, same-layer backend pair
|
|
100
|
-
- `1150`: cross-layer UI/API flow
|
|
101
|
-
- `1300-1400`: explicit multi-hop chain (3+ linked files)
|
|
154
|
+
Stop condition:
|
|
155
|
+
- If a required file is still missing after two GCIE attempts (with query+scope corrected), stop GCIE retries and use `rg`.
|
|
102
156
|
|
|
103
|
-
|
|
104
|
-
- missing general implementation/wiring file: `900`
|
|
105
|
-
- missing small orchestration or entry file: `500`
|
|
157
|
+
## Architecture Tracking (Portable, In-Repo)
|
|
106
158
|
|
|
107
|
-
|
|
108
|
-
-
|
|
109
|
-
-
|
|
110
|
-
-
|
|
159
|
+
Track these under `.gcie/`:
|
|
160
|
+
- `.gcie/architecture.md`
|
|
161
|
+
- `.gcie/architecture_index.json`
|
|
162
|
+
- `.gcie/context_config.json`
|
|
111
163
|
|
|
112
|
-
|
|
164
|
+
Keep adaptive:
|
|
165
|
+
1. Bootstrap from user docs once (`ARCHITECTURE.md`, `README.md`, `PROJECT.md`, `docs/*architecture*`).
|
|
166
|
+
2. Treat `.gcie/architecture.md` as GCIE-owned working map.
|
|
167
|
+
3. Refresh architecture files when boundaries/subsystems/interfaces change.
|
|
168
|
+
4. Do not overwrite user-owned docs unless explicitly asked.
|
|
113
169
|
|
|
114
|
-
|
|
170
|
+
Fallback confidence rule:
|
|
171
|
+
- If architecture confidence is low or mappings are stale/missing, fallback to plain context and record reason in `.gcie/context_config.json`.
|
|
115
172
|
|
|
116
|
-
|
|
173
|
+
## Pre-Calibration Readiness Gate (Required)
|
|
117
174
|
|
|
118
|
-
|
|
119
|
-
- include explicit file paths when known
|
|
120
|
-
- include 2 to 6 distinctive symbols
|
|
121
|
-
- include a caller or entry anchor when the target is indirect
|
|
122
|
-
- avoid vague summaries and long laundry-list queries
|
|
175
|
+
Run before full adaptation:
|
|
123
176
|
|
|
124
|
-
|
|
177
|
+
1. Index + architecture refresh:
|
|
178
|
+
```powershell
|
|
179
|
+
gcie.cmd index .
|
|
180
|
+
```
|
|
125
181
|
|
|
126
|
-
|
|
127
|
-
-
|
|
128
|
-
- generic entry/support files dominate
|
|
129
|
-
- only tiny snippets from the target file appear, with no useful implementation body
|
|
130
|
-
- expected cross-layer endpoint is missing
|
|
182
|
+
2. Readiness probe:
|
|
183
|
+
- Run 10-20 stratified queries across major families using `context-slices` and `context`.
|
|
131
184
|
|
|
132
|
-
|
|
185
|
+
3. Readiness adaptation loop:
|
|
186
|
+
- Apply recovery order (query -> scope -> escalation -> targeted gap-fill -> decomposition).
|
|
133
187
|
|
|
134
|
-
|
|
135
|
-
-
|
|
136
|
-
-
|
|
137
|
-
- add caller or entry anchor
|
|
188
|
+
4. Gate decision:
|
|
189
|
+
- Proceed to calibration only after coverage is reachable with stable behavior.
|
|
190
|
+
- If not reachable, keep safer fallback mode for affected families and continue tracking.
|
|
138
191
|
|
|
139
|
-
|
|
140
|
-
- noisy root results: move to subtree scope
|
|
141
|
-
- missing cross-layer or backend anchor: use a targeted root query for that file
|
|
192
|
+
## Automatic Post-Trigger Adaptation (Required)
|
|
142
193
|
|
|
143
|
-
|
|
144
|
-
- raise one rung only, roughly `+100` to `+250`
|
|
194
|
+
After trigger detection in a repo session:
|
|
145
195
|
|
|
146
|
-
|
|
147
|
-
|
|
196
|
+
1. `checkpoint: trigger_detected`
|
|
197
|
+
2. Run `gcie.cmd index .` -> `checkpoint: index_complete`
|
|
198
|
+
3. Run readiness probe -> `checkpoint: readiness_probe_complete`
|
|
199
|
+
4. Run accuracy calibration to 100% must-have hit -> `checkpoint: accuracy_lock_complete`
|
|
200
|
+
5. Run efficiency iterations only under hard accuracy gate -> `checkpoint: efficiency_complete`
|
|
201
|
+
6. Run final stress validation (recommended 50-query) -> `checkpoint: stress_validation_complete`
|
|
202
|
+
7. Write back results (`.planning`, `.gcie/context_config.json`, learned overrides section) -> `checkpoint: write_back_complete`
|
|
148
203
|
|
|
149
|
-
|
|
150
|
-
- for 4+ hops, split into adjacent 2-3 file hops
|
|
204
|
+
If any checkpoint fails, mark run `incomplete`, record failure artifact in `.planning/`, and continue recovery/fallback flow.
|
|
151
205
|
|
|
152
|
-
##
|
|
206
|
+
## Mandatory Bootstrap Calibration Sequence
|
|
153
207
|
|
|
154
|
-
|
|
208
|
+
1. Recall calibration stage (required):
|
|
209
|
+
- Tune mode/scope/query/profile until overall and per-family hit rates are 100%.
|
|
155
210
|
|
|
156
|
-
|
|
157
|
-
-
|
|
158
|
-
- for a single missing file, try `800` before `900` only if the first pass already found same-family context
|
|
159
|
-
- if `800` misses, immediately retry the stable default
|
|
160
|
-
- if any miss persists, revert that task family to stable settings
|
|
211
|
+
2. Recall lock verification (required):
|
|
212
|
+
- Require 2 consecutive 100% lock runs.
|
|
161
213
|
|
|
162
|
-
|
|
163
|
-
-
|
|
164
|
-
-
|
|
214
|
+
3. Efficiency stage (optional, only after lock):
|
|
215
|
+
- Test controlled reductions one change at a time.
|
|
216
|
+
- Immediately rollback any hit-rate regression.
|
|
165
217
|
|
|
166
|
-
|
|
218
|
+
4. Activation rule (required):
|
|
219
|
+
- Activate only if lock/stress pass.
|
|
220
|
+
- If stress fails, rollback to last known 100%-hit config.
|
|
167
221
|
|
|
168
|
-
|
|
222
|
+
## Metrics and Decision Rules
|
|
169
223
|
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
224
|
+
Per query, record:
|
|
225
|
+
- must-have hit (true/false)
|
|
226
|
+
- tokens used
|
|
227
|
+
- retrieved files
|
|
228
|
+
- escalations performed
|
|
173
229
|
|
|
174
|
-
|
|
230
|
+
Track overall and by family:
|
|
231
|
+
- hit rate
|
|
232
|
+
- average and median tokens
|
|
233
|
+
- tokens-per-hit (`total_tokens / hit_count`)
|
|
175
234
|
|
|
176
|
-
|
|
235
|
+
Selection rule per family:
|
|
236
|
+
1. highest hit rate
|
|
237
|
+
2. if tie: lowest tokens-per-hit
|
|
238
|
+
3. if tie: lowest median tokens
|
|
177
239
|
|
|
178
|
-
|
|
240
|
+
Demotion rules:
|
|
241
|
+
- If slicer miss-rate > 0% during recall calibration, do not keep slicer as default for that family.
|
|
242
|
+
- If both slicer and plain fail, route family to manual-verify until recalibration.
|
|
179
243
|
|
|
180
|
-
|
|
181
|
-
-
|
|
182
|
-
-
|
|
183
|
-
- validation surface, when risk justifies it
|
|
244
|
+
Promotion rules:
|
|
245
|
+
- Promote only configurations that preserve 100% hit.
|
|
246
|
+
- Efficiency changes must improve tokens without reducing hit rate.
|
|
184
247
|
|
|
185
|
-
|
|
248
|
+
## Continuous Adaptation Over Time
|
|
186
249
|
|
|
187
|
-
|
|
250
|
+
Trigger recalibration when any are true:
|
|
251
|
+
1. major repo-change signal (large refactor/churn)
|
|
252
|
+
2. savings decay (rolling savings drops materially vs active baseline)
|
|
253
|
+
3. repeated family misses (2+ in recent window)
|
|
188
254
|
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
-
|
|
255
|
+
Guardrails:
|
|
256
|
+
1. Use a minimum evidence window (recommended: 20 retrieval events).
|
|
257
|
+
2. Run in quiet/background mode when possible.
|
|
258
|
+
3. Cap adaptation budget per cycle.
|
|
259
|
+
4. Early-stop efficiency loop after 2 non-improving iterations.
|
|
260
|
+
5. Prefer family-scoped recalibration before full recalibration.
|
|
195
261
|
|
|
196
|
-
|
|
197
|
-
- add one local override for that family only
|
|
198
|
-
- keep all other families on portable defaults
|
|
262
|
+
## Persistence
|
|
199
263
|
|
|
200
|
-
|
|
201
|
-
-
|
|
202
|
-
-
|
|
203
|
-
-
|
|
264
|
+
Persist learned defaults in `.gcie/context_config.json` and `.gcie/retrieval_profile.json` with:
|
|
265
|
+
- family
|
|
266
|
+
- default mode/profile
|
|
267
|
+
- last benchmark date
|
|
268
|
+
- hit/token metrics
|
|
204
269
|
|
|
205
|
-
|
|
270
|
+
Write repo-local learned routing here:
|
|
206
271
|
|
|
207
|
-
|
|
272
|
+
## Learned Routing Overrides (Repo-Local, Mutable)
|
|
208
273
|
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
gcie.cmd context frontend "src/App.jsx src/main.jsx <symbols>" --intent edit --budget 900
|
|
212
|
-
gcie.cmd context . "app.py start_convert selected_theme selectedTheme no_ai" --intent edit --budget 900
|
|
213
|
-
```
|
|
274
|
+
No active learned overrides yet.
|
|
275
|
+
Populate after first full adaptation cycle.
|
|
214
276
|
|
|
215
|
-
|
|
216
|
-
```powershell
|
|
217
|
-
gcie.cmd context . "Plan_slides.py content_slides section_divider figure_slides table_slide" --intent <intent> --budget 900
|
|
218
|
-
gcie.cmd context . "Build_pptx.py build_pptx render_eq_png apply_theme THEME_CHOICES" --intent <intent> --budget 900
|
|
219
|
-
```
|
|
277
|
+
## Agent Instructions Snippet (Copy/Paste)
|
|
220
278
|
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
279
|
+
```text
|
|
280
|
+
Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.
|
|
281
|
+
Prioritize must-have coverage over token savings.
|
|
282
|
+
Start with context-slices --profile low, then adapt using recovery order:
|
|
283
|
+
query -> scope -> profile/budget escalation -> targeted gap-fill -> rg fallback.
|
|
225
284
|
```
|
|
226
285
|
|
|
227
|
-
|
|
228
|
-
- keep the stable workflow for families that regress under split retrieval
|
|
229
|
-
- example: `llm_client.py + Analyze_pdf_structure.py + Extract_pdf_content.py` in one benchmarked repo
|
|
286
|
+
## Notes
|
|
230
287
|
|
|
231
|
-
|
|
288
|
+
1. This file is intentionally generalized and adaptive for any repo.
|
|
289
|
+
2. Keep repo-specific tuning in learned overrides and `.gcie` state, not in global defaults.
|
|
290
|
+
3. If in doubt, choose the higher-accuracy path first, then optimize tokens after lock.
|
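The per-family selection rule that GCIE_USAGE.md now documents (highest hit rate, then lowest tokens-per-hit, then lowest median tokens) could be sketched in Python roughly as follows. This only illustrates the documented ordering and is not the package's implementation: `QueryRecord`, `summarize`, and `select_mode` are hypothetical names, and the sample numbers are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QueryRecord:
    """One benchmark query result (hypothetical field names)."""
    mode: str    # e.g. "slicer-first" or "plain-context-first"
    hit: bool    # must-have hit (true/false)
    tokens: int  # tokens used


def summarize(records):
    """Aggregate hit rate, tokens-per-hit, and median tokens per mode."""
    by_mode = {}
    for rec in records:
        by_mode.setdefault(rec.mode, []).append(rec)

    summary = {}
    for mode, recs in by_mode.items():
        hits = sum(1 for r in recs if r.hit)
        total_tokens = sum(r.tokens for r in recs)
        summary[mode] = {
            "hit_rate": hits / len(recs),
            # tokens-per-hit = total_tokens / hit_count; infinite if nothing hit
            "tokens_per_hit": total_tokens / hits if hits else float("inf"),
            "median_tokens": median(r.tokens for r in recs),
        }
    return summary


def select_mode(records):
    """Pick a mode: highest hit rate, then lowest tokens-per-hit, then lowest median."""
    summary = summarize(records)
    return min(
        summary,
        key=lambda m: (
            -summary[m]["hit_rate"],
            summary[m]["tokens_per_hit"],
            summary[m]["median_tokens"],
        ),
    )


# Made-up calibration data for a single task family.
sample = [
    QueryRecord("slicer-first", True, 600),
    QueryRecord("slicer-first", True, 700),
    QueryRecord("plain-context-first", True, 900),
    QueryRecord("plain-context-first", True, 1000),
]
print(select_mode(sample))  # slicer-first
```

Sorting on a tuple keeps accuracy strictly ahead of token cost, which matches the stated priority order: a cheaper mode can only win once hit rates are tied.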
package/README.md
CHANGED
|
@@ -58,7 +58,7 @@ Use this when you want a fast drop-in setup for coding agents.
|
|
|
58
58
|
No heavy upfront tuning is required. The workflow starts portable-first and only adds local overrides after repeated miss patterns.
|
|
59
59
|
|
|
60
60
|
One-command repo bootstrap:
|
|
61
|
-
- `gcie
|
|
61
|
+
- `npx -y @pmaddire/gcie@latest setup .`
|
|
62
62
|
|
|
63
63
|
This creates `.gcie` architecture tracking files, copies portable agent workflow docs, and runs an initial index pass.
|
|
64
64
|
|
|
@@ -101,7 +101,7 @@ Net: adapted protocol preserved full accuracy while reducing average tokens by ~
|
|
|
101
101
|
After publishing to npm, users can set up any repo with one command:
|
|
102
102
|
|
|
103
103
|
```powershell
|
|
104
|
-
npx gcie@latest
|
|
104
|
+
npx -y @pmaddire/gcie@latest setup .
|
|
105
105
|
```
|
|
106
106
|
|
|
107
107
|
This runs `gcie setup .` in the current repo by default.
|
|
@@ -110,10 +110,32 @@ If Python deps are missing, GCIE now bootstraps a local package venv and install
|
|
|
110
110
|
Optional setup flags are passed through:
|
|
111
111
|
|
|
112
112
|
```powershell
|
|
113
|
-
npx gcie@latest --no-index
|
|
114
|
-
npx gcie@latest --force
|
|
113
|
+
npx -y @pmaddire/gcie@latest setup . --no-index
|
|
114
|
+
npx -y @pmaddire/gcie@latest setup . --force
|
|
115
115
|
```
|
|
116
116
|
|
|
117
|
+
For command-only usage without setup:
|
|
118
|
+
|
|
119
|
+
```powershell
|
|
120
|
+
npx -y @pmaddire/gcie@latest --help
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
## Agent Integration
|
|
124
|
+
|
|
125
|
+
To make your coding agent use GCIE automatically, add this trigger line to your
|
|
126
|
+
agent instructions (system prompt / repo instruction file):
|
|
127
|
+
|
|
128
|
+
`Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.`
|
|
129
|
+
|
|
130
|
+
Required file:
|
|
131
|
+
- keep `GCIE_USAGE.md` in the target repo root
|
|
132
|
+
|
|
133
|
+
Recommended setup:
|
|
134
|
+
1. Run one-command setup:
|
|
135
|
+
- `npx -y @pmaddire/gcie@latest setup .`
|
|
136
|
+
2. Add the trigger line above to your agent instruction file.
|
|
137
|
+
3. Start normal coding tasks; the agent should use GCIE-first retrieval workflow.
|
|
138
|
+
|
|
117
139
|
## One-Command GitHub Bootstrap
|
|
118
140
|
|
|
119
141
|
Run this from the target repo to download GCIE from GitHub and set it up automatically:
|
|
@@ -146,7 +168,7 @@ What it does:
|
|
|
146
168
|
1. In the GCIE repo:
|
|
147
169
|
- `npm link`
|
|
148
170
|
2. In your target repo:
|
|
149
|
-
- `npm link gcie`
|
|
171
|
+
- `npm link @pmaddire/gcie`
|
|
150
172
|
3. Verify:
|
|
151
173
|
- `gcie --help`
|
|
152
174
|
|
|
@@ -162,7 +184,7 @@ This repo includes a lightweight npm wrapper so you can run `gcie` like other np
|
|
|
162
184
|
2. In target repo: `gcie --help`
|
|
163
185
|
|
|
164
186
|
Local option:
|
|
165
|
-
- `npm install` then `npx gcie --help`
|
|
187
|
+
- `npm install` then `npx @pmaddire/gcie@latest --help`
|
|
166
188
|
|
|
167
189
|
The wrapper prefers `.venv` in the GCIE repo and falls back to system Python.
|
|
168
190
|
|
|
@@ -216,7 +238,7 @@ Important note:
|
|
|
216
238
|
- `gcie index <path>`
|
|
217
239
|
- `gcie query <file.py> "<question>"`
|
|
218
240
|
- `gcie debug <file.py> "<question>"`
|
|
219
|
-
- `gcie context <repo|file> "<task>" --budget auto --intent <edit|debug|refactor|explore
|
|
241
|
+
- `gcie context <repo|file> "<task>" --budget auto --intent <edit|debug|refactor|explore> --mode basic`
|
|
220
242
|
- `gcie context-slices <repo> "<task>" --intent <edit|debug|refactor|explore> [--profile recall|low] [--stage-a 400] [--stage-b 800] [--max-total 1200] [--pin frontend/src/App.jsx] [--pin-budget 300] [--include-tests]`
|
|
221
243
|
|
|
222
244
|
## How To Use It
|
|
@@ -367,6 +389,6 @@ npm publish --access public
|
|
|
367
389
|
Then users can run:
|
|
368
390
|
|
|
369
391
|
```powershell
|
|
370
|
-
npx gcie@latest
|
|
392
|
+
npx -y @pmaddire/gcie@latest setup .
|
|
371
393
|
```
|
|
372
394
|
|
package/bench_questions.py
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
import os
|
|
2
|
+
import pathlib
|
|
3
|
+
from parser.ast_parser import parse_python_file
|
|
4
|
+
from graphs.call_graph import build_call_graph
|
|
5
|
+
from graphs.variable_graph import build_variable_graph
|
|
6
|
+
from retrieval.hybrid_retriever import hybrid_retrieve
|
|
7
|
+
from llm_context.snippet_selector import RankedSnippet, estimate_tokens
|
|
8
|
+
from llm_context.context_builder import build_context
|
|
9
|
+
|
|
10
|
+
ROOT=pathlib.Path('.')
|
|
11
|
+
EXCLUDE={'__pycache__','.venv','venv'}
|
|
12
|
+
py_files=[]
|
|
13
|
+
for path in ROOT.rglob('*.py'):
|
|
14
|
+
    if any(part in EXCLUDE for part in path.parts):
|
|
15
|
+
        continue
|
|
16
|
+
    py_files.append(path)
|
|
17
|
+
|
|
18
|
+
snippets_by_node={}
|
|
19
|
+
modules=[]
|
|
20
|
+
for path in py_files:
|
|
21
|
+
    try:
|
|
22
|
+
        module=parse_python_file(path)
|
|
23
|
+
    except Exception:
|
|
24
|
+
        continue
|
|
25
|
+
    modules.append(module)
|
|
26
|
+
    text=path.read_text()
|
|
27
|
+
    lines=text.splitlines()
|
|
28
|
+
    for fn in module.functions:
|
|
29
|
+
        start=max(0,fn.start_line-1)
|
|
30
|
+
        end=min(len(lines),fn.end_line)
|
|
31
|
+
        snippet='\n'.join(lines[start:end])
|
|
32
|
+
        node=f"function:{path.as_posix()}::{fn.name}"
|
|
33
|
+
        snippets_by_node[node]=snippet
|
|
34
|
+
|
|
35
|
+
call_graph=build_call_graph(modules)
|
|
36
|
+
var_graph=build_variable_graph(modules)
|
|
37
|
+
graph=call_graph
|
|
38
|
+
for node,attrs in var_graph.nodes(data=True):
|
|
39
|
+
    if not graph.has_node(node):
|
|
40
|
+
        graph.add_node(node,**attrs)
|
|
41
|
+
for u,v,data in var_graph.edges(data=True):
|
|
42
|
+
    graph.add_edge(u,v,**data)
|
|
43
|
+
|
|
44
|
+
prompts=[
|
|
45
|
+
"Why is variable diff exploding?",
|
|
46
|
+
"How does git history mining handle empty repositories?",
|
|
47
|
+
"How do CLI index/query/debug commands work?",
|
|
48
|
+
]
|
|
49
|
+
|
|
50
|
+
def naive_tokens():
|
|
51
|
+
    total=0
|
|
52
|
+
    for path in py_files:
|
|
53
|
+
        total+=estimate_tokens(path.read_text())
|
|
54
|
+
    return total
|
|
55
|
+
|
|
56
|
+
naive=naive_tokens()
|
|
57
|
+
print('Prompt|GCIE tokens|Naive tokens|Reduction%|Selected snippets|Notes')
|
|
58
|
+
for prompt in prompts:
|
|
59
|
+
    hybrid=hybrid_retrieve(graph,prompt,top_k=10,git_recency_by_node={},coverage_risk_by_node={},max_hops=2)
|
|
60
|
+
    ranked=[]
|
|
61
|
+
    for cand in hybrid:
|
|
62
|
+
        text=snippets_by_node.get(cand.node_id)
|
|
63
|
+
        if not text:
|
|
64
|
+
            continue
|
|
65
|
+
        ranked.append(RankedSnippet(cand.node_id,text,cand.score))
|
|
66
|
+
    context=build_context(prompt,ranked,token_budget=300)
|
|
67
|
+
    reduction=(1-context.total_tokens_estimate/naive)*100 if naive else 0
|
|
68
|
+
    note='good' if ranked else 'empty'
|
|
69
|
+
    print(f"{prompt}|{context.total_tokens_estimate:.1f}|{naive:.1f}|{reduction:.1f}%|{len(context.snippets)}|{note}")
|
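The `Reduction%` column that bench_questions.py prints follows from the script's own expression `(1 - context.total_tokens_estimate / naive) * 100`. A minimal standalone sketch of that arithmetic is below. The helper name `reduction_percent` is hypothetical and the token counts are made up; only the formula and the zero guard come from the script.

```python
def reduction_percent(gcie_tokens: float, naive_tokens: float) -> float:
    """Percentage of tokens saved relative to reading every file in full."""
    if not naive_tokens:
        # Mirrors the script's guard: with no naive baseline, report 0.
        return 0.0
    return (1 - gcie_tokens / naive_tokens) * 100


# Made-up numbers: a 300-token context against a 12,000-token repository.
gcie, naive = 300, 12_000
print(f"example prompt|{gcie:.1f}|{naive:.1f}|{reduction_percent(gcie, naive):.1f}%")
```

Because the benchmark caps the context at `token_budget=300`, the reported reduction mostly reflects how large the repository is compared with that budget, so it is probably best read alongside the selected-snippet count and not on its own.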