@pmaddire/gcie 0.1.3 → 0.1.5

package/AGENT.md CHANGED
@@ -1,10 +1,14 @@
  # AGENT.md
 
- Agent Operating Instructions for GraphCode Intelligence Engine (GCIE)
+ Agent Operating Instructions for GraphCode Intelligence Engine (GCIE)
 
  This file provides persistent architectural context for coding agents working on this repository.
 
- Agents must read this file before performing any development tasks.
+ Agents must read this file before performing any development tasks.
+
+ Agent trigger:
+ Use GCIE for context lookup before reading files or making edits.
+ Follow `GCIE_USAGE.md` for the adaptive retrieval workflow and mode-routing rules.
 
  ---
 
package/GCIE_USAGE.md ADDED
@@ -0,0 +1,290 @@
+ # GCIE Agent Usage (Portable, Accuracy-First)
+
+ Trigger line for agent instructions:
+ `Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.`
+
+ ## Goal
+
+ Retrieve the smallest useful context without sacrificing edit safety.
+
+ Priority order:
+ 1. accuracy (must-have coverage)
+ 2. full-hit reliability
+ 3. token efficiency
+
+ ## Core Rules
+
+ 1. Do not trade recall for token savings.
+ 2. Stop retrieval as soon as must-have categories are covered.
+ 3. Adapt per task family, not per one-off query.
+ 4. Keep defaults portable; keep repo-specific learning in `.gcie` state.
+
+ ## Commands (Tool-Synced)
+
+ Primary retrieval:
+ ```powershell
+ gcie.cmd context <path> "<query>" --intent <edit|debug|refactor|explore> --budget <auto|int> --mode <basic|adaptive>
+ ```
+
+ Sliced retrieval:
+ ```powershell
+ gcie.cmd context-slices <path> "<query>" --intent <edit|debug|refactor|explore> --profile <low|recall|adaptive>
+ ```
+
+ Adaptive profile state:
+ ```powershell
+ gcie.cmd adaptive-profile .
+ gcie.cmd adaptive-profile . --clear
+ ```
+
+ Post-init adaptation pipeline:
+ ```powershell
+ gcie.cmd adapt . --benchmark-size 10 --efficiency-iterations 5 --clear-profile
+ ```
+
+ One-shot setup + adaptation:
+ ```powershell
+ gcie.cmd setup . --adapt --adapt-benchmark-size 10 --adapt-efficiency-iterations 5
+ ```
+
+ Setup and index:
+ ```powershell
+ gcie.cmd setup .
+ gcie.cmd index .
+ ```
+
+ Direct verification:
+ ```powershell
+ rg -n "symbol1|symbol2|symbol3" <likely files or subtree>
+ ```
+
+ ## Must-Have Coverage Gate (Required)
+
+ Context is sufficient only when all needed categories are present:
+ - implementation file(s)
+ - wiring/orchestration/caller file(s)
+ - validation surface when risk is non-trivial (test/spec/schema/config/contract/CLI surface)
+
+ If any must-have file is missing, retrieval is incomplete.
+
+ If a must-have file appears only as compact/skeleton context, re-query that file explicitly (pin or targeted query) before editing.
+
+ Note: tests/spec files are often excluded by default. Add `--include-tests` only when test context is required.
+
+ ## Query Construction (Portable, High-Signal)
+
+ Preferred pattern:
+ `<file-a> <file-b> <function/component> <route/flag> <state/config-key>`
+
+ Rules:
+ 1. Use file-first, symbol-heavy phrasing.
+ 2. Include explicit file paths when known.
+ 3. Include 2-6 distinctive symbols.
+ 4. Add caller/entry anchor when target is indirect.
+ 5. Avoid natural-language question phrasing.
+
+ Example:
+ - Bad: `How does architecture routing decide when to fall back?`
+ - Good: `context/context_router.py context/fallback_evaluator.py architecture routing fallback confidence`
+
+ ## Retrieval Modes (Adaptive Router)
+
+ Use three modes and route by observed outcomes:
+
+ 1. `slicer-first`
+ 2. `plain-context-first`
+ 3. `direct-file-check`
+
+ Slicer-first:
+ ```powershell
+ gcie.cmd context-slices <path> "<query>" --profile low --intent <intent>
+ ```
+
+ Plain-context-first:
+ ```powershell
+ gcie.cmd context <path> "<query>" --mode basic --intent <intent> --budget auto
+ ```
+
+ Direct-file-check:
+ ```powershell
+ rg -n "<symbols>" <files-or-subtree>
+ ```
+
+ Routing policy:
+ 1. Start new repos in `slicer-first` bootstrap mode.
+ 2. If must-have coverage is incomplete after one slicer pass, switch that task to `plain-context-first`.
+ 3. If a task family misses with slicer 2+ times in calibration, set that family default to `plain-context-first`.
+ 4. Keep slicer for families where it is both accurate and cheaper.
+ 5. If two GCIE attempts still miss required files, use `direct-file-check` and mark family `manual-verify-required` until recalibrated.
+
+ ## Scope and Budget Baseline (Portable)
+
+ Scope rule:
+ 1. Use the smallest path scope that still contains expected files.
+ 2. Use repo root `.` only for true cross-layer recovery.
+ 3. If explicit targets cluster in one subtree, subtree scope is usually better than root.
+
+ Profile ladder (concrete, portable):
+ 1. `context-slices --profile low`
+ 2. if miss: `context-slices --profile recall`
+ 3. if miss: `context-slices --profile recall --pin <missing-file>`
+ 4. if miss: `rg` direct check and targeted file retrieval
+
+ Plain-context budget baseline:
+ - `auto`: simple same-layer or strong single-file lookup
+ - `900`: same-family two-file lookup
+ - `1100`: backend/config pair or same-layer backend pair
+ - `1150`: cross-layer UI/API flow
+ - `1300-1400`: explicit multi-hop chain
+
+ Gap-fill baseline:
+ - general implementation/wiring file: `900`
+ - small entry/orchestrator file: `500`
+
+ ## Adaptive Recovery Order (One Change At A Time)
+
+ When retrieval is weak, apply in this exact order:
+
+ 1. Query upgrade: add explicit files, symbols, caller/entry anchor
+ 2. Scope correction: subtree vs root
+ 3. One profile/budget escalation
+ 4. Targeted gap-fill for only missing must-have file(s)
+ 5. Multi-hop decomposition only if still incomplete
+
+ Stop condition:
+ - If a required file is still missing after two GCIE attempts (with query+scope corrected), stop GCIE retries and use `rg`.
+
+ ## Architecture Tracking (Portable, In-Repo)
+
+ Track these under `.gcie/`:
+ - `.gcie/architecture.md`
+ - `.gcie/architecture_index.json`
+ - `.gcie/context_config.json`
+
+ Keep adaptive:
+ 1. Bootstrap from user docs once (`ARCHITECTURE.md`, `README.md`, `PROJECT.md`, `docs/*architecture*`).
+ 2. Treat `.gcie/architecture.md` as GCIE-owned working map.
+ 3. Refresh architecture files when boundaries/subsystems/interfaces change.
+ 4. Do not overwrite user-owned docs unless explicitly asked.
+
+ Fallback confidence rule:
+ - If architecture confidence is low or mappings are stale/missing, fallback to plain context and record reason in `.gcie/context_config.json`.
+
+ ## Pre-Calibration Readiness Gate (Required)
+
+ Run before full adaptation:
+
+ 1. Index + architecture refresh:
+ ```powershell
+ gcie.cmd index .
+ ```
+
+ 2. Readiness probe:
+ - Run 10-20 stratified queries across major families using `context-slices` and `context`.
+
+ 3. Readiness adaptation loop:
+ - Apply recovery order (query -> scope -> escalation -> targeted gap-fill -> decomposition).
+
+ 4. Gate decision:
+ - Proceed to calibration only after coverage is reachable with stable behavior.
+ - If not reachable, keep safer fallback mode for affected families and continue tracking.
+
+ ## Automatic Post-Trigger Adaptation (Required)
+
+ After trigger detection in a repo session:
+
+ 1. `checkpoint: trigger_detected`
+ 2. Run `gcie.cmd index .` -> `checkpoint: index_complete`
+ 3. Run readiness probe -> `checkpoint: readiness_probe_complete`
+ 4. Run accuracy calibration to 100% must-have hit -> `checkpoint: accuracy_lock_complete`
+ 5. Run efficiency iterations only under hard accuracy gate -> `checkpoint: efficiency_complete`
+ 6. Run final stress validation (recommended 50-query) -> `checkpoint: stress_validation_complete`
+ 7. Write back results (`.planning`, `.gcie/context_config.json`, learned overrides section) -> `checkpoint: write_back_complete`
+
+ If any checkpoint fails, mark run `incomplete`, record failure artifact in `.planning/`, and continue recovery/fallback flow.
+
+ ## Mandatory Bootstrap Calibration Sequence
+
+ 1. Recall calibration stage (required):
+ - Tune mode/scope/query/profile until overall and per-family hit rates are 100%.
+
+ 2. Recall lock verification (required):
+ - Require 2 consecutive 100% lock runs.
+
+ 3. Efficiency stage (optional, only after lock):
+ - Test controlled reductions one change at a time.
+ - Immediately rollback any hit-rate regression.
+
+ 4. Activation rule (required):
+ - Activate only if lock/stress pass.
+ - If stress fails, rollback to last known 100%-hit config.
+
+ ## Metrics and Decision Rules
+
+ Per query, record:
+ - must-have hit (true/false)
+ - tokens used
+ - retrieved files
+ - escalations performed
+
+ Track overall and by family:
+ - hit rate
+ - average and median tokens
+ - tokens-per-hit (`total_tokens / hit_count`)
+
+ Selection rule per family:
+ 1. highest hit rate
+ 2. if tie: lowest tokens-per-hit
+ 3. if tie: lowest median tokens
+
+ Demotion rules:
+ - If slicer miss-rate > 0% during recall calibration, do not keep slicer as default for that family.
+ - If both slicer and plain fail, route family to manual-verify until recalibration.
+
+ Promotion rules:
+ - Promote only configurations that preserve 100% hit.
+ - Efficiency changes must improve tokens without reducing hit rate.
+
+ ## Continuous Adaptation Over Time
+
+ Trigger recalibration when any are true:
+ 1. major repo-change signal (large refactor/churn)
+ 2. savings decay (rolling savings drops materially vs active baseline)
+ 3. repeated family misses (2+ in recent window)
+
+ Guardrails:
+ 1. Use a minimum evidence window (recommended: 20 retrieval events).
+ 2. Run in quiet/background mode when possible.
+ 3. Cap adaptation budget per cycle.
+ 4. Early-stop efficiency loop after 2 non-improving iterations.
+ 5. Prefer family-scoped recalibration before full recalibration.
+
+ ## Persistence
+
+ Persist learned defaults in `.gcie/context_config.json` and `.gcie/retrieval_profile.json` with:
+ - family
+ - default mode/profile
+ - last benchmark date
+ - hit/token metrics
+
+ Write repo-local learned routing here:
+
+ ## Learned Routing Overrides (Repo-Local, Mutable)
+
+ No active learned overrides yet.
+ Populate after first full adaptation cycle.
+
+ ## Agent Instructions Snippet (Copy/Paste)
+
+ ```text
+ Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.
+ Prioritize must-have coverage over token savings.
+ Start with context-slices --profile low, then adapt using recovery order:
+ query -> scope -> profile/budget escalation -> targeted gap-fill -> rg fallback.
+ ```
+
+ ## Notes
+
+ 1. This file is intentionally generalized and adaptive for any repo.
+ 2. Keep repo-specific tuning in learned overrides and `.gcie` state, not in global defaults.
+ 3. If in doubt, choose the higher-accuracy path first, then optimize tokens after lock.
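The per-family selection and demotion rules in GCIE_USAGE.md above can be expressed as a small ranking function. This is an illustrative sketch, not code from the @pmaddire/gcie package: `QueryRecord`, `family_stats`, and `select_mode` are hypothetical names, and reading "both slicer and plain fail" as "no mode reaches a 100% must-have hit rate" is an assumption.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QueryRecord:
    # Hypothetical record holding two of the per-query fields GCIE_USAGE.md says to log.
    must_have_hit: bool
    tokens_used: int


def family_stats(records: list[QueryRecord]) -> tuple[float, float, float]:
    """Return (hit_rate, tokens_per_hit, median_tokens) for one mode within one task family."""
    hits = sum(1 for r in records if r.must_have_hit)
    total_tokens = sum(r.tokens_used for r in records)
    hit_rate = hits / len(records)
    # tokens-per-hit = total_tokens / hit_count; a mode with zero hits ranks as infinitely costly.
    tokens_per_hit = total_tokens / hits if hits else float("inf")
    return hit_rate, tokens_per_hit, median([r.tokens_used for r in records])


def select_mode(results_by_mode: dict[str, list[QueryRecord]]) -> str:
    """Selection rule: highest hit rate, then lowest tokens-per-hit, then lowest median tokens."""

    def rank_key(mode: str) -> tuple[float, float, float]:
        hit_rate, tokens_per_hit, median_tokens = family_stats(results_by_mode[mode])
        # Negate hit_rate so min() prefers the highest rate; ties fall through to the cost terms.
        return (-hit_rate, tokens_per_hit, median_tokens)

    best = min(results_by_mode, key=rank_key)
    # Assumed reading of the demotion rules: if no mode reached a 100% must-have hit rate,
    # none may stay default and the family is routed to manual verification.
    if family_stats(results_by_mode[best])[0] < 1.0:
        return "manual-verify-required"
    return best
```

For example, with a slicer run that hit one of two queries at 400 and 300 tokens and a plain-context run that hit both at 900 and 1100 tokens, `select_mode` picks `plain-context-first` even though the slicer was cheaper, which matches the accuracy-first priority order.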
package/README.md CHANGED
@@ -49,7 +49,7 @@ cost of sending full repo surfaces to the model.
  Use this when you want a fast drop-in setup for coding agents.
 
  1. Install GCIE CLI in the target repo (via your preferred method: npm link, local wrapper, or direct Python module).
- 2. Copy [AGENT_USAGE.md](c:\GBCRSS\AGENT_USAGE.md) into the target repo root.
+ 2. Copy [GCIE_USAGE.md](c:\GBCRSS\GCIE_USAGE.md) into the target repo root.
  3. Run one index pass:
  - `gcie.cmd index .`
  4. Start using adaptive retrieval immediately:
@@ -58,7 +58,7 @@ Use this when you want a fast drop-in setup for coding agents.
  No heavy upfront tuning is required. The workflow starts portable-first and only adds local overrides after repeated miss patterns.
 
  One-command repo bootstrap:
- - `gcie.cmd setup .`
+ - `npx -y @pmaddire/gcie@latest setup .`
 
  This creates `.gcie` architecture tracking files, copies portable agent workflow docs, and runs an initial index pass.
 
@@ -101,7 +101,7 @@ Net: adapted protocol preserved full accuracy while reducing average tokens by ~
  After publishing to npm, users can set up any repo with one command:
 
  ```powershell
- npx gcie@latest
+ npx -y @pmaddire/gcie@latest setup .
  ```
 
  This runs `gcie setup .` in the current repo by default.
@@ -110,10 +110,32 @@ If Python deps are missing, GCIE now bootstraps a local package venv and install
  Optional setup flags are passed through:
 
  ```powershell
- npx gcie@latest --no-index
- npx gcie@latest --force
+ npx -y @pmaddire/gcie@latest setup . --no-index
+ npx -y @pmaddire/gcie@latest setup . --force
  ```
 
+ For command-only usage without setup:
+
+ ```powershell
+ npx -y @pmaddire/gcie@latest --help
+ ```
+
+ ## Agent Integration
+
+ To make your coding agent use GCIE automatically, add this trigger line to your
+ agent instructions (system prompt / repo instruction file):
+
+ `Use GCIE for context lookup before reading files or making edits. Follow GCIE_USAGE.md.`
+
+ Required file:
+ - keep `GCIE_USAGE.md` in the target repo root
+
+ Recommended setup:
+ 1. Run one-command setup:
+ - `npx -y @pmaddire/gcie@latest setup .`
+ 2. Add the trigger line above to your agent instruction file.
+ 3. Start normal coding tasks; the agent should use GCIE-first retrieval workflow.
+
  ## One-Command GitHub Bootstrap
 
  Run this from the target repo to download GCIE from GitHub and set it up automatically:
@@ -146,7 +168,7 @@ What it does:
  1. In the GCIE repo:
  - `npm link`
  2. In your target repo:
- - `npm link gcie`
+ - `npm link @pmaddire/gcie`
  3. Verify:
  - `gcie --help`
 
@@ -162,7 +184,7 @@ This repo includes a lightweight npm wrapper so you can run `gcie` like other np
  2. In target repo: `gcie --help`
 
  Local option:
- - `npm install` then `npx gcie --help`
+ - `npm install` then `npx @pmaddire/gcie@latest --help`
 
  The wrapper prefers `.venv` in the GCIE repo and falls back to system Python.
 
@@ -216,7 +238,7 @@ Important note:
  - `gcie index <path>`
  - `gcie query <file.py> "<question>"`
  - `gcie debug <file.py> "<question>"`
- - `gcie context <repo|file> "<task>" --budget auto --intent <edit|debug|refactor|explore>`
+ - `gcie context <repo|file> "<task>" --budget auto --intent <edit|debug|refactor|explore> --mode basic`
  - `gcie context-slices <repo> "<task>" --intent <edit|debug|refactor|explore> [--profile recall|low] [--stage-a 400] [--stage-b 800] [--max-total 1200] [--pin frontend/src/App.jsx] [--pin-budget 300] [--include-tests]`
 
  ## How To Use It
@@ -367,6 +389,6 @@ npm publish --access public
  Then users can run:
 
  ```powershell
- npx gcie@latest
+ npx -y @pmaddire/gcie@latest setup .
  ```
 
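The profile ladder from GCIE_USAGE.md can be driven programmatically with the `context-slices` flags listed in the README command reference (`--intent`, `--profile low|recall`, `--pin`). This is a sketch, not a GCIE API: `retrieve_with_ladder` and the caller-supplied `run` function are hypothetical, because the diff does not document a machine-readable output format for `context-slices`. It also assumes files retrieved in earlier passes stay available to the agent, so coverage is checked against the union of all passes.

```python
from typing import Callable

# Hypothetical runner: executes one gcie argv list and returns the file paths in its context output.
Runner = Callable[[list[str]], set[str]]


def retrieve_with_ladder(path: str, query: str, intent: str,
                         must_have: set[str], run: Runner) -> tuple[set[str], int]:
    """Walk the context-slices profile ladder until every must-have file has been retrieved.

    Returns (retrieved_files, ladder_step). ladder_step == 0 means all three GCIE steps
    missed and the agent should fall back to an `rg -n "<symbols>" <files>` direct check.
    """
    base = ["gcie.cmd", "context-slices", path, query, "--intent", intent]
    retrieved: set[str] = set()
    for step, profile in enumerate(["low", "recall", "recall"], start=1):
        argv = base + ["--profile", profile]
        if step == 3:
            # Step 3 pins each must-have file that is still missing after steps 1 and 2.
            for missing in sorted(must_have - retrieved):
                argv += ["--pin", missing]
        retrieved |= run(argv)
        if must_have <= retrieved:  # must-have coverage gate satisfied
            return retrieved, step
    return retrieved, 0
```

A real `run` would likely wrap `subprocess.run(argv, capture_output=True, text=True)` and parse the retrieved file list from its output; that parsing is left out here because the output format is not specified in this diff.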
package/SETUP_ANY_REPO.md CHANGED
@@ -43,7 +43,7 @@ Re-run indexing after major structural changes.
 
  ## 3) Add Agent Workflow File
 
- Copy `AGENT_USAGE.md` from GCIE into the target repo root.
+ Copy `GCIE_USAGE.md` from GCIE into the target repo root.
 
  ## 4) Start With Portable Defaults
 
@@ -51,7 +51,7 @@ Copy `AGENT_USAGE.md` from GCIE into the target repo root.
  gcie.cmd context . "<task>" --intent <edit|debug|refactor|explore> --budget auto
  ```
 
- Then apply the adaptive loop in `AGENT_USAGE.md` if must-have coverage is incomplete.
+ Then apply the adaptive loop in `GCIE_USAGE.md` if must-have coverage is incomplete.
 
  ## 5) Use Adaptive Mode Routing
 
@@ -0,0 +1,69 @@
+ import os
+ import pathlib
+ from parser.ast_parser import parse_python_file
+ from graphs.call_graph import build_call_graph
+ from graphs.variable_graph import build_variable_graph
+ from retrieval.hybrid_retriever import hybrid_retrieve
+ from llm_context.snippet_selector import RankedSnippet, estimate_tokens
+ from llm_context.context_builder import build_context
+
+ ROOT=pathlib.Path('.')
+ EXCLUDE={'__pycache__','.venv','venv'}
+ py_files=[]
+ for path in ROOT.rglob('*.py'):
+     if any(part in EXCLUDE for part in path.parts):
+         continue
+     py_files.append(path)
+
+ snippets_by_node={}
+ modules=[]
+ for path in py_files:
+     try:
+         module=parse_python_file(path)
+     except Exception:
+         continue
+     modules.append(module)
+     text=path.read_text()
+     lines=text.splitlines()
+     for fn in module.functions:
+         start=max(0,fn.start_line-1)
+         end=min(len(lines),fn.end_line)
+         snippet='\n'.join(lines[start:end])
+         node=f"function:{path.as_posix()}::{fn.name}"
+         snippets_by_node[node]=snippet
+
+ call_graph=build_call_graph(modules)
+ var_graph=build_variable_graph(modules)
+ graph=call_graph
+ for node,attrs in var_graph.nodes(data=True):
+     if not graph.has_node(node):
+         graph.add_node(node,**attrs)
+ for u,v,data in var_graph.edges(data=True):
+     graph.add_edge(u,v,**data)
+
+ prompts=[
+     "Why is variable diff exploding?",
+     "How does git history mining handle empty repositories?",
+     "How do CLI index/query/debug commands work?",
+ ]
+
+ def naive_tokens():
+     total=0
+     for path in py_files:
+         total+=estimate_tokens(path.read_text())
+     return total
+
+ naive=naive_tokens()
+ print('Prompt|GCIE tokens|Naive tokens|Reduction%|Selected snippets|Notes')
+ for prompt in prompts:
+     hybrid=hybrid_retrieve(graph,prompt,top_k=10,git_recency_by_node={},coverage_risk_by_node={},max_hops=2)
+     ranked=[]
+     for cand in hybrid:
+         text=snippets_by_node.get(cand.node_id)
+         if not text:
+             continue
+         ranked.append(RankedSnippet(cand.node_id,text,cand.score))
+     context=build_context(prompt,ranked,token_budget=300)
+     reduction=(1-context.total_tokens_estimate/naive)*100 if naive else 0
+     note='good' if ranked else 'empty'
+     print(f"{prompt}|{context.total_tokens_estimate:.1f}|{naive:.1f}|{reduction:.1f}%|{len(context.snippets)}|{note}")
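The benchmark script above reports savings as `(1 - gcie_tokens / naive_tokens) * 100`, where the naive baseline is `estimate_tokens` summed over every non-excluded `.py` file. The arithmetic and the report row can be checked in isolation with the standalone helpers below. `reduction_pct` and `report_row` are illustrative names, not functions from the package, and the token counts in the usage line are example values, not measured results.

```python
def reduction_pct(gcie_tokens: float, naive_tokens: float) -> float:
    # Same formula as the script; a zero naive baseline reports 0 instead of dividing by zero.
    return (1 - gcie_tokens / naive_tokens) * 100 if naive_tokens else 0.0


def report_row(prompt: str, gcie_tokens: float, naive_tokens: float,
               ranked_count: int, selected_count: int) -> str:
    # Mirrors the script's pipe-separated row; 'empty' means no candidate node had snippet text.
    note = "good" if ranked_count else "empty"
    return (f"{prompt}|{gcie_tokens:.1f}|{naive_tokens:.1f}|"
            f"{reduction_pct(gcie_tokens, naive_tokens):.1f}%|{selected_count}|{note}")


# Example values only, not measured results.
print(report_row("How do CLI index/query/debug commands work?", 250, 1000, 3, 2))
# prints: How do CLI index/query/debug commands work?|250.0|1000.0|75.0%|2|good
```

Because the GCIE side is built with `token_budget=300` while the naive side grows with repository size, the reported reduction mostly reflects how large the repo is relative to that fixed budget.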