@pmaddire/gcie 0.1.13 → 0.1.15
- package/GCIE_USAGE.md +7 -2
- package/README.md +121 -191
- package/cli/app.py +42 -10
- package/cli/commands/adaptation.py +72 -14
- package/cli/commands/context.py +351 -145
- package/llm_context/context_builder.py +83 -66
- package/llm_context/snippet_selector.py +157 -26
- package/package.json +1 -1
package/GCIE_USAGE.md
CHANGED
|
@@ -23,9 +23,14 @@ Priority order:
|
|
|
23
23
|
|
|
24
24
|
Primary retrieval:
|
|
25
25
|
```powershell
|
|
26
|
-
gcie.cmd context <path> "<query>" --intent <edit|debug|refactor|explore> --budget <auto|int> --mode <basic|adaptive>
|
|
26
|
+
gcie.cmd context <path> "<query>" --intent <edit|debug|refactor|explore> --budget <auto|int> --mode <basic|adaptive> --usage-policy <hybrid|force|minimal|off>
|
|
27
27
|
```
|
|
28
28
|
|
|
29
|
+
Usage policy guidance:
|
|
30
|
+
1. `hybrid` (default): balanced accuracy and token cost.
|
|
31
|
+
2. `force`: strict-accuracy path with stronger fallback gating.
|
|
32
|
+
3. `minimal`/`off`: smallest context for known-file, low-risk checks.
|
|
33
|
+
|
|
29
34
|
Sliced retrieval:
|
|
30
35
|
```powershell
|
|
31
36
|
gcie.cmd context-slices <path> "<query>" --intent <edit|debug|refactor|explore> --profile <low|recall|adaptive>
|
|
@@ -39,7 +44,7 @@ gcie.cmd adaptive-profile . --clear
|
|
|
39
44
|
|
|
40
45
|
Post-init adaptation pipeline:
|
|
41
46
|
- run from the target repo root (cd <repo> first); use . as scope
|
|
42
|
-
- adaptation now bootstraps per-family method defaults before accuracy rounds (plain/
|
|
47
|
+
- adaptation now bootstraps per-family method defaults before accuracy rounds (`plain_minimal`/`plain`/`plain_force` plus chain/gapfill/rescue/slices)
|
|
43
48
|
- adaptation case generation is mixed by design (single-file, same-layer pairs, cross-subtree pairs, and some 3-file chains on larger runs)
|
|
44
49
|
```powershell
|
|
45
50
|
gcie.cmd adapt . --benchmark-size 10 --efficiency-iterations 5 --clear-profile
|
package/README.md
CHANGED
|
@@ -6,36 +6,34 @@ It is designed for coding-agent workflows where we want to retrieve the smallest
|
|
|
6
6
|
useful set of code and operational context instead of reading whole files or
|
|
7
7
|
whole directories into the model.
|
|
8
8
|
|
|
9
|
-
## How It Works
|
|
10
|
-
|
|
11
|
-
GCIE
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
-
|
|
24
|
-
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
cost of sending full repo surfaces to the model.
|
|
38
|
-
|
|
9
|
+
## How It Works
|
|
10
|
+
|
|
11
|
+
GCIE is an adaptive context retrieval engine for coding agents.
|
|
12
|
+
|
|
13
|
+
At a high level:
|
|
14
|
+
|
|
15
|
+
1. Index + architecture snapshot
|
|
16
|
+
- `gcie index .` scans the repo and builds retrieval artifacts under `.gcie/`.
|
|
17
|
+
- GCIE tracks architecture/retrieval state so it can route future queries better.
|
|
18
|
+
|
|
19
|
+
2. Query classification
|
|
20
|
+
- `gcie context` classifies each request by intent and structure (single-file, same-layer pair, cross-layer, multi-hop).
|
|
21
|
+
|
|
22
|
+
3. Retrieval routing
|
|
23
|
+
- GCIE chooses retrieval strategy (`plain`, `plain_gapfill`, `plain_chain`, or slices where useful), path scope, token budget, and usage policy (`hybrid`, `force`, or `minimal/off`).
|
|
24
|
+
- `--budget auto` uses built-in heuristics; explicit budgets are available when needed.
|
|
25
|
+
|
|
26
|
+
4. Gap-fill + must-have recovery
|
|
27
|
+
- If expected support files are missing, GCIE runs targeted follow-up retrieval to recover must-have files instead of over-fetching whole repo context.
|
|
28
|
+
|
|
29
|
+
5. Adaptation loop (optional but recommended)
|
|
30
|
+
- `gcie adapt .` benchmarks repo-local cases, selects per-family methods, and runs efficiency trials under an accuracy gate.
|
|
31
|
+
- Results are written to `.planning/post_init_adaptation_report.json` and `.gcie/context_config.json`.
|
|
32
|
+
|
|
33
|
+
6. Fast path for day-to-day use
|
|
34
|
+
- After adaptation, most tasks should run through `gcie context` with small prompt footprints and high recall.
|
|
35
|
+
|
|
36
|
+
The practical goal is to keep must-have coverage while minimizing token cost.
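
Step 4 above (gap-fill + must-have recovery) can be pictured as a small loop. The following is a minimal sketch under stated assumptions, not GCIE's actual implementation: `retrieve_with_gapfill`, the `retrieve` callable, and the `max_rounds` limit are hypothetical names introduced only for illustration, and the file-first follow-up query format is a guess.

```python
from typing import Callable, Iterable, List, Set, Tuple


def retrieve_with_gapfill(
    retrieve: Callable[[str], Iterable[str]],  # hypothetical: query -> file paths
    query: str,
    must_have: Iterable[str],
    max_rounds: int = 2,  # hypothetical cap on follow-up rounds
) -> Tuple[Set[str], List[str]]:
    """Return (retrieved files, must-have files that are still missing)."""
    expected = list(must_have)
    files: Set[str] = set(retrieve(query))
    for _ in range(max_rounds):
        missing = [path for path in expected if path not in files]
        if not missing:
            break
        for path in missing:
            # Targeted follow-up: lead with the missing file, keep the task text.
            files.update(retrieve(f"{path} {query}"))
    return files, [path for path in expected if path not in files]
```

The point of the shape is that follow-up retrieval is scoped to the files that are still missing, so the context grows only where coverage is incomplete.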
|
|
39
37
|
## Quick Start
|
|
40
38
|
|
|
41
39
|
1. Create venv: `.venv\\Scripts\\python.exe -m venv .venv`
|
|
@@ -52,8 +50,8 @@ Use this when you want a fast drop-in setup for coding agents.
|
|
|
52
50
|
2. Copy [GCIE_USAGE.md](c:\GBCRSS\GCIE_USAGE.md) into the target repo root.
|
|
53
51
|
3. Run one index pass:
|
|
54
52
|
- `gcie.cmd index .`
|
|
55
|
-
4. Start using adaptive retrieval immediately:
|
|
56
|
-
- `gcie.cmd context . "<task>" --intent edit --budget auto`
|
|
53
|
+
4. Start using adaptive retrieval immediately:
|
|
54
|
+
- `gcie.cmd context . "<task>" --intent edit --budget auto --mode adaptive --usage-policy hybrid`
|
|
57
55
|
|
|
58
56
|
No heavy upfront tuning is required. The workflow starts portable-first and only adds local overrides after repeated miss patterns.
|
|
59
57
|
|
|
@@ -233,162 +231,94 @@ Important note:
|
|
|
233
231
|
- `--budget 1200` consistently improved recall without needing broad manual reads
|
|
234
232
|
- `1500` added more noise without materially helping more than `1200`
|
|
235
233
|
|
|
236
|
-
##
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
- `gcie
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
|
|
255
|
-
|
|
256
|
-
|
|
257
|
-
gcie context . "<task>" --
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
- `
|
|
263
|
-
- `
|
|
264
|
-
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
-
|
|
278
|
-
-
|
|
279
|
-
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
```
|
|
298
|
-
|
|
299
|
-
```
|
|
300
|
-
|
|
301
|
-
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
|
|
306
|
-
|
|
307
|
-
|
|
308
|
-
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
```
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
```
|
|
329
|
-
gcie context-slices . "<task>" --intent edit --pin <expected wiring file>
|
|
330
|
-
```
|
|
331
|
-
|
|
332
|
-
This is still the safest mode when you already know a few must-have files.
|
|
333
|
-
|
|
334
|
-
## Agent Workflow
|
|
335
|
-
|
|
336
|
-
For coding agents, the safest practical pattern is:
|
|
337
|
-
|
|
338
|
-
1. Run GCIE first
|
|
339
|
-
2. Check that the result includes:
|
|
340
|
-
- the main implementation file
|
|
341
|
-
- the wiring or entry file
|
|
342
|
-
- at least one validation or test surface when relevant
|
|
343
|
-
3. If a must-have file is missing:
|
|
344
|
-
- rerun with a more file-first query
|
|
345
|
-
- increase budget to `1000` or `1200`
|
|
346
|
-
- or pin the missing file in `context-slices`
|
|
347
|
-
4. Verify with `rg` before editing
|
|
348
|
-
|
|
349
|
-
This usually gives a much better accuracy/token tradeoff than broad manual file
|
|
350
|
-
reading.
|
|
351
|
-
|
|
352
|
-
## Cache
|
|
353
|
-
|
|
354
|
-
Repo-wide context is cached to speed up repeated calls.
|
|
355
|
-
|
|
356
|
-
- `gcie cache-warm .`
|
|
357
|
-
- `gcie cache-status .`
|
|
358
|
-
- `gcie cache-clear .`
|
|
359
|
-
|
|
360
|
-
Cache file: `.gcie/cache/context_cache.json` (auto-invalidated on file changes).
|
|
361
|
-
|
|
362
|
-
## Frontend and Non-Python Files
|
|
363
|
-
|
|
364
|
-
Repo-wide context scans common frontend and config extensions and adds file nodes so
|
|
365
|
-
queries can retrieve non-Python surfaces when relevant.
|
|
366
|
-
|
|
367
|
-
Default extensions include: `.js`, `.jsx`, `.ts`, `.tsx`, `.css`, `.scss`, `.html`, `.vue`,
|
|
368
|
-
plus `.json`, `.yaml`, `.yml`, `.toml`, `.md`, `.txt`.
|
|
369
|
-
|
|
370
|
-
## Core Capabilities
|
|
371
|
-
|
|
372
|
-
- Repository scanning
|
|
373
|
-
- Graph construction (structure, call, variable, execution, git, test coverage)
|
|
374
|
-
- Symbolic + semantic + hybrid retrieval
|
|
375
|
-
- Bug localization
|
|
376
|
-
- Minimal LLM context building
|
|
377
|
-
- Architecture-aware context routing and fallback
|
|
378
|
-
- Agent-friendly retrieval for edit/debug/refactor workflows
|
|
379
|
-
|
|
380
|
-
## Publish For NPX
|
|
381
|
-
|
|
382
|
-
From this repo:
|
|
383
|
-
|
|
384
|
-
```powershell
|
|
385
|
-
npm login
|
|
386
|
-
npm publish --access public
|
|
387
|
-
```
|
|
388
|
-
|
|
389
|
-
Then users can run:
|
|
390
|
-
|
|
391
|
-
```powershell
|
|
392
|
-
npx -y @pmaddire/gcie@latest setup .
|
|
393
|
-
```
|
|
394
|
-
|
|
234
|
+
## Command Reference
|
|
235
|
+
|
|
236
|
+
Use `gcie`, or `gcie.cmd` on Windows.
|
|
237
|
+
|
|
238
|
+
### Setup / Lifecycle
|
|
239
|
+
|
|
240
|
+
- `gcie setup .`
|
|
241
|
+
- `gcie setup . --force`
|
|
242
|
+
- `gcie setup . --no-index`
|
|
243
|
+
- `gcie setup . --adapt --adapt-benchmark-size 25 --adapt-efficiency-iterations 8 --adapt-workers 6`
|
|
244
|
+
- `gcie remove .`
|
|
245
|
+
- `gcie remove . --remove-planning`
|
|
246
|
+
- `gcie remove . --keep-usage --keep-setup-doc`
|
|
247
|
+
|
|
248
|
+
### Index and Retrieval
|
|
249
|
+
|
|
250
|
+
- `gcie index .`
|
|
251
|
+
- `gcie context . "<task>" --intent edit --budget auto --mode adaptive --usage-policy hybrid`
|
|
252
|
+
- `gcie context . "<task>" --intent debug --budget 1200 --mode adaptive --usage-policy force`
|
|
253
|
+
- `gcie context . "<task>" --intent explore --budget auto --mode basic --usage-policy off`
|
|
254
|
+
- `gcie context-slices . "<task>" --intent edit --profile recall`
|
|
255
|
+
- `gcie context-slices . "<task>" --intent edit --profile low --pin frontend/src/App.jsx --pin-budget 300`
|
|
256
|
+
|
|
257
|
+
### Usage Policy
|
|
258
|
+
|
|
259
|
+
- `hybrid` is the default. It keeps the existing balance between recall and token cost.
|
|
260
|
+
- `force` always takes the richer GCIE retrieval path, even for simpler prompts.
|
|
261
|
+
- `minimal` or `off` keeps retrieval tiny when you already know the target files or only need a quick probe.
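
The three policies can be read as a mapping onto an effective mode, an effective budget, and a strict-accuracy flag. The sketch below restates the `_resolve_context_usage` helper that this release adds to `cli/app.py` as a standalone function. The public name `resolve_context_usage` and the `typing` spelling of the annotations are illustrative choices, not part of the package.

```python
from typing import Optional, Tuple


def resolve_context_usage(
    mode: str,
    usage_policy: str,
    budget: Optional[int],
) -> Tuple[str, Optional[int], bool]:
    """Return (effective_mode, effective_budget, strict_accuracy)."""
    if usage_policy in {"minimal", "off"}:
        # Smallest context: basic mode, zero budget, no strict gating.
        return "basic", 0, False
    if usage_policy == "force":
        # Always take the adaptive path with strict accuracy enabled.
        return "adaptive", budget, True
    # "hybrid" (default): keep whatever mode and budget the caller chose.
    return mode, budget, False
```

Read this way, `minimal`/`off` and `force` both override `--mode`, while `hybrid` leaves it untouched.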
|
|
262
|
+
|
|
263
|
+
### Adaptation and Profile State
|
|
264
|
+
|
|
265
|
+
- `gcie adapt . --benchmark-size 25 --efficiency-iterations 8 --adapt-workers 6`
|
|
266
|
+
- `gcie adapt . --benchmark-size 25 --efficiency-iterations 8 --adapt-workers 6 --clear-profile`
|
|
267
|
+
- `gcie adaptive-profile .`
|
|
268
|
+
- `gcie adaptive-profile . --clear`
|
|
269
|
+
- adaptation evaluates policy-aware candidates (`plain_minimal`, `plain`, `plain_force`) plus chain/gapfill/rescue/slices and picks per-family under an accuracy gate
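
One way to picture "picks per-family under an accuracy gate" is: among the candidate methods for a family, keep those that meet the gate and take the cheapest. This is a hedged sketch only. `MethodSummary`, `pick_method`, the `pass_rate`/`avg_tokens` fields, and the fallback rule are assumptions made for illustration; the selection logic in `adaptation.py` may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MethodSummary:
    """Hypothetical per-method summary for one query family."""
    method: str
    pass_rate: float   # fraction of cases with full must-have coverage
    avg_tokens: float  # average tokens spent per case


def pick_method(summaries: Iterable[MethodSummary], gate: float = 1.0) -> MethodSummary:
    rows: List[MethodSummary] = list(summaries)
    if not rows:
        raise ValueError("no candidate methods to choose from")
    passing = [row for row in rows if row.pass_rate >= gate]
    if passing:
        # Accuracy is locked in; now minimize cost.
        return min(passing, key=lambda row: row.avg_tokens)
    # Assumed fallback: nothing meets the gate, so prefer accuracy, then cost.
    return max(rows, key=lambda row: (row.pass_rate, -row.avg_tokens))
```

With candidates such as `plain_minimal`, `plain`, and `plain_force`, this rule would pick the cheapest one that still reaches the gate.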
|
|
270
|
+
|
|
271
|
+
### Utility Commands
|
|
272
|
+
|
|
273
|
+
- `gcie query <path> "<question>"`
|
|
274
|
+
- `gcie debug <path> "<question>"`
|
|
275
|
+
- `gcie cache-status .`
|
|
276
|
+
- `gcie cache-warm .`
|
|
277
|
+
- `gcie cache-clear .`
|
|
278
|
+
|
|
279
|
+
## Recommended Workflow
|
|
280
|
+
|
|
281
|
+
### 1) Bootstrap once per repo
|
|
282
|
+
|
|
283
|
+
```powershell
|
|
284
|
+
gcie setup . --adapt --adapt-benchmark-size 25 --adapt-efficiency-iterations 8 --adapt-workers 6
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
### 2) Day-to-day retrieval
|
|
288
|
+
|
|
289
|
+
```powershell
|
|
290
|
+
gcie context . "<task>" --intent edit --budget auto --mode adaptive --usage-policy hybrid
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
For cross-layer flows, use file-first, symbol-rich queries and optionally pin the budget:
|
|
294
|
+
|
|
295
|
+
```powershell
|
|
296
|
+
gcie context . "frontend/src/App.jsx selectedTheme /api/convert/start app.py start_convert" --intent edit --budget 1200 --mode adaptive --usage-policy force
|
|
297
|
+
```
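
For agents that call GCIE from Python rather than a shell, the same commands can be assembled programmatically. The sketch below is an assumption-laden helper, not part of the package: `build_context_argv` and `run_context_cli` are hypothetical names, and the only flags used are the ones documented above (`--intent`, `--budget`, `--mode`, `--usage-policy`). It also assumes the command prints a single JSON document to stdout, as the `cli/app.py` diff in this release suggests.

```python
import json
import subprocess
from typing import Any, Dict, List, Union

INTENTS = ("edit", "debug", "refactor", "explore")
MODES = ("basic", "adaptive")
USAGE_POLICIES = ("hybrid", "force", "minimal", "off")


def build_context_argv(
    path: str,
    query: str,
    *,
    intent: str = "edit",
    budget: Union[int, str] = "auto",
    mode: str = "adaptive",
    usage_policy: str = "hybrid",
    executable: str = "gcie",  # e.g. "gcie.cmd" on Windows
) -> List[str]:
    """Validate the documented options and return an argv list."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if usage_policy not in USAGE_POLICIES:
        raise ValueError(f"unknown usage policy: {usage_policy!r}")
    if budget != "auto":
        budget = int(budget)  # raises ValueError for non-numeric input
        if budget < 0:
            raise ValueError("budget must be non-negative")
    return [
        executable, "context", path, query,
        "--intent", intent,
        "--budget", str(budget),
        "--mode", mode,
        "--usage-policy", usage_policy,
    ]


def run_context_cli(argv: List[str]) -> Dict[str, Any]:
    """Run the command and parse stdout as JSON (assumed output format)."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Passing an argv list instead of a shell string avoids quoting problems with queries that contain spaces or slashes, such as the cross-layer example above.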
|
|
298
|
+
|
|
299
|
+
### 3) Verify before edits on critical changes
|
|
300
|
+
|
|
301
|
+
```powershell
|
|
302
|
+
rg -n "<symbol1>|<symbol2>|<symbol3>" .
|
|
303
|
+
```
|
|
304
|
+
|
|
305
|
+
### 4) Re-adapt only when needed
|
|
306
|
+
|
|
307
|
+
Use adaptation again after large refactors, architecture shifts, or repeated recall misses:
|
|
308
|
+
|
|
309
|
+
```powershell
|
|
310
|
+
gcie adapt . --benchmark-size 25 --efficiency-iterations 8 --adapt-workers 6
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
If adaptation quality drifts due to stale profile state, reset first:
|
|
314
|
+
|
|
315
|
+
```powershell
|
|
316
|
+
gcie adaptive-profile . --clear
|
|
317
|
+
gcie adapt . --benchmark-size 25 --efficiency-iterations 8 --adapt-workers 6 --clear-profile
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
## Notes
|
|
321
|
+
|
|
322
|
+
- `requested_benchmark_size` can be higher than the `benchmark_size` actually used when fewer unique repo-local benchmark cases are available.
|
|
323
|
+
- `status: accuracy_locked_but_cost_risky` can appear when the selected 100%-accuracy policy is compared against a cheaper but lower-accuracy baseline.
|
|
324
|
+
- Primary success criteria remain must-have coverage and pass rate; optimize cost after lock.
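
To make "must-have coverage" and "pass rate" concrete, here is a small sketch. The field names (`expected_hits`, `expected_total`, `missing_expected`, `context_complete`) mirror the `CaseResult` fields visible in the `adaptation.py` diff of this release. The helper names `coverage_report` and `pass_rate`, and the definition of pass rate as the fraction of fully covered cases, are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Set


def coverage_report(
    expected_files: Iterable[str],
    retrieved_files: Iterable[str],
) -> Dict[str, Any]:
    """Compare the files a case expects against the files retrieval returned."""
    expected = tuple(expected_files)
    retrieved: Set[str] = set(retrieved_files)
    missing = [rel for rel in expected if rel not in retrieved]
    return {
        "expected_total": len(expected),
        "expected_hits": len(expected) - len(missing),
        "missing_expected": tuple(missing),
        "context_complete": not missing,
    }


def pass_rate(reports: Iterable[Dict[str, Any]]) -> float:
    """Assumed definition: share of cases where every expected file was found."""
    rows: List[Dict[str, Any]] = list(reports)
    if not rows:
        return 0.0
    return sum(1 for row in rows if row["context_complete"]) / len(rows)
```

A case counts as passing only when `missing_expected` is empty, which is why cost is tuned after these numbers are locked rather than before.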
|
package/cli/app.py
CHANGED
|
@@ -1,9 +1,10 @@
|
|
|
1
|
-
|
|
1
|
+
"""Typer entrypoint for GCIE CLI."""
|
|
2
2
|
|
|
3
3
|
from __future__ import annotations
|
|
4
4
|
|
|
5
5
|
import json
|
|
6
|
-
import re
|
|
6
|
+
import re
|
|
7
|
+
from typing import Literal
|
|
7
8
|
|
|
8
9
|
import typer
|
|
9
10
|
|
|
@@ -16,7 +17,8 @@ from .commands.index import run_index
|
|
|
16
17
|
from .commands.query import run_query
|
|
17
18
|
from .commands.setup import run_remove, run_setup
|
|
18
19
|
|
|
19
|
-
app = typer.Typer(help="GraphCode Intelligence Engine CLI")
|
|
20
|
+
app = typer.Typer(help="GraphCode Intelligence Engine CLI")
|
|
21
|
+
UsagePolicy = Literal["hybrid", "force", "minimal", "off"]
|
|
20
22
|
|
|
21
23
|
|
|
22
24
|
def _query_tokens(query: str) -> tuple[str, ...]:
|
|
@@ -60,6 +62,20 @@ def _auto_context_budget(query: str, intent: str | None) -> int | None:
|
|
|
60
62
|
return None
|
|
61
63
|
|
|
62
64
|
|
|
65
|
+
def _resolve_context_usage(
|
|
66
|
+
*,
|
|
67
|
+
mode: str,
|
|
68
|
+
usage_policy: UsagePolicy,
|
|
69
|
+
budget: int | None,
|
|
70
|
+
) -> tuple[str, int | None, bool]:
|
|
71
|
+
"""Map a high-level usage policy to the existing context command parameters."""
|
|
72
|
+
if usage_policy in {"minimal", "off"}:
|
|
73
|
+
return "basic", 0, False
|
|
74
|
+
if usage_policy == "force":
|
|
75
|
+
return "adaptive", budget, True
|
|
76
|
+
return mode, budget, False
|
|
77
|
+
|
|
78
|
+
|
|
63
79
|
@app.command("index")
|
|
64
80
|
def index_cmd(path: str = typer.Argument(".")) -> None:
|
|
65
81
|
result = run_index(path)
|
|
@@ -85,18 +101,36 @@ def context_cmd(
|
|
|
85
101
|
budget: str = typer.Option("auto", "--budget"),
|
|
86
102
|
intent: str | None = typer.Option(None, "--intent"),
|
|
87
103
|
mode: str = typer.Option("basic", "--mode", help="context mode: basic or adaptive"),
|
|
104
|
+
usage_policy: UsagePolicy = typer.Option(
|
|
105
|
+
"hybrid",
|
|
106
|
+
"--usage-policy",
|
|
107
|
+
help="GCIE usage policy: hybrid, force, minimal, or off",
|
|
108
|
+
),
|
|
88
109
|
) -> None:
|
|
89
110
|
if budget == "auto":
|
|
90
111
|
budget_val = _auto_context_budget(query, intent)
|
|
91
112
|
else:
|
|
92
113
|
budget_val = int(budget)
|
|
93
114
|
|
|
94
|
-
if mode
|
|
95
|
-
result = run_context_basic(path, query, budget=budget_val, intent=intent)
|
|
96
|
-
elif mode == "adaptive":
|
|
97
|
-
result = run_context(path, query, budget=budget_val, intent=intent)
|
|
98
|
-
else:
|
|
115
|
+
if mode not in {"basic", "adaptive"}:
|
|
99
116
|
raise typer.BadParameter("--mode must be 'basic' or 'adaptive'")
|
|
117
|
+
|
|
118
|
+
effective_mode, effective_budget, strict_accuracy = _resolve_context_usage(
|
|
119
|
+
mode=mode,
|
|
120
|
+
usage_policy=usage_policy,
|
|
121
|
+
budget=budget_val,
|
|
122
|
+
)
|
|
123
|
+
|
|
124
|
+
if effective_mode == "basic":
|
|
125
|
+
result = run_context_basic(path, query, budget=effective_budget, intent=intent)
|
|
126
|
+
else:
|
|
127
|
+
result = run_context(
|
|
128
|
+
path,
|
|
129
|
+
query,
|
|
130
|
+
budget=effective_budget,
|
|
131
|
+
intent=intent,
|
|
132
|
+
strict_accuracy=strict_accuracy,
|
|
133
|
+
)
|
|
100
134
|
typer.echo(json.dumps(result, indent=2))
|
|
101
135
|
|
|
102
136
|
|
|
@@ -217,5 +251,3 @@ def cache_warm_cmd(path: str = typer.Argument(".")) -> None:
|
|
|
217
251
|
if __name__ == "__main__":
|
|
218
252
|
app()
|
|
219
253
|
|
|
220
|
-
|
|
221
|
-
|
package/cli/commands/adaptation.py
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
|
|
1
|
+
"""Post-initialization adaptation pipeline (accuracy rounds first, then efficiency rounds)."""
|
|
2
2
|
|
|
3
3
|
from __future__ import annotations
|
|
4
4
|
|
|
@@ -10,7 +10,7 @@ import os
|
|
|
10
10
|
import re
|
|
11
11
|
from pathlib import Path
|
|
12
12
|
|
|
13
|
-
from .context import run_context
|
|
13
|
+
from .context import run_context, run_context_basic
|
|
14
14
|
from .context_slices import _classify_query_family, run_context_slices
|
|
15
15
|
from .index import run_index
|
|
16
16
|
|
|
@@ -54,7 +54,7 @@ _IGNORED_DIRS = {
|
|
|
54
54
|
"build",
|
|
55
55
|
"coverage",
|
|
56
56
|
}
|
|
57
|
-
_METHOD_ORDER = ["plain", "plain_chain", "plain_gapfill", "plain_rescue", "slices"]
|
|
57
|
+
_METHOD_ORDER = ["plain_minimal", "plain", "plain_force", "plain_chain", "plain_gapfill", "plain_rescue", "slices"]
|
|
58
58
|
|
|
59
59
|
|
|
60
60
|
def _adapt_worker_count(workers: int | None = None) -> int:
|
|
@@ -385,7 +385,57 @@ def _evaluate_plain_case(case, *, allow_gapfill: bool = True, aggressive_gapfill
|
|
|
385
385
|
missing_expected=tuple(missing),
|
|
386
386
|
context_complete=not missing,
|
|
387
387
|
)
|
|
388
|
-
|
|
388
|
+
|
|
389
|
+
def _evaluate_plain_minimal_case(case) -> CaseResult:
|
|
390
|
+
path, query, budget = _plan_query(case)
|
|
391
|
+
path = _safe_scope(path)
|
|
392
|
+
payload = run_context_basic(path, query, budget=budget, intent=case.intent)
|
|
393
|
+
files = {
|
|
394
|
+
_normalize_scoped_path(path, rel)
|
|
395
|
+
for rel in (_node_to_file(item.get("node_id", "")) for item in payload.get("snippets", []))
|
|
396
|
+
if rel
|
|
397
|
+
}
|
|
398
|
+
expected = tuple(case.expected_files)
|
|
399
|
+
missing = [rel for rel in expected if rel not in files]
|
|
400
|
+
tokens = int(payload.get("tokens", 0) or 0)
|
|
401
|
+
expected_hits = len(expected) - len(missing)
|
|
402
|
+
family = _classify_query_family(query)
|
|
403
|
+
return CaseResult(
|
|
404
|
+
name=case.name,
|
|
405
|
+
family=family,
|
|
406
|
+
mode="plain_context_workflow_minimal",
|
|
407
|
+
tokens=tokens,
|
|
408
|
+
expected_hits=expected_hits,
|
|
409
|
+
expected_total=len(expected),
|
|
410
|
+
missing_expected=tuple(missing),
|
|
411
|
+
context_complete=not missing,
|
|
412
|
+
)
|
|
413
|
+
|
|
414
|
+
|
|
415
|
+
def _evaluate_plain_force_case(case) -> CaseResult:
|
|
416
|
+
path, query, budget = _plan_query(case)
|
|
417
|
+
path = _safe_scope(path)
|
|
418
|
+
payload = run_context(path, query, budget=budget, intent=case.intent, strict_accuracy=True)
|
|
419
|
+
files = {
|
|
420
|
+
_normalize_scoped_path(path, rel)
|
|
421
|
+
for rel in (_node_to_file(item.get("node_id", "")) for item in payload.get("snippets", []))
|
|
422
|
+
if rel
|
|
423
|
+
}
|
|
424
|
+
expected = tuple(case.expected_files)
|
|
425
|
+
missing = [rel for rel in expected if rel not in files]
|
|
426
|
+
tokens = int(payload.get("tokens", 0) or 0)
|
|
427
|
+
expected_hits = len(expected) - len(missing)
|
|
428
|
+
family = _classify_query_family(query)
|
|
429
|
+
return CaseResult(
|
|
430
|
+
name=case.name,
|
|
431
|
+
family=family,
|
|
432
|
+
mode="plain_context_workflow_force",
|
|
433
|
+
tokens=tokens,
|
|
434
|
+
expected_hits=expected_hits,
|
|
435
|
+
expected_total=len(expected),
|
|
436
|
+
missing_expected=tuple(missing),
|
|
437
|
+
context_complete=not missing,
|
|
438
|
+
)
|
|
389
439
|
|
|
390
440
|
def _evaluate_slices_case(case) -> CaseResult:
|
|
391
441
|
payload = run_context_slices(
|
|
@@ -457,15 +507,19 @@ def _evaluate_slices_case(case) -> CaseResult:
|
|
|
457
507
|
)
|
|
458
508
|
|
|
459
509
|
|
|
460
|
-
def _evaluate_case_with_method(case, method: str) -> CaseResult:
|
|
461
|
-
if method == "
|
|
462
|
-
return
|
|
463
|
-
if method == "
|
|
464
|
-
return
|
|
465
|
-
if method == "
|
|
466
|
-
return
|
|
467
|
-
if method == "
|
|
468
|
-
return
|
|
510
|
+
def _evaluate_case_with_method(case, method: str) -> CaseResult:
|
|
511
|
+
if method == "plain_minimal":
|
|
512
|
+
return _evaluate_plain_minimal_case(case)
|
|
513
|
+
if method == "plain":
|
|
514
|
+
return _evaluate_plain_case(case, allow_gapfill=False)
|
|
515
|
+
if method == "plain_force":
|
|
516
|
+
return _evaluate_plain_force_case(case)
|
|
517
|
+
if method == "plain_chain":
|
|
518
|
+
return _evaluate_plain_chain_case(case)
|
|
519
|
+
if method == "plain_gapfill":
|
|
520
|
+
return _evaluate_plain_case(case, allow_gapfill=True, aggressive_gapfill=False)
|
|
521
|
+
if method == "plain_rescue":
|
|
522
|
+
return _evaluate_plain_case(case, allow_gapfill=True, aggressive_gapfill=True)
|
|
469
523
|
return _evaluate_slices_case(case)
|
|
470
524
|
|
|
471
525
|
|
|
@@ -890,14 +944,18 @@ def run_post_init_adaptation(
|
|
|
890
944
|
|
|
891
945
|
# Global candidate snapshots for transparency.
|
|
892
946
|
slices_rows = _evaluate_cases_with_method(cases, 'slices', workers)
|
|
947
|
+
plain_min_rows = _evaluate_cases_with_method(cases, 'plain_minimal', workers)
|
|
893
948
|
plain_rows = _evaluate_cases_with_method(cases, 'plain', workers)
|
|
949
|
+
plain_force_rows = _evaluate_cases_with_method(cases, 'plain_force', workers)
|
|
894
950
|
plain_gap_rows = _evaluate_cases_with_method(cases, 'plain_gapfill', workers)
|
|
895
951
|
plain_rescue_rows = _evaluate_cases_with_method(cases, 'plain_rescue', workers)
|
|
896
952
|
slices_summary = _summarize('slices_accuracy_stage', slices_rows)
|
|
953
|
+
plain_min_summary = _summarize('plain_minimal_accuracy_stage', plain_min_rows)
|
|
897
954
|
plain_summary = _summarize('plain_accuracy_stage', plain_rows)
|
|
955
|
+
plain_force_summary = _summarize('plain_force_accuracy_stage', plain_force_rows)
|
|
898
956
|
plain_gap_summary = _summarize('plain_gapfill_accuracy_stage', plain_gap_rows)
|
|
899
957
|
plain_rescue_summary = _summarize('plain_rescue_accuracy_stage', plain_rescue_rows)
|
|
900
|
-
candidates = [slices_summary, plain_summary, plain_gap_summary, plain_rescue_summary]
|
|
958
|
+
candidates = [slices_summary, plain_min_summary, plain_summary, plain_force_summary, plain_gap_summary, plain_rescue_summary]
|
|
901
959
|
|
|
902
960
|
active = {
|
|
903
961
|
'label': 'family_policy_selected',
|