@bilalimamoglu/sift 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/README.md +124 -337
  2. package/dist/cli.js +998 -27
  3. package/dist/index.js +998 -27
  4. package/package.json +2 -2
package/README.md CHANGED
@@ -1,197 +1,165 @@
  # sift

- <img src="assets/brand/sift-logo-badge-monochrome.svg" alt="sift logo" width="88" />
+ <img src="assets/brand/sift-logo-minimal-monochrome.svg" alt="sift logo" width="120" />

- `sift` turns a long terminal wall of text into a short answer you can act on.
+ Most command output is long and noisy, but the thing you actually need to know is short: what failed, where, and what to do next. `sift` runs the command for you, captures the output, and gives you a short answer instead of a wall of text.

- Think of it like this:
- - `standard` = map
- - `focused` or `rerun --remaining` = zoom
- - raw traceback = last resort
-
- It is a good fit when a human, agent, or CI job needs the answer faster than it needs the whole log.
+ It works with test suites, build logs, `git diff`, `npm audit`, `terraform plan` — anything where the signal is buried in noise. It always tries the cheapest approach first and only escalates when needed. Exit codes are preserved.

- Common uses:
- - test failures
- - typecheck failures
- - lint failures
- - build logs
- - `git diff`
- - `npm audit`
- - `terraform plan`
-
- Do not use it when:
- - the exact raw log is the main thing you need
+ Skip it when:
+ - you need the exact raw log
  - the command is interactive or TUI-based
- - shell behavior depends on exact raw command output
+ - the output is already short

  ## Install

- Requires Node.js 20 or later.
+ Requires Node.js 24 or later.

  ```bash
  npm install -g @bilalimamoglu/sift
  ```

- ## One-time setup
+ ## Setup

- The easiest setup path is:
+ The interactive setup writes a machine-wide config and walks you through provider selection:

  ```bash
  sift config setup
+ sift doctor # verify it works
  ```

- That writes a machine-wide config to:
-
- ```text
- ~/.config/sift/config.yaml
- ```
-
- After that, any terminal on the machine can use `sift`. A repo-local config can still override it later.
+ Config is saved to `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.

- To switch between saved native providers without editing YAML:
-
- ```bash
- sift config use openai
- sift config use openrouter
- ```
-
- If you prefer manual setup, this is the smallest useful OpenAI setup:
+ If you prefer environment variables instead:

  ```bash
+ # OpenAI
  export SIFT_PROVIDER=openai
  export SIFT_BASE_URL=https://api.openai.com/v1
  export SIFT_MODEL=gpt-5-nano
  export OPENAI_API_KEY=your_openai_api_key
- ```
-
- If you prefer manual setup, this is the smallest useful OpenRouter setup:

- ```bash
+ # or OpenRouter
  export SIFT_PROVIDER=openrouter
  export OPENROUTER_API_KEY=your_openrouter_api_key
+
+ # or any OpenAI-compatible endpoint (Together, Groq, self-hosted, etc.)
+ export SIFT_PROVIDER=openai-compatible
+ export SIFT_BASE_URL=https://your-endpoint/v1
+ export SIFT_PROVIDER_API_KEY=your_api_key
  ```

- Then check it:
+ To switch between saved providers without editing files:

  ```bash
- sift doctor
+ sift config use openai
+ sift config use openrouter
  ```

- ## Start here
-
- The default path is simple:
- 1. run the noisy command through `sift`
- 2. read the short `standard` answer first
- 3. only zoom in if `standard` clearly tells you more detail is still worth it
+ ## Usage

- Examples:
+ Run a noisy command through `sift`, read the short answer, and only zoom in if it tells you to:

  ```bash
- sift exec "what changed?" -- git diff
  sift exec --preset test-status -- pytest -q
- sift rerun
- sift rerun --remaining --detail focused
- sift rerun --remaining --detail verbose --show-raw
- sift watch "what changed between cycles?" < watcher-output.txt
- sift exec --watch "what changed between cycles?" -- node watcher.js
- sift exec --preset typecheck-summary -- npm run typecheck
- sift exec --preset lint-failures -- eslint .
+ sift exec "what changed?" -- git diff
  sift exec --preset audit-critical -- npm audit
  sift exec --preset infra-risk -- terraform plan
- sift agent install codex --dry-run
- ```
-
- ## Simple workflow
-
- For most repos, this is the whole story:
-
- ```bash
- sift exec --preset test-status -- <test command>
- sift rerun
- sift rerun --remaining --detail focused
  ```

- Mental model:
- - `sift escalate` = same cached output, deeper render
- - `sift rerun` = rerun the cached full command at `standard` and prepend what resolved, remained, or changed
- - `sift rerun --remaining` = rerun only the remaining failing pytest node IDs for a zoomed-in view
- - `sift watch` / `sift exec --watch` = treat redraw-style output as cycles and summarize what changed
- - `Decision: stop and act` = trust the current diagnosis and go read or fix code
- - `Decision: zoom` = one deeper sift pass is justified before raw
- - `Decision: raw only if exact traceback is required` = raw is last resort, not the next default step
+ `sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.

- If your project uses `pytest`, `vitest`, `jest`, `bun test`, or another test runner instead of `npm test`, use the same preset with that command.
+ Useful flags:
+ - `--dry-run`: show the reduced input and prompt without calling the provider
+ - `--show-raw`: print the captured raw output to `stderr`
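
The exit-code contract described above can be sketched in plain shell. Here `sh -c 'exit 2'` is a stand-in for a failing child command; only the status-propagation idea comes from the README, the rest is illustrative:

```shell
# stand-in child command that fails with exit code 2
sh -c 'exit 2'
status=$?
# a wrapper in the style of `sift exec` would summarize the output here,
# then exit with the child's original status rather than its own
echo "$status"
```

The block prints `2`: whatever the wrapper does in between, the caller still sees the child's status.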

- What `sift` does in `exec` mode:
- 1. runs the child command
- 2. captures `stdout` and `stderr`
- 3. keeps the useful signal
- 4. returns a short answer or JSON
- 5. preserves the child command exit code
+ ## Test debugging workflow

- Useful debug flags:
- - `--dry-run`: show the reduced input and prompt without calling the provider
- - `--show-raw`: print the captured raw input to `stderr`
+ This is the most common use case and where `sift` adds the most value.

- ## When tests fail
+ Think of it like this:
+ - `standard` = map
+ - `focused` or `rerun --remaining` = zoom
+ - raw traceback = last resort

- Start with the map:
+ For most repos, the whole story is:

  ```bash
- sift exec --preset test-status -- <test command>
+ sift exec --preset test-status -- <test command> # get the map
+ sift rerun # after a fix, refresh the truth
+ sift rerun --remaining --detail focused # zoom into what's still failing
  ```

- If `standard` already names the main failure buckets, counts, and hints, stop there and read code.
+ `test-status` becomes test-aware because you chose the preset. It does **not** infer "this is a test command" from the runner name — use the same preset with `pytest`, `vitest`, `jest`, `bun test`, or any other runner.

- If `standard` still includes an unknown bucket or ends with `Decision: zoom`, do one deeper sift pass before you fall back to raw traceback.
+ If `standard` already names the failure buckets, counts, and hints, stop there and read code. If it ends with `Decision: zoom`, do one deeper pass before falling back to raw traceback.

- Then use this order:
- 1. `sift exec --preset test-status -- <test command>`
- 2. `sift rerun`
- 3. `sift rerun --remaining --detail focused`
- 4. `sift rerun --remaining --detail verbose`
- 5. `sift rerun --remaining --detail verbose --show-raw`
- 6. raw pytest only if exact traceback lines are still needed
+ ### What `sift` returns for each failure family

- The normal stop budget is `standard` first, then at most one zoom step before raw.
+ - `Shared blocker` — one setup problem affecting many tests
+ - A named family such as import, timeout, network, migration, or assertion
+ - `Anchor` — the first file, line window, or search term worth opening
+ - `Fix` — the likely next move
+ - `Decision` — whether to stop here or zoom one step deeper
+ - `Next` — the smallest practical action

- If you want the older explicit compare shape, `sift exec --preset test-status --diff -- <test command>` still works. `sift rerun` is the shorter normal path for the same idea.
+ ### Detail levels

- ## Diagnose JSON
-
- Most of the time, you do not need JSON. Start with text first.
-
- If `standard` already shows bucket-level root cause, `Anchor`, and `Fix`, do not re-verify the same bucket with raw pytest. At most do one targeted source read before you edit.
+ - `standard` — short summary, no file list (default)
+ - `focused` — groups failures by error type, shows a few representative tests
+ - `verbose` — flat list of all visible failing tests with their normalized reason

- If diagnose output still contains an unknown bucket or `Decision: zoom`, take one sift zoom step before raw traceback.
+ ### Example output

- Use diagnose JSON only when automation or machine branching really needs it:
+ Single failure family:
+ ```text
+ - Tests did not complete.
+ - 114 errors occurred during collection.
+ - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
+ - Anchor: path/to/failing_test.py
+ - Fix: Install the missing dependencies and rerun the affected tests.
+ - Decision: stop and act. Do not escalate unless you need exact traceback lines.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
+ ```

- ```bash
- sift exec --preset test-status --goal diagnose --format json -- pytest -q
- sift rerun --goal diagnose --format json
- sift watch --preset test-status --goal diagnose --format json < pytest-watch.txt
+ Multiple failure families in one pass:
+ ```text
+ - Tests did not pass.
+ - 3 tests failed. 124 errors occurred.
+ - Shared blocker: DB-isolated tests are missing a required test env var.
+   Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
+   Fix: Set the required test env var and rerun the suite.
+ - Contract drift: snapshot expectations are out of sync with the current API or model state.
+   Anchor: search <route-or-entity> in path/to/freeze_test.py
+   Fix: Review the drift and regenerate the snapshots if the change is intentional.
+ - Decision: stop and act.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
  ```

- Default diagnose JSON is summary-first:
- - `remaining_summary` and `resolved_summary` keep the answer small
- - `read_targets` points to the first file or line worth reading
- - `read_targets.context_hint` can tell an agent to read only a small line window first
- - if `context_hint` only includes `search_hint`, search for that string before reading the whole file
- - `remaining_subset_available` tells you whether `sift rerun --remaining` can zoom safely
+ ### Recommended debugging order

- If an agent truly needs every raw failing test ID, opt in:
+ 1. `sift exec --preset test-status -- <test command>` — get the map.
+ 2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
+ 3. `sift escalate` — deeper render of the same cached output, without rerunning.
+ 4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
+ 5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
+ 6. `sift rerun --remaining --detail verbose`
+ 7. `sift rerun --remaining --detail verbose --show-raw`
+ 8. Raw test command only if exact traceback lines are still needed.

- ```bash
- sift exec --preset test-status --goal diagnose --format json --include-test-ids -- pytest -q
- ```
+ `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.

- `--goal diagnose --format json` is currently supported only for `test-status`, `rerun`, and `test-status` watch flows.
+ ### Quick glossary
+
+ - `sift escalate` = same cached output, deeper render
+ - `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
+ - `sift rerun --remaining` = rerun only the remaining failing test nodes
+ - `Decision: stop and act` = trust the diagnosis and go fix code
+ - `Decision: zoom` = one deeper sift pass is justified before raw

  ## Watch mode

- Use watch mode when command output redraws or repeats and you care about cycle-to-cycle change summaries more than the raw stream:
+ Use watch mode when output redraws or repeats across cycles:

  ```bash
  sift watch "what changed between cycles?" < watcher-output.txt
@@ -199,116 +167,22 @@ sift exec --watch "what changed between cycles?" -- node watcher.js
  sift exec --watch --preset test-status -- pytest -f
  ```

- `sift watch` keeps the current summary and change summary together:
  - cycle 1 = current state
  - later cycles = what changed, what resolved, what stayed, and the next best action
  - for `test-status`, resolved tests drop out and remaining failures stay in focus

- If the stream clearly looks like a redraw/watch session, `sift` can auto-switch to watch handling and prints a short stderr note when it does.
-
- ## `test-status` detail modes
-
- If you are running `npm test` and want `sift` to check the result, use `--preset test-status`.
-
- `test-status` becomes test-aware because you chose the preset. It does **not** infer “this is a test command” from `pytest`, `vitest`, `npm test`, or any other runner name.
-
- Available detail levels:
-
- - `standard`
-   - short default summary
-   - no file list
- - `focused`
-   - groups failures by error type
-   - shows a few representative failing tests or modules
- - `verbose`
-   - flat list of visible failing tests or modules and their normalized reason
-   - useful when Codex needs to know exactly what to fix first
-
- Examples:
-
- ```bash
- sift exec --preset test-status -- npm test
- sift rerun
- sift rerun --remaining --detail focused
- sift rerun --remaining --detail verbose
- sift rerun --remaining --detail verbose --show-raw
- ```
+ ## Diagnose JSON

- If you use a different runner, swap in your command:
+ Start with text. Use JSON only when automation needs machine-readable output:

  ```bash
- sift exec --preset test-status -- pytest
- sift rerun
- sift rerun --remaining --detail focused
- sift rerun --remaining --detail verbose --show-raw
- ```
-
- `sift rerun --remaining` currently supports only cached argv-mode `pytest ...` or `python -m pytest ...` runs. If the cached command is not subset-capable, run a narrowed pytest command manually with `sift exec --preset test-status -- <narrowed pytest command>`.
-
- Typical shapes:
-
- `standard`
- ```text
- - Tests did not complete.
- - 114 errors occurred during collection.
- - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
- - Anchor: path/to/failing_test.py
- - Fix: Install the missing dependencies and rerun the affected tests.
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- - Stop signal: diagnosis complete; raw not needed.
- ```
-
- `standard` can also separate more than one failure family in a single pass:
- ```text
- - Tests did not pass.
- - 3 tests failed. 124 errors occurred.
- - Shared blocker: DB-isolated tests are missing a required test env var.
- - Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
- - Fix: Set the required test env var and rerun the suite.
- - Contract drift: snapshot expectations are out of sync with the current API or model state.
- - Anchor: search <route-or-entity> in path/to/freeze_test.py
- - Fix: Review the drift and regenerate the snapshots if the change is intentional.
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
- - Next: Fix bucket 1 first, then rerun the full suite at standard. Secondary buckets are already visible behind it.
- - Stop signal: diagnosis complete; raw not needed.
- ```
-
- `focused`
- ```text
- - Tests did not complete.
- - 114 errors occurred during collection.
- - Import/dependency blocker: missing dependencies are blocking collection.
- - Missing modules include <module-a>, <module-b>.
- - path/to/test_a.py -> missing module: <module-a>
- - path/to/test_b.py -> missing module: <module-b>
- - Hint: Install the missing dependencies and rerun the affected tests.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- - Stop signal: diagnosis complete; raw not needed.
+ sift exec --preset test-status --goal diagnose --format json -- pytest -q
+ sift rerun --goal diagnose --format json
  ```

- `verbose`
- ```text
- - Tests did not complete.
- - 114 errors occurred during collection.
- - Import/dependency blocker: missing dependencies are blocking collection.
- - path/to/test_a.py -> missing module: <module-a>
- - path/to/test_b.py -> missing module: <module-b>
- - path/to/test_c.py -> missing module: <module-c>
- - Hint: Install the missing dependencies and rerun the affected tests.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- - Stop signal: diagnosis complete; raw not needed.
- ```
+ The JSON is summary-first: `remaining_summary`, `resolved_summary`, `read_targets` with optional `context_hint`, and `remaining_subset_available` to tell you whether `sift rerun --remaining` can zoom safely.

- Recommended debugging order for tests:
- 1. Use `standard` for the full suite first.
- 2. Treat `standard` as the map. If it already shows bucket-level root cause, `Anchor`, and `Fix`, trust it and report or act from there directly.
- 3. Use `sift escalate` only when you want a deeper render of the same cached output without rerunning the command.
- 4. After fixing something, run `sift rerun` to refresh the full-suite truth at `standard`.
- 5. Only then use `sift rerun --remaining --detail focused` as the zoom lens after the full-suite truth is refreshed.
- 6. Then use `sift rerun --remaining --detail verbose`.
- 7. Then use `sift rerun --remaining --detail verbose --show-raw`.
- 8. Fall back to the raw pytest command only if you still need exact traceback lines for the remaining failing subset.
+ Add `--include-test-ids` only when you need every raw failing test ID.
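
For automation that branches on this JSON, a minimal sketch: the field names come from the list above, but the JSON literal is a made-up stand-in for real diagnose output.

```shell
# stand-in for `sift rerun --goal diagnose --format json` output
json='{"remaining_subset_available": true, "remaining_summary": "2 failures left"}'
# decide whether `sift rerun --remaining` can zoom safely
subset=$(printf '%s' "$json" | python3 -c 'import json, sys; print(json.load(sys.stdin)["remaining_subset_available"])')
echo "$subset"
```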

  ## Built-in presets

@@ -321,8 +195,6 @@ Recommended debugging order for tests:
  - `build-failure`: explain the most likely build failure
  - `log-errors`: extract the most relevant error signals

- List or inspect them:
-
  ```bash
  sift presets list
  sift presets show test-status
@@ -330,27 +202,14 @@ sift presets show test-status

  ## Agent setup

- If you want Codex or Claude Code to use `sift` by default, let `sift` install a managed instruction block for you.
-
- Repo scope is the default because it is safer:
+ `sift` can install a managed instruction block so Codex or Claude Code uses `sift` by default for long command output:

  ```bash
- sift agent show codex
- sift agent show codex --raw
- sift agent install codex --dry-run
- sift agent install codex --dry-run --raw
  sift agent install codex
  sift agent install claude
  ```

- You can also install machine-wide instructions explicitly:
-
- ```bash
- sift agent install codex --scope global
- sift agent install claude --scope global
- ```
-
- Useful commands:
+ This writes a managed block to `AGENTS.md` or `CLAUDE.md` in the current repo. Use `--dry-run` to preview, or `--scope global` for machine-wide instructions.

  ```bash
  sift agent status
@@ -358,79 +217,33 @@ sift agent remove codex
  sift agent remove claude
  ```

- `sift agent show ...` is a preview. It also tells you whether the managed block is already installed in the current scope.
-
- What the installer does:
- - writes to `AGENTS.md` or `CLAUDE.md` by default in the current repo
- - uses marked managed blocks instead of rewriting the whole file
- - preserves your surrounding notes and instructions
- - can use global files when you explicitly choose `--scope global`
- - keeps previews short by default
- - shows the exact managed block or final dry-run content only with `--raw`
+ ## CI usage

- What the managed block tells the agent:
- - start with `sift` for long non-interactive command output so the agent spends less context-window and token budget on raw logs
- - for tests, begin with the normal `test-status` summary
- - if `standard` already identifies the main buckets, stop there instead of escalating automatically
- - use `sift escalate` only for the same cached output when more detail is needed without rerunning the command
- - after a fix, refresh the truth with `sift rerun`
- - only then zoom into the remaining failing pytest subset with `sift rerun --remaining --detail focused`, then `verbose`, then `--show-raw`
- - fall back to the raw test command only when exact traceback lines are still needed
-
- ## CI-friendly usage
-
- Some commands succeed technically but should still block CI. `--fail-on` handles that for the built-in semantic presets that have stable machine-readable output:
+ Some commands succeed technically but should still block CI. `--fail-on` handles that:

  ```bash
  sift exec --preset audit-critical --fail-on -- npm audit
  sift exec --preset infra-risk --fail-on -- terraform plan
  ```
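
Because the gate is just an exit status, `--fail-on` plugs into any CI script. A sketch with a hypothetical `check` function standing in for the real `sift exec ... --fail-on ...` call:

```shell
# stand-in for a `sift exec ... --fail-on ...` call; exit status 1
# means the preset detected blocking findings
check() { return 1; }
if check; then
  echo "no blocking findings"
else
  echo "blocking findings detected"
fi
```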

- Supported presets for `--fail-on`:
- - `audit-critical`
- - `infra-risk`
-
  ## Config

- Useful commands:
-
  ```bash
- sift config setup
- sift config use openrouter
- sift config init
- sift config show
+ sift config show # masks secrets by default
+ sift config show --show-secrets
  sift config validate
- sift doctor
  ```

- `sift config show` masks secrets by default. Use `--show-secrets` only when you explicitly need raw values.
-
  Config precedence:
  1. CLI flags
  2. environment variables
- 3. repo-local `sift.config.yaml` or `sift.config.yml`
- 4. machine-wide `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
+ 3. repo-local `sift.config.yaml`
+ 4. machine-wide `~/.config/sift/config.yaml`
  5. built-in defaults
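
The precedence list is plain first-non-empty-wins resolution, which can be sketched with shell parameter expansion (variable names are illustrative, not sift internals):

```shell
cli_model=""                    # 1. CLI flag (not passed)
env_model=""                    # 2. environment variable (unset)
repo_model=""                   # 3. repo-local config (none)
global_model="gpt-5-nano"       # 4. machine-wide config value
default_model="builtin-default" # 5. built-in default
# the first non-empty source wins, mirroring the list above
model="${cli_model:-${env_model:-${repo_model:-${global_model:-$default_model}}}}"
echo "$model"
```

Here the machine-wide value wins because everything above it is empty, so the block prints `gpt-5-nano`.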

- ## Maintainer benchmark
-
- To compare raw pytest output against the `test-status` reduction ladder on fixed fixtures, run:
-
- ```bash
- npm run bench:test-status-ab
- npm run bench:test-status-live
- ```
-
- This uses the real `o200k_base` tokenizer and reports both:
- - command-output budget as the primary benchmark
- - deterministic recipe-budget comparisons as supporting evidence only
- - live-session scorecards for captured mixed full-suite agent transcripts
+ If you pass `--config <path>`, that path is strict — missing paths are errors.

- The benchmark is meant to show context-window and command-output reduction first. In normal debugging flows, `test-status` should usually stop at `standard`; `focused` and `verbose` are escalation tools, and raw pytest is the last resort when exact traceback evidence is still needed.
-
- If you pass `--config <path>`, that path is strict. Missing explicit config paths are errors.
-
- Minimal config example:
+ Minimal YAML config:

  ```yaml
  provider:
@@ -449,63 +262,37 @@ runtime:
  rawFallback: true
  ```

- ## OpenAI vs OpenRouter vs OpenAI-compatible
-
- Use `provider: openai` for `api.openai.com`.
-
- Use `provider: openrouter` for the native OpenRouter path. It defaults to:
- - `baseUrl: https://openrouter.ai/api/v1`
- - `model: openrouter/free`
-
- Use `provider: openai-compatible` for third-party compatible gateways or self-hosted endpoints.
-
- For OpenAI:
- ```bash
- export OPENAI_API_KEY=your_openai_api_key
- ```
-
- For OpenRouter:
- ```bash
- export OPENROUTER_API_KEY=your_openrouter_api_key
- ```
-
- For third-party compatible endpoints, use either the endpoint-native env var or:
-
- ```bash
- export SIFT_PROVIDER_API_KEY=your_provider_api_key
- ```
-
- Known compatible env fallbacks include:
- - `OPENROUTER_API_KEY`
- - `TOGETHER_API_KEY`
- - `GROQ_API_KEY`
-
  ## Safety and limits

  - redaction is optional and regex-based
- - retriable provider failures such as `429`, timeouts, and `5xx` are retried once
- - `sift exec` detects simple prompt-like output such as `[y/N]` or `password:` and skips reduction
- - pipe mode does not preserve upstream shell pipeline failures; use `set -o pipefail` if you need that behavior
+ - retriable provider failures (`429`, timeouts, `5xx`) are retried once
+ - `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
+ - pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
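
The `set -o pipefail` note is easy to verify in bash or zsh: by default a pipeline's status is the last command's, so an upstream failure piped into `sift` would be invisible to the shell.

```shell
# default: the pipeline reports cat's status (0), hiding the failure
false | cat
echo "without pipefail: $?"

# with pipefail, the first failing command's status wins
set -o pipefail
false | cat
echo "with pipefail: $?"
```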

  ## Releasing

  This repo uses a manual GitHub Actions release workflow with npm trusted publishing.

- Release flow:
  1. bump `package.json`
  2. merge to `main`
  3. run the `release` workflow manually

  The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.

- ## Brand assets
+ Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub-generated notes.

- Curated public logo assets live in `assets/brand/`.
+ ## Maintainer benchmark
+
+ ```bash
+ npm run bench:test-status-ab
+ npm run bench:test-status-live
+ ```
+
+ Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
+
+ ## Brand assets

- Included SVG sets:
- - badge/app: teal, black, monochrome
- - icon-only: teal, black, monochrome
- - 24px icon: teal, black, monochrome
+ Logo assets live in `assets/brand/`: badge/app, icon-only, and 24px icon variants in teal, black, and monochrome.

  ## License