@bilalimamoglu/sift 0.2.2 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +322 -82
- package/dist/cli.js +5404 -663
- package/dist/index.d.ts +43 -1
- package/dist/index.js +3471 -241
- package/package.json +7 -2
package/README.md
CHANGED
|
@@ -2,21 +2,30 @@
|
|
|
2
2
|
|
|
3
3
|
<img src="assets/brand/sift-logo-badge-monochrome.svg" alt="sift logo" width="88" />
|
|
4
4
|
|
|
5
|
-
`sift`
|
|
5
|
+
`sift` turns a long terminal wall of text into a short answer you can act on.
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
Think of it like this:
|
|
8
|
+
- `standard` = map
|
|
9
|
+
- `focused` or `rerun --remaining` = zoom
|
|
10
|
+
- raw traceback = last resort
|
|
8
11
|
|
|
9
|
-
|
|
10
|
-
- non-interactive shell commands
|
|
11
|
-
- agents that need short answers instead of full logs
|
|
12
|
-
- CI checks where a command may succeed but still produce a blocking result
|
|
12
|
+
It is a good fit when a human, agent, or CI job needs the answer faster than it needs the whole log.
|
|
13
13
|
|
|
14
|
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
-
|
|
14
|
+
Common uses:
|
|
15
|
+
- test failures
|
|
16
|
+
- typecheck failures
|
|
17
|
+
- lint failures
|
|
18
|
+
- build logs
|
|
19
|
+
- `git diff`
|
|
20
|
+
- `npm audit`
|
|
21
|
+
- `terraform plan`
|
|
18
22
|
|
|
19
|
-
|
|
23
|
+
Do not use it when:
|
|
24
|
+
- the exact raw log is the main thing you need
|
|
25
|
+
- the command is interactive or TUI-based
|
|
26
|
+
- shell behavior depends on exact raw command output
|
|
27
|
+
|
|
28
|
+
## Install
|
|
20
29
|
|
|
21
30
|
Requires Node.js 20 or later.
|
|
22
31
|
|
|
@@ -26,7 +35,7 @@ npm install -g @bilalimamoglu/sift
|
|
|
26
35
|
|
|
27
36
|
## One-time setup
|
|
28
37
|
|
|
29
|
-
The easiest path is
|
|
38
|
+
The easiest setup path is:
|
|
30
39
|
|
|
31
40
|
```bash
|
|
32
41
|
sift config setup
|
|
@@ -38,9 +47,9 @@ That writes a machine-wide config to:
|
|
|
38
47
|
~/.config/sift/config.yaml
|
|
39
48
|
```
|
|
40
49
|
|
|
41
|
-
After that, any terminal can use `sift
|
|
50
|
+
After that, any terminal on the machine can use `sift`. A repo-local config can still override it later.
|
|
42
51
|
|
|
43
|
-
If you
|
|
52
|
+
If you prefer manual setup, this is the smallest useful OpenAI configuration:
|
|
44
53
|
|
|
45
54
|
```bash
|
|
46
55
|
export SIFT_PROVIDER=openai
|
|
@@ -49,86 +58,243 @@ export SIFT_MODEL=gpt-5-nano
|
|
|
49
58
|
export OPENAI_API_KEY=your_openai_api_key
|
|
50
59
|
```
|
|
51
60
|
|
|
52
|
-
|
|
61
|
+
Then check it:
|
|
53
62
|
|
|
54
63
|
```bash
|
|
55
|
-
sift
|
|
64
|
+
sift doctor
|
|
56
65
|
```
|
|
57
66
|
|
|
58
|
-
|
|
67
|
+
## Start here
|
|
68
|
+
|
|
69
|
+
The default path is simple:
|
|
70
|
+
1. run the noisy command through `sift`
|
|
71
|
+
2. read the short `standard` answer first
|
|
72
|
+
3. only zoom in when `standard` clearly shows that more detail is still worth it
|
|
73
|
+
|
|
74
|
+
Examples:
|
|
59
75
|
|
|
60
76
|
```bash
|
|
61
|
-
sift
|
|
77
|
+
sift exec "what changed?" -- git diff
|
|
78
|
+
sift exec --preset test-status -- pytest -q
|
|
79
|
+
sift rerun
|
|
80
|
+
sift rerun --remaining --detail focused
|
|
81
|
+
sift rerun --remaining --detail verbose --show-raw
|
|
82
|
+
sift watch "what changed between cycles?" < watcher-output.txt
|
|
83
|
+
sift exec --watch "what changed between cycles?" -- node watcher.js
|
|
84
|
+
sift exec --preset typecheck-summary -- npm run typecheck
|
|
85
|
+
sift exec --preset lint-failures -- eslint .
|
|
86
|
+
sift exec --preset audit-critical -- npm audit
|
|
87
|
+
sift exec --preset infra-risk -- terraform plan
|
|
88
|
+
sift agent install codex --dry-run
|
|
62
89
|
```
|
|
63
90
|
|
|
64
|
-
|
|
91
|
+
## Simple workflow
|
|
65
92
|
|
|
66
|
-
|
|
67
|
-
|
|
93
|
+
For most repos, this is the whole story:
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
sift exec --preset test-status -- <test command>
|
|
97
|
+
sift rerun
|
|
98
|
+
sift rerun --remaining --detail focused
|
|
68
99
|
```
|
|
69
100
|
|
|
70
|
-
|
|
101
|
+
Mental model:
|
|
102
|
+
- `sift escalate` = same cached output, deeper render
|
|
103
|
+
- `sift rerun` = rerun the cached full command at `standard` and prepend what resolved, remained, or changed
|
|
104
|
+
- `sift rerun --remaining` = rerun only the remaining failing pytest node IDs for a zoomed-in view
|
|
105
|
+
- `sift watch` / `sift exec --watch` = treat redraw-style output as cycles and summarize what changed
|
|
106
|
+
- `Decision: stop and act` = trust the current diagnosis and go read or fix code
|
|
107
|
+
- `Decision: zoom` = one deeper sift pass is justified before raw
|
|
108
|
+
- `Decision: raw only if exact traceback is required` = raw is last resort, not the next default step
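The decision ladder above can be sketched as a tiny dispatcher, so an agent never jumps straight to raw output (a hedged illustration only, not sift's actual logic):

```javascript
// Map the Decision line from a sift answer to the next step in the
// ladder: act, zoom once, or (last resort) go raw.
function nextStep(decision) {
  if (decision.startsWith("stop and act")) return "read or fix code";
  if (decision.startsWith("zoom")) return "one deeper sift pass";
  return "raw output, last resort";
}

console.log(nextStep("stop and act. Do not escalate unless you need exact traceback lines."));
```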
|
|
109
|
+
|
|
110
|
+
If your project uses `pytest`, `vitest`, `jest`, `bun test`, or another test runner instead of `npm test`, use the same preset with that command.
|
|
111
|
+
|
|
112
|
+
What `sift` does in `exec` mode:
|
|
113
|
+
1. runs the child command
|
|
114
|
+
2. captures `stdout` and `stderr`
|
|
115
|
+
3. keeps the useful signal
|
|
116
|
+
4. returns a short answer or JSON
|
|
117
|
+
5. preserves the child command exit code
|
|
118
|
+
|
|
119
|
+
Useful debug flags:
|
|
120
|
+
- `--dry-run`: show the reduced input and prompt without calling the provider
|
|
121
|
+
- `--show-raw`: print the captured raw input to `stderr`
|
|
122
|
+
|
|
123
|
+
## When tests fail
|
|
124
|
+
|
|
125
|
+
Start with the map:
|
|
71
126
|
|
|
72
127
|
```bash
|
|
73
|
-
|
|
128
|
+
sift exec --preset test-status -- <test command>
|
|
74
129
|
```
|
|
75
130
|
|
|
76
|
-
If
|
|
131
|
+
If `standard` already names the main failure buckets, counts, and hints, stop there and read code.
|
|
132
|
+
|
|
133
|
+
Then use this order:
|
|
134
|
+
1. `sift exec --preset test-status -- <test command>`
|
|
135
|
+
2. `sift rerun`
|
|
136
|
+
3. `sift rerun --remaining --detail focused`
|
|
137
|
+
4. `sift rerun --remaining --detail verbose`
|
|
138
|
+
5. `sift rerun --remaining --detail verbose --show-raw`
|
|
139
|
+
6. raw pytest only if exact traceback lines are still needed
|
|
140
|
+
|
|
141
|
+
The normal stop budget is `standard` first, then at most one zoom step before raw.
|
|
142
|
+
|
|
143
|
+
If you want the older explicit compare shape, `sift exec --preset test-status --diff -- <test command>` still works. `sift rerun` is the shorter normal path for the same idea.
|
|
144
|
+
|
|
145
|
+
## Diagnose JSON
|
|
146
|
+
|
|
147
|
+
Most of the time, you do not need JSON. Start with text first.
|
|
148
|
+
|
|
149
|
+
If `standard` already shows bucket-level root cause, `Anchor`, and `Fix`, do not re-verify the same bucket with raw pytest. At most do one targeted source read before you edit.
|
|
150
|
+
|
|
151
|
+
Use diagnose JSON only when automation or machine branching really needs it:
|
|
77
152
|
|
|
78
153
|
```bash
|
|
79
|
-
|
|
154
|
+
sift exec --preset test-status --goal diagnose --format json -- pytest -q
|
|
155
|
+
sift rerun --goal diagnose --format json
|
|
156
|
+
sift watch --preset test-status --goal diagnose --format json < pytest-watch.txt
|
|
80
157
|
```
|
|
81
158
|
|
|
82
|
-
|
|
83
|
-
- `
|
|
84
|
-
- `
|
|
85
|
-
- `
|
|
159
|
+
Default diagnose JSON is summary-first:
|
|
160
|
+
- `remaining_summary` and `resolved_summary` keep the answer small
|
|
161
|
+
- `read_targets` points to the first file or line worth reading
|
|
162
|
+
- `read_targets.context_hint` can tell an agent to read only a small line window first
|
|
163
|
+
- if `context_hint` only includes `search_hint`, search for that string before reading the whole file
|
|
164
|
+
- `remaining_subset_available` tells you whether `sift rerun --remaining` can zoom safely
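An agent can branch on those fields roughly like this. The field names come from the list above; the exact payload shape (an array under `read_targets`, a `file` property on each target) is an assumption for illustration:

```javascript
// Hedged sketch: pick the next action from diagnose JSON fields.
function nextAction(diagnosis) {
  if (diagnosis.remaining_subset_available) {
    return "sift rerun --remaining --detail focused";
  }
  const target = (diagnosis.read_targets ?? [])[0]; // assumed shape
  const hint = target?.context_hint?.search_hint;
  if (hint) return `search "${hint}" in ${target.file}`;
  return "stop and act on remaining_summary";
}

const sample = {
  remaining_summary: "one import/dependency bucket",
  remaining_subset_available: false,
  read_targets: [{ file: "tests/test_a.py", context_hint: { search_hint: "missing module" } }],
};
console.log(nextAction(sample));
```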
|
|
86
165
|
|
|
87
|
-
|
|
166
|
+
If an agent truly needs every raw failing test ID, opt in:
|
|
88
167
|
|
|
89
168
|
```bash
|
|
90
|
-
sift exec
|
|
91
|
-
sift exec --preset test-status -- pytest
|
|
92
|
-
sift exec --preset typecheck-summary -- tsc --noEmit
|
|
93
|
-
sift exec --preset lint-failures -- eslint .
|
|
94
|
-
sift exec --preset audit-critical -- npm audit
|
|
95
|
-
sift exec --preset infra-risk -- terraform plan
|
|
96
|
-
sift exec --preset audit-critical --fail-on -- npm audit
|
|
97
|
-
sift exec --preset infra-risk --fail-on -- terraform plan
|
|
169
|
+
sift exec --preset test-status --goal diagnose --format json --include-test-ids -- pytest -q
|
|
98
170
|
```
|
|
99
171
|
|
|
100
|
-
|
|
172
|
+
`--goal diagnose --format json` is currently supported only for `test-status`, `rerun`, and `test-status` watch flows.
|
|
173
|
+
|
|
174
|
+
## Watch mode
|
|
101
175
|
|
|
102
|
-
|
|
176
|
+
Use watch mode when command output redraws or repeats and you care about cycle-to-cycle change summaries more than the raw stream:
|
|
103
177
|
|
|
104
178
|
```bash
|
|
105
|
-
sift
|
|
106
|
-
sift exec --
|
|
179
|
+
sift watch "what changed between cycles?" < watcher-output.txt
|
|
180
|
+
sift exec --watch "what changed between cycles?" -- node watcher.js
|
|
181
|
+
sift exec --watch --preset test-status -- pytest -f
|
|
107
182
|
```
|
|
108
183
|
|
|
109
|
-
|
|
110
|
-
1
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
4. sends the reduced input to a smaller model
|
|
114
|
-
5. prints a short answer or JSON
|
|
115
|
-
6. preserves the wrapped command's exit code
|
|
184
|
+
`sift watch` keeps the current summary and change summary together:
|
|
185
|
+
- cycle 1 = current state
|
|
186
|
+
- later cycles = what changed, what resolved, what stayed, and the next best action
|
|
187
|
+
- for `test-status`, resolved tests drop out and remaining failures stay in focus
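The cycle model can be sketched as follows. This toy splits a redraw-style stream on ANSI clear-screen sequences and diffs the last two cycles; sift's real detection and summarization are more involved:

```javascript
// Split a redraw-style stream into cycles on the ANSI clear-screen
// escape, then report what resolved between the last two cycles.
function splitCycles(stream) {
  return stream.split("\x1b[2J").map((s) => s.trim()).filter(Boolean);
}

function diffLastCycles(stream) {
  const cycles = splitCycles(stream).map((c) => c.split("\n"));
  if (cycles.length < 2) return { resolved: [], remaining: cycles[0] ?? [] };
  const [prev, curr] = cycles.slice(-2);
  return { resolved: prev.filter((f) => !curr.includes(f)), remaining: curr };
}

const stream = "FAIL test_login\nFAIL test_signup\x1b[2JFAIL test_signup";
console.log(diffLastCycles(stream));
```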
|
|
116
188
|
|
|
117
|
-
|
|
189
|
+
If the stream clearly looks like a redraw/watch session, `sift` can auto-switch to watch handling and print a short stderr note when it does.
|
|
118
190
|
|
|
119
|
-
|
|
120
|
-
- `infra-risk`
|
|
121
|
-
- `audit-critical`
|
|
191
|
+
## `test-status` detail modes
|
|
122
192
|
|
|
123
|
-
|
|
193
|
+
If you are running `npm test` and want `sift` to check the result, use `--preset test-status`.
|
|
194
|
+
|
|
195
|
+
`test-status` becomes test-aware because you chose the preset. It does **not** infer “this is a test command” from `pytest`, `vitest`, `npm test`, or any other runner name.
|
|
196
|
+
|
|
197
|
+
Available detail levels:
|
|
198
|
+
|
|
199
|
+
- `standard`
|
|
200
|
+
- short default summary
|
|
201
|
+
- no file list
|
|
202
|
+
- `focused`
|
|
203
|
+
- groups failures by error type
|
|
204
|
+
- shows a few representative failing tests or modules
|
|
205
|
+
- `verbose`
|
|
206
|
+
- flat list of visible failing tests or modules with their normalized reasons
|
|
207
|
+
- useful when Codex needs to know exactly what to fix first
|
|
208
|
+
|
|
209
|
+
Examples:
|
|
124
210
|
|
|
125
211
|
```bash
|
|
126
|
-
|
|
212
|
+
sift exec --preset test-status -- npm test
|
|
213
|
+
sift rerun
|
|
214
|
+
sift rerun --remaining --detail focused
|
|
215
|
+
sift rerun --remaining --detail verbose
|
|
216
|
+
sift rerun --remaining --detail verbose --show-raw
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
If you use a different runner, swap in your command:
|
|
220
|
+
|
|
221
|
+
```bash
|
|
222
|
+
sift exec --preset test-status -- pytest
|
|
223
|
+
sift rerun
|
|
224
|
+
sift rerun --remaining --detail focused
|
|
225
|
+
sift rerun --remaining --detail verbose --show-raw
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
`sift rerun --remaining` currently supports only cached argv-mode `pytest ...` or `python -m pytest ...` runs. If the cached command is not subset-capable, run a narrowed pytest command manually with `sift exec --preset test-status -- <narrowed pytest command>`.
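The subset-capability check described above amounts to inspecting the cached argv shape. A minimal sketch, assuming only the two documented shapes qualify:

```javascript
// Only plain `pytest ...` or `python -m pytest ...` argv shapes can be
// narrowed to the remaining failing node IDs.
function isSubsetCapable(argv) {
  if (argv[0] === "pytest") return true;
  return argv[0] === "python" && argv[1] === "-m" && argv[2] === "pytest";
}

console.log(isSubsetCapable(["pytest", "-q"])); // true
console.log(isSubsetCapable(["npm", "test"])); // false
```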
|
|
229
|
+
|
|
230
|
+
Typical shapes:
|
|
231
|
+
|
|
232
|
+
`standard`
|
|
233
|
+
```text
|
|
234
|
+
- Tests did not complete.
|
|
235
|
+
- 114 errors occurred during collection.
|
|
236
|
+
- Import/dependency blocker: repeated collection failures are caused by missing dependencies.
|
|
237
|
+
- Anchor: path/to/failing_test.py
|
|
238
|
+
- Fix: Install the missing dependencies and rerun the affected tests.
|
|
239
|
+
- Decision: stop and act. Do not escalate unless you need exact traceback lines.
|
|
240
|
+
- Next: Fix bucket 1 first, then rerun the full suite at standard.
|
|
241
|
+
- Stop signal: diagnosis complete; raw not needed.
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
`standard` can also separate more than one failure family in a single pass:
|
|
245
|
+
```text
|
|
246
|
+
- Tests did not pass.
|
|
247
|
+
- 3 tests failed. 124 errors occurred.
|
|
248
|
+
- Shared blocker: DB-isolated tests are missing a required test env var.
|
|
249
|
+
- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
|
|
250
|
+
- Fix: Set the required test env var and rerun the suite.
|
|
251
|
+
- Contract drift: snapshot expectations are out of sync with the current API or model state.
|
|
252
|
+
- Anchor: search <route-or-entity> in path/to/freeze_test.py
|
|
253
|
+
- Fix: Review the drift and regenerate the snapshots if the change is intentional.
|
|
254
|
+
- Decision: stop and act. Do not escalate unless you need exact traceback lines.
|
|
255
|
+
- Next: Fix bucket 1 first, then rerun the full suite at standard. Secondary buckets are already visible behind it.
|
|
256
|
+
- Stop signal: diagnosis complete; raw not needed.
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
`focused`
|
|
260
|
+
```text
|
|
261
|
+
- Tests did not complete.
|
|
262
|
+
- 114 errors occurred during collection.
|
|
263
|
+
- Import/dependency blocker: missing dependencies are blocking collection.
|
|
264
|
+
- Missing modules include <module-a>, <module-b>.
|
|
265
|
+
- path/to/test_a.py -> missing module: <module-a>
|
|
266
|
+
- path/to/test_b.py -> missing module: <module-b>
|
|
267
|
+
- Hint: Install the missing dependencies and rerun the affected tests.
|
|
268
|
+
- Next: Fix bucket 1 first, then rerun the full suite at standard.
|
|
269
|
+
- Stop signal: diagnosis complete; raw not needed.
|
|
127
270
|
```
|
|
128
271
|
|
|
272
|
+
`verbose`
|
|
273
|
+
```text
|
|
274
|
+
- Tests did not complete.
|
|
275
|
+
- 114 errors occurred during collection.
|
|
276
|
+
- Import/dependency blocker: missing dependencies are blocking collection.
|
|
277
|
+
- path/to/test_a.py -> missing module: <module-a>
|
|
278
|
+
- path/to/test_b.py -> missing module: <module-b>
|
|
279
|
+
- path/to/test_c.py -> missing module: <module-c>
|
|
280
|
+
- Hint: Install the missing dependencies and rerun the affected tests.
|
|
281
|
+
- Next: Fix bucket 1 first, then rerun the full suite at standard.
|
|
282
|
+
- Stop signal: diagnosis complete; raw not needed.
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
Recommended debugging order for tests:
|
|
286
|
+
1. Use `standard` for the full suite first.
|
|
287
|
+
2. Treat `standard` as the map. If it already shows bucket-level root cause, `Anchor`, and `Fix`, trust it and report or act from there directly.
|
|
288
|
+
3. Use `sift escalate` only when you want a deeper render of the same cached output without rerunning the command.
|
|
289
|
+
4. After fixing something, run `sift rerun` to refresh the full-suite truth at `standard`.
|
|
290
|
+
5. Only then use `sift rerun --remaining --detail focused` as the zoom lens after the full-suite truth is refreshed.
|
|
291
|
+
6. Then use `sift rerun --remaining --detail verbose`.
|
|
292
|
+
7. Then use `sift rerun --remaining --detail verbose --show-raw`.
|
|
293
|
+
8. Fall back to the raw pytest command only if you still need exact traceback lines for the remaining failing subset.
|
|
294
|
+
|
|
129
295
|
## Built-in presets
|
|
130
296
|
|
|
131
|
-
- `test-status`: summarize test
|
|
297
|
+
- `test-status`: summarize test runs
|
|
132
298
|
- `typecheck-summary`: group blocking type errors by root cause
|
|
133
299
|
- `lint-failures`: group repeated lint violations and highlight the files or rules that matter
|
|
134
300
|
- `audit-critical`: extract only high and critical vulnerabilities
|
|
@@ -137,21 +303,74 @@ git diff 2>&1 | sift "what changed?"
|
|
|
137
303
|
- `build-failure`: explain the most likely build failure
|
|
138
304
|
- `log-errors`: extract the most relevant error signals
|
|
139
305
|
|
|
140
|
-
|
|
306
|
+
List or inspect them:
|
|
141
307
|
|
|
142
308
|
```bash
|
|
143
309
|
sift presets list
|
|
144
|
-
sift presets show
|
|
310
|
+
sift presets show test-status
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
## Agent setup
|
|
314
|
+
|
|
315
|
+
If you want Codex or Claude Code to use `sift` by default, let `sift` install a managed instruction block for you.
|
|
316
|
+
|
|
317
|
+
Repo scope is the default because it is safer:
|
|
318
|
+
|
|
319
|
+
```bash
|
|
320
|
+
sift agent show codex
|
|
321
|
+
sift agent show codex --raw
|
|
322
|
+
sift agent install codex --dry-run
|
|
323
|
+
sift agent install codex --dry-run --raw
|
|
324
|
+
sift agent install codex
|
|
325
|
+
sift agent install claude
|
|
326
|
+
```
|
|
327
|
+
|
|
328
|
+
You can also install machine-wide instructions explicitly:
|
|
329
|
+
|
|
330
|
+
```bash
|
|
331
|
+
sift agent install codex --scope global
|
|
332
|
+
sift agent install claude --scope global
|
|
145
333
|
```
|
|
146
334
|
|
|
147
|
-
|
|
335
|
+
Useful commands:
|
|
336
|
+
|
|
337
|
+
```bash
|
|
338
|
+
sift agent status
|
|
339
|
+
sift agent remove codex
|
|
340
|
+
sift agent remove claude
|
|
341
|
+
```
|
|
342
|
+
|
|
343
|
+
`sift agent show ...` is a preview. It also tells you whether the managed block is already installed in the current scope.
|
|
344
|
+
|
|
345
|
+
What the installer does:
|
|
346
|
+
- writes to `AGENTS.md` or `CLAUDE.md` by default in the current repo
|
|
347
|
+
- uses marked managed blocks instead of rewriting the whole file
|
|
348
|
+
- preserves your surrounding notes and instructions
|
|
349
|
+
- can use global files when you explicitly choose `--scope global`
|
|
350
|
+
- keeps previews short by default
|
|
351
|
+
- shows the exact managed block or final dry-run content only with `--raw`
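The marked managed-block behavior can be sketched like this. The marker text here is an assumption for illustration, not sift's actual markers; the point is that only the region between the markers is replaced, so surrounding notes survive:

```javascript
// Idempotent marker-managed block update: insert the block if absent,
// otherwise replace only the managed region.
const BEGIN = "<!-- sift:begin -->";
const END = "<!-- sift:end -->";

function upsertManagedBlock(fileText, block) {
  const managed = `${BEGIN}\n${block}\n${END}`;
  const start = fileText.indexOf(BEGIN);
  const end = fileText.indexOf(END);
  if (start === -1 || end === -1) {
    return fileText.trimEnd() + "\n\n" + managed + "\n";
  }
  return fileText.slice(0, start) + managed + fileText.slice(end + END.length);
}

const before = "# My notes\n\n<!-- sift:begin -->\nold\n<!-- sift:end -->\n";
console.log(upsertManagedBlock(before, "use sift first"));
```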
|
|
352
|
+
|
|
353
|
+
What the managed block tells the agent:
|
|
354
|
+
- start with `sift` for long non-interactive command output so the agent spends less of its context window and token budget on raw logs
|
|
355
|
+
- for tests, begin with the normal `test-status` summary
|
|
356
|
+
- if `standard` already identifies the main buckets, stop there instead of escalating automatically
|
|
357
|
+
- use `sift escalate` only for the same cached output when more detail is needed without rerunning the command
|
|
358
|
+
- after a fix, refresh the truth with `sift rerun`
|
|
359
|
+
- only then zoom into the remaining failing pytest subset with `sift rerun --remaining --detail focused`, then `verbose`, then `--show-raw`
|
|
360
|
+
- fall back to the raw test command only when exact traceback lines are still needed
|
|
148
361
|
|
|
149
|
-
-
|
|
150
|
-
- `bullets`
|
|
151
|
-
- `json`
|
|
152
|
-
- `verdict`
|
|
362
|
+
## CI-friendly usage
|
|
153
363
|
|
|
154
|
-
|
|
364
|
+
Some commands succeed technically but should still block CI. `--fail-on` handles that for the built-in semantic presets that have stable machine-readable output:
|
|
365
|
+
|
|
366
|
+
```bash
|
|
367
|
+
sift exec --preset audit-critical --fail-on -- npm audit
|
|
368
|
+
sift exec --preset infra-risk --fail-on -- terraform plan
|
|
369
|
+
```
|
|
370
|
+
|
|
371
|
+
Supported presets for `--fail-on`:
|
|
372
|
+
- `audit-critical`
|
|
373
|
+
- `infra-risk`
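The `--fail-on` contract reduces to a small exit-code rule: the wrapped command can exit 0 while a semantic finding still blocks the pipeline. A hedged sketch of that rule:

```javascript
// Real command failures pass through unchanged; a clean exit plus a
// blocking finding becomes a nonzero CI exit.
function ciExitCode(commandExit, findings, failOn) {
  if (commandExit !== 0) return commandExit;
  return failOn && findings.length > 0 ? 1 : 0;
}

console.log(ciExitCode(0, ["critical vulnerability"], true)); // 1
console.log(ciExitCode(0, [], true)); // 0
```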
|
|
155
374
|
|
|
156
375
|
## Config
|
|
157
376
|
|
|
@@ -167,16 +386,32 @@ sift doctor
|
|
|
167
386
|
|
|
168
387
|
`sift config show` masks secrets by default. Use `--show-secrets` only when you explicitly need raw values.
|
|
169
388
|
|
|
170
|
-
|
|
389
|
+
Config precedence:
|
|
171
390
|
1. CLI flags
|
|
172
391
|
2. environment variables
|
|
173
|
-
3. `sift.config.yaml` or `sift.config.yml`
|
|
174
|
-
4. `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
|
|
392
|
+
3. repo-local `sift.config.yaml` or `sift.config.yml`
|
|
393
|
+
4. machine-wide `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
|
|
175
394
|
5. built-in defaults
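That precedence is a first-hit lookup through the five layers. An illustrative sketch (the layer values here are made up; only the ordering comes from the list above):

```javascript
// Walk the layers in precedence order and return the first defined value.
function resolve(key, layers) {
  for (const layer of layers) {
    if (layer[key] !== undefined) return layer[key];
  }
  return undefined;
}

const layers = [
  { model: "gpt-5-nano" },                   // 1. CLI flags
  {},                                        // 2. environment variables
  { provider: "openai" },                    // 3. repo-local sift.config.yaml
  { provider: "openai-compatible" },         // 4. ~/.config/sift/config.yaml
  { provider: "openai", model: "default" },  // 5. built-in defaults
];

console.log(resolve("model", layers));    // from the CLI flag layer
console.log(resolve("provider", layers)); // from the repo-local file
```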
|
|
176
395
|
|
|
396
|
+
## Maintainer benchmark
|
|
397
|
+
|
|
398
|
+
To compare raw pytest output against the `test-status` reduction ladder on fixed fixtures, run:
|
|
399
|
+
|
|
400
|
+
```bash
|
|
401
|
+
npm run bench:test-status-ab
|
|
402
|
+
npm run bench:test-status-live
|
|
403
|
+
```
|
|
404
|
+
|
|
405
|
+
This uses the real `o200k_base` tokenizer and reports both:
|
|
406
|
+
- command-output budget as the primary benchmark
|
|
407
|
+
- deterministic recipe-budget comparisons as supporting evidence only
|
|
408
|
+
- live-session scorecards for captured mixed full-suite agent transcripts
|
|
409
|
+
|
|
410
|
+
The benchmark is meant to show context-window and command-output reduction first. In normal debugging flows, `test-status` should usually stop at `standard`; `focused` and `verbose` are escalation tools, and raw pytest is the last resort when exact traceback evidence is still needed.
|
|
411
|
+
|
|
177
412
|
If you pass `--config <path>`, that path is strict. Missing explicit config paths are errors.
|
|
178
413
|
|
|
179
|
-
Minimal example:
|
|
414
|
+
Minimal config example:
|
|
180
415
|
|
|
181
416
|
```yaml
|
|
182
417
|
provider:
|
|
@@ -195,16 +430,27 @@ runtime:
|
|
|
195
430
|
rawFallback: true
|
|
196
431
|
```
|
|
197
432
|
|
|
198
|
-
##
|
|
433
|
+
## OpenAI vs OpenAI-compatible
|
|
199
434
|
|
|
200
|
-
|
|
435
|
+
Use `provider: openai` for `api.openai.com`.
|
|
201
436
|
|
|
202
|
-
|
|
437
|
+
Use `provider: openai-compatible` for third-party compatible gateways or self-hosted endpoints.
|
|
203
438
|
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
439
|
+
For OpenAI:
|
|
440
|
+
```bash
|
|
441
|
+
export OPENAI_API_KEY=your_openai_api_key
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
For third-party compatible endpoints, use either the endpoint-native env var or:
|
|
445
|
+
|
|
446
|
+
```bash
|
|
447
|
+
export SIFT_PROVIDER_API_KEY=your_provider_api_key
|
|
448
|
+
```
|
|
449
|
+
|
|
450
|
+
Known compatible env fallbacks include:
|
|
451
|
+
- `OPENROUTER_API_KEY`
|
|
452
|
+
- `TOGETHER_API_KEY`
|
|
453
|
+
- `GROQ_API_KEY`
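The fallback behavior implied above can be sketched as a first-hit lookup. The exact ordering among the compatible variables is an assumption; only the variable names come from the list:

```javascript
// The sift-specific variable wins; otherwise the first known compatible
// variable that is set is used.
function resolveApiKey(env) {
  const names = ["SIFT_PROVIDER_API_KEY", "OPENROUTER_API_KEY", "TOGETHER_API_KEY", "GROQ_API_KEY"];
  for (const name of names) if (env[name]) return env[name];
  return undefined;
}

console.log(resolveApiKey({ GROQ_API_KEY: "k1", SIFT_PROVIDER_API_KEY: "k2" })); // "k2"
```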
|
|
208
454
|
|
|
209
455
|
## Safety and limits
|
|
210
456
|
|
|
@@ -222,13 +468,7 @@ Release flow:
|
|
|
222
468
|
2. merge to `main`
|
|
223
469
|
3. run the `release` workflow manually
|
|
224
470
|
|
|
225
|
-
The workflow
|
|
226
|
-
1. installs dependencies
|
|
227
|
-
2. runs typecheck, tests, and build
|
|
228
|
-
3. packs and smoke-tests the tarball
|
|
229
|
-
4. publishes to npm
|
|
230
|
-
5. creates and pushes the `vX.Y.Z` tag
|
|
231
|
-
6. creates a GitHub Release
|
|
471
|
+
The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
|
|
232
472
|
|
|
233
473
|
## Brand assets
|
|
234
474
|
|