@bilalimamoglu/sift 0.2.3 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,10 +2,17 @@
 
  <img src="assets/brand/sift-logo-badge-monochrome.svg" alt="sift logo" width="88" />
 
- `sift` is a small CLI that runs a noisy shell command, keeps the useful signal, and returns a much smaller answer.
+ `sift` turns a long terminal wall of text into a short answer you can act on.
 
- It is a good fit when you want an agent or CI job to understand:
- - test results
+ Think of it like this:
+ - `standard` = map
+ - `focused` or `rerun --remaining` = zoom
+ - raw traceback = last resort
+
+ It is a good fit when a human, agent, or CI job needs the answer faster than it needs the whole log.
+
+ Common uses:
+ - test failures
 - typecheck failures
 - lint failures
 - build logs
@@ -13,10 +20,10 @@ It is a good fit when you want an agent or CI job to understand:
 - `npm audit`
 - `terraform plan`
 
- It is not a good fit when you need:
- - the exact raw log as the main output
- - interactive or TUI commands
- - shell behavior that depends on raw command output
+ Do not use it when:
+ - the exact raw log is the main thing you need
+ - the command is interactive or TUI-based
+ - shell behavior depends on exact raw command output
 
 ## Install
 
@@ -42,6 +49,13 @@ That writes a machine-wide config to:
 
 After that, any terminal on the machine can use `sift`. A repo-local config can still override it later.
 
+ To switch between saved native providers without editing YAML:
+
+ ```bash
+ sift config use openai
+ sift config use openrouter
+ ```
+
 If you prefer manual setup, this is the smallest useful OpenAI setup:
 
 ```bash
@@ -51,49 +65,147 @@ export SIFT_MODEL=gpt-5-nano
 export OPENAI_API_KEY=your_openai_api_key
 ```
 
+ And this is the smallest useful OpenRouter setup:
+
+ ```bash
+ export SIFT_PROVIDER=openrouter
+ export OPENROUTER_API_KEY=your_openrouter_api_key
+ ```
+
 Then check it:
 
 ```bash
 sift doctor
 ```
 
- ## Quick start
+ ## Start here
+
+ The default path is simple:
+ 1. run the noisy command through `sift`
+ 2. read the short `standard` answer first
+ 3. only zoom in if `standard` clearly signals that more detail is worth it
+
+ Examples:
 
 ```bash
 sift exec "what changed?" -- git diff
- sift exec --preset test-status -- npm test
+ sift exec --preset test-status -- pytest -q
+ sift rerun
+ sift rerun --remaining --detail focused
+ sift rerun --remaining --detail verbose --show-raw
+ sift watch "what changed between cycles?" < watcher-output.txt
+ sift exec --watch "what changed between cycles?" -- node watcher.js
 sift exec --preset typecheck-summary -- npm run typecheck
 sift exec --preset lint-failures -- eslint .
 sift exec --preset audit-critical -- npm audit
 sift exec --preset infra-risk -- terraform plan
+ sift agent install codex --dry-run
 ```
 
- ## The main workflow
+ ## Simple workflow
 
- `sift exec` is the default path:
+ For most repos, this is the whole story:
 
 ```bash
- sift exec "what changed?" -- git diff
- sift exec --preset test-status -- npm test
- sift exec --preset test-status --show-raw -- npm test
- sift exec --preset test-status --detail focused -- npm test
- sift exec --preset test-status --detail verbose -- npm test
+ sift exec --preset test-status -- <test command>
+ sift rerun
+ sift rerun --remaining --detail focused
 ```
 
+ Mental model:
+ - `sift escalate` = same cached output, deeper render
+ - `sift rerun` = rerun the cached full command at `standard` and prepend what resolved, remained, or changed
+ - `sift rerun --remaining` = rerun only the remaining failing pytest node IDs for a zoomed-in view
+ - `sift watch` / `sift exec --watch` = treat redraw-style output as cycles and summarize what changed
+ - `Decision: stop and act` = trust the current diagnosis and go read or fix code
+ - `Decision: zoom` = one deeper sift pass is justified before raw
+ - `Decision: raw only if exact traceback is required` = raw is the last resort, not the next default step
+
 If your project uses `pytest`, `vitest`, `jest`, `bun test`, or another test runner instead of `npm test`, use the same preset with that command.
 
- What happens:
- 1. `sift` runs the command
+ What `sift` does in `exec` mode:
+ 1. runs the child command
 2. captures `stdout` and `stderr`
- 3. trims the noise
- 4. sends a smaller input to the model
- 5. prints a short answer or JSON
- 6. preserves the child command exit code in `exec` mode
+ 3. keeps the useful signal
+ 4. returns a short answer or JSON
+ 5. preserves the child command exit code
 
 Useful debug flags:
 - `--dry-run`: show the reduced input and prompt without calling the provider
 - `--show-raw`: print the captured raw input to `stderr`
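The exec-mode steps above can be sketched in plain shell. This is a hypothetical stand-in that mimics the documented contract (capture both streams, return a short answer, preserve the exit code), not sift's implementation:

```bash
# Hypothetical stand-in for the `exec` contract above, not sift itself:
# capture stdout+stderr, keep a short "answer", preserve the child exit code.
summarize() {
  out="$("$@" 2>&1)"                 # capture stdout and stderr together
  status=$?
  printf '%s\n' "$out" | tail -n 1   # stand-in signal: keep only the last line
  return "$status"
}

summarize sh -c 'echo "noisy log line"; echo "1 test failed"; exit 1' \
  || echo "child exit code: $?"
```

Here the wrapper prints `1 test failed` and the caller still observes exit code 1, mirroring the exit-code-preservation step.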
 
+ ## When tests fail
+
+ Start with the map:
+
+ ```bash
+ sift exec --preset test-status -- <test command>
+ ```
+
+ If `standard` already names the main failure buckets, counts, and hints, stop there and read code.
+
+ If `standard` still includes an unknown bucket or ends with `Decision: zoom`, do one deeper sift pass before you fall back to raw traceback.
+
+ Then use this order:
+ 1. `sift exec --preset test-status -- <test command>`
+ 2. `sift rerun`
+ 3. `sift rerun --remaining --detail focused`
+ 4. `sift rerun --remaining --detail verbose`
+ 5. `sift rerun --remaining --detail verbose --show-raw`
+ 6. raw pytest only if exact traceback lines are still needed
+
+ The normal stop budget is `standard` first, then at most one zoom step before raw.
+
+ If you want the older explicit compare shape, `sift exec --preset test-status --diff -- <test command>` still works. `sift rerun` is the shorter normal path for the same idea.
+
+ ## Diagnose JSON
+
+ Most of the time, you do not need JSON. Start with text first.
+
+ If `standard` already shows bucket-level root cause, `Anchor`, and `Fix`, do not re-verify the same bucket with raw pytest. At most do one targeted source read before you edit.
+
+ If diagnose output still contains an unknown bucket or `Decision: zoom`, take one sift zoom step before raw traceback.
+
+ Use diagnose JSON only when automation or machine branching really needs it:
+
+ ```bash
+ sift exec --preset test-status --goal diagnose --format json -- pytest -q
+ sift rerun --goal diagnose --format json
+ sift watch --preset test-status --goal diagnose --format json < pytest-watch.txt
+ ```
+
+ Default diagnose JSON is summary-first:
+ - `remaining_summary` and `resolved_summary` keep the answer small
+ - `read_targets` points to the first file or line worth reading
+ - `read_targets.context_hint` can tell an agent to read only a small line window first
+ - if `context_hint` only includes `search_hint`, search for that string before reading the whole file
+ - `remaining_subset_available` tells you whether `sift rerun --remaining` can zoom safely
+
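As a purely illustrative sketch of that shape (the field names come from the list above; every value and the exact schema are invented, not real sift output):

```bash
# Illustrative only: a summary-first payload shaped after the fields listed
# above. Values and exact schema are invented, not real sift output.
payload='{
  "remaining_summary": "1 bucket: missing dependencies block collection",
  "resolved_summary": "2 import errors resolved since the last run",
  "read_targets": [
    {
      "path": "path/to/failing_test.py",
      "context_hint": { "search_hint": "ModuleNotFoundError" }
    }
  ],
  "remaining_subset_available": true
}'
printf '%s\n' "$payload"
```

A `remaining_subset_available` of `true` would mean `sift rerun --remaining` can zoom safely.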
+ If an agent truly needs every raw failing test ID, opt in:
+
+ ```bash
+ sift exec --preset test-status --goal diagnose --format json --include-test-ids -- pytest -q
+ ```
+
+ `--goal diagnose --format json` is currently supported only for `test-status`, `rerun`, and `test-status` watch flows.
+
+ ## Watch mode
+
+ Use watch mode when command output redraws or repeats and you care about cycle-to-cycle change summaries more than the raw stream:
+
+ ```bash
+ sift watch "what changed between cycles?" < watcher-output.txt
+ sift exec --watch "what changed between cycles?" -- node watcher.js
+ sift exec --watch --preset test-status -- pytest -f
+ ```
+
+ `sift watch` keeps the current summary and change summary together:
+ - cycle 1 = current state
+ - later cycles = what changed, what resolved, what stayed, and the next best action
+ - for `test-status`, resolved tests drop out and remaining failures stay in focus
+
+ If the stream clearly looks like a redraw/watch session, `sift` can auto-switch to watch handling and prints a short stderr note when it does.
+
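One way to picture the cycle handling (the marker format and mechanics below are assumptions for illustration, not sift's actual detector): split the stream at redraw boundaries and keep the latest cycle as the current state.

```bash
# Hypothetical sketch of cycle handling: assume each redraw cycle begins with
# a marker line, and keep only the text after the last marker (current state).
latest_cycle() {
  awk '/^=== cycle/ { buf = ""; next } { buf = buf $0 "\n" } END { printf "%s", buf }'
}

printf '=== cycle 1\n3 failed\n=== cycle 2\n1 failed\n' | latest_cycle
```

Printing only `1 failed` here mirrors the idea that resolved results drop out while the latest state stays in focus.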
 ## `test-status` detail modes
 
 If you are running `npm test` and want `sift` to check the result, use `--preset test-status`.
@@ -116,48 +228,88 @@ Examples:
 
 ```bash
 sift exec --preset test-status -- npm test
- sift exec --preset test-status --detail focused -- npm test
- sift exec --preset test-status --detail verbose -- npm test
- sift exec --preset test-status --detail verbose --show-raw -- npm test
+ sift rerun
+ sift rerun --remaining --detail focused
+ sift rerun --remaining --detail verbose
+ sift rerun --remaining --detail verbose --show-raw
 ```
 
 If you use a different runner, swap in your command:
 
 ```bash
 sift exec --preset test-status -- pytest
- sift exec --preset test-status --detail focused -- vitest
- sift exec --preset test-status --detail verbose -- bun test
+ sift rerun
+ sift rerun --remaining --detail focused
+ sift rerun --remaining --detail verbose --show-raw
 ```
 
+ `sift rerun --remaining` currently supports only cached argv-mode `pytest ...` or `python -m pytest ...` runs. If the cached command is not subset-capable, run a narrowed pytest command manually with `sift exec --preset test-status -- <narrowed pytest command>`.
+
 Typical shapes:
 
 `standard`
 ```text
 - Tests did not complete.
 - 114 errors occurred during collection.
- - Most failures are import/dependency errors during test collection.
- - Missing modules include pydantic, fastapi, botocore, PIL, httpx, numpy.
+ - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
+ - Anchor: path/to/failing_test.py
+ - Fix: Install the missing dependencies and rerun the affected tests.
+ - Decision: stop and act. Do not escalate unless you need exact traceback lines.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
+ - Stop signal: diagnosis complete; raw not needed.
+ ```
+
+ `standard` can also separate more than one failure family in a single pass:
+ ```text
+ - Tests did not pass.
+ - 3 tests failed. 124 errors occurred.
+ - Shared blocker: DB-isolated tests are missing a required test env var.
+ - Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
+ - Fix: Set the required test env var and rerun the suite.
+ - Contract drift: snapshot expectations are out of sync with the current API or model state.
+ - Anchor: search <route-or-entity> in path/to/freeze_test.py
+ - Fix: Review the drift and regenerate the snapshots if the change is intentional.
+ - Decision: stop and act. Do not escalate unless you need exact traceback lines.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard. Secondary buckets are already visible behind it.
+ - Stop signal: diagnosis complete; raw not needed.
 ```
 
 `focused`
 ```text
 - Tests did not complete.
 - 114 errors occurred during collection.
- - import/dependency errors during collection
- - tests/unit/test_auth_refresh.py -> missing module: botocore
- - tests/unit/test_cognito.py -> missing module: pydantic
- - and 103 more failing modules
+ - Import/dependency blocker: missing dependencies are blocking collection.
+ - Missing modules include <module-a>, <module-b>.
+ - path/to/test_a.py -> missing module: <module-a>
+ - path/to/test_b.py -> missing module: <module-b>
+ - Hint: Install the missing dependencies and rerun the affected tests.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
+ - Stop signal: diagnosis complete; raw not needed.
 ```
 
 `verbose`
 ```text
 - Tests did not complete.
 - 114 errors occurred during collection.
- - tests/unit/test_auth_refresh.py -> missing module: botocore
- - tests/unit/test_cognito.py -> missing module: pydantic
- - tests/unit/test_dataset_use_case_facade.py -> missing module: fastapi
+ - Import/dependency blocker: missing dependencies are blocking collection.
+ - path/to/test_a.py -> missing module: <module-a>
+ - path/to/test_b.py -> missing module: <module-b>
+ - path/to/test_c.py -> missing module: <module-c>
+ - Hint: Install the missing dependencies and rerun the affected tests.
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
+ - Stop signal: diagnosis complete; raw not needed.
 ```
 
+ Recommended debugging order for tests:
+ 1. Use `standard` for the full suite first.
+ 2. Treat `standard` as the map. If it already shows bucket-level root cause, `Anchor`, and `Fix`, trust it and report or act from there directly.
+ 3. Use `sift escalate` only when you want a deeper render of the same cached output without rerunning the command.
+ 4. After fixing something, run `sift rerun` to refresh the full-suite truth at `standard`.
+ 5. Only then use `sift rerun --remaining --detail focused` as the zoom lens.
+ 6. Then use `sift rerun --remaining --detail verbose`.
+ 7. Then use `sift rerun --remaining --detail verbose --show-raw`.
+ 8. Fall back to the raw pytest command only if you still need exact traceback lines for the remaining failing subset.
+
 ## Built-in presets
 
 - `test-status`: summarize test runs
@@ -176,6 +328,55 @@ sift presets list
 sift presets show test-status
 ```
 
+ ## Agent setup
+
+ If you want Codex or Claude Code to use `sift` by default, let `sift` install a managed instruction block for you.
+
+ Repo scope is the default because it is safer:
+
+ ```bash
+ sift agent show codex
+ sift agent show codex --raw
+ sift agent install codex --dry-run
+ sift agent install codex --dry-run --raw
+ sift agent install codex
+ sift agent install claude
+ ```
+
+ You can also install machine-wide instructions explicitly:
+
+ ```bash
+ sift agent install codex --scope global
+ sift agent install claude --scope global
+ ```
+
+ Useful commands:
+
+ ```bash
+ sift agent status
+ sift agent remove codex
+ sift agent remove claude
+ ```
+
+ `sift agent show ...` is a preview. It also tells you whether the managed block is already installed in the current scope.
+
+ What the installer does:
+ - writes to `AGENTS.md` or `CLAUDE.md` by default in the current repo
+ - uses marked managed blocks instead of rewriting the whole file
+ - preserves your surrounding notes and instructions
+ - can use global files when you explicitly choose `--scope global`
+ - keeps previews short by default
+ - shows the exact managed block or final dry-run content only with `--raw`
+
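The marker-based update can be pictured with a small stand-in; the marker strings, file handling, and block body below are invented for illustration and are not sift's actual format:

```bash
# Hypothetical sketch of a marked managed block: replace only the text between
# the markers and leave surrounding notes untouched. Marker names are invented.
update_managed_block() {   # $1 = target file, stdin = new block body
  body=$(cat)
  awk -v body="$body" '
    /<!-- managed:start -->/ { print; print body; skip = 1; next }
    /<!-- managed:end -->/   { skip = 0 }
    skip != 1 { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

printf '%s\n' 'my notes' '<!-- managed:start -->' 'old body' '<!-- managed:end -->' 'more notes' > AGENTS.md
echo 'start with sift for long command output' | update_managed_block AGENTS.md
cat AGENTS.md
```

The surrounding `my notes` and `more notes` lines survive; only the text between the markers is rewritten.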
+ What the managed block tells the agent:
+ - start with `sift` for long non-interactive command output, so the agent spends less of its context window and token budget on raw logs
+ - for tests, begin with the normal `test-status` summary
+ - if `standard` already identifies the main buckets, stop there instead of escalating automatically
+ - use `sift escalate` only for the same cached output when more detail is needed without rerunning the command
+ - after a fix, refresh the truth with `sift rerun`
+ - only then zoom into the remaining failing pytest subset with `sift rerun --remaining --detail focused`, then `verbose`, then `--show-raw`
+ - fall back to the raw test command only when exact traceback lines are still needed
+
 ## CI-friendly usage
 
 Some commands succeed technically but should still block CI. `--fail-on` handles that for the built-in semantic presets that have stable machine-readable output:
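The gating idea can be sketched with a plain-shell stand-in (hypothetical logic to illustrate the concept, not how the `--fail-on` flag is implemented):

```bash
# Hypothetical stand-in for the `--fail-on` idea: the child command exits 0,
# but the run should still fail when the output reports a blocking finding.
gate() {                       # $1 = pattern that should block CI
  pattern=$1; shift
  out="$("$@" 2>&1)"
  printf '%s\n' "$out"
  ! printf '%s\n' "$out" | grep -q "$pattern"
}

gate 'critical' sh -c 'echo "2 critical advisories found"; exit 0' \
  || echo "blocked: non-zero exit for CI"
```

Even though the child exits 0, the gate returns non-zero because the output matched, which is exactly the behavior a CI job wants.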
@@ -195,6 +396,7 @@ Useful commands:
 
 ```bash
 sift config setup
+ sift config use openrouter
 sift config init
 sift config show
 sift config validate
@@ -210,6 +412,22 @@ Config precedence:
 4. machine-wide `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
 5. built-in defaults
 
+ ## Maintainer benchmark
+
+ To compare raw pytest output against the `test-status` reduction ladder on fixed fixtures, run:
+
+ ```bash
+ npm run bench:test-status-ab
+ npm run bench:test-status-live
+ ```
+
+ This uses the real `o200k_base` tokenizer and reports:
+ - command-output budget as the primary benchmark
+ - deterministic recipe-budget comparisons as supporting evidence only
+ - live-session scorecards for captured mixed full-suite agent transcripts
+
+ The benchmark is meant to show context-window and command-output reduction first. In normal debugging flows, `test-status` should usually stop at `standard`; `focused` and `verbose` are escalation tools, and raw pytest is the last resort when exact traceback evidence is still needed.
+
 If you pass `--config <path>`, that path is strict. Missing explicit config paths are errors.
 
 Minimal config example:
@@ -231,10 +449,14 @@ runtime:
   rawFallback: true
 ```
 
- ## OpenAI vs OpenAI-compatible
+ ## OpenAI vs OpenRouter vs OpenAI-compatible
 
 Use `provider: openai` for `api.openai.com`.
 
+ Use `provider: openrouter` for the native OpenRouter path. It defaults to:
+ - `baseUrl: https://openrouter.ai/api/v1`
+ - `model: openrouter/free`
+
 Use `provider: openai-compatible` for third-party compatible gateways or self-hosted endpoints.
 
 For OpenAI:
@@ -242,6 +464,11 @@ For OpenAI:
 export OPENAI_API_KEY=your_openai_api_key
 ```
 
+ For OpenRouter:
+ ```bash
+ export OPENROUTER_API_KEY=your_openrouter_api_key
+ ```
+
 For third-party compatible endpoints, use either the endpoint-native env var or:
 
 ```bash
@@ -253,15 +480,6 @@ Known compatible env fallbacks include:
 - `TOGETHER_API_KEY`
 - `GROQ_API_KEY`
 
- ## Agent usage
-
- The simple rule is:
- - use `sift exec` for long, noisy, non-interactive command output
- - skip `sift` when exact raw output matters
-
- For Codex, put that rule in `~/.codex/AGENTS.md`.
- For Claude Code, put the same rule in `CLAUDE.md`.
-
 ## Safety and limits
 
 - redaction is optional and regex-based