@bilalimamoglu/sift 0.2.2 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,21 +2,30 @@
2
2
 
3
3
  <img src="assets/brand/sift-logo-badge-monochrome.svg" alt="sift logo" width="88" />
4
4
 
5
- `sift` is a small command-output reducer for agent workflows.
5
+ `sift` turns a long terminal wall of text into a short answer you can act on.
6
6
 
7
- Instead of feeding a model the full output of `pytest`, `git diff`, `npm audit`, `tsc --noEmit`, `eslint .`, or `terraform plan`, you run the command through `sift`. It captures the output, trims the noise, and returns a much smaller answer.
7
+ Think of it like this:
8
+ - `standard` = map
9
+ - `focused` or `rerun --remaining` = zoom
10
+ - raw traceback = last resort
8
11
 
9
- Best fit:
10
- - non-interactive shell commands
11
- - agents that need short answers instead of full logs
12
- - CI checks where a command may succeed but still produce a blocking result
12
+ It is a good fit when a human, agent, or CI job needs the answer faster than it needs the whole log.
13
13
 
14
- Not a fit:
15
- - exact raw log inspection
16
- - TUI tools
17
- - password/confirmation prompts
14
+ Common uses:
15
+ - test failures
16
+ - typecheck failures
17
+ - lint failures
18
+ - build logs
19
+ - `git diff`
20
+ - `npm audit`
21
+ - `terraform plan`
18
22
 
19
- ## Installation
23
+ Do not use it when:
24
+ - the exact raw log is the main thing you need
25
+ - the command is interactive or TUI-based
26
+ - shell behavior depends on exact raw command output
27
+
28
+ ## Install
20
29
 
21
30
  Requires Node.js 20 or later.
22
31
 
@@ -26,7 +35,7 @@ npm install -g @bilalimamoglu/sift
26
35
 
27
36
  ## One-time setup
28
37
 
29
- The easiest path is the guided setup:
38
+ The easiest setup path is:
30
39
 
31
40
  ```bash
32
41
  sift config setup
@@ -38,9 +47,9 @@ That writes a machine-wide config to:
38
47
  ~/.config/sift/config.yaml
39
48
  ```
40
49
 
41
- After that, any terminal can use `sift` without per-project setup. A repo-local config can still override it later.
50
+ After that, any terminal on the machine can use `sift`. A repo-local config can still override it later.
42
51
 
43
- If you want to set things up manually, for OpenAI-hosted models:
52
+ If you prefer manual setup, this is the smallest useful OpenAI configuration:
44
53
 
45
54
  ```bash
46
55
  export SIFT_PROVIDER=openai
@@ -49,86 +58,243 @@ export SIFT_MODEL=gpt-5-nano
49
58
  export OPENAI_API_KEY=your_openai_api_key
50
59
  ```
51
60
 
52
- Or write a template config file:
61
+ Then check it:
53
62
 
54
63
  ```bash
55
- sift config init
64
+ sift doctor
56
65
  ```
57
66
 
58
- For a manual machine-wide template:
67
+ ## Start here
68
+
69
+ The default path is simple:
70
+ 1. run the noisy command through `sift`
71
+ 2. read the short `standard` answer first
72
+ 3. only zoom in if `standard` clearly tells you more detail is still worth it
73
+
74
+ Examples:
59
75
 
60
76
  ```bash
61
- sift config init --global
77
+ sift exec "what changed?" -- git diff
78
+ sift exec --preset test-status -- pytest -q
79
+ sift rerun
80
+ sift rerun --remaining --detail focused
81
+ sift rerun --remaining --detail verbose --show-raw
82
+ sift watch "what changed between cycles?" < watcher-output.txt
83
+ sift exec --watch "what changed between cycles?" -- node watcher.js
84
+ sift exec --preset typecheck-summary -- npm run typecheck
85
+ sift exec --preset lint-failures -- eslint .
86
+ sift exec --preset audit-critical -- npm audit
87
+ sift exec --preset infra-risk -- terraform plan
88
+ sift agent install codex --dry-run
62
89
  ```
63
90
 
64
- That writes:
91
+ ## Simple workflow
65
92
 
66
- ```text
67
- ~/.config/sift/config.yaml
93
+ For most repos, this is the whole story:
94
+
95
+ ```bash
96
+ sift exec --preset test-status -- <test command>
97
+ sift rerun
98
+ sift rerun --remaining --detail focused
68
99
  ```
69
100
 
70
- Then keep the API key in your shell profile so every terminal can use it:
101
+ Mental model:
102
+ - `sift escalate` = same cached output, deeper render
103
+ - `sift rerun` = rerun the cached full command at `standard` and prepend what resolved, remained, or changed
104
+ - `sift rerun --remaining` = rerun only the remaining failing pytest node IDs for a zoomed-in view
105
+ - `sift watch` / `sift exec --watch` = treat redraw-style output as cycles and summarize what changed
106
+ - `Decision: stop and act` = trust the current diagnosis and go read or fix code
107
+ - `Decision: zoom` = one deeper sift pass is justified before raw
108
+ - `Decision: raw only if exact traceback is required` = raw is last resort, not the next default step
109
+
110
+ If your project uses `pytest`, `vitest`, `jest`, `bun test`, or another test runner instead of `npm test`, use the same preset with that command.
111
+
112
+ What `sift` does in `exec` mode:
113
+ 1. runs the child command
114
+ 2. captures `stdout` and `stderr`
115
+ 3. keeps the useful signal
116
+ 4. returns a short answer or JSON
117
+ 5. preserves the child command exit code
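
The exec-mode contract above can be sketched in plain shell. This is an illustrative pattern only, not `sift`'s implementation; the `run_reduced` function and its line-count "reducer" are hypothetical stand-ins:

```shell
# Illustrative wrap-capture-preserve pattern (not sift itself):
# run a child command, capture stdout+stderr, emit a reduced view,
# and hand back the child's original exit code.
run_reduced() {
  out=$("$@" 2>&1)
  code=$?
  printf '%s\n' "$out" | wc -l   # stand-in "reducer": line count only
  return $code
}

run_reduced sh -c 'echo one; echo two; exit 3'
echo "child exit: $?"
```

The last line prints the preserved exit code, which is what lets CI treat the wrapped command exactly like the bare one.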
118
+
119
+ Useful debug flags:
120
+ - `--dry-run`: show the reduced input and prompt without calling the provider
121
+ - `--show-raw`: print the captured raw input to `stderr`
122
+
123
+ ## When tests fail
124
+
125
+ Start with the map:
71
126
 
72
127
  ```bash
73
- export OPENAI_API_KEY=your_openai_api_key
128
+ sift exec --preset test-status -- <test command>
74
129
  ```
75
130
 
76
- If you use a different OpenAI-compatible endpoint, switch to `provider: openai-compatible` and use either the endpoint's native API key env var or the generic fallback:
131
+ If `standard` already names the main failure buckets, counts, and hints, stop there and read code.
132
+
133
+ Then use this order:
134
+ 1. `sift exec --preset test-status -- <test command>`
135
+ 2. `sift rerun`
136
+ 3. `sift rerun --remaining --detail focused`
137
+ 4. `sift rerun --remaining --detail verbose`
138
+ 5. `sift rerun --remaining --detail verbose --show-raw`
139
+ 6. raw pytest only if exact traceback lines are still needed
140
+
141
+ The normal stop budget is `standard` first, then at most one zoom step before raw.
142
+
143
+ If you want the older explicit compare shape, `sift exec --preset test-status --diff -- <test command>` still works. `sift rerun` is the shorter normal path for the same idea.
144
+
145
+ ## Diagnose JSON
146
+
147
+ Most of the time, you do not need JSON. Start with text first.
148
+
149
+ If `standard` already shows bucket-level root cause, `Anchor`, and `Fix`, do not re-verify the same bucket with raw pytest. Do at most one targeted source read before you edit.
150
+
151
+ Use diagnose JSON only when automation or machine branching really needs it:
77
152
 
78
153
  ```bash
79
- export SIFT_PROVIDER_API_KEY=your_provider_api_key
154
+ sift exec --preset test-status --goal diagnose --format json -- pytest -q
155
+ sift rerun --goal diagnose --format json
156
+ sift watch --preset test-status --goal diagnose --format json < pytest-watch.txt
80
157
  ```
81
158
 
82
- Common compatible env fallbacks:
83
- - `OPENROUTER_API_KEY`
84
- - `TOGETHER_API_KEY`
85
- - `GROQ_API_KEY`
159
+ Default diagnose JSON is summary-first:
160
+ - `remaining_summary` and `resolved_summary` keep the answer small
161
+ - `read_targets` points to the first file or line worth reading
162
+ - `read_targets.context_hint` can tell an agent to read only a small line window first
163
+ - if `context_hint` only includes `search_hint`, search for that string before reading the whole file
164
+ - `remaining_subset_available` tells you whether `sift rerun --remaining` can zoom safely
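
Using only the field names above, a diagnose payload might look roughly like this. The nesting, value types, and example values are assumptions for illustration, not the documented schema:

```json
{
  "remaining_summary": "2 failures remain, all in one import bucket",
  "resolved_summary": "3 failures resolved since the previous run",
  "read_targets": [
    {
      "path": "path/to/test_a.py",
      "context_hint": { "search_hint": "missing module" }
    }
  ],
  "remaining_subset_available": true
}
```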
86
165
 
87
- ## Quick start
166
+ If an agent truly needs every raw failing test ID, opt in:
88
167
 
89
168
  ```bash
90
- sift exec "what changed?" -- git diff
91
- sift exec --preset test-status -- pytest
92
- sift exec --preset typecheck-summary -- tsc --noEmit
93
- sift exec --preset lint-failures -- eslint .
94
- sift exec --preset audit-critical -- npm audit
95
- sift exec --preset infra-risk -- terraform plan
96
- sift exec --preset audit-critical --fail-on -- npm audit
97
- sift exec --preset infra-risk --fail-on -- terraform plan
169
+ sift exec --preset test-status --goal diagnose --format json --include-test-ids -- pytest -q
98
170
  ```
99
171
 
100
- ## Main workflow
172
+ `--goal diagnose --format json` is currently supported only for `test-status`, `rerun`, and `test-status` watch flows.
173
+
174
+ ## Watch mode
101
175
 
102
- `sift exec` is the default path:
176
+ Use watch mode when command output redraws or repeats and you care about cycle-to-cycle change summaries more than the raw stream:
103
177
 
104
178
  ```bash
105
- sift exec "did tests pass?" -- pytest
106
- sift exec --dry-run "what changed?" -- git diff
179
+ sift watch "what changed between cycles?" < watcher-output.txt
180
+ sift exec --watch "what changed between cycles?" -- node watcher.js
181
+ sift exec --watch --preset test-status -- pytest -f
107
182
  ```
108
183
 
109
- What it does:
110
- 1. runs the command
111
- 2. captures `stdout` and `stderr`
112
- 3. sanitizes, optionally redacts, and truncates the output
113
- 4. sends the reduced input to a smaller model
114
- 5. prints a short answer or JSON
115
- 6. preserves the wrapped command's exit code
184
+ `sift watch` keeps the current summary and change summary together:
185
+ - cycle 1 = current state
186
+ - later cycles = what changed, what resolved, what stayed, and the next best action
187
+ - for `test-status`, resolved tests drop out and remaining failures stay in focus
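
The resolved/remaining split between cycles can be sketched with plain shell set operations. This is only an illustration of the idea; `sift` does this internally and the file names here are hypothetical:

```shell
# Two snapshots of failing test names, one per line, sorted.
printf 'test_a\ntest_b\n' > cycle1.txt
printf 'test_b\n' > cycle2.txt

echo "resolved:"
comm -23 cycle1.txt cycle2.txt   # in cycle 1 but not cycle 2
echo "remaining:"
comm -12 cycle1.txt cycle2.txt   # still failing in both cycles
```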
116
188
 
117
- Use `--dry-run` to inspect the reduced input and prompt without calling the provider.
189
+ If the stream clearly looks like a redraw/watch session, `sift` can auto-switch to watch handling and print a short stderr note when it does.
118
190
 
119
- Use `--fail-on` when a built-in semantic preset should turn a technically successful command into a CI failure. Supported presets:
120
- - `infra-risk`
121
- - `audit-critical`
191
+ ## `test-status` detail modes
122
192
 
123
- Pipe mode still works when output already exists:
193
+ If you are running `npm test` and want `sift` to check the result, use `--preset test-status`.
194
+
195
+ `test-status` becomes test-aware because you chose the preset. It does **not** infer “this is a test command” from `pytest`, `vitest`, `npm test`, or any other runner name.
196
+
197
+ Available detail levels:
198
+
199
+ - `standard`
200
+ - short default summary
201
+ - no file list
202
+ - `focused`
203
+ - groups failures by error type
204
+ - shows a few representative failing tests or modules
205
+ - `verbose`
206
+ - flat list of visible failing tests or modules and their normalized reason
207
+ - useful when Codex needs to know exactly what to fix first
208
+
209
+ Examples:
124
210
 
125
211
  ```bash
126
- git diff 2>&1 | sift "what changed?"
212
+ sift exec --preset test-status -- npm test
213
+ sift rerun
214
+ sift rerun --remaining --detail focused
215
+ sift rerun --remaining --detail verbose
216
+ sift rerun --remaining --detail verbose --show-raw
217
+ ```
218
+
219
+ If you use a different runner, swap in your command:
220
+
221
+ ```bash
222
+ sift exec --preset test-status -- pytest
223
+ sift rerun
224
+ sift rerun --remaining --detail focused
225
+ sift rerun --remaining --detail verbose --show-raw
226
+ ```
227
+
228
+ `sift rerun --remaining` currently supports only cached argv-mode `pytest ...` or `python -m pytest ...` runs. If the cached command is not subset-capable, run a narrowed pytest command manually with `sift exec --preset test-status -- <narrowed pytest command>`.
229
+
230
+ Typical shapes:
231
+
232
+ `standard`
233
+ ```text
234
+ - Tests did not complete.
235
+ - 114 errors occurred during collection.
236
+ - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
237
+ - Anchor: path/to/failing_test.py
238
+ - Fix: Install the missing dependencies and rerun the affected tests.
239
+ - Decision: stop and act. Do not escalate unless you need exact traceback lines.
240
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
241
+ - Stop signal: diagnosis complete; raw not needed.
242
+ ```
243
+
244
+ `standard` can also separate more than one failure family in a single pass:
245
+ ```text
246
+ - Tests did not pass.
247
+ - 3 tests failed. 124 errors occurred.
248
+ - Shared blocker: DB-isolated tests are missing a required test env var.
249
+ - Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
250
+ - Fix: Set the required test env var and rerun the suite.
251
+ - Contract drift: snapshot expectations are out of sync with the current API or model state.
252
+ - Anchor: search <route-or-entity> in path/to/freeze_test.py
253
+ - Fix: Review the drift and regenerate the snapshots if the change is intentional.
254
+ - Decision: stop and act. Do not escalate unless you need exact traceback lines.
255
+ - Next: Fix bucket 1 first, then rerun the full suite at standard. Secondary buckets are already visible behind it.
256
+ - Stop signal: diagnosis complete; raw not needed.
257
+ ```
258
+
259
+ `focused`
260
+ ```text
261
+ - Tests did not complete.
262
+ - 114 errors occurred during collection.
263
+ - Import/dependency blocker: missing dependencies are blocking collection.
264
+ - Missing modules include <module-a>, <module-b>.
265
+ - path/to/test_a.py -> missing module: <module-a>
266
+ - path/to/test_b.py -> missing module: <module-b>
267
+ - Hint: Install the missing dependencies and rerun the affected tests.
268
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
269
+ - Stop signal: diagnosis complete; raw not needed.
127
270
  ```
128
271
 
272
+ `verbose`
273
+ ```text
274
+ - Tests did not complete.
275
+ - 114 errors occurred during collection.
276
+ - Import/dependency blocker: missing dependencies are blocking collection.
277
+ - path/to/test_a.py -> missing module: <module-a>
278
+ - path/to/test_b.py -> missing module: <module-b>
279
+ - path/to/test_c.py -> missing module: <module-c>
280
+ - Hint: Install the missing dependencies and rerun the affected tests.
281
+ - Next: Fix bucket 1 first, then rerun the full suite at standard.
282
+ - Stop signal: diagnosis complete; raw not needed.
283
+ ```
284
+
285
+ Recommended debugging order for tests:
286
+ 1. Use `standard` for the full suite first.
287
+ 2. Treat `standard` as the map. If it already shows bucket-level root cause, `Anchor`, and `Fix`, trust it and report or act from there directly.
288
+ 3. Use `sift escalate` only when you want a deeper render of the same cached output without rerunning the command.
289
+ 4. After fixing something, run `sift rerun` to refresh the full-suite truth at `standard`.
290
+ 5. Only then use `sift rerun --remaining --detail focused` as the zoom lens after the full-suite truth is refreshed.
291
+ 6. Then use `sift rerun --remaining --detail verbose`.
292
+ 7. Then use `sift rerun --remaining --detail verbose --show-raw`.
293
+ 8. Fall back to the raw pytest command only if you still need exact traceback lines for the remaining failing subset.
294
+
129
295
  ## Built-in presets
130
296
 
131
- - `test-status`: summarize test results
297
+ - `test-status`: summarize test runs
132
298
  - `typecheck-summary`: group blocking type errors by root cause
133
299
  - `lint-failures`: group repeated lint violations and highlight the files or rules that matter
134
300
  - `audit-critical`: extract only high and critical vulnerabilities
@@ -137,21 +303,74 @@ git diff 2>&1 | sift "what changed?"
137
303
  - `build-failure`: explain the most likely build failure
138
304
  - `log-errors`: extract the most relevant error signals
139
305
 
140
- Inspect them with:
306
+ List or inspect them:
141
307
 
142
308
  ```bash
143
309
  sift presets list
144
- sift presets show audit-critical
310
+ sift presets show test-status
311
+ ```
312
+
313
+ ## Agent setup
314
+
315
+ If you want Codex or Claude Code to use `sift` by default, let `sift` install a managed instruction block for you.
316
+
317
+ Repo scope is the default because it is safer:
318
+
319
+ ```bash
320
+ sift agent show codex
321
+ sift agent show codex --raw
322
+ sift agent install codex --dry-run
323
+ sift agent install codex --dry-run --raw
324
+ sift agent install codex
325
+ sift agent install claude
326
+ ```
327
+
328
+ You can also install machine-wide instructions explicitly:
329
+
330
+ ```bash
331
+ sift agent install codex --scope global
332
+ sift agent install claude --scope global
145
333
  ```
146
334
 
147
- ## Output modes
335
+ Useful commands:
336
+
337
+ ```bash
338
+ sift agent status
339
+ sift agent remove codex
340
+ sift agent remove claude
341
+ ```
342
+
343
+ `sift agent show ...` is a preview. It also tells you whether the managed block is already installed in the current scope.
344
+
345
+ What the installer does:
346
+ - writes to `AGENTS.md` or `CLAUDE.md` by default in the current repo
347
+ - uses marked managed blocks instead of rewriting the whole file
348
+ - preserves your surrounding notes and instructions
349
+ - can use global files when you explicitly choose `--scope global`
350
+ - keeps previews short by default
351
+ - shows the exact managed block or final dry-run content only with `--raw`
352
+
353
+ What the managed block tells the agent:
354
+ - start with `sift` for long non-interactive command output so the agent spends less of its context window and token budget on raw logs
355
+ - for tests, begin with the normal `test-status` summary
356
+ - if `standard` already identifies the main buckets, stop there instead of escalating automatically
357
+ - use `sift escalate` only for the same cached output when more detail is needed without rerunning the command
358
+ - after a fix, refresh the truth with `sift rerun`
359
+ - only then zoom into the remaining failing pytest subset with `sift rerun --remaining --detail focused`, then `verbose`, then `--show-raw`
360
+ - fall back to the raw test command only when exact traceback lines are still needed
148
361
 
149
- - `brief`
150
- - `bullets`
151
- - `json`
152
- - `verdict`
362
+ ## CI-friendly usage
153
363
 
154
- Built-in JSON and verdict flows return strict error objects on provider or model failure.
364
+ Some commands succeed technically but should still block CI. `--fail-on` handles that for the built-in semantic presets that have stable machine-readable output:
365
+
366
+ ```bash
367
+ sift exec --preset audit-critical --fail-on -- npm audit
368
+ sift exec --preset infra-risk --fail-on -- terraform plan
369
+ ```
370
+
371
+ Supported presets for `--fail-on`:
372
+ - `audit-critical`
373
+ - `infra-risk`
155
374
 
156
375
  ## Config
157
376
 
@@ -167,16 +386,32 @@ sift doctor
167
386
 
168
387
  `sift config show` masks secrets by default. Use `--show-secrets` only when you explicitly need raw values.
169
388
 
170
- Resolution order:
389
+ Config precedence:
171
390
  1. CLI flags
172
391
  2. environment variables
173
- 3. `sift.config.yaml` or `sift.config.yml`
174
- 4. `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
392
+ 3. repo-local `sift.config.yaml` or `sift.config.yml`
393
+ 4. machine-wide `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
175
394
  5. built-in defaults
176
395
 
396
+ ## Maintainer benchmark
397
+
398
+ To compare raw pytest output against the `test-status` reduction ladder on fixed fixtures, run:
399
+
400
+ ```bash
401
+ npm run bench:test-status-ab
402
+ npm run bench:test-status-live
403
+ ```
404
+
405
+ This uses the real `o200k_base` tokenizer and reports:
406
+ - command-output budget as the primary benchmark
407
+ - deterministic recipe-budget comparisons as supporting evidence only
408
+ - live-session scorecards for captured mixed full-suite agent transcripts
409
+
410
+ The benchmark is meant to show context-window and command-output reduction first. In normal debugging flows, `test-status` should usually stop at `standard`; `focused` and `verbose` are escalation tools, and raw pytest is the last resort when exact traceback evidence is still needed.
411
+
177
412
  If you pass `--config <path>`, that path is strict. Missing explicit config paths are errors.
178
413
 
179
- Minimal example:
414
+ Minimal config example:
180
415
 
181
416
  ```yaml
182
417
  provider:
@@ -195,16 +430,27 @@ runtime:
195
430
  rawFallback: true
196
431
  ```
197
432
 
198
- ## Agent usage
433
+ ## OpenAI vs OpenAI-compatible
199
434
 
200
- For Claude Code, add a short rule to `CLAUDE.md`.
435
+ Use `provider: openai` for `api.openai.com`.
201
436
 
202
- For Codex, add the same rule to `~/.codex/AGENTS.md`.
437
+ Use `provider: openai-compatible` for third-party compatible gateways or self-hosted endpoints.
203
438
 
204
- The important part is simple:
205
- - prefer `sift exec` for noisy shell commands
206
- - skip `sift` when exact raw output matters
207
- - keep credentials in your shell env or `sift.config.yaml`, never inline in prompts or agent instructions
439
+ For OpenAI:
440
+ ```bash
441
+ export OPENAI_API_KEY=your_openai_api_key
442
+ ```
443
+
444
+ For third-party compatible endpoints, use either the endpoint-native env var or:
445
+
446
+ ```bash
447
+ export SIFT_PROVIDER_API_KEY=your_provider_api_key
448
+ ```
449
+
450
+ Known compatible env fallbacks include:
451
+ - `OPENROUTER_API_KEY`
452
+ - `TOGETHER_API_KEY`
453
+ - `GROQ_API_KEY`
208
454
 
209
455
  ## Safety and limits
210
456
 
@@ -222,13 +468,7 @@ Release flow:
222
468
  2. merge to `main`
223
469
  3. run the `release` workflow manually
224
470
 
225
- The workflow:
226
- 1. installs dependencies
227
- 2. runs typecheck, tests, and build
228
- 3. packs and smoke-tests the tarball
229
- 4. publishes to npm
230
- 5. creates and pushes the `vX.Y.Z` tag
231
- 6. creates a GitHub Release
471
+ The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
232
472
 
233
473
  ## Brand assets
234
474