@bilalimamoglu/sift 0.3.2 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,247 +1,279 @@
  # sift

- <img src="assets/brand/sift-logo-minimal-monochrome.svg" alt="sift logo" width="120" />
+ [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
+ [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
+ [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

- Most command output is long and noisy, but the thing you actually need to know is short: what failed, where, and what to do next. `sift` runs the command for you, captures the output, and gives you a short answer instead of a wall of text.
+ <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />

- It works with test suites, build logs, `git diff`, `npm audit`, `terraform plan` — anything where the signal is buried in noise. It always tries the cheapest approach first and only escalates when needed. Exit codes are preserved.
+ Your AI agent should not be reading 13,000 lines of test output.

- Skip it when:
- - you need the exact raw log
- - the command is interactive or TUI-based
- - the output is already short
-
- ## Install
-
- Requires Node.js 24 or later.
+ On the largest real fixture in the benchmark:
+ **Before:** 128 failures, 198K raw-output tokens, agent reconstructs the failure shape from scratch.
+ **After:** 6 lines, 129 `standard` tokens, agent acts on a grouped diagnosis immediately.

  ```bash
- npm install -g @bilalimamoglu/sift
+ sift exec --preset test-status -- pytest -q
  ```

- ## Setup
+ ```text
+ - Tests did not pass.
+ - 3 tests failed. 125 errors occurred.
+ - Shared blocker: 125 errors share the same root cause - a missing test environment variable.
+ Anchor: tests/conftest.py
+ Fix: Set the required env var before rerunning DB-isolated tests.
+ - Contract drift: 3 snapshot tests are out of sync with the current API or model state.
+ Anchor: tests/contracts/test_feature_manifest_freeze.py
+ Fix: Regenerate the snapshots if the changes are intentional.
+ - Decision: stop and act.
+ ```
+
+ If 125 tests fail for one reason, the agent should pay for that reason once.
+
+ ## What it is

- The interactive setup writes a machine-wide config and walks you through provider selection:
+ Built for developers using coding agents such as Claude Code, Codex, Cursor, Windsurf, or Copilot, and for any LLM-driven workflow that runs shell commands and reads the output.
+
+ `sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
+
+ ## Install

  ```bash
- sift config setup
- sift doctor # verify it works
+ npm install -g @bilalimamoglu/sift
  ```

- Config is saved to `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
+ Requires Node.js 20+.

- If you prefer environment variables instead:
+ ## Try it in 60 seconds
+
+ If you already have an API key, you can try `sift` without any setup wizard:

  ```bash
- # OpenAI
- export SIFT_PROVIDER=openai
- export SIFT_BASE_URL=https://api.openai.com/v1
- export SIFT_MODEL=gpt-5-nano
  export OPENAI_API_KEY=your_openai_api_key
+ sift exec --preset test-status -- pytest -q
+ ```

- # or OpenRouter
- export SIFT_PROVIDER=openrouter
- export OPENROUTER_API_KEY=your_openrouter_api_key
+ You can also use a freeform prompt for non-test output:

- # or any OpenAI-compatible endpoint (Together, Groq, self-hosted, etc.)
- export SIFT_PROVIDER=openai-compatible
- export SIFT_BASE_URL=https://your-endpoint/v1
- export SIFT_PROVIDER_API_KEY=your_api_key
+ ```bash
+ sift exec "what changed?" -- git diff
  ```

- To switch between saved providers without editing files:
+ ## Set it up for daily use
+
+ Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:

  ```bash
- sift config use openai
- sift config use openrouter
+ sift config setup
+ sift doctor
  ```

- ## Usage
+ Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.

- Run a noisy command through `sift`, read the short answer, and only zoom in if it tells you to:
+ If you want your coding agent to use `sift` automatically, install the managed instruction block too:

  ```bash
- sift exec --preset test-status -- pytest -q
+ sift agent install codex
+ sift agent install claude
+ ```
+
+ Then run noisy commands through `sift`:
+
+ ```bash
+ sift exec --preset test-status -- <test command>
  sift exec "what changed?" -- git diff
  sift exec --preset audit-critical -- npm audit
  sift exec --preset infra-risk -- terraform plan
  ```

- `sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.
-
  Useful flags:
- - `--dry-run`: show the reduced input and prompt without calling the provider
- - `--show-raw`: print the captured raw output to `stderr`
-
- ## Test debugging workflow
-
- This is the most common use case and where `sift` adds the most value.
-
- Think of it like this:
- - `standard` = map
- - `focused` or `rerun --remaining` = zoom
- - raw traceback = last resort
+ - `--dry-run` to preview the reduced input and prompt without calling a provider
+ - `--show-raw` to print captured raw output to `stderr`
+ - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`

- For most repos, the whole story is:
+ If you prefer environment variables instead of setup:

  ```bash
- sift exec --preset test-status -- <test command> # get the map
- sift rerun # after a fix, refresh the truth
- sift rerun --remaining --detail focused # zoom into what's still failing
- ```
+ # OpenAI
+ export SIFT_PROVIDER=openai
+ export SIFT_BASE_URL=https://api.openai.com/v1
+ export SIFT_MODEL=gpt-5-nano
+ export OPENAI_API_KEY=your_openai_api_key

- `test-status` becomes test-aware because you chose the preset. It does **not** infer "this is a test command" from the runner name — use the same preset with `pytest`, `vitest`, `jest`, `bun test`, or any other runner.
+ # OpenRouter
+ export SIFT_PROVIDER=openrouter
+ export OPENROUTER_API_KEY=your_openrouter_api_key

- If `standard` already names the failure buckets, counts, and hints, stop there and read code. If it ends with `Decision: zoom`, do one deeper pass before falling back to raw traceback.
+ # Any OpenAI-compatible endpoint
+ export SIFT_PROVIDER=openai-compatible
+ export SIFT_BASE_URL=https://your-endpoint/v1
+ export SIFT_PROVIDER_API_KEY=your_api_key
+ ```

- ### What `sift` returns for each failure family
+ ## Why it helps

- - `Shared blocker` one setup problem affecting many tests
- - A named family such as import, timeout, network, migration, or assertion
- - `Anchor` — the first file, line window, or search term worth opening
- - `Fix` — the likely next move
- - `Decision` — whether to stop here or zoom one step deeper
- - `Next` — the smallest practical action
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.

- ### Detail levels
+ Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+ - a label
+ - an affected count
+ - an anchor
+ - a likely fix
+ - a decision signal

- - `standard` short summary, no file list (default)
- - `focused` — groups failures by error type, shows a few representative tests
- - `verbose` — flat list of all visible failing tests with their normalized reason
+ That changes the agent's job from "figure out what happened" to "act on the diagnosis."
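The bucketing idea can be sketched in a few lines. This is illustrative JavaScript, not sift's actual implementation; the normalization rule (masking quoted strings and numbers) is an assumption chosen to make repeats collapse:

```javascript
// Illustrative sketch: collapse repeated failures into one bucket per
// normalized root cause, so the agent pays for each cause once.
function bucketize(failures) {
  const buckets = new Map();
  for (const { test, reason } of failures) {
    // Mask volatile details (quoted values, numbers) so repeats share a key.
    const key = reason.replace(/'[^']*'/g, "'<x>'").replace(/\d+/g, "<n>");
    const bucket = buckets.get(key) ?? { label: key, count: 0, anchor: test };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  // Largest bucket first: the dominant root cause leads the diagnosis.
  return [...buckets.values()].sort((a, b) => b.count - a.count);
}

const report = bucketize([
  { test: "tests/test_a.py", reason: "KeyError: 'DB_URL'" },
  { test: "tests/test_b.py", reason: "KeyError: 'DB_URL'" },
  { test: "tests/contracts/test_freeze.py", reason: "snapshot mismatch at line 12" },
]);
// report[0] → { label: "KeyError: '<x>'", count: 2, anchor: "tests/test_a.py" }
```

Two tests with the same underlying `KeyError` become one bucket with count 2, anchored at the first affected file.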

- ### Example output
+ ## How it works

- Single failure family:
- ```text
- - Tests did not complete.
- - 114 errors occurred during collection.
- - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
- - Anchor: path/to/failing_test.py
- - Fix: Install the missing dependencies and rerun the affected tests.
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- ```
+ `sift` follows a cheapest-first pipeline:

- Multiple failure families in one pass:
- ```text
- - Tests did not pass.
- - 3 tests failed. 124 errors occurred.
- - Shared blocker: DB-isolated tests are missing a required test env var.
- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
- Fix: Set the required test env var and rerun the suite.
- - Contract drift: snapshot expectations are out of sync with the current API or model state.
- Anchor: search <route-or-entity> in path/to/freeze_test.py
- Fix: Review the drift and regenerate the snapshots if the change is intentional.
- - Decision: stop and act.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- ```
+ 1. Capture command output.
+ 2. Sanitize sensitive-looking material.
+ 3. Apply local heuristics for known failure shapes.
+ 4. Escalate to a cheaper provider only if needed.
+ 5. Return a short diagnosis to the main agent.

- ### Recommended debugging order
+ It also returns a decision signal:
+ - `stop and act` when the diagnosis is already actionable
+ - `zoom` when one deeper pass is justified
+ - raw logs only as a last resort
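The pipeline and the decision signal can be sketched together. Hypothetical code only: the sanitizer pattern, the heuristic regex, and the provider stub are assumptions for illustration, not sift internals:

```javascript
// Sketch of a cheapest-first reduction: sanitize, try a local heuristic,
// and only fall through to a provider call when the shape is unrecognized.
function reduce(rawOutput) {
  // Steps 1-2: the captured output, with sensitive-looking values masked.
  const sanitized = rawOutput.replace(/(api[_-]?key\s*=\s*)\S+/gi, "$1<redacted>");

  // Step 3: a local heuristic for one known failure shape
  // (a pytest-style "N failed ... M errors" summary line).
  const m = sanitized.match(/(\d+) failed.*?(\d+) errors?/);
  if (m) {
    // Steps 4-5 skipped: the provider is never called.
    return { source: "heuristic", summary: `${m[1]} failed, ${m[2]} errors`, decision: "stop and act" };
  }

  // Step 4: unrecognized shape, escalate to a (stubbed) provider.
  return { source: "provider", summary: callProviderStub(sanitized), decision: "zoom" };
}

function callProviderStub(text) {
  return `summarized ${text.length} chars`; // stand-in for an LLM call
}

const out = reduce("api_key=sk-123\n3 failed, 125 errors in 42.1s");
// out.source === "heuristic"; the secret never reaches the provider path.
```

Recognized output short-circuits at step 3; anything else pays for one provider call and gets a `zoom` signal.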

- 1. `sift exec --preset test-status -- <test command>` get the map.
- 2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
- 3. `sift escalate` — deeper render of the same cached output, without rerunning.
- 4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
- 5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
- 6. `sift rerun --remaining --detail verbose`
- 7. `sift rerun --remaining --detail verbose --show-raw`
- 8. Raw test command only if exact traceback lines are still needed.
+ For recognized formats, local heuristics can fully handle the output and skip the provider entirely.

- `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.

- ### Quick glossary
+ ## Built-in presets

- - `sift escalate` = same cached output, deeper render
- - `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
- - `sift rerun --remaining` = rerun only the remaining failing test nodes
- - `Decision: stop and act` = trust the diagnosis and go fix code
- - `Decision: zoom` = one deeper sift pass is justified before raw
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.

- ## Watch mode
+ | Preset | Heuristic | What it does |
+ |--------|-----------|-------------|
+ | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
+ | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
+ | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
+ | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
+ | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
+ | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
+ | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
+ | `log-errors` | Provider | Extracts top error signals from log output. |

- Use watch mode when output redraws or repeats across cycles:
+ Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.

  ```bash
- sift watch "what changed between cycles?" < watcher-output.txt
- sift exec --watch "what changed between cycles?" -- node watcher.js
- sift exec --watch --preset test-status -- pytest -f
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
+ sift exec --preset lint-failures -- npx eslint src/
+ sift exec --preset build-failure -- npm run build
+ sift exec --preset audit-critical -- npm audit
+ sift exec --preset infra-risk -- terraform plan
  ```

- - cycle 1 = current state
- - later cycles = what changed, what resolved, what stayed, and the next best action
- - for `test-status`, resolved tests drop out and remaining failures stay in focus
+ On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:

- ## Diagnose JSON
+ ```text
+ [sift: heuristic • LLM skipped • summary 47ms]
+ [sift: provider • LLM used • 380 tokens • summary 1.2s]
+ ```
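Scripts that want to react to the footer (for example, checking in CI whether the provider was skipped) can parse the bracketed, bullet-separated shape shown above. This helper is hypothetical, not part of sift:

```javascript
// Hypothetical parser for the stderr footer format shown above.
// Assumes the bracketed "[sift: <source> • field • field]" shape.
function parseFooter(line) {
  const m = line.match(/^\[sift: (\w+) • (.+)\]$/);
  if (!m) return null;
  const fields = m[2].split(" • ");
  return { source: m[1], llmUsed: fields.includes("LLM used"), fields };
}

const skipped = parseFooter("[sift: heuristic • LLM skipped • summary 47ms]");
const used = parseFooter("[sift: provider • LLM used • 380 tokens • summary 1.2s]");
// skipped.llmUsed === false; used.llmUsed === true
```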

- Start with text. Use JSON only when automation needs machine-readable output:
+ Suppress the footer with `--quiet`:

  ```bash
- sift exec --preset test-status --goal diagnose --format json -- pytest -q
- sift rerun --goal diagnose --format json
+ sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
  ```

- The JSON is summary-first: `remaining_summary`, `resolved_summary`, `read_targets` with optional `context_hint`, and `remaining_subset_available` to tell you whether `sift rerun --remaining` can zoom safely.
+ ## Strongest today

- Add `--include-test-ids` only when you need every raw failing test ID.
+ `sift` is strongest when output is:
+ - long
+ - repetitive
+ - triage-heavy
+ - shaped by a small number of shared root causes

- ## Built-in presets
+ Best fits today:
+ - large `pytest`, `vitest`, or `jest` runs
+ - `tsc` type errors and `eslint` lint failures
+ - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - `npm audit` and `terraform plan`
+ - repeated CI blockers
+ - noisy diffs and log streams

- - `test-status`: summarize test runs
- - `typecheck-summary`: group blocking type errors by root cause
- - `lint-failures`: group repeated lint violations and highlight the files or rules that matter
- - `audit-critical`: extract only high and critical vulnerabilities
- - `infra-risk`: return a safety verdict for infra changes
- - `diff-summary`: summarize code changes and risks
- - `build-failure`: explain the most likely build failure
- - `log-errors`: extract the most relevant error signals
+ ## Test debugging workflow

- ```bash
- sift presets list
- sift presets show test-status
- ```
+ This is where `sift` is strongest today.

- ## Agent setup
+ Think of it like this:
+ - `standard` = map
+ - `focused` = zoom
+ - raw traceback = last resort

- `sift` can install a managed instruction block so Codex or Claude Code uses `sift` by default for long command output:
+ Typical loop:

  ```bash
- sift agent install codex
- sift agent install claude
+ sift exec --preset test-status -- <test command>
+ sift rerun
+ sift rerun --remaining --detail focused
  ```

- This writes a managed block to `AGENTS.md` or `CLAUDE.md` in the current repo. Use `--dry-run` to preview, or `--scope global` for machine-wide instructions.
+ If `standard` already gives you the root cause, anchor, and fix, stop there and act.
+
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs.
+
+ For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
+
+ For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.

  ```bash
  sift agent status
- sift agent remove codex
+ sift agent show claude
  sift agent remove claude
  ```

- ## CI usage
+ ## Where it helps less

- Some commands succeed technically but should still block CI. `--fail-on` handles that:
+ `sift` adds less value when:
+ - the output is already short and obvious
+ - the command is interactive or TUI-based
+ - the exact raw log matters
+ - the output does not expose enough evidence for reliable grouping

- ```bash
- sift exec --preset audit-critical --fail-on -- npm audit
- sift exec --preset infra-risk --fail-on -- terraform plan
- ```
+ When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
+
+ ## Benchmark
+
+ On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
+
+ | Metric | Raw agent | sift-first | Reduction |
+ |--------|-----------|------------|-----------|
+ | Tokens | 305K | 600 | 99.8% |
+ | Tool calls | 16 | 7 | 56% |
+ | Diagnosis | Same | Same | — |
+
+ The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
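The quoted reductions check out arithmetically; a quick sketch using only the figures stated above:

```javascript
// Verify the reduction percentages from the benchmark figures above.
// pct(before, after) → percentage reduction, rounded to one decimal place.
const pct = (before, after) => Math.round((1 - after / before) * 1000) / 10;

const tokenReduction = pct(305_000, 600);      // 99.8 (table row: 99.8%)
const toolCallReduction = pct(16, 7);          // 56.3 (table row: 56%)
const largestLogReduction = pct(198_026, 129); // 99.9 (198026 → 129 tokens)
```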

- ## Config
+ The end-to-end workflow benchmark is a different metric:
+ - `62%` fewer total debugging tokens
+ - `71%` fewer tool calls
+ - `65%` faster wall-clock time
+
+ Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
+
+ Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
+
+ ## Configuration
+
+ Inspect and validate config with:

  ```bash
- sift config show # masks secrets by default
+ sift config show
  sift config show --show-secrets
  sift config validate
  ```

- Config precedence:
- 1. CLI flags
- 2. environment variables
- 3. repo-local `sift.config.yaml`
- 4. machine-wide `~/.config/sift/config.yaml`
- 5. built-in defaults
+ To switch between saved providers without editing files:

- If you pass `--config <path>`, that path is strict — missing paths are errors.
+ ```bash
+ sift config use openai
+ sift config use openrouter
+ ```

  Minimal YAML config:

@@ -262,37 +294,12 @@ runtime:
  rawFallback: true
  ```

- ## Safety and limits
-
- - redaction is optional and regex-based
- - retriable provider failures (`429`, timeouts, `5xx`) are retried once
- - `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
- - pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
-
- ## Releasing
-
- This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
-
- 1. bump `package.json`
- 2. merge to `main`
- 3. run the `release` workflow manually
-
- The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
-
- Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub generated notes.
-
- ## Maintainer benchmark
-
- ```bash
- npm run bench:test-status-ab
- npm run bench:test-status-live
- ```
-
- Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
-
- ## Brand assets
+ ## Docs

- Logo assets live in `assets/brand/`: badge/app, icon-only, and 24px icon variants in teal, black, and monochrome.
+ - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+ - Worked examples: [docs/examples](docs/examples)
+ - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+ - Release notes: [release-notes](release-notes)

  ## License