@bilalimamoglu/sift 0.4.1 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +63 -217
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -4,23 +4,19 @@
  [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
  [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

- <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
+ Turn 13,000 lines of test output into 2 root causes.

- Your AI agent should not be reading 13,000 lines of test output.
+ Your agent reads a diagnosis, not a log file.

- If 125 tests fail for one reason, it should pay for that reason once.
+ <p align="center">
+ <img src="assets/readme/test-status-demo.gif" alt="sift turning a pytest failure wall into a short diagnosis" width="960" />
+ </p>

- `sift` turns noisy command output into a short, structured diagnosis for coding agents, so they spend fewer tokens, cost less to run, and move through debug loops faster.
+ ## Before / After

- Instead of feeding an agent thousands of lines of logs, you give it:
- - the root cause
- - where it happens
- - what to fix
- - what to do next
+ 128 test failures. 13,000 lines of logs. The agent reads all of it.

- ```bash
- sift exec --preset test-status -- pytest -q
- ```
+ With `sift`, it reads this instead:

  ```text
  - Tests did not pass.
@@ -34,14 +30,18 @@ sift exec --preset test-status -- pytest -q
  - Decision: stop and act.
  ```

- On the largest real fixture in the benchmark:
- `198K` raw-output tokens -> `129` `standard` tokens.
+ Same diagnosis. One run compressed from 198,000 tokens to 129.

- Same diagnosis. Far less work.
+ ## Not just tests

- ## What it is
+ The same idea applies across noisy dev workflows:

- `sift` sits between a noisy command and a coding agent. It captures output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal.
+ - **Type errors** → grouped by error code, no model call
+ - **Lint output** → grouped by rule, no model call
+ - **Build failures** → first real error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - **`npm audit`** → high/critical vulnerabilities only, no model call
+ - **`terraform plan`** → destructive risk detection, no model call
+ - **Diffs and logs** → compressed through a cheaper model before reaching your agent

  ## Install

@@ -51,255 +51,101 @@ npm install -g @bilalimamoglu/sift

  Requires Node.js 20+.

- ## Try it in 60 seconds
-
- If you already have an API key, you can try `sift` without any setup wizard:
+ ## Try it

  ```bash
- export OPENAI_API_KEY=your_openai_api_key
  sift exec --preset test-status -- pytest -q
+ sift exec --preset test-status -- npx vitest run
+ sift exec --preset test-status -- npx jest
  ```

- You can also use a freeform prompt for non-test output:
-
- ```bash
- sift exec "what changed?" -- git diff
- ```
-
- ## Set it up for daily use
-
- Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
-
- ```bash
- sift config setup
- sift doctor
- ```
-
- Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
-
- If you want your coding agent to use `sift` automatically, install the managed instruction block too:
-
- ```bash
- sift agent install codex
- sift agent install claude
- ```
-
- Then run noisy commands through `sift`:
+ Other workflows:

  ```bash
- sift exec --preset test-status -- <test command>
- sift exec "what changed?" -- git diff
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
+ sift exec --preset lint-failures -- npx eslint src/
+ sift exec --preset build-failure -- npm run build
  sift exec --preset audit-critical -- npm audit
  sift exec --preset infra-risk -- terraform plan
+ sift exec "what changed?" -- git diff
  ```

- Useful flags:
- - `--dry-run` to preview the reduced input and prompt without calling a provider
- - `--show-raw` to print captured raw output to `stderr`
- - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
-
- If you prefer environment variables instead of setup:
-
- ```bash
- # OpenAI
- export SIFT_PROVIDER=openai
- export SIFT_BASE_URL=https://api.openai.com/v1
- export SIFT_MODEL=gpt-5-nano
- export OPENAI_API_KEY=your_openai_api_key
-
- # OpenRouter
- export SIFT_PROVIDER=openrouter
- export OPENROUTER_API_KEY=your_openrouter_api_key
-
- # Any OpenAI-compatible endpoint
- export SIFT_PROVIDER=openai-compatible
- export SIFT_BASE_URL=https://your-endpoint/v1
- export SIFT_PROVIDER_API_KEY=your_api_key
- ```
-
- ## Why it helps
-
- The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
-
- Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
- - a label
- - an affected count
- - an anchor
- - a likely fix
- - a decision signal
-
- That changes the agent's job from "figure out what happened" to "act on the diagnosis."
-
  ## How it works

- `sift` follows a cheapest-first pipeline:
-
- 1. Capture command output.
- 2. Sanitize sensitive-looking material.
- 3. Apply local heuristics for known failure shapes.
- 4. Escalate to a cheaper provider only if needed.
- 5. Return a short diagnosis to the main agent.
+ `sift` sits between a noisy command and a coding agent.

- It also returns a decision signal:
- - `stop and act` when the diagnosis is already actionable
- - `zoom` when one deeper pass is justified
- - raw logs only as a last resort
+ 1. Capture output.
+ 2. Run local heuristics for known failure shapes.
+ 3. If heuristics are confident, return the diagnosis. No model call.
+ 4. If not, call a cheaper model — not your agent's.

- For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+ The agent gets the root cause, where it happens, and what to do next.

- The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
+ So your agent spends tokens fixing, not reading.
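The cheapest-first dispatch described in the numbered steps above can be sketched roughly as follows. This is a hypothetical illustration, not sift's actual code; `pytest_summary_heuristic` and `reduce_output` are invented names, and the recognizer is a toy.

```python
# Hypothetical sketch of a cheapest-first pipeline: deterministic
# recognizers run first, and a model is consulted only when none of
# them claims the output.
from typing import Callable, Optional

# A heuristic returns a diagnosis string, or None if it does not
# recognize the output format.
Heuristic = Callable[[str], Optional[str]]

def pytest_summary_heuristic(output: str) -> Optional[str]:
    # Toy recognizer for a pytest summary line like "3 failed, 510 passed".
    for line in output.splitlines():
        if " failed" in line and " passed" in line:
            count = line.split(" failed")[0].split()[-1]
            return f"- Tests did not pass. {count} failure(s). Decision: stop and act."
    return None

def reduce_output(output: str, heuristics: list[Heuristic],
                  model_fallback: Callable[[str], str]) -> str:
    for heuristic in heuristics:
        diagnosis = heuristic(output)
        if diagnosis is not None:
            return diagnosis          # confident local parse: no model call
    return model_fallback(output)     # escalate to a cheaper model

log = "=== 3 failed, 510 passed in 42.1s ==="
print(reduce_output(log, [pytest_summary_heuristic], lambda o: "(cheap-model summary)"))
```

The design point is the ordering: the deterministic path is free and fast, so the model only pays for output the heuristics cannot classify.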
 
  ## Built-in presets

- Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
-
- | Preset | Heuristic | What it does |
- |--------|-----------|-------------|
- | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
- | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
- | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
- | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
- | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
- | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
- | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
- | `log-errors` | Provider | Extracts top error signals from log output. |
-
- Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
-
- ```bash
- sift exec --preset typecheck-summary -- npx tsc --noEmit
- sift exec --preset lint-failures -- npx eslint src/
- sift exec --preset build-failure -- npm run build
- sift exec --preset audit-critical -- npm audit
- sift exec --preset infra-risk -- terraform plan
- ```
-
- On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
-
- ```text
- [sift: heuristic • LLM skipped • summary 47ms]
- [sift: provider • LLM used • 380 tokens • summary 1.2s]
- ```
+ Every preset runs local heuristics first. When the heuristic handles the output, the provider is never called.

- Suppress the footer with `--quiet`:
+ | Preset | What it does |
+ |--------|-------------|
+ | `test-status` | Groups pytest, vitest, jest failures into root-cause buckets with anchors and fix suggestions. 30+ failure patterns. |
+ | `typecheck-summary` | Parses `tsc` output, groups by error code, returns max 5 bullets. No model call. |
+ | `lint-failures` | Parses ESLint output, groups by rule, detects fixable hints. No model call. |
+ | `build-failure` | Extracts first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, `tsc --build`. Falls back to model for unsupported formats. |
+ | `audit-critical` | Extracts high/critical vulnerabilities from `npm audit`. No model call. |
+ | `infra-risk` | Detects destructive signals in `terraform plan`. No model call. |
+ | `diff-summary` | Summarizes changes and risks in diff output. |
+ | `log-errors` | Extracts top error signals from log output. |
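The root-cause bucketing that `test-status` performs can be illustrated with a toy sketch. This is hypothetical, not sift's implementation; `signature` and `bucket_failures` are invented for illustration, and real failure grouping is far more involved than a regex normalization.

```python
# Hypothetical illustration of root-cause bucketing: many failing tests
# that share one underlying error collapse into a single bucket keyed
# by a normalized error signature.
import re
from collections import defaultdict

def signature(error_line: str) -> str:
    # Normalize volatile details (numbers, quoted values) so repeats
    # of the same root cause map to one key.
    sig = re.sub(r"\d+", "N", error_line)
    return re.sub(r"'[^']*'", "'X'", sig)

def bucket_failures(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    # failures: (test name, final error line) pairs
    buckets: dict[str, list[str]] = defaultdict(list)
    for test, error in failures:
        buckets[signature(error)].append(test)
    return buckets

failures = [
    ("test_login", "fixture 'db' not found"),
    ("test_signup", "fixture 'db' not found"),
    ("test_orders", "AssertionError: expected 200, got 500"),
]
buckets = bucket_failures(failures)
for sig, tests in buckets.items():
    print(f"{len(tests)} test(s): {sig}")
```

Here three failing tests reduce to two buckets: one shared fixture problem and one assertion failure, which is the shape an agent can act on directly.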
 
- ```bash
- sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
- ```
+ ## Benchmark

- ## Strongest today
+ End-to-end debug loop on a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):

- `sift` is strongest when output is:
- - long
- - repetitive
- - triage-heavy
- - shaped by a small number of shared root causes
+ | Metric | Without sift | With sift | Reduction |
+ |--------|-------------|-----------|-----------|
+ | Tokens | 52,944 | 20,049 | 62% fewer |
+ | Tool calls | 40.8 | 12 | 71% fewer |
+ | Wall-clock time | 244s | 85s | 65% faster |
+ | Commands | 15.5 | 6 | 61% fewer |
+ | Diagnosis | Same | Same | — |

- Best fits today:
- - large `pytest`, `vitest`, or `jest` runs
- - `tsc` type errors and `eslint` lint failures
- - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
- - `npm audit` and `terraform plan`
- - repeated CI blockers
- - noisy diffs and log streams
+ Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)

  ## Test debugging workflow

- This is where `sift` is strongest today.
-
  Think of it like this:
  - `standard` = map
  - `focused` = zoom
  - raw traceback = last resort

- Typical loop:
-
  ```bash
  sift exec --preset test-status -- <test command>
  sift rerun
  sift rerun --remaining --detail focused
  ```

- If `standard` already gives you the root cause, anchor, and fix, stop there and act.
-
- `sift rerun --remaining` narrows automatically for cached `pytest` runs.
-
- For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
-
- For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
-
- ```bash
- sift agent status
- sift agent show claude
- sift agent remove claude
- ```
-
- ## Where it helps less
-
- `sift` adds less value when:
- - the output is already short and obvious
- - the command is interactive or TUI-based
- - the exact raw log matters
- - the output does not expose enough evidence for reliable grouping
-
- When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
-
- ## Benchmark
-
- On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
-
- | Metric | Raw agent | sift-first | Reduction |
- |--------|-----------|------------|-----------|
- | Tokens | 305K | 600 | 99.8% |
- | Tool calls | 16 | 7 | 56% |
- | Diagnosis | Same | Same | — |
-
- The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+ If `standard` already gives you the root cause, anchor, and fix, stop and act.

- The end-to-end workflow benchmark is a different metric:
- - `62%` fewer total debugging tokens
- - `71%` fewer tool calls
- - `65%` faster wall-clock time
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs. For `vitest` and `jest`, it reruns the full command and keeps diagnosis focused on what still fails.
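The "remaining" idea, diffing the current run's failures against a cached baseline, can be sketched as a set comparison. This is a hypothetical illustration of the concept, not sift's caching logic; `remaining_failures` is an invented name.

```python
# Hypothetical sketch: compare the current run's failing tests against
# a cached baseline so a rerun diagnosis only covers what still fails.
def remaining_failures(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    return {
        "still_failing": baseline & current,   # unresolved root causes
        "newly_failing": current - baseline,   # regressions since the cached run
        "fixed": baseline - current,           # progress since the cached run
    }

baseline = {"test_login", "test_signup", "test_orders"}
current = {"test_orders", "test_payments"}
print(remaining_failures(baseline, current))
```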
 
- Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
+ ## Setup

- Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
-
- ## Configuration
-
- Inspect and validate config with:
+ Guided setup writes a config, verifies the provider, and makes daily use easier:

  ```bash
- sift config show
- sift config show --show-secrets
- sift config validate
+ sift config setup
+ sift doctor
  ```

- To switch between saved providers without editing files:
+ To wire `sift` into your coding agent automatically:

  ```bash
- sift config use openai
- sift config use openrouter
+ sift agent install claude
+ sift agent install codex
  ```

- Minimal YAML config:
-
- ```yaml
- provider:
-   provider: openai
-   model: gpt-5-nano
-   baseUrl: https://api.openai.com/v1
-   apiKey: YOUR_API_KEY
-
- input:
-   stripAnsi: true
-   redact: false
-   maxCaptureChars: 400000
-   maxInputChars: 60000
-
- runtime:
-   rawFallback: true
- ```
+ Config details: [docs/cli-reference.md](docs/cli-reference.md)

  ## Docs

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@bilalimamoglu/sift",
- "version": "0.4.1",
+ "version": "0.4.2",
  "description": "Agent-first command-output reduction layer for agents, CI, and automation.",
  "type": "module",
  "bin": {