@bilalimamoglu/sift 0.4.1 → 0.4.2
- package/README.md +63 -217
- package/package.json +1 -1
package/README.md
CHANGED
@@ -4,23 +4,19 @@
 [](LICENSE)
 [](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
 
-
+Turn 13,000 lines of test output into 2 root causes.
 
-Your
+Your agent reads a diagnosis, not a log file.
 
-
+<p align="center">
+<img src="assets/readme/test-status-demo.gif" alt="sift turning a pytest failure wall into a short diagnosis" width="960" />
+</p>
 
-
+## Before / After
 
-
-- the root cause
-- where it happens
-- what to fix
-- what to do next
+128 test failures. 13,000 lines of logs. The agent reads all of it.
 
-
-sift exec --preset test-status -- pytest -q
-```
+With `sift`, it reads this instead:
 
 ```text
 - Tests did not pass.
@@ -34,14 +30,18 @@ sift exec --preset test-status -- pytest -q
 - Decision: stop and act.
 ```
 
-
-`198K` raw-output tokens -> `129` `standard` tokens.
+Same diagnosis. One run compressed from 198,000 tokens to 129.
 
-
+## Not just tests
 
-
+The same idea applies across noisy dev workflows:
 
-
+- **Type errors** → grouped by error code, no model call
+- **Lint output** → grouped by rule, no model call
+- **Build failures** → first real error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+- **`npm audit`** → high/critical vulnerabilities only, no model call
+- **`terraform plan`** → destructive risk detection, no model call
+- **Diffs and logs** → compressed through a cheaper model before reaching your agent
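The "grouped by error code, no model call" bullet above can be illustrated with a short sketch. This is not sift's implementation: it assumes the non-pretty `tsc` line shape `file(line,col): error TSxxxx: message`, and the function name, the output wording, and the default cap of five bullets are hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumed line shape (non-pretty tsc output):
#   src/app.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'.
TSC_ERROR = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): error (?P<code>TS\d+): (?P<msg>.*)$"
)


def summarize_type_errors(output: str, max_bullets: int = 5) -> list[str]:
    """Group tsc errors by code; return at most max_bullets summary lines.

    Hypothetical helper for illustration, not part of sift.
    """
    counts = Counter()
    first_seen = {}
    for raw in output.splitlines():
        match = TSC_ERROR.match(raw.strip())
        if match is None:
            continue  # banners, blank lines, "Found N errors." and so on
        code = match["code"]
        counts[code] += 1
        # Keep the first occurrence as a representative example for the group.
        first_seen.setdefault(code, f'{match["file"]}:{match["line"]} {match["msg"]}')
    return [
        f"{code} x{n}, e.g. {first_seen[code]}"
        for code, n in counts.most_common(max_bullets)
    ]
```

Because the grouping is a regular expression and a counter, the result is deterministic and needs no model call, which is the property the bullet describes.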
 
 ## Install
 
@@ -51,255 +51,101 @@ npm install -g @bilalimamoglu/sift
 
 Requires Node.js 20+.
 
-## Try it
-
-If you already have an API key, you can try `sift` without any setup wizard:
+## Try it
 
 ```bash
-export OPENAI_API_KEY=your_openai_api_key
 sift exec --preset test-status -- pytest -q
+sift exec --preset test-status -- npx vitest run
+sift exec --preset test-status -- npx jest
 ```
 
-
-
-```bash
-sift exec "what changed?" -- git diff
-```
-
-## Set it up for daily use
-
-Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
-
-```bash
-sift config setup
-sift doctor
-```
-
-Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
-
-If you want your coding agent to use `sift` automatically, install the managed instruction block too:
-
-```bash
-sift agent install codex
-sift agent install claude
-```
-
-Then run noisy commands through `sift`:
+Other workflows:
 
 ```bash
-sift exec --preset
-sift exec
+sift exec --preset typecheck-summary -- npx tsc --noEmit
+sift exec --preset lint-failures -- npx eslint src/
+sift exec --preset build-failure -- npm run build
 sift exec --preset audit-critical -- npm audit
 sift exec --preset infra-risk -- terraform plan
+sift exec "what changed?" -- git diff
 ```
 
-Useful flags:
-- `--dry-run` to preview the reduced input and prompt without calling a provider
-- `--show-raw` to print captured raw output to `stderr`
-- `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
-
-If you prefer environment variables instead of setup:
-
-```bash
-# OpenAI
-export SIFT_PROVIDER=openai
-export SIFT_BASE_URL=https://api.openai.com/v1
-export SIFT_MODEL=gpt-5-nano
-export OPENAI_API_KEY=your_openai_api_key
-
-# OpenRouter
-export SIFT_PROVIDER=openrouter
-export OPENROUTER_API_KEY=your_openrouter_api_key
-
-# Any OpenAI-compatible endpoint
-export SIFT_PROVIDER=openai-compatible
-export SIFT_BASE_URL=https://your-endpoint/v1
-export SIFT_PROVIDER_API_KEY=your_api_key
-```
-
-## Why it helps
-
-The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
-
-Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
-- a label
-- an affected count
-- an anchor
-- a likely fix
-- a decision signal
-
-That changes the agent's job from "figure out what happened" to "act on the diagnosis."
-
 ## How it works
 
-`sift`
-
-1. Capture command output.
-2. Sanitize sensitive-looking material.
-3. Apply local heuristics for known failure shapes.
-4. Escalate to a cheaper provider only if needed.
-5. Return a short diagnosis to the main agent.
+`sift` sits between a noisy command and a coding agent.
 
-
-
-
-
+1. Capture output.
+2. Run local heuristics for known failure shapes.
+3. If heuristics are confident, return the diagnosis. No model call.
+4. If not, call a cheaper model — not your agent's.
 
-
+The agent gets the root cause, where it happens, and what to do next.
 
-
+So your agent spends tokens fixing, not reading.
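The four steps added above can be sketched as a small function. This illustrates the flow only and is not sift's code: `Heuristic`, `triage`, `run_and_triage`, `failed_marker`, and `stub_model` are hypothetical names, the confidence check is reduced to "the heuristic returned something", and the cheaper model is replaced by a stub so the example runs offline.

```python
import subprocess
import sys
from typing import Callable, Optional

# Hypothetical type: a heuristic returns a diagnosis when it recognizes
# the output with confidence, otherwise None.
Heuristic = Callable[[str], Optional[str]]


def triage(output: str, heuristics: list[Heuristic],
           cheap_model: Callable[[str], str]) -> tuple[str, str]:
    """Return (diagnosis, source), where source is 'heuristic' or 'model'."""
    # Steps 2-3: try local heuristics; the first confident one wins, no model call.
    for heuristic in heuristics:
        diagnosis = heuristic(output)
        if diagnosis is not None:
            return diagnosis, "heuristic"
    # Step 4: nothing matched, so fall back to the cheaper model.
    return cheap_model(output), "model"


def run_and_triage(command: list[str], heuristics: list[Heuristic],
                   cheap_model: Callable[[str], str]) -> tuple[str, str]:
    # Step 1: capture stdout and stderr of the noisy command.
    proc = subprocess.run(command, capture_output=True, text=True)
    return triage(proc.stdout + proc.stderr, heuristics, cheap_model)


def failed_marker(output: str) -> Optional[str]:
    # Toy heuristic: count lines that start with "FAILED".
    n = sum(1 for line in output.splitlines() if line.startswith("FAILED"))
    return f"{n} failing test(s)" if n else None


def stub_model(output: str) -> str:
    # Stand-in for a call to a cheaper model; no network access here.
    return "summary from cheaper model"


# Usage: a fake "noisy command" that prints one failure line.
noisy = [sys.executable, "-c", "print('FAILED tests/test_a.py::test_x')"]
print(run_and_triage(noisy, [failed_marker], stub_model))
# prints ('1 failing test(s)', 'heuristic')
```

Returning the source alongside the diagnosis makes it easy to see whether the model was skipped for a given run.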
 
 ## Built-in presets
 
-Every preset runs local heuristics first. When the heuristic
-
-| Preset | Heuristic | What it does |
-|--------|-----------|-------------|
-| `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
-| `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
-| `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
-| `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
-| `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
-| `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
-| `diff-summary` | Provider | Summarizes changes and risks in diff output. |
-| `log-errors` | Provider | Extracts top error signals from log output. |
-
-Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
-
-```bash
-sift exec --preset typecheck-summary -- npx tsc --noEmit
-sift exec --preset lint-failures -- npx eslint src/
-sift exec --preset build-failure -- npm run build
-sift exec --preset audit-critical -- npm audit
-sift exec --preset infra-risk -- terraform plan
-```
-
-On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
-
-```text
-[sift: heuristic • LLM skipped • summary 47ms]
-[sift: provider • LLM used • 380 tokens • summary 1.2s]
-```
+Every preset runs local heuristics first. When the heuristic handles the output, the provider is never called.
 
-
+| Preset | What it does |
+|--------|-------------|
+| `test-status` | Groups pytest, vitest, jest failures into root-cause buckets with anchors and fix suggestions. 30+ failure patterns. |
+| `typecheck-summary` | Parses `tsc` output, groups by error code, returns max 5 bullets. No model call. |
+| `lint-failures` | Parses ESLint output, groups by rule, detects fixable hints. No model call. |
+| `build-failure` | Extracts first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, `tsc --build`. Falls back to model for unsupported formats. |
+| `audit-critical` | Extracts high/critical vulnerabilities from `npm audit`. No model call. |
+| `infra-risk` | Detects destructive signals in `terraform plan`. No model call. |
+| `diff-summary` | Summarizes changes and risks in diff output. |
+| `log-errors` | Extracts top error signals from log output. |
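The `test-status` row describes grouping failures into root-cause buckets. A minimal sketch of that idea follows, assuming failures arrive as `(test id, error message)` pairs and that messages differing only in numbers or hex addresses share a root cause. The `Bucket` type, the normalization rule, and the sample error messages are made up for illustration; sift's own failure patterns are not reproduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Bucket:
    label: str                       # normalized error message shared by the group
    tests: list[str] = field(default_factory=list)


def normalize(message: str) -> str:
    """Collapse details that vary between repeats of the same failure."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # object addresses
    message = re.sub(r"\d+", "<n>", message)                # ports, counts, codes
    return message.strip()


def bucket_failures(failures: list[tuple[str, str]]) -> list[Bucket]:
    """Group (test id, message) pairs into buckets, largest bucket first."""
    buckets: dict[str, Bucket] = {}
    for test_id, message in failures:
        key = normalize(message)
        if key not in buckets:
            buckets[key] = Bucket(label=key)
        buckets[key].tests.append(test_id)
    return sorted(buckets.values(), key=lambda b: len(b.tests), reverse=True)


# Made-up data shaped like the README's example: 125 repeats plus 3 others.
failures = [
    (f"tests/test_api.py::test_{i}", "ConnectionRefusedError: localhost:5432")
    for i in range(125)
] + [
    ("tests/test_contract.py::test_a", "AssertionError: expected 200, got 404"),
    ("tests/test_contract.py::test_b", "AssertionError: expected 200, got 500"),
    ("tests/test_contract.py::test_c", "AssertionError: expected 201, got 422"),
]
for bucket in bucket_failures(failures):
    print(len(bucket.tests), bucket.label)
```

With this input the 128 failures collapse into two buckets, which is the kind of reduction the README headline refers to. Real tracebacks need far more careful matching than a two-line normalizer.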
 
-
-sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
-```
+## Benchmark
 
-
+End-to-end debug loop on a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
 
-
-
-
-
--
+| Metric | Without sift | With sift | Reduction |
+|--------|-------------|-----------|-----------|
+| Tokens | 52,944 | 20,049 | 62% fewer |
+| Tool calls | 40.8 | 12 | 71% fewer |
+| Wall-clock time | 244s | 85s | 65% faster |
+| Commands | 15.5 | 6 | 61% fewer |
+| Diagnosis | Same | Same | — |
 
-
-- large `pytest`, `vitest`, or `jest` runs
-- `tsc` type errors and `eslint` lint failures
-- build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
-- `npm audit` and `terraform plan`
-- repeated CI blockers
-- noisy diffs and log streams
+Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
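The Reduction column in the new benchmark table follows from the two measured columns as `1 - with / without`. A quick check, using only the numbers from the table (the `reduction` helper is a name chosen for this example):

```python
def reduction(without: float, with_sift: float) -> int:
    """Percent reduction from `without` to `with_sift`, rounded to a whole percent."""
    return round((1 - with_sift / without) * 100)


rows = {
    "Tokens": (52944, 20049),
    "Tool calls": (40.8, 12),
    "Wall-clock time (s)": (244, 85),
    "Commands": (15.5, 6),
}
for metric, (without, with_sift) in rows.items():
    print(f"{metric}: {reduction(without, with_sift)}%")
# Tokens: 62%
# Tool calls: 71%
# Wall-clock time (s): 65%
# Commands: 61%
```

All four rows match the table. For wall-clock time, "65% faster" here means 65% less elapsed time (244s down to 85s).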
 
 ## Test debugging workflow
 
-This is where `sift` is strongest today.
-
 Think of it like this:
 - `standard` = map
 - `focused` = zoom
 - raw traceback = last resort
 
-Typical loop:
-
 ```bash
 sift exec --preset test-status -- <test command>
 sift rerun
 sift rerun --remaining --detail focused
 ```
 
-If `standard` already gives you the root cause, anchor, and fix
-
-`sift rerun --remaining` narrows automatically for cached `pytest` runs.
-
-For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
-
-For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
-
-```bash
-sift agent status
-sift agent show claude
-sift agent remove claude
-```
-
-## Where it helps less
-
-`sift` adds less value when:
-- the output is already short and obvious
-- the command is interactive or TUI-based
-- the exact raw log matters
-- the output does not expose enough evidence for reliable grouping
-
-When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
-
-## Benchmark
-
-On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
-
-| Metric | Raw agent | sift-first | Reduction |
-|--------|-----------|------------|-----------|
-| Tokens | 305K | 600 | 99.8% |
-| Tool calls | 16 | 7 | 56% |
-| Diagnosis | Same | Same | — |
-
-The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+If `standard` already gives you the root cause, anchor, and fix — stop and act.
 
-
-- `62%` fewer total debugging tokens
-- `71%` fewer tool calls
-- `65%` faster wall-clock time
+`sift rerun --remaining` narrows automatically for cached `pytest` runs. For `vitest` and `jest`, it reruns the full command and keeps diagnosis focused on what still fails.
 
-
+## Setup
 
-
-
-## Configuration
-
-Inspect and validate config with:
+Guided setup writes a config, verifies the provider, and makes daily use easier:
 
 ```bash
-sift config
-sift
-sift config validate
+sift config setup
+sift doctor
 ```
 
-To
+To wire `sift` into your coding agent automatically:
 
 ```bash
-sift
-sift
+sift agent install claude
+sift agent install codex
 ```
 
-
-
-```yaml
-provider:
-  provider: openai
-  model: gpt-5-nano
-  baseUrl: https://api.openai.com/v1
-  apiKey: YOUR_API_KEY
-
-input:
-  stripAnsi: true
-  redact: false
-  maxCaptureChars: 400000
-  maxInputChars: 60000
-
-runtime:
-  rawFallback: true
-```
+Config details: [docs/cli-reference.md](docs/cli-reference.md)
 
 ## Docs
 