@bilalimamoglu/sift 0.3.3 → 0.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +84 -39
- package/dist/cli.js +725 -183
- package/dist/index.d.ts +6 -1
- package/dist/index.js +657 -158
- package/package.json +3 -1
package/README.md
CHANGED
@@ -8,8 +8,15 @@
 
 Your AI agent should not be reading 13,000 lines of test output.
 
-
-
+If 125 tests fail for one reason, it should pay for that reason once.
+
+`sift` turns noisy command output into a short, structured diagnosis for coding agents, so they spend fewer tokens, cost less to run, and move through debug loops faster.
+
+Instead of feeding an agent thousands of lines of logs, you give it:
+- the root cause
+- where it happens
+- what to fix
+- what to do next
 
 ```bash
 sift exec --preset test-status -- pytest -q
@@ -27,13 +34,14 @@ sift exec --preset test-status -- pytest -q
 - Decision: stop and act.
 ```
 
-
+On the largest real fixture in the benchmark:
+`198K` raw-output tokens -> `129` `standard` tokens.
 
-
+Same diagnosis. Far less work.
 
-
+## What it is
 
-`sift` sits between
+`sift` sits between a noisy command and a coding agent. It captures output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal.
 
 ## Install
 
@@ -43,9 +51,24 @@ npm install -g @bilalimamoglu/sift
 
 Requires Node.js 20+.
 
-##
+## Try it in 60 seconds
+
+If you already have an API key, you can try `sift` without any setup wizard:
+
+```bash
+export OPENAI_API_KEY=your_openai_api_key
+sift exec --preset test-status -- pytest -q
+```
+
+You can also use a freeform prompt for non-test output:
+
+```bash
+sift exec "what changed?" -- git diff
+```
+
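An agent harness that wraps `sift` would build exactly the invocations shown in this hunk. A minimal sketch, assuming only the `sift exec`, `--preset`, and `--` conventions documented here (`sift_argv` is a hypothetical helper, not part of sift):

```python
def sift_argv(target_cmd, preset=None, prompt=None):
    """Build a `sift exec` argv (hypothetical helper, not part of sift).

    Everything after `--` is the wrapped command, as in the README examples.
    A preset or a freeform prompt selects how the output is diagnosed.
    """
    argv = ["sift", "exec"]
    if preset:
        argv += ["--preset", preset]
    elif prompt:
        argv += [prompt]
    argv += ["--"] + list(target_cmd)
    return argv

# An agent could then run this via subprocess and read the short diagnosis:
print(sift_argv(["pytest", "-q"], preset="test-status"))
# -> ['sift', 'exec', '--preset', 'test-status', '--', 'pytest', '-q']
```

The same helper covers the freeform form, e.g. `sift_argv(["git", "diff"], prompt="what changed?")`.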
+## Set it up for daily use
 
-Guided setup writes a machine-wide config and
+Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
 
 ```bash
 sift config setup
@@ -54,6 +77,13 @@ sift doctor
 
 Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
 
+If you want your coding agent to use `sift` automatically, install the managed instruction block too:
+
+```bash
+sift agent install codex
+sift agent install claude
+```
+
 Then run noisy commands through `sift`:
 
 ```bash
@@ -87,6 +117,19 @@ export SIFT_BASE_URL=https://your-endpoint/v1
 export SIFT_PROVIDER_API_KEY=your_api_key
 ```
 
+## Why it helps
+
+The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
+
+Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+- a label
+- an affected count
+- an anchor
+- a likely fix
+- a decision signal
+
+That changes the agent's job from "figure out what happened" to "act on the diagnosis."
+
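The bucketing idea added in this hunk can be sketched in a few lines. This is a toy illustration of the concept only, not sift's implementation; masking volatile details (numbers, addresses) so repeated tracebacks share one key is an assumption about how such grouping could work:

```python
import re
from collections import defaultdict

def bucket_failures(failures):
    """Group (test_name, traceback_tail) pairs by a normalized root cause.

    Volatile details such as line numbers and hex addresses are masked,
    so 125 repeats of the same failure collapse into a single bucket.
    """
    buckets = defaultdict(list)
    for test, tail in failures:
        key = re.sub(r"0x[0-9a-f]+|\d+", "<n>", tail)
        buckets[key].append(test)
    # One bucket per distinct root cause: a label, an affected count, an anchor.
    return [
        {"label": key, "affected": len(tests), "anchor": tests[0]}
        for key, tests in buckets.items()
    ]

# 125 tests failing for one shared reason, plus one distinct contract failure:
failures = [(f"test_case_{i}", "fixture 'db' error at conftest.py:42") for i in range(125)]
failures.append(("test_contract", "AssertionError: status 500 != 200"))
print([b["affected"] for b in bucket_failures(failures)])  # -> [125, 1]
```

Two buckets instead of 126 tracebacks is what makes "pay for that reason once" possible.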
 ## How it works
 
 `sift` follows a cheapest-first pipeline:
@@ -97,18 +140,18 @@ export SIFT_PROVIDER_API_KEY=your_api_key
 4. Escalate to a cheaper provider only if needed.
 5. Return a short diagnosis to the main agent.
 
-The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
-
 It also returns a decision signal:
 - `stop and act` when the diagnosis is already actionable
 - `zoom` when one deeper pass is justified
 - raw logs only as a last resort
 
-
+For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+
+The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
 
 ## Built-in presets
 
-Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called
+Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
 
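The heuristic-first, escalate-only-if-needed flow described in this hunk can be sketched as a simple control path. This is an illustration under stated assumptions, not sift's actual code; the `heuristic` and `provider_call` callables are hypothetical stand-ins:

```python
def diagnose(output, heuristic, provider_call):
    """Cheapest-first sketch (illustrative, not sift's implementation).

    1. Try the local heuristic; if it is confident, never call the provider.
    2. Otherwise escalate to the provider for a short diagnosis.
    """
    result = heuristic(output)
    if result is not None:  # heuristic confidently handled the output
        return {"source": "local", "diagnosis": result}
    return {"source": "provider", "diagnosis": provider_call(output)}

# A recognized format stays local; the provider is only a fallback.
local = lambda out: "1 bucket: fixture 'db' missing" if "conftest" in out else None
provider = lambda out: "escalated diagnosis"

print(diagnose("E fixture error in conftest.py", local, provider)["source"])  # -> local
print(diagnose("some unrecognized noise", local, provider)["source"])        # -> provider
```

The point of the design is that the common case (a recognized test, lint, or build format) never pays a provider call at all.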
 | Preset | Heuristic | What it does |
 |--------|-----------|-------------|
@@ -144,6 +187,22 @@ Suppress the footer with `--quiet`:
 sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
 ```
 
+## Strongest today
+
+`sift` is strongest when output is:
+- long
+- repetitive
+- triage-heavy
+- shaped by a small number of shared root causes
+
+Best fits today:
+- large `pytest`, `vitest`, or `jest` runs
+- `tsc` type errors and `eslint` lint failures
+- build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+- `npm audit` and `terraform plan`
+- repeated CI blockers
+- noisy diffs and log streams
+
 ## Test debugging workflow
 
 This is where `sift` is strongest today.
@@ -163,18 +222,11 @@ sift rerun --remaining --detail focused
 
 If `standard` already gives you the root cause, anchor, and fix, stop there and act.
 
-`sift rerun --remaining`
-
-## Agent setup
+`sift rerun --remaining` narrows automatically for cached `pytest` runs.
 
-`
-
-```bash
-sift agent install claude
-sift agent install codex
-```
+For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
 
-
+For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
 
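The "what still fails relative to the cached baseline" comparison in this hunk is, at heart, a set difference over failing test IDs. A toy sketch of that idea (an assumption about the mechanism, not sift's implementation):

```python
def remaining_failures(baseline_failing, rerun_failing):
    """Toy sketch of a rerun-vs-cached-baseline comparison.

    still_failing: the focus of the next diagnosis
    fixed:         cached failures that no longer reproduce
    new:           regressions introduced since the cached run
    """
    baseline, rerun = set(baseline_failing), set(rerun_failing)
    return {
        "still_failing": sorted(baseline & rerun),
        "fixed": sorted(baseline - rerun),
        "new": sorted(rerun - baseline),
    }

report = remaining_failures(
    baseline_failing=["test_auth", "test_db", "test_api"],
    rerun_failing=["test_db", "test_new_regression"],
)
print(report["still_failing"])  # -> ['test_db']
```

Keeping the diagnosis scoped to `still_failing` is what stops the agent from re-reading failures it has already fixed.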
 ```bash
 sift agent status
@@ -182,22 +234,6 @@ sift agent show claude
 sift agent remove claude
 ```
 
-## Where `sift` helps most
-
-`sift` is strongest when output is:
-- long
-- repetitive
-- triage-heavy
-- shaped by a small number of root causes
-
-Good fits:
-- large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
-- `tsc` type errors and `eslint` lint failures (deterministic heuristics)
-- build failures from webpack, esbuild, cargo, go, gcc
-- `npm audit` and `terraform plan` (deterministic heuristics)
-- repeated CI blockers
-- noisy diffs and log streams
-
 ## Where it helps less
 
 `sift` adds less value when:
@@ -218,7 +254,14 @@ On a real 640-test Python backend (125 repeated setup errors, 3 contract failure
 | Tool calls | 16 | 7 | 56% |
 | Diagnosis | Same | Same | — |
 
-The
+The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+
+The end-to-end workflow benchmark is a different metric:
+- `62%` fewer total debugging tokens
+- `71%` fewer tool calls
+- `65%` faster wall-clock time
+
+Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
 
 Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
 
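The two headline numbers in this hunk (`198026` raw tokens down to `129` diagnosis tokens) imply the compression ratio directly; a quick arithmetic check:

```python
raw_tokens = 198026   # largest real test log in the benchmark (from the README)
sift_tokens = 129     # `standard`-detail diagnosis for the same run

reduction = 1 - sift_tokens / raw_tokens
print(f"{reduction:.4%}")  # -> 99.9349%
```

That is a roughly 1500:1 reduction on that single fixture, which is why the separate workflow benchmark's `62%` token figure (measured over a whole debug loop, including the agent's other traffic) is the more conservative number.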
@@ -261,7 +304,9 @@ runtime:
 ## Docs
 
 - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+- Worked examples: [docs/examples](docs/examples)
 - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+- Release notes: [release-notes](release-notes)
 
 ## License
 