@bilalimamoglu/sift 0.3.3 → 0.4.0
- package/README.md +74 -36
- package/dist/cli.js +699 -167
- package/dist/index.d.ts +5 -1
- package/dist/index.js +629 -141
- package/package.json +3 -1
package/README.md CHANGED
@@ -8,8 +8,9 @@
 
 Your AI agent should not be reading 13,000 lines of test output.
 
-
-**
+On the largest real fixture in the benchmark:
+**Before:** 128 failures, 198K raw-output tokens, agent reconstructs the failure shape from scratch.
+**After:** 6 lines, 129 `standard` tokens, agent acts on a grouped diagnosis immediately.
 
 ```bash
 sift exec --preset test-status -- pytest -q
@@ -29,7 +30,7 @@ sift exec --preset test-status -- pytest -q
 
 If 125 tests fail for one reason, the agent should pay for that reason once.
 
-##
+## What it is
 
 Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
 
@@ -43,9 +44,24 @@ npm install -g @bilalimamoglu/sift
 
 Requires Node.js 20+.
 
-##
+## Try it in 60 seconds
 
-
+If you already have an API key, you can try `sift` without any setup wizard:
+
+```bash
+export OPENAI_API_KEY=your_openai_api_key
+sift exec --preset test-status -- pytest -q
+```
+
+You can also use a freeform prompt for non-test output:
+
+```bash
+sift exec "what changed?" -- git diff
+```
+
+## Set it up for daily use
+
+Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
 
 ```bash
 sift config setup
@@ -54,6 +70,13 @@ sift doctor
 
 Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
 
+If you want your coding agent to use `sift` automatically, install the managed instruction block too:
+
+```bash
+sift agent install codex
+sift agent install claude
+```
+
 Then run noisy commands through `sift`:
 
 ```bash
@@ -87,6 +110,19 @@ export SIFT_BASE_URL=https://your-endpoint/v1
 export SIFT_PROVIDER_API_KEY=your_api_key
 ```
 
+## Why it helps
+
+The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
+
+Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+- a label
+- an affected count
+- an anchor
+- a likely fix
+- a decision signal
+
+That changes the agent's job from "figure out what happened" to "act on the diagnosis."
+
 ## How it works
 
 `sift` follows a cheapest-first pipeline:
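The bucket idea described in the README text of this hunk can be illustrated with a minimal Python sketch. It is not taken from `sift`'s source: the `Failure` and `Bucket` types, the `bucket_failures` helper, and the rule that identical error text means one shared root cause are all assumptions made for illustration, and the sample error messages are invented. It covers only the label, affected count, and anchor fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One failing test (hypothetical shape, not sift's data model)."""
    test_id: str
    error: str      # e.g. the final exception line of a traceback
    location: str   # e.g. "conftest.py:12"


@dataclass
class Bucket:
    """One distinct root cause and the number of tests it affects."""
    label: str
    affected: int
    anchor: str


def bucket_failures(failures: list[Failure]) -> list[Bucket]:
    # Assumption: identical error text means one shared root cause.
    groups: dict[str, list[Failure]] = defaultdict(list)
    for failure in failures:
        groups[failure.error].append(failure)
    buckets = [
        Bucket(label=error, affected=len(members), anchor=members[0].location)
        for error, members in groups.items()
    ]
    # Largest bucket first, so the most widespread cause is read first.
    return sorted(buckets, key=lambda b: b.affected, reverse=True)


# Invented sample data: 125 failures with one cause plus 3 with another.
failures = [
    Failure(f"tests/test_api.py::test_{i}", "ConnectionError: db unavailable", "conftest.py:12")
    for i in range(125)
] + [
    Failure(f"tests/test_contract.py::test_{i}", "AssertionError: schema mismatch", "tests/test_contract.py:40")
    for i in range(3)
]

for bucket in bucket_failures(failures):
    print(f"{bucket.affected:>4}  {bucket.label}  ({bucket.anchor})")
```

With this input, 128 failures collapse into two buckets, which is the sense in which an agent "pays for a reason once".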
@@ -97,18 +133,18 @@ export SIFT_PROVIDER_API_KEY=your_api_key
 4. Escalate to a cheaper provider only if needed.
 5. Return a short diagnosis to the main agent.
 
-The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
-
 It also returns a decision signal:
 - `stop and act` when the diagnosis is already actionable
 - `zoom` when one deeper pass is justified
 - raw logs only as a last resort
 
-
+For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+
+The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
 
 ## Built-in presets
 
-Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called
+Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
 
 | Preset | Heuristic | What it does |
 |--------|-----------|-------------|
@@ -144,6 +180,22 @@ Suppress the footer with `--quiet`:
 sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
 ```
 
+## Strongest today
+
+`sift` is strongest when output is:
+- long
+- repetitive
+- triage-heavy
+- shaped by a small number of shared root causes
+
+Best fits today:
+- large `pytest`, `vitest`, or `jest` runs
+- `tsc` type errors and `eslint` lint failures
+- build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+- `npm audit` and `terraform plan`
+- repeated CI blockers
+- noisy diffs and log streams
+
 ## Test debugging workflow
 
 This is where `sift` is strongest today.
@@ -163,18 +215,11 @@ sift rerun --remaining --detail focused
 
 If `standard` already gives you the root cause, anchor, and fix, stop there and act.
 
-`sift rerun --remaining`
+`sift rerun --remaining` narrows automatically for cached `pytest` runs.
 
-
+For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
 
-
-
-```bash
-sift agent install claude
-sift agent install codex
-```
-
-This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.
+For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
 
 ```bash
 sift agent status
@@ -182,22 +227,6 @@ sift agent show claude
 sift agent remove claude
 ```
 
-## Where `sift` helps most
-
-`sift` is strongest when output is:
-- long
-- repetitive
-- triage-heavy
-- shaped by a small number of root causes
-
-Good fits:
-- large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
-- `tsc` type errors and `eslint` lint failures (deterministic heuristics)
-- build failures from webpack, esbuild, cargo, go, gcc
-- `npm audit` and `terraform plan` (deterministic heuristics)
-- repeated CI blockers
-- noisy diffs and log streams
-
 ## Where it helps less
 
 `sift` adds less value when:
@@ -218,7 +247,14 @@ On a real 640-test Python backend (125 repeated setup errors, 3 contract failure
 | Tool calls | 16 | 7 | 56% |
 | Diagnosis | Same | Same | — |
 
-The
+The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+
+The end-to-end workflow benchmark is a different metric:
+- `62%` fewer total debugging tokens
+- `71%` fewer tool calls
+- `65%` faster wall-clock time
+
+Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
 
 Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
 
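As a quick check on the figures quoted in this hunk, the reductions can be recomputed from the raw numbers. The sketch below is plain arithmetic and not part of `sift`. The `percent_reduction` helper is a hypothetical name introduced here, and the inputs are the `198026` to `129` token figures and the `16` to `7` tool-call row shown above.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures quoted in the README hunk above.
token_reduction = percent_reduction(198026, 129)
tool_call_reduction = percent_reduction(16, 7)

print(f"tokens: {token_reduction:.2f}% smaller")        # tokens: 99.93% smaller
print(f"tool calls: {tool_call_reduction:.0f}% fewer")  # tool calls: 56% fewer
```

The tool-call result agrees with the `56%` cell in the table row. The `62%`, `71%`, and `65%` workflow figures come from a separate benchmark whose raw inputs are not shown in this diff, so they are not recomputed here.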
@@ -261,7 +297,9 @@ runtime:
 ## Docs
 
 - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+- Worked examples: [docs/examples](docs/examples)
 - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+- Release notes: [release-notes](release-notes)
 
 ## License
 