@bilalimamoglu/sift 0.3.3 → 0.4.1

package/README.md CHANGED
@@ -8,8 +8,15 @@
 
  Your AI agent should not be reading 13,000 lines of test output.
 
- **Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.
- **After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.
+ If 125 tests fail for one reason, it should pay for that reason once.
+
+ `sift` turns noisy command output into a short, structured diagnosis for coding agents, so they spend fewer tokens, cost less to run, and move through debug loops faster.
+
+ Instead of feeding an agent thousands of lines of logs, you give it:
+ - the root cause
+ - where it happens
+ - what to fix
+ - what to do next
 
  ```bash
  sift exec --preset test-status -- pytest -q
@@ -27,13 +34,14 @@ sift exec --preset test-status -- pytest -q
  - Decision: stop and act.
  ```
 
- If 125 tests fail for one reason, the agent should pay for that reason once.
+ On the largest real fixture in the benchmark:
+ `198K` raw-output tokens -> `129` `standard` tokens.
 
- ## Who is this for
+ Same diagnosis. Far less work.
 
- Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
+ ## What it is
 
- `sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
+ `sift` sits between a noisy command and a coding agent. It captures output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal.
 
  ## Install
 
@@ -43,9 +51,24 @@ npm install -g @bilalimamoglu/sift
 
  Requires Node.js 20+.
 
- ## Quick start
+ ## Try it in 60 seconds
+
+ If you already have an API key, you can try `sift` without any setup wizard:
+
+ ```bash
+ export OPENAI_API_KEY=your_openai_api_key
+ sift exec --preset test-status -- pytest -q
+ ```
+
+ You can also use a freeform prompt for non-test output:
+
+ ```bash
+ sift exec "what changed?" -- git diff
+ ```
+
+ ## Set it up for daily use
 
- Guided setup writes a machine-wide config and verifies the provider:
+ Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
 
  ```bash
  sift config setup
@@ -54,6 +77,13 @@ sift doctor
 
  Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
 
+ If you want your coding agent to use `sift` automatically, install the managed instruction block too:
+
+ ```bash
+ sift agent install codex
+ sift agent install claude
+ ```
+
  Then run noisy commands through `sift`:
 
  ```bash
@@ -87,6 +117,19 @@ export SIFT_BASE_URL=https://your-endpoint/v1
  export SIFT_PROVIDER_API_KEY=your_api_key
  ```
 
+ ## Why it helps
+
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
+
+ Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+ - a label
+ - an affected count
+ - an anchor
+ - a likely fix
+ - a decision signal
+
+ That changes the agent's job from "figure out what happened" to "act on the diagnosis."
+
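The bucket idea described above can be sketched in a few lines of Python. This is an illustration only, not `sift`'s actual implementation: the `Failure` and `Bucket` types, the `group_into_buckets` helper, the choice to key on an exact error string, and all sample paths and messages are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Failure:
    test_id: str   # hypothetical test identifier
    error: str     # normalized error text shared by repeated failures
    location: str  # file:line where the error surfaced


@dataclass
class Bucket:
    label: str           # the shared root cause
    affected_count: int  # how many tests it explains
    anchor: str          # first place the cause was seen


def group_into_buckets(failures: list[Failure]) -> list[Bucket]:
    """Collapse failures that share an error string into one bucket each."""
    grouped: dict[str, list[Failure]] = defaultdict(list)
    for failure in failures:
        grouped[failure.error].append(failure)
    buckets = [
        Bucket(label=error, affected_count=len(items), anchor=items[0].location)
        for error, items in grouped.items()
    ]
    # Largest bucket first, so the dominant cause is read first.
    return sorted(buckets, key=lambda b: b.affected_count, reverse=True)


# Made-up sample data: 125 failures with one cause, 3 with another.
failures = [
    Failure(f"tests/test_api.py::test_{i}", "fixture 'db' not found", "tests/conftest.py:12")
    for i in range(125)
] + [
    Failure(f"tests/test_contract.py::test_{i}", "schema mismatch", "api/schema.py:40")
    for i in range(3)
]

buckets = group_into_buckets(failures)
for bucket in buckets:
    print(f"{bucket.affected_count:>4}  {bucket.label}  @ {bucket.anchor}")
```

In this sketch 128 failure records collapse into two buckets, which is the sense in which an agent "pays for that reason once".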
  ## How it works
 
  `sift` follows a cheapest-first pipeline:
@@ -97,18 +140,18 @@ export SIFT_PROVIDER_API_KEY=your_api_key
  4. Escalate to a cheaper provider only if needed.
  5. Return a short diagnosis to the main agent.
 
- The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
-
  It also returns a decision signal:
  - `stop and act` when the diagnosis is already actionable
  - `zoom` when one deeper pass is justified
  - raw logs only as a last resort
 
- The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
+ For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
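The "local heuristics first, provider only if needed" behavior can be pictured roughly as follows. This is a hedged sketch rather than `sift`'s real code: `Diagnosis`, `pytest_heuristic`, `diagnose`, `fake_provider`, and the regular expression are hypothetical, and pairing the provider path with `zoom` is an arbitrary choice made only to show both decision strings from the README.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Diagnosis:
    summary: str
    decision: str        # "stop and act" or "zoom", as named in the README
    used_provider: bool  # True only when the local path could not answer


def pytest_heuristic(output: str) -> Optional[Diagnosis]:
    """Hypothetical local heuristic: recognize a pytest-style summary line."""
    match = re.search(r"(\d+) failed", output)
    if match is None:
        return None  # format not recognized, let the caller escalate
    return Diagnosis(f"{match.group(1)} tests failed", "stop and act", False)


def diagnose(output: str, provider: Callable[[str], str]) -> Diagnosis:
    """Try the free local path first; call the provider only on a miss."""
    local = pytest_heuristic(output)
    if local is not None:
        return local
    # Illustrative mapping only: the real decision logic is not shown here.
    return Diagnosis(provider(output), "zoom", True)


def fake_provider(text: str) -> str:
    # Stand-in for a model call so the sketch runs offline.
    return "unrecognized output, needs a closer look"


print(diagnose("==== 125 failed, 515 passed in 42.1s ====", fake_provider))
print(diagnose("some unstructured log line", fake_provider))
```

The point of the ordering is that the provider callable is never invoked when the heuristic returns a result.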
 
  ## Built-in presets
 
- Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
 
  | Preset | Heuristic | What it does |
  |--------|-----------|-------------|
@@ -144,6 +187,22 @@ Suppress the footer with `--quiet`:
  sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
  ```
 
+ ## Strongest today
+
+ `sift` is strongest when output is:
+ - long
+ - repetitive
+ - triage-heavy
+ - shaped by a small number of shared root causes
+
+ Best fits today:
+ - large `pytest`, `vitest`, or `jest` runs
+ - `tsc` type errors and `eslint` lint failures
+ - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - `npm audit` and `terraform plan`
+ - repeated CI blockers
+ - noisy diffs and log streams
+
  ## Test debugging workflow
 
  This is where `sift` is strongest today.
@@ -163,18 +222,11 @@ sift rerun --remaining --detail focused
 
  If `standard` already gives you the root cause, anchor, and fix, stop there and act.
 
- `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
-
- ## Agent setup
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs.
 
- `sift` can install a managed instruction block so coding agents use it by default for long command output:
-
- ```bash
- sift agent install claude
- sift agent install codex
- ```
+ For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
 
- This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically, with no manual prompting needed.
+ For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
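One way to picture "what still fails relative to the cached baseline" is as set arithmetic over test IDs. The sketch below is an assumption about the general idea, not a description of how `sift rerun --remaining` is implemented; `compare_to_baseline` and the test IDs are hypothetical.

```python
def compare_to_baseline(baseline: set[str], current: set[str]) -> dict[str, list[str]]:
    """Split a rerun's failures into still-failing, fixed, and newly failing."""
    return {
        "still_failing": sorted(baseline & current),  # failed before and now
        "fixed": sorted(baseline - current),          # failed before, pass now
        "new": sorted(current - baseline),            # did not fail before
    }


# Made-up test IDs for illustration.
cached_failures = {"test_login", "test_signup", "test_export"}
rerun_failures = {"test_export", "test_import"}

result = compare_to_baseline(cached_failures, rerun_failures)
print(result)
```

Under this reading, a rerun of the full command can still produce a focused report, because only the `still_failing` set needs attention.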
 
  ```bash
  sift agent status
@@ -182,22 +234,6 @@ sift agent show claude
  sift agent remove claude
  ```
 
- ## Where `sift` helps most
-
- `sift` is strongest when output is:
- - long
- - repetitive
- - triage-heavy
- - shaped by a small number of root causes
-
- Good fits:
- - large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
- - `tsc` type errors and `eslint` lint failures (deterministic heuristics)
- - build failures from webpack, esbuild, cargo, go, gcc
- - `npm audit` and `terraform plan` (deterministic heuristics)
- - repeated CI blockers
- - noisy diffs and log streams
-
  ## Where it helps less
 
  `sift` adds less value when:
@@ -218,7 +254,14 @@ On a real 640-test Python backend (125 repeated setup errors, 3 contract failure
  | Tool calls | 16 | 7 | 56% |
  | Diagnosis | Same | Same | — |
 
- The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.
+ The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+
+ The end-to-end workflow benchmark is a different metric:
+ - `62%` fewer total debugging tokens
+ - `71%` fewer tool calls
+ - `65%` faster wall-clock time
+
+ Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
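For readers who want to check the single-fixture figures, the arithmetic is a plain percentage reduction. The inputs below are the numbers quoted in the README text and the tool-calls table row; the `reduction_pct` helper is just an illustrative name.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


raw_tokens = 198026    # raw-output tokens on the largest fixture
standard_tokens = 129  # tokens in the `standard` diagnosis

token_reduction = reduction_pct(raw_tokens, standard_tokens)
tool_call_reduction = reduction_pct(16, 7)  # tool-calls row of the table

print(f"tokens: {token_reduction:.2f}% fewer")          # tokens: 99.93% fewer
print(f"tool calls: {tool_call_reduction:.0f}% fewer")  # tool calls: 56% fewer
```

The end-to-end figures (`62%`, `71%`, `65%`) come from a separate workflow measurement and cannot be derived from these two inputs.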
 
  Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
 
@@ -261,7 +304,9 @@ runtime:
  ## Docs
 
  - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+ - Worked examples: [docs/examples](docs/examples)
  - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+ - Release notes: [release-notes](release-notes)
 
  ## License