@bilalimamoglu/sift 0.3.3 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -8,8 +8,9 @@
 
 Your AI agent should not be reading 13,000 lines of test output.
 
- **Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.
- **After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.
+ On the largest real fixture in the benchmark:
+ **Before:** 128 failures, 198K raw-output tokens, agent reconstructs the failure shape from scratch.
+ **After:** 6 lines, 129 `standard` tokens, agent acts on a grouped diagnosis immediately.
 
 ```bash
 sift exec --preset test-status -- pytest -q
@@ -29,7 +30,7 @@ sift exec --preset test-status -- pytest -q
 
 If 125 tests fail for one reason, the agent should pay for that reason once.
 
- ## Who is this for
+ ## What it is
 
 Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
 
@@ -43,9 +44,24 @@ npm install -g @bilalimamoglu/sift
 
 Requires Node.js 20+.
 
- ## Quick start
+ ## Try it in 60 seconds
 
- Guided setup writes a machine-wide config and verifies the provider:
+ If you already have an API key, you can try `sift` without any setup wizard:
+
+ ```bash
+ export OPENAI_API_KEY=your_openai_api_key
+ sift exec --preset test-status -- pytest -q
+ ```
+
+ You can also use a freeform prompt for non-test output:
+
+ ```bash
+ sift exec "what changed?" -- git diff
+ ```
+
+ ## Set it up for daily use
+
+ Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
 
 ```bash
 sift config setup
@@ -54,6 +70,13 @@ sift doctor
 
 Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
 
+ If you want your coding agent to use `sift` automatically, install the managed instruction block too:
+
+ ```bash
+ sift agent install codex
+ sift agent install claude
+ ```
+
 Then run noisy commands through `sift`:
 
 ```bash
@@ -87,6 +110,19 @@ export SIFT_BASE_URL=https://your-endpoint/v1
 export SIFT_PROVIDER_API_KEY=your_api_key
 ```
 
+ ## Why it helps
+
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
+
+ Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+ - a label
+ - an affected count
+ - an anchor
+ - a likely fix
+ - a decision signal
+
+ That changes the agent's job from "figure out what happened" to "act on the diagnosis."
+
 ## How it works
 
 `sift` follows a cheapest-first pipeline:
@@ -97,18 +133,18 @@ export SIFT_PROVIDER_API_KEY=your_api_key
 4. Escalate to a cheaper provider only if needed.
 5. Return a short diagnosis to the main agent.
 
- The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
-
 It also returns a decision signal:
 - `stop and act` when the diagnosis is already actionable
 - `zoom` when one deeper pass is justified
 - raw logs only as a last resort
 
- The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
+ For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
 
 ## Built-in presets
 
- Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
 
 | Preset | Heuristic | What it does |
 |--------|-----------|-------------|
@@ -144,6 +180,22 @@ Suppress the footer with `--quiet`:
 sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
 ```
 
+ ## Strongest today
+
+ `sift` is strongest when output is:
+ - long
+ - repetitive
+ - triage-heavy
+ - shaped by a small number of shared root causes
+
+ Best fits today:
+ - large `pytest`, `vitest`, or `jest` runs
+ - `tsc` type errors and `eslint` lint failures
+ - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - `npm audit` and `terraform plan`
+ - repeated CI blockers
+ - noisy diffs and log streams
+
 ## Test debugging workflow
 
 This is where `sift` is strongest today.
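The bucket idea described in the hunks above (one root cause, however many tests it hits) can be illustrated with a short sketch. This is a hypothetical illustration, not `sift`'s actual implementation: the function names and the normalization rule are invented. Failures whose tracebacks share a normalized signature collapse into one group with an affected count.

```python
import re
from collections import defaultdict

def signature(traceback_tail: str) -> str:
    """Normalize the last line of a failure so tests that share a root
    cause map to the same key (numeric noise replaced with N)."""
    line = traceback_tail.strip().splitlines()[-1]
    return re.sub(r"\d+", "N", line)

def bucket(failures: dict[str, str]) -> dict[str, list[str]]:
    """Group failing test names by shared error signature."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for test_name, tail in failures.items():
        groups[signature(tail)].append(test_name)
    return dict(groups)

# 3 failures, 2 root causes: the fixture error appears twice.
failures = {
    "test_user_create": "fixtures.py:12: ConnectionError: port 5432 refused",
    "test_user_delete": "fixtures.py:12: ConnectionError: port 5432 refused",
    "test_pricing": "AssertionError: assert 19.99 == 21.99",
}
for sig, tests in bucket(failures).items():
    print(f"{len(tests)} affected: {sig}")
```

On the toy input above this yields one bucket covering two tests (the shared fixture error) and one singleton, which is the shape an agent can act on without rereading every traceback.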
@@ -163,18 +215,11 @@ sift rerun --remaining --detail focused
 
 If `standard` already gives you the root cause, anchor, and fix, stop there and act.
 
- `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs.
 
- ## Agent setup
+ For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
 
- `sift` can install a managed instruction block so coding agents use it by default for long command output:
-
- ```bash
- sift agent install claude
- sift agent install codex
- ```
-
- This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.
+ For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
 
 ```bash
 sift agent status
@@ -182,22 +227,6 @@ sift agent show claude
 sift agent remove claude
 ```
 
- ## Where `sift` helps most
-
- `sift` is strongest when output is:
- - long
- - repetitive
- - triage-heavy
- - shaped by a small number of root causes
-
- Good fits:
- - large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
- - `tsc` type errors and `eslint` lint failures (deterministic heuristics)
- - build failures from webpack, esbuild, cargo, go, gcc
- - `npm audit` and `terraform plan` (deterministic heuristics)
- - repeated CI blockers
- - noisy diffs and log streams
-
 ## Where it helps less
 
 `sift` adds less value when:
@@ -218,7 +247,14 @@ On a real 640-test Python backend (125 repeated setup errors, 3 contract failure
 | Tool calls | 16 | 7 | 56% |
 | Diagnosis | Same | Same | — |
 
- The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.
+ The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+
+ The end-to-end workflow benchmark is a different metric:
+ - `62%` fewer total debugging tokens
+ - `71%` fewer tool calls
+ - `65%` faster wall-clock time
+
+ Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
 
 Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
 
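The benchmark hunk above mixes two different sets of numbers: a per-fixture token compression and workflow-level percentages. A few lines of arithmetic, using only the figures quoted in the text (variable names are illustrative), show why they differ so sharply:

```python
# Figures quoted in the benchmark section above.
raw_tokens = 198_026   # raw output tokens, largest real fixture
sift_tokens = 129      # `standard` tokens for the same run

# Single-fixture compression: how much one noisy run shrinks.
single_fixture_reduction = 1 - sift_tokens / raw_tokens
print(f"{single_fixture_reduction:.2%}")  # 99.93%

# Workflow-level metric from the table: tool calls across the debug loop.
calls_before, calls_after = 16, 7
print(f"{(calls_before - calls_after) / calls_before:.0%}")  # 56%
```

The near-total per-run compression and the more modest workflow percentages are consistent: the latter averages over every step of the loop, including steps `sift` never touches.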
@@ -261,7 +297,9 @@ runtime:
 ## Docs
 
 - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+ - Worked examples: [docs/examples](docs/examples)
 - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+ - Release notes: [release-notes](release-notes)
 
 ## License