@bilalimamoglu/sift 0.3.2 → 0.3.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +154 -185
- package/dist/cli.js +1410 -107
- package/dist/index.d.ts +15 -1
- package/dist/index.js +1375 -88
- package/package.json +2 -2
package/README.md
CHANGED
# sift

[](https://www.npmjs.com/package/@bilalimamoglu/sift)
[](LICENSE)
[](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />

Your AI agent should not be reading 13,000 lines of test output.

**Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.

**After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.

```bash
sift exec --preset test-status -- pytest -q
```

```text
- Tests did not pass.
- 3 tests failed. 125 errors occurred.
- Shared blocker: 125 errors share the same root cause - a missing test environment variable.
  Anchor: tests/conftest.py
  Fix: Set the required env var before rerunning DB-isolated tests.
- Contract drift: 3 snapshot tests are out of sync with the current API or model state.
  Anchor: tests/contracts/test_feature_manifest_freeze.py
  Fix: Regenerate the snapshots if the changes are intentional.
- Decision: stop and act.
```

If 125 tests fail for one reason, the agent should pay for that reason once.
## Who is this for

Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.

`sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.

## Install

```bash
npm install -g @bilalimamoglu/sift
```

Requires Node.js 20+.

## Quick start

Guided setup writes a machine-wide config and verifies the provider:

```bash
sift config setup
sift doctor
```

Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.

Then run noisy commands through `sift`:

```bash
sift exec --preset test-status -- <test command>
sift exec "what changed?" -- git diff
sift exec --preset audit-critical -- npm audit
sift exec --preset infra-risk -- terraform plan
```

Useful flags:

- `--dry-run` to preview the reduced input and prompt without calling a provider
- `--show-raw` to print captured raw output to `stderr`
- `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`

If you prefer environment variables instead of setup:

```bash
# OpenAI
export SIFT_PROVIDER=openai
export SIFT_BASE_URL=https://api.openai.com/v1
export SIFT_MODEL=gpt-5-nano
export OPENAI_API_KEY=your_openai_api_key

# OpenRouter
export SIFT_PROVIDER=openrouter
export OPENROUTER_API_KEY=your_openrouter_api_key

# Any OpenAI-compatible endpoint
export SIFT_PROVIDER=openai-compatible
export SIFT_BASE_URL=https://your-endpoint/v1
export SIFT_PROVIDER_API_KEY=your_api_key
```
## How it works

`sift` follows a cheapest-first pipeline:

1. Capture command output.
2. Sanitize sensitive-looking material.
3. Apply local heuristics for known failure shapes.
4. Escalate to a cheaper provider only if needed.
5. Return a short diagnosis to the main agent.
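The flow above can be sketched in a few lines of TypeScript. Everything here is illustrative, not `sift`'s actual internals: the stage names, the `Diagnosis` type, and the specific regexes are assumptions made for the sketch.

```typescript
// Sketch of a cheapest-first pipeline (illustrative, not sift's real API).
// Each stage either settles the result locally or hands off to the next,
// so the provider call happens only when local heuristics give up.

type Diagnosis = { summary: string; source: "heuristic" | "provider" };

function sanitize(output: string): string {
  // Redact anything that looks like a credential before it leaves the machine.
  return output.replace(/(api[_-]?key\s*[=:]\s*)\S+/gi, "$1<redacted>");
}

function localHeuristic(output: string): Diagnosis | null {
  // A known failure shape is handled locally: zero tokens, zero latency.
  if (/ModuleNotFoundError/.test(output)) {
    return { summary: "Import/dependency blocker", source: "heuristic" };
  }
  return null; // unknown shape: escalate
}

function callProvider(output: string): Diagnosis {
  // Placeholder for the LLM call; only reached when heuristics fail.
  return { summary: `provider summary of ${output.length} chars`, source: "provider" };
}

function siftPipeline(rawOutput: string): Diagnosis {
  const clean = sanitize(rawOutput);
  // `??` is the cheapest-first decision: the provider runs only if the
  // heuristic returned null.
  return localHeuristic(clean) ?? callProvider(clean);
}
```

The nullish-coalescing fallthrough is the whole idea: deterministic stages get first refusal, and the expensive stage is reached only when they decline.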

The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
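A bucket might look roughly like the following; the field names and grouping key are hypothetical, chosen only to illustrate the "pay for one root cause once" idea.

```typescript
// Hypothetical shape of a bucket; sift's real field names may differ.
interface Bucket {
  label: string;    // one distinct root cause
  affected: number; // how many tests hit it
  anchor: string;   // best file to open first
  fix: string;      // likely next move
}

// Collapse repeated failures into one bucket per normalized root cause,
// largest bucket first.
function toBuckets(failures: { reason: string; file: string }[]): Bucket[] {
  const byReason = new Map<string, Bucket>();
  for (const f of failures) {
    const existing = byReason.get(f.reason);
    if (existing) {
      existing.affected += 1; // same cause: count it, don't repeat it
    } else {
      byReason.set(f.reason, {
        label: f.reason,
        affected: 1,
        anchor: f.file,         // first file seen becomes the anchor
        fix: "(to be filled)",  // supplied by heuristics or the provider
      });
    }
  }
  return [...byReason.values()].sort((a, b) => b.affected - a.affected);
}
```

With this shape, 125 identical setup errors and 3 snapshot failures collapse into exactly two buckets, which is what the example summary at the top of this README shows.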

It also returns a decision signal:

- `stop and act` when the diagnosis is already actionable
- `zoom` when one deeper pass is justified
- raw logs only as a last resort

The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
## Built-in presets

Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.

| Preset | Heuristic | What it does |
|--------|-----------|--------------|
| `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
| `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
| `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
| `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
| `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
| `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
| `diff-summary` | Provider | Summarizes changes and risks in diff output. |
| `log-errors` | Provider | Extracts top error signals from log output. |

Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
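For a feel of what a deterministic heuristic does, here is a rough sketch of the kind of grouping `typecheck-summary` performs on standard-format `tsc` output. The regex and function are illustrative assumptions, not sift's actual parser, and they cover only the standard (non-pretty) format.

```typescript
// Sketch: group standard-format tsc diagnostics by error code.
// Lines look like: src/app.ts(12,5): error TS2322: Type 'string' is not assignable...
const TSC_LINE = /^(.+)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

function groupByCode(tscOutput: string): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const line of tscOutput.split("\n")) {
    const m = TSC_LINE.exec(line);
    if (!m) continue; // not a diagnostic line; skip it
    const [, file, , , code] = m; // capture groups 1 (file) and 4 (code)
    const files = groups.get(code) ?? [];
    files.push(file);
    groups.set(code, files);
  }
  return groups;
}
```

Grouping by error code is what turns two hundred `TS2322` lines into a single bullet with a count, which is exactly the compression an agent needs.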

```bash
sift exec --preset typecheck-summary -- npx tsc --noEmit
sift exec --preset lint-failures -- npx eslint src/
sift exec --preset build-failure -- npm run build
sift exec --preset audit-critical -- npm audit
sift exec --preset infra-risk -- terraform plan
```

On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:

```text
[sift: heuristic • LLM skipped • summary 47ms]
[sift: provider • LLM used • 380 tokens • summary 1.2s]
```

Suppress the footer with `--quiet`:

```bash
sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
```
## Test debugging workflow

This is where `sift` is strongest today.

Think of it like this:

- `standard` = map
- `focused` = zoom
- raw traceback = last resort

Typical loop:

```bash
sift exec --preset test-status -- <test command>
sift rerun
sift rerun --remaining --detail focused
```

If `standard` already gives you the root cause, anchor, and fix, stop there and act.

`sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
## Agent setup

`sift` can install a managed instruction block so coding agents use it by default for long command output:

```bash
sift agent install claude
sift agent install codex
```

This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.

```bash
sift agent status
sift agent show claude
sift agent remove claude
```
## Where `sift` helps most

`sift` is strongest when output is:

- long
- repetitive
- triage-heavy
- shaped by a small number of root causes

Good fits:

- large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
- `tsc` type errors and `eslint` lint failures (deterministic heuristics)
- build failures from webpack, esbuild, Cargo, Go, GCC
- `npm audit` and `terraform plan` (deterministic heuristics)
- repeated CI blockers
- noisy diffs and log streams

## Where it helps less

`sift` adds less value when:

- the output is already short and obvious
- the command is interactive or TUI-based
- the exact raw log matters
- the output does not expose enough evidence for reliable grouping

When it cannot be confident, `sift` tells you to zoom or read the raw output instead of pretending certainty.
## Benchmark

On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):

| Metric | Raw agent | sift-first | Reduction |
|--------|-----------|------------|-----------|
| Tokens | 305K | 600 | 99.8% |
| Tool calls | 16 | 7 | 56% |
| Diagnosis | Same | Same | — |
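The Reduction column is plain arithmetic over the two runs; as a quick check:

```typescript
// Reduction percentages recomputed from the table's raw numbers.
const pct = (before: number, after: number) =>
  Math.round(((before - after) / before) * 1000) / 10; // one decimal place

const tokenReduction = pct(305_000, 600); // 99.8
const toolCallReduction = pct(16, 7);     // 56.3, reported in the table as 56%
```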

The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.

Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
## Configuration

Inspect and validate config with:

```bash
sift config show
sift config show --show-secrets
sift config validate
```

To switch between saved providers without editing files:

```bash
sift config use openai
sift config use openrouter
```

Minimal YAML config:

```yaml
runtime:
  # … (middle lines elided in this diff)
  rawFallback: true
```
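Since a repo-local `sift.config.yaml` overrides the machine-wide file, the layering is conceptually a shallow merge where the closer scope wins. A sketch, assuming illustrative key names (only `rawFallback` appears in the YAML above; the rest are made up for the example):

```typescript
// Sketch of layered config resolution: repo-local sift.config.yaml
// overrides machine-wide ~/.config/sift/config.yaml, which overrides
// built-in defaults. Not sift's actual loader.
type SiftConfig = { provider?: string; model?: string; rawFallback?: boolean };

const defaults: SiftConfig = { rawFallback: true };

function resolveConfig(machineWide: SiftConfig, repoLocal: SiftConfig): SiftConfig {
  // Later spreads win, so repo-local settings take precedence.
  return { ...defaults, ...machineWide, ...repoLocal };
}
```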
## Docs

- CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
- Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
## License