@bilalimamoglu/sift 0.3.1 → 0.3.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +142 -386
- package/dist/cli.js +2500 -226
- package/dist/index.d.ts +15 -1
- package/dist/index.js +2474 -216
- package/package.json +1 -1
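The version pair and per-file stats above come straight from the registry; a diff like this can be regenerated locally with npm's built-in `diff` subcommand. A minimal sketch, assuming npm 7+ and network access to the npm registry (the package specs are taken from the header above):

```shell
# Regenerate this registry diff locally with npm's built-in diff command.
# Requires npm >= 7 and network access to the npm registry.
npm diff --diff=@bilalimamoglu/sift@0.3.1 --diff=@bilalimamoglu/sift@0.3.3

# Limit the output to one file from the change summary, e.g. package.json:
npm diff --diff=@bilalimamoglu/sift@0.3.1 --diff=@bilalimamoglu/sift@0.3.3 ./package.json
```

Positional path arguments filter the diff to specific files inside the package tarballs.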
package/README.md
CHANGED
@@ -1,436 +1,245 @@
 # sift
 
-
+[](https://www.npmjs.com/package/@bilalimamoglu/sift)
+[](LICENSE)
+[](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
 
-
+<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
 
-
-- `standard` = map
-- `focused` or `rerun --remaining` = zoom
-- raw traceback = last resort
-
-It is a good fit when a human, agent, or CI job needs the answer faster than it needs the whole log.
-
-Common uses:
-- test failures
-- typecheck failures
-- lint failures
-- build logs
-- `git diff`
-- `npm audit`
-- `terraform plan`
+Your AI agent should not be reading 13,000 lines of test output.
 
-
-
-- the command is interactive or TUI-based
-- shell behavior depends on exact raw command output
-
-## Install
-
-Requires Node.js 20 or later.
+**Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.
+**After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.
 
 ```bash
-
-```
-
-## One-time setup
-
-The easiest setup path is:
-
-```bash
-sift config setup
+sift exec --preset test-status -- pytest -q
 ```
 
-That writes a machine-wide config to:
-
 ```text
-
+- Tests did not pass.
+- 3 tests failed. 125 errors occurred.
+- Shared blocker: 125 errors share the same root cause - a missing test environment variable.
+  Anchor: tests/conftest.py
+  Fix: Set the required env var before rerunning DB-isolated tests.
+- Contract drift: 3 snapshot tests are out of sync with the current API or model state.
+  Anchor: tests/contracts/test_feature_manifest_freeze.py
+  Fix: Regenerate the snapshots if the changes are intentional.
+- Decision: stop and act.
 ```
 
-
+If 125 tests fail for one reason, the agent should pay for that reason once.
 
-
+## Who is this for
 
-
-
-sift
-```
+Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
+
+`sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
 
-
+## Install
 
 ```bash
-
-export SIFT_BASE_URL=https://api.openai.com/v1
-export SIFT_MODEL=gpt-5-nano
-export OPENAI_API_KEY=your_openai_api_key
+npm install -g @bilalimamoglu/sift
 ```
 
-
+Requires Node.js 20+.
 
-
-export SIFT_PROVIDER=openrouter
-export OPENROUTER_API_KEY=your_openrouter_api_key
-```
+## Quick start
 
-
+Guided setup writes a machine-wide config and verifies the provider:
 
 ```bash
+sift config setup
 sift doctor
 ```
 
-
-
-The default path is simple:
-1. run the noisy command through `sift`
-2. read the short `standard` answer first
-3. only zoom in if `standard` clearly tells you more detail is still worth it
+Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
 
-
+Then run noisy commands through `sift`:
 
 ```bash
+sift exec --preset test-status -- <test command>
 sift exec "what changed?" -- git diff
-sift exec --preset test-status -- pytest -q
-sift rerun
-sift rerun --remaining --detail focused
-sift rerun --remaining --detail verbose --show-raw
-sift watch "what changed between cycles?" < watcher-output.txt
-sift exec --watch "what changed between cycles?" -- node watcher.js
-sift exec --preset typecheck-summary -- npm run typecheck
-sift exec --preset lint-failures -- eslint .
 sift exec --preset audit-critical -- npm audit
 sift exec --preset infra-risk -- terraform plan
-sift agent install codex --dry-run
 ```
 
-
+Useful flags:
+- `--dry-run` to preview the reduced input and prompt without calling a provider
+- `--show-raw` to print captured raw output to `stderr`
+- `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
 
-
+If you prefer environment variables instead of setup:
 
 ```bash
-
-
-
-
-
-Mental model:
-- `sift escalate` = same cached output, deeper render
-- `sift rerun` = rerun the cached full command at `standard` and prepend what resolved, remained, or changed
-- `sift rerun --remaining` = rerun only the remaining failing pytest node IDs for a zoomed-in view
-- `sift watch` / `sift exec --watch` = treat redraw-style output as cycles and summarize what changed
-- `Decision: stop and act` = trust the current diagnosis and go read or fix code
-- `Decision: zoom` = one deeper sift pass is justified before raw
-- `Decision: raw only if exact traceback is required` = raw is last resort, not the next default step
-
-If your project uses `pytest`, `vitest`, `jest`, `bun test`, or another test runner instead of `npm test`, use the same preset with that command.
-
-What `sift` does in `exec` mode:
-1. runs the child command
-2. captures `stdout` and `stderr`
-3. keeps the useful signal
-4. returns a short answer or JSON
-5. preserves the child command exit code
-
-Useful debug flags:
-- `--dry-run`: show the reduced input and prompt without calling the provider
-- `--show-raw`: print the captured raw input to `stderr`
-
-## When tests fail
+# OpenAI
+export SIFT_PROVIDER=openai
+export SIFT_BASE_URL=https://api.openai.com/v1
+export SIFT_MODEL=gpt-5-nano
+export OPENAI_API_KEY=your_openai_api_key
 
-
+# OpenRouter
+export SIFT_PROVIDER=openrouter
+export OPENROUTER_API_KEY=your_openrouter_api_key
 
-
-
+# Any OpenAI-compatible endpoint
+export SIFT_PROVIDER=openai-compatible
+export SIFT_BASE_URL=https://your-endpoint/v1
+export SIFT_PROVIDER_API_KEY=your_api_key
 ```
 
-
+## How it works
 
-
+`sift` follows a cheapest-first pipeline:
 
-
-
-
-
-
-5. `sift rerun --remaining --detail verbose --show-raw`
-6. raw pytest only if exact traceback lines are still needed
+1. Capture command output.
+2. Sanitize sensitive-looking material.
+3. Apply local heuristics for known failure shapes.
+4. Escalate to a cheaper provider only if needed.
+5. Return a short diagnosis to the main agent.
 
-The
+The core abstraction is a **bucket** — one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
 
-
+It also returns a decision signal:
+- `stop and act` when the diagnosis is already actionable
+- `zoom` when one deeper pass is justified
+- raw logs only as a last resort
 
-
+The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
 
-
+## Built-in presets
 
-
+Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.
 
-
+| Preset | Heuristic | What it does |
+|--------|-----------|-------------|
+| `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
+| `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
+| `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
+| `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
+| `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
+| `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
+| `diff-summary` | Provider | Summarizes changes and risks in diff output. |
+| `log-errors` | Provider | Extracts top error signals from log output. |
 
-
+Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
 
 ```bash
-sift exec --preset
-sift
-sift
+sift exec --preset typecheck-summary -- npx tsc --noEmit
+sift exec --preset lint-failures -- npx eslint src/
+sift exec --preset build-failure -- npm run build
+sift exec --preset audit-critical -- npm audit
+sift exec --preset infra-risk -- terraform plan
 ```
 
-
-- `remaining_summary` and `resolved_summary` keep the answer small
-- `read_targets` points to the first file or line worth reading
-- `read_targets.context_hint` can tell an agent to read only a small line window first
-- if `context_hint` only includes `search_hint`, search for that string before reading the whole file
-- `remaining_subset_available` tells you whether `sift rerun --remaining` can zoom safely
+On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
 
-
-
-
-sift exec --preset test-status --goal diagnose --format json --include-test-ids -- pytest -q
+```text
+[sift: heuristic • LLM skipped • summary 47ms]
+[sift: provider • LLM used • 380 tokens • summary 1.2s]
 ```
 
-
-
-## Watch mode
-
-Use watch mode when command output redraws or repeats and you care about cycle-to-cycle change summaries more than the raw stream:
+Suppress the footer with `--quiet`:
 
 ```bash
-sift
-sift exec --watch "what changed between cycles?" -- node watcher.js
-sift exec --watch --preset test-status -- pytest -f
+sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
 ```
 
-
-- cycle 1 = current state
-- later cycles = what changed, what resolved, what stayed, and the next best action
-- for `test-status`, resolved tests drop out and remaining failures stay in focus
-
-If the stream clearly looks like a redraw/watch session, `sift` can auto-switch to watch handling and prints a short stderr note when it does.
+## Test debugging workflow
 
-
+This is where `sift` is strongest today.
 
-
-
-
-
-Available detail levels:
-
-- `standard`
-  - short default summary
-  - no file list
-- `focused`
-  - groups failures by error type
-  - shows a few representative failing tests or modules
-- `verbose`
-  - flat list of visible failing tests or modules and their normalized reason
-  - useful when Codex needs to know exactly what to fix first
-
-Examples:
-
-```bash
-sift exec --preset test-status -- npm test
-sift rerun
-sift rerun --remaining --detail focused
-sift rerun --remaining --detail verbose
-sift rerun --remaining --detail verbose --show-raw
-```
+Think of it like this:
+- `standard` = map
+- `focused` = zoom
+- raw traceback = last resort
 
-
+Typical loop:
 
 ```bash
-sift exec --preset test-status --
+sift exec --preset test-status -- <test command>
 sift rerun
 sift rerun --remaining --detail focused
-sift rerun --remaining --detail verbose --show-raw
 ```
 
-
-
-Typical shapes:
+If `standard` already gives you the root cause, anchor, and fix, stop there and act.
 
-`
-```text
-- Tests did not complete.
-- 114 errors occurred during collection.
-- Import/dependency blocker: repeated collection failures are caused by missing dependencies.
-- Anchor: path/to/failing_test.py
-- Fix: Install the missing dependencies and rerun the affected tests.
-- Decision: stop and act. Do not escalate unless you need exact traceback lines.
-- Next: Fix bucket 1 first, then rerun the full suite at standard.
-- Stop signal: diagnosis complete; raw not needed.
-```
-
-`standard` can also separate more than one failure family in a single pass:
-```text
-- Tests did not pass.
-- 3 tests failed. 124 errors occurred.
-- Shared blocker: DB-isolated tests are missing a required test env var.
-- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
-- Fix: Set the required test env var and rerun the suite.
-- Contract drift: snapshot expectations are out of sync with the current API or model state.
-- Anchor: search <route-or-entity> in path/to/freeze_test.py
-- Fix: Review the drift and regenerate the snapshots if the change is intentional.
-- Decision: stop and act. Do not escalate unless you need exact traceback lines.
-- Next: Fix bucket 1 first, then rerun the full suite at standard. Secondary buckets are already visible behind it.
-- Stop signal: diagnosis complete; raw not needed.
-```
-
-`focused`
-```text
-- Tests did not complete.
-- 114 errors occurred during collection.
-- Import/dependency blocker: missing dependencies are blocking collection.
-- Missing modules include <module-a>, <module-b>.
-- path/to/test_a.py -> missing module: <module-a>
-- path/to/test_b.py -> missing module: <module-b>
-- Hint: Install the missing dependencies and rerun the affected tests.
-- Next: Fix bucket 1 first, then rerun the full suite at standard.
-- Stop signal: diagnosis complete; raw not needed.
-```
-
-`verbose`
-```text
-- Tests did not complete.
-- 114 errors occurred during collection.
-- Import/dependency blocker: missing dependencies are blocking collection.
-- path/to/test_a.py -> missing module: <module-a>
-- path/to/test_b.py -> missing module: <module-b>
-- path/to/test_c.py -> missing module: <module-c>
-- Hint: Install the missing dependencies and rerun the affected tests.
-- Next: Fix bucket 1 first, then rerun the full suite at standard.
-- Stop signal: diagnosis complete; raw not needed.
-```
-
-Recommended debugging order for tests:
-1. Use `standard` for the full suite first.
-2. Treat `standard` as the map. If it already shows bucket-level root cause, `Anchor`, and `Fix`, trust it and report or act from there directly.
-3. Use `sift escalate` only when you want a deeper render of the same cached output without rerunning the command.
-4. After fixing something, run `sift rerun` to refresh the full-suite truth at `standard`.
-5. Only then use `sift rerun --remaining --detail focused` as the zoom lens after the full-suite truth is refreshed.
-6. Then use `sift rerun --remaining --detail verbose`.
-7. Then use `sift rerun --remaining --detail verbose --show-raw`.
-8. Fall back to the raw pytest command only if you still need exact traceback lines for the remaining failing subset.
-
-## Built-in presets
-
-- `test-status`: summarize test runs
-- `typecheck-summary`: group blocking type errors by root cause
-- `lint-failures`: group repeated lint violations and highlight the files or rules that matter
-- `audit-critical`: extract only high and critical vulnerabilities
-- `infra-risk`: return a safety verdict for infra changes
-- `diff-summary`: summarize code changes and risks
-- `build-failure`: explain the most likely build failure
-- `log-errors`: extract the most relevant error signals
-
-List or inspect them:
-
-```bash
-sift presets list
-sift presets show test-status
-```
+`sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
 
 ## Agent setup
 
-
-
-Repo scope is the default because it is safer:
+`sift` can install a managed instruction block so coding agents use it by default for long command output:
 
 ```bash
-sift agent show codex
-sift agent show codex --raw
-sift agent install codex --dry-run
-sift agent install codex --dry-run --raw
-sift agent install codex
 sift agent install claude
+sift agent install codex
 ```
 
-
-
-```bash
-sift agent install codex --scope global
-sift agent install claude --scope global
-```
-
-Useful commands:
+This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.
 
 ```bash
 sift agent status
-sift agent
+sift agent show claude
 sift agent remove claude
 ```
 
-`sift
-
-What the installer does:
-- writes to `AGENTS.md` or `CLAUDE.md` by default in the current repo
-- uses marked managed blocks instead of rewriting the whole file
-- preserves your surrounding notes and instructions
-- can use global files when you explicitly choose `--scope global`
-- keeps previews short by default
-- shows the exact managed block or final dry-run content only with `--raw`
+## Where `sift` helps most
 
-
--
--
--
--
-- after a fix, refresh the truth with `sift rerun`
-- only then zoom into the remaining failing pytest subset with `sift rerun --remaining --detail focused`, then `verbose`, then `--show-raw`
-- fall back to the raw test command only when exact traceback lines are still needed
+`sift` is strongest when output is:
+- long
+- repetitive
+- triage-heavy
+- shaped by a small number of root causes
 
-
+Good fits:
+- large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
+- `tsc` type errors and `eslint` lint failures (deterministic heuristics)
+- build failures from webpack, esbuild, cargo, go, gcc
+- `npm audit` and `terraform plan` (deterministic heuristics)
+- repeated CI blockers
+- noisy diffs and log streams
 
-
+## Where it helps less
 
-
-
-
-
+`sift` adds less value when:
+- the output is already short and obvious
+- the command is interactive or TUI-based
+- the exact raw log matters
+- the output does not expose enough evidence for reliable grouping
 
-
-- `audit-critical`
-- `infra-risk`
+When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
 
-##
+## Benchmark
 
-
+On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
 
-
-
-
-
-
-sift config validate
-sift doctor
-```
+| Metric | Raw agent | sift-first | Reduction |
+|--------|-----------|------------|-----------|
+| Tokens | 305K | 600 | 99.8% |
+| Tool calls | 16 | 7 | 56% |
+| Diagnosis | Same | Same | — |
 
-
+The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.
 
-
-1. CLI flags
-2. environment variables
-3. repo-local `sift.config.yaml` or `sift.config.yml`
-4. machine-wide `~/.config/sift/config.yaml` or `~/.config/sift/config.yml`
-5. built-in defaults
+Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
 
-##
+## Configuration
 
-
+Inspect and validate config with:
 
 ```bash
-
-
+sift config show
+sift config show --show-secrets
+sift config validate
 ```
 
-
-- command-output budget as the primary benchmark
-- deterministic recipe-budget comparisons as supporting evidence only
-- live-session scorecards for captured mixed full-suite agent transcripts
+To switch between saved providers without editing files:
 
-
-
-
+```bash
+sift config use openai
+sift config use openrouter
+```
 
-Minimal config
+Minimal YAML config:
 
 ```yaml
 provider:
@@ -449,63 +258,10 @@ runtime:
   rawFallback: true
 ```
 
-##
-
-Use `provider: openai` for `api.openai.com`.
-
-Use `provider: openrouter` for the native OpenRouter path. It defaults to:
-- `baseUrl: https://openrouter.ai/api/v1`
-- `model: openrouter/free`
-
-Use `provider: openai-compatible` for third-party compatible gateways or self-hosted endpoints.
-
-For OpenAI:
-```bash
-export OPENAI_API_KEY=your_openai_api_key
-```
-
-For OpenRouter:
-```bash
-export OPENROUTER_API_KEY=your_openrouter_api_key
-```
-
-For third-party compatible endpoints, use either the endpoint-native env var or:
-
-```bash
-export SIFT_PROVIDER_API_KEY=your_provider_api_key
-```
-
-Known compatible env fallbacks include:
-- `OPENROUTER_API_KEY`
-- `TOGETHER_API_KEY`
-- `GROQ_API_KEY`
-
-## Safety and limits
-
-- redaction is optional and regex-based
-- retriable provider failures such as `429`, timeouts, and `5xx` are retried once
-- `sift exec` detects simple prompt-like output such as `[y/N]` or `password:` and skips reduction
-- pipe mode does not preserve upstream shell pipeline failures; use `set -o pipefail` if you need that behavior
-
-## Releasing
-
-This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
-
-Release flow:
-1. bump `package.json`
-2. merge to `main`
-3. run the `release` workflow manually
-
-The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
-
-## Brand assets
-
-Curated public logo assets live in `assets/brand/`.
+## Docs
 
-
--
-- icon-only: teal, black, monochrome
-- 24px icon: teal, black, monochrome
+- CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+- Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
 
 ## License
 