@bilalimamoglu/sift 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,36 +1,74 @@
1
1
  # sift
2
2
 
3
- <img src="assets/brand/sift-logo-minimal-monochrome.svg" alt="sift logo" width="120" />
3
+ [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
4
+ [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
5
+ [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
4
6
 
5
- Most command output is long and noisy, but the thing you actually need to know is short: what failed, where, and what to do next. `sift` runs the command for you, captures the output, and gives you a short answer instead of a wall of text.
7
+ <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
6
8
 
7
- It works with test suites, build logs, `git diff`, `npm audit`, `terraform plan` — anything where the signal is buried in noise. It always tries the cheapest approach first and only escalates when needed. Exit codes are preserved.
9
+ Your AI agent should not be reading 13,000 lines of test output.
8
10
 
9
- Skip it when:
10
- - you need the exact raw log
11
- - the command is interactive or TUI-based
12
- - the output is already short
11
+ **Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.
12
+ **After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.
13
13
 
14
- ## Install
14
+ ```bash
15
+ sift exec --preset test-status -- pytest -q
16
+ ```
17
+
18
+ ```text
19
+ - Tests did not pass.
20
+ - 3 tests failed. 125 errors occurred.
21
+ - Shared blocker: 125 errors share the same root cause - a missing test environment variable.
22
+ Anchor: tests/conftest.py
23
+ Fix: Set the required env var before rerunning DB-isolated tests.
24
+ - Contract drift: 3 snapshot tests are out of sync with the current API or model state.
25
+ Anchor: tests/contracts/test_feature_manifest_freeze.py
26
+ Fix: Regenerate the snapshots if the changes are intentional.
27
+ - Decision: stop and act.
28
+ ```
29
+
30
+ If 125 tests fail for one reason, the agent should pay for that reason once.
31
+
32
+ ## Who is this for
33
+
34
+ Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
15
35
 
16
- Requires Node.js 24 or later.
36
+ `sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
37
+
38
+ ## Install
17
39
 
18
40
  ```bash
19
41
  npm install -g @bilalimamoglu/sift
20
42
  ```
21
43
 
22
- ## Setup
44
+ Requires Node.js 20+.
23
45
 
24
- The interactive setup writes a machine-wide config and walks you through provider selection:
46
+ ## Quick start
47
+
48
+ Guided setup writes a machine-wide config and verifies the provider:
25
49
 
26
50
  ```bash
27
51
  sift config setup
28
- sift doctor # verify it works
52
+ sift doctor
29
53
  ```
30
54
 
31
- Config is saved to `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
55
+ Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
32
56
 
33
- If you prefer environment variables instead:
57
+ Then run noisy commands through `sift`:
58
+
59
+ ```bash
60
+ sift exec --preset test-status -- <test command>
61
+ sift exec "what changed?" -- git diff
62
+ sift exec --preset audit-critical -- npm audit
63
+ sift exec --preset infra-risk -- terraform plan
64
+ ```
65
+
66
+ Useful flags:
67
+ - `--dry-run` to preview the reduced input and prompt without calling a provider
68
+ - `--show-raw` to print captured raw output to `stderr`
69
+ - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
70
+
71
+ If you prefer environment variables instead of setup:
34
72
 
35
73
  ```bash
36
74
  # OpenAI
@@ -39,209 +77,167 @@ export SIFT_BASE_URL=https://api.openai.com/v1
39
77
  export SIFT_MODEL=gpt-5-nano
40
78
  export OPENAI_API_KEY=your_openai_api_key
41
79
 
42
- # or OpenRouter
80
+ # OpenRouter
43
81
  export SIFT_PROVIDER=openrouter
44
82
  export OPENROUTER_API_KEY=your_openrouter_api_key
45
83
 
46
- # or any OpenAI-compatible endpoint (Together, Groq, self-hosted, etc.)
84
+ # Any OpenAI-compatible endpoint
47
85
  export SIFT_PROVIDER=openai-compatible
48
86
  export SIFT_BASE_URL=https://your-endpoint/v1
49
87
  export SIFT_PROVIDER_API_KEY=your_api_key
50
88
  ```
51
89
 
52
- To switch between saved providers without editing files:
90
+ ## How it works
53
91
 
54
- ```bash
55
- sift config use openai
56
- sift config use openrouter
57
- ```
92
+ `sift` follows a cheapest-first pipeline:
58
93
 
59
- ## Usage
94
+ 1. Capture command output.
95
+ 2. Sanitize sensitive-looking material.
96
+ 3. Apply local heuristics for known failure shapes.
97
+ 4. Escalate to a cheaper provider only if needed.
98
+ 5. Return a short diagnosis to the main agent.
60
99
 
61
- Run a noisy command through `sift`, read the short answer, and only zoom in if it tells you to:
100
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
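The idea fits in a few lines. The signature normalization below is a guess at the approach for illustration, not sift's actual grouping logic:

```python
from collections import Counter

def signature(failure: str) -> str:
    # Normalize a failure line down to its root cause, dropping
    # per-test detail (illustrative; real grouping is more involved).
    return failure.split(":", 1)[0]

failures = ["KeyError: TEST_DB_URL"] * 125 + ["AssertionError: snapshot mismatch"] * 3
buckets = Counter(signature(f) for f in failures)
# 128 failures collapse into 2 buckets, each paid for once
```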
62
101
 
63
- ```bash
64
- sift exec --preset test-status -- pytest -q
65
- sift exec "what changed?" -- git diff
66
- sift exec --preset audit-critical -- npm audit
67
- sift exec --preset infra-risk -- terraform plan
68
- ```
102
+ It also returns a decision signal:
103
+ - `stop and act` when the diagnosis is already actionable
104
+ - `zoom` when one deeper pass is justified
105
+ - raw logs only as a last resort
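A confidence-gated version of that signal might look like the following. The thresholds are invented for illustration; sift does not document its actual cutoffs:

```python
def decision(confidence: float) -> str:
    # Invented thresholds, purely to show the shape of the gate.
    if confidence >= 0.8:
        return "stop and act"   # diagnosis is already actionable
    if confidence >= 0.5:
        return "zoom"           # one deeper pass is justified
    return "read raw"           # last resort
```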
69
106
 
70
- `sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.
107
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
71
108
 
72
- Useful flags:
73
- - `--dry-run`: show the reduced input and prompt without calling the provider
74
- - `--show-raw`: print the captured raw output to `stderr`
75
-
76
- ## Test debugging workflow
109
+ ## Built-in presets
77
110
 
78
- This is the most common use case and where `sift` adds the most value.
111
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.
79
112
 
80
- Think of it like this:
81
- - `standard` = map
82
- - `focused` or `rerun --remaining` = zoom
83
- - raw traceback = last resort
113
+ | Preset | Heuristic | What it does |
114
+ |--------|-----------|-------------|
115
+ | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
116
+ | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
117
+ | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
118
+ | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
119
+ | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
120
+ | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
121
+ | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
122
+ | `log-errors` | Provider | Extracts top error signals from log output. |
84
123
 
85
- For most repos, the whole story is:
124
+ Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
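Grouping `tsc` diagnostics by error code, as the `typecheck-summary` row describes, boils down to something like this. The regex targets tsc's standard `file(line,col): error TSxxxx:` format; the sample diagnostics are made up:

```python
import re
from collections import Counter

TSC_LINE = re.compile(r"error (TS\d+):")

diagnostics = """\
src/app.ts(10,5): error TS2345: Argument of type 'string' is not assignable.
src/app.ts(22,1): error TS2345: Argument of type 'number' is not assignable.
src/util.ts(3,9): error TS7006: Parameter 'x' implicitly has an 'any' type.
"""

# Two TS2345 occurrences merge into one bullet; TS7006 gets its own.
by_code = Counter(TSC_LINE.findall(diagnostics))
```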
86
125
 
87
126
  ```bash
88
- sift exec --preset test-status -- <test command> # get the map
89
- sift rerun # after a fix, refresh the truth
90
- sift rerun --remaining --detail focused # zoom into what's still failing
127
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
128
+ sift exec --preset lint-failures -- npx eslint src/
129
+ sift exec --preset build-failure -- npm run build
130
+ sift exec --preset audit-critical -- npm audit
131
+ sift exec --preset infra-risk -- terraform plan
91
132
  ```
92
133
 
93
- `test-status` becomes test-aware because you chose the preset. It does **not** infer "this is a test command" from the runner name; use the same preset with `pytest`, `vitest`, `jest`, `bun test`, or any other runner.
134
+ On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
94
135
 
95
- If `standard` already names the failure buckets, counts, and hints, stop there and read code. If it ends with `Decision: zoom`, do one deeper pass before falling back to raw traceback.
136
+ ```text
137
+ [sift: heuristic • LLM skipped • summary 47ms]
138
+ [sift: provider • LLM used • 380 tokens • summary 1.2s]
139
+ ```
96
140
 
97
- ### What `sift` returns for each failure family
141
+ Suppress the footer with `--quiet`:
98
142
 
99
- - `Shared blocker` — one setup problem affecting many tests
100
- - A named family such as import, timeout, network, migration, or assertion
101
- - `Anchor` — the first file, line window, or search term worth opening
102
- - `Fix` — the likely next move
103
- - `Decision` — whether to stop here or zoom one step deeper
104
- - `Next` — the smallest practical action
143
+ ```bash
144
+ sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
145
+ ```
105
146
 
106
- ### Detail levels
147
+ ## Test debugging workflow
107
148
 
108
- - `standard` short summary, no file list (default)
109
- - `focused` — groups failures by error type, shows a few representative tests
110
- - `verbose` — flat list of all visible failing tests with their normalized reason
149
+ This is where `sift` is strongest today.
111
150
 
112
- ### Example output
151
+ Think of it like this:
152
+ - `standard` = map
153
+ - `focused` = zoom
154
+ - raw traceback = last resort
113
155
 
114
- Single failure family:
115
- ```text
116
- - Tests did not complete.
117
- - 114 errors occurred during collection.
118
- - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
119
- - Anchor: path/to/failing_test.py
120
- - Fix: Install the missing dependencies and rerun the affected tests.
121
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
122
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
123
- ```
156
+ Typical loop:
124
157
 
125
- Multiple failure families in one pass:
126
- ```text
127
- - Tests did not pass.
128
- - 3 tests failed. 124 errors occurred.
129
- - Shared blocker: DB-isolated tests are missing a required test env var.
130
- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
131
- Fix: Set the required test env var and rerun the suite.
132
- - Contract drift: snapshot expectations are out of sync with the current API or model state.
133
- Anchor: search <route-or-entity> in path/to/freeze_test.py
134
- Fix: Review the drift and regenerate the snapshots if the change is intentional.
135
- - Decision: stop and act.
136
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
158
+ ```bash
159
+ sift exec --preset test-status -- <test command>
160
+ sift rerun
161
+ sift rerun --remaining --detail focused
137
162
  ```
138
163
 
139
- ### Recommended debugging order
140
-
141
- 1. `sift exec --preset test-status -- <test command>` — get the map.
142
- 2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
143
- 3. `sift escalate` — deeper render of the same cached output, without rerunning.
144
- 4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
145
- 5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
146
- 6. `sift rerun --remaining --detail verbose`
147
- 7. `sift rerun --remaining --detail verbose --show-raw`
148
- 8. Raw test command only if exact traceback lines are still needed.
164
+ If `standard` already gives you the root cause, anchor, and fix, stop there and act.
149
165
 
150
166
  `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
151
167
 
152
- ### Quick glossary
153
-
154
- - `sift escalate` = same cached output, deeper render
155
- - `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
156
- - `sift rerun --remaining` = rerun only the remaining failing test nodes
157
- - `Decision: stop and act` = trust the diagnosis and go fix code
158
- - `Decision: zoom` = one deeper sift pass is justified before raw
159
-
160
- ## Watch mode
168
+ ## Agent setup
161
169
 
162
- Use watch mode when output redraws or repeats across cycles:
170
+ `sift` can install a managed instruction block so coding agents use it by default for long command output:
163
171
 
164
172
  ```bash
165
- sift watch "what changed between cycles?" < watcher-output.txt
166
- sift exec --watch "what changed between cycles?" -- node watcher.js
167
- sift exec --watch --preset test-status -- pytest -f
173
+ sift agent install claude
174
+ sift agent install codex
168
175
  ```
169
176
 
170
- - cycle 1 = current state
171
- - later cycles = what changed, what resolved, what stayed, and the next best action
172
- - for `test-status`, resolved tests drop out and remaining failures stay in focus
173
-
174
- ## Diagnose JSON
175
-
176
- Start with text. Use JSON only when automation needs machine-readable output:
177
+ This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.
177
178
 
178
179
  ```bash
179
- sift exec --preset test-status --goal diagnose --format json -- pytest -q
180
- sift rerun --goal diagnose --format json
180
+ sift agent status
181
+ sift agent show claude
182
+ sift agent remove claude
181
183
  ```
182
184
 
183
- The JSON is summary-first: `remaining_summary`, `resolved_summary`, `read_targets` with optional `context_hint`, and `remaining_subset_available` to tell you whether `sift rerun --remaining` can zoom safely.
185
+ ## Where `sift` helps most
184
186
 
185
- Add `--include-test-ids` only when you need every raw failing test ID.
187
+ `sift` is strongest when output is:
188
+ - long
189
+ - repetitive
190
+ - triage-heavy
191
+ - shaped by a small number of root causes
186
192
 
187
- ## Built-in presets
193
+ Good fits:
194
+ - large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
195
+ - `tsc` type errors and `eslint` lint failures (deterministic heuristics)
196
+ - build failures from webpack, esbuild, cargo, go, gcc
197
+ - `npm audit` and `terraform plan` (deterministic heuristics)
198
+ - repeated CI blockers
199
+ - noisy diffs and log streams
188
200
 
189
- - `test-status`: summarize test runs
190
- - `typecheck-summary`: group blocking type errors by root cause
191
- - `lint-failures`: group repeated lint violations and highlight the files or rules that matter
192
- - `audit-critical`: extract only high and critical vulnerabilities
193
- - `infra-risk`: return a safety verdict for infra changes
194
- - `diff-summary`: summarize code changes and risks
195
- - `build-failure`: explain the most likely build failure
196
- - `log-errors`: extract the most relevant error signals
201
+ ## Where it helps less
197
202
 
198
- ```bash
199
- sift presets list
200
- sift presets show test-status
201
- ```
203
+ `sift` adds less value when:
204
+ - the output is already short and obvious
205
+ - the command is interactive or TUI-based
206
+ - the exact raw log matters
207
+ - the output does not expose enough evidence for reliable grouping
202
208
 
203
- ## Agent setup
209
+ When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
204
210
 
205
- `sift` can install a managed instruction block so Codex or Claude Code uses `sift` by default for long command output:
211
+ ## Benchmark
206
212
 
207
- ```bash
208
- sift agent install codex
209
- sift agent install claude
210
- ```
213
+ On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
211
214
 
212
- This writes a managed block to `AGENTS.md` or `CLAUDE.md` in the current repo. Use `--dry-run` to preview, or `--scope global` for machine-wide instructions.
215
+ | Metric | Raw agent | sift-first | Reduction |
216
+ |--------|-----------|------------|-----------|
217
+ | Tokens | 305K | 600 | 99.8% |
218
+ | Tool calls | 16 | 7 | 56% |
219
+ | Diagnosis | Same | Same | — |
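The reduction percentages follow directly from the raw counts in the table; a quick arithmetic check:

```python
tokens_raw, tokens_sift = 305_000, 600
calls_raw, calls_sift = 16, 7

token_reduction = (tokens_raw - tokens_sift) / tokens_raw   # 0.998...
call_reduction = (calls_raw - calls_sift) / calls_raw       # 0.5625

assert round(token_reduction * 100, 1) == 99.8
assert round(call_reduction * 100) == 56
```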
213
220
 
214
- ```bash
215
- sift agent status
216
- sift agent remove codex
217
- sift agent remove claude
218
- ```
221
+ The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.
219
222
 
220
- ## CI usage
223
+ Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
221
224
 
222
- Some commands succeed technically but should still block CI. `--fail-on` handles that:
225
+ ## Configuration
223
226
 
224
- ```bash
225
- sift exec --preset audit-critical --fail-on -- npm audit
226
- sift exec --preset infra-risk --fail-on -- terraform plan
227
- ```
228
-
229
- ## Config
227
+ Inspect and validate config with:
230
228
 
231
229
  ```bash
232
- sift config show # masks secrets by default
230
+ sift config show
233
231
  sift config show --show-secrets
234
232
  sift config validate
235
233
  ```
236
234
 
237
- Config precedence:
238
- 1. CLI flags
239
- 2. environment variables
240
- 3. repo-local `sift.config.yaml`
241
- 4. machine-wide `~/.config/sift/config.yaml`
242
- 5. built-in defaults
235
+ To switch between saved providers without editing files:
243
236
 
244
- If you pass `--config <path>`, that path is strict — missing paths are errors.
237
+ ```bash
238
+ sift config use openai
239
+ sift config use openrouter
240
+ ```
245
241
 
246
242
  Minimal YAML config:
247
243
 
@@ -262,37 +258,10 @@ runtime:
262
258
  rawFallback: true
263
259
  ```
264
260
 
265
- ## Safety and limits
266
-
267
- - redaction is optional and regex-based
268
- - retriable provider failures (`429`, timeouts, `5xx`) are retried once
269
- - `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
270
- - pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
271
-
272
- ## Releasing
273
-
274
- This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
275
-
276
- 1. bump `package.json`
277
- 2. merge to `main`
278
- 3. run the `release` workflow manually
279
-
280
- The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
281
-
282
- Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub generated notes.
283
-
284
- ## Maintainer benchmark
285
-
286
- ```bash
287
- npm run bench:test-status-ab
288
- npm run bench:test-status-live
289
- ```
290
-
291
- Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
292
-
293
- ## Brand assets
261
+ ## Docs
294
262
 
295
- Logo assets live in `assets/brand/`: badge/app, icon-only, and 24px icon variants in teal, black, and monochrome.
263
+ - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
264
+ - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
296
265
 
297
266
  ## License
298
267