@bilalimamoglu/sift 0.4.1 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +63 -217
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -4,23 +4,19 @@
  [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
  [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

- <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
+ Turn 13,000 lines of test output into 2 root causes.

- Your AI agent should not be reading 13,000 lines of test output.
+ Your agent reads a diagnosis, not a log file.

- If 125 tests fail for one reason, it should pay for that reason once.
+ <p align="center">
+ <img src="assets/readme/test-status-demo.gif" alt="sift turning a pytest failure wall into a short diagnosis" width="960" />
+ </p>

- `sift` turns noisy command output into a short, structured diagnosis for coding agents, so they spend fewer tokens, cost less to run, and move through debug loops faster.
+ ## Before / After

- Instead of feeding an agent thousands of lines of logs, you give it:
- - the root cause
- - where it happens
- - what to fix
- - what to do next
+ 128 test failures. 13,000 lines of logs. The agent reads all of it.

- ```bash
- sift exec --preset test-status -- pytest -q
- ```
+ With `sift`, it reads this instead:

  ```text
  - Tests did not pass.
@@ -34,14 +30,18 @@ sift exec --preset test-status -- pytest -q
  - Decision: stop and act.
  ```

- On the largest real fixture in the benchmark:
- `198K` raw-output tokens -> `129` `standard` tokens.
+ Same diagnosis. One run compressed from 198,000 tokens to 129.

- Same diagnosis. Far less work.
+ ## Not just tests

- ## What it is
+ The same idea applies across noisy dev workflows:

- `sift` sits between a noisy command and a coding agent. It captures output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal.
+ - **Type errors** → grouped by error code, no model call
+ - **Lint output** → grouped by rule, no model call
+ - **Build failures** → first real error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - **`npm audit`** → high/critical vulnerabilities only, no model call
+ - **`terraform plan`** → destructive risk detection, no model call
+ - **Diffs and logs** → compressed through a cheaper model before reaching your agent

  ## Install

@@ -51,255 +51,101 @@ npm install -g @bilalimamoglu/sift

  Requires Node.js 20+.

- ## Try it in 60 seconds
-
- If you already have an API key, you can try `sift` without any setup wizard:
+ ## Try it

  ```bash
- export OPENAI_API_KEY=your_openai_api_key
  sift exec --preset test-status -- pytest -q
+ sift exec --preset test-status -- npx vitest run
+ sift exec --preset test-status -- npx jest
  ```

- You can also use a freeform prompt for non-test output:
-
- ```bash
- sift exec "what changed?" -- git diff
- ```
-
- ## Set it up for daily use
-
- Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
-
- ```bash
- sift config setup
- sift doctor
- ```
-
- Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
-
- If you want your coding agent to use `sift` automatically, install the managed instruction block too:
-
- ```bash
- sift agent install codex
- sift agent install claude
- ```
-
- Then run noisy commands through `sift`:
+ Other workflows:

  ```bash
- sift exec --preset test-status -- <test command>
- sift exec "what changed?" -- git diff
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
+ sift exec --preset lint-failures -- npx eslint src/
+ sift exec --preset build-failure -- npm run build
  sift exec --preset audit-critical -- npm audit
  sift exec --preset infra-risk -- terraform plan
+ sift exec "what changed?" -- git diff
  ```

- Useful flags:
- - `--dry-run` to preview the reduced input and prompt without calling a provider
- - `--show-raw` to print captured raw output to `stderr`
- - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
-
- If you prefer environment variables instead of setup:
-
- ```bash
- # OpenAI
- export SIFT_PROVIDER=openai
- export SIFT_BASE_URL=https://api.openai.com/v1
- export SIFT_MODEL=gpt-5-nano
- export OPENAI_API_KEY=your_openai_api_key
-
- # OpenRouter
- export SIFT_PROVIDER=openrouter
- export OPENROUTER_API_KEY=your_openrouter_api_key
-
- # Any OpenAI-compatible endpoint
- export SIFT_PROVIDER=openai-compatible
- export SIFT_BASE_URL=https://your-endpoint/v1
- export SIFT_PROVIDER_API_KEY=your_api_key
- ```
-
- ## Why it helps
-
- The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
-
- Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
- - a label
- - an affected count
- - an anchor
- - a likely fix
- - a decision signal
-
- That changes the agent's job from "figure out what happened" to "act on the diagnosis."
-
  ## How it works

- `sift` follows a cheapest-first pipeline:
-
- 1. Capture command output.
- 2. Sanitize sensitive-looking material.
- 3. Apply local heuristics for known failure shapes.
- 4. Escalate to a cheaper provider only if needed.
- 5. Return a short diagnosis to the main agent.
+ `sift` sits between a noisy command and a coding agent.

- It also returns a decision signal:
- - `stop and act` when the diagnosis is already actionable
- - `zoom` when one deeper pass is justified
- - raw logs only as a last resort
+ 1. Capture output.
+ 2. Run local heuristics for known failure shapes.
+ 3. If heuristics are confident, return the diagnosis. No model call.
+ 4. If not, call a cheaper model — not your agent's.

- For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+ The agent gets the root cause, where it happens, and what to do next.

- The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
+ So your agent spends tokens fixing, not reading.
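The cheapest-first dispatch described in the numbered steps above can be sketched roughly as follows. This is a hypothetical illustration, not sift's actual code; `pytest_summary_heuristic` and `reduce_output` are invented names, and the recognizer is a toy.

```python
# Hypothetical sketch of a cheapest-first pipeline: deterministic
# recognizers run first, and a model is consulted only when none of
# them claims the output.
from typing import Callable, Optional

# A heuristic returns a diagnosis string, or None if it does not
# recognize the output format.
Heuristic = Callable[[str], Optional[str]]

def pytest_summary_heuristic(output: str) -> Optional[str]:
    # Toy recognizer for a pytest summary line like "3 failed, 510 passed".
    for line in output.splitlines():
        if " failed" in line and " passed" in line:
            count = line.split(" failed")[0].split()[-1]
            return f"- Tests did not pass. {count} failure(s). Decision: stop and act."
    return None

def reduce_output(output: str, heuristics: list[Heuristic],
                  model_fallback: Callable[[str], str]) -> str:
    for heuristic in heuristics:
        diagnosis = heuristic(output)
        if diagnosis is not None:
            return diagnosis          # confident local parse: no model call
    return model_fallback(output)     # escalate to a cheaper model

log = "=== 3 failed, 510 passed in 42.1s ==="
print(reduce_output(log, [pytest_summary_heuristic], lambda o: "(cheap-model summary)"))
```

The design point is the ordering: the deterministic path is free and fast, so the model only pays for output the heuristics cannot classify.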
 
  ## Built-in presets

- Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
-
- | Preset | Heuristic | What it does |
- |--------|-----------|-------------|
- | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
- | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
- | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
- | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
- | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
- | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
- | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
- | `log-errors` | Provider | Extracts top error signals from log output. |
-
- Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
-
- ```bash
- sift exec --preset typecheck-summary -- npx tsc --noEmit
- sift exec --preset lint-failures -- npx eslint src/
- sift exec --preset build-failure -- npm run build
- sift exec --preset audit-critical -- npm audit
- sift exec --preset infra-risk -- terraform plan
- ```
-
- On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
-
- ```text
- [sift: heuristic • LLM skipped • summary 47ms]
- [sift: provider • LLM used • 380 tokens • summary 1.2s]
- ```
+ Every preset runs local heuristics first. When the heuristic handles the output, the provider is never called.

- Suppress the footer with `--quiet`:
+ | Preset | What it does |
+ |--------|-------------|
+ | `test-status` | Groups pytest, vitest, jest failures into root-cause buckets with anchors and fix suggestions. 30+ failure patterns. |
+ | `typecheck-summary` | Parses `tsc` output, groups by error code, returns max 5 bullets. No model call. |
+ | `lint-failures` | Parses ESLint output, groups by rule, detects fixable hints. No model call. |
+ | `build-failure` | Extracts first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, `tsc --build`. Falls back to model for unsupported formats. |
+ | `audit-critical` | Extracts high/critical vulnerabilities from `npm audit`. No model call. |
+ | `infra-risk` | Detects destructive signals in `terraform plan`. No model call. |
+ | `diff-summary` | Summarizes changes and risks in diff output. |
+ | `log-errors` | Extracts top error signals from log output. |
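The root-cause bucketing that `test-status` performs can be illustrated with a toy sketch. This is hypothetical, not sift's implementation; `signature` and `bucket_failures` are invented for illustration, and real failure grouping is far more involved than a regex normalization.

```python
# Hypothetical illustration of root-cause bucketing: many failing tests
# that share one underlying error collapse into a single bucket keyed
# by a normalized error signature.
import re
from collections import defaultdict

def signature(error_line: str) -> str:
    # Normalize volatile details (numbers, quoted values) so repeats
    # of the same root cause map to one key.
    sig = re.sub(r"\d+", "N", error_line)
    return re.sub(r"'[^']*'", "'X'", sig)

def bucket_failures(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    # failures: (test name, final error line) pairs
    buckets: dict[str, list[str]] = defaultdict(list)
    for test, error in failures:
        buckets[signature(error)].append(test)
    return buckets

failures = [
    ("test_login", "fixture 'db' not found"),
    ("test_signup", "fixture 'db' not found"),
    ("test_orders", "AssertionError: expected 200, got 500"),
]
buckets = bucket_failures(failures)
for sig, tests in buckets.items():
    print(f"{len(tests)} test(s): {sig}")
```

Here three failing tests reduce to two buckets: one shared fixture problem and one assertion failure, which is the shape an agent can act on directly.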
 
- ```bash
- sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
- ```
+ ## Benchmark

- ## Strongest today
+ End-to-end debug loop on a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):

- `sift` is strongest when output is:
- - long
- - repetitive
- - triage-heavy
- - shaped by a small number of shared root causes
+ | Metric | Without sift | With sift | Reduction |
+ |--------|-------------|-----------|-----------|
+ | Tokens | 52,944 | 20,049 | 62% fewer |
+ | Tool calls | 40.8 | 12 | 71% fewer |
+ | Wall-clock time | 244s | 85s | 65% faster |
+ | Commands | 15.5 | 6 | 61% fewer |
+ | Diagnosis | Same | Same | — |

- Best fits today:
- - large `pytest`, `vitest`, or `jest` runs
- - `tsc` type errors and `eslint` lint failures
- - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
- - `npm audit` and `terraform plan`
- - repeated CI blockers
- - noisy diffs and log streams
+ Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)

  ## Test debugging workflow

- This is where `sift` is strongest today.
-
  Think of it like this:
  - `standard` = map
  - `focused` = zoom
  - raw traceback = last resort

- Typical loop:
-
  ```bash
  sift exec --preset test-status -- <test command>
  sift rerun
  sift rerun --remaining --detail focused
  ```

- If `standard` already gives you the root cause, anchor, and fix, stop there and act.
-
- `sift rerun --remaining` narrows automatically for cached `pytest` runs.
-
- For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
-
- For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
-
- ```bash
- sift agent status
- sift agent show claude
- sift agent remove claude
- ```
-
- ## Where it helps less
-
- `sift` adds less value when:
- - the output is already short and obvious
- - the command is interactive or TUI-based
- - the exact raw log matters
- - the output does not expose enough evidence for reliable grouping
-
- When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
-
- ## Benchmark
-
- On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
-
- | Metric | Raw agent | sift-first | Reduction |
- |--------|-----------|------------|-----------|
- | Tokens | 305K | 600 | 99.8% |
- | Tool calls | 16 | 7 | 56% |
- | Diagnosis | Same | Same | — |
-
- The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
+ If `standard` already gives you the root cause, anchor, and fix, stop and act.

- The end-to-end workflow benchmark is a different metric:
- - `62%` fewer total debugging tokens
- - `71%` fewer tool calls
- - `65%` faster wall-clock time
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs. For `vitest` and `jest`, it reruns the full command and keeps diagnosis focused on what still fails.
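The "remaining" idea, diffing the current run's failures against a cached baseline, can be sketched as a set comparison. This is a hypothetical illustration of the concept, not sift's caching logic; `remaining_failures` is an invented name.

```python
# Hypothetical sketch: compare the current run's failing tests against
# a cached baseline so a rerun diagnosis only covers what still fails.
def remaining_failures(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    return {
        "still_failing": baseline & current,   # unresolved root causes
        "newly_failing": current - baseline,   # regressions since the cached run
        "fixed": baseline - current,           # progress since the cached run
    }

baseline = {"test_login", "test_signup", "test_orders"}
current = {"test_orders", "test_payments"}
print(remaining_failures(baseline, current))
```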
 
- Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
+ ## Setup

- Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
-
- ## Configuration
-
- Inspect and validate config with:
+ Guided setup writes a config, verifies the provider, and makes daily use easier:

  ```bash
- sift config show
- sift config show --show-secrets
- sift config validate
+ sift config setup
+ sift doctor
  ```

- To switch between saved providers without editing files:
+ To wire `sift` into your coding agent automatically:

  ```bash
- sift config use openai
- sift config use openrouter
+ sift agent install claude
+ sift agent install codex
  ```

- Minimal YAML config:
-
- ```yaml
- provider:
-   provider: openai
-   model: gpt-5-nano
-   baseUrl: https://api.openai.com/v1
-   apiKey: YOUR_API_KEY
-
- input:
-   stripAnsi: true
-   redact: false
-   maxCaptureChars: 400000
-   maxInputChars: 60000
-
- runtime:
-   rawFallback: true
- ```
+ Config details: [docs/cli-reference.md](docs/cli-reference.md)

  ## Docs

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@bilalimamoglu/sift",
- "version": "0.4.1",
+ "version": "0.4.2",
  "description": "Agent-first command-output reduction layer for agents, CI, and automation.",
  "type": "module",
  "bin": {