@bilalimamoglu/sift 0.3.2 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,247 +1,279 @@
  # sift

- <img src="assets/brand/sift-logo-minimal-monochrome.svg" alt="sift logo" width="120" />
+ [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
+ [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
+ [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

- Most command output is long and noisy, but the thing you actually need to know is short: what failed, where, and what to do next. `sift` runs the command for you, captures the output, and gives you a short answer instead of a wall of text.
+ <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />

- It works with test suites, build logs, `git diff`, `npm audit`, `terraform plan` — anything where the signal is buried in noise. It always tries the cheapest approach first and only escalates when needed. Exit codes are preserved.
+ Your AI agent should not be reading 13,000 lines of test output.

- Skip it when:
- - you need the exact raw log
- - the command is interactive or TUI-based
- - the output is already short
-
- ## Install
-
- Requires Node.js 24 or later.
+ On the largest real fixture in the benchmark:
+ **Before:** 128 failures, 198K raw-output tokens, agent reconstructs the failure shape from scratch.
+ **After:** 6 lines, 129 `standard` tokens, agent acts on a grouped diagnosis immediately.

  ```bash
- npm install -g @bilalimamoglu/sift
+ sift exec --preset test-status -- pytest -q
  ```

- ## Setup
+ ```text
+ - Tests did not pass.
+ - 3 tests failed. 125 errors occurred.
+ - Shared blocker: 125 errors share the same root cause - a missing test environment variable.
+ Anchor: tests/conftest.py
+ Fix: Set the required env var before rerunning DB-isolated tests.
+ - Contract drift: 3 snapshot tests are out of sync with the current API or model state.
+ Anchor: tests/contracts/test_feature_manifest_freeze.py
+ Fix: Regenerate the snapshots if the changes are intentional.
+ - Decision: stop and act.
+ ```
+
+ If 125 tests fail for one reason, the agent should pay for that reason once.
+
+ ## What it is

- The interactive setup writes a machine-wide config and walks you through provider selection:
+ Built for developers using coding agents such as Claude Code, Codex, Cursor, Windsurf, or Copilot, and for any LLM-driven workflow that runs shell commands and reads the output.
+
+ `sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
+
+ ## Install

  ```bash
- sift config setup
- sift doctor # verify it works
+ npm install -g @bilalimamoglu/sift
  ```

- Config is saved to `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
+ Requires Node.js 20+.

- If you prefer environment variables instead:
+ ## Try it in 60 seconds
+
+ If you already have an API key, you can try `sift` without any setup wizard:

  ```bash
- # OpenAI
- export SIFT_PROVIDER=openai
- export SIFT_BASE_URL=https://api.openai.com/v1
- export SIFT_MODEL=gpt-5-nano
  export OPENAI_API_KEY=your_openai_api_key
+ sift exec --preset test-status -- pytest -q
+ ```

- # or OpenRouter
- export SIFT_PROVIDER=openrouter
- export OPENROUTER_API_KEY=your_openrouter_api_key
+ You can also use a freeform prompt for non-test output:

- # or any OpenAI-compatible endpoint (Together, Groq, self-hosted, etc.)
- export SIFT_PROVIDER=openai-compatible
- export SIFT_BASE_URL=https://your-endpoint/v1
- export SIFT_PROVIDER_API_KEY=your_api_key
+ ```bash
+ sift exec "what changed?" -- git diff
  ```

- To switch between saved providers without editing files:
+ ## Set it up for daily use
+
+ Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:

  ```bash
- sift config use openai
- sift config use openrouter
+ sift config setup
+ sift doctor
  ```

- ## Usage
+ Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.

- Run a noisy command through `sift`, read the short answer, and only zoom in if it tells you to:
+ If you want your coding agent to use `sift` automatically, install the managed instruction block too:

  ```bash
- sift exec --preset test-status -- pytest -q
+ sift agent install codex
+ sift agent install claude
+ ```
+
+ Then run noisy commands through `sift`:
+
+ ```bash
+ sift exec --preset test-status -- <test command>
  sift exec "what changed?" -- git diff
  sift exec --preset audit-critical -- npm audit
  sift exec --preset infra-risk -- terraform plan
  ```

- `sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.
-
  Useful flags:
- - `--dry-run`: show the reduced input and prompt without calling the provider
- - `--show-raw`: print the captured raw output to `stderr`
-
- ## Test debugging workflow
-
- This is the most common use case and where `sift` adds the most value.
-
- Think of it like this:
- - `standard` = map
- - `focused` or `rerun --remaining` = zoom
- - raw traceback = last resort
+ - `--dry-run` to preview the reduced input and prompt without calling a provider
+ - `--show-raw` to print captured raw output to `stderr`
+ - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`

- For most repos, the whole story is:
+ If you prefer environment variables instead of setup:

  ```bash
- sift exec --preset test-status -- <test command> # get the map
- sift rerun # after a fix, refresh the truth
- sift rerun --remaining --detail focused # zoom into what's still failing
- ```
+ # OpenAI
+ export SIFT_PROVIDER=openai
+ export SIFT_BASE_URL=https://api.openai.com/v1
+ export SIFT_MODEL=gpt-5-nano
+ export OPENAI_API_KEY=your_openai_api_key

- `test-status` becomes test-aware because you chose the preset. It does **not** infer "this is a test command" from the runner name — use the same preset with `pytest`, `vitest`, `jest`, `bun test`, or any other runner.
+ # OpenRouter
+ export SIFT_PROVIDER=openrouter
+ export OPENROUTER_API_KEY=your_openrouter_api_key

- If `standard` already names the failure buckets, counts, and hints, stop there and read code. If it ends with `Decision: zoom`, do one deeper pass before falling back to raw traceback.
+ # Any OpenAI-compatible endpoint
+ export SIFT_PROVIDER=openai-compatible
+ export SIFT_BASE_URL=https://your-endpoint/v1
+ export SIFT_PROVIDER_API_KEY=your_api_key
+ ```

- ### What `sift` returns for each failure family
+ ## Why it helps

- - `Shared blocker` one setup problem affecting many tests
- - A named family such as import, timeout, network, migration, or assertion
- - `Anchor` — the first file, line window, or search term worth opening
- - `Fix` — the likely next move
- - `Decision` — whether to stop here or zoom one step deeper
- - `Next` — the smallest practical action
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.

- ### Detail levels
+ Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+ - a label
+ - an affected count
+ - an anchor
+ - a likely fix
+ - a decision signal

- - `standard` short summary, no file list (default)
- - `focused` — groups failures by error type, shows a few representative tests
- - `verbose` — flat list of all visible failing tests with their normalized reason
+ That changes the agent's job from "figure out what happened" to "act on the diagnosis."
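The bucketing idea can be sketched in a few lines. This is illustrative JavaScript, not sift's actual implementation; the normalization rule (masking quoted strings and numbers) is an assumption chosen to make repeats collapse:

```javascript
// Illustrative sketch: collapse repeated failures into one bucket per
// normalized root cause, so the agent pays for each cause once.
function bucketize(failures) {
  const buckets = new Map();
  for (const { test, reason } of failures) {
    // Mask volatile details (quoted values, numbers) so repeats share a key.
    const key = reason.replace(/'[^']*'/g, "'<x>'").replace(/\d+/g, "<n>");
    const bucket = buckets.get(key) ?? { label: key, count: 0, anchor: test };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  // Largest bucket first: the dominant root cause leads the diagnosis.
  return [...buckets.values()].sort((a, b) => b.count - a.count);
}

const report = bucketize([
  { test: "tests/test_a.py", reason: "KeyError: 'DB_URL'" },
  { test: "tests/test_b.py", reason: "KeyError: 'DB_URL'" },
  { test: "tests/contracts/test_freeze.py", reason: "snapshot mismatch at line 12" },
]);
// report[0] → { label: "KeyError: '<x>'", count: 2, anchor: "tests/test_a.py" }
```

Two tests with the same underlying `KeyError` become one bucket with count 2, anchored at the first affected file.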

- ### Example output
+ ## How it works

- Single failure family:
- ```text
- - Tests did not complete.
- - 114 errors occurred during collection.
- - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
- - Anchor: path/to/failing_test.py
- - Fix: Install the missing dependencies and rerun the affected tests.
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- ```
+ `sift` follows a cheapest-first pipeline:

- Multiple failure families in one pass:
- ```text
- - Tests did not pass.
- - 3 tests failed. 124 errors occurred.
- - Shared blocker: DB-isolated tests are missing a required test env var.
- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
- Fix: Set the required test env var and rerun the suite.
- - Contract drift: snapshot expectations are out of sync with the current API or model state.
- Anchor: search <route-or-entity> in path/to/freeze_test.py
- Fix: Review the drift and regenerate the snapshots if the change is intentional.
- - Decision: stop and act.
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
- ```
+ 1. Capture command output.
+ 2. Sanitize sensitive-looking material.
+ 3. Apply local heuristics for known failure shapes.
+ 4. Escalate to a cheaper provider only if needed.
+ 5. Return a short diagnosis to the main agent.

- ### Recommended debugging order
+ It also returns a decision signal:
+ - `stop and act` when the diagnosis is already actionable
+ - `zoom` when one deeper pass is justified
+ - raw logs only as a last resort
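The pipeline and the decision signal can be sketched together. Hypothetical code only: the sanitizer pattern, the heuristic regex, and the provider stub are assumptions for illustration, not sift internals:

```javascript
// Sketch of a cheapest-first reduction: sanitize, try a local heuristic,
// and only fall through to a provider call when the shape is unrecognized.
function reduce(rawOutput) {
  // Steps 1-2: the captured output, with sensitive-looking values masked.
  const sanitized = rawOutput.replace(/(api[_-]?key\s*=\s*)\S+/gi, "$1<redacted>");

  // Step 3: a local heuristic for one known failure shape
  // (a pytest-style "N failed ... M errors" summary line).
  const m = sanitized.match(/(\d+) failed.*?(\d+) errors?/);
  if (m) {
    // Steps 4-5 skipped: the provider is never called.
    return { source: "heuristic", summary: `${m[1]} failed, ${m[2]} errors`, decision: "stop and act" };
  }

  // Step 4: unrecognized shape, escalate to a (stubbed) provider.
  return { source: "provider", summary: callProviderStub(sanitized), decision: "zoom" };
}

function callProviderStub(text) {
  return `summarized ${text.length} chars`; // stand-in for an LLM call
}

const out = reduce("api_key=sk-123\n3 failed, 125 errors in 42.1s");
// out.source === "heuristic"; the secret never reaches the provider path.
```

Recognized output short-circuits at step 3; anything else pays for one provider call and gets a `zoom` signal.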

- 1. `sift exec --preset test-status -- <test command>` get the map.
- 2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
- 3. `sift escalate` — deeper render of the same cached output, without rerunning.
- 4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
- 5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
- 6. `sift rerun --remaining --detail verbose`
- 7. `sift rerun --remaining --detail verbose --show-raw`
- 8. Raw test command only if exact traceback lines are still needed.
+ For recognized formats, local heuristics can fully handle the output and skip the provider entirely.

- `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.

- ### Quick glossary
+ ## Built-in presets

- - `sift escalate` = same cached output, deeper render
- - `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
- - `sift rerun --remaining` = rerun only the remaining failing test nodes
- - `Decision: stop and act` = trust the diagnosis and go fix code
- - `Decision: zoom` = one deeper sift pass is justified before raw
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.

- ## Watch mode
+ | Preset | Heuristic | What it does |
+ |--------|-----------|-------------|
+ | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
+ | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
+ | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
+ | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
+ | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
+ | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
+ | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
+ | `log-errors` | Provider | Extracts top error signals from log output. |

- Use watch mode when output redraws or repeats across cycles:
+ Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.

  ```bash
- sift watch "what changed between cycles?" < watcher-output.txt
- sift exec --watch "what changed between cycles?" -- node watcher.js
- sift exec --watch --preset test-status -- pytest -f
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
+ sift exec --preset lint-failures -- npx eslint src/
+ sift exec --preset build-failure -- npm run build
+ sift exec --preset audit-critical -- npm audit
+ sift exec --preset infra-risk -- terraform plan
  ```

- - cycle 1 = current state
- - later cycles = what changed, what resolved, what stayed, and the next best action
- - for `test-status`, resolved tests drop out and remaining failures stay in focus
+ On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:

- ## Diagnose JSON
+ ```text
+ [sift: heuristic • LLM skipped • summary 47ms]
+ [sift: provider • LLM used • 380 tokens • summary 1.2s]
+ ```
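Scripts that want to react to the footer (for example, checking in CI whether the provider was skipped) can parse the bracketed, bullet-separated shape shown above. This helper is hypothetical, not part of sift:

```javascript
// Hypothetical parser for the stderr footer format shown above.
// Assumes the bracketed "[sift: <source> • field • field]" shape.
function parseFooter(line) {
  const m = line.match(/^\[sift: (\w+) • (.+)\]$/);
  if (!m) return null;
  const fields = m[2].split(" • ");
  return { source: m[1], llmUsed: fields.includes("LLM used"), fields };
}

const skipped = parseFooter("[sift: heuristic • LLM skipped • summary 47ms]");
const used = parseFooter("[sift: provider • LLM used • 380 tokens • summary 1.2s]");
// skipped.llmUsed === false; used.llmUsed === true
```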

- Start with text. Use JSON only when automation needs machine-readable output:
+ Suppress the footer with `--quiet`:

  ```bash
- sift exec --preset test-status --goal diagnose --format json -- pytest -q
- sift rerun --goal diagnose --format json
+ sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
  ```

- The JSON is summary-first: `remaining_summary`, `resolved_summary`, `read_targets` with optional `context_hint`, and `remaining_subset_available` to tell you whether `sift rerun --remaining` can zoom safely.
+ ## Strongest today

- Add `--include-test-ids` only when you need every raw failing test ID.
+ `sift` is strongest when output is:
+ - long
+ - repetitive
+ - triage-heavy
+ - shaped by a small number of shared root causes

- ## Built-in presets
+ Best fits today:
+ - large `pytest`, `vitest`, or `jest` runs
+ - `tsc` type errors and `eslint` lint failures
+ - build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+ - `npm audit` and `terraform plan`
+ - repeated CI blockers
+ - noisy diffs and log streams

- - `test-status`: summarize test runs
- - `typecheck-summary`: group blocking type errors by root cause
- - `lint-failures`: group repeated lint violations and highlight the files or rules that matter
- - `audit-critical`: extract only high and critical vulnerabilities
- - `infra-risk`: return a safety verdict for infra changes
- - `diff-summary`: summarize code changes and risks
- - `build-failure`: explain the most likely build failure
- - `log-errors`: extract the most relevant error signals
+ ## Test debugging workflow

- ```bash
- sift presets list
- sift presets show test-status
- ```
+ This is where `sift` is strongest today.

- ## Agent setup
+ Think of it like this:
+ - `standard` = map
+ - `focused` = zoom
+ - raw traceback = last resort

- `sift` can install a managed instruction block so Codex or Claude Code uses `sift` by default for long command output:
+ Typical loop:

  ```bash
- sift agent install codex
- sift agent install claude
+ sift exec --preset test-status -- <test command>
+ sift rerun
+ sift rerun --remaining --detail focused
  ```

- This writes a managed block to `AGENTS.md` or `CLAUDE.md` in the current repo. Use `--dry-run` to preview, or `--scope global` for machine-wide instructions.
+ If `standard` already gives you the root cause, anchor, and fix, stop there and act.
+
+ `sift rerun --remaining` narrows automatically for cached `pytest` runs.
+
+ For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
+
+ For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.

  ```bash
  sift agent status
- sift agent remove codex
+ sift agent show claude
  sift agent remove claude
  ```

- ## CI usage
+ ## Where it helps less

- Some commands succeed technically but should still block CI. `--fail-on` handles that:
+ `sift` adds less value when:
+ - the output is already short and obvious
+ - the command is interactive or TUI-based
+ - the exact raw log matters
+ - the output does not expose enough evidence for reliable grouping

- ```bash
- sift exec --preset audit-critical --fail-on -- npm audit
- sift exec --preset infra-risk --fail-on -- terraform plan
- ```
+ When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
+
+ ## Benchmark
+
+ On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
+
+ | Metric | Raw agent | sift-first | Reduction |
+ |--------|-----------|------------|-----------|
+ | Tokens | 305K | 600 | 99.8% |
+ | Tool calls | 16 | 7 | 56% |
+ | Diagnosis | Same | Same | — |
+
+ The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
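The quoted reductions check out arithmetically; a quick sketch using only the figures stated above:

```javascript
// Verify the reduction percentages from the benchmark figures above.
// pct(before, after) → percentage reduction, rounded to one decimal place.
const pct = (before, after) => Math.round((1 - after / before) * 1000) / 10;

const tokenReduction = pct(305_000, 600);      // 99.8 (table row: 99.8%)
const toolCallReduction = pct(16, 7);          // 56.3 (table row: 56%)
const largestLogReduction = pct(198_026, 129); // 99.9 (198026 → 129 tokens)
```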

- ## Config
+ The end-to-end workflow benchmark is a different metric:
+ - `62%` fewer total debugging tokens
+ - `71%` fewer tool calls
+ - `65%` faster wall-clock time
+
+ Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
+
+ Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
+
+ ## Configuration
+
+ Inspect and validate config with:

  ```bash
- sift config show # masks secrets by default
+ sift config show
  sift config show --show-secrets
  sift config validate
  ```

- Config precedence:
- 1. CLI flags
- 2. environment variables
- 3. repo-local `sift.config.yaml`
- 4. machine-wide `~/.config/sift/config.yaml`
- 5. built-in defaults
+ To switch between saved providers without editing files:

- If you pass `--config <path>`, that path is strict — missing paths are errors.
+ ```bash
+ sift config use openai
+ sift config use openrouter
+ ```

  Minimal YAML config:

@@ -262,37 +294,12 @@ runtime:
  rawFallback: true
  ```

- ## Safety and limits
-
- - redaction is optional and regex-based
- - retriable provider failures (`429`, timeouts, `5xx`) are retried once
- - `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
- - pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
-
- ## Releasing
-
- This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
-
- 1. bump `package.json`
- 2. merge to `main`
- 3. run the `release` workflow manually
-
- The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
-
- Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub generated notes.
-
- ## Maintainer benchmark
-
- ```bash
- npm run bench:test-status-ab
- npm run bench:test-status-live
- ```
-
- Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
-
- ## Brand assets
+ ## Docs

- Logo assets live in `assets/brand/`: badge/app, icon-only, and 24px icon variants in teal, black, and monochrome.
+ - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+ - Worked examples: [docs/examples](docs/examples)
+ - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+ - Release notes: [release-notes](release-notes)

  ## License