@bilalimamoglu/sift 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,36 +1,74 @@
1
1
  # sift
2
2
 
3
- <img src="assets/brand/sift-logo-minimal-monochrome.svg" alt="sift logo" width="120" />
3
+ [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
4
+ [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
5
+ [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
4
6
 
5
- Most command output is long and noisy, but the thing you actually need to know is short: what failed, where, and what to do next. `sift` runs the command for you, captures the output, and gives you a short answer instead of a wall of text.
7
+ <img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
6
8
 
7
- It works with test suites, build logs, `git diff`, `npm audit`, `terraform plan` — anything where the signal is buried in noise. It always tries the cheapest approach first and only escalates when needed. Exit codes are preserved.
9
+ Your AI agent should not be reading 13,000 lines of test output.
8
10
 
9
- Skip it when:
10
- - you need the exact raw log
11
- - the command is interactive or TUI-based
12
- - the output is already short
11
+ **Before:** 128 failures, 198K tokens, 16 tool calls, agent reconstructs the failure shape from scratch.
12
+ **After:** 6 lines, 129 tokens, 4 tool calls, agent acts on a grouped diagnosis immediately.
13
13
 
14
- ## Install
14
+ ```bash
15
+ sift exec --preset test-status -- pytest -q
16
+ ```
17
+
18
+ ```text
19
+ - Tests did not pass.
20
+ - 3 tests failed. 125 errors occurred.
21
+ - Shared blocker: 125 errors share the same root cause - a missing test environment variable.
22
+ Anchor: tests/conftest.py
23
+ Fix: Set the required env var before rerunning DB-isolated tests.
24
+ - Contract drift: 3 snapshot tests are out of sync with the current API or model state.
25
+ Anchor: tests/contracts/test_feature_manifest_freeze.py
26
+ Fix: Regenerate the snapshots if the changes are intentional.
27
+ - Decision: stop and act.
28
+ ```
29
+
30
+ If 125 tests fail for one reason, the agent should pay for that reason once.
31
+
32
+ ## Who is this for
33
+
34
+ Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
15
35
 
16
- Requires Node.js 24 or later.
36
+ `sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
37
+
38
+ ## Install
17
39
 
18
40
  ```bash
19
41
  npm install -g @bilalimamoglu/sift
20
42
  ```
21
43
 
22
- ## Setup
44
+ Requires Node.js 20+.
23
45
 
24
- The interactive setup writes a machine-wide config and walks you through provider selection:
46
+ ## Quick start
47
+
48
+ Guided setup writes a machine-wide config and verifies the provider:
25
49
 
26
50
  ```bash
27
51
  sift config setup
28
- sift doctor # verify it works
52
+ sift doctor
29
53
  ```
30
54
 
31
- Config is saved to `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
55
+ Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
32
56
 
33
- If you prefer environment variables instead:
57
+ Then run noisy commands through `sift`:
58
+
59
+ ```bash
60
+ sift exec --preset test-status -- <test command>
61
+ sift exec "what changed?" -- git diff
62
+ sift exec --preset audit-critical -- npm audit
63
+ sift exec --preset infra-risk -- terraform plan
64
+ ```
65
+
66
+ Useful flags:
67
+ - `--dry-run` to preview the reduced input and prompt without calling a provider
68
+ - `--show-raw` to print captured raw output to `stderr`
69
+ - `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
70
+
71
+ If you prefer environment variables instead of setup:
34
72
 
35
73
  ```bash
36
74
  # OpenAI
@@ -39,209 +77,167 @@ export SIFT_BASE_URL=https://api.openai.com/v1
39
77
  export SIFT_MODEL=gpt-5-nano
40
78
  export OPENAI_API_KEY=your_openai_api_key
41
79
 
42
- # or OpenRouter
80
+ # OpenRouter
43
81
  export SIFT_PROVIDER=openrouter
44
82
  export OPENROUTER_API_KEY=your_openrouter_api_key
45
83
 
46
- # or any OpenAI-compatible endpoint (Together, Groq, self-hosted, etc.)
84
+ # Any OpenAI-compatible endpoint
47
85
  export SIFT_PROVIDER=openai-compatible
48
86
  export SIFT_BASE_URL=https://your-endpoint/v1
49
87
  export SIFT_PROVIDER_API_KEY=your_api_key
50
88
  ```
51
89
 
52
- To switch between saved providers without editing files:
90
+ ## How it works
53
91
 
54
- ```bash
55
- sift config use openai
56
- sift config use openrouter
57
- ```
92
+ `sift` follows a cheapest-first pipeline:
58
93
 
59
- ## Usage
94
+ 1. Capture command output.
95
+ 2. Sanitize sensitive-looking material.
96
+ 3. Apply local heuristics for known failure shapes.
97
+ 4. Escalate to a cheaper provider only if needed.
98
+ 5. Return a short diagnosis to the main agent.
60
99
 
61
- Run a noisy command through `sift`, read the short answer, and only zoom in if it tells you to:
100
+ The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects. Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with a label, an affected count, an anchor, and a likely fix.
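The idea fits in a few lines. The signature normalization below is a guess at the approach for illustration, not sift's actual grouping logic:

```python
from collections import Counter

def signature(failure: str) -> str:
    # Normalize a failure line down to its root cause, dropping
    # per-test detail (illustrative; real grouping is more involved).
    return failure.split(":", 1)[0]

failures = ["KeyError: TEST_DB_URL"] * 125 + ["AssertionError: snapshot mismatch"] * 3
buckets = Counter(signature(f) for f in failures)
# 128 failures collapse into 2 buckets, each paid for once
```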
62
101
 
63
- ```bash
64
- sift exec --preset test-status -- pytest -q
65
- sift exec "what changed?" -- git diff
66
- sift exec --preset audit-critical -- npm audit
67
- sift exec --preset infra-risk -- terraform plan
68
- ```
102
+ It also returns a decision signal:
103
+ - `stop and act` when the diagnosis is already actionable
104
+ - `zoom` when one deeper pass is justified
105
+ - raw logs only as a last resort
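A confidence-gated version of that signal might look like the following. The thresholds are invented for illustration; sift does not document its actual cutoffs:

```python
def decision(confidence: float) -> str:
    # Invented thresholds, purely to show the shape of the gate.
    if confidence >= 0.8:
        return "stop and act"   # diagnosis is already actionable
    if confidence >= 0.5:
        return "zoom"           # one deeper pass is justified
    return "read raw"           # last resort
```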
69
106
 
70
- `sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.
107
+ The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`.
71
108
 
72
- Useful flags:
73
- - `--dry-run`: show the reduced input and prompt without calling the provider
74
- - `--show-raw`: print the captured raw output to `stderr`
75
-
76
- ## Test debugging workflow
109
+ ## Built-in presets
77
110
 
78
- This is the most common use case and where `sift` adds the most value.
111
+ Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called — zero tokens, zero latency, fully deterministic.
79
112
 
80
- Think of it like this:
81
- - `standard` = map
82
- - `focused` or `rerun --remaining` = zoom
83
- - raw traceback = last resort
113
+ | Preset | Heuristic | What it does |
114
+ |--------|-----------|-------------|
115
+ | `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
116
+ | `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
117
+ | `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
118
+ | `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
119
+ | `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
120
+ | `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
121
+ | `diff-summary` | Provider | Summarizes changes and risks in diff output. |
122
+ | `log-errors` | Provider | Extracts top error signals from log output. |
84
123
 
85
- For most repos, the whole story is:
124
+ Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
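Grouping `tsc` diagnostics by error code, as the `typecheck-summary` row describes, boils down to something like this. The regex targets tsc's standard `file(line,col): error TSxxxx:` format; the sample diagnostics are made up:

```python
import re
from collections import Counter

TSC_LINE = re.compile(r"error (TS\d+):")

diagnostics = """\
src/app.ts(10,5): error TS2345: Argument of type 'string' is not assignable.
src/app.ts(22,1): error TS2345: Argument of type 'number' is not assignable.
src/util.ts(3,9): error TS7006: Parameter 'x' implicitly has an 'any' type.
"""

# Two TS2345 occurrences merge into one bullet; TS7006 gets its own.
by_code = Counter(TSC_LINE.findall(diagnostics))
```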
86
125
 
87
126
  ```bash
88
- sift exec --preset test-status -- <test command> # get the map
89
- sift rerun # after a fix, refresh the truth
90
- sift rerun --remaining --detail focused # zoom into what's still failing
127
+ sift exec --preset typecheck-summary -- npx tsc --noEmit
128
+ sift exec --preset lint-failures -- npx eslint src/
129
+ sift exec --preset build-failure -- npm run build
130
+ sift exec --preset audit-critical -- npm audit
131
+ sift exec --preset infra-risk -- terraform plan
91
132
  ```
92
133
 
93
- `test-status` becomes test-aware because you chose the preset. It does **not** infer "this is a test command" from the runner name; use the same preset with `pytest`, `vitest`, `jest`, `bun test`, or any other runner.
134
+ On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
94
135
 
95
- If `standard` already names the failure buckets, counts, and hints, stop there and read code. If it ends with `Decision: zoom`, do one deeper pass before falling back to raw traceback.
136
+ ```text
137
+ [sift: heuristic • LLM skipped • summary 47ms]
138
+ [sift: provider • LLM used • 380 tokens • summary 1.2s]
139
+ ```
96
140
 
97
- ### What `sift` returns for each failure family
141
+ Suppress the footer with `--quiet`:
98
142
 
99
- - `Shared blocker` — one setup problem affecting many tests
100
- - A named family such as import, timeout, network, migration, or assertion
101
- - `Anchor` — the first file, line window, or search term worth opening
102
- - `Fix` — the likely next move
103
- - `Decision` — whether to stop here or zoom one step deeper
104
- - `Next` — the smallest practical action
143
+ ```bash
144
+ sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
145
+ ```
105
146
 
106
- ### Detail levels
147
+ ## Test debugging workflow
107
148
 
108
- - `standard` short summary, no file list (default)
109
- - `focused` — groups failures by error type, shows a few representative tests
110
- - `verbose` — flat list of all visible failing tests with their normalized reason
149
+ This is where `sift` is strongest today.
111
150
 
112
- ### Example output
151
+ Think of it like this:
152
+ - `standard` = map
153
+ - `focused` = zoom
154
+ - raw traceback = last resort
113
155
 
114
- Single failure family:
115
- ```text
116
- - Tests did not complete.
117
- - 114 errors occurred during collection.
118
- - Import/dependency blocker: repeated collection failures are caused by missing dependencies.
119
- - Anchor: path/to/failing_test.py
120
- - Fix: Install the missing dependencies and rerun the affected tests.
121
- - Decision: stop and act. Do not escalate unless you need exact traceback lines.
122
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
123
- ```
156
+ Typical loop:
124
157
 
125
- Multiple failure families in one pass:
126
- ```text
127
- - Tests did not pass.
128
- - 3 tests failed. 124 errors occurred.
129
- - Shared blocker: DB-isolated tests are missing a required test env var.
130
- Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
131
- Fix: Set the required test env var and rerun the suite.
132
- - Contract drift: snapshot expectations are out of sync with the current API or model state.
133
- Anchor: search <route-or-entity> in path/to/freeze_test.py
134
- Fix: Review the drift and regenerate the snapshots if the change is intentional.
135
- - Decision: stop and act.
136
- - Next: Fix bucket 1 first, then rerun the full suite at standard.
158
+ ```bash
159
+ sift exec --preset test-status -- <test command>
160
+ sift rerun
161
+ sift rerun --remaining --detail focused
137
162
  ```
138
163
 
139
- ### Recommended debugging order
140
-
141
- 1. `sift exec --preset test-status -- <test command>` — get the map.
142
- 2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
143
- 3. `sift escalate` — deeper render of the same cached output, without rerunning.
144
- 4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
145
- 5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
146
- 6. `sift rerun --remaining --detail verbose`
147
- 7. `sift rerun --remaining --detail verbose --show-raw`
148
- 8. Raw test command only if exact traceback lines are still needed.
164
+ If `standard` already gives you the root cause, anchor, and fix, stop there and act.
149
165
 
150
166
  `sift rerun --remaining` currently supports only cached `pytest` or `python -m pytest` runs. For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
151
167
 
152
- ### Quick glossary
153
-
154
- - `sift escalate` = same cached output, deeper render
155
- - `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
156
- - `sift rerun --remaining` = rerun only the remaining failing test nodes
157
- - `Decision: stop and act` = trust the diagnosis and go fix code
158
- - `Decision: zoom` = one deeper sift pass is justified before raw
159
-
160
- ## Watch mode
168
+ ## Agent setup
161
169
 
162
- Use watch mode when output redraws or repeats across cycles:
170
+ `sift` can install a managed instruction block so coding agents use it by default for long command output:
163
171
 
164
172
  ```bash
165
- sift watch "what changed between cycles?" < watcher-output.txt
166
- sift exec --watch "what changed between cycles?" -- node watcher.js
167
- sift exec --watch --preset test-status -- pytest -f
173
+ sift agent install claude
174
+ sift agent install codex
168
175
  ```
169
176
 
170
- - cycle 1 = current state
171
- - later cycles = what changed, what resolved, what stayed, and the next best action
172
- - for `test-status`, resolved tests drop out and remaining failures stay in focus
173
-
174
- ## Diagnose JSON
175
-
176
- Start with text. Use JSON only when automation needs machine-readable output:
177
+ This writes a tuned set of rules into your agent's config (CLAUDE.md, AGENTS.md, etc.) so the agent routes noisy commands through `sift` automatically — no manual prompting needed.
177
178
 
178
179
  ```bash
179
- sift exec --preset test-status --goal diagnose --format json -- pytest -q
180
- sift rerun --goal diagnose --format json
180
+ sift agent status
181
+ sift agent show claude
182
+ sift agent remove claude
181
183
  ```
182
184
 
183
- The JSON is summary-first: `remaining_summary`, `resolved_summary`, `read_targets` with optional `context_hint`, and `remaining_subset_available` to tell you whether `sift rerun --remaining` can zoom safely.
185
+ ## Where `sift` helps most
184
186
 
185
- Add `--include-test-ids` only when you need every raw failing test ID.
187
+ `sift` is strongest when output is:
188
+ - long
189
+ - repetitive
190
+ - triage-heavy
191
+ - shaped by a small number of root causes
186
192
 
187
- ## Built-in presets
193
+ Good fits:
194
+ - large `pytest`, `vitest`, or `jest` runs (deterministic heuristics)
195
+ - `tsc` type errors and `eslint` lint failures (deterministic heuristics)
196
+ - build failures from webpack, esbuild, cargo, go, gcc
197
+ - `npm audit` and `terraform plan` (deterministic heuristics)
198
+ - repeated CI blockers
199
+ - noisy diffs and log streams
188
200
 
189
- - `test-status`: summarize test runs
190
- - `typecheck-summary`: group blocking type errors by root cause
191
- - `lint-failures`: group repeated lint violations and highlight the files or rules that matter
192
- - `audit-critical`: extract only high and critical vulnerabilities
193
- - `infra-risk`: return a safety verdict for infra changes
194
- - `diff-summary`: summarize code changes and risks
195
- - `build-failure`: explain the most likely build failure
196
- - `log-errors`: extract the most relevant error signals
201
+ ## Where it helps less
197
202
 
198
- ```bash
199
- sift presets list
200
- sift presets show test-status
201
- ```
203
+ `sift` adds less value when:
204
+ - the output is already short and obvious
205
+ - the command is interactive or TUI-based
206
+ - the exact raw log matters
207
+ - the output does not expose enough evidence for reliable grouping
202
208
 
203
- ## Agent setup
209
+ When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
204
210
 
205
- `sift` can install a managed instruction block so Codex or Claude Code uses `sift` by default for long command output:
211
+ ## Benchmark
206
212
 
207
- ```bash
208
- sift agent install codex
209
- sift agent install claude
210
- ```
213
+ On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
211
214
 
212
- This writes a managed block to `AGENTS.md` or `CLAUDE.md` in the current repo. Use `--dry-run` to preview, or `--scope global` for machine-wide instructions.
215
+ | Metric | Raw agent | sift-first | Reduction |
216
+ |--------|-----------|------------|-----------|
217
+ | Tokens | 305K | 600 | 99.8% |
218
+ | Tool calls | 16 | 7 | 56% |
219
+ | Diagnosis | Same | Same | — |
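The reduction percentages follow directly from the raw counts in the table; a quick arithmetic check:

```python
tokens_raw, tokens_sift = 305_000, 600
calls_raw, calls_sift = 16, 7

token_reduction = (tokens_raw - tokens_sift) / tokens_raw   # 0.998...
call_reduction = (calls_raw - calls_sift) / calls_raw       # 0.5625

assert round(token_reduction * 100, 1) == 99.8
assert round(call_reduction * 100) == 56
```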
213
220
 
214
- ```bash
215
- sift agent status
216
- sift agent remove codex
217
- sift agent remove claude
218
- ```
221
+ The headline numbers (62% token reduction, 71% fewer tool calls, 65% faster) come from the end-to-end wall-clock comparison. The table above shows the token-level reduction on the largest real fixture.
219
222
 
220
- ## CI usage
223
+ Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
221
224
 
222
- Some commands succeed technically but should still block CI. `--fail-on` handles that:
225
+ ## Configuration
223
226
 
224
- ```bash
225
- sift exec --preset audit-critical --fail-on -- npm audit
226
- sift exec --preset infra-risk --fail-on -- terraform plan
227
- ```
228
-
229
- ## Config
227
+ Inspect and validate config with:
230
228
 
231
229
  ```bash
232
- sift config show # masks secrets by default
230
+ sift config show
233
231
  sift config show --show-secrets
234
232
  sift config validate
235
233
  ```
236
234
 
237
- Config precedence:
238
- 1. CLI flags
239
- 2. environment variables
240
- 3. repo-local `sift.config.yaml`
241
- 4. machine-wide `~/.config/sift/config.yaml`
242
- 5. built-in defaults
235
+ To switch between saved providers without editing files:
243
236
 
244
- If you pass `--config <path>`, that path is strict — missing paths are errors.
237
+ ```bash
238
+ sift config use openai
239
+ sift config use openrouter
240
+ ```
245
241
 
246
242
  Minimal YAML config:
247
243
 
@@ -262,37 +258,10 @@ runtime:
262
258
  rawFallback: true
263
259
  ```
264
260
 
265
- ## Safety and limits
266
-
267
- - redaction is optional and regex-based
268
- - retriable provider failures (`429`, timeouts, `5xx`) are retried once
269
- - `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
270
- - pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
271
-
272
- ## Releasing
273
-
274
- This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
275
-
276
- 1. bump `package.json`
277
- 2. merge to `main`
278
- 3. run the `release` workflow manually
279
-
280
- The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
281
-
282
- Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub generated notes.
283
-
284
- ## Maintainer benchmark
285
-
286
- ```bash
287
- npm run bench:test-status-ab
288
- npm run bench:test-status-live
289
- ```
290
-
291
- Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
292
-
293
- ## Brand assets
261
+ ## Docs
294
262
 
295
- Logo assets live in `assets/brand/`: badge/app, icon-only, and 24px icon variants in teal, black, and monochrome.
263
+ - CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
264
+ - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
296
265
 
297
266
  ## License
298
267