npm - @bilalimamoglu/sift - Versions diffs - 0.4.1 → 0.4.3 - Mend

@bilalimamoglu/sift 0.4.1 → 0.4.3

Files changed (3)

package/README.md CHANGED

@@ -1,26 +1,45 @@
+<div align="center">
+<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="220" />
 # sift
+### Turn noisy command output into actionable diagnoses for your coding agent
+**Benchmark-backed test triage - Heuristic-first reductions - Agent-ready terminal workflows**
 [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
 [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
 [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
+[![Node.js](https://img.shields.io/badge/Node.js-20+-green.svg)](https://nodejs.org/)
-<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />
+<br />
-Your AI agent should not be reading 13,000 lines of test output.
+### Get Started
-If 125 tests fail for one reason, it should pay for that reason once.
+```bash
+npm install -g @bilalimamoglu/sift
+```
-`sift` turns noisy command output into a short, structured diagnosis for coding agents, so they spend fewer tokens, cost less to run, and move through debug loops faster.
+<sub>Works with pytest, vitest, jest, tsc, ESLint, webpack, Cargo, terraform, npm audit, and more.</sub>
-Instead of feeding an agent thousands of lines of logs, you give it:
-- the root cause
-- where it happens
-- what to fix
-- what to do next
+</div>
-```bash
-sift exec --preset test-status -- pytest -q
-```
+---
+## Why Sift?
+When an agent hits noisy output, it burns budget reading logs instead of fixing the problem.
+`sift` sits in front of that output and reduces it into a small, actionable first pass. Your agent reads the diagnosis, not the wall of text.
+Turn 13,000 lines of test output into 2 root causes.
+<p align="center">
+  <img src="assets/readme/test-status-demo.gif" alt="sift turning a pytest failure wall into a short diagnosis" width="960" />
+</p>
+With `sift`, the same run becomes:
 ```text
 - Tests did not pass.
@@ -34,272 +53,213 @@ sift exec --preset test-status -- pytest -q
 - Decision: stop and act.
 ```
-On the largest real fixture in the benchmark:
-`198K` raw-output tokens -> `129` `standard` tokens.
+In the largest benchmark fixture, sift compressed 198,026 raw output tokens to 129. That is what the agent reads instead of the full log.
-Same diagnosis. Far less work.
+---
-## What it is
+## Benchmark Results
-`sift` sits between a noisy command and a coding agent. It captures output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal.
+The output reduction above measures a single command's raw output. The table below measures the full end-to-end debug session: how many tokens, tool calls, and seconds the agent spends to reach the same diagnosis.
-## Install
+Real debug loop on a 640-test Python backend with 124 repeated setup errors, 3 contract failures, and 511 passing tests:
-```bash
-npm install -g @bilalimamoglu/sift
-```
+| Metric | Without sift | With sift | Reduction |
+|--------|-------------:|----------:|----------:|
+| Tokens | 52,944 | 20,049 | 62% fewer |
+| Tool calls | 40.8 | 12 | 71% fewer |
+| Wall-clock time | 244s | 85s | 65% faster |
+| Commands | 15.5 | 6 | 61% fewer |
+| Diagnosis | Same | Same | Same outcome |
-Requires Node.js 20+.
+Same diagnosis, less agent thrash.
-## Try it in 60 seconds
+Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
-If you already have an API key, you can try `sift` without any setup wizard:
+---
-```bash
-export OPENAI_API_KEY=your_openai_api_key
-sift exec --preset test-status -- pytest -q
-```
+## How It Works
-You can also use a freeform prompt for non-test output:
+`sift` keeps the explanation simple:
-```bash
-sift exec "what changed?" -- git diff
-```
+1. **Capture output.** Run the noisy command or accept already-existing piped output.
+2. **Run local heuristics.** Detect known failure shapes first so common cases stay cheap and deterministic.
+3. **Return the diagnosis.** When heuristics are confident, `sift` gives the agent the root cause, anchor, and next step.
+4. **Fall back only when needed.** If heuristics are not enough, `sift` uses a cheaper model instead of spending your main agent budget.
-## Set it up for daily use
+Your agent spends tokens fixing, not reading.
-Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:
+---
-```bash
-sift config setup
-sift doctor
-```
+## Key Features
-Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.
+<table>
+<tr>
+<td width="33%" valign="top">
-If you want your coding agent to use `sift` automatically, install the managed instruction block too:
+### Test Failure Triage
+Collapse repeated pytest, vitest, and jest failures into a short diagnosis with root-cause buckets, anchors, and fix hints.
-```bash
-sift agent install codex
-sift agent install claude
-```
+</td>
+<td width="33%" valign="top">
-Then run noisy commands through `sift`:
+### Typecheck and Lint Reduction
+Group noisy `tsc` and ESLint output into the few issues that actually matter instead of dumping the whole log back into the model.
-```bash
-sift exec --preset test-status -- <test command>
-sift exec "what changed?" -- git diff
-sift exec --preset audit-critical -- npm audit
-sift exec --preset infra-risk -- terraform plan
-```
+</td>
+<td width="33%" valign="top">
-Useful flags:
-- `--dry-run` to preview the reduced input and prompt without calling a provider
-- `--show-raw` to print captured raw output to `stderr`
-- `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`
+### Build Failure Extraction
+Pull out the first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and similar build output.
-If you prefer environment variables instead of setup:
+</td>
+</tr>
+<tr>
+<td width="33%" valign="top">
-```bash
-# OpenAI
-export SIFT_PROVIDER=openai
-export SIFT_BASE_URL=https://api.openai.com/v1
-export SIFT_MODEL=gpt-5-nano
-export OPENAI_API_KEY=your_openai_api_key
-# OpenRouter
-export SIFT_PROVIDER=openrouter
-export OPENROUTER_API_KEY=your_openrouter_api_key
-# Any OpenAI-compatible endpoint
-export SIFT_PROVIDER=openai-compatible
-export SIFT_BASE_URL=https://your-endpoint/v1
-export SIFT_PROVIDER_API_KEY=your_api_key
-```
+### Audit and Infra Risk
+Surface high-impact `npm audit` findings and destructive `terraform plan` signals without making the agent read everything.
+</td>
+<td width="33%" valign="top">
-## Why it helps
+### Heuristic-First by Default
+Every built-in preset tries local parsing first. When the heuristic handles the output, no provider call is needed.
-The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.
+</td>
+<td width="33%" valign="top">
-Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
-- a label
-- an affected count
-- an anchor
-- a likely fix
-- a decision signal
+### Agent and Automation Friendly
+Use `sift` in Codex, Claude, CI, hooks, or shell scripts so downstream tooling gets short, structured answers instead of raw noise.
-That changes the agent's job from "figure out what happened" to "act on the diagnosis."
+</td>
+</tr>
+</table>
-## How it works
+---
-`sift` follows a cheapest-first pipeline:
+## Setup and Agent Integration
-1. Capture command output.
-2. Sanitize sensitive-looking material.
-3. Apply local heuristics for known failure shapes.
-4. Escalate to a cheaper provider only if needed.
-5. Return a short diagnosis to the main agent.
+Most built-in presets run entirely on local heuristics with no API key needed. For presets that fall back to a model (`diff-summary`, `log-errors`, or when heuristics are not confident enough), sift supports OpenAI-compatible and OpenRouter-compatible endpoints.
-It also returns a decision signal:
-- `stop and act` when the diagnosis is already actionable
-- `zoom` when one deeper pass is justified
-- raw logs only as a last resort
+Set up the provider first, then install the managed instruction block for the agent you want to steer:
+```bash
+sift config setup
+sift doctor
+sift agent install codex
+sift agent install claude
+```
-For recognized formats, local heuristics can fully handle the output and skip the provider entirely.
+You can also preview, inspect, or remove those blocks:
-The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.
+```bash
+sift agent show codex
+sift agent status
+sift agent remove codex
+```
-## Built-in presets
+Command-first details live in [docs/cli-reference.md](docs/cli-reference.md).
-Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.
+---
-| Preset | Heuristic | What it does |
-|--------|-----------|-------------|
-| `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
-| `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
-| `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
-| `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
-| `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
-| `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
-| `diff-summary` | Provider | Summarizes changes and risks in diff output. |
-| `log-errors` | Provider | Extracts top error signals from log output. |
+## Quick Start
-Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.
+### 1. Install
 ```bash
-sift exec --preset typecheck-summary -- npx tsc --noEmit
-sift exec --preset lint-failures -- npx eslint src/
-sift exec --preset build-failure -- npm run build
-sift exec --preset audit-critical -- npm audit
-sift exec --preset infra-risk -- terraform plan
+npm install -g @bilalimamoglu/sift
 ```
-On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:
-```text
-[sift: heuristic • LLM skipped • summary 47ms]
-[sift: provider • LLM used • 380 tokens • summary 1.2s]
-```
+Requires Node.js 20+.
-Suppress the footer with `--quiet`:
+### 2. Run Sift in front of a noisy command
 ```bash
-sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
+sift exec --preset test-status -- pytest -q
 ```
-## Strongest today
+Other common entry points:
-`sift` is strongest when output is:
-- long
-- repetitive
-- triage-heavy
-- shaped by a small number of shared root causes
-Best fits today:
-- large `pytest`, `vitest`, or `jest` runs
-- `tsc` type errors and `eslint` lint failures
-- build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
-- `npm audit` and `terraform plan`
-- repeated CI blockers
-- noisy diffs and log streams
+```bash
+sift exec --preset test-status -- npx vitest run
+sift exec --preset test-status -- npx jest
+sift exec "what changed?" -- git diff
+```
-## Test debugging workflow
+### 3. Zoom only if needed
-This is where `sift` is strongest today.
+Think of the workflow like this:
-Think of it like this:
 - `standard` = map
 - `focused` = zoom
 - raw traceback = last resort
-Typical loop:
 ```bash
-sift exec --preset test-status -- <test command>
 sift rerun
 sift rerun --remaining --detail focused
 ```
 If `standard` already gives you the root cause, anchor, and fix, stop there and act.
-`sift rerun --remaining` narrows automatically for cached `pytest` runs.
+---
-For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
+## Presets
-For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.
+| Preset | What it does | Needs provider? |
+|--------|--------------|:---------------:|
+| `test-status` | Groups pytest, vitest, and jest failures into root-cause buckets with anchors and fix suggestions. | No |
+| `typecheck-summary` | Parses `tsc` output and groups issues by error code. | No |
+| `lint-failures` | Parses ESLint output and groups failures by rule. | No |
+| `build-failure` | Extracts the first concrete build error from common toolchains. | Fallback only |
+| `audit-critical` | Pulls high and critical `npm audit` findings. | No |
+| `infra-risk` | Detects destructive signals in `terraform plan`. | No |
+| `diff-summary` | Summarizes change sets and likely risks in diff output. | Yes |
+| `log-errors` | Extracts the strongest error signals from noisy logs. | Fallback only |
+When output already exists in a pipeline, use pipe mode instead of `exec`:
 ```bash
-sift agent status
-sift agent show claude
-sift agent remove claude
+pytest -q 2>&1 | sift preset test-status
+npm audit 2>&1 | sift preset audit-critical
 ```
-## Where it helps less
+---
-`sift` adds less value when:
-- the output is already short and obvious
-- the command is interactive or TUI-based
-- the exact raw log matters
-- the output does not expose enough evidence for reliable grouping
+## Test Debugging Workflow
-When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
-## Benchmark
-On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
-| Metric | Raw agent | sift-first | Reduction |
-|--------|-----------|------------|-----------|
-| Tokens | 305K | 600 | 99.8% |
-| Tool calls | 16 | 7 | 56% |
-| Diagnosis | Same | Same | — |
-The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
-The end-to-end workflow benchmark is a different metric:
-- `62%` fewer total debugging tokens
-- `71%` fewer tool calls
-- `65%` faster wall-clock time
-Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
-Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
-## Configuration
-Inspect and validate config with:
+For noisy test failures, start with the `test-status` preset and let `standard` be the default stop point.
 ```bash
-sift config show
-sift config show --show-secrets
-sift config validate
+sift exec --preset test-status -- <test command>
+sift rerun
+sift rerun --remaining --detail focused
+sift rerun --remaining --detail verbose --show-raw
 ```
-To switch between saved providers without editing files:
+Useful rules of thumb:
+- If `standard` ends with `Decision: stop and act`, go read source and fix the issue.
+- Use `sift rerun` after a change to refresh the same test command at `standard`.
+- Use `sift rerun --remaining` to zoom into what still fails after the first pass.
+- Treat raw traceback as the last resort, not the starting point.
+For machine branching or automation, `test-status` also supports diagnose JSON:
 ```bash
-sift config use openai
-sift config use openrouter
+sift exec --preset test-status --goal diagnose --format json -- pytest -q
+sift rerun --goal diagnose --format json
 ```
-Minimal YAML config:
+---
-```yaml
-provider:
-  provider: openai
-  model: gpt-5-nano
-  baseUrl: https://api.openai.com/v1
-  apiKey: YOUR_API_KEY
+## Limitations
-input:
-  stripAnsi: true
-  redact: false
-  maxCaptureChars: 400000
-  maxInputChars: 60000
+- sift adds the most value when output is long, repetitive, and shaped by a small number of root causes. For short, obvious failures it may not save much.
+- The deepest local heuristic coverage is in test debugging (pytest, vitest, jest). Other presets have solid heuristics but less depth.
+- sift does not help with interactive or TUI-based commands.
+- When heuristics cannot explain the output confidently, sift falls back to a provider. If no provider is configured, it returns what the heuristics could extract and signals that raw output may still be needed.
-runtime:
-  rawFallback: true
-```
+---
 ## Docs
@@ -308,6 +268,18 @@ runtime:
 - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
 - Release notes: [release-notes](release-notes)
+---
 ## License
 MIT
+---
+<div align="center">
+Built for agent-first terminal workflows.
+[Report Bug](https://github.com/bilalimamoglu/sift/issues) | [Request Feature](https://github.com/bilalimamoglu/sift/issues)
+</div>

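Review note on the README hunk: the "Reduction" column in the new benchmark table can be re-derived from the raw figures in the same rows. The sketch below assumes the reduction is computed as (before - after) / before and rounded to the nearest whole percent. The helper name `reductionPercent` is illustrative and is not part of sift.

```javascript
// Hypothetical helper (not part of sift): percent saved going from `before` to `after`.
function reductionPercent(before, after) {
  return Math.round(((before - after) / before) * 100);
}

// Figures copied from the "Benchmark Results" table added in the 0.4.3 README.
const rows = {
  tokens: [52944, 20049],
  toolCalls: [40.8, 12],
  wallClockSeconds: [244, 85],
  commands: [15.5, 6],
};

// Map each metric name to its rounded reduction percentage.
const reductions = Object.fromEntries(
  Object.entries(rows).map(([name, [before, after]]) => [
    name,
    reductionPercent(before, after),
  ])
);

console.log(reductions);
// → { tokens: 62, toolCalls: 71, wallClockSeconds: 65, commands: 61 }
```

Under that assumption the computed values match the 62%, 71%, 65%, and 61% shown in the table. By the same formula, the single-fixture figure quoted earlier in the README (198,026 raw tokens down to 129) works out to a reduction of roughly 99.9%.
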
package/dist/cli.js CHANGED

@@ -485,7 +485,14 @@ function writeExampleConfig(options = {}) {
   }
   const yaml = YAML2.stringify(defaultConfig);
   fs2.mkdirSync(path3.dirname(resolved), { recursive: true });
-  fs2.writeFileSync(resolved, yaml, "utf8");
+  fs2.writeFileSync(resolved, yaml, {
+    encoding: "utf8",
+    mode: 384
+  });
+  try {
+    fs2.chmodSync(resolved, 384);
+  } catch {
+  }
   return resolved;
 }
 function writeConfigFile(options) {
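
Review note on the hunk above: the literal `384` is the decimal form of octal `0o600`, which bundled output often emits in place of the octal literal. The sketch below decodes it to show that only the file owner gets read and write access. The `describeMode` helper is illustrative and does not exist in sift.

```javascript
// 384 (decimal) is the mode passed to writeFileSync and chmodSync in the hunk above.
const mode = 384;
const octal = mode.toString(8);

// Illustrative helper (not from sift): render a numeric mode as an rwx string
// for owner, group, and others.
function describeMode(m) {
  const flags = ["r", "w", "x"];
  return [6, 3, 0]
    .map((shift) => (m >> shift) & 0b111) // isolate each 3-bit permission group
    .map((triple) =>
      flags.map((flag, i) => (triple & (4 >> i) ? flag : "-")).join("")
    )
    .join("");
}

const description = describeMode(mode);
console.log(octal, description);
// → 600 rw-------
```

The extra `chmodSync` call is probably there because the `mode` option of `writeFileSync` only applies when the file is newly created. A config file that already exists with looser permissions would otherwise keep them. The empty `catch` suggests the permission tightening is treated as best effort.
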
@@ -1807,8 +1814,29 @@ function escapeRegExp(value) {
 }
 // src/commands/doctor.ts
+var PLACEHOLDER_API_KEYS = [
+  "YOUR_API_KEY",
+  "your_api_key",
+  "your-api-key",
+  "sk-xxx",
+  "sk-placeholder",
+  "CHANGE_ME",
+  "change_me",
+  "TODO",
+  "todo",
+  "xxx",
+  "XXX"
+];
+function isPlaceholderApiKey(key) {
+  if (!key) return false;
+  return PLACEHOLDER_API_KEYS.includes(key.trim());
+}
+function isRealApiKey(key) {
+  return Boolean(key) && !isPlaceholderApiKey(key);
+}
 function runDoctor(config, configPath) {
   const ui = createPresentation(Boolean(process.stdout.isTTY));
+  const apiKeyStatus = isRealApiKey(config.provider.apiKey) ? "set" : isPlaceholderApiKey(config.provider.apiKey) ? "placeholder (not a real key)" : "not set";
   const lines = [
     "sift doctor",
     "A quick check for your local setup.",
@@ -1817,7 +1845,7 @@ function runDoctor(config, configPath) {
     ui.labelValue("provider", config.provider.provider),
     ui.labelValue("model", config.provider.model),
     ui.labelValue("baseUrl", config.provider.baseUrl),
-    ui.labelValue("apiKey", config.provider.apiKey ? "set" : "not set"),
+    ui.labelValue("apiKey", apiKeyStatus),
     ui.labelValue("maxCaptureChars", String(config.input.maxCaptureChars)),
     ui.labelValue("maxInputChars", String(config.input.maxInputChars)),
     ui.labelValue("rawFallback", String(config.runtime.rawFallback))
@@ -1831,8 +1859,12 @@ function runDoctor(config, configPath) {
   if (!config.provider.model) {
     problems.push("Missing provider.model");
   }
-  if ((config.provider.provider === "openai" || config.provider.provider === "openai-compatible" || config.provider.provider === "openrouter") && !config.provider.apiKey) {
-    problems.push("Missing provider.apiKey");
+  if ((config.provider.provider === "openai" || config.provider.provider === "openai-compatible" || config.provider.provider === "openrouter") && !isRealApiKey(config.provider.apiKey)) {
+    if (isPlaceholderApiKey(config.provider.apiKey)) {
+      problems.push(`provider.apiKey looks like a placeholder: "${config.provider.apiKey}"`);
+    } else {
+      problems.push("Missing provider.apiKey");
+    }
     problems.push(
       `Set one of: ${getProviderApiKeyEnvNames(
         config.provider.provider,

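Review note on the `doctor` hunks: the sketch below reproduces the two predicates from the diff, with a shortened placeholder list, to show how 0.4.3 classifies different `apiKey` values. The `apiKeyStatus` function wraps the ternary from `runDoctor` for illustration only; in the diff it is an inline constant, not a function.

```javascript
// Shortened copy of the list from the diff; the real list has more entries.
const PLACEHOLDER_API_KEYS = ["YOUR_API_KEY", "your_api_key", "sk-xxx", "CHANGE_ME", "TODO", "xxx"];

// Both predicates are copied from the 0.4.3 hunk.
function isPlaceholderApiKey(key) {
  if (!key) return false;
  return PLACEHOLDER_API_KEYS.includes(key.trim());
}
function isRealApiKey(key) {
  return Boolean(key) && !isPlaceholderApiKey(key);
}

// Illustrative wrapper around the ternary used inside runDoctor.
function apiKeyStatus(key) {
  return isRealApiKey(key)
    ? "set"
    : isPlaceholderApiKey(key)
      ? "placeholder (not a real key)"
      : "not set";
}

// The sample keys below are made up for the example.
console.log(apiKeyStatus("sk-example-123"));  // → set
console.log(apiKeyStatus(" YOUR_API_KEY "));  // → placeholder (not a real key)
console.log(apiKeyStatus(undefined));         // → not set
console.log(apiKeyStatus("Your_Api_Key"));    // → set
```

Two details follow from the code as written. Matching is exact after trimming, so only the listed spellings are caught and a mixed-case variant such as `Your_Api_Key` still reports `set`. A whitespace-only string is truthy and not in the list, so it also reports `set`. In 0.4.1, any truthy value reported `set`.
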
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@bilalimamoglu/sift",
-  "version": "0.4.1",
+  "version": "0.4.3",
   "description": "Agent-first command-output reduction layer for agents, CI, and automation.",
   "type": "module",
   "bin": {