npm - @bilalimamoglu/sift - Versions diffs - 0.4.2 → 0.4.3 - Mend

@bilalimamoglu/sift 0.4.2 → 0.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,22 +1,45 @@
+<div align="center">
+<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="220" />
 # sift
+### Turn noisy command output into actionable diagnoses for your coding agent
+**Benchmark-backed test triage - Heuristic-first reductions - Agent-ready terminal workflows**
 [![npm version](https://img.shields.io/npm/v/@bilalimamoglu/sift)](https://www.npmjs.com/package/@bilalimamoglu/sift)
 [![license](https://img.shields.io/github/license/bilalimamoglu/sift)](LICENSE)
 [![CI](https://img.shields.io/github/actions/workflow/status/bilalimamoglu/sift/ci.yml?branch=main&label=CI)](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)
+[![Node.js](https://img.shields.io/badge/Node.js-20+-green.svg)](https://nodejs.org/)
-Turn 13,000 lines of test output into 2 root causes.
+<br />
+### Get Started
+```bash
+npm install -g @bilalimamoglu/sift
+```
+<sub>Works with pytest, vitest, jest, tsc, ESLint, webpack, Cargo, terraform, npm audit, and more.</sub>
+</div>
+---
-Your agent reads a diagnosis, not a log file.
+## Why Sift?
+When an agent hits noisy output, it burns budget reading logs instead of fixing the problem.
+`sift` sits in front of that output and reduces it into a small, actionable first pass. Your agent reads the diagnosis, not the wall of text.
+Turn 13,000 lines of test output into 2 root causes.
 <p align="center">
   <img src="assets/readme/test-status-demo.gif" alt="sift turning a pytest failure wall into a short diagnosis" width="960" />
 </p>
-## Before / After
-128 test failures. 13,000 lines of logs. The agent reads all of it.
-With `sift`, it reads this instead:
+With `sift`, the same run becomes:
 ```text
 - Tests did not pass.
@@ -30,20 +53,118 @@ With `sift`, it reads this instead:
 - Decision: stop and act.
 ```
-Same diagnosis. One run compressed from 198,000 tokens to 129.
+In the largest benchmark fixture, sift compressed 198,026 raw output tokens to 129. That is what the agent reads instead of the full log.
+---
+## Benchmark Results
+The output reduction above measures a single command's raw output. The table below measures the full end-to-end debug session: how many tokens, tool calls, and seconds the agent spends to reach the same diagnosis.
+Real debug loop on a 640-test Python backend with 124 repeated setup errors, 3 contract failures, and 511 passing tests:
+| Metric | Without sift | With sift | Reduction |
+|--------|-------------:|----------:|----------:|
+| Tokens | 52,944 | 20,049 | 62% fewer |
+| Tool calls | 40.8 | 12 | 71% fewer |
+| Wall-clock time | 244s | 85s | 65% faster |
+| Commands | 15.5 | 6 | 61% fewer |
+| Diagnosis | Same | Same | Same outcome |
+Same diagnosis, less agent thrash.
+Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+---
+## How It Works
+`sift` keeps the explanation simple:
+1. **Capture output.** Run the noisy command or accept already-existing piped output.
+2. **Run local heuristics.** Detect known failure shapes first so common cases stay cheap and deterministic.
+3. **Return the diagnosis.** When heuristics are confident, `sift` gives the agent the root cause, anchor, and next step.
+4. **Fall back only when needed.** If heuristics are not enough, `sift` uses a cheaper model instead of spending your main agent budget.
+Your agent spends tokens fixing, not reading.
+---
+## Key Features
+<table>
+<tr>
+<td width="33%" valign="top">
+### Test Failure Triage
+Collapse repeated pytest, vitest, and jest failures into a short diagnosis with root-cause buckets, anchors, and fix hints.
+</td>
+<td width="33%" valign="top">
+### Typecheck and Lint Reduction
+Group noisy `tsc` and ESLint output into the few issues that actually matter instead of dumping the whole log back into the model.
+</td>
+<td width="33%" valign="top">
+### Build Failure Extraction
+Pull out the first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and similar build output.
+</td>
+</tr>
+<tr>
+<td width="33%" valign="top">
+### Audit and Infra Risk
+Surface high-impact `npm audit` findings and destructive `terraform plan` signals without making the agent read everything.
+</td>
+<td width="33%" valign="top">
+### Heuristic-First by Default
+Every built-in preset tries local parsing first. When the heuristic handles the output, no provider call is needed.
+</td>
+<td width="33%" valign="top">
+### Agent and Automation Friendly
+Use `sift` in Codex, Claude, CI, hooks, or shell scripts so downstream tooling gets short, structured answers instead of raw noise.
+</td>
+</tr>
+</table>
+---
+## Setup and Agent Integration
+Most built-in presets run entirely on local heuristics with no API key needed. For presets that fall back to a model (`diff-summary`, `log-errors`, or when heuristics are not confident enough), sift supports OpenAI-compatible and OpenRouter-compatible endpoints.
+Set up the provider first, then install the managed instruction block for the agent you want to steer:
+```bash
+sift config setup
+sift doctor
+sift agent install codex
+sift agent install claude
+```
+You can also preview, inspect, or remove those blocks:
+```bash
+sift agent show codex
+sift agent status
+sift agent remove codex
+```
-## Not just tests
+Command-first details live in [docs/cli-reference.md](docs/cli-reference.md).
-The same idea applies across noisy dev workflows:
+---
-- **Type errors** → grouped by error code, no model call
-- **Lint output** → grouped by rule, no model call
-- **Build failures** → first real error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
-- **`npm audit`** → high/critical vulnerabilities only, no model call
-- **`terraform plan`** → destructive risk detection, no model call
-- **Diffs and logs** → compressed through a cheaper model before reaching your agent
+## Quick Start
-## Install
+### 1. Install
 ```bash
 npm install -g @bilalimamoglu/sift
@@ -51,101 +172,94 @@ npm install -g @bilalimamoglu/sift
 Requires Node.js 20+.
-## Try it
+### 2. Run Sift in front of a noisy command
 ```bash
 sift exec --preset test-status -- pytest -q
-sift exec --preset test-status -- npx vitest run
-sift exec --preset test-status -- npx jest
 ```
-Other workflows:
+Other common entry points:
 ```bash
-sift exec --preset typecheck-summary -- npx tsc --noEmit
-sift exec --preset lint-failures -- npx eslint src/
-sift exec --preset build-failure -- npm run build
-sift exec --preset audit-critical -- npm audit
-sift exec --preset infra-risk -- terraform plan
+sift exec --preset test-status -- npx vitest run
+sift exec --preset test-status -- npx jest
 sift exec "what changed?" -- git diff
 ```
-## How it works
-`sift` sits between a noisy command and a coding agent.
+### 3. Zoom only if needed
-1. Capture output.
-2. Run local heuristics for known failure shapes.
-3. If heuristics are confident, return the diagnosis. No model call.
-4. If not, call a cheaper model — not your agent's.
+Think of the workflow like this:
-The agent gets the root cause, where it happens, and what to do next.
+- `standard` = map
+- `focused` = zoom
+- raw traceback = last resort
-So your agent spends tokens fixing, not reading.
+```bash
+sift rerun
+sift rerun --remaining --detail focused
+```
-## Built-in presets
+If `standard` already gives you the root cause, anchor, and fix, stop there and act.
-Every preset runs local heuristics first. When the heuristic handles the output, the provider is never called.
+---
-| Preset | What it does |
-|--------|-------------|
-| `test-status` | Groups pytest, vitest, jest failures into root-cause buckets with anchors and fix suggestions. 30+ failure patterns. |
-| `typecheck-summary` | Parses `tsc` output, groups by error code, returns max 5 bullets. No model call. |
-| `lint-failures` | Parses ESLint output, groups by rule, detects fixable hints. No model call. |
-| `build-failure` | Extracts first concrete error from webpack, esbuild/Vite, Cargo, Go, GCC/Clang, `tsc --build`. Falls back to model for unsupported formats. |
-| `audit-critical` | Extracts high/critical vulnerabilities from `npm audit`. No model call. |
-| `infra-risk` | Detects destructive signals in `terraform plan`. No model call. |
-| `diff-summary` | Summarizes changes and risks in diff output. |
-| `log-errors` | Extracts top error signals from log output. |
+## Presets
-## Benchmark
+| Preset | What it does | Needs provider? |
+|--------|--------------|:---------------:|
+| `test-status` | Groups pytest, vitest, and jest failures into root-cause buckets with anchors and fix suggestions. | No |
+| `typecheck-summary` | Parses `tsc` output and groups issues by error code. | No |
+| `lint-failures` | Parses ESLint output and groups failures by rule. | No |
+| `build-failure` | Extracts the first concrete build error from common toolchains. | Fallback only |
+| `audit-critical` | Pulls high and critical `npm audit` findings. | No |
+| `infra-risk` | Detects destructive signals in `terraform plan`. | No |
+| `diff-summary` | Summarizes change sets and likely risks in diff output. | Yes |
+| `log-errors` | Extracts the strongest error signals from noisy logs. | Fallback only |
-End-to-end debug loop on a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
+When output already exists in a pipeline, use pipe mode instead of `exec`:
-| Metric | Without sift | With sift | Reduction |
-|--------|-------------|-----------|-----------|
-| Tokens | 52,944 | 20,049 | 62% fewer |
-| Tool calls | 40.8 | 12 | 71% fewer |
-| Wall-clock time | 244s | 85s | 65% faster |
-| Commands | 15.5 | 6 | 61% fewer |
-| Diagnosis | Same | Same | — |
+```bash
+pytest -q 2>&1 | sift preset test-status
+npm audit 2>&1 | sift preset audit-critical
+```
-Methodology and caveats: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+---
-## Test debugging workflow
+## Test Debugging Workflow
-Think of it like this:
-- `standard` = map
-- `focused` = zoom
-- raw traceback = last resort
+For noisy test failures, start with the `test-status` preset and let `standard` be the default stop point.
 ```bash
 sift exec --preset test-status -- <test command>
 sift rerun
 sift rerun --remaining --detail focused
+sift rerun --remaining --detail verbose --show-raw
 ```
-If `standard` already gives you the root cause, anchor, and fix — stop and act.
-`sift rerun --remaining` narrows automatically for cached `pytest` runs. For `vitest` and `jest`, it reruns the full command and keeps diagnosis focused on what still fails.
+Useful rules of thumb:
-## Setup
+- If `standard` ends with `Decision: stop and act`, go read source and fix the issue.
+- Use `sift rerun` after a change to refresh the same test command at `standard`.
+- Use `sift rerun --remaining` to zoom into what still fails after the first pass.
+- Treat raw traceback as the last resort, not the starting point.
-Guided setup writes a config, verifies the provider, and makes daily use easier:
+For machine branching or automation, `test-status` also supports diagnose JSON:
 ```bash
-sift config setup
-sift doctor
+sift exec --preset test-status --goal diagnose --format json -- pytest -q
+sift rerun --goal diagnose --format json
 ```
-To wire `sift` into your coding agent automatically:
+---
-```bash
-sift agent install claude
-sift agent install codex
-```
+## Limitations
+- sift adds the most value when output is long, repetitive, and shaped by a small number of root causes. For short, obvious failures it may not save much.
+- The deepest local heuristic coverage is in test debugging (pytest, vitest, jest). Other presets have solid heuristics but less depth.
+- sift does not help with interactive or TUI-based commands.
+- When heuristics cannot explain the output confidently, sift falls back to a provider. If no provider is configured, it returns what the heuristics could extract and signals that raw output may still be needed.
-Config details: [docs/cli-reference.md](docs/cli-reference.md)
+---
 ## Docs
@@ -154,6 +268,18 @@ Config details: [docs/cli-reference.md](docs/cli-reference.md)
 - Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
 - Release notes: [release-notes](release-notes)
+---
 ## License
 MIT
+---
+<div align="center">
+Built for agent-first terminal workflows.
+[Report Bug](https://github.com/bilalimamoglu/sift/issues) | [Request Feature](https://github.com/bilalimamoglu/sift/issues)
+</div>
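
The "Reduction" column in the Benchmark Results table added by this README diff can be re-derived from the raw figures. The sketch below is only a sanity check of that arithmetic: the numbers are copied from the table, while `reductionPercent` and the `rows` array are hypothetical names introduced here for illustration and are not part of sift.

```javascript
// Figures copied from the benchmark table in the README diff above.
const rows = [
  { metric: "Tokens", without: 52944, withSift: 20049 },
  { metric: "Tool calls", without: 40.8, withSift: 12 },
  { metric: "Wall-clock time (s)", without: 244, withSift: 85 },
  { metric: "Commands", without: 15.5, withSift: 6 },
];

// Hypothetical helper: percentage saved relative to the baseline, rounded to a whole number.
function reductionPercent(before, after) {
  return Math.round((1 - after / before) * 100);
}

for (const row of rows) {
  console.log(`${row.metric}: ${reductionPercent(row.without, row.withSift)}% fewer`);
}
// Tokens: 62% fewer
// Tool calls: 71% fewer
// Wall-clock time (s): 65% fewer
// Commands: 61% fewer
```

The rounded values agree with the table. Note that these figures describe the end-to-end debug session, which the README distinguishes from the single-command output compression (198,026 tokens to 129).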

package/dist/cli.js CHANGED

@@ -485,7 +485,14 @@ function writeExampleConfig(options = {}) {
   }
   const yaml = YAML2.stringify(defaultConfig);
   fs2.mkdirSync(path3.dirname(resolved), { recursive: true });
-  fs2.writeFileSync(resolved, yaml, "utf8");
+  fs2.writeFileSync(resolved, yaml, {
+    encoding: "utf8",
+    mode: 384
+  });
+  try {
+    fs2.chmodSync(resolved, 384);
+  } catch {
+  }
   return resolved;
 }
 function writeConfigFile(options) {
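
A note on the hunk above: `384` is the decimal form of octal `0o600` (owner read/write only), which is a sensible mode for a config file that may hold an API key. The `mode` option of `fs.writeFileSync` only applies when the file is newly created, so the follow-up `chmodSync` is presumably there to tighten a config file that already existed. The sketch below illustrates that behavior under those assumptions; `writePrivateFile` and the temp-file paths are hypothetical and not part of sift.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// 384 (decimal) === 0o600 (octal): read/write for the owner, nothing for group or others.
const OWNER_READ_WRITE = 0o600;

// Hypothetical helper mirroring the pattern in the diff: write, then chmod as a best effort.
function writePrivateFile(filePath, contents) {
  // `mode` is honored only if the file is created by this call.
  fs.writeFileSync(filePath, contents, { encoding: "utf8", mode: OWNER_READ_WRITE });
  try {
    // Covers the case where the file already existed with looser permissions.
    fs.chmodSync(filePath, OWNER_READ_WRITE);
  } catch {
    // Best effort: ignore platforms or filesystems where chmod fails.
  }
  return filePath;
}

// Simulate a pre-existing, group/world-readable config file, then overwrite it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "sift-perm-demo-"));
const target = path.join(dir, "config.yaml");
fs.writeFileSync(target, "old: true\n", { mode: 0o644 });
writePrivateFile(target, "provider:\n  apiKey: example\n");

const permissionBits = fs.statSync(target).mode & 0o777;
console.log(permissionBits.toString(8)); // expected to print 600 on a POSIX system
```

On Windows the reported permission bits follow different rules, so the printed value there may differ.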
@@ -1807,8 +1814,29 @@ function escapeRegExp(value) {
 }
 // src/commands/doctor.ts
+var PLACEHOLDER_API_KEYS = [
+  "YOUR_API_KEY",
+  "your_api_key",
+  "your-api-key",
+  "sk-xxx",
+  "sk-placeholder",
+  "CHANGE_ME",
+  "change_me",
+  "TODO",
+  "todo",
+  "xxx",
+  "XXX"
+];
+function isPlaceholderApiKey(key) {
+  if (!key) return false;
+  return PLACEHOLDER_API_KEYS.includes(key.trim());
+}
+function isRealApiKey(key) {
+  return Boolean(key) && !isPlaceholderApiKey(key);
+}
 function runDoctor(config, configPath) {
   const ui = createPresentation(Boolean(process.stdout.isTTY));
+  const apiKeyStatus = isRealApiKey(config.provider.apiKey) ? "set" : isPlaceholderApiKey(config.provider.apiKey) ? "placeholder (not a real key)" : "not set";
   const lines = [
     "sift doctor",
     "A quick check for your local setup.",
@@ -1817,7 +1845,7 @@ function runDoctor(config, configPath) {
     ui.labelValue("provider", config.provider.provider),
     ui.labelValue("model", config.provider.model),
     ui.labelValue("baseUrl", config.provider.baseUrl),
-    ui.labelValue("apiKey", config.provider.apiKey ? "set" : "not set"),
+    ui.labelValue("apiKey", apiKeyStatus),
     ui.labelValue("maxCaptureChars", String(config.input.maxCaptureChars)),
     ui.labelValue("maxInputChars", String(config.input.maxInputChars)),
     ui.labelValue("rawFallback", String(config.runtime.rawFallback))
@@ -1831,8 +1859,12 @@ function runDoctor(config, configPath) {
   if (!config.provider.model) {
     problems.push("Missing provider.model");
   }
-  if ((config.provider.provider === "openai" || config.provider.provider === "openai-compatible" || config.provider.provider === "openrouter") && !config.provider.apiKey) {
-    problems.push("Missing provider.apiKey");
+  if ((config.provider.provider === "openai" || config.provider.provider === "openai-compatible" || config.provider.provider === "openrouter") && !isRealApiKey(config.provider.apiKey)) {
+    if (isPlaceholderApiKey(config.provider.apiKey)) {
+      problems.push(`provider.apiKey looks like a placeholder: "${config.provider.apiKey}"`);
+    } else {
+      problems.push("Missing provider.apiKey");
+    }
     problems.push(
       `Set one of: ${getProviderApiKeyEnvNames(
         config.provider.provider,
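
The `doctor` change above replaces a truthiness check on `provider.apiKey` with a three-way status. The sketch below reproduces the two predicates from the diff so the resulting states can be tried in isolation; `describeApiKey` is a hypothetical wrapper around the ternary in `runDoctor`, added here only for illustration.

```javascript
// Placeholder list and predicates copied from the diff above.
const PLACEHOLDER_API_KEYS = [
  "YOUR_API_KEY", "your_api_key", "your-api-key",
  "sk-xxx", "sk-placeholder",
  "CHANGE_ME", "change_me",
  "TODO", "todo", "xxx", "XXX",
];

function isPlaceholderApiKey(key) {
  if (!key) return false;
  // Exact, case-sensitive match after trimming surrounding whitespace.
  return PLACEHOLDER_API_KEYS.includes(key.trim());
}

function isRealApiKey(key) {
  return Boolean(key) && !isPlaceholderApiKey(key);
}

// Hypothetical wrapper that mirrors the apiKeyStatus ternary in runDoctor.
function describeApiKey(key) {
  if (isRealApiKey(key)) return "set";
  if (isPlaceholderApiKey(key)) return "placeholder (not a real key)";
  return "not set";
}

console.log(describeApiKey("sk-live-example")); // set
console.log(describeApiKey("  CHANGE_ME "));    // placeholder (not a real key)
console.log(describeApiKey(undefined));         // not set
console.log(describeApiKey("Your_Api_Key"));    // set (mixed case is not in the list)
```

Because the match is exact, variants outside the list (for example a mixed-case placeholder or a longer run of `x` characters) are still reported as `set`, so the check catches common copy-paste placeholders rather than validating the key.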

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@bilalimamoglu/sift",
-  "version": "0.4.2",
+  "version": "0.4.3",
   "description": "Agent-first command-output reduction layer for agents, CI, and automation.",
   "type": "module",
   "bin": {