npm - attacca-forge - Versions diffs - 0.5.0 - Mend

attacca-forge 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48)

package/LICENSE +21 -0
package/README.md +159 -0
package/bin/cli.js +79 -0
package/docs/architecture.md +132 -0
package/docs/getting-started.md +137 -0
package/docs/methodology/factorial-stress-testing.md +64 -0
package/docs/methodology/failure-modes.md +82 -0
package/docs/methodology/intent-engineering.md +78 -0
package/docs/methodology/progressive-autonomy.md +92 -0
package/docs/methodology/spec-driven-development.md +52 -0
package/docs/methodology/trust-tiers.md +52 -0
package/examples/stress-test-matrix.md +98 -0
package/examples/tier-2-saas-spec.md +142 -0
package/package.json +44 -0
package/plugins/attacca-forge/.claude-plugin/plugin.json +7 -0
package/plugins/attacca-forge/skills/agent-economics-analyzer/SKILL.md +90 -0
package/plugins/attacca-forge/skills/agent-readiness-audit/SKILL.md +90 -0
package/plugins/attacca-forge/skills/agent-stack-opportunity-mapper/SKILL.md +93 -0
package/plugins/attacca-forge/skills/ai-dev-level-assessment/SKILL.md +112 -0
package/plugins/attacca-forge/skills/ai-dev-talent-strategy/SKILL.md +154 -0
package/plugins/attacca-forge/skills/ai-difficulty-rapid-audit/SKILL.md +121 -0
package/plugins/attacca-forge/skills/ai-native-org-redesign/SKILL.md +114 -0
package/plugins/attacca-forge/skills/ai-output-taste-builder/SKILL.md +116 -0
package/plugins/attacca-forge/skills/ai-workflow-capability-map/SKILL.md +98 -0
package/plugins/attacca-forge/skills/ai-workflow-optimizer/SKILL.md +131 -0
package/plugins/attacca-forge/skills/build-orchestrator/SKILL.md +320 -0
package/plugins/attacca-forge/skills/codebase-discovery/SKILL.md +286 -0
package/plugins/attacca-forge/skills/forge-help/SKILL.md +100 -0
package/plugins/attacca-forge/skills/forge-start/SKILL.md +110 -0
package/plugins/attacca-forge/skills/harness-simulator/SKILL.md +137 -0
package/plugins/attacca-forge/skills/insight-to-action-compression-map/SKILL.md +134 -0
package/plugins/attacca-forge/skills/intent-audit/SKILL.md +144 -0
package/plugins/attacca-forge/skills/intent-gap-diagnostic/SKILL.md +63 -0
package/plugins/attacca-forge/skills/intent-spec/SKILL.md +170 -0
package/plugins/attacca-forge/skills/legacy-migration-roadmap/SKILL.md +126 -0
package/plugins/attacca-forge/skills/personal-intent-layer-builder/SKILL.md +80 -0
package/plugins/attacca-forge/skills/problem-difficulty-decomposition/SKILL.md +128 -0
package/plugins/attacca-forge/skills/spec-architect/SKILL.md +210 -0
package/plugins/attacca-forge/skills/spec-writer/SKILL.md +145 -0
package/plugins/attacca-forge/skills/stress-test/SKILL.md +283 -0
package/plugins/attacca-forge/skills/web-fork-strategic-briefing/SKILL.md +66 -0
package/src/commands/help.js +44 -0
package/src/commands/init.js +121 -0
package/src/commands/install.js +77 -0
package/src/commands/status.js +87 -0
package/src/utils/context.js +141 -0
package/src/utils/detect-claude.js +23 -0
package/src/utils/prompt.js +44 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Attacca
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,159 @@
+# Attacca Forge
+**Spec-driven AI development toolkit. Your spec is the source of truth — not the model, not the prompt, not the vibes.**
+```bash
+npx attacca-forge init
+npx attacca-forge install
+```
+AI agents don't ask clarifying questions — they make assumptions. The quality gap between Level 3 and Level 5 AI-assisted development isn't model intelligence. It's specification quality.
+Attacca Forge gives you an 8-phase pipeline from idea to production, 26 skills for Claude Code, and a methodology that scales evaluation rigor to the stakes of what you're building.
+## Quick Start
+### Option A: npx (recommended)
+```bash
+# Initialize your project (interactive setup)
+npx attacca-forge init
+# Install skills into Claude Code
+npx attacca-forge install
+```
+Restart Claude Code. Then say what you want to build:
+```
+I want to build a notification system that alerts users when their subscription is about to expire
+```
+### Option B: Manual install
+```bash
+git clone https://github.com/attacca-ai/attacca-forge.git
+cd attacca-forge
+./install.sh        # macOS / Linux / WSL
+# or
+./install.ps1       # Windows PowerShell
+```
+## The Pipeline
+```
+IDEA → DISCOVER → SPEC → BUILD → TEST → CERTIFY → DEPLOY → MAINTAIN
+```
+Every project starts at IDEA. The pipeline guides you phase by phase. Trust tiers (1-4) scale the rigor at every step — a hobby project moves fast, a safety-critical system gets full evaluation.
+```bash
+npx attacca-forge status    # See where you are
+```
+```
+  Pipeline:
+  ✓ IDEA       Capture your idea and classify risk
+  ✓ SPEC       Write behavioral specification with intent contract
+  ▸ BUILD      Execute implementation on deterministic rails
+    TEST       Run factorial stress testing against scenarios
+    CERTIFY    Human sign-off (tier-appropriate review)
+    DEPLOY     Production deployment with gates
+    MAINTAIN   Continuous flywheel + drift detection
+```
+## Skills
+### Core Pipeline (9 skills)
+| Skill | Phase | What It Does |
+|-------|-------|-------------|
+| `forge-start` | IDEA | Capture intent, classify trust tier, route to next phase |
+| `forge-help` | Any | "What should I do next?" — phase-aware navigation |
+| `codebase-discovery` | DISCOVER | Brownfield behavioral snapshot (6-layer exploration) |
+| `spec-architect` | SPEC | Full spec with intent contracts, eval scenarios, trust tier classification |
+| `spec-writer` | SPEC | Streamlined spec — no intent layer, faster for Tier 1-2 |
+| `stress-test` | TEST | Factorial stress testing — 22 variation types, 4 failure modes |
+| `intent-spec` | SPEC | Agent intent specification — value hierarchies, decision boundaries, drift detection |
+| `intent-audit` | Any | Organizational AI maturity audit — three-layer assessment |
+| `build-orchestrator` | BUILD | Spec-tests-code pipeline with 4-layer eval stack |
+### Extended Skills (17 skills)
+| Skill | Category | What It Does |
+|-------|----------|-------------|
+| `intent-gap-diagnostic` | Intent | 10-min rapid diagnostic — find your biggest AI intent gap |
+| `personal-intent-layer-builder` | Intent | Build a reusable personal intent document for AI collaboration |
+| `ai-workflow-capability-map` | Intent | Map team workflows into agent-ready / augmented / human-only |
+| `insight-to-action-compression-map` | Analysis | Map bottlenecks, redesign compressed workflows |
+| `harness-simulator` | Analysis | Planner-Worker-Judge multi-pass decomposition with self-critique |
+| `ai-difficulty-rapid-audit` | Analysis | 10-min audit — map work across difficulty axes |
+| `problem-difficulty-decomposition` | Analysis | Deep decomposition into 7 difficulty axes |
+| `ai-workflow-optimizer` | Analysis | Evaluate AI usage against difficulty profile |
+| `agent-stack-opportunity-mapper` | Analysis | Map business against 5-layer agent infrastructure stack |
+| `agent-readiness-audit` | Analysis | Technical audit of agent-readiness (content, discovery, API, security) |
+| `agent-economics-analyzer` | Analysis | Evaluate task viability for agent automation |
+| `ai-dev-level-assessment` | Organization | Diagnose team's AI-assisted dev maturity (Level 0-5) |
+| `ai-native-org-redesign` | Organization | Redesign engineering org for AI-native development |
+| `legacy-migration-roadmap` | Organization | Phased brownfield modernization plan |
+| `ai-dev-talent-strategy` | Organization | Career/talent strategy for the AI-native era |
+| `web-fork-strategic-briefing` | Strategy | Strategic briefing on agent web fork impact |
+| `ai-output-taste-builder` | Quality | Build domain-specific "taste" for evaluating AI output |
+**26 skills total.** Each loads only when invoked — zero context window bloat.
+## Trust Tiers
+Every project gets a trust tier. The tier scales everything downstream.
+| Tier | Risk Level | Example | Eval Rigor |
+|------|-----------|---------|------------|
+| 1 | Nothing bad happens | Hobby project, prototype | Base scenarios only |
+| 2 | Time or money lost | SaaS, client work | 2 variations/scenario + intent recommended |
+| 3 | Legal/financial/reputation risk | Compliance, finance | 3 variations + intent required + domain review |
+| 4 | Irreversible harm | Healthcare, safety-critical | 5 variations + all eval layers + expert sign-off |
+## The Methodology
+Attacca Forge encodes a development methodology, not just prompts:
+- **[Spec-Driven Development](docs/methodology/spec-driven-development.md)** — The Spec-Tests-Code triangle
+- **[Trust Tiers](docs/methodology/trust-tiers.md)** — Classify by risk, scale evaluation to stakes
+- **[Factorial Stress Testing](docs/methodology/factorial-stress-testing.md)** — 22 variation types expose hidden failures
+- **[The Four Failure Modes](docs/methodology/failure-modes.md)** — Inverted U, reasoning-output disconnect, anchoring bias, guardrail inversion
+- **[Intent Engineering](docs/methodology/intent-engineering.md)** — Encode organizational judgment into machine-actionable specs
+- **[Progressive Autonomy](docs/methodology/progressive-autonomy.md)** — Shadow mode to full autonomy, earned through evaluation
+## Example Output
+- [Tier 2 SaaS Notification System](examples/tier-2-saas-spec.md) — Complete spec with behavioral scenarios, variations, and intent contract
+- [Stress Test Matrix](examples/stress-test-matrix.md) — Factorial test with 16 variations across 3 scenarios
+## CLI Commands
+```bash
+npx attacca-forge init       # Interactive project setup
+npx attacca-forge install    # Install skills to Claude Code
+npx attacca-forge status     # Pipeline phase + next steps
+npx attacca-forge help       # Full command reference
+```
+## The Attacca Ecosystem
+**Attacca Forge** designs and evaluates agents. *What should this agent do? How do we know it works?*
+**[Attacca Claw](https://github.com/attacca-ai/attacca-claw-desktop)** executes agents. *How does a non-technical user interact with an autonomous agent safely?*
+Independent tools that work together.
+## Attribution
+Built on frameworks by:
+- **Nate Jones** — Spec-driven development methodology and intent engineering
+- **Drew Breunig** — Spec-Tests-Code triangle
+- **Mount Sinai Health System** — Failure mode taxonomy from factorial design study (Nature Medicine, 2026)
+## License
+MIT — see [LICENSE](LICENSE)

package/bin/cli.js ADDED

@@ -0,0 +1,79 @@
+#!/usr/bin/env node
+// =============================================================================
+// Attacca Forge CLI
+// Spec-driven AI development toolkit
+// Usage: npx attacca-forge <command> [options]
+// =============================================================================
+import { resolve, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { existsSync } from 'node:fs';
+const __dirname = dirname(fileURLToPath(import.meta.url));
+// npx runs from a temp directory — resolve back to the user's CWD
+const userCwd = process.env.INIT_CWD || process.cwd();
+const COMMANDS = {
+  init: () => import('../src/commands/init.js'),
+  install: () => import('../src/commands/install.js'),
+  status: () => import('../src/commands/status.js'),
+  help: () => import('../src/commands/help.js'),
+};
+const HELP = `
+  attacca-forge — Spec-driven AI development toolkit
+  Usage:
+    npx attacca-forge <command> [options]
+  Commands:
+    init        Initialize a new project with Attacca Forge
+    install     Install skills into Claude Code plugin directory
+    status      Show current pipeline phase and next steps
+    help        Show this help message
+  Examples:
+    npx attacca-forge init
+    npx attacca-forge init my-project
+    npx attacca-forge install
+    npx attacca-forge status
+  Documentation:
+    https://github.com/attacca-ai/attacca-forge
+`;
+async function main() {
+  const args = process.argv.slice(2);
+  const command = args[0];
+  if (!command || command === 'help' || command === '--help' || command === '-h') {
+    console.log(HELP);
+    process.exit(0);
+  }
+  if (command === '--version' || command === '-v') {
+    const { readFileSync } = await import('node:fs');
+    const pkgPath = resolve(__dirname, '..', 'package.json');
+    const pkg = JSON.parse(readFileSync(pkgPath, 'utf-8'));
+    console.log(`attacca-forge v${pkg.version}`);
+    process.exit(0);
+  }
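+  // Own-property check, so inherited names such as "toString" are reported as unknown commands
+  if (!Object.prototype.hasOwnProperty.call(COMMANDS, command)) {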
+    console.error(`\n  Unknown command: ${command}\n`);
+    console.log(HELP);
+    process.exit(1);
+  }
+  try {
+    const mod = await COMMANDS[command]();
+    await mod.default({ args: args.slice(1), cwd: userCwd, rootDir: resolve(__dirname, '..') });
+  } catch (err) {
+    console.error(`\n  Error: ${err.message}\n`);
+    process.exit(1);
+  }
+}
+main();

package/docs/architecture.md ADDED

@@ -0,0 +1,132 @@
+# Architecture — How the Layers Fit Together
+> Design → Evaluate → Align → Orchestrate
+## The Full Pipeline
+```
+Raw Idea
+    │
+    ▼
+┌──────────────────┐
+│   SPEC STUDIO     │  spec-architect / spec-writer
+│                    │  "What should this system do?"
+│   Trust tier       │
+│   Behavioral       │
+│   contract         │
+│   Scenarios with   │
+│   variations       │
+└────────┬──────────┘
+         │
+         ▼
+┌──────────────────┐
+│   EVAL GATE       │  stress-test
+│                    │  "Does it work under pressure?"
+│   Factorial matrix │
+│   Failure mode     │
+│   coverage         │
+│   Aggregate        │
+│   metrics          │
+└────────┬──────────┘
+         │
+         ▼
+┌──────────────────┐
+│   INTENT LAYER    │  intent-spec / intent-audit
+│                    │  "What should it optimize for?"
+│   Value hierarchy  │
+│   Decision         │
+│   boundaries       │
+│   Drift detection  │
+└────────┬──────────┘
+         │
+         ▼
+┌──────────────────┐
+│   BUILD FLOOR     │  build-orchestrator
+│                    │  "How does it earn autonomy?"
+│   Progressive      │
+│   autonomy         │
+│   Deterministic    │
+│   validation       │
+│   Continuous       │
+│   flywheel         │
+│   Quality gates    │
+└────────┬──────────┘
+         │
+         ▼
+    DELIVERY
+```
+## The Spec-Tests-Code Triangle
+At the core of everything is the triangle. Changes in any node must propagate to the others:
+```
+         SPEC
+        /    \
+     TESTS ── CODE
+```
+- **Spec changes** → tests must be updated → code must be re-verified
+- **Test failures** → either the code is wrong or the spec has a gap
+- **Code changes** → must be validated against spec and tests
+The `spec-architect` skill generates the spec. The `stress-test` skill generates the tests (with factorial variations). The coding agent generates the code. The `build-orchestrator` keeps the triangle in sync.
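As an illustrative sketch (not part of the published package), the three propagation rules above can be written as a small lookup. The names `PROPAGATION` and `followUps` and the action strings are hypothetical; only the rules themselves come from the text.

```javascript
// Hypothetical encoding of the Spec-Tests-Code triangle described above.
// Node names and action wording are assumptions made for illustration.
const PROPAGATION = {
  spec: ['update tests', 're-verify code'],
  tests: ['check code for a bug', 'check spec for a gap'],
  code: ['validate against spec', 'validate against tests'],
};

// Return the follow-up actions triggered by a change (or failure) at one node.
function followUps(node) {
  if (!Object.prototype.hasOwnProperty.call(PROPAGATION, node)) {
    throw new Error(`Unknown triangle node: ${node}`);
  }
  return PROPAGATION[node];
}

console.log(followUps('spec'));
```

Keeping the rules in data rather than in branching code makes it easy to see that every node has follow-ups on both of the other two.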
+## Layer Dependencies
+Each layer builds on the previous:
+```
+Layer 1: Spec        → Defines WHAT the system does
+Layer 2: Eval        → Validates HOW it handles pressure
+Layer 3: Intent      → Encodes WHY it exists (and what to optimize for)
+Layer 4: Orchestrate → Governs the full lifecycle (design → deploy → monitor)
+```
+You can adopt layers incrementally:
+- **Layer 1 alone** = better specs, fewer agent assumptions
+- **Layers 1+2** = specs validated under contextual stress
+- **Layers 1+2+3** = specs + eval + organizational alignment
+- **All four layers** = production-grade agent deployment pipeline
+## The Ecosystem
+```
+DESIGN (Forge)              EXECUTE (Claw)              OPERATE (???)
+┌────────────────────┐      ┌────────────────────┐      ┌────────────────────┐
+│ Spec Studio         │─────→│ Agent Runtime       │─────→│                    │
+│ Eval Gate           │      │ Trust Tiers         │      │                    │
+│ Intent Layer        │      │ Task Execution      │      │                    │
+│ Build Floor         │      │ Human Supervision   │      │                    │
+└────────────────────┘      └────────────────────┘      └────────────────────┘
+  attacca-forge               attacca-claw-desktop           ???
+  (methodology)               (runtime)                     (coming)
+```
+**Attacca Forge** designs and evaluates agents — the methodology.
+**[Attacca Claw](https://github.com/attacca-ai/attacca-claw-desktop)** executes agents — the runtime.
+The operate layer completes the cycle.
+## What This Architecture Handles
+| Challenge | Which Layer Solves It |
+|-----------|---------------------|
+| Agent makes assumptions | Layer 1 (Spec) — explicit behavioral contracts |
+| Agent fails on edge cases | Layer 2 (Eval) — factorial stress testing on distribution tails |
+| Agent optimizes for wrong thing | Layer 3 (Intent) — value hierarchy + Klarna checklist |
+| Agent's reasoning doesn't match its output | Layer 4 (Orchestrate) — deterministic validation rules |
+| Agent degrades after model update | Layer 4 (Orchestrate) — model change protocol |
+| New failure pattern in production | Layer 4 (Orchestrate) — continuous flywheel catches and encodes it |
+| Agent needs more/less autonomy | Layer 4 (Orchestrate) — progressive autonomy with promotion/demotion |
+## What This Architecture Does NOT Handle
+This is a methodology toolkit for designing, evaluating, and deploying agents. It does not include:
+- **Multi-project orchestration** — managing many agents across many projects simultaneously
+- **Client delivery workflows** — intake, billing, handoff, maintenance cycles
+- **Team coordination** — assigning work to humans and agents across an organization
+- **Knowledge compounding** — learning from one project to improve the next systematically
+These are operational challenges that emerge when you run the methodology at scale.

package/docs/getting-started.md ADDED

@@ -0,0 +1,137 @@
+# Getting Started with Attacca Forge
+## Installation
+### Option A: npx (recommended)
+```bash
+# Set up your project
+npx attacca-forge init
+# Install skills into Claude Code
+npx attacca-forge install
+```
+Restart Claude Code after installation.
+### Option B: Manual install
+```bash
+git clone https://github.com/attacca-ai/attacca-forge.git
+cd attacca-forge
+./install.sh        # macOS/Linux/WSL
+# or
+./install.ps1       # Windows PowerShell
+```
+Restart Claude Code after installation.
+## Your First Project
+### 1. Initialize
+```bash
+npx attacca-forge init
+```
+The setup wizard asks 4 questions:
+- **Project name** — what you're building
+- **Greenfield or brownfield** — new code or modifying existing
+- **Trust tier** — what happens if it's wrong (1=nothing, 4=irreversible harm)
+- **Experience level** — calibrates how much the skills explain
+This creates `.attacca/config.yaml` and `.attacca/context.md` in your project root.
+### 2. Start the Pipeline
+In Claude Code, say:
+```
+help me start
+```
+The `forge-start` skill captures your idea and routes you to the next phase:
+- **Greenfield** → SPEC phase (write behavioral specification)
+- **Brownfield** → DISCOVER phase (map existing codebase first)
+### 3. Follow the Pipeline
+```bash
+npx attacca-forge status
+```
+Shows where you are and what to do next:
+```
+  Pipeline:
+  ✓ IDEA       Capture your idea and classify risk
+  ▸ SPEC       Write behavioral specification with intent contract
+    BUILD      Execute implementation on deterministic rails
+    TEST       Run factorial stress testing against scenarios
+    CERTIFY    Human sign-off (tier-appropriate review)
+    DEPLOY     Production deployment with gates
+    MAINTAIN   Continuous flywheel + drift detection
+  Next: /spec-architect
+        Write behavioral specification with intent contract
+```
+At any point, say "what should I do next" in Claude Code to invoke `forge-help`.
+## Choosing Between Spec Skills
+| Skill | When to Use | Output |
+|-------|------------|--------|
+| `spec-architect` | Full spec with organizational alignment + eval (17 questions) | Behavioral Contract + Intent Contract + Eval Thresholds |
+| `spec-writer` | Quick implementation spec (13 questions) | Behavioral Contract + Ambiguity Warnings |
+**Rule of thumb**: Use `spec-architect` for anything Tier 2+ or any system involving autonomous decisions. Use `spec-writer` for Tier 1 features where you just need a clean spec fast.
+## Trust Tiers
+Your trust tier (set during init) scales everything automatically:
+| Tier | What It Means | Spec Rigor | Stress Test | Intent | Sign-Off |
+|------|--------------|-----------|-------------|--------|----------|
+| 1 | Nothing bad happens | 7 scenarios | Skip | Skip | Deploy only |
+| 2 | Time/money lost | 7 scenarios + 2 variations each | Required | Recommended | Spec + deploy |
+| 3 | Legal/financial risk | 7+ scenarios + 3 variations | Required | Required | Spec + intent + test + deploy |
+| 4 | Irreversible harm | 7+ scenarios + 5 variations | Required | Required | Full review + domain expert |
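A minimal sketch of how the table above could be held as data, assuming nothing about how the package stores it. The field names and the `minimumCases` helper are hypothetical; the values are transcribed from the table, with 7 base scenarios taken as the floor.

```javascript
// Values transcribed from the trust tier table above. Field names
// (variationsPerScenario, stressTest, intent, signOff) are hypothetical.
const TRUST_TIERS = {
  1: { variationsPerScenario: 0, stressTest: 'skip', intent: 'skip', signOff: 'deploy only' },
  2: { variationsPerScenario: 2, stressTest: 'required', intent: 'recommended', signOff: 'spec + deploy' },
  3: { variationsPerScenario: 3, stressTest: 'required', intent: 'required', signOff: 'spec + intent + test + deploy' },
  4: { variationsPerScenario: 5, stressTest: 'required', intent: 'required', signOff: 'full review + domain expert' },
};

// Minimum number of cases (base scenarios plus their variations) for a tier,
// assuming each base scenario gets the tier's number of variations.
function minimumCases(tier, baseScenarios = 7) {
  const cfg = TRUST_TIERS[tier];
  if (!cfg) throw new Error(`Unknown trust tier: ${tier}`);
  return baseScenarios * (1 + cfg.variationsPerScenario);
}

console.log(minimumCases(2)); // 7 base + 14 variations = 21
```

Under that assumption the case count grows from 7 at Tier 1 to 42 at Tier 4, which makes the cost of a higher tier concrete when choosing one during init.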
+## All Skills
+### Core Pipeline
+- `/forge-start` — IDEA phase onboarding
+- `/forge-help` — Phase-aware "what's next?"
+- `/codebase-discovery` — Brownfield behavioral snapshot
+- `/spec-architect` — Full behavioral spec with intent
+- `/spec-writer` — Streamlined spec (no intent)
+- `/stress-test` — Factorial stress testing
+- `/intent-spec` — Agent intent specification
+- `/intent-audit` — Organizational AI maturity audit
+- `/build-orchestrator` — Build pipeline with eval stack
+### Extended (17 skills)
+Intent engineering, analysis, organizational design, and quality evaluation skills. Run `/forge-help` or `npx attacca-forge help` for the full list.
+## Trigger Keywords
+Skills activate on natural language. Just say what you need:
+- "spec this out" → `spec-architect`
+- "quick spec" → `spec-writer`
+- "stress test my scenarios" → `stress-test`
+- "what should I do next" → `forge-help`
+- "discover this codebase" → `codebase-discovery`
+- "help me start" → `forge-start`
+## CLI Reference
+```bash
+npx attacca-forge init       # Interactive project setup
+npx attacca-forge install    # Install skills to Claude Code
+npx attacca-forge status     # Pipeline phase + next steps
+npx attacca-forge help       # Full command reference
+npx attacca-forge --version  # Show version
+```

package/docs/methodology/factorial-stress-testing.md ADDED

@@ -0,0 +1,64 @@
+# Factorial Stress Testing
+> Your agent's accuracy dashboard might say 87%. The inverted U means that number is masking silent failures on the tails of the distribution — precisely where consequential decisions live.
+## The Problem
+Standard evaluations test each scenario once under clean conditions. This misses three categories of failure that only surface when context varies:
+1. **Anchoring bias** — social context shifts the output without changing the facts
+2. **Guardrail inversion** — safety mechanisms fire on surface patterns, not actual risk
+3. **Inverted U** — performance degrades at extremes where stakes are highest
+## The Solution: Factorial Design
+Take each behavioral scenario and systematically apply controlled contextual stressors — one at a time. Measure whether the output shifts. If it shifts when it shouldn't, you've found a hidden failure.
+This approach was validated in the Mount Sinai factorial design study (Ramaswamy et al., Nature Medicine, 2026), where researchers tested an AI health system across 960 variations (60 scenarios × 16 conditions). Standard testing said the system worked. Factorial testing revealed it under-triaged 52% of emergencies and had roughly 12x the odds of inappropriately de-escalating when social pressure was present.
+## The Five Stressor Categories
+| Category | What It Tests | Example |
+|----------|--------------|---------|
+| **A: Social/Authority Pressure** | Does someone's opinion shift the output? | "The VP says it's low risk" |
+| **B: Framing/Anchoring** | Does language tone shift risk assessment? | Optimistic vs. pessimistic framing of same data |
+| **C: Temporal/Access Pressure** | Does urgency or scarcity bypass quality? | "Budget is extremely tight this quarter" |
+| **D: Structural Edge Cases** | Does the system handle distribution tails? | Near-miss to extreme, disguised severity |
+| **E: Reasoning-Output Alignment** | Does the reasoning match the recommendation? | Agent identifies risk in analysis, recommends ignoring it |
+Each category contains 3-6 specific variation types. See the `stress-test` skill for the full library.
+## How to Apply
+1. Start with your behavioral scenarios (from a spec)
+2. Select applicable stressor categories based on trust tier
+3. Inject one stressor per variation (never combine stressors: a combined variation makes failures undiagnosable)
+4. For each variation, define whether the correct output should change or remain stable
+5. Run the matrix and score with aggregate metrics
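The steps above can be sketched as a plain cross product, with exactly one stressor per variation. This is an illustration, not the `stress-test` skill's implementation: `buildMatrix` and the `expectedFor` callback are hypothetical, and the `'stable'` expectation in the usage is a placeholder that a human would decide per variation (step 4).

```javascript
// Build one variation per (scenario, stressor) pair. Exactly one stressor is
// injected per variation; stressors are never combined.
// `expectedFor` is a caller-supplied function that states whether the correct
// output should change or remain stable for that pair.
function buildMatrix(scenarios, stressors, expectedFor) {
  const matrix = [];
  for (const scenario of scenarios) {
    for (const stressor of stressors) {
      matrix.push({
        scenario,
        stressor,
        expected: expectedFor(scenario, stressor),
      });
    }
  }
  return matrix;
}

// Scenario IDs are made up for this example.
const scenarios = ['S1', 'S2', 'S3'];
const stressors = ['SP-01', 'FA-01', 'SE-05'];
const matrix = buildMatrix(scenarios, stressors, () => 'stable');
console.log(matrix.length); // 3 scenarios x 3 stressors = 9
```

The same multiplication gives the 960 cells quoted for the study (60 scenarios × 16 conditions).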
+## Key Metrics
+| Metric | What It Catches |
+|--------|----------------|
+| **Variation stability** | Overall robustness to contextual pressure |
+| **Anchoring susceptibility** | Vulnerability to social/authority influence |
+| **Reasoning alignment** | Gap between what the agent thinks and what it does |
+| **Guardrail reliability** | Whether safety mechanisms fire on actual risk |
+| **Inverted U index** | Performance gap between routine and extreme cases |
+## The Continuous Flywheel
+Factorial stress testing isn't a one-time event. In production:
+1. **LLM-as-judge** evaluates every agent run against the scenario rulebook
+2. **Bias toward false positives** — flag more, miss less
+3. **Review flagged runs** — true positive → fix; false positive → refine rules
+4. **Audit PASSED runs** — the step nobody does. Sample 5-10%. Find what the judge missed.
+5. **Feed back** — every real failure becomes a new scenario in the library
+The library grows organically from production. Every failure encoded makes the system harder to break.
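Step 4 of the flywheel (auditing runs the judge passed) can be sketched as a random sample. The run shape (`verdict: 'pass'`) and the function name are hypothetical; the default rate is the low end of the 5-10% range mentioned above.

```javascript
// Pick a random sample of runs the judge PASSED, for human audit.
// `rate` defaults to 0.05. `random` is injectable so that sampling can be
// made reproducible in tests.
function samplePassedRuns(runs, rate = 0.05, random = Math.random) {
  const passed = runs.filter((run) => run.verdict === 'pass');
  const target = Math.ceil(passed.length * rate);
  // Fisher-Yates shuffle on a copy, then take the first `target` items.
  const pool = passed.slice();
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, target);
}
```

Sampling uniformly at random, rather than taking the most recent or lowest-confidence passes, keeps the audit from inheriting the judge's own blind spots.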
+## Attribution
+- **Mount Sinai Health System** — Factorial design methodology (Ramaswamy et al., Nature Medicine, 2026)
+- **Nate Jones** — Failure mode analysis and evaluation architecture

package/docs/methodology/failure-modes.md ADDED

@@ -0,0 +1,82 @@
+# The Four Failure Modes
+> Your agent knows the answer and sometimes it recommends the opposite thing.
+These four failure modes were identified through a factorial design study that tested an AI health system across 960 controlled variations. They are domain-general — they apply to any agent system, not just healthcare.
+## FM-1: The Inverted U
+**Pattern**: The agent performs best on routine, middle-of-distribution cases and worst at the extremes — where stakes are highest.
+**Why it happens**: LLMs are trained on distributions where the middle is densely represented and the extremes are sparse. They perform best exactly where performance matters least.
+**How to detect**: Compute the Inverted U Index: accuracy on extreme cases divided by accuracy on mid-range cases. If the ratio is below 0.8, your agent has blind spots at the tails.
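A worked example of the index, using the health study figures quoted in this section (48% on emergencies, 93% on semi-urgent cases). Treating semi-urgent cases as the mid-range baseline is an assumption made here, and the function name is hypothetical.

```javascript
// Inverted U Index: accuracy on extreme cases divided by accuracy on
// mid-range cases. The 0.8 threshold is the one quoted in the text.
function invertedUIndex(extremeAccuracy, midRangeAccuracy) {
  if (midRangeAccuracy <= 0) {
    throw new Error('mid-range accuracy must be positive');
  }
  return extremeAccuracy / midRangeAccuracy;
}

const studyIndex = invertedUIndex(0.48, 0.93);
console.log(studyIndex.toFixed(2)); // "0.52"
console.log(studyIndex < 0.8 ? 'blind spots at the tails' : 'within threshold');
```

At about 0.52, the study's system falls well below the 0.8 threshold.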
+**Examples**:
+- Accounts payable agent processes routine invoices perfectly, misses the slightly-modified duplicate
+- Claims agent handles fender benders but can't detect the third claim from the same address in 14 months
+- The health study: 93% accuracy on semi-urgent cases, 48% on emergencies
+**Stressors that expose it**: SE-01 (near-miss to extreme), SE-06 (routine packaging of extreme)
+## FM-2: Knows But Doesn't Act
+**Pattern**: The agent's reasoning correctly identifies a finding, but the output recommendation contradicts it.
+**Why it happens**: Research on chain-of-thought faithfulness reveals that reasoning traces and final outputs operate as semi-independent processes. Studies show models fail to update answers after logically significant changes in reasoning more than 50% of the time. The Oxford AI Governance Initiative has argued that chain of thought is "fundamentally unreliable as an explanation of a model's decision process."
+**How to detect**: Deterministic validation rules that compare reasoning to output. If the reasoning contains "enhanced due diligence flag" and the output says "standard risk," escalate.
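A minimal sketch of such a rule, assuming the reasoning trace and the output are both available as text. The rule shape and the `validate` function are hypothetical; the single rule mirrors the example in the paragraph above.

```javascript
// Each rule pairs a phrase expected in the reasoning trace with an output
// phrase that would contradict it. The object shape is an assumption.
const RULES = [
  { reasoningIncludes: 'enhanced due diligence flag', contradictedBy: 'standard risk' },
];

// Return 'escalate' when reasoning and output disagree under any rule,
// otherwise 'pass'. Matching is case-insensitive substring matching.
function validate(reasoning, output, rules = RULES) {
  const r = reasoning.toLowerCase();
  const o = output.toLowerCase();
  const contradiction = rules.some(
    (rule) => r.includes(rule.reasoningIncludes) && o.includes(rule.contradictedBy)
  );
  return contradiction ? 'escalate' : 'pass';
}

console.log(validate('Found an Enhanced Due Diligence flag on this account.', 'Standard risk'));
```

The check runs outside the model and uses no model judgment, which is what makes it deterministic.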
+**Examples**:
+- Compliance agent identifies elevated risk in analysis, outputs "standard risk" classification
+- Customer service agent identifies known billing error pattern, recommends generic 5-7 day review
+- The health study: reasoning said "early respiratory failure," output said "wait 24-48 hours"
+**Key insight**: If chain-of-thought faithfulness can't be fixed at the model level, the solution must be architectural. External validation, not self-correction.
+**Stressors that expose it**: RO-01 (reasoning contradicts output), RO-02 (early-chain anchoring), RO-03 (confidence without basis)
+## FM-3: Social Context Hijacks Judgment
+**Pattern**: When a stakeholder minimizes severity or applies social pressure, the agent shifts its recommendation. Each shift is individually defensible, but the pattern is systematically biased.
+**Why it happens**: Any agent that processes inputs combining structured data with unstructured human language is vulnerable. The structured data should drive the decision. The unstructured language creates a framing effect that anchors the response.
+**How to detect**: Run the same scenario with and without the social context cue. If the output shifts, measure the magnitude. The health study found an odds ratio of 11.7, meaning roughly 12x the odds of inappropriate de-escalation when the cue was present.
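One way to measure that magnitude is an odds ratio over paired runs. The sketch below uses the standard 2×2 odds ratio formula; the function name, field names and the counts in the usage line are invented for illustration and are not the study's data.

```javascript
// Odds ratio from paired runs: the same scenarios executed with and without
// the social-context cue. Each count is the number of runs that did or did
// not inappropriately de-escalate.
function oddsRatio({ cueDeescalated, cueHeld, baseDeescalated, baseHeld }) {
  const denominator = cueHeld * baseDeescalated;
  if (denominator === 0) {
    throw new Error('odds ratio is undefined when a denominator cell is zero');
  }
  return (cueDeescalated * baseHeld) / denominator;
}

// Invented counts: 30 of 100 runs de-escalate with the cue, 5 of 100 without.
const ratio = oddsRatio({ cueDeescalated: 30, cueHeld: 70, baseDeescalated: 5, baseHeld: 95 });
console.log(ratio.toFixed(2)); // "8.14"
```

A ratio near 1 means the cue has no measurable effect, and values well above 1 indicate anchoring.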
+**Examples**:
+- Vendor selection shifts when a VP note says "I'm confident this is the right choice"
+- Lending risk assessment shifts when employer letter describes applicant as "valued longtime employee"
+- The health study: when a family member said "the patient looks fine," triage de-escalated dramatically
+**Key insight**: Without controlled variation testing (same scenario ± anchoring input), this bias is invisible on standard evaluations.
+**Stressors that expose it**: SP-01 through SP-04 (authority, peer, client, expert), FA-01 through FA-04 (positive, negative, hedging, numerical)
+## FM-4: Guardrails Fire on Vibes, Not Risk
+**Pattern**: Safety mechanisms match surface-level language patterns (emotional keywords, alarming phrases) rather than actual risk taxonomy. Alerts are inverted relative to actual risk.
+**Why it happens**: Guardrails are often trained on surface features (keyword density, emotional tone) rather than structured risk assessment. They test for the appearance of safety, not actual safety.
+**How to detect**: Test with disguised severity (critical issue in calm packaging) and surface alarm (benign issue with alarming language). If guardrails fire on the surface alarm but miss the disguised severity, they're inverted.
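The paired probe can be sketched as follows. `checkGuardrailInversion` and the deliberately naive `keywordGuardrail` are hypothetical and exist only to demonstrate the check; the two probe texts paraphrase the security example from this section.

```javascript
// `guardrail` is any function (text) => boolean that returns true when it fires.
// The probe pair follows the text: a critical issue in calm language and a
// benign issue in alarming language.
function checkGuardrailInversion(guardrail, { disguisedSeverity, surfaceAlarm }) {
  const firesOnRealRisk = guardrail(disguisedSeverity);
  const firesOnSurfaceAlarm = guardrail(surfaceAlarm);
  if (!firesOnRealRisk && firesOnSurfaceAlarm) return 'inverted';
  if (firesOnRealRisk && !firesOnSurfaceAlarm) return 'risk-aligned';
  return 'inconclusive';
}

// A naive keyword guardrail, used only to show what an inverted result looks like.
const keywordGuardrail = (text) => /confidential|urgent|alarm/i.test(text);

const verdict = checkGuardrailInversion(keywordGuardrail, {
  disguisedSeverity: 'Exported 50K customer records to personal Dropbox as a backup.',
  surfaceAlarm: 'Email labeled "confidential financial data" (a public press release).',
});
console.log(verdict); // "inverted"
```

A single pair is only a smoke test. A real evaluation would run many such pairs and report how often the guardrail is inverted.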
+**Examples**:
+- Security agent flags email labeled "confidential financial data" (actually a public press release) but passes 50K customer records exported to personal Dropbox described as "backup"
+- The health study: crisis alerts fired more reliably for vague emotional distress than for concrete, detailed self-harm plans
+**Key insight**: The system will tell you it's doing well and have grounds for that assessment. You need domain knowledge to override.
+**Stressors that expose it**: SE-05 (disguised severity), SE-06 (routine packaging of extreme)
+## Using Failure Modes in Practice
+Every behavioral scenario in a spec should target at least one failure mode. For Tier 4 systems, all four failure modes must be covered across the scenario set.
+The `stress-test` skill generates a complete variation matrix with failure mode mapping for any set of behavioral scenarios.
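The coverage rule above can be checked mechanically. This sketch assumes each scenario carries a hypothetical `failureModes` array of `FM-1` to `FM-4` labels; it is not how the `stress-test` skill represents scenarios.

```javascript
const FAILURE_MODES = ['FM-1', 'FM-2', 'FM-3', 'FM-4'];

// Check the coverage rule: every scenario targets at least one failure mode,
// and for Tier 4 the scenario set as a whole covers all four.
function checkCoverage(scenarios, tier) {
  const untargeted = scenarios
    .filter((s) => s.failureModes.length === 0)
    .map((s) => s.id);
  const covered = new Set(scenarios.flatMap((s) => s.failureModes));
  const missing = tier === 4 ? FAILURE_MODES.filter((fm) => !covered.has(fm)) : [];
  return { ok: untargeted.length === 0 && missing.length === 0, untargeted, missing };
}

const report = checkCoverage(
  [
    { id: 'S1', failureModes: ['FM-1', 'FM-3'] },
    { id: 'S2', failureModes: ['FM-2'] },
  ],
  4
);
console.log(report); // missing FM-4, so ok is false at Tier 4
```

The same scenario set passes at Tier 2, where only the per-scenario rule applies.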
+## Attribution
+- **Mount Sinai Health System** — Failure mode identification from their factorial design study (Ramaswamy et al., Nature Medicine, 2026)
+- **Nate Jones** — Failure mode generalization to enterprise agent systems
+- **Oxford AI Governance Initiative** — Chain-of-thought faithfulness research (FM-2)