npm - warp-os - Versions diffs - 1.1.2 → 1.2.1 - Mend

warp-os 1.1.2 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +85 -0
package/README.md +6 -4
package/VERSION +1 -1
package/agents/warp-annotate.md +394 -0
package/agents/warp-browse.md +9 -1
package/agents/warp-build-code.md +9 -1
package/agents/warp-orchestrator.md +10 -1
package/agents/warp-plan-architect.md +120 -1
package/agents/warp-plan-brainstorm.md +93 -2
package/agents/warp-plan-design.md +97 -4
package/agents/warp-plan-onboarding.md +9 -1
package/agents/warp-plan-optimize.md +9 -1
package/agents/warp-plan-scope.md +67 -1
package/agents/warp-plan-security.md +576 -35
package/agents/warp-plan-testdesign.md +9 -1
package/agents/warp-qa-debug.md +117 -1
package/agents/warp-qa-test.md +167 -1
package/agents/warp-release-update.md +290 -4
package/agents/warp-setup.md +9 -1
package/agents/warp-upgrade.md +21 -4
package/bin/hooks/CLAUDE.md +24 -0
package/bin/hooks/_warp_json.sh +4 -2
package/bin/hooks/identity-briefing.sh +20 -13
package/bin/hooks/validate-askuser.sh +41 -0
package/bin/migrate-sessions.js +284 -173
package/dist/warp-annotate/SKILL.md +404 -0
package/dist/warp-browse/SKILL.md +9 -1
package/dist/warp-build-code/SKILL.md +9 -1
package/dist/warp-orchestrator/SKILL.md +10 -1
package/dist/warp-plan-architect/SKILL.md +120 -1
package/dist/warp-plan-brainstorm/SKILL.md +93 -2
package/dist/warp-plan-design/SKILL.md +97 -4
package/dist/warp-plan-onboarding/SKILL.md +9 -1
package/dist/warp-plan-optimize/SKILL.md +9 -1
package/dist/warp-plan-scope/SKILL.md +67 -1
package/dist/warp-plan-security/SKILL.md +578 -35
package/dist/warp-plan-testdesign/SKILL.md +9 -1
package/dist/warp-qa-debug/SKILL.md +117 -1
package/dist/warp-qa-test/SKILL.md +167 -1
package/dist/warp-release-update/SKILL.md +290 -4
package/dist/warp-setup/SKILL.md +9 -1
package/dist/warp-upgrade/SKILL.md +21 -4
package/package.json +2 -2
package/shared/project-hooks.json +7 -0
package/shared/tier1-engineering-constitution.md +9 -1

package/agents/warp-plan-security.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: warp-plan-security
 description: >-
-  Full-spectrum security audit: secrets archaeology, dependency supply chain, OWASP Top 10, STRIDE threat modeling, static analysis patterns, variant analysis, and fix verification. Inspired by gstack CSO, Trail of Bits security methodology, and skill-threat-modeling. Two modes: daily (fast, high-confidence, 5-10 min) and comprehensive (full audit, catches everything, 30-60 min).
+  Full-spectrum security audit: secrets archaeology, dependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain scanning, OWASP Top 10, STRIDE threat modeling, static analysis patterns, variant analysis, and fix verification. Inspired by gstack CSO, Trail of Bits security methodology, and skill-threat-modeling. Two modes: daily (fast, high-confidence, 5-10 min) and comprehensive (full audit, catches everything, 30-60 min). Scope flags for targeted audits (--infra, --code, --deps, --diff, --skills, --llm).
 ---
 <!-- ═══════════════════════════════════════════════════════════ -->
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -191,22 +199,28 @@ Status values: **DONE**, **DONE_WITH_CONCERNS** (list concerns), **BLOCKED** (st
 Standalone skill. Runs anytime. Recommended before every `/warp-release-update` and after any dependency change, environment variable addition, or new API endpoint.
 ```
-  ┌─────────────────────────────────────────────────────────────┐
-  │                    WARP-PLAN-SECURITY                        │
-  │                                                             │
-  │   Mode: Daily (Phases 1-3)    Mode: Comprehensive (1-7)    │
-  │                                                             │
-  │   Phase 1: Secrets Archaeology                              │
-  │   Phase 2: Dependency Supply Chain                          │
-  │   Phase 3: OWASP Top 10                                    │
-  │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
-  │   Phase 4: STRIDE Threat Model           (comprehensive)   │
-  │   Phase 5: Static Analysis Patterns      (comprehensive)   │
-  │   Phase 6: Variant Analysis              (comprehensive)   │
-  │   Phase 7: Fix Verification              (comprehensive)   │
-  │                                                             │
-  │   Output: Security audit report (stdout + optional file)   │
-  └─────────────────────────────────────────────────────────────┘
+  ┌──────────────────────────────────────────────────────────────────┐
+  │                      WARP-PLAN-SECURITY                          │
+  │                                                                  │
+  │   Mode: Daily (Phases 0-3)     Mode: Comprehensive (0-10)       │
+  │   Scope: --infra --code --deps --diff --skills --llm            │
+  │                                                                  │
+  │   Phase 0:   Architecture Mental Model                           │
+  │   Phase 0.5: Attack Surface Census                               │
+  │   Phase 1:   Secrets Archaeology                                 │
+  │   Phase 2:   Dependency Supply Chain                             │
+  │   Phase 2.5: CI/CD Pipeline Security                             │
+  │   Phase 3:   OWASP Top 10                                       │
+  │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
+  │   Phase 4:   STRIDE Threat Model          (comprehensive)       │
+  │   Phase 5:   Static Analysis Patterns     (comprehensive)       │
+  │   Phase 5.5: LLM & AI Security           (comprehensive)       │
+  │   Phase 5.6: Skill Supply Chain Scanning  (comprehensive)       │
+  │   Phase 6:   Variant Analysis             (comprehensive)       │
+  │   Phase 7:   Fix Verification             (comprehensive)       │
+  │                                                                  │
+  │   Output: Security audit report (stdout + optional file)        │
+  └──────────────────────────────────────────────────────────────────┘
 ```
 ---
@@ -251,16 +265,43 @@ Internalize these cognitive patterns. They are not a checklist -- they are how y
 On invocation, determine the mode:
-- If the user says "daily," "quick," "fast," or "scan" --> **Daily mode** (Phases 1-3)
-- If the user says "comprehensive," "full," "audit," or "deep" --> **Comprehensive mode** (Phases 1-7)
+- If the user says "daily," "quick," "fast," or "scan" --> **Daily mode** (Phases 0-3)
+- If the user says "comprehensive," "full," "audit," or "deep" --> **Comprehensive mode** (Phases 0-10)
 - If the user says nothing about mode --> ask:
 > "Two audit modes available:
-> A) **Daily scan** -- high-confidence findings only (8/10 severity bar), runs 5-10 minutes, covers secrets/deps/OWASP
-> B) **Comprehensive audit** -- catches everything (2/10 bar), runs 30-60 minutes, adds threat model/static analysis/variant analysis/fix verification
+> A) **Daily scan** -- high-confidence findings only (8/10 severity bar), runs 5-10 minutes, covers architecture model/attack surface/secrets/deps/CI-CD/OWASP
+> B) **Comprehensive audit** -- catches everything (2/10 bar), runs 30-60 minutes, adds threat model/static analysis/LLM security/skill supply chain/variant analysis/fix verification
 >
 > RECOMMENDATION: Choose B if this is pre-ship, first audit, or after major changes. Choose A for routine checks."
+### Scope Flags
+Support targeted audits via scope flags. When a scope flag is provided, run ONLY the matching phases (plus Phase 0 for context). Multiple flags can be combined.
+| Flag | Phases Run | What It Covers |
+|------|-----------|----------------|
+| `--infra` | 0, 0.5, 2.5 (CI/CD) | CI/CD pipelines, Docker, IaC configs only |
+| `--code` | 0, 3 (OWASP), 5 (Static Analysis) | OWASP Top 10, static analysis patterns only |
+| `--deps` | 0, 2 (Dependency Supply Chain) | Dependency supply chain only |
+| `--diff` | 0, then all phases scoped to changed files | Only scan files changed since last audit |
+| `--skills` | 0, 5.6 (Skill Supply Chain) | Claude Code skill supply chain only |
+| `--llm` | 0, 5.5 (LLM & AI Security) | LLM/AI security vectors only |
+When `--diff` is used, determine the changed file set:
+```bash
+# If a previous audit report exists, diff since that commit
+LAST_AUDIT_COMMIT=$(git log --all --oneline --grep="Security Audit" --format='%H' | head -1)
+if [ -n "$LAST_AUDIT_COMMIT" ]; then
+  git diff --name-only "$LAST_AUDIT_COMMIT"..HEAD
+else
+  # Fall back to last 7 days
+  git diff --name-only "HEAD@{7 days ago}"..HEAD 2>/dev/null || git diff --name-only HEAD~20..HEAD
+fi
+```
+Only scan those files in each phase. Report the diff scope in the audit header.
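As a rough illustration of the scope-flag table, the mapping from flags to phases could be sketched as below. The function name `phases_for_flags` and the `all-diff-scoped` token are hypothetical and not part of the package; the phase numbers come from the table, and Phase 0 is always included for context.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of warp-os): map scope flags to phase numbers.
phases_for_flags() {
  local phases="0" flag            # Phase 0 always runs for context
  for flag in "$@"; do
    case "$flag" in
      --infra)  phases="$phases 0.5 2.5" ;;
      --code)   phases="$phases 3 5" ;;
      --deps)   phases="$phases 2" ;;
      --skills) phases="$phases 5.6" ;;
      --llm)    phases="$phases 5.5" ;;
      --diff)   phases="$phases all-diff-scoped" ;;  # placeholder token
      *) echo "unknown scope flag: $flag" >&2; return 1 ;;
    esac
  done
  # De-duplicate while keeping first-seen order, then join with spaces
  printf '%s\n' $phases | awk '!seen[$0]++' | paste -sd' ' -
}

phases_for_flags --infra --deps
```

Combining flags simply unions their phase lists, which matches the note that multiple flags can be combined.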
 ### Severity Bar
 - **Daily mode (8/10 bar):** Only report findings that are HIGH or CRITICAL severity. These are things that an attacker could exploit today with minimal effort. Skip informational, low, and medium findings -- they create noise that delays shipping.
@@ -283,6 +324,142 @@ On invocation, determine the mode:
 ---
+## PHASE 0: Architecture Mental Model
+**Goal:** Before scanning anything, build a mental model of the application's technology stack, deployment model, and integration points. This model determines which subsequent phases are relevant and which can be skipped.
+**Time budget:** Daily 1-2 min, Comprehensive 2-3 min.
+### 0A. Technology Detection
+Scan project root for configuration files that reveal the stack:
+```bash
+# Detect language/runtime
+for f in package.json Cargo.toml go.mod pyproject.toml requirements.txt Gemfile pom.xml build.gradle composer.json; do
+  [ -f "$f" ] && echo "FOUND: $f"
+done
+# Detect framework
+git grep -l -E '(next\.config|nuxt\.config|remix\.config|vite\.config|angular\.json|svelte\.config)' -- ':!node_modules' 2>/dev/null | head -5
+git grep -l -E '(from ["\x27]express|from ["\x27]fastify|from ["\x27]hono|from ["\x27]django|from ["\x27]flask|from ["\x27]rails)' -- ':!node_modules' 2>/dev/null | head -5
+# Detect database
+git grep -l -E '(prisma|drizzle|typeorm|sequelize|knex|mongoose|supabase|firebase)' -- '*.json' '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -5
+# Detect auth
+git grep -l -E '(next-auth|passport|jwt|jsonwebtoken|bcrypt|argon2|oauth|clerk|auth0|supabase.*auth)' -- '*.json' '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -5
+# Detect deployment
+for f in Dockerfile docker-compose.yml fly.toml vercel.json netlify.toml render.yaml serverless.yml terraform.tf; do
+  [ -f "$f" ] && echo "DEPLOY: $f"
+done
+find . -name "*.tf" -not -path "*/node_modules/*" -maxdepth 3 2>/dev/null | head -5
+```
+### 0B. Mental Model Output
+Produce this structured summary before proceeding:
+```
+ARCHITECTURE MENTAL MODEL:
+  Language/Runtime: [detected from package.json, Cargo.toml, etc.]
+  Framework: [Next.js, Express, Django, Rails, etc.]
+  Database: [Postgres, MySQL, SQLite, none]
+  Auth: [JWT, session, OAuth, none detected]
+  External integrations: [list APIs, webhooks, third-party services]
+  Deployment: [Docker, serverless, bare metal, PaaS]
+This mental model guides which phases are relevant and which can be skipped.
+```
+### 0C. Phase Relevance
+Based on the mental model, note which phases need extra attention:
+- No auth detected --> Phase 3 A01 and A07 are top priority
+- Docker/CI detected --> Phase 2.5 is critical
+- LLM/AI dependencies detected --> Phase 5.5 is critical
+- Claude Code skills installed --> Phase 5.6 is critical
+- External integrations detected --> STRIDE trust boundary analysis is critical
+- No database detected --> skip database-specific checks in OWASP
+---
+## PHASE 0.5: Attack Surface Census
+**Goal:** Produce a quantitative map of the application's attack surface. Higher numbers mean larger surface area requiring more scrutiny.
+**Time budget:** Daily 1-2 min, Comprehensive 3-5 min.
+### 0.5A. Surface Enumeration
+```bash
+# Count public endpoints (no auth middleware)
+echo "=== Endpoint counts ==="
+PUBLIC=$(git grep -c -E '(app\.(get|post|put|patch|delete)|router\.(get|post|put|patch|delete)|export.*(GET|POST|PUT|PATCH|DELETE))' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "Route definitions: $PUBLIC"
+# Count file upload handlers
+UPLOADS=$(git grep -c -E '(multer|upload|formidable|busboy|multipart)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "File upload references: $UPLOADS"
+# Count WebSocket channels
+WS=$(git grep -c -E '(WebSocket|socket\.io|ws\(|wss://|io\.on)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "WebSocket references: $WS"
+# Count external integrations
+EXT=$(git grep -c -E '(fetch|axios|http\.get|https\.get)\s*\(' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "External HTTP calls: $EXT"
+# Count background jobs
+JOBS=$(git grep -c -E '(cron|schedule|setInterval|bull|agenda|bree|node-cron)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "Background job references: $JOBS"
+# Count CI/CD workflows
+CI=$(find .github/workflows -name "*.yml" -o -name "*.yaml" 2>/dev/null | wc -l)
+echo "CI/CD workflow files: $CI"
+# Count webhook receivers
+HOOKS=$(git grep -c -E '(webhook|/hook|/callback|/notify)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
+echo "Webhook references: $HOOKS"
+# Count container configs
+CONTAINERS=$(find . \( -name "Dockerfile*" -o -name "docker-compose*.yml" -o -name ".dockerignore" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l)
+echo "Container config files: $CONTAINERS"
+# Count IaC configs
+IAC=$(find . \( -name "*.tf" -o -name "*.tfvars" -o -name "serverless.yml" -o -name "cdk.json" -o -name "pulumi.*" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l)
+echo "IaC config files: $IAC"
+```
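The census commands above repeat one idiom: `git grep -c` prints `path:count` lines and `awk` sums them. A minimal sketch of that idiom as a reusable helper follows. The names `sum_counts` and `count_refs` are hypothetical, and the use of `$NF` rather than `$2` is a suggested variation that keeps the sum correct when a path itself contains a colon.

```shell
#!/usr/bin/env bash
# sum_counts: read "path:count" lines on stdin and print the total.
# "s + 0" makes empty input print 0 instead of an empty line.
sum_counts() {
  awk -F: '{s += $NF} END {print s + 0}'
}

# Hypothetical wrapper around the census pattern; pathspecs mirror the source.
count_refs() {
  git grep -c -E "$1" -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | sum_counts
}

# Example on canned input, no repository required
printf 'src/a.ts:3\nsrc/b.js:4\n' | sum_counts
```

With such a helper each census line could shrink to something like `WS=$(count_refs '(WebSocket|socket\.io)')`, assuming it runs inside a git work tree.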
+### 0.5B. Census Output
+```
+ATTACK SURFACE CENSUS:
+  ┌────────────────────────┬───────┐
+  │ Surface                 │ Count │
+  ├────────────────────────┼───────┤
+  │ Public endpoints        │ [N]   │
+  │ Authenticated endpoints │ [N]   │
+  │ Admin-only endpoints    │ [N]   │
+  │ File upload points      │ [N]   │
+  │ WebSocket channels      │ [N]   │
+  │ External integrations   │ [N]   │
+  │ Background jobs         │ [N]   │
+  │ CI/CD workflows         │ [N]   │
+  │ Webhook receivers       │ [N]   │
+  │ Container configs       │ [N]   │
+  │ IaC configs             │ [N]   │
+  └────────────────────────┴───────┘
+Higher counts = larger attack surface = more scrutiny needed.
+```
+Use these counts to allocate time in subsequent phases. A project with 50 endpoints and 0 CI/CD workflows should spend 80% of its time on code phases and skip the CI/CD phase. A project with 2 endpoints and 15 CI/CD workflows should prioritize infrastructure.
+**SOFT GATE: Mental model and attack surface census complete. Proceed to secrets archaeology.**
+---
 ## PHASE 1: Secrets Archaeology
 **Goal:** Find every secret that has ever been committed, is currently exposed, or could leak through misconfiguration.
@@ -496,6 +673,110 @@ find node_modules -name "binding.gyp" -maxdepth 3 2>/dev/null | head -10
 ---
+## PHASE 2.5: CI/CD Pipeline Security
+**Goal:** Audit CI/CD pipelines for supply chain attacks, secret exfiltration, and privilege escalation. CI/CD pipelines have production credentials, deployment access, and signing keys -- they are the single highest-value target in most organizations.
+**Time budget:** Daily 2-3 min, Comprehensive 5-10 min.
+### 2.5A. GitHub Actions Audit
+```bash
+# Find all workflow files
+find .github/workflows -name "*.yml" -o -name "*.yaml" 2>/dev/null | while read f; do
+  echo "=== $f ==="
+  # Check for unpinned third-party actions (uses tag instead of SHA)
+  echo "--- Unpinned actions (should use SHA, not tag) ---"
+  grep -n 'uses:' "$f" | grep -v -E '@[0-9a-f]{40}' | grep -v -E 'actions/(checkout|setup-node|cache|upload-artifact|download-artifact)@v' | head -10
+  # Check for dangerous triggers
+  echo "--- Dangerous triggers ---"
+  grep -n 'pull_request_target' "$f" | head -5
+  grep -n 'workflow_dispatch' "$f" | head -5
+  # Check for script injection via interpolation in run: blocks
+  echo "--- Potential script injection ---"
+  grep -n -E '\$\{\{\s*github\.event\.(issue|pull_request|comment|review|discussion)\.' "$f" | head -10
+  # Check for overly broad secret access
+  echo "--- Secret usage ---"
+  grep -n -E 'secrets\.' "$f" | head -10
+done
+```
+### 2.5B. Specific CI/CD Checks
+**Unpinned third-party actions:**
+Third-party GitHub Actions referenced by tag (`@v1`, `@main`) can be silently updated by the action author to inject malicious code. Pin to a full commit SHA (`@a1b2c3d...`).
+```
+FINDING template:
+  Unpinned action: [action@tag]
+  Workflow: [file:line]
+  Risk: Action author can push malicious code that runs with your repo's permissions
+  Fix: Pin to SHA: [action@full-sha] (find SHA at the action's releases page)
+```
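The pinning rule can be checked per reference with the same `@[0-9a-f]{40}` pattern the audit script uses. The sketch below is illustrative only: `is_sha_pinned` is a hypothetical name, and the 40-character value is a made-up placeholder, not a real commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the ref after '@' is a full
# 40-character lowercase hex commit SHA.
is_sha_pinned() {
  printf '%s\n' "$1" | grep -q -E '@[0-9a-f]{40}$'
}

# Placeholder references for illustration (the SHA is not a real commit)
for ref in 'some-org/some-action@v1' \
           'some-org/some-action@main' \
           'some-org/some-action@0123456789abcdef0123456789abcdef01234567'; do
  if is_sha_pinned "$ref"; then
    echo "pinned:   $ref"
  else
    echo "UNPINNED: $ref"
  fi
done
```

Anchoring the pattern at the end of the string means short SHAs and tags that merely contain hex characters are still reported as unpinned.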
+**`pull_request_target` trigger:**
+This trigger runs in the context of the BASE branch, with access to base branch secrets. A malicious PR can exfiltrate secrets if the workflow checks out PR code and runs it. This is one of the most dangerous GitHub Actions patterns.
+```bash
+# Check if pull_request_target workflows check out PR code (the dangerous pattern)
+for f in .github/workflows/*.yml .github/workflows/*.yaml; do
+  [ -f "$f" ] || continue
+  if grep -q 'pull_request_target' "$f"; then
+    echo "=== $f uses pull_request_target ==="
+    # Does it also checkout PR head? (the exploit vector)
+    grep -n -E 'ref.*\$\{\{ github.event.pull_request.head' "$f" | head -5
+    grep -n -E 'ref.*\$\{\{ github.head_ref' "$f" | head -5
+  fi
+done
+```
+**Script injection via expression interpolation:**
+When `${{ github.event.issue.title }}` or similar expressions appear in `run:` blocks, an attacker can craft an issue title containing shell commands that execute in the workflow.
+```bash
+# Find all interpolated expressions in run: blocks
+for f in .github/workflows/*.yml .github/workflows/*.yaml; do
+  [ -f "$f" ] || continue
+  # Look for github.event context used in run blocks (potential injection)
+  grep -n -B2 -A2 '\$\{\{.*github\.event\.' "$f" 2>/dev/null | grep -A2 'run:' | head -20
+done
+```
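The injection mechanism can be reproduced locally without GitHub. Workflow expressions are substituted into the script text before the shell parses it, so the sketch below splices an attacker-style string into a command in the same way, then contrasts it with passing the value through an environment variable. The `title` value and the variable names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative attacker-controlled value (hypothetical issue title)
title='x"; echo INJECTED; echo "'

# Unsafe: the value is spliced into the script text before the shell parses it,
# mirroring how ${{ ... }} expressions are expanded ahead of a run: step.
unsafe=$(bash -c "echo \"Title: $title\"")

# Safer: the value travels as data in an environment variable and is expanded
# only inside double quotes, so it is never parsed as code.
safe=$(TITLE="$title" bash -c 'echo "Title: $TITLE"')

printf 'unsafe output:\n%s\n\nsafe output:\n%s\n' "$unsafe" "$safe"
```

In the unsafe case the embedded `echo INJECTED` runs as its own command. In the safer case the whole string is printed verbatim, which is why assigning the expression to an `env:` variable and quoting it in the `run:` script is the usual remediation.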
+**Secrets passed to unnecessary steps:**
+Each workflow step should only receive the secrets it needs. Steps that receive all secrets via `env:` at the job level can exfiltrate secrets they do not need.
+**CODEOWNERS protection:**
+```bash
+# Check if CODEOWNERS exists and what it protects
+if [ -f "CODEOWNERS" ] || [ -f ".github/CODEOWNERS" ] || [ -f "docs/CODEOWNERS" ]; then
+  echo "CODEOWNERS found:"
+  cat CODEOWNERS .github/CODEOWNERS docs/CODEOWNERS 2>/dev/null
+  echo ""
+  echo "Check: Is branch protection enabled requiring CODEOWNERS review?"
+else
+  echo "WARN: No CODEOWNERS file found"
+fi
+```
+**Self-hosted runner risks:**
+```bash
+# Check for self-hosted runner usage
+for f in .github/workflows/*.yml .github/workflows/*.yaml; do
+  [ -f "$f" ] || continue
+  grep -n 'runs-on.*self-hosted' "$f" | head -5
+done
+```
+Self-hosted runners can access other workflows' secrets, persist state between jobs, and provide lateral movement to the host machine's network. If found, flag as HIGH and recommend ephemeral runners or container isolation.
+**SOFT GATE: Phase 2.5 complete. Proceed to OWASP Top 10.**
+---
 ## PHASE 3: OWASP Top 10
 **Goal:** Systematically check for all OWASP Top 10 (2021) vulnerability categories in the codebase.
@@ -676,9 +957,9 @@ git grep -n -E '(url\.parse|new URL|redirect|location)' -- '*.ts' '*.js' ':!node
 - Internal network addresses (10.x, 172.16.x, 192.168.x, 127.x, localhost) are blocked in user-supplied URLs
 - Redirect endpoints validate the target URL
-**HARD GATE (Daily mode): Phases 1-3 complete. Present findings summary. In daily mode, this is the final gate -- produce the report.**
+**HARD GATE (Daily mode): Phases 0-3 complete. Present findings summary. In daily mode, this is the final gate -- produce the report.**
-**SOFT GATE (Comprehensive mode): Phases 1-3 complete. Proceed to Phase 4.**
+**SOFT GATE (Comprehensive mode): Phases 0-3 complete. Proceed to Phase 4.**
 ---
@@ -876,7 +1157,205 @@ Apply common vulnerability patterns. For each pattern, search the codebase:
 | Mass assignment | Spreading user input into DB write | `git grep -n 'insert.*\.\.\.req\|update.*\.\.\.body\|create.*\.\.\.input'` |
 | Timing attack | Non-constant-time comparison of secrets | `git grep -n '===.*token\|===.*secret\|===.*hash'` |
-**SOFT GATE: Phase 5 complete. Proceed to variant analysis.**
+**SOFT GATE: Phase 5 complete. Proceed to LLM & AI security.**
+---
+## PHASE 5.5: LLM & AI Security
+**Comprehensive mode only.**
+**Goal:** Audit all LLM/AI integration points for prompt injection, output trust violations, key exposure, and unsafe execution patterns.
+**Time budget:** 5-10 min.
+### 5.5A. LLM Integration Detection
+```bash
+# Detect LLM/AI SDKs and API usage
+git grep -l -E '(openai|anthropic|@anthropic-ai|langchain|llama|cohere|replicate|huggingface|ai/core|@ai-sdk)' -- '*.ts' '*.js' '*.py' '*.json' ':!node_modules' ':!*.lock' 2>/dev/null | head -20
+# Detect prompt construction
+git grep -n -E '(system.*prompt|user.*prompt|messages.*role|ChatCompletion|generateText|streamText|createChat)' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -30
+# Detect tool/function calling
+git grep -n -E '(tools|functions|function_call|tool_choice|tool_use)' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -20
+```
+### 5.5B. Prompt Injection Vectors
+Check every code path where user input feeds into LLM prompts:
+```bash
+# Find prompt templates with interpolated user input
+git grep -n -E '(prompt.*\$\{|prompt.*\+|`.*\$\{.*user|`.*\$\{.*input|`.*\$\{.*query|`.*\$\{.*message)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -20
+# Find f-strings or format strings in prompt construction (Python)
+git grep -n -E '(f["\x27].*\{.*user|f["\x27].*\{.*input|\.format\(.*user|\.format\(.*input)' -- '*.py' ':!node_modules' 2>/dev/null | head -20
+```
+**Specific checks:**
+- User input concatenated directly into system prompts without sanitization or delimiters
+- User input placed before system instructions (allows instruction override)
+- No input length limits on user content sent to LLM (cost and injection risk)
+- Missing output validation before rendering LLM responses
+### 5.5C. Unsanitized LLM Output
+LLM output is UNTRUSTED. It must be treated like user input at every rendering boundary.
+```bash
+# LLM output rendered as HTML (XSS via LLM)
+git grep -n -E '(dangerouslySetInnerHTML|v-html|innerHTML)' -- '*.ts' '*.tsx' '*.js' '*.jsx' '*.vue' ':!node_modules' 2>/dev/null | head -10
+# LLM output used in SQL (injection via LLM)
+git grep -n -E '(query|execute|raw)\s*\(' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -10
+# LLM output used in shell commands (command injection via LLM)
+git grep -n -E '(exec|spawn|execSync|child_process)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -10
+# eval/exec of LLM output (arbitrary code execution)
+git grep -n -E '(eval|exec|Function)\s*\(' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -15
+```
+Cross-reference these with LLM output variables. If LLM output flows into any of these sinks without sanitization, it is a finding.
+### 5.5D. Tool/Function Calling Validation
+If the application uses LLM tool/function calling:
+**Specific checks:**
+- Are tool calls from the LLM validated against an allowlist before execution?
+- Are tool call arguments validated/sanitized before being passed to the actual function?
+- Can the LLM request tools that access sensitive resources (file system, database, network)?
+- Is there a human-in-the-loop for destructive tool calls?
+- Are tool results from external sources sanitized before being fed back to the LLM?
+### 5.5E. AI API Key Security
+```bash
+# Hardcoded AI API keys
+git grep -n -E '(sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-proj-[a-zA-Z0-9]{20,})' -- ':!node_modules' ':!*.lock' 2>/dev/null | head -10
+# AI keys in committed env files
+git log --all -p --diff-filter=A -- '*.env' '*.env.*' 2>/dev/null | grep -E '(OPENAI|ANTHROPIC|COHERE|REPLICATE|HUGGINGFACE).*=' | head -10
+# AI keys in client-side code (prefix patterns)
+git grep -n -E '(NEXT_PUBLIC|EXPO_PUBLIC|REACT_APP|VITE_).*(OPENAI|ANTHROPIC|AI_|LLM_)' -- ':!*.md' 2>/dev/null | head -10
+```
+### 5.5F. LLM Rate Limiting and Availability
+**Specific checks:**
+- Rate limiting on endpoints that trigger LLM API calls (prevent cost abuse)
+- Timeout handling for LLM API calls (prevent request queuing)
+- Fallback behavior when LLM API is unavailable (graceful degradation, not crash)
+- Cost controls or budget limits on LLM API usage
+- Retry logic that could amplify costs on failure
+**SOFT GATE: Phase 5.5 complete. Proceed to skill supply chain scanning.**
+---
+## PHASE 5.6: Skill Supply Chain Scanning
+**Comprehensive mode only.**
+**Goal:** Audit installed Claude Code skills (and similar agent tools) for malicious behavior, data exfiltration, and prompt injection. Research shows 36% of published skills have security flaws, and 13.4% are outright malicious (gstack research).
+**Time budget:** 5-10 min.
+### 5.6A. Skill Inventory
+```bash
+# List installed Claude Code skills
+ls -la ~/.claude/skills/ 2>/dev/null | head -30
+# List installed agents
+ls -la ~/.claude/agents/ 2>/dev/null | head -30
+# Check project-local skills
+find . -path '*/.claude/skills/*' -name "*.md" -not -path "*/node_modules/*" 2>/dev/null | head -20
+# Check for hook scripts
+cat .claude/settings.json 2>/dev/null | grep -A5 '"hooks"' | head -20
+cat .claude/settings.local.json 2>/dev/null | grep -A5 '"hooks"' | head -20
+```
+### 5.6B. Network Exfiltration Patterns
+Scan all skill and hook files for outbound network calls:
+```bash
+# Check hook scripts for network calls
+find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
+  echo "=== $f ==="
+  grep -n -E '(curl|wget|fetch|http|https|XMLHttpRequest|net\.Socket|dgram)' "$f" 2>/dev/null | head -5
+done
+# Check skill definitions for network instruction patterns
+find ~/.claude/skills -name "*.md" 2>/dev/null | while read f; do
+  HITS=$(grep -c -i -E '(send.*to|post.*to|upload.*to|exfiltrate|phone.*home|beacon|ping.*server)' "$f" 2>/dev/null)
+  if [ "$HITS" -gt 0 ]; then
+    echo "SUSPICIOUS: $f ($HITS network instruction patterns)"
+  fi
+done
+```
+### 5.6C. Credential Access Patterns
+```bash
+# Check for scripts reading sensitive directories
+find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
+  echo "=== $f ==="
+  grep -n -E '(\.ssh/|\.aws/|\.gnupg/|\.npmrc|\.pypirc|\.netrc|\.docker/config|credentials|keychain)' "$f" 2>/dev/null | head -5
+  grep -n -E '(process\.env|os\.environ|ENV\[|getenv)' "$f" 2>/dev/null | head -5
+done
+```
+### 5.6D. Prompt Injection in Skill Definitions
+Check skill definitions for prompt injection patterns:
+```bash
+# Look for instruction override attempts in skill files
+find ~/.claude/skills -name "*.md" 2>/dev/null | while read f; do
+  HITS=$(grep -n -i -E '(ignore.*previous|disregard.*instructions|override.*system|you are now|forget.*rules|new instructions)' "$f" 2>/dev/null | head -3)
+  [ -n "$HITS" ] && printf '%s\nSUSPICIOUS prompt injection pattern in: %s\n' "$HITS" "$f"
+done
+```
+### 5.6E. Obfuscated Code Detection
+```bash
+# Check hook scripts for obfuscation patterns
+find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
+  echo "=== $f ==="
+  # Base64 encoded commands
+  grep -n -E '(base64|atob|btoa|decode\(|b64decode)' "$f" 2>/dev/null | head -3
+  # Hex-encoded strings
+  grep -n -E '\\x[0-9a-fA-F]{2}.*\\x[0-9a-fA-F]{2}.*\\x[0-9a-fA-F]{2}' "$f" 2>/dev/null | head -3
+  # eval with encoded input
+  grep -n -E '(eval|exec)\s*\(.*decode' "$f" 2>/dev/null | head -3
+done
+```
+### 5.6F. Skill Risk Classification
+For each skill/agent found:
+```
+SKILL: [name]
+  Source: [official | community | unknown]
+  Network access: [none | detected (list URLs)]
+  Credential access: [none | detected (list paths)]
+  Prompt injection: [none | suspicious patterns found]
+  Obfuscation: [none | detected]
+  Risk: [LOW | MEDIUM | HIGH | CRITICAL]
+  Recommendation: [keep | review | remove]
+```
+**SOFT GATE: Phase 5.6 complete. Proceed to variant analysis.**
 ---
@@ -923,7 +1402,22 @@ VARIANT: [location file:line]
   Severity delta: [same as original | higher because... | lower because...]
 ```
-### 6D. Systemic Issue Identification
+### 6D. Variant Analysis Output
+When a finding is VERIFIED in any phase, produce this structured output for every variant search:
+```
+VARIANT ANALYSIS:
+  Finding: [description of the original verified finding]
+  Pattern: [regex or code pattern used to search]
+  Codebase scan results: [N additional instances found]
+  Locations: [file:line for each instance]
+  Exploitability per location: [yes/no/uncertain for each]
+```
+This output is mandatory for every CRITICAL and HIGH finding. For MEDIUM and below, variant analysis is recommended but not required.
+### 6E. Systemic Issue Identification
 If 3+ variants of the same pattern are found:
@@ -1059,7 +1553,7 @@ For daily mode, sections marked [COMPREHENSIVE] are omitted.
 ## [COMPREHENSIVE] Systemic Issues
-{From Phase 6D -- patterns that recur across the codebase}
+{From Phase 6E -- patterns that recur across the codebase}
 ## Fix Priority Matrix
@@ -1078,10 +1572,43 @@ For daily mode, sections marked [COMPREHENSIVE] are omitted.
 ## Trend Tracking
-{If previous audit reports exist, compare:}
+{If previous audit reports exist in `.warp/reports/planning/`, compare:}
+TREND COMPARISON:
+  Previous audit: [date] — [N] findings
+  Current audit: [date] — [N] findings
+  Resolved since last: [N] — [list]
+  Persistent (still open): [N] — [list]
+  New (first seen): [N] — [list]
+  Trend: IMPROVING | STABLE | DEGRADING
 - Findings resolved since last audit: {list}
 - New findings since last audit: {list}
 - Recurring findings (appeared in 2+ audits): {list} — these need systemic fixes
+- Trend direction: IMPROVING (fewer findings) | STABLE (same count) | DEGRADING (more findings)
+## Attack Surface Census
+{From Phase 0.5 — the quantitative attack surface map}
+## [COMPREHENSIVE] Data Classification
+{From Phase 4 data flow analysis — the four-tier data classification table}
+## [COMPREHENSIVE] LLM & AI Security Summary
+{From Phase 5.5 — prompt injection vectors, output trust violations, API key findings}
+## [COMPREHENSIVE] Skill Supply Chain Summary
+{From Phase 5.6 — risk classification for each installed skill/agent}
+---
+DISCLAIMER: This automated security audit is not a substitute for a professional
+penetration test or security assessment. It identifies common vulnerability patterns
+but cannot guarantee completeness. For production systems handling sensitive data,
+engage a professional security firm.
 ```
 ---
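The TREND COMPARISON fields in the report template reduce to set arithmetic over finding identifiers. The sketch below assumes each finding has a stable ID that persists between audits (the `SEC-n` IDs and the `compare_audits` name are hypothetical); the trend rule follows the template's own definition of fewer, same, or more findings.

```python
def compare_audits(previous: set, current: set) -> dict:
    """Compare two audits given the sets of finding IDs each one reported."""
    if len(current) < len(previous):
        trend = "IMPROVING"   # fewer findings than last audit
    elif len(current) > len(previous):
        trend = "DEGRADING"   # more findings than last audit
    else:
        trend = "STABLE"      # same count
    return {
        "resolved": sorted(previous - current),    # gone since last audit
        "persistent": sorted(previous & current),  # still open
        "new": sorted(current - previous),         # first seen this audit
        "trend": trend,
    }


report = compare_audits({"SEC-1", "SEC-2", "SEC-3"}, {"SEC-2", "SEC-4"})
```

Note that a count-based trend can read STABLE even when every finding changed, so the resolved, persistent, and new lists carry more information than the trend label alone.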
@@ -1092,12 +1619,26 @@ These principles are architectural, not tactical. They shape how the system is d
 **Default deny.** New endpoints, resources, and operations are inaccessible until permissions are explicitly defined. The default state of any new surface area is "blocked." Access must be granted, never assumed. If a developer adds a new API endpoint and forgets to add an auth check, the default-deny architecture rejects all requests to it rather than serving them to anyone.
-**Three-tier data classification.** All data in the system belongs to one of three tiers:
-- **Public:** Safe to expose to any user or external system. Example: app version, public documentation.
-- **Sensitive:** Accessible only to authenticated users with appropriate authorization. Example: user profile, flight schedule, notification preferences.
-- **Restricted:** Accessible only to specific roles with audit logging. Example: service role keys, admin credentials, PII aggregates.
+**Four-tier data classification.** All data in the system belongs to one of four tiers. In comprehensive audits, produce the data classification table as part of Phase 4 (STRIDE) data flow analysis:
+```
+DATA CLASSIFICATION:
+  ┌─────────────────┬──────────────┬────────────────────────────────┐
+  │ Data            │ Class        │ Handling Requirements          │
+  ├─────────────────┼──────────────┼────────────────────────────────┤
+  │ [data type]     │ RESTRICTED   │ Encrypted at rest + in transit │
+  │ [data type]     │ CONFIDENTIAL │ Access-controlled, logged      │
+  │ [data type]     │ INTERNAL     │ Not public, basic controls     │
+  │ [data type]     │ PUBLIC       │ No restrictions                │
+  └─────────────────┴──────────────┴────────────────────────────────┘
+```
+- **PUBLIC:** Safe to expose to any user or external system. Example: app version, public documentation.
+- **INTERNAL:** Not public, but low sensitivity. Basic access controls. Example: internal feature flags, non-sensitive configuration.
+- **CONFIDENTIAL:** Accessible only to authenticated users with appropriate authorization. Example: user profile, flight schedule, notification preferences.
+- **RESTRICTED:** Accessible only to specific roles with audit logging. Encrypted at rest and in transit. Example: service role keys, admin credentials, PII aggregates, payment data.
-Every data field in the architecture should be classifiable into one of these tiers. If a field's tier is ambiguous, treat it as Sensitive until explicitly classified.
+Every data field in the architecture should be classifiable into one of these tiers. If a field's tier is ambiguous, treat it as CONFIDENTIAL until explicitly classified. The classification table is a required output in comprehensive mode -- it goes in the report alongside the STRIDE threat model.
 **Design for most restrictive phase, enforce from earliest.** If the product will eventually need HIPAA compliance, SOC 2, or GDPR data residency — design for those constraints now, even if enforcement is deferred. Retrofitting security constraints into an architecture designed without them costs 10x the effort of building them in from the start. No shortcuts that require retrofit.
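The four-tier classification and its "ambiguous means CONFIDENTIAL" fallback can be sketched as a small lookup. This is an illustration only: the `DataClass` enum, the numeric ordering of tiers, the `classify` helper, and the example field names are assumptions, while the handling strings are taken from the classification table.

```python
from enum import IntEnum


class DataClass(IntEnum):
    # Ordered from least to most sensitive (the numeric order is an assumption).
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Handling requirements as listed in the classification table.
HANDLING = {
    DataClass.RESTRICTED: "Encrypted at rest + in transit",
    DataClass.CONFIDENTIAL: "Access-controlled, logged",
    DataClass.INTERNAL: "Not public, basic controls",
    DataClass.PUBLIC: "No restrictions",
}


def classify(field_name: str, known: dict) -> DataClass:
    """Return the tier for a field, defaulting to CONFIDENTIAL when unclassified."""
    return known.get(field_name, DataClass.CONFIDENTIAL)


# Hypothetical field names, chosen to match the examples in the tier list.
known_fields = {
    "app_version": DataClass.PUBLIC,
    "feature_flags": DataClass.INTERNAL,
    "service_role_key": DataClass.RESTRICTED,
}

tier = classify("notification_preferences", known_fields)
```

Here `notification_preferences` has no explicit entry, so it falls back to CONFIDENTIAL rather than to a weaker tier.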
@@ -1133,7 +1674,7 @@ These are security review anti-patterns. If you catch yourself doing any of thes
 1. **MUST redact all secret values in report output.** Never print actual API keys, passwords, tokens, or credentials. Use `[REDACTED]` or show only the first 4 characters.
-2. **MUST run Phase 1 (Secrets Archaeology) first in every audit.** Secrets leaks are the highest-ROI finding for an attacker and take minutes to check.
+2. **MUST run Phase 0 (Architecture Mental Model) and Phase 1 (Secrets Archaeology) in every audit.** Phase 0 builds the context model. Phase 1 is the first scanning phase: leaked secrets are the highest-ROI finding for an attacker, and checking for them takes minutes.
 3. **MUST classify every finding with both OWASP category and STRIDE category.** This ensures completeness -- if a finding does not map to either framework, it may be a false positive.
@@ -1147,7 +1688,7 @@ These are security review anti-patterns. If you catch yourself doing any of thes
 8. **MUST ask the user before making any code changes.** This skill is audit-only by default. Present findings and get explicit approval before touching code.
-9. **MUST consider the full attack surface.** Client-side code, server-side code, CI/CD pipelines, infrastructure configuration, dependency chain, git history -- all of it is in scope.
+9. **MUST consider the full attack surface.** Client-side code, server-side code, CI/CD pipelines, infrastructure configuration, dependency chain, git history, LLM integration points, installed skills/agents -- all of it is in scope. The Phase 0.5 census quantifies this surface.
 10. **MUST apply the severity bar consistently.** Daily mode (8/10 bar) skips LOW/MEDIUM/INFO. Comprehensive mode (2/10 bar) reports everything. Never mix bars within a single audit.
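Rules 1 and 10 both constrain what reaches the report, so they can be sketched together. The code below is a hedged illustration with hypothetical names (`redact`, `filter_findings`, the finding dictionaries): it keeps only CRITICAL and HIGH in daily mode, keeps everything in comprehensive mode, and shows at most the first 4 characters of a secret. Fully redacting secrets of 4 characters or fewer is an added assumption, since a 4-character prefix would otherwise reveal the whole value.

```python
# Daily mode skips LOW/MEDIUM/INFO; comprehensive mode reports everything.
SEVERITIES_BY_MODE = {
    "daily": {"CRITICAL", "HIGH"},
    "comprehensive": {"CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"},
}


def redact(secret: str) -> str:
    """Show at most the first 4 characters of a secret value."""
    if len(secret) <= 4:
        return "[REDACTED]"  # assumption: short secrets are hidden entirely
    return secret[:4] + "[REDACTED]"


def filter_findings(findings: list, mode: str) -> list:
    """Apply one severity bar to the whole audit; unknown modes are rejected."""
    if mode not in SEVERITIES_BY_MODE:
        raise ValueError(f"unknown audit mode: {mode}")
    allowed = SEVERITIES_BY_MODE[mode]
    return [f for f in findings if f["severity"] in allowed]


findings = [
    {"id": "SEC-1", "severity": "CRITICAL"},
    {"id": "SEC-2", "severity": "MEDIUM"},
    {"id": "SEC-3", "severity": "INFO"},
]
daily = filter_findings(findings, "daily")
```

Taking the mode as a single argument for the whole list is one way to avoid mixing bars within an audit.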