warp-os 1.1.2 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45) hide show
  1. package/CHANGELOG.md +85 -0
  2. package/README.md +6 -4
  3. package/VERSION +1 -1
  4. package/agents/warp-annotate.md +394 -0
  5. package/agents/warp-browse.md +9 -1
  6. package/agents/warp-build-code.md +9 -1
  7. package/agents/warp-orchestrator.md +10 -1
  8. package/agents/warp-plan-architect.md +120 -1
  9. package/agents/warp-plan-brainstorm.md +93 -2
  10. package/agents/warp-plan-design.md +97 -4
  11. package/agents/warp-plan-onboarding.md +9 -1
  12. package/agents/warp-plan-optimize.md +9 -1
  13. package/agents/warp-plan-scope.md +67 -1
  14. package/agents/warp-plan-security.md +576 -35
  15. package/agents/warp-plan-testdesign.md +9 -1
  16. package/agents/warp-qa-debug.md +117 -1
  17. package/agents/warp-qa-test.md +167 -1
  18. package/agents/warp-release-update.md +290 -4
  19. package/agents/warp-setup.md +9 -1
  20. package/agents/warp-upgrade.md +21 -4
  21. package/bin/hooks/CLAUDE.md +24 -0
  22. package/bin/hooks/_warp_json.sh +4 -2
  23. package/bin/hooks/identity-briefing.sh +20 -13
  24. package/bin/hooks/validate-askuser.sh +41 -0
  25. package/bin/migrate-sessions.js +284 -173
  26. package/dist/warp-annotate/SKILL.md +404 -0
  27. package/dist/warp-browse/SKILL.md +9 -1
  28. package/dist/warp-build-code/SKILL.md +9 -1
  29. package/dist/warp-orchestrator/SKILL.md +10 -1
  30. package/dist/warp-plan-architect/SKILL.md +120 -1
  31. package/dist/warp-plan-brainstorm/SKILL.md +93 -2
  32. package/dist/warp-plan-design/SKILL.md +97 -4
  33. package/dist/warp-plan-onboarding/SKILL.md +9 -1
  34. package/dist/warp-plan-optimize/SKILL.md +9 -1
  35. package/dist/warp-plan-scope/SKILL.md +67 -1
  36. package/dist/warp-plan-security/SKILL.md +578 -35
  37. package/dist/warp-plan-testdesign/SKILL.md +9 -1
  38. package/dist/warp-qa-debug/SKILL.md +117 -1
  39. package/dist/warp-qa-test/SKILL.md +167 -1
  40. package/dist/warp-release-update/SKILL.md +290 -4
  41. package/dist/warp-setup/SKILL.md +9 -1
  42. package/dist/warp-upgrade/SKILL.md +21 -4
  43. package/package.json +2 -2
  44. package/shared/project-hooks.json +7 -0
  45. package/shared/tier1-engineering-constitution.md +9 -1
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: warp-plan-security
3
3
  description: >-
4
- Full-spectrum security audit: secrets archaeology, dependency supply chain, OWASP Top 10, STRIDE threat modeling, static analysis patterns, variant analysis, and fix verification. Inspired by gstack CSO, Trail of Bits security methodology, and skill-threat-modeling. Two modes: daily (fast, high-confidence, 5-10 min) and comprehensive (full audit, catches everything, 30-60 min).
4
+ Full-spectrum security audit: secrets archaeology, dependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain scanning, OWASP Top 10, STRIDE threat modeling, static analysis patterns, variant analysis, and fix verification. Inspired by gstack CSO, Trail of Bits security methodology, and skill-threat-modeling. Two modes: daily (fast, high-confidence, 5-10 min) and comprehensive (full audit, catches everything, 30-60 min). Scope flags for targeted audits (--infra, --code, --deps, --diff, --skills, --llm).
5
5
  ---
6
6
 
7
7
  <!-- ═══════════════════════════════════════════════════════════ -->
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
140
142
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
141
143
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
142
144
 
145
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
146
+ - ☐ Completeness scores in every option label
147
+ - ☐ Recommended option listed first
148
+ - ☐ One decision per question (split if multiple)
149
+ - ☐ Analysis/reasoning already presented in message text above
150
+
143
151
  **Formatting:**
144
152
  - *Italics* for emphasis, not **bold** (bold for headers only).
145
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
153
+ - After each answer: `✔ Decision {N} recorded`
146
154
  - Previews under 8 lines. Full mockups go in conversation text before the question.
147
155
 
148
156
  ---
@@ -191,22 +199,28 @@ Status values: **DONE**, **DONE_WITH_CONCERNS** (list concerns), **BLOCKED** (st
191
199
  Standalone skill. Runs anytime. Recommended before every `/warp-release-update` and after any dependency change, environment variable addition, or new API endpoint.
192
200
 
193
201
  ```
194
- ┌─────────────────────────────────────────────────────────────┐
195
- WARP-PLAN-SECURITY
196
-
197
- │ Mode: Daily (Phases 1-3) Mode: Comprehensive (1-7)
198
-
199
- Phase 1: Secrets Archaeology
200
- │ Phase 2: Dependency Supply Chain
201
- │ Phase 3: OWASP Top 10
202
- ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
203
- │ Phase 4: STRIDE Threat Model (comprehensive)
204
- │ Phase 5: Static Analysis Patterns (comprehensive)
205
- │ Phase 6: Variant Analysis (comprehensive)
206
- Phase 7: Fix Verification (comprehensive)
207
-
208
- Output: Security audit report (stdout + optional file)
209
- └─────────────────────────────────────────────────────────────┘
202
+ ┌──────────────────────────────────────────────────────────────────┐
203
+ WARP-PLAN-SECURITY
204
+
205
+ │ Mode: Daily (Phases 0-3) Mode: Comprehensive (0-10)
206
+ Scope: --infra --code --deps --diff --skills --llm
207
+
208
+ │ Phase 0: Architecture Mental Model
209
+ │ Phase 0.5: Attack Surface Census
210
+ Phase 1: Secrets Archaeology
211
+ │ Phase 2: Dependency Supply Chain
212
+ │ Phase 2.5: CI/CD Pipeline Security
213
+ │ Phase 3: OWASP Top 10
214
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
215
+ Phase 4: STRIDE Threat Model (comprehensive)
216
+ Phase 5: Static Analysis Patterns (comprehensive)
217
+ │ Phase 5.5: LLM & AI Security (comprehensive) │
218
+ │ Phase 5.6: Skill Supply Chain Scanning (comprehensive) │
219
+ │ Phase 6: Variant Analysis (comprehensive) │
220
+ │ Phase 7: Fix Verification (comprehensive) │
221
+ │ │
222
+ │ Output: Security audit report (stdout + optional file) │
223
+ └──────────────────────────────────────────────────────────────────┘
210
224
  ```
211
225
 
212
226
  ---
@@ -251,16 +265,43 @@ Internalize these cognitive patterns. They are not a checklist -- they are how y
251
265
 
252
266
  On invocation, determine the mode:
253
267
 
254
- - If the user says "daily," "quick," "fast," or "scan" --> **Daily mode** (Phases 1-3)
255
- - If the user says "comprehensive," "full," "audit," or "deep" --> **Comprehensive mode** (Phases 1-7)
268
+ - If the user says "daily," "quick," "fast," or "scan" --> **Daily mode** (Phases 0-3)
269
+ - If the user says "comprehensive," "full," "audit," or "deep" --> **Comprehensive mode** (Phases 0-10)
256
270
  - If the user says nothing about mode --> ask:
257
271
 
258
272
  > "Two audit modes available:
259
- > A) **Daily scan** -- high-confidence findings only (8/10 severity bar), runs 5-10 minutes, covers secrets/deps/OWASP
260
- > B) **Comprehensive audit** -- catches everything (2/10 bar), runs 30-60 minutes, adds threat model/static analysis/variant analysis/fix verification
273
+ > A) **Daily scan** -- high-confidence findings only (8/10 severity bar), runs 5-10 minutes, covers architecture model/attack surface/secrets/deps/CI-CD/OWASP
274
+ > B) **Comprehensive audit** -- catches everything (2/10 bar), runs 30-60 minutes, adds threat model/static analysis/LLM security/skill supply chain/variant analysis/fix verification
261
275
  >
262
276
  > RECOMMENDATION: Choose B if this is pre-ship, first audit, or after major changes. Choose A for routine checks."
263
277
 
278
+ ### Scope Flags
279
+
280
+ Support targeted audits via scope flags. When a scope flag is provided, run ONLY the matching phases (plus Phase 0 for context). Multiple flags can be combined.
281
+
282
+ | Flag | Phases Run | What It Covers |
283
+ |------|-----------|----------------|
284
+ | `--infra` | 0, 0.5, 2.5 (CI/CD) | CI/CD pipelines, Docker, IaC configs only |
285
+ | `--code` | 0, 3 (OWASP), 5 (Static Analysis) | OWASP Top 10, static analysis patterns only |
286
+ | `--deps` | 0, 2 (Dependency Supply Chain) | Dependency supply chain only |
287
+ | `--diff` | 0, then all phases scoped to changed files | Only scan files changed since last audit |
288
+ | `--skills` | 0, 5.6 (Skill Supply Chain) | Claude Code skill supply chain only |
289
+ | `--llm` | 0, 5.5 (LLM & AI Security) | LLM/AI security vectors only |
290
+
291
+ When `--diff` is used, determine the changed file set:
292
+ ```bash
293
+ # If a previous audit report exists, diff since that commit
294
+ LAST_AUDIT_COMMIT=$(git log --all --oneline --grep="Security Audit" --format='%H' | head -1)
295
+ if [ -n "$LAST_AUDIT_COMMIT" ]; then
296
+ git diff --name-only "$LAST_AUDIT_COMMIT"..HEAD
297
+ else
298
+ # Fall back to last 7 days
299
+ git diff --name-only "HEAD@{7 days ago}"..HEAD 2>/dev/null || git diff --name-only HEAD~20..HEAD
300
+ fi
301
+ ```
302
+
303
+ Only scan those files in each phase. Report the diff scope in the audit header.
304
+
264
305
  ### Severity Bar
265
306
 
266
307
  - **Daily mode (8/10 bar):** Only report findings that are HIGH or CRITICAL severity. These are things that an attacker could exploit today with minimal effort. Skip informational, low, and medium findings -- they create noise that delays shipping.
@@ -283,6 +324,142 @@ On invocation, determine the mode:
283
324
 
284
325
  ---
285
326
 
327
+ ## PHASE 0: Architecture Mental Model
328
+
329
+ **Goal:** Before scanning anything, build a mental model of the application's technology stack, deployment model, and integration points. This model determines which subsequent phases are relevant and which can be skipped.
330
+
331
+ **Time budget:** Daily 1-2 min, Comprehensive 2-3 min.
332
+
333
+ ### 0A. Technology Detection
334
+
335
+ Scan project root for configuration files that reveal the stack:
336
+
337
+ ```bash
338
+ # Detect language/runtime
339
+ for f in package.json Cargo.toml go.mod pyproject.toml requirements.txt Gemfile pom.xml build.gradle composer.json; do
340
+ [ -f "$f" ] && echo "FOUND: $f"
341
+ done
342
+
343
+ # Detect framework
344
+ git grep -l -E '(next\.config|nuxt\.config|remix\.config|vite\.config|angular\.json|svelte\.config)' -- ':!node_modules' 2>/dev/null | head -5
345
+ git grep -l -E '(from ["\x27]express|from ["\x27]fastify|from ["\x27]hono|from ["\x27]django|from ["\x27]flask|from ["\x27]rails)' -- ':!node_modules' 2>/dev/null | head -5
346
+
347
+ # Detect database
348
+ git grep -l -E '(prisma|drizzle|typeorm|sequelize|knex|mongoose|supabase|firebase)' -- '*.json' '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -5
349
+
350
+ # Detect auth
351
+ git grep -l -E '(next-auth|passport|jwt|jsonwebtoken|bcrypt|argon2|oauth|clerk|auth0|supabase.*auth)' -- '*.json' '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -5
352
+
353
+ # Detect deployment
354
+ for f in Dockerfile docker-compose.yml fly.toml vercel.json netlify.toml render.yaml serverless.yml terraform.tf; do
355
+ [ -f "$f" ] && echo "DEPLOY: $f"
356
+ done
357
+ find . -name "*.tf" -not -path "*/node_modules/*" -maxdepth 3 2>/dev/null | head -5
358
+ ```
359
+
360
+ ### 0B. Mental Model Output
361
+
362
+ Produce this structured summary before proceeding:
363
+
364
+ ```
365
+ ARCHITECTURE MENTAL MODEL:
366
+ Language/Runtime: [detected from package.json, Cargo.toml, etc.]
367
+ Framework: [Next.js, Express, Django, Rails, etc.]
368
+ Database: [Postgres, MySQL, SQLite, none]
369
+ Auth: [JWT, session, OAuth, none detected]
370
+ External integrations: [list APIs, webhooks, third-party services]
371
+ Deployment: [Docker, serverless, bare metal, PaaS]
372
+
373
+ This mental model guides which phases are relevant and which can be skipped.
374
+ ```
375
+
376
+ ### 0C. Phase Relevance
377
+
378
+ Based on the mental model, note which phases need extra attention:
379
+ - No auth detected --> Phase 3 A01 and A07 are top priority
380
+ - Docker/CI detected --> Phase 2.5 is critical
381
+ - LLM/AI dependencies detected --> Phase 5.5 is critical
382
+ - Claude Code skills installed --> Phase 5.6 is critical
383
+ - External integrations detected --> STRIDE trust boundary analysis is critical
384
+ - No database detected --> skip database-specific checks in OWASP
385
+
386
+ ---
387
+
388
+ ## PHASE 0.5: Attack Surface Census
389
+
390
+ **Goal:** Produce a quantitative map of the application's attack surface. Higher numbers mean larger surface area requiring more scrutiny.
391
+
392
+ **Time budget:** Daily 1-2 min, Comprehensive 3-5 min.
393
+
394
+ ### 0.5A. Surface Enumeration
395
+
396
+ ```bash
397
+ # Count public endpoints (no auth middleware)
398
+ echo "=== Endpoint counts ==="
399
+ PUBLIC=$(git grep -c -E '(app\.(get|post|put|patch|delete)|router\.(get|post|put|patch|delete)|export.*(GET|POST|PUT|PATCH|DELETE))' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
400
+ echo "Route definitions: $PUBLIC"
401
+
402
+ # Count file upload handlers
403
+ UPLOADS=$(git grep -c -E '(multer|upload|formidable|busboy|multipart)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
404
+ echo "File upload references: $UPLOADS"
405
+
406
+ # Count WebSocket channels
407
+ WS=$(git grep -c -E '(WebSocket|socket\.io|ws\(|wss://|io\.on)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
408
+ echo "WebSocket references: $WS"
409
+
410
+ # Count external integrations
411
+ EXT=$(git grep -c -E '(fetch|axios|http\.get|https\.get)\s*\(' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
412
+ echo "External HTTP calls: $EXT"
413
+
414
+ # Count background jobs
415
+ JOBS=$(git grep -c -E '(cron|schedule|setInterval|bull|agenda|bree|node-cron)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
416
+ echo "Background job references: $JOBS"
417
+
418
+ # Count CI/CD workflows
419
+ CI=$(find .github/workflows -name "*.yml" -o -name "*.yaml" 2>/dev/null | wc -l)
420
+ echo "CI/CD workflow files: $CI"
421
+
422
+ # Count webhook receivers
423
+ HOOKS=$(git grep -c -E '(webhook|/hook|/callback|/notify)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | awk -F: '{s+=$2} END {print s+0}')
424
+ echo "Webhook references: $HOOKS"
425
+
426
+ # Count container configs
427
+ CONTAINERS=$(find . \( -name "Dockerfile*" -o -name "docker-compose*.yml" -o -name ".dockerignore" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l)
428
+ echo "Container config files: $CONTAINERS"
429
+
430
+ # Count IaC configs
431
+ IAC=$(find . \( -name "*.tf" -o -name "*.tfvars" -o -name "serverless.yml" -o -name "cdk.json" -o -name "pulumi.*" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l)
432
+ echo "IaC config files: $IAC"
433
+ ```
434
+
435
+ ### 0.5B. Census Output
436
+
437
+ ```
438
+ ATTACK SURFACE CENSUS:
439
+ ┌────────────────────────┬───────┐
440
+ │ Surface │ Count │
441
+ ├────────────────────────┼───────┤
442
+ │ Public endpoints │ [N] │
443
+ │ Authenticated endpoints │ [N] │
444
+ │ Admin-only endpoints │ [N] │
445
+ │ File upload points │ [N] │
446
+ │ WebSocket channels │ [N] │
447
+ │ External integrations │ [N] │
448
+ │ Background jobs │ [N] │
449
+ │ CI/CD workflows │ [N] │
450
+ │ Webhook receivers │ [N] │
451
+ │ Container configs │ [N] │
452
+ │ IaC configs │ [N] │
453
+ └────────────────────────┴───────┘
454
+ Higher counts = larger attack surface = more scrutiny needed.
455
+ ```
456
+
457
+ Use these counts to allocate time in subsequent phases. A project with 50 endpoints and 0 CI/CD workflows should spend 80% of time on code phases and skip CI/CD. A project with 2 endpoints and 15 CI/CD workflows should prioritize infrastructure.
458
+
459
+ **SOFT GATE: Mental model and attack surface census complete. Proceed to secrets archaeology.**
460
+
461
+ ---
462
+
286
463
  ## PHASE 1: Secrets Archaeology
287
464
 
288
465
  **Goal:** Find every secret that has ever been committed, is currently exposed, or could leak through misconfiguration.
@@ -496,6 +673,110 @@ find node_modules -name "binding.gyp" -maxdepth 3 2>/dev/null | head -10
496
673
 
497
674
  ---
498
675
 
676
+ ## PHASE 2.5: CI/CD Pipeline Security
677
+
678
+ **Goal:** Audit CI/CD pipelines for supply chain attacks, secret exfiltration, and privilege escalation. CI/CD pipelines have production credentials, deployment access, and signing keys -- they are the single highest-value target in most organizations.
679
+
680
+ **Time budget:** Daily 2-3 min, Comprehensive 5-10 min.
681
+
682
+ ### 2.5A. GitHub Actions Audit
683
+
684
+ ```bash
685
+ # Find all workflow files
686
+ find .github/workflows -name "*.yml" -o -name "*.yaml" 2>/dev/null | while read f; do
687
+ echo "=== $f ==="
688
+
689
+ # Check for unpinned third-party actions (uses tag instead of SHA)
690
+ echo "--- Unpinned actions (should use SHA, not tag) ---"
691
+ grep -n 'uses:' "$f" | grep -v -E '@[0-9a-f]{40}' | grep -v 'actions/(checkout|setup-node|cache|upload-artifact|download-artifact)@v' | head -10
692
+
693
+ # Check for dangerous triggers
694
+ echo "--- Dangerous triggers ---"
695
+ grep -n 'pull_request_target' "$f" | head -5
696
+ grep -n 'workflow_dispatch' "$f" | head -5
697
+
698
+ # Check for script injection via interpolation in run: blocks
699
+ echo "--- Potential script injection ---"
700
+ grep -n -E '\$\{\{\s*github\.event\.(issue|pull_request|comment|review|discussion)\.' "$f" | head -10
701
+
702
+ # Check for overly broad secret access
703
+ echo "--- Secret usage ---"
704
+ grep -n -E 'secrets\.' "$f" | head -10
705
+ done
706
+ ```
707
+
708
+ ### 2.5B. Specific CI/CD Checks
709
+
710
+ **Unpinned third-party actions:**
711
+ Third-party GitHub Actions referenced by tag (`@v1`, `@main`) can be silently updated by the action author to inject malicious code. Pin to a full commit SHA (`@a1b2c3d...`).
712
+
713
+ ```
714
+ FINDING template:
715
+ Unpinned action: [action@tag]
716
+ Workflow: [file:line]
717
+ Risk: Action author can push malicious code that runs with your repo's permissions
718
+ Fix: Pin to SHA: [action@full-sha] (find SHA at the action's releases page)
719
+ ```
720
+
721
+ **`pull_request_target` trigger:**
722
+ This trigger runs in the context of the BASE branch, with access to base branch secrets. A malicious PR can exfiltrate secrets if the workflow checks out PR code and runs it. This is one of the most dangerous GitHub Actions patterns.
723
+
724
+ ```bash
725
+ # Check if pull_request_target workflows check out PR code (the dangerous pattern)
726
+ for f in .github/workflows/*.yml .github/workflows/*.yaml; do
727
+ [ -f "$f" ] || continue
728
+ if grep -q 'pull_request_target' "$f"; then
729
+ echo "=== $f uses pull_request_target ==="
730
+ # Does it also checkout PR head? (the exploit vector)
731
+ grep -n -E 'ref.*\$\{\{ github.event.pull_request.head' "$f" | head -5
732
+ grep -n -E 'ref.*\$\{\{ github.head_ref' "$f" | head -5
733
+ fi
734
+ done
735
+ ```
736
+
737
+ **Script injection via expression interpolation:**
738
+ When `${{ github.event.issue.title }}` or similar expressions appear in `run:` blocks, an attacker can craft an issue title containing shell commands that execute in the workflow.
739
+
740
+ ```bash
741
+ # Find all interpolated expressions in run: blocks
742
+ for f in .github/workflows/*.yml .github/workflows/*.yaml; do
743
+ [ -f "$f" ] || continue
744
+ # Look for github.event context used in run blocks (potential injection)
745
+ grep -n -B2 -A2 '\$\{\{.*github\.event\.' "$f" 2>/dev/null | grep -A2 'run:' | head -20
746
+ done
747
+ ```
748
+
749
+ **Secrets passed to unnecessary steps:**
750
+ Each workflow step should only receive the secrets it needs. Steps that receive all secrets via `env:` at the job level can exfiltrate secrets they do not need.
751
+
752
+ **CODEOWNERS protection:**
753
+ ```bash
754
+ # Check if CODEOWNERS exists and what it protects
755
+ if [ -f "CODEOWNERS" ] || [ -f ".github/CODEOWNERS" ] || [ -f "docs/CODEOWNERS" ]; then
756
+ echo "CODEOWNERS found:"
757
+ cat CODEOWNERS .github/CODEOWNERS docs/CODEOWNERS 2>/dev/null
758
+ echo ""
759
+ echo "Check: Is branch protection enabled requiring CODEOWNERS review?"
760
+ else
761
+ echo "WARN: No CODEOWNERS file found"
762
+ fi
763
+ ```
764
+
765
+ **Self-hosted runner risks:**
766
+ ```bash
767
+ # Check for self-hosted runner usage
768
+ for f in .github/workflows/*.yml .github/workflows/*.yaml; do
769
+ [ -f "$f" ] || continue
770
+ grep -n 'runs-on.*self-hosted' "$f" | head -5
771
+ done
772
+ ```
773
+
774
+ Self-hosted runners can access other workflows' secrets, persist state between jobs, and provide lateral movement to the host machine's network. If found, flag as HIGH and recommend ephemeral runners or container isolation.
775
+
776
+ **SOFT GATE: Phase 2.5 complete. Proceed to OWASP Top 10.**
777
+
778
+ ---
779
+
499
780
  ## PHASE 3: OWASP Top 10
500
781
 
501
782
  **Goal:** Systematically check for all OWASP Top 10 (2021) vulnerability categories in the codebase.
@@ -676,9 +957,9 @@ git grep -n -E '(url\.parse|new URL|redirect|location)' -- '*.ts' '*.js' ':!node
676
957
  - Internal network addresses (10.x, 172.16.x, 192.168.x, 127.x, localhost) are blocked in user-supplied URLs
677
958
  - Redirect endpoints validate the target URL
678
959
 
679
- **HARD GATE (Daily mode): Phases 1-3 complete. Present findings summary. In daily mode, this is the final gate -- produce the report.**
960
+ **HARD GATE (Daily mode): Phases 0-3 complete. Present findings summary. In daily mode, this is the final gate -- produce the report.**
680
961
 
681
- **SOFT GATE (Comprehensive mode): Phases 1-3 complete. Proceed to Phase 4.**
962
+ **SOFT GATE (Comprehensive mode): Phases 0-3 complete. Proceed to Phase 4.**
682
963
 
683
964
  ---
684
965
 
@@ -876,7 +1157,205 @@ Apply common vulnerability patterns. For each pattern, search the codebase:
876
1157
  | Mass assignment | Spreading user input into DB write | `git grep -n 'insert.*\.\.\.req\|update.*\.\.\.body\|create.*\.\.\.input'` |
877
1158
  | Timing attack | Non-constant-time comparison of secrets | `git grep -n '===.*token\|===.*secret\|===.*hash'` |
878
1159
 
879
- **SOFT GATE: Phase 5 complete. Proceed to variant analysis.**
1160
+ **SOFT GATE: Phase 5 complete. Proceed to LLM & AI security.**
1161
+
1162
+ ---
1163
+
1164
+ ## PHASE 5.5: LLM & AI Security
1165
+
1166
+ **Comprehensive mode only.**
1167
+
1168
+ **Goal:** Audit all LLM/AI integration points for prompt injection, output trust violations, key exposure, and unsafe execution patterns.
1169
+
1170
+ **Time budget:** 5-10 min.
1171
+
1172
+ ### 5.5A. LLM Integration Detection
1173
+
1174
+ ```bash
1175
+ # Detect LLM/AI SDKs and API usage
1176
+ git grep -l -E '(openai|anthropic|@anthropic-ai|langchain|llama|cohere|replicate|huggingface|ai/core|@ai-sdk)' -- '*.ts' '*.js' '*.py' '*.json' ':!node_modules' ':!*.lock' 2>/dev/null | head -20
1177
+
1178
+ # Detect prompt construction
1179
+ git grep -n -E '(system.*prompt|user.*prompt|messages.*role|ChatCompletion|generateText|streamText|createChat)' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -30
1180
+
1181
+ # Detect tool/function calling
1182
+ git grep -n -E '(tools|functions|function_call|tool_choice|tool_use)' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -20
1183
+ ```
1184
+
1185
+ ### 5.5B. Prompt Injection Vectors
1186
+
1187
+ Check every code path where user input feeds into LLM prompts:
1188
+
1189
+ ```bash
1190
+ # Find prompt templates with interpolated user input
1191
+ git grep -n -E '(prompt.*\$\{|prompt.*\+|`.*\$\{.*user|`.*\$\{.*input|`.*\$\{.*query|`.*\$\{.*message)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -20
1192
+
1193
+ # Find f-strings or format strings in prompt construction (Python)
1194
+ git grep -n -E '(f["\x27].*\{.*user|f["\x27].*\{.*input|\.format\(.*user|\.format\(.*input)' -- '*.py' ':!node_modules' 2>/dev/null | head -20
1195
+ ```
1196
+
1197
+ **Specific checks:**
1198
+ - User input concatenated directly into system prompts without sanitization or delimiters
1199
+ - User input placed before system instructions (allows instruction override)
1200
+ - No input length limits on user content sent to LLM (cost and injection risk)
1201
+ - Missing output validation before rendering LLM responses
1202
+
1203
+ ### 5.5C. Unsanitized LLM Output
1204
+
1205
+ LLM output is UNTRUSTED. It must be treated like user input at every rendering boundary.
1206
+
1207
+ ```bash
1208
+ # LLM output rendered as HTML (XSS via LLM)
1209
+ git grep -n -E '(dangerouslySetInnerHTML|v-html|innerHTML)' -- '*.ts' '*.tsx' '*.js' '*.jsx' '*.vue' ':!node_modules' 2>/dev/null | head -10
1210
+
1211
+ # LLM output used in SQL (injection via LLM)
1212
+ git grep -n -E '(query|execute|raw)\s*\(' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -10
1213
+
1214
+ # LLM output used in shell commands (command injection via LLM)
1215
+ git grep -n -E '(exec|spawn|execSync|child_process)' -- '*.ts' '*.js' ':!node_modules' 2>/dev/null | head -10
1216
+
1217
+ # eval/exec of LLM output (arbitrary code execution)
1218
+ git grep -n -E '(eval|exec|Function)\s*\(' -- '*.ts' '*.js' '*.py' ':!node_modules' 2>/dev/null | head -15
1219
+ ```
1220
+
1221
+ Cross-reference these with LLM output variables. If LLM output flows into any of these sinks without sanitization, it is a finding.
1222
+
1223
+ ### 5.5D. Tool/Function Calling Validation
1224
+
1225
+ If the application uses LLM tool/function calling:
1226
+
1227
+ **Specific checks:**
1228
+ - Are tool calls from the LLM validated against an allowlist before execution?
1229
+ - Are tool call arguments validated/sanitized before being passed to the actual function?
1230
+ - Can the LLM request tools that access sensitive resources (file system, database, network)?
1231
+ - Is there a human-in-the-loop for destructive tool calls?
1232
+ - Are tool results from external sources sanitized before being fed back to the LLM?
1233
+
1234
+ ### 5.5E. AI API Key Security
1235
+
1236
+ ```bash
1237
+ # Hardcoded AI API keys
1238
+ git grep -n -E '(sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-proj-[a-zA-Z0-9]{20,})' -- ':!node_modules' ':!*.lock' 2>/dev/null | head -10
1239
+
1240
+ # AI keys in committed env files
1241
+ git log --all -p --diff-filter=A -- '*.env' '*.env.*' 2>/dev/null | grep -E '(OPENAI|ANTHROPIC|COHERE|REPLICATE|HUGGINGFACE).*=' | head -10
1242
+
1243
+ # AI keys in client-side code (prefix patterns)
1244
+ git grep -n -E '(NEXT_PUBLIC|EXPO_PUBLIC|REACT_APP|VITE_).*(OPENAI|ANTHROPIC|AI_|LLM_)' -- ':!*.md' 2>/dev/null | head -10
1245
+ ```
1246
+
1247
+ ### 5.5F. LLM Rate Limiting and Availability
1248
+
1249
+ **Specific checks:**
1250
+ - Rate limiting on endpoints that trigger LLM API calls (prevent cost abuse)
1251
+ - Timeout handling for LLM API calls (prevent request queuing)
1252
+ - Fallback behavior when LLM API is unavailable (graceful degradation, not crash)
1253
+ - Cost controls or budget limits on LLM API usage
1254
+ - Retry logic that could amplify costs on failure
1255
+
1256
+ **SOFT GATE: Phase 5.5 complete. Proceed to skill supply chain scanning.**
1257
+
1258
+ ---
1259
+
1260
+ ## PHASE 5.6: Skill Supply Chain Scanning
1261
+
1262
+ **Comprehensive mode only.**
1263
+
1264
+ **Goal:** Audit installed Claude Code skills (and similar agent tools) for malicious behavior, data exfiltration, and prompt injection. Research shows 36% of published skills have security flaws, and 13.4% are outright malicious (gstack research).
1265
+
1266
+ **Time budget:** 5-10 min.
1267
+
1268
+ ### 5.6A. Skill Inventory
1269
+
1270
+ ```bash
1271
+ # List installed Claude Code skills
1272
+ ls -la ~/.claude/skills/ 2>/dev/null | head -30
1273
+
1274
+ # List installed agents
1275
+ ls -la ~/.claude/agents/ 2>/dev/null | head -30
1276
+
1277
+ # Check project-local skills
1278
+ find . -path '*/.claude/skills/*' -name "*.md" -not -path "*/node_modules/*" 2>/dev/null | head -20
1279
+
1280
+ # Check for hook scripts
1281
+ cat .claude/settings.json 2>/dev/null | grep -A5 '"hooks"' | head -20
1282
+ cat .claude/settings.local.json 2>/dev/null | grep -A5 '"hooks"' | head -20
1283
+ ```
1284
+
1285
+ ### 5.6B. Network Exfiltration Patterns
1286
+
1287
+ Scan all skill and hook files for outbound network calls:
1288
+
1289
+ ```bash
1290
+ # Check hook scripts for network calls
1291
+ find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
1292
+ echo "=== $f ==="
1293
+ grep -n -E '(curl|wget|fetch|http|https|XMLHttpRequest|net\.Socket|dgram)' "$f" 2>/dev/null | head -5
1294
+ done
1295
+
1296
+ # Check skill definitions for network instruction patterns
1297
+ find ~/.claude/skills -name "*.md" 2>/dev/null | while read f; do
1298
+ HITS=$(grep -c -i -E '(send.*to|post.*to|upload.*to|exfiltrate|phone.*home|beacon|ping.*server)' "$f" 2>/dev/null)
1299
+ if [ "$HITS" -gt 0 ]; then
1300
+ echo "SUSPICIOUS: $f ($HITS network instruction patterns)"
1301
+ fi
1302
+ done
1303
+ ```
1304
+
1305
+ ### 5.6C. Credential Access Patterns
1306
+
1307
+ ```bash
1308
+ # Check for scripts reading sensitive directories
1309
+ find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
1310
+ echo "=== $f ==="
1311
+ grep -n -E '(\.ssh/|\.aws/|\.gnupg/|\.npmrc|\.pypirc|\.netrc|\.docker/config|credentials|keychain)' "$f" 2>/dev/null | head -5
1312
+ grep -n -E '(process\.env|os\.environ|ENV\[|getenv)' "$f" 2>/dev/null | head -5
1313
+ done
1314
+ ```
1315
+
1316
+ ### 5.6D. Prompt Injection in Skill Definitions
1317
+
1318
+ Check skill definitions for prompt injection patterns:
1319
+
1320
+ ```bash
1321
+ # Look for instruction override attempts in skill files
1322
+ find ~/.claude/skills -name "*.md" 2>/dev/null | while read f; do
1323
+ grep -n -i -E '(ignore.*previous|disregard.*instructions|override.*system|you are now|forget.*rules|new instructions)' "$f" 2>/dev/null | head -3
1324
+ [ $? -eq 0 ] && echo "SUSPICIOUS prompt injection pattern in: $f"
1325
+ done
1326
+ ```
1327
+
1328
+ ### 5.6E. Obfuscated Code Detection
1329
+
1330
+ ```bash
1331
+ # Check hook scripts for obfuscation patterns
1332
+ find .claude -name "*.sh" -o -name "*.js" -o -name "*.py" 2>/dev/null | while read f; do
1333
+ echo "=== $f ==="
1334
+ # Base64 encoded commands
1335
+ grep -n -E '(base64|atob|btoa|decode\(|b64decode)' "$f" 2>/dev/null | head -3
1336
+ # Hex-encoded strings
1337
+ grep -n -E '\\x[0-9a-fA-F]{2}.*\\x[0-9a-fA-F]{2}.*\\x[0-9a-fA-F]{2}' "$f" 2>/dev/null | head -3
1338
+ # eval with encoded input
1339
+ grep -n -E '(eval|exec)\s*\(.*decode' "$f" 2>/dev/null | head -3
1340
+ done
1341
+ ```
1342
+
1343
+ ### 5.6F. Skill Risk Classification
1344
+
1345
+ For each skill/agent found:
1346
+
1347
+ ```
1348
+ SKILL: [name]
1349
+ Source: [official | community | unknown]
1350
+ Network access: [none | detected (list URLs)]
1351
+ Credential access: [none | detected (list paths)]
1352
+ Prompt injection: [none | suspicious patterns found]
1353
+ Obfuscation: [none | detected]
1354
+ Risk: [LOW | MEDIUM | HIGH | CRITICAL]
1355
+ Recommendation: [keep | review | remove]
1356
+ ```
1357
+
1358
+ **SOFT GATE: Phase 5.6 complete. Proceed to variant analysis.**
880
1359
 
881
1360
  ---
882
1361
 
@@ -923,7 +1402,22 @@ VARIANT: [location file:line]
923
1402
  Severity delta: [same as original | higher because... | lower because...]
924
1403
  ```
925
1404
 
926
- ### 6D. Systemic Issue Identification
1405
+ ### 6D. Variant Analysis Output
1406
+
1407
+ When a finding is VERIFIED in any phase, produce this structured output for every variant search:
1408
+
1409
+ ```
1410
+ VARIANT ANALYSIS:
1411
+ Finding: [description of the original verified finding]
1412
+ Pattern: [regex or code pattern used to search]
1413
+ Codebase scan results: [N additional instances found]
1414
+ Locations: [file:line for each instance]
1415
+ Exploitability per location: [yes/no/uncertain for each]
1416
+ ```
1417
+
1418
+ This output is mandatory for every CRITICAL and HIGH finding. For MEDIUM and below, variant analysis is recommended but not required.
1419
+
1420
+ ### 6E. Systemic Issue Identification
927
1421
 
928
1422
  If 3+ variants of the same pattern are found:
929
1423
 
@@ -1059,7 +1553,7 @@ For daily mode, sections marked [COMPREHENSIVE] are omitted.
1059
1553
 
1060
1554
  ## [COMPREHENSIVE] Systemic Issues
1061
1555
 
1062
- {From Phase 6D -- patterns that recur across the codebase}
1556
+ {From Phase 6E -- patterns that recur across the codebase}
1063
1557
 
1064
1558
  ## Fix Priority Matrix
1065
1559
 
@@ -1078,10 +1572,43 @@ For daily mode, sections marked [COMPREHENSIVE] are omitted.
1078
1572
 
1079
1573
  ## Trend Tracking
1080
1574
 
1081
- {If previous audit reports exist, compare:}
1575
+ {If previous audit reports exist in `.warp/reports/planning/`, compare:}
1576
+
1577
+ TREND COMPARISON:
1578
+ Previous audit: [date] — [N] findings
1579
+ Current audit: [date] — [N] findings
1580
+ Resolved since last: [N] — [list]
1581
+ Persistent (still open): [N] — [list]
1582
+ New (first seen): [N] — [list]
1583
+ Trend: IMPROVING | STABLE | DEGRADING
1584
+
1082
1585
  - Findings resolved since last audit: {list}
1083
1586
  - New findings since last audit: {list}
1084
1587
  - Recurring findings (appeared in 2+ audits): {list} — these need systemic fixes
1588
+ - Trend direction: IMPROVING (fewer findings) | STABLE (same count) | DEGRADING (more findings)
1589
+
1590
+ ## Attack Surface Census
1591
+
1592
+ {From Phase 0.5 — the quantitative attack surface map}
1593
+
1594
+ ## [COMPREHENSIVE] Data Classification
1595
+
1596
+ {From Phase 4 data flow analysis — the four-tier data classification table}
1597
+
1598
+ ## [COMPREHENSIVE] LLM & AI Security Summary
1599
+
1600
+ {From Phase 5.5 — prompt injection vectors, output trust violations, API key findings}
1601
+
1602
+ ## [COMPREHENSIVE] Skill Supply Chain Summary
1603
+
1604
+ {From Phase 5.6 — risk classification for each installed skill/agent}
1605
+
1606
+ ---
1607
+
1608
+ DISCLAIMER: This automated security audit is not a substitute for a professional
1609
+ penetration test or security assessment. It identifies common vulnerability patterns
1610
+ but cannot guarantee completeness. For production systems handling sensitive data,
1611
+ engage a professional security firm.
1085
1612
  ```
1086
1613
 
1087
1614
  ---
@@ -1092,12 +1619,26 @@ These principles are architectural, not tactical. They shape how the system is d
1092
1619
 
1093
1620
  **Default deny.** New endpoints, resources, and operations are inaccessible until permissions are explicitly defined. The default state of any new surface area is "blocked." Access must be granted, never assumed. If a developer adds a new API endpoint and forgets to add an auth check, the default-deny architecture rejects all requests to it rather than serving them to anyone.
1094
1621
 
1095
- **Three-tier data classification.** All data in the system belongs to one of three tiers:
1096
- - **Public:** Safe to expose to any user or external system. Example: app version, public documentation.
1097
- - **Sensitive:** Accessible only to authenticated users with appropriate authorization. Example: user profile, flight schedule, notification preferences.
1098
- - **Restricted:** Accessible only to specific roles with audit logging. Example: service role keys, admin credentials, PII aggregates.
1622
+ **Four-tier data classification.** All data in the system belongs to one of four tiers. In comprehensive audits, produce the data classification table as part of Phase 4 (STRIDE) data flow analysis:
1623
+
1624
+ ```
1625
+ DATA CLASSIFICATION:
1626
+ ┌────────────────┬─────────────┬──────────────────────────────┐
1627
+ │ Data │ Class │ Handling Requirements │
1628
+ ├────────────────┼─────────────┼──────────────────────────────┤
1629
+ │ [data type] │ RESTRICTED │ Encrypted at rest + in transit │
1630
+ │ [data type] │ CONFIDENTIAL │ Access-controlled, logged │
1631
+ │ [data type] │ INTERNAL │ Not public, basic controls │
1632
+ │ [data type] │ PUBLIC │ No restrictions │
1633
+ └────────────────┴─────────────┴──────────────────────────────┘
1634
+ ```
1635
+
1636
+ - **PUBLIC:** Safe to expose to any user or external system. Example: app version, public documentation.
1637
+ - **INTERNAL:** Not public, but low sensitivity. Basic access controls. Example: internal feature flags, non-sensitive configuration.
1638
+ - **CONFIDENTIAL:** Accessible only to authenticated users with appropriate authorization. Example: user profile, flight schedule, notification preferences.
1639
+ - **RESTRICTED:** Accessible only to specific roles with audit logging. Encrypted at rest and in transit. Example: service role keys, admin credentials, PII aggregates, payment data.
1099
1640
 
1100
- Every data field in the architecture should be classifiable into one of these tiers. If a field's tier is ambiguous, treat it as Sensitive until explicitly classified.
1641
+ Every data field in the architecture should be classifiable into one of these tiers. If a field's tier is ambiguous, treat it as CONFIDENTIAL until explicitly classified. The classification table is a required output in comprehensive mode -- it goes in the report alongside the STRIDE threat model.
1101
1642
 
1102
1643
  **Design for most restrictive phase, enforce from earliest.** If the product will eventually need HIPAA compliance, SOC 2, or GDPR data residency — design for those constraints now, even if enforcement is deferred. Retrofitting security constraints into an architecture designed without them costs 10x the effort of building them in from the start. No shortcuts that require retrofit.
1103
1644
 
@@ -1133,7 +1674,7 @@ These are security review anti-patterns. If you catch yourself doing any of thes
1133
1674
 
1134
1675
  1. **MUST redact all secret values in report output.** Never print actual API keys, passwords, tokens, or credentials. Use `[REDACTED]` or show only the first 4 characters.
1135
1676
 
1136
- 2. **MUST run Phase 1 (Secrets Archaeology) first in every audit.** Secrets leaks are the highest-ROI finding for an attacker and take minutes to check.
1677
+ 2. **MUST run Phase 0 (Architecture Mental Model) and Phase 1 (Secrets Archaeology) in every audit.** Phase 0 builds the context model. Phase 1 (secrets) is the first scanning phase and the highest-ROI finding for an attacker -- it takes minutes to check.
1137
1678
 
1138
1679
  3. **MUST classify every finding with both OWASP category and STRIDE category.** This ensures completeness -- if a finding does not map to either framework, it may be a false positive.
1139
1680
 
@@ -1147,7 +1688,7 @@ These are security review anti-patterns. If you catch yourself doing any of thes
1147
1688
 
1148
1689
  8. **MUST ask the user before making any code changes.** This skill is audit-only by default. Present findings and get explicit approval before touching code.
1149
1690
 
1150
- 9. **MUST consider the full attack surface.** Client-side code, server-side code, CI/CD pipelines, infrastructure configuration, dependency chain, git history -- all of it is in scope.
1691
+ 9. **MUST consider the full attack surface.** Client-side code, server-side code, CI/CD pipelines, infrastructure configuration, dependency chain, git history, LLM integration points, installed skills/agents -- all of it is in scope. The Phase 0.5 census quantifies this surface.
1151
1692
 
1152
1693
  10. **MUST apply the severity bar consistently.** Daily mode (8/10 bar) skips LOW/MEDIUM/INFO. Comprehensive mode (2/10 bar) reports everything. Never mix bars within a single audit.
1153
1694