@jamie-tam/forge 6.0.0

Files changed (213)
  1. package/LICENSE +21 -0
  2. package/README.md +389 -0
  3. package/agents/architect.md +92 -0
  4. package/agents/builder.md +122 -0
  5. package/agents/code-reviewer.md +107 -0
  6. package/agents/concept-designer.md +207 -0
  7. package/agents/craft-reviewer.md +132 -0
  8. package/agents/critic.md +130 -0
  9. package/agents/doc-writer.md +85 -0
  10. package/agents/dreamer.md +129 -0
  11. package/agents/e2e-runner.md +89 -0
  12. package/agents/gotcha-hunter.md +127 -0
  13. package/agents/prototype-builder.md +193 -0
  14. package/agents/prototype-codifier.md +204 -0
  15. package/agents/prototype-reviewer.md +163 -0
  16. package/agents/security-reviewer.md +108 -0
  17. package/agents/spec-reviewer.md +94 -0
  18. package/agents/tracer.md +98 -0
  19. package/agents/wireframer.md +109 -0
  20. package/commands/abort.md +25 -0
  21. package/commands/bugfix.md +151 -0
  22. package/commands/evolve.md +118 -0
  23. package/commands/feature.md +236 -0
  24. package/commands/forge.md +100 -0
  25. package/commands/greenfield.md +185 -0
  26. package/commands/hotfix.md +98 -0
  27. package/commands/refactor.md +147 -0
  28. package/commands/resume.md +25 -0
  29. package/commands/setup.md +201 -0
  30. package/commands/status.md +27 -0
  31. package/commands/task-force.md +110 -0
  32. package/commands/validate.md +12 -0
  33. package/dist/__tests__/active-manifest.test.js +272 -0
  34. package/dist/__tests__/copy.test.js +96 -0
  35. package/dist/__tests__/gate-check.test.js +384 -0
  36. package/dist/__tests__/wiki.test.js +472 -0
  37. package/dist/__tests__/work-manifest.test.js +304 -0
  38. package/dist/active-manifest.js +229 -0
  39. package/dist/cli.js +158 -0
  40. package/dist/copy.js +124 -0
  41. package/dist/gate-check.js +326 -0
  42. package/dist/hooks.js +60 -0
  43. package/dist/init.js +140 -0
  44. package/dist/manifest.js +90 -0
  45. package/dist/merge.js +77 -0
  46. package/dist/paths.js +36 -0
  47. package/dist/uninstall.js +216 -0
  48. package/dist/update.js +158 -0
  49. package/dist/verify-manifest.js +65 -0
  50. package/dist/verify.js +98 -0
  51. package/dist/wiki-ui.js +310 -0
  52. package/dist/wiki.js +364 -0
  53. package/dist/work-manifest.js +798 -0
  54. package/hooks/config/gate-requirements.json +79 -0
  55. package/hooks/hooks.json +143 -0
  56. package/hooks/scripts/analyze-telemetry.sh +114 -0
  57. package/hooks/scripts/gate-enforcer.sh +164 -0
  58. package/hooks/scripts/pre-compact.sh +90 -0
  59. package/hooks/scripts/session-start.sh +81 -0
  60. package/hooks/scripts/telemetry.sh +41 -0
  61. package/hooks/scripts/wiki-lint.sh +87 -0
  62. package/hooks/templates/AGENTS.md.template +48 -0
  63. package/hooks/templates/CLAUDE.md.template +45 -0
  64. package/package.json +55 -0
  65. package/protocols/README.md +40 -0
  66. package/protocols/codex.md +151 -0
  67. package/protocols/graphify.md +156 -0
  68. package/references/common/agent-coordination.md +65 -0
  69. package/references/common/coding-standards.md +54 -0
  70. package/references/common/feature-tracking.md +21 -0
  71. package/references/common/io-protocol.md +36 -0
  72. package/references/common/phases.md +57 -0
  73. package/references/common/quality-gates.md +130 -0
  74. package/references/common/skill-authoring.md +154 -0
  75. package/references/common/skill-compliance.md +30 -0
  76. package/references/python/standards.md +44 -0
  77. package/references/react/standards.md +61 -0
  78. package/references/typescript/standards.md +42 -0
  79. package/rules/common/forge-system.md +59 -0
  80. package/rules/common/git-workflow.md +40 -0
  81. package/rules/common/guardrails.md +37 -0
  82. package/rules/common/quality-gates.md +18 -0
  83. package/rules/common/security.md +50 -0
  84. package/rules/common/skill-selection.md +78 -0
  85. package/rules/common/testing.md +58 -0
  86. package/rules/common/verification.md +39 -0
  87. package/skills/build-pr-workflow/SKILL.md +301 -0
  88. package/skills/build-pr-workflow/references/pr-template.md +62 -0
  89. package/skills/build-pr-workflow/references/subagent-merge.md +47 -0
  90. package/skills/build-pr-workflow/references/worktree-setup.md +125 -0
  91. package/skills/build-prototype/SKILL.md +264 -0
  92. package/skills/build-scaffold/SKILL.md +340 -0
  93. package/skills/build-tdd/SKILL.md +89 -0
  94. package/skills/build-wireframe/SKILL.md +110 -0
  95. package/skills/build-wireframe/assets/baseline-template.html +486 -0
  96. package/skills/build-wireframe/references/demo-walkthroughs.md +170 -0
  97. package/skills/build-wireframe/references/gotchas.md +188 -0
  98. package/skills/build-wireframe/references/legend-lines.md +141 -0
  99. package/skills/concept-slides/SKILL.md +192 -0
  100. package/skills/deliver-db-migration/SKILL.md +466 -0
  101. package/skills/deliver-deploy/SKILL.md +407 -0
  102. package/skills/deliver-onboarding/SKILL.md +198 -0
  103. package/skills/deliver-onboarding/references/document-templates.md +393 -0
  104. package/skills/deliver-onboarding/templates/getting-started.md +122 -0
  105. package/skills/discover-codebase-analysis/SKILL.md +448 -0
  106. package/skills/discover-requirements/SKILL.md +418 -0
  107. package/skills/discover-requirements/templates/prd.md +99 -0
  108. package/skills/discover-requirements/templates/technical-spec.md +123 -0
  109. package/skills/discover-requirements/templates/user-stories.md +76 -0
  110. package/skills/harden/SKILL.md +214 -0
  111. package/skills/iterate-prototype/SKILL.md +241 -0
  112. package/skills/plan-architecture/SKILL.md +457 -0
  113. package/skills/plan-architecture/templates/adr-template.md +52 -0
  114. package/skills/plan-architecture/templates/api-contract.md +99 -0
  115. package/skills/plan-architecture/templates/db-schema.md +81 -0
  116. package/skills/plan-architecture/templates/system-design.md +111 -0
  117. package/skills/plan-brainstorm/SKILL.md +433 -0
  118. package/skills/plan-design-system/SKILL.md +279 -0
  119. package/skills/plan-task-decompose/SKILL.md +454 -0
  120. package/skills/quality-code-review/SKILL.md +286 -0
  121. package/skills/quality-security-audit/SKILL.md +292 -0
  122. package/skills/quality-security-audit/references/audit-report-template.md +89 -0
  123. package/skills/quality-security-audit/references/owasp-checks.md +178 -0
  124. package/skills/quality-test-execution/SKILL.md +435 -0
  125. package/skills/quality-test-plan/SKILL.md +297 -0
  126. package/skills/quality-test-plan/references/test-type-guide.md +263 -0
  127. package/skills/quality-test-plan/templates/e2e-test-plan.md +72 -0
  128. package/skills/quality-test-plan/templates/integration-test-plan.md +74 -0
  129. package/skills/quality-test-plan/templates/load-test-plan.md +111 -0
  130. package/skills/quality-test-plan/templates/smoke-test-plan.md +68 -0
  131. package/skills/quality-test-plan/templates/unit-test-plan.md +56 -0
  132. package/skills/quality-uiux/SKILL.md +481 -0
  133. package/skills/support-debug/SKILL.md +464 -0
  134. package/skills/support-dream/SKILL.md +213 -0
  135. package/skills/support-gotcha/SKILL.md +249 -0
  136. package/skills/support-runtime-reachability/SKILL.md +190 -0
  137. package/skills/support-runtime-reachability/scripts/__fixtures__/case-01-passes-app-use/src/app.ts +7 -0
  138. package/skills/support-runtime-reachability/scripts/__fixtures__/case-01-passes-app-use/src/handlers/cases.ts +7 -0
  139. package/skills/support-runtime-reachability/scripts/__fixtures__/case-02-orphan-no-app-use/src/app.ts +8 -0
  140. package/skills/support-runtime-reachability/scripts/__fixtures__/case-02-orphan-no-app-use/src/handlers/cases.ts +7 -0
  141. package/skills/support-runtime-reachability/scripts/__fixtures__/case-03-orphan-import-only/src/App.tsx +5 -0
  142. package/skills/support-runtime-reachability/scripts/__fixtures__/case-03-orphan-import-only/src/components/RingingBanner.tsx +7 -0
  143. package/skills/support-runtime-reachability/scripts/__fixtures__/case-03-orphan-import-only/src/hooks/useTwilio.ts +6 -0
  144. package/skills/support-runtime-reachability/scripts/__fixtures__/case-04-jsx-component-rendered/src/App.tsx +5 -0
  145. package/skills/support-runtime-reachability/scripts/__fixtures__/case-04-jsx-component-rendered/src/components/MyComp.tsx +3 -0
  146. package/skills/support-runtime-reachability/scripts/__fixtures__/case-05-jsx-component-not-rendered/src/App.tsx +3 -0
  147. package/skills/support-runtime-reachability/scripts/__fixtures__/case-05-jsx-component-not-rendered/src/components/Orphan.tsx +3 -0
  148. package/skills/support-runtime-reachability/scripts/__fixtures__/case-06-class-instantiated/src/lib/Service.ts +6 -0
  149. package/skills/support-runtime-reachability/scripts/__fixtures__/case-06-class-instantiated/src/main.ts +4 -0
  150. package/skills/support-runtime-reachability/scripts/__fixtures__/case-07-class-not-instantiated/src/lib/Lonely.ts +5 -0
  151. package/skills/support-runtime-reachability/scripts/__fixtures__/case-07-class-not-instantiated/src/main.ts +2 -0
  152. package/skills/support-runtime-reachability/scripts/__fixtures__/case-08-default-export-imported-and-called/src/handler.ts +3 -0
  153. package/skills/support-runtime-reachability/scripts/__fixtures__/case-08-default-export-imported-and-called/src/main.ts +3 -0
  154. package/skills/support-runtime-reachability/scripts/__fixtures__/case-09-default-export-orphan/src/handler.ts +3 -0
  155. package/skills/support-runtime-reachability/scripts/__fixtures__/case-09-default-export-orphan/src/main.ts +2 -0
  156. package/skills/support-runtime-reachability/scripts/__fixtures__/case-10-aliased-named-export/src/lib.ts +5 -0
  157. package/skills/support-runtime-reachability/scripts/__fixtures__/case-10-aliased-named-export/src/main.ts +3 -0
  158. package/skills/support-runtime-reachability/scripts/__fixtures__/case-11-re-export-chain/src/lib/index.ts +1 -0
  159. package/skills/support-runtime-reachability/scripts/__fixtures__/case-11-re-export-chain/src/lib/internal.ts +3 -0
  160. package/skills/support-runtime-reachability/scripts/__fixtures__/case-11-re-export-chain/src/main.ts +3 -0
  161. package/skills/support-runtime-reachability/scripts/__fixtures__/case-12-test-only-caller/src/util.test.ts +5 -0
  162. package/skills/support-runtime-reachability/scripts/__fixtures__/case-12-test-only-caller/src/util.ts +3 -0
  163. package/skills/support-runtime-reachability/scripts/__fixtures__/case-13-gated-pending-annotation/src/future.ts +4 -0
  164. package/skills/support-runtime-reachability/scripts/__fixtures__/case-14-untraceable-annotation/src/decorated.ts +4 -0
  165. package/skills/support-runtime-reachability/scripts/__fixtures__/case-15-untraceable-empty/src/lazy.ts +4 -0
  166. package/skills/support-runtime-reachability/scripts/__fixtures__/case-16-python-module/src/lib.py +15 -0
  167. package/skills/support-runtime-reachability/scripts/__fixtures__/case-16-python-module/src/main.py +5 -0
  168. package/skills/support-runtime-reachability/scripts/__fixtures__/case-17-router-use/src/parent.ts +5 -0
  169. package/skills/support-runtime-reachability/scripts/__fixtures__/case-17-router-use/src/routes/cases.ts +5 -0
  170. package/skills/support-runtime-reachability/scripts/__fixtures__/case-18-shadowed-name-fp/src/lib/foo.ts +3 -0
  171. package/skills/support-runtime-reachability/scripts/__fixtures__/case-18-shadowed-name-fp/src/other.ts +8 -0
  172. package/skills/support-runtime-reachability/scripts/__fixtures__/case-19-same-name-different-module/src/handlers/cases.ts +4 -0
  173. package/skills/support-runtime-reachability/scripts/__fixtures__/case-19-same-name-different-module/src/handlers/users.ts +4 -0
  174. package/skills/support-runtime-reachability/scripts/__fixtures__/case-19-same-name-different-module/src/main.ts +5 -0
  175. package/skills/support-runtime-reachability/scripts/__fixtures__/case-20-aliased-import-usage/src/handlers/cases.ts +3 -0
  176. package/skills/support-runtime-reachability/scripts/__fixtures__/case-20-aliased-import-usage/src/main.ts +4 -0
  177. package/skills/support-runtime-reachability/scripts/__fixtures__/case-21-mixed-default-and-named/src/lib.ts +5 -0
  178. package/skills/support-runtime-reachability/scripts/__fixtures__/case-21-mixed-default-and-named/src/main.ts +5 -0
  179. package/skills/support-runtime-reachability/scripts/__fixtures__/case-22-dynamic-import-then-caller/src/lib.ts +3 -0
  180. package/skills/support-runtime-reachability/scripts/__fixtures__/case-22-dynamic-import-then-caller/src/main.ts +8 -0
  181. package/skills/support-runtime-reachability/scripts/__fixtures__/case-23-dynamic-import-with-space/src/lib.ts +3 -0
  182. package/skills/support-runtime-reachability/scripts/__fixtures__/case-23-dynamic-import-with-space/src/main.ts +7 -0
  183. package/skills/support-runtime-reachability/scripts/check.mjs +638 -0
  184. package/skills/support-runtime-reachability/scripts/check.test.mjs +244 -0
  185. package/skills/support-skill-validator/SKILL.md +194 -0
  186. package/skills/support-skill-validator/references/false-positives.md +59 -0
  187. package/skills/support-skill-validator/references/validation-checks.md +280 -0
  188. package/skills/support-system-guide/SKILL.md +311 -0
  189. package/skills/support-task-force/SKILL.md +265 -0
  190. package/skills/support-task-force/references/dispatch-pattern.md +178 -0
  191. package/skills/support-task-force/references/synthesis-template.md +126 -0
  192. package/skills/support-wiki-bootstrap/SKILL.md +37 -0
  193. package/skills/support-wiki-lint/SKILL.md +196 -0
  194. package/skills/support-wiki-lint/scripts/lint.mjs +488 -0
  195. package/skills/support-wiki-lint/scripts/lint.test.mjs +196 -0
  196. package/templates/README.md +23 -0
  197. package/templates/aiwiki/CLAUDE.md.template +78 -0
  198. package/templates/aiwiki/schemas/architecture.md +118 -0
  199. package/templates/aiwiki/schemas/convention.md +112 -0
  200. package/templates/aiwiki/schemas/decision.md +144 -0
  201. package/templates/aiwiki/schemas/gotcha.md +118 -0
  202. package/templates/aiwiki/schemas/oracle.md +105 -0
  203. package/templates/aiwiki/schemas/session.md +125 -0
  204. package/templates/manifests/bugfix.yaml +41 -0
  205. package/templates/manifests/feature.yaml +69 -0
  206. package/templates/manifests/greenfield.yaml +61 -0
  207. package/templates/manifests/hotfix.yaml +45 -0
  208. package/templates/manifests/refactor.yaml +44 -0
  209. package/templates/manifests/v5/SCHEMA.md +327 -0
  210. package/templates/manifests/v5/feature.yaml +77 -0
  211. package/templates/manifests/v6/SCHEMA.md +199 -0
  212. package/templates/wiki-html/dream-detail.html +378 -0
  213. package/templates/wiki-html/dreams-list.html +155 -0
package/skills/quality-security-audit/references/owasp-checks.md
@@ -0,0 +1,178 @@
1
+ # OWASP Top 10 — checks and exploit scenarios
2
+
3
+ Phase 1 of the security audit: systematically check the code against each OWASP Top 10 category. Each category lists what to check for and at least one realistic exploit scenario.
4
+
5
+ **Core rule:** If you cannot describe how an attacker would exploit a finding, it is not a real finding. Every finding at MEDIUM or above MUST include a realistic exploit scenario.
6
+
7
+ ---
8
+
9
+ ## A01: Broken Access Control
10
+
11
+ **Check for:**
12
+ - Missing authorization checks on endpoints
13
+ - Privilege escalation paths (user accessing admin resources)
14
+ - Insecure direct object references (IDOR) -- can user A access user B's data by changing an ID?
15
+ - Missing function-level access control
16
+ - CORS misconfiguration allowing unauthorized origins
17
+ - Directory traversal in file operations
18
+
19
+ **Exploit scenario example:**
20
+
21
+ ```
22
+ FINDING: IDOR in GET /api/users/:id
23
+ EXPLOIT: Authenticated user changes :id parameter to another user's ID.
24
+ curl -H "Authorization: Bearer <user-a-token>" https://<host>/api/users/<user-b-id>
25
+ Response: 200 OK with user B's profile data including email, phone, address.
26
+ IMPACT: Any authenticated user can read any other user's personal data.
27
+ SEVERITY: HIGH
28
+ FIX: Verify requesting user's ID matches :id parameter, or user has admin role.
29
+ ```
30
+
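The fix in the finding above can be sketched as a small authorization predicate. This is an illustration only, not code from the package: the `Requester` type, the role names, and both function names are hypothetical.

```typescript
// Hypothetical types for illustration; not part of the package.
type Role = "user" | "admin";

interface Requester {
  id: string;
  role: Role;
}

// True only when the requester owns the record or holds the admin role.
function canReadUser(requester: Requester, targetUserId: string): boolean {
  return requester.role === "admin" || requester.id === targetUserId;
}

// Maps the decision to the HTTP status a GET /api/users/:id handler could return.
function statusForUserRead(requester: Requester, targetUserId: string): 200 | 403 {
  return canReadUser(requester, targetUserId) ? 200 : 403;
}
```

The check takes the requester's identity from the verified token, never from the request path, so changing `:id` alone cannot widen access.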
31
+ ## A02: Cryptographic Failures
32
+
33
+ **Check for:**
34
+ - Passwords stored in plaintext or weak hash (MD5, SHA1)
35
+ - Sensitive data transmitted without TLS
36
+ - Hardcoded encryption keys or IVs
37
+ - Weak random number generation for tokens/sessions
38
+ - PII stored without encryption at rest
39
+ - Deprecated cryptographic algorithms
40
+
41
+ **Exploit scenario example:**
42
+
43
+ ```
44
+ FINDING: Password hashed with MD5 in user registration
45
+ EXPLOIT: Attacker obtains database dump. Unsalted MD5 hashes are cracked in seconds
46
+ using rainbow tables or GPU brute force. All user passwords compromised.
47
+ IMPACT: Full account takeover for all users.
48
+ SEVERITY: CRITICAL
49
+ FIX: Use bcrypt/scrypt/argon2id with appropriate cost factor.
50
+ ```
51
+
52
+ ## A03: Injection
53
+
54
+ **Check for:**
55
+ - SQL injection (string concatenation in queries)
56
+ - NoSQL injection (unsanitized input in MongoDB queries)
57
+ - Command injection (user input in exec/spawn/system calls)
58
+ - LDAP injection
59
+ - Template injection (user input rendered in server-side templates)
60
+ - Header injection (user input in HTTP headers)
61
+
62
+ **Exploit scenario example:**
63
+
64
+ ```
65
+ FINDING: SQL injection in search endpoint
66
+ CODE: db.query(`SELECT * FROM products WHERE name LIKE '%${req.query.search}%'`)
67
+ EXPLOIT: Attacker sends: /search?search=' UNION SELECT username,password FROM users--
68
+ Returns all usernames and password hashes.
69
+ IMPACT: Full database read access including credentials.
70
+ SEVERITY: CRITICAL
71
+ FIX: Use parameterized query: db.query('SELECT * FROM products WHERE name LIKE $1', [`%${search}%`])
72
+ ```
73
+
74
+ ## A04: Insecure Design
75
+
76
+ **Check for:**
77
+ - Missing rate limiting on authentication endpoints
78
+ - No account lockout after failed attempts
79
+ - Business logic flaws (negative quantities, race conditions in payments)
80
+ - Missing input validation on business rules
81
+ - Lack of defense in depth
82
+
83
+ **Exploit scenario example:**
84
+
85
+ ```
86
+ FINDING: No rate limiting on POST /api/auth/login
87
+ EXPLOIT: Attacker runs brute-force attack with common password list.
88
+ At 100 requests/second, 10000 common passwords tested in 100 seconds.
89
+ No lockout, no CAPTCHA, no delay.
90
+ IMPACT: Account takeover for users with weak passwords.
91
+ SEVERITY: HIGH
92
+ FIX: Add rate limiting (5 attempts per minute per IP), account lockout after 10 failures,
93
+ progressive delays, CAPTCHA after 3 failures.
94
+ ```
95
+
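The per-IP limit named in the fix above (5 attempts per minute) could look roughly like the fixed-window counter below. It is a sketch under stated assumptions: the class name is hypothetical, state lives in process memory, and the clock is passed in so the behaviour is easy to test. A multi-instance deployment would need a shared store instead.

```typescript
// Fixed-window limiter: at most `limit` attempts per `windowMs` for each key (e.g. client IP).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records an attempt at time `now` (milliseconds) and reports whether it is allowed.
  allow(key: string, now: number): boolean {
    const current = this.windows.get(key);
    // No window yet, or the previous window has expired: start a fresh one.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

A login handler would call `limiter.allow(clientIp, Date.now())` before checking credentials and return 429 when it yields `false`. Account lockout, progressive delays, and CAPTCHA are separate controls that this sketch does not cover.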
96
+ ## A05: Security Misconfiguration
97
+
98
+ **Check for:**
99
+ - Debug mode enabled in production
100
+ - Default credentials in configuration
101
+ - Unnecessary features enabled (directory listing, stack traces)
102
+ - Missing security headers (CSP, HSTS, X-Frame-Options)
103
+ - Overly permissive CORS
104
+ - Verbose error messages exposing internal details
105
+
106
+ ## A06: Vulnerable and Outdated Components
107
+
108
+ **Check for:**
109
+ - Known CVEs in dependencies (npm audit, pip-audit, cargo audit)
110
+ - Outdated packages with known vulnerabilities
111
+ - Abandoned/unmaintained packages
112
+ - Packages with very few maintainers (bus factor risk)
113
+
114
+ **Run dependency audit:**
115
+
116
+ ```bash
117
+ # Node.js
118
+ npm audit
119
+ # or: npx better-npm-audit audit
120
+
121
+ # Python
122
+ pip-audit
123
+ # or: safety check
124
+
125
+ # Go
126
+ govulncheck ./...
127
+
128
+ # Rust
129
+ cargo audit
130
+ ```
131
+
132
+ ## A07: Identification and Authentication Failures
133
+
134
+ **Check for:**
135
+ - Weak password requirements
136
+ - Session tokens in URLs
137
+ - Session fixation vulnerabilities
138
+ - Missing session invalidation on logout/password change
139
+ - JWT without expiration
140
+ - JWT secret hardcoded or weak
141
+
142
+ ## A08: Software and Data Integrity Failures
143
+
144
+ **Check for:**
145
+ - Unsigned updates or deployments
146
+ - Untrusted CI/CD pipeline modifications
147
+ - Deserialization of untrusted data
148
+ - Missing integrity checks on critical data
149
+
150
+ ## A09: Security Logging and Monitoring Failures
151
+
152
+ **Check for:**
153
+ - Missing audit logs for authentication events
154
+ - Missing logs for authorization failures
155
+ - No alerting on suspicious patterns
156
+ - Sensitive data in log output (passwords, tokens, PII)
157
+ - Log injection vulnerabilities
158
+
159
+ ## A10: Server-Side Request Forgery (SSRF)
160
+
161
+ **Check for:**
162
+ - User-controlled URLs in server-side requests
163
+ - Missing URL allowlist validation
164
+ - Internal network access via crafted URLs
165
+ - Cloud metadata endpoint access (169.254.169.254)
166
+
167
+ **Exploit scenario example:**
168
+
169
+ ```
170
+ FINDING: SSRF in image proxy endpoint
171
+ CODE: const image = await fetch(req.query.url);
172
+ EXPLOIT: Attacker sends: /proxy?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/
173
+ Server fetches AWS credentials and returns them to attacker.
174
+ IMPACT: Full AWS account access via stolen IAM credentials.
175
+ SEVERITY: CRITICAL
176
+ FIX: Validate URL against allowlist. Block private IP ranges. Block metadata endpoints.
177
+ Use a URL parsing library, do not rely on string matching.
178
+ ```
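The fix above can be sketched with the standard `URL` parser. This is a minimal illustration, not the package's code: the allowlisted host `images.example.com` and both function names are assumptions, and only IPv4 literals are range-checked.

```typescript
// Hypothetical allowlist for illustration.
const ALLOWED_HOSTS = new Set<string>(["images.example.com"]);

// True for IPv4 literals in the "this network", loopback, private, or link-local ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||          // link-local, includes 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Parses the URL first, then applies scheme, private-range, and allowlist checks.
function isSafeProxyUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  if (isPrivateIPv4(url.hostname)) return false;
  return ALLOWED_HOSTS.has(url.hostname);
}
```

Comparing the parsed `hostname` rather than searching the raw string is what defeats inputs such as `images.example.com.evil.test`. The sketch does not resolve DNS, so a hostname that resolves to an internal address would still need to be caught at connection time.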
package/skills/quality-test-execution/SKILL.md
@@ -0,0 +1,435 @@
1
+ ---
2
+ name: quality-test-execution
3
+ description: "Use when a test plan exists and tests need to run — executes all test types specified by the plan."
4
+ ---
5
+
6
+ # Test Execution
7
+
8
+ ## Overview
9
+
10
+ Execute every test defined in the test plan. Map each scenario to pass/fail. Generate a results report with full traceability back to requirements. No skipping.
11
+
12
+ **Core principle:** The test plan says what to test. This skill executes it ALL. No exceptions, no shortcuts, no "we'll test that later."
13
+
14
+ **Announce at start:** "I'm using the quality-test-execution skill to execute the full test plan."
15
+
16
+ ## When to Use
17
+
18
+ - After `quality-test-plan` has produced a test plan document
19
+ - During `/feature` and `/greenfield` commands at the test execution phase
20
+ - Before deployment (quality gate requirement)
21
+
22
+ **Not for:**
23
+ - Writing tests during implementation (that is build-tdd)
24
+ - Generating test plans (that is quality-test-plan)
25
+ - Ad-hoc testing (just run the tests directly)
26
+
27
+ ## Prerequisites
28
+
29
+ Before executing, the following must exist:
30
+
31
+ 1. **Test plan** -- `.forge/work/{type}/{name}/test-plan.md` (the plan to execute)
32
+ 2. **Implementation code** -- All code from the build-tdd phase
33
+ 3. **Test infrastructure** -- Test runners, databases, Playwright, etc. configured
34
+ 4. **All test files written** -- Tests from the plan must exist as actual code
35
+
36
+ If any are missing, stop. Do not execute a partial test suite.
37
+
38
+ ## The Execution Process
39
+
40
+ ### Step 1: Read the Test Plan
41
+
42
+ Parse `.forge/work/{type}/{name}/test-plan.md` and extract:
43
+
44
+ - All test IDs (UT-001, IT-001, E2E-001, SMOKE-001, LOAD-001, CONTRACT-001)
45
+ - Their expected locations (file paths)
46
+ - Their traceability (which requirements they cover)
47
+ - Pass/fail thresholds (coverage %, response time, error rate)
48
+
49
+ **Verify all test files exist.** If a test from the plan has no corresponding test file, flag it immediately.
50
+
51
+ ```
52
+ MISSING TEST FILES:
53
+ - IT-003: tests/integration/auth/login.test.ts (file not found)
54
+ - E2E-002: tests/e2e/auth/login.spec.ts (file not found)
55
+
56
+ Cannot proceed. Write missing tests first.
57
+ ```
58
+
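The file-existence check above can be sketched as a pure function over the parsed plan. The `PlannedTest` shape and the function names are hypothetical; how the plan is parsed and how the set of existing files is collected are left to the caller.

```typescript
// One planned test as extracted from the test plan (assumed shape).
interface PlannedTest {
  id: string;   // e.g. "IT-003"
  file: string; // expected test file path
}

// Returns the planned tests whose files are not present in `existingFiles`.
function findMissingTests(plan: PlannedTest[], existingFiles: Set<string>): PlannedTest[] {
  return plan.filter((t) => !existingFiles.has(t.file));
}

// Formats the report in the style shown above; returns "" when nothing is missing.
function formatMissingReport(missing: PlannedTest[]): string {
  if (missing.length === 0) return "";
  const lines = missing.map((t) => `- ${t.id}: ${t.file} (file not found)`);
  return ["MISSING TEST FILES:"]
    .concat(lines, ["", "Cannot proceed. Write missing tests first."])
    .join("\n");
}
```

An empty report means every planned test has a file and execution can continue.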
59
+ ### Step 1.5: Codex Mode Check
60
+
61
+ Now that the test plan is parsed and files are verified, run the Codex consent flow from `protocols/codex.md`. The selected mode applies for the rest of this skill's invocation.
62
+
63
+ - **Takeover:** Dispatch Codex with the test plan to execute tests and produce the results report. Claude reviews test quality and coverage.
64
+ - **Verify** or **Skip / Codex unavailable:** Proceed with the steps below. The Codex Verify note in Step 9 will dispatch Codex to review test quality (Verify only).
65
+
66
+ ### Step 2: External Service Availability Check
67
+
68
+ Before executing any tests, determine which external services are available for real testing.
69
+
70
+ 1. **Extract** all external services from the architecture's API contract (`architecture/api-contract.md` — look for external service sections)
71
+ 2. **Attempt automated connectivity** — hit health endpoints or make minimal requests to each service
72
+ 3. **If check succeeds** — service confirmed available, record the endpoint
73
+ 4. **If check fails or is ambiguous** — ask the user: "Is [service] running right now? If yes, confirm the endpoint."
74
+ 5. **Record the availability matrix** — this feeds into all subsequent test steps
75
+
76
+ ```markdown
77
+ ## External Service Availability
78
+
79
+ | Service | Endpoint | Status | Verified |
80
+ |---|---|---|---|
81
+ | Payment API | api.stripe.com/v1/charges | Available ✓ | Real request returned 200 |
82
+ | Email Service | smtp.sendgrid.net | Not available | Connection refused |
83
+ ```
84
+
85
+ **Downstream enforcement based on availability:**
86
+
87
+ | Service Status | Unit Tests | Integration Tests | E2E Tests |
88
+ |---|---|---|---|
89
+ | **Confirmed available** | Mocks OK | SHOULD use real, mocks require documented justification | **MUST use real service** |
90
+ | **Not available** | Mocks OK | Mocks OK, flagged | **BLOCKED** — not faked |
91
+
92
+ **Note:** This step is skipped during `/hotfix` execution, consistent with existing gate exemptions (smoke tests only).
93
+
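The enforcement table above reduces to a small lookup. The sketch below encodes it directly; the type names and the policy labels are invented for illustration and do not appear in the package.

```typescript
type ServiceStatus = "available" | "unavailable";
type TestLevel = "unit" | "integration" | "e2e";
type MockPolicy =
  | "mocks-ok"          // mocks allowed without comment
  | "real-preferred"    // real service expected; mocks need documented justification
  | "real-required"     // real service mandatory
  | "mocks-ok-flagged"  // mocks allowed, suite flagged
  | "blocked";          // tests must not run against a fake

// Encodes the table: rows are service status, columns are test level.
function mockPolicy(status: ServiceStatus, level: TestLevel): MockPolicy {
  if (level === "unit") return "mocks-ok";
  if (status === "available") {
    return level === "e2e" ? "real-required" : "real-preferred";
  }
  return level === "e2e" ? "blocked" : "mocks-ok-flagged";
}
```

Keeping the rule in one function means Steps 4 and 5 can consult the same decision instead of restating it.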
94
+ ### Step 3: Execute Unit Tests
95
+
96
+ ```bash
97
+ # Run unit tests with coverage
98
+ npm test -- --coverage
99
+ # or: pytest --cov=src --cov-report=term-missing
100
+ # or: go test ./... -coverprofile=coverage.out
101
+ ```
102
+
103
+ **Record:**
104
+ - Total tests: passed / failed / skipped
105
+ - Coverage percentage (overall and per-file)
106
+ - Any skipped tests (MUST be justified)
107
+ - Failed test details with error messages
108
+
109
+ **Coverage check:**
110
+ ```
111
+ Coverage: 87% overall
112
+ src/auth/service.ts: 95%
113
+ src/auth/middleware.ts: 100%
114
+ src/auth/validation.ts: 82%
115
+ src/users/repository.ts: 73% <-- BELOW THRESHOLD (80%)
116
+
117
+ FAIL: src/users/repository.ts coverage below 80% threshold
118
+ ```
119
+
120
+ **Rules:**
121
+ - 80% minimum overall coverage
122
+ - 100% for critical paths (auth, payments, data validation)
123
+ - Zero skipped tests without documented justification
124
+ - Zero failed tests (all must pass)
125
+
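The coverage rules above can be sketched as a threshold check over per-file percentages. Two things here are assumptions: the input shape (a plain path-to-percentage map) and the way critical paths are identified, which the source does not specify, so the caller passes a list of path prefixes. The 80% floor is applied per file as well as overall, matching the sample output above.

```typescript
// Per-file coverage percentages, e.g. parsed from the runner's report (assumed shape).
type Coverage = Record<string, number>;

interface CoverageResult {
  passed: boolean;
  failures: string[];
}

// Applies `minimum` overall and per file, and 100% to files under any critical prefix.
function checkCoverage(
  overall: number,
  perFile: Coverage,
  criticalPrefixes: string[],
  minimum: number = 80,
): CoverageResult {
  const failures: string[] = [];
  if (overall < minimum) {
    failures.push(`overall coverage ${overall}% below ${minimum}% threshold`);
  }
  for (const file of Object.keys(perFile)) {
    const pct = perFile[file];
    const critical = criticalPrefixes.some((p) => file.startsWith(p));
    const required = critical ? 100 : minimum;
    if (pct < required) {
      failures.push(`${file} coverage ${pct}% below ${required}% threshold`);
    }
  }
  return { passed: failures.length === 0, failures };
}
```

Passing the sample numbers with no critical prefixes flags only `src/users/repository.ts`, the same file the sample output fails.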
126
+ ### Step 4: Execute Integration Tests
127
+
128
+ ```bash
129
+ # Run integration tests (typically against test database)
130
+ npm run test:integration
131
+ # or: pytest tests/integration/
132
+ # or: go test ./tests/integration/...
133
+ ```
134
+
135
+ **Pre-execution setup:**
136
+ - Verify test database is running and accessible (use real database, not mocks)
137
+ - Run migrations on test database
138
+ - Seed test data if required by the plan
139
+ - For external services confirmed available in Step 2: use real service (mocks require documented justification)
140
+ - For external services not available: verify stubs/mocks are configured before running the suite (suite is flagged as incomplete, but must still execute)
141
+
142
+ **Record:**
143
+ - Total tests: passed / failed / skipped
144
+ - API contract compliance (response shapes match)
145
+ - Database constraint validation results
146
+ - Failed test details with full error output
147
+
148
+ **Rules:**
149
+ - Use real database, not mocks
150
+ - Each test cleans up its own data
151
+ - Validate against API contract shapes exactly
152
+ - All integration tests from the plan must execute
153
+
154
+ ### Step 5: Execute E2E Tests (Playwright)
155
+
156
+ Dispatch the **e2e-runner** subagent for E2E test execution. The e2e-runner has Bash access, specializes in Playwright, and frees your context from long test output.
157
+
158
+ ```bash
159
+ # Run Playwright E2E tests
160
+ npx playwright test
161
+ # or: npx playwright test tests/e2e/auth/
162
+ ```
163
+
164
+ **Pre-execution setup:**
165
+ - Verify application is running (start if needed)
166
+ - Verify database is seeded with E2E test data
167
+ - Configure Playwright browsers (chromium minimum, cross-browser if specified)
168
+ - Set base URL and auth credentials for test environment
169
+
170
+ **Execute each E2E scenario from the plan:**
171
+
172
+ ```typescript
173
+ // Example: Execute E2E-001 from test plan
174
+ test('E2E-001: User registration flow', async ({ page }) => {
175
+ // Steps from test plan
176
+ await page.goto('/register');
177
+ await page.fill('[name="email"]', 'e2e-test@example.com');
178
+ await page.fill('[name="password"]', 'ValidPass123');
179
+ await page.fill('[name="name"]', 'E2E Test User');
180
+ await page.click('button[type="submit"]');
181
+
182
+ // Expected results from test plan
183
+ await expect(page).toHaveURL('/dashboard');
184
+ await expect(page.locator('.welcome-message')).toContainText('E2E Test User');
185
+ });
186
+ ```
187
+
188
+ **Record:**
189
+ - Total scenarios: passed / failed / skipped
190
+ - Screenshots captured per step (for visual verification)
191
+ - Video recordings of failed scenarios
192
+ - Browser(s) tested against
193
+ - Timing information per scenario
194
+
195
+ **Rules:**
196
+ - Every E2E scenario from the plan must execute
197
+ - Capture screenshots at key steps
198
+ - Record video for failed tests
199
+ - Test both happy path AND error scenarios from the plan
200
+ - No "flaky test" excuses -- if a test flakes, fix it
201
+
202
+ ### Step 6: Execute Smoke Tests
203
+
204
+ ```bash
205
+ # Run smoke tests (typically lightweight HTTP checks)
206
+ npm run test:smoke
207
+ # or: curl-based health checks
208
+ ```
209
+
210
+ **Execute each smoke check from the plan:**
211
+
212
+ ```bash
213
+ # SMOKE-001: Application starts and responds
214
+ curl -f http://localhost:3000/health --max-time 5
215
+ # Expected: 200 OK
216
+
217
+ # SMOKE-002: Authentication works
218
+ curl -X POST http://localhost:3000/api/auth/login \
219
+ -H "Content-Type: application/json" \
220
+ -d '{"email":"test@example.com","password":"testpass"}' \
221
+ --max-time 10
222
+ # Expected: 200 + token in response
223
+
224
+ # SMOKE-003: Core API responds (TOKEN must hold the token returned by SMOKE-002)
225
+ curl -H "Authorization: Bearer $TOKEN" \
226
+ http://localhost:3000/api/users \
227
+ --max-time 10
228
+ # Expected: 200
229
+ ```
230
+
231
+ **Record:**
232
+ - Each check: pass / fail
233
+ - Response time for each check
234
+ - Any timeout or connection failures
235
+
236
+ **Rules:**
237
+ - All smoke checks must pass
238
+ - Timeout thresholds from the plan are hard limits
239
+ - Smoke tests must complete within 60 seconds total
240
+
241
+ ### Step 7: Execute Load/Stress Tests
242
+
243
+ **Only execute if specified in the test plan.** Check the project profile -- not all projects need load testing.
244
+
245
+ ```bash
246
+ # Run load tests with k6
247
+ k6 run tests/load/registration-load.js
248
+
249
+ # Or with artillery
250
+ npx artillery run tests/load/registration.yml
251
+ ```
252
+
253
+ **Execute each load scenario from the plan:**
254
+
255
+ ```javascript
+ // k6 example for LOAD-001
+ import http from 'k6/http';
+ import { check } from 'k6';
+
+ export const options = {
+   scenarios: {
+     normal_load: {
+       executor: 'constant-vus',
+       vus: 50,
+       duration: '5m',
+     },
+     // Starts once normal_load has finished
+     peak_load: {
+       executor: 'constant-vus',
+       vus: 200,
+       duration: '2m',
+       startTime: '5m',
+     },
+   },
+   thresholds: {
+     http_req_duration: ['p(95)<500', 'p(99)<2000'],
+     http_req_failed: ['rate<0.01'],
+   },
+ };
+
+ // Each virtual user runs this function in a loop.
+ // The URL is a placeholder; use the endpoint named for LOAD-001 in the test plan.
+ export default function () {
+   const res = http.get('http://localhost:3000/health');
+   check(res, { 'status is 200': (r) => r.status === 200 });
+ }
+ ```
+
+ **Record:**
+ - Requests per second achieved
+ - Response time percentiles (p50, p95, p99)
+ - Error rate under each load level
+ - Breaking point (if stress test included)
+ - Threshold pass/fail status
+
+ **Rules:**
+ - Use thresholds from the test plan (not arbitrary values)
+ - Run against a clean environment (no other traffic)
+ - Report actual numbers, not just pass/fail
+ - If thresholds fail, this is a blocking issue
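
To report actual numbers alongside the verdict, the recorded percentiles and error rate can be derived from raw response times. The sketch below is illustrative only: `percentile` and `summarizeLoad` are hypothetical helpers, the percentile uses the simple nearest-rank method (k6 computes its own figures and may differ slightly), and the limits are passed in by the caller so they come from the test plan rather than being hard-coded.

```typescript
// Nearest-rank percentile over an ascending-sorted copy of the samples.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface LoadLimits { p95Ms: number; p99Ms: number; maxErrorRate: number; }

interface LoadSummary {
  p50: number;
  p95: number;
  p99: number;
  errorRate: number;
  passed: boolean;
}

// samplesMs holds one response time per request; failed is the count of failed requests.
function summarizeLoad(samplesMs: number[], failed: number, limits: LoadLimits): LoadSummary {
  const p50 = percentile(samplesMs, 50);
  const p95 = percentile(samplesMs, 95);
  const p99 = percentile(samplesMs, 99);
  const errorRate = failed / samplesMs.length;
  // Strict comparisons mirror threshold expressions such as 'p(95)<500'.
  const passed = p95 < limits.p95Ms && p99 < limits.p99Ms && errorRate < limits.maxErrorRate;
  return { p50: p50, p95: p95, p99: p99, errorRate: errorRate, passed: passed };
}
```

Returning the numbers together with `passed` keeps the report from collapsing into a bare pass/fail.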
293
+
294
+ ### Step 8: Execute API Contract Tests
295
+
296
+ ```bash
297
+ # Run contract tests
298
+ npm run test:contract
299
+ # or: pytest tests/contract/
300
+ ```
301
+
302
+ **Verify each contract from the plan:**
303
+
304
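+ ```typescript
+ // CONTRACT-001: POST /api/users request/response shape
+ // Assumes supertest for the HTTP calls and jest-json-schema for toMatchSchema
+ import request from 'supertest';
+ import { matchers } from 'jest-json-schema';
+ // Assumed project-local modules; adjust the paths to the codebase under test
+ import { app } from '../../src/app';
+ import { validUserPayload, userCreatedSchema } from './fixtures';
+
+ // toMatchSchema is not built into Jest, so register it first
+ expect.extend(matchers);
+
+ test('POST /api/users matches API contract', async () => {
+   const response = await request(app)
+     .post('/api/users')
+     .send(validUserPayload)
+     .expect(201);
+
+   // Validate response shape matches api-contract.md exactly
+   expect(response.body).toMatchSchema(userCreatedSchema);
+ });
+ ```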
+
+ **Record:**
+ - Each contract: valid / violated
+ - Schema mismatches with details
+ - Missing fields, extra fields, wrong types
+
+ **Rules:**
+ - Validate exact shapes (not just "has some fields")
+ - Test all status codes defined in the contract
+ - Any contract violation is a blocking issue
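
"Exact shape" means that missing fields, extra fields, and wrong types are all violations. A minimal sketch of that comparison follows, assuming a flat response body and a hypothetical `diffShape` helper. A real contract test would more likely rely on a JSON Schema validator, and note that `typeof null` is `"object"` in JavaScript, so nullable fields need extra handling.

```typescript
type FieldType = "string" | "number" | "boolean" | "object";

// Returns one message per violation; an empty array means the shapes match exactly.
function diffShape(
  body: { [key: string]: unknown },
  expected: { [key: string]: FieldType }
): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(expected)) {
    if (!(key in body)) {
      problems.push("missing field: " + key);
    } else if (typeof body[key] !== expected[key]) {
      problems.push("wrong type: " + key + " expected " + expected[key] + ", got " + typeof body[key]);
    }
  }
  for (const key of Object.keys(body)) {
    if (!(key in expected)) {
      problems.push("extra field: " + key);
    }
  }
  return problems;
}
```

Each returned message maps onto the "Schema mismatches with details" item to record.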
+
+ ### Step 9: Generate Results Report
+
+ Map every test result back to the plan and produce the report.
+
+ **Codex Verify:** Before presenting the report, check the mode recorded at Step 1.5. If **Verify** was selected, dispatch Codex to review for tests that pass but test the wrong thing, misleading coverage, and mock-only blind spots. If **Takeover** was selected, skip this step (Codex already ran). If **Skip**, do nothing. Do NOT re-run the consent flow. See the **Codex Integration** section below for full details.
+
+ **Output to:** `.forge/work/{type}/{name}/test-results.md`
+
+ Required sections:
+ 1. **Metadata** -- feature, date, duration, link to test plan
+ 2. **Summary table** -- each test type: Total / Passed / Failed / Skipped / Status
+ 3. **Coverage report** -- per-file and overall, against thresholds
+ 4. **Traceability matrix** -- each test ID mapped to requirement and result
+ 5. **Failed tests** -- error, screenshot/video path, root requirement traced
+ 6. **Load test results** -- p50/p95/p99, error rate, threshold status (if applicable)
+ 7. **Quality gate status** -- each gate criterion, pass/fail
+ 8. **External service verification** -- from Step 2 availability matrix: service, verified status, real tests run
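
As an illustration of the summary table in section 2, the sketch below renders one Markdown row per test type. The `TypeResult` shape and `summaryTable` name are hypothetical, and deriving Status from the failure count alone is an assumption; the document does not define how the Status column is computed.

```typescript
// Hypothetical per-type tally; not a Forge data structure.
interface TypeResult {
  type: string;
  total: number;
  passed: number;
  failed: number;
  skipped: number;
}

function summaryTable(rows: TypeResult[]): string {
  const lines: string[] = [
    "| Test type | Total | Passed | Failed | Skipped | Status |",
    "|---|---|---|---|---|---|",
  ];
  for (const r of rows) {
    // Assumption: any failure marks the whole test type as FAIL.
    const status = r.failed === 0 ? "PASS" : "FAIL";
    lines.push(
      "| " + [r.type, r.total, r.passed, r.failed, r.skipped, status].join(" | ") + " |"
    );
  }
  return lines.join("\n");
}
```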
+
+ ## Enforcement Rules
+
+ ### No Skipping
+
+ Every test in the plan MUST execute. If a test cannot execute:
+
+ 1. **Infrastructure missing** -- Set it up. Do not skip.
+ 2. **Test is flaky** -- Fix it. Do not skip.
+ 3. **Test takes too long** -- Optimize it. Do not skip.
+ 4. **External service confirmed available** -- E2E tests MUST use the real service. Integration tests SHOULD use the real service. Do not mock what is available.
+ 5. **External service not available** -- E2E tests that depend on the service are BLOCKED (not faked, not worked around with pre-existing data). Integration tests may use stubs. Flag the suite as INCOMPLETE.
+ 6. **E2E tests MUST NOT fall back to cached, pre-seeded, or pre-existing data** as a substitute for real service responses. If a real service is slow or times out, the test FAILS -- it does not degrade to fake data.
+
+ The ONLY acceptable skip is a test that the user explicitly requests to defer, with documented justification.
+
+ ### No Partial Execution
+
+ Do not execute only unit tests and declare success. All test types from the plan must run:
+ - Unit tests
+ - Integration tests
+ - E2E tests
+ - Smoke tests
+ - Load tests (if in plan)
+ - Contract tests (if in plan)
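
The list above amounts to a completeness check: four test types always run, and two run only when the plan names them. A minimal sketch, using hypothetical lowercase identifiers for the test types and a hypothetical `missingTestTypes` helper:

```typescript
// Identifiers are illustrative; Forge does not define these strings.
const ALWAYS_REQUIRED: string[] = ["unit", "integration", "e2e", "smoke"];
const REQUIRED_IF_PLANNED: string[] = ["load", "contract"];

// Returns the test types that should have run but did not.
function missingTestTypes(planned: string[], executed: string[]): string[] {
  const required = ALWAYS_REQUIRED.concat(
    REQUIRED_IF_PLANNED.filter((t) => planned.indexOf(t) !== -1)
  );
  return required.filter((t) => executed.indexOf(t) === -1);
}
```

A non-empty result means the run was partial and cannot be declared a success.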
+
+ ### Failure Handling
+
+ | Failures | Action |
+ |----------|--------|
+ | 0 failures | PASS -- proceed to deployment gate |
+ | 1-3 failures | FAIL -- report details, return to build-tdd to fix |
+ | 4+ failures | FAIL -- report details, may need architecture review |
+ | Infrastructure failure | BLOCKED -- fix infrastructure, re-run entire suite |
+ | External service unavailable | INCOMPLETE -- can proceed to PR with flag visible, cannot deploy to production without resolution |
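
The table can be read as a small decision function. The sketch below is one possible encoding with hypothetical names (`RunState`, `classify`). The table does not say which row wins when several apply, so the precedence used here (infrastructure first, then test failures, then service availability) is an assumption.

```typescript
type Outcome = "PASS" | "FAIL" | "BLOCKED" | "INCOMPLETE";

// Hypothetical summary of a finished run.
interface RunState {
  failures: number;
  infrastructureFailure: boolean;
  externalServiceUnavailable: boolean;
}

function classify(run: RunState): { outcome: Outcome; action: string } {
  // Assumed precedence: results from broken infrastructure are not trusted at all.
  if (run.infrastructureFailure) {
    return { outcome: "BLOCKED", action: "fix infrastructure, re-run entire suite" };
  }
  if (run.failures >= 4) {
    return { outcome: "FAIL", action: "report details, may need architecture review" };
  }
  if (run.failures >= 1) {
    return { outcome: "FAIL", action: "report details, return to build-tdd to fix" };
  }
  if (run.externalServiceUnavailable) {
    return { outcome: "INCOMPLETE", action: "proceed to PR with flag visible, no production deploy" };
  }
  return { outcome: "PASS", action: "proceed to deployment gate" };
}
```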
+
+ ## Quality Enforcement
+
+ | Do NOT | DO |
+ |---|---|
+ | Run only unit tests | Execute ALL test types from the plan |
+ | Ignore flaky tests | Fix them -- flaky = bug |
+ | Skip load tests | Run if in the plan |
+ | Proceed with failures | Zero failures = quality gate |
+ | Skip E2E screenshots/video | Capture at every key step |
+ | Test against production data | Use isolated test environment |
+ | Declare success without traceability | Map every result to the plan |
+
+ ## I/O Contract
+
+ | Field | Value |
+ |---|---|
+ | **Requires** | Test plan (`.forge/work/{type}/{name}/test-plan.md`) + implementation code (from `build-tdd`) + architecture artifacts (`.forge/work/{type}/{name}/architecture/api-contract.md` for external service verification) |
+ | **Produces** | `.forge/work/{type}/{name}/test-results.md` |
+ | **Feeds into** | `deliver-deploy` (via quality gate -- zero failures required) |
+ | **Updates manifest** | `artifacts.test-results: test-results.md`, `phases.quality.test-execution: { status: complete, gate-passed: true }` |
+
+ ## Codex Integration
+ **Modes:** Verify or Takeover | **Protocol:** `protocols/codex.md`
+
+ - **Verify:** Claude executes tests and reports, Codex reviews test quality.
+ - **Takeover:** Codex executes tests and reports, Claude reviews results.
+
+ **When:** After test execution, when reviewing results.
+
+ **Context to pass:**
+ - Path to `test-results.md` or test output
+ - Path to `test-plan.md`
+ - Path to test source files
+
+ **What Codex reviews:**
+ - Tests that pass but test the wrong thing (assertions on mocks, not real behavior)
+ - Coverage numbers that are misleading (high line coverage, low branch coverage)
+ - Integration boundaries tested only via mocks
+
+ **Prompt focus:** "Review these test results and test source code. Identify tests that pass but don't verify real behavior -- especially mock-based tests at integration boundaries. Flag misleading coverage metrics. Which passing tests would still pass if the feature were completely broken?"
+
+ **Presentation:** Codex findings presented as "Test Quality Review" alongside pass/fail results.
+
+ ---
+
+ ## Integration
+
+ **Called by:**
+ - `/feature` command (test execution phase)
+ - `/greenfield` command (test execution phase)
+
+ **Pairs with:**
+ - `quality-test-plan` (provides the plan to execute)
+ - `build-tdd` (provides the tests and implementation)
+ - `deliver-deploy` (test results are a quality gate for deployment)
+ - `quality-code-review` (test failures may trigger re-review)