multi-forge 0.2.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (311)
  1. forge/__init__.py +3 -0
  2. forge/_extensions/agents/.gitkeep +0 -0
  3. forge/_extensions/commands/.gitkeep +0 -0
  4. forge/_extensions/skills/analyze/SKILL.md +87 -0
  5. forge/_extensions/skills/challenge/SKILL.md +91 -0
  6. forge/_extensions/skills/consensus/SKILL.md +120 -0
  7. forge/_extensions/skills/consensus/resources/code_consensus_evaluation.md +94 -0
  8. forge/_extensions/skills/consensus/resources/consensus_evaluation.md +70 -0
  9. forge/_extensions/skills/consensus/resources/synthesis.md +101 -0
  10. forge/_extensions/skills/debate/SKILL.md +116 -0
  11. forge/_extensions/skills/debate/resources/code_debate_evaluation.md +101 -0
  12. forge/_extensions/skills/debate/resources/debate_evaluation.md +90 -0
  13. forge/_extensions/skills/panel/SKILL.md +141 -0
  14. forge/_extensions/skills/panel/resources/synthesis.md +103 -0
  15. forge/_extensions/skills/qa/SKILL.md +704 -0
  16. forge/_extensions/skills/qa/resources/checklist/0-enable.md +78 -0
  17. forge/_extensions/skills/qa/resources/checklist/1-preflight.md +24 -0
  18. forge/_extensions/skills/qa/resources/checklist/10-resume.md +143 -0
  19. forge/_extensions/skills/qa/resources/checklist/11-config.md +150 -0
  20. forge/_extensions/skills/qa/resources/checklist/12-search.md +58 -0
  21. forge/_extensions/skills/qa/resources/checklist/13-guard.md +237 -0
  22. forge/_extensions/skills/qa/resources/checklist/14-workflow.md +305 -0
  23. forge/_extensions/skills/qa/resources/checklist/15-skills.md +155 -0
  24. forge/_extensions/skills/qa/resources/checklist/16-handoff.md +224 -0
  25. forge/_extensions/skills/qa/resources/checklist/17-info.md +50 -0
  26. forge/_extensions/skills/qa/resources/checklist/18-disable.md +84 -0
  27. forge/_extensions/skills/qa/resources/checklist/19-uninstall.md +146 -0
  28. forge/_extensions/skills/qa/resources/checklist/2-extensions.md +188 -0
  29. forge/_extensions/skills/qa/resources/checklist/20-cleanup.md +36 -0
  30. forge/_extensions/skills/qa/resources/checklist/3-auth.md +234 -0
  31. forge/_extensions/skills/qa/resources/checklist/4-proxy.md +481 -0
  32. forge/_extensions/skills/qa/resources/checklist/5-session.md +541 -0
  33. forge/_extensions/skills/qa/resources/checklist/6-hooks.md +275 -0
  34. forge/_extensions/skills/qa/resources/checklist/7-costs.md +309 -0
  35. forge/_extensions/skills/qa/resources/checklist/8-status-line.md +174 -0
  36. forge/_extensions/skills/qa/resources/checklist/9-direct-commands.md +146 -0
  37. forge/_extensions/skills/qa/resources/checklist.md +103 -0
  38. forge/_extensions/skills/qa/resources/report-template.md +62 -0
  39. forge/_extensions/skills/qa/scripts/start-container.sh +529 -0
  40. forge/_extensions/skills/qa/scripts/walkthrough-state.py +1137 -0
  41. forge/_extensions/skills/review/SKILL.md +125 -0
  42. forge/_extensions/skills/review/references/claude-4.6.md +474 -0
  43. forge/_extensions/skills/review/references/claude-4.7.md +710 -0
  44. forge/_extensions/skills/review/references/gemini-3.1.md +546 -0
  45. forge/_extensions/skills/review/references/gpt-5.5.md +490 -0
  46. forge/_extensions/skills/review/references/skills-writing-guide.md +1588 -0
  47. forge/_extensions/skills/review/resources/code-anthropic.md +160 -0
  48. forge/_extensions/skills/review/resources/code-gemini.md +184 -0
  49. forge/_extensions/skills/review/resources/code-openai.md +203 -0
  50. forge/_extensions/skills/review/resources/code.md +160 -0
  51. forge/_extensions/skills/review-docs/SKILL.md +121 -0
  52. forge/_extensions/skills/review-docs/resources/docs-anthropic.md +170 -0
  53. forge/_extensions/skills/review-docs/resources/docs-gemini.md +204 -0
  54. forge/_extensions/skills/review-docs/resources/docs-openai.md +231 -0
  55. forge/_extensions/skills/review-docs/resources/docs.md +170 -0
  56. forge/_extensions/skills/smoke-test/SKILL.md +27 -0
  57. forge/_extensions/skills/smoke-test/scripts/smoke-test.sh +118 -0
  58. forge/_extensions/skills/understand/SKILL.md +148 -0
  59. forge/_extensions/skills/understand/resources/code-anthropic.md +163 -0
  60. forge/_extensions/skills/understand/resources/code-gemini.md +194 -0
  61. forge/_extensions/skills/understand/resources/code-openai.md +181 -0
  62. forge/_extensions/skills/understand/resources/code.md +163 -0
  63. forge/_extensions/skills/understand/resources/docs-anthropic.md +177 -0
  64. forge/_extensions/skills/understand/resources/docs-gemini.md +202 -0
  65. forge/_extensions/skills/understand/resources/docs-openai.md +191 -0
  66. forge/_extensions/skills/understand/resources/docs.md +177 -0
  67. forge/_extensions/skills/walkthrough/SKILL.md +599 -0
  68. forge/_extensions/skills/walkthrough/resources/checklist.md +765 -0
  69. forge/_extensions/skills/walkthrough/scripts/run-in-repo.sh +118 -0
  70. forge/_extensions/skills/walkthrough/scripts/setup-test-repo.sh +198 -0
  71. forge/_extensions/skills/walkthrough/scripts/walkthrough-state.py +1137 -0
  72. forge/backend/__init__.py +174 -0
  73. forge/backend/adapters/__init__.py +38 -0
  74. forge/backend/adapters/litellm.py +158 -0
  75. forge/backend/creation.py +89 -0
  76. forge/backend/registry.py +178 -0
  77. forge/cli/__init__.py +16 -0
  78. forge/cli/auth.py +483 -0
  79. forge/cli/backend.py +298 -0
  80. forge/cli/claude.py +411 -0
  81. forge/cli/config_cmd.py +303 -0
  82. forge/cli/extensions.py +1001 -0
  83. forge/cli/gc.py +165 -0
  84. forge/cli/guard.py +1018 -0
  85. forge/cli/guards.py +106 -0
  86. forge/cli/handoff.py +110 -0
  87. forge/cli/hooks/__init__.py +36 -0
  88. forge/cli/hooks/_group.py +20 -0
  89. forge/cli/hooks/_helpers.py +149 -0
  90. forge/cli/hooks/commands.py +1677 -0
  91. forge/cli/hooks/direct_commands.py +1304 -0
  92. forge/cli/hooks/install.py +232 -0
  93. forge/cli/hooks/policy.py +151 -0
  94. forge/cli/hooks/read_hygiene.py +74 -0
  95. forge/cli/hooks/verification.py +370 -0
  96. forge/cli/logs.py +406 -0
  97. forge/cli/main.py +292 -0
  98. forge/cli/proxy.py +1821 -0
  99. forge/cli/proxy_costs.py +313 -0
  100. forge/cli/search.py +416 -0
  101. forge/cli/session.py +892 -0
  102. forge/cli/session_addendum.py +81 -0
  103. forge/cli/session_fork.py +750 -0
  104. forge/cli/session_handoff.py +141 -0
  105. forge/cli/session_lifecycle.py +2053 -0
  106. forge/cli/session_manage.py +1336 -0
  107. forge/cli/session_memory.py +201 -0
  108. forge/cli/status_line.py +1398 -0
  109. forge/cli/workflow.py +1964 -0
  110. forge/config/__init__.py +110 -0
  111. forge/config/dataclass_utils.py +88 -0
  112. forge/config/defaults/__init__.py +0 -0
  113. forge/config/defaults/backends/__init__.py +0 -0
  114. forge/config/defaults/backends/litellm.yaml +196 -0
  115. forge/config/defaults/templates/__init__.py +0 -0
  116. forge/config/defaults/templates/litellm-anthropic-local.yaml +33 -0
  117. forge/config/defaults/templates/litellm-anthropic.yaml +24 -0
  118. forge/config/defaults/templates/litellm-gemini-flash-local.yaml +37 -0
  119. forge/config/defaults/templates/litellm-gemini-local.yaml +32 -0
  120. forge/config/defaults/templates/litellm-gemini-test.yaml +34 -0
  121. forge/config/defaults/templates/litellm-gemini.yaml +21 -0
  122. forge/config/defaults/templates/litellm-openai-codex-local.yaml +36 -0
  123. forge/config/defaults/templates/litellm-openai-local.yaml +38 -0
  124. forge/config/defaults/templates/litellm-openai.yaml +28 -0
  125. forge/config/defaults/templates/openrouter-anthropic.yaml +23 -0
  126. forge/config/defaults/templates/openrouter-deepseek.yaml +26 -0
  127. forge/config/defaults/templates/openrouter-gemini-flash.yaml +26 -0
  128. forge/config/defaults/templates/openrouter-gemini.yaml +23 -0
  129. forge/config/defaults/templates/openrouter-glm.yaml +23 -0
  130. forge/config/defaults/templates/openrouter-kimi.yaml +30 -0
  131. forge/config/defaults/templates/openrouter-minimax.yaml +26 -0
  132. forge/config/defaults/templates/openrouter-openai-codex.yaml +23 -0
  133. forge/config/defaults/templates/openrouter-openai.yaml +28 -0
  134. forge/config/defaults/templates/openrouter-qwen.yaml +25 -0
  135. forge/config/loader.py +675 -0
  136. forge/config/schema.py +448 -0
  137. forge/core/__init__.py +5 -0
  138. forge/core/auth/__init__.py +67 -0
  139. forge/core/auth/capabilities.py +219 -0
  140. forge/core/auth/credentials_file.py +244 -0
  141. forge/core/auth/protocols.py +18 -0
  142. forge/core/auth/secrets.py +243 -0
  143. forge/core/auth/template_secrets.py +112 -0
  144. forge/core/data/__init__.py +5 -0
  145. forge/core/data/model_catalog.yaml +1522 -0
  146. forge/core/data/pricing.yaml +140 -0
  147. forge/core/data/system_prompt_addendums/__init__.py +0 -0
  148. forge/core/data/system_prompt_addendums/gemini.md +330 -0
  149. forge/core/data/system_prompt_addendums/openai.md +328 -0
  150. forge/core/llm/__init__.py +231 -0
  151. forge/core/llm/clients/__init__.py +14 -0
  152. forge/core/llm/clients/base.py +115 -0
  153. forge/core/llm/clients/litellm.py +619 -0
  154. forge/core/llm/clients/openai_compat.py +244 -0
  155. forge/core/llm/clients/openrouter.py +234 -0
  156. forge/core/llm/credentials.py +439 -0
  157. forge/core/llm/detection.py +86 -0
  158. forge/core/llm/errors.py +44 -0
  159. forge/core/llm/protocols.py +80 -0
  160. forge/core/llm/types.py +176 -0
  161. forge/core/logging.py +146 -0
  162. forge/core/models/__init__.py +91 -0
  163. forge/core/models/catalog.py +467 -0
  164. forge/core/models/pricing.py +165 -0
  165. forge/core/models/types.py +167 -0
  166. forge/core/naming.py +212 -0
  167. forge/core/ops/__init__.py +73 -0
  168. forge/core/ops/context.py +141 -0
  169. forge/core/ops/gc.py +802 -0
  170. forge/core/ops/proxy.py +146 -0
  171. forge/core/ops/resolution.py +135 -0
  172. forge/core/ops/session.py +344 -0
  173. forge/core/ops/session_context.py +548 -0
  174. forge/core/paths.py +38 -0
  175. forge/core/process.py +54 -0
  176. forge/core/reactive/__init__.py +38 -0
  177. forge/core/reactive/cost_tracking.py +300 -0
  178. forge/core/reactive/env.py +180 -0
  179. forge/core/reactive/proxy.py +78 -0
  180. forge/core/reactive/routing.py +622 -0
  181. forge/core/reactive/session_runner.py +185 -0
  182. forge/core/reactive/structured_output.py +62 -0
  183. forge/core/reactive/tagger.py +94 -0
  184. forge/core/reactive/throttle.py +132 -0
  185. forge/core/state/__init__.py +59 -0
  186. forge/core/state/exceptions.py +59 -0
  187. forge/core/state/io.py +140 -0
  188. forge/core/state/lock.py +99 -0
  189. forge/core/state/timestamps.py +60 -0
  190. forge/core/transcript.py +78 -0
  191. forge/core/typing_helpers.py +24 -0
  192. forge/core/workqueue/__init__.py +67 -0
  193. forge/core/workqueue/queue.py +552 -0
  194. forge/core/workqueue/types.py +63 -0
  195. forge/guard/__init__.py +26 -0
  196. forge/guard/deterministic/__init__.py +26 -0
  197. forge/guard/deterministic/base.py +158 -0
  198. forge/guard/deterministic/coding_standards.py +256 -0
  199. forge/guard/deterministic/registry.py +148 -0
  200. forge/guard/deterministic/tdd.py +171 -0
  201. forge/guard/engine.py +216 -0
  202. forge/guard/protocols.py +91 -0
  203. forge/guard/queries.py +96 -0
  204. forge/guard/semantic/__init__.py +34 -0
  205. forge/guard/semantic/promotion.py +18 -0
  206. forge/guard/semantic/supervisor.py +813 -0
  207. forge/guard/semantic/verdict.py +183 -0
  208. forge/guard/store.py +124 -0
  209. forge/guard/team/__init__.py +6 -0
  210. forge/guard/team/config.py +24 -0
  211. forge/guard/team/handlers.py +209 -0
  212. forge/guard/team/prompts.py +41 -0
  213. forge/guard/types.py +125 -0
  214. forge/guard/workflow/__init__.py +17 -0
  215. forge/guard/workflow/branches.py +67 -0
  216. forge/guard/workflow/config.py +63 -0
  217. forge/guard/workflow/divergence.py +113 -0
  218. forge/guard/workflow/policy.py +87 -0
  219. forge/guard/workflow/stages.py +205 -0
  220. forge/install/__init__.py +55 -0
  221. forge/install/cli.py +281 -0
  222. forge/install/exceptions.py +163 -0
  223. forge/install/hooks.py +109 -0
  224. forge/install/installer.py +1037 -0
  225. forge/install/models.py +321 -0
  226. forge/install/preset.py +272 -0
  227. forge/install/settings_merge.py +831 -0
  228. forge/install/tracking.py +238 -0
  229. forge/install/version.py +141 -0
  230. forge/proxy/__init__.py +0 -0
  231. forge/proxy/base_client.py +181 -0
  232. forge/proxy/client_adapter.py +476 -0
  233. forge/proxy/client_factory.py +531 -0
  234. forge/proxy/converters.py +1206 -0
  235. forge/proxy/cost_logger.py +132 -0
  236. forge/proxy/cost_tracker.py +242 -0
  237. forge/proxy/data_models.py +338 -0
  238. forge/proxy/error_hints.py +92 -0
  239. forge/proxy/metrics.py +222 -0
  240. forge/proxy/model_spec.py +158 -0
  241. forge/proxy/proxies.py +333 -0
  242. forge/proxy/proxy_identity.py +134 -0
  243. forge/proxy/proxy_orchestrator.py +1018 -0
  244. forge/proxy/proxy_startup.py +54 -0
  245. forge/proxy/server.py +1561 -0
  246. forge/proxy/utils.py +537 -0
  247. forge/review/__init__.py +6 -0
  248. forge/review/adversarial.py +111 -0
  249. forge/review/consensus.py +236 -0
  250. forge/review/engine.py +356 -0
  251. forge/review/models.py +437 -0
  252. forge/review/resources/__init__.py +5 -0
  253. forge/review/resources/codereview-performance.md +85 -0
  254. forge/review/resources/codereview-quick.md +75 -0
  255. forge/review/resources/codereview-security.md +92 -0
  256. forge/review/resources/codereview.md +85 -0
  257. forge/review/resources/docreview-quick.md +75 -0
  258. forge/review/resources/docreview.md +86 -0
  259. forge/review/resources/thinkdeep.md +89 -0
  260. forge/review/routing.py +368 -0
  261. forge/review/synthesis.py +73 -0
  262. forge/runtime_config.py +438 -0
  263. forge/search/__init__.py +55 -0
  264. forge/search/bm25_store.py +264 -0
  265. forge/search/content_store.py +197 -0
  266. forge/search/engine.py +352 -0
  267. forge/search/exceptions.py +51 -0
  268. forge/search/extractor.py +234 -0
  269. forge/search/index_state.py +295 -0
  270. forge/search/store.py +215 -0
  271. forge/search/tokenizer.py +24 -0
  272. forge/session/__init__.py +130 -0
  273. forge/session/active.py +339 -0
  274. forge/session/artifacts.py +202 -0
  275. forge/session/claude/__init__.py +50 -0
  276. forge/session/claude/cleanup.py +105 -0
  277. forge/session/claude/invoke.py +236 -0
  278. forge/session/claude/paths.py +200 -0
  279. forge/session/cleanup.py +216 -0
  280. forge/session/config.py +34 -0
  281. forge/session/direct_model.py +107 -0
  282. forge/session/effective.py +169 -0
  283. forge/session/exceptions.py +255 -0
  284. forge/session/handoff.py +881 -0
  285. forge/session/handoff_agent.py +544 -0
  286. forge/session/hooks/__init__.py +35 -0
  287. forge/session/hooks/models.py +73 -0
  288. forge/session/hooks/session_start.py +507 -0
  289. forge/session/identity.py +84 -0
  290. forge/session/index.py +553 -0
  291. forge/session/manager.py +1506 -0
  292. forge/session/models.py +572 -0
  293. forge/session/overrides.py +344 -0
  294. forge/session/plan_resolution.py +286 -0
  295. forge/session/prev_sessions.py +128 -0
  296. forge/session/store.py +431 -0
  297. forge/session/validation.py +47 -0
  298. forge/session/worktree/__init__.py +65 -0
  299. forge/session/worktree/cleanup.py +262 -0
  300. forge/session/worktree/config_copy.py +203 -0
  301. forge/session/worktree/create.py +332 -0
  302. forge/sidecar/__init__.py +29 -0
  303. forge/sidecar/container.py +161 -0
  304. forge/sidecar/docker.py +86 -0
  305. forge/sidecar/secrets.py +19 -0
  306. multi_forge-0.2.0.dist-info/METADATA +242 -0
  307. multi_forge-0.2.0.dist-info/RECORD +311 -0
  308. multi_forge-0.2.0.dist-info/WHEEL +4 -0
  309. multi_forge-0.2.0.dist-info/entry_points.txt +2 -0
  310. multi_forge-0.2.0.dist-info/licenses/LICENSE +203 -0
  311. multi_forge-0.2.0.dist-info/licenses/NOTICE +14 -0
@@ -0,0 +1,704 @@
1
+ ---
2
+ name: forge:qa
3
+ description: Full Forge QA checklist in Docker container. Use for release validation or comprehensive verification of all Forge features.
4
+ disable-model-invocation: true
5
+ argument-hint: '[--provider-profile openrouter|remote-litellm] [--from X.Y] [--to X.Y] [--reset] [--stop] [--keep] [categories...]'
6
+ allowed-tools: Read, Bash, Glob # AskUserQuestion deliberately omitted — listing it triggers CC auto-approve bug (github.com/anthropics/claude-code/issues/29547). The tool remains available; omitting it preserves the interactive dialog.
7
+ ---
8
+
9
+ # Full QA
10
+
11
+ Full Forge QA checklist inside a Docker container. The container IS the sandbox -- any command inside it is safe.
12
+
13
+ ## Usage
14
+
15
+ ```
16
+ /forge:qa Run full QA checklist
17
+ /forge:qa session proxy Run specific categories only
18
+ /forge:qa --from 4.1 Resume from section 4.1
19
+ /forge:qa --from 4.1 --to 7 Run sections 4.1 through 6.x (excludes 7)
20
+ /forge:qa --from 10 --to 13 Run sections 10 through 12 (13 is excluded)
21
+ /forge:qa --provider-profile remote-litellm
22
+ Use remote/shared LiteLLM instead of default OpenRouter
23
+ /forge:qa --reset Kill container, remove image, rebuild from scratch
24
+ /forge:qa --stop Stop and remove the QA container
25
+ /forge:qa --keep Keep container running after completion
26
+ ```
27
+
28
+ ## Arguments
29
+
30
+ | Argument | Description |
31
+ | ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
32
+ | `--from X.Y` | Resume from section X subsection Y. |
33
+ | `--to X.Y` | Stop before section X subsection Y (exclusive). Example: `--from 10 --to 13` runs sections 10-12 and stops before 13. |
34
+ | `--provider-profile openrouter\|remote-litellm` | Select the proxy backend family used by provider-dependent QA steps. Defaults to `openrouter`; `remote-litellm` is for shared/internal LiteLLM infrastructure. |
35
+ | `--reset` | Kill container, remove image, rebuild from scratch. Use when auto-staleness detection is insufficient: Dockerfile changes, Claude Code version upgrades, corrupt image layers, or persistent container state not cleared by workspace init. |
36
+ | `--stop` | Stop and remove the QA Docker container. |
37
+ | `--keep` | Keep the container running after completion. |
38
+ | `categories` | One or more category names to run (see allowlist below). |
39
+
40
+ ## Execution
41
+
42
+ Follow these steps in order. Do not skip steps.
43
+
44
+ ### Step 1: Parse Arguments and Route
45
+
46
+ Parse `$ARGUMENTS` to extract flags: `--provider-profile <profile>`, `--from X.Y`, `--to X.Y`, `--reset`, `--stop`,
47
+ `--keep`. Any remaining words after flags are category names. Default `--provider-profile` to `openrouter`. Valid
48
+ provider profiles are `openrouter` and `remote-litellm`; reject any other value before starting the container.
49
+
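The parsing rules in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: `parse_qa_arguments` is a hypothetical helper that is not part of the package, it accepts category names anywhere in the argument string, and rejecting unknown `--` flags is an assumption the skill text does not spell out.

```python
# Hypothetical helper for illustration; not part of Forge.
VALID_PROFILES = {"openrouter", "remote-litellm"}
VALUE_FLAGS = {"--provider-profile", "--from", "--to"}
BOOL_FLAGS = {"--reset", "--stop", "--keep"}


def parse_qa_arguments(arguments: str) -> dict:
    """Split an argument string into flags and category names."""
    parsed = {
        "provider_profile": "openrouter",  # default stated in Step 1
        "from": None,
        "to": None,
        "reset": False,
        "stop": False,
        "keep": False,
        "categories": [],
    }
    tokens = arguments.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in VALUE_FLAGS:
            if i + 1 >= len(tokens):
                raise ValueError(f"{token} requires a value")
            # "--provider-profile" -> "provider_profile", "--from" -> "from"
            parsed[token[2:].replace("-", "_")] = tokens[i + 1]
            i += 2
        elif token in BOOL_FLAGS:
            parsed[token[2:]] = True
            i += 1
        elif token.startswith("--"):
            # Assumption: unknown flags are rejected rather than ignored.
            raise ValueError(f"Unknown flag {token!r}")
        else:
            parsed["categories"].append(token)
            i += 1
    # Reject invalid profiles before any container work starts.
    if parsed["provider_profile"] not in VALID_PROFILES:
        raise ValueError(
            f"Invalid provider profile {parsed['provider_profile']!r}; "
            "expected openrouter or remote-litellm"
        )
    return parsed
```

For example, `parse_qa_arguments("--from 4.1 --to 7 session proxy")` would yield the two range values, the default `openrouter` profile, and the categories `session` and `proxy`.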
50
+ **Greet the user:**
51
+
52
+ "Running the full Forge QA checklist inside a Docker container. This requires Docker Desktop to be running. I'll walk
53
+ through each test section, run commands inside the container, and check assertions. Forge debug logging is enabled by
54
+ default in the container, and the run artifacts will include command output plus copied Forge logs. You can ask
55
+ questions or explore at any point."
56
+
57
+ ### Step 2: QA Mode
58
+
59
+ Full QA runs the checklist inside a Docker container. The container IS the sandbox -- the agent can run any command
60
+ inside it safely.
61
+
62
+ **Execution model**: Run ONLY commands that appear in the checklist's bash blocks. Do NOT invent commands. Adaptability
63
+ is at the assertion/interpretation layer -- judge output against assertion text even if format changes. Keep command
64
+ execution deterministic.
65
+
66
+ **Set the scripts directory** from the skill's own location:
67
+
68
+ ```bash
69
+ SCRIPTS="${CLAUDE_SKILL_DIR}/scripts"
70
+ ```
71
+
72
+ **If `--stop` was set**: Run `bash "$SCRIPTS/start-container.sh" --stop` and stop. No tests.
73
+
74
+ **If `--reset` was set**: Pass `--reset` to `start-container.sh` in Phase 1 (it kills the container, removes the image,
75
+ and rebuilds from scratch). Continue with the normal flow after that. The script's auto-staleness detection (comparing
76
+ the image's git rev label to `HEAD`) handles most cases automatically; `--reset` is the manual escape hatch for
77
+ situations where the label matches but the image is wrong (see the `--reset` argument description above).
78
+
79
+ **Provider profile**: Pass the selected provider profile to `start-container.sh`. The script validates required
80
+ credentials and exports the QA template/proxy variables into the container environment. If a running container was
81
+ created with a different provider profile, `start-container.sh` fails with a reset/stop hint; surface that message and
82
+ stop.
83
+
84
+ **Category name allowlist** (exact match only -- reject unknown names):
85
+
86
+ | Name | Section | Name | Section |
87
+ | ---------- | ------- | ----------- | ------- |
88
+ | enable | 0 | status-line | 8 |
89
+ | preflight | 1 | commands | 9 |
90
+ | extensions | 2 | resume | 10 |
91
+ | auth | 3 | config | 11 |
92
+ | proxy | 4 | search | 12 |
93
+ | session | 5 | guard | 13 |
94
+ | hooks | 6 | workflow | 14 |
95
+ | costs | 7 | skills | 15 |
96
+ | | | handoff | 16 |
97
+ | | | info | 17 |
98
+ | | | disable | 18 |
99
+ | | | uninstall | 19 |
100
+ | | | cleanup | 20 |
101
+
102
+ If category names were given, validate each against this allowlist. Reject unknown names: "Unknown category 'foo'. Valid
103
+ categories: enable, preflight, extensions, ..."
104
+
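A minimal sketch of the allowlist check, assuming the name-to-section mapping from the table above. `CATEGORY_SECTIONS` and `resolve_categories` are hypothetical names used only for illustration; how the skill actually performs the check is not shown in this diff.

```python
# Hypothetical names for illustration; the mapping mirrors the allowlist table.
CATEGORY_SECTIONS = {
    "enable": 0, "preflight": 1, "extensions": 2, "auth": 3, "proxy": 4,
    "session": 5, "hooks": 6, "costs": 7, "status-line": 8, "commands": 9,
    "resume": 10, "config": 11, "search": 12, "guard": 13, "workflow": 14,
    "skills": 15, "handoff": 16, "info": 17, "disable": 18, "uninstall": 19,
    "cleanup": 20,
}


def resolve_categories(names):
    """Return sorted section numbers, rejecting any name not in the allowlist."""
    unknown = [name for name in names if name not in CATEGORY_SECTIONS]
    if unknown:
        valid = ", ".join(CATEGORY_SECTIONS)
        raise ValueError(
            f"Unknown category {unknown[0]!r}. Valid categories: {valid}"
        )
    return sorted({CATEGORY_SECTIONS[name] for name in names})
```

Matching is exact, so a near miss such as `sessions` is rejected rather than guessed at.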
105
+ #### Phase 1: Start Container
106
+
107
+ Run `start-container.sh` to get a Docker container:
108
+
109
+ ```bash
110
+ # Pass --reset if the user requested a full image rebuild.
111
+ # PROVIDER_PROFILE is the parsed --provider-profile value, defaulting to openrouter.
112
+ CONTAINER=$(bash "$SCRIPTS/start-container.sh" --provider-profile "$PROVIDER_PROFILE" ${REBUILD:+--reset})
113
+
114
+ # `start-container.sh` prints the container name on stdout
115
+ if [ -z "$CONTAINER" ]; then
116
+ echo "ERROR: start-container.sh returned empty container name."
117
+ exit 1
118
+ fi
119
+ ```
120
+
121
+ Note: `start-container.sh` mounts a host state directory into the container at `$FORGE_TEST_REPO/.forge/qa/`, so state
122
+ persists on the host at `${FORGE_HOME:-$HOME/.forge}/manual-testing/qa/`.
123
+
124
+ If it fails, show the error and stop. The script handles image build, staleness detection, container reuse, workspace
125
+ init, and jq preflight.
126
+
127
+ Tell the user: "Docker container ready: `<container>`. Starting QA run."
128
+
129
+ **Check for stale artifacts**: Probe the container for leftover state from a previous QA run.
130
+
131
+ Note: a freshly rebuilt container always has `/root/.claude/settings.json` seeded to `{}` by `start-container.sh`. Treat
132
+ that empty baseline file as clean, not stale.
133
+
134
+ ```bash
135
+ docker exec "$CONTAINER" bash -lc 'test -d ~/.forge/proxies || test -f ~/.forge/installed.json || jq -e '\''type == "object" and length > 0'\'' ~/.claude/settings.json >/dev/null 2>&1' && echo "STALE" || echo "CLEAN"
136
+ ```
137
+
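The probe's three conditions can be restated in Python to make the logic easier to follow. This is a sketch, not the skill's implementation: `has_stale_artifacts` is a hypothetical function, and it takes the home directory as a parameter so it can be tried outside a container.

```python
import json
from pathlib import Path


def has_stale_artifacts(home: Path) -> bool:
    """Mirror the probe: proxies dir, installed.json, or non-empty settings object."""
    forge = home / ".forge"
    if (forge / "proxies").is_dir() or (forge / "installed.json").is_file():
        return True
    settings = home / ".claude" / "settings.json"
    try:
        data = json.loads(settings.read_text())
    except (OSError, ValueError):
        # A missing or unparsable file does not count as stale, matching
        # the jq call whose errors are discarded by the probe.
        return False
    # The seeded "{}" baseline is an empty object and therefore clean.
    return isinstance(data, dict) and len(data) > 0
```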
138
+ If `STALE`: use AskUserQuestion to ask "Previous QA artifacts detected in container. Reset to clean state?" with options
139
+ "Reset" / "Keep (resume where left off)". If the user chooses Reset, stop and recreate the container, then continue from
140
+ Phase 1 with the fresh container. Do **not** try to scrub the live container in place: stale state can live in both
141
+ `/root` and `$FORGE_TEST_REPO`, and the workspace reset must restore the seeded test repo.
142
+
143
+ ```bash
144
+ bash "$SCRIPTS/start-container.sh" --stop
145
+ CONTAINER=$(bash "$SCRIPTS/start-container.sh" --provider-profile "$PROVIDER_PROFILE" ${REBUILD:+--reset})
146
+
147
+ if [ -z "$CONTAINER" ]; then
148
+ echo "ERROR: start-container.sh returned empty container name after reset."
149
+ exit 1
150
+ fi
151
+ ```
152
+
153
+ This is more reliable than ad-hoc `rm -rf` cleanup because `start-container.sh` already owns workspace initialization.
154
+
155
+ #### Phase 2: Initialize State + Infra Probes
156
+
157
+ **Set the checklist index** from the skill's own location:
158
+
159
+ ```bash
160
+ CHECKLIST="${CLAUDE_SKILL_DIR}/resources/checklist.md"
161
+ ```
162
+
163
+ **Resolve the host-side state directory** (the mount makes host and container paths equivalent):
164
+
165
+ ```bash
166
+ STATE_DIR_RAW="${FORGE_HOME:-$HOME/.forge}/manual-testing/qa"
167
+ STATE_DIR=$(python3 -c 'import os,sys; print(os.path.abspath(os.path.expanduser(os.path.expandvars(sys.argv[1]))))' "$STATE_DIR_RAW")
168
+ STATE_FILE="$STATE_DIR/state.json"
169
+ ```
170
+
171
+ **Prepare mounted artifact directories**. Raw step logs and pre-clean log snapshots live under the mounted QA state
172
+ directory; Forge's own debug logs live under `/root/.forge/logs` inside the container and are copied out later.
173
+
174
+ ```bash
175
+ docker exec "$CONTAINER" bash -lc 'mkdir -p "$FORGE_TEST_REPO/.forge/qa/logs" "$FORGE_TEST_REPO/.forge/qa/forge-logs-snapshots"'
176
+ ```
177
+
178
+ **Fresh run**: clear any previous run-local logs/snapshots, reset container debug logs, then initialize progress
179
+ tracking via `walkthrough-state.py`:
180
+
181
+ ```bash
182
+ rm -rf "$STATE_DIR/logs" "$STATE_DIR/forge-logs-snapshots"
183
+ docker exec "$CONTAINER" bash -lc 'rm -rf /root/.forge/logs && mkdir -p "$FORGE_TEST_REPO/.forge/qa/logs" "$FORGE_TEST_REPO/.forge/qa/forge-logs-snapshots"'
184
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" init --force --mode full-qa "$STATE_FILE"
185
+ ```
186
+
187
+ This creates the state file with schema version, checklist hash, and empty step records. The script handles all
188
+ bookkeeping -- the agent never constructs state JSON manually.
189
+
190
+ **Run infrastructure probes.** These drive `<!-- requires: X -->` skip decisions for the entire run:
191
+
192
+ | Probe | Command | Stored as | Meaning |
193
+ | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------- | --------------------------------------------- |
194
+ | `docker` | `docker exec $CONTAINER command -v docker` | `INFRA_DOCKER` | Docker client in container (docker-in-docker) |
195
+ | `api_key` | `docker exec $CONTAINER bash -lc 'case "${FORGE_QA_PROVIDER_PROFILE:-openrouter}" in openrouter) test -n "${OPENROUTER_API_KEY:-}" ;; remote-litellm) test -n "${LITELLM_API_KEY:-}" && test -n "${LITELLM_BASE_URL:-}" ;; esac'` | `INFRA_API_KEY` | Selected provider credentials are available |
196
+
197
+ Store probe results in the state file:
198
+
199
+ ```bash
200
+ CONTAINER_ID=$(docker inspect -f '{{.Id}}' "$CONTAINER")
201
+
202
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set INFRA_DOCKER <true|false>
203
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set INFRA_API_KEY <true|false>
204
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set CONTAINER "$CONTAINER"
205
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set CONTAINER_ID "$CONTAINER_ID"
206
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set RUN_SCOPE "container:$CONTAINER_ID"
207
+ ```
208
+
209
+ `RUN_SCOPE` ties prerequisite satisfaction to the current container instance, so a rebuilt container cannot inherit
210
+ side-effect-dependent sections from an old run by accident.
211
+
212
+ Tell the user which infrastructure is available and what will be skipped.
213
+
214
+ **Resume** (`--from X.Y`): Read `$STATE_FILE` directly from the host, then validate it against the chosen resume point:
215
+
216
+ ```bash
217
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" validate "$STATE_FILE" --from <X.Y from --from>
218
+ ```
219
+
220
+ This clears stale future-step records and refreshes derived section status for the current run before execution resumes.
221
+ The `record` command still validates checklist hash on each call, so hash drift is caught automatically. Show progress:
222
+ "Previously: N sections, M passed, K failed. Resuming from X.Y."
223
+
224
+ On resume, preserve `$STATE_DIR/logs`, `$STATE_DIR/forge-logs-snapshots`, and `/root/.forge/logs` so evidence from the
225
+ earlier part of the same QA run remains available.
226
+
227
+ #### Phase 3: Build Section Index
228
+
229
+ Run the checklist parser to get the full structure:
230
+
231
+ ```bash
232
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" index
233
+ ```
234
+
235
+ This returns JSON with all sections, subsections, annotations, and assertion counts. Store this as the checklist index.
236
+
237
+ If category names were given, filter the index to matching sections only.
238
+
239
+ #### Phase 4: Execute Sections (Main Loop)
240
+
241
+ Iterate over the sections in the index, starting from `--from X.Y` if it was given. If `--to X.Y` was set, stop **before** reaching that step;
242
+ do not execute it or anything after it. `--to` accepts both section-level (`--to 7` stops before section 7) and
243
+ subsection-level (`--to 7.3` stops before step 7.3) IDs. When the stop point is reached, skip to Phase 5 (Summary).
244
+
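The inclusive `--from` and exclusive `--to` bounds can be sketched by comparing step IDs as integer tuples. This assumes the index lists steps in ascending order and that IDs are dot-separated integers; `step_key` and `select_steps` are hypothetical names, not functions from `walkthrough-state.py`.

```python
def step_key(step_id: str) -> tuple:
    """Turn "7.3" into (7, 3) and "7" into (7,) for ordered comparison."""
    return tuple(int(part) for part in step_id.split("."))


def select_steps(step_ids, start=None, stop=None):
    """Keep steps at or after `start` and strictly before `stop`."""
    selected = []
    for step_id in step_ids:
        key = step_key(step_id)
        # (7, 1) >= (7,) is True, so a section-level stop excludes every
        # step of that section.
        if stop is not None and key >= step_key(stop):
            break
        if start is None or key >= step_key(start):
            selected.append(step_id)
    return selected
```

With steps `4.1, 4.2, 5.1, 6.1, 7.1`, a start of `4.2` and a stop of `7` selects `4.2, 5.1, 6.1`.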
245
+ For each section/step in the filtered range:
246
+
247
+ 01. **Read the section file** on the host (path from the index) using the Read tool. Keep reads scoped to a single
248
+ section file (do not load multiple sections at once).
249
+
250
+ 02. **Get step details** for each subsection via the parser:
251
+
252
+ ```bash
253
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" step <N.X>
254
+ ```
255
+
256
+ This returns JSON with:
257
+
258
+ - `annotation` / `annotations`: step type(s)
259
+ - `code_blocks`: list of `{code, runnable}` objects
260
+ - `instructions`: prose for the user
261
+ - `assertions`: list of assertion texts to verify
262
+ - `assertion_count`: number of assertions (deterministic -- do not count manually)
263
+ - `next`: ID of the next step (or null if last)
264
+
265
+ 03. **Annotations** map to step types. Never show raw HTML comments in output.
266
+
267
+ | Annotation | Step type | Preamble |
268
+ | ------------------------ | ------------- | -------------------------------------------------------- |
269
+ | `<!-- auto -->` | `[Automatic]` | "Automatic step -- running checks." |
270
+ | `<!-- human:confirm -->` | `[Review]` | "I'll run this and show you the output for review." |
271
+ | `<!-- human:guided -->` | `[Hands-on]` | "Your turn -- here's what to do in the container shell." |
272
+
273
+ **Handle by annotation type**:
274
+
275
+ | Annotation | Action |
276
+ | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
277
+ | `<!-- auto -->` | Run bash block via `docker exec`. Check assertions against output. Show results block. |
278
+ | `<!-- human:confirm -->` | Run bash block via `docker exec`, show output to user. Use AskUserQuestion: "Does this look correct?" (Pass / Fail / Skip). Show results block. |
279
+ | `<!-- human:guided -->` | Show instructions and bash snippet from the checklist. Do NOT run the bash block. Use AskUserQuestion with context-appropriate framing (see rule 9). After user confirms, verify artifacts via `docker exec` (rule 9). Show results block. |
280
+ | `<!-- requires: X -->` | Split `X` on commas, uppercase each token to form `INFRA_<TOKEN>` (e.g., `docker,api_key` checks `INFRA_DOCKER` and `INFRA_API_KEY`). Look up each via `var get`. Skip if any is unavailable: show `[Skipped -- requires: X]`. |
281
+ | `<!-- prereq: N, ... -->` | Section-level or subsection-level prerequisite. Lists section numbers (e.g., `0, 2, 4`) that must be satisfied in the current run before this section can run. On `--from` resume, check state file for each prereq and warn the user about any blockers. See rule 10. |
282
+ | `<!-- destructive -->` | Safe inside Docker. Run the bash block, check assertions. |
283
+ | No annotation | Treat as `<!-- human:confirm -->`. |
284
+
285
+ A subsection can have multiple annotations (e.g., `<!-- destructive -->` + `<!-- human: ... -->`). Apply all that
286
+ match. `requires` is checked first (skip before attempting anything else). `prereq` is checked at section entry.
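The `requires:` mapping from the table above can be sketched in Python. This is an illustration only: the helper name `infra_vars` is hypothetical and is not part of `walkthrough-state.py`, and trimming whitespace around tokens is an assumption. It shows how a value such as `docker,api_key` expands into the variable names to look up via `var get`.

```python
def infra_vars(requires: str) -> list[str]:
    """Expand a `requires:` value into INFRA_* variable names (hypothetical helper)."""
    # Split on commas; strip stray whitespace (assumption) and drop empty tokens.
    tokens = [token.strip() for token in requires.split(",")]
    return [f"INFRA_{token.upper()}" for token in tokens if token]


print(infra_vars("docker,api_key"))  # ['INFRA_DOCKER', 'INFRA_API_KEY']
```

If any of the returned names is unavailable, the step is skipped with `[Skipped -- requires: X]`.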
287
+
288
+ 04. **Execute bash blocks** from the checklist -- run ONLY what the checklist specifies:
289
+
290
+ ```bash
291
+ docker exec "$CONTAINER" bash -lc 'cd "$FORGE_TEST_REPO" && <bash block from checklist>'
292
+ ```
293
+
294
+ The agent does NOT invent commands. It runs the checklist's bash blocks verbatim. For each entry in the step's
295
+ `code_blocks` where `runnable` is `true`, run `code` as one Bash tool call. Entries where `runnable` is `false` are
296
+ display-only snippets for `human:guided` steps.
297
+
298
+ **Default debug logging**: the QA container exports `FORGE_DEBUG=1` via `/etc/profile.d/forge-qa.sh`, so Forge
299
+ commands write debug logs to `/root/.forge/logs/...` unless the subcommand is explicitly exempt.
300
+
301
+ **Before a block that contains `forge logs --clean`**, snapshot the current Forge debug logs into the mounted state
302
+ dir so evidence survives the cleanup step:
303
+
304
+ ```bash
305
+ docker exec "$CONTAINER" bash -lc 'SNAP="$FORGE_TEST_REPO/.forge/qa/forge-logs-snapshots/N.X/pre-clean"; rm -rf "$SNAP"; if [ -d /root/.forge/logs ]; then mkdir -p "$SNAP" && cp -R /root/.forge/logs/. "$SNAP"/; fi'
306
+ ```
307
+
308
+ 05. **Check assertions**: For each assertion text from the step details, examine the command output and judge whether it
309
+ is satisfied. This is the adaptability layer -- if CLI output format changes slightly, the agent can still verify
310
+ the intent of the assertion. Classify each assertion as `p` (pass), `f` (fail), or `s` (skip).
311
+
312
+ 06. **Write logs** inside the container -- save raw command output to per-subsection log files:
313
+
314
+ ```bash
315
+ docker exec "$CONTAINER" bash -c 'cat > "$FORGE_TEST_REPO/.forge/qa/logs/N.X.log" <<'"'"'EOF'"'"'
316
+ <raw output>
317
+ EOF'
318
+ ```
319
+
320
+ 07. **Record results** in the state file after classifying each step's assertions:
321
+
322
+ ```bash
323
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" record "$STATE_FILE" <N.X> <results>
324
+ ```
325
+
326
+ Where `<results>` is comma-separated: `p` (pass), `f` (fail), `s` (skip) -- one per assertion. Example:
327
+ `record "$STATE_FILE" 3.1 p,p,p,p` for a step where all 4 assertions passed. The output shows progress:
328
+ `3.1: 4/4 pass | Section 3: 4/30 | Overall: 75/N`.
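A small Python sketch of how the `<results>` argument could be assembled before calling `record`. The helper `build_results` is hypothetical; it assumes the `assertion_count` value comes from the `step` output and only checks that there is one valid code per assertion. The script is still the source of truth for all counting.

```python
ALLOWED_CODES = {"p", "f", "s"}  # pass, fail, skip


def build_results(codes: list[str], assertion_count: int) -> str:
    """Join per-assertion codes into the comma-separated form `record` expects."""
    unknown = [code for code in codes if code not in ALLOWED_CODES]
    if unknown:
        raise ValueError(f"unknown result codes: {unknown}")
    if len(codes) != assertion_count:
        raise ValueError(f"expected {assertion_count} codes, got {len(codes)}")
    return ",".join(codes)


print(build_results(["p", "p", "p", "p"], 4))  # p,p,p,p
```

Failing early on a length mismatch keeps a miscounted result string from reaching the state file.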
329
+
330
+ 08. **Step presentation format**: Every subsection follows a visual pattern so progress is easy to scan.
331
+
332
+ ```
333
+ --- N.X Step Title [Type] -------------------------
334
+ <preamble from annotation table above>
335
+
336
+ <body: commands, output, or instructions>
337
+
338
+ Results:
339
+ ✔ First assertion passed
340
+ ✘ Second assertion FAILED: reason
341
+ o Third assertion skipped
342
+ ----------------------------------------------------
343
+ ```
344
+
345
+ **`[Hands-on]` body template** -- guided steps use a fixed inner layout so every run looks the same:
346
+
347
+ ````
348
+ --- N.X Step Title [Hands-on] -------------------------
349
+ Your turn -- here's what to do in the container shell.
350
+
351
+ In the container shell (`docker exec -it $CONTAINER bash -l`):
352
+
353
+ 1. First action
354
+ ```
355
+
356
+ command-to-run
357
+
358
+ ```
359
+
360
+ 2. Second action
361
+ ```
362
+
363
+ another-command
364
+
365
+ ```
366
+
367
+ Expected:
368
+ - First assertion text from checklist
369
+ - Second assertion text from checklist
370
+
371
+ If something goes wrong: <failure cue from checklist, if any>
372
+
373
+ Review the instructions above, then answer below.
374
+
375
+
376
+
377
+ <AskUserQuestion>
378
+ ````
379
+
380
+ Rules for the template:
381
+
382
+ - **"In the container shell:"** (or **"In Session B:"** for live Claude steps) -- always state where the user acts
383
+ - **Numbered steps** with flush-left code blocks -- no indentation so copy-paste has no leading spaces
384
+ - **"Expected:"** bullet list pulled from the checklist assertions -- tells the user what to look for
385
+ - **Failure cue** line only if the checklist includes one (e.g., "If Claude only says Command completed...")
386
+ - Never rephrase checklist instructions as prose -- copy the structure, fill in runtime values
387
+ - The buffer line and blank lines before AskUserQuestion are mandatory (rule 9)
388
+
389
+ **Section boundaries** appear between sections (not between steps within a section):
390
+
391
+ ```
392
+ Section N Complete: X/Y passed
393
+
394
+ ====================================================
395
+
396
+ --- M.1 First Step [Type] -------------------------
397
+ ```
398
+
399
+ Use `---` (thin) for step boundaries, `===` (thick) as a single separator line between sections. Use ✔ for pass, ✘
400
+ for fail, o for skip.
401
+
402
+ 09. **For `human:confirm` and `human:guided` items**: CRITICAL -- print the full instructions and bash snippet from the
403
+ checklist **before** calling AskUserQuestion. Do **not** end immediately on the last instruction line or code fence:
404
+ Claude Code's dialog overlays the bottom few terminal lines. After the real instructions, print one short disposable
405
+ buffer line such as `Review the instructions above, then answer below.` and then print **at least three blank
406
+ lines** before calling AskUserQuestion. Treat that buffer line and blank space as sacrificial padding. The user must
407
+ see what to do BEFORE being asked to confirm. The instructions appear in the step body between the opening preamble
408
+ and the AskUserQuestion call. If you put instructions after the question, the user sees only the question with no
409
+ context.
410
+
411
+ **Match question framing and options to the step type:**
412
+
413
+ | Step asks user to... | Question style | Options |
414
+ | --------------------------------- | ------------------------------- | ---------------------------------- |
415
+ | Confirm output looks correct | "Does this look correct?" | Pass / Fail / Skip |
416
+ | Perform an action (open, launch) | "Have you [action]?" | Done / Skip / Stop QA |
417
+ | Verify something (status, output) | "[Expected result] visible?" | Yes / No, something's wrong / Skip |
418
+ | Both (run command + check result) | "Did [expected result] appear?" | Yes / No, something's wrong / Skip |
419
+
420
+ Keep the AskUserQuestion prompt itself short enough to fit on one line when possible. Put detail in the printed
421
+ instructions, not in the dialog. Don't use "Done" as an answer to a yes/no question. "Did the install succeed?"
422
+ needs Yes/No, not Done.
423
+
424
+ The user acts in the container shell. If they choose "Stop QA", skip all remaining sections and go to Phase 5
425
+ (Summary).
426
+
427
+ **Do not invent Claude availability failures**: For guided steps that involve a live Claude Code session
428
+ (`forge session start`, `forge session resume`, `forge claude start`, plan mode, Session B, status line checks,
429
+ etc.), do **not** recommend "Skip" merely because the agent cannot drive the TUI itself. Recommend "Skip" only when
430
+ you have concrete evidence that live Claude launching is unavailable in the QA container:
431
+
432
+ - A direct probe fails, for example:
433
+
434
+ ```bash
435
+ docker exec "$CONTAINER" bash -lc 'command -v claude >/dev/null 2>&1'
436
+ ```
437
+
438
+ - The user reports an actual launch failure such as `claude: command not found`.
439
+
440
+ If the current run already contains evidence that Claude launched successfully (welcome banner, successful
441
+ `forge session start`, prior guided step, etc.), treat live Claude as available and ask the user to proceed with the
442
+ guided instructions instead of steering them toward `Skip`.
443
+
444
+ **Post-confirmation verification**: After the user says "Done", verify that the step actually produced expected
445
+ artifacts before recording results. For each assertion, check whether it can be verified programmatically via
446
+ `docker exec` (file exists, permissions correct, command output matches). Run those checks and record `p`/`f` based
447
+ on the actual result -- not the user's word alone. Only trust the user's confirmation for assertions that are purely
448
+ observational (e.g., "input was hidden", "prompt appeared") where no container state can be checked.
449
+
450
+ 10. **Prerequisite checks** (`<!-- prereq: N, ... -->`):
451
+
452
+ Section completion is tracked **automatically** by the `record` command. When the final subsection of a section is
453
+ recorded in the current run scope, `record` sets `SECTION_<N>_STATUS` to `passed` or `failed` in the state file. No
454
+ manual `var set` is needed.
455
+
456
+ **When entering a section** (or subsection) with prereqs in its `step` output, run:
457
+
458
+ ```bash
459
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" prereq-check "$STATE_FILE" <step_id>
460
+ ```
461
+
462
+ This returns `{"ok": true/false, "required": [...], "missing": [...], "blocking": [...], "resolvable": [...], "statuses": {...}}`.
463
+
464
+ - If `ok` is `true`: proceed normally.
465
+
466
+ - If `ok` is `false`: check the `resolvable` list in the response. `resolvable` contains step-level prereqs (e.g.,
467
+ `4.2`) whose section prereqs are already satisfied -- meaning you can run that step immediately.
468
+
469
+ **Auto-resolve resolvable prereqs**: For each step in `resolvable`, fetch its details via
470
+ `walkthrough-state.py step <prereq_id>`. Only auto-run if the step's annotation is `auto` (not `human:guided` or
471
+ `human:confirm`) and it has no unmet `requires:` gates. For interactive prereqs, ask the user instead. Execute
472
+ auto steps normally (run bash blocks, check assertions, record results), then re-run `prereq-check` for the
473
+ original step. This avoids unnecessary skips when the missing prereq is cheap to run.
474
+
475
+ **If blocking prereqs remain after auto-resolution** (section-level prereqs, or step-level prereqs whose own
476
+ section prereqs aren't met): warn the user which prerequisites are blocking (show `blocking` and `statuses`).
477
+ `missing` is the subset that was never completed in this run; `failed` and `stale_run` also block. Ask whether to
478
+ (a) run the blocking prereqs first, (b) skip this section/step, or (c) proceed anyway (risky). This handles both
479
+ `--from` resume (skipped sections) and container rebuild (lost state).
480
+
481
+ Prereqs are **not transitive** -- only the directly listed sections are checked. Each section already lists its full
482
+ dependency set (e.g., section 5 lists `0, 2, 4`, not just `4`).
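The auto-resolve decision above can be sketched as follows. This is a hedged illustration, not the script's logic: `split_resolvable` is a hypothetical helper, the annotation strings (`auto`, `human:guided`, `human:confirm`) are assumed to match what `step` reports, and the `requires:` gate check is left out for brevity.

```python
def split_resolvable(
    response: dict, annotations: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split resolvable prereqs into (auto-run, ask-the-user) lists.

    `response` is the parsed prereq-check JSON; `annotations` maps each
    resolvable step ID to its annotation as reported by `step` (assumed).
    """
    if response.get("ok"):
        return [], []  # nothing to resolve, proceed normally
    auto_run, ask_user = [], []
    for step_id in response.get("resolvable", []):
        if annotations.get(step_id) == "auto":
            auto_run.append(step_id)
        else:
            ask_user.append(step_id)  # interactive or unknown: never auto-run
    return auto_run, ask_user


response = {"ok": False, "resolvable": ["4.2", "4.3"], "blocking": ["2"]}
print(split_resolvable(response, {"4.2": "auto", "4.3": "human:guided"}))
```

After running the auto steps, `prereq-check` is re-run for the original step; anything still listed in `blocking` goes to the user.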
483
+
484
+ 11. **Gate rules** -- check after each section completes:
485
+
486
+ | If section fails... | Then... |
487
+ | ------------------- | -------------------------------------------------------------------- |
488
+ | 0 (Enable) | Stop. Enable is broken. |
489
+ | 2 (Extensions) | Skip Section 3 (can't verify auth without ext). |
490
+ | 4 (Proxy) | Skip Sections 7, 14-16 (no proxy for costs/workflow/skills/handoff). |
491
+ | Any section | Section 20 (Cleanup) always runs. |
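The gate table can be read as a lookup from failed sections to skipped sections. The Python sketch below is illustrative: the names are hypothetical, and treating "Stop" as "skip everything except Cleanup" is an assumption drawn from the last row of the table.

```python
STOP_ON_FAILURE = {0}                          # Enable is broken: stop the run
SKIP_ON_FAILURE = {2: {3}, 4: {7, 14, 15, 16}}
CLEANUP_SECTION = 20                           # always runs


def runnable_sections(failed: set[int], pending: list[int]) -> list[int]:
    """Return the pending sections that may still run given the failed ones."""
    if failed & STOP_ON_FAILURE:
        return [s for s in pending if s == CLEANUP_SECTION]
    skipped: set[int] = set()
    for section in failed:
        skipped |= SKIP_ON_FAILURE.get(section, set())
    return [s for s in pending if s == CLEANUP_SECTION or s not in skipped]


# Section 4 (Proxy) failed: sections 7 and 14-16 drop out, Cleanup stays.
print(runnable_sections({4}, list(range(5, 21))))
```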
492
+
493
+ 12. **Context conservation**: After completing each `## N.` section, print a one-line summary using the progress numbers
494
+ from the last `record` output. Do NOT carry raw command output forward -- the state file and logs inside the
495
+ container have the details. This preserves context window for the full run.
496
+
497
+ **Glue calls need no narration.** The `walkthrough-state.py step`, `record`, and `var` calls between steps are
498
+ bookkeeping. The Bash tool will show their JSON output in the transcript -- that's fine. But do NOT add commentary
499
+ around them ("now let me fetch the next step", "the JSON shows..."). Just call the tool and proceed to the next visible
500
+ step. The user should see a clean flow of steps, not a play-by-play of the bookkeeping layer.
501
+
502
+ **Variable substitution**: When commands in bash blocks use placeholders like `<proxy_id>`, capture runtime values and
503
+ store them in the state file:
504
+
505
+ ```bash
506
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" set PROXY_ID <value>
507
+ ```
508
+
509
+ Retrieve when needed for substitution in later steps:
510
+
511
+ ```bash
512
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" var "$STATE_FILE" get PROXY_ID
513
+ ```
514
+
515
+ #### Phase 5: Summary
516
+
517
+ Get the final report from the state file:
518
+
519
+ ```bash
520
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" report "$STATE_FILE"
521
+ ```
522
+
523
+ This returns JSON with per-section pass/fail/skip counts, failures list, gaps, and totals. The script provides all
524
+ numbers -- do not count manually. Render the report JSON as a results table:
525
+
526
+ ```
527
+ Full QA Results
528
+ ====================================
529
+ Container: $CONTAINER
530
+ Checklist: v1.0.0 (N items)
531
+
532
+ Section Pass Fail Skip
533
+ ----------------------------------------
534
+ 0. Enable 17 0 0
535
+ 1. Pre-Flight 2 0 0
536
+ 2. Extensions 26 0 0
537
+ ...
538
+ ----------------------------------------
539
+ TOTAL 290 3 22
540
+
541
+ Failures:
542
+ 2.3 Verify Pre-Existing Settings: ...
543
+ 6.4 Smoke Test SessionStart Hook: ...
544
+
545
+ Skipped (infra missing):
546
+ 3.1-3.11 (requires: api_key)
547
+ ====================================
548
+ ```
549
+
550
+ #### Phase 5b: Save Run Artifacts
551
+
552
+ After generating the report, save all artifacts to a timestamped run directory.
553
+
554
+ This phase is required for every QA run, including partial `--from/--to` runs and runs with failures. Do not stop after
555
+ printing the summary. A QA run is not complete until `report.md` and `state.json` exist in the run directory and
556
+ `.pending-transcript` exists in the state dir.
557
+
558
+ After Phase 5 summary, continue directly into Phase 5b without asking the user whether to save artifacts.
559
+
560
+ ```bash
561
+ RUN_DIR="$STATE_DIR/runs/$(date +%Y-%m-%d-%H%M%S)"
562
+ mkdir -p "$RUN_DIR"
563
+ ```
564
+
565
+ 1. Generate the report using `walkthrough-state.py`:
566
+
567
+ ```bash
568
+ python3 "$SCRIPTS/walkthrough-state.py" "$CHECKLIST" report "$STATE_FILE"
569
+ ```
570
+
571
+ This returns JSON with per-section pass/fail/skip counts, failures, and gaps. Find the report template
572
+ (`${CLAUDE_SKILL_DIR}/resources/report-template.md`), fill it in, and write to `$RUN_DIR/report.md`.
573
+
574
+ 2. Copy the state file: `cp "$STATE_FILE" "$RUN_DIR/state.json"`
575
+
576
+ 3. Copy mounted raw step logs when present:
577
+
578
+ ```bash
579
+ if [ -d "$STATE_DIR/logs" ]; then
580
+ cp -R "$STATE_DIR/logs" "$RUN_DIR/step-logs"
581
+ fi
582
+ ```
583
+
584
+ 4. Copy any pre-clean Forge log snapshots when present:
585
+
586
+ ```bash
587
+ if [ -d "$STATE_DIR/forge-logs-snapshots" ]; then
588
+ cp -R "$STATE_DIR/forge-logs-snapshots" "$RUN_DIR/forge-logs-snapshots"
589
+ fi
590
+ ```
591
+
592
+ 5. Copy the container's current Forge debug logs when present:
593
+
594
+ ```bash
595
+ if docker exec "$CONTAINER" bash -lc 'test -d /root/.forge/logs'; then
596
+ mkdir -p "$RUN_DIR/forge-logs/final"
597
+ docker cp "$CONTAINER:/root/.forge/logs/." "$RUN_DIR/forge-logs/final"
598
+ fi
599
+ ```
600
+
601
+ 6. Generate a transcript claim token and write the marker so only this QA session can copy the transcript here when it
602
+ ends:
603
+
604
+ ```bash
605
+ TRANSCRIPT_TOKEN="forge-qa-transcript-token:$(python3 - <<'PY'
606
+ import uuid
607
+ print(uuid.uuid4())
608
+ PY
609
+ )"
610
+ python3 - <<'PY' "$RUN_DIR" "$STATE_DIR/.pending-transcript" "$TRANSCRIPT_TOKEN"
611
+ import json
612
+ import sys
613
+
614
+ run_dir, marker_path, token = sys.argv[1:4]
615
+ with open(marker_path, "w", encoding="utf-8") as handle:
616
+ json.dump({"run_dir": run_dir, "transcript_contains": token}, handle)
617
+ handle.write("\n")
618
+ PY
619
+ ```
620
+
621
+ Tell the user: "Run artifacts saved to `$RUN_DIR`. Forge step logs and debug logs were copied when present. Transcript
622
+ claim token: `$TRANSCRIPT_TOKEN`. Transcript will be added when this QA session ends."
623
+
624
+ #### Phase 6: Cleanup
625
+
626
+ - If all passed and `--keep` was NOT set: stop and remove the container.
627
+ - If any failures: keep the container for inspection. Print: "Container kept for inspection. Run `/forge:qa --stop` to
628
+ remove."
629
+ - The last `record` call already updated `last_updated` in the state file.
630
+
631
+ Tip: "Report and transcript saved to the run directory. Find previous reports in `~/.forge/manual-testing/qa/runs/`."
632
+
633
+ ## Safety Model
634
+
635
+ | Tier | Scripts involved | What can go wrong | Mitigation |
636
+ | ------- | ----------------------------- | ---------------------- | -------------------------------------- |
637
+ | Full QA | `start-container.sh` + Docker | Nothing -- OS boundary | Container cannot reach host filesystem |
638
+
639
+ All commands run inside the Docker container via `docker exec`. The container is the sandbox.
640
+
641
+ `walkthrough-state.py` runs on the HOST for bookkeeping (state file is accessible via mount). It never executes commands
642
+ inside the container.
643
+
644
+ ## Reference: Full QA Checklist
645
+
646
+ The full checklist is split:
647
+
648
+ - Index: `resources/checklist.md`
649
+ - Sections: `resources/checklist/*.md`
650
+
651
+ It covers 21 categories:
652
+
653
+ | Category | Section | Destructive? |
654
+ | ----------- | ------- | ------------ |
655
+ | enable | 0 | Yes |
656
+ | preflight | 1 | No |
657
+ | extensions | 2 | No |
658
+ | auth | 3 | No |
659
+ | proxy | 4 | No |
660
+ | session | 5 | No |
661
+ | hooks | 6 | No |
662
+ | costs | 7 | No |
663
+ | status-line | 8 | No |
664
+ | commands | 9 | No |
665
+ | resume | 10 | No |
666
+ | config | 11 | No |
667
+ | search | 12 | No |
668
+ | guard | 13 | No |
669
+ | workflow | 14 | No |
670
+ | skills | 15 | No |
671
+ | handoff | 16 | No |
672
+ | info | 17 | No |
673
+ | disable | 18 | Yes |
674
+ | uninstall | 19 | Yes |
675
+ | cleanup | 20 | Yes |
676
+
677
+ Commands are deterministic (from checklist); interpretation is adaptive (agent judges output).
678
+
679
+ ## Common Mistakes (DON'T)
680
+
681
+ - **DON'T invent CLI commands.** Run ONLY commands from the checklist's bash blocks. If a command doesn't exist, the QA
682
+ run will show a confusing error.
683
+ - **DON'T carry raw output forward.** After each section, summarize and drop. The state file and logs inside the
684
+ container have the details. This preserves context window for the full run.
685
+ - **DON'T count assertions manually.** Use `walkthrough-state.py record` and `report` for all counting. LLMs get
686
+ arithmetic wrong.
687
+ - **DON'T combine multiple Bash commands in one call.** Run each `code_blocks` entry as a separate Bash call. Piped
688
+ multi-command blocks fail silently in the Bash tool.
689
+ - **DON'T put instructions after AskUserQuestion.** The user sees the question modal immediately -- anything you print
690
+ after it appears below their answer, not above the question. Print instructions BEFORE the tool call.
691
+ - **DO add a real visual buffer before AskUserQuestion.** Use a short sacrificial buffer line plus at least three blank
692
+ lines so the dialog covers padding, not the instructions or command snippet.
693
+ - **DON'T ignore script failures.** If `start-container.sh`, `docker exec`, or `walkthrough-state.py` exits with a
694
+ non-zero code, STOP. The error message tells you what went wrong (count mismatch, hash drift, corrupt state). Do not
695
+ proceed with stale data.
696
+ - **DON'T assume Claude Code is unavailable without evidence.** For `human:guided` live-session steps, only recommend
697
+ `Skip` after a real failed probe (`command -v claude`) or an actual user-reported launch error.
698
+
699
+ ## Tips
700
+
701
+ - **Context window**: Full QA may be long-running -- use `--from X.Y` to resume after compaction.
702
+ - **Run a range**: Use `--from 4.1 --to 7` to run sections 4 through 6 only (excludes the `--to` step).
703
+ - **Resume after compaction**: If the conversation compacts during QA, use `/forge:qa --from X.Y`.
704
+ - **Quick check**: For a quick non-interactive health check, use `/forge:smoke-test`.
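A short Python sketch of the `--from` / `--to` range semantics from the tips above (inclusive start, exclusive end). The function names are hypothetical and the real parser may differ; the sketch assumes step IDs are dot-separated integers and compares them numerically, so `4.10` sorts after `4.9`.

```python
def parse_id(step_id: str) -> tuple[int, ...]:
    """Turn '4.1' into (4, 1) and '7' into (7,) for numeric comparison."""
    return tuple(int(part) for part in step_id.split("."))


def in_range(step_id: str, start: str, stop: str) -> bool:
    """True if step_id falls in [start, stop): `--to` itself is excluded."""
    return parse_id(start) <= parse_id(step_id) < parse_id(stop)


for step in ["3.11", "4.1", "4.10", "6.4", "7.1"]:
    print(step, in_range(step, "4.1", "7"))
```

With `--from 4.1 --to 7`, this selects `4.1`, `4.10`, and `6.4` and leaves out `3.11` and `7.1`.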