sofia-cli 0.1.1

Files changed (435)
  1. package/.github/agents/copilot-instructions.md +39 -0
  2. package/.github/agents/speckit.analyze.agent.md +184 -0
  3. package/.github/agents/speckit.checklist.agent.md +294 -0
  4. package/.github/agents/speckit.clarify.agent.md +181 -0
  5. package/.github/agents/speckit.constitution.agent.md +84 -0
  6. package/.github/agents/speckit.implement.agent.md +135 -0
  7. package/.github/agents/speckit.plan.agent.md +90 -0
  8. package/.github/agents/speckit.specify.agent.md +258 -0
  9. package/.github/agents/speckit.tasks.agent.md +137 -0
  10. package/.github/agents/speckit.taskstoissues.agent.md +30 -0
  11. package/.github/copilot-instructions.md +257 -0
  12. package/.github/prompts/speckit.analyze.prompt.md +3 -0
  13. package/.github/prompts/speckit.checklist.prompt.md +3 -0
  14. package/.github/prompts/speckit.clarify.prompt.md +3 -0
  15. package/.github/prompts/speckit.constitution.prompt.md +3 -0
  16. package/.github/prompts/speckit.implement.prompt.md +3 -0
  17. package/.github/prompts/speckit.plan.prompt.md +3 -0
  18. package/.github/prompts/speckit.specify.prompt.md +3 -0
  19. package/.github/prompts/speckit.tasks.prompt.md +3 -0
  20. package/.github/prompts/speckit.taskstoissues.prompt.md +3 -0
  21. package/.github/workflows/ci.yml +38 -0
  22. package/.prettierrc +6 -0
  23. package/.specify/memory/constitution.md +181 -0
  24. package/.specify/scripts/bash/check-prerequisites.sh +166 -0
  25. package/.specify/scripts/bash/common.sh +156 -0
  26. package/.specify/scripts/bash/create-new-feature.sh +297 -0
  27. package/.specify/scripts/bash/setup-plan.sh +61 -0
  28. package/.specify/scripts/bash/update-agent-context.sh +810 -0
  29. package/.specify/templates/agent-file-template.md +28 -0
  30. package/.specify/templates/checklist-template.md +40 -0
  31. package/.specify/templates/constitution-template.md +50 -0
  32. package/.specify/templates/plan-template.md +113 -0
  33. package/.specify/templates/spec-template.md +115 -0
  34. package/.specify/templates/tasks-template.md +251 -0
  35. package/.vscode/mcp.json +42 -0
  36. package/.vscode/settings.json +19 -0
  37. package/CODE_OF_CONDUCT.md +128 -0
  38. package/LICENSE +21 -0
  39. package/README.md +213 -0
  40. package/dist/src/cli/developCommand.js +240 -0
  41. package/dist/src/cli/directCommands.js +143 -0
  42. package/dist/src/cli/envLoader.js +16 -0
  43. package/dist/src/cli/exportCommand.js +53 -0
  44. package/dist/src/cli/index.js +203 -0
  45. package/dist/src/cli/ioContext.js +109 -0
  46. package/dist/src/cli/preflight.js +57 -0
  47. package/dist/src/cli/statusCommand.js +110 -0
  48. package/dist/src/cli/workshopCommand.js +400 -0
  49. package/dist/src/develop/checkpointState.js +86 -0
  50. package/dist/src/develop/codeGenerator.js +319 -0
  51. package/dist/src/develop/dynamicScaffolder.js +226 -0
  52. package/dist/src/develop/githubMcpAdapter.js +122 -0
  53. package/dist/src/develop/index.js +15 -0
  54. package/dist/src/develop/mcpContextEnricher.js +195 -0
  55. package/dist/src/develop/pocScaffolder.js +542 -0
  56. package/dist/src/develop/ralphLoop.js +659 -0
  57. package/dist/src/develop/templateRegistry.js +364 -0
  58. package/dist/src/develop/testRunner.js +202 -0
  59. package/dist/src/logging/logger.js +58 -0
  60. package/dist/src/loop/conversationLoop.js +227 -0
  61. package/dist/src/loop/phaseSummarizer.js +87 -0
  62. package/dist/src/mcp/mcpManager.js +267 -0
  63. package/dist/src/mcp/mcpTransport.js +391 -0
  64. package/dist/src/mcp/retryPolicy.js +47 -0
  65. package/dist/src/mcp/webSearch.js +254 -0
  66. package/dist/src/phases/contextSummarizer.js +101 -0
  67. package/dist/src/phases/discoveryEnricher.js +156 -0
  68. package/dist/src/phases/phaseExtractors.js +222 -0
  69. package/dist/src/phases/phaseHandlers.js +328 -0
  70. package/dist/src/prompts/design.md +51 -0
  71. package/dist/src/prompts/develop-boundary.md +51 -0
  72. package/dist/src/prompts/develop.md +111 -0
  73. package/dist/src/prompts/discover.md +58 -0
  74. package/dist/src/prompts/ideate.md +56 -0
  75. package/dist/src/prompts/plan.md +51 -0
  76. package/dist/src/prompts/promptLoader.js +167 -0
  77. package/dist/src/prompts/promptLoader.ts +198 -0
  78. package/dist/src/prompts/select.md +47 -0
  79. package/dist/src/prompts/summarize/README.md +8 -0
  80. package/dist/src/prompts/summarize/design-summary.md +37 -0
  81. package/dist/src/prompts/summarize/develop-summary.md +25 -0
  82. package/dist/src/prompts/summarize/ideate-summary.md +27 -0
  83. package/dist/src/prompts/summarize/plan-summary.md +27 -0
  84. package/dist/src/prompts/summarize/select-summary.md +21 -0
  85. package/dist/src/prompts/system.md +28 -0
  86. package/dist/src/sessions/exportPaths.js +22 -0
  87. package/dist/src/sessions/exportWriter.js +406 -0
  88. package/dist/src/sessions/sessionManager.js +81 -0
  89. package/dist/src/sessions/sessionStore.js +65 -0
  90. package/dist/src/shared/activitySpinner.js +91 -0
  91. package/dist/src/shared/copilotClient.js +129 -0
  92. package/dist/src/shared/data/cards.json +1249 -0
  93. package/dist/src/shared/data/cardsLoader.js +51 -0
  94. package/dist/src/shared/errorClassifier.js +120 -0
  95. package/dist/src/shared/events.js +28 -0
  96. package/dist/src/shared/markdownRenderer.js +34 -0
  97. package/dist/src/shared/schemas/session.js +265 -0
  98. package/dist/src/shared/tableRenderer.js +20 -0
  99. package/dist/src/vendor/chalk.js +2 -0
  100. package/dist/src/vendor/cli-table3.js +3 -0
  101. package/dist/src/vendor/commander.js +2 -0
  102. package/dist/src/vendor/marked-terminal.js +3 -0
  103. package/dist/src/vendor/marked.js +2 -0
  104. package/dist/src/vendor/ora.js +2 -0
  105. package/dist/src/vendor/pino.js +2 -0
  106. package/dist/src/vendor/zod.js +2 -0
  107. package/dist/tests/e2e/developE2e.spec.js +126 -0
  108. package/dist/tests/e2e/developFailureE2e.spec.js +247 -0
  109. package/dist/tests/e2e/developPty.spec.js +75 -0
  110. package/dist/tests/e2e/discoveryWebSearchRelevance.spec.js +84 -0
  111. package/dist/tests/e2e/harness.spec.js +83 -0
  112. package/dist/tests/e2e/mcpLive.spec.js +120 -0
  113. package/dist/tests/e2e/newSession.e2e.spec.js +177 -0
  114. package/dist/tests/e2e/ralphLoopEnrichmentComparison.spec.js +62 -0
  115. package/dist/tests/e2e/workiqEnrichment.spec.js +56 -0
  116. package/dist/tests/e2e/zavaSimulation.spec.js +452 -0
  117. package/dist/tests/fixtures/test-fixture-project/src/add.js +3 -0
  118. package/dist/tests/fixtures/test-fixture-project/tests/failing.test.js +6 -0
  119. package/dist/tests/fixtures/test-fixture-project/tests/hanging.test.js +8 -0
  120. package/dist/tests/fixtures/test-fixture-project/tests/passing.test.js +10 -0
  121. package/dist/tests/fixtures/test-fixture-project/vitest.config.js +6 -0
  122. package/dist/tests/integration/autoStartConversation.spec.js +138 -0
  123. package/dist/tests/integration/defaultCommand.spec.js +147 -0
  124. package/dist/tests/integration/directCommandNonTty.spec.js +224 -0
  125. package/dist/tests/integration/directCommandTty.spec.js +151 -0
  126. package/dist/tests/integration/discoveryEnrichmentFlow.spec.js +175 -0
  127. package/dist/tests/integration/exportArtifacts.spec.js +202 -0
  128. package/dist/tests/integration/exportFallbackFlow.spec.js +99 -0
  129. package/dist/tests/integration/mcpDegradationFlow.spec.js +190 -0
  130. package/dist/tests/integration/mcpTransportFlow.spec.js +139 -0
  131. package/dist/tests/integration/newSessionFlow.spec.js +343 -0
  132. package/dist/tests/integration/pocGithubMcp.spec.js +186 -0
  133. package/dist/tests/integration/pocLocalFallback.spec.js +171 -0
  134. package/dist/tests/integration/pocScaffold.spec.js +163 -0
  135. package/dist/tests/integration/ralphLoopFlow.spec.js +359 -0
  136. package/dist/tests/integration/ralphLoopPartial.spec.js +368 -0
  137. package/dist/tests/integration/resumeAndBacktrack.spec.js +247 -0
  138. package/dist/tests/integration/spinnerLifecycle.spec.js +220 -0
  139. package/dist/tests/integration/summarizationFlow.spec.js +115 -0
  140. package/dist/tests/integration/testRunnerReal.spec.js +52 -0
  141. package/dist/tests/integration/webSearchAgent.spec.js +128 -0
  142. package/dist/tests/live/copilotSdkLive.spec.js +107 -0
  143. package/dist/tests/live/zavaFullWorkshop.spec.js +392 -0
  144. package/dist/tests/setup/loadEnv.js +3 -0
  145. package/dist/tests/unit/cli/developCommand.spec.js +567 -0
  146. package/dist/tests/unit/cli/directCommands.spec.js +279 -0
  147. package/dist/tests/unit/cli/envLoader.spec.js +58 -0
  148. package/dist/tests/unit/cli/ioContext.spec.js +119 -0
  149. package/dist/tests/unit/cli/preflight.spec.js +108 -0
  150. package/dist/tests/unit/cli/statusCommand.spec.js +111 -0
  151. package/dist/tests/unit/cli/workshopClientFallback.spec.js +80 -0
  152. package/dist/tests/unit/cli/workshopCommand.spec.js +329 -0
  153. package/dist/tests/unit/config/vitestEnvSetup.spec.js +13 -0
  154. package/dist/tests/unit/develop/checkpointState.spec.js +315 -0
  155. package/dist/tests/unit/develop/codeGenerator.spec.js +355 -0
  156. package/dist/tests/unit/develop/githubMcpAdapter.spec.js +231 -0
  157. package/dist/tests/unit/develop/mcpContextEnricher.spec.js +433 -0
  158. package/dist/tests/unit/develop/outputValidator.spec.js +119 -0
  159. package/dist/tests/unit/develop/pocScaffolder.spec.js +353 -0
  160. package/dist/tests/unit/develop/ralphLoop.spec.js +1248 -0
  161. package/dist/tests/unit/develop/templateRegistry.spec.js +85 -0
  162. package/dist/tests/unit/develop/testRunner.spec.js +249 -0
  163. package/dist/tests/unit/infraBicep.spec.js +92 -0
  164. package/dist/tests/unit/infraDeploy.spec.js +82 -0
  165. package/dist/tests/unit/infraTeardown.spec.js +63 -0
  166. package/dist/tests/unit/logging/logger.spec.js +43 -0
  167. package/dist/tests/unit/loop/conversationLoop.spec.js +592 -0
  168. package/dist/tests/unit/loop/phaseSummarizer.spec.js +141 -0
  169. package/dist/tests/unit/loop/streamingMarkdown.spec.js +147 -0
  170. package/dist/tests/unit/mcp/mcpManager.spec.js +279 -0
  171. package/dist/tests/unit/mcp/mcpTransport.spec.js +529 -0
  172. package/dist/tests/unit/mcp/retryPolicy.spec.js +218 -0
  173. package/dist/tests/unit/mcp/timeoutValidation.spec.js +46 -0
  174. package/dist/tests/unit/mcp/webSearch.spec.js +567 -0
  175. package/dist/tests/unit/phases/contextSummarizer.spec.js +140 -0
  176. package/dist/tests/unit/phases/discoveryEnricher.repeatCalls.spec.js +93 -0
  177. package/dist/tests/unit/phases/discoveryEnricher.spec.js +411 -0
  178. package/dist/tests/unit/phases/phaseExtractors.spec.js +352 -0
  179. package/dist/tests/unit/phases/phaseHandlers.spec.js +425 -0
  180. package/dist/tests/unit/prompts/promptLoader.spec.js +118 -0
  181. package/dist/tests/unit/schemas/pocSchemas.spec.js +412 -0
  182. package/dist/tests/unit/schemas/session.spec.js +257 -0
  183. package/dist/tests/unit/sessions/exportPaths.spec.js +31 -0
  184. package/dist/tests/unit/sessions/exportWriter.spec.js +655 -0
  185. package/dist/tests/unit/sessions/sessionManager.spec.js +151 -0
  186. package/dist/tests/unit/sessions/sessionStore.spec.js +116 -0
  187. package/dist/tests/unit/shared/activitySpinner.spec.js +175 -0
  188. package/dist/tests/unit/shared/cardsLoader.spec.js +76 -0
  189. package/dist/tests/unit/shared/copilotClient.spec.js +155 -0
  190. package/dist/tests/unit/shared/errorClassifier.spec.js +131 -0
  191. package/dist/tests/unit/shared/events.spec.js +55 -0
  192. package/dist/tests/unit/shared/markdownRenderer.spec.js +35 -0
  193. package/dist/tests/unit/shared/markdownRendererChunks.spec.js +70 -0
  194. package/dist/tests/unit/shared/tableRenderer.spec.js +34 -0
  195. package/dist/vitest.config.js +14 -0
  196. package/dist/vitest.live.config.js +18 -0
  197. package/docs/README.md +35 -0
  198. package/docs/architecture.md +169 -0
  199. package/docs/cli-usage.md +207 -0
  200. package/docs/environment.md +66 -0
  201. package/docs/export-format.md +146 -0
  202. package/docs/session-model.md +113 -0
  203. package/eslint.config.js +35 -0
  204. package/infra/deploy.sh +193 -0
  205. package/infra/gather-env.sh +211 -0
  206. package/infra/main.bicep +90 -0
  207. package/infra/main.bicepparam +18 -0
  208. package/infra/resources.bicep +134 -0
  209. package/infra/teardown.sh +114 -0
  210. package/package.json +63 -0
  211. package/specs/001-cli-workshop-rebuild/checklists/requirements.md +35 -0
  212. package/specs/001-cli-workshop-rebuild/contracts/cli.md +59 -0
  213. package/specs/001-cli-workshop-rebuild/contracts/export-summary-json.md +23 -0
  214. package/specs/001-cli-workshop-rebuild/contracts/session-json.md +30 -0
  215. package/specs/001-cli-workshop-rebuild/data-model.md +210 -0
  216. package/specs/001-cli-workshop-rebuild/plan.md +361 -0
  217. package/specs/001-cli-workshop-rebuild/quickstart.md +83 -0
  218. package/specs/001-cli-workshop-rebuild/research.md +116 -0
  219. package/specs/001-cli-workshop-rebuild/spec.md +240 -0
  220. package/specs/001-cli-workshop-rebuild/tasks.md +476 -0
  221. package/specs/002-poc-generation/contracts/poc-output.md +172 -0
  222. package/specs/002-poc-generation/contracts/ralph-loop.md +113 -0
  223. package/specs/002-poc-generation/data-model.md +172 -0
  224. package/specs/002-poc-generation/plan.md +109 -0
  225. package/specs/002-poc-generation/quickstart.md +97 -0
  226. package/specs/002-poc-generation/research.md +786 -0
  227. package/specs/002-poc-generation/spec.md +81 -0
  228. package/specs/002-poc-generation/tasks-fix.md +198 -0
  229. package/specs/002-poc-generation/tasks.md +252 -0
  230. package/specs/003-mcp-transport-integration/checklists/requirements.md +37 -0
  231. package/specs/003-mcp-transport-integration/contracts/context-enricher.md +220 -0
  232. package/specs/003-mcp-transport-integration/contracts/discovery-enricher.md +267 -0
  233. package/specs/003-mcp-transport-integration/contracts/github-adapter.md +149 -0
  234. package/specs/003-mcp-transport-integration/contracts/mcp-transport.md +288 -0
  235. package/specs/003-mcp-transport-integration/data-model.md +326 -0
  236. package/specs/003-mcp-transport-integration/plan.md +114 -0
  237. package/specs/003-mcp-transport-integration/quickstart.md +311 -0
  238. package/specs/003-mcp-transport-integration/research.md +395 -0
  239. package/specs/003-mcp-transport-integration/spec.md +234 -0
  240. package/specs/003-mcp-transport-integration/tasks.md +324 -0
  241. package/specs/003-next-spec-gaps.md +150 -0
  242. package/specs/004-dev-resume-hardening/checklists/requirements.md +37 -0
  243. package/specs/004-dev-resume-hardening/contracts/cli.md +160 -0
  244. package/specs/004-dev-resume-hardening/data-model.md +321 -0
  245. package/specs/004-dev-resume-hardening/plan.md +107 -0
  246. package/specs/004-dev-resume-hardening/quickstart.md +115 -0
  247. package/specs/004-dev-resume-hardening/research.md +142 -0
  248. package/specs/004-dev-resume-hardening/spec.md +221 -0
  249. package/specs/004-dev-resume-hardening/tasks.md +333 -0
  250. package/specs/005-ai-search-deploy/checklists/requirements.md +39 -0
  251. package/specs/005-ai-search-deploy/contracts/web-search-tool.md +241 -0
  252. package/specs/005-ai-search-deploy/data-model.md +130 -0
  253. package/specs/005-ai-search-deploy/plan.md +93 -0
  254. package/specs/005-ai-search-deploy/quickstart.md +96 -0
  255. package/specs/005-ai-search-deploy/research.md +187 -0
  256. package/specs/005-ai-search-deploy/spec.md +143 -0
  257. package/specs/005-ai-search-deploy/tasks.md +284 -0
  258. package/specs/006-workshop-extraction-fixes/checklists/requirements.md +61 -0
  259. package/specs/006-workshop-extraction-fixes/contracts/summarization-and-export.md +131 -0
  260. package/specs/006-workshop-extraction-fixes/data-model.md +149 -0
  261. package/specs/006-workshop-extraction-fixes/plan.md +123 -0
  262. package/specs/006-workshop-extraction-fixes/quickstart.md +101 -0
  263. package/specs/006-workshop-extraction-fixes/research.md +143 -0
  264. package/specs/006-workshop-extraction-fixes/spec.md +210 -0
  265. package/specs/006-workshop-extraction-fixes/tasks.md +316 -0
  266. package/src/cli/developCommand.ts +308 -0
  267. package/src/cli/directCommands.ts +195 -0
  268. package/src/cli/envLoader.ts +17 -0
  269. package/src/cli/exportCommand.ts +65 -0
  270. package/src/cli/index.ts +249 -0
  271. package/src/cli/ioContext.ts +139 -0
  272. package/src/cli/preflight.ts +86 -0
  273. package/src/cli/statusCommand.ts +118 -0
  274. package/src/cli/workshopCommand.ts +496 -0
  275. package/src/develop/checkpointState.ts +121 -0
  276. package/src/develop/codeGenerator.ts +402 -0
  277. package/src/develop/dynamicScaffolder.ts +284 -0
  278. package/src/develop/githubMcpAdapter.ts +199 -0
  279. package/src/develop/index.ts +34 -0
  280. package/src/develop/mcpContextEnricher.ts +279 -0
  281. package/src/develop/pocScaffolder.ts +646 -0
  282. package/src/develop/ralphLoop.ts +1044 -0
  283. package/src/develop/templateRegistry.ts +427 -0
  284. package/src/develop/testRunner.ts +276 -0
  285. package/src/logging/logger.ts +73 -0
  286. package/src/loop/conversationLoop.ts +355 -0
  287. package/src/loop/phaseSummarizer.ts +114 -0
  288. package/src/mcp/mcpManager.ts +365 -0
  289. package/src/mcp/mcpTransport.ts +562 -0
  290. package/src/mcp/retryPolicy.ts +87 -0
  291. package/src/mcp/webSearch.ts +388 -0
  292. package/src/originalPrompts/design_thinking.md +178 -0
  293. package/src/originalPrompts/design_thinking_persona.md +76 -0
  294. package/src/originalPrompts/document_generator_example.md +77 -0
  295. package/src/originalPrompts/document_generator_persona.md +47 -0
  296. package/src/originalPrompts/facilitator_persona.md +125 -0
  297. package/src/originalPrompts/guardrails.md +47 -0
  298. package/src/phases/contextSummarizer.ts +154 -0
  299. package/src/phases/discoveryEnricher.ts +223 -0
  300. package/src/phases/phaseExtractors.ts +247 -0
  301. package/src/phases/phaseHandlers.ts +450 -0
  302. package/src/prompts/design.md +51 -0
  303. package/src/prompts/develop-boundary.md +51 -0
  304. package/src/prompts/develop.md +111 -0
  305. package/src/prompts/discover.md +58 -0
  306. package/src/prompts/ideate.md +56 -0
  307. package/src/prompts/plan.md +51 -0
  308. package/src/prompts/promptLoader.ts +198 -0
  309. package/src/prompts/select.md +47 -0
  310. package/src/prompts/summarize/README.md +8 -0
  311. package/src/prompts/summarize/design-summary.md +37 -0
  312. package/src/prompts/summarize/develop-summary.md +25 -0
  313. package/src/prompts/summarize/ideate-summary.md +27 -0
  314. package/src/prompts/summarize/plan-summary.md +27 -0
  315. package/src/prompts/summarize/select-summary.md +21 -0
  316. package/src/prompts/system.md +28 -0
  317. package/src/sessions/exportPaths.ts +28 -0
  318. package/src/sessions/exportWriter.ts +490 -0
  319. package/src/sessions/sessionManager.ts +119 -0
  320. package/src/sessions/sessionStore.ts +69 -0
  321. package/src/shared/activitySpinner.ts +108 -0
  322. package/src/shared/copilotClient.ts +291 -0
  323. package/src/shared/data/cards.json +1249 -0
  324. package/src/shared/data/cardsLoader.ts +70 -0
  325. package/src/shared/errorClassifier.ts +160 -0
  326. package/src/shared/events.ts +103 -0
  327. package/src/shared/markdownRenderer.ts +44 -0
  328. package/src/shared/schemas/session.ts +346 -0
  329. package/src/shared/tableRenderer.ts +28 -0
  330. package/src/types/marked-terminal.d.ts +5 -0
  331. package/src/vendor/chalk.ts +2 -0
  332. package/src/vendor/cli-table3.ts +3 -0
  333. package/src/vendor/commander.ts +2 -0
  334. package/src/vendor/marked-terminal.ts +3 -0
  335. package/src/vendor/marked.ts +2 -0
  336. package/src/vendor/ora.ts +2 -0
  337. package/src/vendor/pino.ts +3 -0
  338. package/src/vendor/zod.ts +3 -0
  339. package/tests/e2e/developE2e.spec.ts +152 -0
  340. package/tests/e2e/developFailureE2e.spec.ts +289 -0
  341. package/tests/e2e/developPty.spec.ts +86 -0
  342. package/tests/e2e/discoveryWebSearchRelevance.spec.ts +103 -0
  343. package/tests/e2e/harness.spec.ts +104 -0
  344. package/tests/e2e/mcpLive.spec.ts +149 -0
  345. package/tests/e2e/newSession.e2e.spec.ts +245 -0
  346. package/tests/e2e/ralphLoopEnrichmentComparison.spec.ts +70 -0
  347. package/tests/e2e/workiqEnrichment.spec.ts +72 -0
  348. package/tests/e2e/zava-assessment/agent-interaction-script.md +258 -0
  349. package/tests/e2e/zava-assessment/company-profile.md +98 -0
  350. package/tests/e2e/zava-assessment/expected-results-checklist.md +454 -0
  351. package/tests/e2e/zavaSimulation.spec.ts +511 -0
  352. package/tests/fixtures/completedSession.json +141 -0
  353. package/tests/fixtures/test-fixture-project/package-lock.json +1585 -0
  354. package/tests/fixtures/test-fixture-project/package.json +12 -0
  355. package/tests/fixtures/test-fixture-project/src/add.ts +3 -0
  356. package/tests/fixtures/test-fixture-project/tests/failing.test.ts +7 -0
  357. package/tests/fixtures/test-fixture-project/tests/hanging.test.ts +9 -0
  358. package/tests/fixtures/test-fixture-project/tests/passing.test.ts +13 -0
  359. package/tests/fixtures/test-fixture-project/vitest.config.ts +7 -0
  360. package/tests/integration/autoStartConversation.spec.ts +168 -0
  361. package/tests/integration/defaultCommand.spec.ts +179 -0
  362. package/tests/integration/directCommandNonTty.spec.ts +260 -0
  363. package/tests/integration/directCommandTty.spec.ts +185 -0
  364. package/tests/integration/discoveryEnrichmentFlow.spec.ts +209 -0
  365. package/tests/integration/exportArtifacts.spec.ts +232 -0
  366. package/tests/integration/exportFallbackFlow.spec.ts +115 -0
  367. package/tests/integration/mcpDegradationFlow.spec.ts +231 -0
  368. package/tests/integration/mcpTransportFlow.spec.ts +178 -0
  369. package/tests/integration/newSessionFlow.spec.ts +406 -0
  370. package/tests/integration/pocGithubMcp.spec.ts +224 -0
  371. package/tests/integration/pocLocalFallback.spec.ts +205 -0
  372. package/tests/integration/pocScaffold.spec.ts +220 -0
  373. package/tests/integration/ralphLoopFlow.spec.ts +430 -0
  374. package/tests/integration/ralphLoopPartial.spec.ts +416 -0
  375. package/tests/integration/resumeAndBacktrack.spec.ts +278 -0
  376. package/tests/integration/spinnerLifecycle.spec.ts +270 -0
  377. package/tests/integration/summarizationFlow.spec.ts +135 -0
  378. package/tests/integration/testRunnerReal.spec.ts +63 -0
  379. package/tests/integration/webSearchAgent.spec.ts +155 -0
  380. package/tests/live/copilotSdkLive.spec.ts +149 -0
  381. package/tests/live/zavaFullWorkshop.spec.ts +515 -0
  382. package/tests/setup/loadEnv.ts +5 -0
  383. package/tests/unit/cli/developCommand.spec.ts +679 -0
  384. package/tests/unit/cli/directCommands.spec.ts +325 -0
  385. package/tests/unit/cli/envLoader.spec.ts +73 -0
  386. package/tests/unit/cli/ioContext.spec.ts +148 -0
  387. package/tests/unit/cli/preflight.spec.ts +125 -0
  388. package/tests/unit/cli/statusCommand.spec.ts +134 -0
  389. package/tests/unit/cli/workshopClientFallback.spec.ts +100 -0
  390. package/tests/unit/cli/workshopCommand.spec.ts +378 -0
  391. package/tests/unit/config/vitestEnvSetup.spec.ts +24 -0
  392. package/tests/unit/develop/checkpointState.spec.ts +378 -0
  393. package/tests/unit/develop/codeGenerator.spec.ts +447 -0
  394. package/tests/unit/develop/githubMcpAdapter.spec.ts +283 -0
  395. package/tests/unit/develop/mcpContextEnricher.spec.ts +564 -0
  396. package/tests/unit/develop/outputValidator.spec.ts +134 -0
  397. package/tests/unit/develop/pocScaffolder.spec.ts +451 -0
  398. package/tests/unit/develop/ralphLoop.spec.ts +1439 -0
  399. package/tests/unit/develop/templateRegistry.spec.ts +106 -0
  400. package/tests/unit/develop/testRunner.spec.ts +294 -0
  401. package/tests/unit/infraBicep.spec.ts +116 -0
  402. package/tests/unit/infraDeploy.spec.ts +102 -0
  403. package/tests/unit/infraTeardown.spec.ts +77 -0
  404. package/tests/unit/logging/logger.spec.ts +50 -0
  405. package/tests/unit/loop/conversationLoop.spec.ts +719 -0
  406. package/tests/unit/loop/phaseSummarizer.spec.ts +169 -0
  407. package/tests/unit/loop/streamingMarkdown.spec.ts +180 -0
  408. package/tests/unit/mcp/mcpManager.spec.ts +336 -0
  409. package/tests/unit/mcp/mcpTransport.spec.ts +689 -0
  410. package/tests/unit/mcp/retryPolicy.spec.ts +278 -0
  411. package/tests/unit/mcp/timeoutValidation.spec.ts +55 -0
  412. package/tests/unit/mcp/webSearch.spec.ts +718 -0
  413. package/tests/unit/phases/contextSummarizer.spec.ts +158 -0
  414. package/tests/unit/phases/discoveryEnricher.repeatCalls.spec.ts +125 -0
  415. package/tests/unit/phases/discoveryEnricher.spec.ts +512 -0
  416. package/tests/unit/phases/phaseExtractors.spec.ts +406 -0
  417. package/tests/unit/phases/phaseHandlers.spec.ts +483 -0
  418. package/tests/unit/prompts/promptLoader.spec.ts +144 -0
  419. package/tests/unit/schemas/pocSchemas.spec.ts +457 -0
  420. package/tests/unit/schemas/session.spec.ts +328 -0
  421. package/tests/unit/sessions/exportPaths.spec.ts +38 -0
  422. package/tests/unit/sessions/exportWriter.spec.ts +737 -0
  423. package/tests/unit/sessions/sessionManager.spec.ts +174 -0
  424. package/tests/unit/sessions/sessionStore.spec.ts +136 -0
  425. package/tests/unit/shared/activitySpinner.spec.ts +211 -0
  426. package/tests/unit/shared/cardsLoader.spec.ts +89 -0
  427. package/tests/unit/shared/copilotClient.spec.ts +185 -0
  428. package/tests/unit/shared/errorClassifier.spec.ts +152 -0
  429. package/tests/unit/shared/events.spec.ts +71 -0
  430. package/tests/unit/shared/markdownRenderer.spec.ts +42 -0
  431. package/tests/unit/shared/markdownRendererChunks.spec.ts +83 -0
  432. package/tests/unit/shared/tableRenderer.spec.ts +38 -0
  433. package/tsconfig.json +20 -0
  434. package/vitest.config.ts +15 -0
  435. package/vitest.live.config.ts +19 -0
@@ -0,0 +1,786 @@
+ # Research: PoC Generation & Ralph Loop
+
+ **Feature ID**: 002-poc-generation
+ **Date**: 2026-02-27
+ **Status**: Complete
+
+ ---
+
+ ## Topic 1: Ralph Loop Pattern
+
+ ### Findings
+
+ The Ralph Loop is an **autonomous, iterative code-generation-test-refine** pattern originally conceived by Geoffrey Huntley ([ghuntley.com/ralph](https://ghuntley.com/ralph/)) and formalized as a Claude Code plugin at [`anthropics/claude-plugins-official/plugins/ralph-loop`](https://github.com/anthropics/claude-plugins-official/tree/main/plugins/ralph-loop).
+
+ #### Canonical Pattern
+
+ The core concept is simple: **a `while true` loop that repeatedly feeds the same prompt to an LLM**, where the LLM's work persists across iterations via the filesystem.
+
+ ```
+ ┌───────────────────────────────────────────┐
+ │                Ralph Loop                 │
+ │                                           │
+ │  ┌─────────┐     ┌────────────────┐       │
+ │  │ Prompt  │────▶│ LLM works on   │       │
+ │  │ (fixed) │     │ task, modifies │       │
+ │  └─────────┘     │ files, runs    │       │
+ │      ▲           │ tests          │       │
+ │      │           └───────┬────────┘       │
+ │      │                   │                │
+ │      │           ┌───────▼────────┐       │
+ │      │           │ Check exit     │       │
+ │      │           │ conditions     │       │
+ │      │           └───────┬────────┘       │
+ │      │                   │                │
+ │      │         ┌─────────┴─────────┐      │
+ │      │     CONTINUE              STOP     │
+ │      │         │                   │      │
+ │      └─────────┘           ┌───────▼──┐   │
+ │                            │ Complete │   │
+ │                            └──────────┘   │
+ └───────────────────────────────────────────┘
+ ```
+
+ #### Iteration Steps (per the canonical implementation)
+
+ 1. **LLM receives the SAME prompt** every iteration (the prompt never changes)
+ 2. **LLM works on the task** — generates/modifies code, runs tests, reviews output
+ 3. **LLM tries to exit** — considers itself "done" for this pass
+ 4. **Stop hook intercepts** — checks termination conditions
+ 5. **If not complete** — blocks exit, feeds the same prompt back, increments iteration counter
+ 6. **Self-reference** — LLM sees its previous work in files and git history
+
+ #### Termination Conditions
+
+ The canonical implementation uses three termination mechanisms:
+
+ | Condition | Mechanism | Priority |
+ |-----------|-----------|----------|
+ | **Completion promise** | LLM outputs `<promise>EXACT_TEXT</promise>` tag; stop hook does exact string match | Primary (semantic) |
+ | **Max iterations** | Counter in state file; stop hook checks `iteration >= max_iterations` | Safety net |
+ | **State file removal** | User runs `/cancel-ralph` or hook detects corruption | Manual override |
+
+ #### State File Format
+
+ ```markdown
+ ---
+ active: true
+ iteration: 1
+ max_iterations: 10
+ completion_promise: "All tests passing"
+ started_at: "2026-02-27T14:30:00Z"
+ ---
+
+ Build a REST API for todos.
+ When complete:
+ - All CRUD endpoints working
+ - Tests passing (coverage > 80%)
+ - Output: <promise>All tests passing</promise>
+ ```
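
The state file shown here is front matter followed by the prompt body. A minimal sketch of reading it, assuming only flat `key: value` pairs between the `---` delimiters; the `RalphState` type and `parseRalphState` function are hypothetical names for illustration, not taken from the plugin:

```typescript
// Hypothetical shape mirroring the front-matter keys of the state file.
interface RalphState {
  active: boolean;
  iteration: number;
  maxIterations: number;
  completionPromise: string;
  startedAt: string;
  prompt: string;
}

// Parses flat `key: value` front matter delimited by `---` lines.
// Assumption: no nested YAML; values are booleans, integers, or quoted strings.
function parseRalphState(text: string): RalphState {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (match === null) throw new Error('state file has no front matter');

  const fields = new Map<string, string>();
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':'); // split on the first colon only
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    fields.set(key, raw.replace(/^"(.*)"$/, '$1')); // drop surrounding quotes
  }

  const need = (key: string): string => {
    const value = fields.get(key);
    if (value === undefined) throw new Error(`missing field: ${key}`);
    return value;
  };

  return {
    active: need('active') === 'true',
    iteration: Number.parseInt(need('iteration'), 10),
    maxIterations: Number.parseInt(need('max_iterations'), 10),
    completionPromise: need('completion_promise'),
    startedAt: need('started_at'),
    prompt: match[2].trim(), // everything after the closing `---`
  };
}
```

Splitting on the first colon keeps timestamp values such as `started_at` intact even though they contain further colons.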
+
+ #### Feedback Mechanism
+
+ **Key insight**: The feedback is NOT output-to-input piping. Instead:
+ - The prompt stays the same every iteration
+ - The LLM's work **persists in files on disk**
+ - Each iteration, the LLM **reads its own prior work** from the filesystem
+ - This creates a self-referential improvement loop via file-system state
+
+ The stop hook outputs a JSON `block` decision:
+ ```json
+ {
+   "decision": "block",
+   "reason": "<the original prompt text>",
+   "systemMessage": "🔄 Ralph iteration 5 | To stop: output <promise>DONE</promise>"
+ }
+ ```
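
The termination conditions and the `block` decision can be combined into one decision function. The sketch below is illustrative only: `HookState`, `BlockDecision`, and `evaluateStop` are hypothetical names, the order of the checks is an assumption, and trimming whitespace inside the `<promise>` tag is a simplification of the exact string match described in the table.

```typescript
// Hypothetical in-memory view of the state file.
interface HookState {
  iteration: number;
  maxIterations: number;
  completionPromise: string;
  prompt: string;
}

// Field names follow the JSON `block` decision emitted by the stop hook.
interface BlockDecision {
  decision: 'block';
  reason: string;
  systemMessage: string;
}

// Returns null to allow the exit, or a block decision that re-feeds the prompt.
function evaluateStop(state: HookState | null, lastOutput: string): BlockDecision | null {
  // State file removed (manual cancel or corruption): let the session end.
  if (state === null) return null;

  // Safety net: the iteration budget is exhausted.
  if (state.iteration >= state.maxIterations) return null;

  // Primary condition: the text inside <promise>...</promise> matches the configured promise.
  const promise = /<promise>([\s\S]*?)<\/promise>/.exec(lastOutput);
  if (promise !== null && promise[1].trim() === state.completionPromise) return null;

  // Not complete: block the exit and feed the same prompt back.
  const next = state.iteration + 1;
  return {
    decision: 'block',
    reason: state.prompt,
    systemMessage: `Ralph iteration ${next} | To stop: output <promise>${state.completionPromise}</promise>`,
  };
}
```

Note that `reason` carries the unchanged prompt, which is what makes the loop self-referential: new information reaches the LLM only through the files it wrote earlier.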
+
+ #### Adaptation for sofIA
+
+ For sofIA's Develop phase, we need to **internalize** the Ralph loop rather than using external bash hooks. Key differences from the canonical pattern:
+
+ | Aspect | Canonical (Claude Code) | sofIA Adaptation |
+ |--------|------------------------|-------------------|
+ | Loop mechanism | Bash `while true` / Stop hook | TypeScript `while` loop in `ralphLoop.ts` |
+ | Feedback | File system persistence | File system + structured `PocIteration` in session |
+ | Prompt | Fixed markdown file | Dynamic, enriched with test failure context |
+ | Termination | Promise tag + max iterations | Tests passing + max iterations + user abort |
+ | State | `.claude/ralph-loop.local.md` | `WorkshopSession.poc.iterations[]` |
+
+ **Critical enhancement**: Unlike the canonical Ralph loop where the prompt never changes, sofIA's adaptation should **inject test failure output** into subsequent prompts. This is closer to how the `skill-creator` plugin's `run_loop.py` works — evaluation results from iteration N feed into the improvement prompt for iteration N+1.
+
+ ### Decision
+
+ Implement a **modified Ralph loop** in `src/develop/ralphLoop.ts` with these iteration steps:
+
+ 1. **Generate/refine code** — Send prompt + test failures to LLM, write output files
+ 2. **Run tests** — Execute test runner, capture structured results
+ 3. **Evaluate termination** — Check: tests pass? max iterations? user abort? stuck detection?
+ 4. **Record iteration** — Persist `PocIteration` to session
+ 5. **Loop or exit** — Feed failures as context into next iteration, or finalize
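
The five steps can be sketched as a single loop. This is a rough illustration under stated assumptions, not the contents of `ralphLoop.ts`: the `LoopDeps` interface, the fields on `PocIteration`, and the function names are hypothetical, and stuck detection is left out to keep the sketch short.

```typescript
interface TestFailure { name: string; message: string }
interface TestRun { passed: number; failed: number; failures: TestFailure[] }

// Hypothetical record; the real PocIteration fields are not specified here.
interface PocIteration { iteration: number; outcome: 'tests-passing' | 'failed'; failed: number }

type ExitReason = 'tests-passing' | 'max-iterations' | 'user-abort';
interface LoopResult { reason: ExitReason; iterations: PocIteration[] }

// Injected collaborators (assumed), so the loop can run against fakes.
interface LoopDeps {
  generate(prompt: string): Promise<void>; // asks the LLM and writes files
  runTests(): Promise<TestRun>;
  shouldAbort(): boolean;
}

// Appends the previous iteration's failures to the base prompt.
function buildPrompt(base: string, failures: TestFailure[]): string {
  if (failures.length === 0) return base;
  const lines = failures.map((f) => `- ${f.name}: ${f.message}`);
  return `${base}\n\nFailing tests from the previous iteration:\n${lines.join('\n')}`;
}

async function runModifiedRalphLoop(
  basePrompt: string,
  maxIterations: number,
  deps: LoopDeps,
): Promise<LoopResult> {
  const iterations: PocIteration[] = [];
  let failures: TestFailure[] = [];

  for (let i = 1; i <= maxIterations; i++) {
    if (deps.shouldAbort()) return { reason: 'user-abort', iterations };

    await deps.generate(buildPrompt(basePrompt, failures)); // 1. generate/refine
    const run = await deps.runTests(); // 2. run tests
    const passing = run.failed === 0 && run.passed > 0; // 3. evaluate termination
    iterations.push({ iteration: i, outcome: passing ? 'tests-passing' : 'failed', failed: run.failed }); // 4. record
    if (passing) return { reason: 'tests-passing', iterations };

    failures = run.failures; // 5. carry failures into the next prompt
  }
  return { reason: 'max-iterations', iterations };
}
```

Injecting `generate` and `runTests` keeps the control flow testable without an LLM or a real test runner.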
+
+ ### Rationale
+
+ - Internalizing the loop (vs. external bash) gives us structured state tracking, session persistence, and the ability to enrich prompts with failure context
+ - Adding test-failure injection improves convergence speed vs. plain prompt repetition
+ - Keeping the `max_iterations` safety net and adding stuck-detection (same failures N times) prevents infinite loops
127
+
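Stuck detection ("same failures N times") can be sketched as a comparison of per-iteration failure fingerprints. The helper names `failureFingerprint` and `isStuck` and the default threshold of 3 are assumptions made for illustration; the document does not fix a value for N.

```typescript
// Hypothetical helper: reduces one iteration's failures to an
// order-independent key so that two runs can be compared cheaply.
function failureFingerprint(failures: { name: string; message: string }[]): string {
  return failures
    .map((f) => `${f.name}::${f.message}`)
    .sort()
    .join('\n');
}

// Returns true when the last `threshold` iterations all failed in the same way.
// `history` holds one fingerprint per completed iteration, oldest first.
// An empty fingerprint means "no failures", which is never treated as stuck.
function isStuck(history: string[], threshold = 3): boolean {
  if (threshold < 2 || history.length < threshold) return false;
  const recent = history.slice(-threshold);
  return recent[0] !== '' && recent.every((fp) => fp === recent[0]);
}
```

Sorting before joining means a test runner that reports the same failures in a different order still counts as the same state.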
128
+ ### Alternatives Considered
129
+
130
+ 1. **External bash loop wrapping `sofiacli`** — Rejected: loses session integration, no structured state, platform-specific
131
+ 2. **Pure prompt repetition (canonical Ralph)** — Rejected: slower convergence without failure context injection
132
+ 3. **LangGraph-style state machine** — Rejected: over-engineered for this use case, adds a heavy dependency
133
+
134
+ ---
135
+
136
+ ## Topic 2: Test Runner Invocation from Node.js
137
+
138
+ ### Findings
139
+
140
+ #### Approach: `child_process.spawn` with JSON reporter
141
+
142
+ ```typescript
143
+ import { spawn } from 'node:child_process';
144
+
145
+ interface TestResult {
146
+ passed: number;
147
+ failed: number;
148
+ skipped: number;
149
+ duration: number;
150
+ failures: TestFailure[];
151
+ }
152
+
153
+ interface TestFailure {
154
+ name: string;
155
+ message: string;
156
+ stack?: string;
157
+ }
158
+
159
+ async function runTests(cwd: string, timeout = 60_000): Promise<TestResult> {
160
+ return new Promise((resolve, reject) => {
161
+ const child = spawn('npx', ['vitest', 'run', '--reporter=json'], {
162
+ cwd,
163
+ stdio: ['ignore', 'pipe', 'pipe'],
164
+ timeout,
165
+ env: { ...process.env, CI: '1', NO_COLOR: '1' },
166
+ });
167
+
168
+ const stdoutChunks: Buffer[] = [];
169
+ const stderrChunks: Buffer[] = [];
170
+
171
+ child.stdout.on('data', (chunk) => stdoutChunks.push(chunk));
172
+ child.stderr.on('data', (chunk) => stderrChunks.push(chunk));
173
+
174
+ child.on('close', (code) => {
175
+ const stdout = Buffer.concat(stdoutChunks).toString();
176
+ const stderr = Buffer.concat(stderrChunks).toString();
177
+
178
+ try {
179
+ const json = JSON.parse(stdout);
180
+ resolve(parseVitestJson(json));
181
+ } catch {
182
+ // Fallback: parse exit code
183
+ resolve({
184
+ passed: code === 0 ? 1 : 0,
185
+ failed: code === 0 ? 0 : 1,
186
+ skipped: 0,
187
+ duration: 0,
188
+ failures: code !== 0
189
+ ? [{ name: 'unknown', message: stderr || stdout }]
190
+ : [],
191
+ });
192
+ }
193
+ });
194
+
195
+ child.on('error', (err) => {
196
+ reject(new Error(`Test runner failed to start: ${err.message}`));
197
+ });
198
+ });
199
+ }
200
+ ```
201
+
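`parseVitestJson` is referenced in the snippet above but not defined there. A minimal sketch follows, assuming the Jest-compatible report shape (`numPassedTests`, `numFailedTests`, `numPendingTests`, and `testResults[].assertionResults[]` with `fullName`, `status`, and `failureMessages`). Every field is treated as optional because the exact shape may vary between runner versions; treat the field list as an assumption to verify against real output.

```typescript
interface TestFailure {
  name: string;
  message: string;
  stack?: string;
}

interface TestResult {
  passed: number;
  failed: number;
  skipped: number;
  duration: number;
  failures: TestFailure[];
}

// Assumed subset of the Jest-compatible JSON report.
interface JsonReport {
  numPassedTests?: number;
  numFailedTests?: number;
  numPendingTests?: number;
  testResults?: Array<{
    startTime?: number;
    endTime?: number;
    assertionResults?: Array<{
      fullName?: string;
      title?: string;
      status?: string;
      failureMessages?: string[];
    }>;
  }>;
}

function parseVitestJson(report: JsonReport): TestResult {
  const failures: TestFailure[] = [];
  let duration = 0; // summed per-file time in ms, not wall-clock time

  for (const file of report.testResults ?? []) {
    if (file.startTime !== undefined && file.endTime !== undefined) {
      duration += Math.max(0, file.endTime - file.startTime);
    }
    for (const assertion of file.assertionResults ?? []) {
      if (assertion.status !== 'failed') continue;
      const text = (assertion.failureMessages ?? []).join('\n');
      const failure: TestFailure = {
        name: assertion.fullName ?? assertion.title ?? 'unknown',
        // First line is the assertion message; the rest is usually the stack.
        message: text.split('\n')[0] || 'Test failed',
      };
      if (text) failure.stack = text;
      failures.push(failure);
    }
  }

  return {
    passed: report.numPassedTests ?? 0,
    failed: report.numFailedTests ?? failures.length,
    skipped: report.numPendingTests ?? 0,
    duration,
    failures,
  };
}
```

Keeping the parser a pure function of the parsed JSON makes it testable with fixture objects, without spawning a test runner.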
202
+ #### spawn vs exec
203
+
204
+ | Factor | `spawn` | `exec` |
205
+ |--------|---------|--------|
206
+ | Buffer limit | **No limit** (streams) | 1MB default `maxBuffer` |
207
+ | Streaming | Yes — can process real-time | No — waits for completion |
208
+ | Timeout | Built-in `timeout` option | Built-in `timeout` option |
209
+ | Signal handling | Direct `child.kill()` | Same via returned child |
210
+ | **Verdict** | **Preferred** | Acceptable for small output |
211
+
212
+ Use `spawn` because test output can be large (especially with failure stacks).
213
+
214
+ #### JSON Reporters by Test Runner
215
+
216
+ | Runner | JSON Flag | Output |
217
+ |--------|-----------|--------|
218
+ | **Vitest** | `--reporter=json` | `{ numPassedTests, numFailedTests, testResults[] }` |
219
+ | **Jest** | `--json` | Same format (Vitest is Jest-compatible) |
220
+ | **Node test runner** | `--test-reporter=tap` | TAP output (parse with `tap-parser`) |
221
+ | **TAP** | Various | Use `tap-parser` npm package to parse |
222
+
223
+ **Recommendation**: Use Vitest JSON reporter since the project already uses Vitest. Fall back to exit-code parsing if JSON fails.
224
+
225
+ #### Timeout Handling
226
+
227
+ ```typescript
228
+ const child = spawn('npx', ['vitest', 'run', '--reporter=json'], {
229
+ cwd,
230
+ timeout: 60_000, // Kill after 60s
231
+ killSignal: 'SIGTERM', // Graceful first
232
+ });
233
+
234
+ // Belt-and-suspenders: hard kill after grace period
235
+ const hardKill = setTimeout(() => {
236
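+ if (child.exitCode === null && child.signalCode === null) child.kill('SIGKILL'); // `child.killed` only means a signal was sent, not that the process exited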
237
+ }, 60_000 + 5_000); // spawn timeout (60s) plus a 5s grace period
238
+
239
+ child.on('close', () => clearTimeout(hardKill));
240
+ ```
241
+
242
+ #### Environment Variables
243
+
244
+ Set these to prevent interactive/hanging behavior:
245
+ ```typescript
246
+ env: {
247
+ ...process.env,
248
+ CI: '1', // Disable watch mode, interactive prompts
249
+ NO_COLOR: '1', // Clean output for parsing
250
+ FORCE_COLOR: '0', // Redundant safety
251
+ }
252
+ ```
253
+
254
+ ### Decision
255
+
256
+ Use `child_process.spawn` with Vitest's `--reporter=json` flag. Capture stdout and stderr separately. Apply a configurable timeout (default 60s) with a belt-and-suspenders hard kill. Parse JSON output into a `TestResult` object; fall back to exit-code parsing on malformed output.
257
+
258
+ ### Rationale
259
+
260
+ - `spawn` handles arbitrarily large output without buffering issues
261
+ - JSON reporter gives structured results without regex parsing
262
+ - Vitest is already the project's test runner, so the JSON format is well-understood
263
+ - Separate stdout/stderr capture allows clean JSON parsing even when warnings appear on stderr
264
+
265
+ ### Alternatives Considered
266
+
267
+ 1. **`exec` with `maxBuffer`** — Rejected: risk of truncation on large test output
268
+ 2. **TAP protocol** — Rejected: requires additional parser dependency; Vitest's JSON is sufficient
269
+ 3. **Vitest Node API** — Rejected: tightly couples to Vitest version; `spawn` is runner-agnostic
270
+ 4. **`node:test` built-in runner** — Rejected: less mature, fewer features than Vitest for this use case
271
+
272
+ ---
273
+
274
+ ## Topic 3: GitHub MCP Repo Creation
275
+
276
+ ### Findings
277
+
278
+ The GitHub MCP server at `https://api.githubcopilot.com/mcp/` provides tools via the Model Context Protocol. Based on the MCP standard and GitHub's documentation, the available tools include:
279
+
280
+ #### Available Tools (relevant subset)
281
+
282
+ | Tool | Description |
283
+ |------|-------------|
284
+ | `create_repository` | Create a new GitHub repository |
285
+ | `create_or_update_file` | Create or update a single file in a repo |
286
+ | `push_files` | Push multiple files in a single commit |
287
+ | `create_branch` | Create a new branch |
288
+ | `create_pull_request` | Open a PR |
289
+ | `search_repositories` | Search existing repos |
290
+ | `get_file_contents` | Read file from repo |
291
+ | `list_branches` | List branches |
292
+
293
+ #### Tool Calling Pattern via Copilot SDK
294
+
295
+ The Copilot SDK routes MCP tool calls automatically when MCP servers are configured. The flow is:
296
+
297
+ ```
298
+ ConversationSession.send(prompt)
299
+ → SDK resolves MCP servers from config
300
+ → LLM decides to call a tool (e.g., create_repository)
301
+ → SDK routes to GitHub MCP server
302
+ → Server executes against GitHub API
303
+ → Result returned as ToolResult event
304
+ ```
305
+
306
+ In sofIA's architecture, MCP tools are invoked **indirectly** — the LLM decides which tools to call based on the system prompt. The `developPocPrompt` would instruct the LLM to:
307
+
308
+ 1. Check if a repo already exists (or use local fallback)
309
+ 2. Create the repo with `create_repository`
310
+ 3. Push scaffold files with `push_files` or `create_or_update_file`
311
+ 4. Create a branch for the PoC work
312
+
313
+ #### Direct MCP Invocation (Alternative)
314
+
315
+ For more deterministic control, sofIA could call MCP tools directly without going through the LLM:
316
+
317
+ ```typescript
318
+ // Hypothetical direct MCP tool call via SDK
319
+ // The Copilot SDK's CopilotSession may expose tool invocation
320
+ const result = await sdkSession.invokeTool('create_repository', {
321
+ name: `sofia-poc-${sessionId}`,
322
+ description: 'PoC generated by sofIA workshop',
323
+ private: true,
324
+ auto_init: true,
325
+ });
326
+ ```
327
+
328
+ However, the current `@github/copilot-sdk` API uses `sendAndWait`, which routes through the LLM. Direct tool invocation would require using the MCP protocol directly (e.g., `@modelcontextprotocol/sdk`).
329
+
330
+ #### Availability Detection
331
+
332
+ ```typescript
333
+ // Check if GitHub MCP is available before attempting repo creation
334
+ const mcpManager = new McpManager(config);
335
+ const githubAvailable = mcpManager.isAvailable('github');
336
+
337
+ if (!githubAvailable) {
338
+ // Fall back to local scaffolding (D-003)
339
+ return scaffoldLocally(session, pocDir);
340
+ }
341
+ ```
342
+
343
+ ### Decision
344
+
345
+ Use **LLM-mediated MCP tool calls** for GitHub repo creation (the LLM decides when/how to call GitHub MCP tools based on the develop prompt). Add explicit availability detection via `McpManager.isAvailable('github')` to enable graceful fallback to local scaffolding. Do NOT attempt direct MCP protocol calls — keep the architecture aligned with how the Copilot SDK works.
346
+
347
+ ### Rationale
348
+
349
+ - The Copilot SDK already handles MCP routing; adding a parallel MCP client adds complexity
350
+ - LLM-mediated calls allow the model to adapt to errors (e.g., repo already exists, permission denied)
351
+ - Graceful fallback to local scaffolding (D-003) ensures the feature works without GitHub MCP
352
+ - The `McpManager` already has the detection infrastructure
353
+
354
+ ### Alternatives Considered
355
+
356
+ 1. **Direct MCP protocol client** (`@modelcontextprotocol/sdk`) — Rejected: adds a dependency, duplicates SDK functionality, and the control flow becomes harder to test
357
+ 2. **GitHub REST API directly** — Rejected: requires separate auth, loses MCP abstraction, doesn't benefit from SDK's tool routing
358
+ 3. **GitHub CLI (`gh repo create`)** — Rejected: requires `gh` installed, additional auth setup, not composable
359
+
360
+ ---
361
+
362
+ ## Topic 4: Local Filesystem PoC Scaffolding
363
+
364
+ ### Findings
365
+
366
+ #### File Tree Generation Pattern
367
+
368
+ Recommended approach: **Programmatic generation from in-memory template descriptors**, not template engines.
369
+
370
+ ```typescript
371
+ interface ScaffoldFile {
372
+ relativePath: string;
373
+ content: string | ((ctx: ScaffoldContext) => string);
374
+ }
375
+
376
+ interface ScaffoldContext {
377
+ projectName: string;
378
+ sessionId: string;
379
+ description: string;
380
+ techStack: string;
381
+ architectureNotes?: string;
382
+ }
383
+
384
+ const SCAFFOLD_FILES: ScaffoldFile[] = [
385
+ {
386
+ relativePath: 'package.json',
387
+ content: (ctx) => JSON.stringify({
388
+ name: ctx.projectName,
389
+ version: '0.1.0',
390
+ scripts: { test: 'vitest run', build: 'tsc' },
391
+ }, null, 2),
392
+ },
393
+ {
394
+ relativePath: 'README.md',
395
+ content: (ctx) => `# ${ctx.projectName}\n\n${ctx.description}\n`,
396
+ },
397
+ {
398
+ relativePath: 'tsconfig.json',
399
+ content: JSON.stringify({
400
+ compilerOptions: { target: 'ES2022', module: 'nodenext', strict: true, outDir: 'dist' },
401
+ include: ['src'],
402
+ }, null, 2),
403
+ },
404
+ {
405
+ relativePath: 'src/index.ts',
406
+ content: '// Entry point — generated by sofIA\n',
407
+ },
408
+ {
409
+ relativePath: 'tests/smoke.test.ts',
410
+ content: (ctx) => `import { describe, it, expect } from 'vitest';\n\ndescribe('${ctx.projectName}', () => {\n it('should be truthy', () => {\n expect(true).toBe(true);\n });\n});\n`,
411
+ },
412
+ ];
413
+ ```
414
+
415
+ #### Idempotency Strategy
416
+
417
+ ```typescript
418
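+ import { access, mkdir, writeFile } from 'node:fs/promises';
+ import { dirname, join } from 'node:path';
+
+ async function scaffold(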
419
+ outputDir: string,
420
+ files: ScaffoldFile[],
421
+ ctx: ScaffoldContext,
422
+ options: { overwrite?: boolean } = {},
423
+ ): Promise<string[]> {
424
+ const written: string[] = [];
425
+
426
+ await mkdir(outputDir, { recursive: true });
427
+
428
+ for (const file of files) {
429
+ const fullPath = join(outputDir, file.relativePath);
430
+ const dir = dirname(fullPath);
431
+ await mkdir(dir, { recursive: true });
432
+
433
+ // Idempotency: skip existing files unless overwrite is true
434
+ if (!options.overwrite) {
435
+ try {
436
+ await access(fullPath);
437
+ continue; // File exists, skip
438
+ } catch {
439
+ // File doesn't exist, proceed
440
+ }
441
+ }
442
+
443
+ const content = typeof file.content === 'function'
444
+ ? file.content(ctx)
445
+ : file.content;
446
+
447
+ await writeFile(fullPath, content, 'utf-8');
448
+ written.push(file.relativePath);
449
+ }
450
+
451
+ return written;
452
+ }
453
+ ```
454
+
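The check-then-write sequence above leaves a small window between `access` and `writeFile`. One way to close it, sketched here under the assumption that skip-existing is the only mode needed, is to let the file system enforce exclusivity with the `wx` open flag. `writeIfAbsent` is a hypothetical helper name, and the synchronous API is used only to keep the example short.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Writes `content` only if `fullPath` does not exist yet.
// Returns true when the file was created, false when it was already there.
function writeIfAbsent(fullPath: string, content: string): boolean {
  mkdirSync(dirname(fullPath), { recursive: true });
  try {
    // 'wx' opens for writing and fails with EEXIST if the path already exists.
    writeFileSync(fullPath, content, { encoding: 'utf-8', flag: 'wx' });
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === 'EEXIST') return false;
    throw err; // permission errors and the like should still surface
  }
}

// Usage: the second call is a no-op because the file already exists.
const demoDir = mkdtempSync(join(tmpdir(), 'scaffold-demo-'));
const demoFile = join(demoDir, 'src', 'index.ts');
const createdFirst = writeIfAbsent(demoFile, '// v1\n');
const createdSecond = writeIfAbsent(demoFile, '// v2\n');
const keptContent = readFileSync(demoFile, 'utf-8');
```

The asynchronous `writeFile` from `node:fs/promises` accepts the same `flag` option, so the pattern carries over to the `scaffold` function directly.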
455
+ #### Platform-Safe Path Handling
456
+
457
+ ```typescript
458
+ import { join, resolve, normalize, sep } from 'node:path';
459
+
460
+ // ALWAYS use path.join(), never string concatenation
461
+ const pocDir = join('.', 'poc', sessionId); // ✅
462
+ // const pocDir = `./poc/${sessionId}`; // ❌ hard-coded separator
463
+
464
+ // Normalize user-provided paths
465
+ const safePath = normalize(userPath);
466
+
467
+ // Prevent path traversal; compare against base + sep so that a sibling such as "/poc/abc-evil" is rejected for base "/poc/abc"
468
+ function isSafePath(base: string, target: string): boolean {
469
+   const resolvedBase = resolve(base);
470
+   const resolvedTarget = resolve(base, target);
471
+   return resolvedTarget === resolvedBase || resolvedTarget.startsWith(resolvedBase + sep);
472
+ }
473
+ ```
474
+
475
+ ### Decision
476
+
477
+ Use **programmatic generation from typed template descriptors** (no template engine dependency). Implement idempotency via "skip existing files unless `--overwrite`" semantics. Use `node:path` functions exclusively for all path operations. Output directory: `./poc/<sessionId>/`.
478
+
479
+ ### Rationale
480
+
481
+ - Template descriptors are fully typed, testable, and don't require a runtime parser
482
+ - Skip-existing-files idempotency is simpler and safer than diff-and-merge
483
+ - `node:path` handles platform differences automatically
484
+ - Keeping scaffolds as code (not files on disk) avoids packaging/distribution issues
485
+
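The "typed and testable" point can be illustrated by rendering descriptors in memory, with no file system involved. This is a sketch: `renderScaffold` is a hypothetical helper, and the `ScaffoldContext` here is trimmed to two fields for brevity.

```typescript
interface ScaffoldContext {
  projectName: string;
  description: string;
}

interface ScaffoldFile {
  relativePath: string;
  content: string | ((ctx: ScaffoldContext) => string);
}

// Resolves every descriptor to its final text, keyed by relative path.
// Duplicate paths are rejected early instead of silently overwriting.
function renderScaffold(files: ScaffoldFile[], ctx: ScaffoldContext): Map<string, string> {
  const rendered = new Map<string, string>();
  for (const file of files) {
    if (rendered.has(file.relativePath)) {
      throw new Error(`Duplicate scaffold path: ${file.relativePath}`);
    }
    const content = typeof file.content === 'function' ? file.content(ctx) : file.content;
    rendered.set(file.relativePath, content);
  }
  return rendered;
}
```

A unit test can then assert on the rendered map directly, and the write step only has to iterate over its entries.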
486
+ ### Alternatives Considered
487
+
488
+ 1. **Template engine (Handlebars, EJS)** — Rejected: adds dependency, requires template files to ship, harder to type-check
489
+ 2. **Yeoman/Plop generators** — Rejected: heavy dependencies, CLI-centric design doesn't compose well
490
+ 3. **Copy directory tree from `templates/`** — Rejected: requires shipping template files, variable substitution still needed
491
+ 4. **Git clone template repo** — Partially viable for GitHub MCP path but adds network dependency
492
+
493
+ ---
494
+
495
+ ## Topic 5: Autonomous Loop vs Interactive Loop
496
+
497
+ ### Findings
498
+
499
+ The current `ConversationLoop` class is fundamentally **interactive**:
500
+ - It calls `this.io.readInput()` in a `while` loop waiting for user text
501
+ - It uses `DecisionGate` to ask the user what to do next
502
+ - It checks for `done` / empty input to break
503
+
504
+ An autonomous Ralph loop needs to:
505
+ - Supply its own "input" (the prompt + test failure context)
506
+ - Never block waiting for user input
507
+ - Terminate based on programmatic conditions (tests passing, max iterations)
508
+ - Still produce streaming output for visibility
509
+
510
+ #### Architecture Options Analysis
511
+
512
+ ##### Option A: Subclass ConversationLoop
513
+
514
+ ```typescript
515
+ class AutonomousLoop extends ConversationLoop {
516
+ override async run(): Promise<WorkshopSession> {
517
+ // Override the main loop behavior
518
+ }
519
+ }
520
+ ```
521
+ **Pros**: Reuses streaming/rendering code
522
+ **Cons**: `ConversationLoop.run()` is monolithic; overriding it means reimplementing most of the logic. Fragile inheritance.
523
+
524
+ ##### Option B: New standalone RalphLoop class
525
+
526
+ ```typescript
527
+ class RalphLoop {
528
+ constructor(private options: RalphLoopOptions) {}
529
+
530
+ async run(): Promise<PocDevelopmentState> {
531
+ while (iteration < maxIterations && !testsPass) {
532
+ const code = await this.generate(prompt, failures);
533
+ await this.writeFiles(code);
534
+ const results = await this.runTests();
535
+ failures = results.failures;
536
+ iteration++;
537
+ }
538
+ }
539
+ }
540
+ ```
541
+ **Pros**: Clean separation of concerns; purpose-built for the autonomous case
542
+ **Cons**: Duplicates streaming/rendering logic from ConversationLoop
543
+
544
+ ##### Option C: Parameterize ConversationLoop with a "driver"
545
+
546
+ ```typescript
547
+ interface LoopDriver {
548
+ getNextInput(session: WorkshopSession, lastResponse: string): Promise<string | null>;
549
+ shouldContinue(session: WorkshopSession): boolean;
550
+ }
551
+
552
+ class InteractiveDriver implements LoopDriver {
553
+ async getNextInput() { return this.io.readInput(); }
554
+ shouldContinue() { return true; } // User controls via "done"
555
+ }
556
+
557
+ class AutonomousDriver implements LoopDriver {
558
+ async getNextInput(session, lastResponse) {
559
+ const testResults = await this.runTests(session.poc.repoPath);
560
+ if (testResults.failed === 0) return null; // Signal done
561
+ return formatFailurePrompt(testResults);
562
+ }
563
+ shouldContinue(session) {
564
+ return session.poc.iterations.length < this.maxIterations;
565
+ }
566
+ }
567
+ ```
568
+ **Pros**: Open/Closed principle (new loop behaviors are added as drivers, with no further changes to ConversationLoop); drivers are easy to test independently
569
+ **Cons**: ConversationLoop needs refactoring to accept a driver; the streaming and turn-management code becomes shared
570
+
571
+ ##### Option D: Compose ConversationLoop as inner component
572
+
573
+ ```typescript
574
+ class RalphLoop {
575
+ async run(): Promise<PocDevelopmentState> {
576
+ for (let i = 0; i < maxIterations; i++) {
577
+ // Use ConversationLoop for a single LLM turn
578
+ const loop = new ConversationLoop({
579
+ client: this.client,
580
+ io: this.createAutoIO(prompt),
581
+ session: this.session,
582
+ phaseHandler: this.handler,
583
+ initialMessage: prompt,
584
+ });
585
+ this.session = await loop.run();
586
+
587
+ // Run tests
588
+ const results = await this.runTests();
589
+ if (results.failed === 0) break;
590
+ prompt = enrichPromptWithFailures(prompt, results);
591
+ }
592
+ }
593
+
594
+ private createAutoIO(prompt: string): LoopIO {
595
+ return {
596
+ write: (text) => this.outputHandler(text),
597
+ writeActivity: (text) => this.outputHandler(text),
598
+ readInput: async () => null, // Immediately signal "done"
599
+ showDecisionGate: async () => ({ choice: 'continue' }),
600
+ isJsonMode: false,
601
+ isTTY: false,
602
+ };
603
+ }
604
+ }
605
+ ```
606
+ **Pros**: Reuses ConversationLoop's streaming exactly; no modification to existing code; each LLM turn is isolated
607
+ **Cons**: Creates a new ConversationLoop per iteration (minor overhead); ConversationLoop does more than needed per call (signal handlers, etc.)
608
+
609
+ ### Decision
610
+
611
+ **Option D: Compose ConversationLoop as inner component** for the initial implementation, with a path to evolve toward Option C.
612
+
613
+ The `RalphLoop` class is the outer orchestrator. For each iteration, it creates a `ConversationLoop` with an auto-completing `LoopIO` (returns `null` from `readInput` immediately after the initial message is sent) and uses `initialMessage` to inject the prompt. This approach:
614
+
615
+ 1. Reuses all existing streaming/rendering infrastructure
616
+ 2. Requires zero changes to `ConversationLoop`
617
+ 3. Each iteration is isolated (clean session state handoff)
618
+ 4. The auto-completing `LoopIO` is trivially testable
619
+
620
+ The `RalphLoop.run()` method owns the outer iteration, test execution, and termination logic.
621
+
622
+ ### Rationale
623
+
624
+ - Minimizes risk: `ConversationLoop` is battle-tested and unchanged
625
+ - The `LoopIO` mock pattern is simple: `readInput: async () => null`
626
+ - Each iteration starts a fresh LLM session, which reduces the risk of context window overflow
627
+ - The composition pattern naturally supports the spec's requirement for multiple iteration records
628
+
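The outer orchestration that `RalphLoop.run()` owns can be sketched with the LLM turn and the test run injected as functions, which keeps the termination logic testable without a real `ConversationLoop`. The names `runRalphLoop`, `RalphLoopDeps`, `runTurn`, and `shouldAbort` are hypothetical; in the composed design, `runTurn` would wrap one `ConversationLoop` run with the auto-completing `LoopIO`.

```typescript
interface Failure {
  name: string;
  message: string;
}

interface IterationResult {
  failed: number;
  failures: Failure[];
}

// Hypothetical seams: one LLM turn and one test run per iteration.
interface RalphLoopDeps {
  runTurn(prompt: string): Promise<void>;
  runTests(): Promise<IterationResult>;
  shouldAbort?(): boolean;
}

type TerminationReason = 'tests-passing' | 'max-iterations' | 'user-abort';

async function runRalphLoop(
  basePrompt: string,
  maxIterations: number,
  deps: RalphLoopDeps,
): Promise<{ iterations: number; reason: TerminationReason }> {
  let prompt = basePrompt;

  for (let i = 1; i <= maxIterations; i++) {
    // Check for a user abort before starting another (potentially costly) turn.
    if (deps.shouldAbort?.()) return { iterations: i - 1, reason: 'user-abort' };

    await deps.runTurn(prompt);
    const results = await deps.runTests();
    if (results.failed === 0) return { iterations: i, reason: 'tests-passing' };

    // Feed this iteration's failures into the next prompt.
    const summary = results.failures.map((f) => `- ${f.name}: ${f.message}`).join('\n');
    prompt = `${basePrompt}\n\nFailing tests from iteration ${i}:\n${summary}`;
  }

  return { iterations: maxIterations, reason: 'max-iterations' };
}
```

Stuck detection and iteration recording are left out to keep the sketch short; both would slot in after the test run.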
629
+ ### Alternatives Considered
630
+
631
+ See Options A–C above. Option C (driver pattern) is the best long-term architecture but requires refactoring `ConversationLoop.run()`, which is out of scope for feature 002's initial implementation.
632
+
633
+ ---
634
+
635
+ ## Topic 6: PocDevelopmentState Schema Extensions
636
+
637
+ ### Findings
638
+
639
+ The current schema is minimal:
640
+
641
+ ```typescript
642
+ // Current (from session.ts)
643
+ export const pocIterationSchema = z.object({
644
+ iteration: z.number(),
645
+ startedAt: z.string(),
646
+ endedAt: z.string().optional(),
647
+ changesSummary: z.string().optional(),
648
+ testsRun: z.array(z.string()).optional(),
649
+ });
650
+
651
+ export const pocDevelopmentStateSchema = z.object({
652
+ repoPath: z.string().optional(),
653
+ iterations: z.array(pocIterationSchema),
654
+ finalStatus: z.enum(['success', 'failed']).optional(),
655
+ });
656
+ ```
657
+
658
+ This is insufficient for a working Ralph loop. The following extensions are needed:
659
+
660
+ #### Per-Iteration Extensions
661
+
662
+ ```typescript
663
+ export const testResultSchema = z.object({
664
+ passed: z.number(),
665
+ failed: z.number(),
666
+ skipped: z.number(),
667
+ duration: z.number(), // milliseconds
668
+ failures: z.array(z.object({
669
+ testName: z.string(),
670
+ message: z.string(),
671
+ stack: z.string().optional(),
672
+ })),
673
+ });
674
+
675
+ export const pocIterationSchema = z.object({
676
+ iteration: z.number(),
677
+ startedAt: z.string(), // ISO-8601
678
+ endedAt: z.string().optional(), // ISO-8601
679
+ changesSummary: z.string().optional(),
680
+
681
+ // NEW: Structured test results
682
+ testResults: testResultSchema.optional(),
683
+
684
+ // NEW: Files touched in this iteration
685
+ filesChanged: z.array(z.string()).optional(), // relative paths
686
+
687
+ // NEW: Prompt context tracking (for audit)
688
+ promptTokensUsed: z.number().optional(),
689
+ responseTokensUsed: z.number().optional(),
690
+
691
+ // NEW: Iteration outcome classification
692
+ outcome: z.enum([
693
+ 'tests-passing', // All tests pass — can terminate
694
+ 'tests-improving', // Fewer failures than previous iteration
695
+ 'tests-regressing', // More failures than previous iteration
696
+ 'tests-stuck', // Same failures as previous iteration
697
+ 'error', // Runtime error (test runner crash, timeout)
698
+ ]).optional(),
699
+
700
+ // DEPRECATED: replaced by testResults
701
+ testsRun: z.array(z.string()).optional(),
702
+ });
703
+ ```
704
+
705
+ #### Overall State Extensions
706
+
707
+ ```typescript
708
+ export const pocDevelopmentStateSchema = z.object({
709
+ repoPath: z.string().optional(), // local path or GitHub URL
710
+ iterations: z.array(pocIterationSchema),
711
+ finalStatus: z.enum(['success', 'failed', 'partial', 'aborted']).optional(),
712
+
713
+ // NEW: Technology context
714
+ techStack: z.string().optional(), // e.g., "Node.js + TypeScript + Express"
715
+ templateUsed: z.string().optional(), // e.g., "node-ts-api"
716
+
717
+ // NEW: Timing
718
+ totalDuration: z.number().optional(), // total ms across all iterations
719
+
720
+ // NEW: Configuration used
721
+ maxIterations: z.number().optional(), // configured limit
722
+ testCommand: z.string().optional(), // e.g., "npm test"
723
+
724
+ // NEW: Source tracking
725
+ repoSource: z.enum(['github-mcp', 'local', 'existing']).optional(),
726
+
727
+ // NEW: Termination reason
728
+ terminationReason: z.enum([
729
+ 'tests-passing',
730
+ 'max-iterations',
731
+ 'user-abort',
732
+ 'stuck-detected', // same failures for N consecutive iterations
733
+ 'error',
734
+ ]).optional(),
735
+
736
+ // NEW: Summary for export/audit
737
+ finalTestResults: testResultSchema.optional(),
738
+ });
739
+ ```
740
+
741
+ #### Audit Trail Compliance
742
+
743
+ The schema supports audit requirements through:
744
+
745
+ 1. **Per-iteration `testResults`** — exact pass/fail counts and failure messages recorded
746
+ 2. **`outcome` classification** — machine-readable iteration assessment
747
+ 3. **`filesChanged`** — what was modified (without storing full diffs, which could be large)
748
+ 4. **`terminationReason`** — why the loop stopped
749
+ 5. **Token usage** — cost tracking per iteration
750
+ 6. **Timestamps** — `startedAt`/`endedAt` on each iteration plus `totalDuration`
751
+
752
+ What we deliberately **exclude** from the schema (stored elsewhere or not at all):
753
+ - Full file contents (too large for JSON state; stored on disk)
754
+ - Full LLM conversation history (already in `turns[]`)
755
+ - Secrets/tokens (security policy)
756
+
757
+ ### Decision
758
+
759
+ Extend `PocDevelopmentState` and `PocIteration` as described above. Add the new `TestResult` schema. Expand `finalStatus` to include `'partial'` and `'aborted'`. Add `terminationReason`, `repoSource`, `techStack`, `templateUsed`, `totalDuration`, `maxIterations`, `testCommand`, and `finalTestResults` to the state. Add `testResults`, `filesChanged`, `promptTokensUsed`, `responseTokensUsed`, and `outcome` to iterations. Keep `testsRun` for backward compatibility but mark as deprecated.
760
+
761
+ ### Rationale
762
+
763
+ - Structured `TestResult` enables the Ralph loop to programmatically compare iterations and detect stuck states
764
+ - `outcome` classification enables the termination logic to be data-driven
765
+ - `terminationReason` + `repoSource` satisfy D-005 auditability requirements
766
+ - Token usage tracking enables cost monitoring for workshop facilitators
767
+ - Backward compatibility with existing `testsRun` field prevents breaking existing sessions
768
+
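How the `outcome` field might be derived from consecutive test results can be sketched as follows. This version compares failure counts only, which is a simplifying assumption: the schema comments describe "same failures", so a stricter implementation would compare failure identities as well. `classifyOutcome` is a hypothetical name, and the `'error'` outcome is left to the caller because it comes from the test runner crashing rather than from the results.

```typescript
type TestOutcome = 'tests-passing' | 'tests-improving' | 'tests-regressing' | 'tests-stuck';

// Classifies an iteration by comparing its failure count with the previous one.
// Returns undefined for a failing first iteration, since there is nothing to
// compare against (the schema field is optional).
function classifyOutcome(
  previousFailed: number | undefined,
  currentFailed: number,
): TestOutcome | undefined {
  if (currentFailed === 0) return 'tests-passing';
  if (previousFailed === undefined) return undefined;
  if (currentFailed < previousFailed) return 'tests-improving';
  if (currentFailed > previousFailed) return 'tests-regressing';
  return 'tests-stuck';
}
```

Because the result is a plain enum value, the termination logic can switch on it without re-reading raw test output.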
769
+ ### Alternatives Considered
770
+
771
+ 1. **Minimal extension (just add `testResults`)** — Rejected: insufficient for termination logic and audit trail
772
+ 2. **Separate `RalphLoopState` schema** — Rejected: the PoC state and Ralph loop state are the same thing; splitting adds indirection
773
+ 3. **Store full diffs per iteration** — Rejected: too large for JSON session files; incompatible with the lightweight session model
774
+
775
+ ---
776
+
777
+ ## Summary of Decisions
778
+
779
+ | # | Topic | Decision |
780
+ |---|-------|----------|
781
+ | 1 | Ralph Loop Pattern | Modified Ralph loop with test-failure injection; internal TypeScript loop, not external bash |
782
+ | 2 | Test Runner | `spawn` + Vitest `--reporter=json` + 60s timeout + belt-and-suspenders kill |
783
+ | 3 | GitHub MCP | LLM-mediated MCP tool calls with `McpManager` availability detection; local fallback |
784
+ | 4 | Local Scaffolding | Programmatic typed template descriptors; skip-existing idempotency; `node:path` for safety |
785
+ | 5 | Loop Architecture | Compose: `RalphLoop` owns iteration, uses `ConversationLoop` per turn with auto-completing IO |
786
+ | 6 | Schema Extensions | Full extension of `PocDevelopmentState` + `PocIteration` + new `TestResult` schema |