sofia-cli 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (435)
  1. package/.github/agents/copilot-instructions.md +39 -0
  2. package/.github/agents/speckit.analyze.agent.md +184 -0
  3. package/.github/agents/speckit.checklist.agent.md +294 -0
  4. package/.github/agents/speckit.clarify.agent.md +181 -0
  5. package/.github/agents/speckit.constitution.agent.md +84 -0
  6. package/.github/agents/speckit.implement.agent.md +135 -0
  7. package/.github/agents/speckit.plan.agent.md +90 -0
  8. package/.github/agents/speckit.specify.agent.md +258 -0
  9. package/.github/agents/speckit.tasks.agent.md +137 -0
  10. package/.github/agents/speckit.taskstoissues.agent.md +30 -0
  11. package/.github/copilot-instructions.md +257 -0
  12. package/.github/prompts/speckit.analyze.prompt.md +3 -0
  13. package/.github/prompts/speckit.checklist.prompt.md +3 -0
  14. package/.github/prompts/speckit.clarify.prompt.md +3 -0
  15. package/.github/prompts/speckit.constitution.prompt.md +3 -0
  16. package/.github/prompts/speckit.implement.prompt.md +3 -0
  17. package/.github/prompts/speckit.plan.prompt.md +3 -0
  18. package/.github/prompts/speckit.specify.prompt.md +3 -0
  19. package/.github/prompts/speckit.tasks.prompt.md +3 -0
  20. package/.github/prompts/speckit.taskstoissues.prompt.md +3 -0
  21. package/.github/workflows/ci.yml +38 -0
  22. package/.prettierrc +6 -0
  23. package/.specify/memory/constitution.md +181 -0
  24. package/.specify/scripts/bash/check-prerequisites.sh +166 -0
  25. package/.specify/scripts/bash/common.sh +156 -0
  26. package/.specify/scripts/bash/create-new-feature.sh +297 -0
  27. package/.specify/scripts/bash/setup-plan.sh +61 -0
  28. package/.specify/scripts/bash/update-agent-context.sh +810 -0
  29. package/.specify/templates/agent-file-template.md +28 -0
  30. package/.specify/templates/checklist-template.md +40 -0
  31. package/.specify/templates/constitution-template.md +50 -0
  32. package/.specify/templates/plan-template.md +113 -0
  33. package/.specify/templates/spec-template.md +115 -0
  34. package/.specify/templates/tasks-template.md +251 -0
  35. package/.vscode/mcp.json +42 -0
  36. package/.vscode/settings.json +19 -0
  37. package/CODE_OF_CONDUCT.md +128 -0
  38. package/LICENSE +21 -0
  39. package/README.md +213 -0
  40. package/dist/src/cli/developCommand.js +240 -0
  41. package/dist/src/cli/directCommands.js +143 -0
  42. package/dist/src/cli/envLoader.js +16 -0
  43. package/dist/src/cli/exportCommand.js +53 -0
  44. package/dist/src/cli/index.js +203 -0
  45. package/dist/src/cli/ioContext.js +109 -0
  46. package/dist/src/cli/preflight.js +57 -0
  47. package/dist/src/cli/statusCommand.js +110 -0
  48. package/dist/src/cli/workshopCommand.js +400 -0
  49. package/dist/src/develop/checkpointState.js +86 -0
  50. package/dist/src/develop/codeGenerator.js +319 -0
  51. package/dist/src/develop/dynamicScaffolder.js +226 -0
  52. package/dist/src/develop/githubMcpAdapter.js +122 -0
  53. package/dist/src/develop/index.js +15 -0
  54. package/dist/src/develop/mcpContextEnricher.js +195 -0
  55. package/dist/src/develop/pocScaffolder.js +542 -0
  56. package/dist/src/develop/ralphLoop.js +659 -0
  57. package/dist/src/develop/templateRegistry.js +364 -0
  58. package/dist/src/develop/testRunner.js +202 -0
  59. package/dist/src/logging/logger.js +58 -0
  60. package/dist/src/loop/conversationLoop.js +227 -0
  61. package/dist/src/loop/phaseSummarizer.js +87 -0
  62. package/dist/src/mcp/mcpManager.js +267 -0
  63. package/dist/src/mcp/mcpTransport.js +391 -0
  64. package/dist/src/mcp/retryPolicy.js +47 -0
  65. package/dist/src/mcp/webSearch.js +254 -0
  66. package/dist/src/phases/contextSummarizer.js +101 -0
  67. package/dist/src/phases/discoveryEnricher.js +156 -0
  68. package/dist/src/phases/phaseExtractors.js +222 -0
  69. package/dist/src/phases/phaseHandlers.js +328 -0
  70. package/dist/src/prompts/design.md +51 -0
  71. package/dist/src/prompts/develop-boundary.md +51 -0
  72. package/dist/src/prompts/develop.md +111 -0
  73. package/dist/src/prompts/discover.md +58 -0
  74. package/dist/src/prompts/ideate.md +56 -0
  75. package/dist/src/prompts/plan.md +51 -0
  76. package/dist/src/prompts/promptLoader.js +167 -0
  77. package/dist/src/prompts/promptLoader.ts +198 -0
  78. package/dist/src/prompts/select.md +47 -0
  79. package/dist/src/prompts/summarize/README.md +8 -0
  80. package/dist/src/prompts/summarize/design-summary.md +37 -0
  81. package/dist/src/prompts/summarize/develop-summary.md +25 -0
  82. package/dist/src/prompts/summarize/ideate-summary.md +27 -0
  83. package/dist/src/prompts/summarize/plan-summary.md +27 -0
  84. package/dist/src/prompts/summarize/select-summary.md +21 -0
  85. package/dist/src/prompts/system.md +28 -0
  86. package/dist/src/sessions/exportPaths.js +22 -0
  87. package/dist/src/sessions/exportWriter.js +406 -0
  88. package/dist/src/sessions/sessionManager.js +81 -0
  89. package/dist/src/sessions/sessionStore.js +65 -0
  90. package/dist/src/shared/activitySpinner.js +91 -0
  91. package/dist/src/shared/copilotClient.js +129 -0
  92. package/dist/src/shared/data/cards.json +1249 -0
  93. package/dist/src/shared/data/cardsLoader.js +51 -0
  94. package/dist/src/shared/errorClassifier.js +120 -0
  95. package/dist/src/shared/events.js +28 -0
  96. package/dist/src/shared/markdownRenderer.js +34 -0
  97. package/dist/src/shared/schemas/session.js +265 -0
  98. package/dist/src/shared/tableRenderer.js +20 -0
  99. package/dist/src/vendor/chalk.js +2 -0
  100. package/dist/src/vendor/cli-table3.js +3 -0
  101. package/dist/src/vendor/commander.js +2 -0
  102. package/dist/src/vendor/marked-terminal.js +3 -0
  103. package/dist/src/vendor/marked.js +2 -0
  104. package/dist/src/vendor/ora.js +2 -0
  105. package/dist/src/vendor/pino.js +2 -0
  106. package/dist/src/vendor/zod.js +2 -0
  107. package/dist/tests/e2e/developE2e.spec.js +126 -0
  108. package/dist/tests/e2e/developFailureE2e.spec.js +247 -0
  109. package/dist/tests/e2e/developPty.spec.js +75 -0
  110. package/dist/tests/e2e/discoveryWebSearchRelevance.spec.js +84 -0
  111. package/dist/tests/e2e/harness.spec.js +83 -0
  112. package/dist/tests/e2e/mcpLive.spec.js +120 -0
  113. package/dist/tests/e2e/newSession.e2e.spec.js +177 -0
  114. package/dist/tests/e2e/ralphLoopEnrichmentComparison.spec.js +62 -0
  115. package/dist/tests/e2e/workiqEnrichment.spec.js +56 -0
  116. package/dist/tests/e2e/zavaSimulation.spec.js +452 -0
  117. package/dist/tests/fixtures/test-fixture-project/src/add.js +3 -0
  118. package/dist/tests/fixtures/test-fixture-project/tests/failing.test.js +6 -0
  119. package/dist/tests/fixtures/test-fixture-project/tests/hanging.test.js +8 -0
  120. package/dist/tests/fixtures/test-fixture-project/tests/passing.test.js +10 -0
  121. package/dist/tests/fixtures/test-fixture-project/vitest.config.js +6 -0
  122. package/dist/tests/integration/autoStartConversation.spec.js +138 -0
  123. package/dist/tests/integration/defaultCommand.spec.js +147 -0
  124. package/dist/tests/integration/directCommandNonTty.spec.js +224 -0
  125. package/dist/tests/integration/directCommandTty.spec.js +151 -0
  126. package/dist/tests/integration/discoveryEnrichmentFlow.spec.js +175 -0
  127. package/dist/tests/integration/exportArtifacts.spec.js +202 -0
  128. package/dist/tests/integration/exportFallbackFlow.spec.js +99 -0
  129. package/dist/tests/integration/mcpDegradationFlow.spec.js +190 -0
  130. package/dist/tests/integration/mcpTransportFlow.spec.js +139 -0
  131. package/dist/tests/integration/newSessionFlow.spec.js +343 -0
  132. package/dist/tests/integration/pocGithubMcp.spec.js +186 -0
  133. package/dist/tests/integration/pocLocalFallback.spec.js +171 -0
  134. package/dist/tests/integration/pocScaffold.spec.js +163 -0
  135. package/dist/tests/integration/ralphLoopFlow.spec.js +359 -0
  136. package/dist/tests/integration/ralphLoopPartial.spec.js +368 -0
  137. package/dist/tests/integration/resumeAndBacktrack.spec.js +247 -0
  138. package/dist/tests/integration/spinnerLifecycle.spec.js +220 -0
  139. package/dist/tests/integration/summarizationFlow.spec.js +115 -0
  140. package/dist/tests/integration/testRunnerReal.spec.js +52 -0
  141. package/dist/tests/integration/webSearchAgent.spec.js +128 -0
  142. package/dist/tests/live/copilotSdkLive.spec.js +107 -0
  143. package/dist/tests/live/zavaFullWorkshop.spec.js +392 -0
  144. package/dist/tests/setup/loadEnv.js +3 -0
  145. package/dist/tests/unit/cli/developCommand.spec.js +567 -0
  146. package/dist/tests/unit/cli/directCommands.spec.js +279 -0
  147. package/dist/tests/unit/cli/envLoader.spec.js +58 -0
  148. package/dist/tests/unit/cli/ioContext.spec.js +119 -0
  149. package/dist/tests/unit/cli/preflight.spec.js +108 -0
  150. package/dist/tests/unit/cli/statusCommand.spec.js +111 -0
  151. package/dist/tests/unit/cli/workshopClientFallback.spec.js +80 -0
  152. package/dist/tests/unit/cli/workshopCommand.spec.js +329 -0
  153. package/dist/tests/unit/config/vitestEnvSetup.spec.js +13 -0
  154. package/dist/tests/unit/develop/checkpointState.spec.js +315 -0
  155. package/dist/tests/unit/develop/codeGenerator.spec.js +355 -0
  156. package/dist/tests/unit/develop/githubMcpAdapter.spec.js +231 -0
  157. package/dist/tests/unit/develop/mcpContextEnricher.spec.js +433 -0
  158. package/dist/tests/unit/develop/outputValidator.spec.js +119 -0
  159. package/dist/tests/unit/develop/pocScaffolder.spec.js +353 -0
  160. package/dist/tests/unit/develop/ralphLoop.spec.js +1248 -0
  161. package/dist/tests/unit/develop/templateRegistry.spec.js +85 -0
  162. package/dist/tests/unit/develop/testRunner.spec.js +249 -0
  163. package/dist/tests/unit/infraBicep.spec.js +92 -0
  164. package/dist/tests/unit/infraDeploy.spec.js +82 -0
  165. package/dist/tests/unit/infraTeardown.spec.js +63 -0
  166. package/dist/tests/unit/logging/logger.spec.js +43 -0
  167. package/dist/tests/unit/loop/conversationLoop.spec.js +592 -0
  168. package/dist/tests/unit/loop/phaseSummarizer.spec.js +141 -0
  169. package/dist/tests/unit/loop/streamingMarkdown.spec.js +147 -0
  170. package/dist/tests/unit/mcp/mcpManager.spec.js +279 -0
  171. package/dist/tests/unit/mcp/mcpTransport.spec.js +529 -0
  172. package/dist/tests/unit/mcp/retryPolicy.spec.js +218 -0
  173. package/dist/tests/unit/mcp/timeoutValidation.spec.js +46 -0
  174. package/dist/tests/unit/mcp/webSearch.spec.js +567 -0
  175. package/dist/tests/unit/phases/contextSummarizer.spec.js +140 -0
  176. package/dist/tests/unit/phases/discoveryEnricher.repeatCalls.spec.js +93 -0
  177. package/dist/tests/unit/phases/discoveryEnricher.spec.js +411 -0
  178. package/dist/tests/unit/phases/phaseExtractors.spec.js +352 -0
  179. package/dist/tests/unit/phases/phaseHandlers.spec.js +425 -0
  180. package/dist/tests/unit/prompts/promptLoader.spec.js +118 -0
  181. package/dist/tests/unit/schemas/pocSchemas.spec.js +412 -0
  182. package/dist/tests/unit/schemas/session.spec.js +257 -0
  183. package/dist/tests/unit/sessions/exportPaths.spec.js +31 -0
  184. package/dist/tests/unit/sessions/exportWriter.spec.js +655 -0
  185. package/dist/tests/unit/sessions/sessionManager.spec.js +151 -0
  186. package/dist/tests/unit/sessions/sessionStore.spec.js +116 -0
  187. package/dist/tests/unit/shared/activitySpinner.spec.js +175 -0
  188. package/dist/tests/unit/shared/cardsLoader.spec.js +76 -0
  189. package/dist/tests/unit/shared/copilotClient.spec.js +155 -0
  190. package/dist/tests/unit/shared/errorClassifier.spec.js +131 -0
  191. package/dist/tests/unit/shared/events.spec.js +55 -0
  192. package/dist/tests/unit/shared/markdownRenderer.spec.js +35 -0
  193. package/dist/tests/unit/shared/markdownRendererChunks.spec.js +70 -0
  194. package/dist/tests/unit/shared/tableRenderer.spec.js +34 -0
  195. package/dist/vitest.config.js +14 -0
  196. package/dist/vitest.live.config.js +18 -0
  197. package/docs/README.md +35 -0
  198. package/docs/architecture.md +169 -0
  199. package/docs/cli-usage.md +207 -0
  200. package/docs/environment.md +66 -0
  201. package/docs/export-format.md +146 -0
  202. package/docs/session-model.md +113 -0
  203. package/eslint.config.js +35 -0
  204. package/infra/deploy.sh +193 -0
  205. package/infra/gather-env.sh +211 -0
  206. package/infra/main.bicep +90 -0
  207. package/infra/main.bicepparam +18 -0
  208. package/infra/resources.bicep +134 -0
  209. package/infra/teardown.sh +114 -0
  210. package/package.json +63 -0
  211. package/specs/001-cli-workshop-rebuild/checklists/requirements.md +35 -0
  212. package/specs/001-cli-workshop-rebuild/contracts/cli.md +59 -0
  213. package/specs/001-cli-workshop-rebuild/contracts/export-summary-json.md +23 -0
  214. package/specs/001-cli-workshop-rebuild/contracts/session-json.md +30 -0
  215. package/specs/001-cli-workshop-rebuild/data-model.md +210 -0
  216. package/specs/001-cli-workshop-rebuild/plan.md +361 -0
  217. package/specs/001-cli-workshop-rebuild/quickstart.md +83 -0
  218. package/specs/001-cli-workshop-rebuild/research.md +116 -0
  219. package/specs/001-cli-workshop-rebuild/spec.md +240 -0
  220. package/specs/001-cli-workshop-rebuild/tasks.md +476 -0
  221. package/specs/002-poc-generation/contracts/poc-output.md +172 -0
  222. package/specs/002-poc-generation/contracts/ralph-loop.md +113 -0
  223. package/specs/002-poc-generation/data-model.md +172 -0
  224. package/specs/002-poc-generation/plan.md +109 -0
  225. package/specs/002-poc-generation/quickstart.md +97 -0
  226. package/specs/002-poc-generation/research.md +786 -0
  227. package/specs/002-poc-generation/spec.md +81 -0
  228. package/specs/002-poc-generation/tasks-fix.md +198 -0
  229. package/specs/002-poc-generation/tasks.md +252 -0
  230. package/specs/003-mcp-transport-integration/checklists/requirements.md +37 -0
  231. package/specs/003-mcp-transport-integration/contracts/context-enricher.md +220 -0
  232. package/specs/003-mcp-transport-integration/contracts/discovery-enricher.md +267 -0
  233. package/specs/003-mcp-transport-integration/contracts/github-adapter.md +149 -0
  234. package/specs/003-mcp-transport-integration/contracts/mcp-transport.md +288 -0
  235. package/specs/003-mcp-transport-integration/data-model.md +326 -0
  236. package/specs/003-mcp-transport-integration/plan.md +114 -0
  237. package/specs/003-mcp-transport-integration/quickstart.md +311 -0
  238. package/specs/003-mcp-transport-integration/research.md +395 -0
  239. package/specs/003-mcp-transport-integration/spec.md +234 -0
  240. package/specs/003-mcp-transport-integration/tasks.md +324 -0
  241. package/specs/003-next-spec-gaps.md +150 -0
  242. package/specs/004-dev-resume-hardening/checklists/requirements.md +37 -0
  243. package/specs/004-dev-resume-hardening/contracts/cli.md +160 -0
  244. package/specs/004-dev-resume-hardening/data-model.md +321 -0
  245. package/specs/004-dev-resume-hardening/plan.md +107 -0
  246. package/specs/004-dev-resume-hardening/quickstart.md +115 -0
  247. package/specs/004-dev-resume-hardening/research.md +142 -0
  248. package/specs/004-dev-resume-hardening/spec.md +221 -0
  249. package/specs/004-dev-resume-hardening/tasks.md +333 -0
  250. package/specs/005-ai-search-deploy/checklists/requirements.md +39 -0
  251. package/specs/005-ai-search-deploy/contracts/web-search-tool.md +241 -0
  252. package/specs/005-ai-search-deploy/data-model.md +130 -0
  253. package/specs/005-ai-search-deploy/plan.md +93 -0
  254. package/specs/005-ai-search-deploy/quickstart.md +96 -0
  255. package/specs/005-ai-search-deploy/research.md +187 -0
  256. package/specs/005-ai-search-deploy/spec.md +143 -0
  257. package/specs/005-ai-search-deploy/tasks.md +284 -0
  258. package/specs/006-workshop-extraction-fixes/checklists/requirements.md +61 -0
  259. package/specs/006-workshop-extraction-fixes/contracts/summarization-and-export.md +131 -0
  260. package/specs/006-workshop-extraction-fixes/data-model.md +149 -0
  261. package/specs/006-workshop-extraction-fixes/plan.md +123 -0
  262. package/specs/006-workshop-extraction-fixes/quickstart.md +101 -0
  263. package/specs/006-workshop-extraction-fixes/research.md +143 -0
  264. package/specs/006-workshop-extraction-fixes/spec.md +210 -0
  265. package/specs/006-workshop-extraction-fixes/tasks.md +316 -0
  266. package/src/cli/developCommand.ts +308 -0
  267. package/src/cli/directCommands.ts +195 -0
  268. package/src/cli/envLoader.ts +17 -0
  269. package/src/cli/exportCommand.ts +65 -0
  270. package/src/cli/index.ts +249 -0
  271. package/src/cli/ioContext.ts +139 -0
  272. package/src/cli/preflight.ts +86 -0
  273. package/src/cli/statusCommand.ts +118 -0
  274. package/src/cli/workshopCommand.ts +496 -0
  275. package/src/develop/checkpointState.ts +121 -0
  276. package/src/develop/codeGenerator.ts +402 -0
  277. package/src/develop/dynamicScaffolder.ts +284 -0
  278. package/src/develop/githubMcpAdapter.ts +199 -0
  279. package/src/develop/index.ts +34 -0
  280. package/src/develop/mcpContextEnricher.ts +279 -0
  281. package/src/develop/pocScaffolder.ts +646 -0
  282. package/src/develop/ralphLoop.ts +1044 -0
  283. package/src/develop/templateRegistry.ts +427 -0
  284. package/src/develop/testRunner.ts +276 -0
  285. package/src/logging/logger.ts +73 -0
  286. package/src/loop/conversationLoop.ts +355 -0
  287. package/src/loop/phaseSummarizer.ts +114 -0
  288. package/src/mcp/mcpManager.ts +365 -0
  289. package/src/mcp/mcpTransport.ts +562 -0
  290. package/src/mcp/retryPolicy.ts +87 -0
  291. package/src/mcp/webSearch.ts +388 -0
  292. package/src/originalPrompts/design_thinking.md +178 -0
  293. package/src/originalPrompts/design_thinking_persona.md +76 -0
  294. package/src/originalPrompts/document_generator_example.md +77 -0
  295. package/src/originalPrompts/document_generator_persona.md +47 -0
  296. package/src/originalPrompts/facilitator_persona.md +125 -0
  297. package/src/originalPrompts/guardrails.md +47 -0
  298. package/src/phases/contextSummarizer.ts +154 -0
  299. package/src/phases/discoveryEnricher.ts +223 -0
  300. package/src/phases/phaseExtractors.ts +247 -0
  301. package/src/phases/phaseHandlers.ts +450 -0
  302. package/src/prompts/design.md +51 -0
  303. package/src/prompts/develop-boundary.md +51 -0
  304. package/src/prompts/develop.md +111 -0
  305. package/src/prompts/discover.md +58 -0
  306. package/src/prompts/ideate.md +56 -0
  307. package/src/prompts/plan.md +51 -0
  308. package/src/prompts/promptLoader.ts +198 -0
  309. package/src/prompts/select.md +47 -0
  310. package/src/prompts/summarize/README.md +8 -0
  311. package/src/prompts/summarize/design-summary.md +37 -0
  312. package/src/prompts/summarize/develop-summary.md +25 -0
  313. package/src/prompts/summarize/ideate-summary.md +27 -0
  314. package/src/prompts/summarize/plan-summary.md +27 -0
  315. package/src/prompts/summarize/select-summary.md +21 -0
  316. package/src/prompts/system.md +28 -0
  317. package/src/sessions/exportPaths.ts +28 -0
  318. package/src/sessions/exportWriter.ts +490 -0
  319. package/src/sessions/sessionManager.ts +119 -0
  320. package/src/sessions/sessionStore.ts +69 -0
  321. package/src/shared/activitySpinner.ts +108 -0
  322. package/src/shared/copilotClient.ts +291 -0
  323. package/src/shared/data/cards.json +1249 -0
  324. package/src/shared/data/cardsLoader.ts +70 -0
  325. package/src/shared/errorClassifier.ts +160 -0
  326. package/src/shared/events.ts +103 -0
  327. package/src/shared/markdownRenderer.ts +44 -0
  328. package/src/shared/schemas/session.ts +346 -0
  329. package/src/shared/tableRenderer.ts +28 -0
  330. package/src/types/marked-terminal.d.ts +5 -0
  331. package/src/vendor/chalk.ts +2 -0
  332. package/src/vendor/cli-table3.ts +3 -0
  333. package/src/vendor/commander.ts +2 -0
  334. package/src/vendor/marked-terminal.ts +3 -0
  335. package/src/vendor/marked.ts +2 -0
  336. package/src/vendor/ora.ts +2 -0
  337. package/src/vendor/pino.ts +3 -0
  338. package/src/vendor/zod.ts +3 -0
  339. package/tests/e2e/developE2e.spec.ts +152 -0
  340. package/tests/e2e/developFailureE2e.spec.ts +289 -0
  341. package/tests/e2e/developPty.spec.ts +86 -0
  342. package/tests/e2e/discoveryWebSearchRelevance.spec.ts +103 -0
  343. package/tests/e2e/harness.spec.ts +104 -0
  344. package/tests/e2e/mcpLive.spec.ts +149 -0
  345. package/tests/e2e/newSession.e2e.spec.ts +245 -0
  346. package/tests/e2e/ralphLoopEnrichmentComparison.spec.ts +70 -0
  347. package/tests/e2e/workiqEnrichment.spec.ts +72 -0
  348. package/tests/e2e/zava-assessment/agent-interaction-script.md +258 -0
  349. package/tests/e2e/zava-assessment/company-profile.md +98 -0
  350. package/tests/e2e/zava-assessment/expected-results-checklist.md +454 -0
  351. package/tests/e2e/zavaSimulation.spec.ts +511 -0
  352. package/tests/fixtures/completedSession.json +141 -0
  353. package/tests/fixtures/test-fixture-project/package-lock.json +1585 -0
  354. package/tests/fixtures/test-fixture-project/package.json +12 -0
  355. package/tests/fixtures/test-fixture-project/src/add.ts +3 -0
  356. package/tests/fixtures/test-fixture-project/tests/failing.test.ts +7 -0
  357. package/tests/fixtures/test-fixture-project/tests/hanging.test.ts +9 -0
  358. package/tests/fixtures/test-fixture-project/tests/passing.test.ts +13 -0
  359. package/tests/fixtures/test-fixture-project/vitest.config.ts +7 -0
  360. package/tests/integration/autoStartConversation.spec.ts +168 -0
  361. package/tests/integration/defaultCommand.spec.ts +179 -0
  362. package/tests/integration/directCommandNonTty.spec.ts +260 -0
  363. package/tests/integration/directCommandTty.spec.ts +185 -0
  364. package/tests/integration/discoveryEnrichmentFlow.spec.ts +209 -0
  365. package/tests/integration/exportArtifacts.spec.ts +232 -0
  366. package/tests/integration/exportFallbackFlow.spec.ts +115 -0
  367. package/tests/integration/mcpDegradationFlow.spec.ts +231 -0
  368. package/tests/integration/mcpTransportFlow.spec.ts +178 -0
  369. package/tests/integration/newSessionFlow.spec.ts +406 -0
  370. package/tests/integration/pocGithubMcp.spec.ts +224 -0
  371. package/tests/integration/pocLocalFallback.spec.ts +205 -0
  372. package/tests/integration/pocScaffold.spec.ts +220 -0
  373. package/tests/integration/ralphLoopFlow.spec.ts +430 -0
  374. package/tests/integration/ralphLoopPartial.spec.ts +416 -0
  375. package/tests/integration/resumeAndBacktrack.spec.ts +278 -0
  376. package/tests/integration/spinnerLifecycle.spec.ts +270 -0
  377. package/tests/integration/summarizationFlow.spec.ts +135 -0
  378. package/tests/integration/testRunnerReal.spec.ts +63 -0
  379. package/tests/integration/webSearchAgent.spec.ts +155 -0
  380. package/tests/live/copilotSdkLive.spec.ts +149 -0
  381. package/tests/live/zavaFullWorkshop.spec.ts +515 -0
  382. package/tests/setup/loadEnv.ts +5 -0
  383. package/tests/unit/cli/developCommand.spec.ts +679 -0
  384. package/tests/unit/cli/directCommands.spec.ts +325 -0
  385. package/tests/unit/cli/envLoader.spec.ts +73 -0
  386. package/tests/unit/cli/ioContext.spec.ts +148 -0
  387. package/tests/unit/cli/preflight.spec.ts +125 -0
  388. package/tests/unit/cli/statusCommand.spec.ts +134 -0
  389. package/tests/unit/cli/workshopClientFallback.spec.ts +100 -0
  390. package/tests/unit/cli/workshopCommand.spec.ts +378 -0
  391. package/tests/unit/config/vitestEnvSetup.spec.ts +24 -0
  392. package/tests/unit/develop/checkpointState.spec.ts +378 -0
  393. package/tests/unit/develop/codeGenerator.spec.ts +447 -0
  394. package/tests/unit/develop/githubMcpAdapter.spec.ts +283 -0
  395. package/tests/unit/develop/mcpContextEnricher.spec.ts +564 -0
  396. package/tests/unit/develop/outputValidator.spec.ts +134 -0
  397. package/tests/unit/develop/pocScaffolder.spec.ts +451 -0
  398. package/tests/unit/develop/ralphLoop.spec.ts +1439 -0
  399. package/tests/unit/develop/templateRegistry.spec.ts +106 -0
  400. package/tests/unit/develop/testRunner.spec.ts +294 -0
  401. package/tests/unit/infraBicep.spec.ts +116 -0
  402. package/tests/unit/infraDeploy.spec.ts +102 -0
  403. package/tests/unit/infraTeardown.spec.ts +77 -0
  404. package/tests/unit/logging/logger.spec.ts +50 -0
  405. package/tests/unit/loop/conversationLoop.spec.ts +719 -0
  406. package/tests/unit/loop/phaseSummarizer.spec.ts +169 -0
  407. package/tests/unit/loop/streamingMarkdown.spec.ts +180 -0
  408. package/tests/unit/mcp/mcpManager.spec.ts +336 -0
  409. package/tests/unit/mcp/mcpTransport.spec.ts +689 -0
  410. package/tests/unit/mcp/retryPolicy.spec.ts +278 -0
  411. package/tests/unit/mcp/timeoutValidation.spec.ts +55 -0
  412. package/tests/unit/mcp/webSearch.spec.ts +718 -0
  413. package/tests/unit/phases/contextSummarizer.spec.ts +158 -0
  414. package/tests/unit/phases/discoveryEnricher.repeatCalls.spec.ts +125 -0
  415. package/tests/unit/phases/discoveryEnricher.spec.ts +512 -0
  416. package/tests/unit/phases/phaseExtractors.spec.ts +406 -0
  417. package/tests/unit/phases/phaseHandlers.spec.ts +483 -0
  418. package/tests/unit/prompts/promptLoader.spec.ts +144 -0
  419. package/tests/unit/schemas/pocSchemas.spec.ts +457 -0
  420. package/tests/unit/schemas/session.spec.ts +328 -0
  421. package/tests/unit/sessions/exportPaths.spec.ts +38 -0
  422. package/tests/unit/sessions/exportWriter.spec.ts +737 -0
  423. package/tests/unit/sessions/sessionManager.spec.ts +174 -0
  424. package/tests/unit/sessions/sessionStore.spec.ts +136 -0
  425. package/tests/unit/shared/activitySpinner.spec.ts +211 -0
  426. package/tests/unit/shared/cardsLoader.spec.ts +89 -0
  427. package/tests/unit/shared/copilotClient.spec.ts +185 -0
  428. package/tests/unit/shared/errorClassifier.spec.ts +152 -0
  429. package/tests/unit/shared/events.spec.ts +71 -0
  430. package/tests/unit/shared/markdownRenderer.spec.ts +42 -0
  431. package/tests/unit/shared/markdownRendererChunks.spec.ts +83 -0
  432. package/tests/unit/shared/tableRenderer.spec.ts +38 -0
  433. package/tsconfig.json +20 -0
  434. package/vitest.config.ts +15 -0
  435. package/vitest.live.config.ts +19 -0
@@ -0,0 +1,210 @@
+ # Feature Specification: Workshop Phase Extraction & Tool Wiring Fixes
+
+ **Feature Branch**: `006-workshop-extraction-fixes`
+ **Created**: 2026-03-04
+ **Status**: Draft
+ **Upstream Dependency**: specs/001-cli-workshop-rebuild/spec.md, specs/003-mcp-transport-integration/spec.md, specs/005-ai-search-deploy/spec.md
+ **Input**: User description: "Fix workshop phase extraction failures, lazy web search config, MCP tool wiring, context window management, and export completeness identified in the Zava Industries full-session assessment"
+
+ ## Overview
+
+ A full end-to-end workshop session (the Zava Industries assessment — 6 phases, 48 turns, ~13 minutes) revealed five systemic bugs that prevent sofIA from extracting structured data from LLM responses and from using MCP tools during the workshop flow. The assessment scored **53%** (59/126 testable checks passed). The core conversational quality is excellent (rated 4–5/5 across phases), but the pipeline between LLM output and structured session state is broken for all phases except Discover's `businessContext`.
+
+ This feature addresses all five bugs discovered in that assessment, plus related gaps in export completeness and context management.
+
+ ### Bugs Addressed
+
+ | Bug ID | Severity | Summary |
+ | ------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | BUG-001 | High | `isWebSearchConfigured()` checks `process.env` at function call time, but `.env` may not be loaded yet if the module was imported before `envLoader.ts` ran — making web search appear unconfigured |
+ | BUG-002 | High | Phase extractors (`extractIdeas`, `extractEvaluation`, `extractSelection`, `extractPlan`, `extractPocState`) only find data inside JSON code blocks; the LLM often produces structured information in prose/tables/markdown without a JSON block, leaving `session.ideas`, `session.evaluation`, `session.selection`, `session.plan`, and `session.poc` all null |
+ | BUG-003 | Medium | Select phase hit a 120-second SDK `sendAndWait()` timeout when prior phases accumulated ~38 turns of context — likely a context window or processing overload issue |
+ | BUG-004 | Medium | Export only produces `discover.md` because the export writer requires populated structured fields (e.g., `session.ideas`) to generate phase files, and those fields are empty due to BUG-002 |
+ | BUG-005 | Medium | MCP tools (web search, WorkIQ, Context7, Azure MCP) are not wired into the workshop flow — `workshopCommand.ts` does not create an `McpManager` or pass `webSearchClient`/MCP config to phase handlers beyond Discover |
+
+ ## User Scenarios & Testing _(mandatory)_
+
+ ### User Story 1 — Phase data is reliably extracted from LLM responses (Priority: P1)
+
+ As a facilitator completing a multi-phase workshop, I want sofIA to reliably capture structured artifacts (ideas, evaluation, selection, plan, PoC intent) from each phase's conversation, so that the session JSON contains actionable data for export, `sofia dev`, and progress tracking — regardless of whether the LLM uses JSON blocks, markdown tables, or prose.
+
+ **Why this priority**: This is the root cause of BUG-002 and BUG-004. Without reliable extraction, the entire downstream pipeline (export, dev, status) is broken. The LLM produces high-quality content, but it's lost because the extractors are too rigid.
+
+ **Independent Test**: Run a workshop session through all phases with real LLM output, then verify that `session.ideas`, `session.evaluation`, `session.selection`, `session.plan`, and `session.poc` are all populated in the session JSON. Alternatively, feed recorded LLM outputs from the Zava assessment into the extractors and verify structured data is produced.
+
+ **Acceptance Scenarios**:
+
+ 1. **Given** the Ideate phase produces ideas in markdown format (titles, descriptions, bullet points) without a JSON code block, **When** the phase completes, **Then** `session.ideas` contains at least 3 extracted `IdeaCard` entries with title and description populated.
+ 2. **Given** the Design phase produces a feasibility-value scoring table in markdown, **When** the phase completes, **Then** `session.evaluation` contains scored ideas matching the LLM's output.
+ 3. **Given** the Select phase produces a recommendation with rationale in prose, **When** the user confirms the selection, **Then** `session.selection` contains the selected idea ID, rationale, and `confirmedByUser: true`.
+ 4. **Given** the Plan phase produces milestones in a numbered list or markdown structure, **When** the phase completes, **Then** `session.plan` contains milestone entries with titles and descriptions.
+ 5. **Given** the Develop boundary phase captures PoC intent in prose, **When** the phase completes, **Then** `session.poc` contains target stack, key scenarios, and constraints.
+ 6. **Given** the LLM response includes a valid JSON code block (current behavior), **When** the extractor runs, **Then** the JSON block extraction still works (backward compatibility preserved).
+
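The extraction behavior in the acceptance scenarios above can be sketched as a two-stage extractor: try a fenced JSON block first (scenario 6), then fall back to markdown headings and bullets (scenario 1). This is a minimal sketch under stated assumptions, not sofIA's actual implementation. The `IdeaCard` shape, the function names `extractFromJsonBlock`, `extractFromMarkdown`, and `extractIdeasWithFallback`, and the heading-based heuristic are all hypothetical.

```typescript
// Hypothetical shape for illustration; the real IdeaCard schema may differ.
interface IdeaCard {
  title: string;
  description: string;
}

// Three backticks, built at runtime so this example can itself sit in a fenced block.
const FENCE = "`".repeat(3);
const JSON_BLOCK = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE);

// Stage 1: parse the first fenced JSON block, if there is one.
function extractFromJsonBlock(text: string): IdeaCard[] | null {
  const match = text.match(JSON_BLOCK);
  if (!match) return null;
  try {
    const parsed: unknown = JSON.parse(match[1]);
    if (!Array.isArray(parsed)) return null;
    const cards = parsed.filter(
      (c): c is IdeaCard =>
        typeof c === "object" &&
        c !== null &&
        typeof c.title === "string" &&
        typeof c.description === "string",
    );
    return cards.length > 0 ? cards : null;
  } catch {
    return null; // malformed JSON: let the markdown stage try
  }
}

// Stage 2 (assumed heuristic): each level 2-4 heading starts an idea, and the
// non-empty lines that follow it, with bullet markers stripped, form its description.
function extractFromMarkdown(text: string): IdeaCard[] {
  const cards: IdeaCard[] = [];
  let current: IdeaCard | null = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    const heading = line.match(/^#{2,4}\s+(?:\d+[.)]\s*)?(.+)$/);
    if (heading) {
      if (current) cards.push(current);
      current = { title: heading[1].trim(), description: "" };
    } else if (current && line.length > 0) {
      const body = line.replace(/^[-*]\s+/, "");
      current.description = current.description
        ? `${current.description} ${body}`
        : body;
    }
  }
  if (current) cards.push(current);
  // Drop headings that had no body text under them.
  return cards.filter((c) => c.description.length > 0);
}

// JSON first for backward compatibility, markdown second, null if neither yields data.
function extractIdeasWithFallback(text: string): IdeaCard[] | null {
  const fromJson = extractFromJsonBlock(text);
  if (fromJson) return fromJson;
  const fromMarkdown = extractFromMarkdown(text);
  return fromMarkdown.length > 0 ? fromMarkdown : null;
}
```

Keeping the JSON stage first means responses that already contain a valid block behave exactly as before, and the markdown stage only runs when that block is missing or unparseable. Table and prose extraction for the other phases would need their own heuristics, which this sketch does not cover.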
+ ---
+
+ ### User Story 2 — Web search is available when `.env` is loaded (Priority: P1)
47
+
48
+ As a facilitator starting a workshop session, I want sofIA's web search to work when my `.env` file contains the correct Foundry credentials, regardless of module import ordering, so that the Discover phase can search the web for my company and industry context.
49
+
50
+ **Why this priority**: BUG-001 means web search silently fails in any execution path where `envLoader.ts` hasn't run before `webSearch.ts` is first evaluated. This affects both the CLI and programmatic test paths.
51
+
52
+ **Independent Test**: Set `FOUNDRY_PROJECT_ENDPOINT` and `FOUNDRY_MODEL_DEPLOYMENT_NAME` in `.env` only (not in shell env), start sofIA, and verify `isWebSearchConfigured()` returns `true` and the Discover phase offers web search.
53
+
54
+ **Acceptance Scenarios**:
55
+
56
+ 1. **Given** `.env` contains valid Foundry credentials and `process.env` does not have them pre-set, **When** the CLI starts and calls `isWebSearchConfigured()`, **Then** it returns `true`.
57
+ 2. **Given** `.env` is absent and Foundry env vars are not set, **When** the CLI starts and calls `isWebSearchConfigured()`, **Then** it returns `false` (no error, no crash).
58
+ 3. **Given** Foundry env vars are set in the shell environment (not in `.env`), **When** the CLI starts, **Then** `isWebSearchConfigured()` returns `true` regardless of `.env` loading order.
59
+
60
+ ---
61
+
62
+ ### User Story 3 — Workshop phases use MCP tools for enrichment (Priority: P2)
63
+
64
+ As a facilitator running a workshop, I want sofIA to use web search during Discover, and Context7/Azure MCP during Design and Plan, so that the workshop output is grounded in real-world data and current documentation rather than relying solely on the LLM's training data.
65
+
66
+ **Why this priority**: BUG-005 prevents all MCP-driven enrichment in the workshop flow. While the workshop works without tools (graceful degradation is solid), the quality gap is significant — enriched sessions produce more relevant ideas and better-grounded architecture recommendations.
67
+
68
+ **Independent Test**: Run a workshop session with MCP servers configured, verify that: (a) Discover phase calls `web.search` and stores results, (b) Design phase queries Context7 for library docs when ideas reference specific technologies, (c) Plan phase queries Azure MCP when the plan references Azure services.
69
+
70
+ **Acceptance Scenarios**:
71
+
72
+ 1. **Given** web search is configured and the Discover phase has collected business context, **When** the enrichment hook runs, **Then** web search is called with company and industry queries and results are stored in `session.discovery.enrichment`.
73
+ 2. **Given** Context7 is available and the Design phase discusses ideas referencing npm packages, **When** the Design phase handler retrieves references, **Then** Context7 is queried for relevant library documentation.
74
+ 3. **Given** Azure MCP is available and the Plan phase references Azure services (e.g., Azure Functions, Cosmos DB), **When** the Plan phase handler builds the system prompt or post-processes, **Then** Azure architecture guidance is fetched and included in the LLM's context.
75
+ 4. **Given** all MCP tools are unavailable, **When** the workshop runs, **Then** every phase still completes successfully (existing graceful degradation preserved).
76
+
77
+ ---
78
+
79
+ ### User Story 4 — Long sessions don't cause timeouts (Priority: P2)
80
+
81
+ As a facilitator completing a multi-phase workshop with rich conversations, I want sofIA to manage its context window so that later phases (Select, Plan, Develop) don't time out due to accumulated conversation history from earlier phases.
82
+
83
+ **Why this priority**: BUG-003 caused the Select phase to time out completely, losing all user progress for that phase. In a real workshop, this would be frustrating and would undermine trust in the tool.
84
+
85
+ **Independent Test**: Run a 6-phase workshop session with at least 10 turns per phase. Verify that the Select and Plan phases complete successfully without SDK timeouts, and that prior phase context is available to the LLM in summarized form.
86
+
87
+ **Acceptance Scenarios**:
88
+
89
+ 1. **Given** a session with 30+ turns across Discover, Ideate, and Design phases, **When** the Select phase starts, **Then** it completes within 120 seconds without a timeout error.
90
+ 2. **Given** a session starting a new phase after multiple completed phases, **When** the ConversationLoop builds the context, **Then** prior phase turns are summarized (not included verbatim) in the system prompt to reduce context size.
91
+ 3. **Given** the summarized context from prior phases, **When** the LLM generates phase output, **Then** it can reference key decisions from earlier phases (business context, selected ideas) accurately despite summarization.
92
+
93
+ ---
94
+
95
+ ### User Story 5 — Export includes all phases with conversation fallback (Priority: P2)
96
+
97
+ As a facilitator exporting workshop results, I want `sofia export` to produce markdown files for every phase that had conversations — even if structured artifacts weren't extracted — so that the full workshop output is preserved for review.
98
+
99
+ **Why this priority**: BUG-004 means that only the Discover phase (which has `businessContext` populated) generates an export file. The other 5 phases' conversations are lost from the export despite containing rich, useful content. A workshop facilitator needs all phases represented.
100
+
101
+ **Independent Test**: Run `sofia export` on a session that completed all phases but has null structured fields for Ideate through Develop. Verify that export files are generated for all phases using conversation turn fallback.
102
+
103
+ **Acceptance Scenarios**:
104
+
105
+ 1. **Given** a session with Ideate conversation turns but no extracted `session.ideas`, **When** `sofia export` runs, **Then** an `ideate.md` file is generated containing the conversation turns for the Ideate phase.
106
+ 2. **Given** a session with all 6 phases completed and conversation data, **When** `sofia export` runs, **Then** the export directory contains `discover.md`, `ideate.md`, `design.md`, `select.md`, `plan.md`, and `develop.md`.
107
+ 3. **Given** a session with both structured data and conversation turns for a phase, **When** `sofia export` runs, **Then** the structured data is rendered first, followed by the conversation.
108
+ 4. **Given** a completed export, **When** `summary.json` is generated, **Then** it lists all exported phase files and includes highlights from every populated phase.
109
+
110
+ ---
111
+
112
+ ### Edge Cases
113
+
114
+ - What if the LLM produces a JSON block that partially matches the schema (e.g., missing one required field)? The summarization call should still produce a valid extraction; the original partial JSON should be logged for debugging.
115
+ - What if the summarization LLM call itself fails? Fall back to the current extraction behavior (JSON block parsing) and log a warning. The phase should not be blocked by a failed summarization attempt.
116
+ - What if context summarization loses critical details (e.g., a specific technology choice from the Plan)? The system should preserve key fields (business context, selected idea, plan milestones) verbatim, and only summarize conversation turns.
117
+ - What if the user's session has 0 turns for a phase (e.g., the Select timeout scenario)? The export phase file should still be generated with a note that the phase had no conversation content.
118
+ - What if multiple JSON blocks exist in a single LLM response? The extractor should try all blocks against the expected schema, not just the first one.
119
+ - What if the LLM drifts into the next phase's content before the decision gate? The system prompt should enforce phase boundaries explicitly.
120
+ - What if the Select phase still times out even after context summarization? The system should have a secondary fallback (minimal-context retry or user-directed selection).
121
+ - What if the Zava assessment live test provides more or fewer inputs than the LLM expects? The test harness should detect input exhaustion and signal completion gracefully rather than silently consuming inputs across phase boundaries. (Test infrastructure concern — not a production code issue.)
122
+
123
+ ## Requirements _(mandatory)_
124
+
125
+ ### Functional Requirements
126
+
127
+ #### Phase Extraction Hardening (BUG-002)
128
+
129
+ - **FR-001**: At the end of each phase (when the conversation loop exits), the system MUST make a dedicated "summarization" LLM call that asks the model to output the phase's structured data as a JSON code block, using the full conversation from that phase as context.
130
+ - **FR-002**: The summarization call MUST use a phase-specific prompt that instructs the LLM to extract the exact JSON shape expected by the schema (e.g., `IdeaCard[]`, `IdeaEvaluation`, `SelectedIdea`, `ImplementationPlan`, `PocDevelopmentState`).
131
+ - **FR-003**: The existing `extractJsonBlock()` → schema parsing pipeline MUST be preserved as the primary extraction path during the conversation. The summarization call is an additional fallback that runs once at phase end.
132
+ - **FR-004**: If the summarization call produces valid structured data and the session field is still null (i.e., per-turn extraction didn't capture it), the system MUST populate the session field from the summarization result.
133
+ - **FR-005**: If the summarization call fails or returns invalid data, the system MUST log a warning and continue without blocking the phase transition. Existing per-turn extraction results (if any) MUST be preserved.
134
+ - **FR-006**: The summarization call MUST be implemented as a method on `ConversationLoop` (or a utility invoked by it) so that all phases benefit without duplicating logic per handler.
135
+ - **FR-007**: The existing `extractJsonBlock()` function MUST be enhanced to try multiple JSON blocks in a response (not just the first match) and return the first one that validates against the expected schema.
136
+ - **FR-007a**: The Design phase summarization prompt (FR-002) MUST also request a Mermaid architecture diagram alongside the structured evaluation JSON, fulfilling spec 001 FR-030. The diagram MUST be stored in the session (as part of `evaluation` or a dedicated field) for export.
137
+
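The multi-block behavior described in FR-007 can be sketched as below. This is a minimal illustration, not the project's `extractJsonBlock()`: the `IdeaCard` shape, the `isIdeaCardArray` guard, and the `firstValidJsonBlock` name are assumptions made for the example, and a hand-written type guard stands in for whatever schema validation the codebase actually uses.

```typescript
// Hypothetical shape for illustration; the real IdeaCard schema is not shown in this spec.
interface IdeaCard {
  title: string;
  description: string;
}

// Hand-written type guard standing in for schema validation.
function isIdeaCardArray(value: unknown): value is IdeaCard[] {
  return (
    Array.isArray(value) &&
    value.every((item) => {
      if (typeof item !== "object" || item === null) return false;
      const record = item as Record<string, unknown>;
      return typeof record["title"] === "string" && typeof record["description"] === "string";
    })
  );
}

// Three backticks, built at runtime so this example can sit inside a fenced block.
const TICKS = "`".repeat(3);

// Try every fenced json block in order and return the first one that validates.
function firstValidJsonBlock<T>(
  response: string,
  isValid: (value: unknown) => value is T,
): T | null {
  const fence = new RegExp(TICKS + "json\\s*\\n([\\s\\S]*?)" + TICKS, "g");
  let match: RegExpExecArray | null;
  while ((match = fence.exec(response)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      if (isValid(parsed)) return parsed;
    } catch {
      // Malformed JSON in this block: fall through and try the next one.
    }
  }
  return null;
}
```

A block that parses but fails validation is skipped rather than treated as an error, so a response that leads with an unrelated JSON snippet can still yield the expected structure.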
138
+ #### Phase Boundary Enforcement
139
+
140
+ - **FR-007b**: The system prompt for each phase MUST include an explicit instruction prohibiting the LLM from introducing or transitioning to the next phase. The instruction MUST state: "You are in the [Phase] phase. Do NOT introduce or begin the next phase. The user will be offered a decision gate when this phase is complete."
141
+ - **FR-007c**: The `ConversationLoop` (or phase handler `buildSystemPrompt`) MUST inject the phase-boundary instruction automatically for all phases, without requiring per-handler duplication.
142
+
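A minimal sketch of the injection described in FR-007b and FR-007c, assuming the instruction is appended to whatever prompt the phase handler returns. The function names are hypothetical; only the instruction wording and the six phase names come from this spec.

```typescript
type Phase = "Discover" | "Ideate" | "Design" | "Select" | "Plan" | "Develop";

// Builds the instruction text quoted in FR-007b for a given phase.
function phaseBoundaryInstruction(phase: Phase): string {
  return (
    "You are in the " + phase + " phase. " +
    "Do NOT introduce or begin the next phase. " +
    "The user will be offered a decision gate when this phase is complete."
  );
}

// Appends the boundary instruction to a handler-provided system prompt.
// Doing this in one place avoids duplicating the text per handler (FR-007c).
function withPhaseBoundary(phase: Phase, handlerPrompt: string): string {
  return handlerPrompt + "\n\n" + phaseBoundaryInstruction(phase);
}
```

Centralizing the append in the loop, rather than in each handler's `buildSystemPrompt`, is one way to satisfy FR-007c; the spec leaves the exact location open.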
143
+ #### Lazy Web Search Configuration (BUG-001)
144
+
145
+ - **FR-008**: `isWebSearchConfigured()` MUST evaluate `process.env` at call time (lazy), not at module load time. The function MUST NOT cache or memoize the result of the environment variable check.
146
+ - **FR-009**: All callers of `isWebSearchConfigured()` MUST continue to call it as a function. No API signature changes required.
147
+ - **FR-010**: The CLI startup sequence MUST ensure `loadEnvFile()` runs before any code path checks `isWebSearchConfigured()`. This ordering MUST be enforced by calling `loadEnvFile()` at the top of the `workshopCommand()` and `developCommand()` entry points.
148
+
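The difference between a load-time and a call-time check (FR-008) can be illustrated as follows. This is a sketch under assumptions: `env` stands in for `process.env`, the factory names are hypothetical, and requiring both variables is an assumption about what "configured" means.

```typescript
type Env = Record<string, string | undefined>;

// Eager variant: the result is captured once, when the check is created.
// This mirrors a check evaluated at module load, before .env has been read.
function makeEagerCheck(env: Env): () => boolean {
  const configured = Boolean(
    env["FOUNDRY_PROJECT_ENDPOINT"] && env["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
  );
  return () => configured;
}

// Lazy variant: the environment is read on every call, with no caching.
function makeLazyCheck(env: Env): () => boolean {
  return () =>
    Boolean(env["FOUNDRY_PROJECT_ENDPOINT"] && env["FOUNDRY_MODEL_DEPLOYMENT_NAME"]);
}
```

If the variables are set after the checks are created (as happens when `.env` is loaded late), the eager check keeps reporting `false` while the lazy check picks up the new values.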
149
+ #### MCP Tool Wiring in Workshop Flow (BUG-005)
150
+
151
+ - **FR-011**: `workshopCommand.ts` MUST create an `McpManager` instance at startup (if MCP configuration exists in `.vscode/mcp.json` or equivalent).
152
+ - **FR-012**: `workshopCommand.ts` MUST create a `WebSearchClient` from the configured Foundry agent credentials (if web search is configured) and pass it to the Discover phase handler via `DiscoverHandlerConfig`.
153
+ - **FR-012a**: `workshopCommand.ts` MUST pass the `McpManager` to the Discover phase handler so that it can check WorkIQ availability. When WorkIQ is available, the Discover handler MUST prompt the user for explicit consent before querying WorkIQ (per spec 001 FR-020 and spec 003 US4). WorkIQ insights MUST be stored in `session.discovery.enrichment.workiqInsights`.
154
+ - **FR-013**: The Design phase handler MUST accept optional MCP configuration and use Context7 to fetch library documentation for technologies referenced in the ideas, when available.
155
+ - **FR-014**: The Plan phase handler MUST accept optional MCP configuration and use Azure MCP / Microsoft Learn to fetch architecture guidance for Azure services referenced in the plan, when available.
156
+ - **FR-015**: All MCP tool calls from workshop phase handlers MUST degrade gracefully — if a tool is unavailable or errors, the phase continues with LLM-only output.
157
+
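The graceful degradation required by FR-015 could look roughly like the wrapper below. The `tryEnrich` name, its signature, and the warning text are assumptions for illustration; they are not taken from the codebase.

```typescript
// Runs an enrichment call and converts any failure into `undefined`,
// so the calling phase can continue with LLM-only output.
async function tryEnrich<T>(
  label: string,
  call: () => Promise<T>,
  warn: (message: string) => void = () => {},
): Promise<T | undefined> {
  try {
    return await call();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn("[enrichment] " + label + " unavailable, continuing without it: " + reason);
    return undefined;
  }
}
```

A handler would then treat an `undefined` result the same way as "tool not configured", which keeps the unavailable and erroring cases on a single code path.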
158
+ #### Context Window Management (BUG-003)
159
+
160
+ - **FR-016**: When starting a new phase, the `ConversationLoop` MUST NOT include raw conversation turns from previous phases in the system prompt. Instead, it MUST include a summarized context block.
161
+ - **FR-017**: The summarized context MUST preserve: business context, topic, key decisions, selected idea (if applicable), plan milestones (if applicable), discovery enrichment data (web search results, WorkIQ insights — if populated), and any other structured session fields already extracted.
162
+ - **FR-018**: The session history injection (used for resume scenarios) MUST include only conversation turns from the current phase. Turns from prior phases MUST be summarized.
163
+ - **FR-019**: The system SHOULD use the SDK's `infiniteSessions` configuration for long-running sessions as an additional protection against context exhaustion.
164
+ - **FR-019a**: If a phase times out even after context summarization (FR-016–FR-018), the system MUST retry once with a minimal context payload containing only the structured session fields (no conversation turns at all). If the retry also fails, the system MUST fall back to asking the user to manually confirm or provide the expected output (e.g., for Select: present the top-ranked idea from Design and ask the user to confirm).
165
+
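The three-step fallback in FR-019a (summarized context, then minimal context, then user confirmation) can be sketched as below. How the SDK signals a timeout is not specified here, so `isTimeout` matching on an error named `TimeoutError` is an assumption, as are the function names.

```typescript
type Attempt<T> = () => Promise<T>;

// Assumption: timeouts surface as an Error whose name is "TimeoutError".
function isTimeout(err: unknown): boolean {
  return err instanceof Error && err.name === "TimeoutError";
}

// Tries the normal summarized-context call, retries once with minimal context,
// and finally hands the decision to the user. Non-timeout errors are rethrown.
async function runWithTimeoutFallback<T>(
  withSummarizedContext: Attempt<T>,
  withMinimalContext: Attempt<T>,
  askUser: Attempt<T>,
): Promise<T> {
  try {
    return await withSummarizedContext();
  } catch (err) {
    if (!isTimeout(err)) throw err;
  }
  try {
    return await withMinimalContext();
  } catch (err) {
    if (!isTimeout(err)) throw err;
  }
  return askUser();
}
```

Rethrowing non-timeout errors keeps this ladder from masking failures that belong to the existing retry infrastructure.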
166
+ #### Export Completeness (BUG-004)
167
+
168
+ - **FR-020**: The export writer MUST generate a phase markdown file for any phase that has conversation turns in the session, even if the structured session field for that phase is null.
169
+ - **FR-021**: When generating a phase file without structured data, the export writer MUST include the conversation turns formatted as a readable transcript.
170
+ - **FR-022**: When generating a phase file with both structured data and conversation turns, the export writer MUST include the structured data first, then the conversation as a "Conversation" section.
171
+ - **FR-023**: `summary.json` MUST list all generated phase files, not just those with structured data.
172
+ - **FR-024**: `summary.json` highlights MUST include at least one highlight per phase that had conversation turns, derived from the conversation content or structured data.
173
+
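A rough sketch of the phase-file layout implied by FR-020 through FR-022, plus the zero-turn edge case listed earlier. The `Turn` shape, the transcript formatting, and the wording of the empty-phase note are assumptions; the real export writer may format these differently.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Renders one phase file: structured data first (if any), then the transcript
// under a "Conversation" heading, or a note when the phase has no content at all.
function renderPhaseFile(
  phase: string,
  structuredMarkdown: string | null,
  turns: Turn[],
): string {
  const sections: string[] = ["# " + phase];
  if (structuredMarkdown !== null) {
    sections.push(structuredMarkdown);
  }
  if (turns.length > 0) {
    const transcript = turns.map((t) => "**" + t.role + "**: " + t.content).join("\n\n");
    sections.push("## Conversation\n\n" + transcript);
  } else if (structuredMarkdown === null) {
    sections.push("_This phase had no conversation content._");
  }
  return sections.join("\n\n") + "\n";
}
```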
174
+ ### Key Entities
175
+
176
+ - **PhaseSummarizationRequest**: A structured request to the LLM at phase end, containing the full conversation transcript for the phase and a phase-specific extraction prompt. Produces the expected schema (e.g., `IdeaCard[]`, `ImplementationPlan`).
177
+ - **SummarizedPhaseContext**: A compact representation of a completed phase's key outputs, used to inject prior-phase context into subsequent phases without including raw turns.
178
+ - **McpWorkshopConfig**: Configuration object passed from `workshopCommand.ts` to phase handler factories, containing optional `McpManager` and `WebSearchClient` references for tool-based enrichment.
179
+
180
+ ## Success Criteria _(mandatory)_
181
+
182
+ ### Measurable Outcomes
183
+
184
+ - **SC-001**: A full 6-phase workshop session produces non-null values for at least `session.businessContext`, `session.ideas`, `session.selection`, and `session.plan` in 90%+ of properly configured runs.
185
+ - **SC-002**: The `sofia export` command generates markdown files for all 6 phases (Discover through Develop) when the session has conversation data for all phases, regardless of structured data availability.
186
+ - **SC-003**: `isWebSearchConfigured()` returns the correct value when `.env` is the sole source of Foundry credentials, verified by a unit test that sets env vars after module import.
187
+ - **SC-004**: A workshop session with 40+ turns across 4+ phases completes the Select and Plan phases without SDK timeout errors.
188
+ - **SC-005**: When web search and at least one MCP server (Context7 or Azure) are configured, the workshop session uses those tools during the appropriate phases and stores the results in the session.
189
+ - **SC-006**: The Zava Industries assessment test (or equivalent) scores at least 75% on testable checks (up from 53%) after these fixes are applied.
190
+
191
+ ## Assumptions
192
+
193
+ - The Copilot SDK's `sendAndWait()` method can handle a 2-turn summarization call (system prompt + phase transcript) within the existing 120-second timeout.
194
+ - The LLM is capable of producing valid JSON that matches the session schema when given explicit instructions and the full conversation transcript — this has been verified in the Discover phase where `businessContext` extraction already works.
195
+ - The existing test suite (709 unit + 99 integration tests) continues to pass after these changes, as the fixes are backward-compatible with existing behavior.
196
+ - MCP server configurations in `.vscode/mcp.json` are readable by `McpManager` — this was already implemented in Feature 003.
197
+ - The `infiniteSessions` SDK feature (documented in `copilotClient.ts`) is stable enough for production use.
198
+
199
+ ## Dependencies
200
+
201
+ - **Feature 001**: Session schema, ConversationLoop, phase handlers, export writer
202
+ - **Feature 003**: McpManager, MCP transport layer, DiscoveryEnricher, web search client
203
+ - **Feature 005**: Foundry deployment, `.env` output, `isWebSearchConfigured()`
204
+
205
+ ## Out of Scope
206
+
207
+ - **Prose-based extraction without summarization call**: Building NLP-based extractors that parse markdown tables and prose directly is complex and fragile. The summarization call approach is simpler and leverages the LLM's own ability to restructure its output.
208
+ - **Automatic retry for the Select timeout**: While BUG-003 caused a timeout, the root fix is context management (FR-016), not general retry logic. The single minimal-context retry in FR-019a is the only timeout-specific retry in scope; the existing retry infrastructure (FR-050 from spec 001) continues to handle transient failures.
209
+ - **Multi-language template support for PoC generation**: Template changes belong in Feature 004.
210
+ - **PTY-based E2E test automation**: The Zava assessment test is a programmatic live test. Full PTY-based interactive testing is deferred to Feature 004.
@@ -0,0 +1,316 @@
1
+ # Tasks: Workshop Phase Extraction & Tool Wiring Fixes
2
+
3
+ **Input**: Design documents from `/specs/006-workshop-extraction-fixes/`
4
+ **Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
5
+
6
+ **Tests**: Tests are REQUIRED for new behavior (Red → Green → Review). Test tasks are included for each user story and MUST be written first.
7
+
8
+ **Organization**: Tasks are grouped by user story (from spec.md) to enable independent implementation and testing.
9
+
10
+ ## Format: `[ID] [P?] [Story?] Description`
11
+
12
+ - **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
13
+ - **[Story]**: Which user story (US1–US5) this task belongs to
14
+
15
+ > **Note**: Implementation phases are ordered for efficiency (US2 before US1 because it's a quick fix).
16
+ > This differs from spec story numbering where US1=Extraction, US2=WebSearch.
17
+
18
+ ---
19
+
20
+ ## Phase 1: Setup
21
+
22
+ **Purpose**: Project structure for new modules; no behavior changes yet
23
+
24
+ - [x] T001 Create empty module file src/loop/phaseSummarizer.ts with JSDoc header and type-only imports
25
+ - [x] T002 [P] Create empty module file src/phases/contextSummarizer.ts with JSDoc header and type-only imports
26
+ - [x] T003 [P] Create prompt directory src/prompts/summarize/ with placeholder README
27
+ - [x] T004 [P] Update src/prompts/promptLoader.ts to support loading summarization prompts from `summarize/` subdirectory
28
+ - [x] T005 Verify `npm run typecheck` and `npm run lint` pass with empty modules
29
+
30
+ ---
31
+
32
+ ## Phase 2: Foundational (Blocking Prerequisites)
33
+
34
+ **Purpose**: Multi-JSON-block extraction and phase boundary enforcement — used by multiple user stories
35
+
36
+ **⚠️ CRITICAL**: US1 (extraction) and US4 (context) depend on these foundational changes
37
+
38
+ ### Tests (REQUIRED — write first, must FAIL) ⚠️
39
+
40
+ - [x] T006 [P] Add failing tests for `extractAllJsonBlocks()` in tests/unit/phases/phaseExtractors.spec.ts — test with 0, 1, 2, and 3 JSON blocks in a single response
41
+ - [x] T007 [P] Add failing tests for `extractJsonBlockForSchema()` in tests/unit/phases/phaseExtractors.spec.ts — test with multiple blocks where only the second matches the schema
42
+ - [x] T008 [P] Add failing test for phase boundary injection in tests/unit/phases/phaseHandlers.spec.ts — verify system prompt contains "Do NOT introduce or begin the next phase"
43
+
44
+ ### Implementation
45
+
46
+ - [x] T009 Implement `extractAllJsonBlocks()` in src/phases/phaseExtractors.ts — use `/g` flag for fenced blocks, bracket-depth counter for raw JSON (FR-007)
47
+ - [x] T010 Implement `extractJsonBlockForSchema<T>()` in src/phases/phaseExtractors.ts — try each block with `safeParse()`, return first valid match (FR-007)
48
+ - [x] T011 [P] Inject phase-boundary instruction in ConversationLoop system prompt builder in src/loop/conversationLoop.ts (FR-007b, FR-007c)
49
+ - [x] T012 Run `npm run test:unit` — T006, T007, T008 must now PASS; all existing 709 tests must still PASS
50
+
51
+ **Checkpoint**: Foundational extraction + boundary enforcement ready. User story work can begin.
52
+
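T009 mentions a bracket-depth counter for raw (unfenced) JSON. One way to do that is sketched below, assuming candidates are top-level `{...}` or `[...]` spans and that anything `JSON.parse` rejects is discarded. The function name is hypothetical and this is not the shipped `extractAllJsonBlocks()`.

```typescript
// Scans free text for balanced bracket spans and returns those that parse as JSON.
// String literals are tracked so that brackets inside quoted values are ignored.
function extractRawJsonCandidates(text: string): unknown[] {
  const results: unknown[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);

    if (depth > 0 && inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (depth > 0 && ch === '"') {
      inString = true;
      continue;
    }
    if (ch === "{" || ch === "[") {
      if (depth === 0) start = i;
      depth++;
    } else if ((ch === "}" || ch === "]") && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          results.push(JSON.parse(text.slice(start, i + 1)));
        } catch {
          // Balanced brackets, but not valid JSON (e.g. a markdown checkbox): skip it.
        }
        start = -1;
      }
    }
  }
  return results;
}
```

A known limitation of this sketch: an unmatched opening bracket in prose leaves the depth above zero, so later candidates in the same response are missed. A production version would need a recovery rule for that case.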
53
+ ---
54
+
55
+ ## Phase 3: User Story 2 — Lazy Web Search Config (Priority: P1) 🎯 MVP-1
56
+
57
+ **Goal**: `isWebSearchConfigured()` returns correct value regardless of `.env` loading order (BUG-001)
58
+
59
+ **Independent Test**: Set Foundry vars in `.env` only, import `webSearch.ts`, then call `isWebSearchConfigured()` and expect `true`.
60
+
61
+ ### Tests for US2 (REQUIRED — write first, must FAIL) ⚠️
62
+
63
+ - [x] T013 [P] [US2] Add failing test in tests/unit/mcp/webSearch.spec.ts — set env vars AFTER module import, verify `isWebSearchConfigured()` returns true
64
+ - [x] T014 [P] [US2] Add failing test in tests/unit/mcp/webSearch.spec.ts — verify returns false when vars absent
65
+ - [x] T015 [P] [US2] Add failing test in tests/unit/cli/workshopCommand.spec.ts — verify `loadEnvFile()` is called before workshop logic starts
66
+
67
+ ### Implementation for US2
68
+
69
+ - [x] T016 [US2] Verify `isWebSearchConfigured()` in src/mcp/webSearch.ts reads `process.env` at call time with no caching (FR-008, FR-009)
70
+ - [x] T017 [US2] Add `loadEnvFile()` call at top of `workshopCommand()` in src/cli/workshopCommand.ts (FR-010)
71
+ - [x] T018 [P] [US2] Add `loadEnvFile()` call at top of `developCommand()` in src/cli/developCommand.ts (FR-010)
72
+ - [x] T019 [US2] Run `npm run test:unit` — T013, T014, T015 must now PASS
73
+
74
+ **Checkpoint**: Web search configuration works reliably. Can be verified in isolation.
75
+
76
+ ---
77
+
78
+ ## Phase 4: User Story 1 — Phase Data Extraction (Priority: P1) 🎯 MVP-2
79
+
80
+ **Goal**: Structured artifacts (ideas, evaluation, selection, plan, poc) reliably extracted from every workshop phase via post-phase summarization call (BUG-002)
81
+
82
+ **Independent Test**: Feed recorded Zava assessment conversations into the summarization pipeline, verify all session fields populated.
83
+
84
+ ### Tests for US1 (REQUIRED — write first, must FAIL) ⚠️
85
+
86
+ - [x] T020 [P] [US1] Add failing test for `phaseSummarize()` in tests/unit/loop/phaseSummarizer.spec.ts — with fake client returning JSON block, verify session field populated
87
+ - [x] T021 [P] [US1] Add failing test for `phaseSummarize()` in tests/unit/loop/phaseSummarizer.spec.ts — with fake client returning invalid response, verify session unchanged (no crash)
88
+ - [x] T022 [P] [US1] Add failing test for `phaseSummarize()` in tests/unit/loop/phaseSummarizer.spec.ts — field already populated, verify summarization skipped (no-op)
89
+ - [x] T023 [P] [US1] Add failing test for Ideate summarization prompt in tests/unit/loop/phaseSummarizer.spec.ts — verify IdeaCard[] extracted from LLM summary response
90
+ - [x] T024 [P] [US1] Add failing test for Design summarization + Mermaid diagram extraction in tests/unit/loop/phaseSummarizer.spec.ts (FR-007a)
91
+ - [x] T025 [P] [US1] Add failing integration test in tests/integration/summarizationFlow.spec.ts — full pipeline: ConversationLoop → phaseSummarize → session updated
92
+
93
+ ### Implementation for US1
94
+
95
+ - [x] T026 [US1] Create summarization prompt src/prompts/summarize/ideate-summary.md — IdeaCard[] schema shape + extraction instructions (FR-002)
96
+ - [x] T027 [P] [US1] Create summarization prompt src/prompts/summarize/design-summary.md — IdeaEvaluation schema + Mermaid diagram request (FR-002, FR-007a)
97
+ - [x] T028 [P] [US1] Create summarization prompt src/prompts/summarize/select-summary.md — SelectedIdea schema shape (FR-002)
98
+ - [x] T029 [P] [US1] Create summarization prompt src/prompts/summarize/plan-summary.md — ImplementationPlan schema shape (FR-002)
99
+ - [x] T030 [P] [US1] Create summarization prompt src/prompts/summarize/develop-summary.md — PocDevelopmentState schema shape (FR-002)
100
+ - [x] T031 [US1] Implement `phaseSummarize()` in src/loop/phaseSummarizer.ts — create new session, send transcript, extract with handler (FR-001, FR-003, FR-004, FR-005, FR-006). The Discover phase MAY skip this call if `businessContext` is already populated.
101
+ - [x] T032 [US1] Implement `needsSummarization()` in src/loop/phaseSummarizer.ts — check if phase's session field is null
102
+ - [x] T033 [US1] Implement `buildPhaseTranscript()` in src/loop/phaseSummarizer.ts — concatenate user+assistant turns for the phase
103
+ - [x] T034 [US1] Implement Mermaid diagram extraction in Design summarization path in src/loop/phaseSummarizer.ts — extract `mermaid` block from summarization response and store in session `evaluation.architectureDiagram` (FR-007a)
104
+ - [x] T035 [US1] Hook `phaseSummarize()` into ConversationLoop.run() after while loop exits, before return, in src/loop/conversationLoop.ts (FR-006)
105
+ - [x] T036 [US1] Run `npm run test:unit && npm run test:integration` — T020–T025 must PASS; all existing tests must still PASS
106
+
107
+ **Checkpoint**: All phases extract structured data via summarization fallback. Session fields populated.
108
+
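The helper tasks T032 and T033, together with the "populate only if still null" rule from FR-004, can be sketched as below. The `SessionSketch` and `Turn` shapes are simplified assumptions, and `applySummary` is a hypothetical name; the phase-to-field mapping follows the session fields named in the spec's acceptance scenarios.

```typescript
type Phase = "Discover" | "Ideate" | "Design" | "Select" | "Plan" | "Develop";
type PhaseField = "ideas" | "evaluation" | "selection" | "plan" | "poc";

interface Turn {
  phase: Phase;
  role: "user" | "assistant";
  content: string;
}

// Simplified stand-in for the real session type.
interface SessionSketch {
  turns: Turn[];
  ideas?: unknown;
  evaluation?: unknown;
  selection?: unknown;
  plan?: unknown;
  poc?: unknown;
}

// Discover is omitted here because its field is handled by existing extraction.
const PHASE_FIELD: Partial<Record<Phase, PhaseField>> = {
  Ideate: "ideas",
  Design: "evaluation",
  Select: "selection",
  Plan: "plan",
  Develop: "poc",
};

// True when the phase has a target field and it is still null or undefined.
function needsSummarization(session: SessionSketch, phase: Phase): boolean {
  const field = PHASE_FIELD[phase];
  return field !== undefined && session[field] == null;
}

// Concatenates the user and assistant turns that belong to one phase.
function buildPhaseTranscript(session: SessionSketch, phase: Phase): string {
  return session.turns
    .filter((t) => t.phase === phase)
    .map((t) => t.role + ": " + t.content)
    .join("\n");
}

// Stores a summarization result without overwriting per-turn extraction.
function applySummary(session: SessionSketch, phase: Phase, extracted: unknown): void {
  const field = PHASE_FIELD[phase];
  if (field !== undefined && session[field] == null && extracted != null) {
    session[field] = extracted;
  }
}
```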
109
+ ---
110
+
111
+ ## Phase 5: User Story 4 — Context Window Management (Priority: P2)
112
+
113
+ **Goal**: Later phases (Select, Plan, Develop) don't time out due to accumulated conversation history (BUG-003)
114
+
115
+ **Independent Test**: Build a session with 40+ turns, start Select phase, verify no timeout and summarized context in prompt.
116
+
117
+ ### Tests for US4 (REQUIRED — write first, must FAIL) ⚠️
118
+
119
+ - [x] T037 [P] [US4] Add failing test for `buildSummarizedContext()` in tests/unit/phases/contextSummarizer.spec.ts — with full session, verify all fields projected
120
+ - [x] T038 [P] [US4] Add failing test for `buildSummarizedContext()` in tests/unit/phases/contextSummarizer.spec.ts — with null fields, verify graceful omission
121
+ - [x] T039 [P] [US4] Add failing test for `renderSummarizedContext()` in tests/unit/phases/contextSummarizer.spec.ts — verify markdown output format
122
+ - [x] T040 [P] [US4] Add failing test in tests/unit/phases/phaseHandlers.spec.ts — verify Ideate handler uses renderSummarizedContext (not ad-hoc context)
123
+ - [x] T041 [P] [US4] Add failing test for ConversationLoop infiniteSessions forwarding in tests/unit/loop/conversationLoop.spec.ts
124
+ - [x] T042 [P] [US4] Add failing test for timeout-retry fallback (FR-019a) in tests/unit/loop/conversationLoop.spec.ts — on timeout, retry with minimal context
125
+ - [x] T043 [P] [US4] Add failing test for user-directed fallback (FR-019a) in tests/unit/loop/conversationLoop.spec.ts — on second timeout, ask user for manual input
126
+
127
+ ### Implementation for US4
128
+
129
+ - [x] T044 [US4] Implement `buildSummarizedContext()` in src/phases/contextSummarizer.ts — project all structured session fields including discoveryEnrichment (FR-016, FR-017)
130
+ - [x] T045 [US4] Implement `renderSummarizedContext()` in src/phases/contextSummarizer.ts — render as compact markdown section (FR-017)
131
+ - [x] T046 [US4] Replace ad-hoc context blocks in Ideate handler's `buildSystemPrompt()` with `renderSummarizedContext()` in src/phases/phaseHandlers.ts (FR-016)
132
+ - [x] T047 [P] [US4] Replace ad-hoc context blocks in Design handler's `buildSystemPrompt()` with `renderSummarizedContext()` in src/phases/phaseHandlers.ts (FR-016)
133
+ - [x] T048 [P] [US4] Replace ad-hoc context blocks in Select handler's `buildSystemPrompt()` with `renderSummarizedContext()` in src/phases/phaseHandlers.ts (FR-016)
134
+ - [x] T049 [P] [US4] Replace ad-hoc context blocks in Plan handler's `buildSystemPrompt()` with `renderSummarizedContext()` in src/phases/phaseHandlers.ts (FR-016)
135
+ - [x] T050 [P] [US4] Replace ad-hoc context blocks in Develop handler's `buildSystemPrompt()` with `renderSummarizedContext()` in src/phases/phaseHandlers.ts (FR-016)
136
+ - [x] T051 [US4] Verify ConversationLoop.run() only injects current-phase turns (not prior phases) in system prompt history block in src/loop/conversationLoop.ts (FR-018 — already implemented, add regression test)
137
+ - [x] T052 [US4] Add `infiniteSessions` option to `ConversationLoopOptions` and forward to `createSession()` in src/loop/conversationLoop.ts (FR-019)
138
+ - [x] T053 [US4] Implement minimal-context retry on timeout in ConversationLoop — on `sendAndWait` timeout, retry with only structured session fields and no conversation turns (FR-019a)
139
+ - [x] T054 [US4] Implement user-directed fallback in ConversationLoop — on second timeout after retry, present the best available answer to user and ask for manual confirmation (FR-019a)
140
+ - [x] T055 [US4] Pass `infiniteSessions: { backgroundCompactionThreshold: 0.7, bufferExhaustionThreshold: 0.9 }` from workshopCommand.ts to ConversationLoop (FR-019)
141
+ - [x] T056 [US4] Run `npm run test:unit` — T037–T043 must PASS; all existing tests must still PASS
142
+
143
+ **Checkpoint**: Select/Plan phases complete without timeout. Context is compact and accurate.
144
+
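T044 and T045 describe projecting structured session fields into a compact block. A minimal sketch follows, assuming simplified field shapes and a bullet-list rendering; the heading text, the field subset, and the `SessionSketch` type are illustrative assumptions, not the actual `contextSummarizer.ts` output.

```typescript
// Simplified stand-in for the real session type; only a few fields are shown.
interface SessionSketch {
  businessContext?: string | null;
  topic?: string | null;
  selection?: { ideaId: string; rationale: string } | null;
  plan?: { milestones: { title: string }[] } | null;
  turns: { phase: string; content: string }[];
}

interface SummarizedPhaseContext {
  businessContext?: string;
  topic?: string;
  selectedIdea?: string;
  planMilestones?: string[];
}

// Projects structured fields only; raw conversation turns are deliberately left out.
function buildSummarizedContext(session: SessionSketch): SummarizedPhaseContext {
  const ctx: SummarizedPhaseContext = {};
  if (session.businessContext) ctx.businessContext = session.businessContext;
  if (session.topic) ctx.topic = session.topic;
  if (session.selection) {
    ctx.selectedIdea = session.selection.ideaId + " (" + session.selection.rationale + ")";
  }
  if (session.plan) ctx.planMilestones = session.plan.milestones.map((m) => m.title);
  return ctx;
}

// Renders the projection as a compact markdown section, omitting absent fields.
function renderSummarizedContext(ctx: SummarizedPhaseContext): string {
  const lines: string[] = ["## Prior phase context"];
  if (ctx.businessContext !== undefined) lines.push("- Business context: " + ctx.businessContext);
  if (ctx.topic !== undefined) lines.push("- Topic: " + ctx.topic);
  if (ctx.selectedIdea !== undefined) lines.push("- Selected idea: " + ctx.selectedIdea);
  if (ctx.planMilestones !== undefined) {
    lines.push("- Plan milestones: " + ctx.planMilestones.join("; "));
  }
  return lines.join("\n");
}
```

Because structured fields are copied rather than paraphrased, key decisions stay verbatim, which matches the edge case about summarization losing critical details.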
145
+ ---
146
+
147
+ ## Phase 6: User Story 3 — MCP Tool Wiring (Priority: P2)
148
+
149
+ **Goal**: Workshop phases use web search, WorkIQ, Context7, and Azure MCP for enrichment (BUG-005)
150
+
151
+ **Independent Test**: With MCP configured, run Discover and verify web search results stored; run Design and verify Context7 queried.
152
+
153
+ ### Tests for US3 (REQUIRED — write first, must FAIL) ⚠️
154
+
155
+ - [x] T057 [P] [US3] Add failing test in tests/unit/cli/workshopCommand.spec.ts — verify McpManager created from .vscode/mcp.json
156
+ - [x] T058 [P] [US3] Add failing test in tests/unit/cli/workshopCommand.spec.ts — verify WebSearchClient created when configured and passed to Discover handler
157
+ - [x] T059 [P] [US3] Add failing test in tests/unit/cli/workshopCommand.spec.ts — verify McpManager passed to Discover handler for WorkIQ consent flow (FR-012a)
158
+ - [x] T060 [P] [US3] Add failing test in tests/unit/phases/phaseHandlers.spec.ts — verify Design handler queries Context7 via McpManager in postExtract (FR-013)
159
+ - [x] T061 [P] [US3] Add failing test in tests/unit/phases/phaseHandlers.spec.ts — verify Plan handler queries Azure MCP via McpManager in postExtract (FR-014)
160
+ - [x] T062 [P] [US3] Add failing test in tests/unit/phases/phaseHandlers.spec.ts — verify Design handler degrades gracefully when Context7 unavailable (FR-015)
161
+
162
+ ### Implementation for US3
163
+
164
+ - [x] T063 [US3] Extend `PhaseHandlerConfig` with `mcpManager?: McpManager` and `webSearchClient?: WebSearchClient` in src/phases/phaseHandlers.ts (FR-011)
165
+ - [x] T064 [US3] Create `McpManager` in `workshopCommandInner()` from `.vscode/mcp.json` via `loadMcpConfig()` in src/cli/workshopCommand.ts (FR-011)
166
+ - [x] T065 [US3] Create `WebSearchClient` in `workshopCommandInner()` when `isWebSearchConfigured()` returns true in src/cli/workshopCommand.ts (FR-012)
167
+ - [x] T066 [US3] Pass `mcpManager` + `webSearchClient` to Discover handler via `PhaseHandlerConfig.discover` in src/cli/workshopCommand.ts (FR-012, FR-012a — verify existing WorkIQ consent flow activates when McpManager is wired)
168
+ - [x] T067 [US3] Pass `mcpManager` to all phase handler calls via `PhaseHandlerConfig` in src/cli/workshopCommand.ts
169
+ - [x] T068 [US3] Add `postExtract` hook to Design handler — query Context7 for technologies in `session.ideas` in src/phases/phaseHandlers.ts (FR-013)
170
+ - [x] T069 [US3] Add `postExtract` hook to Plan handler — query Azure MCP for services in `session.plan.architectureNotes` in src/phases/phaseHandlers.ts (FR-014)
171
+ - [x] T070 [US3] Wrap all MCP calls in try/catch for graceful degradation in src/phases/phaseHandlers.ts (FR-015)
172
+ - [x] T071 [US3] Run `npm run test:unit` — T057–T062 must PASS; all existing tests must still PASS
173
+
174
+ **Checkpoint**: MCP tools wired and operational. Enrichment flows working with graceful degradation.
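
The graceful degradation required by T070 (FR-015) might look like the sketch below. It assumes a hypothetical `McpLike` interface with a `callTool` method; the real `McpManager` API is not shown in this document, and `enrichSafely` and the server and tool names in the usage note are illustrative only.

```typescript
// Hypothetical stand-in for McpManager; the real interface may differ.
interface McpLike {
  callTool(server: string, tool: string, args: Record<string, unknown>): Promise<unknown>;
}

interface Enrichment { source: string; data: unknown }

// Query one MCP server and absorb any failure so the phase can continue
// without enrichment. Returns null when no manager is configured or the
// call fails; the failure reason is reported through the optional `warn`.
async function enrichSafely(
  mcp: McpLike | undefined,
  server: string,
  tool: string,
  args: Record<string, unknown>,
  warn: (msg: string) => void = () => {},
): Promise<Enrichment | null> {
  if (!mcp) return null;
  try {
    const data = await mcp.callTool(server, tool, args);
    return { source: server, data };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`MCP call to ${server} failed: ${reason}`);
    return null;
  }
}
```

A `postExtract` hook could then call something like `enrichSafely(config.mcpManager, "context7", "lookup", { query })` and skip enrichment when the result is `null`, so an unavailable server never aborts the phase.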
175
+
176
+ ---
177
+
178
+ ## Phase 7: User Story 5 — Export Completeness (Priority: P2)
179
+
180
+ **Goal**: `sofia export` produces markdown files for all phases with conversation data, even without structured artifacts (BUG-004)
181
+
182
+ **Independent Test**: Export a session with null structured fields but 48 conversation turns — verify that 6 markdown files are generated.
183
+
184
+ ### Tests for US5 (REQUIRED — write first, must FAIL) ⚠️
185
+
186
+ - [x] T072 [P] [US5] Add failing test in tests/unit/sessions/exportWriter.spec.ts — Ideate export with null `session.ideas` but conversation turns produces ideate.md
187
+ - [x] T073 [P] [US5] Add failing test in tests/unit/sessions/exportWriter.spec.ts — Design export with null `session.evaluation` but turns produces design.md
188
+ - [x] T074 [P] [US5] Add failing test in tests/unit/sessions/exportWriter.spec.ts — export with both structured data + turns renders structured first then conversation
189
+ - [x] T075 [P] [US5] Add failing test in tests/unit/sessions/exportWriter.spec.ts — summary.json lists all 6 phase files when all phases have turns
190
+ - [x] T076 [P] [US5] Add failing test in tests/unit/sessions/exportWriter.spec.ts — summary.json highlights include one entry per phase with turns
191
+ - [x] T077 [P] [US5] Add failing integration test in tests/integration/exportFallbackFlow.spec.ts — full export pipeline with null structured data
192
+
193
+ ### Implementation for US5
194
+
195
+ - [x] T078 [US5] Refactor `generateIdeateMarkdown()` in src/sessions/exportWriter.ts — remove early return null; add conversation turn fallback (FR-020, FR-021, FR-022)
196
+ - [x] T079 [P] [US5] Refactor `generateDesignMarkdown()` in src/sessions/exportWriter.ts — same pattern (FR-020, FR-021, FR-022)
197
+ - [x] T080 [P] [US5] Refactor `generateSelectMarkdown()` in src/sessions/exportWriter.ts — same pattern (FR-020, FR-021, FR-022)
198
+ - [x] T081 [P] [US5] Refactor `generatePlanMarkdown()` in src/sessions/exportWriter.ts — same pattern (FR-020, FR-021, FR-022)
199
+ - [x] T082 [P] [US5] Refactor `generateDevelopMarkdown()` in src/sessions/exportWriter.ts — same pattern (FR-020, FR-021, FR-022)
200
+ - [x] T083 [US5] Update `exportSession()` in src/sessions/exportWriter.ts — summary.json lists all generated files (FR-023)
201
+ - [x] T084 [US5] Update highlight generation in src/sessions/exportWriter.ts — include one highlight per phase with turns, fallback to first assistant turn opening (FR-024)
202
+ - [x] T085 [US5] Run `npm run test:unit && npm run test:integration` — T072–T077 must PASS; all existing tests must still PASS
203
+
204
+ **Checkpoint**: Export produces complete artifacts for all 6 phases. summary.json includes all files and highlights.
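
The "same pattern" applied in T078 to T082 (no early `return null`, structured data first, conversation turns as fallback) can be illustrated with one generic renderer. This is a sketch under assumptions: `renderPhaseMarkdown` and the `Turn` shape are hypothetical, and the real `generate*Markdown()` functions in exportWriter.ts likely take the session object instead.

```typescript
// Hypothetical turn shape; the real session model is not shown in this document.
interface Turn { phase: string; role: "user" | "assistant"; content: string }

// Render one phase file. The structured section (if present) comes first,
// followed by that phase's conversation turns. The function returns null
// only when there is neither structured data nor any turn to export.
function renderPhaseMarkdown(
  title: string,
  phase: string,
  structured: string | null,
  turns: Turn[],
): string | null {
  const phaseTurns = turns.filter((t) => t.phase === phase);
  if (structured === null && phaseTurns.length === 0) return null;

  const lines: string[] = [`# ${title}`, ""];
  if (structured !== null) lines.push(structured, "");
  if (phaseTurns.length > 0) {
    lines.push("## Conversation", "");
    for (const t of phaseTurns) lines.push(`**${t.role}**: ${t.content}`, "");
  }
  return lines.join("\n");
}
```

With this shape, a session whose `session.ideas` is null but which has Ideate turns still yields an ideate.md, which is the case T072 tests for.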
205
+
206
+ ---
207
+
208
+ ## Phase 8: Polish & Cross-Cutting Concerns
209
+
210
+ **Purpose**: Final validation, regression testing, cleanup
211
+
212
+ - [x] T086 [P] Update build assets script in package.json to copy `src/prompts/summarize/*.md` to `dist/src/prompts/summarize/`
213
+ - [x] T087 [P] Run `npm run typecheck` — zero errors
214
+ - [x] T088 [P] Run `npm run lint` — zero errors (fix any import ordering issues)
215
+ - [x] T089 Run full test suite: `npm run test:unit && npm run test:integration && npm run test:e2e`
216
+ - [x] T090 Update Zava assessment test in tests/live/zavaFullWorkshop.spec.ts — relax assertion on `session.ideas` (now expected to pass), add assertions for extraction, export completeness
217
+ - [x] T091 Add failure/recovery E2E scenario to Zava live test in tests/live/zavaFullWorkshop.spec.ts — simulate a phase timeout and verify recovery fallback activates (FR-019a, Constitution Principle VI)
218
+ - [x] T092 Run quickstart.md validation — verify all file paths and commands in quickstart.md are accurate
219
+
220
+ ---
221
+
222
+ ## Dependencies & Execution Order
223
+
224
+ ### Phase Dependencies
225
+
226
+ - **Phase 1 (Setup)**: No dependencies — start immediately
227
+ - **Phase 2 (Foundational)**: Depends on Phase 1 — BLOCKS all user stories
228
+ - **Phase 3 (US2 — Web Search)**: Depends on Phase 2 — independent of other stories
229
+ - **Phase 4 (US1 — Extraction)**: Depends on Phase 2 — independent of other stories
230
+ - **Phase 5 (US4 — Context)**: Depends on Phase 2 — independent of other stories
231
+ - **Phase 6 (US3 — MCP Wiring)**: Depends on Phase 3 (needs lazy web search) — can run in parallel with US1/US4
232
+ - **Phase 7 (US5 — Export)**: Depends on Phase 2 — independent of other stories (benefits from US1 but works without)
233
+ - **Phase 8 (Polish)**: Depends on all user stories being complete
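
The dependency list above can be turned into an execution order mechanically. The sketch below transcribes the list into a map and groups phases into "waves" that could run in parallel; `dependsOn` and `executionWaves` are names introduced for this illustration and are not part of the package.

```typescript
// Phase number -> phases it depends on, transcribed from the list above.
// Phase 8 depends on all user stories, i.e. Phases 3 through 7.
const dependsOn: Record<number, number[]> = {
  1: [],
  2: [1],
  3: [2],
  4: [2],
  5: [2],
  6: [3],
  7: [2],
  8: [3, 4, 5, 6, 7],
};

// Group phases into waves: every phase in a wave has all of its
// dependencies completed by earlier waves, so a wave can run in parallel.
function executionWaves(deps: Record<number, number[]>): number[][] {
  const done = new Set<number>();
  const phases = Object.keys(deps).map(Number).sort((a, b) => a - b);
  const waves: number[][] = [];
  while (done.size < phases.length) {
    const wave = phases.filter(
      (p) => !done.has(p) && (deps[p] ?? []).every((d) => done.has(d)),
    );
    if (wave.length === 0) throw new Error("dependency cycle detected");
    wave.forEach((p) => done.add(p));
    waves.push(wave);
  }
  return waves;
}

console.log(JSON.stringify(executionWaves(dependsOn)));
// prints [[1],[2],[3,4,5,7],[6],[8]]
```

The waves are a conservative grouping: Phase 6 only needs Phase 3, so in practice it can start as soon as Phase 3 finishes even if Phases 4, 5, and 7 are still in progress.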
234
+
235
+ ### User Story Dependencies
236
+
237
+ | Story | Can Start After | Parallel With | Notes |
238
+ | --------------------- | --------------- | ------------------ | -------------------------------------------------------- |
239
+ | US2 (P1 — Web Search) | Phase 2 | US1, US4, US5 | Quick win — 3 FRs, 7 tasks |
240
+ | US1 (P1 — Extraction) | Phase 2 | US2, US4, US5 | Largest P1 story — 7 FRs, 17 tasks |
241
+ | US4 (P2 — Context) | Phase 2 | US1, US2, US5 | Modifies phaseHandlers shared with US3 |
242
+ | US3 (P2 — MCP) | Phase 3 (US2) | US1, US4, US5 | Needs web search from US2; shares phaseHandlers with US4 |
243
+ | US5 (P2 — Export) | Phase 2 | US1, US2, US3, US4 | Standalone module; benefits from US1 but works without |
244
+
245
+ ### Within Each User Story
246
+
247
+ 1. Tests written first — MUST FAIL before implementation
248
+ 2. Implementation tasks in dependency order
249
+ 3. Green checkpoint: all tests pass
250
+ 4. Full suite verification before moving to next story
251
+
252
+ ### Parallel Opportunities
253
+
254
+ ```text
255
+ After Phase 2 completes:
256
+
257
+ ┌──── US2 (Web Search) ─────┐
258
+ │ │
259
+ │ ┌──── US1 (Extraction) ──┼──── US3 (MCP Wiring) ────┐
260
+ │ │ │ │
261
+ │ │ ┌── US4 (Context) ───┘ │
262
+ │ │ │ │
263
+ │ │ │ ┌── US5 (Export) ──────────────────────────────┘
264
+ │ │ │ │
265
+ ▼ ▼ ▼ ▼
266
+ Phase 8: Polish
267
+ ```
268
+
269
+ ---
270
+
271
+ ## Implementation Strategy
272
+
273
+ ### MVP First (US2 + US1 Only)
274
+
275
+ 1. Complete Phase 1: Setup
276
+ 2. Complete Phase 2: Foundational (multi-block extraction + phase boundaries)
277
+ 3. Complete Phase 3: US2 — Lazy Web Search Config (quick win, ~30 min)
278
+ 4. Complete Phase 4: US1 — Phase Data Extraction (summarization pipeline, ~2 hours)
279
+ 5. **STOP and VALIDATE**: Run Zava assessment test — extraction should now populate all fields
280
+
281
+ ### Incremental Delivery
282
+
283
+ 1. Setup + Foundational → Base ready
284
+ 2. US2 (Web Search) → Config bug fixed → Deploy
285
+ 3. US1 (Extraction) → Structured data flows → Deploy (major milestone)
286
+ 4. US4 (Context) → Timeout prevention → Deploy
287
+ 5. US3 (MCP Wiring) → Enrichment active → Deploy
288
+ 6. US5 (Export) → Complete exports → Deploy
289
+ 7. Polish → Full regression pass → Release
290
+
291
+ ### Suggested MVP Scope
292
+
293
+ **US2 + US1** cover the two P1 stories and address:
294
+
295
+ - BUG-001 (web search config)
296
+ - BUG-002 (extraction failures)
297
+ - Indirectly improves BUG-004 (export now has structured data to render)
298
+
299
+ This combination targets the biggest score improvement in the Zava assessment.
300
+
301
+ ---
302
+
303
+ ## Summary
304
+
305
+ | Metric | Value |
306
+ | -------------------------- | ----------------------------------------------------- |
307
+ | **Total tasks** | 92 |
308
+ | **US1 (Extraction)** | 17 tasks (incl. Mermaid extraction T034) |
309
+ | **US2 (Web Search)** | 7 tasks |
310
+ | **US3 (MCP Wiring)** | 15 tasks |
311
+ | **US4 (Context)** | 20 tasks (incl. split timeout retry/fallback) |
312
+ | **US5 (Export)** | 14 tasks |
313
+ | **Setup + Foundational** | 12 tasks |
314
+ | **Polish** | 7 tasks (incl. failure/recovery E2E scenario) |
315
+ | **Parallel opportunities** | US2/US1/US4/US5 can all start after Phase 2 completes |
316
+ | **MVP scope** | US2 + US1 (24 tasks, addresses P1 stories) |