sofia-cli 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (435)
  1. package/.github/agents/copilot-instructions.md +39 -0
  2. package/.github/agents/speckit.analyze.agent.md +184 -0
  3. package/.github/agents/speckit.checklist.agent.md +294 -0
  4. package/.github/agents/speckit.clarify.agent.md +181 -0
  5. package/.github/agents/speckit.constitution.agent.md +84 -0
  6. package/.github/agents/speckit.implement.agent.md +135 -0
  7. package/.github/agents/speckit.plan.agent.md +90 -0
  8. package/.github/agents/speckit.specify.agent.md +258 -0
  9. package/.github/agents/speckit.tasks.agent.md +137 -0
  10. package/.github/agents/speckit.taskstoissues.agent.md +30 -0
  11. package/.github/copilot-instructions.md +257 -0
  12. package/.github/prompts/speckit.analyze.prompt.md +3 -0
  13. package/.github/prompts/speckit.checklist.prompt.md +3 -0
  14. package/.github/prompts/speckit.clarify.prompt.md +3 -0
  15. package/.github/prompts/speckit.constitution.prompt.md +3 -0
  16. package/.github/prompts/speckit.implement.prompt.md +3 -0
  17. package/.github/prompts/speckit.plan.prompt.md +3 -0
  18. package/.github/prompts/speckit.specify.prompt.md +3 -0
  19. package/.github/prompts/speckit.tasks.prompt.md +3 -0
  20. package/.github/prompts/speckit.taskstoissues.prompt.md +3 -0
  21. package/.github/workflows/ci.yml +38 -0
  22. package/.prettierrc +6 -0
  23. package/.specify/memory/constitution.md +181 -0
  24. package/.specify/scripts/bash/check-prerequisites.sh +166 -0
  25. package/.specify/scripts/bash/common.sh +156 -0
  26. package/.specify/scripts/bash/create-new-feature.sh +297 -0
  27. package/.specify/scripts/bash/setup-plan.sh +61 -0
  28. package/.specify/scripts/bash/update-agent-context.sh +810 -0
  29. package/.specify/templates/agent-file-template.md +28 -0
  30. package/.specify/templates/checklist-template.md +40 -0
  31. package/.specify/templates/constitution-template.md +50 -0
  32. package/.specify/templates/plan-template.md +113 -0
  33. package/.specify/templates/spec-template.md +115 -0
  34. package/.specify/templates/tasks-template.md +251 -0
  35. package/.vscode/mcp.json +42 -0
  36. package/.vscode/settings.json +19 -0
  37. package/CODE_OF_CONDUCT.md +128 -0
  38. package/LICENSE +21 -0
  39. package/README.md +213 -0
  40. package/dist/src/cli/developCommand.js +240 -0
  41. package/dist/src/cli/directCommands.js +143 -0
  42. package/dist/src/cli/envLoader.js +16 -0
  43. package/dist/src/cli/exportCommand.js +53 -0
  44. package/dist/src/cli/index.js +203 -0
  45. package/dist/src/cli/ioContext.js +109 -0
  46. package/dist/src/cli/preflight.js +57 -0
  47. package/dist/src/cli/statusCommand.js +110 -0
  48. package/dist/src/cli/workshopCommand.js +400 -0
  49. package/dist/src/develop/checkpointState.js +86 -0
  50. package/dist/src/develop/codeGenerator.js +319 -0
  51. package/dist/src/develop/dynamicScaffolder.js +226 -0
  52. package/dist/src/develop/githubMcpAdapter.js +122 -0
  53. package/dist/src/develop/index.js +15 -0
  54. package/dist/src/develop/mcpContextEnricher.js +195 -0
  55. package/dist/src/develop/pocScaffolder.js +542 -0
  56. package/dist/src/develop/ralphLoop.js +659 -0
  57. package/dist/src/develop/templateRegistry.js +364 -0
  58. package/dist/src/develop/testRunner.js +202 -0
  59. package/dist/src/logging/logger.js +58 -0
  60. package/dist/src/loop/conversationLoop.js +227 -0
  61. package/dist/src/loop/phaseSummarizer.js +87 -0
  62. package/dist/src/mcp/mcpManager.js +267 -0
  63. package/dist/src/mcp/mcpTransport.js +391 -0
  64. package/dist/src/mcp/retryPolicy.js +47 -0
  65. package/dist/src/mcp/webSearch.js +254 -0
  66. package/dist/src/phases/contextSummarizer.js +101 -0
  67. package/dist/src/phases/discoveryEnricher.js +156 -0
  68. package/dist/src/phases/phaseExtractors.js +222 -0
  69. package/dist/src/phases/phaseHandlers.js +328 -0
  70. package/dist/src/prompts/design.md +51 -0
  71. package/dist/src/prompts/develop-boundary.md +51 -0
  72. package/dist/src/prompts/develop.md +111 -0
  73. package/dist/src/prompts/discover.md +58 -0
  74. package/dist/src/prompts/ideate.md +56 -0
  75. package/dist/src/prompts/plan.md +51 -0
  76. package/dist/src/prompts/promptLoader.js +167 -0
  77. package/dist/src/prompts/promptLoader.ts +198 -0
  78. package/dist/src/prompts/select.md +47 -0
  79. package/dist/src/prompts/summarize/README.md +8 -0
  80. package/dist/src/prompts/summarize/design-summary.md +37 -0
  81. package/dist/src/prompts/summarize/develop-summary.md +25 -0
  82. package/dist/src/prompts/summarize/ideate-summary.md +27 -0
  83. package/dist/src/prompts/summarize/plan-summary.md +27 -0
  84. package/dist/src/prompts/summarize/select-summary.md +21 -0
  85. package/dist/src/prompts/system.md +28 -0
  86. package/dist/src/sessions/exportPaths.js +22 -0
  87. package/dist/src/sessions/exportWriter.js +406 -0
  88. package/dist/src/sessions/sessionManager.js +81 -0
  89. package/dist/src/sessions/sessionStore.js +65 -0
  90. package/dist/src/shared/activitySpinner.js +91 -0
  91. package/dist/src/shared/copilotClient.js +129 -0
  92. package/dist/src/shared/data/cards.json +1249 -0
  93. package/dist/src/shared/data/cardsLoader.js +51 -0
  94. package/dist/src/shared/errorClassifier.js +120 -0
  95. package/dist/src/shared/events.js +28 -0
  96. package/dist/src/shared/markdownRenderer.js +34 -0
  97. package/dist/src/shared/schemas/session.js +265 -0
  98. package/dist/src/shared/tableRenderer.js +20 -0
  99. package/dist/src/vendor/chalk.js +2 -0
  100. package/dist/src/vendor/cli-table3.js +3 -0
  101. package/dist/src/vendor/commander.js +2 -0
  102. package/dist/src/vendor/marked-terminal.js +3 -0
  103. package/dist/src/vendor/marked.js +2 -0
  104. package/dist/src/vendor/ora.js +2 -0
  105. package/dist/src/vendor/pino.js +2 -0
  106. package/dist/src/vendor/zod.js +2 -0
  107. package/dist/tests/e2e/developE2e.spec.js +126 -0
  108. package/dist/tests/e2e/developFailureE2e.spec.js +247 -0
  109. package/dist/tests/e2e/developPty.spec.js +75 -0
  110. package/dist/tests/e2e/discoveryWebSearchRelevance.spec.js +84 -0
  111. package/dist/tests/e2e/harness.spec.js +83 -0
  112. package/dist/tests/e2e/mcpLive.spec.js +120 -0
  113. package/dist/tests/e2e/newSession.e2e.spec.js +177 -0
  114. package/dist/tests/e2e/ralphLoopEnrichmentComparison.spec.js +62 -0
  115. package/dist/tests/e2e/workiqEnrichment.spec.js +56 -0
  116. package/dist/tests/e2e/zavaSimulation.spec.js +452 -0
  117. package/dist/tests/fixtures/test-fixture-project/src/add.js +3 -0
  118. package/dist/tests/fixtures/test-fixture-project/tests/failing.test.js +6 -0
  119. package/dist/tests/fixtures/test-fixture-project/tests/hanging.test.js +8 -0
  120. package/dist/tests/fixtures/test-fixture-project/tests/passing.test.js +10 -0
  121. package/dist/tests/fixtures/test-fixture-project/vitest.config.js +6 -0
  122. package/dist/tests/integration/autoStartConversation.spec.js +138 -0
  123. package/dist/tests/integration/defaultCommand.spec.js +147 -0
  124. package/dist/tests/integration/directCommandNonTty.spec.js +224 -0
  125. package/dist/tests/integration/directCommandTty.spec.js +151 -0
  126. package/dist/tests/integration/discoveryEnrichmentFlow.spec.js +175 -0
  127. package/dist/tests/integration/exportArtifacts.spec.js +202 -0
  128. package/dist/tests/integration/exportFallbackFlow.spec.js +99 -0
  129. package/dist/tests/integration/mcpDegradationFlow.spec.js +190 -0
  130. package/dist/tests/integration/mcpTransportFlow.spec.js +139 -0
  131. package/dist/tests/integration/newSessionFlow.spec.js +343 -0
  132. package/dist/tests/integration/pocGithubMcp.spec.js +186 -0
  133. package/dist/tests/integration/pocLocalFallback.spec.js +171 -0
  134. package/dist/tests/integration/pocScaffold.spec.js +163 -0
  135. package/dist/tests/integration/ralphLoopFlow.spec.js +359 -0
  136. package/dist/tests/integration/ralphLoopPartial.spec.js +368 -0
  137. package/dist/tests/integration/resumeAndBacktrack.spec.js +247 -0
  138. package/dist/tests/integration/spinnerLifecycle.spec.js +220 -0
  139. package/dist/tests/integration/summarizationFlow.spec.js +115 -0
  140. package/dist/tests/integration/testRunnerReal.spec.js +52 -0
  141. package/dist/tests/integration/webSearchAgent.spec.js +128 -0
  142. package/dist/tests/live/copilotSdkLive.spec.js +107 -0
  143. package/dist/tests/live/zavaFullWorkshop.spec.js +392 -0
  144. package/dist/tests/setup/loadEnv.js +3 -0
  145. package/dist/tests/unit/cli/developCommand.spec.js +567 -0
  146. package/dist/tests/unit/cli/directCommands.spec.js +279 -0
  147. package/dist/tests/unit/cli/envLoader.spec.js +58 -0
  148. package/dist/tests/unit/cli/ioContext.spec.js +119 -0
  149. package/dist/tests/unit/cli/preflight.spec.js +108 -0
  150. package/dist/tests/unit/cli/statusCommand.spec.js +111 -0
  151. package/dist/tests/unit/cli/workshopClientFallback.spec.js +80 -0
  152. package/dist/tests/unit/cli/workshopCommand.spec.js +329 -0
  153. package/dist/tests/unit/config/vitestEnvSetup.spec.js +13 -0
  154. package/dist/tests/unit/develop/checkpointState.spec.js +315 -0
  155. package/dist/tests/unit/develop/codeGenerator.spec.js +355 -0
  156. package/dist/tests/unit/develop/githubMcpAdapter.spec.js +231 -0
  157. package/dist/tests/unit/develop/mcpContextEnricher.spec.js +433 -0
  158. package/dist/tests/unit/develop/outputValidator.spec.js +119 -0
  159. package/dist/tests/unit/develop/pocScaffolder.spec.js +353 -0
  160. package/dist/tests/unit/develop/ralphLoop.spec.js +1248 -0
  161. package/dist/tests/unit/develop/templateRegistry.spec.js +85 -0
  162. package/dist/tests/unit/develop/testRunner.spec.js +249 -0
  163. package/dist/tests/unit/infraBicep.spec.js +92 -0
  164. package/dist/tests/unit/infraDeploy.spec.js +82 -0
  165. package/dist/tests/unit/infraTeardown.spec.js +63 -0
  166. package/dist/tests/unit/logging/logger.spec.js +43 -0
  167. package/dist/tests/unit/loop/conversationLoop.spec.js +592 -0
  168. package/dist/tests/unit/loop/phaseSummarizer.spec.js +141 -0
  169. package/dist/tests/unit/loop/streamingMarkdown.spec.js +147 -0
  170. package/dist/tests/unit/mcp/mcpManager.spec.js +279 -0
  171. package/dist/tests/unit/mcp/mcpTransport.spec.js +529 -0
  172. package/dist/tests/unit/mcp/retryPolicy.spec.js +218 -0
  173. package/dist/tests/unit/mcp/timeoutValidation.spec.js +46 -0
  174. package/dist/tests/unit/mcp/webSearch.spec.js +567 -0
  175. package/dist/tests/unit/phases/contextSummarizer.spec.js +140 -0
  176. package/dist/tests/unit/phases/discoveryEnricher.repeatCalls.spec.js +93 -0
  177. package/dist/tests/unit/phases/discoveryEnricher.spec.js +411 -0
  178. package/dist/tests/unit/phases/phaseExtractors.spec.js +352 -0
  179. package/dist/tests/unit/phases/phaseHandlers.spec.js +425 -0
  180. package/dist/tests/unit/prompts/promptLoader.spec.js +118 -0
  181. package/dist/tests/unit/schemas/pocSchemas.spec.js +412 -0
  182. package/dist/tests/unit/schemas/session.spec.js +257 -0
  183. package/dist/tests/unit/sessions/exportPaths.spec.js +31 -0
  184. package/dist/tests/unit/sessions/exportWriter.spec.js +655 -0
  185. package/dist/tests/unit/sessions/sessionManager.spec.js +151 -0
  186. package/dist/tests/unit/sessions/sessionStore.spec.js +116 -0
  187. package/dist/tests/unit/shared/activitySpinner.spec.js +175 -0
  188. package/dist/tests/unit/shared/cardsLoader.spec.js +76 -0
  189. package/dist/tests/unit/shared/copilotClient.spec.js +155 -0
  190. package/dist/tests/unit/shared/errorClassifier.spec.js +131 -0
  191. package/dist/tests/unit/shared/events.spec.js +55 -0
  192. package/dist/tests/unit/shared/markdownRenderer.spec.js +35 -0
  193. package/dist/tests/unit/shared/markdownRendererChunks.spec.js +70 -0
  194. package/dist/tests/unit/shared/tableRenderer.spec.js +34 -0
  195. package/dist/vitest.config.js +14 -0
  196. package/dist/vitest.live.config.js +18 -0
  197. package/docs/README.md +35 -0
  198. package/docs/architecture.md +169 -0
  199. package/docs/cli-usage.md +207 -0
  200. package/docs/environment.md +66 -0
  201. package/docs/export-format.md +146 -0
  202. package/docs/session-model.md +113 -0
  203. package/eslint.config.js +35 -0
  204. package/infra/deploy.sh +193 -0
  205. package/infra/gather-env.sh +211 -0
  206. package/infra/main.bicep +90 -0
  207. package/infra/main.bicepparam +18 -0
  208. package/infra/resources.bicep +134 -0
  209. package/infra/teardown.sh +114 -0
  210. package/package.json +63 -0
  211. package/specs/001-cli-workshop-rebuild/checklists/requirements.md +35 -0
  212. package/specs/001-cli-workshop-rebuild/contracts/cli.md +59 -0
  213. package/specs/001-cli-workshop-rebuild/contracts/export-summary-json.md +23 -0
  214. package/specs/001-cli-workshop-rebuild/contracts/session-json.md +30 -0
  215. package/specs/001-cli-workshop-rebuild/data-model.md +210 -0
  216. package/specs/001-cli-workshop-rebuild/plan.md +361 -0
  217. package/specs/001-cli-workshop-rebuild/quickstart.md +83 -0
  218. package/specs/001-cli-workshop-rebuild/research.md +116 -0
  219. package/specs/001-cli-workshop-rebuild/spec.md +240 -0
  220. package/specs/001-cli-workshop-rebuild/tasks.md +476 -0
  221. package/specs/002-poc-generation/contracts/poc-output.md +172 -0
  222. package/specs/002-poc-generation/contracts/ralph-loop.md +113 -0
  223. package/specs/002-poc-generation/data-model.md +172 -0
  224. package/specs/002-poc-generation/plan.md +109 -0
  225. package/specs/002-poc-generation/quickstart.md +97 -0
  226. package/specs/002-poc-generation/research.md +786 -0
  227. package/specs/002-poc-generation/spec.md +81 -0
  228. package/specs/002-poc-generation/tasks-fix.md +198 -0
  229. package/specs/002-poc-generation/tasks.md +252 -0
  230. package/specs/003-mcp-transport-integration/checklists/requirements.md +37 -0
  231. package/specs/003-mcp-transport-integration/contracts/context-enricher.md +220 -0
  232. package/specs/003-mcp-transport-integration/contracts/discovery-enricher.md +267 -0
  233. package/specs/003-mcp-transport-integration/contracts/github-adapter.md +149 -0
  234. package/specs/003-mcp-transport-integration/contracts/mcp-transport.md +288 -0
  235. package/specs/003-mcp-transport-integration/data-model.md +326 -0
  236. package/specs/003-mcp-transport-integration/plan.md +114 -0
  237. package/specs/003-mcp-transport-integration/quickstart.md +311 -0
  238. package/specs/003-mcp-transport-integration/research.md +395 -0
  239. package/specs/003-mcp-transport-integration/spec.md +234 -0
  240. package/specs/003-mcp-transport-integration/tasks.md +324 -0
  241. package/specs/003-next-spec-gaps.md +150 -0
  242. package/specs/004-dev-resume-hardening/checklists/requirements.md +37 -0
  243. package/specs/004-dev-resume-hardening/contracts/cli.md +160 -0
  244. package/specs/004-dev-resume-hardening/data-model.md +321 -0
  245. package/specs/004-dev-resume-hardening/plan.md +107 -0
  246. package/specs/004-dev-resume-hardening/quickstart.md +115 -0
  247. package/specs/004-dev-resume-hardening/research.md +142 -0
  248. package/specs/004-dev-resume-hardening/spec.md +221 -0
  249. package/specs/004-dev-resume-hardening/tasks.md +333 -0
  250. package/specs/005-ai-search-deploy/checklists/requirements.md +39 -0
  251. package/specs/005-ai-search-deploy/contracts/web-search-tool.md +241 -0
  252. package/specs/005-ai-search-deploy/data-model.md +130 -0
  253. package/specs/005-ai-search-deploy/plan.md +93 -0
  254. package/specs/005-ai-search-deploy/quickstart.md +96 -0
  255. package/specs/005-ai-search-deploy/research.md +187 -0
  256. package/specs/005-ai-search-deploy/spec.md +143 -0
  257. package/specs/005-ai-search-deploy/tasks.md +284 -0
  258. package/specs/006-workshop-extraction-fixes/checklists/requirements.md +61 -0
  259. package/specs/006-workshop-extraction-fixes/contracts/summarization-and-export.md +131 -0
  260. package/specs/006-workshop-extraction-fixes/data-model.md +149 -0
  261. package/specs/006-workshop-extraction-fixes/plan.md +123 -0
  262. package/specs/006-workshop-extraction-fixes/quickstart.md +101 -0
  263. package/specs/006-workshop-extraction-fixes/research.md +143 -0
  264. package/specs/006-workshop-extraction-fixes/spec.md +210 -0
  265. package/specs/006-workshop-extraction-fixes/tasks.md +316 -0
  266. package/src/cli/developCommand.ts +308 -0
  267. package/src/cli/directCommands.ts +195 -0
  268. package/src/cli/envLoader.ts +17 -0
  269. package/src/cli/exportCommand.ts +65 -0
  270. package/src/cli/index.ts +249 -0
  271. package/src/cli/ioContext.ts +139 -0
  272. package/src/cli/preflight.ts +86 -0
  273. package/src/cli/statusCommand.ts +118 -0
  274. package/src/cli/workshopCommand.ts +496 -0
  275. package/src/develop/checkpointState.ts +121 -0
  276. package/src/develop/codeGenerator.ts +402 -0
  277. package/src/develop/dynamicScaffolder.ts +284 -0
  278. package/src/develop/githubMcpAdapter.ts +199 -0
  279. package/src/develop/index.ts +34 -0
  280. package/src/develop/mcpContextEnricher.ts +279 -0
  281. package/src/develop/pocScaffolder.ts +646 -0
  282. package/src/develop/ralphLoop.ts +1044 -0
  283. package/src/develop/templateRegistry.ts +427 -0
  284. package/src/develop/testRunner.ts +276 -0
  285. package/src/logging/logger.ts +73 -0
  286. package/src/loop/conversationLoop.ts +355 -0
  287. package/src/loop/phaseSummarizer.ts +114 -0
  288. package/src/mcp/mcpManager.ts +365 -0
  289. package/src/mcp/mcpTransport.ts +562 -0
  290. package/src/mcp/retryPolicy.ts +87 -0
  291. package/src/mcp/webSearch.ts +388 -0
  292. package/src/originalPrompts/design_thinking.md +178 -0
  293. package/src/originalPrompts/design_thinking_persona.md +76 -0
  294. package/src/originalPrompts/document_generator_example.md +77 -0
  295. package/src/originalPrompts/document_generator_persona.md +47 -0
  296. package/src/originalPrompts/facilitator_persona.md +125 -0
  297. package/src/originalPrompts/guardrails.md +47 -0
  298. package/src/phases/contextSummarizer.ts +154 -0
  299. package/src/phases/discoveryEnricher.ts +223 -0
  300. package/src/phases/phaseExtractors.ts +247 -0
  301. package/src/phases/phaseHandlers.ts +450 -0
  302. package/src/prompts/design.md +51 -0
  303. package/src/prompts/develop-boundary.md +51 -0
  304. package/src/prompts/develop.md +111 -0
  305. package/src/prompts/discover.md +58 -0
  306. package/src/prompts/ideate.md +56 -0
  307. package/src/prompts/plan.md +51 -0
  308. package/src/prompts/promptLoader.ts +198 -0
  309. package/src/prompts/select.md +47 -0
  310. package/src/prompts/summarize/README.md +8 -0
  311. package/src/prompts/summarize/design-summary.md +37 -0
  312. package/src/prompts/summarize/develop-summary.md +25 -0
  313. package/src/prompts/summarize/ideate-summary.md +27 -0
  314. package/src/prompts/summarize/plan-summary.md +27 -0
  315. package/src/prompts/summarize/select-summary.md +21 -0
  316. package/src/prompts/system.md +28 -0
  317. package/src/sessions/exportPaths.ts +28 -0
  318. package/src/sessions/exportWriter.ts +490 -0
  319. package/src/sessions/sessionManager.ts +119 -0
  320. package/src/sessions/sessionStore.ts +69 -0
  321. package/src/shared/activitySpinner.ts +108 -0
  322. package/src/shared/copilotClient.ts +291 -0
  323. package/src/shared/data/cards.json +1249 -0
  324. package/src/shared/data/cardsLoader.ts +70 -0
  325. package/src/shared/errorClassifier.ts +160 -0
  326. package/src/shared/events.ts +103 -0
  327. package/src/shared/markdownRenderer.ts +44 -0
  328. package/src/shared/schemas/session.ts +346 -0
  329. package/src/shared/tableRenderer.ts +28 -0
  330. package/src/types/marked-terminal.d.ts +5 -0
  331. package/src/vendor/chalk.ts +2 -0
  332. package/src/vendor/cli-table3.ts +3 -0
  333. package/src/vendor/commander.ts +2 -0
  334. package/src/vendor/marked-terminal.ts +3 -0
  335. package/src/vendor/marked.ts +2 -0
  336. package/src/vendor/ora.ts +2 -0
  337. package/src/vendor/pino.ts +3 -0
  338. package/src/vendor/zod.ts +3 -0
  339. package/tests/e2e/developE2e.spec.ts +152 -0
  340. package/tests/e2e/developFailureE2e.spec.ts +289 -0
  341. package/tests/e2e/developPty.spec.ts +86 -0
  342. package/tests/e2e/discoveryWebSearchRelevance.spec.ts +103 -0
  343. package/tests/e2e/harness.spec.ts +104 -0
  344. package/tests/e2e/mcpLive.spec.ts +149 -0
  345. package/tests/e2e/newSession.e2e.spec.ts +245 -0
  346. package/tests/e2e/ralphLoopEnrichmentComparison.spec.ts +70 -0
  347. package/tests/e2e/workiqEnrichment.spec.ts +72 -0
  348. package/tests/e2e/zava-assessment/agent-interaction-script.md +258 -0
  349. package/tests/e2e/zava-assessment/company-profile.md +98 -0
  350. package/tests/e2e/zava-assessment/expected-results-checklist.md +454 -0
  351. package/tests/e2e/zavaSimulation.spec.ts +511 -0
  352. package/tests/fixtures/completedSession.json +141 -0
  353. package/tests/fixtures/test-fixture-project/package-lock.json +1585 -0
  354. package/tests/fixtures/test-fixture-project/package.json +12 -0
  355. package/tests/fixtures/test-fixture-project/src/add.ts +3 -0
  356. package/tests/fixtures/test-fixture-project/tests/failing.test.ts +7 -0
  357. package/tests/fixtures/test-fixture-project/tests/hanging.test.ts +9 -0
  358. package/tests/fixtures/test-fixture-project/tests/passing.test.ts +13 -0
  359. package/tests/fixtures/test-fixture-project/vitest.config.ts +7 -0
  360. package/tests/integration/autoStartConversation.spec.ts +168 -0
  361. package/tests/integration/defaultCommand.spec.ts +179 -0
  362. package/tests/integration/directCommandNonTty.spec.ts +260 -0
  363. package/tests/integration/directCommandTty.spec.ts +185 -0
  364. package/tests/integration/discoveryEnrichmentFlow.spec.ts +209 -0
  365. package/tests/integration/exportArtifacts.spec.ts +232 -0
  366. package/tests/integration/exportFallbackFlow.spec.ts +115 -0
  367. package/tests/integration/mcpDegradationFlow.spec.ts +231 -0
  368. package/tests/integration/mcpTransportFlow.spec.ts +178 -0
  369. package/tests/integration/newSessionFlow.spec.ts +406 -0
  370. package/tests/integration/pocGithubMcp.spec.ts +224 -0
  371. package/tests/integration/pocLocalFallback.spec.ts +205 -0
  372. package/tests/integration/pocScaffold.spec.ts +220 -0
  373. package/tests/integration/ralphLoopFlow.spec.ts +430 -0
  374. package/tests/integration/ralphLoopPartial.spec.ts +416 -0
  375. package/tests/integration/resumeAndBacktrack.spec.ts +278 -0
  376. package/tests/integration/spinnerLifecycle.spec.ts +270 -0
  377. package/tests/integration/summarizationFlow.spec.ts +135 -0
  378. package/tests/integration/testRunnerReal.spec.ts +63 -0
  379. package/tests/integration/webSearchAgent.spec.ts +155 -0
  380. package/tests/live/copilotSdkLive.spec.ts +149 -0
  381. package/tests/live/zavaFullWorkshop.spec.ts +515 -0
  382. package/tests/setup/loadEnv.ts +5 -0
  383. package/tests/unit/cli/developCommand.spec.ts +679 -0
  384. package/tests/unit/cli/directCommands.spec.ts +325 -0
  385. package/tests/unit/cli/envLoader.spec.ts +73 -0
  386. package/tests/unit/cli/ioContext.spec.ts +148 -0
  387. package/tests/unit/cli/preflight.spec.ts +125 -0
  388. package/tests/unit/cli/statusCommand.spec.ts +134 -0
  389. package/tests/unit/cli/workshopClientFallback.spec.ts +100 -0
  390. package/tests/unit/cli/workshopCommand.spec.ts +378 -0
  391. package/tests/unit/config/vitestEnvSetup.spec.ts +24 -0
  392. package/tests/unit/develop/checkpointState.spec.ts +378 -0
  393. package/tests/unit/develop/codeGenerator.spec.ts +447 -0
  394. package/tests/unit/develop/githubMcpAdapter.spec.ts +283 -0
  395. package/tests/unit/develop/mcpContextEnricher.spec.ts +564 -0
  396. package/tests/unit/develop/outputValidator.spec.ts +134 -0
  397. package/tests/unit/develop/pocScaffolder.spec.ts +451 -0
  398. package/tests/unit/develop/ralphLoop.spec.ts +1439 -0
  399. package/tests/unit/develop/templateRegistry.spec.ts +106 -0
  400. package/tests/unit/develop/testRunner.spec.ts +294 -0
  401. package/tests/unit/infraBicep.spec.ts +116 -0
  402. package/tests/unit/infraDeploy.spec.ts +102 -0
  403. package/tests/unit/infraTeardown.spec.ts +77 -0
  404. package/tests/unit/logging/logger.spec.ts +50 -0
  405. package/tests/unit/loop/conversationLoop.spec.ts +719 -0
  406. package/tests/unit/loop/phaseSummarizer.spec.ts +169 -0
  407. package/tests/unit/loop/streamingMarkdown.spec.ts +180 -0
  408. package/tests/unit/mcp/mcpManager.spec.ts +336 -0
  409. package/tests/unit/mcp/mcpTransport.spec.ts +689 -0
  410. package/tests/unit/mcp/retryPolicy.spec.ts +278 -0
  411. package/tests/unit/mcp/timeoutValidation.spec.ts +55 -0
  412. package/tests/unit/mcp/webSearch.spec.ts +718 -0
  413. package/tests/unit/phases/contextSummarizer.spec.ts +158 -0
  414. package/tests/unit/phases/discoveryEnricher.repeatCalls.spec.ts +125 -0
  415. package/tests/unit/phases/discoveryEnricher.spec.ts +512 -0
  416. package/tests/unit/phases/phaseExtractors.spec.ts +406 -0
  417. package/tests/unit/phases/phaseHandlers.spec.ts +483 -0
  418. package/tests/unit/prompts/promptLoader.spec.ts +144 -0
  419. package/tests/unit/schemas/pocSchemas.spec.ts +457 -0
  420. package/tests/unit/schemas/session.spec.ts +328 -0
  421. package/tests/unit/sessions/exportPaths.spec.ts +38 -0
  422. package/tests/unit/sessions/exportWriter.spec.ts +737 -0
  423. package/tests/unit/sessions/sessionManager.spec.ts +174 -0
  424. package/tests/unit/sessions/sessionStore.spec.ts +136 -0
  425. package/tests/unit/shared/activitySpinner.spec.ts +211 -0
  426. package/tests/unit/shared/cardsLoader.spec.ts +89 -0
  427. package/tests/unit/shared/copilotClient.spec.ts +185 -0
  428. package/tests/unit/shared/errorClassifier.spec.ts +152 -0
  429. package/tests/unit/shared/events.spec.ts +71 -0
  430. package/tests/unit/shared/markdownRenderer.spec.ts +42 -0
  431. package/tests/unit/shared/markdownRendererChunks.spec.ts +83 -0
  432. package/tests/unit/shared/tableRenderer.spec.ts +38 -0
  433. package/tsconfig.json +20 -0
  434. package/vitest.config.ts +15 -0
  435. package/vitest.live.config.ts +19 -0
@@ -0,0 +1,395 @@
1
+ # Research: MCP Transport Integration
2
+
3
+ **Feature ID**: 003-mcp-transport-integration
4
+ **Date**: 2026-03-01
5
+ **Status**: Complete
6
+
7
+ ---
8
+
9
+ ## Topic 1: GitHub Copilot SDK MCP Support (FR-019)
10
+
11
+ ### Question
12
+
13
+ Does `@github/copilot-sdk` v0.1.28 provide native MCP transport — tool dispatch, server lifecycle management, stdio/HTTP protocol handling, and authentication? Where must we build custom transport?
14
+
15
+ ### Findings
16
+
17
+ **SDK scope (v0.1.28) — UPDATED**: The Copilot SDK provides **native MCP server management** via `SessionConfig.mcpServers`. When `mcpServers` is passed to `CopilotClient.createSession()`, the SDK handles the full lifecycle: spawning stdio subprocesses, connecting to HTTP endpoints, JSON-RPC framing, and tool dispatch during LLM conversations. The SDK exposes `MCPLocalServerConfig` (type `"local"` | `"stdio"`, with `command`, `args`, `env?`, `cwd?`, `tools`, `timeout?`) and `MCPRemoteServerConfig` (type `"http"` | `"sse"`, with `url`, `headers?`, `tools`, `timeout?`). These types map directly to `.vscode/mcp.json` server entries.
18
+
19
+ **However**, the SDK's `executeToolCall()` method on `CopilotClient` is **private** — there is no public API to invoke a tool programmatically outside of an LLM conversation turn. This means:
20
+
21
+ - **LLM-initiated tool calls** (e.g., web search during conversation, discovery enrichment) can leverage SDK-managed MCP servers. The SDK spawns, connects, and dispatches tool calls as needed — no custom transport required.
22
+ - **Programmatic adapter calls** (GitHub `createRepository`, Context7 `resolve-library-id`, Azure `documentation`) cannot use the SDK. These calls are made deterministically by application code, not by the LLM. They require our custom `McpTransport` layer.
23
+
24
+ **Dual-path architecture**:
25
+
26
+ 1. **SDK-managed path (LLM conversations)**: Pass `.vscode/mcp.json` config as `mcpServers` to `sdkClient.createSession()`. The SDK handles server lifecycle, JSON-RPC, and tool dispatch during `sendAndWait()` calls. Tools are available to the LLM without additional sofIA code.
27
+ 2. **Custom transport path (programmatic adapter calls)**: `McpManager.callTool()` uses our `StdioMcpTransport` / `HttpMcpTransport` for deterministic tool invocations by adapters (GitHub, Context7, Azure). These bypass the LLM entirely.
28
+
29
+ **SDK value in this feature**: The SDK is useful for both (a) the web search case (`web.search` tool) — already registered via `WEB_SEARCH_TOOL_DEFINITION` — and (b) making any MCP server's tools available to the LLM during conversation turns. For direct MCP tool calls made by adapters (GitHub adapter, Context7 enricher), the SDK is bypassed — adapters call `McpManager.callTool()` directly without going through the LLM conversation.
30
+
31
+ **⚠️ Dual-lifecycle limitation**: If the same MCP server (e.g., `context7`) is used both by the SDK (LLM conversation) and by a custom transport (programmatic adapter call), two separate subprocesses will be spawned. This is acceptable for Feature 003 scope but should be revisited if subprocess resource usage becomes a concern in later features.
32
+
33
+ ### Decision
34
+
35
+ **Dual-path approach**:
36
+
37
+ 1. **Wire SDK-managed MCP** — Pass `.vscode/mcp.json` config (converted to `MCPServerConfig` format) to `sdkClient.createSession({ mcpServers })` so the LLM can invoke MCP tools during conversation turns. This requires updating `copilotClient.ts` to accept and forward `mcpServers`.
38
+ 2. **Build custom MCP transport in `src/mcp/mcpTransport.ts`** — For the programmatic adapter path only (GitHub, Context7, Azure). The SDK's `executeToolCall` is private, so direct tool invocation from application code requires custom transport.
39
+
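The conversion mentioned in step 1 could look roughly like the sketch below. The two config types are local mirrors of the fields listed in the findings, not imports from the SDK. The `McpJsonEntry` shape, the `toSdkServers` helper, the `string[]` type for `tools`, and the `"*"` wildcard are all assumptions made for illustration.

```typescript
// Local mirrors of the config shapes described in the findings.
// The real SDK types may differ in detail.
type LocalServerConfig = {
  type: "local" | "stdio";
  command: string;
  args: string[];
  env?: Record<string, string>;
  cwd?: string;
  tools: string[];
  timeout?: number;
};

type RemoteServerConfig = {
  type: "http" | "sse";
  url: string;
  headers?: Record<string, string>;
  tools: string[];
  timeout?: number;
};

type ServerConfig = LocalServerConfig | RemoteServerConfig;

// Assumed shape of one server entry in `.vscode/mcp.json`.
type McpJsonEntry = {
  type?: string;
  command?: string;
  args?: string[];
  env?: Record<string, string>;
  url?: string;
  headers?: Record<string, string>;
};

// Hypothetical helper: convert the `servers` map into SDK-style configs.
function toSdkServers(
  servers: Record<string, McpJsonEntry>,
): Record<string, ServerConfig> {
  const out: Record<string, ServerConfig> = {};
  for (const [name, entry] of Object.entries(servers)) {
    if (entry.url) {
      out[name] = {
        type: entry.type === "sse" ? "sse" : "http",
        url: entry.url,
        ...(entry.headers ? { headers: entry.headers } : {}),
        tools: ["*"], // assumption: "*" exposes every tool to the LLM
      };
    } else if (entry.command) {
      out[name] = {
        type: "stdio",
        command: entry.command,
        args: entry.args ?? [],
        ...(entry.env ? { env: entry.env } : {}),
        tools: ["*"], // assumption, as above
      };
    }
    // Entries with neither `url` nor `command` are skipped.
  }
  return out;
}
```

The result would then be forwarded as `mcpServers` when the session is created, while the programmatic adapter path keeps using the custom transport.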
40
+ **Rationale**: Leveraging the SDK's native `mcpServers` support reduces the amount of custom protocol code needed for LLM conversation paths. The custom transport is still necessary for deterministic adapter calls that bypass the LLM. This dual approach aligns with the SDK's design intent.
41
+
42
+ **Alternatives considered**:
43
+
44
+ - Use the SDK tool-call loop for all MCP calls — rejected because it requires the LLM to initiate every tool call, making GitHub repo creation dependent on LLM decisions rather than deterministic code.
45
+ - Use only custom transport for everything — rejected because it ignores the SDK's native MCP support, duplicating server lifecycle management the SDK already provides for LLM conversations.
46
+ - Use a third-party MCP client library (`@modelcontextprotocol/sdk`) — viable but adds a dependency. The MCP client protocol (JSON-RPC 2.0 over stdio/HTTP) is simple enough to implement directly for the adapter path. Revisit in a later feature if the protocol surface grows.
47
+
48
+ ---
49
+
50
+ ## Topic 2: MCP Transport Protocol (stdio vs HTTP)
51
+
52
+ ### Question
53
+
54
+ What protocol framing does each MCP server type require? How do stdio servers (Context7, Azure, WorkIQ, Playwright) communicate, and how do HTTP servers (GitHub MCP, Microsoft Docs MCP) differ?
55
+
56
+ ### Findings
57
+
58
+ **MCP Protocol**: JSON-RPC 2.0. Every request is a JSON object:
59
+
60
+ ```json
61
+ {
62
+ "jsonrpc": "2.0",
63
+ "id": 1,
64
+ "method": "tools/call",
65
+ "params": { "name": "toolName", "arguments": {} }
66
+ }
67
+ ```
68
+
69
+ Every response is:
70
+
71
+ ```json
72
+ { "jsonrpc": "2.0", "id": 1, "result": { "content": [{ "type": "text", "text": "..." }] } }
73
+ ```
74
+
75
+ Errors use the standard JSON-RPC error envelope.
76
+
77
+ **Stdio servers** (Context7: `npx @upstash/context7-mcp`, Azure: `npx @azure/mcp server start`, WorkIQ: `npx @microsoft/workiq mcp`):
78
+
79
+ - Spawned as child processes (`child_process.spawn` with `stdio: ['pipe','pipe','pipe']`)
80
+ - JSON-RPC messages delimited by newlines over stdin/stdout
81
+ - One subprocess per server, kept alive for the session duration
82
+ - Initialization: send an `initialize` request first, wait for its response, then send the `notifications/initialized` notification before issuing tool calls
83
+ - Auth: inherits environment variables from the parent process (e.g., `GITHUB_TOKEN`, `AZURE_SUBSCRIPTION_ID`)
84
+
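The stdio framing described above can be sketched as a small buffer that splits stdout into lines and settles a pending-request map keyed by JSON-RPC id. This is a minimal sketch, not sofIA's implementation: the class name `NewlineJsonRpcFramer` and its methods are hypothetical, and the child process itself is left out so the framing logic can be exercised in isolation.

```typescript
// Hypothetical sketch: names are illustrative, not taken from the sofIA codebase.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number; // absent on notifications
  result?: unknown;
  error?: { code: number; message: string };
};

type Pending = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
};

class NewlineJsonRpcFramer {
  private buffer = "";
  private nextId = 1;
  private pending = new Map<number, Pending>();

  /** Serialize a request as one newline-terminated line and register it as pending. */
  encodeRequest(method: string, params: unknown): { id: number; line: string; promise: Promise<unknown> } {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    const line = JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
    return { id, line, promise };
  }

  /** Feed a raw stdout chunk; returns how many complete JSON messages were parsed. */
  feed(chunk: string): number {
    this.buffer += chunk;
    let handled = 0;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) {
        try {
          this.dispatch(JSON.parse(line) as JsonRpcMessage);
          handled++;
        } catch {
          // Non-JSON line (for example stray log output): ignored in this sketch.
        }
      }
      newline = this.buffer.indexOf("\n");
    }
    return handled;
  }

  pendingCount(): number {
    return this.pending.size;
  }

  private dispatch(msg: JsonRpcMessage): void {
    if (msg.id === undefined) return; // notification: nothing to settle
    const entry = this.pending.get(msg.id);
    if (!entry) return; // unknown id: ignored in this sketch
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }
}
```

In a real transport, `line` would be written to the child's stdin and each stdout `data` chunk passed to `feed()`; a partial line stays in the buffer until its terminating newline arrives.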
85
+ **HTTP servers** (GitHub MCP: `https://api.githubcopilot.com/mcp/`, Microsoft Docs: `https://learn.microsoft.com/api/mcp`):
86
+
87
+ - Standard HTTPS POST to the server URL with `Content-Type: application/json`
88
+ - Response is JSON-RPC response body
89
+ - Auth: HTTP Authorization header — GitHub MCP uses the user's Copilot token (extracted from the SDK session context or `GITHUB_TOKEN` env var)
90
+ - No persistent connection needed; each tool call is a stateless HTTP request
91
+
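The HTTP path above amounts to building one JSON-RPC body per call and POSTing it with a timeout. The sketch below illustrates that under stated assumptions: `buildToolCallRequest`, `httpCallTool`, and the `FetchLike` type are hypothetical names, the fetch function is injected so the code does not depend on a live server, and the token is passed in by the caller rather than read inside the helper.

```typescript
// Hypothetical sketch: names are illustrative, not taken from the sofIA codebase.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Build headers and JSON-RPC body for a single tools/call request. */
function buildToolCallRequest(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
  token?: string,
): { headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`; // omitted for public servers
  const body = JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  });
  return { headers, body };
}

/** POST one tool call and abort it if it exceeds timeoutMs. */
async function httpCallTool(
  fetchImpl: FetchLike,
  url: string,
  toolName: string,
  args: Record<string, unknown>,
  timeoutMs: number,
  token?: string,
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const { headers, body } = buildToolCallRequest(1, toolName, args, token);
    const res = await fetchImpl(url, { method: "POST", headers, body, signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const payload = (await res.json()) as { result?: unknown; error?: { message: string } };
    if (payload.error) throw new Error(payload.error.message);
    return payload.result;
  } finally {
    clearTimeout(timer); // always release the timer, on success or failure
  }
}
```

A caller would typically pass the runtime's native `fetch` as `fetchImpl` and read the token from the environment immediately before the call.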
92
+ ### Decision
93
+
94
+ **Implement two transport strategies**:
95
+
96
+ 1. `StdioMcpTransport`: spawns subprocess, maintains persistent stdin/stdout pipe, sequences requests with pending-request map keyed by JSON-RPC id.
97
+ 2. `HttpMcpTransport`: wraps native `fetch()`, adds `Authorization` header from env `GITHUB_TOKEN`, sets timeout via `AbortController`.
98
+
99
+ Both implement a common `McpTransport` interface:
100
+
101
+ ```typescript
102
+ interface McpTransport {
103
+   callTool(
104
+     toolName: string,
105
+     args: Record<string, unknown>,
106
+     timeoutMs: number,
107
+   ): Promise<ToolCallResponse>;
108
+   isConnected(): boolean;
109
+   disconnect(): Promise<void>;
110
+ }
111
+ ```
112
+
113
+ **Rationale**: Clean abstraction isolates protocol details from `McpManager`, making both easily testable with mock transports.
114
+
115
+ **Alternatives considered**:
116
+
117
+ - Single class with type switch — rejected because it conflates two fundamentally different I/O models.
118
+ - Streaming responses (SSE) — deferred; none of the current MCP servers require streaming at the tool-call level.
119
+
120
+ ---
121
+
122
+ ## Topic 3: Authentication Model Per Transport
123
+
124
+ ### Question
125
+
126
+ How does each MCP server authenticate tool calls? What credentials are needed and how are they passed?
127
+
128
+ ### Findings
129
+
130
+ | Server | Transport | Auth Mechanism | Source |
131
+ | ------------------- | --------- | -------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
132
+ | `github` | HTTP | `Authorization: Bearer <token>` | `GITHUB_TOKEN` env var (set by Copilot environment) |
133
+ | `context7` | stdio | None required for public package | npx subprocess, no auth needed |
134
+ | `azure` | stdio | Azure identity from env (`AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`/DefaultAzureCredential) | Subprocess inherits env |
135
+ | `workiq` | stdio | Microsoft 365 OAuth — WorkIQ handles its own auth flow when launched | Subprocess prompts user or reads cached token |
136
+ | `microsoftdocs/mcp` | HTTP | None (public API) | No auth header needed |
137
+
138
+ **Key finding for WorkIQ**: WorkIQ's own auth flow means sofIA does not need to implement OAuth — it only needs to spawn the subprocess. However, WorkIQ will prompt the user to authenticate on first use. This is handled transparently by the subprocess.
139
+
140
+ **Key finding for GitHub HTTP transport**: The `GITHUB_TOKEN` environment variable is the standard Copilot-set credential. The `HttpMcpTransport` reads it at call time (not at startup), so the transport never caches its own copy of the token.
141
+
142
+ ### Decision
143
+
144
+ Auth is **transport-level, not application-level**:
145
+
146
+ - `HttpMcpTransport` reads `process.env.GITHUB_TOKEN` per call, adds it as Bearer token.
147
+ - `StdioMcpTransport` passes the parent's `process.env` to the subprocess unchanged.
148
+ - No custom auth abstraction is needed. WorkIQ's own flow covers M365 auth.
149
+
150
+ ---
151
+
152
+ ## Topic 4: Retry Policy
153
+
154
+ ### Question
155
+
156
+ Which error types are transient (retryable) vs permanent (must not retry)?
157
+
158
+ ### Findings
159
+
160
+ From the spec (FR-004a) and error classification in `classifyMcpError()`:
161
+
162
+ | Error Class | Retryable | Examples |
163
+ | ---------------------- | ------------- | --------------------------------------------------- |
164
+ | `connection-refused` | ✅ Yes | MCP subprocess not yet ready, temporary port in use |
165
+ | `timeout` | ✅ Yes | Server temporarily slow, network hiccup |
166
+ | `dns-failure` | ✅ Yes (once) | Transient DNS issues |
167
+ | `auth-failure` | ❌ No | Invalid token, expired credentials, 401/403 HTTP |
168
+ | `unknown` | ❌ No | Malformed JSON response, logic errors |
169
+ | Validation error (Zod) | ❌ No | Bad args, schema mismatch |
170
+
171
+ **Backoff**: 1 second initial delay (jittered ±20%) and a maximum of 1 retry, as the spec requires one automatic retry. The 2x exponential factor only takes effect if the retry limit is raised later.
172
+
173
+ ### Decision
174
+
175
+ **`retryPolicy.ts`** exports a `withRetry<T>(fn, options)` helper that wraps any async call:
176
+
177
+ - Retries once on transient errors after `initialDelayMs` (default 1000ms) with ±20% jitter.
178
+ - Does not retry on auth failures, validation errors, or unknown errors.
179
+ - On retry, logs a `warn` entry with server name, tool name, attempt number, and delay.
180
+
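A minimal sketch of the retry helper described above, assuming the error classes from the table. The `RetryOptions` shape, the injected `classify` function, and the `random` parameter are assumptions made for illustration; the real `withRetry` signature in `retryPolicy.ts` may differ. Zod validation errors are assumed to be thrown before the wrapped call or classified as `unknown`, so they are never retried.

```typescript
// Hypothetical sketch: option names and helper functions are illustrative.
type McpErrorClass = "connection-refused" | "timeout" | "dns-failure" | "auth-failure" | "unknown";

const RETRYABLE: ReadonlySet<McpErrorClass> = new Set<McpErrorClass>([
  "connection-refused",
  "timeout",
  "dns-failure",
]);

function isRetryable(errorClass: McpErrorClass): boolean {
  return RETRYABLE.has(errorClass);
}

/** Apply ±20% jitter to a base delay; `random` is injectable for deterministic tests. */
function jitteredDelay(baseMs: number, random: () => number = Math.random): number {
  const jitter = (random() * 2 - 1) * 0.2; // in [-0.2, 0.2)
  return Math.round(baseMs * (1 + jitter));
}

interface RetryOptions {
  classify: (err: unknown) => McpErrorClass;
  initialDelayMs?: number;
  onRetry?: (info: { attempt: number; delayMs: number }) => void;
}

/** Run fn; on a transient error wait once and retry once. Other errors propagate immediately. */
async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    if (!isRetryable(options.classify(err))) throw err;
    const delayMs = jitteredDelay(options.initialDelayMs ?? 1000);
    options.onRetry?.({ attempt: 2, delayMs }); // hook for the `warn` log entry
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return fn(); // a second failure propagates to the caller
  }
}
```

Keeping classification outside the helper lets the same wrapper serve both transports, since each can map its own failures onto the shared error classes.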
181
+ ---
182
+
183
+ ## Topic 5: `pushFiles` Empty Content Bug (GAP-003)
184
+
185
+ ### Question
186
+
187
+ Where exactly does `ralphLoop.ts` send empty file content, and what is the minimal fix?
188
+
189
+ ### Findings
190
+
191
+ In `ralphLoop.ts` lines 527–548, the current code already reads file content from disk using `readFile(resolve(outputDir, f), 'utf-8')` in the success-path push. The `content: ''` fallback at line 540 is reached only when `readFile` throws (file unreadable). The **actual bug** is that the initial scaffold is never pushed this way: only `applyResult.writtenFiles` paths are pushed, starting with iteration 2.
192
+
193
+ The GAP-003 bug noted in `specs/003-next-spec-gaps.md` was based on an earlier version of the code. The current implementation at lines 527–548 does read file content, and the `content: ''` at line 540 is a **fallback for unreadable files** (correct behavior: skip unreadable rather than push garbage). What is missing is the first scaffold push (before iteration 2): files created by `PocScaffolder` are never pushed to GitHub.
194
+
195
+ **Fix needed**: After the initial scaffold completes and a GitHub repo is created, push the scaffold files once (all `scaffoldResult.createdFiles`) with their actual on-disk content. The per-iteration push in the main loop already reads content correctly.
196
+
197
+ ### Decision
198
+
199
+ Add a post-scaffold push in `RalphLoop.run()` immediately after the npm install step, reading all scaffold file contents from disk (same `readFile` pattern already used in the iteration push block).
200
+
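The post-scaffold push described above could collect its payload roughly as follows. This is a sketch under assumptions: `collectScaffoldFiles`, `isInsideDir`, and the `PushFile` shape are hypothetical names, and unlike the existing iteration block (which falls back to `content: ''`), this version drops unreadable files from the list entirely and logs a warning.

```typescript
import { readFile } from "node:fs/promises";
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical sketch: names are illustrative, not taken from ralphLoop.ts.
interface PushFile {
  path: string;
  content: string;
}

/** True when filePath resolves to a location strictly inside outputDir. */
function isInsideDir(outputDir: string, filePath: string): boolean {
  const rel = relative(resolve(outputDir), resolve(outputDir, filePath));
  if (rel === "" || rel === ".." || rel.startsWith(".." + sep)) return false;
  return !isAbsolute(rel); // different drive on Windows yields an absolute path
}

/** Read every scaffold file from disk so the push carries real content. */
async function collectScaffoldFiles(
  outputDir: string,
  createdFiles: string[],
  warn: (msg: string) => void = console.warn,
): Promise<PushFile[]> {
  const files: PushFile[] = [];
  for (const f of createdFiles) {
    if (!isInsideDir(outputDir, f)) {
      warn(`Skipping ${f}: outside output directory`);
      continue;
    }
    try {
      files.push({ path: f, content: await readFile(resolve(outputDir, f), "utf-8") });
    } catch {
      warn(`Skipping ${f}: unreadable`);
    }
  }
  return files;
}
```

The resulting array would be handed to the same `pushFiles` call the iteration block uses, once the repository exists.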
201
+ ---
202
+
203
+ ## Topic 6: Discovery Phase Enrichment Architecture (GAP-005)
204
+
205
+ ### Question
206
+
207
+ Where in the discovery phase flow should web search and WorkIQ enrichment be inserted? How is `DiscoveryEnrichment` stored in the session?
208
+
209
+ ### Findings
210
+
211
+ From `src/phases/phaseHandlers.ts` and the spec (FR-014 through FR-018):
212
+
213
+ - Step 1 of the discovery phase collects company and team information and stores it in `DiscoveryState`.
214
+ - After this collection, the enricher should be triggered.
215
+ - The session schema (`src/shared/schemas/session.ts`) has a `DiscoveryState` object that currently does not have an enrichment field.
216
+
217
+ **Flow design**:
218
+
219
+ 1. User completes Step 1 input.
220
+ 2. `discoveryEnricher.ts` is invoked with the step 1 summary.
221
+ 3. It calls `web.search` via `webSearch.ts` (already implemented) for company/competitor/industry queries.
222
+ 4. It optionally calls WorkIQ after showing a permission prompt.
223
+ 5. Results are stored in `session.discovery.enrichment: DiscoveryEnrichment`.
224
+ 6. The enrichment is referenced in subsequent phase prompts (ideation, planning).
225
+
226
+ **WorkIQ permission**: Implemented as an `@inquirer/prompts` `confirm` prompt before any WorkIQ call. If the user declines, or the WorkIQ subprocess is not available, enrichment is skipped silently.
227
+
228
+ ### Decision
229
+
230
+ New module `src/phases/discoveryEnricher.ts` with a `DiscoveryEnricher` class that has:
231
+
232
+ - `enrichFromWebSearch(companySummary, mcpManager): Promise<Partial<DiscoveryEnrichment>>`
233
+ - `enrichFromWorkIQ(companySummary, mcpManager, io): Promise<Partial<DiscoveryEnrichment>>`
234
+ - `enrich(companySummary, mcpManager, io): Promise<DiscoveryEnrichment>` — orchestrates both, handles graceful degradation.
235
+
236
+ `DiscoveryEnrichment` is added to the Zod session schema as an optional field on `DiscoveryState`.
237
+
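The graceful-degradation behavior of `enrich()` can be illustrated with a sketch in which each enrichment source is an independent step whose failure is logged and skipped. The field names on `DiscoveryEnrichment` below are hypothetical (the spec only says it is a flat structure with optional string arrays and a nested WorkIQ insights object), and a plain interface stands in for the Zod schema.

```typescript
// Hypothetical sketch: field and function names are illustrative assumptions.
interface DiscoveryEnrichment {
  companyNews?: string[];
  competitorInfo?: string[];
  industryTrends?: string[];
  workiqInsights?: { teamExpertise?: string[]; collaborationPatterns?: string[] };
}

type EnrichStep = () => Promise<Partial<DiscoveryEnrichment>>;

/** Shallow-merge partial results; later steps override earlier ones per field. */
function mergeEnrichment(parts: Array<Partial<DiscoveryEnrichment>>): DiscoveryEnrichment {
  return parts.reduce<DiscoveryEnrichment>((acc, part) => ({ ...acc, ...part }), {});
}

/** Run each step in order; a failing step is logged and does not block the others. */
async function enrich(
  steps: EnrichStep[],
  warn: (msg: string) => void = console.warn,
): Promise<DiscoveryEnrichment> {
  const parts: Array<Partial<DiscoveryEnrichment>> = [];
  for (const step of steps) {
    try {
      parts.push(await step());
    } catch (err) {
      warn(`Enrichment step failed, continuing: ${String(err)}`);
    }
  }
  return mergeEnrichment(parts);
}
```

With this shape, the web search and WorkIQ methods each become one step, and a declined WorkIQ prompt can simply resolve to an empty object.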
238
+ ---
239
+
240
+ ## Topic 7: Copilot SDK Agent Architecture Alignment (FR-020)
241
+
242
+ ### Question
243
+
244
+ Do the current agent definitions (discovery, ideation, design, select, plan, develop) need structural changes to align with Copilot SDK v0.1.28 patterns?
245
+
246
+ ### Findings
247
+
248
+ The Copilot SDK v0.1.28 agent model:
249
+
250
+ - An "agent" is a function/class that creates sessions via `CopilotClient.createSession(options)` where `options.systemPrompt` is the agent's identity and `options.tools` is the tool set.
251
+ - Tools are declared as `ToolDefinition[]`; there is no formal "agent registry" API in v0.1.28.
252
+ - The SDK dispatches `tool_call` events which the host handles in the conversation loop.
253
+
254
+ **Current sofIA agents**: Each phase (discovery, ideation, design, select, plan, develop) uses `createFakeCopilotClient` in tests and the real SDK client in production. They declare tools via `SessionOptions.tools`. This matches the SDK's expected pattern exactly.
255
+
256
+ **No structural misalignment found**. The current architecture correctly uses:
257
+
258
+ - `CopilotClient.createSession()` → one session per phase turn.
259
+ - `ToolDefinition` objects for capability declaration.
260
+ - Conversation loop event handling for tool dispatch.
261
+
262
+ **What FR-020 means in practice**: Ensure that when `callTool()` is implemented, direct adapter calls (GitHub, Context7, Azure) do NOT go through the LLM session — they are application-layer calls. This is already the design. The "alignment" is confirming the SDK does not provide a competing pattern that we should use instead.
263
+
264
+ ### Decision
265
+
266
+ **No agent refactoring needed**. The existing architecture is already SDK-aligned. FR-020 is satisfied by documenting this finding in `research.md` (this document) and ensuring `McpManager.callTool()` is an application-layer call, not routed through LLM sessions.
267
+
268
+ ---
269
+
270
+ ## Topic 8: SDK Hooks, Events, and CLI Transparency (FR-021, FR-022, FR-024)
271
+
272
+ ### Question
273
+
274
+ How should sofIA use the Copilot SDK's hooks and event system to provide real-time visibility into tool activity, errors, and usage — satisfying Constitution Principle VIII (CLI-First UX & Transparency)?
275
+
276
+ ### Findings
277
+
278
+ The Copilot SDK v0.1.28 provides two complementary mechanisms for runtime visibility:
279
+
280
+ **Hooks** (via `SessionConfig.hooks`):
281
+
282
+ Six lifecycle hooks are available:
283
+
284
+ - `onPreToolUse(toolName, toolArgs, context)` — fired before every tool call; returns `{ permissionDecision, reason }` to allow/deny
285
+ - `onPostToolUse(toolResult, context)` — fired after every tool call; can modify or log results
286
+ - `onUserPromptSubmitted(prompt, context)` — modify user prompts before processing
287
+ - `onSessionStart(context)` — add additional context at session start
288
+ - `onSessionEnd(context)` — cleanup/analytics
289
+ - `onErrorOccurred(error, context)` — custom error handling for LLM-path errors
290
+
291
+ **Events** (via `session.on()`/`session.once()`):
292
+
293
+ 40+ event types are available, including:
294
+
295
+ - `assistant.usage` — token usage per turn (input/output tokens)
296
+ - Streaming delta events for real-time content display
297
+ - Tool call lifecycle events
298
+
299
+ **Key insight for CLI transparency**: `onPreToolUse` and `onPostToolUse` are the standard mechanism to implement Constitution Principle VIII's requirement that _"Users MUST always see the current execution state."_ Currently, sofIA's spinner shows phase-level activity but does NOT show individual MCP tool calls being made during LLM conversation turns. The SDK hooks are the native way to emit this visibility.
300
+
301
+ **`onErrorOccurred` for LLM-path errors**: FR-004's `classifyMcpError()` only covers the custom transport path (adapter calls). Errors during SDK-managed tool calls (LLM conversation path) are handled by the SDK internally. The `onErrorOccurred` hook allows sofIA to log, surface, or recover from these errors — complementing the custom transport error handling.
302
+
303
+ **`onPermissionRequest` for tool approval**: The SDK provides an `onPermissionRequest` handler that implements deny-by-default tool approval. This was evaluated as an alternative to `io.prompt()` for WorkIQ consent (FR-016). Decision: defer in favor of the existing `io.prompt()` pattern, which is consistent with other interactive prompts in the discovery phase and supports custom consent UX.
304
+
305
+ **`assistant.usage` for token tracking**: Subscribing to the `assistant.usage` event provides per-turn token counts. This can be logged at `debug` level and optionally displayed in the spinner for transparency during long-running sessions.
306
+
307
+ ### Decision
308
+
309
+ **Wire SDK hooks for tool-call visibility**:
310
+
311
+ 1. Add `hooks` support to `SessionOptions` in `copilotClient.ts`.
312
+ 2. Wire `onPreToolUse` to emit a `tool:start` activity event (tool name) to the CLI spinner.
313
+ 3. Wire `onPostToolUse` to emit a `tool:end` activity event (tool name, duration) to the CLI spinner.
314
+ 4. Wire `onErrorOccurred` to log SDK-path errors at `warn` level via the existing pino logger.
315
+ 5. Subscribe to `assistant.usage` events and log token usage at `debug` level.
316
+
317
+ **Defer**: `onPermissionRequest` (use `io.prompt()` instead for WorkIQ consent), `onUserPromptSubmitted` (no current use case), `onSessionStart`/`onSessionEnd` (no current use case beyond what's already handled).
318
+
319
+ **Rationale**: Hooks are the SDK-native mechanism for the transparency that Constitution Principle VIII requires. Without them, MCP tool calls during LLM conversation turns are invisible to the user. The implementation is low-effort: forward hooks to `createSession()`, emit events to the existing spinner infrastructure in `src/shared/events.ts`.
320
+
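The hook wiring in the decision above might look like the following sketch. The hook signatures follow the list in Findings, but the types are local stand-ins rather than imports from the SDK, the `"allow"`/`"deny"` values for `permissionDecision` are an assumption, and `createToolActivityHooks` is a hypothetical name. Because `onPostToolUse` is listed without a tool name argument, the sketch assumes tool calls within a session complete in the order they start and pairs each end with the oldest in-flight start.

```typescript
// Hypothetical sketch: local stand-in types, not the SDK's own definitions.
type ActivityEvent =
  | { type: "tool:start"; toolName: string }
  | { type: "tool:end"; toolName: string; durationMs: number };

interface ToolHooks {
  onPreToolUse(
    toolName: string,
    toolArgs: unknown,
    context: unknown,
  ): { permissionDecision: "allow" | "deny"; reason?: string };
  onPostToolUse(toolResult: unknown, context: unknown): void;
}

/** Build hooks that forward tool activity to an emitter (for example the CLI spinner). */
function createToolActivityHooks(
  emit: (event: ActivityEvent) => void,
  now: () => number = Date.now, // injectable clock for deterministic tests
): ToolHooks {
  const inFlight: Array<{ toolName: string; startedAt: number }> = [];
  return {
    onPreToolUse(toolName) {
      inFlight.push({ toolName, startedAt: now() });
      emit({ type: "tool:start", toolName });
      return { permissionDecision: "allow" }; // visibility only: never blocks a call
    },
    onPostToolUse() {
      const started = inFlight.shift();
      if (!started) return; // post without a matching pre: nothing to report
      emit({ type: "tool:end", toolName: started.toolName, durationMs: now() - started.startedAt });
    },
  };
}
```

The returned object would be forwarded through `SessionOptions` to `createSession()`, with `emit` bound to the existing spinner event infrastructure.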
321
+ ---
322
+
323
+ ## Topic 9: SDK Advanced Session Features — infiniteSessions, customAgents, skillDirectories (FR-023)
324
+
325
+ ### Question
326
+
327
+ Do the SDK's `infiniteSessions`, `customAgents`, and `skillDirectories` features offer advantages over sofIA's current implementation patterns?
328
+
329
+ ### Findings
330
+
331
+ **`infiniteSessions` config**:
332
+
333
+ - Controls context window management for long-running sessions.
334
+ - `backgroundCompactionThreshold` (default 0.7): triggers background context compaction when usage exceeds this ratio.
335
+ - `bufferExhaustionThreshold` (default 0.9): forces compaction to prevent context overflow.
336
+ - **Direct relevance**: The Ralph Loop runs extended multi-iteration conversations (up to `maxIterations` turns with code generation, test output, and enrichment context). Without `infiniteSessions`, long runs risk silently truncating conversation history, losing important context about failing tests or previous code changes.
337
+ - **Current gap**: Neither the spec nor the tasks configure `infiniteSessions`. The Ralph Loop could hit context limits on iteration 8+ with verbose test output.
338
+
339
+ **`customAgents` config**:
340
+
341
+ - Allows defining multiple agent personas within a single session.
342
+ - Each agent has its own system prompt, tools, and capabilities.
343
+ - The SDK handles agent switching within the session.
344
+ - **sofIA's current pattern**: Creates a new session per phase via `createSession()`. Each phase has its own system prompt and tools.
345
+ - **Evaluation**: `customAgents` would allow all phases to share a single session, but sofIA intentionally isolates phases with separate sessions for:
346
+   - Clean context boundaries between workshop phases
347
+   - Independent session history per phase
348
+   - Ability to checkpoint/resume individual phases
349
+ - **Conclusion**: The per-phase session pattern is deliberate and offers advantages that `customAgents` would sacrifice. No change needed.
350
+
351
+ **`skillDirectories` config**:
352
+
353
+ - Skills are named directories containing `SKILL.md` files (markdown with optional YAML frontmatter).
354
+ - Content is injected into the session context.
355
+ - Can disable specific skills via `disabledSkills`.
356
+ - **sofIA's current pattern**: `promptLoader.ts` loads prompts from `src/prompts/` as markdown files, injects them as system prompts via `SessionOptions.systemMessage`.
357
+ - **Evaluation**: `skillDirectories` could replace `promptLoader.ts` for phase-specific prompts, but:
358
+   - `promptLoader.ts` already works correctly and is tested.
359
+   - Skills are additive context injection, not primary system prompts, so they are semantically different.
360
+   - Migration would add complexity without clear benefit.
361
+ - **Conclusion**: Keep `promptLoader.ts`. Skills could be used for supplementary context (e.g., workshop materials, card decks) in future features.
362
+
363
+ **`resumeSession(sessionId)` + session persistence**:
364
+
365
+ - Sessions persist at `~/.copilot/session-state/{sessionId}/` with checkpoints, plan.md, files.
366
+ - `resumeSession()` restores conversation history, tool call results, agent planning state.
367
+ - **Current "Out of Scope" deferral**: "Resume/checkpoint for `sofia dev`" (GAP-006 P2) was deferred assuming significant implementation effort.
368
+ - **SDK reality**: The SDK handles persistence natively — sofIA only needs to pass a structured `sessionId` and call `resumeSession()`. Implementation complexity is much lower than originally assessed.
369
+ - **Conclusion**: Keep deferred (different feature scope) but note reduced complexity in spec's Out of Scope section.
370
+
371
+ ### Decision
372
+
373
+ | Feature | Decision | Rationale |
374
+ | ------------------ | --------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
375
+ | `infiniteSessions` | **Wire for Ralph Loop sessions** | Prevents context window exhaustion in extended iterations; low implementation effort |
376
+ | `customAgents` | **Defer — current per-phase sessions are deliberate** | Phase isolation provides clean context boundaries and independent checkpointing |
377
+ | `skillDirectories` | **Defer — `promptLoader.ts` is sufficient** | Current approach works; skills could supplement in future features |
378
+ | `resumeSession` | **Defer (different feature scope) but note reduced complexity** | SDK makes this nearly trivial; updated Out of Scope note in spec.md |
379
+
380
+ ---
381
+
382
+ ## Summary of Decisions
383
+
384
+ | Topic | Decision |
385
+ | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
386
+ | SDK MCP support | SDK provides native `mcpServers` in `createSession()` for LLM conversations; build custom `mcpTransport.ts` only for programmatic adapter calls (GitHub, Context7, Azure) |
387
+ | stdio transport | `child_process.spawn` + JSON-RPC 2.0 newline-delimited over stdin/stdout |
388
+ | HTTP transport | Native `fetch()` + `AbortController` timeout + `Authorization: Bearer` |
389
+ | Auth model | Transport-level: env var for HTTP, subprocess env inheritance for stdio |
390
+ | Retry policy | 1 retry max, transient errors only, 1s base delay ±20% jitter |
391
+ | pushFiles bug | Add post-scaffold push; per-iteration push already correct |
392
+ | Discovery enrichment | New `discoveryEnricher.ts`; `DiscoveryEnrichment` in session schema |
393
+ | Agent alignment | No refactoring needed; current SDK usage is correct |
394
+ | SDK hooks & events | Wire `onPreToolUse`/`onPostToolUse` for CLI spinner visibility; `onErrorOccurred` for LLM-path errors; `assistant.usage` for token tracking |
395
+ | SDK session features | `infiniteSessions` wired for Ralph Loop; `customAgents` deferred (per-phase sessions deliberate); `skillDirectories` deferred (`promptLoader.ts` sufficient); session persistence noted as low-complexity |
@@ -0,0 +1,234 @@
1
+ # Feature Specification: MCP Transport Integration
2
+
3
+ **Feature Branch**: `003-mcp-transport-integration`
4
+ **Created**: 2026-03-01
5
+ **Status**: Complete
6
+ **Completed**: 2026-03-02
7
+ **Upstream Dependency**: specs/002-poc-generation/spec.md (Ralph Loop, GitHub adapter, context enricher)
8
+ **Input**: User description: "Implement real MCP tool invocation layer connecting McpManager to actual MCP server transports, enabling GitHub repository creation, Context7 documentation lookup, Azure architecture guidance, and web search capabilities to function in production"
9
+
10
+ ## Overview
11
+
12
+ Feature 002 built the Ralph Loop iteration engine and all its MCP-powered components — GitHub adapter, Context7 enricher, Azure enricher, web search enricher — but every MCP tool call is currently a stub that either returns fake data or throws "not yet wired to transport." This feature implements the actual MCP transport layer so that `McpManager.callTool()` dispatches real requests to configured MCP servers.
13
+
14
+ Additionally, this feature wires the discovery phase to use web search and WorkIQ MCP tools for gathering company and industry context, and researches how the GitHub Copilot SDK structures agent definitions and MCP tool routing to ensure architectural alignment.
15
+
16
+ **Gaps addressed**: GAP-001, GAP-002, GAP-003, GAP-004, GAP-005, GAP-006 (P1), GAP-007 (P1) from `specs/003-next-spec-gaps.md`.
17
+ **Gaps deferred**: GAP-006 (P2, resume/checkpoint), GAP-007 (P2, `--force`), GAP-008 (P2, testRunner coverage), GAP-009 (P2, template selection) — see Out of Scope.
18
+
19
+ ## Clarifications
20
+
21
+ ### Session 2026-03-01
22
+
23
+ - Q: Should P2 gaps (resume/checkpoint, --force, testRunner coverage, template selection) be included in this feature spec or deferred? → A: Explicitly defer all P2 gaps to a separate feature spec; add Out of Scope section.
24
+ - Q: How should MCP authentication work across transport types (stdio env, HTTP SDK, WorkIQ OAuth)? → A: Defer auth design to implementation research (FR-019); determine correct pattern per transport during SDK research phase.
25
+ - Q: Should failed MCP tool calls be retried automatically? → A: One automatic retry with backoff for transient errors (connection refused, timeout); no retry for auth/validation errors.
26
+ - Q: What is the schema shape of DiscoveryEnrichment? → A: Flat structure with optional string arrays and nested WorkIQ insights object.
27
+ - Q: What if the Copilot SDK doesn't provide MCP integration or agent registration patterns? → A: Research SDK first; use built-in MCP support where available; build custom transport only where SDK lacks coverage; modify existing code to align with SDK patterns.
28
+
29
+ ## Out of Scope
30
+
31
+ The following items from `specs/003-next-spec-gaps.md` are explicitly deferred to a subsequent feature spec to limit scope risk:
32
+
33
+ - **Resume/checkpoint for `sofia dev`** (P2 GAP-006) — Detect existing PoC directory and resume from last iteration instead of re-scaffolding. _Note: SDK provides native `resumeSession(sessionId)` with state stored at `~/.copilot/session-state/` — implementation complexity is lower than originally assessed (see research.md Topic 9)._
34
+ - **`--force` flag implementation** (P2 GAP-007) — Honor the declared `--force` CLI option to delete existing output and restart.
35
+ - **testRunner.ts coverage hardening** (P2 GAP-008) — Add spawn-based integration tests for child process spawning, timeout, and SIGTERM/SIGKILL paths.
36
+ - **PoC template selection** (P2 GAP-009) — Define a template registry mapping plan characteristics to scaffold templates (e.g., Python/FastAPI).
37
+ - **Generated scaffold TODOs** (P3 GAP-009) — Tracking intentional TODO markers in generated code for template quality.
38
+ - **PTY-based E2E tests** (P3 GAP-010) — Interactive terminal tests for `sofia dev` Ctrl+C handling and spinner output.
39
+ - **Workshop→develop phase transition** (P3 GAP-011) — Whether `workshop` should auto-invoke the Ralph loop after Plan completion.
40
+
41
+ ## User Scenarios & Testing _(mandatory)_
42
+
43
+ ### User Story 1 — MCP Tool Calls Work in Production (Priority: P1)
44
+
45
+ As a facilitator running `sofia dev`, I want the Ralph Loop to create a real GitHub repository, query Context7 for library documentation, fetch Azure architecture guidance, and perform web searches when stuck — so that the PoC generation pipeline works end-to-end in a properly configured environment.
46
+
47
+ **Why this priority**: Without working MCP transport, the entire PoC generation pipeline produces degraded output. Every MCP-dependent component returns hardcoded or simulated data, making the Ralph Loop significantly less effective.
48
+
49
+ **Independent Test**: Configure a test environment with a mock MCP server implementing the MCP protocol, run `sofia dev` on a session with a plan referencing Azure services and npm dependencies, and verify that real tool calls are dispatched and real responses are used in the LLM prompt.
50
+
51
+ **Acceptance Scenarios**:
52
+
53
+ 1. **Given** a configured MCP environment with GitHub MCP available, **When** the Ralph Loop scaffolds a PoC, **Then** `createRepository` dispatches a `create_repository` tool call via the MCP transport and returns the real repository URL from the server response.
54
+ 2. **Given** a configured MCP environment with Context7 available, **When** the context enricher queries library docs for a dependency (e.g., "express"), **Then** `queryContext7` dispatches `resolve-library-id` followed by `query-docs` tool calls via MCP transport and returns real documentation text.
55
+ 3. **Given** a configured MCP environment with Azure MCP available, **When** the plan references Azure services, **Then** `queryAzureMcp` dispatches a `documentation` tool call and returns real architecture guidance.
56
+ 4. **Given** all MCP servers are unavailable, **When** the Ralph Loop runs, **Then** all adapters degrade gracefully to local fallbacks (local scaffold, static context strings) without errors or crashes.
57
+
58
+ ---
59
+
60
+ ### User Story 2 — GitHub MCP Pushes Real File Content (Priority: P1)
61
+
62
+ As a facilitator, I want the Ralph Loop to push actual file content (not empty strings) to the GitHub repository after each iteration, so that the remote repository always reflects the current state of the PoC.
63
+
64
+ **Why this priority**: Even with MCP transport working, pushing empty files makes the GitHub integration useless. This is a data-flow bug that must be fixed alongside the transport layer.
65
+
66
+ **Independent Test**: Run a Ralph Loop iteration that modifies files, verify that `pushFiles` sends the actual file content read from disk to the MCP server.
67
+
68
+ **Acceptance Scenarios**:
69
+
70
+ 1. **Given** the Ralph Loop has applied code changes in an iteration, **When** `pushFiles` is called, **Then** each file in the push request contains the actual content read from disk (not empty strings).
71
+ 2. **Given** a file path is outside the output directory, **When** `pushFiles` prepares the file list, **Then** that file is skipped with a warning logged.
72
+
73
+ ---
74
+
75
+ ### User Story 3 — Discovery Phase Uses Web Search and WorkIQ (Priority: P2)
76
+
77
+ As a facilitator running a discovery workshop, I want sofIA to optionally search the web for recent company news, competitor activity, and industry trends after I provide company information, so that the ideation phase is informed by current market context.
78
+
79
+ **Why this priority**: Enriching the discovery phase with real-world context improves PoC relevance, but the core pipeline works without it. This is an enhancement to workshop quality.
80
+
81
+ **Independent Test**: Start a workshop session, provide company and team information, verify that sofIA offers to search for relevant context and stores the results in the session for later phases.
82
+
83
+ **Acceptance Scenarios**:
84
+
85
+ 1. **Given** the user has described their business in Step 1, **When** the discovery agent processes this input, **Then** it offers to search the web for recent news about the company, competitors, and industry trends.
86
+ 2. **Given** the user consents to web search enrichment, **When** the search completes, **Then** the results are stored in the session's discovery state and are available to inform ideation and planning phases.
87
+ 3. **Given** web search MCP is unavailable, **When** the discovery agent tries to enrich context, **Then** it skips enrichment gracefully with a message explaining that web search is not available.
88
+
89
+ ---
90
+
91
+ ### User Story 4 — WorkIQ Integration for Internal Context (Priority: P3)
92
+
93
+ As a facilitator, I want sofIA to optionally query WorkIQ to analyze internal documentation, meeting patterns, and team expertise, so that the PoC is aligned with the team's actual strengths and collaboration patterns.
94
+
95
+ **Why this priority**: WorkIQ provides valuable internal context but requires Microsoft 365 access and admin consent. It's an optional enhancement that adds significant value when available but is not required for core functionality.
96
+
97
+ **Independent Test**: With WorkIQ configured and authorized, verify that sofIA asks permission before querying, returns meaningful team insights, and stores them in the session.
98
+
99
+ **Acceptance Scenarios**:
100
+
101
+ 1. **Given** WorkIQ is configured and available, **When** the discovery phase gathers team information, **Then** sofIA asks the user for explicit permission before querying WorkIQ.
102
+ 2. **Given** the user grants WorkIQ permission, **When** WorkIQ returns team insights, **Then** these insights are stored in the session and surfaced during ideation to help shape the PoC approach.
103
+ 3. **Given** the user declines WorkIQ access or WorkIQ is unavailable, **When** the discovery phase continues, **Then** it proceeds normally without internal context, with no errors.
104
+
105
+ ---
106
+
107
+ ### User Story 5 — Copilot SDK Agent Architecture Alignment (Priority: P2)
108
+
109
+ As a developer, I want the MCP transport layer to align with how the GitHub Copilot SDK structures agent definitions and tool routing, so that sofIA's agents can be registered and orchestrated through the SDK's native patterns rather than custom dispatch.
110
+
111
+ **Why this priority**: Aligning with SDK patterns now avoids a costly refactor later and enables sofIA to leverage SDK features like built-in authentication, tool approval flows, and session management.
112
+
113
+ **Independent Test**: Verify that the agent definitions follow Copilot SDK conventions and that MCP tool calls flow through the SDK's expected tool-calling interface.
114
+
115
+ **Acceptance Scenarios**:
116
+
117
+ 1. **Given** the Copilot SDK provides native `mcpServers` support in `SessionConfig`, **When** `createSession()` is called with MCP server configurations from `.vscode/mcp.json`, **Then** the SDK manages server lifecycle (spawn/connect, JSON-RPC, tool dispatch) for LLM-initiated tool calls without custom transport code.
118
+ 2. **Given** adapters (GitHub, Context7, Azure) need deterministic programmatic tool calls, **When** `McpManager.callTool()` is invoked, **Then** it routes through the custom transport layer (`StdioMcpTransport` / `HttpMcpTransport`) since the SDK's `executeToolCall` is private and cannot be called directly from application code.
119
+ 3. **Given** the SDK defines agent registration patterns, **When** sofIA's discovery/ideation/design/select/plan agents are initialized, **Then** they follow SDK conventions for capability declaration and tool access.
120
+
121
+ ### Dual-Lifecycle Limitation
122
+
123
+ **⚠️ Known limitation**: The Copilot SDK's `mcpServers` support and the custom `McpTransport` layer manage MCP server connections independently. If the same server (e.g., `context7`) is used both by the LLM during conversation turns (SDK-managed) and by a programmatic adapter call (custom transport-managed), **two separate subprocess instances** will be spawned. This is acceptable for Feature 003 scope because:
124
+
125
+ - The two paths serve fundamentally different call patterns (LLM-driven vs. deterministic).
126
+ - MCP subprocesses are lightweight (Node.js CLI tools via npx).
127
+ - HTTP servers (GitHub, Microsoft Docs) are stateless and unaffected.
128
+
129
+ This limitation should be revisited in a future feature if subprocess resource usage becomes a concern, potentially by sharing a single subprocess instance between the SDK and custom transport layers.
130
+
131
+ ---
132
+
133
+ ### Edge Cases
134
+
135
+ - What happens when an MCP server disconnects mid-tool-call? The transport MUST time out and return a classified error that adapters handle gracefully.
136
+ - How does the system handle MCP servers returning malformed JSON responses? The transport MUST parse defensively and throw typed errors classifiable by `classifyMcpError()`.
137
+ - What happens when Context7 `resolve-library-id` succeeds but `query-docs` fails? Return whatever partial context was gathered.
138
+ - What happens when the GitHub MCP `create_repository` call creates the repo but the subsequent `push_files` call fails? The repository URL MUST still be recorded; the push failure is a recoverable error for the next iteration.
139
+ - What if WorkIQ requires re-authentication mid-session? Surface a clear message to the user and skip enrichment for that query.
140
+
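The malformed-response edge case above can be sketched as a defensive parser for a single JSON-RPC 2.0 response. This is a minimal sketch under stated assumptions: `McpResponseError`, its `kind` values, and `parseJsonRpcResponse` are hypothetical names introduced for illustration, and the categories that the project's `classifyMcpError()` recognizes may differ.

```typescript
// Hypothetical error categories; the spec only requires typed, classifiable errors.
type McpResponseErrorKind = "malformed-response" | "server-error";

class McpResponseError extends Error {
  readonly kind: McpResponseErrorKind;
  constructor(kind: McpResponseErrorKind, message: string) {
    super(message);
    this.name = "McpResponseError";
    this.kind = kind;
  }
}

// Parse one JSON-RPC 2.0 response and return its `result`, or throw a typed error.
function parseJsonRpcResponse(raw: string, expectedId: number): unknown {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    throw new McpResponseError("malformed-response", "response is not valid JSON");
  }
  if (typeof msg !== "object" || msg === null || Array.isArray(msg)) {
    throw new McpResponseError("malformed-response", "response is not a JSON object");
  }
  const obj = msg as Record<string, unknown>;
  if (obj.jsonrpc !== "2.0" || obj.id !== expectedId) {
    throw new McpResponseError("malformed-response", "missing or mismatched jsonrpc/id");
  }
  // A JSON-RPC error object carries `code` and `message`.
  if (typeof obj.error === "object" && obj.error !== null) {
    const err = obj.error as Record<string, unknown>;
    throw new McpResponseError("server-error", String(err.message ?? "unknown server error"));
  }
  if (!("result" in obj)) {
    throw new McpResponseError("malformed-response", "response has neither result nor error");
  }
  return obj.result;
}
```

Tagging errors with a `kind` field, rather than relying on `instanceof` alone, keeps classification working even when errors cross module or compilation-target boundaries.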
141
+ ## Requirements _(mandatory)_
142
+
143
+ ### Functional Requirements
144
+
145
+ #### MCP Transport Layer (GAP-001)
146
+
147
+ - **FR-001**: `McpManager` MUST implement a working `callTool(serverName, toolName, args)` method that dispatches real tool calls to configured MCP servers and returns structured results.
148
+ - **FR-002**: The transport layer MUST support both `stdio` (subprocess-based) and `http` (URL-based) MCP server configurations as defined in `.vscode/mcp.json`.
149
+ - **FR-003**: Tool calls MUST include configurable timeouts (default: 30 seconds for data queries, 60 seconds for repository operations).
150
+ - **FR-004**: The transport MUST handle connection failures, timeouts, and malformed responses by throwing typed errors that callers can classify using `classifyMcpError()`.
151
+ - **FR-004a**: The transport MUST automatically retry once with exponential backoff for transient errors (connection refused, timeout, server unavailable). Auth failures and validation errors MUST NOT be retried.
152
+ - **FR-005**: The transport MUST support the MCP protocol's request/response format, including JSON-RPC message framing for stdio servers.
153
+
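The retry rule in FR-004a can be sketched as follows. This is an illustrative sketch, not the project's implementation: `McpCallError`, `McpCallErrorKind`, `isTransient`, and `callWithSingleRetry` are hypothetical names, the `sleep` function is injected by the caller, and the 500 ms default delay is an assumption. With only one retry allowed, the backoff reduces to a single delay before the second attempt.

```typescript
// Hypothetical classification; the project's classifyMcpError() may use other categories.
type McpCallErrorKind = "connection-refused" | "timeout" | "unavailable" | "auth" | "validation";

class McpCallError extends Error {
  readonly kind: McpCallErrorKind;
  constructor(kind: McpCallErrorKind, message: string) {
    super(message);
    this.name = "McpCallError";
    this.kind = kind;
  }
}

// Transient categories named in FR-004a; auth and validation errors are excluded.
const TRANSIENT_KINDS: ReadonlySet<McpCallErrorKind> = new Set<McpCallErrorKind>([
  "connection-refused",
  "timeout",
  "unavailable",
]);

function isTransient(err: unknown): boolean {
  const kind = (err as { kind?: McpCallErrorKind } | null)?.kind;
  return kind !== undefined && TRANSIENT_KINDS.has(kind);
}

// Run `attempt`; on a transient failure wait once, then try exactly one more time.
async function callWithSingleRetry<T>(
  attempt: () => Promise<T>,
  sleep: (ms: number) => Promise<void>, // injected so tests need no real timers
  backoffMs = 500, // assumed default, not specified in the requirements
): Promise<T> {
  try {
    return await attempt();
  } catch (err) {
    if (!isTransient(err)) throw err; // auth/validation errors are never retried
    await sleep(backoffMs);
    return attempt(); // a second failure propagates to the caller
  }
}
```

Injecting `sleep` keeps the helper deterministic in unit tests; production code would pass a timer-based implementation.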
154
+ #### GitHub MCP Integration (GAP-002, GAP-003)
155
+
156
+ - **FR-006**: `GitHubMcpAdapter.createRepository()` MUST dispatch a `create_repository` tool call via the MCP transport and extract the repository URL from the real response.
157
+ - **FR-007**: `GitHubMcpAdapter.pushFiles()` MUST dispatch a `push_files` tool call via the MCP transport, sending actual file content read from disk.
158
+ - **FR-008**: The Ralph Loop MUST read file content from disk before passing it to `pushFiles()`, replacing the current empty-string behavior (GAP-003).
159
+ - **FR-009**: Both GitHub adapter methods MUST degrade gracefully when the MCP transport is unavailable, returning `{ available: false, reason }`.
160
+
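The degradation contract in FR-009 could be modeled as a discriminated union. The `{ available: false, reason }` shape comes from the requirement; the success branch (`value`) and the `withDegradation` helper are assumptions made for illustration.

```typescript
// Failure shape from FR-009; the success shape is an illustrative assumption.
type AdapterResult<T> =
  | { available: true; value: T }
  | { available: false; reason: string };

// Wrap any transport call so that failures become a result value instead of an exception.
async function withDegradation<T>(call: () => Promise<T>): Promise<AdapterResult<T>> {
  try {
    return { available: true, value: await call() };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { available: false, reason };
  }
}
```

Callers can then branch on `available` and let the type checker narrow the result, so the fallback path has to be handled explicitly.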
161
+ #### Context Enrichment MCP Integration (GAP-004)
162
+
163
+ - **FR-010**: `McpContextEnricher.queryContext7()` MUST dispatch `resolve-library-id` and `query-docs` tool calls to the Context7 MCP server and return real documentation text.
164
+ - **FR-011**: `McpContextEnricher.queryAzureMcp()` MUST dispatch a `documentation` tool call to the Azure MCP server with architecture keywords and return real guidance.
165
+ - **FR-012**: `McpContextEnricher.queryWebSearch()` MUST dispatch a search tool call when the Ralph Loop is stuck (2+ consecutive iterations with same failures).
166
+ - **FR-013**: All enrichment methods MUST fall back to static/empty context when their respective MCP servers are unavailable, with no impact on the Ralph Loop's ability to continue iterating.
167
+
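The "stuck" condition in FR-012 (2+ consecutive iterations with the same failures) can be expressed as a small pure function. This is a sketch assuming each iteration is summarized as a list of failing test names; `isStuck` and that representation are hypothetical, not taken from the Ralph Loop code.

```typescript
// Returns true when the last `threshold` iterations all failed with the same
// non-empty set of tests. `threshold` is expected to be at least 1.
function isStuck(failuresPerIteration: string[][], threshold = 2): boolean {
  if (failuresPerIteration.length < threshold) return false;
  const recent = failuresPerIteration.slice(-threshold);
  const latest = recent[recent.length - 1] ?? [];
  if (latest.length === 0) return false; // nothing is failing, so nothing is stuck

  // Order-insensitive signature of one iteration's failures.
  const signature = (failures: string[]): string => JSON.stringify([...failures].sort());
  const target = signature(latest);
  return recent.every((failures) => signature(failures) === target);
}
```

Sorting before comparison makes the check independent of the order in which a test runner reports failures.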
168
+ #### Discovery Phase Enrichment (GAP-005)
169
+
170
+ - **FR-014**: The discovery phase SHOULD offer web search enrichment after the user provides company and team information in Step 1.
171
+ - **FR-015**: Web search results (company news, competitor activity, industry trends) MUST be stored in the session state for use in subsequent phases.
172
+ - **FR-016**: WorkIQ integration MUST request explicit user permission before accessing Microsoft 365 data. _Note: The SDK provides an `onPermissionRequest` handler as an alternative; it was evaluated and deferred in favor of `io.prompt()` for consistency with other interactive prompts (see research.md Topic 8)._
173
+ - **FR-017**: WorkIQ-derived insights (team collaboration patterns, documentation gaps, expertise areas) MUST be stored in the session state when the user consents.
174
+ - **FR-018**: Both web search and WorkIQ enrichment MUST be optional — the discovery phase MUST function normally without them.
175
+
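The consent flow in FR-016 through FR-018 can be sketched with injected dependencies. Everything named here (`WorkIqDeps`, `gatherWorkIqInsights`, the prompt wording) is hypothetical; in particular, `askPermission` stands in for whatever `io.prompt()` actually looks like, whose signature is not shown in this spec.

```typescript
// Mirrors the `workiqInsights` fields of the DiscoveryEnrichment schema.
interface WorkIqInsights {
  teamExpertise?: string[];
  collaborationPatterns?: string[];
  documentationGaps?: string[];
}

// Hypothetical dependencies, injected so the flow can be tested without WorkIQ.
interface WorkIqDeps {
  isAvailable: () => boolean;
  askPermission: (question: string) => Promise<boolean>;
  queryWorkIq: () => Promise<WorkIqInsights>;
}

// Returns insights only when WorkIQ is available, the user consents, and the query succeeds.
async function gatherWorkIqInsights(deps: WorkIqDeps): Promise<WorkIqInsights | undefined> {
  if (!deps.isAvailable()) return undefined;
  const granted = await deps.askPermission("Allow sofIA to query WorkIQ for team insights?");
  if (!granted) return undefined; // never query without explicit consent
  try {
    return await deps.queryWorkIq();
  } catch {
    return undefined; // enrichment is optional and must not block discovery
  }
}
```

Returning `undefined` on every non-success path lets the discovery phase treat "declined", "unavailable", and "failed" uniformly as "no internal context".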
176
+ #### Copilot SDK Alignment (GAP-006, GAP-007)
177
+
178
+ - **FR-019**: Before implementing custom MCP transport, the team MUST research the GitHub Copilot SDK's built-in MCP support and tool-calling capabilities. Where the SDK provides native MCP integration (tool dispatch, authentication, protocol handling), the implementation MUST use the SDK's mechanisms rather than building custom transport. Custom stdio/HTTP transport MUST only be built for capabilities the SDK does not cover. This research MUST also determine the authentication model for each transport type. Findings MUST be documented in a research note under `specs/003-mcp-transport-integration/research.md`.
179
+ - **FR-020**: Agent definitions (discovery, ideation, design, select, plan, develop) MUST be verified to follow Copilot SDK conventions for agent registration and capability declaration. Where SDK patterns (e.g., `customAgents`, `skillDirectories`) offer advantages over the current implementation, they SHOULD be evaluated and adopted if beneficial. Research findings (research.md Topic 7) confirmed that no structural refactoring is currently required; the SDK's `customAgents` and `skillDirectories` were evaluated and deferred (research.md Topic 9).
180
+
181
+ #### SDK Hooks & Transparency (Constitution Principle VIII)
182
+
183
+ - **FR-021**: The system MUST use the Copilot SDK's `onPreToolUse` and `onPostToolUse` hooks to emit tool-call activity events (tool name, start/end, duration) visible via the CLI spinner or activity log, satisfying Constitution Principle VIII ("Users MUST always see the current execution state"). Findings are documented in research.md Topic 8.
184
+ - **FR-022**: The system SHOULD use the Copilot SDK's `onErrorOccurred` hook to centralize error handling for LLM-conversation-path MCP failures, complementing `classifyMcpError()` which handles the custom transport path. FR-004's error classification covers programmatic adapter calls only; SDK-managed tool call errors require the `onErrorOccurred` hook.
185
+ - **FR-023**: Ralph Loop sessions SHOULD wire the SDK's `infiniteSessions` config (with `backgroundCompactionThreshold` and `bufferExhaustionThreshold`) to prevent context window exhaustion during extended multi-iteration conversations. Without this, long Ralph Loop runs risk silently truncating important context (see research.md Topic 9).
186
+ - **FR-024**: The system SHOULD subscribe to SDK events (e.g., `assistant.usage` for token tracking) to provide real-time progress and usage transparency in the CLI, per Constitution Principle VIII. Token counts are logged at `debug` level and optionally surfaced in the CLI spinner.
187
+
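The activity events described in FR-021 (tool name, start/end, duration) could be produced by a small tracker that the SDK hooks call into. This sketch is deliberately SDK-agnostic: the payloads passed to `onPreToolUse` and `onPostToolUse` are not shown in this spec, so `ToolActivityTracker`, its `callId` parameter, and the event shape are all hypothetical.

```typescript
// Hypothetical event shape covering the fields FR-021 lists.
interface ToolActivityEvent {
  toolName: string;
  phase: "start" | "end";
  durationMs: number | null; // null for "start" events and for unmatched "end" events
}

class ToolActivityTracker {
  private readonly startedAt = new Map<string, number>();
  private readonly now: () => number;
  private readonly emit: (event: ToolActivityEvent) => void;

  // `now` is injected (e.g. Date.now) so durations are testable with a fake clock.
  constructor(now: () => number, emit: (event: ToolActivityEvent) => void) {
    this.now = now;
    this.emit = emit;
  }

  // Intended to be called from a pre-tool-use hook.
  start(callId: string, toolName: string): void {
    this.startedAt.set(callId, this.now());
    this.emit({ toolName, phase: "start", durationMs: null });
  }

  // Intended to be called from a post-tool-use hook.
  end(callId: string, toolName: string): void {
    const started = this.startedAt.get(callId);
    this.startedAt.delete(callId);
    const durationMs = started === undefined ? null : this.now() - started;
    this.emit({ toolName, phase: "end", durationMs });
  }
}
```

The `emit` callback is where a CLI spinner or activity log would be updated; keeping it as a plain function avoids coupling the tracker to any particular UI.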
188
+ ### Key Entities
189
+
190
+ - **McpTransport**: Abstraction over the communication channel to an MCP server. Handles JSON-RPC framing for stdio servers and HTTP requests for HTTP servers. Manages connection lifecycle (connect, call, disconnect).
191
+ - **ToolCallRequest**: Structured request containing server name, tool name, and arguments. Includes timeout and retry configuration.
192
+ - **ToolCallResponse**: Structured response from an MCP server containing the tool result as parsed JSON, or an error with classification.
193
+ - **DiscoveryEnrichment**: Optional context gathered during the discovery phase, stored in the session for downstream phases. Schema:
194
+ ```typescript
195
+ interface DiscoveryEnrichment {
196
+ webSearchResults?: string; // raw search summary text
197
+ companyNews?: string[]; // recent news headlines/snippets
198
+ competitorInfo?: string[]; // competitor activity summaries
199
+ industryTrends?: string[]; // industry trend descriptions
200
+ workiqInsights?: {
201
+ teamExpertise?: string[]; // identified team skill areas
202
+ collaborationPatterns?: string[]; // meeting/communication patterns
203
+ documentationGaps?: string[]; // areas lacking documentation
204
+ };
205
+ }
206
+ ```
207
+
208
+ ## Success Criteria _(mandatory)_
209
+
210
+ ### Measurable Outcomes
211
+
212
+ - **SC-003-001**: All four MCP-dependent components (GitHub adapter, Context7 enricher, Azure enricher, web search enricher) successfully dispatch real tool calls and process real responses in a configured environment.
213
+ - **SC-003-002**: When any MCP server is unavailable, the system operates with the same graceful degradation behavior as the current stub implementation — no crashes, clear fallback messages, functional Ralph Loop output.
214
+ - **SC-003-003**: GitHub `pushFiles` sends actual file content to the MCP server, verified by inspecting the tool call arguments in integration tests.
215
+ - **SC-003-004**: The Ralph Loop completes at least one end-to-end run where Context7 documentation and web search results measurably improve the LLM's ability to fix failing tests (measured by comparing iteration counts with and without enrichment).
216
+ - **SC-003-005**: Discovery phase web search enrichment retrieves relevant context for at least 3 out of 5 test company descriptions, measured by keyword relevance in search results.
217
+ - **SC-003-006**: WorkIQ integration, when authorized, returns team insights within 10 seconds and stores them in the session without errors.
218
+ - **SC-003-007**: All MCP tool calls complete within their configured timeouts (30s for queries, 60s for repository operations) or return a classified timeout error.
219
+
220
+ ## Assumptions
221
+
222
+ - The GitHub Copilot SDK (`@github/copilot-sdk` v0.1.28+) is the primary implementation path for MCP transport. FR-019 research MUST determine what the SDK provides natively before building any custom transport. Custom stdio/HTTP transport is only built for gaps in SDK coverage.
223
+ - MCP servers configured in `.vscode/mcp.json` follow the standard MCP protocol (JSON-RPC 2.0 over stdio, or HTTP endpoints).
224
+ - The authentication model for each MCP transport type is not yet determined. FR-019 research will establish whether stdio servers inherit environment credentials, whether HTTP servers use Copilot SDK auth, and whether WorkIQ requires its own OAuth flow. No custom auth abstraction should be built until this research is complete.
225
+ - Context7 and Azure MCP servers are publicly accessible npm packages (`@upstash/context7-mcp`, `@azure/mcp`) that can be spawned as subprocesses — unless the SDK provides a different mechanism for spawning/connecting.
226
+ - WorkIQ requires Microsoft 365 tenant access with admin consent, as documented in the WorkIQ Admin Instructions.
227
+ - The `web.search` capability is either built into the Copilot SDK or available as an MCP tool — research is needed to confirm the exact integration path.
228
+ - Live integration tests for MCP servers will be gated behind environment variables (e.g., `SOFIA_LIVE_MCP_TESTS=true`) to avoid CI failures when servers are not available.
229
+
230
+ ## Dependencies
231
+
232
+ - **Feature 001**: Session model, workshop phases, plan outputs
233
+ - **Feature 002**: Ralph Loop, GitHub adapter, context enricher, PoC scaffolder
234
+ - **External**: `@github/copilot-sdk` v0.1.28+, MCP servers configured in `.vscode/mcp.json`