retestkit 1.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (327) hide show
  1. package/.claude/commands/openspec/apply.md +23 -0
  2. package/.claude/commands/openspec/archive.md +27 -0
  3. package/.claude/commands/openspec/proposal.md +28 -0
  4. package/.gemini/commands/openspec/apply.toml +21 -0
  5. package/.gemini/commands/openspec/archive.toml +25 -0
  6. package/.gemini/commands/openspec/proposal.toml +26 -0
  7. package/.github/prompts/openspec-apply.prompt.md +22 -0
  8. package/.github/prompts/openspec-archive.prompt.md +26 -0
  9. package/.github/prompts/openspec-proposal.prompt.md +27 -0
  10. package/.github/workflows/release.yml +33 -0
  11. package/.kilocode/workflows/openspec-apply.md +17 -0
  12. package/.kilocode/workflows/openspec-archive.md +21 -0
  13. package/.kilocode/workflows/openspec-proposal.md +22 -0
  14. package/.mcp.json +23 -0
  15. package/.opencode/command/openspec-apply.md +25 -0
  16. package/.opencode/command/openspec-archive.md +28 -0
  17. package/.opencode/command/openspec-proposal.md +30 -0
  18. package/.roo/commands/openspec-apply.md +20 -0
  19. package/.roo/commands/openspec-archive.md +24 -0
  20. package/.roo/commands/openspec-proposal.md +25 -0
  21. package/.vscode/mcp.json +23 -0
  22. package/AGENTS.md +18 -0
  23. package/CLAUDE.md +18 -0
  24. package/LICENSE +65 -0
  25. package/README.md +303 -0
  26. package/dist/config.d.ts +4 -0
  27. package/dist/config.d.ts.map +1 -0
  28. package/dist/config.js +27 -0
  29. package/dist/config.js.map +1 -0
  30. package/dist/elicitation/index.d.ts +17 -0
  31. package/dist/elicitation/index.d.ts.map +1 -0
  32. package/dist/elicitation/index.js +118 -0
  33. package/dist/elicitation/index.js.map +1 -0
  34. package/dist/elicitation/types.d.ts +35 -0
  35. package/dist/elicitation/types.d.ts.map +1 -0
  36. package/dist/elicitation/types.js +39 -0
  37. package/dist/elicitation/types.js.map +1 -0
  38. package/dist/index.d.ts +3 -0
  39. package/dist/index.d.ts.map +1 -0
  40. package/dist/index.js +76 -0
  41. package/dist/index.js.map +1 -0
  42. package/dist/lifecycle/index.d.ts +31 -0
  43. package/dist/lifecycle/index.d.ts.map +1 -0
  44. package/dist/lifecycle/index.js +61 -0
  45. package/dist/lifecycle/index.js.map +1 -0
  46. package/dist/logger.d.ts +21 -0
  47. package/dist/logger.d.ts.map +1 -0
  48. package/dist/logger.js +182 -0
  49. package/dist/logger.js.map +1 -0
  50. package/dist/playwright-client/index.d.ts +29 -0
  51. package/dist/playwright-client/index.d.ts.map +1 -0
  52. package/dist/playwright-client/index.js +288 -0
  53. package/dist/playwright-client/index.js.map +1 -0
  54. package/dist/playwright-client/types.d.ts +44 -0
  55. package/dist/playwright-client/types.d.ts.map +1 -0
  56. package/dist/playwright-client/types.js +49 -0
  57. package/dist/playwright-client/types.js.map +1 -0
  58. package/dist/progress/index.d.ts +39 -0
  59. package/dist/progress/index.d.ts.map +1 -0
  60. package/dist/progress/index.js +106 -0
  61. package/dist/progress/index.js.map +1 -0
  62. package/dist/progress/types.d.ts +24 -0
  63. package/dist/progress/types.d.ts.map +1 -0
  64. package/dist/progress/types.js +2 -0
  65. package/dist/progress/types.js.map +1 -0
  66. package/dist/prompts/index.d.ts +19 -0
  67. package/dist/prompts/index.d.ts.map +1 -0
  68. package/dist/prompts/index.js +207 -0
  69. package/dist/prompts/index.js.map +1 -0
  70. package/dist/prompts/loader.d.ts +20 -0
  71. package/dist/prompts/loader.d.ts.map +1 -0
  72. package/dist/prompts/loader.js +47 -0
  73. package/dist/prompts/loader.js.map +1 -0
  74. package/dist/resources/index.d.ts +27 -0
  75. package/dist/resources/index.d.ts.map +1 -0
  76. package/dist/resources/index.js +186 -0
  77. package/dist/resources/index.js.map +1 -0
  78. package/dist/resources/subscriptions.d.ts +10 -0
  79. package/dist/resources/subscriptions.d.ts.map +1 -0
  80. package/dist/resources/subscriptions.js +23 -0
  81. package/dist/resources/subscriptions.js.map +1 -0
  82. package/dist/sampling/index.d.ts +11 -0
  83. package/dist/sampling/index.d.ts.map +1 -0
  84. package/dist/sampling/index.js +201 -0
  85. package/dist/sampling/index.js.map +1 -0
  86. package/dist/sampling/prompts.d.ts +56 -0
  87. package/dist/sampling/prompts.d.ts.map +1 -0
  88. package/dist/sampling/prompts.js +124 -0
  89. package/dist/sampling/prompts.js.map +1 -0
  90. package/dist/sampling/types.d.ts +57 -0
  91. package/dist/sampling/types.d.ts.map +1 -0
  92. package/dist/sampling/types.js +2 -0
  93. package/dist/sampling/types.js.map +1 -0
  94. package/dist/schemas/config.d.ts +40 -0
  95. package/dist/schemas/config.d.ts.map +1 -0
  96. package/dist/schemas/config.js +30 -0
  97. package/dist/schemas/config.js.map +1 -0
  98. package/dist/security/index.d.ts +38 -0
  99. package/dist/security/index.d.ts.map +1 -0
  100. package/dist/security/index.js +281 -0
  101. package/dist/security/index.js.map +1 -0
  102. package/dist/server.d.ts +9 -0
  103. package/dist/server.d.ts.map +1 -0
  104. package/dist/server.js +142 -0
  105. package/dist/server.js.map +1 -0
  106. package/dist/test-utils/index.d.ts +6 -0
  107. package/dist/test-utils/index.d.ts.map +1 -0
  108. package/dist/test-utils/index.js +6 -0
  109. package/dist/test-utils/index.js.map +1 -0
  110. package/dist/test-utils/mock-context.d.ts +64 -0
  111. package/dist/test-utils/mock-context.d.ts.map +1 -0
  112. package/dist/test-utils/mock-context.js +347 -0
  113. package/dist/test-utils/mock-context.js.map +1 -0
  114. package/dist/test-utils/mock-playwright-client.d.ts +62 -0
  115. package/dist/test-utils/mock-playwright-client.d.ts.map +1 -0
  116. package/dist/test-utils/mock-playwright-client.js +315 -0
  117. package/dist/test-utils/mock-playwright-client.js.map +1 -0
  118. package/dist/tools/index.d.ts +4 -0
  119. package/dist/tools/index.d.ts.map +1 -0
  120. package/dist/tools/index.js +8 -0
  121. package/dist/tools/index.js.map +1 -0
  122. package/dist/tools/webtest/crawl.d.ts +46 -0
  123. package/dist/tools/webtest/crawl.d.ts.map +1 -0
  124. package/dist/tools/webtest/crawl.js +678 -0
  125. package/dist/tools/webtest/crawl.js.map +1 -0
  126. package/dist/tools/webtest/discover-features.d.ts +30 -0
  127. package/dist/tools/webtest/discover-features.d.ts.map +1 -0
  128. package/dist/tools/webtest/discover-features.js +343 -0
  129. package/dist/tools/webtest/discover-features.js.map +1 -0
  130. package/dist/tools/webtest/discover-flows.d.ts +29 -0
  131. package/dist/tools/webtest/discover-flows.d.ts.map +1 -0
  132. package/dist/tools/webtest/discover-flows.js +341 -0
  133. package/dist/tools/webtest/discover-flows.js.map +1 -0
  134. package/dist/tools/webtest/generate-tests.d.ts +54 -0
  135. package/dist/tools/webtest/generate-tests.d.ts.map +1 -0
  136. package/dist/tools/webtest/generate-tests.js +364 -0
  137. package/dist/tools/webtest/generate-tests.js.map +1 -0
  138. package/dist/tools/webtest/index.d.ts +8 -0
  139. package/dist/tools/webtest/index.d.ts.map +1 -0
  140. package/dist/tools/webtest/index.js +8 -0
  141. package/dist/tools/webtest/index.js.map +1 -0
  142. package/dist/tools/webtest/run-test-case.d.ts +28 -0
  143. package/dist/tools/webtest/run-test-case.d.ts.map +1 -0
  144. package/dist/tools/webtest/run-test-case.js +420 -0
  145. package/dist/tools/webtest/run-test-case.js.map +1 -0
  146. package/dist/tools/webtest/schemas.d.ts +175 -0
  147. package/dist/tools/webtest/schemas.d.ts.map +1 -0
  148. package/dist/tools/webtest/schemas.js +156 -0
  149. package/dist/tools/webtest/schemas.js.map +1 -0
  150. package/dist/tools/webtest/start-analysis.d.ts +16 -0
  151. package/dist/tools/webtest/start-analysis.d.ts.map +1 -0
  152. package/dist/tools/webtest/start-analysis.js +137 -0
  153. package/dist/tools/webtest/start-analysis.js.map +1 -0
  154. package/dist/transports/http.d.ts +8 -0
  155. package/dist/transports/http.d.ts.map +1 -0
  156. package/dist/transports/http.js +9 -0
  157. package/dist/transports/http.js.map +1 -0
  158. package/dist/transports/index.d.ts +14 -0
  159. package/dist/transports/index.d.ts.map +1 -0
  160. package/dist/transports/index.js +20 -0
  161. package/dist/transports/index.js.map +1 -0
  162. package/dist/transports/stdio.d.ts +4 -0
  163. package/dist/transports/stdio.d.ts.map +1 -0
  164. package/dist/transports/stdio.js +6 -0
  165. package/dist/transports/stdio.js.map +1 -0
  166. package/dist/types/capabilities.d.ts +18 -0
  167. package/dist/types/capabilities.d.ts.map +1 -0
  168. package/dist/types/capabilities.js +35 -0
  169. package/dist/types/capabilities.js.map +1 -0
  170. package/dist/types/context.d.ts +20 -0
  171. package/dist/types/context.d.ts.map +1 -0
  172. package/dist/types/context.js +2 -0
  173. package/dist/types/context.js.map +1 -0
  174. package/dist/types/tool.d.ts +10 -0
  175. package/dist/types/tool.d.ts.map +1 -0
  176. package/dist/types/tool.js +2 -0
  177. package/dist/types/tool.js.map +1 -0
  178. package/dist/workspace/index.d.ts +99 -0
  179. package/dist/workspace/index.d.ts.map +1 -0
  180. package/dist/workspace/index.js +648 -0
  181. package/dist/workspace/index.js.map +1 -0
  182. package/dist/workspace/markdown.d.ts +50 -0
  183. package/dist/workspace/markdown.d.ts.map +1 -0
  184. package/dist/workspace/markdown.js +210 -0
  185. package/dist/workspace/markdown.js.map +1 -0
  186. package/dist/workspace/types.d.ts +173 -0
  187. package/dist/workspace/types.d.ts.map +1 -0
  188. package/dist/workspace/types.js +2 -0
  189. package/dist/workspace/types.js.map +1 -0
  190. package/openspec/AGENTS.md +456 -0
  191. package/openspec/changes/archive/2025-12-18-add-hybrid-artifact-paths/proposal.md +33 -0
  192. package/openspec/changes/archive/2025-12-18-add-hybrid-artifact-paths/specs/webtest-resources/spec.md +27 -0
  193. package/openspec/changes/archive/2025-12-18-add-hybrid-artifact-paths/specs/webtest-tools/spec.md +304 -0
  194. package/openspec/changes/archive/2025-12-18-add-hybrid-artifact-paths/tasks.md +43 -0
  195. package/openspec/changes/archive/2025-12-18-add-mcp-server-foundation/design.md +209 -0
  196. package/openspec/changes/archive/2025-12-18-add-mcp-server-foundation/proposal.md +41 -0
  197. package/openspec/changes/archive/2025-12-18-add-mcp-server-foundation/specs/mcp-server-core/spec.md +183 -0
  198. package/openspec/changes/archive/2025-12-18-add-mcp-server-foundation/tasks.md +112 -0
  199. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/design.md +333 -0
  200. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/proposal.md +66 -0
  201. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/mcp-server-core/spec.md +129 -0
  202. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-lifecycle/spec.md +138 -0
  203. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-logging/spec.md +211 -0
  204. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-prompts/spec.md +157 -0
  205. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-resources/spec.md +213 -0
  206. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-sampling/spec.md +257 -0
  207. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/specs/webtest-tools/spec.md +501 -0
  208. package/openspec/changes/archive/2025-12-18-add-webtest-orchestrator/tasks.md +264 -0
  209. package/openspec/changes/archive/2025-12-18-allow-analysis-of-incomplete-crawls/proposal.md +24 -0
  210. package/openspec/changes/archive/2025-12-18-allow-analysis-of-incomplete-crawls/specs/webtest-tools/spec.md +80 -0
  211. package/openspec/changes/archive/2025-12-18-allow-analysis-of-incomplete-crawls/tasks.md +8 -0
  212. package/openspec/changes/archive/2025-12-18-fix-crawl-loop-stability/design.md +90 -0
  213. package/openspec/changes/archive/2025-12-18-fix-crawl-loop-stability/proposal.md +28 -0
  214. package/openspec/changes/archive/2025-12-18-fix-crawl-loop-stability/specs/webtest-sampling/spec.md +90 -0
  215. package/openspec/changes/archive/2025-12-18-fix-crawl-loop-stability/tasks.md +33 -0
  216. package/openspec/changes/archive/2025-12-18-use-markdown-artifacts/design.md +558 -0
  217. package/openspec/changes/archive/2025-12-18-use-markdown-artifacts/proposal.md +119 -0
  218. package/openspec/changes/archive/2025-12-18-use-markdown-artifacts/specs/webtest-resources/spec.md +109 -0
  219. package/openspec/changes/archive/2025-12-18-use-markdown-artifacts/specs/webtest-tools/spec.md +121 -0
  220. package/openspec/changes/archive/2025-12-18-use-markdown-artifacts/tasks.md +133 -0
  221. package/openspec/changes/extract-prompts-to-markdown/design.md +86 -0
  222. package/openspec/changes/extract-prompts-to-markdown/proposal.md +50 -0
  223. package/openspec/changes/extract-prompts-to-markdown/specs/webtest-prompts/spec.md +74 -0
  224. package/openspec/changes/extract-prompts-to-markdown/tasks.md +40 -0
  225. package/openspec/changes/refactor-webtest-naming/design.md +95 -0
  226. package/openspec/changes/refactor-webtest-naming/proposal.md +66 -0
  227. package/openspec/changes/refactor-webtest-naming/specs/webtest-prompts/spec.md +79 -0
  228. package/openspec/changes/refactor-webtest-naming/specs/webtest-resources/spec.md +80 -0
  229. package/openspec/changes/refactor-webtest-naming/specs/webtest-sampling/spec.md +122 -0
  230. package/openspec/changes/refactor-webtest-naming/specs/webtest-tools/spec.md +113 -0
  231. package/openspec/changes/refactor-webtest-naming/tasks.md +119 -0
  232. package/openspec/changes/rename-package-to-retest/proposal.md +52 -0
  233. package/openspec/changes/rename-package-to-retest/specs/mcp-server-core/spec.md +53 -0
  234. package/openspec/changes/rename-package-to-retest/specs/retest-lifecycle/spec.md +68 -0
  235. package/openspec/changes/rename-package-to-retest/specs/retest-logging/spec.md +35 -0
  236. package/openspec/changes/rename-package-to-retest/specs/retest-prompts/spec.md +159 -0
  237. package/openspec/changes/rename-package-to-retest/specs/retest-resources/spec.md +251 -0
  238. package/openspec/changes/rename-package-to-retest/specs/retest-sampling/spec.md +99 -0
  239. package/openspec/changes/rename-package-to-retest/specs/retest-tools/spec.md +295 -0
  240. package/openspec/changes/rename-package-to-retest/tasks.md +71 -0
  241. package/openspec/project.md +31 -0
  242. package/openspec/specs/mcp-server-core/spec.md +178 -0
  243. package/openspec/specs/webtest-lifecycle/spec.md +136 -0
  244. package/openspec/specs/webtest-logging/spec.md +209 -0
  245. package/openspec/specs/webtest-prompts/spec.md +155 -0
  246. package/openspec/specs/webtest-resources/spec.md +248 -0
  247. package/openspec/specs/webtest-sampling/spec.md +344 -0
  248. package/openspec/specs/webtest-tools/spec.md +282 -0
  249. package/package.json +54 -0
  250. package/release.config.js +9 -0
  251. package/src/config.test.ts +96 -0
  252. package/src/config.ts +32 -0
  253. package/src/elicitation/index.test.ts +399 -0
  254. package/src/elicitation/index.ts +171 -0
  255. package/src/elicitation/types.ts +68 -0
  256. package/src/index.ts +83 -0
  257. package/src/lifecycle/index.test.ts +260 -0
  258. package/src/lifecycle/index.ts +101 -0
  259. package/src/logger.redaction.test.ts +322 -0
  260. package/src/logger.test.ts +123 -0
  261. package/src/logger.ts +229 -0
  262. package/src/playwright-client/index.ts +392 -0
  263. package/src/playwright-client/types.ts +99 -0
  264. package/src/progress/index.test.ts +327 -0
  265. package/src/progress/index.ts +170 -0
  266. package/src/progress/types.ts +25 -0
  267. package/src/prompts/index.test.ts +451 -0
  268. package/src/prompts/index.ts +246 -0
  269. package/src/prompts/loader.test.ts +100 -0
  270. package/src/prompts/loader.ts +59 -0
  271. package/src/prompts/templates/mcp/webtest-crawl.md +7 -0
  272. package/src/prompts/templates/mcp/webtest-discover-flows.md +11 -0
  273. package/src/prompts/templates/mcp/webtest-discover.md +12 -0
  274. package/src/prompts/templates/mcp/webtest-full-workflow.md +12 -0
  275. package/src/prompts/templates/mcp/webtest-generate-tests.md +11 -0
  276. package/src/prompts/templates/mcp/webtest-run-test.md +11 -0
  277. package/src/prompts/templates/mcp/webtest-start.md +8 -0
  278. package/src/prompts/templates/sampling/crawl-action.md +35 -0
  279. package/src/prompts/templates/sampling/feature-discovery.md +27 -0
  280. package/src/prompts/templates/sampling/flow-discovery.md +29 -0
  281. package/src/prompts/templates/sampling/page-content-wrapper.md +5 -0
  282. package/src/prompts/templates/sampling/system-prefix.md +12 -0
  283. package/src/prompts/templates/sampling/test-evaluation.md +17 -0
  284. package/src/prompts/templates/sampling/test-generation.md +31 -0
  285. package/src/resources/index.ts +250 -0
  286. package/src/resources/subscriptions.ts +37 -0
  287. package/src/sampling/index.test.ts +414 -0
  288. package/src/sampling/index.ts +286 -0
  289. package/src/sampling/prompts.ts +194 -0
  290. package/src/sampling/types.ts +60 -0
  291. package/src/schemas/config.ts +39 -0
  292. package/src/security/index.test.ts +441 -0
  293. package/src/security/index.ts +361 -0
  294. package/src/security/security-scenarios.test.ts +468 -0
  295. package/src/server.ts +211 -0
  296. package/src/test-utils/index.ts +6 -0
  297. package/src/test-utils/mock-context.ts +426 -0
  298. package/src/test-utils/mock-playwright-client.ts +422 -0
  299. package/src/tools/index.ts +11 -0
  300. package/src/tools/webtest/crawl.test.ts +834 -0
  301. package/src/tools/webtest/crawl.ts +901 -0
  302. package/src/tools/webtest/discover-features.ts +412 -0
  303. package/src/tools/webtest/discover-flows.ts +408 -0
  304. package/src/tools/webtest/generate-tests.test.ts +532 -0
  305. package/src/tools/webtest/generate-tests.ts +425 -0
  306. package/src/tools/webtest/index.ts +7 -0
  307. package/src/tools/webtest/integration.test.ts +536 -0
  308. package/src/tools/webtest/run-test-case.test.ts +659 -0
  309. package/src/tools/webtest/run-test-case.ts +508 -0
  310. package/src/tools/webtest/schemas.ts +201 -0
  311. package/src/tools/webtest/start-analysis.test.ts +151 -0
  312. package/src/tools/webtest/start-analysis.ts +158 -0
  313. package/src/transports/http.ts +19 -0
  314. package/src/transports/index.ts +30 -0
  315. package/src/transports/stdio.ts +7 -0
  316. package/src/types/capabilities.test.ts +193 -0
  317. package/src/types/capabilities.ts +50 -0
  318. package/src/types/context.ts +21 -0
  319. package/src/types/tool.ts +11 -0
  320. package/src/workspace/index.ts +945 -0
  321. package/src/workspace/markdown.ts +272 -0
  322. package/src/workspace/types.ts +186 -0
  323. package/tests/integration/server.test.ts +89 -0
  324. package/tests/integration/tools.test.ts +99 -0
  325. package/tsconfig.json +20 -0
  326. package/vitest.config.ts +9 -0
  327. package/vitest.integration.config.ts +10 -0
@@ -0,0 +1,183 @@
1
+ ## ADDED Requirements
2
+
3
+ ### Requirement: MCP Server Initialization
4
+
5
+ The system SHALL provide an MCP server that initializes with proper identification and connects to the configured transport.
6
+
7
+ #### Scenario: Server starts with stdio transport
8
+
9
+ - **GIVEN** the environment variable `TRANSPORT` is set to `stdio` or not set
10
+ - **WHEN** the server entry point is executed
11
+ - **THEN** it SHALL identify itself with name "testing-mcp" and version from package.json
12
+ - **AND** it SHALL connect to stdio transport for communication
13
+
14
+ #### Scenario: Server starts with HTTP transport
15
+
16
+ - **GIVEN** the environment variable `TRANSPORT` is set to `http`
17
+ - **AND** the environment variable `PORT` is set to a valid port number
18
+ - **WHEN** the server entry point is executed
19
+ - **THEN** it SHALL start a Streamable HTTP server on the specified port
20
+ - **AND** it SHALL accept MCP protocol connections over HTTP
21
+
22
+ #### Scenario: Server handles graceful shutdown
23
+
24
+ - **GIVEN** the server is running
25
+ - **WHEN** the process receives SIGINT or SIGTERM
26
+ - **THEN** the server SHALL disconnect gracefully
27
+ - **AND** the process SHALL exit with code 0
28
+
29
+ ### Requirement: Configuration Validation
30
+
31
+ The system SHALL validate configuration at startup using Zod schemas and fail fast on invalid configuration.
32
+
33
+ #### Scenario: Valid configuration starts server
34
+
35
+ - **GIVEN** all required environment variables are valid
36
+ - **WHEN** the server starts
37
+ - **THEN** configuration SHALL be parsed and validated
38
+ - **AND** the server SHALL proceed with initialization
39
+
40
+ #### Scenario: Invalid configuration fails fast
41
+
42
+ - **GIVEN** an environment variable has an invalid value (e.g., `PORT=invalid`)
43
+ - **WHEN** the server attempts to start
44
+ - **THEN** it SHALL log a descriptive error message
45
+ - **AND** the process SHALL exit with a non-zero code
46
+
47
+ ### Requirement: Pluggable Transport Layer
48
+
49
+ The system SHALL support multiple transport types through a pluggable architecture with transport selection via environment configuration.
50
+
51
+ #### Scenario: Transport factory selects stdio
52
+
53
+ - **GIVEN** the transport configuration specifies `stdio`
54
+ - **WHEN** the transport factory is invoked
55
+ - **THEN** it SHALL return a configured StdioServerTransport instance
56
+
57
+ #### Scenario: Transport factory selects HTTP
58
+
59
+ - **GIVEN** the transport configuration specifies `http` with a port
60
+ - **WHEN** the transport factory is invoked
61
+ - **THEN** it SHALL return a configured StreamableHTTPServerTransport instance
62
+
63
+ ### Requirement: Self-Describing Tool Registry
64
+
65
+ The system SHALL maintain a tool registry where each tool exports a standard interface including name, description, Zod input schema, and async handler function.
66
+
67
+ #### Scenario: Tool is registered and discoverable
68
+
69
+ - **GIVEN** a tool is added to the registry
70
+ - **WHEN** an MCP client requests the tool list
71
+ - **THEN** the tool SHALL appear in the list with its name and description
72
+ - **AND** the input JSON Schema SHALL be generated from the Zod schema
73
+
74
+ #### Scenario: New tool follows registry pattern
75
+
76
+ - **GIVEN** a developer creates a new tool
77
+ - **WHEN** the tool exports `{ name, description, inputSchema, handler }`
78
+ - **AND** the tool is added to the registry index
79
+ - **THEN** it SHALL be automatically registered with the MCP server
80
+
81
+ ### Requirement: Hello Tool Implementation
82
+
83
+ The system SHALL provide a "hello" demonstration tool that accepts a name parameter and returns a greeting message, serving as a reference implementation of the tool pattern.
84
+
85
+ #### Scenario: Hello tool returns greeting
86
+
87
+ - **GIVEN** the hello tool is registered
88
+ - **WHEN** called with input `{ "name": "World" }`
89
+ - **THEN** it SHALL return content with text "Hello, World!"
90
+
91
+ #### Scenario: Hello tool validates input
92
+
93
+ - **GIVEN** the hello tool is registered
94
+ - **WHEN** called without required name parameter
95
+ - **THEN** it SHALL return a validation error
96
+
97
+ ### Requirement: Structured Logging
98
+
99
+ The system SHALL provide structured JSON logging with configurable log levels and automatic redaction of sensitive fields.
100
+
101
+ #### Scenario: Log output is structured JSON
102
+
103
+ - **GIVEN** the server is running
104
+ - **WHEN** a log event occurs
105
+ - **THEN** it SHALL be output as a JSON object with timestamp, level, and message fields
106
+
107
+ #### Scenario: Sensitive fields are redacted
108
+
109
+ - **GIVEN** a log message contains a field matching a sensitive key pattern (password, token, secret, apiKey, authorization)
110
+ - **WHEN** the log is written
111
+ - **THEN** the sensitive field value SHALL be replaced with "[REDACTED]"
112
+
113
+ #### Scenario: Log level is configurable
114
+
115
+ - **GIVEN** the environment variable `LOG_LEVEL` is set to a valid level (debug, info, warn, error)
116
+ - **WHEN** the server starts
117
+ - **THEN** only log messages at or above that level SHALL be output
118
+
119
+ ### Requirement: Project Build Configuration
120
+
121
+ The system SHALL be buildable to JavaScript for production deployment using TypeScript compiler.
122
+
123
+ #### Scenario: Project builds successfully
124
+
125
+ - **GIVEN** the source code is valid TypeScript
126
+ - **WHEN** `npm run build` is executed
127
+ - **THEN** compiled JavaScript SHALL be output to `dist/` directory
128
+ - **AND** the build SHALL complete without errors
129
+
130
+ #### Scenario: Development mode runs with hot-reload
131
+
132
+ - **GIVEN** the development dependencies are installed
133
+ - **WHEN** `npm run dev` is executed
134
+ - **THEN** the server SHALL start with file watching enabled
135
+ - **AND** changes to source files SHALL trigger automatic restart
136
+
137
+ #### Scenario: Package is executable as CLI
138
+
139
+ - **GIVEN** the project is built
140
+ - **WHEN** `npx testing-mcp` is executed (or the bin entry is invoked)
141
+ - **THEN** the server SHALL start with default configuration
142
+
143
+ ### Requirement: Unit Test Infrastructure
144
+
145
+ The system SHALL include unit test configuration for validating tool handlers in isolation.
146
+
147
+ #### Scenario: Unit tests execute successfully
148
+
149
+ - **GIVEN** unit test files exist in the project
150
+ - **WHEN** `npm test` is executed
151
+ - **THEN** the test runner SHALL discover and execute all test files
152
+ - **AND** results SHALL be reported to stdout
153
+
154
+ #### Scenario: Tool handlers are testable in isolation
155
+
156
+ - **GIVEN** a tool handler function
157
+ - **WHEN** called directly with valid input
158
+ - **THEN** it SHALL return the expected result without requiring server initialization
159
+
160
+ ### Requirement: Integration Test Infrastructure
161
+
162
+ The system SHALL include integration tests that spawn the server and communicate using the MCP protocol to verify end-to-end behavior.
163
+
164
+ #### Scenario: Integration test spawns server
165
+
166
+ - **GIVEN** integration test configuration exists
167
+ - **WHEN** an integration test runs
168
+ - **THEN** it SHALL spawn the server as a child process
169
+ - **AND** connect to it using StdioServerTransport
170
+
171
+ #### Scenario: Integration test executes tool end-to-end
172
+
173
+ - **GIVEN** an integration test has connected to the server
174
+ - **WHEN** it calls a tool with valid input
175
+ - **THEN** it SHALL receive the expected response payload
176
+ - **AND** verify the response matches expected format
177
+
178
+ #### Scenario: Integration test verifies error handling
179
+
180
+ - **GIVEN** an integration test has connected to the server
181
+ - **WHEN** it calls a tool with invalid input
182
+ - **THEN** it SHALL receive an appropriate error response
183
+ - **AND** verify the error format matches MCP protocol specification
@@ -0,0 +1,112 @@
1
+ ## 1. Project Configuration
2
+
3
+ - [x] 1.1 Initialize `package.json` with:
4
+ - name: "testing-mcp"
5
+ - type: "module"
6
+ - exports map: `{ ".": "./dist/index.js" }`
7
+ - bin entry: `{ "testing-mcp": "./dist/index.js" }`
8
+ - scripts: dev, build, test, test:integration, start
9
+ - engines: `{ "node": ">=22.18.0" }`
10
+ - [x] 1.2 Create `tsconfig.json` targeting ES2022, NodeNext module resolution, strict mode enabled
11
+ - [x] 1.3 Install dependencies: `@modelcontextprotocol/sdk`, `zod`
12
+ - [x] 1.4 Install dev dependencies: `typescript`, `tsx`, `vitest`, `@types/node`
13
+ - [x] 1.5 Add `.gitignore` entries for `node_modules/`, `dist/`, and coverage reports
14
+ - [x] 1.6 Create `vitest.config.ts` with TypeScript and ESM support
15
+
16
+ ## 2. Project Structure
17
+
18
+ - [x] 2.1 Create `src/` directory structure:
19
+ ```
20
+ src/
21
+ ├── index.ts
22
+ ├── server.ts
23
+ ├── config.ts
24
+ ├── logger.ts
25
+ ├── transports/
26
+ │ ├── index.ts
27
+ │ ├── stdio.ts
28
+ │ └── http.ts
29
+ ├── tools/
30
+ │ ├── index.ts
31
+ │ └── hello.ts
32
+ ├── schemas/
33
+ │ └── config.ts
34
+ └── types/
35
+ └── tool.ts
36
+ ```
37
+
38
+ ## 3. Configuration & Logging
39
+
40
+ - [x] 3.1 Create `src/schemas/config.ts` with Zod schema for environment config:
41
+ - `TRANSPORT`: enum `stdio` | `http` (default: `stdio`)
42
+ - `PORT`: number (default: `3000`, required when TRANSPORT=http)
43
+ - `LOG_LEVEL`: enum `debug` | `info` | `warn` | `error` (default: `info`)
44
+ - [x] 3.2 Create `src/config.ts` that parses and validates env vars at startup
45
+ - [x] 3.3 Create `src/logger.ts` with:
46
+ - Structured JSON output
47
+ - Configurable log level
48
+ - Secret redaction for sensitive keys (password, token, secret, apiKey, authorization)
49
+
50
+ ## 4. Transport Layer
51
+
52
+ - [x] 4.1 Create `src/types/tool.ts` with `McpTool` interface
53
+ - [x] 4.2 Create `src/transports/stdio.ts` wrapping StdioServerTransport
54
+ - [x] 4.3 Create `src/transports/http.ts` wrapping StreamableHTTPServerTransport
55
+ - [x] 4.4 Create `src/transports/index.ts` transport factory based on config
56
+
57
+ ## 5. Tool Registry
58
+
59
+ - [x] 5.1 Create `src/tools/hello.ts` with:
60
+ - Zod input schema for `name` parameter
61
+ - Handler returning greeting message
62
+ - Export following `McpTool` interface
63
+ - [x] 5.2 Create `src/tools/index.ts` registry exporting all tools
64
+ - [x] 5.3 Verify JSON Schema generation from Zod works correctly
65
+
66
+ ## 6. Server Core
67
+
68
+ - [x] 6.1 Create `src/server.ts` with:
69
+ - MCP server factory function
70
+ - Tool registration from registry
71
+ - Server identification (name, version from package.json)
72
+ - [x] 6.2 Create `src/index.ts` entry point:
73
+ - Config validation
74
+ - Logger initialization
75
+ - Transport creation
76
+ - Server bootstrap
77
+ - Graceful shutdown handlers (SIGINT/SIGTERM)
78
+
79
+ ## 7. Unit Tests
80
+
81
+ - [x] 7.1 Create `src/tools/hello.test.ts` testing:
82
+ - Handler returns correct greeting
83
+ - Input validation rejects invalid input
84
+ - [x] 7.2 Create `src/config.test.ts` testing:
85
+ - Valid config parses correctly
86
+ - Invalid config throws descriptive error
87
+ - [x] 7.3 Create `src/logger.test.ts` testing:
88
+ - Output is valid JSON
89
+ - Sensitive fields are redacted
90
+ - [x] 7.4 Verify `npm test` runs all unit tests successfully
91
+
92
+ ## 8. Integration Tests
93
+
94
+ - [x] 8.1 Create `tests/integration/` directory
95
+ - [x] 8.2 Create `tests/integration/server.test.ts`:
96
+ - Spawn server as child process
97
+ - Connect using MCP client with StdioClientTransport
98
+ - Verify tool list includes "hello"
99
+ - [x] 8.3 Create `tests/integration/tools.test.ts`:
100
+ - Call hello tool with valid input, verify response
101
+ - Call hello tool with invalid input, verify error response
102
+ - [x] 8.4 Add `test:integration` npm script
103
+ - [x] 8.5 Verify integration tests pass
104
+
105
+ ## 9. Validation & Documentation
106
+
107
+ - [x] 9.1 Verify `npm run build` produces valid output in `dist/`
108
+ - [x] 9.2 Verify `npm run dev` starts server with watch mode (stdio transport)
109
+ - [x] 9.3 Verify `TRANSPORT=http PORT=3000 npm run dev` starts HTTP server
110
+ - [x] 9.4 Test server with MCP client (e.g., Claude Code) to confirm tool discovery and execution
111
+ - [x] 9.5 Add shebang `#!/usr/bin/env node` to entry point for bin execution
112
+ - [x] 9.6 Verify `npx .` works after build (local bin test)
@@ -0,0 +1,333 @@
1
+ # Design: Dynamic Web Testing Orchestrator
2
+
3
+ ## Context
4
+
5
+ This MCP server orchestrates web application testing by:
6
+ 1. Managing browser automation via an external Playwright MCP server
7
+ 2. Using MCP Sampling for all LLM-powered decisions (client-controlled)
8
+ 3. Exposing artifacts as MCP Resources for client consumption
9
+ 4. Supporting interactive workflows via MCP Elicitation
10
+
11
+ The architecture spans multiple systems (this server, Playwright MCP, client LLM) and introduces patterns for capability negotiation, fallback modes, and secure multi-model orchestration.
12
+
13
+ ## Goals / Non-Goals
14
+
15
+ ### Goals
16
+ - Provide end-to-end web testing workflow: explore → analyze → generate tests → execute
17
+ - Use MCP Sampling exclusively for LLM reasoning (no server-side API keys)
18
+ - Support graceful degradation when client lacks sampling/elicitation
19
+ - Ensure all artifacts are browsable as MCP Resources
20
+ - Support cancellation and progress for long-running operations
21
+ - Enforce security: domain allowlists, no credential elicitation, prompt injection resistance
22
+
23
+ ### Non-Goals
24
+ - Authentication/login automation (explicitly out of scope; stop and inform user)
25
+ - Server-side LLM API keys or model selection
26
+ - Visual regression testing (future enhancement)
27
+ - Multi-browser support (Playwright MCP handles this; we just orchestrate)
28
+ - Parallel test execution (single-threaded for v1)
29
+
30
+ ## Architecture Overview
31
+
32
+ ```
33
+ ┌─────────────────────────────────────────────────────────────────┐
34
+ │ MCP Client │
35
+ │ (Claude Desktop, VS Code, custom client) │
36
+ │ - Handles sampling/createMessage requests │
37
+ │ - Displays resources, prompts │
38
+ │ - Provides elicitation UI │
39
+ └─────────────────┬───────────────────────────────────────────────┘
40
+ │ MCP Protocol (stdio/HTTP)
41
+ ┌─────────────────▼───────────────────────────────────────────────┐
42
+ │ testing-mcp (This Server) │
43
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐│
44
+ │ │ Lifecycle │ │ Tool │ │ Resource Manager ││
45
+ │ │ Manager │ │ Handlers │ │ (workspace/artifacts) ││
46
+ │ │ (caps nego) │ │ (5 tools) │ │ ││
47
+ │ └──────────────┘ └──────────────┘ └──────────────────────────┘│
48
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐│
49
+ │ │ Sampling │ │ Elicitation │ │ Progress/Cancellation ││
50
+ │ │ Client │ │ Client │ │ Manager ││
51
+ │ └──────────────┘ └──────────────┘ └──────────────────────────┘│
52
+ │ ┌──────────────────────────────────────────────────────────────┐│
53
+ │ │ Playwright MCP Client (orchestrates external server) ││
54
+ │ └──────────────────────────────────────────────────────────────┘│
55
+ └─────────────────┬───────────────────────────────────────────────┘
56
+ │ MCP Protocol (stdio subprocess)
57
+ ┌─────────────────▼───────────────────────────────────────────────┐
58
+ │ Playwright MCP Server (Microsoft) │
59
+ │ - browser_snapshot, browser_take_screenshot │
60
+ │ - browser_click, browser_type, browser_navigate │
61
+ │ - browser_run_code (for DOM extraction) │
62
+ └─────────────────────────────────────────────────────────────────┘
63
+ ```
64
+
65
+ ## Decisions
66
+
67
+ ### D1: Playwright MCP as subprocess, not embedded
68
+ **Decision**: Spawn Playwright MCP server as a subprocess and communicate via stdio MCP protocol.
69
+
70
+ **Rationale**:
71
+ - Microsoft's Playwright MCP is maintained separately with frequent updates
72
+ - Subprocess isolation prevents version conflicts
73
+ - Standard MCP client pattern; reusable code
74
+ - Can be configured via environment (e.g., browser type, headless mode)
75
+
76
+ **Alternatives considered**:
77
+ - Direct Playwright library integration: Higher coupling, maintenance burden
78
+ - HTTP transport to remote Playwright MCP: Adds network complexity for local use
79
+
80
+ ### D2: Sampling-first with fallback modes
81
+ **Decision**: Primary reasoning via `sampling/createMessage`. When sampling unavailable, emit "human prompt" resources for manual execution.
82
+
83
+ **Rationale**:
84
+ - MCP Sampling keeps API keys client-side (security, flexibility)
85
+ - Fallback ensures server works with minimal clients
86
+ - "Human prompt" resources let users copy/paste to their LLM
87
+
88
+ **Fallback behavior**:
89
+ - `webtest_crawl_app` without sampling: Returns `needsManualInput: true` and a resource with the prompt
90
+ - Tool accepts `manualNextActions` input to continue crawl with user-provided actions
91
+
92
+ ### D3: Structured JSON schemas for all sampling requests
93
+ **Decision**: All sampling requests include strict JSON output schemas. Responses are validated before use.
94
+
95
+ **Rationale**:
96
+ - Predictable parsing of LLM responses
97
+ - Type safety in TypeScript handlers
98
+ - Clear contract between server and client LLM
99
+
100
+ **Schema examples**:
101
+ - Crawl action: `{ actions: [{ tool: string, args: object }], reasoning: string, goalProgress: string }`
102
+ - Test generation: `{ tests: [{ id, name, steps: [...] }] }`
103
+
104
+ ### D4: File-based workspace with Resource URIs
105
+ **Decision**: Each analysis creates a workspace directory. All artifacts written to disk and exposed as `webtest://analysisId/...` resources.
106
+
107
+ **Rationale**:
108
+ - Persistence survives server restarts
109
+ - Resources are stable, shareable URIs
110
+ - File system is simple, debuggable
111
+ - Enables future features (workspace resume, export)
112
+
113
+ **Structure**:
114
+ ```
115
+ workspaces/
116
+ {analysisId}/
117
+ index.json # Analysis metadata
118
+ crawls/
119
+ {crawlId}/
120
+ index.json # Crawl metadata, page list
121
+ pages/
122
+ {pageId}/
123
+ snapshot.json
124
+ screenshot.png
125
+ dom.html
126
+ summary.md
127
+ analysis/
128
+ app-analysis.md
129
+ flows.json
130
+ tests/
131
+ tests.md
132
+ tests.json
133
+ runs/
134
+ {runId}/
135
+ report.md
136
+ artifacts.json
137
+ steps/
138
+ {stepId}/
139
+ screenshot.png
140
+ snapshot.json
141
+ ```
142
+
143
+ ### D5: Capability-based runtime behavior
144
+ **Decision**: Query client capabilities at initialization; store in server context; branch behavior at runtime.
145
+
146
+ **Rationale**:
147
+ - MCP clients vary in capability support
148
+ - Graceful degradation over hard failures
149
+ - Single codebase serves all client types
150
+
151
+ **Capabilities checked**:
152
+ - `sampling`: Use sampling or fallback to manual prompts
153
+ - `elicitation`: Ask user or write questions to output
154
+ - `logging`: Emit logs or stay silent
155
+ - `progress`: Report progress or skip
156
+ - `tasks`: Use task-augmented execution for long operations (optional)
157
+
158
+ ### D6: Security boundaries
159
+ **Decision**: Implement multiple security layers.
160
+
161
+ **Rationale**: Web testing interacts with untrusted content; must prevent abuse.
162
+
163
+ **Layers**:
164
+ 1. **Domain allowlist**: Default to target domain only. Explicitly opt-in for additional domains.
165
+ 2. **Prompt injection resistance**: Model instructions are prefixed with "SYSTEM:" and wrapped; page content is clearly demarcated as "USER CONTENT:". Sampling prompts instruct model to ignore instructions in page content.
166
+ 3. **No credential elicitation**: If auth required, inform user and stop. Never ask for passwords/tokens via elicitation.
167
+ 4. **Action validation**: Before executing Playwright actions, validate they target allowed domains.
168
+
169
+ ### D7: Progress and cancellation implementation
170
+ **Decision**: Use `progressToken` from request `_meta`; emit `notifications/progress`. Check cancellation registry on each loop iteration.
171
+
172
+ **Rationale**:
173
+ - Standard MCP progress pattern
174
+ - Cancellation enables user control over long crawls
175
+ - Partial results are still valuable
176
+
177
+ **Implementation**:
178
+ - Maintain `Set<requestId>` of cancelled requests
179
+ - On `notifications/cancelled`, add to set
180
+ - Each crawl/test loop iteration checks set; if cancelled, finalize partial output
181
+
182
+ ### D8: Elicitation for specific decision points
183
+ **Decision**: Use elicitation only for enumerated, non-sensitive decisions.
184
+
185
+ **Elicitation triggers** (exhaustive list):
186
+ - Cookie consent: "Accept", "Reject", "Dismiss"
187
+ - Modal blocking: "Close modal", "Interact with modal"
188
+ - Ambiguous navigation: Multiple similar options → list them
189
+ - Auth required: "Stop analysis", "Continue unauthenticated"
190
+
191
+ **Never elicit**: Passwords, tokens, 2FA codes, personal data
192
+
193
+ ### D9: Protocol version requirements for elicitation
194
+ **Decision**: Require MCP protocol revision 2025-06-18 or later; negotiate gracefully with older clients.
195
+
196
+ **Rationale**:
197
+ - Elicitation is a newer MCP feature not available in all clients
198
+ - Explicit version requirement prevents runtime surprises
199
+ - Graceful degradation for older clients maintains usability
200
+
201
+ **Implementation**:
202
+ - Declare `protocolVersion: "2025-06-18"` in server capabilities
203
+ - If client negotiates older version, mark elicitation as unavailable
204
+ - Log warning when running in degraded mode
205
+
206
+ ### D10: Resource change signaling
207
+ **Decision**: Implement `resources/list_changed` notifications and optional `resources/subscribe` for live artifact updates.
208
+
209
+ **Rationale**:
210
+ - Long-running operations (crawl, test execution) produce artifacts incrementally
211
+ - Clients benefit from knowing when new artifacts are available
212
+ - Standard MCP pattern for resource-heavy servers
213
+
214
+ **Implementation**:
215
+ - Check `capabilities.resources.listChanged` at init
216
+ - Emit `notifications/resources/list_changed` when new resources created
217
+ - Support `resources/subscribe` for per-resource update notifications
218
+ - Fallback: clients poll `resources/list` if notifications unsupported
219
+
220
+ ### D11: Playwright MCP capability adapter
221
+ **Decision**: Dynamically discover Playwright MCP tools and build an adapter layer mapping canonical operations to actual tool names.
222
+
223
+ **Rationale**:
224
+ - Different Playwright MCP implementations use different naming (browser_*, playwright_*, unprefixed)
225
+ - Microsoft's implementation may change tool names between versions
226
+ - Adapter pattern isolates our code from external API changes
227
+
228
+ **Implementation**:
229
+ - On first Playwright MCP use, call `tools/list`
230
+ - Build mapping: `{ snapshot: "browser_snapshot", click: "browser_click", ... }`
231
+ - Check for required capabilities; log warnings if missing
232
+ - Cache mapping for session lifetime
233
+
234
+ ### D12: Crawl checkpointing and loop prevention
235
+ **Decision**: Implement periodic checkpoints and multi-level loop detection during crawl.
236
+
237
+ **Rationale**:
238
+ - Long crawls may be interrupted (cancellation, errors, timeouts)
239
+ - Infinite loops on complex apps are a real risk
240
+ - Checkpoints enable resumption; loop detection prevents resource waste
241
+
242
+ **Checkpointing**:
243
+ - Write checkpoint every N steps (default 5)
244
+ - Checkpoint includes: step count, visited pages, action history, goal progress
245
+ - Support `resume: true` to continue from checkpoint
246
+
247
+ **Loop detection**:
248
+ - DOM signature (hash of structural elements) detects same-state loops
249
+ - URL tracking detects navigation cycles
250
+ - Action deduplication prevents repeated identical actions
251
+ - Include loop state in sampling prompts to help model avoid patterns
252
+
253
+ ### D13: Structured logging with correlation and redaction
254
+ **Decision**: Implement MCP logging notifications with correlation IDs, log level control, and sensitive data redaction.
255
+
256
+ **Rationale**:
257
+ - Progress tells "where we are"; logs tell "why we did that"
258
+ - Correlation IDs enable tracing across analysis → crawl → test
259
+ - Sensitive data (tokens, passwords, cookies) must not leak to logs
260
+
261
+ **Implementation**:
262
+ - Support `logging/setLevel` for dynamic control
263
+ - Include `analysisId`, `crawlId`, `testRunId`, `iteration` in all logs
264
+ - Redact: URL query params matching sensitive patterns, cookie values, password inputs
265
+ - Log Playwright tool calls and sampling requests for debugging
266
+
267
+ ### D14: Comprehensive prompt injection hardening
268
+ **Decision**: Implement defense-in-depth against prompt injection attacks via page content.
269
+
270
+ **Rationale**:
271
+ - MCP Sampling forwards untrusted page content to a model
272
+ - Injection attacks could expand scope, exfiltrate data, or request secrets
273
+ - Multiple layers of defense required
274
+
275
+ **Layers**:
276
+ 1. **Demarcation**: Page content wrapped with explicit security warnings
277
+ 2. **Instruction protection**: System instructions use `[WEBTEST-SYSTEM]:` prefix
278
+ 3. **Action validation**: All actions checked against allowed domains
279
+ 4. **Scope enforcement**: Reject actions outside stated user goal
280
+ 5. **Exfiltration blocking**: Block POST to external domains, external network calls
281
+ 6. **Audit logging**: Log all sampling inputs/outputs for security review
282
+ 7. **Test suite**: Include injection resistance tests (direct, indirect, goal hijacking)
283
+
284
+ ## Risks / Trade-offs
285
+
286
+ ### Risk: Sampling latency impacts UX
287
+ **Mitigation**:
288
+ - Emit progress notifications frequently
289
+ - Allow cancellation
290
+ - Batch simple decisions where possible
291
+
292
+ ### Risk: Playwright MCP tool names change
293
+ **Mitigation**:
294
+ - Discover tools at connection time via `tools/list`
295
+ - Maintain mapping from canonical names to actual names
296
+ - Log warnings if expected tools missing
297
+
298
+ ### Risk: Large workspaces consume disk
299
+ **Mitigation**:
300
+ - Configurable retention policy (env var)
301
+ - Screenshots compressed (JPEG quality setting)
302
+ - Future: workspace cleanup command
303
+
304
+ ### Risk: Prompt injection via page content
305
+ **Mitigation**:
306
+ - Clear demarcation in sampling prompts
307
+ - Output schema validation (reject malformed responses)
308
+ - Domain allowlist prevents navigation to attacker-controlled sites
309
+ - Monitoring: log all sampling inputs/outputs for audit
310
+
311
+ ## Migration Plan
312
+
313
+ This is greenfield functionality; no migration needed. The `hello` tool removal is the only breaking change.
314
+
315
+ **Rollout**:
316
+ 1. Implement core infrastructure (lifecycle, sampling client, Playwright client)
317
+ 2. Implement `start_analysis` + resource system
318
+ 3. Implement `crawl` with sampling/elicitation
319
+ 4. Implement `analyze_app` and `generate_tests`
320
+ 5. Implement `run_test_case`
321
+ 6. Add prompts
322
+ 7. Remove `hello` tool
323
+ 8. Documentation and examples
324
+
325
+ ## Open Questions
326
+
327
+ 1. **Playwright MCP package name**: Is it `@anthropic-ai/mcp-playwright`, `@playwright/mcp`, or community package? Need to verify at implementation time.
328
+
329
+ 2. **Tasks support**: The MCP tasks extension is optional. Should we implement it in v1 or defer? **Recommendation**: Defer to v2; progress + cancellation covers most use cases.
330
+
331
+ 3. **Workspace location**: Default to `./workspaces` relative to CWD, or use temp directory? **Recommendation**: Configurable via `WEBTEST_WORKSPACE_DIR` env var, default to `./webtest-workspaces`.
332
+
333
+ 4. **Screenshot format**: PNG (lossless, larger) vs JPEG (lossy, smaller)? **Recommendation**: PNG for accuracy; make configurable.