code-ai-installer 4.0.0 → 4.0.1-b

This diff shows the changes between two publicly released versions of this package as they appear in a supported public registry. It is provided for informational purposes only.
Files changed (471)
  1. package/README.md +83 -67
  2. package/dist/index.js +2 -0
  3. package/dist/mcp/audit_ledger.d.ts +12 -0
  4. package/dist/mcp/audit_ledger.js +82 -0
  5. package/dist/mcp/cli.js +7 -1
  6. package/dist/mcp/config.d.ts +23 -0
  7. package/dist/mcp/config.js +44 -6
  8. package/dist/mcp/index.d.ts +1 -2
  9. package/dist/mcp/index.js +1 -2
  10. package/dist/mcp/paths.d.ts +20 -2
  11. package/dist/mcp/paths.js +29 -5
  12. package/dist/mcp/proposal_dedup.d.ts +32 -0
  13. package/dist/mcp/proposal_dedup.js +102 -0
  14. package/dist/mcp/proposal_store.d.ts +18 -0
  15. package/dist/mcp/proposal_store.js +74 -0
  16. package/dist/mcp/scorecard.d.ts +140 -0
  17. package/dist/mcp/scorecard.js +103 -0
  18. package/dist/mcp/skill_invocations.d.ts +15 -0
  19. package/dist/mcp/skill_invocations.js +28 -0
  20. package/dist/mcp/task_state.d.ts +77 -2
  21. package/dist/mcp/tools/_subprocess.d.ts +16 -0
  22. package/dist/mcp/tools/_subprocess.js +56 -0
  23. package/dist/mcp/tools/advance_gate.js +2 -2
  24. package/dist/mcp/tools/aggregate_run_metrics.d.ts +19 -0
  25. package/dist/mcp/tools/aggregate_run_metrics.js +139 -0
  26. package/dist/mcp/tools/apply_diff.d.ts +2 -0
  27. package/dist/mcp/tools/apply_diff.js +29 -0
  28. package/dist/mcp/tools/audit_bilocale_parity.d.ts +2 -0
  29. package/dist/mcp/tools/audit_bilocale_parity.js +146 -0
  30. package/dist/mcp/tools/audit_budget_compliance.d.ts +35 -0
  31. package/dist/mcp/tools/audit_budget_compliance.js +172 -0
  32. package/dist/mcp/tools/build.d.ts +2 -0
  33. package/dist/mcp/tools/build.js +47 -0
  34. package/dist/mcp/tools/check_lint.d.ts +2 -0
  35. package/dist/mcp/tools/check_lint.js +23 -0
  36. package/dist/mcp/tools/classify_gate.js +2 -2
  37. package/dist/mcp/tools/current_gate.js +2 -2
  38. package/dist/mcp/tools/dependency_supply_chain.d.ts +2 -0
  39. package/dist/mcp/tools/dependency_supply_chain.js +59 -0
  40. package/dist/mcp/tools/docker_compose.d.ts +2 -0
  41. package/dist/mcp/tools/docker_compose.js +24 -0
  42. package/dist/mcp/tools/e2e_playwright.d.ts +2 -0
  43. package/dist/mcp/tools/e2e_playwright.js +88 -0
  44. package/dist/mcp/tools/get_skill.js +17 -0
  45. package/dist/mcp/tools/git_commit.d.ts +2 -0
  46. package/dist/mcp/tools/git_commit.js +30 -0
  47. package/dist/mcp/tools/list_proposals.d.ts +6 -0
  48. package/dist/mcp/tools/list_proposals.js +16 -0
  49. package/dist/mcp/tools/list_skills.js +9 -1
  50. package/dist/mcp/tools/load_role.d.ts +3 -4
  51. package/dist/mcp/tools/load_role.js +11 -13
  52. package/dist/mcp/tools/propose_change.d.ts +8 -0
  53. package/dist/mcp/tools/propose_change.js +36 -0
  54. package/dist/mcp/tools/record_decision.js +25 -25
  55. package/dist/mcp/tools/review_proposal.d.ts +17 -0
  56. package/dist/mcp/tools/review_proposal.js +99 -0
  57. package/dist/mcp/tools/run_drift_audit.d.ts +11 -0
  58. package/dist/mcp/tools/run_drift_audit.js +79 -0
  59. package/dist/mcp/tools/run_tests.d.ts +2 -0
  60. package/dist/mcp/tools/run_tests.js +92 -0
  61. package/dist/mcp/tools/sign_off.js +14 -2
  62. package/dist/mcp/tools/stubs.js +30 -9
  63. package/dist/mcp/tools/verify_claim.js +33 -6
  64. package/dist/mcp_setup.d.ts +14 -3
  65. package/dist/mcp_setup.js +15 -6
  66. package/dist/shared/frontmatter.d.ts +44 -2
  67. package/dist/shared/frontmatter.js +54 -6
  68. package/dist/shared/index.d.ts +0 -5
  69. package/dist/shared/index.js +0 -5
  70. package/dist/shared/persona.d.ts +2 -2
  71. package/dist/shared/persona.js +1 -1
  72. package/dist/shared/pipeline.d.ts +46 -1
  73. package/dist/shared/tools.d.ts +1382 -16
  74. package/dist/shared/tools.js +229 -0
  75. package/dist/shared/vocabulary.d.ts +99 -4
  76. package/dist/shared/vocabulary.js +94 -5
  77. package/domains/analytics/.agents/skills/ansoff-matrix/SKILL.md +316 -300
  78. package/domains/analytics/.agents/skills/bcg-matrix/SKILL.md +345 -329
  79. package/domains/analytics/.agents/skills/blue-ocean-strategy/SKILL.md +432 -416
  80. package/domains/analytics/.agents/skills/board/SKILL.md +22 -0
  81. package/domains/analytics/.agents/skills/cohort-analysis/SKILL.md +338 -322
  82. package/domains/analytics/.agents/skills/competitive-analysis/SKILL.md +413 -395
  83. package/domains/analytics/.agents/skills/customer-journey-mapping/SKILL.md +347 -331
  84. package/domains/analytics/.agents/skills/gates/SKILL.md +388 -366
  85. package/domains/analytics/.agents/skills/handoff/SKILL.md +402 -380
  86. package/domains/analytics/.agents/skills/html-pdf-report/SKILL.md +21 -289
  87. package/domains/analytics/.agents/skills/html-pdf-report-reference/SKILL.md +325 -0
  88. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  89. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  90. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  91. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  92. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  93. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  94. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/skill.yaml +23 -0
  95. package/domains/analytics/.agents/skills/icp-buyer-persona/SKILL.md +407 -390
  96. package/domains/analytics/.agents/skills/jtbd-analysis/SKILL.md +357 -341
  97. package/domains/analytics/.agents/skills/karpathy-guidelines/SKILL.md +32 -0
  98. package/domains/analytics/.agents/skills/pest-analysis/SKILL.md +324 -305
  99. package/domains/analytics/.agents/skills/porters-five-forces/SKILL.md +377 -361
  100. package/domains/analytics/.agents/skills/report-design/SKILL.md +416 -398
  101. package/domains/analytics/.agents/skills/rfm-analysis/SKILL.md +330 -314
  102. package/domains/analytics/.agents/skills/session-prompt-generator/SKILL.md +400 -378
  103. package/domains/analytics/.agents/skills/swot-analysis/SKILL.md +340 -324
  104. package/domains/analytics/.agents/skills/tam-sam-som/SKILL.md +329 -312
  105. package/domains/analytics/.agents/skills/trend-analysis/SKILL.md +347 -331
  106. package/domains/analytics/.agents/skills/unit-economics/SKILL.md +430 -413
  107. package/domains/analytics/.agents/skills/value-chain-analysis/SKILL.md +346 -330
  108. package/domains/analytics/.agents/skills/web-research/SKILL.md +323 -308
  109. package/domains/analytics/AGENTS.md +1 -0
  110. package/domains/analytics/agents/auditor.md +76 -0
  111. package/domains/analytics/agents/conductor.md +11 -0
  112. package/domains/analytics/agents/data_analyst.md +11 -0
  113. package/domains/analytics/agents/designer.md +11 -0
  114. package/domains/analytics/agents/interviewer.md +11 -0
  115. package/domains/analytics/agents/layouter.md +11 -0
  116. package/domains/analytics/agents/mediator.md +11 -0
  117. package/domains/analytics/agents/researcher.md +11 -0
  118. package/domains/analytics/agents/strategist.md +11 -0
  119. package/domains/analytics/locales/en/.agents/skills/ansoff-matrix/SKILL.md +316 -300
  120. package/domains/analytics/locales/en/.agents/skills/bcg-matrix/SKILL.md +345 -329
  121. package/domains/analytics/locales/en/.agents/skills/blue-ocean-strategy/SKILL.md +432 -416
  122. package/domains/analytics/locales/en/.agents/skills/board/SKILL.md +22 -0
  123. package/domains/analytics/locales/en/.agents/skills/cohort-analysis/SKILL.md +338 -322
  124. package/domains/analytics/locales/en/.agents/skills/competitive-analysis/SKILL.md +413 -395
  125. package/domains/analytics/locales/en/.agents/skills/customer-journey-mapping/SKILL.md +347 -331
  126. package/domains/analytics/locales/en/.agents/skills/gates/SKILL.md +388 -366
  127. package/domains/analytics/locales/en/.agents/skills/handoff/SKILL.md +402 -380
  128. package/domains/analytics/locales/en/.agents/skills/html-pdf-report/SKILL.md +21 -289
  129. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/SKILL.md +325 -0
  130. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  131. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  132. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  133. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  134. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  135. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  136. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/skill.yaml +29 -0
  137. package/domains/analytics/locales/en/.agents/skills/icp-buyer-persona/SKILL.md +407 -390
  138. package/domains/analytics/locales/en/.agents/skills/jtbd-analysis/SKILL.md +357 -341
  139. package/domains/analytics/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +32 -0
  140. package/domains/analytics/locales/en/.agents/skills/pest-analysis/SKILL.md +324 -305
  141. package/domains/analytics/locales/en/.agents/skills/porters-five-forces/SKILL.md +377 -361
  142. package/domains/analytics/locales/en/.agents/skills/report-design/SKILL.md +416 -398
  143. package/domains/analytics/locales/en/.agents/skills/rfm-analysis/SKILL.md +330 -314
  144. package/domains/analytics/locales/en/.agents/skills/session-prompt-generator/SKILL.md +400 -378
  145. package/domains/analytics/locales/en/.agents/skills/swot-analysis/SKILL.md +340 -324
  146. package/domains/analytics/locales/en/.agents/skills/tam-sam-som/SKILL.md +329 -312
  147. package/domains/analytics/locales/en/.agents/skills/trend-analysis/SKILL.md +347 -331
  148. package/domains/analytics/locales/en/.agents/skills/unit-economics/SKILL.md +430 -413
  149. package/domains/analytics/locales/en/.agents/skills/value-chain-analysis/SKILL.md +366 -350
  150. package/domains/analytics/locales/en/.agents/skills/web-research/SKILL.md +324 -309
  151. package/domains/analytics/locales/en/AGENTS.md +1 -0
  152. package/domains/analytics/locales/en/agents/auditor.md +76 -0
  153. package/domains/analytics/locales/en/agents/conductor.md +27 -0
  154. package/domains/analytics/locales/en/agents/data_analyst.md +29 -0
  155. package/domains/analytics/locales/en/agents/designer.md +27 -0
  156. package/domains/analytics/locales/en/agents/interviewer.md +11 -0
  157. package/domains/analytics/locales/en/agents/layouter.md +11 -0
  158. package/domains/analytics/locales/en/agents/mediator.md +11 -0
  159. package/domains/analytics/locales/en/agents/researcher.md +11 -0
  160. package/domains/analytics/locales/en/agents/strategist.md +11 -0
  161. package/domains/analytics/persona/persona-base.md +94 -0
  162. package/domains/analytics/pipeline.yaml +102 -0
  163. package/domains/content/.agents/skills/audience-analysis/SKILL.md +15 -0
  164. package/domains/content/.agents/skills/board/SKILL.md +20 -0
  165. package/domains/content/.agents/skills/brand-compliance/SKILL.md +15 -0
  166. package/domains/content/.agents/skills/brand-guidelines/SKILL.md +17 -0
  167. package/domains/content/.agents/skills/competitor-content-analysis/SKILL.md +15 -0
  168. package/domains/content/.agents/skills/content-brief/SKILL.md +15 -0
  169. package/domains/content/.agents/skills/content-calendar/SKILL.md +15 -0
  170. package/domains/content/.agents/skills/content-release-gate/SKILL.md +15 -0
  171. package/domains/content/.agents/skills/content-review-checklist/SKILL.md +15 -0
  172. package/domains/content/.agents/skills/cta-optimization/SKILL.md +15 -0
  173. package/domains/content/.agents/skills/data-storytelling/SKILL.md +15 -0
  174. package/domains/content/.agents/skills/email-copywriting/SKILL.md +15 -0
  175. package/domains/content/.agents/skills/email-engagement-tiers/SKILL.md +15 -0
  176. package/domains/content/.agents/skills/fact-checking/SKILL.md +15 -0
  177. package/domains/content/.agents/skills/gates/SKILL.md +20 -0
  178. package/domains/content/.agents/skills/google-stitch-content/SKILL.md +15 -0
  179. package/domains/content/.agents/skills/handoff/SKILL.md +24 -0
  180. package/domains/content/.agents/skills/headline-formulas/SKILL.md +15 -0
  181. package/domains/content/.agents/skills/image-prompt-engineering/SKILL.md +15 -0
  182. package/domains/content/.agents/skills/karpathy-guidelines/SKILL.md +28 -0
  183. package/domains/content/.agents/skills/mailerlite-email-ops/SKILL.md +15 -0
  184. package/domains/content/.agents/skills/marketing-psychology/SKILL.md +15 -0
  185. package/domains/content/.agents/skills/moodboard/SKILL.md +15 -0
  186. package/domains/content/.agents/skills/platform-compliance/SKILL.md +15 -0
  187. package/domains/content/.agents/skills/platform-strategy/SKILL.md +15 -0
  188. package/domains/content/.agents/skills/platform-visual-specs/SKILL.md +15 -0
  189. package/domains/content/.agents/skills/readability-scoring/SKILL.md +15 -0
  190. package/domains/content/.agents/skills/seo-copywriting/SKILL.md +15 -0
  191. package/domains/content/.agents/skills/social-media-formats/SKILL.md +15 -0
  192. package/domains/content/.agents/skills/source-verification/SKILL.md +15 -0
  193. package/domains/content/.agents/skills/storytelling-framework/SKILL.md +15 -0
  194. package/domains/content/.agents/skills/tone-of-voice/SKILL.md +15 -0
  195. package/domains/content/.agents/skills/topic-research/SKILL.md +15 -0
  196. package/domains/content/.agents/skills/trend-research/SKILL.md +15 -0
  197. package/domains/content/.agents/skills/visual-brief/SKILL.md +15 -0
  198. package/domains/content/AGENTS.md +4 -0
  199. package/domains/content/agents/auditor.md +76 -0
  200. package/domains/content/agents/conductor.md +11 -0
  201. package/domains/content/agents/copywriter.md +11 -0
  202. package/domains/content/agents/researcher.md +11 -0
  203. package/domains/content/agents/reviewer.md +11 -0
  204. package/domains/content/agents/strategist.md +11 -0
  205. package/domains/content/agents/visual_concept.md +11 -0
  206. package/domains/content/locales/en/.agents/skills/audience-analysis/SKILL.md +15 -0
  207. package/domains/content/locales/en/.agents/skills/board/SKILL.md +20 -0
  208. package/domains/content/locales/en/.agents/skills/brand-compliance/SKILL.md +15 -0
  209. package/domains/content/locales/en/.agents/skills/brand-guidelines/SKILL.md +17 -0
  210. package/domains/content/locales/en/.agents/skills/competitor-content-analysis/SKILL.md +15 -0
  211. package/domains/content/locales/en/.agents/skills/content-brief/SKILL.md +15 -0
  212. package/domains/content/locales/en/.agents/skills/content-calendar/SKILL.md +15 -0
  213. package/domains/content/locales/en/.agents/skills/content-release-gate/SKILL.md +15 -0
  214. package/domains/content/locales/en/.agents/skills/content-review-checklist/SKILL.md +15 -0
  215. package/domains/content/locales/en/.agents/skills/cta-optimization/SKILL.md +15 -0
  216. package/domains/content/locales/en/.agents/skills/data-storytelling/SKILL.md +15 -0
  217. package/domains/content/locales/en/.agents/skills/email-copywriting/SKILL.md +15 -0
  218. package/domains/content/locales/en/.agents/skills/email-engagement-tiers/SKILL.md +15 -0
  219. package/domains/content/locales/en/.agents/skills/fact-checking/SKILL.md +15 -0
  220. package/domains/content/locales/en/.agents/skills/gates/SKILL.md +20 -0
  221. package/domains/content/locales/en/.agents/skills/google-stitch-content/SKILL.md +15 -0
  222. package/domains/content/locales/en/.agents/skills/handoff/SKILL.md +24 -0
  223. package/domains/content/locales/en/.agents/skills/headline-formulas/SKILL.md +15 -0
  224. package/domains/content/locales/en/.agents/skills/image-prompt-engineering/SKILL.md +15 -0
  225. package/domains/content/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +30 -1
  226. package/domains/content/locales/en/.agents/skills/mailerlite-email-ops/SKILL.md +15 -0
  227. package/domains/content/locales/en/.agents/skills/marketing-psychology/SKILL.md +15 -0
  228. package/domains/content/locales/en/.agents/skills/moodboard/SKILL.md +15 -0
  229. package/domains/content/locales/en/.agents/skills/platform-compliance/SKILL.md +15 -0
  230. package/domains/content/locales/en/.agents/skills/platform-strategy/SKILL.md +15 -0
  231. package/domains/content/locales/en/.agents/skills/platform-visual-specs/SKILL.md +15 -0
  232. package/domains/content/locales/en/.agents/skills/readability-scoring/SKILL.md +15 -0
  233. package/domains/content/locales/en/.agents/skills/seo-copywriting/SKILL.md +15 -0
  234. package/domains/content/locales/en/.agents/skills/social-media-formats/SKILL.md +15 -0
  235. package/domains/content/locales/en/.agents/skills/source-verification/SKILL.md +15 -0
  236. package/domains/content/locales/en/.agents/skills/storytelling-framework/SKILL.md +15 -0
  237. package/domains/content/locales/en/.agents/skills/tone-of-voice/SKILL.md +15 -0
  238. package/domains/content/locales/en/.agents/skills/topic-research/SKILL.md +15 -0
  239. package/domains/content/locales/en/.agents/skills/trend-research/SKILL.md +15 -0
  240. package/domains/content/locales/en/.agents/skills/visual-brief/SKILL.md +15 -0
  241. package/domains/content/locales/en/AGENTS.md +4 -0
  242. package/domains/content/locales/en/agents/auditor.md +76 -0
  243. package/domains/content/locales/en/agents/conductor.md +12 -0
  244. package/domains/content/locales/en/agents/copywriter.md +12 -0
  245. package/domains/content/locales/en/agents/researcher.md +12 -0
  246. package/domains/content/locales/en/agents/reviewer.md +12 -0
  247. package/domains/content/locales/en/agents/strategist.md +12 -0
  248. package/domains/content/locales/en/agents/visual_concept.md +12 -0
  249. package/domains/content/persona/persona-base.md +94 -0
  250. package/domains/content/pipeline.yaml +96 -0
  251. package/domains/development/.agents/skills/adr-log/SKILL.md +1 -0
  252. package/domains/development/.agents/skills/design-intake/SKILL.md +0 -4
  253. package/domains/development/.agents/skills/karpathy-guidelines/SKILL.md +2 -1
  254. package/domains/development/.agents/skills/lava-flow-legacy-detection/SKILL.md +15 -1
  255. package/domains/development/.agents/skills/mcp-integration/SKILL.md +211 -0
  256. package/domains/development/.agents/skills/mcp-integration/agents/claude.json +22 -0
  257. package/domains/development/.agents/skills/mcp-integration/agents/copilot.json +22 -0
  258. package/domains/development/.agents/skills/mcp-integration/agents/gemini.json +22 -0
  259. package/domains/development/.agents/skills/mcp-integration/agents/kimi.yaml +18 -0
  260. package/domains/development/.agents/skills/mcp-integration/agents/openai.yaml +8 -0
  261. package/domains/development/.agents/skills/mcp-integration/agents/qwen.json +22 -0
  262. package/domains/development/.agents/skills/mcp-integration/agents/skill.yaml +26 -0
  263. package/domains/development/.agents/skills/qa-ui-a11y-smoke/SKILL.md +1 -1
  264. package/domains/development/.agents/skills/ui-a11y-smoke-review/SKILL.md +1 -1
  265. package/domains/development/AGENTS.md +1 -0
  266. package/domains/development/AGENTS.yaml +1 -0
  267. package/domains/development/agents/architect.md +13 -1
  268. package/domains/development/agents/auditor.md +74 -0
  269. package/domains/development/agents/conductor.md +14 -3
  270. package/domains/development/agents/devops.md +8 -9
  271. package/domains/development/agents/reviewer.md +12 -0
  272. package/domains/development/agents/senior_full_stack.md +12 -0
  273. package/domains/development/agents/tester.md +10 -16
  274. package/domains/development/locales/en/.agents/skills/adr-log/SKILL.md +1 -0
  275. package/domains/development/locales/en/.agents/skills/current-state-analysis/SKILL.md +256 -172
  276. package/domains/development/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +2 -1
  277. package/domains/development/locales/en/.agents/skills/lava-flow-legacy-detection/SKILL.md +15 -1
  278. package/domains/development/locales/en/.agents/skills/mcp-integration/SKILL.md +211 -0
  279. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/claude.json +22 -0
  280. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/copilot.json +22 -0
  281. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/gemini.json +22 -0
  282. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/kimi.yaml +18 -0
  283. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/openai.yaml +8 -0
  284. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/qwen.json +22 -0
  285. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/skill.yaml +26 -0
  286. package/domains/development/locales/en/.agents/skills/qa-ui-a11y-smoke/SKILL.md +1 -1
  287. package/domains/development/locales/en/.agents/skills/ui-a11y-smoke-review/SKILL.md +1 -1
  288. package/domains/development/locales/en/AGENTS.md +5 -0
  289. package/domains/development/locales/en/AGENTS.yaml +1 -0
  290. package/domains/development/locales/en/agents/architect.md +13 -1
  291. package/domains/development/locales/en/agents/auditor.md +74 -0
  292. package/domains/development/locales/en/agents/conductor.md +14 -3
  293. package/domains/development/locales/en/agents/devops.md +8 -9
  294. package/domains/development/locales/en/agents/reviewer.md +12 -0
  295. package/domains/development/locales/en/agents/senior_full_stack.md +12 -0
  296. package/domains/development/locales/en/agents/tester.md +10 -16
  297. package/domains/development/persona/persona-base.md +94 -0
  298. package/domains/product/.agents/skills/aarrr-metrics/SKILL.md +451 -433
  299. package/domains/product/.agents/skills/ab-test-design/SKILL.md +428 -412
  300. package/domains/product/.agents/skills/acceptance-criteria/SKILL.md +422 -406
  301. package/domains/product/.agents/skills/assumption-mapping/SKILL.md +323 -307
  302. package/domains/product/.agents/skills/board/SKILL.md +24 -0
  303. package/domains/product/.agents/skills/design-brief/SKILL.md +433 -418
  304. package/domains/product/.agents/skills/epic-breakdown/SKILL.md +435 -420
  305. package/domains/product/.agents/skills/gates/SKILL.md +470 -446
  306. package/domains/product/.agents/skills/gtm-brief/SKILL.md +18 -321
  307. package/domains/product/.agents/skills/gtm-brief-reference/SKILL.md +348 -0
  308. package/domains/product/.agents/skills/gtm-brief-reference/agents/claude.json +17 -0
  309. package/domains/product/.agents/skills/gtm-brief-reference/agents/copilot.json +17 -0
  310. package/domains/product/.agents/skills/gtm-brief-reference/agents/gemini.json +17 -0
  311. package/domains/product/.agents/skills/gtm-brief-reference/agents/kimi.yaml +15 -0
  312. package/domains/product/.agents/skills/gtm-brief-reference/agents/openai.yaml +10 -0
  313. package/domains/product/.agents/skills/gtm-brief-reference/agents/qwen.json +17 -0
  314. package/domains/product/.agents/skills/gtm-brief-reference/agents/skill.yaml +22 -0
  315. package/domains/product/.agents/skills/handoff/SKILL.md +463 -439
  316. package/domains/product/.agents/skills/html-pdf-report/SKILL.md +21 -663
  317. package/domains/product/.agents/skills/html-pdf-report-reference/SKILL.md +699 -0
  318. package/domains/product/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  319. package/domains/product/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  320. package/domains/product/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  321. package/domains/product/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  322. package/domains/product/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  323. package/domains/product/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  324. package/domains/product/.agents/skills/html-pdf-report-reference/agents/skill.yaml +22 -0
  325. package/domains/product/.agents/skills/hypothesis-template/SKILL.md +484 -469
  326. package/domains/product/.agents/skills/jtbd-canvas/SKILL.md +274 -258
  327. package/domains/product/.agents/skills/kano-model/SKILL.md +370 -355
  328. package/domains/product/.agents/skills/karpathy-guidelines/SKILL.md +36 -0
  329. package/domains/product/.agents/skills/launch-checklist/SKILL.md +434 -419
  330. package/domains/product/.agents/skills/moscow-prioritization/SKILL.md +407 -392
  331. package/domains/product/.agents/skills/north-star-metric/SKILL.md +317 -301
  332. package/domains/product/.agents/skills/okr-framework/SKILL.md +299 -284
  333. package/domains/product/.agents/skills/opportunity-solution-tree/SKILL.md +472 -456
  334. package/domains/product/.agents/skills/prd-template/SKILL.md +18 -258
  335. package/domains/product/.agents/skills/prd-template-reference/SKILL.md +285 -0
  336. package/domains/product/.agents/skills/prd-template-reference/agents/claude.json +17 -0
  337. package/domains/product/.agents/skills/prd-template-reference/agents/copilot.json +17 -0
  338. package/domains/product/.agents/skills/prd-template-reference/agents/gemini.json +17 -0
  339. package/domains/product/.agents/skills/prd-template-reference/agents/kimi.yaml +16 -0
  340. package/domains/product/.agents/skills/prd-template-reference/agents/openai.yaml +10 -0
  341. package/domains/product/.agents/skills/prd-template-reference/agents/qwen.json +17 -0
  342. package/domains/product/.agents/skills/prd-template-reference/agents/skill.yaml +23 -0
  343. package/domains/product/.agents/skills/problem-statement/SKILL.md +327 -312
  344. package/domains/product/.agents/skills/product-roadmap/SKILL.md +320 -304
  345. package/domains/product/.agents/skills/product-vision/SKILL.md +409 -394
  346. package/domains/product/.agents/skills/release-notes/SKILL.md +18 -258
  347. package/domains/product/.agents/skills/release-notes-reference/SKILL.md +285 -0
  348. package/domains/product/.agents/skills/release-notes-reference/agents/claude.json +17 -0
  349. package/domains/product/.agents/skills/release-notes-reference/agents/copilot.json +17 -0
  350. package/domains/product/.agents/skills/release-notes-reference/agents/gemini.json +17 -0
  351. package/domains/product/.agents/skills/release-notes-reference/agents/kimi.yaml +15 -0
  352. package/domains/product/.agents/skills/release-notes-reference/agents/openai.yaml +10 -0
  353. package/domains/product/.agents/skills/release-notes-reference/agents/qwen.json +17 -0
  354. package/domains/product/.agents/skills/release-notes-reference/agents/skill.yaml +22 -0
  355. package/domains/product/.agents/skills/report-design/SKILL.md +17 -307
  356. package/domains/product/.agents/skills/report-design-reference/SKILL.md +331 -0
  357. package/domains/product/.agents/skills/report-design-reference/agents/claude.json +17 -0
  358. package/domains/product/.agents/skills/report-design-reference/agents/copilot.json +17 -0
  359. package/domains/product/.agents/skills/report-design-reference/agents/gemini.json +17 -0
  360. package/domains/product/.agents/skills/report-design-reference/agents/kimi.yaml +15 -0
  361. package/domains/product/.agents/skills/report-design-reference/agents/openai.yaml +10 -0
  362. package/domains/product/.agents/skills/report-design-reference/agents/qwen.json +17 -0
  363. package/domains/product/.agents/skills/report-design-reference/agents/skill.yaml +22 -0
  364. package/domains/product/.agents/skills/rice-scoring/SKILL.md +266 -251
  365. package/domains/product/.agents/skills/saas-metrics/SKILL.md +422 -404
  366. package/domains/product/.agents/skills/session-prompt-generator/SKILL.md +474 -450
  367. package/domains/product/.agents/skills/user-flow/SKILL.md +491 -476
  368. package/domains/product/.agents/skills/user-interview-script/SKILL.md +315 -298
  369. package/domains/product/.agents/skills/user-story/SKILL.md +401 -385
  370. package/domains/product/.agents/skills/wsjf-scoring/SKILL.md +333 -315
  371. package/domains/product/AGENTS.md +5 -0
  372. package/domains/product/AGENTS.yaml +1 -0
  373. package/domains/product/agents/auditor.md +76 -0
  374. package/domains/product/agents/conductor.md +11 -0
  375. package/domains/product/agents/data_analyst.md +11 -0
  376. package/domains/product/agents/designer.md +11 -0
  377. package/domains/product/agents/discovery.md +11 -0
  378. package/domains/product/agents/layouter.md +11 -0
  379. package/domains/product/agents/mediator.md +11 -0
  380. package/domains/product/agents/pm.md +11 -0
  381. package/domains/product/agents/product_strategist.md +11 -0
  382. package/domains/product/agents/tech_lead.md +11 -0
  383. package/domains/product/agents/ux_designer.md +11 -0
  384. package/domains/product/locales/en/.agents/skills/aarrr-metrics/SKILL.md +451 -433
  385. package/domains/product/locales/en/.agents/skills/ab-test-design/SKILL.md +428 -412
  386. package/domains/product/locales/en/.agents/skills/acceptance-criteria/SKILL.md +422 -406
  387. package/domains/product/locales/en/.agents/skills/assumption-mapping/SKILL.md +323 -307
  388. package/domains/product/locales/en/.agents/skills/board/SKILL.md +24 -0
  389. package/domains/product/locales/en/.agents/skills/design-brief/SKILL.md +433 -418
  390. package/domains/product/locales/en/.agents/skills/epic-breakdown/SKILL.md +435 -420
  391. package/domains/product/locales/en/.agents/skills/gates/SKILL.md +470 -446
  392. package/domains/product/locales/en/.agents/skills/gtm-brief/SKILL.md +18 -321
  393. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/SKILL.md +348 -0
  394. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/claude.json +17 -0
  395. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/copilot.json +17 -0
  396. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/gemini.json +17 -0
  397. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/kimi.yaml +15 -0
  398. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/openai.yaml +10 -0
  399. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/qwen.json +17 -0
  400. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/skill.yaml +22 -0
  401. package/domains/product/locales/en/.agents/skills/handoff/SKILL.md +463 -439
  402. package/domains/product/locales/en/.agents/skills/html-pdf-report/SKILL.md +21 -663
  403. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/SKILL.md +699 -0
  404. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  405. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  406. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  407. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  408. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  409. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  410. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/skill.yaml +22 -0
  411. package/domains/product/locales/en/.agents/skills/hypothesis-template/SKILL.md +484 -469
  412. package/domains/product/locales/en/.agents/skills/jtbd-canvas/SKILL.md +273 -257
  413. package/domains/product/locales/en/.agents/skills/kano-model/SKILL.md +370 -355
  414. package/domains/product/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +36 -0
  415. package/domains/product/locales/en/.agents/skills/launch-checklist/SKILL.md +434 -419
  416. package/domains/product/locales/en/.agents/skills/moscow-prioritization/SKILL.md +407 -392
  417. package/domains/product/locales/en/.agents/skills/north-star-metric/SKILL.md +317 -301
  418. package/domains/product/locales/en/.agents/skills/okr-framework/SKILL.md +299 -284
  419. package/domains/product/locales/en/.agents/skills/opportunity-solution-tree/SKILL.md +472 -456
  420. package/domains/product/locales/en/.agents/skills/prd-template/SKILL.md +18 -258
  421. package/domains/product/locales/en/.agents/skills/prd-template-reference/SKILL.md +285 -0
  422. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/claude.json +16 -0
  423. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/copilot.json +16 -0
  424. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/gemini.json +16 -0
  425. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/kimi.yaml +15 -0
  426. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/openai.yaml +10 -0
  427. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/qwen.json +16 -0
  428. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/skill.yaml +22 -0
  429. package/domains/product/locales/en/.agents/skills/problem-statement/SKILL.md +327 -312
  430. package/domains/product/locales/en/.agents/skills/product-roadmap/SKILL.md +321 -305
  431. package/domains/product/locales/en/.agents/skills/product-vision/SKILL.md +410 -395
  432. package/domains/product/locales/en/.agents/skills/release-notes/SKILL.md +18 -258
  433. package/domains/product/locales/en/.agents/skills/release-notes-reference/SKILL.md +285 -0
  434. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/claude.json +16 -0
  435. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/copilot.json +16 -0
  436. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/gemini.json +16 -0
  437. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/kimi.yaml +14 -0
  438. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/openai.yaml +10 -0
  439. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/qwen.json +16 -0
  440. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/skill.yaml +21 -0
  441. package/domains/product/locales/en/.agents/skills/report-design/SKILL.md +17 -307
  442. package/domains/product/locales/en/.agents/skills/report-design-reference/SKILL.md +331 -0
  443. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/claude.json +17 -0
  444. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/copilot.json +17 -0
  445. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/gemini.json +17 -0
  446. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/kimi.yaml +15 -0
  447. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/openai.yaml +10 -0
  448. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/qwen.json +17 -0
  449. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/skill.yaml +22 -0
  450. package/domains/product/locales/en/.agents/skills/rice-scoring/SKILL.md +266 -251
  451. package/domains/product/locales/en/.agents/skills/saas-metrics/SKILL.md +422 -404
  452. package/domains/product/locales/en/.agents/skills/session-prompt-generator/SKILL.md +474 -450
  453. package/domains/product/locales/en/.agents/skills/user-flow/SKILL.md +491 -476
  454. package/domains/product/locales/en/.agents/skills/user-interview-script/SKILL.md +314 -297
  455. package/domains/product/locales/en/.agents/skills/user-story/SKILL.md +401 -385
  456. package/domains/product/locales/en/.agents/skills/wsjf-scoring/SKILL.md +333 -315
  457. package/domains/product/locales/en/AGENTS.md +5 -0
  458. package/domains/product/locales/en/agents/auditor.md +76 -0
  459. package/domains/product/locales/en/agents/conductor.md +11 -0
  460. package/domains/product/locales/en/agents/data_analyst.md +11 -0
  461. package/domains/product/locales/en/agents/designer.md +11 -0
  462. package/domains/product/locales/en/agents/discovery.md +11 -0
  463. package/domains/product/locales/en/agents/layouter.md +11 -0
  464. package/domains/product/locales/en/agents/mediator.md +11 -0
  465. package/domains/product/locales/en/agents/pm.md +11 -0
  466. package/domains/product/locales/en/agents/product_strategist.md +11 -0
  467. package/domains/product/locales/en/agents/tech_lead.md +11 -0
  468. package/domains/product/locales/en/agents/ux_designer.md +11 -0
  469. package/domains/product/persona/persona-base.md +94 -0
  470. package/domains/product/pipeline.yaml +115 -0
  471. package/package.json +72 -70
@@ -1,412 +1,428 @@
1
- ---
2
- name: ab-test-design
3
- description: A/B test design — primary metric, MDE, sample size, duration, guardrails, critical region
4
- ---
5
- # A/B Test Design
6
-
7
- > **Category:** Experimentation · **Slug:** `ab-test-design`
8
-
9
- ## When to Use
10
-
11
- - For hypothesis validation with a quantitative signal.
12
- - When rolling out high-risk changes (pricing, onboarding, core flow).
13
- - For comparing alternatives when evidence is unclear.
14
- - When measuring feature impact for PRD success criteria.
15
-
16
- ## Input
17
-
18
- | Field | Required | Description |
19
- |-------|:--------:|-------------|
20
- | Hypothesis | ✅ | Via `$hypothesis-template` |
21
- | Primary metric | ✅ | What we are measuring |
22
- | Baseline metric value | ✅ | Current mean + variance |
23
- | Expected effect size | ✅ | MDE — minimum detectable effect |
24
- | Traffic / eligible users | ✅ | Weekly eligible sample |
25
- | Infrastructure | ✅ | Feature flag / experimentation platform |
26
-
27
- ## Data Sources
28
-
29
- 1. Historical metric data — baseline + variance.
30
- 2. `$hypothesis-template` — expected direction + magnitude.
31
- 3. User analytics — eligible population.
32
- 4. Industry benchmarks — typical effect sizes.
33
-
34
- ### Related Skills
35
-
36
- | Skill | What we take | When to call |
37
- |-------|-------------|--------------|
38
- | `hypothesis-template` | What we're testing | Prerequisite |
39
- | `saas-metrics` | Primary + guardrail metrics | For selection |
40
- | `aarrr-metrics` | Funnel context | For understanding |
41
- | `assumption-mapping` | High-risk assumption A/B | For top assumptions |
42
-
43
- ## Protocol
44
-
45
- ### Step 0 — Is A/B Appropriate?
46
-
47
- Checklist:
48
- - Enough traffic (≥ 1000 users / week per variant)?
49
- - Metric instrumentable + detectable in timeframe?
50
- - Change isolatable (not confounded with other rollouts)?
51
- - Can ethically A/B (not critical safety / compliance)?
52
-
53
- If not — alternatives: phased rollout with cohort comparison, before/after, qualitative testing.
54
-
55
- ### Step 1 — Primary Metric
56
-
57
- Single primary metric. Not multiple ("Impact on activation AND retention").
58
-
59
- Typical:
60
- - **Activation:** 7-day activation rate
61
- - **Retention:** W/W active, churn
62
- - **Conversion:** signup → paid, trial → active
63
- - **Engagement:** actions per session, DAU/MAU
64
-
65
- Properties:
66
- - **Detectable** in reasonable sample size
67
- - **Aligned** with hypothesis outcome
68
- - **Sensitive** — moves when the expected change happens
69
- - **Trustworthy** — not easily gamed
70
-
71
- ### Step 2 — Minimum Detectable Effect (MDE)
72
-
73
- MDE — smallest lift worth detecting. Trade-off:
74
- - Smaller MDE = need more sample = longer test
75
- - Larger MDE = faster test, but miss smaller wins
76
-
77
- Rules of thumb:
78
- - Activation metrics: MDE 3-5% (pp)
79
- - Conversion metrics: MDE 5-10% (pp) relative lift
80
- - Retention metrics: MDE 2-4% (pp)
81
-
82
- In B2B: smaller samples → MDE often 5-10% minimum.
83
-
84
- ### Step 3 — Sample Size Calculation
85
-
86
- Formula for proportion tests:
87
- ```
88
- n per variant = (Z_α/2 + Z_β)² × 2 × p(1-p) / MDE²
89
- ```
90
-
91
- Where:
92
- - Z_α/2 = 1.96 (95% confidence)
93
- - Z_β = 0.84 (80% power)
94
- - p = baseline rate
95
- - MDE = minimum detectable effect
96
-
97
- For continuous metrics — similar formula with variance.
98
-
99
- **Use calculator:** online tools (Evan Miller, Optimizely calc) — do not handle the math yourself.
100
-
101
- ### Step 4 — Duration
102
-
103
- Duration = sample_size × variants / weekly_eligible_users.
104
-
105
- Multiply:
106
- - By 1.5-2× for week-over-week cyclicality (weekdays vs weekends)
107
- - For B2B: minimum 2 weeks (full week cycle)
108
- - Maximum: 6-8 weeks (beyond that — context changes, seasonality)
109
-
110
- ### Step 5 — Randomization
111
-
112
- - **Unit:** user or account-level (B2B: account-level typically, to avoid split-brain within a team)
113
- - **Seed:** random but deterministic (same user gets same variant on re-visit)
114
- - **Allocation:** 50/50 default, can be 80/20 (control heavy) for risky changes
115
-
116
- ### Step 6 — Guardrail Metrics
117
-
118
- Metrics that must not degrade:
119
- - Churn rate
120
- - NPS / CSAT
121
- - Support ticket volume
122
- - p95 latency / error rate
123
- - Revenue / user
124
-
125
- Set thresholds (e.g., "churn cannot increase >1pp").
126
-
127
- ### Step 7 — Segment Analysis Plan
128
-
129
- Pre-registered (not p-hacking after):
130
- - By company size
131
- - By tenure (new vs established)
132
- - By role
133
- - By geography (if relevant)
134
-
135
- Document in the test plan, not after results.
136
-
137
- ### Step 8 — Statistical Method
138
-
139
- - **Frequentist** (most common): p-value < 0.05, 80% power
140
- - **Bayesian:** posterior probability of improvement > 95%
141
-
142
- Pick one. Document.
143
-
144
- ### Step 9 — Critical Region / Stopping Rules
145
-
146
- When to stop the test:
147
- - **Success:** significance reached, stop
148
- - **Failure (futility):** minimal effect after N% samples, stop
149
- - **Guardrail breach:** even if primary wins, stop
150
- - **Time limit:** maximum duration reached
151
-
152
- **NEVER peek early** and stop based on p-value (inflates false positive rate) without sequential testing design.
153
-
154
- ### Step 10 — Pre-registered Analysis Plan
155
-
156
- Document BEFORE running the test:
157
- - Primary metric + MDE + sample size
158
- - Segments
159
- - Guardrails
160
- - Stopping criteria
161
- - Interpretation rules
162
-
163
- Forces avoiding HARKing (Hypothesizing After Results Known).
164
-
165
- ## Validation (Quality Gate)
166
-
167
- - [ ] A/B appropriate (traffic, isolation, ethics)
168
- - [ ] Primary metric single + well-defined
169
- - [ ] MDE rationale (business + detectable)
170
- - [ ] Sample size calculated
171
- - [ ] Duration ≥ 2 weeks, ≤ 8 weeks
172
- - [ ] Randomization unit appropriate (user / account)
173
- - [ ] Guardrail metrics with thresholds
174
- - [ ] Segment analysis pre-registered
175
- - [ ] Statistical method picked + justified
176
- - [ ] Stopping rules explicit
177
- - [ ] Pre-registered analysis plan
178
-
179
- ## Handoff
180
-
181
- The result is the input for:
182
- - **Engineering** → feature flag + instrumentation
183
- - **Data Analyst** → monitoring dashboard
184
- - **PM** → launch criteria
185
- - **Stakeholders** → weekly reports
186
-
187
- Format: A/B test design doc (markdown). Via `$handoff`.
188
-
189
- ## Anti-patterns
190
-
191
- | Error | Why it's bad | How to do it right |
192
- |-------|-------------|-------------------|
193
- | Multiple primary metrics | p-value inflation | Single primary |
194
- | Peeking + early stop | False positive | Sequential or set duration |
195
- | No MDE rationale | Under-powered or over-long | Business + detectable justification |
196
- | Ignore guardrails | Feature "wins" while breaking | Explicit guardrails with kill criteria |
197
- | No pre-registration | HARKing, p-hacking | Plan before running |
198
- | Short duration | Weekly cycle noise | Min 2 weeks |
199
- | User-level in B2B flow | Same account, different variants | Account-level randomization |
200
-
201
- ## Template
202
-
203
- ```markdown
204
- # A/B Test: [Name]
205
-
206
- ## Hypothesis
207
- [via $hypothesis-template]
208
-
209
- ## Primary Metric
210
- - Metric: [e.g. 7-day activation rate]
211
- - Baseline: X% (last 30 days)
212
- - MDE: +5pp (rationale: business need + sample support)
213
-
214
- ## Sample & Duration
215
- - Eligible users / week: Y
216
- - Sample per variant: Z
217
- - Calculated duration: N weeks
218
- - Planned duration: N weeks (accounting for cyclicality)
219
-
220
- ## Variants
221
- - Control: current flow
222
- - Treatment: [change]
223
- - Allocation: 50/50
224
- - Randomization: account-level, deterministic
225
-
226
- ## Guardrails
227
- | Metric | Current | Threshold |
228
- | Churn rate | 2% | < 2.5% |
229
- | NPS | 45 | ≥ 43 |
230
- | p95 latency | 180ms | < 200ms |
231
-
232
- ## Pre-registered Analysis
233
- - Segments: company size, user role
234
- - Statistical method: frequentist, α=0.05, power=0.8
235
- - Ship criteria: primary +MDE significant, no guardrail breach
236
-
237
- ## Stopping Rules
238
- - Success: significance reached + guardrails ok → ship
239
- - Failure: effect < MDE with 50%+ sample → kill
240
- - Guardrail: any guardrail breached → stop, investigate
241
- - Max duration: 8 weeks
242
- ```
243
-
244
- ## Worked Example — TeamFlow Onboarding A/B Test (Post-MVP Iteration)
245
-
246
- **Context:** AI Summarization MVP shipped. 30 days later, H-003 (adoption) hypothesis tracking 42% adoption — below 60% target. Data Analyst designs an A/B test of onboarding checklist vs control for iteration.
247
-
248
- ```markdown
249
- # A/B Test Design: Onboarding Checklist for New AI Tier Managers
250
-
251
- **Experiment ID:** EXP-025
252
- **Status:** Approved, launch July 8, 2026
253
- **Owner:** Sam P. (Data) + Alex K. (PM) + Jordan M. (Design)
254
- **Hypothesis:** H-003 iteration 1
255
-
256
- ## Hypothesis (re-stated for test)
257
-
258
- **We believe** providing an onboarding checklist ("Complete 3 steps to master AI summaries")
259
- **For** managers newly activated in AI-tier accounts
260
- **Will result in** higher 30-day weekly adoption rate
261
- **We'll know it's true when** treatment group 30-day adoption rate is ≥5pp higher than control
262
- **Because**
263
- - Discovery surprise: 60% of low-adopters cited "didn't know where to start" in post-launch survey
264
- - Onboarding checklists in TeamFlow existing features show +22% activation lift (internal benchmark)
265
- - 7-day first-activation window predicts long-term retention (our cohort analysis Q1)
266
-
267
- ## Primary Metric
268
-
269
- - **Metric:** 30-day weekly adoption rate (% of newly-activated managers who used AI summary ≥1 per week in weeks 2-4 post-activation)
270
- - **Baseline:** 42% (current, measured last 30 days of AI tier rollout)
271
- - **MDE:** +5pp (target: 47% treatment vs 42% control)
272
- - **MDE Rationale:**
273
- - Business threshold: +5pp × ~200 new managers/month = 10 more managers retained/month = $2.4K MRR
274
- - Detectable with reasonable sample (see sample size calc)
275
- - Below 5pp not material for PM-level decision
276
-
277
- ## Sample Size Calculation
278
-
279
- Using formula for proportion test:
280
- - Baseline p1 = 0.42
281
- - Treatment p2 = 0.47 (target)
282
- - α = 0.05 (two-sided)
283
- - Power = 0.80
284
- - Z_α/2 = 1.96, Z_β = 0.84
285
-
286
- n per variant = 2 × (1.96 + 0.84)² × [0.42(0.58) + 0.47(0.53)] / (0.05)²
287
- = 2 × 7.84 × 0.493 / 0.0025
288
- ≈ 3092
289
-
290
- **Sample per variant: ~3100**
291
- **Total sample: 6200**
292
-
293
- ## Duration
294
-
295
- - Weekly eligible managers (newly-activating in AI tier): ~200/week
296
- - Duration: 6200 / 200 = **31 weeks**
297
-
298
- **Problem:** 31 weeks is unreasonably long. Options:
299
- 1. **Raise MDE to 7pp** (target 49%) — Duration drops to ~16 weeks
300
- 2. **Increase allocation to 80/20** — more weight on treatment, but no speed benefit
301
- 3. **Accept extended runway** with weekly monitoring for early signal
302
-
303
- **Decision:** Raise MDE to 7pp (target 49%). Business-justified — 7pp × 200 managers = 14 managers/month saved, material.
304
-
305
- **Revised duration: 16 weeks.** Plus cyclicality buffer: 18 weeks total.
306
-
307
- ## Variants
308
-
309
- - **Control (50%):** Current experience — manager activates tier, sees default TeamFlow onboarding, no checklist
310
- - **Treatment (50%):** Same + sticky onboarding checklist:
311
- - Step 1: Enable AI for first 1:1 (button)
312
- - Step 2: Review generated summary + approve
313
- - Step 3: Check extracted action items before next 1:1
314
-
315
- Checklist persists in manager's dashboard until all 3 steps completed or manually dismissed.
316
-
317
- ## Randomization
318
-
319
- - **Unit:** Account-level (not user-level — same account gets same variant across all managers)
320
- - Rationale: B2B consistency — HR admin shouldn't see different onboarding per team
321
- - **Seed:** SHA256 of account_id — deterministic, re-assignable
322
- - **Allocation:** 50/50
323
- - **Eligibility:** Accounts activating AI tier from July 8 onward (test start date)
324
-
325
- ## Guardrail Metrics
326
-
327
- | Metric | Baseline | Threshold | Monitoring |
328
- |--------|:--------:|:---------:|:----------:|
329
- | Gross churn rate (AI tier accounts) | 3% / 90 days | 4% | Weekly |
330
- | NPS (in AI tier) | 50 | ≥ 47 | Bi-weekly survey |
331
- | Support tickets "onboarding confusion" | <2% of total | <3% | Daily check |
332
- | Manager NPS on AI feature | 52 | ≥ 48 | Monthly in-product survey |
333
-
334
- **If any guardrail breaches:** pause experiment, investigate, potentially kill.
335
-
336
- ## Segment Analysis (Pre-registered)
337
-
338
- Expected differential lift:
339
- - **Company size:** Expect SMB highest lift (more novice managers) > mid-market > enterprise (already have training programs)
340
- - **Manager experience:** Expect new managers (<2 years) highest lift
341
- - **Industry:** Tech companies first-movers — expect highest baseline + moderate lift
342
-
343
- Analysis will be reported **both** overall AND per-segment. No cherry-picking segments post-hoc.
344
-
345
- ## Statistical Method
346
-
347
- - **Frequentist, Z-test for proportions** (standard for adoption rate A/B)
348
- - α = 0.05 two-sided
349
- - Power = 0.80
350
- - No peeking before planned duration (18 weeks)
351
- - Segmented analysis — multiple comparisons correction (Bonferroni): α / 3 segments = 0.017
352
-
353
- ## Critical Region / Stopping Rules
354
-
355
- ### Success (Go ship to all)
356
- - Primary metric: treatment lift ≥ 7pp, significant at α=0.05
357
- - No guardrail breach
358
- - ≥ 16 weeks duration completed
359
-
360
- ### Failure (Stop — kill variant)
361
- - Primary metric: absolute difference < 3pp after 50% sample reached
362
- - OR guardrail breach
363
-
364
- ### Extended observation (continue)
365
- - Primary metric: 3-7pp observed, not significant → continue to planned duration
366
-
367
- ### Early positive signal (not stopping)
368
- - Primary metric: 8pp+ observed at 25% sample, significant
369
- - Action: DO NOT stop early — continue, watch for heterogeneity
370
-
371
- ## Pre-Registered Analysis Plan
372
-
373
- Posted on internal wiki before experiment start:
374
-
375
- 1. Primary metric: 30-day weekly adoption rate, Z-test, α=0.05
376
- 2. Guardrail checks: weekly automated alerts
377
- 3. Segment analysis: by company size, manager experience, industry (Bonferroni-corrected)
378
- 4. Interpretation rules: pre-committed above
379
- 5. Reporting: Weekly tracking dashboard, formal report at weeks 8, 16, 18
380
-
381
- ## Implementation Plan
382
-
383
- ### Pre-launch
384
- - [ ] Feature flag configured (account-level, deterministic)
385
- - [ ] Treatment variant built + QA'd
386
- - [ ] Instrumentation: event `onboarding_checklist_step_completed`, `onboarding_checklist_dismissed`
387
- - [ ] Analytics dashboard live (auto-updating weekly)
388
- - [ ] Control variant verified unchanged from production
389
-
390
- ### During
391
- - **Weekly review** (Monday 10am): check sample accrual, guardrails, no interim analysis peeking
392
- - **Bi-weekly survey:** 20 random managers from each variant — qualitative signal
393
- - **Incident response:** if guardrail breach → pause within 24 hours
394
-
395
- ### Post (if success)
396
- - Full rollout — remove feature flag
397
- - Checklist becomes part of standard onboarding
398
- - Document decision rationale for future iteration
399
-
400
- ### Post (if failure)
401
- - Variant killed
402
- - Qualitative analysis of why — interviews with managers who did / didn't complete checklist
403
- - Design next experiment (e.g., different onboarding approach)
404
-
405
- ## Open Questions
406
-
407
- 1. Do we expose variant to existing non-activated managers in AI-tier accounts? (**Decision:** No, only new activations from July 8 — cleaner baseline)
408
- 2. What about managers in accounts that activate DURING experiment but ALL variant differ? (**Decision:** Follow account assignment — if account is treatment, all new managers see checklist)
409
- 3. Bonferroni vs alternative multiple-comparisons correction? (**Decision:** Bonferroni — conservative, easier to explain)
410
- ```
411
-
412
- > **A/B design lesson:** Sample size calculation revealed MDE had to be **raised**, no shortcuts. **18 weeks** is a real commitment — not a fake "2-week" test that reads noise. **Account-level** randomization is critical in B2B — user-level would have one team split across variants = invalid. **Pre-registered analysis plan** on internal wiki prevents HARKing (Hypothesizing After Results Known). **Guardrails with numeric thresholds** — without them, "churn increased a bit but feature won" rationalizations happen. This test is honest science — takes months, tests one thing cleanly.
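The sample-size and duration arithmetic in the worked example above can be re-checked with a small script. This is only a sketch under stated assumptions: the function names `n_as_written`, `n_step3`, and `weeks` are hypothetical and not part of the package, and the constants are the Z-values quoted in Step 3. One thing the script makes visible: the worked example's expression carries a leading factor of 2 in front of `p1(1-p1) + p2(1-p2)`, whereas in the Step 3 formula the term `2 × p(1-p)` already stands in for that sum, so the two forms differ by roughly a factor of two. The sketch evaluates both rather than choosing between them.

```python
import math

Z_ALPHA_2 = 1.96  # two-sided alpha = 0.05, as quoted in Step 3
Z_BETA = 0.84     # power = 0.80, as quoted in Step 3


def n_as_written(p1, p2):
    """Per-variant n exactly as the worked example writes it (leading factor of 2 kept)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return 2 * (Z_ALPHA_2 + Z_BETA) ** 2 * variance_sum / (p2 - p1) ** 2


def n_step3(p, mde):
    """Per-variant n from the Step 3 formula: (Z_a/2 + Z_b)^2 * 2 * p(1-p) / MDE^2."""
    return (Z_ALPHA_2 + Z_BETA) ** 2 * 2 * p * (1 - p) / mde ** 2


def weeks(n_per_variant, variants=2, weekly_eligible=200):
    """Step 4: duration = sample_size * variants / weekly_eligible_users."""
    return n_per_variant * variants / weekly_eligible


if __name__ == "__main__":
    for target in (0.47, 0.49):  # MDE of 5pp and 7pp on a 42% baseline
        n = n_as_written(0.42, target)
        print(f"as written, target {target:.2f}: n = {math.ceil(n)}, weeks = {weeks(n):.1f}")
    n = n_step3(0.42, 0.05)
    print(f"Step 3 form, MDE 5pp: n = {math.ceil(n)}, weeks = {weeks(n):.1f}")
```

Evaluated as written, the example's expression gives about 3,090 per variant at a 5pp MDE (the document's 3092 comes from rounding the variance sum to 0.493) and about 16 weeks at 7pp, matching the figures in the text. The Step 3 form gives roughly half of that, so anyone reusing the example should decide deliberately which form they intend, or cross-check against one of the calculators the skill recommends.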
1
+ ---
2
+ name: ab-test-design
3
+ description: A/B test design — primary metric, MDE, sample size, duration, guardrails, critical region
4
+ type: triggered
5
+ domain: product
6
+ owners:
7
+ - data_analyst
8
+ gates:
9
+ - DATA_ANALYST
10
+ tech: []
11
+ topic: []
12
+ triggers:
13
+ - "ab-test-design"
14
+ - "A/B test"
15
+ - "сплит-тест"
16
+ - "experiment design"
17
+ related: []
18
+ budget_lines: 428
19
+ schema_version: 1
20
+ ---
21
+ # A/B Test Design
22
+
23
+ > **Category:** Experimentation · **Slug:** `ab-test-design`
24
+
25
+ ## When to Use
26
+
27
+ - For hypothesis validation with a quantitative signal.
28
+ - When rolling out high-risk changes (pricing, onboarding, core flow).
29
+ - For comparing alternatives when evidence is unclear.
30
+ - When measuring feature impact for PRD success criteria.
31
+
32
+ ## Input
33
+
34
+ | Field | Required | Description |
35
+ |-------|:--------:|-------------|
36
+ | Hypothesis | ✅ | Via `$hypothesis-template` |
37
+ | Primary metric | ✅ | What we are measuring |
38
+ | Baseline metric value | ✅ | Current mean + variance |
39
+ | Expected effect size | ✅ | MDE — minimum detectable effect |
40
+ | Traffic / eligible users | ✅ | Weekly eligible sample |
41
+ | Infrastructure | ✅ | Feature flag / experimentation platform |
42
+
43
+ ## Data Sources
44
+
45
+ 1. Historical metric data — baseline + variance.
46
+ 2. `$hypothesis-template` — expected direction + magnitude.
47
+ 3. User analytics — eligible population.
48
+ 4. Industry benchmarks — typical effect sizes.
49
+
50
+ ### Related Skills
51
+
52
+ | Skill | What we take | When to call |
53
+ |-------|-------------|--------------|
54
+ | `hypothesis-template` | What we're testing | Prerequisite |
55
+ | `saas-metrics` | Primary + guardrail metrics | For selection |
56
+ | `aarrr-metrics` | Funnel context | For understanding |
57
+ | `assumption-mapping` | High-risk assumption A/B | For top assumptions |
58
+
59
+ ## Protocol
60
+
61
+ ### Step 0 — Is A/B Appropriate?
62
+
63
+ Checklist:
64
+ - Enough traffic (≥ 1000 users / week per variant)?
65
+ - Metric instrumentable + detectable in timeframe?
66
+ - Change isolatable (not confounded with other rollouts)?
67
+ - Can ethically A/B (not critical safety / compliance)?
68
+
69
+ If not — alternatives: phased rollout with cohort comparison, before/after, qualitative testing.
70
+
71
+ ### Step 1 — Primary Metric
72
+
73
+ Single primary metric. Not multiple ("Impact on activation AND retention").
74
+
75
+ Typical:
76
+ - **Activation:** 7-day activation rate
77
+ - **Retention:** W/W active, churn
78
+ - **Conversion:** signup → paid, trial → active
79
+ - **Engagement:** actions per session, DAU/MAU
80
+
81
+ Properties:
82
+ - **Detectable** in reasonable sample size
83
+ - **Aligned** with hypothesis outcome
84
+ - **Sensitive** — moves when the expected change happens
85
+ - **Trustworthy** — not easily gamed
86
+
87
+ ### Step 2 — Minimum Detectable Effect (MDE)
88
+
89
+ MDE — smallest lift worth detecting. Trade-off:
90
+ - Smaller MDE = need more sample = longer test
91
+ - Larger MDE = faster test, but miss smaller wins
92
+
93
+ Rules of thumb:
94
+ - Activation metrics: MDE 3-5% (pp)
95
+ - Conversion metrics: MDE 5-10% (pp) relative lift
96
+ - Retention metrics: MDE 2-4% (pp)
97
+
98
+ In B2B: smaller samples → MDE often 5-10% minimum.
99
+
100
+ ### Step 3 — Sample Size Calculation
101
+
102
+ Formula for proportion tests:
103
+ ```
104
+ n per variant = (Z_α/2 + Z_β)² × 2 × p(1-p) / MDE²
105
+ ```
106
+
107
+ Where:
108
+ - Z_α/2 = 1.96 (95% confidence)
109
+ - Z_β = 0.84 (80% power)
110
+ - p = baseline rate
111
+ - MDE = minimum detectable effect
112
+
113
+ For continuous metrics — similar formula with variance.
114
+
115
+ **Use calculator:** online tools (Evan Miller, Optimizely calc) — do not handle the math yourself.
116
+
117
+ ### Step 4 — Duration
118
+
119
+ Duration = sample_size × variants / weekly_eligible_users.
120
+
121
+ Multiply:
122
+ - By 1.5-2× for week-over-week cyclicality (weekdays vs weekends)
123
+ - For B2B: minimum 2 weeks (full week cycle)
124
+ - Maximum: 6-8 weeks (beyond that — context changes, seasonality)
125
+
126
+ ### Step 5 — Randomization
127
+
128
+ - **Unit:** user or account-level (B2B: account-level typically, to avoid split-brain within a team)
129
+ - **Seed:** random but deterministic (same user gets same variant on re-visit)
130
+ - **Allocation:** 50/50 default, can be 80/20 (control heavy) for risky changes
131
+
132
+ ### Step 6 — Guardrail Metrics
133
+
134
+ Metrics that must not degrade:
135
+ - Churn rate
136
+ - NPS / CSAT
137
+ - Support ticket volume
138
+ - p95 latency / error rate
139
+ - Revenue / user
140
+
141
+ Set thresholds (e.g., "churn cannot increase >1pp").
142
+
143
+ ### Step 7 — Segment Analysis Plan
144
+
145
+ Pre-registered (not p-hacking after):
146
+ - By company size
147
+ - By tenure (new vs established)
148
+ - By role
149
+ - By geography (if relevant)
150
+
151
+ Document in the test plan, not after results.
152
+
153
+ ### Step 8 — Statistical Method
154
+
155
+ - **Frequentist** (most common): p-value < 0.05, 80% power
156
+ - **Bayesian:** posterior probability of improvement > 95%
157
+
158
+ Pick one. Document.
159
+
160
+ ### Step 9 — Critical Region / Stopping Rules
161
+
162
+ When to stop the test:
163
+ - **Success:** significance reached, stop
164
+ - **Failure (futility):** minimal effect after N% samples, stop
165
+ - **Guardrail breach:** even if primary wins, stop
166
+ - **Time limit:** maximum duration reached
167
+
168
+ **NEVER peek early** and stop based on p-value (inflates false positive rate) without sequential testing design.
169
+
170
+ ### Step 10 — Pre-registered Analysis Plan
171
+
172
+ Document BEFORE running the test:
173
+ - Primary metric + MDE + sample size
174
+ - Segments
175
+ - Guardrails
176
+ - Stopping criteria
177
+ - Interpretation rules
178
+
179
+ Forces avoiding HARKing (Hypothesizing After Results Known).
180
+
181
+ ## Validation (Quality Gate)
182
+
183
+ - [ ] A/B appropriate (traffic, isolation, ethics)
184
+ - [ ] Primary metric single + well-defined
185
+ - [ ] MDE rationale (business + detectable)
186
+ - [ ] Sample size calculated
187
+ - [ ] Duration ≥ 2 weeks, ≤ 8 weeks
188
+ - [ ] Randomization unit appropriate (user / account)
189
+ - [ ] Guardrail metrics with thresholds
190
+ - [ ] Segment analysis pre-registered
191
+ - [ ] Statistical method picked + justified
192
+ - [ ] Stopping rules explicit
193
+ - [ ] Pre-registered analysis plan
194
+
195
+ ## Handoff
196
+
197
+ The result is the input for:
198
+ - **Engineering** → feature flag + instrumentation
199
+ - **Data Analyst** → monitoring dashboard
200
+ - **PM** → launch criteria
201
+ - **Stakeholders** → weekly reports
202
+
203
+ Format: A/B test design doc (markdown). Via `$handoff`.
204
+
205
+ ## Anti-patterns
206
+
207
+ | Error | Why it's bad | How to do it right |
208
+ |-------|-------------|-------------------|
209
+ | Multiple primary metrics | p-value inflation | Single primary |
210
+ | Peeking + early stop | False positive | Sequential or set duration |
211
+ | No MDE rationale | Under-powered or over-long | Business + detectable justification |
212
+ | Ignore guardrails | Feature "wins" while breaking | Explicit guardrails with kill criteria |
213
+ | No pre-registration | HARKing, p-hacking | Plan before running |
214
+ | Short duration | Weekly cycle noise | Min 2 weeks |
215
+ | User-level in B2B flow | Same account, different variants | Account-level randomization |
216
+
217
+ ## Template
218
+
219
+ ```markdown
220
+ # A/B Test: [Name]
221
+
222
+ ## Hypothesis
223
+ [via $hypothesis-template]
224
+
225
+ ## Primary Metric
226
+ - Metric: [e.g. 7-day activation rate]
227
+ - Baseline: X% (last 30 days)
228
+ - MDE: +5pp (rationale: business need + sample support)
229
+
230
+ ## Sample & Duration
231
+ - Eligible users / week: Y
232
+ - Sample per variant: Z
233
+ - Calculated duration: N weeks
234
+ - Planned duration: N weeks (accounting for cyclicality)
235
+
236
+ ## Variants
237
+ - Control: current flow
238
+ - Treatment: [change]
239
+ - Allocation: 50/50
240
+ - Randomization: account-level, deterministic
241
+
242
+ ## Guardrails
243
+ | Metric | Current | Threshold |
244
+ | Churn rate | 2% | < 2.5% |
245
+ | NPS | 45 | ≥ 43 |
246
+ | p95 latency | 180ms | < 200ms |
247
+
248
+ ## Pre-registered Analysis
249
+ - Segments: company size, user role
250
+ - Statistical method: frequentist, α=0.05, power=0.8
251
+ - Ship criteria: primary +MDE significant, no guardrail breach
252
+
253
+ ## Stopping Rules
254
+ - Success: significance reached + guardrails ok → ship
255
+ - Failure: effect < MDE with 50%+ sample → kill
256
+ - Guardrail: any guardrail breached → stop, investigate
257
+ - Max duration: 8 weeks
258
+ ```
259
+
260
+ ## Worked Example — TeamFlow Onboarding A/B Test (Post-MVP Iteration)
261
+
262
+ **Context:** AI Summarization MVP shipped. 30 days later, H-003 (adoption) hypothesis tracking 42% adoption — below 60% target. Data Analyst designs an A/B test of onboarding checklist vs control for iteration.
263
+
264
+ ```markdown
265
+ # A/B Test Design: Onboarding Checklist for New AI Tier Managers
266
+
267
+ **Experiment ID:** EXP-025
268
+ **Status:** Approved, launch July 8, 2026
269
+ **Owner:** Sam P. (Data) + Alex K. (PM) + Jordan M. (Design)
270
+ **Hypothesis:** H-003 iteration 1
271
+
272
+ ## Hypothesis (re-stated for test)
273
+
274
+ **We believe** providing an onboarding checklist ("Complete 3 steps to master AI summaries")
275
+ **For** managers newly activated in AI-tier accounts
276
+ **Will result in** higher 30-day weekly adoption rate
277
+ **We'll know it's true when** treatment group 30-day adoption rate is ≥5pp higher than control
278
+ **Because**
279
+ - Discovery surprise: 60% of low-adopters cited "didn't know where to start" in post-launch survey
280
+ - Onboarding checklists in existing TeamFlow features show a +22% activation lift (internal benchmark)
281
+ - 7-day first-activation window predicts long-term retention (our cohort analysis Q1)
282
+
283
+ ## Primary Metric
284
+
285
+ - **Metric:** 30-day weekly adoption rate (% of newly-activated managers who used AI summary ≥1 per week in weeks 2-4 post-activation)
286
+ - **Baseline:** 42% (current, measured last 30 days of AI tier rollout)
287
+ - **MDE:** +5pp (target: 47% treatment vs 42% control)
288
+ - **MDE Rationale:**
289
+ - Business threshold: +5pp × ~200 new managers/month = 10 more managers retained/month = $2.4K MRR
290
+ - Detectable with reasonable sample (see sample size calc)
291
+ - Below 5pp not material for PM-level decision
292
+
293
+ ## Sample Size Calculation
294
+
295
+ Using formula for proportion test:
296
+ - Baseline p1 = 0.42
297
+ - Treatment p2 = 0.47 (target)
298
+ - α = 0.05 (two-sided)
299
+ - Power = 0.80
300
+ - Z_α/2 = 1.96, Z_β = 0.84
301
+
302
+ n per variant = 2 × (1.96 + 0.84)² × [0.42(0.58) + 0.47(0.53)] / (0.05)²
303
+ = 2 × 7.84 × 0.493 / 0.0025
304
+ ≈ 3092
305
+
306
+ **Sample per variant: ~3100**
307
+ **Total sample: 6200**
308
+
309
+ ## Duration
310
+
311
+ - Weekly eligible managers (newly-activating in AI tier): ~200/week
312
+ - Duration: 6200 / 200 = **31 weeks**
313
+
314
+ **Problem:** 31 weeks is unreasonably long. Options:
315
+ 1. **Raise MDE to 7pp** (target 49%) → duration drops to ~16 weeks
316
+ 2. **Increase allocation to 80/20** — more weight on treatment, but no speed benefit
317
+ 3. **Accept extended runway** with weekly monitoring for early signal
318
+
319
+ **Decision:** Raise MDE to 7pp (target 49%). Business-justified: 7pp × 200 managers = 14 managers/month saved, which is material.
320
+
321
+ **Revised duration: 16 weeks.** Plus cyclicality buffer: 18 weeks total.
322
+
323
+ ## Variants
324
+
325
+ - **Control (50%):** Current experience — manager activates tier, sees default TeamFlow onboarding, no checklist
326
+ - **Treatment (50%):** Same + sticky onboarding checklist:
327
+ - Step 1: Enable AI for first 1:1 (button)
328
+ - Step 2: Review generated summary + approve
329
+ - Step 3: Check extracted action items before next 1:1
330
+
331
+ Checklist persists in manager's dashboard until all 3 steps completed or manually dismissed.
332
+
333
+ ## Randomization
334
+
335
+ - **Unit:** Account-level (not user-level — same account gets same variant across all managers)
336
+ - Rationale: B2B consistency — HR admin shouldn't see different onboarding per team
337
+ - **Seed:** SHA256 of account_id — deterministic, re-assignable
338
+ - **Allocation:** 50/50
339
+ - **Eligibility:** Accounts activating AI tier from July 8 onward (test start date)
340
+
341
+ ## Guardrail Metrics
342
+
343
+ | Metric | Baseline | Threshold | Monitoring |
344
+ |--------|:--------:|:---------:|:----------:|
345
+ | Gross churn rate (AI tier accounts) | 3% / 90 days | ≤ 4% | Weekly |
346
+ | NPS (in AI tier) | 50 | ≥ 47 | Bi-weekly survey |
347
+ | Support tickets "onboarding confusion" | <2% of total | <3% | Daily check |
348
+ | Manager NPS on AI feature | 52 | ≥ 48 | Monthly in-product survey |
349
+
350
+ **If any guardrail breaches:** pause experiment, investigate, potentially kill.
351
+
352
+ ## Segment Analysis (Pre-registered)
353
+
354
+ Expected differential lift:
355
+ - **Company size:** Expect SMB highest lift (more novice managers) > mid-market > enterprise (already have training programs)
356
+ - **Manager experience:** Expect new managers (<2 years) highest lift
357
+ - **Industry:** Tech companies first-movers — expect highest baseline + moderate lift
358
+
359
+ Analysis will be reported **both** overall AND per-segment. No cherry-picking segments post-hoc.
360
+
361
+ ## Statistical Method
362
+
363
+ - **Frequentist, Z-test for proportions** (standard for adoption rate A/B)
364
+ - α = 0.05 two-sided
365
+ - Power = 0.80
366
+ - No peeking before planned duration (18 weeks)
367
+ - Segmented analysis: multiple-comparisons correction (Bonferroni), α / 3 segments ≈ 0.017
368
+
369
+ ## Critical Region / Stopping Rules
370
+
371
+ ### Success (Go — ship to all)
372
+ - Primary metric: treatment lift ≥ 7pp, significant at α=0.05
373
+ - No guardrail breach
374
+ - ≥ 16 weeks duration completed
375
+
376
+ ### Failure (Stop: kill variant)
377
+ - Primary metric: absolute difference < 3pp after 50% sample reached
378
+ - OR guardrail breach
379
+
380
+ ### Extended observation (continue)
381
+ - Primary metric: 3-7pp observed, not significant → continue to planned duration
382
+
383
+ ### Early positive signal (not stopping)
384
+ - Primary metric: 8pp+ observed at 25% sample, significant
385
+ - Action: DO NOT stop early; continue and watch for heterogeneity
386
+
387
+ ## Pre-Registered Analysis Plan
388
+
389
+ Posted on internal wiki before experiment start:
390
+
391
+ 1. Primary metric: 30-day weekly adoption rate, Z-test, α=0.05
392
+ 2. Guardrail checks: weekly automated alerts
393
+ 3. Segment analysis: by company size, manager experience, industry (Bonferroni-corrected)
394
+ 4. Interpretation rules: pre-committed above
395
+ 5. Reporting: Weekly tracking dashboard, formal report at weeks 8, 16, 18
396
+
397
+ ## Implementation Plan
398
+
399
+ ### Pre-launch
400
+ - [ ] Feature flag configured (account-level, deterministic)
401
+ - [ ] Treatment variant built + QA'd
402
+ - [ ] Instrumentation: event `onboarding_checklist_step_completed`, `onboarding_checklist_dismissed`
403
+ - [ ] Analytics dashboard live (auto-updating weekly)
404
+ - [ ] Control variant verified unchanged from production
405
+
406
+ ### During
407
+ - **Weekly review** (Monday 10am): check sample accrual, guardrails, no interim analysis peeking
408
+ - **Bi-weekly survey:** 20 random managers from each variant — qualitative signal
409
+ - **Incident response:** if a guardrail is breached → pause within 24 hours
410
+
411
+ ### Post (if success)
412
+ - Full rollout → remove feature flag
413
+ - Checklist becomes part of standard onboarding
414
+ - Document decision rationale for future iteration
415
+
416
+ ### Post (if failure)
417
+ - Variant killed
418
+ - Qualitative analysis of why — interviews with managers who did / didn't complete checklist
419
+ - Design next experiment (e.g., different onboarding approach)
420
+
421
+ ## Open Questions
422
+
423
+ 1. Do we expose variant to existing non-activated managers in AI-tier accounts? (**Decision:** No, only new activations from July 8 — cleaner baseline)
424
+ 2. What about managers who join an account that activated DURING the experiment: could their variant differ from their colleagues'? (**Decision:** No. Follow the account assignment: if the account is in treatment, all new managers see the checklist)
425
+ 3. Bonferroni vs alternative multiple-comparisons correction? (**Decision:** Bonferroni — conservative, easier to explain)
426
+ ```
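The sample-size and duration arithmetic in the worked example can be reproduced with a few lines of Python. This is a sketch, not part of the package: the function names and the `factor` parameter are assumptions introduced here. The default `factor=2.0` mirrors the expression exactly as written in the example; the unpooled two-proportion formula found in many textbooks has no leading factor of 2, so passing `factor=1.0` shows that variant, which gives roughly half the sample.

```python
def sample_size_per_variant(p1: float, p2: float,
                            z_alpha: float = 1.96,  # two-sided alpha = 0.05
                            z_beta: float = 0.84,   # power = 0.80
                            factor: float = 2.0) -> float:
    """Sample size per variant for detecting p2 vs p1.

    factor=2.0 follows the worked example's expression as written;
    factor=1.0 gives the common unpooled textbook form (an assumption
    added here for comparison, not something the source states).
    """
    delta = p2 - p1
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return factor * (z_alpha + z_beta) ** 2 * variance_sum / delta ** 2


def duration_weeks(n_per_variant: float, weekly_eligible: float,
                   variants: int = 2) -> float:
    """Weeks needed to fill all variants at a given weekly intake."""
    return variants * n_per_variant / weekly_eligible


n_5pp = sample_size_per_variant(0.42, 0.47)  # original MDE
n_7pp = sample_size_per_variant(0.42, 0.49)  # raised MDE

for label, n in (("5pp", n_5pp), ("7pp", n_7pp)):
    weeks = duration_weeks(n, weekly_eligible=200)
    print(f"MDE {label}: n per variant = {n:.0f}, duration = {weeks:.1f} weeks")
```

With the example's inputs, the default reproduces roughly 3,090 per variant at a 5pp MDE (the example rounds an intermediate term and reports ≈ 3092) and about 16 weeks at 7pp with 200 eligible managers per week.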
427
+
428
+ > **A/B design lesson:** Sample size calculation revealed MDE had to be **raised**, no shortcuts. **18 weeks** is a real commitment — not a fake "2-week" test that reads noise. **Account-level** randomization is critical in B2B — user-level would have one team split across variants = invalid. **Pre-registered analysis plan** on internal wiki prevents HARKing (Hypothesizing After Results Known). **Guardrails with numeric thresholds** — without them, "churn increased a bit but feature won" rationalizations happen. This test is honest science — takes months, tests one thing cleanly.
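The pre-registered primary analysis (Z-test for proportions, α = 0.05, Bonferroni α / 3 for segments) could be computed along these lines. This is a sketch using only the standard library; the function name and the counts in the usage lines are hypothetical, not experiment results. It treats managers as independent observations, which account-level randomization may not fully satisfy.

```python
import math


def two_proportion_z_test(x_control: int, n_control: int,
                          x_treatment: int, n_treatment: int):
    """Pooled two-sided z-test for a difference in proportions.

    Returns (lift, z, p_value), where lift = treatment rate - control rate.
    """
    p_c = x_control / n_control
    p_t = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_c, z, p_value


ALPHA = 0.05               # primary metric threshold
SEGMENT_ALPHA = ALPHA / 3  # Bonferroni across the 3 pre-registered segments
MDE = 0.07                 # revised minimum detectable effect

# Hypothetical counts, for illustration only.
lift, z, p_value = two_proportion_z_test(663, 1580, 790, 1580)

# Guardrails and the minimum duration still have to be checked separately.
ship_candidate = lift >= MDE and p_value < ALPHA
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.4g} ship_candidate={ship_candidate}")
```

Each segment comparison would use the same function with `SEGMENT_ALPHA` in place of `ALPHA`, which keeps the per-segment reads aligned with the pre-registered correction.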