code-ai-installer 4.0.0 → 4.0.1-b

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (471)
  1. package/README.md +83 -67
  2. package/dist/index.js +2 -0
  3. package/dist/mcp/audit_ledger.d.ts +12 -0
  4. package/dist/mcp/audit_ledger.js +82 -0
  5. package/dist/mcp/cli.js +7 -1
  6. package/dist/mcp/config.d.ts +23 -0
  7. package/dist/mcp/config.js +44 -6
  8. package/dist/mcp/index.d.ts +1 -2
  9. package/dist/mcp/index.js +1 -2
  10. package/dist/mcp/paths.d.ts +20 -2
  11. package/dist/mcp/paths.js +29 -5
  12. package/dist/mcp/proposal_dedup.d.ts +32 -0
  13. package/dist/mcp/proposal_dedup.js +102 -0
  14. package/dist/mcp/proposal_store.d.ts +18 -0
  15. package/dist/mcp/proposal_store.js +74 -0
  16. package/dist/mcp/scorecard.d.ts +140 -0
  17. package/dist/mcp/scorecard.js +103 -0
  18. package/dist/mcp/skill_invocations.d.ts +15 -0
  19. package/dist/mcp/skill_invocations.js +28 -0
  20. package/dist/mcp/task_state.d.ts +77 -2
  21. package/dist/mcp/tools/_subprocess.d.ts +16 -0
  22. package/dist/mcp/tools/_subprocess.js +56 -0
  23. package/dist/mcp/tools/advance_gate.js +2 -2
  24. package/dist/mcp/tools/aggregate_run_metrics.d.ts +19 -0
  25. package/dist/mcp/tools/aggregate_run_metrics.js +139 -0
  26. package/dist/mcp/tools/apply_diff.d.ts +2 -0
  27. package/dist/mcp/tools/apply_diff.js +29 -0
  28. package/dist/mcp/tools/audit_bilocale_parity.d.ts +2 -0
  29. package/dist/mcp/tools/audit_bilocale_parity.js +146 -0
  30. package/dist/mcp/tools/audit_budget_compliance.d.ts +35 -0
  31. package/dist/mcp/tools/audit_budget_compliance.js +172 -0
  32. package/dist/mcp/tools/build.d.ts +2 -0
  33. package/dist/mcp/tools/build.js +47 -0
  34. package/dist/mcp/tools/check_lint.d.ts +2 -0
  35. package/dist/mcp/tools/check_lint.js +23 -0
  36. package/dist/mcp/tools/classify_gate.js +2 -2
  37. package/dist/mcp/tools/current_gate.js +2 -2
  38. package/dist/mcp/tools/dependency_supply_chain.d.ts +2 -0
  39. package/dist/mcp/tools/dependency_supply_chain.js +59 -0
  40. package/dist/mcp/tools/docker_compose.d.ts +2 -0
  41. package/dist/mcp/tools/docker_compose.js +24 -0
  42. package/dist/mcp/tools/e2e_playwright.d.ts +2 -0
  43. package/dist/mcp/tools/e2e_playwright.js +88 -0
  44. package/dist/mcp/tools/get_skill.js +17 -0
  45. package/dist/mcp/tools/git_commit.d.ts +2 -0
  46. package/dist/mcp/tools/git_commit.js +30 -0
  47. package/dist/mcp/tools/list_proposals.d.ts +6 -0
  48. package/dist/mcp/tools/list_proposals.js +16 -0
  49. package/dist/mcp/tools/list_skills.js +9 -1
  50. package/dist/mcp/tools/load_role.d.ts +3 -4
  51. package/dist/mcp/tools/load_role.js +11 -13
  52. package/dist/mcp/tools/propose_change.d.ts +8 -0
  53. package/dist/mcp/tools/propose_change.js +36 -0
  54. package/dist/mcp/tools/record_decision.js +25 -25
  55. package/dist/mcp/tools/review_proposal.d.ts +17 -0
  56. package/dist/mcp/tools/review_proposal.js +99 -0
  57. package/dist/mcp/tools/run_drift_audit.d.ts +11 -0
  58. package/dist/mcp/tools/run_drift_audit.js +79 -0
  59. package/dist/mcp/tools/run_tests.d.ts +2 -0
  60. package/dist/mcp/tools/run_tests.js +92 -0
  61. package/dist/mcp/tools/sign_off.js +14 -2
  62. package/dist/mcp/tools/stubs.js +30 -9
  63. package/dist/mcp/tools/verify_claim.js +33 -6
  64. package/dist/mcp_setup.d.ts +14 -3
  65. package/dist/mcp_setup.js +15 -6
  66. package/dist/shared/frontmatter.d.ts +44 -2
  67. package/dist/shared/frontmatter.js +54 -6
  68. package/dist/shared/index.d.ts +0 -5
  69. package/dist/shared/index.js +0 -5
  70. package/dist/shared/persona.d.ts +2 -2
  71. package/dist/shared/persona.js +1 -1
  72. package/dist/shared/pipeline.d.ts +46 -1
  73. package/dist/shared/tools.d.ts +1382 -16
  74. package/dist/shared/tools.js +229 -0
  75. package/dist/shared/vocabulary.d.ts +99 -4
  76. package/dist/shared/vocabulary.js +94 -5
  77. package/domains/analytics/.agents/skills/ansoff-matrix/SKILL.md +316 -300
  78. package/domains/analytics/.agents/skills/bcg-matrix/SKILL.md +345 -329
  79. package/domains/analytics/.agents/skills/blue-ocean-strategy/SKILL.md +432 -416
  80. package/domains/analytics/.agents/skills/board/SKILL.md +22 -0
  81. package/domains/analytics/.agents/skills/cohort-analysis/SKILL.md +338 -322
  82. package/domains/analytics/.agents/skills/competitive-analysis/SKILL.md +413 -395
  83. package/domains/analytics/.agents/skills/customer-journey-mapping/SKILL.md +347 -331
  84. package/domains/analytics/.agents/skills/gates/SKILL.md +388 -366
  85. package/domains/analytics/.agents/skills/handoff/SKILL.md +402 -380
  86. package/domains/analytics/.agents/skills/html-pdf-report/SKILL.md +21 -289
  87. package/domains/analytics/.agents/skills/html-pdf-report-reference/SKILL.md +325 -0
  88. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  89. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  90. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  91. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  92. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  93. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  94. package/domains/analytics/.agents/skills/html-pdf-report-reference/agents/skill.yaml +23 -0
  95. package/domains/analytics/.agents/skills/icp-buyer-persona/SKILL.md +407 -390
  96. package/domains/analytics/.agents/skills/jtbd-analysis/SKILL.md +357 -341
  97. package/domains/analytics/.agents/skills/karpathy-guidelines/SKILL.md +32 -0
  98. package/domains/analytics/.agents/skills/pest-analysis/SKILL.md +324 -305
  99. package/domains/analytics/.agents/skills/porters-five-forces/SKILL.md +377 -361
  100. package/domains/analytics/.agents/skills/report-design/SKILL.md +416 -398
  101. package/domains/analytics/.agents/skills/rfm-analysis/SKILL.md +330 -314
  102. package/domains/analytics/.agents/skills/session-prompt-generator/SKILL.md +400 -378
  103. package/domains/analytics/.agents/skills/swot-analysis/SKILL.md +340 -324
  104. package/domains/analytics/.agents/skills/tam-sam-som/SKILL.md +329 -312
  105. package/domains/analytics/.agents/skills/trend-analysis/SKILL.md +347 -331
  106. package/domains/analytics/.agents/skills/unit-economics/SKILL.md +430 -413
  107. package/domains/analytics/.agents/skills/value-chain-analysis/SKILL.md +346 -330
  108. package/domains/analytics/.agents/skills/web-research/SKILL.md +323 -308
  109. package/domains/analytics/AGENTS.md +1 -0
  110. package/domains/analytics/agents/auditor.md +76 -0
  111. package/domains/analytics/agents/conductor.md +11 -0
  112. package/domains/analytics/agents/data_analyst.md +11 -0
  113. package/domains/analytics/agents/designer.md +11 -0
  114. package/domains/analytics/agents/interviewer.md +11 -0
  115. package/domains/analytics/agents/layouter.md +11 -0
  116. package/domains/analytics/agents/mediator.md +11 -0
  117. package/domains/analytics/agents/researcher.md +11 -0
  118. package/domains/analytics/agents/strategist.md +11 -0
  119. package/domains/analytics/locales/en/.agents/skills/ansoff-matrix/SKILL.md +316 -300
  120. package/domains/analytics/locales/en/.agents/skills/bcg-matrix/SKILL.md +345 -329
  121. package/domains/analytics/locales/en/.agents/skills/blue-ocean-strategy/SKILL.md +432 -416
  122. package/domains/analytics/locales/en/.agents/skills/board/SKILL.md +22 -0
  123. package/domains/analytics/locales/en/.agents/skills/cohort-analysis/SKILL.md +338 -322
  124. package/domains/analytics/locales/en/.agents/skills/competitive-analysis/SKILL.md +413 -395
  125. package/domains/analytics/locales/en/.agents/skills/customer-journey-mapping/SKILL.md +347 -331
  126. package/domains/analytics/locales/en/.agents/skills/gates/SKILL.md +388 -366
  127. package/domains/analytics/locales/en/.agents/skills/handoff/SKILL.md +402 -380
  128. package/domains/analytics/locales/en/.agents/skills/html-pdf-report/SKILL.md +21 -289
  129. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/SKILL.md +325 -0
  130. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  131. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  132. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  133. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  134. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  135. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  136. package/domains/analytics/locales/en/.agents/skills/html-pdf-report-reference/agents/skill.yaml +29 -0
  137. package/domains/analytics/locales/en/.agents/skills/icp-buyer-persona/SKILL.md +407 -390
  138. package/domains/analytics/locales/en/.agents/skills/jtbd-analysis/SKILL.md +357 -341
  139. package/domains/analytics/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +32 -0
  140. package/domains/analytics/locales/en/.agents/skills/pest-analysis/SKILL.md +324 -305
  141. package/domains/analytics/locales/en/.agents/skills/porters-five-forces/SKILL.md +377 -361
  142. package/domains/analytics/locales/en/.agents/skills/report-design/SKILL.md +416 -398
  143. package/domains/analytics/locales/en/.agents/skills/rfm-analysis/SKILL.md +330 -314
  144. package/domains/analytics/locales/en/.agents/skills/session-prompt-generator/SKILL.md +400 -378
  145. package/domains/analytics/locales/en/.agents/skills/swot-analysis/SKILL.md +340 -324
  146. package/domains/analytics/locales/en/.agents/skills/tam-sam-som/SKILL.md +329 -312
  147. package/domains/analytics/locales/en/.agents/skills/trend-analysis/SKILL.md +347 -331
  148. package/domains/analytics/locales/en/.agents/skills/unit-economics/SKILL.md +430 -413
  149. package/domains/analytics/locales/en/.agents/skills/value-chain-analysis/SKILL.md +366 -350
  150. package/domains/analytics/locales/en/.agents/skills/web-research/SKILL.md +324 -309
  151. package/domains/analytics/locales/en/AGENTS.md +1 -0
  152. package/domains/analytics/locales/en/agents/auditor.md +76 -0
  153. package/domains/analytics/locales/en/agents/conductor.md +27 -0
  154. package/domains/analytics/locales/en/agents/data_analyst.md +29 -0
  155. package/domains/analytics/locales/en/agents/designer.md +27 -0
  156. package/domains/analytics/locales/en/agents/interviewer.md +11 -0
  157. package/domains/analytics/locales/en/agents/layouter.md +11 -0
  158. package/domains/analytics/locales/en/agents/mediator.md +11 -0
  159. package/domains/analytics/locales/en/agents/researcher.md +11 -0
  160. package/domains/analytics/locales/en/agents/strategist.md +11 -0
  161. package/domains/analytics/persona/persona-base.md +94 -0
  162. package/domains/analytics/pipeline.yaml +102 -0
  163. package/domains/content/.agents/skills/audience-analysis/SKILL.md +15 -0
  164. package/domains/content/.agents/skills/board/SKILL.md +20 -0
  165. package/domains/content/.agents/skills/brand-compliance/SKILL.md +15 -0
  166. package/domains/content/.agents/skills/brand-guidelines/SKILL.md +17 -0
  167. package/domains/content/.agents/skills/competitor-content-analysis/SKILL.md +15 -0
  168. package/domains/content/.agents/skills/content-brief/SKILL.md +15 -0
  169. package/domains/content/.agents/skills/content-calendar/SKILL.md +15 -0
  170. package/domains/content/.agents/skills/content-release-gate/SKILL.md +15 -0
  171. package/domains/content/.agents/skills/content-review-checklist/SKILL.md +15 -0
  172. package/domains/content/.agents/skills/cta-optimization/SKILL.md +15 -0
  173. package/domains/content/.agents/skills/data-storytelling/SKILL.md +15 -0
  174. package/domains/content/.agents/skills/email-copywriting/SKILL.md +15 -0
  175. package/domains/content/.agents/skills/email-engagement-tiers/SKILL.md +15 -0
  176. package/domains/content/.agents/skills/fact-checking/SKILL.md +15 -0
  177. package/domains/content/.agents/skills/gates/SKILL.md +20 -0
  178. package/domains/content/.agents/skills/google-stitch-content/SKILL.md +15 -0
  179. package/domains/content/.agents/skills/handoff/SKILL.md +24 -0
  180. package/domains/content/.agents/skills/headline-formulas/SKILL.md +15 -0
  181. package/domains/content/.agents/skills/image-prompt-engineering/SKILL.md +15 -0
  182. package/domains/content/.agents/skills/karpathy-guidelines/SKILL.md +28 -0
  183. package/domains/content/.agents/skills/mailerlite-email-ops/SKILL.md +15 -0
  184. package/domains/content/.agents/skills/marketing-psychology/SKILL.md +15 -0
  185. package/domains/content/.agents/skills/moodboard/SKILL.md +15 -0
  186. package/domains/content/.agents/skills/platform-compliance/SKILL.md +15 -0
  187. package/domains/content/.agents/skills/platform-strategy/SKILL.md +15 -0
  188. package/domains/content/.agents/skills/platform-visual-specs/SKILL.md +15 -0
  189. package/domains/content/.agents/skills/readability-scoring/SKILL.md +15 -0
  190. package/domains/content/.agents/skills/seo-copywriting/SKILL.md +15 -0
  191. package/domains/content/.agents/skills/social-media-formats/SKILL.md +15 -0
  192. package/domains/content/.agents/skills/source-verification/SKILL.md +15 -0
  193. package/domains/content/.agents/skills/storytelling-framework/SKILL.md +15 -0
  194. package/domains/content/.agents/skills/tone-of-voice/SKILL.md +15 -0
  195. package/domains/content/.agents/skills/topic-research/SKILL.md +15 -0
  196. package/domains/content/.agents/skills/trend-research/SKILL.md +15 -0
  197. package/domains/content/.agents/skills/visual-brief/SKILL.md +15 -0
  198. package/domains/content/AGENTS.md +4 -0
  199. package/domains/content/agents/auditor.md +76 -0
  200. package/domains/content/agents/conductor.md +11 -0
  201. package/domains/content/agents/copywriter.md +11 -0
  202. package/domains/content/agents/researcher.md +11 -0
  203. package/domains/content/agents/reviewer.md +11 -0
  204. package/domains/content/agents/strategist.md +11 -0
  205. package/domains/content/agents/visual_concept.md +11 -0
  206. package/domains/content/locales/en/.agents/skills/audience-analysis/SKILL.md +15 -0
  207. package/domains/content/locales/en/.agents/skills/board/SKILL.md +20 -0
  208. package/domains/content/locales/en/.agents/skills/brand-compliance/SKILL.md +15 -0
  209. package/domains/content/locales/en/.agents/skills/brand-guidelines/SKILL.md +17 -0
  210. package/domains/content/locales/en/.agents/skills/competitor-content-analysis/SKILL.md +15 -0
  211. package/domains/content/locales/en/.agents/skills/content-brief/SKILL.md +15 -0
  212. package/domains/content/locales/en/.agents/skills/content-calendar/SKILL.md +15 -0
  213. package/domains/content/locales/en/.agents/skills/content-release-gate/SKILL.md +15 -0
  214. package/domains/content/locales/en/.agents/skills/content-review-checklist/SKILL.md +15 -0
  215. package/domains/content/locales/en/.agents/skills/cta-optimization/SKILL.md +15 -0
  216. package/domains/content/locales/en/.agents/skills/data-storytelling/SKILL.md +15 -0
  217. package/domains/content/locales/en/.agents/skills/email-copywriting/SKILL.md +15 -0
  218. package/domains/content/locales/en/.agents/skills/email-engagement-tiers/SKILL.md +15 -0
  219. package/domains/content/locales/en/.agents/skills/fact-checking/SKILL.md +15 -0
  220. package/domains/content/locales/en/.agents/skills/gates/SKILL.md +20 -0
  221. package/domains/content/locales/en/.agents/skills/google-stitch-content/SKILL.md +15 -0
  222. package/domains/content/locales/en/.agents/skills/handoff/SKILL.md +24 -0
  223. package/domains/content/locales/en/.agents/skills/headline-formulas/SKILL.md +15 -0
  224. package/domains/content/locales/en/.agents/skills/image-prompt-engineering/SKILL.md +15 -0
  225. package/domains/content/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +30 -1
  226. package/domains/content/locales/en/.agents/skills/mailerlite-email-ops/SKILL.md +15 -0
  227. package/domains/content/locales/en/.agents/skills/marketing-psychology/SKILL.md +15 -0
  228. package/domains/content/locales/en/.agents/skills/moodboard/SKILL.md +15 -0
  229. package/domains/content/locales/en/.agents/skills/platform-compliance/SKILL.md +15 -0
  230. package/domains/content/locales/en/.agents/skills/platform-strategy/SKILL.md +15 -0
  231. package/domains/content/locales/en/.agents/skills/platform-visual-specs/SKILL.md +15 -0
  232. package/domains/content/locales/en/.agents/skills/readability-scoring/SKILL.md +15 -0
  233. package/domains/content/locales/en/.agents/skills/seo-copywriting/SKILL.md +15 -0
  234. package/domains/content/locales/en/.agents/skills/social-media-formats/SKILL.md +15 -0
  235. package/domains/content/locales/en/.agents/skills/source-verification/SKILL.md +15 -0
  236. package/domains/content/locales/en/.agents/skills/storytelling-framework/SKILL.md +15 -0
  237. package/domains/content/locales/en/.agents/skills/tone-of-voice/SKILL.md +15 -0
  238. package/domains/content/locales/en/.agents/skills/topic-research/SKILL.md +15 -0
  239. package/domains/content/locales/en/.agents/skills/trend-research/SKILL.md +15 -0
  240. package/domains/content/locales/en/.agents/skills/visual-brief/SKILL.md +15 -0
  241. package/domains/content/locales/en/AGENTS.md +4 -0
  242. package/domains/content/locales/en/agents/auditor.md +76 -0
  243. package/domains/content/locales/en/agents/conductor.md +12 -0
  244. package/domains/content/locales/en/agents/copywriter.md +12 -0
  245. package/domains/content/locales/en/agents/researcher.md +12 -0
  246. package/domains/content/locales/en/agents/reviewer.md +12 -0
  247. package/domains/content/locales/en/agents/strategist.md +12 -0
  248. package/domains/content/locales/en/agents/visual_concept.md +12 -0
  249. package/domains/content/persona/persona-base.md +94 -0
  250. package/domains/content/pipeline.yaml +96 -0
  251. package/domains/development/.agents/skills/adr-log/SKILL.md +1 -0
  252. package/domains/development/.agents/skills/design-intake/SKILL.md +0 -4
  253. package/domains/development/.agents/skills/karpathy-guidelines/SKILL.md +2 -1
  254. package/domains/development/.agents/skills/lava-flow-legacy-detection/SKILL.md +15 -1
  255. package/domains/development/.agents/skills/mcp-integration/SKILL.md +211 -0
  256. package/domains/development/.agents/skills/mcp-integration/agents/claude.json +22 -0
  257. package/domains/development/.agents/skills/mcp-integration/agents/copilot.json +22 -0
  258. package/domains/development/.agents/skills/mcp-integration/agents/gemini.json +22 -0
  259. package/domains/development/.agents/skills/mcp-integration/agents/kimi.yaml +18 -0
  260. package/domains/development/.agents/skills/mcp-integration/agents/openai.yaml +8 -0
  261. package/domains/development/.agents/skills/mcp-integration/agents/qwen.json +22 -0
  262. package/domains/development/.agents/skills/mcp-integration/agents/skill.yaml +26 -0
  263. package/domains/development/.agents/skills/qa-ui-a11y-smoke/SKILL.md +1 -1
  264. package/domains/development/.agents/skills/ui-a11y-smoke-review/SKILL.md +1 -1
  265. package/domains/development/AGENTS.md +1 -0
  266. package/domains/development/AGENTS.yaml +1 -0
  267. package/domains/development/agents/architect.md +13 -1
  268. package/domains/development/agents/auditor.md +74 -0
  269. package/domains/development/agents/conductor.md +14 -3
  270. package/domains/development/agents/devops.md +8 -9
  271. package/domains/development/agents/reviewer.md +12 -0
  272. package/domains/development/agents/senior_full_stack.md +12 -0
  273. package/domains/development/agents/tester.md +10 -16
  274. package/domains/development/locales/en/.agents/skills/adr-log/SKILL.md +1 -0
  275. package/domains/development/locales/en/.agents/skills/current-state-analysis/SKILL.md +256 -172
  276. package/domains/development/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +2 -1
  277. package/domains/development/locales/en/.agents/skills/lava-flow-legacy-detection/SKILL.md +15 -1
  278. package/domains/development/locales/en/.agents/skills/mcp-integration/SKILL.md +211 -0
  279. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/claude.json +22 -0
  280. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/copilot.json +22 -0
  281. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/gemini.json +22 -0
  282. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/kimi.yaml +18 -0
  283. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/openai.yaml +8 -0
  284. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/qwen.json +22 -0
  285. package/domains/development/locales/en/.agents/skills/mcp-integration/agents/skill.yaml +26 -0
  286. package/domains/development/locales/en/.agents/skills/qa-ui-a11y-smoke/SKILL.md +1 -1
  287. package/domains/development/locales/en/.agents/skills/ui-a11y-smoke-review/SKILL.md +1 -1
  288. package/domains/development/locales/en/AGENTS.md +5 -0
  289. package/domains/development/locales/en/AGENTS.yaml +1 -0
  290. package/domains/development/locales/en/agents/architect.md +13 -1
  291. package/domains/development/locales/en/agents/auditor.md +74 -0
  292. package/domains/development/locales/en/agents/conductor.md +14 -3
  293. package/domains/development/locales/en/agents/devops.md +8 -9
  294. package/domains/development/locales/en/agents/reviewer.md +12 -0
  295. package/domains/development/locales/en/agents/senior_full_stack.md +12 -0
  296. package/domains/development/locales/en/agents/tester.md +10 -16
  297. package/domains/development/persona/persona-base.md +94 -0
  298. package/domains/product/.agents/skills/aarrr-metrics/SKILL.md +451 -433
  299. package/domains/product/.agents/skills/ab-test-design/SKILL.md +428 -412
  300. package/domains/product/.agents/skills/acceptance-criteria/SKILL.md +422 -406
  301. package/domains/product/.agents/skills/assumption-mapping/SKILL.md +323 -307
  302. package/domains/product/.agents/skills/board/SKILL.md +24 -0
  303. package/domains/product/.agents/skills/design-brief/SKILL.md +433 -418
  304. package/domains/product/.agents/skills/epic-breakdown/SKILL.md +435 -420
  305. package/domains/product/.agents/skills/gates/SKILL.md +470 -446
  306. package/domains/product/.agents/skills/gtm-brief/SKILL.md +18 -321
  307. package/domains/product/.agents/skills/gtm-brief-reference/SKILL.md +348 -0
  308. package/domains/product/.agents/skills/gtm-brief-reference/agents/claude.json +17 -0
  309. package/domains/product/.agents/skills/gtm-brief-reference/agents/copilot.json +17 -0
  310. package/domains/product/.agents/skills/gtm-brief-reference/agents/gemini.json +17 -0
  311. package/domains/product/.agents/skills/gtm-brief-reference/agents/kimi.yaml +15 -0
  312. package/domains/product/.agents/skills/gtm-brief-reference/agents/openai.yaml +10 -0
  313. package/domains/product/.agents/skills/gtm-brief-reference/agents/qwen.json +17 -0
  314. package/domains/product/.agents/skills/gtm-brief-reference/agents/skill.yaml +22 -0
  315. package/domains/product/.agents/skills/handoff/SKILL.md +463 -439
  316. package/domains/product/.agents/skills/html-pdf-report/SKILL.md +21 -663
  317. package/domains/product/.agents/skills/html-pdf-report-reference/SKILL.md +699 -0
  318. package/domains/product/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  319. package/domains/product/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  320. package/domains/product/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  321. package/domains/product/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  322. package/domains/product/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  323. package/domains/product/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  324. package/domains/product/.agents/skills/html-pdf-report-reference/agents/skill.yaml +22 -0
  325. package/domains/product/.agents/skills/hypothesis-template/SKILL.md +484 -469
  326. package/domains/product/.agents/skills/jtbd-canvas/SKILL.md +274 -258
  327. package/domains/product/.agents/skills/kano-model/SKILL.md +370 -355
  328. package/domains/product/.agents/skills/karpathy-guidelines/SKILL.md +36 -0
  329. package/domains/product/.agents/skills/launch-checklist/SKILL.md +434 -419
  330. package/domains/product/.agents/skills/moscow-prioritization/SKILL.md +407 -392
  331. package/domains/product/.agents/skills/north-star-metric/SKILL.md +317 -301
  332. package/domains/product/.agents/skills/okr-framework/SKILL.md +299 -284
  333. package/domains/product/.agents/skills/opportunity-solution-tree/SKILL.md +472 -456
  334. package/domains/product/.agents/skills/prd-template/SKILL.md +18 -258
  335. package/domains/product/.agents/skills/prd-template-reference/SKILL.md +285 -0
  336. package/domains/product/.agents/skills/prd-template-reference/agents/claude.json +17 -0
  337. package/domains/product/.agents/skills/prd-template-reference/agents/copilot.json +17 -0
  338. package/domains/product/.agents/skills/prd-template-reference/agents/gemini.json +17 -0
  339. package/domains/product/.agents/skills/prd-template-reference/agents/kimi.yaml +16 -0
  340. package/domains/product/.agents/skills/prd-template-reference/agents/openai.yaml +10 -0
  341. package/domains/product/.agents/skills/prd-template-reference/agents/qwen.json +17 -0
  342. package/domains/product/.agents/skills/prd-template-reference/agents/skill.yaml +23 -0
  343. package/domains/product/.agents/skills/problem-statement/SKILL.md +327 -312
  344. package/domains/product/.agents/skills/product-roadmap/SKILL.md +320 -304
  345. package/domains/product/.agents/skills/product-vision/SKILL.md +409 -394
  346. package/domains/product/.agents/skills/release-notes/SKILL.md +18 -258
  347. package/domains/product/.agents/skills/release-notes-reference/SKILL.md +285 -0
  348. package/domains/product/.agents/skills/release-notes-reference/agents/claude.json +17 -0
  349. package/domains/product/.agents/skills/release-notes-reference/agents/copilot.json +17 -0
  350. package/domains/product/.agents/skills/release-notes-reference/agents/gemini.json +17 -0
  351. package/domains/product/.agents/skills/release-notes-reference/agents/kimi.yaml +15 -0
  352. package/domains/product/.agents/skills/release-notes-reference/agents/openai.yaml +10 -0
  353. package/domains/product/.agents/skills/release-notes-reference/agents/qwen.json +17 -0
  354. package/domains/product/.agents/skills/release-notes-reference/agents/skill.yaml +22 -0
  355. package/domains/product/.agents/skills/report-design/SKILL.md +17 -307
  356. package/domains/product/.agents/skills/report-design-reference/SKILL.md +331 -0
  357. package/domains/product/.agents/skills/report-design-reference/agents/claude.json +17 -0
  358. package/domains/product/.agents/skills/report-design-reference/agents/copilot.json +17 -0
  359. package/domains/product/.agents/skills/report-design-reference/agents/gemini.json +17 -0
  360. package/domains/product/.agents/skills/report-design-reference/agents/kimi.yaml +15 -0
  361. package/domains/product/.agents/skills/report-design-reference/agents/openai.yaml +10 -0
  362. package/domains/product/.agents/skills/report-design-reference/agents/qwen.json +17 -0
  363. package/domains/product/.agents/skills/report-design-reference/agents/skill.yaml +22 -0
  364. package/domains/product/.agents/skills/rice-scoring/SKILL.md +266 -251
  365. package/domains/product/.agents/skills/saas-metrics/SKILL.md +422 -404
  366. package/domains/product/.agents/skills/session-prompt-generator/SKILL.md +474 -450
  367. package/domains/product/.agents/skills/user-flow/SKILL.md +491 -476
  368. package/domains/product/.agents/skills/user-interview-script/SKILL.md +315 -298
  369. package/domains/product/.agents/skills/user-story/SKILL.md +401 -385
  370. package/domains/product/.agents/skills/wsjf-scoring/SKILL.md +333 -315
  371. package/domains/product/AGENTS.md +5 -0
  372. package/domains/product/AGENTS.yaml +1 -0
  373. package/domains/product/agents/auditor.md +76 -0
  374. package/domains/product/agents/conductor.md +11 -0
  375. package/domains/product/agents/data_analyst.md +11 -0
  376. package/domains/product/agents/designer.md +11 -0
  377. package/domains/product/agents/discovery.md +11 -0
  378. package/domains/product/agents/layouter.md +11 -0
  379. package/domains/product/agents/mediator.md +11 -0
  380. package/domains/product/agents/pm.md +11 -0
  381. package/domains/product/agents/product_strategist.md +11 -0
  382. package/domains/product/agents/tech_lead.md +11 -0
  383. package/domains/product/agents/ux_designer.md +11 -0
  384. package/domains/product/locales/en/.agents/skills/aarrr-metrics/SKILL.md +451 -433
  385. package/domains/product/locales/en/.agents/skills/ab-test-design/SKILL.md +428 -412
  386. package/domains/product/locales/en/.agents/skills/acceptance-criteria/SKILL.md +422 -406
  387. package/domains/product/locales/en/.agents/skills/assumption-mapping/SKILL.md +323 -307
  388. package/domains/product/locales/en/.agents/skills/board/SKILL.md +24 -0
  389. package/domains/product/locales/en/.agents/skills/design-brief/SKILL.md +433 -418
  390. package/domains/product/locales/en/.agents/skills/epic-breakdown/SKILL.md +435 -420
  391. package/domains/product/locales/en/.agents/skills/gates/SKILL.md +470 -446
  392. package/domains/product/locales/en/.agents/skills/gtm-brief/SKILL.md +18 -321
  393. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/SKILL.md +348 -0
  394. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/claude.json +17 -0
  395. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/copilot.json +17 -0
  396. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/gemini.json +17 -0
  397. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/kimi.yaml +15 -0
  398. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/openai.yaml +10 -0
  399. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/qwen.json +17 -0
  400. package/domains/product/locales/en/.agents/skills/gtm-brief-reference/agents/skill.yaml +22 -0
  401. package/domains/product/locales/en/.agents/skills/handoff/SKILL.md +463 -439
  402. package/domains/product/locales/en/.agents/skills/html-pdf-report/SKILL.md +21 -663
  403. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/SKILL.md +699 -0
  404. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/claude.json +17 -0
  405. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/copilot.json +17 -0
  406. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/gemini.json +17 -0
  407. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/kimi.yaml +15 -0
  408. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/openai.yaml +10 -0
  409. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/qwen.json +17 -0
  410. package/domains/product/locales/en/.agents/skills/html-pdf-report-reference/agents/skill.yaml +22 -0
  411. package/domains/product/locales/en/.agents/skills/hypothesis-template/SKILL.md +484 -469
  412. package/domains/product/locales/en/.agents/skills/jtbd-canvas/SKILL.md +273 -257
  413. package/domains/product/locales/en/.agents/skills/kano-model/SKILL.md +370 -355
  414. package/domains/product/locales/en/.agents/skills/karpathy-guidelines/SKILL.md +36 -0
  415. package/domains/product/locales/en/.agents/skills/launch-checklist/SKILL.md +434 -419
  416. package/domains/product/locales/en/.agents/skills/moscow-prioritization/SKILL.md +407 -392
  417. package/domains/product/locales/en/.agents/skills/north-star-metric/SKILL.md +317 -301
  418. package/domains/product/locales/en/.agents/skills/okr-framework/SKILL.md +299 -284
  419. package/domains/product/locales/en/.agents/skills/opportunity-solution-tree/SKILL.md +472 -456
  420. package/domains/product/locales/en/.agents/skills/prd-template/SKILL.md +18 -258
  421. package/domains/product/locales/en/.agents/skills/prd-template-reference/SKILL.md +285 -0
  422. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/claude.json +16 -0
  423. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/copilot.json +16 -0
  424. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/gemini.json +16 -0
  425. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/kimi.yaml +15 -0
  426. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/openai.yaml +10 -0
  427. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/qwen.json +16 -0
  428. package/domains/product/locales/en/.agents/skills/prd-template-reference/agents/skill.yaml +22 -0
  429. package/domains/product/locales/en/.agents/skills/problem-statement/SKILL.md +327 -312
  430. package/domains/product/locales/en/.agents/skills/product-roadmap/SKILL.md +321 -305
  431. package/domains/product/locales/en/.agents/skills/product-vision/SKILL.md +410 -395
  432. package/domains/product/locales/en/.agents/skills/release-notes/SKILL.md +18 -258
  433. package/domains/product/locales/en/.agents/skills/release-notes-reference/SKILL.md +285 -0
  434. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/claude.json +16 -0
  435. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/copilot.json +16 -0
  436. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/gemini.json +16 -0
  437. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/kimi.yaml +14 -0
  438. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/openai.yaml +10 -0
  439. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/qwen.json +16 -0
  440. package/domains/product/locales/en/.agents/skills/release-notes-reference/agents/skill.yaml +21 -0
  441. package/domains/product/locales/en/.agents/skills/report-design/SKILL.md +17 -307
  442. package/domains/product/locales/en/.agents/skills/report-design-reference/SKILL.md +331 -0
  443. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/claude.json +17 -0
  444. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/copilot.json +17 -0
  445. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/gemini.json +17 -0
  446. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/kimi.yaml +15 -0
  447. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/openai.yaml +10 -0
  448. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/qwen.json +17 -0
  449. package/domains/product/locales/en/.agents/skills/report-design-reference/agents/skill.yaml +22 -0
  450. package/domains/product/locales/en/.agents/skills/rice-scoring/SKILL.md +266 -251
  451. package/domains/product/locales/en/.agents/skills/saas-metrics/SKILL.md +422 -404
  452. package/domains/product/locales/en/.agents/skills/session-prompt-generator/SKILL.md +474 -450
  453. package/domains/product/locales/en/.agents/skills/user-flow/SKILL.md +491 -476
  454. package/domains/product/locales/en/.agents/skills/user-interview-script/SKILL.md +314 -297
  455. package/domains/product/locales/en/.agents/skills/user-story/SKILL.md +401 -385
  456. package/domains/product/locales/en/.agents/skills/wsjf-scoring/SKILL.md +333 -315
  457. package/domains/product/locales/en/AGENTS.md +5 -0
  458. package/domains/product/locales/en/agents/auditor.md +76 -0
  459. package/domains/product/locales/en/agents/conductor.md +11 -0
  460. package/domains/product/locales/en/agents/data_analyst.md +11 -0
  461. package/domains/product/locales/en/agents/designer.md +11 -0
  462. package/domains/product/locales/en/agents/discovery.md +11 -0
  463. package/domains/product/locales/en/agents/layouter.md +11 -0
  464. package/domains/product/locales/en/agents/mediator.md +11 -0
  465. package/domains/product/locales/en/agents/pm.md +11 -0
  466. package/domains/product/locales/en/agents/product_strategist.md +11 -0
  467. package/domains/product/locales/en/agents/tech_lead.md +11 -0
  468. package/domains/product/locales/en/agents/ux_designer.md +11 -0
  469. package/domains/product/persona/persona-base.md +94 -0
  470. package/domains/product/pipeline.yaml +115 -0
  471. package/package.json +72 -70
@@ -1,469 +1,484 @@
1
- ---
2
- name: hypothesis-template
3
- description: Testable hypothesis — We believe / Will result in / We'll know when [metric] reaches [threshold]
4
- ---
5
- # Hypothesis Template
6
-
7
- > **Category:** Experimentation · **Slug:** `hypothesis-template`
8
-
9
- ## When to Use
10
-
11
- - Before every experiment (A/B test, rollout, prototype test).
12
- - During assumption validation — convert assumption into testable hypothesis.
13
- - For pre-mortem decisions — «if we do X, what do we expect?».
14
- - As part of PRD success criteria.
15
-
16
- ## Input
17
-
18
- | Field | Required | Description |
19
- |-------|:--------:|-------------|
20
- | Proposed change / feature | ✅ | What we're testing |
21
- | Underlying assumption | ✅ | Why we think it will work |
22
- | Outcome metric | ✅ | What to measure |
23
- | Baseline data | ✅ | Current metric level |
24
-
25
- ## Data Sources
26
-
27
- 1. `$assumption-mapping` — which assumptions to test.
28
- 2. `$saas-metrics` + `$aarrr-metrics` — for outcome metric selection.
29
- 3. Historical data baseline.
30
- 4. Industry benchmarks — expected effect sizes.
31
-
32
- ### Related Skills
33
-
34
- | Skill | What we take | When to invoke |
35
- |-------|-------------|----------------|
36
- | `assumption-mapping` | Top-risky assumptions → hypotheses | Before hypothesis |
37
- | `ab-test-design` | Testing method | After hypothesis |
38
- | `saas-metrics` | Outcome metrics | For measurement |
39
- | `north-star-metric` | Primary metric alignment | For NSM-related tests |
40
-
41
- ## Format (Canonical)
42
-
43
- > **We believe** [proposed change / hypothesis]
44
- > **For** [target user / segment]
45
- > **Will result in** [expected outcome]
46
- > **We'll know it's true when** [metric] **reaches** [threshold] **within** [timeframe]
47
- > **Because** [underlying rationale]
48
-
49
- Example:
50
- > **We believe** adding an in-app onboarding checklist
51
- > **For** new users (trial signups, first 7 days)
52
- > **Will result in** higher activation rate
53
- > **We'll know it's true when** 7-day activation rate reaches **45%** (from baseline **32%**) **within** 6 weeks of rollout
54
- > **Because** 12/15 interviews showed confusion about first steps, and competitor data suggests checklist approach drives +40% activation in the category.
55
-
56
- ## Protocol
57
-
58
- ### Step 1 — Hypothesis Formulation
59
-
60
- **We believe:** specific change (feature, copy, flow)
61
- **For:** specific user segment (not «all users»)
62
- **Will result in:** directional outcome + metric
63
-
64
- Rules:
65
- - Specific change, not vague («improve UX»)
66
- - Specific user, not «users»
67
- - Specific outcome, not «better engagement»
68
-
69
- ### Step 2 — Outcome Metric Selection
70
-
71
- Primary metric must be:
72
- - **Measurable:** instrumented or instrumentable
73
- - **Leading or lagging:** know which
74
- - **Aligned:** ties to NSM / OKR
75
- - **Anti-game:** not easily gameable
76
-
77
- Common outcome metrics per hypothesis type:
78
- - **Onboarding/Activation hypothesis:** 7-day activation rate, time-to-first-value
79
- - **Retention hypothesis:** W/W retention, churn rate, usage frequency
80
- - **Monetization hypothesis:** conversion rate, ARPA, upsell rate
81
- - **Engagement hypothesis:** DAU/MAU, session duration, actions per session
82
-
83
- ### Step 3 — Baseline + Threshold
84
-
85
- **Baseline:** current level (based on recent data window).
86
-
87
- **Threshold:** what signals «hypothesis validated»? Two approaches:
88
-
89
- 1. **Absolute:** «45% activation» (defined absolute number)
90
- 2. **Relative:** «+20% activation» or «+5pp»
91
-
92
- Rationale for threshold:
93
- - Based on business need (what lift makes it worth shipping)
94
- - Based on detectable effect (what sample size supports)
95
- - Based on industry benchmarks
96
-
97
- ### Step 4 — Timeframe
98
-
99
- - Too short = noise
100
- - Too long = slow learning cycle
101
- - B2B SaaS typical: 4-8 weeks for activation, 8-12 for retention
102
-
103
- Justify: why this duration?
104
-
105
- ### Step 5 — «Because» Rationale
106
-
107
- Underlying evidence:
108
- - User research (quotes, interviews)
109
- - Historical data (past similar changes)
110
- - Industry benchmarks
111
- - Competitor behavior
112
-
113
- Without «because» — guessing. With evidence — informed bet.
114
-
115
- ### Step 6 — Null Hypothesis (Explicit)
116
-
117
- What if hypothesis fails? What would it mean:
118
- - Assumption does not hold
119
- - Need new hypothesis
120
- - Feature does not ship, resources go elsewhere
121
-
122
- Prepare to kill idea if data says so.
123
-
124
- ### Step 7 — Guardrail Metrics
125
-
126
- What **should not** degrade even if primary metric improves:
127
- - Churn rate (must not increase)
128
- - NPS
129
- - Support ticket volume
130
- - Performance metrics
131
- - Revenue per user (if engagement growth comes at ARPA cost)
132
-
133
- If guardrail breaks despite primary win — treat as failure.
134
-
135
- ### Step 8 — Confidence Level
136
-
137
- Bayesian informal:
138
- - **High confidence** (80%+): Strong evidence, similar successful launches, clear mechanism
139
- - **Medium** (50-80%): Moderate evidence, novel mechanism
140
- - **Low** (<50%): Exploratory, assumption-heavy
141
-
142
- Adjusts experimentation investment (bigger tests for lower confidence).
143
-
144
- ### Step 9 — Segment Analysis Plan
145
-
146
- Specify segments for post-test analysis:
147
- - By company size (SMB / mid / enterprise)
148
- - By user role (buyer / end-user / admin)
149
- - By plan tier
150
- - By tenure (new / established)
151
-
152
- Overall lift + per-segment breakdown.
153
-
154
- ## Validation (Quality Gate)
155
-
156
- - [ ] All 5 components (believe / for / result / know / because) completed
157
- - [ ] Specific change + specific user segment
158
- - [ ] Outcome metric measurable + instrumented
159
- - [ ] Baseline data supplied (recent window)
160
- - [ ] Threshold justified (business + detectable)
161
- - [ ] Timeframe rationale
162
- - [ ] Rationale cites ≥ 2 evidence sources
163
- - [ ] Null hypothesis consequences are explicit
164
- - [ ] Guardrail metrics listed
165
- - [ ] Confidence level stated
166
- - [ ] Segment analysis plan
167
-
168
- ## Handoff
169
-
170
- The output is the input for:
171
- - **`ab-test-design`** → testing method
172
- - **Data Analyst** → instrumentation
173
- - **PM** → PRD Success Criteria section
174
- - **Engineering** → feature flag setup
175
-
176
- Format: hypothesis card (markdown). Via `$handoff`.
177
-
178
- ## Anti-patterns
179
-
180
- | Error | Why it's bad | How to do it right |
181
- |-------|-------------|---------------------|
182
- | Vague change | Not testable | Specific implementation |
183
- | «All users» | Dilutes signal | Specific segment |
184
- | No baseline | Can't detect change | Baseline data first |
185
- | No threshold | «It will improve» | Numeric threshold + rationale |
186
- | No rationale | Guessing | ≥ 2 evidence sources |
187
- | No guardrails | Invisible damage | Explicit guardrails |
188
- | Ignored null | Never kill losing ideas | Prepare kill conditions |
189
-
190
- ## Template
191
-
192
- ```markdown
193
- # Hypothesis: [Short Name]
194
-
195
- **We believe** [change]
196
- **For** [segment]
197
- **Will result in** [outcome]
198
- **We'll know it's true when** [metric] reaches [threshold] within [timeframe]
199
- **Because** [rationale, ≥2 evidence sources]
200
-
201
- ## Baseline
202
- - Current [metric]: X
203
- - Data window: [last 30 days, etc.]
204
- - Confidence: Medium
205
-
206
- ## Threshold
207
- - Target: X → Y
208
- - Rationale: [business need + detectable + benchmark]
209
-
210
- ## Guardrails
211
- - Churn < [threshold]
212
- - NPS ≥ [threshold]
213
- - Support tickets < [threshold]
214
-
215
- ## Segments for analysis
216
- - Company size
217
- - User role
218
-
219
- ## Null Hypothesis Consequences
220
- If metric does not reach Y:
221
- - Assumption X does not hold
222
- - Ship? Likely no — data says no fit
223
- ```
224
-
225
- ## Worked Example — TeamFlow Hypothesis Cards (4 cards for AI Summarization launch)
226
-
227
- **Context:** Pre-MVP launch, data analyst forms hypothesis cards for each high-risk assumption from the assumption-map. Each card will be validated through a specific experiment.
228
-
229
- ### Hypothesis Card H-001: AI Summary Willingness to Pay (V1 assumption)
230
-
231
- ```markdown
232
- # Hypothesis: H-001 — Willingness to Pay for AI Tier
233
-
234
- **We believe** adding AI Summarization as Team Tier feature (+$8/seat/month premium)
235
- **For** TeamFlow customer base (200 existing Core accounts) + new trial signups with manager workflows
236
- **Will result in** 40 account upgrades to AI Team Tier within first quarter post-launch
237
- **We'll know it's true when** AI Team Tier adoption reaches **20%** of 200 existing customer base
238
- (baseline: 0% (tier not existing pre-launch); target = 40 of 200 customer accounts upgrade)
239
- **within** 90 days post-launch
240
- **Because**
241
- (1) 7 of 10 customer conversations in landing page test confirmed «we'd pay $10/seat for AI summaries»
242
- (2) Competitor ChatGPT Teams priced at $25/user shows price ceiling exists (we're well-below)
243
- (3) Post-Discovery survey: 34% of customers expressed interest in AI summarization, suggesting 20% conversion realistic conservative target
244
-
245
- ## Baseline
246
- - Current AI Tier adoption: 0 accounts (tier not existing pre-launch)
247
- - Historic Core Team Tier upgrade rate: 12% / year (industry normal)
248
- - Data window: Q4 2025 + Q1 2026 (6 months) — for baseline churn / NPS
249
- - Confidence: Medium-High (validated in 2 customer research methods)
250
-
251
- ## Threshold
252
- - Target: **20% conversion** of 200-customer base in 90 days = 40 accounts
253
- - Rationale:
254
- - Business need: OKR KR1.1 «40 accounts upgraded»
255
- - Upper bound: 34% expressed interest (Discovery survey) → 20% conversion assumes 60% interest-to-upgrade rate
256
- - Benchmark: Successful B2B SaaS premium-tier launches hit 15-25% in first 90 days when feature matches customer need
257
-
258
- ## Guardrails
259
- - Churn rate < 9% (from baseline 8%) — if pricing triggers churn
260
- - NPS ≥ 43 (from baseline 45) — if tier disruption causes dissatisfaction
261
- - Support tickets «pricing / tier confusion» < 5% of all tickets
262
- - Core-tier churn must not accelerate (+0.5pp max)
263
-
264
- ## Segments for analysis
265
- - Company size (SMB / mid-market / enterprise) — expect enterprise > mid > SMB
266
- - Tenure (<6 mo / 6-24 mo / 24+ mo) — expect established > newer
267
- - Current usage intensity (top quartile WAM / median / bottom) — expect heavy users → upgrade
268
-
269
- ## Null Hypothesis Consequences
270
- If conversion does not reach 8%:
271
- - **Below 3%:** Pricing wrong OR feature value weak. Trigger: re-evaluate pricing; consider unbundling
272
- - **3-5%:** Signal mixed. Investigate by-segment: likely enterprise adopting but mid-market price-sensitive
273
- - **5-8%:** Near-miss — extend observation 30 days, adjust GTM messaging, re-evaluate
274
-
275
- ## Tied Experiments
276
- - Exp EXP-012: A/B test pricing page messaging (value-focus vs savings-focus)
277
- - Exp EXP-015: A/B test in-product upsell timing (day 7 vs day 14 vs day 30 of tier eligibility)
278
-
279
- ## Confidence Level: Medium-High (75%)
280
- ## Expected P-value if true: <0.05 in 90-day window
281
- ```
282
-
283
- ---
284
-
285
- ### Hypothesis Card H-002: LLM Quality Acceptability (F1 assumption)
286
-
287
- ```markdown
288
- # Hypothesis: H-002 — LLM Quality Acceptable for HR Use Case
289
-
290
- **We believe** GPT-4 level LLMs (primary OpenAI GPT-4-Turbo, fallback Anthropic Claude 3.5)
291
- **For** 30-minute 1:1 performance conversations in English
292
- **Will generate summaries** acceptable to managers >85% of the time
293
- (Acceptable = manager approves without major edits (< 50% content changed))
294
- **We'll know it's true when** in Wizard-of-Oz test:
295
- - Blind quality rating from managers: ≥ 4.0 out of 5.0 average (across N ≥ 100 meetings)
296
- - Hallucination rate: < 5% of summaries contain factually wrong info
297
- - Misattribution rate: < 3% of action items assigned to wrong person
298
- **within** 4 weeks of Wizard-of-Oz testing
299
- **Because**
300
- (1) Recent LLM benchmarks on summarization (Anthropic HELM, OpenAI evals) show 87-92% acceptance
301
- (2) Our manual QA on 30 sample prompts achieved 90% acceptable rate
302
- (3) Adjacent use cases (Fireflies.ai, Gong) report >80% customer satisfaction — lower bar but similar
303
-
304
- ## Baseline
305
- - Internal QA prompt testing: 90% acceptable (N=30 manual tests)
306
- - No external baseline for HR conversation specifically
307
- - Confidence: Medium (limited HR-specific data)
308
-
309
- ## Threshold
310
- - Target: **≥85% acceptable rate** in Wizard-of-Oz
311
- - Rationale:
312
- - Below 85% user trust collapses, feature becomes liability
313
- - 85-90% = production-ready
314
- - >90% = exceeds expectations
315
-
316
- ## Guardrails
317
- - P95 generation latency ≤ 60s (user experience constraint)
318
- - Cost per summary ≤ $0.10 (viability — FP5 assumption)
319
- - Zero data leakage to LLM provider (verified via provider audit)
320
- - Zero training on customer data (contractual + technical enforcement)
321
-
322
- ## Segments for analysis
323
- - Meeting duration (short 5-15 / standard 15-45 / long 45-120 min) — expect medium best
324
- - Conversation type (planning / feedback / difficult convo / catch-up) — expect varying
325
- - Industry of customer (tech / services / manufacturing) — expect tech highest
326
- - Language mix (pure English / some non-English) — exclude non-English for MVP
327
-
328
- ## Null Hypothesis Consequences
329
- If acceptance < 85%:
330
- - **<70%:** Kill feature. LLM not ready for HR use case, wait 6-12 months.
331
- - **70-80%:** Ship with mandatory human-review layer (feature flag). Reduces value prop but launches.
332
- - **80-85%:** Extensive prompt engineering + iteration before launch. Delay 2-4 weeks.
333
-
334
- ## Tied Experiments
335
- - **Exp EXP-020: Wizard-of-Oz test** — 20 beta managers, 100+ meetings total, blind quality rating
336
- - Exp EXP-021: Prompt engineering iteration (A/B different prompts, measure acceptance)
337
-
338
- ## Confidence Level: Medium (60%)
339
- ## Investment at risk: $200K engineering + 10 weeks delay if invalidated
340
- ```
341
-
342
- ---
343
-
344
- ### Hypothesis Card H-003: Manager Adoption Rate (V2 assumption)
345
-
346
- ```markdown
347
- # Hypothesis: H-003 — Manager Adoption Rate Post-Launch
348
-
349
- **We believe** managers in AI-tier upgraded accounts
350
- **Will** adopt AI summarization at **≥60% weekly usage rate** within 90 days of account upgrade
351
- (Adoption = ≥1 AI-summarized 1:1 per week)
352
- **Because**
353
- (1) Discovery: 6 of 8 managers expressed direct desire for this feature
354
- (2) Removes 3-4 hrs/week admin burden very high individual incentive
355
- (3) Onboarding checklist design will guide first-use in <7 days
356
-
357
- ## Baseline
358
- - N/A (feature not existing). Comparable: existing Team Tier feature adoption avg 55% weekly usage in 90 days.
359
- - Confidence: Medium
360
-
361
- ## Threshold
362
- - Target: **60% weekly adoption** by Day 90
363
- - Stretch: 75%
364
- - Rationale:
365
- - Below 50% — feature isn't sticky; churn risk
365
- - 50-60% — acceptable but needs improvement
366
- - 60-75% — healthy
367
- - 75%+ — category-defining
369
-
370
- ## Guardrails
371
- - Reverse adoption (abandonment) < 10% — users who tried and stopped
372
- - NPS stable or improving
373
- - Action items completion rate lifts in adopters (bonus signal)
374
-
375
- ## Segments for analysis
376
- - Team size of manager (small 3-5 / medium 6-10 / large 11+ reports) — expect medium/large highest
377
- - Manager tenure in role (<2 yrs / 2-5 / 5+) — expect newer managers highest (novelty helps)
378
- - Industry / role (tech / non-tech) — expect tech highest
379
- - Time of month (reviews season vs normal) — should be stable
380
-
381
- ## Null Hypothesis Consequences
382
- If adoption < 60%:
383
- - **<40%:** Feature fail — re-evaluate design, consider major rework
384
- - **40-50%:** Needs iteration, likely onboarding flow issues
385
- - **50-60%:** Near-miss, iterate onboarding + reminders
386
-
387
- ## Tied Experiments
388
- - Exp EXP-025: A/B test onboarding checklist presence (with vs without)
389
- - Exp EXP-026: A/B test first-meeting reminder timing
390
- - Ongoing: cohort analysis by activation month
391
-
392
- ## Confidence: Medium (65%)
393
- ```
394
-
395
- ---
396
-
397
- ### Hypothesis Card H-004: Enterprise Tier Dashboard Upgrade Driver (V3 assumption)
398
-
399
- ```markdown
400
- # Hypothesis: H-004 — Aggregate Dashboard Drives Enterprise Tier Upgrades
401
-
402
- **We believe** showing VP HR / CPO buyers aggregate dashboard (cadence + health score + benchmarks)
403
- **Will** drive **5 Enterprise tier upgrades** (to $50+/seat tier) within Q2
404
- **Because**
405
- (1) 4 of 4 buyer interviews explicitly asked for dashboard visibility
406
- (2) 8 of 10 enterprise prospects in Q1 asked «do you have 1:1 analytics?» — current blocker
407
- (3) Existing mid-market → enterprise conversion rate 0% (no offering); we're creating demand
408
-
409
- ## Baseline
410
- - Current Enterprise tier conversions from mid-market: 0/quarter (no dashboard feature)
411
- - Current Enterprise tier = legacy pricing, 10 accounts grandfathered
412
- - Confidence: Medium (strong buyer signal, but new motion)
413
-
414
- ## Threshold
415
- - Target: **5 Enterprise tier upgrades** by June 30
416
- - Stretch: 10
417
- - Rationale:
418
- - 5 = OKR KR2.1 commitment
419
- - Based on 80 eligible mid-market accounts × 6% conversion (conservative) = 5
420
-
421
- ## Guardrails
422
- - No cannibalization of existing Enterprise tier accounts (shouldn't churn down)
423
- - Enterprise deal size not diluted (avg ACV maintained)
424
- - Sales team does not spend disproportionate time (time-per-deal monitored)
425
-
426
- ## Segments for analysis
427
- - Account size (100-299 / 300-499 / 500+ employees) — expect 300+ highest conversion
428
- - Current spend tier (Team vs Enterprise legacy) — Team with >50 seats most likely
429
- - Industry (regulated / non-regulated) — regulated lower (compliance review barriers)
430
-
431
- ## Null Hypothesis Consequences
432
- If upgrades < 5:
433
- - **0-2:** Dashboard insufficient value for tier premium. Re-evaluate feature scope or pricing.
434
- - **3-4:** Near-miss — extend timeline 30 days, refine sales motion
435
- - **5+:** Success, expand roadmap for Enterprise tier features
436
-
437
- ## Tied Experiments
438
- - Exp EXP-030: 3 design partner concierge sessions (qualitative + conversion intent)
439
- - Exp EXP-031: Sales team pitch variations A/B (data-first pitch vs ROI-first pitch)
440
-
441
- ## Confidence: Medium (60%)
442
- ```
443
-
444
- ---
445
-
446
- ### Hypothesis Card Portfolio Overview
447
-
448
- | ID | Hypothesis | Confidence | Validation Method | Status | Ship Dependency |
449
- |----|-----------|:----------:|-------------------|:------:|-----------------|
450
- | H-001 | WTP $8/seat premium | 75% | Landing page + customer conversations | ⏳ Pre-launch | Ship price tier |
451
- | H-002 | LLM quality ≥85% | 60% | Wizard-of-Oz N=100+ | ⏳ Week 3-6 | Ship AI feature |
452
- | H-003 | 60% weekly adoption | 65% | Post-launch cohort analysis | 🔜 Post-launch | Iterate onboarding |
453
- | H-004 | 5 Enterprise upgrades | 60% | Design partner sessions + sales A/B | ⏳ Pre-launch | Dashboard MVP |
454
-
455
- ### Decision Tree Based on Results
456
-
457
- ```
458
- H-002 (LLM quality)
459
- ├── >85% acceptable → GREEN LIGHT for MVP launch
460
- ├── 70-85% → Ship with human-review layer, iterate prompts
461
- └── <70% → Delay 2-3 months, wait for LLM improvement
462
-
463
- H-001 (WTP)
464
- ├── >8% conversion → Validated, scale GTM
465
- ├── 5-8% → Near miss, iterate pricing / messaging
466
- └── <5% → Pricing wrong, re-price or unbundle
467
- ```
468
-
469
- > **Hypothesis lesson:** 4 hypothesis cards cover **different types of risk**: value (H-001, H-004), feasibility (H-002), usability (H-003). Each with its own validation method (landing page, Wizard-of-Oz, cohort analysis, design partner). The **null hypothesis consequences** section — which is often skipped — makes it actionable: «if <X, do Y». Without this, hypothesis cards = wishful thinking. Each card links back to the assumption-map risk score, closing the discovery-to-validation loop.
1
+ ---
2
+ name: hypothesis-template
3
+ description: Testable hypothesis — We believe / Will result in / We'll know when [metric] reaches [threshold]
4
+ type: triggered
5
+ domain: product
6
+ owners:
7
+ - data_analyst
8
+ gates:
9
+ - DATA_ANALYST
10
+ tech: []
11
+ topic: []
12
+ triggers:
13
+ - hypothesis-template
14
+ - hypothesis
15
+ - гипотеза
16
+ related: []
17
+ budget_lines: 484
18
+ schema_version: 1
19
+ ---
20
+ # Hypothesis Template
21
+
22
+ > **Category:** Experimentation · **Slug:** `hypothesis-template`
23
+
24
+ ## When to Use
25
+
26
+ - Before every experiment (A/B test, rollout, prototype test).
27
+ - During assumption validation — convert assumption into testable hypothesis.
28
+ - For pre-mortem decisions — «if we do X, what do we expect?».
29
+ - As part of PRD success criteria.
30
+
31
+ ## Input
32
+
33
+ | Field | Required | Description |
34
+ |-------|:--------:|-------------|
35
+ | Proposed change / feature | ✅ | What we're testing |
36
+ | Underlying assumption | ✅ | Why we think it will work |
37
+ | Outcome metric | ✅ | What to measure |
38
+ | Baseline data | ✅ | Current metric level |
39
+
40
+ ## Data Sources
41
+
42
+ 1. `$assumption-mapping` — which assumptions to test.
43
+ 2. `$saas-metrics` + `$aarrr-metrics` — for outcome metric selection.
44
+ 3. Historical data baseline.
45
+ 4. Industry benchmarks — expected effect sizes.
46
+
47
+ ### Related Skills
48
+
49
+ | Skill | What we take | When to invoke |
50
+ |-------|-------------|----------------|
51
+ | `assumption-mapping` | Top-risky assumptions → hypotheses | Before hypothesis |
52
+ | `ab-test-design` | Testing method | After hypothesis |
53
+ | `saas-metrics` | Outcome metrics | For measurement |
54
+ | `north-star-metric` | Primary metric alignment | For NSM-related tests |
55
+
56
+ ## Format (Canonical)
57
+
58
+ > **We believe** [proposed change / hypothesis]
59
+ > **For** [target user / segment]
60
+ > **Will result in** [expected outcome]
61
+ > **We'll know it's true when** [metric] **reaches** [threshold] **within** [timeframe]
62
+ > **Because** [underlying rationale]
63
+
64
+ Example:
65
+ > **We believe** adding an in-app onboarding checklist
66
+ > **For** new users (trial signups, first 7 days)
67
+ > **Will result in** higher activation rate
68
+ > **We'll know it's true when** 7-day activation rate reaches **45%** (from baseline **32%**) **within** 6 weeks of rollout
69
+ > **Because** 12/15 interviews showed confusion about first steps, and competitor data suggests checklist approach drives +40% activation in the category.
70
+
71
+ ## Protocol
72
+
73
+ ### Step 1 — Hypothesis Formulation
74
+
75
+ **We believe:** specific change (feature, copy, flow)
76
+ **For:** specific user segment (not «all users»)
77
+ **Will result in:** directional outcome + metric
78
+
79
+ Rules:
80
+ - Specific change, not vague («improve UX»)
81
+ - Specific user, not «users»
82
+ - Specific outcome, not «better engagement»
83
+
84
+ ### Step 2 — Outcome Metric Selection
85
+
86
+ Primary metric must be:
87
+ - **Measurable:** instrumented or instrumentable
88
+ - **Leading or lagging:** know which
89
+ - **Aligned:** ties to NSM / OKR
90
+ - **Anti-game:** not easily gameable
91
+
92
+ Common outcome metrics per hypothesis type:
93
+ - **Onboarding/Activation hypothesis:** 7-day activation rate, time-to-first-value
94
+ - **Retention hypothesis:** W/W retention, churn rate, usage frequency
95
+ - **Monetization hypothesis:** conversion rate, ARPA, upsell rate
96
+ - **Engagement hypothesis:** DAU/MAU, session duration, actions per session
97
+
98
+ ### Step 3 — Baseline + Threshold
99
+
100
+ **Baseline:** current level (based on recent data window).
101
+
102
+ **Threshold:** what signals «hypothesis validated»? Two approaches:
103
+
104
+ 1. **Absolute:** «45% activation» (defined absolute number)
105
+ 2. **Relative:** «+20% activation» or «+5pp»
106
+
107
+ Rationale for threshold:
108
+ - Based on business need (what lift makes it worth shipping)
109
+ - Based on detectable effect (what sample size supports)
110
+ - Based on industry benchmarks
111
+
112
+ ### Step 4 — Timeframe
113
+
114
+ - Too short = noise
115
+ - Too long = slow learning cycle
116
+ - B2B SaaS typical: 4-8 weeks for activation, 8-12 for retention
117
+
118
+ Justify: why this duration?
119
+
120
+ ### Step 5 — «Because» Rationale
121
+
122
+ Underlying evidence:
123
+ - User research (quotes, interviews)
124
+ - Historical data (past similar changes)
125
+ - Industry benchmarks
126
+ - Competitor behavior
127
+
128
+ Without «because» — guessing. With evidence — informed bet.
129
+
130
+ ### Step 6 — Null Hypothesis (Explicit)
131
+
132
+ What if hypothesis fails? What would it mean:
133
+ - Assumption does not hold
134
+ - Need new hypothesis
135
+ - Feature does not ship, resources go elsewhere
136
+
137
+ Prepare to kill idea if data says so.
138
+
139
+ ### Step 7 — Guardrail Metrics
140
+
141
+ What **should not** degrade even if primary metric improves:
142
+ - Churn rate (must not increase)
143
+ - NPS
144
+ - Support ticket volume
145
+ - Performance metrics
146
+ - Revenue per user (if engagement growth comes at ARPA cost)
147
+
148
+ If guardrail breaks despite primary win — treat as failure.
149
+
150
+ ### Step 8 — Confidence Level
151
+
152
+ Bayesian informal:
153
+ - **High confidence** (80%+): Strong evidence, similar successful launches, clear mechanism
154
+ - **Medium** (50-80%): Moderate evidence, novel mechanism
155
+ - **Low** (<50%): Exploratory, assumption-heavy
156
+
157
+ Adjusts experimentation investment (bigger tests for lower confidence).
158
+
159
+ ### Step 9 — Segment Analysis Plan
160
+
161
+ Specify segments for post-test analysis:
162
+ - By company size (SMB / mid / enterprise)
163
+ - By user role (buyer / end-user / admin)
164
+ - By plan tier
165
+ - By tenure (new / established)
166
+
167
+ Overall lift + per-segment breakdown.
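A minimal sketch of the "overall lift + per-segment breakdown" report, assuming one row per user with hypothetical keys `variant`, `converted`, and a segment column such as `company_size`. The sample rows are invented for illustration:

```python
from collections import defaultdict


def conversion_rate(rows):
    """Share of converted rows; 0.0 for an empty arm."""
    return sum(r["converted"] for r in rows) / len(rows) if rows else 0.0


def lift_by_segment(rows, segment_key):
    """Return {segment: treatment rate - control rate}, plus an 'overall' entry."""
    groups = defaultdict(lambda: {"control": [], "treatment": []})
    for row in rows:
        groups["overall"][row["variant"]].append(row)
        groups[row[segment_key]][row["variant"]].append(row)
    return {
        name: conversion_rate(arms["treatment"]) - conversion_rate(arms["control"])
        for name, arms in groups.items()
    }


# Invented sample data: the lift comes entirely from the SMB segment.
rows = [
    {"company_size": "SMB", "variant": "control", "converted": 0},
    {"company_size": "SMB", "variant": "control", "converted": 1},
    {"company_size": "SMB", "variant": "treatment", "converted": 1},
    {"company_size": "SMB", "variant": "treatment", "converted": 1},
    {"company_size": "enterprise", "variant": "control", "converted": 1},
    {"company_size": "enterprise", "variant": "control", "converted": 0},
    {"company_size": "enterprise", "variant": "treatment", "converted": 0},
    {"company_size": "enterprise", "variant": "treatment", "converted": 1},
]
lifts = lift_by_segment(rows, "company_size")
for name, lift in lifts.items():
    print(f"{name}: {lift:+.2f}")
```

Naming the segments before the test, as this step asks, keeps the breakdown from turning into a post-hoc search for any segment that happens to show a win.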
168
+
169
+ ## Validation (Quality Gate)
170
+
171
+ - [ ] All 5 components (believe / for / result / know / because) completed
172
+ - [ ] Specific change + specific user segment
173
+ - [ ] Outcome metric measurable + instrumented
174
+ - [ ] Baseline data supplied (recent window)
175
+ - [ ] Threshold justified (business + detectable)
176
+ - [ ] Timeframe rationale
177
+ - [ ] Rationale cites ≥ 2 evidence sources
178
+ - [ ] Null hypothesis consequences are explicit
179
+ - [ ] Guardrail metrics listed
180
+ - [ ] Confidence level stated
181
+ - [ ] Segment analysis plan
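The checklist above could be enforced mechanically before handoff. The sketch below assumes a card stored as a plain dict; every field name is hypothetical and chosen only to mirror the checklist items:

```python
# Hypothetical field names mirroring the quality-gate checklist.
REQUIRED_FIELDS = [
    "believe", "for", "result", "know", "because",
    "baseline", "threshold_rationale", "timeframe_rationale",
    "null_consequences", "guardrails", "confidence", "segments",
]


def quality_gate(card: dict) -> list:
    """Return a list of problems; an empty list means the card passes the gate."""
    problems = [f"missing: {field}" for field in REQUIRED_FIELDS if not card.get(field)]
    evidence = card.get("because") or []
    if evidence and len(evidence) < 2:
        problems.append("rationale needs at least 2 evidence sources")
    return problems


# Invented draft card with a single evidence source and no guardrails.
draft = {field: "filled in" for field in REQUIRED_FIELDS}
draft["because"] = ["customer interviews"]
draft["guardrails"] = []
print(quality_gate(draft))
```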
182
+
183
+ ## Handoff
184
+
185
+ The output is the input for:
186
+ - **`ab-test-design`** → testing method
187
+ - **Data Analyst** → instrumentation
188
+ - **PM** → PRD Success Criteria section
189
+ - **Engineering** → feature flag setup
190
+
191
+ Format: hypothesis card (markdown), delivered via `$handoff`.
192
+
193
+ ## Anti-patterns
194
+
195
+ | Error | Why it's bad | How to do it right |
196
+ |-------|-------------|---------------------|
197
+ | Vague change | Not testable | Specific implementation |
198
+ | «All users» | Dilutes signal | Specific segment |
199
+ | No baseline | Can't detect change | Baseline data first |
200
+ | No threshold | «It will improve» | Numeric threshold + rationale |
201
+ | No rationale | Guessing | ≥ 2 evidence sources |
202
+ | No guardrails | Invisible damage | Explicit guardrails |
203
+ | Ignored null | Never kill losing ideas | Prepare kill conditions |
204
+
205
+ ## Template
206
+
207
+ ```markdown
208
+ # Hypothesis: [Short Name]
209
+
210
+ **We believe** [change]
211
+ **For** [segment]
212
+ **Will result in** [outcome]
213
+ **We'll know it's true when** [metric] reaches [threshold] within [timeframe]
214
+ **Because** [rationale, ≥2 evidence sources]
215
+
216
+ ## Baseline
217
+ - Current [metric]: X
218
+ - Data window: [last 30 days, etc.]
219
+ - Confidence: Medium
220
+
221
+ ## Threshold
222
+ - Target: X → Y
223
+ - Rationale: [business need + detectable + benchmark]
224
+
225
+ ## Guardrails
226
+ - Churn < [threshold]
227
+ - NPS ≥ [threshold]
228
+ - Support tickets < [threshold]
229
+
230
+ ## Segments for analysis
231
+ - Company size
232
+ - User role
233
+
234
+ ## Null Hypothesis Consequences
235
+ If metric does not reach Y:
236
+ - Assumption X does not hold
237
+ - Ship? Likely no: data says no fit
238
+ ```
239
+
240
+ ## Worked Example — TeamFlow Hypothesis Cards (4 cards for AI Summarization launch)
241
+
242
+ **Context:** Before the MVP launch, the data analyst writes a hypothesis card for each high-risk assumption in the assumption-map. Each card will be validated through a specific experiment.
243
+
244
+ ### Hypothesis Card H-001: AI Summary Willingness to Pay (V1 assumption)
245
+
246
+ ```markdown
247
+ # Hypothesis: H-001 — Willingness to Pay for AI Tier
248
+
249
+ **We believe** adding AI Summarization as Team Tier feature (+$8/seat/month premium)
250
+ **For** TeamFlow customer base (200 existing Core accounts) + new trial signups with manager workflows
251
+ **Will result in** 40 account upgrades to AI Team Tier within first quarter post-launch
252
+ **We'll know it's true when** AI Team Tier adoption reaches **20%** of 200 existing customer base
253
+ (baseline: 0% (tier not existing pre-launch); target = 40 of 200 customer accounts upgrade)
254
+ **within** 90 days post-launch
255
+ **Because**
256
+ (1) 7 of 10 customer conversations in landing page test confirmed «we'd pay $10/seat for AI summaries»
257
+ (2) Competitor ChatGPT Teams priced at $25/user shows a price ceiling exists (we're well below it)
258
+ (3) Post-Discovery survey: 34% of customers expressed interest in AI summarization, suggesting 20% conversion is a realistic, conservative target
259
+
260
+ ## Baseline
261
+ - Current AI Tier adoption: 0 accounts (tier not existing pre-launch)
262
+ - Historic Core Team Tier upgrade rate: 12% / year (industry normal)
263
+ - Data window: Q4 2025 + Q1 2026 (6 months) — for baseline churn / NPS
264
+ - Confidence: Medium-High (validated in 2 customer research methods)
265
+
266
+ ## Threshold
267
+ - Target: **20% conversion** of 200-customer base in 90 days = 40 accounts
268
+ - Rationale:
269
+ - Business need: OKR KR1.1 «40 accounts upgraded»
270
+ - Upper bound: 34% expressed interest (Discovery survey) → 20% conversion assumes 60% interest-to-upgrade rate
271
+ - Benchmark: Successful B2B SaaS premium-tier launches hit 15-25% in first 90 days when feature matches customer need
272
+
273
+ ## Guardrails
274
+ - Churn rate < 9% (from baseline 8%) — if pricing triggers churn
275
+ - NPS ≥ 43 (from baseline 45) — if tier disruption causes dissatisfaction
276
+ - Support tickets «pricing / tier confusion» < 5% of all tickets
277
+ - Core-tier churn must not accelerate (+0.5pp max)
278
+
279
+ ## Segments for analysis
280
+ - Company size (SMB / mid-market / enterprise) — expect enterprise > mid > SMB
281
+ - Tenure (<6 mo / 6-24 mo / 24+ mo) — expect established > newer
282
+ - Current usage intensity (top quartile WAM / median / bottom) — expect heavy users → upgrade
283
+
284
+ ## Null Hypothesis Consequences
285
+ If conversion does not reach 8%:
286
+ - **Below 3%:** Pricing wrong OR feature value weak. Trigger: re-evaluate pricing; consider unbundling
287
+ - **3-5%:** Signal mixed. Investigate by-segment: likely enterprise adopting but mid-market price-sensitive
288
+ - **5-8%:** Near-miss — extend observation 30 days, adjust GTM messaging, re-evaluate
289
+
290
+ ## Tied Experiments
291
+ - Exp EXP-012: A/B test pricing page messaging (value-focus vs savings-focus)
292
+ - Exp EXP-015: A/B test in-product upsell timing (day 7 vs day 14 vs day 30 of tier eligibility)
293
+
294
+ ## Confidence Level: Medium-High (75%)
295
+ ## Expected P-value if true: <0.05 in 90-day window
296
+ ```
297
+
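The threshold arithmetic in card H-001 can be checked directly. A short sketch using only the figures stated in the card (200 accounts, 20% target, 34% expressed interest); the variable names are illustrative:

```python
customer_base = 200    # existing Core accounts, from the card
target_rate = 0.20     # threshold, from the card
interest_rate = 0.34   # Discovery survey interest, from the card

# Accounts that must upgrade to hit the threshold.
target_accounts = round(customer_base * target_rate)

# Share of interested customers that must actually upgrade.
implied_interest_to_upgrade = target_rate / interest_rate

print(target_accounts)
print(f"{implied_interest_to_upgrade:.0%}")
```

The second figure comes out slightly under the 60% interest-to-upgrade rate quoted in the card, which rounds it up.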
298
+ ---
299
+
300
+ ### Hypothesis Card H-002: LLM Quality Acceptability (F1 assumption)
301
+
302
+ ```markdown
303
+ # Hypothesis: H-002 — LLM Quality Acceptable for HR Use Case
304
+
305
+ **We believe** GPT-4 level LLMs (primary OpenAI GPT-4-Turbo, fallback Anthropic Claude 3.5)
306
+ **For** 30-minute 1:1 performance conversations in English
307
+ **Will generate summaries** acceptable to managers >85% of the time
308
+ (Acceptable = manager approves without major edits (< 50% content changed))
309
+ **We'll know it's true when** in Wizard-of-Oz test:
310
+ - Blind quality rating from managers: ≥ 4.0 out of 5.0 average (across N ≥ 100 meetings)
311
+ - Hallucination rate: < 5% of summaries contain factually wrong info
312
+ - Misattribution rate: < 3% of action items assigned to wrong person
313
+ **within** 4 weeks of Wizard-of-Oz testing
314
+ **Because**
315
+ (1) Recent LLM benchmarks on summarization (Stanford HELM, OpenAI evals) show 87-92% acceptance
316
+ (2) Our manual QA on 30 sample prompts achieved 90% acceptable rate
317
+ (3) Adjacent use cases (Fireflies.ai, Gong) report >80% customer satisfaction — lower bar but similar
318
+
319
+ ## Baseline
320
+ - Internal QA prompt testing: 90% acceptable (N=30 manual tests)
321
+ - No external baseline for HR conversation specifically
322
+ - Confidence: Medium (limited HR-specific data)
323
+
324
+ ## Threshold
325
+ - Target: **≥85% acceptable rate** in Wizard-of-Oz
326
+ - Rationale:
327
+ - Below 85% → user trust collapses, feature becomes liability
328
+ - 85-90% = production-ready
329
+ - >90% = exceeds expectations
330
+
331
+ ## Guardrails
332
+ - P95 generation latency ≤ 60s (user experience constraint)
333
+ - Cost per summary ≤ $0.10 (viability — FP5 assumption)
334
+ - Zero data leakage to LLM provider (verified via provider audit)
335
+ - Zero training on customer data (contractual + technical enforcement)
336
+
337
+ ## Segments for analysis
338
+ - Meeting duration (short 5-15 / standard 15-45 / long 45-120 min) — expect medium best
339
+ - Conversation type (planning / feedback / difficult convo / catch-up) — expect varying
340
+ - Industry of customer (tech / services / manufacturing) — expect tech highest
341
+ - Language mix (pure English / some non-English) — exclude non-English for MVP
342
+
343
+ ## Null Hypothesis Consequences
344
+ If acceptance < 85%:
345
+ - **<70%:** Kill feature. LLM not ready for HR use case, wait 6-12 months.
346
+ - **70-80%:** Ship with mandatory human-review layer (feature flag). Reduces value prop but launches.
347
+ - **80-85%:** Extensive prompt engineering + iteration before launch. Delay 2-4 weeks.
348
+
349
+ ## Tied Experiments
350
+ - **Exp EXP-020: Wizard-of-Oz test** with 20 beta managers, 100+ meetings total, blind quality rating
351
+ - Exp EXP-021: Prompt engineering iteration (A/B different prompts, measure acceptance)
352
+
353
+ ## Confidence Level: Medium (60%)
354
+ ## Investment at risk: $200K engineering + 10 weeks delay if invalidated
355
+ ```
356
+
357
+ ---
358
+
359
+ ### Hypothesis Card H-003: Manager Adoption Rate (V2 assumption)
360
+
361
+ ```markdown
362
+ # Hypothesis: H-003 — Manager Adoption Rate Post-Launch
363
+
364
+ **We believe** managers in AI-tier upgraded accounts
365
+ **Will** adopt AI summarization at **≥60% weekly usage rate** within 90 days of account upgrade
366
+ (Adoption = ≥1 AI-summarized 1:1 per week)
367
+ **Because**
368
+ (1) Discovery: 6 of 8 managers expressed direct desire for this feature
369
+ (2) Removes 3-4 hrs/week admin burden — very high individual incentive
370
+ (3) Onboarding checklist design will guide first-use in <7 days
371
+
372
+ ## Baseline
373
+ - N/A (feature not existing). Comparable: existing Team Tier feature adoption avg 55% weekly usage in 90 days.
374
+ - Confidence: Medium
375
+
376
+ ## Threshold
377
+ - Target: **60% weekly adoption** by Day 90
378
+ - Stretch: 75%
379
+ - Rationale:
380
+ - Below 50% — feature isn't sticky; churn risk
381
+ - 50-60% — acceptable but needs improvement
382
+ - 60-75% — healthy
383
+ - 75%+ — category-defining
384
+
385
+ ## Guardrails
386
+ - Reverse adoption (abandonment) < 10% — users who tried and stopped
387
+ - NPS stable or improving
388
+ - Action items completion rate lifts in adopters (bonus signal)
389
+
390
+ ## Segments for analysis
391
+ - Team size of manager (small 3-5 / medium 6-10 / large 11+ reports) — expect medium/large highest
392
+ - Manager tenure in role (<2 yrs / 2-5 / 5+) — expect newer managers highest (novelty helps)
393
+ - Industry / role (tech / non-tech) — expect tech highest
394
+ - Time of month (reviews season vs normal) — should be stable
395
+
396
+ ## Null Hypothesis Consequences
397
+ If adoption < 60%:
398
+ - **<40%:** Feature fail — re-evaluate design, consider major rework
399
+ - **40-50%:** Needs iteration, likely onboarding flow issues
400
+ - **50-60%:** Near-miss, iterate onboarding + reminders
401
+
402
+ ## Tied Experiments
403
+ - Exp EXP-025: A/B test onboarding checklist presence (with vs without)
404
+ - Exp EXP-026: A/B test first-meeting reminder timing
405
+ - Ongoing: cohort analysis by activation month
406
+
407
+ ## Confidence: Medium (65%)
408
+ ```
409
+
410
+ ---
411
+
412
+ ### Hypothesis Card H-004: Enterprise Tier Dashboard Upgrade Driver (V3 assumption)
413
+
414
+ ```markdown
415
+ # Hypothesis: H-004 — Aggregate Dashboard Drives Enterprise Tier Upgrades
416
+
417
+ **We believe** showing VP HR / CPO buyers aggregate dashboard (cadence + health score + benchmarks)
418
+ **Will** drive **5 Enterprise tier upgrades** (to $50+/seat tier) within Q2
419
+ **Because**
420
+ (1) 4 of 4 buyer interviews explicitly asked for dashboard visibility
421
+ (2) 8 of 10 enterprise prospects in Q1 asked «do you have 1:1 analytics?» — current blocker
422
+ (3) Existing mid-market → enterprise conversion rate is 0% (no offering); we're creating demand
423
+
424
+ ## Baseline
425
+ - Current Enterprise tier conversions from mid-market: 0/quarter (no dashboard feature)
426
+ - Current Enterprise tier = legacy pricing, 10 accounts grandfathered
427
+ - Confidence: Medium (strong buyer signal, but new motion)
428
+
429
+ ## Threshold
430
+ - Target: **5 Enterprise tier upgrades** by June 30
431
+ - Stretch: 10
432
+ - Rationale:
433
+ - 5 = OKR KR2.1 commitment
434
+ - Based on 80 eligible mid-market accounts × 6% conversion (conservative) = 5
435
+
436
+ ## Guardrails
437
+ - No cannibalization of existing Enterprise tier accounts (shouldn't churn down)
438
+ - Enterprise deal size not diluted (avg ACV maintained)
439
+ - Sales team does not spend disproportionate time (time-per-deal monitored)
440
+
441
+ ## Segments for analysis
442
+ - Account size (100-299 / 300-499 / 500+ employees) — expect 300+ highest conversion
443
+ - Current spend tier (Team vs Enterprise legacy) — Team with >50 seats most likely
444
+ - Industry (regulated / non-regulated) — regulated lower (compliance review barriers)
445
+
446
+ ## Null Hypothesis Consequences
447
+ If upgrades < 5:
448
+ - **0-2:** Dashboard insufficient value for tier premium. Re-evaluate feature scope or pricing.
449
+ - **3-4:** Near-miss — extend timeline 30 days, refine sales motion
450
+ - **5+:** Success, expand roadmap for Enterprise tier features
451
+
452
+ ## Tied Experiments
453
+ - Exp EXP-030: 3 design partner concierge sessions (qualitative + conversion intent)
454
+ - Exp EXP-031: Sales team pitch variations A/B (data-first pitch vs ROI-first pitch)
455
+
456
+ ## Confidence: Medium (60%)
457
+ ```
458
+
459
+ ---
460
+
461
+ ### Hypothesis Card Portfolio Overview
462
+
463
+ | ID | Hypothesis | Confidence | Validation Method | Status | Ship Dependency |
464
+ |----|-----------|:----------:|-------------------|:------:|-----------------|
465
+ | H-001 | WTP $8/seat premium | 75% | Landing page + customer conversations | ⏳ Pre-launch | Ship price tier |
466
+ | H-002 | LLM quality ≥85% | 60% | Wizard-of-Oz N=100+ | ⏳ Week 3-6 | Ship AI feature |
467
+ | H-003 | 60% weekly adoption | 65% | Post-launch cohort analysis | 🔜 Post-launch | Iterate onboarding |
468
+ | H-004 | 5 Enterprise upgrades | 60% | Design partner sessions + sales A/B | ⏳ Pre-launch | Dashboard MVP |
469
+
470
+ ### Decision Tree Based on Results
471
+
472
+ ```
473
+ H-002 (LLM quality)
474
+ ├── >85% acceptable → GREEN LIGHT for MVP launch
475
+ ├── 70-85% → Ship with human-review layer, iterate prompts
476
+ └── <70% → Delay 2-3 months, wait for LLM improvement
477
+
478
+ H-001 (WTP)
479
+ ├── >8% conversion → Validated, scale GTM
480
+ ├── 5-8% → Near miss, iterate pricing / messaging
481
+ └── <5% → Pricing wrong, re-price or unbundle
482
+ ```
483
+
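The decision tree can be expressed as a small lookup so that a result maps to exactly one action. The branch bounds and action strings are copied from the tree; the function name and the boundary handling (a value exactly on the upper bound falls into the middle branch) are assumptions of this sketch:

```python
from typing import Tuple


def decide(value: float, pass_above: float, near_miss_from: float,
           actions: Tuple[str, str, str]) -> str:
    """Pick the action for a result, checking branches from best to worst."""
    if value > pass_above:
        return actions[0]
    if value >= near_miss_from:
        return actions[1]
    return actions[2]


# Branches as written in the decision tree (values in percent).
H002 = (85, 70, (
    "GREEN LIGHT for MVP launch",
    "Ship with human-review layer, iterate prompts",
    "Delay 2-3 months, wait for LLM improvement",
))
H001 = (8, 5, (
    "Validated, scale GTM",
    "Near miss, iterate pricing / messaging",
    "Pricing wrong, re-price or unbundle",
))

# Hypothetical results used only to exercise the branches.
print(decide(88, *H002))
print(decide(6.5, *H001))
```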
484
+ > **Hypothesis lesson:** 4 hypothesis cards cover **different types of risk**: value (H-001, H-004), feasibility (H-002), usability (H-003). Each with its own validation method (landing page, Wizard-of-Oz, cohort analysis, design partner). The **null hypothesis consequences** section — which is often skipped — makes it actionable: «if <X, do Y». Without this, hypothesis cards = wishful thinking. Each card links back to the assumption-map risk score, closing the discovery-to-validation loop.