@pennyfarthing/core 11.0.0-alpha.0 → 11.1.0

Files changed (401)
  1. package/README.md +84 -26
  2. package/package.json +14 -16
  3. package/packages/core/dist/cli/cyclist-migration.test.js +2 -1
  4. package/packages/core/dist/cli/cyclist-migration.test.js.map +1 -1
  5. package/packages/core/dist/cli/ocean-profiles.test.js +5 -4
  6. package/packages/core/dist/cli/ocean-profiles.test.js.map +1 -1
  7. package/packages/core/dist/cli/theme-maker.test.js +5 -4
  8. package/packages/core/dist/cli/theme-maker.test.js.map +1 -1
  9. package/packages/core/dist/cli/utils/010-detect-remove-old-packages.test.d.ts +20 -0
  10. package/packages/core/dist/cli/utils/010-detect-remove-old-packages.test.d.ts.map +1 -0
  11. package/packages/core/dist/cli/utils/010-detect-remove-old-packages.test.js +278 -0
  12. package/packages/core/dist/cli/utils/010-detect-remove-old-packages.test.js.map +1 -0
  13. package/packages/core/dist/cli/utils/constants.d.ts +7 -1
  14. package/packages/core/dist/cli/utils/constants.d.ts.map +1 -1
  15. package/packages/core/dist/cli/utils/constants.js +2 -0
  16. package/packages/core/dist/cli/utils/constants.js.map +1 -1
  17. package/packages/core/dist/cli/utils/constants.test.d.ts +10 -0
  18. package/packages/core/dist/cli/utils/constants.test.d.ts.map +1 -0
  19. package/packages/core/dist/cli/utils/constants.test.js +38 -0
  20. package/packages/core/dist/cli/utils/constants.test.js.map +1 -0
  21. package/packages/core/dist/consultation/consultation-protocol.d.ts +139 -0
  22. package/packages/core/dist/consultation/consultation-protocol.d.ts.map +1 -0
  23. package/packages/core/dist/consultation/consultation-protocol.js +178 -0
  24. package/packages/core/dist/consultation/consultation-protocol.js.map +1 -0
  25. package/packages/core/dist/consultation/consultation-protocol.test.d.ts +20 -0
  26. package/packages/core/dist/consultation/consultation-protocol.test.d.ts.map +1 -0
  27. package/packages/core/dist/consultation/consultation-protocol.test.js +474 -0
  28. package/packages/core/dist/consultation/consultation-protocol.test.js.map +1 -0
  29. package/packages/core/dist/public/js/react/react.js +30 -30
  30. package/packages/core/dist/scripts/generate-report.test.js +2 -2
  31. package/packages/core/dist/scripts/generate-spider-report.test.js +2 -2
  32. package/packages/core/dist/scripts/generate-spider.test.js +2 -1
  33. package/packages/core/dist/scripts/generate-spider.test.js.map +1 -1
  34. package/packages/core/dist/server/api/file-browser.d.ts.map +1 -1
  35. package/packages/core/dist/server/api/file-browser.js +19 -1
  36. package/packages/core/dist/server/api/file-browser.js.map +1 -1
  37. package/packages/core/dist/server/api/git-fetch-cooldown.test.d.ts +10 -0
  38. package/packages/core/dist/server/api/git-fetch-cooldown.test.d.ts.map +1 -0
  39. package/packages/core/dist/server/api/git-fetch-cooldown.test.js +30 -0
  40. package/packages/core/dist/server/api/git-fetch-cooldown.test.js.map +1 -0
  41. package/packages/core/dist/server/api/git.d.ts +8 -0
  42. package/packages/core/dist/server/api/git.d.ts.map +1 -1
  43. package/packages/core/dist/server/api/git.js +37 -10
  44. package/packages/core/dist/server/api/git.js.map +1 -1
  45. package/packages/core/dist/server/api/health-score.d.ts.map +1 -1
  46. package/packages/core/dist/server/api/health-score.js +25 -1
  47. package/packages/core/dist/server/api/health-score.js.map +1 -1
  48. package/packages/core/dist/server/api/index.d.ts +1 -1
  49. package/packages/core/dist/server/api/index.d.ts.map +1 -1
  50. package/packages/core/dist/server/api/index.js +1 -1
  51. package/packages/core/dist/server/api/index.js.map +1 -1
  52. package/packages/core/dist/server/api/settings.d.ts.map +1 -1
  53. package/packages/core/dist/server/api/settings.js +73 -2
  54. package/packages/core/dist/server/api/settings.js.map +1 -1
  55. package/packages/core/dist/server/api/theme-agents.d.ts.map +1 -1
  56. package/packages/core/dist/server/api/theme-agents.js +61 -0
  57. package/packages/core/dist/server/api/theme-agents.js.map +1 -1
  58. package/packages/core/dist/server/otlp-receiver.d.ts +35 -13
  59. package/packages/core/dist/server/otlp-receiver.d.ts.map +1 -1
  60. package/packages/core/dist/server/otlp-receiver.js +76 -16
  61. package/packages/core/dist/server/otlp-receiver.js.map +1 -1
  62. package/packages/core/dist/server/paths.d.ts.map +1 -1
  63. package/packages/core/dist/server/paths.js +11 -1
  64. package/packages/core/dist/server/paths.js.map +1 -1
  65. package/packages/core/dist/server/server.d.ts +3 -1
  66. package/packages/core/dist/server/server.d.ts.map +1 -1
  67. package/packages/core/dist/server/server.js +23 -16
  68. package/packages/core/dist/server/server.js.map +1 -1
  69. package/packages/core/dist/server/server.test.js.map +1 -1
  70. package/packages/core/dist/workflow/gate-file-validation.d.ts +49 -0
  71. package/packages/core/dist/workflow/gate-file-validation.d.ts.map +1 -0
  72. package/packages/core/dist/workflow/gate-file-validation.js +157 -0
  73. package/packages/core/dist/workflow/gate-file-validation.js.map +1 -0
  74. package/packages/core/dist/workflow/gate-file-validation.test.d.ts +19 -0
  75. package/packages/core/dist/workflow/gate-file-validation.test.d.ts.map +1 -0
  76. package/packages/core/dist/workflow/gate-file-validation.test.js +536 -0
  77. package/packages/core/dist/workflow/gate-file-validation.test.js.map +1 -0
  78. package/packages/core/dist/workflow/gate-schema-validation.test.d.ts +14 -0
  79. package/packages/core/dist/workflow/gate-schema-validation.test.d.ts.map +1 -0
  80. package/packages/core/dist/workflow/gate-schema-validation.test.js +339 -0
  81. package/packages/core/dist/workflow/gate-schema-validation.test.js.map +1 -0
  82. package/packages/core/dist/workflow/handoff.js +2 -2
  83. package/packages/core/dist/workflow/handoff.js.map +1 -1
  84. package/packages/core/dist/workflow/handoff.test.js +16 -0
  85. package/packages/core/dist/workflow/handoff.test.js.map +1 -1
  86. package/packages/core/dist/workflow/variable-resolver.test.js +1 -1
  87. package/packages/core/dist/workflow/variable-resolver.test.js.map +1 -1
  88. package/packages/core/dist/workflow/workflow-migration.test.js +4 -3
  89. package/packages/core/dist/workflow/workflow-migration.test.js.map +1 -1
  90. package/packages/core/dist/workflow/workflow-schema.d.ts +4 -2
  91. package/packages/core/dist/workflow/workflow-schema.d.ts.map +1 -1
  92. package/packages/core/dist/workflow/workflow-schema.js +43 -8
  93. package/packages/core/dist/workflow/workflow-schema.js.map +1 -1
  94. package/pennyfarthing-dist/agents/README.md +6 -14
  95. package/pennyfarthing-dist/agents/architect.md +43 -30
  96. package/pennyfarthing-dist/agents/ba.md +30 -29
  97. package/pennyfarthing-dist/agents/dev.md +76 -41
  98. package/pennyfarthing-dist/agents/devops.md +57 -21
  99. package/pennyfarthing-dist/agents/orchestrator.md +3 -11
  100. package/pennyfarthing-dist/agents/pm.md +45 -31
  101. package/pennyfarthing-dist/agents/reviewer.md +20 -66
  102. package/pennyfarthing-dist/agents/sm-setup.md +2 -2
  103. package/pennyfarthing-dist/agents/sm.md +8 -30
  104. package/pennyfarthing-dist/agents/tea.md +25 -41
  105. package/pennyfarthing-dist/agents/tech-writer.md +33 -90
  106. package/pennyfarthing-dist/agents/ux-designer.md +39 -40
  107. package/pennyfarthing-dist/commands/benchmark-control.md +8 -64
  108. package/pennyfarthing-dist/commands/benchmark.md +8 -480
  109. package/pennyfarthing-dist/commands/job-fair.md +8 -97
  110. package/pennyfarthing-dist/commands/pf-benchmark-control.md +70 -0
  111. package/pennyfarthing-dist/commands/pf-benchmark.md +486 -0
  112. package/pennyfarthing-dist/commands/pf-chore.md +4 -4
  113. package/pennyfarthing-dist/commands/pf-ci.md +40 -0
  114. package/pennyfarthing-dist/commands/pf-close-epic.md +9 -27
  115. package/pennyfarthing-dist/commands/pf-continue-session.md +9 -213
  116. package/pennyfarthing-dist/commands/pf-create-branches-from-story.md +11 -353
  117. package/pennyfarthing-dist/commands/pf-docs.md +28 -0
  118. package/pennyfarthing-dist/commands/pf-epic.md +67 -0
  119. package/pennyfarthing-dist/commands/pf-git-cleanup.md +11 -52
  120. package/pennyfarthing-dist/commands/pf-git.md +75 -0
  121. package/pennyfarthing-dist/commands/pf-help.md +110 -128
  122. package/pennyfarthing-dist/commands/pf-job-fair.md +102 -0
  123. package/pennyfarthing-dist/commands/pf-new-work.md +9 -18
  124. package/pennyfarthing-dist/commands/pf-parallel-work.md +6 -66
  125. package/pennyfarthing-dist/commands/pf-release.md +11 -76
  126. package/pennyfarthing-dist/commands/pf-repo-status.md +11 -44
  127. package/pennyfarthing-dist/commands/pf-run-ci.md +8 -111
  128. package/pennyfarthing-dist/commands/pf-session.md +51 -0
  129. package/pennyfarthing-dist/commands/pf-solo.md +447 -0
  130. package/pennyfarthing-dist/commands/pf-sprint-planning.md +8 -104
  131. package/pennyfarthing-dist/commands/pf-standalone.md +1 -1
  132. package/pennyfarthing-dist/commands/pf-start-epic.md +9 -163
  133. package/pennyfarthing-dist/commands/pf-sync-epic-to-jira.md +8 -179
  134. package/pennyfarthing-dist/commands/pf-sync-work-with-sprint.md +8 -368
  135. package/pennyfarthing-dist/commands/pf-update-domain-docs.md +8 -78
  136. package/pennyfarthing-dist/commands/solo.md +8 -442
  137. package/pennyfarthing-dist/guides/agent-behavior.md +14 -14
  138. package/pennyfarthing-dist/guides/agent-coordination.md +7 -7
  139. package/pennyfarthing-dist/guides/agent-tag-taxonomy.md +6 -6
  140. package/pennyfarthing-dist/guides/bikerack.md +128 -0
  141. package/pennyfarthing-dist/guides/brownfield-tools.md +133 -0
  142. package/pennyfarthing-dist/guides/command-tag-taxonomy.md +2 -2
  143. package/pennyfarthing-dist/guides/gate-schema.md +227 -0
  144. package/pennyfarthing-dist/guides/gates.md +120 -0
  145. package/pennyfarthing-dist/guides/handoff-cli.md +116 -0
  146. package/pennyfarthing-dist/guides/hooks.md +86 -4
  147. package/pennyfarthing-dist/guides/output-styles.md +65 -0
  148. package/pennyfarthing-dist/guides/patterns/approval-gates-pattern.md +5 -5
  149. package/pennyfarthing-dist/guides/patterns/tdd-flow-pattern.md +4 -4
  150. package/pennyfarthing-dist/guides/prompt-patterns.md +5 -5
  151. package/pennyfarthing-dist/guides/reflector.md +4 -4
  152. package/pennyfarthing-dist/guides/session-artifacts.md +1 -1
  153. package/pennyfarthing-dist/guides/skill-schema.md +1 -1
  154. package/pennyfarthing-dist/guides/tandem-protocol.md +13 -1
  155. package/pennyfarthing-dist/guides/worktree-mode.md +3 -3
  156. package/pennyfarthing-dist/guides/xml-tags.md +5 -4
  157. package/pennyfarthing-dist/personas/themes/hogans-heroes.yaml +11 -22
  158. package/pennyfarthing-dist/personas/themes/stephen-king.yaml +13 -24
  159. package/pennyfarthing-dist/scripts/core/agent-session.sh +0 -0
  160. package/pennyfarthing-dist/scripts/core/check-context.sh +0 -0
  161. package/pennyfarthing-dist/scripts/core/phase-check-start.sh +1 -1
  162. package/pennyfarthing-dist/scripts/core/prime.sh +0 -0
  163. package/pennyfarthing-dist/scripts/cyclist/is-cyclist.sh +0 -0
  164. package/pennyfarthing-dist/scripts/git/create-feature-branches.sh +0 -0
  165. package/pennyfarthing-dist/scripts/git/git-status-all.sh +0 -0
  166. package/pennyfarthing-dist/scripts/git/install-git-hooks.sh +0 -0
  167. package/pennyfarthing-dist/scripts/git/release.sh +0 -0
  168. package/pennyfarthing-dist/scripts/git/worktree-manager.sh +0 -0
  169. package/pennyfarthing-dist/scripts/health/drift-detection.sh +0 -0
  170. package/pennyfarthing-dist/scripts/hooks/bell-mode-hook.sh +0 -0
  171. package/pennyfarthing-dist/scripts/hooks/context-circuit-breaker.sh +0 -0
  172. package/pennyfarthing-dist/scripts/hooks/context-warning.sh +0 -0
  173. package/pennyfarthing-dist/scripts/hooks/cyclist-pretooluse-hook.sh +0 -0
  174. package/pennyfarthing-dist/scripts/hooks/dispatcher-template.sh +0 -0
  175. package/pennyfarthing-dist/scripts/hooks/otel-auto-config.sh +19 -14
  176. package/pennyfarthing-dist/scripts/hooks/post-merge.sh +0 -0
  177. package/pennyfarthing-dist/scripts/hooks/pre-commit.sh +0 -0
  178. package/pennyfarthing-dist/scripts/hooks/pre-edit-check.sh +0 -0
  179. package/pennyfarthing-dist/scripts/hooks/pre-push.sh +0 -0
  180. package/pennyfarthing-dist/scripts/hooks/question-reflector-check.sh +0 -0
  181. package/pennyfarthing-dist/scripts/hooks/question_reflector_check.py +0 -0
  182. package/pennyfarthing-dist/scripts/hooks/schema-validation.sh +0 -0
  183. package/pennyfarthing-dist/scripts/hooks/session-start.sh +0 -0
  184. package/pennyfarthing-dist/scripts/hooks/session-stop.sh +0 -0
  185. package/pennyfarthing-dist/scripts/hooks/sprint-yaml-validation.sh +0 -0
  186. package/pennyfarthing-dist/scripts/hooks/welcome-hook.sh +0 -0
  187. package/pennyfarthing-dist/scripts/jira/create-jira-epic.sh +0 -0
  188. package/pennyfarthing-dist/scripts/jira/create-jira-story.sh +0 -0
  189. package/pennyfarthing-dist/scripts/jira/jira-claim-story.sh +0 -0
  190. package/pennyfarthing-dist/scripts/jira/jira-reconcile.sh +0 -0
  191. package/pennyfarthing-dist/scripts/jira/jira-sync-story.sh +0 -0
  192. package/pennyfarthing-dist/scripts/jira/sync-epic-jira.sh +0 -0
  193. package/pennyfarthing-dist/scripts/lib/background-tasks.sh +0 -0
  194. package/pennyfarthing-dist/scripts/lib/checkpoint.sh +0 -0
  195. package/pennyfarthing-dist/scripts/lib/common.sh +0 -0
  196. package/pennyfarthing-dist/scripts/lib/file-lock.sh +0 -0
  197. package/pennyfarthing-dist/scripts/lib/logging.sh +0 -0
  198. package/pennyfarthing-dist/scripts/lib/retry.sh +0 -0
  199. package/pennyfarthing-dist/scripts/maintenance/migrate-theme-schema.mjs +0 -0
  200. package/pennyfarthing-dist/scripts/maintenance/sidecar-health.sh +0 -0
  201. package/pennyfarthing-dist/scripts/misc/add-short-names.sh +0 -0
  202. package/pennyfarthing-dist/scripts/misc/add_short_names.py +0 -0
  203. package/pennyfarthing-dist/scripts/misc/backlog.sh +0 -0
  204. package/pennyfarthing-dist/scripts/misc/check-status.sh +0 -0
  205. package/pennyfarthing-dist/scripts/misc/find-related-work.sh +0 -0
  206. package/pennyfarthing-dist/scripts/misc/generate-skill-docs.sh +0 -0
  207. package/pennyfarthing-dist/scripts/misc/log-skill-usage.sh +0 -0
  208. package/pennyfarthing-dist/scripts/misc/migrate-bmad-workflow.sh +0 -0
  209. package/pennyfarthing-dist/scripts/misc/migrate_bmad_workflow.py +0 -0
  210. package/pennyfarthing-dist/scripts/misc/repo-scan.sh +0 -0
  211. package/pennyfarthing-dist/scripts/misc/repo-utils.sh +0 -0
  212. package/pennyfarthing-dist/scripts/misc/run-ci.sh +0 -0
  213. package/pennyfarthing-dist/scripts/misc/run-timestamp.sh +0 -0
  214. package/pennyfarthing-dist/scripts/misc/session-cleanup.sh +0 -0
  215. package/pennyfarthing-dist/scripts/misc/skill-usage-report.sh +0 -0
  216. package/pennyfarthing-dist/scripts/misc/statusline.sh +0 -0
  217. package/pennyfarthing-dist/scripts/misc/uninstall.sh +0 -0
  218. package/pennyfarthing-dist/scripts/misc/validate-subagent-frontmatter.sh +0 -0
  219. package/pennyfarthing-dist/scripts/portraits/generate-portraits.py +191 -57
  220. package/pennyfarthing-dist/scripts/portraits/generate-portraits.sh +26 -10
  221. package/pennyfarthing-dist/scripts/story/create-story.sh +0 -0
  222. package/pennyfarthing-dist/scripts/story/size-story.sh +0 -0
  223. package/pennyfarthing-dist/scripts/story/story-template.sh +0 -0
  224. package/pennyfarthing-dist/scripts/tests/check.test.sh +0 -0
  225. package/pennyfarthing-dist/scripts/tests/dev-story-workflow-import.test.sh +0 -0
  226. package/pennyfarthing-dist/scripts/tests/epics-and-stories-workflow-import.test.sh +0 -0
  227. package/pennyfarthing-dist/scripts/tests/handoff-phase-update.test.sh +0 -0
  228. package/pennyfarthing-dist/scripts/tests/implementation-readiness-workflow-import.test.sh +0 -0
  229. package/pennyfarthing-dist/scripts/tests/migrate-bmad-workflow.test.sh +0 -0
  230. package/pennyfarthing-dist/scripts/tests/prd-workflow-import.test.sh +0 -0
  231. package/pennyfarthing-dist/scripts/tests/project-context-workflow-import.test.sh +0 -0
  232. package/pennyfarthing-dist/scripts/tests/test-character-voice.sh +0 -0
  233. package/pennyfarthing-dist/scripts/tests/test-drift-detection.sh +0 -0
  234. package/pennyfarthing-dist/scripts/tests/test-post-merge-hook.sh +0 -0
  235. package/pennyfarthing-dist/scripts/tests/test-session-checkpoint.sh +0 -0
  236. package/pennyfarthing-dist/scripts/tests/test-solo-command.sh +0 -0
  237. package/pennyfarthing-dist/scripts/tests/ux-design-workflow-import.test.sh +0 -0
  238. package/pennyfarthing-dist/scripts/theme/list-themes.sh +0 -0
  239. package/pennyfarthing-dist/scripts/validation/validate-agent-schema.sh +0 -0
  240. package/pennyfarthing-dist/scripts/workflow/check.py +0 -0
  241. package/pennyfarthing-dist/scripts/workflow/check.sh +0 -0
  242. package/pennyfarthing-dist/scripts/workflow/complete-step.py +0 -0
  243. package/pennyfarthing-dist/scripts/workflow/finish-story.sh +0 -0
  244. package/pennyfarthing-dist/scripts/workflow/fix-session-phase.sh +0 -0
  245. package/pennyfarthing-dist/scripts/workflow/get-workflow-type.py +0 -0
  246. package/pennyfarthing-dist/scripts/workflow/get-workflow-type.sh +0 -0
  247. package/pennyfarthing-dist/scripts/workflow/list-workflows.sh +0 -0
  248. package/pennyfarthing-dist/scripts/workflow/phase-owner.sh +0 -0
  249. package/pennyfarthing-dist/scripts/workflow/resume-workflow.sh +0 -0
  250. package/pennyfarthing-dist/scripts/workflow/show-workflow.sh +0 -0
  251. package/pennyfarthing-dist/scripts/workflow/start-workflow.sh +0 -0
  252. package/pennyfarthing-dist/scripts/workflow/workflow-status.sh +0 -0
  253. package/pennyfarthing-dist/skills/pf-changelog/SKILL.md +4 -4
  254. package/pennyfarthing-dist/skills/pf-sprint/skill.md +1 -1
  255. package/pennyfarthing-dist/skills/pf-story/scripts/create-story.sh +0 -0
  256. package/pennyfarthing-dist/skills/pf-story/scripts/size-story.sh +0 -0
  257. package/pennyfarthing-dist/skills/pf-story/scripts/story-template.sh +0 -0
  258. package/pennyfarthing-dist/skills/pf-systematic-debugging/SKILL.md +0 -1
  259. package/pennyfarthing-dist/skills/pf-workflow/scripts/list-workflows.sh +0 -0
  260. package/pennyfarthing-dist/skills/pf-workflow/scripts/resume-workflow.sh +0 -0
  261. package/pennyfarthing-dist/skills/pf-workflow/scripts/show-workflow.sh +0 -0
  262. package/pennyfarthing-dist/skills/pf-workflow/scripts/start-workflow.sh +0 -0
  263. package/pennyfarthing-dist/skills/pf-workflow/scripts/workflow-status.sh +0 -0
  264. package/pennyfarthing-dist/skills/skill-registry.schema.json +4 -0
  265. package/pennyfarthing-dist/skills/skill-registry.yaml +8 -21
  266. package/pennyfarthing-dist/workflows/2party-tdd.yaml +11 -0
  267. package/pennyfarthing-dist/workflows/agent-docs.yaml +2 -0
  268. package/pennyfarthing-dist/workflows/bdd-tandem.yaml +4 -0
  269. package/pennyfarthing-dist/workflows/bdd.yaml +4 -0
  270. package/pennyfarthing-dist/workflows/git-cleanup/steps/step-05-complete.md +1 -1
  271. package/pennyfarthing-dist/workflows/tdd-tandem.yaml +3 -0
  272. package/pennyfarthing-dist/workflows/tdd.yaml +3 -0
  273. package/pennyfarthing-dist/workflows/trivial.yaml +2 -0
  274. package/pennyfarthing_scripts/__pycache__/cli.cpython-314.pyc +0 -0
  275. package/pennyfarthing_scripts/__pycache__/context.cpython-314.pyc +0 -0
  276. package/pennyfarthing_scripts/__pycache__/session_start_hook.cpython-314.pyc +0 -0
  277. package/pennyfarthing_scripts/bc/__pycache__/__init__.cpython-314.pyc +0 -0
  278. package/pennyfarthing_scripts/bc/__pycache__/cli.cpython-314.pyc +0 -0
  279. package/pennyfarthing_scripts/bc/__pycache__/focus.cpython-314.pyc +0 -0
  280. package/pennyfarthing_scripts/bikerack/__pycache__/__init__.cpython-314.pyc +0 -0
  281. package/pennyfarthing_scripts/bikerack/__pycache__/background_panel.cpython-314.pyc +0 -0
  282. package/pennyfarthing_scripts/bikerack/__pycache__/base_panel.cpython-314.pyc +0 -0
  283. package/pennyfarthing_scripts/bikerack/__pycache__/changed_panel.cpython-314.pyc +0 -0
  284. package/pennyfarthing_scripts/bikerack/__pycache__/cli.cpython-314.pyc +0 -0
  285. package/pennyfarthing_scripts/bikerack/__pycache__/debug_panel.cpython-314.pyc +0 -0
  286. package/pennyfarthing_scripts/bikerack/__pycache__/diffs_panel.cpython-314.pyc +0 -0
  287. package/pennyfarthing_scripts/bikerack/__pycache__/git_panel.cpython-314.pyc +0 -0
  288. package/pennyfarthing_scripts/bikerack/__pycache__/launcher.cpython-314.pyc +0 -0
  289. package/pennyfarthing_scripts/bikerack/__pycache__/sprint_panel.cpython-314.pyc +0 -0
  290. package/pennyfarthing_scripts/bikerack/__pycache__/tui.cpython-314.pyc +0 -0
  291. package/pennyfarthing_scripts/bikerack/__pycache__/ws_client.cpython-314.pyc +0 -0
  292. package/pennyfarthing_scripts/bikerack/changed_panel.py +105 -0
  293. package/pennyfarthing_scripts/bikerack/debug_panel.py +218 -0
  294. package/pennyfarthing_scripts/bikerack/diffs_panel.py +203 -27
  295. package/pennyfarthing_scripts/cli.py +114 -0
  296. package/pennyfarthing_scripts/epic/__init__.py +0 -0
  297. package/pennyfarthing_scripts/epic/cli.py +64 -0
  298. package/pennyfarthing_scripts/gate/__init__.py +1 -0
  299. package/pennyfarthing_scripts/gate/__pycache__/__init__.cpython-314.pyc +0 -0
  300. package/pennyfarthing_scripts/gate/__pycache__/cli.cpython-314.pyc +0 -0
  301. package/pennyfarthing_scripts/gate/__pycache__/validate.cpython-314.pyc +0 -0
  302. package/pennyfarthing_scripts/gate/cli.py +56 -0
  303. package/pennyfarthing_scripts/gate/validate.py +266 -0
  304. package/pennyfarthing_scripts/git_group/__init__.py +0 -0
  305. package/pennyfarthing_scripts/git_group/cli.py +100 -0
  306. package/pennyfarthing_scripts/handoff/__init__.py +1 -0
  307. package/pennyfarthing_scripts/handoff/__pycache__/__init__.cpython-314.pyc +0 -0
  308. package/pennyfarthing_scripts/handoff/__pycache__/cli.cpython-314.pyc +0 -0
  309. package/pennyfarthing_scripts/handoff/__pycache__/complete_phase.cpython-314.pyc +0 -0
  310. package/pennyfarthing_scripts/handoff/__pycache__/gate_file.cpython-314.pyc +0 -0
  311. package/pennyfarthing_scripts/handoff/__pycache__/gate_runner.cpython-314.pyc +0 -0
  312. package/pennyfarthing_scripts/handoff/__pycache__/marker.cpython-314.pyc +0 -0
  313. package/pennyfarthing_scripts/handoff/__pycache__/resolve_gate.cpython-314.pyc +0 -0
  314. package/pennyfarthing_scripts/handoff/cli.py +120 -0
  315. package/pennyfarthing_scripts/handoff/complete_phase.py +155 -0
  316. package/pennyfarthing_scripts/handoff/gate_file.py +105 -0
  317. package/pennyfarthing_scripts/handoff/gate_runner.py +152 -0
  318. package/pennyfarthing_scripts/handoff/marker.py +109 -0
  319. package/pennyfarthing_scripts/handoff/resolve_gate.py +152 -0
  320. package/pennyfarthing_scripts/healthscore/__pycache__/__main__.cpython-314.pyc +0 -0
  321. package/pennyfarthing_scripts/healthscore/__pycache__/analyze.cpython-314.pyc +0 -0
  322. package/pennyfarthing_scripts/hooks/cyclist-pretooluse-hook.sh +0 -0
  323. package/pennyfarthing_scripts/jira/__pycache__/cli.cpython-314.pyc +0 -0
  324. package/pennyfarthing_scripts/launch/__pycache__/__init__.cpython-314.pyc +0 -0
  325. package/pennyfarthing_scripts/launch/__pycache__/cli.cpython-314.pyc +0 -0
  326. package/pennyfarthing_scripts/prime/__pycache__/persona.cpython-314.pyc +0 -0
  327. package/pennyfarthing_scripts/prime/__pycache__/version_sentinel.cpython-314.pyc +0 -0
  328. package/pennyfarthing_scripts/prime/__pycache__/workflow.cpython-314.pyc +0 -0
  329. package/pennyfarthing_scripts/prime/workflow.py +39 -0
  330. package/pennyfarthing_scripts/session/__init__.py +0 -0
  331. package/pennyfarthing_scripts/session/cli.py +87 -0
  332. package/pennyfarthing_scripts/session_start_hook.py +4 -4
  333. package/pennyfarthing_scripts/sprint/__pycache__/archive_epic.cpython-314.pyc +0 -0
  334. package/pennyfarthing_scripts/sprint/__pycache__/cli.cpython-314.pyc +0 -0
  335. package/pennyfarthing_scripts/sprint/__pycache__/epic_add.cpython-314.pyc +0 -0
  336. package/pennyfarthing_scripts/sprint/__pycache__/epic_update.cpython-314.pyc +0 -0
  337. package/pennyfarthing_scripts/sprint/__pycache__/loader.cpython-314.pyc +0 -0
  338. package/pennyfarthing_scripts/sprint/__pycache__/story_add.cpython-314.pyc +0 -0
  339. package/pennyfarthing_scripts/sprint/__pycache__/story_finish.cpython-314.pyc +0 -0
  340. package/pennyfarthing_scripts/sprint/__pycache__/story_update.cpython-314.pyc +0 -0
  341. package/pennyfarthing_scripts/sprint/__pycache__/validate_cmd.cpython-314.pyc +0 -0
  342. package/pennyfarthing_scripts/sprint/__pycache__/yaml_io.cpython-314.pyc +0 -0
  343. package/pennyfarthing_scripts/sprint/archive_epic.py +8 -0
  344. package/pennyfarthing_scripts/tests/__pycache__/test_108_2_remove_handoff_fallback.cpython-314-pytest-9.0.2.pyc +0 -0
  345. package/pennyfarthing_scripts/tests/__pycache__/test_archive_epic.cpython-314-pytest-9.0.2.pyc +0 -0
  346. package/pennyfarthing_scripts/tests/__pycache__/test_bc.cpython-314-pytest-9.0.2.pyc +0 -0
  347. package/pennyfarthing_scripts/tests/__pycache__/test_bikerack.cpython-314-pytest-9.0.2.pyc +0 -0
  348. package/pennyfarthing_scripts/tests/__pycache__/test_cli_normalization.cpython-314-pytest-9.0.2.pyc +0 -0
  349. package/pennyfarthing_scripts/tests/__pycache__/test_gate_file_resolution.cpython-314-pytest-9.0.2.pyc +0 -0
  350. package/pennyfarthing_scripts/tests/__pycache__/test_gate_runner.cpython-314-pytest-9.0.2.pyc +0 -0
  351. package/pennyfarthing_scripts/tests/__pycache__/test_handoff_cli.cpython-314-pytest-9.0.2.pyc +0 -0
  352. package/pennyfarthing_scripts/tests/__pycache__/test_handoff_e2e.cpython-314-pytest-9.0.2.pyc +0 -0
  353. package/pennyfarthing_scripts/tests/__pycache__/test_resolve_gate_file_field.cpython-314-pytest-9.0.2.pyc +0 -0
  354. package/pennyfarthing_scripts/tests/__pycache__/test_sprint_panel.cpython-314-pytest-9.0.2.pyc +0 -0
  355. package/pennyfarthing_scripts/tests/__pycache__/test_topology_loader.cpython-314-pytest-9.0.2.pyc +0 -0
  356. package/pennyfarthing_scripts/tests/__pycache__/test_tui_focus.cpython-314-pytest-9.0.2.pyc +0 -0
  357. package/pennyfarthing_scripts/tests/__pycache__/test_tui_panel_persistence.cpython-314-pytest-9.0.2.pyc +0 -0
  358. package/pennyfarthing_scripts/tests/__pycache__/test_version_sentinel.cpython-314-pytest-9.0.2.pyc +0 -0
  359. package/pennyfarthing_scripts/tests/__pycache__/test_yaml_io.cpython-314-pytest-9.0.2.pyc +0 -0
  360. package/pennyfarthing_scripts/tests/test_108_1_gate_migration.py +540 -0
  361. package/pennyfarthing_scripts/tests/test_108_2_remove_handoff_fallback.py +339 -0
  362. package/pennyfarthing_scripts/tests/test_archive_epic.py +1 -2
  363. package/pennyfarthing_scripts/tests/test_confidence_sm_evaluation.py +253 -0
  364. package/pennyfarthing_scripts/tests/test_confidence_sm_gate.py +315 -0
  365. package/pennyfarthing_scripts/tests/test_gate_file_resolution.py +341 -0
  366. package/pennyfarthing_scripts/tests/test_gate_runner.py +620 -0
  367. package/pennyfarthing_scripts/tests/test_handoff_cli.py +929 -0
  368. package/pennyfarthing_scripts/tests/test_handoff_e2e.py +454 -0
  369. package/pennyfarthing_scripts/tests/test_resolve_gate_file_field.py +464 -0
  370. package/pennyfarthing_scripts/theme/__pycache__/cli.cpython-314.pyc +0 -0
  371. package/pennyfarthing_scripts/validate/adapters/__pycache__/workflow.cpython-314.pyc +0 -0
  372. package/pennyfarthing_scripts/validate/adapters/skill_command.py +200 -0
  373. package/pennyfarthing_scripts/validate/adapters/workflow.py +64 -0
  374. package/pennyfarthing_scripts/validate/cli.py +15 -4
  375. package/packages/core/dist/benchmark/package-exports.test.d.ts.map +0 -1
  376. package/packages/core/dist/benchmark/package-exports.test.js.map +0 -1
  377. package/packages/core/dist/scripts/benchmark-integration.d.ts +0 -182
  378. package/packages/core/dist/scripts/benchmark-integration.d.ts.map +0 -1
  379. package/packages/core/dist/scripts/benchmark-integration.js +0 -691
  380. package/packages/core/dist/scripts/benchmark-integration.js.map +0 -1
  381. package/packages/core/dist/scripts/benchmark-integration.test.d.ts +0 -13
  382. package/packages/core/dist/scripts/benchmark-integration.test.d.ts.map +0 -1
  383. package/packages/core/dist/scripts/benchmark-integration.test.js +0 -680
  384. package/packages/core/dist/scripts/benchmark-integration.test.js.map +0 -1
  385. package/packages/core/dist/scripts/debugging-scenarios.test.d.ts +0 -18
  386. package/packages/core/dist/scripts/debugging-scenarios.test.d.ts.map +0 -1
  387. package/packages/core/dist/scripts/debugging-scenarios.test.js +0 -317
  388. package/packages/core/dist/scripts/debugging-scenarios.test.js.map +0 -1
  389. package/packages/core/dist/scripts/job-fair-aggregator.d.ts +0 -150
  390. package/packages/core/dist/scripts/job-fair-aggregator.d.ts.map +0 -1
  391. package/packages/core/dist/scripts/job-fair-aggregator.js +0 -547
  392. package/packages/core/dist/scripts/job-fair-aggregator.js.map +0 -1
  393. package/packages/core/dist/scripts/job-fair-aggregator.test.d.ts +0 -14
  394. package/packages/core/dist/scripts/job-fair-aggregator.test.d.ts.map +0 -1
  395. package/packages/core/dist/scripts/job-fair-aggregator.test.js +0 -616
  396. package/packages/core/dist/scripts/job-fair-aggregator.test.js.map +0 -1
  397. package/pennyfarthing-dist/agents/handoff.md +0 -250
  398. package/pennyfarthing-dist/agents/sm-handoff.md +0 -152
  399. package/pennyfarthing-dist/scripts/core/handoff-marker.sh +0 -112
  400. package/pennyfarthing-dist/skills/pf-dev-patterns/SKILL.md +0 -461
  401. package/scripts/README.md +0 -41
@@ -0,0 +1,486 @@
+ ---
+ description: Compare an agent's performance against a stored baseline
+ argument-hint: <theme> <agent> [--as <role>] [--scenario <name>] [--runs N]
+ ---
+
+ # Benchmark
+
+ <purpose>
+ Compare a persona agent's performance against the established control baseline. Runs the agent on the scenario and calculates statistical measures including effect size (Cohen's d) and significance.
+
+ Default: 4 runs for comparison (balance between reliability and runtime). Runs execute in parallel for faster results.
+
+ **Simplified Usage:** Just specify theme and agent role - you'll be presented with matching scenarios to choose from.
+ </purpose>
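The purpose block names Cohen's d but the visible part of the diff does not show how it is computed. The following is a minimal sketch, assuming the common pooled-standard-deviation form of the statistic and plain lists of numeric scores; the function name `cohens_d` and the score lists are hypothetical and are not taken from the package.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(contestant: list[float], baseline: list[float]) -> float:
    """Effect size of contestant vs. baseline using the pooled sample SD.

    Assumption: this is the textbook pooled-variance formula, not
    necessarily the one the benchmark command itself applies.
    """
    n1, n2 = len(contestant), len(baseline)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pooled variance weights each group's sample variance by its
    # degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(contestant) + (n2 - 1) * variance(baseline)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(contestant) - mean(baseline)) / sqrt(pooled_var)
```

With the default of 4 runs per side, each group has only three degrees of freedom, so an estimate like this is noisy; that is one reason a higher `--runs` value may be worth the extra runtime.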
+
+ <critical-integrity-requirements>
+ ## DO NOT FABRICATE COMPARISON DATA
+
+ Comparisons are only meaningful if BOTH the baseline AND the contestant runs are real.
+
+ **Before comparing:**
+ 1. Validate baseline has proof-of-work (check runs have `proof.*` fields)
+ 2. Actually run `/solo` for the contestant with real Task tool calls
+ 3. Validate contestant runs have proof-of-work before calculating statistics
+
+ **Baseline Validation:**
+ Before using a baseline, spot-check at least one run file:
+ - Read a run from `internal/results/baselines/{scenario}/{agent}/runs/*.json`
+ - Verify it has `proof.agent_task_id`, `proof.agent_response_text`, `proof.judge_task_id`
+ - Verify `proof.agent_response_text` is at least 200 characters
+ - Verify `token_usage.input_tokens` > 0
+
+ **If baseline validation fails:**
+ ```markdown
+ Error: Baseline for '{scenario}' appears to be fabricated (missing proof-of-work).
+ Run `/benchmark-control --scenario {scenario}` to create a real baseline.
+ ```
+
+ **Contestant runs MUST include proof-of-work.** See `/solo` for requirements.
+ </critical-integrity-requirements>
41
+
42
+ <usage>
43
+ ```
44
+ # Simple: Pick scenario interactively
45
+ /benchmark the-expanse sm
46
+ /benchmark discworld reviewer
47
+
48
+ # Direct: Specify scenario explicitly
49
+ /benchmark discworld reviewer --scenario order-service
50
+ /benchmark ted-lasso dev --scenario tdd-shopping-cart --runs 8
51
+
52
+ # Cross-role: Run any character as any role
53
+ /benchmark shakespeare prospero --as dev --scenario django-10554
54
+ /benchmark discworld granny --as dev --scenario tdd-shopping-cart
55
+ ```
56
+
57
+ **Arguments:**
58
+ - `theme` - The persona theme (e.g., `discworld`, `the-expanse`, `ted-lasso`)
59
+ - `agent` - The agent role OR character name (if using `--as`)
60
+ - `--as <role>` - (Optional) Override role for cross-role testing. Makes `agent` a character name lookup.
61
+ - `--scenario` - (Optional) Scenario name. If omitted, shows matching scenarios to choose from.
62
+ - `--runs N` - Number of evaluation runs (default: 4, max: 20)
63
+
64
+ **Cross-Role Testing:**
65
+ The `--as` flag enables running any character as any role:
66
+ ```
67
+ /benchmark shakespeare prospero --as dev --scenario django-10554
68
+ ```
69
+ This uses Prospero's persona traits (wise orchestrator) but gives him a dev task.
70
+ The scenario's role determines what the agent is asked to do; the character determines HOW they do it.
71
+
72
+ **Examples:**
73
+ ```
74
+ # Let me pick from SM scenarios
75
+ /benchmark the-expanse sm
76
+
77
+ # Let me pick from code review scenarios
78
+ /benchmark discworld reviewer
79
+
80
+ # Run specific scenario directly
81
+ /benchmark princess-bride reviewer --scenario order-service --runs 8
82
+
83
+ # Cross-role: Prospero (SM) doing dev work
84
+ /benchmark shakespeare prospero --as dev --scenario tdd-shopping-cart --runs 4
85
+ ```
86
+ </usage>
87
+
88
+ <on-invoke>
89
+ The user invoked this command with: $ARGUMENTS
90
+
91
+ ## Step 1: Parse Arguments
92
+
93
+ Parse the arguments to extract:
94
+ - `theme`: First positional argument (e.g., `discworld`, `the-expanse`)
95
+ - `agent_or_character`: Second positional argument (role name OR character name if `--as` is used)
96
+ - `role_override`: Value after `--as` (OPTIONAL - enables cross-role mode)
97
+ - `scenario_name`: Value after `--scenario` (OPTIONAL)
98
+ - `runs`: Value after `--runs` (default: 4, max: 20)
99
+
100
+ **Cross-Role Mode:**
101
+ If `--as <role>` is provided:
102
+ - `agent_or_character` is treated as a CHARACTER NAME (case-insensitive search)
103
+ - `role_override` becomes the `effective_role` for scenario matching
104
+ - Results save to `internal/results/benchmarks/{scenario}/{theme}-{character}-as-{role}/`
105
+
106
+ **Legacy format support:** If first argument contains `:`, split it (e.g., `discworld:reviewer` → theme=discworld, agent_or_character=reviewer)
107
+
108
+ **Validation:**
109
+ - Theme must be a valid theme name
110
+ - If `--as` is provided: validate `role_override` is one of: `sm`, `dev`, `reviewer`, `architect`, `tea`, `pm`
111
+ - If `--as` is NOT provided: validate `agent_or_character` is one of: `sm`, `dev`, `reviewer`, `architect`, `tea`, `pm`
112
+ - `--runs` must be a positive integer between 1 and 20
113
+
114
+ **Determine effective_role:**
115
+ ```python
116
+ if role_override:
117
+     effective_role = role_override  # e.g., "dev"
118
+     cross_role = True
119
+ else:
120
+     effective_role = agent_or_character  # e.g., "dev"
121
+     cross_role = False
122
+ ```
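The parsing and validation rules in Step 1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the command's actual implementation: `parse_benchmark_args`, `BenchmarkArgs`, `VALID_ROLES`, and `VALUE_FLAGS` are hypothetical names, theme validation is omitted because the list of valid themes is not given here, and the `control` theme's default of 10 runs (Step 3) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Roles listed in the Step 1 validation rules.
VALID_ROLES = {"sm", "dev", "reviewer", "architect", "tea", "pm"}
# Flags that consume the following token as their value.
VALUE_FLAGS = ("--as", "--scenario", "--runs")


@dataclass
class BenchmarkArgs:
    theme: str
    agent_or_character: str
    effective_role: str
    cross_role: bool
    scenario_name: Optional[str]
    runs: int


def parse_benchmark_args(argv: list[str]) -> BenchmarkArgs:
    """Hypothetical parser for '<theme> <agent> [--as R] [--scenario S] [--runs N]'."""
    positional: list[str] = []
    options: dict[str, str] = {}
    tokens = iter(argv)
    for token in tokens:
        if token in VALUE_FLAGS:
            value = next(tokens, None)
            if value is None:
                raise ValueError(f"{token} requires a value")
            options[token] = value
        else:
            positional.append(token)

    # Legacy format: 'discworld:reviewer' given as the first argument.
    if positional and ":" in positional[0]:
        positional[:1] = positional[0].split(":", 1)
    if len(positional) != 2:
        raise ValueError("expected <theme> <agent>")
    theme, agent_or_character = positional

    # With --as, the second positional is a character name and the override is the role.
    role_override = options.get("--as")
    effective_role = role_override or agent_or_character
    if effective_role not in VALID_ROLES:
        raise ValueError(f"invalid role: {effective_role}")

    raw_runs = options.get("--runs", "4")
    try:
        runs = int(raw_runs)
    except ValueError:
        runs = 0  # forces the range check below to fail
    if not 1 <= runs <= 20:
        raise ValueError(f"--runs must be between 1 and 20. Got: {raw_runs}")

    return BenchmarkArgs(
        theme=theme,
        agent_or_character=agent_or_character,
        effective_role=effective_role,
        cross_role=role_override is not None,
        scenario_name=options.get("--scenario"),
        runs=runs,
    )
```

For example, `parse_benchmark_args(["shakespeare", "prospero", "--as", "dev"])` yields `effective_role == "dev"` with `cross_role` set, while the legacy `["discworld:reviewer"]` form is split into theme and role before validation.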
123
+
124
+ ## Step 2: Scenario Discovery (if --scenario not provided)
125
+
126
+ If `scenario_name` is NOT provided, discover matching scenarios.
127
+
128
+ **Use `effective_role` (not `agent_or_character`) for scenario discovery.**
129
+ Cross-role mode: Prospero --as dev should see dev scenarios, not SM scenarios.
130
+
131
+ **Role-to-Category Mapping:**
132
+ | effective_role | Scenario Categories |
133
+ |----------------|---------------------|
134
+ | sm | `sm` |
135
+ | dev | `dev` (includes debug scenarios) |
136
+ | reviewer | `code-review` |
137
+ | architect | `architecture` |
138
+ | tea | `tea` |
139
+
140
+ **Time Estimates by Difficulty (parallel execution):**
141
+ | Difficulty | Est. Time (4 runs) | Note |
142
+ |------------|-------------------|------|
143
+ | easy | ~1 min | Runs execute in parallel |
144
+ | medium | ~2 min | Runs execute in parallel |
145
+ | hard | ~4 min | Runs execute in parallel |
146
+ | extreme | ~8 min | Runs execute in parallel |
147
+
148
+ **Discover scenarios:**
149
+ ```bash
150
+ # Use Bash to list matching scenarios
151
+ ls scenarios/{category}/*.yaml | xargs -I {} yq -r '"{}|\(.name)|\(.difficulty)|\(.title)|\(.description)"' {}
152
+ ```
153
+
154
+ **Present choices (Reflector-aware):**
155
+
156
+ First, output the marker: `<!-- CYCLIST:CHOICES:scenario -->`
157
+
158
+ Then use AskUserQuestion:
159
+ ```yaml
160
+ AskUserQuestion:
161
+   questions:
162
+     - question: "Which scenario do you want to benchmark {theme}:{agent_type} on?"
163
+       header: "Scenario"
164
+       multiSelect: false
165
+       options:
166
+         - label: "{name} ({difficulty})"
167
+           description: "{title} - ~{time_estimate}"
168
+         # ... up to 4 options
169
+ ```
170
+
171
+ If more than 4 scenarios exist, show the first 4 ordered by difficulty (hardest first) and let the user choose "Other" to see the full list.
172
+
173
+ **After user selects:** Set `scenario_name` to the selected scenario's name and continue.
174
+
175
+ ## Step 3: Control Theme Handling
176
+
177
+ **If theme is `control`:** This is a baseline creation run.
178
+ - Default `runs` to 10 (instead of 4) for statistical reliability
179
+ - Results save to `internal/results/baselines/{scenario}/{agent}/`; no comparison against a baseline is performed
180
+ - Skip baseline validation (we're creating the baseline)
181
+ - After running, calculate and save baseline statistics
182
+ - Display baseline summary and exit
183
+
184
+ **If theme is NOT `control`:** Continue to Step 4 for comparison workflow.
185
+
186
+ ## Step 4: Load and Validate Baseline
187
+
188
+ **Baseline is based on `effective_role`, not the character's native role.**
189
+ Cross-role tests compare against the effective role's baseline (e.g., prospero --as dev compares against control:dev).
190
+
191
+ Check if baseline exists:
192
+
193
+ ```yaml
194
+ Read tool:
195
+   file_path: "internal/results/baselines/{scenario_name}/{effective_role}/summary.yaml"
196
+ ```
197
+
198
+ **If baseline does not exist:**
199
+ ```markdown
200
+ Error: No baseline found for scenario '{scenario_name}' with agent type '{agent_type}'.
201
+
202
+ To create a baseline, run:
203
+ /benchmark control {agent_type} --scenario {scenario_name}
204
+
205
+ Or use the shortcut:
206
+ /benchmark-control {agent_type} --scenario {scenario_name}
207
+ ```
208
+
209
+ **If baseline exists, VALIDATE IT:**
210
+
211
+ 1. Get list of run files:
212
+ ```yaml
213
+ Glob tool:
214
+   pattern: "internal/results/baselines/{scenario_name}/{effective_role}/runs/*.json"
215
+ ```
216
+
217
+ 2. Read at least one run file and validate proof-of-work:
218
+ ```yaml
219
+ Read tool:
220
+   file_path: "{first run file}"
221
+ ```
222
+
223
+ 3. **Check for proof-of-work fields:**
224
+ - Has `proof.agent_task_id`?
225
+ - Has `proof.agent_response_text` with length >= 200?
226
+ - Has `proof.judge_task_id`?
227
+ - Has `proof.judge_response_text`?
228
+ - Has `token_usage.input_tokens` > 0?
229
+ - Has `token_usage.output_tokens` > 0?
230
+
231
+ 4. **If validation fails:**
232
+ ```markdown
233
+ Error: Baseline for '{scenario_name}' is INVALID - missing proof-of-work.
234
+
235
+ The baseline data appears to be fabricated (no agent/judge response text,
236
+ no task IDs, or no token counts).
237
+
238
+ Delete the invalid baseline and create a real one:
239
+ rm -rf internal/results/baselines/{scenario_name}/{effective_role}
240
+ /benchmark-control --scenario {scenario_name}
241
+ ```
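The proof-of-work checks in step 3 above could be automated with a small helper along these lines. This is a sketch only: `proof_of_work_problems` and `validate_run_file` are hypothetical names, and the layout of the run file (a JSON object with nested `proof` and `token_usage` objects) is an assumption inferred from the dotted field names.

```python
import json
from pathlib import Path


def proof_of_work_problems(run: dict) -> list[str]:
    """Return the failed checks for one run; an empty list means the run passes."""
    # Assumed layout: {"proof": {...}, "token_usage": {...}} at the top level.
    proof = run.get("proof") or {}
    usage = run.get("token_usage") or {}
    problems: list[str] = []

    if not proof.get("agent_task_id"):
        problems.append("missing proof.agent_task_id")
    if len(proof.get("agent_response_text") or "") < 200:
        problems.append("proof.agent_response_text shorter than 200 characters")
    if not proof.get("judge_task_id"):
        problems.append("missing proof.judge_task_id")
    if not proof.get("judge_response_text"):
        problems.append("missing proof.judge_response_text")

    for key in ("input_tokens", "output_tokens"):
        value = usage.get(key)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"token_usage.{key} must be > 0")
    return problems


def validate_run_file(path: Path) -> list[str]:
    """Load one run file and apply the checks above."""
    return proof_of_work_problems(json.loads(path.read_text(encoding="utf-8")))
```

Returning the list of failed checks, rather than a bare boolean, makes it easier to say which field is missing when reporting an invalid baseline.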
242
+
243
+ **If baseline is valid:**
244
+ - Extract `sample_size`, `statistics.total.mean`, `statistics.total.std_dev`
245
+ - Display baseline info with validation confirmation
246
+
247
+ **Sample size warning:**
248
+ If baseline sample size < 5:
249
+ ```markdown
250
+ **Warning:** Baseline sample size ({n}) is less than 5. Results may not be statistically reliable.
251
+ Consider running `/benchmark-control --scenario {scenario_name} --runs 10` to add more data.
252
+ ```
253
+
254
+ ## Step 5: Run Contestant Evaluation (Parallel)
255
+
256
+ For efficiency, spawn multiple runs in parallel using Task agents.
257
+
258
+ **Batch Strategy:**
259
+ - If runs ≤ 4: Spawn all in parallel (single message with N Task agents)
260
+ - If runs > 4: Spawn in batches of 4 to avoid overwhelming the system
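The batch strategy above amounts to chunking the run numbers into groups of at most four. A minimal sketch, where `plan_batches` is a hypothetical helper name and each inner list stands for the Task agents spawned together in one message:

```python
def plan_batches(runs: int, batch_size: int = 4) -> list[list[int]]:
    """Split run numbers 1..runs into consecutive batches of at most batch_size."""
    run_numbers = list(range(1, runs + 1))
    return [run_numbers[i:i + batch_size] for i in range(0, runs, batch_size)]
```

With the default of 4 runs this produces a single batch; 10 runs produce batches of 4, 4, and 2.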
261
+
262
+ **Build the /solo command:**
263
+ ```python
264
+ if cross_role:
265
+     # Cross-role: agent_or_character is a character name
266
+     solo_cmd = f"/solo {theme}:{agent_or_character} --as {effective_role} --scenario {scenario_name}"
267
+ else:
268
+     # Standard: agent_or_character is the role name
269
+     solo_cmd = f"/solo {theme}:{agent_or_character} --scenario {scenario_name}"
270
+ ```
271
+
272
+ **For each run, spawn a Task agent:**
273
+ ```
274
+ Task (run 1 of N):
275
+   subagent_type: general-purpose
276
+   prompt: |
277
+     Run {solo_cmd}
278
+     This is run 1 of N for baseline/benchmark.
279
+     Return the full result JSON including score and token_usage.
280
+ ```
281
+
282
+ **Example commands:**
283
+ - Standard: `/solo discworld:dev --scenario tdd-shopping-cart`
284
+ - Cross-role: `/solo shakespeare:prospero --as dev --scenario tdd-shopping-cart`
285
+
286
+ **Spawn all batch tasks in a SINGLE message for parallel execution.**
287
+
288
+ Wait for all tasks to complete. Collect results:
289
+ - Per-run scores (total, plus dimension breakdown if available)
290
+ - Per-run token usage (input_tokens, output_tokens)
291
+ - Per-run timestamps
292
+ - Cross-role metadata (source_role, effective_role, cross_role flag)
293
+
294
+ **If a run fails:** Note the failure and continue with the successful runs. Warn if fewer than 3 runs succeed.
295
+
296
+ ## Step 6: Calculate Comparison Statistics
297
+
298
+ **Contestant Statistics:**
299
+ - `contestant_mean`: Average total score
300
+ - `contestant_std_dev`: Standard deviation
301
+ - `contestant_n`: Number of runs
302
+
303
+ **Baseline Statistics (from summary.yaml):**
304
+ - `baseline_mean`: statistics.total.mean
305
+ - `baseline_std_dev`: statistics.total.std_dev
306
+ - `baseline_n`: sample_size
307
+
308
+ **Mean Difference:**
309
+ ```
310
+ difference = contestant_mean - baseline_mean
311
+ ```
312
+
313
+ **Cohen's d Effect Size:**
314
+ ```
315
+ pooled_std_dev = sqrt((contestant_std_dev² + baseline_std_dev²) / 2)
316
+ cohens_d = difference / pooled_std_dev
317
+ ```
318
+
319
+ **Effect Size Interpretation:**
320
+ | Cohen's d (absolute value) | Interpretation |
321
+ |-----------|----------------|
322
+ | < 0.2 | Negligible |
323
+ | 0.2 - 0.5 | Small |
324
+ | 0.5 - 0.8 | Medium |
325
+ | > 0.8 | Large |
326
+
327
+ **95% Confidence Interval for Difference:**
328
+ ```
329
+ se_diff = sqrt(contestant_std_dev²/contestant_n + baseline_std_dev²/baseline_n)
330
+ ci_lower = difference - 1.96 × se_diff
331
+ ci_upper = difference + 1.96 × se_diff
332
+ ```
333
+
334
+ **Statistical Significance:**
335
+ If the CI does not include 0, the difference is treated as statistically significant at p < 0.05 (the 1.96 multiplier is the normal-approximation critical value).
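The Step 6 formulas can be put together in one small function. This is a sketch that follows the formulas exactly as written above (the simple-average pooled standard deviation and the 1.96 normal multiplier); the names `Comparison`, `interpret`, and `compare` are hypothetical. Two choices are assumptions not stated in the document: a value that lands exactly on a table boundary (0.2, 0.5, 0.8) is assigned to the larger band, and a zero pooled standard deviation yields an infinite effect size with the sign of the difference (or 0 when the means are equal).

```python
import math
from dataclasses import dataclass


@dataclass
class Comparison:
    difference: float
    cohens_d: float
    interpretation: str
    ci_lower: float
    ci_upper: float
    significant: bool


def interpret(cohens_d: float) -> str:
    """Map |d| onto the effect-size table (boundary values go to the larger band)."""
    size = abs(cohens_d)
    if size < 0.2:
        return "Negligible"
    if size < 0.5:
        return "Small"
    if size < 0.8:
        return "Medium"
    return "Large"


def compare(c_mean: float, c_std: float, c_n: int,
            b_mean: float, b_std: float, b_n: int) -> Comparison:
    difference = c_mean - b_mean

    # Pooled standard deviation as the root of the averaged variances.
    pooled_std_dev = math.sqrt((c_std ** 2 + b_std ** 2) / 2)
    if pooled_std_dev > 0:
        cohens_d = difference / pooled_std_dev
    elif difference == 0:
        cohens_d = 0.0
    else:
        cohens_d = math.copysign(math.inf, difference)  # assumption, see lead-in

    # 95% confidence interval for the difference of means.
    se_diff = math.sqrt(c_std ** 2 / c_n + b_std ** 2 / b_n)
    ci_lower = difference - 1.96 * se_diff
    ci_upper = difference + 1.96 * se_diff

    # Significant when the interval excludes zero.
    significant = ci_lower > 0 or ci_upper < 0
    return Comparison(difference, cohens_d, interpret(cohens_d),
                      ci_lower, ci_upper, significant)
```

As a worked case, a contestant at 80 ± 5 over 4 runs against a baseline at 70 ± 5 over 10 runs gives a difference of 10, a pooled standard deviation of 5, and therefore d = 2.0; the interval is roughly [4.2, 15.8], which excludes zero.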
336
+
337
+ ## Step 7: Display Comparison Results
338
+
339
+ ```markdown
340
+ ---
341
+
342
+ ## Baseline Comparison
343
+
344
+ **Contestant:** {theme}:{agent_type} ({character_name})
345
+ **Scenario:** {scenario_name}
346
+ **Baseline:** control:{agent_type} (n={baseline_n})
347
+
348
+ ### Performance vs Baseline
349
+
350
+ | Metric | Contestant | Baseline | Difference | Effect Size |
351
+ |--------|------------|----------|------------|-------------|
352
+ | Total Score | {c_mean} ± {c_std} | {b_mean} ± {b_std} | {diff:+.1f} | **{cohens_d:.1f}σ** ({interpretation}) |
353
+ | Detection | {c_det} | {b_det} | {diff:+.1f} | {effect} |
354
+ | Depth | {c_dep} | {b_dep} | {diff:+.1f} | {effect} |
355
+ | Quality | {c_qual} | {b_qual} | {diff:+.1f} | {effect} |
356
+ | Persona | {c_per} | {b_per} | {diff:+.1f} | {effect} |
357
+
358
+ ### Efficiency
359
+
360
+ | Metric | Contestant | Baseline |
361
+ |--------|------------|----------|
362
+ | Tokens/Point | {c_tokens_per_point} | {b_tokens_per_point} |
363
+ | Efficiency | {efficiency_pct}% of baseline | 100% |
364
+
365
+ ### Statistical Significance
366
+
367
+ - **Effect Size (Cohen's d):** {cohens_d:.2f} ({interpretation})
368
+ - **95% CI for difference:** [{ci_lower:+.1f}, {ci_upper:+.1f}]
369
+ - **Significant:** {Yes/No} (p < 0.05)
370
+
371
+ ### Verdict
372
+
373
+ {verdict based on effect size and significance}
374
+
375
+ ---
376
+ ```
377
+
378
+ **Verdict Logic:**
379
+ - If not significant: "No statistically significant difference from baseline."
380
+ - If significant and positive large effect: "Contestant **significantly outperforms** baseline with large effect size."
381
+ - If significant and positive medium effect: "Contestant **outperforms** baseline with medium effect size."
382
+ - If significant and positive small effect: "Contestant **slightly outperforms** baseline."
383
+ - If significant and negative: "Contestant **underperforms** baseline."
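The verdict rules above reduce to a short decision function. In this sketch `verdict` is a hypothetical name; the list does not say what to report for a significant positive effect below 0.2, so that case falls through to the "slightly outperforms" message as an assumption, and a value of exactly 0.8 is treated as large.

```python
def verdict(significant: bool, cohens_d: float) -> str:
    """Pick the verdict sentence from significance and the signed effect size."""
    if not significant:
        return "No statistically significant difference from baseline."
    if cohens_d < 0:
        return "Contestant **underperforms** baseline."
    if cohens_d >= 0.8:
        return "Contestant **significantly outperforms** baseline with large effect size."
    if cohens_d >= 0.5:
        return "Contestant **outperforms** baseline with medium effect size."
    # Small effects, plus the unspecified negligible-but-significant case.
    return "Contestant **slightly outperforms** baseline."
```

Significance is checked first, so a large but non-significant effect still reports no difference.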
384
+
385
+ ## Step 8: Save Results (ALWAYS)
386
+
387
+ **Output path logic:**
388
+ ```python
389
+ if theme == "control":
390
+     base_path = f"internal/results/baselines/{scenario_name}/{effective_role}/"
391
+ elif cross_role:
392
+     # Cross-role: include character slug for clarity
393
+     character_slug = slugify(character_name)  # e.g., "prospero", "granny-weatherwax"
394
+     base_path = f"internal/results/benchmarks/{scenario_name}/{theme}-{character_slug}-as-{effective_role}/"
395
+ else:
396
+     base_path = f"internal/results/benchmarks/{scenario_name}/{theme}-{effective_role}/"
397
+ ```
398
+
399
+ **Cross-role examples:**
400
+ - `/benchmark shakespeare prospero --as dev` → `internal/results/benchmarks/{scenario}/shakespeare-prospero-as-dev/`
401
+ - `/benchmark discworld granny --as dev` → `internal/results/benchmarks/{scenario}/discworld-granny-weatherwax-as-dev/`
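The output path logic calls `slugify` without defining it. One plausible definition, shown with the path rules wrapped in a function, is sketched below. Both `slugify` as written here and `result_base_path` are assumptions for illustration, and `character_name` is taken to be the full name resolved by the character lookup (for example "Granny Weatherwax" for the argument `granny`).

```python
import re
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def result_base_path(theme: str, scenario_name: str, effective_role: str,
                     character_name: Optional[str] = None,
                     cross_role: bool = False) -> str:
    """Mirror the three output-path cases: control, cross-role, standard."""
    if theme == "control":
        return f"internal/results/baselines/{scenario_name}/{effective_role}/"
    if cross_role:
        if character_name is None:
            raise ValueError("cross-role results need a character name")
        slug = slugify(character_name)
        return (f"internal/results/benchmarks/{scenario_name}/"
                f"{theme}-{slug}-as-{effective_role}/")
    return f"internal/results/benchmarks/{scenario_name}/{theme}-{effective_role}/"
```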
402
+
403
+ **Save structure:**
404
+ ```
405
+ {base_path}/
406
+ ├── runs/
407
+ │   ├── run_1.json
408
+ │   ├── judge_1.json
409
+ │   └── ...
410
+ └── summary.yaml
411
+ ```
412
+
413
+ **summary.yaml format:** See `/solo` command Step 10. For cross-role runs, include:
414
+ ```yaml
415
+ agent:
416
+   theme: {theme}
417
+   character: {character_name}
418
+   source_role: {source_role}  # where character normally lives (e.g., sm)
419
+   effective_role: {effective_role}  # what they're doing (e.g., dev)
420
+   cross_role: true
421
+ ```
422
+
423
+ **REQUIRED: Capture Pennyfarthing version in metadata:**
424
+ ```bash
425
+ # Get version from package.json
426
+ version=$(node -p "require('./package.json').version")
427
+ ```
428
+
429
+ Include in summary.yaml:
430
+ ```yaml
431
+ metadata:
432
+   created_at: "{ISO timestamp}"
433
+   pennyfarthing_version: "{version}"  # REQUIRED for baseline staleness detection
434
+   model: sonnet
435
+ ```
436
+
437
+ **ALWAYS save summary.yaml, even for n=1.** This ensures consistent data structure for analysis.
438
+
439
+ Display:
440
+ ```
441
+ ✓ Saved {n} run(s) to {base_path}
442
+ ✓ Summary: {base_path}/summary.yaml
443
+ ```
444
+ </on-invoke>
445
+
446
+ <error-handling>
447
+ **Baseline not found:**
448
+ ```markdown
449
+ Error: No baseline found for scenario '{scenario_name}' with agent type '{agent_type}'.
450
+
451
+ To create a baseline, run:
452
+ /benchmark-control --scenario {scenario_name}
453
+ ```
454
+
455
+ **Invalid contestant spec:**
456
+ ```markdown
457
+ Error: Invalid contestant format. Expected 'theme:agent', got '{value}'.
458
+
459
+ Examples:
460
+ - discworld:reviewer
461
+ - princess-bride:dev
462
+ - control:sm
463
+ ```
464
+
465
+ **Missing --scenario:**
466
+ ```markdown
467
+ Error: --scenario is required.
468
+
469
+ Usage: /benchmark <theme:agent> --scenario <name> [--runs N]
470
+ ```
471
+
472
+ **Invalid runs value:**
473
+ ```markdown
474
+ Error: --runs must be between 1 and 20. Got: {value}
475
+ ```
476
+ </error-handling>
477
+
478
+ <reference>
479
+ - Solo Command: `.claude/project/commands/solo.md`
480
+ - Establish Baseline: `.claude/project/commands/benchmark-control.md`
481
+ - Effect Size: Cohen's d standard interpretation (0.2 small, 0.5 medium, 0.8 large)
482
+ - Baselines: `internal/results/baselines/{scenario}/{role}/` (control theme)
483
+ - Benchmarks: `internal/results/benchmarks/{scenario}/{theme}-{role}/` (all other themes)
484
+ - Results README: `internal/results/README.md`
485
+ </reference>
486
+ </output>
@@ -4,7 +4,7 @@ description: Quick commit for small changes without full git-cleanup ceremony
4
4
 
5
5
  # Quick Chore Commit
6
6
 
7
- Quickly commit dirty changes without the full `/git-cleanup` ceremony. Checks **all repos** (orchestrator + subrepos), creates branches, commits, merges to develop, and pushes.
7
+ Quickly commit dirty changes without the full `/pf-git cleanup` ceremony. Checks **all repos** (orchestrator + subrepos), creates branches, commits, merges to develop, and pushes.
8
8
 
9
9
  <purpose>
10
10
  Fast path for committing small changes that don't warrant story tracking.
@@ -206,13 +206,13 @@ All repos should show clean.
206
206
 
207
207
  ## When to Use
208
208
 
209
- | Use /chore | Use /git-cleanup |
209
+ | Use /chore | Use /pf-git cleanup |
210
210
  |------------|------------------|
211
211
  | Single logical change | Multiple unrelated changes |
212
212
  | Quick fix or tweak | Need to organize into groups |
213
213
  | One type of change | Mixed types requiring separation |
214
214
 
215
215
  <related>
216
- - `/git-cleanup` - Full ceremony for organizing multiple changes
217
- - `/repo-status` - Check status across all repos
216
+ - `/pf-git cleanup` - Full ceremony for organizing multiple changes
217
+ - `/pf-git status` - Check status across all repos
218
218
  </related>
@@ -0,0 +1,40 @@
1
+ ---
2
+ description: Detect and run CI locally
3
+ args: "[run] [--detect-only] [--dry-run]"
4
+ ---
5
+
6
+ # CI Operations
7
+
8
+ <purpose>
9
+ Run CI locally by detecting the project's CI system and executing appropriate commands.
10
+ </purpose>
11
+
12
+ ## Commands
13
+
14
+ ### `/pf-ci run`
15
+
16
+ Run CI locally (auto-detects CI system).
17
+
18
+ ```bash
19
+ # Run CI
20
+ $CLAUDE_PROJECT_DIR/.pennyfarthing/scripts/run-ci.sh
21
+
22
+ # Detect only
23
+ $CLAUDE_PROJECT_DIR/.pennyfarthing/scripts/run-ci.sh --detect-only
24
+
25
+ # Dry run
26
+ $CLAUDE_PROJECT_DIR/.pennyfarthing/scripts/run-ci.sh --dry-run
27
+ ```
28
+
29
+ ## CI System Detection
30
+
31
+ | Priority | System | Detection | Command |
32
+ |----------|--------|-----------|---------|
33
+ | 1 | Justfile | `just --list` shows `ci` recipe | `just ci` |
34
+ | 2 | GitHub Actions | `.github/workflows/*.yml` | `act` |
35
+ | 3 | GitLab CI | `.gitlab-ci.yml` | `gitlab-runner exec` |
36
+ | 4 | npm fallback | `package.json` | `npm run build && npm test && npm run lint` |
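The priority order in the table above can be illustrated with a short detection routine. This is a sketch, not the logic of `run-ci.sh`: `detect_ci` and `has_ci_recipe` are hypothetical names, and instead of running `just --list` the sketch scans the justfile text for a `ci` recipe header, which is a simplifying assumption so that the example does not need `just` installed.

```python
import re
from pathlib import Path
from typing import Optional

# A recipe header such as "ci:" or "ci target:", but not the assignment "ci := ...".
CI_RECIPE = re.compile(r"^@?ci(?:[ \t]+[^:\n]*)?:(?!=)", re.MULTILINE)


def has_ci_recipe(justfile: Path) -> bool:
    """Approximate "`just --list` shows a `ci` recipe" with a text scan."""
    return CI_RECIPE.search(justfile.read_text(encoding="utf-8")) is not None


def detect_ci(root: Path) -> Optional[tuple[str, str]]:
    """Return (system, command) for the first matching row, or None."""
    for name in ("justfile", "Justfile", ".justfile"):
        candidate = root / name
        if candidate.is_file() and has_ci_recipe(candidate):
            return ("Justfile", "just ci")

    workflows = root / ".github" / "workflows"
    if workflows.is_dir() and any(workflows.glob("*.yml")):
        return ("GitHub Actions", "act")

    if (root / ".gitlab-ci.yml").is_file():
        return ("GitLab CI", "gitlab-runner exec")

    if (root / "package.json").is_file():
        return ("npm fallback", "npm run build && npm test && npm run lint")
    return None
```

Because the checks run in table order, a project with both a justfile `ci` recipe and a `package.json` resolves to `just ci`.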
37
+
38
+ ## Related
39
+
40
+ - `/pf-check` — Run quality gates before handoff
@@ -1,32 +1,14 @@
1
1
  ---
2
- description: Close an epic - verify completion, update status, and archive context
2
+ deprecated: true
3
+ redirect: pf-epic
4
+ description: "DEPRECATED: Use /pf-epic close instead."
3
5
  ---
4
6
 
5
- # Close Epic
7
+ # /close-epic - DEPRECATED
6
8
 
7
- <purpose>
8
- Closes an epic after all stories are done. Updates sprint YAML, transitions Jira, archives context.
9
- Counterpart to `/start-epic`. Idempotent — safe to run multiple times.
10
- </purpose>
9
+ Epic commands have been consolidated into `/pf-epic`. Use:
11
10
 
12
- <usage>
13
- ```bash
14
- /close-epic 79 # Close epic 79
15
- /close-epic epic-79 # Also accepts epic-N format
16
- ```
17
- </usage>
18
-
19
- <workflow>
20
- 1. Parse epic ID (strip `epic-` prefix if present). Ask if not provided.
21
- 2. Read epic shard `sprint/epic-{JIRA_KEY}.yaml`, verify all stories `status: done`. Warn if incomplete.
22
- 3. Update epic: `status: done`, `completed_points: {sum of story points}`
23
- 4. Recalculate sprint summary totals in `sprint/current-sprint.yaml`
24
- 5. If epic has `jira:` key → `pf jira move {JIRA_KEY} "Done"`
25
- 6. If `sprint/context/context-epic-{N}.md` exists → move to `sprint/archive/`
26
- 7. Commit and push sprint changes
27
- </workflow>
28
-
29
- <related>
30
- - `/start-epic` — Start an epic (move to current sprint, generate context)
31
- - `/pf-sprint status` — View sprint progress
32
- </related>
11
+ | Old Command | New Command |
12
+ |-------------|-------------|
13
+ | `/start-epic <id>` | `/pf-epic start <id>` |
14
+ | `/close-epic <id>` | `/pf-epic close <id>` |