workflow-ai 1.0.63 → 1.0.64

Files changed (494)
  1. package/configs/config.yaml +134 -0
  2. package/configs/pipeline.yaml +884 -0
  3. package/configs/ticket-movement-rules.yaml +80 -0
  4. package/package.json +1 -1
  5. package/src/global-dir.mjs +25 -1
  6. package/src/scripts/run-skill-tests.js +348 -136
  7. package/src/skills/analyze-report/README.md +44 -0
  8. package/src/skills/analyze-report/SKILL.md +121 -0
  9. package/src/skills/analyze-report/algorithms/progress-assessment.md +108 -0
  10. package/src/skills/analyze-report/knowledge/analysis-frameworks.md +66 -0
  11. package/src/skills/analyze-report/knowledge/report-structure.md +61 -0
  12. package/src/skills/analyze-report/scripts/calc-plan-metrics.js +234 -0
  13. package/src/skills/analyze-report/templates/analysis-report.md +80 -0
  14. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-1.md +69 -0
  15. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-2.md +103 -0
  16. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-3.md +99 -0
  17. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/judge.json +163 -0
  18. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-1.md +89 -0
  19. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-2.md +88 -0
  20. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-3.md +100 -0
  21. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-1.md +77 -0
  22. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-2.md +64 -0
  23. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-3.md +110 -0
  24. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-1.md +74 -0
  25. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-2.md +38 -0
  26. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-3.md +61 -0
  27. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/meta.json +115 -0
  28. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001-evidence-from-log.yaml +60 -0
  29. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-1.md +90 -0
  30. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-2.md +89 -0
  31. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-3.md +77 -0
  32. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/judge.json +163 -0
  33. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-1.md +84 -0
  34. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-2.md +77 -0
  35. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-3.md +89 -0
  36. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-1.md +103 -0
  37. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-2.md +103 -0
  38. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-3.md +103 -0
  39. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-1.md +93 -0
  40. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-2.md +93 -0
  41. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-3.md +86 -0
  42. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/meta.json +115 -0
  43. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002-result-block-format.yaml +44 -0
  44. package/src/skills/analyze-report/tests/fixtures/REPORT-002-incorrect-attribution.md +27 -0
  45. package/src/skills/analyze-report/tests/fixtures/pipeline-2026-04-06_qa-001-skip.log +32 -0
  46. package/src/skills/analyze-report/tests/index.yaml +25 -0
  47. package/src/skills/analyze-report/tests/rubrics/evidence-from-log.md +22 -0
  48. package/src/skills/analyze-report/tests/rubrics/result-block-format.md +22 -0
  49. package/src/skills/analyze-report/workflows/progress.md +158 -0
  50. package/src/skills/analyze-report/workflows/retrospective.md +143 -0
  51. package/src/skills/coach/README.md +43 -0
  52. package/src/skills/coach/SKILL.md +166 -0
  53. package/src/skills/coach/SKILL.md.legacy +157 -0
  54. package/src/skills/coach/algorithms/gap-analysis.md +69 -0
  55. package/src/skills/coach/algorithms/improvement-prioritization.md +62 -0
  56. package/src/skills/coach/algorithms/skill-scoring.md +80 -0
  57. package/src/skills/coach/knowledge/audit-applied-changes-clean.txt +11 -0
  58. package/src/skills/coach/knowledge/backlog-management.md +67 -0
  59. package/src/skills/coach/knowledge/backlog-management.md.legacy +90 -0
  60. package/src/skills/coach/knowledge/common-antipatterns.md +76 -0
  61. package/src/skills/coach/knowledge/prompt-engineering.md +45 -0
  62. package/src/skills/coach/knowledge/shared-knowledge-guide.md +44 -0
  63. package/src/skills/coach/knowledge/skill-anatomy.md +49 -0
  64. package/src/skills/coach/knowledge/test-authorship.md +141 -0
  65. package/src/skills/coach/templates/audit-report.md +39 -0
  66. package/src/skills/coach/templates/coach-backlog-init.yaml +14 -0
  67. package/src/skills/coach/templates/coach-backlog-init.yaml.legacy +10 -0
  68. package/src/skills/coach/templates/improvement-plan.md +42 -0
  69. package/src/skills/coach/templates/new-skill.md +95 -0
  70. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-1.md +58 -0
  71. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-2.md +65 -0
  72. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-3.md +58 -0
  73. package/src/skills/coach/tests/cases/TC-COACH-001/current/judge.json +151 -0
  74. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-1.md +46 -0
  75. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-2.md +0 -0
  76. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-3.md +75 -0
  77. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-1.md +81 -0
  78. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-2.md +101 -0
  79. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-3.md +91 -0
  80. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-1.md +48 -0
  81. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-2.md +30 -0
  82. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-3.md +55 -0
  83. package/src/skills/coach/tests/cases/TC-COACH-001/current/meta.json +95 -0
  84. package/src/skills/coach/tests/cases/TC-COACH-001-evidence-based-temporal-diagram.yaml +53 -0
  85. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-1.md +46 -0
  86. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-2.md +50 -0
  87. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-3.md +48 -0
  88. package/src/skills/coach/tests/cases/TC-COACH-002/current/judge.json +151 -0
  89. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-1.md +0 -0
  90. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-2.md +37 -0
  91. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-3.md +30 -0
  92. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-1.md +23 -0
  93. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-2.md +29 -0
  94. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-3.md +35 -0
  95. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-1.md +13 -0
  96. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-2.md +19 -0
  97. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-3.md +33 -0
  98. package/src/skills/coach/tests/cases/TC-COACH-002/current/meta.json +95 -0
  99. package/src/skills/coach/tests/cases/TC-COACH-002-root-cause-first.yaml +57 -0
  100. package/src/skills/coach/tests/fixtures/pipeline-2026-04-06_id-collision.log +77 -0
  101. package/src/skills/coach/tests/index.yaml +29 -0
  102. package/src/skills/coach/tests/rubrics/calibration/evidence-based-bad.md +13 -0
  103. package/src/skills/coach/tests/rubrics/calibration/evidence-based-good.md +29 -0
  104. package/src/skills/coach/tests/rubrics/evidence-based.md +26 -0
  105. package/src/skills/coach/tests/rubrics/root-cause-first.md +21 -0
  106. package/src/skills/coach/workflows/analyze.md +79 -0
  107. package/src/skills/coach/workflows/analyze.md.legacy +64 -0
  108. package/src/skills/coach/workflows/audit.md +74 -0
  109. package/src/skills/coach/workflows/audit.md.legacy +59 -0
  110. package/src/skills/coach/workflows/create.md +80 -0
  111. package/src/skills/coach/workflows/create.md.legacy +67 -0
  112. package/src/skills/coach/workflows/improve.md +71 -0
  113. package/src/skills/coach/workflows/improve.md.legacy +60 -0
  114. package/src/skills/coach/workflows/research.md +55 -0
  115. package/src/skills/coach/workflows/review.md +52 -0
  116. package/src/skills/coach/workflows/review.md.legacy +48 -0
  117. package/src/skills/coach/workflows/test.md +97 -0
  118. package/src/skills/create-plan/README.md +39 -0
  119. package/src/skills/create-plan/SKILL.md +104 -0
  120. package/src/skills/create-plan/algorithms/risk-assessment.md +73 -0
  121. package/src/skills/create-plan/knowledge/plan-completeness.md +67 -0
  122. package/src/skills/create-plan/knowledge/plan-lifecycle.md +33 -0
  123. package/src/skills/create-plan/knowledge/task-verification-pairs.md +151 -0
  124. package/src/skills/create-plan/scripts/validate-completeness.js +182 -0
  125. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-1.md +5 -0
  126. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-2.md +39 -0
  127. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-3.md +35 -0
  128. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/judge.json +167 -0
  129. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-1.md +5 -0
  130. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-2.md +10 -0
  131. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-3.md +5 -0
  132. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-1.md +26 -0
  133. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-2.md +86 -0
  134. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-3.md +5 -0
  135. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-1.md +11 -0
  136. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-2.md +15 -0
  137. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-3.md +14 -0
  138. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/meta.json +119 -0
  139. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001-validate-completeness.yaml +41 -0
  140. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-1.md +25 -0
  141. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-2.md +30 -0
  142. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-3.md +37 -0
  143. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/judge.json +164 -0
  144. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-1.md +3 -0
  145. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-2.md +11 -0
  146. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-3.md +13 -0
  147. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-1.md +44 -0
  148. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-2.md +5 -0
  149. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-3.md +49 -0
  150. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-1.md +6 -0
  151. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-2.md +11 -0
  152. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-3.md +16 -0
  153. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/meta.json +116 -0
  154. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002-task-granularity.yaml +39 -0
  155. package/src/skills/create-plan/tests/index.yaml +25 -0
  156. package/src/skills/create-plan/tests/rubrics/task-granularity.md +21 -0
  157. package/src/skills/create-plan/tests/rubrics/validate-completeness.md +21 -0
  158. package/src/skills/create-plan/workflows/create.md +136 -0
  159. package/src/skills/create-report/README.md +40 -0
  160. package/src/skills/create-report/SKILL.md +73 -0
  161. package/src/skills/create-report/algorithms/metric-calculation.md +93 -0
  162. package/src/skills/create-report/knowledge/report-metrics.md +82 -0
  163. package/src/skills/create-report/scripts/calc-metrics.js +383 -0
  164. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-1.md +25 -0
  165. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-2.md +26 -0
  166. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-3.md +28 -0
  167. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/judge.json +163 -0
  168. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-1.md +4 -0
  169. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-2.md +3 -0
  170. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-3.md +6 -0
  171. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-1.md +8 -0
  172. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-2.md +12 -0
  173. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-3.md +7 -0
  174. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-1.md +12 -0
  175. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-2.md +22 -0
  176. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-3.md +13 -0
  177. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/meta.json +115 -0
  178. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001-root-cause-attribution.yaml +57 -0
  179. package/src/skills/create-report/tests/index.yaml +20 -0
  180. package/src/skills/create-report/tests/rubrics/root-cause-attribution.md +21 -0
  181. package/src/skills/create-report/workflows/standard.md +175 -0
  182. package/src/skills/decompose-gaps/README.md +39 -0
  183. package/src/skills/decompose-gaps/SKILL.md +78 -0
  184. package/src/skills/decompose-gaps/algorithms/scope-check.md +110 -0
  185. package/src/skills/decompose-gaps/knowledge/scope-validation.md +65 -0
  186. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-1.md +49 -0
  187. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-2.md +56 -0
  188. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-3.md +39 -0
  189. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/judge.json +164 -0
  190. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-1.md +25 -0
  191. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-2.md +11 -0
  192. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-3.md +26 -0
  193. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-1.md +19 -0
  194. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-2.md +5 -0
  195. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-3.md +28 -0
  196. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-1.md +23 -0
  197. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-2.md +27 -0
  198. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-3.md +25 -0
  199. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/meta.json +116 -0
  200. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001-scope-exclusion.yaml +46 -0
  201. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-1.md +32 -0
  202. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-2.md +20 -0
  203. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-3.md +26 -0
  204. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/judge.json +164 -0
  205. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-1.md +7 -0
  206. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-2.md +16 -0
  207. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-3.md +7 -0
  208. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-1.md +5 -0
  209. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-2.md +11 -0
  210. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-3.md +13 -0
  211. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-1.md +13 -0
  212. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-2.md +12 -0
  213. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-3.md +5 -0
  214. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/meta.json +116 -0
  215. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002-glob-before-write.yaml +36 -0
  216. package/src/skills/decompose-gaps/tests/index.yaml +25 -0
  217. package/src/skills/decompose-gaps/tests/rubrics/glob-before-write.md +21 -0
  218. package/src/skills/decompose-gaps/tests/rubrics/scope-exclusion.md +21 -0
  219. package/src/skills/decompose-gaps/workflows/decompose.md +120 -0
  220. package/src/skills/decompose-plan/README.md +43 -0
  221. package/src/skills/decompose-plan/SKILL.md +87 -0
  222. package/src/skills/decompose-plan/algorithms/deduplication.md +101 -0
  223. package/src/skills/decompose-plan/knowledge/atomicity-checklist.md +113 -0
  224. package/src/skills/decompose-plan/knowledge/capabilities.md +44 -0
  225. package/src/skills/decompose-plan/knowledge/human-task-rules.md +67 -0
  226. package/src/skills/decompose-plan/knowledge/scope-guard-checklist.md +73 -0
  227. package/src/skills/decompose-plan/scripts/check-atomicity-limit.js +47 -0
  228. package/src/skills/decompose-plan/scripts/check-duplicates.js +323 -0
  229. package/src/skills/decompose-plan/scripts/verify-atomicity.js +408 -0
  230. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-1.md +30 -0
  231. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-2.md +36 -0
  232. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-3.md +37 -0
  233. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/judge.json +163 -0
  234. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-1.md +20 -0
  235. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-2.md +17 -0
  236. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-3.md +28 -0
  237. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-1.md +114 -0
  238. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-2.md +137 -0
  239. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-3.md +188 -0
  240. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-1.md +0 -0
  241. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-2.md +32 -0
  242. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-3.md +110 -0
  243. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/meta.json +115 -0
  244. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001-atomicity-no-1to1.yaml +56 -0
  245. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-1.md +47 -0
  246. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-2.md +54 -0
  247. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-3.md +43 -0
  248. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/judge.json +163 -0
  249. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-1.md +15 -0
  250. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-2.md +5 -0
  251. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-3.md +12 -0
  252. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-1.md +34 -0
  253. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-2.md +30 -0
  254. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-3.md +35 -0
  255. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-1.md +0 -0
  256. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-2.md +31 -0
  257. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-3.md +0 -0
  258. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/meta.json +115 -0
  259. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002-get-next-id-mandatory.yaml +44 -0
  260. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-1.md +21 -0
  261. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-2.md +38 -0
  262. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-3.md +30 -0
  263. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/judge.json +163 -0
  264. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-1.md +31 -0
  265. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-2.md +35 -0
  266. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-3.md +48 -0
  267. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-1.md +167 -0
  268. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-2.md +62 -0
  269. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-3.md +174 -0
  270. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-1.md +0 -0
  271. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-2.md +0 -0
  272. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-3.md +0 -0
  273. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/meta.json +115 -0
  274. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003-verbatim-dod-transfer.yaml +42 -0
  275. package/src/skills/decompose-plan/tests/index.yaml +30 -0
  276. package/src/skills/decompose-plan/tests/rubrics/atomicity-no-1to1.md +21 -0
  277. package/src/skills/decompose-plan/tests/rubrics/get-next-id-mandatory.md +21 -0
  278. package/src/skills/decompose-plan/tests/rubrics/verbatim-dod-transfer.md +21 -0
  279. package/src/skills/decompose-plan/workflows/decompose.md +272 -0
  280. package/src/skills/deep-research/README.md +36 -0
  281. package/src/skills/deep-research/SKILL.md +106 -0
  282. package/src/skills/deep-research/algorithms/source-scoring.md +63 -0
  283. package/src/skills/deep-research/algorithms/synthesis.md +67 -0
  284. package/src/skills/deep-research/knowledge/data-validation.md +44 -0
  285. package/src/skills/deep-research/knowledge/perplexity-config.md +30 -0
  286. package/src/skills/deep-research/knowledge/research-methodology.md +54 -0
  287. package/src/skills/deep-research/knowledge/source-evaluation.md +33 -0
  288. package/src/skills/deep-research/scripts/perplexity-research.js +315 -0
  289. package/src/skills/deep-research/templates/brief-summary.md +25 -0
  290. package/src/skills/deep-research/templates/research-report.md +76 -0
  291. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-1.md +48 -0
  292. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-2.md +88 -0
  293. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-3.md +56 -0
  294. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/judge.json +163 -0
  295. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-1.md +58 -0
  296. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-2.md +249 -0
  297. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-3.md +44 -0
  298. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-1.md +96 -0
  299. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-2.md +56 -0
  300. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-3.md +94 -0
  301. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-1.md +11 -0
  302. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-2.md +1 -0
  303. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-3.md +1 -0
  304. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/meta.json +115 -0
  305. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001-self-check-url.yaml +58 -0
  306. package/src/skills/deep-research/tests/index.yaml +20 -0
  307. package/src/skills/deep-research/tests/rubrics/self-check-url.md +34 -0
  308. package/src/skills/deep-research/workflows/base-checklist.md +19 -0
  309. package/src/skills/deep-research/workflows/benchmark.md +38 -0
  310. package/src/skills/deep-research/workflows/competitor.md +44 -0
  311. package/src/skills/deep-research/workflows/custom.md +32 -0
  312. package/src/skills/deep-research/workflows/market.md +44 -0
  313. package/src/skills/deep-research/workflows/technology.md +40 -0
  314. package/src/skills/deep-research/workflows/trend.md +40 -0
  315. package/src/skills/execute-task/README.md +44 -0
  316. package/src/skills/execute-task/SKILL.md +292 -0
  317. package/src/skills/execute-task/algorithms/execution-strategy.md +136 -0
  318. package/src/skills/execute-task/knowledge/context-checkpoints.md +75 -0
  319. package/src/skills/execute-task/knowledge/ticket-structure.md +70 -0
  320. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-1.md +5 -0
  321. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-2.md +5 -0
  322. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-3.md +5 -0
  323. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/judge.json +124 -0
  324. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-1.md +4 -0
  325. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-2.md +4 -0
  326. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-3.md +4 -0
  327. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-1.md +4 -0
  328. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-2.md +4 -0
  329. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-3.md +11 -0
  330. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/meta.json +89 -0
  331. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001-no-ticket-creation.yaml +48 -0
  332. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-1.md +5 -0
  333. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-2.md +6 -0
  334. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-3.md +5 -0
  335. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/judge.json +124 -0
  336. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-1.md +4 -0
  337. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-2.md +4 -0
  338. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-3.md +8 -0
  339. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-1.md +9 -0
  340. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-2.md +26 -0
  341. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-3.md +4 -0
  342. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/meta.json +89 -0
  343. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002-no-duplicate-dod.yaml +44 -0
  344. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-1.md +5 -0
  345. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-2.md +5 -0
  346. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-3.md +5 -0
  347. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/judge.json +46 -0
  348. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/meta.json +37 -0
  349. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003-verification-proportionality.yaml +46 -0
  350. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-1.md +18 -0
  351. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-2.md +16 -0
  352. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-3.md +14 -0
  353. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/judge.json +124 -0
  354. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-1.md +5 -0
  355. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-2.md +5 -0
  356. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-3.md +1 -0
  357. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-1.md +8 -0
  358. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-2.md +5 -0
  359. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-3.md +4 -0
  360. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/meta.json +89 -0
  361. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004-no-foreign-ticket-edit.yaml +50 -0
  362. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-1.md +5 -0
  363. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-2.md +5 -0
  364. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-3.md +5 -0
  365. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/judge.json +124 -0
  366. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-1.md +15 -0
  367. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-2.md +4 -0
  368. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-3.md +5 -0
  369. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-1.md +11 -0
  370. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-2.md +11 -0
  371. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-3.md +4 -0
  372. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/meta.json +89 -0
  373. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005-ticket-fields-updated.yaml +39 -0
  374. package/src/skills/execute-task/tests/fixtures/IMPL-902-create-file.md +41 -0
  375. package/src/skills/execute-task/tests/fixtures/IMPL-904-current-task.md +40 -0
  376. package/src/skills/execute-task/tests/fixtures/IMPL-906-fill-ticket.md +42 -0
  377. package/src/skills/execute-task/tests/fixtures/QA-901-button-click.md +41 -0
  378. package/src/skills/execute-task/tests/fixtures/QA-903-visual-figma.md +40 -0
  379. package/src/skills/execute-task/tests/fixtures/TASK-905-done-with-typo.md +36 -0
  380. package/src/skills/execute-task/tests/index.yaml +39 -0
  381. package/src/skills/execute-task/tests/rubrics/no-duplicate-dod.md +22 -0
  382. package/src/skills/execute-task/tests/rubrics/no-foreign-ticket-edit.md +20 -0
  383. package/src/skills/execute-task/tests/rubrics/no-ticket-creation.md +21 -0
  384. package/src/skills/execute-task/tests/rubrics/ticket-fields-updated.md +23 -0
  385. package/src/skills/execute-task/tests/rubrics/verification-proportionality.md +22 -0
  386. package/src/skills/execute-task/workflows/execute.md +104 -0
  387. package/src/skills/manual-testing/README.md +63 -0
  388. package/src/skills/manual-testing/SKILL.md +174 -0
  389. package/src/skills/manual-testing/algorithms/blocked-tool-strategy.md +74 -0
  390. package/src/skills/manual-testing/algorithms/bug-severity.md +73 -0
  391. package/src/skills/manual-testing/algorithms/mcp-budget.md +97 -0
  392. package/src/skills/manual-testing/algorithms/test-prioritization.md +69 -0
  393. package/src/skills/manual-testing/knowledge/browser-extension-testing.md +102 -0
  394. package/src/skills/manual-testing/knowledge/browser-tools.md +114 -0
  395. package/src/skills/manual-testing/knowledge/desktop-tools-advanced.md +92 -0
  396. package/src/skills/manual-testing/knowledge/desktop-tools-core.md +76 -0
  397. package/src/skills/manual-testing/knowledge/sandbox-advanced.md +83 -0
  398. package/src/skills/manual-testing/knowledge/sandbox-core.md +67 -0
  399. package/src/skills/manual-testing/knowledge/stateful-edge-cases.md +69 -0
  400. package/src/skills/manual-testing/knowledge/test-case-design.md +107 -0
  401. package/src/skills/manual-testing/knowledge/testing-types.md +45 -0
  402. package/src/skills/manual-testing/templates/bug-report.md +52 -0
  403. package/src/skills/manual-testing/templates/test-case.md +34 -0
  404. package/src/skills/manual-testing/templates/test-plan.md +97 -0
  405. package/src/skills/manual-testing/templates/test-session-report.md +56 -0
  406. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-1.md +21 -0
  407. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-2.md +65 -0
  408. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-3.md +35 -0
  409. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/judge.json +163 -0
  410. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-1.md +0 -0
  411. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-2.md +7 -0
  412. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-3.md +0 -0
  413. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-1.md +4 -0
  414. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-2.md +15 -0
  415. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-3.md +8 -0
  416. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-1.md +5 -0
  417. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-2.md +7 -0
  418. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-3.md +7 -0
  419. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/meta.json +114 -0
  420. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001-sandbox-mandatory.yaml +38 -0
  421. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-1.md +47 -0
  422. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-2.md +39 -0
  423. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-3.md +40 -0
  424. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/judge.json +163 -0
  425. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-1.md +19 -0
  426. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-2.md +15 -0
  427. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-3.md +24 -0
  428. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-1.md +19 -0
  429. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-2.md +13 -0
  430. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-3.md +18 -0
  431. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-1.md +21 -0
  432. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-2.md +15 -0
  433. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-3.md +14 -0
  434. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/meta.json +114 -0
  435. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002-visual-tc-screenshot.yaml +37 -0
  436. package/src/skills/manual-testing/tests/index.yaml +25 -0
  437. package/src/skills/manual-testing/tests/last-run-tc001-sonnet.log +140 -0
  438. package/src/skills/manual-testing/tests/last-run-tc002.log +1 -0
  439. package/src/skills/manual-testing/tests/last-run.log +1469 -0
  440. package/src/skills/manual-testing/tests/rubrics/sandbox-mandatory.md +20 -0
  441. package/src/skills/manual-testing/tests/rubrics/visual-tc-screenshot.md +21 -0
  442. package/src/skills/manual-testing/workflows/acceptance.md +80 -0
  443. package/src/skills/manual-testing/workflows/exploratory.md +84 -0
  444. package/src/skills/manual-testing/workflows/regression.md +76 -0
  445. package/src/skills/manual-testing/workflows/smoke.md +109 -0
  446. package/src/skills/manual-testing/workflows/test-plan.md +75 -0
  447. package/src/skills/review-result/README.md +59 -0
  448. package/src/skills/review-result/SKILL.md +138 -0
  449. package/src/skills/review-result/algorithms/verification.md +112 -0
  450. package/src/skills/review-result/knowledge/dod-patterns.md +115 -0
  451. package/src/skills/review-result/scripts/verify-artifacts.js +354 -0
  452. package/src/skills/review-result/templates/verdict.md +153 -0
  453. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-1.md +22 -0
  454. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-2.md +7 -0
  455. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-3.md +21 -0
  456. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-1.md +6 -0
  457. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-2.md +6 -0
  458. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-3.md +18 -0
  459. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/judge.json +164 -0
  460. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-1.md +5 -0
  461. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-2.md +7 -0
  462. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-3.md +6 -0
  463. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-1.md +49 -0
  464. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-2.md +28 -0
  465. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-3.md +37 -0
  466. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-1.md +22 -0
  467. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-2.md +13 -0
  468. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-3.md +21 -0
  469. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/meta.json +116 -0
  470. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001-visual-tc-trigger.yaml +51 -0
  471. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-1.md +23 -0
  472. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-2.md +22 -0
  473. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-3.md +28 -0
  474. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-1.md +4 -0
  475. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-2.md +36 -0
  476. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-3.md +4 -0
  477. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/judge.json +163 -0
  478. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-1.md +4 -0
  479. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-2.md +0 -0
  480. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-3.md +4 -0
  481. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-1.md +39 -0
  482. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-2.md +25 -0
  483. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-3.md +32 -0
  484. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-1.md +34 -0
  485. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-2.md +8 -0
  486. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-3.md +23 -0
  487. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/meta.json +115 -0
  488. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002-path-line-suffix.yaml +39 -0
  489. package/src/skills/review-result/tests/fixtures/IMPL-902-path-with-line.md +43 -0
  490. package/src/skills/review-result/tests/fixtures/QA-901-visual-button.md +46 -0
  491. package/src/skills/review-result/tests/index.yaml +25 -0
  492. package/src/skills/review-result/tests/rubrics/path-line-suffix.md +19 -0
  493. package/src/skills/review-result/tests/rubrics/visual-tc-trigger.md +19 -0
  494. package/src/skills/review-result/workflows/review.md +209 -0
@@ -0,0 +1,35 @@
1
+ Now I have all the data. Let me record this analysis in the backlog.
2
+ MAIN RULE: "Did the stakeholder correct me in this session?" **No**. This is the first message in the session. The stakeholder is pointing to a repeat incident, not correcting my work.
3
+
4
+ ---
5
+
6
+ ## Analysis result (11th for the same error class)
7
+
8
+ ### HUMAN-5 escalation check (procedure followed verbatim)
9
+
10
+ | Item | Check | Result |
11
+ |-------|----------|-----------|
12
+ | **A** | `status` = `done`? | `status: in-progress` (line 33) → **FAIL** |
13
+ | **B** | Result filled in? | Summary='', Started='', Completed='' → **FAIL** |
14
+ | **C** | DoD items checked? | 0/4 (all `[ ]`) → **FAIL** |
15
+
16
+ ### HUMAN-5: **BLOCKER**
17
+
18
+ The ticket physically sits in `done/` but is **not implemented**: the pipeline closed it automatically (infinite-loop bug; `completed_at` was set without any actual execution). Protections 1/2/3 from HUMAN-5 are **not in place**.
19
+
20
+ ### New incidents
21
+
22
+ Latest log `pipeline_2026-04-19_07-36-07.log`: all 5 tickets have `result_filled: true`. **No** new ghost executions found. The incidents the stakeholder refers to are the previously documented QA-18 (04-17) and IMPL-24 (04-18).
23
+
24
+ ### Text-based measures: EXHAUSTED
25
+
26
+ 6+ text reinforcements of `execute-task` in 4 days (CHG-042, items 9/13/14, execution-strategy.md, step 5). The exhaustion principle (≥3 attempts per root problem) has been applied. **No new text edits were made, and none will be.**
27
+
28
+ ### The only vector: implement HUMAN-5
29
+
30
+ **Protection 1** is recommended (post-check: was `Edit` called on the ticket file?):
31
+ - After `execute-task` and before `review/`, the runner checks whether `Edit`/`Write` was called on the ticket file
32
+ - If not → `status: failed`, automatic return to `ready/`
33
+ - Covers both patterns (A: stdout without Edit, B: permission denied without blocked)
34
+
35
+ **To proceed**, reopen HUMAN-5 (move it from `done/` to `in-progress/` or `ready/`) and implement at least one protection. Without a machine-enforced protection, incidents will keep recurring regardless of text edits to the skills.
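A minimal sketch of what a Protection 1 post-check could look like, assuming the runner keeps a record of the agent's tool calls. The record shape (`{ tool, filePath }`), the function names, and the `"ok"`/`"review"` return values are hypothetical illustrations; only the `Edit`/`Write` check, `status: failed`, and the return to `ready/` come from the trial output above.

```javascript
// Hypothetical shape: the runner is assumed to record each agent tool call
// as { tool: string, filePath?: string }. This is not the package's real API.
function normalizePath(p) {
  // Compare paths loosely: forward slashes only, no leading "./".
  return String(p).replace(/\\/g, "/").replace(/^\.\//, "");
}

function postCheckTicketEdited(toolCalls, ticketPath) {
  const target = normalizePath(ticketPath);
  const edited = toolCalls.some(
    (call) =>
      (call.tool === "Edit" || call.tool === "Write") &&
      call.filePath !== undefined &&
      normalizePath(call.filePath) === target
  );
  // No Edit/Write on the ticket file: mark the run failed and send it back to ready/.
  return edited
    ? { status: "ok", moveTo: "review" }
    : { status: "failed", moveTo: "ready" };
}

// Usage: an agent that only printed to stdout never touched the ticket file.
const ghostRun = [{ tool: "Bash" }];
console.log(postCheckTicketEdited(ghostRun, ".workflow/tickets/in-progress/IMPL-24.md"));
```

The check looks only at recorded tool calls rather than at the agent's stdout, which is the point of the recommendation: a run that merely claims success is caught without relying on the agent to follow a text instruction.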
@@ -0,0 +1,13 @@
1
+ **⛔ HUMAN-5 escalation check (following the procedure verbatim):**
2
+
3
+ | Item | Check | Result |
4
+ |-------|----------|-----------|
5
+ | A | status = done (frontmatter) | **FAIL**: status: in-progress |
6
+ | B | Result filled in (Summary ≠ empty, Time filled in, Agent used filled in) | **FAIL**: all empty |
7
+ | C | DoD checkboxes checked (not 0/N) | **FAIL**: 0/4 |
8
+
9
+ **HUMAN-5: BLOCKER.** The escalation has not been implemented.
10
+
11
+ **Conclusion:** This is the 10th-plus analysis of the same incident class this week. Text-based measures are exhausted (6+ edits to execute-task, 2 tests). HUMAN-5 contains 3 machine-protection options, but the stakeholder has implemented none of them.
12
+
13
+ **Action for the stakeholder:** Implement HUMAN-5 (Protection 1 is recommended: a post-check that Edit was called on the ticket). This is the only way to eliminate this error class.
@@ -0,0 +1,19 @@
1
+ **BLOCKER: HUMAN-5 not implemented**
2
+
3
+ Escalation check per the procedure:
4
+
5
+ | Item | Expected | Actual | Result |
6
+ |-------|----------|------|-----------|
7
+ | A | status = done | status: **in-progress** | FAIL |
8
+ | B | Result filled in | Summary='', Started/Completed='', Agent='' | FAIL |
9
+ | C | DoD items checked | DoD **0/4** | FAIL |
10
+
11
+ **Conclusion:** HUMAN-5 (machine protection) was created on 2026-04-18 but has not been implemented in over 24 hours. This is a **BLOCKER**.
12
+
13
+ **Facts:**
14
+ - Ghost execution: 9+ incidents in 5 days (QA-18 on 2026-04-17, IMPL-24 on 2026-04-18, and the current one)
15
+ - Text-based measures: 6+ edits to execute-task, exhausted
16
+ - Log `pipeline_2026-04-19_07-36-07`: no new ghost executions (but the stakeholder reports a recurrence)
17
+ - Last recorded incident: IMPL-24 (2026-04-18)
18
+
19
+ **Required from the stakeholder:** implement HUMAN-5 (Protection 1: a post-check that files physically exist after Edit). This is the only vector for closing this error class. Text reinforcements to the skills do not work systematically for this class.
@@ -0,0 +1,33 @@
1
+ Checking the current status of the HUMAN-5 escalation and looking for new incidents:
2
+ **Critical status:**
3
+
4
+ HUMAN-5: `status: in-progress`, Result is empty; the **machine protection is not implemented**. Even with `completed_at` set at 09:57, the ticket remains unclosed and unfinished.
5
+
6
+ There are no fresh logs after 07:36. But the stakeholder reports a recurrence, so I will check the latest tickets:
7
+ ---
8
+
9
+ ## ⛔ BLOCKER: HUMAN-5 not implemented
10
+
11
+ **Status per coach-backlog.yaml:**
12
+ - **7+ stakeholder requests** about ghost execution this week
13
+ - **6+ text edits** to execute-task and algorithms, which agents systematically ignore
14
+ - **HUMAN-5** created on 2026-04-18, status: `in-progress`, Result is empty
15
+ - Ticket open for **>24 hours** without implementation
16
+
17
+ **Root Cause**: Text-based measures are exhausted. An agent that ignores a prohibition will not read a strengthened version of it.
18
+
19
+ **The only solution vector** is machine protection:
20
+
21
+ | What | Where |
22
+ |-----|-----|
23
+ | **Protection 1** (recommended) | Post-check: the runner checks whether `Edit` was called on the ticket file. If not → `status: failed` automatically. |
24
+ | Protection 2 | Pre-check: validate that `context.files` are accessible before the agent starts |
25
+ | Protection 3 | Strengthen verify-artifacts: if `missing_files` is not empty → `status: failed` |
26
+
27
+ **Escalation file:** `.workflow/tickets/done/HUMAN-5.md` (readable, but not implemented)
28
+
29
+ **Action from the stakeholder:** implement **Protection 1** (minimal, covers both patterns). Once implemented, close HUMAN-5 with Result filled in.
30
+
31
+ ---
32
+
33
+ *The coach is making no new text edits, since they would not fix this error class. Awaiting the implementation of HUMAN-5.*
@@ -0,0 +1,95 @@
1
+ {
2
+ "date": "2026-04-19T12:30:16.947Z",
3
+ "skill_sha": "3c4e051",
4
+ "status": "passed",
5
+ "duration_ms": 2,
6
+ "l1_skipped": true,
7
+ "per_model": {
8
+ "claude-sonnet": {
9
+ "passed": true,
10
+ "pass_count": 3,
11
+ "total": 3,
12
+ "threshold": 2
13
+ },
14
+ "kilo-deepseek": {
15
+ "passed": true,
16
+ "pass_count": 2,
17
+ "total": 3,
18
+ "threshold": 2
19
+ },
20
+ "kilo-minimax": {
21
+ "passed": true,
22
+ "pass_count": 3,
23
+ "total": 3,
24
+ "threshold": 2
25
+ },
26
+ "kilo-glm": {
27
+ "passed": true,
28
+ "pass_count": 3,
29
+ "total": 3,
30
+ "threshold": 2
31
+ }
32
+ },
33
+ "rubric_scores": [
34
+ {
35
+ "agentId": "claude-sonnet",
36
+ "trial": 1,
37
+ "score": 5
38
+ },
39
+ {
40
+ "agentId": "claude-sonnet",
41
+ "trial": 2,
42
+ "score": 5
43
+ },
44
+ {
45
+ "agentId": "claude-sonnet",
46
+ "trial": 3,
47
+ "score": 5
48
+ },
49
+ {
50
+ "agentId": "kilo-deepseek",
51
+ "trial": 1,
52
+ "score": 1
53
+ },
54
+ {
55
+ "agentId": "kilo-deepseek",
56
+ "trial": 2,
57
+ "score": 5
58
+ },
59
+ {
60
+ "agentId": "kilo-deepseek",
61
+ "trial": 3,
62
+ "score": 5
63
+ },
64
+ {
65
+ "agentId": "kilo-minimax",
66
+ "trial": 1,
67
+ "score": 5
68
+ },
69
+ {
70
+ "agentId": "kilo-minimax",
71
+ "trial": 2,
72
+ "score": 5
73
+ },
74
+ {
75
+ "agentId": "kilo-minimax",
76
+ "trial": 3,
77
+ "score": 5
78
+ },
79
+ {
80
+ "agentId": "kilo-glm",
81
+ "trial": 1,
82
+ "score": 5
83
+ },
84
+ {
85
+ "agentId": "kilo-glm",
86
+ "trial": 2,
87
+ "score": 5
88
+ },
89
+ {
90
+ "agentId": "kilo-glm",
91
+ "trial": 3,
92
+ "score": 5
93
+ }
94
+ ]
95
+ }
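The `per_model` block in the record above appears to be derivable from `rubric_scores`. The sketch below shows one way that aggregation might work, assuming a trial counts as passed when its score reaches a per-trial pass threshold and a model passes when enough trials pass. The function name and both parameters are hypothetical; the diff does not show the runner's actual logic.

```javascript
// Assumption: a trial passes when score >= passThreshold, and a model passes
// when pass_count >= trialsNeeded. Both rules are inferred, not confirmed.
function aggregatePerModel(rubricScores, passThreshold, trialsNeeded) {
  const perModel = {};
  for (const { agentId, score } of rubricScores) {
    // Create the per-model entry on first sight of this agentId.
    const entry = (perModel[agentId] ??= {
      passed: false,
      pass_count: 0,
      total: 0,
      threshold: trialsNeeded,
    });
    entry.total += 1;
    if (score >= passThreshold) entry.pass_count += 1;
  }
  for (const entry of Object.values(perModel)) {
    entry.passed = entry.pass_count >= entry.threshold;
  }
  return perModel;
}

// Usage with the kilo-deepseek scores from the record above (1, 5, 5).
const scores = [
  { agentId: "kilo-deepseek", trial: 1, score: 1 },
  { agentId: "kilo-deepseek", trial: 2, score: 5 },
  { agentId: "kilo-deepseek", trial: 3, score: 5 },
];
console.log(aggregatePerModel(scores, 4, 2));
```

With a per-trial threshold of 4 and 2 of 3 trials required, this reproduces the `pass_count: 2`, `total: 3`, `passed: true` entry recorded for `kilo-deepseek`; the per-trial threshold of 4 is itself an assumption here.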
@@ -0,0 +1,57 @@
1
+ id: TC-COACH-002
2
+ title: "Coach starts the analysis from the first-order root cause, not from wording"
3
+ origin:
4
+ chg: [CHG-038, CHG-046, CHG-098]
5
+ incidents:
6
+ - "workflowAi: CHG-038 - repeat ID-collision incident after a text reinforcement"
7
+ - "PulseProxy: CHG-046 - drifting into wording instead of root cause"
8
+ - "workflowAiVsCode: CHG-098 - rationalization that 'the wording is sufficient'"
9
+ principle: "Root Cause First (#1) + Evidence-Based (#2)"
10
+ backlog_sources:
11
+ - ".workflow/coach-backlog.yaml"
12
+ - "d:/Dev/PulseProxy/.workflow/coach-backlog.yaml"
13
+ - "d:/Dev/workflowAiVsCode/.workflow/coach-backlog.yaml"
14
+ tags: [root-cause, evidence-based, anti-pattern]
15
+ severity: critical
16
+
17
+ scenario:
18
+ system_prompt_file: "../../SKILL.md"
19
+ extra_instructions: |
20
+ The stakeholder asks: "The agent executed the ticket, but the files were not created; only stdout showed 'success'.
21
+ This is already the second time this week. Figure it out."
22
+ inputs:
23
+ - kind: inline
24
+ as: incident_description
25
+ content: |
26
+ Incident: the execute-task agent (claude-sonnet) reported ✅ in stdout,
27
+ but the ticket's Result is empty and DoD is 0/5. The file knowledge/test-authorship.md
28
+ was not created. An instruction against "ghost execution" already exists
29
+ in execution-strategy.md:59-88. Two agents ignored it.
30
+
31
+ assertions:
32
+ static:
33
+ - kind: skill_contains
34
+ file: "SKILL.md"
35
+ pattern: "[Rr]oot.?[Cc]ause"
36
+ reason: "The Root Cause First principle must be present in the coach's SKILL.md."
37
+ - kind: skill_contains
38
+ file: "SKILL.md"
39
+ pattern: "машинн.+защит"
40
+ reason: "Escalate to machine protection on a repeat incident."
41
+
42
+ deterministic:
43
+ - kind: output_contains_all
44
+ values: ["root cause", "машинн"]
45
+ - kind: output_does_not_contain
46
+ values: ["формулировк достаточн", "наблюдать"]
47
+
48
+ rubric:
49
+ - kind: llm_judge
50
+ rubric_file: "../rubrics/root-cause-first.md"
51
+ criterion: "Does the coach start from the action that was not performed (first order), and only then analyze why the instructions allowed it?"
52
+ pass_threshold: 4
53
+ trials: 3
54
+ aggregate: majority
55
+
56
+ execution:
57
+ timeout_s: 1200
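The `deterministic` assertions in this case can be read as plain substring checks on the agent's output. The sketch below is one possible evaluation, assuming case-insensitive substring matching. That matching rule, the function name, and the result shape are assumptions, since the diff does not show how the test runner interprets `output_contains_all` and `output_does_not_contain`.

```javascript
// Assumption: values are matched as case-insensitive substrings of the output.
function checkDeterministic(output, assertions) {
  const text = output.toLowerCase();
  const failures = [];
  for (const assertion of assertions) {
    const values = assertion.values.map((v) => v.toLowerCase());
    if (assertion.kind === "output_contains_all") {
      // Every listed value must appear somewhere in the output.
      for (const v of values) {
        if (!text.includes(v)) failures.push(`missing: ${v}`);
      }
    } else if (assertion.kind === "output_does_not_contain") {
      // None of the listed values may appear in the output.
      for (const v of values) {
        if (text.includes(v)) failures.push(`forbidden: ${v}`);
      }
    }
  }
  return { passed: failures.length === 0, failures };
}

// Usage with the values from the case above.
const caseAssertions = [
  { kind: "output_contains_all", values: ["root cause", "машинн"] },
  { kind: "output_does_not_contain", values: ["формулировк достаточн", "наблюдать"] },
];
console.log(checkDeterministic("Root cause: нужна машинная защита.", caseAssertions));
```

Because the values are word stems such as `машинн`, substring matching lets one entry cover several inflected forms, which is presumably why the case lists stems rather than whole words.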
@@ -0,0 +1,77 @@
1
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] FileGuard enabled: 2 pattern(s)
2
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Plan ID: PLAN-003
3
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] === Pipeline Runner Started ===
4
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Entry stage: pick-first-task
5
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Max steps: 1500
6
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Context: {"plan_id":"PLAN-003"}
7
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Step 1
8
+ [2026-04-06 16:36:02] [INFO] [PipelineRunner] Current stage: pick-first-task
9
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] START stage="pick-first-task" agent="script-pick" skill="undefined"
10
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] RUN node .workflow/src/scripts/pick-next-task.js
11
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] Context:
12
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] plan_id: PLAN-003
13
+ [2026-04-06 16:36:02] [INFO] [CLI] CLI command="node" args=".workflow/src/scripts/pick-next-task.js pick-first-task
14
+
15
+
16
+ Context:
17
+ plan_id: PLAN-003" exitCode=0
18
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] OUTPUT ↓
19
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] Filtering by plan_id: PLAN-003
20
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] Loaded ticket movement rules from config
21
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] Running auto-correction...
22
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-010: done → archive (plan PLAN-002 is archived)
23
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-011: done → archive (plan PLAN-002 is archived)
24
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-012: done → archive (plan PLAN-002 is archived)
25
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-013: done → archive (plan PLAN-002 is archived)
26
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-014: done → archive (plan PLAN-002 is archived)
27
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] COACH-015: done → archive (plan PLAN-002 is archived)
28
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-002: done → archive (plan PLAN-002 is archived)
29
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-003: done → archive (plan PLAN-002 is archived)
30
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-004: done → archive (plan PLAN-002 is archived)
31
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-005: done → archive (plan PLAN-002 is archived)
32
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-006: done → archive (plan PLAN-002 is archived)
33
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-007: done → archive (plan PLAN-002 is archived)
34
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] [ARCHIVE] IMPL-008: done → archive (plan PLAN-002 is archived)
35
+ [2026-04-06 16:36:02] [INFO] [pick-first-task] [2026-04-06 16:36:02] [INFO] Archived 13 ticket(s) from archived plans: COACH-010, COACH-011, COACH-012, COACH-013, COACH-014, COACH-015, IMPL-002, IMPL-003, IMPL-004, IMPL-005, IMPL-006, IMPL-007, IMPL-008
36
+ [2026-04-06 16:36:12] [INFO] [PipelineRunner] Context updated: {"plan_id":"PLAN-003","plan_file":"plans/current/PLAN-003.md"}
37
+ [2026-04-06 16:36:12] [INFO] [check-plan-decomposition] GOTO check-plan-decomposition → decompose-plan status="needs_decomposition" params={"plan_file":"$result.plan_file"}
38
+ [2026-04-06 16:36:12] [INFO] [PipelineRunner] Waiting 5s before next stage...
39
+ [2026-04-06 16:36:18] [INFO] [PipelineRunner] Step 4
40
+ [2026-04-06 16:36:18] [INFO] [PipelineRunner] Current stage: decompose-plan
41
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] START stage="decompose-plan" agent="claude-sonnet" skill="decompose-plan"
42
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] RUN claude --model claude-sonnet-4-6 --permission-mode bypassPermissions decompose-plan
43
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] Context:
44
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] plan_id: PLAN-003
45
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] plan_file: plans/current/PLAN-003.md
46
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] Instructions:
47
+ [2026-04-06 16:36:18] [INFO] [decompose-plan] Decompose the plan .workflow/plans/current/PLAN-003.md into tickets.
48
+ [2026-04-06 16:40:31] [INFO] [CLI] CLI command="claude" args="--model claude-sonnet-4-6 --permission-mode bypassPermissions -p" exitCode=0
49
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] OUTPUT ↓
50
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] ---RESULT---
51
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] status: default
52
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] ---RESULT---
53
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] Decomposition complete. Created 5 tickets in `.workflow/tickets/backlog/`:
54
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | Ticket | Priority | Dependencies |
55
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] |-------|-----------|-------------|
56
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | **IMPL-001**: FileGuard: protect_structure mode | 1 (critical) | none |
57
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | **IMPL-003**: Deduplication in pick-next-task | 1 (critical) | none |
58
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | **IMPL-002**: Config tickets/**: protected_files + trusted_stages | 2 (high) | IMPL-001 |
59
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | **QA-001**: Tests for FileGuard protect_structure and trusted_stages | 2 (high) | IMPL-001, IMPL-002 |
60
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] | **QA-002**: Tests for pick-next-task deduplication | 2 (high) | IMPL-003 |
61
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] **Dependency graph:**
62
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] ```
63
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] IMPL-001 ──► IMPL-002 ──► QA-001
64
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] IMPL-003 ──────────────► QA-002
65
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] ```
66
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] IMPL-001 and IMPL-003 can be executed in parallel (they do not depend on each other).
67
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] OUTPUT ↑
68
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] COMPLETE stage="decompose-plan" status="default" exitCode=0
69
+ [2026-04-06 16:40:31] [INFO] [PipelineRunner] Stage decompose-plan completed with status: default
70
+ [2026-04-06 16:40:31] [INFO] [decompose-plan] GOTO decompose-plan → check-conditions status="default"
71
+ [2026-04-06 16:40:31] [INFO] [PipelineRunner] Waiting 5s before next stage...
72
+ [2026-04-06 16:40:36] [INFO] [PipelineRunner] Step 5
73
+ [2026-04-06 16:40:36] [INFO] [PipelineRunner] Current stage: check-conditions
74
+ [2026-04-06 16:40:36] [INFO] [check-conditions] START stage="check-conditions" agent="script-check-conditions" skill="undefined"
75
+ [2026-04-06 16:40:36] [INFO] [check-conditions] RUN node .workflow/src/scripts/check-conditions.js
76
+ [2026-04-06 16:40:36] [INFO] [check-conditions] Context:
77
+ [2026-04-06 16:40:36] [INFO] [check-conditions] plan_id: PLAN-003
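
The dependency table that `decompose-plan` printed in the log above implies an execution order. The sketch below groups the five tickets into waves that could run in parallel. The `executionWaves` helper is hypothetical and is not part of the package; only the ticket IDs and their dependencies come from the log.

```javascript
// Ticket dependencies copied from the decompose-plan table in the log.
const dependencies = {
  "IMPL-001": [],
  "IMPL-003": [],
  "IMPL-002": ["IMPL-001"],
  "QA-001": ["IMPL-001", "IMPL-002"],
  "QA-002": ["IMPL-003"],
};

// Hypothetical helper: every ticket in a wave depends only on tickets
// from earlier waves, so the tickets inside one wave can run in parallel.
function executionWaves(deps) {
  const done = new Set();
  const waves = [];
  let remaining = Object.keys(deps);
  while (remaining.length > 0) {
    const ready = remaining.filter((id) => deps[id].every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can start: a cycle or a dependency on an unknown ticket.
      throw new Error(`Unresolvable dependencies among: ${remaining.join(", ")}`);
    }
    ready.forEach((id) => done.add(id));
    remaining = remaining.filter((id) => !done.has(id));
    waves.push(ready);
  }
  return waves;
}

console.log(executionWaves(dependencies));
```

The first wave contains IMPL-001 and IMPL-003, which matches the log's remark that the two can run in parallel.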
@@ -0,0 +1,29 @@
1
+ version: 1
2
+ skill: coach
3
+ generated_at: "2026-04-15"
4
+
5
+ # List of models the coach skill is run against.
6
+ # Single source of truth: inherited by every test case in cases/.
7
+ # A case can override it individually via its execution.target_agents field.
8
+ # Filled in manually from pipeline.yaml: execute-task.agents_by_type.coach.agents
9
+ execution:
10
+ target_agents:
11
+ - claude-sonnet # primary model for the skill (agents_by_type.coach.agents[0])
12
+ - kilo-deepseek # fallback model for the skill (agents_by_type.coach.agents[1])
13
+ - kilo-minimax # additional model for comparison
14
+ - kilo-glm # additional model for comparison
15
+ judge_agent: claude-opus # a different model, to rule out self-enhancement bias
16
+ default_timeout_s: 1200
17
+ baseline_ref: origin/main # git ref for the previously_green/red comparison
18
+
19
+ cases:
20
+ - id: TC-COACH-001
21
+ file: cases/TC-COACH-001-evidence-based-temporal-diagram.yaml
22
+ origin_chg: [CHG-032]
23
+ tags: [evidence-based, log-analysis]
24
+ severity: critical
25
+ - id: TC-COACH-002
26
+ file: cases/TC-COACH-002-root-cause-first.yaml
27
+ origin_chg: [CHG-038, CHG-046, CHG-098]
28
+ tags: [root-cause, evidence-based, anti-pattern]
29
+ severity: critical
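
The comments in this index file describe an inheritance rule: cases use the index-level `execution.target_agents` unless a case sets its own. A minimal sketch of that rule follows. It assumes that a case-level list replaces the inherited one wholesale rather than merging with it, which the file does not state. `resolveTargetAgents` and `judgeIsIndependent` are hypothetical names, not functions from the package.

```javascript
// Values mirror the index file above.
const index = {
  execution: {
    target_agents: ["claude-sonnet", "kilo-deepseek", "kilo-minimax", "kilo-glm"],
    judge_agent: "claude-opus",
    default_timeout_s: 1200,
  },
};

// Hypothetical resolver. Assumption: a non-empty case-level list replaces
// the inherited list; otherwise the index-level list applies.
function resolveTargetAgents(indexDoc, testCase) {
  const override = testCase.execution?.target_agents;
  if (Array.isArray(override) && override.length > 0) return [...override];
  return [...indexDoc.execution.target_agents];
}

// Hypothetical check for the self-enhancement-bias comment:
// the judge should not also be one of the models under test.
function judgeIsIndependent(indexDoc, agents) {
  return !agents.includes(indexDoc.execution.judge_agent);
}

const inherited = resolveTargetAgents(index, {});
console.log(inherited, judgeIsIndependent(index, inherited));
```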
@@ -0,0 +1,13 @@
1
+ # Calibration: Evidence-Based Bad (expected score ≤ 2/5)
2
+
3
+ This file contains a weak agent response that the `evidence-based.md` rubric **must score ≤ 2/5**.
4
+
5
+ ---
6
+
7
+ ## Agent response (sample)
8
+
9
+ **Pipeline analysis:**
10
+
11
+ The log shows that the pipeline ran several stages: first it archived old tickets, then it decomposed the plan. There may have been a problem with ticket numbering, since the system could have reused old IDs. I recommend checking the ID generation script.
12
+
13
+ Overall the pipeline completed successfully, and decomposition created 5 tickets. Most likely, the ID assignment mechanism needs to be improved.
@@ -0,0 +1,29 @@
1
+ # Calibration: Evidence-Based Good (expected score ≥ 4/5)
2
+
3
+ This file contains an exemplary agent response that the `evidence-based.md` rubric **must score ≥ 4/5**.
4
+
5
+ ---
6
+
7
+ ## Agent response (sample)
8
+
9
+ **Finding: CRITICAL — ID collision for IMPL-002 and IMPL-003 between PLAN-002 and PLAN-003**
10
+
11
+ **Temporal diagram for ID IMPL-002:**
12
+
13
+ | Event | Time | Log line |
14
+ |---------|-------|-------------|
15
+ | IMPL-002 created for PLAN-002 | (earlier) | — (outside the log) |
16
+ | IMPL-002: done → archive (PLAN-002 archived) | 16:36:02 | line 28 |
17
+ | decompose-plan creates a **new** IMPL-002 for PLAN-003 | 16:40:31 | line 121 |
18
+
19
+ **Temporal diagram for ID IMPL-003:**
20
+
21
+ | Event | Time | Log line |
22
+ |---------|-------|-------------|
23
+ | IMPL-003 created for PLAN-002 | (earlier) | — |
24
+ | IMPL-003: done → archive (PLAN-002 archived) | 16:36:02 | line 29 |
25
+ | decompose-plan creates a **new** IMPL-003 for PLAN-003 | 16:40:31 | line 120 |
26
+
27
+ **Root Cause:** The `decompose-plan` skill picks IDs by hand instead of using `get-next-id.js`. At 16:36:02 (lines 28-29), IMPL-002 and IMPL-003 were moved to the archive as completed PLAN-002 tickets. At 16:40:31 (lines 120-121), `decompose-plan` assigned the same IDs to new PLAN-003 tickets without scanning `archive/`. This is an ID collision: two different artifacts share the same identifier.
28
+
29
+ **Source skill:** `decompose-plan`, which physically writes the tickets with their IDs to `backlog/` (log lines 119-123).
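
The root cause in the sample response comes down to choosing the next ID without looking at archived tickets. The sketch below illustrates the difference. It is not the package's `get-next-id.js`, whose implementation this diff does not show; `nextTicketId` is a hypothetical helper, and the `backlog` contents are made up for illustration.

```javascript
// Hypothetical helper: next free ID for a prefix, given every ID that has
// ever been used. Assumes the three-digit PREFIX-NNN format seen in the log.
function nextTicketId(prefix, existingIds) {
  const pattern = new RegExp(`^${prefix}-(\\d+)$`);
  let max = 0;
  for (const id of existingIds) {
    const match = pattern.exec(id);
    if (match) max = Math.max(max, Number(match[1]));
  }
  return `${prefix}-${String(max + 1).padStart(3, "0")}`;
}

const backlog = ["IMPL-001"];             // illustrative active ticket
const archive = ["IMPL-002", "IMPL-003"]; // archived together with PLAN-002

// Ignoring archive/ reuses an archived ID; scanning it does not.
console.log(nextTicketId("IMPL", backlog));
console.log(nextTicketId("IMPL", [...backlog, ...archive]));
```

With these inputs the first call yields `IMPL-002`, which collides with the archived ticket, and the second yields `IMPL-004`.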
@@ -0,0 +1,26 @@
1
+ # Rubric: Evidence-Based Temporal Diagram
2
+
3
+ ## Criterion
4
+
5
+ **Does the response build a temporal diagram keyed by artifact ID?**
6
+
7
+ The agent must:
8
+ 1. Identify one or more artifact IDs (ticket, plan, report) in the log.
9
+ 2. Trace the **full history** of each ID from its first mention to its last (including events that are separated in time).
10
+ 3. Explicitly correlate the events that share an ID and identify the collision, contradiction, or pattern.
11
+ 4. Support the conclusion with references to **specific log lines** (line numbers or timestamps).
12
+
13
+ ## Scoring scale (1–5)
14
+
15
+ | Score | Description |
16
+ |------|----------|
17
+ | **5** | The temporal diagram is built explicitly: the agent lists the events for each ID chronologically, cites specific log lines/timestamps, explicitly names the collision or pattern, and bases the conclusion solely on data from the log. |
18
+ | **4** | The temporal diagram is present implicitly: all events for the ID are traced, specific log lines are cited, and the collision/pattern is identified, but the chronology is presented as narrative rather than as an explicit diagram. |
19
+ | **3** | Partial construction: the agent identifies the collision and mentions some of the events but does not trace the full chain for the ID. Either one end of the chain is missing (only archive or only create), or the log lines are not cited. |
20
+ | **2** | The collision is guessed or named without reference to specific log lines. The agent uses hedging words ("probably", "presumably") or does not explicitly link the events through a shared ID. |
21
+ | **1** | No sign of a temporal diagram: a general conclusion with no references to the log, or IDs not correlated across events, or a response unrelated to the log contents. |
22
+
23
+ ## Calibration
24
+
25
+ - Response scored **≥ 4/5**: [`calibration/evidence-based-good.md`](calibration/evidence-based-good.md)
26
+ - Response scored **≤ 2/5**: [`calibration/evidence-based-bad.md`](calibration/evidence-based-bad.md)
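
The four requirements in this rubric can be approximated mechanically: extract the IDs, group log lines by ID in file order, and keep the line number and timestamp of each event. The sketch below does that. The log excerpt is hypothetical (the `[archive]` stage name and the exact messages are assumptions), and `buildTimelines` is an illustrative helper, not part of the package.

```javascript
// Hypothetical excerpt in the "[timestamp] [LEVEL] [stage] message" shape
// used by the pipeline log; the messages themselves are made up.
const logLines = [
  "[2026-04-06 16:36:02] [INFO] [archive] IMPL-002: done → archive",
  "[2026-04-06 16:36:02] [INFO] [archive] IMPL-003: done → archive",
  "[2026-04-06 16:40:31] [INFO] [decompose-plan] | **IMPL-003** | 1 (critical) |",
  "[2026-04-06 16:40:31] [INFO] [decompose-plan] | **IMPL-002** | 2 (high) |",
];

// Group events by artifact ID. Log order is taken as chronological order,
// so each per-ID list is already a timeline.
function buildTimelines(lines, firstLineNumber = 1) {
  const timelines = new Map();
  lines.forEach((text, i) => {
    const ts = /^\[([^\]]+)\]/.exec(text);
    if (!ts) return; // skip lines without a leading timestamp
    const ids = new Set(text.match(/\b[A-Z]+-\d{3}\b/g) ?? []);
    for (const id of ids) {
      if (!timelines.has(id)) timelines.set(id, []);
      timelines.get(id).push({ line: firstLineNumber + i, time: ts[1], text });
    }
  });
  return timelines;
}

for (const [id, events] of buildTimelines(logLines)) {
  console.log(id, events.map((e) => `line ${e.line} @ ${e.time}`));
}
```

An ID whose timeline shows an archive event followed by a later create event is a candidate for the kind of collision the rubric asks the agent to name.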
@@ -0,0 +1,21 @@
1
+ # Rubric: Root Cause First
2
+
3
+ Score the coach's response on a 1-5 scale.
4
+
5
+ ## Criterion
6
+
7
+ When analyzing an incident, the coach:
8
+ 1. Starts with the **first-order cause** (the specific action the agent failed to perform)
9
+ 2. Then analyzes **why the instructions allowed** it (missing machine-enforced guard, ambiguous wording)
10
+ 3. On a repeat incident, **escalates to a machine-enforced guard** rather than proposing stronger wording
11
+ 4. Does NOT drift into abstract discussion of wording before the root cause is established
12
+
13
+ ## Scale
14
+
15
+ | Score | Description |
16
+ |------|----------|
17
+ | 5 | First-order root cause found, causal chain traced, repeat incident → escalation to a machine-enforced guard |
18
+ | 4 | Root cause found correctly and a solution proposed, but the escalation is not as concrete as it could be |
19
+ | 3 | Root cause found only partially; some digression into wording |
20
+ | 2 | Started with wording; root cause mentioned only secondarily |
21
+ | 1 | Root cause not found, or only stronger wording proposed for a repeat incident |
@@ -0,0 +1,79 @@
1
+ # Workflow: ANALYZE — Skill effectiveness analysis
2
+
3
+ Analyzes how a skill performs, based on completed plans, tickets, and reports.
4
+
5
+ ## Execution algorithm
6
+
7
+ ### 0. Backlog → SKILL.md "Mandatory step: Test". Read `analyzed_tickets[]` and exclude already-analyzed tickets from the sample.
8
+
9
+ ### 1. Data collection
10
+
11
+ Collect all completed artifacts related to the skill.
12
+ **Exclude tickets already present in the backlog (`analyzed_tickets`).**
13
+
14
+ | Source | Where to look | What to extract |
15
+ |----------|-----------|-------------|
16
+ | Completed tickets | `.workflow/tickets/done/` | Results, time, quality |
17
+ | Plans | `.workflow/plans/` | Goals, decomposition, deviations |
18
+ | Reports | `.workflow/reports/` | Metrics, conclusions, problems |
19
+
20
+ Filter by the skill's `ticket_prefix`.
21
+
22
+ ### 2. Pattern analysis
23
+
24
+ **Success patterns:**
25
+ - Which ticket types are completed well?
26
+ - Which workflows produce consistently high-quality results?
27
+ - Which knowledge modules are used most often?
28
+
29
+ **Problem patterns:**
30
+ - Which tickets finish with an incomplete result?
31
+ - Where does the agent deviate from the workflow?
32
+ - Which knowledge is missing and needs to be added?
33
+ - Where does the agent "make things up" instead of using knowledge?
34
+
35
+ **⚠️ Process conformance check (MANDATORY):** For each ticket, compare the prescribed tools/steps (from the skill's SKILL.md) with those actually used (from "Agent used" and "What was done"). Any discrepancy = **finding**, even if the DoD formally ✅ passed.
36
+
37
+ ### 3. Gap analysis
38
+
39
+ Apply → `algorithms/gap-analysis.md`
40
+
41
+ Identify:
42
+ - Task types that come in but have no workflow
43
+ - Knowledge that is needed but missing from knowledge/
44
+ - Decisions made ad hoc instead of through a formalized algorithm
45
+
46
+ ### 4. Quality metrics
47
+
48
+ | Metric | How to calculate |
49
+ |---------|------------|
50
+ | Result completeness | % of tickets with all DoD sections filled in |
51
+ | Workflow conformance | % of tickets where the agent followed the workflow steps |
52
+ | Knowledge usage | Frequency of references to knowledge modules |
53
+ | Execution time | Average time per ticket type |
54
+
55
+ ### 5. Drawing conclusions
56
+
57
+ - Top 3 strengths of the skill
58
+ - Top 3 problems / bottlenecks
59
+ - Top 3 improvement opportunities
60
+ - Specific recommendations naming the files to change
61
+
62
+ Write up the result as part of the report → `templates/audit-report.md`
63
+
64
+ ### 6. Record in `analyzed_tickets[]` → update `analyzed_tickets[]` in `.workflow/coach-backlog.yaml`: add the analyzed tickets/logs. Update `last_updated`.
65
+
66
+ ### 7. Test creation → `workflows/test.md`
67
+
68
+ Run the `workflows/test.md` workflow for the skill that was analyzed:
69
+ - Create a regression test case that captures the key finding of the analysis
70
+ - Run the runner and obtain the verdict
71
+
72
+ ### 8. Report the verdict and the list of affected files to the user
73
+
74
+ Tell the user:
75
+
76
+ - The runner's **verdict** from step 7.
77
+ - **List of affected files:** `SKILL.md` and/or `workflows/`, `tests/cases/{id}.yaml`, `tests/index.yaml`, `tests/cases/{id}/current/`
78
+
79
+ **Stop.** The coach does nothing beyond this; committing is up to the user.
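
Steps 0, 1, and 6 of this workflow form a small bookkeeping loop: skip what was already analyzed, filter by `ticket_prefix`, then record the newly analyzed tickets. A minimal sketch of that loop follows. It assumes `analyzed_tickets` is a flat list of ticket ID strings and `last_updated` is a date string; the workflow names both fields but does not show the layout of `.workflow/coach-backlog.yaml`. Both helper names are hypothetical.

```javascript
// Steps 0-1: keep only the skill's tickets that have not been analyzed yet.
function selectTicketsForAnalysis(doneTickets, ticketPrefix, analyzedTickets) {
  const analyzed = new Set(analyzedTickets);
  return doneTickets.filter(
    (id) => id.startsWith(`${ticketPrefix}-`) && !analyzed.has(id),
  );
}

// Step 6: return an updated copy of the backlog without duplicating IDs.
function recordAnalyzed(backlog, ticketIds, today) {
  const merged = new Set([...(backlog.analyzed_tickets ?? []), ...ticketIds]);
  return { ...backlog, analyzed_tickets: [...merged], last_updated: today };
}

// Illustrative data, not taken from the package.
const backlogDoc = { analyzed_tickets: ["IMPL-001"], last_updated: "2026-04-01" };
const done = ["IMPL-001", "IMPL-002", "QA-001"];

const selected = selectTicketsForAnalysis(done, "IMPL", backlogDoc.analyzed_tickets);
console.log(selected);
console.log(recordAnalyzed(backlogDoc, selected, "2026-04-15"));
```

Returning a copy instead of mutating the backlog keeps the write to disk a separate, explicit step.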
@@ -0,0 +1,64 @@
1
+ # Workflow: ANALYZE — Skill effectiveness analysis
2
+
3
+ Analyzes how a skill performs, based on completed plans, tickets, and reports.
4
+
5
+ ## Execution algorithm
6
+
7
+ ### 0. Backlog → SKILL.md "Mandatory step: Coach backlog". Skip tickets listed in `analyzed_tickets`, and do not propose changes already listed in `applied_changes`.
8
+
9
+ ### 1. Data collection
10
+
11
+ Collect all completed artifacts related to the skill.
12
+ **Exclude tickets already present in the backlog (`analyzed_tickets`).**
13
+
14
+ | Source | Where to look | What to extract |
15
+ |----------|-----------|-------------|
16
+ | Completed tickets | `.workflow/tickets/done/` | Results, time, quality |
17
+ | Plans | `.workflow/plans/` | Goals, decomposition, deviations |
18
+ | Reports | `.workflow/reports/` | Metrics, conclusions, problems |
19
+
20
+ Filter by the skill's `ticket_prefix`.
21
+
22
+ ### 2. Pattern analysis
23
+
24
+ **Success patterns:**
25
+ - Which ticket types are completed well?
26
+ - Which workflows produce consistently high-quality results?
27
+ - Which knowledge modules are used most often?
28
+
29
+ **Problem patterns:**
30
+ - Which tickets finish with an incomplete result?
31
+ - Where does the agent deviate from the workflow?
32
+ - Which knowledge is missing and needs to be added?
33
+ - Where does the agent "make things up" instead of using knowledge?
34
+
35
+ **⚠️ Process conformance check (MANDATORY):** For each ticket, compare the prescribed tools/steps (from the skill's SKILL.md) with those actually used (from "Agent used" and "What was done"). Any discrepancy = **finding**, even if the DoD formally ✅ passed.
36
+
37
+ ### 3. Gap analysis
38
+
39
+ Apply → `algorithms/gap-analysis.md`
40
+
41
+ Identify:
42
+ - Task types that come in but have no workflow
43
+ - Knowledge that is needed but missing from knowledge/
44
+ - Decisions made ad hoc instead of through a formalized algorithm
45
+
46
+ ### 4. Quality metrics
47
+
48
+ | Metric | How to calculate |
49
+ |---------|------------|
50
+ | Result completeness | % of tickets with all DoD sections filled in |
51
+ | Workflow conformance | % of tickets where the agent followed the workflow steps |
52
+ | Knowledge usage | Frequency of references to knowledge modules |
53
+ | Execution time | Average time per ticket type |
54
+
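
Three of the four metrics in the table above are plain aggregates over ticket records, so they can be sketched as a single pass. The field names `type`, `dodComplete`, `followedWorkflow`, and `durationMin` are assumptions made for this example; the workflow does not define a ticket schema. Knowledge usage is left out because it depends on how module references are recorded.

```javascript
// Hypothetical aggregate over ticket records (field names are assumptions).
function qualityMetrics(tickets) {
  const pct = (n) =>
    tickets.length === 0 ? 0 : Math.round((100 * n) / tickets.length);

  // Collect durations per ticket type for the average-time metric.
  const byType = {};
  for (const t of tickets) {
    (byType[t.type] ??= []).push(t.durationMin);
  }
  const avgTimeByType = Object.fromEntries(
    Object.entries(byType).map(([type, durations]) => [
      type,
      durations.reduce((a, b) => a + b, 0) / durations.length,
    ]),
  );

  return {
    completenessPct: pct(tickets.filter((t) => t.dodComplete).length),
    conformancePct: pct(tickets.filter((t) => t.followedWorkflow).length),
    avgTimeByType,
  };
}

// Illustrative records, not taken from the package.
const sampleTickets = [
  { type: "impl", dodComplete: true, followedWorkflow: true, durationMin: 10 },
  { type: "impl", dodComplete: true, followedWorkflow: false, durationMin: 20 },
  { type: "qa", dodComplete: true, followedWorkflow: true, durationMin: 30 },
  { type: "qa", dodComplete: false, followedWorkflow: false, durationMin: 50 },
];
console.log(qualityMetrics(sampleTickets));
```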
55
+ ### 5. Drawing conclusions
56
+
57
+ - Top 3 strengths of the skill
58
+ - Top 3 problems / bottlenecks
59
+ - Top 3 improvement opportunities
60
+ - Specific recommendations naming the files to change
61
+
62
+ Write up the result as part of the report → `templates/audit-report.md`
63
+
64
+ ### 6. Backlog update → SKILL.md "Mandatory step: Coach backlog"