workflow-ai 1.0.63 → 1.0.64

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (494) hide show
  1. package/configs/config.yaml +134 -0
  2. package/configs/pipeline.yaml +884 -0
  3. package/configs/ticket-movement-rules.yaml +80 -0
  4. package/package.json +1 -1
  5. package/src/global-dir.mjs +25 -1
  6. package/src/scripts/run-skill-tests.js +348 -136
  7. package/src/skills/analyze-report/README.md +44 -0
  8. package/src/skills/analyze-report/SKILL.md +121 -0
  9. package/src/skills/analyze-report/algorithms/progress-assessment.md +108 -0
  10. package/src/skills/analyze-report/knowledge/analysis-frameworks.md +66 -0
  11. package/src/skills/analyze-report/knowledge/report-structure.md +61 -0
  12. package/src/skills/analyze-report/scripts/calc-plan-metrics.js +234 -0
  13. package/src/skills/analyze-report/templates/analysis-report.md +80 -0
  14. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-1.md +69 -0
  15. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-2.md +103 -0
  16. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-3.md +99 -0
  17. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/judge.json +163 -0
  18. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-1.md +89 -0
  19. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-2.md +88 -0
  20. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-3.md +100 -0
  21. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-1.md +77 -0
  22. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-2.md +64 -0
  23. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-3.md +110 -0
  24. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-1.md +74 -0
  25. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-2.md +38 -0
  26. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-3.md +61 -0
  27. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/meta.json +115 -0
  28. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001-evidence-from-log.yaml +60 -0
  29. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-1.md +90 -0
  30. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-2.md +89 -0
  31. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-3.md +77 -0
  32. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/judge.json +163 -0
  33. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-1.md +84 -0
  34. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-2.md +77 -0
  35. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-3.md +89 -0
  36. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-1.md +103 -0
  37. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-2.md +103 -0
  38. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-3.md +103 -0
  39. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-1.md +93 -0
  40. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-2.md +93 -0
  41. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-3.md +86 -0
  42. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/meta.json +115 -0
  43. package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002-result-block-format.yaml +44 -0
  44. package/src/skills/analyze-report/tests/fixtures/REPORT-002-incorrect-attribution.md +27 -0
  45. package/src/skills/analyze-report/tests/fixtures/pipeline-2026-04-06_qa-001-skip.log +32 -0
  46. package/src/skills/analyze-report/tests/index.yaml +25 -0
  47. package/src/skills/analyze-report/tests/rubrics/evidence-from-log.md +22 -0
  48. package/src/skills/analyze-report/tests/rubrics/result-block-format.md +22 -0
  49. package/src/skills/analyze-report/workflows/progress.md +158 -0
  50. package/src/skills/analyze-report/workflows/retrospective.md +143 -0
  51. package/src/skills/coach/README.md +43 -0
  52. package/src/skills/coach/SKILL.md +166 -0
  53. package/src/skills/coach/SKILL.md.legacy +157 -0
  54. package/src/skills/coach/algorithms/gap-analysis.md +69 -0
  55. package/src/skills/coach/algorithms/improvement-prioritization.md +62 -0
  56. package/src/skills/coach/algorithms/skill-scoring.md +80 -0
  57. package/src/skills/coach/knowledge/audit-applied-changes-clean.txt +11 -0
  58. package/src/skills/coach/knowledge/backlog-management.md +67 -0
  59. package/src/skills/coach/knowledge/backlog-management.md.legacy +90 -0
  60. package/src/skills/coach/knowledge/common-antipatterns.md +76 -0
  61. package/src/skills/coach/knowledge/prompt-engineering.md +45 -0
  62. package/src/skills/coach/knowledge/shared-knowledge-guide.md +44 -0
  63. package/src/skills/coach/knowledge/skill-anatomy.md +49 -0
  64. package/src/skills/coach/knowledge/test-authorship.md +141 -0
  65. package/src/skills/coach/templates/audit-report.md +39 -0
  66. package/src/skills/coach/templates/coach-backlog-init.yaml +14 -0
  67. package/src/skills/coach/templates/coach-backlog-init.yaml.legacy +10 -0
  68. package/src/skills/coach/templates/improvement-plan.md +42 -0
  69. package/src/skills/coach/templates/new-skill.md +95 -0
  70. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-1.md +58 -0
  71. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-2.md +65 -0
  72. package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-3.md +58 -0
  73. package/src/skills/coach/tests/cases/TC-COACH-001/current/judge.json +151 -0
  74. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-1.md +46 -0
  75. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-2.md +0 -0
  76. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-3.md +75 -0
  77. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-1.md +81 -0
  78. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-2.md +101 -0
  79. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-3.md +91 -0
  80. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-1.md +48 -0
  81. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-2.md +30 -0
  82. package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-3.md +55 -0
  83. package/src/skills/coach/tests/cases/TC-COACH-001/current/meta.json +95 -0
  84. package/src/skills/coach/tests/cases/TC-COACH-001-evidence-based-temporal-diagram.yaml +53 -0
  85. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-1.md +46 -0
  86. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-2.md +50 -0
  87. package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-3.md +48 -0
  88. package/src/skills/coach/tests/cases/TC-COACH-002/current/judge.json +151 -0
  89. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-1.md +0 -0
  90. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-2.md +37 -0
  91. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-3.md +30 -0
  92. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-1.md +23 -0
  93. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-2.md +29 -0
  94. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-3.md +35 -0
  95. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-1.md +13 -0
  96. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-2.md +19 -0
  97. package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-3.md +33 -0
  98. package/src/skills/coach/tests/cases/TC-COACH-002/current/meta.json +95 -0
  99. package/src/skills/coach/tests/cases/TC-COACH-002-root-cause-first.yaml +57 -0
  100. package/src/skills/coach/tests/fixtures/pipeline-2026-04-06_id-collision.log +77 -0
  101. package/src/skills/coach/tests/index.yaml +29 -0
  102. package/src/skills/coach/tests/rubrics/calibration/evidence-based-bad.md +13 -0
  103. package/src/skills/coach/tests/rubrics/calibration/evidence-based-good.md +29 -0
  104. package/src/skills/coach/tests/rubrics/evidence-based.md +26 -0
  105. package/src/skills/coach/tests/rubrics/root-cause-first.md +21 -0
  106. package/src/skills/coach/workflows/analyze.md +79 -0
  107. package/src/skills/coach/workflows/analyze.md.legacy +64 -0
  108. package/src/skills/coach/workflows/audit.md +74 -0
  109. package/src/skills/coach/workflows/audit.md.legacy +59 -0
  110. package/src/skills/coach/workflows/create.md +80 -0
  111. package/src/skills/coach/workflows/create.md.legacy +67 -0
  112. package/src/skills/coach/workflows/improve.md +71 -0
  113. package/src/skills/coach/workflows/improve.md.legacy +60 -0
  114. package/src/skills/coach/workflows/research.md +55 -0
  115. package/src/skills/coach/workflows/review.md +52 -0
  116. package/src/skills/coach/workflows/review.md.legacy +48 -0
  117. package/src/skills/coach/workflows/test.md +97 -0
  118. package/src/skills/create-plan/README.md +39 -0
  119. package/src/skills/create-plan/SKILL.md +104 -0
  120. package/src/skills/create-plan/algorithms/risk-assessment.md +73 -0
  121. package/src/skills/create-plan/knowledge/plan-completeness.md +67 -0
  122. package/src/skills/create-plan/knowledge/plan-lifecycle.md +33 -0
  123. package/src/skills/create-plan/knowledge/task-verification-pairs.md +151 -0
  124. package/src/skills/create-plan/scripts/validate-completeness.js +182 -0
  125. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-1.md +5 -0
  126. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-2.md +39 -0
  127. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-3.md +35 -0
  128. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/judge.json +167 -0
  129. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-1.md +5 -0
  130. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-2.md +10 -0
  131. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-3.md +5 -0
  132. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-1.md +26 -0
  133. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-2.md +86 -0
  134. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-3.md +5 -0
  135. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-1.md +11 -0
  136. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-2.md +15 -0
  137. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-3.md +14 -0
  138. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/meta.json +119 -0
  139. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001-validate-completeness.yaml +41 -0
  140. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-1.md +25 -0
  141. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-2.md +30 -0
  142. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-3.md +37 -0
  143. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/judge.json +164 -0
  144. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-1.md +3 -0
  145. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-2.md +11 -0
  146. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-3.md +13 -0
  147. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-1.md +44 -0
  148. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-2.md +5 -0
  149. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-3.md +49 -0
  150. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-1.md +6 -0
  151. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-2.md +11 -0
  152. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-3.md +16 -0
  153. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/meta.json +116 -0
  154. package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002-task-granularity.yaml +39 -0
  155. package/src/skills/create-plan/tests/index.yaml +25 -0
  156. package/src/skills/create-plan/tests/rubrics/task-granularity.md +21 -0
  157. package/src/skills/create-plan/tests/rubrics/validate-completeness.md +21 -0
  158. package/src/skills/create-plan/workflows/create.md +136 -0
  159. package/src/skills/create-report/README.md +40 -0
  160. package/src/skills/create-report/SKILL.md +73 -0
  161. package/src/skills/create-report/algorithms/metric-calculation.md +93 -0
  162. package/src/skills/create-report/knowledge/report-metrics.md +82 -0
  163. package/src/skills/create-report/scripts/calc-metrics.js +383 -0
  164. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-1.md +25 -0
  165. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-2.md +26 -0
  166. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-3.md +28 -0
  167. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/judge.json +163 -0
  168. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-1.md +4 -0
  169. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-2.md +3 -0
  170. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-3.md +6 -0
  171. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-1.md +8 -0
  172. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-2.md +12 -0
  173. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-3.md +7 -0
  174. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-1.md +12 -0
  175. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-2.md +22 -0
  176. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-3.md +13 -0
  177. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/meta.json +115 -0
  178. package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001-root-cause-attribution.yaml +57 -0
  179. package/src/skills/create-report/tests/index.yaml +20 -0
  180. package/src/skills/create-report/tests/rubrics/root-cause-attribution.md +21 -0
  181. package/src/skills/create-report/workflows/standard.md +175 -0
  182. package/src/skills/decompose-gaps/README.md +39 -0
  183. package/src/skills/decompose-gaps/SKILL.md +78 -0
  184. package/src/skills/decompose-gaps/algorithms/scope-check.md +110 -0
  185. package/src/skills/decompose-gaps/knowledge/scope-validation.md +65 -0
  186. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-1.md +49 -0
  187. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-2.md +56 -0
  188. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-3.md +39 -0
  189. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/judge.json +164 -0
  190. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-1.md +25 -0
  191. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-2.md +11 -0
  192. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-3.md +26 -0
  193. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-1.md +19 -0
  194. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-2.md +5 -0
  195. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-3.md +28 -0
  196. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-1.md +23 -0
  197. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-2.md +27 -0
  198. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-3.md +25 -0
  199. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/meta.json +116 -0
  200. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001-scope-exclusion.yaml +46 -0
  201. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-1.md +32 -0
  202. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-2.md +20 -0
  203. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-3.md +26 -0
  204. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/judge.json +164 -0
  205. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-1.md +7 -0
  206. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-2.md +16 -0
  207. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-3.md +7 -0
  208. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-1.md +5 -0
  209. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-2.md +11 -0
  210. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-3.md +13 -0
  211. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-1.md +13 -0
  212. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-2.md +12 -0
  213. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-3.md +5 -0
  214. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/meta.json +116 -0
  215. package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002-glob-before-write.yaml +36 -0
  216. package/src/skills/decompose-gaps/tests/index.yaml +25 -0
  217. package/src/skills/decompose-gaps/tests/rubrics/glob-before-write.md +21 -0
  218. package/src/skills/decompose-gaps/tests/rubrics/scope-exclusion.md +21 -0
  219. package/src/skills/decompose-gaps/workflows/decompose.md +120 -0
  220. package/src/skills/decompose-plan/README.md +43 -0
  221. package/src/skills/decompose-plan/SKILL.md +87 -0
  222. package/src/skills/decompose-plan/algorithms/deduplication.md +101 -0
  223. package/src/skills/decompose-plan/knowledge/atomicity-checklist.md +113 -0
  224. package/src/skills/decompose-plan/knowledge/capabilities.md +44 -0
  225. package/src/skills/decompose-plan/knowledge/human-task-rules.md +67 -0
  226. package/src/skills/decompose-plan/knowledge/scope-guard-checklist.md +73 -0
  227. package/src/skills/decompose-plan/scripts/check-atomicity-limit.js +47 -0
  228. package/src/skills/decompose-plan/scripts/check-duplicates.js +323 -0
  229. package/src/skills/decompose-plan/scripts/verify-atomicity.js +408 -0
  230. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-1.md +30 -0
  231. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-2.md +36 -0
  232. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-3.md +37 -0
  233. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/judge.json +163 -0
  234. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-1.md +20 -0
  235. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-2.md +17 -0
  236. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-3.md +28 -0
  237. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-1.md +114 -0
  238. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-2.md +137 -0
  239. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-3.md +188 -0
  240. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-1.md +0 -0
  241. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-2.md +32 -0
  242. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-3.md +110 -0
  243. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/meta.json +115 -0
  244. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001-atomicity-no-1to1.yaml +56 -0
  245. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-1.md +47 -0
  246. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-2.md +54 -0
  247. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-3.md +43 -0
  248. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/judge.json +163 -0
  249. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-1.md +15 -0
  250. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-2.md +5 -0
  251. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-3.md +12 -0
  252. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-1.md +34 -0
  253. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-2.md +30 -0
  254. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-3.md +35 -0
  255. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-1.md +0 -0
  256. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-2.md +31 -0
  257. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-3.md +0 -0
  258. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/meta.json +115 -0
  259. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002-get-next-id-mandatory.yaml +44 -0
  260. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-1.md +21 -0
  261. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-2.md +38 -0
  262. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-3.md +30 -0
  263. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/judge.json +163 -0
  264. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-1.md +31 -0
  265. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-2.md +35 -0
  266. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-3.md +48 -0
  267. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-1.md +167 -0
  268. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-2.md +62 -0
  269. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-3.md +174 -0
  270. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-1.md +0 -0
  271. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-2.md +0 -0
  272. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-3.md +0 -0
  273. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/meta.json +115 -0
  274. package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003-verbatim-dod-transfer.yaml +42 -0
  275. package/src/skills/decompose-plan/tests/index.yaml +30 -0
  276. package/src/skills/decompose-plan/tests/rubrics/atomicity-no-1to1.md +21 -0
  277. package/src/skills/decompose-plan/tests/rubrics/get-next-id-mandatory.md +21 -0
  278. package/src/skills/decompose-plan/tests/rubrics/verbatim-dod-transfer.md +21 -0
  279. package/src/skills/decompose-plan/workflows/decompose.md +272 -0
  280. package/src/skills/deep-research/README.md +36 -0
  281. package/src/skills/deep-research/SKILL.md +106 -0
  282. package/src/skills/deep-research/algorithms/source-scoring.md +63 -0
  283. package/src/skills/deep-research/algorithms/synthesis.md +67 -0
  284. package/src/skills/deep-research/knowledge/data-validation.md +44 -0
  285. package/src/skills/deep-research/knowledge/perplexity-config.md +30 -0
  286. package/src/skills/deep-research/knowledge/research-methodology.md +54 -0
  287. package/src/skills/deep-research/knowledge/source-evaluation.md +33 -0
  288. package/src/skills/deep-research/scripts/perplexity-research.js +315 -0
  289. package/src/skills/deep-research/templates/brief-summary.md +25 -0
  290. package/src/skills/deep-research/templates/research-report.md +76 -0
  291. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-1.md +48 -0
  292. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-2.md +88 -0
  293. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-3.md +56 -0
  294. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/judge.json +163 -0
  295. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-1.md +58 -0
  296. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-2.md +249 -0
  297. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-3.md +44 -0
  298. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-1.md +96 -0
  299. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-2.md +56 -0
  300. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-3.md +94 -0
  301. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-1.md +11 -0
  302. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-2.md +1 -0
  303. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-3.md +1 -0
  304. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/meta.json +115 -0
  305. package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001-self-check-url.yaml +58 -0
  306. package/src/skills/deep-research/tests/index.yaml +20 -0
  307. package/src/skills/deep-research/tests/rubrics/self-check-url.md +34 -0
  308. package/src/skills/deep-research/workflows/base-checklist.md +19 -0
  309. package/src/skills/deep-research/workflows/benchmark.md +38 -0
  310. package/src/skills/deep-research/workflows/competitor.md +44 -0
  311. package/src/skills/deep-research/workflows/custom.md +32 -0
  312. package/src/skills/deep-research/workflows/market.md +44 -0
  313. package/src/skills/deep-research/workflows/technology.md +40 -0
  314. package/src/skills/deep-research/workflows/trend.md +40 -0
  315. package/src/skills/execute-task/README.md +44 -0
  316. package/src/skills/execute-task/SKILL.md +292 -0
  317. package/src/skills/execute-task/algorithms/execution-strategy.md +136 -0
  318. package/src/skills/execute-task/knowledge/context-checkpoints.md +75 -0
  319. package/src/skills/execute-task/knowledge/ticket-structure.md +70 -0
  320. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-1.md +5 -0
  321. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-2.md +5 -0
  322. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-3.md +5 -0
  323. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/judge.json +124 -0
  324. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-1.md +4 -0
  325. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-2.md +4 -0
  326. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-3.md +4 -0
  327. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-1.md +4 -0
  328. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-2.md +4 -0
  329. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-3.md +11 -0
  330. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/meta.json +89 -0
  331. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001-no-ticket-creation.yaml +48 -0
  332. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-1.md +5 -0
  333. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-2.md +6 -0
  334. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-3.md +5 -0
  335. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/judge.json +124 -0
  336. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-1.md +4 -0
  337. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-2.md +4 -0
  338. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-3.md +8 -0
  339. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-1.md +9 -0
  340. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-2.md +26 -0
  341. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-3.md +4 -0
  342. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/meta.json +89 -0
  343. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002-no-duplicate-dod.yaml +44 -0
  344. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-1.md +5 -0
  345. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-2.md +5 -0
  346. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-3.md +5 -0
  347. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/judge.json +46 -0
  348. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/meta.json +37 -0
  349. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003-verification-proportionality.yaml +46 -0
  350. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-1.md +18 -0
  351. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-2.md +16 -0
  352. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-3.md +14 -0
  353. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/judge.json +124 -0
  354. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-1.md +5 -0
  355. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-2.md +5 -0
  356. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-3.md +1 -0
  357. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-1.md +8 -0
  358. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-2.md +5 -0
  359. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-3.md +4 -0
  360. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/meta.json +89 -0
  361. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004-no-foreign-ticket-edit.yaml +50 -0
  362. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-1.md +5 -0
  363. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-2.md +5 -0
  364. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-3.md +5 -0
  365. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/judge.json +124 -0
  366. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-1.md +15 -0
  367. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-2.md +4 -0
  368. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-3.md +5 -0
  369. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-1.md +11 -0
  370. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-2.md +11 -0
  371. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-3.md +4 -0
  372. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/meta.json +89 -0
  373. package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005-ticket-fields-updated.yaml +39 -0
  374. package/src/skills/execute-task/tests/fixtures/IMPL-902-create-file.md +41 -0
  375. package/src/skills/execute-task/tests/fixtures/IMPL-904-current-task.md +40 -0
  376. package/src/skills/execute-task/tests/fixtures/IMPL-906-fill-ticket.md +42 -0
  377. package/src/skills/execute-task/tests/fixtures/QA-901-button-click.md +41 -0
  378. package/src/skills/execute-task/tests/fixtures/QA-903-visual-figma.md +40 -0
  379. package/src/skills/execute-task/tests/fixtures/TASK-905-done-with-typo.md +36 -0
  380. package/src/skills/execute-task/tests/index.yaml +39 -0
  381. package/src/skills/execute-task/tests/rubrics/no-duplicate-dod.md +22 -0
  382. package/src/skills/execute-task/tests/rubrics/no-foreign-ticket-edit.md +20 -0
  383. package/src/skills/execute-task/tests/rubrics/no-ticket-creation.md +21 -0
  384. package/src/skills/execute-task/tests/rubrics/ticket-fields-updated.md +23 -0
  385. package/src/skills/execute-task/tests/rubrics/verification-proportionality.md +22 -0
  386. package/src/skills/execute-task/workflows/execute.md +104 -0
  387. package/src/skills/manual-testing/README.md +63 -0
  388. package/src/skills/manual-testing/SKILL.md +174 -0
  389. package/src/skills/manual-testing/algorithms/blocked-tool-strategy.md +74 -0
  390. package/src/skills/manual-testing/algorithms/bug-severity.md +73 -0
  391. package/src/skills/manual-testing/algorithms/mcp-budget.md +97 -0
  392. package/src/skills/manual-testing/algorithms/test-prioritization.md +69 -0
  393. package/src/skills/manual-testing/knowledge/browser-extension-testing.md +102 -0
  394. package/src/skills/manual-testing/knowledge/browser-tools.md +114 -0
  395. package/src/skills/manual-testing/knowledge/desktop-tools-advanced.md +92 -0
  396. package/src/skills/manual-testing/knowledge/desktop-tools-core.md +76 -0
  397. package/src/skills/manual-testing/knowledge/sandbox-advanced.md +83 -0
  398. package/src/skills/manual-testing/knowledge/sandbox-core.md +67 -0
  399. package/src/skills/manual-testing/knowledge/stateful-edge-cases.md +69 -0
  400. package/src/skills/manual-testing/knowledge/test-case-design.md +107 -0
  401. package/src/skills/manual-testing/knowledge/testing-types.md +45 -0
  402. package/src/skills/manual-testing/templates/bug-report.md +52 -0
  403. package/src/skills/manual-testing/templates/test-case.md +34 -0
  404. package/src/skills/manual-testing/templates/test-plan.md +97 -0
  405. package/src/skills/manual-testing/templates/test-session-report.md +56 -0
  406. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-1.md +21 -0
  407. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-2.md +65 -0
  408. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-3.md +35 -0
  409. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/judge.json +163 -0
  410. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-1.md +0 -0
  411. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-2.md +7 -0
  412. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-3.md +0 -0
  413. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-1.md +4 -0
  414. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-2.md +15 -0
  415. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-3.md +8 -0
  416. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-1.md +5 -0
  417. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-2.md +7 -0
  418. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-3.md +7 -0
  419. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/meta.json +114 -0
  420. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001-sandbox-mandatory.yaml +38 -0
  421. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-1.md +47 -0
  422. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-2.md +39 -0
  423. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-3.md +40 -0
  424. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/judge.json +163 -0
  425. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-1.md +19 -0
  426. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-2.md +15 -0
  427. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-3.md +24 -0
  428. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-1.md +19 -0
  429. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-2.md +13 -0
  430. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-3.md +18 -0
  431. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-1.md +21 -0
  432. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-2.md +15 -0
  433. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-3.md +14 -0
  434. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/meta.json +114 -0
  435. package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002-visual-tc-screenshot.yaml +37 -0
  436. package/src/skills/manual-testing/tests/index.yaml +25 -0
  437. package/src/skills/manual-testing/tests/last-run-tc001-sonnet.log +140 -0
  438. package/src/skills/manual-testing/tests/last-run-tc002.log +1 -0
  439. package/src/skills/manual-testing/tests/last-run.log +1469 -0
  440. package/src/skills/manual-testing/tests/rubrics/sandbox-mandatory.md +20 -0
  441. package/src/skills/manual-testing/tests/rubrics/visual-tc-screenshot.md +21 -0
  442. package/src/skills/manual-testing/workflows/acceptance.md +80 -0
  443. package/src/skills/manual-testing/workflows/exploratory.md +84 -0
  444. package/src/skills/manual-testing/workflows/regression.md +76 -0
  445. package/src/skills/manual-testing/workflows/smoke.md +109 -0
  446. package/src/skills/manual-testing/workflows/test-plan.md +75 -0
  447. package/src/skills/review-result/README.md +59 -0
  448. package/src/skills/review-result/SKILL.md +138 -0
  449. package/src/skills/review-result/algorithms/verification.md +112 -0
  450. package/src/skills/review-result/knowledge/dod-patterns.md +115 -0
  451. package/src/skills/review-result/scripts/verify-artifacts.js +354 -0
  452. package/src/skills/review-result/templates/verdict.md +153 -0
  453. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-1.md +22 -0
  454. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-2.md +7 -0
  455. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-3.md +21 -0
  456. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-1.md +6 -0
  457. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-2.md +6 -0
  458. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-3.md +18 -0
  459. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/judge.json +164 -0
  460. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-1.md +5 -0
  461. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-2.md +7 -0
  462. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-3.md +6 -0
  463. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-1.md +49 -0
  464. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-2.md +28 -0
  465. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-3.md +37 -0
  466. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-1.md +22 -0
  467. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-2.md +13 -0
  468. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-3.md +21 -0
  469. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/meta.json +116 -0
  470. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001-visual-tc-trigger.yaml +51 -0
  471. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-1.md +23 -0
  472. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-2.md +22 -0
  473. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-3.md +28 -0
  474. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-1.md +4 -0
  475. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-2.md +36 -0
  476. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-3.md +4 -0
  477. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/judge.json +163 -0
  478. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-1.md +4 -0
  479. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-2.md +0 -0
  480. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-3.md +4 -0
  481. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-1.md +39 -0
  482. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-2.md +25 -0
  483. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-3.md +32 -0
  484. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-1.md +34 -0
  485. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-2.md +8 -0
  486. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-3.md +23 -0
  487. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/meta.json +115 -0
  488. package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002-path-line-suffix.yaml +39 -0
  489. package/src/skills/review-result/tests/fixtures/IMPL-902-path-with-line.md +43 -0
  490. package/src/skills/review-result/tests/fixtures/QA-901-visual-button.md +46 -0
  491. package/src/skills/review-result/tests/index.yaml +25 -0
  492. package/src/skills/review-result/tests/rubrics/path-line-suffix.md +19 -0
  493. package/src/skills/review-result/tests/rubrics/visual-tc-trigger.md +19 -0
  494. package/src/skills/review-result/workflows/review.md +209 -0
@@ -0,0 +1,56 @@
1
+ # Шаблон: Отчёт о тестовой сессии
2
+
3
+ ```markdown
4
+ # Отчёт о тестировании: {Название сессии}
5
+
6
+ **Дата:** {YYYY-MM-DD}
7
+ **Тип тестирования:** Smoke / Regression / Exploratory / Acceptance
8
+ **Тестировщик:** QA
9
+ **Окружение:** {URL стенда}
10
+ **Браузер:** {Браузер и версия}
11
+
12
+ ## Вердикт
13
+
14
+ **{PASSED / FAILED / PASSED WITH ISSUES / ACCEPTED / REJECTED}**
15
+
16
+ {1-2 предложения обоснование вердикта}
17
+
18
+ ## Статистика
19
+
20
+ | Метрика | Значение |
21
+ |---------|----------|
22
+ | Всего тест-кейсов | {N} |
23
+ | Пройдено (PASS) | {N} ({%}) |
24
+ | Упало (FAIL) | {N} ({%}) |
25
+ | Заблокировано (BLOCKED) | {N} ({%}) |
26
+ | Пропущено (SKIPPED) | {N} ({%}) |
27
+
28
+ ## Результаты по тест-кейсам
29
+
30
+ | # | ID | Название | Приоритет | Статус | Комментарий |
31
+ |---|-----|----------|-----------|--------|-------------|
32
+ | 1 | QA-TC-001 | {Название} | P0 | PASS / FAIL | {Ссылка на баг если FAIL} |
33
+ | 2 | ... | ... | ... | ... | ... |
34
+
35
+ ## Найденные баги
36
+
37
+ | # | TC | Название | Severity | Priority |
38
+ |---|----|----------|----------|----------|
39
+ | 1 | TC-{NNN} | {Краткое описание} | {CRITICAL} | {P0} |
40
+ | 2 | ... | ... | ... | ... |
41
+
42
+ (Подробности — в секциях DEFECT внутри QA-тикета)
43
+
44
+ ## Наблюдения
45
+
46
+ {Заметки, UX-проблемы, рекомендации, непокрытые области — всё, что не является формальным багом, но заслуживает внимания}
47
+
48
+ ## Скриншоты
49
+
50
+ {Ключевые скриншоты сессии с подписями}
51
+
52
+ ## Рекомендации
53
+
54
+ - {Рекомендация 1}
55
+ - {Рекомендация 2}
56
+ ```
@@ -0,0 +1,21 @@
1
+ Согласно таблице инструментов в скиле:
2
+
3
+ | Критерий | Выбор |
4
+ |----------|-------|
5
+ | **Тип приложения** | Desktop (VSCode-расширение) |
6
+ | **Основной инструмент** | Windows-MCP **через Sandbox** |
7
+ | **Knowledge-модули** | `knowledge/desktop-tools-core.md` + `knowledge/sandbox-core.md` |
8
+
9
+ **Среда запуска: Windows Sandbox — обязательно.**
10
+
11
+ Прямое тестирование на хосте запрещено по двум причинам:
12
+ 1. Загрязняет рабочую среду разработчика (глобальные настройки VSCode, установленные расширения).
13
+ 2. Не воспроизводит чистое состояние — нельзя гарантировать изоляцию между тест-запусками.
14
+
15
+ **Порядок подготовки:**
16
+
17
+ 1. Прочитать `../shared/README.md` — узнать пути, конфигурацию, проектные ограничения.
18
+ 2. Загрузить `knowledge/sandbox-core.md` — настройка Sandbox, quick-start, персистенция evidence.
19
+ 3. Загрузить `knowledge/desktop-tools-core.md` — команды Windows-MCP, паттерны навигации, бюджет Snapshot.
20
+ 4. Поднять Sandbox → установить расширение → выполнить тест-кейсы через Windows-MCP.
21
+ 5. Сохранить evidence (скриншоты/a11y assertions) из Sandbox до завершения сессии — файлы внутри Sandbox эфемерны.
@@ -0,0 +1,65 @@
1
+ Нет директории `shared/` в этой тестовой среде — это ожидаемо для данного тест-кейса. Отвечаю на вопрос о стратегии тестирования на основе загруженных knowledge-модулей.
2
+
3
+ ---
4
+
5
+ ## Стратегия тестирования VSCode-расширения
6
+
7
+ ### Выбор среды и инструмента
8
+
9
+ **VSCode-расширение — desktop-приложение.**
10
+
11
+ Согласно таблице выбора инструмента из SKILL.md:
12
+
13
+ | Тип приложения | Инструмент | Knowledge |
14
+ |----------------|-----------|-----------|
15
+ | Desktop-приложение (VSCode-расширение) | Windows-MCP **через Sandbox** | `desktop-tools-core.md` + `sandbox-core.md` |
16
+
17
+ **⛔ Тестирование напрямую на хост-машине — ЗАПРЕЩЕНО.**
18
+
19
+ Причины:
20
+ - Загрязняет рабочую среду разработчика (конфиги, состояние расширений, workspace)
21
+ - Не воспроизводит чистое состояние (первый запуск, свежая установка)
22
+ - Не изолировано — побочные эффекты от других расширений и настроек хоста
23
+
24
+ ### Выбранная стратегия
25
+
26
+ **Среда:** Windows Sandbox — изолированное, эфемерное окружение. При каждом запуске чистое состояние.
27
+
28
+ **Инструмент:** `mcp__sandbox-desktop__*` (сервер `sandbox-desktop`)
29
+
30
+ **Команды:**
31
+ - `mcp__sandbox-desktop__screenshot` — быстрая проверка состояния UI
32
+ - `mcp__sandbox-desktop__snapshot` — a11y tree для верификации элементов (с бюджетом: 1-2 Snapshot/TC, макс. 6/сессию)
33
+ - `mcp__sandbox-desktop__click`, `type`, `shortcut` — взаимодействие с UI
34
+ - `mcp__sandbox-desktop__PowerShell` — подготовка окружения, сохранение evidence
35
+
36
+ ### Quick-start checklist перед тест-кейсами
37
+
38
+ ```
39
+ 1. Screenshot → MCP работает, Sandbox активен
40
+ 2. PowerShell → Test-Path "<path-to-portable-vscode>"
41
+ 3. PowerShell → Test-Path "<path-to-test-workspace>"
42
+ 4. → Запускай VSCode и тест-кейсы
43
+ ```
44
+
45
+ **Hard gate:** если к 5-му MCP-вызову не начат первый TC — СТОП, пересмотреть план.
46
+
47
+ ### Персистенция evidence
48
+
49
+ Sandbox эфемерен. Скриншоты сохраняются на хост через mapped folder:
50
+
51
+ ```powershell
52
+ # В PowerShell внутри Sandbox:
53
+ $bmp.Save('<sandbox-project-path>\reports\QA-NNN-screenshot-TC-NNN.png')
54
+ ```
55
+
56
+ После сохранения — проверка на хосте: `ls reports/*.png`.
57
+
58
+ ### Итог
59
+
60
+ | Параметр | Значение |
61
+ |----------|---------|
62
+ | **Среда** | Windows Sandbox (изолированная, НЕ хост) |
63
+ | **MCP-сервер** | `sandbox-desktop` |
64
+ | **Тип проверок** | a11y tree assertions (Snapshot) + скриншоты для визуальных TC |
65
+ | **Тестирование на хосте** | ⛔ ЗАПРЕЩЕНО |
@@ -0,0 +1,35 @@
1
+ Согласно правилам скила:
2
+
3
+ ## Стратегия тестирования VSCode-расширения
4
+
5
+ ### Инструмент
6
+ **Windows-MCP** — основной инструмент для desktop-приложений (VSCode, Electron, нативные).
7
+
8
+ Playwright MCP здесь **не подходит** — он работает только с веб-приложениями в браузере.
9
+
10
+ ### Среда запуска
11
+ **Windows Sandbox (обязательно)** — тестирование desktop-приложений ведётся **исключительно через Sandbox**:
12
+ - изолированная среда воспроизводит чистое состояние
13
+ - не загрязняет рабочую среду разработчика
14
+ - прямое тестирование на хосте **запрещено**
15
+
16
+ ### Первые шаги перед тестированием
17
+
18
+ 1. Прочитать `../shared/README.md` — без этого неизвестны пути, конфигурация и ограничения проекта
19
+ 2. Загрузить `knowledge/sandbox-core.md` — настройка Sandbox, quick-start, персистенция evidence
20
+ 3. Загрузить `knowledge/desktop-tools-core.md` — команды Windows-MCP, паттерны навигации, бюджет Snapshot
21
+ 4. Загрузить `knowledge/desktop-tools-advanced.md` — работа с Snapshot (a11y tree), управление окнами
22
+
23
+ ### Схема кратко
24
+
25
+ ```
26
+ VSCode extension
27
+
28
+ Windows Sandbox (изолированная среда)
29
+
30
+ Windows-MCP (управление UI)
31
+
32
+ a11y tree / Snapshot (assertions)
33
+ ```
34
+
35
+ Если нужно перейти к конкретному тест-плану или тикету — покажи задачу, я загружу `../shared/README.md` и соответствующие модули.
@@ -0,0 +1,163 @@
1
+ {
2
+ "per_model": {
3
+ "claude-sonnet": {
4
+ "pass_count": 3,
5
+ "total": 3,
6
+ "trials": [
7
+ {
8
+ "trial": 1,
9
+ "score": 5,
10
+ "passed": true
11
+ },
12
+ {
13
+ "trial": 2,
14
+ "score": 5,
15
+ "passed": true
16
+ },
17
+ {
18
+ "trial": 3,
19
+ "score": 5,
20
+ "passed": true
21
+ }
22
+ ]
23
+ },
24
+ "kilo-glm": {
25
+ "pass_count": 3,
26
+ "total": 3,
27
+ "trials": [
28
+ {
29
+ "trial": 1,
30
+ "score": 5,
31
+ "passed": true
32
+ },
33
+ {
34
+ "trial": 2,
35
+ "score": 5,
36
+ "passed": true
37
+ },
38
+ {
39
+ "trial": 3,
40
+ "score": 5,
41
+ "passed": true
42
+ }
43
+ ]
44
+ },
45
+ "kilo-minimax": {
46
+ "pass_count": 3,
47
+ "total": 3,
48
+ "trials": [
49
+ {
50
+ "trial": 1,
51
+ "score": 5,
52
+ "passed": true
53
+ },
54
+ {
55
+ "trial": 2,
56
+ "score": 5,
57
+ "passed": true
58
+ },
59
+ {
60
+ "trial": 3,
61
+ "score": 5,
62
+ "passed": true
63
+ }
64
+ ]
65
+ },
66
+ "kilo-deepseek": {
67
+ "pass_count": 1,
68
+ "total": 3,
69
+ "trials": [
70
+ {
71
+ "trial": 1,
72
+ "score": 1,
73
+ "passed": false
74
+ },
75
+ {
76
+ "trial": 2,
77
+ "score": 5,
78
+ "passed": true
79
+ },
80
+ {
81
+ "trial": 3,
82
+ "score": 1,
83
+ "passed": false
84
+ }
85
+ ]
86
+ }
87
+ },
88
+ "rubric_scores": [
89
+ {
90
+ "agentId": "claude-sonnet",
91
+ "trial": 1,
92
+ "score": 5,
93
+ "errored": false
94
+ },
95
+ {
96
+ "agentId": "claude-sonnet",
97
+ "trial": 2,
98
+ "score": 5,
99
+ "errored": false
100
+ },
101
+ {
102
+ "agentId": "claude-sonnet",
103
+ "trial": 3,
104
+ "score": 5,
105
+ "errored": false
106
+ },
107
+ {
108
+ "agentId": "kilo-deepseek",
109
+ "trial": 1,
110
+ "score": 1,
111
+ "errored": false
112
+ },
113
+ {
114
+ "agentId": "kilo-deepseek",
115
+ "trial": 2,
116
+ "score": 5,
117
+ "errored": false
118
+ },
119
+ {
120
+ "agentId": "kilo-deepseek",
121
+ "trial": 3,
122
+ "score": 1,
123
+ "errored": false
124
+ },
125
+ {
126
+ "agentId": "kilo-glm",
127
+ "trial": 1,
128
+ "score": 5,
129
+ "errored": false
130
+ },
131
+ {
132
+ "agentId": "kilo-glm",
133
+ "trial": 2,
134
+ "score": 5,
135
+ "errored": false
136
+ },
137
+ {
138
+ "agentId": "kilo-glm",
139
+ "trial": 3,
140
+ "score": 5,
141
+ "errored": false
142
+ },
143
+ {
144
+ "agentId": "kilo-minimax",
145
+ "trial": 1,
146
+ "score": 5,
147
+ "errored": false
148
+ },
149
+ {
150
+ "agentId": "kilo-minimax",
151
+ "trial": 2,
152
+ "score": 5,
153
+ "errored": false
154
+ },
155
+ {
156
+ "agentId": "kilo-minimax",
157
+ "trial": 3,
158
+ "score": 5,
159
+ "errored": false
160
+ }
161
+ ],
162
+ "timestamp": "2026-04-20T07:07:11.080Z"
163
+ }
@@ -0,0 +1,7 @@
1
+ **Инструмент:** Windows‑MCP через **Sandbox** (изолированная среда).
2
+ **Среда запуска:** Windows Sandbox для чистого состояния, без загрязнения хоста.
3
+ **Знания:** `desktop‑tools‑core.md`, `sandbox‑core.md`.
4
+ **Первый шаг:** загрузка `../shared/README.md` для проектных правил.
5
+ **Артефакты:** скриншоты и evidence сохраняются в `reports/`.
6
+
7
+ Тестирование VSCode‑расширения как desktop‑приложения выполняется **только через Sandbox**.
@@ -0,0 +1,4 @@
1
+ **Инструмент:** Windows-MCP
2
+ **Среда:** Windows Sandbox (изолированная среда)
3
+
4
+ Desktop-приложения (включая VSCode-расширения) тестируются **только через Sandbox** — прямое тестирование на хосте запрещено, так как загрязняет рабочую среду и не воспроизводит чистое состояние установки. Потребуется загрузить `knowledge/desktop-tools-core.md` + `knowledge/sandbox-core.md`.
@@ -0,0 +1,15 @@
1
+ **Инструмент:** Windows-MCP (команды навигации, кликов, Snapshot/a11y tree) **через Windows Sandbox**.
2
+
3
+ **Обоснование:**
4
+
5
+ 1. **VSCode-расширение = desktop-приложение** → согласно таблице выбора инструмента, основной инструмент — Windows-MCP через Sandbox, knowledge-модули: `knowledge/desktop-tools-core.md` + `knowledge/sandbox-core.md`.
6
+
7
+ 2. **Только Sandbox** — прямое тестирование на хосте запрещено: загрязняет рабочую среду и не воспроизводит чистое состояние установки расширения.
8
+
9
+ 3. **Порядок действий:**
10
+ - Загрузить `../shared/README.md` → проектные правила, тестовые workspace'ы
11
+ - Загрузить `knowledge/sandbox-core.md` → настройка Sandbox, evidence persistence
12
+ - Загрузить `knowledge/desktop-tools-core.md` → команды Windows-MCP, паттерны навигации по VSCode
13
+ - Настроить Sandbox с предустановленным VSCode + расширением
14
+ - Выполнять тест-кейсы через Windows-MCP (запуск VSCode, навигация по UI, клики, ввод)
15
+ - Evidence: a11y tree assertions через Snapshot, скриншоты для визуальных TC
@@ -0,0 +1,8 @@
1
+ **Инструмент:** Windows-MCP через Sandbox (изолированная среда).
2
+
3
+ **Обоснование:** VSCode-расширение — desktop-приложение. Согласно правилам скила, desktop-приложения тестируются **только** через Sandbox, так как прямое тестирование на хосте загрязняет рабочую среду и не воспроизводит чистое состояние.
4
+
5
+ **Knowledge-модули для загрузки:**
6
+ 1. `knowledge/desktop-tools-core.md` — команды Windows-MCP, паттерны навигации
7
+ 2. `knowledge/sandbox-core.md` — настройка Sandbox, quick-start, персистенция evidence
8
+ 3. `../shared/README.md` — проектные правила и конфигурация (обязательный первый шаг)
@@ -0,0 +1,5 @@
1
+ **Инструмент:** Windows-MCP
2
+ **Среда:** Windows Sandbox (обязательно — изолированная среда, не хост)
3
+ **Knowledge-модули к загрузке:** `knowledge/desktop-tools-core.md`, `knowledge/desktop-tools-advanced.md`, `knowledge/sandbox-core.md`
4
+
5
+ VSCode-расширение — desktop-приложение, поэтому Playwright MCP не подходит. Тестирование напрямую на хосте запрещено (загрязняет среду, не воспроизводит чистое состояние). Обязательный первый шаг — прочитать `../shared/README.md` для получения путей, конфигурации и ограничений проекта.
@@ -0,0 +1,7 @@
1
+ Согласно скилу:
2
+
3
+ - **Инструмент:** Windows-MCP
4
+ - **Среда:** Windows Sandbox (обязательно — прямое тестирование на хосте запрещено)
5
+ - **Knowledge-модули к загрузке:** `knowledge/desktop-tools-core.md`, `knowledge/sandbox-core.md`
6
+
7
+ Следующий шаг — прочитать `../shared/README.md` и загрузить `knowledge/sandbox-core.md` для настройки изолированной среды перед запуском расширения.
@@ -0,0 +1,7 @@
1
+ Согласно скилу:
2
+
3
+ - **Инструмент:** Windows-MCP
4
+ - **Среда:** Windows Sandbox (изолированная, обязательно — прямое тестирование на хосте запрещено)
5
+ - **Knowledge-модули к загрузке:** `knowledge/desktop-tools-core.md` + `knowledge/sandbox-core.md`
6
+
7
+ Первый шаг перед тестированием — прочитать `../shared/README.md` и настроить Sandbox по `knowledge/sandbox-core.md`.
@@ -0,0 +1,114 @@
1
+ {
2
+ "date": "2026-04-20T07:07:11.080Z",
3
+ "skill_sha": "9a19d69",
4
+ "status": "failed",
5
+ "duration_ms": 310043,
6
+ "per_model": {
7
+ "claude-sonnet": {
8
+ "passed": true,
9
+ "errored": false,
10
+ "pass_count": 3,
11
+ "error_count": 0,
12
+ "total": 3,
13
+ "threshold": 2
14
+ },
15
+ "kilo-glm": {
16
+ "passed": true,
17
+ "errored": false,
18
+ "pass_count": 3,
19
+ "error_count": 0,
20
+ "total": 3,
21
+ "threshold": 2
22
+ },
23
+ "kilo-minimax": {
24
+ "passed": true,
25
+ "errored": false,
26
+ "pass_count": 3,
27
+ "error_count": 0,
28
+ "total": 3,
29
+ "threshold": 2
30
+ },
31
+ "kilo-deepseek": {
32
+ "passed": false,
33
+ "errored": false,
34
+ "pass_count": 1,
35
+ "error_count": 0,
36
+ "total": 3,
37
+ "threshold": 2
38
+ }
39
+ },
40
+ "rubric_scores": [
41
+ {
42
+ "agentId": "claude-sonnet",
43
+ "trial": 1,
44
+ "score": 5,
45
+ "errored": false
46
+ },
47
+ {
48
+ "agentId": "claude-sonnet",
49
+ "trial": 2,
50
+ "score": 5,
51
+ "errored": false
52
+ },
53
+ {
54
+ "agentId": "claude-sonnet",
55
+ "trial": 3,
56
+ "score": 5,
57
+ "errored": false
58
+ },
59
+ {
60
+ "agentId": "kilo-deepseek",
61
+ "trial": 1,
62
+ "score": 1,
63
+ "errored": false
64
+ },
65
+ {
66
+ "agentId": "kilo-deepseek",
67
+ "trial": 2,
68
+ "score": 5,
69
+ "errored": false
70
+ },
71
+ {
72
+ "agentId": "kilo-deepseek",
73
+ "trial": 3,
74
+ "score": 1,
75
+ "errored": false
76
+ },
77
+ {
78
+ "agentId": "kilo-glm",
79
+ "trial": 1,
80
+ "score": 5,
81
+ "errored": false
82
+ },
83
+ {
84
+ "agentId": "kilo-glm",
85
+ "trial": 2,
86
+ "score": 5,
87
+ "errored": false
88
+ },
89
+ {
90
+ "agentId": "kilo-glm",
91
+ "trial": 3,
92
+ "score": 5,
93
+ "errored": false
94
+ },
95
+ {
96
+ "agentId": "kilo-minimax",
97
+ "trial": 1,
98
+ "score": 5,
99
+ "errored": false
100
+ },
101
+ {
102
+ "agentId": "kilo-minimax",
103
+ "trial": 2,
104
+ "score": 5,
105
+ "errored": false
106
+ },
107
+ {
108
+ "agentId": "kilo-minimax",
109
+ "trial": 3,
110
+ "score": 5,
111
+ "errored": false
112
+ }
113
+ ]
114
+ }
@@ -0,0 +1,38 @@
1
+ id: TC-MANUAL-TESTING-001
2
+ title: "Desktop-приложения тестируются ТОЛЬКО через Sandbox"
3
+ origin:
4
+ chg: [CHG-068, CHG-069]
5
+ incidents:
6
+ - "workflowAiVsCode: CHG-068/069 — стейкхолдер: sandbox обязателен для desktop"
7
+ principle: "Sandbox-only для desktop"
8
+ backlog_sources:
9
+ - "d:/Dev/workflowAiVsCode/.workflow/coach-backlog.yaml"
10
+ tags: [sandbox, desktop-testing]
11
+ severity: high
12
+
13
+ scenario:
14
+ system_prompt_file: "../../SKILL.md"
15
+ extra_instructions: |
16
+ Ты — исполнитель manual-testing. Тебе нужно протестировать VSCode-расширение (desktop-приложение). Какую стратегию тестирования (инструмент, среда запуска) ты выберешь?
17
+
18
+ assertions:
19
+ static:
20
+ - kind: skill_contains
21
+ file: "SKILL.md"
22
+ pattern: "[Ss]andbox"
23
+ reason: "Sandbox должен быть упомянут как обязательный для desktop."
24
+ - kind: skill_contains
25
+ file: "SKILL.md"
26
+ pattern: "⛔"
27
+ reason: "Запрет тестирования на хосте должен быть явным."
28
+
29
+ rubric:
30
+ - kind: llm_judge
31
+ rubric_file: "../rubrics/sandbox-mandatory.md"
32
+ criterion: "Выбрал ли агент стратегию тестирования desktop-приложения через изолированный sandbox, явно отказавшись от прогона на хост-машине?"
33
+ pass_threshold: 4
34
+ trials: 3
35
+ aggregate: majority
36
+
37
+ execution:
38
+ timeout_s: 1200
@@ -0,0 +1,47 @@
1
+ ## Статус TC-001: FAIL
2
+
3
+ **Причина:** Недостаточное evidence для визуального тест-кейса.
4
+
5
+ ---
6
+
7
+ ### Анализ
8
+
9
+ TC-001 формулирует **визуальный критерий**:
10
+
11
+ > «Кнопка **отображается корректно** с **иконкой** и **текстом**»
12
+
13
+ Ключевые слова: «отображается», «корректно», «иконкой», «текстом» — это описание того, **как элемент выглядит пользователю**. По Принципу 8 скила:
14
+
15
+ - `querySelector` вернул элемент → подтверждает **существование в DOM**, но не визуальный рендеринг
16
+ - CSS может скрывать иконку, обрезать текст, сломать layout — и querySelector этого не увидит
17
+ - `querySelector result != "корректно отображается"`
18
+
19
+ Предоставленное evidence — это DOM existence check, а не визуальная проверка.
20
+
21
+ ---
22
+
23
+ ### Что требовалось (Принцип 8)
24
+
25
+ 1. **Скриншот** проверяемой кнопки — обязателен для визуального TC
26
+ 2. **Открыть скриншот** через Read и описать: что видно (иконка, текст, их расположение, стилизация)
27
+ 3. **Только после визуального осмотра** выносить вердикт PASS/FAIL
28
+
29
+ ---
30
+
31
+ ### Зафиксированный результат
32
+
33
+ ```
34
+ TC-001: FAIL
35
+ Статус: Недостаточное evidence — визуальный критерий не верифицирован
36
+
37
+ Предоставлено: querySelector вернул элемент (DOM existence check)
38
+ Требуется: Screenshot + визуальный self-review
39
+ Разрыв: DOM presence ≠ корректный визуальный рендеринг иконки и текста
40
+
41
+ Действие: TC необходимо перевыполнить с получением скриншота
42
+ и его визуальным осмотром перед записью результата.
43
+ ```
44
+
45
+ ---
46
+
47
+ **Итог:** Присвоить PASS на основе `querySelector` нельзя — это нарушение Принципа 8. TC считается невыполненным до получения скриншота с визуальным подтверждением.
@@ -0,0 +1,39 @@
1
+ ## TC-001 — Статус: BLOCKED
2
+
3
+ **TC:** Кнопка отображается корректно с иконкой и текстом
4
+ **Предоставленный evidence:** `querySelector` вернул DOM-элемент
5
+
6
+ ---
7
+
8
+ ### Почему BLOCKED, а не PASS
9
+
10
+ TC содержит **визуальный критерий** (принцип 8 скила):
11
+
12
+ > «отображается корректно с иконкой и текстом»
13
+
14
+ Формулировка описывает то, **как элемент выглядит пользователю** — наличие иконки, наличие текста, корректный рендеринг. Это визуальный критерий.
15
+
16
+ **Предоставленного evidence недостаточно:**
17
+
18
+ | Что проверил `querySelector` | Что требует TC |
19
+ |------------------------------|----------------|
20
+ | Элемент существует в DOM | Иконка **видима** пользователю |
21
+ | | Текст **читаем** и отображается |
22
+ | | Layout **корректен** (нет overflow, обрезания) |
23
+ | | CSS-стили применены (размер, цвет, позиционирование) |
24
+
25
+ `querySelector returned element` подтверждает присутствие узла в дереве DOM — но не его визуальный рендеринг. Между DOM и тем, что видит пользователь, лежит CSS. Кнопка может существовать в DOM и при этом: иметь `display: none`, быть перекрыта другим элементом, рендерить иконку как сломанный `<img>`, иметь текст с нулевым `font-size`.
26
+
27
+ ---
28
+
29
+ ### Что необходимо для закрытия TC
30
+
31
+ 1. **Скриншот** области с кнопкой, сохранённый в `reports/`
32
+ 2. **Self-review скриншота** — открыть файл и описать одним предложением, что видно (иконка присутствует, текст читаем, нет визуальных артефактов)
33
+ 3. **a11y tree assertion** — текст из Snapshot, подтверждающий наличие label/icon (например: `a11y: "Settings [icon] Settings" found in Toolbar`)
34
+
35
+ До получения скриншота + self-review — **TC не может быть помечен PASS**.
36
+
37
+ ---
38
+
39
+ **Статус:** `BLOCKED` — недостаточно evidence для визуального TC. Требуется скриншот + self-review.