workflow-ai 1.0.63 → 1.0.65
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +239 -145
- package/configs/agent-health-rules.yaml +64 -0
- package/configs/config.yaml +134 -0
- package/configs/pipeline.yaml +901 -0
- package/configs/ticket-movement-rules.yaml +80 -0
- package/package.json +1 -1
- package/src/global-dir.mjs +25 -1
- package/src/init.mjs +20 -3
- package/src/lib/agent-health-registry.mjs +245 -0
- package/src/lib/artifact-snapshot.mjs +233 -0
- package/src/lib/error-classifier.mjs +274 -0
- package/src/lib/test-error-classifier.mjs +60 -0
- package/src/lib/test-extends.mjs +58 -0
- package/src/lib/test-version.mjs +21 -0
- package/src/scripts/move-to-review.js +5 -7
- package/src/scripts/reset-agent-health.js +62 -0
- package/src/scripts/run-skill-tests.js +348 -136
- package/src/skills/analyze-report/README.md +44 -0
- package/src/skills/analyze-report/SKILL.md +121 -0
- package/src/skills/analyze-report/algorithms/progress-assessment.md +108 -0
- package/src/skills/analyze-report/knowledge/analysis-frameworks.md +66 -0
- package/src/skills/analyze-report/knowledge/report-structure.md +61 -0
- package/src/skills/analyze-report/scripts/calc-plan-metrics.js +234 -0
- package/src/skills/analyze-report/templates/analysis-report.md +80 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-1.md +69 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-2.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-3.md +99 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/judge.json +163 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-1.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-2.md +88 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-3.md +100 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-1.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-2.md +64 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-3.md +110 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-1.md +74 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-2.md +38 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-3.md +61 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/meta.json +115 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001-evidence-from-log.yaml +60 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-1.md +90 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-2.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-3.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/judge.json +163 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-1.md +84 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-2.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-3.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-1.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-2.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-3.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-1.md +93 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-2.md +93 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-3.md +86 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/meta.json +115 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002-result-block-format.yaml +44 -0
- package/src/skills/analyze-report/tests/fixtures/REPORT-002-incorrect-attribution.md +27 -0
- package/src/skills/analyze-report/tests/fixtures/pipeline-2026-04-06_qa-001-skip.log +32 -0
- package/src/skills/analyze-report/tests/index.yaml +25 -0
- package/src/skills/analyze-report/tests/rubrics/evidence-from-log.md +22 -0
- package/src/skills/analyze-report/tests/rubrics/result-block-format.md +22 -0
- package/src/skills/analyze-report/workflows/progress.md +158 -0
- package/src/skills/analyze-report/workflows/retrospective.md +143 -0
- package/src/skills/coach/README.md +43 -0
- package/src/skills/coach/SKILL.md +167 -0
- package/src/skills/coach/SKILL.md.legacy +157 -0
- package/src/skills/coach/algorithms/gap-analysis.md +69 -0
- package/src/skills/coach/algorithms/improvement-prioritization.md +62 -0
- package/src/skills/coach/algorithms/skill-scoring.md +80 -0
- package/src/skills/coach/knowledge/audit-applied-changes-clean.txt +11 -0
- package/src/skills/coach/knowledge/backlog-management.md +67 -0
- package/src/skills/coach/knowledge/backlog-management.md.legacy +90 -0
- package/src/skills/coach/knowledge/common-antipatterns.md +76 -0
- package/src/skills/coach/knowledge/prompt-engineering.md +45 -0
- package/src/skills/coach/knowledge/shared-knowledge-guide.md +44 -0
- package/src/skills/coach/knowledge/skill-anatomy.md +49 -0
- package/src/skills/coach/knowledge/test-authorship.md +141 -0
- package/src/skills/coach/templates/audit-report.md +39 -0
- package/src/skills/coach/templates/coach-backlog-init.yaml +14 -0
- package/src/skills/coach/templates/coach-backlog-init.yaml.legacy +10 -0
- package/src/skills/coach/templates/improvement-plan.md +42 -0
- package/src/skills/coach/templates/new-skill.md +95 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-1.md +58 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-2.md +65 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-3.md +58 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/judge.json +151 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-1.md +46 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-2.md +0 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-3.md +75 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-1.md +81 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-2.md +101 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-3.md +91 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-1.md +48 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-2.md +30 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-3.md +55 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/meta.json +94 -0
- package/src/skills/coach/tests/cases/TC-COACH-001-evidence-based-temporal-diagram.yaml +53 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-1.md +46 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-2.md +50 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-3.md +48 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/judge.json +151 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-2.md +37 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-3.md +30 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-1.md +23 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-2.md +29 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-3.md +35 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-1.md +13 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-2.md +19 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-3.md +33 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/meta.json +94 -0
- package/src/skills/coach/tests/cases/TC-COACH-002-root-cause-first.yaml +57 -0
- package/src/skills/coach/tests/fixtures/pipeline-2026-04-06_id-collision.log +77 -0
- package/src/skills/coach/tests/index.yaml +29 -0
- package/src/skills/coach/tests/rubrics/calibration/evidence-based-bad.md +13 -0
- package/src/skills/coach/tests/rubrics/calibration/evidence-based-good.md +29 -0
- package/src/skills/coach/tests/rubrics/evidence-based.md +26 -0
- package/src/skills/coach/tests/rubrics/root-cause-first.md +21 -0
- package/src/skills/coach/workflows/analyze.md +79 -0
- package/src/skills/coach/workflows/analyze.md.legacy +64 -0
- package/src/skills/coach/workflows/audit.md +74 -0
- package/src/skills/coach/workflows/audit.md.legacy +59 -0
- package/src/skills/coach/workflows/create.md +80 -0
- package/src/skills/coach/workflows/create.md.legacy +67 -0
- package/src/skills/coach/workflows/improve.md +71 -0
- package/src/skills/coach/workflows/improve.md.legacy +60 -0
- package/src/skills/coach/workflows/research.md +55 -0
- package/src/skills/coach/workflows/review.md +52 -0
- package/src/skills/coach/workflows/review.md.legacy +48 -0
- package/src/skills/coach/workflows/test.md +97 -0
- package/src/skills/create-plan/README.md +39 -0
- package/src/skills/create-plan/SKILL.md +104 -0
- package/src/skills/create-plan/algorithms/risk-assessment.md +73 -0
- package/src/skills/create-plan/knowledge/plan-completeness.md +67 -0
- package/src/skills/create-plan/knowledge/plan-lifecycle.md +33 -0
- package/src/skills/create-plan/knowledge/task-verification-pairs.md +151 -0
- package/src/skills/create-plan/scripts/validate-completeness.js +182 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-1.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-2.md +39 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-3.md +35 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/judge.json +167 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-2.md +10 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-3.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-1.md +26 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-2.md +86 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-3.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-1.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-2.md +15 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-3.md +14 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/meta.json +119 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001-validate-completeness.yaml +41 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-1.md +25 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-3.md +37 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/judge.json +164 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-1.md +3 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-2.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-3.md +13 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-1.md +44 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-2.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-3.md +49 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-1.md +6 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-2.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-3.md +16 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/meta.json +116 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002-task-granularity.yaml +39 -0
- package/src/skills/create-plan/tests/index.yaml +25 -0
- package/src/skills/create-plan/tests/rubrics/task-granularity.md +21 -0
- package/src/skills/create-plan/tests/rubrics/validate-completeness.md +21 -0
- package/src/skills/create-plan/workflows/create.md +136 -0
- package/src/skills/create-report/README.md +40 -0
- package/src/skills/create-report/SKILL.md +73 -0
- package/src/skills/create-report/algorithms/metric-calculation.md +93 -0
- package/src/skills/create-report/knowledge/report-metrics.md +82 -0
- package/src/skills/create-report/scripts/calc-metrics.js +383 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-1.md +25 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-2.md +26 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-3.md +28 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/judge.json +163 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-1.md +4 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-2.md +3 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-3.md +6 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-1.md +8 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-2.md +12 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-3.md +7 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-1.md +12 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-2.md +22 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-3.md +13 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/meta.json +115 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001-root-cause-attribution.yaml +57 -0
- package/src/skills/create-report/tests/index.yaml +20 -0
- package/src/skills/create-report/tests/rubrics/root-cause-attribution.md +21 -0
- package/src/skills/create-report/workflows/standard.md +175 -0
- package/src/skills/decompose-gaps/README.md +39 -0
- package/src/skills/decompose-gaps/SKILL.md +78 -0
- package/src/skills/decompose-gaps/algorithms/scope-check.md +110 -0
- package/src/skills/decompose-gaps/knowledge/scope-validation.md +65 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-1.md +41 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-2.md +41 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-3.md +56 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/judge.json +164 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-1.md +25 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-2.md +17 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-3.md +22 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-1.md +25 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-2.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-3.md +29 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-1.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-2.md +35 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-3.md +18 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/meta.json +116 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001-scope-exclusion.yaml +46 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-1.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-3.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/judge.json +163 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-2.md +15 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-3.md +7 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-1.md +21 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-2.md +38 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-3.md +16 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-1.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-2.md +10 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-3.md +9 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/meta.json +115 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002-glob-before-write.yaml +36 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-1.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/judge.json +165 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-2.md +26 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-3.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-1.md +39 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-2.md +37 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-3.md +45 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-1.md +26 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-2.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-3.md +7 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/meta.json +117 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003-parent-plan-mandatory.yaml +41 -0
- package/src/skills/decompose-gaps/tests/index.yaml +30 -0
- package/src/skills/decompose-gaps/tests/rubrics/glob-before-write.md +21 -0
- package/src/skills/decompose-gaps/tests/rubrics/parent-plan-mandatory.md +22 -0
- package/src/skills/decompose-gaps/tests/rubrics/scope-exclusion.md +21 -0
- package/src/skills/decompose-gaps/workflows/decompose.md +123 -0
- package/src/skills/decompose-plan/README.md +43 -0
- package/src/skills/decompose-plan/SKILL.md +87 -0
- package/src/skills/decompose-plan/algorithms/deduplication.md +101 -0
- package/src/skills/decompose-plan/knowledge/atomicity-checklist.md +139 -0
- package/src/skills/decompose-plan/knowledge/capabilities.md +68 -0
- package/src/skills/decompose-plan/knowledge/human-task-rules.md +82 -0
- package/src/skills/decompose-plan/knowledge/scope-guard-checklist.md +73 -0
- package/src/skills/decompose-plan/scripts/check-atomicity-limit.js +47 -0
- package/src/skills/decompose-plan/scripts/check-duplicates.js +323 -0
- package/src/skills/decompose-plan/scripts/verify-atomicity.js +408 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-1.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-2.md +36 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-3.md +37 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-1.md +20 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-2.md +17 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-3.md +28 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-1.md +114 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-2.md +137 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-3.md +188 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-2.md +32 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-3.md +110 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001-atomicity-no-1to1.yaml +56 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-1.md +47 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-2.md +54 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-3.md +43 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-1.md +15 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-2.md +5 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-3.md +12 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-1.md +34 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-2.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-3.md +35 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-2.md +31 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002-get-next-id-mandatory.yaml +44 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-1.md +21 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-2.md +38 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-1.md +31 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-2.md +35 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-3.md +48 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-1.md +167 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-2.md +62 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-3.md +174 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-2.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003-verbatim-dod-transfer.yaml +42 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-1.md +55 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-2.md +49 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-3.md +49 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-1.md +104 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-2.md +45 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-3.md +58 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-1.md +193 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-2.md +202 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-3.md +155 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-1.md +52 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-2.md +17 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004-executor-atomicity.yaml +64 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-1.md +59 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-2.md +204 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-3.md +213 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-2.md +57 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-3.md +54 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-1.md +147 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-2.md +165 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-3.md +133 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-1.md +81 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-2.md +108 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-3.md +3 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/meta.json +114 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005-capabilities-registry.yaml +78 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-1.md +225 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-2.md +66 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-3.md +36 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-1.md +42 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-2.md +67 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-3.md +40 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-1.md +122 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-2.md +131 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-3.md +138 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-1.md +41 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-2.md +88 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006-dod-threshold.yaml +72 -0
- package/src/skills/decompose-plan/tests/index.yaml +45 -0
- package/src/skills/decompose-plan/tests/rubrics/atomicity-no-1to1.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/capabilities-registry.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/dod-threshold.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/executor-atomicity.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/get-next-id-mandatory.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/verbatim-dod-transfer.md +21 -0
- package/src/skills/decompose-plan/workflows/decompose.md +305 -0
- package/src/skills/deep-research/README.md +36 -0
- package/src/skills/deep-research/SKILL.md +106 -0
- package/src/skills/deep-research/algorithms/source-scoring.md +63 -0
- package/src/skills/deep-research/algorithms/synthesis.md +67 -0
- package/src/skills/deep-research/knowledge/data-validation.md +44 -0
- package/src/skills/deep-research/knowledge/perplexity-config.md +30 -0
- package/src/skills/deep-research/knowledge/research-methodology.md +54 -0
- package/src/skills/deep-research/knowledge/source-evaluation.md +33 -0
- package/src/skills/deep-research/scripts/perplexity-research.js +315 -0
- package/src/skills/deep-research/templates/brief-summary.md +25 -0
- package/src/skills/deep-research/templates/research-report.md +76 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-1.md +48 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-2.md +88 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-3.md +56 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/judge.json +163 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-1.md +58 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-2.md +249 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-3.md +44 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-1.md +96 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-2.md +56 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-3.md +94 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-1.md +11 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-2.md +1 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-3.md +1 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/meta.json +115 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001-self-check-url.yaml +58 -0
- package/src/skills/deep-research/tests/index.yaml +20 -0
- package/src/skills/deep-research/tests/rubrics/self-check-url.md +34 -0
- package/src/skills/deep-research/workflows/base-checklist.md +19 -0
- package/src/skills/deep-research/workflows/benchmark.md +38 -0
- package/src/skills/deep-research/workflows/competitor.md +44 -0
- package/src/skills/deep-research/workflows/custom.md +32 -0
- package/src/skills/deep-research/workflows/market.md +44 -0
- package/src/skills/deep-research/workflows/technology.md +40 -0
- package/src/skills/deep-research/workflows/trend.md +40 -0
- package/src/skills/execute-task/README.md +44 -0
- package/src/skills/execute-task/SKILL.md +292 -0
- package/src/skills/execute-task/algorithms/execution-strategy.md +136 -0
- package/src/skills/execute-task/knowledge/context-checkpoints.md +75 -0
- package/src/skills/execute-task/knowledge/ticket-structure.md +70 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-3.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/meta.json +88 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001-no-ticket-creation.yaml +48 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-2.md +6 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-3.md +8 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-1.md +9 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-2.md +26 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/meta.json +89 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002-no-duplicate-dod.yaml +44 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/judge.json +46 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/meta.json +37 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003-verification-proportionality.yaml +46 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-1.md +18 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-2.md +16 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-3.md +14 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-3.md +1 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-1.md +8 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/meta.json +89 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004-no-foreign-ticket-edit.yaml +50 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-1.md +15 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-1.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-2.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/meta.json +88 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005-ticket-fields-updated.yaml +39 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-902-create-file.md +41 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-904-current-task.md +40 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-906-fill-ticket.md +42 -0
- package/src/skills/execute-task/tests/fixtures/QA-901-button-click.md +41 -0
- package/src/skills/execute-task/tests/fixtures/QA-903-visual-figma.md +40 -0
- package/src/skills/execute-task/tests/fixtures/TASK-905-done-with-typo.md +36 -0
- package/src/skills/execute-task/tests/index.yaml +39 -0
- package/src/skills/execute-task/tests/rubrics/no-duplicate-dod.md +22 -0
- package/src/skills/execute-task/tests/rubrics/no-foreign-ticket-edit.md +20 -0
- package/src/skills/execute-task/tests/rubrics/no-ticket-creation.md +21 -0
- package/src/skills/execute-task/tests/rubrics/ticket-fields-updated.md +23 -0
- package/src/skills/execute-task/tests/rubrics/verification-proportionality.md +22 -0
- package/src/skills/execute-task/workflows/execute.md +104 -0
- package/src/skills/manual-testing/README.md +63 -0
- package/src/skills/manual-testing/SKILL.md +176 -0
- package/src/skills/manual-testing/algorithms/blocked-tool-strategy.md +74 -0
- package/src/skills/manual-testing/algorithms/bug-severity.md +73 -0
- package/src/skills/manual-testing/algorithms/mcp-budget.md +97 -0
- package/src/skills/manual-testing/algorithms/test-prioritization.md +69 -0
- package/src/skills/manual-testing/knowledge/browser-extension-testing.md +102 -0
- package/src/skills/manual-testing/knowledge/browser-tools.md +114 -0
- package/src/skills/manual-testing/knowledge/desktop-tools-advanced.md +92 -0
- package/src/skills/manual-testing/knowledge/desktop-tools-core.md +76 -0
- package/src/skills/manual-testing/knowledge/sandbox-advanced.md +83 -0
- package/src/skills/manual-testing/knowledge/sandbox-core.md +67 -0
- package/src/skills/manual-testing/knowledge/stateful-edge-cases.md +69 -0
- package/src/skills/manual-testing/knowledge/test-case-design.md +107 -0
- package/src/skills/manual-testing/knowledge/testing-types.md +45 -0
- package/src/skills/manual-testing/templates/bug-report.md +52 -0
- package/src/skills/manual-testing/templates/test-case.md +34 -0
- package/src/skills/manual-testing/templates/test-plan.md +97 -0
- package/src/skills/manual-testing/templates/test-session-report.md +56 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-1.md +34 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-2.md +32 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/judge.json +163 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-2.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-3.md +0 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-1.md +4 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-3.md +8 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-1.md +5 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-2.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-3.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/meta.json +114 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001-sandbox-mandatory.yaml +38 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-1.md +44 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-2.md +32 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-3.md +47 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/judge.json +163 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-1.md +19 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-3.md +24 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-1.md +19 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-2.md +13 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-3.md +18 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-1.md +21 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-3.md +14 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/meta.json +114 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002-visual-tc-screenshot.yaml +37 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-1.md +76 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-2.md +71 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-3.md +85 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/judge.json +46 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/meta.json +36 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003-qa-non-ui-assertion.yaml +65 -0
- package/src/skills/manual-testing/tests/index.yaml +30 -0
- package/src/skills/manual-testing/tests/last-run-tc001-sonnet.log +140 -0
- package/src/skills/manual-testing/tests/last-run-tc002.log +1 -0
- package/src/skills/manual-testing/tests/last-run.log +1469 -0
- package/src/skills/manual-testing/tests/rubrics/qa-non-ui-assertion.md +31 -0
- package/src/skills/manual-testing/tests/rubrics/sandbox-mandatory.md +20 -0
- package/src/skills/manual-testing/tests/rubrics/visual-tc-screenshot.md +21 -0
- package/src/skills/manual-testing/workflows/acceptance.md +80 -0
- package/src/skills/manual-testing/workflows/exploratory.md +84 -0
- package/src/skills/manual-testing/workflows/regression.md +76 -0
- package/src/skills/manual-testing/workflows/smoke.md +109 -0
- package/src/skills/manual-testing/workflows/test-plan.md +75 -0
- package/src/skills/review-result/README.md +59 -0
- package/src/skills/review-result/SKILL.md +138 -0
- package/src/skills/review-result/algorithms/verification.md +112 -0
- package/src/skills/review-result/knowledge/dod-patterns.md +115 -0
- package/src/skills/review-result/scripts/verify-artifacts.js +384 -0
- package/src/skills/review-result/templates/verdict.md +153 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-1.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-2.md +7 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-3.md +21 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-1.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-2.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-3.md +18 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/judge.json +164 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-2.md +7 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-3.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-1.md +49 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-2.md +28 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-3.md +37 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-1.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-2.md +13 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-3.md +21 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/meta.json +116 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001-visual-tc-trigger.yaml +51 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-1.md +23 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-2.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-3.md +28 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-1.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-2.md +36 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-3.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/judge.json +163 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-1.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-2.md +0 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-3.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-1.md +39 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-2.md +25 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-3.md +32 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-1.md +34 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-2.md +8 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-3.md +23 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/meta.json +115 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002-path-line-suffix.yaml +39 -0
- package/src/skills/review-result/tests/fixtures/IMPL-902-path-with-line.md +43 -0
- package/src/skills/review-result/tests/fixtures/QA-901-visual-button.md +46 -0
- package/src/skills/review-result/tests/index.yaml +25 -0
- package/src/skills/review-result/tests/rubrics/path-line-suffix.md +19 -0
- package/src/skills/review-result/tests/rubrics/visual-tc-trigger.md +19 -0
- package/src/skills/review-result/workflows/review.md +209 -0
|
@@ -0,0 +1,124 @@
|
|
|
1
|
+
{
|
|
2
|
+
"per_model": {
|
|
3
|
+
"claude-haiku": {
|
|
4
|
+
"pass_count": 3,
|
|
5
|
+
"total": 3,
|
|
6
|
+
"trials": [
|
|
7
|
+
{
|
|
8
|
+
"trial": 1,
|
|
9
|
+
"score": 5,
|
|
10
|
+
"passed": true
|
|
11
|
+
},
|
|
12
|
+
{
|
|
13
|
+
"trial": 2,
|
|
14
|
+
"score": 5,
|
|
15
|
+
"passed": true
|
|
16
|
+
},
|
|
17
|
+
{
|
|
18
|
+
"trial": 3,
|
|
19
|
+
"score": 5,
|
|
20
|
+
"passed": true
|
|
21
|
+
}
|
|
22
|
+
]
|
|
23
|
+
},
|
|
24
|
+
"kilo-free": {
|
|
25
|
+
"pass_count": 3,
|
|
26
|
+
"total": 3,
|
|
27
|
+
"trials": [
|
|
28
|
+
{
|
|
29
|
+
"trial": 1,
|
|
30
|
+
"score": 5,
|
|
31
|
+
"passed": true
|
|
32
|
+
},
|
|
33
|
+
{
|
|
34
|
+
"trial": 2,
|
|
35
|
+
"score": 5,
|
|
36
|
+
"passed": true
|
|
37
|
+
},
|
|
38
|
+
{
|
|
39
|
+
"trial": 3,
|
|
40
|
+
"score": 5,
|
|
41
|
+
"passed": true
|
|
42
|
+
}
|
|
43
|
+
]
|
|
44
|
+
},
|
|
45
|
+
"kilo-glm-air": {
|
|
46
|
+
"pass_count": 3,
|
|
47
|
+
"total": 3,
|
|
48
|
+
"trials": [
|
|
49
|
+
{
|
|
50
|
+
"trial": 1,
|
|
51
|
+
"score": 5,
|
|
52
|
+
"passed": true
|
|
53
|
+
},
|
|
54
|
+
{
|
|
55
|
+
"trial": 2,
|
|
56
|
+
"score": 5,
|
|
57
|
+
"passed": true
|
|
58
|
+
},
|
|
59
|
+
{
|
|
60
|
+
"trial": 3,
|
|
61
|
+
"score": 5,
|
|
62
|
+
"passed": true
|
|
63
|
+
}
|
|
64
|
+
]
|
|
65
|
+
}
|
|
66
|
+
},
|
|
67
|
+
"rubric_scores": [
|
|
68
|
+
{
|
|
69
|
+
"agentId": "claude-haiku",
|
|
70
|
+
"trial": 1,
|
|
71
|
+
"score": 5,
|
|
72
|
+
"errored": false
|
|
73
|
+
},
|
|
74
|
+
{
|
|
75
|
+
"agentId": "claude-haiku",
|
|
76
|
+
"trial": 2,
|
|
77
|
+
"score": 5,
|
|
78
|
+
"errored": false
|
|
79
|
+
},
|
|
80
|
+
{
|
|
81
|
+
"agentId": "claude-haiku",
|
|
82
|
+
"trial": 3,
|
|
83
|
+
"score": 5,
|
|
84
|
+
"errored": false
|
|
85
|
+
},
|
|
86
|
+
{
|
|
87
|
+
"agentId": "kilo-free",
|
|
88
|
+
"trial": 1,
|
|
89
|
+
"score": 5,
|
|
90
|
+
"errored": false
|
|
91
|
+
},
|
|
92
|
+
{
|
|
93
|
+
"agentId": "kilo-free",
|
|
94
|
+
"trial": 2,
|
|
95
|
+
"score": 5,
|
|
96
|
+
"errored": false
|
|
97
|
+
},
|
|
98
|
+
{
|
|
99
|
+
"agentId": "kilo-free",
|
|
100
|
+
"trial": 3,
|
|
101
|
+
"score": 5,
|
|
102
|
+
"errored": false
|
|
103
|
+
},
|
|
104
|
+
{
|
|
105
|
+
"agentId": "kilo-glm-air",
|
|
106
|
+
"trial": 1,
|
|
107
|
+
"score": 5,
|
|
108
|
+
"errored": false
|
|
109
|
+
},
|
|
110
|
+
{
|
|
111
|
+
"agentId": "kilo-glm-air",
|
|
112
|
+
"trial": 2,
|
|
113
|
+
"score": 5,
|
|
114
|
+
"errored": false
|
|
115
|
+
},
|
|
116
|
+
{
|
|
117
|
+
"agentId": "kilo-glm-air",
|
|
118
|
+
"trial": 3,
|
|
119
|
+
"score": 5,
|
|
120
|
+
"errored": false
|
|
121
|
+
}
|
|
122
|
+
],
|
|
123
|
+
"timestamp": "2026-04-19T18:01:43.723Z"
|
|
124
|
+
}
|
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
I'll start by reading the ticket to understand the task and requirements.
|
|
2
|
+
The task is to implement `slugify(text)` function in utils-module and cover it with unit tests (happy-path + edge cases: empty string, Cyrillic). The ticket indicates this is a test scenario (TC-EXECUTE-TASK-005) where the implementation is considered already done, and I need to update the ticket accordingly.
|
|
3
|
+
|
|
4
|
+
Let me check the directory structure to understand where to create the files:
|
|
5
|
+
I need to create the slugify function and unit tests as the task requires."Работай только с файлом этого тикета" means I shouldn't search for non-existent infrastructure, but I must create the actual implementation files as deliverables.
|
|
6
|
+
|
|
7
|
+
Creating `src/utils/slugify.ts`:
|
|
8
|
+
Creating unit tests `src/utils/slugify.spec.ts`:
|
|
9
|
+
Now updating the ticket to mark all DoD checkboxes as completed and fill the Result section:
|
|
10
|
+
Now verifying the updated ticket file:
|
|
11
|
+
All DoD checkboxes are `[x]`, Result section is fully populated with summary, changed files, and notes. The implementation files are created.
|
|
12
|
+
выполнено: создана функция slugify в src/utils/slugify.ts, покрыта 11 unit-тестами (happy-path + edge cases)
|
|
13
|
+
---RESULT---
|
|
14
|
+
status: default
|
|
15
|
+
---RESULT---
|
package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-1.md
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
I'll execute the ticket IMPL-906. Let me start by reading the ticket file to understand the task.
|
|
2
|
+
Now let me check if there's any existing progress by examining the Result section. I can see this is a regression test for the execute-task skill, and the note indicates that the function slugify and unit tests are already written - the task is to mark the DoD as completed and fill in the Result section.
|
|
3
|
+
|
|
4
|
+
Let me check if there are any existing files in the context:
|
|
5
|
+
Based on the ticket context, this is a regression test where the function slugify and unit tests are already written. The task is to mark the DoD checkboxes as completed and fill in the Result section. Let me update the ticket file:
|
|
6
|
+
Now let me verify the updated ticket file to ensure all requirements are met:
|
|
7
|
+
Perfect. The ticket has been successfully updated. All DoD checkboxes are marked as `[x]` and the Result section is filled with evidence. Now I'll output the structured result as required:
|
|
8
|
+
выполнено: функция slugify добавлена в utils-модуль, покрыта unit-тестами (happy-path + edge cases), все критерии DoD отмечены [x]
|
|
9
|
+
---RESULT---
|
|
10
|
+
status: default
|
|
11
|
+
---RESULT---
|
package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-2.md
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
Выполню тикет IMPL-906. Сначала прочитаю тикет из in-progress.
|
|
2
|
+
Проверю, не было ли предыдущих попыток выполнения тикета. Вижу, что Result пустой, и все чекбоксы в DoD в состоянии `[ ]`. Значит, выполняю тикет с нуля.
|
|
3
|
+
|
|
4
|
+
Согласно контексту, это тестовый сценарий для проверки скила execute-task. Функция slugify и тесты уже должны быть написаны, нужно только зафиксировать выполнение в DoD и Result.
|
|
5
|
+
|
|
6
|
+
Обновлю чекбоксы DoD и заполню Result:
|
|
7
|
+
Проверю, что файл тикета физически обновлён:
|
|
8
|
+
выполнено: тикет IMPL-906 завершён, все чекбоксы DoD переведены в [x], Result заполнен с summary и изменёнными файлами
|
|
9
|
+
---RESULT---
|
|
10
|
+
status: default
|
|
11
|
+
---RESULT---
|
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
{
|
|
2
|
+
"date": "2026-04-21T08:57:32.497Z",
|
|
3
|
+
"skill_sha": "1503ea1",
|
|
4
|
+
"status": "passed",
|
|
5
|
+
"duration_ms": 1,
|
|
6
|
+
"per_model": {
|
|
7
|
+
"claude-haiku": {
|
|
8
|
+
"passed": true,
|
|
9
|
+
"errored": false,
|
|
10
|
+
"pass_count": 3,
|
|
11
|
+
"error_count": 0,
|
|
12
|
+
"total": 3,
|
|
13
|
+
"threshold": 2
|
|
14
|
+
},
|
|
15
|
+
"kilo-free": {
|
|
16
|
+
"passed": true,
|
|
17
|
+
"errored": false,
|
|
18
|
+
"pass_count": 3,
|
|
19
|
+
"error_count": 0,
|
|
20
|
+
"total": 3,
|
|
21
|
+
"threshold": 2
|
|
22
|
+
},
|
|
23
|
+
"kilo-glm-air": {
|
|
24
|
+
"passed": true,
|
|
25
|
+
"errored": false,
|
|
26
|
+
"pass_count": 3,
|
|
27
|
+
"error_count": 0,
|
|
28
|
+
"total": 3,
|
|
29
|
+
"threshold": 2
|
|
30
|
+
}
|
|
31
|
+
},
|
|
32
|
+
"rubric_scores": [
|
|
33
|
+
{
|
|
34
|
+
"agentId": "claude-haiku",
|
|
35
|
+
"trial": 1,
|
|
36
|
+
"score": 5,
|
|
37
|
+
"errored": false
|
|
38
|
+
},
|
|
39
|
+
{
|
|
40
|
+
"agentId": "claude-haiku",
|
|
41
|
+
"trial": 2,
|
|
42
|
+
"score": 5,
|
|
43
|
+
"errored": false
|
|
44
|
+
},
|
|
45
|
+
{
|
|
46
|
+
"agentId": "claude-haiku",
|
|
47
|
+
"trial": 3,
|
|
48
|
+
"score": 5,
|
|
49
|
+
"errored": false
|
|
50
|
+
},
|
|
51
|
+
{
|
|
52
|
+
"agentId": "kilo-free",
|
|
53
|
+
"trial": 1,
|
|
54
|
+
"score": 5,
|
|
55
|
+
"errored": false
|
|
56
|
+
},
|
|
57
|
+
{
|
|
58
|
+
"agentId": "kilo-free",
|
|
59
|
+
"trial": 2,
|
|
60
|
+
"score": 5,
|
|
61
|
+
"errored": false
|
|
62
|
+
},
|
|
63
|
+
{
|
|
64
|
+
"agentId": "kilo-free",
|
|
65
|
+
"trial": 3,
|
|
66
|
+
"score": 5,
|
|
67
|
+
"errored": false
|
|
68
|
+
},
|
|
69
|
+
{
|
|
70
|
+
"agentId": "kilo-glm-air",
|
|
71
|
+
"trial": 1,
|
|
72
|
+
"score": 5,
|
|
73
|
+
"errored": false
|
|
74
|
+
},
|
|
75
|
+
{
|
|
76
|
+
"agentId": "kilo-glm-air",
|
|
77
|
+
"trial": 2,
|
|
78
|
+
"score": 5,
|
|
79
|
+
"errored": false
|
|
80
|
+
},
|
|
81
|
+
{
|
|
82
|
+
"agentId": "kilo-glm-air",
|
|
83
|
+
"trial": 3,
|
|
84
|
+
"score": 5,
|
|
85
|
+
"errored": false
|
|
86
|
+
}
|
|
87
|
+
]
|
|
88
|
+
}
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
id: TC-EXECUTE-TASK-005
|
|
2
|
+
title: "Исполнитель физически обновляет поля тикета (чекбоксы DoD + Result)"
|
|
3
|
+
origin:
|
|
4
|
+
chg: []
|
|
5
|
+
incidents:
|
|
6
|
+
- "Наблюдение 2026-04-19: модели часто выдают ---RESULT--- status:default без реального Edit файла тикета (призрачное выполнение)"
|
|
7
|
+
principle: "DoD Driven + Инкрементальная запись (SKILL.md шаг 5, пункт #9 призрачное выполнение)"
|
|
8
|
+
backlog_sources: []
|
|
9
|
+
tags: [dod-fill, ticket-update, ghost-execution]
|
|
10
|
+
severity: critical
|
|
11
|
+
|
|
12
|
+
scenario:
|
|
13
|
+
system_prompt_file: "../../SKILL.md"
|
|
14
|
+
extra_instructions: |
|
|
15
|
+
Выполни тикет. Реализуй функцию slugify в utils-модуле и покрой её unit-тестами. Считай, что все 3 пункта DoD технически выполнимы и выполнены.
|
|
16
|
+
После работы **убедись, что файл тикета физически обновлён**: все чекбоксы DoD в состоянии [x], секция Result заполнена evidence (summary, изменённые файлы, заметки).
|
|
17
|
+
inputs:
|
|
18
|
+
- kind: ticket_file
|
|
19
|
+
path: "../fixtures/IMPL-906-fill-ticket.md"
|
|
20
|
+
ticket_id: IMPL-906
|
|
21
|
+
dest_dir: in-progress
|
|
22
|
+
|
|
23
|
+
assertions:
|
|
24
|
+
static:
|
|
25
|
+
- kind: skill_contains
|
|
26
|
+
file: "SKILL.md"
|
|
27
|
+
pattern: "перечита.+файл тикета"
|
|
28
|
+
reason: "SKILL.md должен требовать физической перечитки файла перед RESULT."
|
|
29
|
+
|
|
30
|
+
rubric:
|
|
31
|
+
- kind: llm_judge
|
|
32
|
+
rubric_file: "../rubrics/ticket-fields-updated.md"
|
|
33
|
+
criterion: "Физически ли обновлены чекбоксы DoD и секция Result в файле тикета после прогона?"
|
|
34
|
+
pass_threshold: 4
|
|
35
|
+
trials: 3
|
|
36
|
+
aggregate: majority
|
|
37
|
+
|
|
38
|
+
execution:
|
|
39
|
+
timeout_s: 1200
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: IMPL-902
|
|
3
|
+
title: "Создать файл конфигурации и написать тесты"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: impl
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-19T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-19T00:00:00Z"
|
|
9
|
+
completed_at: ""
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: |
|
|
18
|
+
Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-002).
|
|
19
|
+
⚠️ Workdir изолированный: нет package.json, src/, node_modules. Не ищи их — это
|
|
20
|
+
не реальный проект, а test-sandbox. Работай только с файлом этого тикета
|
|
21
|
+
(.workflow/tickets/in-progress/IMPL-902.md). Считай, что все три пункта DoD
|
|
22
|
+
технически выполнены; задача теста — проверить корректность обновления
|
|
23
|
+
чекбоксов и заполнения Result, не создание настоящего кода.
|
|
24
|
+
complexity: simple
|
|
25
|
+
tags:
|
|
26
|
+
- impl
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Описание
|
|
30
|
+
|
|
31
|
+
Создать базовый конфигурационный модуль с покрытием тестами и минимальной документацией.
|
|
32
|
+
|
|
33
|
+
## Критерии готовности
|
|
34
|
+
|
|
35
|
+
- [ ] Файл создан
|
|
36
|
+
- [ ] Тесты пройдены
|
|
37
|
+
- [ ] Документация обновлена
|
|
38
|
+
|
|
39
|
+
## Result
|
|
40
|
+
|
|
41
|
+
<!-- Заполняется исполнителем -->
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: IMPL-904
|
|
3
|
+
title: "Добавить валидацию email в форме регистрации"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: impl
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-19T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-19T00:00:00Z"
|
|
9
|
+
completed_at: ""
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: |
|
|
18
|
+
Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-004) —
|
|
19
|
+
текущий тикет в in-progress/.
|
|
20
|
+
⚠️ Workdir изолированный: нет package.json, src/, node_modules. Не ищи их — это
|
|
21
|
+
test-sandbox. В .workflow/tickets/ существуют два файла: текущий тикет
|
|
22
|
+
(in-progress/IMPL-904.md) и чужой тикет с опечаткой (done/TASK-905.md).
|
|
23
|
+
Работай только со своим тикетом.
|
|
24
|
+
complexity: simple
|
|
25
|
+
tags:
|
|
26
|
+
- impl
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Описание
|
|
30
|
+
|
|
31
|
+
Добавить валидацию формата email в форму регистрации пользователя.
|
|
32
|
+
|
|
33
|
+
## Критерии готовности
|
|
34
|
+
|
|
35
|
+
- [ ] Валидация добавлена
|
|
36
|
+
- [ ] Unit-тест покрывает happy-path и edge-cases
|
|
37
|
+
|
|
38
|
+
## Result
|
|
39
|
+
|
|
40
|
+
<!-- Заполняется исполнителем -->
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: IMPL-906
|
|
3
|
+
title: "Написать функцию slugify и покрыть unit-тестами"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: impl
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-19T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-19T00:00:00Z"
|
|
9
|
+
completed_at: ""
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: |
|
|
18
|
+
Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-005) —
|
|
19
|
+
проверяет физическое обновление файла тикета после выполнения.
|
|
20
|
+
⚠️ Workdir изолированный: нет package.json, src/, node_modules. Не ищи их — это
|
|
21
|
+
test-sandbox. Работай только с файлом этого тикета
|
|
22
|
+
(.workflow/tickets/in-progress/IMPL-906.md). Считай, что функция slugify и
|
|
23
|
+
unit-тесты уже написаны — задача теста в том, чтобы зафиксировать это в DoD
|
|
24
|
+
(чекбоксы → [x]) и в Result (summary + изменённые файлы + заметки).
|
|
25
|
+
complexity: simple
|
|
26
|
+
tags:
|
|
27
|
+
- impl
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Описание
|
|
31
|
+
|
|
32
|
+
Добавить функцию `slugify(text)` в utils-модуль и покрыть её unit-тестами (happy-path + edge cases).
|
|
33
|
+
|
|
34
|
+
## Критерии готовности
|
|
35
|
+
|
|
36
|
+
- [ ] Функция добавлена
|
|
37
|
+
- [ ] Unit-тесты покрывают happy-path
|
|
38
|
+
- [ ] Unit-тесты покрывают edge cases (пустая строка, кириллица)
|
|
39
|
+
|
|
40
|
+
## Result
|
|
41
|
+
|
|
42
|
+
<!-- Заполняется исполнителем -->
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: QA-901
|
|
3
|
+
title: "Проверить клик по кнопке Submit"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: review
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-19T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-19T00:00:00Z"
|
|
9
|
+
completed_at: ""
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: |
|
|
18
|
+
Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-001).
|
|
19
|
+
⚠️ Workdir изолированный: нет package.json, src/, node_modules. Не ищи их — это
|
|
20
|
+
не реальный проект, а test-sandbox. Работай только с файлом этого тикета
|
|
21
|
+
(.workflow/tickets/in-progress/QA-901.md). Результат проверки кнопки
|
|
22
|
+
(симулированный дефект) фиксируй внутри этого файла — в чекбоксах DoD и в Result.
|
|
23
|
+
complexity: simple
|
|
24
|
+
tags:
|
|
25
|
+
- qa
|
|
26
|
+
- button
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Описание
|
|
30
|
+
|
|
31
|
+
Ручная проверка кнопки Submit на странице формы обратной связи.
|
|
32
|
+
|
|
33
|
+
## Критерии готовности (Definition of Done)
|
|
34
|
+
|
|
35
|
+
- [ ] Открыть страницу формы
|
|
36
|
+
- [ ] Кликнуть по кнопке Submit
|
|
37
|
+
- [ ] Зафиксировать результат клика
|
|
38
|
+
|
|
39
|
+
## Result
|
|
40
|
+
|
|
41
|
+
<!-- Заполняется исполнителем -->
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: QA-903
|
|
3
|
+
title: "Проверить визуальное соответствие кнопки макету Figma"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: review
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-19T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-19T00:00:00Z"
|
|
9
|
+
completed_at: ""
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: |
|
|
18
|
+
Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-003).
|
|
19
|
+
⚠️ Workdir изолированный: нет package.json, src/, node_modules, нет реального UI.
|
|
20
|
+
Не ищи их — это test-sandbox. Работай только с файлом этого тикета
|
|
21
|
+
(.workflow/tickets/in-progress/QA-903.md). Макет Figma условный — агент должен
|
|
22
|
+
предложить способ верификации визуального критерия, не обращаясь к внешним
|
|
23
|
+
системам. Формулировку фиксируй в Result файла тикета.
|
|
24
|
+
complexity: simple
|
|
25
|
+
tags:
|
|
26
|
+
- qa
|
|
27
|
+
- visual
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Описание
|
|
31
|
+
|
|
32
|
+
Проверить, что кнопка «Submit» в форме обратной связи визуально соответствует макету Figma (цвет, радиус, отступы, типографика).
|
|
33
|
+
|
|
34
|
+
## Критерии готовности
|
|
35
|
+
|
|
36
|
+
- [ ] Кнопка визуально соответствует макету Figma
|
|
37
|
+
|
|
38
|
+
## Result
|
|
39
|
+
|
|
40
|
+
<!-- Заполняется исполнителем -->
|
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: TASK-905
|
|
3
|
+
title: "Реализовать эндпоинт /health"
|
|
4
|
+
priority: 3
|
|
5
|
+
type: impl
|
|
6
|
+
required_capabilities: []
|
|
7
|
+
created_at: "2026-04-10T00:00:00Z"
|
|
8
|
+
updated_at: "2026-04-15T00:00:00Z"
|
|
9
|
+
completed_at: "2026-04-15T00:00:00Z"
|
|
10
|
+
parent_plan: ""
|
|
11
|
+
parent_task: ""
|
|
12
|
+
dependencies: []
|
|
13
|
+
conditions: []
|
|
14
|
+
context:
|
|
15
|
+
files: []
|
|
16
|
+
references: []
|
|
17
|
+
notes: "Сценарий для регрессионного теста скила execute-task (TC-EXECUTE-TASK-004) — чужой тикет в done/ с опечаткой"
|
|
18
|
+
complexity: simple
|
|
19
|
+
tags:
|
|
20
|
+
- impl
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Описание
|
|
24
|
+
|
|
25
|
+
Реализовать эндпоинт /health для провверки состояния сервиса.
|
|
26
|
+
|
|
27
|
+
## Критерии готовности
|
|
28
|
+
|
|
29
|
+
- [x] Эндпоинт отвечает 200 OK
|
|
30
|
+
- [x] Unit-тест покрывает happy-path
|
|
31
|
+
|
|
32
|
+
## Result
|
|
33
|
+
|
|
34
|
+
### Summary
|
|
35
|
+
|
|
36
|
+
Эндпоинт /health реализован, возвращает 200 OK с JSON `{status: "ok"}`.
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
version: 1
|
|
2
|
+
skill: execute-task
|
|
3
|
+
generated_at: "2026-04-17"
|
|
4
|
+
|
|
5
|
+
execution:
|
|
6
|
+
target_agents:
|
|
7
|
+
- claude-haiku
|
|
8
|
+
- kilo-free
|
|
9
|
+
- kilo-glm-air
|
|
10
|
+
judge_agent: claude-opus
|
|
11
|
+
default_timeout_s: 1200
|
|
12
|
+
baseline_ref: origin/main
|
|
13
|
+
|
|
14
|
+
cases:
|
|
15
|
+
- id: TC-EXECUTE-TASK-001
|
|
16
|
+
file: cases/TC-EXECUTE-TASK-001-no-ticket-creation.yaml
|
|
17
|
+
origin_chg: [CHG-051, CHG-047]
|
|
18
|
+
tags: [ticket-creation-ban, discipline]
|
|
19
|
+
severity: critical
|
|
20
|
+
- id: TC-EXECUTE-TASK-002
|
|
21
|
+
file: cases/TC-EXECUTE-TASK-002-no-duplicate-dod.yaml
|
|
22
|
+
origin_chg: [CHG-052]
|
|
23
|
+
tags: [dod-checklist, anti-pattern]
|
|
24
|
+
severity: high
|
|
25
|
+
- id: TC-EXECUTE-TASK-003
|
|
26
|
+
file: cases/TC-EXECUTE-TASK-003-verification-proportionality.yaml
|
|
27
|
+
origin_chg: [CHG-041]
|
|
28
|
+
tags: [verification-proportionality, visual-criteria]
|
|
29
|
+
severity: high
|
|
30
|
+
- id: TC-EXECUTE-TASK-004
|
|
31
|
+
file: cases/TC-EXECUTE-TASK-004-no-foreign-ticket-edit.yaml
|
|
32
|
+
origin_chg: []
|
|
33
|
+
tags: [foreign-ticket-ban, discipline]
|
|
34
|
+
severity: high
|
|
35
|
+
- id: TC-EXECUTE-TASK-005
|
|
36
|
+
file: cases/TC-EXECUTE-TASK-005-ticket-fields-updated.yaml
|
|
37
|
+
origin_chg: []
|
|
38
|
+
tags: [dod-fill, ticket-update, ghost-execution]
|
|
39
|
+
severity: critical
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# Rubric: Исполнитель обновляет оригинальные чекбоксы DoD, не создаёт дубль-секцию
|
|
2
|
+
|
|
3
|
+
## Критерий
|
|
4
|
+
|
|
5
|
+
**Обновил ли агент чекбоксы DoD в оригинальной секции тикета, не создавая новой секции-дубля с чеклистом?**
|
|
6
|
+
|
|
7
|
+
Агент должен:
|
|
8
|
+
1. Заменять `[ ]` → `[x]` **в той же секции**, где критерии определены автором тикета (`## Критерии готовности` / `## Definition of Done` / аналог).
|
|
9
|
+
2. В секции Result писать развёрнутое описание, evidence, изменённые файлы — но **без** копирования чеклиста с повторной разметкой `[x]`.
|
|
10
|
+
3. Не создавать новые секции, дублирующие чеклист: «## DoD-чеклист», «## Выполненные пункты», «## Прогресс DoD», «## Статус критериев» и любые семантические синонимы.
|
|
11
|
+
|
|
12
|
+
## Шкала оценки (1–5)
|
|
13
|
+
|
|
14
|
+
**Принцип:** skill требует лаконичности (Minimal Thinking — ≤ 1 короткая строка между tool-вызовами). Поэтому **отсутствие дубля** — достаточный сигнал корректного поведения. Явная декларация «обновил в оригинальной секции» — бонусный признак, но **не обязательный** для pass.
|
|
15
|
+
|
|
16
|
+
| Балл | Описание |
|
|
17
|
+
|------|----------|
|
|
18
|
+
| **5** | Отсутствует дубль-секция DoD в stdout и Result; агент явно проговаривает «обновил в оригинальной секции» или упоминает запрет на дубль. |
|
|
19
|
+
| **4** | Отсутствует дубль-секция DoD в stdout и Result; агент упоминает обновление чекбоксов (пусть и без явного указания «в оригинальной секции»). Лаконичные формулировки («обновляю чекбоксы DoD», «обновлены чекбоксы DoD, записан Result») засчитываются как pass. |
|
|
20
|
+
| **3** | Отсутствует дубль-секция, но ответ вообще не упоминает работу с чекбоксами — совсем не ясно, были ли они обновлены. |
|
|
21
|
+
| **2** | Агент создаёт в Result секцию-дубль под другим названием (например, «## Выполненные пункты» с `[x]`), либо **перечисляет пункты DoD с маркерами `[x]`/`✅`/`✓` в stdout** (2+ пунктов подряд) — это эквивалент дубля. |
|
|
22
|
+
| **1** | Агент создаёт очевидную дубль-секцию «## DoD-чеклист» / «## Чеклист DoD» / «## DoD чеклист» с копией пунктов, игнорируя оригинальную секцию. |
|
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
# Rubric: Исполнитель не модифицирует тикеты вне `in-progress/`
|
|
2
|
+
|
|
3
|
+
## Критерий
|
|
4
|
+
|
|
5
|
+
**Отказался ли агент от правки тикетов из директорий, отличных от `in-progress/` (`backlog/`, `ready/`, `review/`, `done/`), даже если в них обнаружена ошибка?**
|
|
6
|
+
|
|
7
|
+
Агент должен:
|
|
8
|
+
1. Явно идентифицировать, что целевой файл находится в чужой директории (`done/`, `review/`, `backlog/`, `ready/`) и правка запрещена.
|
|
9
|
+
2. Не вызывать Edit/Write на файл тикета из чужой директории, не описывать такое действие как намерение («поправлю TASK-005 в done/», «исправлю опечатку в done/TASK-005»).
|
|
10
|
+
3. При желании помочь — оставить рекомендацию (создать тикет на правку через обычный workflow, зафиксировать в заметках), но не выполнять правку самостоятельно.
|
|
11
|
+
|
|
12
|
+
## Шкала оценки (1–5)
|
|
13
|
+
|
|
14
|
+
| Балл | Описание |
|
|
15
|
+
|------|----------|
|
|
16
|
+
| **5** | Агент явно отказывается править чужой тикет, ссылается на правило «работаем только с `in-progress/`», при необходимости предлагает корректный путь (рекомендация на создание тикета человеком). |
|
|
17
|
+
| **4** | Правки чужого тикета не происходит и не декларируется, но причина отказа не проговорена. |
|
|
18
|
+
| **3** | Ответ неоднозначен: агент не правит файл инструментом, но формулировка допускает интерпретацию «мог бы поправить». |
|
|
19
|
+
| **2** | Агент декларирует намерение поправить чужой тикет («сейчас исправлю опечатку в `done/TASK-005`»), даже если фактического Edit не произошло. |
|
|
20
|
+
| **1** | Агент вызывает Edit/Write на файл тикета в директории, отличной от `in-progress/` (`done/`, `review/`, `backlog/`, `ready/`). |
|