workflow-ai 1.0.63 → 1.0.65
- package/README.md +239 -145
- package/configs/agent-health-rules.yaml +64 -0
- package/configs/config.yaml +134 -0
- package/configs/pipeline.yaml +901 -0
- package/configs/ticket-movement-rules.yaml +80 -0
- package/package.json +1 -1
- package/src/global-dir.mjs +25 -1
- package/src/init.mjs +20 -3
- package/src/lib/agent-health-registry.mjs +245 -0
- package/src/lib/artifact-snapshot.mjs +233 -0
- package/src/lib/error-classifier.mjs +274 -0
- package/src/lib/test-error-classifier.mjs +60 -0
- package/src/lib/test-extends.mjs +58 -0
- package/src/lib/test-version.mjs +21 -0
- package/src/scripts/move-to-review.js +5 -7
- package/src/scripts/reset-agent-health.js +62 -0
- package/src/scripts/run-skill-tests.js +348 -136
- package/src/skills/analyze-report/README.md +44 -0
- package/src/skills/analyze-report/SKILL.md +121 -0
- package/src/skills/analyze-report/algorithms/progress-assessment.md +108 -0
- package/src/skills/analyze-report/knowledge/analysis-frameworks.md +66 -0
- package/src/skills/analyze-report/knowledge/report-structure.md +61 -0
- package/src/skills/analyze-report/scripts/calc-plan-metrics.js +234 -0
- package/src/skills/analyze-report/templates/analysis-report.md +80 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-1.md +69 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-2.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/claude-sonnet/trial-3.md +99 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/judge.json +163 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-1.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-2.md +88 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-deepseek/trial-3.md +100 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-1.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-2.md +64 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-glm/trial-3.md +110 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-1.md +74 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-2.md +38 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/kilo-minimax/trial-3.md +61 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001/current/meta.json +115 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-001-evidence-from-log.yaml +60 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-1.md +90 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-2.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/claude-sonnet/trial-3.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/judge.json +163 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-1.md +84 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-2.md +77 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-deepseek/trial-3.md +89 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-1.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-2.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-glm/trial-3.md +103 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-1.md +93 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-2.md +93 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/kilo-minimax/trial-3.md +86 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002/current/meta.json +115 -0
- package/src/skills/analyze-report/tests/cases/TC-ANALYZE-REPORT-002-result-block-format.yaml +44 -0
- package/src/skills/analyze-report/tests/fixtures/REPORT-002-incorrect-attribution.md +27 -0
- package/src/skills/analyze-report/tests/fixtures/pipeline-2026-04-06_qa-001-skip.log +32 -0
- package/src/skills/analyze-report/tests/index.yaml +25 -0
- package/src/skills/analyze-report/tests/rubrics/evidence-from-log.md +22 -0
- package/src/skills/analyze-report/tests/rubrics/result-block-format.md +22 -0
- package/src/skills/analyze-report/workflows/progress.md +158 -0
- package/src/skills/analyze-report/workflows/retrospective.md +143 -0
- package/src/skills/coach/README.md +43 -0
- package/src/skills/coach/SKILL.md +167 -0
- package/src/skills/coach/SKILL.md.legacy +157 -0
- package/src/skills/coach/algorithms/gap-analysis.md +69 -0
- package/src/skills/coach/algorithms/improvement-prioritization.md +62 -0
- package/src/skills/coach/algorithms/skill-scoring.md +80 -0
- package/src/skills/coach/knowledge/audit-applied-changes-clean.txt +11 -0
- package/src/skills/coach/knowledge/backlog-management.md +67 -0
- package/src/skills/coach/knowledge/backlog-management.md.legacy +90 -0
- package/src/skills/coach/knowledge/common-antipatterns.md +76 -0
- package/src/skills/coach/knowledge/prompt-engineering.md +45 -0
- package/src/skills/coach/knowledge/shared-knowledge-guide.md +44 -0
- package/src/skills/coach/knowledge/skill-anatomy.md +49 -0
- package/src/skills/coach/knowledge/test-authorship.md +141 -0
- package/src/skills/coach/templates/audit-report.md +39 -0
- package/src/skills/coach/templates/coach-backlog-init.yaml +14 -0
- package/src/skills/coach/templates/coach-backlog-init.yaml.legacy +10 -0
- package/src/skills/coach/templates/improvement-plan.md +42 -0
- package/src/skills/coach/templates/new-skill.md +95 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-1.md +58 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-2.md +65 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/claude-sonnet/trial-3.md +58 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/judge.json +151 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-1.md +46 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-2.md +0 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-deepseek/trial-3.md +75 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-1.md +81 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-2.md +101 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-glm/trial-3.md +91 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-1.md +48 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-2.md +30 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/kilo-minimax/trial-3.md +55 -0
- package/src/skills/coach/tests/cases/TC-COACH-001/current/meta.json +94 -0
- package/src/skills/coach/tests/cases/TC-COACH-001-evidence-based-temporal-diagram.yaml +53 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-1.md +46 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-2.md +50 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/claude-sonnet/trial-3.md +48 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/judge.json +151 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-2.md +37 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-deepseek/trial-3.md +30 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-1.md +23 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-2.md +29 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-glm/trial-3.md +35 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-1.md +13 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-2.md +19 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/kilo-minimax/trial-3.md +33 -0
- package/src/skills/coach/tests/cases/TC-COACH-002/current/meta.json +94 -0
- package/src/skills/coach/tests/cases/TC-COACH-002-root-cause-first.yaml +57 -0
- package/src/skills/coach/tests/fixtures/pipeline-2026-04-06_id-collision.log +77 -0
- package/src/skills/coach/tests/index.yaml +29 -0
- package/src/skills/coach/tests/rubrics/calibration/evidence-based-bad.md +13 -0
- package/src/skills/coach/tests/rubrics/calibration/evidence-based-good.md +29 -0
- package/src/skills/coach/tests/rubrics/evidence-based.md +26 -0
- package/src/skills/coach/tests/rubrics/root-cause-first.md +21 -0
- package/src/skills/coach/workflows/analyze.md +79 -0
- package/src/skills/coach/workflows/analyze.md.legacy +64 -0
- package/src/skills/coach/workflows/audit.md +74 -0
- package/src/skills/coach/workflows/audit.md.legacy +59 -0
- package/src/skills/coach/workflows/create.md +80 -0
- package/src/skills/coach/workflows/create.md.legacy +67 -0
- package/src/skills/coach/workflows/improve.md +71 -0
- package/src/skills/coach/workflows/improve.md.legacy +60 -0
- package/src/skills/coach/workflows/research.md +55 -0
- package/src/skills/coach/workflows/review.md +52 -0
- package/src/skills/coach/workflows/review.md.legacy +48 -0
- package/src/skills/coach/workflows/test.md +97 -0
- package/src/skills/create-plan/README.md +39 -0
- package/src/skills/create-plan/SKILL.md +104 -0
- package/src/skills/create-plan/algorithms/risk-assessment.md +73 -0
- package/src/skills/create-plan/knowledge/plan-completeness.md +67 -0
- package/src/skills/create-plan/knowledge/plan-lifecycle.md +33 -0
- package/src/skills/create-plan/knowledge/task-verification-pairs.md +151 -0
- package/src/skills/create-plan/scripts/validate-completeness.js +182 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-1.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-2.md +39 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/claude-sonnet/trial-3.md +35 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/judge.json +167 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-2.md +10 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-deepseek/trial-3.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-1.md +26 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-2.md +86 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-glm/trial-3.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-1.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-2.md +15 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/kilo-minimax/trial-3.md +14 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001/current/meta.json +119 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-001-validate-completeness.yaml +41 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-1.md +25 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/claude-sonnet/trial-3.md +37 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/judge.json +164 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-1.md +3 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-2.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-deepseek/trial-3.md +13 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-1.md +44 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-2.md +5 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-glm/trial-3.md +49 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-1.md +6 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-2.md +11 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/kilo-minimax/trial-3.md +16 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002/current/meta.json +116 -0
- package/src/skills/create-plan/tests/cases/TC-CREATE-PLAN-002-task-granularity.yaml +39 -0
- package/src/skills/create-plan/tests/index.yaml +25 -0
- package/src/skills/create-plan/tests/rubrics/task-granularity.md +21 -0
- package/src/skills/create-plan/tests/rubrics/validate-completeness.md +21 -0
- package/src/skills/create-plan/workflows/create.md +136 -0
- package/src/skills/create-report/README.md +40 -0
- package/src/skills/create-report/SKILL.md +73 -0
- package/src/skills/create-report/algorithms/metric-calculation.md +93 -0
- package/src/skills/create-report/knowledge/report-metrics.md +82 -0
- package/src/skills/create-report/scripts/calc-metrics.js +383 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-1.md +25 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-2.md +26 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/claude-sonnet/trial-3.md +28 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/judge.json +163 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-1.md +4 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-2.md +3 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-deepseek/trial-3.md +6 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-1.md +8 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-2.md +12 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-glm/trial-3.md +7 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-1.md +12 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-2.md +22 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/kilo-minimax/trial-3.md +13 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001/current/meta.json +115 -0
- package/src/skills/create-report/tests/cases/TC-CREATE-REPORT-001-root-cause-attribution.yaml +57 -0
- package/src/skills/create-report/tests/index.yaml +20 -0
- package/src/skills/create-report/tests/rubrics/root-cause-attribution.md +21 -0
- package/src/skills/create-report/workflows/standard.md +175 -0
- package/src/skills/decompose-gaps/README.md +39 -0
- package/src/skills/decompose-gaps/SKILL.md +78 -0
- package/src/skills/decompose-gaps/algorithms/scope-check.md +110 -0
- package/src/skills/decompose-gaps/knowledge/scope-validation.md +65 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-1.md +41 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-2.md +41 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/claude-sonnet/trial-3.md +56 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/judge.json +164 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-1.md +25 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-2.md +17 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-deepseek/trial-3.md +22 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-1.md +25 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-2.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-glm/trial-3.md +29 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-1.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-2.md +35 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/kilo-minimax/trial-3.md +18 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001/current/meta.json +116 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-001-scope-exclusion.yaml +46 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-1.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/claude-sonnet/trial-3.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/judge.json +163 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-2.md +15 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-deepseek/trial-3.md +7 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-1.md +21 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-2.md +38 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-glm/trial-3.md +16 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-1.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-2.md +10 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/kilo-minimax/trial-3.md +9 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002/current/meta.json +115 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-002-glob-before-write.yaml +36 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-1.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-2.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/judge.json +165 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-2.md +26 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-deepseek/trial-3.md +5 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-1.md +39 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-2.md +37 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-glm/trial-3.md +45 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-1.md +26 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-2.md +27 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/kilo-minimax/trial-3.md +7 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003/current/meta.json +117 -0
- package/src/skills/decompose-gaps/tests/cases/TC-DECOMPOSE-GAPS-003-parent-plan-mandatory.yaml +41 -0
- package/src/skills/decompose-gaps/tests/index.yaml +30 -0
- package/src/skills/decompose-gaps/tests/rubrics/glob-before-write.md +21 -0
- package/src/skills/decompose-gaps/tests/rubrics/parent-plan-mandatory.md +22 -0
- package/src/skills/decompose-gaps/tests/rubrics/scope-exclusion.md +21 -0
- package/src/skills/decompose-gaps/workflows/decompose.md +123 -0
- package/src/skills/decompose-plan/README.md +43 -0
- package/src/skills/decompose-plan/SKILL.md +87 -0
- package/src/skills/decompose-plan/algorithms/deduplication.md +101 -0
- package/src/skills/decompose-plan/knowledge/atomicity-checklist.md +139 -0
- package/src/skills/decompose-plan/knowledge/capabilities.md +68 -0
- package/src/skills/decompose-plan/knowledge/human-task-rules.md +82 -0
- package/src/skills/decompose-plan/knowledge/scope-guard-checklist.md +73 -0
- package/src/skills/decompose-plan/scripts/check-atomicity-limit.js +47 -0
- package/src/skills/decompose-plan/scripts/check-duplicates.js +323 -0
- package/src/skills/decompose-plan/scripts/verify-atomicity.js +408 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-1.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-2.md +36 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/claude-sonnet/trial-3.md +37 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-1.md +20 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-2.md +17 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-deepseek/trial-3.md +28 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-1.md +114 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-2.md +137 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-glm/trial-3.md +188 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-2.md +32 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/kilo-minimax/trial-3.md +110 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-001-atomicity-no-1to1.yaml +56 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-1.md +47 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-2.md +54 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/claude-sonnet/trial-3.md +43 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-1.md +15 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-2.md +5 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-deepseek/trial-3.md +12 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-1.md +34 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-2.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-glm/trial-3.md +35 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-2.md +31 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-002-get-next-id-mandatory.yaml +44 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-1.md +21 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-2.md +38 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-1.md +31 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-2.md +35 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-deepseek/trial-3.md +48 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-1.md +167 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-2.md +62 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-glm/trial-3.md +174 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-2.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-003-verbatim-dod-transfer.yaml +42 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-1.md +55 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-2.md +49 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/claude-sonnet/trial-3.md +49 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-1.md +104 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-2.md +45 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-deepseek/trial-3.md +58 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-1.md +193 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-2.md +202 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-glm/trial-3.md +155 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-1.md +52 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-2.md +17 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-004-executor-atomicity.yaml +64 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-1.md +59 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-2.md +204 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/claude-sonnet/trial-3.md +213 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-2.md +57 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-deepseek/trial-3.md +54 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-1.md +147 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-2.md +165 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-glm/trial-3.md +133 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-1.md +81 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-2.md +108 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/kilo-minimax/trial-3.md +3 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005/current/meta.json +114 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-005-capabilities-registry.yaml +78 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-1.md +225 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-2.md +66 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/claude-sonnet/trial-3.md +36 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/judge.json +163 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-1.md +42 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-2.md +67 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-deepseek/trial-3.md +40 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-1.md +122 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-2.md +131 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-glm/trial-3.md +138 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-1.md +41 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-2.md +88 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/kilo-minimax/trial-3.md +0 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006/current/meta.json +115 -0
- package/src/skills/decompose-plan/tests/cases/TC-DECOMPOSE-PLAN-006-dod-threshold.yaml +72 -0
- package/src/skills/decompose-plan/tests/index.yaml +45 -0
- package/src/skills/decompose-plan/tests/rubrics/atomicity-no-1to1.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/capabilities-registry.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/dod-threshold.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/executor-atomicity.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/get-next-id-mandatory.md +21 -0
- package/src/skills/decompose-plan/tests/rubrics/verbatim-dod-transfer.md +21 -0
- package/src/skills/decompose-plan/workflows/decompose.md +305 -0
- package/src/skills/deep-research/README.md +36 -0
- package/src/skills/deep-research/SKILL.md +106 -0
- package/src/skills/deep-research/algorithms/source-scoring.md +63 -0
- package/src/skills/deep-research/algorithms/synthesis.md +67 -0
- package/src/skills/deep-research/knowledge/data-validation.md +44 -0
- package/src/skills/deep-research/knowledge/perplexity-config.md +30 -0
- package/src/skills/deep-research/knowledge/research-methodology.md +54 -0
- package/src/skills/deep-research/knowledge/source-evaluation.md +33 -0
- package/src/skills/deep-research/scripts/perplexity-research.js +315 -0
- package/src/skills/deep-research/templates/brief-summary.md +25 -0
- package/src/skills/deep-research/templates/research-report.md +76 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-1.md +48 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-2.md +88 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/claude-haiku/trial-3.md +56 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/judge.json +163 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-1.md +58 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-2.md +249 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-free/trial-3.md +44 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-1.md +96 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-2.md +56 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm/trial-3.md +94 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-1.md +11 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-2.md +1 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/kilo-glm-air/trial-3.md +1 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001/current/meta.json +115 -0
- package/src/skills/deep-research/tests/cases/TC-DEEP-RESEARCH-001-self-check-url.yaml +58 -0
- package/src/skills/deep-research/tests/index.yaml +20 -0
- package/src/skills/deep-research/tests/rubrics/self-check-url.md +34 -0
- package/src/skills/deep-research/workflows/base-checklist.md +19 -0
- package/src/skills/deep-research/workflows/benchmark.md +38 -0
- package/src/skills/deep-research/workflows/competitor.md +44 -0
- package/src/skills/deep-research/workflows/custom.md +32 -0
- package/src/skills/deep-research/workflows/market.md +44 -0
- package/src/skills/deep-research/workflows/technology.md +40 -0
- package/src/skills/deep-research/workflows/trend.md +40 -0
- package/src/skills/execute-task/README.md +44 -0
- package/src/skills/execute-task/SKILL.md +292 -0
- package/src/skills/execute-task/algorithms/execution-strategy.md +136 -0
- package/src/skills/execute-task/knowledge/context-checkpoints.md +75 -0
- package/src/skills/execute-task/knowledge/ticket-structure.md +70 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-free/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/kilo-glm-air/trial-3.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001/current/meta.json +88 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-001-no-ticket-creation.yaml +48 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-2.md +6 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-1.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-free/trial-3.md +8 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-1.md +9 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-2.md +26 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002/current/meta.json +89 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-002-no-duplicate-dod.yaml +44 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/judge.json +46 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003/current/meta.json +37 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-003-verification-proportionality.yaml +46 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-1.md +18 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-2.md +16 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/claude-haiku/trial-3.md +14 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-free/trial-3.md +1 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-1.md +8 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004/current/meta.json +89 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-004-no-foreign-ticket-edit.yaml +50 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-1.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-2.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/claude-haiku/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/judge.json +124 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-1.md +15 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-2.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-free/trial-3.md +5 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-1.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-2.md +11 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/kilo-glm-air/trial-3.md +4 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005/current/meta.json +88 -0
- package/src/skills/execute-task/tests/cases/TC-EXECUTE-TASK-005-ticket-fields-updated.yaml +39 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-902-create-file.md +41 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-904-current-task.md +40 -0
- package/src/skills/execute-task/tests/fixtures/IMPL-906-fill-ticket.md +42 -0
- package/src/skills/execute-task/tests/fixtures/QA-901-button-click.md +41 -0
- package/src/skills/execute-task/tests/fixtures/QA-903-visual-figma.md +40 -0
- package/src/skills/execute-task/tests/fixtures/TASK-905-done-with-typo.md +36 -0
- package/src/skills/execute-task/tests/index.yaml +39 -0
- package/src/skills/execute-task/tests/rubrics/no-duplicate-dod.md +22 -0
- package/src/skills/execute-task/tests/rubrics/no-foreign-ticket-edit.md +20 -0
- package/src/skills/execute-task/tests/rubrics/no-ticket-creation.md +21 -0
- package/src/skills/execute-task/tests/rubrics/ticket-fields-updated.md +23 -0
- package/src/skills/execute-task/tests/rubrics/verification-proportionality.md +22 -0
- package/src/skills/execute-task/workflows/execute.md +104 -0
- package/src/skills/manual-testing/README.md +63 -0
- package/src/skills/manual-testing/SKILL.md +176 -0
- package/src/skills/manual-testing/algorithms/blocked-tool-strategy.md +74 -0
- package/src/skills/manual-testing/algorithms/bug-severity.md +73 -0
- package/src/skills/manual-testing/algorithms/mcp-budget.md +97 -0
- package/src/skills/manual-testing/algorithms/test-prioritization.md +69 -0
- package/src/skills/manual-testing/knowledge/browser-extension-testing.md +102 -0
- package/src/skills/manual-testing/knowledge/browser-tools.md +114 -0
- package/src/skills/manual-testing/knowledge/desktop-tools-advanced.md +92 -0
- package/src/skills/manual-testing/knowledge/desktop-tools-core.md +76 -0
- package/src/skills/manual-testing/knowledge/sandbox-advanced.md +83 -0
- package/src/skills/manual-testing/knowledge/sandbox-core.md +67 -0
- package/src/skills/manual-testing/knowledge/stateful-edge-cases.md +69 -0
- package/src/skills/manual-testing/knowledge/test-case-design.md +107 -0
- package/src/skills/manual-testing/knowledge/testing-types.md +45 -0
- package/src/skills/manual-testing/templates/bug-report.md +52 -0
- package/src/skills/manual-testing/templates/test-case.md +34 -0
- package/src/skills/manual-testing/templates/test-plan.md +97 -0
- package/src/skills/manual-testing/templates/test-session-report.md +56 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-1.md +34 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-2.md +32 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/claude-sonnet/trial-3.md +30 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/judge.json +163 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-1.md +0 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-2.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-deepseek/trial-3.md +0 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-1.md +4 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-glm/trial-3.md +8 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-1.md +5 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-2.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/kilo-minimax/trial-3.md +7 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001/current/meta.json +114 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-001-sandbox-mandatory.yaml +38 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-1.md +44 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-2.md +32 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/claude-sonnet/trial-3.md +47 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/judge.json +163 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-1.md +19 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-deepseek/trial-3.md +24 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-1.md +19 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-2.md +13 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-glm/trial-3.md +18 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-1.md +21 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-2.md +15 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/kilo-minimax/trial-3.md +14 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002/current/meta.json +114 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-002-visual-tc-screenshot.yaml +37 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-1.md +76 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-2.md +71 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/claude-sonnet/trial-3.md +85 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/judge.json +46 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003/current/meta.json +36 -0
- package/src/skills/manual-testing/tests/cases/TC-MANUAL-TESTING-003-qa-non-ui-assertion.yaml +65 -0
- package/src/skills/manual-testing/tests/index.yaml +30 -0
- package/src/skills/manual-testing/tests/last-run-tc001-sonnet.log +140 -0
- package/src/skills/manual-testing/tests/last-run-tc002.log +1 -0
- package/src/skills/manual-testing/tests/last-run.log +1469 -0
- package/src/skills/manual-testing/tests/rubrics/qa-non-ui-assertion.md +31 -0
- package/src/skills/manual-testing/tests/rubrics/sandbox-mandatory.md +20 -0
- package/src/skills/manual-testing/tests/rubrics/visual-tc-screenshot.md +21 -0
- package/src/skills/manual-testing/workflows/acceptance.md +80 -0
- package/src/skills/manual-testing/workflows/exploratory.md +84 -0
- package/src/skills/manual-testing/workflows/regression.md +76 -0
- package/src/skills/manual-testing/workflows/smoke.md +109 -0
- package/src/skills/manual-testing/workflows/test-plan.md +75 -0
- package/src/skills/review-result/README.md +59 -0
- package/src/skills/review-result/SKILL.md +138 -0
- package/src/skills/review-result/algorithms/verification.md +112 -0
- package/src/skills/review-result/knowledge/dod-patterns.md +115 -0
- package/src/skills/review-result/scripts/verify-artifacts.js +384 -0
- package/src/skills/review-result/templates/verdict.md +153 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-1.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-2.md +7 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-haiku/trial-3.md +21 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-1.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-2.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/claude-sonnet/trial-3.md +18 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/judge.json +164 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-1.md +5 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-2.md +7 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-deepseek/trial-3.md +6 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-1.md +49 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-2.md +28 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-glm/trial-3.md +37 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-1.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-2.md +13 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/kilo-minimax/trial-3.md +21 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001/current/meta.json +116 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-001-visual-tc-trigger.yaml +51 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-1.md +23 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-2.md +22 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-haiku/trial-3.md +28 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-1.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-2.md +36 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/claude-sonnet/trial-3.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/judge.json +163 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-1.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-2.md +0 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-deepseek/trial-3.md +4 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-1.md +39 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-2.md +25 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-glm/trial-3.md +32 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-1.md +34 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-2.md +8 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/kilo-minimax/trial-3.md +23 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002/current/meta.json +115 -0
- package/src/skills/review-result/tests/cases/TC-REVIEW-RESULT-002-path-line-suffix.yaml +39 -0
- package/src/skills/review-result/tests/fixtures/IMPL-902-path-with-line.md +43 -0
- package/src/skills/review-result/tests/fixtures/QA-901-visual-button.md +46 -0
- package/src/skills/review-result/tests/index.yaml +25 -0
- package/src/skills/review-result/tests/rubrics/path-line-suffix.md +19 -0
- package/src/skills/review-result/tests/rubrics/visual-tc-trigger.md +19 -0
- package/src/skills/review-result/workflows/review.md +209 -0
@@ -0,0 +1,55 @@
+I'll analyze the pipeline. I see an ID collision; I'll check the details.
+I'll analyze the pipeline log. I see an ID collision, but the `allocate-ticket-ids` stage does not appear in the log between `check-plan-decomposition` and `decompose-plan`. I'll check the runner.
+I'll analyze the goto structure in the runner. The key problem: the `allocate-ticket-ids` stage is not visible between `check-plan-decomposition` and `decompose-plan`.
+## Findings
+
+### 1. The `allocate-ticket-ids` stage was skipped (CRITICAL)
+
+**Evidence from the log:**
+```
+[2026-04-06 16:36:12] [INFO] [check-plan-decomposition] GOTO check-plan-decomposition → decompose-plan status="needs_decomposition"
+```
+
+According to `pipeline.yaml` (lines 314-317), `needs_decomposition` should lead to `allocate-ticket-ids`, which passes `id_ranges_json` to `decompose-plan`. But the `allocate-ticket-ids` stage is absent from the log, and `decompose-plan` receives only `plan_file`, without `id_ranges_json`.
+
+**Timeline for PLAN-003:**
+| Time | Event |
+|-------|---------|
+| 16:36:02 | pick-first-task archives IMPL-002…IMPL-008 (from PLAN-002) |
+| 16:36:12 | check-plan-decomposition → **`allocate-ticket-ids` skipped** → decompose-plan |
+| 16:36:18 | decompose-plan creates IMPL-002, IMPL-003 **with an ID collision** |
+
+**ID collision:**
+- IMPL-002 and IMPL-003 already exist in `tickets/archive/` (they were completed in PLAN-002)
+- The new decomposition of PLAN-003 creates tickets with the same IDs
+
+### 2. The stop-gate did not fire in decompose-plan
+
+According to `workflows/decompose.md` (step 9.B), when `id_ranges_json` is missing the decomposer must **stop**. Instead:
+- Tickets were created successfully (exitCode=0)
+- The output lacks the `ids_allocated_from` section (mandatory when the stop-gate is executed)
+
+**Conclusion:** the decomposer did not honor its own stop-gate; it continued working with invalid input.
+
+### 3. Root Cause
+
+| Stage | What should have happened | What happened |
+|------|--------------------------|--------------|
+| check-plan-decomposition | Return `needs_decomposition` → `allocate-ticket-ids` | goto leads directly to `decompose-plan` |
+| allocate-ticket-ids | Generate `id_ranges_json` by scanning tickets/, including archive/ | **Skipped** |
+| decompose-plan | Accept `id_ranges_json`, validate it, stop if it is empty | Continued without the parameter |
+
+**Hypothesis:** a mismatch between `pipeline.yaml` and the code the runner actually executes. The goto logic in the runner (`resolveNextStage`) is correct, but the runner reads a stale version of the config.
+
+### 4. Coach principles violated
+
+| Principle | How it was violated |
+|---------|-----------|
+| **1. Root Cause First** | The architectural safeguard against collisions (allocate-ticket-ids) does not work; first-order cause: the infrastructure did not enforce the contract |
+| **10. Self-Correct** | The decomposer should have stopped when `id_ranges_json` was missing but kept working; the self-check did not fire |
+
+---
+
+## Recommendation
+
+**CRITICAL:** Verify that the pipeline.yaml configuration is consistent with the runner's actual behavior. The `allocate-ticket-ids` stage is a mandatory prerequisite for `decompose-plan`. Without it, ID allocation falls to the agent, which violates the architectural principle "IDs are decided by the infrastructure, not the agent".
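The collision described in this trial output comes down to which directories are scanned before a new ticket number is handed out. The following is a minimal sketch of that idea only, not the package's `allocate-ticket-ids` implementation: the function name `nextFreeNumber`, the file names, and the assumption that the prefix contains only letters are all hypothetical.

```javascript
// Hypothetical helper, not the package's allocate-ticket-ids code.
// Given a ticket prefix and every known ticket filename, return the
// next number that is safe to hand out. Assumes the prefix is plain
// letters (no regex metacharacters).
function nextFreeNumber(prefix, filenames) {
  const pattern = new RegExp(`^${prefix}-(\\d+)`);
  let highest = 0;
  for (const name of filenames) {
    const match = pattern.exec(name);
    if (match) highest = Math.max(highest, Number(match[1]));
  }
  return highest + 1;
}

// Illustrative file names modeled on the incident described above.
const activeTickets = ["IMPL-001.md"];
const archivedTickets = ["IMPL-002.md", "IMPL-003.md", "IMPL-008.md"];

// Scanning only active tickets yields a number the archive already uses.
console.log(nextFreeNumber("IMPL", activeTickets));
// Including the archive avoids the reuse.
console.log(nextFreeNumber("IMPL", [...activeTickets, ...archivedTickets]));
```

Passing the archived names alongside the active ones is the whole point of the sketch: the allocator only protects against reuse for the directories it is shown.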
@@ -0,0 +1,94 @@
+{
+  "date": "2026-04-21T08:57:32.459Z",
+  "skill_sha": "6df42d0",
+  "status": "passed",
+  "duration_ms": 1,
+  "per_model": {
+    "claude-sonnet": {
+      "passed": true,
+      "pass_count": 3,
+      "total": 3,
+      "threshold": 2
+    },
+    "kilo-deepseek": {
+      "passed": true,
+      "pass_count": 2,
+      "total": 3,
+      "threshold": 2
+    },
+    "kilo-minimax": {
+      "passed": true,
+      "pass_count": 2,
+      "total": 3,
+      "threshold": 2
+    },
+    "kilo-glm": {
+      "passed": true,
+      "pass_count": 3,
+      "total": 3,
+      "threshold": 2
+    }
+  },
+  "rubric_scores": [
+    {
+      "agentId": "claude-sonnet",
+      "trial": 1,
+      "score": 5
+    },
+    {
+      "agentId": "claude-sonnet",
+      "trial": 2,
+      "score": 5
+    },
+    {
+      "agentId": "claude-sonnet",
+      "trial": 3,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-deepseek",
+      "trial": 1,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-deepseek",
+      "trial": 2,
+      "score": 1
+    },
+    {
+      "agentId": "kilo-deepseek",
+      "trial": 3,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-glm",
+      "trial": 1,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-glm",
+      "trial": 2,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-glm",
+      "trial": 3,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-minimax",
+      "trial": 1,
+      "score": 3
+    },
+    {
+      "agentId": "kilo-minimax",
+      "trial": 2,
+      "score": 5
+    },
+    {
+      "agentId": "kilo-minimax",
+      "trial": 3,
+      "score": 5
+    }
+  ]
+}
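The `per_model` figures in this file are consistent with a simple majority rule over `rubric_scores`. The sketch below assumes that a trial passes when its score is at least 4 (a `pass_threshold: 4` appears in a test case elsewhere in this diff; whether this file was produced with that value is an assumption) and that a model passes when `pass_count` reaches `threshold`. The function name `summarizeTrials` is hypothetical and is not taken from the package.

```javascript
// Hypothetical reconstruction of the per-model summary. Field names follow
// the JSON above; the pass rule (score >= passThreshold) is an assumption.
function summarizeTrials(rubricScores, passThreshold, requiredPasses) {
  const perModel = {};
  for (const { agentId, score } of rubricScores) {
    if (!perModel[agentId]) {
      perModel[agentId] = { pass_count: 0, total: 0, threshold: requiredPasses };
    }
    perModel[agentId].total += 1;
    if (score >= passThreshold) perModel[agentId].pass_count += 1;
  }
  for (const entry of Object.values(perModel)) {
    entry.passed = entry.pass_count >= entry.threshold;
  }
  return perModel;
}

// Scores copied from the two models above that each have one low-scoring trial.
const sampleScores = [
  { agentId: "kilo-deepseek", trial: 1, score: 5 },
  { agentId: "kilo-deepseek", trial: 2, score: 1 },
  { agentId: "kilo-deepseek", trial: 3, score: 5 },
  { agentId: "kilo-minimax", trial: 1, score: 3 },
  { agentId: "kilo-minimax", trial: 2, score: 5 },
  { agentId: "kilo-minimax", trial: 3, score: 5 },
];
console.log(summarizeTrials(sampleScores, 4, 2));
```

Under these assumptions both models end up with `pass_count: 2` of `total: 3` and `passed: true`, which matches the values recorded in the file.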
@@ -0,0 +1,53 @@
+id: TC-COACH-001
+title: "Coach builds a timeline of events per artifact ID"
+origin:
+chg: [CHG-032]
+incidents:
+- "pipeline_2026-04-06_16-36-02.log:22-35 vs :119-123"
+principle: "Evidence-Based (#2)"
+backlog_sources:
+- ".workflow/coach-backlog.yaml"
+tags: [evidence-based, log-analysis]
+severity: critical
+
+scenario:
+system_prompt_file: "../../SKILL.md"
+extra_instructions: |
+The stakeholder asks: "Analyze this pipeline and report findings."
+Pay special attention to artifact IDs (tickets, plans) that are mentioned in the log more than once.
+For each such ID, build a timeline of events and check for collisions or contradictions.
+inputs:
+- kind: file
+path: "../fixtures/pipeline-2026-04-06_id-collision.log"
+as: pipeline_log
+
+assertions:
+static:
+- kind: skill_contains
+file: "SKILL.md"
+pattern: "временну.+диаграмм"
+reason: "The instruction about the timeline must survive edits."
+
+deterministic:
+- kind: output_contains_all
+values: ["IMPL-001", "IMPL-002", "IMPL-003", "collision"]
+- kind: output_matches
+regex: "line\\s*(28|29|35|56|57|58)"
+- kind: output_does_not_contain
+values: ["угад", "предполаг"]
+
+rubric:
+- kind: llm_judge
+rubric_file: "../rubrics/evidence-based.md"
+criterion: "Does the answer demonstrate building a timeline per ID?"
+pass_threshold: 4
+trials: 3
+aggregate: majority
+
+execution:
+# target_agents is NOT duplicated in cases; it is inherited from the skill's tests/index.yaml.
+# A targeted override is possible if a specific test must run on a different set.
+# judge_agent is also inherited from index.yaml; it can be overridden.
+timeout_s: 1200
+# target_agents: [...] # optional override (usually not specified)
+# judge_agent: ... # optional override
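The three `deterministic` assertion kinds in this test case read as plain string checks on the agent's output. The sketch below shows one way they could be evaluated. It is not the package's test runner: the function name `checkDeterministic`, the case-sensitive substring matching, and the sample output string are assumptions made for illustration.

```javascript
// Hypothetical evaluator for the three assertion kinds used above.
// Returns one boolean per assertion, in order.
function checkDeterministic(output, assertions) {
  return assertions.map((assertion) => {
    switch (assertion.kind) {
      case "output_contains_all":
        return assertion.values.every((value) => output.includes(value));
      case "output_matches":
        return new RegExp(assertion.regex).test(output);
      case "output_does_not_contain":
        return assertion.values.every((value) => !output.includes(value));
      default:
        throw new Error(`unknown assertion kind: ${assertion.kind}`);
    }
  });
}

// Assertion values copied from the test case; the output string is invented.
const caseAssertions = [
  { kind: "output_contains_all", values: ["IMPL-001", "IMPL-002", "IMPL-003", "collision"] },
  { kind: "output_matches", regex: "line\\s*(28|29|35|56|57|58)" },
  { kind: "output_does_not_contain", values: ["угад", "предполаг"] },
];
const sampleOutput = "IMPL-001, IMPL-002 and IMPL-003: ID collision, see line 28 of the log.";
console.log(checkDeterministic(sampleOutput, caseAssertions));
```

Returning one result per assertion, rather than a single boolean, keeps it visible which check failed when a trial is rejected.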
@@ -0,0 +1,46 @@
+---
+
+## Incident analysis: "files were not created, only stdout reported success"
+
+### Investigation status
+
+**This is the 9th+ analysis of the same class of errors.** The backlog contains entries from 2026-04-15 through today.
+
+---
+
+### What has been established (Evidence-Based)
+
+**Two documented incidents this week:**
+
+| Incident | Date | Agent | Pattern | Root cause |
+|----------|------|-------|---------|------------|
+| QA-18 | 2026-04-17 | claude-haiku | A: stdout instead of Edit | Context ran out (~9.5 min); the agent wrote the summary to chat and did not call Edit on the ticket file → result_filled=false |
+| IMPL-24 | 2026-04-18 | kilo-free | B: permission denied without blocked | 3 of 4 context files were inaccessible; the agent continued and emitted `status: default` instead of `status: blocked` |
+
+**New incidents** in the latest log `pipeline_2026-04-19_07-36-07`: **zero** (all tickets have result_filled=true).
+
+---
+
+### Why the problem has not been fixed
+
+**Textual measures are exhausted (6+ iterations):**
+
+The execute-task SKILL.md contains: ⛔ item 9, "Stdout ≠ ticket", an Edit checkpoint (step 2), a permission gate, a 13-item checklist before RESULT, and an explicit ban on `status: default` without a Summary. Three different agents ignore this systematically.
+
+Under the principle of "exhaustion of textual measures" (≥3 reinforcements → text does not work), **further textual edits are prohibited and will not be made**.
+
+---
+
+### ⛔ BLOCKER: HUMAN-5
+
+**HUMAN-5** (`in-progress`, created 2026-04-18, DoD: **0/4**, Result: **empty**) is the machine-level safeguard, the only one that closes this class of error.
+
+Recommended: implement **Protection 1** (lowest complexity, covers both patterns):
+
+> After the execute-task stage, before the move to review/, the runner checks the tool log: was `Edit`/`Write` called on the ticket file at least once?
+> - **No** → automatically `status: failed`, increment the attempt counter, skip verify-artifacts
+> - **Implementation**: the runner already writes tool calls to STDERR; what is needed is a parser for `← Edit {path}` that checks for a match against the path `in-progress/{TICKET-ID}.md`
+
+Until HUMAN-5 is implemented, incidents will recur regardless of how many textual edits are made to the skills.
+
+**Action required from the stakeholder:** implement at least one protection from HUMAN-5.
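The Protection 1 check described in this trial output can be sketched as a small parser over the runner's STDERR text. This is a minimal illustration, not code from the package. The source only gives the line shape `← Edit {path}`, so the matching line format for `Write`, the function name `ticketWasEdited`, and the sample log lines are all assumptions.

```javascript
// Hypothetical post-check: did the agent call Edit or Write on its own
// ticket file? Assumes each tool call is logged on its own line as
// "← <Tool> <path>"; only the Edit form is attested in the source.
function ticketWasEdited(stderrLog, ticketId) {
  const expectedSuffix = `in-progress/${ticketId}.md`;
  return stderrLog.split("\n").some((line) => {
    const match = /←\s*(Edit|Write)\s+(\S+)/.exec(line);
    return match !== null && match[2].endsWith(expectedSuffix);
  });
}

// Invented log: the ticket is read, but only a source file is edited.
const sampleLog = [
  "← Read .workflow/tickets/in-progress/QA-18.md",
  "← Edit src/app.js",
].join("\n");

console.log(ticketWasEdited(sampleLog, "QA-18"));
console.log(ticketWasEdited(sampleLog + "\n← Edit .workflow/tickets/in-progress/QA-18.md", "QA-18"));
```

A runner using such a check would treat a `false` result as `status: failed` before moving the ticket to review/, as the quoted recommendation describes.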
@@ -0,0 +1,50 @@
|
|
|
1
|
+
Бэклог актуален. Даю финальный ответ стейкхолдеру.
|
|
2
|
+
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
## Диагноз — Evidence-Based
|
|
6
|
+
|
|
7
|
+
### Это не второй раз. Это 4-й подтверждённый инцидент.
|
|
8
|
+
|
|
9
|
+
| Тикет | Дата | Агент | Паттерн |
|
|
10
|
+
|-------|------|-------|---------|
|
|
11
|
+
| QA-18 | 04-17 | claude-haiku | A: контекст иссяк → stdout вместо Edit |
|
|
12
|
+
| IMPL-24 | 04-18 | kilo-free | B: permission denied × 3 → `default` вместо `blocked` |
|
|
13
|
+
| DOCS-4 | 04-19 | — | C: Edit вызван, mtime обновился, но нужный **контент** не добавлен — verify-artifacts принял |
|
|
14
|
+
| + earlier incidents | 04-15 | ×2 | - |
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
### Root cause: the machine-level safeguard is not implemented
|
|
19
|
+
|
|
20
|
+
**HUMAN-5** (coach escalation of 2026-04-18) was **ghost-closed**:
|
|
21
|
+
- `completed_at: 2026-04-19T09:57:46Z`: moved to `done/`
|
|
22
|
+
- DoD: **0/4** (all checkboxes empty)
|
|
23
|
+
- Result section: **completely empty**
|
|
24
|
+
- The `validate-context-files.js` script and the Edit-call post-check: **absent** from `src/scripts/`
|
|
25
|
+
|
|
26
|
+
The escalation ticket itself fell victim to ghost execution. No safeguard was introduced → relapses continue.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
### What needs to be done (in priority order)
|
|
31
|
+
|
|
32
|
+
**1. Return HUMAN-5 to ready/**: it was closed without the DoD being met.
|
|
33
|
+
|
|
34
|
+
**2. Implement Safeguard 1** from HUMAN-5 (smallest amount of work, covers Patterns A and B):
|
|
35
|
+
```
|
|
36
|
+
After execute-task, before move-to-review:
|
|
37
|
+
the runner checks the STDERR log → was Edit/Write called on .workflow/tickets/in-progress/{ID}.md?
|
|
38
|
+
If not → status: failed, ticket → ready/ with increment_attempts
|
|
39
|
+
Implementation: a parser for "← Edit" lines in the runner's STDERR stream
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
**3. Complete FIX-7** (backlog): add an `mtime > ticket.created_at` check to `checkFilesExist()` in `verify-artifacts.js`; this will cover Pattern C.
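
The `mtime > ticket.created_at` comparison from FIX-7 might look like the sketch below. The real signature of `checkFilesExist()` is not shown in this diff, so this is a standalone, hypothetical helper (`findStaleArtifacts`) that takes already-collected modification times; in Node.js those could come from `fs.statSync(path).mtimeMs`.

```javascript
// Sketch only: `findStaleArtifacts` and the { path, mtimeMs } input shape
// are assumptions, not the package's verify-artifacts.js API.
function findStaleArtifacts(artifacts, ticketCreatedAt) {
  const createdMs = new Date(ticketCreatedAt).getTime();
  if (Number.isNaN(createdMs)) {
    throw new Error(`Invalid ticket created_at: ${ticketCreatedAt}`);
  }
  // An artifact counts as stale unless it was modified strictly after
  // the ticket was created.
  return artifacts
    .filter((artifact) => !(artifact.mtimeMs > createdMs))
    .map((artifact) => artifact.path);
}
```

An empty result means every listed file was touched after the ticket was created. The helper compares timestamps only and says nothing about what was written to the files.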
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
### Coach's position
|
|
47
|
+
|
|
48
|
+
Textual measures in execute-task are **exhausted** (6+ reinforcements, the exhaustion principle). I am making no new edits to the skills.
|
|
49
|
+
|
|
50
|
+
Eliminating the relapses is possible **only** by implementing the machine-level safeguard. The first step is to move HUMAN-5 from done/ back to ready/ and carry it out.
|
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
---
|
|
2
|
+
|
|
3
|
+
## Analysis: "files not created, only stdout showed success"
|
|
4
|
+
|
|
5
|
+
### Status: BLOCKER not resolved
|
|
6
|
+
|
|
7
|
+
This is the **9th analysis** of the same class of incidents in 5 days (2026-04-15 → 2026-04-19).
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
### What the data shows
|
|
12
|
+
|
|
13
|
+
**Latest log** (`pipeline_2026-04-19_07-36-07`):
|
|
14
|
+
- All tickets finished with `result_filled: true`
|
|
15
|
+
- No new ghost executions
|
|
16
|
+
|
|
17
|
+
**Incidents documented by the stakeholder:**
|
|
18
|
+
| Incident | Date | Agent | Pattern |
|
|
19
|
+
|----------|------|-------|---------|
|
|
20
|
+
| QA-18 | 2026-04-17 | claude-haiku | Context exhaustion → stdout instead of Edit |
|
|
21
|
+
| IMPL-24 | 2026-04-18 | kilo-free | Permission denied × 3 → not recognized as blocked |
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
### What has been done and why it does not work
|
|
26
|
+
|
|
27
|
+
**Textual measures (6+ iterations, 2026-04-15 → 2026-04-19):**
|
|
28
|
+
- `⛔ Stdout ≠ ticket` in SKILL.md, step 5
|
|
29
|
+
- Item 9 "Ghost execution" in CRITICAL CONSTRAINTS
|
|
30
|
+
- Permission gate in execution-strategy.md, step 3
|
|
31
|
+
- Checklist before RESULT
|
|
32
|
+
- Two patterns with examples in algorithms/
|
|
33
|
+
|
|
34
|
+
**Bottom line:** an agent that did not read the instruction, or ignored it, will not read it on the 7th attempt either. **Textual measures are deemed exhausted under principle 1.**
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
### The only path to a fix: **HUMAN-5**
|
|
39
|
+
|
|
40
|
+
**HUMAN-5** (`tickets/in-progress/HUMAN-5.md`), the escalation for a machine-level safeguard:
|
|
41
|
+
- Created: 2026-04-18
|
|
42
|
+
- Status: `in-progress`
|
|
43
|
+
- DoD completed: **0/4**
|
|
44
|
+
- Result: **empty** (not implemented)
|
|
45
|
+
|
|
46
|
+
The ticket contains three ready-made safeguard options. **Safeguard 1** is recommended: a post-check of whether Edit was called on the ticket file. Implementation takes 1-2 hours and closes both patterns without depending on agent discipline.
|
|
47
|
+
|
|
48
|
+
**I am making no new text edits to the skills**, because principle 1 forbids it: a 9th edit will not work systematically where the previous 6 did not help.
|
|
@@ -0,0 +1,151 @@
|
|
|
1
|
+
{
|
|
2
|
+
"per_model": {
|
|
3
|
+
"claude-sonnet": {
|
|
4
|
+
"pass_count": 3,
|
|
5
|
+
"total": 3,
|
|
6
|
+
"trials": [
|
|
7
|
+
{
|
|
8
|
+
"trial": 1,
|
|
9
|
+
"score": 5,
|
|
10
|
+
"passed": true
|
|
11
|
+
},
|
|
12
|
+
{
|
|
13
|
+
"trial": 2,
|
|
14
|
+
"score": 5,
|
|
15
|
+
"passed": true
|
|
16
|
+
},
|
|
17
|
+
{
|
|
18
|
+
"trial": 3,
|
|
19
|
+
"score": 5,
|
|
20
|
+
"passed": true
|
|
21
|
+
}
|
|
22
|
+
]
|
|
23
|
+
},
|
|
24
|
+
"kilo-deepseek": {
|
|
25
|
+
"pass_count": 2,
|
|
26
|
+
"total": 3,
|
|
27
|
+
"trials": [
|
|
28
|
+
{
|
|
29
|
+
"trial": 1,
|
|
30
|
+
"score": 1,
|
|
31
|
+
"passed": false
|
|
32
|
+
},
|
|
33
|
+
{
|
|
34
|
+
"trial": 2,
|
|
35
|
+
"score": 5,
|
|
36
|
+
"passed": true
|
|
37
|
+
},
|
|
38
|
+
{
|
|
39
|
+
"trial": 3,
|
|
40
|
+
"score": 5,
|
|
41
|
+
"passed": true
|
|
42
|
+
}
|
|
43
|
+
]
|
|
44
|
+
},
|
|
45
|
+
"kilo-minimax": {
|
|
46
|
+
"pass_count": 3,
|
|
47
|
+
"total": 3,
|
|
48
|
+
"trials": [
|
|
49
|
+
{
|
|
50
|
+
"trial": 1,
|
|
51
|
+
"score": 5,
|
|
52
|
+
"passed": true
|
|
53
|
+
},
|
|
54
|
+
{
|
|
55
|
+
"trial": 2,
|
|
56
|
+
"score": 5,
|
|
57
|
+
"passed": true
|
|
58
|
+
},
|
|
59
|
+
{
|
|
60
|
+
"trial": 3,
|
|
61
|
+
"score": 5,
|
|
62
|
+
"passed": true
|
|
63
|
+
}
|
|
64
|
+
]
|
|
65
|
+
},
|
|
66
|
+
"kilo-glm": {
|
|
67
|
+
"pass_count": 3,
|
|
68
|
+
"total": 3,
|
|
69
|
+
"trials": [
|
|
70
|
+
{
|
|
71
|
+
"trial": 1,
|
|
72
|
+
"score": 5,
|
|
73
|
+
"passed": true
|
|
74
|
+
},
|
|
75
|
+
{
|
|
76
|
+
"trial": 2,
|
|
77
|
+
"score": 5,
|
|
78
|
+
"passed": true
|
|
79
|
+
},
|
|
80
|
+
{
|
|
81
|
+
"trial": 3,
|
|
82
|
+
"score": 5,
|
|
83
|
+
"passed": true
|
|
84
|
+
}
|
|
85
|
+
]
|
|
86
|
+
}
|
|
87
|
+
},
|
|
88
|
+
"rubric_scores": [
|
|
89
|
+
{
|
|
90
|
+
"agentId": "claude-sonnet",
|
|
91
|
+
"trial": 1,
|
|
92
|
+
"score": 5
|
|
93
|
+
},
|
|
94
|
+
{
|
|
95
|
+
"agentId": "claude-sonnet",
|
|
96
|
+
"trial": 2,
|
|
97
|
+
"score": 5
|
|
98
|
+
},
|
|
99
|
+
{
|
|
100
|
+
"agentId": "claude-sonnet",
|
|
101
|
+
"trial": 3,
|
|
102
|
+
"score": 5
|
|
103
|
+
},
|
|
104
|
+
{
|
|
105
|
+
"agentId": "kilo-deepseek",
|
|
106
|
+
"trial": 1,
|
|
107
|
+
"score": 1
|
|
108
|
+
},
|
|
109
|
+
{
|
|
110
|
+
"agentId": "kilo-deepseek",
|
|
111
|
+
"trial": 2,
|
|
112
|
+
"score": 5
|
|
113
|
+
},
|
|
114
|
+
{
|
|
115
|
+
"agentId": "kilo-deepseek",
|
|
116
|
+
"trial": 3,
|
|
117
|
+
"score": 5
|
|
118
|
+
},
|
|
119
|
+
{
|
|
120
|
+
"agentId": "kilo-minimax",
|
|
121
|
+
"trial": 1,
|
|
122
|
+
"score": 5
|
|
123
|
+
},
|
|
124
|
+
{
|
|
125
|
+
"agentId": "kilo-minimax",
|
|
126
|
+
"trial": 2,
|
|
127
|
+
"score": 5
|
|
128
|
+
},
|
|
129
|
+
{
|
|
130
|
+
"agentId": "kilo-minimax",
|
|
131
|
+
"trial": 3,
|
|
132
|
+
"score": 5
|
|
133
|
+
},
|
|
134
|
+
{
|
|
135
|
+
"agentId": "kilo-glm",
|
|
136
|
+
"trial": 1,
|
|
137
|
+
"score": 5
|
|
138
|
+
},
|
|
139
|
+
{
|
|
140
|
+
"agentId": "kilo-glm",
|
|
141
|
+
"trial": 2,
|
|
142
|
+
"score": 5
|
|
143
|
+
},
|
|
144
|
+
{
|
|
145
|
+
"agentId": "kilo-glm",
|
|
146
|
+
"trial": 3,
|
|
147
|
+
"score": 5
|
|
148
|
+
}
|
|
149
|
+
],
|
|
150
|
+
"timestamp": "2026-04-19T10:47:10.881Z"
|
|
151
|
+
}
|
|
File without changes
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
**Analysis of the "ghost execution" problem (files not created, only stdout showed "success")**
|
|
2
|
+
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
### 📊 **Context**
|
|
6
|
+
The stakeholder's request says "the second time this week". In fact, **4 confirmed incidents** were recorded over the week:
|
|
7
|
+
1. **COACH-19** (2026-04-15, claude-sonnet, qwen-code)
|
|
8
|
+
2. **QA-18** (2026-04-17, claude-haiku, "write to stdout" pattern)
|
|
9
|
+
3. **IMPL-24** (2026-04-18, kilo-free, "permission denied without blocked" pattern)
|
|
10
|
+
4. **Other cases** (recorded in the coach backlog)
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
### 🔍 **Findings**
|
|
15
|
+
| Level | Description |
|
|
16
|
+
|---------|----------|
|
|
17
|
+
| **CRITICAL** | **Textual measures are exhausted**: 6+ iterations of reinforcing the instructions in the `execute-task` skill (CHG-042…CHG-047) did not stop the incidents. Agents systematically ignore the prohibitions. |
|
|
18
|
+
| **BLOCKER** | **Escalation HUMAN-5** (machine-level safeguard) was **phantom-closed**: the ticket was moved to `done/`, but `Result` is empty and DoD is 0/4. Safeguards 1-3 are **not implemented**. |
|
|
19
|
+
| **HIGH** | Without a machine-level safeguard, incidents will recur regardless of the number of text edits. |
|
|
20
|
+
| **MEDIUM** | No new ghost-execution cases found after 2026-04-18 12:29 (latest log: `pipeline_2026-04-19_07-36-07.log`). |
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
### 🛡 **Recommendations**
|
|
25
|
+
1. **Immediately implement the machine-level safeguard** from HUMAN-5 (**Safeguard 1** is recommended: a post-check in the pipeline that `Edit` was called on the ticket file).
|
|
26
|
+
2. **Investigate why HUMAN-5 was phantom-closed** and resume its execution.
|
|
27
|
+
3. **Introduce an automatic check** of `result_filled` and of the physical existence of files in the pipeline.
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
### 🛠 **Coach actions**
|
|
32
|
+
- **No edits** are made to the skills: textual measures are deemed exhausted (≥3 reinforcements for one root problem).
|
|
33
|
+
- A record of this analysis was added to `.workflow/coach-backlog.yaml` (artifact `stakeholder request: ghost execution repeat (2026-04-19 14:57)`).
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
**Conclusion:** the problem is **systemic**; solving it requires a **machine-level safeguard**, not yet another textual effort. HUMAN-5 is a **BLOCKER** for eliminating this class of error.
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
**Analysis of the "ghost execution" incident (2026-04-19, 14:57)**
|
|
2
|
+
|
|
3
|
+
| Aspect | State |
|
|
4
|
+
|--------|-----------|
|
|
5
|
+
| **Last confirmed incident** | QA-18 (2026-04-17, claude-haiku) + IMPL-24 (2026-04-18, kilo-free) |
|
|
6
|
+
| **Logs for 2026-04-19** | No new `result_filled:false` cases found |
|
|
7
|
+
| **Root cause** | Textual prohibitions in `execute-task/algorithms/execution-strategy.md` are systematically ignored by agents |
|
|
8
|
+
| **Textual efforts** | **Exhausted** (6+ iterations in 4 days, ≥3 edits for one problem) |
|
|
9
|
+
| **Escalation** | HUMAN-5 ("Machine-level safeguard against ghost execution") |
|
|
10
|
+
| **HUMAN-5 status** | `in-progress` (the ticket was moved to `done/`, but `Result` is empty and DoD is 0/4) |
|
|
11
|
+
| **Actual implementation** | **Absent**: none of the three recommended safeguards has been put in place |
|
|
12
|
+
|
|
13
|
+
**Coach actions**
|
|
14
|
+
|
|
15
|
+
1. **Escalation check**: the "Escalation tracking" block was applied (SKILL.md:72-78).
|
|
16
|
+
2. **Strengthening the coach skill**: a branch "If not, but an error is pointed out" was added to the **MAIN RULE** (SKILL.md:23-27). Now, when an error is pointed out again, the coach must check the status of the previously created escalation and explicitly report the blocker.
|
|
17
|
+
3. **Backlog entry**: the incident is recorded in `.workflow/coach-backlog.yaml` (entry dated 2026-04-19T14:57:25+05:00).
|
|
18
|
+
4. **Testing**: the existing regression test TC-COACH-003 covers escalation checks; a run has been started (verdict pending).
|
|
19
|
+
|
|
20
|
+
**Recommendation to the stakeholder**
|
|
21
|
+
|
|
22
|
+
- **BLOCKER**: HUMAN-5 is not implemented. Without a machine-level safeguard, ghost-execution incidents will recur regardless of the number of text edits.
|
|
23
|
+
- **Action**: implement **at least one safeguard** from HUMAN-5 (preferably **Safeguard 1**: a post-check that Edit was called on the ticket file).
|
|
24
|
+
- **HUMAN-5 status**: the ticket must be reopened (moved from `done/` to `ready/`) and its `Result` filled in with confirmation that the safeguard has been implemented.
|
|
25
|
+
|
|
26
|
+
**No new text edits were made to `execute-task`**: the principle of exhaustion of textual measures (SKILL.md:74-78) forbids further reinforcement of the instructions. The only systemic path is a machine-level safeguard.
|
|
27
|
+
|
|
28
|
+
**Affected files**
|
|
29
|
+
- `.workflow/src/skills/coach/SKILL.md` (MAIN RULE)
|
|
30
|
+
- `.workflow/coach-backlog.yaml` (incident analysis)
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
This is an ad-hoc request of type ANALYZE. I start by reading the coach backlog and looking for evidence.
|
|
2
|
+
Checking the HUMAN-5 escalation per the procedure (verbatim):
|
|
3
|
+
|
|
4
|
+
**Item A:** `status: in-progress` (line 33) ≠ `done` → **FAIL**
|
|
5
|
+
**Item B:** Summary empty, Started empty, Completed empty → **FAIL**
|
|
6
|
+
**Item C:** DoD 0/4 (all `- [ ]`, lines 83-86) → **FAIL**
|
|
7
|
+
|
|
8
|
+
HUMAN-5 is physically in `done/`, but was **phantom-closed**: the pipeline moved it automatically (the infinite-loop bug, recorded earlier). None of the three safeguards is implemented.
|
|
9
|
+
|
|
10
|
+
Writing the analysis result to the backlog.
|
|
11
|
+
Writing the analysis to the backlog.
|
|
12
|
+
**HUMAN-5 is a BLOCKER. The escalation is not implemented.**
|
|
13
|
+
|
|
14
|
+
Check per the procedure (verbatim):
|
|
15
|
+
- **Item A:** `status: in-progress` ≠ done → **FAIL**
|
|
16
|
+
- **Item B:** Summary empty, Started empty, Completed empty → **FAIL**
|
|
17
|
+
- **Item C:** DoD 0/4 (no checkbox is ticked) → **FAIL**
|
|
18
|
+
|
|
19
|
+
HUMAN-5 physically sits in `done/`, but was **phantom-closed**: the pipeline moved it automatically (infinite-loop bug). None of the three machine-level safeguards is implemented.
|
|
20
|
+
|
|
21
|
+
**Textual measures are exhausted:** 6+ edits to execute-task over 4+ days, 3 different offending agents. No new textual reinforcements are applied, per the principle of exhaustion of textual measures.
|
|
22
|
+
|
|
23
|
+
**Required:** reopen HUMAN-5 (move it from `done/` to `backlog/` or `ready/`) and implement at least Safeguard 1, a post-check of whether Edit/Write was called on the ticket file. Without this, incidents will recur regardless of the number of text edits to the skills.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
HUMAN-5 escalation check procedure (verbatim):
|
|
2
|
+
|
|
3
|
+
**Item A:** `status: in-progress` (line 33) ≠ done → **FAIL**
|
|
4
|
+
**Item B:** Summary empty (line 93), Started empty (line 96), Completed empty (line 97) → **FAIL**
|
|
5
|
+
**Item C:** DoD 0/4 (lines 83-86, all `- [ ]`) → **FAIL**
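
The three-item check above could be automated along these lines. This is a hedged sketch: the ticket file format is not shown in this diff, so the input shape (`status`, `summary`, `started`, `completed`, `body`) and the helper names `countDod` and `checkEscalation` are hypothetical, and the pass conditions for items B and C (all three fields filled, every checkbox ticked) are assumptions.

```javascript
// Count DoD checkboxes written as "- [ ]" or "- [x]" in the ticket body.
function countDod(markdown) {
  const boxes = markdown.match(/^\s*- \[[ xX]\]/gm) || [];
  const done = boxes.filter((box) => /\[[xX]\]/.test(box)).length;
  return { done, total: boxes.length };
}

// `ticket` is a hypothetical, already-parsed shape:
// { status, summary, started, completed, body }
function checkEscalation(ticket) {
  const filled = (value) => typeof value === "string" && value.trim() !== "";
  const dod = countDod(ticket.body || "");
  const items = {
    A: ticket.status === "done", // Item A: status must be done
    B: filled(ticket.summary) && filled(ticket.started) && filled(ticket.completed), // Item B: Result filled
    C: dod.total > 0 && dod.done === dod.total, // Item C: every DoD checkbox ticked
  };
  // Any failed item means the escalation is still a blocker.
  return { items, dod, blocker: !(items.A && items.B && items.C) };
}
```

For a ticket in the state described here (status `in-progress`, empty Result fields, four unticked checkboxes), all three items come back false and `blocker` is true.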
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
**HUMAN-5 is a BLOCKER. The escalation is not implemented.** All three items (A, B, C) failed.
|
|
10
|
+
|
|
11
|
+
Textual measures are exhausted (6+ iterations over 2026-04-15…2026-04-19, the exhaustion rule applied). NO new text edits were made to the skills; that would not solve this class of error.
|
|
12
|
+
|
|
13
|
+
**The only vector:** implement HUMAN-5 (Safeguard 1 is recommended: a post-check of whether Edit/Write was called on the ticket file). The ticket is in `done/` but was not actually carried out; it must be reopened (moved to `ready/` or `backlog/`) and implemented.
|
|
14
|
+
|
|
15
|
+
Updating the coach backlog.
|
|
16
|
+
MAIN RULE: "Did the stakeholder correct me in this session?" No, this is the first request in the session. An error is pointed out → checked for a repeat incident → HUMAN-5 is not implemented → BLOCKER.
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
**Summary:**
|
|
21
|
+
|
|
22
|
+
HUMAN-5 (`done/HUMAN-5.md`) is a **BLOCKER**. All three check items FAIL:
|
|
23
|
+
- **A:** `status: in-progress` ≠ done
|
|
24
|
+
- **B:** Result empty (Summary, Started, Completed: all empty)
|
|
25
|
+
- **C:** DoD 0/4
|
|
26
|
+
|
|
27
|
+
Textual measures are exhausted (6+ iterations over 2026-04-15…2026-04-19). No new edits were made to the skills.
|
|
28
|
+
|
|
29
|
+
**Action for the stakeholder:** reopen HUMAN-5 (move it from `done/` to `ready/`) and implement at least Safeguard 1 (post-check: parse the agent's STDERR log for `Edit {ticket_path}`; if there was no Edit call → automatically `status: failed`). This is the only vector for eliminating this class of error.
|