@jokerized/getresearchdone 0.4.1

This diff shows the changes between publicly released versions of this package as they appear in the public registry. It is provided for informational purposes only.
Files changed (711)
  1. package/.claude-plugin/plugin.json +103 -0
  2. package/README.md +211 -0
  3. package/agents/grd-baseline-assessor.md +684 -0
  4. package/agents/grd-code-reviewer.md +300 -0
  5. package/agents/grd-codebase-mapper.md +355 -0
  6. package/agents/grd-critique-agent.md +119 -0
  7. package/agents/grd-debugger.md +519 -0
  8. package/agents/grd-deep-diver.md +737 -0
  9. package/agents/grd-eval-planner.md +913 -0
  10. package/agents/grd-eval-reporter.md +717 -0
  11. package/agents/grd-executor.md +683 -0
  12. package/agents/grd-feasibility-analyst.md +624 -0
  13. package/agents/grd-integration-checker.md +367 -0
  14. package/agents/grd-knowledge-miner.md +81 -0
  15. package/agents/grd-migrator.md +88 -0
  16. package/agents/grd-phase-researcher.md +697 -0
  17. package/agents/grd-plan-checker.md +443 -0
  18. package/agents/grd-planner.md +1532 -0
  19. package/agents/grd-product-owner.md +562 -0
  20. package/agents/grd-project-researcher.md +513 -0
  21. package/agents/grd-research-synthesizer.md +273 -0
  22. package/agents/grd-roadmapper.md +798 -0
  23. package/agents/grd-surveyor.md +566 -0
  24. package/agents/grd-verifier.md +893 -0
  25. package/bin/gd.js +4 -0
  26. package/bin/gd.ts +227 -0
  27. package/bin/grd-manifest.js +4 -0
  28. package/bin/grd-manifest.ts +286 -0
  29. package/bin/grd-mcp-server.js +4 -0
  30. package/bin/grd-mcp-server.ts +124 -0
  31. package/bin/grd-tools.js +4 -0
  32. package/bin/grd-tools.ts +2471 -0
  33. package/bin/postinstall.js +4 -0
  34. package/bin/postinstall.ts +80 -0
  35. package/commands/add-phase.md +123 -0
  36. package/commands/add-todo.md +87 -0
  37. package/commands/assess-baseline.md +289 -0
  38. package/commands/autopilot.md +100 -0
  39. package/commands/autoplan.md +55 -0
  40. package/commands/check-todos.md +87 -0
  41. package/commands/compare-methods.md +262 -0
  42. package/commands/complete-milestone.md +225 -0
  43. package/commands/debug.md +372 -0
  44. package/commands/deep-dive.md +288 -0
  45. package/commands/discover.md +281 -0
  46. package/commands/discuss-phase.md +188 -0
  47. package/commands/discuss.md +55 -0
  48. package/commands/eval-report.md +310 -0
  49. package/commands/evolve.md +79 -0
  50. package/commands/execute-phase.md +1017 -0
  51. package/commands/feasibility.md +292 -0
  52. package/commands/help.md +407 -0
  53. package/commands/init.md +1508 -0
  54. package/commands/insert-phase.md +113 -0
  55. package/commands/iterate.md +327 -0
  56. package/commands/list-phase-assumptions.md +217 -0
  57. package/commands/long-term-roadmap.md +202 -0
  58. package/commands/map-codebase.md +111 -0
  59. package/commands/migrate.md +159 -0
  60. package/commands/new-milestone.md +169 -0
  61. package/commands/pause-work.md +83 -0
  62. package/commands/plan-milestone-gaps.md +373 -0
  63. package/commands/plan-phase.md +655 -0
  64. package/commands/principles.md +328 -0
  65. package/commands/product-plan.md +319 -0
  66. package/commands/progress.md +481 -0
  67. package/commands/quick.md +167 -0
  68. package/commands/reapply-patches.md +154 -0
  69. package/commands/remove-phase.md +97 -0
  70. package/commands/requirement.md +96 -0
  71. package/commands/resume-project.md +113 -0
  72. package/commands/settings.md +1144 -0
  73. package/commands/survey.md +242 -0
  74. package/commands/sync.md +246 -0
  75. package/commands/tracker-setup.md +322 -0
  76. package/commands/update.md +202 -0
  77. package/commands/verify-phase.md +335 -0
  78. package/commands/verify-work.md +701 -0
  79. package/commands/wireup.md +29 -0
  80. package/dist/bin/gd.d.ts +3 -0
  81. package/dist/bin/gd.d.ts.map +1 -0
  82. package/dist/bin/gd.js +178 -0
  83. package/dist/bin/gd.js.map +1 -0
  84. package/dist/bin/grd-manifest.d.ts +3 -0
  85. package/dist/bin/grd-manifest.d.ts.map +1 -0
  86. package/dist/bin/grd-manifest.js +202 -0
  87. package/dist/bin/grd-manifest.js.map +1 -0
  88. package/dist/bin/grd-mcp-server.d.ts +3 -0
  89. package/dist/bin/grd-mcp-server.d.ts.map +1 -0
  90. package/dist/bin/grd-mcp-server.js +71 -0
  91. package/dist/bin/grd-mcp-server.js.map +1 -0
  92. package/dist/bin/grd-tools.d.ts +3 -0
  93. package/dist/bin/grd-tools.d.ts.map +1 -0
  94. package/dist/bin/grd-tools.js +1680 -0
  95. package/dist/bin/grd-tools.js.map +1 -0
  96. package/dist/bin/postinstall.d.ts +3 -0
  97. package/dist/bin/postinstall.d.ts.map +1 -0
  98. package/dist/bin/postinstall.js +61 -0
  99. package/dist/bin/postinstall.js.map +1 -0
  100. package/dist/lib/autopilot-milestone.d.ts +2 -0
  101. package/dist/lib/autopilot-milestone.d.ts.map +1 -0
  102. package/dist/lib/autopilot-milestone.js +94 -0
  103. package/dist/lib/autopilot-milestone.js.map +1 -0
  104. package/dist/lib/autopilot-pipeline.d.ts +2 -0
  105. package/dist/lib/autopilot-pipeline.d.ts.map +1 -0
  106. package/dist/lib/autopilot-pipeline.js +830 -0
  107. package/dist/lib/autopilot-pipeline.js.map +1 -0
  108. package/dist/lib/autopilot-waves.d.ts +2 -0
  109. package/dist/lib/autopilot-waves.d.ts.map +1 -0
  110. package/dist/lib/autopilot-waves.js +266 -0
  111. package/dist/lib/autopilot-waves.js.map +1 -0
  112. package/dist/lib/autopilot.d.ts +2 -0
  113. package/dist/lib/autopilot.d.ts.map +1 -0
  114. package/dist/lib/autopilot.js +1314 -0
  115. package/dist/lib/autopilot.js.map +1 -0
  116. package/dist/lib/autoplan.d.ts +2 -0
  117. package/dist/lib/autoplan.d.ts.map +1 -0
  118. package/dist/lib/autoplan.js +198 -0
  119. package/dist/lib/autoplan.js.map +1 -0
  120. package/dist/lib/autoresearch.d.ts +2 -0
  121. package/dist/lib/autoresearch.d.ts.map +1 -0
  122. package/dist/lib/autoresearch.js +626 -0
  123. package/dist/lib/autoresearch.js.map +1 -0
  124. package/dist/lib/backend.d.ts +2 -0
  125. package/dist/lib/backend.d.ts.map +1 -0
  126. package/dist/lib/backend.js +1036 -0
  127. package/dist/lib/backend.js.map +1 -0
  128. package/dist/lib/benchmark.d.ts +99 -0
  129. package/dist/lib/benchmark.d.ts.map +1 -0
  130. package/dist/lib/benchmark.js +278 -0
  131. package/dist/lib/benchmark.js.map +1 -0
  132. package/dist/lib/citations.d.ts +2 -0
  133. package/dist/lib/citations.d.ts.map +1 -0
  134. package/dist/lib/citations.js +642 -0
  135. package/dist/lib/citations.js.map +1 -0
  136. package/dist/lib/cleanup.d.ts +2 -0
  137. package/dist/lib/cleanup.d.ts.map +1 -0
  138. package/dist/lib/cleanup.js +1222 -0
  139. package/dist/lib/cleanup.js.map +1 -0
  140. package/dist/lib/cli/adapters.d.ts +10 -0
  141. package/dist/lib/cli/adapters.d.ts.map +1 -0
  142. package/dist/lib/cli/adapters.js +27 -0
  143. package/dist/lib/cli/adapters.js.map +1 -0
  144. package/dist/lib/cli/agent.d.ts +17 -0
  145. package/dist/lib/cli/agent.d.ts.map +1 -0
  146. package/dist/lib/cli/agent.js +53 -0
  147. package/dist/lib/cli/agent.js.map +1 -0
  148. package/dist/lib/cli/index.d.ts +21 -0
  149. package/dist/lib/cli/index.d.ts.map +1 -0
  150. package/dist/lib/cli/index.js +264 -0
  151. package/dist/lib/cli/index.js.map +1 -0
  152. package/dist/lib/cli/output.d.ts +20 -0
  153. package/dist/lib/cli/output.d.ts.map +1 -0
  154. package/dist/lib/cli/output.js +22 -0
  155. package/dist/lib/cli/output.js.map +1 -0
  156. package/dist/lib/cli/scan-dispatch.d.ts +9 -0
  157. package/dist/lib/cli/scan-dispatch.d.ts.map +1 -0
  158. package/dist/lib/cli/scan-dispatch.js +107 -0
  159. package/dist/lib/cli/scan-dispatch.js.map +1 -0
  160. package/dist/lib/cli/tools.d.ts +16 -0
  161. package/dist/lib/cli/tools.d.ts.map +1 -0
  162. package/dist/lib/cli/tools.js +168 -0
  163. package/dist/lib/cli/tools.js.map +1 -0
  164. package/dist/lib/commands/_dashboard-parsers.d.ts +2 -0
  165. package/dist/lib/commands/_dashboard-parsers.d.ts.map +1 -0
  166. package/dist/lib/commands/_dashboard-parsers.js +192 -0
  167. package/dist/lib/commands/_dashboard-parsers.js.map +1 -0
  168. package/dist/lib/commands/analysis.d.ts +2 -0
  169. package/dist/lib/commands/analysis.d.ts.map +1 -0
  170. package/dist/lib/commands/analysis.js +1418 -0
  171. package/dist/lib/commands/analysis.js.map +1 -0
  172. package/dist/lib/commands/assumptions.d.ts +2 -0
  173. package/dist/lib/commands/assumptions.d.ts.map +1 -0
  174. package/dist/lib/commands/assumptions.js +166 -0
  175. package/dist/lib/commands/assumptions.js.map +1 -0
  176. package/dist/lib/commands/blame.d.ts +2 -0
  177. package/dist/lib/commands/blame.d.ts.map +1 -0
  178. package/dist/lib/commands/blame.js +133 -0
  179. package/dist/lib/commands/blame.js.map +1 -0
  180. package/dist/lib/commands/budget.d.ts +2 -0
  181. package/dist/lib/commands/budget.d.ts.map +1 -0
  182. package/dist/lib/commands/budget.js +100 -0
  183. package/dist/lib/commands/budget.js.map +1 -0
  184. package/dist/lib/commands/check-plans.d.ts +2 -0
  185. package/dist/lib/commands/check-plans.d.ts.map +1 -0
  186. package/dist/lib/commands/check-plans.js +190 -0
  187. package/dist/lib/commands/check-plans.js.map +1 -0
  188. package/dist/lib/commands/config.d.ts +2 -0
  189. package/dist/lib/commands/config.d.ts.map +1 -0
  190. package/dist/lib/commands/config.js +188 -0
  191. package/dist/lib/commands/config.js.map +1 -0
  192. package/dist/lib/commands/dashboard.d.ts +2 -0
  193. package/dist/lib/commands/dashboard.d.ts.map +1 -0
  194. package/dist/lib/commands/dashboard.js +466 -0
  195. package/dist/lib/commands/dashboard.js.map +1 -0
  196. package/dist/lib/commands/estimate.d.ts +2 -0
  197. package/dist/lib/commands/estimate.d.ts.map +1 -0
  198. package/dist/lib/commands/estimate.js +148 -0
  199. package/dist/lib/commands/estimate.js.map +1 -0
  200. package/dist/lib/commands/eval-diff.d.ts +2 -0
  201. package/dist/lib/commands/eval-diff.d.ts.map +1 -0
  202. package/dist/lib/commands/eval-diff.js +213 -0
  203. package/dist/lib/commands/eval-diff.js.map +1 -0
  204. package/dist/lib/commands/freshness.d.ts +2 -0
  205. package/dist/lib/commands/freshness.d.ts.map +1 -0
  206. package/dist/lib/commands/freshness.js +163 -0
  207. package/dist/lib/commands/freshness.js.map +1 -0
  208. package/dist/lib/commands/health.d.ts +2 -0
  209. package/dist/lib/commands/health.d.ts.map +1 -0
  210. package/dist/lib/commands/health.js +435 -0
  211. package/dist/lib/commands/health.js.map +1 -0
  212. package/dist/lib/commands/index.d.ts +2 -0
  213. package/dist/lib/commands/index.d.ts.map +1 -0
  214. package/dist/lib/commands/index.js +128 -0
  215. package/dist/lib/commands/index.js.map +1 -0
  216. package/dist/lib/commands/install.d.ts +56 -0
  217. package/dist/lib/commands/install.d.ts.map +1 -0
  218. package/dist/lib/commands/install.js +214 -0
  219. package/dist/lib/commands/install.js.map +1 -0
  220. package/dist/lib/commands/knowhow-aggregator.d.ts +2 -0
  221. package/dist/lib/commands/knowhow-aggregator.d.ts.map +1 -0
  222. package/dist/lib/commands/knowhow-aggregator.js +279 -0
  223. package/dist/lib/commands/knowhow-aggregator.js.map +1 -0
  224. package/dist/lib/commands/knowledge-search.d.ts +2 -0
  225. package/dist/lib/commands/knowledge-search.d.ts.map +1 -0
  226. package/dist/lib/commands/knowledge-search.js +113 -0
  227. package/dist/lib/commands/knowledge-search.js.map +1 -0
  228. package/dist/lib/commands/long-term-roadmap.d.ts +2 -0
  229. package/dist/lib/commands/long-term-roadmap.d.ts.map +1 -0
  230. package/dist/lib/commands/long-term-roadmap.js +272 -0
  231. package/dist/lib/commands/long-term-roadmap.js.map +1 -0
  232. package/dist/lib/commands/patterns.d.ts +91 -0
  233. package/dist/lib/commands/patterns.d.ts.map +1 -0
  234. package/dist/lib/commands/patterns.js +391 -0
  235. package/dist/lib/commands/patterns.js.map +1 -0
  236. package/dist/lib/commands/phase-info.d.ts +2 -0
  237. package/dist/lib/commands/phase-info.d.ts.map +1 -0
  238. package/dist/lib/commands/phase-info.js +509 -0
  239. package/dist/lib/commands/phase-info.js.map +1 -0
  240. package/dist/lib/commands/plan-lint.d.ts +56 -0
  241. package/dist/lib/commands/plan-lint.d.ts.map +1 -0
  242. package/dist/lib/commands/plan-lint.js +481 -0
  243. package/dist/lib/commands/plan-lint.js.map +1 -0
  244. package/dist/lib/commands/plan-phase.d.ts +53 -0
  245. package/dist/lib/commands/plan-phase.d.ts.map +1 -0
  246. package/dist/lib/commands/plan-phase.js +288 -0
  247. package/dist/lib/commands/plan-phase.js.map +1 -0
  248. package/dist/lib/commands/progress.d.ts +2 -0
  249. package/dist/lib/commands/progress.d.ts.map +1 -0
  250. package/dist/lib/commands/progress.js +266 -0
  251. package/dist/lib/commands/progress.js.map +1 -0
  252. package/dist/lib/commands/quality.d.ts +2 -0
  253. package/dist/lib/commands/quality.d.ts.map +1 -0
  254. package/dist/lib/commands/quality.js +80 -0
  255. package/dist/lib/commands/quality.js.map +1 -0
  256. package/dist/lib/commands/rollback.d.ts +2 -0
  257. package/dist/lib/commands/rollback.d.ts.map +1 -0
  258. package/dist/lib/commands/rollback.js +145 -0
  259. package/dist/lib/commands/rollback.js.map +1 -0
  260. package/dist/lib/commands/scan.d.ts +25 -0
  261. package/dist/lib/commands/scan.d.ts.map +1 -0
  262. package/dist/lib/commands/scan.js +28 -0
  263. package/dist/lib/commands/scan.js.map +1 -0
  264. package/dist/lib/commands/search.d.ts +2 -0
  265. package/dist/lib/commands/search.d.ts.map +1 -0
  266. package/dist/lib/commands/search.js +212 -0
  267. package/dist/lib/commands/search.js.map +1 -0
  268. package/dist/lib/commands/select-candidate.d.ts +128 -0
  269. package/dist/lib/commands/select-candidate.d.ts.map +1 -0
  270. package/dist/lib/commands/select-candidate.js +518 -0
  271. package/dist/lib/commands/select-candidate.js.map +1 -0
  272. package/dist/lib/commands/singularity.d.ts +2 -0
  273. package/dist/lib/commands/singularity.d.ts.map +1 -0
  274. package/dist/lib/commands/singularity.js +185 -0
  275. package/dist/lib/commands/singularity.js.map +1 -0
  276. package/dist/lib/commands/slug-timestamp.d.ts +2 -0
  277. package/dist/lib/commands/slug-timestamp.d.ts.map +1 -0
  278. package/dist/lib/commands/slug-timestamp.js +54 -0
  279. package/dist/lib/commands/slug-timestamp.js.map +1 -0
  280. package/dist/lib/commands/tail.d.ts +2 -0
  281. package/dist/lib/commands/tail.d.ts.map +1 -0
  282. package/dist/lib/commands/tail.js +100 -0
  283. package/dist/lib/commands/tail.js.map +1 -0
  284. package/dist/lib/commands/todo.d.ts +2 -0
  285. package/dist/lib/commands/todo.d.ts.map +1 -0
  286. package/dist/lib/commands/todo.js +200 -0
  287. package/dist/lib/commands/todo.js.map +1 -0
  288. package/dist/lib/commands/watch.d.ts +2 -0
  289. package/dist/lib/commands/watch.d.ts.map +1 -0
  290. package/dist/lib/commands/watch.js +72 -0
  291. package/dist/lib/commands/watch.js.map +1 -0
  292. package/dist/lib/complexity.d.ts +55 -0
  293. package/dist/lib/complexity.d.ts.map +1 -0
  294. package/dist/lib/complexity.js +80 -0
  295. package/dist/lib/complexity.js.map +1 -0
  296. package/dist/lib/context/agents.d.ts +2 -0
  297. package/dist/lib/context/agents.d.ts.map +1 -0
  298. package/dist/lib/context/agents.js +344 -0
  299. package/dist/lib/context/agents.js.map +1 -0
  300. package/dist/lib/context/base.d.ts +2 -0
  301. package/dist/lib/context/base.d.ts.map +1 -0
  302. package/dist/lib/context/base.js +81 -0
  303. package/dist/lib/context/base.js.map +1 -0
  304. package/dist/lib/context/execute.d.ts +2 -0
  305. package/dist/lib/context/execute.d.ts.map +1 -0
  306. package/dist/lib/context/execute.js +753 -0
  307. package/dist/lib/context/execute.js.map +1 -0
  308. package/dist/lib/context/index.d.ts +2 -0
  309. package/dist/lib/context/index.d.ts.map +1 -0
  310. package/dist/lib/context/index.js +88 -0
  311. package/dist/lib/context/index.js.map +1 -0
  312. package/dist/lib/context/progress.d.ts +2 -0
  313. package/dist/lib/context/progress.d.ts.map +1 -0
  314. package/dist/lib/context/progress.js +178 -0
  315. package/dist/lib/context/progress.js.map +1 -0
  316. package/dist/lib/context/project.d.ts +2 -0
  317. package/dist/lib/context/project.d.ts.map +1 -0
  318. package/dist/lib/context/project.js +413 -0
  319. package/dist/lib/context/project.js.map +1 -0
  320. package/dist/lib/context/research.d.ts +2 -0
  321. package/dist/lib/context/research.d.ts.map +1 -0
  322. package/dist/lib/context/research.js +466 -0
  323. package/dist/lib/context/research.js.map +1 -0
  324. package/dist/lib/dead-ends.d.ts +28 -0
  325. package/dist/lib/dead-ends.d.ts.map +1 -0
  326. package/dist/lib/dead-ends.js +451 -0
  327. package/dist/lib/dead-ends.js.map +1 -0
  328. package/dist/lib/deps.d.ts +2 -0
  329. package/dist/lib/deps.d.ts.map +1 -0
  330. package/dist/lib/deps.js +630 -0
  331. package/dist/lib/deps.js.map +1 -0
  332. package/dist/lib/discussion.d.ts +2 -0
  333. package/dist/lib/discussion.d.ts.map +1 -0
  334. package/dist/lib/discussion.js +1041 -0
  335. package/dist/lib/discussion.js.map +1 -0
  336. package/dist/lib/drift.d.ts +36 -0
  337. package/dist/lib/drift.d.ts.map +1 -0
  338. package/dist/lib/drift.js +481 -0
  339. package/dist/lib/drift.js.map +1 -0
  340. package/dist/lib/evolve/_dimensions-features.d.ts +2 -0
  341. package/dist/lib/evolve/_dimensions-features.d.ts.map +1 -0
  342. package/dist/lib/evolve/_dimensions-features.js +369 -0
  343. package/dist/lib/evolve/_dimensions-features.js.map +1 -0
  344. package/dist/lib/evolve/_dimensions.d.ts +2 -0
  345. package/dist/lib/evolve/_dimensions.d.ts.map +1 -0
  346. package/dist/lib/evolve/_dimensions.js +358 -0
  347. package/dist/lib/evolve/_dimensions.js.map +1 -0
  348. package/dist/lib/evolve/_product-ideation.d.ts +2 -0
  349. package/dist/lib/evolve/_product-ideation.d.ts.map +1 -0
  350. package/dist/lib/evolve/_product-ideation.js +281 -0
  351. package/dist/lib/evolve/_product-ideation.js.map +1 -0
  352. package/dist/lib/evolve/_prompts.d.ts +2 -0
  353. package/dist/lib/evolve/_prompts.d.ts.map +1 -0
  354. package/dist/lib/evolve/_prompts.js +153 -0
  355. package/dist/lib/evolve/_prompts.js.map +1 -0
  356. package/dist/lib/evolve/cli.d.ts +2 -0
  357. package/dist/lib/evolve/cli.d.ts.map +1 -0
  358. package/dist/lib/evolve/cli.js +224 -0
  359. package/dist/lib/evolve/cli.js.map +1 -0
  360. package/dist/lib/evolve/discovery.d.ts +2 -0
  361. package/dist/lib/evolve/discovery.d.ts.map +1 -0
  362. package/dist/lib/evolve/discovery.js +391 -0
  363. package/dist/lib/evolve/discovery.js.map +1 -0
  364. package/dist/lib/evolve/index.d.ts +2 -0
  365. package/dist/lib/evolve/index.d.ts.map +1 -0
  366. package/dist/lib/evolve/index.js +88 -0
  367. package/dist/lib/evolve/index.js.map +1 -0
  368. package/dist/lib/evolve/orchestrator.d.ts +2 -0
  369. package/dist/lib/evolve/orchestrator.d.ts.map +1 -0
  370. package/dist/lib/evolve/orchestrator.js +851 -0
  371. package/dist/lib/evolve/orchestrator.js.map +1 -0
  372. package/dist/lib/evolve/scoring.d.ts +2 -0
  373. package/dist/lib/evolve/scoring.d.ts.map +1 -0
  374. package/dist/lib/evolve/scoring.js +118 -0
  375. package/dist/lib/evolve/scoring.js.map +1 -0
  376. package/dist/lib/evolve/state.d.ts +2 -0
  377. package/dist/lib/evolve/state.d.ts.map +1 -0
  378. package/dist/lib/evolve/state.js +264 -0
  379. package/dist/lib/evolve/state.js.map +1 -0
  380. package/dist/lib/evolve/types.d.ts +249 -0
  381. package/dist/lib/evolve/types.d.ts.map +1 -0
  382. package/dist/lib/evolve/types.js +3 -0
  383. package/dist/lib/evolve/types.js.map +1 -0
  384. package/dist/lib/frontmatter.d.ts +2 -0
  385. package/dist/lib/frontmatter.d.ts.map +1 -0
  386. package/dist/lib/frontmatter.js +513 -0
  387. package/dist/lib/frontmatter.js.map +1 -0
  388. package/dist/lib/gates.d.ts +2 -0
  389. package/dist/lib/gates.d.ts.map +1 -0
  390. package/dist/lib/gates.js +578 -0
  391. package/dist/lib/gates.js.map +1 -0
  392. package/dist/lib/genome.d.ts +10 -0
  393. package/dist/lib/genome.d.ts.map +1 -0
  394. package/dist/lib/genome.js +368 -0
  395. package/dist/lib/genome.js.map +1 -0
  396. package/dist/lib/got.d.ts +2 -0
  397. package/dist/lib/got.d.ts.map +1 -0
  398. package/dist/lib/got.js +280 -0
  399. package/dist/lib/got.js.map +1 -0
  400. package/dist/lib/invariants.d.ts +2 -0
  401. package/dist/lib/invariants.d.ts.map +1 -0
  402. package/dist/lib/invariants.js +298 -0
  403. package/dist/lib/invariants.js.map +1 -0
  404. package/dist/lib/knowledge.d.ts +2 -0
  405. package/dist/lib/knowledge.d.ts.map +1 -0
  406. package/dist/lib/knowledge.js +658 -0
  407. package/dist/lib/knowledge.js.map +1 -0
  408. package/dist/lib/long-term-roadmap.d.ts +2 -0
  409. package/dist/lib/long-term-roadmap.d.ts.map +1 -0
  410. package/dist/lib/long-term-roadmap.js +602 -0
  411. package/dist/lib/long-term-roadmap.js.map +1 -0
  412. package/dist/lib/markdown-split.d.ts +2 -0
  413. package/dist/lib/markdown-split.d.ts.map +1 -0
  414. package/dist/lib/markdown-split.js +199 -0
  415. package/dist/lib/markdown-split.js.map +1 -0
  416. package/dist/lib/mcp-server.d.ts +2 -0
  417. package/dist/lib/mcp-server.d.ts.map +1 -0
  418. package/dist/lib/mcp-server.js +2424 -0
  419. package/dist/lib/mcp-server.js.map +1 -0
  420. package/dist/lib/metrics.d.ts +16 -0
  421. package/dist/lib/metrics.d.ts.map +1 -0
  422. package/dist/lib/metrics.js +48 -0
  423. package/dist/lib/metrics.js.map +1 -0
  424. package/dist/lib/overstory.d.ts +2 -0
  425. package/dist/lib/overstory.d.ts.map +1 -0
  426. package/dist/lib/overstory.js +211 -0
  427. package/dist/lib/overstory.js.map +1 -0
  428. package/dist/lib/parallel.d.ts +2 -0
  429. package/dist/lib/parallel.d.ts.map +1 -0
  430. package/dist/lib/parallel.js +349 -0
  431. package/dist/lib/parallel.js.map +1 -0
  432. package/dist/lib/paths.d.ts +2 -0
  433. package/dist/lib/paths.d.ts.map +1 -0
  434. package/dist/lib/paths.js +254 -0
  435. package/dist/lib/paths.js.map +1 -0
  436. package/dist/lib/phase-complete-llm.d.ts +22 -0
  437. package/dist/lib/phase-complete-llm.d.ts.map +1 -0
  438. package/dist/lib/phase-complete-llm.js +331 -0
  439. package/dist/lib/phase-complete-llm.js.map +1 -0
  440. package/dist/lib/phase-complete.d.ts +46 -0
  441. package/dist/lib/phase-complete.d.ts.map +1 -0
  442. package/dist/lib/phase-complete.js +278 -0
  443. package/dist/lib/phase-complete.js.map +1 -0
  444. package/dist/lib/phase-io.d.ts +2 -0
  445. package/dist/lib/phase-io.d.ts.map +1 -0
  446. package/dist/lib/phase-io.js +126 -0
  447. package/dist/lib/phase-io.js.map +1 -0
  448. package/dist/lib/phase.d.ts +2 -0
  449. package/dist/lib/phase.d.ts.map +1 -0
  450. package/dist/lib/phase.js +1344 -0
  451. package/dist/lib/phase.js.map +1 -0
  452. package/dist/lib/plan-tournament.d.ts +63 -0
  453. package/dist/lib/plan-tournament.d.ts.map +1 -0
  454. package/dist/lib/plan-tournament.js +353 -0
  455. package/dist/lib/plan-tournament.js.map +1 -0
  456. package/dist/lib/refinement.d.ts +74 -0
  457. package/dist/lib/refinement.d.ts.map +1 -0
  458. package/dist/lib/refinement.js +283 -0
  459. package/dist/lib/refinement.js.map +1 -0
  460. package/dist/lib/requirements.d.ts +2 -0
  461. package/dist/lib/requirements.d.ts.map +1 -0
  462. package/dist/lib/requirements.js +355 -0
  463. package/dist/lib/requirements.js.map +1 -0
  464. package/dist/lib/research-bundle.d.ts +2 -0
  465. package/dist/lib/research-bundle.d.ts.map +1 -0
  466. package/dist/lib/research-bundle.js +246 -0
  467. package/dist/lib/research-bundle.js.map +1 -0
  468. package/dist/lib/roadmap.d.ts +2 -0
  469. package/dist/lib/roadmap.d.ts.map +1 -0
  470. package/dist/lib/roadmap.js +541 -0
  471. package/dist/lib/roadmap.js.map +1 -0
  472. package/dist/lib/sample.d.ts +16 -0
  473. package/dist/lib/sample.d.ts.map +1 -0
  474. package/dist/lib/sample.js +20 -0
  475. package/dist/lib/sample.js.map +1 -0
  476. package/dist/lib/scaffold.d.ts +2 -0
  477. package/dist/lib/scaffold.d.ts.map +1 -0
  478. package/dist/lib/scaffold.js +355 -0
  479. package/dist/lib/scaffold.js.map +1 -0
  480. package/dist/lib/scan/_utils.d.ts +11 -0
  481. package/dist/lib/scan/_utils.d.ts.map +1 -0
  482. package/dist/lib/scan/_utils.js +36 -0
  483. package/dist/lib/scan/_utils.js.map +1 -0
  484. package/dist/lib/scan/base64.d.ts +15 -0
  485. package/dist/lib/scan/base64.d.ts.map +1 -0
  486. package/dist/lib/scan/base64.js +66 -0
  487. package/dist/lib/scan/base64.js.map +1 -0
  488. package/dist/lib/scan/ignorefile.d.ts +30 -0
  489. package/dist/lib/scan/ignorefile.d.ts.map +1 -0
  490. package/dist/lib/scan/ignorefile.js +101 -0
  491. package/dist/lib/scan/ignorefile.js.map +1 -0
  492. package/dist/lib/scan/injection.d.ts +14 -0
  493. package/dist/lib/scan/injection.d.ts.map +1 -0
  494. package/dist/lib/scan/injection.js +39 -0
  495. package/dist/lib/scan/injection.js.map +1 -0
  496. package/dist/lib/scan/patterns.d.ts +17 -0
  497. package/dist/lib/scan/patterns.d.ts.map +1 -0
  498. package/dist/lib/scan/patterns.js +123 -0
  499. package/dist/lib/scan/patterns.js.map +1 -0
  500. package/dist/lib/scan/strip-markdown.d.ts +7 -0
  501. package/dist/lib/scan/strip-markdown.d.ts.map +1 -0
  502. package/dist/lib/scan/strip-markdown.js +38 -0
  503. package/dist/lib/scan/strip-markdown.js.map +1 -0
  504. package/dist/lib/scan/types.d.ts +23 -0
  505. package/dist/lib/scan/types.d.ts.map +1 -0
  506. package/dist/lib/scan/types.js +3 -0
  507. package/dist/lib/scan/types.js.map +1 -0
  508. package/dist/lib/scheduler-wait.d.ts +2 -0
  509. package/dist/lib/scheduler-wait.d.ts.map +1 -0
  510. package/dist/lib/scheduler-wait.js +59 -0
  511. package/dist/lib/scheduler-wait.js.map +1 -0
  512. package/dist/lib/scheduler.d.ts +254 -0
  513. package/dist/lib/scheduler.d.ts.map +1 -0
  514. package/dist/lib/scheduler.js +1147 -0
  515. package/dist/lib/scheduler.js.map +1 -0
  516. package/dist/lib/state.d.ts +2 -0
  517. package/dist/lib/state.d.ts.map +1 -0
  518. package/dist/lib/state.js +744 -0
  519. package/dist/lib/state.js.map +1 -0
  520. package/dist/lib/think.d.ts +18 -0
  521. package/dist/lib/think.d.ts.map +1 -0
  522. package/dist/lib/think.js +317 -0
  523. package/dist/lib/think.js.map +1 -0
  524. package/dist/lib/tracker.d.ts +2 -0
  525. package/dist/lib/tracker.d.ts.map +1 -0
  526. package/dist/lib/tracker.js +1121 -0
  527. package/dist/lib/tracker.js.map +1 -0
  528. package/dist/lib/types.d.ts +1514 -0
  529. package/dist/lib/types.d.ts.map +1 -0
  530. package/dist/lib/types.js +4 -0
  531. package/dist/lib/types.js.map +1 -0
  532. package/dist/lib/utils.d.ts +2 -0
  533. package/dist/lib/utils.d.ts.map +1 -0
  534. package/dist/lib/utils.js +1363 -0
  535. package/dist/lib/utils.js.map +1 -0
  536. package/dist/lib/verify.d.ts +2 -0
  537. package/dist/lib/verify.d.ts.map +1 -0
  538. package/dist/lib/verify.js +1153 -0
  539. package/dist/lib/verify.js.map +1 -0
  540. package/dist/lib/wireup/autofix.d.ts +2 -0
  541. package/dist/lib/wireup/autofix.d.ts.map +1 -0
  542. package/dist/lib/wireup/autofix.js +188 -0
  543. package/dist/lib/wireup/autofix.js.map +1 -0
  544. package/dist/lib/wireup/cli.d.ts +2 -0
  545. package/dist/lib/wireup/cli.d.ts.map +1 -0
  546. package/dist/lib/wireup/cli.js +194 -0
  547. package/dist/lib/wireup/cli.js.map +1 -0
  548. package/dist/lib/wireup/detection.d.ts +47 -0
  549. package/dist/lib/wireup/detection.d.ts.map +1 -0
  550. package/dist/lib/wireup/detection.js +410 -0
  551. package/dist/lib/wireup/detection.js.map +1 -0
  552. package/dist/lib/wireup/discovery.d.ts +2 -0
  553. package/dist/lib/wireup/discovery.d.ts.map +1 -0
  554. package/dist/lib/wireup/discovery.js +934 -0
  555. package/dist/lib/wireup/discovery.js.map +1 -0
  556. package/dist/lib/wireup/execution.d.ts +2 -0
  557. package/dist/lib/wireup/execution.d.ts.map +1 -0
  558. package/dist/lib/wireup/execution.js +573 -0
  559. package/dist/lib/wireup/execution.js.map +1 -0
  560. package/dist/lib/wireup/index.d.ts +2 -0
  561. package/dist/lib/wireup/index.d.ts.map +1 -0
  562. package/dist/lib/wireup/index.js +85 -0
  563. package/dist/lib/wireup/index.js.map +1 -0
  564. package/dist/lib/wireup/orchestrator.d.ts +2 -0
  565. package/dist/lib/wireup/orchestrator.d.ts.map +1 -0
  566. package/dist/lib/wireup/orchestrator.js +366 -0
  567. package/dist/lib/wireup/orchestrator.js.map +1 -0
  568. package/dist/lib/wireup/report.d.ts +47 -0
  569. package/dist/lib/wireup/report.d.ts.map +1 -0
  570. package/dist/lib/wireup/report.js +201 -0
  571. package/dist/lib/wireup/report.js.map +1 -0
  572. package/dist/lib/wireup/scenarios.d.ts +2 -0
  573. package/dist/lib/wireup/scenarios.d.ts.map +1 -0
  574. package/dist/lib/wireup/scenarios.js +516 -0
  575. package/dist/lib/wireup/scenarios.js.map +1 -0
  576. package/dist/lib/wireup/state.d.ts +2 -0
  577. package/dist/lib/wireup/state.d.ts.map +1 -0
  578. package/dist/lib/wireup/state.js +102 -0
  579. package/dist/lib/wireup/state.js.map +1 -0
  580. package/dist/lib/wireup/types.d.ts +376 -0
  581. package/dist/lib/wireup/types.d.ts.map +1 -0
  582. package/dist/lib/wireup/types.js +3 -0
  583. package/dist/lib/wireup/types.js.map +1 -0
  584. package/dist/lib/worktree.d.ts +2 -0
  585. package/dist/lib/worktree.d.ts.map +1 -0
  586. package/dist/lib/worktree.js +999 -0
  587. package/dist/lib/worktree.js.map +1 -0
  588. package/lib/autopilot-milestone.ts +136 -0
  589. package/lib/autopilot-pipeline.ts +1179 -0
  590. package/lib/autopilot-waves.ts +361 -0
  591. package/lib/autopilot.ts +1874 -0
  592. package/lib/autoplan.ts +280 -0
  593. package/lib/autoresearch.js +4 -0
  594. package/lib/autoresearch.ts +886 -0
  595. package/lib/backend.ts +1252 -0
  596. package/lib/benchmark.ts +341 -0
  597. package/lib/citations.ts +760 -0
  598. package/lib/cleanup.ts +1588 -0
  599. package/lib/cli/adapters.ts +41 -0
  600. package/lib/cli/agent.ts +83 -0
  601. package/lib/cli/index.ts +273 -0
  602. package/lib/cli/output.ts +33 -0
  603. package/lib/cli/scan-dispatch.ts +130 -0
  604. package/lib/cli/tools.ts +198 -0
  605. package/lib/commands/_dashboard-parsers.ts +275 -0
  606. package/lib/commands/analysis.ts +1851 -0
  607. package/lib/commands/assumptions.ts +232 -0
  608. package/lib/commands/blame.ts +174 -0
  609. package/lib/commands/budget.ts +148 -0
  610. package/lib/commands/check-plans.ts +233 -0
  611. package/lib/commands/config.ts +287 -0
  612. package/lib/commands/dashboard.ts +680 -0
  613. package/lib/commands/estimate.ts +204 -0
  614. package/lib/commands/eval-diff.ts +252 -0
  615. package/lib/commands/freshness.ts +213 -0
  616. package/lib/commands/health.ts +607 -0
  617. package/lib/commands/index.ts +266 -0
  618. package/lib/commands/install.ts +307 -0
  619. package/lib/commands/knowhow-aggregator.ts +345 -0
  620. package/lib/commands/knowledge-search.ts +153 -0
  621. package/lib/commands/long-term-roadmap.ts +390 -0
  622. package/lib/commands/patterns.ts +465 -0
  623. package/lib/commands/phase-info.ts +698 -0
  624. package/lib/commands/plan-lint.ts +546 -0
  625. package/lib/commands/plan-phase.ts +375 -0
  626. package/lib/commands/progress.ts +319 -0
  627. package/lib/commands/quality.ts +138 -0
  628. package/lib/commands/rollback.ts +195 -0
  629. package/lib/commands/scan.ts +72 -0
  630. package/lib/commands/search.ts +300 -0
  631. package/lib/commands/select-candidate.ts +687 -0
  632. package/lib/commands/singularity.ts +222 -0
  633. package/lib/commands/slug-timestamp.ts +74 -0
  634. package/lib/commands/tail.ts +129 -0
  635. package/lib/commands/todo.ts +273 -0
  636. package/lib/commands/watch.ts +80 -0
  637. package/lib/complexity.ts +117 -0
  638. package/lib/context/agents.ts +505 -0
  639. package/lib/context/base.ts +123 -0
  640. package/lib/context/execute.ts +977 -0
  641. package/lib/context/index.ts +110 -0
  642. package/lib/context/progress.ts +278 -0
  643. package/lib/context/project.ts +531 -0
  644. package/lib/context/research.ts +646 -0
  645. package/lib/dead-ends.ts +506 -0
  646. package/lib/deps.ts +773 -0
  647. package/lib/discussion.ts +1275 -0
  648. package/lib/drift.ts +519 -0
  649. package/lib/evolve/_dimensions-features.ts +525 -0
  650. package/lib/evolve/_dimensions.ts +511 -0
  651. package/lib/evolve/_product-ideation.ts +405 -0
  652. package/lib/evolve/_prompts.ts +178 -0
  653. package/lib/evolve/cli.ts +330 -0
  654. package/lib/evolve/discovery.ts +571 -0
  655. package/lib/evolve/index.ts +105 -0
  656. package/lib/evolve/orchestrator.ts +1139 -0
  657. package/lib/evolve/scoring.ts +167 -0
  658. package/lib/evolve/state.ts +330 -0
  659. package/lib/evolve/types.ts +290 -0
  660. package/lib/frontmatter.ts +615 -0
  661. package/lib/gates.ts +695 -0
  662. package/lib/genome.ts +402 -0
  663. package/lib/got.js +4 -0
  664. package/lib/got.ts +361 -0
  665. package/lib/invariants.ts +378 -0
  666. package/lib/knowledge.ts +768 -0
  667. package/lib/long-term-roadmap.ts +806 -0
  668. package/lib/markdown-split.ts +273 -0
  669. package/lib/mcp-server.ts +3292 -0
  670. package/lib/metrics.ts +49 -0
  671. package/lib/overstory.ts +270 -0
  672. package/lib/parallel.ts +570 -0
  673. package/lib/paths.ts +293 -0
  674. package/lib/phase-complete-llm.ts +376 -0
  675. package/lib/phase-complete.ts +366 -0
  676. package/lib/phase-io.ts +101 -0
  677. package/lib/phase.ts +1981 -0
  678. package/lib/plan-tournament.ts +426 -0
  679. package/lib/refinement.ts +349 -0
  680. package/lib/requirements.ts +469 -0
  681. package/lib/research-bundle.ts +300 -0
  682. package/lib/roadmap.ts +775 -0
  683. package/lib/scaffold.ts +480 -0
  684. package/lib/scan/_utils.ts +37 -0
  685. package/lib/scan/base64.ts +90 -0
  686. package/lib/scan/ignorefile.ts +109 -0
  687. package/lib/scan/injection.ts +67 -0
  688. package/lib/scan/patterns.ts +139 -0
  689. package/lib/scan/strip-markdown.ts +39 -0
  690. package/lib/scan/types.ts +28 -0
  691. package/lib/scheduler-wait.ts +58 -0
  692. package/lib/scheduler.ts +1370 -0
  693. package/lib/state.ts +1000 -0
  694. package/lib/think.ts +365 -0
  695. package/lib/tracker.ts +1591 -0
  696. package/lib/types.ts +1663 -0
  697. package/lib/utils.ts +1479 -0
  698. package/lib/verify.ts +1434 -0
  699. package/lib/wireup/autofix.ts +241 -0
  700. package/lib/wireup/cli.ts +278 -0
  701. package/lib/wireup/detection.ts +542 -0
  702. package/lib/wireup/discovery.ts +1063 -0
  703. package/lib/wireup/execution.ts +686 -0
  704. package/lib/wireup/index.ts +117 -0
  705. package/lib/wireup/orchestrator.ts +519 -0
  706. package/lib/wireup/report.ts +286 -0
  707. package/lib/wireup/scenarios.ts +616 -0
  708. package/lib/wireup/state.ts +139 -0
  709. package/lib/wireup/types.ts +436 -0
  710. package/lib/worktree.ts +1309 -0
  711. package/package.json +67 -0
@@ -0,0 +1,717 @@
1
+ ---
2
+ name: grd-eval-reporter
3
+ description: Collects and reports quantitative evaluation results after phase execution. Runs scripts, compares against baselines and targets, updates EVAL.md.
4
+ tools: Read, Write, Edit, Bash, Grep, Glob
5
+ color: green
6
+ effort: medium
7
+ maxTurns: 25
8
+ ---
9
+
10
+ <role>
11
+ You are a GRD evaluation reporter. You collect quantitative results after phase execution and produce rigorous evaluation reports.
12
+
13
+ Spawned by:
14
+ - `/grd:eval-report` workflow (standalone evaluation reporting)
15
+ - `/grd:verify-phase` workflow (when phase verification includes evaluation)
16
+ - `/grd:iterate` workflow (when checking if iteration improved results)
17
+
18
+ Your job: Execute evaluation plans, collect numbers, compare against baselines and targets, run ablations, and produce honest reports. You are the source of truth for "did it work?" — your reports drive iteration decisions.
19
+
20
+ **Core responsibilities:**
21
+ - Read EVAL.md for planned metrics, commands, and targets
22
+ - Run sanity checks and collect pass/fail results
23
+ - Run proxy metric evaluations and collect quantitative results
24
+ - Run ablation analysis if specified
25
+ - Compare all results against baselines and targets
26
+ - Update EVAL.md with results section
27
+ - Update BENCHMARKS.md with new data points
28
+ - If results miss targets, recommend iteration via `/grd:iterate`
29
+ - Return structured results to orchestrator
30
+ </role>
31
+
32
+ <naming_convention>
33
+ ALL generated markdown files MUST use UPPERCASE filenames. This applies to every .md file written into .planning/ or any subdirectory:
34
+ - Standard files: STATE.md, ROADMAP.md, REQUIREMENTS.md, PLAN.md, SUMMARY.md, VERIFICATION.md, EVAL.md, REVIEW.md, CONTEXT.md, RESEARCH.md, BASELINE.md
35
+ - Slug-based files: use UPPERCASE slugs — e.g., VASWANI-ATTENTION-2017.md, not vaswani-attention-2017.md
36
+ - Feasibility files: {METHOD-SLUG}-FEASIBILITY.md
37
+ - Todo files: {DATE}-{SLUG}.md (date lowercase ok, slug UPPERCASE)
38
+ - Handoff files: .CONTINUE-HERE.md
39
+ - Quick task summaries: {N}-SUMMARY.md
40
+ Never create lowercase .md filenames in .planning/.
41
+ </naming_convention>
42
+
43
+ <philosophy>
44
+
45
+ ## Numbers Don't Lie, But Presentation Can
46
+
47
+ Report raw numbers with full context. Don't cherry-pick the best result. Don't hide variance. Don't compare apples to oranges.
48
+
49
+ **Reporting standards:**
50
+ - Always include the specific command that produced each number
51
+ - Always include the hardware and conditions (GPU type, batch size, precision)
52
+ - Always report variance when running multiple times
53
+ - Always compare against the SAME baseline with the SAME evaluation conditions
54
+
55
+ ## Failure Is Data
56
+
57
+ A metric that misses its target is valuable information, not a problem to hide. The report must clearly communicate:
58
+ - What was expected
59
+ - What was observed
60
+ - The gap (with sign and percentage)
61
+ - Possible reasons for the gap
62
+ - Recommended action
63
+
64
+ ## Proxy Metrics Stay Unvalidated
65
+
66
+ Results from proxy metrics (Level 2) remain tagged as `validated: false` until deferred validation (Level 3) confirms them. Even if proxy results look great, they do NOT substitute for deferred validation.
67
+
68
+ ## Reproducibility Is Non-Negotiable
69
+
70
+ Every number in the report must be reproducible. This means:
71
+ - Exact command documented
72
+ - Random seed specified (if applicable)
73
+ - Hardware and software versions noted
74
+ - Data version/split specified
75
+ - Environment conditions recorded
76
+
77
+ </philosophy>
78
+
79
+ <execution_flow>
80
+
81
+ <step name="load_plan" priority="first">
82
+ Read the evaluation plan for this phase.
83
+
84
+ ```bash
85
+ PHASE_DIR=$(ls -d "${phases_dir}"/*"${PHASE}"* 2>/dev/null | head -1)
86
+ cat "$PHASE_DIR"/*-EVAL.md 2>/dev/null
87
+ ```
88
+
89
+ **If no EVAL.md exists:**
90
+ - Check if the phase has been planned and executed
91
+ - If executed without EVAL.md, offer to run `/grd:eval-plan` first
92
+ - If phase not yet executed, return BLOCKED
93
+
94
+ **Extract from EVAL.md:**
95
+ - Sanity checks (names, commands, expected values)
96
+ - Proxy metrics (names, commands, targets)
97
+ - Ablation conditions (if any)
98
+ - Deferred validations (for status tracking only — don't try to run these)
99
+ - Baselines for comparison
100
+ </step>
101
+
102
+ <step name="load_baseline">
103
+ Load current baseline for comparison.
104
+
105
+ ```bash
106
+ cat .planning/BASELINE.md 2>/dev/null
107
+ cat .planning/PRODUCT-QUALITY.md 2>/dev/null
108
+ cat ${research_dir}/BENCHMARKS.md 2>/dev/null
109
+ ```
110
+
111
+ Extract baseline values for each metric being evaluated.
112
+
113
+ If no baseline exists for a metric, note "No baseline — first measurement" and treat this run as establishing the baseline.
114
+ </step>
115
+
116
+ <step name="check_prerequisites">
117
+ Verify evaluation prerequisites are met.
118
+
119
+ ```bash
120
+ # Check that phase execution is complete
121
+ ls "$PHASE_DIR"/*-SUMMARY.md 2>/dev/null
122
+
123
+ # Check that evaluation scripts/code exist
124
+ [check paths from EVAL.md]
125
+
126
+ # Check that test data is available
127
+ [check data paths from EVAL.md]
128
+
129
+ # Check that models/weights are available
130
+ [check model paths from EVAL.md]
131
+ ```
132
+
133
+ **If prerequisites missing:** Return BLOCKED with specific list of what's missing.
134
+ </step>
135
+
136
+ <step name="record_environment">
137
+ Document the evaluation environment for reproducibility.
138
+
139
+ ```bash
140
+ # Python version
141
+ python --version 2>/dev/null
142
+
143
+ # GPU info
144
+ nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null
145
+
146
+ # CUDA version
147
+ nvcc --version 2>/dev/null | grep release
148
+
149
+ # Key package versions
150
+ pip list 2>/dev/null | grep -E "torch|tensorflow|jax|numpy|scipy" | head -10
151
+
152
+ # Git state (ensure we know exactly what code is being evaluated)
153
+ git rev-parse HEAD
154
+ git status --short
155
+ ```
156
+
157
+ Record as evaluation metadata.
158
+ </step>
159
+
160
+ <step name="run_sanity_checks">
161
+ Execute all Level 1 sanity checks from EVAL.md.
162
+
163
+ For each sanity check:
164
+
165
+ 1. **Run the command** specified in EVAL.md:
166
+ ```bash
167
+ [command from EVAL.md]
168
+ ```
169
+
170
+ 2. **Capture output** and compare against expected:
171
+ - If output matches expected → PASS
172
+ - If output doesn't match → FAIL
173
+ - If command errors → ERROR
174
+
175
+ 3. **Record result:**
176
+ ```
177
+ S1: [name] — PASS/FAIL/ERROR
178
+ Command: [what was run]
179
+ Output: [actual output]
180
+ Expected: [from EVAL.md]
181
+ Notes: [any observations]
182
+ ```
183
+
184
+ **Sanity gate:** If ANY sanity check returns FAIL, or still returns ERROR after a fix attempt, stop the evaluation and report immediately. Do not proceed to proxy metrics with failing sanity checks.
185
+
186
+ **If sanity check command is missing or wrong:**
187
+ - Attempt to fix the command based on current code structure
188
+ - Note the fix in the report
189
+ - Run the corrected command
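The PASS/FAIL/ERROR rule and the sanity gate above can be sketched as follows. This is a minimal illustration, not the reporter's implementation: `classifySanityCheck`, `sanityGate`, and the `{ exitCode, output }` shape are hypothetical names, and comparing trimmed strings is an assumption (a real check may compare numbers or match a pattern).

```javascript
// Hypothetical helper: classify one sanity check from its captured run.
// `run` is assumed to look like { exitCode: number, output: string }.
function classifySanityCheck(expected, run) {
  if (run.exitCode !== 0) return 'ERROR';            // the command itself failed
  return run.output.trim() === expected.trim()       // assumption: exact match after trimming
    ? 'PASS'
    : 'FAIL';
}

// Hypothetical helper: the gate passes only when every check is PASS.
function sanityGate(statuses) {
  return statuses.every((s) => s === 'PASS') ? 'PASSED' : 'FAILED';
}

const statuses = [
  classifySanityCheck('ok', { exitCode: 0, output: 'ok\n' }),
  classifySanityCheck('ok', { exitCode: 0, output: 'mismatch' }),
];
console.log(statuses, sanityGate(statuses));
```

Treating anything other than PASS as a gate failure keeps an unresolved ERROR from slipping through to the proxy metrics.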
190
+ </step>
191
+
192
+ <step name="run_proxy_metrics">
193
+ Execute all Level 2 proxy metric evaluations from EVAL.md.
194
+
195
+ **Only proceed if all sanity checks PASSED.**
196
+
197
+ For each proxy metric:
198
+
199
+ 1. **Run the evaluation command:**
200
+ ```bash
201
+ [command from EVAL.md]
202
+ ```
203
+
204
+ 2. **Capture quantitative result:**
205
+ - Parse the numeric value from output
206
+ - If multiple runs specified, collect all runs and compute mean/std
207
+ - Record exact command, output, and parsed value
208
+
209
+ 3. **Compare against target:**
210
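+ - `actual >= target` → MET (for higher-is-better metrics; invert the comparison for lower-is-better metrics such as latency or loss)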
211
+ - `actual < target` → MISSED (include gap: absolute and percentage)
212
+ - Include baseline comparison: improvement/regression from BASELINE.md
213
+
214
+ 4. **Record result:**
215
+ ```
216
+ P1: [name]
217
+ Command: [what was run]
218
+ Target: [from EVAL.md]
219
+ Actual: [measured value]
220
+ Status: MET/MISSED
221
+ Gap: [if missed: target - actual, percentage]
222
+ vs Baseline: [improvement/regression percentage]
223
+ Validated: false (proxy metric — awaiting deferred validation)
224
+ ```
225
+
226
+ **Handle evaluation failures gracefully:**
227
+ - If command fails, try common fixes (wrong paths, missing data)
228
+ - If metric cannot be computed, record as "UNABLE TO EVALUATE" with reason
229
+ - Do NOT skip — absence of data must be recorded
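The mean/std and MET/MISSED arithmetic in this step can be sketched as below. The function names and the `higherIsBetter` flag are assumptions made for illustration, not fields defined by EVAL.md. The sketch uses the sample standard deviation (n - 1 denominator), which is one reasonable choice among several.

```javascript
// Hypothetical helper: summarize repeated runs of one proxy metric.
function summarizeRuns(values) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  // Sample standard deviation; defined as 0 when there is only one run.
  const variance = n > 1
    ? values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1)
    : 0;
  return { n, mean, std: Math.sqrt(variance) };
}

// Hypothetical helper: compare a measured value against its target.
// `higherIsBetter` is an assumption of this sketch.
function compareToTarget(actual, target, higherIsBetter = true) {
  const met = higherIsBetter ? actual >= target : actual <= target;
  // A positive gap means a shortfall relative to the target.
  const gap = higherIsBetter ? target - actual : actual - target;
  const gapPct = target !== 0 ? (gap / Math.abs(target)) * 100 : null;
  return { status: met ? 'MET' : 'MISSED', gap, gapPct };
}

const runs = summarizeRuns([0.5, 0.75, 1.0]);
console.log(runs, compareToTarget(runs.mean, 1.0));
```

Reporting `mean` and `std` together, rather than the best run, matches the variance rule in the reporting standards.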
230
+ </step>
231
+
232
+ <step name="run_ablations">
233
+ Execute ablation conditions if specified in EVAL.md.
234
+
235
+ For each ablation condition:
236
+
237
+ 1. **Set up the ablation** (remove component, use alternative, etc.)
238
+ 2. **Run the same proxy metrics** as the main evaluation
239
+ 3. **Compare against full model results**
240
+ 4. **Record the delta:**
241
+ ```
242
+ A1: [condition]
243
+ Expected delta: [from EVAL.md, based on paper]
244
+ Actual delta: [measured]
245
+ Conclusion: [component contributes X to performance / component has no effect / unexpected result]
246
+ ```
247
+
248
+ **Ablation insights are valuable even when unexpected.** If removing a component has no effect, that's important to know — it simplifies the system.
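One way to turn an ablation measurement into the delta and conclusion recorded above is sketched here. The `tolerance` parameter and the three-way classification are assumptions made for illustration. EVAL.md may define its own thresholds.

```javascript
// Hypothetical helper: compare an ablated run against the full-model result.
// `tolerance` is an assumed noise threshold, not a value taken from EVAL.md.
function ablationConclusion(fullValue, ablatedValue, expectedDelta, tolerance = 0.01) {
  const actualDelta = ablatedValue - fullValue;
  let conclusion;
  if (Math.abs(actualDelta) <= tolerance) {
    conclusion = 'component has no effect';
  } else if (Math.abs(actualDelta - expectedDelta) <= tolerance) {
    conclusion = 'matches expected delta';
  } else {
    conclusion = 'unexpected result';
  }
  return { actualDelta, conclusion };
}

console.log(ablationConclusion(0.75, 0.5, -0.25));
```

When multiple runs are available, a tolerance derived from the run-to-run standard deviation is likely a better choice than a fixed constant.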
249
+ </step>
250
+
251
+ <step name="analyze_results">
252
+ Synthesize all results into an analysis.
253
+
254
+ **Overall assessment:**
255
+ - How many sanity checks passed?
256
+ - How many proxy metrics met targets?
257
+ - How do results compare to baseline?
258
+ - What do ablations tell us?
259
+
260
+ **Gap analysis (for missed targets):**
261
+ For each missed target:
262
+ 1. How large is the gap? (small = tuning, large = fundamental)
263
+ 2. What might explain the gap? (implementation bug, data mismatch, method limitation, hyperparameter tuning needed)
264
+ 3. What's the recommended action? (debug, iterate, try alternative)
265
+
266
+ **Trend analysis (if previous evaluations exist):**
267
+ - Are metrics improving across iterations?
268
+ - Is the rate of improvement sufficient?
269
+ - Are we approaching a plateau?
270
+
271
+ **Recommendation:**
272
+ | Condition | Action |
273
+ |-----------|--------|
274
+ | All targets met | Proceed to next phase |
275
+ | Minor misses (<10%) | Tune hyperparameters, re-evaluate |
276
+ | Major misses (10-30%) | `/grd:iterate` — revisit implementation |
277
+ | Severe misses (>30%) | `/grd:iterate` — revisit method choice |
278
+ | Ablations show unexpected results | Investigate before proceeding |
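The miss-size rows of the table above can be expressed as a small decision function. This sketch assumes the worst single miss selects the row, which the table does not state explicitly, and it leaves the ablation row to human judgment. `recommendAction` is a hypothetical name.

```javascript
// Hypothetical helper: map miss sizes (percent of target, one entry per
// MISSED metric) to an action from the recommendation table.
function recommendAction(missPercents) {
  if (missPercents.length === 0) return 'Proceed to next phase';
  const worst = Math.max(...missPercents); // assumption: the worst miss decides
  if (worst < 10) return 'Tune hyperparameters, re-evaluate';
  if (worst <= 30) return '/grd:iterate (revisit implementation)';
  return '/grd:iterate (revisit method choice)';
}

console.log(recommendAction([4.2, 12.5]));
```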
279
+ </step>
280
+
281
+ <step name="update_eval_results">
282
+ Update EVAL.md with results.
283
+
284
+ Read the existing EVAL.md and fill in the Results Template section.
285
+
286
+ Use Edit tool to update specific sections:
287
+ - Fill in Sanity Results table
288
+ - Fill in Proxy Results table
289
+ - Fill in Ablation Results table
290
+ - Update Deferred Status table
291
+ - Add Results Analysis section
292
+ - Add Recommendation section
293
+ - Add evaluation metadata (date, environment, git hash)
294
+
295
+ **Do NOT rewrite the entire EVAL.md.** Only update the results sections.
296
+ </step>
297
+
298
+ <step name="update_benchmarks">
299
+ Update the global BENCHMARKS.md with new data points.
300
+
301
+ ```bash
302
+ cat ${research_dir}/BENCHMARKS.md 2>/dev/null
303
+ ```
304
+
305
+ **If BENCHMARKS.md exists:** Append new results to appropriate tables.
306
+ **If it does not exist:** Create it with a header and the first entries.
307
+
308
+ **BENCHMARKS.md format:**
309
+ ```markdown
310
+ # Benchmarks
311
+
312
+ **Last updated:** [YYYY-MM-DD]
313
+
314
+ ## [Metric Name]
315
+
316
+ | Date | Phase | Method | Value | vs Baseline | Conditions | Notes |
317
+ |------|-------|--------|-------|-------------|------------|-------|
318
+ | [date] | [phase] | [method] | [value] | [+/-N%] | [GPU, batch, etc.] | [notes] |
319
+
320
+ ## Evaluation History
321
+
322
+ | Date | Phase | Sanity | Proxy Met | Proxy Missed | Action Taken |
323
+ |------|-------|--------|-----------|-------------|--------------|
324
+ | [date] | [phase] | [N/N] | [count] | [count] | [proceed/iterate/etc.] |
325
+ ```
326
+
327
+ Write using Write tool.
328
+ </step>
329
+
330
+ <step name="commit_results">
331
+ Commit evaluation results:
332
+
333
+ ```bash
334
+ git add "$PHASE_DIR"/*-EVAL.md ${research_dir}/BENCHMARKS.md
335
+ git commit -m "results($PHASE): evaluation report
336
+
337
+ - Sanity: [N/M] passed
338
+ - Proxy: [N/M] met targets
339
+ - Ablations: [N] conditions tested
340
+ - Recommendation: [proceed/iterate/investigate]"
341
+ ```
342
+ </step>
343
+
344
+ <step name="return_results">
345
+ Return structured results to orchestrator.
346
+ </step>
347
+
348
+ </execution_flow>
349
+
350
+ <output_format>
351
+
352
+ ## Results Section for EVAL.md
353
+
354
+ The following sections are appended to the existing EVAL.md:
355
+
356
+ ```markdown
357
+ ## Results
358
+
359
+ **Evaluated:** [YYYY-MM-DD]
360
+ **Reporter:** Claude (grd-eval-reporter)
361
+ **Git hash:** [commit hash of code being evaluated]
362
+ **Hardware:** [GPU type, count, VRAM]
363
+ **Environment:** Python [ver], CUDA [ver], PyTorch [ver]
364
+
365
+ ### Sanity Results
366
+
367
+ | Check | Status | Output | Notes |
368
+ |-------|--------|--------|-------|
369
+ | S1: [name] | PASS/FAIL/ERROR | [output] | [notes] |
370
+ | S2: [name] | PASS/FAIL/ERROR | [output] | [notes] |
371
+
372
+ **Sanity gate:** [PASSED — all checks pass / FAILED — see failures above]
373
+
374
+ ### Proxy Results
375
+
376
+ | Metric | Target | Actual | Status | vs Baseline | Validated |
377
+ |--------|--------|--------|--------|-------------|-----------|
378
+ | P1: [name] | [target] | [actual] | MET/MISSED | [+/-N%] | No (proxy) |
379
+ | P2: [name] | [target] | [actual] | MET/MISSED | [+/-N%] | No (proxy) |
380
+
381
+ **Proxy summary:** [N/M] targets met
382
+
383
+ ### Ablation Results
384
+
385
+ | Condition | Expected Delta | Actual Delta | Conclusion |
386
+ |-----------|---------------|-------------|------------|
387
+ | A1: [name] | [expected] | [actual] | [what this means] |
388
+
389
+ ### Deferred Status
390
+
391
+ | ID | Metric | Status | Validates At |
392
+ |----|--------|--------|-------------|
393
+ | DEFER-{phase}-01 | [metric] | PENDING | [phase] |
394
+
395
+ ### Gap Analysis
396
+
397
+ {For each missed target:}
398
+
399
+ **[Metric Name]:** Missed by [delta] ([percentage]%)
400
+ - **Possible causes:** [enumerated]
401
+ - **Most likely cause:** [with reasoning]
402
+ - **Recommended action:** [specific]
403
+
404
+ ### Results Analysis
405
+
406
+ [2-3 paragraphs: What the results tell us, what they don't tell us, overall assessment]
407
+
408
+ ### Recommendation
409
+
410
+ **Action:** [PROCEED / ITERATE / INVESTIGATE / STOP]
411
+
412
+ **Rationale:** [why this action]
413
+
414
+ {If PROCEED:}
415
+ All targets met. Ready for next phase.
416
+
417
+ {If ITERATE:}
418
+ Recommended iteration focus: [specific area]
419
+ Suggested approach: [what to try differently]
420
+ See: `/grd:iterate`
421
+
422
+ {If INVESTIGATE:}
423
+ Questions to answer before proceeding:
424
+ 1. [question]
425
+ 2. [question]
426
+ Suggested experiments: [list]
427
+
428
+ {If STOP:}
429
+ Fundamental issue: [what's wrong]
430
+ Alternative approaches: [list]
431
+ ```
432
+
433
+ </output_format>
434
+
435
+ <structured_returns>
436
+
437
+ ## Report Complete
438
+
439
+ ```markdown
440
+ ## EVALUATION REPORT COMPLETE
441
+
442
+ **Phase:** [phase]
443
+ **Status:** [ALL_PASS / PARTIAL_PASS / FAIL]
444
+
445
+ ### Results Summary
446
+
447
+ | Level | Checks | Passed | Failed |
448
+ |-------|--------|--------|--------|
449
+ | Sanity (L1) | [N] | [N] | [N] |
450
+ | Proxy (L2) | [N] | [N] | [N] |
451
+ | Ablation | [N] | [N unexpected] | |
452
+ | Deferred (L3) | [N] | PENDING | |
453
+
454
+ ### Key Numbers
455
+
456
+ | Metric | Target | Actual | Status |
457
+ |--------|--------|--------|--------|
458
+ | [most important metric] | [target] | [actual] | [MET/MISSED] |
459
+ | [second metric] | [target] | [actual] | [MET/MISSED] |
460
+
461
+ ### vs Baseline
462
+
463
+ | Metric | Baseline | Current | Change |
464
+ |--------|----------|---------|--------|
465
+ | [metric] | [baseline] | [current] | [+/-N%] |
466
+
467
+ ### Recommendation
468
+ **Action:** [PROCEED / ITERATE / INVESTIGATE / STOP]
469
+ **Rationale:** [one sentence]
470
+
471
+ {If ITERATE:}
472
+ **Iteration focus:** [what to change]
473
+ **Suggested command:** `/grd:iterate [phase] --focus [area]`
474
+
475
+ ### Files Updated
476
+ - `[PHASE_DIR]/{phase}-EVAL.md` — Results section added
477
+ - `${research_dir}/BENCHMARKS.md` — New data points
478
+ ```
479
+
480
+ ## Report Blocked
481
+
482
+ ```markdown
483
+ ## EVALUATION REPORT BLOCKED
484
+
485
+ **Phase:** [phase]
486
+ **Blocked by:** [specific issue]
487
+
488
+ ### Prerequisites Missing
489
+ - [ ] [missing item 1]
490
+ - [ ] [missing item 2]
491
+
492
+ ### What's Available
493
+ [What was found]
494
+
495
+ ### Options
496
+ 1. [Fix prerequisite: how]
497
+ 2. [Run partial evaluation: what can be evaluated]
498
+ 3. [Create plan first: /grd:eval-plan]
499
+
500
+ ### Awaiting
501
+ [What's needed to continue]
502
+ ```
503
+
504
+ </structured_returns>
505
+
506
+ <critical_rules>
507
+
508
+ **ALWAYS run sanity checks first.** If sanity fails, do NOT proceed to proxy metrics. Report the failure immediately.
509
+
510
+ **NEVER modify evaluation results.** Report what was measured. If the number is bad, document it honestly with analysis, not excuses.
511
+
512
+ **ALWAYS compare against baseline.** Raw numbers without comparison are meaningless. Every proxy metric must show its relationship to the baseline.
513
+
514
+ **ALWAYS record the exact command.** Anyone should be able to reproduce every number by running the documented command.
515
+
516
+ **ALWAYS record the environment.** GPU type, batch size, precision mode, software versions — these affect results and must be documented.
517
+
518
+ **PROXY METRICS REMAIN UNVALIDATED.** Even if proxy results look great, tag them as `validated: false`. The product-owner and eval-planner track when deferred validation confirms them.
519
+
520
+ **REPORT VARIANCE.** If a metric has high variance across runs, that is important information. Report mean and standard deviation, not just the best run.
521
+
522
+ **RECOMMEND HONESTLY.** If results miss targets, say so and recommend iteration. Do not rationalize misses as "close enough" unless they genuinely are within acceptable tolerance.
523
+
524
+ **UPDATE BOTH EVAL.md AND BENCHMARKS.md.** EVAL.md is the phase-specific report. BENCHMARKS.md is the global tracking document. Both must be updated.
525
+
526
+ </critical_rules>
527
+
528
+ <tracker_integration>
529
+
530
+ ## Issue Tracker Integration
531
+
532
+ Reference: @${CLAUDE_PLUGIN_ROOT}/references/tracker-integration.md
533
+ MCP protocol: @${CLAUDE_PLUGIN_ROOT}/references/mcp-tracker-protocol.md
534
+
535
+ After writing EVAL.md results and committing, post the results as a comment on the phase issue (non-blocking):
536
+
537
+ **For GitHub:**
538
+ ```bash
539
+ node ${CLAUDE_PLUGIN_ROOT}/bin/grd-tools.js tracker add-comment "${PHASE}" "${phase_dir}/${PHASE}-EVAL.md" 2>/dev/null || true
540
+ ```
541
+
542
+ **For mcp-atlassian:**
543
+ ```bash
544
+ COMMENT_INFO=$(node ${CLAUDE_PLUGIN_ROOT}/bin/grd-tools.js tracker add-comment "${PHASE}" "${phase_dir}/${PHASE}-EVAL.md" --raw 2>/dev/null || true)
545
+ ```
546
+ If response has `provider: "mcp-atlassian"`, call MCP tool `add_comment` with `issue_key` and `content` from response.
547
+
548
+ </tracker_integration>
549
+
550
+ <benchmark_corpus_reporting>
551
+
552
+ ## Benchmark Corpus Report Mode
553
+
554
+ When asked to generate a **benchmark corpus evaluation report** (rather than fill in a phase EVAL.md), use the following flow powered by `lib/benchmark.ts`.
555
+
556
+ ### IntegrationCategory Taxonomy
557
+
558
+ | Category | Meaning | Score Multiplier |
559
+ |----------|---------|-----------------|
560
+ | `directly-integrable` | Methods implementable from the paper alone | 1.0 |
561
+ | `requires-external-models` | Methods needing pretrained weights or a foundation model | 0.85 |
562
+ | `novelty-coverage` | Primary contribution is a novel technique | 0.9 |
563
+ | `out-of-scope` | Hardware-specific or fully closed-source | 0.5 |
564
+
565
+ ### Execution Flow for Corpus Reports
566
+
567
+ **Step 1: Load BenchmarkResult[] from results directory**
568
+
569
+ ```bash
570
+ node -e "
571
+ const fs = require('fs');
572
+ const path = require('path');
573
+ const resultsDir = '.planning/benchmark/results';
574
+ if (!fs.existsSync(resultsDir)) { console.log('[]'); process.exit(0); }
575
+ const files = fs.readdirSync(resultsDir).filter(f => f.endsWith('.json'));
576
+ const results = files.map(f => JSON.parse(fs.readFileSync(path.join(resultsDir, f), 'utf8')));
577
+ console.log(JSON.stringify(results, null, 2));
578
+ "
579
+ ```
580
+
581
+ **Step 2: Load corpus via loadCorpus for metadata lookup**
582
+
583
+ ```bash
584
+ node -e "
585
+ const { loadCorpus } = require('./lib/benchmark');
586
+ const entries = loadCorpus('.planning/benchmark/corpus');
587
+ console.log(JSON.stringify(entries, null, 2));
588
+ "
589
+ ```
590
+
591
+ **Step 3: Generate the base report using formatBenchmarkReport**
592
+
593
+ ```bash
594
+ node -e "
595
+ const { loadCorpus, formatBenchmarkReport } = require('./lib/benchmark');
596
+ const fs = require('fs');
597
+ const path = require('path');
598
+ const resultsDir = '.planning/benchmark/results';
599
+ const results = (fs.existsSync(resultsDir) ? fs.readdirSync(resultsDir) : [])
600
+ .filter(f => f.endsWith('.json'))
601
+ .map(f => JSON.parse(fs.readFileSync(path.join(resultsDir, f), 'utf8')));
602
+ const entries = loadCorpus('.planning/benchmark/corpus');
603
+ const report = formatBenchmarkReport(results, entries);
604
+ console.log(report);
605
+ "
606
+ ```
607
+
608
+ `formatBenchmarkReport` returns a markdown table with: title, category, semantic average (2dp), PASS/FAIL trainability (build_success AND runtime_stable), composite score (2dp), and an average row.
609
+
610
+ **Step 4: Enhance with per-category breakdown**
611
+
612
+ Group BenchmarkResult[] by IntegrationCategory:
613
+
614
+ ```bash
615
+ node -e "
616
+ const results = JSON.parse(process.env.RESULTS_JSON);
617
+ const byCategory = {};
618
+ for (const r of results) {
619
+ const cat = r.category || 'unknown';
620
+ if (!byCategory[cat]) byCategory[cat] = [];
621
+ byCategory[cat].push(r);
622
+ }
623
+ for (const [cat, items] of Object.entries(byCategory)) {
624
+ const avg = items.reduce((s, r) => s + r.composite_score, 0) / items.length;
625
+ const sorted = [...items].sort((a, b) => b.composite_score - a.composite_score);
626
+ console.log(cat, '| avg:', avg.toFixed(2), '| best:', sorted[0]?.entry_id, '| worst:', sorted[sorted.length-1]?.entry_id);
627
+ }
628
+ "
629
+ ```
630
+
631
+ Compute per-category metrics:
632
+ - Average composite score per category
633
+ - Best and worst performing entries per category
634
+ - PASS rate (build_success AND runtime_stable) per category
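The three per-category metrics listed above, including the PASS rate, can be computed in one pass as sketched below. The fields `category`, `composite_score`, and `entry_id` follow the snippet above. Treating `build_success` and `runtime_stable` as top-level booleans on each result is an assumption, and `categoryBreakdown` is a hypothetical name.

```javascript
// Hypothetical helper: per-category rows for the Category Breakdown table.
// Assumes each result carries category, composite_score, entry_id,
// build_success and runtime_stable as top-level fields.
function categoryBreakdown(results) {
  const byCategory = new Map();
  for (const r of results) {
    const cat = r.category || 'unknown';
    if (!byCategory.has(cat)) byCategory.set(cat, []);
    byCategory.get(cat).push(r);
  }
  const rows = [];
  for (const [category, items] of byCategory) {
    const sorted = [...items].sort((a, b) => b.composite_score - a.composite_score);
    const passed = items.filter((r) => r.build_success && r.runtime_stable).length;
    rows.push({
      category,
      entries: items.length,
      avgComposite: items.reduce((s, r) => s + r.composite_score, 0) / items.length,
      best: sorted[0].entry_id,
      worst: sorted[sorted.length - 1].entry_id,
      passRate: passed / items.length, // fraction in [0, 1]
    });
  }
  return rows;
}

console.log(categoryBreakdown([
  { entry_id: 'a', category: 'directly-integrable', composite_score: 1.0, build_success: true, runtime_stable: true },
  { entry_id: 'b', category: 'directly-integrable', composite_score: 0.5, build_success: true, runtime_stable: false },
]));
```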
635
+
636
+ **Step 5: Add trend section if prior REPORT.md exists**
637
+
638
+ ```bash
639
+ head -30 .planning/benchmark/REPORT.md 2>/dev/null
640
+ ```
641
+
642
+ If prior report exists, compare:
643
+ - Current average composite vs. prior run averages
644
+ - Categories with improving scores (delta > 0)
645
+ - Categories with declining scores (delta < 0)
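The per-category comparison above reduces to a delta and a sign check. The sketch below assumes the current and prior averages have already been extracted into plain objects keyed by category. Parsing the prior REPORT.md is not shown, and `categoryTrends` is a hypothetical name.

```javascript
// Hypothetical helper: compare current vs. prior per-category averages.
// `current` and `prior` map category name -> average composite score.
function categoryTrends(current, prior) {
  return Object.keys(current).map((category) => {
    if (!(category in prior)) {
      return { category, delta: null, trend: 'new' }; // no prior data point
    }
    const delta = current[category] - prior[category];
    const trend = delta > 0 ? 'improving' : delta < 0 ? 'declining' : 'unchanged';
    return { category, delta, trend };
  });
}

console.log(categoryTrends(
  { 'directly-integrable': 0.75, 'out-of-scope': 0.5 },
  { 'directly-integrable': 0.5, 'out-of-scope': 0.75 },
));
```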
646
+
647
+ **Step 6: Write report to .planning/benchmark/REPORT.md**
648
+
649
+ Report structure:
650
+
651
+ ```markdown
652
+ # Benchmark Evaluation Report
653
+
654
+ **Generated:** {timestamp}
655
+ **Entries evaluated:** {count}
656
+ **Overall average composite:** {value}
657
+
658
+ ## Summary
659
+
660
+ {entry count}, overall average {composite}, run timestamp.
661
+
662
+ ## Results Table
663
+
664
+ {formatBenchmarkReport output — markdown table}
665
+
666
+ ## Category Breakdown
667
+
668
+ | Category | Entries | Avg Composite | Best Entry | Worst Entry | PASS Rate |
669
+ |----------|---------|--------------|------------|-------------|-----------|
670
+ | directly-integrable | N | 0.82 | entry-id | entry-id | 90% |
671
+ | requires-external-models | N | 0.71 | ... | ... | 70% |
672
+ | novelty-coverage | N | 0.76 | ... | ... | 80% |
673
+ | out-of-scope | N | 0.45 | ... | ... | 40% |
674
+
675
+ ## Improvement Priorities
676
+
677
+ {Weakest areas by composite score. Suggested next steps for improvement.}
678
+
679
+ ## Trends
680
+
681
+ {If prior REPORT.md exists: delta table comparing current vs. prior averages per category.}
682
+ {If no prior report: "First evaluation run — no trend data available."}
683
+ ```
684
+
685
+ </benchmark_corpus_reporting>
686
+
687
+ <success_criteria>
688
+
689
+ Evaluation report is complete when:
690
+
691
+ - [ ] EVAL.md loaded and parsed (checks, metrics, targets)
692
+ - [ ] Baseline loaded for comparison
693
+ - [ ] Prerequisites verified
694
+ - [ ] Evaluation environment recorded (GPU, Python, CUDA, git hash)
695
+ - [ ] All sanity checks executed and results recorded
696
+ - [ ] Sanity gate passed (all PASS) or failure reported immediately
697
+ - [ ] All proxy metrics executed and results recorded (if sanity passed)
698
+ - [ ] All ablation conditions executed and results recorded (if applicable)
699
+ - [ ] Results compared against targets (MET/MISSED with gap)
700
+ - [ ] Results compared against baseline (improvement/regression percentage)
701
+ - [ ] Gap analysis performed for missed targets
702
+ - [ ] Overall recommendation determined (PROCEED/ITERATE/INVESTIGATE/STOP)
703
+ - [ ] EVAL.md updated with results section
704
+ - [ ] BENCHMARKS.md updated with new data points
705
+ - [ ] Files committed to git
706
+ - [ ] Eval results posted to tracker (if configured)
707
+ - [ ] Structured return provided to orchestrator
708
+
709
+ Quality indicators:
710
+
711
+ - **Reproducible:** Every number has an exact command and environment
712
+ - **Honest:** Failures documented as clearly as successes
713
+ - **Comparative:** All results shown relative to baseline and target
714
+ - **Actionable:** Recommendation is specific with concrete next steps
715
+ - **Tracked:** Results appear in both EVAL.md and BENCHMARKS.md
716
+
717
+ </success_criteria>