gsd-trae 1.0.1 → 1.0.2

Files changed (761)
  1. package/CHANGELOG.md +6 -0
  2. package/assets/screenshot.png +0 -0
  3. package/package.json +9 -2
  4. package/.claude/settings.local.json +0 -8
  5. package/.gitmodules +0 -6
  6. package/.trae/rules/project_rules.md +0 -56
  7. package/.vscode/code-counter/code-counter.db +0 -0
  8. package/.vscode/settings.json +0 -6
  9. package/refs/gsd/.github/CODEOWNERS +0 -2
  10. package/refs/gsd/.github/FUNDING.yml +0 -1
  11. package/refs/gsd/.github/ISSUE_TEMPLATE/bug_report.yml +0 -59
  12. package/refs/gsd/.github/ISSUE_TEMPLATE/feature_request.yml +0 -37
  13. package/refs/gsd/.github/pull_request_template.md +0 -24
  14. package/refs/gsd/.github/workflows/auto-label-issues.yml +0 -21
  15. package/refs/gsd/CHANGELOG.md +0 -1520
  16. package/refs/gsd/LICENSE +0 -21
  17. package/refs/gsd/README.md +0 -704
  18. package/refs/gsd/SECURITY.md +0 -33
  19. package/refs/gsd/agents/gsd-codebase-mapper.md +0 -764
  20. package/refs/gsd/agents/gsd-debugger.md +0 -1246
  21. package/refs/gsd/agents/gsd-executor.md +0 -469
  22. package/refs/gsd/agents/gsd-integration-checker.md +0 -443
  23. package/refs/gsd/agents/gsd-phase-researcher.md +0 -546
  24. package/refs/gsd/agents/gsd-plan-checker.md +0 -690
  25. package/refs/gsd/agents/gsd-planner.md +0 -1275
  26. package/refs/gsd/agents/gsd-project-researcher.md +0 -621
  27. package/refs/gsd/agents/gsd-research-synthesizer.md +0 -239
  28. package/refs/gsd/agents/gsd-roadmapper.md +0 -642
  29. package/refs/gsd/agents/gsd-verifier.md +0 -573
  30. package/refs/gsd/assets/gsd-logo-2000-transparent.png +0 -0
  31. package/refs/gsd/assets/gsd-logo-2000-transparent.svg +0 -17
  32. package/refs/gsd/assets/gsd-logo-2000.png +0 -0
  33. package/refs/gsd/assets/gsd-logo-2000.svg +0 -21
  34. package/refs/gsd/assets/terminal.svg +0 -68
  35. package/refs/gsd/bin/install.js +0 -2090
  36. package/refs/gsd/commands/gsd/add-phase.md +0 -43
  37. package/refs/gsd/commands/gsd/add-tests.md +0 -41
  38. package/refs/gsd/commands/gsd/add-todo.md +0 -47
  39. package/refs/gsd/commands/gsd/audit-milestone.md +0 -36
  40. package/refs/gsd/commands/gsd/check-todos.md +0 -45
  41. package/refs/gsd/commands/gsd/cleanup.md +0 -18
  42. package/refs/gsd/commands/gsd/complete-milestone.md +0 -136
  43. package/refs/gsd/commands/gsd/debug.md +0 -167
  44. package/refs/gsd/commands/gsd/discuss-phase.md +0 -83
  45. package/refs/gsd/commands/gsd/execute-phase.md +0 -41
  46. package/refs/gsd/commands/gsd/health.md +0 -22
  47. package/refs/gsd/commands/gsd/help.md +0 -22
  48. package/refs/gsd/commands/gsd/insert-phase.md +0 -32
  49. package/refs/gsd/commands/gsd/join-discord.md +0 -18
  50. package/refs/gsd/commands/gsd/list-phase-assumptions.md +0 -46
  51. package/refs/gsd/commands/gsd/map-codebase.md +0 -71
  52. package/refs/gsd/commands/gsd/new-milestone.md +0 -44
  53. package/refs/gsd/commands/gsd/new-project.md +0 -42
  54. package/refs/gsd/commands/gsd/new-project.md.bak +0 -1041
  55. package/refs/gsd/commands/gsd/pause-work.md +0 -38
  56. package/refs/gsd/commands/gsd/plan-milestone-gaps.md +0 -34
  57. package/refs/gsd/commands/gsd/plan-phase.md +0 -45
  58. package/refs/gsd/commands/gsd/progress.md +0 -24
  59. package/refs/gsd/commands/gsd/quick.md +0 -41
  60. package/refs/gsd/commands/gsd/reapply-patches.md +0 -110
  61. package/refs/gsd/commands/gsd/remove-phase.md +0 -31
  62. package/refs/gsd/commands/gsd/research-phase.md +0 -189
  63. package/refs/gsd/commands/gsd/resume-work.md +0 -40
  64. package/refs/gsd/commands/gsd/set-profile.md +0 -34
  65. package/refs/gsd/commands/gsd/settings.md +0 -36
  66. package/refs/gsd/commands/gsd/update.md +0 -37
  67. package/refs/gsd/commands/gsd/verify-work.md +0 -38
  68. package/refs/gsd/docs/USER-GUIDE.md +0 -471
  69. package/refs/gsd/docs/context-monitor.md +0 -96
  70. package/refs/gsd/get-shit-done/bin/gsd-tools.cjs +0 -585
  71. package/refs/gsd/get-shit-done/bin/lib/commands.cjs +0 -553
  72. package/refs/gsd/get-shit-done/bin/lib/config.cjs +0 -162
  73. package/refs/gsd/get-shit-done/bin/lib/core.cjs +0 -411
  74. package/refs/gsd/get-shit-done/bin/lib/frontmatter.cjs +0 -299
  75. package/refs/gsd/get-shit-done/bin/lib/init.cjs +0 -710
  76. package/refs/gsd/get-shit-done/bin/lib/milestone.cjs +0 -215
  77. package/refs/gsd/get-shit-done/bin/lib/phase.cjs +0 -870
  78. package/refs/gsd/get-shit-done/bin/lib/roadmap.cjs +0 -298
  79. package/refs/gsd/get-shit-done/bin/lib/state.cjs +0 -521
  80. package/refs/gsd/get-shit-done/bin/lib/template.cjs +0 -222
  81. package/refs/gsd/get-shit-done/bin/lib/verify.cjs +0 -772
  82. package/refs/gsd/get-shit-done/references/checkpoints.md +0 -776
  83. package/refs/gsd/get-shit-done/references/continuation-format.md +0 -249
  84. package/refs/gsd/get-shit-done/references/decimal-phase-calculation.md +0 -65
  85. package/refs/gsd/get-shit-done/references/git-integration.md +0 -248
  86. package/refs/gsd/get-shit-done/references/git-planning-commit.md +0 -38
  87. package/refs/gsd/get-shit-done/references/model-profile-resolution.md +0 -34
  88. package/refs/gsd/get-shit-done/references/model-profiles.md +0 -92
  89. package/refs/gsd/get-shit-done/references/phase-argument-parsing.md +0 -61
  90. package/refs/gsd/get-shit-done/references/planning-config.md +0 -196
  91. package/refs/gsd/get-shit-done/references/questioning.md +0 -145
  92. package/refs/gsd/get-shit-done/references/tdd.md +0 -263
  93. package/refs/gsd/get-shit-done/references/ui-brand.md +0 -160
  94. package/refs/gsd/get-shit-done/references/verification-patterns.md +0 -612
  95. package/refs/gsd/get-shit-done/templates/DEBUG.md +0 -164
  96. package/refs/gsd/get-shit-done/templates/UAT.md +0 -247
  97. package/refs/gsd/get-shit-done/templates/VALIDATION.md +0 -76
  98. package/refs/gsd/get-shit-done/templates/codebase/architecture.md +0 -255
  99. package/refs/gsd/get-shit-done/templates/codebase/concerns.md +0 -310
  100. package/refs/gsd/get-shit-done/templates/codebase/conventions.md +0 -307
  101. package/refs/gsd/get-shit-done/templates/codebase/integrations.md +0 -280
  102. package/refs/gsd/get-shit-done/templates/codebase/stack.md +0 -186
  103. package/refs/gsd/get-shit-done/templates/codebase/structure.md +0 -285
  104. package/refs/gsd/get-shit-done/templates/codebase/testing.md +0 -480
  105. package/refs/gsd/get-shit-done/templates/config.json +0 -37
  106. package/refs/gsd/get-shit-done/templates/context.md +0 -283
  107. package/refs/gsd/get-shit-done/templates/continue-here.md +0 -78
  108. package/refs/gsd/get-shit-done/templates/debug-subagent-prompt.md +0 -91
  109. package/refs/gsd/get-shit-done/templates/discovery.md +0 -146
  110. package/refs/gsd/get-shit-done/templates/milestone-archive.md +0 -123
  111. package/refs/gsd/get-shit-done/templates/milestone.md +0 -115
  112. package/refs/gsd/get-shit-done/templates/phase-prompt.md +0 -569
  113. package/refs/gsd/get-shit-done/templates/planner-subagent-prompt.md +0 -117
  114. package/refs/gsd/get-shit-done/templates/project.md +0 -184
  115. package/refs/gsd/get-shit-done/templates/requirements.md +0 -231
  116. package/refs/gsd/get-shit-done/templates/research-project/ARCHITECTURE.md +0 -204
  117. package/refs/gsd/get-shit-done/templates/research-project/FEATURES.md +0 -147
  118. package/refs/gsd/get-shit-done/templates/research-project/PITFALLS.md +0 -200
  119. package/refs/gsd/get-shit-done/templates/research-project/STACK.md +0 -120
  120. package/refs/gsd/get-shit-done/templates/research-project/SUMMARY.md +0 -170
  121. package/refs/gsd/get-shit-done/templates/research.md +0 -552
  122. package/refs/gsd/get-shit-done/templates/retrospective.md +0 -54
  123. package/refs/gsd/get-shit-done/templates/roadmap.md +0 -202
  124. package/refs/gsd/get-shit-done/templates/state.md +0 -176
  125. package/refs/gsd/get-shit-done/templates/summary-complex.md +0 -59
  126. package/refs/gsd/get-shit-done/templates/summary-minimal.md +0 -41
  127. package/refs/gsd/get-shit-done/templates/summary-standard.md +0 -48
  128. package/refs/gsd/get-shit-done/templates/summary.md +0 -248
  129. package/refs/gsd/get-shit-done/templates/user-setup.md +0 -311
  130. package/refs/gsd/get-shit-done/templates/verification-report.md +0 -322
  131. package/refs/gsd/get-shit-done/workflows/add-phase.md +0 -111
  132. package/refs/gsd/get-shit-done/workflows/add-tests.md +0 -350
  133. package/refs/gsd/get-shit-done/workflows/add-todo.md +0 -157
  134. package/refs/gsd/get-shit-done/workflows/audit-milestone.md +0 -297
  135. package/refs/gsd/get-shit-done/workflows/check-todos.md +0 -176
  136. package/refs/gsd/get-shit-done/workflows/cleanup.md +0 -152
  137. package/refs/gsd/get-shit-done/workflows/complete-milestone.md +0 -763
  138. package/refs/gsd/get-shit-done/workflows/diagnose-issues.md +0 -219
  139. package/refs/gsd/get-shit-done/workflows/discovery-phase.md +0 -289
  140. package/refs/gsd/get-shit-done/workflows/discuss-phase.md +0 -542
  141. package/refs/gsd/get-shit-done/workflows/execute-phase.md +0 -449
  142. package/refs/gsd/get-shit-done/workflows/execute-plan.md +0 -448
  143. package/refs/gsd/get-shit-done/workflows/health.md +0 -156
  144. package/refs/gsd/get-shit-done/workflows/help.md +0 -489
  145. package/refs/gsd/get-shit-done/workflows/insert-phase.md +0 -129
  146. package/refs/gsd/get-shit-done/workflows/list-phase-assumptions.md +0 -178
  147. package/refs/gsd/get-shit-done/workflows/map-codebase.md +0 -315
  148. package/refs/gsd/get-shit-done/workflows/new-milestone.md +0 -382
  149. package/refs/gsd/get-shit-done/workflows/new-project.md +0 -1116
  150. package/refs/gsd/get-shit-done/workflows/pause-work.md +0 -122
  151. package/refs/gsd/get-shit-done/workflows/plan-milestone-gaps.md +0 -274
  152. package/refs/gsd/get-shit-done/workflows/plan-phase.md +0 -569
  153. package/refs/gsd/get-shit-done/workflows/progress.md +0 -381
  154. package/refs/gsd/get-shit-done/workflows/quick.md +0 -453
  155. package/refs/gsd/get-shit-done/workflows/remove-phase.md +0 -154
  156. package/refs/gsd/get-shit-done/workflows/research-phase.md +0 -73
  157. package/refs/gsd/get-shit-done/workflows/resume-project.md +0 -306
  158. package/refs/gsd/get-shit-done/workflows/set-profile.md +0 -80
  159. package/refs/gsd/get-shit-done/workflows/settings.md +0 -213
  160. package/refs/gsd/get-shit-done/workflows/transition.md +0 -544
  161. package/refs/gsd/get-shit-done/workflows/update.md +0 -219
  162. package/refs/gsd/get-shit-done/workflows/verify-phase.md +0 -242
  163. package/refs/gsd/get-shit-done/workflows/verify-work.md +0 -569
  164. package/refs/gsd/hooks/gsd-check-update.js +0 -62
  165. package/refs/gsd/hooks/gsd-context-monitor.js +0 -122
  166. package/refs/gsd/hooks/gsd-statusline.js +0 -108
  167. package/refs/gsd/package.json +0 -50
  168. package/refs/gsd/scripts/build-hooks.js +0 -43
  169. package/refs/gsd/tests/commands.test.cjs +0 -661
  170. package/refs/gsd/tests/helpers.cjs +0 -40
  171. package/refs/gsd/tests/init.test.cjs +0 -205
  172. package/refs/gsd/tests/milestone.test.cjs +0 -98
  173. package/refs/gsd/tests/phase.test.cjs +0 -1241
  174. package/refs/gsd/tests/roadmap.test.cjs +0 -265
  175. package/refs/gsd/tests/state.test.cjs +0 -302
  176. package/refs/gsd/tests/verify.test.cjs +0 -80
  177. package/refs/vbenchmark/.agent/agents/codebase-explorer.md +0 -224
  178. package/refs/vbenchmark/.agent/agents/debugger.md +0 -180
  179. package/refs/vbenchmark/.agent/agents/documenter.md +0 -166
  180. package/refs/vbenchmark/.agent/agents/implementer.md +0 -70
  181. package/refs/vbenchmark/.agent/agents/orchestrator.md +0 -212
  182. package/refs/vbenchmark/.agent/agents/researcher.md +0 -80
  183. package/refs/vbenchmark/.agent/agents/reviewer.md +0 -184
  184. package/refs/vbenchmark/.agent/agents/tester.md +0 -170
  185. package/refs/vbenchmark/.agent/commands/commit.md +0 -29
  186. package/refs/vbenchmark/.agent/commands/debug.md +0 -59
  187. package/refs/vbenchmark/.agent/commands/document.md +0 -52
  188. package/refs/vbenchmark/.agent/commands/gather-context.md +0 -58
  189. package/refs/vbenchmark/.agent/commands/init.md +0 -56
  190. package/refs/vbenchmark/.agent/commands/preset-help.md +0 -50
  191. package/refs/vbenchmark/.agent/commands/refactor.md +0 -71
  192. package/refs/vbenchmark/.agent/commands/research.md +0 -37
  193. package/refs/vbenchmark/.agent/commands/review.md +0 -38
  194. package/refs/vbenchmark/.agent/commands/test.md +0 -61
  195. package/refs/vbenchmark/.agent/rules/01-code-quality.md +0 -33
  196. package/refs/vbenchmark/.agent/rules/02-typescript-go.md +0 -46
  197. package/refs/vbenchmark/.agent/rules/03-security-git.md +0 -34
  198. package/refs/vbenchmark/.agent/rules/04-architecture.md +0 -40
  199. package/refs/vbenchmark/.agent/sync.js +0 -536
  200. package/refs/vbenchmark/.agent/workflows/commit.md +0 -29
  201. package/refs/vbenchmark/.agent/workflows/debug.md +0 -59
  202. package/refs/vbenchmark/.agent/workflows/document.md +0 -52
  203. package/refs/vbenchmark/.agent/workflows/gather-context.md +0 -58
  204. package/refs/vbenchmark/.agent/workflows/init.md +0 -56
  205. package/refs/vbenchmark/.agent/workflows/preset-help.md +0 -50
  206. package/refs/vbenchmark/.agent/workflows/refactor.md +0 -71
  207. package/refs/vbenchmark/.agent/workflows/research.md +0 -37
  208. package/refs/vbenchmark/.agent/workflows/review.md +0 -38
  209. package/refs/vbenchmark/.agent/workflows/test.md +0 -61
  210. package/refs/vbenchmark/.claude/commands/agentic-dev/apply.md +0 -222
  211. package/refs/vbenchmark/.claude/commands/agentic-dev/done.md +0 -166
  212. package/refs/vbenchmark/.claude/commands/agentic-dev/proposal.md +0 -220
  213. package/refs/vbenchmark/.claude/commands/openspec/apply.md +0 -23
  214. package/refs/vbenchmark/.claude/commands/openspec/archive.md +0 -27
  215. package/refs/vbenchmark/.claude/commands/openspec/proposal.md +0 -28
  216. package/refs/vbenchmark/.clinerules/01-rules.md +0 -73
  217. package/refs/vbenchmark/.clinerules/02-agents.md +0 -34
  218. package/refs/vbenchmark/.cursor/commands/commit.md +0 -29
  219. package/refs/vbenchmark/.cursor/commands/debug.md +0 -59
  220. package/refs/vbenchmark/.cursor/commands/document.md +0 -52
  221. package/refs/vbenchmark/.cursor/commands/gather-context.md +0 -58
  222. package/refs/vbenchmark/.cursor/commands/init.md +0 -56
  223. package/refs/vbenchmark/.cursor/commands/preset-help.md +0 -50
  224. package/refs/vbenchmark/.cursor/commands/refactor.md +0 -71
  225. package/refs/vbenchmark/.cursor/commands/research.md +0 -37
  226. package/refs/vbenchmark/.cursor/commands/review.md +0 -38
  227. package/refs/vbenchmark/.cursor/commands/test.md +0 -61
  228. package/refs/vbenchmark/.cursor/rules/agents.mdc +0 -1357
  229. package/refs/vbenchmark/.factory/droids/codebase-explorer.md +0 -224
  230. package/refs/vbenchmark/.factory/droids/debugger.md +0 -180
  231. package/refs/vbenchmark/.factory/droids/documenter.md +0 -166
  232. package/refs/vbenchmark/.factory/droids/implementer.md +0 -70
  233. package/refs/vbenchmark/.factory/droids/orchestrator.md +0 -212
  234. package/refs/vbenchmark/.factory/droids/researcher.md +0 -80
  235. package/refs/vbenchmark/.factory/droids/reviewer.md +0 -184
  236. package/refs/vbenchmark/.factory/droids/tester.md +0 -170
  237. package/refs/vbenchmark/.gemini/workflows/commit.md +0 -29
  238. package/refs/vbenchmark/.gemini/workflows/debug.md +0 -59
  239. package/refs/vbenchmark/.gemini/workflows/document.md +0 -52
  240. package/refs/vbenchmark/.gemini/workflows/gather-context.md +0 -58
  241. package/refs/vbenchmark/.gemini/workflows/init.md +0 -56
  242. package/refs/vbenchmark/.gemini/workflows/preset-help.md +0 -50
  243. package/refs/vbenchmark/.gemini/workflows/refactor.md +0 -71
  244. package/refs/vbenchmark/.gemini/workflows/research.md +0 -37
  245. package/refs/vbenchmark/.gemini/workflows/review.md +0 -38
  246. package/refs/vbenchmark/.gemini/workflows/test.md +0 -61
  247. package/refs/vbenchmark/.github/CODEOWNERS +0 -20
  248. package/refs/vbenchmark/.github/FUNDING.yml +0 -4
  249. package/refs/vbenchmark/.github/ISSUE_TEMPLATE/bug-report.yml +0 -76
  250. package/refs/vbenchmark/.github/ISSUE_TEMPLATE/new-task.yml +0 -106
  251. package/refs/vbenchmark/.github/PULL_REQUEST_TEMPLATE.md +0 -38
  252. package/refs/vbenchmark/.github/copilot-instructions.md +0 -73
  253. package/refs/vbenchmark/.github/workflows/ci.yaml +0 -33
  254. package/refs/vbenchmark/.github/workflows/vercel-auto-pr.yml +0 -478
  255. package/refs/vbenchmark/.github/workflows/vercel-deploy.yaml +0 -487
  256. package/refs/vbenchmark/.github/workflows/vercel-pr-command.yaml +0 -337
  257. package/refs/vbenchmark/.github/workflows/vercel-project-init.yaml +0 -208
  258. package/refs/vbenchmark/.opencode/agent/codebase-explorer.md +0 -224
  259. package/refs/vbenchmark/.opencode/agent/debugger.md +0 -180
  260. package/refs/vbenchmark/.opencode/agent/documenter.md +0 -166
  261. package/refs/vbenchmark/.opencode/agent/implementer.md +0 -70
  262. package/refs/vbenchmark/.opencode/agent/orchestrator.md +0 -212
  263. package/refs/vbenchmark/.opencode/agent/researcher.md +0 -80
  264. package/refs/vbenchmark/.opencode/agent/reviewer.md +0 -184
  265. package/refs/vbenchmark/.opencode/agent/tester.md +0 -170
  266. package/refs/vbenchmark/.opencode/command/commit.md +0 -29
  267. package/refs/vbenchmark/.opencode/command/debug.md +0 -59
  268. package/refs/vbenchmark/.opencode/command/document.md +0 -52
  269. package/refs/vbenchmark/.opencode/command/gather-context.md +0 -58
  270. package/refs/vbenchmark/.opencode/command/init.md +0 -56
  271. package/refs/vbenchmark/.opencode/command/preset-help.md +0 -50
  272. package/refs/vbenchmark/.opencode/command/refactor.md +0 -71
  273. package/refs/vbenchmark/.opencode/command/research.md +0 -37
  274. package/refs/vbenchmark/.opencode/command/review.md +0 -38
  275. package/refs/vbenchmark/.opencode/command/test.md +0 -61
  276. package/refs/vbenchmark/.trae/project_rules.md +0 -73
  277. package/refs/vbenchmark/.windsurf/rules/rules.md +0 -85
  278. package/refs/vbenchmark/AGENTS.md +0 -73
  279. package/refs/vbenchmark/CONTRIBUTING.md +0 -332
  280. package/refs/vbenchmark/Caddyfile +0 -3
  281. package/refs/vbenchmark/LICENSE +0 -47
  282. package/refs/vbenchmark/README.md +0 -354
  283. package/refs/vbenchmark/docker-compose.prod.yaml +0 -35
  284. package/refs/vbenchmark/docker-compose.yaml +0 -53
  285. package/refs/vbenchmark/docs/TASK_EXPANSION_PLAN.md +0 -211
  286. package/refs/vbenchmark/docs/THESIS.md +0 -441
  287. package/refs/vbenchmark/docs/categories/code-evolution.md +0 -138
  288. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/design.md +0 -111
  289. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/proposal.md +0 -15
  290. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/evaluation/spec.md +0 -105
  291. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/leaderboard/spec.md +0 -68
  292. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/task-definition/spec.md +0 -45
  293. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/task-runner/spec.md +0 -49
  294. package/refs/vbenchmark/openspec/changes/init-vibecodingbench/tasks.md +0 -413
  295. package/refs/vbenchmark/package.json +0 -51
  296. package/refs/vbenchmark/packages/cli/eslint.config.js +0 -16
  297. package/refs/vbenchmark/packages/cli/package.json +0 -35
  298. package/refs/vbenchmark/packages/cli/src/agents/index.ts +0 -655
  299. package/refs/vbenchmark/packages/cli/src/commands/eval.ts +0 -197
  300. package/refs/vbenchmark/packages/cli/src/commands/list.ts +0 -63
  301. package/refs/vbenchmark/packages/cli/src/commands/run.ts +0 -147
  302. package/refs/vbenchmark/packages/cli/src/evaluator.ts +0 -125
  303. package/refs/vbenchmark/packages/cli/src/index.ts +0 -21
  304. package/refs/vbenchmark/packages/cli/src/lib/task-variation.ts +0 -153
  305. package/refs/vbenchmark/packages/cli/src/loader.ts +0 -258
  306. package/refs/vbenchmark/packages/cli/src/reporter.ts +0 -222
  307. package/refs/vbenchmark/packages/cli/src/runtime/docker.ts +0 -385
  308. package/refs/vbenchmark/packages/cli/tsconfig.json +0 -8
  309. package/refs/vbenchmark/packages/dashboard/Dockerfile +0 -42
  310. package/refs/vbenchmark/packages/dashboard/index.html +0 -21
  311. package/refs/vbenchmark/packages/dashboard/package.json +0 -29
  312. package/refs/vbenchmark/packages/dashboard/postcss.config.js +0 -6
  313. package/refs/vbenchmark/packages/dashboard/public/favicon.svg +0 -24
  314. package/refs/vbenchmark/packages/dashboard/public/logo.png +0 -0
  315. package/refs/vbenchmark/packages/dashboard/public/logo.svg +0 -39
  316. package/refs/vbenchmark/packages/dashboard/src/App.tsx +0 -1468
  317. package/refs/vbenchmark/packages/dashboard/src/data/category-performance.json +0 -1
  318. package/refs/vbenchmark/packages/dashboard/src/data/leaderboard.json +0 -1
  319. package/refs/vbenchmark/packages/dashboard/src/data/task-results.json +0 -1
  320. package/refs/vbenchmark/packages/dashboard/src/data/tasks.json +0 -1
  321. package/refs/vbenchmark/packages/dashboard/src/index.css +0 -3
  322. package/refs/vbenchmark/packages/dashboard/src/main.tsx +0 -13
  323. package/refs/vbenchmark/packages/dashboard/src/vite-env.d.ts +0 -9
  324. package/refs/vbenchmark/packages/dashboard/tailwind.config.js +0 -11
  325. package/refs/vbenchmark/packages/dashboard/tsconfig.json +0 -21
  326. package/refs/vbenchmark/packages/dashboard/tsconfig.node.json +0 -11
  327. package/refs/vbenchmark/packages/dashboard/vercel.json +0 -6
  328. package/refs/vbenchmark/packages/dashboard/vite.config.ts +0 -28
  329. package/refs/vbenchmark/packages/evaluator/eslint.config.js +0 -16
  330. package/refs/vbenchmark/packages/evaluator/package.json +0 -24
  331. package/refs/vbenchmark/packages/evaluator/src/index.ts +0 -15
  332. package/refs/vbenchmark/packages/evaluator/src/runners/functional.ts +0 -88
  333. package/refs/vbenchmark/packages/evaluator/src/runners/quality.ts +0 -140
  334. package/refs/vbenchmark/packages/evaluator/src/runners/security.ts +0 -94
  335. package/refs/vbenchmark/packages/evaluator/src/runners/visual.ts +0 -108
  336. package/refs/vbenchmark/packages/evaluator/src/types.d.ts +0 -19
  337. package/refs/vbenchmark/packages/evaluator/tsconfig.json +0 -8
  338. package/refs/vbenchmark/packages/leaderboard/Dockerfile +0 -38
  339. package/refs/vbenchmark/packages/leaderboard/drizzle.config.ts +0 -10
  340. package/refs/vbenchmark/packages/leaderboard/eslint.config.js +0 -16
  341. package/refs/vbenchmark/packages/leaderboard/fly.toml +0 -29
  342. package/refs/vbenchmark/packages/leaderboard/package.json +0 -36
  343. package/refs/vbenchmark/packages/leaderboard/src/app.ts +0 -29
  344. package/refs/vbenchmark/packages/leaderboard/src/components/BrowserPreview.tsx +0 -190
  345. package/refs/vbenchmark/packages/leaderboard/src/components/ComparisonView.tsx +0 -205
  346. package/refs/vbenchmark/packages/leaderboard/src/components/LeaderboardTable.tsx +0 -150
  347. package/refs/vbenchmark/packages/leaderboard/src/components/LiveRunCard.tsx +0 -133
  348. package/refs/vbenchmark/packages/leaderboard/src/components/SubmissionForm.tsx +0 -406
  349. package/refs/vbenchmark/packages/leaderboard/src/components/SubmitForm.tsx +0 -293
  350. package/refs/vbenchmark/packages/leaderboard/src/components/TerminalStream.tsx +0 -111
  351. package/refs/vbenchmark/packages/leaderboard/src/config/pricing.ts +0 -206
  352. package/refs/vbenchmark/packages/leaderboard/src/db/index.ts +0 -31
  353. package/refs/vbenchmark/packages/leaderboard/src/db/schema.ts +0 -125
  354. package/refs/vbenchmark/packages/leaderboard/src/index.ts +0 -13
  355. package/refs/vbenchmark/packages/leaderboard/src/lib/websocket.ts +0 -124
  356. package/refs/vbenchmark/packages/leaderboard/src/routes/leaderboard.ts +0 -698
  357. package/refs/vbenchmark/packages/leaderboard/src/routes/live.ts +0 -175
  358. package/refs/vbenchmark/packages/leaderboard/src/routes/submissions.ts +0 -183
  359. package/refs/vbenchmark/packages/leaderboard/src/routes/tasks.ts +0 -215
  360. package/refs/vbenchmark/packages/leaderboard/tests/api.test.ts +0 -228
  361. package/refs/vbenchmark/packages/leaderboard/tsconfig.json +0 -9
  362. package/refs/vbenchmark/scripts/deploy.sh +0 -70
  363. package/refs/vbenchmark/tasks/ai-integration/advanced/context-management/PROMPT.md +0 -15
  364. package/refs/vbenchmark/tasks/ai-integration/advanced/context-management/task.yaml +0 -16
  365. package/refs/vbenchmark/tasks/ai-integration/advanced/evaluation-framework/PROMPT.md +0 -15
  366. package/refs/vbenchmark/tasks/ai-integration/advanced/evaluation-framework/task.yaml +0 -16
  367. package/refs/vbenchmark/tasks/ai-integration/advanced/guardrails-safety/PROMPT.md +0 -15
  368. package/refs/vbenchmark/tasks/ai-integration/advanced/guardrails-safety/task.yaml +0 -16
  369. package/refs/vbenchmark/tasks/ai-integration/advanced/memory-system/PROMPT.md +0 -15
  370. package/refs/vbenchmark/tasks/ai-integration/advanced/memory-system/task.yaml +0 -16
  371. package/refs/vbenchmark/tasks/ai-integration/advanced/model-routing/PROMPT.md +0 -15
  372. package/refs/vbenchmark/tasks/ai-integration/advanced/model-routing/task.yaml +0 -16
  373. package/refs/vbenchmark/tasks/ai-integration/advanced/multi-agent-system/PROMPT.md +0 -15
  374. package/refs/vbenchmark/tasks/ai-integration/advanced/multi-agent-system/task.yaml +0 -16
  375. package/refs/vbenchmark/tasks/ai-integration/advanced/prompt-optimization/PROMPT.md +0 -15
  376. package/refs/vbenchmark/tasks/ai-integration/advanced/prompt-optimization/task.yaml +0 -16
  377. package/refs/vbenchmark/tasks/ai-integration/advanced/reasoning-chain/PROMPT.md +0 -15
  378. package/refs/vbenchmark/tasks/ai-integration/advanced/reasoning-chain/task.yaml +0 -16
  379. package/refs/vbenchmark/tasks/ai-integration/advanced/streaming-pipeline/PROMPT.md +0 -15
  380. package/refs/vbenchmark/tasks/ai-integration/advanced/streaming-pipeline/task.yaml +0 -16
  381. package/refs/vbenchmark/tasks/ai-integration/advanced/tool-use-orchestration/PROMPT.md +0 -15
  382. package/refs/vbenchmark/tasks/ai-integration/advanced/tool-use-orchestration/task.yaml +0 -16
  383. package/refs/vbenchmark/tasks/ai-integration/agents/code-review-agent/PROMPT.md +0 -64
  384. package/refs/vbenchmark/tasks/ai-integration/agents/code-review-agent/task.yaml +0 -24
  385. package/refs/vbenchmark/tasks/ai-integration/agents/research-agent/PROMPT.md +0 -61
  386. package/refs/vbenchmark/tasks/ai-integration/agents/research-agent/task.yaml +0 -24
  387. package/refs/vbenchmark/tasks/ai-integration/agents/web-scraper-agent/PROMPT.md +0 -57
  388. package/refs/vbenchmark/tasks/ai-integration/agents/web-scraper-agent/task.yaml +0 -24
  389. package/refs/vbenchmark/tasks/ai-integration/embeddings/duplicate-detection/PROMPT.md +0 -50
  390. package/refs/vbenchmark/tasks/ai-integration/embeddings/duplicate-detection/task.yaml +0 -24
  391. package/refs/vbenchmark/tasks/ai-integration/embeddings/recommendation-engine/PROMPT.md +0 -51
  392. package/refs/vbenchmark/tasks/ai-integration/embeddings/recommendation-engine/task.yaml +0 -24
  393. package/refs/vbenchmark/tasks/ai-integration/embeddings/semantic-search/PROMPT.md +0 -50
  394. package/refs/vbenchmark/tasks/ai-integration/embeddings/semantic-search/task.yaml +0 -24
  395. package/refs/vbenchmark/tasks/ai-integration/fine-tuning/classification-model/PROMPT.md +0 -50
  396. package/refs/vbenchmark/tasks/ai-integration/fine-tuning/classification-model/task.yaml +0 -24
  397. package/refs/vbenchmark/tasks/ai-integration/function-calling/api-orchestrator/PROMPT.md +0 -60
  398. package/refs/vbenchmark/tasks/ai-integration/function-calling/api-orchestrator/task.yaml +0 -24
  399. package/refs/vbenchmark/tasks/ai-integration/function-calling/calendar-assistant/PROMPT.md +0 -50
  400. package/refs/vbenchmark/tasks/ai-integration/function-calling/calendar-assistant/task.yaml +0 -24
  401. package/refs/vbenchmark/tasks/ai-integration/function-calling/database-query/PROMPT.md +0 -62
  402. package/refs/vbenchmark/tasks/ai-integration/function-calling/database-query/task.yaml +0 -24
  403. package/refs/vbenchmark/tasks/ai-integration/multimodal/chart-interpreter/PROMPT.md +0 -60
  404. package/refs/vbenchmark/tasks/ai-integration/multimodal/chart-interpreter/task.yaml +0 -24
  405. package/refs/vbenchmark/tasks/ai-integration/multimodal/image-captioning/PROMPT.md +0 -49
  406. package/refs/vbenchmark/tasks/ai-integration/multimodal/image-captioning/task.yaml +0 -24
  407. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/code-assistant/PROMPT.md +0 -51
  408. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/code-assistant/task.yaml +0 -24
  409. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/doc-search/PROMPT.md +0 -51
  410. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/doc-search/task.yaml +0 -24
  411. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/PROMPT.md +0 -76
  412. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/docker-compose.yaml +0 -30
  413. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/task.yaml +0 -30
  414. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/tests/functional/qa.test.py +0 -146
  415. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/support-bot/PROMPT.md +0 -51
  416. package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/support-bot/task.yaml +0 -24
  417. package/refs/vbenchmark/tasks/ai-integration/structured-output/contract-analyzer/PROMPT.md +0 -67
  418. package/refs/vbenchmark/tasks/ai-integration/structured-output/contract-analyzer/task.yaml +0 -24
  419. package/refs/vbenchmark/tasks/ai-integration/structured-output/invoice-parser/PROMPT.md +0 -61
  420. package/refs/vbenchmark/tasks/ai-integration/structured-output/invoice-parser/task.yaml +0 -27
  421. package/refs/vbenchmark/tasks/ai-integration/structured-output/receipt-scanner/PROMPT.md +0 -65
  422. package/refs/vbenchmark/tasks/ai-integration/structured-output/receipt-scanner/task.yaml +0 -24
  423. package/refs/vbenchmark/tasks/ai-integration/structured-output/resume-parser/PROMPT.md +0 -70
  424. package/refs/vbenchmark/tasks/ai-integration/structured-output/resume-parser/task.yaml +0 -24
  425. package/refs/vbenchmark/tasks/api-integrations/advanced/api-analytics/PROMPT.md +0 -15
  426. package/refs/vbenchmark/tasks/api-integrations/advanced/api-analytics/task.yaml +0 -16
  427. package/refs/vbenchmark/tasks/api-integrations/advanced/api-gateway/PROMPT.md +0 -15
  428. package/refs/vbenchmark/tasks/api-integrations/advanced/api-gateway/task.yaml +0 -16
  429. package/refs/vbenchmark/tasks/api-integrations/advanced/api-mocking/PROMPT.md +0 -15
  430. package/refs/vbenchmark/tasks/api-integrations/advanced/api-mocking/task.yaml +0 -16
  431. package/refs/vbenchmark/tasks/api-integrations/advanced/contract-testing/PROMPT.md +0 -15
  432. package/refs/vbenchmark/tasks/api-integrations/advanced/contract-testing/task.yaml +0 -16
  433. package/refs/vbenchmark/tasks/api-integrations/advanced/graphql-federation/PROMPT.md +0 -15
  434. package/refs/vbenchmark/tasks/api-integrations/advanced/graphql-federation/task.yaml +0 -16
  435. package/refs/vbenchmark/tasks/api-integrations/advanced/grpc-gateway/PROMPT.md +0 -15
  436. package/refs/vbenchmark/tasks/api-integrations/advanced/grpc-gateway/task.yaml +0 -16
  437. package/refs/vbenchmark/tasks/api-integrations/advanced/rate-limiter/PROMPT.md +0 -15
  438. package/refs/vbenchmark/tasks/api-integrations/advanced/rate-limiter/task.yaml +0 -16
  439. package/refs/vbenchmark/tasks/api-integrations/advanced/request-validator/PROMPT.md +0 -15
  440. package/refs/vbenchmark/tasks/api-integrations/advanced/request-validator/task.yaml +0 -16
  441. package/refs/vbenchmark/tasks/api-integrations/advanced/sdk-generator/PROMPT.md +0 -15
  442. package/refs/vbenchmark/tasks/api-integrations/advanced/sdk-generator/task.yaml +0 -16
  443. package/refs/vbenchmark/tasks/api-integrations/advanced/webhook-processor/PROMPT.md +0 -15
  444. package/refs/vbenchmark/tasks/api-integrations/advanced/webhook-processor/task.yaml +0 -16
  445. package/refs/vbenchmark/tasks/api-integrations/analytics/mixpanel-events/PROMPT.md +0 -42
  446. package/refs/vbenchmark/tasks/api-integrations/analytics/mixpanel-events/task.yaml +0 -24
  447. package/refs/vbenchmark/tasks/api-integrations/analytics/segment-tracking/PROMPT.md +0 -42
  448. package/refs/vbenchmark/tasks/api-integrations/analytics/segment-tracking/task.yaml +0 -24
  449. package/refs/vbenchmark/tasks/api-integrations/auth-provider/oauth2-github/PROMPT.md +0 -42
  450. package/refs/vbenchmark/tasks/api-integrations/auth-provider/oauth2-github/task.yaml +0 -24
  451. package/refs/vbenchmark/tasks/api-integrations/auth-provider/okta-integration/PROMPT.md +0 -44
  452. package/refs/vbenchmark/tasks/api-integrations/auth-provider/okta-integration/task.yaml +0 -24
  453. package/refs/vbenchmark/tasks/api-integrations/auth-provider/saml-sso/PROMPT.md +0 -42
  454. package/refs/vbenchmark/tasks/api-integrations/auth-provider/saml-sso/task.yaml +0 -24
  455. package/refs/vbenchmark/tasks/api-integrations/communication/discord-webhook/PROMPT.md +0 -44
  456. package/refs/vbenchmark/tasks/api-integrations/communication/discord-webhook/task.yaml +0 -24
  457. package/refs/vbenchmark/tasks/api-integrations/communication/slack-bot/PROMPT.md +0 -42
  458. package/refs/vbenchmark/tasks/api-integrations/communication/slack-bot/task.yaml +0 -24
  459. package/refs/vbenchmark/tasks/api-integrations/communication/twilio-sms/PROMPT.md +0 -42
  460. package/refs/vbenchmark/tasks/api-integrations/communication/twilio-sms/task.yaml +0 -24
  461. package/refs/vbenchmark/tasks/api-integrations/email/transactional/PROMPT.md +0 -82
  462. package/refs/vbenchmark/tasks/api-integrations/email/transactional/task.yaml +0 -27
  463. package/refs/vbenchmark/tasks/api-integrations/maps/google-maps-geocoding/PROMPT.md +0 -41
  464. package/refs/vbenchmark/tasks/api-integrations/maps/google-maps-geocoding/task.yaml +0 -24
  465. package/refs/vbenchmark/tasks/api-integrations/maps/mapbox-directions/PROMPT.md +0 -41
  466. package/refs/vbenchmark/tasks/api-integrations/maps/mapbox-directions/task.yaml +0 -24
  467. package/refs/vbenchmark/tasks/api-integrations/payment/crypto-payments/PROMPT.md +0 -43
  468. package/refs/vbenchmark/tasks/api-integrations/payment/crypto-payments/task.yaml +0 -24
  469. package/refs/vbenchmark/tasks/api-integrations/payment/paypal-integration/PROMPT.md +0 -41
  470. package/refs/vbenchmark/tasks/api-integrations/payment/paypal-integration/task.yaml +0 -24
  471. package/refs/vbenchmark/tasks/api-integrations/social/twitter-api/PROMPT.md +0 -41
  472. package/refs/vbenchmark/tasks/api-integrations/social/twitter-api/task.yaml +0 -24
  473. package/refs/vbenchmark/tasks/api-integrations/storage/cloudinary-upload/PROMPT.md +0 -43
  474. package/refs/vbenchmark/tasks/api-integrations/storage/cloudinary-upload/task.yaml +0 -24
  475. package/refs/vbenchmark/tasks/api-integrations/storage/gcs-streaming/PROMPT.md +0 -43
  476. package/refs/vbenchmark/tasks/api-integrations/storage/gcs-streaming/task.yaml +0 -24
  477. package/refs/vbenchmark/tasks/api-integrations/storage/s3-presigned-urls/PROMPT.md +0 -41
  478. package/refs/vbenchmark/tasks/api-integrations/storage/s3-presigned-urls/task.yaml +0 -24
  479. package/refs/vbenchmark/tasks/api-integrations/stripe/checkout-session/PROMPT.md +0 -41
  480. package/refs/vbenchmark/tasks/api-integrations/stripe/checkout-session/task.yaml +0 -24
  481. package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/PROMPT.md +0 -60
  482. package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/docker-compose.yaml +0 -38
  483. package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/task.yaml +0 -31
  484. package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/tests/webhook.test.ts +0 -193
  485. package/refs/vbenchmark/tasks/api-integrations/stripe/subscription-portal/PROMPT.md +0 -41
  486. package/refs/vbenchmark/tasks/api-integrations/stripe/subscription-portal/task.yaml +0 -24
  487. package/refs/vbenchmark/tasks/code-evolution/advanced/api-deprecation/PROMPT.md +0 -15
  488. package/refs/vbenchmark/tasks/code-evolution/advanced/api-deprecation/task.yaml +0 -16
  489. package/refs/vbenchmark/tasks/code-evolution/advanced/ast-refactoring/PROMPT.md +0 -15
  490. package/refs/vbenchmark/tasks/code-evolution/advanced/ast-refactoring/task.yaml +0 -16
  491. package/refs/vbenchmark/tasks/code-evolution/advanced/concurrency-fix/PROMPT.md +0 -15
  492. package/refs/vbenchmark/tasks/code-evolution/advanced/concurrency-fix/task.yaml +0 -16
  493. package/refs/vbenchmark/tasks/code-evolution/advanced/database-schema-migration/PROMPT.md +0 -15
  494. package/refs/vbenchmark/tasks/code-evolution/advanced/database-schema-migration/task.yaml +0 -16
  495. package/refs/vbenchmark/tasks/code-evolution/advanced/dead-code-elimination/PROMPT.md +0 -15
  496. package/refs/vbenchmark/tasks/code-evolution/advanced/dead-code-elimination/task.yaml +0 -16
  497. package/refs/vbenchmark/tasks/code-evolution/advanced/dependency-upgrade/PROMPT.md +0 -15
  498. package/refs/vbenchmark/tasks/code-evolution/advanced/dependency-upgrade/task.yaml +0 -16
  499. package/refs/vbenchmark/tasks/code-evolution/advanced/memory-optimization/PROMPT.md +0 -15
  500. package/refs/vbenchmark/tasks/code-evolution/advanced/memory-optimization/task.yaml +0 -16
  501. package/refs/vbenchmark/tasks/code-evolution/advanced/monorepo-extraction/PROMPT.md +0 -15
  502. package/refs/vbenchmark/tasks/code-evolution/advanced/monorepo-extraction/task.yaml +0 -16
  503. package/refs/vbenchmark/tasks/code-evolution/advanced/performance-profiling/PROMPT.md +0 -15
  504. package/refs/vbenchmark/tasks/code-evolution/advanced/performance-profiling/task.yaml +0 -16
  505. package/refs/vbenchmark/tasks/code-evolution/advanced/type-migration/PROMPT.md +0 -15
  506. package/refs/vbenchmark/tasks/code-evolution/advanced/type-migration/task.yaml +0 -16
  507. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/callback-to-async/PROMPT.md +0 -47
  508. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/callback-to-async/task.yaml +0 -24
  509. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/PROMPT.md +0 -49
  510. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/base-code/src/app.ts +0 -22
  511. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/task.yaml +0 -37
  512. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/tests/api.test.ts +0 -70
  513. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/flask-to-fastapi/PROMPT.md +0 -46
  514. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/flask-to-fastapi/task.yaml +0 -24
  515. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/java-to-kotlin/PROMPT.md +0 -45
  516. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/java-to-kotlin/task.yaml +0 -24
  517. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/jquery-to-react/PROMPT.md +0 -47
  518. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/jquery-to-react/task.yaml +0 -24
  519. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/rest-to-grpc/PROMPT.md +0 -47
  520. package/refs/vbenchmark/tasks/code-evolution/legacy-migration/rest-to-grpc/task.yaml +0 -24
  521. package/refs/vbenchmark/tasks/code-evolution/performance/async-refactor/PROMPT.md +0 -47
  522. package/refs/vbenchmark/tasks/code-evolution/performance/async-refactor/task.yaml +0 -24
  523. package/refs/vbenchmark/tasks/code-evolution/performance/memory-leak-fix/PROMPT.md +0 -47
  524. package/refs/vbenchmark/tasks/code-evolution/performance/memory-leak-fix/task.yaml +0 -24
  525. package/refs/vbenchmark/tasks/code-evolution/performance/query-optimization/PROMPT.md +0 -49
  526. package/refs/vbenchmark/tasks/code-evolution/performance/query-optimization/task.yaml +0 -24
  527. package/refs/vbenchmark/tasks/code-evolution/refactoring/class-to-hooks/PROMPT.md +0 -96
  528. package/refs/vbenchmark/tasks/code-evolution/refactoring/class-to-hooks/task.yaml +0 -27
  529. package/refs/vbenchmark/tasks/code-evolution/refactoring/dependency-injection/PROMPT.md +0 -47
  530. package/refs/vbenchmark/tasks/code-evolution/refactoring/dependency-injection/task.yaml +0 -24
  531. package/refs/vbenchmark/tasks/code-evolution/refactoring/error-handling/PROMPT.md +0 -48
  532. package/refs/vbenchmark/tasks/code-evolution/refactoring/error-handling/task.yaml +0 -24
  533. package/refs/vbenchmark/tasks/code-evolution/refactoring/monolith-to-modules/PROMPT.md +0 -50
  534. package/refs/vbenchmark/tasks/code-evolution/refactoring/monolith-to-modules/task.yaml +0 -24
  535. package/refs/vbenchmark/tasks/code-evolution/refactoring/orm-migration/PROMPT.md +0 -47
  536. package/refs/vbenchmark/tasks/code-evolution/refactoring/orm-migration/task.yaml +0 -24
  537. package/refs/vbenchmark/tasks/code-evolution/security/secrets-rotation/PROMPT.md +0 -49
  538. package/refs/vbenchmark/tasks/code-evolution/security/secrets-rotation/task.yaml +0 -24
  539. package/refs/vbenchmark/tasks/code-evolution/security/sql-injection-fix/PROMPT.md +0 -50
  540. package/refs/vbenchmark/tasks/code-evolution/security/sql-injection-fix/task.yaml +0 -24
  541. package/refs/vbenchmark/tasks/code-evolution/security/xss-prevention/PROMPT.md +0 -47
  542. package/refs/vbenchmark/tasks/code-evolution/security/xss-prevention/task.yaml +0 -24
  543. package/refs/vbenchmark/tasks/code-evolution/testing/add-unit-tests/PROMPT.md +0 -48
  544. package/refs/vbenchmark/tasks/code-evolution/testing/add-unit-tests/task.yaml +0 -24
  545. package/refs/vbenchmark/tasks/code-evolution/testing/e2e-playwright/PROMPT.md +0 -50
  546. package/refs/vbenchmark/tasks/code-evolution/testing/e2e-playwright/task.yaml +0 -24
  547. package/refs/vbenchmark/tasks/code-evolution/testing/pytest-fixtures/PROMPT.md +0 -47
  548. package/refs/vbenchmark/tasks/code-evolution/testing/pytest-fixtures/task.yaml +0 -24
  549. package/refs/vbenchmark/tasks/frontend/accessibility/keyboard-shortcuts/PROMPT.md +0 -44
  550. package/refs/vbenchmark/tasks/frontend/accessibility/keyboard-shortcuts/task.yaml +0 -24
  551. package/refs/vbenchmark/tasks/frontend/accessibility/screen-reader-nav/PROMPT.md +0 -44
  552. package/refs/vbenchmark/tasks/frontend/accessibility/screen-reader-nav/task.yaml +0 -24
  553. package/refs/vbenchmark/tasks/frontend/advanced/canvas-editor/PROMPT.md +0 -15
  554. package/refs/vbenchmark/tasks/frontend/advanced/canvas-editor/task.yaml +0 -16
  555. package/refs/vbenchmark/tasks/frontend/advanced/micro-frontend/PROMPT.md +0 -15
  556. package/refs/vbenchmark/tasks/frontend/advanced/micro-frontend/task.yaml +0 -16
  557. package/refs/vbenchmark/tasks/frontend/advanced/offline-first/PROMPT.md +0 -15
  558. package/refs/vbenchmark/tasks/frontend/advanced/offline-first/task.yaml +0 -16
  559. package/refs/vbenchmark/tasks/frontend/advanced/realtime-collab/PROMPT.md +0 -15
  560. package/refs/vbenchmark/tasks/frontend/advanced/realtime-collab/task.yaml +0 -16
  561. package/refs/vbenchmark/tasks/frontend/advanced/service-worker/PROMPT.md +0 -15
  562. package/refs/vbenchmark/tasks/frontend/advanced/service-worker/task.yaml +0 -16
  563. package/refs/vbenchmark/tasks/frontend/advanced/state-machine/PROMPT.md +0 -15
  564. package/refs/vbenchmark/tasks/frontend/advanced/state-machine/task.yaml +0 -16
  565. package/refs/vbenchmark/tasks/frontend/advanced/virtual-list/PROMPT.md +0 -15
  566. package/refs/vbenchmark/tasks/frontend/advanced/virtual-list/task.yaml +0 -16
  567. package/refs/vbenchmark/tasks/frontend/advanced/wasm-integration/PROMPT.md +0 -15
  568. package/refs/vbenchmark/tasks/frontend/advanced/wasm-integration/task.yaml +0 -16
  569. package/refs/vbenchmark/tasks/frontend/advanced/web-worker/PROMPT.md +0 -15
  570. package/refs/vbenchmark/tasks/frontend/advanced/web-worker/task.yaml +0 -16
  571. package/refs/vbenchmark/tasks/frontend/advanced/webgl-visualization/PROMPT.md +0 -15
  572. package/refs/vbenchmark/tasks/frontend/advanced/webgl-visualization/task.yaml +0 -16
  573. package/refs/vbenchmark/tasks/frontend/animation/page-transitions/PROMPT.md +0 -44
  574. package/refs/vbenchmark/tasks/frontend/animation/page-transitions/task.yaml +0 -24
  575. package/refs/vbenchmark/tasks/frontend/components/data-grid/PROMPT.md +0 -59
  576. package/refs/vbenchmark/tasks/frontend/components/data-grid/task.yaml +0 -24
  577. package/refs/vbenchmark/tasks/frontend/components/date-range-picker/PROMPT.md +0 -57
  578. package/refs/vbenchmark/tasks/frontend/components/date-range-picker/task.yaml +0 -24
  579. package/refs/vbenchmark/tasks/frontend/components/file-uploader/PROMPT.md +0 -55
  580. package/refs/vbenchmark/tasks/frontend/components/file-uploader/task.yaml +0 -24
  581. package/refs/vbenchmark/tasks/frontend/components/form-builder/PROMPT.md +0 -96
  582. package/refs/vbenchmark/tasks/frontend/components/form-builder/task.yaml +0 -28
  583. package/refs/vbenchmark/tasks/frontend/components/rich-text-editor/PROMPT.md +0 -45
  584. package/refs/vbenchmark/tasks/frontend/components/rich-text-editor/task.yaml +0 -24
  585. package/refs/vbenchmark/tasks/frontend/figma-to-code/dashboard-layout/PROMPT.md +0 -50
  586. package/refs/vbenchmark/tasks/frontend/figma-to-code/dashboard-layout/task.yaml +0 -25
  587. package/refs/vbenchmark/tasks/frontend/figma-to-code/landing-page/PROMPT.md +0 -49
  588. package/refs/vbenchmark/tasks/frontend/figma-to-code/landing-page/task.yaml +0 -25
  589. package/refs/vbenchmark/tasks/frontend/figma-to-code/mobile-app-screen/PROMPT.md +0 -51
  590. package/refs/vbenchmark/tasks/frontend/figma-to-code/mobile-app-screen/task.yaml +0 -24
  591. package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/PROMPT.md +0 -93
  592. package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/docker-compose.yaml +0 -23
  593. package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/task.yaml +0 -30
  594. package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/tests/visual/diff.test.ts +0 -107
  595. package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/tests/visual/interaction.test.ts +0 -88
  596. package/refs/vbenchmark/tasks/frontend/performance/image-lazy-load/PROMPT.md +0 -43
  597. package/refs/vbenchmark/tasks/frontend/performance/image-lazy-load/task.yaml +0 -24
  598. package/refs/vbenchmark/tasks/frontend/performance/infinite-scroll/PROMPT.md +0 -44
  599. package/refs/vbenchmark/tasks/frontend/performance/infinite-scroll/task.yaml +0 -24
  600. package/refs/vbenchmark/tasks/frontend/state-management/collaborative-editor/PROMPT.md +0 -44
  601. package/refs/vbenchmark/tasks/frontend/state-management/collaborative-editor/task.yaml +0 -24
  602. package/refs/vbenchmark/tasks/frontend/state-management/shopping-cart/PROMPT.md +0 -53
  603. package/refs/vbenchmark/tasks/frontend/state-management/shopping-cart/task.yaml +0 -24
  604. package/refs/vbenchmark/tasks/frontend/visualization/chart-dashboard/PROMPT.md +0 -83
  605. package/refs/vbenchmark/tasks/frontend/visualization/chart-dashboard/task.yaml +0 -28
  606. package/refs/vbenchmark/tasks/frontend/visualization/gantt-chart/PROMPT.md +0 -57
  607. package/refs/vbenchmark/tasks/frontend/visualization/gantt-chart/task.yaml +0 -24
  608. package/refs/vbenchmark/tasks/frontend/visualization/map-dashboard/PROMPT.md +0 -44
  609. package/refs/vbenchmark/tasks/frontend/visualization/map-dashboard/task.yaml +0 -24
  610. package/refs/vbenchmark/tasks/frontend/visualization/realtime-charts/PROMPT.md +0 -43
  611. package/refs/vbenchmark/tasks/frontend/visualization/realtime-charts/task.yaml +0 -24
  612. package/refs/vbenchmark/tasks/glue-code/advanced/blue-green-deploy/PROMPT.md +0 -15
  613. package/refs/vbenchmark/tasks/glue-code/advanced/blue-green-deploy/task.yaml +0 -16
  614. package/refs/vbenchmark/tasks/glue-code/advanced/canary-release/PROMPT.md +0 -15
  615. package/refs/vbenchmark/tasks/glue-code/advanced/canary-release/task.yaml +0 -16
  616. package/refs/vbenchmark/tasks/glue-code/advanced/change-data-capture/PROMPT.md +0 -15
  617. package/refs/vbenchmark/tasks/glue-code/advanced/change-data-capture/task.yaml +0 -16
  618. package/refs/vbenchmark/tasks/glue-code/advanced/config-management/PROMPT.md +0 -15
  619. package/refs/vbenchmark/tasks/glue-code/advanced/config-management/task.yaml +0 -16
  620. package/refs/vbenchmark/tasks/glue-code/advanced/data-pipeline/PROMPT.md +0 -15
  621. package/refs/vbenchmark/tasks/glue-code/advanced/data-pipeline/task.yaml +0 -16
  622. package/refs/vbenchmark/tasks/glue-code/advanced/distributed-tracing/PROMPT.md +0 -15
  623. package/refs/vbenchmark/tasks/glue-code/advanced/distributed-tracing/task.yaml +0 -16
  624. package/refs/vbenchmark/tasks/glue-code/advanced/log-aggregation/PROMPT.md +0 -15
  625. package/refs/vbenchmark/tasks/glue-code/advanced/log-aggregation/task.yaml +0 -16
  626. package/refs/vbenchmark/tasks/glue-code/advanced/schema-registry/PROMPT.md +0 -15
  627. package/refs/vbenchmark/tasks/glue-code/advanced/schema-registry/task.yaml +0 -16
  628. package/refs/vbenchmark/tasks/glue-code/advanced/secret-rotation/PROMPT.md +0 -15
  629. package/refs/vbenchmark/tasks/glue-code/advanced/secret-rotation/task.yaml +0 -16
  630. package/refs/vbenchmark/tasks/glue-code/advanced/stream-processing/PROMPT.md +0 -15
  631. package/refs/vbenchmark/tasks/glue-code/advanced/stream-processing/task.yaml +0 -16
  632. package/refs/vbenchmark/tasks/glue-code/api-sync/rest-to-graphql/PROMPT.md +0 -66
  633. package/refs/vbenchmark/tasks/glue-code/api-sync/rest-to-graphql/task.yaml +0 -27
  634. package/refs/vbenchmark/tasks/glue-code/caching/redis-cache/PROMPT.md +0 -82
  635. package/refs/vbenchmark/tasks/glue-code/caching/redis-cache/task.yaml +0 -27
  636. package/refs/vbenchmark/tasks/glue-code/data-transform/avro-schema-evolution/PROMPT.md +0 -51
  637. package/refs/vbenchmark/tasks/glue-code/data-transform/avro-schema-evolution/task.yaml +0 -24
  638. package/refs/vbenchmark/tasks/glue-code/data-transform/csv-normalizer/PROMPT.md +0 -49
  639. package/refs/vbenchmark/tasks/glue-code/data-transform/csv-normalizer/task.yaml +0 -24
  640. package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/PROMPT.md +0 -67
  641. package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/task.yaml +0 -28
  642. package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/tests/transform.test.py +0 -137
  643. package/refs/vbenchmark/tasks/glue-code/data-transform/json-to-xml/PROMPT.md +0 -45
  644. package/refs/vbenchmark/tasks/glue-code/data-transform/json-to-xml/task.yaml +0 -24
  645. package/refs/vbenchmark/tasks/glue-code/data-transform/protobuf-converter/PROMPT.md +0 -44
  646. package/refs/vbenchmark/tasks/glue-code/data-transform/protobuf-converter/task.yaml +0 -24
  647. package/refs/vbenchmark/tasks/glue-code/etl/cdc-pipeline/PROMPT.md +0 -52
  648. package/refs/vbenchmark/tasks/glue-code/etl/cdc-pipeline/task.yaml +0 -27
  649. package/refs/vbenchmark/tasks/glue-code/etl/database-sync/PROMPT.md +0 -51
  650. package/refs/vbenchmark/tasks/glue-code/etl/database-sync/task.yaml +0 -24
  651. package/refs/vbenchmark/tasks/glue-code/etl/s3-to-warehouse/PROMPT.md +0 -50
  652. package/refs/vbenchmark/tasks/glue-code/etl/s3-to-warehouse/task.yaml +0 -24
  653. package/refs/vbenchmark/tasks/glue-code/file-processing/image-resizer/PROMPT.md +0 -52
  654. package/refs/vbenchmark/tasks/glue-code/file-processing/image-resizer/task.yaml +0 -24
  655. package/refs/vbenchmark/tasks/glue-code/file-processing/pdf-merger/PROMPT.md +0 -50
  656. package/refs/vbenchmark/tasks/glue-code/file-processing/pdf-merger/task.yaml +0 -24
  657. package/refs/vbenchmark/tasks/glue-code/file-processing/video-transcoder/PROMPT.md +0 -50
  658. package/refs/vbenchmark/tasks/glue-code/file-processing/video-transcoder/task.yaml +0 -27
  659. package/refs/vbenchmark/tasks/glue-code/migration/data-backfill/PROMPT.md +0 -50
  660. package/refs/vbenchmark/tasks/glue-code/migration/data-backfill/task.yaml +0 -24
  661. package/refs/vbenchmark/tasks/glue-code/migration/database-versioning/PROMPT.md +0 -50
  662. package/refs/vbenchmark/tasks/glue-code/migration/database-versioning/task.yaml +0 -24
  663. package/refs/vbenchmark/tasks/glue-code/queue/kafka-producer/PROMPT.md +0 -49
  664. package/refs/vbenchmark/tasks/glue-code/queue/kafka-producer/task.yaml +0 -27
  665. package/refs/vbenchmark/tasks/glue-code/queue/rabbitmq-consumer/PROMPT.md +0 -50
  666. package/refs/vbenchmark/tasks/glue-code/queue/rabbitmq-consumer/task.yaml +0 -27
  667. package/refs/vbenchmark/tasks/glue-code/queue/sqs-batch-processor/PROMPT.md +0 -47
  668. package/refs/vbenchmark/tasks/glue-code/queue/sqs-batch-processor/task.yaml +0 -24
  669. package/refs/vbenchmark/tasks/glue-code/scheduler/cron-job-manager/PROMPT.md +0 -52
  670. package/refs/vbenchmark/tasks/glue-code/scheduler/cron-job-manager/task.yaml +0 -27
  671. package/refs/vbenchmark/tasks/glue-code/scheduler/delayed-tasks/PROMPT.md +0 -51
  672. package/refs/vbenchmark/tasks/glue-code/scheduler/delayed-tasks/task.yaml +0 -27
  673. package/refs/vbenchmark/tasks/saas-core/advanced/api-versioning/PROMPT.md +0 -15
  674. package/refs/vbenchmark/tasks/saas-core/advanced/api-versioning/task.yaml +0 -16
  675. package/refs/vbenchmark/tasks/saas-core/advanced/circuit-breaker/PROMPT.md +0 -13
  676. package/refs/vbenchmark/tasks/saas-core/advanced/circuit-breaker/task.yaml +0 -16
  677. package/refs/vbenchmark/tasks/saas-core/advanced/compliance-gdpr/PROMPT.md +0 -15
  678. package/refs/vbenchmark/tasks/saas-core/advanced/compliance-gdpr/task.yaml +0 -16
  679. package/refs/vbenchmark/tasks/saas-core/advanced/cqrs-pattern/PROMPT.md +0 -13
  680. package/refs/vbenchmark/tasks/saas-core/advanced/cqrs-pattern/task.yaml +0 -16
  681. package/refs/vbenchmark/tasks/saas-core/advanced/data-encryption/PROMPT.md +0 -15
  682. package/refs/vbenchmark/tasks/saas-core/advanced/data-encryption/task.yaml +0 -16
  683. package/refs/vbenchmark/tasks/saas-core/advanced/distributed-locking/PROMPT.md +0 -46
  684. package/refs/vbenchmark/tasks/saas-core/advanced/distributed-locking/task.yaml +0 -24
  685. package/refs/vbenchmark/tasks/saas-core/advanced/event-sourcing/PROMPT.md +0 -23
  686. package/refs/vbenchmark/tasks/saas-core/advanced/event-sourcing/task.yaml +0 -16
  687. package/refs/vbenchmark/tasks/saas-core/advanced/feature-flags-ab/PROMPT.md +0 -15
  688. package/refs/vbenchmark/tasks/saas-core/advanced/feature-flags-ab/task.yaml +0 -16
  689. package/refs/vbenchmark/tasks/saas-core/advanced/saga-orchestration/PROMPT.md +0 -13
  690. package/refs/vbenchmark/tasks/saas-core/advanced/saga-orchestration/task.yaml +0 -16
  691. package/refs/vbenchmark/tasks/saas-core/advanced/webhook-delivery/PROMPT.md +0 -15
  692. package/refs/vbenchmark/tasks/saas-core/advanced/webhook-delivery/task.yaml +0 -16
  693. package/refs/vbenchmark/tasks/saas-core/audit/activity-logging/PROMPT.md +0 -50
  694. package/refs/vbenchmark/tasks/saas-core/audit/activity-logging/task.yaml +0 -27
  695. package/refs/vbenchmark/tasks/saas-core/auth/jwt-refresh-tokens/PROMPT.md +0 -50
  696. package/refs/vbenchmark/tasks/saas-core/auth/jwt-refresh-tokens/task.yaml +0 -27
  697. package/refs/vbenchmark/tasks/saas-core/auth/magic-link-email/PROMPT.md +0 -53
  698. package/refs/vbenchmark/tasks/saas-core/auth/magic-link-email/task.yaml +0 -27
  699. package/refs/vbenchmark/tasks/saas-core/auth/mfa-totp/PROMPT.md +0 -79
  700. package/refs/vbenchmark/tasks/saas-core/auth/mfa-totp/task.yaml +0 -27
  701. package/refs/vbenchmark/tasks/saas-core/auth/rbac-permissions/PROMPT.md +0 -51
  702. package/refs/vbenchmark/tasks/saas-core/auth/rbac-permissions/task.yaml +0 -27
  703. package/refs/vbenchmark/tasks/saas-core/auth/session-management/PROMPT.md +0 -52
  704. package/refs/vbenchmark/tasks/saas-core/auth/session-management/task.yaml +0 -27
  705. package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/PROMPT.md +0 -45
  706. package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/docker-compose.yaml +0 -47
  707. package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/task.yaml +0 -32
  708. package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/tests/auth.test.ts +0 -59
  709. package/refs/vbenchmark/tasks/saas-core/billing/invoice-generation/PROMPT.md +0 -53
  710. package/refs/vbenchmark/tasks/saas-core/billing/invoice-generation/task.yaml +0 -27
  711. package/refs/vbenchmark/tasks/saas-core/billing/stripe-subscriptions/PROMPT.md +0 -51
  712. package/refs/vbenchmark/tasks/saas-core/billing/stripe-subscriptions/task.yaml +0 -27
  713. package/refs/vbenchmark/tasks/saas-core/billing/usage-metering/PROMPT.md +0 -52
  714. package/refs/vbenchmark/tasks/saas-core/billing/usage-metering/task.yaml +0 -27
  715. package/refs/vbenchmark/tasks/saas-core/crud/dashboard-table/PROMPT.md +0 -48
  716. package/refs/vbenchmark/tasks/saas-core/crud/dashboard-table/task.yaml +0 -28
  717. package/refs/vbenchmark/tasks/saas-core/multi-tenant/org-isolation/PROMPT.md +0 -50
  718. package/refs/vbenchmark/tasks/saas-core/multi-tenant/org-isolation/task.yaml +0 -27
  719. package/refs/vbenchmark/tasks/saas-core/multi-tenant/subdomain-routing/PROMPT.md +0 -50
  720. package/refs/vbenchmark/tasks/saas-core/multi-tenant/subdomain-routing/task.yaml +0 -27
  721. package/refs/vbenchmark/tasks/saas-core/notifications/email-queue/PROMPT.md +0 -53
  722. package/refs/vbenchmark/tasks/saas-core/notifications/email-queue/task.yaml +0 -27
  723. package/refs/vbenchmark/tasks/saas-core/notifications/in-app-alerts/PROMPT.md +0 -51
  724. package/refs/vbenchmark/tasks/saas-core/notifications/in-app-alerts/task.yaml +0 -27
  725. package/refs/vbenchmark/tasks/saas-core/notifications/push-notifications/PROMPT.md +0 -51
  726. package/refs/vbenchmark/tasks/saas-core/notifications/push-notifications/task.yaml +0 -27
  727. package/refs/vbenchmark/tasks/saas-core/realtime/websocket-chat/PROMPT.md +0 -80
  728. package/refs/vbenchmark/tasks/saas-core/realtime/websocket-chat/task.yaml +0 -27
  729. package/refs/vbenchmark/tasks/saas-core/search/full-text-search/PROMPT.md +0 -51
  730. package/refs/vbenchmark/tasks/saas-core/search/full-text-search/task.yaml +0 -27
  731. package/refs/vbenchmark/tasks/saas-core/security/rate-limiter/PROMPT.md +0 -99
  732. package/refs/vbenchmark/tasks/saas-core/security/rate-limiter/task.yaml +0 -27
  733. package/refs/vbenchmark/tasks/saas-core/settings/user-preferences/PROMPT.md +0 -78
  734. package/refs/vbenchmark/tasks/saas-core/settings/user-preferences/task.yaml +0 -27
  735. package/refs/vbenchmark/templates/fastapi-postgres/docker-compose.yaml +0 -36
  736. package/refs/vbenchmark/templates/fastapi-postgres/pyproject.toml +0 -34
  737. package/refs/vbenchmark/templates/fastapi-postgres/src/__init__.py +0 -0
  738. package/refs/vbenchmark/templates/fastapi-postgres/src/config.py +0 -12
  739. package/refs/vbenchmark/templates/fastapi-postgres/src/database.py +0 -15
  740. package/refs/vbenchmark/templates/fastapi-postgres/src/main.py +0 -51
  741. package/refs/vbenchmark/templates/fastapi-postgres/src/models.py +0 -12
  742. package/refs/vbenchmark/templates/fastapi-postgres/src/schemas.py +0 -20
  743. package/refs/vbenchmark/templates/go-fiber/docker-compose.yaml +0 -34
  744. package/refs/vbenchmark/templates/go-fiber/go.mod +0 -33
  745. package/refs/vbenchmark/templates/go-fiber/go.sum +0 -68
  746. package/refs/vbenchmark/templates/go-fiber/main.go +0 -98
  747. package/refs/vbenchmark/templates/nextjs-supabase/.env.example +0 -3
  748. package/refs/vbenchmark/templates/nextjs-supabase/docker-compose.yaml +0 -68
  749. package/refs/vbenchmark/templates/nextjs-supabase/src/app/globals.css +0 -13
  750. package/refs/vbenchmark/templates/nextjs-supabase/src/app/layout.tsx +0 -19
  751. package/refs/vbenchmark/templates/nextjs-supabase/src/app/page.tsx +0 -38
  752. package/refs/vbenchmark/templates/nextjs-supabase/src/lib/supabase/client.ts +0 -8
  753. package/refs/vbenchmark/templates/nextjs-supabase/src/lib/supabase/server.ts +0 -32
  754. package/refs/vbenchmark/templates/rust-axum/Cargo.lock +0 -2371
  755. package/refs/vbenchmark/templates/rust-axum/Cargo.toml +0 -16
  756. package/refs/vbenchmark/templates/rust-axum/docker-compose.yaml +0 -34
  757. package/refs/vbenchmark/templates/rust-axum/migrations/20240101000000_init.sql +0 -20
  758. package/refs/vbenchmark/templates/rust-axum/src/main.rs +0 -121
  759. package/refs/vbenchmark/tsconfig.base.json +0 -18
  760. package/refs/vbenchmark/turbo.json +0 -23
  761. package/refs/vbenchmark/vercel.json +0 -10
@@ -1,111 +0,0 @@
1
- # Design: VibeCodingBench Architecture
2
-
3
- ## Context
4
- We are building a comprehensive benchmark for coding agents (Claude Code, Gemini, Codex, DeepSeek, etc.) that measures performance on real-world developer tasks. It must support both local execution and a hosted leaderboard.
5
-
6
- ## Goals
7
- - Reproducible evaluation across different agents
8
- - Fair comparison with isolated Docker execution
9
- - Multi-dimensional scoring (not just pass/fail)
10
- - Easy task contribution workflow
11
- - Polyglot support: TypeScript, Python, Go, Rust, Java
12
-
13
- ## Non-Goals
14
- - Real-time collaboration features
15
- - IDE integrations (agents run headless)
16
- - Training data generation
17
-
18
- ## Decisions
19
-
20
- ### 1. Monorepo Structure
21
- ```
22
- vibecodingbench/
23
- ├── packages/
24
- │   ├── cli/              # Task runner CLI
25
- │   ├── evaluator/        # Scoring engine
26
- │   └── leaderboard/      # Web service
27
- ├── tasks/
28
- │   ├── saas-core/        # 30% weight
29
- │   ├── glue-code/        # 20% weight
30
- │   ├── ai-integration/   # 20% weight
31
- │   ├── frontend/         # 15% weight
32
- │   └── api-integrations/ # 15% weight
33
- ├── templates/            # Starter codebases
34
- │   ├── nextjs-supabase/
35
- │   ├── fastapi-postgres/
36
- │   ├── go-fiber/
37
- │   └── rust-axum/
38
- └── docker/               # Base images
39
- ```
40
- **Rationale:** Single repo simplifies versioning, CI, and contributions.
41
-
42
- ### 2. Task Definition Format
43
- Each task is a directory:
44
- ```
45
- tasks/saas-core/auth/supabase-oauth/
46
- ├── task.yaml              # Metadata, prompt, constraints
47
- ├── docker-compose.yaml    # Services (DB, mock APIs)
48
- ├── template/              # Starter code (optional)
49
- ├── tests/                 # Evaluation tests
50
- │   ├── functional/        # Must pass
51
- │   ├── security/          # OWASP checks
52
- │   └── visual/            # Screenshot diff (frontend)
53
- └── golden/                # Reference implementation
54
- ```
55
- **Rationale:** Self-contained, versionable, easy to add.
56
-
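The task layout above could be checked mechanically before a run. The following is a minimal sketch, not the project's own loader: `layout_report` and `is_loadable` are hypothetical helper names, and treating `task.yaml` as the only mandatory entry is an assumption (the source marks only `template/` as optional).

```python
from pathlib import Path

# Entries named in the task layout. Treating only task.yaml as mandatory
# is an assumption; the source marks only template/ as optional.
DOCUMENTED_ENTRIES = (
    "task.yaml",
    "docker-compose.yaml",
    "template",
    "tests/functional",
    "tests/security",
    "tests/visual",
    "golden",
)


def layout_report(task_dir: Path) -> dict[str, bool]:
    """Map each documented entry to whether it exists under task_dir."""
    return {rel: (task_dir / rel).exists() for rel in DOCUMENTED_ENTRIES}


def is_loadable(task_dir: Path) -> bool:
    """Hypothetical minimum bar: the directory exists and has a task.yaml."""
    return task_dir.is_dir() and layout_report(task_dir)["task.yaml"]
```

A report of this kind can also serve contributors, since it shows at a glance which optional parts (security tests, golden reference) a new task still lacks.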
57
- ### 3. Execution Model
58
- ```
59
- ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
60
- │     CLI     │────▶│   Task Env   │────▶│  Evaluator  │
61
- │   (host)    │     │   (Docker)   │     │  (Docker)   │
62
- └─────────────┘     └──────────────┘     └─────────────┘
63
-        │                    │                    │
64
-        │ mount workspace    │ run agent          │ run tests
65
-        │ inject prompt      │ capture output     │ compute scores
66
-        └────────────────────┴────────────────────┘
67
- ```
68
- - Agent runs inside container with network access (for package installs)
69
- - Evaluation runs in separate container (no agent access)
70
- - Time/token limits enforced by CLI
71
-
72
- ### 4. Scoring Dimensions
73
- | Dimension | Weight | Method |
74
- |-----------|--------|--------|
75
- | Functional | 40% | Test pass rate |
76
- | Code Quality | 20% | ESLint/Ruff + complexity metrics |
77
- | Security | 20% | Semgrep OWASP rules |
78
- | Efficiency | 20% | Tokens used + wall time |
79
-
80
- ### 5. Agent Interface
81
- Agents connect via stdio or HTTP:
82
- ```yaml
83
- # task.yaml
84
- agent_interface:
85
- type: stdio # or http
86
- prompt_file: PROMPT.md
87
- workspace: /workspace
88
- timeout: 300s
89
- token_limit: 100000
90
- ```
91
-
92
- ## Alternatives Considered
93
-
94
- ### Task Registry (rejected)
95
- - Pros: Smaller local footprint
96
- - Cons: More infrastructure, harder offline use
97
- - Decision: Start with a monorepo; a registry can be extracted later
98
-
99
- ### VM-per-task (rejected)
100
- - Pros: Better isolation
101
- - Cons: 10x cost, slower iteration
102
- - Decision: Docker is sufficient; VMs only for the hosted tier
103
-
104
- ## Risks & Mitigations
105
-
106
- | Risk | Mitigation |
107
- |------|------------|
108
- | Task contamination in training data | Version tasks, rotate variants |
109
- | Agent gaming metrics | Multiple equivalent tasks per category |
110
- | Unfair time comparisons | Normalize by model speed tier |
111
- | Docker escape | Rootless containers, seccomp profiles |
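The `agent_interface` block shown under Decision 5 of the removed design document could be loaded roughly as follows. This is a minimal sketch, not the project's implementation: `AgentInterface`, `parse_timeout`, and `load_agent_interface` are hypothetical names, and the assumption that timeouts are always whole seconds with an `s` suffix rests only on the single `300s` example in the document.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"stdio", "http"}  # the two interface types named in the design doc


def parse_timeout(value: str) -> int:
    """Convert a string such as '300s' into whole seconds (assumed format)."""
    text = value.strip()
    if not text.endswith("s") or not text[:-1].isdecimal():
        raise ValueError(f"unsupported timeout format: {value!r}")
    return int(text[:-1])


@dataclass(frozen=True)
class AgentInterface:
    type: str
    prompt_file: str
    workspace: str
    timeout_seconds: int
    token_limit: int


def load_agent_interface(raw: dict) -> AgentInterface:
    """Build an AgentInterface from an already-parsed YAML mapping."""
    if raw["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown agent interface type: {raw['type']!r}")
    return AgentInterface(
        type=raw["type"],
        prompt_file=raw["prompt_file"],
        workspace=raw["workspace"],
        timeout_seconds=parse_timeout(raw["timeout"]),
        token_limit=int(raw["token_limit"]),
    )


# Values copied from the task.yaml example in the design document.
config = load_agent_interface({
    "type": "stdio",
    "prompt_file": "PROMPT.md",
    "workspace": "/workspace",
    "timeout": "300s",
    "token_limit": 100000,
})
```

Rejecting unknown interface types and malformed timeouts at load time is a design choice of this sketch; the document itself does not say how invalid values are handled.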
@@ -1,15 +0,0 @@
1
- # Change: Initialize VibeCodingBench
2
-
3
- ## Why
4
- Existing coding benchmarks (HumanEval, SWE-bench) focus on algorithmic puzzles or isolated bug fixes. Real developers spend 40% of their time on SaaS boilerplate, integrations, and glue code. We need a benchmark that measures what coding agents actually do in production.
5
-
6
- ## What Changes
7
- - Create monorepo structure with task runner CLI
8
- - Define task specification format (YAML + Docker)
9
- - Implement multi-dimensional evaluation (functional, quality, security, efficiency)
10
- - Build leaderboard service for hosted evaluation
11
- - Add 200+ tasks across 5 categories: SaaS, Glue Code, AI Integration, Frontend, API
12
-
13
- ## Impact
14
- - Affected specs: task-runner, task-definition, evaluation, leaderboard (all new)
15
- - Affected code: Greenfield project
@@ -1,105 +0,0 @@
1
- ## ADDED Requirements
2
-
3
- ### Requirement: Multi-Dimensional Scoring
4
- The system SHALL compute scores across five dimensions with configurable weights.
5
-
6
- #### Scenario: Default weights
7
- - **WHEN** no custom weights specified
8
- - **THEN** system uses: Functional 40%, Visual 20%, Quality 20%, Cost 10%, Speed 10%
9
-
10
- #### Scenario: Custom weights
11
- - **WHEN** user specifies `--weights func=50,visual=0,quality=30,cost=10,speed=10`
12
- - **THEN** system applies custom weight distribution
13
-
14
- ### Requirement: Functional Correctness (Pass@k)
15
- The system SHALL measure functional correctness via execution-based testing.
16
-
17
- #### Scenario: Pass@1
18
- - **WHEN** test suite runs once and passes
19
- - **THEN** functional score = 100%
20
-
21
- #### Scenario: Pass@n with retries
22
- - **WHEN** task allows n attempts and any attempt passes
23
- - **THEN** functional score = 100%, but an efficiency penalty is applied
24
-
25
- #### Scenario: Fail-to-Pass validation
26
- - **WHEN** task is bug-fix type
27
- - **THEN** system verifies agent's test fails before fix and passes after
28
-
29
- ### Requirement: Visual Fidelity
30
- The system SHALL measure UI accuracy via screenshot comparison.
31
-
32
- #### Scenario: Pixel diff scoring
33
- - **WHEN** task has `reference.png` in golden/
34
- - **THEN** system captures screenshot and computes pixel match percentage
35
-
36
- #### Scenario: Responsive breakpoints
37
- - **WHEN** task specifies `breakpoints: [375, 768, 1440]`
38
- - **THEN** system tests at each width and averages scores
39
-
40
- #### Scenario: Tolerance threshold
41
- - **WHEN** pixel mismatch < 5%
42
- - **THEN** visual score = 100% (allows font rendering variance)
43
-
44
- ### Requirement: Code Quality
45
- The system SHALL measure code hygiene via static analysis.
46
-
47
- #### Scenario: Linter errors
48
- - **WHEN** generated code has linter errors
49
- - **THEN** quality score reduced by error count (max -50 points)
50
-
51
- #### Scenario: Cyclomatic complexity
52
- - **WHEN** average complexity > 10
53
- - **THEN** quality score reduced proportionally
54
-
55
- #### Scenario: Security scan
56
- - **WHEN** Semgrep finds Critical/High vulnerabilities
57
- - **THEN** task auto-fails regardless of other scores
58
-
59
- ### Requirement: Hallucination Detection
60
- The system SHALL detect fabricated dependencies.
61
-
62
- #### Scenario: Import validation
63
- - **WHEN** agent imports package not in npm/PyPI/Go modules
64
- - **THEN** hallucination flag raised, quality score -20
65
-
66
- ### Requirement: Cost Efficiency
67
- The system SHALL track token usage and compute costs.
68
-
69
- #### Scenario: Token tracking
70
- - **WHEN** task completes
71
- - **THEN** system records input_tokens, output_tokens, total_cost
72
-
73
- #### Scenario: Cost per solved task (CPST)
74
- - **WHEN** computing leaderboard
75
- - **THEN** CPST = total_cost / passed_tasks
76
-
77
- #### Scenario: Context pollution rate
78
- - **WHEN** agent reads files
79
- - **THEN** pollution_rate = (files_read - files_edited) / files_read
80
-
81
- ### Requirement: Speed Metrics
82
- The system SHALL track execution time and reasoning efficiency.
83
-
84
- #### Scenario: Wall-clock time
85
- - **WHEN** task completes
86
- - **THEN** system records start_time, end_time, duration_seconds
87
-
88
- #### Scenario: Step efficiency
89
- - **WHEN** agent completes task
90
- - **THEN** system counts LLM round-trips (fewer = better)
91
-
92
- #### Scenario: Self-correction rate
93
- - **WHEN** agent encounters error and retries
94
- - **THEN** system tracks retry_count (target < 2)
95
-
96
- ### Requirement: Final Score Calculation
97
- The system SHALL compute weighted final score with penalties.
98
-
99
- #### Scenario: Score formula
100
- - **WHEN** all dimensions computed
101
- - **THEN** final_score = (func * w1) + (visual * w2) + (quality * w3) - cost_penalty - speed_penalty, where w1, w2, and w3 are the functional, visual, and quality weights
102
-
103
- #### Scenario: Leaderboard ranking
104
- - **WHEN** displaying results
105
- - **THEN** rank by final_score descending, show all dimensions in spider chart
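The default weights, the `--weights` flag format, and the score formula from the removed scoring spec can be tied together in a short sketch. The function names are hypothetical. The spec does not say how the cost and speed weights turn into penalties, so here the caller passes the penalties in as points. Requiring every dimension to be present and the weights to sum to 100 is also an assumption of this sketch.

```python
DEFAULT_WEIGHTS = {"func": 40, "visual": 20, "quality": 20, "cost": 10, "speed": 10}


def parse_weights(spec: str) -> dict:
    """Parse 'func=50,visual=0,...' into a dict of percentages."""
    weights = {}
    for pair in spec.split(","):
        name, _, value = pair.partition("=")
        name = name.strip()
        if name not in DEFAULT_WEIGHTS:
            raise ValueError(f"unknown dimension: {name!r}")
        weights[name] = float(value)
    # Assumption: all five dimensions must be given and must total 100.
    if set(weights) != set(DEFAULT_WEIGHTS):
        raise ValueError("every dimension needs a weight")
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("weights must sum to 100")
    return weights


def final_score(scores, weights=DEFAULT_WEIGHTS, cost_penalty=0.0, speed_penalty=0.0):
    """Apply the spec formula; dimension scores are on a 0-100 scale."""
    weighted = sum(
        scores[dim] * weights[dim] / 100 for dim in ("func", "visual", "quality")
    )
    return weighted - cost_penalty - speed_penalty
```

With the default weights, a run that scores 100 on all three weighted dimensions and incurs no penalties reaches 80, because the remaining 20% is reserved for cost and speed.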
@@ -1,68 +0,0 @@
1
- ## ADDED Requirements
2
-
3
- ### Requirement: Submission API
4
- The system SHALL accept evaluation submissions via REST API.
5
-
6
- #### Scenario: Submit run results
7
- - **WHEN** POST /api/submissions with run results JSON
8
- - **THEN** system validates, stores, and queues for leaderboard update
9
-
10
- #### Scenario: Agent identification
11
- - **WHEN** submission includes `agent_id` and `model_version`
12
- - **THEN** system groups results by agent for comparison
13
-
14
- ### Requirement: Leaderboard Display
15
- The system SHALL display ranked agents with multi-dimensional scores.
16
-
17
- #### Scenario: Overall leaderboard
18
- - **WHEN** GET /api/leaderboard
19
- - **THEN** system returns agents ranked by final_score with all dimension breakdowns
20
-
21
- #### Scenario: Category leaderboard
22
- - **WHEN** GET /api/leaderboard?category=saas-core
23
- - **THEN** system returns agents ranked by performance in that category only
24
-
25
- #### Scenario: Spider chart data
26
- - **WHEN** GET /api/leaderboard/:agent_id/chart
27
- - **THEN** system returns 5-axis radar chart data (func, visual, quality, cost, speed)
28
-
29
- ### Requirement: Historical Tracking
30
- The system SHALL track agent performance over time.
31
-
32
- #### Scenario: Version comparison
33
- - **WHEN** same agent submits new model version
34
- - **THEN** system shows delta vs previous version
35
-
36
- #### Scenario: Trend graphs
37
- - **WHEN** viewing agent detail page
38
- - **THEN** system displays score trends over last 30 days
39
-
40
- ### Requirement: Live Demo Dashboard
41
- The system SHALL provide real-time task execution viewing.
42
-
43
- #### Scenario: Active runs
44
- - **WHEN** tasks are running
45
- - **THEN** dashboard shows live terminal streams and browser recordings
46
-
47
- #### Scenario: Replay recordings
48
- - **WHEN** user selects completed run
49
- - **THEN** system plays back asciinema recording synced with browser video
50
-
51
- #### Scenario: Side-by-side comparison
52
- - **WHEN** user selects 2+ agents for same task
53
- - **THEN** system shows parallel playback of each agent's execution
54
-
55
- ### Requirement: Fairness Controls
56
- The system SHALL enforce fair comparison conditions.
57
-
58
- #### Scenario: Docker isolation
59
- - **WHEN** submitting results
60
- - **THEN** system verifies run was in fresh Docker container (via attestation)
61
-
62
- #### Scenario: Held-out validation
63
- - **WHEN** task is marked `held_out: true`
64
- - **THEN** system only accepts submissions from last 14 days (prevents training contamination)
65
-
66
- #### Scenario: Standardized scaffolding
67
- - **WHEN** displaying leaderboard
68
- - **THEN** system shows which agent tooling was used (raw API vs Claude Code CLI vs Codex CLI)
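The overall and per-category leaderboard scenarios in the removed leaderboard spec amount to a filter followed by a descending sort. The sketch below assumes an agent's score is the mean of its per-task `final_score` values, which the spec does not state. `Result` and `rank_agents` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    agent_id: str
    category: str
    final_score: float


def rank_agents(results, category: Optional[str] = None):
    """Return (agent_id, mean final_score) pairs, best first."""
    per_agent = {}
    for result in results:
        # Mirrors the ?category=... query parameter when one is given.
        if category is not None and result.category != category:
            continue
        per_agent.setdefault(result.agent_id, []).append(result.final_score)
    # Assumption: an agent's leaderboard score is the mean over its tasks.
    averaged = {agent: sum(vals) / len(vals) for agent, vals in per_agent.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_agents(results)` corresponds to `GET /api/leaderboard`, and `rank_agents(results, category="saas-core")` to the category-filtered variant.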
@@ -1,45 +0,0 @@
1
- ## ADDED Requirements
2
-
3
- ### Requirement: Task Schema
4
- The system SHALL validate task definitions against a JSON Schema.
5
-
6
- #### Scenario: Valid task.yaml
7
- - **WHEN** task.yaml contains all required fields (id, name, category, prompt, timeout)
8
- - **THEN** system loads task without errors
9
-
10
- #### Scenario: Invalid task.yaml
11
- - **WHEN** task.yaml is missing required fields
12
- - **THEN** system reports validation errors with line numbers
13
-
14
- ### Requirement: Task Structure
15
- Each task SHALL be a self-contained directory with standardized layout.
16
-
17
- #### Scenario: Minimal task
18
- - **WHEN** task directory contains `task.yaml` and `tests/`
19
- - **THEN** system can execute and evaluate the task
20
-
21
- #### Scenario: Full task with template
22
- - **WHEN** task directory contains `task.yaml`, `template/`, `tests/`, `golden/`
23
- - **THEN** system uses template as starter code and golden for reference comparison
24
-
25
- ### Requirement: Task Metadata
26
- Task definitions SHALL include metadata for filtering and scoring.
27
-
28
- #### Scenario: Category and weight
29
- - **WHEN** task.yaml specifies `category: saas-core` and `weight: 1.5`
30
- - **THEN** system applies weight multiplier to final score
31
-
32
- #### Scenario: Difficulty level
33
- - **WHEN** task.yaml specifies `difficulty: hard`
34
- - **THEN** system adjusts timeout and token limits accordingly
35
-
36
- ### Requirement: Prompt Specification
37
- Tasks SHALL define agent prompts with clear success criteria.
38
-
39
- #### Scenario: Prompt file
40
- - **WHEN** task.yaml specifies `prompt_file: PROMPT.md`
41
- - **THEN** system reads prompt from that file with variable substitution
42
-
43
- #### Scenario: Inline prompt
44
- - **WHEN** task.yaml contains `prompt:` field directly
45
- - **THEN** system uses inline prompt text
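The required-field check and prompt substitution from the removed task-definition spec might look like the sketch below. It rests on several assumptions. It works on an already-parsed mapping, so it cannot report the line numbers the spec asks for. It treats `prompt_file` as satisfying the `prompt` requirement. The spec does not define a placeholder syntax, so the `$name` form here is a guess. `validate_task` and `render_prompt` are hypothetical names.

```python
from string import Template

REQUIRED_FIELDS = ("id", "name", "category", "prompt", "timeout")


def validate_task(task: dict) -> list:
    """Return a list of error messages; an empty list means the task is valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field == "prompt":
            # Assumption: either an inline prompt or a prompt_file is enough.
            if "prompt" not in task and "prompt_file" not in task:
                errors.append("missing required field: prompt (or prompt_file)")
        elif field not in task:
            errors.append(f"missing required field: {field}")
    return errors


def render_prompt(text: str, variables: dict) -> str:
    """Substitute $name placeholders; unknown placeholders are left untouched."""
    return Template(text).safe_substitute(variables)
```

`safe_substitute` is used so that a prompt containing an undefined placeholder still renders instead of raising, which seemed the safer default for a sketch.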
@@ -1,49 +0,0 @@
1
- ## ADDED Requirements
2
-
3
- ### Requirement: Task Discovery
4
- The system SHALL discover tasks from the `tasks/` directory by scanning for `task.yaml` files.
5
-
6
- #### Scenario: List all tasks
7
- - **WHEN** user runs `vibecodingbench list`
8
- - **THEN** system displays all tasks grouped by category with metadata
9
-
10
- #### Scenario: Filter by category
11
- - **WHEN** user runs `vibecodingbench list --category saas-core`
12
- - **THEN** system displays only tasks in that category
13
-
14
- ### Requirement: Task Execution
15
- The system SHALL execute tasks in isolated Docker containers with configurable timeouts.
16
-
17
- #### Scenario: Run single task
18
- - **WHEN** user runs `vibecodingbench run <task-id> --agent claude-code`
19
- - **THEN** system spawns Docker container, injects prompt, captures agent output
20
-
21
- #### Scenario: Timeout enforcement
22
- - **WHEN** agent exceeds task timeout (default 300s)
23
- - **THEN** system kills container and records timeout failure
24
-
25
- #### Scenario: Token limit enforcement
26
- - **WHEN** agent exceeds token limit (default 100k)
27
- - **THEN** system stops agent and records token limit failure
28
-
29
- ### Requirement: Agent Interface
30
- The system SHALL support multiple agent connection methods.
31
-
32
- #### Scenario: Stdio agent
33
- - **WHEN** task.yaml specifies `agent_interface.type: stdio`
34
- - **THEN** system communicates via stdin/stdout pipes
35
-
36
- #### Scenario: HTTP agent
37
- - **WHEN** task.yaml specifies `agent_interface.type: http`
38
- - **THEN** system communicates via REST API on localhost:8080
39
-
40
- ### Requirement: Live Demo Mode
41
- The system SHALL support live streaming of task execution for demos.
42
-
43
- #### Scenario: Stream execution
44
- - **WHEN** user runs `vibecodingbench run <task-id> --live`
45
- - **THEN** system streams agent actions, terminal output, and browser (if applicable) to web UI
46
-
47
- #### Scenario: Record session
48
- - **WHEN** user runs `vibecodingbench run <task-id> --record`
49
- - **THEN** system saves asciinema recording and browser video to `results/<run-id>/`
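The timeout-enforcement scenario in the removed task-runner spec can be illustrated with a plain subprocess. This is only an approximation under stated assumptions: the spec kills a Docker container, whereas this sketch kills a local child process, and `run_with_timeout` and the result keys are hypothetical.

```python
import subprocess
import sys
import time


def run_with_timeout(command, timeout_seconds=300):
    """Run a command, killing it and recording a timeout failure if it overruns."""
    started = time.monotonic()
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
        status = "completed" if completed.returncode == 0 else "failed"
        output = completed.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising; partial output
        # is discarded in this sketch.
        status = "timeout"
        output = ""
    return {
        "status": status,
        "output": output,
        "duration_seconds": time.monotonic() - started,
    }


if __name__ == "__main__":
    # A stand-in "agent" that sleeps past a deliberately short limit.
    slow_agent = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(slow_agent, timeout_seconds=1)["status"])
```

The default of 300 seconds matches the default task timeout given in the spec; a token-limit check would need a separate counter that this sketch does not attempt.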