gsd-trae 1.0.0 → 1.0.2
This diff compares publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +40 -0
- package/README.md +7 -76
- package/assets/screenshot.png +0 -0
- package/package.json +12 -3
- package/.claude/settings.local.json +0 -8
- package/.gitmodules +0 -6
- package/.trae/project_rules.md +0 -56
- package/.trae/rules/project_rules.md +0 -56
- package/.vscode/code-counter/code-counter.db +0 -0
- package/.vscode/settings.json +0 -5
- package/refs/gsd/.github/CODEOWNERS +0 -2
- package/refs/gsd/.github/FUNDING.yml +0 -1
- package/refs/gsd/.github/ISSUE_TEMPLATE/bug_report.yml +0 -59
- package/refs/gsd/.github/ISSUE_TEMPLATE/feature_request.yml +0 -37
- package/refs/gsd/.github/pull_request_template.md +0 -24
- package/refs/gsd/.github/workflows/auto-label-issues.yml +0 -21
- package/refs/gsd/CHANGELOG.md +0 -1520
- package/refs/gsd/LICENSE +0 -21
- package/refs/gsd/README.md +0 -704
- package/refs/gsd/SECURITY.md +0 -33
- package/refs/gsd/agents/gsd-codebase-mapper.md +0 -764
- package/refs/gsd/agents/gsd-debugger.md +0 -1246
- package/refs/gsd/agents/gsd-executor.md +0 -469
- package/refs/gsd/agents/gsd-integration-checker.md +0 -443
- package/refs/gsd/agents/gsd-phase-researcher.md +0 -546
- package/refs/gsd/agents/gsd-plan-checker.md +0 -690
- package/refs/gsd/agents/gsd-planner.md +0 -1275
- package/refs/gsd/agents/gsd-project-researcher.md +0 -621
- package/refs/gsd/agents/gsd-research-synthesizer.md +0 -239
- package/refs/gsd/agents/gsd-roadmapper.md +0 -642
- package/refs/gsd/agents/gsd-verifier.md +0 -573
- package/refs/gsd/assets/gsd-logo-2000-transparent.png +0 -0
- package/refs/gsd/assets/gsd-logo-2000-transparent.svg +0 -17
- package/refs/gsd/assets/gsd-logo-2000.png +0 -0
- package/refs/gsd/assets/gsd-logo-2000.svg +0 -21
- package/refs/gsd/assets/terminal.svg +0 -68
- package/refs/gsd/bin/install.js +0 -2090
- package/refs/gsd/commands/gsd/add-phase.md +0 -43
- package/refs/gsd/commands/gsd/add-tests.md +0 -41
- package/refs/gsd/commands/gsd/add-todo.md +0 -47
- package/refs/gsd/commands/gsd/audit-milestone.md +0 -36
- package/refs/gsd/commands/gsd/check-todos.md +0 -45
- package/refs/gsd/commands/gsd/cleanup.md +0 -18
- package/refs/gsd/commands/gsd/complete-milestone.md +0 -136
- package/refs/gsd/commands/gsd/debug.md +0 -167
- package/refs/gsd/commands/gsd/discuss-phase.md +0 -83
- package/refs/gsd/commands/gsd/execute-phase.md +0 -41
- package/refs/gsd/commands/gsd/health.md +0 -22
- package/refs/gsd/commands/gsd/help.md +0 -22
- package/refs/gsd/commands/gsd/insert-phase.md +0 -32
- package/refs/gsd/commands/gsd/join-discord.md +0 -18
- package/refs/gsd/commands/gsd/list-phase-assumptions.md +0 -46
- package/refs/gsd/commands/gsd/map-codebase.md +0 -71
- package/refs/gsd/commands/gsd/new-milestone.md +0 -44
- package/refs/gsd/commands/gsd/new-project.md +0 -42
- package/refs/gsd/commands/gsd/new-project.md.bak +0 -1041
- package/refs/gsd/commands/gsd/pause-work.md +0 -38
- package/refs/gsd/commands/gsd/plan-milestone-gaps.md +0 -34
- package/refs/gsd/commands/gsd/plan-phase.md +0 -45
- package/refs/gsd/commands/gsd/progress.md +0 -24
- package/refs/gsd/commands/gsd/quick.md +0 -41
- package/refs/gsd/commands/gsd/reapply-patches.md +0 -110
- package/refs/gsd/commands/gsd/remove-phase.md +0 -31
- package/refs/gsd/commands/gsd/research-phase.md +0 -189
- package/refs/gsd/commands/gsd/resume-work.md +0 -40
- package/refs/gsd/commands/gsd/set-profile.md +0 -34
- package/refs/gsd/commands/gsd/settings.md +0 -36
- package/refs/gsd/commands/gsd/update.md +0 -37
- package/refs/gsd/commands/gsd/verify-work.md +0 -38
- package/refs/gsd/docs/USER-GUIDE.md +0 -471
- package/refs/gsd/docs/context-monitor.md +0 -96
- package/refs/gsd/get-shit-done/bin/gsd-tools.cjs +0 -585
- package/refs/gsd/get-shit-done/bin/lib/commands.cjs +0 -553
- package/refs/gsd/get-shit-done/bin/lib/config.cjs +0 -162
- package/refs/gsd/get-shit-done/bin/lib/core.cjs +0 -411
- package/refs/gsd/get-shit-done/bin/lib/frontmatter.cjs +0 -299
- package/refs/gsd/get-shit-done/bin/lib/init.cjs +0 -710
- package/refs/gsd/get-shit-done/bin/lib/milestone.cjs +0 -215
- package/refs/gsd/get-shit-done/bin/lib/phase.cjs +0 -870
- package/refs/gsd/get-shit-done/bin/lib/roadmap.cjs +0 -298
- package/refs/gsd/get-shit-done/bin/lib/state.cjs +0 -521
- package/refs/gsd/get-shit-done/bin/lib/template.cjs +0 -222
- package/refs/gsd/get-shit-done/bin/lib/verify.cjs +0 -772
- package/refs/gsd/get-shit-done/references/checkpoints.md +0 -776
- package/refs/gsd/get-shit-done/references/continuation-format.md +0 -249
- package/refs/gsd/get-shit-done/references/decimal-phase-calculation.md +0 -65
- package/refs/gsd/get-shit-done/references/git-integration.md +0 -248
- package/refs/gsd/get-shit-done/references/git-planning-commit.md +0 -38
- package/refs/gsd/get-shit-done/references/model-profile-resolution.md +0 -34
- package/refs/gsd/get-shit-done/references/model-profiles.md +0 -92
- package/refs/gsd/get-shit-done/references/phase-argument-parsing.md +0 -61
- package/refs/gsd/get-shit-done/references/planning-config.md +0 -196
- package/refs/gsd/get-shit-done/references/questioning.md +0 -145
- package/refs/gsd/get-shit-done/references/tdd.md +0 -263
- package/refs/gsd/get-shit-done/references/ui-brand.md +0 -160
- package/refs/gsd/get-shit-done/references/verification-patterns.md +0 -612
- package/refs/gsd/get-shit-done/templates/DEBUG.md +0 -164
- package/refs/gsd/get-shit-done/templates/UAT.md +0 -247
- package/refs/gsd/get-shit-done/templates/VALIDATION.md +0 -76
- package/refs/gsd/get-shit-done/templates/codebase/architecture.md +0 -255
- package/refs/gsd/get-shit-done/templates/codebase/concerns.md +0 -310
- package/refs/gsd/get-shit-done/templates/codebase/conventions.md +0 -307
- package/refs/gsd/get-shit-done/templates/codebase/integrations.md +0 -280
- package/refs/gsd/get-shit-done/templates/codebase/stack.md +0 -186
- package/refs/gsd/get-shit-done/templates/codebase/structure.md +0 -285
- package/refs/gsd/get-shit-done/templates/codebase/testing.md +0 -480
- package/refs/gsd/get-shit-done/templates/config.json +0 -37
- package/refs/gsd/get-shit-done/templates/context.md +0 -283
- package/refs/gsd/get-shit-done/templates/continue-here.md +0 -78
- package/refs/gsd/get-shit-done/templates/debug-subagent-prompt.md +0 -91
- package/refs/gsd/get-shit-done/templates/discovery.md +0 -146
- package/refs/gsd/get-shit-done/templates/milestone-archive.md +0 -123
- package/refs/gsd/get-shit-done/templates/milestone.md +0 -115
- package/refs/gsd/get-shit-done/templates/phase-prompt.md +0 -569
- package/refs/gsd/get-shit-done/templates/planner-subagent-prompt.md +0 -117
- package/refs/gsd/get-shit-done/templates/project.md +0 -184
- package/refs/gsd/get-shit-done/templates/requirements.md +0 -231
- package/refs/gsd/get-shit-done/templates/research-project/ARCHITECTURE.md +0 -204
- package/refs/gsd/get-shit-done/templates/research-project/FEATURES.md +0 -147
- package/refs/gsd/get-shit-done/templates/research-project/PITFALLS.md +0 -200
- package/refs/gsd/get-shit-done/templates/research-project/STACK.md +0 -120
- package/refs/gsd/get-shit-done/templates/research-project/SUMMARY.md +0 -170
- package/refs/gsd/get-shit-done/templates/research.md +0 -552
- package/refs/gsd/get-shit-done/templates/retrospective.md +0 -54
- package/refs/gsd/get-shit-done/templates/roadmap.md +0 -202
- package/refs/gsd/get-shit-done/templates/state.md +0 -176
- package/refs/gsd/get-shit-done/templates/summary-complex.md +0 -59
- package/refs/gsd/get-shit-done/templates/summary-minimal.md +0 -41
- package/refs/gsd/get-shit-done/templates/summary-standard.md +0 -48
- package/refs/gsd/get-shit-done/templates/summary.md +0 -248
- package/refs/gsd/get-shit-done/templates/user-setup.md +0 -311
- package/refs/gsd/get-shit-done/templates/verification-report.md +0 -322
- package/refs/gsd/get-shit-done/workflows/add-phase.md +0 -111
- package/refs/gsd/get-shit-done/workflows/add-tests.md +0 -350
- package/refs/gsd/get-shit-done/workflows/add-todo.md +0 -157
- package/refs/gsd/get-shit-done/workflows/audit-milestone.md +0 -297
- package/refs/gsd/get-shit-done/workflows/check-todos.md +0 -176
- package/refs/gsd/get-shit-done/workflows/cleanup.md +0 -152
- package/refs/gsd/get-shit-done/workflows/complete-milestone.md +0 -763
- package/refs/gsd/get-shit-done/workflows/diagnose-issues.md +0 -219
- package/refs/gsd/get-shit-done/workflows/discovery-phase.md +0 -289
- package/refs/gsd/get-shit-done/workflows/discuss-phase.md +0 -542
- package/refs/gsd/get-shit-done/workflows/execute-phase.md +0 -449
- package/refs/gsd/get-shit-done/workflows/execute-plan.md +0 -448
- package/refs/gsd/get-shit-done/workflows/health.md +0 -156
- package/refs/gsd/get-shit-done/workflows/help.md +0 -489
- package/refs/gsd/get-shit-done/workflows/insert-phase.md +0 -129
- package/refs/gsd/get-shit-done/workflows/list-phase-assumptions.md +0 -178
- package/refs/gsd/get-shit-done/workflows/map-codebase.md +0 -315
- package/refs/gsd/get-shit-done/workflows/new-milestone.md +0 -382
- package/refs/gsd/get-shit-done/workflows/new-project.md +0 -1116
- package/refs/gsd/get-shit-done/workflows/pause-work.md +0 -122
- package/refs/gsd/get-shit-done/workflows/plan-milestone-gaps.md +0 -274
- package/refs/gsd/get-shit-done/workflows/plan-phase.md +0 -569
- package/refs/gsd/get-shit-done/workflows/progress.md +0 -381
- package/refs/gsd/get-shit-done/workflows/quick.md +0 -453
- package/refs/gsd/get-shit-done/workflows/remove-phase.md +0 -154
- package/refs/gsd/get-shit-done/workflows/research-phase.md +0 -73
- package/refs/gsd/get-shit-done/workflows/resume-project.md +0 -306
- package/refs/gsd/get-shit-done/workflows/set-profile.md +0 -80
- package/refs/gsd/get-shit-done/workflows/settings.md +0 -213
- package/refs/gsd/get-shit-done/workflows/transition.md +0 -544
- package/refs/gsd/get-shit-done/workflows/update.md +0 -219
- package/refs/gsd/get-shit-done/workflows/verify-phase.md +0 -242
- package/refs/gsd/get-shit-done/workflows/verify-work.md +0 -569
- package/refs/gsd/hooks/gsd-check-update.js +0 -62
- package/refs/gsd/hooks/gsd-context-monitor.js +0 -122
- package/refs/gsd/hooks/gsd-statusline.js +0 -108
- package/refs/gsd/package.json +0 -50
- package/refs/gsd/scripts/build-hooks.js +0 -43
- package/refs/gsd/tests/commands.test.cjs +0 -661
- package/refs/gsd/tests/helpers.cjs +0 -40
- package/refs/gsd/tests/init.test.cjs +0 -205
- package/refs/gsd/tests/milestone.test.cjs +0 -98
- package/refs/gsd/tests/phase.test.cjs +0 -1241
- package/refs/gsd/tests/roadmap.test.cjs +0 -265
- package/refs/gsd/tests/state.test.cjs +0 -302
- package/refs/gsd/tests/verify.test.cjs +0 -80
- package/refs/vbenchmark/.agent/agents/codebase-explorer.md +0 -224
- package/refs/vbenchmark/.agent/agents/debugger.md +0 -180
- package/refs/vbenchmark/.agent/agents/documenter.md +0 -166
- package/refs/vbenchmark/.agent/agents/implementer.md +0 -70
- package/refs/vbenchmark/.agent/agents/orchestrator.md +0 -212
- package/refs/vbenchmark/.agent/agents/researcher.md +0 -80
- package/refs/vbenchmark/.agent/agents/reviewer.md +0 -184
- package/refs/vbenchmark/.agent/agents/tester.md +0 -170
- package/refs/vbenchmark/.agent/commands/commit.md +0 -29
- package/refs/vbenchmark/.agent/commands/debug.md +0 -59
- package/refs/vbenchmark/.agent/commands/document.md +0 -52
- package/refs/vbenchmark/.agent/commands/gather-context.md +0 -58
- package/refs/vbenchmark/.agent/commands/init.md +0 -56
- package/refs/vbenchmark/.agent/commands/preset-help.md +0 -50
- package/refs/vbenchmark/.agent/commands/refactor.md +0 -71
- package/refs/vbenchmark/.agent/commands/research.md +0 -37
- package/refs/vbenchmark/.agent/commands/review.md +0 -38
- package/refs/vbenchmark/.agent/commands/test.md +0 -61
- package/refs/vbenchmark/.agent/rules/01-code-quality.md +0 -33
- package/refs/vbenchmark/.agent/rules/02-typescript-go.md +0 -46
- package/refs/vbenchmark/.agent/rules/03-security-git.md +0 -34
- package/refs/vbenchmark/.agent/rules/04-architecture.md +0 -40
- package/refs/vbenchmark/.agent/sync.js +0 -536
- package/refs/vbenchmark/.agent/workflows/commit.md +0 -29
- package/refs/vbenchmark/.agent/workflows/debug.md +0 -59
- package/refs/vbenchmark/.agent/workflows/document.md +0 -52
- package/refs/vbenchmark/.agent/workflows/gather-context.md +0 -58
- package/refs/vbenchmark/.agent/workflows/init.md +0 -56
- package/refs/vbenchmark/.agent/workflows/preset-help.md +0 -50
- package/refs/vbenchmark/.agent/workflows/refactor.md +0 -71
- package/refs/vbenchmark/.agent/workflows/research.md +0 -37
- package/refs/vbenchmark/.agent/workflows/review.md +0 -38
- package/refs/vbenchmark/.agent/workflows/test.md +0 -61
- package/refs/vbenchmark/.claude/commands/agentic-dev/apply.md +0 -222
- package/refs/vbenchmark/.claude/commands/agentic-dev/done.md +0 -166
- package/refs/vbenchmark/.claude/commands/agentic-dev/proposal.md +0 -220
- package/refs/vbenchmark/.claude/commands/openspec/apply.md +0 -23
- package/refs/vbenchmark/.claude/commands/openspec/archive.md +0 -27
- package/refs/vbenchmark/.claude/commands/openspec/proposal.md +0 -28
- package/refs/vbenchmark/.clinerules/01-rules.md +0 -73
- package/refs/vbenchmark/.clinerules/02-agents.md +0 -34
- package/refs/vbenchmark/.cursor/commands/commit.md +0 -29
- package/refs/vbenchmark/.cursor/commands/debug.md +0 -59
- package/refs/vbenchmark/.cursor/commands/document.md +0 -52
- package/refs/vbenchmark/.cursor/commands/gather-context.md +0 -58
- package/refs/vbenchmark/.cursor/commands/init.md +0 -56
- package/refs/vbenchmark/.cursor/commands/preset-help.md +0 -50
- package/refs/vbenchmark/.cursor/commands/refactor.md +0 -71
- package/refs/vbenchmark/.cursor/commands/research.md +0 -37
- package/refs/vbenchmark/.cursor/commands/review.md +0 -38
- package/refs/vbenchmark/.cursor/commands/test.md +0 -61
- package/refs/vbenchmark/.cursor/rules/agents.mdc +0 -1357
- package/refs/vbenchmark/.factory/droids/codebase-explorer.md +0 -224
- package/refs/vbenchmark/.factory/droids/debugger.md +0 -180
- package/refs/vbenchmark/.factory/droids/documenter.md +0 -166
- package/refs/vbenchmark/.factory/droids/implementer.md +0 -70
- package/refs/vbenchmark/.factory/droids/orchestrator.md +0 -212
- package/refs/vbenchmark/.factory/droids/researcher.md +0 -80
- package/refs/vbenchmark/.factory/droids/reviewer.md +0 -184
- package/refs/vbenchmark/.factory/droids/tester.md +0 -170
- package/refs/vbenchmark/.gemini/workflows/commit.md +0 -29
- package/refs/vbenchmark/.gemini/workflows/debug.md +0 -59
- package/refs/vbenchmark/.gemini/workflows/document.md +0 -52
- package/refs/vbenchmark/.gemini/workflows/gather-context.md +0 -58
- package/refs/vbenchmark/.gemini/workflows/init.md +0 -56
- package/refs/vbenchmark/.gemini/workflows/preset-help.md +0 -50
- package/refs/vbenchmark/.gemini/workflows/refactor.md +0 -71
- package/refs/vbenchmark/.gemini/workflows/research.md +0 -37
- package/refs/vbenchmark/.gemini/workflows/review.md +0 -38
- package/refs/vbenchmark/.gemini/workflows/test.md +0 -61
- package/refs/vbenchmark/.github/CODEOWNERS +0 -20
- package/refs/vbenchmark/.github/FUNDING.yml +0 -4
- package/refs/vbenchmark/.github/ISSUE_TEMPLATE/bug-report.yml +0 -76
- package/refs/vbenchmark/.github/ISSUE_TEMPLATE/new-task.yml +0 -106
- package/refs/vbenchmark/.github/PULL_REQUEST_TEMPLATE.md +0 -38
- package/refs/vbenchmark/.github/copilot-instructions.md +0 -73
- package/refs/vbenchmark/.github/workflows/ci.yaml +0 -33
- package/refs/vbenchmark/.github/workflows/vercel-auto-pr.yml +0 -478
- package/refs/vbenchmark/.github/workflows/vercel-deploy.yaml +0 -487
- package/refs/vbenchmark/.github/workflows/vercel-pr-command.yaml +0 -337
- package/refs/vbenchmark/.github/workflows/vercel-project-init.yaml +0 -208
- package/refs/vbenchmark/.opencode/agent/codebase-explorer.md +0 -224
- package/refs/vbenchmark/.opencode/agent/debugger.md +0 -180
- package/refs/vbenchmark/.opencode/agent/documenter.md +0 -166
- package/refs/vbenchmark/.opencode/agent/implementer.md +0 -70
- package/refs/vbenchmark/.opencode/agent/orchestrator.md +0 -212
- package/refs/vbenchmark/.opencode/agent/researcher.md +0 -80
- package/refs/vbenchmark/.opencode/agent/reviewer.md +0 -184
- package/refs/vbenchmark/.opencode/agent/tester.md +0 -170
- package/refs/vbenchmark/.opencode/command/commit.md +0 -29
- package/refs/vbenchmark/.opencode/command/debug.md +0 -59
- package/refs/vbenchmark/.opencode/command/document.md +0 -52
- package/refs/vbenchmark/.opencode/command/gather-context.md +0 -58
- package/refs/vbenchmark/.opencode/command/init.md +0 -56
- package/refs/vbenchmark/.opencode/command/preset-help.md +0 -50
- package/refs/vbenchmark/.opencode/command/refactor.md +0 -71
- package/refs/vbenchmark/.opencode/command/research.md +0 -37
- package/refs/vbenchmark/.opencode/command/review.md +0 -38
- package/refs/vbenchmark/.opencode/command/test.md +0 -61
- package/refs/vbenchmark/.trae/project_rules.md +0 -73
- package/refs/vbenchmark/.windsurf/rules/rules.md +0 -85
- package/refs/vbenchmark/AGENTS.md +0 -73
- package/refs/vbenchmark/CONTRIBUTING.md +0 -332
- package/refs/vbenchmark/Caddyfile +0 -3
- package/refs/vbenchmark/LICENSE +0 -47
- package/refs/vbenchmark/README.md +0 -354
- package/refs/vbenchmark/docker-compose.prod.yaml +0 -35
- package/refs/vbenchmark/docker-compose.yaml +0 -53
- package/refs/vbenchmark/docs/TASK_EXPANSION_PLAN.md +0 -211
- package/refs/vbenchmark/docs/THESIS.md +0 -441
- package/refs/vbenchmark/docs/categories/code-evolution.md +0 -138
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/design.md +0 -111
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/proposal.md +0 -15
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/evaluation/spec.md +0 -105
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/leaderboard/spec.md +0 -68
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/task-definition/spec.md +0 -45
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/specs/task-runner/spec.md +0 -49
- package/refs/vbenchmark/openspec/changes/init-vibecodingbench/tasks.md +0 -413
- package/refs/vbenchmark/package.json +0 -51
- package/refs/vbenchmark/packages/cli/eslint.config.js +0 -16
- package/refs/vbenchmark/packages/cli/package.json +0 -35
- package/refs/vbenchmark/packages/cli/src/agents/index.ts +0 -655
- package/refs/vbenchmark/packages/cli/src/commands/eval.ts +0 -197
- package/refs/vbenchmark/packages/cli/src/commands/list.ts +0 -63
- package/refs/vbenchmark/packages/cli/src/commands/run.ts +0 -147
- package/refs/vbenchmark/packages/cli/src/evaluator.ts +0 -125
- package/refs/vbenchmark/packages/cli/src/index.ts +0 -21
- package/refs/vbenchmark/packages/cli/src/lib/task-variation.ts +0 -153
- package/refs/vbenchmark/packages/cli/src/loader.ts +0 -258
- package/refs/vbenchmark/packages/cli/src/reporter.ts +0 -222
- package/refs/vbenchmark/packages/cli/src/runtime/docker.ts +0 -385
- package/refs/vbenchmark/packages/cli/tsconfig.json +0 -8
- package/refs/vbenchmark/packages/dashboard/Dockerfile +0 -42
- package/refs/vbenchmark/packages/dashboard/index.html +0 -21
- package/refs/vbenchmark/packages/dashboard/package.json +0 -29
- package/refs/vbenchmark/packages/dashboard/postcss.config.js +0 -6
- package/refs/vbenchmark/packages/dashboard/public/favicon.svg +0 -24
- package/refs/vbenchmark/packages/dashboard/public/logo.png +0 -0
- package/refs/vbenchmark/packages/dashboard/public/logo.svg +0 -39
- package/refs/vbenchmark/packages/dashboard/src/App.tsx +0 -1468
- package/refs/vbenchmark/packages/dashboard/src/data/category-performance.json +0 -1
- package/refs/vbenchmark/packages/dashboard/src/data/leaderboard.json +0 -1
- package/refs/vbenchmark/packages/dashboard/src/data/task-results.json +0 -1
- package/refs/vbenchmark/packages/dashboard/src/data/tasks.json +0 -1
- package/refs/vbenchmark/packages/dashboard/src/index.css +0 -3
- package/refs/vbenchmark/packages/dashboard/src/main.tsx +0 -13
- package/refs/vbenchmark/packages/dashboard/src/vite-env.d.ts +0 -9
- package/refs/vbenchmark/packages/dashboard/tailwind.config.js +0 -11
- package/refs/vbenchmark/packages/dashboard/tsconfig.json +0 -21
- package/refs/vbenchmark/packages/dashboard/tsconfig.node.json +0 -11
- package/refs/vbenchmark/packages/dashboard/vercel.json +0 -6
- package/refs/vbenchmark/packages/dashboard/vite.config.ts +0 -28
- package/refs/vbenchmark/packages/evaluator/eslint.config.js +0 -16
- package/refs/vbenchmark/packages/evaluator/package.json +0 -24
- package/refs/vbenchmark/packages/evaluator/src/index.ts +0 -15
- package/refs/vbenchmark/packages/evaluator/src/runners/functional.ts +0 -88
- package/refs/vbenchmark/packages/evaluator/src/runners/quality.ts +0 -140
- package/refs/vbenchmark/packages/evaluator/src/runners/security.ts +0 -94
- package/refs/vbenchmark/packages/evaluator/src/runners/visual.ts +0 -108
- package/refs/vbenchmark/packages/evaluator/src/types.d.ts +0 -19
- package/refs/vbenchmark/packages/evaluator/tsconfig.json +0 -8
- package/refs/vbenchmark/packages/leaderboard/Dockerfile +0 -38
- package/refs/vbenchmark/packages/leaderboard/drizzle.config.ts +0 -10
- package/refs/vbenchmark/packages/leaderboard/eslint.config.js +0 -16
- package/refs/vbenchmark/packages/leaderboard/fly.toml +0 -29
- package/refs/vbenchmark/packages/leaderboard/package.json +0 -36
- package/refs/vbenchmark/packages/leaderboard/src/app.ts +0 -29
- package/refs/vbenchmark/packages/leaderboard/src/components/BrowserPreview.tsx +0 -190
- package/refs/vbenchmark/packages/leaderboard/src/components/ComparisonView.tsx +0 -205
- package/refs/vbenchmark/packages/leaderboard/src/components/LeaderboardTable.tsx +0 -150
- package/refs/vbenchmark/packages/leaderboard/src/components/LiveRunCard.tsx +0 -133
- package/refs/vbenchmark/packages/leaderboard/src/components/SubmissionForm.tsx +0 -406
- package/refs/vbenchmark/packages/leaderboard/src/components/SubmitForm.tsx +0 -293
- package/refs/vbenchmark/packages/leaderboard/src/components/TerminalStream.tsx +0 -111
- package/refs/vbenchmark/packages/leaderboard/src/config/pricing.ts +0 -206
- package/refs/vbenchmark/packages/leaderboard/src/db/index.ts +0 -31
- package/refs/vbenchmark/packages/leaderboard/src/db/schema.ts +0 -125
- package/refs/vbenchmark/packages/leaderboard/src/index.ts +0 -13
- package/refs/vbenchmark/packages/leaderboard/src/lib/websocket.ts +0 -124
- package/refs/vbenchmark/packages/leaderboard/src/routes/leaderboard.ts +0 -698
- package/refs/vbenchmark/packages/leaderboard/src/routes/live.ts +0 -175
- package/refs/vbenchmark/packages/leaderboard/src/routes/submissions.ts +0 -183
- package/refs/vbenchmark/packages/leaderboard/src/routes/tasks.ts +0 -215
- package/refs/vbenchmark/packages/leaderboard/tests/api.test.ts +0 -228
- package/refs/vbenchmark/packages/leaderboard/tsconfig.json +0 -9
- package/refs/vbenchmark/scripts/deploy.sh +0 -70
- package/refs/vbenchmark/tasks/ai-integration/advanced/context-management/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/context-management/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/evaluation-framework/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/evaluation-framework/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/guardrails-safety/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/guardrails-safety/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/memory-system/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/memory-system/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/model-routing/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/model-routing/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/multi-agent-system/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/multi-agent-system/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/prompt-optimization/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/prompt-optimization/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/reasoning-chain/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/reasoning-chain/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/streaming-pipeline/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/streaming-pipeline/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/advanced/tool-use-orchestration/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/ai-integration/advanced/tool-use-orchestration/task.yaml +0 -16
- package/refs/vbenchmark/tasks/ai-integration/agents/code-review-agent/PROMPT.md +0 -64
- package/refs/vbenchmark/tasks/ai-integration/agents/code-review-agent/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/agents/research-agent/PROMPT.md +0 -61
- package/refs/vbenchmark/tasks/ai-integration/agents/research-agent/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/agents/web-scraper-agent/PROMPT.md +0 -57
- package/refs/vbenchmark/tasks/ai-integration/agents/web-scraper-agent/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/embeddings/duplicate-detection/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/ai-integration/embeddings/duplicate-detection/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/embeddings/recommendation-engine/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/ai-integration/embeddings/recommendation-engine/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/embeddings/semantic-search/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/ai-integration/embeddings/semantic-search/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/fine-tuning/classification-model/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/ai-integration/fine-tuning/classification-model/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/function-calling/api-orchestrator/PROMPT.md +0 -60
- package/refs/vbenchmark/tasks/ai-integration/function-calling/api-orchestrator/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/function-calling/calendar-assistant/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/ai-integration/function-calling/calendar-assistant/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/function-calling/database-query/PROMPT.md +0 -62
- package/refs/vbenchmark/tasks/ai-integration/function-calling/database-query/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/multimodal/chart-interpreter/PROMPT.md +0 -60
- package/refs/vbenchmark/tasks/ai-integration/multimodal/chart-interpreter/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/multimodal/image-captioning/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/ai-integration/multimodal/image-captioning/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/code-assistant/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/code-assistant/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/doc-search/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/doc-search/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/PROMPT.md +0 -76
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/docker-compose.yaml +0 -30
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/task.yaml +0 -30
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/pdf-qa/tests/functional/qa.test.py +0 -146
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/support-bot/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/ai-integration/rag-chatbot/support-bot/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/structured-output/contract-analyzer/PROMPT.md +0 -67
- package/refs/vbenchmark/tasks/ai-integration/structured-output/contract-analyzer/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/structured-output/invoice-parser/PROMPT.md +0 -61
- package/refs/vbenchmark/tasks/ai-integration/structured-output/invoice-parser/task.yaml +0 -27
- package/refs/vbenchmark/tasks/ai-integration/structured-output/receipt-scanner/PROMPT.md +0 -65
- package/refs/vbenchmark/tasks/ai-integration/structured-output/receipt-scanner/task.yaml +0 -24
- package/refs/vbenchmark/tasks/ai-integration/structured-output/resume-parser/PROMPT.md +0 -70
- package/refs/vbenchmark/tasks/ai-integration/structured-output/resume-parser/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-analytics/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-analytics/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-gateway/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-gateway/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-mocking/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/api-mocking/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/contract-testing/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/contract-testing/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/graphql-federation/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/graphql-federation/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/grpc-gateway/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/grpc-gateway/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/rate-limiter/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/rate-limiter/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/request-validator/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/request-validator/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/sdk-generator/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/sdk-generator/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/advanced/webhook-processor/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/api-integrations/advanced/webhook-processor/task.yaml +0 -16
- package/refs/vbenchmark/tasks/api-integrations/analytics/mixpanel-events/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/analytics/mixpanel-events/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/analytics/segment-tracking/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/analytics/segment-tracking/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/oauth2-github/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/oauth2-github/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/okta-integration/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/okta-integration/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/saml-sso/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/auth-provider/saml-sso/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/communication/discord-webhook/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/api-integrations/communication/discord-webhook/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/communication/slack-bot/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/communication/slack-bot/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/communication/twilio-sms/PROMPT.md +0 -42
- package/refs/vbenchmark/tasks/api-integrations/communication/twilio-sms/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/email/transactional/PROMPT.md +0 -82
- package/refs/vbenchmark/tasks/api-integrations/email/transactional/task.yaml +0 -27
- package/refs/vbenchmark/tasks/api-integrations/maps/google-maps-geocoding/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/maps/google-maps-geocoding/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/maps/mapbox-directions/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/maps/mapbox-directions/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/payment/crypto-payments/PROMPT.md +0 -43
- package/refs/vbenchmark/tasks/api-integrations/payment/crypto-payments/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/payment/paypal-integration/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/payment/paypal-integration/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/social/twitter-api/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/social/twitter-api/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/storage/cloudinary-upload/PROMPT.md +0 -43
- package/refs/vbenchmark/tasks/api-integrations/storage/cloudinary-upload/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/storage/gcs-streaming/PROMPT.md +0 -43
- package/refs/vbenchmark/tasks/api-integrations/storage/gcs-streaming/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/storage/s3-presigned-urls/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/storage/s3-presigned-urls/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/stripe/checkout-session/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/stripe/checkout-session/task.yaml +0 -24
- package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/PROMPT.md +0 -60
- package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/docker-compose.yaml +0 -38
- package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/task.yaml +0 -31
- package/refs/vbenchmark/tasks/api-integrations/stripe/payment-webhook/tests/webhook.test.ts +0 -193
- package/refs/vbenchmark/tasks/api-integrations/stripe/subscription-portal/PROMPT.md +0 -41
- package/refs/vbenchmark/tasks/api-integrations/stripe/subscription-portal/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/advanced/api-deprecation/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/api-deprecation/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/ast-refactoring/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/ast-refactoring/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/concurrency-fix/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/concurrency-fix/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/database-schema-migration/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/database-schema-migration/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/dead-code-elimination/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/dead-code-elimination/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/dependency-upgrade/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/dependency-upgrade/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/memory-optimization/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/memory-optimization/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/monorepo-extraction/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/monorepo-extraction/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/performance-profiling/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/performance-profiling/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/advanced/type-migration/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/code-evolution/advanced/type-migration/task.yaml +0 -16
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/callback-to-async/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/callback-to-async/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/base-code/src/app.ts +0 -22
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/task.yaml +0 -37
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/express-to-fastify/tests/api.test.ts +0 -70
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/flask-to-fastapi/PROMPT.md +0 -46
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/flask-to-fastapi/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/java-to-kotlin/PROMPT.md +0 -45
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/java-to-kotlin/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/jquery-to-react/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/jquery-to-react/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/rest-to-grpc/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/legacy-migration/rest-to-grpc/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/performance/async-refactor/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/performance/async-refactor/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/performance/memory-leak-fix/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/performance/memory-leak-fix/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/performance/query-optimization/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/code-evolution/performance/query-optimization/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/refactoring/class-to-hooks/PROMPT.md +0 -96
- package/refs/vbenchmark/tasks/code-evolution/refactoring/class-to-hooks/task.yaml +0 -27
- package/refs/vbenchmark/tasks/code-evolution/refactoring/dependency-injection/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/refactoring/dependency-injection/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/refactoring/error-handling/PROMPT.md +0 -48
- package/refs/vbenchmark/tasks/code-evolution/refactoring/error-handling/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/refactoring/monolith-to-modules/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/code-evolution/refactoring/monolith-to-modules/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/refactoring/orm-migration/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/refactoring/orm-migration/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/security/secrets-rotation/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/code-evolution/security/secrets-rotation/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/security/sql-injection-fix/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/code-evolution/security/sql-injection-fix/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/security/xss-prevention/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/security/xss-prevention/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/testing/add-unit-tests/PROMPT.md +0 -48
- package/refs/vbenchmark/tasks/code-evolution/testing/add-unit-tests/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/testing/e2e-playwright/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/code-evolution/testing/e2e-playwright/task.yaml +0 -24
- package/refs/vbenchmark/tasks/code-evolution/testing/pytest-fixtures/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/code-evolution/testing/pytest-fixtures/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/accessibility/keyboard-shortcuts/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/accessibility/keyboard-shortcuts/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/accessibility/screen-reader-nav/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/accessibility/screen-reader-nav/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/advanced/canvas-editor/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/canvas-editor/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/micro-frontend/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/micro-frontend/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/offline-first/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/offline-first/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/realtime-collab/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/realtime-collab/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/service-worker/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/service-worker/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/state-machine/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/state-machine/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/virtual-list/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/virtual-list/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/wasm-integration/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/wasm-integration/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/web-worker/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/web-worker/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/advanced/webgl-visualization/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/frontend/advanced/webgl-visualization/task.yaml +0 -16
- package/refs/vbenchmark/tasks/frontend/animation/page-transitions/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/animation/page-transitions/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/components/data-grid/PROMPT.md +0 -59
- package/refs/vbenchmark/tasks/frontend/components/data-grid/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/components/date-range-picker/PROMPT.md +0 -57
- package/refs/vbenchmark/tasks/frontend/components/date-range-picker/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/components/file-uploader/PROMPT.md +0 -55
- package/refs/vbenchmark/tasks/frontend/components/file-uploader/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/components/form-builder/PROMPT.md +0 -96
- package/refs/vbenchmark/tasks/frontend/components/form-builder/task.yaml +0 -28
- package/refs/vbenchmark/tasks/frontend/components/rich-text-editor/PROMPT.md +0 -45
- package/refs/vbenchmark/tasks/frontend/components/rich-text-editor/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/figma-to-code/dashboard-layout/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/frontend/figma-to-code/dashboard-layout/task.yaml +0 -25
- package/refs/vbenchmark/tasks/frontend/figma-to-code/landing-page/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/frontend/figma-to-code/landing-page/task.yaml +0 -25
- package/refs/vbenchmark/tasks/frontend/figma-to-code/mobile-app-screen/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/frontend/figma-to-code/mobile-app-screen/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/PROMPT.md +0 -93
- package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/docker-compose.yaml +0 -23
- package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/task.yaml +0 -30
- package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/tests/visual/diff.test.ts +0 -107
- package/refs/vbenchmark/tasks/frontend/figma-to-code/pricing-card/tests/visual/interaction.test.ts +0 -88
- package/refs/vbenchmark/tasks/frontend/performance/image-lazy-load/PROMPT.md +0 -43
- package/refs/vbenchmark/tasks/frontend/performance/image-lazy-load/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/performance/infinite-scroll/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/performance/infinite-scroll/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/state-management/collaborative-editor/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/state-management/collaborative-editor/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/state-management/shopping-cart/PROMPT.md +0 -53
- package/refs/vbenchmark/tasks/frontend/state-management/shopping-cart/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/visualization/chart-dashboard/PROMPT.md +0 -83
- package/refs/vbenchmark/tasks/frontend/visualization/chart-dashboard/task.yaml +0 -28
- package/refs/vbenchmark/tasks/frontend/visualization/gantt-chart/PROMPT.md +0 -57
- package/refs/vbenchmark/tasks/frontend/visualization/gantt-chart/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/visualization/map-dashboard/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/frontend/visualization/map-dashboard/task.yaml +0 -24
- package/refs/vbenchmark/tasks/frontend/visualization/realtime-charts/PROMPT.md +0 -43
- package/refs/vbenchmark/tasks/frontend/visualization/realtime-charts/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/advanced/blue-green-deploy/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/blue-green-deploy/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/canary-release/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/canary-release/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/change-data-capture/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/change-data-capture/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/config-management/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/config-management/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/data-pipeline/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/data-pipeline/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/distributed-tracing/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/distributed-tracing/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/log-aggregation/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/log-aggregation/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/schema-registry/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/schema-registry/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/secret-rotation/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/secret-rotation/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/advanced/stream-processing/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/glue-code/advanced/stream-processing/task.yaml +0 -16
- package/refs/vbenchmark/tasks/glue-code/api-sync/rest-to-graphql/PROMPT.md +0 -66
- package/refs/vbenchmark/tasks/glue-code/api-sync/rest-to-graphql/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/caching/redis-cache/PROMPT.md +0 -82
- package/refs/vbenchmark/tasks/glue-code/caching/redis-cache/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/data-transform/avro-schema-evolution/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/glue-code/data-transform/avro-schema-evolution/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/data-transform/csv-normalizer/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/glue-code/data-transform/csv-normalizer/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/PROMPT.md +0 -67
- package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/task.yaml +0 -28
- package/refs/vbenchmark/tasks/glue-code/data-transform/excel-to-json/tests/transform.test.py +0 -137
- package/refs/vbenchmark/tasks/glue-code/data-transform/json-to-xml/PROMPT.md +0 -45
- package/refs/vbenchmark/tasks/glue-code/data-transform/json-to-xml/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/data-transform/protobuf-converter/PROMPT.md +0 -44
- package/refs/vbenchmark/tasks/glue-code/data-transform/protobuf-converter/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/etl/cdc-pipeline/PROMPT.md +0 -52
- package/refs/vbenchmark/tasks/glue-code/etl/cdc-pipeline/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/etl/database-sync/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/glue-code/etl/database-sync/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/etl/s3-to-warehouse/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/etl/s3-to-warehouse/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/file-processing/image-resizer/PROMPT.md +0 -52
- package/refs/vbenchmark/tasks/glue-code/file-processing/image-resizer/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/file-processing/pdf-merger/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/file-processing/pdf-merger/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/file-processing/video-transcoder/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/file-processing/video-transcoder/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/migration/data-backfill/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/migration/data-backfill/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/migration/database-versioning/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/migration/database-versioning/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/queue/kafka-producer/PROMPT.md +0 -49
- package/refs/vbenchmark/tasks/glue-code/queue/kafka-producer/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/queue/rabbitmq-consumer/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/glue-code/queue/rabbitmq-consumer/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/queue/sqs-batch-processor/PROMPT.md +0 -47
- package/refs/vbenchmark/tasks/glue-code/queue/sqs-batch-processor/task.yaml +0 -24
- package/refs/vbenchmark/tasks/glue-code/scheduler/cron-job-manager/PROMPT.md +0 -52
- package/refs/vbenchmark/tasks/glue-code/scheduler/cron-job-manager/task.yaml +0 -27
- package/refs/vbenchmark/tasks/glue-code/scheduler/delayed-tasks/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/glue-code/scheduler/delayed-tasks/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/advanced/api-versioning/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/saas-core/advanced/api-versioning/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/circuit-breaker/PROMPT.md +0 -13
- package/refs/vbenchmark/tasks/saas-core/advanced/circuit-breaker/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/compliance-gdpr/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/saas-core/advanced/compliance-gdpr/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/cqrs-pattern/PROMPT.md +0 -13
- package/refs/vbenchmark/tasks/saas-core/advanced/cqrs-pattern/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/data-encryption/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/saas-core/advanced/data-encryption/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/distributed-locking/PROMPT.md +0 -46
- package/refs/vbenchmark/tasks/saas-core/advanced/distributed-locking/task.yaml +0 -24
- package/refs/vbenchmark/tasks/saas-core/advanced/event-sourcing/PROMPT.md +0 -23
- package/refs/vbenchmark/tasks/saas-core/advanced/event-sourcing/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/feature-flags-ab/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/saas-core/advanced/feature-flags-ab/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/saga-orchestration/PROMPT.md +0 -13
- package/refs/vbenchmark/tasks/saas-core/advanced/saga-orchestration/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/advanced/webhook-delivery/PROMPT.md +0 -15
- package/refs/vbenchmark/tasks/saas-core/advanced/webhook-delivery/task.yaml +0 -16
- package/refs/vbenchmark/tasks/saas-core/audit/activity-logging/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/saas-core/audit/activity-logging/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/jwt-refresh-tokens/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/saas-core/auth/jwt-refresh-tokens/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/magic-link-email/PROMPT.md +0 -53
- package/refs/vbenchmark/tasks/saas-core/auth/magic-link-email/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/mfa-totp/PROMPT.md +0 -79
- package/refs/vbenchmark/tasks/saas-core/auth/mfa-totp/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/rbac-permissions/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/saas-core/auth/rbac-permissions/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/session-management/PROMPT.md +0 -52
- package/refs/vbenchmark/tasks/saas-core/auth/session-management/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/PROMPT.md +0 -45
- package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/docker-compose.yaml +0 -47
- package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/task.yaml +0 -32
- package/refs/vbenchmark/tasks/saas-core/auth/supabase-oauth/tests/auth.test.ts +0 -59
- package/refs/vbenchmark/tasks/saas-core/billing/invoice-generation/PROMPT.md +0 -53
- package/refs/vbenchmark/tasks/saas-core/billing/invoice-generation/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/billing/stripe-subscriptions/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/saas-core/billing/stripe-subscriptions/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/billing/usage-metering/PROMPT.md +0 -52
- package/refs/vbenchmark/tasks/saas-core/billing/usage-metering/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/crud/dashboard-table/PROMPT.md +0 -48
- package/refs/vbenchmark/tasks/saas-core/crud/dashboard-table/task.yaml +0 -28
- package/refs/vbenchmark/tasks/saas-core/multi-tenant/org-isolation/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/saas-core/multi-tenant/org-isolation/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/multi-tenant/subdomain-routing/PROMPT.md +0 -50
- package/refs/vbenchmark/tasks/saas-core/multi-tenant/subdomain-routing/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/notifications/email-queue/PROMPT.md +0 -53
- package/refs/vbenchmark/tasks/saas-core/notifications/email-queue/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/notifications/in-app-alerts/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/saas-core/notifications/in-app-alerts/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/notifications/push-notifications/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/saas-core/notifications/push-notifications/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/realtime/websocket-chat/PROMPT.md +0 -80
- package/refs/vbenchmark/tasks/saas-core/realtime/websocket-chat/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/search/full-text-search/PROMPT.md +0 -51
- package/refs/vbenchmark/tasks/saas-core/search/full-text-search/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/security/rate-limiter/PROMPT.md +0 -99
- package/refs/vbenchmark/tasks/saas-core/security/rate-limiter/task.yaml +0 -27
- package/refs/vbenchmark/tasks/saas-core/settings/user-preferences/PROMPT.md +0 -78
- package/refs/vbenchmark/tasks/saas-core/settings/user-preferences/task.yaml +0 -27
- package/refs/vbenchmark/templates/fastapi-postgres/docker-compose.yaml +0 -36
- package/refs/vbenchmark/templates/fastapi-postgres/pyproject.toml +0 -34
- package/refs/vbenchmark/templates/fastapi-postgres/src/__init__.py +0 -0
- package/refs/vbenchmark/templates/fastapi-postgres/src/config.py +0 -12
- package/refs/vbenchmark/templates/fastapi-postgres/src/database.py +0 -15
- package/refs/vbenchmark/templates/fastapi-postgres/src/main.py +0 -51
- package/refs/vbenchmark/templates/fastapi-postgres/src/models.py +0 -12
- package/refs/vbenchmark/templates/fastapi-postgres/src/schemas.py +0 -20
- package/refs/vbenchmark/templates/go-fiber/docker-compose.yaml +0 -34
- package/refs/vbenchmark/templates/go-fiber/go.mod +0 -33
- package/refs/vbenchmark/templates/go-fiber/go.sum +0 -68
- package/refs/vbenchmark/templates/go-fiber/main.go +0 -98
- package/refs/vbenchmark/templates/nextjs-supabase/.env.example +0 -3
- package/refs/vbenchmark/templates/nextjs-supabase/docker-compose.yaml +0 -68
- package/refs/vbenchmark/templates/nextjs-supabase/src/app/globals.css +0 -13
- package/refs/vbenchmark/templates/nextjs-supabase/src/app/layout.tsx +0 -19
- package/refs/vbenchmark/templates/nextjs-supabase/src/app/page.tsx +0 -38
- package/refs/vbenchmark/templates/nextjs-supabase/src/lib/supabase/client.ts +0 -8
- package/refs/vbenchmark/templates/nextjs-supabase/src/lib/supabase/server.ts +0 -32
- package/refs/vbenchmark/templates/rust-axum/Cargo.lock +0 -2371
- package/refs/vbenchmark/templates/rust-axum/Cargo.toml +0 -16
- package/refs/vbenchmark/templates/rust-axum/docker-compose.yaml +0 -34
- package/refs/vbenchmark/templates/rust-axum/migrations/20240101000000_init.sql +0 -20
- package/refs/vbenchmark/templates/rust-axum/src/main.rs +0 -121
- package/refs/vbenchmark/tsconfig.base.json +0 -18
- package/refs/vbenchmark/turbo.json +0 -23
- package/refs/vbenchmark/vercel.json +0 -10
|
@@ -1,354 +0,0 @@
|
|
|
1
|
-
<p align="center">
|
|
2
|
-
<h1 align="center">🚀 VibeCodingBench</h1>
|
|
3
|
-
<p align="center">
|
|
4
|
-
<strong>The benchmark that measures what AI coding agents actually do in production</strong>
|
|
5
|
-
</p>
|
|
6
|
-
<p align="center">
|
|
7
|
-
<a href="#why-vibecodingbench">Why</a> •
|
|
8
|
-
<a href="#quick-start">Quick Start</a> •
|
|
9
|
-
<a href="#task-categories">Tasks</a> •
|
|
10
|
-
<a href="#evaluation">Evaluation</a> •
|
|
11
|
-
<a href="#leaderboard">Leaderboard</a> •
|
|
12
|
-
<a href="#contributing">Contributing</a>
|
|
13
|
-
</p>
|
|
14
|
-
<p align="center">
|
|
15
|
-
<img src="https://img.shields.io/badge/tasks-180-blue" alt="Tasks">
|
|
16
|
-
<img src="https://img.shields.io/badge/models-15-green" alt="Models">
|
|
17
|
-
<img src="https://img.shields.io/badge/languages-10-orange" alt="Languages">
|
|
18
|
-
|
|
19
|
-
<img src="https://img.shields.io/badge/version-1.0.0-brightgreen" alt="Version">
|
|
20
|
-
</p>
|
|
21
|
-
</p>
|
|
22
|
-
|
|
23
|
-
---
|
|
24
|
-
|
|
25
|
-
## Why VibeCodingBench?
|
|
26
|
-
|
|
27
|
-
**Existing benchmarks are disconnected from reality.** See our [full thesis](docs/THESIS.md) for detailed analysis.
|
|
28
|
-
|
|
29
|
-
| Benchmark | Focus | Real-World Signal | Limitation |
|
|
30
|
-
|-----------|-------|-------------------|------------|
|
|
31
|
-
| HumanEval | Algorithmic puzzles | ❌ Low | Not production code |
|
|
32
|
-
| SWE-bench | Bug fixes in 12 repos | ⚠️ Medium | [63% suspicious patches](https://runloop.ai/blog/swe-bench-deep-dive-unmasking-the-limitations-of-a-popular-benchmark) |
|
|
33
|
-
| SWE-bench Pro | Multi-file tasks | ⚠️ Medium | [70% → 23% performance drop](https://scale.com/leaderboard/swe_bench_pro_public) |
|
|
34
|
-
| **VibeCodingBench** | Full-stack features | ✅ **High** | Production-aligned tasks |
|
|
35
|
-
|
|
36
|
-
### The Evidence
|
|
37
|
-
|
|
38
|
-
**Developer Time Distribution** ([Sonar Research](https://www.sonarsource.com/blog/how-much-time-do-developers-spend-actually-writing-code/)):
|
|
39
|
-
- Writing new code: 32% | Code maintenance: 19% | Testing: 12%
|
|
40
|
-
- Developers code only **52 minutes/day** on average
|
|
41
|
-
|
|
42
|
-
**The Boilerplate Burden** ([GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/)):
|
|
43
|
-
- 2.4M repos use Notebooks (+75% YoY)
|
|
44
|
-
- 1.9M repos use Dockerfiles (+120% YoY)
|
|
45
|
-
- Developers need help with **repetitive patterns**: auth, CRUD, integrations
|
|
46
|
-
|
|
47
|
-
**SWE-EVO Exposes the Gap** ([arxiv:2512.18470](https://arxiv.org/abs/2512.18470)):
|
|
48
|
-
- Best models: 65% on simple fixes → **only 21% on code evolution**
|
|
49
|
-
- "Current AI agents struggle with comprehensive planning and execution"
|
|
50
|
-
|
|
51
|
-
**Quality Beyond Pass Rate** ([Qodo 2025](https://www.qodo.ai/reports/state-of-ai-code-quality/)):
|
|
52
|
-
- "Claude Sonnet 4 averaged **2.11 issues per passing task**"
|
|
53
|
-
- Pass rate alone hides production risks
|
|
54
|
-
|
|
55
|
-
**Developer Frustration** ([Stack Overflow 2025](https://survey.stackoverflow.co/2025/)):
|
|
56
|
-
- 66% cite "AI solutions almost right, but not quite" as top frustration
|
|
57
|
-
- 45% say "debugging AI code is more time-consuming"
|
|
58
|
-
|
|
59
|
-
## Quick Start
|
|
60
|
-
|
|
61
|
-
### From Source
|
|
62
|
-
|
|
63
|
-
```bash
|
|
64
|
-
git clone https://github.com/alt-research/vibe-coding-benchmark-public.git
|
|
65
|
-
cd vibe-coding-benchmark-public
|
|
66
|
-
npm install
|
|
67
|
-
npm run build
|
|
68
|
-
|
|
69
|
-
# List tasks
|
|
70
|
-
node packages/cli/dist/index.js list
|
|
71
|
-
|
|
72
|
-
# Run a task with mock agent
|
|
73
|
-
node packages/cli/dist/index.js run saas-core/auth/supabase-oauth --agent mock
|
|
74
|
-
|
|
75
|
-
# Run with real agent (requires API key)
|
|
76
|
-
export ANTHROPIC_API_KEY=your_key
|
|
77
|
-
node packages/cli/dist/index.js run saas-core/auth/supabase-oauth --agent claude
|
|
78
|
-
|
|
79
|
-
# Run full evaluation across agents
|
|
80
|
-
node packages/cli/dist/index.js eval --agents claude,glm,minimax
|
|
81
|
-
|
|
82
|
-
# Watch live execution
|
|
83
|
-
node packages/cli/dist/index.js run <task-id> --agent claude --live
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
## Task Categories
|
|
87
|
-
|
|
88
|
-
| Category | Weight | Tasks | Languages | Examples |
|
|
89
|
-
|----------|--------|-------|-----------|----------|
|
|
90
|
-
| **SaaS Core** | 25% | 20 | TS, Go, Python, Java, Rust | `supabase-oauth`, `jwt-refresh-tokens`, `rbac-permissions` |
|
|
91
|
-
| **Glue Code** | 20% | 20 | Python, Go, TS, Java, Rust | `csv-normalizer`, `kafka-producer`, `cdc-pipeline` |
|
|
92
|
-
| **AI Integration** | 20% | 20 | Python, TS, Go | `pdf-qa`, `research-agent`, `semantic-search` |
|
|
93
|
-
| **Frontend** | 15% | 20 | React, Vue, Svelte, RN | `landing-page`, `data-grid`, `collaborative-editor` |
|
|
94
|
-
| **API Integrations** | 10% | 20 | TS, Go, Python, Java | `checkout-session`, `twilio-sms`, `saml-sso` |
|
|
95
|
-
| **Code Evolution** | 10% | 20 | TS, Python, Go, Kotlin | `flask-to-fastapi`, `java-to-kotlin`, `secrets-rotation` |
|
|
96
|
-
|
|
97
|
-
**Total: 180 tasks** across **10 languages** (TypeScript, Python, Go, Java, Kotlin, Rust, C#, React, Vue, Svelte)
|
|
98
|
-
|
|
99
|
-
### Language Distribution
|
|
100
|
-
|
|
101
|
-
Based on [GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/) and [Stack Overflow Developer Survey 2025](https://survey.stackoverflow.co/2025/):
|
|
102
|
-
|
|
103
|
-
| Language | % of Tasks | Rationale |
|
|
104
|
-
|----------|------------|-----------|
|
|
105
|
-
| TypeScript/JavaScript | 40% | #1 on GitHub, dominant in web dev |
|
|
106
|
-
| Python | 25% | #2 on GitHub, AI/ML leader |
|
|
107
|
-
| Go | 15% | Rising for cloud-native, microservices |
|
|
108
|
-
| Java/Kotlin | 10% | Enterprise, Android development |
|
|
109
|
-
| Rust | 5% | Systems programming, performance-critical |
|
|
110
|
-
| C# | 5% | Enterprise, game development |
|
|
111
|
-
|
|
112
|
-
### Task Structure
|
|
113
|
-
|
|
114
|
-
Each task is a self-contained directory:
|
|
115
|
-
|
|
116
|
-
```
|
|
117
|
-
tasks/saas-core/auth/supabase-oauth/
|
|
118
|
-
├── task.yaml # Metadata, constraints
|
|
119
|
-
├── PROMPT.md # Instructions for the agent
|
|
120
|
-
├── tests/ # Evaluation tests
|
|
121
|
-
│ └── auth.test.ts # Playwright E2E tests
|
|
122
|
-
├── docker-compose.yaml # Services (DB, mock APIs)
|
|
123
|
-
└── golden/ # Reference implementation (optional)
|
|
124
|
-
```
|
|
125
|
-
|
|
126
|
-
**Hot-reload support**: Add new tasks while the benchmark is running!
|
|
127
|
-
|
|
128
|
-
## Evaluation
|
|
129
|
-
|
|
130
|
-
### Multi-Dimensional Scoring
|
|
131
|
-
|
|
132
|
-
We measure what senior engineers care about:
|
|
133
|
-
|
|
134
|
-
| Dimension | Weight | Method | Why It Matters |
|
|
135
|
-
|-----------|--------|--------|----------------|
|
|
136
|
-
| **Functional** | 40% | Playwright E2E, Pass@k | Does it work? |
|
|
137
|
-
| **Visual** | 20% | Pixel diff vs reference | Does it look right? |
|
|
138
|
-
| **Quality** | 20% | ESLint + Semgrep + complexity | Is it maintainable? |
|
|
139
|
-
| **Cost** | 10% | Tokens used, context pollution | Is it efficient? |
|
|
140
|
-
| **Speed** | 10% | Wall-clock time, step count | Is it fast? |
|
|
141
|
-
|
|
142
|
-
### Security Gate
|
|
143
|
-
|
|
144
|
-
Any **Critical** or **High** severity vulnerability is an **automatic fail**. We scan with Semgrep using OWASP rules.
|
|
145
|
-
|
|
146
|
-
### The Scoring Formula
|
|
147
|
-
|
|
148
|
-
```
|
|
149
|
-
Final = (Functional × 0.4) + (Visual × 0.2) + (Quality × 0.2)
|
|
150
|
-
- (Cost Penalty) - (Speed Penalty)
|
|
151
|
-
|
|
152
|
-
Security Fail → Final = 0
|
|
153
|
-
```
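
The formula above can be sketched in Python as follows. This is a minimal illustration, not the benchmark's implementation: the function name `final_score`, the 0-100 scale, and the clamp at zero are assumptions. The source does not define how the cost and speed penalties are computed, so they are passed in as already-computed point values.

```python
def final_score(functional, visual, quality,
                cost_penalty=0.0, speed_penalty=0.0,
                security_fail=False):
    """Combine dimension scores (assumed 0-100) into a final score."""
    # Security gate: a Critical/High finding zeroes the score.
    if security_fail:
        return 0.0
    # Weights taken from the formula: 0.4 / 0.2 / 0.2.
    weighted = functional * 0.4 + visual * 0.2 + quality * 0.2
    # Penalties are subtracted as given; clamping at 0 is an assumption.
    return max(0.0, weighted - cost_penalty - speed_penalty)


print(final_score(90, 80, 70, cost_penalty=2, speed_penalty=1))
print(final_score(90, 80, 70, security_fail=True))
```

Keeping the security check first makes the gate unconditional: no combination of dimension scores can offset a failed scan.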
|
|
154
|
-
|
|
155
|
-
## Supported Agents
|
|
156
|
-
|
|
157
|
-
| Agent | Model | Status | Config | Pricing (Input/Output per MTok) |
|
|
158
|
-
|-------|-------|--------|--------|--------------------------------|
|
|
159
|
-
| Claude | Haiku 4.5 | ✅ Supported | `ANTHROPIC_API_KEY` | $1.00 / $5.00 |
|
|
160
|
-
| Claude | Opus 4.5 | ✅ Supported | `ANTHROPIC_API_KEY` | $5.00 / $25.00 |
|
|
161
|
-
| Qwen | Qwen3-Max | ✅ Supported | `QWEN_API_KEY` | $1.20 / $6.00 |
|
|
162
|
-
| GLM | GLM-4.7 | ✅ Supported | `GLM_API_KEY` | $0.60 / $2.20 |
|
|
163
|
-
| MiniMax | M2.1 | ✅ Supported | `MINIMAX_API_KEY` | $0.30 / $1.20 |
|
|
164
|
-
| OpenAI | GPT-5.2 | ✅ Supported | `OPENAI_API_KEY` | $1.75 / $14.00 |
|
|
165
|
-
| DeepSeek | Chat-v3 | ✅ Supported | `DEEPSEEK_API_KEY` | $0.40 / $1.60 |
|
|
166
|
-
| Gemini | 3-Flash Preview | ✅ Supported | `GOOGLE_API_KEY` | $0.50 / $3.00 |
|
|
167
|
-
|
|
168
|
-
## Leaderboard
|
|
169
|
-
|
|
170
|
-
```
|
|
171
|
-
📈 LEADERBOARD (2026-01-27) - 180 tasks evaluated, 15 models
|
|
172
|
-
|
|
173
|
-
╔══════╤══════════════════════╤═══════╤═══════════╤════════════╤════════════╤══════════════╤═════════════╗
|
|
174
|
-
║ Rank │ Model │ Final │ Pass Rate │ Total Cost │ Total Time │ Avg Time/Task│ Total Tokens║
|
|
175
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
176
|
-
║ #1 │ Claude Opus 4.5 │ 89.2% │ 100.0% │ $12.31 │ 2h 12m │ 44s │ 648K ║
|
|
177
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
178
|
-
║ #2 │ Claude Haiku 4.5 │ 89.0% │ 99.4% │ $3.03 │ 1h 5m │ 22s │ 798K ║
|
|
179
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
180
|
-
║ #3 │ Grok 4 Fast │ 88.8% │ 98.9% │ $0.21 │ 1h 57m │ 70s │ 520K ║
|
|
181
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
182
|
-
║ #4 │ OpenAI GPT-5.2 │ 88.8% │ 98.3% │ $5.01 │ 1h 24m │ 28s │ 485K ║
|
|
183
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
184
|
-
║ #5 │ Qwen3 Max │ 88.6% │ 100.0% │ $5.42 │ 2h 15m │ 45s │ 949K ║
|
|
185
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
186
|
-
║ #6 │ Claude Sonnet 4.5 │ 88.6% │ 98.3% │ $6.98 │ 2h 6m │ 42s │ 612K ║
|
|
187
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
188
|
-
║ #7 │ GLM 4-Plus │ 88.2% │ 98.9% │ $0.93 │ 4h 49m │ 96s │ 794K ║
|
|
189
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
190
|
-
║ #8 │ DeepSeek v3.2 │ 88.2% │ 98.3% │ $0.50 │ 4h 29m │ 90s │ 543K ║
|
|
191
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
192
|
-
║ #9 │ Grok 4 │ 88.0% │ 97.8% │ $5.47 │ 2h 5m │ 75s │ 480K ║
|
|
193
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
194
|
-
║ #10 │ MiniMax M2.1 │ 87.4% │ 99.4% │ $2.40 │ 8h 15m │ 165s │ 2.78M ║
|
|
195
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
196
|
-
║ #11 │ Grok 4.1 Fast │ 86.8% │ 97.2% │ $0.24 │ 2h 27m │ 89s │ 580K ║
|
|
197
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
198
|
-
║ #12 │ Gemini 3 Pro Preview │ 85.8% │ 95.0% │ $10.34 │ 1h 36m │ 32s │ 738K ║
|
|
199
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
200
|
-
║ #13 │ GLM-4.7 │ 83.9% │ 85.6% │ $0.73 │ 2h 50m │ 57s │ 623K ║
|
|
201
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
202
|
-
║ #14 │ GLM 4.7 Flash │ 83.8% │ 92.8% │ $1.11 │ 2h 15m │ 45s │ 650K ║
|
|
203
|
-
╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
|
|
204
|
-
║ #15 │ Gemini 3 Flash │ 83.4% │ 92.2% │ $0.86 │ 1h 23m │ 28s │ 384K ║
|
|
205
|
-
╚══════╧══════════════════════╧═══════╧═══════════╧════════════╧════════════╧══════════════╧═════════════╝
|
|
206
|
-
```
|
|
207
|
-
|
|
208
|
-
### Pricing (OpenRouter 2026-01-27)
|
|
209
|
-
|
|
210
|
-
| Model | Input $/M | Output $/M |
|
|
211
|
-
|-------|-----------|------------|
|
|
212
|
-
| Claude Opus 4.5 | $5.00 | $25.00 |
|
|
213
|
-
| Claude Sonnet 4.5 | $3.00 | $15.00 |
|
|
214
|
-
| Claude Haiku 4.5 | $1.00 | $5.00 |
|
|
215
|
-
| Qwen3 Max | $1.20 | $6.00 |
|
|
216
|
-
| OpenAI GPT-5.2 | $1.75 | $14.00 |
|
|
217
|
-
| Grok 4 | $3.00 | $15.00 |
|
|
218
|
-
| Grok 4 Fast | $0.20 | $0.50 |
|
|
219
|
-
| Grok 4.1 Fast | $0.20 | $0.50 |
|
|
220
|
-
| GLM 4-Plus/4.7 | $0.40 | $1.50 |
|
|
221
|
-
| GLM 4.7 Flash | $0.07 | $0.40 |
|
|
222
|
-
| DeepSeek v3.2 | $0.30 | $1.20 |
|
|
223
|
-
| Gemini 3 Flash | $0.50 | $3.00 |
|
|
224
|
-
| Gemini 3 Pro | $2.00 | $12.00 |
|
|
225
|
-
| MiniMax M2.1 | $0.27 | $1.12 |
|
|
226
|
-
|
|
227
|
-
### Detailed Metrics
|
|
228
|
-
|
|
229
|
-
| Model | Functional | Quality | Cost/Task | Tokens/Task |
|
|
230
|
-
|-------|------------|---------|-----------|-------------|
|
|
231
|
-
| Claude Opus 4.5 | 85.0% | 80.0% | $0.0684 | 3,599 |
|
|
232
|
-
| Claude Haiku 4.5 | 84.5% | 79.6% | $0.0168 | 4,435 |
|
|
233
|
-
| Grok 4 Fast | 84.1% | 80.0% | $0.0012 | 2,889 |
|
|
234
|
-
| Qwen3 Max | 85.0% | 80.0% | $0.0301 | 5,273 |
|
|
235
|
-
| OpenAI GPT-5.2 | 83.6% | 79.6% | $0.0278 | 2,694 |
|
|
236
|
-
| Claude Sonnet 4.5 | 83.6% | 80.0% | $0.0388 | 3,400 |
|
|
237
|
-
| GLM 4-Plus | 84.1% | 80.0% | $0.0052 | 4,412 |
|
|
238
|
-
| DeepSeek v3.2 | 83.6% | 80.0% | $0.0028 | 3,015 |
|
|
239
|
-
| Grok 4 | 83.6% | 80.0% | $0.0304 | 2,667 |
|
|
240
|
-
| MiniMax M2.1 | 84.5% | 80.0% | $0.0133 | 15,436 |
|
|
241
|
-
| Grok 4.1 Fast | 82.6% | 78.7% | $0.0013 | 3,222 |
|
|
242
|
-
| Gemini 3 Pro Preview | 80.8% | 77.3% | $0.0574 | 4,099 |
|
|
243
|
-
| GLM-4.7 | 72.7% | 79.6% | $0.0041 | 3,464 |
|
|
244
|
-
| GLM 4.7 Flash | 78.9% | 79.6% | $0.0062 | 3,611 |
|
|
245
|
-
| Gemini 3 Flash | 78.4% | 75.1% | $0.0048 | 2,133 |
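
The per-task figures appear to be run totals divided by the 180 evaluated tasks. For the Claude Opus 4.5 row, the $12.31 total cost and 2h 12m total time from the leaderboard work out to about $0.0684 and 44s per task. A minimal sketch of that arithmetic, where the helper name `per_task` is hypothetical:

```python
TASKS_EVALUATED = 180  # from the leaderboard header


def per_task(total, tasks=TASKS_EVALUATED):
    """Average a run-level total over the number of evaluated tasks."""
    return total / tasks


# Claude Opus 4.5 row from the leaderboard: $12.31 total, 2h 12m total.
cost_per_task = per_task(12.31)
seconds_per_task = per_task(2 * 3600 + 12 * 60)

print(f"${cost_per_task:.4f} per task")
print(f"{seconds_per_task:.0f}s per task")
```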
|
|
246
|
-
|
|
247
|
-
**Live Dashboard**: https://vibecoding.llmbench.xyz
|
|
248
|
-
|
|
249
|
-
## Contributing
|
|
250
|
-
|
|
251
|
-
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for details.
|
|
252
|
-
|
|
253
|
-
### Adding a New Task
|
|
254
|
-
|
|
255
|
-
1. **Create task directory**:
|
|
256
|
-
```bash
|
|
257
|
-
mkdir -p tasks/<category>/<subcategory>/<task-name>
|
|
258
|
-
```
|
|
259
|
-
|
|
260
|
-
2. **Add task.yaml**:
|
|
261
|
-
```yaml
|
|
262
|
-
name: My New Task
|
|
263
|
-
category: saas-core
|
|
264
|
-
difficulty: medium
|
|
265
|
-
stack: nextjs-supabase
|
|
266
|
-
tags: [typescript, auth]
|
|
267
|
-
```
|
|
268
|
-
|
|
269
|
-
3. **Write PROMPT.md** with clear requirements
|
|
270
|
-
|
|
271
|
-
4. **Add tests** (Playwright for web, pytest for Python)
|
|
272
|
-
|
|
273
|
-
5. **Submit PR** using the template
|
|
274
|
-
|
|
275
|
-
## Architecture
|
|
276
|
-
|
|
277
|
-
```
|
|
278
|
-
vibecodingbench/
|
|
279
|
-
├── packages/
|
|
280
|
-
│ ├── cli/ # CLI tool
|
|
281
|
-
│ ├── evaluator/ # Scoring engine
|
|
282
|
-
│ └── leaderboard/ # Web dashboard
|
|
283
|
-
├── tasks/ # 120 benchmark tasks
|
|
284
|
-
│ ├── saas-core/ # 20 tasks
|
|
285
|
-
│ ├── glue-code/ # 20 tasks
|
|
286
|
-
│ ├── ai-integration/ # 20 tasks
|
|
287
|
-
│ ├── frontend/ # 20 tasks
|
|
288
|
-
│ ├── api-integrations/ # 20 tasks
|
|
289
|
-
│ └── code-evolution/ # 20 tasks
|
|
290
|
-
├── templates/ # Starter codebases
|
|
291
|
-
│ ├── nextjs-supabase/
|
|
292
|
-
│ ├── fastapi-postgres/
|
|
293
|
-
│ ├── go-fiber/
|
|
294
|
-
│ └── rust-axum/
|
|
295
|
-
└── docker/ # Base images
|
|
296
|
-
```
|
|
297
|
-
|
|
298
|
-
## Deployment
|
|
299
|
-
|
|
300
|
-
### Self-Hosted (Docker)
|
|
301
|
-
|
|
302
|
-
```bash
|
|
303
|
-
# Build and run production stack
|
|
304
|
-
./scripts/deploy.sh docker
|
|
305
|
-
|
|
306
|
-
# Or in background
|
|
307
|
-
./scripts/deploy.sh docker --detach
|
|
308
|
-
|
|
309
|
-
# Services available at:
|
|
310
|
-
# - Dashboard: http://localhost:3000
|
|
311
|
-
# - API: http://localhost:3001
|
|
312
|
-
```
|
|
313
|
-
|
|
314
|
-
### Fly.io
|
|
315
|
-
|
|
316
|
-
```bash
|
|
317
|
-
cd packages/leaderboard
|
|
318
|
-
fly launch --config fly.toml
|
|
319
|
-
fly deploy
|
|
320
|
-
```
|
|
321
|
-
|
|
322
|
-
## Environment Setup
|
|
323
|
-
|
|
324
|
-
```bash
|
|
325
|
-
# Required
|
|
326
|
-
export ANTHROPIC_API_KEY=... # Claude (Anthropic)
|
|
327
|
-
export OPENAI_API_KEY=... # OpenAI
|
|
328
|
-
export GOOGLE_API_KEY=... # Gemini (Google AI)
|
|
329
|
-
|
|
330
|
-
# Optional
|
|
331
|
-
export GLM_API_KEY=... # GLM (Zhipu AI)
|
|
332
|
-
export MINIMAX_API_KEY=... # MiniMax
|
|
333
|
-
export QWEN_API_KEY=... # Qwen (Alibaba DashScope)
|
|
334
|
-
export DEEPSEEK_API_KEY=... # DeepSeek
|
|
335
|
-
```
|
|
336
|
-
|
|
337
|
-
## Citation
|
|
338
|
-
|
|
339
|
-
If you use VibeCodingBench in your research, please cite:
|
|
340
|
-
|
|
341
|
-
```bibtex
|
|
342
|
-
@software{vibecodingbench2025,
|
|
343
|
-
title = {VibeCodingBench: A Benchmark for AI Coding Agents on Real-World Developer Tasks},
|
|
344
|
-
year = {2025},
|
|
345
|
-
url = {https://github.com/alt-research/vibe-coding-benchmark-public}
|
|
346
|
-
}
|
|
347
|
-
```
|
|
348
|
-
|
|
349
|
-
|
|
350
|
-
---
|
|
351
|
-
|
|
352
|
-
<p align="center">
|
|
353
|
-
<sub>Built with ❤️ by the open-source community</sub>
|
|
354
|
-
</p>
|
|
@@ -1,35 +0,0 @@
|
|
|
1
|
-
version: '3.8'
|
|
2
|
-
|
|
3
|
-
services:
|
|
4
|
-
leaderboard:
|
|
5
|
-
build:
|
|
6
|
-
context: ./packages/leaderboard
|
|
7
|
-
dockerfile: Dockerfile
|
|
8
|
-
ports:
|
|
9
|
-
- "3001:3001"
|
|
10
|
-
environment:
|
|
11
|
-
- NODE_ENV=production
|
|
12
|
-
- PORT=3001
|
|
13
|
-
- DATABASE_URL=${DATABASE_URL:-}
|
|
14
|
-
restart: unless-stopped
|
|
15
|
-
healthcheck:
|
|
16
|
-
test: ["CMD", "wget", "-qO-", "http://localhost:3001/health"]
|
|
17
|
-
interval: 30s
|
|
18
|
-
timeout: 10s
|
|
19
|
-
retries: 3
|
|
20
|
-
|
|
21
|
-
dashboard:
|
|
22
|
-
build:
|
|
23
|
-
context: ./packages/dashboard
|
|
24
|
-
dockerfile: Dockerfile
|
|
25
|
-
ports:
|
|
26
|
-
- "3000:3000"
|
|
27
|
-
environment:
|
|
28
|
-
- VITE_API_URL=http://leaderboard:3001
|
|
29
|
-
depends_on:
|
|
30
|
-
- leaderboard
|
|
31
|
-
restart: unless-stopped
|
|
32
|
-
|
|
33
|
-
networks:
|
|
34
|
-
default:
|
|
35
|
-
name: vibecodingbench
|
|
@@ -1,53 +0,0 @@
|
|
|
1
|
-
version: '3.8'
|
|
2
|
-
|
|
3
|
-
services:
|
|
4
|
-
postgres:
|
|
5
|
-
image: postgres:16-alpine
|
|
6
|
-
container_name: benchmark-postgres
|
|
7
|
-
environment:
|
|
8
|
-
POSTGRES_USER: benchmark
|
|
9
|
-
POSTGRES_PASSWORD: benchmark123
|
|
10
|
-
POSTGRES_DB: vibecodingbench
|
|
11
|
-
ports:
|
|
12
|
-
- "5432:5432"
|
|
13
|
-
volumes:
|
|
14
|
-
- postgres_data:/var/lib/postgresql/data
|
|
15
|
-
healthcheck:
|
|
16
|
-
test: ["CMD-SHELL", "pg_isready -U benchmark -d vibecodingbench"]
|
|
17
|
-
interval: 5s
|
|
18
|
-
timeout: 5s
|
|
19
|
-
retries: 5
|
|
20
|
-
|
|
21
|
-
leaderboard:
|
|
22
|
-
build:
|
|
23
|
-
context: ./packages/leaderboard
|
|
24
|
-
dockerfile: Dockerfile
|
|
25
|
-
ports:
|
|
26
|
-
- "3001:3001"
|
|
27
|
-
environment:
|
|
28
|
-
- NODE_ENV=production
|
|
29
|
-
- PORT=3001
|
|
30
|
-
- DATABASE_URL=postgresql://benchmark:benchmark123@postgres:5432/vibecodingbench
|
|
31
|
-
depends_on:
|
|
32
|
-
postgres:
|
|
33
|
-
condition: service_healthy
|
|
34
|
-
restart: unless-stopped
|
|
35
|
-
|
|
36
|
-
dashboard:
|
|
37
|
-
build:
|
|
38
|
-
context: ./packages/dashboard
|
|
39
|
-
dockerfile: Dockerfile
|
|
40
|
-
ports:
|
|
41
|
-
- "3000:3000"
|
|
42
|
-
environment:
|
|
43
|
-
- VITE_API_URL=http://leaderboard:3001
|
|
44
|
-
depends_on:
|
|
45
|
-
- leaderboard
|
|
46
|
-
restart: unless-stopped
|
|
47
|
-
|
|
48
|
-
volumes:
|
|
49
|
-
postgres_data:
|
|
50
|
-
|
|
51
|
-
networks:
|
|
52
|
-
default:
|
|
53
|
-
name: vibecodingbench
|
|
@@ -1,211 +0,0 @@
|
|
|
1
|
-
# VibeCodingBench Task Expansion Plan
|
|
2
|
-
|
|
3
|
-
## Overview
|
|
4
|
-
Expanding from 18 tasks to 120 tasks (20 per category) with multi-language support.
|
|
5
|
-
|
|
6
|
-
## Language Distribution
|
|
7
|
-
Based on GitHub Octoverse 2025 and the Stack Overflow Developer Survey 2025:
|
|
8
|
-
- **TypeScript/JavaScript**: 40% (most used on GitHub)
|
|
9
|
-
- **Python**: 25% (dominant in AI/data)
|
|
10
|
-
- **Go**: 15% (cloud-native, microservices)
|
|
11
|
-
- **Java/Kotlin**: 10% (enterprise)
|
|
12
|
-
- **Rust**: 5% (systems, performance)
|
|
13
|
-
- **C#**: 5% (enterprise, game dev)
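
Applied to the 120-task target from the overview, these percentages translate into nominal per-language task counts. The sketch below only does that arithmetic; the names `TOTAL_TASKS` and `LANGUAGE_SHARE` are illustrative, and the per-category lists that follow may not hit these targets exactly.

```python
TOTAL_TASKS = 120  # target size stated in the plan overview

# Percentages from the Language Distribution list.
LANGUAGE_SHARE = {
    "TypeScript/JavaScript": 40,
    "Python": 25,
    "Go": 15,
    "Java/Kotlin": 10,
    "Rust": 5,
    "C#": 5,
}

# Integer arithmetic avoids floating-point rounding surprises.
task_counts = {lang: TOTAL_TASKS * pct // 100
               for lang, pct in LANGUAGE_SHARE.items()}

for lang, count in task_counts.items():
    print(f"{lang}: {count}")
```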
|
|
14
|
-
|
|
15
|
-
---
|
|
16
|
-
|
|
17
|
-
## Category 1: saas-core (20 tasks)
|
|
18
|
-
|
|
19
|
-
### Existing (6):
|
|
20
|
-
1. auth/supabase-oauth (TypeScript)
|
|
21
|
-
2. auth/mfa-totp (TypeScript)
|
|
22
|
-
3. crud/dashboard-table (TypeScript)
|
|
23
|
-
4. settings/user-preferences (TypeScript)
|
|
24
|
-
5. realtime/websocket-chat (TypeScript)
|
|
25
|
-
6. security/rate-limiter (TypeScript)
|
|
26
|
-
|
|
27
|
-
### New (14):
|
|
28
|
-
7. auth/jwt-refresh-tokens (Go) - Implement JWT with refresh token rotation
|
|
29
|
-
8. auth/magic-link-email (Python/FastAPI) - Passwordless email authentication
|
|
30
|
-
9. auth/rbac-permissions (Java/Spring) - Role-based access control system
|
|
31
|
-
10. auth/session-management (Rust/Actix) - Secure session handling with Redis
|
|
32
|
-
11. billing/stripe-subscriptions (TypeScript) - Subscription management with Stripe
|
|
33
|
-
12. billing/usage-metering (Go) - Track and bill based on API usage
|
|
34
|
-
13. billing/invoice-generation (Python) - Generate PDF invoices with line items
|
|
35
|
-
14. multi-tenant/org-isolation (TypeScript) - Database-per-tenant isolation
|
|
36
|
-
15. multi-tenant/subdomain-routing (Go) - Route requests by subdomain
|
|
37
|
-
16. notifications/email-queue (Python) - Async email notification system
|
|
38
|
-
17. notifications/push-notifications (TypeScript) - Web push with service workers
|
|
39
|
-
18. notifications/in-app-alerts (Java/Spring) - Real-time in-app notifications
|
|
40
|
-
19. audit/activity-logging (Go) - Comprehensive audit trail system
|
|
41
|
-
20. search/full-text-search (TypeScript) - Elasticsearch integration for search
|
|
42
|
-
|
|
43
|
-
---
|
|
44
|
-
|
|
45
|
-
## Category 2: glue-code (20 tasks)
|
|
46
|
-
|
|
47
|
-
### Existing (3):
|
|
48
|
-
1. data-transform/excel-to-json (Python)
|
|
49
|
-
2. api-sync/rest-to-graphql (TypeScript)
|
|
50
|
-
3. caching/redis-cache (TypeScript)
|
|
51
|
-
|
|
52
|
-
### New (17):
|
|
53
|
-
4. data-transform/csv-normalizer (Python) - Clean and normalize CSV data
|
|
54
|
-
5. data-transform/json-to-xml (Go) - Bidirectional JSON/XML conversion
|
|
55
|
-
6. data-transform/protobuf-converter (Rust) - Protocol buffer serialization
|
|
56
|
-
7. data-transform/avro-schema-evolution (Java) - Handle Avro schema changes
|
|
57
|
-
8. etl/database-sync (Python) - Sync data between PostgreSQL and MongoDB
|
|
58
|
-
9. etl/s3-to-warehouse (Go) - Load S3 files into data warehouse
|
|
59
|
-
10. etl/cdc-pipeline (TypeScript) - Change data capture with Debezium
|
|
60
|
-
11. queue/rabbitmq-consumer (Python) - Reliable message queue processing
|
|
61
|
-
12. queue/kafka-producer (Go) - High-throughput Kafka event publishing
|
|
62
|
-
13. queue/sqs-batch-processor (TypeScript) - AWS SQS batch processing
|
|
63
|
-
14. scheduler/cron-job-manager (Go) - Distributed cron job scheduling
|
|
64
|
-
15. scheduler/delayed-tasks (Python) - Celery-based delayed task execution
|
|
65
|
-
16. file-processing/image-resizer (Rust) - High-performance image processing
|
|
66
|
-
17. file-processing/pdf-merger (Python) - Merge and manipulate PDFs
|
|
67
|
-
18. file-processing/video-transcoder (Go) - FFmpeg-based video processing
|
|
68
|
-
19. migration/database-versioning (TypeScript) - Schema migration system
|
|
69
|
-
20. migration/data-backfill (Python) - Backfill data with progress tracking
|
|
70
|
-
|
|
71
|
-
---
|
|
72
|
-
|
|
73
|
-
## Category 3: ai-integration (20 tasks)
|
|
74
|
-
|
|
75
|
-
### Existing (2):
|
|
76
|
-
1. structured-output/invoice-parser (Python)
|
|
77
|
-
2. rag-chatbot/pdf-qa (Python)
|
|
78
|
-
|
|
79
|
-
### New (18):
|
|
80
|
-
3. structured-output/resume-parser (Python) - Extract structured data from resumes
|
|
81
|
-
4. structured-output/receipt-scanner (TypeScript) - OCR + LLM receipt extraction
|
|
82
|
-
5. structured-output/contract-analyzer (Python) - Legal document analysis
|
|
83
|
-
6. rag-chatbot/code-assistant (TypeScript) - Codebase Q&A with RAG
|
|
84
|
-
7. rag-chatbot/support-bot (Python) - Customer support with knowledge base
|
|
85
|
-
8. rag-chatbot/doc-search (Go) - Multi-document semantic search
|
|
86
|
-
9. agents/web-scraper-agent (Python) - Autonomous web data extraction
|
|
87
|
-
10. agents/research-agent (TypeScript) - Multi-step research automation
|
|
88
|
-
11. agents/code-review-agent (Python) - Automated PR review with LLM
|
|
89
|
-
12. function-calling/api-orchestrator (TypeScript) - LLM-driven API calls
|
|
90
|
-
13. function-calling/database-query (Python) - Natural language to SQL
|
|
91
|
-
14. function-calling/calendar-assistant (TypeScript) - Schedule management agent
|
|
92
|
-
15. embeddings/semantic-search (Python) - Vector similarity search
|
|
93
|
-
16. embeddings/recommendation-engine (Go) - Content recommendations
|
|
94
|
-
17. embeddings/duplicate-detection (Python) - Find similar documents
|
|
95
|
-
18. fine-tuning/classification-model (Python) - Fine-tune for text classification
|
|
96
|
-
19. multimodal/image-captioning (Python) - Generate image descriptions
|
|
97
|
-
20. multimodal/chart-interpreter (TypeScript) - Extract data from chart images
|
|
98
|
-
|
|
99
|
-
---
|
|
100
|
-
|
|
101
|
-
## Category 4: frontend (20 tasks)
|
|
102
|
-
|
|
103
|
-
### Existing (3):
|
|
104
|
-
1. figma-to-code/pricing-card (TypeScript)
|
|
105
|
-
2. visualization/chart-dashboard (TypeScript)
|
|
106
|
-
3. components/form-builder (TypeScript)
|
|
107
|
-
|
|
108
|
-
### New (17):
|
|
109
|
-
4. figma-to-code/landing-page (TypeScript/React) - Full landing page from design
|
|
110
|
-
5. figma-to-code/dashboard-layout (TypeScript/Vue) - Admin dashboard UI
|
|
111
|
-
6. figma-to-code/mobile-app-screen (TypeScript/React Native) - Mobile UI
|
|
112
|
-
7. components/data-grid (TypeScript/React) - Advanced data grid with virtual scroll
|
|
113
|
-
8. components/rich-text-editor (TypeScript/Vue) - WYSIWYG editor with plugins
|
|
114
|
-
9. components/file-uploader (TypeScript/React) - Drag-drop with preview
|
|
115
|
-
10. components/date-range-picker (TypeScript/Svelte) - Complex date selection
|
|
116
|
-
11. visualization/realtime-charts (TypeScript/React) - Live updating charts
|
|
117
|
-
12. visualization/map-dashboard (TypeScript/Vue) - Geographic data viz
|
|
118
|
-
13. visualization/gantt-chart (TypeScript/React) - Project timeline view
|
|
119
|
-
14. state-management/shopping-cart (TypeScript/React) - Complex cart with Redux
|
|
120
|
-
15. state-management/collaborative-editor (TypeScript/Vue) - Real-time collab
|
|
121
|
-
16. accessibility/screen-reader-nav (TypeScript/React) - WCAG compliant nav
|
|
122
|
-
17. accessibility/keyboard-shortcuts (TypeScript/Vue) - Full keyboard support
|
|
123
|
-
18. performance/infinite-scroll (TypeScript/React) - Virtualized infinite list
|
|
124
|
-
19. performance/image-lazy-load (TypeScript/Svelte) - Optimized image loading
|
|
125
|
-
20. animation/page-transitions (TypeScript/React) - Smooth route animations
|
|
126
|
-
|
|
127
|
-
---
|
|
128
|
-
|
|
129
|
-
## Category 5: api-integrations (20 tasks)
|
|
130
|
-
|
|
131
|
-
### Existing (2):
|
|
132
|
-
1. stripe/payment-webhook (TypeScript)
|
|
133
|
-
2. email/transactional (TypeScript)
|
|
134
|
-
|
|
135
|
-
### New (18):
|
|
136
|
-
3. stripe/checkout-session (Go) - Create Stripe checkout flows
|
|
137
|
-
4. stripe/subscription-portal (TypeScript) - Customer billing portal
|
|
138
|
-
5. payment/paypal-integration (Python) - PayPal payments and refunds
|
|
139
|
-
6. payment/crypto-payments (TypeScript) - Accept cryptocurrency
|
|
140
|
-
7. storage/s3-presigned-urls (Go) - Secure file uploads to S3
|
|
141
|
-
8. storage/cloudinary-upload (TypeScript) - Image upload and transform
|
|
142
|
-
9. storage/gcs-streaming (Python) - Stream large files to GCS
|
|
143
|
-
10. auth-provider/oauth2-github (Go) - GitHub OAuth integration
|
|
144
|
-
11. auth-provider/saml-sso (Java/Spring) - Enterprise SAML SSO
|
|
145
|
-
12. auth-provider/okta-integration (TypeScript) - Okta user management
|
|
146
|
-
13. communication/twilio-sms (Python) - SMS notifications
|
|
147
|
-
14. communication/slack-bot (TypeScript) - Slack app with slash commands
|
|
148
|
-
15. communication/discord-webhook (Go) - Discord notifications
|
|
149
|
-
16. maps/google-maps-geocoding (TypeScript) - Address to coordinates
|
|
150
|
-
17. maps/mapbox-directions (Python) - Route calculation
|
|
151
|
-
18. analytics/segment-tracking (TypeScript) - Event tracking pipeline
|
|
152
|
-
19. analytics/mixpanel-events (Go) - User behavior analytics
|
|
153
|
-
20. social/twitter-api (Python) - Tweet posting and monitoring
|
|
154
|
-
|
|
155
|
-
---
|
|
156
|
-
|
|
157
|
-
## Category 6: code-evolution (20 tasks)
|
|
158
|
-
|
|
159
|
-
### Existing (2):
|
|
160
|
-
1. legacy-migration/express-to-fastify (TypeScript)
|
|
161
|
-
2. refactoring/class-to-hooks (TypeScript)
|
|
162
|
-
|
|
163
|
-
### New (18):
|
|
164
|
-
3. legacy-migration/callback-to-async (TypeScript) - Callback hell to async/await
|
|
165
|
-
4. legacy-migration/jquery-to-react (TypeScript) - jQuery app to React
|
|
166
|
-
5. legacy-migration/flask-to-fastapi (Python) - Flask to FastAPI migration
|
|
167
|
-
6. legacy-migration/java-to-kotlin (Kotlin) - Java codebase to Kotlin
|
|
168
|
-
7. legacy-migration/rest-to-grpc (Go) - REST API to gRPC
|
|
169
|
-
8. refactoring/monolith-to-modules (TypeScript) - Extract modules from monolith
|
|
170
|
-
9. refactoring/orm-migration (Python) - SQLAlchemy to async SQLModel
|
|
171
|
-
10. refactoring/dependency-injection (Java/Spring) - Add DI to legacy code
|
|
172
|
-
11. refactoring/error-handling (Go) - Standardize error handling
|
|
173
|
-
12. testing/add-unit-tests (TypeScript) - Add tests to untested code
|
|
174
|
-
13. testing/e2e-playwright (TypeScript) - Add E2E tests with Playwright
|
|
175
|
-
14. testing/pytest-fixtures (Python) - Refactor tests with fixtures
|
|
176
|
-
15. performance/query-optimization (Python) - Optimize slow DB queries
|
|
177
|
-
16. performance/memory-leak-fix (TypeScript) - Fix memory leaks in Node.js
|
|
178
|
-
17. performance/async-refactor (Python) - Convert sync code to async for I/O-bound work
|
|
179
|
-
18. security/sql-injection-fix (Python) - Fix SQL injection vulnerabilities
|
|
180
|
-
19. security/xss-prevention (TypeScript) - Add XSS protection
|
|
181
|
-
20. security/secrets-rotation (Go) - Implement secrets rotation
|
|
182
|
-
|
|
183
|
-
---
|
|
184
|
-
|
|
185
|
-
## Implementation Priority
|
|
186
|
-
|
|
187
|
-
### Phase 1 (High Priority - Common Tasks)
|
|
188
|
-
- All auth tasks
|
|
189
|
-
- All billing tasks
|
|
190
|
-
- All payment integrations
|
|
191
|
-
- RAG and agent tasks
|
|
192
|
-
|
|
193
|
-
### Phase 2 (Medium Priority - Enterprise)
|
|
194
|
-
- Multi-tenant tasks
|
|
195
|
-
- SAML/SSO integrations
|
|
196
|
-
- Audit logging
|
|
197
|
-
- Migration tasks
|
|
198
|
-
|
|
199
|
-
### Phase 3 (Lower Priority - Specialized)
|
|
200
|
-
- Multimodal AI tasks
|
|
201
|
-
- Advanced visualizations
|
|
202
|
-
- Performance optimization tasks
|
|
203
|
-
|
|
204
|
-
---
|
|
205
|
-
|
|
206
|
-
## Sources
|
|
207
|
-
- [GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/)
|
|
208
|
-
- [Stack Overflow Developer Survey 2025](https://survey.stackoverflow.co/2025/)
|
|
209
|
-
- [HackerRank Real-World Coding Challenges 2025](https://www.hackerrank.com/writing/design-real-world-coding-challenges-junior-backend-developer-screening-2025)
|
|
210
|
-
- [LangChain State of Agent Engineering](https://www.langchain.com/state-of-agent-engineering)
|
|
211
|
-
- [WorkOS Multi-Tenant Architecture Guide](https://workos.com/blog/developers-guide-saas-multi-tenant-architecture)
|