coderace 0.2.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. coderace-0.4.0/CHANGELOG.md +56 -0
  2. coderace-0.4.0/PKG-INFO +412 -0
  3. coderace-0.4.0/README.md +382 -0
  4. coderace-0.4.0/action.yml +120 -0
  5. coderace-0.4.0/all-day-build-contract-ci-integration.md +104 -0
  6. coderace-0.4.0/all-day-build-contract-cost-tracking.md +97 -0
  7. {coderace-0.2.0 → coderace-0.4.0}/coderace/__init__.py +1 -1
  8. coderace-0.4.0/coderace/adapters/aider.py +33 -0
  9. {coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/base.py +32 -1
  10. {coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/claude.py +13 -0
  11. coderace-0.4.0/coderace/adapters/codex.py +33 -0
  12. coderace-0.4.0/coderace/adapters/gemini.py +32 -0
  13. coderace-0.4.0/coderace/adapters/opencode.py +31 -0
  14. {coderace-0.2.0 → coderace-0.4.0}/coderace/cli.py +115 -2
  15. coderace-0.4.0/coderace/commands/__init__.py +1 -0
  16. coderace-0.4.0/coderace/commands/diff.py +159 -0
  17. coderace-0.4.0/coderace/commands/results.py +116 -0
  18. coderace-0.4.0/coderace/cost.py +456 -0
  19. {coderace-0.2.0 → coderace-0.4.0}/coderace/html_report.py +15 -0
  20. {coderace-0.2.0 → coderace-0.4.0}/coderace/reporter.py +26 -0
  21. {coderace-0.2.0 → coderace-0.4.0}/coderace/scorer.py +1 -0
  22. {coderace-0.2.0 → coderace-0.4.0}/coderace/stats.py +9 -0
  23. {coderace-0.2.0 → coderace-0.4.0}/coderace/task.py +33 -0
  24. {coderace-0.2.0 → coderace-0.4.0}/coderace/types.py +7 -0
  25. {coderace-0.2.0 → coderace-0.4.0}/examples/add-type-hints.yaml +5 -0
  26. coderace-0.4.0/examples/ci-race-on-pr.yml +66 -0
  27. {coderace-0.2.0 → coderace-0.4.0}/examples/example-task.yaml +8 -0
  28. {coderace-0.2.0 → coderace-0.4.0}/examples/fix-edge-case.yaml +5 -0
  29. {coderace-0.2.0 → coderace-0.4.0}/examples/write-tests.yaml +5 -0
  30. coderace-0.4.0/progress-log.md +92 -0
  31. {coderace-0.2.0 → coderace-0.4.0}/pyproject.toml +1 -1
  32. coderace-0.4.0/scripts/ci-run.sh +61 -0
  33. coderace-0.4.0/scripts/format-comment.py +172 -0
  34. coderace-0.4.0/tests/test_cost.py +432 -0
  35. coderace-0.4.0/tests/test_cost_config.py +311 -0
  36. coderace-0.4.0/tests/test_cost_integration.py +374 -0
  37. coderace-0.4.0/tests/test_diff.py +234 -0
  38. coderace-0.4.0/tests/test_format_comment.py +215 -0
  39. coderace-0.4.0/tests/test_markdown_results.py +226 -0
  40. {coderace-0.2.0 → coderace-0.4.0}/uv.lock +1 -1
  41. coderace-0.2.0/CHANGELOG.md +0 -34
  42. coderace-0.2.0/PKG-INFO +0 -211
  43. coderace-0.2.0/README.md +0 -181
  44. coderace-0.2.0/coderace/adapters/aider.py +0 -20
  45. coderace-0.2.0/coderace/adapters/codex.py +0 -20
  46. coderace-0.2.0/coderace/adapters/gemini.py +0 -19
  47. coderace-0.2.0/coderace/adapters/opencode.py +0 -18
  48. {coderace-0.2.0 → coderace-0.4.0}/.github/workflows/publish.yml +0 -0
  49. {coderace-0.2.0 → coderace-0.4.0}/.gitignore +0 -0
  50. {coderace-0.2.0 → coderace-0.4.0}/LICENSE +0 -0
  51. {coderace-0.2.0 → coderace-0.4.0}/all-day-build-contract-v0.2.md +0 -0
  52. {coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/__init__.py +0 -0
  53. {coderace-0.2.0 → coderace-0.4.0}/coderace/git_ops.py +0 -0
  54. {coderace-0.2.0 → coderace-0.4.0}/tests/__init__.py +0 -0
  55. {coderace-0.2.0 → coderace-0.4.0}/tests/conftest.py +0 -0
  56. {coderace-0.2.0 → coderace-0.4.0}/tests/test_adapters.py +0 -0
  57. {coderace-0.2.0 → coderace-0.4.0}/tests/test_cli.py +0 -0
  58. {coderace-0.2.0 → coderace-0.4.0}/tests/test_examples.py +0 -0
  59. {coderace-0.2.0 → coderace-0.4.0}/tests/test_git_ops.py +0 -0
  60. {coderace-0.2.0 → coderace-0.4.0}/tests/test_html_report.py +0 -0
  61. {coderace-0.2.0 → coderace-0.4.0}/tests/test_reporter.py +0 -0
  62. {coderace-0.2.0 → coderace-0.4.0}/tests/test_scorer.py +0 -0
  63. {coderace-0.2.0 → coderace-0.4.0}/tests/test_stats.py +0 -0
  64. {coderace-0.2.0 → coderace-0.4.0}/tests/test_task.py +0 -0
@@ -0,0 +1,56 @@
1
+ # Changelog
2
+
3
+ ## [0.4.0] - 2026-02-24
4
+
5
+ ### Added
6
+
7
+ - **Cost tracking** — Each agent run now includes an estimated API cost. The results table shows a `Cost (USD)` column in terminal, markdown, JSON, and HTML output.
8
+ - **`coderace/cost.py`** — Pricing engine: pricing table for Claude Code (Sonnet 4.6, Opus 4.6), Codex (GPT-5.3), Gemini CLI (2.5 Pro, 3.1 Pro), Aider, and OpenCode. `CostResult` dataclass with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`.
9
+ - **Per-adapter `parse_cost()` methods** — Each adapter extracts token counts or cost info from the agent's stdout/stderr. Falls back to file-size estimation when tokens are unavailable.
10
+ - **`pricing:` section in task YAML** — Override pricing per-agent or per-model with `input_per_1m` / `output_per_1m` (USD per 1M tokens).
11
+ - **`--no-cost` flag** — `coderace run task.yaml --no-cost` disables cost tracking entirely.
12
+ - **HTML report $/score column** — The HTML report now shows cost and cost-per-point for direct efficiency comparison.
13
+ - **Statistical mode cost aggregation** — `--runs N` shows mean ± stddev for cost alongside score and time.
14
+ - **`coderace init` template** — Now includes a commented `pricing:` example section.
15
+
16
+ ## [0.3.0] - 2026-02-24
17
+
18
+ ### Added
19
+
20
+ - **`coderace diff`** - Generate task YAML from a git diff. Three modes: `review` (find bugs), `fix` (apply fixes), `improve` (refactor). Pipe any diff in, get a ready-to-race task out.
21
+ - **GitHub Action** - `uses: mikiships/coderace@main` drops into any workflow. Races agents on your task and posts a results table as a PR comment. Re-runs update the same comment.
22
+ - **Example CI workflows** - Two drop-in configs: PR trigger and label trigger (`race-agents`).
23
+ - **`--format` flag for results** - `coderace results task.yaml -F markdown|json|terminal` for CI-friendly output.
24
+
25
+ ## [0.2.0] - 2026-02-23
26
+
27
+ ### Added
28
+
29
+ - **OpenCode adapter** - OpenCode (terminal-first open-source coding agent) is now a supported agent (`opencode` in task YAML)
30
+ - **Custom scoring weights** - Override default weights in task YAML via `scoring:` section; weights are auto-normalized; supports short aliases (`tests`, `exit`, `lint`, `time`, `lines`)
31
+ - **HTML reports** - Self-contained single-file HTML report auto-generated on every run at `.coderace/<task>-results.html`; also `coderace results --html report.html` for manual export; sortable columns, dark theme
32
+ - **Statistical mode** - `coderace run task.yaml --runs N` for multi-run comparison; shows mean ± stddev for score, time, and lines changed; saves per-run and aggregated JSON
33
+ - **Example tasks** - `examples/` directory with 3 ready-to-use templates: `add-type-hints.yaml`, `fix-edge-case.yaml`, `write-tests.yaml`
34
+
35
+ ### Changed
36
+
37
+ - `coderace init` template now includes OpenCode in default agent list
38
+ - `coderace init` template includes commented scoring example
39
+ - README: "Try it now" section, statistical mode docs, HTML report docs, custom scoring docs, updated agent table
40
+
41
+ ### Fixed
42
+
43
+ - `opencode` now accepted as a valid agent name in task validation
44
+
45
+ ## [0.1.0] - 2026-02-22
46
+
47
+ ### Added
48
+
49
+ - Initial release
50
+ - CLI: `init`, `run`, `results`, `version` commands
51
+ - 4 agent adapters: Claude Code, Codex, Aider, Gemini CLI
52
+ - Sequential and parallel (git worktrees) run modes
53
+ - Composite scoring: tests (40%), exit (20%), lint (15%), time (15%), lines (10%)
54
+ - JSON results output
55
+ - Rich terminal table output
56
+ - `coderace run --parallel` using git worktrees
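The 0.4.0 entries in this hunk describe pricing in USD per 1M tokens (`input_per_1m` / `output_per_1m`) and a `CostResult` carrying `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, and `pricing_source`. The arithmetic this implies can be sketched as follows. This is an illustration only, not the code in `coderace/cost.py`: the `Pricing` type, the `estimate_cost` function, and the `"reported"` source label are assumptions made for the example; the field names and the $3.00 / $15.00 rates for `claude-sonnet-4-6` are taken from this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for per-model rates, in USD per 1M tokens."""
    input_per_1m: float
    output_per_1m: float


@dataclass(frozen=True)
class CostResult:
    """Field names follow the changelog; the class body is an assumption."""
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    model_name: str
    pricing_source: str


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model_name: str,
    pricing: Pricing,
    pricing_source: str = "reported",  # assumed label, not from the package
) -> CostResult:
    """Convert token counts to USD using per-1M-token rates."""
    cost = (
        input_tokens * pricing.input_per_1m
        + output_tokens * pricing.output_per_1m
    ) / 1_000_000
    return CostResult(input_tokens, output_tokens, cost, model_name, pricing_source)


# Rates for claude-sonnet-4-6 as listed in the README pricing table.
sonnet = Pricing(input_per_1m=3.00, output_per_1m=15.00)
result = estimate_cost(1_200, 180, "claude-sonnet-4-6", sonnet)
print(f"${result.estimated_cost_usd:.4f}")
```

Keeping the rates in a small immutable value makes a per-agent or per-model override a matter of passing a different `Pricing` instance, which is one plausible way to support the `pricing:` section of the task YAML.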
@@ -0,0 +1,412 @@
1
+ Metadata-Version: 2.4
2
+ Name: coderace
3
+ Version: 0.4.0
4
+ Summary: Race coding agents against each other on real tasks
5
+ Project-URL: Homepage, https://github.com/mikiships/coderace
6
+ Project-URL: Repository, https://github.com/mikiships/coderace
7
+ Author: mikiships
8
+ License-Expression: MIT
9
+ License-File: LICENSE
10
+ Keywords: ai,aider,benchmark,claude,codex,coding-agents
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Environment :: Console
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Software Development :: Testing
21
+ Requires-Python: >=3.10
22
+ Requires-Dist: pyyaml>=6.0
23
+ Requires-Dist: rich>=13.0
24
+ Requires-Dist: typer>=0.9.0
25
+ Provides-Extra: dev
26
+ Requires-Dist: pytest-mock>=3.0; extra == 'dev'
27
+ Requires-Dist: pytest>=7.0; extra == 'dev'
28
+ Requires-Dist: ruff>=0.4.0; extra == 'dev'
29
+ Description-Content-Type: text/markdown
30
+
31
+ # coderace
32
+
33
+ Stop reading blog comparisons. Race coding agents against each other on real tasks in *your* repo with *your* code.
34
+
35
+ Every week there's a new "Claude Code vs Codex vs Cursor" post. They test on toy problems with cherry-picked examples. coderace gives you automated, reproducible, scored comparisons on the tasks you actually care about.
36
+
37
+ Define a task. Run it against Claude Code, Codex, Aider, Gemini CLI, and OpenCode. Get a scored comparison table.
38
+
39
+ ## Install
40
+
41
+ ```bash
42
+ pip install coderace
43
+ ```
44
+
45
+ ## Quick Start
46
+
47
+ ```bash
48
+ # Create a task template
49
+ coderace init fix-auth-bug
50
+
51
+ # Edit the task file (describe the bug, set test command)
52
+ # Then race the agents:
53
+ coderace run fix-auth-bug.yaml
54
+
55
+ # Or race them in parallel (uses git worktrees):
56
+ coderace run fix-auth-bug.yaml --parallel
57
+
58
+ # View results from the last run
59
+ coderace results fix-auth-bug.yaml
60
+ ```
61
+
62
+ ## `coderace diff` — Race Agents on a Real PR Diff
63
+
64
+ Turn any git diff into a coderace task with one command:
65
+
66
+ ```bash
67
+ # Race agents to review the latest commit
68
+ git diff HEAD~1 | coderace diff --mode review | coderace run /dev/stdin
69
+
70
+ # Generate a task YAML from a patch file, then run it
71
+ git diff main...my-branch > my-pr.patch
72
+ coderace diff --file my-pr.patch --mode fix --output task.yaml
73
+ coderace run task.yaml
74
+ ```
75
+
76
+ ### Modes
77
+
78
+ | Mode | What agents are asked to do |
79
+ |------|-----------------------------|
80
+ | `review` | Review the changes and provide feedback on correctness, style, and potential issues |
81
+ | `fix` | Fix bugs or problems introduced by the diff |
82
+ | `improve` | Enhance performance, readability, or robustness of the changed code |
83
+
84
+ ### Flags
85
+
86
+ ```
87
+ --file PATH Read diff from file instead of stdin
88
+ --mode TEXT review | fix | improve (default: review)
89
+ --agents TEXT Override agent list (repeatable: --agents claude --agents aider)
90
+ --name TEXT Task name in generated YAML (default: diff-task)
91
+ --output PATH Write YAML to file instead of stdout
92
+ --test-command Test command to embed in the task (default: pytest tests/ -x)
93
+ --lint-command Lint command to embed in the task (default: ruff check .)
94
+ ```
95
+
96
+ ## Task Format
97
+
98
+ ```yaml
99
+ name: fix-auth-bug
100
+ description: |
101
+   The login endpoint returns 500 when email contains a plus sign.
102
+   Fix the email validation in auth/validators.py.
103
+ repo: .
104
+ test_command: pytest tests/test_auth.py -x
105
+ lint_command: ruff check .
106
+ timeout: 300
107
+ agents:
108
+ - claude
109
+ - codex
110
+ - aider
111
+ ```
112
+
113
+ ## What It Does
114
+
115
+ For each agent in the task:
116
+
117
+ 1. Creates a fresh git branch (`coderace/<agent>-<task>`)
118
+ 2. Invokes the agent CLI with the task description
119
+ 3. Runs your test command
120
+ 4. Runs your lint command (optional)
121
+ 5. Computes a composite score
122
+
123
+ ## Scoring
124
+
125
+ | Metric | Weight | Description |
126
+ |--------|--------|-------------|
127
+ | Tests pass | 40% | Did the test command exit 0? |
128
+ | Exit clean | 20% | Did the agent itself exit 0 without timeout? |
129
+ | Lint clean | 15% | Did the lint command exit 0? |
130
+ | Wall time | 15% | Faster is better (normalized across agents) |
131
+ | Lines changed | 10% | Fewer is better (normalized across agents) |
132
+
133
+ ## Output
134
+
135
+ Terminal table with Rich formatting:
136
+
137
+ ```
138
+ ┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┐
139
+ │ Rank │ Agent │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │
140
+ ├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┤
141
+ │ 1 │ claude │ 85.0 │ PASS │ PASS │ PASS │ 10.5 │ 42 │
142
+ │ 2 │ codex │ 70.0 │ PASS │ PASS │ FAIL │ 15.2 │ 98 │
143
+ │ 3 │ aider │ 55.0 │ FAIL │ PASS │ PASS │ 8.1 │ 31 │
144
+ └──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┘
145
+ ```
146
+
147
+ Results also saved as JSON in `.coderace/<task>-results.json` and as a self-contained HTML report in `.coderace/<task>-results.html`.
148
+
149
+ ## Try It Now
150
+
151
+ The `examples/` directory has ready-to-use task templates:
152
+
153
+ ```bash
154
+ # Race agents on adding type hints to your project
155
+ coderace run examples/add-type-hints.yaml
156
+
157
+ # Race agents on fixing an edge case bug
158
+ coderace run examples/fix-edge-case.yaml
159
+
160
+ # Race agents on writing new tests
161
+ coderace run examples/write-tests.yaml
162
+ ```
163
+
164
+ Edit the `repo` and `description` fields to point at your actual project and describe your real task.
165
+
166
+ ## Statistical Mode
167
+
168
+ Run each agent multiple times and get mean ± stddev:
169
+
170
+ ```bash
171
+ coderace run task.yaml --runs 5
172
+ ```
173
+
174
+ Useful for tasks with variable outcomes (LLM nondeterminism is real).
175
+
176
+ ## HTML Reports
177
+
178
+ Export results as a shareable single-file HTML report:
179
+
180
+ ```bash
181
+ # Auto-generated on every run at .coderace/<task>-results.html
182
+ # Or export manually:
183
+ coderace results task.yaml --html report.html
184
+ ```
185
+
186
+ The HTML report has sortable columns and a dark theme. Drop it in a blog post or Slack.
187
+
188
+ ## Custom Scoring
189
+
190
+ Override the default weights in your task YAML:
191
+
192
+ ```yaml
193
+ scoring:
194
+   tests: 60 # tests passing (default 40)
195
+   exit: 20 # clean exit (default 20)
196
+   lint: 10 # lint clean (default 15)
197
+   time: 5 # wall time (default 15)
198
+   lines: 5 # lines changed (default 10)
199
+ ```
200
+
201
+ Weights are normalized automatically (don't need to sum to 100).
202
+
203
+ ## Cost Tracking
204
+
205
+ coderace automatically estimates API cost for each agent run. After every race, the results table includes a **Cost (USD)** column so you can compare quality-per-dollar, not just quality alone.
206
+
207
+ ```
208
+ ┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┬────────────┐
209
+ │ Rank │ Agent │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │ Cost (USD) │
210
+ ├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┼────────────┤
211
+ │ 1 │ claude │ 85.0 │ PASS │ PASS │ PASS │ 10.5 │ 42 │ $0.0063 │
212
+ │ 2 │ codex │ 70.0 │ PASS │ PASS │ FAIL │ 15.2 │ 98 │ $0.0041 │
213
+ │ 3 │ aider │ 55.0 │ FAIL │ PASS │ PASS │ 8.1 │ 31 │ - │
214
+ └──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┴────────────┘
215
+ ```
216
+
217
+ Cost appears in all output formats:
218
+ - **Terminal** — `Cost (USD)` column (shows `-` when unavailable)
219
+ - **Markdown** — `--format markdown` includes the column
220
+ - **JSON** — `cost` object per agent result with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`
221
+ - **HTML report** — Cost column plus `$/score` ratio column for direct efficiency comparison
222
+
223
+ ### How it works
224
+
225
+ Each agent adapter parses token counts or cost lines from the agent's CLI output:
226
+
227
+ | Agent | Source |
228
+ |-------|--------|
229
+ | Claude Code | `usage.input_tokens` / `usage.output_tokens` from JSON output; or "Total cost: $N" lines |
230
+ | Codex | `prompt_tokens=N, completion_tokens=N` usage summary |
231
+ | Gemini CLI | `inputTokenCount=N, outputTokenCount=N` lines |
232
+ | Aider | "Tokens: N sent, N received. Cost: $N message" lines |
233
+ | OpenCode | "Total cost: $N" or generic token lines |
234
+
235
+ If token counts are unavailable, cost is estimated from input file size + output diff size (marked as `pricing_source: "estimated"`).
236
+
237
+ ### Disable cost tracking
238
+
239
+ ```bash
240
+ coderace run task.yaml --no-cost
241
+ ```
242
+
243
+ ## Custom Pricing
244
+
245
+ Override the default pricing table in your task YAML — useful for custom models, negotiated rates, or open-source deployments.
246
+
247
+ ```yaml
248
+ # pricing: per-agent or per-model overrides (USD per 1M tokens)
249
+ pricing:
250
+   claude:
251
+     input_per_1m: 3.00 # default for claude-sonnet-4-6
252
+     output_per_1m: 15.00
253
+   codex:
254
+     input_per_1m: 3.00
255
+     output_per_1m: 15.00
256
+   # Or use the model name directly:
257
+   claude-opus-4-6:
258
+     input_per_1m: 15.00
259
+     output_per_1m: 75.00
260
+ ```
261
+
262
+ Keys can be agent names (`claude`, `codex`, `aider`, `gemini`, `opencode`) or model names (`claude-sonnet-4-6`, `gpt-5.3-codex`, `gemini-2.5-pro`). The default pricing table covers:
263
+
264
+ | Model | Input ($/1M) | Output ($/1M) |
265
+ |-------|-------------|--------------|
266
+ | claude-sonnet-4-6 | $3.00 | $15.00 |
267
+ | claude-opus-4-6 | $15.00 | $75.00 |
268
+ | gpt-5.3-codex | $3.00 | $15.00 |
269
+ | gemini-2.5-pro | $1.25 | $10.00 |
270
+ | gemini-3.1-pro | $1.25 | $10.00 |
271
+
272
+ Pricing is easy to update: the table lives in `coderace/cost.py` as a plain dict.
273
+
274
+ ## Supported Agents
275
+
276
+ | Agent | CLI | Notes |
277
+ |-------|-----|-------|
278
+ | Claude Code | `claude` | Anthropic's coding agent |
279
+ | Codex | `codex` | OpenAI Codex CLI |
280
+ | Aider | `aider` | Git-integrated AI coding |
281
+ | Gemini CLI | `gemini` | Google's Gemini CLI |
282
+ | OpenCode | `opencode` | Open-source terminal agent |
283
+
284
+ Each agent must be installed and authenticated separately.
285
+
286
+ ## Parallel Mode
287
+
288
+ Use `--parallel` (or `-p`) to run all agents simultaneously using git worktrees. Each agent gets its own isolated working directory, so they don't interfere with each other.
289
+
290
+ ```bash
291
+ coderace run task.yaml --parallel
292
+ ```
293
+
294
+ Sequential mode (default) runs agents one at a time on the same repo.
295
+
296
+ ## Why coderace?
297
+
298
+ **Blog posts compare models. coderace compares agents on your work.**
299
+
300
+ - Run on your actual codebase, not HumanEval
301
+ - Automated scoring: tests, lint, time, lines changed
302
+ - Parallel mode with git worktrees (no interference between agents)
303
+ - JSON output for CI integration and tracking over time
304
+ - Works with any agent that has a CLI
305
+
306
+ The goal isn't "which model is best." It's "which agent solves my specific problem best."
307
+
308
+ ## CI Integration
309
+
310
+ Use coderace in GitHub Actions to automatically race agents on PRs and post results as comments.
311
+
312
+ ### Quick setup
313
+
314
+ 1. Copy `examples/ci-race-on-pr.yml` into `.github/workflows/` in your repo.
315
+ 2. Create a task YAML at `.github/coderace-task.yaml` (see [Task Format](#task-format)).
316
+ 3. Install the agent CLIs your task requires (see comments in the workflow file).
317
+ 4. Open or update a PR — results appear as a PR comment automatically.
318
+
319
+ ### Workflow: Race on every PR
320
+
321
+ ```yaml
322
+ name: Race Coding Agents
323
+
324
+ on:
325
+   pull_request:
326
+     branches: [main]
327
+
328
+ jobs:
329
+   race:
330
+     runs-on: ubuntu-latest
331
+     permissions:
332
+       contents: read
333
+       pull-requests: write
334
+
335
+     steps:
336
+       - uses: actions/checkout@v4
337
+
338
+       - name: Run coderace
339
+         uses: mikiships/coderace@v0.3
340
+         with:
341
+           task: .github/coderace-task.yaml
342
+           agents: claude,aider
343
+           github-token: ${{ secrets.GITHUB_TOKEN }}
344
+ ```
345
+
346
+ ### Workflow: Race only when "race-agents" label is added
347
+
348
+ Cost-control pattern: only race when a maintainer deliberately triggers it.
349
+
350
+ ```yaml
351
+ name: Race Coding Agents (on label)
352
+
353
+ on:
354
+   pull_request:
355
+     types: [labeled]
356
+
357
+ jobs:
358
+   race:
359
+     if: github.event.label.name == 'race-agents'
360
+     runs-on: ubuntu-latest
361
+     permissions:
362
+       contents: read
363
+       pull-requests: write
364
+
365
+     steps:
366
+       - uses: actions/checkout@v4
367
+
368
+       - name: Run coderace
369
+         uses: mikiships/coderace@v0.3
370
+         with:
371
+           task: .github/coderace-task.yaml
372
+           github-token: ${{ secrets.GITHUB_TOKEN }}
373
+ ```
374
+
375
+ ### Action inputs
376
+
377
+ | Input | Description | Default |
378
+ |-------|-------------|---------|
379
+ | `task` | Path to coderace task YAML | _(required)_ |
380
+ | `agents` | Comma-separated agents to race | _(from task file)_ |
381
+ | `parallel` | Run agents in parallel (`true`/`false`) | `false` |
382
+ | `github-token` | Token for posting PR comments | `${{ github.token }}` |
383
+ | `coderace-version` | coderace version to install | `latest` |
384
+ | `python-version` | Python version | `3.11` |
385
+
386
+ ### Example PR comment
387
+
388
+ The action automatically posts (and updates on re-run) a comment like:
389
+
390
+ > ✅ **coderace** — `fix-auth-bug` | **Winner: `claude`** (85.0 pts) | 3 agent(s) raced
391
+ >
392
+ > | Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines |
393
+ > |------|-------|------:|:-----:|:----:|:----:|---------:|------:|
394
+ > | 1 | `claude` | 85.0 | ✅ | ✅ | ✅ | 10.5 | 42 |
395
+ > | 2 | `codex` | 70.0 | ✅ | ❌ | ✅ | 15.2 | 98 |
396
+ > | 3 | `aider` | 55.0 | ❌ | ✅ | ✅ | 8.1 | 31 |
397
+
398
+ The action uses a hidden HTML marker to find and update existing comments, so re-running doesn't spam the PR.
399
+
400
+ ## See Also
401
+
402
+ - **[pytest-agentcontract](https://github.com/mikiships/pytest-agentcontract)** -- Deterministic CI tests for LLM agent trajectories. Record once, replay offline, assert contracts. Pairs well with coderace: race agents to find the best one, then lock down its behavior with contract tests.
403
+
404
+ ## Requirements
405
+
406
+ - Python 3.10+
407
+ - Git
408
+ - At least one coding agent CLI installed
409
+
410
+ ## License
411
+
412
+ MIT
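The Custom Scoring section of the README in this hunk states that weights are normalized automatically and need not sum to 100. A minimal sketch of what that could look like is given below. It is not taken from `coderace/scorer.py`: the function names `normalize_weights` and `composite_score` are hypothetical, and the sketch assumes every metric has already been reduced to a value between 0 and 1 (the README does not specify how wall time and lines changed are normalized across agents).

```python
# Default weights as listed in the README scoring table.
DEFAULT_WEIGHTS = {"tests": 40, "exit": 20, "lint": 15, "time": 15, "lines": 10}


def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1, whatever their original total."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


def composite_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-metric values in [0, 1], scaled to 0-100."""
    fractions = normalize_weights(weights)
    return 100 * sum(fractions[name] * metrics[name] for name in fractions)


# The override from the README example, and the same ratios at half scale.
custom = {"tests": 60, "exit": 20, "lint": 10, "time": 5, "lines": 5}
halved = {name: value / 2 for name, value in custom.items()}

# Assumed metric values for one agent: tests and exit pass, lint fails,
# time and lines land mid-range after cross-agent normalization.
metrics = {"tests": 1.0, "exit": 1.0, "lint": 0.0, "time": 0.5, "lines": 0.5}

score_custom = composite_score(metrics, custom)
score_halved = composite_score(metrics, halved)
print(round(score_custom, 1), round(score_halved, 1))
```

Because only the ratios between weights matter after normalization, `custom` and `halved` produce the same score.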