PyPI - coderace - Versions diffs - 0.1.0__tar.gz → 0.2.0__tar.gz - Mend

coderace 0.1.0.tar.gz → 0.2.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43)

coderace-0.2.0/CHANGELOG.md +34 -0
{coderace-0.1.0 → coderace-0.2.0}/PKG-INFO +79 -10
{coderace-0.1.0 → coderace-0.2.0}/README.md +78 -9
coderace-0.2.0/all-day-build-contract-v0.2.md +137 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/__init__.py +1 -1
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/__init__.py +3 -0
coderace-0.2.0/coderace/adapters/opencode.py +18 -0
coderace-0.2.0/coderace/cli.py +509 -0
coderace-0.2.0/coderace/html_report.py +134 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/reporter.py +60 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/scorer.py +10 -15
coderace-0.2.0/coderace/stats.py +97 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/task.py +13 -0
coderace-0.2.0/coderace/types.py +130 -0
coderace-0.2.0/examples/add-type-hints.yaml +31 -0
coderace-0.2.0/examples/example-task.yaml +23 -0
coderace-0.2.0/examples/fix-edge-case.yaml +37 -0
coderace-0.2.0/examples/write-tests.yaml +37 -0
{coderace-0.1.0 → coderace-0.2.0}/pyproject.toml +4 -1
{coderace-0.1.0 → coderace-0.2.0}/tests/test_adapters.py +12 -1
{coderace-0.1.0 → coderace-0.2.0}/tests/test_cli.py +2 -1
coderace-0.2.0/tests/test_examples.py +65 -0
coderace-0.2.0/tests/test_html_report.py +95 -0
coderace-0.2.0/tests/test_scorer.py +80 -0
coderace-0.2.0/tests/test_stats.py +76 -0
coderace-0.2.0/uv.lock +349 -0
coderace-0.1.0/coderace/cli.py +0 -306
coderace-0.1.0/coderace/types.py +0 -73
coderace-0.1.0/tests/test_scorer.py +0 -46
{coderace-0.1.0 → coderace-0.2.0}/.github/workflows/publish.yml +0 -0
{coderace-0.1.0 → coderace-0.2.0}/.gitignore +0 -0
{coderace-0.1.0 → coderace-0.2.0}/LICENSE +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/aider.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/base.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/claude.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/codex.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/gemini.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/coderace/git_ops.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/tests/__init__.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/tests/conftest.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/tests/test_git_ops.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/tests/test_reporter.py +0 -0
{coderace-0.1.0 → coderace-0.2.0}/tests/test_task.py +0 -0

coderace-0.2.0/CHANGELOG.md ADDED

@@ -0,0 +1,34 @@
+# Changelog
+## [0.2.0] - 2026-02-23
+### Added
+- **OpenCode adapter** - OpenCode (terminal-first open-source coding agent) is now a supported agent (`opencode` in task YAML)
+- **Custom scoring weights** - Override default weights in task YAML via `scoring:` section; weights are auto-normalized; supports short aliases (`tests`, `exit`, `lint`, `time`, `lines`)
+- **HTML reports** - Self-contained single-file HTML report auto-generated on every run at `.coderace/<task>-results.html`; also `coderace results --html report.html` for manual export; sortable columns, dark theme
+- **Statistical mode** - `coderace run task.yaml --runs N` for multi-run comparison; shows mean ± stddev for score, time, and lines changed; saves per-run and aggregated JSON
+- **Example tasks** - `examples/` directory with 3 ready-to-use templates: `add-type-hints.yaml`, `fix-edge-case.yaml`, `write-tests.yaml`
+### Changed
+- `coderace init` template now includes OpenCode in default agent list
+- `coderace init` template includes commented scoring example
+- README: "Try it now" section, statistical mode docs, HTML report docs, custom scoring docs, updated agent table
+### Fixed
+- `opencode` now accepted as a valid agent name in task validation
+## [0.1.0] - 2026-02-22
+### Added
+- Initial release
+- CLI: `init`, `run`, `results`, `version` commands
+- 4 agent adapters: Claude Code, Codex, Aider, Gemini CLI
+- Sequential and parallel (git worktrees) run modes
+- Composite scoring: tests (40%), exit (20%), lint (15%), time (15%), lines (10%)
+- JSON results output
+- Rich terminal table output
+- `coderace run --parallel` using git worktrees
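
The custom-weights entry in the changelog above says weights are auto-normalized and keyed by short aliases, with defaults of 40/20/15/15/10. A minimal sketch of what that could look like follows. The function names `normalize_weights` and `composite_score`, the merge-onto-defaults behavior for partial overrides, and the assumption that each metric is already scaled to 0..1 are all hypothetical; none of this is taken from coderace's `scorer.py`.

```python
# Defaults as listed in the 0.1.0 changelog entry.
DEFAULT_WEIGHTS = {"tests": 40, "exit": 20, "lint": 15, "time": 15, "lines": 10}


def normalize_weights(overrides=None):
    """Merge overrides onto the defaults and rescale so the weights sum to 1.0.

    Hypothetical helper: merging partial overrides onto the defaults is an
    assumption, not confirmed behavior.
    """
    weights = {key: float(value) for key, value in DEFAULT_WEIGHTS.items()}
    for key, value in (overrides or {}).items():
        if key not in DEFAULT_WEIGHTS:
            raise ValueError(f"unknown scoring key: {key!r}")
        if value < 0:
            raise ValueError(f"weight for {key!r} must be >= 0")
        weights[key] = float(value)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("scoring weights must not all be zero")
    return {key: value / total for key, value in weights.items()}


def composite_score(metrics, weights):
    """Weighted sum of per-metric scores (each assumed in 0..1), scaled to 0..100."""
    return 100.0 * sum(weights[key] * metrics.get(key, 0.0) for key in weights)
```

Because the weights are divided by their total, values such as `tests: 3, exit: 1` behave the same as `tests: 75, exit: 25`; only the ratios matter. The all-zero case has no meaningful ratio, so this sketch rejects it rather than guessing.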

{coderace-0.1.0 → coderace-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderace
-Version: 0.1.0
+Version: 0.2.0
 Summary: Race coding agents against each other on real tasks
 Project-URL: Homepage, https://github.com/mikiships/coderace
 Project-URL: Repository, https://github.com/mikiships/coderace
@@ -30,9 +30,11 @@ Description-Content-Type: text/markdown
 # coderace
-Race coding agents against each other on real tasks in your repo.
+Stop reading blog comparisons. Race coding agents against each other on real tasks in *your* repo with *your* code.
-Define a task. Run it against Claude Code, Codex, and Aider. Get a scored comparison table.
+Every week there's a new "Claude Code vs Codex vs Cursor" post. They test on toy problems with cherry-picked examples. coderace gives you automated, reproducible, scored comparisons on the tasks you actually care about.
+Define a task. Run it against Claude Code, Codex, Aider, Gemini CLI, and OpenCode. Get a scored comparison table.
 ## Install
@@ -108,16 +110,71 @@ Terminal table with Rich formatting:
 └──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┘
 ```
-Results also saved as JSON in `.coderace/<task>-results.json`.
+Results also saved as JSON in `.coderace/<task>-results.json` and as a self-contained HTML report in `.coderace/<task>-results.html`.
+## Try It Now
+The `examples/` directory has ready-to-use task templates:
+```bash
+# Race agents on adding type hints to your project
+coderace run examples/add-type-hints.yaml
+# Race agents on fixing an edge case bug
+coderace run examples/fix-edge-case.yaml
+# Race agents on writing new tests
+coderace run examples/write-tests.yaml
+```
+Edit the `repo` and `description` fields to point at your actual project and describe your real task.
+## Statistical Mode
+Run each agent multiple times and get mean ± stddev:
+```bash
+coderace run task.yaml --runs 5
+```
+Useful for tasks with variable outcomes (LLM nondeterminism is real).
+## HTML Reports
+Export results as a shareable single-file HTML report:
+```bash
+# Auto-generated on every run at .coderace/<task>-results.html
+# Or export manually:
+coderace results task.yaml --html report.html
+```
+The HTML report has sortable columns and a dark theme. Drop it in a blog post or Slack.
+## Custom Scoring
+Override the default weights in your task YAML:
+```yaml
+scoring:
+  tests: 60   # tests passing (default 40)
+  exit: 20    # clean exit (default 20)
+  lint: 10    # lint clean (default 15)
+  time: 5     # wall time (default 15)
+  lines: 5    # lines changed (default 10)
+```
+Weights are normalized automatically (don't need to sum to 100).
 ## Supported Agents
-| Agent | CLI | Command |
-|-------|-----|---------|
-| Claude Code | `claude` | `claude --print --output-format json -p "<task>"` |
-| Codex | `codex` | `codex --quiet --full-auto -p "<task>"` |
-| Aider | `aider` | `aider --message "<task>" --yes --no-auto-commits` |
-| Gemini CLI | `gemini` | `gemini --non-interactive -p "<task>"` |
+| Agent | CLI | Notes |
+|-------|-----|-------|
+| Claude Code | `claude` | Anthropic's coding agent |
+| Codex | `codex` | OpenAI Codex CLI |
+| Aider | `aider` | Git-integrated AI coding |
+| Gemini CLI | `gemini` | Google's Gemini CLI |
+| OpenCode | `opencode` | Open-source terminal agent |
 Each agent must be installed and authenticated separately.
@@ -131,6 +188,18 @@ coderace run task.yaml --parallel
 Sequential mode (default) runs agents one at a time on the same repo.
+## Why coderace?
+**Blog posts compare models. coderace compares agents on your work.**
+- Run on your actual codebase, not HumanEval
+- Automated scoring: tests, lint, time, lines changed
+- Parallel mode with git worktrees (no interference between agents)
+- JSON output for CI integration and tracking over time
+- Works with any agent that has a CLI
+The goal isn't "which model is best." It's "which agent solves my specific problem best."
 ## Requirements
 - Python 3.10+
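
The Statistical Mode section in the description above reports mean ± stddev across `--runs N`. A minimal sketch of that aggregation using the standard library follows. The helper names `aggregate` and `format_stat` are hypothetical, and the choice of sample standard deviation (n - 1 denominator) is an assumption; the contents of coderace's `stats.py` are not shown in this diff.

```python
import statistics


def aggregate(values):
    """Return (mean, stddev) for one metric across runs.

    Uses the sample standard deviation; a single run has no spread to
    estimate, so stddev is reported as 0.0 in that case.
    """
    mean = statistics.fmean(values)
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, stddev


def format_stat(values, digits=1):
    """Render a metric as 'mean ± stddev' for a table cell."""
    mean, stddev = aggregate(values)
    return f"{mean:.{digits}f} ± {stddev:.{digits}f}"
```

Guarding the single-run case matters because `statistics.stdev` raises `StatisticsError` when given fewer than two data points.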

{coderace-0.1.0 → coderace-0.2.0}/README.md RENAMED

@@ -1,8 +1,10 @@
 # coderace
-Race coding agents against each other on real tasks in your repo.
+Stop reading blog comparisons. Race coding agents against each other on real tasks in *your* repo with *your* code.
-Define a task. Run it against Claude Code, Codex, and Aider. Get a scored comparison table.
+Every week there's a new "Claude Code vs Codex vs Cursor" post. They test on toy problems with cherry-picked examples. coderace gives you automated, reproducible, scored comparisons on the tasks you actually care about.
+Define a task. Run it against Claude Code, Codex, Aider, Gemini CLI, and OpenCode. Get a scored comparison table.
 ## Install
@@ -78,16 +80,71 @@ Terminal table with Rich formatting:
 └──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┘
 ```
-Results also saved as JSON in `.coderace/<task>-results.json`.
+Results also saved as JSON in `.coderace/<task>-results.json` and as a self-contained HTML report in `.coderace/<task>-results.html`.
+## Try It Now
+The `examples/` directory has ready-to-use task templates:
+```bash
+# Race agents on adding type hints to your project
+coderace run examples/add-type-hints.yaml
+# Race agents on fixing an edge case bug
+coderace run examples/fix-edge-case.yaml
+# Race agents on writing new tests
+coderace run examples/write-tests.yaml
+```
+Edit the `repo` and `description` fields to point at your actual project and describe your real task.
+## Statistical Mode
+Run each agent multiple times and get mean ± stddev:
+```bash
+coderace run task.yaml --runs 5
+```
+Useful for tasks with variable outcomes (LLM nondeterminism is real).
+## HTML Reports
+Export results as a shareable single-file HTML report:
+```bash
+# Auto-generated on every run at .coderace/<task>-results.html
+# Or export manually:
+coderace results task.yaml --html report.html
+```
+The HTML report has sortable columns and a dark theme. Drop it in a blog post or Slack.
+## Custom Scoring
+Override the default weights in your task YAML:
+```yaml
+scoring:
+  tests: 60   # tests passing (default 40)
+  exit: 20    # clean exit (default 20)
+  lint: 10    # lint clean (default 15)
+  time: 5     # wall time (default 15)
+  lines: 5    # lines changed (default 10)
+```
+Weights are normalized automatically (don't need to sum to 100).
 ## Supported Agents
-| Agent | CLI | Command |
-|-------|-----|---------|
-| Claude Code | `claude` | `claude --print --output-format json -p "<task>"` |
-| Codex | `codex` | `codex --quiet --full-auto -p "<task>"` |
-| Aider | `aider` | `aider --message "<task>" --yes --no-auto-commits` |
-| Gemini CLI | `gemini` | `gemini --non-interactive -p "<task>"` |
+| Agent | CLI | Notes |
+|-------|-----|-------|
+| Claude Code | `claude` | Anthropic's coding agent |
+| Codex | `codex` | OpenAI Codex CLI |
+| Aider | `aider` | Git-integrated AI coding |
+| Gemini CLI | `gemini` | Google's Gemini CLI |
+| OpenCode | `opencode` | Open-source terminal agent |
 Each agent must be installed and authenticated separately.
@@ -101,6 +158,18 @@ coderace run task.yaml --parallel
 Sequential mode (default) runs agents one at a time on the same repo.
+## Why coderace?
+**Blog posts compare models. coderace compares agents on your work.**
+- Run on your actual codebase, not HumanEval
+- Automated scoring: tests, lint, time, lines changed
+- Parallel mode with git worktrees (no interference between agents)
+- JSON output for CI integration and tracking over time
+- Works with any agent that has a CLI
+The goal isn't "which model is best." It's "which agent solves my specific problem best."
 ## Requirements
 - Python 3.10+
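
The HTML Reports section in the README above describes a self-contained single-file report. A rough sketch of how such a file could be assembled with inline CSS and no external assets follows. The function name `render_report` and the `(agent, score)` row shape are hypothetical, and the click-to-sort JavaScript the README mentions is left out; this is not the code in `coderace/html_report.py`.

```python
from html import escape


def render_report(task_name, rows):
    """Build a single-file HTML page; `rows` is a list of (agent, score) pairs."""
    # Highest score first, and escape every user-supplied string.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    body = "\n".join(
        f"<tr><td>{escape(agent)}</td><td>{score:.1f}</td></tr>"
        for agent, score in ordered
    )
    title = escape(task_name)
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title>
<style>body{{background:#111;color:#eee;font-family:sans-serif}}
td,th{{padding:4px 12px;text-align:left}}</style></head>
<body><h1>{title}</h1>
<table><thead><tr><th>Agent</th><th>Score</th></tr></thead>
<tbody>
{body}
</tbody></table></body></html>"""
```

Keeping the stylesheet inside a `<style>` element is what lets the result be shared as one file; escaping the task name and agent names keeps arbitrary YAML strings from breaking the markup.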

coderace-0.2.0/all-day-build-contract-v0.2.md ADDED

@@ -0,0 +1,137 @@
+# All-Day Build Contract: coderace v0.2.0
+Status: In Progress
+Date: 2026-02-23
+Owner: Codex/sub-agent execution pass
+Scope type: Deliverable-gated (no hour promises)
+## 1. Objective
+Ship coderace v0.2.0 with five new features that make comparison results shareable, statistically meaningful, and broader in agent coverage. The "Claude Code vs Codex" comparison trend is peaking this week. OpenCode (60k-star open-source alternative) just got a major benchmark review. Adding OpenCode as the 5th agent + HTML reports makes coderace the go-to tool for this moment.
+This contract is considered complete only when every deliverable and validation gate below is satisfied.
+## 2. Non-Negotiable Build Rules
+1. No time-based completion claims.
+2. Completion is allowed only when all checklist items are checked.
+3. Full test suite must pass at the end (existing 39 tests + new tests).
+4. New features must ship with docs and report addendum updates in the same pass.
+5. CLI outputs must be deterministic and schema-backed where specified.
+6. Never modify files outside the project directory.
+7. Commit after each completed deliverable (not at the end).
+8. If stuck on same issue for 3 attempts, stop and write a blocker report.
+9. Do NOT refactor, restyle, or "improve" code outside the deliverables.
+10. Read existing tests and docs before writing new code.
+## 3. Feature Deliverables
+### D1. OpenCode Adapter (5th CLI agent)
+Add OpenCode CLI as a supported agent. OpenCode is a terminal-first open-source coding assistant with 60k+ GitHub stars. It's invoked as `opencode` with similar patterns to other CLI agents.
+Required files:
+- `coderace/adapters/opencode.py`
+- `tests/test_opencode_adapter.py`
+- [ ] Implement OpenCode adapter following existing adapter pattern (see claude.py, codex.py, aider.py, gemini.py)
+- [ ] OpenCode invocation: `opencode` CLI with task prompt via stdin or --prompt flag (research actual CLI interface)
+- [ ] If OpenCode CLI is not installed, adapter should detect and report clearly
+- [ ] Register adapter in adapter registry
+- [ ] Add `opencode` to task YAML agents list support
+- [ ] Tests: unit tests for adapter (mock CLI invocation), integration with run pipeline
+- [ ] Update README: add OpenCode to supported agents list and example YAML
+### D2. Custom Scoring Weights
+Allow users to override default scoring weights in the task YAML. Currently hardcoded (40/20/15/15/10). Users should be able to tune.
+Required files:
+- Modify `coderace/scoring.py` (or wherever scoring lives)
+- `tests/test_custom_scoring.py`
+- [ ] Add optional `scoring` section to task YAML schema:
+  ```yaml
+  scoring:
+    tests: 50
+    exit: 20
+    lint: 10
+    time: 10
+    lines: 10
+  ```
+- [ ] Weights are normalized (sum to 100) automatically
+- [ ] If `scoring` section omitted, use current defaults
+- [ ] Validate: all weights >= 0, no unknown keys
+- [ ] Tests: custom weights, partial override, invalid weights, normalization
+### D3. HTML Report Output
+Generate a self-contained HTML report from race results. This makes results shareable on blogs, tweets, and team Slack.
+Required files:
+- `coderace/report.py`
+- `tests/test_report.py`
+- [ ] `coderace results task.yaml --html report.html` generates a single-file HTML report
+- [ ] Report includes: task name, date, agent scores table, scoring weights used, timing breakdown
+- [ ] Styled with inline CSS (no external dependencies, single file)
+- [ ] Table is sortable by clicking column headers (vanilla JS, inline)
+- [ ] Include a "Generated by coderace" footer with version
+- [ ] Tests: HTML generation, content validation, file output
+### D4. Statistical Mode (multiple runs)
+Run the same task N times and report mean/stddev for each metric. Real benchmarking needs statistical significance.
+Required files:
+- Modify `coderace/runner.py` (or equivalent)
+- `coderace/stats.py`
+- `tests/test_stats.py`
+- [ ] `coderace run task.yaml --runs 5` runs each agent 5 times
+- [ ] Results show mean ± stddev for score, time, and lines changed
+- [ ] Rich table adapts to show statistical columns
+- [ ] JSON output includes per-run data + aggregates
+- [ ] HTML report (D3) also supports statistical view
+- [ ] Tests: multi-run aggregation, edge cases (1 run = no stddev), JSON schema
+### D5. Example Benchmark Tasks
+Ship example tasks that work out of the box on any Python project. Users shouldn't have to write YAML from scratch to try coderace.
+Required files:
+- `examples/add-type-hints.yaml`
+- `examples/fix-edge-case.yaml`
+- `examples/write-tests.yaml`
+- Update README with examples section
+- [ ] 3 example task YAMLs that target common patterns (type hints, edge cases, test coverage)
+- [ ] Each example has a description explaining what it tests
+- [ ] Examples reference a small bundled test fixture (or clearly document how to point at user's repo)
+- [ ] README section: "Try it now" with copy-paste commands
+- [ ] Tests: validate example YAML files parse correctly
+## 4. Test Requirements
+- [ ] Unit tests for each deliverable (specified above)
+- [ ] All 39 existing tests must still pass
+- [ ] Integration test: run full pipeline with mock agents including OpenCode
+- [ ] Edge cases: empty results, single agent, all agents fail, custom weights sum to 0
+## 5. Reports
+- Write progress to `progress-log.md` after each deliverable
+- Include: what was built, what tests pass, what's next, any blockers
+- Final summary when all deliverables done or stopped
+## 6. Stop Conditions
+- All deliverables checked and all tests passing -> DONE
+- 3 consecutive failed attempts on same issue -> STOP, write blocker report
+- Scope creep detected (new requirements discovered) -> STOP, report what's new
+- All tests passing but deliverables remain -> continue to next deliverable
+## 7. Version Bump
+- [ ] Bump version to 0.2.0 in pyproject.toml
+- [ ] Update CHANGELOG or add one if missing

{coderace-0.1.0 → coderace-0.2.0}/coderace/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """coderace - Race coding agents against each other on real tasks."""
-__version__ = "0.1.0"
+__version__ = "0.2.0"

{coderace-0.1.0 → coderace-0.2.0}/coderace/adapters/__init__.py RENAMED

@@ -5,12 +5,14 @@ from coderace.adapters.base import BaseAdapter
 from coderace.adapters.claude import ClaudeAdapter
 from coderace.adapters.codex import CodexAdapter
 from coderace.adapters.gemini import GeminiAdapter
+from coderace.adapters.opencode import OpenCodeAdapter
 ADAPTERS: dict[str, type[BaseAdapter]] = {
     "claude": ClaudeAdapter,
     "codex": CodexAdapter,
     "aider": AiderAdapter,
     "gemini": GeminiAdapter,
+    "opencode": OpenCodeAdapter,
 }
 __all__ = [
@@ -20,4 +22,5 @@ __all__ = [
     "CodexAdapter",
     "AiderAdapter",
     "GeminiAdapter",
+    "OpenCodeAdapter",
 ]

coderace-0.2.0/coderace/adapters/opencode.py ADDED

@@ -0,0 +1,18 @@
+"""OpenCode adapter."""
+from __future__ import annotations
+from coderace.adapters.base import BaseAdapter
+class OpenCodeAdapter(BaseAdapter):
+    """Adapter for OpenCode CLI (terminal-first AI coding agent)."""
+    name = "opencode"
+    def build_command(self, task_description: str) -> list[str]:
+        return [
+            "opencode",
+            "run",
+            task_description,
+        ]
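
The adapter above only builds an argument list. A small sketch of how that list might be checked before use follows. The `BaseAdapter` here is a stand-in, since the real base class is unchanged and not shown in this diff, and `is_available` is a hypothetical helper illustrating the "detect a missing CLI" requirement from the build contract.

```python
import shutil


class BaseAdapter:
    """Stand-in for coderace's BaseAdapter; the real interface may differ."""

    name = ""

    def build_command(self, task_description: str) -> list[str]:
        raise NotImplementedError


class OpenCodeAdapter(BaseAdapter):
    """Mirrors the adapter added in this diff."""

    name = "opencode"

    def build_command(self, task_description: str) -> list[str]:
        return ["opencode", "run", task_description]


def is_available(adapter: BaseAdapter) -> bool:
    """Hypothetical helper: True if the adapter's executable is on PATH."""
    executable = adapter.build_command("")[0]
    return shutil.which(executable) is not None


adapter = OpenCodeAdapter()
command = adapter.build_command("Add type hints to utils.py")
```

Returning a list rather than a single shell string means the task description can be handed to `subprocess.run(command)` as one argument, with no shell quoting needed for spaces or quotes in the prompt.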
