PyPI - coderace - Versions diffs - 0.2.0__tar.gz → 0.4.0__tar.gz - Mend

coderace 0.2.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (64)

coderace-0.4.0/CHANGELOG.md +56 -0
coderace-0.4.0/PKG-INFO +412 -0
coderace-0.4.0/README.md +382 -0
coderace-0.4.0/action.yml +120 -0
coderace-0.4.0/all-day-build-contract-ci-integration.md +104 -0
coderace-0.4.0/all-day-build-contract-cost-tracking.md +97 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/__init__.py +1 -1
coderace-0.4.0/coderace/adapters/aider.py +33 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/base.py +32 -1
{coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/claude.py +13 -0
coderace-0.4.0/coderace/adapters/codex.py +33 -0
coderace-0.4.0/coderace/adapters/gemini.py +32 -0
coderace-0.4.0/coderace/adapters/opencode.py +31 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/cli.py +115 -2
coderace-0.4.0/coderace/commands/__init__.py +1 -0
coderace-0.4.0/coderace/commands/diff.py +159 -0
coderace-0.4.0/coderace/commands/results.py +116 -0
coderace-0.4.0/coderace/cost.py +456 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/html_report.py +15 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/reporter.py +26 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/scorer.py +1 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/stats.py +9 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/task.py +33 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/types.py +7 -0
{coderace-0.2.0 → coderace-0.4.0}/examples/add-type-hints.yaml +5 -0
coderace-0.4.0/examples/ci-race-on-pr.yml +66 -0
{coderace-0.2.0 → coderace-0.4.0}/examples/example-task.yaml +8 -0
{coderace-0.2.0 → coderace-0.4.0}/examples/fix-edge-case.yaml +5 -0
{coderace-0.2.0 → coderace-0.4.0}/examples/write-tests.yaml +5 -0
coderace-0.4.0/progress-log.md +92 -0
{coderace-0.2.0 → coderace-0.4.0}/pyproject.toml +1 -1
coderace-0.4.0/scripts/ci-run.sh +61 -0
coderace-0.4.0/scripts/format-comment.py +172 -0
coderace-0.4.0/tests/test_cost.py +432 -0
coderace-0.4.0/tests/test_cost_config.py +311 -0
coderace-0.4.0/tests/test_cost_integration.py +374 -0
coderace-0.4.0/tests/test_diff.py +234 -0
coderace-0.4.0/tests/test_format_comment.py +215 -0
coderace-0.4.0/tests/test_markdown_results.py +226 -0
{coderace-0.2.0 → coderace-0.4.0}/uv.lock +1 -1
coderace-0.2.0/CHANGELOG.md +0 -34
coderace-0.2.0/PKG-INFO +0 -211
coderace-0.2.0/README.md +0 -181
coderace-0.2.0/coderace/adapters/aider.py +0 -20
coderace-0.2.0/coderace/adapters/codex.py +0 -20
coderace-0.2.0/coderace/adapters/gemini.py +0 -19
coderace-0.2.0/coderace/adapters/opencode.py +0 -18
{coderace-0.2.0 → coderace-0.4.0}/.github/workflows/publish.yml +0 -0
{coderace-0.2.0 → coderace-0.4.0}/.gitignore +0 -0
{coderace-0.2.0 → coderace-0.4.0}/LICENSE +0 -0
{coderace-0.2.0 → coderace-0.4.0}/all-day-build-contract-v0.2.md +0 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/adapters/__init__.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/coderace/git_ops.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/__init__.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/conftest.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_adapters.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_cli.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_examples.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_git_ops.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_html_report.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_reporter.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_scorer.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_stats.py +0 -0
{coderace-0.2.0 → coderace-0.4.0}/tests/test_task.py +0 -0

coderace-0.4.0/CHANGELOG.md ADDED

@@ -0,0 +1,56 @@
+# Changelog
+## [0.4.0] - 2026-02-24
+### Added
+- **Cost tracking** — Each agent run now includes an estimated API cost. The results table shows a `Cost (USD)` column in terminal, markdown, JSON, and HTML output.
+- **`coderace/cost.py`** — Pricing engine: pricing table for Claude Code (Sonnet 4.6, Opus 4.6), Codex (GPT-5.3), Gemini CLI (2.5 Pro, 3.1 Pro), Aider, and OpenCode. `CostResult` dataclass with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`.
+- **Per-adapter `parse_cost()` methods** — Each adapter extracts token counts or cost info from the agent's stdout/stderr. Falls back to file-size estimation when tokens are unavailable.
+- **`pricing:` section in task YAML** — Override pricing per-agent or per-model with `input_per_1m` / `output_per_1m` (USD per 1M tokens).
+- **`--no-cost` flag** — `coderace run task.yaml --no-cost` disables cost tracking entirely.
+- **HTML report $/score column** — The HTML report now shows cost and cost-per-point for direct efficiency comparison.
+- **Statistical mode cost aggregation** — `--runs N` shows mean ± stddev for cost alongside score and time.
+- **`coderace init` template** — Now includes a commented `pricing:` example section.
+## [0.3.0] - 2026-02-24
+### Added
+- **`coderace diff`** - Generate task YAML from a git diff. Three modes: `review` (find bugs), `fix` (apply fixes), `improve` (refactor). Pipe any diff in, get a ready-to-race task out.
+- **GitHub Action** - `uses: mikiships/coderace@main` drops into any workflow. Races agents on your task and posts a results table as a PR comment. Re-runs update the same comment.
+- **Example CI workflows** - Two drop-in configs: PR trigger and label trigger (`race-agents`).
+- **`--format` flag for results** - `coderace results task.yaml -F markdown|json|terminal` for CI-friendly output.
+## [0.2.0] - 2026-02-23
+### Added
+- **OpenCode adapter** - OpenCode (terminal-first open-source coding agent) is now a supported agent (`opencode` in task YAML)
+- **Custom scoring weights** - Override default weights in task YAML via `scoring:` section; weights are auto-normalized; supports short aliases (`tests`, `exit`, `lint`, `time`, `lines`)
+- **HTML reports** - Self-contained single-file HTML report auto-generated on every run at `.coderace/<task>-results.html`; also `coderace results --html report.html` for manual export; sortable columns, dark theme
+- **Statistical mode** - `coderace run task.yaml --runs N` for multi-run comparison; shows mean ± stddev for score, time, and lines changed; saves per-run and aggregated JSON
+- **Example tasks** - `examples/` directory with 3 ready-to-use templates: `add-type-hints.yaml`, `fix-edge-case.yaml`, `write-tests.yaml`
+### Changed
+- `coderace init` template now includes OpenCode in default agent list
+- `coderace init` template includes commented scoring example
+- README: "Try it now" section, statistical mode docs, HTML report docs, custom scoring docs, updated agent table
+### Fixed
+- `opencode` now accepted as a valid agent name in task validation
+## [0.1.0] - 2026-02-22
+### Added
+- Initial release
+- CLI: `init`, `run`, `results`, `version` commands
+- 4 agent adapters: Claude Code, Codex, Aider, Gemini CLI
+- Sequential and parallel (git worktrees) run modes
+- Composite scoring: tests (40%), exit (20%), lint (15%), time (15%), lines (10%)
+- JSON results output
+- Rich terminal table output
+- `coderace run --parallel` using git worktrees

coderace-0.4.0/PKG-INFO ADDED

@@ -0,0 +1,412 @@
+Metadata-Version: 2.4
+Name: coderace
+Version: 0.4.0
+Summary: Race coding agents against each other on real tasks
+Project-URL: Homepage, https://github.com/mikiships/coderace
+Project-URL: Repository, https://github.com/mikiships/coderace
+Author: mikiships
+License-Expression: MIT
+License-File: LICENSE
+Keywords: ai,aider,benchmark,claude,codex,coding-agents
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Testing
+Requires-Python: >=3.10
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: rich>=13.0
+Requires-Dist: typer>=0.9.0
+Provides-Extra: dev
+Requires-Dist: pytest-mock>=3.0; extra == 'dev'
+Requires-Dist: pytest>=7.0; extra == 'dev'
+Requires-Dist: ruff>=0.4.0; extra == 'dev'
+Description-Content-Type: text/markdown
+# coderace
+Stop reading blog comparisons. Race coding agents against each other on real tasks in *your* repo with *your* code.
+Every week there's a new "Claude Code vs Codex vs Cursor" post. They test on toy problems with cherry-picked examples. coderace gives you automated, reproducible, scored comparisons on the tasks you actually care about.
+Define a task. Run it against Claude Code, Codex, Aider, Gemini CLI, and OpenCode. Get a scored comparison table.
+## Install
+```bash
+pip install coderace
+```
+## Quick Start
+```bash
+# Create a task template
+coderace init fix-auth-bug
+# Edit the task file (describe the bug, set test command)
+# Then race the agents:
+coderace run fix-auth-bug.yaml
+# Or race them in parallel (uses git worktrees):
+coderace run fix-auth-bug.yaml --parallel
+# View results from the last run
+coderace results fix-auth-bug.yaml
+```
+## `coderace diff` — Race Agents on a Real PR Diff
+Turn any git diff into a coderace task with one command:
+```bash
+# Race agents to review the latest commit
+git diff HEAD~1 | coderace diff --mode review | coderace run /dev/stdin
+# Generate a task YAML from a patch file, then run it
+git diff main...my-branch > my-pr.patch
+coderace diff --file my-pr.patch --mode fix --output task.yaml
+coderace run task.yaml
+```
+### Modes
+| Mode | What agents are asked to do |
+|------|-----------------------------|
+| `review` | Review the changes and provide feedback on correctness, style, and potential issues |
+| `fix` | Fix bugs or problems introduced by the diff |
+| `improve` | Enhance performance, readability, or robustness of the changed code |
+### Flags
+```
+--file PATH           Read diff from file instead of stdin
+--mode TEXT           review | fix | improve  (default: review)
+--agents TEXT         Override agent list (repeatable: --agents claude --agents aider)
+--name TEXT           Task name in generated YAML  (default: diff-task)
+--output PATH         Write YAML to file instead of stdout
+--test-command TEXT   Test command to embed in the task (default: pytest tests/ -x)
+--lint-command TEXT   Lint command to embed in the task (default: ruff check .)
+```
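The generated task has the same shape as the Task Format described in this README. The following Python sketch shows roughly what `coderace diff` has to produce: it wraps a diff in a task YAML string.

Several details are assumptions, not coderace's actual code:
- The function name `diff_to_task_yaml`.
- The exact prompt wording.
- The default agent list.

The sketch also does no YAML quoting, so values containing `: ` would need escaping.

```python
from __future__ import annotations

# Mode prompts paraphrased from the Modes table; coderace's real wording may differ.
MODE_PROMPTS = {
    "review": "Review the changes and provide feedback on correctness, style, and potential issues.",
    "fix": "Fix bugs or problems introduced by the diff.",
    "improve": "Enhance performance, readability, or robustness of the changed code.",
}


def diff_to_task_yaml(
    diff_text: str,
    mode: str = "review",
    name: str = "diff-task",
    agents: list[str] | None = None,
    test_command: str = "pytest tests/ -x",
    lint_command: str = "ruff check .",
) -> str:
    """Hypothetical helper: wrap a unified diff in a task YAML string."""
    if mode not in MODE_PROMPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    agents = agents or ["claude", "codex", "aider"]  # assumed default agent list
    lines = [f"name: {name}", "description: |", f"  {MODE_PROMPTS[mode]}", "  Diff:"]
    # Indent each diff line so it stays inside the YAML literal block scalar.
    lines += [f"    {line}" for line in diff_text.splitlines()]
    lines += [
        "repo: .",
        f"test_command: {test_command}",
        f"lint_command: {lint_command}",
        "agents:",
    ]
    lines += [f"  - {agent}" for agent in agents]
    return "\n".join(lines) + "\n"
```

Usage: `print(diff_to_task_yaml(sys.stdin.read(), mode="fix"))`. Embedding the diff in a `|` block scalar keeps its leading `+`/`-` characters intact without any escaping.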
+## Task Format
+```yaml
+name: fix-auth-bug
+description: |
+  The login endpoint returns 500 when email contains a plus sign.
+  Fix the email validation in auth/validators.py.
+repo: .
+test_command: pytest tests/test_auth.py -x
+lint_command: ruff check .
+timeout: 300
+agents:
+  - claude
+  - codex
+  - aider
+```
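A task file is plain YAML, so once it has been loaded (for example with PyYAML's `yaml.safe_load`), checking it can be as simple as the sketch that follows.

What comes from the README and what is assumed:
- The valid agent names come from the Supported Agents table.
- Which fields count as required is an assumption for illustration.
- The 300-second default timeout is also an assumption.

```python
from __future__ import annotations

# Agent names accepted in task YAML, per the Supported Agents table.
VALID_AGENTS = {"claude", "codex", "aider", "gemini", "opencode"}
# Fields this sketch treats as required; coderace's real rules may differ.
REQUIRED_FIELDS = ("name", "description", "test_command", "agents")


def validate_task(task: dict) -> list[str]:
    """Return human-readable problems with a loaded task (empty list if it looks valid)."""
    errors = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in task]
    for agent in task.get("agents", []):
        if agent not in VALID_AGENTS:
            errors.append(f"unknown agent: {agent}")
    timeout = task.get("timeout", 300)  # assumed default, in seconds
    if not isinstance(timeout, int) or timeout <= 0:
        errors.append("timeout must be a positive integer (seconds)")
    return errors
```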
+## What It Does
+For each agent in the task:
+1. Creates a fresh git branch (`coderace/<agent>-<task>`)
+2. Invokes the agent CLI with the task description
+3. Runs your test command
+4. Runs your lint command (optional)
+5. Computes a composite score
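The per-agent steps above can be sketched in Python with `subprocess`. This is a hypothetical outline with two gaps:
- `agent_cmd` stands in for whatever CLI invocation coderace actually uses for each agent, which the README does not document.
- Creating and resetting the git branch (step 1) is left out.

"Exit clean" here means the command returned 0 before the timeout, matching the Scoring table's definition.

```python
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    ok: bool          # exited 0 before the timeout
    timed_out: bool
    seconds: float    # wall-clock duration


def run_step(cmd: str, cwd: str = ".", timeout: float | None = None) -> StepResult:
    """Run a shell command and report whether it exited 0 within the timeout."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, shell=True, cwd=cwd, timeout=timeout,
                              capture_output=True, text=True)
        ok, timed_out = proc.returncode == 0, False
    except subprocess.TimeoutExpired:
        ok, timed_out = False, True
    return StepResult(ok, timed_out, time.monotonic() - start)


def branch_name(agent: str, task: str) -> str:
    """Branch naming convention from step 1: coderace/<agent>-<task>."""
    return f"coderace/{agent}-{task}"


def evaluate(task: dict, agent_cmd: str) -> dict:
    """Steps 2-4 in the task's repo; `agent_cmd` is a hypothetical placeholder."""
    repo = task.get("repo", ".")
    agent = run_step(agent_cmd, cwd=repo, timeout=task.get("timeout", 300))
    tests = run_step(task["test_command"], cwd=repo)
    lint_cmd = task.get("lint_command")
    lint = run_step(lint_cmd, cwd=repo) if lint_cmd else None  # lint is optional
    return {
        "exit_clean": agent.ok,
        "tests_pass": tests.ok,
        "lint_clean": None if lint is None else lint.ok,
        "time_s": agent.seconds,
    }
```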
+## Scoring
+| Metric | Weight | Description |
+|--------|--------|-------------|
+| Tests pass | 40% | Did the test command exit 0? |
+| Exit clean | 20% | Did the agent itself exit 0 without timeout? |
+| Lint clean | 15% | Did the lint command exit 0? |
+| Wall time | 15% | Faster is better (normalized across agents) |
+| Lines changed | 10% | Fewer is better (normalized across agents) |
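The table gives the weights but not the exact normalization for time and lines, so the following Python sketch assumes inverted min-max scaling: the fastest or smallest diff earns the full weight and the slowest or largest earns zero.

Applied to the three sample agents in the Output table, this sketch yields 93.3, 60.0 and 60.0 rather than the table's 85.0, 70.0 and 55.0. coderace's real formula therefore evidently differs; treat this only as an illustration of a weighted composite score.

```python
from __future__ import annotations

# Default weights from the Scoring table (points out of 100).
WEIGHTS = {"tests": 40, "exit": 20, "lint": 15, "time": 15, "lines": 10}


def inverse_minmax(value: float, values: list[float]) -> float:
    """Map value to [0, 1] so the smallest value in the field gets 1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0  # all agents tied
    return (hi - value) / (hi - lo)


def composite_scores(results: list[dict], weights: dict = WEIGHTS) -> list[float]:
    times = [r["time_s"] for r in results]
    lines = [r["lines"] for r in results]
    scores = []
    for r in results:
        score = (
            weights["tests"] * bool(r["tests_pass"])
            + weights["exit"] * bool(r["exit_clean"])
            + weights["lint"] * bool(r["lint_clean"])  # None (no lint command) scores 0 here
            + weights["time"] * inverse_minmax(r["time_s"], times)
            + weights["lines"] * inverse_minmax(r["lines"], lines)
        )
        scores.append(round(score, 1))
    return scores
```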
+## Output
+Terminal table with Rich formatting:
+```
+┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┐
+│ Rank │ Agent  │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │
+├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┤
+│  1   │ claude │  85.0 │ PASS  │ PASS │ PASS │     10.5 │    42 │
+│  2   │ codex  │  70.0 │ PASS  │ PASS │ FAIL │     15.2 │    98 │
+│  3   │ aider  │  55.0 │ FAIL  │ PASS │ PASS │      8.1 │    31 │
+└──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┘
+```
+Results are also saved as JSON in `.coderace/<task>-results.json` and as a self-contained HTML report in `.coderace/<task>-results.html`.
+## Try It Now
+The `examples/` directory has ready-to-use task templates:
+```bash
+# Race agents on adding type hints to your project
+coderace run examples/add-type-hints.yaml
+# Race agents on fixing an edge case bug
+coderace run examples/fix-edge-case.yaml
+# Race agents on writing new tests
+coderace run examples/write-tests.yaml
+```
+Edit the `repo` and `description` fields to point at your actual project and describe your real task.
+## Statistical Mode
+Run each agent multiple times and get mean ± stddev:
+```bash
+coderace run task.yaml --runs 5
+```
+Useful for tasks with variable outcomes (LLM nondeterminism is real).
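The mean ± stddev summary for one agent can be sketched with Python's `statistics` module. Two details are assumptions:
- The choice of sample (rather than population) standard deviation.
- The display precision.

```python
from __future__ import annotations

from statistics import mean, stdev


def summarize(values: list[float], digits: int = 1) -> str:
    """Format one metric as 'mean ± stddev' (sample stddev; 0 for a single run)."""
    sd = stdev(values) if len(values) > 1 else 0.0
    return f"{mean(values):.{digits}f} ± {sd:.{digits}f}"


def aggregate(runs: list[dict[str, float]]) -> dict[str, str]:
    """Collapse per-run metrics (score, time, lines, ...) for a single agent."""
    return {key: summarize([run[key] for run in runs]) for key in runs[0]}
```

For example, scores of 80, 85 and 90 across `--runs 3` summarize to `85.0 ± 5.0`. Small dollar amounts such as cost would need a larger `digits`.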
+## HTML Reports
+Export results as a shareable single-file HTML report:
+```bash
+# Auto-generated on every run at .coderace/<task>-results.html
+# Or export manually:
+coderace results task.yaml --html report.html
+```
+The HTML report has sortable columns and a dark theme. Drop it in a blog post or Slack.
+## Custom Scoring
+Override the default weights in your task YAML:
+```yaml
+scoring:
+  tests: 60   # tests passing (default 40)
+  exit: 20    # clean exit (default 20)
+  lint: 10    # lint clean (default 15)
+  time: 5     # wall time (default 15)
+  lines: 5    # lines changed (default 10)
+```
+Weights are normalized automatically, so they don't need to sum to 100.
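Normalization here just means dividing each weight by the total. Two behaviors in the following Python sketch are assumptions, since the README example sets all five keys:
- Keys omitted from `scoring:` fall back to their defaults.
- Unknown keys are rejected.

```python
from __future__ import annotations

DEFAULT_WEIGHTS = {"tests": 40, "exit": 20, "lint": 15, "time": 15, "lines": 10}


def normalize_weights(overrides: dict[str, float] | None = None) -> dict[str, float]:
    """Merge a `scoring:` mapping onto the defaults, then rescale so weights sum to 1."""
    merged = {**DEFAULT_WEIGHTS, **(overrides or {})}
    unknown = set(merged) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown scoring keys: {sorted(unknown)}")
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("scoring weights must add up to a positive number")
    return {key: value / total for key, value in merged.items()}
```

With the example above (60/20/10/5/5) the weights already total 100, so they become 0.6, 0.2, 0.1, 0.05 and 0.05. Setting only `tests: 80` would give tests 80/140 of the score.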
+## Cost Tracking
+coderace automatically estimates API cost for each agent run. After every race, the results table includes a **Cost (USD)** column so you can compare quality-per-dollar, not just quality alone.
+```
+┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┬────────────┐
+│ Rank │ Agent  │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │ Cost (USD) │
+├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┼────────────┤
+│  1   │ claude │  85.0 │ PASS  │ PASS │ PASS │     10.5 │    42 │    $0.0063 │
+│  2   │ codex  │  70.0 │ PASS  │ PASS │ FAIL │     15.2 │    98 │    $0.0041 │
+│  3   │ aider  │  55.0 │ FAIL  │ PASS │ PASS │      8.1 │    31 │          - │
+└──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┴────────────┘
+```
+Cost appears in all output formats:
+- **Terminal** — `Cost (USD)` column (shows `-` when unavailable)
+- **Markdown** — `--format markdown` includes the column
+- **JSON** — `cost` object per agent result with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`
+- **HTML report** — Cost column plus `$/score` ratio column for direct efficiency comparison
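The per-agent `cost` object has the five fields named in the CHANGELOG's description of the `CostResult` dataclass. The following Python sketch mirrors those fields. Everything else is assumed for illustration:
- The field types.
- The surrounding JSON layout.
- The `$/score` helper.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class CostResult:
    # Field names as listed in the CHANGELOG; the types are assumptions.
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    model_name: str
    pricing_source: str  # "estimated" when derived from file sizes


def cost_per_point(cost: CostResult | None, score: float) -> float | None:
    """Dollars per score point, the idea behind the HTML `$/score` column."""
    if cost is None or score <= 0:
        return None  # shown as "-" when cost is unavailable
    return cost.estimated_cost_usd / score


def result_json(agent: str, score: float, cost: CostResult | None) -> str:
    """One agent's result with a nested `cost` object (null when unavailable)."""
    return json.dumps({
        "agent": agent,
        "score": score,
        "cost": asdict(cost) if cost is not None else None,
    })
```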
+### How it works
+Each agent adapter parses token counts or cost lines from the agent's CLI output:
+| Agent | Source |
+|-------|--------|
+| Claude Code | `usage.input_tokens` / `usage.output_tokens` from JSON output; or "Total cost: $N" lines |
+| Codex | `prompt_tokens=N, completion_tokens=N` usage summary |
+| Gemini CLI | `inputTokenCount=N, outputTokenCount=N` lines |
+| Aider | "Tokens: N sent, N received. Cost: $N message" lines |
+| OpenCode | "Total cost: $N" or generic token lines |
+If token counts are unavailable, cost is estimated from input file size + output diff size (marked as `pricing_source: "estimated"`).
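Extracting counts from CLI output amounts to regex matching on the formats listed in the table. The patterns in the following Python sketch are paraphrased from that table and are assumptions: real output changes between CLI versions. Accepting a `k` suffix (for example `2.3k`) is likewise an assumption about abbreviated counts.

The sketch does not cover:
- Claude Code's JSON `usage` fields.
- The file-size fallback.

```python
from __future__ import annotations

import re

# Illustrative patterns paraphrased from the table above; treat them as assumptions.
_NUM = r"(\d[\d,]*(?:\.\d+)?k?)"  # 1200, 1,200 or 2.3k
TOKEN_PATTERNS = {
    "codex": re.compile(rf"prompt_tokens={_NUM},\s*completion_tokens={_NUM}"),
    "gemini": re.compile(rf"inputTokenCount={_NUM},\s*outputTokenCount={_NUM}"),
    "aider": re.compile(rf"Tokens:\s*{_NUM} sent,\s*{_NUM} received"),
}
TOTAL_COST = re.compile(r"Total cost:\s*\$(\d+(?:\.\d+)?)")


def _to_int(text: str) -> int:
    """'1,200' -> 1200, '2.3k' -> 2300."""
    text = text.replace(",", "")
    if text.endswith("k"):
        return round(float(text[:-1]) * 1000)
    return int(float(text))


def parse_tokens(agent: str, output: str) -> tuple[int, int] | None:
    """(input_tokens, output_tokens) from the last matching line, or None."""
    pattern = TOKEN_PATTERNS.get(agent)
    matches = pattern.findall(output) if pattern else []
    if not matches:
        return None
    sent, received = matches[-1]
    return _to_int(sent), _to_int(received)


def parse_total_cost(output: str) -> float | None:
    """Dollar amount from a 'Total cost: $N' line, if present."""
    match = TOTAL_COST.search(output)
    return float(match.group(1)) if match else None
```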
+### Disable cost tracking
+```bash
+coderace run task.yaml --no-cost
+```
+## Custom Pricing
+Override the default pricing table in your task YAML — useful for custom models, negotiated rates, or open-source deployments.
+```yaml
+# pricing: per-agent or per-model overrides (USD per 1M tokens)
+pricing:
+  claude:
+    input_per_1m: 3.00    # default for claude-sonnet-4-6
+    output_per_1m: 15.00
+  codex:
+    input_per_1m: 3.00
+    output_per_1m: 15.00
+  # Or use the model name directly:
+  claude-opus-4-6:
+    input_per_1m: 15.00
+    output_per_1m: 75.00
+```
+Keys can be agent names (`claude`, `codex`, `aider`, `gemini`, `opencode`) or model names (`claude-sonnet-4-6`, `gpt-5.3-codex`, `gemini-2.5-pro`). The default pricing table covers:
+| Model | Input ($/1M) | Output ($/1M) |
+|-------|-------------|--------------|
+| claude-sonnet-4-6 | $3.00 | $15.00 |
+| claude-opus-4-6 | $15.00 | $75.00 |
+| gpt-5.3-codex | $3.00 | $15.00 |
+| gemini-2.5-pro | $1.25 | $10.00 |
+| gemini-3.1-pro | $1.25 | $10.00 |
+Pricing is easy to update: the table lives in `coderace/cost.py` as a plain dict.
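Prices are quoted per million tokens, so the cost is (input tokens × input price + output tokens × output price) / 1,000,000. With the default Sonnet 4.6 prices, 1,000 input and 200 output tokens cost $0.003 + $0.003 = $0.006.

The following Python sketch copies the default table above and applies `pricing:` overrides. The lookup precedence (model-name key, then agent-name key, then the defaults) is an assumption, not documented coderace behavior.

```python
from __future__ import annotations

# Default (input, output) USD per 1M tokens, copied from the table above.
DEFAULT_PRICING = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
    "gpt-5.3-codex": (3.00, 15.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-3.1-pro": (1.25, 10.00),
}


def resolve_price(agent: str, model: str, overrides: dict | None = None) -> tuple[float, float]:
    """Assumed precedence: model-name override, agent-name override, default table."""
    overrides = overrides or {}
    for key in (model, agent):
        if key in overrides:
            entry = overrides[key]
            return entry["input_per_1m"], entry["output_per_1m"]
    return DEFAULT_PRICING[model]  # KeyError for models missing from the table


def estimate_cost(input_tokens: int, output_tokens: int, price: tuple[float, float]) -> float:
    input_per_1m, output_per_1m = price
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000
```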
+## Supported Agents
+| Agent | CLI | Notes |
+|-------|-----|-------|
+| Claude Code | `claude` | Anthropic's coding agent |
+| Codex | `codex` | OpenAI Codex CLI |
+| Aider | `aider` | Git-integrated AI coding |
+| Gemini CLI | `gemini` | Google's Gemini CLI |
+| OpenCode | `opencode` | Open-source terminal agent |
+Each agent must be installed and authenticated separately.
+## Parallel Mode
+Use `--parallel` (or `-p`) to run all agents simultaneously using git worktrees. Each agent gets its own isolated working directory, so they don't interfere with each other.
+```bash
+coderace run task.yaml --parallel
+```
+Sequential mode (default) runs agents one at a time on the same repo.
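Isolation in parallel mode comes from giving each agent its own worktree on its own branch. The following Python sketch builds the corresponding `git worktree add -b <branch> <path> HEAD` and `git worktree remove` commands. The branch names follow the `coderace/<agent>-<task>` convention; the `../coderace-worktrees` location is hypothetical.

```python
from __future__ import annotations

import shlex

BASE_DIR = "../coderace-worktrees"  # hypothetical location, outside the main checkout


def setup_commands(task: str, agents: list[str], base_dir: str = BASE_DIR) -> list[list[str]]:
    """One new branch + worktree per agent, all starting from the current HEAD."""
    return [
        ["git", "worktree", "add", "-b", f"coderace/{agent}-{task}",
         f"{base_dir}/{agent}-{task}", "HEAD"]
        for agent in agents
    ]


def cleanup_commands(task: str, agents: list[str], base_dir: str = BASE_DIR) -> list[list[str]]:
    # --force: agents usually leave uncommitted changes, which a plain remove refuses.
    return [["git", "worktree", "remove", "--force", f"{base_dir}/{agent}-{task}"]
            for agent in agents]


if __name__ == "__main__":
    for cmd in setup_commands("fix-auth-bug", ["claude", "codex"]):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The agents can then be launched concurrently with `cwd` set to their own worktree path.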
+## Why coderace?
+**Blog posts compare models. coderace compares agents on your work.**
+- Run on your actual codebase, not HumanEval
+- Automated scoring: tests, lint, time, lines changed
+- Parallel mode with git worktrees (no interference between agents)
+- JSON output for CI integration and tracking over time
+- Works with any agent that has a CLI
+The goal isn't "which model is best." It's "which agent solves my specific problem best."
+## CI Integration
+Use coderace in GitHub Actions to automatically race agents on PRs and post results as comments.
+### Quick setup
+1. Copy `examples/ci-race-on-pr.yml` into `.github/workflows/` in your repo.
+2. Create a task YAML at `.github/coderace-task.yaml` (see [Task Format](#task-format)).
+3. Install the agent CLIs your task requires (see comments in the workflow file).
+4. Open or update a PR — results appear as a PR comment automatically.
+### Workflow: Race on every PR
+```yaml
+name: Race Coding Agents
+on:
+  pull_request:
+    branches: [main]
+jobs:
+  race:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run coderace
+        uses: mikiships/coderace@v0.3
+        with:
+          task: .github/coderace-task.yaml
+          agents: claude,aider
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+### Workflow: Race only when "race-agents" label is added
+Cost-control pattern: only race when a maintainer deliberately triggers it.
+```yaml
+name: Race Coding Agents (on label)
+on:
+  pull_request:
+    types: [labeled]
+jobs:
+  race:
+    if: github.event.label.name == 'race-agents'
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run coderace
+        uses: mikiships/coderace@v0.3
+        with:
+          task: .github/coderace-task.yaml
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+### Action inputs
+| Input | Description | Default |
+|-------|-------------|---------|
+| `task` | Path to coderace task YAML | _(required)_ |
+| `agents` | Comma-separated agents to race | _(from task file)_ |
+| `parallel` | Run agents in parallel (`true`/`false`) | `false` |
+| `github-token` | Token for posting PR comments | `${{ github.token }}` |
+| `coderace-version` | coderace version to install | `latest` |
+| `python-version` | Python version | `3.11` |
+### Example PR comment
+The action automatically posts (and updates on re-run) a comment like:
+> ✅ **coderace** — `fix-auth-bug` | **Winner: `claude`** (85.0 pts) | 3 agent(s) raced
+>
+> | Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines |
+> |------|-------|------:|:-----:|:----:|:----:|---------:|------:|
+> | 1 | `claude` | 85.0 | ✅ | ✅ | ✅ | 10.5 | 42 |
+> | 2 | `codex` | 70.0 | ✅ | ❌ | ✅ | 15.2 | 98 |
+> | 3 | `aider` | 55.0 | ❌ | ✅ | ✅ | 8.1 | 31 |
+The action uses a hidden HTML marker to find and update existing comments, so re-running doesn't spam the PR.
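The update-instead-of-repost behavior can be sketched in Python:
1. Render the table with a hidden HTML comment at the top.
2. On the next run, scan the PR's existing comments for that marker.
3. If one is found, edit that comment; otherwise create a new one.

The marker text and helper names are hypothetical; the action's real marker and formatting script are not shown in this README. On GitHub, the comment list would come from the REST endpoint `GET /repos/{owner}/{repo}/issues/{number}/comments`, and a matching comment would be edited with `PATCH /repos/{owner}/{repo}/issues/comments/{comment_id}`.

```python
from __future__ import annotations

MARKER = "<!-- coderace-results -->"  # hypothetical hidden marker


def _mark(ok: bool) -> str:
    return "✅" if ok else "❌"


def build_comment(task: str, rows: list[dict]) -> str:
    """Render a results table like the example above, prefixed with the marker."""
    ranked = sorted(rows, key=lambda r: r["score"], reverse=True)
    lines = [
        MARKER,
        f"**coderace** `{task}` | **Winner: `{ranked[0]['agent']}`** "
        f"({ranked[0]['score']:.1f} pts) | {len(ranked)} agent(s) raced",
        "",
        "| Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines |",
        "|------|-------|------:|:-----:|:----:|:----:|---------:|------:|",
    ]
    for rank, r in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | `{r['agent']}` | {r['score']:.1f} | {_mark(r['tests'])} | "
            f"{_mark(r['lint'])} | {_mark(r['exit'])} | {r['time_s']:.1f} | {r['lines']} |"
        )
    return "\n".join(lines)


def find_existing_comment(comments: list[dict], marker: str = MARKER) -> int | None:
    """Id of the first comment containing the marker (edit it), or None (post new)."""
    for comment in comments:
        if marker in comment.get("body", ""):
            return comment["id"]
    return None
```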
+## See Also
+- **[pytest-agentcontract](https://github.com/mikiships/pytest-agentcontract)** -- Deterministic CI tests for LLM agent trajectories. Record once, replay offline, assert contracts. Pairs well with coderace: race agents to find the best one, then lock down its behavior with contract tests.
+## Requirements
+- Python 3.10+
+- Git
+- At least one coding agent CLI installed
+## License
+MIT
