PyPI - coderace - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

coderace 0.3.0__tar.gz → 0.4.0__tar.gz


Files changed (62)

{coderace-0.3.0 → coderace-0.4.0}/CHANGELOG.md RENAMED

@@ -1,5 +1,18 @@
 # Changelog
+## [0.4.0] - 2026-02-24
+### Added
+- **Cost tracking** — Each agent run now includes an estimated API cost. The results table shows a `Cost (USD)` column in terminal, markdown, JSON, and HTML output.
+- **`coderace/cost.py`** — Pricing engine: pricing table for Claude Code (Sonnet 4.6, Opus 4.6), Codex (GPT-5.3), Gemini CLI (2.5 Pro, 3.1 Pro), Aider, and OpenCode. `CostResult` dataclass with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`.
+- **Per-adapter `parse_cost()` methods** — Each adapter extracts token counts or cost info from the agent's stdout/stderr. Falls back to file-size estimation when tokens are unavailable.
+- **`pricing:` section in task YAML** — Override pricing per-agent or per-model with `input_per_1m` / `output_per_1m` (USD per 1M tokens).
+- **`--no-cost` flag** — `coderace run task.yaml --no-cost` disables cost tracking entirely.
+- **HTML report $/score column** — The HTML report now shows cost and cost-per-point for direct efficiency comparison.
+- **Statistical mode cost aggregation** — `--runs N` shows mean ± stddev for cost alongside score and time.
+- **`coderace init` template** — Now includes a commented `pricing:` example section.
 ## [0.3.0] - 2026-02-24
 ### Added
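The changelog entry above describes pricing in USD per 1M tokens and a `CostResult` dataclass. A minimal sketch of how such a calculation could look is given below. It is not the package's implementation: `PRICING` and `estimate_cost` are hypothetical names, and the `"table"` value for `pricing_source` is an assumption. Only the `CostResult` field names and the listed rates come from the release notes and README.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rates in USD per 1M tokens as (input, output), copied from the README table.
# The dict name and layout are assumptions made for this sketch.
PRICING: dict[str, tuple[float, float]] = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
    "gpt-5.3-codex": (3.00, 15.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-3.1-pro": (1.25, 10.00),
}


@dataclass
class CostResult:
    """Field names follow the changelog; everything else is illustrative."""

    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    model_name: str
    pricing_source: str


def estimate_cost(
    input_tokens: int, output_tokens: int, model_name: str
) -> CostResult | None:
    """Hypothetical helper: price a run from token counts, or None if unknown."""
    rates = PRICING.get(model_name)
    if rates is None:
        return None  # unknown model: no cost rather than a guess
    in_rate, out_rate = rates
    cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    # "table" is an assumed label; the source only documents "estimated".
    return CostResult(input_tokens, output_tokens, cost, model_name, "table")
```

For instance, 1,000 input tokens and 220 output tokens on `claude-sonnet-4-6` work out to (1000 × 3.00 + 220 × 15.00) / 1,000,000 = $0.0063.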

{coderace-0.3.0 → coderace-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderace
-Version: 0.3.0
+Version: 0.4.0
 Summary: Race coding agents against each other on real tasks
 Project-URL: Homepage, https://github.com/mikiships/coderace
 Project-URL: Repository, https://github.com/mikiships/coderace
@@ -200,6 +200,77 @@ scoring:
 Weights are normalized automatically (don't need to sum to 100).
+## Cost Tracking
+coderace automatically estimates API cost for each agent run. After every race, the results table includes a **Cost (USD)** column so you can compare quality-per-dollar, not just quality alone.
+```
+┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┬────────────┐
+│ Rank │ Agent  │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │ Cost (USD) │
+├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┼────────────┤
+│  1   │ claude │  85.0 │ PASS  │ PASS │ PASS │     10.5 │    42 │    $0.0063 │
+│  2   │ codex  │  70.0 │ PASS  │ PASS │ FAIL │     15.2 │    98 │    $0.0041 │
+│  3   │ aider  │  55.0 │ FAIL  │ PASS │ PASS │      8.1 │    31 │          - │
+└──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┴────────────┘
+```
+Cost appears in all output formats:
+- **Terminal** — `Cost (USD)` column (shows `-` when unavailable)
+- **Markdown** — `--format markdown` includes the column
+- **JSON** — `cost` object per agent result with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`
+- **HTML report** — Cost column plus `$/score` ratio column for direct efficiency comparison
+### How it works
+Each agent adapter parses token counts or cost lines from the agent's CLI output:
+| Agent | Source |
+|-------|--------|
+| Claude Code | `usage.input_tokens` / `usage.output_tokens` from JSON output; or "Total cost: $N" lines |
+| Codex | `prompt_tokens=N, completion_tokens=N` usage summary |
+| Gemini CLI | `inputTokenCount=N, outputTokenCount=N` lines |
+| Aider | "Tokens: N sent, N received. Cost: $N message" lines |
+| OpenCode | "Total cost: $N" or generic token lines |
+If token counts are unavailable, cost is estimated from input file size + output diff size (marked as `pricing_source: "estimated"`).
+### Disable cost tracking
+```bash
+coderace run task.yaml --no-cost
+```
+## Custom Pricing
+Override the default pricing table in your task YAML — useful for custom models, negotiated rates, or open-source deployments.
+```yaml
+# pricing: per-agent or per-model overrides (USD per 1M tokens)
+pricing:
+  claude:
+    input_per_1m: 3.00    # default for claude-sonnet-4-6
+    output_per_1m: 15.00
+  codex:
+    input_per_1m: 3.00
+    output_per_1m: 15.00
+  # Or use the model name directly:
+  claude-opus-4-6:
+    input_per_1m: 15.00
+    output_per_1m: 75.00
+```
+Keys can be agent names (`claude`, `codex`, `aider`, `gemini`, `opencode`) or model names (`claude-sonnet-4-6`, `gpt-5.3-codex`, `gemini-2.5-pro`). The default pricing table covers:
+| Model | Input ($/1M) | Output ($/1M) |
+|-------|-------------|--------------|
+| claude-sonnet-4-6 | $3.00 | $15.00 |
+| claude-opus-4-6 | $15.00 | $75.00 |
+| gpt-5.3-codex | $3.00 | $15.00 |
+| gemini-2.5-pro | $1.25 | $10.00 |
+| gemini-3.1-pro | $1.25 | $10.00 |
+Pricing is easy to update: the table lives in `coderace/cost.py` as a plain dict.
 ## Supported Agents
 | Agent | CLI | Notes |
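The "How it works" table above lists the output lines each adapter looks for. A rough sketch of how two of those formats (the Codex usage summary and the Aider token line) might be matched with regular expressions follows. The function names `parse_codex_tokens` and `parse_aider_line` are hypothetical and are not the package's `parse_codex_cost` / `parse_aider_cost`. The sketch also assumes that `N` is a plain integer (optionally with thousands separators for Aider) and that the last matching line in the output is the one to use.

```python
from __future__ import annotations

import re

# Patterns follow the line shapes quoted in the README table.
CODEX_USAGE = re.compile(r"prompt_tokens=(\d+),\s*completion_tokens=(\d+)")
AIDER_USAGE = re.compile(
    r"Tokens:\s*(\d[\d,]*)\s*sent,\s*(\d[\d,]*)\s*received\.\s*"
    r"Cost:\s*\$(\d+(?:\.\d+)?)"
)


def parse_codex_tokens(output: str) -> tuple[int, int] | None:
    """Return (prompt_tokens, completion_tokens) from the last usage line."""
    matches = CODEX_USAGE.findall(output)
    if not matches:
        return None
    prompt, completion = matches[-1]  # assumption: last summary wins
    return int(prompt), int(completion)


def parse_aider_line(output: str) -> tuple[int, int, float] | None:
    """Return (sent, received, cost_usd) from the last Aider token line."""
    matches = AIDER_USAGE.findall(output)
    if not matches:
        return None
    sent, received, cost = matches[-1]
    return int(sent.replace(",", "")), int(received.replace(",", "")), float(cost)
```

Returning `None` when nothing matches leaves room for the file-size fallback described in the README.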

{coderace-0.3.0 → coderace-0.4.0}/README.md RENAMED

@@ -170,6 +170,77 @@ scoring:
 Weights are normalized automatically (don't need to sum to 100).
+## Cost Tracking
+coderace automatically estimates API cost for each agent run. After every race, the results table includes a **Cost (USD)** column so you can compare quality-per-dollar, not just quality alone.
+```
+┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┬────────────┐
+│ Rank │ Agent  │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │ Cost (USD) │
+├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┼────────────┤
+│  1   │ claude │  85.0 │ PASS  │ PASS │ PASS │     10.5 │    42 │    $0.0063 │
+│  2   │ codex  │  70.0 │ PASS  │ PASS │ FAIL │     15.2 │    98 │    $0.0041 │
+│  3   │ aider  │  55.0 │ FAIL  │ PASS │ PASS │      8.1 │    31 │          - │
+└──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┴────────────┘
+```
+Cost appears in all output formats:
+- **Terminal** — `Cost (USD)` column (shows `-` when unavailable)
+- **Markdown** — `--format markdown` includes the column
+- **JSON** — `cost` object per agent result with `input_tokens`, `output_tokens`, `estimated_cost_usd`, `model_name`, `pricing_source`
+- **HTML report** — Cost column plus `$/score` ratio column for direct efficiency comparison
+### How it works
+Each agent adapter parses token counts or cost lines from the agent's CLI output:
+| Agent | Source |
+|-------|--------|
+| Claude Code | `usage.input_tokens` / `usage.output_tokens` from JSON output; or "Total cost: $N" lines |
+| Codex | `prompt_tokens=N, completion_tokens=N` usage summary |
+| Gemini CLI | `inputTokenCount=N, outputTokenCount=N` lines |
+| Aider | "Tokens: N sent, N received. Cost: $N message" lines |
+| OpenCode | "Total cost: $N" or generic token lines |
+If token counts are unavailable, cost is estimated from input file size + output diff size (marked as `pricing_source: "estimated"`).
+### Disable cost tracking
+```bash
+coderace run task.yaml --no-cost
+```
+## Custom Pricing
+Override the default pricing table in your task YAML — useful for custom models, negotiated rates, or open-source deployments.
+```yaml
+# pricing: per-agent or per-model overrides (USD per 1M tokens)
+pricing:
+  claude:
+    input_per_1m: 3.00    # default for claude-sonnet-4-6
+    output_per_1m: 15.00
+  codex:
+    input_per_1m: 3.00
+    output_per_1m: 15.00
+  # Or use the model name directly:
+  claude-opus-4-6:
+    input_per_1m: 15.00
+    output_per_1m: 75.00
+```
+Keys can be agent names (`claude`, `codex`, `aider`, `gemini`, `opencode`) or model names (`claude-sonnet-4-6`, `gpt-5.3-codex`, `gemini-2.5-pro`). The default pricing table covers:
+| Model | Input ($/1M) | Output ($/1M) |
+|-------|-------------|--------------|
+| claude-sonnet-4-6 | $3.00 | $15.00 |
+| claude-opus-4-6 | $15.00 | $75.00 |
+| gpt-5.3-codex | $3.00 | $15.00 |
+| gemini-2.5-pro | $1.25 | $10.00 |
+| gemini-3.1-pro | $1.25 | $10.00 |
+Pricing is easy to update: the table lives in `coderace/cost.py` as a plain dict.
 ## Supported Agents
 | Agent | CLI | Notes |
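The "Custom Pricing" section above says override keys can be agent names or model names. One way this lookup could work is sketched below, assuming the YAML `pricing:` section has already been loaded into a plain dict. The helper names `normalize_pricing` and `resolve_rates` are hypothetical, and the precedence (model key first, then agent key, then the default table) is an assumption: the README does not state which wins. The `(input, output)` tuple shape mirrors the `custom_pricing: dict[str, tuple[float, float]]` parameter seen in the adapter signatures.

```python
from __future__ import annotations

# Subset of the default table from the README, as (input, output) USD per 1M.
DEFAULT_PRICING: dict[str, tuple[float, float]] = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
}


def normalize_pricing(section: dict | None) -> dict[str, tuple[float, float]]:
    """Turn the loaded `pricing:` mapping into {key: (input, output)} tuples."""
    result: dict[str, tuple[float, float]] = {}
    for key, rates in (section or {}).items():
        result[key] = (float(rates["input_per_1m"]), float(rates["output_per_1m"]))
    return result


def resolve_rates(
    agent: str,
    model: str,
    custom: dict[str, tuple[float, float]] | None,
) -> tuple[float, float] | None:
    """Pick rates: model override, then agent override, then defaults."""
    custom = custom or {}
    for key in (model, agent):  # assumed precedence, not confirmed by the source
        if key in custom:
            return custom[key]
    return DEFAULT_PRICING.get(model)
```

With this ordering, an override under `claude` applies to whatever model the Claude adapter runs unless a more specific model key is also present.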

coderace-0.4.0/all-day-build-contract-cost-tracking.md ADDED

@@ -0,0 +1,97 @@
+# All-Day Build Contract: Cost Tracking (v0.4.0)
+Status: In Progress
+Date: 2026-02-24
+Owner: Sub-agent execution pass
+Scope type: Deliverable-gated (no hour promises)
+## 1. Objective
+Add cost tracking to coderace so users can compare coding agents on quality-per-dollar, not just quality alone. When a race finishes, each agent's result includes estimated API cost. The results table shows a $/score column. This is the #1 missing comparison axis: everyone benchmarks speed and quality, nobody automates cost comparison.
+This contract is considered complete only when every deliverable and validation gate below is satisfied.
+## 2. Non-Negotiable Build Rules
+1. No time-based completion claims.
+2. Completion is allowed only when all checklist items are checked.
+3. Full test suite must pass at the end.
+4. New features must ship with docs and report addendum updates in the same pass.
+5. CLI outputs must be deterministic and schema-backed where specified.
+6. Never modify files outside the project directory.
+7. Commit after each completed deliverable (not at the end).
+8. If stuck on same issue for 3 attempts, stop and write a blocker report.
+9. Do NOT refactor, restyle, or "improve" code outside the deliverables.
+10. Read existing tests and docs before writing new code.
+## 3. Feature Deliverables
+### D1. Cost estimation engine (core)
+Build a cost estimation module that maps agent CLI output to dollar costs. Each agent adapter gets a `parse_cost()` method that extracts token counts or cost info from the agent's stdout/stderr.
+Required:
+- `coderace/cost.py` — pricing tables, cost calculation logic
+- `coderace/adapters/*.py` — updated with parse_cost() methods
+- [ ] Pricing table for: Claude Code (Sonnet 4.6, Opus 4.6), Codex (GPT-5.3), Gemini CLI (Gemini 2.5 Pro, Gemini 3.1 Pro), Aider (configurable model), OpenCode (configurable model)
+- [ ] Parse token counts from each agent's output (Claude Code prints session summary, Codex prints usage, etc.)
+- [ ] Fallback: if token counts unavailable, estimate from input file size + output diff size using per-model pricing
+- [ ] CostResult dataclass: input_tokens, output_tokens, estimated_cost_usd, model_name, pricing_source
+- [ ] Tests for D1: unit tests for each parser, edge cases (missing output, unknown model)
+### D2. Results integration
+Integrate cost data into the race results pipeline. Show cost alongside score in all output formats.
+Required:
+- `coderace/results.py` — updated
+- `coderace/cli.py` — updated
+- [ ] Race results include cost_usd field per agent
+- [ ] `coderace results` terminal output shows Cost column
+- [ ] `--format markdown` includes cost column
+- [ ] `--format json` includes cost object
+- [ ] HTML report includes cost column with $/score ratio
+- [ ] Statistical mode (`--runs N`) aggregates cost: mean ± stddev
+- [ ] Tests for D2
+### D3. Cost configuration
+Allow users to override pricing in task YAML (for custom models, negotiated rates, etc).
+Required:
+- `coderace/config.py` or extend task YAML schema
+- [ ] `pricing:` section in task YAML: per-agent or per-model overrides
+- [ ] `coderace init` template includes commented pricing example
+- [ ] `--no-cost` flag to disable cost tracking entirely
+- [ ] Tests for D3
+### D4. Documentation
+- [ ] README section: "Cost Tracking" with example output
+- [ ] README section: "Custom Pricing" showing YAML config
+- [ ] CHANGELOG entry for v0.4.0
+- [ ] Update example task YAMLs with pricing comments
+## 4. Test Requirements
+- [ ] Unit tests for cost parsing (each adapter)
+- [ ] Unit tests for pricing calculation
+- [ ] Integration test: full race with cost output
+- [ ] Edge cases: agent crashes (no cost data), unknown model, zero tokens
+- [ ] All existing 130 tests must still pass
+## 5. Reports
+- Write progress to `progress-log.md` after each deliverable
+- Include: what was built, what tests pass, what's next, any blockers
+- Final summary when all deliverables done or stopped
+## 6. Stop Conditions
+- All deliverables checked and all tests passing -> DONE
+- 3 consecutive failed attempts on same issue -> STOP, write blocker report
+- Scope creep detected (new requirements discovered) -> STOP, report what's new
+- All tests passing but deliverables remain -> continue to next deliverable
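Deliverable D1 in the contract above calls for a fallback that estimates cost from input file size plus output diff size when token counts are unavailable. A minimal sketch of that idea follows. The 4-bytes-per-token ratio is a common rough heuristic and an assumption here, not something stated in the source, and `estimate_tokens` / `fallback_cost` are hypothetical names. Only the `pricing_source: "estimated"` marker comes from the README.

```python
from __future__ import annotations


def estimate_tokens(num_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token count from a byte size (heuristic ratio, assumed)."""
    return max(0, round(num_bytes / bytes_per_token))


def fallback_cost(
    input_bytes: int,
    diff_bytes: int,
    rates: tuple[float, float],
) -> dict:
    """Estimate cost when the agent printed no usage information.

    `rates` is (input, output) in USD per 1M tokens.
    """
    in_rate, out_rate = rates
    input_tokens = estimate_tokens(input_bytes)
    output_tokens = estimate_tokens(diff_bytes)
    cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost,
        "pricing_source": "estimated",  # marker documented in the README
    }
```

An estimate of this kind ignores context re-reads and multi-turn traffic, so it is likely a lower bound rather than a precise figure.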

coderace-0.4.0/coderace/adapters/aider.py ADDED

@@ -0,0 +1,33 @@
+"""Aider adapter."""
+from __future__ import annotations
+from typing import Optional
+from coderace.adapters.base import BaseAdapter
+from coderace.cost import CostResult, parse_aider_cost
+class AiderAdapter(BaseAdapter):
+    """Adapter for Aider coding assistant."""
+    name = "aider"
+    def build_command(self, task_description: str) -> list[str]:
+        return [
+            "aider",
+            "--message",
+            task_description,
+            "--yes",
+            "--no-auto-commits",
+        ]
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "aider-default",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from Aider output."""
+        return parse_aider_cost(stdout, stderr, model_name, custom_pricing)

{coderace-0.3.0 → coderace-0.4.0}/coderace/adapters/base.py RENAMED

@@ -6,7 +6,9 @@ import subprocess
 import time
 from abc import ABC, abstractmethod
 from pathlib import Path
+from typing import Optional
+from coderace.cost import CostResult
 from coderace.types import AgentResult
@@ -20,7 +22,27 @@ class BaseAdapter(ABC):
         """Build the CLI command to invoke this agent."""
         ...
-    def run(self, task_description: str, workdir: Path, timeout: int) -> AgentResult:
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from agent output. Override in subclasses.
+        Returns None if cost data is unavailable.
+        """
+        return None
+    def run(
+        self,
+        task_description: str,
+        workdir: Path,
+        timeout: int,
+        no_cost: bool = False,
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> AgentResult:
         """Run the agent on a task and capture results."""
         cmd = self.build_command(task_description)
         start = time.monotonic()
@@ -50,6 +72,14 @@ class BaseAdapter(ABC):
         wall_time = time.monotonic() - start
+        # Parse cost (fails gracefully — never raises)
+        cost_result: Optional[CostResult] = None
+        if not no_cost:
+            try:
+                cost_result = self.parse_cost(stdout, stderr, custom_pricing=custom_pricing)
+            except Exception:
+                pass
         return AgentResult(
             agent=self.name,
             exit_code=exit_code,
@@ -57,4 +87,5 @@ class BaseAdapter(ABC):
             stderr=stderr,
             wall_time=wall_time,
             timed_out=timed_out,
+            cost_result=cost_result,
         )
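The `run()` change above wraps `parse_cost()` so that a parser failure never breaks a race, and `no_cost` skips parsing entirely. The same pattern is isolated below as a standalone sketch. `safe_parse_cost` is a hypothetical helper written for illustration (it takes any callable rather than an adapter), not a function in the package.

```python
from __future__ import annotations

from typing import Callable, Optional


def safe_parse_cost(
    parse: Callable[[str, str], Optional[dict]],
    stdout: str,
    stderr: str,
    no_cost: bool = False,
) -> Optional[dict]:
    """Call a cost parser, returning None instead of raising."""
    if no_cost:
        return None  # cost tracking disabled: do not even attempt parsing
    try:
        return parse(stdout, stderr)
    except Exception:
        # Mirrors the diff: a broken parser degrades to "no cost data".
        return None


def broken_parser(stdout: str, stderr: str) -> Optional[dict]:
    """Stand-in parser that always fails, to exercise the guard."""
    raise ValueError("unexpected output format")
```

The trade-off is that a silently swallowed exception can hide a parser bug. Logging the exception at debug level might be worth considering, though that is a suggestion and not something the release does.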

{coderace-0.3.0 → coderace-0.4.0}/coderace/adapters/claude.py RENAMED

@@ -2,7 +2,10 @@
 from __future__ import annotations
+from typing import Optional
 from coderace.adapters.base import BaseAdapter
+from coderace.cost import CostResult, parse_claude_cost
 class ClaudeAdapter(BaseAdapter):
@@ -19,3 +22,13 @@ class ClaudeAdapter(BaseAdapter):
             "-p",
             task_description,
         ]
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "claude-sonnet-4-6",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from Claude Code output."""
+        return parse_claude_cost(stdout, stderr, model_name, custom_pricing)

coderace-0.4.0/coderace/adapters/codex.py ADDED

@@ -0,0 +1,33 @@
+"""Codex CLI adapter."""
+from __future__ import annotations
+from typing import Optional
+from coderace.adapters.base import BaseAdapter
+from coderace.cost import CostResult, parse_codex_cost
+class CodexAdapter(BaseAdapter):
+    """Adapter for OpenAI Codex CLI."""
+    name = "codex"
+    def build_command(self, task_description: str) -> list[str]:
+        return [
+            "codex",
+            "--quiet",
+            "--full-auto",
+            "-p",
+            task_description,
+        ]
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "gpt-5.3-codex",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from Codex CLI output."""
+        return parse_codex_cost(stdout, stderr, model_name, custom_pricing)

coderace-0.4.0/coderace/adapters/gemini.py ADDED

@@ -0,0 +1,32 @@
+"""Gemini CLI adapter."""
+from __future__ import annotations
+from typing import Optional
+from coderace.adapters.base import BaseAdapter
+from coderace.cost import CostResult, parse_gemini_cost
+class GeminiAdapter(BaseAdapter):
+    """Adapter for Google Gemini CLI."""
+    name = "gemini"
+    def build_command(self, task_description: str) -> list[str]:
+        return [
+            "gemini",
+            "--non-interactive",
+            "-p",
+            task_description,
+        ]
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "gemini-2.5-pro",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from Gemini CLI output."""
+        return parse_gemini_cost(stdout, stderr, model_name, custom_pricing)

coderace-0.4.0/coderace/adapters/opencode.py ADDED

@@ -0,0 +1,31 @@
+"""OpenCode adapter."""
+from __future__ import annotations
+from typing import Optional
+from coderace.adapters.base import BaseAdapter
+from coderace.cost import CostResult, parse_opencode_cost
+class OpenCodeAdapter(BaseAdapter):
+    """Adapter for OpenCode CLI (terminal-first AI coding agent)."""
+    name = "opencode"
+    def build_command(self, task_description: str) -> list[str]:
+        return [
+            "opencode",
+            "run",
+            task_description,
+        ]
+    def parse_cost(
+        self,
+        stdout: str,
+        stderr: str,
+        model_name: str = "opencode-default",
+        custom_pricing: dict[str, tuple[float, float]] | None = None,
+    ) -> Optional[CostResult]:
+        """Parse cost data from OpenCode output."""
+        return parse_opencode_cost(stdout, stderr, model_name, custom_pricing)

{coderace-0.3.0 → coderace-0.4.0}/coderace/cli.py RENAMED

@@ -51,6 +51,8 @@ def _run_agent_sequential(
     branch: str,
     base_ref: str,
     timeout: int,
+    no_cost: bool = False,
+    custom_pricing: dict | None = None,
 ) -> tuple[AgentResult | None, int]:
     """Run a single agent sequentially (on the main repo). Returns (result, lines_changed)."""
     try:
@@ -59,7 +61,7 @@ def _run_agent_sequential(
         return None, 0
     adapter = ADAPTERS[agent_name]()
-    result = adapter.run(task_description, repo, timeout)
+    result = adapter.run(task_description, repo, timeout, no_cost=no_cost, custom_pricing=custom_pricing)
     _, lines = get_diff_stat(repo, base_ref)
     return result, lines
@@ -72,6 +74,8 @@ def _run_agent_worktree(
     branch: str,
     base_ref: str,
     timeout: int,
+    no_cost: bool = False,
+    custom_pricing: dict | None = None,
 ) -> tuple[AgentResult | None, int]:
     """Run a single agent in a git worktree (for parallel execution)."""
     import tempfile
@@ -87,7 +91,7 @@ def _run_agent_worktree(
         add_worktree(repo, worktree_dir, branch)
         adapter = ADAPTERS[agent_name]()
-        result = adapter.run(task_description, worktree_dir, timeout)
+        result = adapter.run(task_description, worktree_dir, timeout, no_cost=no_cost, custom_pricing=custom_pricing)
         _, lines = get_diff_stat(worktree_dir, base_ref)
         return result, lines
@@ -114,6 +118,9 @@ def run(
     runs: int = typer.Option(
         1, "--runs", "-n", help="Number of runs (>1 for stats)"
     ),
+    no_cost: bool = typer.Option(
+        False, "--no-cost", help="Disable cost tracking"
+    ),
 ) -> None:
     """Run all agents on a task and score the results."""
     task = load_task(task_file)
@@ -195,6 +202,8 @@ def run(
                         branch,
                         base_ref,
                         task.timeout,
+                        no_cost,
+                        task.pricing,
                     )
                     futures[future] = agent_name
@@ -248,6 +257,8 @@ def run(
                     branch,
                     base_ref,
                     task.timeout,
+                    no_cost=no_cost,
+                    custom_pricing=task.pricing,
                 )
                 if result is None:
@@ -417,6 +428,8 @@ def _save_stats_json(
                 "exit_clean_rate": s.exit_clean_rate,
                 "lint_clean_rate": s.lint_clean_rate,
                 "per_run_scores": s.per_run_scores,
+                "cost_mean": s.cost_mean,
+                "cost_stddev": s.cost_stddev,
             }
         )
@@ -484,9 +497,16 @@ def results(
     table.add_column("Lint", justify="center")
     table.add_column("Time (s)", justify="right")
     table.add_column("Lines", justify="right")
+    table.add_column("Cost (USD)", justify="right")
     for entry in data:
         b = entry["breakdown"]
+        cost_info = entry.get("cost")
+        cost_str = (
+            f"${cost_info['estimated_cost_usd']:.4f}"
+            if cost_info is not None
+            else "-"
+        )
         table.add_row(
             str(entry["rank"]),
             entry["agent"],
@@ -496,6 +516,7 @@ def results(
             _bool_icon(b["lint_clean"]),
             f"{b['wall_time']:.1f}",
             str(b["lines_changed"]),
+            cost_str,
         )
     console.print(table)
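The stats JSON above gains `cost_mean` and `cost_stddev` fields for `--runs N`. A possible way to compute them with the standard library is sketched below. `aggregate_costs` is a hypothetical name, and two choices are assumptions rather than confirmed behaviour: runs without cost data are skipped, and the sample standard deviation is used (falling back to 0.0 for a single run, where it is undefined).

```python
from __future__ import annotations

import statistics


def aggregate_costs(
    costs: list[float | None],
) -> tuple[float | None, float | None]:
    """Return (cost_mean, cost_stddev) over the runs that reported a cost."""
    known = [c for c in costs if c is not None]
    if not known:
        return None, None  # no run produced cost data
    mean = statistics.mean(known)
    # statistics.stdev needs at least two data points.
    stddev = statistics.stdev(known) if len(known) > 1 else 0.0
    return mean, stddev
```

Skipping missing values keeps one crashed run from zeroing out the average, at the price of averaging over fewer runs than `N`.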

{coderace-0.3.0 → coderace-0.4.0}/coderace/commands/results.py RENAMED

@@ -39,16 +39,21 @@ def format_markdown_results(scores: list[Score], task_name: str = "") -> str:
     )
     # Table header
-    header = "| Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines |\n"
-    separator = "|------|-------|------:|:-----:|:----:|:----:|---------:|------:|\n"
+    header = "| Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines | Cost (USD) |\n"
+    separator = "|------|-------|------:|:-----:|:----:|:----:|---------:|------:|-----------:|\n"
     rows: list[str] = []
     for i, score in enumerate(ranked, 1):
         b = score.breakdown
+        cost_str = (
+            f"${score.cost_result.estimated_cost_usd:.4f}"
+            if score.cost_result is not None
+            else "-"
+        )
         row = (
             f"| {i} | `{score.agent}` | {score.composite:.1f} |"
             f" {_bool_md(b.tests_pass)} | {_bool_md(b.lint_clean)} |"
-            f" {_bool_md(b.exit_clean)} | {b.wall_time:.1f} | {b.lines_changed} |"
+            f" {_bool_md(b.exit_clean)} | {b.wall_time:.1f} | {b.lines_changed} | {cost_str} |"
         )
         rows.append(row)
@@ -84,12 +89,18 @@ def format_markdown_from_json(data: list[dict], task_name: str = "") -> str:
     heading = f"## coderace results: {task_name}\n\n" if task_name else "## coderace results\n\n"
     summary = f"**Winner:** `{agent}` — {score:.1f} pts | {n} agent(s) raced\n\n"
-    header = "| Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines |\n"
-    separator = "|------|-------|------:|:-----:|:----:|:----:|---------:|------:|\n"
+    header = "| Rank | Agent | Score | Tests | Lint | Exit | Time (s) | Lines | Cost (USD) |\n"
+    separator = "|------|-------|------:|:-----:|:----:|:----:|---------:|------:|-----------:|\n"
     rows: list[str] = []
     for entry in data:
         b = entry.get("breakdown", {})
+        cost_info = entry.get("cost")
+        cost_str = (
+            f"${cost_info['estimated_cost_usd']:.4f}"
+            if cost_info is not None
+            else "-"
+        )
         rank = entry.get("rank", "?")
         a = entry.get("agent", "?")
         sc = entry.get("composite_score", 0.0)
@@ -98,7 +109,7 @@ def format_markdown_from_json(data: list[dict], task_name: str = "") -> str:
             f" {_bool_md(b.get('tests_pass', False))} |"
             f" {_bool_md(b.get('lint_clean', False))} |"
             f" {_bool_md(b.get('exit_clean', False))} |"
-            f" {b.get('wall_time', 0.0):.1f} | {b.get('lines_changed', 0)} |"
+            f" {b.get('wall_time', 0.0):.1f} | {b.get('lines_changed', 0)} | {cost_str} |"
         )
         rows.append(row)
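The formatters above repeat the same "four decimals or `-`" rule for the cost cell, and the changelog mentions a `$/score` column in the HTML report. Both can be captured in small helpers, as sketched below. `format_cost` and `cost_per_point` are hypothetical names, and treating a non-positive score as "no ratio" is an assumption, since the HTML report code is not shown in this part of the diff.

```python
from __future__ import annotations


def format_cost(cost_usd: float | None) -> str:
    """Render a cost cell the way the diff does: $0.0000 or '-' if missing."""
    return f"${cost_usd:.4f}" if cost_usd is not None else "-"


def cost_per_point(cost_usd: float | None, score: float) -> float | None:
    """Dollars spent per score point, or None when it cannot be computed."""
    if cost_usd is None or score <= 0:
        return None  # assumed guard: avoid division by zero
    return cost_usd / score
```

Using the README's example row, a run costing $0.0063 with a score of 85.0 comes to roughly $0.000074 per point.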
