PyPI - axor-benchmarks - Versions diffs - 0.1.1__tar.gz - Mend

axor-benchmarks 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

axor_benchmarks-0.1.1/.github/workflows/ci.yml +95 -0
axor_benchmarks-0.1.1/.gitignore +8 -0
axor_benchmarks-0.1.1/LICENSE +17 -0
axor_benchmarks-0.1.1/PKG-INFO +164 -0
axor_benchmarks-0.1.1/README.md +139 -0
axor_benchmarks-0.1.1/REFACTORING_SUMMARY.md +161 -0
axor_benchmarks-0.1.1/TEST_COVERAGE_IMPROVEMENTS.md +385 -0
axor_benchmarks-0.1.1/api_key.py +333 -0
axor_benchmarks-0.1.1/api_key_management.py +40 -0
axor_benchmarks-0.1.1/api_key_refactored.py +282 -0
axor_benchmarks-0.1.1/benchmarks/__init__.py +0 -0
axor_benchmarks-0.1.1/benchmarks/governed.py +274 -0
axor_benchmarks-0.1.1/benchmarks/raw.py +457 -0
axor_benchmarks-0.1.1/benchmarks/reporter.py +232 -0
axor_benchmarks-0.1.1/benchmarks/run.py +273 -0
axor_benchmarks-0.1.1/benchmarks/tasks.py +194 -0
axor_benchmarks-0.1.1/pyproject.toml +51 -0
axor_benchmarks-0.1.1/test_api_key.py +204 -0
axor_benchmarks-0.1.1/test_api_key_management.py +293 -0
axor_benchmarks-0.1.1/test_api_key_simple.py +138 -0

axor_benchmarks-0.1.1/.github/workflows/ci.yml ADDED

@@ -0,0 +1,95 @@
+name: CI/CD
+on:
+  push:
+    branches: [main]
+    tags: ["v*.*.*"]
+  pull_request:
+    branches: [main]
+jobs:
+  test:
+    name: Test (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - name: Checkout axor-core
+        uses: actions/checkout@v4
+        with:
+          repository: ${{ github.repository_owner }}/axor-core
+          path: axor-core
+      - name: Checkout axor-claude
+        uses: actions/checkout@v4
+        with:
+          repository: ${{ github.repository_owner }}/axor-claude
+          path: axor-claude
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+      - name: Install
+        run: |
+          pip install -e axor-core/
+          pip install -e axor-claude/
+          pip install -e ".[dev]"
+      - name: Run tests
+        run: pytest -q
+  publish:
+    name: Publish to PyPI
+    needs: test
+    runs-on: ubuntu-latest
+    if: startsWith(github.ref, 'refs/tags/v')
+    environment: pypi
+    permissions:
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Verify tag matches package version
+        run: |
+          python - << 'EOF'
+          import pathlib
+          import re
+          import sys
+          import tomllib
+          ref = "${{ github.ref_name }}"
+          m = re.fullmatch(r"v(\d+\.\d+\.\d+)", ref)
+          if not m:
+              print(f"Tag {ref!r} must match vX.Y.Z")
+              sys.exit(1)
+          tag_version = m.group(1)
+          data = tomllib.loads(pathlib.Path("pyproject.toml").read_text(encoding="utf-8"))
+          pkg_version = data["project"]["version"]
+          if tag_version != pkg_version:
+              print(f"Version mismatch: tag={tag_version}, pyproject={pkg_version}")
+              sys.exit(1)
+          print(f"Version check passed: {pkg_version}")
+          EOF
+      - name: Build
+        run: |
+          pip install hatchling build
+          python -m build
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1

axor_benchmarks-0.1.1/.gitignore ADDED

@@ -0,0 +1,8 @@
+__pycache__/
+*.pyc
+*.egg-info/
+dist/
+build/
+.venv/
+results/
+*.json

axor_benchmarks-0.1.1/LICENSE ADDED

@@ -0,0 +1,17 @@
+MIT License
+Copyright (c) 2025 Axor Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

axor_benchmarks-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,164 @@
+Metadata-Version: 2.4
+Name: axor-benchmarks
+Version: 0.1.1
+Summary: Benchmark governed vs raw Claude on your codebase
+Project-URL: Repository, https://github.com/Bucha11/axor-benchmarks
+Project-URL: Bug Tracker, https://github.com/Bucha11/axor-benchmarks/issues
+Project-URL: Changelog, https://github.com/Bucha11/axor-benchmarks/releases
+License: MIT
+License-File: LICENSE
+Keywords: agents,axor,benchmark,claude,llm
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Python: >=3.11
+Requires-Dist: anthropic>=0.40.0
+Requires-Dist: axor-claude>=0.1.0
+Requires-Dist: axor-core>=0.1.0
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Description-Content-Type: text/markdown
+# axor-benchmarks
+[![CI](https://github.com/Bucha11/axor-benchmarks/actions/workflows/ci.yml/badge.svg)](https://github.com/Bucha11/axor-benchmarks/actions/workflows/ci.yml)
+[![PyPI](https://img.shields.io/pypi/v/axor-benchmarks)](https://pypi.org/project/axor-benchmarks/)
+[![Python](https://img.shields.io/pypi/pyversions/axor-benchmarks)](https://pypi.org/project/axor-benchmarks/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+**Benchmark governed (axor) vs raw Claude on your codebase.**
+Measures real token savings, latency, and federation behavior across the benchmark suites listed below, on any Python project.
+---
+## Installation
+```bash
+pip install axor-benchmarks
+```
+---
+## Quick Start
+```bash
+cd ~/my-project
+axor-bench
+```
+Output:
+```
+  axor benchmark results
+  repo: ~/my-project
+  file: src/auth.py
+  task                  raw tokens    governed    savings  bar               policy
+  ─────────────────────────────────────────────────────────────────────────────────
+  write_test                 1,842       1,203    -34.7%  ████████░░░░░░░░  focused_generative
+  explain_function           1,105         891    -19.4%  ███░░░░░░░░░░░░░  focused_readonly
+  find_bugs                  1,290         978    -24.2%  ████░░░░░░░░░░░░  focused_readonly
+  ─────────────────────────────────────────────────────────────────────────────────
+  TOTAL                      4,237       3,072    -27.5%  ████░░░░░░░░░░░░
+  insights
+  → Token reduction:    27.5% (4,237 → 3,072 tokens)
+  → Most used policy:   focused_readonly (2 tasks)
+```
+---
+## Authentication
+Priority order (highest to lowest):
+1. `--api-key sk-ant-...` flag
+2. `ANTHROPIC_API_KEY` env var
+3. `~/.axor/config.toml` (set via `axor claude → /auth`)
+```bash
+# Use env var
+ANTHROPIC_API_KEY=sk-ant-... axor-bench
+# Use flag (not saved)
+axor-bench --api-key sk-ant-...
+# Use saved key from axor-cli
+axor claude    # → /auth  →  saves to ~/.axor/config.toml
+axor-bench     # reads automatically
+```
+---
+## Suites
+| Suite | Tasks | What it measures |
+|-------|-------|-----------------|
+| `quick` | 1 task | Fast sanity check (~30s) |
+| `small` | 3 tasks | Single-turn focused tasks |
+| `large` | 2 tasks | Multi-tool, multi-step tasks |
+| `conversation` | 1 × 10 turns | Context growth over long sessions |
+| `federation` | 1 task | Child agent spawning + isolation |
+| `full` | all | Complete benchmark (~5-10 min) |
+```bash
+axor-bench --suite small          # fast
+axor-bench --suite full           # complete
+axor-bench --suite conversation   # test context compression
+axor-bench --suite federation     # test child agents
+```
+---
+## Options
+```
+axor-bench [options]
+  --api-key KEY     Anthropic API key
+  --repo PATH       Repo to benchmark (default: current dir)
+  --file PATH       Specific file to use as context
+  --suite SUITE     quick | small | large | conversation | federation | full
+  --no-raw          Skip raw Claude baseline (governed only)
+  --output FORMAT   table (default) | json
+```
+---
+## What is measured
+**Raw Claude** — direct Anthropic API call with no governance:
+- Full conversation history passed every turn
+- No context compression
+- No policy selection
+- No tool governance
+**Governed (axor)** — same task via GovernedSession:
+- Dynamic policy based on task (focused_readonly, moderate_mutative, etc.)
+- Context shaped and compressed per turn
+- Waste elimination (dedup, error collapse, prose summarization)
+- Session-scoped cache (no re-reading same file twice)
+**Token savings** = `(raw - governed) / raw × 100%`
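+A positive value means governed uses fewer tokens (expected for most tasks). Note that the sample table in Quick Start shows such reductions with a leading minus, e.g. `-34.7%`.
+A negative value means governed uses more tokens (possible for very simple tasks where governance overhead exceeds the savings).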
+---
+## Requirements
+- Python 3.11+
+- `axor-core >= 0.1.0`
+- `axor-claude >= 0.1.0`
+- `anthropic >= 0.40.0`
+---
+## License
+MIT

axor_benchmarks-0.1.1/README.md ADDED

@@ -0,0 +1,139 @@
+# axor-benchmarks
+[![CI](https://github.com/Bucha11/axor-benchmarks/actions/workflows/ci.yml/badge.svg)](https://github.com/Bucha11/axor-benchmarks/actions/workflows/ci.yml)
+[![PyPI](https://img.shields.io/pypi/v/axor-benchmarks)](https://pypi.org/project/axor-benchmarks/)
+[![Python](https://img.shields.io/pypi/pyversions/axor-benchmarks)](https://pypi.org/project/axor-benchmarks/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+**Benchmark governed (axor) vs raw Claude on your codebase.**
+Measures real token savings, latency, and federation behavior across the benchmark suites listed below, on any Python project.
+---
+## Installation
+```bash
+pip install axor-benchmarks
+```
+---
+## Quick Start
+```bash
+cd ~/my-project
+axor-bench
+```
+Output:
+```
+  axor benchmark results
+  repo: ~/my-project
+  file: src/auth.py
+  task                  raw tokens    governed    savings  bar               policy
+  ─────────────────────────────────────────────────────────────────────────────────
+  write_test                 1,842       1,203    -34.7%  ████████░░░░░░░░  focused_generative
+  explain_function           1,105         891    -19.4%  ███░░░░░░░░░░░░░  focused_readonly
+  find_bugs                  1,290         978    -24.2%  ████░░░░░░░░░░░░  focused_readonly
+  ─────────────────────────────────────────────────────────────────────────────────
+  TOTAL                      4,237       3,072    -27.5%  ████░░░░░░░░░░░░
+  insights
+  → Token reduction:    27.5% (4,237 → 3,072 tokens)
+  → Most used policy:   focused_readonly (2 tasks)
+```
+---
+## Authentication
+Priority order (highest to lowest):
+1. `--api-key sk-ant-...` flag
+2. `ANTHROPIC_API_KEY` env var
+3. `~/.axor/config.toml` (set via `axor claude → /auth`)
+```bash
+# Use env var
+ANTHROPIC_API_KEY=sk-ant-... axor-bench
+# Use flag (not saved)
+axor-bench --api-key sk-ant-...
+# Use saved key from axor-cli
+axor claude    # → /auth  →  saves to ~/.axor/config.toml
+axor-bench     # reads automatically
+```
+---
+## Suites
+| Suite | Tasks | What it measures |
+|-------|-------|-----------------|
+| `quick` | 1 task | Fast sanity check (~30s) |
+| `small` | 3 tasks | Single-turn focused tasks |
+| `large` | 2 tasks | Multi-tool, multi-step tasks |
+| `conversation` | 1 × 10 turns | Context growth over long sessions |
+| `federation` | 1 task | Child agent spawning + isolation |
+| `full` | all | Complete benchmark (~5-10 min) |
+```bash
+axor-bench --suite small          # fast
+axor-bench --suite full           # complete
+axor-bench --suite conversation   # test context compression
+axor-bench --suite federation     # test child agents
+```
+---
+## Options
+```
+axor-bench [options]
+  --api-key KEY     Anthropic API key
+  --repo PATH       Repo to benchmark (default: current dir)
+  --file PATH       Specific file to use as context
+  --suite SUITE     quick | small | large | conversation | federation | full
+  --no-raw          Skip raw Claude baseline (governed only)
+  --output FORMAT   table (default) | json
+```
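
One way a CLI with these options could be declared using `argparse`. The parser below is a hypothetical sketch: the option names and choices are taken from the listing above, but `build_parser`, the help strings, and the `None` default for `--suite` are assumptions (the listing does not say which suite runs by default).

```python
import argparse

# Suite names as listed in the Options section.
SUITES = ["quick", "small", "large", "conversation", "federation", "full"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented axor-bench options."""
    parser = argparse.ArgumentParser(prog="axor-bench")
    parser.add_argument("--api-key", metavar="KEY", help="Anthropic API key")
    parser.add_argument("--repo", metavar="PATH", default=".",
                        help="Repo to benchmark (default: current dir)")
    parser.add_argument("--file", metavar="PATH",
                        help="Specific file to use as context")
    # Default left as None: the real default suite is not documented here.
    parser.add_argument("--suite", choices=SUITES, default=None)
    parser.add_argument("--no-raw", action="store_true",
                        help="Skip raw Claude baseline (governed only)")
    parser.add_argument("--output", choices=["table", "json"], default="table")
    return parser
```

With this sketch, `build_parser().parse_args(["--suite", "small", "--no-raw"])` yields a namespace with `suite="small"`, `no_raw=True`, and `output="table"`.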
+---
+## What is measured
+**Raw Claude** — direct Anthropic API call with no governance:
+- Full conversation history passed every turn
+- No context compression
+- No policy selection
+- No tool governance
+**Governed (axor)** — same task via GovernedSession:
+- Dynamic policy based on task (focused_readonly, moderate_mutative, etc.)
+- Context shaped and compressed per turn
+- Waste elimination (dedup, error collapse, prose summarization)
+- Session-scoped cache (no re-reading same file twice)
+**Token savings** = `(raw - governed) / raw × 100%`
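+A positive value means governed uses fewer tokens (expected for most tasks). Note that the sample table in Quick Start shows such reductions with a leading minus, e.g. `-34.7%`.
+A negative value means governed uses more tokens (possible for very simple tasks where governance overhead exceeds the savings).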
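
The formula can be checked against the sample output in Quick Start. A small sketch, with `token_savings_percent` as a hypothetical helper name rather than anything the package is known to export:

```python
def token_savings_percent(raw_tokens: int, governed_tokens: int) -> float:
    """Savings as defined in the README: (raw - governed) / raw * 100."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (raw_tokens - governed_tokens) / raw_tokens * 100


# Token counts copied from the Quick Start sample table.
SAMPLE_ROWS = {
    "write_test": (1842, 1203),
    "explain_function": (1105, 891),
    "find_bugs": (1290, 978),
}


def summarize(rows: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Per-task savings plus a TOTAL computed from the summed token counts."""
    result = {
        name: round(token_savings_percent(raw, gov), 1)
        for name, (raw, gov) in rows.items()
    }
    total_raw = sum(raw for raw, _ in rows.values())
    total_gov = sum(gov for _, gov in rows.values())
    result["TOTAL"] = round(token_savings_percent(total_raw, total_gov), 1)
    return result
```

For the sample rows, `summarize(SAMPLE_ROWS)` gives 34.7, 19.4 and 24.2 for the three tasks and 27.5 for the total, matching the magnitudes in the table. The total is computed from summed tokens, not by averaging the per-task percentages.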
+---
+## Requirements
+- Python 3.11+
+- `axor-core >= 0.1.0`
+- `axor-claude >= 0.1.0`
+- `anthropic >= 0.40.0`
+---
+## License
+MIT

axor_benchmarks-0.1.1/REFACTORING_SUMMARY.md ADDED

@@ -0,0 +1,161 @@
+# API Key Module Refactoring Summary
+## Overview
+Refactored `api_key.py` to improve readability, maintainability, and type safety while preserving the existing public interface.
+## Key Improvements
+### 1. **Enhanced Type Hints**
+- Added `Final` annotations to constants so static type checkers flag accidental reassignment
+- Added explicit return types to all functions
+- Added parameter type hints where missing
+- Used `dict[str, Any]` instead of a bare `dict`
+**Before:**
+```python
+CONFIG_DIR  = Path.home() / ".axor"
+CONFIG_FILE = CONFIG_DIR / "config.toml"
+_ENV_VARS = {
+    "claude": "ANTHROPIC_API_KEY",
+    "openai": "OPENAI_API_KEY",
+}
+```
+**After:**
+```python
+CONFIG_DIR: Final[Path] = Path.home() / ".axor"
+CONFIG_FILE: Final[Path] = CONFIG_DIR / "config.toml"
+_ENV_VARS: Final[dict[str, str]] = {
+    "claude": "ANTHROPIC_API_KEY",
+    "openai": "OPENAI_API_KEY",
+}
+```
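
`Final` is a static-typing construct: checkers such as mypy or pyright flag reassignment of a `Final` name, but nothing is enforced at runtime, and the contents of a `Final` dict can still be mutated. If read-only contents are also wanted, one option (an assumption here, not something the refactoring does) is `types.MappingProxyType`. The names below are hypothetical:

```python
from types import MappingProxyType
from typing import Final, Mapping

# Final stops rebinding under a static checker, but the dict itself stays mutable.
MUTABLE_ENV_VARS: Final[dict[str, str]] = {"claude": "ANTHROPIC_API_KEY"}
MUTABLE_ENV_VARS["openai"] = "OPENAI_API_KEY"  # allowed: mutation, not reassignment

# Hypothetical read-only variant: item assignment raises TypeError at runtime.
READONLY_ENV_VARS: Final[Mapping[str, str]] = MappingProxyType(
    {"claude": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}
)


def readonly_rejects_mutation() -> bool:
    """Return True if the read-only mapping refused an item assignment."""
    try:
        READONLY_ENV_VARS["claude"] = "SOMETHING_ELSE"  # type: ignore[index]
    except TypeError:
        return True
    return False
```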
+### 2. **Improved Code Organization**
+- Moved all public functions to the top
+- Grouped private helper functions at the bottom with a clear section comment
+- Consistent ordering: constants → public API → private helpers
+### 3. **Better Separation of Concerns**
+- Extracted TOML serialization logic into `_serialize_to_toml()`
+- Split complex `prompt_and_save()` into smaller, focused functions:
+  - `_print_prompt_header()` - Display prompt information
+  - `_prompt_for_key()` - Get API key from user
+  - `_prompt_to_save()` - Ask about saving to config
+  - `_save_key_to_config()` - Save and display result
+- Created `_load_existing_config()` to deduplicate config loading logic
+### 4. **Enhanced Documentation**
+- Added comprehensive docstrings to all functions (public and private)
+- Included Args, Returns, and Raises sections where appropriate
+- Added inline comments explaining non-obvious behavior
+- Clarified the priority chain in `resolve_api_key()` docstring
+### 5. **Improved Error Handling**
+- Better tracking of file descriptors in `_write_config()`
+- Explicit cleanup of temp files on error
+- Clear separation between expected failures (return None) and exceptional failures (raise)
+**Before:**
+```python
+def _write_config(data: dict[str, Any]) -> None:
+    fd, tmp = tempfile.mkstemp(dir=CONFIG_DIR, prefix=".axor_cfg_")
+    try:
+        with os.fdopen(fd, "w") as f:
+            f.write("\n".join(lines))
+        os.replace(tmp, CONFIG_FILE)
+        CONFIG_FILE.chmod(stat.S_IRUSR | stat.S_IWUSR)
+    except Exception:
+        if os.path.exists(tmp):
+            os.unlink(tmp)
+        raise
+```
+**After:**
+```python
+def _write_config(data: dict[str, Any]) -> None:
+    toml_content = _serialize_to_toml(data)
+    fd = -1
+    tmp_path = ""
+    try:
+        fd, tmp_path = tempfile.mkstemp(dir=CONFIG_DIR, prefix=".axor_cfg_")
+        with os.fdopen(fd, "w") as f:
+            f.write(toml_content)
+        fd = -1  # Mark as closed
+        os.replace(tmp_path, CONFIG_FILE)
+        CONFIG_FILE.chmod(stat.S_IRUSR | stat.S_IWUSR)
+    except Exception:
+        if fd != -1:
+            try:
+                os.close(fd)
+            except OSError:
+                pass
+        if tmp_path and os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+        raise
+```
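
For reference, the same write-to-temp-then-rename pattern as a self-contained function that can be run outside the module. This is a sketch with a hypothetical name (`atomic_write_text`) and an explicit target path rather than the module's `CONFIG_FILE`. It sets owner-only permissions on the temp file before the rename, which is a choice made here, not what the code above does.

```python
import contextlib
import os
import stat
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, content: str) -> None:
    """Write content to target via a temp file in the same directory."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp_")
    try:
        # os.fdopen takes ownership of fd; the with-block closes it.
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        # Owner read/write only, applied before the file becomes visible.
        os.chmod(tmp_name, stat.S_IRUSR | stat.S_IWUSR)
        os.replace(tmp_name, target)
    except Exception:
        # Remove the temp file if it is still around, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Unlike the refactored `_write_config()`, this sketch does not track the descriptor separately, so it does not cover the case where `os.fdopen` itself fails.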
+### 6. **Named Constants**
+- Introduced `_TOML_API_KEY_FIELD` constant for "api_key" field name
+- Prevents typos and makes future changes easier
+### 7. **Pathlib Consistency**
+- Used `Path.open()` instead of mixing `open()` with Path objects
+- More idiomatic pathlib usage throughout
+**Before:**
+```python
+with open(CONFIG_FILE, "rb") as f:
+    config = tomllib.load(f)
+```
+**After:**
+```python
+with CONFIG_FILE.open("rb") as f:
+    config = tomllib.load(f)
+```
+### 8. **Quote Escaping in TOML**
+- Added escaping of double quotes in values to prevent TOML syntax errors
+- More robust serialization, though backslashes in values are still written unescaped
+**Before:**
+```python
+lines.append(f'{key} = "{val}"')
+```
+**After:**
+```python
+escaped_val = str(val).replace('"', r'\"')
+lines.append(f'{key} = "{escaped_val}"')
+```
+## Public Interface Preserved
+All public functions maintain their exact signatures:
+- `resolve_api_key(adapter: str, flag_key: str | None = None) -> str | None`
+- `load_from_config(adapter: str) -> str | None`
+- `save_to_config(adapter: str, api_key: str) -> None`
+- `clear_from_config(adapter: str) -> bool`
+- `prompt_and_save(adapter: str) -> str | None`
+## Testing
+All existing functionality verified:
+- ✓ Module imports successfully
+- ✓ Flag-based key resolution
+- ✓ Environment variable resolution
+- ✓ Config file save/load
+- ✓ Key clearing
+## Benefits
+1. **Maintainability**: Smaller, focused functions are easier to understand and modify
+2. **Testability**: Each helper function can be tested independently
+3. **Type Safety**: Better IDE support and early error detection
+4. **Readability**: Clear structure and comprehensive documentation
+5. **Robustness**: Improved error handling and edge case coverage