agentblaster 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- agentblaster-0.1.0/.github/workflows/ci.yml +147 -0
- agentblaster-0.1.0/.github/workflows/publish.yml +81 -0
- agentblaster-0.1.0/.gitignore +13 -0
- agentblaster-0.1.0/PKG-INFO +250 -0
- agentblaster-0.1.0/README.md +208 -0
- agentblaster-0.1.0/agentblaster.policy.example.yaml +94 -0
- agentblaster-0.1.0/campaigns/qwen-gemma-local/README.md +347 -0
- agentblaster-0.1.0/campaigns/qwen-gemma-local/campaign-handoff.json +117 -0
- agentblaster-0.1.0/docs/README.md +71 -0
- agentblaster-0.1.0/docs/agent-fanout.md +31 -0
- agentblaster-0.1.0/docs/agent-profiles.md +37 -0
- agentblaster-0.1.0/docs/artifact-schemas.md +19 -0
- agentblaster-0.1.0/docs/audit.md +111 -0
- agentblaster-0.1.0/docs/cache-control.md +23 -0
- agentblaster-0.1.0/docs/cancellation.md +29 -0
- agentblaster-0.1.0/docs/capabilities.md +101 -0
- agentblaster-0.1.0/docs/capability-surfaces.md +39 -0
- agentblaster-0.1.0/docs/dashboard.md +129 -0
- agentblaster-0.1.0/docs/engine-targets.md +51 -0
- agentblaster-0.1.0/docs/evidence-bundles.md +116 -0
- agentblaster-0.1.0/docs/experiments.md +35 -0
- agentblaster-0.1.0/docs/failure-taxonomy.md +23 -0
- agentblaster-0.1.0/docs/harness-engineering.md +62 -0
- agentblaster-0.1.0/docs/harness.md +69 -0
- agentblaster-0.1.0/docs/launch-recipes.md +62 -0
- agentblaster-0.1.0/docs/lcp-fixtures.md +19 -0
- agentblaster-0.1.0/docs/metrics.md +123 -0
- agentblaster-0.1.0/docs/models.md +114 -0
- agentblaster-0.1.0/docs/observability.md +46 -0
- agentblaster-0.1.0/docs/planning.md +116 -0
- agentblaster-0.1.0/docs/prd.md +1113 -0
- agentblaster-0.1.0/docs/prompt-footprint.md +43 -0
- agentblaster-0.1.0/docs/providers.md +392 -0
- agentblaster-0.1.0/docs/readiness.md +25 -0
- agentblaster-0.1.0/docs/release-qualification.md +139 -0
- agentblaster-0.1.0/docs/reporting.md +292 -0
- agentblaster-0.1.0/docs/reproducibility.md +93 -0
- agentblaster-0.1.0/docs/retention.md +98 -0
- agentblaster-0.1.0/docs/security-policy.md +184 -0
- agentblaster-0.1.0/docs/security-scan.md +20 -0
- agentblaster-0.1.0/docs/suite-governance.md +68 -0
- agentblaster-0.1.0/docs/telemetry-normalization.md +46 -0
- agentblaster-0.1.0/docs/testing.md +217 -0
- agentblaster-0.1.0/docs/trace-replay.md +53 -0
- agentblaster-0.1.0/docs/user-guide.md +474 -0
- agentblaster-0.1.0/docs/workflow-surfaces.md +37 -0
- agentblaster-0.1.0/examples/README.md +44 -0
- agentblaster-0.1.0/examples/matrices/local-smoke.yaml +13 -0
- agentblaster-0.1.0/examples/matrices/qwen-gemma-local.yaml +39 -0
- agentblaster-0.1.0/examples/matrices/qwen-gemma-stress.yaml +351 -0
- agentblaster-0.1.0/examples/suites/harness-contract-fuzz.yaml +100 -0
- agentblaster-0.1.0/examples/suites/mcp-wide.yaml +22 -0
- agentblaster-0.1.0/examples/suites/skills-prefill.yaml +26 -0
- agentblaster-0.1.0/examples/suites/smoke.yaml +23 -0
- agentblaster-0.1.0/examples/suites/streaming.yaml +20 -0
- agentblaster-0.1.0/examples/suites/structured.yaml +33 -0
- agentblaster-0.1.0/examples/suites/toolcall.yaml +31 -0
- agentblaster-0.1.0/examples/suites/toolsim.yaml +28 -0
- agentblaster-0.1.0/examples/suites/trace-replay.yaml +33 -0
- agentblaster-0.1.0/pyproject.toml +78 -0
- agentblaster-0.1.0/src/agentblaster/__init__.py +3 -0
- agentblaster-0.1.0/src/agentblaster/adapters.py +1435 -0
- agentblaster-0.1.0/src/agentblaster/agent_profiles.py +420 -0
- agentblaster-0.1.0/src/agentblaster/audit.py +27 -0
- agentblaster-0.1.0/src/agentblaster/benchmark_kit.py +356 -0
- agentblaster-0.1.0/src/agentblaster/bundle.py +692 -0
- agentblaster-0.1.0/src/agentblaster/campaign.py +1031 -0
- agentblaster-0.1.0/src/agentblaster/campaign_preflight.py +647 -0
- agentblaster-0.1.0/src/agentblaster/capabilities.py +270 -0
- agentblaster-0.1.0/src/agentblaster/claim_readiness.py +3948 -0
- agentblaster-0.1.0/src/agentblaster/cleanup.py +226 -0
- agentblaster-0.1.0/src/agentblaster/cli.py +4202 -0
- agentblaster-0.1.0/src/agentblaster/compare.py +423 -0
- agentblaster-0.1.0/src/agentblaster/config.py +62 -0
- agentblaster-0.1.0/src/agentblaster/constants.py +8 -0
- agentblaster-0.1.0/src/agentblaster/contract_check.py +919 -0
- agentblaster-0.1.0/src/agentblaster/costs.py +74 -0
- agentblaster-0.1.0/src/agentblaster/dashboard.py +5974 -0
- agentblaster-0.1.0/src/agentblaster/engine_advisory.py +1045 -0
- agentblaster-0.1.0/src/agentblaster/engine_onboarding.py +224 -0
- agentblaster-0.1.0/src/agentblaster/engine_targets.py +545 -0
- agentblaster-0.1.0/src/agentblaster/environment.py +284 -0
- agentblaster-0.1.0/src/agentblaster/errors.py +21 -0
- agentblaster-0.1.0/src/agentblaster/evidence.py +188 -0
- agentblaster-0.1.0/src/agentblaster/evidence_index.py +1865 -0
- agentblaster-0.1.0/src/agentblaster/experiment.py +200 -0
- agentblaster-0.1.0/src/agentblaster/exports.py +158 -0
- agentblaster-0.1.0/src/agentblaster/failures.py +70 -0
- agentblaster-0.1.0/src/agentblaster/fixtures.py +775 -0
- agentblaster-0.1.0/src/agentblaster/harness.py +1254 -0
- agentblaster-0.1.0/src/agentblaster/implementation_status.py +719 -0
- agentblaster-0.1.0/src/agentblaster/integrity.py +161 -0
- agentblaster-0.1.0/src/agentblaster/launch_recipes.py +295 -0
- agentblaster-0.1.0/src/agentblaster/lcp.py +107 -0
- agentblaster-0.1.0/src/agentblaster/matrix.py +101 -0
- agentblaster-0.1.0/src/agentblaster/matrix_gate.py +565 -0
- agentblaster-0.1.0/src/agentblaster/matrix_pressure.py +187 -0
- agentblaster-0.1.0/src/agentblaster/matrix_saturation.py +601 -0
- agentblaster-0.1.0/src/agentblaster/mcp.py +187 -0
- agentblaster-0.1.0/src/agentblaster/metric_coverage.py +552 -0
- agentblaster-0.1.0/src/agentblaster/mock_provider.py +485 -0
- agentblaster-0.1.0/src/agentblaster/model_catalog.py +153 -0
- agentblaster-0.1.0/src/agentblaster/models.py +531 -0
- agentblaster-0.1.0/src/agentblaster/observability.py +110 -0
- agentblaster-0.1.0/src/agentblaster/planning.py +199 -0
- agentblaster-0.1.0/src/agentblaster/policy.py +635 -0
- agentblaster-0.1.0/src/agentblaster/presets.py +219 -0
- agentblaster-0.1.0/src/agentblaster/prompt_footprint.py +245 -0
- agentblaster-0.1.0/src/agentblaster/protocol_repair.py +431 -0
- agentblaster-0.1.0/src/agentblaster/provider_audit.py +210 -0
- agentblaster-0.1.0/src/agentblaster/publication_brief.py +893 -0
- agentblaster-0.1.0/src/agentblaster/quality.py +1142 -0
- agentblaster-0.1.0/src/agentblaster/rate_limits.py +74 -0
- agentblaster-0.1.0/src/agentblaster/readiness.py +241 -0
- agentblaster-0.1.0/src/agentblaster/redaction.py +58 -0
- agentblaster-0.1.0/src/agentblaster/redaction_scan.py +247 -0
- agentblaster-0.1.0/src/agentblaster/release.py +440 -0
- agentblaster-0.1.0/src/agentblaster/release_qualification.py +2248 -0
- agentblaster-0.1.0/src/agentblaster/remote_onboarding.py +308 -0
- agentblaster-0.1.0/src/agentblaster/reports.py +2245 -0
- agentblaster-0.1.0/src/agentblaster/runner.py +1677 -0
- agentblaster-0.1.0/src/agentblaster/schema_registry.py +1151 -0
- agentblaster-0.1.0/src/agentblaster/secrets.py +274 -0
- agentblaster-0.1.0/src/agentblaster/security_posture.py +492 -0
- agentblaster-0.1.0/src/agentblaster/skills.py +67 -0
- agentblaster-0.1.0/src/agentblaster/stress_matrix.py +113 -0
- agentblaster-0.1.0/src/agentblaster/suite_audit.py +259 -0
- agentblaster-0.1.0/src/agentblaster/suite_calibration.py +171 -0
- agentblaster-0.1.0/src/agentblaster/suites.py +805 -0
- agentblaster-0.1.0/src/agentblaster/telemetry.py +947 -0
- agentblaster-0.1.0/src/agentblaster/telemetry_audit.py +300 -0
- agentblaster-0.1.0/src/agentblaster/toolsim.py +193 -0
- agentblaster-0.1.0/src/agentblaster/workflow_readiness.py +570 -0
- agentblaster-0.1.0/src/agentblaster/workflow_surfaces.py +292 -0
- agentblaster-0.1.0/tests/gui/chrome-dashboard-checklist.md +30 -0
- agentblaster-0.1.0/tests/gui/test_dashboard_playwright.py +105 -0
- agentblaster-0.1.0/tests/test_adapters.py +1095 -0
- agentblaster-0.1.0/tests/test_agent_profiles.py +72 -0
- agentblaster-0.1.0/tests/test_audit.py +16 -0
- agentblaster-0.1.0/tests/test_benchmark_kit.py +80 -0
- agentblaster-0.1.0/tests/test_bundle.py +265 -0
- agentblaster-0.1.0/tests/test_cache_control.py +100 -0
- agentblaster-0.1.0/tests/test_campaign.py +648 -0
- agentblaster-0.1.0/tests/test_campaign_handoff.py +125 -0
- agentblaster-0.1.0/tests/test_campaign_preflight.py +313 -0
- agentblaster-0.1.0/tests/test_capabilities.py +310 -0
- agentblaster-0.1.0/tests/test_claim_readiness.py +2026 -0
- agentblaster-0.1.0/tests/test_cleanup.py +185 -0
- agentblaster-0.1.0/tests/test_cli.py +3600 -0
- agentblaster-0.1.0/tests/test_compare.py +181 -0
- agentblaster-0.1.0/tests/test_config.py +37 -0
- agentblaster-0.1.0/tests/test_contract_check.py +228 -0
- agentblaster-0.1.0/tests/test_dashboard.py +3040 -0
- agentblaster-0.1.0/tests/test_dashboard_fixtures.py +155 -0
- agentblaster-0.1.0/tests/test_engine_advisory.py +412 -0
- agentblaster-0.1.0/tests/test_engine_onboarding.py +59 -0
- agentblaster-0.1.0/tests/test_engine_targets.py +109 -0
- agentblaster-0.1.0/tests/test_environment.py +61 -0
- agentblaster-0.1.0/tests/test_evidence.py +146 -0
- agentblaster-0.1.0/tests/test_evidence_index.py +764 -0
- agentblaster-0.1.0/tests/test_examples.py +24 -0
- agentblaster-0.1.0/tests/test_experiment.py +79 -0
- agentblaster-0.1.0/tests/test_exports.py +216 -0
- agentblaster-0.1.0/tests/test_failures.py +36 -0
- agentblaster-0.1.0/tests/test_harness.py +321 -0
- agentblaster-0.1.0/tests/test_implementation_status.py +257 -0
- agentblaster-0.1.0/tests/test_integrity.py +120 -0
- agentblaster-0.1.0/tests/test_launch_recipes.py +80 -0
- agentblaster-0.1.0/tests/test_lcp.py +20 -0
- agentblaster-0.1.0/tests/test_matrix.py +489 -0
- agentblaster-0.1.0/tests/test_matrix_pressure.py +86 -0
- agentblaster-0.1.0/tests/test_matrix_saturation.py +187 -0
- agentblaster-0.1.0/tests/test_mcp.py +69 -0
- agentblaster-0.1.0/tests/test_metric_coverage.py +110 -0
- agentblaster-0.1.0/tests/test_mock_provider.py +156 -0
- agentblaster-0.1.0/tests/test_model_catalog.py +74 -0
- agentblaster-0.1.0/tests/test_models.py +128 -0
- agentblaster-0.1.0/tests/test_planning.py +202 -0
- agentblaster-0.1.0/tests/test_policy.py +969 -0
- agentblaster-0.1.0/tests/test_presets.py +95 -0
- agentblaster-0.1.0/tests/test_prompt_footprint.py +96 -0
- agentblaster-0.1.0/tests/test_protocol_repair.py +158 -0
- agentblaster-0.1.0/tests/test_publication_brief.py +301 -0
- agentblaster-0.1.0/tests/test_quality.py +171 -0
- agentblaster-0.1.0/tests/test_rate_limits.py +34 -0
- agentblaster-0.1.0/tests/test_readiness.py +168 -0
- agentblaster-0.1.0/tests/test_redaction.py +39 -0
- agentblaster-0.1.0/tests/test_redaction_scan.py +89 -0
- agentblaster-0.1.0/tests/test_release.py +156 -0
- agentblaster-0.1.0/tests/test_release_qualification.py +1620 -0
- agentblaster-0.1.0/tests/test_remote_onboarding.py +157 -0
- agentblaster-0.1.0/tests/test_reports.py +680 -0
- agentblaster-0.1.0/tests/test_review_artifact_integrations.py +206 -0
- agentblaster-0.1.0/tests/test_runner.py +1610 -0
- agentblaster-0.1.0/tests/test_schema_registry.py +146 -0
- agentblaster-0.1.0/tests/test_secrets.py +82 -0
- agentblaster-0.1.0/tests/test_security_posture.py +242 -0
- agentblaster-0.1.0/tests/test_skills.py +36 -0
- agentblaster-0.1.0/tests/test_stress_matrix.py +98 -0
- agentblaster-0.1.0/tests/test_suite_audit.py +86 -0
- agentblaster-0.1.0/tests/test_suite_calibration.py +118 -0
- agentblaster-0.1.0/tests/test_suites.py +211 -0
- agentblaster-0.1.0/tests/test_telemetry.py +290 -0
- agentblaster-0.1.0/tests/test_telemetry_audit.py +158 -0
- agentblaster-0.1.0/tests/test_toolsim.py +48 -0
- agentblaster-0.1.0/tests/test_workflow_readiness.py +188 -0
- agentblaster-0.1.0/tests/test_workflow_surfaces.py +58 -0
- agentblaster-0.1.0/tests/test_workflows.py +26 -0
|
@@ -0,0 +1,147 @@
|
|
|
1
|
+
name: CI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [main]
|
|
6
|
+
pull_request:
|
|
7
|
+
branches: [main]
|
|
8
|
+
workflow_dispatch:
|
|
9
|
+
|
|
10
|
+
permissions:
|
|
11
|
+
contents: read
|
|
12
|
+
|
|
13
|
+
jobs:
|
|
14
|
+
tests:
|
|
15
|
+
name: Python ${{ matrix.python-version }} on ${{ matrix.os }}
|
|
16
|
+
runs-on: ${{ matrix.os }}
|
|
17
|
+
strategy:
|
|
18
|
+
fail-fast: false
|
|
19
|
+
matrix:
|
|
20
|
+
os: [ubuntu-latest, macos-latest, windows-latest]
|
|
21
|
+
python-version: ["3.11", "3.12"]
|
|
22
|
+
|
|
23
|
+
steps:
|
|
24
|
+
- name: Checkout
|
|
25
|
+
uses: actions/checkout@v4
|
|
26
|
+
|
|
27
|
+
- name: Set up Python
|
|
28
|
+
uses: actions/setup-python@v5
|
|
29
|
+
with:
|
|
30
|
+
python-version: ${{ matrix.python-version }}
|
|
31
|
+
cache: pip
|
|
32
|
+
|
|
33
|
+
- name: Install package
|
|
34
|
+
run: python -m pip install -e ".[dev]"
|
|
35
|
+
|
|
36
|
+
- name: Run deterministic app tests
|
|
37
|
+
run: |
|
|
38
|
+
python -c "from pathlib import Path; Path('test-reports').mkdir(exist_ok=True)"
|
|
39
|
+
pytest -q -m "not remote and not slow and not gui" --junitxml=test-reports/pytest.xml
|
|
40
|
+
|
|
41
|
+
- name: Selftest command dry runs
|
|
42
|
+
run: |
|
|
43
|
+
agentblaster selftest --tier fast --dry-run
|
|
44
|
+
agentblaster selftest --tier normal --dry-run
|
|
45
|
+
agentblaster selftest --tier security --dry-run
|
|
46
|
+
agentblaster selftest gui --browser chromium --dry-run
|
|
47
|
+
|
|
48
|
+
- name: CLI smoke
|
|
49
|
+
run: |
|
|
50
|
+
agentblaster version
|
|
51
|
+
agentblaster doctor --output-json test-reports/environment-readiness.json --fail-on-required-gaps
|
|
52
|
+
agentblaster suites
|
|
53
|
+
agentblaster validate-case examples/suites/smoke.yaml
|
|
54
|
+
agentblaster quality tiers
|
|
55
|
+
agentblaster quality command normal
|
|
56
|
+
agentblaster harness profiles
|
|
57
|
+
agentblaster models targets
|
|
58
|
+
|
|
59
|
+
- name: Upload test report
|
|
60
|
+
if: always()
|
|
61
|
+
uses: actions/upload-artifact@v4
|
|
62
|
+
with:
|
|
63
|
+
name: pytest-${{ matrix.os }}-${{ matrix.python-version }}
|
|
64
|
+
path: test-reports/pytest.xml
|
|
65
|
+
|
|
66
|
+
governance-artifacts:
|
|
67
|
+
name: Deterministic governance artifacts
|
|
68
|
+
runs-on: ubuntu-latest
|
|
69
|
+
|
|
70
|
+
steps:
|
|
71
|
+
- name: Checkout
|
|
72
|
+
uses: actions/checkout@v4
|
|
73
|
+
|
|
74
|
+
- name: Set up Python
|
|
75
|
+
uses: actions/setup-python@v5
|
|
76
|
+
with:
|
|
77
|
+
python-version: "3.12"
|
|
78
|
+
cache: pip
|
|
79
|
+
|
|
80
|
+
- name: Install package
|
|
81
|
+
run: python -m pip install -e ".[dev]"
|
|
82
|
+
|
|
83
|
+
- name: Generate SDLC and GUI evidence artifacts
|
|
84
|
+
run: |
|
|
85
|
+
mkdir -p test-reports/gui test-reports/providers test-reports/security
|
|
86
|
+
agentblaster quality chrome-checklist --output test-reports/gui/chrome-dashboard-checklist.md
|
|
87
|
+
agentblaster quality chrome-plan --format json --output test-reports/gui/chrome-dashboard-plan.json
|
|
88
|
+
agentblaster quality chrome-plan --format md --output test-reports/gui/chrome-dashboard-plan.md
|
|
89
|
+
agentblaster quality dashboard-fixture --output test-reports/dashboard-runs --overwrite
|
|
90
|
+
|
|
91
|
+
- name: Generate mock-provider contract plans
|
|
92
|
+
env:
|
|
93
|
+
AGENTBLASTER_HOME: ${{ github.workspace }}/test-reports/config
|
|
94
|
+
run: |
|
|
95
|
+
agentblaster providers add --name mock-openai --contract openai --base-url http://127.0.0.1:8787/v1
|
|
96
|
+
agentblaster providers add --name mock-responses --contract openai-responses --base-url http://127.0.0.1:8787/v1
|
|
97
|
+
agentblaster providers add --name mock-anthropic --contract anthropic --base-url http://127.0.0.1:8787/v1
|
|
98
|
+
agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --output-json test-reports/providers/mock-openai-contract-plan.json
|
|
99
|
+
agentblaster providers contract-check --provider mock-responses --model agentblaster-mock-qwen3.6-27b-dense --output-json test-reports/providers/mock-responses-contract-plan.json
|
|
100
|
+
agentblaster providers contract-check --provider mock-anthropic --model agentblaster-mock-qwen3.6-27b-dense --skip-structured --output-json test-reports/providers/mock-anthropic-contract-plan.json
|
|
101
|
+
agentblaster providers audit --output-json test-reports/providers/provider-audit.json
|
|
102
|
+
|
|
103
|
+
- name: Generate release provenance and redaction scan
|
|
104
|
+
run: |
|
|
105
|
+
mkdir -p test-reports/release test-reports/security
|
|
106
|
+
agentblaster doctor --output-json test-reports/release/environment-readiness.json --fail-on-required-gaps
|
|
107
|
+
agentblaster release packaging-readiness --output-json test-reports/release/packaging-readiness.json --fail-on-gaps
|
|
108
|
+
agentblaster release provenance --output test-reports/release-provenance.json
|
|
109
|
+
agentblaster security scan test-reports --output-json test-reports/security/redaction-scan.json
|
|
110
|
+
|
|
111
|
+
- name: Upload governance artifacts
|
|
112
|
+
if: always()
|
|
113
|
+
uses: actions/upload-artifact@v4
|
|
114
|
+
with:
|
|
115
|
+
name: deterministic-governance-artifacts
|
|
116
|
+
path: test-reports/
|
|
117
|
+
|
|
118
|
+
gui-optional:
|
|
119
|
+
name: Optional GUI browser lane
|
|
120
|
+
runs-on: ubuntu-latest
|
|
121
|
+
if: github.event_name == 'workflow_dispatch'
|
|
122
|
+
|
|
123
|
+
steps:
|
|
124
|
+
- name: Checkout
|
|
125
|
+
uses: actions/checkout@v4
|
|
126
|
+
|
|
127
|
+
- name: Set up Python
|
|
128
|
+
uses: actions/setup-python@v5
|
|
129
|
+
with:
|
|
130
|
+
python-version: "3.12"
|
|
131
|
+
cache: pip
|
|
132
|
+
|
|
133
|
+
- name: Install package with GUI test extras
|
|
134
|
+
run: python -m pip install -e ".[dev,gui-test]"
|
|
135
|
+
|
|
136
|
+
- name: Install Playwright browser
|
|
137
|
+
run: python -m playwright install --with-deps chromium
|
|
138
|
+
|
|
139
|
+
- name: Run optional GUI tests
|
|
140
|
+
run: PYTHONPATH=src pytest -q tests/gui -m gui --junitxml=test-reports/gui-pytest.xml
|
|
141
|
+
|
|
142
|
+
- name: Upload GUI report
|
|
143
|
+
if: always()
|
|
144
|
+
uses: actions/upload-artifact@v4
|
|
145
|
+
with:
|
|
146
|
+
name: gui-pytest
|
|
147
|
+
path: test-reports/gui-pytest.xml
|
|
@@ -0,0 +1,81 @@
|
|
|
1
|
+
name: Package
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
workflow_dispatch:
|
|
5
|
+
push:
|
|
6
|
+
tags:
|
|
7
|
+
- "v*"
|
|
8
|
+
|
|
9
|
+
permissions:
|
|
10
|
+
contents: read
|
|
11
|
+
|
|
12
|
+
jobs:
|
|
13
|
+
package:
|
|
14
|
+
name: Build release package artifacts
|
|
15
|
+
runs-on: ubuntu-latest
|
|
16
|
+
|
|
17
|
+
steps:
|
|
18
|
+
- name: Checkout
|
|
19
|
+
uses: actions/checkout@v4
|
|
20
|
+
|
|
21
|
+
- name: Set up Python
|
|
22
|
+
uses: actions/setup-python@v5
|
|
23
|
+
with:
|
|
24
|
+
python-version: "3.12"
|
|
25
|
+
cache: pip
|
|
26
|
+
|
|
27
|
+
- name: Install build dependencies
|
|
28
|
+
run: python -m pip install -e ".[dev]"
|
|
29
|
+
|
|
30
|
+
- name: Static environment and packaging readiness
|
|
31
|
+
run: |
|
|
32
|
+
mkdir -p dist release-reports
|
|
33
|
+
agentblaster doctor --output-json release-reports/environment-readiness.json --fail-on-required-gaps
|
|
34
|
+
agentblaster release packaging-readiness --output-json release-reports/packaging-readiness.json --fail-on-gaps
|
|
35
|
+
agentblaster release provenance --output release-reports/release-provenance.json
|
|
36
|
+
|
|
37
|
+
- name: Build source and wheel distributions
|
|
38
|
+
run: python -m build
|
|
39
|
+
|
|
40
|
+
- name: Scan shareable release artifacts
|
|
41
|
+
run: agentblaster security scan dist release-reports --output-json release-reports/redaction-scan.json
|
|
42
|
+
|
|
43
|
+
- name: Upload package artifacts
|
|
44
|
+
uses: actions/upload-artifact@v4
|
|
45
|
+
with:
|
|
46
|
+
name: agentblaster-package
|
|
47
|
+
path: |
|
|
48
|
+
dist/
|
|
49
|
+
release-reports/
|
|
50
|
+
|
|
51
|
+
- name: Upload distributions for publishing
|
|
52
|
+
uses: actions/upload-artifact@v4
|
|
53
|
+
with:
|
|
54
|
+
name: agentblaster-dist
|
|
55
|
+
path: dist/
|
|
56
|
+
|
|
57
|
+
publish:
|
|
58
|
+
name: Publish to PyPI via trusted publisher
|
|
59
|
+
needs: package
|
|
60
|
+
runs-on: ubuntu-latest
|
|
61
|
+
# Only publish on version tag pushes, never on manual dispatch or branch pushes.
|
|
62
|
+
if: startsWith(github.ref, 'refs/tags/v')
|
|
63
|
+
|
|
64
|
+
environment:
|
|
65
|
+
name: pypi
|
|
66
|
+
url: https://pypi.org/project/agentblaster/
|
|
67
|
+
|
|
68
|
+
permissions:
|
|
69
|
+
# IMPORTANT: required for trusted publishing (OIDC). Without this the
|
|
70
|
+
# GitHub -> PyPI handshake cannot occur and the publisher stays unused.
|
|
71
|
+
id-token: write
|
|
72
|
+
|
|
73
|
+
steps:
|
|
74
|
+
- name: Download built distributions
|
|
75
|
+
uses: actions/download-artifact@v4
|
|
76
|
+
with:
|
|
77
|
+
name: agentblaster-dist
|
|
78
|
+
path: dist/
|
|
79
|
+
|
|
80
|
+
- name: Publish to PyPI
|
|
81
|
+
uses: pypa/gh-action-pypi-publish@release/v1
|
|
@@ -0,0 +1,250 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: agentblaster
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Local agentic benchmark suite for MLX and OpenAI-compatible inference engines.
|
|
5
|
+
Project-URL: Homepage, https://github.com/scouzi1966/AgentBlaster
|
|
6
|
+
Project-URL: Repository, https://github.com/scouzi1966/AgentBlaster
|
|
7
|
+
Project-URL: Issues, https://github.com/scouzi1966/AgentBlaster/issues
|
|
8
|
+
Author: scouzi1966
|
|
9
|
+
License: MIT
|
|
10
|
+
Keywords: agentic-ai,anthropic-compatible,benchmark,local-llm,mlx,openai-compatible
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Environment :: Console
|
|
13
|
+
Classifier: Intended Audience :: Developers
|
|
14
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
15
|
+
Classifier: Operating System :: MacOS
|
|
16
|
+
Classifier: Operating System :: Microsoft :: Windows
|
|
17
|
+
Classifier: Operating System :: POSIX :: Linux
|
|
18
|
+
Classifier: Programming Language :: Python :: 3
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
21
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
22
|
+
Classifier: Topic :: Software Development :: Testing
|
|
23
|
+
Requires-Python: >=3.11
|
|
24
|
+
Requires-Dist: httpx>=0.27
|
|
25
|
+
Requires-Dist: pydantic>=2.7
|
|
26
|
+
Requires-Dist: pyyaml>=6.0
|
|
27
|
+
Requires-Dist: rich>=13.7
|
|
28
|
+
Requires-Dist: typer>=0.12
|
|
29
|
+
Provides-Extra: dev
|
|
30
|
+
Requires-Dist: build>=1.2; extra == 'dev'
|
|
31
|
+
Requires-Dist: pytest>=8.0; extra == 'dev'
|
|
32
|
+
Requires-Dist: twine>=5.0; extra == 'dev'
|
|
33
|
+
Provides-Extra: exports
|
|
34
|
+
Requires-Dist: pyarrow>=15.0; extra == 'exports'
|
|
35
|
+
Provides-Extra: gui-test
|
|
36
|
+
Requires-Dist: playwright>=1.44; extra == 'gui-test'
|
|
37
|
+
Provides-Extra: reports
|
|
38
|
+
Requires-Dist: cairosvg>=2.7; extra == 'reports'
|
|
39
|
+
Provides-Extra: secrets
|
|
40
|
+
Requires-Dist: keyring>=25.0; extra == 'secrets'
|
|
41
|
+
Description-Content-Type: text/markdown
|
|
42
|
+
|
|
43
|
+
# AgentBlaster
|
|
44
|
+
|
|
45
|
+
AgentBlaster is a local agentic benchmark suite for OpenAI-compatible, Anthropic-compatible, and engine-native local inference servers.
|
|
46
|
+
|
|
47
|
+
The goal is to measure the hard parts of local agent workloads: repeated long system prompts, tool schemas, skills, MCP-style tool catalogs, structured output, streaming, cancellation, concurrency, prompt-cache reuse, and professional reporting.
|
|
48
|
+
|
|
49
|
+
[](https://github.com/scouzi1966/AgentBlaster/actions/workflows/ci.yml)
|
|
50
|
+
|
|
51
|
+
## Initial Scope
|
|
52
|
+
|
|
53
|
+
- Engines: AFM MLX, mlx-lm, Ollama MLX, LM Studio, oMLX, Rapid-MLX, and vLLM-MLX.
|
|
54
|
+
- Models: Qwen3.6-27B dense and Gemma 4 31B dense.
|
|
55
|
+
- Interfaces: OpenAI Chat Completions first, then OpenAI Responses and Anthropic Messages.
|
|
56
|
+
- Outputs: CLI results, normalized JSONL, optional dashboard, HTML/PDF/SVG reports, PNG-ready media cards, and media-kit manifests for corporate/media publication packs.
|
|
57
|
+
|
|
58
|
+
## Repository Status
|
|
59
|
+
|
|
60
|
+
This repository is freshly scaffolded from the initial PRD. The product requirements live in [docs/prd.md](docs/prd.md).
|
|
61
|
+
|
|
62
|
+
## Implemented CLI Foundation
|
|
63
|
+
|
|
64
|
+
```bash
|
|
65
|
+
agentblaster version
|
|
66
|
+
agentblaster doctor --policy agentblaster.policy.yaml --output-json reports/environment-readiness.json
|
|
67
|
+
agentblaster implementation-status --output-json reports/implementation-status.json
|
|
68
|
+
agentblaster suites
|
|
69
|
+
agentblaster validate-case examples/suites/smoke.yaml
|
|
70
|
+
agentblaster engines list
|
|
71
|
+
agentblaster engines onboarding --format markdown --output reports/local-engine-onboarding.md
|
|
72
|
+
agentblaster engines improvement-plan --engine afm --pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --telemetry-audit reports/afm-telemetry-audit.json --metric-coverage reports/afm-metric-coverage.json --matrix-gate reports/qwen-gemma-matrix-gate.json --harness-review reports/harness-orchestration-review.json --output-json reports/afm-improvement-plan.json
|
|
73
|
+
agentblaster engines launch-recipes --catalog
|
|
74
|
+
agentblaster engines launch-recipes --engine afm --model mlx-community/Qwen3.6-27B --markdown --output-json reports/afm-launch-recipe.json
|
|
75
|
+
agentblaster engines probe --engine afm --base-url http://127.0.0.1:9999/v1
|
|
76
|
+
agentblaster providers presets
|
|
77
|
+
agentblaster providers add-preset --preset afm
|
|
78
|
+
agentblaster providers add-preset --preset ollama-native
|
|
79
|
+
agentblaster providers add-preset --preset openai
|
|
80
|
+
agentblaster providers add-preset --preset anthropic
|
|
81
|
+
agentblaster providers add-preset --preset openai --name openai-workspace --api-key-env WORKSPACE_OPENAI_KEY
|
|
82
|
+
agentblaster providers add --name openai --contract openai --base-url https://api.openai.com/v1 --api-key-env OPENAI_API_KEY --remote
|
|
83
|
+
agentblaster providers add --name openai --contract openai --base-url https://api.openai.com/v1 --api-key-env OPENAI_API_KEY --remote --audit-log audit/control-plane.jsonl
|
|
84
|
+
agentblaster providers add --name openai-enterprise --contract openai --base-url https://gateway.example.com/v1 --api-key-env OPENAI_API_KEY --ca-bundle /etc/ssl/certs/enterprise-ca.pem --remote
|
|
85
|
+
agentblaster providers list
|
|
86
|
+
agentblaster providers audit --policy agentblaster.policy.yaml --output-json reports/provider-audit.json
|
|
87
|
+
agentblaster providers metric-coverage --provider afm --output-json reports/afm-metric-coverage.json
|
|
88
|
+
agentblaster providers metric-coverage --catalog --output-json reports/metric-coverage-catalog.json
|
|
89
|
+
agentblaster providers readiness --provider afm --suite trace-replay --model mlx-community/Qwen3.6-27B --policy agentblaster.policy.yaml --strict-unknown --output-json reports/afm-trace-readiness.json
|
|
90
|
+
agentblaster mock-provider --host 127.0.0.1 --port 8787
|
|
91
|
+
agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --output-json reports/mock-openai-contract-plan.json
|
|
92
|
+
agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --execute --output-json reports/mock-openai-contract-check.json
|
|
93
|
+
agentblaster providers auth test --provider openai
|
|
94
|
+
agentblaster providers auth status --provider openai
|
|
95
|
+
agentblaster providers auth clear --provider openai --delete-secret
|
|
96
|
+
agentblaster providers cost set --provider openai --input-usd-per-1m-tokens 3.0 --output-usd-per-1m-tokens 12.0
|
|
97
|
+
agentblaster providers cost show --provider openai
|
|
98
|
+
agentblaster providers rate-limits set --provider openai --max-concurrency 2 --requests-per-minute 60
|
|
99
|
+
agentblaster providers rate-limits show --provider openai
|
|
100
|
+
agentblaster providers probe openai
|
|
101
|
+
agentblaster providers capabilities enable --provider afm --capability tool_calling
|
|
102
|
+
agentblaster providers capabilities list --provider afm
|
|
103
|
+
agentblaster suite-requirements --suite trace-replay
|
|
104
|
+
agentblaster suite-requirements --suite agentic-tool-loop
|
|
105
|
+
agentblaster suite-requirements --suite agent-fanout
|
|
106
|
+
agentblaster suite-requirements --suite cancellation
|
|
107
|
+
agentblaster suite-footprint --suite trace-replay --output-json reports/trace-replay-footprint.json
|
|
108
|
+
agentblaster suite-footprint --suite cache-control --output-json reports/cache-control-footprint.json
|
|
109
|
+
agentblaster suite-audit --suite-file examples/suites/toolsim.yaml --output-json reports/toolsim-suite-audit.json
|
|
110
|
+
agentblaster suite-calibration --suite-file examples/suites/agentic-local-profiles.yaml --template-output reports/agentic-local-profiles-calibration.json
|
|
111
|
+
agentblaster suite-calibration --suite-file examples/suites/agentic-local-profiles.yaml --calibration reports/agentic-local-profiles-calibration.json --output-json reports/agentic-local-profiles-calibration-report.json
|
|
112
|
+
agentblaster policy validate agentblaster.policy.yaml --output-json reports/policy-normalized.json
|
|
113
|
+
agentblaster policy template --profile local --output agentblaster.policy.yaml --output-json reports/enterprise-policy-template.json
|
|
114
|
+
agentblaster policy controls agentblaster.policy.yaml --name local-campaign --output-json reports/policy-control-summary.json
|
|
115
|
+
agentblaster evidence bundle --suite-file examples/suites/toolsim.yaml --policy agentblaster.policy.yaml --include-provider-audit --output-dir evidence --audit-log audit/control-plane.jsonl
|
|
116
|
+
agentblaster evidence campaign-preflight --matrix examples/matrices/qwen-gemma-local.yaml --matrix examples/matrices/qwen-gemma-stress.yaml --policy agentblaster.policy.yaml --benchmark-readiness reports/afm-trace-readiness.json --output-dir campaign-preflight/qwen-gemma-local --audit-log audit/control-plane.jsonl
|
|
117
|
+
agentblaster evidence campaign-preflight --matrix campaigns/qwen-gemma-local/matrices/qwen-gemma-local.yaml --policy agentblaster.policy.yaml --benchmark-readiness-list campaigns/qwen-gemma-local/reports/benchmark-readiness-inputs.txt --output-dir campaigns/qwen-gemma-local/reports/campaign-preflight
|
|
118
|
+
agentblaster evidence index --name afm-release --artifact reports/qwen-gemma-matrix-gate.json --artifact reports/harness-orchestration-review.json --artifact reports/afm-improvement-plan.json --artifact reports/afm-metric-coverage.json --artifact reports/cleanup-plan.json --output-json reports/afm-release-evidence-index.json
|
|
119
|
+
agentblaster providers check-suite --provider openai --suite trace-replay --output-json reports/openai-trace-preflight.json
|
|
120
|
+
agentblaster providers check-suite --provider afm --suite toolcall --strict-unknown
|
|
121
|
+
agentblaster catalog simulated-tools --output-json reports/simulated-tools-catalog.json
|
|
122
|
+
agentblaster catalog mcp-profiles --output-json reports/mcp-profiles-catalog.json
|
|
123
|
+
agentblaster catalog lcp-profiles --output-json reports/lcp-profiles-catalog.json
|
|
124
|
+
agentblaster catalog skills --output-json reports/skills-catalog.json
|
|
125
|
+
agentblaster catalog artifact-schemas --format markdown --output reports/artifact-schemas.md
|
|
126
|
+
agentblaster catalog normalize-telemetry samples/ollama-response.json --contract native --native-adapter ollama --output-json reports/ollama-normalized-telemetry.json
|
|
127
|
+
agentblaster dashboard --runs runs --host 127.0.0.1 --port 8765
|
|
128
|
+
agentblaster dashboard --runs runs --host 127.0.0.1 --port 8765 --policy agentblaster.policy.yaml --auth-token-env AGENTBLASTER_DASHBOARD_TOKEN
|
|
129
|
+
agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --dry-run --plan-json reports/afm-smoke-plan.json
|
|
130
|
+
agentblaster run --suite smoke --engine openai --model <openai-model> --no-raw-traces --audit-log runs/audit.jsonl --concurrency 1 --retention-classification confidential --retention-days 30 --raw-trace-retention-days 7
|
|
131
|
+
agentblaster run --suite-file examples/suites/smoke.yaml --engine openai --model <openai-model> --no-raw-traces
|
|
132
|
+
agentblaster run --suite toolcall --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities
|
|
133
|
+
agentblaster run --suite agentic-tool-loop --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities --no-raw-traces
|
|
134
|
+
agentblaster run --suite agent-fanout --engine afm --model mlx-community/Qwen3.6-27B --concurrency 4 --no-raw-traces
|
|
135
|
+
agentblaster run --suite cancellation --engine afm --model mlx-community/Qwen3.6-27B --no-raw-traces
|
|
136
|
+
agentblaster run --suite harness-engineering --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities --no-raw-traces
|
|
137
|
+
agentblaster run --matrix examples/matrices/local-smoke.yaml --offline --continue-on-error --matrix-summary-json reports/local-smoke-matrix-summary.json
|
|
138
|
+
agentblaster matrix contract-checks examples/matrices/qwen-gemma-local.yaml --output-json reports/qwen-gemma-contract-matrix-plan.json
|
|
139
|
+
agentblaster matrix pressure-audit examples/matrices/qwen-gemma-stress.yaml --output-json reports/qwen-gemma-stress-pressure.json
|
|
140
|
+
agentblaster matrix report reports/local-smoke-matrix-summary.json --format html,md,json
|
|
141
|
+
agentblaster matrix saturation-report reports/qwen-gemma-matrix-summary.json --output-json reports/qwen-gemma-matrix-saturation.json
|
|
142
|
+
agentblaster matrix gate reports/local-smoke-matrix-summary.json --require-all-runs-complete --max-failed-runs 0 --min-case-pass-rate 95 --max-failure-class engine_protocol_bug=0 --max-tool-loop-stop-reason max_tool_calls_reached=0 --output-json reports/local-smoke-matrix-gate.json
|
|
143
|
+
agentblaster run --suite trace-replay --engine afm --model mlx-community/Qwen3.6-27B --offline
|
|
144
|
+
agentblaster run --suite-file examples/suites/trace-replay.yaml --engine afm --model mlx-community/Qwen3.6-27B --offline
|
|
145
|
+
agentblaster report runs/<run-id> --format html,json,publication,card,png
|
|
146
|
+
agentblaster report runs/<run-id> --format html,json --audit-log audit/control-plane.jsonl
|
|
147
|
+
agentblaster publication-bundle runs/<run-id> --output-dir publication-bundles --audit-log audit/control-plane.jsonl
|
|
148
|
+
agentblaster export runs/<run-id> --format jsonl,csv,parquet
|
|
149
|
+
agentblaster telemetry-audit runs/<run-id> --required-field tokens_per_second_decode --output-json reports/run-telemetry-audit.json
|
|
150
|
+
agentblaster compare runs/<run-a> runs/<run-b> --output-json reports/comparison.json
|
|
151
|
+
agentblaster compare-gate runs/<baseline> runs/<candidate> --max-avg-latency-regression-pct 15 --min-pass-rate 95 --output-json reports/comparison-gate.json
|
|
152
|
+
agentblaster cleanup runs/<run-id> --raw --reports --exports --caches --temp --bundles --output-json reports/manual-cleanup-plan.json
|
|
153
|
+
agentblaster cleanup runs/<run-id> --raw --reports --exports --caches --temp --bundles --execute --audit-log audit/control-plane.jsonl --require-audit-log --policy agentblaster.policy.yaml
|
|
154
|
+
agentblaster cleanup-expired --runs runs --output-json reports/cleanup-plan.json
|
|
155
|
+
agentblaster cleanup-expired --runs runs --execute --audit-log audit/control-plane.jsonl --require-audit-log --policy agentblaster.policy.yaml
|
|
156
|
+
agentblaster verify runs/<run-id>
|
|
157
|
+
agentblaster sign runs/<run-id> --key-env AGENTBLASTER_SIGNING_KEY --key-id ci-release-key
|
|
158
|
+
agentblaster verify-signature runs/<run-id> --key-env AGENTBLASTER_SIGNING_KEY
|
|
159
|
+
agentblaster quality tiers
|
|
160
|
+
agentblaster quality command normal
|
|
161
|
+
agentblaster quality validation-manifest --format json --output test-reports/sdlc-validation-manifest.json
|
|
162
|
+
agentblaster quality chrome-checklist --output tests/gui/chrome-dashboard-checklist.md
|
|
163
|
+
agentblaster quality chrome-plan --format json --output tests/gui/chrome-dashboard-plan.json
|
|
164
|
+
agentblaster quality dashboard-fixture --output tests/fixtures/dashboard-runs --overwrite
|
|
165
|
+
agentblaster selftest --tier normal --dry-run
|
|
166
|
+
agentblaster selftest gui --browser chromium --headed --dry-run
|
|
167
|
+
PYTHONPATH=src pytest -q tests/gui -m gui
|
|
168
|
+
agentblaster selftest report --run selftest_20260531T000000Z --format html,json,junit
|
|
169
|
+
agentblaster experiment manifest --name qwen-gemma-local --objective "Compare AFM and LM Studio on Qwen/Gemma local-agent suites." --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suites trace-replay,agentic-tool-loop,agent-fanout,prefill,harness-engineering --policy agentblaster.policy.yaml --output reports/qwen-gemma-experiment.json
|
|
170
|
+
agentblaster experiment gate reports/qwen-gemma-experiment.json --require-policy --output-json reports/qwen-gemma-experiment-gate.json
|
|
171
|
+
agentblaster release packaging-readiness --output-json reports/packaging-readiness.json --fail-on-gaps --audit-log audit/control-plane.jsonl
|
|
172
|
+
agentblaster release provenance --output reports/release-provenance.json --audit-log audit/control-plane.jsonl
|
|
173
|
+
agentblaster release qualification-bundle --name afm-release --evidence-bundle evidence/toolsim.agentblaster-evidence.zip --provider-audit reports/provider-audit.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --matrix-gate reports/qwen-gemma-matrix-gate.json --telemetry-audit reports/run-telemetry-audit.json --normalized-telemetry reports/afm-normalized-telemetry.json --matrix-pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --implementation-status reports/implementation-status.json --campaign-preflight-manifest campaign-preflight/qwen-gemma-local/manifest.json --benchmark-readiness reports/afm-trace-readiness.json --engine-advisory reports/afm-improvement-plan.json --evidence-index reports/afm-release-evidence-index.json --suite-audit reports/toolsim-suite-audit.json --metric-coverage reports/afm-metric-coverage.json --release-provenance reports/release-provenance.json --publication-bundle publication-bundles/run.agentblaster-publication.zip --matrix-publication-bundle publication-bundles/qwen-gemma-matrix-summary.agentblaster-matrix-publication.zip --harness-review reports/harness-contract-fuzz-review.json --selftest-report test-reports/selftest/selftest-report.json --sdlc-validation-manifest test-reports/sdlc-validation-manifest.json --output-dir release-bundles --audit-log audit/control-plane.jsonl
|
|
174
|
+
agentblaster security scan release-bundles/afm-release.agentblaster-release-qualification.zip --output-json reports/redaction-scan.json
|
|
175
|
+
agentblaster release claim-readiness --name afm-release --experiment-manifest reports/qwen-gemma-experiment.json --experiment-gate reports/qwen-gemma-experiment-gate.json --provider-audit reports/provider-audit.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --matrix-gate reports/qwen-gemma-matrix-gate.json --telemetry-audit reports/run-telemetry-audit.json --normalized-telemetry reports/afm-normalized-telemetry.json --matrix-pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --implementation-status reports/implementation-status.json --benchmark-readiness reports/afm-trace-readiness.json --release-provenance reports/release-provenance.json --release-qualification-bundle release-bundles/afm-release.agentblaster-release-qualification.zip --redaction-scan reports/redaction-scan.json --publication-bundle publication-bundles/run.agentblaster-publication.zip --matrix-publication-bundle publication-bundles/qwen-gemma-matrix-summary.agentblaster-matrix-publication.zip --harness-review reports/harness-contract-fuzz-review.json --engine-advisory reports/afm-improvement-plan.json --evidence-index reports/afm-release-evidence-index.json --suite-audit reports/toolsim-suite-audit.json --metric-coverage reports/afm-metric-coverage.json --campaign-preflight-manifest campaign-preflight/qwen-gemma-local/manifest.json --selftest-report test-reports/selftest/selftest-report.json --output-json reports/afm-release-claim-readiness.json
|
|
176
|
+
agentblaster release publication-brief --name afm-release --claim-readiness reports/afm-release-claim-readiness.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --release-provenance reports/release-provenance.json --evidence-index reports/afm-release-evidence-index.json --output-json reports/afm-release-publication-brief.json --output-md reports/afm-release-publication-brief.md
|
|
177
|
+
agentblaster agents profiles
|
|
178
|
+
agentblaster agents suite --profile all --output examples/suites/agentic-local-profiles.yaml
|
|
179
|
+
agentblaster agents suite --profile hermes --output examples/suites/agentic-hermes.yaml
|
|
180
|
+
agentblaster harness profiles
|
|
181
|
+
agentblaster harness generate --profile contract-fuzz --suite smoke --repeats 1 --seed 0 --output examples/suites/harness-contract-fuzz.yaml
|
|
182
|
+
agentblaster harness generate --profile metamorphic --suite smoke --repeats 3 --seed 13 --output examples/suites/harness-metamorphic.yaml
|
|
183
|
+
agentblaster harness generate --profile cancellation --suite smoke --repeats 3 --seed 23 --output examples/suites/harness-cancellation.yaml
|
|
184
|
+
agentblaster harness generate --profile orchestration --suite smoke --repeats 3 --seed 29 --output examples/suites/harness-orchestration.yaml
|
|
185
|
+
agentblaster harness generate --profile emerging-workflows --suite smoke --repeats 2 --seed 37 --output examples/suites/harness-emerging-workflows.yaml
|
|
186
|
+
agentblaster harness review --suite-file examples/suites/harness-contract-fuzz.yaml --output-json reports/harness-contract-fuzz-review.json
|
|
187
|
+
agentblaster models targets
|
|
188
|
+
agentblaster models matrix --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suite trace-replay --output examples/matrices/qwen-gemma-local.yaml
|
|
189
|
+
agentblaster models stress-matrix --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suites agentic-tool-loop,agent-fanout,prefill,harness-engineering,trace-replay --concurrency-levels 1,2,4,8 --output examples/matrices/qwen-gemma-stress.yaml --summary-json reports/qwen-gemma-stress-plan.json
|
|
190
|
+
agentblaster models benchmark-kit --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suite trace-replay --policy agentblaster.policy.yaml --output-dir benchmark-kits/qwen-gemma-local
|
|
191
|
+
cat campaigns/qwen-gemma-local/README.md
|
|
192
|
+
agentblaster run --matrix examples/matrices/qwen-gemma-local.yaml --offline --continue-on-error --matrix-summary-json reports/qwen-gemma-matrix-summary.json
|
|
193
|
+
agentblaster run --matrix examples/matrices/qwen-gemma-stress.yaml --offline --dry-run --plan-json reports/qwen-gemma-stress-plan.json
|
|
194
|
+
agentblaster matrix report reports/qwen-gemma-matrix-summary.json --format html,md,json
|
|
195
|
+
agentblaster matrix saturation-report reports/qwen-gemma-matrix-summary.json --output-json reports/qwen-gemma-matrix-saturation.json
|
|
196
|
+
agentblaster matrix gate reports/qwen-gemma-matrix-summary.json --require-all-runs-complete --max-failed-runs 0 --min-case-pass-rate 95 --max-failure-class engine_protocol_bug=0 --max-tool-loop-stop-reason max_tool_calls_reached=0 --output-json reports/qwen-gemma-matrix-gate.json
|
|
197
|
+
agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --offline
|
|
198
|
+
agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --policy agentblaster.policy.yaml
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
Provider profiles are stored locally without raw API keys. API keys can be referenced through environment variables, optional OS keyring storage, or an explicit plaintext `.env` fallback for local development only, with dashboard setup-status posture, provider-audit secret-backend posture, status, test, clear, and writable-secret-delete workflows.
|
|
202
|
+
|
|
203
|
+
Provider setup details are documented in [docs/providers.md](docs/providers.md), including remote OpenAI/Anthropic presets, the deterministic local mock provider, schema-versioned redacted provider audits, readiness dossiers, portable environment-variable references, optional OS-keyring API-key references, explicit development-only dotenv fallback, cost models for budget policy, and provider rate limits for pacing/concurrency control. Local engine setup recipes are documented in [docs/launch-recipes.md](docs/launch-recipes.md).
|
|
204
|
+
Engine target planning includes AFM MLX, MLX-LM, Ollama/Ollama-native, Rapid-MLX, oMLX, vLLM-MLX OpenAI/Anthropic-compatible profiles, LM Studio Chat/Responses/Anthropic/native profiles, and remote OpenAI/Anthropic-compatible contract targets. Each target declares representative agent-profile baselines, workflow surfaces, prefill/concurrency challenges, contract priority, telemetry profiles, and native metric claim policy for standardized comparison.
|
|
205
|
+
|
|
206
|
+
Reporting details are documented in [docs/reporting.md](docs/reporting.md), including publication JSON plus SVG/PNG report cards for media or corporate consumption. Metric coverage is documented in [docs/metrics.md](docs/metrics.md), including native/measured/inferred/conditional/unavailable field status and stats-semantics guidance for cross-engine comparisons.
|
|
207
|
+
Failure classification is documented in [docs/failure-taxonomy.md](docs/failure-taxonomy.md), including the distinction between model-quality misses, engine protocol bugs, feature gaps, runtime failures, environment failures, rate limits, and harness defects.
|
|
208
|
+
Artifact schemas are documented in [docs/artifact-schemas.md](docs/artifact-schemas.md), including publication-safety guidance for run, matrix, lifecycle, raw, and readiness artifacts.
|
|
209
|
+
|
|
210
|
+
Reproducibility details are documented in [docs/reproducibility.md](docs/reproducibility.md), including suite snapshots, suite/case hashes, run integrity manifests, signatures, and publication-bundle signature coverage metadata.
|
|
211
|
+
Implementation status inventory is available through `agentblaster implementation-status`; it is a static handoff artifact and does not run tests or contact providers. It reports file presence plus static requirement inventories for target engines, engine-target standardization metadata, provider contracts, Qwen/Gemma model targets, agent profiles, built-in harness-engineering suite cases, stats-comparability/metric-coverage catalogs, enterprise policy controls, credential/backend posture including optional keyring support, run/matrix publication-bundle governance with media-kit manifests, and SDLC/Chrome self-test gates.
|
|
212
|
+
Retention metadata is documented in [docs/retention.md](docs/retention.md), including manifest fields for artifact classification, intended run retention, and shorter raw-trace retention.
|
|
213
|
+
|
|
214
|
+
Observability details are documented in [docs/observability.md](docs/observability.md), including optional Prometheus before/after snapshots for local engine telemetry and normalized response telemetry with comparison-readiness metadata. Agent fan-out diagnostics are documented in [docs/agent-fanout.md](docs/agent-fanout.md). Cache-control diagnostics are documented in [docs/cache-control.md](docs/cache-control.md). Cancellation diagnostics are documented in [docs/cancellation.md](docs/cancellation.md). The built-in `agentic-tool-loop` suite exercises bounded deterministic tool-result replay, MCP fixture calls, LCP context attachment, and max-tool-call stop-reason reporting.
|
|
215
|
+
|
|
216
|
+
Dashboard details are documented in [docs/dashboard.md](docs/dashboard.md), including the no-JavaScript launch/report-generation forms and allowlisted report artifact links.
|
|
217
|
+
|
|
218
|
+
Capability preflight is documented in [docs/capabilities.md](docs/capabilities.md), including suite feature requirements and provider-suite compatibility checks.
|
|
219
|
+
Run execution performs capability preflight by default, failing before dispatch when a provider is explicitly missing suite-required features.
|
|
220
|
+
Bundled capability surface catalogs are documented in [docs/capability-surfaces.md](docs/capability-surfaces.md), including simulated tool, deterministic MCP profile, LCP context-bundle, and skill-pack inventory commands for policy review.
|
|
221
|
+
Suite governance is documented in [docs/suite-governance.md](docs/suite-governance.md), including static provenance, risk, license/source, and capability-surface audits before dispatch.
|
|
222
|
+
Evidence bundles are documented in [docs/evidence-bundles.md](docs/evidence-bundles.md), including redaction-safe governance zip artifacts for corporate review and media-supporting benchmark evidence.
|
|
223
|
+
Campaign preflight bundles are also documented there; they collect no-dispatch readiness, schema, policy, provider-audit, and matrix-inventory artifacts before expensive local or remote matrices are launched.
|
|
224
|
+
|
|
225
|
+
Model targets, matrix generation, and benchmark kits are documented in [docs/models.md](docs/models.md), including the initial Qwen3.6 27B dense and Gemma 4 31B dense comparison targets, comparison-group guidance, required release metadata, and provider contract-matrix commands for campaign compatibility evidence.
|
|
226
|
+
The checked-in Qwen/Gemma campaign handoff lives in [campaigns/qwen-gemma-local/README.md](campaigns/qwen-gemma-local/README.md).
|
|
227
|
+
|
|
228
|
+
Dry-run planning is documented in [docs/planning.md](docs/planning.md), including policy/capability preflight and estimated token/cost summaries before dispatch. Prompt footprint analysis is documented in [docs/prompt-footprint.md](docs/prompt-footprint.md), including system/tool/MCP/LCP/skill prefix breakdowns for prefill diagnostics. Matrix pressure audits extend that analysis across provider/model/suite/concurrency matrices before dispatch.
|
|
229
|
+
|
|
230
|
+
Run execution includes enterprise controls: raw traces can be disabled, remote providers can be blocked with `--offline`, YAML policy files can allowlist providers and endpoint hosts, policy can require remote API-key references, policy can restrict secret backends and approved secret reference names/prefixes, policy can require cleanup audit logs, policy can cap suite and matrix cost exposure, policy can gate suite-provided tool schemas, simulated tools, MCP profiles, LCP context bundles, skills, provenance, risk levels, and source/license metadata, and optional JSONL audit logs record run and policy events.
|
|
231
|
+
Security policy details are documented in [docs/security-policy.md](docs/security-policy.md), including enterprise baseline generation with `agentblaster policy template`, no-secret review summaries with `agentblaster policy controls`, and `agentblaster.provider-audit.v1` provider/auth posture audits. The example policy in [agentblaster.policy.example.yaml](agentblaster.policy.example.yaml) separates provider endpoint allowlists from Prometheus metrics endpoint allowlists and includes capability-surface allowlists.
|
|
232
|
+
|
|
233
|
+
Audit logging details are documented in [docs/audit.md](docs/audit.md), including control-plane events for provider config, secret reference changes, dashboard start, report generation, matrix reports, and exports.
|
|
234
|
+
|
|
235
|
+
AgentBlaster includes its own SDLC test harness taxonomy. The `quality` commands describe deterministic app-test tiers, release lanes, SDLC validation manifests, Chrome/Codex dashboard validation plans, and redacted dashboard fixtures with release-evidence summaries, including bounded `agentic-tool-loop` stop-reason gate metadata, without running tests. SDLC validation manifests are direct review artifacts and can also be archived through release qualification bundles as compact summaries for claim-readiness, evidence-index, and dashboard review.
|
|
236
|
+
Experiment manifests are documented in [docs/experiments.md](docs/experiments.md), including static scope, preflight requirements, acceptance gates, and publication rules for corporate/media benchmark campaigns. Release governance artifacts can be generated with `agentblaster release packaging-readiness` and `agentblaster release provenance`; the JSON outputs record package metadata readiness, dependency declarations, an SPDX-lite SBOM inventory, optional installed package inventory, safe source hashes, and explicit redaction notes. Release qualification bundles collect evidence, audit, advisory, gate, readiness, provenance, publication, SDLC validation, and selftest artifacts into one checksum-indexed package. Provider audits, publication briefs, and SDLC validation manifests are summarized, not copied verbatim, for release qualification, claim-readiness, evidence-index, and dashboard consumers; publication briefs also surface compact engine-target IDs and media-kit readiness from compact claim-readiness evidence without opening publication ZIP bundles. Generated campaign runbooks include claim-readiness, publication-brief, and final archival bundle commands so corporate/media packets can carry the final claim gate, brief, provider-auth posture, media-kit readiness, and app-SDLC review evidence in compact redaction-safe form. Use `agentblaster security scan` as a final local redaction gate before publishing bundles; it scans text files, text entries inside ZIP bundles, and unsafe ZIP member names without extracting archives or printing matched secret/local-path values.
|
|
237
|
+
Repository automation is documented in [docs/testing.md](docs/testing.md), including deterministic CI and a package-build workflow that uploads artifacts and, on version tag pushes (`v*`), publishes to PyPI via a trusted publisher.
|
|
238
|
+
|
|
239
|
+
AgentBlaster includes representative local-agent profile generators for OpenCode-style, OpenClaw-style, Nous Hermes-style, Pi-style, Aider-style, Cline-style, Continue-style, and Codex-style workflows. The `agents` commands write reviewable YAML suites with tool, MCP, LCP, skill, trace-replay, structured-output, retrieval, and sandboxed command-planning surfaces and do not call providers or install third-party agent frameworks.
|
|
240
|
+
|
|
241
|
+
AgentBlaster also includes deterministic harness-engineering generators for prefill/cache, concurrency, cancellation, provider-contract fuzz, metamorphic-equivalence, skill-prefix routing, multi-tool orchestration, mixed emerging MCP/LCP/skills/tool-loop stacks, and judge-rubric workloads with bounded fixture tool-result round trips. The `harness` commands write reviewable YAML suites and static harness-review artifacts without calling providers.
|
|
242
|
+
Harness engineering details are documented in [docs/harness.md](docs/harness.md), including generated-suite provenance and reporting metadata.
|
|
243
|
+
|
|
244
|
+
Trace replay cases can provide explicit `messages` for multi-turn agent workflows, including prior assistant tool calls and deterministic tool-result context. OpenAI-compatible and Anthropic-compatible adapters normalize those traces into their respective request contracts.
|
|
245
|
+
|
|
246
|
+
## Planned Benchmark CLI
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
agentblaster report runs/<run-id> --format html
|
|
250
|
+
```
|