agentblaster 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (208)
  1. agentblaster-0.1.0/.github/workflows/ci.yml +147 -0
  2. agentblaster-0.1.0/.github/workflows/publish.yml +81 -0
  3. agentblaster-0.1.0/.gitignore +13 -0
  4. agentblaster-0.1.0/PKG-INFO +250 -0
  5. agentblaster-0.1.0/README.md +208 -0
  6. agentblaster-0.1.0/agentblaster.policy.example.yaml +94 -0
  7. agentblaster-0.1.0/campaigns/qwen-gemma-local/README.md +347 -0
  8. agentblaster-0.1.0/campaigns/qwen-gemma-local/campaign-handoff.json +117 -0
  9. agentblaster-0.1.0/docs/README.md +71 -0
  10. agentblaster-0.1.0/docs/agent-fanout.md +31 -0
  11. agentblaster-0.1.0/docs/agent-profiles.md +37 -0
  12. agentblaster-0.1.0/docs/artifact-schemas.md +19 -0
  13. agentblaster-0.1.0/docs/audit.md +111 -0
  14. agentblaster-0.1.0/docs/cache-control.md +23 -0
  15. agentblaster-0.1.0/docs/cancellation.md +29 -0
  16. agentblaster-0.1.0/docs/capabilities.md +101 -0
  17. agentblaster-0.1.0/docs/capability-surfaces.md +39 -0
  18. agentblaster-0.1.0/docs/dashboard.md +129 -0
  19. agentblaster-0.1.0/docs/engine-targets.md +51 -0
  20. agentblaster-0.1.0/docs/evidence-bundles.md +116 -0
  21. agentblaster-0.1.0/docs/experiments.md +35 -0
  22. agentblaster-0.1.0/docs/failure-taxonomy.md +23 -0
  23. agentblaster-0.1.0/docs/harness-engineering.md +62 -0
  24. agentblaster-0.1.0/docs/harness.md +69 -0
  25. agentblaster-0.1.0/docs/launch-recipes.md +62 -0
  26. agentblaster-0.1.0/docs/lcp-fixtures.md +19 -0
  27. agentblaster-0.1.0/docs/metrics.md +123 -0
  28. agentblaster-0.1.0/docs/models.md +114 -0
  29. agentblaster-0.1.0/docs/observability.md +46 -0
  30. agentblaster-0.1.0/docs/planning.md +116 -0
  31. agentblaster-0.1.0/docs/prd.md +1113 -0
  32. agentblaster-0.1.0/docs/prompt-footprint.md +43 -0
  33. agentblaster-0.1.0/docs/providers.md +392 -0
  34. agentblaster-0.1.0/docs/readiness.md +25 -0
  35. agentblaster-0.1.0/docs/release-qualification.md +139 -0
  36. agentblaster-0.1.0/docs/reporting.md +292 -0
  37. agentblaster-0.1.0/docs/reproducibility.md +93 -0
  38. agentblaster-0.1.0/docs/retention.md +98 -0
  39. agentblaster-0.1.0/docs/security-policy.md +184 -0
  40. agentblaster-0.1.0/docs/security-scan.md +20 -0
  41. agentblaster-0.1.0/docs/suite-governance.md +68 -0
  42. agentblaster-0.1.0/docs/telemetry-normalization.md +46 -0
  43. agentblaster-0.1.0/docs/testing.md +217 -0
  44. agentblaster-0.1.0/docs/trace-replay.md +53 -0
  45. agentblaster-0.1.0/docs/user-guide.md +474 -0
  46. agentblaster-0.1.0/docs/workflow-surfaces.md +37 -0
  47. agentblaster-0.1.0/examples/README.md +44 -0
  48. agentblaster-0.1.0/examples/matrices/local-smoke.yaml +13 -0
  49. agentblaster-0.1.0/examples/matrices/qwen-gemma-local.yaml +39 -0
  50. agentblaster-0.1.0/examples/matrices/qwen-gemma-stress.yaml +351 -0
  51. agentblaster-0.1.0/examples/suites/harness-contract-fuzz.yaml +100 -0
  52. agentblaster-0.1.0/examples/suites/mcp-wide.yaml +22 -0
  53. agentblaster-0.1.0/examples/suites/skills-prefill.yaml +26 -0
  54. agentblaster-0.1.0/examples/suites/smoke.yaml +23 -0
  55. agentblaster-0.1.0/examples/suites/streaming.yaml +20 -0
  56. agentblaster-0.1.0/examples/suites/structured.yaml +33 -0
  57. agentblaster-0.1.0/examples/suites/toolcall.yaml +31 -0
  58. agentblaster-0.1.0/examples/suites/toolsim.yaml +28 -0
  59. agentblaster-0.1.0/examples/suites/trace-replay.yaml +33 -0
  60. agentblaster-0.1.0/pyproject.toml +78 -0
  61. agentblaster-0.1.0/src/agentblaster/__init__.py +3 -0
  62. agentblaster-0.1.0/src/agentblaster/adapters.py +1435 -0
  63. agentblaster-0.1.0/src/agentblaster/agent_profiles.py +420 -0
  64. agentblaster-0.1.0/src/agentblaster/audit.py +27 -0
  65. agentblaster-0.1.0/src/agentblaster/benchmark_kit.py +356 -0
  66. agentblaster-0.1.0/src/agentblaster/bundle.py +692 -0
  67. agentblaster-0.1.0/src/agentblaster/campaign.py +1031 -0
  68. agentblaster-0.1.0/src/agentblaster/campaign_preflight.py +647 -0
  69. agentblaster-0.1.0/src/agentblaster/capabilities.py +270 -0
  70. agentblaster-0.1.0/src/agentblaster/claim_readiness.py +3948 -0
  71. agentblaster-0.1.0/src/agentblaster/cleanup.py +226 -0
  72. agentblaster-0.1.0/src/agentblaster/cli.py +4202 -0
  73. agentblaster-0.1.0/src/agentblaster/compare.py +423 -0
  74. agentblaster-0.1.0/src/agentblaster/config.py +62 -0
  75. agentblaster-0.1.0/src/agentblaster/constants.py +8 -0
  76. agentblaster-0.1.0/src/agentblaster/contract_check.py +919 -0
  77. agentblaster-0.1.0/src/agentblaster/costs.py +74 -0
  78. agentblaster-0.1.0/src/agentblaster/dashboard.py +5974 -0
  79. agentblaster-0.1.0/src/agentblaster/engine_advisory.py +1045 -0
  80. agentblaster-0.1.0/src/agentblaster/engine_onboarding.py +224 -0
  81. agentblaster-0.1.0/src/agentblaster/engine_targets.py +545 -0
  82. agentblaster-0.1.0/src/agentblaster/environment.py +284 -0
  83. agentblaster-0.1.0/src/agentblaster/errors.py +21 -0
  84. agentblaster-0.1.0/src/agentblaster/evidence.py +188 -0
  85. agentblaster-0.1.0/src/agentblaster/evidence_index.py +1865 -0
  86. agentblaster-0.1.0/src/agentblaster/experiment.py +200 -0
  87. agentblaster-0.1.0/src/agentblaster/exports.py +158 -0
  88. agentblaster-0.1.0/src/agentblaster/failures.py +70 -0
  89. agentblaster-0.1.0/src/agentblaster/fixtures.py +775 -0
  90. agentblaster-0.1.0/src/agentblaster/harness.py +1254 -0
  91. agentblaster-0.1.0/src/agentblaster/implementation_status.py +719 -0
  92. agentblaster-0.1.0/src/agentblaster/integrity.py +161 -0
  93. agentblaster-0.1.0/src/agentblaster/launch_recipes.py +295 -0
  94. agentblaster-0.1.0/src/agentblaster/lcp.py +107 -0
  95. agentblaster-0.1.0/src/agentblaster/matrix.py +101 -0
  96. agentblaster-0.1.0/src/agentblaster/matrix_gate.py +565 -0
  97. agentblaster-0.1.0/src/agentblaster/matrix_pressure.py +187 -0
  98. agentblaster-0.1.0/src/agentblaster/matrix_saturation.py +601 -0
  99. agentblaster-0.1.0/src/agentblaster/mcp.py +187 -0
  100. agentblaster-0.1.0/src/agentblaster/metric_coverage.py +552 -0
  101. agentblaster-0.1.0/src/agentblaster/mock_provider.py +485 -0
  102. agentblaster-0.1.0/src/agentblaster/model_catalog.py +153 -0
  103. agentblaster-0.1.0/src/agentblaster/models.py +531 -0
  104. agentblaster-0.1.0/src/agentblaster/observability.py +110 -0
  105. agentblaster-0.1.0/src/agentblaster/planning.py +199 -0
  106. agentblaster-0.1.0/src/agentblaster/policy.py +635 -0
  107. agentblaster-0.1.0/src/agentblaster/presets.py +219 -0
  108. agentblaster-0.1.0/src/agentblaster/prompt_footprint.py +245 -0
  109. agentblaster-0.1.0/src/agentblaster/protocol_repair.py +431 -0
  110. agentblaster-0.1.0/src/agentblaster/provider_audit.py +210 -0
  111. agentblaster-0.1.0/src/agentblaster/publication_brief.py +893 -0
  112. agentblaster-0.1.0/src/agentblaster/quality.py +1142 -0
  113. agentblaster-0.1.0/src/agentblaster/rate_limits.py +74 -0
  114. agentblaster-0.1.0/src/agentblaster/readiness.py +241 -0
  115. agentblaster-0.1.0/src/agentblaster/redaction.py +58 -0
  116. agentblaster-0.1.0/src/agentblaster/redaction_scan.py +247 -0
  117. agentblaster-0.1.0/src/agentblaster/release.py +440 -0
  118. agentblaster-0.1.0/src/agentblaster/release_qualification.py +2248 -0
  119. agentblaster-0.1.0/src/agentblaster/remote_onboarding.py +308 -0
  120. agentblaster-0.1.0/src/agentblaster/reports.py +2245 -0
  121. agentblaster-0.1.0/src/agentblaster/runner.py +1677 -0
  122. agentblaster-0.1.0/src/agentblaster/schema_registry.py +1151 -0
  123. agentblaster-0.1.0/src/agentblaster/secrets.py +274 -0
  124. agentblaster-0.1.0/src/agentblaster/security_posture.py +492 -0
  125. agentblaster-0.1.0/src/agentblaster/skills.py +67 -0
  126. agentblaster-0.1.0/src/agentblaster/stress_matrix.py +113 -0
  127. agentblaster-0.1.0/src/agentblaster/suite_audit.py +259 -0
  128. agentblaster-0.1.0/src/agentblaster/suite_calibration.py +171 -0
  129. agentblaster-0.1.0/src/agentblaster/suites.py +805 -0
  130. agentblaster-0.1.0/src/agentblaster/telemetry.py +947 -0
  131. agentblaster-0.1.0/src/agentblaster/telemetry_audit.py +300 -0
  132. agentblaster-0.1.0/src/agentblaster/toolsim.py +193 -0
  133. agentblaster-0.1.0/src/agentblaster/workflow_readiness.py +570 -0
  134. agentblaster-0.1.0/src/agentblaster/workflow_surfaces.py +292 -0
  135. agentblaster-0.1.0/tests/gui/chrome-dashboard-checklist.md +30 -0
  136. agentblaster-0.1.0/tests/gui/test_dashboard_playwright.py +105 -0
  137. agentblaster-0.1.0/tests/test_adapters.py +1095 -0
  138. agentblaster-0.1.0/tests/test_agent_profiles.py +72 -0
  139. agentblaster-0.1.0/tests/test_audit.py +16 -0
  140. agentblaster-0.1.0/tests/test_benchmark_kit.py +80 -0
  141. agentblaster-0.1.0/tests/test_bundle.py +265 -0
  142. agentblaster-0.1.0/tests/test_cache_control.py +100 -0
  143. agentblaster-0.1.0/tests/test_campaign.py +648 -0
  144. agentblaster-0.1.0/tests/test_campaign_handoff.py +125 -0
  145. agentblaster-0.1.0/tests/test_campaign_preflight.py +313 -0
  146. agentblaster-0.1.0/tests/test_capabilities.py +310 -0
  147. agentblaster-0.1.0/tests/test_claim_readiness.py +2026 -0
  148. agentblaster-0.1.0/tests/test_cleanup.py +185 -0
  149. agentblaster-0.1.0/tests/test_cli.py +3600 -0
  150. agentblaster-0.1.0/tests/test_compare.py +181 -0
  151. agentblaster-0.1.0/tests/test_config.py +37 -0
  152. agentblaster-0.1.0/tests/test_contract_check.py +228 -0
  153. agentblaster-0.1.0/tests/test_dashboard.py +3040 -0
  154. agentblaster-0.1.0/tests/test_dashboard_fixtures.py +155 -0
  155. agentblaster-0.1.0/tests/test_engine_advisory.py +412 -0
  156. agentblaster-0.1.0/tests/test_engine_onboarding.py +59 -0
  157. agentblaster-0.1.0/tests/test_engine_targets.py +109 -0
  158. agentblaster-0.1.0/tests/test_environment.py +61 -0
  159. agentblaster-0.1.0/tests/test_evidence.py +146 -0
  160. agentblaster-0.1.0/tests/test_evidence_index.py +764 -0
  161. agentblaster-0.1.0/tests/test_examples.py +24 -0
  162. agentblaster-0.1.0/tests/test_experiment.py +79 -0
  163. agentblaster-0.1.0/tests/test_exports.py +216 -0
  164. agentblaster-0.1.0/tests/test_failures.py +36 -0
  165. agentblaster-0.1.0/tests/test_harness.py +321 -0
  166. agentblaster-0.1.0/tests/test_implementation_status.py +257 -0
  167. agentblaster-0.1.0/tests/test_integrity.py +120 -0
  168. agentblaster-0.1.0/tests/test_launch_recipes.py +80 -0
  169. agentblaster-0.1.0/tests/test_lcp.py +20 -0
  170. agentblaster-0.1.0/tests/test_matrix.py +489 -0
  171. agentblaster-0.1.0/tests/test_matrix_pressure.py +86 -0
  172. agentblaster-0.1.0/tests/test_matrix_saturation.py +187 -0
  173. agentblaster-0.1.0/tests/test_mcp.py +69 -0
  174. agentblaster-0.1.0/tests/test_metric_coverage.py +110 -0
  175. agentblaster-0.1.0/tests/test_mock_provider.py +156 -0
  176. agentblaster-0.1.0/tests/test_model_catalog.py +74 -0
  177. agentblaster-0.1.0/tests/test_models.py +128 -0
  178. agentblaster-0.1.0/tests/test_planning.py +202 -0
  179. agentblaster-0.1.0/tests/test_policy.py +969 -0
  180. agentblaster-0.1.0/tests/test_presets.py +95 -0
  181. agentblaster-0.1.0/tests/test_prompt_footprint.py +96 -0
  182. agentblaster-0.1.0/tests/test_protocol_repair.py +158 -0
  183. agentblaster-0.1.0/tests/test_publication_brief.py +301 -0
  184. agentblaster-0.1.0/tests/test_quality.py +171 -0
  185. agentblaster-0.1.0/tests/test_rate_limits.py +34 -0
  186. agentblaster-0.1.0/tests/test_readiness.py +168 -0
  187. agentblaster-0.1.0/tests/test_redaction.py +39 -0
  188. agentblaster-0.1.0/tests/test_redaction_scan.py +89 -0
  189. agentblaster-0.1.0/tests/test_release.py +156 -0
  190. agentblaster-0.1.0/tests/test_release_qualification.py +1620 -0
  191. agentblaster-0.1.0/tests/test_remote_onboarding.py +157 -0
  192. agentblaster-0.1.0/tests/test_reports.py +680 -0
  193. agentblaster-0.1.0/tests/test_review_artifact_integrations.py +206 -0
  194. agentblaster-0.1.0/tests/test_runner.py +1610 -0
  195. agentblaster-0.1.0/tests/test_schema_registry.py +146 -0
  196. agentblaster-0.1.0/tests/test_secrets.py +82 -0
  197. agentblaster-0.1.0/tests/test_security_posture.py +242 -0
  198. agentblaster-0.1.0/tests/test_skills.py +36 -0
  199. agentblaster-0.1.0/tests/test_stress_matrix.py +98 -0
  200. agentblaster-0.1.0/tests/test_suite_audit.py +86 -0
  201. agentblaster-0.1.0/tests/test_suite_calibration.py +118 -0
  202. agentblaster-0.1.0/tests/test_suites.py +211 -0
  203. agentblaster-0.1.0/tests/test_telemetry.py +290 -0
  204. agentblaster-0.1.0/tests/test_telemetry_audit.py +158 -0
  205. agentblaster-0.1.0/tests/test_toolsim.py +48 -0
  206. agentblaster-0.1.0/tests/test_workflow_readiness.py +188 -0
  207. agentblaster-0.1.0/tests/test_workflow_surfaces.py +58 -0
  208. agentblaster-0.1.0/tests/test_workflows.py +26 -0
agentblaster-0.1.0/.github/workflows/ci.yml
@@ -0,0 +1,147 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  tests:
+    name: Python ${{ matrix.python-version }} on ${{ matrix.os }}
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+        python-version: ["3.11", "3.12"]
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+
+      - name: Install package
+        run: python -m pip install -e ".[dev]"
+
+      - name: Run deterministic app tests
+        run: |
+          python -c "from pathlib import Path; Path('test-reports').mkdir(exist_ok=True)"
+          pytest -q -m "not remote and not slow and not gui" --junitxml=test-reports/pytest.xml
+
+      - name: Selftest command dry runs
+        run: |
+          agentblaster selftest --tier fast --dry-run
+          agentblaster selftest --tier normal --dry-run
+          agentblaster selftest --tier security --dry-run
+          agentblaster selftest gui --browser chromium --dry-run
+
+      - name: CLI smoke
+        run: |
+          agentblaster version
+          agentblaster doctor --output-json test-reports/environment-readiness.json --fail-on-required-gaps
+          agentblaster suites
+          agentblaster validate-case examples/suites/smoke.yaml
+          agentblaster quality tiers
+          agentblaster quality command normal
+          agentblaster harness profiles
+          agentblaster models targets
+
+      - name: Upload test report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: pytest-${{ matrix.os }}-${{ matrix.python-version }}
+          path: test-reports/pytest.xml
+
+  governance-artifacts:
+    name: Deterministic governance artifacts
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+
+      - name: Install package
+        run: python -m pip install -e ".[dev]"
+
+      - name: Generate SDLC and GUI evidence artifacts
+        run: |
+          mkdir -p test-reports/gui test-reports/providers test-reports/security
+          agentblaster quality chrome-checklist --output test-reports/gui/chrome-dashboard-checklist.md
+          agentblaster quality chrome-plan --format json --output test-reports/gui/chrome-dashboard-plan.json
+          agentblaster quality chrome-plan --format md --output test-reports/gui/chrome-dashboard-plan.md
+          agentblaster quality dashboard-fixture --output test-reports/dashboard-runs --overwrite
+
+      - name: Generate mock-provider contract plans
+        env:
+          AGENTBLASTER_HOME: ${{ github.workspace }}/test-reports/config
+        run: |
+          agentblaster providers add --name mock-openai --contract openai --base-url http://127.0.0.1:8787/v1
+          agentblaster providers add --name mock-responses --contract openai-responses --base-url http://127.0.0.1:8787/v1
+          agentblaster providers add --name mock-anthropic --contract anthropic --base-url http://127.0.0.1:8787/v1
+          agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --output-json test-reports/providers/mock-openai-contract-plan.json
+          agentblaster providers contract-check --provider mock-responses --model agentblaster-mock-qwen3.6-27b-dense --output-json test-reports/providers/mock-responses-contract-plan.json
+          agentblaster providers contract-check --provider mock-anthropic --model agentblaster-mock-qwen3.6-27b-dense --skip-structured --output-json test-reports/providers/mock-anthropic-contract-plan.json
+          agentblaster providers audit --output-json test-reports/providers/provider-audit.json
+
+      - name: Generate release provenance and redaction scan
+        run: |
+          mkdir -p test-reports/release test-reports/security
+          agentblaster doctor --output-json test-reports/release/environment-readiness.json --fail-on-required-gaps
+          agentblaster release packaging-readiness --output-json test-reports/release/packaging-readiness.json --fail-on-gaps
+          agentblaster release provenance --output test-reports/release-provenance.json
+          agentblaster security scan test-reports --output-json test-reports/security/redaction-scan.json
+
+      - name: Upload governance artifacts
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: deterministic-governance-artifacts
+          path: test-reports/
+
+  gui-optional:
+    name: Optional GUI browser lane
+    runs-on: ubuntu-latest
+    if: github.event_name == 'workflow_dispatch'
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+
+      - name: Install package with GUI test extras
+        run: python -m pip install -e ".[dev,gui-test]"
+
+      - name: Install Playwright browser
+        run: python -m playwright install --with-deps chromium
+
+      - name: Run optional GUI tests
+        run: PYTHONPATH=src pytest -q tests/gui -m gui --junitxml=test-reports/gui-pytest.xml
+
+      - name: Upload GUI report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: gui-pytest
+          path: test-reports/gui-pytest.xml
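Reading note on the CI workflow above: the `tests` job declares a two-axis `strategy.matrix`, which GitHub Actions expands into one job per combination of `os` and `python-version`. The following is a minimal Python sketch of that expansion, not code from the package; the helper name `expand_matrix` is hypothetical, and the ordering of the printed jobs is only illustrative.

```python
from itertools import product

# Axis values copied from the `tests` job's strategy.matrix in ci.yml.
MATRIX = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "python-version": ["3.11", "3.12"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of every matrix axis.

    Hypothetical helper for illustration; not part of AgentBlaster or GitHub Actions.
    """
    axes = list(matrix)
    return [dict(zip(axes, combo)) for combo in product(*(matrix[a] for a in axes))]


jobs = expand_matrix(MATRIX)
for job in jobs:
    # Mirrors the job-name template and the upload-artifact name template.
    print(
        f"Python {job['python-version']} on {job['os']}"
        f" -> pytest-{job['os']}-{job['python-version']}"
    )
```

With three operating systems and two Python versions this yields six jobs, and because `fail-fast: false` is set, a failure in one combination does not cancel the others. Each job uploads its report under a distinct artifact name, so the six uploads do not collide.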
agentblaster-0.1.0/.github/workflows/publish.yml
@@ -0,0 +1,81 @@
+name: Package
+
+on:
+  workflow_dispatch:
+  push:
+    tags:
+      - "v*"
+
+permissions:
+  contents: read
+
+jobs:
+  package:
+    name: Build release package artifacts
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+
+      - name: Install build dependencies
+        run: python -m pip install -e ".[dev]"
+
+      - name: Static environment and packaging readiness
+        run: |
+          mkdir -p dist release-reports
+          agentblaster doctor --output-json release-reports/environment-readiness.json --fail-on-required-gaps
+          agentblaster release packaging-readiness --output-json release-reports/packaging-readiness.json --fail-on-gaps
+          agentblaster release provenance --output release-reports/release-provenance.json
+
+      - name: Build source and wheel distributions
+        run: python -m build
+
+      - name: Scan shareable release artifacts
+        run: agentblaster security scan dist release-reports --output-json release-reports/redaction-scan.json
+
+      - name: Upload package artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: agentblaster-package
+          path: |
+            dist/
+            release-reports/
+
+      - name: Upload distributions for publishing
+        uses: actions/upload-artifact@v4
+        with:
+          name: agentblaster-dist
+          path: dist/
+
+  publish:
+    name: Publish to PyPI via trusted publisher
+    needs: package
+    runs-on: ubuntu-latest
+    # Only publish on version tag pushes, never on manual dispatch or branch pushes.
+    if: startsWith(github.ref, 'refs/tags/v')
+
+    environment:
+      name: pypi
+      url: https://pypi.org/project/agentblaster/
+
+    permissions:
+      # IMPORTANT: required for trusted publishing (OIDC). Without this the
+      # GitHub -> PyPI handshake cannot occur and the publisher stays unused.
+      id-token: write
+
+    steps:
+      - name: Download built distributions
+        uses: actions/download-artifact@v4
+        with:
+          name: agentblaster-dist
+          path: dist/
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
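Reading note on the publish workflow above: the `publish` job is gated only by `if: startsWith(github.ref, 'refs/tags/v')`. The sketch below models that condition in Python to show which refs would pass it. The function name `should_publish` and the sample refs are hypothetical; the lowercase comparison reflects the fact that the GitHub Actions `startsWith` expression function is not case sensitive.

```python
def should_publish(ref: str) -> bool:
    """Model `startsWith(github.ref, 'refs/tags/v')` from the publish job.

    Hypothetical helper for illustration. The comparison is lowercased because
    the GitHub Actions `startsWith` function ignores case.
    """
    return ref.lower().startswith("refs/tags/v")


# Hypothetical refs, chosen to cover a tag push, a branch, and a non-matching tag.
SAMPLE_REFS = [
    "refs/tags/v0.1.0",
    "refs/heads/main",
    "refs/tags/release-1",
    "refs/pull/1/merge",
]

for ref in SAMPLE_REFS:
    print(f"{ref}: {'publish' if should_publish(ref) else 'skip'}")
```

Because the condition inspects the ref rather than the event name, a `workflow_dispatch` run started from a `v*` tag would, as far as this expression is concerned, also satisfy it; only the ref is checked here.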
agentblaster-0.1.0/.gitignore
@@ -0,0 +1,13 @@
+__pycache__/
+*.py[cod]
+.venv/
+.venvs/
+.env
+.DS_Store
+.pytest_cache/
+.ruff_cache/
+dist/
+build/
+*.egg-info/
+runs/
+reports/
@@ -0,0 +1,250 @@
1
+ Metadata-Version: 2.4
2
+ Name: agentblaster
3
+ Version: 0.1.0
4
+ Summary: Local agentic benchmark suite for MLX and OpenAI-compatible inference engines.
5
+ Project-URL: Homepage, https://github.com/scouzi1966/AgentBlaster
6
+ Project-URL: Repository, https://github.com/scouzi1966/AgentBlaster
7
+ Project-URL: Issues, https://github.com/scouzi1966/AgentBlaster/issues
8
+ Author: scouzi1966
9
+ License: MIT
10
+ Keywords: agentic-ai,anthropic-compatible,benchmark,local-llm,mlx,openai-compatible
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Environment :: Console
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: MacOS
16
+ Classifier: Operating System :: Microsoft :: Windows
17
+ Classifier: Operating System :: POSIX :: Linux
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
+ Classifier: Topic :: Software Development :: Testing
23
+ Requires-Python: >=3.11
24
+ Requires-Dist: httpx>=0.27
25
+ Requires-Dist: pydantic>=2.7
26
+ Requires-Dist: pyyaml>=6.0
27
+ Requires-Dist: rich>=13.7
28
+ Requires-Dist: typer>=0.12
29
+ Provides-Extra: dev
30
+ Requires-Dist: build>=1.2; extra == 'dev'
31
+ Requires-Dist: pytest>=8.0; extra == 'dev'
32
+ Requires-Dist: twine>=5.0; extra == 'dev'
33
+ Provides-Extra: exports
34
+ Requires-Dist: pyarrow>=15.0; extra == 'exports'
35
+ Provides-Extra: gui-test
36
+ Requires-Dist: playwright>=1.44; extra == 'gui-test'
37
+ Provides-Extra: reports
38
+ Requires-Dist: cairosvg>=2.7; extra == 'reports'
39
+ Provides-Extra: secrets
40
+ Requires-Dist: keyring>=25.0; extra == 'secrets'
41
+ Description-Content-Type: text/markdown
42
+
43
+ # AgentBlaster
44
+
45
+ AgentBlaster is a local agentic benchmark suite for OpenAI-compatible, Anthropic-compatible, and engine-native local inference servers.
46
+
47
+ The goal is to measure the hard parts of local agent workloads: repeated long system prompts, tool schemas, skills, MCP-style tool catalogs, structured output, streaming, cancellation, concurrency, prompt-cache reuse, and professional reporting.
48
+
49
+ [![CI](https://github.com/scouzi1966/AgentBlaster/actions/workflows/ci.yml/badge.svg)](https://github.com/scouzi1966/AgentBlaster/actions/workflows/ci.yml)
50
+
51
+ ## Initial Scope
52
+
53
+ - Engines: AFM MLX, mlx-lm, Ollama MLX, LM Studio, oMLX, Rapid-MLX, and vLLM-MLX.
54
+ - Models: Qwen3.6-27B dense and Gemma 4 31B dense.
55
+ - Interfaces: OpenAI Chat Completions first, then OpenAI Responses and Anthropic Messages.
56
+ - Outputs: CLI results, normalized JSONL, optional dashboard, HTML/PDF/SVG reports, PNG-ready media cards, and media-kit manifests for corporate/media publication packs.
57
+
58
+ ## Repository Status
59
+
60
+ This repository is freshly scaffolded from the initial PRD. The product requirements live in [docs/prd.md](docs/prd.md).
61
+
62
+ ## Implemented CLI Foundation
63
+
64
+ ```bash
65
+ agentblaster version
66
+ agentblaster doctor --policy agentblaster.policy.yaml --output-json reports/environment-readiness.json
67
+ agentblaster implementation-status --output-json reports/implementation-status.json
68
+ agentblaster suites
69
+ agentblaster validate-case examples/suites/smoke.yaml
70
+ agentblaster engines list
71
+ agentblaster engines onboarding --format markdown --output reports/local-engine-onboarding.md
72
+ agentblaster engines improvement-plan --engine afm --pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --telemetry-audit reports/afm-telemetry-audit.json --metric-coverage reports/afm-metric-coverage.json --matrix-gate reports/qwen-gemma-matrix-gate.json --harness-review reports/harness-orchestration-review.json --output-json reports/afm-improvement-plan.json
73
+ agentblaster engines launch-recipes --catalog
74
+ agentblaster engines launch-recipes --engine afm --model mlx-community/Qwen3.6-27B --markdown --output-json reports/afm-launch-recipe.json
75
+ agentblaster engines probe --engine afm --base-url http://127.0.0.1:9999/v1
76
+ agentblaster providers presets
77
+ agentblaster providers add-preset --preset afm
78
+ agentblaster providers add-preset --preset ollama-native
79
+ agentblaster providers add-preset --preset openai
80
+ agentblaster providers add-preset --preset anthropic
81
+ agentblaster providers add-preset --preset openai --name openai-workspace --api-key-env WORKSPACE_OPENAI_KEY
82
+ agentblaster providers add --name openai --contract openai --base-url https://api.openai.com/v1 --api-key-env OPENAI_API_KEY --remote
83
+ agentblaster providers add --name openai --contract openai --base-url https://api.openai.com/v1 --api-key-env OPENAI_API_KEY --remote --audit-log audit/control-plane.jsonl
84
+ agentblaster providers add --name openai-enterprise --contract openai --base-url https://gateway.example.com/v1 --api-key-env OPENAI_API_KEY --ca-bundle /etc/ssl/certs/enterprise-ca.pem --remote
85
+ agentblaster providers list
86
+ agentblaster providers audit --policy agentblaster.policy.yaml --output-json reports/provider-audit.json
87
+ agentblaster providers metric-coverage --provider afm --output-json reports/afm-metric-coverage.json
88
+ agentblaster providers metric-coverage --catalog --output-json reports/metric-coverage-catalog.json
89
+ agentblaster providers readiness --provider afm --suite trace-replay --model mlx-community/Qwen3.6-27B --policy agentblaster.policy.yaml --strict-unknown --output-json reports/afm-trace-readiness.json
90
+ agentblaster mock-provider --host 127.0.0.1 --port 8787
91
+ agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --output-json reports/mock-openai-contract-plan.json
92
+ agentblaster providers contract-check --provider mock-openai --model agentblaster-mock-qwen3.6-27b-dense --execute --output-json reports/mock-openai-contract-check.json
93
+ agentblaster providers auth test --provider openai
94
+ agentblaster providers auth status --provider openai
95
+ agentblaster providers auth clear --provider openai --delete-secret
96
+ agentblaster providers cost set --provider openai --input-usd-per-1m-tokens 3.0 --output-usd-per-1m-tokens 12.0
97
+ agentblaster providers cost show --provider openai
98
+ agentblaster providers rate-limits set --provider openai --max-concurrency 2 --requests-per-minute 60
99
+ agentblaster providers rate-limits show --provider openai
100
+ agentblaster providers probe openai
101
+ agentblaster providers capabilities enable --provider afm --capability tool_calling
102
+ agentblaster providers capabilities list --provider afm
103
+ agentblaster suite-requirements --suite trace-replay
104
+ agentblaster suite-requirements --suite agentic-tool-loop
105
+ agentblaster suite-requirements --suite agent-fanout
106
+ agentblaster suite-requirements --suite cancellation
107
+ agentblaster suite-footprint --suite trace-replay --output-json reports/trace-replay-footprint.json
108
+ agentblaster suite-footprint --suite cache-control --output-json reports/cache-control-footprint.json
109
+ agentblaster suite-audit --suite-file examples/suites/toolsim.yaml --output-json reports/toolsim-suite-audit.json
110
+ agentblaster suite-calibration --suite-file examples/suites/agentic-local-profiles.yaml --template-output reports/agentic-local-profiles-calibration.json
111
+ agentblaster suite-calibration --suite-file examples/suites/agentic-local-profiles.yaml --calibration reports/agentic-local-profiles-calibration.json --output-json reports/agentic-local-profiles-calibration-report.json
112
+ agentblaster policy validate agentblaster.policy.yaml --output-json reports/policy-normalized.json
113
+ agentblaster policy template --profile local --output agentblaster.policy.yaml --output-json reports/enterprise-policy-template.json
114
+ agentblaster policy controls agentblaster.policy.yaml --name local-campaign --output-json reports/policy-control-summary.json
115
+ agentblaster evidence bundle --suite-file examples/suites/toolsim.yaml --policy agentblaster.policy.yaml --include-provider-audit --output-dir evidence --audit-log audit/control-plane.jsonl
116
+ agentblaster evidence campaign-preflight --matrix examples/matrices/qwen-gemma-local.yaml --matrix examples/matrices/qwen-gemma-stress.yaml --policy agentblaster.policy.yaml --benchmark-readiness reports/afm-trace-readiness.json --output-dir campaign-preflight/qwen-gemma-local --audit-log audit/control-plane.jsonl
117
+ agentblaster evidence campaign-preflight --matrix campaigns/qwen-gemma-local/matrices/qwen-gemma-local.yaml --policy agentblaster.policy.yaml --benchmark-readiness-list campaigns/qwen-gemma-local/reports/benchmark-readiness-inputs.txt --output-dir campaigns/qwen-gemma-local/reports/campaign-preflight
118
+ agentblaster evidence index --name afm-release --artifact reports/qwen-gemma-matrix-gate.json --artifact reports/harness-orchestration-review.json --artifact reports/afm-improvement-plan.json --artifact reports/afm-metric-coverage.json --artifact reports/cleanup-plan.json --output-json reports/afm-release-evidence-index.json
119
+ agentblaster providers check-suite --provider openai --suite trace-replay --output-json reports/openai-trace-preflight.json
120
+ agentblaster providers check-suite --provider afm --suite toolcall --strict-unknown
121
+ agentblaster catalog simulated-tools --output-json reports/simulated-tools-catalog.json
122
+ agentblaster catalog mcp-profiles --output-json reports/mcp-profiles-catalog.json
123
+ agentblaster catalog lcp-profiles --output-json reports/lcp-profiles-catalog.json
124
+ agentblaster catalog skills --output-json reports/skills-catalog.json
125
+ agentblaster catalog artifact-schemas --format markdown --output reports/artifact-schemas.md
126
+ agentblaster catalog normalize-telemetry samples/ollama-response.json --contract native --native-adapter ollama --output-json reports/ollama-normalized-telemetry.json
127
+ agentblaster dashboard --runs runs --host 127.0.0.1 --port 8765
128
+ agentblaster dashboard --runs runs --host 127.0.0.1 --port 8765 --policy agentblaster.policy.yaml --auth-token-env AGENTBLASTER_DASHBOARD_TOKEN
129
+ agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --dry-run --plan-json reports/afm-smoke-plan.json
130
+ agentblaster run --suite smoke --engine openai --model <openai-model> --no-raw-traces --audit-log runs/audit.jsonl --concurrency 1 --retention-classification confidential --retention-days 30 --raw-trace-retention-days 7
131
+ agentblaster run --suite-file examples/suites/smoke.yaml --engine openai --model <openai-model> --no-raw-traces
+ agentblaster run --suite toolcall --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities
+ agentblaster run --suite agentic-tool-loop --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities --no-raw-traces
+ agentblaster run --suite agent-fanout --engine afm --model mlx-community/Qwen3.6-27B --concurrency 4 --no-raw-traces
+ agentblaster run --suite cancellation --engine afm --model mlx-community/Qwen3.6-27B --no-raw-traces
+ agentblaster run --suite harness-engineering --engine afm --model mlx-community/Qwen3.6-27B --strict-unknown-capabilities --no-raw-traces
+ agentblaster run --matrix examples/matrices/local-smoke.yaml --offline --continue-on-error --matrix-summary-json reports/local-smoke-matrix-summary.json
+ agentblaster matrix contract-checks examples/matrices/qwen-gemma-local.yaml --output-json reports/qwen-gemma-contract-matrix-plan.json
+ agentblaster matrix pressure-audit examples/matrices/qwen-gemma-stress.yaml --output-json reports/qwen-gemma-stress-pressure.json
+ agentblaster matrix report reports/local-smoke-matrix-summary.json --format html,md,json
+ agentblaster matrix saturation-report reports/qwen-gemma-matrix-summary.json --output-json reports/qwen-gemma-matrix-saturation.json
+ agentblaster matrix gate reports/local-smoke-matrix-summary.json --require-all-runs-complete --max-failed-runs 0 --min-case-pass-rate 95 --max-failure-class engine_protocol_bug=0 --max-tool-loop-stop-reason max_tool_calls_reached=0 --output-json reports/local-smoke-matrix-gate.json
+ agentblaster run --suite trace-replay --engine afm --model mlx-community/Qwen3.6-27B --offline
+ agentblaster run --suite-file examples/suites/trace-replay.yaml --engine afm --model mlx-community/Qwen3.6-27B --offline
+ agentblaster report runs/<run-id> --format html,json,publication,card,png
+ agentblaster report runs/<run-id> --format html,json --audit-log audit/control-plane.jsonl
+ agentblaster publication-bundle runs/<run-id> --output-dir publication-bundles --audit-log audit/control-plane.jsonl
+ agentblaster export runs/<run-id> --format jsonl,csv,parquet
+ agentblaster telemetry-audit runs/<run-id> --required-field tokens_per_second_decode --output-json reports/run-telemetry-audit.json
+ agentblaster compare runs/<run-a> runs/<run-b> --output-json reports/comparison.json
+ agentblaster compare-gate runs/<baseline> runs/<candidate> --max-avg-latency-regression-pct 15 --min-pass-rate 95 --output-json reports/comparison-gate.json
+ agentblaster cleanup runs/<run-id> --raw --reports --exports --caches --temp --bundles --output-json reports/manual-cleanup-plan.json
+ agentblaster cleanup runs/<run-id> --raw --reports --exports --caches --temp --bundles --execute --audit-log audit/control-plane.jsonl --require-audit-log --policy agentblaster.policy.yaml
+ agentblaster cleanup-expired --runs runs --output-json reports/cleanup-plan.json
+ agentblaster cleanup-expired --runs runs --execute --audit-log audit/control-plane.jsonl --require-audit-log --policy agentblaster.policy.yaml
+ agentblaster verify runs/<run-id>
+ agentblaster sign runs/<run-id> --key-env AGENTBLASTER_SIGNING_KEY --key-id ci-release-key
+ agentblaster verify-signature runs/<run-id> --key-env AGENTBLASTER_SIGNING_KEY
+ agentblaster quality tiers
+ agentblaster quality command normal
+ agentblaster quality validation-manifest --format json --output test-reports/sdlc-validation-manifest.json
+ agentblaster quality chrome-checklist --output tests/gui/chrome-dashboard-checklist.md
+ agentblaster quality chrome-plan --format json --output tests/gui/chrome-dashboard-plan.json
+ agentblaster quality dashboard-fixture --output tests/fixtures/dashboard-runs --overwrite
+ agentblaster selftest --tier normal --dry-run
+ agentblaster selftest gui --browser chromium --headed --dry-run
+ PYTHONPATH=src pytest -q tests/gui -m gui
+ agentblaster selftest report --run selftest_20260531T000000Z --format html,json,junit
+ agentblaster experiment manifest --name qwen-gemma-local --objective "Compare AFM and LM Studio on Qwen/Gemma local-agent suites." --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suites trace-replay,agentic-tool-loop,agent-fanout,prefill,harness-engineering --policy agentblaster.policy.yaml --output reports/qwen-gemma-experiment.json
+ agentblaster experiment gate reports/qwen-gemma-experiment.json --require-policy --output-json reports/qwen-gemma-experiment-gate.json
+ agentblaster release packaging-readiness --output-json reports/packaging-readiness.json --fail-on-gaps --audit-log audit/control-plane.jsonl
+ agentblaster release provenance --output reports/release-provenance.json --audit-log audit/control-plane.jsonl
+ agentblaster release qualification-bundle --name afm-release --evidence-bundle evidence/toolsim.agentblaster-evidence.zip --provider-audit reports/provider-audit.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --matrix-gate reports/qwen-gemma-matrix-gate.json --telemetry-audit reports/run-telemetry-audit.json --normalized-telemetry reports/afm-normalized-telemetry.json --matrix-pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --implementation-status reports/implementation-status.json --campaign-preflight-manifest campaign-preflight/qwen-gemma-local/manifest.json --benchmark-readiness reports/afm-trace-readiness.json --engine-advisory reports/afm-improvement-plan.json --evidence-index reports/afm-release-evidence-index.json --suite-audit reports/toolsim-suite-audit.json --metric-coverage reports/afm-metric-coverage.json --release-provenance reports/release-provenance.json --publication-bundle publication-bundles/run.agentblaster-publication.zip --matrix-publication-bundle publication-bundles/qwen-gemma-matrix-summary.agentblaster-matrix-publication.zip --harness-review reports/harness-contract-fuzz-review.json --selftest-report test-reports/selftest/selftest-report.json --sdlc-validation-manifest test-reports/sdlc-validation-manifest.json --output-dir release-bundles --audit-log audit/control-plane.jsonl
+ agentblaster security scan release-bundles/afm-release.agentblaster-release-qualification.zip --output-json reports/redaction-scan.json
+ agentblaster release claim-readiness --name afm-release --experiment-manifest reports/qwen-gemma-experiment.json --experiment-gate reports/qwen-gemma-experiment-gate.json --provider-audit reports/provider-audit.json --provider-contract-matrix reports/qwen-gemma-provider-contract-matrix.json --matrix-gate reports/qwen-gemma-matrix-gate.json --telemetry-audit reports/run-telemetry-audit.json --normalized-telemetry reports/afm-normalized-telemetry.json --matrix-pressure-audit reports/qwen-gemma-stress-pressure.json --matrix-saturation-report reports/qwen-gemma-matrix-saturation.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --implementation-status reports/implementation-status.json --benchmark-readiness reports/afm-trace-readiness.json --release-provenance reports/release-provenance.json --release-qualification-bundle release-bundles/afm-release.agentblaster-release-qualification.zip --redaction-scan reports/redaction-scan.json --publication-bundle publication-bundles/run.agentblaster-publication.zip --matrix-publication-bundle publication-bundles/qwen-gemma-matrix-summary.agentblaster-matrix-publication.zip --harness-review reports/harness-contract-fuzz-review.json --engine-advisory reports/afm-improvement-plan.json --evidence-index reports/afm-release-evidence-index.json --suite-audit reports/toolsim-suite-audit.json --metric-coverage reports/afm-metric-coverage.json --campaign-preflight-manifest campaign-preflight/qwen-gemma-local/manifest.json --selftest-report test-reports/selftest/selftest-report.json --output-json reports/afm-release-claim-readiness.json
+ agentblaster release publication-brief --name afm-release --claim-readiness reports/afm-release-claim-readiness.json --matrix-scorecard reports/qwen-gemma-matrix-scorecard.json --release-provenance reports/release-provenance.json --evidence-index reports/afm-release-evidence-index.json --output-json reports/afm-release-publication-brief.json --output-md reports/afm-release-publication-brief.md
+ agentblaster agents profiles
+ agentblaster agents suite --profile all --output examples/suites/agentic-local-profiles.yaml
+ agentblaster agents suite --profile hermes --output examples/suites/agentic-hermes.yaml
+ agentblaster harness profiles
+ agentblaster harness generate --profile contract-fuzz --suite smoke --repeats 1 --seed 0 --output examples/suites/harness-contract-fuzz.yaml
+ agentblaster harness generate --profile metamorphic --suite smoke --repeats 3 --seed 13 --output examples/suites/harness-metamorphic.yaml
+ agentblaster harness generate --profile cancellation --suite smoke --repeats 3 --seed 23 --output examples/suites/harness-cancellation.yaml
+ agentblaster harness generate --profile orchestration --suite smoke --repeats 3 --seed 29 --output examples/suites/harness-orchestration.yaml
+ agentblaster harness generate --profile emerging-workflows --suite smoke --repeats 2 --seed 37 --output examples/suites/harness-emerging-workflows.yaml
+ agentblaster harness review --suite-file examples/suites/harness-contract-fuzz.yaml --output-json reports/harness-contract-fuzz-review.json
+ agentblaster models targets
+ agentblaster models matrix --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suite trace-replay --output examples/matrices/qwen-gemma-local.yaml
+ agentblaster models stress-matrix --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suites agentic-tool-loop,agent-fanout,prefill,harness-engineering,trace-replay --concurrency-levels 1,2,4,8 --output examples/matrices/qwen-gemma-stress.yaml --summary-json reports/qwen-gemma-stress-plan.json
+ agentblaster models benchmark-kit --providers afm,lm-studio --targets qwen3.6-27b-dense,gemma-4-31b-dense --suite trace-replay --policy agentblaster.policy.yaml --output-dir benchmark-kits/qwen-gemma-local
+ cat campaigns/qwen-gemma-local/README.md
+ agentblaster run --matrix examples/matrices/qwen-gemma-local.yaml --offline --continue-on-error --matrix-summary-json reports/qwen-gemma-matrix-summary.json
+ agentblaster run --matrix examples/matrices/qwen-gemma-stress.yaml --offline --dry-run --plan-json reports/qwen-gemma-stress-plan.json
+ agentblaster matrix report reports/qwen-gemma-matrix-summary.json --format html,md,json
+ agentblaster matrix saturation-report reports/qwen-gemma-matrix-summary.json --output-json reports/qwen-gemma-matrix-saturation.json
+ agentblaster matrix gate reports/qwen-gemma-matrix-summary.json --require-all-runs-complete --max-failed-runs 0 --min-case-pass-rate 95 --max-failure-class engine_protocol_bug=0 --max-tool-loop-stop-reason max_tool_calls_reached=0 --output-json reports/qwen-gemma-matrix-gate.json
+ agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --offline
+ agentblaster run --suite smoke --engine afm --model mlx-community/Qwen3.6-27B --policy agentblaster.policy.yaml
+ ```
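The `matrix gate` invocations above pass thresholds such as `--max-failed-runs 0` and `--min-case-pass-rate 95`. As a rough illustration of what such a threshold check might look like, here is a minimal Python sketch. The `MatrixSummary` fields and the `gate` function are hypothetical names invented for this example; they are not AgentBlaster's actual summary schema or implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatrixSummary:
    """Hypothetical, simplified stand-in for a matrix summary file."""
    total_runs: int
    failed_runs: int
    cases_passed: int
    cases_total: int


def gate(summary: MatrixSummary,
         max_failed_runs: int = 0,
         min_case_pass_rate: float = 95.0) -> List[str]:
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    if summary.failed_runs > max_failed_runs:
        violations.append(
            f"failed_runs={summary.failed_runs} exceeds {max_failed_runs}"
        )
    # Treat an empty matrix as a 0% pass rate rather than dividing by zero.
    if summary.cases_total:
        pass_rate = 100.0 * summary.cases_passed / summary.cases_total
    else:
        pass_rate = 0.0
    if pass_rate < min_case_pass_rate:
        violations.append(
            f"case pass rate {pass_rate:.1f}% is below {min_case_pass_rate}%"
        )
    return violations
```

Returning every violation, instead of stopping at the first, lets a CI log show all failed thresholds in one pass.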
+
+ Provider profiles are stored locally without raw API keys. API keys can be referenced through environment variables, optional OS keyring storage, or an explicit plaintext `.env` fallback for local development only. Related workflows cover dashboard setup-status posture, provider-audit secret-backend posture, and status, test, clear, and writable-secret-delete operations.
+
+ Provider setup details are documented in [docs/providers.md](docs/providers.md), including remote OpenAI/Anthropic presets, the deterministic local mock provider, schema-versioned redacted provider audits, readiness dossiers, portable environment-variable references, optional OS-keyring API-key references, explicit development-only dotenv fallback, cost models for budget policy, and provider rate limits for pacing/concurrency control. Local engine setup recipes are documented in [docs/launch-recipes.md](docs/launch-recipes.md).
+ Engine target planning includes AFM MLX, MLX-LM, Ollama/Ollama-native, Rapid-MLX, oMLX, vLLM-MLX OpenAI/Anthropic-compatible profiles, LM Studio Chat/Responses/Anthropic/native profiles, and remote OpenAI/Anthropic-compatible contract targets. Each target declares representative agent-profile baselines, workflow surfaces, prefill/concurrency challenges, contract priority, telemetry profiles, and native metric claim policy for standardized comparison.
+
+ Reporting details are documented in [docs/reporting.md](docs/reporting.md), including publication JSON plus SVG/PNG report cards for media or corporate consumption. Metric coverage is documented in [docs/metrics.md](docs/metrics.md), including native/measured/inferred/conditional/unavailable field status and stats-semantics guidance for cross-engine comparisons.
+ Failure classification is documented in [docs/failure-taxonomy.md](docs/failure-taxonomy.md), including the distinction between model-quality misses, engine protocol bugs, feature gaps, runtime failures, environment failures, rate limits, and harness defects.
+ Artifact schemas are documented in [docs/artifact-schemas.md](docs/artifact-schemas.md), including publication-safety guidance for run, matrix, lifecycle, raw, and readiness artifacts.
+
+ Reproducibility details are documented in [docs/reproducibility.md](docs/reproducibility.md), including suite snapshots, suite/case hashes, run integrity manifests, signatures, and publication-bundle signature coverage metadata.
+ Implementation status inventory is available through `agentblaster implementation-status`; it is a static handoff artifact and does not run tests or contact providers. It reports file presence plus static requirement inventories for target engines, engine-target standardization metadata, provider contracts, Qwen/Gemma model targets, agent profiles, built-in harness-engineering suite cases, stats-comparability/metric-coverage catalogs, enterprise policy controls, credential/backend posture including optional keyring support, run/matrix publication-bundle governance with media-kit manifests, and SDLC/Chrome self-test gates.
+ Retention metadata is documented in [docs/retention.md](docs/retention.md), including manifest fields for artifact classification, intended run retention, and shorter raw-trace retention.
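To make the run integrity manifests and signatures mentioned above more concrete, the following Python sketch hashes each artifact with SHA-256 and signs the resulting manifest with a keyed HMAC. This is an assumption for illustration only: the source does not state which hash or signature scheme AgentBlaster uses, and `build_manifest`, `sign_manifest`, and `verify_manifest` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Dict


def build_manifest(files: Dict[str, bytes]) -> dict:
    """Map each artifact name to the SHA-256 digest of its content."""
    return {
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        }
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding so key order cannot change the result."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, key: bytes, signature: str) -> bool:
    """Use a constant-time comparison to avoid leaking digest prefixes."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

In practice the key would be read from an environment variable rather than hard-coded, in the spirit of the `--key-env` option shown in the command list.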
+
+ Observability details are documented in [docs/observability.md](docs/observability.md), including optional Prometheus before/after snapshots for local engine telemetry and normalized response telemetry with comparison-readiness metadata. Agent fan-out diagnostics are documented in [docs/agent-fanout.md](docs/agent-fanout.md). Cache-control diagnostics are documented in [docs/cache-control.md](docs/cache-control.md). Cancellation diagnostics are documented in [docs/cancellation.md](docs/cancellation.md). The built-in `agentic-tool-loop` suite exercises bounded deterministic tool-result replay, MCP fixture calls, LCP context attachment, and max-tool-call stop-reason reporting.
+
+ Dashboard details are documented in [docs/dashboard.md](docs/dashboard.md), including the no-JavaScript launch/report-generation forms and allowlisted report artifact links.
+
+ Capability preflight is documented in [docs/capabilities.md](docs/capabilities.md), including suite feature requirements and provider-suite compatibility checks.
+ Run execution performs capability preflight by default, failing before dispatch when a provider is explicitly missing suite-required features.
+ Bundled capability surface catalogs are documented in [docs/capability-surfaces.md](docs/capability-surfaces.md), including simulated tool, deterministic MCP profile, LCP context-bundle, and skill-pack inventory commands for policy review.
+ Suite governance is documented in [docs/suite-governance.md](docs/suite-governance.md), including static provenance, risk, license/source, and capability-surface audits before dispatch.
+ Evidence bundles are documented in [docs/evidence-bundles.md](docs/evidence-bundles.md), including redaction-safe governance zip artifacts for corporate review and media-supporting benchmark evidence.
+ Campaign preflight bundles are also documented there; they collect no-dispatch readiness, schema, policy, provider-audit, and matrix-inventory artifacts before expensive local or remote matrices are launched.
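The capability preflight described above fails a run before dispatch when a provider is explicitly missing a suite-required feature, and the command list shows strict flags for unknown capabilities. The distinction between "explicitly missing" and "unknown" could be modeled as a three-state check, sketched below in Python. The `preflight` function and its feature names are hypothetical and do not reflect AgentBlaster's internal data model.

```python
from typing import Dict, List, Optional, Set


def preflight(required: Set[str],
              declared: Dict[str, Optional[bool]],
              strict_unknown: bool = False) -> List[str]:
    """Return blocking problems; an empty list means dispatch may proceed.

    declared maps a feature to True (supported), False (explicitly
    unsupported), or None / absent (unknown).
    """
    problems = []
    for feature in sorted(required):
        status = declared.get(feature)
        if status is False:
            # Explicitly missing features always block the run.
            problems.append(f"{feature}: explicitly unsupported")
        elif status is None and strict_unknown:
            # Unknown features block only when strict mode is requested.
            problems.append(f"{feature}: unknown (strict mode)")
    return problems
```

Under this model, a default run tolerates unknown capabilities, while a strict run treats them as failures.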
+
+ Model targets, matrix generation, and benchmark kits are documented in [docs/models.md](docs/models.md), including the initial Qwen3.6 27B dense and Gemma 4 31B dense comparison targets, comparison-group guidance, required release metadata, and provider contract-matrix commands for campaign compatibility evidence.
+ The checked-in Qwen/Gemma campaign handoff lives in [campaigns/qwen-gemma-local/README.md](campaigns/qwen-gemma-local/README.md).
+
+ Dry-run planning is documented in [docs/planning.md](docs/planning.md), including policy/capability preflight and estimated token/cost summaries before dispatch. Prompt footprint analysis is documented in [docs/prompt-footprint.md](docs/prompt-footprint.md), including system/tool/MCP/LCP/skill prefix breakdowns for prefill diagnostics. Matrix pressure audits extend that analysis across provider/model/suite/concurrency matrices before dispatch.
+
+ Run execution includes enterprise controls. Raw traces can be disabled, and remote providers can be blocked with `--offline`. YAML policy files can allowlist providers and endpoint hosts, require remote API-key references, restrict secret backends and approved secret reference names/prefixes, require cleanup audit logs, cap suite and matrix cost exposure, and gate suite-provided tool schemas, simulated tools, MCP profiles, LCP context bundles, skills, provenance, risk levels, and source/license metadata. Optional JSONL audit logs record run and policy events.
+ Security policy details are documented in [docs/security-policy.md](docs/security-policy.md), including enterprise baseline generation with `agentblaster policy template`, no-secret review summaries with `agentblaster policy controls`, and `agentblaster.provider-audit.v1` provider/auth posture audits. The example policy in [agentblaster.policy.example.yaml](agentblaster.policy.example.yaml) separates provider endpoint allowlists from Prometheus metrics endpoint allowlists and includes capability-surface allowlists.
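The text above notes that the example policy keeps provider endpoint allowlists separate from Prometheus metrics endpoint allowlists. A minimal Python sketch of that separation follows. The `POLICY` dictionary keys and the `endpoint_allowed` helper are hypothetical; the real policy schema is the one in `agentblaster.policy.example.yaml`, which is not reproduced here.

```python
from typing import Set
from urllib.parse import urlsplit

# Hypothetical in-memory policy; the key names are illustrative only.
POLICY = {
    "provider_endpoint_hosts": {"127.0.0.1", "localhost"},
    "metrics_endpoint_hosts": {"127.0.0.1"},
}


def endpoint_allowed(url: str, allowed_hosts: Set[str]) -> bool:
    """Allow a URL only when its host appears in the given allowlist."""
    # urlsplit().hostname is lowercased and excludes the port.
    host = urlsplit(url).hostname
    return host is not None and host in allowed_hosts
```

Keeping the two lists separate means that approving a host for metrics scraping does not implicitly approve it as a provider endpoint, and vice versa.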
+
+ Audit logging details are documented in [docs/audit.md](docs/audit.md), including control-plane events for provider config, secret reference changes, dashboard start, report generation, matrix reports, and exports.
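JSONL audit logs such as the ones described above are append-only files with one JSON object per line. The Python sketch below shows that general pattern. The record fields (`ts`, `event`) and the event names are assumptions made for this example, not AgentBlaster's documented audit schema.

```python
import json
from datetime import datetime, timezone
from typing import List


def append_audit_event(path: str, event_type: str, **fields) -> dict:
    """Append one JSON object as a single line; earlier lines are untouched."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        **fields,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_audit_events(path: str) -> List[dict]:
    """Parse the log back into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each event is a complete line, the log can be tailed, filtered, or truncated by line without a JSON-aware tool.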
+
+ AgentBlaster includes its own SDLC test harness taxonomy. The `quality` commands describe deterministic app-test tiers, release lanes, SDLC validation manifests, Chrome/Codex dashboard validation plans, and redacted dashboard fixtures with release-evidence summaries, including bounded `agentic-tool-loop` stop-reason gate metadata, without running tests. SDLC validation manifests are direct review artifacts and can also be archived through release qualification bundles as compact summaries for claim-readiness, evidence-index, and dashboard review.
+ Experiment manifests are documented in [docs/experiments.md](docs/experiments.md), including static scope, preflight requirements, acceptance gates, and publication rules for corporate/media benchmark campaigns. Release governance artifacts can be generated with `agentblaster release packaging-readiness` and `agentblaster release provenance`; the JSON outputs record package metadata readiness, dependency declarations, an SPDX-lite SBOM inventory, optional installed package inventory, safe source hashes, and explicit redaction notes. Release qualification bundles collect evidence, audit, advisory, gate, readiness, provenance, publication, SDLC validation, and selftest artifacts into one checksum-indexed package. Provider audits, publication briefs, and SDLC validation manifests are summarized, not copied verbatim, for release qualification, claim-readiness, evidence-index, and dashboard consumers; publication briefs also surface compact engine-target IDs and media-kit readiness from compact claim-readiness evidence without opening publication ZIP bundles. Generated campaign runbooks include claim-readiness, publication-brief, and final archival bundle commands so corporate/media packets can carry the final claim gate, brief, provider-auth posture, media-kit readiness, and app-SDLC review evidence in compact redaction-safe form. Use `agentblaster security scan` as a final local redaction gate before publishing bundles; it scans text files, text entries inside ZIP bundles, and unsafe ZIP member names without extracting archives or printing matched secret/local-path values.
+ Repository automation is documented in [docs/testing.md](docs/testing.md), including deterministic CI and a package-build workflow that uploads artifacts and, on version tag pushes (`v*`), publishes to PyPI via a trusted publisher.
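The redaction gate described above scans text entries inside ZIP bundles and flags unsafe member names without extracting archives or printing matched values. One way such a scan could work is sketched below in Python. The secret pattern and the `scan_zip` function are hypothetical; the actual rules used by `agentblaster security scan` are not given in the source.

```python
import io
import re
import zipfile
from typing import List

# Hypothetical example pattern; a real scanner would use a broader rule set.
SECRET_PATTERN = re.compile(rb"sk-[A-Za-z0-9]{16,}")


def scan_zip(data: bytes) -> List[str]:
    """Report findings by member name only, never echoing matched content."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            name = info.filename
            # Absolute paths and parent-directory segments are unsafe names.
            if name.startswith("/") or ".." in name.split("/"):
                findings.append(f"unsafe member name: {name!r}")
                continue
            if info.is_dir():
                continue
            # Read the member in memory; nothing is written to disk.
            with archive.open(info) as member:
                content = member.read()
            if SECRET_PATTERN.search(content):
                findings.append(f"possible secret in {name!r}")
    return findings
```

Reporting only the member name keeps the scan output itself safe to attach to a review packet.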
+
+ AgentBlaster includes representative local-agent profile generators for OpenCode-style, OpenClaw-style, Nous Hermes-style, Pi-style, Aider-style, Cline-style, Continue-style, and Codex-style workflows. The `agents` commands write reviewable YAML suites with tool, MCP, LCP, skill, trace-replay, structured-output, retrieval, and sandboxed command-planning surfaces and do not call providers or install third-party agent frameworks.
+
+ AgentBlaster also includes deterministic harness-engineering generators for prefill/cache, concurrency, cancellation, provider-contract fuzz, metamorphic-equivalence, skill-prefix routing, multi-tool orchestration, mixed emerging MCP/LCP/skills/tool-loop stacks, and judge-rubric workloads with bounded fixture tool-result round trips. The `harness` commands write reviewable YAML suites and static harness-review artifacts without calling providers.
+ Harness engineering details are documented in [docs/harness.md](docs/harness.md), including generated-suite provenance and reporting metadata.
+
+ Trace replay cases can provide explicit `messages` for multi-turn agent workflows, including prior assistant tool calls and deterministic tool-result context. OpenAI-compatible and Anthropic-compatible adapters normalize those traces into their respective request contracts.
+
+ ## Planned Benchmark CLI
+
+ ```bash
+ agentblaster report runs/<run-id> --format html
+ ```