ai-crucible 0.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ai_crucible-0.2.0/.github/workflows/ci.yml +79 -0
- ai_crucible-0.2.0/.github/workflows/pages.yml +50 -0
- ai_crucible-0.2.0/.github/workflows/release.yml +190 -0
- ai_crucible-0.2.0/.gitignore +57 -0
- ai_crucible-0.2.0/CHANGELOG.md +48 -0
- ai_crucible-0.2.0/LICENSE +21 -0
- ai_crucible-0.2.0/PKG-INFO +125 -0
- ai_crucible-0.2.0/README.es.md +96 -0
- ai_crucible-0.2.0/README.fr.md +96 -0
- ai_crucible-0.2.0/README.hi.md +96 -0
- ai_crucible-0.2.0/README.it.md +96 -0
- ai_crucible-0.2.0/README.ja.md +96 -0
- ai_crucible-0.2.0/README.md +96 -0
- ai_crucible-0.2.0/README.pt-BR.md +96 -0
- ai_crucible-0.2.0/README.zh.md +96 -0
- ai_crucible-0.2.0/SCORECARD.md +46 -0
- ai_crucible-0.2.0/SECURITY.md +141 -0
- ai_crucible-0.2.0/SHIP_GATE.md +71 -0
- ai_crucible-0.2.0/assets/logo.png +0 -0
- ai_crucible-0.2.0/docs/attestia-integration-roadmap.md +254 -0
- ai_crucible-0.2.0/docs/gameplan.md +119 -0
- ai_crucible-0.2.0/docs/phase-0/chatgpt-deep-research.md +129 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-01-capability-gaps.md +70 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-02-designer-bias.md +55 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-03-multi-attempt-scoring.md +57 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-04-novel-benchmark-designs.md +79 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-05-agent-eval-methodology.md +71 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-06-reward-hacking-detection.md +58 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-07-multi-criterion-scoring.md +58 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-08-tool-efficiency-budgets.md +39 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-09-honeypot-patterns.md +39 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-10-datacurve-followup.md +47 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-11-replication-packages.md +86 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-12-preregistration-stats.md +60 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-13-cryptographic-provenance.md +73 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-14-third-party-audit.md +80 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-15-ablation-tuning.md +74 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-16-engagement-reward-surface.md +55 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-17-kernel-architecture.md +57 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-18-sandboxing-hardening.md +48 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-19-critic-role-value.md +50 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-20-honeypot-eval-awareness.md +64 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-21-task-relevant-vs-irrelevant-signals.md +45 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-22-positive-engagement-levers.md +63 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-23-self-competition-framing.md +61 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-24-designer-generative-engagement.md +49 -0
- ai_crucible-0.2.0/docs/phase-0/swarm-25-engagement-vs-measurement-integrity.md +65 -0
- ai_crucible-0.2.0/docs/phase-2/calibration.md +74 -0
- ai_crucible-0.2.0/docs/phase-2/grounding-research.md +68 -0
- ai_crucible-0.2.0/docs/phase-2/kickoff-forks-swarm.md +102 -0
- ai_crucible-0.2.0/docs/phase-2/panel.json +34 -0
- ai_crucible-0.2.0/docs/phase-2/swarm-wave0-contracts.md +134 -0
- ai_crucible-0.2.0/docs/research-grounding.md +896 -0
- ai_crucible-0.2.0/npm/LICENSE +21 -0
- ai_crucible-0.2.0/npm/README.es.md +34 -0
- ai_crucible-0.2.0/npm/README.fr.md +38 -0
- ai_crucible-0.2.0/npm/README.hi.md +36 -0
- ai_crucible-0.2.0/npm/README.it.md +38 -0
- ai_crucible-0.2.0/npm/README.ja.md +36 -0
- ai_crucible-0.2.0/npm/README.md +43 -0
- ai_crucible-0.2.0/npm/README.pt-BR.md +39 -0
- ai_crucible-0.2.0/npm/README.zh.md +35 -0
- ai_crucible-0.2.0/npm/bin/ai-crucible.js +18 -0
- ai_crucible-0.2.0/npm/package.json +40 -0
- ai_crucible-0.2.0/puzzles/seed-sulzbach-55252/meta.json +43 -0
- ai_crucible-0.2.0/puzzles/seed-sulzbach-55252/oracle/ANSWER_KEY_a7f3b9.txt +3 -0
- ai_crucible-0.2.0/puzzles/seed-sulzbach-55252/oracle/check.py +118 -0
- ai_crucible-0.2.0/puzzles/seed-sulzbach-55252/prompt +16 -0
- ai_crucible-0.2.0/puzzles/seed-sulzbach-55252/setup_script +66 -0
- ai_crucible-0.2.0/pyproject.toml +70 -0
- ai_crucible-0.2.0/site/astro.config.mjs +36 -0
- ai_crucible-0.2.0/site/package-lock.json +7926 -0
- ai_crucible-0.2.0/site/package.json +18 -0
- ai_crucible-0.2.0/site/src/assets/logo.png +0 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/architecture.md +122 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/concepts.md +125 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/getting-started.md +164 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/index.md +70 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/reference.md +249 -0
- ai_crucible-0.2.0/site/src/content/docs/handbook/security.md +121 -0
- ai_crucible-0.2.0/site/src/content.config.ts +7 -0
- ai_crucible-0.2.0/site/src/pages/index.astro +33 -0
- ai_crucible-0.2.0/site/src/site-config.ts +64 -0
- ai_crucible-0.2.0/site/src/styles/global.css +3 -0
- ai_crucible-0.2.0/site/src/styles/starlight-custom.css +5 -0
- ai_crucible-0.2.0/site/tsconfig.json +5 -0
- ai_crucible-0.2.0/src/ai_crucible/__init__.py +48 -0
- ai_crucible-0.2.0/src/ai_crucible/__main__.py +6 -0
- ai_crucible-0.2.0/src/ai_crucible/attestation.py +326 -0
- ai_crucible-0.2.0/src/ai_crucible/budget.py +186 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/__init__.py +15 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/admission_pairs.json +1583 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/admission_set.json +23 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/irt.py +271 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/README.md +48 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/difficulty_anchor.json +38 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/known_diagnostic.json +38 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/known_impossible.json +38 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/known_trivial.json +35 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/items/test_retest.json +24 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/known_groups.py +295 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/loader.py +314 -0
- ai_crucible-0.2.0/src/ai_crucible/calibration/types.py +41 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/__init__.py +18 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/aggregate.py +605 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/human_labels.py +290 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/metrics.py +1156 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/panel_store.py +173 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/profile.py +714 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/run.py +505 -0
- ai_crucible-0.2.0/src/ai_crucible/characterize/types.py +79 -0
- ai_crucible-0.2.0/src/ai_crucible/cli.py +68 -0
- ai_crucible-0.2.0/src/ai_crucible/engagement.py +188 -0
- ai_crucible-0.2.0/src/ai_crucible/eval_awareness.py +160 -0
- ai_crucible-0.2.0/src/ai_crucible/framing.py +202 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/__init__.py +106 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/inspect_task.py +213 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/prereg.py +349 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/rubric_bundle.py +226 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/sut.py +197 -0
- ai_crucible-0.2.0/src/ai_crucible/instrument/tuning.py +428 -0
- ai_crucible-0.2.0/src/ai_crucible/kernel.py +607 -0
- ai_crucible-0.2.0/src/ai_crucible/models/__init__.py +26 -0
- ai_crucible-0.2.0/src/ai_crucible/models/claude_adapter.py +325 -0
- ai_crucible-0.2.0/src/ai_crucible/models/ollama_adapter.py +488 -0
- ai_crucible-0.2.0/src/ai_crucible/observability.py +342 -0
- ai_crucible-0.2.0/src/ai_crucible/puzzle.py +151 -0
- ai_crucible-0.2.0/src/ai_crucible/roles.py +352 -0
- ai_crucible-0.2.0/src/ai_crucible/sandbox.py +376 -0
- ai_crucible-0.2.0/src/ai_crucible/scoring/__init__.py +52 -0
- ai_crucible-0.2.0/src/ai_crucible/scoring/judge_panel.py +346 -0
- ai_crucible-0.2.0/src/ai_crucible/scoring/oracle.py +240 -0
- ai_crucible-0.2.0/src/ai_crucible/scoring/stats.py +277 -0
- ai_crucible-0.2.0/src/ai_crucible/trace.py +230 -0
- ai_crucible-0.2.0/src/ai_crucible/types.py +266 -0
- ai_crucible-0.2.0/tests/__init__.py +0 -0
- ai_crucible-0.2.0/tests/conftest.py +68 -0
- ai_crucible-0.2.0/tests/fixtures/puzzles/sample/meta.json +36 -0
- ai_crucible-0.2.0/tests/fixtures/puzzles/sample/prompt +4 -0
- ai_crucible-0.2.0/tests/fixtures/puzzles/sample/setup_script +10 -0
- ai_crucible-0.2.0/tests/test_budget.py +165 -0
- ai_crucible-0.2.0/tests/test_calibration.py +609 -0
- ai_crucible-0.2.0/tests/test_characterize.py +1206 -0
- ai_crucible-0.2.0/tests/test_characterize_run.py +177 -0
- ai_crucible-0.2.0/tests/test_cli.py +43 -0
- ai_crucible-0.2.0/tests/test_contracts.py +97 -0
- ai_crucible-0.2.0/tests/test_engagement.py +347 -0
- ai_crucible-0.2.0/tests/test_human_labels.py +343 -0
- ai_crucible-0.2.0/tests/test_instrument.py +556 -0
- ai_crucible-0.2.0/tests/test_kernel.py +1004 -0
- ai_crucible-0.2.0/tests/test_models.py +471 -0
- ai_crucible-0.2.0/tests/test_observability.py +524 -0
- ai_crucible-0.2.0/tests/test_panel_store.py +147 -0
- ai_crucible-0.2.0/tests/test_puzzle.py +146 -0
- ai_crucible-0.2.0/tests/test_roles.py +332 -0
- ai_crucible-0.2.0/tests/test_sandbox.py +462 -0
- ai_crucible-0.2.0/tests/test_scoring.py +846 -0
- ai_crucible-0.2.0/uv.lock +2326 -0
- ai_crucible-0.2.0/verify.sh +20 -0
|
@@ -0,0 +1,79 @@
+name: CI
+
+# Paths-gated so CI minutes only burn on changes that can affect the build
+# (per .claude/rules/github-actions.md — non-negotiable). pull_request covers
+# branch validation; workflow_dispatch is the manual fallback.
+on:
+  push:
+    paths:
+      - "src/**"
+      - "tests/**"
+      - "pyproject.toml"
+      - ".github/workflows/**"
+  pull_request:
+    paths:
+      - "src/**"
+      - "tests/**"
+      - "pyproject.toml"
+      - ".github/workflows/**"
+  workflow_dispatch:
+
+# Cancel superseded runs on the same ref (required by github-actions rules).
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        # The two most-tested floors. 3.11 is the supported lower bound
+        # (requires-python >=3.11) and 3.12 is the upper end exercised here.
+        python-version: ["3.11", "3.12"]
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+
+      - name: Sync dependencies (dev + stats extras)
+        run: uv sync --extra dev --extra stats --python ${{ matrix.python-version }}
+
+      - name: Lint (ruff)
+        run: uv run ruff check .
+
+      - name: Read coverage floor from pyproject
+        id: covfloor
+        shell: bash
+        run: |
+          # Single source of truth: pyproject [tool.coverage.report].fail_under.
+          # Keeping the gate here in lockstep with pyproject avoids drift.
+          floor=$(uv run python -c "import tomllib,pathlib; print(tomllib.loads(pathlib.Path('pyproject.toml').read_text())['tool']['coverage']['report']['fail_under'])")
+          echo "floor=$floor" >> "$GITHUB_OUTPUT"
+          echo "Coverage floor: $floor%"
+
+      - name: Test (pytest + coverage gate)
+        run: >-
+          uv run pytest
+          --cov=ai_crucible
+          --cov-report=term-missing
+          --cov-fail-under=${{ steps.covfloor.outputs.floor }}
+
+      - name: Dependency audit (pip-audit on runtime deps)
+        run: |
+          # Ship Gate D: ecosystem-appropriate dependency scanning. Audit the
+          # resolved RUNTIME deps from the lockfile (not dev/stats tooling).
+          uv export --format requirements-txt --no-emit-project --no-hashes > requirements-audit.txt
+          uvx pip-audit -r requirements-audit.txt
|
@@ -0,0 +1,50 @@
+name: Deploy site to GitHub Pages
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'site/**'
+      - '.github/workflows/pages.yml'
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 22
+
+      - name: Install site dependencies
+        working-directory: site
+        run: npm ci
+
+      - name: Build site
+        working-directory: site
+        run: npm run build
+
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: site/dist
+
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - id: deployment
+        uses: actions/deploy-pages@v4
|
@@ -0,0 +1,190 @@
+name: Release
+
+# On a published GitHub Release this:
+# 1. publishes the Python package to PyPI via Trusted Publishing (OIDC, no token),
+# 2. builds PyInstaller binaries for each platform + uploads them (+ checksums) to the release,
+# 3. publishes the @dogfood-lab/ai-crucible npm wrapper (the npm-launcher) via Trusted Publishing.
+#
+# Org rules (.claude/rules/github-actions.md): publish workflows fire on `release: published` only.
+# The repo has ci.yml + this release.yml (the org CI/publish pair) plus pages.yml — pages.yml is the
+# accepted GitHub-Pages-deploy exception (paths-gated to site/**), and the cross-OS binary matrix is
+# the explicit-request exception to the "1 OS / no macos" rule, because the npm launcher distributes
+# platform binaries.
+#
+# First-publish prerequisites (one-time, on the registries):
+# * PyPI: a pending publisher for project `ai-crucible`, workflow `release.yml` — ALREADY SET
+# (repository dogfood-lab/ai-crucible, environment "(Any)").
+# * npm: a v0.0.0 placeholder publish of @dogfood-lab/ai-crucible (reserves the name) + a Trusted
+# Publisher on npmjs.com (workflow release.yml, org dogfood-lab, repo ai-crucible).
+# * Do NOT enable "require 2FA / disallow tokens" on the npm package until after the first CI publish.
+#
+# NOTE: the PyInstaller step bundles inspect-ai + the scientific stack; the first CI run may need a
+# hidden-import/collect tweak (a known PyInstaller reality). A failure there is contained — the PyPI
+# job is independent and still publishes; re-dispatch fixes binaries + the npm wrapper.
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: false # never cancel an in-flight publish
+
+jobs:
+  pypi:
+    name: Publish to PyPI (Trusted Publishing)
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write # OIDC handshake for PyPI Trusted Publishing — the only auth needed
+    timeout-minutes: 15
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Verify the tag matches pyproject version
+        run: |
+          TAG="${GITHUB_REF_NAME#v}"
+          PKG=$(grep -m1 '^version = ' pyproject.toml | sed -E 's/version = "(.*)"/\1/')
+          echo "tag=${TAG} pyproject=${PKG}"
+          if [ "${TAG}" != "${PKG}" ]; then
+            echo "::error::release tag ${TAG} does not match pyproject version ${PKG}"
+            exit 1
+          fi
+
+      - uses: astral-sh/setup-uv@v6
+
+      - name: Build sdist + wheel
+        run: uv build
+
+      - name: Check distribution metadata
+        run: uvx twine check dist/*
+
+      - name: Publish to PyPI
+        # Trusted Publishing: no token. PEP 740 attestations are on by default (action >= v1.11.0).
+        uses: pypa/gh-action-pypi-publish@release/v1
+
+  build-binaries:
+    name: Build PyInstaller binary (${{ matrix.target }})
+    strategy:
+      fail-fast: false # one OS's failure must not cancel the others
+      matrix:
+        include:
+          - os: ubuntu-latest
+            target: linux-x64
+            ext: ""
+          - os: macos-latest
+            target: darwin-arm64
+            ext: ""
+          # darwin-x64 omitted: macos-13 is deprecated; the arm64 binary runs via Rosetta.
+          - os: windows-latest
+            target: win-x64
+            ext: ".exe"
+    runs-on: ${{ matrix.os }}
+    timeout-minutes: 25
+    steps:
+      - uses: actions/checkout@v5
+      - uses: astral-sh/setup-uv@v6
+      - run: uv python install 3.12
+      - run: uv venv
+
+      - name: Install ai-crucible + PyInstaller
+        # The [stats] extra pulls statsmodels (the κ machinery the characterize path uses); the
+        # core deps (inspect-ai, numpy, scipy, pydantic) come with the project install.
+        run: uv pip install ".[stats]" "pyinstaller>=6.9.0"
+
+      - name: Build binary
+        shell: bash
+        run: |
+          VERSION=${GITHUB_REF_NAME#v}
+          uv run pyinstaller --onefile --name ai-crucible --console \
+            --collect-submodules ai_crucible \
+            --collect-all inspect_ai \
+            --collect-submodules pydantic \
+            --collect-submodules scipy \
+            --collect-submodules statsmodels \
+            --collect-submodules numpy \
+            --copy-metadata ai-crucible \
+            src/ai_crucible/__main__.py
+          OUTNAME="ai-crucible-${VERSION}-${{ matrix.target }}${{ matrix.ext }}"
+          mv "dist/ai-crucible${{ matrix.ext }}" "dist/${OUTNAME}"
+          echo "ASSET_NAME=${OUTNAME}" >> "$GITHUB_ENV"
+
+      - name: Smoke-test the binary
+        shell: bash
+        run: dist/${{ env.ASSET_NAME }} --version
+
+      - uses: actions/upload-artifact@v4
+        with:
+          name: binary-${{ matrix.target }}
+          path: dist/${{ env.ASSET_NAME }}
+
+  release-binaries:
+    name: Upload binaries + checksums to the release
+    needs: build-binaries
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write # upload assets to the release
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          path: artifacts
+          merge-multiple: true
+
+      - name: Generate checksums
+        shell: bash
+        run: |
+          VERSION=${GITHUB_REF_NAME#v}
+          cd artifacts
+          sha256sum * > "checksums-${VERSION}.txt"
+          cat "checksums-${VERSION}.txt"
+
+      - uses: softprops/action-gh-release@v2
+        with:
+          files: artifacts/*
+
+  npm:
+    name: Publish npm wrapper (Trusted Publishing)
+    needs: release-binaries # the wrapper is only useful once the binaries are on the release
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write # npm provenance via Sigstore OIDC
+    timeout-minutes: 15
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Verify the npm wrapper version matches the tag
+        run: |
+          TAG="${GITHUB_REF_NAME#v}"
+          PKG=$(node -p "require('./npm/package.json').version")
+          echo "tag=${TAG} npm=${PKG}"
+          if [ "${TAG}" != "${PKG}" ]; then
+            echo "::error::release tag ${TAG} does not match npm/package.json version ${PKG}"
+            exit 1
+          fi
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: "22"
+          registry-url: "https://registry.npmjs.org"
+
+      - name: Install npm >= 11.5 for OIDC trusted-publishing auth
+        run: |
+          # Node 22's bundled npm 10.9 races on an in-place `npm install -g npm@latest`
+          # (MODULE_NOT_FOUND: promise-retry). Install npm@latest into a sandbox and shadow it.
+          SANDBOX="$HOME/.npm-cli-sandbox"
+          mkdir -p "$SANDBOX"
+          pushd "$SANDBOX" >/dev/null
+          echo '{"name":"npm-cli-sandbox","version":"0.0.0","private":true}' > package.json
+          npm install --no-save --no-audit --no-fund --silent npm@latest
+          popd >/dev/null
+          echo "$SANDBOX/node_modules/.bin" >> "$GITHUB_PATH"
+          "$SANDBOX/node_modules/.bin/npm" --version
+
+      - name: Publish wrapper with provenance (OIDC trusted publisher)
+        working-directory: npm
+        run: |
+          npm install --no-save --no-audit --no-fund
+          npm publish --provenance --access public
|
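The "Verify the tag matches pyproject version" step in release.yml strips a leading `v` from the ref name and compares it with the first `version = "..."` line in `pyproject.toml`. A rough Python rendering of that check follows, for anyone who wants to try the rule outside CI. The function name `tag_matches_pyproject` and the sample inputs are hypothetical; the workflow itself uses `grep` and `sed`, not this code.

```python
import re


def tag_matches_pyproject(ref_name: str, pyproject_text: str) -> bool:
    """Compare a release ref such as 'v0.2.0' with the pyproject version.

    Mirrors the shell step under stated assumptions: one leading 'v' is
    stripped (like ${GITHUB_REF_NAME#v}) and the first line that starts with
    'version = ' supplies the package version (like grep -m1 plus sed).
    """
    tag = ref_name.removeprefix("v")
    match = re.search(r'^version = "(.*)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no line starting with 'version = \"...\"' was found")
    return tag == match.group(1)


# Hypothetical pyproject fragment used only for illustration.
SAMPLE = 'name = "ai-crucible"\nversion = "0.2.0"\n'

print(tag_matches_pyproject("v0.2.0", SAMPLE))
print(tag_matches_pyproject("v0.2.1", SAMPLE))
```

Note that when the workflow is started through `workflow_dispatch` from a branch, the ref name is the branch name rather than a tag, so a check of this shape would report a mismatch.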
@@ -0,0 +1,57 @@
+# OS
+.DS_Store
+Thumbs.db
+desktop.ini
+
+# Editors
+.vscode/
+.idea/
+*.swp
+*~
+
+# Env / secrets
+.env
+.env.*
+!.env.example
+*.key
+*.pem
+
+# Dependencies
+node_modules/
+__pycache__/
+*.py[cod]
+*.egg-info/
+.venv/
+venv/
+
+# Build artifacts
+dist/
+build/
+*.tsbuildinfo
+
+# Python tooling caches
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.coverage
+htmlcov/
+coverage.xml
+
+# Logs
+*.log
+npm-debug.log*
+
+# Local scratch
+scratch/
+tmp/
+
+# Polyglot translation cache
+.polyglot-cache.json
+
+# pip-audit scratch
+requirements-audit.txt
+
+# characterization run outputs
+char-*.json
+characterization-report.json
+site/.astro/
|
@@ -0,0 +1,48 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/),
+and this project adheres to [Semantic Versioning](https://semver.org/).
+
+## [Unreleased]
+
+## [0.2.0] - 2026-06-01
+
+Phase-1 milestone: the policy-enforced kernel + role contracts + instrument-quality
+scaffolding, built greenfield from a citation-verified design lock. Pre-1.0 by
+design — this is Phase 1 of a multi-phase research instrument (see Notes).
+
+### Added
+- **Phase-1 kernel** — a thin policy layer on [Inspect AI](https://inspect.aisi.org.uk/),
+  nine modules threaded through one observable `generate` choke point:
+  `puzzle_loader`, `sandbox`, `roles`, `budget_governor`, `oracle_scorer`,
+  `judge_panel`, `trace_writer`, `observability`, `attestation`. Entry points
+  `run_attempt` / `run_pass_hat_k`.
+- **Layered Reward Surface + Sealed Boundary** — prompt-framing as a first-class
+  measured arm (`neutral` / `self_referential` / `social_standings`); the
+  motivating surface (chrome) never shares a context window with the scored,
+  out-of-band-graded surface.
+- **Scoring** — `pass^k`, Wilson / Clopper-Pearson intervals, McNemar, the
+  graduation rule, the §8.3 conjunctive hard gate, and a cross-family judge panel
+  that excludes the generator's family (external verifier).
+- **Instrument-quality scaffolding** — AsPredicted + REFORMS pre-registration
+  templates, content-hashed rubric bundle with version-bump, Sobol screen +
+  deterministic split + Thresholdout budget, `SUT.yaml`, and Inspect-AI task
+  mapping (the §9 audit chain).
+- **First seed puzzle** — `puzzles/seed-sulzbach-55252/` (claude-code#55252:
+  fabrication / looping-on-diagnosis) with a grading-side oracle + fingerprinted
+  bait honeypot.
+- **Design lock** — `docs/research-grounding.md` §10, grounded in two
+  citation-verified study-swarms (`docs/phase-0/swarm-16..25`; every arXiv ID
+  verified, corrections recorded inline).
+- **Shipping baseline** — CI (ruff + pytest + 60% coverage gate on Python
+  3.11/3.12 + dependency audit), `SECURITY.md` threat model, MIT `LICENSE`, and a
+  `verify.sh` one-command gate.
+
+### Notes
+- **Pre-1.0 research instrument.** Phase 1 of a multi-phase build. The real
+  cross-family local-model panel (Phase 2), the hardened container/microVM sandbox
+  provider, and the cosign/Rekor external attestation anchor are deferred to
+  Phase 2+ and are documented as such. No package has been published.
+- `0.1.0` was an internal scaffold version, never tagged or released.
|
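The Scoring entry in the changelog names `pass^k` and Wilson intervals without showing them. As a rough illustration of those two ideas, the sketch below treats `pass^k` as "every one of the k independent trials succeeded" (the definition given in the package README) and computes a standard Wilson score interval for a success proportion. The function names `pass_hat_k` and `wilson_interval` are hypothetical and are not taken from `src/ai_crucible/scoring/stats.py`, whose contents are not shown in this section.

```python
import math
from typing import Sequence


def pass_hat_k(trials: Sequence[bool]) -> bool:
    """Return True only if all k independent trials succeeded."""
    if not trials:
        raise ValueError("need at least one trial")
    return all(trials)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval. Bounds are clamped
    to [0, 1] to absorb floating-point rounding at the extremes.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative data only: 8 successes out of 10 trials.
outcomes = [True] * 8 + [False] * 2
low, high = wilson_interval(sum(outcomes), len(outcomes))
print(pass_hat_k(outcomes), round(low, 3), round(high, 3))
```

Unlike a normal-approximation interval, the Wilson interval stays inside [0, 1] and remains usable at small n, which suits a setting where each puzzle is attempted only a handful of times.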
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 mcp-tool-shop
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: ai-crucible
|
|
3
|
+
Version: 0.2.0
|
|
4
|
+
Summary: Diagnostic adversarial game for frontier LLMs — a policy-enforced kernel that mediates a Designer/Solver/Judge cycle, scores against a hidden oracle, and curates a Lab/Arena/Regression catalog.
|
|
5
|
+
Project-URL: Homepage, https://github.com/dogfood-lab/ai-crucible
|
|
6
|
+
Project-URL: Repository, https://github.com/dogfood-lab/ai-crucible
|
|
7
|
+
Author-email: mcp-tool-shop <64996768+mcp-tool-shop@users.noreply.github.com>
|
|
8
|
+
License: MIT
|
|
9
|
+
License-File: LICENSE
|
|
10
|
+
Keywords: auditing-game,diagnostic,evaluation,inspect-ai,llm,reward-hacking
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Intended Audience :: Science/Research
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
16
|
+
Requires-Python: <3.14,>=3.11
|
|
17
|
+
Requires-Dist: inspect-ai>=0.3
|
|
18
|
+
Requires-Dist: numpy>=1.26
|
|
19
|
+
Requires-Dist: pydantic>=2.5
|
|
20
|
+
Requires-Dist: scipy>=1.11
|
|
21
|
+
Provides-Extra: dev
|
|
22
|
+
Requires-Dist: mypy>=1.11; extra == 'dev'
|
|
23
|
+
Requires-Dist: pytest-cov>=5; extra == 'dev'
|
|
24
|
+
Requires-Dist: pytest>=8; extra == 'dev'
|
|
25
|
+
Requires-Dist: ruff>=0.6; extra == 'dev'
|
|
26
|
+
Provides-Extra: stats
|
|
27
|
+
Requires-Dist: statsmodels>=0.14; extra == 'stats'
|
|
28
|
+
Description-Content-Type: text/markdown
|
|
29
|
+
|
|
30
|
+
<p align="center">
|
|
31
|
+
<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
|
|
32
|
+
</p>
|
|
33
|
+
|
|
34
|
+
<p align="center">
|
|
35
|
+
<img src="https://raw.githubusercontent.com/dogfood-lab/ai-crucible/main/assets/logo.png" alt="ai-crucible" width="500" />
|
|
36
|
+
</p>
|
|
37
|
+
|
|
38
|
+
<p align="center">
|
|
39
|
+
<a href="https://github.com/dogfood-lab/ai-crucible/actions/workflows/ci.yml"><img src="https://github.com/dogfood-lab/ai-crucible/actions/workflows/ci.yml/badge.svg" alt="CI" /></a>
|
|
40
|
+
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT License" /></a>
|
|
41
|
+
<img src="https://img.shields.io/badge/python-3.11%E2%80%933.13-blue.svg" alt="Python 3.11–3.13" />
|
|
42
|
+
<img src="https://img.shields.io/badge/coverage-96%25-brightgreen.svg" alt="Coverage 96%" />
|
|
43
|
+
<a href="CHANGELOG.md"><img src="https://img.shields.io/badge/version-0.2.0-orange.svg" alt="Version 0.2.0" /></a>
|
|
44
|
+
<a href="https://dogfood-lab.github.io/ai-crucible/"><img src="https://img.shields.io/badge/docs-handbook-orange.svg" alt="Handbook" /></a>
|
|
45
|
+
</p>
|
|
46
|
+
|
|
47
|
+
<p align="center"><b>A diagnostic adversarial game for frontier LLMs — a measurement instrument that happens to be fun.</b></p>
|
|
48
|
+
|
|
49
|
+
One Claude session (**Designer**) crafts puzzles targeting real, currently-observed capability gaps. Another (**Solver**) attempts them. A policy-enforced kernel mediates, scores against a hidden oracle, and curates a catalog through a `Lab → Arena → Regression` lifecycle. Puzzles are grounded in empirical signal — real GitHub issues, academic literature, observed failures in the field — not synthetic.
|
|
50
|
+
|
|
51
|
+
## What makes it different

- **Capability, not "cheating."** AI Crucible distinguishes *elegance* and *novelty* (rewarded) from *answer-bypass* (penalized). Lateral thinking is a capability to measure, not a vice to punish.
- **The instrument measures itself.** Prompt framing is a first-class measured arm: the kernel runs the same puzzle under `neutral` / `self_referential` / `social_standings` framings and reports its own prompt-effect as a diagnostic.
- **A sealed measurement boundary.** Motivation and measurement never share a context window; the hidden oracle is graded out-of-band by a different model family with the agent's reasoning hidden. The model can't game what it can't perceive.
- **Reliability by consistency.** `pass^k` (all *k* independent trials succeed), Wilson intervals, and cross-family judge panels are built to report distributions, not point estimates.
- **A living catalog.** Solved puzzles are *demoted* to `Regression`, never deleted, so the catalog becomes a capability-evolution timeline as the frontier moves.
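
The `pass^k` and Wilson-interval bullet above can be made concrete with a short sketch. It is illustrative only: the function names `pass_hat_k` and `wilson_interval` are hypothetical and not taken from the ai-crucible codebase, and the combinatorial estimator shown is one common way to estimate `pass^k` from *n* recorded trials, which this README does not confirm as the project's method.

```python
import math


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k from n trials with c successes (hypothetical helper).

    Returns the fraction of size-k subsets of the n trials in which every
    trial succeeded, i.e. C(c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return math.comb(c, k) / math.comb(n, k)


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is about 95%)."""
    if trials <= 0 or not (0 <= successes <= trials):
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

With 8 successes in 10 trials, `pass_hat_k(10, 8, 3)` is 56/120 (about 0.467), and the 95% Wilson interval on the single-trial pass rate runs from roughly 0.49 to 0.94, which is why a bare point estimate of 0.8 would overstate the certainty.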

## Threat model (summary)

AI Crucible's primary adversary is the **model under measurement**, which may try to reach the answer key or to detect and game the eval. It **touches**: puzzle prompts/setup, the Solver's own confined working directory, and the model-provider API calls the operator configures. It does **not** touch: the oracle/answer key (graded out-of-band on a separate host the Solver cannot reach) or motivation "chrome" (rank/standings, never injected into the scored context). **Permissions:** model-provider keys via environment variables at runtime; no bundled secrets, no telemetry, no outbound calls of its own. Full disclosure, including where a boundary is *defense-in-depth* rather than a hard guarantee, is in **[SECURITY.md](SECURITY.md)**.

## Architecture

AI Crucible is a **thin policy layer on [Inspect AI](https://inspect.aisi.org.uk/)** (UK AISI), not a from-scratch harness. A single `AttemptState` object is threaded Designer → Solver → (Critic) → Judge through **one `generate` choke point**, so every model and tool call is observable.
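
The single-choke-point idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code and not the Inspect AI API: only the names `AttemptState` and `generate` come from the paragraph above, while the fields, the `ModelFn` type, and `run_attempt` are hypothetical, and models are stood in for by plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model client: prompt in, reply out.
ModelFn = Callable[[str], str]


@dataclass
class AttemptState:
    """State threaded through every role in one attempt (fields are assumed)."""
    puzzle_id: str
    prompt: str
    calls: list[dict] = field(default_factory=list)      # ordered call record
    outputs: dict[str, str] = field(default_factory=dict)  # latest reply per role


def generate(state: AttemptState, role: str, model: ModelFn, prompt: str) -> str:
    """The only function that invokes a model, so every call gets recorded."""
    reply = model(prompt)
    state.calls.append({"role": role, "prompt": prompt, "reply": reply})
    state.outputs[role] = reply
    return reply


def run_attempt(state: AttemptState, solver: ModelFn, judge: ModelFn) -> AttemptState:
    """Solver answers, then Judge grades; both go through generate()."""
    answer = generate(state, "solver", solver, state.prompt)
    generate(state, "judge", judge, f"Grade this answer: {answer}")
    return state
```

Because no role calls a model directly, the `calls` list is a complete transcript of the attempt, which is the property the paragraph describes as observability.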

| Module | Responsibility |
| ------ | -------------- |
| `puzzle_loader` | Loads a puzzle directory (`meta.json` / `prompt` / `setup_script`) into Solver-visible state. **Never touches the oracle.** |
| `sandbox` | Narrow `exec` / `read_file` / `write_file` channel into a locked, network-less container. |
| `roles` | The five role slots (Designer / Solver / Critic / Judge / CohortSolver). Only Solver gets tools; Critic is interface-reserved, default-off. |
| `budget_governor` | Per-class tool-call + wall-clock budgets, displayed to the agent, enforced kernel-side; hard-kill on pathological loops. |
| `oracle_scorer` | Out-of-band grading: solved-**and**-no-regression against the hidden oracle (SWE-bench pattern). |
| `judge_panel` | Cross-family panel of model-scorers + reducer (PoLL) for novelty validation and bypass detection. |
| `trace_writer` | Per-attempt transcript in the Inspect `EvalLog` shape; large blobs stored by digest. |
| `observability` | Per-attempt → per-puzzle → per-model rollups; `pass^k` native. |
| `attestation` | Cryptographic provenance (cosign + event-store) behind a typed subprocess boundary. |
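
As an illustration of the `budget_governor` row, the sketch below shows one way a kernel-side budget could be enforced and also surfaced to the agent. Everything here is an assumption for illustration: the class name `BudgetGovernor`, its fields, and the `BudgetExceeded` exception are hypothetical, and it models a single budget rather than the per-class budgets the table mentions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when the tool-call or wall-clock budget is spent."""


@dataclass
class BudgetGovernor:
    max_tool_calls: int
    max_wall_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    tool_calls: int = 0
    started_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def remaining(self) -> dict[str, float]:
        """Budget summary that could be displayed to the agent."""
        elapsed = self.clock() - self.started_at
        return {
            "tool_calls_left": max(0, self.max_tool_calls - self.tool_calls),
            "seconds_left": max(0.0, self.max_wall_seconds - elapsed),
        }

    def charge(self) -> None:
        """Call before each tool invocation; raises once a budget is spent."""
        if self.clock() - self.started_at > self.max_wall_seconds:
            raise BudgetExceeded("wall-clock budget exhausted")
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exhausted")
        self.tool_calls += 1
```

The agent only ever sees the output of `remaining()`; the decision to stop lives in `charge()`, on the kernel side, so a looping agent cannot talk its way past the limit.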

The sealed boundary runs in three tiers: **Tier 1** scored context (deployment-shaped, framing-neutral), **Tier 2** engagement framing (probed for contamination each release), and **Tier 3** chrome (rank/leaderboard; human-facing UI only, never in a context the model solves in). The full design rationale, with citations, is in [`docs/research-grounding.md`](docs/research-grounding.md).
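
One way to picture the Tier 2 contamination probe is as a pass-rate comparison between framing arms. The sketch below is a hedged illustration: the framing names come from this README, but the function `prompt_effect`, its input shape, and the choice of a simple pass-rate difference against the `neutral` arm are assumptions, not the project's documented diagnostic.

```python
from collections import defaultdict

FRAMINGS = ("neutral", "self_referential", "social_standings")


def prompt_effect(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Pass-rate difference of each non-neutral framing versus neutral.

    `results` holds (framing, solved) pairs for the same puzzle set.
    """
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for framing, solved in results:
        if framing not in FRAMINGS:
            raise ValueError(f"unknown framing: {framing}")
        totals[framing] += 1
        passes[framing] += int(solved)
    if totals["neutral"] == 0:
        raise ValueError("neutral arm has no trials")
    baseline = passes["neutral"] / totals["neutral"]
    return {
        f: passes[f] / totals[f] - baseline
        for f in FRAMINGS
        if f != "neutral" and totals[f] > 0
    }
```

A difference near zero suggests the framing is not leaking into the measurement; a large difference in either direction is itself a finding about the instrument rather than about the model's capability.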

## Install

```bash
# As a Python library + CLI (PyPI):
pip install ai-crucible          # or: uv pip install ai-crucible
ai-crucible --help

# Or zero-prerequisite via npx (downloads a verified binary, no Python needed):
npx @dogfood-lab/ai-crucible --help
```

> **Research preview (v0.2.x).** The judge panel's alt-test ω is still a *circular model-jury bootstrap* until a human-labeling round runs, so seated judges are **provisional** and the composed panel **escalates to a Claude Designer** below quorum. See the [scorecard](SCORECARD.md) for the honest, non-cosmetic gate results.

## Quick start (from source)

AI Crucible uses [`uv`](https://docs.astral.sh/uv/) for environment and dependency management. Python **3.11+**.

```bash
# Create the venv and install the dev + stats extras
uv sync --extra dev --extra stats

# Run the test suite (with the coverage gate)
uv run pytest --cov=ai_crucible --cov-report=term-missing

# Lint
uv run ruff check .

# One command: lint + tests + build + smoke
bash verify.sh
```

## Documentation

- **[Handbook](https://dogfood-lab.github.io/ai-crucible/)**: guides, architecture, and reference.
- [`docs/research-grounding.md`](docs/research-grounding.md): design rationale, with citations.
- [`docs/gameplan.md`](docs/gameplan.md): roadmap and open questions.
- [`SECURITY.md`](SECURITY.md): threat model + honest residual-risk disclosure.

## License

[MIT](LICENSE). Public and pre-1.0; see the [CHANGELOG](CHANGELOG.md) for version status.

---

<p align="center"><sub>Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · part of the <a href="https://github.com/dogfood-lab">dogfood-lab</a> workshop for testing in the AI era.</sub></p>