PyPI - keel-workflow - Versions diffs - 0.6.0__py3-none-any.whl - Mend

keel-workflow 0.6.0__py3-none-any.whl


Files changed (49)

keel/__init__.py +14 -0
keel/__main__.py +5 -0
keel/adapters/commands/ci-check.md +75 -0
keel/adapters/commands/coverage.md +136 -0
keel/adapters/commands/deps-audit.md +142 -0
keel/adapters/commands/flake-audit.md +157 -0
keel/adapters/commands/implement.md +127 -0
keel/adapters/commands/morning.md +100 -0
keel/adapters/commands/overnight.md +152 -0
keel/adapters/commands/pr-loop.md +181 -0
keel/adapters/commands/regression.md +181 -0
keel/adapters/commands/review-all-day.md +242 -0
keel/adapters/commands/review-cycle.md +205 -0
keel/adapters/commands/ship-v2.md +51 -0
keel/adapters/commands/ship.md +333 -0
keel/adapters/commands/stale-prs.md +142 -0
keel/adapters/commands/triage.md +260 -0
keel/adapters/commands/wrap.md +93 -0
keel/agents.py +89 -0
keel/classify.py +35 -0
keel/cli.py +639 -0
keel/config.py +206 -0
keel/consent.py +215 -0
keel/contracts.py +291 -0
keel/extensions.py +165 -0
keel/findings.py +91 -0
keel/gates.py +131 -0
keel/git.py +49 -0
keel/github.py +49 -0
keel/github_transport.py +109 -0
keel/install.py +293 -0
keel/jsonschema_min.py +138 -0
keel/jury.py +85 -0
keel/lock.py +33 -0
keel/model.py +129 -0
keel/orchestrator.py +139 -0
keel/project_commands.py +87 -0
keel/runner.py +98 -0
keel/runtime.py +246 -0
keel/scaffold.py +100 -0
keel/schema/project.schema.json +252 -0
keel/ship.py +111 -0
keel/window.py +39 -0
keel_workflow-0.6.0.dist-info/METADATA +224 -0
keel_workflow-0.6.0.dist-info/RECORD +49 -0
keel_workflow-0.6.0.dist-info/WHEEL +5 -0
keel_workflow-0.6.0.dist-info/entry_points.txt +2 -0
keel_workflow-0.6.0.dist-info/licenses/LICENSE +201 -0
keel_workflow-0.6.0.dist-info/top_level.txt +1 -0

keel/__init__.py ADDED

@@ -0,0 +1,14 @@
+"""keel — project-neutral, multi-agent workflow core.
+A fixed *backbone* of ordered steps drives a unit of work (a GitHub issue) from
+backlog to done. Projects never fork the backbone; they customise behaviour with:
+* **config** (``project.yaml`` — per-project *values*), and
+* **extensions** (project-owned *Lego pieces* snapped into named slots, add-only).
+The public, deterministic core lives in pure modules (``config``, ``model``,
+``extensions``, ``findings``, ``gates``); all network/subprocess I/O is kept in
+thin, fail-soft wrappers. See ``docs/proposals/keel-architecture.md``.
+"""
+__version__ = "0.6.0"

keel/__main__.py ADDED

@@ -0,0 +1,5 @@
+"""Enable ``python -m keel``."""
+from .cli import main
+raise SystemExit(main())

keel/adapters/commands/ci-check.md ADDED

@@ -0,0 +1,75 @@
+---
+description: Check the latest CI run's status; on failure, locate the failing job/step, diagnose the root cause, and propose one fix — never auto-apply.
+argument-hint: "[pr number]"
+allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Grep
+---
+# /keel:ci-check
+Project-neutral CI status check. Reads `.keel/project.yaml` (`ci_workflows`, `base_branch`)
+via the `keel` CLI — the workflow names and branch are never hardcoded here. It inspects the
+latest CI run, and when that run failed it pulls the failing log, reads the offending source,
+and proposes exactly **one** fix. It is read-only: it never edits code, pushes, re-kicks, or
+merges.
+All user-facing diagnoses MUST be written in English; free-form chat may be in any language.
+## Step 0 — orient + runtime gate
+```bash
+keel validate .keel/project.yaml --root .
+keel plan     .keel/project.yaml --root .     # read base_branch, ci_workflows
+```
+CI status lives behind the host's Actions/checks API, which is reached through `gh` (or its
+MCP equivalent). Detect availability first:
+- If a PR is in scope, prefer `keel ship .keel/project.yaml --root . --pr <N>` for the CI
+  rollup + merge decision, falling back to `gh pr checks <N>` for per-workflow detail across
+  the project's `ci_workflows`.
+- If neither `gh` nor an MCP checks-read path is reachable in this runtime, **exit cleanly**
+  with a single line saying CI data requires the host CLI and is unavailable in this sandbox
+  (re-run from a local checkout with the CLI installed, or use the host's web UI). Do **not**
+  error or partial-run — surfacing nothing here advances nothing.
+## Step 1 — latest run(s)
+List the most recent runs (a small limit, e.g. the last 3) across the project's
+`ci_workflows`, capturing per run: id, status, conclusion, workflow name, head branch, and
+created-at. The newest run is the one under analysis; the prior two give context for "did this
+just start failing?".
+## Step 2 — if the latest run failed
+1. Pull the failing job's **log tail** (the last screenful of `--log-failed`, not the whole
+   log) for the newest failing run.
+2. Identify the **failing job and step name**.
+3. **Read the source file** the failure points at (use the `file:line` the tool output carries
+   where it has one).
+4. **Classify** the failure: a real code/test failure vs. a flake (intermittent; cross-check
+   with the prior runs from Step 1) vs. infra/quota (runner, network, rate-limit, credentials).
+5. **Diagnose** the root cause in 2–4 sentences.
+6. **Propose ONE specific fix** — describe it concretely (file/line-targeted). Do **not** apply
+   it automatically.
+## Step 3 — if all runs passed
+Print a single green line: CI is green, with the latest run's workflow name, branch, and time.
+## Step 4 — recommend the next action
+Route by the Step 2 classification — never merge here:
+- **Transient / flake** → re-kick (a fresh push or the host's re-run), and consider
+  `/keel:flake-audit` if it recurs.
+- **Real failure** → apply the proposed fix, then run `/keel:review-cycle` (self-review +
+  independent reviewers) and re-check; the fix is not "done" until every reviewer is clear of
+  blockers AND CI is green. Land it through `/keel:ship` (window + lock + review), never by a
+  direct merge from here.
+- **Infra / quota** → escalate; this is not a code fix.
+## Stop conditions / invariants
+- **Read-only** — propose a fix, never apply, push, re-kick, or merge.
+- **Deterministic** for identical CI state.
+- **Fail-soft** — a missing CLI degrades to the Step 0 clean-exit note, not a crash.
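
Reviewer's note (not part of the package): the Step 3 / Step 4 routing in `ci-check.md` can be read as a small lookup from the Step 2 classification to a recommended next action. The sketch below is only an illustration of that reading; `FailureClass` and `recommend_next_action` are hypothetical names that do not appear in the diff, and the returned strings paraphrase the command file rather than quote any real keel output.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    """Step 2 classification buckets (illustrative names, not keel's)."""

    REAL = "real"
    FLAKE = "flake"
    INFRA = "infra"


def recommend_next_action(run_failed: bool,
                          failure: Optional[FailureClass] = None) -> str:
    """Map the latest run's outcome to a recommendation; never acts on it."""
    if not run_failed:
        # Step 3: all runs passed, so there is nothing to route.
        return "CI is green"
    if failure is None:
        raise ValueError("a failed run must be classified before routing")
    routes = {
        FailureClass.FLAKE: "re-kick; consider /keel:flake-audit if it recurs",
        FailureClass.REAL: ("apply the proposed fix, run /keel:review-cycle, "
                            "then land via /keel:ship"),
        FailureClass.INFRA: "escalate; this is not a code fix",
    }
    return routes[failure]
```

The function only returns text, which matches the read-only invariant: the command proposes and recommends, and any re-kick or merge happens elsewhere.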

keel/adapters/commands/coverage.md ADDED

@@ -0,0 +1,136 @@
+---
+description: Compute and post the per-PR test-coverage delta (base → head), flag low-coverage × high-risk hot spots, and open issues to close gaps — routed to keel:ship.
+argument-hint: "[pr number] [--base <branch>] [--threshold <pct>] [--changed] [--open-issues] [--dry-run]"
+allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
+---
+# /keel:coverage
+Project-neutral coverage report. Every project value — the base branch, the coverage tooling
+per area, the risk map, the repo — is read from `.keel/project.yaml` via the `keel` CLI. The
+coverage command is **the project's** (its test/coverage command via its toolchain); this
+adapter never names a specific coverage tool.
+The command is **read-only on PR code**: it never modifies the PR diff, never closes the PR,
+and never merges. It runs coverage twice (base + head), produces a delta table, and posts (or
+updates) exactly one PR comment. Re-runs find the existing codename-prefixed comment and update
+it in place so the timeline never stacks duplicates. All published artifacts (the PR comment)
+MUST be English.
+## Step 0 — orient + parse arguments
+```bash
+keel validate .keel/project.yaml --root .
+keel plan     .keel/project.yaml --root .     # read base_branch, tier3_globs, repo
+keel plan     .keel/project.yaml --root . --command coverage --live --json
+```
+The live plan is the operator-consent preflight. Before creating worktrees, writing local
+coverage cache files, posting comments/labels/issues, using secrets, publishing, or calling
+production-adjacent systems, parse `contract.operator_consent`; if
+`requires_operator_consent` is true, STOP and ask the operator to rerun with the required
+`--approve-scope` values.
+Arguments:
+- Positional, optional: a single positive integer **PR number**. Default: derive the PR from
+  the current branch; if no open PR is tied to the current branch and no positional was given,
+  exit non-zero with `no PR for current branch — pass an explicit PR number`. Reject more than
+  one positional and zero/negative integers.
+- `--base <branch>` — consumes exactly one branch name after the flag. Default: `base_branch`
+  from the plan. Reject `--base` with no value after it.
+- `--threshold <pct>` — a coverage floor to compare hot spots against (Step 5).
+- `--changed` — scope coverage to the PR diff (`git diff base...HEAD`) rather than the whole
+  tree.
+- `--open-issues` — open a deduped issue per hot spot (Step 6) and route each to `/keel:ship`.
+- `--dry-run` — compute the report and log the would-be comment/label mutations to stdout as
+  `DRY-RUN: …`; make no write.
+- Reject unknown flags.
+## Step 1 — detect areas touched + availability
+Get the PR's changed-file list and classify which project **areas** it touches (derive areas
+from the repo layout). If the PR touches **no instrumented area**, post a short
+codename-prefixed comment noting there is no coverage signal (no files under an instrumented
+area) and exit; under `--dry-run`, log the would-be body and skip the post.
+Coverage needs the host CLI (or MCP read path) for the PR file list and the comment. If that is
+unreachable, note it and degrade per the MCP-mode rule in Step 6.
+## Step 2 — compute the baseline (base-branch head)
+Use a **dedicated worktree** so the PR checkout is not disturbed: fetch `--base` and add a
+worktree at its head. For each touched area, run the project's coverage command **for that
+area** and parse its report for the per-area + overall **line** coverage percentage
+(`covered / (covered + missed) * 100`). Cache results keyed by `<commit-sha>:<area>` so a
+re-run on the same SHA does not recompute.
+**Graceful degradation (per area):** if an area's coverage tool is not wired up, **skip** that
+area's coverage and add a note (`<area> coverage skipped: tooling not wired up`), continuing
+with the other areas. Discover the exact coverage task by name rather than hardcoding it (task
+names vary by tool version).
+## Step 3 — compute head (PR head)
+Run the same commands against the PR-head checkout (the working tree if it is already on the PR
+head, else a second worktree), reusing the task discovered in Step 2. Cache keyed by
+`<head-sha>:<area>`.
+## Step 4 — build the delta report
+A single markdown body whose **literal first line** is the codename
+`COVERAGE-<PR>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it, no
+leading whitespace, no quoting, no Markdown prefix or surrounding formatting — Step 6 finds the
+existing comment by the `COVERAGE-<PR>-` prefix to update it in place, and any deviation makes
+it miss the prior comment and post a duplicate. The prefix here MUST match the literal first
+line emitted — change them together or find-and-update silently breaks.
+Body: a `base@<short> → head@<short>` header, then one section per touched area with rows
+(unit · base % · head % · signed Δ · files-in-diff) plus a bold **overall** summary row.
+Formatting rules:
+- One row per sub-area/module; one bold overall row per section that ran.
+- Show the absolute signed delta: `+1.2%`, `-0.3%`, `+0.0%`.
+- **Bold the whole row** when `|Δ| >= 0.5%` so reviewers' eyes catch it.
+- Omit a section entirely if its area was not touched.
+- A skipped area renders as a single italic line (`_<area> coverage skipped: …_`).
+## Step 5 — flag hot spots (low coverage × high risk)
+Cross the per-file/area coverage against `tier3_globs`: a low-coverage file that **also** matches
+a tier-3 glob is a **hot spot** — surface these first (high risk × low coverage). Compare each
+against `--threshold` if given. Hot spots are what Step 6 opens issues for.
+## Step 6 — post / update the PR comment + open hot-spot issues
+Find the existing coverage comment by the `COVERAGE-<PR>-` prefix on the PR.
+- Existing found and not `--dry-run` → **update it in place** (edit the existing comment).
+- None found and not `--dry-run` → post a new comment.
+- `--dry-run` → log the body to stdout, no API call.
+Only ever one coverage comment per PR after the first run. **In-place-update gap:** if the
+runtime can create comments but cannot edit one in place (e.g. an MCP path with no
+update-comment tool), do **not** post a second comment — compute and log the delta, emit a
+one-line note that an existing comment was found and in-place update is unavailable (re-run
+locally, or delete the prior comment first), and continue.
+**Label** (compute the worst overall regression across the areas that ran): if at least one area
+regressed by ≥ 0.5%, idempotently ensure and add a `coverage-regression` label; otherwise remove
+it if a prior run added it (ignore "not present" errors). The label is informational, not
+gating. Under `--dry-run`, log the would-be transition and skip.
+When `--open-issues` is set, open a **deduped** issue per hot spot (Step 5), tiered by
+`tier3_globs`, and hand each fix to **`/keel:ship`** (window + lock + review). Under `--dry-run`,
+print the would-be issues and route nothing.
+## Stop conditions / invariants
+- **Never modify the PR's code; never close or merge** — the only writes are one PR comment
+  (created or edited) and one PR label edit (plus hot-spot issues under `--open-issues`).
+  Gating/merge is `/keel:ship`'s job.
+- **Fail loudly on a coverage build/test failure** — print the failing command and exit
+  non-zero; do NOT post a half-formed table (reviewers would treat it as authoritative). This is
+  distinct from a tool **not wired up**, which degrades gracefully per Step 2.
+- **Worktree cleanup** — always remove the base (and head) worktree in a trap on EXIT so a
+  mid-run failure never leaks a worktree.
+- **No silent dry-run mutations** — every comment/label/issue write is printed as `DRY-RUN: …`
+  and skipped.
+- Deterministic for identical coverage data.
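
Reviewer's note (not part of the package): the arithmetic and formatting rules in Steps 2, 4 and 6 of `coverage.md` are compact enough to sketch. The code below is a minimal illustration under stated assumptions: every function name is hypothetical, the timestamp layout `%Y%m%dT%H%M%SZ` is a guess (the command file only says `<UTC_TIMESTAMP>`), returning `0.0` for an area with no lines is an assumption, and comparing the rounded delta against the 0.5 threshold is one possible reading of `|Δ| >= 0.5%`.

```python
from datetime import datetime, timezone
from typing import Optional

BOLD_THRESHOLD = 0.5  # percentage points, from the "|Δ| >= 0.5%" rule


def line_coverage_pct(covered: int, missed: int) -> float:
    """covered / (covered + missed) * 100, as given in Step 2."""
    total = covered + missed
    if total == 0:
        return 0.0  # assumption: the command file does not define this case
    return covered / total * 100


def signed_delta(base_pct: float, head_pct: float) -> float:
    """Head minus base, rounded to one decimal, with -0.0 normalised to 0.0."""
    delta = round(head_pct - base_pct, 1)
    return 0.0 if delta == 0 else delta


def format_delta(base_pct: float, head_pct: float) -> str:
    """Render the delta with an explicit sign, e.g. +1.2%, -0.3%, +0.0%."""
    return f"{signed_delta(base_pct, head_pct):+.1f}%"


def render_row(unit: str, base_pct: float, head_pct: float,
               files_in_diff: int) -> str:
    """One table row; every cell is bolded when the delta crosses the threshold."""
    cells = [unit, f"{base_pct:.1f}%", f"{head_pct:.1f}%",
             format_delta(base_pct, head_pct), str(files_in_diff)]
    if abs(signed_delta(base_pct, head_pct)) >= BOLD_THRESHOLD:
        cells = [f"**{cell}**" for cell in cells]
    return "| " + " | ".join(cells) + " |"


def codename(pr: int, now: Optional[datetime] = None) -> str:
    """Literal first line of the comment body (timestamp format is assumed)."""
    now = now or datetime.now(timezone.utc)
    return f"COVERAGE-{pr}-{now.strftime('%Y%m%dT%H%M%SZ')}"


def is_coverage_comment(body: str, pr: int) -> bool:
    """Step 6 lookup: the prefix must be the very first characters of the body."""
    return body.startswith(f"COVERAGE-{pr}-")
```

Because `is_coverage_comment` tests the start of the body, a blank line or Markdown prefix above the codename makes the lookup miss, which is the duplicate-comment failure the "codename pin" paragraph warns about. The trailing `-` in the prefix also keeps PR 42 from matching a comment written for PR 421.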

keel/adapters/commands/deps-audit.md ADDED

@@ -0,0 +1,142 @@
+---
+description: On-demand dependency security + licence audit across the project's ecosystems; classify security vs. routine, append findings to today's tracking issue, and route fixes to keel:ship.
+argument-hint: "[<ecosystem>|all] [--severity low|moderate|high|critical] [--open-issues] [--dry-run]"
+allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
+---
+# /keel:deps-audit
+Project-neutral dependency audit. Every project value — which ecosystems exist, the audit
+command per ecosystem, the licence baseline path, the timezone for the run date, the risk map
+— is read from `.keel/project.yaml` via the `keel` CLI. The actual audit tool for each
+ecosystem is **the project's** (invoked through the toolchain referenced by `build_gate_cmd`);
+this adapter never names a specific package manager.
+The command is **read-only on application code**: it never edits dependency manifests or
+lockfiles, never applies upgrades, never updates the licence baseline, and never closes the
+tracking issue. It runs the project's per-ecosystem vulnerability audit, computes licence
+drift against a committed baseline, and appends one codename-prefixed comment to today's
+tracking issue. Re-runs append a fresh comment; the codename prefix is the search anchor that
+lets a briefing or a later run find the latest run inside a multi-comment issue.
+All published artifacts (issue/comment bodies) MUST be in English; free-form chat may be any
+language.
+## Step 0 — orient + parse arguments
+```bash
+keel validate .keel/project.yaml --root .
+keel plan     .keel/project.yaml --root .     # read ecosystems, tier3_globs, timezone, repo
+keel plan     .keel/project.yaml --root . --command deps-audit --live --json
+```
+The live plan is the operator-consent preflight. Before posting tracking comments, opening
+issues, routing fixes to `/keel:ship`, using secrets, publishing, or calling
+production-adjacent systems, parse `contract.operator_consent`; if
+`requires_operator_consent` is true, STOP and ask the operator to rerun with the required
+`--approve-scope` values.
+Arguments:
+- Positional, optional: one **ecosystem** the project declares, or `all`. Default `all`.
+  Reject any value that is not a declared ecosystem or `all`; reject more than one positional.
+- `--severity <level>` — one of `low`, `moderate`, `high`, `critical`. Default `moderate`.
+  Severity ordering is a numeric map (`critical=4`, `high=3`, `moderate=2`, `low=1`); a finding
+  is reported when `severity_rank >= threshold_rank`. Reject a missing or out-of-set value.
+- `--security-only` — report only vulnerabilities (CVE/advisory findings); skip routine/licence
+  reporting noise.
+- `--open-issues` — open a deduped fix issue per finding (Step 6) and route each to
+  `/keel:ship`. Without it the command is report-only against the tracking issue.
+- `--dry-run` — compute the report and log the would-be tracking-issue create/comment to stdout
+  as `DRY-RUN: …`, but make no write.
+- Reject unknown flags.
+## Step 1 — find or create today's tracking issue
+Compute the run **date in the project timezone** (from the plan, never a hardcoded zone).
+Search for an open tracking issue whose title is exactly `deps-audit: <DATE>`:
+- Found → capture its number.
+- Not found and not `--dry-run` → create it (title `deps-audit: <DATE>`, a one-line body
+  noting each run appends a comment) and capture the number.
+- Not found and `--dry-run` → print `DRY-RUN: would create tracking issue deps-audit: <DATE>`
+  and continue (Step 6 will skip the post anyway).
+## Step 2 — per-ecosystem vulnerability audit
+For each in-scope ecosystem (skip ecosystems excluded by the positional), run the project's
+declared audit command for that ecosystem via its toolchain. Many audit tools **exit non-zero
+when they find anything** — capture output without letting that abort the run (the equivalent
+of a trailing `|| true`). Parse the report and extract per vulnerability: the advisory/CVE id,
+severity (normalise from any numeric CVSS the tool emits), the dependency name, the current
+resolved version, the recommended non-vulnerable version, and a **fix-available** flag. Filter
+to `severity_rank >= threshold_rank`.
+**Graceful degradation (per ecosystem, never fatal):** if an audit tool is not wired up,
+cannot run (missing installed deps, network down), or errors, do **not** abort. Add a note
+under the report's "Skipped" section (`<ecosystem> audit skipped: <one-line reason>`) — e.g.
+"audit plugin not wired up; follow-up: add it to the build" — and continue with the others.
+The only fatal case is an argument-parse failure in Step 0.
+## Step 3 — licence drift
+Read the committed licence-baseline path from the project config. Format: sorted
+`<package>@<version> <licence>` lines that are the canonical licence set for the current
+lockfile state.
+- Baseline **missing** → write a one-time scaffolding note to the report (capture the current
+  set as a new baseline in a separate intentional commit) and skip the diff.
+- Otherwise build the **current** licence set for each ecosystem that ran (read each installed
+  package's declared licence; for ecosystems whose licences resolve over the network, skip that
+  half with a note if the network is unreachable), sort it, and diff against the baseline.
+  Classify each diff line as `added`, `removed`, or `changed` (same package, different
+  licence/version). Empty diff → report `licences: no drift`.
+## Step 4 — tier the findings
+Tier every vulnerability and drift entry by blast radius against `tier3_globs`: a finding that
+touches a tier-3 path (or a dependency consumed from one) is escalated — higher priority and a
+tier label on any issue opened in Step 6.
+## Step 5 — build the report comment
+A single markdown body whose **literal first line** is the codename
+`DEPS-AUDIT-<DATE>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it,
+no leading whitespace, no quoting, no Markdown prefix or surrounding formatting — downstream
+consumers (briefing, future-run search anchors) locate the latest run by the
+`DEPS-AUDIT-<DATE>-` prefix, and any deviation makes them miss it.
+Body shape: a summary count line (`critical: n | high: n | moderate: n | low: n`, counting all
+ecosystems that ran, after threshold filtering); one section per in-scope ecosystem (package ·
+version · severity · advisory · fix-available); a licence-drift section (status · package ·
+baseline licence · current licence); and a "Skipped" section.
+Formatting rules:
+- Omit an ecosystem section entirely if it was out of scope.
+- An in-scope ecosystem with zero findings above threshold → a single italic line
+  `_No <ecosystem> findings at or above <severity> severity._` in place of the table.
+- Licence section ran with empty diff → `_licences: no drift_`.
+- Omit "Skipped" if nothing was skipped.
+- Under `--security-only`, omit the licence-drift section and any routine-update rows.
+## Step 6 — post / open issues + route to ship
+Locate any prior comment for today's run by the `DEPS-AUDIT-<DATE>-` prefix inside the tracking
+issue. The comment API appends (it does not edit in place), and the prefix is the anchor — so
+appending one comment per run is correct.
+- `--dry-run` → print the body under `DRY-RUN: would post comment on issue #<N>:` and skip.
+- Otherwise → post a fresh comment to the tracking issue.
+When `--open-issues` is set, additionally open a **deduped** fix issue per finding (one issue
+per CVE/dependency; do not re-open against an existing open issue for the same finding),
+labelled by **severity** and **tier**, and hand each fix to **`/keel:ship`** (window + lock +
+review). Never bump a dependency and merge directly from here. Under `--dry-run`, print
+`DRY-RUN: would open issue …` per finding and route nothing.
+## Stop conditions / invariants
+- **Never auto-apply upgrades** — version bumps go through `/keel:ship`, not this command.
+- **Never close the tracking issue; never modify manifests, lockfiles, or the licence
+  baseline** (the baseline is updated by a separate intentional commit).
+- **Per-audit failure continues with the others** — network/missing-tool/missing-baseline are
+  reported under "Skipped", not fatal. Only Step 0 arg-parse failure is fatal.
+- **No silent dry-run mutations** — every issue create / comment post is printed as `DRY-RUN: …`
+  and skipped.
+- Fail-soft · deterministic for identical inputs.
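
Reviewer's note (not part of the package): two rules in `deps-audit.md` are essentially small algorithms, namely the severity threshold from Step 0 and the licence-drift classification from Step 3. The sketch below illustrates both under stated assumptions: the function names and the `{"severity": ...}` finding shape are hypothetical, and baseline lines are assumed to be well-formed `<package>@<version> <licence>` entries as the command file describes.

```python
from typing import Dict, Iterable, List, Tuple

# Numeric ordering quoted from the --severity description in Step 0.
SEVERITY_RANK = {"critical": 4, "high": 3, "moderate": 2, "low": 1}


def at_or_above(findings: Iterable[dict], threshold: str = "moderate") -> List[dict]:
    """Keep findings whose severity_rank >= threshold_rank."""
    floor = SEVERITY_RANK[threshold]  # KeyError doubles as the out-of-set rejection
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]


def parse_licence_lines(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Parse '<package>@<version> <licence>' lines into {package: (version, licence)}."""
    parsed = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        spec, _, licence = line.partition(" ")
        # rpartition keeps a leading '@' of a scoped package name intact.
        package, _, version = spec.rpartition("@")
        parsed[package] = (version, licence)
    return parsed


def licence_drift(baseline_lines: Iterable[str],
                  current_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Classify each differing package as added, removed, or changed."""
    baseline = parse_licence_lines(baseline_lines)
    current = parse_licence_lines(current_lines)
    drift = []
    for package in sorted(set(baseline) | set(current)):
        if package not in baseline:
            drift.append((package, "added"))
        elif package not in current:
            drift.append((package, "removed"))
        elif baseline[package] != current[package]:
            drift.append((package, "changed"))  # same package, new version or licence
    return drift
```

An empty list from `licence_drift` corresponds to the `licences: no drift` report line. Sorting by package name keeps the output stable, in line with the "deterministic for identical inputs" invariant.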

keel/adapters/commands/flake-audit.md ADDED

@@ -0,0 +1,157 @@
+---
+description: Detect intermittently-failing tests from recent CI history (or repeated local runs); dedupe against tracked flakes and open one tracking issue per newly-detected flake — routed to keel:ship.
+argument-hint: "[--days <N>] [--runs <N>] [--threshold <P>] [--open-issues] [--dry-run]"
+allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
+---
+# /keel:flake-audit
+Project-neutral flaky-test audit. Every project value — the CI workflows, the base branch, the
+test gate, the repo — is read from `.keel/project.yaml` via the `keel` CLI. The test command is
+**the project's** (`keel run-gates .keel/project.yaml --root .`); this adapter never names a
+specific test runner.
+The command is **read-only on test code**: it never edits test sources, never auto-disables a
+test (no skip/ignore/only annotation), never reruns a test to "confirm" a flake, and never
+closes an existing flake issue. It aggregates per-test pass/fail signal, classifies tests whose
+failure rate crosses the threshold as flaky, dedupes against open flake issues, and opens one
+tracking issue per newly-detected flake. Re-runs MUST NOT spam duplicates — the Step 4 dedupe is
+load-bearing.
+**The defining rule:** only an **across-runs disagreement** (a test that both passes and fails
+under identical conditions) marks a flake. A test that **consistently fails** is a real bug, not
+a flake — it belongs to `/keel:ci-check` and human triage; do not classify it here. All
+published artifacts (report, issue bodies) MUST be English.
+## Step 0 — orient + parse arguments + runtime gate
+```bash
+keel validate .keel/project.yaml --root .
+keel plan     .keel/project.yaml --root .     # read base_branch, ci_workflows, repo
+keel plan     .keel/project.yaml --root . --command flake-audit --live --json
+```
+The live plan is the operator-consent preflight. Before opening issues, routing fixes to
+`/keel:ship`, using secrets, publishing, or calling production-adjacent systems, parse
+`contract.operator_consent`; if `requires_operator_consent` is true, STOP and ask the
+operator to rerun with the required `--approve-scope` values.
+Arguments:
+- `--days <N>` — CI-history lookback window in calendar days (UTC). Default `14`. Reject `0`,
+  negatives, non-integers.
+- `--runs <N>` — when reading CI history is unavailable, re-run the project's test gate `N`
+  times locally instead (default `5`). The two evidence sources are mutually supportive:
+  prefer CI history; fall back to repeated local runs.
+- `--threshold <P>` — minimum failure rate to flag, a decimal in `[0, 1]`. Default `0.10`.
+  Reject anything outside `[0, 1]`, non-numeric, or a missing value after the flag.
+- `--open-issues` — open a deduped tracking issue per newly-detected flake (Step 6) and route
+  each fix to `/keel:ship`. Without it, report-only.
+- `--dry-run` — compute the report and print findings, but skip every issue-create mutation;
+  print `would create: <title>` per skipped issue.
+- Reject unknown flags.
+**Runtime-availability gate.** Test-level CI-history analysis needs the host's check-run +
+artifact APIs (via the host CLI; no MCP equivalent today). If those are unreachable AND the
+project's test gate cannot be run locally either, exit cleanly with a single line saying flake
+detection requires either CI check-run/artifact access or a runnable local test gate, neither
+available in this runtime. Do not error or partial-run.
+## Step 1 — gather evidence
+**CI-history mode (preferred, full fidelity):** enumerate the in-window CI runs of the
+project's `ci_workflows` on `base_branch` (drive it via commits in the window + their
+check-runs where a direct run-by-date listing is unavailable; document the substitution
+honestly in the report's Limitations). For each failing check run keep its conclusion, workflow
+name, start time, the run-page URL (for "sample run URLs"), and the test-report artifact URL if
+extractable.
+**Local-runs mode (fallback):** run `keel run-gates .keel/project.yaml --root .` `--runs`
+times on a clean tree and record per-run pass/fail.
+## Step 2 — build per-test aggregates
+- **Test-level (full fidelity):** parse each run's test-report into per-test results keyed by
+  **fully-qualified test name**; track `pass_count` and `fail_count`. Capture, per failing
+  test, the first ~5 lines of the failure signature from the most recent failing run.
+- **Degraded run-level (no per-test granularity — artifacts unreachable):** aggregate only at
+  the run level (which runs failed, on which commit, with which URL); skip per-test
+  classification and per-flake issue creation, and produce a Limitations-flagged summary only.
+## Step 3 — classify flakes (test-level only)
+A fully-qualified test with both passes and failures in the window is **flaky** when:
+```
+fail_count >= 3  AND  fail_count / (pass_count + fail_count) >= <threshold>
+```
+The `fail_count >= 3` floor is load-bearing: it stops a single transient blip being labelled
+flaky. A test that **never passed** in the window is a deterministic failure, not a flake — do
+not classify it here.
+## Step 4 — dedupe against tracked flakes
+Search open issues labelled `flake` (canonical title shape `flaky test: <fully.qualified.name>`)
+and build the tracked-name set by stripping that prefix. Carry forward **only** newly-classified
+flakes not already tracked. This dedupe is the only reason the command is safe to re-run on a
+schedule — do not weaken it.
+## Step 5 — build the report
+A single markdown body whose **literal first line** is the codename
+`FLAKE-AUDIT-<DATE>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it,
+no leading whitespace, no quoting, no Markdown prefix or surrounding formatting — downstream
+consumers locate the run by the `FLAKE-AUDIT-<DATE>-` prefix.
+Body: a summary line (runs examined · distinct failing tests · classified flakes ·
+newly-opened issues); a **newly-classified flakes** table (test · fail rate · failures · up to
+3 sample run URLs, most-recent first · first ~5 lines of the failure signature, fenced inline);
+an **already-tracked (deduped)** list (`<name> — see #<existing>`); and a **Limitations**
+section.
+Formatting rules:
+- Zero new flakes → render `_no new flakes above threshold_` instead of the table.
+- Omit "already-tracked" if nothing was deduped.
+- Omit "Limitations" if no degradation occurred; otherwise include every honest caveat (history
+  enumerated commit-by-commit, an artifact download that failed, degraded run-level mode, etc.).
+- Degraded run-level mode → replace the flakes table with a short list of failing runs (commit ·
+  workflow · URL) and add the Limitations bullet that test-level classification was skipped.
+## Step 6 — open issues per new flake + route to ship
+Only under `--open-issues`, and test-level mode only (degraded mode skips this). Per newly
+classified flake:
+- `--dry-run` → print `would create: flaky test: <name>` and skip.
+- Otherwise → open an issue titled `flaky test: <fully.qualified.name>` whose body carries the
+  detecting codename + window, the fail rate (`<fail>/<total>`), the threshold, up to 3 sample
+  run URLs, the fenced failure-signature excerpt, and a triage note: **do NOT auto-disable**;
+  investigate root cause (timing, shared state, ordering, network); close with a fix PR or mark
+  wontfix if the test is being retired.
+Labels: always `flake`; add an **area** label tiered from `tier3_globs` / the test's path or
+FQN where the project's layout makes the area derivable; if the area is indeterminate, apply
+only `flake` and note that in the body. If a needed label does not yet exist, the create may
+fail to apply it — detect that, **retry the create without the label**, and add a Limitations
+bullet recommending the operator pre-create the label.
+Hand each opened flake fix to **`/keel:ship`** (window + lock + review). Optionally suggest a
+quarantine annotation in the issue — but never silently skip a test from this command.
+## Step 7 — print
+Always print the Step 5 report to stdout, even when not `--dry-run` — a silent run would force
+the operator to spelunk the tracker after the fact.
+## Stop conditions / invariants
+- **Never auto-disable a flaky test; never rerun a test to "confirm" flakiness** (a rerun
+  invalidates the very signal being measured) — only observed history counts.
+- **One flake = one issue** — the Step 4 dedupe is load-bearing; re-runs never duplicate a
+  tracked flake.
+- **Never close/edit/comment on existing flake issues** — triage is a human call; this command
+  only opens new ones.
+- **Per-step failure continues with what was fetched**, documented under Limitations; only Step
+  0 arg-parse failure is fatal.
+- **No silent dry-run mutations** — each create is printed as `would create: <title>` and
+  skipped.
+- Fail-soft · deterministic for identical inputs.
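
Reviewer's note (not part of the package): the Step 3 classification rule and the Step 4 dedupe in `flake-audit.md` can be sketched in a few lines. This is an illustration only; the function names and the `{name: (pass_count, fail_count)}` aggregate shape are hypothetical, while the `fail_count >= 3` floor, the default threshold of `0.10`, and the `flaky test: ` title prefix are taken from the command file.

```python
from typing import Dict, Iterable, List, Set, Tuple

FLAKE_TITLE_PREFIX = "flaky test: "  # canonical issue title shape from Step 4
MIN_FAILURES = 3                     # the load-bearing floor from Step 3


def is_flaky(pass_count: int, fail_count: int, threshold: float = 0.10) -> bool:
    """True only for a test that both passes and fails, often enough to matter."""
    if pass_count == 0:
        return False  # never passed in the window: a deterministic failure
    if fail_count < MIN_FAILURES:
        return False  # too few failures to rule out a transient blip
    return fail_count / (pass_count + fail_count) >= threshold


def tracked_names(open_issue_titles: Iterable[str]) -> Set[str]:
    """Strip the canonical prefix from open flake-issue titles."""
    return {
        title[len(FLAKE_TITLE_PREFIX):]
        for title in open_issue_titles
        if title.startswith(FLAKE_TITLE_PREFIX)
    }


def new_flakes(aggregates: Dict[str, Tuple[int, int]],
               open_issue_titles: Iterable[str],
               threshold: float = 0.10) -> List[str]:
    """Classified flakes that are not already tracked, in a stable order."""
    tracked = tracked_names(open_issue_titles)
    return sorted(
        name
        for name, (passes, fails) in aggregates.items()
        if is_flaky(passes, fails, threshold) and name not in tracked
    )
```

For example, a test with 27 passes and 3 failures sits exactly on the default threshold and is classified as flaky, while one with 50 passes and 2 failures is excluded by the failure floor regardless of rate.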

keel/adapters/commands/implement.md ADDED

@@ -0,0 +1,127 @@
+---
+description: Delegate a single issue to the right implementer and drive the s4 implement step standalone. Project-neutral — every project specific is read from .keel/project.yaml via the keel CLI.
+argument-hint: "[issue number] [--delegate <claude|codex|agy|ollama:MODEL>]"
+allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Write, Agent
+---
+# /keel:implement
+The standalone **implement step (`s4`)** of the keel backbone. This adapter is
+project-neutral: it contains no branch name, build command, agent, path glob, or
+timezone. Read every project-specific value from `.keel/project.yaml` via the
+`keel` CLI.
+> **Hard rule.** If you are about to type a literal like a base-branch name, a
+> build/lint command, an implementer agent, or a path glob — **stop** and read it
+> from config (`base_branch`, `build_gate_cmd`, `lint_cmd`, `implementer_agents`,
+> `tier3_globs`). Hardcoding a project specific here is the exact bug keel exists
+> to kill.
+## Language
+All committed/published artifacts (commits, branch names, PR/issue titles and
+bodies, comments, file contents) MUST be written in English. Free-form chat with
+the user may stay in any language. (Project-specific language policy lives in the
+project's source-of-truth doc — `knobs.sot_doc`.)
+## Step 0 — Resolve config
+```bash
+keel validate .keel/project.yaml --root .   # abort if config/extensions invalid
+keel plan     .keel/project.yaml --root .    # read base_branch, implementer_agents, tier3_globs
+keel plan     .keel/project.yaml --root . --command implement --live --json
+```
+The live plan is the operator-consent preflight. Before posting comments, creating a
+worktree/branch, delegating, editing files, committing, pushing, opening a PR, using
+secrets, publishing, or calling production-adjacent systems, parse
+`contract.operator_consent`; if `requires_operator_consent` is true, STOP and ask the
+operator to rerun with the required `--approve-scope` values. Store
+`operator_consent.delegated_agent_scope` for Step 5.
+Read the knobs you will need: `base_branch`, `implementer_agents`, `tier3_globs`,
+`build_gate_cmd`, `lint_cmd`.
+## Step 1 — Fetch the issue
+Read the issue (title, body, labels) via `gh` (CLI when available) or the GitHub
+MCP read tools (sandbox/web runtime). Capture `number`, `title`, `body`, and the
+`labels` array — the role/platform label drives implementer routing below.
+## Step 2 — Check for an existing branch
+Look for a branch already associated with this issue (e.g. matching
+`*issue-<N>*`). If one exists, report it and ask the human whether to continue on
+it or start fresh — do not silently clobber in-flight work.
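A minimal sketch of the branch lookup, assuming branch names arrive one per line on stdin. `branches_for_issue` is a hypothetical helper, not part of keel. A bare `*issue-4*` glob would also match `issue-42`, so the sketch requires the number to be followed by a non-digit or the end of the name; that tightening is a choice made here, not something the source prescribes.

```bash
# Print every branch name from stdin that refers to issue number $1.
branches_for_issue() {
  while IFS= read -r name; do
    case $name in
      # Match "...issue-<N>" at the end of the name, or "...issue-<N>" followed
      # by a non-digit, so issue 4 does not pick up issue 42.
      *issue-"$1" | *issue-"$1"[!0-9]*) printf '%s\n' "$name" ;;
    esac
  done
}

# Usage inside a git checkout (local and remote-tracking branches):
#   git for-each-ref --format='%(refname:short)' refs/heads refs/remotes \
#     | branches_for_issue 42
```

Any output means in-flight work may exist, which is the point at which to stop and ask the human rather than create a fresh branch.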
+## Step 3 — Resolve the implementer
+Resolve the implementer agent from `implementer_agents` keyed by the issue's
+**role/platform label**, overridden by `--delegate`, defaulting to the **host
+agent**. Do not hardcode an agent name — the mapping is config.
+Project-specific routing nuances (e.g. a particular file-pattern that demands a
+specialized tool or a record-and-validate script for snapshot/baseline tests)
+live in the project, not here: express them as a `.keel/extensions/` Lego
+(an `after-implement` slot) or mark them "(project-specific; stays in the
+project)".
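The precedence in this step (`--delegate` first, then the label mapping, then the host agent) can be sketched as a small function. `resolve_implementer` is a hypothetical name, and the sketch assumes the caller has already looked up the `implementer_agents` entry for the issue's label and passes an empty string when there is none.

```bash
# $1 = value of --delegate (empty if the flag was not given)
# $2 = agent mapped from implementer_agents for the issue's label (empty if none)
# $3 = host agent id, used as the final default
resolve_implementer() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"      # an explicit override wins
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"      # otherwise use the config mapping
  else
    printf '%s\n' "$3"      # otherwise fall back to the host agent
  fi
}

# Usage (values are placeholders; real ones come from config and the CLI args):
#   agent=$(resolve_implementer "$DELEGATE_FLAG" "$MAPPED_AGENT" "$HOST_AGENT")
```

Keeping all three inputs as arguments means no agent name is ever written into the function, which keeps the hard rule about hardcoding intact.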
+## Step 4 — Agent run codename + start comment
+Mint an agent run codename and record attribution. Use a deterministic,
+collision-free form: `<ROLE_PREFIX>-<issue>-<UTC timestamp>` where `ROLE_PREFIX`
+derives from the resolved implementer role and the timestamp is generated at run
+time (UTC, e.g. `YYYYMMDD-HHMMSS`). Attribution is `agent:<vendor>` plus a
+versionless `model:<base>`.
+Post a start comment on the issue before delegating (via `gh` or the GitHub MCP
+comment tool), including: codename, chosen agent, implementer system (host agent
+id), and the planned branch name.
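As a rough illustration of the codename format above, the helper below builds `<ROLE_PREFIX>-<issue>-<UTC timestamp>`. `mint_codename` is a hypothetical name, and how `ROLE_PREFIX` is derived from the implementer role is left to the caller because the source does not spell it out. The optional third argument exists only so the timestamp can be pinned when testing.

```bash
# $1 = role prefix, $2 = issue number, $3 = optional pinned timestamp.
# Without $3 the timestamp is taken at run time in UTC as YYYYMMDD-HHMMSS.
mint_codename() {
  printf '%s-%s-%s\n' "$1" "$2" "${3:-$(date -u +%Y%m%d-%H%M%S)}"
}

# Usage (prefix and issue number are placeholders):
#   codename=$(mint_codename IMPL 42)
```

Second-level resolution keeps codenames distinct across runs on the same issue as long as two runs do not start within the same second.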
+## Step 5 — Delegate (with worktree isolation)
+Dispatch to the resolved implementer with the issue context. Mandatory steps the
+implementer must follow:
+0. Receive and obey the approved `operator_consent.delegated_agent_scope`. If the
+   implementer attempts work outside `approved_mutation_scopes`, the orchestrator blocks or
+   escalates. Secret access requires explicit `secrets` approval for this run.
+1. Read the project's source-of-truth doc (`knobs.sot_doc`) and any platform
+   context it points to.
+2. **Workspace isolation (mandatory):** before any code-modifying work, create a
+   git worktree off `origin/<base_branch>` (config — never assume the branch) and
+   perform every edit, build, and push from inside it. Never mutate the user's
+   primary checkout. Use a repo-nested worktree path (never a sibling), e.g.:
+   ```bash
+   git fetch origin "$BASE_BRANCH" --quiet
+   git worktree add -b feature/issue-<N>-<slug> worktrees/issue-<N> origin/"$BASE_BRANCH"
+   ```
+   Run the gates from inside that path. After the PR merges, clean up with
+   `git worktree remove worktrees/issue-<N> --force`.
+3. Implement all acceptance criteria with focused commits scoped to the issue.
+4. Run the applicable gates from inside the worktree via the keel CLI so the
+   command strings stay config-driven:
+   ```bash
+   keel run-gates .keel/project.yaml --root .
+   ```
+   This executes the built-in `build_gate_cmd` / `lint_cmd` plus any `tester`
+   Lego. Gate selection that depends on which files changed (schema migration,
+   entitlement, or config-specific suites) is project-specific: express it as a
+   Lego or mark it "(project-specific; stays in the project)".
+5. Include the codename in commits / PR body / final summary when practical.
+6. Return the contract: codename, branch/commit, files changed, gate results,
+   docs impact, and anything needing manual/device/infra verification.
+## Step 6 — Report back + hand off
+After the implementer completes, summarize: codename, branch/commit, what was
+implemented, gate results, and anything needing manual verification. Post a
+completion comment on the issue with the same codename and the gate results if
+the implementer did not already do so.
+Do **not** merge here — that is `/keel:ship`'s job (window + lock + review). Hand
+the contract to `/keel:ship` (or `/keel:pr-loop`) to open the PR and drive
+review / CI / merge.
+Fail over to the host agent on delegate quota errors; attribute the **effective**
+agent.