keel-workflow 0.6.0__py3-none-any.whl

Files changed (49)
  1. keel/__init__.py +14 -0
  2. keel/__main__.py +5 -0
  3. keel/adapters/commands/ci-check.md +75 -0
  4. keel/adapters/commands/coverage.md +136 -0
  5. keel/adapters/commands/deps-audit.md +142 -0
  6. keel/adapters/commands/flake-audit.md +157 -0
  7. keel/adapters/commands/implement.md +127 -0
  8. keel/adapters/commands/morning.md +100 -0
  9. keel/adapters/commands/overnight.md +152 -0
  10. keel/adapters/commands/pr-loop.md +181 -0
  11. keel/adapters/commands/regression.md +181 -0
  12. keel/adapters/commands/review-all-day.md +242 -0
  13. keel/adapters/commands/review-cycle.md +205 -0
  14. keel/adapters/commands/ship-v2.md +51 -0
  15. keel/adapters/commands/ship.md +333 -0
  16. keel/adapters/commands/stale-prs.md +142 -0
  17. keel/adapters/commands/triage.md +260 -0
  18. keel/adapters/commands/wrap.md +93 -0
  19. keel/agents.py +89 -0
  20. keel/classify.py +35 -0
  21. keel/cli.py +639 -0
  22. keel/config.py +206 -0
  23. keel/consent.py +215 -0
  24. keel/contracts.py +291 -0
  25. keel/extensions.py +165 -0
  26. keel/findings.py +91 -0
  27. keel/gates.py +131 -0
  28. keel/git.py +49 -0
  29. keel/github.py +49 -0
  30. keel/github_transport.py +109 -0
  31. keel/install.py +293 -0
  32. keel/jsonschema_min.py +138 -0
  33. keel/jury.py +85 -0
  34. keel/lock.py +33 -0
  35. keel/model.py +129 -0
  36. keel/orchestrator.py +139 -0
  37. keel/project_commands.py +87 -0
  38. keel/runner.py +98 -0
  39. keel/runtime.py +246 -0
  40. keel/scaffold.py +100 -0
  41. keel/schema/project.schema.json +252 -0
  42. keel/ship.py +111 -0
  43. keel/window.py +39 -0
  44. keel_workflow-0.6.0.dist-info/METADATA +224 -0
  45. keel_workflow-0.6.0.dist-info/RECORD +49 -0
  46. keel_workflow-0.6.0.dist-info/WHEEL +5 -0
  47. keel_workflow-0.6.0.dist-info/entry_points.txt +2 -0
  48. keel_workflow-0.6.0.dist-info/licenses/LICENSE +201 -0
  49. keel_workflow-0.6.0.dist-info/top_level.txt +1 -0
keel/__init__.py ADDED
@@ -0,0 +1,14 @@
1
+ """keel — project-neutral, multi-agent workflow core.
2
+
3
+ A fixed *backbone* of ordered steps drives a unit of work (a GitHub issue) from
4
+ backlog to done. Projects never fork the backbone; they customise behaviour with:
5
+
6
+ * **config** (``project.yaml`` — per-project *values*), and
7
+ * **extensions** (project-owned *Lego pieces* snapped into named slots, add-only).
8
+
9
+ The public, deterministic core lives in pure modules (``config``, ``model``,
10
+ ``extensions``, ``findings``, ``gates``); all network/subprocess I/O is kept in
11
+ thin, fail-soft wrappers. See ``docs/proposals/keel-architecture.md``.
12
+ """
13
+
14
+ __version__ = "0.6.0"
keel/__main__.py ADDED
@@ -0,0 +1,5 @@
1
+ """Enable ``python -m keel``."""
2
+
3
+ from .cli import main
4
+
5
+ raise SystemExit(main())
keel/adapters/commands/ci-check.md ADDED
@@ -0,0 +1,75 @@
1
+ ---
2
+ description: Check the latest CI run's status; on failure, locate the failing job/step, diagnose the root cause, and propose one fix — never auto-apply.
3
+ argument-hint: "[pr number]"
4
+ allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Grep
5
+ ---
6
+
7
+ # /keel:ci-check
8
+
9
+ Project-neutral CI status check. Reads `.keel/project.yaml` (`ci_workflows`, `base_branch`)
10
+ via the `keel` CLI — the workflow names and branch are never hardcoded here. It inspects the
11
+ latest CI run, and when that run failed it pulls the failing log, reads the offending source,
12
+ and proposes exactly **one** fix. It is read-only: it never edits code, pushes, re-kicks, or
13
+ merges.
14
+
15
+ All user-facing diagnoses MUST be written in English; free-form chat may be in any language.
16
+
17
+ ## Step 0 — orient + runtime gate
18
+
19
+ ```bash
20
+ keel validate .keel/project.yaml --root .
21
+ keel plan .keel/project.yaml --root . # read base_branch, ci_workflows
22
+ ```
23
+
24
+ CI status lives behind the host's Actions/checks API, which is reached through `gh` (or its
25
+ MCP equivalent). Detect availability first:
26
+
27
+ - If a PR is in scope, prefer `keel ship .keel/project.yaml --root . --pr <N>` for the CI
28
+ rollup + merge decision, falling back to `gh pr checks <N>` for per-workflow detail across
29
+ the project's `ci_workflows`.
30
+ - If neither `gh` nor an MCP checks-read path is reachable in this runtime, **exit cleanly**
31
+ with a single line saying CI data requires the host CLI and is unavailable in this sandbox
32
+ (re-run from a local checkout with the CLI installed, or use the host's web UI). Do **not**
33
+ error or partial-run — surfacing nothing here advances nothing.
34
+
35
+ ## Step 1 — latest run(s)
36
+
37
+ List the most recent runs (a small limit, e.g. the last 3) across the project's
38
+ `ci_workflows`, capturing per run: id, status, conclusion, workflow name, head branch, and
39
+ created-at. The newest run is the one under analysis; the prior two give context for "did this
40
+ just start failing?".
41
+
42
+ ## Step 2 — if the latest run failed
43
+
44
+ 1. Pull the failing job's **log tail** (the last screenful of `--log-failed`, not the whole
45
+ log) for the newest failing run.
46
+ 2. Identify the **failing job and step name**.
47
+ 3. **Read the source file** the failure points at (use the `file:line` the tool output carries
48
+ where it has one).
49
+ 4. **Classify** the failure: a real code/test failure vs. a flake (intermittent; cross-check
50
+ with the prior runs from Step 1) vs. infra/quota (runner, network, rate-limit, credentials).
51
+ 5. **Diagnose** the root cause in 2–4 sentences.
52
+ 6. **Propose ONE specific fix** — describe it concretely (file/line-targeted). Do **not** apply
53
+ it automatically.
54
+
55
+ ## Step 3 — if all runs passed
56
+
57
+ Print a single green line: CI is green, with the latest run's workflow name, branch, and time.
58
+
59
+ ## Step 4 — recommend the next action
60
+
61
+ Route by the Step 2 classification — never merge here:
62
+
63
+ - **Transient / flake** → re-kick (a fresh push or the host's re-run), and consider
64
+ `/keel:flake-audit` if it recurs.
65
+ - **Real failure** → apply the proposed fix, then run `/keel:review-cycle` (self-review +
66
+ independent reviewers) and re-check; the fix is not "done" until every reviewer is clear of
67
+ blockers AND CI is green. Land it through `/keel:ship` (window + lock + review), never by a
68
+ direct merge from here.
69
+ - **Infra / quota** → escalate; this is not a code fix.
70
+
71
+ ## Stop conditions / invariants
72
+
73
+ - **Read-only** — propose a fix, never apply, push, re-kick, or merge.
74
+ - **Deterministic** for identical CI state.
75
+ - **Fail-soft** — a missing CLI degrades to the Step 0 clean-exit note, not a crash.
keel/adapters/commands/coverage.md ADDED
@@ -0,0 +1,136 @@
1
+ ---
2
+ description: Compute and post the per-PR test-coverage delta (base → head), flag low-coverage × high-risk hot spots, and open issues to close gaps — routed to keel:ship.
3
+ argument-hint: "[pr number] [--base <branch>] [--threshold <pct>] [--changed] [--open-issues] [--dry-run]"
4
+ allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
5
+ ---
6
+
7
+ # /keel:coverage
8
+
9
+ Project-neutral coverage report. Every project value — the base branch, the coverage tooling
10
+ per area, the risk map, the repo — is read from `.keel/project.yaml` via the `keel` CLI. The
11
+ coverage command is **the project's** (its test/coverage command via its toolchain); this
12
+ adapter never names a specific coverage tool.
13
+
14
+ The command is **read-only on PR code**: it never modifies the PR diff, never closes the PR,
15
+ and never merges. It runs coverage twice (base + head), produces a delta table, and posts (or
16
+ updates) exactly one PR comment. Re-runs find the existing codename-prefixed comment and update
17
+ it in place so the timeline never stacks duplicates. All published artifacts (the PR comment)
18
+ MUST be English.
19
+
20
+ ## Step 0 — orient + parse arguments
21
+
22
+ ```bash
23
+ keel validate .keel/project.yaml --root .
24
+ keel plan .keel/project.yaml --root . # read base_branch, tier3_globs, repo
25
+ keel plan .keel/project.yaml --root . --command coverage --live --json
26
+ ```
27
+
28
+ The live plan is the operator-consent preflight. Before creating worktrees, writing local
29
+ coverage cache files, posting comments/labels/issues, using secrets, publishing, or calling
30
+ production-adjacent systems, parse `contract.operator_consent`; if
31
+ `requires_operator_consent` is true, STOP and ask the operator to rerun with the required
32
+ `--approve-scope` values.
33
+
34
+ Arguments:
35
+ - Positional, optional: a single positive integer **PR number**. Default: derive the PR from
36
+ the current branch; if no open PR is tied to the current branch and no positional was given,
37
+ exit non-zero with `no PR for current branch — pass an explicit PR number`. Reject more than
38
+ one positional and zero/negative integers.
39
+ - `--base <branch>` — consumes exactly one branch name after the flag. Default: `base_branch`
40
+ from the plan. Reject `--base` with no value after it.
41
+ - `--threshold <pct>` — a coverage floor to compare hot spots against (Step 5).
42
+ - `--changed` — scope coverage to the PR diff (`git diff base...HEAD`) rather than the whole
43
+ tree.
44
+ - `--open-issues` — open a deduped issue per hot spot (Step 6) and route each to `/keel:ship`.
45
+ - `--dry-run` — compute the report and log the would-be comment/label mutations to stdout as
46
+ `DRY-RUN: …`; make no write.
47
+ - Reject unknown flags.
48
+
49
+ ## Step 1 — detect areas touched + availability
50
+
51
+ Get the PR's changed-file list and classify which project **areas** it touches (derive areas
52
+ from the repo layout). If the PR touches **no instrumented area**, post a short
53
+ codename-prefixed comment noting there is no coverage signal (no files under an instrumented
54
+ area) and exit; under `--dry-run`, log the would-be body and skip the post.
55
+
56
+ Coverage needs the host CLI (or MCP read path) for the PR file list and the comment. If that is
57
+ unreachable, note it and degrade per the MCP-mode rule in Step 6.
58
+
59
+ ## Step 2 — compute the baseline (base-branch head)
60
+
61
+ Use a **dedicated worktree** so the PR checkout is not disturbed: fetch `--base` and add a
62
+ worktree at its head. For each touched area, run the project's coverage command **for that
63
+ area** and parse its report for the per-area + overall **line** coverage percentage
64
+ (`covered / (covered + missed) * 100`). Cache results keyed by `<commit-sha>:<area>` so a
65
+ re-run on the same SHA does not recompute.
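
The line-coverage formula and the `<commit-sha>:<area>` cache key from Step 2 can be sketched as follows. This is a minimal illustration, not code from the package: the function names, the in-memory dict used as the cache, and the choice to report 0.0 for an area with no measurable lines are all assumptions.

```python
from typing import Callable, Dict, Tuple


def line_coverage_pct(covered: int, missed: int) -> float:
    """Line coverage as covered / (covered + missed) * 100."""
    total = covered + missed
    if total == 0:
        # Assumption: an area with no measurable lines reports 0.0.
        return 0.0
    return covered / total * 100


def cached_coverage(
    cache: Dict[str, float],
    sha: str,
    area: str,
    measure: Callable[[], Tuple[int, int]],
) -> float:
    """Return the coverage for (sha, area), measuring only on a cache miss.

    `measure` is a hypothetical callable that runs the project's coverage
    command for the area and returns (covered, missed) line counts.
    """
    key = f"{sha}:{area}"
    if key not in cache:
        covered, missed = measure()
        cache[key] = line_coverage_pct(covered, missed)
    return cache[key]
```

Passing the cache in explicitly keeps the sketch free of global state; a real run would need to persist it between invocations for the "same SHA does not recompute" guarantee to hold.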
66
+
67
+ **Graceful degradation (per area):** if an area's coverage tool is not wired up, **skip** that
68
+ area's coverage and add a note (`<area> coverage skipped: tooling not wired up`), continuing
69
+ with the other areas. Discover the exact coverage task by name rather than hardcoding it (task
70
+ names vary by tool version).
71
+
72
+ ## Step 3 — compute head (PR head)
73
+
74
+ Run the same commands against the PR-head checkout (the working tree if it is already on the PR
75
+ head, else a second worktree), reusing the task discovered in Step 2. Cache keyed by
76
+ `<head-sha>:<area>`.
77
+
78
+ ## Step 4 — build the delta report
79
+
80
+ A single markdown body whose **literal first line** is the codename
81
+ `COVERAGE-<PR>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it, no
82
+ leading whitespace, no quoting, no Markdown prefix or surrounding formatting — Step 6 finds the
83
+ existing comment by the `COVERAGE-<PR>-` prefix to update it in place, and any deviation makes
84
+ it miss the prior comment and post a duplicate. The prefix here MUST match the literal first
85
+ line emitted — change them together or find-and-update silently breaks.
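
A short sketch of why the codename pin matters, assuming comments arrive as dicts with a `body` key (a hypothetical shape, as is the timestamp format in the usage note). The lookup is a plain prefix match on the start of the body, so any leading blank line or formatting makes it miss.

```python
from typing import Iterable, Mapping, Optional


def find_coverage_comment(
    comments: Iterable[Mapping[str, str]], pr: int
) -> Optional[Mapping[str, str]]:
    """Return the first comment whose body starts with COVERAGE-<PR>-."""
    prefix = f"COVERAGE-{pr}-"
    for comment in comments:
        # startswith on the raw body: no stripping, so leading whitespace,
        # quoting, or a Markdown prefix all cause a miss.
        if comment["body"].startswith(prefix):
            return comment
    return None
```

For example, a body beginning `COVERAGE-12-20240101T000000Z` is found for PR 12, while the same text preceded by a blank line or wrapped in `**` is not, which is the duplicate-comment failure the pin warns about. The trailing hyphen in the prefix also keeps PR 12 from matching a comment for PR 123.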
86
+
87
+ Body: a `base@<short> → head@<short>` header, then one section per touched area with rows
88
+ (unit · base % · head % · signed Δ · files-in-diff) plus a bold **overall** summary row.
89
+ Formatting rules:
90
+ - One row per sub-area/module; one bold overall row per section that ran.
91
+ - Show the absolute signed delta: `+1.2%`, `-0.3%`, `+0.0%`.
92
+ - **Bold the whole row** when `|Δ| >= 0.5%` so reviewers' eyes catch it.
93
+ - Omit a section entirely if its area was not touched.
94
+ - A skipped area renders as a single italic line (`_<area> coverage skipped: …_`).
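
The delta formatting rules above can be sketched like this. The table layout (pipe-separated cells, each cell wrapped in `**` to bold the row) and the function names are assumptions for illustration; the source only fixes the signed one-decimal delta and the `|Δ| >= 0.5%` bolding rule.

```python
def format_delta(base_pct: float, head_pct: float) -> str:
    """Signed delta with one decimal, e.g. +1.5%, -0.5%, +0.0%."""
    delta = round(head_pct - base_pct, 1)
    if delta == 0:
        # Normalise a negative zero so an unchanged row reads +0.0%.
        delta = 0.0
    return f"{delta:+.1f}%"


def render_row(unit: str, base_pct: float, head_pct: float, files_in_diff: int) -> str:
    """One table row: unit, base %, head %, signed delta, files-in-diff."""
    cells = [
        unit,
        f"{base_pct:.1f}%",
        f"{head_pct:.1f}%",
        format_delta(base_pct, head_pct),
        str(files_in_diff),
    ]
    if abs(head_pct - base_pct) >= 0.5:
        # Assumption: "bold the whole row" is rendered by bolding every cell.
        cells = [f"**{cell}**" for cell in cells]
    return "| " + " | ".join(cells) + " |"
```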
95
+
96
+ ## Step 5 — flag hot spots (low coverage × high risk)
97
+
98
+ Cross the per-file/area coverage against `tier3_globs`: a low-coverage file that **also** matches
99
+ a tier-3 glob is a **hot spot** — surface these first (high risk × low coverage). Compare each
100
+ against `--threshold` if given. Hot spots are what Step 6 opens issues for.
101
+
102
+ ## Step 6 — post / update the PR comment + open hot-spot issues
103
+
104
+ Find the existing coverage comment by the `COVERAGE-<PR>-` prefix on the PR.
105
+ - Existing found and not `--dry-run` → **update it in place** (edit the existing comment).
106
+ - None found and not `--dry-run` → post a new comment.
107
+ - `--dry-run` → log the body to stdout, no API call.
108
+
109
+ Only ever one coverage comment per PR after the first run. **In-place-update gap:** if the
110
+ runtime can create comments but cannot edit one in place (e.g. an MCP path with no
111
+ update-comment tool), do **not** post a second comment — compute and log the delta, emit a
112
+ one-line note that an existing comment was found and in-place update is unavailable (re-run
113
+ locally, or delete the prior comment first), and continue.
114
+
115
+ **Label** (compute the worst overall regression across the areas that ran): if at least one area
116
+ regressed by ≥ 0.5%, idempotently ensure and add a `coverage-regression` label; otherwise remove
117
+ it if a prior run added it (ignore "not present" errors). The label is informational, not
118
+ gating. Under `--dry-run`, log the would-be transition and skip.
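
The label rule reduces to a small decision over the per-area overall deltas. The sketch below assumes deltas are expressed in percentage points (negative means a regression) and returns an action name rather than calling any API; both the function and the action strings are hypothetical.

```python
from typing import Mapping


def coverage_label_action(
    area_deltas: Mapping[str, float],
    label_present: bool,
    threshold: float = 0.5,
) -> str:
    """Decide what to do with the coverage-regression label.

    Returns "add" when any area regressed by at least `threshold`,
    "remove" when nothing regressed but a prior run left the label,
    and "none" otherwise.
    """
    regressed = any(delta <= -threshold for delta in area_deltas.values())
    if regressed:
        return "add"  # adding is idempotent, so no check on label_present
    return "remove" if label_present else "none"
```

Under `--dry-run` the caller would log the returned action instead of applying it.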
119
+
120
+ When `--open-issues` is set, open a **deduped** issue per hot spot (Step 5), tiered by
121
+ `tier3_globs`, and hand each fix to **`/keel:ship`** (window + lock + review). Under `--dry-run`,
122
+ print the would-be issues and route nothing.
123
+
124
+ ## Stop conditions / invariants
125
+
126
+ - **Never modify the PR's code; never close or merge** — the only writes are one PR comment
127
+ (created or edited) and one PR label edit (plus hot-spot issues under `--open-issues`).
128
+ Gating/merge is `/keel:ship`'s job.
129
+ - **Fail loudly on a coverage build/test failure** — print the failing command and exit
130
+ non-zero; do NOT post a half-formed table (reviewers would treat it as authoritative). This is
131
+ distinct from a tool **not wired up**, which degrades gracefully per Step 2.
132
+ - **Worktree cleanup** — always remove the base (and head) worktree in a trap on EXIT so a
133
+ mid-run failure never leaks a worktree.
134
+ - **No silent dry-run mutations** — every comment/label/issue write is printed as `DRY-RUN: …`
135
+ and skipped.
136
+ - Deterministic for identical coverage data.
keel/adapters/commands/deps-audit.md ADDED
@@ -0,0 +1,142 @@
1
+ ---
2
+ description: On-demand dependency security + licence audit across the project's ecosystems; classify security vs. routine, append findings to today's tracking issue, and route fixes to keel:ship.
3
+ argument-hint: "[<ecosystem>|all] [--severity low|moderate|high|critical] [--open-issues] [--dry-run]"
4
+ allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
5
+ ---
6
+
7
+ # /keel:deps-audit
8
+
9
+ Project-neutral dependency audit. Every project value — which ecosystems exist, the audit
10
+ command per ecosystem, the licence baseline path, the timezone for the run date, the risk map
11
+ — is read from `.keel/project.yaml` via the `keel` CLI. The actual audit tool for each
12
+ ecosystem is **the project's** (invoked through the toolchain referenced by `build_gate_cmd`);
13
+ this adapter never names a specific package manager.
14
+
15
+ The command is **read-only on application code**: it never edits dependency manifests or
16
+ lockfiles, never applies upgrades, never updates the licence baseline, and never closes the
17
+ tracking issue. It runs the project's per-ecosystem vulnerability audit, computes licence
18
+ drift against a committed baseline, and appends one codename-prefixed comment to today's
19
+ tracking issue. Re-runs append a fresh comment; the codename prefix is the search anchor that
20
+ lets a briefing or a later run find the latest run inside a multi-comment issue.
21
+
22
+ All published artifacts (issue/comment bodies) MUST be in English; free-form chat may be any
23
+ language.
24
+
25
+ ## Step 0 — orient + parse arguments
26
+
27
+ ```bash
28
+ keel validate .keel/project.yaml --root .
29
+ keel plan .keel/project.yaml --root . # read ecosystems, tier3_globs, timezone, repo
30
+ keel plan .keel/project.yaml --root . --command deps-audit --live --json
31
+ ```
32
+
33
+ The live plan is the operator-consent preflight. Before posting tracking comments, opening
34
+ issues, routing fixes to `/keel:ship`, using secrets, publishing, or calling
35
+ production-adjacent systems, parse `contract.operator_consent`; if
36
+ `requires_operator_consent` is true, STOP and ask the operator to rerun with the required
37
+ `--approve-scope` values.
38
+
39
+ Arguments:
40
+ - Positional, optional: one **ecosystem** the project declares, or `all`. Default `all`.
41
+ Reject any value that is not a declared ecosystem or `all`; reject more than one positional.
42
+ - `--severity <level>` — one of `low`, `moderate`, `high`, `critical`. Default `moderate`.
43
+ Severity ordering is a numeric map (`critical=4`, `high=3`, `moderate=2`, `low=1`); a finding
44
+ is reported when `severity_rank >= threshold_rank`. Reject a missing or out-of-set value.
45
+ - `--security-only` — report only vulnerabilities (CVE/advisory findings); skip routine/licence
46
+ reporting noise.
47
+ - `--open-issues` — open a deduped fix issue per finding (Step 6) and route each to
48
+ `/keel:ship`. Without it the command is report-only against the tracking issue.
49
+ - `--dry-run` — compute the report and log the would-be tracking-issue create/comment to stdout
50
+ as `DRY-RUN: …`, but make no write.
51
+ - Reject unknown flags.
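
The severity ordering described for `--severity` can be sketched as a rank map and a threshold comparison. The ranks come from the text above; the function name and the use of `ValueError` for an out-of-set value are assumptions.

```python
SEVERITY_RANK = {"low": 1, "moderate": 2, "high": 3, "critical": 4}


def is_reported(severity: str, threshold: str = "moderate") -> bool:
    """True when severity_rank >= threshold_rank."""
    if threshold not in SEVERITY_RANK:
        # Mirrors "reject a missing or out-of-set value".
        raise ValueError(f"unknown severity threshold: {threshold!r}")
    if severity not in SEVERITY_RANK:
        raise ValueError(f"unknown finding severity: {severity!r}")
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[threshold]
```

With the default threshold, `moderate`, `high`, and `critical` findings are reported and `low` findings are filtered out.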
52
+
53
+ ## Step 1 — find or create today's tracking issue
54
+
55
+ Compute the run **date in the project timezone** (from the plan, never a hardcoded zone).
56
+ Search for an open tracking issue whose title is exactly `deps-audit: <DATE>`:
57
+ - Found → capture its number.
58
+ - Not found and not `--dry-run` → create it (title `deps-audit: <DATE>`, a one-line body
59
+ noting each run appends a comment) and capture the number.
60
+ - Not found and `--dry-run` → print `DRY-RUN: would create tracking issue deps-audit: <DATE>`
61
+ and continue (Step 6 will skip the post anyway).
62
+
63
+ ## Step 2 — per-ecosystem vulnerability audit
64
+
65
+ For each in-scope ecosystem (skip ecosystems excluded by the positional), run the project's
66
+ declared audit command for that ecosystem via its toolchain. Many audit tools **exit non-zero
67
+ when they find anything** — capture output without letting that abort the run (the equivalent
68
+ of a trailing `|| true`). Parse the report and extract per vulnerability: the advisory/CVE id,
69
+ severity (normalise from any numeric CVSS the tool emits), the dependency name, the current
70
+ resolved version, the recommended non-vulnerable version, and a **fix-available** flag. Filter
71
+ to `severity_rank >= threshold_rank`.
72
+
73
+ **Graceful degradation (per ecosystem, never fatal):** if an audit tool is not wired up,
74
+ cannot run (missing installed deps, network down), or errors, do **not** abort. Add a note
75
+ under the report's "Skipped" section (`<ecosystem> audit skipped: <one-line reason>`) — e.g.
76
+ "audit plugin not wired up; follow-up: add it to the build" — and continue with the others.
77
+ The only fatal case is an argument-parse failure in Step 0.
78
+
79
+ ## Step 3 — licence drift
80
+
81
+ Read the committed licence-baseline path from the project config. Format: sorted
82
+ `<package>@<version> <licence>` lines that are the canonical licence set for the current
83
+ lockfile state.
84
+ - Baseline **missing** → write a one-time scaffolding note to the report (capture the current
85
+ set as a new baseline in a separate intentional commit) and skip the diff.
86
+ - Otherwise build the **current** licence set for each ecosystem that ran (read each installed
87
+ package's declared licence; for ecosystems whose licences resolve over the network, skip that
88
+ half with a note if the network is unreachable), sort it, and diff against the baseline.
89
+ Classify each diff line as `added`, `removed`, or `changed` (same package, different
90
+ licence/version). Empty diff → report `licences: no drift`.
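
The drift classification can be sketched as a comparison of two parsed licence sets. This assumes each line has exactly the `<package>@<version> <licence>` shape, with the version split at the last `@` so scoped package names survive; the function names and the tuple output are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple


def parse_licence_lines(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map package name to (version, licence) from '<pkg>@<ver> <licence>' lines."""
    parsed: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        spec, _, licence = line.partition(" ")
        # rpartition keeps a leading '@' in scoped names such as @scope/pkg.
        name, _, version = spec.rpartition("@")
        parsed[name] = (version, licence.strip())
    return parsed


def licence_drift(
    baseline: Iterable[str], current: Iterable[str]
) -> List[Tuple[str, str]]:
    """Classify each differing package as added, removed, or changed."""
    base = parse_licence_lines(baseline)
    cur = parse_licence_lines(current)
    drift: List[Tuple[str, str]] = []
    for name in sorted(base.keys() | cur.keys()):
        if name not in base:
            drift.append(("added", name))
        elif name not in cur:
            drift.append(("removed", name))
        elif base[name] != cur[name]:
            drift.append(("changed", name))
    return drift
```

An empty result corresponds to the `licences: no drift` report line.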
91
+
92
+ ## Step 4 — tier the findings
93
+
94
+ Tier every vulnerability and drift entry by blast radius against `tier3_globs`: a finding that
95
+ touches a tier-3 path (or a dependency consumed from one) is escalated — higher priority and a
96
+ tier label on any issue opened in Step 6.
97
+
98
+ ## Step 5 — build the report comment
99
+
100
+ A single markdown body whose **literal first line** is the codename
101
+ `DEPS-AUDIT-<DATE>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it,
102
+ no leading whitespace, no quoting, no Markdown prefix or surrounding formatting — downstream
103
+ consumers (briefing, future-run search anchors) locate the latest run by the
104
+ `DEPS-AUDIT-<DATE>-` prefix, and any deviation makes them miss it.
105
+
106
+ Body shape: a summary count line (`critical: n | high: n | moderate: n | low: n`, counting all
107
+ ecosystems that ran, after threshold filtering); one section per in-scope ecosystem (package ·
108
+ version · severity · advisory · fix-available); a licence-drift section (status · package ·
109
+ baseline licence · current licence); and a "Skipped" section.
110
+
111
+ Formatting rules:
112
+ - Omit an ecosystem section entirely if it was out of scope.
113
+ - An in-scope ecosystem with zero findings above threshold → a single italic line
114
+ `_No <ecosystem> findings at or above <severity> severity._` in place of the table.
115
+ - Licence section ran with empty diff → `_licences: no drift_`.
116
+ - Omit "Skipped" if nothing was skipped.
117
+ - Under `--security-only`, omit the licence-drift section and any routine-update rows.
118
+
119
+ ## Step 6 — post / open issues + route to ship
120
+
121
+ Locate any prior comment for today's run by the `DEPS-AUDIT-<DATE>-` prefix inside the tracking
122
+ issue. The comment API appends (it does not edit in place), and the prefix is the anchor — so
123
+ appending one comment per run is correct.
124
+ - `--dry-run` → print the body under `DRY-RUN: would post comment on issue #<N>:` and skip.
125
+ - Otherwise → post a fresh comment to the tracking issue.
126
+
127
+ When `--open-issues` is set, additionally open a **deduped** fix issue per finding (one issue
128
+ per CVE/dependency; do not re-open against an existing open issue for the same finding),
129
+ labelled by **severity** and **tier**, and hand each fix to **`/keel:ship`** (window + lock +
130
+ review). Never bump a dependency and merge directly from here. Under `--dry-run`, print
131
+ `DRY-RUN: would open issue …` per finding and route nothing.
132
+
133
+ ## Stop conditions / invariants
134
+
135
+ - **Never auto-apply upgrades** — version bumps go through `/keel:ship`, not this command.
136
+ - **Never close the tracking issue; never modify manifests, lockfiles, or the licence
137
+ baseline** (the baseline is updated by a separate intentional commit).
138
+ - **Per-audit failure continues with the others** — network/missing-tool/missing-baseline are
139
+ reported under "Skipped", not fatal. Only Step 0 arg-parse failure is fatal.
140
+ - **No silent dry-run mutations** — every issue create / comment post is printed as `DRY-RUN: …`
141
+ and skipped.
142
+ - Fail-soft · deterministic for identical inputs.
keel/adapters/commands/flake-audit.md ADDED
@@ -0,0 +1,157 @@
1
+ ---
2
+ description: Detect intermittently-failing tests from recent CI history (or repeated local runs); dedupe against tracked flakes and open one tracking issue per newly-detected flake — routed to keel:ship.
3
+ argument-hint: "[--days <N>] [--runs <N>] [--threshold <P>] [--open-issues] [--dry-run]"
4
+ allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Edit
5
+ ---
6
+
7
+ # /keel:flake-audit
8
+
9
+ Project-neutral flaky-test audit. Every project value — the CI workflows, the base branch, the
10
+ test gate, the repo — is read from `.keel/project.yaml` via the `keel` CLI. The test command is
11
+ **the project's** (`keel run-gates .keel/project.yaml --root .`); this adapter never names a
12
+ specific test runner.
13
+
14
+ The command is **read-only on test code**: it never edits test sources, never auto-disables a
15
+ test (no skip/ignore/only annotation), never reruns a test to "confirm" a flake, and never
16
+ closes an existing flake issue. It aggregates per-test pass/fail signal, classifies tests whose
17
+ failure rate crosses the threshold as flaky, dedupes against open flake issues, and opens one
18
+ tracking issue per newly-detected flake. Re-runs MUST NOT spam duplicates — the Step 4 dedupe is
19
+ load-bearing.
20
+
21
+ **The defining rule:** only an **across-runs disagreement** (a test that both passes and fails
22
+ under identical conditions) marks a flake. A test that **consistently fails** is a real bug, not
23
+ a flake — it belongs to `/keel:ci-check` and human triage; do not classify it here. All
24
+ published artifacts (report, issue bodies) MUST be English.
25
+
26
+ ## Step 0 — orient + parse arguments + runtime gate
27
+
28
+ ```bash
29
+ keel validate .keel/project.yaml --root .
30
+ keel plan .keel/project.yaml --root . # read base_branch, ci_workflows, repo
31
+ keel plan .keel/project.yaml --root . --command flake-audit --live --json
32
+ ```
33
+
34
+ The live plan is the operator-consent preflight. Before opening issues, routing fixes to
35
+ `/keel:ship`, using secrets, publishing, or calling production-adjacent systems, parse
36
+ `contract.operator_consent`; if `requires_operator_consent` is true, STOP and ask the
37
+ operator to rerun with the required `--approve-scope` values.
38
+
39
+ Arguments:
40
+ - `--days <N>` — CI-history lookback window in calendar days (UTC). Default `14`. Reject `0`,
41
+ negatives, non-integers.
42
+ - `--runs <N>` — when reading CI history is unavailable, re-run the project's test gate `N`
43
+ times locally instead (default `5`). The two evidence sources are mutually supportive:
44
+ prefer CI history; fall back to repeated local runs.
45
+ - `--threshold <P>` — minimum failure rate to flag, a decimal in `[0, 1]`. Default `0.10`.
46
+ Reject anything outside `[0, 1]`, non-numeric, or a missing value after the flag.
47
+ - `--open-issues` — open a deduped tracking issue per newly-detected flake (Step 6) and route
48
+ each fix to `/keel:ship`. Without it, report-only.
49
+ - `--dry-run` — compute the report and print findings, but skip every issue-create mutation;
50
+ print `would create: <title>` per skipped issue.
51
+ - Reject unknown flags.
52
+
53
+ **Runtime-availability gate.** Test-level CI-history analysis needs the host's check-run +
54
+ artifact APIs (via the host CLI; no MCP equivalent today). If those are unreachable AND the
55
+ project's test gate cannot be run locally either, exit cleanly with a single line saying flake
56
+ detection requires either CI check-run/artifact access or a runnable local test gate, neither
57
+ available in this runtime. Do not error or partial-run.
58
+
59
+ ## Step 1 — gather evidence
60
+
61
+ **CI-history mode (preferred, full fidelity):** enumerate the in-window CI runs of the
62
+ project's `ci_workflows` on `base_branch` (drive it via commits in the window + their
63
+ check-runs where a direct run-by-date listing is unavailable; document the substitution
64
+ honestly in the report's Limitations). For each failing check run keep its conclusion, workflow
65
+ name, start time, the run-page URL (for "sample run URLs"), and the test-report artifact URL if
66
+ extractable.
67
+
68
+ **Local-runs mode (fallback):** run `keel run-gates .keel/project.yaml --root .` `--runs`
69
+ times on a clean tree and record per-run pass/fail.
70
+
71
+ ## Step 2 — build per-test aggregates
72
+
73
+ - **Test-level (full fidelity):** parse each run's test-report into per-test results keyed by
74
+ **fully-qualified test name**; track `pass_count` and `fail_count`. Capture, per failing
75
+ test, the first ~5 lines of the failure signature from the most recent failing run.
76
+ - **Degraded run-level (no per-test granularity — artifacts unreachable):** aggregate only at
77
+ the run level (which runs failed, on which commit, with which URL); skip per-test
78
+ classification and per-flake issue creation, and produce a Limitations-flagged summary only.
79
+
80
+ ## Step 3 — classify flakes (test-level only)
81
+
82
+ A fully-qualified test with both passes and failures in the window is **flaky** when:
83
+
84
+ ```
85
+ fail_count >= 3 AND fail_count / (pass_count + fail_count) >= <threshold>
86
+ ```
87
+
88
+ The `fail_count >= 3` floor is load-bearing: it stops a single transient blip being labelled
89
+ flaky. A test that **never passed** in the window is a deterministic failure, not a flake — do
90
+ not classify it here.
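
The classification rule can be written as a small predicate. The function and parameter names are assumptions; the logic follows the stated condition, including the `fail_count >= 3` floor and the exclusion of tests that never passed.

```python
def is_flaky(
    pass_count: int,
    fail_count: int,
    threshold: float = 0.10,
    min_failures: int = 3,
) -> bool:
    """True when a test both passes and fails and crosses both floors."""
    if pass_count == 0 or fail_count == 0:
        # Never passed: deterministic failure. Never failed: healthy.
        return False
    if fail_count < min_failures:
        # A single transient blip must not be labelled flaky.
        return False
    return fail_count / (pass_count + fail_count) >= threshold
```

For instance, 3 failures in 20 runs (15%) is flaky at the default threshold, while 3 failures in 100 runs (3%) and 5 failures in 5 runs are not.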
91
+
92
+ ## Step 4 — dedupe against tracked flakes
93
+
94
+ Search open issues labelled `flake` (canonical title shape `flaky test: <fully.qualified.name>`)
95
+ and build the tracked-name set by stripping that prefix. Carry forward **only** newly-classified
96
+ flakes not already tracked. This dedupe is the only reason the command is safe to re-run on a
97
+ schedule — do not weaken it.
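
A sketch of the dedupe, assuming the titles of open `flake`-labelled issues have already been fetched as plain strings. The helper names are hypothetical; the title prefix is the canonical shape quoted above.

```python
from typing import Iterable, List, Set

TITLE_PREFIX = "flaky test: "


def tracked_flake_names(issue_titles: Iterable[str]) -> Set[str]:
    """Strip the canonical prefix from each matching title."""
    return {
        title[len(TITLE_PREFIX):]
        for title in issue_titles
        if title.startswith(TITLE_PREFIX)
    }


def newly_detected(classified: Iterable[str], issue_titles: Iterable[str]) -> List[str]:
    """Keep only classified flakes that have no open tracking issue yet."""
    tracked = tracked_flake_names(issue_titles)
    return [name for name in classified if name not in tracked]
```

Because the match is on the exact fully-qualified name, a title that deviates from the canonical shape would not be recognised as tracking that test.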
98
+
99
+ ## Step 5 — build the report
100
+
101
+ A single markdown body whose **literal first line** is the codename
102
+ `FLAKE-AUDIT-<DATE>-<UTC_TIMESTAMP>`. **Codename pin (load-bearing):** no blank line above it,
103
+ no leading whitespace, no quoting, no Markdown prefix or surrounding formatting — downstream
104
+ consumers locate the run by the `FLAKE-AUDIT-<DATE>-` prefix.
105
+
106
+ Body: a summary line (runs examined · distinct failing tests · classified flakes ·
107
+ newly-opened issues); a **newly-classified flakes** table (test · fail rate · failures · up to
108
+ 3 sample run URLs, most-recent first · first ~5 lines of the failure signature, fenced inline);
109
+ an **already-tracked (deduped)** list (`<name> — see #<existing>`); and a **Limitations**
110
+ section.
111
+
112
+ Formatting rules:
113
+ - Zero new flakes → render `_no new flakes above threshold_` instead of the table.
114
+ - Omit "already-tracked" if nothing was deduped.
115
+ - Omit "Limitations" if no degradation occurred; otherwise include every honest caveat (history
116
+ enumerated commit-by-commit, an artifact download that failed, degraded run-level mode, etc.).
117
+ - Degraded run-level mode → replace the flakes table with a short list of failing runs (commit ·
118
+ workflow · URL) and add the Limitations bullet that test-level classification was skipped.
119
+
120
+ ## Step 6 — open issues per new flake + route to ship
121
+
122
+ Only under `--open-issues`, and test-level mode only (degraded mode skips this). Per newly
123
+ classified flake:
124
+ - `--dry-run` → print `would create: flaky test: <name>` and skip.
125
+ - Otherwise → open an issue titled `flaky test: <fully.qualified.name>` whose body carries the
126
+ detecting codename + window, the fail rate (`<fail>/<total>`), the threshold, up to 3 sample
127
+ run URLs, the fenced failure-signature excerpt, and a triage note: **do NOT auto-disable**;
128
+ investigate root cause (timing, shared state, ordering, network); close with a fix PR or mark
129
+ wontfix if the test is being retired.
130
+
131
+ Labels: always `flake`; add an **area** label derived from `tier3_globs` / the test's path or
132
+ FQN where the project's layout makes the area derivable; if the area is indeterminate, apply
133
+ only `flake` and note that in the body. If a needed label does not yet exist, the create may
134
+ fail to apply it — detect that, **retry the create without the label**, and add a Limitations
135
+ bullet recommending the operator pre-create the label.
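
The dry-run branch and the label fallback can be sketched together. Everything named here is hypothetical: `create` stands in for whatever call opens the issue, and `LabelMissing` is an invented exception representing "the tracker rejected an unknown label"; the real failure signal depends on the transport in use.

```python
class LabelMissing(Exception):
    """Hypothetical error: the tracker rejected a label that does not exist."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def open_flake_issue(create, title, labels, limitations, dry_run=False):
    """Open one flake issue, retrying once without a missing label."""
    if dry_run:
        print(f"would create: {title}")
        return None
    try:
        return create(title, labels)
    except LabelMissing as exc:
        # Record the caveat, then retry without the label that failed.
        limitations.append(
            f"label '{exc.label}' does not exist; pre-create it before the next run"
        )
        remaining = [label for label in labels if label != exc.label]
        return create(title, remaining)
```

The sketch retries only once, so a second missing label would propagate as an error and be handled by the per-step fail-soft rule rather than looping.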
136
+
137
+ Route the fix for each opened flake issue through **`/keel:ship`** (window + lock + review). Optionally suggest a
138
+ quarantine annotation in the issue — but never silently skip a test from this command.
139
+
140
+ ## Step 7 — print
141
+
142
+ Always print the Step 5 report to stdout, even when not `--dry-run` — a silent run would force
143
+ the operator to spelunk the tracker after the fact.
144
+
145
+ ## Stop conditions / invariants
146
+
147
+ - **Never auto-disable a flaky test; never rerun a test to "confirm" flakiness** (a rerun
148
+ invalidates the very signal being measured) — only observed history counts.
149
+ - **One flake = one issue** — the Step 4 dedupe is load-bearing; re-runs never duplicate a
150
+ tracked flake.
151
+ - **Never close/edit/comment on existing flake issues** — triage is a human call; this command
152
+ only opens new ones.
153
+ - **Per-step failure continues with what was fetched**, documented under Limitations; only Step
154
+ 0 arg-parse failure is fatal.
155
+ - **No silent dry-run mutations** — each create is printed as `would create: <title>` and
156
+ skipped.
157
+ - Fail-soft · deterministic for identical inputs.
@@ -0,0 +1,127 @@
1
+ ---
2
+ description: Delegate a single issue to the right implementer and drive the s4 implement step standalone. Project-neutral — every project specific is read from .keel/project.yaml via the keel CLI.
3
+ argument-hint: "[issue number] [--delegate <claude|codex|agy|ollama:MODEL>]"
4
+ allowed-tools: Bash(keel:*), Bash(git:*), Bash(gh:*), Read, Write, Agent
5
+ ---
6
+
7
+ # /keel:implement
8
+
9
+ The standalone **implement step (`s4`)** of the keel backbone. This adapter is
10
+ project-neutral: it contains no branch name, build command, agent, path glob, or
11
+ timezone. Read every project-specific value from `.keel/project.yaml` via the
12
+ `keel` CLI.
13
+
14
+ > **Hard rule.** If you are about to type a literal like a base-branch name, a
15
+ > build/lint command, an implementer agent, or a path glob — **stop** and read it
16
+ > from config (`base_branch`, `build_gate_cmd`, `lint_cmd`, `implementer_agents`,
17
+ > `tier3_globs`). Hardcoding a project specific here is the exact bug keel exists
18
+ > to kill.
19
+
20
+ ## Language
21
+
22
+ All committed/published artifacts (commits, branch names, PR/issue titles and
23
+ bodies, comments, file contents) MUST be written in English. Free-form chat with
24
+ the user may stay in any language. (Project-specific language policy lives in the
25
+ project's source-of-truth doc — `knobs.sot_doc`.)
26
+
27
+ ## Step 0 — Resolve config
28
+
29
+ ```bash
30
+ keel validate .keel/project.yaml --root . # abort if config/extensions invalid
31
+ keel plan .keel/project.yaml --root . # read base_branch, implementer_agents, tier3_globs
32
+ keel plan .keel/project.yaml --root . --command implement --live --json
33
+ ```
34
+
35
+ The live plan is the operator-consent preflight. Before posting comments, creating a
36
+ worktree/branch, delegating, editing files, committing, pushing, opening a PR, using
37
+ secrets, publishing, or calling production-adjacent systems, parse
38
+ `contract.operator_consent`; if `requires_operator_consent` is true, STOP and ask the
39
+ operator to rerun with the required `--approve-scope` values. Store
40
+ `operator_consent.delegated_agent_scope` for Step 5.
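
A consent preflight over the `--live --json` output could be sketched as follows. The nesting (`contract` containing `operator_consent`) follows the field path named above, but the exact JSON layout is an assumption, and `check_consent` is a hypothetical helper rather than part of the keel CLI.

```python
import json


def check_consent(plan_json: str):
    """Stop if consent is still required; otherwise return the delegated scope."""
    plan = json.loads(plan_json)
    consent = plan["contract"]["operator_consent"]
    if consent.get("requires_operator_consent"):
        raise SystemExit(
            "operator consent required: rerun with the required --approve-scope values"
        )
    # Kept for Step 5, where it is handed to the implementer.
    return consent.get("delegated_agent_scope")
```

Raising `SystemExit` before any mutation keeps the stop ahead of every side effect listed in the paragraph above.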
41
+
42
+ Read the knobs you will need: `base_branch`, `implementer_agents`, `tier3_globs`,
43
+ `build_gate_cmd`, `lint_cmd`.
44
+
45
+ ## Step 1 — Fetch the issue
46
+
47
+ Read the issue (title, body, labels) via `gh` (CLI when available) or the GitHub
48
+ MCP read tools (sandbox/web runtime). Capture `number`, `title`, `body`, and the
49
+ `labels` array — the role/platform label drives implementer routing below.
50
+
51
+ ## Step 2 — Check for an existing branch
52
+
53
+ Look for a branch already associated with this issue (e.g. matching
54
+ `*issue-<N>*`). If one exists, report it and ask the human whether to continue on
55
+ it or start fresh — do not silently clobber in-flight work.
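
One way to run this check is sketched below. Note that a plain glob such as `*issue-1*` would also match `issue-12`, so the sketch adds a digit boundary; that boundary is an assumption beyond the source, and the function name is hypothetical.

```python
import re


def branches_for_issue(branches, issue):
    """Return branch names that mention issue-<N> but not issue-<N><digit>."""
    pattern = re.compile(rf"issue-{int(issue)}(?!\d)")
    return [name for name in branches if pattern.search(name)]
```

The branch list could come from `git branch --list --format='%(refname:short)'`; a non-empty result is the cue to report the branch and ask the human before continuing.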
56
+
57
+ ## Step 3 — Resolve the implementer
58
+
59
+ Resolve the implementer agent from `implementer_agents` keyed by the issue's
60
+ **role/platform label**, overridden by `--delegate`, defaulting to the **host
61
+ agent**. Do not hardcode an agent name — the mapping is config.
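
The precedence described above (`--delegate` first, then the label mapping, then the host agent) fits in a short function. This sketch assumes `implementer_agents` is a mapping from label to agent name, which the source implies but does not state; the function and parameter names are hypothetical.

```python
def resolve_implementer(implementer_agents, labels, delegate=None, host_agent="host"):
    """Pick the implementer: explicit override, then label mapping, then host."""
    if delegate:
        return delegate
    for label in labels:
        if label in implementer_agents:
            return implementer_agents[label]
    return host_agent
```

When several labels match, this sketch takes the first one in the issue's label order; a project that needs a different tie-break would have to define it in config.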
62
+
63
+ Project-specific routing nuances (e.g. a particular file-pattern that demands a
64
+ specialized tool or a record-and-validate script for snapshot/baseline tests)
65
+ live in the project, not here: express them as a `.keel/extensions/` Lego
66
+ (an `after-implement` slot) or mark them "(project-specific; stays in the
67
+ project)".
68
+
69
+ ## Step 4 — Agent run codename + start comment
70
+
71
+ Mint an agent run codename and record attribution. Use a deterministic,
72
+ collision-free form: `<ROLE_PREFIX>-<issue>-<UTC timestamp>` where `ROLE_PREFIX`
73
+ derives from the resolved implementer role and the timestamp is generated at run
74
+ time (UTC, e.g. `YYYYMMDD-HHMMSS`). Attribution is `agent:<vendor>` plus a
75
+ versionless `model:<base>`.
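
Minting the codename is a one-liner once the pieces are resolved. The sketch below uses the `YYYYMMDD-HHMMSS` example format from the text; the function name and the sample prefix are hypothetical, and `now` is injectable only so the result can be checked deterministically.

```python
from datetime import datetime, timezone


def mint_codename(role_prefix, issue, now=None):
    """Build <ROLE_PREFIX>-<issue>-<UTC timestamp> at second resolution."""
    # `now` must be timezone-aware; it defaults to the current UTC time.
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{role_prefix}-{issue}-{stamp}"


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(mint_codename("IMPL", 42, fixed))  # IMPL-42-20240102-030405
```

At second resolution, two runs for the same role and issue within the same second would produce the same string, so a caller that can start runs that quickly may want an extra discriminator.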
76
+
77
+ Post a start comment on the issue before delegating (via `gh` or the GitHub MCP
78
+ comment tool), including: codename, chosen agent, implementer system (host agent
79
+ id), and the planned branch name.
80
+
81
+ ## Step 5 — Delegate (with worktree isolation)
82
+
83
+ Dispatch to the resolved implementer with the issue context. Mandatory steps the
84
+ implementer must follow:
85
+
86
+ 0. Receive and obey the approved `operator_consent.delegated_agent_scope`. If the
87
+ implementer attempts work outside `approved_mutation_scopes`, the orchestrator blocks or
88
+ escalates. Secret access requires explicit `secrets` approval for this run.
89
+ 1. Read the project's source-of-truth doc (`knobs.sot_doc`) and any platform
90
+ context it points to.
91
+ 2. **Workspace isolation (mandatory):** before any code-modifying work, create a
92
+ git worktree off `origin/<base_branch>` (config — never assume the branch) and
93
+ perform every edit, build, and push from inside it. Never mutate the user's
94
+ primary checkout. Use a repo-nested worktree path (never a sibling), e.g.:
95
+ ```bash
96
+ git fetch origin "$BASE_BRANCH" --quiet
97
+ git worktree add -b feature/issue-<N>-<slug> worktrees/issue-<N> origin/"$BASE_BRANCH"
98
+ ```
99
+ Run the gates from inside that path. After the PR merges, clean up with
100
+ `git worktree remove worktrees/issue-<N> --force`.
101
+ 3. Implement all acceptance criteria with focused commits scoped to the issue.
102
+ 4. Run the applicable gates from inside the worktree via the keel CLI so the
103
+ command strings stay config-driven:
104
+ ```bash
105
+ keel run-gates .keel/project.yaml --root .
106
+ ```
107
+ This executes the built-in `build_gate_cmd` / `lint_cmd` plus any `tester`
108
+ Lego. Gate selection that depends on which files changed (schema migration,
109
+ entitlement, or config-specific suites) is project-specific: express it as a
110
+ Lego or mark it "(project-specific; stays in the project)".
111
+ 5. Include the codename in commits / PR body / final summary when practical.
112
+ 6. Return the contract: codename, branch/commit, files changed, gate results,
113
+ docs impact, and anything needing manual/device/infra verification.
114
+
115
+ ## Step 6 — Report back + hand off
116
+
117
+ After the implementer completes, summarize: codename, branch/commit, what was
118
+ implemented, gate results, and anything needing manual verification. Post a
119
+ completion comment on the issue with the same codename and the gate results if
120
+ the implementer did not already do so.
121
+
122
+ Do **not** merge here — that is `/keel:ship`'s job (window + lock + review). Hand
123
+ the contract to `/keel:ship` (or `/keel:pr-loop`) to open the PR and drive
124
+ review / CI / merge.
125
+
126
+ Fail over to the host agent on delegate quota errors; attribute the **effective**
127
+ agent.