codebyplan 1.13.52 → 1.13.54
- package/dist/cli.js +3226 -897
- package/package.json +1 -1
- package/templates/agents/cbp-database-agent.md +1 -1
- package/templates/agents/cbp-e2e-maestro.md +1 -1
- package/templates/agents/cbp-e2e-playwright.md +24 -16
- package/templates/agents/cbp-e2e-tauri.md +1 -1
- package/templates/agents/cbp-e2e-vscode.md +1 -1
- package/templates/agents/cbp-e2e-xcuitest.md +1 -1
- package/templates/agents/cbp-improve-claude.md +2 -2
- package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
- package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
- package/templates/agents/cbp-security-agent.md +10 -2
- package/templates/agents/cbp-stripe-agent.md +2 -2
- package/templates/agents/cbp-testing-qa-agent.md +34 -20
- package/templates/agents/cbp-verify-reviewer.md +236 -0
- package/templates/context/architecture-map.md +4 -4
- package/templates/context/mcp-docs.md +57 -11
- package/templates/context/testing/e2e.md +9 -9
- package/templates/github-workflows/ci.yml +104 -0
- package/templates/github-workflows/publish.yml +8 -27
- package/templates/github-workflows/release-desktop.yml +215 -0
- package/templates/hooks/cbp-skill-context-guard.sh +1 -1
- package/templates/hooks/cbp-test-hooks.sh +9 -9
- package/templates/hooks/validate-structure-lengths.sh +1 -1
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/rules/README.md +1 -2
- package/templates/rules/agent-claim-verification.md +1 -1
- package/templates/rules/context-file-loading.md +10 -10
- package/templates/rules/development-workflow.md +73 -0
- package/templates/rules/e2e-mandatory.md +8 -8
- package/templates/rules/execution-proof.md +70 -0
- package/templates/rules/model-invocation-convention.md +2 -2
- package/templates/rules/parallel-waves.md +11 -11
- package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
- package/templates/rules/task-routing-recommendation.md +1 -1
- package/templates/rules/todo-backend.md +3 -3
- package/templates/rules/two-tier-ci.md +63 -0
- package/templates/settings.project.base.json +15 -11
- package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
- package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
- package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
- package/templates/skills/cbp-checkpoint-check/SKILL.md +11 -3
- package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
- package/templates/skills/cbp-checkpoint-end/SKILL.md +5 -1
- package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
- package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
- package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
- package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
- package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
- package/templates/skills/cbp-git-commit/SKILL.md +3 -3
- package/templates/skills/cbp-merge-main/SKILL.md +4 -4
- package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
- package/templates/skills/cbp-round-complete/SKILL.md +15 -14
- package/templates/skills/cbp-round-plan/SKILL.md +344 -0
- package/templates/skills/cbp-session-end/SKILL.md +1 -1
- package/templates/skills/cbp-setup-cd/SKILL.md +291 -0
- package/templates/skills/cbp-setup-cd/reference/github-actions-cd.md +231 -0
- package/templates/skills/cbp-setup-ci/SKILL.md +175 -0
- package/templates/skills/cbp-setup-ci/reference/github-actions.md +100 -0
- package/templates/skills/cbp-ship/SKILL.md +21 -0
- package/templates/skills/cbp-ship-main/SKILL.md +3 -2
- package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
- package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
- package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
- package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
- package/templates/skills/cbp-standalone-task-testing/SKILL.md +16 -7
- package/templates/skills/cbp-task-create/SKILL.md +6 -7
- package/templates/skills/cbp-task-start/SKILL.md +8 -8
- package/templates/skills/cbp-todo/SKILL.md +6 -8
- package/templates/skills/cbp-verify/SKILL.md +146 -0
- package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
- package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
- package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
- package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
- package/templates/agents/cbp-improve-round.md +0 -283
- package/templates/agents/cbp-task-check.md +0 -217
- package/templates/skills/cbp-round-check/SKILL.md +0 -132
- package/templates/skills/cbp-round-end/SKILL.md +0 -173
- package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
- package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
- package/templates/skills/cbp-round-input/SKILL.md +0 -197
- package/templates/skills/cbp-round-start/SKILL.md +0 -261
- package/templates/skills/cbp-round-update/SKILL.md +0 -120
- package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
- package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
- package/templates/skills/cbp-task-check/SKILL.md +0 -172
- package/templates/skills/cbp-task-testing/SKILL.md +0 -277
@@ -25,7 +25,7 @@ Any multi-segment input is an error:
 
 ```
 standalone-task-complete: argument `{value}` looks like a checkpoint-task pair.
-Use /cbp-
+Use /cbp-finalize {chk}-{task} for checkpoint-bound tasks.
 Standalone tasks use a bare number, e.g. /cbp-standalone-task-complete 45.
 ```
 
@@ -35,7 +35,7 @@ Error cases: any multi-segment input, `abc`, `108-`, `-1`, anything with whitesp
 
 - `standalone-task-complete 45` → standalone TASK-45
 - `standalone-task-complete` (no arg) → active in-progress task via `get_current_standalone_task`
-- `standalone-task-complete 141-3` → error: "Use /cbp-
+- `standalone-task-complete 141-3` → error: "Use /cbp-finalize {chk}-{task} for checkpoint-bound tasks."
 - `standalone-task-complete abc` → error: malformed
 
 ### Step 1.5: Get Current Task
@@ -56,7 +56,7 @@ If any round is `in_progress`:
 ```
 ## Cannot Complete Standalone Task
 
-Standalone TASK-[N] has an active round (Round [N]). Run `/cbp-
+Standalone TASK-[N] has an active round (Round [N]). Run `/cbp-verify` to finish it.
 ```
 
 Stop here.
@@ -66,7 +66,7 @@ Verify at least one round has `testing_qa_output` in its context. If not:
 ```
 ## Cannot Complete Standalone Task
 
-No testing-qa-agent validation found. Run `/cbp-round-
+No testing-qa-agent validation found. Run `/cbp-round-plan` to execute a validated round.
 ```
 
 Stop here.
@@ -179,18 +179,17 @@ When `branch_deleted === true` in the ship JSON:
 
 ### Step 7.5: Complete Standalone Task
 
-Note:
+Note: completion is called only after `codebyplan ship` succeeds (no `checks_failed`) — the DB completion record reflects work that has landed in production.
 
-
+Complete via the CLI (wraps the `complete_standalone_task` MCP tool):
 
-
+```bash
+codebyplan standalone-task complete --id <standalone_task.id>
+```
 
-
+The CLI auto-resolves `caller_worktree_id` (override → worktree cache → resolver). `caller_worktree_id` is REQUIRED — the MCP server's pre-guard rejects mutations from non-matching worktrees, and the CLI hard-fails (exit 1) with registration guidance rather than sending an undefined id. The server auto-clears `assigned_worktree_id` on the task on success.
 
-
-Warning: could not resolve caller_worktree_id (npx codebyplan resolve-worktree returned empty).
-The complete_standalone_task call may be rejected by the pre-guard. Proceed anyway? (yes / no)
-```
+If the CLI exits 1 with a "could not resolve caller_worktree_id" message, run `npx codebyplan setup` (or `codebyplan resolve-worktree --cache`) from this worktree, then re-run the command.
 
 ### Step 8: Run Cleanup + Migration (inline)
 
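The hunk above describes a precedence chain for `caller_worktree_id` (override, then worktree cache, then resolver) with a hard failure when nothing resolves. A minimal shell sketch of that precedence idea follows. It is an illustration only, not the CLI's implementation: the helper name `first_nonempty` and the sample ids are hypothetical.

```shell
# first_nonempty VALUE...: print the first non-empty argument.
# Hypothetical helper; it only illustrates "first source that yields a value wins".
first_nonempty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Nothing resolved: fail loudly instead of passing an empty id along.
  echo "could not resolve caller_worktree_id" >&2
  return 1
}

# Sample values are made up: no override, a cached id, and a resolver id.
first_nonempty "" "wt-from-cache" "wt-from-resolver"
```

With an empty override the cached value is chosen before the resolver value; with every source empty the function returns status 1, which mirrors the exit-1 behavior the text describes.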
@@ -238,6 +237,6 @@ Do NOT use AskUserQuestion for routing. Do NOT use the Skill tool to auto-trigge
 - **Chain**: `/cbp-standalone-task-check` → `/cbp-standalone-task-testing` → `/cbp-standalone-task-complete`
 - **Delegates to**: `codebyplan ship` CLI (Step 7 — PR creation, check polling, merge, branch cleanup)
 - **Reads**: MCP `get_current_standalone_task`, `get_standalone_tasks`, `get_standalone_rounds`
-- **Writes**: MCP `update_standalone_task
+- **Writes**: MCP `update_standalone_task` (Step 6 files); `codebyplan standalone-task complete` (wraps `complete_standalone_task`)
 - **Uses skills (inline, no sub-agent)**: `cleanup` (if deletions), `migration` (if exports renamed)
 - **Does NOT** auto-trigger next skill — emits single directive only
@@ -17,7 +17,7 @@ Create a new standalone task — independent of any checkpoint. Gathers user con
 
 ## Identifier Notation
 
-Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows `cbp-round-
+Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows `cbp-round-plan` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary".
 
 ## Instructions
 
@@ -96,13 +96,20 @@ Resolve worktree_id via `npx codebyplan resolve-worktree 2>/dev/null`.
 
 ### Step 7: Create Standalone Task
 
-
-
-
--
-
-
-
+Create via the CLI (wraps the `create_standalone_task` MCP tool; auto-resolves `caller_worktree_id`):
+
+```bash
+codebyplan standalone-task create \
+--title "<concise task title>" \
+--number <next number from Step 3> \
+--requirements "<numbered requirements list>" \
+--context '<JSON: decisions from Q&A + source findings>' \
+--assigned-worktree-id <from Step 6, if resolved>
+```
+
+- `--repo-id` is optional — the CLI reads it from `.codebyplan/repo.json`.
+- Omit `--assigned-worktree-id` when Step 6 did not resolve a worktree.
+- On success the CLI prints the created row JSON (including `.id`) to stdout.
 
 ```
 ## Standalone Task Created
@@ -145,6 +152,6 @@ Waiting for user to decide next step.
 ## Integration
 
 - **Reads**: MCP `get_standalone_tasks`
-- **Writes**:
+- **Writes**: `codebyplan standalone-task create` (wraps `create_standalone_task` MCP tool)
 - **Triggered by**: user manual
 - **Does NOT auto-trigger** next command — user decides
@@ -149,13 +149,17 @@ Load context from DB:
 
 ### Step 5: Set Task Status
 
-
+Set status via the CLI (wraps `update_standalone_task`; auto-resolves `caller_worktree_id`):
 
-
+```bash
+codebyplan standalone-task update --id <task.id> --status in_progress
+```
+
+`--id` is the standalone task UUID resolved in Step 2. The CLI resolves `caller_worktree_id` itself (override → worktree cache → resolver), so `CALLER_WT` does not need to be passed.
 
 ### Step 6: Auto-trigger Round Start
 
-Trigger `/cbp-round-
+Trigger `/cbp-round-plan` with no argument. Do NOT pass the task number — round-start's 2-segment form means standalone TASK-{a} round {b}, not a checkpoint task. Passing no argument causes round-start to derive the active task/round from state, which is correct.
 
 ```
 Starting first round...
@@ -164,5 +168,5 @@ Starting first round...
 ## Integration
 
 - **Reads**: MCP `get_standalone_tasks`, `get_current_standalone_task`, `get_standalone_rounds`
-- **Writes**: MCP `update_standalone_task`
-- **Triggers**: `/cbp-round-
+- **Writes**: `codebyplan standalone-task update` (Step 5 status); MCP `update_standalone_task` (Step 3.4 branch_name persist)
+- **Triggers**: `/cbp-round-plan` (no argument — auto, round 1)
@@ -19,7 +19,7 @@ Comprehensive task-level testing for standalone tasks — the **cross-round doub
 
 ## Scope vs Round-Level Validation
 
-Per-wave `testing-qa-agent` runs inside `/cbp-round-
+Per-wave `testing-qa-agent` runs inside `/cbp-round-build` Step 5 and **owns per-round QA**: per-app build/lint/types, the `console.log`/debug scan, the OWASP/secret full-diff grep, and `pnpm audit`. This skill does NOT repeat them. It adds only the cross-round layer invisible within a single round: workspace-wide lint, workspace tsc, and the full test suite (which catch cross-package breakage), plus the cross-round code review (Step 6.5) and the user manual walkthrough (Step 8).
 
 ## Instructions
 
@@ -34,7 +34,7 @@ Any multi-segment input is an error:
 
 ```
 standalone-task-testing: argument `{value}` looks like a checkpoint-task pair.
-Use /cbp-
+Use /cbp-verify {chk}-{task} for checkpoint-bound tasks.
 Standalone tasks use a bare number, e.g. /cbp-standalone-task-testing 45.
 ```
 
@@ -57,7 +57,7 @@ Use MCP `get_standalone_rounds(standalone_task_id)`. Verify all rounds are `comp
 ## Cannot Run Standalone Task Testing
 
 Standalone TASK-[N] has an active round (Round [N]). Complete it first:
-- Run `/cbp-
+- Run `/cbp-verify` to finish the round
 ```
 
 Stop.
@@ -86,13 +86,22 @@ Read every non-deleted file in the aggregated list. Build a mental model of the
 
 Capture stdout and stderr for each check.
 
+**ci.json command resolution (absent-fallback safe):** Before running the checks below, resolve commands from `.codebyplan/ci.json`:
+
+```bash
+CI_TYPES_CMD=$(npx codebyplan ci resolve typecheck 2>/dev/null)
+CI_UNIT_CMD=$(npx codebyplan ci resolve unit_test 2>/dev/null)
+```
+
+Fallback: if `.codebyplan/ci.json` is absent, `ci resolve` returns the central default (exit 0). If the binary is unavailable, the variable is empty and the `${CI_*_CMD:-<literal>}` guards in the table below activate the hardcoded fallback.
+
 **Hard-fail tests** (block completion):
 
 | Category | Command | Condition |
 |----------|---------|-----------|
 | Full-repo lint | `pnpm -w lint` | Always |
-| Full-repo types |
-| Full-repo unit tests |
+| Full-repo types | `${CI_TYPES_CMD:-pnpm exec tsc --noEmit}` | Source files changed |
+| Full-repo unit tests | `${CI_UNIT_CMD:-pnpm test --run}` | Source files in aggregated_files |
 | Per-package E2E | `pnpm --filter <pkg> e2e:test` | UI files in aggregated_files |
 
 These are the workspace-wide / cross-package checks only — per-app build/lint/types, the `console.log`/debug scan, the OWASP/secret grep, and `pnpm audit` already ran per-round inside `testing-qa-agent` and are NOT repeated here.
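The `${CI_*_CMD:-<literal>}` guards in the hunk above rely on standard shell parameter expansion: `${VAR:-default}` yields the default when the variable is unset or empty. The sketch below shows that behavior in isolation. The helper `pick_cmd` and the sample value `pnpm vitest run` are hypothetical and are not part of the package.

```shell
# pick_cmd RESOLVED FALLBACK: print RESOLVED unless it is empty, else FALLBACK.
# Hypothetical helper that wraps the ${VAR:-default} expansion for demonstration.
pick_cmd() {
  resolved="$1"
  fallback="$2"
  printf '%s\n' "${resolved:-$fallback}"
}

# Case 1: the resolver printed nothing (for example, the binary was unavailable),
# so the hardcoded literal is used.
pick_cmd "" "pnpm exec tsc --noEmit"

# Case 2: the resolver printed a command ("pnpm vitest run" is a made-up value),
# so the resolved command takes priority over the literal.
pick_cmd "pnpm vitest run" "pnpm test --run"
```

Because the `:-` form treats an empty string the same as an unset variable, a failed command substitution that produces no output falls through to the literal without any extra check.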
@@ -176,11 +185,11 @@ Next: /cbp-standalone-task-complete {N}
 ---
 
 **Next:**
-Run `/cbp-round-
+Run `/cbp-round-plan` to address the minor issues found during testing.
 
 ---
 
-Waiting for user to run `/cbp-round-
+Waiting for user to run `/cbp-round-plan`.
 
 **Major problems found:**
 
@@ -10,13 +10,12 @@ Create a new task within the active checkpoint. Gathers user context, analyzes e
 
 ## When Used
 
-- Suggested by `/cbp-
-- Suggested by `/cbp-task-testing` when major problems need a separate task
+- Suggested by `/cbp-verify` (task scope) when scope issues or major problems require a separate task
 - User manually wants to add a task to the current checkpoint
 
 ## Identifier Notation
 
-This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-
+This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-plan` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
 
 **Bare-number argument**: if a bare number (e.g. `42`) is provided with no checkpoint context, this skill cannot resolve it to a checkpoint-bound task:
 
@@ -44,8 +43,8 @@ Use AskUserQuestion to understand the new task:
 
 Why is this task needed? What should it accomplish?
 
-If this was triggered by `/cbp-
-[pre-loaded context from
+If this was triggered by `/cbp-verify` (task scope), the findings are:
+[pre-loaded context from verify findings if available]
 
 Please describe:
 1. What the task should accomplish
@@ -70,7 +69,7 @@ Discovered issues MUST be captured. The default target is current scope (round
 
 | Situation | Action |
 |-----------|--------|
-| Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-
+| Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-verify` reference `findings-presentation.md` "Trivial-Resolution Exception" |
 | Related to the current task's domain | Create a new ROUND in the current task |
 | Fits the current checkpoint goal but is meaningfully separate | Create a new TASK in the current checkpoint via `create_task(checkpoint_id)` |
 | Large enough to need multiple tasks AND fits no current checkpoint | Create a NEW CHECKPOINT via `create_checkpoint` |
@@ -193,5 +192,5 @@ Waiting for user to decide next step.
 
 - **Reads**: Local state `.codebyplan/state/checkpoints/<id>.json` + `.../tasks/<id>.json`; on miss `npx codebyplan sync` once; MCP `get_current_task` / `get_tasks` as documented break-glass when the state dir is absent and sync fails. Step 3.5 dedup `get_tasks(standalone=true)` stays MCP — no local-state equivalent for standalone listing.
 - **Writes**: `codebyplan task create --checkpoint-id <id> ...` (CLI write-through); MCP `create_task` break-glass.
-- **Triggered by**: `/cbp-
+- **Triggered by**: `/cbp-verify` (task scope, suggested), user manual
 - **Does NOT auto-trigger** next command — user decides
@@ -1,7 +1,7 @@
 ---
 name: cbp-task-start
 description: Start a task, load context from DB
-triggers: [cbp-round-
+triggers: [cbp-round-plan]
 argument-hint: [chk-task] # e.g. `108-1` (CHK-108 TASK-1)
 effort: xhigh
 ---
@@ -14,7 +14,7 @@ Start a task by loading context from the database and preparing for work.
 
 ### Step 1: Parse `$ARGUMENTS`
 
-Parse the argument using the canonical chk-task-round notation (see `cbp-round-
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-plan` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -30,7 +30,7 @@ task-start: invalid argument `{value}`. Expected:
 (empty) → next pending task
 
 For standalone tasks, use `/cbp-standalone-task-start {N}`.
-For a specific round, use `/cbp-round-
+For a specific round, use `/cbp-round-plan 108-1-2`.
 ```
 
 Error cases: `108-1-2` (that is round-start's shape), `abc`, `108-`, `-1`, `108--1`, anything with whitespace or non-numeric characters.
@@ -40,7 +40,7 @@ Error cases: `108-1-2` (that is round-start's shape), `abc`, `108-`, `-1`, `108-
 - `task-start 108-1` → CHK-108 TASK-1
 - `task-start` (no arg) → next pending via `get_current_task`
 - `task-start 45` → error: "Use /cbp-standalone-task-start 45 instead — bare numbers no longer route to standalone tasks."
-- `task-start 108-1-2` → error: "use `/cbp-round-
+- `task-start 108-1-2` → error: "use `/cbp-round-plan 108-1-2`"
 - `task-start abc` → error: malformed
 - `task-start 108-` → error: malformed
 
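The routing examples in the hunk above can be reproduced with plain POSIX `case` patterns. The sketch below is one possible reading of those examples, not the skill's actual parser: the function name `classify_task_start` and the abbreviated output strings are assumptions, and the skill's own regex table is not visible in this diff.

```shell
# classify_task_start ARG: print how a task-start argument would be routed.
# Hypothetical helper; patterns are checked top to bottom, first match wins.
classify_task_start() {
  arg="${1-}"
  case "$arg" in
    "")          echo "next-pending" ;;                          # no argument
    *[!0-9-]*)   echo "error: malformed" ;;                      # letters, whitespace, other symbols
    -*|*-|*--*)  echo "error: malformed" ;;                      # leading, trailing, or doubled hyphen
    *-*-*-*)     echo "error: malformed" ;;                      # four or more segments
    *-*-*)       echo "error: use /cbp-round-plan $arg" ;;       # chk-task-round shape
    *-*)         echo "CHK-${arg%%-*} TASK-${arg#*-}" ;;         # chk-task shape
    *)           echo "error: use /cbp-standalone-task-start $arg" ;;  # bare number
  esac
}

classify_task_start 108-1
classify_task_start 108-1-2
```

Ordering matters here: the malformed checks run before the segment-count checks, so inputs such as `108-` or `108--1` never reach the two-segment branch.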
@@ -75,7 +75,7 @@ Ask via AskUserQuestion, naming the resolved task and disclosing the actions:
 > - **Cancel** — do nothing
 
 - **Proceed** → continue to Step 3.
-- **Cancel** → abort cleanly: make NO writes (no branch switch, no commit, no `update_task`, no `/cbp-round-
+- **Cancel** → abort cleanly: make NO writes (no branch switch, no commit, no `update_task`, no `/cbp-round-plan` trigger) and exit with one line: `Cancelled by user — TASK-{N} not started.`
 
 ### Step 3: Branch Auto-Handling
 
@@ -221,17 +221,17 @@ Display context summary:
 
 ### Step 6: Auto-trigger Round Start
 
-The Step 2.5 permission gate already covered this hand-off (the user approved running the skill, round-1 auto-start included), so no further prompt is needed here — automatically trigger `/cbp-round-
+The Step 2.5 permission gate already covered this hand-off (the user approved running the skill, round-1 auto-start included), so no further prompt is needed here — automatically trigger `/cbp-round-plan` for the first round.
 
 ```
 Starting first round...
 ```
 
-Trigger `/cbp-round-
+Trigger `/cbp-round-plan` with **no argument**. Do NOT pass the task identifier (`{chk}-{task}`) — round-start's 2-segment form is interpreted as standalone TASK-`{chk}` round `{task}`, not CHK-`{chk}` TASK-`{task}`. Passing no argument causes round-start to derive the active task/round from state, which is the correct path here.
 
 ## Integration
 
 - **Gates**: Step 2.5 permission gate — asks the user to confirm before any side effect; **Cancel** aborts cleanly with no writes. Fires on every invocation (manual, auto-trigger, auto-loop).
 - **Reads**: `.codebyplan/state/checkpoints/*.json`, `checkpoints/<id>/tasks/*.json`, `checkpoints/<id>/tasks/<id>/rounds/*.json`, `todos.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task`/`get_tasks`/`get_rounds` break-glass)
 - **Writes**: `codebyplan task update` (CLI write-through; MCP `update_task` break-glass)
-- **Triggers**: `/cbp-round-
+- **Triggers**: `/cbp-round-plan` (auto, round 1, no argument)
@@ -131,19 +131,17 @@ Once the gates pass, load the context the head command needs. This ensures `/cle
 | `/cbp-checkpoint-plan` | Load checkpoint from `.codebyplan/state/checkpoints/<id>.json` + task files under `checkpoints/<id>/tasks/` (fallback MCP `get_checkpoints` + `get_tasks`). Display checkpoint title, goal, ideas, existing task count |
 | `/cbp-checkpoint-start` | Load checkpoint + task files from local state (fallback MCP `get_checkpoints` + `get_tasks`). Display checkpoint title, status, claim state, first pending task |
 | `/cbp-task-start [N]` | Load from `.codebyplan/state/session/current.json` (fallback MCP `get_current_task`). Display checkpoint title + task title/requirements summary |
-| `/cbp-round-
-| `/cbp-round-
-| `/cbp-
-| `/cbp-task-check` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task + files summary |
-| `/cbp-task-testing` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + testing status summary |
+| `/cbp-round-plan` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + round count + last round summary |
+| `/cbp-round-plan` | **Full context load** (see Step 2b) |
+| `/cbp-verify` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + files_changed triage summary (claude_approved, findings, hard_fail) |
 | `/cbp-task-create` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task list summary |
-| `/cbp-
+| `/cbp-finalize` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task summary |
 | `/cbp-checkpoint-complete` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint summary |
 | *(no command / idle)* | See Step 3 — suggest `/cbp-session-end` |
 
 **For any unrecognized command:** Load from local state session (fallback MCP `get_current_task`) as a safe default. Display whatever context is available.
 
-### Step 2b: Full Context Load (for `/cbp-round-
+### Step 2b: Full Context Load (for `/cbp-round-plan`)
 
 This is the most context-dependent command. Load everything:
 
@@ -190,7 +188,7 @@ Reached only when the Step 1.5 ownership gate allowed routing to continue, the S
 
 ## Integration
 
-- **Called by**: `/cbp-session-start`, `/cbp-
+- **Called by**: `/cbp-session-start`, `/cbp-finalize`, `/cbp-checkpoint-complete`, manual, after `/clear`
 - **Resolves**: `npx codebyplan resolve-worktree --json` (worktree id + distress signal), `npx codebyplan whoami --json` (user id)
 - **Reads**: `.codebyplan/state/todos.json`, `session/current.json`, `checkpoints/<id>.json`, `checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json`, `worktrees.json`. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass: MCP `get_todos`, `get_current_task`, `get_rounds`, `get_checkpoints`, `get_tasks` when state dir absent and sync fails. `get_worktrees` stays MCP (display-only ownership-block path; no CLI verb).
 - **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate and Step 1.55 stale-entity guard pass, and the Step 1.6 planning gate falls through); Step 1.55 overrides to STOP (stale completed/cancelled entity); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
@@ -0,0 +1,146 @@
+---
+name: cbp-verify
+description: Unified verify stage — deterministic gates, real-execution proof, and a fresh-context diff review at round or task scope. Auto-triggered by cbp-round-build; escalates to task scope on the last clean round.
+argument-hint: [chk-task[-round] | task[-round]]
+triggers: [cbp-round-plan, cbp-round-complete, cbp-finalize]
+effort: xhigh
+---
+
+# Verify Command
+
+The single verify stage for the execution half. Collapses automated checks, finished-round
+triage, AI production review, and comprehensive task-level testing into one scope-aware skill.
+The deterministic spine lives in the CLI (`codebyplan check`, `codebyplan e2e verify-round`);
+this skill orchestrates the gates, proves execution, spawns ONE fresh-context reviewer, and
+routes on a single directive.
+
+Auto-triggered by `/cbp-round-build` after execution. The human gate is NOT here — it is the
+separate `ask`-tier `cbp-round-complete` (round) and the one batched walkthrough in Phase 6
+(task). This skill is model-invocable on purpose.
+
+## Scope & Kind
+
+- **SCOPE** (`round` | `task`) — auto-detected: a 3-segment `{chk}-{task}-{round}` (or 2-segment
+`{task}-{round}` standalone) argument, or an auto-trigger from `cbp-round-build`, is
+`scope=round`. A 2-segment `{chk}-{task}` (or bare `{task}` standalone) argument, or the Phase 5
+escalation, is `scope=task`.
+- **KIND** (`checkpoint` | `standalone`) — detected ONCE at the top from identifier shape
+(3-segment / 2-segment-chk = checkpoint; 2-segment / bare = standalone). KIND selects MCP tool
+names per the table in `reference/deterministic-gates.md`.
+
+All reads are local-state-first (`.codebyplan/state/**`); on miss run `npx codebyplan sync` once
+and re-read; MCP `get_*` is the break-glass fallback. All writes go through `codebyplan ... update`
+(CLI write-through), MCP break-glass.
+
+## HARD GATES — non-negotiable
+
+- The deterministic-gate JSON **is** the verdict — never narrate "I verified the build". (Phase 2)
+- Empty execution proof on a UI-touching diff = GATE FAILURE. (Phase 3, `rules/execution-proof.md`)
+- Reviewer spawn failure = HARD GATE FAILURE → STOP + retry directive; NEVER self-review inline.
+(Phase 4, `rules/spawn-failure-is-gate-failure.md`)
+- `gate6` is always hard (never baselined); baseline regressions are a user-accept gate, never
+auto-accepted. (`rules/two-tier-ci.md`)
+
+## Phase Skeleton
+
+### PHASE 1 — RESOLVE
+
+Parse `$ARGUMENTS` (notation per `cbp-round-plan` identifier vocabulary). Detect SCOPE and KIND
+(above). Resolve the active round/task from local state. If `scope=round` and no in-progress
+round → `No active round. Run /cbp-round-plan first.` and STOP. If `scope=task` and any round is
+still `in_progress` → STOP with "complete the active round first". Full resolution + KIND tool
+table: `reference/deterministic-gates.md`.
+
+### PHASE 2 — DETERMINISTIC GATES
+
+Run the unified matrix and capture the JSON:
+
+```bash
+codebyplan check --scope <round|task> --json
+```
+
+The JSON `{ results[], any_failed, hard_fail_checks[], no_baseline }` IS the verdict — record
+each result's `check`, `status`, `exit_code`, `new_failures[]`. `gate6` is ALWAYS hard;
+`lint`/`typecheck`/`tests`/`audit` fail only on NEW per-package failures vs the committed
+`.check-baseline.json` (baseline-tolerant soft tier, `rules/two-tier-ci.md`). `any_failed === true`
+(equivalently `hard_fail_checks.length > 0`) → carry into the Phase 5 verdict as a fail. Exact
+contract + the `claude_only` carve-out (deterministic-only path, no agent): see
+`reference/deterministic-gates.md`.
+
+### PHASE 3 — REAL EXECUTION PROOF
+
+Produce the committed proof for every tier the diff touches (`rules/execution-proof.md`):
+
+- **Tier 1** (configured e2e framework whose `app` source changed) — persist `e2e_eligible` /
+`e2e_outputs` to round context, then:
+
+```bash
78
|
+
codebyplan e2e verify-round --round-id <round_id> --task-id <task_id>
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Exit 0 = pass; exit 1 → surface `result.failed_checks[]` (`e2e_eligible_skipped` /
|
|
82
|
+
`zero_assertion_run` / `empty_gallery`) verbatim and carry as a fail.
|
|
83
|
+
- **Tier 2/3/4** — dev-server screenshot / HTTP trace / command log per the rule.
|
|
84
|
+
|
|
85
|
+
**Empty proof on a UI diff = GATE FAILURE.** Verify each screenshot/trace is committed with
|
|
86
|
+
`git ls-files --error-unmatch <path>`. Write the `verify_manifest` (gates + proof, schema in
|
|
87
|
+
`rules/execution-proof.md`). Per-scope detail: `reference/round-scope.md`, `reference/task-scope.md`.
|
|
88
|
+
|
|
89
|
+
### PHASE 4 — FRESH-CONTEXT DIFF REVIEW
|
|
90
|
+
|
|
91
|
+
Spawn `cbp-verify-reviewer` with `scope` (round → round diff; task → full task diff) and the
|
|
92
|
+
Input Contract from `agents/cbp-verify-reviewer.md`. **SPAWN FAILURE = HARD GATE FAILURE** → STOP
|
|
93
|
+
and surface the retry directive (`rules/spawn-failure-is-gate-failure.md`); record
|
|
94
|
+
`<scope>.context.verify.spawn_failure`; do NOT walk the reviewer's phases inline. A returned
|
|
95
|
+
`NOT_READY` is a successful review — act on it, do not retry.
|
|
96
|
+
|
|
97
|
+
Triage the returned findings: the orchestrator applies in-scope mechanical fixes itself
|
|
98
|
+
(`Edit`/`Write`); blocking out-of-scope findings → `/cbp-round-plan` fix round. A baseline
|
|
99
|
+
regression is a **blocking user-accept gate** — never auto-accepted.
|
|
100
|
+
|
|
101
|
+
### PHASE 5 — VERDICT + ROUTE (single directive, never an A/B/C menu)
|
|
102
|
+
|
|
103
|
+
Combine Phase 2 + 3 + 4. Route on one directive (`feedback-close-out-routing.md`):
|
|
104
|
+
|
|
105
|
+
| Result | Route |
|
|
106
|
+
|--------|-------|
|
|
107
|
+
| Any gate/proof/review fail | `Next: /cbp-round-plan` (open a fix round) |
|
|
108
|
+
| Pass + more work wanted | `Next: /cbp-round-plan` (another round) |
|
|
109
|
+
| Pass + LAST round + clean (scope=round) | escalate to `scope=task` → re-enter at Phase 1 |
|
|
110
|
+
| Pass (scope=task) | proceed to Phase 6 finalize |
|
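The routing table above can be read as a function that returns exactly one directive. The sketch below is illustrative only: `route`, `VerdictInput`, and its boolean flags are hypothetical names, and the final fall-through (a passing round that is neither last nor followed by more work goes to Phase 6) is an assumption, since the table does not list that row explicitly.

```typescript
type VerifyScope = "round" | "task";

interface VerdictInput {
  scope: VerifyScope;
  anyFail: boolean;        // any gate, proof, or review failure from Phases 2-4
  moreWorkWanted: boolean; // hypothetical flag: another round is requested
  isLastRound: boolean;    // hypothetical flag: only meaningful for scope=round
}

// Returns exactly one directive, mirroring the table row by row.
function route(input: VerdictInput): string {
  if (input.anyFail) return "Next: /cbp-round-plan (open a fix round)";
  if (input.moreWorkWanted) return "Next: /cbp-round-plan (another round)";
  if (input.scope === "round" && input.isLastRound) {
    return "escalate to scope=task and re-enter at Phase 1";
  }
  return "proceed to Phase 6 finalize";
}
```

Because every branch returns a single string, the function cannot produce an A/B/C menu, which matches the single-directive rule.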
|
111
|
+
|
|
112
|
+
### PHASE 6 — FINALIZE
|
|
113
|
+
|
|
114
|
+
- **scope=round** — route to the human git-add gate: `Next: /cbp-round-complete`
|
|
115
|
+
(`ask`-tier; reconciles `sync-approvals` + `complete_round`). cbp-verify does NOT stage files
|
|
116
|
+
or complete the round. Detail: `reference/round-scope.md`.
|
|
117
|
+
- **scope=task** — whole-repo `codebyplan check --scope task`, holistic `cbp-verify-reviewer`
|
|
118
|
+
(scope=task) already run in Phase 4, then the ONE genuine human step: a single batched
|
|
119
|
+
`AskUserQuestion` walkthrough (all user-testable items in one prompt, never one-per-question).
|
|
120
|
+
On satisfaction, write `task.context.verify_verdict = { verdict: 'READY', manifest, decided_at }`
|
|
121
|
+
and route `Next: /cbp-finalize`. Detail: `reference/task-scope.md`.
|
|
122
|
+
|
|
123
|
+
## Key Rules
|
|
124
|
+
|
|
125
|
+
- The JSON verdict from `codebyplan check` / `e2e verify-round` is authoritative — no prose
|
|
126
|
+
substitution.
|
|
127
|
+
- Reviewer spawn failure STOPS the skill (retry directive); never self-certify inline.
|
|
128
|
+
- Empty proof on a UI diff fails verify; screenshots must be committed.
|
|
129
|
+
- Claude NEVER `git add`s — staging is the user's approval signal at `cbp-round-complete`.
|
|
130
|
+
- Single-directive routing only — never an A/B/C menu.
|
|
131
|
+
- The `claude_only` profile is the deterministic-only carve-out (no reviewer spawn expected).
|
|
132
|
+
|
|
133
|
+
## Integration
|
|
134
|
+
|
|
135
|
+
- **Triggered by**: `/cbp-round-build` (auto, scope=round after execution); self-escalates to
|
|
136
|
+
scope=task on the last clean round.
|
|
137
|
+
- **Reads**: `.codebyplan/state/**` (local-first; `npx codebyplan sync` on miss; MCP `get_*`
|
|
138
|
+
break-glass); changed files + git diff via the reviewer.
|
|
139
|
+
- **Writes**: `codebyplan round update` / `codebyplan task update` (CLI write-through; MCP
|
|
140
|
+
`update_round` / `update_task` break-glass) — `verify_manifest`, `verify_verdict`.
|
|
141
|
+
- **Spawns**: `cbp-verify-reviewer` (scope param); the `cbp-e2e-*` specialists feed Tier-1 proof
|
|
142
|
+
upstream in `cbp-round-build`.
|
|
143
|
+
- **Triggers**: `/cbp-round-plan` (any fail or more-work), `/cbp-round-complete` (scope=round
|
|
144
|
+
finalize), `/cbp-finalize` (scope=task READY).
|
|
145
|
+
- **References**: `reference/round-scope.md`, `reference/task-scope.md`,
|
|
146
|
+
`reference/deterministic-gates.md`.
|
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
# Deterministic Gates — Command Contracts & Manifest
|
|
2
|
+
|
|
3
|
+
Authoritative gate-command + manifest detail for `cbp-verify`. The SKILL.md phases point here;
|
|
4
|
+
this file is loaded on demand.
|
|
5
|
+
|
|
6
|
+
## KIND tool table
|
|
7
|
+
|
|
8
|
+
KIND is detected once at SKILL Phase 1 from the identifier shape. MCP tool names differ by KIND;
|
|
9
|
+
all writes prefer the CLI write-through and fall back to MCP.
|
|
10
|
+
|
|
11
|
+
| Operation | `checkpoint` KIND | `standalone` KIND |
|
|
12
|
+
|-----------|------------------|-------------------|
|
|
13
|
+
| Get task | local state (break-glass `get_current_task`) | `get_current_standalone_task(repo_id)` |
|
|
14
|
+
| Get rounds | local state (break-glass `get_rounds`) | `get_standalone_rounds(standalone_task_id)` |
|
|
15
|
+
| Update round | `codebyplan round update` (MCP `update_round`) | MCP `update_standalone_round` |
|
|
16
|
+
| Update task | `codebyplan task update` (MCP `update_task`) | MCP `update_standalone_task` |
|
|
17
|
+
|
|
18
|
+
Empty-arg KIND detection: probe `get_current_standalone_task` first; if found → `standalone`;
|
|
19
|
+
else `checkpoint` via `get_current_task`. (KIND detection is MCP-unavoidable — no identifier yet
|
|
20
|
+
means no local path to probe; everything after is local-first.)
|
|
21
|
+
|
|
22
|
+
## Phase 1 resolution detail
|
|
23
|
+
|
|
24
|
+
| Parse | Resolution |
|
|
25
|
+
|-------|-----------|
|
|
26
|
+
| `{chk}-{task}-{round}` | checkpoint round. Read `.codebyplan/state/checkpoints/*.json` → filter `number==={chk}`; `.../tasks/*.json` → `{task}`; `.../rounds/*.json` → `{round}`. |
|
|
27
|
+
| `{chk}-{task}` | checkpoint task (scope=task). Resolve checkpoint + task; verify all rounds `completed`. |
|
|
28
|
+
| `{task}-{round}` | standalone round (scope=round). |
|
|
29
|
+
| `{task}` (bare) | standalone task (scope=task). |
|
|
30
|
+
| _(empty)_ | the active in-progress task/round from `.codebyplan/state/todos.json`. |
|
|
31
|
+
|
|
32
|
+
On any miss: run `npx codebyplan sync` once and re-read; use MCP `get_*` as break-glass only when the state dir
|
|
33
|
+
is absent AND sync fails.
|
|
34
|
+
|
|
35
|
+
## Phase 2 — `codebyplan check`
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
codebyplan check --scope <round|task> --json
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
JSON shape (`RunCheckResult`, source `packages/codebyplan-package/src/lib/check.ts:185`):
|
|
42
|
+
|
|
43
|
+
```jsonc
|
|
44
|
+
{
|
|
45
|
+
"results": [
|
|
46
|
+
{ "check": "gate6|lint|typecheck|tests|audit",
|
|
47
|
+
"status": "pass|fail|skipped",
|
|
48
|
+
"exit_code": 0,
|
|
49
|
+
"command": "...",
|
|
50
|
+
"stdout": "...", "stderr": "...",
|
|
51
|
+
"executed": true,
|
|
52
|
+
"new_failures": ["@scope/pkg", "GHSA-xxxx"] } // omitted for gate6
|
|
53
|
+
],
|
|
54
|
+
"any_failed": false,
|
|
55
|
+
"hard_fail_checks": [], // names of checks that failed post-baseline-diff
|
|
56
|
+
"no_baseline": false
|
|
57
|
+
}
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
- **`gate6`** (sibling-identity parity) is ALWAYS hard — never baselined, no `new_failures` field.
|
|
61
|
+
- `lint` / `typecheck` / `tests` / `audit` are **baseline-diffed**: `status: 'pass'` when
|
|
62
|
+
`new_failures` is `[]` even if the underlying command exited non-zero (pre-existing red is
|
|
63
|
+
tolerated). `audit.new_failures` lists new GHSA ids not in the allowlist.
|
|
64
|
+
- Verdict: `any_failed === true` (≡ `hard_fail_checks.length > 0`) is a fail — surface each failing
|
|
65
|
+
result's `new_failures` / `stdout` / `stderr`. **This JSON is the verdict; never substitute prose.**
|
|
66
|
+
- The soft tier never passes `--no-baseline`. The whole-repo absolute-green tier
|
|
67
|
+
(`--scope merged --no-baseline`) belongs to checkpoint close, not this skill
|
|
68
|
+
(`rules/two-tier-ci.md`).
|
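As a rough illustration of treating the JSON as the verdict, the sketch below parses the output and reports only what the JSON itself says. The field names follow the shape shown above; the TypeScript types and the `summarizeCheck` helper are assumptions made for this example and are not taken from `check.ts`.

```typescript
// Field names follow the JSON shape above; the types are an assumption.
interface CheckResult {
  check: string;
  status: "pass" | "fail" | "skipped";
  exit_code: number;
  new_failures?: string[]; // omitted for gate6
}

interface RunCheckResult {
  results: CheckResult[];
  any_failed: boolean;
  hard_fail_checks: string[];
  no_baseline: boolean;
}

// Hypothetical helper: turns the JSON into a pass/fail plus the lines to surface.
function summarizeCheck(json: string): { failed: boolean; report: string[] } {
  const parsed = JSON.parse(json) as RunCheckResult;
  const report = parsed.results
    // Trust `status`, not `exit_code`: a baseline-tolerated check can exit non-zero and still pass.
    .filter((r) => r.status === "fail")
    .map((r) => {
      const fresh = (r.new_failures ?? []).join(", ") || "n/a";
      return `${r.check}: exit ${r.exit_code}; new failures: ${fresh}`;
    });
  return { failed: parsed.any_failed, report };
}
```

Filtering on `status` rather than `exit_code` reflects the baseline-diff behaviour described in the bullets: pre-existing red does not count as a failure.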
|
69
|
+
|
|
70
|
+
## Phase 3 — `codebyplan e2e verify-round`
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
codebyplan e2e verify-round --round-id <uuid> --task-id <uuid>
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Persist `round.context.e2e_eligible[]` + `e2e_outputs{}` FIRST (the CLI reads the round row from
|
|
77
|
+
the DB). Verdict JSON (`VerifyRoundResult`, source `packages/codebyplan-package/src/lib/e2e.ts:127`):
|
|
78
|
+
|
|
79
|
+
```jsonc
|
|
80
|
+
{ "round_id": "...", "task_id": "...",
|
|
81
|
+
"result": { "pass": true, "failed_checks": [], "skipped_validly": [] } }
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
Exit 0 = pass. Exit 1 → one or more of `e2e_eligible_skipped` / `zero_assertion_run` /
|
|
85
|
+
`empty_gallery` in `result.failed_checks[]` — surface verbatim, carry as a fail, route to a fix
|
|
86
|
+
round (`rules/e2e-mandatory.md`). When `e2e_eligible[]` is empty, skip the call — nothing to verify.
|
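The skip/pass/fail handling above could be sketched as follows. This is a hedged example: `e2eOutcome` and the injected `run` callback are hypothetical, the element types of `failed_checks` and `skipped_validly` are assumed to be strings, and the sketch assumes the verdict JSON arrives on stdout, which the text does not state.

```typescript
interface VerifyRoundResult {
  round_id: string;
  task_id: string;
  result: { pass: boolean; failed_checks: string[]; skipped_validly: string[] };
}

// Hypothetical helper: decides whether to call the CLI at all and how to read its output.
function e2eOutcome(
  eligible: string[],
  // Injected; assumed to wrap `codebyplan e2e verify-round` and capture stdout.
  run: () => { exitCode: number; stdout: string },
): { status: "skipped" | "pass" | "fail"; failedChecks: string[] } {
  // Nothing eligible: skip the call entirely.
  if (eligible.length === 0) return { status: "skipped", failedChecks: [] };

  const { exitCode, stdout } = run();
  if (exitCode === 0) return { status: "pass", failedChecks: [] };

  // Non-zero exit: surface the failed checks verbatim.
  const parsed = JSON.parse(stdout) as VerifyRoundResult;
  return { status: "fail", failedChecks: parsed.result.failed_checks };
}
```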
|
87
|
+
|
|
88
|
+
## `claude_only` carve-out (deterministic-only path)
|
|
89
|
+
|
|
90
|
+
When the resolved profile is `claude_only` (round touched only `.claude/**` / docs / config — no
|
|
91
|
+
app surface), there is **no reviewer to spawn by design**. Proof IS the deterministic set:
|
|
92
|
+
|
|
93
|
+
1. `codebyplan check --scope <round|task> --json` (gate6 + matrix as above).
|
|
94
|
+
2. `bash -n <hook>` for each touched `.sh` file.
|
|
95
|
+
3. SKILL/agent/rule structure sanity for touched `.claude/` files (line counts, no `/cbp-*`
|
|
96
|
+
legacy notation).
|
|
97
|
+
|
|
98
|
+
This is a first-class verification path, NOT a banned inline fallback
|
|
99
|
+
(`rules/spawn-failure-is-gate-failure.md` carve-out) — Phase 4's reviewer spawn is skipped, and
|
|
100
|
+
that skip is recorded as `verify_manifest.proof.tier: 4`, not a spawn failure.
|
|
101
|
+
|
|
102
|
+
## verify-manifest write
|
|
103
|
+
|
|
104
|
+
Write the manifest into round/task context (merge into existing context — the `update_*`
|
|
105
|
+
REPLACE contract requires re-sending the full object):
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
codebyplan round update --id <round_id> --task-id <uuid> --checkpoint-id <uuid> --context '<json>'
|
|
109
|
+
# break-glass: MCP update_round / update_standalone_round
|
|
110
|
+
```
|
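Because the update replaces the whole context, the manifest has to be merged into the existing object before it is sent. A minimal sketch of that merge, assuming a shallow merge is sufficient; `withManifest` is a hypothetical helper, and its result would be passed as the `--context` JSON argument:

```typescript
type Context = Record<string, unknown>;

// Hypothetical helper: shallow-merges the manifest into the existing context so the
// full object can be re-sent in one update call.
function withManifest(existing: Context, manifest: Context): string {
  const merged: Context = { ...existing, verify_manifest: manifest };
  return JSON.stringify(merged);
}
```

Spreading `existing` first preserves keys such as `e2e_eligible` that an update carrying only the manifest would otherwise drop.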
|
111
|
+
|
|
112
|
+
Schema (canonical in `rules/execution-proof.md`): `verify_manifest = { scope, gates[], proof{ tier,
|
|
113
|
+
artifacts[], e2e_verify_round }, decided_at }`. Each `proof.artifacts[].path` is proven committed
|
|
114
|
+
via `git ls-files --error-unmatch <path>` before it counts.
|
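The artifact check above can be sketched as a filter over the manifest's paths. The runner is injected so the example stays self-contained: `GitLsFiles` and `uncommittedArtifacts` are hypothetical names, and in real use the runner would be assumed to shell out to `git ls-files --error-unmatch <path>` and return its exit code.

```typescript
// Injected runner: returns the exit code of `git ls-files --error-unmatch <path>`.
// git exits non-zero when the path is not tracked.
type GitLsFiles = (path: string) => number;

// Hypothetical helper: lists the artifact paths that fail the check.
function uncommittedArtifacts(paths: string[], lsFiles: GitLsFiles): string[] {
  return paths.filter((p) => lsFiles(p) !== 0);
}
```

An empty result means every listed artifact passed the check; any returned path should not count as proof.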