maxsimcli 5.0.6 → 5.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +316 -288
- package/dist/assets/CHANGELOG.md +14 -0
- package/dist/assets/hooks/maxsim-capture-learnings.cjs +128 -0
- package/dist/assets/hooks/maxsim-capture-learnings.cjs.map +1 -0
- package/dist/assets/hooks/maxsim-check-update.cjs +126 -88
- package/dist/assets/hooks/maxsim-check-update.cjs.map +1 -1
- package/dist/assets/hooks/maxsim-notification-sound.cjs +87 -43
- package/dist/assets/hooks/maxsim-notification-sound.cjs.map +1 -1
- package/dist/assets/hooks/maxsim-statusline.cjs +45 -171
- package/dist/assets/hooks/maxsim-statusline.cjs.map +1 -1
- package/dist/assets/hooks/maxsim-stop-sound.cjs +86 -43
- package/dist/assets/hooks/maxsim-stop-sound.cjs.map +1 -1
- package/dist/assets/hooks/maxsim-sync-reminder.cjs +72 -21
- package/dist/assets/hooks/maxsim-sync-reminder.cjs.map +1 -1
- package/dist/assets/templates/agents/AGENTS.md +62 -51
- package/dist/assets/templates/agents/executor.md +44 -59
- package/dist/assets/templates/agents/planner.md +36 -31
- package/dist/assets/templates/agents/researcher.md +35 -43
- package/dist/assets/templates/agents/verifier.md +29 -31
- package/dist/assets/templates/commands/maxsim/debug.md +20 -154
- package/dist/assets/templates/commands/maxsim/execute.md +19 -33
- package/dist/assets/templates/commands/maxsim/go.md +21 -20
- package/dist/assets/templates/commands/maxsim/help.md +5 -14
- package/dist/assets/templates/commands/maxsim/init.md +18 -40
- package/dist/assets/templates/commands/maxsim/plan.md +22 -37
- package/dist/assets/templates/commands/maxsim/progress.md +15 -16
- package/dist/assets/templates/commands/maxsim/quick.md +18 -29
- package/dist/assets/templates/commands/maxsim/settings.md +18 -26
- package/dist/assets/templates/references/continuation-format.md +2 -4
- package/dist/assets/templates/references/model-profiles.md +2 -2
- package/dist/assets/templates/references/planning-config.md +10 -11
- package/dist/assets/templates/references/self-improvement.md +120 -0
- package/dist/assets/templates/rules/conventions.md +1 -1
- package/dist/assets/templates/rules/verification-protocol.md +1 -1
- package/dist/assets/templates/skills/brainstorming/SKILL.md +35 -26
- package/dist/assets/templates/skills/code-review/SKILL.md +78 -55
- package/dist/assets/templates/skills/commit-conventions/SKILL.md +70 -36
- package/dist/assets/templates/skills/github-operations/SKILL.md +142 -0
- package/dist/assets/templates/skills/handoff-contract/SKILL.md +62 -28
- package/dist/assets/templates/skills/maxsim-batch/SKILL.md +68 -42
- package/dist/assets/templates/skills/maxsim-simplify/SKILL.md +65 -40
- package/dist/assets/templates/skills/project-memory/SKILL.md +121 -0
- package/dist/assets/templates/skills/research/SKILL.md +126 -0
- package/dist/assets/templates/skills/roadmap-writing/SKILL.md +71 -68
- package/dist/assets/templates/skills/systematic-debugging/SKILL.md +37 -25
- package/dist/assets/templates/skills/tdd/SKILL.md +36 -39
- package/dist/assets/templates/skills/using-maxsim/SKILL.md +69 -55
- package/dist/assets/templates/skills/verification/SKILL.md +167 -0
- package/dist/assets/templates/workflows/batch.md +249 -268
- package/dist/assets/templates/workflows/diagnose-issues.md +225 -151
- package/dist/assets/templates/workflows/execute-plan.md +191 -981
- package/dist/assets/templates/workflows/execute.md +350 -309
- package/dist/assets/templates/workflows/go.md +119 -138
- package/dist/assets/templates/workflows/health.md +71 -114
- package/dist/assets/templates/workflows/help.md +85 -147
- package/dist/assets/templates/workflows/init-existing.md +180 -1373
- package/dist/assets/templates/workflows/init.md +53 -165
- package/dist/assets/templates/workflows/new-milestone.md +91 -334
- package/dist/assets/templates/workflows/new-project.md +165 -1384
- package/dist/assets/templates/workflows/plan-create.md +182 -73
- package/dist/assets/templates/workflows/plan-discuss.md +89 -82
- package/dist/assets/templates/workflows/plan-research.md +191 -85
- package/dist/assets/templates/workflows/plan.md +122 -58
- package/dist/assets/templates/workflows/progress.md +76 -310
- package/dist/assets/templates/workflows/quick.md +70 -495
- package/dist/assets/templates/workflows/sdd.md +231 -221
- package/dist/assets/templates/workflows/settings.md +90 -120
- package/dist/assets/templates/workflows/verify-phase.md +296 -258
- package/dist/cli.cjs +17 -23465
- package/dist/cli.cjs.map +1 -1
- package/dist/install.cjs +356 -8358
- package/dist/install.cjs.map +1 -1
- package/package.json +16 -22
- package/dist/assets/templates/skills/agent-system-map/SKILL.md +0 -92
- package/dist/assets/templates/skills/evidence-collection/SKILL.md +0 -87
- package/dist/assets/templates/skills/github-artifact-protocol/SKILL.md +0 -67
- package/dist/assets/templates/skills/github-tools-guide/SKILL.md +0 -89
- package/dist/assets/templates/skills/input-validation/SKILL.md +0 -51
- package/dist/assets/templates/skills/memory-management/SKILL.md +0 -75
- package/dist/assets/templates/skills/research-methodology/SKILL.md +0 -137
- package/dist/assets/templates/skills/sdd/SKILL.md +0 -91
- package/dist/assets/templates/skills/tool-priority-guide/SKILL.md +0 -80
- package/dist/assets/templates/skills/verification-before-completion/SKILL.md +0 -71
- package/dist/assets/templates/skills/verification-gates/SKILL.md +0 -169
- package/dist/assets/templates/workflows/discuss-phase.md +0 -683
- package/dist/assets/templates/workflows/research-phase.md +0 -73
- package/dist/assets/templates/workflows/verify-work.md +0 -572
- package/dist/core-D5zUr9cb.cjs +0 -4305
- package/dist/core-D5zUr9cb.cjs.map +0 -1
- package/dist/skills-CjFWZIGM.cjs +0 -6824
- package/dist/skills-CjFWZIGM.cjs.map +0 -1
@@ -1,91 +0,0 @@
----
-name: sdd
-description: >-
-  Spec-driven development with fresh-agent-per-task execution. Prevents context
-  rot by isolating each task in a clean context window with its spec. Use when
-  executing multi-task plans, orchestrating agent work, or when context
-  accumulation degrades quality.
----
-
-# Spec-Driven Development (SDD)
-
-Execute tasks sequentially, each in a fresh agent with clean context. Verify every task before moving to the next.
-
-## Why SDD
-
-Context rot is the primary failure mode for multi-task execution. As an agent processes more tasks, earlier context competes with later instructions. Quality degrades predictably after 3-5 tasks in a single context window. SDD solves this by giving each task a fresh context with only its specification.
-
-## The SDD Process
-
-### 1. LOAD -- Read the Plan
-
-- Read the plan file (PLAN.md) to get the ordered task list
-- For each task: description, acceptance criteria, relevant files
-- Confirm task order respects dependencies
-
-### 2. DISPATCH -- Spawn Fresh Agent Per Task
-
-For each task in order:
-
-1. Assemble minimal task context:
-   - Task description and acceptance criteria from the plan
-   - Only the files relevant to this specific task
-   - Results from previous tasks (commit hashes, created files) -- NOT the full previous context
-2. Spawn a fresh agent with this minimal context
-3. The agent implements the task, runs verification, and commits
-
-### 3. REVIEW -- Two-Stage Quality Gate
-
-After each task completes:
-
-**Stage 1: Spec Compliance** -- Does the implementation match the task spec? Are all acceptance criteria met? Were only specified files modified?
-
-**Stage 2: Code Quality** -- Are there bugs, edge cases, or error handling gaps? Is the code consistent with codebase conventions? Do all tests pass?
-
-Verdict: PASS or FAIL with specific issues per stage.
-
-### 4. FIX -- Address Review Failures
-
-If either review stage fails:
-
-1. Spawn a NEW fresh agent with original task spec + review feedback + current file state
-2. Fix agent addresses ONLY the review issues -- no new features
-3. Re-run both review stages
-4. If 3 fix attempts fail: STOP and escalate
-
-### 5. ADVANCE -- Move to Next Task
-
-Only after both review stages pass:
-
-- Record task as complete with commit hash
-- Pass minimal summary (not full context) to the next task
-
-## Context Management
-
-Each agent receives ONLY what it needs:
-
-| Context Item | Included? |
-|---|---|
-| Task description + acceptance criteria | Always |
-| Files relevant to this task | Always |
-| Previous task commit hashes | Always |
-| Previous task full diff | Never |
-| Previous task agent conversation | Never |
-| Full codebase | Never -- only specified files |
-
-The point of SDD is fresh context. Loading the previous agent's full context defeats the purpose.
-
-## When to Use SDD
-
-- **Good fit:** Multi-task plans (3+ tasks), sequential work where each task builds on the previous, implementations where quality degrades over time
-- **Poor fit:** Single-task work, highly interactive tasks requiring user feedback, tasks that share significant overlapping context
-
-## Common Pitfalls
-
-| Pitfall | Why It Matters |
-|---|---|
-| Skipping review for simple tasks | Simple tasks still have bugs. Review catches what the implementer missed. |
-| Passing full context forward | Full context causes the exact rot SDD is designed to prevent. |
-| Deferring fixes to the next task | The next task's agent does not know about the bug. Fix it now. |
-
-See also: `/verification-before-completion` for the evidence-based verification methodology used within each SDD task.
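For readers who want to see the removed `sdd` skill's loop (LOAD, DISPATCH, REVIEW, FIX, ADVANCE) in executable form, the following is a minimal Python sketch. It is not code from maxsimcli: `Task`, `TaskResult`, `Review`, `build_context`, `run_plan`, and the `spawn_agent` / `review` callables are hypothetical names invented for this illustration. The skill does not say whether the limit of 3 counts the first implementation, so the sketch assumes it counts fix attempts only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_FIX_ATTEMPTS = 3  # assumption: the limit applies to fix attempts, not the first run


@dataclass
class Task:
    description: str
    acceptance_criteria: List[str]
    relevant_files: List[str]


@dataclass
class TaskResult:
    commit_hash: str
    created_files: List[str] = field(default_factory=list)


@dataclass
class Review:
    spec_ok: bool     # Stage 1: spec compliance
    quality_ok: bool  # Stage 2: code quality
    issues: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return self.spec_ok and self.quality_ok


def build_context(task: Task, previous: List[TaskResult]) -> Dict:
    """Minimal context: the task spec plus short summaries of earlier results."""
    return {
        "description": task.description,
        "acceptance_criteria": task.acceptance_criteria,
        "files": task.relevant_files,
        # Only commit hashes and created files, never the earlier conversation.
        "previous": [
            {"commit": r.commit_hash, "created": r.created_files} for r in previous
        ],
    }


def run_plan(
    tasks: List[Task],
    spawn_agent: Callable[[Dict], TaskResult],
    review: Callable[[Task, TaskResult], Review],
) -> List[TaskResult]:
    completed: List[TaskResult] = []
    for task in tasks:
        context = build_context(task, completed)
        result = spawn_agent(context)      # DISPATCH: fresh agent per task
        verdict = review(task, result)     # REVIEW: both stages
        attempts = 0
        while not verdict.passed:
            if attempts == MAX_FIX_ATTEMPTS:
                raise RuntimeError(
                    f"escalate: {task.description!r} still failing after "
                    f"{attempts} fix attempts"
                )
            attempts += 1
            # FIX: a new agent gets the original spec plus the review feedback.
            fix_context = dict(context, review_feedback=verdict.issues)
            result = spawn_agent(fix_context)
            verdict = review(task, result)
        completed.append(result)           # ADVANCE: record and move on
    return completed
```

The design point the sketch tries to preserve is that `build_context` is the only channel between tasks, so nothing beyond commit hashes and created files can leak forward.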
@@ -1,80 +0,0 @@
----
-name: tool-priority-guide
-description: >-
-  Tool selection guide for Claude Code operations. Maps common tasks to preferred
-  tools, explaining when to use Read over cat, Grep over rg, Glob over find,
-  Write over echo, and Edit over sed. Use when deciding which tool to use for
-  file operations, search, content modification, or web content retrieval.
-user-invocable: false
----
-
-# Tool Priority Guide
-
-Use dedicated Claude Code tools over Bash equivalents. Dedicated tools provide better permissions handling, output formatting, and user experience.
-
-## File Reading
-
-| Task | Use | Not |
-|------|-----|-----|
-| Read file contents | **Read tool** | `cat`, `head`, `tail` via Bash |
-| Read specific lines | **Read tool** (with offset/limit) | `sed -n 'X,Yp'` via Bash |
-| Read images | **Read tool** (multimodal) | Not possible via Bash |
-| Read PDFs | **Read tool** (with pages param) | `pdftotext` via Bash |
-
-**Why Read:** Handles permissions, large files, binary formats. Returns line-numbered output.
-
-## File Writing
-
-| Task | Use | Not |
-|------|-----|-----|
-| Create new file | **Write tool** | `echo > file`, `cat <<EOF` via Bash |
-| Rewrite entire file | **Write tool** (after Read) | `cat > file` via Bash |
-| Modify part of file | **Edit tool** | `sed`, `awk` via Bash |
-| Rename string across file | **Edit tool** (replace_all) | `sed -i 's/old/new/g'` via Bash |
-
-**Why Write/Edit:** Atomic operations, preserves encoding, provides diff view for review.
-
-## Searching
-
-| Task | Use | Not |
-|------|-----|-----|
-| Search file contents | **Grep tool** | `grep`, `rg` via Bash |
-| Find files by pattern | **Glob tool** | `find`, `ls -R` via Bash |
-| Search with context | **Grep tool** (-A, -B, -C params) | `grep -C N` via Bash |
-| Count matches | **Grep tool** (output_mode: count) | `grep -c` via Bash |
-
-**Why Grep/Glob:** Optimized permissions, structured output, result limiting.
-
-## Web Content
-
-| Task | Use | Not |
-|------|-----|-----|
-| Fetch documentation | **WebFetch tool** | `curl` via Bash |
-| Read API responses | **WebFetch tool** | `curl | jq` via Bash |
-| Download files | **Bash** (`curl -O`) | WebFetch (not for binary downloads) |
-
-**Why WebFetch:** Handles authentication, follows redirects, parses HTML.
-
-## When Bash IS the Right Tool
-
-| Task | Why Bash |
-|------|---------|
-| Run build/test commands | `npm test`, `npm run build` -- no dedicated tool |
-| Git operations | `git status`, `git commit` -- no dedicated tool |
-| Install dependencies | `npm install` -- no dedicated tool |
-| Check file existence | `test -f path` -- lightweight, often part of larger commands |
-| Run project CLI tools | Project-specific commands -- no dedicated tool |
-| Chained operations | Multiple sequential commands with `&&` |
-
-## Quick Reference
-
-```
-Read file --> Read tool
-Write file --> Write tool (new) or Edit tool (modify)
-Search code --> Grep tool
-Find files --> Glob tool
-Fetch URL --> WebFetch tool
-Run commands --> Bash tool
-```
-
-The general principle: if a dedicated tool exists for the operation, use it. Fall back to Bash only when no dedicated tool covers the task.
@@ -1,71 +0,0 @@
----
-name: verification-before-completion
-description: >-
-  Requires running verification commands and reading actual output before
-  completion claims. Covers the 5-step verification process and evidence block
-  format. Use when claiming work is done, tests pass, builds succeed, or bugs
-  are fixed.
----
-
-# Verification Before Completion
-
-Evidence before claims, always. No exceptions.
-
-## The 5-Step Process
-
-Before claiming any status or marking a task done:
-
-1. **IDENTIFY** -- What command proves this claim?
-2. **RUN** -- Execute the command fresh in this turn (not a previous run)
-3. **READ** -- Read the full output, check the exit code, count failures
-4. **VERIFY** -- Does the output actually confirm the claim? If NO: state the actual status with evidence. If YES: proceed.
-5. **CLAIM** -- Only now may you assert completion
-
-### Evidence Block Format
-
-When claiming task completion, build completion, or test passage, produce:
-
-```
-CLAIM: [what you are claiming]
-EVIDENCE: [exact command run in this turn]
-OUTPUT: [relevant excerpt of actual output]
-VERDICT: PASS | FAIL
-```
-
-This format is required for task completion claims in MAXSIM plan execution.
-
-## What Counts as Verification
-
-| Claim | Requires | Not Sufficient |
-|-------|----------|----------------|
-| "Tests pass" | Test command output showing 0 failures | Previous run, "should pass", partial run |
-| "Build succeeds" | Build command with exit code 0 | Linter passing, "logs look clean" |
-| "Bug is fixed" | Original failing test now passes | "Code changed, assumed fixed" |
-| "Task is complete" | All done criteria checked with evidence | "I implemented everything in the plan" |
-| "No regressions" | Full test suite passing | "I only changed one file" |
-
-## Common Pitfalls
-
-| Excuse | Why It Fails |
-|--------|-------------|
-| "Should work now" | "Should" is not evidence. Run the command. |
-| "I'm confident in the logic" | Confidence is not evidence. Run it. |
-| "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
-| "I only changed one line" | One line can break everything. Verify. |
-| "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
-
-Stop if you catch yourself using "should", "probably", or "looks good" about unverified work, or expressing satisfaction before running verification.
-
-## Verification Checklist
-
-Before marking any work as complete:
-
-- [ ] Identified the verification command for every claim
-- [ ] Ran each verification command fresh in this turn
-- [ ] Read the full output (not just the summary line)
-- [ ] Checked exit codes (0 = success, non-zero = failure)
-- [ ] Evidence supports every completion claim
-- [ ] No "should", "probably", or "seems to" in your completion statement
-- [ ] Evidence block produced for the task completion claim
-
-See also: `/verification-gates` for the full gate framework with retry logic and escalation protocol.
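The RUN, READ, and CLAIM steps of the removed `verification-before-completion` skill can be sketched in a few lines of Python. This is an illustration only, not maxsimcli code: `evidence_block` and its `max_output_lines` parameter are hypothetical, the exit code shown on the EVIDENCE line is an addition of this sketch, and deriving the verdict from the exit code alone is a simplifying assumption (the skill also asks for the full output to be read and failures counted).

```python
import shlex
import subprocess
import sys
from typing import List


def evidence_block(claim: str, command: List[str], max_output_lines: int = 20) -> str:
    """Run `command` fresh and format the result as an evidence block."""
    proc = subprocess.run(command, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    excerpt = "\n".join(lines[-max_output_lines:])  # keep only the tail of long logs
    # Assumption: exit code 0 means the claim holds; a real check would also
    # inspect the output (for example, the reported failure count).
    verdict = "PASS" if proc.returncode == 0 else "FAIL"
    return (
        f"CLAIM: {claim}\n"
        f"EVIDENCE: {shlex.join(command)} (exit code {proc.returncode})\n"
        f"OUTPUT: {excerpt}\n"
        f"VERDICT: {verdict}"
    )


if __name__ == "__main__":
    # Usage example: a trivially verifiable claim about the local interpreter.
    print(evidence_block("Python interpreter starts", [sys.executable, "--version"]))
```

Because the command is executed inside the function, the block can never be built from a stale earlier run, which is the property the skill insists on.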
@@ -1,169 +0,0 @@
----
-name: verification-gates
-description: >-
-  Hard gate framework for evidence-based verification. Defines four gate types
-  (input validation, pre-action, completion, quality), retry logic with feedback,
-  anti-rationalization enforcement, and escalation protocol. Use when implementing
-  verification checkpoints, completion gates, or quality checks.
-user-invocable: false
----
-
-# Verification Gates
-
-Evidence before claims, always. No exceptions.
-
-## Gate Types
-
-### 1. Input Validation Gate
-
-**When:** Before starting any work.
-**Purpose:** Verify all required inputs exist (files, env vars, CLI args, state).
-
-**Evidence required:**
-- File existence checks (`test -f path`)
-- Environment variable checks
-- State file readability
-
-**On failure:** Return structured error immediately. Do NOT attempt partial work.
-
-```
-AGENT RESULT: INPUT VALIDATION FAILED
-Missing: [list of missing inputs]
-Expected from: [source -- orchestrator, user, prior agent]
-```
-
-### 2. Pre-Action Gate
-
-**When:** Before destructive actions (file writes, git commits, PRs, deployments).
-**Purpose:** Verify intent and impact before irreversible changes.
-
-**Evidence required:**
-- State what will be changed
-- Confirm target files/branches are correct
-- Verify no unintended side effects (e.g., `git status` before commit)
-
-**On failure:** Abort the action. Report what was wrong and what would have happened.
-
-### 3. Completion Gate
-
-**When:** Before claiming any task, plan, or phase is done.
-**Purpose:** Verify all done criteria are met with fresh tool output.
-
-**HARD GATE -- No completion claims without fresh verification evidence.**
-
-Do NOT pass this gate by arguing it's "close enough", "minor issue", or "will fix later".
-Either evidence passes or it fails. No middle ground.
-Partial success is failure. "Good enough" is not enough.
-
-If you have not run the verification command in THIS turn, you cannot claim it passes.
-
-**Evidence required:**
-- Run every verification command from the task's verify block
-- Check every item in the done criteria list
-- Produce an evidence block for each claim
-
-### 4. Quality Gate
-
-**When:** After implementation, before marking work as shippable.
-**Purpose:** Verify code quality standards are met.
-
-**Evidence required:**
-- Test suite output (all passing, zero failures)
-- Build output (exit code 0)
-- Lint output (zero errors -- warnings acceptable if project allows)
-
-**On failure:** Fix quality issues before proceeding. Do not defer quality failures.
-
-## Anti-Rationalization
-
-FORBIDDEN PHRASES -- if you catch yourself using these, STOP. You are rationalizing:
-
-- "should work"
-- "probably passes"
-- "I'm confident that..."
-- "based on my analysis..."
-- "the logic suggests..."
-- "it's reasonable to assume..."
-
-These phrases replace evidence with reasoning. The gate requires tool output, not arguments.
-
-Additional forbidden rationalizations:
-- "It's close enough" -- close is not done
-- "Minor issue, will fix later" -- later is never
-- "The logic is correct so it must pass" -- run it and find out
-- "I already verified this in a previous step" -- previous steps are stale; verify now
-
-## Evidence Standard
-
-Any tool output qualifies as evidence: test output, build results, git diff, file reads, linter output, command exit codes.
-
-| Claim | Requires | Not Sufficient |
-|-------|----------|----------------|
-| "Tests pass" | Test command output showing 0 failures | Previous run, "should pass" |
-| "Build succeeds" | Build command with exit code 0 | Linter passing only |
-| "Bug is fixed" | Original failing test now passes | "Code changed, assumed fixed" |
-| "Task complete" | All done criteria checked with evidence | "I implemented everything" |
-| "No regressions" | Full test suite passing | "I only changed one file" |
-| "File created" | `test -f path` or Read tool output | "I wrote it with Write tool" |
-
-## Evidence Block Format
-
-```
-CLAIM: [what you are claiming]
-EVIDENCE: [exact command run in THIS turn]
-OUTPUT: [relevant excerpt of actual output]
-VERDICT: PASS | FAIL
-```
-
-Produce one evidence block per claim. Group related claims if verified by the same command.
-
-## Retry Protocol
-
-Maximum 2 retries (3 total attempts) per gate.
-
-**Retry feedback loop:**
-1. Gate fails -- capture: what failed, expected result, actual result
-2. Analyze the failure output (do not guess; read the error)
-3. Fix the identified issue
-4. Re-run the verification command
-5. Produce a new evidence block
-
-Each retry MUST include in its evidence block:
-- Attempt number (1/3, 2/3, 3/3)
-- What changed since last attempt
-- Fresh verification output
-
-**After 3rd failure -- escalation:**
-
-Return full failure context to orchestrator:
-
-```markdown
-## GATE FAILURE -- ESCALATION
-
-**Gate:** [gate type]
-**Attempts:** 3/3
-**Final evidence:**
-CLAIM: [claim]
-EVIDENCE: [command]
-OUTPUT: [output]
-VERDICT: FAIL
-
-**History:**
-- Attempt 1: [what failed, what was tried]
-- Attempt 2: [what failed, what was tried]
-- Attempt 3: [what failed -- escalating]
-
-**Recommended action:** [what the orchestrator or user should do]
-```
-
-## Audit Trail
-
-Log every gate attempt to GitHub Issues as a comment on the active phase issue:
-
-- Gate name and type
-- Attempt number
-- Evidence provided (abbreviated)
-- PASS or FAIL result
-- Timestamp
-
-This creates an auditable record of all verification activity for debugging and improvement.
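The retry protocol in the removed `verification-gates` skill (at most 3 attempts per gate, each recording its attempt number, what changed, and fresh output, then escalation) can be illustrated with a short Python sketch. None of this is maxsimcli code: `Attempt`, `run_gate`, `escalation_report`, and the `verify` / `fix` callables are hypothetical names, and the report below reproduces only part of the escalation template.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ATTEMPTS = 3  # "Maximum 2 retries (3 total attempts) per gate"


@dataclass
class Attempt:
    number: int   # 1, 2, or 3
    passed: bool
    output: str   # fresh verification output for this attempt
    change: str   # what changed since the previous attempt


def run_gate(
    verify: Callable[[], Tuple[bool, str]],
    fix: Callable[[str], str],
) -> Tuple[bool, List[Attempt]]:
    """Re-run `verify` up to MAX_ATTEMPTS times, calling `fix` between failures.

    `fix` receives the failing output and returns a description of its change.
    """
    history: List[Attempt] = []
    change = "initial attempt"
    for number in range(1, MAX_ATTEMPTS + 1):
        passed, output = verify()  # always a fresh run, never a cached result
        history.append(Attempt(number, passed, output, change))
        if passed:
            return True, history
        if number < MAX_ATTEMPTS:
            change = fix(output)   # read the error, then fix the identified issue
    return False, history


def escalation_report(gate: str, history: List[Attempt]) -> str:
    """Render a reduced version of the escalation block for the orchestrator."""
    last = history[-1]
    lines = [
        "## GATE FAILURE -- ESCALATION",
        "",
        f"**Gate:** {gate}",
        f"**Attempts:** {last.number}/{MAX_ATTEMPTS}",
        "**History:**",
    ]
    lines += [f"- Attempt {a.number}: {a.output}" for a in history]
    return "\n".join(lines)
```

Keeping the full `history` list, instead of only the last result, is what makes both the escalation report and the audit trail described in the skill possible.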