crewkit 1.0.0 → 1.1.1
- package/package.json +1 -1
- package/skill/SKILL.md +9 -4
- package/skill/templates/agents/coder.md +1 -0
- package/skill/templates/skills/explore-and-plan/SKILL.md +21 -8
- package/skill/templates/skills/full-workflow/SKILL.md +2 -72
- package/skill/templates/skills/full-workflow/references/operational-policies.md +85 -0
package/package.json
CHANGED
package/skill/SKILL.md
CHANGED
@@ -901,18 +901,23 @@ Write calibrated hooks to `.claude/hooks/`.
 
 ---
 
-### Step 7 — `.claude/skills/` (templates
+### Step 7 — `.claude/skills/` (templates + references)
 
-Read all
-- `full-workflow/SKILL.md`
+Read all core skill templates from `~/.claude/skills/crewkit-setup/templates/skills/`:
+- `full-workflow/SKILL.md` + `full-workflow/references/`
 - `hotfix/SKILL.md`
 - `explore-and-plan/SKILL.md`
 - `review-pr/SKILL.md`
 
-Copy each skill template to `.claude/skills/[name]/SKILL.md`.
+Copy each skill template to `.claude/skills/[name]/SKILL.md`. **If the template has a `references/` subdirectory, copy it too.**
 
 These skill templates are **stack-agnostic** by design — they reference `.ai/memory/commands.md` for build/test commands and `.ai/memory/` for project context. No variable substitution needed.
 
+**Skill design principle — inline vs. references:**
+- **Inline** in SKILL.md: content needed on every invocation (commands, classification tables, return format)
+- **`references/`**: content needed only on certain branches (fix loop policies, stack-specific adapters, error catalogs). The SKILL.md tells the agent WHEN to load each reference.
+- Never extract always-needed content to references — it adds tool calls without benefit.
+
 ---
 
 ### Step 8 — `.claude/QUICKSTART.md` (onboarding guide)
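The copy rule added to Step 7 above can be sketched in Python. This is only an illustration: crewkit expresses the step as agent instructions, and the helper name `copy_skill_templates`, its parameters, and the `CORE_SKILLS` constant are assumptions made for this example, not part of the package.

```python
import shutil
from pathlib import Path

# Skill names taken from the Step 7 list in the diff above.
CORE_SKILLS = ["full-workflow", "hotfix", "explore-and-plan", "review-pr"]


def copy_skill_templates(templates_dir: Path, project_dir: Path) -> list[Path]:
    """Copy each SKILL.md, plus references/ when present, into .claude/skills/.

    Hypothetical helper: the real setup skill performs this step through
    agent instructions, not necessarily through code like this.
    """
    copied: list[Path] = []
    for name in CORE_SKILLS:
        src = templates_dir / name
        dst = project_dir / ".claude" / "skills" / name
        dst.mkdir(parents=True, exist_ok=True)

        shutil.copy2(src / "SKILL.md", dst / "SKILL.md")
        copied.append(dst / "SKILL.md")

        # references/ is optional: copy it only if the template ships one.
        refs = src / "references"
        if refs.is_dir():
            shutil.copytree(refs, dst / "references", dirs_exist_ok=True)
            copied.append(dst / "references")
    return copied
```

Checking for `references/` per template, rather than hard-coding `full-workflow`, mirrors the wording of the new rule, so a template that gains a `references/` directory later would be picked up without changes.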
package/skill/templates/agents/coder.md
CHANGED
@@ -47,6 +47,7 @@ If your project uses `.claude/rules/` directory rules, they are loaded automatic
 - Do not create a new file when adding to an existing file achieves the same goal
 - Do not add `TODO` comments — either fix it now or leave it for the plan
 - **NEVER create test files** — test creation is the **tester agent's exclusive responsibility**
+- **NEVER mark phases, tasks, or features as completed** in `napkin.md`, `state.md`, or any memory/status file — only the orchestrator does this, and only after tester PASS + reviewer APPROVED
 
 ## Return Format
 
package/skill/templates/skills/explore-and-plan/SKILL.md
CHANGED
@@ -52,16 +52,29 @@ Use **architect** with explorer findings. Must return:
 
 **The architect must NOT produce the plan.** Only analysis and decisions.
 
-### 3. Present decisions — MANDATORY PAUSE
+### 3. Present decisions — MANDATORY PAUSE (one-by-one)
 
-**DO NOT create the plan yet.**
-- Each decision with options, pros/cons, recommendation
-- Required vs compromise vs debt
-- Task size and key risks
-- Technical verdict
-- Ask user to confirm or override each decision
+**DO NOT create the plan yet.**
 
-**
+First, present a brief summary: task size, technical verdict, total number of decisions, and key risks. Then present decisions **one at a time**, waiting for user response before showing the next.
+
+**For each decision:**
+1. **Name the decision** clearly (e.g., "D1: How to store the onboarding flag")
+2. **Explain what it solves** — 1-2 sentences so the user understands WHY this decision matters
+3. **Present options as a table** with Pros and Cons columns
+4. **Explain the practical difference** — not abstract architecture, but what concretely changes for the user/system with each option
+5. **State your recommendation** with a clear prompt (e.g., "Go with A?")
+6. **Wait for the user to respond** before presenting the next decision
+
+**Rules:**
+- ONE decision per message. Never batch multiple decisions.
+- If the user agrees, confirm and move to the next immediately.
+- If the user disagrees, acknowledge the override and record it. Then move to the next.
+- If the user asks for more detail, explain further before asking again.
+- After ALL decisions are confirmed, show a complete summary table.
+- If scope is large, ask about scope reduction early (D1 or D2) since it affects all subsequent decisions.
+
+**Wait for ALL decisions to be resolved before proceeding to step 4.**
 
 ### 4. Create plan file (after confirmation)
 
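The one-by-one rule introduced in step 3 amounts to a loop that never advances until the current decision is resolved. A minimal Python sketch follows; the `Decision` type, the `resolve_decisions` function, and the reply convention of the `ask` callback are all hypothetical, since the skill itself is written as prose instructions for an agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    name: str            # e.g. "D1: How to store the onboarding flag"
    options: list[str]   # option labels, e.g. ["A", "B"]
    recommended: str     # one of the labels in options


def resolve_decisions(
    decisions: list[Decision],
    ask: Callable[[Decision], str],
) -> dict[str, str]:
    """Present decisions one at a time; return {decision name: chosen option}.

    `ask` is a hypothetical callback that shows ONE decision and returns the
    user's reply. Assumed convention: "" or "yes" accepts the recommendation,
    an option label overrides it, and anything else is a request for detail.
    """
    resolved: dict[str, str] = {}
    for decision in decisions:
        while True:
            reply = ask(decision).strip()
            if reply.lower() in ("", "yes"):
                resolved[decision.name] = decision.recommended
                break
            if reply in decision.options:
                resolved[decision.name] = reply  # user override, recorded
                break
            # Otherwise: explain further, then ask about the same decision again.
    return resolved
```

The returned mapping corresponds to the "complete summary table" shown after all decisions are confirmed; the plan file of step 4 would only be created once this function returns.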
package/skill/templates/skills/full-workflow/SKILL.md
CHANGED
@@ -121,81 +121,11 @@ If a durable lesson was learned, append to the appropriate `lessons-{domain}.md`
 
 ---
 
-
-
-## Exit gate
-
-**HARD BLOCK: No task is complete without reviewer APPROVED (clean).**
-
-- Tester PASS alone is **not sufficient**
-- Reviewer APPROVED is **mandatory** before Summarize
-- **APPROVED with IMPORTANT+ findings is NOT clean.** Fix, then re-run tester + reviewer.
-- Both must be clean (PASS + APPROVED without IMPORTANT+ findings) before Summarize.
-
-## Findings consolidation
-
-After tester and reviewer finish:
-
-1. **Collect** results from both
-2. **Classify:** Tester = PASS/FAIL. Reviewer = APPROVED/NEEDS_CHANGES
-3. **Deduplicate** — same file + same concern → keep higher severity
-4. **APPROVED with IMPORTANT+ findings** = treat as NEEDS_CHANGES
-5. **Decision matrix:**
-
-| Tester | Reviewer | Action |
-|--------|----------|--------|
-| PASS | APPROVED (clean) | Done → Summarize |
-| PASS | APPROVED with IMPORTANT+ | Fix loop |
-| PASS | NEEDS_CHANGES | Fix loop (reviewer findings) |
-| FAIL | APPROVED | Fix loop (test failures) |
-| FAIL | NEEDS_CHANGES | Fix loop (merge into ONE list for coder) |
-
-When both fail, call coder **once** with the merged list.
-
-## Fix loop
-
-1. **Fix:**
-- Risk **HIGH**: all fixes through **coder** — never auto-fix
-- Risk LOW/MEDIUM: `auto_fixable: yes` → orchestrator applies directly. Else → coder
-- When fix changes an exception type or interface → instruct coder to grep for all test doubles/fakes
-2. **Revalidate in parallel** (tester fix-loop mode + reviewer)
-3. Consolidate again
-4. Exit when PASS + APPROVED
-5. **Max 5 iterations** — then STOP and report to user.
-
-**MINOR findings** do not trigger fix loop alone.
-
-**Tester time budget:** if the tester reports pre-existing failures unrelated to the current task, the orchestrator must NOT ask the tester to fix them. Note them for a separate task and proceed.
-
-## Test creation rule
-
-**Every behavioral change must be validated by tests.** The tester creates them automatically.
-
-- New feature with logic → unit tests + integration when applicable
-- Bug fix → test that reproduces the bug + verifies the fix
-- Refactor with preserved behavior → existing tests are sufficient
-- Cosmetic/text/DTO change without logic → build + review is sufficient
-
-## HIGH risk rules
-
-- Never auto-fix — all through coder
-- Full test suite on every revalidation
-- Reviewer always mandatory
-- Architect mandatory if any design decision is open
-
-## Stop conditions
-
-STOP and escalate when:
-- Build doesn't stabilize after 2 corrections
-- Reviewer flags an architectural problem
-- Tester finds widespread failures outside task scope
-- Root cause unclear after 1 fix loop
-- Affected files grow beyond plan
-- SMALL/MEDIUM reveals structural impact
+> **Operational policies** (exit gate, fix loop, findings consolidation, stop conditions): load `references/operational-policies.md` when entering consolidation or fix loop.
 
 ---
 
-# Part
+# Part 2 — Stack Configuration
 
 The orchestrator must tell subagents which build/test commands to use. Read `.ai/memory/commands.md` at the start and use the correct commands for each stack.
 
package/skill/templates/skills/full-workflow/references/operational-policies.md
ADDED
@@ -0,0 +1,85 @@
+# Full-Workflow — Operational Policies
+
+Referenced by `SKILL.md`. Load when entering consolidation, fix loop, or stop conditions.
+
+---
+
+## Exit gate
+
+**HARD BLOCK: No task is complete without reviewer APPROVED (clean).**
+
+- Tester PASS alone is **not sufficient**
+- Reviewer APPROVED is **mandatory** before Summarize
+- **APPROVED with IMPORTANT+ findings is NOT clean.** Fix, then re-run tester + reviewer.
+- Both must be clean (PASS + APPROVED without IMPORTANT+ findings) before Summarize.
+
+---
+
+## Findings consolidation
+
+After tester and reviewer finish:
+
+1. **Collect** results from both
+2. **Classify:** Tester = PASS/FAIL. Reviewer = APPROVED/NEEDS_CHANGES
+3. **Deduplicate** — same file + same concern → keep higher severity
+4. **APPROVED with IMPORTANT+ findings** = treat as NEEDS_CHANGES
+5. **Decision matrix:**
+
+| Tester | Reviewer | Action |
+|--------|----------|--------|
+| PASS | APPROVED (clean) | Done → Summarize |
+| PASS | APPROVED with IMPORTANT+ | Fix loop |
+| PASS | NEEDS_CHANGES | Fix loop (reviewer findings) |
+| FAIL | APPROVED | Fix loop (test failures) |
+| FAIL | NEEDS_CHANGES | Fix loop (merge into ONE list for coder) |
+
+When both fail, call coder **once** with the merged list.
+
+---
+
+## Fix loop
+
+1. **Fix:**
+- Risk **HIGH**: all fixes through **coder** — never auto-fix
+- Risk LOW/MEDIUM: `auto_fixable: yes` → orchestrator applies directly. Else → coder
+- When fix changes an exception type or interface → instruct coder to grep for all test doubles/fakes
+2. **Revalidate in parallel** (tester fix-loop mode + reviewer)
+3. Consolidate again
+4. Exit when PASS + APPROVED
+5. **Max 5 iterations** — then STOP and report to user.
+
+**MINOR findings** do not trigger fix loop alone.
+
+**Tester time budget:** if the tester reports pre-existing failures unrelated to the current task, the orchestrator must NOT ask the tester to fix them. Note them for a separate task and proceed.
+
+---
+
+## Test creation rule
+
+**Every behavioral change must be validated by tests.** The tester creates them automatically.
+
+- New feature with logic → unit tests + integration when applicable
+- Bug fix → test that reproduces the bug + verifies the fix
+- Refactor with preserved behavior → existing tests are sufficient
+- Cosmetic/text/DTO change without logic → build + review is sufficient
+
+---
+
+## HIGH risk rules
+
+- Never auto-fix — all through coder
+- Full test suite on every revalidation
+- Reviewer always mandatory
+- Architect mandatory if any design decision is open
+
+---
+
+## Stop conditions
+
+STOP and escalate when:
+- Build doesn't stabilize after 2 corrections
+- Reviewer flags an architectural problem
+- Tester finds widespread failures outside task scope
+- Root cause unclear after 1 fix loop
+- Affected files grow beyond plan
+- SMALL/MEDIUM reveals structural impact
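The decision matrix and the iteration cap in the added `operational-policies.md` reduce to a small amount of logic, sketched here in Python. The names `Action`, `next_action`, and `run_fix_loop` are assumptions for illustration, as is the reading that one iteration means one fix followed by one revalidation; the policy file is prose for an orchestrating agent and defines no such API.

```python
from enum import Enum
from typing import Callable

MAX_ITERATIONS = 5  # "Max 5 iterations", then stop and report to the user


class Action(Enum):
    SUMMARIZE = "summarize"
    FIX_LOOP = "fix_loop"


def next_action(tester: str, reviewer: str, important_findings: bool = False) -> Action:
    """Map tester/reviewer verdicts to the next step, following the matrix.

    tester: "PASS" or "FAIL"; reviewer: "APPROVED" or "NEEDS_CHANGES".
    important_findings: True when the reviewer reported IMPORTANT+ findings.
    """
    if tester not in ("PASS", "FAIL"):
        raise ValueError(f"unknown tester verdict: {tester!r}")
    if reviewer not in ("APPROVED", "NEEDS_CHANGES"):
        raise ValueError(f"unknown reviewer verdict: {reviewer!r}")
    # APPROVED with IMPORTANT+ findings is treated as NEEDS_CHANGES.
    clean_approval = reviewer == "APPROVED" and not important_findings
    if tester == "PASS" and clean_approval:
        return Action.SUMMARIZE
    return Action.FIX_LOOP


def run_fix_loop(
    validate: Callable[[], tuple[str, str, bool]],
    fix: Callable[[], None],
) -> bool:
    """Fix, revalidate, and consolidate until clean or the cap is reached.

    Returns True on a clean exit (PASS + clean APPROVED) and False when
    MAX_ITERATIONS is exhausted, at which point the caller should escalate.
    """
    for _ in range(MAX_ITERATIONS):
        fix()
        if next_action(*validate()) is Action.SUMMARIZE:
            return True
    return False
```

Only one matrix row leads to Summarize, so the function tests for that row and sends everything else to the fix loop; this keeps the exit gate a single condition instead of five separate branches.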