@arthai/agents 1.0.4 → 1.0.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +55 -3
- package/VERSION +1 -1
- package/agents/troubleshooter.md +132 -0
- package/bin/cli.js +366 -0
- package/bundles/canvas.json +1 -1
- package/bundles/compass.json +1 -1
- package/bundles/counsel.json +1 -0
- package/bundles/cruise.json +1 -1
- package/bundles/forge.json +12 -1
- package/bundles/prism.json +1 -0
- package/bundles/scalpel.json +5 -2
- package/bundles/sentinel.json +8 -2
- package/bundles/shield.json +1 -0
- package/bundles/spark.json +1 -0
- package/compiler.sh +14 -0
- package/dist/plugins/canvas/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/canvas/VERSION +1 -0
- package/dist/plugins/canvas/commands/planning.md +100 -11
- package/dist/plugins/canvas/hooks/hooks.json +16 -0
- package/dist/plugins/canvas/hooks/project-setup.sh +109 -0
- package/dist/plugins/canvas/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/canvas/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/compass/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/compass/VERSION +1 -0
- package/dist/plugins/compass/commands/planning.md +100 -11
- package/dist/plugins/compass/hooks/hooks.json +16 -0
- package/dist/plugins/compass/hooks/project-setup.sh +109 -0
- package/dist/plugins/compass/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/compass/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/counsel/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/counsel/VERSION +1 -0
- package/dist/plugins/counsel/hooks/hooks.json +10 -0
- package/dist/plugins/counsel/hooks/project-setup.sh +109 -0
- package/dist/plugins/counsel/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/counsel/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/cruise/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/cruise/VERSION +1 -0
- package/dist/plugins/cruise/hooks/hooks.json +16 -0
- package/dist/plugins/cruise/hooks/project-setup.sh +109 -0
- package/dist/plugins/cruise/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/cruise/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/forge/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/forge/VERSION +1 -0
- package/dist/plugins/forge/agents/troubleshooter.md +132 -0
- package/dist/plugins/forge/commands/implement.md +99 -1
- package/dist/plugins/forge/commands/planning.md +100 -11
- package/dist/plugins/forge/hooks/escalation-guard.sh +177 -0
- package/dist/plugins/forge/hooks/hooks.json +22 -0
- package/dist/plugins/forge/hooks/project-setup.sh +109 -0
- package/dist/plugins/forge/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/forge/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/prime/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/prime/VERSION +1 -0
- package/dist/plugins/prime/agents/troubleshooter.md +132 -0
- package/dist/plugins/prime/commands/calibrate.md +20 -0
- package/dist/plugins/prime/commands/ci-fix.md +36 -0
- package/dist/plugins/prime/commands/fix.md +23 -0
- package/dist/plugins/prime/commands/implement.md +99 -1
- package/dist/plugins/prime/commands/planning.md +100 -11
- package/dist/plugins/prime/commands/qa-incident.md +54 -0
- package/dist/plugins/prime/commands/restart.md +186 -30
- package/dist/plugins/prime/hooks/escalation-guard.sh +177 -0
- package/dist/plugins/prime/hooks/hooks.json +60 -0
- package/dist/plugins/prime/hooks/post-config-change-restart-reminder.sh +86 -0
- package/dist/plugins/prime/hooks/post-server-crash-watch.sh +120 -0
- package/dist/plugins/prime/hooks/pre-server-port-guard.sh +110 -0
- package/dist/plugins/prime/hooks/project-setup.sh +109 -0
- package/dist/plugins/prime/hooks/sync-agents.sh +99 -12
- package/dist/plugins/prime/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/prime/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/prism/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/prism/VERSION +1 -0
- package/dist/plugins/prism/commands/qa-incident.md +54 -0
- package/dist/plugins/prism/hooks/hooks.json +12 -0
- package/dist/plugins/prism/hooks/project-setup.sh +109 -0
- package/dist/plugins/prism/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/prism/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/scalpel/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/scalpel/VERSION +1 -0
- package/dist/plugins/scalpel/agents/troubleshooter.md +132 -0
- package/dist/plugins/scalpel/commands/ci-fix.md +36 -0
- package/dist/plugins/scalpel/commands/fix.md +23 -0
- package/dist/plugins/scalpel/hooks/escalation-guard.sh +177 -0
- package/dist/plugins/scalpel/hooks/hooks.json +24 -0
- package/dist/plugins/scalpel/hooks/project-setup.sh +109 -0
- package/dist/plugins/scalpel/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/scalpel/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/sentinel/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/sentinel/VERSION +1 -0
- package/dist/plugins/sentinel/agents/troubleshooter.md +132 -0
- package/dist/plugins/sentinel/commands/restart.md +186 -30
- package/dist/plugins/sentinel/hooks/escalation-guard.sh +177 -0
- package/dist/plugins/sentinel/hooks/hooks.json +64 -0
- package/dist/plugins/sentinel/hooks/post-config-change-restart-reminder.sh +86 -0
- package/dist/plugins/sentinel/hooks/post-server-crash-watch.sh +120 -0
- package/dist/plugins/sentinel/hooks/pre-server-port-guard.sh +110 -0
- package/dist/plugins/sentinel/hooks/project-setup.sh +109 -0
- package/dist/plugins/sentinel/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/sentinel/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/shield/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/shield/VERSION +1 -0
- package/dist/plugins/shield/hooks/hooks.json +22 -12
- package/dist/plugins/shield/hooks/project-setup.sh +109 -0
- package/dist/plugins/shield/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/shield/templates/CLAUDE.md.template +111 -0
- package/dist/plugins/spark/.claude-plugin/plugin.json +1 -1
- package/dist/plugins/spark/VERSION +1 -0
- package/dist/plugins/spark/commands/calibrate.md +20 -0
- package/dist/plugins/spark/hooks/hooks.json +10 -0
- package/dist/plugins/spark/hooks/project-setup.sh +109 -0
- package/dist/plugins/spark/templates/CLAUDE.md.managed-block +123 -0
- package/dist/plugins/spark/templates/CLAUDE.md.template +111 -0
- package/hook-defs.json +31 -0
- package/hooks/escalation-guard.sh +177 -0
- package/hooks/post-config-change-restart-reminder.sh +86 -0
- package/hooks/post-server-crash-watch.sh +120 -0
- package/hooks/pre-server-port-guard.sh +110 -0
- package/hooks/project-setup.sh +109 -0
- package/hooks/sync-agents.sh +99 -12
- package/install.sh +2 -2
- package/package.json +1 -1
- package/portable.manifest +7 -1
- package/skills/calibrate/SKILL.md +20 -0
- package/skills/ci-fix/SKILL.md +36 -0
- package/skills/fix/SKILL.md +23 -0
- package/skills/implement/SKILL.md +99 -1
- package/skills/license/SKILL.md +159 -0
- package/skills/planning/SKILL.md +100 -11
- package/skills/publish/SKILL.md +3 -0
- package/skills/qa-incident/SKILL.md +54 -0
- package/skills/restart/SKILL.md +187 -31
@@ -58,7 +58,86 @@ Map user choices:
 
 Store the resolved mode as `DEBATE_MODE` (values: `lite`, `fast`, `full`).
 
-### 3.
+### 3. Phase 0: Spec Generation (before debate)
+
+Before any debate rounds, the PM generates a **spec doc** that becomes the foundation for all subsequent work. This ensures user stories and edge cases are defined BEFORE architecture decisions.
+
+**Create the specs directory:**
+```bash
+mkdir -p .claude/specs
+```
+
+**Spawn PM agent (subagent_type="product-manager", model="sonnet") for spec generation only:**
+
+Prompt:
+```
+You are a Product Manager generating a spec doc for the feature "{feature-name}".
+
+Feature brief: {FEATURE_BRIEF}
+
+Generate a spec doc with these sections:
+
+## User Stories
+Write 3-7 user stories covering the happy path and key error states.
+Format: "As a [user type], I want [action], so that [outcome]"
+Each story must have:
+- **Story ID**: US-1, US-2, etc.
+- **Priority**: P0 (must-have for launch) or P1 (important but deferrable)
+- **Acceptance**: specific testable condition that proves this story is done
+
+Example:
+US-1 [P0]: As a new developer, I want to install the toolkit with one command,
+so that I can start using it without manual configuration.
+Acceptance: `npx @arthai/agents install forge .` succeeds and skills are available.
+
+## User Journey
+Step-by-step flow from the user's first interaction to completion.
+Include:
+- **Happy path**: numbered steps (1. User does X → 2. System responds Y → ...)
+- **Decision points**: where the user makes a choice (mark with ◆)
+- **Error branches**: where things can go wrong (mark with ✗ and show recovery path)
+
+Format as a text flowchart:
+```
+1. User discovers feature
+2. User does [action]
+   ◆ Decision: [choice A] or [choice B]
+     → Choice A: proceed to step 3
+     → Choice B: proceed to step 5
+3. System responds with [result]
+   ✗ Error: [what went wrong] → Recovery: [how to fix]
+4. ...
+```
+
+## Edge Cases
+Structured list of what can go wrong. For each:
+- **ID**: EC-1, EC-2, etc.
+- **Scenario**: what triggers this edge case
+- **Expected behavior**: what should happen (not crash, not hang)
+- **Severity**: Critical (blocks user) / High (degrades experience) / Medium (inconvenience)
+- **Linked story**: which user story this edge case relates to
+
+## Success Criteria
+Measurable outcomes tied to user stories. These become the acceptance criteria
+that /implement and /qa use to validate the implementation.
+- Each criterion references a story ID
+- Each criterion is binary: pass or fail, no subjective judgment
+```
+
+**Write the output to `.claude/specs/{feature-name}.md`** with this frontmatter:
+
+```markdown
+---
+feature: {feature-name}
+generated: {ISO date}
+stories: {count}
+edge_cases: {count}
+---
+```
+
+**Store the spec content as `FEATURE_SPEC`** — this is injected into the shared context block for all debate participants.
+
+### 4. Codebase Scan
 
 Spawn an `explore-light` subagent (model: haiku) to scan for relevant files:
 
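The `stories: {count}` and `edge_cases: {count}` frontmatter fields above imply counting IDs in the generated spec. A minimal sketch of one way to derive them, assuming the `US-<n>` / `EC-<n>` ID formats from the prompt (the spec path is illustrative):

```shell
# Count distinct story and edge-case IDs in a generated spec (illustrative path).
spec=".claude/specs/my-feature.md"
stories=$(grep -oE 'US-[0-9]+' "$spec" | sort -u | wc -l)
edge_cases=$(grep -oE 'EC-[0-9]+' "$spec" | sort -u | wc -l)
echo "stories: $stories, edge_cases: $edge_cases"
```

`sort -u` dedupes IDs that appear both in the story list and in edge-case back-references.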
@@ -68,7 +147,7 @@ prompt: "For a feature called '{feature-name}', find: (1) related backend routes
 
 Store the result as `CODEBASE_CONTEXT`.
 
-###
+### 5. Create Team + Tasks
 
 Create team: `planning-{feature-name}`
 
@@ -76,12 +155,13 @@ Create these tasks:
 
 | Task | Owner | Subject |
 |------|-------|---------|
-|
+| 0 | product-manager | Generate spec doc for {feature-name} (Phase 0 — already done above) |
+| 1 | product-manager | Define product scope for {feature-name} (uses spec as input) |
 | 2 | architect | Design technical plan for {feature-name} |
 | 3 (if --design) | design-thinker | Create design brief for {feature-name} |
 | 4 (if not --lite) | devils-advocate | Challenge scope and feasibility for {feature-name} |
 
-###
+### 6. Build Shared Context Block
 
 Compose this block to inject into every teammate's spawn prompt:
 
@@ -95,6 +175,11 @@ Auth: {AUTH_APPROACH}
 ## Feature: {feature-name}
 {FEATURE_BRIEF}
 
+## Spec Doc (from Phase 0)
+{FEATURE_SPEC}
+
+(Full spec at: .claude/specs/{feature-name}.md)
+
 ## Relevant Codebase
 {CODEBASE_CONTEXT}
 
@@ -114,13 +199,13 @@ This planning session runs in {DEBATE_MODE} mode.
 - If DA recommends KILL and user overrides, record as USER OVERRIDE in the plan.
 ```
 
-###
+### 7. Spawn Teammates (ALL IN ONE MESSAGE — parallel)
 
 Spawn all teammates in a **single message** with multiple Task tool calls:
 
 **Always spawn:**
 - **product-manager** (subagent_type="product-manager", model="opus")
-  - Prompt: "{SHARED_CONTEXT}\n\nYou are the PM. Own the 'what' and 'why'.\n\nIn Round 1, you LEAD with your scope claim. Format your must-haves as:\n  [M1] requirement — BECAUSE reason\n  [M2] ...\nMaximum 5 MUST-HAVEs. Also list NICE-TO-HAVEs with CUT-IF conditions, EXPLICIT EXCLUSIONS, and success metrics.\n\nIn Round 2, you COUNTER the architect's approach: does it deliver user
+  - Prompt: "{SHARED_CONTEXT}\n\nYou are the PM. Own the 'what' and 'why'. You generated the spec doc in Phase 0 — your user stories and edge cases are in the shared context above. Use them as the foundation for your scope claim.\n\nIn Round 1, you LEAD with your scope claim. Each must-have should trace to one or more user stories (reference by ID, e.g., 'US-1, US-3'). Format your must-haves as:\n  [M1] requirement — BECAUSE reason (traces to US-X)\n  [M2] ...\nMaximum 5 MUST-HAVEs. Also list NICE-TO-HAVEs with CUT-IF conditions, EXPLICIT EXCLUSIONS, and success metrics.\n\nIn Round 2, you COUNTER the architect's approach: does it deliver the user journey as specified? Is it over-engineered? What is the time-to-value? Reference edge cases the approach doesn't handle.\n\nIn Round 3 (full mode only), you DEFEND against the devil's advocate's risk case. Accept or reject each risk with evidence. Reference user stories to justify why a must-have cannot be cut."
 
 - **architect** (subagent_type="architect", model="opus")
   - Prompt: "{SHARED_CONTEXT}\n\nYou are the Architect. Own the 'how'.\n\nIn Round 1, you COUNTER the PM's scope from a feasibility lens. Challenge feasibility, flag hidden complexity, identify scope creep vectors, propose a counter-scope.\n\nIn Round 2, you LEAD with your technical approach: API contract, DB changes, architecture decision + WHY, task breakdown with S/M/L/XL estimates, implementation cost, dependencies, risks.\n\nIn Round 3 (full mode only), you DEFEND against the devil's advocate's risk case. Accept or reject each risk with evidence. Keep it simple for early-stage. Push back on scope that doesn't match the development stage."
@@ -140,7 +225,7 @@ Spawn all teammates in a **single message** with multiple Task tool calls:
 - **gtm-expert** (subagent_type="gtm-expert", model="sonnet")
   - Prompt: "{SHARED_CONTEXT}\n\nYou are the GTM Expert. Own distribution and launch strategy. Advise on positioning, viral mechanics, and launch sequencing. Challenge the team on how users will discover and adopt this feature. Your output feeds into Round 3 as additional evidence for the devil's advocate."
 
-###
+### 8. Structured Debate Protocol
 
 Facilitate the following rounds in sequence. Each phase completes before the next begins.
 
@@ -279,7 +364,7 @@ Each matched item is flagged as a RISK NOTE in the plan.
 
 ---
 
-###
+### 9. Convergence Logic
 
 **Plan is APPROVED when ALL of the following are true after all applicable rounds complete:**
 - Rounds 1 and 2 have zero UNRESOLVED items
@@ -296,7 +381,7 @@ If user overrides a KILL recommendation, record as `USER OVERRIDE` in the plan.
 
 In lite and fast modes, skip convergence checks for rounds that were not run. Apply only the checks applicable to completed rounds.
 
-###
+### 10. Scope Lock
 
 After convergence, compute a `scope_hash`:
 - Concatenate all locked MUST-HAVE strings from Round 1 in order
@@ -305,7 +390,7 @@ After convergence, compute a `scope_hash`:
 
 When `/implement` loads the plan, it can verify the hash against the locked must-haves to detect tampering.
 
-###
+### 11. Write Plan File + Present to User
 
 Synthesize teammate outputs into a structured plan and **write it to `.claude/plans/{feature-name}.md`** using the Write tool. This file is read by `/implement` to auto-configure the implementation team.
 
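The scope-lock steps can be sketched in shell. This is a minimal illustration, not the package's implementation: the must-have strings are made up, and the canonicalization (separator, no trailing newline) is an assumption that only needs to match on the verification side.

```shell
# Concatenate the locked MUST-HAVE strings from Round 1 in order,
# then take the SHA-256 of the result as scope_hash.
must_haves='[M1] one-command install — BECAUSE onboarding friction kills adoption
[M2] crash-loop detection — BECAUSE silent failures waste debugging time'
scope_hash=$(printf '%s' "$must_haves" | sha256sum | cut -d' ' -f1)
echo "scope_hash: $scope_hash"
```

`/implement` would recompute the same pipeline over the plan's must-haves and compare the two digests.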
@@ -317,6 +402,7 @@ feature: {feature-name}
 debate_mode: {DEBATE_MODE}
 scope_hash: {SHA-256 of locked must-haves}
 da_confidence: {HIGH|MEDIUM|LOW|N/A}
+spec: specs/{feature-name}.md
 layers:
   - frontend  # include if ANY frontend tasks exist
   - backend   # include if ANY backend tasks exist
@@ -324,6 +410,9 @@ layers:
 
 # Planning Summary: {feature-name}
 
+## Spec Reference
+See `.claude/specs/{feature-name}.md` for user stories, user journey, edge cases, and success criteria.
+
 ## Problem & User Segment (from PM)
 ...
 
@@ -394,7 +483,7 @@ layers:
 
 Present the plan to the user for review.
 
-###
+### 12. Cleanup
 
 After user reviews the plan:
 - Send shutdown_request to all teammates.
@@ -40,6 +40,60 @@ affected_files:
 
 4. **Confirm to user**: "Incident logged. Next `/qa` run will generate a regression test targeting this issue."
 
+## Auto-Logging from Escalation (Circuit Breaker Resolution)
+
+When an agent resolves a stuck situation (circuit breaker was tripped, then the issue was fixed),
+the resolution should be logged automatically. This enables future agents to find the fix
+without repeating the same debugging journey.
+
+**Trigger:** Any workflow that resolves an error after the escalation guard tripped (3+ consecutive failures).
+
+**Auto-log format** — create `.claude/qa-knowledge/incidents/{date}-escalation-{slug}.md`:
+
+```markdown
+---
+date: {today YYYY-MM-DD}
+severity: medium
+status: covered
+type: escalation-resolution
+error_signature: {from .claude/.escalation-state.json}
+affected_files:
+  - {files that were modified to fix the issue}
+---
+# Escalation: {brief description of what was stuck}
+
+## Error
+{exact error message that triggered the circuit breaker}
+
+## What Was Tried (failed)
+1. {attempt 1 from escalation state} → {result}
+2. {attempt 2} → {result}
+3. {attempt 3} → {result}
+
+## Root Cause
+{what was actually wrong}
+
+## Fix Applied
+{what change resolved the issue}
+
+## How to Prevent
+{if applicable — what convention or check would catch this earlier}
+
+## Search Keywords
+{error message fragments, file names, command patterns — for future KB searches}
+```
+
+**Also update** `.claude/qa-knowledge/bug-patterns.md` if this represents a new pattern:
+```
+## {Pattern name}
+- **Signature:** {error message or command pattern}
+- **Root cause:** {common reason}
+- **Fix:** {standard resolution}
+- **First seen:** {date}
+```
+
+This closes the learning loop: stuck → escalate → fix → log → future agents find it instantly.
+
 ## If No Description Provided
 
 Ask: "Describe the issue you want to log (e.g., 'admin page crashes when user has no sessions')"
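The `{date}-escalation-{slug}` filename implies a slug derivation the diff doesn't spell out. A hedged sketch of one plausible rule (lowercase, non-alphanumerics collapsed to single dashes, edges trimmed); the description text is illustrative:

```shell
# Illustrative: derive the incident file path from a free-text description.
# The exact slug rule is an assumption — the skill doesn't specify it.
desc="Backend crashes on startup: ModuleNotFoundError"
slug=$(printf '%s' "$desc" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
echo ".claude/qa-knowledge/incidents/$(date +%F)-escalation-$slug.md"
```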
@@ -1,68 +1,224 @@
 ---
 name: restart
-description: "
+description: "Discover, restart, and validate local dev servers. Auto-detects Docker vs native, checks health, catches crash loops. Usage: /restart <service> <--preflight>"
 ---
 
 # Restart Servers Skill
 
-
+Discover how services run, restart them safely, and verify they stay healthy.
 
-##
+## Phase 1: Discover Services
 
-
+**First, check CLAUDE.md** for a `Local Dev Services` table:
 
 ```markdown
 ## Local Dev Services
 
-| Service | Port | Directory | Start Command |
-
-|
-| Backend | 8000 | backend/ | uvicorn app.main:app --reload
+| Service  | Type   | Port | Directory | Start Command                         | Health Check    | Depends On |
+|----------|--------|------|-----------|---------------------------------------|-----------------|------------|
+| Postgres | docker | 5432 | —         | docker compose up -d postgres         | pg_isready      | Docker     |
+| Backend  | native | 8000 | backend/  | uvicorn app.main:app --reload         | /api/health     | Postgres   |
+| Frontend | native | 3000 | frontend/ | npm run dev                           | /health         | Backend    |
 ```
 
-If
-
-
-
--
+**If the table is missing or incomplete, auto-discover** by scanning the repo. Check these files in order:
+
+| File | What to extract |
+|------|-----------------|
+| `docker-compose.yml` / `docker-compose.*.yml` | Service names, ports, images, healthcheck configs, volume mounts. These are **docker** type services. |
+| `Dockerfile` / `Dockerfile.*` | What gets containerized — cross-reference with compose to understand which services are Docker-managed. |
+| `package.json` (root + subdirs) | `scripts.dev`, `scripts.start`, `scripts.serve` → these are **native** Node services. Check `proxy` field for backend port. |
+| `Makefile` / `Procfile` / `Justfile` | Process definitions, often with ports and dependency ordering. |
+| `pyproject.toml` / `requirements.txt` | Python services — look for uvicorn/gunicorn/flask/django in deps. |
+| `.env` / `.env.local` / `.env.example` | Port assignments (`PORT=`, `DATABASE_URL=`, `REDIS_URL=`), required env vars. |
+| `turbo.json` / `nx.json` / `pnpm-workspace.yaml` | Monorepo structure — maps workspace packages to services. |
+
+For each discovered service, determine:
+- **Name**: human-readable (e.g., "Backend API", "Postgres")
+- **Type**: `docker` (managed by docker compose) or `native` (runs directly on host)
+- **Port**: what port it listens on
+- **Directory**: where to `cd` before running the start command
+- **Start command**: exact command to launch it
+- **Health check**: endpoint or command to verify it's running (look for `/health`, `/api/health`, `pg_isready`, `redis-cli ping`, etc.)
+- **Depends on**: other services that must be running first (DB before backend, backend before frontend)
+
+**If discovery is ambiguous, ASK the user.** Specifically ask when:
+- Multiple compose files exist and it's unclear which to use (dev vs prod vs test)
+- A service could be Docker OR native (e.g., Postgres has both a compose entry and a local install)
+- No health check endpoint is obvious — ask what URL or command confirms the service is healthy
+- Port conflicts or unusual port assignments are detected
+- You find services but can't determine the dependency order
+
+Present what you found and ask for confirmation:
+```
+Found 3 services:
+1. Postgres (docker, port 5432) — via docker-compose.yml
+2. Backend (native, port 8000) — via backend/pyproject.toml
+3. Frontend (native, port 3000) — via frontend/package.json
+
+Dependencies: Frontend → Backend → Postgres
+
+Health checks:
+- Postgres: pg_isready -h localhost -p 5432
+- Backend: curl localhost:8000/api/health
+- Frontend: curl localhost:3000
+
+Does this look right? Any corrections?
+```
+
+After confirmation (or if CLAUDE.md table already exists), proceed to Phase 2.
+
+**Update CLAUDE.md**: If services were auto-discovered and the user confirmed, write/update the `Local Dev Services` table in CLAUDE.md so future runs skip discovery.
 
-
+## Phase 2: Pre-flight Validation
 
+Before touching any running processes, validate the environment is ready:
+
+### 2a. Docker check (if any service type is `docker`)
+```bash
+# Is Docker daemon running?
+docker info > /dev/null 2>&1 && echo "Docker: OK" || echo "Docker: NOT RUNNING"
+```
+If Docker is not running, **stop and tell the user**. Don't proceed.
+
+### 2b. Dependency check (for native services)
 ```bash
-#
+# Node services — are node_modules installed?
+[ -d "<directory>/node_modules" ] && echo "node_modules: OK" || echo "node_modules: MISSING — run npm install"
+
+# Python services — is the venv active / deps installed?
+[ -f "<directory>/.venv/bin/python" ] && echo "venv: OK" || echo "venv: MISSING"
+```
+
+### 2c. Environment variables
+```bash
+# Check critical env vars exist (read from .env.example or known requirements)
+[ -f "<directory>/.env" ] || [ -f "<directory>/.env.local" ] && echo "env file: OK" || echo "env file: MISSING"
+```
+
+### 2d. Port availability
+```bash
+# Check if port is already in use by a DIFFERENT process than expected
+lsof -ti:<port> 2>/dev/null
+```
+If a port is occupied by an unexpected process, warn the user before killing it.
+
+### 2e. Pre-flight only mode
+If the user ran `/restart --preflight`, **stop here** and report results. Don't restart anything.
+
+## Phase 3: Restart Services (dependency order)
+
+Restart in dependency order — infrastructure first, then backends, then frontends.
+
+### 3a. Stop services (reverse dependency order)
+```bash
+# For native services — kill by port
 lsof -ti:<port> | xargs kill -9 2>/dev/null
 
+# For docker services — stop the specific service
+docker compose stop <service_name>
+
 # Wait for ports to free
 sleep 2
-```
 
-
+# Verify port is actually free
+lsof -ti:<port> 2>/dev/null && echo "WARNING: port <port> still occupied" || echo "Port <port>: free"
+```
 
+### 3b. Start services (dependency order)
 ```bash
-#
-<
+# Docker services first
+docker compose up -d <service_name>
+
+# Then native services — run in background
+cd <directory> && <start_command>
 ```
 
-Use `run_in_background: true` on the Bash tool.
+Use `run_in_background: true` on the Bash tool for native services.
+
+**Wait between dependency layers** — don't start the backend until the DB health check passes. Don't start the frontend until the backend health check passes.
+
+## Phase 4: Post-restart Health Validation
+
+This is the critical phase. A single health check is not enough — services can start and then crash seconds later.
 
-
+### 4a. Initial health check (per service, in dependency order)
 
 ```bash
-#
-curl -sf http://localhost:<port
+# HTTP services
+curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: UP" || echo "<service>: DOWN"
+
+# Postgres
+pg_isready -h localhost -p <port> 2>/dev/null && echo "Postgres: UP" || echo "Postgres: DOWN"
+
+# Redis
+redis-cli -p <port> ping 2>/dev/null | grep -q PONG && echo "Redis: UP" || echo "Redis: DOWN"
+
+# Docker services
+docker compose ps <service_name> --format '{{.Status}}' | grep -q "Up" && echo "<service>: UP" || echo "<service>: DOWN"
+```
+
+### 4b. Crash loop detection (wait 8 seconds, check again)
+
+```bash
+sleep 8
+
+# Re-check each service
+curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: STABLE" || echo "<service>: CRASHED"
+
+# For native services — is the process still running?
+lsof -ti:<port> > /dev/null 2>&1 && echo "<service> process: alive" || echo "<service> process: GONE"
 ```
 
-
+If a service is down on the second check:
+1. **Check logs** — read the last 30 lines of output for error messages
+2. **Report the failure clearly** with the error
+3. **Do NOT retry automatically** — tell the user what went wrong
+
+### 4c. Final status report
+
+```
+✅ Restart complete
+
+| Service  | Type   | Port | Status  |
+|----------|--------|------|---------|
+| Postgres | docker | 5432 | STABLE  |
+| Backend  | native | 8000 | STABLE  |
+| Frontend | native | 3000 | STABLE  |
+
+All services healthy after 10s stability check.
+```
+
+Or if something failed:
+
+```
+⚠️ Restart partial — 1 service unhealthy
+
+| Service  | Type   | Port | Status  |
+|----------|--------|------|---------|
+| Postgres | docker | 5432 | STABLE  |
+| Backend  | native | 8000 | CRASHED |
+| Frontend | native | 3000 | SKIPPED |
+
+Backend crashed after startup. Last error:
+ModuleNotFoundError: No module named 'sqlalchemy'
+
+Fix: Run `pip install -r requirements.txt` in backend/ and retry.
+Frontend was skipped because it depends on Backend.
+```
 
 ## Argument Patterns
 
 | User Input | Action |
 |-----------|--------|
-| `/restart` (no args) |
-| `/restart backend` |
-| `/restart
-| `/restart <service>` | Kill and restart the named service |
+| `/restart` (no args) | Discover → validate → restart all services |
+| `/restart backend` | Restart only the named service (and its dependencies if they're down) |
+| `/restart --preflight` | Discover and validate only — don't restart anything |
 
-##
+## Key Principles
 
-
+1. **Never guess** — if you can't determine how a service runs, ask
+2. **Dependency order matters** — always start infra → backend → frontend
+3. **One health check isn't enough** — check twice with a gap to catch crash loops
+4. **Report errors with context** — show the actual log output, not just "DOWN"
+5. **Don't retry blindly** — if a service crashes, diagnose and report, don't loop