@arthai/agents 1.0.5 → 1.0.7

Files changed (130)
  1. package/README.md +33 -3
  2. package/VERSION +1 -1
  3. package/agents/troubleshooter.md +132 -0
  4. package/bin/cli.js +296 -0
  5. package/bundles/canvas.json +1 -1
  6. package/bundles/compass.json +1 -1
  7. package/bundles/counsel.json +1 -0
  8. package/bundles/cruise.json +1 -1
  9. package/bundles/forge.json +12 -1
  10. package/bundles/prism.json +1 -0
  11. package/bundles/scalpel.json +5 -2
  12. package/bundles/sentinel.json +8 -2
  13. package/bundles/shield.json +1 -0
  14. package/bundles/spark.json +1 -0
  15. package/compiler.sh +14 -0
  16. package/dist/plugins/canvas/.claude-plugin/plugin.json +1 -1
  17. package/dist/plugins/canvas/VERSION +1 -0
  18. package/dist/plugins/canvas/commands/planning.md +100 -11
  19. package/dist/plugins/canvas/hooks/hooks.json +16 -0
  20. package/dist/plugins/canvas/hooks/project-setup.sh +109 -0
  21. package/dist/plugins/canvas/templates/CLAUDE.md.managed-block +123 -0
  22. package/dist/plugins/canvas/templates/CLAUDE.md.template +111 -0
  23. package/dist/plugins/compass/.claude-plugin/plugin.json +1 -1
  24. package/dist/plugins/compass/VERSION +1 -0
  25. package/dist/plugins/compass/commands/planning.md +100 -11
  26. package/dist/plugins/compass/hooks/hooks.json +16 -0
  27. package/dist/plugins/compass/hooks/project-setup.sh +109 -0
  28. package/dist/plugins/compass/templates/CLAUDE.md.managed-block +123 -0
  29. package/dist/plugins/compass/templates/CLAUDE.md.template +111 -0
  30. package/dist/plugins/counsel/.claude-plugin/plugin.json +1 -1
  31. package/dist/plugins/counsel/VERSION +1 -0
  32. package/dist/plugins/counsel/hooks/hooks.json +10 -0
  33. package/dist/plugins/counsel/hooks/project-setup.sh +109 -0
  34. package/dist/plugins/counsel/templates/CLAUDE.md.managed-block +123 -0
  35. package/dist/plugins/counsel/templates/CLAUDE.md.template +111 -0
  36. package/dist/plugins/cruise/.claude-plugin/plugin.json +1 -1
  37. package/dist/plugins/cruise/VERSION +1 -0
  38. package/dist/plugins/cruise/hooks/hooks.json +16 -0
  39. package/dist/plugins/cruise/hooks/project-setup.sh +109 -0
  40. package/dist/plugins/cruise/templates/CLAUDE.md.managed-block +123 -0
  41. package/dist/plugins/cruise/templates/CLAUDE.md.template +111 -0
  42. package/dist/plugins/forge/.claude-plugin/plugin.json +1 -1
  43. package/dist/plugins/forge/VERSION +1 -0
  44. package/dist/plugins/forge/agents/troubleshooter.md +132 -0
  45. package/dist/plugins/forge/commands/implement.md +99 -1
  46. package/dist/plugins/forge/commands/planning.md +100 -11
  47. package/dist/plugins/forge/hooks/escalation-guard.sh +177 -0
  48. package/dist/plugins/forge/hooks/hooks.json +22 -0
  49. package/dist/plugins/forge/hooks/project-setup.sh +109 -0
  50. package/dist/plugins/forge/templates/CLAUDE.md.managed-block +123 -0
  51. package/dist/plugins/forge/templates/CLAUDE.md.template +111 -0
  52. package/dist/plugins/prime/.claude-plugin/plugin.json +1 -1
  53. package/dist/plugins/prime/VERSION +1 -0
  54. package/dist/plugins/prime/agents/troubleshooter.md +132 -0
  55. package/dist/plugins/prime/commands/calibrate.md +20 -0
  56. package/dist/plugins/prime/commands/ci-fix.md +36 -0
  57. package/dist/plugins/prime/commands/fix.md +23 -0
  58. package/dist/plugins/prime/commands/implement.md +99 -1
  59. package/dist/plugins/prime/commands/planning.md +100 -11
  60. package/dist/plugins/prime/commands/qa-incident.md +54 -0
  61. package/dist/plugins/prime/commands/restart.md +186 -30
  62. package/dist/plugins/prime/hooks/escalation-guard.sh +177 -0
  63. package/dist/plugins/prime/hooks/hooks.json +60 -0
  64. package/dist/plugins/prime/hooks/post-config-change-restart-reminder.sh +86 -0
  65. package/dist/plugins/prime/hooks/post-server-crash-watch.sh +120 -0
  66. package/dist/plugins/prime/hooks/pre-server-port-guard.sh +110 -0
  67. package/dist/plugins/prime/hooks/project-setup.sh +109 -0
  68. package/dist/plugins/prime/hooks/sync-agents.sh +99 -12
  69. package/dist/plugins/prime/templates/CLAUDE.md.managed-block +123 -0
  70. package/dist/plugins/prime/templates/CLAUDE.md.template +111 -0
  71. package/dist/plugins/prism/.claude-plugin/plugin.json +1 -1
  72. package/dist/plugins/prism/VERSION +1 -0
  73. package/dist/plugins/prism/commands/qa-incident.md +54 -0
  74. package/dist/plugins/prism/hooks/hooks.json +12 -0
  75. package/dist/plugins/prism/hooks/project-setup.sh +109 -0
  76. package/dist/plugins/prism/templates/CLAUDE.md.managed-block +123 -0
  77. package/dist/plugins/prism/templates/CLAUDE.md.template +111 -0
  78. package/dist/plugins/scalpel/.claude-plugin/plugin.json +1 -1
  79. package/dist/plugins/scalpel/VERSION +1 -0
  80. package/dist/plugins/scalpel/agents/troubleshooter.md +132 -0
  81. package/dist/plugins/scalpel/commands/ci-fix.md +36 -0
  82. package/dist/plugins/scalpel/commands/fix.md +23 -0
  83. package/dist/plugins/scalpel/hooks/escalation-guard.sh +177 -0
  84. package/dist/plugins/scalpel/hooks/hooks.json +24 -0
  85. package/dist/plugins/scalpel/hooks/project-setup.sh +109 -0
  86. package/dist/plugins/scalpel/templates/CLAUDE.md.managed-block +123 -0
  87. package/dist/plugins/scalpel/templates/CLAUDE.md.template +111 -0
  88. package/dist/plugins/sentinel/.claude-plugin/plugin.json +1 -1
  89. package/dist/plugins/sentinel/VERSION +1 -0
  90. package/dist/plugins/sentinel/agents/troubleshooter.md +132 -0
  91. package/dist/plugins/sentinel/commands/restart.md +186 -30
  92. package/dist/plugins/sentinel/hooks/escalation-guard.sh +177 -0
  93. package/dist/plugins/sentinel/hooks/hooks.json +64 -0
  94. package/dist/plugins/sentinel/hooks/post-config-change-restart-reminder.sh +86 -0
  95. package/dist/plugins/sentinel/hooks/post-server-crash-watch.sh +120 -0
  96. package/dist/plugins/sentinel/hooks/pre-server-port-guard.sh +110 -0
  97. package/dist/plugins/sentinel/hooks/project-setup.sh +109 -0
  98. package/dist/plugins/sentinel/templates/CLAUDE.md.managed-block +123 -0
  99. package/dist/plugins/sentinel/templates/CLAUDE.md.template +111 -0
  100. package/dist/plugins/shield/.claude-plugin/plugin.json +1 -1
  101. package/dist/plugins/shield/VERSION +1 -0
  102. package/dist/plugins/shield/hooks/hooks.json +22 -12
  103. package/dist/plugins/shield/hooks/project-setup.sh +109 -0
  104. package/dist/plugins/shield/templates/CLAUDE.md.managed-block +123 -0
  105. package/dist/plugins/shield/templates/CLAUDE.md.template +111 -0
  106. package/dist/plugins/spark/.claude-plugin/plugin.json +1 -1
  107. package/dist/plugins/spark/VERSION +1 -0
  108. package/dist/plugins/spark/commands/calibrate.md +20 -0
  109. package/dist/plugins/spark/hooks/hooks.json +10 -0
  110. package/dist/plugins/spark/hooks/project-setup.sh +109 -0
  111. package/dist/plugins/spark/templates/CLAUDE.md.managed-block +123 -0
  112. package/dist/plugins/spark/templates/CLAUDE.md.template +111 -0
  113. package/hook-defs.json +31 -0
  114. package/hooks/escalation-guard.sh +177 -0
  115. package/hooks/post-config-change-restart-reminder.sh +86 -0
  116. package/hooks/post-server-crash-watch.sh +120 -0
  117. package/hooks/pre-server-port-guard.sh +110 -0
  118. package/hooks/project-setup.sh +109 -0
  119. package/hooks/sync-agents.sh +99 -12
  120. package/install.sh +2 -2
  121. package/package.json +1 -1
  122. package/portable.manifest +7 -1
  123. package/skills/calibrate/SKILL.md +20 -0
  124. package/skills/ci-fix/SKILL.md +36 -0
  125. package/skills/fix/SKILL.md +23 -0
  126. package/skills/implement/SKILL.md +99 -1
  127. package/skills/license/SKILL.md +159 -0
  128. package/skills/planning/SKILL.md +100 -11
  129. package/skills/qa-incident/SKILL.md +54 -0
  130. package/skills/restart/SKILL.md +187 -31
@@ -0,0 +1,111 @@
1
+ # CLAUDE.md — {{PROJECT_NAME}}
2
+
3
+ <!-- Generated by claude-agents install.sh --init -->
4
+ <!-- TODO: Replace {{placeholders}} with your project details -->
5
+
6
+ ## Project Overview
7
+
8
+ {{PROJECT_NAME}} is a {{DESCRIPTION}}.
9
+
10
+ ## Tech Stack
11
+
12
+ - **Frontend**: <!-- TODO: e.g., Next.js 14, React 18, TypeScript, Tailwind -->
13
+ - **Backend**: <!-- TODO: e.g., FastAPI, SQLAlchemy, PostgreSQL -->
14
+ - **Auth**: <!-- TODO: e.g., Stytch, Auth0, Clerk -->
15
+ - **Deploy**: <!-- TODO: e.g., Railway, Vercel, AWS -->
16
+
17
+ ## Project Structure
18
+
19
+ ```
20
+ {{PROJECT_NAME}}/
21
+ ├── frontend/ <!-- TODO: Frontend directory -->
22
+ ├── backend/ <!-- TODO: Backend directory -->
23
+ └── ...
24
+ ```
25
+
26
+ ## Key Architecture
27
+
28
+ <!-- TODO: Describe your auth flow, API patterns, database schema, etc. -->
29
+
30
+ ## Local Dev Services
31
+
32
+ <!-- TODO: Auto-populated by /scan or fill manually -->
33
+
34
+ | Service | Port | Directory | Start Command |
35
+ |----------|------|-----------|---------------|
36
+ | Frontend | <!-- TODO --> | frontend/ | <!-- TODO: e.g., npm run dev --> |
37
+ | Backend | <!-- TODO --> | backend/ | <!-- TODO: e.g., uvicorn app.main:app --reload --> |
38
+
39
+ ## Test Commands
40
+
41
+ <!-- TODO: Auto-populated by /scan or fill manually -->
42
+
43
+ | What | Command | Directory |
44
+ |------|---------|-----------|
45
+ | Backend tests | <!-- TODO: e.g., pytest --> | backend/ |
46
+ | Backend lint | <!-- TODO: e.g., ruff check . --> | backend/ |
47
+ | Frontend tests | <!-- TODO: e.g., npm test --> | frontend/ |
48
+ | Frontend lint | <!-- TODO: e.g., npm run lint --> | frontend/ |
49
+ | Type check | <!-- TODO: e.g., npx tsc --noEmit --> | frontend/ |
50
+ | E2E tests | <!-- TODO: e.g., npx playwright test --> | frontend/ |
51
+
52
+ ## Infrastructure
53
+
54
+ <!-- TODO: Auto-populated by /scan or fill manually -->
55
+
56
+ | Platform | Service | Domain |
57
+ |----------|---------|--------|
58
+ | <!-- TODO: e.g., Railway --> | <!-- TODO --> | <!-- TODO --> |
59
+
60
+ Health endpoints: <!-- TODO: e.g., /health, /api/health -->
61
+
62
+ ## Environments
63
+
64
+ <!-- TODO: Auto-populated by /scan environments or /calibrate -->
65
+
66
+ | Name | Type | URL | Health | Deploy | Branch |
67
+ |------|------|-----|--------|--------|--------|
68
+ | local | development | <!-- TODO --> | <!-- TODO: e.g., /health --> | manual | — |
69
+ | <!-- TODO --> | <!-- TODO: staging/production/preview/canary --> | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |
70
+
71
+ Access notes: <!-- TODO: e.g., Railway MCP for staging/prod. Env vars: .env.local, .env.staging -->
72
+
73
+ ## Domain
74
+
75
+ <!-- TODO: Auto-populated by /scan or fill manually -->
76
+ <!-- Describe what this app does, its core entities, and business rules. -->
77
+ <!-- Used by qa-domain agent for domain-aware testing. -->
78
+
79
+ ## Running Locally
80
+
81
+ ```bash
82
+ # TODO: Add your local development commands
83
+ # Frontend
84
+ cd frontend && npm run dev
85
+
86
+ # Backend
87
+ cd backend && source .venv/bin/activate && uvicorn app.main:app --reload
88
+ ```
89
+
90
+ ## Critical Rules
91
+
92
+ <!-- TODO: Add project-specific rules, e.g.: -->
93
+ - Never push to main directly — always create a PR
94
+ - Secrets in .env.local only — never committed
95
+
96
+ ## Agent Customization
97
+
98
+ The following agents/skills are managed by `claude-agents` (symlinked):
99
+ - Run `~/.claude-agents/install.sh --status` to see what's linked
100
+ - To override any portable file, replace the symlink with a regular file
101
+ - Your override won't be touched by future syncs
102
+
103
+ ### Project-Specific Agents
104
+
105
+ Add project-specific agents as regular files in `.claude/agents/`:
106
+ - See `~/.claude-agents/examples/agents/` for templates (frontend, backend, ops, sre, qa)
107
+
108
+ ### Project-Specific Skills
109
+
110
+ Add project-specific skills as regular directories in `.claude/skills/`:
111
+ - See `~/.claude-agents/examples/skills/` for templates (ci-fix, qa, restart)
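The template above asks the user to replace its `{{placeholders}}` by hand. As a rough illustration only, the two named placeholders (`{{PROJECT_NAME}}` and `{{DESCRIPTION}}`) could be filled with `sed`. The `fill_template` function is hypothetical and is not part of `install.sh`; it assumes single-line values.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of install.sh): fill the two named placeholders
# in a template read from stdin and write the result to stdout.
fill_template() {
  local name="$1" description="$2"
  local esc_name esc_desc
  # Escape characters that are special on the replacement side of sed's s command
  # (&, backslash, and the | delimiter used below).
  esc_name=$(printf '%s' "$name" | sed 's/[&|\\]/\\&/g')
  esc_desc=$(printf '%s' "$description" | sed 's/[&|\\]/\\&/g')
  sed -e "s|{{PROJECT_NAME}}|${esc_name}|g" \
      -e "s|{{DESCRIPTION}}|${esc_desc}|g"
}
```

A possible invocation, assuming the template file sits in the current directory: `fill_template "Acme" "billing API" < CLAUDE.md.template > CLAUDE.md`. The `<!-- TODO -->` comments are left untouched and still need manual edits or `/scan`.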
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "sentinel",
3
3
  "description": "Operations and reliability — SRE, incidents, deploy monitoring",
4
- "version": "1.0.5",
4
+ "version": "1.0.7",
5
5
  "author": {
6
6
  "name": "Arth AI"
7
7
  }
@@ -0,0 +1 @@
1
+ 1.0.7
@@ -0,0 +1,132 @@
1
+ ---
2
+ name: troubleshooter
3
+ description: "Specialized debugging agent for when other agents get stuck. Performs root cause analysis using error context, knowledge base, git history, and CLAUDE.md. Produces structured diagnosis with confidence level and recommended fix."
4
+ model: sonnet
5
+ ---
6
+
7
+ # Troubleshooter Agent
8
+
9
+ You are a specialized debugging agent. You are called when another agent or workflow
10
+ has failed multiple times and needs expert diagnosis.
11
+
12
+ ## When You Are Spawned
13
+
14
+ Another agent has hit a wall — they've tried 2-3 fixes and keep failing. Your job
15
+ is to diagnose the root cause and provide a fix with confidence rating.
16
+
17
+ ## Your Process (follow in order)
18
+
19
+ ### 1. Understand the Problem (DO NOT SKIP)
20
+
21
+ Read the error context provided in your spawn prompt. Extract:
22
+ - **Exact error message** (not paraphrased)
23
+ - **What was being attempted** (the goal, not just the command)
24
+ - **What has already been tried** (and why each attempt failed)
25
+ - **The file(s) involved**
26
+
27
+ ### 2. Consult Knowledge Base (BEFORE forming any hypothesis)
28
+
29
+ Check these sources in order:
30
+
31
+ ```
32
+ .claude/knowledge/qa-knowledge/ → past incidents with error signatures
33
+ .claude/knowledge/shared/conventions.md → project-specific gotchas and rules
34
+ .claude/knowledge/shared/patterns.md → architecture patterns that may explain the error
35
+ .claude/knowledge/agents/ → per-agent learning files
36
+ CLAUDE.md → project configuration, test commands, services
37
+ ```
38
+
39
+ Search for:
40
+ - The exact error message (or key phrases)
41
+ - The file/module involved
42
+ - The command that failed
43
+ - Similar past incidents
44
+
45
+ **If you find a match:** Follow the documented fix. Do not reinvent.
46
+ **If no match:** Proceed to step 3.
47
+
48
+ ### 3. Gather Fresh Evidence
49
+
50
+ Read the actual source code around the error:
51
+ - The file mentioned in the error (read 50+ lines of context, not just the error line)
52
+ - Related files (imports, callers, configuration)
53
+ - Recent changes: `git log --oneline -10 -- <file>` and `git diff HEAD -- <file>`
54
+
55
+ Check the environment:
56
+ - `git status` — are there uncommitted changes that might cause the issue?
57
+ - Check if the right dependencies are installed (node_modules, venv, etc.)
58
+ - Check if services are running (ports, Docker containers)
59
+ - Check environment variables that the code expects
60
+
61
+ ### 4. Form Hypothesis (evidence-based only)
62
+
63
+ Based on steps 2-3, form ONE primary hypothesis and optionally one alternative.
64
+ Each hypothesis MUST cite evidence:
65
+
66
+ ```
67
+ HYPOTHESIS: [what I think is wrong]
68
+ EVIDENCE:
69
+ - [source]: [what I found that supports this]
70
+ - [source]: [what I found that supports this]
71
+ CONFIDENCE: HIGH / MEDIUM / LOW
72
+ - HIGH: evidence directly explains the error, fix is clear
73
+ - MEDIUM: evidence is consistent but not conclusive
74
+ - LOW: best guess based on limited evidence
75
+ ```
76
+
77
+ ### 5. Recommend Fix
78
+
79
+ Provide a specific, actionable fix:
80
+
81
+ ```
82
+ RECOMMENDED FIX:
83
+ File: [exact file path]
84
+ Change: [what to modify — be specific, not vague]
85
+ Why: [how this addresses the root cause]
86
+ Verify: [command to run to confirm the fix works]
87
+
88
+ ALTERNATIVE FIX (if confidence < HIGH):
89
+ File: [exact file path]
90
+ Change: [what to modify]
91
+ Why: [different hypothesis this addresses]
92
+ ```
93
+
94
+ ### 6. Output Format
95
+
96
+ Always produce this structured output:
97
+
98
+ ```markdown
99
+ ## Troubleshooter Diagnosis
100
+
101
+ **Error:** [exact error]
102
+ **Root Cause:** [1-2 sentence explanation]
103
+ **Confidence:** HIGH / MEDIUM / LOW
104
+
105
+ ### Evidence
106
+ - [source 1]: [finding]
107
+ - [source 2]: [finding]
108
+ - Knowledge base: [match found / no match]
109
+
110
+ ### Recommended Fix
111
+ - File: [path]
112
+ - Change: [specific change]
113
+ - Verify: [command]
114
+
115
+ ### What Was Wrong With Previous Attempts
116
+ - Attempt 1: [why it didn't work — specific reason]
117
+ - Attempt 2: [why it didn't work — specific reason]
118
+
119
+ ### If This Doesn't Work
120
+ - [Next diagnostic step to try]
121
+ - [What data to gather]
122
+ - [Whether to escalate to user — and what to ask them]
123
+ ```
124
+
125
+ ## Rules
126
+
127
+ 1. **Never guess.** Every claim must cite evidence from code, logs, KB, or git history.
128
+ 2. **Check KB first.** If a past incident matches, use that fix. Don't reinvent.
129
+ 3. **Be specific.** "Check the config" is not a fix. "Change line 42 of config.ts from X to Y" is.
130
+ 4. **Explain why previous attempts failed.** This is as valuable as the fix itself.
131
+ 5. **Know when to escalate.** If confidence is LOW and you can't gather more evidence, say so. Recommend what data to ask the user for.
132
+ 6. **Don't try the fix yourself.** Your job is diagnosis. The calling agent implements the fix.
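Step 2 of the troubleshooter process (consult the knowledge base before forming a hypothesis) could be approximated with a plain `grep` over the listed sources, in the documented order. This is a minimal sketch under assumptions: the `kb_search` name and its arguments are hypothetical, and the agent itself may search differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of step 2: search the documented knowledge sources, in order,
# for a fixed string taken from the error message.
# Usage: kb_search "<error text>" [project_root]
kb_search() {
  local needle="$1" root="${2:-.}"
  local src
  for src in \
    ".claude/knowledge/qa-knowledge" \
    ".claude/knowledge/shared/conventions.md" \
    ".claude/knowledge/shared/patterns.md" \
    ".claude/knowledge/agents" \
    "CLAUDE.md"; do
    # Skip sources that do not exist in this project.
    [ -e "$root/$src" ] || continue
    # -r recurse, -F fixed string, -i ignore case, -n line numbers, -H always print file name.
    grep -rFinH -- "$needle" "$root/$src" || true
  done
  return 0
}
```

Empty output corresponds to the "no match" branch, in which case the process moves on to step 3.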
@@ -1,68 +1,224 @@
1
1
  ---
2
2
  name: restart
3
- description: "Kill and restart local dev servers. Reads service config from CLAUDE.md. Usage: /restart [service]"
3
+ description: "Discover, restart, and validate local dev servers. Auto-detects Docker vs native, checks health, catches crash loops. Usage: /restart [service] [--preflight]"
4
4
  ---
5
5
 
6
6
  # Restart Servers Skill
7
7
 
8
- Kill and restart local dev servers using the configuration from CLAUDE.md.
8
+ Discover how services run, restart them safely, and verify they stay healthy.
9
9
 
10
- ## Instructions
10
+ ## Phase 1: Discover Services
11
11
 
12
- 1. **Read CLAUDE.md** in the project root. Look for the **Local Dev Services** table:
12
+ **First, check CLAUDE.md** for a `Local Dev Services` table:
13
13
 
14
14
  ```markdown
15
15
  ## Local Dev Services
16
16
 
17
- | Service | Port | Directory | Start Command |
18
- |----------|------|-----------|---------------|
19
- | Frontend | 3000 | frontend/ | npm run dev |
20
- | Backend | 8000 | backend/ | uvicorn app.main:app --reload --port 8000 |
17
+ | Service | Type | Port | Directory | Start Command | Health Check | Depends On |
18
+ |----------|--------|------|-----------|---------------------------------------|-----------------|------------|
19
+ | Postgres | docker | 5432 | — | docker compose up -d postgres | pg_isready | Docker |
20
+ | Backend | native | 8000 | backend/ | uvicorn app.main:app --reload | /api/health | Postgres |
21
+ | Frontend | native | 3000 | frontend/ | npm run dev | /health | Backend |
21
22
  ```
22
23
 
23
- If CLAUDE.md doesn't have this section, scan the project:
24
- - `package.json` scripts → find dev/start commands
25
- - `requirements.txt` / `pyproject.toml` → look for uvicorn/gunicorn/flask
26
- - `docker-compose.yml` → extract service ports
27
- - Common ports: frontend 3000, backend 8000
24
+ **If the table is missing or incomplete, auto-discover** by scanning the repo. Check these files in order:
25
+
26
+ | File | What to extract |
27
+ |------|-----------------|
28
+ | `docker-compose.yml` / `docker-compose.*.yml` | Service names, ports, images, healthcheck configs, volume mounts. These are **docker** type services. |
29
+ | `Dockerfile` / `Dockerfile.*` | What gets containerized — cross-reference with compose to understand which services are Docker-managed. |
30
+ | `package.json` (root + subdirs) | `scripts.dev`, `scripts.start`, `scripts.serve` → these are **native** Node services. Check `proxy` field for backend port. |
31
+ | `Makefile` / `Procfile` / `Justfile` | Process definitions, often with ports and dependency ordering. |
32
+ | `pyproject.toml` / `requirements.txt` | Python services — look for uvicorn/gunicorn/flask/django in deps. |
33
+ | `.env` / `.env.local` / `.env.example` | Port assignments (`PORT=`, `DATABASE_URL=`, `REDIS_URL=`), required env vars. |
34
+ | `turbo.json` / `nx.json` / `pnpm-workspace.yaml` | Monorepo structure — maps workspace packages to services. |
35
+
36
+ For each discovered service, determine:
37
+ - **Name**: human-readable (e.g., "Backend API", "Postgres")
38
+ - **Type**: `docker` (managed by docker compose) or `native` (runs directly on host)
39
+ - **Port**: what port it listens on
40
+ - **Directory**: where to `cd` before running the start command
41
+ - **Start command**: exact command to launch it
42
+ - **Health check**: endpoint or command to verify it's running (look for `/health`, `/api/health`, `pg_isready`, `redis-cli ping`, etc.)
43
+ - **Depends on**: other services that must be running first (DB before backend, backend before frontend)
44
+
45
+ **If discovery is ambiguous, ASK the user.** Specifically ask when:
46
+ - Multiple compose files exist and it's unclear which to use (dev vs prod vs test)
47
+ - A service could be Docker OR native (e.g., Postgres has both a compose entry and a local install)
48
+ - No health check endpoint is obvious — ask what URL or command confirms the service is healthy
49
+ - Port conflicts or unusual port assignments are detected
50
+ - You find services but can't determine the dependency order
51
+
52
+ Present what you found and ask for confirmation:
53
+ ```
54
+ Found 3 services:
55
+ 1. Postgres (docker, port 5432) — via docker-compose.yml
56
+ 2. Backend (native, port 8000) — via backend/pyproject.toml
57
+ 3. Frontend (native, port 3000) — via frontend/package.json
58
+
59
+ Dependencies: Frontend → Backend → Postgres
60
+
61
+ Health checks:
62
+ - Postgres: pg_isready -h localhost -p 5432
63
+ - Backend: curl localhost:8000/api/health
64
+ - Frontend: curl localhost:3000
65
+
66
+ Does this look right? Any corrections?
67
+ ```
68
+
69
+ After confirmation (or if CLAUDE.md table already exists), proceed to Phase 2.
70
+
71
+ **Update CLAUDE.md**: If services were auto-discovered and the user confirmed, write/update the `Local Dev Services` table in CLAUDE.md so future runs skip discovery.
28
72
 
29
- 2. **Kill existing processes** for the target service(s):
73
+ ## Phase 2: Pre-flight Validation
30
74
 
75
+ Before touching any running processes, validate the environment is ready:
76
+
77
+ ### 2a. Docker check (if any service type is `docker`)
78
+ ```bash
79
+ # Is Docker daemon running?
80
+ docker info > /dev/null 2>&1 && echo "Docker: OK" || echo "Docker: NOT RUNNING"
81
+ ```
82
+ If Docker is not running, **stop and tell the user**. Don't proceed.
83
+
84
+ ### 2b. Dependency check (for native services)
31
85
  ```bash
32
- # Kill by port
86
+ # Node services — are node_modules installed?
87
+ [ -d "<directory>/node_modules" ] && echo "node_modules: OK" || echo "node_modules: MISSING — run npm install"
88
+
89
+ # Python services — is the venv active / deps installed?
90
+ [ -f "<directory>/.venv/bin/python" ] && echo "venv: OK" || echo "venv: MISSING"
91
+ ```
92
+
93
+ ### 2c. Environment variables
94
+ ```bash
95
+ # Check critical env vars exist (read from .env.example or known requirements)
96
+ [ -f "<directory>/.env" ] || [ -f "<directory>/.env.local" ] && echo "env file: OK" || echo "env file: MISSING"
97
+ ```
98
+
99
+ ### 2d. Port availability
100
+ ```bash
101
+ # Check if port is already in use by a DIFFERENT process than expected
102
+ lsof -ti:<port> 2>/dev/null
103
+ ```
104
+ If a port is occupied by an unexpected process, warn the user before killing it.
105
+
106
+ ### 2e. Pre-flight only mode
107
+ If the user ran `/restart --preflight`, **stop here** and report results. Don't restart anything.
108
+
109
+ ## Phase 3: Restart Services (dependency order)
110
+
111
+ Restart in dependency order — infrastructure first, then backends, then frontends.
112
+
113
+ ### 3a. Stop services (reverse dependency order)
114
+ ```bash
115
+ # For native services — kill by port
33
116
  lsof -ti:<port> | xargs kill -9 2>/dev/null
34
117
 
118
+ # For docker services — stop the specific service
119
+ docker compose stop <service_name>
120
+
35
121
  # Wait for ports to free
36
122
  sleep 2
37
- ```
38
123
 
39
- 3. **Start each server** in background using the configured start command:
124
+ # Verify port is actually free
125
+ lsof -ti:<port> 2>/dev/null && echo "WARNING: port <port> still occupied" || echo "Port <port>: free"
126
+ ```
40
127
 
128
+ ### 3b. Start services (dependency order)
41
129
  ```bash
42
- # Run in background
43
- <start_command> &
130
+ # Docker services first
131
+ docker compose up -d <service_name>
132
+
133
+ # Then native services — run in background
134
+ cd <directory> && <start_command>
44
135
  ```
45
136
 
46
- Use `run_in_background: true` on the Bash tool.
137
+ Use `run_in_background: true` on the Bash tool for native services.
138
+
139
+ **Wait between dependency layers** — don't start the backend until the DB health check passes. Don't start the frontend until the backend health check passes.
140
+
141
+ ## Phase 4: Post-restart Health Validation
142
+
143
+ This is the critical phase. A single health check is not enough — services can start and then crash seconds later.
47
144
 
48
- 4. **Health check** after startup:
145
+ ### 4a. Initial health check (per service, in dependency order)
49
146
 
50
147
  ```bash
51
- # Check each service
52
- curl -sf http://localhost:<port>/health > /dev/null 2>&1 && echo "<service>:UP" || echo "<service>:DOWN"
148
+ # HTTP services
149
+ curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: UP" || echo "<service>: DOWN"
150
+
151
+ # Postgres
152
+ pg_isready -h localhost -p <port> 2>/dev/null && echo "Postgres: UP" || echo "Postgres: DOWN"
153
+
154
+ # Redis
155
+ redis-cli -p <port> ping 2>/dev/null | grep -q PONG && echo "Redis: UP" || echo "Redis: DOWN"
156
+
157
+ # Docker services
158
+ docker compose ps <service_name> --format '{{.Status}}' | grep -q "Up" && echo "<service>: UP" || echo "<service>: DOWN"
159
+ ```
160
+
161
+ ### 4b. Crash loop detection (wait 8 seconds, check again)
162
+
163
+ ```bash
164
+ sleep 8
165
+
166
+ # Re-check each service
167
+ curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: STABLE" || echo "<service>: CRASHED"
168
+
169
+ # For native services — is the process still running?
170
+ lsof -ti:<port> > /dev/null 2>&1 && echo "<service> process: alive" || echo "<service> process: GONE"
53
171
  ```
54
172
 
55
- 5. **Report** status of all servers.
173
+ If a service is down on the second check:
174
+ 1. **Check logs** — read the last 30 lines of output for error messages
175
+ 2. **Report the failure clearly** with the error
176
+ 3. **Do NOT retry automatically** — tell the user what went wrong
177
+
178
+ ### 4c. Final status report
179
+
180
+ ```
181
+ ✅ Restart complete
182
+
183
+ | Service | Type | Port | Status |
184
+ |----------|--------|------|---------|
185
+ | Postgres | docker | 5432 | STABLE |
186
+ | Backend | native | 8000 | STABLE |
187
+ | Frontend | native | 3000 | STABLE |
188
+
189
+ All services healthy after 8s stability check.
190
+ ```
191
+
192
+ Or if something failed:
193
+
194
+ ```
195
+ ⚠️ Restart partial — 1 service unhealthy
196
+
197
+ | Service | Type | Port | Status |
198
+ |----------|--------|------|---------|
199
+ | Postgres | docker | 5432 | STABLE |
200
+ | Backend | native | 8000 | CRASHED |
201
+ | Frontend | native | 3000 | SKIPPED |
202
+
203
+ Backend crashed after startup. Last error:
204
+ ModuleNotFoundError: No module named 'sqlalchemy'
205
+
206
+ Fix: Run `pip install -r requirements.txt` in backend/ and retry.
207
+ Frontend was skipped because it depends on Backend.
208
+ ```
56
209
 
57
210
  ## Argument Patterns
58
211
 
59
212
  | User Input | Action |
60
213
  |-----------|--------|
61
- | `/restart` (no args) | Kill and restart all configured servers |
62
- | `/restart backend` | Kill and restart only the backend service |
63
- | `/restart frontend` | Kill and restart only the frontend service |
64
- | `/restart <service>` | Kill and restart the named service |
214
+ | `/restart` (no args) | Discover → validate → restart all services |
215
+ | `/restart backend` | Restart only the named service (and its dependencies if they're down) |
216
+ | `/restart --preflight` | Discover and validate only; don't restart anything |
65
217
 
66
- ## If CLAUDE.md Is Missing Service Config
218
+ ## Key Principles
67
219
 
68
- Report: "No Local Dev Services table found in CLAUDE.md. Add a table with Service, Port, Directory, and Start Command columns. Or specify the services manually."
220
+ 1. **Never guess**: if you can't determine how a service runs, ask
221
+ 2. **Dependency order matters** — always start infra → backend → frontend
222
+ 3. **One health check isn't enough** — check twice with a gap to catch crash loops
223
+ 4. **Report errors with context** — show the actual log output, not just "DOWN"
224
+ 5. **Don't retry blindly** — if a service crashes, diagnose and report, don't loop
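The two-pass check behind principles 3 and 5 can be sketched as a small wrapper that runs any health-check command twice with a gap and reports the result without retrying. The `check_stable` name and its argument order are assumptions made for illustration; the skill itself only shows the individual `curl`, `pg_isready`, and `lsof` commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a health-check command twice with a gap in between.
# Usage: check_stable <service> <gap_seconds> <health command...>
# Prints "<service>: STABLE", "<service>: CRASHED", or "<service>: DOWN".
check_stable() {
  local service="$1" gap="$2"
  shift 2
  # First check: did the service come up at all?
  if ! "$@" > /dev/null 2>&1; then
    echo "$service: DOWN"
    return 1
  fi
  # Wait, then check again to catch a service that starts and then crashes.
  sleep "$gap"
  if "$@" > /dev/null 2>&1; then
    echo "$service: STABLE"
  else
    echo "$service: CRASHED"
    return 1
  fi
}
```

For example, `check_stable Backend 8 curl -sf http://localhost:8000/api/health` mirrors the `sleep 8` gap from Phase 4b. The function never restarts anything, so a `CRASHED` result is left for the caller to diagnose and report.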