npm - @arthai/agents - Versions diffs - 1.0.5 → 1.0.7 - Mend

@arthai/agents 1.0.5 → 1.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (130)

package/README.md +33 -3
package/VERSION +1 -1
package/agents/troubleshooter.md +132 -0
package/bin/cli.js +296 -0
package/bundles/canvas.json +1 -1
package/bundles/compass.json +1 -1
package/bundles/counsel.json +1 -0
package/bundles/cruise.json +1 -1
package/bundles/forge.json +12 -1
package/bundles/prism.json +1 -0
package/bundles/scalpel.json +5 -2
package/bundles/sentinel.json +8 -2
package/bundles/shield.json +1 -0
package/bundles/spark.json +1 -0
package/compiler.sh +14 -0
package/dist/plugins/canvas/.claude-plugin/plugin.json +1 -1
package/dist/plugins/canvas/VERSION +1 -0
package/dist/plugins/canvas/commands/planning.md +100 -11
package/dist/plugins/canvas/hooks/hooks.json +16 -0
package/dist/plugins/canvas/hooks/project-setup.sh +109 -0
package/dist/plugins/canvas/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/canvas/templates/CLAUDE.md.template +111 -0
package/dist/plugins/compass/.claude-plugin/plugin.json +1 -1
package/dist/plugins/compass/VERSION +1 -0
package/dist/plugins/compass/commands/planning.md +100 -11
package/dist/plugins/compass/hooks/hooks.json +16 -0
package/dist/plugins/compass/hooks/project-setup.sh +109 -0
package/dist/plugins/compass/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/compass/templates/CLAUDE.md.template +111 -0
package/dist/plugins/counsel/.claude-plugin/plugin.json +1 -1
package/dist/plugins/counsel/VERSION +1 -0
package/dist/plugins/counsel/hooks/hooks.json +10 -0
package/dist/plugins/counsel/hooks/project-setup.sh +109 -0
package/dist/plugins/counsel/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/counsel/templates/CLAUDE.md.template +111 -0
package/dist/plugins/cruise/.claude-plugin/plugin.json +1 -1
package/dist/plugins/cruise/VERSION +1 -0
package/dist/plugins/cruise/hooks/hooks.json +16 -0
package/dist/plugins/cruise/hooks/project-setup.sh +109 -0
package/dist/plugins/cruise/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/cruise/templates/CLAUDE.md.template +111 -0
package/dist/plugins/forge/.claude-plugin/plugin.json +1 -1
package/dist/plugins/forge/VERSION +1 -0
package/dist/plugins/forge/agents/troubleshooter.md +132 -0
package/dist/plugins/forge/commands/implement.md +99 -1
package/dist/plugins/forge/commands/planning.md +100 -11
package/dist/plugins/forge/hooks/escalation-guard.sh +177 -0
package/dist/plugins/forge/hooks/hooks.json +22 -0
package/dist/plugins/forge/hooks/project-setup.sh +109 -0
package/dist/plugins/forge/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/forge/templates/CLAUDE.md.template +111 -0
package/dist/plugins/prime/.claude-plugin/plugin.json +1 -1
package/dist/plugins/prime/VERSION +1 -0
package/dist/plugins/prime/agents/troubleshooter.md +132 -0
package/dist/plugins/prime/commands/calibrate.md +20 -0
package/dist/plugins/prime/commands/ci-fix.md +36 -0
package/dist/plugins/prime/commands/fix.md +23 -0
package/dist/plugins/prime/commands/implement.md +99 -1
package/dist/plugins/prime/commands/planning.md +100 -11
package/dist/plugins/prime/commands/qa-incident.md +54 -0
package/dist/plugins/prime/commands/restart.md +186 -30
package/dist/plugins/prime/hooks/escalation-guard.sh +177 -0
package/dist/plugins/prime/hooks/hooks.json +60 -0
package/dist/plugins/prime/hooks/post-config-change-restart-reminder.sh +86 -0
package/dist/plugins/prime/hooks/post-server-crash-watch.sh +120 -0
package/dist/plugins/prime/hooks/pre-server-port-guard.sh +110 -0
package/dist/plugins/prime/hooks/project-setup.sh +109 -0
package/dist/plugins/prime/hooks/sync-agents.sh +99 -12
package/dist/plugins/prime/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/prime/templates/CLAUDE.md.template +111 -0
package/dist/plugins/prism/.claude-plugin/plugin.json +1 -1
package/dist/plugins/prism/VERSION +1 -0
package/dist/plugins/prism/commands/qa-incident.md +54 -0
package/dist/plugins/prism/hooks/hooks.json +12 -0
package/dist/plugins/prism/hooks/project-setup.sh +109 -0
package/dist/plugins/prism/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/prism/templates/CLAUDE.md.template +111 -0
package/dist/plugins/scalpel/.claude-plugin/plugin.json +1 -1
package/dist/plugins/scalpel/VERSION +1 -0
package/dist/plugins/scalpel/agents/troubleshooter.md +132 -0
package/dist/plugins/scalpel/commands/ci-fix.md +36 -0
package/dist/plugins/scalpel/commands/fix.md +23 -0
package/dist/plugins/scalpel/hooks/escalation-guard.sh +177 -0
package/dist/plugins/scalpel/hooks/hooks.json +24 -0
package/dist/plugins/scalpel/hooks/project-setup.sh +109 -0
package/dist/plugins/scalpel/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/scalpel/templates/CLAUDE.md.template +111 -0
package/dist/plugins/sentinel/.claude-plugin/plugin.json +1 -1
package/dist/plugins/sentinel/VERSION +1 -0
package/dist/plugins/sentinel/agents/troubleshooter.md +132 -0
package/dist/plugins/sentinel/commands/restart.md +186 -30
package/dist/plugins/sentinel/hooks/escalation-guard.sh +177 -0
package/dist/plugins/sentinel/hooks/hooks.json +64 -0
package/dist/plugins/sentinel/hooks/post-config-change-restart-reminder.sh +86 -0
package/dist/plugins/sentinel/hooks/post-server-crash-watch.sh +120 -0
package/dist/plugins/sentinel/hooks/pre-server-port-guard.sh +110 -0
package/dist/plugins/sentinel/hooks/project-setup.sh +109 -0
package/dist/plugins/sentinel/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/sentinel/templates/CLAUDE.md.template +111 -0
package/dist/plugins/shield/.claude-plugin/plugin.json +1 -1
package/dist/plugins/shield/VERSION +1 -0
package/dist/plugins/shield/hooks/hooks.json +22 -12
package/dist/plugins/shield/hooks/project-setup.sh +109 -0
package/dist/plugins/shield/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/shield/templates/CLAUDE.md.template +111 -0
package/dist/plugins/spark/.claude-plugin/plugin.json +1 -1
package/dist/plugins/spark/VERSION +1 -0
package/dist/plugins/spark/commands/calibrate.md +20 -0
package/dist/plugins/spark/hooks/hooks.json +10 -0
package/dist/plugins/spark/hooks/project-setup.sh +109 -0
package/dist/plugins/spark/templates/CLAUDE.md.managed-block +123 -0
package/dist/plugins/spark/templates/CLAUDE.md.template +111 -0
package/hook-defs.json +31 -0
package/hooks/escalation-guard.sh +177 -0
package/hooks/post-config-change-restart-reminder.sh +86 -0
package/hooks/post-server-crash-watch.sh +120 -0
package/hooks/pre-server-port-guard.sh +110 -0
package/hooks/project-setup.sh +109 -0
package/hooks/sync-agents.sh +99 -12
package/install.sh +2 -2
package/package.json +1 -1
package/portable.manifest +7 -1
package/skills/calibrate/SKILL.md +20 -0
package/skills/ci-fix/SKILL.md +36 -0
package/skills/fix/SKILL.md +23 -0
package/skills/implement/SKILL.md +99 -1
package/skills/license/SKILL.md +159 -0
package/skills/planning/SKILL.md +100 -11
package/skills/qa-incident/SKILL.md +54 -0
package/skills/restart/SKILL.md +187 -31

package/dist/plugins/scalpel/templates/CLAUDE.md.template ADDED

@@ -0,0 +1,111 @@
+# CLAUDE.md — {{PROJECT_NAME}}
+<!-- Generated by claude-agents install.sh --init -->
+<!-- TODO: Replace {{placeholders}} with your project details -->
+## Project Overview
+{{PROJECT_NAME}} is a {{DESCRIPTION}}.
+## Tech Stack
+- **Frontend**: <!-- TODO: e.g., Next.js 14, React 18, TypeScript, Tailwind -->
+- **Backend**: <!-- TODO: e.g., FastAPI, SQLAlchemy, PostgreSQL -->
+- **Auth**: <!-- TODO: e.g., Stytch, Auth0, Clerk -->
+- **Deploy**: <!-- TODO: e.g., Railway, Vercel, AWS -->
+## Project Structure
+```
+{{PROJECT_NAME}}/
+├── frontend/          <!-- TODO: Frontend directory -->
+├── backend/           <!-- TODO: Backend directory -->
+└── ...
+```
+## Key Architecture
+<!-- TODO: Describe your auth flow, API patterns, database schema, etc. -->
+## Local Dev Services
+<!-- TODO: Auto-populated by /scan or fill manually -->
+| Service  | Port | Directory | Start Command |
+|----------|------|-----------|---------------|
+| Frontend | <!-- TODO --> | frontend/ | <!-- TODO: e.g., npm run dev --> |
+| Backend  | <!-- TODO --> | backend/  | <!-- TODO: e.g., uvicorn app.main:app --reload --> |
+## Test Commands
+<!-- TODO: Auto-populated by /scan or fill manually -->
+| What | Command | Directory |
+|------|---------|-----------|
+| Backend tests | <!-- TODO: e.g., pytest --> | backend/ |
+| Backend lint | <!-- TODO: e.g., ruff check . --> | backend/ |
+| Frontend tests | <!-- TODO: e.g., npm test --> | frontend/ |
+| Frontend lint | <!-- TODO: e.g., npm run lint --> | frontend/ |
+| Type check | <!-- TODO: e.g., npx tsc --noEmit --> | frontend/ |
+| E2E tests | <!-- TODO: e.g., npx playwright test --> | frontend/ |
+## Infrastructure
+<!-- TODO: Auto-populated by /scan or fill manually -->
+| Platform | Service | Domain |
+|----------|---------|--------|
+| <!-- TODO: e.g., Railway --> | <!-- TODO --> | <!-- TODO --> |
+Health endpoints: <!-- TODO: e.g., /health, /api/health -->
+## Environments
+<!-- TODO: Auto-populated by /scan environments or /calibrate -->
+| Name | Type | URL | Health | Deploy | Branch |
+|------|------|-----|--------|--------|--------|
+| local | development | <!-- TODO --> | <!-- TODO: e.g., /health --> | manual | — |
+| <!-- TODO --> | <!-- TODO: staging/production/preview/canary --> | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |
+Access notes: <!-- TODO: e.g., Railway MCP for staging/prod. Env vars: .env.local, .env.staging -->
+## Domain
+<!-- TODO: Auto-populated by /scan or fill manually -->
+<!-- Describe what this app does, its core entities, and business rules. -->
+<!-- Used by qa-domain agent for domain-aware testing. -->
+## Running Locally
+```bash
+# TODO: Add your local development commands
+# Frontend
+cd frontend && npm run dev
+# Backend
+cd backend && source .venv/bin/activate && uvicorn app.main:app --reload
+```
+## Critical Rules
+<!-- TODO: Add project-specific rules, e.g.: -->
+- Never push to main directly — always create a PR
+- Secrets in .env.local only — never committed
+## Agent Customization
+The following agents/skills are managed by `claude-agents` (symlinked):
+- Run `~/.claude-agents/install.sh --status` to see what's linked
+- To override any portable file, replace the symlink with a regular file
+- Your override won't be touched by future syncs
+### Project-Specific Agents
+Add project-specific agents as regular files in `.claude/agents/`:
+- See `~/.claude-agents/examples/agents/` for templates (frontend, backend, ops, sre, qa)
+### Project-Specific Skills
+Add project-specific skills as regular directories in `.claude/skills/`:
+- See `~/.claude-agents/examples/skills/` for templates (ci-fix, qa, restart)
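
Note on the template above: it leaves `{{PROJECT_NAME}}` and `{{DESCRIPTION}}` for the user to fill in. The diff does not show how, or whether, `install.sh --init` substitutes them, so the following is only a minimal sketch of one way to do it by hand. `render_template` is a hypothetical helper name, not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill {{PROJECT_NAME}} and {{DESCRIPTION}} in a template.
# Illustration only; the package's install.sh may work differently.
render_template() {
  local template="$1" name="$2" description="$3"
  local esc_name esc_desc
  # Escape characters that are special on the replacement side of sed (\, &, |).
  esc_name=$(printf '%s' "$name" | sed -e 's/[\\&|]/\\&/g')
  esc_desc=$(printf '%s' "$description" | sed -e 's/[\\&|]/\\&/g')
  # Substitute every occurrence of each placeholder and print to stdout.
  sed -e "s|{{PROJECT_NAME}}|${esc_name}|g" \
      -e "s|{{DESCRIPTION}}|${esc_desc}|g" "$template"
}
```

Possible usage, assuming the template has been copied into the working directory: `render_template CLAUDE.md.template "acme" "billing API" > CLAUDE.md`. The `<!-- TODO -->` comments are left untouched and still need manual edits.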

package/dist/plugins/sentinel/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "sentinel",
   "description": "Operations and reliability — SRE, incidents, deploy monitoring",
-  "version": "1.0.5",
+  "version": "1.0.7",
   "author": {
     "name": "Arth AI"
   }

package/dist/plugins/sentinel/VERSION ADDED

@@ -0,0 +1 @@
+1.0.7

package/dist/plugins/sentinel/agents/troubleshooter.md ADDED

@@ -0,0 +1,132 @@
+---
+name: troubleshooter
+description: "Specialized debugging agent for when other agents get stuck. Performs root cause analysis using error context, knowledge base, git history, and CLAUDE.md. Produces structured diagnosis with confidence level and recommended fix."
+model: sonnet
+---
+# Troubleshooter Agent
+You are a specialized debugging agent. You are called when another agent or workflow
+has failed multiple times and needs expert diagnosis.
+## When You Are Spawned
+Another agent has hit a wall — they've tried 2-3 fixes and keep failing. Your job
+is to diagnose the root cause and provide a fix with confidence rating.
+## Your Process (follow in order)
+### 1. Understand the Problem (DO NOT SKIP)
+Read the error context provided in your spawn prompt. Extract:
+- **Exact error message** (not paraphrased)
+- **What was being attempted** (the goal, not just the command)
+- **What has already been tried** (and why each attempt failed)
+- **The file(s) involved**
+### 2. Consult Knowledge Base (BEFORE forming any hypothesis)
+Check these sources in order:
+```
+.claude/knowledge/qa-knowledge/         → past incidents with error signatures
+.claude/knowledge/shared/conventions.md → project-specific gotchas and rules
+.claude/knowledge/shared/patterns.md    → architecture patterns that may explain the error
+.claude/knowledge/agents/               → per-agent learning files
+CLAUDE.md                               → project configuration, test commands, services
+```
+Search for:
+- The exact error message (or key phrases)
+- The file/module involved
+- The command that failed
+- Similar past incidents
+**If you find a match:** Follow the documented fix. Do not reinvent.
+**If no match:** Proceed to step 3.
+### 3. Gather Fresh Evidence
+Read the actual source code around the error:
+- The file mentioned in the error (read 50+ lines of context, not just the error line)
+- Related files (imports, callers, configuration)
+- Recent changes: `git log --oneline -10 -- <file>` and `git diff HEAD -- <file>`
+Check the environment:
+- `git status` — are there uncommitted changes that might cause the issue?
+- Check if the right dependencies are installed (node_modules, venv, etc.)
+- Check if services are running (ports, Docker containers)
+- Check environment variables that the code expects
+### 4. Form Hypothesis (evidence-based only)
+Based on steps 2-3, form ONE primary hypothesis and optionally one alternative.
+Each hypothesis MUST cite evidence:
+```
+HYPOTHESIS: [what I think is wrong]
+EVIDENCE:
+  - [source]: [what I found that supports this]
+  - [source]: [what I found that supports this]
+CONFIDENCE: HIGH / MEDIUM / LOW
+  - HIGH: evidence directly explains the error, fix is clear
+  - MEDIUM: evidence is consistent but not conclusive
+  - LOW: best guess based on limited evidence
+```
+### 5. Recommend Fix
+Provide a specific, actionable fix:
+```
+RECOMMENDED FIX:
+  File: [exact file path]
+  Change: [what to modify — be specific, not vague]
+  Why: [how this addresses the root cause]
+  Verify: [command to run to confirm the fix works]
+ALTERNATIVE FIX (if confidence < HIGH):
+  File: [exact file path]
+  Change: [what to modify]
+  Why: [different hypothesis this addresses]
+```
+### 6. Output Format
+Always produce this structured output:
+```markdown
+## Troubleshooter Diagnosis
+**Error:** [exact error]
+**Root Cause:** [1-2 sentence explanation]
+**Confidence:** HIGH / MEDIUM / LOW
+### Evidence
+- [source 1]: [finding]
+- [source 2]: [finding]
+- Knowledge base: [match found / no match]
+### Recommended Fix
+- File: [path]
+- Change: [specific change]
+- Verify: [command]
+### What Was Wrong With Previous Attempts
+- Attempt 1: [why it didn't work — specific reason]
+- Attempt 2: [why it didn't work — specific reason]
+### If This Doesn't Work
+- [Next diagnostic step to try]
+- [What data to gather]
+- [Whether to escalate to user — and what to ask them]
+```
+## Rules
+1. **Never guess.** Every claim must cite evidence from code, logs, KB, or git history.
+2. **Check KB first.** If a past incident matches, use that fix. Don't reinvent.
+3. **Be specific.** "Check the config" is not a fix. "Change line 42 of config.ts from X to Y" is.
+4. **Explain why previous attempts failed.** This is as valuable as the fix itself.
+5. **Know when to escalate.** If confidence is LOW and you can't gather more evidence, say so. Recommend what data to ask the user for.
+6. **Don't try the fix yourself.** Your job is diagnosis. The calling agent implements the fix.
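
Note on step 2 of the agent file above: it asks for a search of several knowledge-base locations before any hypothesis is formed. A rough sketch of that lookup as a shell function follows. The paths are the ones listed in the agent file; `kb_search` itself is a hypothetical name and the agent is not known to run anything like it.

```shell
#!/usr/bin/env bash
# Sketch: search the knowledge-base locations from step 2 for an error phrase.
# kb_search is a hypothetical helper; only the paths come from the agent file.
kb_search() {
  local phrase="$1" root="${2:-.}"
  local sources=(
    ".claude/knowledge/qa-knowledge"
    ".claude/knowledge/shared/conventions.md"
    ".claude/knowledge/shared/patterns.md"
    ".claude/knowledge/agents"
    "CLAUDE.md"
  )
  local src found=1
  for src in "${sources[@]}"; do
    # Skip sources that do not exist in this project.
    [ -e "$root/$src" ] || continue
    # -r recurse, -n line numbers, -i ignore case, -F match the phrase literally.
    if grep -rniF -- "$phrase" "$root/$src"; then
      found=0
    fi
  done
  # Exit status 0 if any source matched, 1 otherwise.
  return "$found"
}
```

For example, `kb_search "No module named" .` would print matching lines from whichever of those sources exist. A non-zero exit status corresponds to the "no match, proceed to step 3" branch.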

package/dist/plugins/sentinel/commands/restart.md CHANGED

@@ -1,68 +1,224 @@
 ---
 name: restart
-description: "Kill and restart local dev servers. Reads service config from CLAUDE.md. Usage: /restart [service]"
+description: "Discover, restart, and validate local dev servers. Auto-detects Docker vs native, checks health, catches crash loops. Usage: /restart <service> <--preflight>"
 ---
 # Restart Servers Skill
-Kill and restart local dev servers using the configuration from CLAUDE.md.
+Discover how services run, restart them safely, and verify they stay healthy.
-## Instructions
+## Phase 1: Discover Services
-1. **Read CLAUDE.md** in the project root. Look for the **Local Dev Services** table:
+**First, check CLAUDE.md** for a `Local Dev Services` table:
 ```markdown
 ## Local Dev Services
-| Service  | Port | Directory | Start Command |
-|----------|------|-----------|---------------|
-| Frontend | 3000 | frontend/ | npm run dev |
-| Backend  | 8000 | backend/  | uvicorn app.main:app --reload --port 8000 |
+| Service  | Type   | Port | Directory | Start Command                         | Health Check    | Depends On |
+|----------|--------|------|-----------|---------------------------------------|-----------------|------------|
+| Postgres | docker | 5432 | —         | docker compose up -d postgres         | pg_isready      | Docker     |
+| Backend  | native | 8000 | backend/  | uvicorn app.main:app --reload         | /api/health     | Postgres   |
+| Frontend | native | 3000 | frontend/ | npm run dev                           | /health         | Backend    |
 ```
-If CLAUDE.md doesn't have this section, scan the project:
-- `package.json` scripts → find dev/start commands
-- `requirements.txt` / `pyproject.toml` → look for uvicorn/gunicorn/flask
-- `docker-compose.yml` → extract service ports
-- Common ports: frontend 3000, backend 8000
+**If the table is missing or incomplete, auto-discover** by scanning the repo. Check these files in order:
+| File | What to extract |
+|------|-----------------|
+| `docker-compose.yml` / `docker-compose.*.yml` | Service names, ports, images, healthcheck configs, volume mounts. These are **docker** type services. |
+| `Dockerfile` / `Dockerfile.*` | What gets containerized — cross-reference with compose to understand which services are Docker-managed. |
+| `package.json` (root + subdirs) | `scripts.dev`, `scripts.start`, `scripts.serve` → these are **native** Node services. Check `proxy` field for backend port. |
+| `Makefile` / `Procfile` / `Justfile` | Process definitions, often with ports and dependency ordering. |
+| `pyproject.toml` / `requirements.txt` | Python services — look for uvicorn/gunicorn/flask/django in deps. |
+| `.env` / `.env.local` / `.env.example` | Port assignments (`PORT=`, `DATABASE_URL=`, `REDIS_URL=`), required env vars. |
+| `turbo.json` / `nx.json` / `pnpm-workspace.yaml` | Monorepo structure — maps workspace packages to services. |
+For each discovered service, determine:
+- **Name**: human-readable (e.g., "Backend API", "Postgres")
+- **Type**: `docker` (managed by docker compose) or `native` (runs directly on host)
+- **Port**: what port it listens on
+- **Directory**: where to `cd` before running the start command
+- **Start command**: exact command to launch it
+- **Health check**: endpoint or command to verify it's running (look for `/health`, `/api/health`, `pg_isready`, `redis-cli ping`, etc.)
+- **Depends on**: other services that must be running first (DB before backend, backend before frontend)
+**If discovery is ambiguous, ASK the user.** Specifically ask when:
+- Multiple compose files exist and it's unclear which to use (dev vs prod vs test)
+- A service could be Docker OR native (e.g., Postgres has both a compose entry and a local install)
+- No health check endpoint is obvious — ask what URL or command confirms the service is healthy
+- Port conflicts or unusual port assignments are detected
+- You find services but can't determine the dependency order
+Present what you found and ask for confirmation:
+```
+Found 3 services:
+  1. Postgres (docker, port 5432) — via docker-compose.yml
+  2. Backend (native, port 8000) — via backend/pyproject.toml
+  3. Frontend (native, port 3000) — via frontend/package.json
+Dependencies: Frontend → Backend → Postgres
+Health checks:
+  - Postgres: pg_isready -h localhost -p 5432
+  - Backend: curl localhost:8000/api/health
+  - Frontend: curl localhost:3000
+Does this look right? Any corrections?
+```
+After confirmation (or if CLAUDE.md table already exists), proceed to Phase 2.
+**Update CLAUDE.md**: If services were auto-discovered and the user confirmed, write/update the `Local Dev Services` table in CLAUDE.md so future runs skip discovery.
-2. **Kill existing processes** for the target service(s):
+## Phase 2: Pre-flight Validation
+Before touching any running processes, validate the environment is ready:
+### 2a. Docker check (if any service type is `docker`)
+```bash
+# Is Docker daemon running?
+docker info > /dev/null 2>&1 && echo "Docker: OK" || echo "Docker: NOT RUNNING"
+```
+If Docker is not running, **stop and tell the user**. Don't proceed.
+### 2b. Dependency check (for native services)
 ```bash
-# Kill by port
+# Node services — are node_modules installed?
+[ -d "<directory>/node_modules" ] && echo "node_modules: OK" || echo "node_modules: MISSING — run npm install"
+# Python services — is the venv active / deps installed?
+[ -f "<directory>/.venv/bin/python" ] && echo "venv: OK" || echo "venv: MISSING"
+```
+### 2c. Environment variables
+```bash
+# Check critical env vars exist (read from .env.example or known requirements)
+[ -f "<directory>/.env" ] || [ -f "<directory>/.env.local" ] && echo "env file: OK" || echo "env file: MISSING"
+```
+### 2d. Port availability
+```bash
+# Check if port is already in use by a DIFFERENT process than expected
+lsof -ti:<port> 2>/dev/null
+```
+If a port is occupied by an unexpected process, warn the user before killing it.
+### 2e. Pre-flight only mode
+If the user ran `/restart --preflight`, **stop here** and report results. Don't restart anything.
+## Phase 3: Restart Services (dependency order)
+Restart in dependency order — infrastructure first, then backends, then frontends.
+### 3a. Stop services (reverse dependency order)
+```bash
+# For native services — kill by port
 lsof -ti:<port> | xargs kill -9 2>/dev/null
+# For docker services — stop the specific service
+docker compose stop <service_name>
 # Wait for ports to free
 sleep 2
-```
-3. **Start each server** in background using the configured start command:
+# Verify port is actually free
+lsof -ti:<port> 2>/dev/null && echo "WARNING: port <port> still occupied" || echo "Port <port>: free"
+```
+### 3b. Start services (dependency order)
 ```bash
-# Run in background
-<start_command> &
+# Docker services first
+docker compose up -d <service_name>
+# Then native services — run in background
+cd <directory> && <start_command>
 ```
-Use `run_in_background: true` on the Bash tool.
+Use `run_in_background: true` on the Bash tool for native services.
+**Wait between dependency layers** — don't start the backend until the DB health check passes. Don't start the frontend until the backend health check passes.
+## Phase 4: Post-restart Health Validation
+This is the critical phase. A single health check is not enough — services can start and then crash seconds later.
-4. **Health check** after startup:
+### 4a. Initial health check (per service, in dependency order)
 ```bash
-# Check each service
-curl -sf http://localhost:<port>/health > /dev/null 2>&1 && echo "<service>:UP" || echo "<service>:DOWN"
+# HTTP services
+curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: UP" || echo "<service>: DOWN"
+# Postgres
+pg_isready -h localhost -p <port> 2>/dev/null && echo "Postgres: UP" || echo "Postgres: DOWN"
+# Redis
+redis-cli -p <port> ping 2>/dev/null | grep -q PONG && echo "Redis: UP" || echo "Redis: DOWN"
+# Docker services
+docker compose ps <service_name> --format '{{.Status}}' | grep -q "Up" && echo "<service>: UP" || echo "<service>: DOWN"
+```
+### 4b. Crash loop detection (wait 8 seconds, check again)
+```bash
+sleep 8
+# Re-check each service
+curl -sf http://localhost:<port><health_path> > /dev/null 2>&1 && echo "<service>: STABLE" || echo "<service>: CRASHED"
+# For native services — is the process still running?
+lsof -ti:<port> > /dev/null 2>&1 && echo "<service> process: alive" || echo "<service> process: GONE"
 ```
-5. **Report** status of all servers.
+If a service is down on the second check:
+1. **Check logs** — read the last 30 lines of output for error messages
+2. **Report the failure clearly** with the error
+3. **Do NOT retry automatically** — tell the user what went wrong
+### 4c. Final status report
+```
+✅ Restart complete
+| Service  | Type   | Port | Status  |
+|----------|--------|------|---------|
+| Postgres | docker | 5432 | STABLE  |
+| Backend  | native | 8000 | STABLE  |
+| Frontend | native | 3000 | STABLE  |
+All services healthy after 10s stability check.
+```
+Or if something failed:
+```
+⚠️ Restart partial — 1 service unhealthy
+| Service  | Type   | Port | Status  |
+|----------|--------|------|---------|
+| Postgres | docker | 5432 | STABLE  |
+| Backend  | native | 8000 | CRASHED |
+| Frontend | native | 3000 | SKIPPED |
+Backend crashed after startup. Last error:
+  ModuleNotFoundError: No module named 'sqlalchemy'
+Fix: Run `pip install -r requirements.txt` in backend/ and retry.
+Frontend was skipped because it depends on Backend.
+```
 ## Argument Patterns
 | User Input | Action |
 |-----------|--------|
-| `/restart` (no args) | Kill and restart all configured servers |
-| `/restart backend` | Kill and restart only the backend service |
-| `/restart frontend` | Kill and restart only the frontend service |
-| `/restart <service>` | Kill and restart the named service |
+| `/restart` (no args) | Discover → validate → restart all services |
+| `/restart backend` | Restart only the named service (and its dependencies if they're down) |
+| `/restart --preflight` | Discover and validate only — don't restart anything |
-## If CLAUDE.md Is Missing Service Config
+## Key Principles
-Report: "No Local Dev Services table found in CLAUDE.md. Add a table with Service, Port, Directory, and Start Command columns. Or specify the services manually."
+1. **Never guess** — if you can't determine how a service runs, ask
+2. **Dependency order matters** — always start infra → backend → frontend
+3. **One health check isn't enough** — check twice with a gap to catch crash loops
+4. **Report errors with context** — show the actual log output, not just "DOWN"
+5. **Don't retry blindly** — if a service crashes, diagnose and report, don't loop
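
Note on Phase 4 of the skill above: the crash-loop check reduces to probing a service, waiting, and probing again. A minimal sketch of that pattern as a reusable function follows. `check_stable` and its exit codes are assumptions made for illustration and do not appear in the package.

```shell
#!/usr/bin/env bash
# Sketch of the two-step check from Phase 4: probe once, wait, probe again.
# check_stable is a hypothetical helper, not something the skill defines.
check_stable() {
  local service="$1" delay="$2"
  shift 2
  # Remaining arguments form the health-check command, e.g. curl -sf <url>.
  if ! "$@" > /dev/null 2>&1; then
    echo "$service: DOWN"
    return 1
  fi
  # Give a crash-looping process time to die before the second probe.
  sleep "$delay"
  if "$@" > /dev/null 2>&1; then
    echo "$service: STABLE"
  else
    echo "$service: CRASHED"
    return 2
  fi
}
```

Possible usage with the values from the skill's example table: `check_stable Backend 8 curl -sf http://localhost:8000/api/health`. Using different exit codes for DOWN and CRASHED lets a caller tell "never came up" apart from "came up and then died", which matches the skill's instruction to report rather than retry.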