@codyswann/lisa 1.60.7 → 1.62.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/plugins/lisa/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.claude-plugin/plugin.json +17 -2
- package/plugins/lisa-rails/agents/ops-specialist.md +226 -0
- package/plugins/lisa-rails/hooks/rubocop-on-edit.sh +78 -0
- package/plugins/lisa-rails/hooks/sg-scan-on-edit.sh +74 -0
- package/plugins/lisa-rails/rules/lisa.md +5 -2
- package/plugins/lisa-rails/skills/ops-check-logs/SKILL.md +191 -0
- package/plugins/lisa-rails/skills/ops-deploy/SKILL.md +153 -0
- package/plugins/lisa-rails/skills/ops-run-local/SKILL.md +169 -0
- package/plugins/lisa-rails/skills/ops-verify-jobs/SKILL.md +157 -0
- package/plugins/lisa-rails/skills/ops-verify-telemetry/SKILL.md +197 -0
- package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
- package/plugins/src/rails/.claude-plugin/plugin.json +10 -1
- package/plugins/src/rails/agents/ops-specialist.md +226 -0
- package/plugins/src/rails/hooks/rubocop-on-edit.sh +78 -0
- package/plugins/src/rails/hooks/sg-scan-on-edit.sh +74 -0
- package/plugins/src/rails/rules/lisa.md +5 -2
- package/plugins/src/rails/skills/ops-check-logs/SKILL.md +191 -0
- package/plugins/src/rails/skills/ops-deploy/SKILL.md +153 -0
- package/plugins/src/rails/skills/ops-run-local/SKILL.md +169 -0
- package/plugins/src/rails/skills/ops-verify-jobs/SKILL.md +157 -0
- package/plugins/src/rails/skills/ops-verify-telemetry/SKILL.md +197 -0
- package/rails/copy-overwrite/.rubocop.yml +3 -13
- package/rails/create-only/.github/workflows/ci.yml +1 -0
- package/rails/create-only/.github/workflows/claude-nightly-code-complexity.yml +21 -0
- package/rails/create-only/.github/workflows/claude-nightly-test-coverage.yml +29 -0
- package/rails/create-only/.github/workflows/claude-nightly-test-improvement.yml +32 -0
- package/rails/create-only/.simplecov +10 -1
- package/rails/create-only/rubocop.thresholds.yml +17 -0
- package/rails/create-only/simplecov.thresholds.json +4 -0
- package/typescript/create-only/.github/workflows/claude-nightly-test-coverage.yml +2 -0
@@ -0,0 +1,153 @@
---
name: ops-deploy
description: Deploy Rails applications via Kamal or CI/CD branch push to staging or production environments.
allowed-tools:
- Bash
- Read
---

# Ops: Deploy

Deploy the Rails application to remote environments.

**Argument**: `$ARGUMENTS` — environment (`staging`, `production`) and optional method (`kamal`, `ci`; default: `kamal`)

## Safety

**CRITICAL**: Production deployments require explicit human confirmation before proceeding. Always ask for confirmation when `$ARGUMENTS` contains `production`.

## Discovery

1. Read `config/deploy.yml` to discover the Kamal configuration: service name, registry, image, servers, accessories
2. Read `config/deploy.staging.yml` and `config/deploy.production.yml` for environment-specific overrides
3. Verify the specific environment-variable keys needed for the deploy are present in `.env.staging` / `.env.production` (or in SSM) without printing secret values
4. Read `Dockerfile` to understand the Docker build stages

## CI/CD Path (Preferred)

The standard deployment path is via CI/CD — pushing to environment branches triggers auto-deploy.

```bash
# Deploy to staging via CI/CD
git push origin HEAD:staging

# Deploy to production via CI/CD (requires human confirmation first)
git push origin HEAD:production
```

Monitor the deployment via GitHub Actions:

```bash
gh run list --branch {environment} --limit 3
gh run watch {run-id}
```

## Kamal Deployment (Manual)

### Pre-Deploy Checks

1. **Verify Kamal is installed**:
   ```bash
   kamal version
   ```

2. **Verify Docker builds locally**:
   ```bash
   docker build -t {app_name}:test .
   ```

3. **Check current deployment state**:
   ```bash
   kamal details -d {environment}
   ```

4. **Check for a deploy lock**:
   ```bash
   kamal lock status -d {environment}
   ```
   If locked from a previous interrupted deploy: `kamal lock release -d {environment}`

### Deploy to Staging

```bash
kamal deploy -d staging
```

### Deploy to Production

**Requires explicit human confirmation.**

```bash
kamal deploy -d production
```

### Deploy with Specific Git Ref

```bash
kamal deploy -d {environment} --version {git-sha-or-tag}
```

### Rollback

If a deploy causes issues, roll back to the previous version:

```bash
# List available versions
kamal app containers -d {environment}

# Rollback to previous version
kamal rollback {previous-version} -d {environment}
```

## Post-Deploy Verification

After any deployment:

1. **Health check** the deployed environment:
   ```bash
   curl -sf -o /dev/null -w "HTTP %{http_code} in %{time_total}s" https://{app_host}/up
   ```

2. **Verify ECS service stability** (running count matches desired count):
   ```bash
   aws ecs describe-services \
     --cluster {cluster-name} \
     --services {service-name} \
     --region {aws-region} \
     --query 'services[0].{Running:runningCount,Desired:desiredCount,Status:status}' \
     --output table
   ```

3. **Check migration status** (if migrations were included):
   ```bash
   kamal app exec --roles=web "bin/rails db:migrate:status" -d {environment}
   ```

4. **Check logs** for errors in the first 5 minutes (use `ops-check-logs` skill)

5. **Verify Solid Queue workers** are running (use `ops-verify-jobs` skill)

6. **Verify OpenTelemetry traces** are being exported (use `ops-verify-telemetry` skill)

## Kamal Utility Commands

| Command | Purpose |
|---------|---------|
| `kamal details -d {env}` | Show current deployment details |
| `kamal app logs -d {env}` | Tail application logs |
| `kamal app exec --roles=web "bin/rails console" -d {env}` | Open Rails console on remote |
| `kamal audit -d {env}` | Show deploy audit log |
| `kamal env push -d {env}` | Push updated environment variables |
| `kamal lock status -d {env}` | Check deploy lock status |
| `kamal lock release -d {env}` | Release a stale deploy lock |
| `kamal traefik reboot -d {env}` | Restart the Traefik proxy |

## Output Format

Report deployment result as a table:

| Target | Environment | Method | Status | Verification |
|--------|-------------|--------|--------|--------------|
| Rails app | staging | Kamal | SUCCESS/FAIL | /up returns 200 |
| ECS tasks | staging | N/A | STABLE/UNSTABLE | running == desired |
| Solid Queue | staging | N/A | RUNNING/DOWN | workers have heartbeat |
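The skill above takes an environment plus an optional method with a production confirmation gate. A minimal, hypothetical sketch of how a wrapper might parse such arguments (the `parse_deploy_args` function and `CONFIRM_PRODUCTION` variable are illustrative, not part of the package):

```shell
# Hypothetical sketch: parse "staging|production [kamal|ci]" arguments and
# refuse a production deploy unless an explicit acknowledgement is set.
parse_deploy_args() {
  local env="" method="kamal" arg
  for arg in "$@"; do
    case "$arg" in
      staging|production) env="$arg" ;;
      kamal|ci)           method="$arg" ;;
    esac
  done
  [ -n "$env" ] || { echo "usage: staging|production [kamal|ci]" >&2; return 1; }
  if [ "$env" = "production" ] && [ "${CONFIRM_PRODUCTION:-0}" != "1" ]; then
    # mirrors the skill's "explicit human confirmation" safety rule
    echo "refusing production deploy without confirmation" >&2
    return 2
  fi
  echo "$env $method"
}
```

For example, `parse_deploy_args staging` prints `staging kamal`, while `parse_deploy_args production ci` fails until `CONFIRM_PRODUCTION=1` is set.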
@@ -0,0 +1,169 @@
---
name: ops-run-local
description: Manage the local Docker Compose development environment for Rails applications. Supports start, stop, restart, and status for the full stack or individual services.
allowed-tools:
- Bash
- Read
---

# Ops: Run Local

Manage the local Docker Compose development environment.

**Argument**: `$ARGUMENTS` — `start`, `stop`, `restart`, `status`, `start-app`, `start-services` (default: `start`)

## Prerequisites (run before any operation)

1. Verify Docker is running:
   ```bash
   docker info > /dev/null 2>&1 && echo "Docker OK" || echo "ERROR: Docker is not running — start Docker Desktop"
   ```

2. Check port availability:
   ```bash
   lsof -i :3000 2>/dev/null | grep LISTEN
   lsof -i :5432 2>/dev/null | grep LISTEN
   ```

3. Verify Ruby and Bundler are available:
   ```bash
   ruby --version && bundle --version
   ```

## Discovery

Read the project's `docker-compose.yml` (or `compose.yaml`) to identify available services. Common services include:

- `web` or `app` — the Rails application
- `postgres` or `db` — PostgreSQL database
- `worker` — Solid Queue background worker
- `css` — Tailwind CSS watch process

Read `Procfile.dev` if it exists — it defines the local development process manager configuration (typically run via `bin/dev`).

Read `config/database.yml` to understand which databases need to exist locally.

## Operations

### start (full stack)

Start all Docker Compose services and the Rails application.

1. **Start infrastructure services** (PostgreSQL, etc.):
   ```bash
   docker compose up -d postgres
   ```

2. **Wait for PostgreSQL** (up to 30 seconds):
   ```bash
   for i in $(seq 1 30); do
     docker compose exec -T postgres pg_isready -U postgres > /dev/null 2>&1 && echo "PostgreSQL ready" && break
     sleep 1
   done
   ```

3. **Create and migrate databases** (if needed):
   ```bash
   bin/rails db:prepare
   ```

4. **Start the full stack** via `bin/dev` (Procfile.dev) or Docker Compose:
   ```bash
   # Option A: Procfile.dev (preferred — starts web, worker, CSS watcher)
   bin/dev
   ```
   Run this in the background using the Bash tool with `run_in_background: true`.

   ```bash
   # Option B: Docker Compose (if all services are containerized)
   docker compose up -d
   ```

5. **Wait for Rails** (up to 60 seconds):
   ```bash
   for i in $(seq 1 60); do
     curl -sf http://localhost:3000/up > /dev/null 2>&1 && echo "Rails ready" && break
     sleep 1
   done
   ```

6. Report status table.

### start-services (infrastructure only)

Start only infrastructure services (database, cache) without the Rails app.

```bash
docker compose up -d postgres
```

Wait for readiness:

```bash
for i in $(seq 1 30); do
  docker compose exec -T postgres pg_isready -U postgres > /dev/null 2>&1 && echo "PostgreSQL ready" && break
  sleep 1
done
```

### start-app (Rails app only, assumes services are running)

```bash
bin/dev
```

Run in background. Verify:

```bash
for i in $(seq 1 60); do
  curl -sf http://localhost:3000/up > /dev/null 2>&1 && echo "Rails ready" && break
  sleep 1
done
```

### stop

Stop all local services.

```bash
# Stop Rails processes (bin/dev uses foreman which spawns child processes)
lsof -ti :3000 | xargs kill -9 2>/dev/null || echo "No Rails process on :3000"

# Stop Docker Compose services
docker compose down
```

### restart

1. Run **stop** (above).
2. Wait 2 seconds: `sleep 2`
3. Run **start** (above).
4. Verify all services respond.

### status

Check what is currently running and responsive.

```bash
echo "=== Port Check ==="
echo -n "Rails :3000 — "; lsof -i :3000 2>/dev/null | grep LISTEN > /dev/null && echo "LISTENING" || echo "NOT LISTENING"
echo -n "Postgres :5432 — "; lsof -i :5432 2>/dev/null | grep LISTEN > /dev/null && echo "LISTENING" || echo "NOT LISTENING"

echo ""
echo "=== Health Check ==="
echo -n "Rails /up — "; curl -sf -o /dev/null -w "HTTP %{http_code} in %{time_total}s" http://localhost:3000/up 2>/dev/null || echo "UNREACHABLE"

echo ""
echo "=== Docker Compose ==="
docker compose ps 2>/dev/null || echo "No Docker Compose services running"

echo ""
echo "=== Solid Queue Worker ==="
bin/rails runner "puts SolidQueue::Process.where('last_heartbeat_at > ?', 5.minutes.ago).count.to_s + ' active workers'" 2>/dev/null || echo "Cannot query Solid Queue (app may not be running)"
```

Report results as a table:

| Service | Port | Listening | Responsive |
|---------|------|-----------|------------|
| Rails (web) | 3000 | YES/NO | YES/NO |
| PostgreSQL | 5432 | YES/NO | N/A |
| Solid Queue worker | N/A | N/A | YES/NO (heartbeat) |
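The PostgreSQL and Rails readiness loops in the skill above share one shape: poll a command until it succeeds or a timeout elapses. A hypothetical helper (not part of the package) that generalizes them:

```shell
# Hypothetical sketch: poll a command once per second until it succeeds or
# the timeout (in seconds) elapses. Succeeds with "<label> ready" on stdout,
# fails with a message on stderr otherwise.
wait_for() {
  local label="$1" timeout="$2" i
  shift 2
  for i in $(seq 1 "$timeout"); do
    if "$@" > /dev/null 2>&1; then
      echo "$label ready"
      return 0
    fi
    sleep 1
  done
  echo "$label not ready after ${timeout}s" >&2
  return 1
}
```

Usage mirroring the skill's loops: `wait_for PostgreSQL 30 docker compose exec -T postgres pg_isready -U postgres` or `wait_for Rails 60 curl -sf http://localhost:3000/up`.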
@@ -0,0 +1,157 @@
---
name: ops-verify-jobs
description: Verify Solid Queue background jobs are running in Rails applications. Check worker health, queue depth, failed jobs, recurring job execution, and retry stuck jobs.
allowed-tools:
- Bash
- Read
---

# Ops: Verify Jobs

Verify Solid Queue background jobs are running and healthy.

**Argument**: `$ARGUMENTS` — operation (`status`, `failed`, `retry`, `recurring`, `depth`; default: `status`)

## Discovery

1. Read `config/solid_queue.yml` (or `config/queue.yml`) to understand the worker and dispatcher configuration
2. Read the `app/jobs/` directory to discover all job classes
3. Read `config/recurring.yml` (if it exists) to discover scheduled/recurring jobs

## Prerequisites

The Rails application must be running (locally or remotely) for database-backed queries to work.

## Operations

### status (overall health)

Check Solid Queue process health via database heartbeats:

```bash
bin/rails runner "
processes = SolidQueue::Process.all
puts '=== Solid Queue Processes ==='
processes.each do |p|
  stale = p.last_heartbeat_at < 5.minutes.ago ? 'STALE' : 'OK'
  puts \"#{p.kind} | PID #{p.pid} | #{p.hostname} | Last heartbeat: #{p.last_heartbeat_at} | #{stale}\"
end
puts ''
puts \"Total: #{processes.count} processes\"
puts \"Healthy: #{processes.select { |p| p.last_heartbeat_at > 5.minutes.ago }.count}\"
puts \"Stale: #{processes.select { |p| p.last_heartbeat_at <= 5.minutes.ago }.count}\"
"
```

### depth (queue depth by queue name)

```bash
bin/rails runner "
puts '=== Queue Depth ==='
SolidQueue::Job.where(finished_at: nil).group(:queue_name).count.each do |queue, count|
  puts \"#{queue}: #{count} pending\"
end
puts ''
total = SolidQueue::Job.where(finished_at: nil).count
puts \"Total pending: #{total}\"
"
```

### failed (list failed jobs)

```bash
bin/rails runner "
failed = SolidQueue::FailedExecution.includes(:job).order(created_at: :desc).limit(20)
puts '=== Failed Jobs ==='
failed.each do |f|
  puts \"#{f.job.class_name} | Queue: #{f.job.queue_name} | Failed at: #{f.created_at} | Error: #{f.error.to_s.truncate(120)}\"
end
puts ''
puts \"Total failed: #{SolidQueue::FailedExecution.count}\"
"
```

### retry (retry failed jobs)

Retry all failed jobs:

```bash
bin/rails runner "
count = SolidQueue::FailedExecution.count
SolidQueue::FailedExecution.find_each do |fe|
  fe.retry
end
puts \"Retried #{count} failed jobs\"
"
```

Retry a specific job class:

```bash
bin/rails runner "
failed = SolidQueue::FailedExecution.includes(:job).where(solid_queue_jobs: { class_name: '{JobClassName}' })
count = failed.count
failed.find_each { |fe| fe.retry }
puts \"Retried #{count} #{'{JobClassName}'} jobs\"
"
```

### recurring (check recurring job schedule)

```bash
bin/rails runner "
puts '=== Recurring Tasks ==='
SolidQueue::RecurringTask.all.each do |task|
  last_run = SolidQueue::Job.where(class_name: task.class_name).order(created_at: :desc).first
  last_run_at = last_run&.created_at || 'NEVER'
  puts \"#{task.key} | #{task.class_name} | Schedule: #{task.schedule} | Last run: #{last_run_at}\"
end
" 2>/dev/null || echo "RecurringTask not available — check config/recurring.yml"
```

## Remote Verification (via Kamal)

For checking jobs in staging/production environments:

```bash
kamal app exec --roles=web "bin/rails runner \"
processes = SolidQueue::Process.all
healthy = processes.select { |p| p.last_heartbeat_at > 5.minutes.ago }.count
stale = processes.select { |p| p.last_heartbeat_at <= 5.minutes.ago }.count
failed = SolidQueue::FailedExecution.count
pending = SolidQueue::Job.where(finished_at: nil).count
puts \\\"Processes: #{processes.count} (#{healthy} healthy, #{stale} stale)\\\"
puts \\\"Pending jobs: #{pending}\\\"
puts \\\"Failed jobs: #{failed}\\\"
\"" -d {environment}
```

## Output Format

### Process Status

| Kind | PID | Hostname | Last Heartbeat | Status |
|------|-----|----------|----------------|--------|
| Worker | 12345 | web-1 | 30s ago | OK |
| Dispatcher | 12346 | web-1 | 15s ago | OK |

### Queue Depth

| Queue | Pending | Status |
|-------|---------|--------|
| default | 3 | OK |
| mailers | 0 | OK |
| low_priority | 150 | WARN (> 100) |

### Failed Jobs

| Job Class | Queue | Failed At | Error (truncated) |
|-----------|-------|-----------|-------------------|
| SendEmailJob | mailers | 5m ago | Net::SMTPAuthenticationError |
| ProcessPaymentJob | default | 1h ago | Stripe::CardError |

Flag concerns:
- Any stale process (heartbeat > 5 minutes ago)
- Queue depth > 100 pending jobs
- Any failed jobs in the last hour
- Recurring jobs that missed their schedule
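The Queue Depth table above classifies each queue as OK or `WARN (> 100)`. A hypothetical helper (not part of the package) showing that threshold as code, the same rule the skill's flag list states:

```shell
# Hypothetical sketch: classify a queue's pending-job count against the
# skill's depth threshold (more than 100 pending jobs is a warning).
queue_status() {
  local pending="$1"
  if [ "$pending" -gt 100 ]; then
    echo "WARN (> 100)"
  else
    echo "OK"
  fi
}
```

For instance, `queue_status 3` prints `OK` and `queue_status 150` prints `WARN (> 100)`, matching the example rows in the table.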
@@ -0,0 +1,197 @@
---
name: ops-verify-telemetry
description: Verify OpenTelemetry traces are being collected and exported to AWS X-Ray for Rails applications. Check collector health, trace export, and CloudWatch metrics.
allowed-tools:
- Bash
- Read
---

# Ops: Verify Telemetry

Verify OpenTelemetry traces are being collected and exported to X-Ray.

**Argument**: `$ARGUMENTS` — check type (`traces`, `metrics`, `all`; default: `all`)

## Discovery

1. Read `Gemfile` for OpenTelemetry gem configuration:
   - `opentelemetry-sdk`
   - `opentelemetry-exporter-otlp`
   - `opentelemetry-instrumentation-all` (or individual instrumentation gems)
2. Read `config/initializers/opentelemetry.rb` (or similar) for SDK configuration
3. Read `config/deploy.yml` or environment files for `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME`
4. Read `docker-compose.yml` for OpenTelemetry Collector sidecar configuration (if present)

## Local Verification

### Check OpenTelemetry Gems

```bash
bundle list | grep -i opentelemetry
```

### Check OpenTelemetry Configuration

```bash
grep -r "OpenTelemetry" config/initializers/ app/
```

### Verify OTLP Endpoint is Set

```bash
echo "OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_EXPORTER_OTLP_ENDPOINT:-not set}"
echo "OTEL_SERVICE_NAME=${OTEL_SERVICE_NAME:-not set}"
```

### Generate a Test Trace (local)

Make a request to the local app and check that traces are produced:

```bash
# Make a request that should generate a trace
curl -sf http://localhost:3000/up -w "\nHTTP %{http_code}\n"

# Check Rails logs for OpenTelemetry output (if configured to log)
grep -i "otel\|opentelemetry\|trace_id" log/development.log | tail -10
```

### Check Collector Sidecar (if using Docker Compose)

```bash
docker compose logs otel-collector 2>/dev/null | tail -20 || echo "No otel-collector service in Docker Compose"
```

## Remote Verification (AWS X-Ray)

### Check Recent Traces

```bash
aws xray get-trace-summaries \
  --region {aws-region} \
  --start-time $(ruby -r time -e 'puts (Time.now.utc - 30 * 60).iso8601') \
  --end-time $(ruby -r time -e 'puts Time.now.utc.iso8601') \
  --query 'TraceSummaries | length(@)' \
  --output text
```

### Check Traces for the Application Service

```bash
aws xray get-trace-summaries \
  --region {aws-region} \
  --start-time $(ruby -r time -e 'puts (Time.now.utc - 30 * 60).iso8601') \
  --end-time $(ruby -r time -e 'puts Time.now.utc.iso8601') \
  --filter-expression "service(\"{service-name}\")" \
  --query 'TraceSummaries[:10].{TraceId:Id,Duration:Duration,StatusCode:Http.HttpStatus,URL:Http.HttpURL,ResponseTime:ResponseTime}' \
  --output table
```

### Check for Error Traces

```bash
aws xray get-trace-summaries \
  --region {aws-region} \
  --start-time $(ruby -r time -e 'puts (Time.now.utc - 3600).iso8601') \
  --end-time $(ruby -r time -e 'puts Time.now.utc.iso8601') \
  --filter-expression "service(\"{service-name}\") AND fault = true" \
  --query 'TraceSummaries[:10].{TraceId:Id,Duration:Duration,StatusCode:Http.HttpStatus,URL:Http.HttpURL}' \
  --output table
```

### Get Detailed Trace

```bash
aws xray batch-get-traces \
  --region {aws-region} \
  --trace-ids "{trace-id}" \
  --query 'Traces[0].Segments[].Document' \
  --output text | jq '.'
```

### Check X-Ray Service Map

```bash
aws xray get-service-graph \
  --region {aws-region} \
  --start-time $(ruby -r time -e 'puts (Time.now.utc - 3600).iso8601') \
  --end-time $(ruby -r time -e 'puts Time.now.utc.iso8601') \
  --query 'Services[].{Name:Name,Type:Type,Edges:Edges[].{Ref:ReferenceId,Latency:ResponseTimeHistogram[0].Average}}' \
  --output table
```

## CloudWatch Metrics Verification

### Check Custom Metrics Published by the Application

```bash
aws cloudwatch list-metrics \
  --region {aws-region} \
  --namespace "{app_name}" \
  --query 'Metrics[].{MetricName:MetricName,Dimensions:Dimensions[].{Name:Name,Value:Value}}' \
  --output table
```

### Check Recent Metric Data Points

```bash
aws cloudwatch get-metric-statistics \
  --region {aws-region} \
  --namespace "{app_name}" \
  --metric-name "{metric-name}" \
  --start-time $(ruby -r time -e 'puts (Time.now.utc - 3600).iso8601') \
  --end-time $(ruby -r time -e 'puts Time.now.utc.iso8601') \
  --period 300 \
  --statistics Average Sum \
  --output table
```

### Check CloudWatch Alarms

```bash
aws cloudwatch describe-alarms \
  --region {aws-region} \
  --alarm-name-prefix "{app_name}" \
  --query 'MetricAlarms[].{Name:AlarmName,State:StateValue,Metric:MetricName,Threshold:Threshold}' \
  --output table
```

## Remote Verification (via Kamal)

For checking telemetry configuration in deployed environments:

```bash
kamal app exec --roles=web "bin/rails runner \"
puts 'OTEL_EXPORTER_OTLP_ENDPOINT: ' + ENV.fetch('OTEL_EXPORTER_OTLP_ENDPOINT', 'NOT SET')
puts 'OTEL_SERVICE_NAME: ' + ENV.fetch('OTEL_SERVICE_NAME', 'NOT SET')
puts 'OTEL_TRACES_EXPORTER: ' + ENV.fetch('OTEL_TRACES_EXPORTER', 'NOT SET')
\"" -d {environment}
```

## Output Format

### Trace Summary

| Check | Status | Details |
|-------|--------|---------|
| OTel gems installed | OK/FAIL | opentelemetry-sdk v1.x.x |
| OTel initializer present | OK/FAIL | config/initializers/opentelemetry.rb |
| OTLP endpoint configured | OK/FAIL | https://otel-collector:4318 |
| Traces in X-Ray (last 30m) | OK/FAIL | 245 traces found |
| Error traces (last 1h) | OK/WARN | 3 fault traces |
| CloudWatch metrics | OK/FAIL | 12 custom metrics published |
| CloudWatch alarms | OK/WARN | 1 alarm in ALARM state |

### Recent Traces

| Trace ID | Duration | Status | URL |
|----------|----------|--------|-----|
| 1-abc123 | 45ms | 200 | GET /up |
| 1-def456 | 120ms | 200 | GET /api/v1/users |
| 1-ghi789 | 2300ms | 500 | POST /api/v1/orders |

Flag concerns:
- No traces found in the last 30 minutes
- Error rate above 5% of total traces
- Average trace duration above 1 second
- OTLP endpoint not configured
- CloudWatch alarms in ALARM state
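One of the flags above is an error rate above 5% of total traces. A hypothetical helper (not part of the package) computing that rate from a fault count and a total trace count, using awk for the floating-point division:

```shell
# Hypothetical sketch: compute fault-trace percentage and flag it against
# the skill's 5% threshold. Prints e.g. "1.2% OK" or "8.2% WARN".
error_rate_flag() {
  local faults="$1" total="$2"
  awk -v f="$faults" -v t="$total" 'BEGIN {
    if (t == 0) { print "NO TRACES"; exit }  # no traces is itself a flag
    rate = 100 * f / t
    printf "%.1f%% %s\n", rate, (rate > 5 ? "WARN" : "OK")
  }'
}
```

With the example numbers from the Trace Summary table (3 fault traces out of 245), `error_rate_flag 3 245` prints `1.2% OK`.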
```diff
@@ -1,7 +1,7 @@
 {
   "name": "lisa-rails",
   "version": "1.0.0",
-  "description": "Ruby on Rails-specific
+  "description": "Ruby on Rails-specific hooks — RuboCop linting/formatting and ast-grep scanning on edit",
   "author": { "name": "Cody Swann" },
   "hooks": {
     "UserPromptSubmit": [
@@ -14,6 +14,15 @@
         }
       ]
     }
+    ],
+    "PostToolUse": [
+      {
+        "matcher": "Write|Edit",
+        "hooks": [
+          { "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/rubocop-on-edit.sh" },
+          { "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/sg-scan-on-edit.sh" }
+        ]
+      }
     ]
   }
 }
```
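The manifest diff registers `rubocop-on-edit.sh` and `sg-scan-on-edit.sh` as `PostToolUse` hooks on `Write|Edit`, but the 78- and 74-line scripts themselves are not shown in this diff. As a rough illustration of the shape such a hook takes: Claude Code delivers the tool event as JSON on stdin, and the hook reacts to the edited file path. Everything below is hypothetical, including the naive sed-based field extraction standing in for a proper JSON parse:

```shell
# Hypothetical sketch of a PostToolUse hook in the spirit of
# rubocop-on-edit.sh (not the actual package script). Reads the hook
# event JSON on stdin and acts only on Ruby files.
rubocop_on_edit() {
  local file_path
  # naive extraction of "file_path" from single-line JSON; real scripts
  # would use jq against .tool_input.file_path
  file_path=$(sed -n 's/.*"file_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  case "$file_path" in
    *.rb|*.rake)
      echo "lint: $file_path"  # the real hook would run RuboCop on the file here
      ;;
    *)
      echo "skip"
      ;;
  esac
}
```

A non-Ruby edit falls through to `skip`, so the hook stays out of the way for the plugin's markdown and YAML files.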