@howlil/ez-agents 3.4.2 → 3.5.0
- package/README.md +77 -2
- package/agents/ez-observer-agent.md +260 -0
- package/agents/ez-release-agent.md +333 -0
- package/agents/ez-requirements-agent.md +377 -0
- package/agents/ez-scrum-master-agent.md +242 -0
- package/agents/ez-tech-lead-agent.md +267 -0
- package/bin/install.js +3221 -3272
- package/commands/ez/arch-review.md +102 -0
- package/commands/ez/execute-phase.md +11 -0
- package/commands/ez/export-session.md +79 -0
- package/commands/ez/gather-requirements.md +117 -0
- package/commands/ez/git-workflow.md +72 -0
- package/commands/ez/hotfix.md +120 -0
- package/commands/ez/import-session.md +82 -0
- package/commands/ez/list-sessions.md +96 -0
- package/commands/ez/package-manager.md +316 -0
- package/commands/ez/plan-phase.md +9 -1
- package/commands/ez/preflight.md +79 -0
- package/commands/ez/progress.md +13 -1
- package/commands/ez/release.md +153 -0
- package/commands/ez/resume.md +107 -0
- package/commands/ez/standup.md +85 -0
- package/ez-agents/bin/ez-tools.cjs +1095 -716
- package/ez-agents/bin/lib/bdd-validator.cjs +622 -0
- package/ez-agents/bin/lib/content-scanner.cjs +238 -0
- package/ez-agents/bin/lib/context-cache.cjs +154 -0
- package/ez-agents/bin/lib/context-errors.cjs +71 -0
- package/ez-agents/bin/lib/context-manager.cjs +220 -0
- package/ez-agents/bin/lib/discussion-synthesizer.cjs +458 -0
- package/ez-agents/bin/lib/file-access.cjs +207 -0
- package/ez-agents/bin/lib/git-errors.cjs +83 -0
- package/ez-agents/bin/lib/git-utils.cjs +321 -203
- package/ez-agents/bin/lib/git-workflow-engine.cjs +1157 -0
- package/ez-agents/bin/lib/index.cjs +46 -2
- package/ez-agents/bin/lib/lockfile-validator.cjs +227 -0
- package/ez-agents/bin/lib/logger.cjs +124 -154
- package/ez-agents/bin/lib/memory-compression.cjs +256 -0
- package/ez-agents/bin/lib/metrics-tracker.cjs +406 -0
- package/ez-agents/bin/lib/package-manager-detector.cjs +203 -0
- package/ez-agents/bin/lib/package-manager-executor.cjs +385 -0
- package/ez-agents/bin/lib/package-manager-service.cjs +216 -0
- package/ez-agents/bin/lib/release-validator.cjs +614 -0
- package/ez-agents/bin/lib/safe-exec.cjs +128 -214
- package/ez-agents/bin/lib/session-chain.cjs +304 -0
- package/ez-agents/bin/lib/session-errors.cjs +81 -0
- package/ez-agents/bin/lib/session-export.cjs +251 -0
- package/ez-agents/bin/lib/session-import.cjs +262 -0
- package/ez-agents/bin/lib/session-manager.cjs +280 -0
- package/ez-agents/bin/lib/tier-manager.cjs +428 -0
- package/ez-agents/bin/lib/url-fetch.cjs +170 -0
- package/ez-agents/references/metrics-schema.md +118 -0
- package/ez-agents/references/planning-config.md +140 -0
- package/ez-agents/references/tier-strategy.md +103 -0
- package/ez-agents/templates/bdd-feature.md +173 -0
- package/ez-agents/templates/discussion.md +68 -0
- package/ez-agents/templates/incident-runbook.md +205 -0
- package/ez-agents/templates/release-checklist.md +133 -0
- package/ez-agents/templates/rollback-plan.md +201 -0
- package/ez-agents/workflows/arch-review.md +54 -0
- package/ez-agents/workflows/autonomous.md +844 -743
- package/ez-agents/workflows/execute-phase.md +45 -0
- package/ez-agents/workflows/export-session.md +255 -0
- package/ez-agents/workflows/gather-requirements.md +206 -0
- package/ez-agents/workflows/help.md +92 -0
- package/ez-agents/workflows/hotfix.md +291 -0
- package/ez-agents/workflows/import-session.md +303 -0
- package/ez-agents/workflows/new-milestone.md +713 -384
- package/ez-agents/workflows/new-project.md +1107 -1113
- package/ez-agents/workflows/plan-phase.md +22 -0
- package/ez-agents/workflows/progress.md +15 -25
- package/ez-agents/workflows/release.md +253 -0
- package/ez-agents/workflows/resume-session.md +215 -0
- package/ez-agents/workflows/standup.md +64 -0
- package/package.json +9 -2
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
---
|
|
2
|
+
phase: {phase-number}-{phase-slug}
|
|
3
|
+
status: open
|
|
4
|
+
participants: [ez-requirements-agent, ez-tech-lead-agent, ez-observer-agent, ez-scrum-master-agent]
|
|
5
|
+
opened: {timestamp}
|
|
6
|
+
consensus: pending
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Phase {X}: {Name} — Pre-Execution Discussion
|
|
10
|
+
|
|
11
|
+
**Purpose:** Parallel agent perspectives before phase execution. Orchestrator reads consensus before spawning executors.
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## Requirements Perspective (ez-requirements-agent)
|
|
16
|
+
|
|
17
|
+
> *Populated by ez-requirements-agent during phase kickoff*
|
|
18
|
+
|
|
19
|
+
{Populated during gather-requirements or plan-phase kickoff}
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Tech Lead Perspective (ez-tech-lead-agent)
|
|
24
|
+
|
|
25
|
+
> *Populated by ez-tech-lead-agent during plan-phase review*
|
|
26
|
+
|
|
27
|
+
{Populated during arch-review after plan creation}
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Observer Perspective (ez-observer-agent)
|
|
32
|
+
|
|
33
|
+
> *Populated by ez-observer-agent during execute-phase pre-flight*
|
|
34
|
+
|
|
35
|
+
{Populated during execute-phase pre-flight}
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Scrum Master Perspective (ez-scrum-master-agent)
|
|
40
|
+
|
|
41
|
+
> *Populated by ez-scrum-master-agent during phase kickoff*
|
|
42
|
+
|
|
43
|
+
{Populated during standup or phase kickoff}
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Consensus
|
|
48
|
+
|
|
49
|
+
> *Synthesized by orchestrator from above perspectives*
|
|
50
|
+
|
|
51
|
+
**Status:** {open | consensus-reached | needs-human}
|
|
52
|
+
|
|
53
|
+
### Blockers
|
|
54
|
+
{List any hard blockers from any agent, or "None"}
|
|
55
|
+
|
|
56
|
+
### Key Warnings
|
|
57
|
+
{List significant warnings, or "None"}
|
|
58
|
+
|
|
59
|
+
### Go / No-Go
|
|
60
|
+
{GO — proceed to execution | NO-GO — resolve blockers first | HUMAN-NEEDED — requires user input}
|
|
61
|
+
|
|
62
|
+
### Rationale
|
|
63
|
+
{1-2 sentences explaining the consensus decision}
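One way an orchestrator could derive the Go / No-Go value from the sections above is sketched below. This is an illustration only, not the package's `discussion-synthesizer.cjs`: the function name `decideGoNoGo`, the input shape, and the choice to let blockers take precedence over a request for human input are all assumptions.

```javascript
// Hypothetical sketch: map collected perspectives to a Go / No-Go decision.
// The input shape ({ blockers, needsHuman }) is assumed for illustration only.
function decideGoNoGo({ blockers = [], needsHuman = false } = {}) {
  // Any hard blocker from any agent means execution should not start.
  if (blockers.length > 0) return 'NO-GO';
  // No blockers, but at least one agent asked for user input.
  if (needsHuman) return 'HUMAN-NEEDED';
  return 'GO';
}

console.log(decideGoNoGo({ blockers: ['Missing API contract'] })); // NO-GO
```

Keeping the decision in one small pure function makes the rule easy to audit next to the written rationale.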
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
*Discussion opened: {timestamp}*
|
|
68
|
+
*Last updated: {timestamp}*
|
|
@@ -0,0 +1,205 @@
|
|
|
1
|
+
# Incident Runbook: {service-name}
|
|
2
|
+
|
|
3
|
+
**Version:** v{version} ({tier} tier)
|
|
4
|
+
**Last updated:** {date}
|
|
5
|
+
**Owner:** {team or individual}
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Severity Levels
|
|
10
|
+
|
|
11
|
+
| Level | Description | Response Time | Example |
|-------|-------------|---------------|---------|
| P0 — Critical | Complete outage, data loss | Immediate | App unreachable, DB corruption |
| P1 — High | Major feature broken, many users affected | 15 minutes | Login broken, payments failing |
| P2 — Medium | Feature degraded, workaround exists | 1 hour | Slow responses, non-critical feature down |
| P3 — Low | Minor issue, cosmetic | Next business day | UI glitch, non-critical error |
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## On-Call Contacts
|
|
21
|
+
|
|
22
|
+
| Role | Contact | When to Call |
|------|---------|--------------|
| Primary on-call | {name/handle} | P0 and P1 |
| Secondary | {name/handle} | Primary unreachable |
| Database | {name/handle} | DB issues |
| Platform/Infra | {name/handle} | Infrastructure issues |
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Quick Diagnostics
|
|
32
|
+
|
|
33
|
+
### 1. Is the app running?
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
curl -f https://{your-domain}/health && echo "UP" || echo "DOWN"
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### 2. Check recent deployments
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
git log --oneline -5 # Recent commits
|
|
43
|
+
# Or check your deployment platform dashboard
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
### 3. Check error logs
|
|
47
|
+
|
|
48
|
+
```bash
|
|
49
|
+
# Vercel: vercel logs
|
|
50
|
+
# Railway: railway logs
|
|
51
|
+
# Generic: tail -100 /var/log/app.log
|
|
52
|
+
# Or: check error tracking dashboard (Sentry, etc.)
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
### 4. Check database connectivity
|
|
56
|
+
|
|
57
|
+
```bash
# Test DB connection
node -e "
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();
prisma.\$connect().then(() => { console.log('DB OK'); process.exit(0); })
  .catch(e => { console.error('DB FAIL:', e.message); process.exit(1); });
"
```
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## Common Incidents and Resolutions
|
|
70
|
+
|
|
71
|
+
### App is down (P0)
|
|
72
|
+
|
|
73
|
+
1. Check if recent deployment caused it: `git log --oneline -3`
|
|
74
|
+
2. If yes → roll back immediately (see Rollback Procedure)
|
|
75
|
+
3. If no → check infrastructure status (hosting provider status page)
|
|
76
|
+
4. Check logs for error pattern
|
|
77
|
+
5. If DB issue → see DB incident section
|
|
78
|
+
|
|
79
|
+
### Login/Auth broken (P1)
|
|
80
|
+
|
|
81
|
+
1. Check auth service logs
|
|
82
|
+
2. Verify environment variables: `NEXTAUTH_SECRET`, `NEXTAUTH_URL`, OAuth credentials
|
|
83
|
+
3. Check if auth provider (Google, GitHub, etc.) is having issues
|
|
84
|
+
4. Test with curl: `curl -X POST https://{your-domain}/api/auth/signin`
|
|
85
|
+
5. If JWT secret rotated → issue new tokens (force re-login acceptable)
|
|
86
|
+
|
|
87
|
+
### Payments failing (P1)
|
|
88
|
+
|
|
89
|
+
1. Check payment provider dashboard (Stripe, etc.) for service issues
|
|
90
|
+
2. Check webhook delivery in provider dashboard
|
|
91
|
+
3. Verify API keys are valid: check for expiry or rotation
|
|
92
|
+
4. Check logs for specific Stripe error codes
|
|
93
|
+
5. **Do NOT retry failed charges** until root cause identified
|
|
94
|
+
|
|
95
|
+
### Database full / slow (P1)
|
|
96
|
+
|
|
97
|
+
1. Check disk usage and connection count
|
|
98
|
+
2. Identify slow queries: check database logs or monitoring
|
|
99
|
+
3. Kill long-running queries if blocking
|
|
100
|
+
4. Check for lock contention
|
|
101
|
+
5. Consider read replica if load issue
|
|
102
|
+
|
|
103
|
+
### High error rate (P2)
|
|
104
|
+
|
|
105
|
+
1. Check error tracking for error pattern
|
|
106
|
+
2. Identify affected endpoints from logs
|
|
107
|
+
3. Check if recent deployment correlates
|
|
108
|
+
4. Roll back if correlation found
|
|
109
|
+
5. Otherwise: hotfix with `/ez:hotfix start {description}`
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## Rollback Procedure
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
# 1. Decision: roll back if P0/P1 persists > {rollback_window}
|
|
117
|
+
|
|
118
|
+
# 2. Roll back application
|
|
119
|
+
vercel rollback # Vercel
|
|
120
|
+
# OR: git revert HEAD && git push # generic
|
|
121
|
+
|
|
122
|
+
# 3. Verify health
|
|
123
|
+
curl -f https://{your-domain}/health
|
|
124
|
+
|
|
125
|
+
# 4. Notify stakeholders
|
|
126
|
+
# See communication templates below
|
|
127
|
+
|
|
128
|
+
# 5. Document in post-mortem
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
Full rollback plan: `.planning/releases/v{version}-ROLLBACK-PLAN.md`
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Communication Templates
|
|
136
|
+
|
|
137
|
+
### Customer-facing (P0)
|
|
138
|
+
|
|
139
|
+
```
|
|
140
|
+
[{service-name} Status] Service disruption
|
|
141
|
+
|
|
142
|
+
We are experiencing a service disruption affecting {impact description}.
|
|
143
|
+
Our team is actively working to restore service.
|
|
144
|
+
|
|
145
|
+
Current status: Investigating
|
|
146
|
+
ETA: Unknown (update in 30 minutes)
|
|
147
|
+
|
|
148
|
+
We apologize for the inconvenience.
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
### Customer-facing (Resolved)
|
|
152
|
+
|
|
153
|
+
```
|
|
154
|
+
[{service-name} Status] Service restored
|
|
155
|
+
|
|
156
|
+
The service disruption has been resolved at {time}.
|
|
157
|
+
All systems are operating normally.
|
|
158
|
+
|
|
159
|
+
Duration: {duration}
|
|
160
|
+
Impact: {description}
|
|
161
|
+
|
|
162
|
+
We apologize for the disruption. A full post-mortem will be published at {link}.
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
## Post-Incident Process
|
|
168
|
+
|
|
169
|
+
1. **Immediate** (within 1 hour): Write incident timeline in `.planning/incidents/{date}-{title}.md`
|
|
170
|
+
2. **Short-term** (within 24 hours): Root cause analysis complete
|
|
171
|
+
3. **Follow-up** (within 1 week): Prevention measures implemented
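The three windows above can be turned into concrete due dates. The sketch below assumes the windows are measured from the moment the incident is resolved, which the template does not state; `postIncidentDeadlines` and its field names are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: compute due dates for the three post-incident steps.
// Assumption: all windows start at the resolution time.
function postIncidentDeadlines(resolvedAt) {
  const start = resolvedAt.getTime();
  return {
    timelineDue: new Date(start + HOUR_MS),            // within 1 hour
    rootCauseDue: new Date(start + 24 * HOUR_MS),      // within 24 hours
    preventionDue: new Date(start + 7 * 24 * HOUR_MS), // within 1 week
  };
}

const due = postIncidentDeadlines(new Date('2024-01-01T00:00:00Z'));
console.log(due.rootCauseDue.toISOString()); // 2024-01-02T00:00:00.000Z
```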
|
|
172
|
+
|
|
173
|
+
### Post-Mortem Template
|
|
174
|
+
|
|
175
|
+
```markdown
|
|
176
|
+
# Incident Post-Mortem: {title}
|
|
177
|
+
|
|
178
|
+
**Date:** {date}
|
|
179
|
+
**Duration:** {start} – {end} ({total duration})
|
|
180
|
+
**Severity:** P{level}
|
|
181
|
+
**Impact:** {affected users/features}
|
|
182
|
+
|
|
183
|
+
## Timeline
|
|
184
|
+
- HH:MM — {event}
|
|
185
|
+
- HH:MM — {event}
|
|
186
|
+
|
|
187
|
+
## Root Cause
|
|
188
|
+
{Single sentence root cause}
|
|
189
|
+
|
|
190
|
+
## Contributing Factors
|
|
191
|
+
{What made this possible}
|
|
192
|
+
|
|
193
|
+
## What Went Well
|
|
194
|
+
{Detection speed, response time, etc.}
|
|
195
|
+
|
|
196
|
+
## Action Items
|
|
197
|
+
| Action | Owner | Due |
|
|
198
|
+
|--------|-------|-----|
|
|
199
|
+
| {item} | {person} | {date} |
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
204
|
+
*Generated by EZ Agents release-agent*
|
|
205
|
+
*Tier: {tier} | Version: v{version}*
|
|
@@ -0,0 +1,133 @@
|
|
|
1
|
+
# Release Checklist Template
|
|
2
|
+
|
|
3
|
+
Tier-parameterized checklist for `/ez:release`. Items marked `[AUTO]` can be checked programmatically. Items marked `[HUMAN]` require manual verification.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## MVP Checklist (6 items)
|
|
8
|
+
|
|
9
|
+
Use when: First public release, early access, startup MVP
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
Release: v{version} — MVP Tier
|
|
13
|
+
Date: {date}
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
| # | Item | Type | Result |
|---|------|------|--------|
| 1 | All @must BDD scenarios passing | AUTO | |
| 2 | `npm audit` — no critical vulnerabilities | AUTO | |
| 3 | Health endpoint returns 200 (if applicable) | AUTO | |
| 4 | No secrets in committed files | AUTO | |
| 5 | Application starts without errors (`npm start` or equivalent) | AUTO | |
| 6 | Rollback procedure documented | AUTO | |
|
|
24
|
+
|
|
25
|
+
**Gate:** All 6 items must pass for MVP release.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Medium Checklist (18 items)
|
|
30
|
+
|
|
31
|
+
Use when: General availability, paying customers, production SLA
|
|
32
|
+
|
|
33
|
+
```
|
|
34
|
+
Release: v{version} — Medium Tier
|
|
35
|
+
Date: {date}
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
*Includes all MVP items plus:*
|
|
39
|
+
|
|
40
|
+
| # | Item | Type | Result |
|---|------|------|--------|
| 1-6 | All MVP items | AUTO | |
| 7 | All @should BDD scenarios passing | AUTO | |
| 8 | Test coverage ≥ 80% | AUTO | |
| 9 | Staging environment parity verified | HUMAN | |
| 10 | Monitoring/alerts configured (uptime, error rate) | HUMAN | |
| 11 | Structured logging in place (no console.log in prod) | AUTO | |
| 12 | Performance baseline documented | HUMAN | |
| 13 | Error tracking configured (Sentry, Rollbar, or equivalent) | HUMAN | |
| 14 | Database migrations tested on staging copy | HUMAN | |
| 15 | API documentation current (README or OpenAPI) | HUMAN | |
| 16 | Environment variables documented (.env.example up to date) | AUTO | |
| 17 | Graceful shutdown handled (SIGTERM, connection draining) | AUTO | |
| 18 | Rate limiting on public API endpoints | AUTO | |
|
|
55
|
+
|
|
56
|
+
**Gate:** Items 1-12 must pass. Items 13-18 advisory for Medium.
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## Enterprise Checklist (30 items)
|
|
61
|
+
|
|
62
|
+
Use when: Enterprise customers, regulated industries, compliance requirements
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
Release: v{version} — Enterprise Tier
|
|
66
|
+
Date: {date}
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
*Includes all Medium items plus:*
|
|
70
|
+
|
|
71
|
+
| # | Item | Type | Result |
|---|------|------|--------|
| 1-18 | All Medium items | MIXED | |
| 19 | All @could BDD scenarios passing | AUTO | |
| 20 | Test coverage ≥ 95% | AUTO | |
| 21 | Security audit completed (internal or third-party) | HUMAN | |
| 22 | Compliance documentation updated (SOC2/GDPR controls) | HUMAN | |
| 23 | Load test results documented (target: 2x expected peak) | HUMAN | |
| 24 | Disaster recovery tested (backup restore procedure) | HUMAN | |
| 25 | Data retention policy configured | HUMAN | |
| 26 | Audit logging enabled (who did what, when) | AUTO | |
| 27 | Penetration test completed or scheduled | HUMAN | |
| 28 | SOC2/GDPR controls validated | HUMAN | |
| 29 | Change management ticket filed | HUMAN | |
| 30 | Incident runbook up to date | HUMAN | |
|
|
86
|
+
|
|
87
|
+
**Gate:** All items 1-26 must pass. Items 27-30 must be scheduled/documented.
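The three gate rules (MVP: items 1-6; Medium: 1-12 blocking, 13-18 advisory; Enterprise: 1-26 blocking, 27-30 tracked) can be expressed as data. The following is a minimal sketch, not the package's `release-validator.cjs`: `evaluateGate`, the `results` shape (item number mapped to `true`/`false`), and treating Enterprise items 27-30 as non-blocking follow-ups are assumptions.

```javascript
// Inclusive item ranges per tier, taken from the Gate lines above.
const TIER_GATES = {
  mvp: { blocking: [1, 6], advisory: null },
  medium: { blocking: [1, 12], advisory: [13, 18] },
  enterprise: { blocking: [1, 26], advisory: [27, 30] },
};

// Collect item numbers in the range whose result is not exactly `true`.
// Assumption: a missing result counts as a failure.
function failuresInRange(results, range) {
  if (!range) return [];
  const [first, last] = range;
  const failed = [];
  for (let item = first; item <= last; item++) {
    if (results[item] !== true) failed.push(item);
  }
  return failed;
}

function evaluateGate(tier, results) {
  const gate = TIER_GATES[tier];
  if (!gate) throw new Error(`Unknown tier: ${tier}`);
  const blockingFailures = failuresInRange(results, gate.blocking);
  const advisoryFailures = failuresInRange(results, gate.advisory);
  return { passed: blockingFailures.length === 0, blockingFailures, advisoryFailures };
}
```

Returning the failing item numbers rather than a bare boolean lets a caller print which checklist rows need attention.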
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## Security Gates (All Tiers)
|
|
92
|
+
|
|
93
|
+
These run before ANY release regardless of tier:
|
|
94
|
+
|
|
95
|
+
```bash
# Gate 1: No secrets
git grep -i -E "(api[_-]?key|password|secret)['\"]?\s*[=:]\s*['\"]?[a-zA-Z0-9+/]{20,}" HEAD

# Gate 2: npm audit
npm audit --audit-level=critical

# Gate 3: Production TODO/FIXME
grep -rn "TODO\|FIXME\|HACK" src/ --include="*.ts" --include="*.js" | grep -v test

# Gate 4: .env protected
grep -q "^\.env" .gitignore
```
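To see what Gate 1 and Gate 4 do and do not catch, the same patterns can be exercised in Node. This is an illustrative sketch: `looksLikeSecret` and `envIsIgnored` are hypothetical names, and the JavaScript regular expressions are a direct transcription of the shell patterns above rather than code from the package.

```javascript
// JavaScript rendering of the Gate 1 pattern (case-insensitive, like `-i`).
const SECRET_PATTERN =
  /(api[_-]?key|password|secret)['"]?\s*[=:]\s*['"]?[a-zA-Z0-9+\/]{20,}/i;

function looksLikeSecret(line) {
  return SECRET_PATTERN.test(line);
}

// Gate 4 equivalent: some line in .gitignore starts with ".env".
function envIsIgnored(gitignoreText) {
  return /^\.env/m.test(gitignoreText);
}

const fakeValue = 'x'.repeat(24); // placeholder, not a real credential
console.log(looksLikeSecret(`API_KEY="${fakeValue}"`)); // true
console.log(looksLikeSecret('password = "short"'));     // false
```

The second call shows a limitation of the pattern: values shorter than 20 characters pass the gate, so it is a coarse filter rather than a complete secret scanner.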
|
|
108
|
+
|
|
109
|
+
**Any gate failure = BLOCKED regardless of tier.**
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## Production Readiness Score
|
|
114
|
+
|
|
115
|
+
```
Score = 100 - (blocking_failures × 10) - (advisory_failures × 2)

READY: 90-100
CONDITIONAL: 70-89
NOT READY: <70
```
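As a worked example of the formula, one blocking failure and three advisory failures give 100 - 10 - 6 = 84, which falls in the CONDITIONAL band. A minimal sketch follows; the function names are hypothetical, and the score is deliberately not clamped at zero because the template does not say whether it should be.

```javascript
// Hypothetical helpers implementing the score formula and bands above.
function readinessScore(blockingFailures, advisoryFailures) {
  return 100 - blockingFailures * 10 - advisoryFailures * 2;
}

function readinessStatus(score) {
  if (score >= 90) return 'READY';
  if (score >= 70) return 'CONDITIONAL';
  return 'NOT READY';
}

const score = readinessScore(1, 3);
console.log(score, readinessStatus(score)); // 84 CONDITIONAL
```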
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## Rollback Criteria
|
|
126
|
+
|
|
127
|
+
**Roll back immediately if (within 1 hour of release):**
|
|
128
|
+
|
|
129
|
+
| Tier | Trigger | Response Time |
|------|---------|---------------|
| MVP | App won't start OR error rate >20% | 30 minutes |
| Medium | Error rate >5% above baseline OR P95 >500ms increase | 15 minutes |
| Enterprise | Any SLA breach OR compliance violation | 5 minutes |
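The table rows can be read as one predicate per tier. The sketch below encodes each row as written and does not make tiers cumulative. All names are hypothetical, error rates are fractions between 0 and 1, and reading ">5% above baseline" as five percentage points is an assumption.

```javascript
// One rule per tier: a trigger predicate plus the response time in minutes.
const ROLLBACK_RULES = {
  mvp: {
    responseMinutes: 30,
    triggered: (m) => !m.appStarts || m.errorRate > 0.20,
  },
  medium: {
    responseMinutes: 15,
    triggered: (m) =>
      m.errorRate - m.baselineErrorRate > 0.05 || m.p95IncreaseMs > 500,
  },
  enterprise: {
    responseMinutes: 5,
    triggered: (m) => m.slaBreached || m.complianceViolation,
  },
};

// Hypothetical helper: decide whether the observed metrics call for rollback.
function evaluateRollback(tier, metrics) {
  const rule = ROLLBACK_RULES[tier];
  if (!rule) throw new Error(`Unknown tier: ${tier}`);
  return {
    rollBack: Boolean(rule.triggered(metrics)),
    responseMinutes: rule.responseMinutes,
  };
}
```

For example, `evaluateRollback('mvp', { appStarts: true, errorRate: 0.25 })` reports a rollback with a 30-minute response time, since the error rate exceeds 20%.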
|
|
@@ -0,0 +1,201 @@
|
|
|
1
|
+
# Rollback Plan: v{version}
|
|
2
|
+
|
|
3
|
+
**Released:** {date}
|
|
4
|
+
**Tier:** {tier}
|
|
5
|
+
**Previous version:** {previous_version}
|
|
6
|
+
**Previous tag:** {previous_tag}
|
|
7
|
+
**Author:** EZ Agents release-agent
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Rollback Decision Criteria
|
|
12
|
+
|
|
13
|
+
Roll back **immediately** if any of the following occur within {rollback_window} of release:
|
|
14
|
+
|
|
15
|
+
### MVP Tier (30-minute window)
|
|
16
|
+
- [ ] Application fails to start
|
|
17
|
+
- [ ] Error rate exceeds 20%
|
|
18
|
+
- [ ] Health endpoint returns non-200
|
|
19
|
+
- [ ] Critical functionality broken (login, core user flow)
|
|
20
|
+
|
|
21
|
+
### Medium Tier (15-minute window)
|
|
22
|
+
- All MVP criteria plus:
|
|
23
|
+
- [ ] Error rate increases >5% above pre-release baseline
|
|
24
|
+
- [ ] P95 response time increases >200ms
|
|
25
|
+
- [ ] Payment/auth system errors
|
|
26
|
+
|
|
27
|
+
### Enterprise Tier (5-minute window)
|
|
28
|
+
- All Medium criteria plus:
|
|
29
|
+
- [ ] Any SLA breach
|
|
30
|
+
- [ ] Any compliance-related failure
|
|
31
|
+
- [ ] Security alert triggered
|
|
32
|
+
- [ ] Data integrity issue detected
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## Rollback Procedure
|
|
37
|
+
|
|
38
|
+
### Step 1: Decision (T+0)
|
|
39
|
+
|
|
40
|
+
Whoever observes a rollback trigger calls rollback immediately.
|
|
41
|
+
|
|
42
|
+
**Do NOT wait** to gather more data. Roll back, then investigate.
|
|
43
|
+
|
|
44
|
+
Contact: {oncall_contact or "N/A"}
|
|
45
|
+
|
|
46
|
+
### Step 2: Application Rollback (T+2 minutes)
|
|
47
|
+
|
|
48
|
+
Choose rollback method based on deployment platform:
|
|
49
|
+
|
|
50
|
+
**Vercel:**
|
|
51
|
+
```bash
|
|
52
|
+
vercel rollback
|
|
53
|
+
# Or use Vercel dashboard → Deployments → select previous → Promote
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
**Netlify:**
|
|
57
|
+
```bash
|
|
58
|
+
# Netlify dashboard → Deploys → select previous deploy → Publish deploy
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
**Railway:**
|
|
62
|
+
```bash
|
|
63
|
+
# Railway dashboard → Deployments → select previous → Rollback
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
**Heroku:**
|
|
67
|
+
```bash
|
|
68
|
+
heroku releases
|
|
69
|
+
heroku rollback v{previous_release_number}
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
**Generic (git-based deploy):**
|
|
73
|
+
```bash
|
|
74
|
+
git revert HEAD --no-edit
|
|
75
|
+
git push origin main
|
|
76
|
+
# Triggers your CI/CD to redeploy previous version
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
**Docker:**
|
|
80
|
+
```bash
|
|
81
|
+
docker pull {registry}/{image}:{previous_version}
|
|
82
|
+
docker tag {registry}/{image}:{previous_version} {registry}/{image}:latest
|
|
83
|
+
docker push {registry}/{image}:latest
|
|
84
|
+
# Restart containers
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
### Step 3: Database Rollback (if applicable)
|
|
88
|
+
|
|
89
|
+
{If database migrations were run:}
|
|
90
|
+
|
|
91
|
+
**Check if rollback is needed:**
|
|
92
|
+
Was a database migration applied as part of this release? Check migration history:
|
|
93
|
+
```bash
|
|
94
|
+
npx prisma migrate status 2>/dev/null
|
|
95
|
+
# Or: cat .planning/releases/v{version}-migrations.md
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
**If migration must be rolled back:**
|
|
99
|
+
|
|
100
|
+
```bash
|
|
101
|
+
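# Prisma: this only marks the migration as rolled back in the migration history; schema changes must be reverted separately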
|
|
102
|
+
npx prisma migrate resolve --rolled-back {migration_name}
|
|
103
|
+
|
|
104
|
+
# Django
|
|
105
|
+
python manage.py migrate {app_name} {previous_migration}
|
|
106
|
+
|
|
107
|
+
# Rails
|
|
108
|
+
rails db:rollback STEP=1
|
|
109
|
+
|
|
110
|
+
# Flyway
|
|
111
|
+
flyway undo
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
**WARNING:** Only roll back migrations if the data model change is backward-compatible OR no new data has been written. When in doubt: roll back application first, keep database as-is, hotfix forward.
|
|
115
|
+
|
|
116
|
+
### Step 4: Verify Rollback (T+5 minutes)
|
|
117
|
+
|
|
118
|
+
```bash
|
|
119
|
+
# Check health
|
|
120
|
+
curl -f https://{your-domain}/health || echo "HEALTH_CHECK_FAILED"
|
|
121
|
+
|
|
122
|
+
# Check error rate
|
|
123
|
+
# View in your monitoring dashboard — should return to pre-release baseline
|
|
124
|
+
|
|
125
|
+
# Smoke test key flows
|
|
126
|
+
# 1. Visit {your-domain}
|
|
127
|
+
# 2. Log in with test account
|
|
128
|
+
# 3. Perform core user action
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
Expected state after successful rollback:
|
|
132
|
+
- Application responds to all requests
|
|
133
|
+
- Error rate returns to pre-release baseline
|
|
134
|
+
- Health endpoint returns 200
|
|
135
|
+
- Core user flows work
|
|
136
|
+
|
|
137
|
+
### Step 5: Post-Rollback Communication
|
|
138
|
+
|
|
139
|
+
Notify relevant parties (team, users if customer-facing):
|
|
140
|
+
|
|
141
|
+
```
|
|
142
|
+
[Status Update]
|
|
143
|
+
We rolled back v{version} due to [brief description].
|
|
144
|
+
Service is restored. Root cause investigation in progress.
|
|
145
|
+
ETA for fix: [estimate]
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
### Step 6: Post-Mortem
|
|
149
|
+
|
|
150
|
+
After rollback is complete and service is stable:
|
|
151
|
+
|
|
152
|
+
1. **Root cause analysis** — What caused the issue?
|
|
153
|
+
2. **Timeline** — When detected, when rolled back, total impact duration
|
|
154
|
+
3. **Fix plan** — How to fix before re-releasing
|
|
155
|
+
4. **Process improvement** — What check could have caught this?
|
|
156
|
+
|
|
157
|
+
Write post-mortem to: `.planning/releases/v{version}-POSTMORTEM.md`
|
|
158
|
+
|
|
159
|
+
Update CHANGELOG.md:
|
|
160
|
+
```markdown
|
|
161
|
+
## [v{version}] — ROLLED BACK
|
|
162
|
+
Released {date}, rolled back {rollback_date}.
|
|
163
|
+
Reason: {brief reason}
|
|
164
|
+
Fix scheduled for v{next_version}
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
---
|
|
168
|
+
|
|
169
|
+
## Forward Fix Procedure
|
|
170
|
+
|
|
171
|
+
After rolling back, create a hotfix:
|
|
172
|
+
|
|
173
|
+
```bash
|
|
174
|
+
/ez:hotfix start {fix-description}
|
|
175
|
+
# Make the fix
|
|
176
|
+
/ez:hotfix complete {fix-description} {new_version}
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
Or plan a new phase if the fix is larger:
|
|
180
|
+
|
|
181
|
+
```bash
|
|
182
|
+
/ez:plan-phase {next_phase} --gaps
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
---
|
|
186
|
+
|
|
187
|
+
## Emergency Contacts
|
|
188
|
+
|
|
189
|
+
{Fill in before going to production:}
|
|
190
|
+
|
|
191
|
+
| Role | Contact | When to Call |
|------|---------|--------------|
| On-call developer | {name/handle} | Any rollback decision |
| Database admin | {name/handle} | If DB rollback needed |
| Customer success | {name/handle} | If customer impact >5 min |
|
|
196
|
+
|
|
197
|
+
---
|
|
198
|
+
|
|
199
|
+
*Generated by EZ Agents release-agent*
|
|
200
|
+
*Release: v{version} — {tier} tier*
|
|
201
|
+
*Created: {timestamp}*
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
<purpose>
|
|
2
|
+
Run a Tech Lead architecture review on phase plans. Reads phase PLAN.md files and checks for pattern drift, technical debt, security issues, and design conflicts.
|
|
3
|
+
</purpose>
|
|
4
|
+
|
|
5
|
+
<process>
|
|
6
|
+
|
|
7
|
+
## 1. Initialize
|
|
8
|
+
|
|
9
|
+
Parse $ARGUMENTS for phase number.
|
|
10
|
+
|
|
11
|
+
```bash
INIT=$(node "$HOME/.claude/ez-agents/bin/ez-tools.cjs" init plan-phase "$PHASE")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
```
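The second shell line handles the case where the tool returns a file reference instead of inline output: a value starting with `@file:` is replaced by the contents of the file it names. The same check could be sketched in Node as follows; `resolveInitOutput` is a hypothetical name, and the injectable `readFile` parameter exists only so the sketch can be tried without touching disk.

```javascript
const fs = require('node:fs');

const FILE_PREFIX = '@file:';

// Hypothetical helper mirroring the shell check above.
function resolveInitOutput(raw, readFile = (p) => fs.readFileSync(p, 'utf8')) {
  // Inline output: return it unchanged.
  if (!raw.startsWith(FILE_PREFIX)) return raw;
  // File reference: strip the prefix and read the file it points to.
  const path = raw.slice(FILE_PREFIX.length);
  // Shell command substitution drops trailing newlines; mirror that here.
  return readFile(path).replace(/\n+$/, '');
}
```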
|
|
15
|
+
|
|
16
|
+
Extract: `phase_dir`, `phase_number`, `phase_name`, `planner_model`.
|
|
17
|
+
|
|
18
|
+
## 2. Validate
|
|
19
|
+
|
|
20
|
+
Check phase directory and plans exist. If no PLAN.md found: error "No plans found for phase {N}. Run /ez:plan-phase first."
|
|
21
|
+
|
|
22
|
+
## 3. Spawn Tech Lead Review
|
|
23
|
+
|
|
24
|
+
Display:
|
|
25
|
+
```
|
|
26
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
27
|
+
EZ ► ARCH REVIEW — Phase {N}: {Name}
|
|
28
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
```
Task(
  prompt="Review architecture of Phase {phase_number}: {phase_name}.
Read all PLAN.md files in {phase_dir}.
Check: pattern consistency, technical debt, security, cross-phase conflicts, dependencies.
Report: BLOCKERS (must fix), WARNINGS (should fix), ADVISORY (consider).
Output: ## Tech Lead Review: Phase {N} — {status}",
  subagent_type="ez-tech-lead-agent",
  model="{planner_model}"
)
```
|
|
42
|
+
|
|
43
|
+
## 4. Present Results
|
|
44
|
+
|
|
45
|
+
Display agent output directly. If BLOCKER found: highlight "Fix before /ez:execute-phase". If APPROVE: confirm safe to execute.
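A caller that wants to act on the review could read the heading requested in the Task prompt. The sketch below is an assumption-laden illustration: `parseReviewHeading` and `nextStep` are hypothetical, and it assumes the status text contains the word BLOCKER or APPROVE, which the workflow implies but does not fully specify.

```javascript
// Matches "## Tech Lead Review: Phase {N} <dash> {status}".
// \u2014 is the dash character used in the heading format above.
const REVIEW_HEADING = /^## Tech Lead Review: Phase (\S+) \u2014 (.+)$/m;

function parseReviewHeading(agentOutput) {
  const match = REVIEW_HEADING.exec(agentOutput);
  if (!match) return null; // heading missing or formatted differently
  return { phase: match[1], status: match[2].trim() };
}

// Assumed mapping from status text to the message shown to the user.
function nextStep(status) {
  if (/BLOCKER/i.test(status)) return 'Fix before /ez:execute-phase';
  if (/APPROVE/i.test(status)) return 'Safe to execute';
  return 'Review findings manually';
}
```

Falling back to a manual review when the heading is absent or the status is unrecognized avoids reporting a go decision the agent never gave.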
|
|
46
|
+
|
|
47
|
+
</process>
|
|
48
|
+
|
|
49
|
+
<success_criteria>
|
|
50
|
+
- [ ] Phase validated
|
|
51
|
+
- [ ] Tech lead agent spawned
|
|
52
|
+
- [ ] Findings presented with severity
|
|
53
|
+
- [ ] Clear go/no-go for execution
|
|
54
|
+
</success_criteria>
|