@brainst0rm/core 0.13.0 → 0.14.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/dist/chunk-M7BBX56R.js +340 -0
  2. package/dist/chunk-M7BBX56R.js.map +1 -0
  3. package/dist/{chunk-SWXTFHC7.js → chunk-Z5D2QZY6.js} +3 -3
  4. package/dist/chunk-Z5D2QZY6.js.map +1 -0
  5. package/dist/chunk-Z6ZWNWWR.js +34 -0
  6. package/dist/index.d.ts +2717 -188
  7. package/dist/index.js +16178 -7949
  8. package/dist/index.js.map +1 -1
  9. package/dist/self-extend-47LWSK3E.js +52 -0
  10. package/dist/self-extend-47LWSK3E.js.map +1 -0
  11. package/dist/skills/builtin/api-and-interface-design/SKILL.md +300 -0
  12. package/dist/skills/builtin/browser-testing-with-devtools/SKILL.md +307 -0
  13. package/dist/skills/builtin/ci-cd-and-automation/SKILL.md +391 -0
  14. package/dist/skills/builtin/code-review-and-quality/SKILL.md +353 -0
  15. package/dist/skills/builtin/code-simplification/SKILL.md +340 -0
  16. package/dist/skills/builtin/context-engineering/SKILL.md +301 -0
  17. package/dist/skills/builtin/daemon-operations/SKILL.md +55 -0
  18. package/dist/skills/builtin/debugging-and-error-recovery/SKILL.md +306 -0
  19. package/dist/skills/builtin/deprecation-and-migration/SKILL.md +207 -0
  20. package/dist/skills/builtin/documentation-and-adrs/SKILL.md +295 -0
  21. package/dist/skills/builtin/frontend-ui-engineering/SKILL.md +333 -0
  22. package/dist/skills/builtin/git-workflow-and-versioning/SKILL.md +303 -0
  23. package/dist/skills/builtin/github-collaboration/SKILL.md +215 -0
  24. package/dist/skills/builtin/godmode-operations/SKILL.md +68 -0
  25. package/dist/skills/builtin/idea-refine/SKILL.md +186 -0
  26. package/dist/skills/builtin/idea-refine/examples.md +244 -0
  27. package/dist/skills/builtin/idea-refine/frameworks.md +101 -0
  28. package/dist/skills/builtin/idea-refine/refinement-criteria.md +126 -0
  29. package/dist/skills/builtin/idea-refine/scripts/idea-refine.sh +15 -0
  30. package/dist/skills/builtin/incremental-implementation/SKILL.md +243 -0
  31. package/dist/skills/builtin/memory-init/SKILL.md +54 -0
  32. package/dist/skills/builtin/memory-reflection/SKILL.md +59 -0
  33. package/dist/skills/builtin/multi-model-routing/SKILL.md +56 -0
  34. package/dist/skills/builtin/performance-optimization/SKILL.md +291 -0
  35. package/dist/skills/builtin/planning-and-task-breakdown/SKILL.md +240 -0
  36. package/dist/skills/builtin/security-and-hardening/SKILL.md +368 -0
  37. package/dist/skills/builtin/shipping-and-launch/SKILL.md +310 -0
  38. package/dist/skills/builtin/spec-driven-development/SKILL.md +212 -0
  39. package/dist/skills/builtin/test-driven-development/SKILL.md +376 -0
  40. package/dist/skills/builtin/using-agent-skills/SKILL.md +173 -0
  41. package/dist/trajectory-analyzer-ZAI2XUAI.js +14 -0
  42. package/dist/{trajectory-capture-RF7TUN6I.js → trajectory-capture-ERPIVYQJ.js} +3 -3
  43. package/package.json +14 -11
  44. package/dist/chunk-OU3NPQBH.js +0 -87
  45. package/dist/chunk-OU3NPQBH.js.map +0 -1
  46. package/dist/chunk-PZ5AY32C.js +0 -10
  47. package/dist/chunk-SWXTFHC7.js.map +0 -1
  48. package/dist/trajectory-MOCIJBV6.js +0 -8
  49. /package/dist/{chunk-PZ5AY32C.js.map → chunk-Z6ZWNWWR.js.map} +0 -0
  50. /package/dist/{trajectory-MOCIJBV6.js.map → trajectory-analyzer-ZAI2XUAI.js.map} +0 -0
  51. /package/dist/{trajectory-capture-RF7TUN6I.js.map → trajectory-capture-ERPIVYQJ.js.map} +0 -0
@@ -0,0 +1,368 @@
---
name: security-and-hardening
description: Hardens code against vulnerabilities. Use when handling user input, authentication, data storage, or external integrations. Use when building any feature that accepts untrusted data, manages user sessions, or interacts with third-party services.
---

# Security and Hardening

## Overview

Security-first development practices for web applications. Treat every external input as hostile, every secret as sacred, and every authorization check as mandatory. Security isn't a phase; it's a constraint on every line of code that touches user data, authentication, or external systems.

## When to Use

- Building anything that accepts user input
- Implementing authentication or authorization
- Storing or transmitting sensitive data
- Integrating with external APIs or services
- Adding file uploads, webhooks, or callbacks
- Handling payment data or PII

## The Three-Tier Boundary System

### Always Do (No Exceptions)

- **Validate all external input** at the system boundary (API routes, form handlers)
- **Parameterize all database queries**: never concatenate user input into SQL
- **Encode output** to prevent XSS (use framework auto-escaping and don't bypass it)
- **Use HTTPS** for all external communication
- **Hash passwords** with bcrypt, scrypt, or argon2 (never store plaintext)
- **Set security headers** (CSP, HSTS, X-Frame-Options, X-Content-Type-Options)
- **Use httpOnly, secure, sameSite cookies** for sessions
- **Run `npm audit`** (or equivalent) before every release

### Ask First (Requires Human Approval)

- Adding new authentication flows or changing auth logic
- Storing new categories of sensitive data (PII, payment info)
- Adding new external service integrations
- Changing CORS configuration
- Adding file upload handlers
- Modifying rate limiting or throttling
- Granting elevated permissions or roles

### Never Do

- **Never commit secrets** to version control (API keys, passwords, tokens)
- **Never log sensitive data** (passwords, tokens, full credit card numbers)
- **Never trust client-side validation** as a security boundary
- **Never disable security headers** for convenience
- **Never use `eval()` or `innerHTML`** with user-provided data
- **Never store session or auth tokens in client-accessible storage** (e.g., localStorage)
- **Never expose stack traces** or internal error details to users

## OWASP Top 10 Prevention

The category names below follow the OWASP Top 10 (2017 edition). The numbering is local to this document and is not OWASP's ranking.

### 1. Injection (SQL, NoSQL, OS Command)

```typescript
// BAD: SQL injection via string concatenation
const query = `SELECT * FROM users WHERE id = '${userId}'`;

// GOOD: Parameterized query (the driver sends userId separately from the SQL text)
const result = await db.query("SELECT * FROM users WHERE id = $1", [userId]);

// GOOD: ORM with parameterized input
const user = await prisma.user.findUnique({ where: { id: userId } });
```
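The heading above also names OS command injection, which the block doesn't illustrate. Here is a minimal sketch. It assumes Node's built-in `child_process.execFile` and an external image tool invoked as `convert`. The tool and the `buildConvertArgs`/`makeThumbnail` helpers are illustrative only. The approach is to validate each value against an allowlist, then pass the arguments as an array so no shell ever parses them.

```typescript
import { execFile } from "node:child_process";

// Hypothetical allowlist: a plain file name that starts with a letter or digit,
// so it can't be an option ("-rf"), a dot path (".."), or contain shell metacharacters.
const SAFE_FILE_NAME = /^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$/;

function buildConvertArgs(inputName: string, width: number): string[] {
  if (!SAFE_FILE_NAME.test(inputName)) {
    throw new Error("Invalid file name");
  }
  if (!Number.isInteger(width) || width < 1 || width > 4096) {
    throw new Error("Invalid width");
  }
  // Each value becomes its own argv entry; nothing is joined into a command string.
  return [inputName, "-resize", String(width), `thumb-${inputName}`];
}

// BAD:  exec(`convert ${name} -resize ${width} out.png`) runs through a shell,
//       so a name like "x.png; rm -rf ~" would execute a second command.
// GOOD: execFile starts the program directly with an argument array and no shell.
function makeThumbnail(inputName: string, width: number): void {
  execFile("convert", buildConvertArgs(inputName, width), (error) => {
    if (error) console.error("thumbnail generation failed");
  });
}
```

The two checks defend against different attacks. Requiring the first character to be a letter or digit blocks option injection (`-rf`) and `..`. Passing an argument array blocks shell metacharacters.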

### 2. Broken Authentication

```typescript
// Password hashing
import { hash, compare } from "bcrypt";
import session from "express-session";

const SALT_ROUNDS = 12; // bcrypt cost factor: each +1 doubles the hashing work
const hashedPassword = await hash(plaintext, SALT_ROUNDS);
const isValid = await compare(plaintext, hashedPassword);

// Session management
app.use(
  session({
    secret: process.env.SESSION_SECRET, // From environment, not code
    resave: false,
    saveUninitialized: false,
    cookie: {
      httpOnly: true, // Not accessible via JavaScript
      secure: true, // HTTPS only
      sameSite: "lax", // Withholds the cookie on most cross-site requests (CSRF mitigation)
      maxAge: 24 * 60 * 60 * 1000, // 24 hours
    },
  }),
);
```
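If you want to avoid a native bcrypt dependency, note that scrypt (also listed above) ships with Node's built-in `node:crypto`. Here is a minimal sketch. The cost parameters are assumptions to tune against your own hardware and latency budget, and `hashPassword`/`verifyPassword` are illustrative names.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Assumed cost parameters: N = 2^15 needs about 32 MiB per hash (128 * N * r bytes),
// so maxmem is raised above Node's 32 MiB default.
const SCRYPT_OPTIONS = { N: 2 ** 15, r: 8, p: 1, maxmem: 64 * 1024 * 1024 };
const KEY_LENGTH = 64;

function hashPassword(plaintext: string): string {
  const salt = randomBytes(16); // unique per password
  const key = scryptSync(plaintext, salt, KEY_LENGTH, SCRYPT_OPTIONS);
  // Store salt and derived key together; both are needed to verify later.
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

function verifyPassword(plaintext: string, stored: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  if (!saltHex || !keyHex) return false;
  const expected = Buffer.from(keyHex, "hex");
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(plaintext, Buffer.from(saltHex, "hex"), KEY_LENGTH, SCRYPT_OPTIONS);
  // Constant-time comparison, so timing doesn't reveal how many bytes matched.
  return timingSafeEqual(actual, expected);
}
```

Unlike a bcrypt hash string, this stored format doesn't record the cost parameters. If you raise `N` later, store the parameters alongside each hash so that older hashes still verify.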

### 3. Cross-Site Scripting (XSS)

```typescript
import DOMPurify from "dompurify";

// BAD: Rendering user input as HTML
element.innerHTML = userInput;

// GOOD: In plain DOM code, textContent inserts text without parsing HTML
element.textContent = userInput;

// GOOD: Use framework auto-escaping (React does this by default)
const safe = <div>{userInput}</div>;

// If you MUST render HTML, sanitize first
const clean = DOMPurify.sanitize(userInput);
const rich = <div dangerouslySetInnerHTML={{ __html: clean }} />;
```
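Outside a framework, for example in server-side string templates, the same auto-escaping idea can be applied by hand. Below is a minimal sketch of an `escapeHtml` helper (an illustrative name, not a library API). It covers text in element content and in quoted attribute values. It does not make URLs, inline scripts, CSS, or unquoted attributes safe.

```typescript
// Characters that can break out of HTML text or a quoted attribute value.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Usage in a hand-built template: the payload is displayed as text, not executed.
const comment = `<img src=x onerror="alert(1)">`;
const html = `<p class="comment">${escapeHtml(comment)}</p>`;
```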

### 4. Broken Access Control

```typescript
// Always check authorization, not just authentication
app.patch("/api/tasks/:id", authenticate, async (req, res) => {
  const task = await taskService.findById(req.params.id);
  if (!task) {
    return res.status(404).json({ error: { code: "NOT_FOUND", message: "Task not found" } });
  }

  // Check that the authenticated user owns this resource
  if (task.ownerId !== req.user.id) {
    return res.status(403).json({
      error: {
        code: "FORBIDDEN",
        message: "Not authorized to modify this task",
      },
    });
  }

  // Proceed with update, copying only fields the client may change
  // (passing req.body straight through would let a caller overwrite ownerId)
  const { title, description, priority, dueDate } = req.body;
  const updated = await taskService.update(req.params.id, { title, description, priority, dueDate });
  return res.json(updated);
});
```
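Scattering ownership checks across handlers makes it easy to miss one. Below is a minimal sketch of a central, deny-by-default policy function. The `Actor`/`Action` types and the rule that only admins may delete are hypothetical.

```typescript
type Role = "member" | "admin";
type Action = "read" | "update" | "delete";

interface Actor {
  id: string;
  role: Role; // taken from the verified session, never from the request body
}

interface OwnedResource {
  ownerId: string;
}

// Hypothetical rule: owners may read and update their own resources; deleting is admin-only.
const OWNER_ACTIONS: ReadonlySet<Action> = new Set<Action>(["read", "update"]);

function isAuthorized(actor: Actor, action: Action, resource: OwnedResource): boolean {
  if (actor.role === "admin") return true;
  if (resource.ownerId === actor.id) return OWNER_ACTIONS.has(action);
  return false; // deny by default: anything not explicitly allowed is refused
}
```

In the PATCH handler above, `isAuthorized(req.user, "update", task)` could replace the inline `ownerId` comparison, assuming `req.user` carries both `id` and `role`.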

### 5. Security Misconfiguration

```typescript
// Security headers (use helmet for Express), with a custom Content Security Policy
import helmet from "helmet";
import cors from "cors";

app.use(
  helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        scriptSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"], // Tighten if possible
        imgSrc: ["'self'", "data:", "https:"],
        connectSrc: ["'self'"],
      },
    },
  }),
);

// CORS: restrict to known origins
app.use(
  cors({
    origin: process.env.ALLOWED_ORIGINS?.split(",") || "http://localhost:3000",
    credentials: true,
  }),
);
```
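The `origin` option above can also take a function, which makes the matching rule explicit. Below is a minimal sketch; the helper names are illustrative. It parses `ALLOWED_ORIGINS` once and compares normalized origins exactly, so look-alike hosts such as `https://app.example.com.evil.test` never slip through a prefix check.

```typescript
// Parse a comma-separated allowlist (e.g. process.env.ALLOWED_ORIGINS) into normalized origins.
// new URL() throws on entries like "*", so a wildcard fails fast at startup.
function parseAllowedOrigins(raw: string | undefined): Set<string> {
  const origins = new Set<string>();
  for (const entry of (raw ?? "").split(",")) {
    const trimmed = entry.trim();
    if (trimmed) origins.add(new URL(trimmed).origin); // lowercases host, drops default port
  }
  return origins;
}

// Exact match on scheme + host + port; requests without an Origin header get no CORS grant.
function isAllowedOrigin(origin: string | undefined, allowed: ReadonlySet<string>): boolean {
  if (!origin) return false;
  try {
    return allowed.has(new URL(origin).origin);
  } catch {
    return false; // malformed Origin header
  }
}

// Wiring into the cors middleware via its (origin, callback) form:
// cors({ origin: (origin, cb) => cb(null, isAllowedOrigin(origin, allowed)), credentials: true })
```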

### 6. Sensitive Data Exposure

```typescript
// Never return sensitive fields in API responses
function sanitizeUser(user: UserRecord): PublicUser {
  const { passwordHash, resetToken, ...publicFields } = user;
  return publicFields;
}

// Use environment variables for secrets
const API_KEY = process.env.STRIPE_API_KEY;
if (!API_KEY) throw new Error("STRIPE_API_KEY not configured");
```
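To back up the rule against logging sensitive data, a logger can redact known-sensitive keys before serializing. Below is a minimal sketch. The key pattern is an assumption to extend for your domain, and it errs toward over-redacting (a key like `passenger` also matches).

```typescript
// Hypothetical key pattern; extend it for your domain (card numbers, national IDs, ...).
const SENSITIVE_KEY = /pass(word)?|secret|token|api[-_]?key|authorization|cookie/i;

// Returns a copy of plain objects/arrays with sensitive values replaced.
function redact(value: unknown, depth = 0): unknown {
  if (depth > 10) return "[TRUNCATED]"; // guard against deep or cyclic structures
  if (Array.isArray(value)) return value.map((item) => redact(item, depth + 1));
  if (value !== null && typeof value === "object") {
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner, depth + 1);
    }
    return copy;
  }
  return value;
}

// Usage: logger.info(redact({ body: req.body, headers: req.headers }));
```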

## Input Validation Patterns

### Schema Validation at Boundaries

```typescript
import { z } from "zod";

const CreateTaskSchema = z.object({
  // trim() runs before min(1), so whitespace-only titles are rejected
  title: z.string().trim().min(1).max(200),
  description: z.string().max(2000).optional(),
  priority: z.enum(["low", "medium", "high"]).default("medium"),
  dueDate: z.string().datetime().optional(),
});

// Validate at the route handler
app.post("/api/tasks", async (req, res) => {
  const result = CreateTaskSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(422).json({
      error: {
        code: "VALIDATION_ERROR",
        message: "Invalid input",
        details: result.error.flatten(),
      },
    });
  }
  // result.data is now typed and validated
  const task = await taskService.create(result.data);
  return res.status(201).json(task);
});
```

### File Upload Safety

```typescript
// Restrict file types and sizes
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"];
const MAX_SIZE = 5 * 1024 * 1024; // 5MB

function validateUpload(file: UploadedFile) {
  // file.mimetype is declared by the client, so treat it as a hint, not proof
  if (!ALLOWED_TYPES.includes(file.mimetype)) {
    throw new ValidationError("File type not allowed");
  }
  if (file.size > MAX_SIZE) {
    throw new ValidationError("File too large (max 5MB)");
  }
  // Don't trust the file extension or mimetype; check magic bytes if critical
}
```
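The magic-bytes check mentioned in the last comment can be made concrete. Below is a minimal sketch of `sniffImageType`, an illustrative helper that checks the leading bytes against the published signatures of the three allowed formats:

- JPEG starts with `FF D8 FF`.
- PNG starts with a fixed 8-byte signature.
- WebP starts with `RIFF` and has `WEBP` at offset 8.

```typescript
type AllowedImageType = "image/jpeg" | "image/png" | "image/webp";

function hasBytes(data: Uint8Array, signature: readonly number[], offset = 0): boolean {
  if (data.length < offset + signature.length) return false;
  return signature.every((byte, i) => data[offset + i] === byte);
}

// Detects the real format from file content, ignoring the name and client-declared MIME type.
function sniffImageType(data: Uint8Array): AllowedImageType | null {
  if (hasBytes(data, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasBytes(data, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // WebP: "RIFF" at 0, a 4-byte little-endian size, then "WEBP" at offset 8.
  if (hasBytes(data, [0x52, 0x49, 0x46, 0x46]) && hasBytes(data, [0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}

// Usage: reject the upload when the sniffed type is null or disagrees with file.mimetype.
```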

## Triaging npm audit Results

Not all audit findings require immediate action. Use this decision tree:

```
npm audit reports a vulnerability
├── Severity: critical or high
│   ├── Is the vulnerable code reachable in your app?
│   │   ├── YES --> Fix immediately (update, patch, or replace the dependency)
│   │   └── NO (dev-only dep, unused code path) --> Fix soon, but not a blocker
│   └── Is a fix available?
│       ├── YES --> Update to the patched version
│       └── NO --> Check for workarounds, consider replacing the dependency, or add to allowlist with a review date
├── Severity: moderate
│   ├── Reachable in production? --> Fix in the next release cycle
│   └── Dev-only? --> Fix when convenient, track in backlog
└── Severity: low
    └── Track and fix during regular dependency updates
```

**Key questions:**

- Is the vulnerable function actually called in your code path?
- Is the dependency a runtime dependency or dev-only?
- Is the vulnerability exploitable given your deployment context (e.g., a server-side vulnerability in a client-only app)?

When you defer a fix, document the reason and set a review date.
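The decision tree can be encoded so that a triage script or CI job applies it consistently. Below is a minimal sketch. The `Finding` fields are judgments you supply: npm audit reports severity, but it cannot tell whether a code path is reachable. The action labels are this sketch's own wording. For a simple CI gate, `npm audit --audit-level=high` exits non-zero only when a high or critical finding is present.

```typescript
type Severity = "critical" | "high" | "moderate" | "low";

interface Finding {
  severity: Severity; // as reported by npm audit
  reachable: boolean; // your judgment: is the vulnerable code path used by the app?
  devOnly: boolean; // dev dependency that never ships to production
  fixAvailable: boolean; // a patched version exists
}

// Encodes the decision tree above.
function triage(f: Finding): string {
  switch (f.severity) {
    case "critical":
    case "high": {
      const urgency = f.reachable ? "fix immediately" : "fix soon (not a blocker)";
      const action = f.fixAvailable
        ? "update to the patched version"
        : "find a workaround, replace the dependency, or allowlist with a review date";
      return `${urgency}: ${action}`;
    }
    case "moderate":
      return f.devOnly ? "fix when convenient; track in backlog" : "fix in the next release cycle";
    case "low":
      return "track and fix during regular dependency updates";
  }
}
```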

## Rate Limiting

```typescript
import rateLimit from "express-rate-limit";

// General API rate limit
app.use(
  "/api/",
  rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // 100 requests per window
    standardHeaders: true,
    legacyHeaders: false,
  }),
);

// Stricter limit for auth endpoints
app.use(
  "/api/auth/",
  rateLimit({
    windowMs: 15 * 60 * 1000,
    max: 10, // 10 attempts per 15 minutes
  }),
);
```
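To show what `windowMs` and `max` mean mechanically, here is a minimal fixed-window counter. It is an illustrative sketch, not express-rate-limit's implementation. Counts live in process memory, so several server instances would need a shared store (for example Redis) to enforce a single limit, and a real store would also evict expired keys.

```typescript
interface WindowState {
  start: number; // timestamp (ms) when this key's current window opened
  count: number; // requests counted in that window
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly windowMs: number;
  private readonly max: number;
  private readonly now: () => number;

  constructor(windowMs: number, max: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now; // injectable clock, handy for tests
  }

  /** Returns true if a request from `key` (for example a client IP) is allowed. */
  allow(key: string): boolean {
    const t = this.now();
    const current = this.windows.get(key);
    if (!current || t - current.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // open a fresh window
      return true;
    }
    if (current.count >= this.max) return false; // over the limit: respond 429
    current.count += 1;
    return true;
  }
}
```

A fixed window permits a burst of up to 2 × `max` requests straddling a window boundary. Sliding-window or token-bucket limiters smooth this out at the cost of keeping more state.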

## Secrets Management

```
.env files:
├── .env.example  → Committed (template with placeholder values)
├── .env          → NOT committed (contains real secrets)
└── .env.local    → NOT committed (local overrides)

.gitignore must include:
  .env
  .env.local
  .env.*.local
  *.pem
  *.key
```

**Always check before committing:**

```bash
# Check for accidentally staged secrets (a quick heuristic, not a full scanner)
git diff --cached | grep -iE "password|secret|api_key|token"
```

## Security Review Checklist

```markdown
### Authentication

- [ ] Passwords hashed with bcrypt/scrypt/argon2 (bcrypt cost factor ≥ 12, or an equivalent work factor)
- [ ] Session tokens are httpOnly, secure, sameSite
- [ ] Login has rate limiting
- [ ] Password reset tokens expire

### Authorization

- [ ] Every endpoint checks user permissions
- [ ] Users can only access their own resources
- [ ] Admin actions require admin role verification

### Input

- [ ] All user input validated at the boundary
- [ ] SQL queries are parameterized
- [ ] HTML output is encoded/escaped

### Data

- [ ] No secrets in code or version control
- [ ] Sensitive fields excluded from API responses
- [ ] PII encrypted at rest (if applicable)

### Infrastructure

- [ ] Security headers configured (CSP, HSTS, etc.)
- [ ] CORS restricted to known origins
- [ ] Dependencies audited for vulnerabilities
- [ ] Error messages don't expose internals
```

## Common Rationalizations

| Rationalization                                     | Reality |
| --------------------------------------------------- | ------------------------------------------------------------------------------- |
| "This is an internal tool, security doesn't matter" | Internal tools get compromised. Attackers target the weakest link. |
| "We'll add security later"                          | Retrofitting security is far harder than building it in. Add it now. |
| "No one would try to exploit this"                  | Automated scanners will find it. Security by obscurity is not security. |
| "The framework handles security"                    | Frameworks provide tools, not guarantees. You still need to use them correctly. |
| "It's just a prototype"                             | Prototypes become production. Build security habits from day one. |

## Red Flags

- User input passed directly to database queries, shell commands, or HTML rendering
- Secrets in source code or commit history
- API endpoints without authentication or authorization checks
- CORS left at permissive defaults or set to wildcard (`*`) origins
- No rate limiting on authentication endpoints
- Stack traces or internal errors exposed to users
- Dependencies with known critical vulnerabilities

## Verification

After implementing security-relevant code:

- [ ] `npm audit` shows no critical or high vulnerabilities
- [ ] No secrets in source code or git history
- [ ] All user input validated at system boundaries
- [ ] Authentication and authorization checked on every protected endpoint
- [ ] Security headers present in responses (check with browser DevTools)
- [ ] Error responses don't expose internal details
- [ ] Rate limiting active on auth endpoints
@@ -0,0 +1,310 @@
---
name: shipping-and-launch
description: Prepares production launches. Use when preparing to deploy to production. Use when you need a pre-launch checklist, when setting up monitoring, when planning a staged rollout, or when you need a rollback strategy.
---

# Shipping and Launch

## Overview

Ship with confidence. The goal is not just to deploy but to deploy safely, with monitoring in place, a rollback plan ready, and a clear understanding of what success looks like. Every launch should be reversible, observable, and incremental.

## When to Use

- Deploying a feature to production for the first time
- Releasing a significant change to users
- Migrating data or infrastructure
- Opening a beta or early access program
- Any deployment that carries risk (all of them)

## The Pre-Launch Checklist

### Code Quality

- [ ] All tests pass (unit, integration, e2e)
- [ ] Build succeeds with no warnings
- [ ] Lint and type checking pass
- [ ] Code reviewed and approved
- [ ] No TODO comments that should be resolved before launch
- [ ] No `console.log` debugging statements in production code
- [ ] Error handling covers expected failure modes

### Security

- [ ] No secrets in code or version control
- [ ] `npm audit` shows no critical or high vulnerabilities
- [ ] Input validation on all user-facing endpoints
- [ ] Authentication and authorization checks in place
- [ ] Security headers configured (CSP, HSTS, etc.)
- [ ] Rate limiting on authentication endpoints
- [ ] CORS configured to specific origins (not wildcard)

### Performance

- [ ] Core Web Vitals within "Good" thresholds (LCP ≤ 2.5 s, INP ≤ 200 ms, CLS ≤ 0.1)
- [ ] No N+1 queries in critical paths
- [ ] Images optimized (compression, responsive sizes, lazy loading)
- [ ] Bundle size within budget
- [ ] Database queries have appropriate indexes
- [ ] Caching configured for static assets and repeated queries

### Accessibility

- [ ] Keyboard navigation works for all interactive elements
- [ ] Screen reader can convey page content and structure
- [ ] Color contrast meets WCAG 2.1 AA (4.5:1 for normal text, 3:1 for large text)
- [ ] Focus management is correct for modals and dynamic content
- [ ] Error messages are descriptive and associated with form fields
- [ ] No accessibility warnings in axe-core or Lighthouse

### Infrastructure

- [ ] Environment variables set in production
- [ ] Database migrations applied (or ready to apply)
- [ ] DNS and SSL configured
- [ ] CDN configured for static assets
- [ ] Logging and error reporting configured
- [ ] Health check endpoint exists and responds

### Documentation

- [ ] README updated with any new setup requirements
- [ ] API documentation current
- [ ] ADRs written for any architectural decisions
- [ ] Changelog updated
- [ ] User-facing documentation updated (if applicable)
+
77
+ ## Feature Flag Strategy
78
+
79
+ Ship behind feature flags to decouple deployment from release:
80
+
81
+ ```typescript
82
+ // Feature flag check
83
+ const flags = await getFeatureFlags(userId);
84
+
85
+ if (flags.taskSharing) {
86
+ // New feature: task sharing
87
+ return <TaskSharingPanel task={task} />;
88
+ }
89
+
90
+ // Default: existing behavior
91
+ return null;
92
+ ```
93
+
94
+ **Feature flag lifecycle:**
95
+
96
+ ```
97
+ 1. DEPLOY with flag OFF → Code is in production but inactive
98
+ 2. ENABLE for team/beta → Internal testing in production environment
99
+ 3. GRADUAL ROLLOUT → 5% → 25% → 50% → 100% of users
100
+ 4. MONITOR at each stage → Watch error rates, performance, user feedback
101
+ 5. CLEAN UP → Remove flag and dead code path after full rollout
102
+ ```
103
+
104
+ **Rules:**
105
+
106
+ - Every feature flag has an owner and an expiration date
107
+ - Clean up flags within 2 weeks of full rollout
108
+ - Don't nest feature flags (creates exponential combinations)
109
+ - Test both flag states (on and off) in CI
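A gradual rollout needs a stable way to pick "5% of users", so that raising the percentage only adds users and nobody flips back and forth. Below is a minimal sketch that assumes no particular flag service. It hashes the flag name and user ID with 32-bit FNV-1a (a fast, non-cryptographic hash chosen here for illustration) into a bucket from 0 to 99. The flag is enabled when the bucket is below the rollout percentage.

```typescript
// 32-bit FNV-1a over UTF-16 code units: deterministic and fast, not cryptographic.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash;
}

// Stable bucket 0-99 per (flag, user); including the flag name keeps rollouts of
// different flags from always selecting the same users.
function rolloutBucket(flagName: string, userId: string): number {
  return fnv1a32(`${flagName}:${userId}`) % 100;
}

function isEnabledFor(flagName: string, userId: string, percentage: number): boolean {
  return rolloutBucket(flagName, userId) < percentage;
}
```

Buckets never change for a given flag and user. As a result, lowering the percentage (a partial rollback) removes exactly the users who were added last.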

## Staged Rollout

### The Rollout Sequence

```
1. DEPLOY to staging
   └── Full test suite in staging environment
   └── Manual smoke test of critical flows

2. DEPLOY to production (feature flag OFF)
   └── Verify deployment succeeded (health check)
   └── Check error monitoring (no new errors)

3. ENABLE for team (flag ON for internal users)
   └── Team uses the feature in production
   └── 24-hour monitoring window

4. CANARY rollout (flag ON for 5% of users)
   └── Monitor error rates, latency, user behavior
   └── Compare metrics: canary vs. baseline
   └── 24-48 hour monitoring window
   └── Advance only if all thresholds pass (see table below)

5. GRADUAL increase (25% -> 50% -> 100%)
   └── Same monitoring at each step
   └── Ability to roll back to previous percentage at any point

6. FULL rollout (flag ON for all users)
   └── Monitor for 1 week
   └── Clean up feature flag
```

### Rollout Decision Thresholds

Use these thresholds to decide whether to advance, hold, or roll back at each stage:

| Metric           | Advance (green)        | Hold and investigate (yellow)   | Roll back (red)                 |
| ---------------- | ---------------------- | ------------------------------- | ------------------------------- |
| Error rate       | Within 10% of baseline | 10-100% above baseline          | >2x baseline                    |
| P95 latency      | Within 20% of baseline | 20-50% above baseline           | >50% above baseline             |
| Client JS errors | No new error types     | New errors at <0.1% of sessions | New errors at >0.1% of sessions |
| Business metrics | Neutral or positive    | Decline <5% (may be noise)      | Decline >5%                     |
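The table translates directly into code that a rollout script or dashboard could evaluate at each stage. Below is a minimal sketch; the `StageMetrics` shape is hypothetical. The overall decision is the strictest of the four rows. The table leaves exact boundary values open (exactly 2x, exactly 0.1%), and this sketch resolves them toward "hold".

```typescript
type Decision = "advance" | "hold" | "rollback";

interface StageMetrics {
  errorRate: number;
  baselineErrorRate: number;
  p95LatencyMs: number;
  baselineP95LatencyMs: number;
  newJsErrorSessionShare: number; // share of sessions with new JS error types (0.001 = 0.1%)
  businessMetricChange: number; // relative change vs. baseline (-0.03 = 3% decline)
}

const RANK: Record<Decision, number> = { advance: 0, hold: 1, rollback: 2 };

// Above the red limit -> rollback; above the yellow limit -> hold; otherwise advance.
function grade(value: number, yellowAbove: number, redAbove: number): Decision {
  if (value > redAbove) return "rollback";
  if (value > yellowAbove) return "hold";
  return "advance";
}

function decideStage(m: StageMetrics): Decision {
  const decisions: Decision[] = [
    grade(m.errorRate, 1.1 * m.baselineErrorRate, 2 * m.baselineErrorRate),
    grade(m.p95LatencyMs, 1.2 * m.baselineP95LatencyMs, 1.5 * m.baselineP95LatencyMs),
    grade(m.newJsErrorSessionShare, 0, 0.001),
    grade(-m.businessMetricChange, 0, 0.05), // a decline becomes a positive value here
  ];
  // The strictest row wins.
  return decisions.reduce((worst, d) => (RANK[d] > RANK[worst] ? d : worst), "advance");
}
```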

### When to Roll Back

Roll back immediately if:

- Error rate exceeds 2x baseline
- P95 latency increases by more than 50%
- User-reported issues spike
- Data integrity issues are detected
- A security vulnerability is discovered

## Monitoring and Observability

### What to Monitor

```
Application metrics:
├── Error rate (total and by endpoint)
├── Response time (p50, p95, p99)
├── Request volume
├── Active users
└── Key business metrics (conversion, engagement)

Infrastructure metrics:
├── CPU and memory utilization
├── Database connection pool usage
├── Disk space
├── Network latency
└── Queue depth (if applicable)

Client metrics:
├── Core Web Vitals (LCP, INP, CLS)
├── JavaScript errors
├── API error rates from client perspective
└── Page load time
```
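Percentiles such as p95 describe the slow tail that averages hide: roughly 5% of requests were slower than the p95 value. Monitoring tools compute these for you. The minimal sketch below uses the nearest-rank method to make the numbers concrete. Nearest-rank is one of several percentile definitions, and tools that interpolate may report slightly different values.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer p gives an exact rank (avoids 0.07 * 100 = 7.000000000000001).
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1]!;
}

// Usage: summarize one monitoring window of request latencies (ms).
const latenciesMs = [120, 95, 130, 2400, 110, 105, 98, 140, 101, 99];
const summary = {
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
};
```

In this window, p50 is 105 ms while the mean is about 340 ms, pulled up by a single 2400 ms outlier. Both p95 and p99 land on that outlier, because ten samples are too few to separate them.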

### Error Reporting

```typescript
import React from "react";
import type { Request, Response, NextFunction } from "express";

// Set up error boundary with reporting.
// reportError is a placeholder for your error-tracking client; import it explicitly,
// since browsers also define a global reportError() that only dispatches an error event.
class ErrorBoundary extends React.Component<{ children: React.ReactNode }, { hasError: boolean }> {
  state = { hasError: false };

  // Switch to the fallback UI on the next render after a child throws
  static getDerivedStateFromError() {
    return { hasError: true };
  }

  componentDidCatch(error: Error, info: React.ErrorInfo) {
    // Report to error tracking service
    reportError(error, {
      componentStack: info.componentStack,
      userId: getCurrentUser()?.id,
      page: window.location.pathname,
    });
  }

  render() {
    if (this.state.hasError) {
      return <ErrorFallback onRetry={() => this.setState({ hasError: false })} />;
    }
    return this.props.children;
  }
}

// Server-side error reporting (Express error middleware: four arguments, registered last)
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  reportError(err, {
    method: req.method,
    url: req.url,
    userId: req.user?.id,
  });

  // If the response has already started, let Express close the connection
  if (res.headersSent) return next(err);

  // Don't expose internals to users
  res.status(500).json({
    error: { code: "INTERNAL_ERROR", message: "Something went wrong" },
  });
});
```

### Post-Launch Verification

In the first hour after launch:

```
1. Check health endpoint returns 200
2. Check error monitoring dashboard (no new error types)
3. Check latency dashboard (no regression)
4. Test the critical user flow manually
5. Verify logs are flowing and readable
6. Confirm rollback mechanism works (dry run if possible)
```

## Rollback Strategy

Every deployment needs a rollback plan written before the deploy starts:

```markdown
## Rollback Plan for [Feature/Release]

### Trigger Conditions

- Error rate > 2x baseline
- P95 latency > [X]ms
- User reports of [specific issue]

### Rollback Steps

1. Disable the feature flag (if applicable), or deploy the previous version: `git revert <commit> && git push` (if pushes trigger deploys)
1. Verify rollback: health check, error monitoring
1. Communicate: notify team of rollback

### Database Considerations

- Migration [X] has a tested down migration (Prisma Migrate has no `rollback` command; generate down SQL with `prisma migrate diff` and apply it with `prisma db execute`)
- Data inserted by new feature: [preserved / cleaned up]

### Time to Rollback

- Feature flag: < 1 minute
- Redeploy previous version: < 5 minutes
- Database rollback: < 15 minutes
```

## Common Rationalizations

| Rationalization                                 | Reality |
| ----------------------------------------------- | --------------------------------------------------------------------------------------------- |
| "It works in staging, it'll work in production" | Production has different data, traffic patterns, and edge cases. Monitor after deploy. |
| "We don't need feature flags for this"          | Every feature benefits from a kill switch. Even "simple" changes can break things. |
| "Monitoring is overhead"                        | Not having monitoring means you discover problems from user complaints instead of dashboards. |
| "We'll add monitoring later"                    | Add it before launch. You can't debug what you can't see. |
| "Rolling back is admitting failure"             | Rolling back is responsible engineering. Shipping a broken feature is the failure. |

## Red Flags

- Deploying without a rollback plan
- No monitoring or error reporting in production
- Big-bang releases (everything at once, no staging)
- Feature flags with no expiration or owner
- No one monitoring the deploy for the first hour
- Production environment configuration done from memory, not code
- "It's Friday afternoon, let's ship it"

## Verification

Before deploying:

- [ ] Pre-launch checklist completed (all sections green)
- [ ] Feature flag configured (if applicable)
- [ ] Rollback plan documented
- [ ] Monitoring dashboards set up
- [ ] Team notified of deployment

After deploying:

- [ ] Health check returns 200
- [ ] Error rate is normal
- [ ] Latency is normal
- [ ] Critical user flow works
- [ ] Logs are flowing
- [ ] Rollback tested or verified ready