antigravity-ai-kit 3.2.0 → 3.4.0

@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: database-design
3
- description: Database schema design and optimization patterns
3
+ description: Database schema design, optimization patterns, distributed system consistency models, and zero-downtime migration strategies
4
4
  triggers: [context, database, schema, sql, prisma]
5
5
  ---
6
6
 
@@ -123,13 +123,13 @@ model User {
123
123
  ## Query Optimization
124
124
 
125
125
  ```typescript
126
- // N+1 Problem
126
+ // N+1 Problem
127
127
  const users = await prisma.user.findMany();
128
128
  for (const user of users) {
129
129
  const orders = await prisma.order.findMany({ where: { userId: user.id } });
130
130
  }
131
131
 
132
- // Eager Loading
132
+ // Eager Loading
133
133
  const users = await prisma.user.findMany({
134
134
  include: { orders: true },
135
135
  });
@@ -137,6 +137,155 @@ const users = await prisma.user.findMany({
137
137
 
138
138
  ---
139
139
 
140
+ ## CAP Theorem
141
+
142
+ In a distributed system, you can guarantee at most two of three properties simultaneously:
143
+
144
+ - **Consistency (C)**: Every read returns the most recent write
145
+ - **Availability (A)**: Every request receives a response (no timeout)
146
+ - **Partition Tolerance (P)**: The system continues operating despite network partitions
147
+
148
+ Since network partitions are unavoidable in distributed systems, the real choice is between CP and AP.
149
+
150
+ ### Decision Matrix
151
+
152
+ | Trade-off | Guarantees | Sacrifices | When to Choose | Example Systems |
153
+ | :--- | :--- | :--- | :--- | :--- |
154
+ | **CP** | Consistency + Partition Tolerance | Availability during partitions | Financial transactions, inventory counts, leader election | MongoDB (default), HBase, Zookeeper |
155
+ | **AP** | Availability + Partition Tolerance | Consistency (eventual) | Social feeds, caching layers, DNS, session stores | Cassandra, DynamoDB, CouchDB |
156
+ | **CA** | Consistency + Availability | Partition Tolerance | Single-node deployments only (no true distribution) | Traditional RDBMS (PostgreSQL, MySQL single-node) |
157
+
158
+ ---
159
+
160
+ ## ACID vs BASE
161
+
162
+ ### Property Comparison
163
+
164
+ | Property | ACID | BASE |
165
+ | :--- | :--- | :--- |
166
+ | **Full name** | Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent |
167
+ | **Consistency** | Strong (immediate) | Eventual |
168
+ | **Availability** | May block under contention | Prioritizes availability |
169
+ | **Transactions** | Full multi-statement transactions | Single-record atomic ops; app-level sagas |
170
+ | **Scaling** | Vertical first; horizontal is complex | Horizontal by design |
171
+ | **Best for** | Financial systems, booking, inventory | Analytics, social, IoT, content delivery |
172
+
173
+ ### When to Use Each
174
+
175
+ - **ACID**: Money movement, order processing, anything requiring rollback guarantees, regulatory compliance
176
+ - **BASE**: High-throughput writes, geographically distributed reads, systems that can tolerate reads that are a few seconds stale
177
+
178
+ ---
179
+
180
+ ## Consistency Models
181
+
182
+ From strongest to weakest, choose the level your application actually needs:
183
+
184
+ | Model | Guarantee | Latency Cost | Use Case |
185
+ | :--- | :--- | :--- | :--- |
186
+ | **Strict / Linearizable** | Reads always see the latest write globally | Highest (cross-region coordination) | Distributed locks, leader election |
187
+ | **Sequential** | All nodes see operations in the same order | High | Replicated state machines |
188
+ | **Causal** | Causally related operations are seen in order | Medium | Chat applications, collaborative editing |
189
+ | **Read-your-writes** | A client always sees its own writes | Low-Medium | User profile updates, shopping carts |
190
+ | **Monotonic reads** | Once a value is seen, older values are never returned | Low | Dashboard displays, reporting |
191
+ | **Eventual** | All replicas converge given enough time | Lowest | DNS, CDN caches, social media likes |
192
+
193
+ Choose the weakest model your application can tolerate to maximize performance and availability.
194
+
195
+ ---
196
+
197
+ ## Migration Safety
198
+
199
+ ### Zero-Downtime Migration Pattern
200
+
201
+ Safe migrations follow a multi-phase approach that avoids locking tables or breaking running application code.
202
+
203
+ **Phase 1 - Expand**: Add new structures alongside old ones
204
+ **Phase 2 - Migrate**: Backfill data, dual-write to both structures
205
+ **Phase 3 - Contract**: Remove old structures after all consumers have switched
206
+
207
+ ### Safe vs Unsafe Operations
208
+
209
+ | Operation | Safe? | Zero-Downtime Alternative |
210
+ | :--- | :--- | :--- |
211
+ | **Add nullable column** | Safe | N/A (already safe) |
212
+ | **Add column with default** | Safe (Postgres 11+, non-volatile default) | For older versions, add nullable then backfill |
213
+ | **Drop column** | Unsafe | Stop reading column in code first, then drop in next deploy |
214
+ | **Rename column** | Unsafe | Add new column, dual-write, migrate reads, drop old |
215
+ | **Change column type** | Unsafe | Add new column with new type, backfill, swap reads |
216
+ | **Add NOT NULL constraint** | Unsafe | Add CHECK constraint as NOT VALID, then VALIDATE separately |
217
+ | **Add index** | Unsafe (blocks writes while the index builds) | Use `CREATE INDEX CONCURRENTLY` (Postgres) |
218
+ | **Drop table** | Unsafe | Remove all references in code first, then drop |
219
+
220
+ ### Backfill Pattern
221
+
222
+ ```typescript
223
+ // Backfill in batches to avoid long-running transactions
224
+ async function backfillNewColumn(batchSize = 1000) {
225
+ let processed = 0;
226
+ let hasMore = true;
227
+
228
+ while (hasMore) {
229
+ const rows = await prisma.$executeRaw`
230
+ UPDATE users
231
+ SET display_name = first_name || ' ' || last_name
232
+ WHERE id IN (SELECT id FROM users WHERE display_name IS NULL
233
+ LIMIT ${batchSize})
234
+ `;
235
+
236
+ processed += rows;
237
+ hasMore = rows === batchSize;
238
+
239
+ // Yield to other operations between batches
240
+ await new Promise((resolve) => setTimeout(resolve, 100));
241
+ }
242
+
243
+ return processed;
244
+ }
245
+ ```
246
+
247
+ ---
248
+
249
+ ## Connection Pooling
250
+
251
+ ### Pool Size Guidance
252
+
253
+ | Environment | Pool Size | Rationale |
254
+ | :--- | :--- | :--- |
255
+ | **Development** | 2-5 | Single developer, minimal concurrency |
256
+ | **Production (server)** | 10-20 per instance | Balance between concurrency and DB connection limits |
257
+ | **Production (serverless)** | 1-2 per function | Functions scale horizontally; too many connections exhaust DB limits |
258
+ | **Staging / CI** | 3-5 | Mirrors production behavior without resource waste |
259
+
260
+ ### Sizing Formula
261
+
262
+ ```
263
+ max_pool_size = (db_max_connections - reserved_superuser_connections) / number_of_app_instances
264
+ ```
265
+
266
+ For PostgreSQL with `max_connections = 100`, 3 superuser slots reserved, and 4 app instances:
267
+ `(100 - 3) / 4 = 24.25`, so about 24 connections per instance
268
+
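The sizing formula above can be sketched as a small helper. This is a minimal illustration, not part of the package: the function name `maxPoolSizePerInstance` and its parameter names are hypothetical, and rounding down is an assumption made so the instances together never exceed the usable connection count.

```typescript
// Hypothetical helper illustrating the sizing formula; names are assumptions.
function maxPoolSizePerInstance(
  dbMaxConnections: number,
  reservedSuperuserConnections: number,
  appInstances: number,
): number {
  if (appInstances <= 0) {
    throw new RangeError("appInstances must be positive");
  }
  const usable = dbMaxConnections - reservedSuperuserConnections;
  if (usable <= 0) {
    throw new RangeError("no connections left after reserving superuser slots");
  }
  // Round down so all instances combined stay within the usable total.
  return Math.floor(usable / appInstances);
}

// The worked example from the text: max_connections = 100, 3 reserved, 4 instances.
const perInstancePool = maxPoolSizePerInstance(100, 3, 4);
console.log(perInstancePool);
```

The result could then be passed as `connection_limit` in the datasource URL, leaving a little headroom for migrations and ad hoc sessions.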
269
+ ### Tool Recommendations
270
+
271
+ | Tool | Best For | Notes |
272
+ | :--- | :--- | :--- |
273
+ | **PgBouncer** | External pooler for PostgreSQL | Transaction-mode pooling for serverless; sits between app and DB |
274
+ | **Prisma built-in pool** | Prisma ORM users | Configure via `connection_limit` in datasource URL |
275
+ | **Prisma Accelerate** | Serverless / edge | Managed connection pooling with global caching |
276
+ | **RDS Proxy** | AWS deployments | Managed pooler; supports IAM auth and failover |
277
+ | **Supabase Supavisor** | Supabase projects | Built-in pooler with transaction and session modes |
278
+
279
+ ```prisma
280
+ // Prisma connection pool configuration
281
+ datasource db {
282
+ provider = "postgresql"
283
+ url = env("DATABASE_URL") // ?connection_limit=20&pool_timeout=10
284
+ }
285
+ ```
286
+
287
+ ---
288
+
140
289
  ## Quick Reference
141
290
 
142
291
  | Pattern | Usage |
@@ -147,3 +296,8 @@ const users = await prisma.user.findMany({
147
296
  | Timestamps | Always include |
148
297
  | Indexes | Frequent queries |
149
298
  | Constraints | Data integrity |
299
+ | CAP trade-off | Distributed design |
300
+ | ACID | Transactional data |
301
+ | BASE | High-scale writes |
302
+ | Migrations | Zero-downtime deploys |
303
+ | Connection Pool | Right-size per env |
@@ -12,11 +12,14 @@
12
12
 
13
13
  Include in plan:
14
14
 
15
- - **Accessibility (WCAG 2.1 AA)**: Identify components requiring ARIA labels, keyboard navigation, screen reader support, color contrast compliance
16
- - **Responsive Design**: Specify breakpoints to test (mobile 375px, tablet 768px, desktop 1280px), identify layout changes per breakpoint
17
- - **Bundle Size Impact**: Estimate size of new dependencies, identify tree-shaking opportunities, consider code splitting for new routes
18
- - **Core Web Vitals**: Assess impact on LCP (largest contentful paint), CLS (cumulative layout shift), INP (interaction to next paint)
19
- - **Component Composition**: Specify component hierarchy, prop interfaces, state management approach (local vs. global)
15
+ - **Accessibility (WCAG 2.1 AA)**: Identify components requiring ARIA labels, keyboard navigation, screen reader support, color contrast compliance (minimum 4.5:1 normal text, 3:1 large text)
16
+ - **Responsive Design**: Specify breakpoints to test (mobile 375px, tablet 768px, desktop 1280px), identify layout changes per breakpoint, verify touch targets (minimum 44x44px)
17
+ - **Bundle Size Impact**: Estimate size of new dependencies, identify tree-shaking opportunities, consider code splitting for new routes, set bundle budget (initial JS < 200KB gzipped)
18
+ - **Core Web Vitals**: Assess impact on LCP (< 2.5s), CLS (< 0.1), INP (< 200ms), identify render-blocking resources
19
+ - **Component Composition**: Specify component hierarchy, prop interfaces, state management approach (local vs. global), identify shared components for extraction
20
+ - **Rendering Strategy**: SSR vs CSR vs ISR decision for each route, hydration impact assessment, streaming SSR opportunities
21
+ - **Design System Compliance**: Verify alignment with existing design tokens (colors, spacing, typography), identify new tokens required
22
+ - **Error Boundaries**: Define error boundary placement, fallback UI for each failure mode, error reporting integration
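The contrast thresholds in the accessibility item (4.5:1 for normal text, 3:1 for large text) can be checked programmatically. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the names `Rgb`, `contrastRatio`, and `meetsAA` are illustrative assumptions, not part of the package.

```typescript
// 8-bit sRGB channels in the range 0-255.
type Rgb = [number, number, number];

// Linearize one sRGB channel per the WCAG 2.x relative-luminance definition.
function linearize(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AA requires at least 4.5:1 for normal text and 3:1 for large text.
function meetsAA(foreground: Rgb, background: Rgb, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

A check like this fits in a design-token lint step, so new color pairs are validated before they reach a component.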
20
23
 
21
24
  ---
22
25
 
@@ -26,11 +29,14 @@ Include in plan:
26
29
 
27
30
  Include in plan:
28
31
 
29
- - **API Contract**: Define request/response schemas (Zod validation), HTTP methods, status codes, error response format
30
- - **Error Handling**: Specify error response structure, error codes, client-facing messages vs. internal logging
31
- - **Rate Limiting**: Identify endpoints requiring rate limits, specify limits (requests/minute/user), throttling strategy
32
- - **Middleware Chain**: Document new middleware additions, execution order, impact on existing middleware stack
33
- - **Database Interaction**: Query patterns (parameterized), transaction boundaries, connection pooling impact
32
+ - **API Contract**: Define request/response schemas (Zod validation), HTTP methods, status codes, error response format (RFC 7807 Problem Details), versioning strategy
33
+ - **Error Handling**: Specify error response structure, error codes, client-facing messages vs. internal logging, error correlation IDs for tracing
34
+ - **Rate Limiting**: Identify endpoints requiring rate limits, specify limits (requests/minute/user), throttling strategy (sliding window vs. token bucket), response headers (X-RateLimit-*)
35
+ - **Middleware Chain**: Document new middleware additions, execution order, impact on existing middleware stack, short-circuit conditions
36
+ - **Database Interaction**: Query patterns (parameterized), transaction boundaries, connection pooling impact, N+1 query prevention
37
+ - **Input Validation**: Validation layer placement (controller vs. middleware), sanitization strategy, content-type enforcement, request size limits
38
+ - **Idempotency**: Identify non-idempotent operations, implement idempotency keys for critical mutations, retry safety assessment
39
+ - **Observability**: Structured logging format (JSON), request tracing headers (X-Request-ID propagation), health check endpoint specification
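The idempotency item above can be illustrated with an in-memory sketch. `IdempotencyStore` and `createCharge` are hypothetical names, and the synchronous, single-process `Map` is a simplifying assumption: a real service would likely need a shared store with expiry and an atomic insert.

```typescript
type StoredResponse<T> = { status: number; body: T };

// Hypothetical in-memory store; a shared store with TTL is assumed in production.
class IdempotencyStore<T> {
  private readonly responses = new Map<string, StoredResponse<T>>();

  // Runs `handler` once per key and replays the stored response for repeats.
  execute(key: string, handler: () => StoredResponse<T>): StoredResponse<T> {
    const cached = this.responses.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const response = handler();
    this.responses.set(key, response);
    return response;
  }
}

// Usage: a client retry with the same key must not create a second charge.
let chargesCreated = 0;
const idempotencyStore = new IdempotencyStore<{ chargeId: number }>();
const createCharge = (): StoredResponse<{ chargeId: number }> => {
  chargesCreated += 1;
  return { status: 201, body: { chargeId: chargesCreated } };
};

const firstAttempt = idempotencyStore.execute("key-123", createCharge);
const retriedAttempt = idempotencyStore.execute("key-123", createCharge);
```

Storing the full response, not just a "seen" flag, lets the retry receive the same body the first attempt produced.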
34
40
 
35
41
  ---
36
42
 
@@ -40,11 +46,14 @@ Include in plan:
40
46
 
41
47
  Include in plan:
42
48
 
43
- - **Migration Rollback**: Write both up and down migrations, test rollback procedure before deploying
44
- - **Index Impact Analysis**: Identify queries affected by schema changes, recommend index additions/removals, estimate query performance impact
45
- - **Data Integrity**: Define constraints (foreign keys, unique, not null, check), cascade behavior for deletions
46
- - **Backup Verification**: Verify backup exists before destructive migrations, test restore procedure for critical tables
47
- - **Query Performance**: Benchmark key queries before and after changes, set acceptable latency thresholds
49
+ - **Migration Rollback**: Write both up and down migrations, test rollback procedure before deploying, zero-downtime migration pattern (expand-contract for schema changes)
50
+ - **Index Impact Analysis**: Identify queries affected by schema changes, recommend index additions/removals, estimate query performance impact, verify composite index column order matches query patterns
51
+ - **Data Integrity**: Define constraints (foreign keys, unique, not null, check), cascade behavior for deletions, domain invariant enforcement at database level
52
+ - **Backup Verification**: Verify backup exists before destructive migrations, test restore procedure for critical tables, point-in-time recovery validation
53
+ - **Query Performance**: Benchmark key queries before and after changes (EXPLAIN ANALYZE), set acceptable latency thresholds (p50 < 10ms, p99 < 100ms for OLTP), identify sequential scan risks
54
+ - **Consistency Model**: Specify required consistency level (strong/eventual), transaction isolation level selection (Read Committed default, Serializable for financial), optimistic vs. pessimistic locking strategy
55
+ - **Data Classification**: Identify PII columns requiring encryption at rest, data retention policy compliance, audit trail requirements for sensitive data mutations
56
+ - **Connection Management**: Connection pool sizing for workload (pool_size = num_cores * 2 + disk_spindles), statement timeout configuration, idle connection cleanup
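The latency thresholds in the query performance item (p50 < 10ms, p99 < 100ms for OLTP) can be checked against collected samples. This is a minimal sketch using the nearest-rank percentile method, which is an assumption; `percentile` and `withinOltpBudget` are illustrative names.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new RangeError("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to keep integer inputs exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Applies the OLTP thresholds quoted in the plan item above.
function withinOltpBudget(samplesMs: number[]): boolean {
  return percentile(samplesMs, 50) < 10 && percentile(samplesMs, 99) < 100;
}
```

Running the same check on samples taken before and after a schema change gives a simple pass/fail signal for the benchmark step.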
48
57
 
49
58
  ---
50
59
 
@@ -54,11 +63,14 @@ Include in plan:
54
63
 
55
64
  Include in plan:
56
65
 
57
- - **Infrastructure Changes**: Specify IaC modifications (Dockerfile, docker-compose, CI config), environment variable additions
58
- - **Monitoring & Alerting**: Define new metrics to track, alerting thresholds, dashboard updates
59
- - **Progressive Rollout**: Strategy for deployment (canary → staged → full), rollback triggers, health check endpoints
60
- - **Runbook Updates**: Document operational procedures for the new functionality, incident response steps
61
- - **Environment Parity**: Verify changes work across dev, staging, and production environments
66
+ - **Infrastructure Changes**: Specify IaC modifications (Dockerfile, docker-compose, CI config), environment variable additions, 12-Factor App compliance check
67
+ - **Monitoring & Alerting**: Define new metrics to track, alerting thresholds (SLO-derived), dashboard updates, golden signals coverage (latency, traffic, errors, saturation)
68
+ - **Progressive Rollout**: Strategy for deployment (canary → staged → full), rollback triggers (error rate > 1%, latency p99 > 2x baseline), automated rollback criteria, health check endpoints
69
+ - **Runbook Updates**: Document operational procedures for the new functionality, incident response steps, escalation paths
70
+ - **Environment Parity**: Verify changes work across dev, staging, and production environments, configuration drift detection
71
+ - **GitOps Compliance**: Infrastructure changes committed to version control, declarative configuration (desired state, not imperative scripts), automated drift reconciliation
72
+ - **Container Security**: Base image selection (distroless/alpine preferred), multi-stage build optimization, no secrets in image layers, vulnerability scanning in CI
73
+ - **Observability Pipeline**: Log aggregation configuration, trace sampling strategy, metric cardinality assessment, correlation between logs/traces/metrics
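The rollback triggers in the progressive rollout item (error rate > 1%, latency p99 > 2x baseline) reduce to a small predicate. The sketch below assumes error rate is expressed as a fraction between 0 and 1; `RolloutMetrics` and `shouldRollBack` are hypothetical names.

```typescript
interface RolloutMetrics {
  errorRate: number; // fraction of failed requests, 0 to 1 (assumed unit)
  latencyP99Ms: number;
}

// Returns true when either rollback trigger from the plan item is breached.
function shouldRollBack(canary: RolloutMetrics, baseline: RolloutMetrics): boolean {
  const errorRateBreached = canary.errorRate > 0.01;
  const latencyBreached = canary.latencyP99Ms > 2 * baseline.latencyP99Ms;
  return errorRateBreached || latencyBreached;
}
```

Keeping the thresholds in one function makes the automated rollback criteria reviewable alongside the deployment config.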
62
74
 
63
75
  ---
64
76
 
@@ -68,11 +80,14 @@ Include in plan:
68
80
 
69
81
  Include in plan (in addition to mandatory security considerations):
70
82
 
71
- - **Threat Model (STRIDE)**: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege — assess each for the change
72
- - **Authentication Flow Impact**: How the change affects login, session management, token lifecycle
73
- - **Data Classification**: Identify data sensitivity levels (public, internal, confidential, restricted), storage and transmission requirements
74
- - **Compliance Requirements**: GDPR/CCPA implications (data minimization, consent, right to erasure)
75
- - **Secret Management**: New secrets required, rotation policy, storage mechanism (environment variables only)
83
+ - **Threat Model (STRIDE)**: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege — assess each for the change with severity rating
84
+ - **Authentication Flow Impact**: How the change affects login, session management, token lifecycle, OAuth 2.0 flow selection (Authorization Code + PKCE for SPAs, Client Credentials for M2M)
85
+ - **Data Classification**: Identify data sensitivity levels (public, internal, confidential, restricted), storage and transmission requirements per level
86
+ - **Compliance Requirements**: GDPR/CCPA implications (data minimization, consent, right to erasure, breach notification within 72 hours)
87
+ - **Secret Management**: New secrets required, rotation policy, storage mechanism (environment variables only), zero hardcoded credentials enforcement
88
+ - **Zero Trust Assessment**: Authentication at every boundary (never trust, always verify), least privilege access for new endpoints/services, micro-segmentation for new network paths
89
+ - **Supply Chain Security**: New dependency audit (license, maintainer, vulnerability scan), lockfile integrity verification, SRI hashes for CDN resources
90
+ - **Input Boundary Defense**: All external inputs validated and sanitized, output encoding for context (HTML/URL/JS), parameterized queries only (no string concatenation)
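Output encoding for the HTML context, mentioned in the input boundary item above, might look like the following. `escapeHtml` is an illustrative helper, not a function from the package; it covers only HTML text and quoted-attribute contexts, and URL or JavaScript contexts would need their own encoders.

```typescript
// Characters that are significant in HTML text and quoted attribute values.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces each significant character with its entity; other text is unchanged.
function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

Escaping suits plain-text values rendered into markup; when user-supplied HTML must be preserved, a sanitizer is the appropriate tool instead.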
76
91
 
77
92
  ---
78
93
 
@@ -82,11 +97,14 @@ Include in plan (in addition to mandatory security considerations):
82
97
 
83
98
  Include in plan:
84
99
 
85
- - **Performance Budget**: Define acceptable thresholds (page load time, API response time, memory usage)
86
- - **Profiling Strategy**: Tools and methods to measure before/after (Lighthouse, Chrome DevTools, load testing)
87
- - **Caching Strategy**: Cache layers (browser, CDN, application, database), TTL values, invalidation approach
88
- - **Lazy Loading**: Identify resources for deferred loading, intersection observer patterns, dynamic imports
89
- - **Benchmarking**: Define benchmark suite, baseline measurements, regression detection
100
+ - **Performance Budget**: Define acceptable thresholds (LCP < 2.5s, INP < 200ms, page load < 3s, API p99 < 500ms, memory < 512MB per process)
101
+ - **Profiling Strategy**: Tools and methods to measure before/after (Lighthouse, Chrome DevTools, load testing with k6/Artillery), baseline measurement requirements
102
+ - **Caching Architecture**: Cache layers (browser, CDN, application, database), TTL values per layer, invalidation strategy (time-based, event-driven, version-key), cache stampede prevention (stale-while-revalidate, locking)
103
+ - **Lazy Loading**: Identify resources for deferred loading, intersection observer patterns, dynamic imports for route-level code splitting, image loading strategy (responsive images, next-gen formats)
104
+ - **Benchmarking**: Define benchmark suite, baseline measurements, regression detection thresholds, automated performance gates in CI
105
+ - **Database Query Optimization**: EXPLAIN ANALYZE for new/modified queries, index coverage verification, N+1 detection, read replica routing for heavy reads
106
+ - **Concurrency Model**: Event loop impact assessment, worker thread candidates for CPU-intensive operations, connection pool saturation risk
107
+ - **CDN Strategy**: Edge caching rules for static assets, cache-control header specification, origin shield configuration, geographic distribution assessment
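The locking approach to cache stampede prevention named in the caching item can be sketched as request coalescing: concurrent callers for the same key share one pending load. `SingleFlight` and `loadReport` are hypothetical names, and the sketch assumes a single process; coordinating across instances would need a distributed lock.

```typescript
// Coalesces concurrent loads per key so the origin is hit once per miss.
class SingleFlight<T> {
  private readonly inFlight = new Map<string, Promise<T>>();

  load(key: string, loader: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending !== undefined) {
      return pending; // join the load that is already running
    }
    // Clear the entry once the load settles so later misses can reload.
    const promise = loader().finally(() => {
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, promise);
    return promise;
  }
}

// Usage: two concurrent requests for the same key trigger one origin call.
let originCalls = 0;
const flights = new SingleFlight<string>();
const loadReport = (): Promise<string> => {
  originCalls += 1;
  return Promise.resolve("report");
};

const callA = flights.load("report:1", loadReport);
const callB = flights.load("report:1", loadReport);
```

This only deduplicates in-flight work; pairing it with stale-while-revalidate keeps readers served while the single reload runs.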
90
108
 
91
109
  ---
92
110
 
@@ -96,11 +114,61 @@ Include in plan:
96
114
 
97
115
  Include in plan:
98
116
 
99
- - **Platform Parity**: Identify iOS vs. Android differences in behavior, UI, or API access
100
- - **Offline Support**: Define offline behavior, data sync strategy, conflict resolution
101
- - **App Store Guidelines**: Compliance with Apple/Google review guidelines for the feature
102
- - **Native Modules**: Bridge requirements, native module dependencies, build configuration changes
103
- - **Device Testing**: Target device matrix, screen size variations, OS version compatibility
117
+ - **Platform Parity**: Identify iOS vs. Android differences in behavior, UI, or API access, platform-specific code paths (#ifdef equivalent)
118
+ - **Offline Support**: Define offline behavior, data sync strategy (optimistic vs. pessimistic), conflict resolution (last-write-wins, CRDT, manual merge), network-aware queries
119
+ - **App Store Guidelines**: Compliance with Apple HIG and Material Design 3, review guideline risks, in-app purchase requirements
120
+ - **Native Modules**: Bridge requirements, native module dependencies, build configuration changes (Podfile/build.gradle)
121
+ - **Device Testing**: Target device matrix, screen size variations, OS version compatibility (minimum iOS 15 / Android API 26)
122
+ - **Navigation Architecture**: Navigation pattern selection (stack, tab, drawer), deep linking support, back navigation handling per platform
123
+ - **Mobile Performance Budget**: App startup time < 2s, frame rate 60fps minimum, memory usage < 150MB, APK/IPA size budget
124
+ - **State Persistence**: Local storage strategy (AsyncStorage, SQLite, MMKV), state rehydration on app resume, background task handling
125
+
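Of the conflict resolution options listed under offline support, last-write-wins is the simplest to sketch. The `Versioned` shape and the `deviceId` tie-breaker are assumptions for illustration; the approach also assumes device clocks are reasonably synchronized, which is its main weakness.

```typescript
interface Versioned<T> {
  value: T;
  updatedAtMs: number; // client timestamp of the write (assumed field)
  deviceId: string; // used only to break timestamp ties deterministically
}

// Picks the later write; ties break on deviceId so every replica agrees.
function lastWriteWins<T>(local: Versioned<T>, remote: Versioned<T>): Versioned<T> {
  if (local.updatedAtMs !== remote.updatedAtMs) {
    return local.updatedAtMs > remote.updatedAtMs ? local : remote;
  }
  return local.deviceId >= remote.deviceId ? local : remote;
}
```

Last-write-wins silently discards the losing edit, so it fits fields like preferences better than collaborative content, where a CRDT or manual merge is safer.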
126
+ ---
127
+
128
+ ## Reliability Domain
129
+
130
+ **Triggered when**: `reliability` domain matched (keywords: reliability, uptime, monitoring, sre, sla, slo, sli, etc.)
131
+
132
+ Include in plan:
133
+
134
+ - **SLO Definition**: Define Service Level Objectives for affected services (availability target, latency targets at p50/p95/p99, error rate budget)
135
+ - **SLI Instrumentation**: Specify Service Level Indicators to measure (request success rate, request latency, system throughput), measurement method and data source
136
+ - **Error Budget Impact**: Assess how the change affects existing error budgets, define acceptable error budget consumption for rollout
137
+ - **Golden Signals**: Monitoring for all four golden signals (latency, traffic, errors, saturation) for new/modified services
138
+ - **Resilience Patterns**: Circuit breaker placement, retry policy (exponential backoff with jitter), timeout configuration, bulkhead isolation for critical paths
139
+ - **Incident Preparedness**: Runbook for new failure modes, alerting rules (page vs. ticket), escalation matrix, blast radius assessment
140
+ - **Chaos Engineering**: Identify failure injection points for validation, steady-state hypothesis, abort conditions for chaos experiments
141
+ - **Capacity Planning**: Resource requirements (CPU, memory, network, storage), scaling triggers (auto-scale thresholds), load testing validation for expected traffic growth
142
+
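The retry policy named under resilience patterns (exponential backoff with jitter) can be sketched as a delay function. This uses the "full jitter" variant, which is one common choice and an assumption here; the default base and cap values and the injectable `random` parameter are illustrative.

```typescript
// Full jitter: a uniform delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 100,
  capMs = 10000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Example schedule with a fixed random source, for illustration only.
const exampleDelays = [0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt, 100, 10000, () => 0.5));
console.log(exampleDelays);
```

Jitter spreads retries from many clients over time, which avoids synchronized retry waves against a recovering service.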
143
+ ---
144
+
145
+ ## Observability Domain
146
+
147
+ **Triggered when**: `observability` domain matched (keywords: logging, tracing, metrics, monitoring, alerting, opentelemetry, etc.)
148
+
149
+ Include in plan:
150
+
151
+ - **Three Pillars Coverage**: Specify logging additions (structured JSON), metrics (counters, histograms, gauges), traces (span creation, context propagation)
152
+ - **OpenTelemetry Integration**: SDK initialization, auto-instrumentation scope, manual span creation for business-critical paths, sampling strategy (head-based vs. tail-based)
153
+ - **Log Architecture**: Log levels and when to use each (ERROR: actionable failures, WARN: degradation, INFO: business events, DEBUG: development only), structured fields, correlation ID propagation
154
+ - **Alerting Strategy**: Alert conditions derived from SLOs, notification channels (PagerDuty/Slack), alert fatigue prevention (multi-window burn rate), silence/snooze policies
155
+ - **Dashboard Design**: Key metrics visualization, RED method (Rate, Errors, Duration) per service, drill-down capability from overview to detail
156
+ - **Cost Management**: Metric cardinality assessment, log volume projection, trace sampling rate optimization, retention policy per signal type
157
+
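The multi-window burn rate idea from the alerting item can be expressed with two small functions. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The default threshold of 14.4 is an assumption taken from commonly cited SRE guidance for a one-hour window on a 30-day SLO; `burnRate` and `shouldPage` are illustrative names.

```typescript
// Burn rate = observed error ratio / error budget ratio (1 - SLO target).
function burnRate(errorRatio: number, sloTarget: number): number {
  const budget = 1 - sloTarget;
  if (budget <= 0) {
    throw new RangeError("sloTarget must be below 1");
  }
  return errorRatio / budget;
}

// Pages only when both the long and the short window are burning fast,
// which filters out brief spikes that have already recovered.
function shouldPage(
  longWindowErrorRatio: number,
  shortWindowErrorRatio: number,
  sloTarget: number,
  threshold = 14.4, // assumed default, see lead-in
): boolean {
  return (
    burnRate(longWindowErrorRatio, sloTarget) >= threshold &&
    burnRate(shortWindowErrorRatio, sloTarget) >= threshold
  );
}
```

A burn rate of 1 means the budget would be consumed exactly at the end of the SLO period; higher values exhaust it proportionally sooner.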
158
+ ---
159
+
160
+ ## Distributed Systems Domain
161
+
162
+ **Triggered when**: `architecture` domain matched AND task involves multiple services, message queues, or event-driven patterns
163
+
164
+ Include in plan:
165
+
166
+ - **Consistency Strategy**: CAP theorem trade-off for the specific use case, consistency model selection (strong, eventual, causal), Saga pattern for distributed transactions (choreography vs. orchestration)
167
+ - **Communication Pattern**: Synchronous (REST/gRPC) vs. asynchronous (message queue/event stream) decision per interaction, protocol selection criteria
168
+ - **Fault Tolerance**: Failure mode analysis for each service interaction, fallback behavior, partial failure handling, data loss prevention
169
+ - **Event-Driven Design**: Event schema definition (CloudEvents format), event ordering guarantees, idempotent consumers, dead letter queue strategy
170
+ - **Service Discovery**: Registration mechanism, health check protocol, load balancing strategy (client-side vs. server-side), circuit breaker integration
171
+ - **Data Sovereignty**: Which service owns which data, cross-service data access patterns (API calls, not shared databases), eventual consistency reconciliation
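The circuit breaker and fallback behavior mentioned in the fault tolerance and service discovery items can be sketched as a small state machine. This version is synchronous with an injectable clock purely to keep the illustration short and testable; real service calls would be asynchronous. `CircuitBreaker` and its parameters are hypothetical names.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAtMs = 0;
  private state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold: number,
    private readonly resetTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  call<T>(operation: () => T, fallback: () => T): T {
    if (this.state === "open") {
      if (this.now() - this.openedAtMs < this.resetTimeoutMs) {
        return fallback(); // fail fast while the breaker is open
      }
      this.state = "half-open"; // timeout elapsed: allow one trial call
    }
    try {
      const result = operation();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the breaker.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAtMs = this.now();
      }
      return fallback();
    }
  }

  get currentState(): BreakerState {
    return this.state;
  }
}
```

While open, the breaker skips the remote call entirely, which protects both the caller's latency and the struggling dependency.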
104
172
 
105
173
  ---
106
174
 
@@ -112,3 +180,5 @@ The planner reads this file when domain-specific sections are needed:
112
180
  2. For each matched domain, include the corresponding enhancer section
113
181
  3. Domain sections are added AFTER the base plan schema sections
114
182
  4. Multiple domains can be active simultaneously (e.g., frontend + backend for a full-stack feature)
183
+ 5. Each domain section contributes to the plan quality score (+2 bonus per matched domain section present, -2 penalty per missing)
184
+ 6. Domain enhancers leverage the specialized knowledge of their corresponding elevated agents (e.g., reliability domain draws from reliability-engineer's SRE Golden Signals framework)
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: security-practices
3
- description: Application security best practices and vulnerability prevention
3
+ description: Application security best practices including Zero Trust principles, OAuth 2.0 / OpenID Connect flows, API security, supply chain security, and vulnerability prevention
4
4
  triggers: [context, security, auth, vulnerability]
5
5
  ---
6
6
 
@@ -21,7 +21,7 @@ This skill provides security guidelines following OWASP standards and industry b
21
21
  ### Password Security
22
22
 
23
23
  ```typescript
24
- // Use bcrypt with cost factor 12
24
+ // Use bcrypt with cost factor 12
25
25
  import bcrypt from "bcrypt";
26
26
 
27
27
  const SALT_ROUNDS = 12;
@@ -55,20 +55,20 @@ const refreshToken = jwt.sign({ userId }, REFRESH_SECRET, { expiresIn: "7d" });
55
55
  ### Never Trust User Input
56
56
 
57
57
  ```typescript
58
- // SQL Injection vulnerable
58
+ // SQL Injection vulnerable
59
59
  const query = `SELECT * FROM users WHERE id = ${userId}`;
60
60
 
61
- // Parameterized query
61
+ // Parameterized query
62
62
  const user = await prisma.user.findUnique({ where: { id: userId } });
63
63
  ```
64
64
 
65
65
  ### Sanitize Output
66
66
 
67
67
  ```typescript
68
- // XSS vulnerable
68
+ // XSS vulnerable
69
69
  element.innerHTML = userInput;
70
70
 
71
- // Escape HTML
71
+ // Escape HTML
72
72
  import DOMPurify from "dompurify";
73
73
  element.innerHTML = DOMPurify.sanitize(userInput);
74
74
  ```
@@ -114,18 +114,194 @@ res.setHeader("Content-Security-Policy", "default-src 'self'");
114
114
  ## Secrets Management
115
115
 
116
116
  ```bash
117
- # Never commit secrets
117
+ # Never commit secrets
118
118
  # .env file with API_KEY=sk-1234...
119
119
 
120
- # Use environment variables
120
+ # Use environment variables
121
121
  export API_KEY=$(vault read secret/api-key)
122
122
 
123
- # Use secret managers
123
+ # Use secret managers
124
124
  # AWS Secrets Manager, HashiCorp Vault, etc.
125
125
  ```
126
126
 
127
127
  ---
128
128
 
129
+ ## Zero Trust Principles
130
+
131
+ Zero Trust assumes no implicit trust for any entity inside or outside the network perimeter. Every access request is fully authenticated, authorized, and encrypted before granting access.
132
+
133
+ | Principle | Implementation | Verification |
134
+ | :--- | :--- | :--- |
135
+ | **Never trust, always verify** | Authenticate every request regardless of origin; treat internal traffic the same as external | Audit logs confirm no unauthenticated requests reach protected resources |
136
+ | **Least privilege** | Grant minimum permissions required; use role-based and attribute-based access control | Periodic access reviews; automated permission drift detection |
137
+ | **Assume breach** | Encrypt data at rest and in transit; segment blast radius; implement intrusion detection | Red team exercises; incident response drills validate containment |
138
+ | **Micro-segmentation** | Isolate workloads with network policies; service mesh mTLS between microservices | Verify lateral movement is blocked between segments with penetration testing |
139
+ | **Continuous validation** | Re-evaluate trust on every request; session tokens with short TTL; step-up auth for sensitive ops | Monitor for session hijacking; alert on anomalous access patterns |
140
+ | **Device trust** | Require managed/compliant devices; verify device posture before granting access | Device compliance checks run at connection time and periodically |
141
+
142
+ ---
143
+
144
+ ## OAuth 2.0 / OpenID Connect Flows
145
+
146
+ ### Flow Selection Matrix
147
+
148
+ | Client Type | Recommended Flow | Reason |
149
+ | :--- | :--- | :--- |
150
+ | **Web app (SPA)** | Authorization Code + PKCE | No client secret in browser; PKCE prevents interception |
151
+ | **Web app (server)** | Authorization Code | Client secret stored server-side securely |
152
+ | **Mobile / Desktop** | Authorization Code + PKCE | Public client; PKCE mandatory |
153
+ | **Machine-to-Machine** | Client Credentials | No user interaction; service identity via client secret |
154
+ | **Legacy (avoid)** | Implicit | Deprecated; tokens exposed in URL fragment |
155
+
156
+ ### Token Storage Requirements
157
+
158
+ ```typescript
159
+ // NEVER store access tokens in localStorage (XSS-accessible)
160
+ // NEVER store tokens in sessionStorage for long-lived sessions
161
+
162
+ // Use httpOnly, Secure, SameSite cookies for refresh tokens
163
+ res.cookie("refresh_token", token, {
164
+ httpOnly: true,
165
+ secure: true,
166
+ sameSite: "strict",
167
+ maxAge: 7 * 24 * 60 * 60 * 1000, // 7 days
168
+ path: "/api/auth/refresh",
169
+ });
170
+
171
+ // Keep access tokens in memory only (JS variable)
172
+ // They are short-lived (15 min) and re-obtained via refresh
173
+ ```
174
+
175
+ ### PKCE Implementation
176
+
177
+ ```typescript
178
+ import crypto from "crypto";
179
+
180
+ // Generate code verifier (43-128 chars, unreserved URI characters)
181
+ function generateCodeVerifier(): string {
182
+ return crypto.randomBytes(32).toString("base64url");
183
+ }
184
+
185
+ // Derive code challenge from verifier
186
+ function generateCodeChallenge(verifier: string): string {
187
+ return crypto.createHash("sha256").update(verifier).digest("base64url");
188
+ }
189
+
190
+ // PKCE is defined in RFC 7636; OAuth 2.0 security best practice (RFC 9700) requires it for all public clients
191
+ // Send code_challenge and code_challenge_method=S256 with the authorization request
192
+ // Send code_verifier with token exchange request
193
+ ```
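The verifier/challenge pair above can be checked against the example in RFC 7636 Appendix B and attached to an authorization request. The following is a minimal sketch: `buildAuthorizationUrl` and its parameters (`authorizationEndpoint`, `clientId`, `redirectUri`, `state`) are hypothetical names introduced for illustration, not part of any library.

```typescript
import { createHash, randomBytes } from "crypto";

// 32 random bytes encode to a 43-character base64url string,
// the minimum verifier length allowed by RFC 7636.
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: BASE64URL(SHA256(ASCII(code_verifier)))
function generateCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// Hypothetical helper: builds the authorization request URL.
// The endpoint and client values are supplied by the caller.
function buildAuthorizationUrl(
  authorizationEndpoint: string,
  clientId: string,
  redirectUri: string,
  state: string,
  verifier: string,
): string {
  const url = new URL(authorizationEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("state", state);
  url.searchParams.set("code_challenge", generateCodeChallenge(verifier));
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}
```

The client keeps the verifier (for example in memory for the duration of the redirect) and sends it as `code_verifier` when exchanging the authorization code for tokens.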
194
+
195
+ ---
196
+
197
+ ## API Security
198
+
199
+ ### Rate Limiting Patterns
200
+
201
+ | Strategy | Use Case | Example |
202
+ | :--- | :--- | :--- |
203
+ | **Per-endpoint** | Protect expensive operations | `/api/search`: 10 req/min |
204
+ | **Per-user** | Fair usage enforcement | Authenticated: 1000 req/hr |
205
+ | **Sliding window** | Smooth traffic spikes | Rolling 60s window, max 100 |
206
+ | **Token bucket** | Burst tolerance | 10 tokens, refill 1/sec |
207
+ | **IP-based** | Unauthenticated endpoints | Login: 5 attempts/15 min |
208
+
209
+ ```typescript
210
+ import rateLimit from "express-rate-limit";
211
+
212
+ const apiLimiter = rateLimit({
213
+ windowMs: 15 * 60 * 1000, // 15 minutes
214
+ max: 100,
215
+ standardHeaders: true,
216
+ legacyHeaders: false,
217
+ keyGenerator: (req) => req.user?.id ?? req.ip,
218
+ });
219
+
220
+ app.use("/api/", apiLimiter);
221
+ ```
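The token bucket row in the table (10 tokens, refill 1/sec) can be sketched without any library. This is an illustrative in-memory version, assuming a single process; the `TokenBucket` class and its `tryConsume` method are hypothetical names, and a multi-instance deployment would need a shared store instead.

```typescript
// In-memory token bucket: allows bursts up to `capacity`,
// then sustains `refillPerSecond` requests per second.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if the request is allowed.
  tryConsume(nowMs: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing the clock in as a parameter keeps the class deterministic under test; in a request handler the default `Date.now()` is used and one bucket is kept per key (user ID or IP).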
222
+
223
+ ### API Key Management
224
+
225
+ - **Rotate keys** on a regular schedule (90 days max) and immediately on suspected compromise
226
+ - **Scope keys** to specific endpoints, methods, and IP ranges
227
+ - **Never embed keys** in client-side code or version control
228
+ - **Use separate keys** for each environment (dev, staging, production)
229
+ - **Log key usage** to detect anomalous patterns
230
+
231
+ ### Request Signing
232
+
233
+ ```typescript
234
+ // Sign requests with HMAC to prevent tampering
235
+ import crypto from "crypto";
236
+
237
+ function signRequest(payload: string, secret: string): string {
238
+ return crypto.createHmac("sha256", secret).update(payload).digest("hex");
239
+ }
240
+
241
+ // Verify on server side; reject requests with invalid or expired signatures
242
+ // Include timestamp in signed payload to prevent replay attacks
243
+ ```
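The server-side check described in the comments above might look like the following sketch. The `timestamp.body` payload layout, the `MAX_SKEW_MS` window, and the function names `signWithTimestamp` and `verifyRequest` are assumptions made for illustration; only `createHmac` and `timingSafeEqual` come from Node's `crypto` module.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Assumed tolerance for clock skew and request age (5 minutes).
const MAX_SKEW_MS = 5 * 60 * 1000;

// Signs "<timestamp>.<body>" so the timestamp cannot be altered independently.
function signWithTimestamp(body: string, secret: string, timestampMs: number): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

// Rejects stale timestamps, then compares signatures in constant time.
function verifyRequest(
  body: string,
  timestampMs: number,
  signatureHex: string,
  secret: string,
  nowMs: number = Date.now(),
): boolean {
  if (!Number.isFinite(timestampMs) || Math.abs(nowMs - timestampMs) > MAX_SKEW_MS) {
    return false;
  }
  const expected = Buffer.from(signWithTimestamp(body, secret, timestampMs), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  return timingSafeEqual(expected, received);
}
```

A timestamp window only narrows the replay opportunity to that window; rejecting a repeated request inside it additionally requires tracking a per-request nonce, which this sketch omits.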
244
+
245
+ ### API Versioning Security
246
+
247
+ - Deprecate and remove old API versions that lack current security controls
248
+ - Apply the same authentication and authorization to all active versions
249
+ - Monitor traffic to deprecated versions for potential abuse
250
+ - Never maintain insecure legacy endpoints for backward compatibility
251
+
252
+ ---
253
+
254
+ ## Supply Chain Security
255
+
256
+ ### Dependency Auditing
257
+
258
+ ```bash
259
+ # Run audit on every CI build
260
+ npm audit --audit-level=high
261
+
262
+ # Fix known vulnerabilities
263
+ npm audit fix
264
+
265
+ # Install exactly what the lockfile specifies in CI (npm ci fails if package.json and the lockfile disagree)
266
+ npm ci
267
+ ```
268
+
269
+ ### Lockfile Integrity
270
+
271
+ - **Always commit** `package-lock.json` to version control
272
+ - **Use `npm ci`** in CI/CD pipelines (installs from lockfile exactly)
273
+ - **Review lockfile diffs** in pull requests for unexpected changes
274
+ - **Enable lockfile-lint** to enforce registry and integrity hash policies
275
+
276
+ ### Dependency Pinning
277
+
278
+ ```json
279
+ {
280
+ "dependencies": {
281
+ "express": "4.18.2",
282
+ "prisma": "5.10.0"
283
+ }
284
+ }
285
+ ```
286
+
287
+ - Pin exact versions in production applications (no `^` or `~`)
288
+ - Use Dependabot or Renovate for controlled, reviewed updates
289
+ - Separate security patches from feature updates in dependency PRs
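The exact-pinning rule can be enforced with a small CI check. The sketch below is one possible approach, not a standard tool: `findUnpinned` is a hypothetical helper that flags any dependency whose version string is not a plain `major.minor.patch` (optionally with prerelease or build metadata).

```typescript
// Matches exact semver versions such as "4.18.2" or "1.0.0-beta.1";
// ranges ("^", "~", "*", ">=") and tags ("latest") do not match.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Returns the names of dependencies that are not pinned to an exact version.
function findUnpinned(dependencies: Record<string, string>): string[] {
  return Object.entries(dependencies)
    .filter(([, version]) => !EXACT_VERSION.test(version))
    .map(([name]) => name);
}
```

In a pipeline, the `dependencies` object parsed from `package.json` would be passed in and the build failed when the returned list is non-empty.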
290
+
291
+ ### Typosquatting Detection
292
+
293
+ | Technique | Example |
294
+ | :--- | :--- |
295
+ | **Character insertion** | `expresss` instead of `express` |
296
+ | **Hyphen confusion** | `lodash-utils` mimicking `lodash` |
297
+ | **Scope squatting** | `@myorg/config` vs `@my-org/config` |
298
+
299
+ - Verify package publisher and download counts before installing
300
+ - Use `npm config set ignore-scripts true` for initial installs, then review scripts
301
+ - Consider using Socket.dev or Snyk to detect malicious packages automatically
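Near-miss names such as those in the table can be flagged with an edit-distance comparison against an allowlist of packages the project already trusts. This is a rough sketch under that assumption; `editDistance` and `findLookalikes` are hypothetical helpers, and the threshold of 2 is an arbitrary choice.

```typescript
// Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      const substitution = diagonal + (a[i - 1] === b[j - 1] ? 0 : 1);
      row[j] = Math.min(above + 1, row[j - 1] + 1, substitution);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Returns trusted names that the candidate closely resembles without matching exactly.
function findLookalikes(candidate: string, trusted: string[], maxDistance = 2): string[] {
  if (trusted.includes(candidate)) {
    return []; // exact match: this is the trusted package itself
  }
  return trusted.filter((name) => editDistance(candidate, name) <= maxDistance);
}
```

This catches single-character variations like `expresss` or `@myorg/config`, but not suffix-style names such as `lodash-utils`, which differ by many characters; those still need publisher review or a dedicated scanner.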
302
+
303
+ ---
304
+
129
305
  ## Quick Reference
130
306
 
131
307
  | Practice | Implementation |
@@ -138,3 +314,7 @@ export API_KEY=$(vault read secret/api-key)
138
314
  | Secrets | Environment, vaults |
139
315
  | Dependencies | npm audit, Snyk |
140
316
  | Logging | Audit trail, no PII |
317
+ | Zero Trust | Verify every request |
318
+ | OAuth 2.0 | Auth Code + PKCE |
319
+ | API Keys | Scoped, rotated |
320
+ | Supply Chain | Lockfile, pin deps |
@@ -190,6 +190,7 @@ If any of these conditions are met, **REJECT** the task:
190
190
 
191
191
  ## Related Resources
192
192
 
193
+ - **Rule**: `.agent/rules/quality-gate.md` (enforcement principles for this workflow)
193
194
  - **Previous**: `/brainstorm` (explore options before validation)
194
195
  - **Next**: `/plan` (implementation planning after approval)
195
196
  - **Related**: `/retrospective` (post-sprint audit applies similar rigor)