npm - @wipal/agent-team - Versions diffs - 1.0.0 - Mend

@wipal/agent-team 1.0.0

Files changed (89)

package/.claude/skills/domain/architecture/performance-engineering/references/benchmarking.md ADDED

@@ -0,0 +1,336 @@
+# Benchmarking & Load Testing
+## Load Testing Types
+### 1. Load Testing
+```
+Purpose: Verify system handles expected load
+Configuration:
+- Expected concurrent users
+- Expected request rate
+- Typical usage patterns
+Success Criteria:
+- Response time within SLA
+- Error rate below threshold
+- Resource utilization acceptable
+```
+### 2. Stress Testing
+```
+Purpose: Find system breaking point
+Configuration:
+- Gradually increase load beyond capacity
+- Monitor for degradation
+- Identify failure mode
+Success Criteria:
+- Understand system limits
+- Identify bottleneck
+- Graceful degradation behavior
+```
+### 3. Spike Testing
+```
+Purpose: Test sudden traffic increase
+Configuration:
+- Baseline load
+- Sudden spike (10x traffic)
+- Return to baseline
+Success Criteria:
+- Auto-scaling triggers
+- No cascade failures
+- Recovery to normal
+```
+### 4. Soak Testing
+```
+Purpose: Find issues over time
+Configuration:
+- Sustained load (hours/days)
+- Monitor for degradation
+- Look for memory leaks
+Success Criteria:
+- Stable performance over time
+- No memory growth
+- No resource exhaustion
+```
+## Load Testing Tools
+### k6 (Recommended)
+```javascript
+// script.js
+import http from 'k6/http';
+import { check, sleep } from 'k6';
+export const options = {
+  stages: [
+    { duration: '30s', target: 20 },   // Ramp up
+    { duration: '1m', target: 20 },    // Stay
+    { duration: '30s', target: 100 },  // Ramp to peak
+    { duration: '1m', target: 100 },   // Hold at peak
+    { duration: '30s', target: 0 },    // Ramp down
+  ],
+  thresholds: {
+    http_req_duration: ['p(95)<500'], // 95% under 500ms
+    http_req_failed: ['rate<0.01'],   // <1% errors
+  },
+};
+export default function () {
+  const res = http.get('https://api.example.com/users');
+  check(res, {
+    'status is 200': (r) => r.status === 200,
+    'response time OK': (r) => r.timings.duration < 500,
+  });
+  sleep(1);
+}
+// Run: k6 run script.js
+```
+### Locust (Python)
+```python
+from locust import HttpUser, task, between
+class WebsiteUser(HttpUser):
+    wait_time = between(1, 5)
+    @task
+    def get_users(self):
+        self.client.get("/users")
+    @task(3)  # 3x more frequent
+    def get_products(self):
+        self.client.get("/products")
+# Run: locust -f locustfile.py
+```
+### Artillery (Node.js)
+```yaml
+config:
+  target: 'https://api.example.com'
+  phases:
+    - duration: 60
+      arrivalRate: 10
+      name: Warm up
+    - duration: 120
+      arrivalRate: 50
+      name: Sustained load
+scenarios:
+  - flow:
+      - get:
+          url: "/users"
+      - think: 2
+      - get:
+          url: "/products"
+# Run: artillery run config.yml
+```
+## Performance Metrics
+### Key Metrics to Track
+```
+1. Throughput
+   - Requests per second (RPS)
+   - Transactions per second (TPS)
+   - Messages per second
+2. Latency
+   - p50 (median)
+   - p90
+   - p95
+   - p99
+   - p99.9
+3. Error Rate
+   - HTTP 4xx errors
+   - HTTP 5xx errors
+   - Timeouts
+   - Connection errors
+4. Resource Utilization
+   - CPU usage
+   - Memory usage
+   - Disk I/O
+   - Network I/O
+   - Database connections
+```
+### Interpreting Results
+```
+Good Results:
+- Flat latency as load increases (until saturation)
+- Linear throughput increase with load
+- Error rate < 0.1%
+- Resource utilization < 80%
+Warning Signs:
+- Latency increases exponentially
+- Throughput plateaus while latency increases
+- Error rate spikes
+- Resource exhaustion
+Bottleneck Indicators:
+- High CPU, low throughput → Algorithm optimization
+- High memory → Memory leak, inefficient data structures
+- High I/O wait → Disk bottleneck, database queries
+- High network → Bandwidth limitation
+```
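As a minimal sketch of how the latency and error-rate metrics above could be derived from raw samples, the following uses the nearest-rank percentile method. The function names and the sample run are hypothetical, and real load tools such as k6 may interpolate percentiles differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (0 < pct <= 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, error_count):
    """Summarize one test run: latency percentiles and error rate."""
    total = len(latencies_ms)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "error_rate": error_count / total,
    }


# Hypothetical run: 100 requests with latencies of 1..100 ms and one failure
run = summarize(list(range(1, 101)), error_count=1)
```

Comparing `run` across increasing load levels is one way to spot the warning signs listed above, such as p99 rising while throughput stays flat.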
+## Test Environment
+### Production-Like Environment
+```
+Requirements:
+- Same hardware specifications
+- Same software versions
+- Same configuration
+- Same data volume (or representative subset)
+- Same network conditions
+Database:
+- Use production-like data distribution
+- Same indexes
+- Same query patterns
+```
+### Test Data
+```
+Options:
+1. Production data (anonymized)
+2. Synthetic data (realistic distribution)
+3. Subset of production
+Considerations:
+- Data volume affects performance
+- Data distribution affects query plans
+- Sensitive data handling
+```
+## Benchmarking Best Practices
+### Before Testing
+```
+1. Define clear objectives
+   - What are you testing?
+   - What are the success criteria?
+2. Establish baseline
+   - Run tests on known-good configuration
+   - Document baseline metrics
+3. Prepare environment
+   - Clean state
+   - No other load
+   - Monitoring ready
+```
+### During Testing
+```
+1. Monitor all layers
+   - Application logs
+   - Database metrics
+   - Infrastructure metrics
+   - Network metrics
+2. Capture data
+   - Response times
+   - Error logs
+   - Resource utilization
+   - GC pauses (if applicable)
+3. Note observations
+   - When did degradation start?
+   - What failed first?
+   - Error patterns?
+```
+### After Testing
+```
+1. Analyze results
+   - Compare to baseline
+   - Identify bottlenecks
+   - Document findings
+2. Report
+   - Executive summary
+   - Key metrics
+   - Recommendations
+3. Action items
+   - Prioritize fixes
+   - Estimate effort
+   - Plan retest
+```
+## Continuous Performance Testing
+### CI/CD Integration
+```yaml
+# .github/workflows/performance.yml
+name: Performance Tests
+on:
+  schedule:
+    - cron: '0 2 * * *'  # Daily at 2am
+  workflow_dispatch:
+jobs:
+  load-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Run k6
+        uses: grafana/k6-action@v0.3.0
+        with:
+          filename: tests/load/basic.js
+        env:
+          K6_CLOUD_TOKEN: ${{ secrets.K6_TOKEN }}
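+      - name: Check thresholds
+        run: |
+          # Fail if p95 > 500ms. Assumes the k6 script writes results.json
+          # from handleSummary(); k6 reports a float, so compare in jq
+          # rather than with the integer-only -gt test.
+          if jq -e '.metrics.http_req_duration.values["p(95)"] > 500' results.json > /dev/null; then
+            echo "Performance regression detected"
+            exit 1
+          fi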
+```
+### Performance Budgets
+```
+Define and enforce budgets:
+Frontend:
+- First Contentful Paint: < 1.5s
+- Largest Contentful Paint: < 2.5s
+- Time to Interactive: < 3.5s
+- Cumulative Layout Shift: < 0.1
+API:
+- p95 latency: < 200ms
+- p99 latency: < 500ms
+- Error rate: < 0.1%
+Database:
+- Query time p95: < 50ms
+- Connection pool usage: < 80%
+Enforcement:
+- Build fails if budget exceeded
+- PR checks for performance
+- Automated alerts
+```
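One possible way to enforce the API and database budgets above in a build step is sketched here. The metric names and the `measured` dictionary are assumptions for illustration; in practice the values would come from a load-test summary or a monitoring system.

```python
# Limits copied from the API and Database budgets above.
# The metric names are hypothetical.
BUDGETS = {
    "api_p95_ms": 200,
    "api_p99_ms": 500,
    "api_error_rate": 0.001,
    "db_query_p95_ms": 50,
    "db_pool_usage": 0.80,
}


def check_budgets(measured, budgets=BUDGETS):
    """Return a list of violations; an empty list means every budget is met."""
    violations = []
    for name, limit in budgets.items():
        value = measured.get(name)
        if value is None:
            # A missing measurement is treated as a failure, not a pass
            violations.append(f"{name}: no measurement")
        elif value >= limit:
            violations.append(f"{name}: {value} is not below {limit}")
    return violations
```

A CI job could print the returned list and exit non-zero whenever it is non-empty, which matches the "build fails if budget exceeded" rule.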

package/.claude/skills/domain/architecture/performance-engineering/references/caching-strategies.md ADDED

@@ -0,0 +1,284 @@
+# Caching Strategies
+## Cache Types
+### 1. Browser Cache
+```
+HTTP Headers:
+Cache-Control: max-age=31536000  // 1 year for static
+Cache-Control: no-cache           // Revalidate
+Cache-Control: private            // Browser only
+ETag: "abc123"                    // Validation
+Last-Modified: Wed, 21 Oct 2015   // Validation
+Use for: Static assets, immutable content
+```
+### 2. CDN Cache
+```
+┌──────────┐     ┌──────────────────────────────┐
+│  Origin  │     │            CDN               │
+│  Server  │────▶│  ┌─────┐ ┌─────┐ ┌─────┐   │
+│          │     │  │Edge1│ │Edge2│ │Edge3│   │
+└──────────┘     │  └─────┘ └─────┘ └─────┘   │
+                 └──────────────────────────────┘
+CDN Features:
+- Geographic distribution
+- DDoS protection
+- SSL termination
+- Compression
+Providers: Cloudflare, AWS CloudFront, Fastly, Akamai
+```
+### 3. Application Cache
+```
+In-Memory Cache (Redis, Memcached):
+┌─────────┐     ┌─────────┐     ┌─────────┐
+│   App   │────▶│  Redis  │     │   DB    │
+│ Server  │     │ Cluster │     │         │
+└─────────┘     └─────────┘     └─────────┘
+Use for:
+- Session data
+- API response caching
+- Rate limiting counters
+- Leaderboards
+- Real-time analytics
+```
+### 4. Database Cache
+```
+Query Cache (MySQL removed this feature in 8.0):
+- Cache query results
+- Invalidate on write
+Buffer Pool (InnoDB):
+- Cache table data
+- Cache indexes
+Read Replicas:
+- Offload read traffic
+- Geographic distribution
+```
+## Caching Patterns
+### Cache-Aside (Lazy Loading)
+```
+def get_user(user_id):
+    # 1. Check cache
+    cached = cache.get(f"user:{user_id}")
+    if cached is not None:  # Treat empty or falsy cached values as hits
+        return cached
+    # 2. Load from DB
+    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
+    # 3. Populate cache
+    cache.set(f"user:{user_id}", user, ttl=3600)
+    return user
+Pros:
+- Only caches what's requested
+- Cache failure doesn't break app
+Cons:
+- Cache miss penalty
+- Stale data possible
+```
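The cache-aside flow above can be exercised without Redis or a database. This is a minimal sketch: `TTLCache` is a hypothetical in-process stand-in for the cache, and `load_user` is an assumed callable that plays the role of the database query.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis or Memcached (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # Injectable so expiry can be tested without sleeping

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # Expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def get_user(user_id, cache, load_user):
    """Cache-aside read: check the cache, fall back to the loader, populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_user(user_id)
    if user is not None:
        cache.set(key, user, ttl=3600)
    return user
```

Passing the loader in as a parameter keeps the pattern independent of any particular database client.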
+### Write-Through
+```
+def update_user(user_id, data):
+    # 1. Update database first so a failed write never leaves the cache ahead of it
+    db.query("UPDATE users SET ... WHERE id = ?", user_id)
+    # 2. Update cache
+    cache.set(f"user:{user_id}", data, ttl=3600)
+    return data
+Pros:
+- Cache always fresh
+- Read performance optimal
+Cons:
+- Write latency higher
+- May cache unused data
+```
+### Write-Behind (Write-Back)
+```
+Write Queue: [update1, update2, update3]
+def update_user(user_id, data):
+    # 1. Update cache immediately
+    cache.set(f"user:{user_id}", data)
+    # 2. Queue for async DB write
+    write_queue.push((user_id, data))  # Tuple, not a set literal
+    return data
+# Background worker:
+def write_worker():
+    while True:
+        batch = write_queue.pop_batch(100)
+        db.batch_update(batch)
+Pros:
+- Very fast writes
+- Batch DB operations
+Cons:
+- Risk of data loss
+- Complexity
+```
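A runnable sketch of the write-behind idea, assuming a dict-like cache and a hypothetical `write_batch` callable that persists a list of `(user_id, data)` pairs. A real worker would call `flush_once` in a loop on a background thread.

```python
import queue


class WriteBehindStore:
    """Updates the cache immediately and defers database writes to batches."""

    def __init__(self, cache, write_batch, batch_size=100):
        self.cache = cache              # Dict-like cache (assumption)
        self.write_batch = write_batch  # Callable taking a list of (user_id, data)
        self.batch_size = batch_size
        self.pending = queue.Queue()    # Thread-safe queue of unwritten updates

    def update_user(self, user_id, data):
        self.cache[f"user:{user_id}"] = data
        self.pending.put((user_id, data))
        return data

    def flush_once(self):
        """Drain up to batch_size queued writes and persist them in one call."""
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.write_batch(batch)
        return len(batch)
```

Anything still sitting in `pending` is lost if the process dies, which is the data-loss risk listed under the cons above.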
+### Refresh-Ahead
+```
+def get_user(user_id):
+    cached = cache.get(f"user:{user_id}")
+    if cached is None:
+        # Miss: fall back to a synchronous load
+        return load_from_db(user_id)
+    # Hit: check if nearing expiry
+    ttl = cache.ttl(f"user:{user_id}")
+    if ttl < 300:  # Less than 5 min
+        # Async refresh
+        refresh_queue.push(user_id)
+    return cached
+# Background refresh
+def refresh_worker():
+    while True:
+        user_id = refresh_queue.pop()
+        user = db.query("SELECT * FROM users WHERE id = ?", user_id)
+        cache.set(f"user:{user_id}", user, ttl=3600)
+Pros:
+- Fewer cache misses for frequently read keys
+- Optimal read performance
+Cons:
+- May refresh unused data
+- Complexity
+```
+## Cache Invalidation
+### Strategies
+```
+1. Time-Based (TTL)
+   cache.set(key, value, ttl=3600)
+   Simple but may serve stale data
+2. Event-Based
+   On data change: cache.delete(key)
+   Fresh but requires tracking
+3. Version-Based
+   key = f"user:{user_id}:v{version}"
+   Increment version on change
+4. Tag-Based
+   cache.set(key, value, tags=["user:123", "posts"])
+   Invalidate by tag
+```
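Version-based invalidation can be sketched with a plain dictionary standing in for the cache. The class and method names here are hypothetical; the point is that bumping the version changes the key, so old entries are never read again.

```python
class VersionedCache:
    """Version-based invalidation: a new version means a new cache key."""

    def __init__(self):
        self._store = {}     # Stand-in for the cache backend
        self._versions = {}  # Current version per user_id

    def _key(self, user_id):
        version = self._versions.get(user_id, 0)
        return f"user:{user_id}:v{version}"

    def get(self, user_id):
        return self._store.get(self._key(user_id))

    def set(self, user_id, value):
        self._store[self._key(user_id)] = value

    def invalidate(self, user_id):
        # No delete needed: readers simply stop asking for the old key
        self._versions[user_id] = self._versions.get(user_id, 0) + 1
```

Stale entries under old versions stay in the store until a TTL or eviction policy removes them, so this approach is usually combined with time-based expiry.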
+### Cache Stampede Prevention
+```
+Problem: Multiple requests miss cache simultaneously
+Result: All hit database at once
+Solutions:
+1. Locking
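+   def get_with_cache(key):
+       value = cache.get(key)
+       if value is not None:
+           return value
+       if lock.acquire(key, timeout=5):
+           try:
+               value = db.query(...)
+               cache.set(key, value)
+               return value
+           finally:
+               lock.release(key)  # Release even if the query fails
+       time.sleep(0.1)  # Another request holds the lock; retry shortly
+       return get_with_cache(key)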
+2. Probabilistic Early Refresh
+   # Refresh probability rises from 0 (just cached) to 1 (at expiry)
+   if random() < 1 - (expiry - current_time) / ttl:
+       async_refresh(key)
+3. Request Coalescing
+   pending_requests[key].add(request_id)
+   # Single DB query, respond to all waiters
+```
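Request coalescing within a single process can be sketched with the standard `threading` module. The `SingleFlight` class and its `loader` argument are hypothetical names; the first caller for a key runs the loader and every concurrent caller for the same key waits for that result.

```python
import threading


class SingleFlight:
    """Coalesces concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> state of the load currently in progress

    def do(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None,
                        "error": None, "waiters": 0}
                self._inflight[key] = call
            else:
                call["waiters"] += 1
        if leader:
            try:
                call["value"] = loader()  # The single database query
            except Exception as exc:
                call["error"] = exc       # Propagated to every waiter
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

This only deduplicates requests inside one process. Across several application servers, a distributed lock like the one in option 1 would still be needed.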
+## Cache Configuration
+### Redis Best Practices
+```
+Memory Management:
+maxmemory 4gb
+maxmemory-policy allkeys-lru
+Persistence (choose based on needs):
+appendonly yes          # AOF for durability
+save 900 1              # RDB snapshots
+Replication:
+replicaof master-host 6379
+Keys Design:
+- Use consistent prefixes
+- Include TTL in key when relevant
+- Keep keys short but meaningful
+Examples:
+user:123:profile        # User profile cache
+session:abc123          # Session data
+ratelimit:api:456       # Rate limit counter
+cache:products:page:1   # Paginated cache
+```
+### Cache Warming
+```
+def warm_cache():
+    """Pre-populate cache on startup"""
+    # Popular products
+    popular = db.query("SELECT * FROM products ORDER BY views DESC LIMIT 100")
+    for product in popular:
+        cache.set(f"product:{product.id}", product, ttl=3600)
+    # User sessions
+    active_sessions = db.query("SELECT * FROM sessions WHERE active = true")
+    for session in active_sessions:
+        cache.set(f"session:{session.id}", session, ttl=86400)
+# Run on:
+- Application startup
+- Scheduled (hourly)
+- After cache clear
+```
+## Monitoring
+### Key Metrics
+```
+- Hit rate: cache_hits / (cache_hits + cache_misses)
+- Latency: p50, p95, p99
+- Memory usage: used_memory / maxmemory
+- Evictions: keys evicted due to memory pressure
+- Connections: current connections vs max
+```
+### Alerts
+```
+- Hit rate < 80%
+- Memory usage > 90%
+- Latency p99 > 10ms
+- Evictions increasing
+- Connection pool exhausted
+```
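The hit-rate and memory formulas above, together with two of the alert thresholds, could be evaluated as in this sketch. `CacheStats` and `alerts` are hypothetical names; with Redis, the inputs correspond to the `keyspace_hits`, `keyspace_misses`, `used_memory`, and `maxmemory` fields of the `INFO` command.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    used_memory: int = 0
    maxmemory: int = 0

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def memory_usage(self):
        # maxmemory of 0 means "no limit", so usage is reported as 0.0
        return self.used_memory / self.maxmemory if self.maxmemory else 0.0


def alerts(stats):
    """Return the alerts that fire for one snapshot of cache statistics."""
    fired = []
    if stats.hits + stats.misses > 0 and stats.hit_rate < 0.80:
        fired.append("hit rate below 80%")
    if stats.memory_usage > 0.90:
        fired.append("memory usage above 90%")
    return fired
```

The eviction and latency alerts need a trend or a percentile rather than a single snapshot, so they are left out of this sketch.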