@musashishao/agent-kit 1.6.1 → 1.7.0

Files changed (64)
  1. package/.agent/.shared/ui-ux-pro-max/data/charts.csv +26 -0
  2. package/.agent/.shared/ui-ux-pro-max/data/colors.csv +97 -0
  3. package/.agent/.shared/ui-ux-pro-max/data/icons.csv +101 -0
  4. package/.agent/.shared/ui-ux-pro-max/data/landing.csv +31 -0
  5. package/.agent/.shared/ui-ux-pro-max/data/products.csv +97 -0
  6. package/.agent/.shared/ui-ux-pro-max/data/prompts.csv +24 -0
  7. package/.agent/.shared/ui-ux-pro-max/data/react-performance.csv +45 -0
  8. package/.agent/.shared/ui-ux-pro-max/data/stacks/flutter.csv +53 -0
  9. package/.agent/.shared/ui-ux-pro-max/data/stacks/html-tailwind.csv +56 -0
  10. package/.agent/.shared/ui-ux-pro-max/data/stacks/jetpack-compose.csv +53 -0
  11. package/.agent/.shared/ui-ux-pro-max/data/stacks/nextjs.csv +53 -0
  12. package/.agent/.shared/ui-ux-pro-max/data/stacks/nuxt-ui.csv +51 -0
  13. package/.agent/.shared/ui-ux-pro-max/data/stacks/nuxtjs.csv +59 -0
  14. package/.agent/.shared/ui-ux-pro-max/data/stacks/react-native.csv +52 -0
  15. package/.agent/.shared/ui-ux-pro-max/data/stacks/react.csv +54 -0
  16. package/.agent/.shared/ui-ux-pro-max/data/stacks/shadcn.csv +61 -0
  17. package/.agent/.shared/ui-ux-pro-max/data/stacks/svelte.csv +54 -0
  18. package/.agent/.shared/ui-ux-pro-max/data/stacks/swiftui.csv +51 -0
  19. package/.agent/.shared/ui-ux-pro-max/data/stacks/vue.csv +50 -0
  20. package/.agent/.shared/ui-ux-pro-max/data/styles.csv +59 -0
  21. package/.agent/.shared/ui-ux-pro-max/data/typography.csv +58 -0
  22. package/.agent/.shared/ui-ux-pro-max/data/ui-reasoning.csv +101 -0
  23. package/.agent/.shared/ui-ux-pro-max/data/ux-guidelines.csv +100 -0
  24. package/.agent/.shared/ui-ux-pro-max/data/web-interface.csv +31 -0
  25. package/.agent/.shared/ui-ux-pro-max/scripts/core.py +258 -0
  26. package/.agent/.shared/ui-ux-pro-max/scripts/design_system.py +487 -0
  27. package/.agent/.shared/ui-ux-pro-max/scripts/search.py +76 -0
  28. package/.agent/adr/ADR-TEMPLATE.md +57 -0
  29. package/.agent/adr/README.md +30 -0
  30. package/.agent/agents/backend-specialist.md +1 -1
  31. package/.agent/agents/devops-engineer.md +1 -1
  32. package/.agent/agents/performance-optimizer.md +1 -1
  33. package/.agent/agents/security-auditor.md +1 -1
  34. package/.agent/dashboard/index.html +169 -0
  35. package/.agent/rules/REFERENCE.md +14 -0
  36. package/.agent/skills/ai-incident-management/SKILL.md +517 -0
  37. package/.agent/skills/ai-security-guardrails/SKILL.md +405 -0
  38. package/.agent/skills/ai-security-guardrails/owasp-llm-top10.md +160 -0
  39. package/.agent/skills/ai-security-guardrails/scripts/prompt_injection_scanner.py +230 -0
  40. package/.agent/skills/compliance-for-ai/SKILL.md +411 -0
  41. package/.agent/skills/observability-patterns/SKILL.md +484 -0
  42. package/.agent/skills/observability-patterns/scripts/otel_validator.py +330 -0
  43. package/.agent/skills/opentelemetry-expert/SKILL.md +738 -0
  44. package/.agent/skills/opentelemetry-expert/scripts/trace_analyzer.py +351 -0
  45. package/.agent/skills/privacy-preserving-dev/SKILL.md +442 -0
  46. package/.agent/skills/privacy-preserving-dev/scripts/pii_scanner.py +285 -0
  47. package/.agent/workflows/autofix.md +4 -1
  48. package/.agent/workflows/brainstorm.md +1 -1
  49. package/.agent/workflows/context.md +3 -1
  50. package/.agent/workflows/create.md +1 -1
  51. package/.agent/workflows/dashboard.md +4 -1
  52. package/.agent/workflows/debug.md +1 -1
  53. package/.agent/workflows/deploy.md +1 -1
  54. package/.agent/workflows/enhance.md +1 -1
  55. package/.agent/workflows/next.md +4 -1
  56. package/.agent/workflows/orchestrate.md +1 -1
  57. package/.agent/workflows/plan.md +1 -1
  58. package/.agent/workflows/preview.md +1 -1
  59. package/.agent/workflows/quality.md +1 -1
  60. package/.agent/workflows/spec.md +1 -1
  61. package/.agent/workflows/status.md +1 -1
  62. package/.agent/workflows/test.md +1 -1
  63. package/.agent/workflows/ui-ux-pro-max.md +1 -1
  64. package/package.json +4 -1
@@ -0,0 +1,517 @@
+ ---
+ name: ai-incident-management
+ description: AI-specific incident response playbook, hallucination detection patterns, degradation vs outage classification, rollback strategies, post-mortem templates.
+ allowed-tools: Read, Glob, Grep
+ skills:
+ - systematic-debugging
+ - observability-patterns
+ ---
+
+ # AI Incident Management
+
+ > When AI breaks, respond fast and learn faster.
+
+ ---
+
+ ## 1. AI-Specific Incident Types
+
+ ### Taxonomy
+
+ | Category | Examples | Severity |
+ |----------|----------|----------|
+ | **Hallucination** | Factually wrong, made-up info | Medium-Critical |
+ | **Toxicity** | Harmful, offensive output | Critical |
+ | **Data Leakage** | PII in responses, prompt leak | Critical |
+ | **Performance** | High latency, timeouts | Medium |
+ | **Availability** | Model API down | High-Critical |
+ | **Drift** | Quality degradation over time | Medium |
+ | **Prompt Injection** | Security bypass | Critical |
+
+ ### Severity Matrix
+
+ ```
+ ┌─────────────────┬──────────────────┬─────────────────────┐
+ │                 │ Impact: Low      │ Impact: High        │
+ ├─────────────────┼──────────────────┼─────────────────────┤
+ │ Frequency:      │ P3 - Monitor     │ P2 - Investigate    │
+ │ Rare            │ Next sprint      │ Within 24h          │
+ ├─────────────────┼──────────────────┼─────────────────────┤
+ │ Frequency:      │ P2 - Fix Soon    │ P1 - DROP ALL       │
+ │ Common          │ Within 24h       │ Immediate           │
+ └─────────────────┴──────────────────┴─────────────────────┘
+ ```
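The severity matrix above is a two-key lookup, so it can be sketched as a small table in code. This is an illustrative sketch only: the `Frequency` and `Impact` enums and the `classify` function are hypothetical names that do not appear in the package; the priorities and response targets are copied from the matrix.

```python
from enum import Enum
from typing import Dict, Tuple


class Frequency(Enum):
    RARE = "rare"
    COMMON = "common"


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


# (frequency, impact) -> (priority, response target), taken from the matrix cells.
SEVERITY_MATRIX: Dict[Tuple[Frequency, Impact], Tuple[str, str]] = {
    (Frequency.RARE, Impact.LOW): ("P3", "Next sprint"),
    (Frequency.RARE, Impact.HIGH): ("P2", "Within 24h"),
    (Frequency.COMMON, Impact.LOW): ("P2", "Within 24h"),
    (Frequency.COMMON, Impact.HIGH): ("P1", "Immediate"),
}


def classify(frequency: Frequency, impact: Impact) -> Tuple[str, str]:
    """Return (priority, response target) for one incident."""
    return SEVERITY_MATRIX[(frequency, impact)]
```

Keeping the matrix as data rather than nested conditionals makes it easy to review against the table when the policy changes.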
+
+ ---
+
+ ## 2. Incident Response Playbook
+
+ ### Phase 1: Detection (0-5 minutes)
+
+ ```markdown
+ ## Detection Checklist
+ - [ ] Alert received and acknowledged
+ - [ ] Initial severity assessed
+ - [ ] On-call notified (if P1/P2)
+ - [ ] Incident channel created
+ - [ ] User impact estimated
+ ```
+
+ ### Phase 2: Triage (5-15 minutes)
+
+ ```markdown
+ ## Triage Questions
+ 1. What type of AI incident? (hallucination/toxicity/etc.)
+ 2. How widespread? (single user / all users / specific segment)
+ 3. Is it ongoing? (active / resolved / intermittent)
+ 4. What changed recently? (model / prompt / data)
+ 5. Can we reproduce it?
+
+ ## Quick Actions
+ - [ ] Check model health dashboard
+ - [ ] Review recent deployments
+ - [ ] Check provider status page
+ - [ ] Sample recent outputs for patterns
+ ```
+
+ ### Phase 3: Mitigation (15-60 minutes)
+
+ | Strategy | When to Use | Impact |
+ |----------|-------------|--------|
+ | **Kill Switch** | Toxicity, data leak | Service degraded |
+ | **Fallback Model** | Primary unavailable | Quality may differ |
+ | **Circuit Breaker** | High error rate | Automatic recovery |
+ | **Content Filter** | Specific bad patterns | Targeted fix |
+ | **Rollback Prompt** | Prompt regression | Quick if versioned |
+
+ ### Phase 4: Resolution
+
+ ```markdown
+ ## Resolution Checklist
+ - [ ] Root cause identified
+ - [ ] Fix implemented and tested
+ - [ ] Affected users notified (if needed)
+ - [ ] Monitoring confirmed improvement
+ - [ ] Incident timeline documented
+ ```
+
+ ---
+
+ ## 3. Hallucination Response
+
+ ### Detection Patterns
+
+ ```python
+ import re
+
+ HALLUCINATION_INDICATORS = [
+     # Over-confidence on uncertain topics
+     "definitely", "certainly", "always", "never",
+
+     # Made-up citations
+     r"\d{4}\)", r"According to Dr\.",
+
+     # Fictional data
+     r"\d+%", "studies show", "research indicates",
+
+     # Self-referential confusion
+     "as an AI", "I don't have", "I cannot",
+ ]
+
+ def assess_hallucination_risk(response: str) -> float:
+     """Score 0-1 for hallucination risk."""
+     score = 0.0
+
+     # Check indicators (each matching pattern adds 0.1)
+     for pattern in HALLUCINATION_INDICATORS:
+         if re.search(pattern, response, re.IGNORECASE):
+             score += 0.1
+
+     # Check for specific claims
+     # has_numeric_claims() is not defined in this snippet; supply it before calling.
+     if has_numeric_claims(response):
+         score += 0.2
+
+     return min(score, 1.0)
+ ```
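The snippet above calls `has_numeric_claims` without defining it. A minimal sketch of such a helper follows, assuming that "numeric claim" means any standalone number (integer, decimal, thousands-separated, or percentage). That definition is an assumption made for illustration; the package does not specify it.

```python
import re

# Assumed definition of a "numeric claim": a standalone number such as
# 42, 3.5, 1,200 or 87%. Digits embedded in identifiers (e.g. "v2") are
# ignored because the match must start at a word boundary.
_NUMERIC_CLAIM = re.compile(r"\b\d+(?:[.,]\d+)*")


def has_numeric_claims(response: str) -> bool:
    """Return True if the response contains at least one standalone number."""
    return _NUMERIC_CLAIM.search(response) is not None
```

A heuristic this broad will flag many harmless responses, so its result is better treated as one input to a risk score than as a verdict.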
+
+ ### Response Flow
+
+ ```
+ Hallucination Detected
+ ├── Severity Assessment
+ │   ├── Factual error, low impact → Log, monitor
+ │   ├── Medical/Legal/Financial → Immediate action
+ │   └── User reported → Investigate
+ ├── Immediate Actions
+ │   ├── Flag response for review
+ │   ├── If critical: suppress similar queries
+ │   └── If pattern: add to filter
+ └── Long-term
+     ├── Add to evaluation dataset
+     ├── Consider prompt improvements
+     └── Update knowledge base
+ ```
+
+ ---
+
+ ## 4. Toxicity Response
+
+ ### Severity Levels
+
+ | Level | Description | Response Time |
+ |-------|-------------|---------------|
+ | **L1** | Mildly inappropriate | Review next day |
+ | **L2** | Offensive to groups | Fix within hours |
+ | **L3** | Harmful instructions | Immediate block |
+ | **L4** | Illegal content | Emergency + Legal |
+
+ ### Emergency Response
+
+ ```typescript
+ async function handleToxicOutput(incident: ToxicityIncident): Promise<void> {
+   // 1. Immediate containment
+   await blockSimilarQueries(incident.queryPatterns);
+
+   // 2. Notify
+   await notifyOnCall({
+     severity: incident.level,
+     description: incident.summary, // Never include actual content
+     affectedUsers: incident.userCount,
+   });
+
+   // 3. Evidence preservation
+   await preserveForReview({
+     incidentId: incident.id,
+     // Store hashed/redacted version
+     contentHash: hash(incident.content),
+     timestamp: Date.now(),
+   });
+
+   // 4. If L3+, activate circuit breaker
+   if (incident.level >= 3) {
+     await circuitBreaker.open('chat', {
+       fallbackMessage: 'Service temporarily unavailable',
+       duration: 30 * 60 * 1000, // 30 minutes
+     });
+   }
+ }
+ ```
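The handler above opens a circuit breaker for a fixed duration but does not show the breaker itself. The following is a minimal, single-process sketch of a time-boxed breaker, written in Python to match the skill's other Python snippet. The `CircuitBreaker` class, its method names, and the injectable `clock` parameter are hypothetical; the `circuitBreaker` object in the package may behave differently.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """While open, every call returns a static fallback message."""

    def __init__(self, fallback_message: str,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fallback_message = fallback_message
        self._clock = clock  # injectable so tests can control time
        self._open_until: Optional[float] = None

    def open(self, duration_s: float) -> None:
        """Open the breaker for duration_s seconds from now."""
        self._open_until = self._clock() + duration_s

    def is_open(self) -> bool:
        if self._open_until is None:
            return False
        if self._clock() >= self._open_until:
            self._open_until = None  # duration elapsed: close again
            return False
        return True

    def call(self, handler: Callable[[str], str], query: str) -> str:
        """Run handler(query) unless the breaker is open."""
        if self.is_open():
            return self._fallback_message
        return handler(query)
```

A production breaker would likely need shared state across instances and a half-open probing phase; both are left out here to keep the sketch short.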
+
+ ---
+
+ ## 5. Rollback Strategies
+
+ ### What to Rollback
+
+ | Component | Rollback Method | Time to Effect |
+ |-----------|----------------|----------------|
+ | **Prompt** | Version control | Seconds |
+ | **Fine-tuned Model** | Model registry | Minutes |
+ | **RAG Data** | Snapshot restore | Minutes |
+ | **Base Model** | Provider switch | Varies |
+ | **Feature Flag** | Kill switch | Seconds |
+
+ ### Prompt Versioning
+
+ ```typescript
+ interface PromptVersion {
+   version: string;
+   content: string;
+   deployedAt: Date;
+   deployedBy: string;
+   rollbackTarget?: string; // Previous version to use
+ }
+
+ class PromptManager {
+   // getCurrent, getVersion, deploy and logRollback are assumed to exist elsewhere.
+   async rollback(promptId: string): Promise<void> {
+     const current = await this.getCurrent(promptId);
+     // rollbackTarget is optional, so fail loudly when there is nothing to roll back to
+     if (!current.rollbackTarget) {
+       throw new Error(`No rollback target recorded for prompt ${promptId}`);
+     }
+     const previous = await this.getVersion(promptId, current.rollbackTarget);
+
+     // Deploy previous version
+     await this.deploy(promptId, previous.content);
+
+     // Log rollback
+     await this.logRollback({
+       promptId,
+       fromVersion: current.version,
+       toVersion: previous.version,
+       reason: 'incident_rollback',
+       timestamp: new Date(),
+     });
+   }
+ }
+ ```
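To make the rollback flow concrete, here is a small in-memory sketch of the same idea in Python. `InMemoryPromptStore` and its methods are hypothetical and stand in for whatever storage the real `PromptManager` uses. Unlike the TypeScript above, which redeploys the previous content, this sketch rolls back by moving the "current" pointer, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PromptVersion:
    version: str
    content: str
    rollback_target: Optional[str] = None  # version that was live before this one


class InMemoryPromptStore:
    def __init__(self) -> None:
        self._versions: Dict[str, Dict[str, PromptVersion]] = {}
        self._current: Dict[str, str] = {}
        self.rollback_log: List[dict] = []

    def deploy(self, prompt_id: str, version: str, content: str) -> None:
        """Record a new version and remember which version it replaced."""
        previous = self._current.get(prompt_id)
        self._versions.setdefault(prompt_id, {})[version] = PromptVersion(
            version, content, previous
        )
        self._current[prompt_id] = version

    def current(self, prompt_id: str) -> PromptVersion:
        return self._versions[prompt_id][self._current[prompt_id]]

    def rollback(self, prompt_id: str) -> PromptVersion:
        """Point 'current' back at the recorded rollback target."""
        cur = self.current(prompt_id)
        if cur.rollback_target is None:
            raise ValueError(f"no rollback target for prompt {prompt_id!r}")
        self._current[prompt_id] = cur.rollback_target
        self.rollback_log.append({
            "prompt_id": prompt_id,
            "from_version": cur.version,
            "to_version": cur.rollback_target,
            "reason": "incident_rollback",
        })
        return self.current(prompt_id)
```

Recording the rollback target at deploy time is what makes the "Seconds" estimate in the table plausible: no search through history is needed during an incident.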
+
+ ### Model Fallback Chain
+
+ ```yaml
+ # config/ai-fallback.yaml
+ fallback_chain:
+   primary:
+     provider: openai
+     model: gpt-4
+     timeout_ms: 30000
+
+   secondary:
+     provider: anthropic
+     model: claude-3-sonnet
+     timeout_ms: 30000
+     trigger:
+       - primary_unavailable
+       - error_rate > 0.1
+
+   tertiary:
+     provider: internal
+     model: fallback-model
+     timeout_ms: 10000
+     trigger:
+       - secondary_unavailable
+       - degraded_mode
+
+   circuit_breaker:
+     type: static_response
+     message: "Service temporarily unavailable. Please try again later."
+     trigger:
+       - all_failed
+       - critical_incident
+ ```
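One way to read the configuration above is as an ordered list of providers tried in turn, with the static response as the last resort. The sketch below illustrates only that ordering. `call_with_fallback` and the provider callables are hypothetical, and the `timeout_ms` and `error_rate` triggers from the YAML are not modelled: any exception is treated as "unavailable".

```python
from typing import Callable, List, Tuple

# Final message copied from the circuit_breaker entry in the YAML.
STATIC_RESPONSE = "Service temporarily unavailable. Please try again later."

# Each entry pairs a tier name with a callable that takes a prompt and
# returns a completion, raising an exception on failure.
Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, chain: List[Provider]) -> Tuple[str, str]:
    """Try each tier in order; return (tier name, response)."""
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception:
            # This tier failed; fall through to the next one.
            continue
    # Every tier failed: serve the static circuit-breaker message.
    return "circuit_breaker", STATIC_RESPONSE
```

Returning the tier name alongside the response lets callers log which model actually answered, which matters when fallback quality differs from the primary.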
+
+ ---
+
+ ## 6. Post-Mortem Template
+
+ ```markdown
+ # AI Incident Post-Mortem: [Title]
+
+ **Date:** [YYYY-MM-DD]
+ **Duration:** [Start time] - [End time] ([Duration])
+ **Severity:** P[1-4]
+ **Incident Commander:** [Name]
+
+ ## Summary
+ [2-3 sentence description of what happened]
+
+ ## Impact
+ - **Users Affected:** [Number/Percentage]
+ - **Requests Impacted:** [Count]
+ - **Business Impact:** [Revenue/Reputation/Legal]
+
+ ## Timeline
+ | Time | Event |
+ |------|-------|
+ | HH:MM | [First detection] |
+ | HH:MM | [Incident declared] |
+ | HH:MM | [Mitigation started] |
+ | HH:MM | [Resolved] |
+
+ ## Root Cause
+ [Detailed explanation of why this happened]
+
+ ### Contributing Factors
+ 1. [Factor 1]
+ 2. [Factor 2]
+
+ ## Detection
+ - **How Detected:** [Alert/User report/Manual]
+ - **Time to Detect:** [Duration]
+ - **Detection Gap:** [What could have detected sooner]
+
+ ## Response
+ - **What Worked:** [Effective actions]
+ - **What Didn't:** [Ineffective or delayed actions]
+ - **Escalation:** [Was escalation appropriate?]
+
+ ## Lessons Learned
+ ### What Went Well
+ - [Positive 1]
+ - [Positive 2]
+
+ ### What Went Poorly
+ - [Issue 1]
+ - [Issue 2]
+
+ ## Action Items
+ | Action | Owner | Due Date | Status |
+ |--------|-------|----------|--------|
+ | [Action 1] | [Name] | [Date] | [ ] |
+ | [Action 2] | [Name] | [Date] | [ ] |
+
+ ## AI-Specific Analysis
+ ### Model Behavior
+ - **Expected:** [What should have happened]
+ - **Actual:** [What happened]
+ - **Gap:** [Why the difference]
+
+ ### Prompt Analysis
+ - **Prompt Version:** [v1.2.3]
+ - **Recent Changes:** [What changed]
+ - **Rollback Used:** [Yes/No]
+
+ ### Prevention
+ - [ ] Added to evaluation dataset
+ - [ ] Updated content filters
+ - [ ] Added monitoring for similar patterns
+ - [ ] Documented in knowledge base
+ ```
+
+ ---
+
+ ## 7. Degradation vs Outage
+
+ ### Classification
+
+ | State | Characteristics | User Experience |
+ |-------|----------------|-----------------|
+ | **Healthy** | Normal metrics | Full service |
+ | **Degraded** | High latency, reduced quality | Slow but works |
+ | **Partial Outage** | Some features down | Core works |
+ | **Full Outage** | Service unavailable | Error pages |
+
+ ### Degradation Handling
+
+ ```typescript
+ enum ServiceState {
+   HEALTHY = 'healthy',
+   DEGRADED = 'degraded',
+   PARTIAL_OUTAGE = 'partial_outage',
+   OUTAGE = 'outage',
+ }
+
+ function determineServiceState(metrics: AIMetrics): ServiceState {
+   const { errorRate, p99Latency, successRate } = metrics;
+
+   if (successRate < 0.5) return ServiceState.OUTAGE;
+   if (successRate < 0.9) return ServiceState.PARTIAL_OUTAGE;
+   if (errorRate > 0.05 || p99Latency > 30000) return ServiceState.DEGRADED;
+   return ServiceState.HEALTHY;
+ }
+
+ // Communicate appropriately
+ function getStatusMessage(state: ServiceState): string {
+   switch (state) {
+     case ServiceState.DEGRADED:
+       return "AI responses may be slower than usual. Thank you for your patience.";
+     case ServiceState.PARTIAL_OUTAGE:
+       return "Some AI features are currently limited. Basic functions are available.";
+     case ServiceState.OUTAGE:
+       return "AI services are temporarily unavailable. Please try again later.";
+     default:
+       return "";
+   }
+ }
+ ```
+
+ ---
+
+ ## 8. Communication Templates
+
+ ### Internal Alert
+
+ ```markdown
+ 🚨 **AI Incident Declared**
+
+ **Severity:** P[X]
+ **Type:** [Hallucination/Toxicity/Outage]
+ **Status:** Investigating
+
+ **Impact:** [Brief description]
+ **Incident Commander:** @[name]
+ **Channel:** #incident-[date]-ai
+
+ Updates every [15/30] minutes until resolved.
+ ```
+
+ ### User-Facing (if needed)
+
+ ```markdown
+ **Service Notice**
+
+ We're currently experiencing issues with our AI assistant. You may notice:
+ - Slower response times
+ - Reduced functionality
+
+ Our team is actively working on a resolution. We apologize for any inconvenience.
+
+ Last updated: [Time]
+ ```
+
+ ---
+
+ ## 9. On-Call Runbook
+
+ ### First Responder Checklist
+
+ ```markdown
+ ## When Paged for AI Incident
+
+ 1. **Acknowledge** the alert (do this first!)
+
+ 2. **Quick Assessment** (2 min max)
+    - [ ] Check status page for provider outages
+    - [ ] Check error rate dashboard
+    - [ ] Check recent deployments
+
+ 3. **Declare or Escalate**
+    - If clear cause → Fix
+    - If unclear → Declare incident
+    - If serious → Page secondary
+
+ 4. **Communicate**
+    - [ ] Update status in #on-call
+    - [ ] Create incident channel if P1/P2
+    - [ ] Set timer for next update
+
+ 5. **Mitigate**
+    - Refer to playbooks above
+    - When in doubt, use fallback/kill switch
+ ```
+
+ ### Key Links
+
+ ```yaml
+ quick_links:
+   dashboards:
+     - AI Health: https://grafana.internal/ai-health
+     - LLM Metrics: https://grafana.internal/llm
+     - Error Tracking: https://sentry.internal/ai-service
+
+   provider_status:
+     - OpenAI: https://status.openai.com
+     - Anthropic: https://status.anthropic.com
+
+   runbooks:
+     - Hallucination: /docs/runbooks/hallucination.md
+     - Toxicity: /docs/runbooks/toxicity.md
+     - Provider Outage: /docs/runbooks/provider-outage.md
+ ```
+
+ ---
+
+ ## 10. Checklist
+
+ ### Preparation
+
+ - [ ] Incident response team defined
+ - [ ] Escalation paths documented
+ - [ ] Kill switches tested
+ - [ ] Fallback models configured
+ - [ ] Post-mortem template ready
+ - [ ] Communication templates approved
+
+ ### During Incident
+
+ - [ ] Incident acknowledged
+ - [ ] Severity determined
+ - [ ] Incident commander assigned
+ - [ ] Communication channel created
+ - [ ] Timeline being tracked
+ - [ ] Updates being shared
+
+ ### Post-Incident
+
+ - [ ] Post-mortem scheduled
+ - [ ] Action items tracked
+ - [ ] Lessons shared with team
+ - [ ] Monitoring improved
+ - [ ] Playbook updated
+
+ ---
+
+ > **Remember:** The goal isn't to prevent all incidents; it's to detect fast, respond faster, and never repeat.