npm - dojo.md - Versions diffs - 0.1.0 → 0.2.0 - Mend

dojo.md 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (243)

package/courses/rest-api-error-handling/scenarios/level-4/error-training-program.yaml ADDED

@@ -0,0 +1,61 @@
+meta:
+  id: error-training-program
+  level: 4
+  course: rest-api-error-handling
+  type: output
+  description: "Design API error handling training — create a curriculum for building error handling expertise across an organization"
+  tags: [REST, API, training, curriculum, certification, onboarding, expert]
+state: {}
+trigger: |
+  You're designing a company-wide API error handling training program
+  for your 800-engineer organization. Currently, error handling
+  knowledge is concentrated in 5 senior engineers who wrote the
+  original API framework.
+  Current problems the training must address:
+  - New hires take 6 weeks before they can debug production API errors
+  - 60% of API support tickets are "what does this error mean?"
+  - Code reviews show the same error handling mistakes repeatedly:
+    catching and swallowing errors, returning 200 with error bodies,
+    exposing stack traces, inconsistent error formats
+  - 3 P1 incidents in the last year were caused by poor error handling
+  - Teams copy-paste error handling code without understanding it
+  - No one outside the core 5 engineers understands distributed
+    error propagation
+  Target audience tiers:
+  1. All engineers: Read and understand API error responses, basic
+     debugging using correlation IDs and error logs
+  2. Backend developers: Write correct error handling code, implement
+     validation, use proper status codes
+  3. Platform engineers: Design error handling frameworks, build
+     monitoring, implement circuit breakers
+  4. On-call engineers: Diagnose and resolve production error
+     scenarios under time pressure
+  Constraints:
+  - Maximum 3 hours of training per engineer per quarter
+  - Must be available asynchronously (remote-first)
+  - Must show measurable impact within 6 months
+  - Must reduce error-related support tickets by 50%
+  Task: Design the complete training program. Write: the curriculum
+  for each tier, the hands-on exercises (sandbox API with intentional
+  errors), the certification system, the measurement framework, and
+  the rollout plan.
+assertions:
+  - type: llm_judge
+    criteria: "Curriculum is tier-appropriate — all engineers learn to read errors and use correlation IDs, backend devs learn to write correct error handling, platform engineers learn framework design and monitoring, and on-call engineers practice incident scenarios. Each tier builds on the previous"
+    weight: 0.35
+    description: "Tier-appropriate curriculum"
+  - type: llm_judge
+    criteria: "Hands-on exercises are practical — uses a sandbox API with intentional errors (status code mistakes, information leakage, missing validation, cascade failures), exercises are progressive in difficulty, and on-call training includes simulated incidents with time pressure"
+    weight: 0.35
+    description: "Practical hands-on exercises"
+  - type: llm_judge
+    criteria: "Measurement framework proves impact — tracks leading indicators (training completion, exercise pass rates, code review error catch rates) and lagging indicators (support ticket volume, incident count, time-to-resolve), with clear before/after comparison methodology"
+    weight: 0.30
+    description: "Impact-proving measurements"
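
The sandbox exercises in this scenario could be auto-graded against the recurring code-review mistakes it lists. A minimal Python sketch, assuming the sandbox captures each response as a status, headers, and decoded JSON body; the `x-correlation-id` header name, the finding labels, and the stack-trace patterns are illustrative assumptions, not part of the course:

```python
import re
from dataclasses import dataclass, field

# Hypothetical header name; compared case-insensitively below.
CORRELATION_HEADER = "x-correlation-id"

# Crude signatures of leaked stack traces (illustrative, not exhaustive).
STACK_TRACE_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),   # Python
    re.compile(r"\bat [\w.$]+\(\w+\.java:\d+\)"),         # Java
    re.compile(r"\bat .+\(.+\.js:\d+:\d+\)"),             # Node.js
]

@dataclass
class SandboxResponse:
    status: int
    headers: dict = field(default_factory=dict)
    body: dict = field(default_factory=dict)

def _strings(value):
    """Yield every string nested anywhere in a decoded JSON body."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)

def grade(resp: SandboxResponse) -> list[str]:
    """Return the error-handling mistakes found in one sandbox response."""
    findings = []
    # Mistake: success status wrapping an error payload (REST sandbox only).
    has_error_body = "error" in resp.body or "errors" in resp.body
    if 200 <= resp.status < 300 and has_error_body:
        findings.append("error-body-with-2xx")
    if resp.status >= 400:
        # Mistake: no correlation ID, so the error cannot be traced in logs.
        header_names = {name.lower() for name in resp.headers}
        if CORRELATION_HEADER not in header_names:
            findings.append("missing-correlation-id")
        # Mistake: stack trace exposed to the API consumer.
        if any(p.search(s) for s in _strings(resp.body) for p in STACK_TRACE_PATTERNS):
            findings.append("stack-trace-leak")
    return findings
```

Grading on structural rules rather than exact response bodies keeps the checks deterministic while exercises grow harder. The 2xx rule is REST-specific: GraphQL legitimately returns `errors` with HTTP 200.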

package/courses/rest-api-error-handling/scenarios/level-4/expert-error-shift.yaml ADDED

@@ -0,0 +1,63 @@
+meta:
+  id: expert-error-shift
+  level: 4
+  course: rest-api-error-handling
+  type: output
+  description: "Expert error handling shift — manage a multi-dimensional API reliability crisis with business, technical, and organizational dimensions"
+  tags: [REST, API, error-handling, shift-simulation, crisis, expert]
+state: {}
+trigger: |
+  You're the VP of Engineering. It's Monday morning and three
+  crises landed simultaneously over the weekend.
+  Crisis 1 — Major customer data exposure via error responses:
+  A security researcher published a blog post showing that your API's
+  error responses leak sensitive data. Examples:
+  - GET /users/123 returns 403 with the user's email in the error:
+    "User alice@company.com does not have access to this resource"
+  - Failed login returns different errors for "user not found" vs
+    "wrong password" (account enumeration)
+  - 500 errors expose database schema in stack traces
+  The blog post has 50K views. Your security team estimates 6 months
+  of error logs containing exposed PII. GDPR implications.
+  Crisis 2 — Enterprise customer threatening $10M contract termination:
+  Your largest customer's integration broke Friday night because you
+  deployed a change to error response format (from custom JSON to
+  RFC 7807) without warning. Their error parsing broke, causing their
+  system to treat all your API responses as errors. They lost $500K
+  in weekend transactions and want $1M in damages.
+  Crisis 3 — Internal team revolt:
+  The engineering team is frustrated. An email to all-engineering
+  went viral internally: "We've spent 6 months building a new error
+  handling framework and leadership keeps changing requirements.
+  First it was JSON, then RFC 7807, then custom extensions. The
+  migration is 20% done and now we're told to add GDPR compliance
+  fields. Nobody knows what the actual standard is. 8 teams have
+  paused migration entirely."
+  You have a board meeting Wednesday where you must address all three.
+  Task: Navigate all 3 crises. Write: the security response plan
+  (data exposure containment and GDPR notification), the customer
+  recovery plan (contract preservation and compensation), the
+  internal alignment plan (single standard, clear migration path),
+  and the board briefing that unifies all three into a coherent
+  strategy.
+assertions:
+  - type: llm_judge
+    criteria: "Security response is thorough — assesses the actual data exposure (6 months of logs), plans GDPR Article 33 notification (72 hours), fixes the three specific leaks (PII in errors, account enumeration, stack traces), and addresses the researcher's report publicly"
+    weight: 0.35
+    description: "Thorough security response"
+  - type: llm_judge
+    criteria: "Customer recovery preserves the relationship — acknowledges the breaking change without defensiveness, offers meaningful compensation, implements a change notification process to prevent recurrence, and provides a migration support package. Addresses the contract termination threat directly"
+    weight: 0.35
+    description: "Relationship-preserving customer recovery"
+  - type: llm_judge
+    criteria: "Internal alignment is decisive — makes a clear decision on the error standard (RFC 7807 with specific extensions), provides a single migration document, addresses team frustration empathetically, and the board briefing connects all three crises as symptoms of one root cause (lack of error handling governance)"
+    weight: 0.30
+    description: "Decisive internal alignment"
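
The three leaks in Crisis 1 map to well-known fixes: a generic 403, one indistinguishable login failure, and 500 responses that carry only an opaque correlation ID. A minimal Python sketch under stated assumptions: the in-memory user store, the PBKDF2 parameters, and the `correlation_id` extension member are hypothetical, and a real service would wire these into its web framework's error handlers:

```python
import hashlib
import hmac
import logging
import os
import uuid

log = logging.getLogger("api.errors")

def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2 stands in for whatever password hash the real system uses.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

_SALT = os.urandom(16)
USERS = {"alice@company.com": (_SALT, _hash("correct horse", _SALT))}  # hypothetical store
_DUMMY = (os.urandom(16), bytes(32))  # compared against when the account does not exist

def problem(status: int, title: str, detail: str) -> dict:
    """Minimal RFC 7807 problem details body."""
    return {"type": "about:blank", "title": title, "status": status, "detail": detail}

def login(email: str, password: str) -> tuple[int, dict]:
    salt, stored = USERS.get(email, _DUMMY)
    # Hash even for unknown accounts so both failure paths take similar time.
    match = hmac.compare_digest(_hash(password, salt), stored)
    if email in USERS and match:
        return 200, {"status": "authenticated"}
    # One message for "no such user" and "wrong password": no account enumeration.
    return 401, problem(401, "Unauthorized", "Invalid email or password.")

def forbidden() -> tuple[int, dict]:
    # Never echo the caller's identity (email, user ID) in the error.
    return 403, problem(403, "Forbidden", "You do not have access to this resource.")

def internal_error(exc: Exception) -> tuple[int, dict]:
    correlation_id = str(uuid.uuid4())
    # The stack trace goes to server-side logs only, keyed by the correlation ID.
    log.error("unhandled error correlation_id=%s", correlation_id, exc_info=exc)
    body = problem(500, "Internal Server Error", "An unexpected error occurred.")
    body["correlation_id"] = correlation_id  # RFC 7807 extension member
    return 500, body
```

Returning the same 401 body is only half of enumeration resistance; hashing against a dummy credential for unknown accounts also narrows the timing difference between the two failure paths.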

package/courses/rest-api-error-handling/scenarios/level-5/comprehensive-error-system.yaml ADDED

@@ -0,0 +1,68 @@
+meta:
+  id: comprehensive-error-system
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Design a comprehensive API error handling system — architect a 5-layer error platform for a Fortune 500 company"
+  tags: [REST, API, comprehensive, enterprise, architecture, Fortune-500, master]
+state: {}
+trigger: |
+  You're the CTO of a Fortune 500 company ($15B revenue, 5,000
+  engineers) designing a unified API error handling system. The
+  company has 6 business units acquired over 12 years, each with
+  different API stacks, error handling approaches, and cultures.
+  Current state:
+  - BU1 Fintech (1,500 eng): REST APIs, RFC 7807, sophisticated
+    error handling, PCI DSS/SOC 2 compliant, 50,000 API consumers
+  - BU2 Healthcare (1,000 eng): REST + HL7 FHIR, HIPAA compliant,
+    custom error format, 5,000 API consumers
+  - BU3 E-commerce (800 eng): REST + GraphQL, basic error handling,
+    no compliance framework, 20,000 API consumers
+  - BU4 IoT (400 eng): MQTT + REST, custom binary error codes,
+    no standard format, 2,000 device integrations
+  - BU5 Government (200 eng): REST, FedRAMP compliant, NIST 800-53,
+    verbose audit trails, 500 government consumers
+  - BU6 AI Platform (100 eng): gRPC + REST, no error standard,
+    streaming errors for long-running ML tasks, 1,000 consumers
+  Design the error handling system in 5 layers:
+  Layer 1 — Error Standards: Unified error format that accommodates
+  REST, GraphQL, gRPC, and MQTT. Error code registry shared across
+  all 6 BUs.
+  Layer 2 — Error Infrastructure: Logging, tracing, metrics pipeline
+  that ingests errors from all protocols and stacks.
+  Layer 3 — Error Intelligence: Analytics, anomaly detection,
+  prediction, and AI-powered root cause analysis across all BUs.
+  Layer 4 — Error Compliance: Audit trails, PII handling, data
+  residency, and reporting that satisfies all 5 compliance frameworks
+  simultaneously.
+  Layer 5 — Error Experience: Consumer-facing error documentation,
+  SDKs, self-service debugging tools, and developer portal.
+  Constraints:
+  - $25M budget over 3 years
+  - Cannot break any BU's existing API consumers
+  - Must handle protocol diversity (REST, GraphQL, gRPC, MQTT, HL7)
+  - Must satisfy 5 compliance frameworks simultaneously
+  - Must handle 2B+ error events per day across all BUs
+  Task: Design all 5 layers with migration strategy, 3-year roadmap,
+  and board presentation for approval.
+assertions:
+  - type: llm_judge
+    criteria: "5-layer design accommodates protocol diversity — the unified error format works across REST (JSON), GraphQL (extensions), gRPC (status codes + details), MQTT (custom payloads), and HL7 FHIR (OperationOutcome). Each BU's specific needs are addressed without forcing one-size-fits-all"
+    weight: 0.35
+    description: "Protocol-accommodating 5-layer design"
+  - type: llm_judge
+    criteria: "Migration plan is phased and non-disruptive — phases BUs by readiness and complexity, provides adapter layers so existing consumers aren't broken, and the 3-year roadmap has quarterly milestones with clear deliverables per BU"
+    weight: 0.35
+    description: "Non-disruptive migration plan"
+  - type: llm_judge
+    criteria: "Budget allocation is justified — $25M distributed across layers, BUs, and years with build-vs-buy decisions, an ROI projection showing when the investment pays back, and an explanation of how compliance unification spares the company from maintaining 5 separate compliance programs"
+    weight: 0.30
+    description: "Justified budget allocation"
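
One way to read Layer 1 is a single registry entry rendered per protocol at the edge. A sketch assuming a hypothetical registry code scheme (`PAY-0042`), category list, and `errors.example.com` type URIs; the gRPC numbers are the standard canonical status codes, the FHIR values are OperationOutcome issue-type codes, and the MQTT payload layout is an assumption (BU4's binary codes would need their own encoder):

```python
import json
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    # value = (HTTP status, gRPC status code, FHIR OperationOutcome issue type)
    INVALID_INPUT = (400, 3, "invalid")      # gRPC INVALID_ARGUMENT
    UNAUTHENTICATED = (401, 16, "login")     # gRPC UNAUTHENTICATED
    FORBIDDEN = (403, 7, "forbidden")        # gRPC PERMISSION_DENIED
    NOT_FOUND = (404, 5, "not-found")        # gRPC NOT_FOUND
    CONFLICT = (409, 10, "conflict")         # gRPC ABORTED
    RATE_LIMITED = (429, 8, "throttled")     # gRPC RESOURCE_EXHAUSTED
    INTERNAL = (500, 13, "exception")        # gRPC INTERNAL
    UNAVAILABLE = (503, 14, "transient")     # gRPC UNAVAILABLE

@dataclass(frozen=True)
class CanonicalError:
    code: str            # shared registry code (hypothetical naming scheme)
    category: Category
    detail: str

    def to_problem_json(self) -> dict:       # REST, RFC 7807
        http_status, _, _ = self.category.value
        return {
            "type": f"https://errors.example.com/{self.code}",
            "title": self.category.name.replace("_", " ").title(),
            "status": http_status,
            "detail": self.detail,
            "code": self.code,
        }

    def to_graphql(self) -> dict:            # GraphQL errors[] entry
        return {"message": self.detail,
                "extensions": {"code": self.code, "category": self.category.name}}

    def to_grpc(self) -> tuple[int, str]:    # gRPC status code + message
        _, grpc_code, _ = self.category.value
        return grpc_code, f"{self.code}: {self.detail}"

    def to_fhir(self) -> dict:               # HL7 FHIR OperationOutcome
        _, _, issue_type = self.category.value
        return {"resourceType": "OperationOutcome",
                "issue": [{"severity": "error", "code": issue_type,
                           "diagnostics": f"{self.code}: {self.detail}"}]}

    def to_mqtt(self) -> bytes:              # compact JSON MQTT payload (assumed layout)
        return json.dumps({"code": self.code, "detail": self.detail},
                          separators=(",", ":")).encode()
```

Because each BU keeps its wire format and only the registry is shared, existing consumers see no change; protocol adapters can be added without touching the registry itself.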

package/courses/rest-api-error-handling/scenarios/level-5/error-ai-future.yaml ADDED

@@ -0,0 +1,75 @@
+meta:
+  id: error-ai-future
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Design AI-powered API error handling — build systems that predict, prevent, and auto-remediate API errors using machine learning"
+  tags: [REST, API, AI, machine-learning, prediction, auto-remediation, master]
+state: {}
+trigger: |
+  You're the Head of AI/ML at an API platform company. The CEO
+  wants to build "self-healing APIs" — APIs that predict errors
+  before they happen, automatically remediate common issues, and
+  learn from every incident.
+  Current capabilities:
+  - 2B API requests/day across 500 endpoints
+  - 2 years of historical error data (logs, traces, metrics)
+  - Error pattern library: 500 known error patterns
+  - Current MTTR: 45 minutes (target: <5 minutes)
+  - ML infrastructure: MLflow, Kubernetes, GPU cluster
+  Desired AI capabilities:
+  1. Error prediction:
+  - Predict error spikes 30 minutes before they happen
+  - Input: latency trends, error rate derivatives, deployment events,
+    infrastructure metrics, traffic patterns
+  - Target: "Endpoint X will exceed 1% error rate in 30 minutes"
+  2. Automatic root cause analysis:
+  - Given an error spike, identify the root cause in <2 minutes
+  - Currently: engineers manually trace through 20 services
+  - Target: "Error caused by memory leak in Service Y, introduced
+    in commit abc123 deployed at 2:30 PM"
+  3. Auto-remediation:
+  - For known error patterns, take automatic corrective action
+  - Examples: restart pods, scale up, roll back deployments, enable
+    circuit breakers, increase timeout thresholds
+  - Safety: human approval for risky actions
+  4. Error message generation:
+  - Generate contextual, helpful error messages based on the request
+    context, user history, and error type
+  - Current: "Internal server error"
+  - Target: "Your payment couldn't be processed because your card
+    issuer declined the transaction. This sometimes happens with
+    international purchases. Try a different card or contact your
+    bank."
+  5. API consumer error assistance:
+  - Chatbot that helps API consumers debug their integration errors
+  - Uses historical error patterns and documentation
+  Task: Design the AI error handling system. For each of the 5
+  capabilities, write: the ML model design (features, architecture,
+  training data), the safety framework (when to act automatically
+  vs require human approval), the feedback loop (how the system
+  learns from outcomes), and the ethical considerations.
+assertions:
+  - type: llm_judge
+    criteria: "ML designs are technically sound — error prediction uses time-series features with appropriate models (LSTM, transformer, or gradient boosting), root cause analysis uses causal inference or graph neural networks on service dependency data, and training data requirements are realistic given 2 years of historical data"
+    weight: 0.35
+    description: "Technically sound ML designs"
+  - type: llm_judge
+    criteria: "Safety framework prevents AI-caused incidents — defines confidence thresholds for automatic action vs human approval, has rollback mechanisms for auto-remediation, prevents feedback loops (auto-scaling triggering more errors), and addresses the risk of AI taking wrong corrective actions"
+    weight: 0.35
+    description: "Incident-preventing safety framework"
+  - type: llm_judge
+    criteria: "Ethical considerations are thoughtful — addresses bias in error message generation (accessibility, language), transparency (users should know when AI generated the message), accountability (who's responsible when auto-remediation fails), and data privacy (ML models trained on potentially sensitive error data)"
+    weight: 0.30
+    description: "Thoughtful ethical considerations"
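
The safety-framework question (act automatically or ask a human) can be expressed as a small, auditable policy in front of the remediation engine. A sketch under assumed values: the risk classification of the scenario's example actions, the 0.95 and 0.50 confidence thresholds, and the hourly action cap are all hypothetical and would need calibration against the 2 years of incident history:

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    HIGH = "high"

# Hypothetical classification of the scenario's example actions.
ACTION_RISK = {
    "scale_up": Risk.LOW,
    "restart_pods": Risk.LOW,
    "enable_circuit_breaker": Risk.LOW,
    "increase_timeout": Risk.HIGH,
    "rollback_deployment": Risk.HIGH,
}

AUTO_THRESHOLD = 0.95          # hypothetical; calibrate on past incidents
SUGGEST_THRESHOLD = 0.50       # below this, only notify humans
MAX_AUTO_ACTIONS_PER_HOUR = 3  # cap that breaks remediation feedback loops

def decide(action: str, confidence: float, auto_actions_last_hour: int) -> str:
    """Return "auto_execute", "require_approval", or "notify_only"."""
    if confidence < SUGGEST_THRESHOLD:
        return "notify_only"                   # too uncertain to propose the action
    risk = ACTION_RISK.get(action, Risk.HIGH)  # unknown actions count as risky
    if risk is Risk.HIGH:
        return "require_approval"
    if auto_actions_last_hour >= MAX_AUTO_ACTIONS_PER_HOUR:
        return "require_approval"              # repeated fixes suggest a loop; escalate
    if confidence >= AUTO_THRESHOLD:
        return "auto_execute"
    return "require_approval"
```

The hourly cap is the guard against remediation feedback loops: if the system keeps applying fixes, it stops and escalates instead of, for example, scaling into a worse state.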

package/courses/rest-api-error-handling/scenarios/level-5/error-behavioral-science.yaml ADDED

@@ -0,0 +1,73 @@
+meta:
+  id: error-behavioral-science
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Apply behavioral science to API errors — use psychology research to design error messages and systems that drive correct developer behavior"
+  tags: [REST, API, behavioral-science, psychology, developer-experience, master]
+state: {}
+trigger: |
+  You're a researcher studying how API error messages affect developer
+  behavior. Your company's API has 10,000 active developers, and
+  you've been given budget to run A/B tests on error messages and
+  error handling UX.
+  Your research questions:
+  1. Do better error messages reduce support tickets?
+  2. Do error messages affect integration quality?
+  3. Can error UX prevent common mistakes?
+  4. Do developers trust APIs more when errors are well-handled?
+  Behavioral observations from your developer research:
+  Observation 1 — Anchoring effect:
+  When developers see their first error from your API (during
+  onboarding), it anchors their perception of your API quality.
+  Developers who hit a well-formatted 400 with clear guidance had
+  3x higher completion rates than those who hit a cryptic 500.
+  Observation 2 — Learned helplessness:
+  After 3+ unhelpful error messages, developers stop reading them
+  and go straight to support. 40% of support tickets say "I got an
+  error" without including the error message.
+  Observation 3 — Copy-paste behavior:
+  70% of developers copy-paste error messages into Google. Error
+  messages that are unique strings are more findable than generic
+  ones.
+  Observation 4 — Completion bias:
+  Developers who are 80% done integrating will push through bad
+  error handling. Developers who hit errors in the first 20% are
+  5x more likely to abandon the integration entirely.
+  Observation 5 — Social proof:
+  Error pages that say "500 developers hit this same error this
+  month — here's how they fixed it" generate 60% fewer support tickets.
+  Observation 6 — Loss aversion:
+  Developers respond more to "This error will cause payment failures
+  in production" than "Fixing this will improve reliability."
+  Task: Design a behavioral science-informed error handling system.
+  Write: the A/B test designs for each observation (hypothesis,
+  control, treatment, metrics), the error message framework based on
+  the research findings, the developer journey map showing where
+  errors have the most impact, and the recommendations for the API
+  team.
+assertions:
+  - type: llm_judge
+    criteria: "A/B test designs are methodologically sound — each has a clear hypothesis, control and treatment groups, primary and secondary metrics, sample size considerations, and expected effect size. The tests are ethical (no degradation of error handling for the control group)"
+    weight: 0.35
+    description: "Sound A/B test methodology"
+  - type: llm_judge
+    criteria: "Error message framework applies behavioral insights — addresses anchoring (great first-error experience), learned helplessness (always helpful messages), searchability (unique error strings), completion bias (extra help during onboarding), social proof (community solutions), and loss aversion (production impact warnings)"
+    weight: 0.35
+    description: "Behavioral insight application"
+  - type: llm_judge
+    criteria: "Developer journey map is insightful — identifies the critical moments where errors have outsized impact (onboarding, first production deployment, scaling), and the recommendations are actionable for the API team with expected impact on support tickets and developer retention"
+    weight: 0.30
+    description: "Insightful developer journey map"
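
For the sample-size part of the first criterion, the usual normal-approximation formula for comparing two proportions gives a first estimate of how many developers each arm needs, and hashing the developer ID keeps assignment stable across sessions. A sketch; the 20% vs 15% abandonment rates used below are hypothetical, not from the research:

```python
import hashlib
import math
from statistics import NormalDist

def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Developers needed per arm to detect p_control vs p_treatment (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_control * (1 - p_control)
                                   + p_treatment * (1 - p_treatment)))
    return math.ceil(spread ** 2 / (p_control - p_treatment) ** 2)

def assign_arm(developer_id: str, experiment: str) -> str:
    """Stable 50/50 split: a developer always sees the same variant."""
    digest = hashlib.sha256(f"{experiment}:{developer_id}".encode()).digest()
    return "treatment" if digest[0] % 2 else "control"

n = sample_size_per_arm(0.20, 0.15)  # hypothetical early-abandonment rates
```

With those hypothetical rates (alpha = 0.05, 80% power) the formula asks for roughly 900 developers per arm, which fits inside the 10,000-developer population; smaller expected effects grow the requirement quickly. Both arms should still receive a helpful error, with the treatment adding only the element under test (for example, the social-proof line).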

package/courses/rest-api-error-handling/scenarios/level-5/error-board-strategy.yaml ADDED

@@ -0,0 +1,60 @@
+meta:
+  id: error-board-strategy
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Present API reliability strategy to the board — build a multi-year investment case for API error handling excellence"
+  tags: [REST, API, board, strategy, investment, reliability, master]
+state: {}
+trigger: |
+  You're the CTO presenting to the board of directors. Your company
+  (a $2B revenue API-first platform) needs board approval for a
+  $15M, 3-year API reliability investment. Two board members are
+  skeptical: "Our APIs work fine. Why spend $15M on error handling?"
+  Your business case data:
+  - Current annual cost of API errors: $25M (direct + indirect)
+  - Customer churn due to reliability: $8M/year
+  - Competitive disadvantage: 3 enterprise deals lost to competitors
+    with better API reliability (total contract value: $12M)
+  - Regulatory risk: $5M potential fines (PCI DSS, GDPR findings)
+  - Engineering productivity lost to error debugging: $4M/year
+  Proposed investment breakdown ($15M over 3 years):
+  - Year 1 ($7M): Error handling platform, tracing, monitoring
+  - Year 2 ($5M): AI-powered error prediction, SLO automation
+  - Year 3 ($3M): Self-healing APIs, proactive error prevention
+  Projected returns:
+  - Year 1: $8M savings (quick wins + customer retention)
+  - Year 2: $15M savings + $5M new revenue (reliability as feature)
+  - Year 3: $20M savings + $10M new revenue (market differentiation)
+  Board member profiles:
+  - Chair (former CEO): cares about competitive positioning
+  - Finance chair: skeptical, wants hard ROI numbers
+  - Technology committee: wants architectural confidence
+  - Risk committee: worried about regulatory exposure
+  - New member (VC partner): wants growth story, not maintenance
+  Task: Write the board presentation and supporting materials.
+  Include: the 3-year strategy narrative, the financial model (ROI
+  calculation, NPV, payback period), the risk mitigation argument,
+  the competitive differentiation story, and the prepared responses
+  for each skeptical board member.
+assertions:
+  - type: llm_judge
+    criteria: "Financial model is rigorous — ROI calculation shows $15M investment returning $58M+ over 3 years, NPV is positive at reasonable discount rate, payback period is under 18 months. Numbers are broken down by category and year, not just totals. The finance chair would find this credible"
+    weight: 0.35
+    description: "Rigorous financial model"
+  - type: llm_judge
+    criteria: "Strategy narrative is compelling — connects error handling to business outcomes (customer retention, competitive wins, regulatory compliance, developer productivity), positions the investment as growth enabler (not just cost reduction), and addresses each board member's specific concern"
+    weight: 0.35
+    description: "Compelling strategy narrative"
+  - type: llm_judge
+    criteria: "Skeptic responses are prepared — has specific answers for 'our APIs work fine' (data shows $25M in hidden costs), 'why $15M?' (each component justified with alternatives), and the VC's growth concern (reliability enables enterprise tier pricing). Responses are data-backed, not defensive"
+    weight: 0.30
+    description: "Prepared skeptic responses"
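
The headline numbers in the financial-model criterion can be reproduced directly from the proposal's figures. A sketch assuming year-end cash flows, a hypothetical 10% discount rate, and new revenue counted at face value (a finance chair would likely ask for it at margin):

```python
investment = [7.0, 5.0, 3.0]     # $M per year (Year 1..3), from the proposal
savings = [8.0, 15.0, 20.0]
new_revenue = [0.0, 5.0, 10.0]

returns = [s + r for s, r in zip(savings, new_revenue)]    # [8, 20, 30]
net = [r - i for r, i in zip(returns, investment)]         # [1, 15, 27]
roi = (sum(returns) - sum(investment)) / sum(investment)    # (58 - 15) / 15

def npv(rate: float, cash_flows: list[float]) -> float:
    """Discount year-end cash flows; Year 1 is discounted once."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

def payback_year(cash_flows: list[float]):
    """First year in which cumulative net cash flow is non-negative, or None."""
    cumulative = 0.0
    for year, cf in enumerate(cash_flows, start=1):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None

print(f"Total return ${sum(returns):.0f}M, ROI {roi:.0%}, "
      f"NPV@10% ${npv(0.10, net):.1f}M, payback in year {payback_year(net)}")
# prints: Total return $58M, ROI 287%, NPV@10% $33.6M, payback in year 1
```

Annual granularity only shows cumulative net cash turning positive by the end of Year 1; stating an exact payback month would need monthly cash flows, though under these assumptions it falls inside the 18-month bar.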

package/courses/rest-api-error-handling/scenarios/level-5/error-consulting-engagement.yaml ADDED

@@ -0,0 +1,58 @@
+meta:
+  id: error-consulting-engagement
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Lead an API error handling consulting engagement — transform a client's error handling from ad-hoc to world-class in 12 weeks"
+  tags: [REST, API, consulting, transformation, engagement, master]
+state: {}
+trigger: |
+  You're a principal consultant at an API reliability firm. A Series
+  C fintech company ($200M valuation, 400 engineers) hired you for
+  a 12-week engagement after their API errors caused a $5M loss and
+  made the news.
+  Discovery findings (Week 1):
+  - 80 microservices, no consistent error handling
+  - 12 different error response formats
+  - No distributed tracing (debugging takes hours)
+  - Error logs contain PII (GDPR violation risk)
+  - No error budgets or SLOs defined
+  - On-call rotation causes 30% engineer burnout
+  - API consumers receive error messages like "null", "undefined",
+    "[object Object]", and full Java stack traces
+  - Rate limiting exists but returns HTML error pages (not JSON)
+  - Webhook delivery has no retry logic (events lost daily)
+  - 3 compliance frameworks apply (PCI DSS, SOC 2, GDPR) but no
+    compliance-aware error handling
+  Client expectations:
+  - "Fix the critical issues in the first 4 weeks"
+  - "Leave us with a system we can maintain ourselves"
+  - "Train our team so we don't need consultants again"
+  - "Show our board measurable improvement"
+  Budget: $500K for the engagement
+  Your team: You + 2 senior consultants + 1 junior consultant
+  Task: Design the 12-week engagement plan. Write: the weekly
+  deliverables, the quick wins (first 4 weeks), the systematic
+  improvements (weeks 5-10), the handoff plan (weeks 11-12), the
+  client team training approach, and the success metrics report
+  for the board.
+assertions:
+  - type: llm_judge
+    criteria: "12-week plan is structured and realistic — first 4 weeks fix critical issues (PII in logs, stack traces in responses, webhook retry), weeks 5-10 build systematic improvements (error standard, tracing, SLOs, compliance), weeks 11-12 hand off with documentation and training. Each week has specific deliverables and the 4-person team capacity is realistic"
+    weight: 0.35
+    description: "Structured realistic engagement plan"
+  - type: llm_judge
+    criteria: "Quick wins deliver visible impact — addresses the highest-risk items first (PII exposure for GDPR, stack trace leakage for security, webhook reliability for data integrity), and shows measurable improvement within 4 weeks that the client can present to their board"
+    weight: 0.35
+    description: "Impactful quick wins"
+  - type: llm_judge
+    criteria: "Handoff ensures sustainability — training program for the client team, documentation of standards and architecture decisions, monitoring dashboards the client can operate, and a 30/60/90 day post-engagement checkup plan"
+    weight: 0.30
+    description: "Sustainable handoff"
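
Webhook retry logic is one of the quick wins the first assertion names. A minimal Python sketch of capped exponential backoff with full jitter; the 8-attempt limit, the 1-hour cap, and the choice to give up on most 4xx responses are assumptions, and `send` stands in for whatever HTTP call the client's platform makes:

```python
import random
import time
from typing import Callable, Optional

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 3600.0,
                  jitter: bool = True) -> float:
    """Delay before retry number `attempt` (1-based): capped exponential, optional full jitter."""
    delay = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(0, delay) if jitter else delay

def deliver(send: Callable[[dict], int], event: dict, max_attempts: int = 8,
            sleep: Callable[[float], None] = time.sleep, jitter: bool = True) -> bool:
    """Try to deliver one webhook event; return True once the receiver answers 2xx."""
    for attempt in range(1, max_attempts + 1):
        status: Optional[int]
        try:
            status = send(event)
        except OSError:                # connection refused, timeouts, DNS failures
            status = None
        if status is not None and 200 <= status < 300:
            return True
        if status is not None and 400 <= status < 500 and status not in (408, 429):
            return False               # receiver rejected the payload; retrying won't help
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, jitter=jitter))
    return False                       # caller should park the event in a dead-letter queue
```

Retries make delivery at-least-once, so receivers should deduplicate on an event ID; events that exhaust their attempts belong in a dead-letter queue rather than being dropped.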

package/courses/rest-api-error-handling/scenarios/level-5/error-industry-benchmarks.yaml ADDED

@@ -0,0 +1,72 @@
+meta:
+  id: error-industry-benchmarks
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Analyze API error handling industry benchmarks — compare error handling maturity across industries and define best-in-class standards"
+  tags: [REST, API, benchmarks, industry-analysis, maturity-model, master]
+state: {}
+trigger: |
+  You're writing the definitive "State of API Error Handling 2026"
+  report. You've surveyed 500 companies and analyzed 200 public APIs.
+  Your data reveals significant gaps between industry leaders and
+  the median.
+  Survey data across industries:
+  Financial Services (75 companies):
+  - Error format adoption: 80% RFC 7807, 15% custom, 5% no standard
+  - Avg error response time: 45ms (leaders: 15ms)
+  - Error rate (5xx): 0.02% (leaders: 0.005%)
+  - Distributed tracing adoption: 90%
+  - Error budget adoption: 65%
+  - Compliance-aware error handling: 95%
+  Healthcare (50 companies):
+  - Error format adoption: 30% RFC 7807, 40% HL7/FHIR errors, 30% custom
+  - Avg error response time: 200ms (leaders: 50ms)
+  - Error rate (5xx): 0.15% (leaders: 0.03%)
+  - Distributed tracing: 40%
+  - PHI in error responses: 25% (violation)
+  - Audit trail completeness: 60%
+  E-commerce (100 companies):
+  - Error format adoption: 45% RFC 7807, 35% custom, 20% no standard
+  - Avg error response time: 80ms (leaders: 20ms)
+  - Error rate (5xx): 0.08% (leaders: 0.01%)
+  - Retry and idempotency: 55%
+  - Circuit breaker adoption: 40%
+  - Error cost tracking: 15%
+  SaaS/B2B (150 companies):
+  - Error format adoption: 60% RFC 7807, 30% custom, 10% no standard
+  - API error documentation quality: 4.2/10 average
+  - Error message actionability: 35% have actionable messages
+  - Webhook error handling: 50% have retry logic
+  - Error SLA in contracts: 25%
+  Public Sector (25 companies):
+  - Error format adoption: 15% RFC 7807, 20% custom, 65% no standard
+  - FedRAMP compliance: 40%
+  - Error audit trails: 30%
+  Task: Write the industry benchmark report. Include: the maturity
+  model (5 levels with criteria), the industry comparison analysis,
+  the gap analysis (where each industry needs to improve most), the
+  recommendations per industry, and the predictions for 2027-2028.
+assertions:
+  - type: llm_judge
+    criteria: "Maturity model is well-defined — 5 levels from ad-hoc to world-class with specific criteria at each level (error format, tracing, SLOs, compliance, cost tracking). Each level has clear indicators and the model can be used for self-assessment"
+    weight: 0.35
+    description: "Well-defined maturity model"
+  - type: llm_judge
+    criteria: "Industry analysis is insightful — identifies why financial services leads (regulatory pressure, direct revenue impact), why healthcare lags (complex standards, legacy systems), and provides specific benchmarks each industry should target. The data is analyzed, not just presented"
+    weight: 0.35
+    description: "Insightful industry analysis"
+  - type: llm_judge
+    criteria: "Predictions are grounded — forecasts based on trends in the data (RFC 7807 adoption curve, AI integration, regulatory pressure), not speculation. Includes specific predictions like 'by 2028, 80% of APIs will use Problem Details' with supporting reasoning"
+    weight: 0.30
+    description: "Grounded predictions"
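
The industry-analysis criterion asks that the data be analyzed, not just presented. One simple step is to divide each industry's typical figure by its leaders' figure; a sketch using only the three industries whose survey data reports both error response time and 5xx rate:

```python
# (industry typical, leaders) from the survey data above; lower is better for both.
SURVEY = {
    "Financial Services": {"error_response_ms": (45, 15), "error_rate_5xx_pct": (0.02, 0.005)},
    "Healthcare": {"error_response_ms": (200, 50), "error_rate_5xx_pct": (0.15, 0.03)},
    "E-commerce": {"error_response_ms": (80, 20), "error_rate_5xx_pct": (0.08, 0.01)},
}

def gap_ratios(survey: dict) -> dict:
    """How many times worse the typical company is than the leaders, per metric."""
    return {industry: {metric: typical / leader
                       for metric, (typical, leader) in metrics.items()}
            for industry, metrics in survey.items()}

ratios = gap_ratios(SURVEY)
by_rate_gap = sorted(ratios, key=lambda i: ratios[i]["error_rate_5xx_pct"], reverse=True)
for industry in by_rate_gap:
    r = ratios[industry]
    print(f"{industry}: 5xx rate {r['error_rate_5xx_pct']:.1f}x, "
          f"response time {r['error_response_ms']:.1f}x behind leaders")
```

On these numbers e-commerce has the widest 5xx-rate gap to its own leaders (about 8x) even though healthcare has the highest absolute 5xx rate, and in all three industries the 5xx-rate gap is larger than the response-time gap.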

package/courses/rest-api-error-handling/scenarios/level-5/error-ma-integration.yaml ADDED

@@ -0,0 +1,68 @@
+meta:
+  id: error-ma-integration
+  level: 5
+  course: rest-api-error-handling
+  type: output
+  description: "Integrate API error handling post-M&A — unify error handling across two companies with incompatible API platforms"
+  tags: [REST, API, M&A, integration, unification, migration, master]
+state: {}
+trigger: |
+  Your company ($5B revenue, 3,000 engineers) just acquired a
+  competitor ($1B revenue, 800 engineers). Both companies have
+  extensive API platforms that must be integrated. The board
+  mandated "unified API experience within 18 months."
+  Acquirer (your company) API platform:
+  - 200 REST APIs, RFC 7807 error format
+  - OpenTelemetry distributed tracing
+  - Error budgets with SLOs per endpoint
+  - Centralized error analytics (Datadog)
+  - Multi-region (US, EU)
+  - Compliance: SOC 2, PCI DSS, GDPR
+  - Languages: Go, TypeScript
+  - API gateway: Kong
+  Acquired company API platform:
+  - 150 REST APIs + 30 GraphQL APIs, custom error format
+  - No distributed tracing (per-service logging only)
+  - No SLOs (alert-based monitoring only)
+  - Fragmented monitoring (Sentry, CloudWatch, custom dashboards)
+  - Single region (US-East)
+  - Compliance: HIPAA (healthcare data)
+  - Languages: Java, Python
+  - API gateway: AWS API Gateway
+  Integration challenges:
+  1. Different error formats (RFC 7807 vs custom)
+  2. Different error codes (same name, different meaning)
+  3. Overlapping endpoints (both have /users, /payments, etc.)
+  4. Different auth systems (OAuth2 vs API keys)
+  5. Combined compliance: need SOC 2 + PCI DSS + GDPR + HIPAA
+  6. Customer migration (3,000 + 1,200 API consumers)
+  7. Cultural differences (startup velocity vs enterprise stability)
+  The acquired company's CTO is resistant: "Our error handling works
+  fine for our customers. Why should we change?"
+  Task: Design the 18-month integration plan. Write: the unified
+  error handling standard (accounting for both platforms), the
+  migration strategy for both sets of API consumers, the compliance
+  unification approach, the organizational integration (combining
+  platform teams), and the change management strategy for the
+  acquired company.
+assertions:
+  - type: llm_judge
+    criteria: "Unified standard accommodates both platforms — doesn't simply impose the acquirer's standard on the acquired company, but creates a unified format that handles both REST and GraphQL, incorporates the best practices from both platforms, and adds HIPAA compliance to the existing SOC 2/PCI DSS/GDPR framework"
+    weight: 0.35
+    description: "Accommodating unified standard"
+  - type: llm_judge
+    criteria: "Migration strategy preserves customer experience — phases the migration so no API consumer experiences breaking changes, provides migration tooling (adapters, SDKs, documentation), and handles the 4,200 combined consumers with backward-compatible error format evolution"
+    weight: 0.35
+    description: "Customer-preserving migration"
+  - type: llm_judge
+    criteria: "Change management addresses resistance — acknowledges the acquired CTO's concerns, identifies what the acquired platform does well (adopt those practices), and the organizational integration plan combines teams without creating winners and losers"
+    weight: 0.30
+    description: "Resistance-addressing change management"
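
Challenge 2 (same code name, different meaning) and the no-breaking-changes requirement suggest namespacing legacy codes and letting consumers opt in to the unified format. A sketch in which the acquired company's legacy body shape (`errorCode`, `errorMessage`), the mapping entries, and the `errors.example.com` type URIs are all hypothetical; `application/problem+json` is the RFC 7807 media type:

```python
# Keys are (origin platform, legacy code) so identical names from the two
# companies never collide; values are codes in the unified registry (hypothetical).
CODE_MAP = {
    ("acquired", "INVALID_REQUEST"): "common.validation_failed",
    ("acquired", "LIMIT"): "common.rate_limited",
}

def to_problem(legacy: dict, status: int) -> dict:
    """Translate an acquired-platform error into RFC 7807 problem details."""
    legacy_code = legacy.get("errorCode", "UNKNOWN")
    # Unmapped codes stay namespaced instead of borrowing an acquirer code name.
    unified = CODE_MAP.get(("acquired", legacy_code), f"acquired.{legacy_code.lower()}")
    return {
        "type": f"https://errors.example.com/{unified}",
        "status": status,
        "detail": legacy.get("errorMessage", ""),
        "code": unified,
        "legacy_code": legacy_code,    # lets existing consumers keep matching on it
    }

def render(legacy: dict, status: int, accept: str) -> tuple[str, dict]:
    """Opt-in migration: only clients asking for problem+json get the new format."""
    if "application/problem+json" in accept:
        return "application/problem+json", to_problem(legacy, status)
    return "application/json", legacy  # body unchanged for consumers who have not migrated

def to_graphql_error(problem: dict) -> dict:
    """Same unified code for the acquired company's GraphQL APIs."""
    return {"message": problem["detail"] or "Error",
            "extensions": {"code": problem["code"], "status": problem["status"]}}
```

Keeping `legacy_code` in the new body lets consumers migrate their parsing incrementally, and the GraphQL rendering reuses the same unified code in `extensions`.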