npm - @cubis/foundry - Versions diffs - 0.3.71 → 0.3.73 - Mend

@cubis/foundry 0.3.71 → 0.3.73

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (276)

package/workflows/powers/nextjs-developer/POWER.md CHANGED

@@ -1,62 +1,69 @@
 ````markdown
 ---
 inclusion: manual
-name: "nextjs-developer"
-displayName: "Next.js Developer"
-description: "Use for implementing/refactoring Next.js App Router features, React Server Components/Actions, SEO, and production architecture."
+name: nextjs-developer
+description: "Use for implementing and reviewing Next.js App Router features, React Server Components and Actions, caching strategy, SEO, and production runtime behavior."
 license: MIT
 metadata:
-  version: "2.0.0"
-  domain: "frontend"
-  role: "specialist"
-  stack: "nextjs"
-  baseline: "Next.js 16 + React 19"
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # Next.js Developer
-## Scope
+## Purpose
-- Primary skill for App Router implementation and refactors.
-- Not for version migration playbooks (`next-upgrade`) or cache-components-only tuning (`next-cache-components`).
+Use for implementing and reviewing Next.js App Router features, React Server Components and Actions, caching strategy, SEO, and production runtime behavior.
-## When to use
+## When to Use
-- Building routes/layouts/loading/error boundaries.
-- Implementing server components and server actions.
-- Designing data fetching/revalidation strategy.
-- Improving SEO and web vitals.
+- Building or refactoring App Router routes, layouts, loading states, and errors.
+- Choosing static, dynamic, or streaming rendering behavior per route.
+- Implementing React Server Components, Server Actions, and cache invalidation.
+- Improving metadata, structured data, and Core Web Vitals in a Next app.
-## Core workflow
+## Instructions
-1. Define rendering mode per route (static/dynamic/streaming).
-2. Keep server/client component boundaries explicit.
-3. Implement data access with caching semantics by intent.
-4. Add forms/actions with validation and auth checks.
-5. Verify perf, accessibility, and error handling.
+1. Decide rendering mode and data boundary per route.
+2. Keep server and client components separated by necessity, not habit.
+3. Use caching and invalidation intentionally rather than globally.
+4. Add route-level loading, error, and auth-sensitive behavior explicitly.
+5. Verify SEO, accessibility, and runtime performance before shipping.
-## Baseline standards
+### Baseline standards
-- Default to Server Components; opt into client components only when needed.
+- Default to Server Components and opt into client components deliberately.
 - Keep server-only code out of client bundles.
-- Use route-level loading/error states.
-- Use metadata API and structured data for SEO.
-- Measure Core Web Vitals and act on regressions.
-## Avoid
-- Unnecessary client hydration.
-- Mixing auth-sensitive logic into client-only paths.
-- Overusing global revalidation when targeted invalidation is possible.
-## Reference files
-- `references/app-router.md`
-- `references/server-components.md`
-- `references/server-actions.md`
-- `references/data-fetching.md`
-- `references/performance.md`
-- `references/seo.md`
-- `references/deployment.md`
-- `references/testing.md`
+- Use route-level loading and error states.
+- Prefer targeted revalidation over broad cache busting.
+- Measure Web Vitals after changes that affect rendering or data flow.
+### Constraints
+- Avoid unnecessary client hydration.
+- Avoid mixing secret or auth-sensitive logic into client paths.
+- Avoid global revalidation when route- or tag-scoped invalidation is enough.
+- Avoid framework-heavy decisions without checking the underlying React boundary first.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+Load on demand. Do not preload all reference files.
+| File | Load when |
+| --- | --- |
+| `references/app-router-cache-playbook.md` | The task needs route-level guidance for App Router rendering, caching, revalidation, metadata, and server/client boundaries. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with nextjs developer best practices in this project"
+- "Review my nextjs developer implementation for issues"
 ````
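
The new constraint "Avoid global revalidation when route- or tag-scoped invalidation is enough" can be illustrated with a small in-memory model. This is only a sketch of the idea, not Next.js's cache: `TagCache` and its methods are hypothetical names. In a real App Router project, `revalidateTag` and `revalidatePath` from `next/cache` would play the role of `invalidateTag`.

```typescript
// Hypothetical in-memory cache used to contrast tag-scoped and global invalidation.
// None of these names come from Next.js; they exist only for illustration.
type Entry<T> = { value: T; tags: Set<string> };

class TagCache<T> {
  private entries = new Map<string, Entry<T>>();

  set(key: string, value: T, tags: string[]): void {
    this.entries.set(key, { value, tags: new Set(tags) });
  }

  get(key: string): T | undefined {
    return this.entries.get(key)?.value;
  }

  // Targeted invalidation: drop only the entries that carry the given tag.
  invalidateTag(tag: string): number {
    let removed = 0;
    this.entries.forEach((entry, key) => {
      if (entry.tags.has(tag)) {
        this.entries.delete(key);
        removed++;
      }
    });
    return removed;
  }

  // Broad cache busting: drop everything, including unrelated entries.
  invalidateAll(): number {
    const removed = this.entries.size;
    this.entries.clear();
    return removed;
  }
}

const cache = new TagCache<string>();
cache.set("/products/1", "product 1 page", ["products", "product:1"]);
cache.set("/products/2", "product 2 page", ["products", "product:2"]);
cache.set("/about", "about page", ["static"]);

// Editing product 1 only needs to evict entries tagged "product:1".
const removedByTag = cache.invalidateTag("product:1");
```

After the tag-scoped call, `/products/2` and `/about` are still cached, which is the behavior the constraint asks for; `invalidateAll` would have evicted them too.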

package/workflows/powers/nextjs-developer/SKILL.md CHANGED

@@ -1,59 +1,66 @@
 ---
-name: "nextjs-developer"
-displayName: "Next.js Developer"
-description: "Use for implementing/refactoring Next.js App Router features, React Server Components/Actions, SEO, and production architecture."
+name: nextjs-developer
+description: "Use for implementing and reviewing Next.js App Router features, React Server Components and Actions, caching strategy, SEO, and production runtime behavior."
 license: MIT
 metadata:
-  version: "2.0.0"
-  domain: "frontend"
-  role: "specialist"
-  stack: "nextjs"
-  baseline: "Next.js 16 + React 19"
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # Next.js Developer
-## Scope
+## Purpose
-- Primary skill for App Router implementation and refactors.
-- Not for version migration playbooks (`next-upgrade`) or cache-components-only tuning (`next-cache-components`).
+Use for implementing and reviewing Next.js App Router features, React Server Components and Actions, caching strategy, SEO, and production runtime behavior.
-## When to use
+## When to Use
-- Building routes/layouts/loading/error boundaries.
-- Implementing server components and server actions.
-- Designing data fetching/revalidation strategy.
-- Improving SEO and web vitals.
+- Building or refactoring App Router routes, layouts, loading states, and errors.
+- Choosing static, dynamic, or streaming rendering behavior per route.
+- Implementing React Server Components, Server Actions, and cache invalidation.
+- Improving metadata, structured data, and Core Web Vitals in a Next app.
-## Core workflow
+## Instructions
-1. Define rendering mode per route (static/dynamic/streaming).
-2. Keep server/client component boundaries explicit.
-3. Implement data access with caching semantics by intent.
-4. Add forms/actions with validation and auth checks.
-5. Verify perf, accessibility, and error handling.
+1. Decide rendering mode and data boundary per route.
+2. Keep server and client components separated by necessity, not habit.
+3. Use caching and invalidation intentionally rather than globally.
+4. Add route-level loading, error, and auth-sensitive behavior explicitly.
+5. Verify SEO, accessibility, and runtime performance before shipping.
-## Baseline standards
+### Baseline standards
-- Default to Server Components; opt into client components only when needed.
+- Default to Server Components and opt into client components deliberately.
 - Keep server-only code out of client bundles.
-- Use route-level loading/error states.
-- Use metadata API and structured data for SEO.
-- Measure Core Web Vitals and act on regressions.
-## Avoid
-- Unnecessary client hydration.
-- Mixing auth-sensitive logic into client-only paths.
-- Overusing global revalidation when targeted invalidation is possible.
-## Reference files
-- `references/app-router.md`
-- `references/server-components.md`
-- `references/server-actions.md`
-- `references/data-fetching.md`
-- `references/performance.md`
-- `references/seo.md`
-- `references/deployment.md`
-- `references/testing.md`
+- Use route-level loading and error states.
+- Prefer targeted revalidation over broad cache busting.
+- Measure Web Vitals after changes that affect rendering or data flow.
+### Constraints
+- Avoid unnecessary client hydration.
+- Avoid mixing secret or auth-sensitive logic into client paths.
+- Avoid global revalidation when route- or tag-scoped invalidation is enough.
+- Avoid framework-heavy decisions without checking the underlying React boundary first.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+Load on demand. Do not preload all reference files.
+| File | Load when |
+| --- | --- |
+| `references/app-router-cache-playbook.md` | The task needs route-level guidance for App Router rendering, caching, revalidation, metadata, and server/client boundaries. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with nextjs developer best practices in this project"
+- "Review my nextjs developer implementation for issues"

package/workflows/powers/nodejs-best-practices/POWER.md CHANGED

@@ -1,52 +1,71 @@
 ````markdown
 ---
 inclusion: manual
-name: "nodejs-best-practices"
-description: "Decision framework for modern Node.js backend architecture, operations, and security using current LTS-era practices."
+name: nodejs-best-practices
+description: "Use for modern Node.js backend architecture, runtime choices, worker or queue boundaries, edge-vs-server tradeoffs, reliability controls, and production-safe service implementation in the current LTS era."
 license: MIT
 metadata:
-  version: "2.0.0"
-  domain: "backend"
-  role: "decision-guide"
-  stack: "nodejs"
-  baseline: "Node 22/24 LTS era"
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # Node.js Best Practices
 ## Purpose
-Use this skill to choose the right Node backend patterns for the actual constraints of the task, not by habit.
+Use for modern Node.js backend architecture, runtime choices, worker or queue boundaries, edge-vs-server tradeoffs, reliability controls, and production-safe service implementation in the current LTS era.
-## Decision flow
+## When to Use
-1. Clarify deployment target (container, serverless, edge).
-2. Select framework/runtime shape based on latency and team constraints.
-3. Define API, validation, auth, and observability boundaries.
-4. Implement smallest safe slice, then harden.
+- Choosing Node backend structure for APIs, workers, or service code.
+- Making runtime, framework, validation, queue, and observability decisions.
+- Hardening Node services for concurrency, deployment, and failure handling.
+- Reviewing service code for event-loop safety, background work boundaries, and production behavior.
-## Core guidance
+## Instructions
-- Prefer typed boundaries (TypeScript + schema validation).
-- Keep transport concerns out of business logic.
-- Standardize error envelopes and correlation IDs.
-- Enforce timeout, retry, and circuit-breaker strategy for downstream calls.
-- Use graceful shutdown and health/readiness probes.
+1. Confirm runtime context: container, serverless, edge, worker, queue consumer, or long-lived process.
+2. Pick the smallest framework/runtime shape that fits latency, I/O profile, and deployment constraints.
+3. Keep transport, business logic, persistence, and background execution boundaries explicit.
+4. Add validation, timeout, retry, backpressure, and observability controls before shipping.
+5. Verify graceful shutdown, health checks, worker behavior, and dependency failure handling.
-## Security and reliability
+### Baseline standards
-- Validate all request input before business logic.
-- Use least-privilege credentials and secret rotation.
+- Prefer typed boundaries and explicit schema validation.
 - Avoid blocking the event loop in request paths.
-- Add rate limits and abuse controls on external endpoints.
+- Use workers or queues when CPU-heavy or long-lived background work would distort request latency.
+- Add correlation IDs and consistent error envelopes.
+- Use graceful shutdown and readiness probes.
+- Measure CPU, heap, and I/O hot paths before optimizing.
-## Performance
+### Constraints
-- Measure before optimizing.
-- Profile CPU and heap in realistic workloads.
-- Use streaming/backpressure for large I/O paths.
+- Avoid framework-by-habit decisions.
+- Avoid hidden background work with no timeout or cancellation path.
+- Avoid running CPU-bound work on the main event loop when workers or out-of-process jobs are needed.
+- Avoid unbounded retries and silent downstream failures.
+- Avoid secret handling or auth logic without explicit least-privilege boundaries.
-## Output expectation
+## Output Format
-Return concrete architecture choices with tradeoffs, then implementation steps and verification criteria.
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+Load on demand. Do not preload all reference files.
+| File | Load when |
+| --- | --- |
+| `references/runtime-reliability-checklist.md` | You need a deeper checklist for runtime choice, shutdown, workers or queues, edge-vs-server tradeoffs, validation, retries, observability, and production failure handling. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with nodejs best practices best practices in this project"
+- "Review my nodejs best practices implementation for issues"
 ````
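
The added guidance on timeouts, retries, and "Avoid unbounded retries and silent downstream failures" can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the package: `retryBounded`, `withTimeout`, `backoffDelayMs`, and the option names are all hypothetical. The timeout here only stops waiting; it does not cancel the underlying work, which in a real service would need something like an `AbortSignal` passed to the downstream call.

```typescript
// Hypothetical option names; chosen for illustration only.
type RetryOptions = {
  maxAttempts: number; // hard upper bound on attempts
  timeoutMs: number;   // per-attempt timeout
  baseDelayMs: number; // delay before the second attempt; doubles afterwards
  maxDelayMs: number;  // cap on any single backoff delay
};

// Exponential backoff with a cap: base, 2*base, 4*base, ... up to maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
}

// Rejects if the work does not settle within timeoutMs.
// Note: this stops waiting but does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Runs the task at most maxAttempts times and surfaces the last error
// instead of failing silently.
async function retryBounded<T>(
  task: (attempt: number) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await withTimeout(task(attempt), options.timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < options.maxAttempts) {
        const delay = backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw new Error(`gave up after ${options.maxAttempts} attempts: ${String(lastError)}`);
}
```

A caller would wrap a downstream request as `retryBounded(() => callDownstream(), options)`, where `callDownstream` stands in for whatever client the service actually uses.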

package/workflows/powers/nodejs-best-practices/SKILL.md CHANGED

@@ -1,49 +1,68 @@
 ---
-name: "nodejs-best-practices"
-description: "Decision framework for modern Node.js backend architecture, operations, and security using current LTS-era practices."
+name: nodejs-best-practices
+description: "Use for modern Node.js backend architecture, runtime choices, worker or queue boundaries, edge-vs-server tradeoffs, reliability controls, and production-safe service implementation in the current LTS era."
 license: MIT
 metadata:
-  version: "2.0.0"
-  domain: "backend"
-  role: "decision-guide"
-  stack: "nodejs"
-  baseline: "Node 22/24 LTS era"
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # Node.js Best Practices
 ## Purpose
-Use this skill to choose the right Node backend patterns for the actual constraints of the task, not by habit.
+Use for modern Node.js backend architecture, runtime choices, worker or queue boundaries, edge-vs-server tradeoffs, reliability controls, and production-safe service implementation in the current LTS era.
-## Decision flow
+## When to Use
-1. Clarify deployment target (container, serverless, edge).
-2. Select framework/runtime shape based on latency and team constraints.
-3. Define API, validation, auth, and observability boundaries.
-4. Implement smallest safe slice, then harden.
+- Choosing Node backend structure for APIs, workers, or service code.
+- Making runtime, framework, validation, queue, and observability decisions.
+- Hardening Node services for concurrency, deployment, and failure handling.
+- Reviewing service code for event-loop safety, background work boundaries, and production behavior.
-## Core guidance
+## Instructions
-- Prefer typed boundaries (TypeScript + schema validation).
-- Keep transport concerns out of business logic.
-- Standardize error envelopes and correlation IDs.
-- Enforce timeout, retry, and circuit-breaker strategy for downstream calls.
-- Use graceful shutdown and health/readiness probes.
+1. Confirm runtime context: container, serverless, edge, worker, queue consumer, or long-lived process.
+2. Pick the smallest framework/runtime shape that fits latency, I/O profile, and deployment constraints.
+3. Keep transport, business logic, persistence, and background execution boundaries explicit.
+4. Add validation, timeout, retry, backpressure, and observability controls before shipping.
+5. Verify graceful shutdown, health checks, worker behavior, and dependency failure handling.
-## Security and reliability
+### Baseline standards
-- Validate all request input before business logic.
-- Use least-privilege credentials and secret rotation.
+- Prefer typed boundaries and explicit schema validation.
 - Avoid blocking the event loop in request paths.
-- Add rate limits and abuse controls on external endpoints.
+- Use workers or queues when CPU-heavy or long-lived background work would distort request latency.
+- Add correlation IDs and consistent error envelopes.
+- Use graceful shutdown and readiness probes.
+- Measure CPU, heap, and I/O hot paths before optimizing.
-## Performance
+### Constraints
-- Measure before optimizing.
-- Profile CPU and heap in realistic workloads.
-- Use streaming/backpressure for large I/O paths.
+- Avoid framework-by-habit decisions.
+- Avoid hidden background work with no timeout or cancellation path.
+- Avoid running CPU-bound work on the main event loop when workers or out-of-process jobs are needed.
+- Avoid unbounded retries and silent downstream failures.
+- Avoid secret handling or auth logic without explicit least-privilege boundaries.
-## Output expectation
+## Output Format
-Return concrete architecture choices with tradeoffs, then implementation steps and verification criteria.
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+Load on demand. Do not preload all reference files.
+| File | Load when |
+| --- | --- |
+| `references/runtime-reliability-checklist.md` | You need a deeper checklist for runtime choice, shutdown, workers or queues, edge-vs-server tradeoffs, validation, retries, observability, and production failure handling. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with nodejs best practices best practices in this project"
+- "Review my nodejs best practices implementation for issues"
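
The baseline standard "Add correlation IDs and consistent error envelopes" is kept in this version but not spelled out. One possible shape is sketched below. Every name here is an assumption made for illustration: the `x-correlation-id` header, the `ErrorEnvelope` fields, and the `INTERNAL` code are not defined by the package.

```typescript
// Hypothetical envelope shape; the package does not prescribe one.
type ErrorEnvelope = {
  error: { code: string; message: string; correlationId: string; retryable: boolean };
};

// Failures the service raises on purpose and is willing to describe to callers.
type KnownFailure = { code: string; message: string; retryable: boolean };

function isKnownFailure(value: unknown): value is KnownFailure {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.code === "string" &&
    typeof candidate.message === "string" &&
    typeof candidate.retryable === "boolean"
  );
}

// Reuse the caller's correlation ID when present so logs line up across services;
// otherwise create one with the supplied generator.
function resolveCorrelationId(
  headers: Record<string, string | undefined>,
  generate: () => string,
): string {
  const incoming = headers["x-correlation-id"]; // assumed header name
  return incoming !== undefined && incoming.length > 0 ? incoming : generate();
}

// Every failure leaves the service in the same shape.
function toErrorEnvelope(failure: unknown, correlationId: string): ErrorEnvelope {
  if (isKnownFailure(failure)) {
    return {
      error: {
        code: failure.code,
        message: failure.message,
        correlationId,
        retryable: failure.retryable,
      },
    };
  }
  // Unknown failures: avoid leaking internal details to the caller.
  return {
    error: { code: "INTERNAL", message: "Internal error", correlationId, retryable: false },
  };
}
```

Keeping the envelope construction in one function means transport code (HTTP handler, queue consumer) can stay thin while business logic raises plain failure values.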

package/workflows/powers/observability/POWER.md ADDED

@@ -0,0 +1,109 @@
+````markdown
+---
+inclusion: manual
+name: observability
+description: "Use when designing or reviewing logging, monitoring, tracing, and alerting for production services. Covers structured logging, distributed tracing, metric collection, dashboard design, alert hygiene, SLO definition, and incident readiness across application and infrastructure layers."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "1.0"
+compatibility: Claude Code, Codex, GitHub Copilot
+---
+# Observability
+## Purpose
+Use when designing or reviewing logging, monitoring, tracing, and alerting for production services. Covers structured logging, distributed tracing, metric collection, dashboard design, alert hygiene, SLO definition, and incident readiness across application and infrastructure layers.
+## When to Use
+- Working on observability related tasks
+## Instructions
+1. **Define SLOs first** — decide what "healthy" means before instrumenting. SLOs drive alerting, not the other way around.
+2. **Instrument the golden signals** — latency, traffic, errors, saturation. Every service must expose these four.
+3. **Structured logging** — JSON to stdout. Include request ID, user context, operation name, and outcome. Never log secrets or PII.
+4. **Distributed tracing** — propagate trace context across service boundaries. Instrument entry points, database calls, and external API calls.
+5. **Alert on symptoms, not causes** — alert on SLO burn rate, not on CPU percentage. Alerts must be actionable.
+### Three pillars
+### Logs
+- Write structured JSON logs to stdout/stderr. Let the platform handle collection.
+- Include: timestamp (ISO 8601), level, service name, trace ID, span ID, message, and relevant context fields.
+- Log levels: ERROR for failures requiring attention, WARN for degraded but functional, INFO for key business events, DEBUG for development only (disabled in production).
+- Never log: passwords, tokens, credit card numbers, PII, or full request/response bodies in production.
+- Correlation: every log line must include the request/trace ID for cross-referencing with traces and metrics.
+### Metrics
+- Use RED method for request-driven services: Rate, Errors, Duration.
+- Use USE method for resources: Utilization, Saturation, Errors.
+- Histogram for latency (not averages — p50/p95/p99 matter).
+- Counter for request counts, error counts, and throughput.
+- Gauge for current state (queue depth, active connections, cache size).
+- Label cardinality: keep label values bounded. Never use user IDs or request IDs as metric labels.
+### Traces
+- Use OpenTelemetry SDK for instrumentation — vendor-neutral, industry standard.
+- Auto-instrument HTTP clients, database drivers, and message consumers.
+- Add custom spans for significant business operations.
+- Propagate W3C Trace Context headers across all service boundaries.
+- Sample appropriately: 100% for errors, tail-sampled for high-volume happy paths.
+### SLO design
+- Define SLIs (Service Level Indicators) from the user's perspective — e.g., "percentage of requests completing in under 200ms."
+- Set SLOs at realistic targets — 99.9% is very different from 99.99%.
+- Calculate error budget: `1 - SLO target`. When budget is consumed, prioritize reliability over features.
+- Review SLOs quarterly. Adjust based on actual user impact data.
+### Alerting hygiene
+- Every alert must have: a clear title, expected impact, runbook link, and escalation path.
+- Use multi-window burn rate alerts for SLO-based alerting.
+- Suppress alerts during maintenance windows.
+- Page only for user-facing impact. Use tickets for slow-burn degradation.
+- Review alert fatigue monthly. If an alert fires more than weekly without action, fix or remove it.
+### Dashboard design
+- Start with a service overview dashboard: golden signals, SLO status, recent deployments.
+- Use consistent time ranges and refresh intervals across dashboards.
+- Top-to-bottom layout: high-level health → request flow → resource utilization → dependencies.
+- Every graph must have: title, unit, and a brief description of what abnormal looks like.
+- Avoid vanity dashboards — every panel must answer a question someone would ask during an incident.
+### Constraints
+- Avoid logging at DEBUG level in production — volume overwhelms analysis.
+- Avoid high-cardinality metric labels (user ID, IP address, full URL path).
+- Avoid alert on every error — alert on error rate exceeding SLO budget.
+- Avoid dashboard sprawl — ten dashboards nobody checks are worse than two good ones.
+- Avoid instrumenting everything with traces — sample and focus on critical paths.
+- Avoid using averages instead of percentiles for latency metrics.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+| File                                       | Purpose                                                                                         |
+| ------------------------------------------ | ----------------------------------------------------------------------------------------------- |
+| `references/opentelemetry-setup-guide.md`  | OTel SDK setup, auto-instrumentation, exporter configuration, and sampling strategy.            |
+| `references/alerting-and-slo-checklist.md` | SLO definition template, burn-rate alert formulas, runbook structure, and alert review process. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with observability best practices in this project"
+- "Review my observability implementation for issues"
+````
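
The SLO design and alerting hygiene sections above rest on two pieces of arithmetic: the error budget (`1 - SLO target`) and the burn rate used in multi-window alerts. The sketch below works through both. The function names and the `WindowSample` type are hypothetical, and the threshold is left as a parameter because the skill text does not specify one.

```typescript
// Error budget as defined in the skill text: 1 - SLO target.
function errorBudget(sloTarget: number): number {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError("sloTarget must be strictly between 0 and 1");
  }
  return 1 - sloTarget;
}

// Minutes of full outage the budget allows over a window.
// For a 99.9% target over 30 days this is about 43.2 minutes.
function allowedBadMinutes(sloTarget: number, windowDays: number): number {
  return errorBudget(sloTarget) * windowDays * 24 * 60;
}

// Burn rate: observed error ratio divided by the budgeted error ratio.
// A value of 1 means the budget is being spent exactly on schedule.
function burnRate(badEvents: number, totalEvents: number, sloTarget: number): number {
  if (totalEvents === 0) return 0;
  return badEvents / totalEvents / errorBudget(sloTarget);
}

type WindowSample = { bad: number; total: number };

// Multi-window check: page only when both the long and the short window
// burn faster than the threshold, so a brief spike that has already
// recovered does not page anyone.
function shouldPage(
  longWindow: WindowSample,
  shortWindow: WindowSample,
  sloTarget: number,
  threshold: number,
): boolean {
  return (
    burnRate(longWindow.bad, longWindow.total, sloTarget) >= threshold &&
    burnRate(shortWindow.bad, shortWindow.total, sloTarget) >= threshold
  );
}
```

For example, with a 99.9% target, 50 failed requests out of 10,000 is an error ratio of 0.5% against a budget of 0.1%, a burn rate of roughly 5.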

package/workflows/powers/observability/SKILL.md ADDED

@@ -0,0 +1,106 @@
+---
+name: observability
+description: "Use when designing or reviewing logging, monitoring, tracing, and alerting for production services. Covers structured logging, distributed tracing, metric collection, dashboard design, alert hygiene, SLO definition, and incident readiness across application and infrastructure layers."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "1.0"
+compatibility: Claude Code, Codex, GitHub Copilot
+---
+# Observability
+## Purpose
+Use when designing or reviewing logging, monitoring, tracing, and alerting for production services. Covers structured logging, distributed tracing, metric collection, dashboard design, alert hygiene, SLO definition, and incident readiness across application and infrastructure layers.
+## When to Use
+- Working on observability related tasks
+## Instructions
+1. **Define SLOs first** — decide what "healthy" means before instrumenting. SLOs drive alerting, not the other way around.
+2. **Instrument the golden signals** — latency, traffic, errors, saturation. Every service must expose these four.
+3. **Structured logging** — JSON to stdout. Include request ID, user context, operation name, and outcome. Never log secrets or PII.
+4. **Distributed tracing** — propagate trace context across service boundaries. Instrument entry points, database calls, and external API calls.
+5. **Alert on symptoms, not causes** — alert on SLO burn rate, not on CPU percentage. Alerts must be actionable.
+### Three pillars
+### Logs
+- Write structured JSON logs to stdout/stderr. Let the platform handle collection.
+- Include: timestamp (ISO 8601), level, service name, trace ID, span ID, message, and relevant context fields.
+- Log levels: ERROR for failures requiring attention, WARN for degraded but functional, INFO for key business events, DEBUG for development only (disabled in production).
+- Never log: passwords, tokens, credit card numbers, PII, or full request/response bodies in production.
+- Correlation: every log line must include the request/trace ID for cross-referencing with traces and metrics.
+### Metrics
+- Use RED method for request-driven services: Rate, Errors, Duration.
+- Use USE method for resources: Utilization, Saturation, Errors.
+- Histogram for latency (not averages — p50/p95/p99 matter).
+- Counter for request counts, error counts, and throughput.
+- Gauge for current state (queue depth, active connections, cache size).
+- Label cardinality: keep label values bounded. Never use user IDs or request IDs as metric labels.
+### Traces
+- Use OpenTelemetry SDK for instrumentation — vendor-neutral, industry standard.
+- Auto-instrument HTTP clients, database drivers, and message consumers.
+- Add custom spans for significant business operations.
+- Propagate W3C Trace Context headers across all service boundaries.
+- Sample appropriately: 100% for errors, tail-sampled for high-volume happy paths.
+### SLO design
+- Define SLIs (Service Level Indicators) from the user's perspective — e.g., "percentage of requests completing in under 200ms."
+- Set SLOs at realistic targets — 99.9% is very different from 99.99%.
+- Calculate error budget: `1 - SLO target`. When budget is consumed, prioritize reliability over features.
+- Review SLOs quarterly. Adjust based on actual user impact data.
+### Alerting hygiene
+- Every alert must have: a clear title, expected impact, runbook link, and escalation path.
+- Use multi-window burn rate alerts for SLO-based alerting.
+- Suppress alerts during maintenance windows.
+- Page only for user-facing impact. Use tickets for slow-burn degradation.
+- Review alert fatigue monthly. If an alert fires more than weekly without action, fix or remove it.
+### Dashboard design
+- Start with a service overview dashboard: golden signals, SLO status, recent deployments.
+- Use consistent time ranges and refresh intervals across dashboards.
+- Top-to-bottom layout: high-level health → request flow → resource utilization → dependencies.
+- Every graph must have: title, unit, and a brief description of what abnormal looks like.
+- Avoid vanity dashboards — every panel must answer a question someone would ask during an incident.
+### Constraints
+- Avoid logging at DEBUG level in production — volume overwhelms analysis.
+- Avoid high-cardinality metric labels (user ID, IP address, full URL path).
+- Avoid alert on every error — alert on error rate exceeding SLO budget.
+- Avoid dashboard sprawl — ten dashboards nobody checks are worse than two good ones.
+- Avoid instrumenting everything with traces — sample and focus on critical paths.
+- Avoid using averages instead of percentiles for latency metrics.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+| File                                       | Purpose                                                                                         |
+| ------------------------------------------ | ----------------------------------------------------------------------------------------------- |
+| `references/opentelemetry-setup-guide.md`  | OTel SDK setup, auto-instrumentation, exporter configuration, and sampling strategy.            |
+| `references/alerting-and-slo-checklist.md` | SLO definition template, burn-rate alert formulas, runbook structure, and alert review process. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with observability best practices in this project"
+- "Review my observability implementation for issues"
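
The Logs section above asks for structured JSON on stdout with a timestamp, level, service name, trace and span IDs, and no secrets. A minimal sketch of that shape follows. The function names, the `LogContext` type, and the list of sensitive key fragments are assumptions made for illustration; a production service would more likely rely on an established logging library and a reviewed redaction policy. Only top-level keys are redacted here.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

// Trace and span IDs are required so every line can be correlated with traces.
type LogContext = { traceId: string; spanId: string; [field: string]: unknown };

// Assumed list of key fragments that must never reach the log output.
const SENSITIVE_KEYS = ["password", "token", "authorization", "cardnumber"];

// Replaces values of sensitive top-level keys; nested objects are not inspected.
function redact(fields: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(fields)) {
    const lowered = key.toLowerCase();
    const sensitive = SENSITIVE_KEYS.some((fragment) => lowered.indexOf(fragment) !== -1);
    safe[key] = sensitive ? "[REDACTED]" : fields[key];
  }
  return safe;
}

// Builds one JSON log line. Fixed fields are written after the context
// so a context field cannot overwrite timestamp, level, service, or message.
function formatLogLine(
  level: LogLevel,
  service: string,
  message: string,
  context: LogContext,
  now: Date = new Date(),
): string {
  return JSON.stringify({
    ...redact(context),
    timestamp: now.toISOString(), // ISO 8601
    level,
    service,
    message,
  });
}

// Write to stdout and let the platform handle collection.
console.log(
  formatLogLine("INFO", "checkout", "order placed", {
    traceId: "trace-1",
    spanId: "span-1",
    orderId: 42,
    password: "hunter2",
  }),
);
```

The `password` field in the usage call is emitted as `[REDACTED]`, while `orderId` and the trace identifiers pass through unchanged.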