@skill-graph/cli 0.5.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +247 -0
- package/LICENSE +200 -0
- package/NOTICE +62 -0
- package/README.md +398 -0
- package/SKILL_GRAPH.md +443 -0
- package/bin/skill-graph.js +374 -0
- package/docs/ADOPTION.md +117 -0
- package/docs/CONFORMANCE.md +66 -0
- package/docs/PRIMER.md +384 -0
- package/docs/QUICKSTART-30MIN.md +333 -0
- package/docs/ROUTING-METRICS.md +120 -0
- package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
- package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
- package/docs/SKILL_AUDIT_LOOP.md +195 -0
- package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
- package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
- package/docs/adr/0001-predicate-set.md +69 -0
- package/docs/adr/0002-json-ld-context.md +82 -0
- package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
- package/docs/adr/0004-persistent-identifiers.md +74 -0
- package/docs/adr/0005-freshness-consolidation.md +70 -0
- package/docs/adr/0006-revise-predicate-rename.md +105 -0
- package/docs/adr/0007-audit-loop-cadence.md +99 -0
- package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
- package/docs/category-consumers.md +168 -0
- package/docs/concept-map.md +194 -0
- package/docs/diagrams/drift-states.mmd +21 -0
- package/docs/diagrams/manifest-pipeline.mmd +25 -0
- package/docs/diagrams/routing-harness.mmd +41 -0
- package/docs/diagrams/starter-graph.mmd +53 -0
- package/docs/field-decision-guide.md +315 -0
- package/docs/field-rationale.md +211 -0
- package/docs/field-reference.generated.md +624 -0
- package/docs/field-reference.md +1426 -0
- package/docs/glossary.md +190 -0
- package/docs/head-noun-glossary.md +63 -0
- package/docs/images/audit-phases.png +0 -0
- package/docs/images/drift-states.png +0 -0
- package/docs/images/graded-mode.png +0 -0
- package/docs/images/manifest-pipeline.png +0 -0
- package/docs/images/routing-harness.png +0 -0
- package/docs/images/skill-anatomy.png +0 -0
- package/docs/images/starter-graph.png +0 -0
- package/docs/images/system-model.png +0 -0
- package/docs/integrations/github-actions.md +155 -0
- package/docs/manifest-field-mapping.md +443 -0
- package/docs/marketplace-publication-queue.generated.md +240 -0
- package/docs/marketplace-release-agent-prompt.md +82 -0
- package/docs/marketplace-skill-candidate-list.md +272 -0
- package/docs/marketplace-syndication.md +222 -0
- package/docs/migration-sample-review.md +155 -0
- package/docs/migrations/v4-to-v5.md +168 -0
- package/docs/migrations/v5-to-v6.md +221 -0
- package/docs/name-exceptions.yaml +37 -0
- package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
- package/docs/plans/multi-root-workspace.md +148 -0
- package/docs/plans/scripts-roadmap.md +107 -0
- package/docs/plans/v4-schema-bump.md +160 -0
- package/docs/plans/wave-2-extraction.md +122 -0
- package/docs/positioning-vs-marketplaces.md +175 -0
- package/docs/proposals/skill-audit-loop-positioning.md +160 -0
- package/docs/quality-doctrine.md +138 -0
- package/docs/recommended-skills.md +150 -0
- package/docs/research/skill-comprehension-eval-research.md +1830 -0
- package/docs/research/skill-retrieval-evidence.md +66 -0
- package/docs/skill-metadata-protocol.md +471 -0
- package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
- package/examples/audits/a11y/findings.md +52 -0
- package/examples/audits/a11y/scorecard.md +21 -0
- package/examples/audits/a11y/verdict.md +44 -0
- package/examples/audits/debugging/findings.md +59 -0
- package/examples/audits/debugging/scorecard.md +22 -0
- package/examples/audits/debugging/verdict.md +33 -0
- package/examples/audits/documentation/findings.md +59 -0
- package/examples/audits/documentation/scorecard.md +22 -0
- package/examples/audits/documentation/verdict.md +33 -0
- package/examples/evals/a11y.json +140 -0
- package/examples/evals/api-design.json +52 -0
- package/examples/evals/code-review.json +52 -0
- package/examples/evals/data-modeling.json +52 -0
- package/examples/evals/database-migration.json +52 -0
- package/examples/evals/debugging.json +118 -0
- package/examples/evals/dependency-architecture.json +52 -0
- package/examples/evals/design-system-architecture.json +52 -0
- package/examples/evals/error-tracking.json +52 -0
- package/examples/evals/event-contract-design.json +52 -0
- package/examples/evals/form-ux-architecture.json +52 -0
- package/examples/evals/framework-fit-analysis.json +52 -0
- package/examples/evals/graph-audit.json +139 -0
- package/examples/evals/information-architecture.json +52 -0
- package/examples/evals/interaction-feedback.json +52 -0
- package/examples/evals/interaction-patterns.json +52 -0
- package/examples/evals/layout-composition.json +52 -0
- package/examples/evals/lint-overlay.json +117 -0
- package/examples/evals/microcopy.json +52 -0
- package/examples/evals/observability-modeling.json +52 -0
- package/examples/evals/pattern-recognition.json +96 -0
- package/examples/evals/performance-engineering.json +52 -0
- package/examples/evals/refactor.json +128 -0
- package/examples/evals/semiotics.json +52 -0
- package/examples/evals/skill-infrastructure.json +96 -0
- package/examples/evals/skill-router.json +140 -0
- package/examples/evals/skill-router.routing.json +113 -0
- package/examples/evals/system-interface-contracts.json +52 -0
- package/examples/evals/task-analysis.json +52 -0
- package/examples/evals/testing-strategy.json +118 -0
- package/examples/evals/type-safety.json +249 -0
- package/examples/evals/visual-design-foundations.json +52 -0
- package/examples/evals/webhook-integration.json +52 -0
- package/examples/exports/a11y.skill-md.md +80 -0
- package/examples/exports/debugging.skill-md.md +80 -0
- package/examples/exports/refactor.skill-md.md +78 -0
- package/examples/exports/testing-strategy.skill-md.md +81 -0
- package/examples/projects/markdown-static-site/README.md +115 -0
- package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
- package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
- package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
- package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
- package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
- package/examples/projects/saas-stripe-postgres/README.md +208 -0
- package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
- package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
- package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
- package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
- package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
- package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
- package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
- package/examples/protocol/skill-metadata-template.md +301 -0
- package/examples/protocol/skills.manifest.sample.json +13245 -0
- package/examples/skill-metadata-template.md +317 -0
- package/examples/skills.manifest.sample.json +13519 -0
- package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
- package/marketplace/README.md +17 -0
- package/marketplace/skills/a11y/SKILL.md +66 -0
- package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
- package/marketplace/skills/agent-engineering/SKILL.md +386 -0
- package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
- package/marketplace/skills/ai-native-development/SKILL.md +294 -0
- package/marketplace/skills/api-design/SKILL.md +60 -0
- package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
- package/marketplace/skills/background-jobs/SKILL.md +265 -0
- package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
- package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
- package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
- package/marketplace/skills/code-review/SKILL.md +120 -0
- package/marketplace/skills/color-system-design/SKILL.md +43 -0
- package/marketplace/skills/component-architecture/SKILL.md +126 -0
- package/marketplace/skills/compression/SKILL.md +112 -0
- package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
- package/marketplace/skills/connection-pooling/SKILL.md +105 -0
- package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
- package/marketplace/skills/content-monitor/SKILL.md +209 -0
- package/marketplace/skills/context-engineering/SKILL.md +320 -0
- package/marketplace/skills/context-graph/SKILL.md +174 -0
- package/marketplace/skills/context-management/SKILL.md +174 -0
- package/marketplace/skills/context-window/SKILL.md +239 -0
- package/marketplace/skills/contract-testing/SKILL.md +120 -0
- package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
- package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
- package/marketplace/skills/data-modeling/SKILL.md +59 -0
- package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
- package/marketplace/skills/database-migration/SKILL.md +429 -0
- package/marketplace/skills/debugging/SKILL.md +67 -0
- package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
- package/marketplace/skills/design-module-composition/SKILL.md +43 -0
- package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
- package/marketplace/skills/design-thinking/SKILL.md +44 -0
- package/marketplace/skills/diagnosis/SKILL.md +296 -0
- package/marketplace/skills/diff-analysis/SKILL.md +188 -0
- package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
- package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
- package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
- package/marketplace/skills/error-boundary/SKILL.md +235 -0
- package/marketplace/skills/error-tracking/SKILL.md +261 -0
- package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
- package/marketplace/skills/evaluation/SKILL.md +113 -0
- package/marketplace/skills/event-contract-design/SKILL.md +60 -0
- package/marketplace/skills/event-storming/SKILL.md +56 -0
- package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
- package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
- package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
- package/marketplace/skills/generative-ui/SKILL.md +118 -0
- package/marketplace/skills/graph-audit/SKILL.md +81 -0
- package/marketplace/skills/guardrails/SKILL.md +118 -0
- package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
- package/marketplace/skills/http-semantics/SKILL.md +136 -0
- package/marketplace/skills/ideation/SKILL.md +41 -0
- package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
- package/marketplace/skills/information-architecture/SKILL.md +59 -0
- package/marketplace/skills/integration-test-design/SKILL.md +111 -0
- package/marketplace/skills/intent-recognition/SKILL.md +136 -0
- package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
- package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
- package/marketplace/skills/journey-mapping/SKILL.md +41 -0
- package/marketplace/skills/keywords/SKILL.md +213 -0
- package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
- package/marketplace/skills/layout-composition/SKILL.md +59 -0
- package/marketplace/skills/linguistics/SKILL.md +429 -0
- package/marketplace/skills/lint-overlay/SKILL.md +76 -0
- package/marketplace/skills/mental-models/SKILL.md +126 -0
- package/marketplace/skills/merge-queue/SKILL.md +94 -0
- package/marketplace/skills/methodology/SKILL.md +317 -0
- package/marketplace/skills/microcopy/SKILL.md +232 -0
- package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
- package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
- package/marketplace/skills/mutation-testing/SKILL.md +112 -0
- package/marketplace/skills/naming-conventions/SKILL.md +112 -0
- package/marketplace/skills/observability-modeling/SKILL.md +59 -0
- package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
- package/marketplace/skills/owasp-security/SKILL.md +153 -0
- package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
- package/marketplace/skills/performance-budgets/SKILL.md +185 -0
- package/marketplace/skills/performance-engineering/SKILL.md +58 -0
- package/marketplace/skills/performance-testing/SKILL.md +125 -0
- package/marketplace/skills/printify/SKILL.md +42 -0
- package/marketplace/skills/prioritization/SKILL.md +118 -0
- package/marketplace/skills/problem-framing/SKILL.md +41 -0
- package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
- package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
- package/marketplace/skills/prompt-craft/SKILL.md +134 -0
- package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
- package/marketplace/skills/property-based-testing/SKILL.md +100 -0
- package/marketplace/skills/prototyping/SKILL.md +43 -0
- package/marketplace/skills/query-optimization/SKILL.md +144 -0
- package/marketplace/skills/real-time-updates/SKILL.md +324 -0
- package/marketplace/skills/ref-patterns/SKILL.md +284 -0
- package/marketplace/skills/refactor/SKILL.md +65 -0
- package/marketplace/skills/rendering-models/SKILL.md +142 -0
- package/marketplace/skills/replication-patterns/SKILL.md +110 -0
- package/marketplace/skills/research-synthesis/SKILL.md +41 -0
- package/marketplace/skills/route-handler-design/SKILL.md +347 -0
- package/marketplace/skills/schema-evolution/SKILL.md +140 -0
- package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
- package/marketplace/skills/semantic-center/SKILL.md +194 -0
- package/marketplace/skills/semantic-relations/SKILL.md +250 -0
- package/marketplace/skills/semantics/SKILL.md +366 -0
- package/marketplace/skills/semiotics/SKILL.md +230 -0
- package/marketplace/skills/seo-strategy/SKILL.md +260 -0
- package/marketplace/skills/server-actions-design/SKILL.md +243 -0
- package/marketplace/skills/server-components-design/SKILL.md +190 -0
- package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
- package/marketplace/skills/shopify/SKILL.md +42 -0
- package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
- package/marketplace/skills/skill-router/SKILL.md +71 -0
- package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
- package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
- package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
- package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
- package/marketplace/skills/state-management/SKILL.md +134 -0
- package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
- package/marketplace/skills/summarization/SKILL.md +156 -0
- package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
- package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
- package/marketplace/skills/task-analysis/SKILL.md +201 -0
- package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
- package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
- package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
- package/marketplace/skills/test-driven-development/SKILL.md +96 -0
- package/marketplace/skills/testing-strategy/SKILL.md +67 -0
- package/marketplace/skills/theme-system-design/SKILL.md +43 -0
- package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
- package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
- package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
- package/marketplace/skills/type-safety/SKILL.md +177 -0
- package/marketplace/skills/typography-system/SKILL.md +43 -0
- package/marketplace/skills/usability-testing/SKILL.md +43 -0
- package/marketplace/skills/user-research/SKILL.md +43 -0
- package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
- package/marketplace/skills/version-control/SKILL.md +233 -0
- package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
- package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
- package/marketplace/skills/webhook-integration/SKILL.md +331 -0
- package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
- package/package.json +67 -0
- package/schemas/manifest.schema.json +811 -0
- package/schemas/manifest.v2.schema.json +164 -0
- package/schemas/manifest.v3.schema.json +758 -0
- package/schemas/manifest.v4.schema.json +755 -0
- package/schemas/manifest.v5.schema.json +755 -0
- package/schemas/manifest.v6.schema.json +811 -0
- package/schemas/skill.context.jsonld +279 -0
- package/schemas/skill.schema.json +919 -0
- package/schemas/skill.v2.schema.json +201 -0
- package/schemas/skill.v3.schema.json +827 -0
- package/schemas/skill.v4.schema.json +822 -0
- package/schemas/skill.v5.schema.json +830 -0
- package/schemas/skill.v6.schema.json +946 -0
- package/schemas/vocabulary/keywords.json +180 -0
- package/schemas/vocabulary/workspace_tags.json +23 -0
- package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
- package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
- package/scripts/__tests__/test-export-parser-drift.js +149 -0
- package/scripts/__tests__/test-marketplace-export.js +114 -0
- package/scripts/__tests__/test-router-paths.js +82 -0
- package/scripts/__tests__/test-stability-promotion.js +244 -0
- package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
- package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
- package/scripts/backfill-schema-version.js +198 -0
- package/scripts/build-field-reference.js +160 -0
- package/scripts/build-retrieval-baseline.js +511 -0
- package/scripts/check-markdown-links.js +211 -0
- package/scripts/check-protocol-consistency.js +979 -0
- package/scripts/export-marketplace-skills.js +610 -0
- package/scripts/export-skill.js +374 -0
- package/scripts/generate-manifest.js +787 -0
- package/scripts/lib/alias-contract.js +83 -0
- package/scripts/lib/audit-prompt-builder.js +771 -0
- package/scripts/lib/mock-grader.js +134 -0
- package/scripts/lib/parse-frontmatter.js +429 -0
- package/scripts/lib/roots.js +119 -0
- package/scripts/lint/check-archetype-sections.js +185 -0
- package/scripts/lint/check-category-enum.js +83 -0
- package/scripts/lint/check-routing-eval.js +146 -0
- package/scripts/lint/check-routing-quality.js +211 -0
- package/scripts/lint/check-stability-promotion.js +220 -0
- package/scripts/lint/format-code-frame.js +206 -0
- package/scripts/marketplace-install.js +125 -0
- package/scripts/migrate-category-to-enum.js +169 -0
- package/scripts/migrate-skill-v2-to-v3.js +424 -0
- package/scripts/migrate-skill-v3-to-v4.js +200 -0
- package/scripts/migrate-skill-v5-to-v6.js +304 -0
- package/scripts/restructure-by-category.js +85 -0
- package/scripts/seed-publication-classification.js +282 -0
- package/scripts/skill-audit.js +893 -0
- package/scripts/skill-graph-drift.js +483 -0
- package/scripts/skill-graph-route.js +766 -0
- package/scripts/skill-graph-routing-eval.js +393 -0
- package/scripts/skill-lint.js +1317 -0
- package/scripts/skill-overlap.js +213 -0
- package/scripts/verify-skill-md-export.js +201 -0
@@ -0,0 +1,185 @@
---
name: performance-budgets
description: "Use when declaring, measuring, or enforcing performance thresholds as a quality contract rather than as an aspirational target. Covers the three budget axes (time, size, count), the four governing properties of a real budget (metric, threshold, percentile, consequence), the Core Web Vitals set (LCP, INP, CLS), the RAIL model, Lighthouse budgets.json, lab vs field measurement, and the discipline of treating budget breach as a build or deploy failure rather than a tracked metric. Do NOT use for the activity of profiling and optimizing a specific slow path (use performance-engineering), the choice of rendering model that bounds achievable budgets (use rendering-models), or the design of observability and telemetry signals (use observability-modeling)."
license: MIT
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/performance\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-15\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-15\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"performance budget\\\\\\\",\\\\\\\"Core Web Vitals\\\\\\\",\\\\\\\"LCP\\\\\\\",\\\\\\\"INP\\\\\\\",\\\\\\\"CLS\\\\\\\",\\\\\\\"RAIL model\\\\\\\",\\\\\\\"Lighthouse budgets\\\\\\\",\\\\\\\"lab metrics\\\\\\\",\\\\\\\"field metrics\\\\\\\",\\\\\\\"p75 performance\\\\\\\",\\\\\\\"bundle size budget\\\\\\\",\\\\\\\"request count budget\\\\\\\",\\\\\\\"performance regression\\\\\\\",\\\\\\\"budget breach\\\\\\\"]\",\"triggers\":\"[\\\\\\\"how fast does this page need to be\\\\\\\",\\\\\\\"what's a good LCP target\\\\\\\",\\\\\\\"should this fail the build\\\\\\\",\\\\\\\"why is the Lighthouse score different from real users\\\\\\\",\\\\\\\"we need a performance budget\\\\\\\"]\",\"examples\":\"[\\\\\\\"set a Core Web Vitals budget for a marketing landing page and enforce it in CI\\\\\\\",\\\\\\\"explain why a green Lighthouse score still produced bad real-user performance\\\\\\\",\\\\\\\"decide between INP and FID as the interaction-responsiveness metric\\\\\\\",\\\\\\\"design a per-route budget table that distinguishes static pages from logged-in dashboards\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"profile a specific slow query and decide what to fix (use performance-engineering)\\\\\\\",\\\\\\\"choose between SSG and SSR for a route (use rendering-models)\\\\\\\",\\\\\\\"design telemetry spans and traces (use 
observability-modeling)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"performance-engineering\\\\\\\",\\\\\\\"rendering-models\\\\\\\",\\\\\\\"observability-modeling\\\\\\\",\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"http-semantics\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"performance-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"performance-engineering owns the activity of measuring, profiling, and improving performance. performance-budgets owns the threshold-and-consequence contract. The two compose: budgets define the failure conditions; engineering produces the improvements that prevent breach.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"rendering-models\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"rendering-models owns the choice of when and where the UI is produced. performance-budgets sits downstream — the chosen rendering model bounds which budgets are achievable on a given route.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"observability-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"observability-modeling owns the design of telemetry signals (spans, metrics, logs). performance-budgets consumes signals as evidence of breach but does not design the signals themselves.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"testing-strategy owns runtime-correctness verification. performance-budgets is an analogous discipline for non-functional properties — a budget breach is the same kind of CI failure as a failing test.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"performance-engineering\\\\\\\",\\\\\\\"observability-modeling\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"A performance budget is to a web app what a calorie budget is to a diet — the calorie count of any single meal is information; the calorie budget is what you do about it when you exceed it. 
A diet that 'tracks' calories without consequence is description; a diet with a calorie *budget* is discipline. And a per-meal budget (per-route) catches drift earlier than a per-day total: by the time the day total breaches, the offending meal is hours behind you and harder to undo.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"A performance budget is a declared, measurable threshold for a user-affecting property of a system — load time, interaction latency, layout stability, bundle size, request count — treated as a contract the system must satisfy. A budget has four parts: the metric (what is measured), the threshold (the maximum or minimum acceptable value), the percentile (whose experience the threshold describes), and the consequence (what happens when the threshold is breached). Without all four, the number is an aspiration, not a budget.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/performance-budgets/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v4
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/performance-budgets/SKILL.md
---

# Performance Budgets

## Coverage

The discipline of declaring measurable thresholds for performance-affecting properties and enforcing them as quality contracts. Covers the four governing properties (metric, threshold, percentile, consequence), the three budget axes (time, size, count), the three measurement modes (lab, field, synthetic), the Core Web Vitals set as the most-adopted public budget standard, the RAIL model as the interaction-class framework, Lighthouse budgets.json as a declarative enforcement mechanism, and the per-route granularity that real applications require.

## Philosophy

A budget is a contract written in numbers. Three things distinguish a real budget from a tracked metric:

1. **It is set before the spending decisions, not after.** A budget that emerges from post-hoc retrospectives is a description; a budget that constrains the next feature is a discipline.
2. **It binds to a consequence, not a dashboard.** A number that someone watches and worries about is a metric. A number that fails a build is a budget.
3. **It speaks for the user.** Every other voice in the room (engineering, design, product, analytics, marketing) has its own incentives. The budget is the institutional voice of the user, present at every commit, refusing to grant exceptions silently.

The hardest part of performance budgeting is not setting the number; it is committing to enforce it when enforcement is expensive. The first time the budget blocks a feature, the discipline is tested. A team that grants an exception "just this once" has reduced the budget to a metric. A team that delays the feature, cuts another feature, or extends the budget through a documented amendment has kept the contract intact.

## The Four Parts of a Real Budget

A complete budget statement names all four:

> **LCP at p75 must be below 2.5 seconds in field measurement; breach blocks deploy.**

| Part | This budget's value | Why it matters |
|---|---|---|
| Metric | LCP (Largest Contentful Paint) | Cited by name, not by "speed" or "load time"; LCP has a precise definition |
| Threshold | 2.5 seconds | A number, not a band; matches the Core Web Vitals "Good" boundary |
| Percentile | p75 | Whose experience the threshold describes: three-quarters of page loads are at or below it, and the slowest quarter fare worse |
| Consequence | Deploy blocked | The discipline; without this, the rest is a tracked metric |

A statement that says "LCP should be fast" or "we target a Lighthouse score of 90" is missing at least one of these parts. The missing parts are where the discipline leaks.
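
The four parts map naturally onto a small data structure. The following is a minimal sketch, not part of the skill itself: the `Budget` class, the `evaluate` helper, the nearest-rank percentile method, the `"block-deploy"` label, and the sample values are all assumptions chosen for illustration. It treats a value exactly at the threshold as passing, in line with the Core Web Vitals "Good" boundary.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Budget:
    metric: str        # what is measured, e.g. "LCP"
    threshold: float   # maximum acceptable value, in the metric's unit
    percentile: int    # whose experience the threshold describes, e.g. 75
    consequence: str   # what happens on breach (hypothetical label)


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples)
    rank = max(1, ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def evaluate(budget: Budget, samples: list[float]) -> tuple[bool, float]:
    """Return (breached, observed value at the budget's percentile)."""
    observed = percentile(samples, budget.percentile)
    return observed > budget.threshold, observed


lcp_budget = Budget(metric="LCP", threshold=2.5, percentile=75, consequence="block-deploy")
# Hypothetical field samples, in seconds.
field_lcp = [1.2, 1.8, 2.1, 2.4, 2.6, 3.9, 1.5, 2.0]
breached, observed = evaluate(lcp_budget, field_lcp)
print(f"{lcp_budget.metric} p{lcp_budget.percentile} = {observed}s, breached = {breached}")
```

A record that cannot be constructed without all four fields makes the "missing part" failure mode visible at declaration time rather than at enforcement time.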

## The Three Axes

| Axis | Examples | Why budgeted |
|---|---|---|
| **Time** | LCP, INP, TTFB, FCP, TTI (CLS, a unitless stability score, is usually budgeted alongside these) | Direct user-perceived latency |
| **Size** | JS bytes, CSS bytes, image bytes, font bytes, total transfer | Bounds parse, network, and execution cost; proxies for time on slow devices |
| **Count** | Requests, third-party scripts, fonts loaded, DOM nodes | Connection overhead, rendering cost, attack surface |

Size and count budgets are upstream of time budgets: a page that ships 3 MB of JS will very likely fail INP on a mid-range Android device, however clever the optimization. Setting budgets on all three axes catches a regression at the level closest to where it happened.
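
Size and count budgets are the easiest to check mechanically, because the inputs come from the build rather than from users. A minimal sketch, assuming a hypothetical per-route build report keyed by resource name; the budget values and report shape are invented for illustration:

```python
# Hypothetical budgets for one route; sizes in KB, counts as plain integers.
SIZE_BUDGETS_KB = {"js": 200, "css": 50, "image": 500}
COUNT_BUDGETS = {"requests": 40, "third_party_scripts": 5}


def find_breaches(measured: dict[str, float], budgets: dict[str, float]) -> dict[str, float]:
    """Return {name: overage} for every budgeted item whose measured value exceeds its budget."""
    return {
        name: measured[name] - limit
        for name, limit in budgets.items()
        if measured.get(name, 0) > limit
    }


# Hypothetical build report for the same route.
build_report = {"js": 260, "css": 42, "image": 480, "requests": 37, "third_party_scripts": 7}

size_breaches = find_breaches(build_report, SIZE_BUDGETS_KB)
count_breaches = find_breaches(build_report, COUNT_BUDGETS)
print(size_breaches)   # {'js': 60}
print(count_breaches)  # {'third_party_scripts': 2}
```

Reporting the overage rather than a bare pass/fail points the reviewer at the axis where the regression was introduced.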

## The Core Web Vitals Set

Google's Core Web Vitals are the most widely adopted public budget standard. They are the default starting point for a project that does not yet have a budget.

| Metric | Good | Needs Improvement | Poor | Definition |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | ≤ 2.5s | 2.5–4.0s | > 4.0s | Time from navigation start until the largest image or text block in the viewport renders |
| INP (Interaction to Next Paint) | ≤ 200ms | 200–500ms | > 500ms | The longest latency between a user interaction and the next paint during the page visit (a near-worst value on pages with many interactions) |
| CLS (Cumulative Layout Shift) | ≤ 0.1 | 0.1–0.25 | > 0.25 | The largest burst of unexpected layout shifts during the page visit, each shift weighted by the area affected and the distance moved |

All three are measured at p75 in field data. The bands come from Google's published correlation between metric value and user-retention proxies.

INP replaced First Input Delay (FID) as a Core Web Vital in March 2024. FID measured only the input delay of the first interaction; INP measures the worst interaction latency across the page visit. A site that passes FID may fail INP; the budget update is not cosmetic.
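
The bands in the table above translate directly into a lookup. A minimal sketch, assuming the input is already a p75 field value in the unit listed for each metric; the `rate` function and `CWV_BANDS` names are illustrative, not part of any library:

```python
# (good_max, needs_improvement_max) per metric, taken from the table above.
# LCP is in seconds, INP in milliseconds, CLS is unitless.
CWV_BANDS = {
    "LCP": (2.5, 4.0),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, p75_value: float) -> str:
    """Classify a p75 field value into the Core Web Vitals bands."""
    good_max, needs_improvement_max = CWV_BANDS[metric]
    if p75_value <= good_max:
        return "good"
    if p75_value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


print(rate("LCP", 2.5))   # good
print(rate("INP", 350))   # needs improvement
print(rate("CLS", 0.3))   # poor
```

A project adopting these bands as its budget would typically treat anything other than "good" as a breach; where to draw that line is a project decision, not something the bands themselves dictate.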

## The RAIL Model

A complementary framework that classifies interactions by their latency budget:

| Class | Budget | Examples |
|---|---|---|
| **Response** | < 100ms from user input to acknowledgement | Button press feedback, hover, focus |
| **Animation** | < 16ms per frame (60fps) | Scroll, transition, drag |
| **Idle** | Idle work in chunks ≤ 50ms | Analytics flush, prefetching, deferred rendering |
| **Load** | < 5s on a mid-range mobile device over slow 3G | First navigation to interactive |

RAIL is older than Core Web Vitals but still load-bearing as a per-interaction-class budget. CWV is the public-facing summary; RAIL is the design-time guidance for what each interaction should cost.
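
The Idle row is the one RAIL budget that shapes how work is scheduled rather than how it is measured. A rough sketch of the idea, assuming each deferred task comes with a cost estimate in milliseconds (the estimates and the greedy grouping strategy are assumptions for illustration; a real scheduler would measure elapsed time instead of trusting estimates):

```python
IDLE_CHUNK_BUDGET_MS = 50  # RAIL idle budget from the table above


def chunk_idle_work(
    task_costs_ms: list[float], budget_ms: float = IDLE_CHUNK_BUDGET_MS
) -> list[list[float]]:
    """Greedily group estimated task costs into chunks that each fit the budget.

    A single task larger than the budget gets a chunk of its own; it still
    overruns and is a candidate for splitting further.
    """
    chunks: list[list[float]] = []
    current: list[float] = []
    used = 0.0
    for cost in task_costs_ms:
        # Close the current chunk if adding this task would exceed the budget.
        if current and used + cost > budget_ms:
            chunks.append(current)
            current, used = [], 0.0
        current.append(cost)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Hypothetical per-task cost estimates in milliseconds.
print(chunk_idle_work([20, 25, 10, 40, 5, 60]))
# [[20, 25], [10, 40], [5], [60]]
```

Keeping each chunk within 50ms leaves room to respond to a user input that arrives mid-chunk inside the 100ms Response budget.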

## Lab vs Field

| Mode | Tool examples | Strengths | Limitations |
|---|---|---|---|
| Lab | Lighthouse, WebPageTest, sitespeed.io | Reproducible, fast, integrable in CI | Constructed environment may not reflect real users |
| Field (RUM) | CrUX, web-vitals.js + own RUM, Sentry Performance, Datadog RUM | Authoritative; reflects real users on real devices | Slow feedback; data arrives after the regression ships |
| Synthetic | Scheduled lab runs against production from multiple regions | Trend tracking; geographic coverage | Still lab; same constructed-environment limits |

The discipline that works:

- **Lab as pre-deploy gate.** Lighthouse-CI or equivalent runs on every PR; budget breach blocks the merge. Catches regressions before they ship.
- **Field as authoritative assessment.** RUM data reports the p75 the user actually experienced. If the field metric breaches even though the lab passed, the lab is missing something: investigate and fix the gap.
- **Synthetic for trend.** Scheduled production runs from a representative set of geographies; useful for week-over-week regression spotting and for the rare case where production diverges from staging.

## Per-Route Budgets

A single site-wide budget ends up sized for either the strictest route or the loosest. The first blocks legitimate work on heavy routes; the second lets light routes regress unnoticed. A realistic budget table:

| Route profile | LCP target | INP target | JS budget | Why |
|---|---|---|---|---|
| Marketing landing | 1.5s | 200ms | 100KB compressed | Largest revenue impact per millisecond; competition is fast |
| Product detail | 2.0s | 200ms | 200KB | Catalog content + image-heavy; users have intent |
| Search results | 2.5s | 200ms | 250KB | Server work per query; users tolerate slightly more for relevance |
| Logged-in dashboard | 3.0s | 300ms | 400KB | Personalized; users have committed; richer UI |
| Admin panel | 4.0s | 500ms | 600KB | Low-traffic; high-functionality; small known audience |

The numbers above are illustrative; the actual values come from the project's content, audience, and competitive position. The shape is the load-bearing part: differentiate per route, with stricter budgets on the routes that face the cold-arriving user and looser budgets on the routes that face the committed one.
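A per-route table like the one above can be encoded as data and checked against measurements. The following is a sketch only: the route keys, the `RouteBudget` type, and the `breaches` helper are hypothetical names, and the values are the illustrative ones from the table rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteBudget:
    lcp_s: float   # p75 LCP target, seconds
    inp_ms: float  # p75 INP target, milliseconds
    js_kb: float   # JavaScript budget, kilobytes


# Illustrative values from the per-route table; route keys are hypothetical.
BUDGETS = {
    "marketing-landing": RouteBudget(1.5, 200, 100),
    "product-detail": RouteBudget(2.0, 200, 200),
    "search-results": RouteBudget(2.5, 200, 250),
    "dashboard": RouteBudget(3.0, 300, 400),
    "admin": RouteBudget(4.0, 500, 600),
}


def breaches(route: str, lcp_s: float, inp_ms: float, js_kb: float) -> list:
    """Return one message per budget dimension the measurement exceeds."""
    budget = BUDGETS[route]
    measured = {"lcp_s": lcp_s, "inp_ms": inp_ms, "js_kb": js_kb}
    return [
        f"{name}: {value} > {getattr(budget, name)}"
        for name, value in measured.items()
        if value > getattr(budget, name)
    ]
```

The same measurement (2.4s LCP, 250ms INP, 380KB of JavaScript) passes on the dashboard profile and breaches all three dimensions on the marketing landing profile, which is the differentiation the table argues for.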

## Lighthouse budgets.json

A declarative enforcement file that Lighthouse and Lighthouse-CI consume. The file is a JSON array holding one budget object per path pattern; resource sizes are in kilobytes (KiB) and timings in milliseconds:

```json
[
  {
    "path": "/",
    "resourceSizes": [
      { "resourceType": "script", "budget": 200 },
      { "resourceType": "stylesheet", "budget": 50 },
      { "resourceType": "image", "budget": 300 },
      { "resourceType": "font", "budget": 100 },
      { "resourceType": "total", "budget": 800 }
    ],
    "resourceCounts": [
      { "resourceType": "third-party", "budget": 10 },
      { "resourceType": "script", "budget": 20 }
    ],
    "timings": [
      { "metric": "interactive", "budget": 3000 },
      { "metric": "first-contentful-paint", "budget": 1500 }
    ]
  }
]
```

The file is checked into the repo, applies per path, and triggers a Lighthouse-CI failure when breached. It is the cheapest place to make a budget enforceable; the consequence (CI failure) is the property that makes it a budget rather than documentation.
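To make the "breach means failure" property concrete, the check a budget file implies can be sketched in a few lines. This is not Lighthouse's implementation; it is an illustrative evaluator, and the `over_budget` function plus the measured numbers are assumptions made for the example.

```python
import json

# A trimmed budget file in the same shape as the example above.
BUDGET_JSON = """
[
  {
    "path": "/",
    "resourceSizes": [
      {"resourceType": "script", "budget": 200},
      {"resourceType": "total", "budget": 800}
    ],
    "resourceCounts": [
      {"resourceType": "third-party", "budget": 10}
    ]
  }
]
"""


def over_budget(budget: dict, sizes_kb: dict, counts: dict) -> list:
    """Return one message per breached entry in a single budget object."""
    problems = []
    for entry in budget.get("resourceSizes", []):
        kind, limit = entry["resourceType"], entry["budget"]
        actual = sizes_kb.get(kind, 0)
        if actual > limit:
            problems.append(f"size:{kind} {actual} > {limit}")
    for entry in budget.get("resourceCounts", []):
        kind, limit = entry["resourceType"], entry["budget"]
        actual = counts.get(kind, 0)
        if actual > limit:
            problems.append(f"count:{kind} {actual} > {limit}")
    return problems


budgets = json.loads(BUDGET_JSON)
# Hypothetical measurements for one page load.
report = over_budget(budgets[0], {"script": 240, "total": 760}, {"third-party": 12})
```

A CI wrapper around such a check would exit non-zero whenever `report` is non-empty; that exit code is the consequence that turns the file into a budget.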

## The Discipline of Calibration

A budget set too tight blocks all work and gets disabled. A budget set too loose generates no signal. Calibration is the practice of setting each budget just above current measured performance, so that it passes today but any meaningful regression breaches it, and then ratcheting it tighter as optimization work lands.

**Ratchet pattern:**

1. Measure current p75 on the route.
2. Set the budget at the current value plus a small margin (5–10%).
3. Land optimization work; the budget enforces no regression.
4. When the optimization shows in field data, tighten the budget to the new p75 plus margin.
5. Repeat.

The pattern prevents the "budget is impossible from day one" failure mode and the "budget drifts forever" failure mode. Each tightening is small enough to land without negotiation; the cumulative effect over a year is substantial.
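The ratchet arithmetic is small enough to write down. A minimal sketch, assuming a 10% default margin (the text allows 5–10%) and a hypothetical `ratchet` helper that never loosens an existing budget:

```python
from typing import Optional


def ratchet(current_budget: Optional[float], measured_p75: float, margin: float = 0.10) -> float:
    """Return the next budget: measured p75 plus margin, never looser than before."""
    candidate = measured_p75 * (1 + margin)
    if current_budget is None:
        # First calibration: no previous budget to compare against.
        return candidate
    # Tighten when performance improved; otherwise keep the existing budget.
    return min(current_budget, candidate)


first = ratchet(None, 2.0)      # initial budget for a 2.0s p75
second = ratchet(first, 1.8)    # optimization landed, budget tightens
third = ratchet(second, 2.1)    # regression measured, budget does not loosen
```

Using `min` is the design choice that makes this a ratchet: a worse measurement never relaxes the budget, it breaches it.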

## Verification

After applying this skill, verify:

- [ ] Every budget statement names all four parts (metric, threshold, percentile, consequence).
- [ ] At least one budget on each of the three axes (time, size, count); single-axis budgets miss whole classes of regression.
- [ ] Field measurement (RUM, CrUX) is the authoritative source for at least the Core Web Vitals; lab measurement is the pre-deploy gate.
- [ ] Per-route budgets exist where routes have meaningfully different content profiles (landing vs dashboard, anonymous vs logged-in).
- [ ] The consequence is automated (a CI step, a deploy gate, a rollback trigger), not a manual review.
- [ ] Lighthouse budgets.json (or equivalent declarative file) is checked into the repo, version-controlled, and reviewed in the same PR as performance-affecting changes.
- [ ] INP is part of the time budget set (FID alone is insufficient for sessions with multiple interactions).
- [ ] Third-party scripts are inside the budget, not exempted; the user experiences their cost.
- [ ] The percentile is documented and matches industry convention (p75 for CWV) or has an explicit reason for differing.
- [ ] A calibration plan exists: when and how the budget tightens as performance improves.

## Do NOT Use When

| Instead of this skill | Use | Why |
|---|---|---|
| Profiling a specific slow path and deciding what to fix | `performance-engineering` | performance-engineering owns the diagnostic and optimization activity; this skill owns the contract that defines failure |
| Choosing a rendering model for a route | `rendering-models` | rendering-models bounds what budgets are achievable; this skill consumes that input |
| Designing telemetry spans, traces, or metric pipelines | `observability-modeling` | observability-modeling designs the signals; this skill defines what level of those signals counts as a breach |
| Writing tests for behavior correctness | `testing-strategy` | testing-strategy owns runtime correctness; budgets are an analogous discipline for non-functional properties |
| Designing HTTP cache headers, compression, or transport | `http-semantics` | http-semantics owns the wire-level optimization; budgets are downstream of the choice |

## Key Sources

- Google Chrome Team. [Core Web Vitals](https://web.dev/articles/vitals). Definitions, "Good / Needs Improvement / Poor" bands, and the underlying research linking metric values to user-retention proxies. The most-adopted public budget standard.
- Google Chrome Team. [Interaction to Next Paint (INP)](https://web.dev/articles/inp). The 2024 replacement for FID; explains why measuring all interactions in the session (not just the first) materially changes which sites pass.
- Google Chrome Team. [The RAIL performance model](https://web.dev/articles/rail). Older but still load-bearing; classifies interactions by latency class and assigns each a budget.
- Tim Kadlec. ["Performance Budgets"](https://timkadlec.com/2013/01/setting-a-performance-budget/). 2013. The original article naming and defining the performance-budget concept; introduces the discipline as a product practice rather than a technical metric.
- Google Chrome Team. [Lighthouse performance budgets](https://developer.chrome.com/docs/lighthouse/performance/performance-budgets-101). The budgets.json reference and CI integration.
- Chrome User Experience Report (CrUX). [CrUX dataset](https://developer.chrome.com/docs/crux). The public field dataset; the source of truth for cross-site Core Web Vitals data and the underlying corpus for the band definitions.
- Addy Osmani. ["Speed at Scale: Web Performance Tips and Tricks from the Trenches"](https://addyosmani.com/blog/). Practitioner-level writing on calibrating and enforcing budgets in production teams.
- Web Almanac (HTTP Archive). [Performance chapter](https://almanac.httparchive.org/en/2024/performance). Annual cross-site analysis; useful for competitive-derived threshold calibration.
- Microsoft. [INP at Microsoft Store: 30% improvement case study](https://blogs.windows.com/msedgedev/2023/10/06/the-edge-team-on-improving-inp/). A worked example of an INP budget driving a measurable user-experience improvement.
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: performance-engineering
|
|
3
|
+
description: "Use when measuring, diagnosing, budgeting, or improving performance: latency, throughput, Core Web Vitals, database queries, caching, bundle size, concurrency, resource use, and regression prevention. Do NOT use for telemetry schema design alone (use `observability-modeling`), error capture setup (use `error-tracking`), or premature micro-optimization without a measured bottleneck."
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: "Portable performance discipline for frontend, backend, databases, jobs, APIs, and agent tooling."
|
|
6
|
+
allowed-tools: Read Grep
|
|
7
|
+
metadata:
|
|
8
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/performance\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-11\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-11\\\\\\\"}\",\"eval_artifacts\":\"present\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"performance engineering\\\\\\\",\\\\\\\"performance budget\\\\\\\",\\\\\\\"profiling\\\\\\\",\\\\\\\"latency\\\\\\\",\\\\\\\"throughput\\\\\\\",\\\\\\\"Core Web Vitals\\\\\\\",\\\\\\\"database performance\\\\\\\",\\\\\\\"caching\\\\\\\",\\\\\\\"bundle size\\\\\\\",\\\\\\\"performance regression\\\\\\\"]\",\"examples\":\"[\\\\\\\"profile this slow dashboard and decide what to optimize first\\\\\\\",\\\\\\\"set performance budgets for API latency, page load, and query time\\\\\\\",\\\\\\\"review this change for likely N+1 queries, cache mistakes, or bundle growth\\\\\\\",\\\\\\\"design a regression check so this endpoint cannot get slow again unnoticed\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"design logs, spans, metrics, and correlation IDs before implementation\\\\\\\",\\\\\\\"set up Sentry and error redaction\\\\\\\",\\\\\\\"make random micro-optimizations without measurements\\\\\\\",\\\\\\\"write general unit tests for this feature\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"observability-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"observability-modeling designs telemetry signals; performance-engineering uses measurements to improve performance\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"error-tracking\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"error-tracking captures failures; performance-engineering handles latency, throughput, and resource efficiency\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"testing-strategy plans correctness 
tests; performance-engineering plans performance budgets and regressions\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"refactor\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"refactor restructures code while preserving behavior; performance-engineering changes behavior characteristics under measurement\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"observability-modeling\\\\\\\",\\\\\\\"api-design\\\\\\\",\\\\\\\"data-modeling\\\\\\\",\\\\\\\"testing-strategy\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"observability-modeling\\\\\\\",\\\\\\\"code-review\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":180,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/performance-engineering/SKILL.md\"}"
|
|
9
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
10
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
11
|
+
skill_graph_project: Skill Graph
|
|
12
|
+
skill_graph_canonical_skill: skills/performance-engineering/SKILL.md
|
|
13
|
+
---

# Performance Engineering

## Coverage

Measure and improve performance across frontend, backend, database, jobs, APIs, and tooling. Covers bottleneck analysis, performance budgets, Core Web Vitals, query plans, N+1 detection, caching, batching, concurrency, bundle size, resource use, regression checks, and tradeoffs.

## Philosophy

Measure first. Performance work without measurement is guessing, and guessing usually optimizes the easiest code rather than the bottleneck. The correct target is the user-visible or business-critical bottleneck with evidence.

Performance is also a contract. If speed matters, define budgets and regression checks before the system silently decays.

## Method

1. Define the performance goal and user/business impact.
2. Collect baseline measurements under realistic conditions.
3. Identify the bottleneck: network, server, database, rendering, bundle, CPU, memory, lock contention, or third party.
4. Choose the smallest intervention likely to move the bottleneck.
5. Verify improvement with the same measurement method.
6. Add a budget, alert, or regression test for the fixed surface.
7. Record tradeoffs such as freshness, complexity, cost, or cache invalidation risk.

## Evals

This skill ships a comprehension-eval artifact at [`examples/evals/performance-engineering.json`](https://github.com/jacob-balslev/skill-graph/blob/main/examples/evals/performance-engineering.json). The checklist below is the authoring gate for performance decisions; the eval file is the grader surface.

## Verification

- [ ] Baseline and post-change measurements use the same method
- [ ] The optimized target is the measured bottleneck
- [ ] User-visible or business impact is stated
- [ ] Cache changes include invalidation and staleness rules
- [ ] Database fixes include query-plan or index evidence when relevant
- [ ] Frontend fixes include bundle or Web Vitals evidence when relevant
- [ ] A regression guard exists for important performance surfaces

## Do NOT Use When

| Use instead | When |
|---|---|
| `observability-modeling` | You need to design telemetry schema and diagnostic signals. |
| `error-tracking` | You need error capture, redaction, source maps, or issue triage. |
| `testing-strategy` | You need general correctness test planning. |
| `refactor` | You are restructuring code without a measured performance goal. |
|
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: performance-testing
|
|
3
|
+
description: "Use when measuring a system's non-functional properties — latency, throughput, error rate, resource utilization — by running it under controlled load and verifying against explicit SLO thresholds. Covers the five primitives (load profile, workload, latency metric, throughput metric, SLO target), the load-shape taxonomy (smoke, load, stress, spike, soak, breakpoint), the latency-percentile vocabulary (p50, p95, p99, p99.9) and why average latency misleads, the tool ecosystem (k6, JMeter, Locust, Gatling, Vegeta), and the offline-vs-observability distinction. Do NOT use for the optimization activity itself (use `performance-engineering`), declaring the threshold contract (use `performance-budgets`), runtime measurement of deployed systems (use `observability` or `error-tracking`), microbenchmarks of single functions (language benchmark tools), chaos engineering (use `chaos-engineering`), or test-suite quality measurement (use `mutation-testing`)."
|
|
4
|
+
license: MIT
|
|
5
|
+
allowed-tools: Read Grep
|
|
6
|
+
metadata:
|
|
7
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/testing\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"performance testing\\\\\\\",\\\\\\\"load testing\\\\\\\",\\\\\\\"stress testing\\\\\\\",\\\\\\\"soak testing\\\\\\\",\\\\\\\"spike testing\\\\\\\",\\\\\\\"breakpoint test\\\\\\\",\\\\\\\"k6\\\\\\\",\\\\\\\"JMeter\\\\\\\",\\\\\\\"Locust\\\\\\\",\\\\\\\"Gatling\\\\\\\",\\\\\\\"latency percentile\\\\\\\",\\\\\\\"p95\\\\\\\",\\\\\\\"p99\\\\\\\",\\\\\\\"SLO\\\\\\\",\\\\\\\"throughput\\\\\\\"]\",\"triggers\":\"[\\\\\\\"what should our load test do\\\\\\\",\\\\\\\"p95 vs average latency\\\\\\\",\\\\\\\"k6 vs JMeter vs Locust\\\\\\\",\\\\\\\"is the system fast enough\\\\\\\",\\\\\\\"stress test or load test\\\\\\\"]\",\"examples\":\"[\\\\\\\"design a load test for an API endpoint that verifies the p95 SLO at expected production traffic\\\\\\\",\\\\\\\"decide between load, stress, and soak tests for a new service before launch\\\\\\\",\\\\\\\"diagnose a soak test failure that only appears after 4 hours — likely a leak\\\\\\\",\\\\\\\"explain why average latency is the wrong metric for user experience\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"measure production traffic latency in real time (use observability)\\\\\\\",\\\\\\\"benchmark a single function in isolation (use language benchmark tools)\\\\\\\",\\\\\\\"inject failures into a production system (use 
chaos-engineering)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"integration-test-design\\\\\\\",\\\\\\\"e2e-test-design\\\\\\\",\\\\\\\"performance-engineering\\\\\\\",\\\\\\\"performance-budgets\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"testing-strategy owns the strategic question of what to test; this skill owns one tactical technique (controlled-load measurement of non-functional properties) within that strategy.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"integration-test-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"integration-test-design owns tests of correctness across internal seams; this skill owns tests of non-functional properties (latency, throughput) under controlled load. Both can use the same environment; they answer different questions.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"e2e-test-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"e2e-test-design owns user-journey correctness tests; this skill owns load-driven measurement of those same journeys. A 'performance e2e test' is the composition of both disciplines.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"performance-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"performance-engineering owns the activity of profiling and optimizing a specific slow path once it has been identified; this skill owns the discipline of exercising the system under controlled load to discover and quantify performance behavior. 
Performance-engineering acts on bottlenecks; performance-testing produces the measurements that locate them and verifies the optimizations afterward.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"performance-budgets\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"performance-budgets owns the declaration of the threshold-and-consequence contract (metric, threshold, percentile, consequence) as a quality property; this skill owns the test mechanism that exercises the system under load and verifies whether the declared budgets hold. The two compose: budgets declare what 'fast enough' means; performance tests verify the system meets the declaration. Without a budget, a performance test produces measurements without a verdict; without performance tests, a budget is an aspirational threshold without empirical evidence.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"integration-test-design\\\\\\\",\\\\\\\"performance-budgets\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"Performance testing is to a software system what a load-bearing inspection is to a bridge — you do not certify a bridge by walking across it (functional test) and concluding it works; you drive trucks of known weight across at increasing volumes, with strain gauges on every beam, and verify the deflection stays within spec under expected traffic, that the failure mode is graceful when overloaded (cracks before collapse), that nothing creeps over a long soak. A bridge whose 'average' load it can carry is 50 tonnes but whose p99 stressor reveals harmonic resonance at 80 tonnes is the bridge that fails on a windy day.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"Performance testing is the discipline of measuring a system's non-functional properties — latency, throughput, resource utilization, error rate under load — by running the system under controlled load conditions and observing the resulting metrics. 
Where functional tests answer 'does the system produce the right output?', performance tests answer 'does the system produce the right output *quickly enough* and at *sufficient scale*, while staying within resource budgets and error tolerances?'. The unit of judgment is whether the measured metrics meet defined acceptance thresholds (typically Service-Level Objectives expressed as percentiles, e.g., 'p95 latency below 200ms at 1,000 requests per second sustained for 30 minutes'). Performance testing is *controlled* and *offline*; observability is its production-runtime counterpart that measures the live system without imposed load.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/performance-testing/SKILL.md\",\"skill_graph_export_description\":\"shortened for Agent Skills 1024-character description limit; canonical source keeps the full routing contract\",\"skill_graph_canonical_description_length\":\"1133\"}"
|
|
8
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
9
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
10
|
+
skill_graph_project: Skill Graph
|
|
11
|
+
skill_graph_canonical_skill: skills/performance-testing/SKILL.md
|
|
12
|
+
---

# Performance Testing

## Coverage

The discipline of measuring non-functional system properties (latency, throughput, error rate, resource utilization, saturation) by running the system under controlled load and verifying against explicit SLO thresholds. Covers the five primitives (load profile, workload, latency metric, throughput metric, SLO target), the six load-shape types (smoke, load, stress, spike, soak, breakpoint) that each test a different property, the latency-percentile discipline (p50/p95/p99/p99.9) that replaces the misleading average, the environment-fidelity requirement that makes results meaningful, the modern tool landscape (k6, JMeter, Locust, Gatling, Vegeta, Wrk), and the distinction from observability (offline controlled vs online real-traffic) and benchmarking (system vs single-function).

## Philosophy

Performance testing is the discipline of adding the *time and scale* dimensions to verification. Functional tests verify "the system produces the right output"; performance tests verify "the system produces the right output quickly enough and at sufficient scale, within resource budgets and error tolerances." Without performance testing, "is it fast enough" is answered by intuition or by users complaining; with performance testing, it is answered by measurement against an explicit SLO.

The central insight is that performance is multi-dimensional. Latency distribution, throughput ceiling, error rate under load, resource utilization, and saturation point are separate properties; each requires its own measurement, and each can be acceptable while others fail. Reducing performance to a single number (especially average latency) is the most common discipline failure.

The complementary insight is environment fidelity. A performance test in a non-production-like environment produces a measurement of that environment, not the system. Production hardware, network, data volumes, dependency versions, and configuration are the load-bearing investment for performance testing; the load tool itself is increasingly cheap.

## The Six Load Shapes

| Shape | Profile | Verifies | When to run |
|---|---|---|---|
| Smoke | Small load, short duration | Test harness works; system functions under minimal load | Every PR (fast) |
| Load | Expected production load × margin, sustained | System meets SLO at design load | Every merge / nightly |
| Stress | Load beyond expected capacity | Failure mode is graceful | Pre-launch / quarterly |
| Spike | Sudden large increase | Elasticity, autoscaling, recovery | Pre-launch / before known traffic events |
| Soak | Sustained moderate load for hours | No leaks, no degradation | Pre-launch / monthly |
| Breakpoint | Gradually increasing load to failure | Quantitative capacity ceiling | Pre-launch / before capacity planning |

A complete pre-launch performance test suite runs all six. An ongoing test suite usually runs smoke on every PR and load on every merge, with stress / spike / soak / breakpoint on cadence.
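The six shapes differ only in how the target rate moves over time, which a list of ramp stages captures. The sketch below is an assumption-laden illustration: the `load_profile` helper, every multiplier, and every duration are hypothetical choices, and the stage convention (ramp linearly to each target over its duration) is a common one rather than any specific tool's API.

```python
def load_profile(shape: str, expected_rps: float) -> list:
    """Return (duration_seconds, target_rps) stages; each stage ramps to its target."""
    e = expected_rps
    if shape == "smoke":
        return [(60, e * 0.01)]
    if shape == "load":
        # Ramp to expected load plus a 20% margin, hold, then ramp down.
        return [(300, e * 1.2), (1800, e * 1.2), (120, 0)]
    if shape == "stress":
        return [(300, e), (600, e * 1.5), (600, e * 2.0), (300, 0)]
    if shape == "spike":
        # Ten-second jump to five times expected load, then recovery.
        return [(120, e), (10, e * 5), (120, e * 5), (10, e), (300, e)]
    if shape == "soak":
        return [(300, e * 0.8), (4 * 3600, e * 0.8), (300, 0)]
    if shape == "breakpoint":
        # Step up by 25% of expected load every five minutes; a real run is
        # stopped by an abort threshold once the system fails.
        return [(300, e * (1 + 0.25 * step)) for step in range(13)]
    raise ValueError(f"unknown load shape: {shape}")
```

Keeping the shape as data makes it easy to review in a PR and to translate into whichever load tool the team uses.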

## Latency Percentiles: The Honest Vocabulary

| Metric | What it tells you | Use for |
|---|---|---|
| Mean (average) | Arithmetic average; easily skewed by outliers | Almost nothing; avoid as acceptance criterion |
| p50 (median) | The typical request | Sanity check; basic system health |
| p95 | The latency that 95% of requests beat; about 1 request in 20 is slower | Common SLO target |
| p99 | The latency that 99% of requests beat; about 1 request in 100 is slower | Common SLO target for user-facing systems |
| p99.9 | The latency that 99.9% of requests beat; the slow tail is rare but real | SLO target for high-stakes systems |
| Max | The single worst request | Diagnosis; avoid as SLO (one bad data point dominates) |

Acceptance criteria should always be percentiles (or distributions), never averages. A system whose mean is 50ms and p99 is 5 seconds has a user-experience problem the mean hides.
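The gap between mean and tail is easy to reproduce with a synthetic sample. This sketch uses the nearest-rank percentile definition (one of several in common use; load tools may interpolate differently) and made-up latencies in which 2% of requests take five seconds:

```python
import math
from statistics import mean


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic data: 980 fast requests and 20 very slow ones.
latencies_ms = [20.0] * 980 + [5000.0] * 20

summary = {
    "mean": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
```

Here the mean is about 120ms and both p50 and p95 are 20ms, yet p99 is 5000ms. Only the p99 row shows that one request in fifty stalls for five seconds.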

## SLO-Driven Performance Tests

A performance test without an SLO is "we measured X" without a verdict. An SLO-driven performance test is:

```
SLO statement:
  - p95 latency < 200ms
  - p99 latency < 500ms
  - error rate < 0.1%
  - sustained throughput >= 1000 RPS

Test design:
  - Load shape: constant 1000 RPS for 30 minutes
  - Workload: 70% reads, 25% writes, 5% complex queries
  - Environment: production-equivalent staging
  - Pass: all four SLO conditions met for full test duration
  - Fail: any SLO condition violated for > 60 seconds cumulative
```

The SLO is what makes the test a verification rather than a measurement.
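The pass and fail rules above can be turned into a verdict function. This is a sketch under stated assumptions: metrics are aggregated into one-second windows (the `Window` type is hypothetical), per-window percentiles stand in for whole-run percentiles, and the case the rules leave open (some violation, but 60 seconds or less in total) is reported as "marginal", which is an interpretation rather than part of the SLO statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Metrics aggregated over one second of the test (hypothetical shape)."""
    p95_ms: float
    p99_ms: float
    error_rate: float  # fraction: 0.001 means 0.1%
    rps: float


def meets_slo(w: Window) -> bool:
    """All four SLO conditions from the statement above."""
    return (
        w.p95_ms < 200
        and w.p99_ms < 500
        and w.error_rate < 0.001
        and w.rps >= 1000
    )


def verdict(windows: list, max_violation_s: int = 60) -> str:
    """Classify a run by its cumulative seconds of SLO violation."""
    violated = sum(1 for w in windows if not meets_slo(w))
    if violated == 0:
        return "pass"
    if violated > max_violation_s:
        return "fail"
    return "marginal"
```

A 30-minute run is 1800 windows; 61 violating windows anywhere in the run produce "fail", regardless of whether they are contiguous.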

## Tool Selection

| Tool | Strengths | Best for |
|---|---|---|
| k6 (Grafana Labs) | Modern JS scripting; cloud-and-local; HTTP/gRPC/WebSocket | New projects; production default |
| Apache JMeter | Broad protocol support; UI-driven; very mature | Enterprise / complex protocol mix |
| Locust | Python scripting; distributed; behavioral modeling | Python ecosystem teams |
| Gatling | Scala / Java; high throughput per generator | High-load single-generator scenarios |
| Vegeta | Go; simple HTTP; CLI-driven | Simple HTTP load with CI integration |
| Wrk / Wrk2 | C; very fast; minimal | Microbenchmarks and pure HTTP |

## Verification

After applying this skill, verify:

- [ ] Performance tests have explicit SLOs as acceptance criteria. "We measured X" without a target is not a verification.
- [ ] Acceptance criteria are stated as percentiles (p50/p95/p99/p99.9), never as averages. Tests that report only averages are misleading.
- [ ] The test environment is production-equivalent in hardware, network, data volumes, dependency versions, and configuration. Stripped environments produce informational results, not verifications.
- [ ] A range of load shapes is tested: smoke (PR-level), load (merge-level), stress / spike / soak / breakpoint (pre-launch and on cadence).
- [ ] Workload composition is derived from production traffic patterns where possible. Synthetic load with the right RPS but wrong operation mix tests the wrong code paths.
- [ ] The load tool is not the bottleneck. Distributed load generation is used where single-generator capacity might be exceeded.
- [ ] Performance tests run continuously (not just pre-launch). Regression detection is the ongoing value; one-time tests are point-in-time signals only.
- [ ] Soak tests run for hours and exercise the leak and degradation failure modes that shorter tests miss.
- [ ] Performance testing is paired with observability for production validation. The two compose; neither replaces the other.
- [ ] Stress and breakpoint tests are run to characterize failure mode and capacity ceiling. A system whose failure mode is unknown cannot be operated.

## Do NOT Use When

| Instead of this skill | Use | Why |
|---|---|---|
| Profiling and optimizing a specific slow path once located | `performance-engineering` | performance-engineering is the optimization activity itself; this skill is the load-driven measurement that locates and quantifies what to optimize |
| Declaring the threshold-and-consequence contract (metric, threshold, percentile, consequence) | `performance-budgets` | performance-budgets owns the contract; this skill verifies the contract holds under load |
| Measuring real production traffic in real time | `observability` or `error-tracking` | observability owns runtime measurement; this skill owns offline controlled measurement |
| Benchmarking a single function or implementation | language-level benchmark tools | benchmarks isolate; this skill measures the assembled system |
| Injecting failures into a running system | `chaos-engineering` (when it exists) | chaos is fault injection; this skill is load measurement |
| Testing the test suite's quality | `mutation-testing` | mutation measures test-suite effectiveness; this skill measures system performance |
| Choosing the test-level ratio | `testing-strategy` | strategy owns ratios; this skill is one technique within them |
| Testing internal seams between modules | `integration-test-design` | integration owns correctness across seams; this skill owns load behavior |
| Testing user journeys end-to-end | `e2e-test-design` | e2e owns user-perceived behavior; "performance e2e" composes both |
|
|
113
|
+
|
|
114
|
+
## Key Sources
|
|
115
|
+
|
|
116
|
+
- Grafana Labs / k6 Team. ["k6 Documentation"](https://k6.io/docs/). Canonical reference for the modern JavaScript-scriptable load testing tool; covers the six load shapes and the SLO-as-test-target discipline.
|
|
117
|
+
- Apache Software Foundation. ["Apache JMeter — User Manual"](https://jmeter.apache.org/usermanual/). Canonical reference for the most-established cross-protocol performance testing tool.
|
|
118
|
+
- Locust Team. ["Locust Documentation"](https://docs.locust.io/). Reference for the Python-scriptable distributed load testing tool.
|
|
119
|
+
- Gatling Team. ["Gatling Documentation"](https://gatling.io/docs/). Reference for the high-performance Scala-based load testing tool.
|
|
120
|
+
- Dean, J., & Norvig, P. ["Latency Numbers Every Programmer Should Know"](https://gist.github.com/jboner/2841832). Industry-canonical reference table for the latency scales that performance testing measures against.
|
|
121
|
+
- Beyer, B., Jones, C., Petoff, J., & Murphy, N. R. (2016). *Site Reliability Engineering*. O'Reilly. Google SRE book; chapter on SLOs as the contract performance tests verify against.
|
|
122
|
+
- Brendan Gregg. ["The USE Method: A method for analyzing system performance"](https://www.brendangregg.com/usemethod.html). The Utilization-Saturation-Errors framework that underlies systematic performance analysis.
|
|
123
|
+
- Tene, G. ["How NOT to Measure Latency"](https://www.youtube.com/watch?v=lJ8ydIuPFeU). The canonical talk on percentile latency, coordinated omission, and the misleading nature of average-latency reporting.
|
|
124
|
+
- Smith, C. U., & Williams, L. G. (2002). *Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software*. Addison-Wesley. Foundational reference on performance engineering methodology.
|
|
125
|
+
- Kounev, S., Lange, K.-D., & von Kistowski, J. (2020). *Systems Benchmarking: For Scientists and Engineers*. Springer. Modern reference on performance testing methodology and benchmark design.
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: printify
|
|
3
|
+
description: "Use when working with Printify — the print-on-demand REST API, catalog model (blueprints, print providers, variants, print areas), product creation and publish lifecycle to connected channels, order routing, shipping cost queries, and HMAC SHA-256 webhook verification. Do NOT use for non-Printify POD vendors, generic Shopify storefront work, or print-file (artwork) generation."
|
|
4
|
+
license: CC-BY-4.0
|
|
5
|
+
metadata:
|
|
6
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"engineering\",\"domain\":\"engineering/integrations\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-12\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-12\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"printify api\\\\\\\",\\\\\\\"print on demand\\\\\\\",\\\\\\\"printify blueprints\\\\\\\",\\\\\\\"printify print providers\\\\\\\",\\\\\\\"printify publish lifecycle\\\\\\\",\\\\\\\"printify webhooks\\\\\\\",\\\\\\\"printify variants\\\\\\\",\\\\\\\"printify shipping costs\\\\\\\",\\\\\\\"printify order routing\\\\\\\",\\\\\\\"print provider catalog\\\\\\\"]\",\"triggers\":\"[\\\\\\\"printify\\\\\\\",\\\\\\\"printify api\\\\\\\",\\\\\\\"printify webhook\\\\\\\",\\\\\\\"print on demand\\\\\\\"]\",\"examples\":\"[\\\\\\\"Create a Printify product from a blueprint + print provider + variant set and publish it to a connected Shopify store\\\\\\\",\\\\\\\"Handle a Printify order:updated webhook and reconcile fulfillment status\\\\\\\",\\\\\\\"Resolve shipping cost for a basket of Printify variants given a destination country\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"Generate the artwork PNG file that gets uploaded as a print file\\\\\\\",\\\\\\\"Implement the Shopify side of the Printify-to-Shopify sync\\\\\\\",\\\\\\\"Design a generic POD-vendor-agnostic product schema\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"shopify\\\\\\\",\\\\\\\"webhook-integration\\\\\\\",\\\\\\\"api-design\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"shopify\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"Printify publishes to Shopify (and other channels) but the Shopify-side concerns — theme display, Shopify webhooks, Admin API — belong in the shopify 
skill.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"webhook-integration\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"webhook-integration covers vendor-neutral signing/retry patterns; this skill handles Printify's specific event types and signature scheme.\\\\\\\"}]}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/printify/SKILL.md\"}"
|
|
7
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
8
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
9
|
+
skill_graph_project: Skill Graph
|
|
10
|
+
skill_graph_canonical_skill: skills/printify/SKILL.md
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Printify
|
|
14
|
+
|
|
15
|
+
## Coverage
|
|
16
|
+
The Printify REST API exposes a catalog (blueprints — abstract product templates like "Unisex Heavy Cotton Tee"; print providers — fulfillment partners who manufacture a given blueprint; variants — concrete color/size combinations a provider offers for a blueprint), shops (connected sales channels: Shopify, Etsy, WooCommerce, eBay, TikTok, custom API), products (a blueprint + print provider + variant selection + print areas + images, owned by a Printify shop), and orders (created either by sync from a connected channel or directly via API for merchant-fulfilled flows). This skill covers the relationships between these resources and the lifecycle transitions that move a product from draft to published to live on a storefront.
|
|
17
|
+
|
|
18
|
+
Authentication uses a personal access token sent in the Authorization: Bearer header. The API is versioned via the URL path (currently /v1/) and rate-limited, with a global limit plus documented limits per endpoint family; product creation is more constrained than read calls. Image uploads go through a dedicated upload endpoint that accepts either a public URL or base64 contents and returns an image ID, which subsequent product-create payloads reference.
|
|
19
|
+
|
|
20
|
+
The publish lifecycle is a two-step asynchronous operation. POST to /products marks a product as ready to publish in Printify; the actual push to the connected channel (e.g., creating the Shopify product) happens via /products/{id}/publish, which returns immediately and resolves asynchronously. The channel push can fail independently (channel auth expired, blueprint not available in the target region) and must be observed via the publishing-succeeded / publishing-failed webhook events or by polling the product publish status. Unpublish and delete have different semantics: unpublish removes the product from the channel but keeps it in Printify; delete removes both.
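
A minimal sketch of tracking publish status separately from product creation might look like the following. The event names come from the paragraph above; the `PublishTracker` class, its method names, and the way events are keyed by product ID are hypothetical and not part of any Printify SDK.

```python
from enum import Enum


class PublishState(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Event names as written in the text; the exact wire format is an assumption.
EVENT_TO_STATE = {
    "publishing-succeeded": PublishState.SUCCEEDED,
    "publishing-failed": PublishState.FAILED,
}


class PublishTracker:
    """Hypothetical helper: keeps channel-publish status apart from create status."""

    def __init__(self):
        self._states = {}

    def publish_requested(self, product_id):
        # The publish call returns immediately, so the status starts as pending.
        self._states[product_id] = PublishState.PENDING

    def on_event(self, product_id, event_type):
        # Ignore unrelated events and products that were never submitted for publish.
        new_state = EVENT_TO_STATE.get(event_type)
        if new_state is not None and product_id in self._states:
            self._states[product_id] = new_state

    def state(self, product_id):
        return self._states.get(product_id)

    def is_live(self, product_id):
        return self._states.get(product_id) is PublishState.SUCCEEDED
```

A product stays in the pending state until one of the two events arrives, which is why a missed webhook needs a polling fallback in a real integration.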
|
|
21
|
+
|
|
22
|
+
Webhooks deliver order, product, and shop events. Each delivery includes an X-Pfy-Signature header computed as HMAC SHA-256 over the raw body with the webhook secret returned on subscription. Order events (order:created, order:updated, order:sent-to-production, order:shipment:created, order:shipment:delivered, order:cancelled) carry the full order payload including line items with print provider, shipping cost, and tracking. Shipping cost can also be computed pre-purchase via /shops/{shop_id}/orders/shipping.json with a destination address and line items.
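
The signature check described above can be sketched with the Python standard library. This assumes the header value is a hex digest, optionally prefixed with `sha256=`; the source does not state the encoding, so confirm it against the Printify documentation before relying on it. The function name is hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, secret: str, header_value: str) -> bool:
    """Check an HMAC SHA-256 signature computed over the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Assumption: the header carries a hex digest, optionally prefixed with "sha256=".
    received = header_value.strip()
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    # compare_digest avoids leaking the position of the first mismatching character.
    return hmac.compare_digest(expected.encode("utf-8"), received.lower().encode("utf-8"))
```

The body must be the bytes exactly as received; re-serializing parsed JSON before hashing typically changes whitespace or key order and makes valid deliveries fail verification.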
|
|
23
|
+
|
|
24
|
+
## Philosophy
|
|
25
|
+
Printify sits between the merchant and the print provider, and its catalog reflects that — a blueprint's available variants and print areas are determined by the print provider, not by Printify. The same blueprint produced by two providers can have different color availability, different print area dimensions, and different shipping profiles. Integrations should treat blueprint + print_provider as a composite key and never assume that variants are portable across providers.
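
The composite-key idea can be shown with a small lookup table. All IDs and variant sets below are invented for illustration, and `variant_is_offered` is a hypothetical helper, not a Printify API call.

```python
from typing import Dict, FrozenSet, Tuple

# (blueprint_id, print_provider_id) is the key; a blueprint ID alone is not enough.
CatalogKey = Tuple[int, int]

# Hypothetical IDs and variant sets, for illustration only.
variants_by_key: Dict[CatalogKey, FrozenSet[int]] = {
    (6, 1): frozenset({101, 102, 103}),
    (6, 2): frozenset({101, 104}),
}


def variant_is_offered(blueprint_id: int, provider_id: int, variant_id: int) -> bool:
    """Look up a variant under the (blueprint, provider) pair, never the blueprint alone."""
    offered = variants_by_key.get((blueprint_id, provider_id), frozenset())
    return variant_id in offered
```

Here variant 102 is available for blueprint 6 from provider 1 but not from provider 2, which is the situation the paragraph warns about.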
|
|
26
|
+
|
|
27
|
+
The publish lifecycle is asynchronous and partially observable. Treat publish success as a separate state from product-create success, and reconcile via webhooks rather than optimistic UI. Costs (product cost, shipping cost) are determined at order-create time by Printify and can differ from any pre-quoted estimate; the order webhook payload is the source of truth for actual cost.
|
|
28
|
+
|
|
29
|
+
## Verification
|
|
30
|
+
- Webhook signature verification computes HMAC SHA-256 over the raw body bytes with the per-subscription secret and uses constant-time comparison against the X-Pfy-Signature header.
|
|
31
|
+
- Product creation supplies a valid (blueprint_id, print_provider_id) pair from the print provider's variant list — invalid pairs return 400 with the unsupported variant ID, not a generic error.
|
|
32
|
+
- Publish status is reconciled via the publishing-succeeded webhook before the product is marked live in the integrating system; status defaults to pending and stays there if the webhook is missed.
|
|
33
|
+
- Image uploads complete and return an image ID before the product-create call references it; race conditions return 404 on the image reference.
|
|
34
|
+
- Shipping cost queried via /shipping.json uses the same address fields the order will eventually carry; mismatched country/region codes silently route to default shipping profiles.
|
|
35
|
+
- Order webhook handling is idempotent at the consumer, because the same event ID can be delivered more than once on retry.
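
The idempotency item above can be sketched as a consumer that deduplicates on event ID. The class name is hypothetical, and the in-memory set stands in for whatever durable store a real integration would use.

```python
class IdempotentConsumer:
    """Hypothetical wrapper that runs a handler at most once per event ID."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory for the sketch; a real consumer would need a durable store.
        self._seen = set()

    def deliver(self, event_id, payload):
        """Return True if the event was processed, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed attempt can be retried.
        self._seen.add(event_id)
        return True
```

Recording the ID after the handler runs trades a small risk of double processing on a crash for never silently dropping an event; handlers should therefore be safe to repeat.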
|
|
36
|
+
|
|
37
|
+
## Do NOT Use When
|
|
38
|
+
- The print-on-demand vendor is Printful, Gelato, CustomCat, or any non-Printify provider. Each has a different catalog model, signing scheme, and publish lifecycle.
|
|
39
|
+
- The task is generating the print artwork itself (PNG/SVG creation, DPI handling, color-profile conversion). That is a graphics-generation task, not a Printify task.
|
|
40
|
+
- You are working on the Shopify side of a Printify→Shopify sync (theme display, Shopify product overrides). Use the shopify skill.
|
|
41
|
+
- The work is vendor-agnostic webhook plumbing (retry, dead-letter, signing-key rotation across many providers). Use webhook-integration.
|
|
42
|
+
- You are designing the merchant's internal order schema. Use event-contract-design.
|
|
@@ -0,0 +1,118 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: prioritization
|
|
3
|
+
description: "This skill provides prioritization frameworks for AI engineering: RICE-A (adding AI Ambiguity to RICE) for product features, ICE for research experiments, and MoSCoW for MVP/Release scoping. Use when ranking the backlog, deciding which model research path to follow, or defining the scope of a new feature. Do NOT use for one-off task sequencing (use task skill) or personal time management."
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: "Markdown, Git, agent-skill runtimes"
|
|
6
|
+
allowed-tools: Read Grep Bash
|
|
7
|
+
metadata:
|
|
8
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/doctrine\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-03-27\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-03-27\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"prioritization\\\\\\\",\\\\\\\"RICE\\\\\\\",\\\\\\\"ICE\\\\\\\",\\\\\\\"MoSCoW\\\\\\\",\\\\\\\"RICE-A\\\\\\\",\\\\\\\"AI ambiguity\\\\\\\",\\\\\\\"feature ranking\\\\\\\",\\\\\\\"research prioritization\\\\\\\",\\\\\\\"backlog management\\\\\\\",\\\\\\\"MVP scope\\\\\\\"]\",\"triggers\":\"[\\\\\\\"prioritization-skill\\\\\\\",\\\\\\\"roadmap-skill\\\\\\\",\\\\\\\"priority-planning-mode\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"constraint-awareness\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":90,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/prioritization/SKILL.md\"}"
|
|
9
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
10
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
11
|
+
skill_graph_project: Skill Graph
|
|
12
|
+
skill_graph_canonical_skill: skills/prioritization/SKILL.md
|
|
13
|
+
---
|
|
14
|
+
# Prioritization
|
|
15
|
+
|
|
16
|
+
## Domain Context
|
|
17
|
+
|
|
18
|
+
**What is this skill?** This skill provides prioritization frameworks for AI engineering: RICE-A (adding AI Ambiguity to RICE) for product features, ICE for research experiments, and MoSCoW for MVP/Release scoping. Use when ranking the backlog, deciding which model research path to follow, or defining the scope of a new feature. Do NOT use for one-off task sequencing (use task skill) or personal time management.
|
|
19
|
+
|
|
20
|
+
## Workflow
|
|
21
|
+
|
|
22
|
+
Use the ordered phases, checklists, and guardrails in the sections below as the canonical workflow for this skill. When multiple subsections describe steps, follow them in the order presented.
|
|
23
|
+
|
|
24
|
+
## Coverage
|
|
25
|
+
|
|
26
|
+
Three prioritization frameworks (RICE-A, ICE, MoSCoW), the Human vs AI Matrix for task delegation, accuracy threshold setting to prevent scope creep, and framework selection rules for matching the right framework to the current development phase.
|
|
27
|
+
|
|
28
|
+
## Philosophy
|
|
29
|
+
|
|
30
|
+
Without explicit prioritization frameworks, agents default to working on whatever seems most interesting or most recently mentioned. In AI-assisted development, this is especially dangerous because research tasks have unbounded ambiguity. The RICE-A extension adds an Ambiguity denominator that forces experimental work through a research phase before it competes with proven features for engineering time. This skill prevents the two most common prioritization failures: shipping low-confidence features ahead of proven ones, and chasing diminishing accuracy returns instead of delivering core value.
|
|
31
|
+
|
|
32
|
+
> Prioritization is the science of ranking work by expected impact vs. effort. In AI development, traditional prioritization fails because it ignores research uncertainty. Good prioritization accounts for model ambiguity while maximizing the delivery of core product value.
|
|
33
|
+
|
|
34
|
+
## 1. RICE-A Framework — Product Feature Prioritization
|
|
35
|
+
|
|
36
|
+
Use for ranking user-facing features when you have a baseline model.
|
|
37
|
+
|
|
38
|
+
$$Score = \frac{Reach \times Impact \times Confidence}{Effort \times (\frac{Ambiguity}{2})}$$
|
|
39
|
+
|
|
40
|
+
### RICE-A Definitions
|
|
41
|
+
|
|
42
|
+
| Factor | Definition | Scale |
|--------|------------|-------|
| **Reach** | Users/quarter affected | Absolute number |
| **Impact** | Contribution to core value proposition | 3 (Massive) to 0.25 (Minimal) |
| **Confidence** | Data quality & baseline model presence | 100% (Proven) to 50% (Guess) |
| **Effort** | Person-weeks (Inference + Data effort) | Number |
| **Ambiguity** | "Unknown unknowns" of model performance | 1 (Deterministic) to 5 (Highly Experimental) |
|
|
49
|
+
|
|
50
|
+
**Rule**: Ambiguity (A) sits in the denominator, so a high A score lowers the priority of experimental features until they have moved through the ICE research phase.
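
As a rough sketch, the RICE-A formula above translates directly into a small function. The function name, the input validation, and the sample feature values are illustrative assumptions; Confidence is passed as a fraction (0.8 for 80%).

```python
def rice_a_score(reach, impact, confidence, effort, ambiguity):
    """Score = (Reach * Impact * Confidence) / (Effort * (Ambiguity / 2))."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    if not 1 <= ambiguity <= 5:
        raise ValueError("ambiguity must be between 1 and 5")
    return (reach * impact * confidence) / (effort * (ambiguity / 2))


# Hypothetical feature: 1000 users/quarter, impact 2, 80% confidence, 4 person-weeks.
deterministic = rice_a_score(1000, 2, 0.8, 4, ambiguity=1)
experimental = rice_a_score(1000, 2, 0.8, 4, ambiguity=5)
```

With identical Reach, Impact, Confidence, and Effort, the feature scores 800 at Ambiguity 1 and 160 at Ambiguity 5, a fivefold penalty for the experimental version. Note that Ambiguity 2 makes the extra factor equal to 1, which reproduces the plain RICE score.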
|
|
51
|
+
|
|
52
|
+
## 2. ICE Framework — Research Prioritization
|
|
53
|
+
|
|
54
|
+
Use for ranking 10+ experiments when you are in the "Discovery" phase.
|
|
55
|
+
|
|
56
|
+
$$Score = Impact \times Confidence \times Ease$$
|
|
57
|
+
|
|
58
|
+
| Factor | Scale (1-10) | Definition |
|--------|--------------|------------|
| **Impact** | 1-10 | How much does this improve the baseline metric? |
| **Confidence** | 1-10 | How sure are we that this experiment will succeed? |
| **Ease** | 1-10 | How fast can we run this experiment (ignoring long-term COGS)? |
|
|
63
|
+
|
|
64
|
+
**Rule**: ICE is for "fail-fast" research. Prioritize the highest score to find the working model architecture before applying RICE-A for product integration.
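
A minimal sketch of ICE ranking, assuming each experiment is described by an (Impact, Confidence, Ease) tuple on the 1-10 scale. The experiment names and scores are invented for illustration.

```python
def ice_score(impact, confidence, ease):
    """Score = Impact * Confidence * Ease, each on a 1-10 scale."""
    for name, value in (("impact", impact), ("confidence", confidence), ("ease", ease)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return impact * confidence * ease


def rank_experiments(experiments):
    """Return experiment names ordered from highest to lowest ICE score."""
    return sorted(experiments, key=lambda name: ice_score(*experiments[name]), reverse=True)


# Hypothetical experiments: name -> (impact, confidence, ease).
candidates = {
    "prompt-tuning": (6, 8, 9),
    "fine-tune": (9, 5, 3),
    "retrieval": (7, 7, 6),
}
ranking = rank_experiments(candidates)
```

The scores here are 432, 135, and 294, so the cheap, likely-to-work experiment runs first even though the fine-tune has the highest Impact, which matches the fail-fast intent of the rule.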
|
|
65
|
+
|
|
66
|
+
## 3. MoSCoW Method — MVP/Release Scoping
|
|
67
|
+
|
|
68
|
+
Use for defining the "Musts" of a specific delivery milestone.
|
|
69
|
+
|
|
70
|
+
| Category | Definition | AI Example |
|----------|------------|------------|
| **Must-Have** | Non-negotiable core functionality | "Model must correctly calculate profit_cents" |
| **Should-Have** | High priority but not critical for launch | "Latency should be < 2s for 95% of queries" |
| **Could-Have** | Desirable enhancements ("Nice-to-have") | "Multi-modal image support for product matching" |
| **Won't-Have** | Out of scope for this milestone | "Chasing the final 0.1% of accuracy" |
|
|
76
|
+
|
|
77
|
+
**Rule**: Protect the team from "Accuracy Creep". Define the "Must-Have" accuracy threshold before starting implementation.
|
|
78
|
+
|
|
79
|
+
## 4. The Human vs. AI Matrix (The Gold Quadrant)
|
|
80
|
+
|
|
81
|
+
Prioritize work based on who is best suited for the task.
|
|
82
|
+
|
|
83
|
+
```text
High   | (1) AI ASSISTED        | (2) THE GOLD QUADRANT
Human  | (Research, Strategy)   | (Bulk Gen, Triage, Tests)
Effort |------------------------|--------------------------
       | (3) IGNORE             | (4) HUMAN ONLY
Low    | (Trivial Tasks)        | (Creative, High-Stakes)
       ----------------------------------------------------
         High                     Low
                      AI Effort
```
|
|
93
|
+
|
|
94
|
+
- **The Gold Quadrant**: High Human Effort / Low AI Effort. These tasks have the highest ROI for AI agents (e.g., generating 50 tests, triaging 1000 logs).
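
The matrix can be read as a two-input classifier. The sketch below assumes each task has already been judged high or low on both axes; how that judgment is made is outside the source, and the function name is hypothetical.

```python
def classify_task(human_effort_high: bool, ai_effort_high: bool) -> str:
    """Map a task to its quadrant in the Human vs. AI Matrix."""
    if human_effort_high:
        # Top row of the matrix: high human effort.
        return "AI ASSISTED" if ai_effort_high else "THE GOLD QUADRANT"
    # Bottom row of the matrix: low human effort.
    return "IGNORE" if ai_effort_high else "HUMAN ONLY"


# Generating 50 tests: tedious for a human, cheap for an agent.
bulk_tests = classify_task(human_effort_high=True, ai_effort_high=False)
```

Under these assumptions `bulk_tests` lands in the Gold Quadrant, which is the cell the verification checklist asks agents to target.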
|
|
95
|
+
|
|
96
|
+
## Verification
|
|
97
|
+
|
|
98
|
+
```text
PRIORITIZATION CHECK
====================
[ ] Framework matches the phase (ICE for Research, RICE-A for Product)
[ ] Ambiguity (A) score assigned for experimental features
[ ] Confidence score grounded in data quality (not just vibes)
[ ] MoSCoW defined for the current milestone
[ ] Accuracy threshold set (prevents Accuracy Creep)
[ ] Task sits in the "Gold Quadrant" for AI agents
```
|
|
108
|
+
|
|
109
|
+
## Do NOT Use When
|
|
110
|
+
|
|
111
|
+
| Instead of this skill | Use | Why |
|---|---|---|
| Sequencing tasks within a single sprint or session | `task` | Task skill owns execution ordering; prioritization owns which work to pick |
| Estimating effort for individual tasks | `effort` | Effort calibration is a separate concern from priority ranking |
| Competitive positioning or market strategy | `competitive-positioning` | Business strategy informs priority inputs but is not the framework itself |
| Defining product requirements or specifications | `spec-driven-development` | Prioritization ranks work; SDD defines what the work contains |
|
|
117
|
+
|
|
118
|
+
> **Source**: `REPORTS/Report_UI-UX-Thesis-Audit_Gemini-3-Flash_13-03-2026-05-15.md`
|