@skill-graph/cli 0.5.6
- package/CHANGELOG.md +247 -0
- package/LICENSE +200 -0
- package/NOTICE +62 -0
- package/README.md +398 -0
- package/SKILL_GRAPH.md +443 -0
- package/bin/skill-graph.js +374 -0
- package/docs/ADOPTION.md +117 -0
- package/docs/CONFORMANCE.md +66 -0
- package/docs/PRIMER.md +384 -0
- package/docs/QUICKSTART-30MIN.md +333 -0
- package/docs/ROUTING-METRICS.md +120 -0
- package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
- package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
- package/docs/SKILL_AUDIT_LOOP.md +195 -0
- package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
- package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
- package/docs/adr/0001-predicate-set.md +69 -0
- package/docs/adr/0002-json-ld-context.md +82 -0
- package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
- package/docs/adr/0004-persistent-identifiers.md +74 -0
- package/docs/adr/0005-freshness-consolidation.md +70 -0
- package/docs/adr/0006-revise-predicate-rename.md +105 -0
- package/docs/adr/0007-audit-loop-cadence.md +99 -0
- package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
- package/docs/category-consumers.md +168 -0
- package/docs/concept-map.md +194 -0
- package/docs/diagrams/drift-states.mmd +21 -0
- package/docs/diagrams/manifest-pipeline.mmd +25 -0
- package/docs/diagrams/routing-harness.mmd +41 -0
- package/docs/diagrams/starter-graph.mmd +53 -0
- package/docs/field-decision-guide.md +315 -0
- package/docs/field-rationale.md +211 -0
- package/docs/field-reference.generated.md +624 -0
- package/docs/field-reference.md +1426 -0
- package/docs/glossary.md +190 -0
- package/docs/head-noun-glossary.md +63 -0
- package/docs/images/audit-phases.png +0 -0
- package/docs/images/drift-states.png +0 -0
- package/docs/images/graded-mode.png +0 -0
- package/docs/images/manifest-pipeline.png +0 -0
- package/docs/images/routing-harness.png +0 -0
- package/docs/images/skill-anatomy.png +0 -0
- package/docs/images/starter-graph.png +0 -0
- package/docs/images/system-model.png +0 -0
- package/docs/integrations/github-actions.md +155 -0
- package/docs/manifest-field-mapping.md +443 -0
- package/docs/marketplace-publication-queue.generated.md +240 -0
- package/docs/marketplace-release-agent-prompt.md +82 -0
- package/docs/marketplace-skill-candidate-list.md +272 -0
- package/docs/marketplace-syndication.md +222 -0
- package/docs/migration-sample-review.md +155 -0
- package/docs/migrations/v4-to-v5.md +168 -0
- package/docs/migrations/v5-to-v6.md +221 -0
- package/docs/name-exceptions.yaml +37 -0
- package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
- package/docs/plans/multi-root-workspace.md +148 -0
- package/docs/plans/scripts-roadmap.md +107 -0
- package/docs/plans/v4-schema-bump.md +160 -0
- package/docs/plans/wave-2-extraction.md +122 -0
- package/docs/positioning-vs-marketplaces.md +175 -0
- package/docs/proposals/skill-audit-loop-positioning.md +160 -0
- package/docs/quality-doctrine.md +138 -0
- package/docs/recommended-skills.md +150 -0
- package/docs/research/skill-comprehension-eval-research.md +1830 -0
- package/docs/research/skill-retrieval-evidence.md +66 -0
- package/docs/skill-metadata-protocol.md +471 -0
- package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
- package/examples/audits/a11y/findings.md +52 -0
- package/examples/audits/a11y/scorecard.md +21 -0
- package/examples/audits/a11y/verdict.md +44 -0
- package/examples/audits/debugging/findings.md +59 -0
- package/examples/audits/debugging/scorecard.md +22 -0
- package/examples/audits/debugging/verdict.md +33 -0
- package/examples/audits/documentation/findings.md +59 -0
- package/examples/audits/documentation/scorecard.md +22 -0
- package/examples/audits/documentation/verdict.md +33 -0
- package/examples/evals/a11y.json +140 -0
- package/examples/evals/api-design.json +52 -0
- package/examples/evals/code-review.json +52 -0
- package/examples/evals/data-modeling.json +52 -0
- package/examples/evals/database-migration.json +52 -0
- package/examples/evals/debugging.json +118 -0
- package/examples/evals/dependency-architecture.json +52 -0
- package/examples/evals/design-system-architecture.json +52 -0
- package/examples/evals/error-tracking.json +52 -0
- package/examples/evals/event-contract-design.json +52 -0
- package/examples/evals/form-ux-architecture.json +52 -0
- package/examples/evals/framework-fit-analysis.json +52 -0
- package/examples/evals/graph-audit.json +139 -0
- package/examples/evals/information-architecture.json +52 -0
- package/examples/evals/interaction-feedback.json +52 -0
- package/examples/evals/interaction-patterns.json +52 -0
- package/examples/evals/layout-composition.json +52 -0
- package/examples/evals/lint-overlay.json +117 -0
- package/examples/evals/microcopy.json +52 -0
- package/examples/evals/observability-modeling.json +52 -0
- package/examples/evals/pattern-recognition.json +96 -0
- package/examples/evals/performance-engineering.json +52 -0
- package/examples/evals/refactor.json +128 -0
- package/examples/evals/semiotics.json +52 -0
- package/examples/evals/skill-infrastructure.json +96 -0
- package/examples/evals/skill-router.json +140 -0
- package/examples/evals/skill-router.routing.json +113 -0
- package/examples/evals/system-interface-contracts.json +52 -0
- package/examples/evals/task-analysis.json +52 -0
- package/examples/evals/testing-strategy.json +118 -0
- package/examples/evals/type-safety.json +249 -0
- package/examples/evals/visual-design-foundations.json +52 -0
- package/examples/evals/webhook-integration.json +52 -0
- package/examples/exports/a11y.skill-md.md +80 -0
- package/examples/exports/debugging.skill-md.md +80 -0
- package/examples/exports/refactor.skill-md.md +78 -0
- package/examples/exports/testing-strategy.skill-md.md +81 -0
- package/examples/projects/markdown-static-site/README.md +115 -0
- package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
- package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
- package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
- package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
- package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
- package/examples/projects/saas-stripe-postgres/README.md +208 -0
- package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
- package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
- package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
- package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
- package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
- package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
- package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
- package/examples/protocol/skill-metadata-template.md +301 -0
- package/examples/protocol/skills.manifest.sample.json +13245 -0
- package/examples/skill-metadata-template.md +317 -0
- package/examples/skills.manifest.sample.json +13519 -0
- package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
- package/marketplace/README.md +17 -0
- package/marketplace/skills/a11y/SKILL.md +66 -0
- package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
- package/marketplace/skills/agent-engineering/SKILL.md +386 -0
- package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
- package/marketplace/skills/ai-native-development/SKILL.md +294 -0
- package/marketplace/skills/api-design/SKILL.md +60 -0
- package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
- package/marketplace/skills/background-jobs/SKILL.md +265 -0
- package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
- package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
- package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
- package/marketplace/skills/code-review/SKILL.md +120 -0
- package/marketplace/skills/color-system-design/SKILL.md +43 -0
- package/marketplace/skills/component-architecture/SKILL.md +126 -0
- package/marketplace/skills/compression/SKILL.md +112 -0
- package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
- package/marketplace/skills/connection-pooling/SKILL.md +105 -0
- package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
- package/marketplace/skills/content-monitor/SKILL.md +209 -0
- package/marketplace/skills/context-engineering/SKILL.md +320 -0
- package/marketplace/skills/context-graph/SKILL.md +174 -0
- package/marketplace/skills/context-management/SKILL.md +174 -0
- package/marketplace/skills/context-window/SKILL.md +239 -0
- package/marketplace/skills/contract-testing/SKILL.md +120 -0
- package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
- package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
- package/marketplace/skills/data-modeling/SKILL.md +59 -0
- package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
- package/marketplace/skills/database-migration/SKILL.md +429 -0
- package/marketplace/skills/debugging/SKILL.md +67 -0
- package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
- package/marketplace/skills/design-module-composition/SKILL.md +43 -0
- package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
- package/marketplace/skills/design-thinking/SKILL.md +44 -0
- package/marketplace/skills/diagnosis/SKILL.md +296 -0
- package/marketplace/skills/diff-analysis/SKILL.md +188 -0
- package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
- package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
- package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
- package/marketplace/skills/error-boundary/SKILL.md +235 -0
- package/marketplace/skills/error-tracking/SKILL.md +261 -0
- package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
- package/marketplace/skills/evaluation/SKILL.md +113 -0
- package/marketplace/skills/event-contract-design/SKILL.md +60 -0
- package/marketplace/skills/event-storming/SKILL.md +56 -0
- package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
- package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
- package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
- package/marketplace/skills/generative-ui/SKILL.md +118 -0
- package/marketplace/skills/graph-audit/SKILL.md +81 -0
- package/marketplace/skills/guardrails/SKILL.md +118 -0
- package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
- package/marketplace/skills/http-semantics/SKILL.md +136 -0
- package/marketplace/skills/ideation/SKILL.md +41 -0
- package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
- package/marketplace/skills/information-architecture/SKILL.md +59 -0
- package/marketplace/skills/integration-test-design/SKILL.md +111 -0
- package/marketplace/skills/intent-recognition/SKILL.md +136 -0
- package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
- package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
- package/marketplace/skills/journey-mapping/SKILL.md +41 -0
- package/marketplace/skills/keywords/SKILL.md +213 -0
- package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
- package/marketplace/skills/layout-composition/SKILL.md +59 -0
- package/marketplace/skills/linguistics/SKILL.md +429 -0
- package/marketplace/skills/lint-overlay/SKILL.md +76 -0
- package/marketplace/skills/mental-models/SKILL.md +126 -0
- package/marketplace/skills/merge-queue/SKILL.md +94 -0
- package/marketplace/skills/methodology/SKILL.md +317 -0
- package/marketplace/skills/microcopy/SKILL.md +232 -0
- package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
- package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
- package/marketplace/skills/mutation-testing/SKILL.md +112 -0
- package/marketplace/skills/naming-conventions/SKILL.md +112 -0
- package/marketplace/skills/observability-modeling/SKILL.md +59 -0
- package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
- package/marketplace/skills/owasp-security/SKILL.md +153 -0
- package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
- package/marketplace/skills/performance-budgets/SKILL.md +185 -0
- package/marketplace/skills/performance-engineering/SKILL.md +58 -0
- package/marketplace/skills/performance-testing/SKILL.md +125 -0
- package/marketplace/skills/printify/SKILL.md +42 -0
- package/marketplace/skills/prioritization/SKILL.md +118 -0
- package/marketplace/skills/problem-framing/SKILL.md +41 -0
- package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
- package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
- package/marketplace/skills/prompt-craft/SKILL.md +134 -0
- package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
- package/marketplace/skills/property-based-testing/SKILL.md +100 -0
- package/marketplace/skills/prototyping/SKILL.md +43 -0
- package/marketplace/skills/query-optimization/SKILL.md +144 -0
- package/marketplace/skills/real-time-updates/SKILL.md +324 -0
- package/marketplace/skills/ref-patterns/SKILL.md +284 -0
- package/marketplace/skills/refactor/SKILL.md +65 -0
- package/marketplace/skills/rendering-models/SKILL.md +142 -0
- package/marketplace/skills/replication-patterns/SKILL.md +110 -0
- package/marketplace/skills/research-synthesis/SKILL.md +41 -0
- package/marketplace/skills/route-handler-design/SKILL.md +347 -0
- package/marketplace/skills/schema-evolution/SKILL.md +140 -0
- package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
- package/marketplace/skills/semantic-center/SKILL.md +194 -0
- package/marketplace/skills/semantic-relations/SKILL.md +250 -0
- package/marketplace/skills/semantics/SKILL.md +366 -0
- package/marketplace/skills/semiotics/SKILL.md +230 -0
- package/marketplace/skills/seo-strategy/SKILL.md +260 -0
- package/marketplace/skills/server-actions-design/SKILL.md +243 -0
- package/marketplace/skills/server-components-design/SKILL.md +190 -0
- package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
- package/marketplace/skills/shopify/SKILL.md +42 -0
- package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
- package/marketplace/skills/skill-router/SKILL.md +71 -0
- package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
- package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
- package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
- package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
- package/marketplace/skills/state-management/SKILL.md +134 -0
- package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
- package/marketplace/skills/summarization/SKILL.md +156 -0
- package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
- package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
- package/marketplace/skills/task-analysis/SKILL.md +201 -0
- package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
- package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
- package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
- package/marketplace/skills/test-driven-development/SKILL.md +96 -0
- package/marketplace/skills/testing-strategy/SKILL.md +67 -0
- package/marketplace/skills/theme-system-design/SKILL.md +43 -0
- package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
- package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
- package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
- package/marketplace/skills/type-safety/SKILL.md +177 -0
- package/marketplace/skills/typography-system/SKILL.md +43 -0
- package/marketplace/skills/usability-testing/SKILL.md +43 -0
- package/marketplace/skills/user-research/SKILL.md +43 -0
- package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
- package/marketplace/skills/version-control/SKILL.md +233 -0
- package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
- package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
- package/marketplace/skills/webhook-integration/SKILL.md +331 -0
- package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
- package/package.json +67 -0
- package/schemas/manifest.schema.json +811 -0
- package/schemas/manifest.v2.schema.json +164 -0
- package/schemas/manifest.v3.schema.json +758 -0
- package/schemas/manifest.v4.schema.json +755 -0
- package/schemas/manifest.v5.schema.json +755 -0
- package/schemas/manifest.v6.schema.json +811 -0
- package/schemas/skill.context.jsonld +279 -0
- package/schemas/skill.schema.json +919 -0
- package/schemas/skill.v2.schema.json +201 -0
- package/schemas/skill.v3.schema.json +827 -0
- package/schemas/skill.v4.schema.json +822 -0
- package/schemas/skill.v5.schema.json +830 -0
- package/schemas/skill.v6.schema.json +946 -0
- package/schemas/vocabulary/keywords.json +180 -0
- package/schemas/vocabulary/workspace_tags.json +23 -0
- package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
- package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
- package/scripts/__tests__/test-export-parser-drift.js +149 -0
- package/scripts/__tests__/test-marketplace-export.js +114 -0
- package/scripts/__tests__/test-router-paths.js +82 -0
- package/scripts/__tests__/test-stability-promotion.js +244 -0
- package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
- package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
- package/scripts/backfill-schema-version.js +198 -0
- package/scripts/build-field-reference.js +160 -0
- package/scripts/build-retrieval-baseline.js +511 -0
- package/scripts/check-markdown-links.js +211 -0
- package/scripts/check-protocol-consistency.js +979 -0
- package/scripts/export-marketplace-skills.js +610 -0
- package/scripts/export-skill.js +374 -0
- package/scripts/generate-manifest.js +787 -0
- package/scripts/lib/alias-contract.js +83 -0
- package/scripts/lib/audit-prompt-builder.js +771 -0
- package/scripts/lib/mock-grader.js +134 -0
- package/scripts/lib/parse-frontmatter.js +429 -0
- package/scripts/lib/roots.js +119 -0
- package/scripts/lint/check-archetype-sections.js +185 -0
- package/scripts/lint/check-category-enum.js +83 -0
- package/scripts/lint/check-routing-eval.js +146 -0
- package/scripts/lint/check-routing-quality.js +211 -0
- package/scripts/lint/check-stability-promotion.js +220 -0
- package/scripts/lint/format-code-frame.js +206 -0
- package/scripts/marketplace-install.js +125 -0
- package/scripts/migrate-category-to-enum.js +169 -0
- package/scripts/migrate-skill-v2-to-v3.js +424 -0
- package/scripts/migrate-skill-v3-to-v4.js +200 -0
- package/scripts/migrate-skill-v5-to-v6.js +304 -0
- package/scripts/restructure-by-category.js +85 -0
- package/scripts/seed-publication-classification.js +282 -0
- package/scripts/skill-audit.js +893 -0
- package/scripts/skill-graph-drift.js +483 -0
- package/scripts/skill-graph-route.js +766 -0
- package/scripts/skill-graph-routing-eval.js +393 -0
- package/scripts/skill-lint.js +1317 -0
- package/scripts/skill-overlap.js +213 -0
- package/scripts/verify-skill-md-export.js +201 -0
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: prompt-injection-defense
|
|
3
|
+
description: "Use when reasoning about systems that pass untrusted content to a language model: the data-vs-instruction collapse that makes this attack class a structural property of LLMs rather than a fixable bug, the direct/indirect/exfiltration/action-trigger taxonomy, the role of every untrusted surface (RAG retrievals, tool results, attachments, web content, document parsing, user-provided text), why content filters and improved system prompts do not solve it, and the defense-in-depth measures that do (capability constraint, content origin tracking, separate planning and execution stages, human-in-the-loop gates, principle-of-least-authority for tools). Do NOT use for jailbreaking and policy circumvention (use model-safety), for general API security (use api-security), for runtime input validation patterns (use type-safety + api-design), or for the protocol cycle of tool calls (use tool-call-flow)."
|
|
4
|
+
license: MIT
|
|
5
|
+
allowed-tools: Read Grep
|
|
6
|
+
metadata:
|
|
7
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/security\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"prompt injection\\\\\\\",\\\\\\\"indirect prompt injection\\\\\\\",\\\\\\\"LLM01\\\\\\\",\\\\\\\"OWASP LLM\\\\\\\",\\\\\\\"data exfiltration\\\\\\\",\\\\\\\"tool abuse\\\\\\\",\\\\\\\"untrusted content\\\\\\\",\\\\\\\"RAG injection\\\\\\\",\\\\\\\"markdown image exfiltration\\\\\\\",\\\\\\\"jailbreak\\\\\\\",\\\\\\\"instruction confusion\\\\\\\",\\\\\\\"principle of least authority\\\\\\\",\\\\\\\"dual LLM pattern\\\\\\\",\\\\\\\"prompt injection attack\\\\\\\",\\\\\\\"inject instructions into context\\\\\\\"]\",\"triggers\":\"[\\\\\\\"is this an injection vector\\\\\\\",\\\\\\\"how do we stop the model from following commands in user input\\\\\\\",\\\\\\\"the model is treating retrieved content as commands\\\\\\\",\\\\\\\"is RAG safe\\\\\\\",\\\\\\\"can the model exfiltrate data via a tool call\\\\\\\"]\",\"examples\":\"[\\\\\\\"review whether retrieved documents in a RAG pipeline can override the system prompt\\\\\\\",\\\\\\\"design the boundary between a planning agent and an execution agent so injected commands cannot trigger destructive tool calls\\\\\\\",\\\\\\\"explain why a content filter that blocks one canonical attack phrase does not stop the broader class\\\\\\\",\\\\\\\"decide what tools an agent reading email attachments may invoke without human confirmation\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"design the JSON shape of a tool's parameters (use tool-call-flow)\\\\\\\",\\\\\\\"harden an HTTP API against SQL injection or XSS (use api-security)\\\\\\\",\\\\\\\"audit a model's 
refusal behavior on disallowed content (use model-safety)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"tool-call-flow\\\\\\\",\\\\\\\"http-semantics\\\\\\\",\\\\\\\"type-safety\\\\\\\",\\\\\\\"api-design\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"tool-call-flow\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"tool-call-flow owns the protocol cycle by which a model invokes a tool; this skill owns the security property the cycle must preserve when any message carries untrusted content.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"type-safety\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"type-safety owns preventing type errors at compile time; this skill owns preventing command-execution errors at the data-vs-instruction boundary. Both are validate-at-the-boundary problems with different threat models.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"api-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"api-design owns the request/response surface contract; this skill owns the constraint that no field carrying user content may be treated as commands by a downstream model.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"http-semantics\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"http-semantics owns transport meaning (cache, idempotency, content type); this skill owns the threat that arrives over correct HTTP and is still harmful because the model interprets it as a command.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"api-design\\\\\\\",\\\\\\\"tool-call-flow\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"Prompt injection defense is to LLM-integrated systems what blast walls are to fuel depots — you cannot prevent the fuel from being flammable (the structural property), so you do not try; you build the walls so that an ignition contains itself, the radius is bounded, and the rest of the depot survives. 
The walls are the architectural defense; the model's susceptibility is the fuel's flammability — a property of its physics, not a bug to fix.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"This attack class describes systems where untrusted content placed within a language model's input causes the model to follow attacker-controlled directives instead of, or in addition to, the application's legitimate ones. It is a structural property of how transformer-based language models consume their input — every token in the context window contributes to the next-token prediction, and the model has no reliable mechanism to distinguish 'directives from the application developer' from 'directives in a document the application happens to have loaded.' Defense, therefore, is not elimination of the vulnerability but architectural containment of its blast radius.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/prompt-injection-defense/SKILL.md\"}"
|
|
8
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
9
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
10
|
+
skill_graph_project: Skill Graph
|
|
11
|
+
skill_graph_canonical_skill: skills/prompt-injection-defense/SKILL.md
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Prompt-Injection Defense
|
|
15
|
+
|
|
16
|
+
## Coverage
|
|
17
|
+
|
|
18
|
+
The architectural discipline of defending language-model-integrated systems against the attack class in which untrusted content causes the model to follow attacker-controlled directives. Covers the data-vs-directive collapse that makes this attack structural rather than incidental, the direct/indirect/action-trigger/exfiltration taxonomy, the injection surfaces (user input, RAG retrieval, tool result, attached document, multimodal image content, subagent output), why content filters and improved system prompts do not solve the class, and the defense-in-depth measures that do (capability constraint, origin tracking, dual-LLM pattern, planning/execution separation, human-in-the-loop confirmation, principle of least authority).
|
|
19
|
+
|
|
20
|
+
## Philosophy
|
|
21
|
+
|
|
22
|
+
This attack class is not a bug. It is a property of how transformer-based language models consume their context. Every token in the context window contributes to the next-token prediction, and the model has no reliable mechanism to distinguish "directives from the application developer" from "directives written by an attacker in a document the application happens to have loaded." Treating it as a bug to fix — by patching the model or improving the system prompt — buys partial reductions in attack success rate but never reaches zero.
|
|
23
|
+
|
|
24
|
+
The discipline of defense, therefore, is not to eliminate the vulnerability. It is to ensure that successful compromise does not translate to consequential action. The model can be tricked; the runtime must not be. The defenses that work are architectural: limit what tools the model exposed to untrusted content can call, separate the agent that reads untrusted content from the agent (or code) that takes action, require human confirmation for high-impact operations regardless of model intent, and track the provenance of every byte in the context window so that low-trust content cannot route to high-authority execution paths.
|
|
25
|
+
|
|
26
|
+
The wrong mental model is "build a smart fence around the model." The right mental model is "engineer the system so the model's mistakes don't matter."
|
|
27
|
+
|
|
28
|
+
## The Threat Model
|
|
29
|
+
|
|
30
|
+
| Element | Direct case | Indirect case |
|
|
31
|
+
|---|---|---|
|
|
32
|
+
| Who is the attacker | The user typing into the input | A third party who controls content the system reads |
|
|
33
|
+
| Who is the victim | The application (or the user's interest in the app's correct behavior) | The user on whose behalf the model is acting |
|
|
34
|
+
| Where the directive lives | The user-input field | A document, webpage, tool result, email, RAG entry, subagent output |
|
|
35
|
+
| Why the user wouldn't notice | The user is the attacker | The user may never even see the injected content |
|
|
36
|
+
| First demonstrated | Popularized by Riley Goodside, September 2022 | Greshake et al., "Not what you've signed up for," February 2023 |
|
|
37
|
+
|
|
38
|
+
Both threat cases have the same root cause (data-vs-directive collapse in transformers) and require the same architectural defenses, but the indirect case is the harder threat — the user is not a participant in their own compromise.
|
|
39
|
+
|
|
40
|
+
## The Defense Stack
|
|
41
|
+
|
|
42
|
+
Defenses compose. None alone is sufficient; the stack as a whole determines the system's security posture.
|
|
43
|
+
|
|
44
|
+
| Layer | What it does | Bypass class | Strength |
|
|
45
|
+
|---|---|---|---|
|
|
46
|
+
| Input filtering / blocklist | Pattern-match for known attack strings | Paraphrase, encoding, indirect content | Weak |
|
|
47
|
+
| System-prompt warning | Tell the model not to follow injected directives | Sufficiently persuasive text in the same context | Weak-to-medium |
|
|
48
|
+
| Output sanitization | Strip dangerous markdown / outbound URLs / scripts from model output | Same-origin exfiltration, encoded data | Medium for exfiltration |
|
|
49
|
+
| Structured output enforcement | Force JSON/function-call schema | Semantic compromise within valid structure | Medium for shape, weak for content |
|
|
50
|
+
| Tool authority constraint | The tools available to a low-trust agent are themselves low-impact | Compose multiple safe tools into harmful effect | Strong |
|
|
51
|
+
| Origin tracking / dual-LLM pattern | A privileged LLM never sees untrusted content; a quarantined LLM produces typed outputs the privileged one consumes | Quarantined LLM persuades the privileged one via the typed channel — needs schema rigor | Strong |
|
|
52
|
+
| Planning/execution separation | The planning model proposes; a separate execution layer enforces what is actually allowed | Bypassed only if execution policy is itself derived from model output | Strong |
|
|
53
|
+
| Human-in-the-loop confirmation | Every irreversible action requires explicit user approval | User clicks through; UX matters | Strong if UX is honest |
|
|
54
|
+
| Principle of least authority | The agent has only the credentials and scopes needed for the immediate task | Insider threat from the agent itself is the residual risk | Strong |
|
|
55
|
+
|
|
56
|
+
The OWASP Top 10 for LLM Applications (LLM01: Prompt Injection) recommends combining several of these in any deployed system.
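
As a rough illustration of how three of the strong layers in the table can compose (tool authority constraint, planning/execution separation, human-in-the-loop confirmation), here is a minimal sketch. The tool names, the `ToolPolicy` fields, and the `authorize` function are all hypothetical; the only point is that the policy is static, developer-written data and that the model contributes nothing but the name of the tool it proposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolPolicy:
    """Static policy for one tool, written by developers, never by the model."""
    allowed_with_untrusted_context: bool
    irreversible: bool


# Hypothetical tools and policies, for illustration only.
POLICY: Dict[str, ToolPolicy] = {
    "search_docs": ToolPolicy(allowed_with_untrusted_context=True, irreversible=False),
    "send_email": ToolPolicy(allowed_with_untrusted_context=False, irreversible=True),
    "delete_record": ToolPolicy(allowed_with_untrusted_context=False, irreversible=True),
}


def authorize(tool: str, context_is_untrusted: bool,
              confirm: Callable[[str], bool]) -> bool:
    """Decide whether a model-proposed tool call may run.

    The model only proposes; this function enforces. The sole model-derived
    input is the tool name, so injected text cannot rewrite the policy.
    """
    policy = POLICY.get(tool)
    if policy is None:
        return False  # unknown tools are denied by default
    if context_is_untrusted and not policy.allowed_with_untrusted_context:
        return False  # tool authority constraint
    if policy.irreversible:
        return confirm(tool)  # human-in-the-loop gate for irreversible actions
    return True
```

In this sketch `confirm` stands in for whatever approval UI the application has. Whether that UI shows the user what they are actually approving is a separate concern that the code cannot settle.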
|
|
57
|
+
|
|
58
|
+
## Injection Surfaces — Every One Is A Vector
|
|
59
|
+
|
|
60
|
+
| Surface | Risk | Mitigation |
|
|
61
|
+
|---|---|---|
|
|
62
|
+
| User-input field | Direct case | Treat as untrusted; constrain tools accordingly |
|
|
63
|
+
| RAG retrieval | Indirect via poisoned/attacker-authored documents in the corpus | Origin-tag retrieved chunks; assign them a low trust score; never let RAG content escalate authority |
|
|
64
|
+
| Tool result | Indirect via a tool that fetches third-party content (web, email body, low-trust DB rows) | Treat tool results as untrusted; constrain follow-up tool calls; do not let a tool result trigger an action the user did not authorize |
|
|
65
|
+
| Attached document (PDF, DOCX, spreadsheet) | Indirect via attachment uploaded by anyone (the user, but also a forwarded email) | Same as above; consider whether the agent reading attachments needs any tool authority |
|
|
66
|
+
| Image / multimodal | Directives encoded as text in image pixels, OCR'd by the model | Same as above; vision models susceptible to text-in-image directives |
|
|
67
|
+
| Subagent output | A compromised subagent propagates the compromise to its parent | Subagent outputs are tool results; treat as untrusted |
|
|
68
|
+
| The system prompt position | If user content gets prepended above the system prompt due to bug | Validate the message-list construction; system prompt must always be first |
|
|
69
|
+
|
|
70
|
+
The defensive question for any new feature: **what untrusted content will enter the model's context, and what tools will the model have authority to call in that turn?** If the answer to the second is anything destructive, the design needs revision.
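The defensive question above can be asked mechanically on every turn. The following is a minimal sketch, assuming a hypothetical harness in which each context item carries an origin tag and tools are pre-sorted into low- and high-impact sets; the tool names, the `Trust` levels, and `allowed_tools` are illustrative assumptions, not part of any library or of this package.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. the developer-authored system prompt
    UNTRUSTED = "untrusted"  # user input, RAG chunks, tool results, attachments


@dataclass(frozen=True)
class ContextItem:
    origin: str  # hypothetical label such as "system", "rag", "tool:web_fetch"
    trust: Trust
    text: str


# Hypothetical tool names; how tools are split by impact is an assumption.
LOW_IMPACT_TOOLS = frozenset({"search_docs", "summarize"})
HIGH_IMPACT_TOOLS = frozenset({"send_email", "delete_record"})


def allowed_tools(context: list[ContextItem]) -> frozenset[str]:
    """Return the tools the model may call in this turn.

    If any untrusted item is in context, high-impact tools are withheld
    in code rather than guarded by a prompt instruction.
    """
    if any(item.trust is Trust.UNTRUSTED for item in context):
        return LOW_IMPACT_TOOLS
    return LOW_IMPACT_TOOLS | HIGH_IMPACT_TOOLS
```

The point of the design is that the decision lives outside the model: no text in the context window can widen the returned set.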
|
|
71
|
+
|
|
72
|
+
## The Markdown-Image Exfiltration Pattern
|
|
73
|
+
|
|
74
|
+
A signature exfiltration technique against assistant-style LLMs:
|
|
75
|
+
|
|
76
|
+
1. Untrusted content the model is reading contains a directive to include, at the end of its response, a markdown image whose URL points at an attacker-controlled server with the query string containing some sensitive value from the conversation.
2. The model, attending to the directive, constructs the markdown image element with the sensitive value embedded in the URL.
3. The chat UI renders the markdown, causing the user's browser to fetch the image URL.
4. The attacker's server logs the URL, capturing the sensitive value.
|
|
80
|
+
|
|
81
|
+
The user did not click anything. They saw the assistant's reply, the image silently loaded, and the data was exfiltrated.
|
|
82
|
+
|
|
83
|
+
Mitigations:
- Strip markdown image links pointing to non-allowed origins before rendering.
- Apply Content-Security-Policy to the chat UI restricting `img-src`.
- Sanitize URLs in the rendering pipeline, in code, rather than relying on the model to leave them out of its output.
|
|
87
|
+
|
|
88
|
+
This pattern generalizes: any rendered output that can produce an outbound network request based on attacker-controlled content is an exfiltration channel.
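The first mitigation can be sketched as a filter that runs in the rendering pipeline. This is a minimal illustration, assuming inline `![alt](url)` images only and a hypothetical allowlist containing `cdn.example.com`; the function name and the replacement text are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Assumption: the only host allowed to serve images in this chat UI.
ALLOWED_IMAGE_HOSTS = frozenset({"cdn.example.com"})

# Matches inline markdown images: ![alt](url "optional title")
_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)')


def strip_untrusted_images(markdown: str) -> str:
    """Replace images whose host is not allowlisted with a text placeholder."""

    def _replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        try:
            parts = urlsplit(url)
        except ValueError:
            # Malformed URLs are treated as untrusted.
            return f"[image removed: {alt}]"
        if parts.scheme == "https" and parts.hostname in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return f"[image removed: {alt}]"

    return _IMAGE.sub(_replace, markdown)
```

The sketch does not cover reference-style images, raw HTML `<img>` tags, or ordinary links, so it is best read as one layer that sits behind an `img-src` Content-Security-Policy rather than a replacement for it.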
|
|
89
|
+
|
|
90
|
+
## The Dual-LLM Pattern
|
|
91
|
+
|
|
92
|
+
Proposed by Simon Willison (2023). Two LLMs split the work:
|
|
93
|
+
|
|
94
|
+
- **Privileged LLM**: has access to tools, secrets, and authority. Never sees untrusted content directly. Receives only typed, structured summaries from the quarantined LLM (a schema like `{ documents_summary: string, action_options: Action[], recommended: Action }`).
- **Quarantined LLM**: reads untrusted content. Has no tool authority. Its only output is into a typed schema that the privileged LLM consumes.
|
|
96
|
+
|
|
97
|
+
Even if the quarantined LLM is fully compromised (every retrieved document successfully attacks it), it can only output values into the typed schema; the harm is bounded by what an attacker can express through that schema. If the schema is small and well-designed, the bound is tight.
|
|
98
|
+
|
|
99
|
+
This is structurally analogous to a sandboxed process producing a parsed protobuf for a privileged orchestrator — the security boundary is the data shape between them, enforced by code on both sides.
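The boundary between the two models can be sketched as a strict parser on the privileged side. The example below assumes the schema shape quoted above; the `Action` members, the length cap, and `parse_quarantined_output` are hypothetical names chosen for illustration, not an API from Willison's write-up or from this package.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical closed set of actions the privileged side understands.
    ARCHIVE = "archive"
    FLAG_FOR_REVIEW = "flag_for_review"
    NO_OP = "no_op"


@dataclass(frozen=True)
class QuarantinedResult:
    documents_summary: str
    action_options: tuple[Action, ...]
    recommended: Action


MAX_SUMMARY_CHARS = 500  # assumed bound; a smaller cap means a narrower channel


def parse_quarantined_output(raw: str) -> QuarantinedResult:
    """Validate the quarantined LLM's raw output; raise ValueError on any deviation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    expected = {"documents_summary", "action_options", "recommended"}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError("output does not match the expected keys")
    summary = data["documents_summary"]
    if not isinstance(summary, str) or len(summary) > MAX_SUMMARY_CHARS:
        raise ValueError("summary is not a short string")
    if not isinstance(data["action_options"], list):
        raise ValueError("action_options is not a list")
    # Action(...) raises ValueError for anything outside the closed set.
    options = tuple(Action(value) for value in data["action_options"])
    recommended = Action(data["recommended"])
    if recommended not in options:
        raise ValueError("recommended action is not among the options")
    return QuarantinedResult(summary, options, recommended)
```

In this sketch the enum fields are tightly bounded, while the free-text `documents_summary` remains the widest part of the channel; a stricter design might show that field to the user only and never place it in the privileged model's context.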
|
|
100
|
+
|
|
101
|
+
## Verification
|
|
102
|
+
|
|
103
|
+
After applying this skill, verify:
|
|
104
|
+
- [ ] Every place untrusted content enters the model's context is named explicitly. "User input" is not the only one — RAG retrievals, tool results, attached documents, multimodal image content, and subagent outputs all qualify.
|
|
105
|
+
- [ ] The agent exposed to any untrusted content has tool authority limited to operations that cannot cause harm if maliciously invoked. Destructive tools require human-in-the-loop confirmation regardless of model intent.
|
|
106
|
+
- [ ] No defense rests solely on prompting. System-prompt warnings are present as one layer but are not the load-bearing layer.
|
|
107
|
+
- [ ] If output is rendered as HTML or Markdown, image-source and link-target origins are restricted by an allowlist or Content-Security-Policy, not by trust in the model output.
|
|
108
|
+
- [ ] If the system uses RAG, retrieved chunks are origin-tagged; the rendering or downstream-tool layer treats retrieved content as low-trust regardless of corpus provenance.
|
|
109
|
+
- [ ] If the system uses subagents, subagent outputs are treated as tool results — i.e., as untrusted content — when they re-enter the parent's context.
|
|
110
|
+
- [ ] No single tool call can both ingest untrusted content and perform a high-impact action in the same turn. The planning/execution boundary is enforced architecturally, not by prompt.
|
|
111
|
+
- [ ] An adversarial test has been run: at least one red-team pass against the system using public attack-prompt corpora (e.g., the OWASP LLM01 examples, the SPML benchmark) and a hand-written set targeting the system's specific tools and surfaces.
|
|
112
|
+
|
|
113
|
+
## Do NOT Use When
|
|
114
|
+
|
|
115
|
+
| Instead of this skill | Use | Why |
|---|---|---|
| Hardening a model against producing disallowed content (jailbreaking) | `model-safety` | jailbreaking targets the model's policy boundary on behalf of one user; this attack class targets the application's correct behavior on behalf of a victim user |
| Designing the JSON shape or parameter schema of a tool | `tool-call-flow` + `api-design` | tool-call-flow owns the model-runtime cycle; api-design owns parameter shape; this skill owns the security property they must preserve |
| Defending an HTTP API against SQL injection or XSS | `api-security` | those have hard data-vs-directive boundaries that can be fixed at the encoding layer; this skill is for the boundary-less LLM case |
| Auditing the model's accuracy or hallucination behavior | `eval-driven-development` | eval owns measurement; this skill owns the security property |
| General authn/authz for API endpoints | `api-security` | authz governs what callers may do; this skill governs what an authenticated agent may be tricked into doing |
|
|
122
|
+
|
|
123
|
+
## Key Sources
|
|
124
|
+
|
|
125
|
+
- OWASP. [LLM01: Prompt Injection — OWASP Top 10 for Large Language Model Applications (2025)](https://genai.owasp.org/llmrisk/llm01-prompt-injection/). The canonical industry-aligned threat-classification and mitigation framework.
|
|
126
|
+
- Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). ["Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"](https://arxiv.org/abs/2302.12173). The foundational academic paper on the indirect case; defines the threat model.
|
|
127
|
+
- Perez, F., & Ribeiro, I. (2022). ["Ignore Previous Prompt: Attack Techniques For Language Models"](https://arxiv.org/abs/2211.09527). Early systematic study of direct attack techniques.
|
|
128
|
+
- Willison, S. [Prompt injection: What's the worst that can happen?](https://simonwillison.net/2023/Apr/14/worst-that-can-happen/) and [The Dual LLM pattern for building AI assistants that can resist prompt injection](https://simonwillison.net/2023/Apr/25/dual-llm-pattern/). Canonical practitioner taxonomy and the dual-LLM architectural pattern.
|
|
129
|
+
- NIST. [AI Risk Management Framework (AI RMF 1.0)](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf). Section on adversarial input and the broader AI risk taxonomy; useful framing for what this attack class sits inside.
|
|
130
|
+
- Anthropic. [Mitigating jailbreaks and prompt injections](https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/mitigate-jailbreaks). Vendor-side guidance on defense in depth for Anthropic-hosted models — useful as one practitioner perspective, not as a complete defense.
|
|
131
|
+
- OWASP. [LLM02: Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm02-sensitive-information-disclosure/) and [LLM06: Excessive Agency](https://genai.owasp.org/llmrisk/llm06-excessive-agency/). Adjacent OWASP categories that compose with this one — exfiltration consequences and over-broad tool authority are the consequence side of the threat.
|
|
132
|
+
- Schulhoff, S., Pinto, J., Khan, A., et al. (2024). ["The Prompt Report: A Systematic Survey of Prompting Techniques"](https://arxiv.org/abs/2406.06608). Cross-references defensive prompting techniques within the broader prompting literature.
|
|
@@ -0,0 +1,100 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: property-based-testing
|
|
3
|
+
description: "Use when reasoning about tests that specify universal properties of code rather than specific input-output pairs: the forall(input) → property quantification, the generator/shrinker primitives that produce inputs and minimize failing cases, the recurring property shapes (invariant, oracle, round-trip, and algebraic laws such as commutativity, associativity, and idempotence), the difference between example-based tests (one input, one assertion) and property-based tests (many generated inputs, one universal claim), why property tests find bugs example tests don't, the shrinking discipline that produces minimal failing cases, and the trade-off between generator complexity and bug-finding capacity. Do NOT use for specifying one concrete behavior with one input (use example-based tests under testing-strategy), for fuzz-testing focused on crashes (use fuzz-testing), for mutation testing as a test-suite quality signal (use mutation-testing), or for model-based testing of state machines (use state-machine-modeling)."
|
|
4
|
+
license: MIT
|
|
5
|
+
allowed-tools: Read Grep
|
|
6
|
+
metadata:
|
|
7
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/testing\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"property-based testing\\\\\\\",\\\\\\\"PBT\\\\\\\",\\\\\\\"QuickCheck\\\\\\\",\\\\\\\"Hypothesis\\\\\\\",\\\\\\\"fast-check\\\\\\\",\\\\\\\"generator\\\\\\\",\\\\\\\"shrinker\\\\\\\",\\\\\\\"forall\\\\\\\",\\\\\\\"invariant\\\\\\\",\\\\\\\"round-trip property\\\\\\\",\\\\\\\"oracle property\\\\\\\",\\\\\\\"random testing\\\\\\\",\\\\\\\"generative testing\\\\\\\"]\",\"triggers\":\"[\\\\\\\"should this be a property test\\\\\\\",\\\\\\\"what's an invariant for this function\\\\\\\",\\\\\\\"the bug only happens on weird inputs\\\\\\\",\\\\\\\"QuickCheck or fast-check\\\\\\\",\\\\\\\"how do we test all possible cases\\\\\\\"]\",\"examples\":\"[\\\\\\\"design property-based tests for a sorting function that exercise the universal contract\\\\\\\",\\\\\\\"decide which functions in a parser deserve property tests vs example tests\\\\\\\",\\\\\\\"diagnose a property test that finds a real bug but the shrunk input is large — likely a poorly-shrinkable generator\\\\\\\",\\\\\\\"explain the round-trip property for an encode/decode pair\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"specify one concrete input-output case (use example-based tests; see testing-strategy)\\\\\\\",\\\\\\\"fuzz for crashes and memory safety (use fuzz-testing)\\\\\\\",\\\\\\\"measure whether tests would catch a defect (use 
mutation-testing)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"test-driven-development\\\\\\\",\\\\\\\"type-safety\\\\\\\",\\\\\\\"mutation-testing\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"testing-strategy owns the strategic question of what test levels to invest in; this skill owns one tactical technique (generative tests with universal properties) within that strategy. Property-based testing is a complement to example-based tests, not a replacement.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"mutation-testing\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"mutation-testing measures whether the test suite catches code-level defects; property-based testing is one source of high-mutation-killing tests because universal properties tend to be specific about behavior across many inputs. They compose: PBT writes the tests; mutation testing measures their effectiveness.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"type-safety\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"type-safety constrains the input space at compile time; property-based testing samples the runtime input space within the type-constraint envelope. 
Stronger types reduce the property-test surface needed; PBT is most valuable where types alone cannot encode the invariants (algorithmic correctness, business rules, round-trips).\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"mutation-testing\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"A property-based test is to an example test what an actuarial table is to a single insurance claim — the example tells you what happened in one case, the property tells you what must hold across the entire population; you do not learn that fire insurance is sound by inspecting one policy, you learn it from a contract that quantifies over every house ever insured.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"Property-based testing is a tactical technique in which a test specifies a universal property — a claim that must hold for all inputs in a domain — and a test framework generates many random inputs in the domain and checks the property holds. When the property fails on some input, the framework shrinks the input to the smallest failing case to make the bug legible. The unit of specification is a forall-quantified claim, not a single example. Properties typically take three shapes: invariants (a property the output must always have, independent of the input), oracles (the output must equal what an alternative implementation computes), and round-trips (encoding then decoding produces the original value). 
Property-based testing supplements example-based testing rather than replacing it: examples specify particular behaviors; properties specify the universal contract.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/property-based-testing/SKILL.md\"}"
|
|
8
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
9
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
10
|
+
skill_graph_project: Skill Graph
|
|
11
|
+
skill_graph_canonical_skill: skills/property-based-testing/SKILL.md
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Property-Based Testing
|
|
15
|
+
|
|
16
|
+
## Coverage
|
|
17
|
+
|
|
18
|
+
The tactical testing technique in which a test specifies a universal property — a claim that must hold for all inputs in a domain — and a framework generates many random inputs to challenge the claim, shrinking any failing input to its minimal form. Covers the four primitives (property, generator, shrinker, trial budget), the three classical property shapes (invariant, oracle, round-trip) plus algebraic and metamorphic patterns, the QuickCheck heritage and the modern tool ecosystem (Hypothesis, fast-check, ScalaCheck, proptest, jqwik), the generator/shrinker discipline for domain types, and the integration of property tests with example tests in a test suite.
|
|
19
|
+
|
|
20
|
+
## Philosophy
|
|
21
|
+
|
|
22
|
+
Property-based testing changes the unit of specification from the example to the contract. An example test says "for this input, the output should be this." A property test says "for any input matching this generator, this universal claim holds." When the contract is articulable, a property test verifies far more of the behavior than a hand-written set of examples could; when the contract is not articulable, property tests are made up after the fact and verify whatever happens to be in the generator's reach, which is not useful.
|
|
23
|
+
|
|
24
|
+
The discipline is in three places: the property (is the claim universal, falsifiable, and meaningful?), the generator (does it produce inputs that cover the space, including the edges?), and the shrinker (when a property fails, can the framework reduce the failing input to something a developer can read?). All three must be well-designed for property-based testing to deliver its promise.
|
|
25
|
+
|
|
26
|
+
Property tests are complements to example tests, not replacements. The mature pattern is properties for the universal contract and examples for the specific cases that document particular behaviors or bug fixes.
|
|
27
|
+
|
|
28
|
+
## The Three Classical Property Shapes
|
|
29
|
+
|
|
30
|
+
| Shape | Pattern | Example |
|---|---|---|
| Invariant | The output has a structural property regardless of input | `forall L. isSorted(sort(L))` |
| Oracle | The output equals an alternative implementation's output | `forall L. fastSort(L) == referenceSort(L)` |
| Round-trip | A transformation and its inverse compose to identity | `forall x. decode(encode(x)) == x` |
|
|
35
|
+
|
|
36
|
+
Plus algebraic patterns (commutativity, associativity, identity, idempotence) and metamorphic patterns (relations between related inputs). Most useful property tests use one or two of these shapes.
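The three classical shapes can be sketched without any PBT library. The example below uses only the standard library's `random` module as a stand-in for a framework's generator and trial loop; `insertion_sort`, `gen_int_list`, and `check` are hypothetical names for illustration, and a real suite would normally use Hypothesis, fast-check, or a similar library instead.

```python
import json
import random


def gen_int_list(rng: random.Random, max_len: int = 20) -> list[int]:
    """Generator: random-length lists of small ints, so empty lists and duplicates occur."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, max_len))]


def insertion_sort(xs: list[int]) -> list[int]:
    """Hypothetical system under test."""
    out: list[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def check(prop, trials: int = 200, seed: int = 0) -> None:
    """Run `prop` on `trials` generated inputs; fail on the first counterexample."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen_int_list(rng)
        assert prop(xs), f"property failed for {xs!r}"


def prop_sorted(xs: list[int]) -> bool:
    """Invariant: the output is ordered, whatever the input."""
    ys = insertion_sort(xs)
    return all(a <= b for a, b in zip(ys, ys[1:]))


def prop_matches_reference(xs: list[int]) -> bool:
    """Oracle: the output equals what a trusted implementation computes."""
    return insertion_sort(xs) == sorted(xs)


def prop_json_round_trip(xs: list[int]) -> bool:
    """Round-trip: decoding an encoded value yields the original."""
    return json.loads(json.dumps(xs)) == xs


for prop in (prop_sorted, prop_matches_reference, prop_json_round_trip):
    check(prop)
```

Note that the invariant alone would accept a sort that returns an empty list; the oracle closes that gap, which is one reason properties are usually combined.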
|
|
37
|
+
|
|
38
|
+
## The Generator and Shrinker
|
|
39
|
+
|
|
40
|
+
A property test has two halves: the property and the generator. A well-designed generator produces inputs that exercise the input space — including edge cases (empty, single-element, boundary values), valid cases, and invalid cases as appropriate. A poorly-designed generator misses bug-triggering inputs and the property "passes" by not having been challenged.
|
|
41
|
+
|
|
42
|
+
The shrinker reduces a failing input to its minimal form. Without shrinking, a failing input is random noise. With shrinking, a failing input is the smallest demonstration of the bug — usually small enough that the bug is visible at a glance.
|
|
43
|
+
|
|
44
|
+
| Domain | Library-provided generator | Library-provided shrinker | Custom shrinker need |
|---|---|---|---|
| Primitive types (int, float, string, bool) | Yes | Yes | No |
| Collections (list, map, set) | Yes | Yes | No |
| Optional, Either, Result | Yes | Yes | No |
| Tuples and products | Yes | Yes | No |
| Custom domain types | Compose from primitives | Shrinks to component primitives | Sometimes, when the domain has invariants |
| Grammar-based (valid JSON, SQL, email) | Some libraries provide | May need custom | Often yes |
| Stateful (sequences of operations) | Yes (PBT state-machine modes) | Yes | Custom for domain operations |
|
|
53
|
+
|
|
54
|
+
## Property vs Example — Where Each Fits
|
|
55
|
+
|
|
56
|
+
| Code character | Property test | Example test |
|---|---|---|
| Sorting, searching, data structure operations | Strong fit | For regression of specific bugs |
| Parsing / unparsing, encoding / decoding | Strong fit (round-trip) | For known-input regression |
| Pure mathematical functions | Strong fit | Rarely needed |
| Business rules with combinatorial input | Strong fit | For specific edge cases |
| API request/response round-trips | Strong fit | For known good/bad cases |
| Functions defined by a small list of cases | Weak fit (property unclear) | Strong fit |
| Bug regression for a specific known input | Weak fit | Strong fit |
| Code with side effects / I/O | Adaptable via stateful PBT | Often easier |
|
|
66
|
+
|
|
67
|
+
A test suite typically mixes both; the right ratio depends on how much of the code has articulable universal contracts.
|
|
68
|
+
|
|
69
|
+
## Verification
|
|
70
|
+
|
|
71
|
+
After applying this skill, verify:
|
|
72
|
+
- [ ] Every property test asserts a *universal* claim, not a specific input-output. A test that runs `forall input in [single_value]` is an example test in property-test clothing.
|
|
73
|
+
- [ ] Each property fits one of the recognized shapes (invariant, oracle, round-trip, algebraic, metamorphic) or has a clearly-stated rationale for being a different shape.
|
|
74
|
+
- [ ] Generators for domain types are composed from library primitives; custom generators ship with custom shrinkers when default shrinking produces unintuitive minimal cases.
|
|
75
|
+
- [ ] Generator design produces both common and edge-case inputs. Generators that produce only "typical" inputs miss bug-triggering edge cases by construction.
|
|
76
|
+
- [ ] When a property fails, the reported failing input is the shrunk minimum — not a 500-element random list. If the shrunk input is large, the generator/shrinker pair needs work.
|
|
77
|
+
- [ ] Property tests are complemented by example tests for specific cases (bug regressions, documented behaviors). The suite is not property-only.
|
|
78
|
+
- [ ] Trial budgets are appropriate to the property: around 100 inputs for cheap properties; 1000+ where bug-triggering inputs are rare; nightly long-running campaigns for production-critical properties or for code too slow to run a large budget on every commit.
|
|
79
|
+
- [ ] PBT is applied where the contract is articulable. Functions that are genuinely defined by a list of cases are not force-fit to property tests.
|
|
80
|
+
|
|
81
|
+
## Do NOT Use When
|
|
82
|
+
|
|
83
|
+
| Instead of this skill | Use | Why |
|---|---|---|
| Specifying one concrete input-output behavior | example tests (see `testing-strategy`) | examples specify particular behaviors; this skill specifies universal contracts |
| Generating random inputs to find crashes or memory-safety bugs | `fuzz-testing` | fuzzing asserts no-crash implicitly; this skill asserts an explicit property |
| Measuring whether the test suite catches deliberately-introduced defects | `mutation-testing` | mutation measures test-suite quality; PBT is one technique for producing high-mutation-killing tests |
| Constructing a state-machine model of the system under test | `state-machine-modeling` (when it exists) | state-machine modeling is its own discipline; stateful PBT is a related but narrower technique |
| Choosing test levels (unit/integration/e2e) | `testing-strategy` | testing-strategy owns level choices; this skill is a tactical technique within them |
|
|
90
|
+
|
|
91
|
+
## Key Sources
|
|
92
|
+
|
|
93
|
+
- Claessen, K., & Hughes, J. (2000). ["QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs"](https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf). *ICFP 2000*. The foundational paper introducing property-based testing as a discipline; the QuickCheck library it describes is the ancestor of every modern PBT library.
|
|
94
|
+
- Hughes, J. (2007). ["QuickCheck Testing for Fun and Profit"](https://link.springer.com/chapter/10.1007/978-3-540-69611-7_1). *PADL 2007*. The canonical practitioner-oriented introduction to property-based testing patterns.
|
|
95
|
+
- Hughes, J., et al. (2016). ["Mysteries of Dropbox: Property-Based Testing of a Distributed Synchronization Service"](https://ieeexplore.ieee.org/document/7515466). *ICST 2016*. Industrial case study showing PBT's effectiveness on a real distributed system.
|
|
96
|
+
- MacIver, D. R. ["Hypothesis — How does this work?"](https://hypothesis.works/articles/how-hypothesis-works/) and ["The shrinker is the Hypothesis"](https://hypothesis.works/articles/shrinking/). Modern PBT shrinker design; the strongest current open-source implementation.
|
|
97
|
+
- fast-check team. ["fast-check Documentation — Properties"](https://fast-check.dev/docs/core-blocks/properties/). Reference for the most-adopted PBT library in the JavaScript ecosystem.
|
|
98
|
+
- Arts, T., Hughes, J., Johansson, J., & Wiger, U. (2006). ["Testing telecoms software with Quviq QuickCheck"](https://dl.acm.org/doi/10.1145/1159789.1159792). *Erlang Workshop 2006*. Industrial PBT case study; a foundational reference for stateful property testing.
|
|
99
|
+
- Lampropoulos, L., Gallois-Wong, D., Hritcu, C., Hughes, J., Pierce, B. C., & Xia, L. (2017). ["Beginner's Luck: A Language for Property-Based Generators"](https://dl.acm.org/doi/10.1145/3009837.3009868). *POPL 2017*. Research on the generator design problem: writing generators as predicates that describe valid inputs.
|
|
100
|
+
- Fink, G., & Bishop, M. (1997). ["Property-based testing: a new approach to testing for assurance"](https://dl.acm.org/doi/10.1145/263244.263267). *ACM SIGSOFT Software Engineering Notes*, 22(4), 74-80. Earlier related thread on properties as a testing discipline.
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: prototyping
|
|
3
|
+
description: "Use when building an artifact whose purpose is to answer a specific question — paper sketch, wireframe, clickable mockup, wizard-of-oz, role-play, service prototype, or code spike — at the lowest fidelity sufficient to produce that learning. Do NOT use for production-grade component construction, design-system contribution, or building the actual ship-ready feature — those are design-module-composition and engineering implementation."
|
|
4
|
+
license: CC-BY-4.0
|
|
5
|
+
metadata:
|
|
6
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"design\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-12\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-12\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"paper prototype\\\\\\\",\\\\\\\"low fidelity prototype\\\\\\\",\\\\\\\"clickable prototype\\\\\\\",\\\\\\\"wizard of oz\\\\\\\",\\\\\\\"role play prototype\\\\\\\",\\\\\\\"service prototype\\\\\\\",\\\\\\\"code spike\\\\\\\",\\\\\\\"learning goal\\\\\\\",\\\\\\\"fidelity matching\\\\\\\",\\\\\\\"throwaway prototype\\\\\\\",\\\\\\\"sacrificial concept\\\\\\\",\\\\\\\"prototype to learn\\\\\\\",\\\\\\\"rough and right\\\\\\\"]\",\"triggers\":\"[\\\\\\\"prototype this\\\\\\\",\\\\\\\"wizard of oz\\\\\\\",\\\\\\\"paper prototype\\\\\\\",\\\\\\\"clickable mockup\\\\\\\",\\\\\\\"what fidelity\\\\\\\"]\",\"examples\":\"[\\\\\\\"Pick the right fidelity for a prototype that tests whether users will trust an AI-suggested category.\\\\\\\",\\\\\\\"Plan a wizard-of-oz study where a human acts as the recommendation engine.\\\\\\\",\\\\\\\"Sketch a role-play prototype for a service-desk interaction before any UI is built.\\\\\\\",\\\\\\\"Decide between a paper prototype and a Figma clickable for this onboarding test.\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"Build the production React component for the new dashboard widget.\\\\\\\",\\\\\\\"Add this component to the design system library.\\\\\\\",\\\\\\\"Write the migration script for the production database.\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"ideation\\\\\\\",\\\\\\\"usability-testing\\\\\\\",\\\\\\\"design-thinking\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"design-module-composition\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"design-module-composition produces durable 
design-system components meant to ship and be reused. prototyping produces disposable artifacts whose only purpose is learning — different lifecycle, different quality bar, different audience.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"interaction-patterns\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"interaction-patterns is a reference catalog of established UI behaviors. prototyping is the activity of building a thing to test a question — it may use interaction patterns but is not itself a pattern library.\\\\\\\"}]}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/prototyping/SKILL.md\"}"
|
|
7
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
8
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
9
|
+
skill_graph_project: Skill Graph
|
|
10
|
+
skill_graph_canonical_skill: skills/prototyping/SKILL.md
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Prototyping
|
|
14
|
+
|
|
15
|
+
## Coverage
|
|
16
|
+
Prototyping covers the practice of constructing artifacts whose primary purpose is to **answer a question** the team has written down. The fidelity ladder runs from **paper sketches** (fastest, cheapest, best for early flow and concept testing) through **wireframes**, **clickable prototypes** (Figma, Framer, similar), **wizard-of-oz** prototypes (a human secretly performs the function the system will eventually automate — Kelley 1984), **role-play / bodystorming** (the team physically acts out a service interaction), **service prototypes** (props and staged environments for service-design questions), and up to **code spikes** (throwaway working code that answers a feasibility question).
|
|
17
|
+
|
|
18
|
+
The central skill is **fidelity matching**: choosing the lowest fidelity that can credibly answer the learning question. Paper prototypes can answer "is this flow understandable?" but not "is the typography readable?"; a clickable prototype can answer "do users find the primary action?" but not "does this feel fast under load?"; only a code spike can answer "will this API rate-limit us at scale?". Building higher fidelity than the question requires wastes time and prematurely anchors stakeholders on visual decisions.
|
|
19
|
+
|
|
20
|
+
A complementary skill is **the learning goal contract**: every prototype begins with one or two written questions it is built to answer, and a definition of what evidence would count as an answer in either direction. Without this, prototypes drift into "let's just make it look nice" and the testing session that follows produces ambiguous results because nobody agreed in advance what they were looking for.
|
|
21
|
+
|
|
22
|
+
The practice also covers **sacrificial concepts** — deliberately rough or extreme prototypes whose purpose is to provoke a reaction, not to be defended. IDEO and the Stanford d.school both teach using disposable artifacts to draw out user preferences that would not surface in abstract conversation.

## Philosophy
Prototyping rejects the instinct to polish before showing. Polish signals finality; polish makes stakeholders evaluate fit-and-finish instead of concept; polish makes users reluctant to criticize. A rougher prototype invites honest reaction. The famous IDEO maxim "if a picture is worth a thousand words, a prototype is worth a thousand meetings" captures the substitution effect — but only if the prototype is cheap enough that a team can build three and throw two away.

The discipline insists prototypes are means, not ends. A successful prototype is one that produced a clear answer, even if the answer is "this concept doesn't work" — perhaps especially then, because that finding came at the price of a prototype rather than a launched feature. Teams that judge prototypes by their visual quality have inverted the value system; teams that judge them by what was learned have it right.

## Verification
- The prototype has a written learning question, agreed before construction began, and a definition of what evidence would answer it.
- The fidelity matches the learning goal — the team can defend why this fidelity was chosen and what a higher- or lower-fidelity version would have cost or gained.
- The prototype is **disposable** in the team's mind — there is no implicit commitment that this code/file/sketch will become the production artifact.
- The construction time is small relative to the cost of being wrong about the underlying concept — if a prototype took two weeks to test a one-week assumption, the fidelity was probably too high.
- The prototype is testable: a real participant can interact with it (or watch it being acted out) and produce a meaningful reaction, not just nod politely.
- The team has a plan for what happens after testing — either iterate, escalate fidelity, or kill the concept — written down before testing begins.

## Do NOT Use When
- The artifact will ship to real users in production — that is engineering implementation, not prototyping; even a "high-fi prototype" that ships is a product.
- The component is meant for reuse across many features and contexts — use **design-module-composition** to contribute to the design system.
- No learning question has been articulated — return to **problem-framing** or **ideation** to clarify what the prototype would even test.
- The team needs to evaluate an existing artifact with users — use **usability-testing** directly; no new prototype is required.
- The question is purely technical performance, scaling, or infrastructure — use an engineering spike with appropriate measurement instrumentation, not a design prototype.
- The output is a reference catalogue of established UI behaviors — use **interaction-patterns**.
@@ -0,0 +1,144 @@
---
name: query-optimization
description: "Use when diagnosing and tuning a specific slow query in a relational database: the query planner mental model (parse → rewrite → plan → execute), the canonical inputs the planner reasons over (statistics, cost model, available indexes), reading EXPLAIN and EXPLAIN ANALYZE output, the catalog of plan-node types (sequential scan, index scan, index-only scan, bitmap heap scan, nested loop, hash join, merge join, sort, hash aggregate, materialize) and what each tells you about the query's actual cost, the difference between query rewriting (reformulating the SQL) and operational fixes (adding indexes, ANALYZE, statistics targets), and the diagnostic procedure that takes a slow query to a fast one. Do NOT use for the design of which indexes to maintain (use indexing-strategy), schema design (use data-modeling), distributed-data partitioning (use sharding-strategy), or isolation-level decisions (use transaction-isolation)."
license: MIT
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"engineering\",\"domain\":\"engineering/data\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"query optimization\\\\\\\",\\\\\\\"query planner\\\\\\\",\\\\\\\"EXPLAIN\\\\\\\",\\\\\\\"EXPLAIN ANALYZE\\\\\\\",\\\\\\\"sequential scan\\\\\\\",\\\\\\\"index scan\\\\\\\",\\\\\\\"nested loop\\\\\\\",\\\\\\\"hash join\\\\\\\",\\\\\\\"merge join\\\\\\\",\\\\\\\"query rewriting\\\\\\\",\\\\\\\"statistics\\\\\\\",\\\\\\\"planner cost model\\\\\\\",\\\\\\\"cardinality estimation\\\\\\\"]\",\"triggers\":\"[\\\\\\\"this query is slow\\\\\\\",\\\\\\\"EXPLAIN ANALYZE output\\\\\\\",\\\\\\\"why isn't the planner using the index\\\\\\\",\\\\\\\"join order optimization\\\\\\\",\\\\\\\"subquery vs join\\\\\\\"]\",\"examples\":\"[\\\\\\\"diagnose a query that takes 8 seconds and identify the plan node responsible\\\\\\\",\\\\\\\"rewrite a slow correlated subquery as a join to enable a faster plan\\\\\\\",\\\\\\\"explain why ANALYZE on a recently-changed table can change the planner's decisions\\\\\\\",\\\\\\\"decide whether to add an index, rewrite the query, or accept the cost\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"design which indexes to maintain on a new schema (use indexing-strategy)\\\\\\\",\\\\\\\"choose a database schema (use data-modeling)\\\\\\\",\\\\\\\"decide isolation level for a workload (use 
transaction-isolation)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"indexing-strategy\\\\\\\",\\\\\\\"data-modeling\\\\\\\",\\\\\\\"transaction-isolation\\\\\\\",\\\\\\\"schema-evolution\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"indexing-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"indexing-strategy owns the design of which indexes the database has; this skill owns the diagnosis and tuning of specific slow queries. The two compose: query-optimization diagnoses; indexing-strategy is one of the response tools.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"data-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"data-modeling owns schema and access-pattern design; this skill owns the tuning of queries against the existing schema. Sometimes the diagnosis is 'the schema is wrong for this query'; the response is then in data-modeling's scope.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"transaction-isolation\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"transaction-isolation owns concurrency correctness; this skill owns single-query performance. Sometimes a slow query is slow because of lock contention from isolation; the disciplines intersect on those cases.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"indexing-strategy\\\\\\\",\\\\\\\"data-modeling\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"Query optimization is to a slow SQL query what a medical specialist's chart-reading is to a slow-recovering patient — you do not prescribe before reading the lab values; the plan reads like a chart, every plan node is a vital sign, every cardinality misestimate is a misdiagnosis the planner already made, and your job is to translate the chart into the right intervention rather than the most familiar one.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"Query optimization is the discipline of diagnosing and tuning a specific slow query. 
The unit of work is one query that the application or a user reported as slow; the goal is identifying the root cause and applying a response (rewrite the SQL, add an index, refresh statistics, denormalize the schema, change the access pattern). The mental model is the query planner — the database component that takes the SQL string, parses it, applies rewrites, considers possible plans, estimates each plan's cost using statistics and a cost model, picks the cheapest, and executes it. EXPLAIN and EXPLAIN ANALYZE expose the planner's choices and their actual cost; reading these is the central diagnostic skill. The work is largely interpretation: knowing what each plan-node type means, what its cost implies, what cardinality misestimation looks like, and which response (rewrite, index, statistics, schema) addresses the diagnosis.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/query-optimization/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v5
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/query-optimization/SKILL.md
---

# Query Optimization

## Coverage

The discipline of diagnosing and tuning specific slow queries by reading the database planner's chosen plan, identifying the root cause, and applying the right response. Covers the query planner's phases (parse → rewrite → plan → execute), the cost model and statistics that drive plan choice, the plan-node catalog (Seq Scan, Index Scan, Index Only Scan, Bitmap Heap Scan, Nested Loop, Hash Join, Merge Join, Sort, Hash Aggregate, Materialize, CTE Scan), the EXPLAIN and EXPLAIN ANALYZE diagnostic vocabulary, the response catalog (rewrite, add index, refresh statistics, denormalize, materialized view, isolation change, settings, application cache, accept cost), and the root-cause taxonomy that links diagnosis to response.

## Philosophy

Query optimization is largely a reading discipline. The planner exposes its decisions and the database reports the actual cost; the work is reading EXPLAIN ANALYZE output, knowing what each node means, and choosing the right response. A team that reaches for "add an index" before reading the plan is guessing; a team that reads the plan finds the actual issue and applies the right response — often *not* adding an index.

The most common root cause is cardinality misestimation: the planner estimates that 100 rows will match when 10 million actually do, so the plan it chose for 100 rows is wrong for 10 million. ANALYZE, raised statistics targets, and (occasionally) extended statistics objects are where most diagnoses live. Adding an index without fixing the statistics problem usually brings no improvement, because the planner still misjudges how many rows each access path will touch.

The discipline distinguishes diagnosis from response. The diagnosis is "what is the root cause of this query's slowness." The response is "what to do about it." The response catalog has many entries; matching diagnosis to the right one is the work.
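
One way the cardinality misestimation described in this section can arise: without multi-column (extended) statistics, a planner that multiplies per-column selectivities is treating the columns as independent. The sketch below reproduces that arithmetic on synthetic data. The table contents, the column names (`city`, `zip_code`), and the row counts are hypothetical, and the calculation is a simplification of a real estimator, not PostgreSQL's actual code.

```python
import random


def independence_estimate(rows, pred_a, pred_b):
    """Estimate matches as total * selectivity(a) * selectivity(b),
    i.e. as if the two predicates were statistically independent."""
    total = len(rows)
    sel_a = sum(1 for r in rows if pred_a(r)) / total
    sel_b = sum(1 for r in rows if pred_b(r)) / total
    return total * sel_a * sel_b


def actual_count(rows, pred_a, pred_b):
    """Count the rows that really satisfy both predicates."""
    return sum(1 for r in rows if pred_a(r) and pred_b(r))


# Hypothetical data in which zip_code fully determines city,
# so the two predicates below are perfectly correlated.
random.seed(0)
rows = []
for _ in range(100_000):
    c = random.randrange(100)
    rows.append({"city": f"city_{c}", "zip_code": 10_000 + c})


def is_city(r):
    return r["city"] == "city_7"


def is_zip(r):
    return r["zip_code"] == 10_007


estimate = independence_estimate(rows, is_city, is_zip)
actual = actual_count(rows, is_city, is_zip)
print(f"estimated rows: {estimate:.0f}, actual rows: {actual}")
```

With these inputs the independence estimate lands roughly two orders of magnitude below the real count, which is the "estimate ≪ actual" shape. In PostgreSQL, `CREATE STATISTICS` is the mechanism for telling the planner about such column dependencies.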

## The Diagnostic Procedure

```
┌───────────────────────────┐
│ 1. Run EXPLAIN ANALYZE    │
└───────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│ 2. Find the most-expensive│
│    plan node (highest     │
│    actual time)           │
└───────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│ 3. Compare estimated rows │
│    vs actual rows         │
└───────────────────────────┘
             │
             ▼
┌───────────────────────────────────────────────────────┐
│ 4. Choose response based on diagnosis:                │
│    estimate ≫ actual → predicate is more selective    │
│                          than planner thinks; may need│
│                          extended statistics or       │
│                          query rewrite                │
│    estimate ≪ actual → ANALYZE; raise statistics      │
│                          target; consider extended    │
│                          statistics                   │
│    estimate ≈ actual but slow → access path is wrong; │
│                          maybe add index, change      │
│                          join order, rewrite          │
└───────────────────────────────────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│ 5. Apply response and     │
│    re-EXPLAIN ANALYZE     │
└───────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│ 6. Iterate until target   │
│    is met                 │
└───────────────────────────┘
```
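
Steps 2 to 4 of the procedure above can be sketched against the JSON that PostgreSQL emits for `EXPLAIN (ANALYZE, FORMAT JSON)`. This is a minimal illustration, not a complete analyzer: the sample plan and its numbers are hypothetical, the factor-of-10 threshold is an arbitrary assumption, and the exclusive-time arithmetic ignores complications such as parallel workers.

```python
def walk(node):
    """Yield every node in the plan tree, parents before children."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def inclusive_ms(node):
    # "Actual Total Time" is a per-loop average, so scale by the loop count.
    return node["Actual Total Time"] * node["Actual Loops"]


def exclusive_ms(node):
    """Approximate time spent in the node itself, excluding its children."""
    children = sum(inclusive_ms(c) for c in node.get("Plans", []))
    return max(inclusive_ms(node) - children, 0.0)


def classify(node, factor=10):
    """Label the estimate-vs-actual relationship (threshold is an assumption)."""
    est, act = node["Plan Rows"], node["Actual Rows"]
    if est >= factor * max(act, 1):
        return "estimate >> actual"
    if act >= factor * max(est, 1):
        return "estimate << actual"
    return "estimate ~ actual"


def diagnose(explain_json):
    """Return (node type, estimate label) for the most expensive node."""
    root = explain_json[0]["Plan"]
    worst = max(walk(root), key=exclusive_ms)
    return worst["Node Type"], classify(worst)


# Hypothetical plan, shaped like EXPLAIN (ANALYZE, FORMAT JSON) output.
SAMPLE = [{"Plan": {
    "Node Type": "Hash Join", "Plan Rows": 120, "Actual Rows": 4_800_000,
    "Actual Total Time": 1200.0, "Actual Loops": 1,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Plan Rows": 120, "Actual Rows": 4_800_000,
         "Actual Total Time": 1000.0, "Actual Loops": 1},
        {"Node Type": "Hash", "Plan Rows": 500, "Actual Rows": 500,
         "Actual Total Time": 50.0, "Actual Loops": 1,
         "Plans": [
             {"Node Type": "Seq Scan", "Relation Name": "customers",
              "Plan Rows": 500, "Actual Rows": 500,
              "Actual Total Time": 40.0, "Actual Loops": 1},
         ]},
    ],
}}]

print(diagnose(SAMPLE))
```

For the sample plan the sequential scan on `orders` dominates, and its row estimate is far below the actual count, which points toward the "ANALYZE; raise statistics target" branch of step 4.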

## The Plan-Node Catalog

| Plan node | What it does | Read as |
|---|---|---|
| Seq Scan | Read every row of a table | Slow for large tables; the planner chose this for a reason — check selectivity |
| Index Scan | Use index to find rows; fetch row | Standard fast access for selective predicates |
| Index Only Scan | Use covering index; no row fetch | The cheapest read access; index includes all needed columns |
| Bitmap Heap Scan | Build bitmap from indexes; fetch matching rows | Good for combining multiple conditions (AND/OR); less random I/O than a plain index scan over many rows |
| Nested Loop | For each outer row, scan inner | Fast for small outer or when inner has an index; bad for large outer |
| Hash Join | Build hash from one side; probe with other | Standard fast equi-join for medium-large inputs |
| Merge Join | Merge two sorted inputs | Fast when both are pre-sorted on join key (often after Sort or matching indexes) |
| Sort | Sort rows | Expensive at scale; consider an index or ORDER BY-less query |
| Hash Aggregate | Group-by via hash table | Fast for moderate cardinality groups |
| Group Aggregate | Group-by on sorted input | Used after Sort or matching index |
| Materialize | Materialize intermediate for repeated access | Inserted by planner when sub-result is reused |
| CTE Scan | Scan CTE result | CTE may be an optimization fence in older Postgres (before version 12) |
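
The "Read as" guidance for Nested Loop and Hash Join in the table above comes down to how much work each algorithm does. The toy implementations below count that work on in-memory lists. They are a sketch of the two join algorithms, not of any database's executor, and the `orders`/`customers` data is hypothetical.

```python
def nested_loop_join(outer, inner, key):
    """Compare every outer row with every inner row (no index on inner)."""
    comparisons = 0
    matches = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                matches.append((o, i))
    return matches, comparisons


def hash_join(outer, inner, key):
    """Build a hash table on the inner side, then probe once per outer row."""
    table = {}
    for i in inner:                      # build phase
        table.setdefault(i[key], []).append(i)
    matches = []
    for o in outer:                      # probe phase
        for i in table.get(o[key], []):
            matches.append((o, i))
    work = len(inner) + len(outer)       # one insert or probe per row
    return matches, work


# Hypothetical inputs: 2,000 orders referencing 200 customers.
customers = [{"id": n} for n in range(200)]
orders = [{"id": n % 200, "order_no": n} for n in range(2000)]

nl_rows, nl_work = nested_loop_join(orders, customers, "id")
hj_rows, hj_work = hash_join(orders, customers, "id")
print(f"nested loop: {nl_work} comparisons, hash join: {hj_work} operations")
```

Both joins return the same rows, but the unindexed nested loop does outer × inner comparisons while the hash join does roughly outer + inner operations. That gap is why a Nested Loop over a large outer input is a warning sign.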

## Response Catalog

| Diagnosis | Right response |
|---|---|
| Sequential scan on large table; predicate is selective | Add index |
| Sequential scan; predicate is poorly selective | Accept (scan is right) or denormalize |
| Index Scan with high `Rows Removed by Filter` | Index doesn't match the predicate; consider a partial or composite index |
| Estimated rows ≫ actual | Check if predicate is more selective than planner knows; extended statistics or rewrite |
| Estimated rows ≪ actual | ANALYZE; raise STATISTICS target |
| Nested Loop with large outer | Hint hash join or rewrite to enable hash plan |
| Hash Join with small outer | Should be nested loop with index; check why the planner did not choose it |
| Many Sort nodes | Add index matching ORDER BY |
| N+1 query pattern (application-side) | Replace with single join query |
| Correlated subquery slow | Rewrite as join or EXISTS |
| Query slow under concurrency, fast solo | Lock contention; check isolation level and locking strategy |
| Stale statistics suspected | ANALYZE; consider autovacuum tuning |
| Query is fundamentally too much work | Materialized view, precomputed aggregate, schema change |
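
The "Correlated subquery slow" row can be illustrated with a rewrite and an equivalence check. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in. The schema and data are hypothetical, and SQLite's planner differs from PostgreSQL's, so this shows only the semantic check, not the plan change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
""")

# Original form: the subquery is evaluated once per customer row.
CORRELATED = """
SELECT c.name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS n_orders
FROM customers c
"""

# Candidate rewrite: LEFT JOIN keeps customers that have no orders.
REWRITTEN = """
SELECT c.name, COUNT(o.id) AS n_orders
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
"""

# Tempting but wrong rewrite: INNER JOIN silently drops 'cy'.
NAIVE_INNER_JOIN = """
SELECT c.name, COUNT(o.id) AS n_orders
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
"""


def result_set(sql):
    """Run a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


def same_rows(sql_a, sql_b):
    """True when both queries return exactly the same rows."""
    return result_set(sql_a) == result_set(sql_b)


print(same_rows(CORRELATED, REWRITTEN))
print(same_rows(CORRELATED, NAIVE_INNER_JOIN))
```

The LEFT JOIN form matches the original, while the INNER JOIN form loses the customer with no orders. Checking the result sets first, then comparing plans on the real database, keeps a faster rewrite from quietly changing the answer.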

## Verification

After applying this skill, verify:
- [ ] EXPLAIN ANALYZE is the starting point for every slow-query investigation. No response is chosen before the plan has been read.
- [ ] The most-expensive plan node is identified and the diagnosis is targeted at that node, not the query as a whole.
- [ ] Estimated rows vs actual rows is checked at every node. Cardinality misestimates are the most common root cause.
- [ ] The response matches the diagnosis. Adding an index is one response of many; the team is not reflexively adding indexes.
- [ ] Slow queries are prioritized by aggregate impact (frequency × duration), not by individual duration alone. pg_stat_statements or equivalent is consulted.
- [ ] Query rewrites are tested for semantic equivalence and plan-shape change. Two queries that "should be equivalent" can produce different plans.
- [ ] Statistics are refreshed (ANALYZE) before deeper investigation when the planner's estimates seem off. Stale statistics are routine after bulk inserts and schema changes.
- [ ] Periodic review of top queries detects regressions caused by data growth. A query that was fast last year may not be this year.
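
The "aggregate impact (frequency × duration)" item in the checklist above amounts to a simple ranking. The sketch below shows the idea; the `QueryStat` type and the sample figures are hypothetical, and the field names are only loosely modelled on the `calls` and `mean_exec_time` columns of `pg_stat_statements`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStat:
    query: str
    calls: int                  # how often the query ran
    mean_exec_time_ms: float    # average duration of one execution

    @property
    def total_ms(self):
        """Aggregate impact: frequency multiplied by duration."""
        return self.calls * self.mean_exec_time_ms


def rank_by_impact(stats):
    """Order queries from highest to lowest total time consumed."""
    return sorted(stats, key=lambda s: s.total_ms, reverse=True)


# Hypothetical workload sample.
SAMPLE_STATS = [
    QueryStat("nightly_report", calls=1, mean_exec_time_ms=8000.0),
    QueryStat("load_cart", calls=200_000, mean_exec_time_ms=12.0),
    QueryStat("search_products", calls=5_000, mean_exec_time_ms=90.0),
]

for stat in rank_by_impact(SAMPLE_STATS):
    print(f"{stat.query}: {stat.total_ms:.0f} ms total")
```

In this sample the 8-second report is the slowest single query but the smallest aggregate cost, while the 12 ms cart query dominates because of how often it runs.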

## Do NOT Use When

| If the task is | Use | Why |
|---|---|---|
| Designing which indexes the database maintains | `indexing-strategy` | indexing-strategy owns design; this skill diagnoses |
| Designing the schema or entity relationships | `data-modeling` | data-modeling owns design; sometimes the answer is a schema change in its scope |
| Reasoning about how schema changes over time | `schema-evolution` | schema-evolution owns versioning; this owns query-level tuning |
| Choosing isolation level | `transaction-isolation` | transaction-isolation owns concurrency; this owns retrieval performance |
| Horizontal partitioning across nodes | `sharding-strategy` | sharding owns partition; this owns within-shard performance |
| Designing performance tests for the system | `performance-testing` | performance-testing owns measurement under load; this owns single-query tuning |

## Key Sources

- PostgreSQL Global Development Group. ["PostgreSQL Documentation — Performance Tips"](https://www.postgresql.org/docs/current/performance-tips.html) and ["Using EXPLAIN"](https://www.postgresql.org/docs/current/using-explain.html). The canonical reference for Postgres query planning and EXPLAIN interpretation.
- Kleppmann, M. (2017). *Designing Data-Intensive Applications*. O'Reilly. Chapter 3 (Storage and Retrieval) covers the storage structures underlying query execution; useful framing for why plans take the shape they do.
- Petrov, A. (2019). *Database Internals*. O'Reilly. Deep treatment of query execution, the cost model, and plan generation.
- Tow, D. (2003). *SQL Tuning*. O'Reilly. The classic practitioner reference on diagnosing slow queries; database-agnostic methodology that applies across systems.
- Winand, M. (2012, ongoing). [*Use The Index, Luke!*](https://use-the-index-luke.com/). The canonical practitioner guide to SQL indexing, with substantial treatment of how indexes interact with the planner.
- Microsoft. ["Query Tuning Assistant and Query Store"](https://learn.microsoft.com/en-us/sql/relational-databases/performance/query-store). Reference for SQL Server's query-performance tooling.
- Oracle. ["Query Optimizer Concepts"](https://docs.oracle.com/database/121/TGSQL/tgsql_optcncpt.htm). Reference for Oracle's cost-based optimizer.
- MySQL Reference Manual. ["EXPLAIN Output Format"](https://dev.mysql.com/doc/refman/8.0/en/explain-output.html). MySQL EXPLAIN reference.
- Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). ["Access Path Selection in a Relational Database Management System"](https://dl.acm.org/doi/10.1145/582095.582099). *SIGMOD 1979*. The foundational paper on cost-based query optimization (System R); historical reference for the discipline's origin.