@skill-graph/cli 0.5.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +247 -0
- package/LICENSE +200 -0
- package/NOTICE +62 -0
- package/README.md +398 -0
- package/SKILL_GRAPH.md +443 -0
- package/bin/skill-graph.js +374 -0
- package/docs/ADOPTION.md +117 -0
- package/docs/CONFORMANCE.md +66 -0
- package/docs/PRIMER.md +384 -0
- package/docs/QUICKSTART-30MIN.md +333 -0
- package/docs/ROUTING-METRICS.md +120 -0
- package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
- package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
- package/docs/SKILL_AUDIT_LOOP.md +195 -0
- package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
- package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
- package/docs/adr/0001-predicate-set.md +69 -0
- package/docs/adr/0002-json-ld-context.md +82 -0
- package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
- package/docs/adr/0004-persistent-identifiers.md +74 -0
- package/docs/adr/0005-freshness-consolidation.md +70 -0
- package/docs/adr/0006-revise-predicate-rename.md +105 -0
- package/docs/adr/0007-audit-loop-cadence.md +99 -0
- package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
- package/docs/category-consumers.md +168 -0
- package/docs/concept-map.md +194 -0
- package/docs/diagrams/drift-states.mmd +21 -0
- package/docs/diagrams/manifest-pipeline.mmd +25 -0
- package/docs/diagrams/routing-harness.mmd +41 -0
- package/docs/diagrams/starter-graph.mmd +53 -0
- package/docs/field-decision-guide.md +315 -0
- package/docs/field-rationale.md +211 -0
- package/docs/field-reference.generated.md +624 -0
- package/docs/field-reference.md +1426 -0
- package/docs/glossary.md +190 -0
- package/docs/head-noun-glossary.md +63 -0
- package/docs/images/audit-phases.png +0 -0
- package/docs/images/drift-states.png +0 -0
- package/docs/images/graded-mode.png +0 -0
- package/docs/images/manifest-pipeline.png +0 -0
- package/docs/images/routing-harness.png +0 -0
- package/docs/images/skill-anatomy.png +0 -0
- package/docs/images/starter-graph.png +0 -0
- package/docs/images/system-model.png +0 -0
- package/docs/integrations/github-actions.md +155 -0
- package/docs/manifest-field-mapping.md +443 -0
- package/docs/marketplace-publication-queue.generated.md +240 -0
- package/docs/marketplace-release-agent-prompt.md +82 -0
- package/docs/marketplace-skill-candidate-list.md +272 -0
- package/docs/marketplace-syndication.md +222 -0
- package/docs/migration-sample-review.md +155 -0
- package/docs/migrations/v4-to-v5.md +168 -0
- package/docs/migrations/v5-to-v6.md +221 -0
- package/docs/name-exceptions.yaml +37 -0
- package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
- package/docs/plans/multi-root-workspace.md +148 -0
- package/docs/plans/scripts-roadmap.md +107 -0
- package/docs/plans/v4-schema-bump.md +160 -0
- package/docs/plans/wave-2-extraction.md +122 -0
- package/docs/positioning-vs-marketplaces.md +175 -0
- package/docs/proposals/skill-audit-loop-positioning.md +160 -0
- package/docs/quality-doctrine.md +138 -0
- package/docs/recommended-skills.md +150 -0
- package/docs/research/skill-comprehension-eval-research.md +1830 -0
- package/docs/research/skill-retrieval-evidence.md +66 -0
- package/docs/skill-metadata-protocol.md +471 -0
- package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
- package/examples/audits/a11y/findings.md +52 -0
- package/examples/audits/a11y/scorecard.md +21 -0
- package/examples/audits/a11y/verdict.md +44 -0
- package/examples/audits/debugging/findings.md +59 -0
- package/examples/audits/debugging/scorecard.md +22 -0
- package/examples/audits/debugging/verdict.md +33 -0
- package/examples/audits/documentation/findings.md +59 -0
- package/examples/audits/documentation/scorecard.md +22 -0
- package/examples/audits/documentation/verdict.md +33 -0
- package/examples/evals/a11y.json +140 -0
- package/examples/evals/api-design.json +52 -0
- package/examples/evals/code-review.json +52 -0
- package/examples/evals/data-modeling.json +52 -0
- package/examples/evals/database-migration.json +52 -0
- package/examples/evals/debugging.json +118 -0
- package/examples/evals/dependency-architecture.json +52 -0
- package/examples/evals/design-system-architecture.json +52 -0
- package/examples/evals/error-tracking.json +52 -0
- package/examples/evals/event-contract-design.json +52 -0
- package/examples/evals/form-ux-architecture.json +52 -0
- package/examples/evals/framework-fit-analysis.json +52 -0
- package/examples/evals/graph-audit.json +139 -0
- package/examples/evals/information-architecture.json +52 -0
- package/examples/evals/interaction-feedback.json +52 -0
- package/examples/evals/interaction-patterns.json +52 -0
- package/examples/evals/layout-composition.json +52 -0
- package/examples/evals/lint-overlay.json +117 -0
- package/examples/evals/microcopy.json +52 -0
- package/examples/evals/observability-modeling.json +52 -0
- package/examples/evals/pattern-recognition.json +96 -0
- package/examples/evals/performance-engineering.json +52 -0
- package/examples/evals/refactor.json +128 -0
- package/examples/evals/semiotics.json +52 -0
- package/examples/evals/skill-infrastructure.json +96 -0
- package/examples/evals/skill-router.json +140 -0
- package/examples/evals/skill-router.routing.json +113 -0
- package/examples/evals/system-interface-contracts.json +52 -0
- package/examples/evals/task-analysis.json +52 -0
- package/examples/evals/testing-strategy.json +118 -0
- package/examples/evals/type-safety.json +249 -0
- package/examples/evals/visual-design-foundations.json +52 -0
- package/examples/evals/webhook-integration.json +52 -0
- package/examples/exports/a11y.skill-md.md +80 -0
- package/examples/exports/debugging.skill-md.md +80 -0
- package/examples/exports/refactor.skill-md.md +78 -0
- package/examples/exports/testing-strategy.skill-md.md +81 -0
- package/examples/projects/markdown-static-site/README.md +115 -0
- package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
- package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
- package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
- package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
- package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
- package/examples/projects/saas-stripe-postgres/README.md +208 -0
- package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
- package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
- package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
- package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
- package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
- package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
- package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
- package/examples/protocol/skill-metadata-template.md +301 -0
- package/examples/protocol/skills.manifest.sample.json +13245 -0
- package/examples/skill-metadata-template.md +317 -0
- package/examples/skills.manifest.sample.json +13519 -0
- package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
- package/marketplace/README.md +17 -0
- package/marketplace/skills/a11y/SKILL.md +66 -0
- package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
- package/marketplace/skills/agent-engineering/SKILL.md +386 -0
- package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
- package/marketplace/skills/ai-native-development/SKILL.md +294 -0
- package/marketplace/skills/api-design/SKILL.md +60 -0
- package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
- package/marketplace/skills/background-jobs/SKILL.md +265 -0
- package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
- package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
- package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
- package/marketplace/skills/code-review/SKILL.md +120 -0
- package/marketplace/skills/color-system-design/SKILL.md +43 -0
- package/marketplace/skills/component-architecture/SKILL.md +126 -0
- package/marketplace/skills/compression/SKILL.md +112 -0
- package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
- package/marketplace/skills/connection-pooling/SKILL.md +105 -0
- package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
- package/marketplace/skills/content-monitor/SKILL.md +209 -0
- package/marketplace/skills/context-engineering/SKILL.md +320 -0
- package/marketplace/skills/context-graph/SKILL.md +174 -0
- package/marketplace/skills/context-management/SKILL.md +174 -0
- package/marketplace/skills/context-window/SKILL.md +239 -0
- package/marketplace/skills/contract-testing/SKILL.md +120 -0
- package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
- package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
- package/marketplace/skills/data-modeling/SKILL.md +59 -0
- package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
- package/marketplace/skills/database-migration/SKILL.md +429 -0
- package/marketplace/skills/debugging/SKILL.md +67 -0
- package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
- package/marketplace/skills/design-module-composition/SKILL.md +43 -0
- package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
- package/marketplace/skills/design-thinking/SKILL.md +44 -0
- package/marketplace/skills/diagnosis/SKILL.md +296 -0
- package/marketplace/skills/diff-analysis/SKILL.md +188 -0
- package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
- package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
- package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
- package/marketplace/skills/error-boundary/SKILL.md +235 -0
- package/marketplace/skills/error-tracking/SKILL.md +261 -0
- package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
- package/marketplace/skills/evaluation/SKILL.md +113 -0
- package/marketplace/skills/event-contract-design/SKILL.md +60 -0
- package/marketplace/skills/event-storming/SKILL.md +56 -0
- package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
- package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
- package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
- package/marketplace/skills/generative-ui/SKILL.md +118 -0
- package/marketplace/skills/graph-audit/SKILL.md +81 -0
- package/marketplace/skills/guardrails/SKILL.md +118 -0
- package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
- package/marketplace/skills/http-semantics/SKILL.md +136 -0
- package/marketplace/skills/ideation/SKILL.md +41 -0
- package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
- package/marketplace/skills/information-architecture/SKILL.md +59 -0
- package/marketplace/skills/integration-test-design/SKILL.md +111 -0
- package/marketplace/skills/intent-recognition/SKILL.md +136 -0
- package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
- package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
- package/marketplace/skills/journey-mapping/SKILL.md +41 -0
- package/marketplace/skills/keywords/SKILL.md +213 -0
- package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
- package/marketplace/skills/layout-composition/SKILL.md +59 -0
- package/marketplace/skills/linguistics/SKILL.md +429 -0
- package/marketplace/skills/lint-overlay/SKILL.md +76 -0
- package/marketplace/skills/mental-models/SKILL.md +126 -0
- package/marketplace/skills/merge-queue/SKILL.md +94 -0
- package/marketplace/skills/methodology/SKILL.md +317 -0
- package/marketplace/skills/microcopy/SKILL.md +232 -0
- package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
- package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
- package/marketplace/skills/mutation-testing/SKILL.md +112 -0
- package/marketplace/skills/naming-conventions/SKILL.md +112 -0
- package/marketplace/skills/observability-modeling/SKILL.md +59 -0
- package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
- package/marketplace/skills/owasp-security/SKILL.md +153 -0
- package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
- package/marketplace/skills/performance-budgets/SKILL.md +185 -0
- package/marketplace/skills/performance-engineering/SKILL.md +58 -0
- package/marketplace/skills/performance-testing/SKILL.md +125 -0
- package/marketplace/skills/printify/SKILL.md +42 -0
- package/marketplace/skills/prioritization/SKILL.md +118 -0
- package/marketplace/skills/problem-framing/SKILL.md +41 -0
- package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
- package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
- package/marketplace/skills/prompt-craft/SKILL.md +134 -0
- package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
- package/marketplace/skills/property-based-testing/SKILL.md +100 -0
- package/marketplace/skills/prototyping/SKILL.md +43 -0
- package/marketplace/skills/query-optimization/SKILL.md +144 -0
- package/marketplace/skills/real-time-updates/SKILL.md +324 -0
- package/marketplace/skills/ref-patterns/SKILL.md +284 -0
- package/marketplace/skills/refactor/SKILL.md +65 -0
- package/marketplace/skills/rendering-models/SKILL.md +142 -0
- package/marketplace/skills/replication-patterns/SKILL.md +110 -0
- package/marketplace/skills/research-synthesis/SKILL.md +41 -0
- package/marketplace/skills/route-handler-design/SKILL.md +347 -0
- package/marketplace/skills/schema-evolution/SKILL.md +140 -0
- package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
- package/marketplace/skills/semantic-center/SKILL.md +194 -0
- package/marketplace/skills/semantic-relations/SKILL.md +250 -0
- package/marketplace/skills/semantics/SKILL.md +366 -0
- package/marketplace/skills/semiotics/SKILL.md +230 -0
- package/marketplace/skills/seo-strategy/SKILL.md +260 -0
- package/marketplace/skills/server-actions-design/SKILL.md +243 -0
- package/marketplace/skills/server-components-design/SKILL.md +190 -0
- package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
- package/marketplace/skills/shopify/SKILL.md +42 -0
- package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
- package/marketplace/skills/skill-router/SKILL.md +71 -0
- package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
- package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
- package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
- package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
- package/marketplace/skills/state-management/SKILL.md +134 -0
- package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
- package/marketplace/skills/summarization/SKILL.md +156 -0
- package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
- package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
- package/marketplace/skills/task-analysis/SKILL.md +201 -0
- package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
- package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
- package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
- package/marketplace/skills/test-driven-development/SKILL.md +96 -0
- package/marketplace/skills/testing-strategy/SKILL.md +67 -0
- package/marketplace/skills/theme-system-design/SKILL.md +43 -0
- package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
- package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
- package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
- package/marketplace/skills/type-safety/SKILL.md +177 -0
- package/marketplace/skills/typography-system/SKILL.md +43 -0
- package/marketplace/skills/usability-testing/SKILL.md +43 -0
- package/marketplace/skills/user-research/SKILL.md +43 -0
- package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
- package/marketplace/skills/version-control/SKILL.md +233 -0
- package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
- package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
- package/marketplace/skills/webhook-integration/SKILL.md +331 -0
- package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
- package/package.json +67 -0
- package/schemas/manifest.schema.json +811 -0
- package/schemas/manifest.v2.schema.json +164 -0
- package/schemas/manifest.v3.schema.json +758 -0
- package/schemas/manifest.v4.schema.json +755 -0
- package/schemas/manifest.v5.schema.json +755 -0
- package/schemas/manifest.v6.schema.json +811 -0
- package/schemas/skill.context.jsonld +279 -0
- package/schemas/skill.schema.json +919 -0
- package/schemas/skill.v2.schema.json +201 -0
- package/schemas/skill.v3.schema.json +827 -0
- package/schemas/skill.v4.schema.json +822 -0
- package/schemas/skill.v5.schema.json +830 -0
- package/schemas/skill.v6.schema.json +946 -0
- package/schemas/vocabulary/keywords.json +180 -0
- package/schemas/vocabulary/workspace_tags.json +23 -0
- package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
- package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
- package/scripts/__tests__/test-export-parser-drift.js +149 -0
- package/scripts/__tests__/test-marketplace-export.js +114 -0
- package/scripts/__tests__/test-router-paths.js +82 -0
- package/scripts/__tests__/test-stability-promotion.js +244 -0
- package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
- package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
- package/scripts/backfill-schema-version.js +198 -0
- package/scripts/build-field-reference.js +160 -0
- package/scripts/build-retrieval-baseline.js +511 -0
- package/scripts/check-markdown-links.js +211 -0
- package/scripts/check-protocol-consistency.js +979 -0
- package/scripts/export-marketplace-skills.js +610 -0
- package/scripts/export-skill.js +374 -0
- package/scripts/generate-manifest.js +787 -0
- package/scripts/lib/alias-contract.js +83 -0
- package/scripts/lib/audit-prompt-builder.js +771 -0
- package/scripts/lib/mock-grader.js +134 -0
- package/scripts/lib/parse-frontmatter.js +429 -0
- package/scripts/lib/roots.js +119 -0
- package/scripts/lint/check-archetype-sections.js +185 -0
- package/scripts/lint/check-category-enum.js +83 -0
- package/scripts/lint/check-routing-eval.js +146 -0
- package/scripts/lint/check-routing-quality.js +211 -0
- package/scripts/lint/check-stability-promotion.js +220 -0
- package/scripts/lint/format-code-frame.js +206 -0
- package/scripts/marketplace-install.js +125 -0
- package/scripts/migrate-category-to-enum.js +169 -0
- package/scripts/migrate-skill-v2-to-v3.js +424 -0
- package/scripts/migrate-skill-v3-to-v4.js +200 -0
- package/scripts/migrate-skill-v5-to-v6.js +304 -0
- package/scripts/restructure-by-category.js +85 -0
- package/scripts/seed-publication-classification.js +282 -0
- package/scripts/skill-audit.js +893 -0
- package/scripts/skill-graph-drift.js +483 -0
- package/scripts/skill-graph-route.js +766 -0
- package/scripts/skill-graph-routing-eval.js +393 -0
- package/scripts/skill-lint.js +1317 -0
- package/scripts/skill-overlap.js +213 -0
- package/scripts/verify-skill-md-export.js +201 -0
@@ -0,0 +1,294 @@
---
name: ai-native-development
description: "Use when reasoning about agent autonomy levels, designing auto-improve loops, evaluating AI-generated code quality, or measuring agent productivity in an LLM-assisted codebase. Covers Karpathy's three eras of software (1.0 explicit / 2.0 learned / 3.0 natural-language), the vibe-coding-vs-agentic-engineering distinction, the 0–5 autonomy slider with task-type recommendations, the one-asset / one-metric / one-time-box AutoResearch loop, Software 3.0 productivity metrics, and the documented quality regressions of ungated AI-generated code (the 'vibe hangover'). Do NOT use for choosing a specific autonomy-loop topology (use `agent-engineering`), for the per-prompt authoring discipline (use `prompt-craft`), or for reviewing the AI-generated code that comes out of a Software 3.0 workflow (use `code-review`)."
license: MIT
compatibility: "Provider- and runtime-agnostic. The autonomy-slider levels and quality-gate sequence apply to any LLM-coding harness (Claude Code, OpenCode, Cursor, Aider, Copilot Workspace, Continue) that supports a deterministic verify step between agent output and merge."
allowed-tools: Read Grep
metadata:
  metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"agent\",\"domain\":\"agent/concepts\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-06\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-06\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"software 3.0 concepts\\\\\\\",\\\\\\\"vibe coding\\\\\\\",\\\\\\\"agentic engineering doctrine\\\\\\\",\\\\\\\"autonomy slider\\\\\\\",\\\\\\\"prompt as code\\\\\\\",\\\\\\\"karpathy three eras\\\\\\\",\\\\\\\"autoresearch loop\\\\\\\",\\\\\\\"ai-generated code quality\\\\\\\",\\\\\\\"vibe hangover\\\\\\\",\\\\\\\"llm-native development\\\\\\\",\\\\\\\"software 1.0 vs 2.0 vs 3.0\\\\\\\",\\\\\\\"natural language programs\\\\\\\",\\\\\\\"agent productivity metrics\\\\\\\",\\\\\\\"rework rate\\\\\\\",\\\\\\\"agent completion rate\\\\\\\",\\\\\\\"quality gates ai code\\\\\\\",\\\\\\\"autonomy levels coding\\\\\\\",\\\\\\\"ai code security regression\\\\\\\"]\",\"examples\":\"[\\\\\\\"we keep accepting agent-generated code on first try and shipping bugs — what discipline replaces this?\\\\\\\",\\\\\\\"what autonomy level should I run for a security-sensitive change?\\\\\\\",\\\\\\\"does measuring lines-of-code per session make sense when an agent generates the code?\\\\\\\",\\\\\\\"the team is treating prompts and skill files like throwaway notes — what's the alternative framing?\\\\\\\",\\\\\\\"we want an auto-improve loop for our skill content — how do we constrain it so it doesn't regress?\\\\\\\",\\\\\\\"what's the conceptual difference between a vibe coding session and an agentic engineering session?\\\\\\\",\\\\\\\"AI-generated code is shipping with vulnerabilities — what gates should sit between agent output and production?\\\\\\\",\\\\\\\"how do I match autonomy level to the risk profile of the task?\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"improve this specific prompt for the grader\\\\\\\",\\\\\\\"review this AI-generated PR for correctness\\\\\\\",\\\\\\\"design the checkpoint state machine for our loop\\\\\\\",\\\\\\\"scaffold a new skill that codifies our coding doctrine\\\\\\\",\\\\\\\"the autonomous loop is stalling — debug it\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"prompt-craft\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"prompt-craft is the per-prompt authoring discipline; ai-native-development is the meta-frame that explains why prompts are source code\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"agent-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"agent-engineering owns the production reliability discipline (orchestration, error budgets, observability); ai-native-development owns the conceptual model that motivates those concerns\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"code-review\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"code-review evaluates the AI-generated output; ai-native-development frames why that output exists and how to size the gates around it\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"tool-call-strategy is the tactical layer for an agent's tool dispatch; ai-native-development is the conceptual layer above it\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"prompt-craft\\\\\\\",\\\\\\\"agent-engineering\\\\\\\",\\\\\\\"code-review\\\\\\\",\\\\\\\"skill-router\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"code-review\\\\\\\",\\\\\\\"testing-strategy\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":180,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/ai-native-development/SKILL.md\"}"
  skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
  skill_graph_protocol: Skill Metadata Protocol v4
  skill_graph_project: Skill Graph
  skill_graph_canonical_skill: skills/ai-native-development/SKILL.md
---

# AI-Native Development

## Coverage

The conceptual model for software development when an LLM participates in code creation. Specifically: Andrej Karpathy's three eras of software (1.0 explicit code / 2.0 learned weights / 3.0 natural-language programs); the vibe-coding-vs-agentic-engineering distinction and when each is appropriate; the 0–5 autonomy slider mapping task type and risk to the right level of agent independence; the AutoResearch improvement loop with its three constraints (one editable asset, one scalar metric, one time box); Software 3.0 productivity metrics that replace lines-of-code and commit-count for an LLM-assisted team; the documented security and quality regressions of ungated AI-generated code (the "vibe hangover") and the quality-gate sequence that compensates for them; and the operating principle that prompts, skill files, and agent-runtime configuration are _source code_ — versioned, reviewed, tested.

## Philosophy

A prompt is a program. A skill file is a library. An agent session is a runtime. This is not a metaphor; it is the literal operational model of an LLM-assisted codebase. The mistake teams make is treating these artifacts as ad-hoc notes. The industry once made the same mistake with shell scripts, before it began treating them as version-controlled software. AI-native development is the discipline of putting the same engineering rigor around prompts and skills that any team puts around production code: source control, code review, tests, contracts, observability.

The largest single failure mode at the team level is unintentional autonomy. Without an explicit framing, every agent session defaults to the _highest_ autonomy the harness allows, regardless of the task's risk. Vibe coding is not wrong — for a throwaway prototype it is correct. It is wrong as the _default_ for production code. The autonomy slider is the framing tool that lets a team decide _intentionally_ where on the slider any given task should run, and what gates compensate when autonomy goes up.

## 1. The Three Eras of Software

Karpathy named a structural shift in how programs are produced:

### Software 1.0 — Explicit code

Humans write instructions in a programming language. A compiler or interpreter executes them. Behavior is deterministic and fully auditable. Bugs are logic errors in code humans wrote.

```
Human writes code → compiler/interpreter runs code → output
```

### Software 2.0 — Learned programs

Humans curate data and pick an architecture. An optimizer trains weights. The trained network is the program. Behavior is probabilistic; auditability is partial (interpretability is an open problem). Bugs are distribution mismatches or training artifacts.

```
Human curates data + defines architecture → training → weights (the program) → output
```
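
Here is a toy version of the 2.0 flow. It assumes a single-feature linear model stands in for a neural network. The human supplies curated data and chooses the model's shape, and gradient descent produces the weights. `predict` then computes its output from those learned numbers alone. The function names are illustrative, not taken from any library.

```python
# Toy Software 2.0: the "program" is the learned pair of weights (w, b).
# A one-feature linear model stands in for a neural network (simplifying assumption).


def train(data: list[tuple[float, float]], epochs: int = 5000, lr: float = 0.01) -> tuple[float, float]:
    """Fit y ~ w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(weights: tuple[float, float], x: float) -> float:
    """Run the learned program: behavior comes from the weights, not hand-written rules."""
    w, b = weights
    return w * x + b


# The human's job is curating data (here sampled from y = 2x + 1) and picking the architecture.
curated = [(float(x), 2.0 * x + 1.0) for x in range(5)]
weights = train(curated)
print(round(predict(weights, 10.0), 2))  # prints 21.0
```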

### Software 3.0 — Natural-language programs

Humans write a specification in natural language. An LLM interprets the specification and produces behavior. The "code" is the prompt. Behavior is stochastic — the same prompt can produce different output across runs. Bugs are ambiguities in the prompt or gaps in the model's knowledge.

```
Human writes prompt → LLM interprets prompt → output
```
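
The same prompt can produce different output across runs, so one good run is weak evidence that the prompt works. The sketch below is minimal and rests on one assumption: a hypothetical `fake_llm` replaces a real model call and answers correctly about 80% of the time. The sketch runs one prompt many times and reports the fraction of outputs that pass a check.

```python
import random
from typing import Callable


def fake_llm(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call: answers correctly about 80% of the time."""
    return "4" if rng.random() < 0.8 else "5"


def pass_rate(
    run: Callable[[str, random.Random], str],
    prompt: str,
    check: Callable[[str], bool],
    trials: int,
    seed: int = 0,
) -> float:
    """Run the same prompt `trials` times; return the fraction of outputs that pass `check`."""
    rng = random.Random(seed)  # seeded only so this toy estimate is repeatable
    passes = sum(check(run(prompt, rng)) for _ in range(trials))
    return passes / trials


rate = pass_rate(
    fake_llm,
    "What is 2 + 2? Answer with a number only.",
    lambda out: out.strip() == "4",
    trials=1000,
)
print(f"pass rate over 1000 runs: {rate:.2f}")
```

The seed only makes this toy estimate repeatable. Against a real model, track the pass rate across runs rather than any single output.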

### The mapping

In Software 3.0 the prompt _is_ the program. Every traditional software-engineering concept has an analogue:

| Traditional concept | Software 3.0 equivalent |
| ------------------- | -------------------------------------------------- |
| Source code | Prompt files, system prompts, skill specifications |
| Libraries | Reusable skill files |
| Compiler | LLM inference engine |
| Linker | Skill-injector / context-loader |
| Runtime | Agent session |
| RAM | Context window |
| Debugger | Context-failure analysis |
| Tests | Eval suites |
| Version control | Skill / prompt versioning + git |

Once the mapping is explicit, the engineering disciplines transfer: review the prompt the way you'd review a function; version the skill the way you'd version a library; eval the agent's output the way you'd run unit tests against a build.
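
Here is a minimal sketch of the Tests and Version control rows. It assumes a prompt is stored as plain text. Each prompt version is identified by a content hash, much as a commit pins source code, and an eval suite runs against that version and returns per-case results like a unit-test report. `EvalCase`, `run_suite`, and the stub generator are hypothetical names. A real harness would call a model where the stub sits.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    user_input: str
    check: Callable[[str], bool]  # True when the output is acceptable


def prompt_version(prompt: str) -> str:
    """Content-addressed version id: any edit to the prompt text changes it."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def run_suite(prompt: str, generate: Callable[[str, str], str], cases: list[EvalCase]) -> dict:
    """Run every case against one prompt version and report results like a test run."""
    results = {case.name: case.check(generate(prompt, case.user_input)) for case in cases}
    return {
        "version": prompt_version(prompt),
        "passed": sum(results.values()),
        "total": len(cases),
        "results": results,
    }


# Hypothetical deterministic stub in place of a model call.
def stub_generate(prompt: str, user_input: str) -> str:
    return user_input.upper() if "uppercase" in prompt else user_input


cases = [
    EvalCase("shouts", "hello", lambda out: out == "HELLO"),
    EvalCase("keeps digits", "v2", lambda out: out == "V2"),
]
report = run_suite("Rewrite the input in uppercase.", stub_generate, cases)
print(report["version"], f'{report["passed"]}/{report["total"]}')
```

Tying each report to a version id lets a prompt edit be compared against its predecessor, just as a regression test run is tied to a commit.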

## 2. Vibe Coding vs Agentic Engineering

"Vibe coding" was coined by Karpathy (February 2025) for the practice of generating code by feel: describing what you want, accepting the output, iterating by vibes. It has become a common default mode of AI-assisted development. Agentic engineering is the disciplined alternative: structured, verifiable, with quality gates at every step.

| Dimension | Vibe coding | Agentic engineering |
| --------------- | -------------------------------- | ------------------------------------------------------------ |
| Planning | None — "just start coding" | Explicit plan or task spec |
| Specification | Verbal / mental model | Written contracts (acceptance criteria, ADRs, skill files) |
| Code generation | Accept first output | Generate → verify → iterate |
| Review | Skim the diff | Automated gates (lint, type-check, tests) + human spot-check |
| Quality | "Does it look right?" | Measurable criteria (evals pass, CI green) |
| Knowledge | Lost between sessions | Captured (skills, memory, ADRs and other decision records) |
| Reproducibility | Low (depends on prompt phrasing) | High (the same skill content yields consistent behavior) |
| Security | "It probably works" | Explicit security review; threat model considered |
| Scale | Fits small prototypes | Fits production systems with multiple agents |
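
The Code generation and Review rows describe a loop, not a single accept. The sketch below is minimal and rests on two assumptions. First, each gate is a plain function that returns a list of failure messages, standing in for lint, type-check, and test runs. Second, the generator can take the previous failures as feedback. `generate_with_gates` and the stub functions are hypothetical. The loop accepts the first candidate that passes every gate. After a fixed number of attempts it gives up and returns the failures for a human to handle, rather than accepting the output.

```python
from typing import Callable, Optional

Gate = Callable[[str], list[str]]  # returns failure messages; an empty list means the gate passed


def generate_with_gates(
    generate: Callable[[list[str]], str],
    gates: list[Gate],
    max_attempts: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Generate, verify, iterate. Returns (accepted_code, []) or (None, last_failures)."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(feedback)
        # Run every gate and collect all failure messages as feedback for the next attempt.
        feedback = [msg for gate in gates for msg in gate(candidate)]
        if not feedback:
            return candidate, []
    # Out of attempts: escalate to a human with the last failures instead of accepting.
    return None, feedback


# Hypothetical gates standing in for a linter and a test suite.
def no_todo(code: str) -> list[str]:
    return ["lint: unresolved TODO"] if "TODO" in code else []


def add_is_correct(code: str) -> list[str]:
    namespace: dict = {}
    exec(code, namespace)  # toy test runner only; real pipelines run tests in a sandbox
    add = namespace.get("add")
    return [] if callable(add) and add(2, 3) == 5 else ["test: add(2, 3) should be 5"]


# Hypothetical generator: the first draft is sloppy; the retry responds to the feedback.
def stub_generate(feedback: list[str]) -> str:
    if not feedback:
        return "def add(a, b):\n    # TODO: handle floats\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


code, failures = generate_with_gates(stub_generate, [no_todo, add_is_correct])
print(failures or "accepted")
```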

### When vibe coding is the right tool

Vibe coding is correct for: throwaway prototypes, personal scripts with no users, learning a new library by playing with it, design exploration before committing to an approach.

### When vibe coding is the wrong tool

Vibe coding is wrong for: production code, financial calculations, security-sensitive logic (authentication, authorization, data handling), shared codebases where other developers or agents will maintain the code.

## 3. The Autonomy Slider

Agent autonomy exists on a spectrum, not a binary. The right level depends on three inputs: task type, quality of available context, and consequences of failure.

### Levels

| Level | Name | Human role | Agent role | Example |
| ----- | ----------------------- | ----------------------------------------- | -------------------------------------- | ----------------------------------------- |
| 0 | Manual | Writes all code | None | Traditional development |
| 1 | Suggestion | Reviews suggestions, accepts/rejects | Suggests completions | Inline tab-completion |
| 2 | Drafting | Reviews drafts, edits before commit | Generates complete drafts from prompts | "Write a component that does X" |
| 3 | Implementing | Reviews finished work, runs gates | Implements full features, writes tests | Agent completes one ticket end-to-end |
| 4 | Autonomous + spot-check | Spot-checks via session summary | Implements, tests, documents, commits | Multi-task queue worked independently |
|
|
110
|
+
| 5 | Fully autonomous | Monitors metrics, intervenes on anomalies | Prioritize, implement, verify, deploy | Theoretical — not yet safe for production |
|
|
111
|
+
|
|
112
|
+
### Autonomy by task type
|
|
113
|
+
|
|
114
|
+
| Task type | Recommended level | Why |
|
|
115
|
+
| --------------------------- | ----------------- | -------------------------------------------------- |
|
|
116
|
+
| Bug fix with failing test | 4 | Clear acceptance criteria; low ambiguity |
|
|
117
|
+
| New feature implementation | 3 | Architectural decisions need human review |
|
|
118
|
+
| Codebase audit | 4 | Research; agent investigates autonomously |
|
|
119
|
+
| Security-sensitive change | 2 | High consequences; human must verify |
|
|
120
|
+
| Financial calculation logic | 2–3 | Monetary consequences; careful review needed |
|
|
121
|
+
| Documentation update | 4 | Low risk; agent verifies against source |
|
|
122
|
+
| UI / visual implementation | 3 | Visual judgment; screenshot review required |
|
|
123
|
+
| Refactor with green tests | 3–4 | Tests guard correctness; scope review still needed |
|
|
124
|
+
| Production deployment | 1 | High consequences; human controls the process |
|
|
125
|
+
|
|
126
|
+
### Autonomy prerequisites
|
|
127
|
+
|
|
128
|
+
Higher autonomy requires better infrastructure. The slider is not a free parameter; moving it up requires the supporting controls.
|
|
129
|
+
|
|
130
|
+
| Level | Required infrastructure |
|
|
131
|
+
| ----- | -------------------------------------------------------------------------------------------------------------------------------- |
|
|
132
|
+
| 2 | Prompt quality, basic type checking |
|
|
133
|
+
| 3 | Automated tests, CI pipeline, skill system, code review |
|
|
134
|
+
| 4 | Tripwire guardrails on destructive operations, structured session-summary protocol, persistent memory, eval suite, model routing |
|
|
135
|
+
| 5 | Self-healing, anomaly detection, automatic rollback, comprehensive evals, runtime observability |
|
|
136
|
+
|
|
137
|
+
A team that runs at level 4 without the level-4 infrastructure is not "moving fast"; it is shipping at level 4 with level-2 safety, and the gap will surface as production incidents.
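The slider and its prerequisites can be read together as one rule: run each task at its recommended level, but never above the highest level whose supporting controls exist. Below is a minimal Python sketch under stated assumptions. The infrastructure labels are illustrative shorthand for the prerequisites table. Ranges such as 2–3 are reduced to their lower bound. Requirements are treated as cumulative. Level 5 is left out because the table calls it theoretical.

```python
# Hypothetical sketch: choose a task's autonomy level, capped by the
# infrastructure the team actually has. All labels are illustrative.

# Subset of the "Autonomy by task type" table. Ranges such as "2–3" are
# reduced to their lower bound, which is an assumption of this sketch.
RECOMMENDED_LEVEL = {
    "bug fix with failing test": 4,
    "new feature implementation": 3,
    "financial calculation logic": 2,
    "security-sensitive change": 2,
    "production deployment": 1,
}

# Infrastructure per level from the prerequisites table. Requirements are
# treated as cumulative: level 4 also needs everything listed for 2 and 3.
REQUIRED = {
    2: {"prompt quality", "type checking"},
    3: {"automated tests", "ci pipeline", "skill system", "code review"},
    4: {"tripwire guardrails", "session summaries", "persistent memory",
        "eval suite", "model routing"},
}


def supported_level(infrastructure: set[str]) -> int:
    """Highest level (1 to 4) whose cumulative prerequisites are all present."""
    level = 1  # levels 0 and 1 have no listed prerequisites
    for candidate in sorted(REQUIRED):
        if not REQUIRED[candidate] <= infrastructure:
            break  # a missing lower rung blocks every rung above it
        level = candidate
    return level


def operating_level(task_type: str, infrastructure: set[str]) -> int:
    """The task's recommended level, never above what the infrastructure supports."""
    return min(RECOMMENDED_LEVEL[task_type], supported_level(infrastructure))


team = {"prompt quality", "type checking", "automated tests",
        "ci pipeline", "skill system", "code review"}
print(operating_level("bug fix with failing test", team))  # 3, not 4
```

The example team has tests and CI but no tripwires or eval suite. A bug fix that the table would run at level 4 is therefore held at level 3.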
## 4. The AutoResearch Loop

Karpathy's `autoresearch` pattern is the simplest reliable shape for autonomous agent improvement work:

```
LOOP:
  1. Modify one thing (code, config, parameter, prompt)
  2. Run the experiment (execute, measure)
  3. Check: did the metric improve?
       YES → keep the change, continue
       NO  → revert the change, try something else
  4. Repeat until the time box expires
```

### The three constraints

The loop works because of what it forbids, not what it allows:

1. **One editable asset** — the agent can modify one file, one function, or one parameter set. No cascading multi-file edits in a single iteration. Prevents diffuse failure.
2. **One scalar metric** — success is one number (accuracy, latency, score, cost). Prevents the "improved on A but regressed B and we didn't notice" failure.
3. **One time box** — the loop runs for a fixed duration. Prevents infinite exploration on a problem the loop won't solve.
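Here is a minimal Python sketch of the loop under its three constraints. It assumes the editable asset is a single numeric parameter and the metric is a callable returning one higher-is-better number. `autoresearch`, `propose`, and the toy quadratic objective are hypothetical stand-ins for a real experiment harness. "Revert" is modeled by simply not adopting the candidate. The time box is a wall-clock deadline, backed by an iteration cap.

```python
import random
import time
from typing import Callable


def autoresearch(
    asset: float,
    metric: Callable[[float], float],
    propose: Callable[[float, random.Random], float],
    time_box_s: float,
    max_iters: int = 1_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Keep-or-revert loop: one editable asset, one scalar metric, one time box.

    `asset` is the single thing the loop may change (a number here), and
    `metric` returns one score where higher is better.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_box_s
    best_score = metric(asset)
    for _ in range(max_iters):  # iteration cap as a second, deterministic limit
        if time.monotonic() >= deadline:
            break  # time box expired: stop and hand over for manual review
        candidate = propose(asset, rng)  # 1. modify one thing
        score = metric(candidate)        # 2. run the experiment
        if score > best_score:           # 3. did the metric improve?
            asset, best_score = candidate, score  # YES: keep the change
        # NO: the candidate is discarded, which is the "revert"
    return asset, best_score


# Toy stand-in for a real eval: the score peaks when the parameter is 3.0.
best, score = autoresearch(
    asset=0.0,
    metric=lambda x: -(x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    time_box_s=1.0,
)
```

Because only an improving candidate is ever adopted, the returned score can never be worse than the starting score. Whatever the loop does, the asset on exit is the best one it measured.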
### When to use AutoResearch vs manual iteration

| Situation                                      | AutoResearch                   | Manual                     |
| ---------------------------------------------- | ------------------------------ | -------------------------- |
| Optimizing a measurable metric                 | Yes                            | No                         |
| Exploring design alternatives with a judge     | Yes (with judge as the metric) | Also fine                  |
| Implementing a specified feature               | No                             | Yes                        |
| Debugging a specific bug with known root cause | No                             | Yes                        |
| Tuning a prompt against an eval set            | Yes                            | Also fine                  |
| Performance optimization with a clear metric   | Yes                            | If the metric is composite |

### Common failure modes

- **Multi-axis edits.** The agent edits two files in one iteration; the metric improves; you don't know which edit caused it. Solution: enforce one-asset programmatically.
- **Metric drift.** The metric the loop optimizes is not the metric you actually care about. Solution: validate the metric on a held-out set before starting the loop.
- **No time box.** The loop runs indefinitely against a problem the loop can't solve. Solution: hard limit; manual review at expiry.
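For the multi-axis failure mode, "enforce one-asset programmatically" can be as small as a guard that runs before the metric is read. The sketch below assumes the harness can list the files an iteration changed, for example with `git diff --name-only`. The function name and the paths are hypothetical.

```python
def check_single_asset(changed_paths: list[str], editable_asset: str) -> None:
    """Reject an iteration unless it changed exactly the one editable asset.

    `changed_paths` is assumed to come from version control, for example the
    output of `git diff --name-only` after the iteration ran.
    """
    extra = sorted(set(changed_paths) - {editable_asset})
    if extra:
        raise ValueError(f"iteration edited files outside the asset: {extra}")
    if editable_asset not in changed_paths:
        raise ValueError("iteration did not change the editable asset")


check_single_asset(["prompts/triage.md"], "prompts/triage.md")  # passes
```

Running the guard before scoring means a multi-file iteration is discarded. It never gets credited with an improvement that nobody can attribute.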
## 5. Software 3.0 Productivity Metrics

Traditional software metrics — lines of code, commits per day, velocity points — are meaningless when an agent can produce 10,000 lines in five minutes. The question is not how much was produced; it is whether what was produced was correct.

### Metrics that matter

| Metric                             | What it measures           | Direction                                                               |
| ---------------------------------- | -------------------------- | ----------------------------------------------------------------------- |
| Tasks completed per session        | Throughput                 | Higher better                                                           |
| Agent completion rate              | Autonomy quality           | Higher better — % of tasks finished without human intervention          |
| Rework rate                        | Output quality             | Lower better — % of tasks that needed human correction                  |
| Time-to-value                      | Idea → working feature     | Decreasing trend                                                        |
| Skill / context-injection accuracy | Context engineering health | Higher precision and recall                                             |
| Eval pass rate                     | Skill-content correctness  | Higher better                                                           |
| Context-failure rate               | Agent reliability          | Lower better — % of tasks where agent went wrong because of bad context |
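The outcome metrics could be computed from per-task records, as sketched below. The `TaskRecord` fields are hypothetical. They assume the harness logs whether a human intervened, whether the result was reworked, and whether a failure traced back to context. Every rate here is taken over all tasks attempted in the session, which is one choice among several.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    completed: bool         # the task reached its acceptance criteria
    human_intervened: bool  # a human stepped in before completion
    reworked: bool          # the result needed human correction afterwards
    context_failure: bool   # the agent went wrong because of bad context


def session_metrics(tasks: list[TaskRecord]) -> dict[str, float]:
    """Outcome metrics for one session; every rate is over all attempted tasks."""
    if not tasks:
        raise ValueError("no tasks recorded")
    n = len(tasks)
    return {
        "tasks_completed": float(sum(t.completed for t in tasks)),
        "agent_completion_rate": sum(
            t.completed and not t.human_intervened for t in tasks) / n,
        "rework_rate": sum(t.reworked for t in tasks) / n,
        "context_failure_rate": sum(t.context_failure for t in tasks) / n,
    }


session = [
    TaskRecord(completed=True, human_intervened=False, reworked=False, context_failure=False),
    TaskRecord(completed=True, human_intervened=True, reworked=True, context_failure=True),
    TaskRecord(completed=True, human_intervened=False, reworked=True, context_failure=False),
    TaskRecord(completed=False, human_intervened=True, reworked=False, context_failure=False),
]
metrics = session_metrics(session)
```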
### Metrics that don't matter

| Metric                   | Why it's misleading                                      |
| ------------------------ | -------------------------------------------------------- |
| Lines of code generated  | Quantity is free; quality isn't                          |
| Commits per day          | More commits ≠ more value                                |
| Files changed            | Breadth says nothing about correctness                   |
| Time spent coding        | The constraint is not coding time; it is human attention |
| Number of agents running | More agents can mean more noise, not throughput          |

### The productivity equation

```
productivity ≈ (tasks completed × quality score) / human attention consumed
```

The goal is to grow the numerator while shrinking the denominator. "Getting better at AI-native development" is the operational definition of moving this ratio in the right direction over time.
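Here is a worked instance of the ratio. It assumes quality is scored from 0 to 1 and attention is measured in minutes of human review. Both units and the two sessions are illustrative, not prescribed by the equation.

```python
def productivity(tasks_completed: int, quality_score: float, attention_minutes: float) -> float:
    """productivity ≈ (tasks completed × quality score) / human attention consumed."""
    if attention_minutes <= 0:
        raise ValueError("attention must be positive")
    return tasks_completed * quality_score / attention_minutes


# Session A: fewer tasks, high quality, little review time.
a = productivity(tasks_completed=10, quality_score=0.9, attention_minutes=60)
# Session B: twice the output, but lower quality and twice the review time.
b = productivity(tasks_completed=20, quality_score=0.5, attention_minutes=120)
print(round(a, 3), round(b, 3))  # 0.15 0.083
```

Session B completes twice as many tasks yet scores lower. The extra output consumed more attention than it returned in quality.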
## 6. The Vibe Hangover — Quality Gates as Compensation

The rapid adoption of AI-assisted coding has produced measurable quality regressions. This data grounds the case for agentic engineering and gates over vibe coding.

### Reported findings

| Source                                | Finding                                                                                                                         |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| Stanford / Microsoft study (2024)     | AI-assisted developers believe their code is more secure, but the code is ~1.7× more likely to contain security vulnerabilities |
| GitClear (2025)                       | AI-generated code shows a ~2.74× higher vulnerability rate than human-written equivalents                                       |
| OWASP AI Security Guide (2024–25)     | New attack surface: prompt injection, training-data poisoning, model manipulation                                               |
| Snyk State of AI Code Security (2025) | ≥56% of surveyed developers report at least one AI-introduced vulnerability reaching production                                 |

These numbers will move; the structural reasons will not. AI-generated code has more vulnerabilities because:

1. **Training-data bias.** Models learn from public repos that include vulnerable code; popular patterns are not necessarily secure patterns.
2. **Missing context.** The model does not know the deployment environment, threat model, or compliance requirements unless explicitly told.
3. **Acceptance bias.** Developers scrutinize AI-generated code less than code they wrote themselves ("it looks reasonable").
4. **Speed-vs-security trade-off.** Faster output encourages faster acceptance. Security review is slow and feels like friction.

### The compensating gates

The defense is mandatory verification between every agent action and production:

```
Agent generates code
    │
    ▼
[Gate 1] Type checking
    │
    ▼
[Gate 2] Lint / style / safety rules
    │
    ▼
[Gate 3] Automated tests (unit + integration)
    │
    ▼
[Gate 4] Security scanning (deps, secrets, known CVEs)
    │
    ▼
[Gate 5] Design / visual review (for UI changes)
    │
    ▼
[Gate 6] Human spot-check (for autonomy < 5)
    │
    ▼
Production
```

**Rule:** no gate may be skipped under speed pressure. An agent that passes all gates is trustworthy on this change. An agent that bypasses gates is a liability regardless of how good the diff looks.

The gates are also the _justification_ for higher autonomy. Without the gates, level 4 is reckless. With the gates, level 4 is responsible.
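The rule that no gate may be skipped can be sketched as follows. Gates run in the diagram's order, and the first failure blocks the change. The human spot-check is enforced whenever autonomy is below 5. The `change` dict and its result fields are hypothetical. In a real pipeline, each check would invoke the team's type checker, linter, test runner, and scanner.

```python
from typing import Callable

# A gate is a name plus a check over the proposed change.
Gate = tuple[str, Callable[[dict], bool]]


def run_gates(change: dict, gates: list[Gate], autonomy_level: int) -> list[str]:
    """Run every gate in order and stop at the first failure; nothing is skippable."""
    passed = []
    for name, check in gates:
        if not check(change):
            raise RuntimeError(f"blocked at gate: {name}")
        passed.append(name)
    # Gate 6 from the diagram: a human spot-check while autonomy is below 5.
    if autonomy_level < 5:
        if not change.get("human_spot_check", False):
            raise RuntimeError("blocked at gate: human spot-check")
        passed.append("human spot-check")
    return passed


# Hypothetical precomputed results; a real pipeline would call the tools.
GATES: list[Gate] = [
    ("type checking", lambda c: c["types_ok"]),
    ("lint", lambda c: c["lint_ok"]),
    ("tests", lambda c: c["tests_ok"]),
    ("security scan", lambda c: c["scan_ok"]),
    ("visual review", lambda c: not c["touches_ui"] or c["visual_ok"]),
]

change = {"types_ok": True, "lint_ok": True, "tests_ok": True, "scan_ok": True,
          "touches_ui": False, "visual_ok": False, "human_spot_check": True}
print(run_gates(change, GATES, autonomy_level=4))
```

Gate failures raise rather than return a flag. A caller that forgets to check a return value still cannot ship a blocked change.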
## 7. Operating Position

A team can name its current operating point on the autonomy slider explicitly. Most production-LLM teams sit between **level 3 and level 4**: agents implement complete features end-to-end, run quality gates locally, and the human reviews the completed work via a structured session summary or PR rather than line-by-line as it is being written.

Moving up the slider over time is a deliberate engineering project: each step requires the supporting infrastructure to move with it (eval suites, tripwire guardrails on destructive operations, persistent memory across sessions, model-routing logic for matching tasks to model strengths). A team that drifts upward without that infrastructure is drifting toward a regression event, not toward higher productivity.

Moving down the slider is also legitimate: high-stakes work (production deployment, security-sensitive logic, irreversible data operations) should run at lower autonomy regardless of the team's overall position. The slider is a per-task setting, not a team-wide setting.

## Verification

- [ ] Prompts and skill specifications are treated as source code — versioned in git, reviewed before merge, covered by evals where useful
- [ ] Every agent session operates at an _intentional_ autonomy level chosen for the task's risk, not the harness's default
- [ ] Quality gates exist between agent output and production: type check, lint, automated tests, security scan, plus a human spot-check while autonomy is < 5
- [ ] Productivity is measured by outcomes (tasks completed, rework rate, time-to-value), not by output volume (LoC, commit count)
- [ ] Knowledge is captured durably (skill files, decision records, structured session summaries) rather than lost between sessions
- [ ] Auto-improve loops are constrained per the AutoResearch pattern — one editable asset, one scalar metric, one time box
- [ ] Security regressions known to come from AI-generated code (data exposure, weak auth, accepted-but-vulnerable patterns) are explicitly mitigated by the gate stack
- [ ] Vibe-coding patterns are limited to throwaway prototypes; production work runs as agentic engineering
- [ ] The team can answer "what is our current autonomy level on this task?" without ambiguity, and the answer is justified by the task's risk profile

## Do NOT Use When

| Use instead          | When                                                                                                                                                              |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt-craft`       | Authoring or improving a specific prompt — the per-prompt discipline below this skill's conceptual frame                                                          |
| `agent-engineering`  | Designing the production-reliability layer for an agent system: orchestration patterns, error budgets, observability, fault tolerance                             |
| `code-review`        | Reviewing the AI-generated code that comes out of a Software 3.0 workflow — this skill frames _why_ the review is needed; code-review is _how_ the review is done |
| `tool-call-strategy` | The tactical layer of which tool an agent should call when, in what order, with what fallback                                                                     |
| `skill-router`       | The cross-skill dispatch decision (which skill activates for a query) — this skill is meta about _why_ a skill library exists at all                              |
| `debugging`          | An autonomous loop has stalled, regressed, or is producing wrong output and you need to chase the root cause                                                      |
@@ -0,0 +1,60 @@
---
name: api-design
description: "Use when designing or reviewing API surfaces: resources/actions, request and response schemas, status codes, pagination, filtering, idempotency, versioning, auth boundaries, and error envelopes. Do NOT use for non-HTTP system contracts (use `system-interface-contracts`), async event contracts (use `event-contract-design`), database design (use `data-modeling`), or inbound provider webhook mechanics (use `webhook-integration`)."
license: MIT
compatibility: "Portable API design guidance for REST-like HTTP APIs, route handlers, internal APIs, and documented JSON contracts."
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"engineering\",\"domain\":\"engineering/api-design\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-11\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-11\\\\\\\"}\",\"eval_artifacts\":\"present\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"API design\\\\\\\",\\\\\\\"REST API\\\\\\\",\\\\\\\"endpoint design\\\\\\\",\\\\\\\"request response schema\\\\\\\",\\\\\\\"status codes\\\\\\\",\\\\\\\"pagination\\\\\\\",\\\\\\\"filtering\\\\\\\",\\\\\\\"idempotency\\\\\\\",\\\\\\\"API versioning\\\\\\\",\\\\\\\"error envelope\\\\\\\"]\",\"examples\":\"[\\\\\\\"design the API for listing orders with filters, pagination, and stable errors\\\\\\\",\\\\\\\"review this route contract before frontend and backend implement it separately\\\\\\\",\\\\\\\"should this operation be a resource update, an action endpoint, or an async job?\\\\\\\",\\\\\\\"define API versioning and idempotency for this create endpoint\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"define the broader contract between a job, service, and dashboard\\\\\\\",\\\\\\\"design database tables, foreign keys, and views\\\\\\\",\\\\\\\"implement provider webhook signature verification and retry behavior\\\\\\\",\\\\\\\"debug why this endpoint is returning 500\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"system-interface-contracts\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"system-interface-contracts owns interface contracts across any boundary; api-design owns API endpoint shape and HTTP semantics\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"event-contract-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"event-contract-design owns asynchronous event and message contracts; api-design owns HTTP request/response 
surfaces\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"data-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"data-modeling owns persistence shape; api-design owns external representation and operation shape\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"webhook-integration\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"webhook-integration owns inbound provider webhooks; api-design owns APIs the system exposes or calls by contract\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"debugging\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"debugging owns known endpoint failures; api-design owns pre-implementation surface design\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"system-interface-contracts\\\\\\\",\\\\\\\"data-modeling\\\\\\\",\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"webhook-integration\\\\\\\",\\\\\\\"event-contract-design\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"code-review\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":365,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/api-design/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v4
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/api-design/SKILL.md
---

# API Design

## Coverage

Design clear and evolvable API surfaces. Covers resources vs actions, route naming, request/response schemas, validation, status codes, pagination, filtering, sorting, idempotency, auth boundaries, error envelopes, rate-limit signals, versioning, deprecation, and contract examples.

## Philosophy

An API is a product surface for another program. Its main job is stable meaning under change. Internal convenience should not leak into routes, schemas, or errors unless consumers actually need it.

Prefer boring consistency. A small set of predictable patterns beats clever endpoint-specific behavior that every client has to rediscover.

## Method

1. Identify consumers and their tasks.
2. Model resources and actions separately.
3. Define request, response, and error schema examples.
4. Decide pagination, filtering, sorting, and field selection.
5. State auth, tenant, and permission boundaries.
6. Add idempotency and retry behavior for mutating or async operations.
7. Define versioning and deprecation rules before the first breaking change.
8. Add contract tests or fixtures.
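Steps 3 and 4 are where consistency most often slips. Below is a framework-free Python sketch, assuming an in-memory list stands in for the data store. The collection is ordered by `(created_at, id)`, so pages stay stable when timestamps tie. The cursor is an opaque base64 token. Every failure uses the same `{"error": {"code", "message"}}` envelope. The envelope shape, field names, and the `list_orders` handler are illustrative choices, not a contract this skill prescribes.

```python
import base64
import json
from typing import Optional

# Stand-in for a data store. Two rows share a timestamp on purpose.
ORDERS = [
    {"id": 3, "created_at": "2026-01-02T09:00:00Z", "total": 40},
    {"id": 1, "created_at": "2026-01-01T09:00:00Z", "total": 25},
    {"id": 2, "created_at": "2026-01-01T09:00:00Z", "total": 10},
]


def error(status: int, code: str, message: str) -> tuple[int, dict]:
    """The single error envelope every endpoint returns."""
    return status, {"error": {"code": code, "message": message}}


def encode_cursor(row: dict) -> str:
    """Opaque cursor: the sort key of the last row on the page."""
    raw = json.dumps([row["created_at"], row["id"]]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, int]:
    created_at, row_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    if not isinstance(created_at, str) or not isinstance(row_id, int):
        raise ValueError("cursor payload has the wrong shape")
    return created_at, row_id


def list_orders(rows: list[dict], limit: int = 2,
                cursor: Optional[str] = None) -> tuple[int, dict]:
    if not 1 <= limit <= 100:
        return error(400, "invalid_limit", "limit must be between 1 and 100")
    # Stable ordering: (created_at, id) is unique, so ties never reshuffle pages.
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if cursor is not None:
        try:
            after = decode_cursor(cursor)
        except (ValueError, TypeError):
            return error(400, "invalid_cursor", "cursor is malformed")
        ordered = [r for r in ordered if (r["created_at"], r["id"]) > after]
    page = ordered[:limit]
    next_cursor = encode_cursor(page[-1]) if len(ordered) > limit else None
    return 200, {"data": page, "next_cursor": next_cursor}


status, first = list_orders(ORDERS)
status, second = list_orders(ORDERS, cursor=first["next_cursor"])
```

Sorting on `created_at` alone would let the two rows that share a timestamp swap between requests, duplicating or skipping a row at a page boundary. The `id` tiebreaker makes the sort key unique.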
## Evals

This skill ships a comprehension-eval artifact at [`examples/evals/api-design.json`](https://github.com/jacob-balslev/skill-graph/blob/main/examples/evals/api-design.json). The checklist below is the authoring gate for API surface decisions; the eval file is the grader surface.

## Verification

- [ ] Routes use consistent nouns/actions and avoid implementation leakage
- [ ] Request and response examples cover success and failure
- [ ] Status codes match retry and client-action semantics
- [ ] Collection endpoints define pagination and stable ordering
- [ ] Mutating operations define idempotency or explicitly reject it
- [ ] Auth and tenant boundaries are visible in the contract
- [ ] Breaking-change and deprecation rules are stated
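A common way to satisfy "mutating operations define idempotency" is a client-supplied idempotency key. The first request with a key runs the operation and stores its response. A retry with the same key and body replays that response. The same key with a different body is rejected. The sketch below stores keys in memory under those assumptions. A real service would persist keys with an expiry and handle concurrent requests carrying the same key, which this omits. The 422 status for key reuse is one reasonable choice, not a rule from this skill.

```python
import hashlib
import json
from typing import Callable

Response = tuple[int, dict]


class IdempotencyStore:
    """In-memory record of the first response for each idempotency key."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, Response]] = {}

    def handle(self, key: str, body: dict, create: Callable[[dict], Response]) -> Response:
        # Fingerprint the request so a reused key with a different body is detectable.
        fingerprint = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if key in self._seen:
            stored_fingerprint, stored_response = self._seen[key]
            if stored_fingerprint != fingerprint:
                return 422, {"error": {"code": "idempotency_key_reused",
                                       "message": "key was already used with a different body"}}
            return stored_response  # a retry: replay the result, no second side effect
        response = create(body)
        self._seen[key] = (fingerprint, response)
        return response


store = IdempotencyStore()
orders_created = []


def create_order(body: dict) -> Response:
    orders_created.append(body)
    return 201, {"id": len(orders_created), **body}


first = store.handle("key-123", {"sku": "A1", "qty": 2}, create_order)
retry = store.handle("key-123", {"qty": 2, "sku": "A1"}, create_order)  # same body, reordered
```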
## Do NOT Use When

| Use instead | When |
|---|---|
| `system-interface-contracts` | The boundary is broader than an API endpoint, such as jobs, modules, events, or agent interfaces. |
| `event-contract-design` | You need asynchronous event schema, envelope, topic/channel naming, replay, or compatibility. |
| `data-modeling` | You need persistence structure and constraints. |
| `webhook-integration` | You are implementing inbound third-party webhook handling. |
| `debugging` | An API already fails and needs diagnosis. |
@@ -0,0 +1,55 @@
---
name: architecture-decision-records
description: "Use when writing, reviewing, or updating Architecture Decision Records: context, decision, options rejected, consequences, status, supersession, and follow-up verification. Do NOT use for general documentation prose (use `documentation`), code review findings (use `code-review`), or choosing between frameworks before a decision exists (use `framework-fit-analysis`)."
license: MIT
compatibility: "Portable ADR discipline for Markdown decision logs, repo docs, design docs, and architecture governance."
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"engineering\",\"domain\":\"architecture/decision-records\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-11\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-11\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"ADR\\\\\\\",\\\\\\\"architecture decision record\\\\\\\",\\\\\\\"decision log\\\\\\\",\\\\\\\"technical decision\\\\\\\",\\\\\\\"decision consequences\\\\\\\",\\\\\\\"options rejected\\\\\\\",\\\\\\\"superseded ADR\\\\\\\",\\\\\\\"architectural rationale\\\\\\\",\\\\\\\"decision status\\\\\\\"]\",\"examples\":\"[\\\\\\\"write an ADR for choosing Postgres views as the source of truth\\\\\\\",\\\\\\\"review this architecture decision record for missing consequences and rejected options\\\\\\\",\\\\\\\"this decision changed - should we amend the ADR or supersede it?\\\\\\\",\\\\\\\"extract the decision from this long architecture discussion into a durable ADR\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"write a general README section explaining how this module works\\\\\\\",\\\\\\\"choose which framework we should use for this project\\\\\\\",\\\\\\\"review this PR for bugs and regressions\\\\\\\",\\\\\\\"design the interface contract between these two services\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"framework-fit-analysis\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"framework-fit-analysis evaluates options before selection; architecture-decision-records records the selected option and tradeoffs\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"code-review\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"code-review evaluates a diff; architecture-decision-records evaluates the decision 
record\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"system-interface-contracts\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"system-interface-contracts designs boundaries and contracts; architecture-decision-records records the decision to adopt one\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"framework-fit-analysis\\\\\\\",\\\\\\\"bounded-context-mapping\\\\\\\",\\\\\\\"system-interface-contracts\\\\\\\",\\\\\\\"dependency-architecture\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"code-review\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":365,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/architecture-decision-records/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v4
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/architecture-decision-records/SKILL.md
---

# Architecture Decision Records

## Coverage

Create and audit ADRs for significant technical choices. Covers decision context, forces, considered options, chosen decision, rejected alternatives, consequences, status, supersession, links to implementation, and follow-up verification. Use for decisions with future readers, cross-team consequences, operational cost, or hard-to-reverse effects.

## Philosophy

An ADR is not a design essay. It is a durable answer to "Why did we choose this, given what we knew then?" It should preserve the tradeoff, not retroactively make the decision look inevitable.

Good ADRs are short, dated, statused, and honest about consequences. If a future agent cannot tell whether the decision still holds, the record failed.

## Method

1. Name the decision in one sentence.
2. Capture context and forces, including constraints and non-goals.
3. List serious options considered, including "do nothing" when real.
4. State the decision and why it won.
5. Record consequences: benefits, costs, risks, migration obligations, and reversibility.
6. Set status: proposed, accepted, deprecated, or superseded.
7. Link implementation surfaces and verification checks.

## Verification

- [ ] The ADR records one decision, not a cluster of unrelated choices
- [ ] Rejected options are concrete and plausible
- [ ] Consequences include costs and risks, not only benefits
- [ ] Status and date are present
- [ ] Supersession links are explicit when the decision changed
- [ ] Implementation references are current or intentionally absent
- [ ] The decision can be understood without reading the whole discussion that caused it
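Several of these checks are mechanical and can run before a reviewer reads the record. The sketch below assumes ADRs have already been parsed into a hypothetical `ADR` record. Its field names are illustrative, not a format this skill prescribes. Items that need judgment are left to the reviewer, such as "records one decision" or "understandable without the discussion".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

STATUSES = {"proposed", "accepted", "deprecated", "superseded"}


@dataclass
class ADR:
    number: int
    title: str
    status: str
    decided_on: Optional[date]
    rejected_options: list[str] = field(default_factory=list)
    costs_and_risks: list[str] = field(default_factory=list)
    superseded_by: Optional[int] = None  # number of the ADR that replaced this one


def lint_adr(adr: ADR) -> list[str]:
    """Mechanical checks from the verification list; judgment items stay manual."""
    problems = []
    if adr.status not in STATUSES:
        problems.append(f"unknown status: {adr.status!r}")
    if adr.decided_on is None:
        problems.append("missing date")
    if not adr.rejected_options:
        problems.append("no rejected options recorded")
    if not adr.costs_and_risks:
        problems.append("consequences record no costs or risks")
    if adr.status == "superseded" and adr.superseded_by is None:
        problems.append("superseded without a link to its replacement")
    if adr.superseded_by is not None and adr.status != "superseded":
        problems.append("links a replacement but status is not 'superseded'")
    return problems


adr = ADR(12, "Use Postgres views as the source of truth", "accepted", date(2026, 5, 11),
          rejected_options=["materialized tables refreshed by cron"],
          costs_and_risks=["complex views are harder to index and profile"])
print(lint_adr(adr))  # []
```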
## Do NOT Use When

| Use instead | When |
|---|---|
| `documentation` | You need a guide, README, tutorial, or reference page. |
| `framework-fit-analysis` | The task is still evaluating options before a decision. |
| `code-review` | The task is reviewing a diff for correctness. |
| `system-interface-contracts` | You need to design a contract before recording the decision. |