@skill-graph/cli 0.5.6
- package/CHANGELOG.md +247 -0
- package/LICENSE +200 -0
- package/NOTICE +62 -0
- package/README.md +398 -0
- package/SKILL_GRAPH.md +443 -0
- package/bin/skill-graph.js +374 -0
- package/docs/ADOPTION.md +117 -0
- package/docs/CONFORMANCE.md +66 -0
- package/docs/PRIMER.md +384 -0
- package/docs/QUICKSTART-30MIN.md +333 -0
- package/docs/ROUTING-METRICS.md +120 -0
- package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
- package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
- package/docs/SKILL_AUDIT_LOOP.md +195 -0
- package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
- package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
- package/docs/adr/0001-predicate-set.md +69 -0
- package/docs/adr/0002-json-ld-context.md +82 -0
- package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
- package/docs/adr/0004-persistent-identifiers.md +74 -0
- package/docs/adr/0005-freshness-consolidation.md +70 -0
- package/docs/adr/0006-revise-predicate-rename.md +105 -0
- package/docs/adr/0007-audit-loop-cadence.md +99 -0
- package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
- package/docs/category-consumers.md +168 -0
- package/docs/concept-map.md +194 -0
- package/docs/diagrams/drift-states.mmd +21 -0
- package/docs/diagrams/manifest-pipeline.mmd +25 -0
- package/docs/diagrams/routing-harness.mmd +41 -0
- package/docs/diagrams/starter-graph.mmd +53 -0
- package/docs/field-decision-guide.md +315 -0
- package/docs/field-rationale.md +211 -0
- package/docs/field-reference.generated.md +624 -0
- package/docs/field-reference.md +1426 -0
- package/docs/glossary.md +190 -0
- package/docs/head-noun-glossary.md +63 -0
- package/docs/images/audit-phases.png +0 -0
- package/docs/images/drift-states.png +0 -0
- package/docs/images/graded-mode.png +0 -0
- package/docs/images/manifest-pipeline.png +0 -0
- package/docs/images/routing-harness.png +0 -0
- package/docs/images/skill-anatomy.png +0 -0
- package/docs/images/starter-graph.png +0 -0
- package/docs/images/system-model.png +0 -0
- package/docs/integrations/github-actions.md +155 -0
- package/docs/manifest-field-mapping.md +443 -0
- package/docs/marketplace-publication-queue.generated.md +240 -0
- package/docs/marketplace-release-agent-prompt.md +82 -0
- package/docs/marketplace-skill-candidate-list.md +272 -0
- package/docs/marketplace-syndication.md +222 -0
- package/docs/migration-sample-review.md +155 -0
- package/docs/migrations/v4-to-v5.md +168 -0
- package/docs/migrations/v5-to-v6.md +221 -0
- package/docs/name-exceptions.yaml +37 -0
- package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
- package/docs/plans/multi-root-workspace.md +148 -0
- package/docs/plans/scripts-roadmap.md +107 -0
- package/docs/plans/v4-schema-bump.md +160 -0
- package/docs/plans/wave-2-extraction.md +122 -0
- package/docs/positioning-vs-marketplaces.md +175 -0
- package/docs/proposals/skill-audit-loop-positioning.md +160 -0
- package/docs/quality-doctrine.md +138 -0
- package/docs/recommended-skills.md +150 -0
- package/docs/research/skill-comprehension-eval-research.md +1830 -0
- package/docs/research/skill-retrieval-evidence.md +66 -0
- package/docs/skill-metadata-protocol.md +471 -0
- package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
- package/examples/audits/a11y/findings.md +52 -0
- package/examples/audits/a11y/scorecard.md +21 -0
- package/examples/audits/a11y/verdict.md +44 -0
- package/examples/audits/debugging/findings.md +59 -0
- package/examples/audits/debugging/scorecard.md +22 -0
- package/examples/audits/debugging/verdict.md +33 -0
- package/examples/audits/documentation/findings.md +59 -0
- package/examples/audits/documentation/scorecard.md +22 -0
- package/examples/audits/documentation/verdict.md +33 -0
- package/examples/evals/a11y.json +140 -0
- package/examples/evals/api-design.json +52 -0
- package/examples/evals/code-review.json +52 -0
- package/examples/evals/data-modeling.json +52 -0
- package/examples/evals/database-migration.json +52 -0
- package/examples/evals/debugging.json +118 -0
- package/examples/evals/dependency-architecture.json +52 -0
- package/examples/evals/design-system-architecture.json +52 -0
- package/examples/evals/error-tracking.json +52 -0
- package/examples/evals/event-contract-design.json +52 -0
- package/examples/evals/form-ux-architecture.json +52 -0
- package/examples/evals/framework-fit-analysis.json +52 -0
- package/examples/evals/graph-audit.json +139 -0
- package/examples/evals/information-architecture.json +52 -0
- package/examples/evals/interaction-feedback.json +52 -0
- package/examples/evals/interaction-patterns.json +52 -0
- package/examples/evals/layout-composition.json +52 -0
- package/examples/evals/lint-overlay.json +117 -0
- package/examples/evals/microcopy.json +52 -0
- package/examples/evals/observability-modeling.json +52 -0
- package/examples/evals/pattern-recognition.json +96 -0
- package/examples/evals/performance-engineering.json +52 -0
- package/examples/evals/refactor.json +128 -0
- package/examples/evals/semiotics.json +52 -0
- package/examples/evals/skill-infrastructure.json +96 -0
- package/examples/evals/skill-router.json +140 -0
- package/examples/evals/skill-router.routing.json +113 -0
- package/examples/evals/system-interface-contracts.json +52 -0
- package/examples/evals/task-analysis.json +52 -0
- package/examples/evals/testing-strategy.json +118 -0
- package/examples/evals/type-safety.json +249 -0
- package/examples/evals/visual-design-foundations.json +52 -0
- package/examples/evals/webhook-integration.json +52 -0
- package/examples/exports/a11y.skill-md.md +80 -0
- package/examples/exports/debugging.skill-md.md +80 -0
- package/examples/exports/refactor.skill-md.md +78 -0
- package/examples/exports/testing-strategy.skill-md.md +81 -0
- package/examples/projects/markdown-static-site/README.md +115 -0
- package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
- package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
- package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
- package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
- package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
- package/examples/projects/saas-stripe-postgres/README.md +208 -0
- package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
- package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
- package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
- package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
- package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
- package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
- package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
- package/examples/protocol/skill-metadata-template.md +301 -0
- package/examples/protocol/skills.manifest.sample.json +13245 -0
- package/examples/skill-metadata-template.md +317 -0
- package/examples/skills.manifest.sample.json +13519 -0
- package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
- package/marketplace/README.md +17 -0
- package/marketplace/skills/a11y/SKILL.md +66 -0
- package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
- package/marketplace/skills/agent-engineering/SKILL.md +386 -0
- package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
- package/marketplace/skills/ai-native-development/SKILL.md +294 -0
- package/marketplace/skills/api-design/SKILL.md +60 -0
- package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
- package/marketplace/skills/background-jobs/SKILL.md +265 -0
- package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
- package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
- package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
- package/marketplace/skills/code-review/SKILL.md +120 -0
- package/marketplace/skills/color-system-design/SKILL.md +43 -0
- package/marketplace/skills/component-architecture/SKILL.md +126 -0
- package/marketplace/skills/compression/SKILL.md +112 -0
- package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
- package/marketplace/skills/connection-pooling/SKILL.md +105 -0
- package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
- package/marketplace/skills/content-monitor/SKILL.md +209 -0
- package/marketplace/skills/context-engineering/SKILL.md +320 -0
- package/marketplace/skills/context-graph/SKILL.md +174 -0
- package/marketplace/skills/context-management/SKILL.md +174 -0
- package/marketplace/skills/context-window/SKILL.md +239 -0
- package/marketplace/skills/contract-testing/SKILL.md +120 -0
- package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
- package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
- package/marketplace/skills/data-modeling/SKILL.md +59 -0
- package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
- package/marketplace/skills/database-migration/SKILL.md +429 -0
- package/marketplace/skills/debugging/SKILL.md +67 -0
- package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
- package/marketplace/skills/design-module-composition/SKILL.md +43 -0
- package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
- package/marketplace/skills/design-thinking/SKILL.md +44 -0
- package/marketplace/skills/diagnosis/SKILL.md +296 -0
- package/marketplace/skills/diff-analysis/SKILL.md +188 -0
- package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
- package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
- package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
- package/marketplace/skills/error-boundary/SKILL.md +235 -0
- package/marketplace/skills/error-tracking/SKILL.md +261 -0
- package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
- package/marketplace/skills/evaluation/SKILL.md +113 -0
- package/marketplace/skills/event-contract-design/SKILL.md +60 -0
- package/marketplace/skills/event-storming/SKILL.md +56 -0
- package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
- package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
- package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
- package/marketplace/skills/generative-ui/SKILL.md +118 -0
- package/marketplace/skills/graph-audit/SKILL.md +81 -0
- package/marketplace/skills/guardrails/SKILL.md +118 -0
- package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
- package/marketplace/skills/http-semantics/SKILL.md +136 -0
- package/marketplace/skills/ideation/SKILL.md +41 -0
- package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
- package/marketplace/skills/information-architecture/SKILL.md +59 -0
- package/marketplace/skills/integration-test-design/SKILL.md +111 -0
- package/marketplace/skills/intent-recognition/SKILL.md +136 -0
- package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
- package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
- package/marketplace/skills/journey-mapping/SKILL.md +41 -0
- package/marketplace/skills/keywords/SKILL.md +213 -0
- package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
- package/marketplace/skills/layout-composition/SKILL.md +59 -0
- package/marketplace/skills/linguistics/SKILL.md +429 -0
- package/marketplace/skills/lint-overlay/SKILL.md +76 -0
- package/marketplace/skills/mental-models/SKILL.md +126 -0
- package/marketplace/skills/merge-queue/SKILL.md +94 -0
- package/marketplace/skills/methodology/SKILL.md +317 -0
- package/marketplace/skills/microcopy/SKILL.md +232 -0
- package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
- package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
- package/marketplace/skills/mutation-testing/SKILL.md +112 -0
- package/marketplace/skills/naming-conventions/SKILL.md +112 -0
- package/marketplace/skills/observability-modeling/SKILL.md +59 -0
- package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
- package/marketplace/skills/owasp-security/SKILL.md +153 -0
- package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
- package/marketplace/skills/performance-budgets/SKILL.md +185 -0
- package/marketplace/skills/performance-engineering/SKILL.md +58 -0
- package/marketplace/skills/performance-testing/SKILL.md +125 -0
- package/marketplace/skills/printify/SKILL.md +42 -0
- package/marketplace/skills/prioritization/SKILL.md +118 -0
- package/marketplace/skills/problem-framing/SKILL.md +41 -0
- package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
- package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
- package/marketplace/skills/prompt-craft/SKILL.md +134 -0
- package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
- package/marketplace/skills/property-based-testing/SKILL.md +100 -0
- package/marketplace/skills/prototyping/SKILL.md +43 -0
- package/marketplace/skills/query-optimization/SKILL.md +144 -0
- package/marketplace/skills/real-time-updates/SKILL.md +324 -0
- package/marketplace/skills/ref-patterns/SKILL.md +284 -0
- package/marketplace/skills/refactor/SKILL.md +65 -0
- package/marketplace/skills/rendering-models/SKILL.md +142 -0
- package/marketplace/skills/replication-patterns/SKILL.md +110 -0
- package/marketplace/skills/research-synthesis/SKILL.md +41 -0
- package/marketplace/skills/route-handler-design/SKILL.md +347 -0
- package/marketplace/skills/schema-evolution/SKILL.md +140 -0
- package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
- package/marketplace/skills/semantic-center/SKILL.md +194 -0
- package/marketplace/skills/semantic-relations/SKILL.md +250 -0
- package/marketplace/skills/semantics/SKILL.md +366 -0
- package/marketplace/skills/semiotics/SKILL.md +230 -0
- package/marketplace/skills/seo-strategy/SKILL.md +260 -0
- package/marketplace/skills/server-actions-design/SKILL.md +243 -0
- package/marketplace/skills/server-components-design/SKILL.md +190 -0
- package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
- package/marketplace/skills/shopify/SKILL.md +42 -0
- package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
- package/marketplace/skills/skill-router/SKILL.md +71 -0
- package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
- package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
- package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
- package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
- package/marketplace/skills/state-management/SKILL.md +134 -0
- package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
- package/marketplace/skills/summarization/SKILL.md +156 -0
- package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
- package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
- package/marketplace/skills/task-analysis/SKILL.md +201 -0
- package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
- package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
- package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
- package/marketplace/skills/test-driven-development/SKILL.md +96 -0
- package/marketplace/skills/testing-strategy/SKILL.md +67 -0
- package/marketplace/skills/theme-system-design/SKILL.md +43 -0
- package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
- package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
- package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
- package/marketplace/skills/type-safety/SKILL.md +177 -0
- package/marketplace/skills/typography-system/SKILL.md +43 -0
- package/marketplace/skills/usability-testing/SKILL.md +43 -0
- package/marketplace/skills/user-research/SKILL.md +43 -0
- package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
- package/marketplace/skills/version-control/SKILL.md +233 -0
- package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
- package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
- package/marketplace/skills/webhook-integration/SKILL.md +331 -0
- package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
- package/package.json +67 -0
- package/schemas/manifest.schema.json +811 -0
- package/schemas/manifest.v2.schema.json +164 -0
- package/schemas/manifest.v3.schema.json +758 -0
- package/schemas/manifest.v4.schema.json +755 -0
- package/schemas/manifest.v5.schema.json +755 -0
- package/schemas/manifest.v6.schema.json +811 -0
- package/schemas/skill.context.jsonld +279 -0
- package/schemas/skill.schema.json +919 -0
- package/schemas/skill.v2.schema.json +201 -0
- package/schemas/skill.v3.schema.json +827 -0
- package/schemas/skill.v4.schema.json +822 -0
- package/schemas/skill.v5.schema.json +830 -0
- package/schemas/skill.v6.schema.json +946 -0
- package/schemas/vocabulary/keywords.json +180 -0
- package/schemas/vocabulary/workspace_tags.json +23 -0
- package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
- package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
- package/scripts/__tests__/test-export-parser-drift.js +149 -0
- package/scripts/__tests__/test-marketplace-export.js +114 -0
- package/scripts/__tests__/test-router-paths.js +82 -0
- package/scripts/__tests__/test-stability-promotion.js +244 -0
- package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
- package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
- package/scripts/backfill-schema-version.js +198 -0
- package/scripts/build-field-reference.js +160 -0
- package/scripts/build-retrieval-baseline.js +511 -0
- package/scripts/check-markdown-links.js +211 -0
- package/scripts/check-protocol-consistency.js +979 -0
- package/scripts/export-marketplace-skills.js +610 -0
- package/scripts/export-skill.js +374 -0
- package/scripts/generate-manifest.js +787 -0
- package/scripts/lib/alias-contract.js +83 -0
- package/scripts/lib/audit-prompt-builder.js +771 -0
- package/scripts/lib/mock-grader.js +134 -0
- package/scripts/lib/parse-frontmatter.js +429 -0
- package/scripts/lib/roots.js +119 -0
- package/scripts/lint/check-archetype-sections.js +185 -0
- package/scripts/lint/check-category-enum.js +83 -0
- package/scripts/lint/check-routing-eval.js +146 -0
- package/scripts/lint/check-routing-quality.js +211 -0
- package/scripts/lint/check-stability-promotion.js +220 -0
- package/scripts/lint/format-code-frame.js +206 -0
- package/scripts/marketplace-install.js +125 -0
- package/scripts/migrate-category-to-enum.js +169 -0
- package/scripts/migrate-skill-v2-to-v3.js +424 -0
- package/scripts/migrate-skill-v3-to-v4.js +200 -0
- package/scripts/migrate-skill-v5-to-v6.js +304 -0
- package/scripts/restructure-by-category.js +85 -0
- package/scripts/seed-publication-classification.js +282 -0
- package/scripts/skill-audit.js +893 -0
- package/scripts/skill-graph-drift.js +483 -0
- package/scripts/skill-graph-route.js +766 -0
- package/scripts/skill-graph-routing-eval.js +393 -0
- package/scripts/skill-lint.js +1317 -0
- package/scripts/skill-overlap.js +213 -0
- package/scripts/verify-skill-md-export.js +201 -0
@@ -0,0 +1,126 @@
---
name: mental-models
description: "Use when reasoning about how a system, user, or designer's internal model of behavior may diverge from reality — applies across UX, distributed systems, type systems, API design, and team collaboration. Covers the three-model frame (designer / system image / user), the two gulfs (execution and evaluation), analogy and metaphor as model-seeding, the five failure modes (transfer, overgeneralization, underspecification, drift, invariant blindness), the surface/operational/architectural/domain layering, and the discipline of validating a model against the system it claims to represent. Do NOT use for the visual representation of a model (use knowledge-modeling), for the formal-domain entities-attributes-relationships of conceptual modeling (use conceptual-modeling), for cognitive biases in decision-making (out of scope), or for empirically eliciting user models via research methods (use user-research)."
license: MIT
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"foundations\",\"domain\":\"foundations/mental-models\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"mental models\\\\\\\",\\\\\\\"mental model\\\\\\\",\\\\\\\"gulf of execution\\\\\\\",\\\\\\\"gulf of evaluation\\\\\\\",\\\\\\\"user model\\\\\\\",\\\\\\\"system model\\\\\\\",\\\\\\\"designer model\\\\\\\",\\\\\\\"conceptual model\\\\\\\",\\\\\\\"metaphor\\\\\\\",\\\\\\\"analogy\\\\\\\",\\\\\\\"model-system fit\\\\\\\",\\\\\\\"hidden invariant\\\\\\\",\\\\\\\"model drift\\\\\\\",\\\\\\\"shared understanding\\\\\\\",\\\\\\\"cognitive tool\\\\\\\",\\\\\\\"progressive disclosure\\\\\\\",\\\\\\\"race conditions in concurrent code\\\\\\\"]\",\"triggers\":\"[\\\\\\\"how do I think about\\\\\\\",\\\\\\\"users expect X but the system does Y\\\\\\\",\\\\\\\"this is confusing\\\\\\\",\\\\\\\"the user's model doesn't match\\\\\\\",\\\\\\\"why is this surprising\\\\\\\",\\\\\\\"what's the right metaphor\\\\\\\"]\",\"examples\":\"[\\\\\\\"users keep trying to drag a row to reorder but the table doesn't support drag — diagnose the model mismatch\\\\\\\",\\\\\\\"explain why race conditions across parallel tool calls are hard for developers to anticipate\\\\\\\",\\\\\\\"decide on the right metaphor for a feature that combines folders and tags\\\\\\\",\\\\\\\"the team disagrees on what 'workspace' means in the product — surface the divergent mental models\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"draw the boxes-and-arrows diagram of the system (use knowledge-modeling)\\\\\\\",\\\\\\\"name the React hook for managing form state (tactical implementation choice)\\\\\\\",\\\\\\\"write user-research 
interview questions (use user-research)\\\\\\\",\\\\\\\"teach a junior engineer about distributed-systems consistency models (use teaching-patterns)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"conceptual-modeling\\\\\\\",\\\\\\\"knowledge-modeling\\\\\\\",\\\\\\\"user-research\\\\\\\",\\\\\\\"semantics\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"conceptual-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"conceptual-modeling owns the FORMAL representation of a domain (entities, attributes, relationships) for downstream use by schemas and APIs; this skill owns the COGNITIVE construct (how a mind represents the system), which is upstream of any formalism.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"knowledge-modeling\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"knowledge-modeling owns the choice of representation paradigm (graphs, frames, rules, networks); this skill owns the discipline of reasoning about model-system fit before any paradigm is chosen.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"user-research\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"user-research owns the methods for eliciting and validating user mental models empirically; this skill owns the framing of what a mental model is and how to reason about its accuracy.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"user-research\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"A mental model is to a system what a map is to a city — the map is not the territory, the map is useful precisely because it is smaller and selective, and a traveler navigating with the wrong map (a city map for a different city, an out-of-date map, a tourist map missing the metro) does not get lost because the map is 'wrong' but because the map's selectiveness does not match the route they need.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"A mental model is the internal representation a person carries of how a system, domain, or set of 
relationships behaves — used to predict outcomes, plan actions, and interpret feedback. Mental models are constructed (from experience, instruction, analogy, and inference), private (each mind holds its own), and partial (no model captures the full complexity of its referent). They are the cognitive scaffolding that makes the world tractable; they are also the silent source of most surprise, frustration, and bug-shaped misunderstanding. The discipline of mental-models is the explicit study of how these representations are built, where they diverge from the systems they represent, and how to bring multiple models into alignment when collaborators, users, and systems are working from different ones.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/mental-models/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v4
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/mental-models/SKILL.md
---

# Mental Models

## Coverage

The cognitive discipline of building, evaluating, and aligning the internal representations that minds carry of systems, domains, and interactions. Covers the three-model frame (designer model / system image / user model), the two gulfs (execution and evaluation), the role of analogy and metaphor in seeding new models, the five model failure modes (transfer, overgeneralization, underspecification, drift, invariant blindness), the difference between surface, operational, architectural, and domain models, the methods for surfacing and repairing mental models through interface and documentation, and the cross-domain application of mental-models reasoning to UX, distributed systems, type systems, API design, and team collaboration.

## Philosophy

Mental models are how minds make systems tractable. Every user, every developer, every stakeholder operates from a model that is partial, private, and learned. Most surprise, most "confusing UX," most "miscommunication," most "user error" is the trace of a model-system gap. The disciplined response is not to ask whose fault the surprise was, but to ask which model differs from which referent, where, and which intervention closes the gap.

The discipline holds three commitments. First: the user, the developer, and the system are three distinct sources of truth, each with its own model, none of which is automatically right. Second: the interface is the bridge through which the system's actual model can become visible enough to repair user models in flight. Third: the right model for a role is the smallest model that supports successful action in that role. It is not the most complete model, nor the most accurate model in some absolute sense, but the model that does the work and no more.

For agents working in software systems, mental-models discipline is the layer that decides what model to operate from before deciding what code to write. An agent that imports a wrong metaphor ("git branches are tree branches," "React state is OOP state," "a webhook is a function call") produces code that compiles but doesn't work. The discipline is to make the operating model explicit, to test its predictions against the system, and to surface the gap before it becomes a bug.
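
One way to make an operating model explicit is to write its predictions down and check each one against observed behavior. The sketch below is illustrative only: `Prediction`, `find_model_gaps`, and the toy `deliveries_for_one_event` probe are hypothetical names, and the duplicate delivery is an assumed behavior standing in for a real system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Prediction:
    """One testable claim that an operating model makes about the system."""
    claim: str
    expected: object
    observe: Callable[[], object]  # hypothetical probe of the real system


def find_model_gaps(predictions: list[Prediction]) -> list[str]:
    """Return a description of every claim that the system contradicts."""
    gaps = []
    for p in predictions:
        actual = p.observe()
        if actual != p.expected:
            gaps.append(f"{p.claim}: expected {p.expected!r}, observed {actual!r}")
    return gaps


def deliveries_for_one_event() -> int:
    """Toy stand-in: assume the sender retried, so one event arrived twice."""
    return 2


# The "a webhook is a function call" model, stated as a checkable prediction.
function_call_model = [
    Prediction(
        claim="one event triggers exactly one handler invocation",
        expected=1,
        observe=deliveries_for_one_event,
    ),
]

gaps = find_model_gaps(function_call_model)
```

Here `gaps` holds one entry naming the contradicted claim, which is the point at which the metaphor should be revised rather than the code patched around it.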

## The Three-Model Frame

Norman's foundational distinction: every system involves three models that may or may not align.

| Model | Held by | Built from | Failure mode when divergent |
|---|---|---|---|
| **Designer's model** | The builders | Intent, specs, internal docs | "I built it this way; why doesn't anyone get it?" Usually the system image fails to convey the designer's model |
| **System image** | The interface itself | Visual design, behavior, copy, feedback | An accurate system that looks wrong, or an inaccurate system that looks reasonable; both confuse users |
| **User's model** | Each user | Interface, prior experience, instruction | User takes actions that match their model but not the system, then reports the system as buggy |

The interface is the only channel by which the designer's model can reach the user. If the interface is opaque, inconsistent, or contradicts the system's actual behavior, the user constructs the best model they can from incomplete signals, and that model will be wrong in predictable ways.

## The Two Gulfs

Norman's framework for locating where model failure shows up.

| Gulf | Question it asks | What widens it | What narrows it |
|---|---|---|---|
| **Gulf of execution** | "How do I do what I want to do?" | Hidden affordances, non-discoverable commands, jargon-labeled controls | Direct manipulation, obvious primary action, learned conventions matched correctly |
| **Gulf of evaluation** | "What did the system actually do?" | Silent state changes, no feedback, ambiguous results | Immediate visible feedback, undo, state inspection, clear confirmations |

A wide gulf of execution means the user can't find the right action. A wide gulf of evaluation means they can't tell whether their action succeeded. Both are bridged by surfacing the system's available actions and current state in a form the user's model can read.

## Model Failure Modes And Their Interventions

| Failure mode | What it looks like | Intervention |
|---|---|---|
| **Transfer failure** | User applies a model from another system whose details don't fit ("I tried to drag this row to reorder, like every other table") | Surface the difference explicitly: tooltip, copy, signifier change. Or implement the expected behavior. |
| **Overgeneralization** | A pattern that worked in one context applied beyond its scope ("I tried to click outside the dialog to dismiss; it was a modal that requires explicit choice") | Signal boundaries: visual cues, copy that names the constraint, behavioral signals |
| **Underspecification** | The user has a model that's silent on the case they hit ("I assumed undo would work everywhere; this action wasn't undoable") | Progressive disclosure of the relevant constraint at the moment of relevance, not in advance |
| **Drift** | A model that was once correct but no longer matches ("I learned this workflow last year; it has three new steps now") | Change communication: in-product release notes, prompts on first encounter with changed behavior, removal of stale documentation |
| **Invariant blindness** | The system has a rule the user doesn't know about ("you can edit this except when another user has it locked, and the lock isn't visible") | Surface the invariant at the moment of relevance: lock indicator, error message that names the rule, preflight check |

The right intervention depends on the right diagnosis. "Train the user harder" is almost never correct: the user's model came from somewhere, usually from the interface or precedent.
|
|
62
|
+
|
|
63
|
+
## Metaphor As Model-Seeding
|
|
64
|
+
|
|
65
|
+
A new mental model is most often constructed by analogy from an existing one. The chosen metaphor has long downstream effects.
|
|
66
|
+
|
|
67
|
+
| Metaphor | What it seeds | What it costs |
|
|
68
|
+
|---|---|---|
|
|
69
|
+
| File ⟶ paper-in-folder | Discrete unit; in one place; can be moved | Implies physical containment that cloud-sync, multi-tag, and shared-link systems violate |
|
|
70
|
+
| Email folder ⟶ filing cabinet | Mutually exclusive categories | Doesn't fit messages that belong in multiple buckets; a tag model fits better |
|
|
71
|
+
| Web page ⟶ printed document | Static, paginated, top-to-bottom | Doesn't fit dynamic, interactive, scrollable, infinite-content pages |
|
|
72
|
+
| API call ⟶ function call | Synchronous, single-machine, return-immediately | Hides network failure, latency, retries, partial failure |
|
|
73
|
+
| Thread ⟶ worker | Independent unit; pauseable | Doesn't surface the shared-memory failure modes that cause actual concurrency bugs |
|
|
74
|
+
| Database transaction ⟶ all-or-nothing | Strong atomicity | Says nothing about isolation: under weaker isolation levels (Read Committed, etc.) concurrent transactions can still observe each other's committed changes; see `transaction-isolation` |
|
|
75
|
+
|
|
76
|
+
Choosing a metaphor commits the designer to surfacing the points where it breaks down. The metaphor is useful precisely because of its match; it is dangerous precisely because of its mismatches.
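
As one illustration of surfacing a breakdown, take the API call ⟶ function call row. The following is a minimal sketch, not a prescribed pattern: `RemoteResult`, `call_remote`, and `make_flaky` are hypothetical names, and `fetch` stands in for any network operation. It shows a wrapper whose return value makes retries and failure visible instead of hiding them behind a function-call shape.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RemoteResult:
    """Outcome of a remote call; failure is data, not a hidden surprise."""
    ok: bool
    value: Optional[str]
    attempts: int
    error: Optional[str] = None


def call_remote(fetch: Callable[[], str], max_attempts: int = 3) -> RemoteResult:
    """Call `fetch`, retrying on ConnectionError or TimeoutError.

    `fetch` is a hypothetical stand-in for any network operation.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return RemoteResult(ok=True, value=fetch(), attempts=attempt)
        except (ConnectionError, TimeoutError) as exc:
            last_error = f"{type(exc).__name__}: {exc}"
    return RemoteResult(ok=False, value=None, attempts=max_attempts, error=last_error)


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a fake fetch that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("simulated timeout")
        return "payload"

    return fetch
```

A caller that receives a `RemoteResult` has to look at `ok` and `attempts`, which is the point: the interface itself signals where the function-call metaphor stops holding.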
|
|
77
|
+
|
|
78
|
+
## Applying The Discipline
|
|
79
|
+
|
|
80
|
+
| Domain | Question to ask |
|
|
81
|
+
|---|---|
|
|
82
|
+
| **Designing a feature** | What's the user's model now? What model do we want them to have? What's the smallest interface change that bridges the two? |
|
|
83
|
+
| **Naming a function or API** | What model does this name imply? Does the function actually behave that way? If not, rename or change behavior. |
|
|
84
|
+
| **Writing an error message** | What model fragment caused this error? What model would prevent it? Can the error message implant the corrective model? |
|
|
85
|
+
| **Onboarding a new user** | What metaphor will we seed? Will it still fit in 6 months when the user is fluent? |
|
|
86
|
+
| **Resolving team disagreement** | Are we disagreeing about the system, or about each member's model of the system? Compare models before debating decisions. |
|
|
87
|
+
| **Debugging a confusing codebase** | What model would make this code obvious? Why doesn't the code communicate that model? |
|
|
88
|
+
| **Working with an agent on a codebase** | What model is the agent operating from? What would it predict about this code? Does the prediction match reality? |
|
|
89
|
+
| **Architecting a distributed system** | What is the designer's model of consistency? What will the system actually exhibit? What's in the system image (logs, dashboards, error semantics) that lets operators build the right operational model? |
|
|
90
|
+
|
|
91
|
+
## Verification
|
|
92
|
+
|
|
93
|
+
After applying this skill, verify:
|
|
94
|
+
- [ ] The system being designed, debugged, or discussed has the three models (designer / image / user) named separately, not conflated into a single "the system."
|
|
95
|
+
- [ ] When the user surprises you, the diagnosis identifies which failure mode (transfer / overgeneralization / underspecification / drift / invariant blindness) is operating before any intervention is chosen.
|
|
96
|
+
- [ ] When you choose a metaphor, you have explicitly listed at least one place it breaks down and how the interface will signal that breakdown.
|
|
97
|
+
- [ ] When the team disagrees, you have asked whether the disagreement is about the system or about the models of the system, and you have surfaced the divergent models before adjudicating.
|
|
98
|
+
- [ ] The model targeted for users is the smallest one that supports their role, not the most complete one that explains the implementation.
|
|
99
|
+
- [ ] Interfaces, error messages, naming, and defaults are aligned (the four do not imply contradictory models).
|
|
100
|
+
- [ ] When you ship a model-shifting change, you have a mechanism to refresh user models (in-product notification, onboarding update, release note that explains the new model — not just the new behavior).
|
|
101
|
+
- [ ] If you are an agent working in an unfamiliar codebase, you have named the model you are currently operating from and tested at least one prediction against the system before producing substantial code.
|
|
102
|
+
|
|
103
|
+
## Do NOT Use When
|
|
104
|
+
|
|
105
|
+
| Instead of this skill | Use | Why |
|
|
106
|
+
|---|---|---|
|
|
107
|
+
| Building a formal representation (entities, attributes, schemas) | `conceptual-modeling` | conceptual-modeling owns formal-domain representation; mental-models owns the cognitive layer above any formalism |
|
|
108
|
+
| Choosing between graphs, frames, rules as a knowledge representation paradigm | `knowledge-modeling` | knowledge-modeling owns paradigm selection; mental-models is upstream of paradigm choice |
|
|
109
|
+
| Empirically eliciting user mental models through research methods | `user-research` | user-research owns the elicitation methods; this skill owns the conceptual frame the methods serve |
|
|
110
|
+
| Transferring a known mental model to another person | `teaching-patterns` | teaching-patterns owns the transmission discipline; this skill owns the construction discipline |
|
|
111
|
+
| Reasoning about specific cognitive biases (anchoring, framing, recency) | (no direct skill — out of scope) | The cognitive-bias literature is its own discipline; this skill focuses on system-model representation rather than decision-making bias |
|
|
112
|
+
| Choosing the React hook for a state-management decision | `state-management` + library docs | Tactical implementation choice; this skill is the upstream framing |
|
|
113
|
+
| The drawing or notation of a model | diagramming tools / `knowledge-modeling` | The artifact is not the model; this skill applies to the cognitive structure, not its visual representation |
|
|
114
|
+
|
|
115
|
+
## Key Sources
|
|
116
|
+
|
|
117
|
+
- Norman, D. A. (1983). "Some Observations on Mental Models." In Gentner, D. & Stevens, A. L. (Eds.), *Mental Models* (pp. 7–14). Lawrence Erlbaum. The foundational designer's-model / system-image / user's-model framing, and the gulfs of execution and evaluation.
|
|
118
|
+
- Norman, D. A. (2013). *The Design of Everyday Things* (Revised and Expanded Edition). Basic Books. The most accessible exposition of mental-models discipline applied to interface design; the three-model frame and the gulfs are presented in operational form.
|
|
119
|
+
- Johnson-Laird, P. N. (1983). *Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness*. Harvard University Press. The canonical cognitive-science treatment of mental models as the internal structures over which reasoning operates.
|
|
120
|
+
- Gentner, D., & Stevens, A. L. (Eds.). (1983). *Mental Models*. Lawrence Erlbaum. The collection that established the modern study of mental models across multiple domains (physics, calculators, navigation).
|
|
121
|
+
- Senge, P. M. (1990). *The Fifth Discipline: The Art and Practice of the Learning Organization*. Doubleday. The application of mental-models discipline to organizational thinking, with emphasis on surfacing assumptions and challenging "the way things are."
|
|
122
|
+
- Hutchins, E. (1995). *Cognition in the Wild*. MIT Press. The situated-cognition account: mental models are not purely internal but distributed across people, artifacts, and environments.
|
|
123
|
+
- Kahneman, D. (2011). *Thinking, Fast and Slow*. Farrar, Straus and Giroux. Adjacent treatment of the System 1 / System 2 distinction, which intersects with how mental models are constructed (fast, pattern-matched) and challenged (slow, deliberate).
|
|
124
|
+
- Carroll, J. M., & Mack, R. L. (1985). "Metaphor, computing systems, and active learning." *International Journal of Man-Machine Studies, 22*(1), 39–57. Foundational work on the role of metaphor in seeding mental models of computing systems, and on the production paradox (what users say vs what they do).
|
|
125
|
+
- Korzybski, A. (1933). *Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics*. Institute of General Semantics. The origin of "the map is not the territory" — the foundational distinction between representation and represented.
|
|
126
|
+
- Clark, A. (2013). "Whatever next? Predictive brains, situated agents, and the future of cognitive science." *Behavioral and Brain Sciences, 36*(3), 181–204. The predictive-coding view, which reframes mental models as prior expectations the brain continuously tests against incoming evidence.
|
|
@@ -0,0 +1,94 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: merge-queue
|
|
3
|
+
description: "Use when serializing merges across multiple agent branches, resolving conflicts between agent outputs, or cleaning stale task branches. Covers atomic locking, idempotency checks, non-fast-forward handling, and worktree cleanup. Do NOT use for ordinary git operations outside an agent merge queue (use `version-control`)."
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: "Markdown, Git, agent-skill runtimes"
|
|
6
|
+
allowed-tools: Read Grep Bash
|
|
7
|
+
metadata:
|
|
8
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"workflow\",\"category\":\"engineering\",\"domain\":\"engineering/git\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-04-01\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-04-01\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"merge queue\\\\\\\",\\\\\\\"atomic lock\\\\\\\",\\\\\\\"idempotency\\\\\\\",\\\\\\\"no-ff merge\\\\\\\",\\\\\\\"worktree cleanup\\\\\\\",\\\\\\\"agent branch\\\\\\\",\\\\\\\"master merge\\\\\\\"]\",\"triggers\":\"[\\\\\\\"merge-queue\\\\\\\",\\\\\\\"agent-merge\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"version-control\\\\\\\"],\\\\\\\"boundary\\\\\\\":[\\\\\\\"version-control\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":90,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/merge-queue/SKILL.md\"}"
|
|
9
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
10
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
11
|
+
skill_graph_project: Skill Graph
|
|
12
|
+
skill_graph_canonical_skill: skills/merge-queue/SKILL.md
|
|
13
|
+
---
|
|
14
|
+
# Merge Queue (Serialized Commit Control)
|
|
15
|
+
|
|
16
|
+
## Domain Context
|
|
17
|
+
|
|
18
|
+
**What is this skill?** This skill manages the serialized merge queue for agent branches. Covers atomic locking, idempotency checks, non-fast-forward merges, and worktree cleanup. Use when merging multiple agent tasks to master, resolving merge conflicts between agents, or cleaning up stale task branches. Do NOT use for standard git operations (use version-control).
|
|
19
|
+
|
|
20
|
+
## Coverage
|
|
21
|
+
|
|
22
|
+
This skill manages the serialized merge queue for agent branches. The sections below contain the detailed rules, examples, and boundaries for using this skill correctly.
|
|
23
|
+
|
|
24
|
+
## Key Files
|
|
25
|
+
|
|
26
|
+
| File | Purpose |
|
|
27
|
+
|---|---|
|
|
28
|
+
| `scripts/agent/merge-queue.sh` | The primary CLI for managing the queue. |
|
|
29
|
+
| `agent-orchestration/logs/events.jsonl` | Event log where `merge_started` and `merge_completed` events are recorded. |
|
|
30
|
+
## Workflow
|
|
31
|
+
|
|
32
|
+
Use the ordered phases, checklists, and guardrails in the sections below as the canonical workflow for this skill. When multiple subsections describe steps, follow them in the order presented.
|
|
33
|
+
|
|
34
|
+
## Queue Coverage
|
|
35
|
+
|
|
36
|
+
Atomic merge locking (`merge.lock`), branch idempotency checks, `--no-ff` (non-fast-forward) merge policy, automated worktree and branch cleanup, and event emission.
|
|
37
|
+
|
|
38
|
+
> The "Gatekeeper" for the master branch. Prevents agents from causing race conditions during the final commit phase.
|
|
39
|
+
|
|
40
|
+
## Philosophy
|
|
41
|
+
|
|
42
|
+
Merge queues exist to protect the main branch from concurrent merge failures. When multiple agents modify overlapping code simultaneously, merging them independently creates a false sense of safety — each agent's work may pass CI alone but break when combined. A merge queue serializes merges, ensuring every commit is tested against the current state of main plus all queued predecessors before pushing. The result is a main branch that is always in a known-good state, with clear task-level history preserved via non-fast-forward commits. This prevents both silent conflicts (where two agents unwittingly overwrite each other's work) and long debugging sessions trying to untangle which merge introduced a regression.
|
|
43
|
+
|
|
44
|
+
## 1. The Merge Protocol
|
|
45
|
+
|
|
46
|
+
To prevent merge conflicts and history pollution, all agents must submit their finished tasks to the merge queue.
|
|
47
|
+
|
|
48
|
+
| Phase | Action | Tool / Detail |
|
|
49
|
+
|---|---|---|
|
|
50
|
+
| **Lock** | Acquire `merge.lock` | `scripts/agent/merge-queue.sh` (merge subcommand) |
|
|
51
|
+
| **Verify** | Check branch state | Verify branch is up to date with master and passes tests. |
|
|
52
|
+
| **Merge** | Non-FF Merge | Merge with `--no-ff` to preserve task history. |
|
|
53
|
+
| **Cleanup** | Delete branch/worktree | Remove the git worktree and the remote task branch. |
|
|
54
|
+
| **Release** | Release `merge.lock` | Allow the next agent in the queue to proceed. |
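
The five phases above can be sketched as follows. This is an illustration under stated assumptions, not the behavior of `merge-queue.sh`: the lock is modeled as an exclusively created file, and `verify`, `merge`, and `cleanup` are hypothetical callables standing in for the real steps.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def merge_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive lock file for the duration of the block."""
    # O_CREAT | O_EXCL raises FileExistsError if the file already exists,
    # so creating the file doubles as an atomic test-and-set.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # Release phase: runs even if a step fails


def run_merge(lock_path: str, verify: Callable[[], bool],
              merge: Callable[[], None], cleanup: Callable[[], None]) -> List[str]:
    """Run Lock, Verify, Merge, Cleanup, Release in order; return the phase log."""
    log: List[str] = []
    with merge_lock(lock_path):
        log.append("lock")
        if not verify():
            log.append("verify-failed")  # hypothetical policy: skip merge, still release
        else:
            log.append("verify")
            merge()
            log.append("merge")
            cleanup()
            log.append("cleanup")
    log.append("release")
    return log
```

Putting the release in a `finally` clause is the design choice worth noting: a failed verify or merge must not leave the lock behind and stall every later agent. A crash of the whole process would still leave a stale file, which this sketch does not handle.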
|
|
55
|
+
|
|
56
|
+
## 2. Clever Features (Stolen from Merge Queue)
|
|
57
|
+
|
|
58
|
+
- **Atomic Lock**: Only one agent can merge at a time, ensuring that master is always in a known good state.
|
|
59
|
+
- **Idempotency**: If a merge fails halfway, the queue can be resumed without creating duplicate commits.
|
|
60
|
+
- **Automated Cleanup**: Once a merge succeeds, `merge-queue.sh` automatically deletes the task-specific worktree (`/tmp/worktrees/SH-XXXX`) to save disk space.
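
The idempotency property can be sketched like this. It is a minimal illustration, assuming the queue keeps some durable record of merged task IDs; `resume_queue` and the `already_merged` set are hypothetical and say nothing about how `merge-queue.sh` tracks state.

```python
from typing import Callable, Iterable, List, Set


def resume_queue(queue: Iterable[str], already_merged: Set[str],
                 merge: Callable[[str], None]) -> List[str]:
    """Merge each queued task once; skip tasks recorded as merged.

    `already_merged` stands in for whatever durable record the real queue
    keeps (hypothetical here); it is updated only after `merge` succeeds.
    """
    merged_now: List[str] = []
    for task_id in queue:
        if task_id in already_merged:
            continue  # idempotency check: a re-run must not merge twice
        merge(task_id)
        already_merged.add(task_id)
        merged_now.append(task_id)
    return merged_now
```

If `merge` raises partway through, calling `resume_queue` again with the same queue and record picks up at the failed task and leaves earlier tasks untouched.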
|
|
61
|
+
|
|
62
|
+
## 3. Managing the Queue
|
|
63
|
+
|
|
64
|
+
- **Submit Merge**: `bash scripts/agent/merge-queue.sh merge --task SH-XXXX`
|
|
65
|
+
- **Check Status**: `bash scripts/agent/merge-queue.sh status`
|
|
66
|
+
- **Cleanup Manually**: `bash scripts/agent/merge-queue.sh cleanup --task SH-XXXX`
|
|
67
|
+
|
|
68
|
+
## 4. Key Files
|
|
69
|
+
|
|
70
|
+
| File | Purpose |
|
|
71
|
+
|---|---|
|
|
72
|
+
| `scripts/agent/merge-queue.sh` | The primary CLI for managing the queue. |
|
|
73
|
+
| `.git/merge.lock` | (Virtual) The lock file preventing concurrent merges. |
|
|
74
|
+
| `agent-orchestration/logs/events.jsonl` | Event log where `merge_started` and `merge_completed` events are recorded. |
|
|
75
|
+
|
|
76
|
+
## 5. Verification Protocol
|
|
77
|
+
|
|
78
|
+
- **Lock Test**: Try to start two merges simultaneously; the second should wait or fail.
|
|
79
|
+
- **Master Integrity**: Verify that master only contains `--no-ff` merge commits for agent tasks.
|
|
80
|
+
- **Disk Space**: Verify that the `/tmp/worktrees/` directory is pruned after successful merges.
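
The Master Integrity check can be approximated with a small parser. This is a sketch under assumptions: it expects text in the shape printed by `git rev-list --first-parent --parents master` (one commit per line, hash followed by parent hashes), and `non_merge_commits` is a hypothetical helper name.

```python
from typing import List


def non_merge_commits(rev_list_output: str) -> List[str]:
    """Return hashes of first-parent commits that are not merge commits.

    Expects one commit per line: the commit hash followed by its parent
    hashes, as printed by `git rev-list --first-parent --parents <branch>`.
    """
    offenders: List[str] = []
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        commit, parents = fields[0], fields[1:]
        if len(parents) < 2:  # a --no-ff merge records two or more parents
            offenders.append(commit)
    return offenders
```

The root commit has no parents and will be reported, as will any commit made before the queue was adopted, so in practice the input would likely be limited to the range of history the queue is responsible for.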
|
|
81
|
+
|
|
82
|
+
|
|
83
|
+
## Do NOT Use When
|
|
84
|
+
|
|
85
|
+
| Instead of this skill | Use | Why |
|
|
86
|
+
|---|---|---|
|
|
87
|
+
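| Ordinary git operations outside an agent merge queue | `version-control` | version-control owns general git operations; this skill covers only the serialized agent merge queue |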
|
|
88
|
+
|
|
89
|
+
|
|
90
|
+
## Verification
|
|
91
|
+
|
|
92
|
+
After applying this skill, verify:
|
|
93
|
+
- [ ] Changes follow the patterns documented above
|
|
94
|
+
- [ ] No regressions in affected functionality
|
|
@@ -0,0 +1,317 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: methodology
|
|
3
|
+
description: "Use when planning multi-step implementations, designing quality gates, establishing verification protocols, or building agent checklists calibrated to known failure modes. Covers methodology/method/process distinctions, Cleanroom, PSP/TSP, hypothesis-driven development, DMAIC, checklist design, V&V frameworks, EDDOps, quality gates, and PDCA. Do NOT use for code-review verdicts (use `code-review`), behavior-preserving implementation work (use `refactor`), or test strategy (use `testing-strategy`)."
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: "Markdown, Git, agent-skill runtimes"
|
|
6
|
+
allowed-tools: Read Grep Bash
|
|
7
|
+
metadata:
|
|
8
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/method\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-04-01\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-04-01\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"methodology\\\\\\\",\\\\\\\"method\\\\\\\",\\\\\\\"process\\\\\\\",\\\\\\\"formal methods\\\\\\\",\\\\\\\"cleanroom\\\\\\\",\\\\\\\"PSP\\\\\\\",\\\\\\\"TSP\\\\\\\",\\\\\\\"hypothesis driven\\\\\\\",\\\\\\\"DMAIC\\\\\\\",\\\\\\\"PDCA\\\\\\\",\\\\\\\"Deming\\\\\\\",\\\\\\\"quality gates\\\\\\\",\\\\\\\"checklist manifesto\\\\\\\",\\\\\\\"verification validation\\\\\\\",\\\\\\\"DO-178C\\\\\\\",\\\\\\\"V&V\\\\\\\",\\\\\\\"EDDOps\\\\\\\",\\\\\\\"defect prevention\\\\\\\",\\\\\\\"shift left\\\\\\\",\\\\\\\"evidence based\\\\\\\"]\",\"triggers\":\"[\\\\\\\"methodology-skill\\\\\\\"]\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":90,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/methodology/SKILL.md\"}"
|
|
9
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
10
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
11
|
+
skill_graph_project: Skill Graph
|
|
12
|
+
skill_graph_canonical_skill: skills/methodology/SKILL.md
|
|
13
|
+
---
|
|
14
|
+
# Methodology
|
|
15
|
+
|
|
16
|
+
## Domain Context
|
|
17
|
+
|
|
18
|
+
**What is this skill?** Provides the philosophical framework (methodology), specific techniques (methods), and ordered execution sequences (processes) that govern rigorous agent work. Covers the methodology-method-process stack, Cleanroom defect prevention, PSP/TSP measurement discipline, hypothesis-driven development, DMAIC quality management, aviation checklist design (Gawande), V&V frameworks (DO-178C/NASA), EDDOps for LLM agents, quality gates as hard stops, and the PDCA cycle for standardizing successes. Use when planning multi-step implementations, designing quality gates, establishing verification protocols, building agent checklists calibrated to known failure modes, or when an agent needs to understand WHY a process step exists (not just THAT it exists). Do NOT use for the specific behavioral rules of completeness and honesty (use methodical), quality definitions per artifact (use quality-doctrine), or generate-critique-revise loop mechanics (use self-review-pattern).
|
|
19
|
+
|
|
20
|
+
## Coverage
|
|
21
|
+
|
|
22
|
+
This skill covers the foundational frameworks that make rigorous agent work possible: the methodology-method-process three-layer stack (why these steps, not just which steps), Cleanroom defect prevention (quality built in, not inspected in), PSP/TSP measurement discipline (plan, review, post-mortem), hypothesis-driven development (define success criteria before the experiment), DMAIC/PDCA quality management (especially the skipped Control/Act phases), Gawande checklist design (calibrated to known failure modes, not generic), V&V frameworks from safety-critical industries (DO-178C bidirectional traceability, NASA verification standards), EDDOps for LLM agent evaluation (lifecycle coverage, slice-level validation, closed feedback loops), quality gates as hard stops (binary, blocking, measurable), and the 8 codifiable methodology patterns synthesized across all source disciplines.
|
|
23
|
+
|
|
24
|
+
## Philosophy
|
|
25
|
+
|
|
26
|
+
Agents default to executing process without methodology. They apply steps without understanding the principle that makes the steps necessary, which is exactly why they skip steps that feel low-value in the moment. A pilot who knows the B-17 crash story understands why the gust-lock check exists. A pilot who just has a checklist might skip it when running late.
|
|
27
|
+
|
|
28
|
+
The academic distinction is load-bearing: **Methodology** is the philosophical framework — WHY you approach work a certain way. **Method** is a specific technique within the methodology. **Process** is the ordered sequence of steps executing a method. One process can contain multiple methods; one method can serve many processes. But without methodology, the agent has no basis for deciding when to apply extra rigor, when to challenge a process that seems wasteful, or when a shortcut is genuinely safe vs. when it compromises the underlying principle.
|
|
29
|
+
|
|
30
|
+
This skill installs the WHY. The `methodical` skill installs the behavioral rules. Together they prevent both "knows the steps but skips them" (ineptitude) and "follows steps mechanically without understanding when they matter" (cargo culting).
|
|
31
|
+
|
|
32
|
+
## Trio Boundaries
|
|
33
|
+
|
|
34
|
+
methodology provides the *frameworks and principles* behind rigorous execution. It does NOT provide the specific behavioral rules for completeness and honesty — use `methodical` for that. It does NOT define what quality means per artifact type — use `quality-doctrine` for that. It does NOT provide the generate-critique-revise loop mechanics — use `self-review-pattern` for that.
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## 1. The Three-Layer Stack
|
|
39
|
+
|
|
40
|
+
| Layer | What It Is | Scope | Example |
|
|
41
|
+
|---|---|---|---|
|
|
42
|
+
| **Methodology** | The philosophical framework — WHY | Values, assumptions, epistemology | "Defect prevention is cheaper than defect removal" |
|
|
43
|
+
| **Method** | A specific technique — HOW | Technique, approach, rules | "Run a type-check before committing" |
|
|
44
|
+
| **Process** | An ordered sequence — WHEN | Steps, timing, handoffs | "1. Claim task → 2. Implement → 3. Verify → 4. Wrap" |
|
|
45
|
+
|
|
46
|
+
**Operational rule:** Before executing any multi-step task, the agent must be able to articulate:
|
|
47
|
+
1. What quality principle governs this work? (the methodology)
|
|
48
|
+
2. What specific technique is being applied? (the method)
|
|
49
|
+
3. In what order do the steps execute? (the process)
|
|
50
|
+
|
|
51
|
+
If the agent cannot state #1, it is executing process without methodology — which is how skipped steps and silent shortcuts happen.
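
The operational rule can be expressed as a small pre-flight check. A minimal sketch, assuming a hypothetical `TaskPlan` record; the field names mirror the three layers and are not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskPlan:
    """What an agent should be able to state before a multi-step task."""
    methodology: str = ""  # the quality principle (WHY)
    method: str = ""       # the technique (HOW)
    process: List[str] = field(default_factory=list)  # ordered steps (WHEN)

    def missing_layers(self) -> List[str]:
        """Name every layer that has not been articulated."""
        missing = []
        if not self.methodology.strip():
            missing.append("methodology")
        if not self.method.strip():
            missing.append("method")
        if not self.process:
            missing.append("process")
        return missing
```

A plan whose `missing_layers()` includes `"methodology"` is the case the rule warns about: steps exist, but nothing explains why they exist.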
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## 2. Cleanroom: Defect Prevention Over Removal
|
|
56
|
+
|
|
57
|
+
Cleanroom software engineering (Harlan Mills, IBM) derives its name from semiconductor cleanrooms — environments designed to *prevent* contamination, not clean it up after.
|
|
58
|
+
|
|
59
|
+
### The Three Pillars
|
|
60
|
+
|
|
61
|
+
1. **Formal specification before implementation.** Behavior is specified before any code is written. Team reviews verify that design correctly implements specification — before implementation begins.
|
|
62
|
+
2. **Incremental implementation with quality gates.** Each increment is measured against pre-established standards. Failure to pass a gate triggers return to design. Work cannot advance past a failed gate.
|
|
63
|
+
3. **Statistical testing as experiment.** Testing is a designed experiment with defined coverage, not ad-hoc verification.
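
The second pillar, where a failed gate sends work back to design, can be sketched as below. This is an illustration only: `build_incrementally` and the `gate` callable are hypothetical, and a real gate would measure against pre-established standards rather than a simple predicate.

```python
from typing import Callable, List, Tuple

Gate = Callable[[str], bool]


def build_incrementally(increments: List[str], gate: Gate) -> Tuple[List[str], str]:
    """Accept increments in order until one fails its gate.

    Returns (accepted increments, status). On a failed gate the status names
    the increment that must go back to design; nothing after it is attempted.
    """
    accepted: List[str] = []
    for increment in increments:
        if not gate(increment):
            return accepted, f"return-to-design:{increment}"
        accepted.append(increment)
    return accepted, "complete"
```

The early return is deliberate: later increments are never attempted on top of one that failed its gate.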
|
|
64
|
+
|
|
65
|
+
### The Defect Cost Theorem
|
|
66
|
+
|
|
67
|
+
| Phase Detected | Cost Multiplier |
|
|
68
|
+
|---|---|
|
|
69
|
+
| During design | 1x |
|
|
70
|
+
| During code review | 6-10x |
|
|
71
|
+
| During testing | 25-100x |
|
|
72
|
+
| In production | 100x+ |
|
|
73
|
+
|
|
74
|
+
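Source: figures commonly attributed to the IBM Systems Sciences Institute. The original study is difficult to trace, so treat the multipliers as illustrative orders of magnitude rather than measured constants.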
|
|
75
|
+
|
|
76
|
+
**Translation to agent work:** Every pre-implementation check (acceptance criteria declared, existing patterns audited, assumptions externalized) prevents a post-task revision cycle. The PRE-TASK gate is not optional even when the task feels simple — simple tasks fail most often on defects that a pre-task check would have caught, because the agent doesn't bother checking assumptions it believes are obvious.
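
To make the multipliers concrete, here is a small worked calculation. It is a sketch under stated assumptions: the multipliers are the low end of each range in the table, and the defect counts and `base_cost` are invented for illustration.

```python
# Low end of each range from the cost table; illustrative, not measured.
COST_MULTIPLIER = {"design": 1, "code review": 6, "testing": 25, "production": 100}


def total_cost(defects_by_phase: dict, base_cost: float = 1.0) -> float:
    """Sum base_cost * multiplier over the phase where each defect was caught."""
    return sum(base_cost * COST_MULTIPLIER[phase] * count
               for phase, count in defects_by_phase.items())


# Ten hypothetical defects, caught late versus caught early.
late = {"design": 1, "code review": 2, "testing": 4, "production": 3}
early = {"design": 7, "code review": 2, "testing": 1, "production": 0}
```

With these made-up counts, `total_cost(late)` is 413 units and `total_cost(early)` is 44 units for the same ten defects, which is the argument for spending effort on pre-task checks.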
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
## 3. PSP/TSP: Measurement at the Individual Level
|
|
81
|
+
|
|
82
|
+
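The Personal Software Process (PSP; Watts Humphrey, SEI/CMU) applies capability maturity principles to individual engineers; the Team Software Process (TSP) extends the same discipline to teams. An SEI study of 20 TSP projects in 13 organizations found that teams missed their target schedules by an average of only 6%, against Humphrey's estimate that about a third of software projects fail.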
|
|
83
|
+
|
|
84
|
+
### PSP Phase Structure
|
|
85
|
+
|
|
86
|
+
| Phase | What Happens | Agent Equivalent |
|
|
87
|
+
|---|---|---|
|
|
88
|
+
| **Planning** | Estimate using historical data. Never begin without a plan. | Pre-task declaration: scope, steps, risks |
|
|
89
|
+
| **Design** | Specify the solution before implementing | Declare what tokens, states, components will be used |
|
|
90
|
+
| **Design Review** | Personal review of design before coding — using a personal checklist | Self-check: "Have I seen this pattern before? What was the failure mode?" |
|
|
91
|
+
| **Code** | Implement | Implement |
|
|
92
|
+
| **Code Review** | Personal review before compile — same checklist discipline | Post-implementation self-check: all states present? all tokens correct? |
|
|
93
|
+
| **Test** | Verify against planned acceptance criteria | Run verification against pre-declared criteria |
|
|
94
|
+
| **Post-mortem** | Record actuals, calculate defect metrics, update historical data | Wrap findings: document what was discovered, update checklists |
|
|
95
|
+
|
|
96
|
+
### The Defect Philosophy
|
|
97
|
+
|
|
98
|
+
"Errors are usually predictable, so PSP developers can personalize their checklists to target their own common errors." PSP treats defect types as learnable patterns, not random events.
|
|
99
|
+
|
|
100
|
+
**Agent application:** Agent checklists should include known agent failure modes:
|
|
101
|
+
- Missing `aria-label` on icon-only buttons
|
|
102
|
+
- Raw hex color instead of design token
|
|
103
|
+
- Missing empty/error/loading state
|
|
104
|
+
- Wrong heading level for component type
|
|
105
|
+
- `tabular-nums` missing on financial amounts
|
|
106
|
+
- Dark mode not verified
|
|
107
|
+
- Mobile breakpoints not verified
|
|
108
|
+
- Stale doc references after rename/delete
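
Personalizing a checklist from past defects can be sketched in a few lines. This assumes a hypothetical defect log, a flat list of defect-type labels collected in post-mortems; `personal_checklist` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, List


def personal_checklist(defect_log: Iterable[str], top_n: int = 3) -> List[str]:
    """Turn the most frequent past defect types into checklist items."""
    counts = Counter(defect_log)
    return [f"Check: {defect}" for defect, _ in counts.most_common(top_n)]
```

Keeping `top_n` small follows the same reasoning as the rest of this section: the checklist targets the errors this particular agent actually makes, not every conceivable error.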
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
## 4. Hypothesis-Driven Development
|
|
113
|
+
|
|
114
|
+
Every implementation decision is a hypothesis. The structure:
|
|
115
|
+
|
|
116
|
+
```
|
|
117
|
+
We believe [doing this]
|
|
118
|
+
For [this system/component]
|
|
119
|
+
Will achieve [this outcome]
|
|
120
|
+
We will know we are right when [we observe this measurable signal]
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
**The critical discipline:** Define validation criteria BEFORE the experiment, not after observing results. This prevents confirmation bias — verifying against what you hoped to find rather than what is actually there.
|
|
124
|
+
|
|
125
|
+
**Anti-pattern this prevents:** The "fix-by-guessing" loop — making changes until the symptom disappears without understanding the cause. This is treating symptoms rather than diagnosing disease.
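
The hypothesis template can be captured as a record whose success criterion is fixed before any result is seen. A minimal sketch with hypothetical names (`Hypothesis`, `passes`, `evaluate`); the example values in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hypothesis:
    """A change plus the signal that will confirm it, fixed before the run."""
    change: str
    component: str
    expected_outcome: str
    signal: str                      # what will be measured
    passes: Callable[[float], bool]  # success criterion, declared up front

    def evaluate(self, observed: float) -> str:
        """Compare an observation against the pre-declared criterion."""
        verdict = "confirmed" if self.passes(observed) else "refuted"
        return f"{verdict}: {self.signal} = {observed}"
```

For example, a hypothesis built with `signal="p95 latency (ms)"` and `passes=lambda ms: ms < 50` yields `"confirmed: p95 latency (ms) = 42.0"` for an observation of `42.0`. Because the dataclass is frozen, the criterion cannot be reassigned after the observation comes in.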
|
|
126
|
+
|
|
127
|
+
### Evidence-Based Decision Standard
|
|
128
|
+
|
|
129
|
+
A decision is evidence-based when:
|
|
130
|
+
1. The claim is stated explicitly before evidence is gathered
|
|
131
|
+
2. Evidence is gathered by a method independent of the claim-holder's preference
|
|
132
|
+
3. The evidence is compared against the pre-stated success criteria
|
|
133
|
+
4. The decision and evidence are recorded for future reference
|
|
134
|
+
|
|
135
|
+
"Looks good to me" is not evidence. "Should work" is not evidence. A screenshot, a test result, a command output — those are evidence.
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## 5. DMAIC and Deming: Quality Management
|
|
140
|
+
|
|
141
|
+
### DMAIC Applied to Agent Tasks
|
|
142
|
+
|
|
143
|
+
| Phase | Agent Application |
|
|
144
|
+
|---|---|
|
|
145
|
+
| **Define** | State the problem, acceptance criteria, and scope. What does DONE look like? |
|
|
146
|
+
| **Measure** | Gather baseline: current state, existing coverage, current defect rate |
|
|
147
|
+
| **Analyze** | Root cause analysis: WHY does the defect exist? What structural pattern causes it? |
|
|
148
|
+
| **Improve** | Implement the fix, targeting the root cause identified in Analyze |
|
|
149
|
+
| **Control** | Regression test, checklist update, documentation — prevent recurrence |
|
|
150
|
+
|
|
151
|
+
**The Control phase is what agents skip.** Fixing a bug without adding a regression test is a DMAIC failure. The defect will recur.
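
A Control-phase artifact can be as small as one pinned test. The sketch below uses an invented defect: a hypothetical `mean` helper that once raised `ZeroDivisionError` on empty input. The fix alone is Improve; the regression test is Control.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Average of `values`; returns 0.0 for an empty list.

    Hypothetical fix: the earlier version divided by len(values)
    unconditionally and raised ZeroDivisionError on empty input.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_mean_of_empty_list_regression() -> None:
    """Control phase: pins the fixed behaviour so the defect cannot return unnoticed."""
    assert mean([]) == 0.0


def test_mean_of_values() -> None:
    """Guards the ordinary case so the fix did not change normal behaviour."""
    assert mean([2.0, 4.0]) == 3.0
```

Returning `0.0` for empty input is an assumption made for the example; the right behavior depends on the caller, and raising a clearer error would be an equally valid fix to pin.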
|
|
152
|
+
|
|
153
|
+
### Deming's Most Transferable Points
|
|
154
|
+
|
|
155
|
+
**Point 3 — Cease dependence on inspection.** Quality must be designed in, not inspected in at the end. Post-task review is a last resort, not the primary quality mechanism. Methodology that only activates when a reviewer asks is compliance theater.
|
|
156
|
+
|
|
157
|
+
**Point 5 — Improve constantly every process.** Every completed task is a data point. Wrap findings that identify a recurring pattern must update the checklist or skill — not just document the finding.
|
|
158
|
+
|
|
159
|
+
**Point 10 — Eliminate slogans and exhortations.** "Be more careful" and "write better code" are not instructions. Methodology replaces vague exhortations with specific, verifiable steps. "Check all interactive elements have visible focus states by tabbing through the page" is an instruction. "Make it accessible" is not.
|
|
160
|
+
|
|
161
|
+
**Point 14 — Put everybody to work to accomplish the transformation.** Methodology is not a layer added by reviewers — it is internalized by every agent at every step.
|
|
162
|
+
|
|
163
|
+
### PDCA Cycle
|
|
164
|
+
|
|
165
|
+
```
|
|
166
|
+
PLAN → Define the task, declare acceptance criteria, identify risks
|
|
167
|
+
DO → Implement
|
|
168
|
+
CHECK → Verify against acceptance criteria (not against memory of what you built)
|
|
169
|
+
ACT → If passing: standardize the pattern. If failing: return to PLAN.
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
**Critical insight:** The ACT phase applies to SUCCESSES as well as failures. Patterns that work should be standardized — added to checklists, documented in guides, encoded in skills. Wrap findings must document both what was fixed (failure learning) and what worked well (success learning).
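
The cycle, including the ACT phase on success, can be sketched as a loop. This is an illustration under assumptions: `pdca`, its callables, and the `max_cycles` limit are hypothetical, and "standardize" is modeled as appending the successful plan to a list.

```python
from typing import Callable, List, Tuple


def pdca(plan: Callable[[int], str], do: Callable[[str], str],
         check: Callable[[str], bool], max_cycles: int = 3) -> Tuple[bool, List[str]]:
    """Run PLAN, DO, CHECK, ACT until CHECK passes or cycles run out.

    Returns (passed, standardized): on success the plan that worked is
    recorded as a standardized pattern; on failure the loop re-plans.
    """
    standardized: List[str] = []
    for cycle in range(1, max_cycles + 1):
        current_plan = plan(cycle)  # PLAN
        result = do(current_plan)   # DO
        if check(result):           # CHECK against declared criteria
            standardized.append(current_plan)  # ACT: standardize the success
            return True, standardized
        # ACT on failure: fall through and return to PLAN
    return False, standardized
```

The success branch does more than return: it records what worked, which is the part of ACT that tends to be skipped.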
|
|
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
## 6. Aviation Checklists (Gawande)
|
|
177
|
+
|
|
178
|
+
### The Origin
|
|
179
|
+
|
|
180
|
+
1935: The Boeing Model 299, the B-17 prototype, crashed during an Army evaluation flight at Wright Field. The crew took off with the gust locks still engaged. The plane was too complex for one person to manage from memory. The solution was not to hire better pilots. It was to create a checklist.
|
|
181
|
+
|
|
182
|
+
### Two Categories of Failure
|
|
183
|
+
|
|
184
|
+
1. **Ignorance** — we don't know what to do
|
|
185
|
+
2. **Ineptitude** — we know what to do but fail to apply it under pressure
|
|
186
|
+
|
|
187
|
+
Modern professional failures are almost entirely category 2. Agents skip `aria-label` not because they don't know it's required, but because attention is on the primary task when the icon button is written. Checklists prevent ineptitude failures by making minimum necessary steps explicit and resistant to skipping.
|
|
188
|
+
|
|
189
|
+
### Checklist Design Principles

- **Short.** Include only items where skipping causes serious consequences, not every step.
- **Calibrated.** Items should target the specific failure modes of the person or agent using the checklist. Generic checklists catch nothing.
- **Two types.** Do-confirm (work from memory, then confirm) vs. read-do (read each item, then do it). Use read-do for novel or complex tasks and do-confirm for familiar tasks.
- **Point of use.** The checklist must be present at execution time, not in a document somewhere else.

### Why "Being More Careful" Doesn't Work

Agents skip steps because:
- Attention is allocated to the primary task, not peripheral completeness requirements
- Context window pressure creates implicit priority on the immediately visible goal
- There is no external forcing function triggering a stop
- "Close enough" is locally indistinguishable from "correct"

Methodology, not effort, is the solution: structured gates that must be explicitly cleared, in sequence, with evidence.
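
A gate sequence of this kind can be sketched as follows. The names `GateResult` and `run_gates` and the example gates are assumptions made for illustration; the point is only that each gate returns a binary result plus evidence, that a pass without evidence is treated as a failure, and that the first failing gate stops the sequence.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    evidence: str


# A gate is a name plus a check returning (passed, evidence).
Gate = tuple[str, Callable[[], tuple[bool, str]]]


def run_gates(gates: list[Gate]) -> list[GateResult]:
    """Clear gates in order; stop at the first one that fails."""
    results: list[GateResult] = []
    for name, check in gates:
        passed, evidence = check()
        if passed and not evidence.strip():
            # A pass with no evidence is belief, not verification
            passed, evidence = False, "no evidence recorded"
        results.append(GateResult(name, passed, evidence))
        if not passed:
            break  # hard stop: later gates are never evaluated
    return results


# Hypothetical gates for a UI task
gates: list[Gate] = [
    ("pre-task declaration", lambda: (True, "acceptance criteria written in task notes")),
    ("focus states", lambda: (False, "tabbing skipped the icon button")),
    ("release", lambda: (True, "never reached")),
]

results = run_gates(gates)
for r in results:
    print(r.name, "PASS" if r.passed else "FAIL", "-", r.evidence)
```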

---

## 7. Verification and Validation (V&V)

### The Distinction

**Verification:** Are we building the product RIGHT? (Does implementation match specification?)
**Validation:** Are we building the RIGHT product? (Does it meet the user's actual need?)

A component can pass all token checks (verified) but solve the wrong UX problem (not validated). Both must pass.

### DO-178C and NASA V&V

**Bidirectional traceability.** Every requirement is traced to the code implementing it, the test verifying it, and the test results confirming it, and every piece of code and every test traces back to a requirement. No requirement is "done" without the complete trace.
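
A trace check along these lines might look like the sketch below. It is an illustration under stated assumptions, not a DO-178C tool: `trace_gaps`, the requirement IDs, and the mapping shapes (code unit to requirement, test to requirement, test to pass/fail) are all hypothetical.

```python
def trace_gaps(
    requirements: set[str],
    code: dict[str, str],      # code unit -> requirement it implements
    tests: dict[str, str],     # test name -> requirement it verifies
    results: dict[str, bool],  # test name -> observed pass/fail
) -> dict[str, list[str]]:
    """Return the missing trace elements per requirement, plus orphaned artifacts."""
    gaps: dict[str, list[str]] = {}
    # Forward direction: requirement -> code, test, passing result
    for req in sorted(requirements):
        missing = []
        if req not in code.values():
            missing.append("code")
        req_tests = [t for t, r in tests.items() if r == req]
        if not req_tests:
            missing.append("test")
        elif not all(results.get(t) is True for t in req_tests):
            missing.append("passing result")
        if missing:
            gaps[req] = missing
    # Backward direction: code or tests that point at no known requirement
    orphans = sorted(n for n, r in {**code, **tests}.items() if r not in requirements)
    if orphans:
        gaps["<orphans>"] = orphans
    return gaps


gaps = trace_gaps(
    requirements={"REQ-1", "REQ-2"},
    code={"button.tsx": "REQ-1", "legacy.ts": "REQ-9"},
    tests={"test_button_label": "REQ-1"},
    results={"test_button_label": True},
)
print(gaps)
```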

**V&V can account for as much as 70% of total development effort** in safety-critical industries. This is not waste; it is the work. Agents that treat verification as overhead are misallocating effort. Implementation is cheap; ensuring it is correct is expensive and unavoidable.

**Independent verification.** The implementer's mental model contaminates their verification: they verify against what they INTENDED, not what they BUILT. Independent verification catches the gap.

---

## 8. EDDOps: Evaluation-Driven Development for LLM Agents

### Six Quality Drivers (arXiv, 2024)

| Driver | Requirement |
|---|---|
| **Lifecycle coverage** | Evaluation before deployment AND after AND continuously |
| **Metric mix** | End-to-end scores AND step-level checks AND slice-aware checks |
| **System-level anchor** | Evaluate full orchestration, not isolated model behavior |
| **Adaptive evaluation** | Stable baselines + triggered probes when context changes |
| **Closed feedback loops** | Findings must link to recorded changes; no untracked fixes |
| **Human oversight** | Escalate ambiguous or high-impact cases to human judgment |

### The Core Insight

Evaluation is not a terminal checkpoint. It is a governing capability spanning the entire lifecycle. It is not "the last thing before shipping." It is the mechanism by which quality is maintained across every phase.

**Slice-level validation:** After a fix, verify on the same failing case, not pooled aggregates. An improvement in overall scores that masks a regression in the failing case is a methodology failure.
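
The difference between an aggregate check and a slice-level check can be shown with a small sketch. The function names and case IDs here are hypothetical; the example only demonstrates that the pooled pass rate can rise while the case the fix targeted still fails.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Pooled pass rate across all cases."""
    return sum(results.values()) / len(results)


def review_fix(
    before: dict[str, bool],
    after: dict[str, bool],
    target_cases: list[str],
) -> dict[str, object]:
    """Judge a fix on the cases it targeted, not on the pooled aggregate."""
    still_failing = [c for c in target_cases if not after.get(c, False)]
    regressed = [c for c, ok in before.items() if ok and not after.get(c, False)]
    return {
        "aggregate_before": pass_rate(before),
        "aggregate_after": pass_rate(after),
        "still_failing": still_failing,
        "regressed": regressed,
        "accepted": not still_failing and not regressed,
    }


# Hypothetical evaluation runs: the fix was meant to repair case "b"
before = {"a": True, "b": False, "c": False, "d": True}
after = {"a": True, "b": False, "c": True, "d": True}

report = review_fix(before, after, target_cases=["b"])
print(report)
```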

**Traceability mandate:** "All changes must be versioned and linked to originating evidence." For agents: every fix must be traceable to a specific finding. "Cleaned it up" is not a valid change record.
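
One way to make that mandate checkable is to reject any change record that does not point at a known finding. The sketch below assumes a hypothetical `ChangeRecord` shape and finding IDs; it is not a format defined by EDDOps or by this skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    version: str               # version in which the change landed
    summary: str               # what was changed
    finding_id: Optional[str]  # finding that motivated the change


def untraceable(changes: list[ChangeRecord], findings: set[str]) -> list[ChangeRecord]:
    """Return changes that are not linked to a recorded finding."""
    return [c for c in changes if c.finding_id is None or c.finding_id not in findings]


# Hypothetical findings and change log
findings = {"F-12", "F-13"}
changes = [
    ChangeRecord("1.4.1", "Add aria-label to icon button", "F-12"),
    ChangeRecord("1.4.2", "Cleaned it up", None),
    ChangeRecord("1.4.3", "Adjust focus ring contrast", "F-99"),
]

for change in untraceable(changes, findings):
    print("untraceable:", change.version, change.summary)
```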

---

## 9. The 8 Codifiable Patterns

These patterns appear across all source disciplines and translate directly into skill rules:

| # | Pattern | Sources | Rule |
|---|---|---|---|
| 1 | **Declare Before Act** | Cleanroom + PSP + HDD + Gawande | Externalize intent before implementation; catches wrong assumptions |
| 2 | **Quality Gates Are Hard Stops** | Cleanroom + DMAIC + DO-178C | Binary, blocking, measurable; "mostly passes" does not exist |
| 3 | **Defect Prevention at Earliest Phase** | Cleanroom cost theorem + shift-left | Upstream checks cost 1x; downstream costs cascade to 100x |
| 4 | **Evidence Replaces Belief** | HDD + PSP + V&V + EDDOps | Four components: criterion, test, observed result, comparison |
| 5 | **Failure Modes Are Learnable** | PSP personalization + Gawande | Checklists calibrated to known failure modes, not generic |
| 6 | **Methodology Must Be Internalized** | Deming Points 3, 14 | Quality designed in, not inspected in; compliance theater is not methodology |
| 7 | **Independent Verification for High-Stakes** | V&V + EDDOps human oversight | Implementer's mental model contaminates self-verification |
| 8 | **PDCA Standardizes Successes Too** | Deming PDCA ACT phase | Wrap findings encode what worked, not just what failed |
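
Pattern 4 names four components of evidence. A record type like the one below is one possible way to hold them; the `Evidence` class, the `missing_components` helper, and the sample values are assumptions for illustration only. A bare "looks good" carries an observation but no criterion, test, or comparison, so it fails the completeness check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    criterion: str   # what must be true
    test: str        # how it was checked
    observed: str    # what was actually seen
    comparison: str  # observed result measured against the criterion


COMPONENTS = ("criterion", "test", "observed", "comparison")


def missing_components(e: Evidence) -> list[str]:
    """List the evidence components that were left blank."""
    return [name for name in COMPONENTS if not getattr(e, name).strip()]


# Hypothetical records
complete = Evidence(
    criterion="Icon button exposes an accessible name",
    test="Inspected the rendered button in the accessibility tree",
    observed="Name reads 'Close dialog'",
    comparison="Matches criterion: pass",
)
belief = Evidence(criterion="", test="", observed="Looks good to me", comparison="")

print(missing_components(complete))
print(missing_components(belief))
```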

---

## 10. Anti-Patterns This Skill Prohibits

| Anti-Pattern | Source Discipline | Why It Fails |
|---|---|---|
| "Looks good to me" as verification | PSP, V&V, HDD | Not a trace, not evidence, not reproducible |
| Proceeding past a failing gate | Cleanroom, DO-178C | Downstream work compounds the upstream error |
| Testing only the happy path | PSP, Cleanroom, DO-178C | Error states are where failures concentrate in production |
| Fixing symptoms without root cause | DMAIC Analyze, HDD | Defect will recur under slightly different conditions |
| Self-certifying without structured evidence | V&V independence, EDDOps | Mental model contamination |
| Skipping the pre-task declaration "because it's simple" | PSP planning, Gawande | Simple tasks fail on overlooked assumptions MORE often |
| Only learning from failures, not successes | PDCA ACT phase, PSP post-mortem | Success patterns are lost; only error avoidance is encoded |
| Treating verification as overhead | V&V (70% of effort), DMAIC Control | Verification IS the work, not an addition to it |
| Vague completion criteria | DMAIC Define, HDD, DO-178C | Unmeasurable criteria cannot be verified |

---

## Verification

After applying this skill, verify:
- [ ] Agent can articulate the methodology (WHY) behind each process step
- [ ] Quality gates are binary pass/fail with evidence, not "mostly okay"
- [ ] Pre-task declaration existed before implementation began
- [ ] Post-task verification uses evidence, not belief
- [ ] Checklists are calibrated to known agent failure modes
- [ ] PDCA ACT phase captures successes, not just failures
- [ ] Changes are traceable to originating findings (EDDOps traceability)

## Do NOT Use When

| When the task is | Use instead | Why |
|---|---|---|
| Enforcing completeness and honesty rules | `methodical` | methodical has the 10 rules and anti-pattern catalog |
| Defining what "better" means per artifact | `quality-doctrine` | quality-doctrine owns quality definitions |
| Implementing generate-critique-revise loops | `self-review-pattern` | self-review-pattern owns the loop mechanics |
| Sequencing task phases | `task-execution` | task-execution owns the workflow |
| Designing agent governance policies | `agent-governance` | agent-governance owns policy and authority boundaries |

## Key Sources

- Mills, H. (1987). Cleanroom Software Engineering. IEEE Software.
- Humphrey, W. (1995). A Discipline for Software Engineering (PSP). Addison-Wesley.
- Humphrey, W. (2000). Introduction to the Team Software Process. SEI/CMU.
- Gawande, A. (2009). The Checklist Manifesto. Metropolitan Books.
- Boyd, J. (1976). OODA Loop. USAF.
- Deming, W.E. (1986). Out of the Crisis. MIT Press.
- DO-178C. Software Considerations in Airborne Systems and Equipment Certification.
- IEEE 1012. Standard for System, Software, and Hardware Verification and Validation.
- NASA (2014). Verification and Validation. Technical Reports Server.
- EDDOps (2024). Evaluation-Driven Development of LLM Agents. arXiv:2411.13768.
- O'Reilly, B. Hypothesis-Driven Development.
- Thoughtworks. How to Implement Hypothesis-Driven Development.