@skill-graph/cli 0.5.6

Files changed (330)
  1. package/CHANGELOG.md +247 -0
  2. package/LICENSE +200 -0
  3. package/NOTICE +62 -0
  4. package/README.md +398 -0
  5. package/SKILL_GRAPH.md +443 -0
  6. package/bin/skill-graph.js +374 -0
  7. package/docs/ADOPTION.md +117 -0
  8. package/docs/CONFORMANCE.md +66 -0
  9. package/docs/PRIMER.md +384 -0
  10. package/docs/QUICKSTART-30MIN.md +333 -0
  11. package/docs/ROUTING-METRICS.md +120 -0
  12. package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
  13. package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
  14. package/docs/SKILL_AUDIT_LOOP.md +195 -0
  15. package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
  16. package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
  17. package/docs/adr/0001-predicate-set.md +69 -0
  18. package/docs/adr/0002-json-ld-context.md +82 -0
  19. package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
  20. package/docs/adr/0004-persistent-identifiers.md +74 -0
  21. package/docs/adr/0005-freshness-consolidation.md +70 -0
  22. package/docs/adr/0006-revise-predicate-rename.md +105 -0
  23. package/docs/adr/0007-audit-loop-cadence.md +99 -0
  24. package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
  25. package/docs/category-consumers.md +168 -0
  26. package/docs/concept-map.md +194 -0
  27. package/docs/diagrams/drift-states.mmd +21 -0
  28. package/docs/diagrams/manifest-pipeline.mmd +25 -0
  29. package/docs/diagrams/routing-harness.mmd +41 -0
  30. package/docs/diagrams/starter-graph.mmd +53 -0
  31. package/docs/field-decision-guide.md +315 -0
  32. package/docs/field-rationale.md +211 -0
  33. package/docs/field-reference.generated.md +624 -0
  34. package/docs/field-reference.md +1426 -0
  35. package/docs/glossary.md +190 -0
  36. package/docs/head-noun-glossary.md +63 -0
  37. package/docs/images/audit-phases.png +0 -0
  38. package/docs/images/drift-states.png +0 -0
  39. package/docs/images/graded-mode.png +0 -0
  40. package/docs/images/manifest-pipeline.png +0 -0
  41. package/docs/images/routing-harness.png +0 -0
  42. package/docs/images/skill-anatomy.png +0 -0
  43. package/docs/images/starter-graph.png +0 -0
  44. package/docs/images/system-model.png +0 -0
  45. package/docs/integrations/github-actions.md +155 -0
  46. package/docs/manifest-field-mapping.md +443 -0
  47. package/docs/marketplace-publication-queue.generated.md +240 -0
  48. package/docs/marketplace-release-agent-prompt.md +82 -0
  49. package/docs/marketplace-skill-candidate-list.md +272 -0
  50. package/docs/marketplace-syndication.md +222 -0
  51. package/docs/migration-sample-review.md +155 -0
  52. package/docs/migrations/v4-to-v5.md +168 -0
  53. package/docs/migrations/v5-to-v6.md +221 -0
  54. package/docs/name-exceptions.yaml +37 -0
  55. package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
  56. package/docs/plans/multi-root-workspace.md +148 -0
  57. package/docs/plans/scripts-roadmap.md +107 -0
  58. package/docs/plans/v4-schema-bump.md +160 -0
  59. package/docs/plans/wave-2-extraction.md +122 -0
  60. package/docs/positioning-vs-marketplaces.md +175 -0
  61. package/docs/proposals/skill-audit-loop-positioning.md +160 -0
  62. package/docs/quality-doctrine.md +138 -0
  63. package/docs/recommended-skills.md +150 -0
  64. package/docs/research/skill-comprehension-eval-research.md +1830 -0
  65. package/docs/research/skill-retrieval-evidence.md +66 -0
  66. package/docs/skill-metadata-protocol.md +471 -0
  67. package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
  68. package/examples/audits/a11y/findings.md +52 -0
  69. package/examples/audits/a11y/scorecard.md +21 -0
  70. package/examples/audits/a11y/verdict.md +44 -0
  71. package/examples/audits/debugging/findings.md +59 -0
  72. package/examples/audits/debugging/scorecard.md +22 -0
  73. package/examples/audits/debugging/verdict.md +33 -0
  74. package/examples/audits/documentation/findings.md +59 -0
  75. package/examples/audits/documentation/scorecard.md +22 -0
  76. package/examples/audits/documentation/verdict.md +33 -0
  77. package/examples/evals/a11y.json +140 -0
  78. package/examples/evals/api-design.json +52 -0
  79. package/examples/evals/code-review.json +52 -0
  80. package/examples/evals/data-modeling.json +52 -0
  81. package/examples/evals/database-migration.json +52 -0
  82. package/examples/evals/debugging.json +118 -0
  83. package/examples/evals/dependency-architecture.json +52 -0
  84. package/examples/evals/design-system-architecture.json +52 -0
  85. package/examples/evals/error-tracking.json +52 -0
  86. package/examples/evals/event-contract-design.json +52 -0
  87. package/examples/evals/form-ux-architecture.json +52 -0
  88. package/examples/evals/framework-fit-analysis.json +52 -0
  89. package/examples/evals/graph-audit.json +139 -0
  90. package/examples/evals/information-architecture.json +52 -0
  91. package/examples/evals/interaction-feedback.json +52 -0
  92. package/examples/evals/interaction-patterns.json +52 -0
  93. package/examples/evals/layout-composition.json +52 -0
  94. package/examples/evals/lint-overlay.json +117 -0
  95. package/examples/evals/microcopy.json +52 -0
  96. package/examples/evals/observability-modeling.json +52 -0
  97. package/examples/evals/pattern-recognition.json +96 -0
  98. package/examples/evals/performance-engineering.json +52 -0
  99. package/examples/evals/refactor.json +128 -0
  100. package/examples/evals/semiotics.json +52 -0
  101. package/examples/evals/skill-infrastructure.json +96 -0
  102. package/examples/evals/skill-router.json +140 -0
  103. package/examples/evals/skill-router.routing.json +113 -0
  104. package/examples/evals/system-interface-contracts.json +52 -0
  105. package/examples/evals/task-analysis.json +52 -0
  106. package/examples/evals/testing-strategy.json +118 -0
  107. package/examples/evals/type-safety.json +249 -0
  108. package/examples/evals/visual-design-foundations.json +52 -0
  109. package/examples/evals/webhook-integration.json +52 -0
  110. package/examples/exports/a11y.skill-md.md +80 -0
  111. package/examples/exports/debugging.skill-md.md +80 -0
  112. package/examples/exports/refactor.skill-md.md +78 -0
  113. package/examples/exports/testing-strategy.skill-md.md +81 -0
  114. package/examples/projects/markdown-static-site/README.md +115 -0
  115. package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
  116. package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
  117. package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
  118. package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
  119. package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
  120. package/examples/projects/saas-stripe-postgres/README.md +208 -0
  121. package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
  122. package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
  123. package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
  124. package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
  125. package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
  126. package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
  127. package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
  128. package/examples/protocol/skill-metadata-template.md +301 -0
  129. package/examples/protocol/skills.manifest.sample.json +13245 -0
  130. package/examples/skill-metadata-template.md +317 -0
  131. package/examples/skills.manifest.sample.json +13519 -0
  132. package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
  133. package/marketplace/README.md +17 -0
  134. package/marketplace/skills/a11y/SKILL.md +66 -0
  135. package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
  136. package/marketplace/skills/agent-engineering/SKILL.md +386 -0
  137. package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
  138. package/marketplace/skills/ai-native-development/SKILL.md +294 -0
  139. package/marketplace/skills/api-design/SKILL.md +60 -0
  140. package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
  141. package/marketplace/skills/background-jobs/SKILL.md +265 -0
  142. package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
  143. package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
  144. package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
  145. package/marketplace/skills/code-review/SKILL.md +120 -0
  146. package/marketplace/skills/color-system-design/SKILL.md +43 -0
  147. package/marketplace/skills/component-architecture/SKILL.md +126 -0
  148. package/marketplace/skills/compression/SKILL.md +112 -0
  149. package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
  150. package/marketplace/skills/connection-pooling/SKILL.md +105 -0
  151. package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
  152. package/marketplace/skills/content-monitor/SKILL.md +209 -0
  153. package/marketplace/skills/context-engineering/SKILL.md +320 -0
  154. package/marketplace/skills/context-graph/SKILL.md +174 -0
  155. package/marketplace/skills/context-management/SKILL.md +174 -0
  156. package/marketplace/skills/context-window/SKILL.md +239 -0
  157. package/marketplace/skills/contract-testing/SKILL.md +120 -0
  158. package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
  159. package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
  160. package/marketplace/skills/data-modeling/SKILL.md +59 -0
  161. package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
  162. package/marketplace/skills/database-migration/SKILL.md +429 -0
  163. package/marketplace/skills/debugging/SKILL.md +67 -0
  164. package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
  165. package/marketplace/skills/design-module-composition/SKILL.md +43 -0
  166. package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
  167. package/marketplace/skills/design-thinking/SKILL.md +44 -0
  168. package/marketplace/skills/diagnosis/SKILL.md +296 -0
  169. package/marketplace/skills/diff-analysis/SKILL.md +188 -0
  170. package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
  171. package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
  172. package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
  173. package/marketplace/skills/error-boundary/SKILL.md +235 -0
  174. package/marketplace/skills/error-tracking/SKILL.md +261 -0
  175. package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
  176. package/marketplace/skills/evaluation/SKILL.md +113 -0
  177. package/marketplace/skills/event-contract-design/SKILL.md +60 -0
  178. package/marketplace/skills/event-storming/SKILL.md +56 -0
  179. package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
  180. package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
  181. package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
  182. package/marketplace/skills/generative-ui/SKILL.md +118 -0
  183. package/marketplace/skills/graph-audit/SKILL.md +81 -0
  184. package/marketplace/skills/guardrails/SKILL.md +118 -0
  185. package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
  186. package/marketplace/skills/http-semantics/SKILL.md +136 -0
  187. package/marketplace/skills/ideation/SKILL.md +41 -0
  188. package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
  189. package/marketplace/skills/information-architecture/SKILL.md +59 -0
  190. package/marketplace/skills/integration-test-design/SKILL.md +111 -0
  191. package/marketplace/skills/intent-recognition/SKILL.md +136 -0
  192. package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
  193. package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
  194. package/marketplace/skills/journey-mapping/SKILL.md +41 -0
  195. package/marketplace/skills/keywords/SKILL.md +213 -0
  196. package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
  197. package/marketplace/skills/layout-composition/SKILL.md +59 -0
  198. package/marketplace/skills/linguistics/SKILL.md +429 -0
  199. package/marketplace/skills/lint-overlay/SKILL.md +76 -0
  200. package/marketplace/skills/mental-models/SKILL.md +126 -0
  201. package/marketplace/skills/merge-queue/SKILL.md +94 -0
  202. package/marketplace/skills/methodology/SKILL.md +317 -0
  203. package/marketplace/skills/microcopy/SKILL.md +232 -0
  204. package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
  205. package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
  206. package/marketplace/skills/mutation-testing/SKILL.md +112 -0
  207. package/marketplace/skills/naming-conventions/SKILL.md +112 -0
  208. package/marketplace/skills/observability-modeling/SKILL.md +59 -0
  209. package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
  210. package/marketplace/skills/owasp-security/SKILL.md +153 -0
  211. package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
  212. package/marketplace/skills/performance-budgets/SKILL.md +185 -0
  213. package/marketplace/skills/performance-engineering/SKILL.md +58 -0
  214. package/marketplace/skills/performance-testing/SKILL.md +125 -0
  215. package/marketplace/skills/printify/SKILL.md +42 -0
  216. package/marketplace/skills/prioritization/SKILL.md +118 -0
  217. package/marketplace/skills/problem-framing/SKILL.md +41 -0
  218. package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
  219. package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
  220. package/marketplace/skills/prompt-craft/SKILL.md +134 -0
  221. package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
  222. package/marketplace/skills/property-based-testing/SKILL.md +100 -0
  223. package/marketplace/skills/prototyping/SKILL.md +43 -0
  224. package/marketplace/skills/query-optimization/SKILL.md +144 -0
  225. package/marketplace/skills/real-time-updates/SKILL.md +324 -0
  226. package/marketplace/skills/ref-patterns/SKILL.md +284 -0
  227. package/marketplace/skills/refactor/SKILL.md +65 -0
  228. package/marketplace/skills/rendering-models/SKILL.md +142 -0
  229. package/marketplace/skills/replication-patterns/SKILL.md +110 -0
  230. package/marketplace/skills/research-synthesis/SKILL.md +41 -0
  231. package/marketplace/skills/route-handler-design/SKILL.md +347 -0
  232. package/marketplace/skills/schema-evolution/SKILL.md +140 -0
  233. package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
  234. package/marketplace/skills/semantic-center/SKILL.md +194 -0
  235. package/marketplace/skills/semantic-relations/SKILL.md +250 -0
  236. package/marketplace/skills/semantics/SKILL.md +366 -0
  237. package/marketplace/skills/semiotics/SKILL.md +230 -0
  238. package/marketplace/skills/seo-strategy/SKILL.md +260 -0
  239. package/marketplace/skills/server-actions-design/SKILL.md +243 -0
  240. package/marketplace/skills/server-components-design/SKILL.md +190 -0
  241. package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
  242. package/marketplace/skills/shopify/SKILL.md +42 -0
  243. package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
  244. package/marketplace/skills/skill-router/SKILL.md +71 -0
  245. package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
  246. package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
  247. package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
  248. package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
  249. package/marketplace/skills/state-management/SKILL.md +134 -0
  250. package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
  251. package/marketplace/skills/summarization/SKILL.md +156 -0
  252. package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
  253. package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
  254. package/marketplace/skills/task-analysis/SKILL.md +201 -0
  255. package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
  256. package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
  257. package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
  258. package/marketplace/skills/test-driven-development/SKILL.md +96 -0
  259. package/marketplace/skills/testing-strategy/SKILL.md +67 -0
  260. package/marketplace/skills/theme-system-design/SKILL.md +43 -0
  261. package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
  262. package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
  263. package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
  264. package/marketplace/skills/type-safety/SKILL.md +177 -0
  265. package/marketplace/skills/typography-system/SKILL.md +43 -0
  266. package/marketplace/skills/usability-testing/SKILL.md +43 -0
  267. package/marketplace/skills/user-research/SKILL.md +43 -0
  268. package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
  269. package/marketplace/skills/version-control/SKILL.md +233 -0
  270. package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
  271. package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
  272. package/marketplace/skills/webhook-integration/SKILL.md +331 -0
  273. package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
  274. package/package.json +67 -0
  275. package/schemas/manifest.schema.json +811 -0
  276. package/schemas/manifest.v2.schema.json +164 -0
  277. package/schemas/manifest.v3.schema.json +758 -0
  278. package/schemas/manifest.v4.schema.json +755 -0
  279. package/schemas/manifest.v5.schema.json +755 -0
  280. package/schemas/manifest.v6.schema.json +811 -0
  281. package/schemas/skill.context.jsonld +279 -0
  282. package/schemas/skill.schema.json +919 -0
  283. package/schemas/skill.v2.schema.json +201 -0
  284. package/schemas/skill.v3.schema.json +827 -0
  285. package/schemas/skill.v4.schema.json +822 -0
  286. package/schemas/skill.v5.schema.json +830 -0
  287. package/schemas/skill.v6.schema.json +946 -0
  288. package/schemas/vocabulary/keywords.json +180 -0
  289. package/schemas/vocabulary/workspace_tags.json +23 -0
  290. package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
  291. package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
  292. package/scripts/__tests__/test-export-parser-drift.js +149 -0
  293. package/scripts/__tests__/test-marketplace-export.js +114 -0
  294. package/scripts/__tests__/test-router-paths.js +82 -0
  295. package/scripts/__tests__/test-stability-promotion.js +244 -0
  296. package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
  297. package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
  298. package/scripts/backfill-schema-version.js +198 -0
  299. package/scripts/build-field-reference.js +160 -0
  300. package/scripts/build-retrieval-baseline.js +511 -0
  301. package/scripts/check-markdown-links.js +211 -0
  302. package/scripts/check-protocol-consistency.js +979 -0
  303. package/scripts/export-marketplace-skills.js +610 -0
  304. package/scripts/export-skill.js +374 -0
  305. package/scripts/generate-manifest.js +787 -0
  306. package/scripts/lib/alias-contract.js +83 -0
  307. package/scripts/lib/audit-prompt-builder.js +771 -0
  308. package/scripts/lib/mock-grader.js +134 -0
  309. package/scripts/lib/parse-frontmatter.js +429 -0
  310. package/scripts/lib/roots.js +119 -0
  311. package/scripts/lint/check-archetype-sections.js +185 -0
  312. package/scripts/lint/check-category-enum.js +83 -0
  313. package/scripts/lint/check-routing-eval.js +146 -0
  314. package/scripts/lint/check-routing-quality.js +211 -0
  315. package/scripts/lint/check-stability-promotion.js +220 -0
  316. package/scripts/lint/format-code-frame.js +206 -0
  317. package/scripts/marketplace-install.js +125 -0
  318. package/scripts/migrate-category-to-enum.js +169 -0
  319. package/scripts/migrate-skill-v2-to-v3.js +424 -0
  320. package/scripts/migrate-skill-v3-to-v4.js +200 -0
  321. package/scripts/migrate-skill-v5-to-v6.js +304 -0
  322. package/scripts/restructure-by-category.js +85 -0
  323. package/scripts/seed-publication-classification.js +282 -0
  324. package/scripts/skill-audit.js +893 -0
  325. package/scripts/skill-graph-drift.js +483 -0
  326. package/scripts/skill-graph-route.js +766 -0
  327. package/scripts/skill-graph-routing-eval.js +393 -0
  328. package/scripts/skill-lint.js +1317 -0
  329. package/scripts/skill-overlap.js +213 -0
  330. package/scripts/verify-skill-md-export.js +201 -0
@@ -0,0 +1,118 @@
+ {
+ "skill_name": "testing-strategy",
+ "subject": "Verification planning for bug fixes, features, and refactors: test scope, test-level selection, effort-to-risk matching, regression targeting, evidence quality, and failure-case coverage",
+ "adjacent_concepts": ["debugging", "refactor", "lint-overlay"],
+ "grounding_note": "This skill is self-grounding. Its evals validate internal consistency against the SKILL.md body — the skill does not anchor to an external canonical source (e.g., a specific testing-pyramid paper) because the doctrine here is synthesized from industry practice rather than derived from one authoritative text. Evals cite SKILL.md line ranges deliberately; add external truth_sources only if the skill is later narrowed to a specific external framework. Line-range stability is enforced by the `checkEvalTruthSourceRanges` lint check (scripts/skill-lint.js D2) — every edit that moves a cited range out of file bounds fails lint before commit, so drift surfaces immediately rather than silently degrading grader grounding.",
+ "evals": [
+ {
+ "id": 1,
+ "prompt": "An engineer is adding a new feature that involves a pure function with no I/O composed inside a service that calls a payment processor over the network. According to the testing-strategy skill's Test-Level Selection table, which test level applies to which piece and why?",
+ "dimension": "definition",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "code_verification",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:84-91"]
+ },
+ {
+ "id": 2,
+ "prompt": "The skill's Philosophy says \"a test that never fails is noise.\" Explain the mental model behind that claim — why does the skill treat a non-failing test as a cost rather than a free safety net, and what does it imply about coverage-percentage targets?",
+ "dimension": "mental_model",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:76-78"]
+ },
+ {
+ "id": 3,
+ "prompt": "A bug has already landed in production. An engineer wants to use the testing-strategy skill to chase it. Should they? Cite the negative-routing rule that decides this and name the skill that applies to an already-failing behavior.",
+ "dimension": "boundary",
+ "substance": "contradiction-check",
+ "calibration": "semantic",
+ "truth_mode": "code_verification",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:111-116"]
+ },
+ {
+ "id": 4,
+ "prompt": "A team writes a unit test for a function by mocking its only external collaborator. The test passes. According to the testing-strategy's level-selection anti-patterns, what is wrong with this approach and what is the correct fix?",
+ "dimension": "application",
+ "substance": "domain",
+ "calibration": "process",
+ "truth_mode": "process_correctness",
+ "skill_type": "workflow",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:93-98"]
+ },
+ {
+ "id": 5,
+ "prompt": "A bug slipped past unit tests and reached production. The team asks where to put the regression test. According to the testing-strategy skill's Test-Level Selection table, what is the answer and why is adding a new unit test at the same level a poor choice?",
+ "dimension": "purpose",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "critical",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:84-91"]
+ },
+ {
+ "id": 6,
+ "prompt": "A contributor argues that adding a test is always net-positive because more coverage can never be bad. According to the testing-strategy skill, is that correct? Cite the row in the Test-Level Selection table that contradicts this and explain the cost that the argument ignores.",
+ "dimension": "application",
+ "substance": "contradiction-check",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "critical",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:84-91", "skills/testing-strategy/SKILL.md:93-98"]
+ },
+ {
+ "id": 7,
+ "prompt": "A team asks the testing-strategy skill to review whether their architecture diagram correctly describes the service boundaries — there is no pending change, no failure, and no verification target. Should the skill accept? Cite the negative-routing rule.",
+ "dimension": "boundary",
+ "substance": "contradiction-check",
+ "calibration": "semantic",
+ "truth_mode": "code_verification",
+ "skill_type": "concept",
+ "criticality": "normal",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:111-116"]
+ },
+ {
+ "id": 8,
+ "prompt": "A test plan for a new feature states: \"The QA team will manually walk through the checkout flow each week and sign off in Slack.\" According to the testing-strategy skill's Verification checklist, does this satisfy the evidence-quality gate? If not, explain what \"concrete, reproducible verification\" requires and what the correct response to this plan is.",
+ "dimension": "application",
+ "substance": "contradiction-check",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:104-109"]
+ },
+ {
+ "id": 9,
+ "prompt": "A test plan for a new discount-code redemption feature covers the happy path (valid code applied) but has no tests for expired codes, maxed-out codes, or malformed codes. According to the testing-strategy skill's Verification checklist, which gate fails, and what is the prescribed response from the skill when asked to sign off on this plan?",
+ "dimension": "application",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:104-109"]
+ },
+ {
+ "id": 10,
+ "prompt": "A utility function for formatting phone numbers has been stable and unchanged for two years, with no bug history. Its behavior is covered only indirectly by integration tests of the features that call it. A compliance auditor has flagged that the codebase needs explicit test coverage for every public utility used in customer-facing flows, and the engineering team is onboarding five new contributors next quarter who will benefit from clearer specs. According to the testing-strategy skill's Test-Level Selection table — specifically the `No new test` row which reads \"Behavior that is 'obviously correct,' unchanged for a year, no external pressure\" — does the compliance and onboarding context shift the verdict? Apply the qualifier \"no external pressure\" correctly and recommend what the skill would actually prescribe here.",
+ "dimension": "application",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": ["skills/testing-strategy/SKILL.md:84-91"]
+ }
+ ]
+ }
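The `grounding_note` in the file above says that a lint check named `checkEvalTruthSourceRanges` fails any edit that moves a cited `SKILL.md` line range out of file bounds. The following is a minimal sketch of what such a bounds check could look like, not the actual implementation in `scripts/skill-lint.js`. The function names `parseTruthSource` and `findOutOfBoundsRanges`, and the `lineCounts` map passed in place of real file reads, are assumptions made for illustration only.

```javascript
// Hypothetical sketch; not the real checkEvalTruthSourceRanges from scripts/skill-lint.js.

// Parse "path:START-END", "path:LINE", or "path#anchor" into a structured reference.
function parseTruthSource(ref) {
  const range = /^(.+):(\d+)(?:-(\d+))?$/.exec(ref);
  if (range) {
    const start = Number(range[2]);
    const end = range[3] === undefined ? start : Number(range[3]);
    return { path: range[1], start, end };
  }
  const hash = ref.indexOf("#");
  if (hash > 0) {
    return { path: ref.slice(0, hash), anchor: ref.slice(hash + 1) };
  }
  return { path: ref };
}

// Collect every cited line range that falls outside its file.
// lineCounts maps a file path to its total line count (an assumed input;
// a real check would presumably read the files from disk).
function findOutOfBoundsRanges(evalFile, lineCounts) {
  const problems = [];
  for (const evalCase of evalFile.evals) {
    for (const ref of evalCase.truth_sources ?? []) {
      const parsed = parseTruthSource(ref);
      if (parsed.start === undefined) continue; // anchors and bare paths carry no range
      const total = lineCounts[parsed.path];
      if (total === undefined) {
        problems.push({ id: evalCase.id, ref, reason: "unknown file" });
      } else if (parsed.start < 1 || parsed.end < parsed.start || parsed.end > total) {
        problems.push({ id: evalCase.id, ref, reason: "range out of bounds" });
      }
    }
  }
  return problems;
}

// Two cases taken from the eval file above.
const sample = {
  evals: [
    { id: 1, truth_sources: ["skills/testing-strategy/SKILL.md:84-91"] },
    { id: 3, truth_sources: ["skills/testing-strategy/SKILL.md:111-116"] },
  ],
};

// Assume, for illustration, that SKILL.md was trimmed to 100 lines after the evals were written.
const problems = findOutOfBoundsRanges(sample, { "skills/testing-strategy/SKILL.md": 100 });
console.log(problems);
```

Under the assumed 100-line file, only the case citing lines 111-116 is reported. A bounds check of this kind catches ranges that fall off the end of the file, but it cannot tell whether a range that is still in bounds now points at different content.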
@@ -0,0 +1,249 @@
+ {
+ "skill_name": "type-safety",
+ "subject": "Type-safety as a discipline: what static type systems guarantee, the difference between sound and unsound systems, structural vs nominal typing, type narrowing, the runtime boundary problem, the discipline of validating at I/O boundaries and trusting types inside, and the connection from compile-time guarantees to runtime correctness",
+ "adjacent_concepts": ["api-design", "testing-strategy", "data-modeling", "code-review"],
+ "grounding_note": "Truth sources cite line ranges in skills/type-safety/SKILL.md (frontmatter concept block at lines 63-116, body Verification at 254-262, Do NOT Use When at 264-272). Cases are authored against the v4 SKILL.md as of 2026-05-16. Comprehension-dimension extensions (comprehension_dimension, concept_field, transfer, expected_behaviors) are additive — graders that don't consume them treat them as no-ops.",
+ "evals": [
+ {
+ "id": 1,
+ "prompt": "Explain what type safety is, to a product manager who has never written code. Do not use the phrases from the type-safety skill body verbatim.",
+ "dimension": "definition",
+ "comprehension_dimension": "C1",
+ "concept_field": "definition",
+ "transfer": "near",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "high",
+ "truth_sources": [
+ "skills/type-safety/SKILL.md:64",
+ "skills/type-safety/SKILL.md#type-safety"
+ ],
+ "expected_behaviors": [
+ { "id": "names_primary_category", "kind": "positive", "description": "Names compile-time error detection as the category" },
+ { "id": "states_observable_effect", "kind": "positive", "description": "States that types rule out a class of runtime errors" },
+ { "id": "no_verbatim_span", "kind": "negative", "description": "No 6-gram span shared with skills/type-safety/SKILL.md concept.definition or body" },
+ { "id": "no_fabricated_specificity", "kind": "negative", "description": "Does not invent claims about specific languages, line counts, or guarantees not in concept.definition" }
+ ],
+ "expected_reasoning": "A correct answer names the category (compile-time error detection), the effect (rules out a class of runtime errors), the cost (annotation burden up-front), and the benefit (compounding safety). It does NOT verbatim-copy the concept.definition's first sentence, which is the canonical body phrasing."
+ },
+ {
+ "id": 2,
+ "prompt": "A team is migrating a Python service that uses dict[str, Any] for inbound API payloads to a TypeScript rewrite. They're debating between Record<string, unknown> and Record<string, any>. Apply type-safety's mental model primitives to this case. The skill body discusses JSON.parse but does not enumerate this specific Python-to-TypeScript migration scenario.",
+ "dimension": "mental_model",
+ "comprehension_dimension": "C2",
+ "concept_field": "mental_model",
+ "transfer": "far",
+ "substance": "domain",
+ "calibration": "semantic",
+ "truth_mode": "conceptual_correctness_plus_repo_application",
+ "skill_type": "concept",
+ "criticality": "critical",
+ "truth_sources": [
+ "skills/type-safety/SKILL.md:65-78",
+ "skills/type-safety/SKILL.md:208-239",
+ "skills/type-safety/SKILL.md#the-runtime-boundary"
+ ],
+ "expected_behaviors": [
+ { "id": "invokes_runtime_boundary_primitive", "kind": "positive", "description": "Identifies API payloads as crossing the runtime boundary, where type information stops" },
+ { "id": "invokes_narrowing_or_validation_primitive", "kind": "positive", "description": "Invokes either the narrowing primitive (unknown forces narrowing) or the validation primitive (parse at boundary)" },
+ { "id": "distinguishes_any_from_unknown", "kind": "positive", "description": "Names that any opts out of type-checking, unknown forces narrowing" },
+ { "id": "conclusion_consistent_with_skill", "kind": "positive", "description": "Concludes Record<string, unknown> + validator at the boundary preserves type-safety; Record<string, any> silently disables it" },
+ { "id": "scenario_not_in_body", "kind": "negative", "description": "Body does not mention Python-to-TypeScript migration; the answer is far transfer, not retrieval" }
+ ],
+ "expected_reasoning": "API payloads are an I/O boundary; their types are unverified bytes until parsed. unknown forces narrowing (the type-safe answer to 'I don't know what this is yet'); any disables checking entirely. The right answer is Record<string, unknown> at the type level plus a schema validator (Zod, io-ts) at the boundary to produce typed values. Record<string, any> would compile but silently destroy the discipline."
+ },
57
+ {
58
+ "id": 3,
59
+ "prompt": "Why does the discipline of type-safety exist as a thing teams choose to invest in? What did practitioners do before it was widely adopted, and what was specifically broken about that approach?",
60
+ "dimension": "purpose",
61
+ "comprehension_dimension": "C3",
62
+ "concept_field": "purpose",
63
+ "transfer": "near",
64
+ "substance": "domain",
65
+ "calibration": "semantic",
66
+ "truth_mode": "conceptual_correctness_plus_repo_application",
67
+ "skill_type": "concept",
68
+ "criticality": "high",
69
+ "truth_sources": [
70
+ "skills/type-safety/SKILL.md:79-84"
71
+ ],
72
+ "expected_behaviors": [
73
+ { "id": "names_pain_point", "kind": "positive", "description": "Names the pain point — runtime bugs becoming production incidents or silent data corruption" },
74
+ { "id": "names_prior_alternative", "kind": "positive", "description": "Names the prior alternative — dynamic languages relying on tests, docs, and reader vigilance to communicate contracts" },
75
+ { "id": "names_improvement_mechanism", "kind": "positive", "description": "States that types make contracts checkable at compile time and visible at call sites; scales with team size and code size" },
76
+ { "id": "no_fabricated_purpose", "kind": "negative", "description": "Does not add purposes not in the skill — e.g., does not claim types are about performance" }
77
+ ]
78
+ },
79
+ {
80
+ "id": 4,
81
+ "prompt": "We're designing the JSON shape of a new outbound webhook event for the orders service. Apply type-safety to decide the field structure and naming.",
82
+ "dimension": "boundary",
83
+ "comprehension_dimension": "C4",
84
+ "concept_field": "boundary",
85
+ "transfer": "near",
86
+ "substance": "contradiction-check",
87
+ "calibration": "semantic",
88
+ "truth_mode": "code_verification",
89
+ "skill_type": "concept",
90
+ "criticality": "high",
91
+ "truth_sources": [
92
+ "skills/type-safety/SKILL.md:85-94",
93
+ "skills/type-safety/SKILL.md:264-272",
94
+ "skills/type-safety/SKILL.md#do-not-use-when"
95
+ ],
96
+ "expected_behaviors": [
97
+ { "id": "identifies_cross_into_api_design", "kind": "positive", "description": "Recognizes that JSON-shape design of an API surface is api-design, not type-safety" },
98
+ { "id": "names_api_design_as_owner", "kind": "positive", "description": "Names api-design as the correct owner skill" },
99
+ { "id": "explains_mechanism_of_difference", "kind": "positive", "description": "States api-design owns the external surface contract; type-safety owns the discipline of expressing internal program correctness as types" },
100
+ { "id": "no_partial_comply_in_type_safety_voice", "kind": "negative", "description": "Does not provide JSON shape recommendations in type-safety's voice before handing off" }
101
+ ]
102
+ },
103
+ {
104
+ "id": 5,
105
+ "prompt": "Where does Haskell's Generalized Algebraic Data Types (GADTs) feature sit in type-safety's taxonomy? It allows types to depend on values like dependent types but is more constrained. Place it using the skill's taxonomy vocabulary, or explain why it doesn't fit.",
106
+ "dimension": "mental_model",
107
+ "comprehension_dimension": "C5",
108
+ "concept_field": "taxonomy",
109
+ "transfer": "far",
110
+ "substance": "domain",
111
+ "calibration": "semantic",
112
+ "truth_mode": "conceptual_correctness_plus_repo_application",
113
+ "skill_type": "concept",
114
+ "criticality": "normal",
115
+ "truth_sources": [
116
+ "skills/type-safety/SKILL.md:95-103"
117
+ ],
118
+ "expected_behaviors": [
119
+ { "id": "uses_taxonomy_vocab", "kind": "positive", "description": "Uses one or more of the schema's taxonomy relationship types: subset, alternative, prerequisite, composition, specialization" },
120
+ { "id": "places_between_dependent_and_refinement", "kind": "positive", "description": "Places GADTs near dependent and refinement types, recognizing the partial dependency" },
121
+ { "id": "preserves_existing_relationships", "kind": "positive", "description": "Placement does not contradict the relationships among existing taxonomy entries (sound vs unsound, structural vs nominal)" },
122
+ { "id": "admits_imperfect_fit_or_extends", "kind": "positive", "description": "Either admits the fit is imperfect and proposes how to extend, or fits with a clear relationship-type justification" }
123
+ ],
124
+ "expected_reasoning": "GADTs sit between Haskell's vanilla parametric polymorphism and full dependent types — types can be refined by pattern-matching on constructors, but not by arbitrary value expressions. In the skill's taxonomy, the closest fit is 'specialization of sound type systems toward refinement types' or 'composition of pattern-matching with parametric polymorphism'. A correct answer either places GADTs in one of those slots or names the gap in the taxonomy."
125
+ },
126
+ {
127
+ "id": 6,
128
+ "prompt": "Part A: The type-safety skill compares a type system to a building's structural engineering. Use that analogy to explain how `npm audit` differs from type-checking — does the analogy place npm audit as a type-check, as a different kind of check, or outside the scope of the analogy?\n\nPart B: Where does the structural-engineering analogy break down? Give one concrete case where applying the analogy would mislead someone.",
129
+ "dimension": "mental_model",
130
+ "comprehension_dimension": "C6",
131
+ "concept_field": "analogy",
132
+ "transfer": "far",
133
+ "substance": "domain",
134
+ "calibration": "semantic",
135
+ "truth_mode": "conceptual_correctness_plus_repo_application",
136
+ "skill_type": "concept",
137
+ "criticality": "normal",
138
+ "truth_sources": [
139
+ "skills/type-safety/SKILL.md:104-107"
140
+ ],
141
+ "expected_behaviors": [
142
+ { "id": "part_a_distinguishes_npm_audit", "kind": "positive", "description": "Part A: Identifies npm audit as a different class of check (fire-marshal inspection vs structural design) — checking for vulnerabilities found AFTER the build, not for the threat class the design accounted for" },
143
+ { "id": "part_b_names_real_break", "kind": "positive", "description": "Part B: Names a concrete way the analogy misleads — e.g., structural engineering is stamped once vs type-check runs every change; or escape hatches are gated in engineering but not in TypeScript by default" },
144
+ { "id": "preserves_structural_relationships", "kind": "positive", "description": "Part A's mapping preserves the analogy's load-bearing relationship (build-time design verification vs runtime threat tolerance)" },
145
+ { "id": "no_under_claim_on_break", "kind": "negative", "description": "Part B does not refuse to identify a break ('the analogy holds') — the analogy DOES break in identifiable ways and a refusal indicates non-engagement" }
146
+ ]
147
+ },
148
+ {
149
+ "id": 7,
150
+ "prompt": "Our backend team has been told that since we use TypeScript with strict mode enabled, the JSON parsed from our customer API responses is automatically validated at compile time. We've been planning the next quarter's work and someone proposed dropping our Zod schemas to reduce maintenance overhead. Walk me through whether that's a sensible plan.",
151
+ "dimension": "anti_pattern",
152
+ "comprehension_dimension": "C7",
153
+ "concept_field": "misconception",
154
+ "transfer": "near",
155
+ "substance": "contradiction-check",
156
+ "calibration": "semantic",
157
+ "truth_mode": "conceptual_correctness_plus_repo_application",
158
+ "skill_type": "concept",
159
+ "criticality": "critical",
160
+ "truth_sources": [
161
+ "skills/type-safety/SKILL.md:108-115",
162
+ "skills/type-safety/SKILL.md:208-239",
163
+ "skills/type-safety/SKILL.md#the-runtime-boundary"
164
+ ],
165
+ "expected_behaviors": [
166
+ { "id": "flags_misconception_unprompted", "kind": "positive", "description": "Recognizes 'JSON parsed... automatically validated at compile time' as the misconception without being told it's wrong" },
167
+ { "id": "corrects_via_runtime_boundary", "kind": "positive", "description": "Corrects using the runtime boundary primitive — TypeScript types are claims about values; values from network/disk are not types until parsed" },
168
+ { "id": "rejects_dropping_zod", "kind": "positive", "description": "Recommends keeping Zod (or equivalent) for I/O boundary validation; dropping it would silently disable runtime safety" },
169
+ { "id": "explains_mechanism_of_mislead", "kind": "positive", "description": "States that the misconception conflates 'static type assertion' with 'runtime verification' — JSON.parse(input) as User is a claim, not a check" },
170
+ { "id": "no_partial_validation_of_misconception", "kind": "negative", "description": "Does not validate the misconception even partially (e.g., 'mostly true, but...')" }
171
+ ],
172
+ "expected_reasoning": "The misconception is that TypeScript's compile-time checking validates data that arrives at runtime. It does not. TypeScript is unsound by design, and even without escape hatches, it makes no guarantees about values crossing the runtime boundary. JSON.parse(input) as User produces a value whose static type is User and whose actual type is whatever the bytes contained: the static type is a claim, not a verification. Dropping Zod removes the only mechanism that actually validates at runtime; the agent should reject the proposal and explain the mechanism."
173
+ },
174
+ {
175
+ "id": 8,
176
+ "prompt": "Review this TypeScript snippet for type-safety. Snippet (payments.ts): `export async function fetchPaymentMethod(userId: string) { const response = await fetch('/api/users/' + userId + '/payment'); const data: any = await response.json(); return data as PaymentMethod; } function getUserBalance(user: User): number { const cents = user.balance; return cents / 100; } function summarize(records: PaymentRecord[]): string { return records.map(r => r.amount.toFixed(2)).join(', '); }`. What's your assessment?",
177
+ "dimension": "application",
178
+ "comprehension_dimension": "C8",
179
+ "concept_field": null,
180
+ "transfer": "near",
181
+ "substance": "domain",
182
+ "calibration": "semantic",
183
+ "truth_mode": "conceptual_correctness_plus_repo_application",
184
+ "skill_type": "concept",
185
+ "criticality": "high",
186
+ "truth_sources": [
187
+ "skills/type-safety/SKILL.md:254-262",
188
+ "skills/type-safety/SKILL.md#verification"
189
+ ],
190
+ "expected_behaviors": [
191
+ { "id": "invokes_verification_checklist_unprompted", "kind": "positive", "description": "Walks through ≥3 items from the type-safety Verification checklist without being told to" },
192
+ { "id": "flags_any_without_comment", "kind": "positive", "description": "Flags `data: any` on the data variable without justification" },
193
+ { "id": "flags_as_cast_without_comment", "kind": "positive", "description": "Flags `as PaymentMethod` cast on the unvalidated response.json() result" },
194
+ { "id": "flags_missing_runtime_validation", "kind": "positive", "description": "Flags the fetch boundary as missing runtime validation (Zod/io-ts/valibot)" },
195
+ { "id": "anchors_to_line_numbers", "kind": "positive", "description": "References specific lines or function names from the snippet, not abstract restatements" },
196
+ { "id": "no_verdict_without_anchor", "kind": "negative", "description": "Does not provide an overall verdict without anchoring each judgment to a Verification criterion" }
197
+ ],
198
+ "expected_reasoning": "The Verification checklist requires (paraphrased): strict mode on, noUncheckedIndexedAccess on, no `any` without justification comment, no `as Type` without justification comment, every I/O boundary parses with a validator, discriminated unions have exhaustiveness, public APIs have explicit return types. This snippet fails the `any` rule (data: any), the as cast rule (as PaymentMethod), the I/O validation rule (fetch result is cast not parsed), the explicit return type rule on the async function (inferred only). It passes 'no @ts-ignore'. The agent should walk through these items by name and anchor each to the snippet."
199
+ },
200
+ {
201
+ "id": 9,
202
+ "prompt": "Help me design a Postgres schema for storing user-uploaded file metadata. I want to use type-safety to decide which columns should be NOT NULL, which should be enums, and how to express the relationships between the tables.",
203
+ "dimension": "boundary",
204
+ "comprehension_dimension": "C9",
205
+ "concept_field": null,
206
+ "transfer": "near",
207
+ "substance": "contradiction-check",
208
+ "calibration": "semantic",
209
+ "truth_mode": "code_verification",
210
+ "skill_type": "concept",
211
+ "criticality": "high",
212
+ "truth_sources": [
213
+ "skills/type-safety/SKILL.md:264-272",
214
+ "skills/type-safety/SKILL.md#do-not-use-when"
215
+ ],
216
+ "expected_behaviors": [
217
+ { "id": "refuses_schema_design_in_type_safety_voice", "kind": "positive", "description": "Does not produce a column list with NOT NULL or enum suggestions in type-safety's voice" },
218
+ { "id": "names_data_modeling_as_owner", "kind": "positive", "description": "Names data-modeling as the correct owner skill" },
219
+ { "id": "explains_mechanism", "kind": "positive", "description": "States that data-modeling owns persistence shape; type-safety owns in-memory type contracts that consume that shape" },
220
+ { "id": "no_partial_comply", "kind": "negative", "description": "Does not provide partial schema recommendations before refusing" },
221
+ { "id": "no_overrefuse", "kind": "negative", "description": "If the user asks a follow-up about the TypeScript types that consume the schema once data-modeling has produced it, the agent re-engages — does not blanket-refuse all related work" }
222
+ ]
223
+ },
224
+ {
225
+ "id": 10,
226
+ "prompt": "A team has heard that 'type assertions in TypeScript are like runtime casts in C++ — they check the type at runtime and fail if it's wrong.' Walk me through whether this comparison is accurate.",
227
+ "dimension": "anti_pattern",
228
+ "comprehension_dimension": "C7",
229
+ "concept_field": "misconception",
230
+ "transfer": "far",
231
+ "substance": "contradiction-check",
232
+ "calibration": "semantic",
233
+ "truth_mode": "conceptual_correctness_plus_repo_application",
234
+ "skill_type": "concept",
235
+ "criticality": "high",
236
+ "truth_sources": [
237
+ "skills/type-safety/SKILL.md:108-115",
238
+ "skills/type-safety/SKILL.md#any-vs-unknown-vs-never"
239
+ ],
240
+ "expected_behaviors": [
241
+ { "id": "flags_misconception_about_as", "kind": "positive", "description": "Recognizes that TypeScript `as` is NOT a runtime check; C++ dynamic_cast is. The comparison is wrong." },
242
+ { "id": "states_as_compiles_to_nothing", "kind": "positive", "description": "States that `as` compiles to nothing — it is a directive to the type checker only, no runtime code is emitted" },
243
+ { "id": "distinguishes_static_directive_from_runtime_check", "kind": "positive", "description": "Distinguishes a static directive ('trust me, this is the type') from a runtime check (`instanceof`, validator parse)" },
244
+ { "id": "no_partial_validation", "kind": "negative", "description": "Does not say 'yes, with some caveats' — the comparison is structurally wrong, not just imprecise" }
245
+ ],
246
+ "expected_reasoning": "TypeScript `as` is a static directive, not a runtime check. It compiles to nothing. C++ `dynamic_cast` IS a runtime check (it returns nullptr for a failed pointer cast and throws std::bad_cast for a failed reference cast). C-style casts in C++ are closer to TypeScript `as`. A correct answer rejects the comparison: a misused `as` is a silent, unchecked claim, so the apt C++ comparison is a C-style cast, not dynamic_cast. The body's misconception field explicitly names this trap."
247
+ }
248
+ ]
249
+ }
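The runtime-boundary reasoning that evals 2, 7, and 10 above expect can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not code from the type-safety skill: the `User` shape and the names `isUser`, `parseUser`, and `castUser` are hypothetical, and a hand-written type guard stands in for a schema validator such as Zod.

```typescript
// Hypothetical shape used only for illustration; not taken from the skill body.
interface User {
  id: string;
  balance: number;
}

// Type guard: a runtime check that also narrows `unknown` to `User`.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  // Safe after the object check above; fields are still `unknown` until tested.
  const record = value as Record<string, unknown>;
  return typeof record.id === "string" && typeof record.balance === "number";
}

// Parse at the boundary: untyped input in, validated value (or an error) out.
function parseUser(input: string): User {
  const raw: unknown = JSON.parse(input);
  if (!isUser(raw)) {
    throw new Error("payload does not match User");
  }
  return raw;
}

// The unchecked alternative: `as` emits no runtime code, so a bad payload passes through.
function castUser(input: string): User {
  return JSON.parse(input) as User;
}
```

With a payload whose `balance` is a string, `parseUser` throws at the boundary, while `castUser` returns a value that is statically `User` but holds a string, so the failure surfaces later, wherever the field is first used as a number.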
@@ -0,0 +1,52 @@
1
+ {
2
+ "skill_name": "visual-design-foundations",
3
+ "subject": "Visual craft decisions for color, typography, spacing, density, hierarchy, elevation, and motion feel",
4
+ "adjacent_concepts": ["semiotics", "design-system-architecture", "layout-composition", "a11y"],
5
+ "grounding_note": "Truth sources cite the whole SKILL.md file to keep the initial eval surface stable while the new skill settles.",
6
+ "evals": [
7
+ {
8
+ "id": 1,
9
+ "prompt": "A dense internal admin tool feels noisy and hard to scan. The task structure is settled. Which visual-design-foundations method steps apply first?",
10
+ "dimension": "application",
11
+ "substance": "domain",
12
+ "calibration": "process",
13
+ "truth_mode": "process_correctness",
14
+ "skill_type": "concept",
15
+ "criticality": "high",
16
+ "truth_sources": ["skills/visual-design-foundations/SKILL.md"]
17
+ },
18
+ {
19
+ "id": 2,
20
+ "prompt": "A green badge means both 'healthy' and 'cost increased' in a finance UI. Should visual-design-foundations own the meaning problem?",
21
+ "dimension": "boundary",
22
+ "substance": "contradiction-check",
23
+ "calibration": "semantic",
24
+ "truth_mode": "code_verification",
25
+ "skill_type": "concept",
26
+ "criticality": "normal",
27
+ "truth_sources": ["skills/visual-design-foundations/SKILL.md"]
28
+ },
29
+ {
30
+ "id": 3,
31
+ "prompt": "A designer wants raw hex colors wired into reusable component variants. Which boundary decides whether visual-design-foundations or design-system-architecture should own the work?",
32
+ "dimension": "boundary",
33
+ "substance": "contradiction-check",
34
+ "calibration": "semantic",
35
+ "truth_mode": "code_verification",
36
+ "skill_type": "concept",
37
+ "criticality": "normal",
38
+ "truth_sources": ["skills/visual-design-foundations/SKILL.md"]
39
+ },
40
+ {
41
+ "id": 4,
42
+ "prompt": "A product card has oversized headings, inconsistent metadata text, and heavy borders around every element. What verification checks should visual-design-foundations apply?",
43
+ "dimension": "application",
44
+ "substance": "domain",
45
+ "calibration": "semantic",
46
+ "truth_mode": "conceptual_correctness_plus_repo_application",
47
+ "skill_type": "concept",
48
+ "criticality": "normal",
49
+ "truth_sources": ["skills/visual-design-foundations/SKILL.md"]
50
+ }
51
+ ]
52
+ }
@@ -0,0 +1,52 @@
1
+ {
2
+ "skill_name": "webhook-integration",
3
+ "subject": "Inbound third-party webhook handler design for signature verification, idempotency, provider retry contracts, raw payload persistence, quarantine, secret rotation, and PII capture timing",
4
+ "adjacent_concepts": ["event-contract-design", "testing-strategy", "debugging", "owasp-security"],
5
+ "grounding_note": "Truth sources cite the whole SKILL.md file to keep the initial eval surface stable while routing boundaries are tightened.",
6
+ "evals": [
7
+ {
8
+ "id": 1,
9
+ "prompt": "A new third-party webhook handler must verify HMAC signatures on the raw body, dedupe retries, and choose 200 vs 500 behavior. Which skill owns this?",
10
+ "dimension": "application",
11
+ "substance": "domain",
12
+ "calibration": "process",
13
+ "truth_mode": "process_correctness",
14
+ "skill_type": "workflow",
15
+ "criticality": "high",
16
+ "truth_sources": ["skills/webhook-integration/SKILL.md"]
17
+ },
18
+ {
19
+ "id": 2,
20
+ "prompt": "A provider deletes customer data after 30 days and the handler must capture safe canonical data on first delivery. What should webhook-integration decide?",
21
+ "dimension": "application",
22
+ "substance": "domain",
23
+ "calibration": "semantic",
24
+ "truth_mode": "conceptual_correctness_plus_repo_application",
25
+ "skill_type": "workflow",
26
+ "criticality": "high",
27
+ "truth_sources": ["skills/webhook-integration/SKILL.md"]
28
+ },
29
+ {
30
+ "id": 3,
31
+ "prompt": "The product is an outbound webhook publisher that delivers customer-facing events. Should webhook-integration accept the contract design task?",
32
+ "dimension": "boundary",
33
+ "substance": "contradiction-check",
34
+ "calibration": "semantic",
35
+ "truth_mode": "code_verification",
36
+ "skill_type": "workflow",
37
+ "criticality": "normal",
38
+ "truth_sources": ["skills/webhook-integration/SKILL.md"]
39
+ },
40
+ {
41
+ "id": 4,
42
+ "prompt": "The webhook handler is already failing in production and the user wants root-cause debugging. Which boundary should webhook-integration respect?",
43
+ "dimension": "boundary",
44
+ "substance": "contradiction-check",
45
+ "calibration": "semantic",
46
+ "truth_mode": "code_verification",
47
+ "skill_type": "workflow",
48
+ "criticality": "normal",
49
+ "truth_sources": ["skills/webhook-integration/SKILL.md"]
50
+ }
51
+ ]
52
+ }
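The handler decisions named in eval 1 above (verify the HMAC over the raw body, dedupe retries, choose 200 vs 500) can be sketched in TypeScript on Node.js. This is a rough sketch under assumptions, not the webhook-integration skill's implementation: the hex-encoded HMAC-SHA256 signature format, the `id` field on the event, the 400 and 401 status codes, and the in-memory `seenEventIds` set are all hypothetical stand-ins for whatever the provider and storage layer actually define.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed signature format: hex-encoded HMAC-SHA256 of the raw request body.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// In-memory stand-in for a durable idempotency store (assumption for this sketch).
const seenEventIds = new Set<string>();

type HandlerResult = {
  status: 200 | 400 | 401 | 500;
  action: "processed" | "duplicate" | "rejected" | "retry";
};

function handleWebhook(
  rawBody: string,
  signatureHex: string,
  secret: string,
  processEvent: (event: unknown) => void,
): HandlerResult {
  // Verify against the raw bytes before parsing anything.
  if (!verifySignature(rawBody, signatureHex, secret)) {
    return { status: 401, action: "rejected" };
  }
  let event: unknown;
  try {
    event = JSON.parse(rawBody);
  } catch {
    return { status: 400, action: "rejected" };
  }
  const id =
    typeof event === "object" && event !== null
      ? (event as Record<string, unknown>).id
      : undefined;
  if (typeof id !== "string") {
    return { status: 400, action: "rejected" };
  }
  // A retry of an already-handled event gets 200 so the provider stops resending.
  if (seenEventIds.has(id)) {
    return { status: 200, action: "duplicate" };
  }
  try {
    processEvent(event);
  } catch {
    // 500 asks the provider to retry; the id is not recorded, so the retry is processed.
    return { status: 500, action: "retry" };
  }
  seenEventIds.add(id);
  return { status: 200, action: "processed" };
}
```

Recording the event id only after `processEvent` succeeds is one possible design choice: it keeps a failed delivery retryable, at the cost of requiring `processEvent` itself to tolerate being run twice if the process crashes between the two steps.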
@@ -0,0 +1,80 @@
1
+ ---
2
+ name: a11y
3
+ description: "Use when building or reviewing interactive UI, forms, navigation, or dynamic content. Covers semantic HTML, keyboard access, focus management, labeling, state-change announcement, and reduced-motion / high-contrast preferences. Do NOT use for color-palette creation, visual branding, feedback-state staging, or prose reading-level accessibility - those belong to `visual-design-foundations`, `interaction-feedback`, and documentation respectively."
4
+ license: MIT
5
+ compatibility: "Markdown, Git, any web stack"
6
+ allowed-tools: Read Grep
7
+ metadata:
8
+ schema_version: "4"
9
+ version: "1.0.0"
10
+ type: capability
11
+ category: frontend
12
+ scope: portable
13
+ owner: skill-graph-maintainer
14
+ freshness: "2026-04-18"
15
+ drift_check: "{\"last_verified\":\"2026-04-18\"}"
16
+ eval_artifacts: present
17
+ eval_state: passing
18
+ routing_eval: present
19
+ stability: experimental
20
+ keywords: "[\"accessibility\",\"a11y\",\"keyboard navigation\",\"screen reader\",\"focus management\",\"keyboard not working\",\"tab order\",\"missing aria label\",\"screen reader says\",\"reduced motion\",\"high contrast\",\"semantic html\",\"form labels\",\"form fields\",\"aria-label\",\"assistive tech\",\"assistive technology\",\"accessible labels\",\"proper labels\"]"
21
+ triggers: "[\"a11y-skill\"]"
22
+ paths: "[\"**/*.{html,tsx,jsx,vue,svelte}\",\"**/*.css\",\"!**/*.test.{ts,tsx,js,jsx}\",\"!**/dist/**\",\"!**/node_modules/**\"]"
23
+ examples: "[\"this modal is keyboard-trapped — users can't Escape to close it\",\"screen reader doesn't announce when the form validation state changes\",\"add proper labels to these form fields so assistive tech can read them\",\"review this dropdown menu for arrow-key navigation and focus return\"]"
24
+ anti_examples: "[\"rewrite this error message at a 6th-grade reading level\",\"clean up this accessibility code without changing how it behaves\"]"
25
+ relations: "{\"boundary\":[{\"skill\":\"refactor\",\"reason\":\"refactor is behavior-preserving code modification; a11y is observable user-facing behavior\"},{\"skill\":\"documentation\",\"reason\":\"documentation owns prose reading-level and audience fit; a11y owns assistive-tech behavior\"},{\"skill\":\"diagnosis\",\"reason\":\"diagnosis classifies failure symptoms (Logic / Runtime / Performance / etc.) for triage; a11y owns assistive-tech behavior. The phrase 'rewrite this error message...' is a documentation/UX concern, not a diagnosis or a11y concern — diagnosis is named here so the router excludes it from a11y's positive scope.\"},{\"skill\":\"visual-design-foundations\",\"reason\":\"visual-design-foundations owns palette, typography, spacing, and visual craft; a11y owns whether the resulting interaction is perceivable, operable, understandable, and robust\"},{\"skill\":\"interaction-feedback\",\"reason\":\"interaction-feedback owns feedback-state staging; a11y owns whether those state changes are announced and operable\"}],\"related\":[\"interaction-patterns\",\"form-ux-architecture\",\"interaction-feedback\",\"design-system-architecture\"],\"verify_with\":[\"testing-strategy\"]}"
26
+ portability: "{\"readiness\":\"scripted\",\"targets\":[\"skill-md\"]}"
27
+ ---
28
+
29
+ # Accessibility
30
+
31
+ ## Coverage
32
+
33
+ - Semantic HTML: choosing the right primitive elements so structure is meaningful to assistive technology
34
+ - Keyboard access: making every interaction reachable and operable without a pointing device
35
+ - Focus management: keeping focus visible, predictable, and correctly placed after navigation and state changes
36
+ - Labeling and naming: ensuring every interactive element has a programmatic name that matches its visible label
37
+ - State and change announcement: communicating dynamic updates (loading, errors, success) to assistive technology
38
+ - Reduced-motion and high-contrast preferences: respecting user settings that affect interaction perception
39
+
40
+ ## Philosophy
41
+
42
+ Accessible interaction is structural, not cosmetic. It is decided by the primitive you picked, the focus order you wrote, and the label that ships or doesn't — not by the audit that runs after. Teams that treat accessibility as a finishing pass pay for it twice: once in remediation work that was cheaper to avoid, and again when assistive-technology users hit the failure and bounce. The correct default is to build with those users in scope from the first commit, not after the first lawsuit.
43
+
44
+ ## Primitive Selection
45
+
46
+ The single highest-leverage accessibility decision is picking the right HTML primitive before styling. A wrong primitive cannot be rescued by ARIA; the right primitive usually needs no ARIA at all.
47
+
48
+ | User intent | Correct primitive | Wrong primitives (common mistakes) |
49
+ | ------------------------------------ | ----------------------------------------------- | ------------------------------------------------------- |
50
+ | Trigger an action on the same page | `<button type="button">` | `<a href="#">`, `<div onclick>`, `<span role="button">` |
51
+ | Navigate to a different URL | `<a href="…">` | `<button onclick=navigate>`, `<div onclick>` |
52
+ | Group related form controls | `<fieldset>` + `<legend>` | `<div>` with a heading above it |
53
+ | Label a form control | `<label for="…">` (or wrapping `<label>`) | `<div>` text next to the input, `placeholder` only |
54
+ | Show a collapsible section | `<details>` + `<summary>` | `<div>` with JS toggle and no ARIA |
55
+ | Present tabular data | `<table>` + `<th scope="…">` | `<div>` grid, CSS grid with no semantic role |
56
+ | Announce a status change | `<output>` or `role="status"` live region | Toast that only renders visually |
57
+ | Interactive widget not covered above | Native element + tested keyboard + ARIA pattern | Custom `<div>` with ad-hoc `role` and handlers |
58
+
59
+ ### When ARIA is appropriate
60
+
61
+ Use ARIA only when no native primitive fits the interaction, and only when you also ship the keyboard behavior that matches the role. Adding `role="button"` to a `<div>` without Enter/Space handlers is worse than either the correct `<button>` or the plain `<div>` alone, because it announces a control that keyboard users cannot operate.
62
+
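The keyboard behavior that has to accompany `role="button"` can be sketched as follows. This is an illustrative TypeScript sketch, not code from the a11y skill: `KeyLikeEvent`, `activatesButton`, and `onButtonKey` are hypothetical names, and the event interface is a minimal stand-in for the DOM `KeyboardEvent` so the logic runs without a browser. It handles both keys on keydown for brevity, which is a simplification of how native buttons treat Space.

```typescript
// Minimal stand-in for the DOM KeyboardEvent so the sketch runs without a browser.
interface KeyLikeEvent {
  key: string;
  preventDefault(): void;
}

// Keys a button responds to: Enter and Space (KeyboardEvent.key is " " for Space).
function activatesButton(key: string): boolean {
  return key === "Enter" || key === " ";
}

// Hypothetical keydown handler for an element carrying role="button" and tabindex="0".
function onButtonKey(event: KeyLikeEvent, activate: () => void): void {
  if (!activatesButton(event.key)) return;
  // Prevent the default action so Space does not also scroll the page.
  event.preventDefault();
  activate();
}
```

In a real component the same handler would be attached next to the click handler, and the element would also need `tabindex="0"` to be reachable by keyboard at all. A native `<button type="button">` provides all of this without any of the code above.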
63
+ ## Evals
64
+
65
+ This skill ships a comprehension-eval artifact at [`examples/evals/a11y.json`](../../examples/evals/a11y.json). The `Verification` checklist below is the authoring gate for a new interactive component; the eval file is how this skill is graded by `scripts/skill-audit.js --graded`. Do not conflate them — the checklist is for implementers, the eval is for the grader.
66
+
67
+ ## Verification
68
+
69
+ - [ ] Interactive elements use the right semantic primitives
70
+ - [ ] Keyboard-only flows remain usable
71
+ - [ ] Focus is visible and lands in the correct place
72
+ - [ ] Labels and state changes are perceivable
73
+ - [ ] User preferences (reduced motion, high contrast) are respected
74
+
75
+ ## Do NOT Use When
76
+
77
+ | Use instead | When |
78
+ | --------------- | -------------------------------------------------------------------------------------------------------- |
79
+ | `documentation` | The task is prose structure or reading-level clarity, not interaction accessibility |
80
+ | `refactor` | The task is behavior-preserving code cleanup — refactoring does not change what assistive tech perceives |