@skill-graph/cli 0.5.6

Files changed (330)
  1. package/CHANGELOG.md +247 -0
  2. package/LICENSE +200 -0
  3. package/NOTICE +62 -0
  4. package/README.md +398 -0
  5. package/SKILL_GRAPH.md +443 -0
  6. package/bin/skill-graph.js +374 -0
  7. package/docs/ADOPTION.md +117 -0
  8. package/docs/CONFORMANCE.md +66 -0
  9. package/docs/PRIMER.md +384 -0
  10. package/docs/QUICKSTART-30MIN.md +333 -0
  11. package/docs/ROUTING-METRICS.md +120 -0
  12. package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
  13. package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
  14. package/docs/SKILL_AUDIT_LOOP.md +195 -0
  15. package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
  16. package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
  17. package/docs/adr/0001-predicate-set.md +69 -0
  18. package/docs/adr/0002-json-ld-context.md +82 -0
  19. package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
  20. package/docs/adr/0004-persistent-identifiers.md +74 -0
  21. package/docs/adr/0005-freshness-consolidation.md +70 -0
  22. package/docs/adr/0006-revise-predicate-rename.md +105 -0
  23. package/docs/adr/0007-audit-loop-cadence.md +99 -0
  24. package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
  25. package/docs/category-consumers.md +168 -0
  26. package/docs/concept-map.md +194 -0
  27. package/docs/diagrams/drift-states.mmd +21 -0
  28. package/docs/diagrams/manifest-pipeline.mmd +25 -0
  29. package/docs/diagrams/routing-harness.mmd +41 -0
  30. package/docs/diagrams/starter-graph.mmd +53 -0
  31. package/docs/field-decision-guide.md +315 -0
  32. package/docs/field-rationale.md +211 -0
  33. package/docs/field-reference.generated.md +624 -0
  34. package/docs/field-reference.md +1426 -0
  35. package/docs/glossary.md +190 -0
  36. package/docs/head-noun-glossary.md +63 -0
  37. package/docs/images/audit-phases.png +0 -0
  38. package/docs/images/drift-states.png +0 -0
  39. package/docs/images/graded-mode.png +0 -0
  40. package/docs/images/manifest-pipeline.png +0 -0
  41. package/docs/images/routing-harness.png +0 -0
  42. package/docs/images/skill-anatomy.png +0 -0
  43. package/docs/images/starter-graph.png +0 -0
  44. package/docs/images/system-model.png +0 -0
  45. package/docs/integrations/github-actions.md +155 -0
  46. package/docs/manifest-field-mapping.md +443 -0
  47. package/docs/marketplace-publication-queue.generated.md +240 -0
  48. package/docs/marketplace-release-agent-prompt.md +82 -0
  49. package/docs/marketplace-skill-candidate-list.md +272 -0
  50. package/docs/marketplace-syndication.md +222 -0
  51. package/docs/migration-sample-review.md +155 -0
  52. package/docs/migrations/v4-to-v5.md +168 -0
  53. package/docs/migrations/v5-to-v6.md +221 -0
  54. package/docs/name-exceptions.yaml +37 -0
  55. package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
  56. package/docs/plans/multi-root-workspace.md +148 -0
  57. package/docs/plans/scripts-roadmap.md +107 -0
  58. package/docs/plans/v4-schema-bump.md +160 -0
  59. package/docs/plans/wave-2-extraction.md +122 -0
  60. package/docs/positioning-vs-marketplaces.md +175 -0
  61. package/docs/proposals/skill-audit-loop-positioning.md +160 -0
  62. package/docs/quality-doctrine.md +138 -0
  63. package/docs/recommended-skills.md +150 -0
  64. package/docs/research/skill-comprehension-eval-research.md +1830 -0
  65. package/docs/research/skill-retrieval-evidence.md +66 -0
  66. package/docs/skill-metadata-protocol.md +471 -0
  67. package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
  68. package/examples/audits/a11y/findings.md +52 -0
  69. package/examples/audits/a11y/scorecard.md +21 -0
  70. package/examples/audits/a11y/verdict.md +44 -0
  71. package/examples/audits/debugging/findings.md +59 -0
  72. package/examples/audits/debugging/scorecard.md +22 -0
  73. package/examples/audits/debugging/verdict.md +33 -0
  74. package/examples/audits/documentation/findings.md +59 -0
  75. package/examples/audits/documentation/scorecard.md +22 -0
  76. package/examples/audits/documentation/verdict.md +33 -0
  77. package/examples/evals/a11y.json +140 -0
  78. package/examples/evals/api-design.json +52 -0
  79. package/examples/evals/code-review.json +52 -0
  80. package/examples/evals/data-modeling.json +52 -0
  81. package/examples/evals/database-migration.json +52 -0
  82. package/examples/evals/debugging.json +118 -0
  83. package/examples/evals/dependency-architecture.json +52 -0
  84. package/examples/evals/design-system-architecture.json +52 -0
  85. package/examples/evals/error-tracking.json +52 -0
  86. package/examples/evals/event-contract-design.json +52 -0
  87. package/examples/evals/form-ux-architecture.json +52 -0
  88. package/examples/evals/framework-fit-analysis.json +52 -0
  89. package/examples/evals/graph-audit.json +139 -0
  90. package/examples/evals/information-architecture.json +52 -0
  91. package/examples/evals/interaction-feedback.json +52 -0
  92. package/examples/evals/interaction-patterns.json +52 -0
  93. package/examples/evals/layout-composition.json +52 -0
  94. package/examples/evals/lint-overlay.json +117 -0
  95. package/examples/evals/microcopy.json +52 -0
  96. package/examples/evals/observability-modeling.json +52 -0
  97. package/examples/evals/pattern-recognition.json +96 -0
  98. package/examples/evals/performance-engineering.json +52 -0
  99. package/examples/evals/refactor.json +128 -0
  100. package/examples/evals/semiotics.json +52 -0
  101. package/examples/evals/skill-infrastructure.json +96 -0
  102. package/examples/evals/skill-router.json +140 -0
  103. package/examples/evals/skill-router.routing.json +113 -0
  104. package/examples/evals/system-interface-contracts.json +52 -0
  105. package/examples/evals/task-analysis.json +52 -0
  106. package/examples/evals/testing-strategy.json +118 -0
  107. package/examples/evals/type-safety.json +249 -0
  108. package/examples/evals/visual-design-foundations.json +52 -0
  109. package/examples/evals/webhook-integration.json +52 -0
  110. package/examples/exports/a11y.skill-md.md +80 -0
  111. package/examples/exports/debugging.skill-md.md +80 -0
  112. package/examples/exports/refactor.skill-md.md +78 -0
  113. package/examples/exports/testing-strategy.skill-md.md +81 -0
  114. package/examples/projects/markdown-static-site/README.md +115 -0
  115. package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
  116. package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
  117. package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
  118. package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
  119. package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
  120. package/examples/projects/saas-stripe-postgres/README.md +208 -0
  121. package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
  122. package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
  123. package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
  124. package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
  125. package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
  126. package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
  127. package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
  128. package/examples/protocol/skill-metadata-template.md +301 -0
  129. package/examples/protocol/skills.manifest.sample.json +13245 -0
  130. package/examples/skill-metadata-template.md +317 -0
  131. package/examples/skills.manifest.sample.json +13519 -0
  132. package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
  133. package/marketplace/README.md +17 -0
  134. package/marketplace/skills/a11y/SKILL.md +66 -0
  135. package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
  136. package/marketplace/skills/agent-engineering/SKILL.md +386 -0
  137. package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
  138. package/marketplace/skills/ai-native-development/SKILL.md +294 -0
  139. package/marketplace/skills/api-design/SKILL.md +60 -0
  140. package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
  141. package/marketplace/skills/background-jobs/SKILL.md +265 -0
  142. package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
  143. package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
  144. package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
  145. package/marketplace/skills/code-review/SKILL.md +120 -0
  146. package/marketplace/skills/color-system-design/SKILL.md +43 -0
  147. package/marketplace/skills/component-architecture/SKILL.md +126 -0
  148. package/marketplace/skills/compression/SKILL.md +112 -0
  149. package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
  150. package/marketplace/skills/connection-pooling/SKILL.md +105 -0
  151. package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
  152. package/marketplace/skills/content-monitor/SKILL.md +209 -0
  153. package/marketplace/skills/context-engineering/SKILL.md +320 -0
  154. package/marketplace/skills/context-graph/SKILL.md +174 -0
  155. package/marketplace/skills/context-management/SKILL.md +174 -0
  156. package/marketplace/skills/context-window/SKILL.md +239 -0
  157. package/marketplace/skills/contract-testing/SKILL.md +120 -0
  158. package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
  159. package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
  160. package/marketplace/skills/data-modeling/SKILL.md +59 -0
  161. package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
  162. package/marketplace/skills/database-migration/SKILL.md +429 -0
  163. package/marketplace/skills/debugging/SKILL.md +67 -0
  164. package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
  165. package/marketplace/skills/design-module-composition/SKILL.md +43 -0
  166. package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
  167. package/marketplace/skills/design-thinking/SKILL.md +44 -0
  168. package/marketplace/skills/diagnosis/SKILL.md +296 -0
  169. package/marketplace/skills/diff-analysis/SKILL.md +188 -0
  170. package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
  171. package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
  172. package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
  173. package/marketplace/skills/error-boundary/SKILL.md +235 -0
  174. package/marketplace/skills/error-tracking/SKILL.md +261 -0
  175. package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
  176. package/marketplace/skills/evaluation/SKILL.md +113 -0
  177. package/marketplace/skills/event-contract-design/SKILL.md +60 -0
  178. package/marketplace/skills/event-storming/SKILL.md +56 -0
  179. package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
  180. package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
  181. package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
  182. package/marketplace/skills/generative-ui/SKILL.md +118 -0
  183. package/marketplace/skills/graph-audit/SKILL.md +81 -0
  184. package/marketplace/skills/guardrails/SKILL.md +118 -0
  185. package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
  186. package/marketplace/skills/http-semantics/SKILL.md +136 -0
  187. package/marketplace/skills/ideation/SKILL.md +41 -0
  188. package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
  189. package/marketplace/skills/information-architecture/SKILL.md +59 -0
  190. package/marketplace/skills/integration-test-design/SKILL.md +111 -0
  191. package/marketplace/skills/intent-recognition/SKILL.md +136 -0
  192. package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
  193. package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
  194. package/marketplace/skills/journey-mapping/SKILL.md +41 -0
  195. package/marketplace/skills/keywords/SKILL.md +213 -0
  196. package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
  197. package/marketplace/skills/layout-composition/SKILL.md +59 -0
  198. package/marketplace/skills/linguistics/SKILL.md +429 -0
  199. package/marketplace/skills/lint-overlay/SKILL.md +76 -0
  200. package/marketplace/skills/mental-models/SKILL.md +126 -0
  201. package/marketplace/skills/merge-queue/SKILL.md +94 -0
  202. package/marketplace/skills/methodology/SKILL.md +317 -0
  203. package/marketplace/skills/microcopy/SKILL.md +232 -0
  204. package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
  205. package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
  206. package/marketplace/skills/mutation-testing/SKILL.md +112 -0
  207. package/marketplace/skills/naming-conventions/SKILL.md +112 -0
  208. package/marketplace/skills/observability-modeling/SKILL.md +59 -0
  209. package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
  210. package/marketplace/skills/owasp-security/SKILL.md +153 -0
  211. package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
  212. package/marketplace/skills/performance-budgets/SKILL.md +185 -0
  213. package/marketplace/skills/performance-engineering/SKILL.md +58 -0
  214. package/marketplace/skills/performance-testing/SKILL.md +125 -0
  215. package/marketplace/skills/printify/SKILL.md +42 -0
  216. package/marketplace/skills/prioritization/SKILL.md +118 -0
  217. package/marketplace/skills/problem-framing/SKILL.md +41 -0
  218. package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
  219. package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
  220. package/marketplace/skills/prompt-craft/SKILL.md +134 -0
  221. package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
  222. package/marketplace/skills/property-based-testing/SKILL.md +100 -0
  223. package/marketplace/skills/prototyping/SKILL.md +43 -0
  224. package/marketplace/skills/query-optimization/SKILL.md +144 -0
  225. package/marketplace/skills/real-time-updates/SKILL.md +324 -0
  226. package/marketplace/skills/ref-patterns/SKILL.md +284 -0
  227. package/marketplace/skills/refactor/SKILL.md +65 -0
  228. package/marketplace/skills/rendering-models/SKILL.md +142 -0
  229. package/marketplace/skills/replication-patterns/SKILL.md +110 -0
  230. package/marketplace/skills/research-synthesis/SKILL.md +41 -0
  231. package/marketplace/skills/route-handler-design/SKILL.md +347 -0
  232. package/marketplace/skills/schema-evolution/SKILL.md +140 -0
  233. package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
  234. package/marketplace/skills/semantic-center/SKILL.md +194 -0
  235. package/marketplace/skills/semantic-relations/SKILL.md +250 -0
  236. package/marketplace/skills/semantics/SKILL.md +366 -0
  237. package/marketplace/skills/semiotics/SKILL.md +230 -0
  238. package/marketplace/skills/seo-strategy/SKILL.md +260 -0
  239. package/marketplace/skills/server-actions-design/SKILL.md +243 -0
  240. package/marketplace/skills/server-components-design/SKILL.md +190 -0
  241. package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
  242. package/marketplace/skills/shopify/SKILL.md +42 -0
  243. package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
  244. package/marketplace/skills/skill-router/SKILL.md +71 -0
  245. package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
  246. package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
  247. package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
  248. package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
  249. package/marketplace/skills/state-management/SKILL.md +134 -0
  250. package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
  251. package/marketplace/skills/summarization/SKILL.md +156 -0
  252. package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
  253. package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
  254. package/marketplace/skills/task-analysis/SKILL.md +201 -0
  255. package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
  256. package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
  257. package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
  258. package/marketplace/skills/test-driven-development/SKILL.md +96 -0
  259. package/marketplace/skills/testing-strategy/SKILL.md +67 -0
  260. package/marketplace/skills/theme-system-design/SKILL.md +43 -0
  261. package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
  262. package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
  263. package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
  264. package/marketplace/skills/type-safety/SKILL.md +177 -0
  265. package/marketplace/skills/typography-system/SKILL.md +43 -0
  266. package/marketplace/skills/usability-testing/SKILL.md +43 -0
  267. package/marketplace/skills/user-research/SKILL.md +43 -0
  268. package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
  269. package/marketplace/skills/version-control/SKILL.md +233 -0
  270. package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
  271. package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
  272. package/marketplace/skills/webhook-integration/SKILL.md +331 -0
  273. package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
  274. package/package.json +67 -0
  275. package/schemas/manifest.schema.json +811 -0
  276. package/schemas/manifest.v2.schema.json +164 -0
  277. package/schemas/manifest.v3.schema.json +758 -0
  278. package/schemas/manifest.v4.schema.json +755 -0
  279. package/schemas/manifest.v5.schema.json +755 -0
  280. package/schemas/manifest.v6.schema.json +811 -0
  281. package/schemas/skill.context.jsonld +279 -0
  282. package/schemas/skill.schema.json +919 -0
  283. package/schemas/skill.v2.schema.json +201 -0
  284. package/schemas/skill.v3.schema.json +827 -0
  285. package/schemas/skill.v4.schema.json +822 -0
  286. package/schemas/skill.v5.schema.json +830 -0
  287. package/schemas/skill.v6.schema.json +946 -0
  288. package/schemas/vocabulary/keywords.json +180 -0
  289. package/schemas/vocabulary/workspace_tags.json +23 -0
  290. package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
  291. package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
  292. package/scripts/__tests__/test-export-parser-drift.js +149 -0
  293. package/scripts/__tests__/test-marketplace-export.js +114 -0
  294. package/scripts/__tests__/test-router-paths.js +82 -0
  295. package/scripts/__tests__/test-stability-promotion.js +244 -0
  296. package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
  297. package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
  298. package/scripts/backfill-schema-version.js +198 -0
  299. package/scripts/build-field-reference.js +160 -0
  300. package/scripts/build-retrieval-baseline.js +511 -0
  301. package/scripts/check-markdown-links.js +211 -0
  302. package/scripts/check-protocol-consistency.js +979 -0
  303. package/scripts/export-marketplace-skills.js +610 -0
  304. package/scripts/export-skill.js +374 -0
  305. package/scripts/generate-manifest.js +787 -0
  306. package/scripts/lib/alias-contract.js +83 -0
  307. package/scripts/lib/audit-prompt-builder.js +771 -0
  308. package/scripts/lib/mock-grader.js +134 -0
  309. package/scripts/lib/parse-frontmatter.js +429 -0
  310. package/scripts/lib/roots.js +119 -0
  311. package/scripts/lint/check-archetype-sections.js +185 -0
  312. package/scripts/lint/check-category-enum.js +83 -0
  313. package/scripts/lint/check-routing-eval.js +146 -0
  314. package/scripts/lint/check-routing-quality.js +211 -0
  315. package/scripts/lint/check-stability-promotion.js +220 -0
  316. package/scripts/lint/format-code-frame.js +206 -0
  317. package/scripts/marketplace-install.js +125 -0
  318. package/scripts/migrate-category-to-enum.js +169 -0
  319. package/scripts/migrate-skill-v2-to-v3.js +424 -0
  320. package/scripts/migrate-skill-v3-to-v4.js +200 -0
  321. package/scripts/migrate-skill-v5-to-v6.js +304 -0
  322. package/scripts/restructure-by-category.js +85 -0
  323. package/scripts/seed-publication-classification.js +282 -0
  324. package/scripts/skill-audit.js +893 -0
  325. package/scripts/skill-graph-drift.js +483 -0
  326. package/scripts/skill-graph-route.js +766 -0
  327. package/scripts/skill-graph-routing-eval.js +393 -0
  328. package/scripts/skill-lint.js +1317 -0
  329. package/scripts/skill-overlap.js +213 -0
  330. package/scripts/verify-skill-md-export.js +201 -0
@@ -0,0 +1,174 @@
+ ---
+ name: context-management
+ description: "Use when deciding what to load into an active agent session, recovering from context drift, preparing compaction or restart, distilling raw inputs into a working summary, or writing a handoff another agent can resume quickly. Covers intake triage, the six-step context-management loop, working-set shaping, evidence-first loading, drift signals, anti-drift rules, compaction-ready handoffs, and selective rebuild after context loss. Do NOT use for token math (use `context-window`), prompt wording (use `prompt-craft`), persistent memory curation, or multi-graph context architecture (use `context-graph`)."
+ license: MIT
+ compatibility: Runtime-agnostic. The intake-triage / loop / drift / handoff discipline applies to any LLM-coding harness regardless of context window size or compaction implementation.
+ allowed-tools: Read Grep
+ metadata:
+ metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"agent\",\"domain\":\"agent/context\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-06\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-06\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"context management\\\\\\\",\\\\\\\"working set discipline\\\\\\\",\\\\\\\"intake triage four buckets\\\\\\\",\\\\\\\"context drift recovery\\\\\\\",\\\\\\\"context management loop\\\\\\\",\\\\\\\"compaction-ready handoff\\\\\\\",\\\\\\\"distill raw inputs\\\\\\\",\\\\\\\"one active hypothesis\\\\\\\",\\\\\\\"selective context rebuild\\\\\\\",\\\\\\\"lost-thread recovery\\\\\\\",\\\\\\\"active question one sentence\\\\\\\",\\\\\\\"prove or disprove minimum evidence\\\\\\\",\\\\\\\"collapse confirmed facts\\\\\\\",\\\\\\\"drop disproven assumptions\\\\\\\",\\\\\\\"working set shaping\\\\\\\",\\\\\\\"distillation pattern\\\\\\\",\\\\\\\"anti-drift rules\\\\\\\",\\\\\\\"handoff in 30 seconds\\\\\\\"]\",\"examples\":\"[\\\\\\\"the session feels noisy and I'm re-reading the same files — what discipline pulls it back?\\\\\\\",\\\\\\\"the agent keeps citing assumptions that were already disproven — how do I clear them out?\\\\\\\",\\\\\\\"I'm about to compact — what do I need to preserve so the next session resumes correctly?\\\\\\\",\\\\\\\"the thread is lost; what's the recipe for rebuilding only what's needed instead of warming up everything?\\\\\\\",\\\\\\\"I have a 300-line error log and a 600-line component file in context — how do I distill them?\\\\\\\",\\\\\\\"the active question changed three times this session — how do I prevent the old context from steering the new one?\\\\\\\",\\\\\\\"this conversation has 40K tokens of evidence; what should the working set actually contain?\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"calculate 
the per-zone token budget for the 200K context window\\\\\\\",\\\\\\\"improve this prompt template for the grader\\\\\\\",\\\\\\\"curate the persistent memory index file\\\\\\\",\\\\\\\"design the multi-graph architecture for skills + docs + memory\\\\\\\",\\\\\\\"review this AI-generated PR for correctness\\\\\\\",\\\\\\\"why is this skill not routing — fix the keyword config\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"context-graph\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"context-graph maps the static topology — what skills, docs, memory, scripts exist and how they connect; context-management is the live working-set discipline inside one running session\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"prompt-craft\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"prompt-craft is wording and structure of one prompt; context-management is the discipline of what enters, stays in, and exits the session around any prompt\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"context-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"context-engineering is the system-level design (injector quality, failure metrics); context-management is the per-session operating discipline within that system\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"tool-call-strategy decides which tool to call next for the agent's job; context-management decides what context that decision should be made 
against\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"context-engineering\\\\\\\",\\\\\\\"context-graph\\\\\\\",\\\\\\\"tool-call-strategy\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"context-engineering\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":365,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/context-management/SKILL.md\",\"skill_graph_export_description\":\"shortened for Agent Skills 1024-character description limit; canonical source keeps the full routing contract\",\"skill_graph_canonical_description_length\":\"1322\"}"
+ skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
+ skill_graph_protocol: Skill Metadata Protocol v4
+ skill_graph_project: Skill Graph
+ skill_graph_canonical_skill: skills/context-management/SKILL.md
+ ---
+
+ # Context Management
+
+ ## Coverage
+
+ The working discipline that controls what enters, stays in, and exits an active agent session. Intake triage that sorts every candidate context source into a four-bucket classification (must-have / useful soon / durable background / noise) before any large file is read. The six-step context-management loop: state the active question in one sentence, name the minimum evidence needed to answer it, load the cheapest sources first (index → search → narrow file slice), collapse confirmed facts into a checkpoint, drop disproven assumptions from the active thread, re-check whether the question changed before reading more. Working-set shaping rules — what to keep active vs what to push out — and the distillation pattern that converts a 300-line log into a 2-line summary, a whole file into a function name plus slice plus invariant, a long conversation into current-state-blocker-next-step. Drift detection signals (re-reading the same file, ideas changing every turn, search-space unbounded, the agent forgetting what was proven) and the anti-drift rules (one active hypothesis at a time, one primary question, one verification target). The compaction-ready handoff format with five required fields (task / question / proven facts / rejected paths / next step) and the under-thirty-seconds resume test. The selective-rebuild recipe for recovering after the thread is lost.
+
+ ## Philosophy
+
+ Context management is the practical layer between _having_ the right information available somewhere in the workspace and _having_ it active in the agent at the right moment. The goal is _not_ to load more context — it is to keep the smallest working set that still lets the agent act correctly. Without this discipline, agents speculate from stale assumptions, re-read files they already processed, and lose the decision trail at the moment of compaction. Every context slot occupied by noise is a slot unavailable for the evidence that would actually resolve the current question.
+
+ The hardest part is not what to load. It is _what to drop_. Disproven hypotheses, raw logs after the key pattern is extracted, and full files after the needed lines are identified all continue to occupy context until they are _deliberately_ removed. The working set is what the agent is actively reasoning over, not everything it has ever seen.
+
+ ## 1. Outcomes
+
+ | Job | What success looks like | Common failure |
+ | ----------- | ------------------------------------------------- | --------------------------------------------------- |
+ | Intake | Only task-critical context is loaded first | Reading five files before the problem is even named |
+ | Working set | The active context matches the _current_ question | Old assumptions keep steering new work |
+ | Handoff | Another session can resume cleanly | Compaction loses the decision trail |
+ | Recovery | The agent can rebuild the thread quickly | Re-reading everything from scratch |
+
+ ## 2. The Context-Management Loop
+
+ Use this loop whenever a task starts to sprawl or the session feels noisy:
+
+ 1. **Define the active question** in one sentence.
+ 2. **Name the minimum evidence** that would answer it — both the _prove_ set and the _disprove_ set.
+ 3. **Load the cheapest sources first**: index, search result, narrow file slice. Avoid full-file reads until evidence demands the rest of the file.
+ 4. **Collapse confirmed facts** into a short checkpoint that the agent can re-read at any later turn.
+ 5. **Drop stale or disproven assumptions** from the active thread. Disproving an assumption does not delete it from the history; it removes it from the active context.
+ 6. **Re-check whether the active question changed** before reading more. If yes, restart the loop. Do not drag old context along.
+
+ The loop is recursive — every "load more" decision restarts it. The goal is not to never read; it is to read _only when evidence demands it_.
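
The six steps above can be sketched as a small state record. This is an illustration only, not code shipped in the package: `LoopState`, `cheapest_first`, and the `SOURCE_COST` ranking are hypothetical names, and the numeric costs are an assumption that simply encodes the index, search, narrow slice ordering from step 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed cost ranking for step 3. The skill names only the order
# index -> search -> narrow file slice, with full-file reads last.
SOURCE_COST = {"index": 0, "search": 1, "slice": 2, "full_file": 3}


def cheapest_first(source_kinds: list[str]) -> list[str]:
    """Step 3: order candidate source kinds from cheapest to most expensive."""
    return sorted(source_kinds, key=lambda kind: SOURCE_COST[kind])


@dataclass
class LoopState:
    """Hypothetical record of one pass through the loop."""

    question: str  # step 1: the active question, in one sentence
    evidence_needed: list[str] = field(default_factory=list)  # step 2
    checkpoint: list[str] = field(default_factory=list)  # step 4: confirmed facts
    assumptions: set[str] = field(default_factory=set)  # assumptions still in play

    def confirm(self, fact: str) -> None:
        """Step 4: collapse a confirmed fact into the checkpoint."""
        if fact not in self.checkpoint:
            self.checkpoint.append(fact)

    def disprove(self, assumption: str) -> None:
        """Step 5: remove a disproven assumption from the active thread."""
        self.assumptions.discard(assumption)

    def recheck(self, current_question: str) -> LoopState:
        """Step 6: keep the state if the question is unchanged, else restart clean."""
        if current_question == self.question:
            return self
        return LoopState(question=current_question)
```

Returning a fresh, empty state from `recheck` is one reading of "do not drag old context along"; a real harness might instead carry forward facts that are still relevant to the new question.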
+
+ ## 3. Intake Triage — Four Buckets
+
+ Before reading anything large, sort every candidate context source into one of four buckets:
+
+ | Bucket | Load now? | Examples | Rule |
+ | ---------------------- | --------- | ------------------------------------------------------------------------ | ------------------------------------------------------------------ |
+ | **Must-have** | Yes | The user-named file, the failing route, the owning skill | Read first |
+ | **Useful soon** | Maybe | Neighbouring docs, related tests, adjacent skill files | Load only after the active question is stable |
+ | **Durable background** | Rarely | Broad architecture overview, long design guide, full module README | Use the index first; slice narrowly only if the index points there |
+ | **Noise** | No | Adjacent-but-unneeded files, unrelated generated output, one-off scripts | Ignore unless evidence later points there |
+
+ ### Intake rules
+
+ - Start from the user's _concrete_ artefact (file, error, ticket) before loading background.
+ - Prefer an _owning_ doc over a broad repo-wide reference.
+ - Prefer a _search result_ over a full-file read.
65
+ - Prefer _one decisive file slice_ over three speculative reads.
66
+ - If you cannot explain — out loud, in one sentence — _why_ a source is being loaded, do not load it.
67
+
68
+ ## 4. Working-Set Shaping
69
+
70
+ The working set is what the agent is _actively reasoning over_ — not everything it has ever seen.
71
+
72
+ ### Keep these in the working set
73
+
74
+ - The current problem statement
75
+ - The current hypothesis
76
+ - The smallest evidence set that can _prove or disprove_ it
77
+ - The next verification step
78
+
79
+ ### Push these out of the working set
80
+
81
+ - Raw logs after the key pattern is extracted
82
+ - Full files after the needed lines are identified
83
+ - Alternative hypotheses that have already been disproven
84
+ - Background docs once their actionable rule has been distilled
85
+
86
+ ### Distillation pattern
87
+
88
+ | Raw input | Keep in active context instead |
89
+ | ------------------------ | ----------------------------------------------------- |
90
+ | 300-line error log | 2-line summary of the repeating error and its trigger |
91
+ | Whole component file | Function name + line slice + one key invariant |
92
+ | Large skill body | The one rule that changes the current decision |
93
+ | Long conversation thread | Current state, blocker, next step |
94
+ | Multi-page doc | The single section that answers the active question |
95
+
96
+ The distillation is the _artefact_ — write it down, paste it into the active context, then drop the raw input. Distillation that lives only in the agent's "memory" is fragile; distillation written into the conversation is durable.
97
+
98
+ ## 5. Drift Detection
99
+
100
+ Context drift means the session is no longer solving the same problem it started with, _or_ is still reasoning from assumptions that have already been falsified.
101
+
102
+ | Drift signal | What it usually means | Response |
103
+ | --------------------------------------------------------- | -------------------------------------------- | -------------------------------------------------------------------- |
104
+ | Re-reading the same file repeatedly | The active question is vague | Rewrite the question and search narrower |
105
+ | Fix ideas change every turn | No anchored hypothesis | Stop implementing; restate the evidence; pick one hypothesis to test |
106
+ | New files keep getting pulled in | Search space is unbounded | Identify the _first failing boundary_ and stay there |
107
+ | The agent forgets what was already proven | Facts were never collapsed into a checkpoint | Write the checkpoint _before_ continuing |
108
+ | The agent restates earlier conclusions in different words | Active context is too large to scan | Distil and drop the original sources |
109
+
110
+ ### Anti-drift rules
111
+
112
+ - **One active hypothesis** at a time.
113
+ - **One primary question** at a time.
114
+ - **One verification target** at a time.
115
+ - When evidence contradicts the plan, _update the checkpoint_ before moving on. Do not silently abandon the contradicted hypothesis.
116
+
117
+ ## 6. Compaction-Ready Handoffs
118
+
119
+ This skill does not own token math or compaction triggers. It owns _handoff quality_ — the artefact a successor agent reads when the current session compacts, restarts, or hands over.
120
+
121
+ Before any compaction, restart, or handover, preserve five fields:
122
+
123
+ 1. **The active task** (identifier, link, or short description)
124
+ 2. **The current question** (one sentence)
125
+ 3. **The strongest supported hypothesis** (and the evidence supporting it)
126
+ 4. **The evidence already verified** (proven facts the next session should not re-prove)
127
+ 5. **The next concrete step** (the action a successor would take if they had no other context)
128
+
129
+ ### Good handoff format
130
+
131
+ | Field | Example |
132
+ | -------------- | --------------------------------------------------------------------------------------------------------------- |
133
+ | Task | `ABC-123` (a single task ID or link) |
134
+ | Question | "Why does the settings save path ignore org scope?" |
135
+ | Proven facts | "The route uses an unscoped query helper, not the org-scoped one; failure reproduces only for non-owner roles." |
136
+ | Rejected paths | "Not a session bug — auth claims are correct." |
137
+ | Next step | "Patch the org-scoped update path; rerun the failing request." |
138
+
139
+ If the handoff cannot tell another agent _where to start_ in under thirty seconds of reading, it is incomplete. Optimise for the _cold start_, not for the agent that already has full context.
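A minimal sketch of the five-field handoff as a data structure, assuming a Python harness. The `Handoff` class, its field names, and `missing_fields` are hypothetical names chosen for illustration; the skill itself does not prescribe a schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Handoff:
    """The five fields to preserve before compaction, restart, or handover."""

    task: str               # identifier, link, or short description
    question: str           # the current question, in one sentence
    hypothesis: str         # strongest supported hypothesis and its evidence
    verified_evidence: str  # proven facts the next session should not re-prove
    next_step: str          # the action a successor would take with no other context

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        """A handoff counts as complete only when every field carries content."""
        return not self.missing_fields()
```

Checking `is_complete()` before compaction is one way to catch a handoff that would leave a successor without a starting point; it cannot judge whether the content is actually useful.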
140
+
141
+ ## 7. Recovery After Lost Context
142
+
143
+ When the thread is lost — after compaction, after a session restart, after a long context-switching gap — rebuild _selectively_, in this order:
144
+
145
+ 1. **Re-read the user request** (the canonical task statement, not your own paraphrase).
146
+ 2. **Re-read the last checkpoint** (or continuation artefact, or the most recent agent-written summary).
147
+ 3. **Re-open only the files that directly support the current question** — not "everything that might be relevant."
148
+ 4. **Reconstruct the current hypothesis from evidence**, not from memory of what you used to think.
149
+ 5. **Resume from the next unverified step.**
150
+
151
+ Do not "warm up" by re-reading everything. Recovery is a _selective_ rebuild, not a context flood. Loading 30 files to "get back into the task" recreates the original drift on a fresh canvas.
152
+
153
+ ## Verification
154
+
155
+ - [ ] I can state the active question in one sentence
156
+ - [ ] I know — and could explain — why each currently loaded source is in context
157
+ - [ ] I have reduced raw inputs into a smaller written working summary that lives in the conversation, not in the agent's memory
158
+ - [ ] I have removed or ignored disproven hypotheses from the active thread
159
+ - [ ] I have a handoff-ready checkpoint with all five required fields before any compaction or session restart
160
+ - [ ] I am loading new context because _evidence demands it_, not because I feel uncertain
161
+ - [ ] One active hypothesis, one primary question, one verification target — no exceptions
162
+ - [ ] A successor agent could resume from my checkpoint in under thirty seconds
163
+
164
+ ## Do NOT Use When
165
+
166
+ | Use instead | When |
167
+ | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
168
+ | A context-window budget skill | Calculating per-zone token budgets, compaction thresholds, or 1M-vs-200K decisions |
169
+ | `prompt-craft` | Writing or improving a prompt template — wording, structure, format constraints |
170
+ | A persistent-memory curation skill | Curating cross-session memory files, pruning the memory index, deciding what _survives_ a session |
171
+ | `context-graph` | Designing the architectural model for the multi-graph context system itself |
172
+ | `context-engineering` | Designing the system-level information architecture, injector quality, and failure metrics — context-engineering is upstream of this skill |
173
+ | `code-review` | Reviewing AI-generated code — orthogonal concern |
174
+ | `tool-call-strategy` | The dispatch decision (which tool to call next) — context-management is the _input_ to that decision, not the decision itself |
@@ -0,0 +1,239 @@
1
+ ---
2
+ name: context-window
3
+ description: "Use when allocating context-window budget across system, skill-injection, working, and output zones; monitoring context health; deciding when to compact; preserving state before compaction; recovering after compaction; or choosing strategies for 1M, 200K, or 128K context windows. Covers zone budgets, practical model-budget tables, the 80% compaction rule, pre/post-compact protocols, persistence hierarchy, operation token costs, and token-reduction techniques. Do NOT use for deciding what information belongs in the working set (use `context-management`), prompt design (use `prompt-craft`), graph architecture (use `context-graph`), or memory curation."
4
+ license: MIT
5
+ compatibility: "Provider-agnostic. The zone model, 80% rule, persistence hierarchy, and token-reduction techniques apply across Anthropic, OpenAI, Google, and open-weight contexts of any size. Specific token figures are illustrative — substitute the figures of the model you actually run."
6
+ allowed-tools: Read Grep
7
+ metadata:
8
+ metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"agent\",\"domain\":\"agent/context\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-06\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-06\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"context window management\\\\\\\",\\\\\\\"context budget allocation\\\\\\\",\\\\\\\"80% compaction rule\\\\\\\",\\\\\\\"context health states\\\\\\\",\\\\\\\"pre-compact hook\\\\\\\",\\\\\\\"post-compact recovery\\\\\\\",\\\\\\\"cross-session persistence hierarchy\\\\\\\",\\\\\\\"token consumption per operation\\\\\\\",\\\\\\\"deterministic cli vs mcp tool result tokens\\\\\\\",\\\\\\\"targeted file read offset limit\\\\\\\",\\\\\\\"progressive skill disclosure\\\\\\\",\\\\\\\"grep before reading files\\\\\\\",\\\\\\\"1M context window strategy\\\\\\\",\\\\\\\"200K context window strategy\\\\\\\",\\\\\\\"128K context window strategy\\\\\\\",\\\\\\\"checkpoint before compact\\\\\\\",\\\\\\\"continuation signal\\\\\\\",\\\\\\\"what survives compaction\\\\\\\"]\",\"examples\":\"[\\\\\\\"the agent's tool results are starting to truncate — what state are we in and what should I do next?\\\\\\\",\\\\\\\"I have a 1M-context model — does that mean I can ignore budget management?\\\\\\\",\\\\\\\"the session is at 75% context — should I compact now or finish the current operation first?\\\\\\\",\\\\\\\"I just compacted and lost the decision trail; what should the pre-compact hook have preserved?\\\\\\\",\\\\\\\"the agent reads 5 files looking for a function and burns 100K tokens — what's the right pattern?\\\\\\\",\\\\\\\"I'm running on a 128K-context model — what's the per-task budget I can plan against?\\\\\\\",\\\\\\\"what survives compaction and what doesn't, ranked from most to least durable?\\\\\\\",\\\\\\\"the skill payload is 30K 
and I haven't even read a file yet — how do I shrink it?\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"decide what context to load or drop in the working set\\\\\\\",\\\\\\\"design the multi-graph architecture for skills + docs + memory\\\\\\\",\\\\\\\"improve the prompt template the agent uses\\\\\\\",\\\\\\\"curate the durable memory index across sessions\\\\\\\",\\\\\\\"which skill should activate for this query\\\\\\\",\\\\\\\"review this AI-generated PR for correctness\\\\\\\",\\\\\\\"the README has drifted from the actual CLI flags — which wins?\\\\\\\",\\\\\\\"the docs have drifted from the code — which is canonical?\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"context-management\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"context-management decides what to load and drop in the working set; context-window is the budget math underneath that decides how much fits and when to compact\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"context-graph\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"context-graph maps the static topology of skills / docs / memory; context-window is the runtime budget for the part of that topology that is actually loaded\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"tool-call-strategy decides which tool to invoke; context-window decides how much budget the result of that tool is allowed to occupy\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"prompt-craft\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"prompt-craft is wording / structure of one prompt; context-window is the per-session budget the prompt and its results live 
within\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"context-management\\\\\\\",\\\\\\\"context-graph\\\\\\\",\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"context-engineering\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"context-management\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":180,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/context-window/SKILL.md\",\"skill_graph_export_description\":\"shortened for Agent Skills 1024-character description limit; canonical source keeps the full routing contract\",\"skill_graph_canonical_description_length\":\"1352\"}"
9
+ skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
10
+ skill_graph_protocol: Skill Metadata Protocol v4
11
+ skill_graph_project: Skill Graph
12
+ skill_graph_canonical_skill: skills/context-window/SKILL.md
13
+ ---
14
+
15
+ # Context Window
16
+
17
+ ## Coverage
18
+
19
+ The quantitative discipline behind an agent's working memory. Allocates the context-window budget across three zones: System (system prompt, rules, tool schemas), Skill Injection (the SKILL.md files auto-loaded for the current task), and Working (conversation, tool results, file contents, agent output). Names the three context health states — `ok` (< 60% used), `compact` (60–80%), `exhausted` (> 80%) — and the **80% compaction rule** that compaction must always trigger before the budget is fully consumed, leaving 20% as the safety margin for finishing the current operation, writing the checkpoint, running the closeout protocol, and emitting the continuation signal. Specifies the pre-compact protocol (commit uncommitted changes, write the continuation signal, update the checkpoint, save state that cannot be re-derived from git or disk) and the post-compact recovery flow (re-injection of git status, active-task reference, recent commits, critical findings). Catalogs typical token consumption per operation type (full file read 20–40K, large tool-result JSON 10–30K, single SKILL injection 3–8K, fixed system overhead) and the five token-reduction techniques: deterministic-CLI over heavy MCP / tool-result paths, targeted file reads with `offset` + `limit` instead of full-file reads, search-before-read (grep first, read the match), progressive skill disclosure (small SKILL.md kept always loaded; large reference files loaded on demand), and count-mode for exploration (count matches, then read the few that matter). Specifies the cross-session persistence hierarchy — git history > files on disk > durable memory > live context — and uses it to decide _what to checkpoint_ before compaction. Lists per-model-class context strategies for 1M, 200K, and 128K windows.
20
+
21
+ ## Philosophy
22
+
23
+ The context window is the agent's _working memory_. Unlike human memory, it has a hard ceiling — when it fills, information is permanently lost from the live session unless it has been checkpointed somewhere durable. Managing the window is not optional. It is the difference between completing a long task and crashing mid-work with the most recent reasoning gone.
24
+
25
+ The trap of large windows is the assumption that they are _effectively_ unlimited. A 1M-token window feels infinite until a single 2000-line file read consumes 30K, three of those plus a long tool-result chain pushes past 200K, and a few rounds of that put the agent at 60% before any real implementation has happened. The ceiling is real, and it is closer than the headline number suggests. Discipline at 200K is identical to discipline at 1M; only the absolute numbers move.
26
+
27
+ The 80% rule exists because compaction is _itself_ an operation that needs budget. Hitting 100% mid-operation loses the operation. Compacting at 80% preserves it — the remaining 20% pays for the act of preserving.
28
+
29
+ ## Zone Model
30
+
31
+ A useful per-session mental partition of the available budget:
32
+
33
+ | Zone | Typical share | What lives here |
34
+ | ------------------- | ------------- | ------------------------------------------------------------------------ |
35
+ | **System** | ~5–10% | System prompt, repo rules, tool schemas, always-loaded directives |
36
+ | **Skill injection** | ~2–5% | The SKILL.md files auto-loaded by the routing layer for the current task |
37
+ | **Working** | ~85–93% | Conversation, tool results, file contents, agent output |
38
+
39
+ The exact share varies by model, harness, and task type. The zones are useful because budget breaches show up in different places: a System overrun is a rules / tool-schema problem, a Skill overrun is a routing / over-injection problem, a Working overrun is a context-management / file-read problem. Each has a different remediation.
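The mapping from a zone overrun to its remediation can be sketched as a small check. This is an illustration under stated assumptions: the limits are the upper ends of the "typical share" column, and `ZONE_LIMITS`, `REMEDIATION`, and `zone_overruns` are hypothetical names, not part of any runtime.

```python
# Upper bounds of the "typical share" column; illustrative, not normative.
ZONE_LIMITS = {"system": 0.10, "skill_injection": 0.05}

# Remediation per zone, paraphrased from the paragraph above.
REMEDIATION = {
    "system": "trim rules and tool schemas",
    "skill_injection": "tighten routing; stop over-injecting skills",
    "working": "narrow file reads; distil and drop raw inputs",
}


def zone_overruns(usage: dict[str, int], total_tokens: int) -> dict[str, str]:
    """Return a remediation hint for each zone that exceeds its share."""
    overruns = {}
    for zone, limit in ZONE_LIMITS.items():
        if usage.get(zone, 0) > limit * total_tokens:
            overruns[zone] = REMEDIATION[zone]
    # Working gets whatever System and Skill injection leave behind.
    working_limit = total_tokens - sum(usage.get(z, 0) for z in ZONE_LIMITS)
    if usage.get("working", 0) > working_limit:
        overruns["working"] = REMEDIATION["working"]
    return overruns
```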
40
+
41
+ ### Practical budget by model class
42
+
43
+ Replace these illustrative figures with the actual figures of your runtime — they shift over time and across vendors.
44
+
45
+ | Model class | Total context | Typical system overhead | Practical working budget |
46
+ | ------------------------------------------------------------ | ------------- | ----------------------- | ------------------------ |
47
+ | Frontier 1M-context (Anthropic Opus / Sonnet 1M tier) | ~1,000,000 | ~70K | ~930K |
48
+ | Frontier 200K-context (default tier of most frontier models) | ~200,000 | ~70K | ~130K |
49
+ | Long-context Haiku class | ~200,000 | ~50K | ~150K |
50
+ | ~128K class (some OpenAI / open-weight) | ~128,000 | ~20K | ~108K |
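The arithmetic behind the table is total context minus system overhead. The sketch below reproduces it and derives the 80% compaction point from the working budget. The dictionary keys are hypothetical labels, and the figures are the illustrative ones from the table, so substitute your runtime's real numbers.

```python
# (total context, typical system overhead), copied from the table above.
MODEL_CLASSES = {
    "frontier-1m": (1_000_000, 70_000),
    "frontier-200k": (200_000, 70_000),
    "long-context-haiku": (200_000, 50_000),
    "128k-class": (128_000, 20_000),
}


def working_budget(model_class: str) -> int:
    """Tokens left for conversation, tool results, and file contents."""
    total, overhead = MODEL_CLASSES[model_class]
    return total - overhead


def compaction_threshold(model_class: str, percent: int = 80) -> int:
    """Token count at which compaction should start (integer math, no rounding drift)."""
    return working_budget(model_class) * percent // 100
```

For the 200K frontier class this gives a working budget of 130,000 tokens and a compaction point of 104,000 tokens, leaving 26,000 for the checkpoint and closeout.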
51
+
52
+ ## Context Health States
53
+
54
+ | State | Used budget | Meaning | Action |
55
+ | ----------- | ----------- | ---------------- | ------------------------------------------------ |
56
+ | `ok` | < 60% | Normal operation | Continue working |
57
+ | `compact` | 60–80% | Getting crowded | Plan compaction at the next logical boundary |
58
+ | `exhausted` | > 80% | Critical | Stop after the current item, compact immediately |
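A minimal sketch of the state classification, assuming the harness can estimate tokens used and knows its budget. The function name is hypothetical, and placing exactly 60% and exactly 80% in `compact` is an assumption, since the table leaves both boundaries open.

```python
def classify_context_health(used_tokens: int, budget_tokens: int) -> str:
    """Map token usage onto the three health states from the table above."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    used_fraction = used_tokens / budget_tokens
    if used_fraction < 0.60:
        return "ok"
    if used_fraction <= 0.80:
        # Assumption: both boundary values count as `compact`.
        return "compact"
    return "exhausted"
```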
59
+
60
+ ### The 80% rule
61
+
62
+ **Always compact at 80% of the working budget — never at 100%.** The remaining 20% is the safety margin for:
63
+
64
+ - Completing the operation currently in flight
65
+ - Writing the checkpoint state
66
+ - Running whatever session-closeout / wrap protocol the runtime ships
67
+ - Emitting the continuation signal so the next session can resume
68
+
69
+ Hitting 100% mid-operation loses work. Compacting at 80% preserves it.
70
+
71
+ ## Compaction Protocol
72
+
73
+ ### When to compact
74
+
75
+ 1. Context health reaches `compact` or `exhausted`
76
+ 2. After completing a logical unit of work (one task, one file, one audit item)
77
+ 3. Before starting a large new operation that will read many files
78
+ 4. When tool results begin to truncate (a leading indicator of context pressure)
79
+
80
+ ### Pre-compact checklist
81
+
82
+ Before triggering compaction:
83
+
84
+ 1. **Commit any uncommitted changes** — git work survives compaction; live context does not.
85
+ 2. **Write the continuation signal** — the next-session contract: active task, current question, remaining work.
86
+ 3. **Update any loop or task checkpoint** — advance the recorded phase to the actual phase.
87
+ 4. **Save critical state** — anything that cannot be re-derived from git history or files on disk goes into a durable artefact now.
88
+
89
+ ### Pre-compact hook
90
+
91
+ A pre-compact hook is the deterministic enforcer of the checklist. The hook captures, at minimum:
92
+
93
+ - The active task identifier and the current question
94
+ - The agent mode / phase
95
+ - The current git branch and the most recent commit hashes
96
+ - The current context-health state
97
+ - A small bag of custom state (whatever the runtime needs to resume)
98
+
99
+ Any runtime that supports compaction without a pre-compact hook is _one accidental compaction away from losing the decision trail_. The hook is not optional infrastructure for any session that runs more than a few minutes.
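One possible shape for such a hook, sketched under the assumption that the runtime can call a Python function before it compacts. Every function name and the JSON layout are hypothetical; only the `git rev-parse` and `git log` invocations are standard git commands.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_output(*args: str) -> str:
    """Run a git command in the current directory and return trimmed stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def current_git_state(commit_count: int = 3) -> dict:
    """Capture the branch name and the most recent short commit hashes."""
    return {
        "branch": git_output("rev-parse", "--abbrev-ref", "HEAD"),
        "recent_commits": git_output("log", f"-{commit_count}", "--format=%h").splitlines(),
    }


def build_checkpoint(task, question, phase, health, git_state, custom=None) -> dict:
    """Assemble the minimum state listed above into one serialisable record."""
    return {
        "written_at": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "question": question,
        "phase": phase,
        "health": health,
        "git": git_state,
        "custom": custom or {},  # small bag of runtime-specific state
    }


def write_checkpoint(path, checkpoint: dict) -> None:
    """Persist the checkpoint to disk so it survives compaction."""
    Path(path).write_text(json.dumps(checkpoint, indent=2), encoding="utf-8")
```

Keeping `build_checkpoint` free of side effects makes the hook easy to test without a git repository; `current_git_state()` is the only part that shells out.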
100
+
101
+ ### Post-compact recovery
102
+
103
+ After compaction, the session-start brief should re-inject:
104
+
105
+ - Git status (branch, recent commits, dirty files)
106
+ - The active task pulled from the continuation signal
107
+ - A short summary of the in-progress board state
108
+ - Any critical findings recorded in the pre-compact checkpoint
109
+
110
+ The agent does not re-load the lost conversation. It rebuilds _selectively_ from the durable artefacts.
111
+
112
+ ## Token Consumption Patterns
113
+
114
+ ### What consumes the most context
115
+
116
+ | Operation | Typical tokens | Impact |
117
+ | ----------------------------------- | -------------- | ---------- |
118
+ | Full file read (2000 lines) | 20–40K | High |
119
+ | Grep results, 50 matches | 5–10K | Medium |
120
+ | Tool result, large JSON | 10–30K | High |
121
+ | Skill injection, one SKILL.md | 3–8K | Low–Medium |
122
+ | Agent response, code + explanation | 2–5K | Low |
123
+ | System prompt + always-loaded rules | ~50K (fixed) | Baseline |
124
+
125
+ ### Five token-reduction techniques
126
+
127
+ #### 1. Deterministic CLI over heavy tool-result paths
128
+
129
+ Where the runtime offers both a heavy tool-result path (e.g., a large MCP-style JSON dump) and a deterministic CLI / scripted path that returns the same data shaped tighter, prefer the CLI. The savings can easily be 50–100× per call. The principle: ship structured output through tools the model can read efficiently, not through whatever path the runtime happens to expose by default.
130
+
131
+ #### 2. Targeted file reads (offset + limit)
132
+
133
+ ```
134
+ BAD: read the whole 2000-line file
135
+ → 30K tokens
136
+ GOOD: read 30 lines starting at the function you actually need
137
+ → 500 tokens
138
+ ```
139
+
140
+ If a code-search step has already located the relevant lines, _use_ those line numbers. A "read everything because I might need it" pattern is the single biggest avoidable burn.
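A targeted read can be sketched in a few lines. Here `offset` is treated as a 1-based line number and `limit` as a line count; that convention and the function name are assumptions, since each runtime's read tool defines its own parameters.

```python
from itertools import islice


def read_slice(path: str, offset: int, limit: int) -> list[str]:
    """Return up to `limit` lines starting at 1-based line number `offset`."""
    if offset < 1 or limit < 0:
        raise ValueError("offset must be >= 1 and limit must be >= 0")
    with open(path, encoding="utf-8") as handle:
        # islice streams the file, so lines past the window are never returned.
        window = islice(handle, offset - 1, offset - 1 + limit)
        return [line.rstrip("\n") for line in window]
```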
141
+
142
+ #### 3. Search before read
143
+
144
+ ```
145
+ BAD: read 5 candidate files looking for a function
146
+ → 100K tokens
147
+ GOOD: grep for the function name first, then read 30 lines from the one match
148
+ → 2K tokens
149
+ ```
150
+
151
+ The search step costs ~1K tokens and replaces 50–100K of speculative reading.
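The search-then-read sequence might look like the sketch below: locate the line numbers first, then return only a small window around the match. Both helper names are hypothetical, and the plain substring match stands in for whatever search tool the runtime provides.

```python
from itertools import islice


def find_line_numbers(path: str, needle: str) -> list[int]:
    """Return the 1-based numbers of lines containing `needle` (substring match)."""
    with open(path, encoding="utf-8") as handle:
        return [number for number, line in enumerate(handle, start=1) if needle in line]


def read_window(path: str, first_line: int, limit: int = 30) -> list[str]:
    """Return up to `limit` lines starting at 1-based `first_line`."""
    with open(path, encoding="utf-8") as handle:
        window = islice(handle, first_line - 1, first_line - 1 + limit)
        return [line.rstrip("\n") for line in window]
```

Only the list of line numbers and the 30-line window enter the context; the rest of the file is scanned but never returned.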
152
+
153
+ #### 4. Progressive skill disclosure
154
+
155
+ Skills should follow a two-tier structure:
156
+
157
+ - **`SKILL.md`** — the core patterns, the routing-contract description, the verification checklist. Always loaded when the skill is selected. Should fit comfortably in 3–8K tokens.
158
+ - **`references/*.md`** — detailed reference material, long examples, deep specifications. Loaded _only_ when explicitly needed.
159
+
160
+ Only `SKILL.md` is auto-injected. References are loaded by the agent when the task demands the depth.
161
+
162
+ #### 5. Count mode for exploration
163
+
164
+ ```
165
+ BAD: list every TODO comment in the repo, full match content
166
+ → 50K tokens
167
+ GOOD: count first, then read selectively
168
+ grep -r "TODO" . | wc -l → 200 tokens
169
+ grep -rn "TODO" src/lib/ | head -n 10 → 2K tokens
170
+ ```
171
+
172
+ Exploration should be a _count → narrow → read_ sequence, not a single exhaustive read.
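The count, narrow, read sequence can be sketched as two helpers: one that returns only per-file match counts, and one that returns the first few matching lines from a single file. The names, the `*.py` default, and the substring match are assumptions made for illustration.

```python
from pathlib import Path


def count_matches(root: str, needle: str, pattern: str = "*.py") -> dict[str, int]:
    """Count matching lines per file under `root`; files with no match are omitted."""
    counts = {}
    for file in sorted(Path(root).rglob(pattern)):
        if not file.is_file():
            continue
        text = file.read_text(encoding="utf-8", errors="ignore")
        hits = sum(1 for line in text.splitlines() if needle in line)
        if hits:
            counts[file.relative_to(root).as_posix()] = hits
    return counts


def first_matches(root: str, relative_path: str, needle: str, head: int = 10) -> list[str]:
    """Return at most `head` matching lines from one file, prefixed by line number."""
    matches = []
    with open(Path(root) / relative_path, encoding="utf-8", errors="ignore") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                matches.append(f"{number}: {line.rstrip()}")
                if len(matches) == head:
                    break
    return matches
```

The counts tell the agent where the matches cluster, so the second call can be pointed at one file instead of the whole tree.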
173
+
174
+ ## Cross-Session Persistence Hierarchy
175
+
176
+ What survives a compaction or session restart, ranked from most to least durable:
177
+
178
+ 1. **Git** — code, commits, branches. Permanent.
179
+ 2. **Files on disk** — checkpoints, continuation signals, structured logs (JSONL is ideal). Persistent until manually deleted.
180
+ 3. **Durable memory** — index files and topic files in a memory directory consumed by the next session. Persistent and indexed.
181
+ 4. **Live context** — conversation history, in-flight reasoning, tool results. **Lost on compaction.**
182
+
183
+ The hierarchy drives the pre-compact checklist: anything that lives only at level 4 needs to be promoted to levels 1–3 _before_ compaction, or it is gone.
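The promotion rule can be expressed as a small check over tagged state. The `Durability` enum and `needs_promotion` are hypothetical names; how a real runtime knows where each item lives is outside this sketch.

```python
from enum import IntEnum


class Durability(IntEnum):
    """The four persistence levels, numbered from most to least durable."""

    GIT = 1
    FILES_ON_DISK = 2
    DURABLE_MEMORY = 3
    LIVE_CONTEXT = 4


def needs_promotion(state: dict[str, Durability]) -> list[str]:
    """Return the names of items that exist only in live context, sorted by name."""
    return sorted(name for name, level in state.items() if level == Durability.LIVE_CONTEXT)
```

Anything this returns is what the pre-compact checklist has to move into git, a file, or durable memory before compaction runs.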
184
+
185
+ ### Planning for compaction
186
+
187
+ When starting a complex multi-step task:
188
+
189
+ 1. Break it into subtasks each of which can complete inside one context window
190
+ 2. After each subtask: commit + update checkpoint + write continuation signal
191
+ 3. If a subtask risks exceeding the budget mid-flight, split further or read fewer files
192
+
193
+ The rhythm is: small unit → commit → checkpoint → next unit. Compaction becomes a routine boundary instead of a crisis.
194
+
195
+ ## Per-Model-Class Strategies
196
+
197
+ | Model class | Typical task sizing | Compaction cadence | Key disciplines |
198
+ | ------------------------------------------- | ------------------------------------------------------- | ----------------------- | ----------------------------------------------------------------------- |
199
+ | 1M context (frontier Opus / Sonnet 1M tier) | 5–10 file reads + full implementation per session | After 3–4 complex tasks | Progressive skill disclosure; targeted reads still required |
200
+ | 200K context (default frontier tier) | 2–4 file reads + one focused implementation per session | After every 2 tasks | Aggressive search-before-read; skill targeting; offset+limit reads |
201
+ | Long-context Haiku class | 2–3 file reads per task | After every task | Minimise skill payload; targeted labels only; commit between tasks |
202
+ | ~128K class | 1 task per session | Hard boundary | Count-mode first; read only essentials; one verification step at a time |
203
+
204
+ A 1M window is not a license to ignore the rules — it just shifts the breaking point further out. Apply the same discipline; the budget math just lets you run longer between compactions.
205
+
206
+ ## Anti-Patterns
207
+
208
+ | Anti-pattern | Why it fails | Correct |
209
+ | -------------------------------------------------------- | -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
210
+ | Reading entire large files when 30 lines would do | Burns 20–40K per file with no benefit | Use `offset` + `limit` |
211
+ | Loading every available skill regardless of task | Skill injection should be 2–5%, not 25% | Use targeted routing labels; trust the routing layer |
212
+ | Ignoring the `compact` health signal | Skipping past 80% guarantees a 100% loss event sooner or later | Compact at the next logical boundary once `compact` triggers |
213
+ | Compacting without a pre-compact checkpoint | The decision trail is lost; the next session re-derives it incorrectly | Always run the pre-compact checklist; keep the hook always-on |
214
+ | Letting tool results dump unstructured JSON into context | A 30K tool result evicts 30K of useful conversation | Wrap heavy results in a CLI / script that returns the shape you need |
215
+ | Speculative reads ("I might need this") | Speculation has the same cost as evidence-based reads, with worse outcomes | Read on evidence; if you cannot name _what you'll do with the file_, don't read it |
216
+ | Treating the 1M window as effectively unlimited | A complex task crosses 60% in minutes; the ceiling is real | Apply the same discipline at 1M as at 200K; the budget just stretches |
217
+
218
+ ## Verification
219
+
220
+ - [ ] The current context-health state has been correctly classified as `ok`, `compact`, or `exhausted` based on actual usage estimates
221
+ - [ ] The pre-compact checklist has been followed before any compaction (commit, continuation signal, checkpoint, custom state)
222
+ - [ ] A pre-compact hook is installed and runs deterministically — compaction is never invoked without it firing
223
+ - [ ] File reads use `offset` + `limit` targeting for any file beyond ~200 lines
224
+ - [ ] The session prefers deterministic CLI / scripted tool paths over heavy MCP-style result dumps where both exist
225
+ - [ ] No compaction has been triggered at 100% — the 80% rule has been respected
226
+ - [ ] What needs to survive the session has been promoted from live context to git / files / durable memory before any compaction
227
+ - [ ] The active model's actual context budget (not assumed budget) is the planning baseline
228
+
229
+ ## Do NOT Use When
230
+
231
+ | Use instead | When |
232
+ | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
233
+ | `context-management` | Deciding _what_ to load, keep, or drop from the working set — the qualitative side of context health |
234
+ | `context-graph` | Designing the multi-graph architecture (skills + docs + memory + scripts) — the topology, not the runtime budget |
235
+ | `prompt-craft` | Writing or improving a prompt — wording, structure, format constraints |
236
+ | A memory-curation skill | Curating cross-session persistent memory files, pruning the memory index |
237
+ | `tool-call-strategy` | Choosing which tool to call next — context-window decides the _budget_ for the call's result, not whether the call is the right call |
238
+ | `code-review` | Reviewing AI-generated code — orthogonal concern |
239
+ | `context-engineering` | Designing the system-level information architecture — context-engineering is upstream of this skill |
@@ -0,0 +1,120 @@
1
+ ---
2
+ name: contract-testing
3
+ description: "Use when verifying the interface between two services or components by capturing the consumer's expectations as a contract artifact and verifying the provider satisfies it. Covers the consumer-driven contracts pattern (Fowler 2006; Pact), the contrast with schema-only validation (OpenAPI/JSON Schema captures shape, not behavioral expectations), the broker as the integration point between consumer and provider deploy schedules, two-phase verification (consumer-side mocks; provider-side replay), the difference between contract testing (verifies the interface) and integration testing (verifies the implementation through it), and how contract tests replace brittle cross-service e2e. Do NOT use for in-system integration (use `integration-test-design`), full user-journey testing (use `e2e-test-design`), single-unit testing (use `testing-strategy` + `test-doubles-design`), or pure OpenAPI schema validation (API-spec tooling)."
4
+ license: MIT
5
+ allowed-tools: Read Grep
6
+ metadata:
7
+ metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"quality\",\"domain\":\"quality/testing\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-16\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-16\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"contract testing\\\\\\\",\\\\\\\"consumer-driven contracts\\\\\\\",\\\\\\\"Pact\\\\\\\",\\\\\\\"Spring Cloud Contract\\\\\\\",\\\\\\\"Specmatic\\\\\\\",\\\\\\\"contract broker\\\\\\\",\\\\\\\"provider verification\\\\\\\",\\\\\\\"consumer test\\\\\\\",\\\\\\\"CDC\\\\\\\",\\\\\\\"OpenAPI conformance\\\\\\\"]\",\"triggers\":\"[\\\\\\\"should this be a contract test or an integration test\\\\\\\",\\\\\\\"Pact vs OpenAPI\\\\\\\",\\\\\\\"how do we decouple deploys between services\\\\\\\",\\\\\\\"the consumer broke when the provider changed\\\\\\\",\\\\\\\"should we e2e test across services\\\\\\\"]\",\"examples\":\"[\\\\\\\"design a consumer-driven contract test between a frontend and a backend service\\\\\\\",\\\\\\\"decide whether to use Pact or schema-only validation for a new API\\\\\\\",\\\\\\\"diagnose a contract test that passes consumer-side but fails provider-side — implementation drift\\\\\\\",\\\\\\\"explain how the contract broker decouples deploy schedules between consumer and provider teams\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"test internal seams of a system (use integration-test-design)\\\\\\\",\\\\\\\"validate an HTTP response against an OpenAPI schema (use API-spec tooling)\\\\\\\",\\\\\\\"test a complete user journey through the UI (use 
e2e-test-design)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"testing-strategy\\\\\\\",\\\\\\\"integration-test-design\\\\\\\",\\\\\\\"e2e-test-design\\\\\\\",\\\\\\\"api-design\\\\\\\",\\\\\\\"event-contract-design\\\\\\\",\\\\\\\"system-interface-contracts\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"integration-test-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"integration-test-design owns tests that exercise the real implementation through an interface; this skill owns tests that verify the interface contract independently of the implementation behind it. Contract tests can replace cross-service e2e tests; they cannot replace integration tests that verify behavior through the interface.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"e2e-test-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"e2e-test-design owns user-journey tests across the whole stack; this skill owns service-boundary contract verification. Cross-service e2e tests are often the wrong tool — they are slow and verify too much; contract tests verify the interface specifically.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"api-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"api-design owns the design of the request/response surface; this skill owns the testing of whether the implementation meets that design's contract. The two compose: api-design produces the contract; contract testing verifies it.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"system-interface-contracts\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"system-interface-contracts owns the design and documentation of contracts between systems, modules, and services; this skill owns the testing of those contracts. 
system-interface-contracts is the design discipline; this skill is the verification technique.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"event-contract-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"event-contract-design owns the design of asynchronous event contracts; this skill applies to verifying message-bus contracts (Pact supports asynchronous message contracts). The two compose for event-driven systems.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"api-design\\\\\\\",\\\\\\\"integration-test-design\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"A contract test is to a service interface what a lease is to a tenancy — the consumer writes down the specific obligations they depend on (utilities included, quiet hours, this exact rent), the landlord (provider) verifies independently that they can honor those obligations, and the lease in the broker's filing cabinet lets either party prove compatibility without re-negotiating from scratch each time one of them moves.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"Contract testing is the technique of verifying that an interface between a consumer and a provider — typically two services across a network boundary — works as both sides expect, by capturing the consumer's expectations as a *contract* artifact and then running that contract against both sides independently. The contract is consumer-driven: it expresses the specific interactions the consumer actually performs (this HTTP request → this response shape and content), not the full surface of the provider's API. The consumer side verifies its code by replaying the contract against a generated mock provider; the provider side verifies its implementation by replaying the contract against the real provider. When both verifications pass independently, the two sides are known to be compatible without ever running them together. 
The technique decouples deploy schedules between consumer and provider teams and replaces brittle cross-service end-to-end tests with focused interface verification.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/contract-testing/SKILL.md\",\"skill_graph_export_description\":\"shortened for Agent Skills 1024-character description limit; canonical source keeps the full routing contract\",\"skill_graph_canonical_description_length\":\"1176\"}"
8
+ skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
9
+ skill_graph_protocol: Skill Metadata Protocol v4
10
+ skill_graph_project: Skill Graph
11
+ skill_graph_canonical_skill: skills/contract-testing/SKILL.md
12
+ ---
13
+
14
+ # Contract Testing
15
+
16
+ ## Coverage
17
+
18
+ The technique of verifying interfaces between consumer and provider components by capturing the consumer's expectations as a contract artifact and running that contract against both sides independently. Covers the consumer-driven contracts pattern (Fowler 2006; Pact ecosystem), the two-phase verification (consumer-side mock generation; provider-side replay), the broker as the integration point that enables deploy independence and compatibility tracking, the contrast with schema-only validation (OpenAPI captures surface; contracts capture behavior), the contrast with integration and e2e testing (contracts verify the interface; integration and e2e verify implementation and journeys), and the tool ecosystem (Pact, Spring Cloud Contract, Specmatic, bi-directional patterns).
19
+
20
+ ## Philosophy
21
+
22
+ Contract testing decouples deploy schedules between services. Before contract testing, two services that depended on each other had to be tested together — usually with brittle cross-service e2e tests in a shared staging environment, which serialized deploys and produced flaky signal. With contract testing, each side is verified independently against a shared contract; deploys can happen on independent schedules as long as the contract is satisfied.
23
+
24
+ The contract is *consumer-driven*: it captures what the consumer actually does, not what the provider offers. This narrowing — three endpoints out of fifty, twelve fields out of a hundred — is what makes contract testing focused and what distinguishes it from full schema validation. The provider can change anything the consumers don't rely on; the provider cannot break anything any consumer's contract requires.
25
+
26
+ The discipline is in the broker. A contract-test setup without a broker is "contracts as committed fixtures" — much of the maintenance cost, little of the deploy-independence benefit. A contract-test setup with a broker (Pact Broker, PactFlow) provides versioned contract storage, compatibility tracking, and deploy-gating; the strategic value of the technique is realized at this layer.
27
+
28
+ ## The Two-Phase Verification
29
+
30
+ ```
31
+ ┌─────────────────────────┐         ┌─────────────────────────┐
32
+ │ Consumer's CI           │         │ Provider's CI           │
33
+ │                         │         │                         │
34
+ │ 1. Run consumer tests   │         │ 4. Fetch contracts      │
35
+ │    against generated    │         │    from broker          │
36
+ │    mock provider        │         │                         │
37
+ │                         │         │ 5. Replay each contract │
38
+ │ 2. If pass: contract    │         │    against real         │
39
+ │    is correct           │         │    provider             │
40
+ │                         │         │                         │
41
+ │ 3. Publish contract     │         │ 6. If pass: provider    │
42
+ │    to broker            │ ──────▶ │    satisfies contract   │
43
+ │                         │         │                         │
44
+ │                         │ ◀────── │ 7. Mark contract version│
45
+ │                         │  broker │    as verified          │
46
+ └─────────────────────────┘         └─────────────────────────┘
47
+ ```
48
+
49
+ When both sides have passed independently, the broker records the compatibility. Deploy gating uses this record: don't deploy the provider unless its current version is marked compatible with the production consumer's contract.
50
+
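The two phases above can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not how Pact or any other tool implements it: `Interaction`, `mock_provider`, `verify_provider`, and the `/users/1` endpoint are hypothetical names invented for the example, and the broker is reduced to passing the same `contract` object to both sides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Interaction:
    """One request/response pair the consumer relies on (hypothetical shape)."""
    method: str
    path: str
    status: int
    body: dict


Contract = list[Interaction]
Handler = Callable[[str, str], tuple[int, dict]]


def mock_provider(contract: Contract) -> Handler:
    """Phase 1: a stand-in provider that answers only the recorded interactions."""
    table = {(i.method, i.path): (i.status, i.body) for i in contract}

    def handle(method: str, path: str) -> tuple[int, dict]:
        return table.get((method, path), (501, {}))

    return handle


def verify_provider(contract: Contract, provider: Handler) -> list[str]:
    """Phase 2: replay each interaction against the real provider; return failures."""
    failures = []
    for i in contract:
        status, body = provider(i.method, i.path)
        if status != i.status:
            failures.append(f"{i.method} {i.path}: status {status} != {i.status}")
            continue
        # Only the fields the consumer recorded are checked; extras are allowed.
        for key, expected in i.body.items():
            if body.get(key) != expected:
                failures.append(f"{i.method} {i.path}: field {key!r} mismatch")
    return failures


def display_name(provider: Handler, user_id: int) -> str:
    """Consumer code under test (hypothetical)."""
    status, body = provider("GET", f"/users/{user_id}")
    return body["name"] if status == 200 else "unknown"


def real_provider(method: str, path: str) -> tuple[int, dict]:
    """Stand-in for the real provider; it returns more fields than the consumer uses."""
    if (method, path) == ("GET", "/users/1"):
        return 200, {"id": 1, "name": "Ada", "email": "ada@example.com"}
    return 404, {}


contract = [Interaction("GET", "/users/1", 200, {"id": 1, "name": "Ada"})]

# Consumer CI: run consumer code against the generated mock.
consumer_ok = display_name(mock_provider(contract), 1) == "Ada"
# Provider CI: replay the same contract against the real implementation.
provider_failures = verify_provider(contract, real_provider)
```

Note that the two sides never run together: the consumer only sees the mock, and the provider only sees the replayed contract. The extra `email` field does not cause a failure because the consumer never recorded a dependency on it.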
51
+ ## Schema vs Contract — A Practical Comparison
52
+
53
+ | Property | OpenAPI / JSON Schema | Consumer-driven contract |
54
+ |---|---|---|
55
+ | Captures | Surface (endpoints, types, shapes) | Behavior (specific interactions the consumer performs) |
56
+ | Driven by | Provider | Consumer |
57
+ | Coverage | Everything the provider offers | Only what consumers use |
58
+ | Catches breaking field value change | Only if the schema constrains values (for example with an enum); usually not | Yes; the contract pins the specific values the consumer expects |
59
+ | Catches semantic / error-shape change | No | Yes if consumer relies on the error |
60
+ | Verifies through real implementation | No (validates response against schema only) | Yes (provider runs the contract against itself) |
61
+ | Used for | API description, code generation, validation | Test gate between consumer and provider |
62
+ | Should you have both? | Yes | Yes |
63
+
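The "breaking field value change" row is the easiest one to see concretely. The sketch below uses a deliberately simplified shape check in place of a real JSON Schema validator, and the `order_schema`, `consumer_expectation`, and response values are hypothetical. It shows a provider change that a shape-only check accepts but a consumer expectation rejects.

```python
def matches_schema(body: dict, schema: dict) -> bool:
    """Shape-only check: every declared key is present with the declared type."""
    return all(isinstance(body.get(key), typ) for key, typ in schema.items())


def matches_contract(body: dict, expected: dict) -> bool:
    """Behavior check: every field the consumer relies on has the expected value."""
    return all(body.get(key) == value for key, value in expected.items())


# Provider-driven description of the response surface (hypothetical).
order_schema = {"id": int, "status": str}
# What one consumer actually depends on (hypothetical).
consumer_expectation = {"status": "shipped"}

old_response = {"id": 7, "status": "shipped"}
new_response = {"id": 7, "status": "SHIPPED"}  # provider changed the casing

schema_still_passes = matches_schema(new_response, order_schema)
contract_still_passes = matches_contract(new_response, consumer_expectation)
```

Here `schema_still_passes` is true and `contract_still_passes` is false: the new response is still a string in the right place, but the consumer's comparison against `"shipped"` would break.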
64
+ ## When Contract Testing Replaces What
65
+
66
+ | Pre-contract-testing approach | Contract-testing replacement | When replacement is right |
67
+ |---|---|---|
68
+ | Cross-service e2e test for service boundary | Contract test | Almost always — cheaper, more reliable, doesn't require both services running |
69
+ | Schema validation alone | Contract test + schema | Always — schema is necessary but not sufficient |
70
+ | Coordinated deploy schedules | Independent deploys + broker gating | Almost always — the deploy-independence is the strategic value |
71
+ | Manual API change reviews | Contract verification on provider CI | Always — the verification is automated |
72
+ | "We'll catch it in staging" | Contract test on PR | Always — catching at PR time is much cheaper |
73
+
74
+ ## Broker-Enabled Deploy Gating
75
+
76
+ | State | Can deploy? |
77
+ |---|---|
78
+ | Provider version has been verified against current production consumer contract | Yes |
79
+ | Provider version has not yet been verified against current production consumer contract | No — run verification first |
80
+ | Provider version fails verification against production consumer contract | No — fix or coordinate with consumer |
81
+ | Consumer version's contract has not been verified against current production provider | No — run provider verification |
82
+
83
+ The broker's compatibility matrix is the deploy gate. Modern setups (PactFlow, or Pact Broker with its `can-i-deploy` check) automate this: the deploy pipeline asks the broker whether a given version is safe to release and gets a yes/no answer.
84
+
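The provider rows of the gating table reduce to a lookup in the compatibility matrix. The following is a rough sketch of that decision under assumed data shapes; `Verification`, `can_deploy_provider`, and the matrix key layout are hypothetical and are not the Pact Broker's data model or API.

```python
from enum import Enum


class Verification(Enum):
    PASSED = "passed"
    FAILED = "failed"


# Assumed key layout: (provider_version, consumer_name, contract_version).
Matrix = dict[tuple[str, str, str], Verification]


def can_deploy_provider(
    matrix: Matrix,
    provider_version: str,
    production_contracts: dict[str, str],
) -> tuple[bool, str]:
    """Return (allowed, reason) for deploying one provider version.

    production_contracts maps each consumer to the contract version it
    currently runs in production.
    """
    for consumer, contract_version in sorted(production_contracts.items()):
        result = matrix.get((provider_version, consumer, contract_version))
        if result is None:
            return False, f"not yet verified against {consumer}@{contract_version}"
        if result is Verification.FAILED:
            return False, f"fails verification against {consumer}@{contract_version}"
    return True, "verified against every production consumer contract"


matrix: Matrix = {
    ("2.0.0", "web-frontend", "c1"): Verification.PASSED,
    ("2.0.0", "billing-service", "c4"): Verification.PASSED,
    ("2.1.0", "web-frontend", "c1"): Verification.PASSED,
    ("2.1.0", "billing-service", "c4"): Verification.FAILED,
}
production = {"web-frontend": "c1", "billing-service": "c4"}

decision_2_0 = can_deploy_provider(matrix, "2.0.0", production)
decision_2_1 = can_deploy_provider(matrix, "2.1.0", production)
decision_2_2 = can_deploy_provider(matrix, "2.2.0", production)
```

A missing entry is treated the same as a failure for gating purposes, which matches the "run verification first" row: absence of evidence blocks the deploy.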
85
+ ## Verification
86
+
87
+ After applying this skill, verify:
88
+ - [ ] Contracts are *consumer-driven* — written from consumer-side tests against real consumer code, not generated from provider specs.
89
+ - [ ] Contracts and OpenAPI/JSON-Schema coexist; the schema describes surface, the contract verifies behavior. Neither replaces the other.
90
+ - [ ] Both phases of verification run automatically: consumer-side on every consumer PR, provider-side on every provider PR (plus scheduled cron for drift detection).
91
+ - [ ] A broker is in use for any setup with more than one or two consumers per provider. The broker provides compatibility tracking and deploy gating.
92
+ - [ ] Deploy gating is wired: provider deploys are blocked if the version isn't verified against the production consumer's contract.
93
+ - [ ] Contract testing is *complementing* integration testing, not replacing it. The team still has integration tests for in-service behavior.
94
+ - [ ] Cross-service e2e tests have been reduced to the most-critical journeys; most cross-service e2e has migrated to contract tests.
95
+ - [ ] Message-bus and event-driven boundaries use contract testing where the consumer-provider pattern applies (Pact supports async message contracts).
96
+ - [ ] Contract evolution is handled: deprecating a field requires checking that no consumer's contract requires it; introducing a new field doesn't break existing contracts.
97
+
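The last checklist item, contract evolution, amounts to asking which consumers still depend on a field before removing it. A minimal sketch, assuming each consumer's contract has already been flattened to the set of response fields it requires (the `contracts` mapping and `consumers_requiring` helper are hypothetical):

```python
def consumers_requiring(field: str, contracts: dict[str, set[str]]) -> list[str]:
    """List the consumers whose contracts still depend on the given field."""
    return sorted(name for name, fields in contracts.items() if field in fields)


# Fields each consumer's contract requires from the provider (hypothetical).
contracts = {
    "web-frontend": {"id", "name"},
    "billing-service": {"id", "email"},
}

blocked_by = consumers_requiring("email", contracts)      # removal needs coordination
unused_field = consumers_requiring("nickname", contracts)  # safe to remove or add
```

An empty result means no contract mentions the field, so the provider is free to remove it; a non-empty result names the teams to coordinate with first.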
98
+ ## Do NOT Use When
99
+
100
+ | Instead of this skill | Use | Why |
101
+ |---|---|---|
102
+ | Testing the implementation behind a service interface | `integration-test-design` | integration tests verify in-service behavior; contracts verify the interface |
103
+ | Testing a user journey across the whole stack including UI | `e2e-test-design` | e2e tests verify the user experience; contracts verify the service boundary |
104
+ | Validating responses against an OpenAPI schema | OpenAPI/JSON-Schema validation tooling | schema validation captures surface; this skill captures behavior |
105
+ | Testing a single unit in isolation | `testing-strategy` + `test-doubles-design` | unit scope is below this skill's level |
106
+ | Designing the API surface itself | `api-design` | api-design owns the contract design; this skill owns the verification |
107
+ | Designing the event surface | `event-contract-design` | event-contract-design owns the asynchronous contract design; this skill verifies it |
108
+ | Choosing the test-level ratio | `testing-strategy` | strategy owns ratios; this skill owns the contract-testing technique |
109
+
110
+ ## Key Sources
111
+
112
+ - Robinson, I. (2006). ["Consumer-Driven Contracts: A Service Evolution Pattern"](https://martinfowler.com/articles/consumerDrivenContracts.html). martinfowler.com. The foundational essay defining consumer-driven contracts as a pattern for service evolution; cited in this skill as "Fowler 2006" after the site that hosts it.
113
+ - Pact Foundation. ["Pact — Documentation"](https://docs.pact.io). The canonical reference for the most-adopted consumer-driven contract testing tool; multi-language clients; broker ecosystem.
114
+ - PactFlow. ["The Pact Broker"](https://docs.pactflow.io/docs/pact-broker/). Reference for the contract broker pattern, compatibility tracking, and deploy gating.
115
+ - Robinson, I. (2006). ["Consumer-Driven Contracts: Three Levels of Confidence"](https://www.ianrobinson.net/). Practitioner essay on the levels of compatibility consumer-driven contracts provide.
116
+ - Specmatic. ["Specmatic — Bi-directional Contract Testing"](https://specmatic.io/). Reference for the alternative bi-directional approach that compares OpenAPI specs against consumer contracts.
117
+ - Spring Cloud. ["Spring Cloud Contract — Reference Documentation"](https://docs.spring.io/spring-cloud-contract/docs/). The JVM-ecosystem contract testing tool; works with REST and messaging.
118
+ - Newman, S. (2015, 2nd ed. 2021). *Building Microservices*. O'Reilly. Chapter on testing strategies for microservices, with contract testing as the recommended cross-service approach.
119
+ - Richardson, C. (2018). *Microservices Patterns*. Manning. Pattern chapters covering contract testing as part of microservices integration patterns.
120
+ - Cervera, A., et al. (2015). ["Consumer-driven Contracts for Microservices: A Survey"](https://www.researchgate.net/publication/277024057). Academic survey of CDC patterns and tool support.