@heytherevibin/skillforge 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (402)
  1. package/CHANGELOG.md +16 -0
  2. package/CODE_OF_CONDUCT.md +34 -0
  3. package/CONTRIBUTING.md +38 -0
  4. package/LICENSE +21 -0
  5. package/README.md +337 -0
  6. package/RELEASING.md +93 -0
  7. package/SECURITY.md +31 -0
  8. package/STRATEGY.md +26 -0
  9. package/bin/cli.js +547 -0
  10. package/lib/packs.js +184 -0
  11. package/package.json +38 -0
  12. package/python/app/__init__.py +0 -0
  13. package/python/app/__pycache__/__init__.cpython-312.pyc +0 -0
  14. package/python/app/__pycache__/auth.cpython-312.pyc +0 -0
  15. package/python/app/__pycache__/main.cpython-312.pyc +0 -0
  16. package/python/app/auth.py +63 -0
  17. package/python/app/cli.py +78 -0
  18. package/python/app/db_paths.py +26 -0
  19. package/python/app/events_cli.py +175 -0
  20. package/python/app/main.py +647 -0
  21. package/python/app/materialize.py +138 -0
  22. package/python/app/mcp_server.py +610 -0
  23. package/python/app/route_cli.py +117 -0
  24. package/python/requirements-dev.txt +1 -0
  25. package/python/requirements.txt +7 -0
  26. package/python/tests/test_db_paths.py +41 -0
  27. package/skills/accessibility/SKILL.md +145 -0
  28. package/skills/agent-architecture-audit/SKILL.md +256 -0
  29. package/skills/agent-eval/SKILL.md +144 -0
  30. package/skills/agent-harness-construction/SKILL.md +72 -0
  31. package/skills/agent-introspection-debugging/SKILL.md +152 -0
  32. package/skills/agent-payment-x402/SKILL.md +224 -0
  33. package/skills/agent-sort/SKILL.md +214 -0
  34. package/skills/agentic-engineering/SKILL.md +62 -0
  35. package/skills/agentic-os/SKILL.md +386 -0
  36. package/skills/ai-first-engineering/SKILL.md +50 -0
  37. package/skills/ai-regression-testing/SKILL.md +384 -0
  38. package/skills/android-clean-architecture/SKILL.md +338 -0
  39. package/skills/angular-developer/SKILL.md +153 -0
  40. package/skills/angular-developer/references/angular-animations.md +160 -0
  41. package/skills/angular-developer/references/angular-aria.md +410 -0
  42. package/skills/angular-developer/references/cli.md +86 -0
  43. package/skills/angular-developer/references/component-harnesses.md +59 -0
  44. package/skills/angular-developer/references/component-styling.md +91 -0
  45. package/skills/angular-developer/references/components.md +117 -0
  46. package/skills/angular-developer/references/creating-services.md +97 -0
  47. package/skills/angular-developer/references/data-resolvers.md +69 -0
  48. package/skills/angular-developer/references/define-routes.md +67 -0
  49. package/skills/angular-developer/references/defining-providers.md +72 -0
  50. package/skills/angular-developer/references/di-fundamentals.md +120 -0
  51. package/skills/angular-developer/references/e2e-testing.md +56 -0
  52. package/skills/angular-developer/references/effects.md +83 -0
  53. package/skills/angular-developer/references/hierarchical-injectors.md +43 -0
  54. package/skills/angular-developer/references/host-elements.md +80 -0
  55. package/skills/angular-developer/references/injection-context.md +63 -0
  56. package/skills/angular-developer/references/inputs.md +101 -0
  57. package/skills/angular-developer/references/linked-signal.md +59 -0
  58. package/skills/angular-developer/references/loading-strategies.md +61 -0
  59. package/skills/angular-developer/references/mcp.md +108 -0
  60. package/skills/angular-developer/references/navigate-to-routes.md +69 -0
  61. package/skills/angular-developer/references/outputs.md +86 -0
  62. package/skills/angular-developer/references/reactive-forms.md +122 -0
  63. package/skills/angular-developer/references/rendering-strategies.md +44 -0
  64. package/skills/angular-developer/references/resource.md +77 -0
  65. package/skills/angular-developer/references/route-animations.md +56 -0
  66. package/skills/angular-developer/references/route-guards.md +52 -0
  67. package/skills/angular-developer/references/router-lifecycle.md +45 -0
  68. package/skills/angular-developer/references/router-testing.md +87 -0
  69. package/skills/angular-developer/references/show-routes-with-outlets.md +68 -0
  70. package/skills/angular-developer/references/signal-forms.md +795 -0
  71. package/skills/angular-developer/references/signals-overview.md +94 -0
  72. package/skills/angular-developer/references/tailwind-css.md +69 -0
  73. package/skills/angular-developer/references/template-driven-forms.md +114 -0
  74. package/skills/angular-developer/references/testing-fundamentals.md +65 -0
  75. package/skills/api-connector-builder/SKILL.md +120 -0
  76. package/skills/api-design/SKILL.md +522 -0
  77. package/skills/architecture-decision-records/SKILL.md +178 -0
  78. package/skills/article-writing/SKILL.md +78 -0
  79. package/skills/automation-audit-ops/SKILL.md +141 -0
  80. package/skills/autonomous-agent-harness/SKILL.md +272 -0
  81. package/skills/autonomous-loops/SKILL.md +609 -0
  82. package/skills/backend-patterns/SKILL.md +560 -0
  83. package/skills/benchmark/SKILL.md +92 -0
  84. package/skills/blueprint/SKILL.md +104 -0
  85. package/skills/browser-qa/SKILL.md +86 -0
  86. package/skills/bun-runtime/SKILL.md +83 -0
  87. package/skills/canary-watch/SKILL.md +98 -0
  88. package/skills/carrier-relationship-management/SKILL.md +211 -0
  89. package/skills/cisco-ios-patterns/SKILL.md +163 -0
  90. package/skills/ck/SKILL.md +147 -0
  91. package/skills/ck/commands/forget.mjs +44 -0
  92. package/skills/ck/commands/info.mjs +24 -0
  93. package/skills/ck/commands/init.mjs +143 -0
  94. package/skills/ck/commands/list.mjs +40 -0
  95. package/skills/ck/commands/migrate.mjs +202 -0
  96. package/skills/ck/commands/resume.mjs +36 -0
  97. package/skills/ck/commands/save.mjs +210 -0
  98. package/skills/ck/commands/shared.mjs +387 -0
  99. package/skills/ck/hooks/session-start.mjs +224 -0
  100. package/skills/claude-devfleet/SKILL.md +103 -0
  101. package/skills/click-path-audit/SKILL.md +244 -0
  102. package/skills/clickhouse-io/SKILL.md +438 -0
  103. package/skills/code-tour/SKILL.md +235 -0
  104. package/skills/codebase-onboarding/SKILL.md +232 -0
  105. package/skills/coding-standards/SKILL.md +548 -0
  106. package/skills/compose-multiplatform-patterns/SKILL.md +298 -0
  107. package/skills/connections-optimizer/SKILL.md +188 -0
  108. package/skills/content-engine/SKILL.md +126 -0
  109. package/skills/content-hash-cache-pattern/SKILL.md +160 -0
  110. package/skills/context-budget/SKILL.md +134 -0
  111. package/skills/continuous-agent-loop/SKILL.md +44 -0
  112. package/skills/continuous-learning/SKILL.md +129 -0
  113. package/skills/continuous-learning/config.json +18 -0
  114. package/skills/continuous-learning/evaluate-session.sh +69 -0
  115. package/skills/continuous-learning-v2/SKILL.md +358 -0
  116. package/skills/continuous-learning-v2/agents/observer-loop.sh +322 -0
  117. package/skills/continuous-learning-v2/agents/observer.md +198 -0
  118. package/skills/continuous-learning-v2/agents/session-guardian.sh +150 -0
  119. package/skills/continuous-learning-v2/agents/start-observer.sh +248 -0
  120. package/skills/continuous-learning-v2/config.json +8 -0
  121. package/skills/continuous-learning-v2/hooks/observe.sh +476 -0
  122. package/skills/continuous-learning-v2/scripts/detect-project.sh +288 -0
  123. package/skills/continuous-learning-v2/scripts/instinct-cli.py +1519 -0
  124. package/skills/continuous-learning-v2/scripts/lib/homunculus-dir.sh +31 -0
  125. package/skills/continuous-learning-v2/scripts/migrate-homunculus.sh +62 -0
  126. package/skills/continuous-learning-v2/scripts/test_parse_instinct.py +1018 -0
  127. package/skills/cost-aware-llm-pipeline/SKILL.md +182 -0
  128. package/skills/cost-tracking/SKILL.md +147 -0
  129. package/skills/council/SKILL.md +202 -0
  130. package/skills/cpp-coding-standards/SKILL.md +722 -0
  131. package/skills/cpp-testing/SKILL.md +323 -0
  132. package/skills/crosspost/SKILL.md +110 -0
  133. package/skills/csharp-testing/SKILL.md +320 -0
  134. package/skills/customer-billing-ops/SKILL.md +139 -0
  135. package/skills/customs-trade-compliance/SKILL.md +262 -0
  136. package/skills/dart-flutter-patterns/SKILL.md +562 -0
  137. package/skills/dashboard-builder/SKILL.md +108 -0
  138. package/skills/data-scraper-agent/SKILL.md +764 -0
  139. package/skills/database-migrations/SKILL.md +428 -0
  140. package/skills/deep-research/SKILL.md +158 -0
  141. package/skills/defi-amm-security/SKILL.md +166 -0
  142. package/skills/deployment-patterns/SKILL.md +426 -0
  143. package/skills/design-system/SKILL.md +81 -0
  144. package/skills/django-celery/SKILL.md +456 -0
  145. package/skills/django-patterns/SKILL.md +733 -0
  146. package/skills/django-security/SKILL.md +592 -0
  147. package/skills/django-tdd/SKILL.md +728 -0
  148. package/skills/django-verification/SKILL.md +468 -0
  149. package/skills/dmux-workflows/SKILL.md +190 -0
  150. package/skills/docker-patterns/SKILL.md +363 -0
  151. package/skills/documentation-lookup/SKILL.md +89 -0
  152. package/skills/dotnet-patterns/SKILL.md +320 -0
  153. package/skills/e2e-testing/SKILL.md +325 -0
  154. package/skills/email-ops/SKILL.md +120 -0
  155. package/skills/energy-procurement/SKILL.md +227 -0
  156. package/skills/enterprise-agent-ops/SKILL.md +49 -0
  157. package/skills/error-handling/SKILL.md +375 -0
  158. package/skills/eval-harness/SKILL.md +269 -0
  159. package/skills/evm-token-decimals/SKILL.md +130 -0
  160. package/skills/exa-search/SKILL.md +106 -0
  161. package/skills/fal-ai-media/SKILL.md +287 -0
  162. package/skills/fastapi-patterns/SKILL.md +327 -0
  163. package/skills/finance-billing-ops/SKILL.md +126 -0
  164. package/skills/flox-environments/SKILL.md +496 -0
  165. package/skills/flutter-dart-code-review/SKILL.md +434 -0
  166. package/skills/foundation-models-on-device/SKILL.md +243 -0
  167. package/skills/frontend-design-direction/SKILL.md +92 -0
  168. package/skills/frontend-patterns/SKILL.md +641 -0
  169. package/skills/frontend-slides/SKILL.md +183 -0
  170. package/skills/frontend-slides/STYLE_PRESETS.md +330 -0
  171. package/skills/frontend-slides/animation-patterns.md +122 -0
  172. package/skills/frontend-slides/html-template.md +419 -0
  173. package/skills/frontend-slides/scripts/export-pdf.sh +418 -0
  174. package/skills/frontend-slides/scripts/extract-pptx.py +96 -0
  175. package/skills/frontend-slides/viewport-base.css +153 -0
  176. package/skills/fsharp-testing/SKILL.md +279 -0
  177. package/skills/gan-style-harness/SKILL.md +278 -0
  178. package/skills/gateguard/SKILL.md +125 -0
  179. package/skills/git-workflow/SKILL.md +714 -0
  180. package/skills/github-ops/SKILL.md +143 -0
  181. package/skills/golang-patterns/SKILL.md +673 -0
  182. package/skills/golang-testing/SKILL.md +719 -0
  183. package/skills/google-workspace-ops/SKILL.md +94 -0
  184. package/skills/healthcare-cdss-patterns/SKILL.md +245 -0
  185. package/skills/healthcare-emr-patterns/SKILL.md +159 -0
  186. package/skills/healthcare-eval-harness/SKILL.md +207 -0
  187. package/skills/healthcare-phi-compliance/SKILL.md +145 -0
  188. package/skills/hermes-imports/SKILL.md +87 -0
  189. package/skills/hexagonal-architecture/SKILL.md +275 -0
  190. package/skills/hipaa-compliance/SKILL.md +78 -0
  191. package/skills/homelab-network-readiness/SKILL.md +169 -0
  192. package/skills/homelab-network-setup/SKILL.md +129 -0
  193. package/skills/homelab-pihole-dns/SKILL.md +274 -0
  194. package/skills/homelab-vlan-segmentation/SKILL.md +311 -0
  195. package/skills/homelab-wireguard-vpn/SKILL.md +305 -0
  196. package/skills/hookify-rules/SKILL.md +128 -0
  197. package/skills/inventory-demand-planning/SKILL.md +246 -0
  198. package/skills/investor-materials/SKILL.md +95 -0
  199. package/skills/investor-outreach/SKILL.md +90 -0
  200. package/skills/ios-icon-gen/SKILL.md +157 -0
  201. package/skills/ios-icon-gen/scripts/generate_icons.swift +258 -0
  202. package/skills/ios-icon-gen/scripts/iconify_gen.sh +235 -0
  203. package/skills/iterative-retrieval/SKILL.md +209 -0
  204. package/skills/java-coding-standards/SKILL.md +382 -0
  205. package/skills/jira-integration/SKILL.md +292 -0
  206. package/skills/jpa-patterns/SKILL.md +150 -0
  207. package/skills/knowledge-ops/SKILL.md +153 -0
  208. package/skills/kotlin-coroutines-flows/SKILL.md +283 -0
  209. package/skills/kotlin-exposed-patterns/SKILL.md +718 -0
  210. package/skills/kotlin-ktor-patterns/SKILL.md +688 -0
  211. package/skills/kotlin-patterns/SKILL.md +710 -0
  212. package/skills/kotlin-testing/SKILL.md +823 -0
  213. package/skills/laravel-patterns/SKILL.md +414 -0
  214. package/skills/laravel-plugin-discovery/SKILL.md +228 -0
  215. package/skills/laravel-security/SKILL.md +284 -0
  216. package/skills/laravel-tdd/SKILL.md +282 -0
  217. package/skills/laravel-verification/SKILL.md +178 -0
  218. package/skills/lead-intelligence/SKILL.md +320 -0
  219. package/skills/lead-intelligence/agents/enrichment-agent.md +85 -0
  220. package/skills/lead-intelligence/agents/mutual-mapper.md +75 -0
  221. package/skills/lead-intelligence/agents/outreach-drafter.md +98 -0
  222. package/skills/lead-intelligence/agents/signal-scorer.md +60 -0
  223. package/skills/liquid-glass-design/SKILL.md +279 -0
  224. package/skills/llm-trading-agent-security/SKILL.md +146 -0
  225. package/skills/logistics-exception-management/SKILL.md +221 -0
  226. package/skills/make-interfaces-feel-better/SKILL.md +151 -0
  227. package/skills/manim-video/SKILL.md +88 -0
  228. package/skills/manim-video/assets/network_graph_scene.py +52 -0
  229. package/skills/market-research/SKILL.md +74 -0
  230. package/skills/mcp-server-patterns/SKILL.md +68 -0
  231. package/skills/messages-ops/SKILL.md +103 -0
  232. package/skills/mle-workflow/SKILL.md +345 -0
  233. package/skills/motion-advanced/SKILL.md +596 -0
  234. package/skills/motion-foundations/SKILL.md +299 -0
  235. package/skills/motion-patterns/SKILL.md +435 -0
  236. package/skills/motion-ui/SKILL.md +574 -0
  237. package/skills/mysql-patterns/SKILL.md +411 -0
  238. package/skills/nanoclaw-repl/SKILL.md +32 -0
  239. package/skills/nestjs-patterns/SKILL.md +229 -0
  240. package/skills/netmiko-ssh-automation/SKILL.md +173 -0
  241. package/skills/network-bgp-diagnostics/SKILL.md +167 -0
  242. package/skills/network-config-validation/SKILL.md +210 -0
  243. package/skills/network-interface-health/SKILL.md +152 -0
  244. package/skills/nextjs-turbopack/SKILL.md +43 -0
  245. package/skills/nodejs-keccak256/SKILL.md +102 -0
  246. package/skills/nutrient-document-processing/SKILL.md +166 -0
  247. package/skills/nuxt4-patterns/SKILL.md +99 -0
  248. package/skills/openclaw-persona-forge/SKILL.md +288 -0
  249. package/skills/openclaw-persona-forge/gacha.py +224 -0
  250. package/skills/openclaw-persona-forge/gacha.sh +5 -0
  251. package/skills/openclaw-persona-forge/references/avatar-style.md +124 -0
  252. package/skills/openclaw-persona-forge/references/boundary-rules.md +53 -0
  253. package/skills/openclaw-persona-forge/references/error-handling.md +53 -0
  254. package/skills/openclaw-persona-forge/references/identity-tension.md +48 -0
  255. package/skills/openclaw-persona-forge/references/naming-system.md +39 -0
  256. package/skills/openclaw-persona-forge/references/output-template.md +166 -0
  257. package/skills/opensource-pipeline/SKILL.md +254 -0
  258. package/skills/perl-patterns/SKILL.md +503 -0
  259. package/skills/perl-security/SKILL.md +502 -0
  260. package/skills/perl-testing/SKILL.md +474 -0
  261. package/skills/plan-orchestrate/SKILL.md +253 -0
  262. package/skills/plankton-code-quality/SKILL.md +236 -0
  263. package/skills/postgres-patterns/SKILL.md +146 -0
  264. package/skills/product-capability/SKILL.md +140 -0
  265. package/skills/product-lens/SKILL.md +91 -0
  266. package/skills/production-audit/SKILL.md +206 -0
  267. package/skills/production-scheduling/SKILL.md +237 -0
  268. package/skills/project-flow-ops/SKILL.md +110 -0
  269. package/skills/prompt-optimizer/SKILL.md +398 -0
  270. package/skills/python-patterns/SKILL.md +749 -0
  271. package/skills/python-testing/SKILL.md +815 -0
  272. package/skills/pytorch-patterns/SKILL.md +395 -0
  273. package/skills/quality-nonconformance/SKILL.md +259 -0
  274. package/skills/quarkus-patterns/SKILL.md +721 -0
  275. package/skills/quarkus-security/SKILL.md +466 -0
  276. package/skills/quarkus-tdd/SKILL.md +810 -0
  277. package/skills/quarkus-verification/SKILL.md +478 -0
  278. package/skills/ralphinho-rfc-pipeline/SKILL.md +66 -0
  279. package/skills/redis-patterns/SKILL.md +402 -0
  280. package/skills/regex-vs-llm-structured-text/SKILL.md +219 -0
  281. package/skills/remotion-video-creation/SKILL.md +43 -0
  282. package/skills/remotion-video-creation/rules/3d.md +86 -0
  283. package/skills/remotion-video-creation/rules/animations.md +29 -0
  284. package/skills/remotion-video-creation/rules/assets/charts-bar-chart.tsx +173 -0
  285. package/skills/remotion-video-creation/rules/assets/text-animations-typewriter.tsx +100 -0
  286. package/skills/remotion-video-creation/rules/assets/text-animations-word-highlight.tsx +108 -0
  287. package/skills/remotion-video-creation/rules/assets.md +78 -0
  288. package/skills/remotion-video-creation/rules/audio.md +172 -0
  289. package/skills/remotion-video-creation/rules/calculate-metadata.md +104 -0
  290. package/skills/remotion-video-creation/rules/can-decode.md +75 -0
  291. package/skills/remotion-video-creation/rules/charts.md +58 -0
  292. package/skills/remotion-video-creation/rules/compositions.md +146 -0
  293. package/skills/remotion-video-creation/rules/display-captions.md +126 -0
  294. package/skills/remotion-video-creation/rules/extract-frames.md +229 -0
  295. package/skills/remotion-video-creation/rules/fonts.md +152 -0
  296. package/skills/remotion-video-creation/rules/get-audio-duration.md +58 -0
  297. package/skills/remotion-video-creation/rules/get-video-dimensions.md +68 -0
  298. package/skills/remotion-video-creation/rules/get-video-duration.md +58 -0
  299. package/skills/remotion-video-creation/rules/gifs.md +138 -0
  300. package/skills/remotion-video-creation/rules/images.md +130 -0
  301. package/skills/remotion-video-creation/rules/import-srt-captions.md +67 -0
  302. package/skills/remotion-video-creation/rules/lottie.md +67 -0
  303. package/skills/remotion-video-creation/rules/measuring-dom-nodes.md +34 -0
  304. package/skills/remotion-video-creation/rules/measuring-text.md +143 -0
  305. package/skills/remotion-video-creation/rules/sequencing.md +106 -0
  306. package/skills/remotion-video-creation/rules/tailwind.md +11 -0
  307. package/skills/remotion-video-creation/rules/text-animations.md +20 -0
  308. package/skills/remotion-video-creation/rules/timing.md +179 -0
  309. package/skills/remotion-video-creation/rules/transcribe-captions.md +19 -0
  310. package/skills/remotion-video-creation/rules/transitions.md +122 -0
  311. package/skills/remotion-video-creation/rules/trimming.md +52 -0
  312. package/skills/remotion-video-creation/rules/videos.md +171 -0
  313. package/skills/repo-scan/SKILL.md +78 -0
  314. package/skills/research-ops/SKILL.md +111 -0
  315. package/skills/returns-reverse-logistics/SKILL.md +239 -0
  316. package/skills/rules-distill/SKILL.md +263 -0
  317. package/skills/rules-distill/scripts/scan-rules.sh +58 -0
  318. package/skills/rules-distill/scripts/scan-skills.sh +129 -0
  319. package/skills/rust-patterns/SKILL.md +498 -0
  320. package/skills/rust-testing/SKILL.md +499 -0
  321. package/skills/safety-guard/SKILL.md +74 -0
  322. package/skills/santa-method/SKILL.md +306 -0
  323. package/skills/scientific-db-pubmed-database/SKILL.md +175 -0
  324. package/skills/scientific-db-uspto-database/SKILL.md +177 -0
  325. package/skills/scientific-pkg-gget/SKILL.md +166 -0
  326. package/skills/scientific-thinking-literature-review/SKILL.md +192 -0
  327. package/skills/scientific-thinking-scholar-evaluation/SKILL.md +160 -0
  328. package/skills/search-first/SKILL.md +181 -0
  329. package/skills/security-bounty-hunter/SKILL.md +99 -0
  330. package/skills/security-review/SKILL.md +502 -0
  331. package/skills/security-review/cloud-infrastructure-security.md +361 -0
  332. package/skills/seo/SKILL.md +153 -0
  333. package/skills/skill-comply/SKILL.md +57 -0
  334. package/skills/skill-comply/fixtures/compliant_trace.jsonl +5 -0
  335. package/skills/skill-comply/fixtures/noncompliant_trace.jsonl +3 -0
  336. package/skills/skill-comply/fixtures/tdd_spec.yaml +44 -0
  337. package/skills/skill-comply/prompts/classifier.md +24 -0
  338. package/skills/skill-comply/prompts/scenario_generator.md +62 -0
  339. package/skills/skill-comply/prompts/spec_generator.md +42 -0
  340. package/skills/skill-comply/pyproject.toml +15 -0
  341. package/skills/skill-comply/scripts/__init__.py +0 -0
  342. package/skills/skill-comply/scripts/classifier.py +85 -0
  343. package/skills/skill-comply/scripts/grader.py +124 -0
  344. package/skills/skill-comply/scripts/parser.py +107 -0
  345. package/skills/skill-comply/scripts/report.py +170 -0
  346. package/skills/skill-comply/scripts/run.py +127 -0
  347. package/skills/skill-comply/scripts/runner.py +186 -0
  348. package/skills/skill-comply/scripts/scenario_generator.py +70 -0
  349. package/skills/skill-comply/scripts/spec_generator.py +72 -0
  350. package/skills/skill-comply/scripts/utils.py +13 -0
  351. package/skills/skill-comply/tests/test_grader.py +197 -0
  352. package/skills/skill-comply/tests/test_parser.py +90 -0
  353. package/skills/skill-comply/tests/test_runner.py +172 -0
  354. package/skills/skill-scout/SKILL.md +139 -0
  355. package/skills/skill-stocktake/SKILL.md +193 -0
  356. package/skills/skill-stocktake/scripts/quick-diff.sh +87 -0
  357. package/skills/skill-stocktake/scripts/save-results.sh +56 -0
  358. package/skills/skill-stocktake/scripts/scan.sh +170 -0
  359. package/skills/social-graph-ranker/SKILL.md +153 -0
  360. package/skills/springboot-patterns/SKILL.md +313 -0
  361. package/skills/springboot-security/SKILL.md +271 -0
  362. package/skills/springboot-tdd/SKILL.md +157 -0
  363. package/skills/springboot-verification/SKILL.md +230 -0
  364. package/skills/strategic-compact/SKILL.md +129 -0
  365. package/skills/strategic-compact/suggest-compact.sh +54 -0
  366. package/skills/swift-actor-persistence/SKILL.md +142 -0
  367. package/skills/swift-concurrency-6-2/SKILL.md +216 -0
  368. package/skills/swift-protocol-di-testing/SKILL.md +189 -0
  369. package/skills/swiftui-patterns/SKILL.md +259 -0
  370. package/skills/tdd-workflow/SKILL.md +462 -0
  371. package/skills/team-builder/SKILL.md +166 -0
  372. package/skills/terminal-ops/SKILL.md +108 -0
  373. package/skills/tinystruct-patterns/SKILL.md +130 -0
  374. package/skills/tinystruct-patterns/references/architecture.md +77 -0
  375. package/skills/tinystruct-patterns/references/data-handling.md +35 -0
  376. package/skills/tinystruct-patterns/references/routing.md +57 -0
  377. package/skills/tinystruct-patterns/references/system-usage.md +74 -0
  378. package/skills/tinystruct-patterns/references/testing.md +59 -0
  379. package/skills/token-budget-advisor/SKILL.md +133 -0
  380. package/skills/ui-demo/SKILL.md +464 -0
  381. package/skills/ui-to-vue/SKILL.md +134 -0
  382. package/skills/unified-notifications-ops/SKILL.md +186 -0
  383. package/skills/verification-loop/SKILL.md +125 -0
  384. package/skills/video-editing/SKILL.md +309 -0
  385. package/skills/videodb/SKILL.md +373 -0
  386. package/skills/videodb/reference/api-reference.md +550 -0
  387. package/skills/videodb/reference/capture-reference.md +407 -0
  388. package/skills/videodb/reference/capture.md +101 -0
  389. package/skills/videodb/reference/editor.md +443 -0
  390. package/skills/videodb/reference/generative.md +331 -0
  391. package/skills/videodb/reference/rtstream-reference.md +564 -0
  392. package/skills/videodb/reference/rtstream.md +65 -0
  393. package/skills/videodb/reference/search.md +230 -0
  394. package/skills/videodb/reference/streaming.md +406 -0
  395. package/skills/videodb/reference/use-cases.md +118 -0
  396. package/skills/videodb/scripts/ws_listener.py +282 -0
  397. package/skills/visa-doc-translate/README.md +86 -0
  398. package/skills/visa-doc-translate/SKILL.md +117 -0
  399. package/skills/vite-patterns/SKILL.md +448 -0
  400. package/skills/windows-desktop-e2e/SKILL.md +787 -0
  401. package/skills/workspace-surface-audit/SKILL.md +124 -0
  402. package/skills/x-api/SKILL.md +233 -0
@@ -0,0 +1,375 @@
+ ---
+ name: error-handling
+ description: Patterns for robust error handling across TypeScript, Python, and Go. Covers typed errors, error boundaries, retries, circuit breakers, and user-facing error messages.
+ ---
+
+ # Error Handling Patterns
+
+ Consistent, robust error handling patterns for production applications.
+
+ ## When to Activate
+
+ - Designing error types or exception hierarchies for a new module or service
+ - Adding retry logic or circuit breakers for unreliable external dependencies
+ - Reviewing API endpoints for missing error handling
+ - Implementing user-facing error messages and feedback
+ - Debugging cascading failures or silent error swallowing
+
+ ## Core Principles
+
+ 1. **Fail fast and loudly**: surface errors at the boundary where they occur; don't bury them
+ 2. **Typed errors over string messages**: errors are first-class values with structure
+ 3. **User messages ≠ developer messages**: show friendly text to users, log full context server-side
+ 4. **Never swallow errors silently**: every `catch` block must either handle, re-throw, or log
+ 5. **Errors are part of your API contract**: document every error code a client may receive
+
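A minimal sketch of principle 4 in Python, one of the three languages this skill covers. `ConfigError`, `parse_port`, and the logger name are hypothetical names chosen for illustration; they do not come from the skill itself. The `except` block logs the context and re-raises a typed error, so the failure is never swallowed:

```python
import logging

logger = logging.getLogger("error_handling_example")  # hypothetical logger name


class ConfigError(Exception):
    """Hypothetical typed error used only for this illustration."""


def parse_port(raw: str) -> int:
    """Convert a raw setting to a port number, failing loudly on bad input."""
    try:
        port = int(raw)
    except ValueError as exc:
        # Log the context, then re-raise as a typed error.
        # `from exc` keeps the original exception attached as __cause__.
        logger.error("Invalid port value %r", raw)
        raise ConfigError(f"port must be an integer, got {raw!r}") from exc
    if not 0 < port < 65536:
        # Fail fast at the boundary instead of passing a bad value along.
        raise ConfigError(f"port out of range: {port}")
    return port
```

Callers can catch `ConfigError` for a friendly message, while the chained `__cause__` still carries the original `ValueError` for debugging.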
+ ## TypeScript / JavaScript
+
+ ### Typed Error Classes
+
+ ```typescript
+ // Define an error hierarchy for your domain
+ export class AppError extends Error {
+   constructor(
+     message: string,
+     public readonly code: string,
+     public readonly statusCode: number = 500,
+     public readonly details?: unknown,
+   ) {
+     super(message)
+     this.name = this.constructor.name
+     // Maintain correct prototype chain in transpiled ES5 JavaScript.
+     // Required for `instanceof` checks (e.g., `error instanceof NotFoundError`)
+     // to work correctly when extending the built-in Error class.
+     Object.setPrototypeOf(this, new.target.prototype)
+   }
+ }
+
+ export class NotFoundError extends AppError {
+   constructor(resource: string, id: string) {
+     super(`${resource} not found: ${id}`, 'NOT_FOUND', 404)
+   }
+ }
+
+ export class ValidationError extends AppError {
+   constructor(message: string, details: { field: string; message: string }[]) {
+     super(message, 'VALIDATION_ERROR', 422, details)
+   }
+ }
+
+ export class UnauthorizedError extends AppError {
+   constructor(reason = 'Authentication required') {
+     super(reason, 'UNAUTHORIZED', 401)
+   }
+ }
+
+ export class RateLimitError extends AppError {
+   constructor(public readonly retryAfterMs: number) {
+     super('Rate limit exceeded', 'RATE_LIMITED', 429)
+   }
+ }
+ ```
+
+ ### Result Pattern (no-throw style)
+
+ For operations where failure is expected and common (parsing, external calls):
+
+ ```typescript
+ type Result<T, E = AppError> =
+   | { ok: true; value: T }
+   | { ok: false; error: E }
+
+ function ok<T>(value: T): Result<T> {
+   return { ok: true, value }
+ }
+
+ function err<E>(error: E): Result<never, E> {
+   return { ok: false, error }
+ }
+
+ // Usage
+ async function fetchUser(id: string): Promise<Result<User>> {
+   try {
+     const user = await db.users.findUnique({ where: { id } })
+     if (!user) return err(new NotFoundError('User', id))
+     return ok(user)
+   } catch (e) {
+     // Log the original failure before converting it, so the cause is not lost
+     logger.error('Database error while fetching user', { id, cause: e })
+     return err(new AppError('Database error', 'DB_ERROR'))
+   }
+ }
+
+ const result = await fetchUser('abc-123')
+ if (!result.ok) {
+   // TypeScript knows result.error here
+   logger.error('Failed to fetch user', { error: result.error })
+   return
+ }
+ // TypeScript knows result.value here
+ console.log(result.value.email)
+ ```
+
+ ### API Error Handler (Next.js / Express)
+
+ ```typescript
+ import { NextRequest, NextResponse } from 'next/server'
+ import { z } from 'zod'
+
+ function handleApiError(error: unknown): NextResponse {
+   // Known application error
+   if (error instanceof AppError) {
+     return NextResponse.json(
+       {
+         error: {
+           code: error.code,
+           message: error.message,
+           ...(error.details ? { details: error.details } : {}),
+         },
+       },
+       { status: error.statusCode },
+     )
+   }
+
+   // Zod validation error
+   if (error instanceof z.ZodError) {
+     return NextResponse.json(
+       {
+         error: {
+           code: 'VALIDATION_ERROR',
+           message: 'Request validation failed',
+           details: error.issues.map(i => ({
+             field: i.path.join('.'),
+             message: i.message,
+           })),
+         },
+       },
+       { status: 422 },
+     )
+   }
+
+   // Unexpected error: log details, return generic message
+   console.error('Unexpected error:', error)
+   return NextResponse.json(
+     { error: { code: 'INTERNAL_ERROR', message: 'An unexpected error occurred' } },
+     { status: 500 },
+   )
+ }
+
+ export async function POST(req: NextRequest) {
+   try {
+     // ... handler logic
+   } catch (error) {
+     return handleApiError(error)
+   }
+ }
+ ```
+
+ ### React Error Boundary
+
+ ```tsx
+ import { Component, ErrorInfo, ReactNode } from 'react'
+
+ interface Props {
+   fallback: ReactNode
+   onError?: (error: Error, info: ErrorInfo) => void
+   children: ReactNode
+ }
+
+ interface State {
+   hasError: boolean
+   error: Error | null
+ }
+
+ export class ErrorBoundary extends Component<Props, State> {
+   state: State = { hasError: false, error: null }
+
+   static getDerivedStateFromError(error: Error): State {
+     return { hasError: true, error }
+   }
+
+   componentDidCatch(error: Error, info: ErrorInfo) {
+     this.props.onError?.(error, info)
+     console.error('Unhandled React error:', error, info)
+   }
+
+   render() {
+     if (this.state.hasError) return this.props.fallback
+     return this.props.children
+   }
+ }
+
+ // Usage
+ <ErrorBoundary fallback={<p>Something went wrong. Please refresh.</p>}>
+   <MyComponent />
+ </ErrorBoundary>
+ ```
+
+ ## Python
+
+ ### Custom Exception Hierarchy
+
+ ```python
+ class AppError(Exception):
+     """Base application error."""
+     def __init__(self, message: str, code: str, status_code: int = 500):
+         super().__init__(message)
+         self.code = code
+         self.status_code = status_code
+
+ class NotFoundError(AppError):
+     def __init__(self, resource: str, id: str):
+         super().__init__(f"{resource} not found: {id}", "NOT_FOUND", 404)
+
+ class ValidationError(AppError):
+     def __init__(self, message: str, details: list[dict] | None = None):
+         super().__init__(message, "VALIDATION_ERROR", 422)
+         self.details = details or []
+ ```
+
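A framework-free sketch of how a hierarchy like the one above might be turned into response payloads. `to_response` is a hypothetical helper, not part of the skill, and the classes are repeated in compact form only so the block runs on its own. Known `AppError` subclasses keep their code and status; anything else collapses to a generic 500 so internal details are not exposed:

```python
class AppError(Exception):
    """Base application error (compact copy for a self-contained example)."""
    def __init__(self, message, code, status_code=500):
        super().__init__(message)
        self.code = code
        self.status_code = status_code


class NotFoundError(AppError):
    def __init__(self, resource, id):
        super().__init__(f"{resource} not found: {id}", "NOT_FOUND", 404)


class ValidationError(AppError):
    def __init__(self, message, details=None):
        super().__init__(message, "VALIDATION_ERROR", 422)
        self.details = details or []


def to_response(exc):
    """Hypothetical helper: map an exception to (status_code, JSON-ready body)."""
    if isinstance(exc, AppError):
        body = {"code": exc.code, "message": str(exc)}
        # Only ValidationError carries details in this sketch.
        details = getattr(exc, "details", None)
        if details:
            body["details"] = details
        return exc.status_code, {"error": body}
    # Unknown errors: return a generic message, never the raw exception text.
    return 500, {"error": {"code": "INTERNAL_ERROR",
                           "message": "An unexpected error occurred"}}


status, body = to_response(NotFoundError("User", "42"))
print(status, body)
```

Checking `isinstance(exc, AppError)` once at the boundary means new subclasses get a sensible response without changes to the mapping code.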
+ ### FastAPI Global Exception Handler
+
+ ```python
+ import logging
+
+ from fastapi import FastAPI, Request
+ from fastapi.responses import JSONResponse
+
+ # AppError is the base class from the exception hierarchy above
+ logger = logging.getLogger(__name__)
+
+ app = FastAPI()
+
+ @app.exception_handler(AppError)
+ async def app_error_handler(request: Request, exc: AppError) -> JSONResponse:
+     return JSONResponse(
+         status_code=exc.status_code,
+         content={"error": {"code": exc.code, "message": str(exc)}},
+     )
+
+ @app.exception_handler(Exception)
+ async def generic_error_handler(request: Request, exc: Exception) -> JSONResponse:
+     # Log full details, return generic message
+     logger.exception("Unexpected error", exc_info=exc)
+     return JSONResponse(
+         status_code=500,
+         content={"error": {"code": "INTERNAL_ERROR", "message": "An unexpected error occurred"}},
+     )
+ ```
+
+ ## Go
+
+ ### Sentinel Errors and Error Wrapping
+
+ ```go
+ package domain
+
+ import (
+ 	"context"
+ 	"database/sql"
+ 	"errors"
+ 	"fmt"
+ )
+
+ // Sentinel errors, matched by callers with errors.Is
+ var (
+ 	ErrNotFound     = errors.New("not found")
+ 	ErrUnauthorized = errors.New("unauthorized")
+ 	ErrConflict     = errors.New("conflict")
+ )
+
+ // Wrap errors with context; never lose the original
+ func (r *UserRepository) FindByID(ctx context.Context, id string) (*User, error) {
+ 	// Assumes r.db is a *sql.DB and User has ID and Name fields (illustrative)
+ 	user := &User{}
+ 	err := r.db.QueryRowContext(ctx, "SELECT id, name FROM users WHERE id = $1", id).
+ 		Scan(&user.ID, &user.Name)
+ 	if errors.Is(err, sql.ErrNoRows) {
+ 		return nil, fmt.Errorf("user %s: %w", id, ErrNotFound)
+ 	}
+ 	if err != nil {
+ 		return nil, fmt.Errorf("querying user %s: %w", id, err)
+ 	}
+ 	return user, nil
+ }
+
+ // At the handler level, unwrap to determine response
+ // (separate handler package; its net/http, log/slog, and chi imports are omitted)
+ func (h *Handler) GetUser(w http.ResponseWriter, r *http.Request) {
+ 	user, err := h.service.GetUser(r.Context(), chi.URLParam(r, "id"))
+ 	if err != nil {
+ 		switch {
+ 		case errors.Is(err, domain.ErrNotFound):
+ 			writeError(w, http.StatusNotFound, "not_found", err.Error())
+ 		case errors.Is(err, domain.ErrUnauthorized):
+ 			writeError(w, http.StatusForbidden, "forbidden", "Access denied")
+ 		default:
+ 			slog.Error("unexpected error", "err", err)
+ 			writeError(w, http.StatusInternalServerError, "internal_error", "An unexpected error occurred")
+ 		}
+ 		return
+ 	}
+ 	writeJSON(w, http.StatusOK, user)
+ }
+ ```
+
+ ## Retry with Exponential Backoff
+
+ ```typescript
+ interface RetryOptions {
+   maxAttempts?: number
+   baseDelayMs?: number
+   maxDelayMs?: number
+   retryIf?: (error: unknown) => boolean
+ }
+
+ async function withRetry<T>(
+   fn: () => Promise<T>,
+   options: RetryOptions = {},
+ ): Promise<T> {
+   const {
+     maxAttempts = 3,
+     baseDelayMs = 500,
+     maxDelayMs = 10_000,
+     retryIf = () => true,
+   } = options
+
+   let lastError: unknown
+
+   for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+     try {
+       return await fn()
+     } catch (error) {
+       lastError = error
+       if (attempt === maxAttempts || !retryIf(error)) throw error
+
+       const jitter = Math.random() * baseDelayMs
+       const delay = Math.min(baseDelayMs * 2 ** (attempt - 1) + jitter, maxDelayMs)
+       await new Promise(resolve => setTimeout(resolve, delay))
+     }
+   }
+
+   throw lastError
+ }
+
+ // Usage: retry transient network errors, not 4xx
+ // Note: fetch() resolves on 4xx/5xx responses, so fn must itself throw an
+ // AppError (for example after checking r.ok) for retryIf to see a status code.
+ const data = await withRetry(() => fetch('/api/data').then(r => r.json()), {
+   maxAttempts: 3,
+   retryIf: (error) => !(error instanceof AppError && error.statusCode < 500),
+ })
+ ```
+
+ ## User-Facing Error Messages
+
+ Map error codes to human-readable messages. Keep technical details out of user-visible text.
+
+ ```typescript
+ const USER_ERROR_MESSAGES: Record<string, string> = {
+   NOT_FOUND: 'The requested item could not be found.',
+   UNAUTHORIZED: 'Please sign in to continue.',
+   FORBIDDEN: "You don't have permission to do that.",
+   VALIDATION_ERROR: 'Please check your input and try again.',
+   RATE_LIMITED: 'Too many requests. Please wait a moment and try again.',
+   INTERNAL_ERROR: 'Something went wrong on our end. Please try again later.',
+ }
+
+ export function getUserMessage(code: string): string {
+   return USER_ERROR_MESSAGES[code] ?? USER_ERROR_MESSAGES.INTERNAL_ERROR
+ }
+ ```
+
+ ## Error Handling Checklist
+
+ Before merging any code that touches error handling:
+
+ - [ ] Every `catch` block handles, re-throws, or logs; no silent swallowing
+ - [ ] API errors follow the standard envelope `{ error: { code, message } }`
+ - [ ] User-facing messages contain no stack traces or internal details
+ - [ ] Full error context is logged server-side
+ - [ ] Custom error classes extend a base `AppError` with a `code` field
+ - [ ] Async functions surface errors to callers; no fire-and-forget without fallback
+ - [ ] Retry logic only retries retriable errors (not 4xx client errors)
+ - [ ] React components are wrapped in `ErrorBoundary` for rendering errors
@@ -0,0 +1,269 @@
+ ---
+ name: eval-harness
+ description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
+ tools: Read, Write, Edit, Bash, Grep, Glob
+ ---
+
+ # Eval Harness Skill
+
+ A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
+
+ ## When to Activate
+
+ - Setting up eval-driven development (EDD) for AI-assisted workflows
+ - Defining pass/fail criteria for Claude Code task completion
+ - Measuring agent reliability with pass@k metrics
+ - Creating regression test suites for prompt or agent changes
+ - Benchmarking agent performance across model versions
+
+ ## Philosophy
+
+ Eval-Driven Development treats evals as the "unit tests of AI development":
+ - Define expected behavior BEFORE implementation
+ - Run evals continuously during development
+ - Track regressions with each change
+ - Use pass@k metrics for reliability measurement
+
+ ## Eval Types
+
+ ### Capability Evals
+ Test if Claude can do something it couldn't before:
+ ```markdown
+ [CAPABILITY EVAL: feature-name]
+ Task: Description of what Claude should accomplish
+ Success Criteria:
+   - [ ] Criterion 1
+   - [ ] Criterion 2
+   - [ ] Criterion 3
+ Expected Output: Description of expected result
+ ```
+
+ ### Regression Evals
+ Ensure changes don't break existing functionality:
+ ```markdown
+ [REGRESSION EVAL: feature-name]
+ Baseline: SHA or checkpoint name
+ Tests:
+   - existing-test-1: PASS/FAIL
+   - existing-test-2: PASS/FAIL
+   - existing-test-3: PASS/FAIL
+ Result: X/Y passed (previously Y/Y)
+ ```
+
+ ## Grader Types
+
+ ### 1. Code-Based Grader
+ Deterministic checks using code:
+ ```bash
+ # Check if file contains expected pattern
+ grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
+
+ # Check if tests pass
+ npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
+
+ # Check if build succeeds
+ npm run build && echo "PASS" || echo "FAIL"
+ ```
+
+ ### 2. Model-Based Grader
+ Use Claude to evaluate open-ended outputs:
+ ```markdown
+ [MODEL GRADER PROMPT]
+ Evaluate the following code change:
+ 1. Does it solve the stated problem?
+ 2. Is it well-structured?
+ 3. Are edge cases handled?
+ 4. Is error handling appropriate?
+
+ Score: 1-5 (1=poor, 5=excellent)
+ Reasoning: [explanation]
+ ```
+
+ ### 3. Human Grader
+ Flag for manual review:
+ ```markdown
+ [HUMAN REVIEW REQUIRED]
+ Change: Description of what changed
+ Reason: Why human review is needed
+ Risk Level: LOW/MEDIUM/HIGH
+ ```
+
+ ## Metrics
+
+ ### pass@k
+ "At least one success in k attempts"
+ - pass@1: First attempt success rate
+ - pass@3: Success within 3 attempts
+ - Typical target: pass@3 > 90%
+
+ ### pass^k
+ "All k trials succeed"
+ - Higher bar for reliability
+ - pass^3: 3 consecutive successes
+ - Use for critical paths
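
Both metrics can be computed directly from recorded attempt outcomes. The following is a minimal sketch, assuming each eval's attempts are stored in order as a list of booleans; the function names `pass_at_k` and `pass_pow_k` and the `history` layout are illustrative and not part of the skill. It reports the empirical fraction of evals that meet each bar, not an unbiased statistical estimator.

```python
from typing import Mapping, Sequence


def pass_at_k(trials: Mapping[str, Sequence[bool]], k: int) -> float:
    """Fraction of evals with at least one success in the first k attempts."""
    if not trials:
        raise ValueError("no trials recorded")
    hits = sum(1 for outcomes in trials.values() if any(outcomes[:k]))
    return hits / len(trials)


def pass_pow_k(trials: Mapping[str, Sequence[bool]], k: int) -> float:
    """Fraction of evals where all of the first k attempts succeed (pass^k)."""
    if not trials:
        raise ValueError("no trials recorded")
    hits = sum(
        1
        for outcomes in trials.values()
        # Fewer than k recorded attempts cannot demonstrate k successes.
        if len(outcomes) >= k and all(outcomes[:k])
    )
    return hits / len(trials)


# Hypothetical run history: one ordered list of attempt outcomes per eval.
history = {
    "create-user": [True, True, True],
    "validate-email": [False, True, True],
    "hash-password": [True, True, True],
}

print(f"pass@1: {pass_at_k(history, 1):.0%}")
print(f"pass@3: {pass_at_k(history, 3):.0%}")
print(f"pass^3: {pass_pow_k(history, 3):.0%}")
```

Because `validate-email` fails its first attempt, it counts toward pass@3 but not toward pass@1 or pass^3, which is why pass^k is the stricter bar for critical paths.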
+
+ ## Eval Workflow
+
+ ### 1. Define (Before Coding)
+ ```markdown
+ ## EVAL DEFINITION: feature-xyz
+
+ ### Capability Evals
+ 1. Can create new user account
+ 2. Can validate email format
+ 3. Can hash password securely
+
+ ### Regression Evals
+ 1. Existing login still works
+ 2. Session management unchanged
+ 3. Logout flow intact
+
+ ### Success Metrics
+ - pass@3 > 90% for capability evals
+ - pass^3 = 100% for regression evals
+ ```
+
+ ### 2. Implement
+ Write code to pass the defined evals.
+
+ ### 3. Evaluate
+ ```bash
+ # Run capability evals
+ # (run each capability eval and record PASS/FAIL)
+
+ # Run regression evals
+ npm test -- --testPathPattern="existing"
+
+ # Generate report
+ ```
+
+ ### 4. Report
+ ```markdown
+ EVAL REPORT: feature-xyz
+ ========================
+
+ Capability Evals:
+   create-user:     PASS (pass@1)
+   validate-email:  PASS (pass@2)
+   hash-password:   PASS (pass@1)
+   Overall:         3/3 passed
+
+ Regression Evals:
+   login-flow:      PASS
+   session-mgmt:    PASS
+   logout-flow:     PASS
+   Overall:         3/3 passed
+
+ Metrics:
+   pass@1: 67% (2/3)
+   pass@3: 100% (3/3)
+
+ Status: READY FOR REVIEW
+ ```
+
+ ## Integration Patterns
+
+ ### Pre-Implementation
+ ```
+ /eval define feature-name
+ ```
+ Creates eval definition file at `.claude/evals/feature-name.md`
+
+ ### During Implementation
+ ```
+ /eval check feature-name
+ ```
+ Runs current evals and reports status
+
+ ### Post-Implementation
+ ```
+ /eval report feature-name
+ ```
+ Generates full eval report
+
+ ## Eval Storage
+
+ Store evals in project:
+ ```
+ .claude/
+   evals/
+     feature-xyz.md      # Eval definition
+     feature-xyz.log     # Eval run history
+     baseline.json       # Regression baselines
+ ```
+
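The skill does not specify a format for the `.log` run history. As one possible approach, the sketch below appends a JSON record per run; the function name `append_run`, the JSON Lines format, and the record fields are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_run(evals_dir: Path, feature: str, results: dict[str, bool]) -> Path:
    """Append one run record to <evals_dir>/<feature>.log and return the log path."""
    evals_dir.mkdir(parents=True, exist_ok=True)
    log_path = evals_dir / f"{feature}.log"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "feature": feature,
        "results": results,
        "passed": sum(results.values()),
        "total": len(results),
    }
    # One JSON object per line keeps the history append-only and easy to grep.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return log_path
```

A call such as `append_run(Path(".claude/evals"), "feature-xyz", {"login-flow": True})` would add one line to `.claude/evals/feature-xyz.log`, leaving earlier runs untouched.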
+ ## Best Practices
+
+ 1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
+ 2. **Run evals frequently** - Catch regressions early
+ 3. **Track pass@k over time** - Monitor reliability trends
+ 4. **Use code graders when possible** - Deterministic > probabilistic
+ 5. **Human review for security** - Never fully automate security checks
+ 6. **Keep evals fast** - Slow evals don't get run
+ 7. **Version evals with code** - Evals are first-class artifacts
+
+ ## Example: Adding Authentication
+
+ ```markdown
+ ## EVAL: add-authentication
+
+ ### Phase 1: Define (10 min)
+ Capability Evals:
+ - [ ] User can register with email/password
+ - [ ] User can login with valid credentials
+ - [ ] Invalid credentials rejected with proper error
+ - [ ] Sessions persist across page reloads
+ - [ ] Logout clears session
+
+ Regression Evals:
+ - [ ] Public routes still accessible
+ - [ ] API responses unchanged
+ - [ ] Database schema compatible
+
+ ### Phase 2: Implement (varies)
+ [Write code]
+
+ ### Phase 3: Evaluate
+ Run: /eval check add-authentication
+
+ ### Phase 4: Report
+ EVAL REPORT: add-authentication
+ ==============================
+ Capability: 5/5 passed (pass@3: 100%)
+ Regression: 3/3 passed (pass^3: 100%)
+ Status: SHIP IT
+ ```
+
+ ## Product Evals (v1.8)
+
+ Use product evals when behavior quality cannot be captured by unit tests alone.
+
+ ### Grader Types
+
+ 1. Code grader (deterministic assertions)
+ 2. Rule grader (regex/schema constraints)
+ 3. Model grader (LLM-as-judge rubric)
+ 4. Human grader (manual adjudication for ambiguous outputs)
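
A rule grader sits between a code grader and a model grader: it checks the shape of an output without executing anything. The sketch below is one way this could look, assuming the output under test is a JSON object with `code` and `message` fields; that schema, the `CODE_PATTERN` regex, and the `rule_grade` name are hypothetical and chosen only for illustration.

```python
import json
import re

# Hypothetical rule: "code" must be an UPPER_SNAKE_CASE string and
# "message" must be a non-empty string.
CODE_PATTERN = re.compile(r"[A-Z][A-Z0-9_]*")


def rule_grade(output: str) -> tuple[bool, str]:
    """Return (passed, reason) for a single model output."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    code = data.get("code")
    if not isinstance(code, str) or not CODE_PATTERN.fullmatch(code):
        return False, "code must be an UPPER_SNAKE_CASE string"
    message = data.get("message")
    if not isinstance(message, str) or not message.strip():
        return False, "message must be a non-empty string"
    return True, "ok"


print(rule_grade('{"code": "NOT_FOUND", "message": "User not found"}'))
print(rule_grade('{"code": "not-found", "message": "User not found"}'))
```

Returning a reason alongside the verdict makes failures easier to triage when the grader is used in a release gate.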
+
+ ### pass@k Guidance
+
+ - `pass@1`: direct reliability
+ - `pass@3`: practical reliability under controlled retries
+ - `pass^3`: stability test (all 3 runs must pass)
+
+ Recommended thresholds:
+ - Capability evals: pass@3 >= 0.90
+ - Regression evals: pass^3 = 1.00 for release-critical paths
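
The recommended thresholds can be enforced as a simple gate. This is a minimal sketch, assuming the two rates have already been measured and are expressed as fractions between 0 and 1; `EvalSummary` and `release_gate` are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass

# Thresholds taken from the guidance above.
CAPABILITY_PASS_AT_3_MIN = 0.90
REGRESSION_PASS_POW_3_REQUIRED = 1.00


@dataclass
class EvalSummary:
    """Hypothetical container for measured rates, each in [0.0, 1.0]."""
    capability_pass_at_3: float
    regression_pass_pow_3: float


def release_gate(summary: EvalSummary) -> list[str]:
    """Return threshold violations; an empty list means the gate passes."""
    failures = []
    if summary.capability_pass_at_3 < CAPABILITY_PASS_AT_3_MIN:
        failures.append(
            f"capability pass@3 {summary.capability_pass_at_3:.2f} "
            f"is below {CAPABILITY_PASS_AT_3_MIN:.2f}"
        )
    if summary.regression_pass_pow_3 < REGRESSION_PASS_POW_3_REQUIRED:
        failures.append(
            f"regression pass^3 {summary.regression_pass_pow_3:.2f} "
            f"is below {REGRESSION_PASS_POW_3_REQUIRED:.2f}"
        )
    return failures


for problem in release_gate(EvalSummary(capability_pass_at_3=0.85, regression_pass_pow_3=1.0)):
    print("GATE FAILED:", problem)
```

Listing every violation, rather than stopping at the first, gives a complete picture in a single run.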
+
+ ### Eval Anti-Patterns
+
+ - Overfitting prompts to known eval examples
+ - Measuring only happy-path outputs
+ - Ignoring cost and latency drift while chasing pass rates
+ - Allowing flaky graders in release gates
+
+ ### Minimal Eval Artifact Layout
+
+ - `.claude/evals/<feature>.md` definition
+ - `.claude/evals/<feature>.log` run history
+ - `docs/releases/<version>/eval-summary.md` release snapshot