@heytherevibin/skillforge 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +16 -0
- package/CODE_OF_CONDUCT.md +34 -0
- package/CONTRIBUTING.md +38 -0
- package/LICENSE +21 -0
- package/README.md +337 -0
- package/RELEASING.md +93 -0
- package/SECURITY.md +31 -0
- package/STRATEGY.md +26 -0
- package/bin/cli.js +547 -0
- package/lib/packs.js +184 -0
- package/package.json +38 -0
- package/python/app/__init__.py +0 -0
- package/python/app/__pycache__/__init__.cpython-312.pyc +0 -0
- package/python/app/__pycache__/auth.cpython-312.pyc +0 -0
- package/python/app/__pycache__/main.cpython-312.pyc +0 -0
- package/python/app/auth.py +63 -0
- package/python/app/cli.py +78 -0
- package/python/app/db_paths.py +26 -0
- package/python/app/events_cli.py +175 -0
- package/python/app/main.py +647 -0
- package/python/app/materialize.py +138 -0
- package/python/app/mcp_server.py +610 -0
- package/python/app/route_cli.py +117 -0
- package/python/requirements-dev.txt +1 -0
- package/python/requirements.txt +7 -0
- package/python/tests/test_db_paths.py +41 -0
- package/skills/accessibility/SKILL.md +145 -0
- package/skills/agent-architecture-audit/SKILL.md +256 -0
- package/skills/agent-eval/SKILL.md +144 -0
- package/skills/agent-harness-construction/SKILL.md +72 -0
- package/skills/agent-introspection-debugging/SKILL.md +152 -0
- package/skills/agent-payment-x402/SKILL.md +224 -0
- package/skills/agent-sort/SKILL.md +214 -0
- package/skills/agentic-engineering/SKILL.md +62 -0
- package/skills/agentic-os/SKILL.md +386 -0
- package/skills/ai-first-engineering/SKILL.md +50 -0
- package/skills/ai-regression-testing/SKILL.md +384 -0
- package/skills/android-clean-architecture/SKILL.md +338 -0
- package/skills/angular-developer/SKILL.md +153 -0
- package/skills/angular-developer/references/angular-animations.md +160 -0
- package/skills/angular-developer/references/angular-aria.md +410 -0
- package/skills/angular-developer/references/cli.md +86 -0
- package/skills/angular-developer/references/component-harnesses.md +59 -0
- package/skills/angular-developer/references/component-styling.md +91 -0
- package/skills/angular-developer/references/components.md +117 -0
- package/skills/angular-developer/references/creating-services.md +97 -0
- package/skills/angular-developer/references/data-resolvers.md +69 -0
- package/skills/angular-developer/references/define-routes.md +67 -0
- package/skills/angular-developer/references/defining-providers.md +72 -0
- package/skills/angular-developer/references/di-fundamentals.md +120 -0
- package/skills/angular-developer/references/e2e-testing.md +56 -0
- package/skills/angular-developer/references/effects.md +83 -0
- package/skills/angular-developer/references/hierarchical-injectors.md +43 -0
- package/skills/angular-developer/references/host-elements.md +80 -0
- package/skills/angular-developer/references/injection-context.md +63 -0
- package/skills/angular-developer/references/inputs.md +101 -0
- package/skills/angular-developer/references/linked-signal.md +59 -0
- package/skills/angular-developer/references/loading-strategies.md +61 -0
- package/skills/angular-developer/references/mcp.md +108 -0
- package/skills/angular-developer/references/navigate-to-routes.md +69 -0
- package/skills/angular-developer/references/outputs.md +86 -0
- package/skills/angular-developer/references/reactive-forms.md +122 -0
- package/skills/angular-developer/references/rendering-strategies.md +44 -0
- package/skills/angular-developer/references/resource.md +77 -0
- package/skills/angular-developer/references/route-animations.md +56 -0
- package/skills/angular-developer/references/route-guards.md +52 -0
- package/skills/angular-developer/references/router-lifecycle.md +45 -0
- package/skills/angular-developer/references/router-testing.md +87 -0
- package/skills/angular-developer/references/show-routes-with-outlets.md +68 -0
- package/skills/angular-developer/references/signal-forms.md +795 -0
- package/skills/angular-developer/references/signals-overview.md +94 -0
- package/skills/angular-developer/references/tailwind-css.md +69 -0
- package/skills/angular-developer/references/template-driven-forms.md +114 -0
- package/skills/angular-developer/references/testing-fundamentals.md +65 -0
- package/skills/api-connector-builder/SKILL.md +120 -0
- package/skills/api-design/SKILL.md +522 -0
- package/skills/architecture-decision-records/SKILL.md +178 -0
- package/skills/article-writing/SKILL.md +78 -0
- package/skills/automation-audit-ops/SKILL.md +141 -0
- package/skills/autonomous-agent-harness/SKILL.md +272 -0
- package/skills/autonomous-loops/SKILL.md +609 -0
- package/skills/backend-patterns/SKILL.md +560 -0
- package/skills/benchmark/SKILL.md +92 -0
- package/skills/blueprint/SKILL.md +104 -0
- package/skills/browser-qa/SKILL.md +86 -0
- package/skills/bun-runtime/SKILL.md +83 -0
- package/skills/canary-watch/SKILL.md +98 -0
- package/skills/carrier-relationship-management/SKILL.md +211 -0
- package/skills/cisco-ios-patterns/SKILL.md +163 -0
- package/skills/ck/SKILL.md +147 -0
- package/skills/ck/commands/forget.mjs +44 -0
- package/skills/ck/commands/info.mjs +24 -0
- package/skills/ck/commands/init.mjs +143 -0
- package/skills/ck/commands/list.mjs +40 -0
- package/skills/ck/commands/migrate.mjs +202 -0
- package/skills/ck/commands/resume.mjs +36 -0
- package/skills/ck/commands/save.mjs +210 -0
- package/skills/ck/commands/shared.mjs +387 -0
- package/skills/ck/hooks/session-start.mjs +224 -0
- package/skills/claude-devfleet/SKILL.md +103 -0
- package/skills/click-path-audit/SKILL.md +244 -0
- package/skills/clickhouse-io/SKILL.md +438 -0
- package/skills/code-tour/SKILL.md +235 -0
- package/skills/codebase-onboarding/SKILL.md +232 -0
- package/skills/coding-standards/SKILL.md +548 -0
- package/skills/compose-multiplatform-patterns/SKILL.md +298 -0
- package/skills/connections-optimizer/SKILL.md +188 -0
- package/skills/content-engine/SKILL.md +126 -0
- package/skills/content-hash-cache-pattern/SKILL.md +160 -0
- package/skills/context-budget/SKILL.md +134 -0
- package/skills/continuous-agent-loop/SKILL.md +44 -0
- package/skills/continuous-learning/SKILL.md +129 -0
- package/skills/continuous-learning/config.json +18 -0
- package/skills/continuous-learning/evaluate-session.sh +69 -0
- package/skills/continuous-learning-v2/SKILL.md +358 -0
- package/skills/continuous-learning-v2/agents/observer-loop.sh +322 -0
- package/skills/continuous-learning-v2/agents/observer.md +198 -0
- package/skills/continuous-learning-v2/agents/session-guardian.sh +150 -0
- package/skills/continuous-learning-v2/agents/start-observer.sh +248 -0
- package/skills/continuous-learning-v2/config.json +8 -0
- package/skills/continuous-learning-v2/hooks/observe.sh +476 -0
- package/skills/continuous-learning-v2/scripts/detect-project.sh +288 -0
- package/skills/continuous-learning-v2/scripts/instinct-cli.py +1519 -0
- package/skills/continuous-learning-v2/scripts/lib/homunculus-dir.sh +31 -0
- package/skills/continuous-learning-v2/scripts/migrate-homunculus.sh +62 -0
- package/skills/continuous-learning-v2/scripts/test_parse_instinct.py +1018 -0
- package/skills/cost-aware-llm-pipeline/SKILL.md +182 -0
- package/skills/cost-tracking/SKILL.md +147 -0
- package/skills/council/SKILL.md +202 -0
- package/skills/cpp-coding-standards/SKILL.md +722 -0
- package/skills/cpp-testing/SKILL.md +323 -0
- package/skills/crosspost/SKILL.md +110 -0
- package/skills/csharp-testing/SKILL.md +320 -0
- package/skills/customer-billing-ops/SKILL.md +139 -0
- package/skills/customs-trade-compliance/SKILL.md +262 -0
- package/skills/dart-flutter-patterns/SKILL.md +562 -0
- package/skills/dashboard-builder/SKILL.md +108 -0
- package/skills/data-scraper-agent/SKILL.md +764 -0
- package/skills/database-migrations/SKILL.md +428 -0
- package/skills/deep-research/SKILL.md +158 -0
- package/skills/defi-amm-security/SKILL.md +166 -0
- package/skills/deployment-patterns/SKILL.md +426 -0
- package/skills/design-system/SKILL.md +81 -0
- package/skills/django-celery/SKILL.md +456 -0
- package/skills/django-patterns/SKILL.md +733 -0
- package/skills/django-security/SKILL.md +592 -0
- package/skills/django-tdd/SKILL.md +728 -0
- package/skills/django-verification/SKILL.md +468 -0
- package/skills/dmux-workflows/SKILL.md +190 -0
- package/skills/docker-patterns/SKILL.md +363 -0
- package/skills/documentation-lookup/SKILL.md +89 -0
- package/skills/dotnet-patterns/SKILL.md +320 -0
- package/skills/e2e-testing/SKILL.md +325 -0
- package/skills/email-ops/SKILL.md +120 -0
- package/skills/energy-procurement/SKILL.md +227 -0
- package/skills/enterprise-agent-ops/SKILL.md +49 -0
- package/skills/error-handling/SKILL.md +375 -0
- package/skills/eval-harness/SKILL.md +269 -0
- package/skills/evm-token-decimals/SKILL.md +130 -0
- package/skills/exa-search/SKILL.md +106 -0
- package/skills/fal-ai-media/SKILL.md +287 -0
- package/skills/fastapi-patterns/SKILL.md +327 -0
- package/skills/finance-billing-ops/SKILL.md +126 -0
- package/skills/flox-environments/SKILL.md +496 -0
- package/skills/flutter-dart-code-review/SKILL.md +434 -0
- package/skills/foundation-models-on-device/SKILL.md +243 -0
- package/skills/frontend-design-direction/SKILL.md +92 -0
- package/skills/frontend-patterns/SKILL.md +641 -0
- package/skills/frontend-slides/SKILL.md +183 -0
- package/skills/frontend-slides/STYLE_PRESETS.md +330 -0
- package/skills/frontend-slides/animation-patterns.md +122 -0
- package/skills/frontend-slides/html-template.md +419 -0
- package/skills/frontend-slides/scripts/export-pdf.sh +418 -0
- package/skills/frontend-slides/scripts/extract-pptx.py +96 -0
- package/skills/frontend-slides/viewport-base.css +153 -0
- package/skills/fsharp-testing/SKILL.md +279 -0
- package/skills/gan-style-harness/SKILL.md +278 -0
- package/skills/gateguard/SKILL.md +125 -0
- package/skills/git-workflow/SKILL.md +714 -0
- package/skills/github-ops/SKILL.md +143 -0
- package/skills/golang-patterns/SKILL.md +673 -0
- package/skills/golang-testing/SKILL.md +719 -0
- package/skills/google-workspace-ops/SKILL.md +94 -0
- package/skills/healthcare-cdss-patterns/SKILL.md +245 -0
- package/skills/healthcare-emr-patterns/SKILL.md +159 -0
- package/skills/healthcare-eval-harness/SKILL.md +207 -0
- package/skills/healthcare-phi-compliance/SKILL.md +145 -0
- package/skills/hermes-imports/SKILL.md +87 -0
- package/skills/hexagonal-architecture/SKILL.md +275 -0
- package/skills/hipaa-compliance/SKILL.md +78 -0
- package/skills/homelab-network-readiness/SKILL.md +169 -0
- package/skills/homelab-network-setup/SKILL.md +129 -0
- package/skills/homelab-pihole-dns/SKILL.md +274 -0
- package/skills/homelab-vlan-segmentation/SKILL.md +311 -0
- package/skills/homelab-wireguard-vpn/SKILL.md +305 -0
- package/skills/hookify-rules/SKILL.md +128 -0
- package/skills/inventory-demand-planning/SKILL.md +246 -0
- package/skills/investor-materials/SKILL.md +95 -0
- package/skills/investor-outreach/SKILL.md +90 -0
- package/skills/ios-icon-gen/SKILL.md +157 -0
- package/skills/ios-icon-gen/scripts/generate_icons.swift +258 -0
- package/skills/ios-icon-gen/scripts/iconify_gen.sh +235 -0
- package/skills/iterative-retrieval/SKILL.md +209 -0
- package/skills/java-coding-standards/SKILL.md +382 -0
- package/skills/jira-integration/SKILL.md +292 -0
- package/skills/jpa-patterns/SKILL.md +150 -0
- package/skills/knowledge-ops/SKILL.md +153 -0
- package/skills/kotlin-coroutines-flows/SKILL.md +283 -0
- package/skills/kotlin-exposed-patterns/SKILL.md +718 -0
- package/skills/kotlin-ktor-patterns/SKILL.md +688 -0
- package/skills/kotlin-patterns/SKILL.md +710 -0
- package/skills/kotlin-testing/SKILL.md +823 -0
- package/skills/laravel-patterns/SKILL.md +414 -0
- package/skills/laravel-plugin-discovery/SKILL.md +228 -0
- package/skills/laravel-security/SKILL.md +284 -0
- package/skills/laravel-tdd/SKILL.md +282 -0
- package/skills/laravel-verification/SKILL.md +178 -0
- package/skills/lead-intelligence/SKILL.md +320 -0
- package/skills/lead-intelligence/agents/enrichment-agent.md +85 -0
- package/skills/lead-intelligence/agents/mutual-mapper.md +75 -0
- package/skills/lead-intelligence/agents/outreach-drafter.md +98 -0
- package/skills/lead-intelligence/agents/signal-scorer.md +60 -0
- package/skills/liquid-glass-design/SKILL.md +279 -0
- package/skills/llm-trading-agent-security/SKILL.md +146 -0
- package/skills/logistics-exception-management/SKILL.md +221 -0
- package/skills/make-interfaces-feel-better/SKILL.md +151 -0
- package/skills/manim-video/SKILL.md +88 -0
- package/skills/manim-video/assets/network_graph_scene.py +52 -0
- package/skills/market-research/SKILL.md +74 -0
- package/skills/mcp-server-patterns/SKILL.md +68 -0
- package/skills/messages-ops/SKILL.md +103 -0
- package/skills/mle-workflow/SKILL.md +345 -0
- package/skills/motion-advanced/SKILL.md +596 -0
- package/skills/motion-foundations/SKILL.md +299 -0
- package/skills/motion-patterns/SKILL.md +435 -0
- package/skills/motion-ui/SKILL.md +574 -0
- package/skills/mysql-patterns/SKILL.md +411 -0
- package/skills/nanoclaw-repl/SKILL.md +32 -0
- package/skills/nestjs-patterns/SKILL.md +229 -0
- package/skills/netmiko-ssh-automation/SKILL.md +173 -0
- package/skills/network-bgp-diagnostics/SKILL.md +167 -0
- package/skills/network-config-validation/SKILL.md +210 -0
- package/skills/network-interface-health/SKILL.md +152 -0
- package/skills/nextjs-turbopack/SKILL.md +43 -0
- package/skills/nodejs-keccak256/SKILL.md +102 -0
- package/skills/nutrient-document-processing/SKILL.md +166 -0
- package/skills/nuxt4-patterns/SKILL.md +99 -0
- package/skills/openclaw-persona-forge/SKILL.md +288 -0
- package/skills/openclaw-persona-forge/gacha.py +224 -0
- package/skills/openclaw-persona-forge/gacha.sh +5 -0
- package/skills/openclaw-persona-forge/references/avatar-style.md +124 -0
- package/skills/openclaw-persona-forge/references/boundary-rules.md +53 -0
- package/skills/openclaw-persona-forge/references/error-handling.md +53 -0
- package/skills/openclaw-persona-forge/references/identity-tension.md +48 -0
- package/skills/openclaw-persona-forge/references/naming-system.md +39 -0
- package/skills/openclaw-persona-forge/references/output-template.md +166 -0
- package/skills/opensource-pipeline/SKILL.md +254 -0
- package/skills/perl-patterns/SKILL.md +503 -0
- package/skills/perl-security/SKILL.md +502 -0
- package/skills/perl-testing/SKILL.md +474 -0
- package/skills/plan-orchestrate/SKILL.md +253 -0
- package/skills/plankton-code-quality/SKILL.md +236 -0
- package/skills/postgres-patterns/SKILL.md +146 -0
- package/skills/product-capability/SKILL.md +140 -0
- package/skills/product-lens/SKILL.md +91 -0
- package/skills/production-audit/SKILL.md +206 -0
- package/skills/production-scheduling/SKILL.md +237 -0
- package/skills/project-flow-ops/SKILL.md +110 -0
- package/skills/prompt-optimizer/SKILL.md +398 -0
- package/skills/python-patterns/SKILL.md +749 -0
- package/skills/python-testing/SKILL.md +815 -0
- package/skills/pytorch-patterns/SKILL.md +395 -0
- package/skills/quality-nonconformance/SKILL.md +259 -0
- package/skills/quarkus-patterns/SKILL.md +721 -0
- package/skills/quarkus-security/SKILL.md +466 -0
- package/skills/quarkus-tdd/SKILL.md +810 -0
- package/skills/quarkus-verification/SKILL.md +478 -0
- package/skills/ralphinho-rfc-pipeline/SKILL.md +66 -0
- package/skills/redis-patterns/SKILL.md +402 -0
- package/skills/regex-vs-llm-structured-text/SKILL.md +219 -0
- package/skills/remotion-video-creation/SKILL.md +43 -0
- package/skills/remotion-video-creation/rules/3d.md +86 -0
- package/skills/remotion-video-creation/rules/animations.md +29 -0
- package/skills/remotion-video-creation/rules/assets/charts-bar-chart.tsx +173 -0
- package/skills/remotion-video-creation/rules/assets/text-animations-typewriter.tsx +100 -0
- package/skills/remotion-video-creation/rules/assets/text-animations-word-highlight.tsx +108 -0
- package/skills/remotion-video-creation/rules/assets.md +78 -0
- package/skills/remotion-video-creation/rules/audio.md +172 -0
- package/skills/remotion-video-creation/rules/calculate-metadata.md +104 -0
- package/skills/remotion-video-creation/rules/can-decode.md +75 -0
- package/skills/remotion-video-creation/rules/charts.md +58 -0
- package/skills/remotion-video-creation/rules/compositions.md +146 -0
- package/skills/remotion-video-creation/rules/display-captions.md +126 -0
- package/skills/remotion-video-creation/rules/extract-frames.md +229 -0
- package/skills/remotion-video-creation/rules/fonts.md +152 -0
- package/skills/remotion-video-creation/rules/get-audio-duration.md +58 -0
- package/skills/remotion-video-creation/rules/get-video-dimensions.md +68 -0
- package/skills/remotion-video-creation/rules/get-video-duration.md +58 -0
- package/skills/remotion-video-creation/rules/gifs.md +138 -0
- package/skills/remotion-video-creation/rules/images.md +130 -0
- package/skills/remotion-video-creation/rules/import-srt-captions.md +67 -0
- package/skills/remotion-video-creation/rules/lottie.md +67 -0
- package/skills/remotion-video-creation/rules/measuring-dom-nodes.md +34 -0
- package/skills/remotion-video-creation/rules/measuring-text.md +143 -0
- package/skills/remotion-video-creation/rules/sequencing.md +106 -0
- package/skills/remotion-video-creation/rules/tailwind.md +11 -0
- package/skills/remotion-video-creation/rules/text-animations.md +20 -0
- package/skills/remotion-video-creation/rules/timing.md +179 -0
- package/skills/remotion-video-creation/rules/transcribe-captions.md +19 -0
- package/skills/remotion-video-creation/rules/transitions.md +122 -0
- package/skills/remotion-video-creation/rules/trimming.md +52 -0
- package/skills/remotion-video-creation/rules/videos.md +171 -0
- package/skills/repo-scan/SKILL.md +78 -0
- package/skills/research-ops/SKILL.md +111 -0
- package/skills/returns-reverse-logistics/SKILL.md +239 -0
- package/skills/rules-distill/SKILL.md +263 -0
- package/skills/rules-distill/scripts/scan-rules.sh +58 -0
- package/skills/rules-distill/scripts/scan-skills.sh +129 -0
- package/skills/rust-patterns/SKILL.md +498 -0
- package/skills/rust-testing/SKILL.md +499 -0
- package/skills/safety-guard/SKILL.md +74 -0
- package/skills/santa-method/SKILL.md +306 -0
- package/skills/scientific-db-pubmed-database/SKILL.md +175 -0
- package/skills/scientific-db-uspto-database/SKILL.md +177 -0
- package/skills/scientific-pkg-gget/SKILL.md +166 -0
- package/skills/scientific-thinking-literature-review/SKILL.md +192 -0
- package/skills/scientific-thinking-scholar-evaluation/SKILL.md +160 -0
- package/skills/search-first/SKILL.md +181 -0
- package/skills/security-bounty-hunter/SKILL.md +99 -0
- package/skills/security-review/SKILL.md +502 -0
- package/skills/security-review/cloud-infrastructure-security.md +361 -0
- package/skills/seo/SKILL.md +153 -0
- package/skills/skill-comply/SKILL.md +57 -0
- package/skills/skill-comply/fixtures/compliant_trace.jsonl +5 -0
- package/skills/skill-comply/fixtures/noncompliant_trace.jsonl +3 -0
- package/skills/skill-comply/fixtures/tdd_spec.yaml +44 -0
- package/skills/skill-comply/prompts/classifier.md +24 -0
- package/skills/skill-comply/prompts/scenario_generator.md +62 -0
- package/skills/skill-comply/prompts/spec_generator.md +42 -0
- package/skills/skill-comply/pyproject.toml +15 -0
- package/skills/skill-comply/scripts/__init__.py +0 -0
- package/skills/skill-comply/scripts/classifier.py +85 -0
- package/skills/skill-comply/scripts/grader.py +124 -0
- package/skills/skill-comply/scripts/parser.py +107 -0
- package/skills/skill-comply/scripts/report.py +170 -0
- package/skills/skill-comply/scripts/run.py +127 -0
- package/skills/skill-comply/scripts/runner.py +186 -0
- package/skills/skill-comply/scripts/scenario_generator.py +70 -0
- package/skills/skill-comply/scripts/spec_generator.py +72 -0
- package/skills/skill-comply/scripts/utils.py +13 -0
- package/skills/skill-comply/tests/test_grader.py +197 -0
- package/skills/skill-comply/tests/test_parser.py +90 -0
- package/skills/skill-comply/tests/test_runner.py +172 -0
- package/skills/skill-scout/SKILL.md +139 -0
- package/skills/skill-stocktake/SKILL.md +193 -0
- package/skills/skill-stocktake/scripts/quick-diff.sh +87 -0
- package/skills/skill-stocktake/scripts/save-results.sh +56 -0
- package/skills/skill-stocktake/scripts/scan.sh +170 -0
- package/skills/social-graph-ranker/SKILL.md +153 -0
- package/skills/springboot-patterns/SKILL.md +313 -0
- package/skills/springboot-security/SKILL.md +271 -0
- package/skills/springboot-tdd/SKILL.md +157 -0
- package/skills/springboot-verification/SKILL.md +230 -0
- package/skills/strategic-compact/SKILL.md +129 -0
- package/skills/strategic-compact/suggest-compact.sh +54 -0
- package/skills/swift-actor-persistence/SKILL.md +142 -0
- package/skills/swift-concurrency-6-2/SKILL.md +216 -0
- package/skills/swift-protocol-di-testing/SKILL.md +189 -0
- package/skills/swiftui-patterns/SKILL.md +259 -0
- package/skills/tdd-workflow/SKILL.md +462 -0
- package/skills/team-builder/SKILL.md +166 -0
- package/skills/terminal-ops/SKILL.md +108 -0
- package/skills/tinystruct-patterns/SKILL.md +130 -0
- package/skills/tinystruct-patterns/references/architecture.md +77 -0
- package/skills/tinystruct-patterns/references/data-handling.md +35 -0
- package/skills/tinystruct-patterns/references/routing.md +57 -0
- package/skills/tinystruct-patterns/references/system-usage.md +74 -0
- package/skills/tinystruct-patterns/references/testing.md +59 -0
- package/skills/token-budget-advisor/SKILL.md +133 -0
- package/skills/ui-demo/SKILL.md +464 -0
- package/skills/ui-to-vue/SKILL.md +134 -0
- package/skills/unified-notifications-ops/SKILL.md +186 -0
- package/skills/verification-loop/SKILL.md +125 -0
- package/skills/video-editing/SKILL.md +309 -0
- package/skills/videodb/SKILL.md +373 -0
- package/skills/videodb/reference/api-reference.md +550 -0
- package/skills/videodb/reference/capture-reference.md +407 -0
- package/skills/videodb/reference/capture.md +101 -0
- package/skills/videodb/reference/editor.md +443 -0
- package/skills/videodb/reference/generative.md +331 -0
- package/skills/videodb/reference/rtstream-reference.md +564 -0
- package/skills/videodb/reference/rtstream.md +65 -0
- package/skills/videodb/reference/search.md +230 -0
- package/skills/videodb/reference/streaming.md +406 -0
- package/skills/videodb/reference/use-cases.md +118 -0
- package/skills/videodb/scripts/ws_listener.py +282 -0
- package/skills/visa-doc-translate/README.md +86 -0
- package/skills/visa-doc-translate/SKILL.md +117 -0
- package/skills/vite-patterns/SKILL.md +448 -0
- package/skills/windows-desktop-e2e/SKILL.md +787 -0
- package/skills/workspace-surface-audit/SKILL.md +124 -0
- package/skills/x-api/SKILL.md +233 -0
@@ -0,0 +1,375 @@
---
name: error-handling
description: Patterns for robust error handling across TypeScript, Python, and Go. Covers typed errors, error boundaries, retries, circuit breakers, and user-facing error messages.
---

# Error Handling Patterns

Consistent, robust error handling patterns for production applications.

## When to Activate

- Designing error types or exception hierarchies for a new module or service
- Adding retry logic or circuit breakers for unreliable external dependencies
- Reviewing API endpoints for missing error handling
- Implementing user-facing error messages and feedback
- Debugging cascading failures or silent error swallowing

## Core Principles

1. **Fail fast and loudly**: surface errors at the boundary where they occur; don't bury them
2. **Typed errors over string messages**: errors are first-class values with structure
3. **User messages ≠ developer messages**: show friendly text to users, log full context server-side
4. **Never swallow errors silently**: every `catch` block must either handle, re-throw, or log
5. **Errors are part of your API contract**: document every error code a client may receive

## TypeScript / JavaScript

### Typed Error Classes

```typescript
// Define an error hierarchy for your domain
export class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly statusCode: number = 500,
    public readonly details?: unknown,
  ) {
    super(message)
    // Restore the prototype chain first: when TypeScript compiles to ES5,
    // extending the built-in Error breaks `instanceof` checks
    // (e.g., `error instanceof NotFoundError`) without this call.
    Object.setPrototypeOf(this, new.target.prototype)
    // new.target is the concrete subclass, so name becomes e.g. "NotFoundError"
    // (this.constructor.name would read "Error" in ES5 output).
    this.name = new.target.name
  }
}

export class NotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(`${resource} not found: ${id}`, 'NOT_FOUND', 404)
  }
}

export class ValidationError extends AppError {
  constructor(message: string, details: { field: string; message: string }[]) {
    super(message, 'VALIDATION_ERROR', 422, details)
  }
}

export class UnauthorizedError extends AppError {
  constructor(reason = 'Authentication required') {
    super(reason, 'UNAUTHORIZED', 401)
  }
}

export class RateLimitError extends AppError {
  constructor(public readonly retryAfterMs: number) {
    super('Rate limit exceeded', 'RATE_LIMITED', 429)
  }
}
```

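With `strict` enabled (TypeScript 4.4+), a `catch` variable is typed `unknown`, because JavaScript allows any value to be thrown. A small normalizer at the boundary lets the rest of the code deal only with `AppError`. The sketch below redeclares a trimmed `AppError` and `NotFoundError` so it runs on its own. `InternalError` and `toAppError` are hypothetical names introduced for illustration and are not part of any library.

```typescript
// Trimmed copy of the AppError hierarchy above, so this sketch runs on its own.
class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly statusCode: number = 500,
    public readonly details?: unknown,
  ) {
    super(message)
    Object.setPrototypeOf(this, new.target.prototype)
    this.name = new.target.name
  }
}

class NotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(`${resource} not found: ${id}`, 'NOT_FOUND', 404)
  }
}

// Hypothetical wrapper for anything that is not already an AppError.
// The client-facing message stays generic; the thrown value is kept on
// `original` so it can be logged server-side.
class InternalError extends AppError {
  constructor(public readonly original: unknown) {
    super('An unexpected error occurred', 'INTERNAL_ERROR', 500)
  }
}

// Normalize whatever reached a `catch` clause into an AppError.
function toAppError(thrown: unknown): AppError {
  return thrown instanceof AppError ? thrown : new InternalError(thrown)
}

// Usage: JSON.parse throws a SyntaxError, which is not an AppError
try {
  JSON.parse('{not json')
} catch (e) {
  const appError = toAppError(e)
  console.log(appError.code, appError.statusCode) // INTERNAL_ERROR 500
}
```

The generic message on `InternalError` matters. A handler that serializes any `AppError` to the client, such as `handleApiError` below, would otherwise send internal error text.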
### Result Pattern (no-throw style)

For operations where failure is expected and common (parsing, external calls):

```typescript
type Result<T, E = AppError> =
  | { ok: true; value: T }
  | { ok: false; error: E }

// `never` on the unused side lets ok()/err() fit any Result<T, E>
function ok<T>(value: T): Result<T, never> {
  return { ok: true, value }
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error }
}

// Usage (assumes `db`, `User`, and `logger` are defined elsewhere in your app)
async function fetchUser(id: string): Promise<Result<User>> {
  try {
    const user = await db.users.findUnique({ where: { id } })
    if (!user) return err(new NotFoundError('User', id))
    return ok(user)
  } catch (e) {
    // Log the underlying cause instead of swallowing it
    logger.error('Database error while fetching user', { id, cause: e })
    return err(new AppError('Database error', 'DB_ERROR'))
  }
}

async function printUserEmail(id: string) {
  const result = await fetchUser(id)
  if (!result.ok) {
    // TypeScript narrows to the error branch: result.error is available
    logger.error('Failed to fetch user', { error: result.error })
    return
  }
  // TypeScript narrows to the success branch: result.value is available
  console.log(result.value.email)
}
```

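Code that already throws, such as a database client, `fetch`, or `JSON.parse`, can be adapted to the Result style at the call site. This is a minimal sketch that assumes the same `Result` shape as above, with the error type made explicit. `tryAsync`, `mapError`, `Config`, and `parseConfig` are hypothetical names used for illustration.

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E }

// Hypothetical adapter: runs a function that may throw or reject and
// captures the failure as a Result. `mapError` turns the unknown thrown
// value into the caller's error type, so failures are converted, not dropped.
async function tryAsync<T, E>(
  fn: () => Promise<T>,
  mapError: (thrown: unknown) => E,
): Promise<Result<T, E>> {
  try {
    // `await` inside the try so rejections are caught here as well
    return { ok: true, value: await fn() }
  } catch (thrown) {
    return { ok: false, error: mapError(thrown) }
  }
}

interface Config {
  port: number
}

// Example: parse JSON without letting a SyntaxError escape
function parseConfig(raw: string): Promise<Result<Config, string>> {
  return tryAsync(
    async () => JSON.parse(raw) as Config,
    (thrown) => (thrown instanceof Error ? thrown.message : String(thrown)),
  )
}

// Usage
parseConfig('{"port": 8080}').then((r) => {
  if (r.ok) console.log(r.value.port) // 8080
})
```

Requiring a `mapError` callback is a deliberate choice. The caller must decide what a failure becomes, which keeps to the rule that no `catch` block silently discards an error.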
### API Error Handler (Next.js / Express)

```typescript
import { NextRequest, NextResponse } from 'next/server'
import { z } from 'zod'
// AppError is the base class from "Typed Error Classes" above

function handleApiError(error: unknown): NextResponse {
  // Known application error
  if (error instanceof AppError) {
    return NextResponse.json(
      {
        error: {
          code: error.code,
          message: error.message,
          ...(error.details ? { details: error.details } : {}),
        },
      },
      { status: error.statusCode },
    )
  }

  // Zod validation error
  if (error instanceof z.ZodError) {
    return NextResponse.json(
      {
        error: {
          code: 'VALIDATION_ERROR',
          message: 'Request validation failed',
          details: error.issues.map(i => ({
            field: i.path.join('.'),
            message: i.message,
          })),
        },
      },
      { status: 422 },
    )
  }

  // Unexpected error: log details, return a generic message
  console.error('Unexpected error:', error)
  return NextResponse.json(
    { error: { code: 'INTERNAL_ERROR', message: 'An unexpected error occurred' } },
    { status: 500 },
  )
}

export async function POST(req: NextRequest) {
  try {
    // ... handler logic
  } catch (error) {
    return handleApiError(error)
  }
}
```

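The heading mentions Express, but the handler above uses only Next.js APIs. One option is to keep the error-to-response mapping in a plain function and let each framework send the result. The sketch below assumes the `AppError` shape from above, redeclared so it runs alone. It leaves out the Zod branch to avoid the dependency. `toErrorResponse` and `ErrorBody` are hypothetical names.

```typescript
// Same shape as the AppError base class above, redeclared for a standalone sketch.
class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly statusCode: number = 500,
    public readonly details?: unknown,
  ) {
    super(message)
    Object.setPrototypeOf(this, new.target.prototype)
    this.name = new.target.name
  }
}

interface ErrorBody {
  error: { code: string; message: string; details?: unknown }
}

// Framework-agnostic mapping: decide the status and JSON body in one place.
function toErrorResponse(error: unknown): { status: number; body: ErrorBody } {
  if (error instanceof AppError) {
    return {
      status: error.statusCode,
      body: {
        error: {
          code: error.code,
          message: error.message,
          // Only include details when the error actually carries them
          ...(error.details !== undefined ? { details: error.details } : {}),
        },
      },
    }
  }
  // Unknown failure: log the real error server-side, expose only a generic message
  console.error('Unexpected error:', error)
  return {
    status: 500,
    body: { error: { code: 'INTERNAL_ERROR', message: 'An unexpected error occurred' } },
  }
}
```

A Next.js route handler would return `NextResponse.json(body, { status })`. An Express error-handling middleware, which Express recognizes by its four-argument `(err, req, res, next)` signature, would call `res.status(status).json(body)`.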
### React Error Boundary

```tsx
import { Component, ErrorInfo, ReactNode } from 'react'

interface Props {
  fallback: ReactNode
  onError?: (error: Error, info: ErrorInfo) => void
  children: ReactNode
}

interface State {
  hasError: boolean
  error: Error | null
}

// Error boundaries must be class components. They catch errors thrown while
// rendering, in lifecycle methods, and in constructors of the tree below them,
// but not errors in event handlers or asynchronous code.
export class ErrorBoundary extends Component<Props, State> {
  state: State = { hasError: false, error: null }

  static getDerivedStateFromError(error: Error): State {
    return { hasError: true, error }
  }

  componentDidCatch(error: Error, info: ErrorInfo) {
    this.props.onError?.(error, info)
    console.error('Unhandled React error:', error, info)
  }

  render() {
    if (this.state.hasError) return this.props.fallback
    return this.props.children
  }
}

// Usage
<ErrorBoundary fallback={<p>Something went wrong. Please refresh.</p>}>
  <MyComponent />
</ErrorBoundary>
```

## Python

### Custom Exception Hierarchy

```python
from __future__ import annotations  # allows `list[dict] | None` on Python < 3.10


class AppError(Exception):
    """Base application error."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        super().__init__(message)
        self.code = code
        self.status_code = status_code


class NotFoundError(AppError):
    def __init__(self, resource: str, resource_id: str):
        super().__init__(f"{resource} not found: {resource_id}", "NOT_FOUND", 404)


class ValidationError(AppError):
    def __init__(self, message: str, details: list[dict] | None = None):
        super().__init__(message, "VALIDATION_ERROR", 422)
        self.details = details or []
```

### FastAPI Global Exception Handler
|
|
228
|
+
|
|
229
|
+
```python
|
|
230
|
+
from fastapi import FastAPI, Request
|
|
231
|
+
from fastapi.responses import JSONResponse
|
|
232
|
+
|
|
233
|
+
app = FastAPI()
|
|
234
|
+
|
|
235
|
+
@app.exception_handler(AppError)
|
|
236
|
+
async def app_error_handler(request: Request, exc: AppError) -> JSONResponse:
|
|
237
|
+
return JSONResponse(
|
|
238
|
+
status_code=exc.status_code,
|
|
239
|
+
content={"error": {"code": exc.code, "message": str(exc)}},
|
|
240
|
+
)
|
|
241
|
+
|
|
242
|
+
@app.exception_handler(Exception)
|
|
243
|
+
async def generic_error_handler(request: Request, exc: Exception) -> JSONResponse:
|
|
244
|
+
# Log full details, return generic message
|
|
245
|
+
logger.exception("Unexpected error", exc_info=exc)
|
|
246
|
+
return JSONResponse(
|
|
247
|
+
status_code=500,
|
|
248
|
+
content={"error": {"code": "INTERNAL_ERROR", "message": "An unexpected error occurred"}},
|
|
249
|
+
)
|
|
250
|
+
```
|
|
## Go

### Sentinel Errors and Error Wrapping

```go
package domain

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// Sentinel errors for type-checking with errors.Is
var (
	ErrNotFound     = errors.New("not found")
	ErrUnauthorized = errors.New("unauthorized")
	ErrConflict     = errors.New("conflict")
)

// User and UserRepository are minimal illustrative types
type User struct {
	ID    string
	Email string
}

type UserRepository struct {
	db *sql.DB
}

// Wrap errors with context; never lose the original
func (r *UserRepository) FindByID(ctx context.Context, id string) (*User, error) {
	var user User
	err := r.db.QueryRowContext(ctx, "SELECT id, email FROM users WHERE id = $1", id).
		Scan(&user.ID, &user.Email)
	if errors.Is(err, sql.ErrNoRows) {
		return nil, fmt.Errorf("user %s: %w", id, ErrNotFound)
	}
	if err != nil {
		return nil, fmt.Errorf("querying user %s: %w", id, err)
	}
	return &user, nil
}

// At the handler level (a separate HTTP package importing domain, errors,
// log/slog, net/http and chi), unwrap to determine the response.
// writeError and writeJSON are the app's JSON response helpers.
func (h *Handler) GetUser(w http.ResponseWriter, r *http.Request) {
	user, err := h.service.GetUser(r.Context(), chi.URLParam(r, "id"))
	if err != nil {
		switch {
		case errors.Is(err, domain.ErrNotFound):
			writeError(w, http.StatusNotFound, "NOT_FOUND", err.Error())
		case errors.Is(err, domain.ErrUnauthorized):
			writeError(w, http.StatusForbidden, "FORBIDDEN", "Access denied")
		default:
			slog.Error("unexpected error", "err", err)
			writeError(w, http.StatusInternalServerError, "INTERNAL_ERROR", "An unexpected error occurred")
		}
		return
	}
	writeJSON(w, http.StatusOK, user)
}
```
|
## Retry with Exponential Backoff

```typescript
interface RetryOptions {
  maxAttempts?: number
  baseDelayMs?: number
  maxDelayMs?: number
  retryIf?: (error: unknown) => boolean
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 500,
    maxDelayMs = 10_000,
    retryIf = () => true,
  } = options

  let lastError: unknown

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      if (attempt === maxAttempts || !retryIf(error)) throw error

      // Exponential backoff (base * 2^(attempt - 1)) plus random jitter, capped at maxDelayMs
      const jitter = Math.random() * baseDelayMs
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1) + jitter, maxDelayMs)
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }

  throw lastError
}

// Usage: retry transient failures, but not client errors (4xx).
// fetch() rejects only on network failures and resolves on HTTP error statuses,
// so the AppError check below only applies if non-OK responses are first
// converted into an AppError carrying a statusCode.
const data = await withRetry(() => fetch('/api/data').then(r => r.json()), {
  maxAttempts: 3,
  retryIf: (error) => !(error instanceof AppError && error.statusCode < 500),
})
```
|
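To see what delays the defaults actually produce, the bounds of each wait can be worked out directly. The small Python sketch below mirrors the formula in `withRetry`: `baseDelayMs * 2 ** (attempt - 1)` plus up to `baseDelayMs` of jitter, capped at `maxDelayMs`. `backoff_bounds` is a hypothetical helper and not part of the code above.

```python
def backoff_bounds(attempt: int, base_ms: float = 500, max_ms: float = 10_000) -> tuple[float, float]:
    """Return the (shortest, longest) wait after failed attempt `attempt` (1-based).

    Mirrors withRetry: base_ms * 2 ** (attempt - 1) plus jitter in [0, base_ms),
    capped at max_ms. The upper value is exclusive unless the cap applies.
    """
    exponential = base_ms * 2 ** (attempt - 1)
    return min(exponential, max_ms), min(exponential + base_ms, max_ms)
```

With the defaults (`maxAttempts = 3`), `withRetry` sleeps twice. `backoff_bounds(1)` gives `(500, 1000)` and `backoff_bounds(2)` gives `(1000, 1500)`. A call that fails every time therefore spends roughly 1.5 to 2.5 seconds waiting before the last error is rethrown.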
## User-Facing Error Messages

Map error codes to human-readable messages. Keep technical details out of user-visible text.

```typescript
const USER_ERROR_MESSAGES: Record<string, string> = {
  NOT_FOUND: 'The requested item could not be found.',
  UNAUTHORIZED: 'Please sign in to continue.',
  FORBIDDEN: "You don't have permission to do that.",
  VALIDATION_ERROR: 'Please check your input and try again.',
  RATE_LIMITED: 'Too many requests. Please wait a moment and try again.',
  INTERNAL_ERROR: 'Something went wrong on our end. Please try again later.',
}

export function getUserMessage(code: string): string {
  return USER_ERROR_MESSAGES[code] ?? USER_ERROR_MESSAGES.INTERNAL_ERROR
}
```
|
## Error Handling Checklist

Before merging any code that touches error handling:

- [ ] Every `catch` block handles, re-throws, or logs; no silent swallowing
- [ ] API errors follow the standard envelope `{ error: { code, message } }`
- [ ] User-facing messages contain no stack traces or internal details
- [ ] Full error context is logged server-side
- [ ] Custom error classes extend a base `AppError` with a `code` field
- [ ] Async functions surface errors to callers; no fire-and-forget without a fallback
- [ ] Retry logic only retries retriable errors (not 4xx client errors)
- [ ] React component trees are wrapped in an `ErrorBoundary` to catch rendering errors
|
---
name: eval-harness
description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
tools: Read, Write, Edit, Bash, Grep, Glob
---

# Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
|
## When to Activate

- Setting up eval-driven development (EDD) for AI-assisted workflows
- Defining pass/fail criteria for Claude Code task completion
- Measuring agent reliability with pass@k metrics
- Creating regression test suites for prompt or agent changes
- Benchmarking agent performance across model versions

## Philosophy

Eval-Driven Development treats evals as the "unit tests of AI development":
- Define expected behavior BEFORE implementation
- Run evals continuously during development
- Track regressions with each change
- Use pass@k metrics for reliability measurement
|
## Eval Types

### Capability Evals
Test whether Claude can do something it could not do before:
```markdown
[CAPABILITY EVAL: feature-name]
Task: Description of what Claude should accomplish
Success Criteria:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
Expected Output: Description of expected result
```
|
### Regression Evals
Ensure changes don't break existing functionality:
```markdown
[REGRESSION EVAL: feature-name]
Baseline: SHA or checkpoint name
Tests:
- existing-test-1: PASS/FAIL
- existing-test-2: PASS/FAIL
- existing-test-3: PASS/FAIL
Result: X/Y passed (previously Y/Y)
```
|
## Grader Types

### 1. Code-Based Grader
Deterministic checks using code:
```bash
# Check if file contains expected pattern
grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"

# Check if tests pass
npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"

# Check if build succeeds
npm run build && echo "PASS" || echo "FAIL"
```
|
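The one-liners above can be collected into a small runner that records PASS/FAIL for each check. This is a minimal sketch in Python. It assumes each grader is a shell command whose exit status alone decides the verdict. `run_code_graders` and the check names are illustrative and not part of any existing tool.

```python
import subprocess


def run_code_graders(checks: dict[str, str], timeout_s: float = 300.0) -> dict[str, str]:
    """Run each shell command and map its exit status to PASS (0) or FAIL (non-zero)."""
    results: dict[str, str] = {}
    for name, command in checks.items():
        try:
            proc = subprocess.run(command, shell=True, capture_output=True, timeout=timeout_s)
            results[name] = "PASS" if proc.returncode == 0 else "FAIL"
        except subprocess.TimeoutExpired:
            # A hung check counts as a failure instead of blocking the whole eval run
            results[name] = "FAIL"
    return results
```

For example, passing `{"auth-tests": 'npm test -- --testPathPattern="auth"', "build": "npm run build"}` returns one verdict per key, which can then be appended to the eval's run history.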
### 2. Model-Based Grader
Use Claude to evaluate open-ended outputs:
```markdown
[MODEL GRADER PROMPT]
Evaluate the following code change:
1. Does it solve the stated problem?
2. Is it well-structured?
3. Are edge cases handled?
4. Is error handling appropriate?

Score: 1-5 (1=poor, 5=excellent)
Reasoning: [explanation]
```
|
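To feed a model grader's reply into a pass/fail record, the `Score:` and `Reasoning:` lines that the prompt asks for have to be parsed. The following is a minimal sketch. Replies without a valid 1-5 score are rejected rather than guessed. The pass threshold of 4 is an assumption, because the prompt only defines the scale.

```python
import re
from dataclasses import dataclass

# Match the "Score:" and "Reasoning:" lines requested by the grader prompt
_SCORE_RE = re.compile(r"^\s*Score:\s*([1-5])\b", re.MULTILINE)
_REASONING_RE = re.compile(r"^\s*Reasoning:\s*(.*)", re.MULTILINE | re.DOTALL)


@dataclass
class ModelGrade:
    score: int  # 1-5, as requested by the prompt
    reasoning: str


def parse_model_grade(reply: str) -> ModelGrade:
    """Extract score and reasoning from a grader reply; reject replies without a valid score."""
    score_match = _SCORE_RE.search(reply)
    if score_match is None:
        raise ValueError("grader reply has no 'Score: 1-5' line")
    reasoning_match = _REASONING_RE.search(reply)
    reasoning = reasoning_match.group(1).strip() if reasoning_match else ""
    return ModelGrade(score=int(score_match.group(1)), reasoning=reasoning)


def model_grade_passes(grade: ModelGrade, threshold: int = 4) -> bool:
    # The threshold of 4 is an assumption; tune it per eval
    return grade.score >= threshold
```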
### 3. Human Grader
Flag for manual review:
```markdown
[HUMAN REVIEW REQUIRED]
Change: Description of what changed
Reason: Why human review is needed
Risk Level: LOW/MEDIUM/HIGH
```
|
## Metrics

### pass@k
"At least one success in k attempts"
- pass@1: First attempt success rate
- pass@3: Success within 3 attempts
- Typical target: pass@3 > 90%

### pass^k
"All k trials succeed"
- Higher bar for reliability
- pass^3: all 3 of 3 trials succeed
- Use for critical paths
|
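If each task is run n times and c of those runs pass, both metrics can be estimated without running it again. The sketch below assumes the trials are independent. pass@k uses the unbiased estimator 1 - C(n-c, k) / C(n, k) from the HumanEval (Codex) paper. pass^k uses the analogous C(c, k) / C(n, k): the chance that k runs drawn from the n recorded ones all passed. The function names are illustrative.

```python
from math import comb
from typing import Callable


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k trials passes, given c passes in n trials."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made only of failures; it is 0 when k > n - c
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k trials pass, given c passes in n trials."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def average(per_task: list[tuple[int, int]], k: int, metric: Callable[[int, int, int], float]) -> float:
    """Mean of a per-task metric over (n, c) pairs, one pair per eval task."""
    return sum(metric(n, c, k) for n, c in per_task) / len(per_task)
```

Take a task run 3 times with 2 passes. `pass_at_k(3, 2, 1)` is about 0.67 and `pass_at_k(3, 2, 3)` is 1.0, while `pass_hat_k(3, 2, 3)` is 0.0. That gap is why pass^k is the stricter bar for critical paths.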
## Eval Workflow

### 1. Define (Before Coding)
```markdown
## EVAL DEFINITION: feature-xyz

### Capability Evals
1. Can create new user account
2. Can validate email format
3. Can hash password securely

### Regression Evals
1. Existing login still works
2. Session management unchanged
3. Logout flow intact

### Success Metrics
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals
```
|
### 2. Implement
Write code to pass the defined evals.

### 3. Evaluate
```bash
# Run capability evals
# (run each capability eval and record PASS/FAIL)

# Run regression evals
npm test -- --testPathPattern="existing"

# Generate report
```
|
### 4. Report
```markdown
EVAL REPORT: feature-xyz
========================

Capability Evals:
  create-user: PASS (pass@1)
  validate-email: PASS (pass@2)
  hash-password: PASS (pass@1)
  Overall: 3/3 passed

Regression Evals:
  login-flow: PASS
  session-mgmt: PASS
  logout-flow: PASS
  Overall: 3/3 passed

Metrics:
  pass@1: 67% (2/3)
  pass@3: 100% (3/3)

Status: READY FOR REVIEW
```
|
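The capability section and metrics of a report like the one above can be generated from the attempt on which each eval first passed. An eval counts toward pass@k if its first pass came at attempt k or earlier, which is how the sample report reads. This is a minimal sketch. `summarize_capability` and its input shape are assumptions, not part of the `/eval` commands.

```python
from typing import Optional


def summarize_capability(first_pass: dict[str, Optional[int]], ks: tuple[int, ...] = (1, 3)) -> str:
    """Render capability results in the report format shown above.

    first_pass maps eval name -> 1-based attempt on which it first passed,
    or None if it never passed within the attempt budget.
    """
    total = len(first_pass)
    if total == 0:
        raise ValueError("no capability evals to summarize")
    lines = ["Capability Evals:"]
    for name, attempt in first_pass.items():
        lines.append(f"  {name}: PASS (pass@{attempt})" if attempt is not None else f"  {name}: FAIL")
    passed = sum(attempt is not None for attempt in first_pass.values())
    lines += [f"  Overall: {passed}/{total} passed", "", "Metrics:"]
    for k in ks:
        # An eval counts toward pass@k if it first passed on attempt k or earlier
        hits = sum(attempt is not None and attempt <= k for attempt in first_pass.values())
        lines.append(f"  pass@{k}: {round(100 * hits / total)}% ({hits}/{total})")
    return "\n".join(lines)
```

Passing `{"create-user": 1, "validate-email": 2, "hash-password": 1}` reproduces the metrics in the sample report: pass@1 of 67% (2/3) and pass@3 of 100% (3/3).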
## Integration Patterns

### Pre-Implementation
```
/eval define feature-name
```
Creates an eval definition file at `.claude/evals/feature-name.md`.

### During Implementation
```
/eval check feature-name
```
Runs the current evals and reports their status.

### Post-Implementation
```
/eval report feature-name
```
Generates a full eval report.
|
## Eval Storage

Store evals in the project:
```
.claude/
  evals/
    feature-xyz.md      # Eval definition
    feature-xyz.log     # Eval run history
    baseline.json       # Regression baselines
```
|
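Regression evals are compared against the stored baseline. Below is a minimal sketch of that comparison. It assumes, hypothetically, that `baseline.json` maps each eval name to `"PASS"` or `"FAIL"`, since the layout above does not specify a schema. An eval that passed in the baseline and is missing from the current run is treated as a regression.

```python
import json
from pathlib import Path


def find_regressions(baseline_path: Path, current: dict[str, str]) -> list[str]:
    """Return eval names that passed in the baseline but do not pass in the current run."""
    baseline: dict[str, str] = json.loads(baseline_path.read_text())
    return sorted(
        name
        for name, verdict in baseline.items()
        if verdict == "PASS" and current.get(name) != "PASS"
    )


def update_baseline(baseline_path: Path, current: dict[str, str]) -> None:
    """Record a new baseline, e.g. after a run with no regressions."""
    baseline_path.parent.mkdir(parents=True, exist_ok=True)
    baseline_path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n")
```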
## Best Practices

1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
2. **Run evals frequently** - Catch regressions early
3. **Track pass@k over time** - Monitor reliability trends
4. **Use code graders when possible** - Deterministic > probabilistic
5. **Human review for security** - Never fully automate security checks
6. **Keep evals fast** - Slow evals don't get run
7. **Version evals with code** - Evals are first-class artifacts
|
## Example: Adding Authentication

```markdown
## EVAL: add-authentication

### Phase 1: Define (10 min)
Capability Evals:
- [ ] User can register with email/password
- [ ] User can log in with valid credentials
- [ ] Invalid credentials rejected with proper error
- [ ] Sessions persist across page reloads
- [ ] Logout clears session

Regression Evals:
- [ ] Public routes still accessible
- [ ] API responses unchanged
- [ ] Database schema compatible

### Phase 2: Implement (varies)
[Write code]

### Phase 3: Evaluate
Run: /eval check add-authentication

### Phase 4: Report
EVAL REPORT: add-authentication
==============================
Capability: 5/5 passed (pass@3: 100%)
Regression: 3/3 passed (pass^3: 100%)
Status: SHIP IT
```
|
## Product Evals (v1.8)

Use product evals when behavior quality cannot be captured by unit tests alone.

### Grader Types

1. Code grader (deterministic assertions)
2. Rule grader (regex/schema constraints)
3. Model grader (LLM-as-judge rubric)
4. Human grader (manual adjudication for ambiguous outputs)
|
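A rule grader sits between code and model graders. It checks structure and forbidden content deterministically, without a model in the loop. This is a minimal sketch that assumes the product output is expected to be a JSON object. The required keys and forbidden patterns are supplied by the caller, and the ones in the usage below are illustrative.

```python
import json
import re


def rule_grade(output: str, required_keys: set[str], forbidden_patterns: list[str]) -> tuple[bool, list[str]]:
    """Deterministic rule grader: the output must parse as a JSON object, contain every
    required key, and match none of the forbidden regex patterns."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level JSON value is not an object"]
    problems: list[str] = []
    missing = required_keys - set(data)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    for pattern in forbidden_patterns:
        if re.search(pattern, output):
            problems.append(f"matches forbidden pattern {pattern!r}")
    return not problems, problems
```

For example, `rule_grade(reply, {"title", "summary"}, [r"(?i)as an ai"])` fails a reply that omits `summary` or contains boilerplate model disclaimers, and it reports each problem separately.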
### pass@k Guidance

- `pass@1`: single-attempt reliability
- `pass@3`: practical reliability under controlled retries
- `pass^3`: stability test (all 3 runs must pass)

Recommended thresholds:
- Capability evals: pass@3 >= 0.90
- Regression evals: pass^3 = 1.00 for release-critical paths
|
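The recommended thresholds can be enforced as a release gate that lists every violated threshold rather than stopping at the first. The sketch below assumes the two aggregate metrics have already been computed. `EvalSummary` and `release_gate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    capability_pass_at_3: float   # share of capability evals passing within 3 trials
    regression_pass_hat_3: float  # share of release-critical regression evals passing all 3 trials


def release_gate(summary: EvalSummary, capability_min: float = 0.90) -> list[str]:
    """Return the violated thresholds; an empty list means the release may proceed."""
    failures: list[str] = []
    if summary.capability_pass_at_3 < capability_min:
        failures.append(f"capability pass@3 {summary.capability_pass_at_3:.2f} < {capability_min:.2f}")
    # Exact comparison on purpose: every release-critical regression eval must pass all 3 runs
    if summary.regression_pass_hat_3 != 1.0:
        failures.append(f"regression pass^3 {summary.regression_pass_hat_3:.2f} != 1.00")
    return failures
```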
### Eval Anti-Patterns

- Overfitting prompts to known eval examples
- Measuring only happy-path outputs
- Ignoring cost and latency drift while chasing pass rates
- Allowing flaky graders in release gates
|
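One way to keep flaky graders out of release gates is to grade the same output several times and measure how often the grader agrees with itself. This is a minimal sketch. The run count and the 100% agreement threshold are assumptions. Model graders sampled at nonzero temperature are a common source of disagreement.

```python
from collections import Counter
from typing import Callable


def grader_agreement(grader: Callable[[str], bool], output: str, runs: int = 5) -> float:
    """Grade the same output `runs` times; return the share of runs that agree with the majority verdict."""
    verdicts = Counter(grader(output) for _ in range(runs))
    return verdicts.most_common(1)[0][1] / runs


def is_flaky(grader: Callable[[str], bool], output: str, runs: int = 5, min_agreement: float = 1.0) -> bool:
    # min_agreement = 1.0 assumes a release-gate grader must give the same verdict on every run
    return grader_agreement(grader, output, runs) < min_agreement
```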
### Minimal Eval Artifact Layout

- `.claude/evals/<feature>.md` definition
- `.claude/evals/<feature>.log` run history
- `docs/releases/<version>/eval-summary.md` release snapshot