rsc-universal 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +279 -0
- package/manifest.json +4761 -0
- package/package.json +59 -0
- package/schema/frontmatter.schema.json +12 -0
- package/scripts/build-manifest.js +72 -0
- package/scripts/consult.js +106 -0
- package/scripts/detect-repo.js +118 -0
- package/scripts/doctor.js +21 -0
- package/scripts/eval-lint.sh +179 -0
- package/scripts/install-apply.js +52 -0
- package/scripts/install-plan.js +13 -0
- package/scripts/lib/behavior-score.js +103 -0
- package/scripts/lib/frontmatter.js +47 -0
- package/scripts/lib/harden-policy.js +41 -0
- package/scripts/lib/manifest.js +18 -0
- package/scripts/lib/recommend.js +36 -0
- package/scripts/lib/registry.js +110 -0
- package/scripts/lib/result-envelope.js +35 -0
- package/scripts/lib/state.js +12 -0
- package/scripts/lib/ui.js +17 -0
- package/scripts/reviewer-guard.sh +67 -0
- package/scripts/rsc.js +108 -0
- package/scripts/skill-behavior-eval.js +33 -0
- package/scripts/skill-behavior-eval.workflow.js +136 -0
- package/scripts/skill-behavior-rubric.md +63 -0
- package/scripts/skill-harden-rubric.md +40 -0
- package/scripts/skill-harden.workflow.js +161 -0
- package/scripts/skill-rubric.md +39 -0
- package/scripts/skill-scoreboard.workflow.js +35 -0
- package/skills/ab-testing/SKILL.md +191 -0
- package/skills/ab-testing/evals/README.md +8 -0
- package/skills/ab-testing/evals/cases.yaml +49 -0
- package/skills/ab-testing/references/pitfalls.md +74 -0
- package/skills/ab-testing/references/sample-size-and-cuped.md +128 -0
- package/skills/ab-testing/scripts/verify.sh +89 -0
- package/skills/accessibility/SKILL.md +218 -0
- package/skills/accessibility/evals/README.md +3 -0
- package/skills/accessibility/evals/cases.yaml +47 -0
- package/skills/accessibility/references/aria-patterns.md +113 -0
- package/skills/accessibility/references/wcag22-checklist.md +83 -0
- package/skills/accessibility/scripts/verify.sh +103 -0
- package/skills/ads/SKILL.md +175 -0
- package/skills/ads/evals/README.md +15 -0
- package/skills/ads/evals/cases.yaml +58 -0
- package/skills/ads/references/platform-specs.md +73 -0
- package/skills/ads/references/roas-model.md +77 -0
- package/skills/ads/scripts/verify.sh +210 -0
- package/skills/agent-eval/SKILL.md +213 -0
- package/skills/agent-eval/evals/README.md +12 -0
- package/skills/agent-eval/evals/cases.yaml +45 -0
- package/skills/agent-eval/references/judge-design.md +118 -0
- package/skills/agent-eval/references/runner-and-gate.md +183 -0
- package/skills/agent-eval/scripts/verify.sh +161 -0
- package/skills/agent-safety/SKILL.md +176 -0
- package/skills/agent-safety/evals/README.md +12 -0
- package/skills/agent-safety/evals/cases.yaml +46 -0
- package/skills/agent-safety/references/threat-model.md +51 -0
- package/skills/ai-media/SKILL.md +196 -0
- package/skills/ai-media/evals/README.md +3 -0
- package/skills/ai-media/evals/cases.yaml +45 -0
- package/skills/ai-media/references/ffmpeg-assembly.md +117 -0
- package/skills/ai-media/references/models-and-params.md +78 -0
- package/skills/ai-media/scripts/verify.sh +103 -0
- package/skills/analytics/SKILL.md +219 -0
- package/skills/analytics/evals/README.md +9 -0
- package/skills/analytics/evals/cases.yaml +53 -0
- package/skills/analytics/references/event-taxonomy.md +75 -0
- package/skills/analytics/references/ga4-setup.md +122 -0
- package/skills/analytics/references/posthog-setup.md +100 -0
- package/skills/analytics/scripts/verify.sh +95 -0
- package/skills/analyze/SKILL.md +136 -0
- package/skills/analyze/evals/README.md +72 -0
- package/skills/analyze/evals/cases.yaml +74 -0
- package/skills/angular/SKILL.md +288 -0
- package/skills/angular/evals/README.md +3 -0
- package/skills/angular/evals/cases.yaml +38 -0
- package/skills/angular/references/migration.md +81 -0
- package/skills/angular/references/signals-rxjs.md +92 -0
- package/skills/angular/scripts/verify.sh +122 -0
- package/skills/api-connector-builder/SKILL.md +285 -0
- package/skills/api-connector-builder/evals/README.md +11 -0
- package/skills/api-connector-builder/evals/cases.yaml +47 -0
- package/skills/api-connector-builder/references/auth-flows.md +132 -0
- package/skills/api-connector-builder/references/pagination.md +144 -0
- package/skills/api-connector-builder/scripts/verify.sh +172 -0
- package/skills/api-design/SKILL.md +189 -0
- package/skills/api-design/evals/README.md +3 -0
- package/skills/api-design/evals/cases.yaml +45 -0
- package/skills/api-design/references/graphql-design.md +70 -0
- package/skills/api-design/references/openapi-contract.md +86 -0
- package/skills/api-design/references/rest-conventions.md +63 -0
- package/skills/api-design/references/versioning-and-evolution.md +49 -0
- package/skills/api-design/scripts/verify.sh +138 -0
- package/skills/article-writing/SKILL.md +175 -0
- package/skills/article-writing/evals/README.md +3 -0
- package/skills/article-writing/evals/cases.yaml +47 -0
- package/skills/article-writing/references/ai-tell-banlist.md +114 -0
- package/skills/article-writing/references/on-page-seo.md +133 -0
- package/skills/article-writing/scripts/verify.sh +165 -0
- package/skills/astro/SKILL.md +275 -0
- package/skills/astro/evals/README.md +3 -0
- package/skills/astro/evals/cases.yaml +41 -0
- package/skills/astro/references/content-layer.md +118 -0
- package/skills/astro/references/deploy-and-integrations.md +163 -0
- package/skills/astro/scripts/verify.sh +137 -0
- package/skills/author-skill/SKILL.md +206 -0
- package/skills/author-skill/evals/README.md +66 -0
- package/skills/author-skill/evals/cases.yaml +75 -0
- package/skills/author-skill/references/description-recipe.md +84 -0
- package/skills/author-skill/references/eval-authoring.md +74 -0
- package/skills/author-skill/references/rsc-conventions.md +91 -0
- package/skills/automation-flows/SKILL.md +132 -0
- package/skills/automation-flows/evals/README.md +5 -0
- package/skills/automation-flows/evals/cases.yaml +44 -0
- package/skills/automation-flows/references/error-handling.md +58 -0
- package/skills/automation-flows/references/n8n-workflow-json.md +63 -0
- package/skills/automation-flows/scripts/verify.sh +78 -0
- package/skills/aws-essentials/SKILL.md +223 -0
- package/skills/aws-essentials/evals/README.md +10 -0
- package/skills/aws-essentials/evals/cases.yaml +44 -0
- package/skills/aws-essentials/references/iam-least-privilege.md +134 -0
- package/skills/aws-essentials/references/rds-cloudfront-recipes.md +127 -0
- package/skills/aws-essentials/scripts/verify.sh +99 -0
- package/skills/backups/SKILL.md +137 -0
- package/skills/backups/evals/README.md +3 -0
- package/skills/backups/evals/cases.yaml +42 -0
- package/skills/backups/references/engine-recipes.md +121 -0
- package/skills/backups/references/restore-runbook.md +65 -0
- package/skills/backups/scripts/verify.sh +80 -0
- package/skills/bash-scripting/SKILL.md +231 -0
- package/skills/bash-scripting/evals/README.md +3 -0
- package/skills/bash-scripting/evals/cases.yaml +45 -0
- package/skills/bash-scripting/references/portability.md +97 -0
- package/skills/bash-scripting/scripts/verify.sh +140 -0
- package/skills/bookkeeping/SKILL.md +184 -0
- package/skills/bookkeeping/evals/README.md +5 -0
- package/skills/bookkeeping/evals/cases.yaml +52 -0
- package/skills/bookkeeping/references/chart-of-accounts.md +87 -0
- package/skills/bookkeeping/references/reconciliation-playbook.md +54 -0
- package/skills/bookkeeping/references/tricky-transactions.md +192 -0
- package/skills/brand-identity/SKILL.md +161 -0
- package/skills/brand-identity/evals/README.md +14 -0
- package/skills/brand-identity/evals/cases.yaml +43 -0
- package/skills/brand-identity/references/color-and-tokens.md +129 -0
- package/skills/brand-identity/references/logo-and-assets.md +117 -0
- package/skills/brand-identity/scripts/verify.sh +224 -0
- package/skills/brand-voice/SKILL.md +183 -0
- package/skills/brand-voice/evals/README.md +3 -0
- package/skills/brand-voice/evals/cases.yaml +57 -0
- package/skills/brand-voice/references/voice-guide-template.md +150 -0
- package/skills/brand-voice/references/word-bank.md +61 -0
- package/skills/brand-voice/scripts/verify.sh +190 -0
- package/skills/building-agents/SKILL.md +469 -0
- package/skills/building-agents/evals/README.md +68 -0
- package/skills/building-agents/evals/cases.yaml +60 -0
- package/skills/building-agents/references/agent-loops-and-harness.md +371 -0
- package/skills/building-agents/references/evals-and-observability.md +420 -0
- package/skills/building-agents/references/mcp-servers.md +294 -0
- package/skills/building-agents/references/provider-abstraction.md +489 -0
- package/skills/building-agents/references/tools-and-rag.md +417 -0
- package/skills/building-agents/scripts/verify.sh +121 -0
- package/skills/business-intelligence/SKILL.md +176 -0
- package/skills/business-intelligence/evals/README.md +3 -0
- package/skills/business-intelligence/evals/cases.yaml +43 -0
- package/skills/business-intelligence/references/authoring-semantic-models.md +120 -0
- package/skills/business-intelligence/references/wiring-agents-and-apis.md +79 -0
- package/skills/business-intelligence/scripts/verify.sh +143 -0
- package/skills/calendar-scheduling/SKILL.md +196 -0
- package/skills/calendar-scheduling/evals/README.md +14 -0
- package/skills/calendar-scheduling/evals/cases.yaml +45 -0
- package/skills/calendar-scheduling/references/google-calendar-sync.md +78 -0
- package/skills/calendar-scheduling/references/provider-matrix.md +71 -0
- package/skills/calendar-scheduling/scripts/verify.sh +117 -0
- package/skills/case-studies/SKILL.md +147 -0
- package/skills/case-studies/evals/README.md +3 -0
- package/skills/case-studies/evals/cases.yaml +63 -0
- package/skills/case-studies/references/case-study-skeleton.md +90 -0
- package/skills/case-studies/references/consent-and-substantiation.md +80 -0
- package/skills/case-studies/scripts/verify.sh +161 -0
- package/skills/chatbot/SKILL.md +168 -0
- package/skills/chatbot/evals/README.md +13 -0
- package/skills/chatbot/evals/cases.yaml +43 -0
- package/skills/chatbot/references/handoff-and-sales.md +71 -0
- package/skills/chatbot/references/system-prompt-and-guardrails.md +78 -0
- package/skills/chatbot/scripts/verify.sh +162 -0
- package/skills/chrome-extension/SKILL.md +169 -0
- package/skills/chrome-extension/evals/README.md +12 -0
- package/skills/chrome-extension/evals/cases.yaml +40 -0
- package/skills/chrome-extension/references/store-and-migration.md +84 -0
- package/skills/chrome-extension/scripts/verify.sh +62 -0
- package/skills/clarify/SKILL.md +159 -0
- package/skills/clarify/evals/README.md +70 -0
- package/skills/clarify/evals/cases.yaml +71 -0
- package/skills/clickhouse-analytics/SKILL.md +165 -0
- package/skills/clickhouse-analytics/evals/README.md +3 -0
- package/skills/clickhouse-analytics/evals/cases.yaml +45 -0
- package/skills/clickhouse-analytics/references/ingestion-and-mvs.md +109 -0
- package/skills/clickhouse-analytics/references/query-optimization.md +76 -0
- package/skills/clickhouse-analytics/references/schema-and-engines.md +63 -0
- package/skills/clickhouse-analytics/scripts/verify.sh +109 -0
- package/skills/client-onboarding/SKILL.md +254 -0
- package/skills/client-onboarding/evals/README.md +14 -0
- package/skills/client-onboarding/evals/cases.yaml +40 -0
- package/skills/client-onboarding/references/onboarding-playbook.md +126 -0
- package/skills/cloudflare/SKILL.md +191 -0
- package/skills/cloudflare/evals/README.md +15 -0
- package/skills/cloudflare/evals/cases.yaml +46 -0
- package/skills/cloudflare/references/storage-primitives.md +104 -0
- package/skills/cloudflare/references/wrangler-config.md +91 -0
- package/skills/cloudflare/scripts/verify.sh +133 -0
- package/skills/code-review/SKILL.md +143 -0
- package/skills/code-review/evals/README.md +3 -0
- package/skills/code-review/evals/cases.yaml +55 -0
- package/skills/code-review/references/pr-workflow.md +67 -0
- package/skills/codebase-onboarding/SKILL.md +133 -0
- package/skills/codebase-onboarding/evals/README.md +3 -0
- package/skills/codebase-onboarding/evals/cases.yaml +69 -0
- package/skills/codebase-onboarding/references/recon-playbook.md +57 -0
- package/skills/codebase-onboarding/scripts/verify.sh +54 -0
- package/skills/cold-outreach/SKILL.md +206 -0
- package/skills/cold-outreach/evals/README.md +3 -0
- package/skills/cold-outreach/evals/cases.yaml +60 -0
- package/skills/cold-outreach/references/compliance-footer.md +50 -0
- package/skills/cold-outreach/references/hook-derivation.md +73 -0
- package/skills/cold-outreach/references/templates.md +88 -0
- package/skills/cold-outreach/scripts/verify.sh +170 -0
- package/skills/community/SKILL.md +225 -0
- package/skills/community/evals/README.md +3 -0
- package/skills/community/evals/cases.yaml +40 -0
- package/skills/community/references/metrics-and-rituals.md +58 -0
- package/skills/community/references/platform-playbooks.md +64 -0
- package/skills/community/scripts/verify.sh +83 -0
- package/skills/competitor-watch/SKILL.md +193 -0
- package/skills/competitor-watch/evals/README.md +19 -0
- package/skills/competitor-watch/evals/cases.yaml +54 -0
- package/skills/competitor-watch/references/monitoring-config.md +124 -0
- package/skills/competitor-watch/references/tracker-schema.md +79 -0
- package/skills/competitor-watch/scripts/verify.sh +253 -0
- package/skills/compliance/SKILL.md +184 -0
- package/skills/compliance/evals/README.md +14 -0
- package/skills/compliance/evals/cases.yaml +46 -0
- package/skills/compliance/references/frameworks.md +108 -0
- package/skills/compliance/references/operating-rhythm.md +79 -0
- package/skills/compliance/scripts/verify.sh +168 -0
- package/skills/compose-multiplatform/SKILL.md +198 -0
- package/skills/compose-multiplatform/evals/README.md +3 -0
- package/skills/compose-multiplatform/evals/cases.yaml +40 -0
- package/skills/compose-multiplatform/references/ios-interop.md +91 -0
- package/skills/compose-multiplatform/references/project-setup.md +96 -0
- package/skills/compose-multiplatform/scripts/verify.sh +123 -0
- package/skills/constitution/SKILL.md +160 -0
- package/skills/constitution/evals/README.md +68 -0
- package/skills/constitution/evals/cases.yaml +72 -0
- package/skills/constitution/references/constitution-template.md +90 -0
- package/skills/content-engine/SKILL.md +164 -0
- package/skills/content-engine/evals/README.md +17 -0
- package/skills/content-engine/evals/cases.yaml +62 -0
- package/skills/content-engine/references/atomization.md +81 -0
- package/skills/content-engine/references/brief-and-pipeline.md +90 -0
- package/skills/content-engine/scripts/verify.sh +146 -0
- package/skills/context-budget/SKILL.md +132 -0
- package/skills/context-budget/evals/README.md +11 -0
- package/skills/context-budget/evals/cases.yaml +40 -0
- package/skills/context-budget/references/handoff-and-compaction.md +96 -0
- package/skills/continuous-learning/SKILL.md +136 -0
- package/skills/continuous-learning/evals/README.md +16 -0
- package/skills/continuous-learning/evals/cases.yaml +39 -0
- package/skills/continuous-learning/references/lesson-routing.md +106 -0
- package/skills/contracts/SKILL.md +124 -0
- package/skills/contracts/evals/README.md +3 -0
- package/skills/contracts/evals/cases.yaml +42 -0
- package/skills/contracts/references/clause-library.md +129 -0
- package/skills/contracts/references/review-playbook.md +49 -0
- package/skills/contracts/scripts/verify.sh +53 -0
- package/skills/coolify/SKILL.md +201 -0
- package/skills/coolify/evals/README.md +21 -0
- package/skills/coolify/evals/cases.yaml +46 -0
- package/skills/coolify/references/databases-and-backups.md +99 -0
- package/skills/coolify/references/deploy-recipes.md +105 -0
- package/skills/coolify/references/install-and-proxy.md +80 -0
- package/skills/coolify/scripts/verify.sh +123 -0
- package/skills/cost-tracking/SKILL.md +183 -0
- package/skills/cost-tracking/evals/README.md +3 -0
- package/skills/cost-tracking/evals/cases.yaml +45 -0
- package/skills/cost-tracking/references/cloud-caps.md +52 -0
- package/skills/cost-tracking/references/pricing-tables.md +51 -0
- package/skills/cost-tracking/scripts/verify.sh +135 -0
- package/skills/course-builder/SKILL.md +186 -0
- package/skills/course-builder/evals/README.md +16 -0
- package/skills/course-builder/evals/cases.yaml +49 -0
- package/skills/course-builder/references/assessment-design.md +74 -0
- package/skills/course-builder/references/grounding-and-scoping.md +69 -0
- package/skills/course-builder/references/outcomes-and-blooms.md +82 -0
- package/skills/course-builder/scripts/verify.sh +247 -0
- package/skills/course-storytelling/SKILL.md +205 -0
- package/skills/course-storytelling/evals/README.md +54 -0
- package/skills/course-storytelling/evals/cases.yaml +50 -0
- package/skills/course-storytelling/references/brunson-frameworks.md +190 -0
- package/skills/course-storytelling/references/concept-landing-recipe.md +136 -0
- package/skills/course-storytelling/references/course-analysis.md +124 -0
- package/skills/course-storytelling/references/learner-grounding.md +183 -0
- package/skills/course-storytelling/references/mental-models.md +115 -0
- package/skills/course-storytelling/scripts/verify.sh +223 -0
- package/skills/cpp/SKILL.md +349 -0
- package/skills/cpp/evals/README.md +14 -0
- package/skills/cpp/evals/cases.yaml +44 -0
- package/skills/cpp/references/cmake.md +167 -0
- package/skills/cpp/references/move-and-templates.md +130 -0
- package/skills/cpp/references/undefined-behavior.md +86 -0
- package/skills/cpp/scripts/verify.sh +165 -0
- package/skills/csharp-dotnet/SKILL.md +291 -0
- package/skills/csharp-dotnet/evals/README.md +3 -0
- package/skills/csharp-dotnet/evals/cases.yaml +48 -0
- package/skills/csharp-dotnet/references/aspnetcore.md +99 -0
- package/skills/csharp-dotnet/references/async.md +82 -0
- package/skills/csharp-dotnet/references/efcore.md +96 -0
- package/skills/csharp-dotnet/scripts/verify.sh +90 -0
- package/skills/customer-support/SKILL.md +193 -0
- package/skills/customer-support/evals/README.md +13 -0
- package/skills/customer-support/evals/cases.yaml +61 -0
- package/skills/customer-support/references/macros-and-sla.md +142 -0
- package/skills/dashboard/SKILL.md +205 -0
- package/skills/dashboard/evals/README.md +3 -0
- package/skills/dashboard/evals/cases.yaml +50 -0
- package/skills/dashboard/references/chart-selection.md +34 -0
- package/skills/dashboard/references/tile-schema.md +164 -0
- package/skills/dashboard/scripts/verify.sh +130 -0
- package/skills/data-cleaning/SKILL.md +285 -0
- package/skills/data-cleaning/evals/README.md +16 -0
- package/skills/data-cleaning/evals/cases.yaml +57 -0
- package/skills/data-cleaning/references/normalization-recipes.md +136 -0
- package/skills/data-cleaning/references/validation-patterns.md +134 -0
- package/skills/data-cleaning/scripts/verify.sh +115 -0
- package/skills/data-policy/SKILL.md +163 -0
- package/skills/data-policy/evals/README.md +15 -0
- package/skills/data-policy/evals/cases.yaml +44 -0
- package/skills/data-policy/references/consent-and-ropa.md +97 -0
- package/skills/data-policy/references/retention-schedule.md +83 -0
- package/skills/data-policy/scripts/verify.sh +143 -0
- package/skills/data-scraper/SKILL.md +134 -0
- package/skills/data-scraper/evals/README.md +3 -0
- package/skills/data-scraper/evals/cases.yaml +46 -0
- package/skills/data-scraper/references/anti-bot.md +85 -0
- package/skills/data-scraper/references/frameworks.md +116 -0
- package/skills/data-scraper/references/legal-compliance.md +59 -0
- package/skills/data-scraper/scripts/verify.sh +166 -0
- package/skills/db-migrations/SKILL.md +254 -0
- package/skills/db-migrations/evals/README.md +10 -0
- package/skills/db-migrations/evals/cases.yaml +46 -0
- package/skills/db-migrations/references/backfill-and-batching.md +105 -0
- package/skills/db-migrations/references/expand-contract-playbook.md +152 -0
- package/skills/db-migrations/references/tools-and-runners.md +88 -0
- package/skills/db-migrations/scripts/verify.sh +112 -0
- package/skills/debug/SKILL.md +227 -0
- package/skills/debug/evals/README.md +88 -0
- package/skills/debug/evals/cases.yaml +74 -0
- package/skills/decision-records/SKILL.md +189 -0
- package/skills/decision-records/evals/README.md +3 -0
- package/skills/decision-records/evals/cases.yaml +43 -0
- package/skills/decision-records/references/templates.md +232 -0
- package/skills/decision-records/scripts/verify.sh +105 -0
- package/skills/deployment/SKILL.md +439 -0
- package/skills/deployment/evals/README.md +50 -0
- package/skills/deployment/evals/cases.yaml +53 -0
- package/skills/deployment/references/coolify.md +216 -0
- package/skills/deployment/references/dockerfiles-by-stack.md +319 -0
- package/skills/deployment/references/github-actions.md +295 -0
- package/skills/deployment/references/hosting-targets.md +272 -0
- package/skills/deployment/scripts/verify.sh +134 -0
- package/skills/design/SKILL.md +399 -0
- package/skills/design/evals/README.md +53 -0
- package/skills/design/evals/cases.yaml +56 -0
- package/skills/design/references/brand-grounding.md +187 -0
- package/skills/design/references/copywriting-frameworks.md +138 -0
- package/skills/design/references/landing-anatomy-and-cro.md +202 -0
- package/skills/design/references/motion-and-interaction.md +182 -0
- package/skills/design/references/research-method.md +147 -0
- package/skills/design/references/signature-and-craft.md +148 -0
- package/skills/design/references/trends-2026.md +80 -0
- package/skills/design/references/visual-system.md +236 -0
- package/skills/design/scripts/verify.sh +248 -0
- package/skills/digitalocean/SKILL.md +251 -0
- package/skills/digitalocean/evals/README.md +10 -0
- package/skills/digitalocean/evals/cases.yaml +37 -0
- package/skills/digitalocean/references/app-spec.md +126 -0
- package/skills/digitalocean/references/droplet-ops.md +95 -0
- package/skills/digitalocean/scripts/verify.sh +102 -0
- package/skills/django/SKILL.md +268 -0
- package/skills/django/evals/README.md +11 -0
- package/skills/django/evals/cases.yaml +47 -0
- package/skills/django/references/drf.md +109 -0
- package/skills/django/references/orm-performance.md +91 -0
- package/skills/django/references/security.md +81 -0
- package/skills/django/references/testing.md +86 -0
- package/skills/django/scripts/verify.sh +115 -0
- package/skills/docker/SKILL.md +283 -0
- package/skills/docker/evals/README.md +10 -0
- package/skills/docker/evals/cases.yaml +44 -0
- package/skills/docker/references/base-images-and-stages.md +104 -0
- package/skills/docker/references/compose-recipes.md +109 -0
- package/skills/docker/scripts/verify.sh +149 -0
- package/skills/document-processing/SKILL.md +214 -0
- package/skills/document-processing/evals/README.md +3 -0
- package/skills/document-processing/evals/cases.yaml +65 -0
- package/skills/document-processing/references/engines.md +67 -0
- package/skills/document-processing/scripts/verify.sh +172 -0
- package/skills/domains-dns/SKILL.md +146 -0
- package/skills/domains-dns/evals/README.md +16 -0
- package/skills/domains-dns/evals/cases.yaml +47 -0
- package/skills/domains-dns/references/record-cookbook.md +94 -0
- package/skills/domains-dns/references/tls-and-acme.md +90 -0
- package/skills/domains-dns/references/verify-and-debug.md +64 -0
- package/skills/domains-dns/scripts/verify.sh +163 -0
- package/skills/drizzle-orm/SKILL.md +234 -0
- package/skills/drizzle-orm/evals/README.md +12 -0
- package/skills/drizzle-orm/evals/cases.yaml +47 -0
- package/skills/drizzle-orm/references/relations-and-drivers.md +118 -0
- package/skills/drizzle-orm/scripts/verify.sh +155 -0
- package/skills/duckdb/SKILL.md +207 -0
- package/skills/duckdb/evals/README.md +31 -0
- package/skills/duckdb/evals/cases.yaml +41 -0
- package/skills/duckdb/references/python-and-interop.md +105 -0
- package/skills/duckdb/references/remote-and-lakehouse.md +101 -0
- package/skills/duckdb/scripts/verify.sh +71 -0
- package/skills/dynamodb/SKILL.md +217 -0
- package/skills/dynamodb/evals/README.md +8 -0
- package/skills/dynamodb/evals/cases.yaml +46 -0
- package/skills/dynamodb/references/access-patterns.md +127 -0
- package/skills/dynamodb/references/capacity-and-limits.md +78 -0
- package/skills/dynamodb/scripts/verify.sh +108 -0
- package/skills/e-signature/SKILL.md +185 -0
- package/skills/e-signature/evals/README.md +3 -0
- package/skills/e-signature/evals/cases.yaml +44 -0
- package/skills/e-signature/references/docusign.md +83 -0
- package/skills/e-signature/references/dropbox-sign.md +73 -0
- package/skills/e-signature/references/legal-tiers.md +37 -0
- package/skills/e-signature/scripts/verify.sh +81 -0
- package/skills/e2e-testing/SKILL.md +243 -0
- package/skills/e2e-testing/evals/README.md +10 -0
- package/skills/e2e-testing/evals/cases.yaml +64 -0
- package/skills/e2e-testing/references/config-and-ci.md +156 -0
- package/skills/e2e-testing/references/flakiness-playbook.md +124 -0
- package/skills/e2e-testing/scripts/verify.sh +117 -0
- package/skills/electron/SKILL.md +221 -0
- package/skills/electron/evals/README.md +13 -0
- package/skills/electron/evals/cases.yaml +38 -0
- package/skills/electron/references/packaging-and-updates.md +122 -0
- package/skills/electron/references/security-and-ipc.md +158 -0
- package/skills/electron/scripts/verify.sh +143 -0
- package/skills/elixir/SKILL.md +217 -0
- package/skills/elixir/evals/README.md +3 -0
- package/skills/elixir/evals/cases.yaml +41 -0
- package/skills/elixir/references/mix-and-releases.md +91 -0
- package/skills/elixir/references/otp-patterns.md +96 -0
- package/skills/elixir/scripts/verify.sh +76 -0
- package/skills/email-connector/SKILL.md +294 -0
- package/skills/email-connector/evals/README.md +19 -0
- package/skills/email-connector/evals/cases.yaml +39 -0
- package/skills/email-connector/references/providers.md +107 -0
- package/skills/email-connector/scripts/verify.sh +72 -0
- package/skills/email-deliverability/SKILL.md +168 -0
- package/skills/email-deliverability/evals/README.md +21 -0
- package/skills/email-deliverability/evals/cases.yaml +45 -0
- package/skills/email-deliverability/scripts/verify.sh +98 -0
- package/skills/embeddings-search/SKILL.md +193 -0
- package/skills/embeddings-search/evals/README.md +10 -0
- package/skills/embeddings-search/evals/cases.yaml +44 -0
- package/skills/embeddings-search/references/evaluation.md +86 -0
- package/skills/embeddings-search/references/models.md +73 -0
- package/skills/embeddings-search/scripts/verify.sh +103 -0
- package/skills/error-handling/SKILL.md +307 -0
- package/skills/error-handling/evals/README.md +12 -0
- package/skills/error-handling/evals/cases.yaml +46 -0
- package/skills/error-handling/references/boundaries-and-messaging.md +120 -0
- package/skills/error-handling/references/retry-and-resilience.md +154 -0
- package/skills/error-handling/scripts/verify.sh +110 -0
- package/skills/expo/SKILL.md +253 -0
- package/skills/expo/evals/README.md +13 -0
- package/skills/expo/evals/cases.yaml +44 -0
- package/skills/expo/references/config-plugins.md +117 -0
- package/skills/expo/references/eas-update.md +118 -0
- package/skills/expo/scripts/verify.sh +132 -0
- package/skills/fal/SKILL.md +210 -0
- package/skills/fal/evals/README.md +3 -0
- package/skills/fal/evals/cases.yaml +42 -0
- package/skills/fal/references/models-and-cost.md +53 -0
- package/skills/fal/references/queue-and-webhooks.md +153 -0
- package/skills/fal/scripts/verify.sh +72 -0
- package/skills/fastapi/SKILL.md +499 -0
- package/skills/fastapi/evals/README.md +50 -0
- package/skills/fastapi/evals/cases.yaml +55 -0
- package/skills/fastapi/references/database.md +347 -0
- package/skills/fastapi/references/production.md +338 -0
- package/skills/fastapi/references/security.md +330 -0
- package/skills/fastapi/references/testing.md +349 -0
- package/skills/fastapi/scripts/verify.sh +116 -0
- package/skills/finance-ops/SKILL.md +149 -0
- package/skills/finance-ops/evals/README.md +3 -0
- package/skills/finance-ops/evals/cases.yaml +39 -0
- package/skills/finance-ops/references/cash-flow-forecast.md +57 -0
- package/skills/finance-ops/references/month-close.md +59 -0
- package/skills/finance-ops/references/reconciliation.md +65 -0
- package/skills/finance-ops/scripts/verify.sh +166 -0
- package/skills/financial-model/SKILL.md +170 -0
- package/skills/financial-model/evals/README.md +3 -0
- package/skills/financial-model/evals/cases.yaml +53 -0
- package/skills/financial-model/references/benchmarks-and-scenarios.md +55 -0
- package/skills/financial-model/references/model-structure.md +67 -0
- package/skills/financial-model/references/revenue-build.md +68 -0
- package/skills/financial-model/scripts/verify.sh +232 -0
- package/skills/firebase/SKILL.md +251 -0
- package/skills/firebase/evals/README.md +12 -0
- package/skills/firebase/evals/cases.yaml +45 -0
- package/skills/firebase/references/cloud-functions.md +102 -0
- package/skills/firebase/references/data-modeling.md +108 -0
- package/skills/firebase/references/security-rules.md +137 -0
- package/skills/firebase/scripts/verify.sh +98 -0
- package/skills/flutter/SKILL.md +448 -0
- package/skills/flutter/evals/README.md +54 -0
- package/skills/flutter/evals/cases.yaml +69 -0
- package/skills/flutter/references/architecture-and-state.md +499 -0
- package/skills/flutter/references/i18n-and-dependencies.md +197 -0
- package/skills/flutter/references/performance.md +299 -0
- package/skills/flutter/references/testing.md +385 -0
- package/skills/flutter/references/ui-and-navigation.md +378 -0
- package/skills/flutter/scripts/verify.sh +104 -0
- package/skills/fly-io/SKILL.md +206 -0
- package/skills/fly-io/evals/README.md +3 -0
- package/skills/fly-io/evals/cases.yaml +42 -0
- package/skills/fly-io/references/fly-toml.md +155 -0
- package/skills/fly-io/references/multi-region.md +66 -0
- package/skills/fly-io/scripts/verify.sh +90 -0
- package/skills/forecasting/SKILL.md +139 -0
- package/skills/forecasting/evals/README.md +13 -0
- package/skills/forecasting/evals/cases.yaml +47 -0
- package/skills/forecasting/references/accuracy-and-backtesting.md +104 -0
- package/skills/forecasting/references/methods-cheatsheet.md +94 -0
- package/skills/forecasting/scripts/verify.sh +99 -0
- package/skills/fundraising/SKILL.md +162 -0
- package/skills/fundraising/evals/README.md +18 -0
- package/skills/fundraising/evals/cases.yaml +76 -0
- package/skills/fundraising/references/funnel-math.md +90 -0
- package/skills/fundraising/references/process-playbook.md +97 -0
- package/skills/gcp-essentials/SKILL.md +327 -0
- package/skills/gcp-essentials/evals/README.md +12 -0
- package/skills/gcp-essentials/evals/cases.yaml +38 -0
- package/skills/gcp-essentials/references/deploy-recipes.md +81 -0
- package/skills/gcp-essentials/references/iam-and-auth.md +94 -0
- package/skills/gcp-essentials/references/networking-and-sql.md +74 -0
- package/skills/gcp-essentials/scripts/verify.sh +158 -0
- package/skills/gdpr-privacy/SKILL.md +167 -0
- package/skills/gdpr-privacy/evals/README.md +3 -0
- package/skills/gdpr-privacy/evals/cases.yaml +47 -0
- package/skills/gdpr-privacy/references/dpa-and-transfers.md +63 -0
- package/skills/gdpr-privacy/references/dsar-and-consent.md +83 -0
- package/skills/gdpr-privacy/references/privacy-policy-blueprint.md +99 -0
- package/skills/gdpr-privacy/scripts/verify.sh +84 -0
- package/skills/git-workflow/SKILL.md +190 -0
- package/skills/git-workflow/evals/README.md +10 -0
- package/skills/git-workflow/evals/cases.yaml +47 -0
- package/skills/git-workflow/references/interactive-rebase.md +89 -0
- package/skills/github-actions/SKILL.md +256 -0
- package/skills/github-actions/evals/README.md +3 -0
- package/skills/github-actions/evals/cases.yaml +45 -0
- package/skills/github-actions/references/caching-and-matrix.md +92 -0
- package/skills/github-actions/references/oidc-deploys.md +130 -0
- package/skills/github-actions/scripts/verify.sh +105 -0
- package/skills/go/SKILL.md +438 -0
- package/skills/go/evals/README.md +56 -0
- package/skills/go/evals/cases.yaml +55 -0
- package/skills/go/references/concurrency.md +557 -0
- package/skills/go/references/http-services.md +529 -0
- package/skills/go/references/testing.md +338 -0
- package/skills/go/scripts/verify.sh +109 -0
- package/skills/google-workspace/SKILL.md +287 -0
- package/skills/google-workspace/evals/README.md +16 -0
- package/skills/google-workspace/evals/cases.yaml +44 -0
- package/skills/google-workspace/references/api-recipes.md +148 -0
- package/skills/google-workspace/references/auth-setup.md +100 -0
- package/skills/google-workspace/scripts/verify.sh +128 -0
- package/skills/grants/SKILL.md +171 -0
- package/skills/grants/evals/README.md +3 -0
- package/skills/grants/evals/cases.yaml +69 -0
- package/skills/grants/references/budget-justification.md +71 -0
- package/skills/grants/references/jurisdictions.md +35 -0
- package/skills/grants/references/logic-model.md +66 -0
- package/skills/grants/scripts/verify.sh +193 -0
- package/skills/harness/SKILL.md +329 -0
- package/skills/harness/assets/_TEMPLATE/.env.example +8 -0
- package/skills/harness/assets/_TEMPLATE/CREDENTIALS.md +25 -0
- package/skills/harness/assets/_TEMPLATE/README.md +25 -0
- package/skills/harness/assets/_TEMPLATE/test_connection.sh +30 -0
- package/skills/harness/evals/README.md +54 -0
- package/skills/harness/evals/cases.yaml +72 -0
- package/skills/harness/examples/audit-example.md +120 -0
- package/skills/harness/references/agents-md-template.md +41 -0
- package/skills/harness/references/audit-report-template.html +140 -0
- package/skills/harness/references/audit-report-template.md +116 -0
- package/skills/harness/references/claude-md-template.md +98 -0
- package/skills/harness/references/inbox-readme-template.md +51 -0
- package/skills/harness/references/ingest-formats.md +185 -0
- package/skills/harness/references/providers.yaml +3410 -0
- package/skills/harness/references/tools-readme-template.md +88 -0
- package/skills/harness/references/wiki-archive-template.html +81 -0
- package/skills/harness/references/wiki-article-template.md +20 -0
- package/skills/harness/references/wiki-dashboard-template.html +136 -0
- package/skills/harness/references/wiki-deep-improve-report-template.html +126 -0
- package/skills/harness/references/wiki-gaps-template.md +18 -0
- package/skills/harness/references/wiki-index-template.md +23 -0
- package/skills/harness/references/wiki-protocol.md +699 -0
- package/skills/harness/references/wiki-raw-template.md +7 -0
- package/skills/hetzner/SKILL.md +221 -0
- package/skills/hetzner/evals/README.md +35 -0
- package/skills/hetzner/evals/cases.yaml +46 -0
- package/skills/hetzner/references/cloud-init.md +120 -0
- package/skills/hetzner/references/plans-and-locations.md +56 -0
- package/skills/hetzner/scripts/verify.sh +122 -0
- package/skills/hiring/SKILL.md +248 -0
- package/skills/hiring/evals/README.md +13 -0
- package/skills/hiring/evals/cases.yaml +41 -0
- package/skills/hiring/references/templates.md +118 -0
- package/skills/htmx/SKILL.md +261 -0
- package/skills/htmx/evals/README.md +3 -0
- package/skills/htmx/evals/cases.yaml +38 -0
- package/skills/htmx/references/patterns.md +113 -0
- package/skills/htmx/references/server-contract.md +91 -0
- package/skills/htmx/scripts/verify.sh +93 -0
- package/skills/huggingface/SKILL.md +190 -0
- package/skills/huggingface/evals/README.md +11 -0
- package/skills/huggingface/evals/cases.yaml +41 -0
- package/skills/huggingface/references/endpoints-and-spaces.md +99 -0
- package/skills/huggingface/references/hub-and-cli.md +85 -0
- package/skills/huggingface/references/inference-providers.md +115 -0
- package/skills/huggingface/scripts/verify.sh +123 -0
- package/skills/implement/SKILL.md +283 -0
- package/skills/implement/evals/README.md +56 -0
- package/skills/implement/evals/cases.yaml +43 -0
- package/skills/init/SKILL.md +184 -0
- package/skills/init/evals/README.md +49 -0
- package/skills/init/evals/cases.yaml +74 -0
- package/skills/init/references/accompaniment-and-profile.md +140 -0
- package/skills/init/references/discovery.md +90 -0
- package/skills/init/references/recommend-skills.md +115 -0
- package/skills/init/scripts/verify.sh +122 -0
- package/skills/instagram-api/SKILL.md +241 -0
- package/skills/instagram-api/evals/README.md +3 -0
- package/skills/instagram-api/evals/cases.yaml +43 -0
- package/skills/instagram-api/references/insights-metrics.md +88 -0
- package/skills/instagram-api/references/publish-reel.md +98 -0
- package/skills/instagram-api/scripts/verify.sh +137 -0
- package/skills/inventory/SKILL.md +131 -0
- package/skills/inventory/evals/README.md +3 -0
- package/skills/inventory/evals/cases.yaml +43 -0
- package/skills/inventory/references/abc-xyz.md +52 -0
- package/skills/inventory/references/ddmrp.md +32 -0
- package/skills/inventory/references/reorder-policies.md +85 -0
- package/skills/inventory/references/safety-stock.md +63 -0
- package/skills/inventory/scripts/verify.sh +155 -0
- package/skills/investor-materials/SKILL.md +175 -0
- package/skills/investor-materials/evals/README.md +15 -0
- package/skills/investor-materials/evals/cases.yaml +60 -0
- package/skills/investor-materials/references/dataroom-checklist.md +134 -0
- package/skills/investor-materials/references/update-and-onepager-templates.md +152 -0
- package/skills/investor-materials/scripts/verify.sh +148 -0
- package/skills/invoicing/SKILL.md +154 -0
- package/skills/invoicing/evals/README.md +5 -0
- package/skills/invoicing/evals/cases.yaml +49 -0
- package/skills/invoicing/references/dunning-ladder.md +53 -0
- package/skills/invoicing/references/e-invoicing-mandates.md +43 -0
- package/skills/invoicing/scripts/fixtures/broken-invoice.json +13 -0
- package/skills/invoicing/scripts/fixtures/valid-invoice.json +15 -0
- package/skills/invoicing/scripts/verify.sh +133 -0
- package/skills/ip-trademark/SKILL.md +186 -0
- package/skills/ip-trademark/evals/README.md +10 -0
- package/skills/ip-trademark/evals/cases.yaml +47 -0
- package/skills/ip-trademark/references/jurisdictions.md +63 -0
- package/skills/ip-trademark/references/ownership-and-licensing.md +90 -0
- package/skills/java/SKILL.md +341 -0
- package/skills/java/evals/README.md +23 -0
- package/skills/java/evals/cases.yaml +43 -0
- package/skills/java/references/builds.md +133 -0
- package/skills/java/references/concurrency.md +108 -0
- package/skills/java/references/streams.md +102 -0
- package/skills/java/scripts/verify.sh +107 -0
- package/skills/knowledge-ops/SKILL.md +125 -0
- package/skills/knowledge-ops/evals/README.md +16 -0
- package/skills/knowledge-ops/evals/cases.yaml +50 -0
- package/skills/knowledge-ops/references/gardening-playbook.md +116 -0
- package/skills/kotlin-android/SKILL.md +245 -0
- package/skills/kotlin-android/evals/README.md +13 -0
- package/skills/kotlin-android/evals/cases.yaml +56 -0
- package/skills/kotlin-android/references/architecture.md +200 -0
- package/skills/kotlin-android/references/gradle-setup.md +125 -0
- package/skills/kotlin-android/scripts/verify.sh +109 -0
- package/skills/kpi-framework/SKILL.md +199 -0
- package/skills/kpi-framework/evals/README.md +11 -0
- package/skills/kpi-framework/evals/cases.yaml +42 -0
- package/skills/kpi-framework/references/definition-and-targets.md +64 -0
- package/skills/kpi-framework/references/metric-catalog.md +84 -0
- package/skills/landing-copy/SKILL.md +153 -0
- package/skills/landing-copy/evals/README.md +18 -0
- package/skills/landing-copy/evals/cases.yaml +63 -0
- package/skills/landing-copy/references/frameworks.md +61 -0
- package/skills/landing-copy/references/page-skeleton.md +92 -0
- package/skills/landing-copy/scripts/verify.sh +164 -0
- package/skills/laravel/SKILL.md +301 -0
- package/skills/laravel/evals/README.md +10 -0
- package/skills/laravel/evals/cases.yaml +45 -0
- package/skills/laravel/references/eloquent-patterns.md +126 -0
- package/skills/laravel/references/queues-and-scheduling.md +153 -0
- package/skills/laravel/scripts/verify.sh +128 -0
- package/skills/lead-gen/SKILL.md +155 -0
- package/skills/lead-gen/evals/README.md +3 -0
- package/skills/lead-gen/evals/cases.yaml +43 -0
- package/skills/lead-gen/references/data-sources.md +87 -0
- package/skills/lead-gen/references/scoring-model.md +93 -0
- package/skills/lead-gen/scripts/verify.sh +179 -0
- package/skills/linkedin-api/SKILL.md +211 -0
- package/skills/linkedin-api/evals/README.md +3 -0
- package/skills/linkedin-api/evals/cases.yaml +41 -0
- package/skills/linkedin-api/references/api-reference.md +168 -0
- package/skills/linkedin-api/scripts/verify.sh +98 -0
- package/skills/linkedin-carousels/SKILL.md +239 -0
- package/skills/linkedin-carousels/evals/README.md +13 -0
- package/skills/linkedin-carousels/evals/cases.yaml +62 -0
- package/skills/linkedin-carousels/references/carousel-patterns.md +200 -0
- package/skills/linkedin-carousels/scripts/verify.sh +160 -0
- package/skills/linkedin-content/SKILL.md +162 -0
- package/skills/linkedin-content/evals/README.md +13 -0
- package/skills/linkedin-content/evals/cases.yaml +62 -0
- package/skills/linkedin-content/references/hooks-and-formats.md +114 -0
- package/skills/linkedin-content/scripts/verify.sh +154 -0
- package/skills/linkedin-outreach/SKILL.md +174 -0
- package/skills/linkedin-outreach/evals/README.md +3 -0
- package/skills/linkedin-outreach/evals/cases.yaml +43 -0
- package/skills/linkedin-outreach/references/ledger-schema.md +48 -0
- package/skills/linkedin-outreach/references/sales-navigator-playbook.md +61 -0
- package/skills/linkedin-outreach/scripts/verify.sh +120 -0
- package/skills/linkedin-strategy/SKILL.md +167 -0
- package/skills/linkedin-strategy/evals/README.md +3 -0
- package/skills/linkedin-strategy/evals/cases.yaml +49 -0
- package/skills/linkedin-strategy/references/ssi-and-pillars.md +59 -0
- package/skills/linkedin-strategy/references/wiki-records.md +62 -0
- package/skills/linkedin-strategy/scripts/verify.sh +120 -0
- package/skills/llm-pipeline/SKILL.md +155 -0
- package/skills/llm-pipeline/evals/README.md +3 -0
- package/skills/llm-pipeline/evals/cases.yaml +44 -0
- package/skills/llm-pipeline/references/caching-layers.md +60 -0
- package/skills/llm-pipeline/references/litellm-router.md +101 -0
- package/skills/llm-pipeline/scripts/verify.sh +169 -0
- package/skills/logistics-ops/SKILL.md +219 -0
- package/skills/logistics-ops/evals/README.md +20 -0
- package/skills/logistics-ops/evals/cases.yaml +48 -0
- package/skills/logistics-ops/references/carriers-and-claims.md +105 -0
- package/skills/market-research/SKILL.md +145 -0
- package/skills/market-research/evals/README.md +3 -0
- package/skills/market-research/evals/cases.yaml +48 -0
- package/skills/market-research/references/demand-signals.md +63 -0
- package/skills/market-research/references/sizing-playbook.md +121 -0
- package/skills/market-research/scripts/verify.sh +215 -0
- package/skills/marketing/SKILL.md +233 -0
- package/skills/marketing/evals/README.md +61 -0
- package/skills/marketing/evals/cases.yaml +84 -0
- package/skills/marketing/references/brand-grounding.md +197 -0
- package/skills/marketing/references/campaigns-and-channels.md +151 -0
- package/skills/marketing/references/copy-frameworks.md +166 -0
- package/skills/marketing/references/landing-copy.md +191 -0
- package/skills/marketing/references/seo-geo.md +391 -0
- package/skills/marketing/scripts/seo_audit.py +166 -0
- package/skills/marketing/scripts/verify.sh +233 -0
- package/skills/medium-publishing/SKILL.md +152 -0
- package/skills/medium-publishing/evals/README.md +3 -0
- package/skills/medium-publishing/evals/cases.yaml +42 -0
- package/skills/medium-publishing/references/cross-post-and-canonical.md +65 -0
- package/skills/medium-publishing/references/legacy-api.md +100 -0
- package/skills/medium-strategy/SKILL.md +161 -0
- package/skills/medium-strategy/evals/README.md +3 -0
- package/skills/medium-strategy/evals/cases.yaml +50 -0
- package/skills/medium-strategy/references/distribution-and-boost.md +65 -0
- package/skills/medium-strategy/references/wiki-records.md +60 -0
- package/skills/medium-strategy/scripts/verify.sh +118 -0
- package/skills/medium-writing/SKILL.md +140 -0
- package/skills/medium-writing/evals/README.md +5 -0
- package/skills/medium-writing/evals/cases.yaml +39 -0
- package/skills/medium-writing/references/title-patterns.md +79 -0
- package/skills/meeting-notes/SKILL.md +168 -0
- package/skills/meeting-notes/evals/README.md +14 -0
- package/skills/meeting-notes/evals/cases.yaml +46 -0
- package/skills/meeting-notes/references/templates.md +140 -0
- package/skills/modal/SKILL.md +307 -0
- package/skills/modal/evals/README.md +29 -0
- package/skills/modal/evals/cases.yaml +50 -0
- package/skills/modal/references/images-gpu-cookbook.md +160 -0
- package/skills/modal/references/web-and-scaling.md +138 -0
- package/skills/modal/scripts/verify.sh +127 -0
- package/skills/mongodb/SKILL.md +342 -0
- package/skills/mongodb/evals/README.md +29 -0
- package/skills/mongodb/evals/cases.yaml +41 -0
- package/skills/mongodb/references/aggregation.md +115 -0
- package/skills/mongodb/references/data-modeling.md +135 -0
- package/skills/mongodb/references/transactions-and-ops.md +128 -0
- package/skills/mongodb/scripts/verify.sh +151 -0
- package/skills/monitoring/SKILL.md +155 -0
- package/skills/monitoring/evals/README.md +3 -0
- package/skills/monitoring/evals/cases.yaml +47 -0
- package/skills/monitoring/references/burn-rate-and-oncall.md +128 -0
- package/skills/monitoring/references/tool-setup.md +154 -0
- package/skills/monitoring/scripts/verify.sh +145 -0
- package/skills/mysql/SKILL.md +249 -0
- package/skills/mysql/evals/README.md +12 -0
- package/skills/mysql/evals/cases.yaml +49 -0
- package/skills/mysql/references/indexing-and-explain.md +161 -0
- package/skills/mysql/references/mysql-vs-mariadb.md +78 -0
- package/skills/mysql/references/online-ddl-and-migrations.md +120 -0
- package/skills/mysql/references/replication-and-ha.md +115 -0
- package/skills/mysql/scripts/verify.sh +141 -0
- package/skills/neon/SKILL.md +218 -0
- package/skills/neon/evals/README.md +11 -0
- package/skills/neon/evals/cases.yaml +45 -0
- package/skills/neon/references/branching-ci.md +86 -0
- package/skills/neon/scripts/verify.sh +78 -0
- package/skills/nestjs/SKILL.md +225 -0
- package/skills/nestjs/evals/README.md +3 -0
- package/skills/nestjs/evals/cases.yaml +38 -0
- package/skills/nestjs/references/cross-cutting.md +135 -0
- package/skills/nestjs/references/testing-recipes.md +105 -0
- package/skills/nestjs/scripts/verify.sh +98 -0
- package/skills/netlify/SKILL.md +208 -0
- package/skills/netlify/evals/README.md +13 -0
- package/skills/netlify/evals/cases.yaml +43 -0
- package/skills/netlify/references/functions.md +97 -0
- package/skills/netlify/references/netlify-toml.md +115 -0
- package/skills/netlify/scripts/verify.sh +95 -0
- package/skills/newsletter/SKILL.md +162 -0
- package/skills/newsletter/evals/README.md +12 -0
- package/skills/newsletter/evals/cases.yaml +42 -0
- package/skills/newsletter/references/growth-loops.md +73 -0
- package/skills/newsletter/references/welcome-sequence.md +62 -0
- package/skills/newsletter/scripts/verify.sh +173 -0
- package/skills/nextjs/SKILL.md +472 -0
- package/skills/nextjs/evals/README.md +59 -0
- package/skills/nextjs/evals/cases.yaml +56 -0
- package/skills/nextjs/references/data-and-caching.md +309 -0
- package/skills/nextjs/references/metadata.md +208 -0
- package/skills/nextjs/references/performance.md +325 -0
- package/skills/nextjs/references/react.md +383 -0
- package/skills/nextjs/references/security.md +239 -0
- package/skills/nextjs/references/testing.md +290 -0
- package/skills/nextjs/scripts/verify.sh +141 -0
- package/skills/no-code-app/SKILL.md +153 -0
- package/skills/no-code-app/evals/README.md +3 -0
- package/skills/no-code-app/evals/cases.yaml +43 -0
- package/skills/no-code-app/references/platform-limits.md +100 -0
- package/skills/nodejs/SKILL.md +242 -0
- package/skills/nodejs/evals/README.md +3 -0
- package/skills/nodejs/evals/cases.yaml +39 -0
- package/skills/nodejs/references/express5-migration.md +53 -0
- package/skills/nodejs/references/graceful-shutdown.md +73 -0
- package/skills/nodejs/scripts/verify.sh +122 -0
- package/skills/notion-connector/SKILL.md +234 -0
- package/skills/notion-connector/evals/README.md +15 -0
- package/skills/notion-connector/evals/cases.yaml +45 -0
- package/skills/notion-connector/references/api-versions.md +63 -0
- package/skills/notion-connector/references/property-shapes.md +110 -0
- package/skills/notion-connector/references/sync-patterns.md +95 -0
- package/skills/notion-connector/scripts/verify.sh +162 -0
- package/skills/observability/SKILL.md +231 -0
- package/skills/observability/evals/README.md +3 -0
- package/skills/observability/evals/cases.yaml +49 -0
- package/skills/observability/references/collector-config.md +98 -0
- package/skills/observability/references/instrumentation-recipes.md +115 -0
- package/skills/observability/scripts/verify.sh +156 -0
- package/skills/ollama/SKILL.md +213 -0
- package/skills/ollama/evals/README.md +9 -0
- package/skills/ollama/evals/cases.yaml +43 -0
- package/skills/ollama/references/api.md +148 -0
- package/skills/ollama/references/hardware-sizing.md +87 -0
- package/skills/ollama/scripts/verify.sh +116 -0
- package/skills/orient/SKILL.md +54 -0
- package/skills/orient/evals/README.md +16 -0
- package/skills/orient/evals/cases.yaml +57 -0
- package/skills/orient/references/orientation-contract.md +34 -0
- package/skills/parallel/SKILL.md +198 -0
- package/skills/parallel/evals/README.md +62 -0
- package/skills/parallel/evals/cases.yaml +44 -0
- package/skills/people-ops/SKILL.md +122 -0
- package/skills/people-ops/evals/README.md +14 -0
- package/skills/people-ops/evals/cases.yaml +43 -0
- package/skills/people-ops/references/templates.md +129 -0
- package/skills/performance/SKILL.md +221 -0
- package/skills/performance/evals/README.md +3 -0
- package/skills/performance/evals/cases.yaml +47 -0
- package/skills/performance/references/profiling-playbook.md +54 -0
- package/skills/performance/scripts/verify.sh +94 -0
- package/skills/phoenix/SKILL.md +169 -0
- package/skills/phoenix/evals/README.md +3 -0
- package/skills/phoenix/evals/cases.yaml +40 -0
- package/skills/phoenix/references/auth-and-scopes.md +82 -0
- package/skills/phoenix/references/ecto-patterns.md +93 -0
- package/skills/phoenix/references/liveview.md +134 -0
- package/skills/phoenix/scripts/verify.sh +73 -0
- package/skills/php/SKILL.md +397 -0
- package/skills/php/evals/README.md +12 -0
- package/skills/php/evals/cases.yaml +45 -0
- package/skills/php/references/tooling.md +170 -0
- package/skills/php/references/type-system.md +220 -0
- package/skills/php/scripts/verify.sh +155 -0
- package/skills/pitch-deck/SKILL.md +209 -0
- package/skills/pitch-deck/evals/README.md +15 -0
- package/skills/pitch-deck/evals/cases.yaml +55 -0
- package/skills/pitch-deck/references/numbers-that-matter.md +78 -0
- package/skills/pitch-deck/references/slide-spine.md +149 -0
- package/skills/pitch-deck/scripts/verify.sh +186 -0
- package/skills/plan/SKILL.md +204 -0
- package/skills/plan/evals/README.md +62 -0
- package/skills/plan/evals/cases.yaml +49 -0
- package/skills/plan/references/plan-template.md +124 -0
- package/skills/planetscale/SKILL.md +223 -0
- package/skills/planetscale/evals/README.md +11 -0
- package/skills/planetscale/evals/cases.yaml +46 -0
- package/skills/planetscale/references/deploy-requests.md +75 -0
- package/skills/planetscale/references/no-foreign-keys.md +88 -0
- package/skills/planetscale/scripts/verify.sh +115 -0
- package/skills/podcast/SKILL.md +166 -0
- package/skills/podcast/evals/README.md +17 -0
- package/skills/podcast/evals/cases.yaml +61 -0
- package/skills/podcast/references/rss-and-namespace.md +136 -0
- package/skills/podcast/scripts/verify.sh +246 -0
- package/skills/postgresdb/SKILL.md +372 -0
- package/skills/postgresdb/evals/README.md +55 -0
- package/skills/postgresdb/evals/cases.yaml +57 -0
- package/skills/postgresdb/references/migrations.md +279 -0
- package/skills/postgresdb/references/operations-and-security.md +267 -0
- package/skills/postgresdb/references/query-optimization.md +374 -0
- package/skills/postgresdb/references/schema-and-indexing.md +379 -0
- package/skills/postgresdb/scripts/verify.sh +191 -0
- package/skills/presentations/SKILL.md +296 -0
- package/skills/presentations/evals/README.md +61 -0
- package/skills/presentations/evals/cases.yaml +56 -0
- package/skills/presentations/references/brand-grounding.md +160 -0
- package/skills/presentations/references/markdown-decks.md +290 -0
- package/skills/presentations/references/pptx-python.md +242 -0
- package/skills/presentations/references/slide-design.md +261 -0
- package/skills/presentations/references/storytelling-and-decks.md +150 -0
- package/skills/presentations/scripts/verify.sh +252 -0
- package/skills/press-kit/SKILL.md +243 -0
- package/skills/press-kit/evals/README.md +15 -0
- package/skills/press-kit/evals/cases.yaml +55 -0
- package/skills/press-kit/references/release-types.md +102 -0
- package/skills/press-kit/references/templates.md +132 -0
- package/skills/press-kit/scripts/verify.sh +161 -0
- package/skills/pricing/SKILL.md +160 -0
- package/skills/pricing/evals/README.md +5 -0
- package/skills/pricing/evals/cases.yaml +44 -0
- package/skills/pricing/references/localization.md +56 -0
- package/skills/pricing/references/pricing-models.md +55 -0
- package/skills/pricing/scripts/verify.sh +91 -0
- package/skills/prisma-orm/SKILL.md +320 -0
- package/skills/prisma-orm/evals/README.md +12 -0
- package/skills/prisma-orm/evals/cases.yaml +56 -0
- package/skills/prisma-orm/references/migrations-and-v7-upgrade.md +197 -0
- package/skills/prisma-orm/references/queries-and-performance.md +169 -0
- package/skills/prisma-orm/scripts/verify.sh +137 -0
- package/skills/procurement/SKILL.md +179 -0
- package/skills/procurement/evals/README.md +20 -0
- package/skills/procurement/evals/cases.yaml +49 -0
- package/skills/procurement/references/scorecard-and-tco.md +100 -0
- package/skills/procurement/references/sourcing-requests.md +116 -0
- package/skills/procurement/scripts/verify.sh +280 -0
- package/skills/project-ops/SKILL.md +130 -0
- package/skills/project-ops/evals/README.md +3 -0
- package/skills/project-ops/evals/cases.yaml +71 -0
- package/skills/project-ops/references/raid-and-rag.md +58 -0
- package/skills/project-ops/references/status-report-template.md +68 -0
- package/skills/project-ops/scripts/verify.sh +257 -0
- package/skills/prompt-engineering/SKILL.md +138 -0
- package/skills/prompt-engineering/evals/README.md +11 -0
- package/skills/prompt-engineering/evals/cases.yaml +46 -0
- package/skills/prompt-engineering/references/eval-templates.md +94 -0
- package/skills/prompt-engineering/references/output-contracts.md +120 -0
- package/skills/prompt-engineering/scripts/verify.sh +84 -0
- package/skills/proposals/SKILL.md +159 -0
- package/skills/proposals/evals/README.md +3 -0
- package/skills/proposals/evals/cases.yaml +53 -0
- package/skills/proposals/references/proposal-skeleton.md +110 -0
- package/skills/proposals/references/sow-skeleton.md +79 -0
- package/skills/proposals/scripts/verify.sh +201 -0
- package/skills/python/SKILL.md +369 -0
- package/skills/python/evals/README.md +19 -0
- package/skills/python/evals/cases.yaml +46 -0
- package/skills/python/references/async.md +136 -0
- package/skills/python/references/stdlib.md +162 -0
- package/skills/python/references/typing.md +160 -0
- package/skills/python/scripts/verify.sh +125 -0
- package/skills/rag/SKILL.md +226 -0
- package/skills/rag/evals/README.md +13 -0
- package/skills/rag/evals/cases.yaml +45 -0
- package/skills/rag/references/evaluation.md +99 -0
- package/skills/rag/references/pipeline.md +151 -0
- package/skills/rag/scripts/verify.sh +99 -0
- package/skills/rails/SKILL.md +264 -0
- package/skills/rails/evals/README.md +12 -0
- package/skills/rails/evals/cases.yaml +47 -0
- package/skills/rails/references/activerecord.md +148 -0
- package/skills/rails/references/hotwire.md +139 -0
- package/skills/rails/references/testing.md +110 -0
- package/skills/rails/scripts/verify.sh +128 -0
- package/skills/railway/SKILL.md +245 -0
- package/skills/railway/evals/README.md +14 -0
- package/skills/railway/evals/cases.yaml +44 -0
- package/skills/railway/references/cli-cookbook.md +137 -0
- package/skills/railway/references/config-as-code.md +120 -0
- package/skills/railway/scripts/verify.sh +162 -0
- package/skills/react/SKILL.md +222 -0
- package/skills/react/evals/README.md +3 -0
- package/skills/react/evals/cases.yaml +43 -0
- package/skills/react/references/data-and-state.md +152 -0
- package/skills/react/references/performance.md +75 -0
- package/skills/react/references/routing.md +99 -0
- package/skills/react/scripts/verify.sh +123 -0
- package/skills/react-native/SKILL.md +220 -0
- package/skills/react-native/evals/README.md +3 -0
- package/skills/react-native/evals/cases.yaml +42 -0
- package/skills/react-native/references/native-modules.md +123 -0
- package/skills/react-native/references/performance-debugging.md +46 -0
- package/skills/react-native/scripts/verify.sh +117 -0
- package/skills/redis/SKILL.md +298 -0
- package/skills/redis/evals/README.md +10 -0
- package/skills/redis/evals/cases.yaml +43 -0
- package/skills/redis/references/caching.md +116 -0
- package/skills/redis/references/locks-and-rate-limiting.md +140 -0
- package/skills/redis/references/queues.md +102 -0
- package/skills/redis/scripts/verify.sh +164 -0
- package/skills/remotion-video/SKILL.md +218 -0
- package/skills/remotion-video/evals/README.md +23 -0
- package/skills/remotion-video/evals/cases.yaml +64 -0
- package/skills/remotion-video/references/captions-pipeline.md +163 -0
- package/skills/remotion-video/references/render-and-pipeline.md +131 -0
- package/skills/remotion-video/scripts/verify.sh +169 -0
- package/skills/render/SKILL.md +256 -0
- package/skills/render/evals/README.md +12 -0
- package/skills/render/evals/cases.yaml +45 -0
- package/skills/render/references/blueprint-reference.md +203 -0
- package/skills/render/scripts/verify.sh +167 -0
- package/skills/replicate/SKILL.md +210 -0
- package/skills/replicate/evals/README.md +9 -0
- package/skills/replicate/evals/cases.yaml +45 -0
- package/skills/replicate/references/cog-packaging.md +89 -0
- package/skills/replicate/references/deployments-api.md +87 -0
- package/skills/replicate/references/webhooks-and-async.md +110 -0
- package/skills/replicate/scripts/verify.sh +162 -0
- package/skills/replicate-images/SKILL.md +241 -0
- package/skills/replicate-images/evals/README.md +13 -0
- package/skills/replicate-images/evals/cases.yaml +41 -0
- package/skills/replicate-images/references/editing-recipes.md +129 -0
- package/skills/replicate-images/references/models.md +131 -0
- package/skills/replicate-images/scripts/verify.sh +178 -0
- package/skills/reporting/SKILL.md +178 -0
- package/skills/reporting/evals/README.md +12 -0
- package/skills/reporting/evals/cases.yaml +46 -0
- package/skills/reporting/references/pipeline.md +213 -0
- package/skills/reporting/scripts/verify.sh +149 -0
- package/skills/research-ops/SKILL.md +200 -0
- package/skills/research-ops/evals/README.md +13 -0
- package/skills/research-ops/evals/cases.yaml +38 -0
- package/skills/research-ops/references/credibility-rubric.md +78 -0
- package/skills/research-ops/references/memo-template.md +63 -0
- package/skills/research-ops/scripts/verify.sh +181 -0
- package/skills/retention/SKILL.md +206 -0
- package/skills/retention/evals/README.md +13 -0
- package/skills/retention/evals/cases.yaml +42 -0
- package/skills/retention/references/health-score-and-metrics.md +97 -0
- package/skills/retention/references/save-and-winback-plays.md +65 -0
- package/skills/review/SKILL.md +222 -0
- package/skills/review/evals/README.md +84 -0
- package/skills/review/evals/cases.yaml +55 -0
- package/skills/review-management/SKILL.md +204 -0
- package/skills/review-management/evals/README.md +13 -0
- package/skills/review-management/evals/cases.yaml +60 -0
- package/skills/review-management/references/platform-apis.md +86 -0
- package/skills/review-management/scripts/verify.sh +128 -0
- package/skills/ruby/SKILL.md +316 -0
- package/skills/ruby/evals/README.md +12 -0
- package/skills/ruby/evals/cases.yaml +41 -0
- package/skills/ruby/references/gems-and-testing.md +208 -0
- package/skills/ruby/references/metaprogramming.md +161 -0
- package/skills/ruby/scripts/verify.sh +83 -0
- package/skills/runpod/SKILL.md +238 -0
- package/skills/runpod/evals/README.md +11 -0
- package/skills/runpod/evals/cases.yaml +47 -0
- package/skills/runpod/references/cost-and-scaling.md +85 -0
- package/skills/runpod/references/serverless-workers.md +101 -0
- package/skills/runpod/scripts/verify.sh +126 -0
- package/skills/rust/SKILL.md +395 -0
- package/skills/rust/evals/README.md +12 -0
- package/skills/rust/evals/cases.yaml +42 -0
- package/skills/rust/references/async-tokio.md +141 -0
- package/skills/rust/references/axum-service.md +132 -0
- package/skills/rust/references/ownership.md +86 -0
- package/skills/rust/references/testing.md +108 -0
- package/skills/rust/scripts/verify.sh +91 -0
- package/skills/sales-pipeline/SKILL.md +162 -0
- package/skills/sales-pipeline/evals/README.md +13 -0
- package/skills/sales-pipeline/evals/cases.yaml +60 -0
- package/skills/sales-pipeline/references/forecasting-math.md +82 -0
- package/skills/sales-pipeline/references/stage-playbook.md +84 -0
- package/skills/sales-pipeline/scripts/verify.sh +210 -0
- package/skills/scaling/SKILL.md +137 -0
- package/skills/scaling/evals/README.md +3 -0
- package/skills/scaling/evals/cases.yaml +42 -0
- package/skills/scaling/references/load-testing-k6.md +127 -0
- package/skills/scaling/scripts/example.load.js +24 -0
- package/skills/scaling/scripts/verify.sh +70 -0
- package/skills/sdd/SKILL.md +203 -0
- package/skills/sdd/evals/README.md +60 -0
- package/skills/sdd/evals/cases.yaml +78 -0
- package/skills/sdd-init/SKILL.md +148 -0
- package/skills/sdd-init/evals/README.md +3 -0
- package/skills/sdd-init/evals/cases.yaml +43 -0
- package/skills/secure-coding/SKILL.md +365 -0
- package/skills/secure-coding/evals/README.md +68 -0
- package/skills/secure-coding/evals/cases.yaml +55 -0
- package/skills/secure-coding/references/authn-authz.md +249 -0
- package/skills/secure-coding/references/owasp-by-stack.md +574 -0
- package/skills/secure-coding/references/secrets-and-supply-chain.md +205 -0
- package/skills/secure-coding/references/threat-modeling.md +213 -0
- package/skills/secure-coding/scripts/verify.sh +208 -0
- package/skills/security-scan/SKILL.md +239 -0
- package/skills/security-scan/evals/README.md +14 -0
- package/skills/security-scan/evals/cases.yaml +50 -0
- package/skills/security-scan/references/tools.md +98 -0
- package/skills/security-scan/references/triage.md +93 -0
- package/skills/security-scan/scripts/verify.sh +108 -0
- package/skills/seo-geo/SKILL.md +192 -0
- package/skills/seo-geo/evals/README.md +14 -0
- package/skills/seo-geo/evals/cases.yaml +45 -0
- package/skills/seo-geo/references/ai-crawler-control.md +104 -0
- package/skills/seo-geo/references/schema-recipes.md +130 -0
- package/skills/seo-geo/scripts/verify.sh +236 -0
- package/skills/ship/SKILL.md +258 -0
- package/skills/ship/evals/README.md +89 -0
- package/skills/ship/evals/cases.yaml +44 -0
- package/skills/shopify/SKILL.md +229 -0
- package/skills/shopify/evals/README.md +14 -0
- package/skills/shopify/evals/cases.yaml +41 -0
- package/skills/shopify/references/apps-graphql.md +103 -0
- package/skills/shopify/references/checkout-extensibility.md +71 -0
- package/skills/shopify/references/liquid-themes.md +89 -0
- package/skills/shopify/scripts/verify.sh +120 -0
- package/skills/shortform-editing/SKILL.md +161 -0
- package/skills/shortform-editing/evals/README.md +16 -0
- package/skills/shortform-editing/evals/cases.yaml +61 -0
- package/skills/shortform-editing/references/captions.md +85 -0
- package/skills/shortform-editing/references/ffmpeg-pipeline.md +126 -0
- package/skills/shortform-editing/scripts/verify.sh +148 -0
- package/skills/shortform-ideation/SKILL.md +153 -0
- package/skills/shortform-ideation/evals/README.md +20 -0
- package/skills/shortform-ideation/evals/cases.yaml +58 -0
- package/skills/shortform-ideation/references/experiment-ledger.md +85 -0
- package/skills/shortform-ideation/references/trend-sources.md +69 -0
- package/skills/shortform-ideation/scripts/verify.sh +172 -0
- package/skills/shortform-packaging/SKILL.md +247 -0
- package/skills/shortform-packaging/evals/README.md +10 -0
- package/skills/shortform-packaging/evals/cases.yaml +48 -0
- package/skills/shortform-packaging/references/package-templates.md +117 -0
- package/skills/shortform-packaging/scripts/verify.sh +210 -0
- package/skills/shortform-strategy/SKILL.md +149 -0
- package/skills/shortform-strategy/evals/README.md +3 -0
- package/skills/shortform-strategy/evals/cases.yaml +52 -0
- package/skills/shortform-strategy/references/learning-loop-template.md +49 -0
- package/skills/shortform-strategy/references/platform-signals-2026.md +46 -0
- package/skills/shortform-strategy/scripts/verify.sh +176 -0
- package/skills/skill-scout/SKILL.md +133 -0
- package/skills/skill-scout/evals/README.md +12 -0
- package/skills/skill-scout/evals/cases.yaml +56 -0
- package/skills/skill-scout/references/install-commands.md +76 -0
- package/skills/skill-scout/scripts/verify.sh +154 -0
- package/skills/social-publisher/SKILL.md +179 -0
- package/skills/social-publisher/evals/README.md +14 -0
- package/skills/social-publisher/evals/cases.yaml +55 -0
- package/skills/social-publisher/references/calendar-schema.md +97 -0
- package/skills/social-publisher/references/platform-limits.md +56 -0
- package/skills/social-publisher/scripts/verify.sh +232 -0
- package/skills/solid-js/SKILL.md +260 -0
- package/skills/solid-js/evals/README.md +3 -0
- package/skills/solid-js/evals/cases.yaml +38 -0
- package/skills/solid-js/references/reactivity-deep-dive.md +89 -0
- package/skills/solid-js/references/router-and-start.md +93 -0
- package/skills/solid-js/scripts/verify.sh +130 -0
- package/skills/sop-builder/SKILL.md +233 -0
- package/skills/sop-builder/evals/README.md +14 -0
- package/skills/sop-builder/evals/cases.yaml +48 -0
- package/skills/sop-builder/references/sop-skeleton.md +170 -0
- package/skills/specify/SKILL.md +214 -0
- package/skills/specify/evals/README.md +73 -0
- package/skills/specify/evals/cases.yaml +80 -0
- package/skills/specify/references/eliciting-requirements.md +77 -0
- package/skills/specify/references/spec-template.md +60 -0
- package/skills/spreadsheet-ops/SKILL.md +180 -0
- package/skills/spreadsheet-ops/evals/README.md +33 -0
- package/skills/spreadsheet-ops/evals/cases.yaml +42 -0
- package/skills/spreadsheet-ops/references/formula-cookbook.md +70 -0
- package/skills/spreadsheet-ops/references/python-excel.md +87 -0
- package/skills/spreadsheet-ops/references/sheets-api-appsscript.md +118 -0
- package/skills/spreadsheet-ops/scripts/verify.sh +152 -0
- package/skills/spring-boot/SKILL.md +375 -0
- package/skills/spring-boot/evals/README.md +11 -0
- package/skills/spring-boot/evals/cases.yaml +49 -0
- package/skills/spring-boot/references/jpa.md +94 -0
- package/skills/spring-boot/references/security.md +92 -0
- package/skills/spring-boot/references/testing.md +95 -0
- package/skills/spring-boot/scripts/verify.sh +115 -0
- package/skills/sql/SKILL.md +286 -0
- package/skills/sql/evals/README.md +9 -0
- package/skills/sql/evals/cases.yaml +49 -0
- package/skills/sql/references/ctes-and-recursion.md +63 -0
- package/skills/sql/references/joins-and-sets.md +71 -0
- package/skills/sql/references/portability.md +38 -0
- package/skills/sql/references/window-functions.md +72 -0
- package/skills/sql/scripts/verify.sh +139 -0
- package/skills/sqlite-turso/SKILL.md +214 -0
- package/skills/sqlite-turso/evals/README.md +24 -0
- package/skills/sqlite-turso/evals/cases.yaml +45 -0
- package/skills/sqlite-turso/references/embedded-replicas.md +96 -0
- package/skills/sqlite-turso/scripts/verify.sh +95 -0
- package/skills/stripe/SKILL.md +269 -0
- package/skills/stripe/evals/README.md +11 -0
- package/skills/stripe/evals/cases.yaml +45 -0
- package/skills/stripe/references/going-live.md +64 -0
- package/skills/stripe/references/webhook-events.md +79 -0
- package/skills/stripe/scripts/verify.sh +130 -0
- package/skills/structured-extraction/SKILL.md +230 -0
- package/skills/structured-extraction/evals/README.md +13 -0
- package/skills/structured-extraction/evals/cases.yaml +70 -0
- package/skills/structured-extraction/references/providers.md +152 -0
- package/skills/structured-extraction/scripts/verify.sh +160 -0
- package/skills/suggest/SKILL.md +30 -0
- package/skills/suggest/evals/README.md +14 -0
- package/skills/suggest/evals/cases.yaml +51 -0
- package/skills/supabase/SKILL.md +268 -0
- package/skills/supabase/evals/README.md +12 -0
- package/skills/supabase/evals/cases.yaml +42 -0
- package/skills/supabase/references/auth-ssr.md +173 -0
- package/skills/supabase/references/rls-cookbook.md +122 -0
- package/skills/supabase/scripts/verify.sh +149 -0
- package/skills/svelte/SKILL.md +238 -0
- package/skills/svelte/evals/README.md +3 -0
- package/skills/svelte/evals/cases.yaml +41 -0
- package/skills/svelte/references/runes.md +97 -0
- package/skills/svelte/references/sveltekit-data.md +156 -0
- package/skills/svelte/scripts/verify.sh +128 -0
- package/skills/swift-ios/SKILL.md +217 -0
- package/skills/swift-ios/evals/README.md +3 -0
- package/skills/swift-ios/evals/cases.yaml +46 -0
- package/skills/swift-ios/references/concurrency.md +132 -0
- package/skills/swift-ios/references/testing.md +112 -0
- package/skills/swift-ios/scripts/verify.sh +98 -0
- package/skills/tasks/SKILL.md +260 -0
- package/skills/tasks/evals/README.md +70 -0
- package/skills/tasks/evals/cases.yaml +75 -0
- package/skills/tauri/SKILL.md +224 -0
- package/skills/tauri/evals/README.md +12 -0
- package/skills/tauri/evals/cases.yaml +46 -0
- package/skills/tauri/references/bundling-distribution.md +129 -0
- package/skills/tauri/references/security.md +143 -0
- package/skills/tauri/scripts/verify.sh +178 -0
- package/skills/technical-writing/SKILL.md +230 -0
- package/skills/technical-writing/evals/README.md +12 -0
- package/skills/technical-writing/evals/cases.yaml +53 -0
- package/skills/technical-writing/references/diataxis-modes.md +131 -0
- package/skills/technical-writing/references/vale-starter.md +90 -0
- package/skills/technical-writing/scripts/verify.sh +83 -0
- package/skills/terms-conditions/SKILL.md +147 -0
- package/skills/terms-conditions/evals/README.md +14 -0
- package/skills/terms-conditions/evals/cases.yaml +48 -0
- package/skills/terms-conditions/references/clause-library.md +158 -0
- package/skills/terms-conditions/references/notices-and-aup.md +125 -0
- package/skills/terms-conditions/scripts/verify.sh +92 -0
- package/skills/testing-go/SKILL.md +246 -0
- package/skills/testing-go/evals/README.md +3 -0
- package/skills/testing-go/evals/cases.yaml +44 -0
- package/skills/testing-go/references/coverage-and-benchmarks.md +85 -0
- package/skills/testing-go/references/mocks-and-fakes.md +140 -0
- package/skills/testing-go/references/synctest-and-concurrency.md +82 -0
- package/skills/testing-go/scripts/verify.sh +72 -0
- package/skills/testing-py/SKILL.md +179 -0
- package/skills/testing-py/evals/README.md +5 -0
- package/skills/testing-py/evals/cases.yaml +44 -0
- package/skills/testing-py/references/mocking.md +141 -0
- package/skills/testing-py/references/property-testing.md +99 -0
- package/skills/testing-py/scripts/verify.sh +117 -0
- package/skills/testing-web/SKILL.md +224 -0
- package/skills/testing-web/evals/README.md +11 -0
- package/skills/testing-web/evals/cases.yaml +52 -0
- package/skills/testing-web/references/jest-setup.md +88 -0
- package/skills/testing-web/references/recipes.md +116 -0
- package/skills/testing-web/scripts/verify.sh +111 -0
- package/skills/tiktok-api/SKILL.md +315 -0
- package/skills/tiktok-api/evals/README.md +17 -0
- package/skills/tiktok-api/evals/cases.yaml +51 -0
- package/skills/tiktok-api/references/metrics-and-publish.md +127 -0
- package/skills/tiktok-api/references/oauth-setup.md +105 -0
- package/skills/tiktok-api/references/wiki-schema.md +85 -0
- package/skills/tiktok-api/scripts/verify.sh +96 -0
- package/skills/together-fireworks/SKILL.md +181 -0
- package/skills/together-fireworks/evals/README.md +3 -0
- package/skills/together-fireworks/evals/cases.yaml +50 -0
- package/skills/together-fireworks/references/batch-and-tuning.md +59 -0
- package/skills/together-fireworks/references/models-and-pricing.md +79 -0
- package/skills/together-fireworks/scripts/verify.sh +165 -0
- package/skills/translation-l10n/SKILL.md +229 -0
- package/skills/translation-l10n/evals/README.md +3 -0
- package/skills/translation-l10n/evals/cases.yaml +39 -0
- package/skills/translation-l10n/references/icu-cookbook.md +82 -0
- package/skills/translation-l10n/references/rtl-and-bidi.md +60 -0
- package/skills/typescript/SKILL.md +258 -0
- package/skills/typescript/evals/README.md +15 -0
- package/skills/typescript/evals/cases.yaml +46 -0
- package/skills/typescript/references/build-and-monorepo.md +141 -0
- package/skills/typescript/references/type-system.md +162 -0
- package/skills/typescript/scripts/verify.sh +52 -0
- package/skills/unit-economics/SKILL.md +180 -0
- package/skills/unit-economics/evals/README.md +5 -0
- package/skills/unit-economics/evals/cases.yaml +43 -0
- package/skills/unit-economics/references/formulas.md +144 -0
- package/skills/unit-economics/scripts/verify.sh +179 -0
- package/skills/vector-db/SKILL.md +189 -0
- package/skills/vector-db/evals/README.md +10 -0
- package/skills/vector-db/evals/cases.yaml +45 -0
- package/skills/vector-db/references/engines.md +175 -0
- package/skills/vector-db/references/tuning.md +62 -0
- package/skills/vector-db/scripts/verify.sh +110 -0
- package/skills/vercel/SKILL.md +242 -0
- package/skills/vercel/evals/README.md +23 -0
- package/skills/vercel/evals/cases.yaml +45 -0
- package/skills/vercel/references/cli-cookbook.md +98 -0
- package/skills/vercel/references/vercel-json.md +120 -0
- package/skills/vercel/scripts/verify.sh +168 -0
- package/skills/verify/SKILL.md +188 -0
- package/skills/verify/evals/README.md +78 -0
- package/skills/verify/evals/cases.yaml +74 -0
- package/skills/video-shorts/SKILL.md +163 -0
- package/skills/video-shorts/evals/README.md +15 -0
- package/skills/video-shorts/evals/cases.yaml +56 -0
- package/skills/video-shorts/references/hook-and-script-patterns.md +95 -0
- package/skills/video-shorts/references/specs-and-safe-zones.md +74 -0
- package/skills/video-shorts/scripts/verify.sh +172 -0
- package/skills/vue-nuxt/SKILL.md +384 -0
- package/skills/vue-nuxt/evals/README.md +11 -0
- package/skills/vue-nuxt/evals/cases.yaml +49 -0
- package/skills/vue-nuxt/references/data-and-state.md +127 -0
- package/skills/vue-nuxt/references/migration-nuxt4.md +79 -0
- package/skills/vue-nuxt/references/nitro-and-rendering.md +117 -0
- package/skills/vue-nuxt/references/reactivity.md +135 -0
- package/skills/vue-nuxt/scripts/verify.sh +148 -0
- package/skills/webhooks/SKILL.md +246 -0
- package/skills/webhooks/evals/README.md +15 -0
- package/skills/webhooks/evals/cases.yaml +46 -0
- package/skills/webhooks/references/framework-raw-body.md +97 -0
- package/skills/webhooks/references/signature-schemes.md +66 -0
- package/skills/webhooks/scripts/verify.sh +142 -0
- package/skills/webinar/SKILL.md +196 -0
- package/skills/webinar/evals/README.md +14 -0
- package/skills/webinar/evals/cases.yaml +44 -0
- package/skills/webinar/references/email-cadence.md +75 -0
- package/skills/webinar/references/run-of-show.md +83 -0
- package/skills/whatsapp-telegram/SKILL.md +235 -0
- package/skills/whatsapp-telegram/evals/README.md +11 -0
- package/skills/whatsapp-telegram/evals/cases.yaml +44 -0
- package/skills/whatsapp-telegram/references/telegram-bot-api.md +91 -0
- package/skills/whatsapp-telegram/references/whatsapp-cloud-api.md +103 -0
- package/skills/whatsapp-telegram/scripts/verify.sh +90 -0
- package/skills/wordpress/SKILL.md +224 -0
- package/skills/wordpress/evals/README.md +3 -0
- package/skills/wordpress/evals/cases.yaml +50 -0
- package/skills/wordpress/references/hardening.md +108 -0
- package/skills/wordpress/references/performance.md +80 -0
- package/skills/wordpress/references/woocommerce.md +65 -0
- package/skills/wordpress/scripts/verify.sh +96 -0
- package/skills/worktrees/SKILL.md +199 -0
- package/skills/worktrees/evals/README.md +78 -0
- package/skills/worktrees/evals/cases.yaml +47 -0
- package/skills/youtube-api/SKILL.md +286 -0
- package/skills/youtube-api/evals/README.md +3 -0
- package/skills/youtube-api/evals/cases.yaml +50 -0
- package/skills/youtube-api/references/analytics-queries.md +89 -0
- package/skills/youtube-api/references/oauth-setup.md +55 -0
- package/skills/youtube-api/references/wiki-schema.md +70 -0
- package/skills/youtube-api/scripts/verify.sh +84 -0
- package/skills/youtube-ideation/SKILL.md +234 -0
- package/skills/youtube-ideation/evals/README.md +14 -0
- package/skills/youtube-ideation/evals/cases.yaml +52 -0
- package/skills/youtube-ideation/references/idea-ledger-and-loop.md +89 -0
- package/skills/youtube-ideation/references/research-and-signals.md +92 -0
- package/skills/youtube-ideation/scripts/verify.sh +237 -0
- package/skills/youtube-packaging/SKILL.md +220 -0
- package/skills/youtube-packaging/evals/README.md +16 -0
- package/skills/youtube-packaging/evals/cases.yaml +48 -0
- package/skills/youtube-packaging/references/description-and-chapters.md +135 -0
- package/skills/youtube-packaging/scripts/verify.sh +250 -0
- package/skills/youtube-strategy/SKILL.md +157 -0
- package/skills/youtube-strategy/evals/README.md +5 -0
- package/skills/youtube-strategy/evals/cases.yaml +61 -0
- package/skills/youtube-strategy/references/channel-architecture.md +46 -0
- package/skills/youtube-strategy/references/wiki-records.md +86 -0
- package/skills/youtube-strategy/scripts/verify.sh +118 -0
- package/skills/youtube-thumbnails/SKILL.md +180 -0
- package/skills/youtube-thumbnails/evals/README.md +11 -0
- package/skills/youtube-thumbnails/evals/cases.yaml +48 -0
- package/skills/youtube-thumbnails/references/composition-and-specs.md +69 -0
- package/skills/youtube-thumbnails/references/experiment-log-format.md +65 -0
- package/skills/youtube-thumbnails/scripts/verify.sh +123 -0
- package/targets/claude.js +23 -0
- package/targets/codex.js +29 -0
- package/targets/cursor.js +20 -0
- package/targets/gemini.js +29 -0
- package/targets/index.js +55 -0
|
@@ -0,0 +1,420 @@
|
|
|
1
|
+
# Evals, tracing & cost control — the production line
|
|
2
|
+
|
|
3
|
+
An LLM feature without an eval gate is a demo. This file ships a runnable, provider-
|
|
4
|
+
agnostic eval harness (the one `scripts/verify.sh` dry-runs), an LLM-as-judge that is
|
|
5
|
+
itself model-agnostic, OTel GenAI tracing, and a closed-loop cost controller: caching,
|
|
6
|
+
batching, routing/cascades, and hard budgets. Python 3.12+, Pydantic v2.
|
|
7
|
+
|
|
8
|
+
## Eval-first
|
|
9
|
+
|
|
10
|
+
Define success criteria **before** building. Two complementary suites:
|
|
11
|
+
|
|
12
|
+
- **Capability evals** — can the agent do the new thing at all? Built once, when you add
|
|
13
|
+
a feature.
|
|
14
|
+
- **Regression evals** — does a prompt/model/code change break what already worked? Run
|
|
15
|
+
on every PR against a stored baseline.
|
|
16
|
+
|
|
17
|
+
Both reduce to: a golden set, graders, metrics, and a threshold that gates merge.
|
|
18
|
+
|
|
19
|
+
## Golden sets
|
|
20
|
+
|
|
21
|
+
JSONL, one case per line, versioned as fixtures next to the code:
|
|
22
|
+
|
|
23
|
+
```json
|
|
24
|
+
{"input": "Refund order ord_8123 for 500 cents, defective", "expected": {"tool": "refund", "order_id": "ord_8123"}, "metadata": {"suite": "capability", "tags": ["refund"]}}
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
```python
|
|
28
|
+
import json
|
|
29
|
+
from pathlib import Path
|
|
30
|
+
|
|
31
|
+
from pydantic import BaseModel
|
|
32
|
+
|
|
33
|
+
|
|
34
|
+
class Case(BaseModel):
|
|
35
|
+
input: str
|
|
36
|
+
expected: dict
|
|
37
|
+
metadata: dict = {}
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
def load_golden(path: str | Path) -> list[Case]:
|
|
41
|
+
return [Case.model_validate_json(line) for line in Path(path).read_text().splitlines() if line.strip()]
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Build the golden set from **real traffic** (sampled, labeled), not invented prompts; keep
|
|
45
|
+
a strict train/eval split so prompts can't be tuned against the eval set (a form of
|
|
46
|
+
leakage). Version it: a golden set is a first-class artifact, reviewed like code.
|
|
47
|
+
|
|
48
|
+
## Graders
|
|
49
|
+
|
|
50
|
+
Three grader kinds, one interface. The LLM-as-judge uses the same `LLMProvider`, so the
|
|
51
|
+
judge is provider-agnostic and can run on a *different* model than the candidate.
|
|
52
|
+
|
|
53
|
+
```python
|
|
54
|
+
from typing import Protocol
|
|
55
|
+
|
|
56
|
+
from provider_abstraction import CompletionRequest, LLMProvider, Message
|
|
57
|
+
from provider_abstraction import complete_structured
|
|
58
|
+
|
|
59
|
+
|
|
60
|
+
class Grader(Protocol):
|
|
61
|
+
name: str
|
|
62
|
+
def grade(self, case: Case, output: str) -> float: ... # 0.0–1.0
|
|
63
|
+
|
|
64
|
+
|
|
65
|
+
class ExactGrader:
|
|
66
|
+
name = "exact"
|
|
67
|
+
def grade(self, case: Case, output: str) -> float:
|
|
68
|
+
return float(output.strip() == str(case.expected.get("text", "")).strip())
|
|
69
|
+
|
|
70
|
+
|
|
71
|
+
class SchemaGrader:
|
|
72
|
+
name = "schema"
|
|
73
|
+
def __init__(self, schema: type[BaseModel]) -> None:
|
|
74
|
+
self.schema = schema
|
|
75
|
+
def grade(self, case: Case, output: str) -> float:
|
|
76
|
+
try:
|
|
77
|
+
self.schema.model_validate_json(output)
|
|
78
|
+
return 1.0
|
|
79
|
+
except Exception:
|
|
80
|
+
return 0.0
|
|
81
|
+
|
|
82
|
+
|
|
83
|
+
class Verdict(BaseModel):
|
|
84
|
+
score: float # 0.0–1.0
|
|
85
|
+
reasoning: str
|
|
86
|
+
|
|
87
|
+
|
|
88
|
+
class JudgeGrader:
|
|
89
|
+
name = "judge"
|
|
90
|
+
RUBRIC = ("Score 0.0–1.0 how well the ANSWER satisfies the CRITERIA. "
|
|
91
|
+
"Be strict; reward only grounded, complete, correct answers. Return JSON.")
|
|
92
|
+
|
|
93
|
+
def __init__(self, judge: LLMProvider, judge_model: str) -> None:
|
|
94
|
+
self.judge, self.model = judge, judge_model
|
|
95
|
+
|
|
96
|
+
async def grade(self, case: Case, output: str) -> float:
|
|
97
|
+
v = await complete_structured(self.judge, CompletionRequest(model=self.model, messages=[
|
|
98
|
+
Message(role="system", content=self.RUBRIC),
|
|
99
|
+
Message(role="user", content=f"CRITERIA:\n{case.expected}\n\nANSWER:\n{output}")]), Verdict)
|
|
100
|
+
return v.score
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
Judge hygiene: use a **different model** for judge vs candidate to avoid self-preference
|
|
104
|
+
bias; prefer **pairwise** (A-vs-B) when comparing two systems and **pointwise** (score one)
|
|
105
|
+
for absolute gates; randomize A/B order to cancel position bias; and **calibrate** the
|
|
106
|
+
judge against ~50 human labels before trusting it in a gate (target judge-human agreement
|
|
107
|
+
> 0.8).
|
|
108
|
+
|
|
109
|
+
## Metrics
|
|
110
|
+
|
|
111
|
+
```python
|
|
112
|
+
def accuracy(scores: list[float]) -> float:
|
|
113
|
+
return sum(s >= 0.5 for s in scores) / len(scores)
|
|
114
|
+
|
|
115
|
+
|
|
116
|
+
def pass_at_k(successes_per_case: list[int], k: int) -> float:
|
|
117
|
+
# pass@k = fraction of cases with >=1 success in k attempts.
|
|
118
|
+
return sum(s >= 1 for s in successes_per_case) / len(successes_per_case)
|
|
119
|
+
|
|
120
|
+
|
|
121
|
+
def pass_caret_k(all_success_per_case: list[bool]) -> float:
|
|
122
|
+
# pass^k = fraction of cases where ALL k attempts succeeded (stability).
|
|
123
|
+
return sum(all_success_per_case) / len(all_success_per_case)
|
|
124
|
+
|
|
125
|
+
|
|
126
|
+
def percentile(values: list[float], p: float) -> float:
|
|
127
|
+
s = sorted(values)
|
|
128
|
+
idx = min(len(s) - 1, int(round((p / 100) * (len(s) - 1))))
|
|
129
|
+
return s[idx]
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
Track at minimum: `accuracy`, `pass@1`/`pass@3`, `faithfulness` (judge/citation grader),
|
|
133
|
+
answer relevance, `cost_per_task`, `p50`/`p95` latency, and `tool_call_validity_rate`
|
|
134
|
+
(fraction of tool calls that pass schema validation). Recommended gates: capability
|
|
135
|
+
`pass@3 ≥ 0.90`; regression `pass^3 = 1.0` on release-critical paths.
|
|
136
|
+
|
|
137
|
+
## Eval runner
|
|
138
|
+
|
|
139
|
+
The full async runner. It supports a **dry-run** mode (`EVAL_DRY_RUN=1` or `--dry-run`)
|
|
140
|
+
that uses a stub provider and never hits live APIs — this is exactly what `verify.sh`
|
|
141
|
+
invokes as a smoke test.
|
|
142
|
+
|
|
143
|
+
```python
|
|
144
|
+
# evals/run.py
|
|
145
|
+
import argparse
|
|
146
|
+
import asyncio
|
|
147
|
+
import json
|
|
148
|
+
import os
|
|
149
|
+
import statistics
|
|
150
|
+
import sys
|
|
151
|
+
import time
|
|
152
|
+
from pathlib import Path
|
|
153
|
+
|
|
154
|
+
from provider_abstraction import (CompletionRequest, CompletionResponse, LLMProvider,
|
|
155
|
+
Message, Usage, get_provider)
|
|
156
|
+
|
|
157
|
+
|
|
158
|
+
class StubProvider:
|
|
159
|
+
"""Deterministic offline provider for dry-runs — no network, no keys."""
|
|
160
|
+
async def complete(self, req: CompletionRequest) -> CompletionResponse:
|
|
161
|
+
return CompletionResponse(text='{"ok": true}', usage=Usage(input_tokens=10, output_tokens=5))
|
|
162
|
+
async def stream(self, req): # pragma: no cover - unused in dry-run
|
|
163
|
+
yield
|
|
164
|
+
async def embed(self, texts: list[str]) -> list[list[float]]:
|
|
165
|
+
return [[0.0] * 1536 for _ in texts]
|
|
166
|
+
|
|
167
|
+
|
|
168
|
+
async def run(golden_path: str, dry_run: bool) -> int:
|
|
169
|
+
cases = [json.loads(line) for line in Path(golden_path).read_text().splitlines() if line.strip()]
|
|
170
|
+
provider: LLMProvider = StubProvider() if dry_run else get_provider(os.environ["LLM"])
|
|
171
|
+
model = "stub" if dry_run else os.environ["LLM"].split(":", 1)[1]
|
|
172
|
+
rows = []
|
|
173
|
+
for c in cases:
|
|
174
|
+
t0 = time.perf_counter()
|
|
175
|
+
resp = await provider.complete(CompletionRequest(model=model,
|
|
176
|
+
messages=[Message(role="user", content=c["input"])]))
|
|
177
|
+
ok = SchemaGrader(_AnyJSON).grade(_Case(c), resp.text)
|
|
178
|
+
rows.append({"score": ok, "cost": resp.usage.cost_usd, "ms": (time.perf_counter() - t0) * 1000})
|
|
179
|
+
|
|
180
|
+
n = len(rows) or 1
|
|
181
|
+
metrics = {
|
|
182
|
+
"accuracy": sum(r["score"] for r in rows) / n,
|
|
183
|
+
"cost_per_task": sum(r["cost"] for r in rows) / n,
|
|
184
|
+
"p95_latency_ms": statistics.quantiles([r["ms"] for r in rows], n=20)[-1] if len(rows) > 1 else rows[0]["ms"],
|
|
185
|
+
}
|
|
186
|
+
thresholds = {"accuracy": 0.0 if dry_run else 0.9} # dry-run only checks the pipeline runs
|
|
187
|
+
Path("eval-report.json").write_text(json.dumps(metrics, indent=2))
|
|
188
|
+
failed = {k: metrics[k] for k, lo in thresholds.items() if metrics[k] < lo}
|
|
189
|
+
print(json.dumps({"metrics": metrics, "failed": failed, "dry_run": dry_run}, indent=2))
|
|
190
|
+
return 1 if failed else 0
|
|
191
|
+
|
|
192
|
+
|
|
193
|
+
if __name__ == "__main__":
|
|
194
|
+
ap = argparse.ArgumentParser()
|
|
195
|
+
ap.add_argument("--golden", default="evals/golden.jsonl")
|
|
196
|
+
ap.add_argument("--dry-run", action="store_true", default=os.environ.get("EVAL_DRY_RUN") == "1")
|
|
197
|
+
args = ap.parse_args()
|
|
198
|
+
sys.exit(asyncio.run(run(args.golden, args.dry_run)))
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
```python
|
|
202
|
+
# tiny shims referenced above so the file is self-contained
|
|
203
|
+
from pydantic import BaseModel
|
|
204
|
+
|
|
205
|
+
|
|
206
|
+
class _AnyJSON(BaseModel):
|
|
207
|
+
model_config = {"extra": "allow"}
|
|
208
|
+
|
|
209
|
+
|
|
210
|
+
class _Case:
|
|
211
|
+
def __init__(self, d: dict) -> None:
|
|
212
|
+
self.input, self.expected, self.metadata = d["input"], d.get("expected", {}), d.get("metadata", {})
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
## Regression gates in CI
|
|
216
|
+
|
|
217
|
+
Run the suite on every PR; fail if any metric regresses past tolerance versus a stored
|
|
218
|
+
baseline. GitHub Actions:
|
|
219
|
+
|
|
220
|
+
```yaml
|
|
221
|
+
name: eval-gate
|
|
222
|
+
on: pull_request
|
|
223
|
+
jobs:
|
|
224
|
+
evals:
|
|
225
|
+
runs-on: ubuntu-latest
|
|
226
|
+
steps:
|
|
227
|
+
- uses: actions/checkout@v4
|
|
228
|
+
- uses: actions/setup-python@v5
|
|
229
|
+
with:
|
|
230
|
+
python-version: "3.12"
|
|
231
|
+
- run: pip install -e . pydantic openai anthropic
|
|
232
|
+
- name: Run evals
|
|
233
|
+
env:
|
|
234
|
+
LLM: ${{ vars.LLM }}
|
|
235
|
+
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
|
236
|
+
run: python evals/run.py --golden evals/golden.jsonl
|
|
237
|
+
- name: Compare against baseline
|
|
238
|
+
run: python evals/compare.py eval-report.json evals/baseline.json --tolerance 0.02
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
```python
|
|
242
|
+
# evals/compare.py — fail if any metric regresses more than tolerance vs baseline.
|
|
243
|
+
import json
|
|
244
|
+
import sys
|
|
245
|
+
|
|
246
|
+
current, baseline, _, tol = sys.argv[1], sys.argv[2], sys.argv[3], float(sys.argv[4])
|
|
247
|
+
cur = json.load(open(current))
|
|
248
|
+
base = json.load(open(baseline))
|
|
249
|
+
regressions = {k: (base[k], cur[k]) for k in ("accuracy",)
|
|
250
|
+
if k in base and cur.get(k, 0) < base[k] - tol}
|
|
251
|
+
if regressions:
|
|
252
|
+
print(f"REGRESSION: {regressions}")
|
|
253
|
+
sys.exit(1)
|
|
254
|
+
print("no regressions")
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
Guard against flaky graders: run judge grades `n=3` and require majority; never put a
|
|
258
|
+
grader with judge-human agreement below your threshold into a release gate.
|
|
259
|
+
|
|
260
|
+
## Tracing (OTel GenAI)
|
|
261
|
+
|
|
262
|
+
Emit OpenTelemetry **GenAI semantic-convention** spans (`gen_ai.*`). Dashboards
|
|
263
|
+
(Langfuse / Phoenix / Braintrust) are swappable OTLP backends — you emit spans, you swap
|
|
264
|
+
the exporter.
|
|
265
|
+
|
|
266
|
+
```python
|
|
267
|
+
from opentelemetry import trace
|
|
268
|
+
from opentelemetry.sdk.trace import TracerProvider
|
|
269
|
+
from opentelemetry.sdk.trace.export import BatchSpanProcessor
|
|
270
|
+
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
|
|
271
|
+
|
|
272
|
+
provider = TracerProvider()
|
|
273
|
+
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter())) # OTEL_EXPORTER_OTLP_ENDPOINT
|
|
274
|
+
trace.set_tracer_provider(provider)
|
|
275
|
+
tracer = trace.get_tracer("agent")
|
|
276
|
+
|
|
277
|
+
|
|
278
|
+
async def traced_complete(p: LLMProvider, system: str, req: CompletionRequest) -> CompletionResponse:
|
|
279
|
+
with tracer.start_as_current_span("chat") as span:
|
|
280
|
+
span.set_attribute("gen_ai.system", system) # "openai" | "anthropic" | ...
|
|
281
|
+
span.set_attribute("gen_ai.request.model", req.model)
|
|
282
|
+
span.set_attribute("gen_ai.request.temperature", req.temperature)
|
|
283
|
+
resp = await p.complete(req)
|
|
284
|
+
span.set_attribute("gen_ai.usage.input_tokens", resp.usage.input_tokens)
|
|
285
|
+
span.set_attribute("gen_ai.usage.output_tokens", resp.usage.output_tokens)
|
|
286
|
+
span.set_attribute("gen_ai.usage.cost_usd", resp.usage.cost_usd)
|
|
287
|
+
return resp
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
Wrap each tool call and each agent step in its own child span; the agent loop's `run_id`
|
|
291
|
+
becomes the trace id, so a whole run is one tree. **Redact PII before it lands in a span**
|
|
292
|
+
(spans persist to a backend) — never trace raw user content unfiltered.
|
|
293
|
+
|
|
294
|
+
## Cost & latency control
|
|
295
|
+
|
|
296
|
+
### Caching
|
|
297
|
+
|
|
298
|
+
Provider prompt caching (Anthropic `cache_control`, OpenAI/Gemini automatic cached input)
|
|
299
|
+
is abstracted behind the adapter (`cache_system` flag). On top of that, an **app-level
|
|
300
|
+
semantic cache** returns a prior answer when a new request is near-identical:
|
|
301
|
+
|
|
302
|
+
```python
|
|
303
|
+
import hashlib
|
|
304
|
+
|
|
305
|
+
|
|
306
|
+
class SemanticCache:
|
|
307
|
+
def __init__(self, provider: LLMProvider, retriever, threshold: float = 0.95) -> None:
|
|
308
|
+
self.provider, self.retriever, self.threshold = provider, retriever, threshold
|
|
309
|
+
|
|
310
|
+
async def get_or_compute(self, prompt: str, compute) -> tuple[str, bool]:
|
|
311
|
+
[vec] = await self.provider.embed([prompt])
|
|
312
|
+
hits = await self.retriever.search(vec, k=1, flt={"kind": "cache"})
|
|
313
|
+
if hits and hits[0].get("sim", 0) >= self.threshold:
|
|
314
|
+
return hits[0]["content"], True # cache hit
|
|
315
|
+
answer = await compute()
|
|
316
|
+
await self.retriever.upsert(prompt, vec, answer, meta={"kind": "cache"})
|
|
317
|
+
return answer, False
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
### Batching
|
|
321
|
+
|
|
322
|
+
For offline workloads (backfills, nightly evals, bulk classification), use provider batch
|
|
323
|
+
APIs (≈ −50% cost class) instead of synchronous calls. Submit a JSONL of requests, poll,
|
|
324
|
+
collect — latency is hours, not seconds, which is fine offline.
|
|
325
|
+
|
|
326
|
+
### Model routing / cascades
|
|
327
|
+
|
|
328
|
+
Classify the task, send it to the **cheapest model that passes the eval for that task
|
|
329
|
+
class**, and escalate only on a failed self-check — never default to the flagship.
|
|
330
|
+
|
|
331
|
+
```python
|
|
332
|
+
async def route_and_run(task: str, classify, providers: dict[str, tuple[LLMProvider, str]]) -> str:
|
|
333
|
+
tier = await classify(task) # "cheap" | "default" | "smart"
|
|
334
|
+
order = {"cheap": ["cheap", "default", "smart"],
|
|
335
|
+
"default": ["default", "smart"],
|
|
336
|
+
"smart": ["smart"]}[tier]
|
|
337
|
+
for name in order:
|
|
338
|
+
provider, model = providers[name]
|
|
339
|
+
resp = await provider.complete(CompletionRequest(model=model,
|
|
340
|
+
messages=[Message(role="user", content=task)]))
|
|
341
|
+
if await _self_check(provider, model, task, resp.text): # escalate only on failure
|
|
342
|
+
return resp.text
|
|
343
|
+
return resp.text # smart tier is terminal
|
|
344
|
+
|
|
345
|
+
|
|
346
|
+
async def _self_check(provider, model, task: str, answer: str) -> bool:
|
|
347
|
+
v = await complete_structured(provider, CompletionRequest(model=model, messages=[
|
|
348
|
+
Message(role="user", content=f"Does this answer fully and correctly address the task? "
|
|
349
|
+
f"TASK: {task}\nANSWER: {answer}")]), Verdict)
|
|
350
|
+
return v.score >= 0.8
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
### Budgets
|
|
354
|
+
|
|
355
|
+
An immutable per-tenant cost tracker with a hard stop. Frozen dataclass = auditable, never
|
|
356
|
+
silently mutated.
|
|
357
|
+
|
|
358
|
+
```python
|
|
359
|
+
from dataclasses import dataclass, field
|
|
360
|
+
|
|
361
|
+
|
|
362
|
+
class BudgetExceededError(Exception):
|
|
363
|
+
pass
|
|
364
|
+
|
|
365
|
+
|
|
366
|
+
@dataclass(frozen=True, slots=True)
|
|
367
|
+
class CostRecord:
|
|
368
|
+
tenant: str
|
|
369
|
+
model: str
|
|
370
|
+
input_tokens: int
|
|
371
|
+
output_tokens: int
|
|
372
|
+
cost_usd: float
|
|
373
|
+
|
|
374
|
+
|
|
375
|
+
@dataclass(frozen=True, slots=True)
|
|
376
|
+
class CostTracker:
|
|
377
|
+
budgets_usd: dict[str, float] = field(default_factory=dict) # per-tenant cap
|
|
378
|
+
records: tuple[CostRecord, ...] = ()
|
|
379
|
+
|
|
380
|
+
def add(self, r: CostRecord) -> "CostTracker":
|
|
381
|
+
spent = self.spent(r.tenant) + r.cost_usd
|
|
382
|
+
cap = self.budgets_usd.get(r.tenant, float("inf"))
|
|
383
|
+
if spent > cap:
|
|
384
|
+
raise BudgetExceededError(f"{r.tenant}: ${spent:.4f} exceeds ${cap:.2f}")
|
|
385
|
+
return CostTracker(budgets_usd=self.budgets_usd, records=(*self.records, r))
|
|
386
|
+
|
|
387
|
+
def spent(self, tenant: str) -> float:
|
|
388
|
+
return sum(rec.cost_usd for rec in self.records if rec.tenant == tenant)
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
Compute `cost_usd` from the dated pricing table — never hardcode a price in business logic.
|
|
392
|
+
|
|
393
|
+
| Model (provider:id) | $/MTok in | $/MTok out | Notes |
|
|
394
|
+
|---|---|---|---|
|
|
395
|
+
| anthropic:claude-haiku-4-5 | 1.00 | 5.00 | cheap/fast; caching −90% read, batch −50% |
|
|
396
|
+
| anthropic:claude-sonnet-4-6 | 3.00 | 15.00 | default; 1M-token context flat-rate |
|
|
397
|
+
| anthropic:claude-opus-4-8 | 5.00 | 25.00 | flagship Opus (4.8, May 2026); re-verify the id |
|
|
398
|
+
| openai:gpt-5.5 | 5.00 | 30.00 | flagship; >272K prompt billed 2× in / 1.5× out |
|
|
399
|
+
| openai:gpt-5.4 | 2.50 | 15.00 | cached input −90% |
|
|
400
|
+
| openai:gpt-5.2-codex | 1.75 | 14.00 | code-tuned |
|
|
401
|
+
| gemini:gemini-3.5-flash | 1.50 | 9.00 | cached 0.15; 1M context |
|
|
402
|
+
| gemini:gemini-3.1-pro | — | — | paid-only since 2026-04-01 |
|
|
403
|
+
|
|
404
|
+
All figures **(as of 2026-06; verify before quoting)**. Load them from config and refresh
|
|
405
|
+
the table before trusting any number.
|
|
406
|
+
|
|
407
|
+
## Anti-patterns
|
|
408
|
+
|
|
409
|
+
- **Overfitting prompts to the eval set** — you measure memorization, not capability.
|
|
410
|
+
Hold out a fresh split.
|
|
411
|
+
- **Happy-path-only evals** — include adversarial, empty, and malformed inputs.
|
|
412
|
+
- **Chasing accuracy while cost/latency drift** — gate on `cost_per_task` and `p95`, not
|
|
413
|
+
just accuracy.
|
|
414
|
+
- **Flaky graders in release gates** — calibrate and majority-vote, or keep them advisory.
|
|
415
|
+
- **Tracing PII without redaction** — spans persist; redact before emit.
|
|
416
|
+
|
|
417
|
+
## See also
|
|
418
|
+
|
|
419
|
+
- `provider-abstraction.md` — the `LLMProvider` the runner, judge, and router all use.
|
|
420
|
+
- `agent-loops-and-harness.md` — budgets and `AgentState` the `CostTracker` complements.
|