@ruaruababa/vibe-kit 1.0.0
- package/CATALOG.md +317 -0
- package/README.md +121 -0
- package/aliases.json +65 -0
- package/bin/vibe.js +2 -0
- package/bundles.json +265 -0
- package/catalog.json +1560 -0
- package/dist/antigravity-skills/bin/cli.js +438 -0
- package/dist/antigravity-skills/lib/skill-utils.js +158 -0
- package/dist/antigravity-skills/scripts/build-catalog.js +305 -0
- package/dist/antigravity-skills/scripts/normalize-frontmatter.js +144 -0
- package/dist/antigravity-skills/scripts/validate-skills.js +230 -0
- package/dist/bin/vibe.js +2 -0
- package/dist/dist/src/cli/index.js +26 -0
- package/dist/lib/skill-utils.js +158 -0
- package/dist/scripts/build-catalog.js +50 -0
- package/dist/scripts/normalize-frontmatter.js +144 -0
- package/dist/scripts/validate-skills.js +56 -0
- package/dist/src/cli/index.js +146 -0
- package/dist/src/types/index.js +13 -0
- package/dist/src/utils/fs.js +1 -0
- package/package.json +43 -0
- package/skills/accessibility-compliance-accessibility-audit/SKILL.md +42 -0
- package/skills/accessibility-compliance-accessibility-audit/resources/implementation-playbook.md +502 -0
- package/skills/agent-orchestration-improve-agent/SKILL.md +349 -0
- package/skills/agent-orchestration-multi-agent-optimize/SKILL.md +239 -0
- package/skills/agent-orchestrator/SKILL.md +24 -0
- package/skills/ai-engineer/SKILL.md +171 -0
- package/skills/airflow-dag-patterns/SKILL.md +41 -0
- package/skills/airflow-dag-patterns/resources/implementation-playbook.md +509 -0
- package/skills/angular-migration/SKILL.md +428 -0
- package/skills/anti-reversing-techniques/SKILL.md +42 -0
- package/skills/anti-reversing-techniques/resources/implementation-playbook.md +539 -0
- package/skills/api-design-principles/SKILL.md +37 -0
- package/skills/api-design-principles/assets/api-design-checklist.md +155 -0
- package/skills/api-design-principles/assets/rest-api-template.py +182 -0
- package/skills/api-design-principles/references/graphql-schema-design.md +583 -0
- package/skills/api-design-principles/references/rest-best-practices.md +408 -0
- package/skills/api-design-principles/resources/implementation-playbook.md +513 -0
- package/skills/api-documenter/SKILL.md +184 -0
- package/skills/api-testing-observability-api-mock/SKILL.md +46 -0
- package/skills/api-testing-observability-api-mock/resources/implementation-playbook.md +1327 -0
- package/skills/application-performance-performance-optimization/SKILL.md +154 -0
- package/skills/architect-review/SKILL.md +174 -0
- package/skills/architecture-decision-records/SKILL.md +441 -0
- package/skills/architecture-patterns/SKILL.md +37 -0
- package/skills/architecture-patterns/resources/implementation-playbook.md +479 -0
- package/skills/arm-cortex-expert/SKILL.md +306 -0
- package/skills/async-python-patterns/SKILL.md +39 -0
- package/skills/async-python-patterns/resources/implementation-playbook.md +678 -0
- package/skills/attack-tree-construction/SKILL.md +38 -0
- package/skills/attack-tree-construction/resources/implementation-playbook.md +671 -0
- package/skills/auth-implementation-patterns/SKILL.md +39 -0
- package/skills/auth-implementation-patterns/resources/implementation-playbook.md +618 -0
- package/skills/backend-architect/SKILL.md +333 -0
- package/skills/backend-development-feature-development/SKILL.md +180 -0
- package/skills/backend-security-coder/SKILL.md +156 -0
- package/skills/backtesting-frameworks/SKILL.md +39 -0
- package/skills/backtesting-frameworks/resources/implementation-playbook.md +647 -0
- package/skills/bash-defensive-patterns/SKILL.md +43 -0
- package/skills/bash-defensive-patterns/resources/implementation-playbook.md +517 -0
- package/skills/bash-pro/SKILL.md +310 -0
- package/skills/bats-testing-patterns/SKILL.md +34 -0
- package/skills/bats-testing-patterns/resources/implementation-playbook.md +614 -0
- package/skills/bazel-build-optimization/SKILL.md +397 -0
- package/skills/billing-automation/SKILL.md +42 -0
- package/skills/billing-automation/resources/implementation-playbook.md +544 -0
- package/skills/binary-analysis-patterns/SKILL.md +450 -0
- package/skills/blockchain-developer/SKILL.md +208 -0
- package/skills/business-analyst/SKILL.md +182 -0
- package/skills/c-pro/SKILL.md +56 -0
- package/skills/c4-architecture-c4-architecture/SKILL.md +389 -0
- package/skills/c4-code/SKILL.md +244 -0
- package/skills/c4-component/SKILL.md +153 -0
- package/skills/c4-container/SKILL.md +171 -0
- package/skills/c4-context/SKILL.md +150 -0
- package/skills/changelog-automation/SKILL.md +38 -0
- package/skills/changelog-automation/resources/implementation-playbook.md +538 -0
- package/skills/cicd-automation-workflow-automate/SKILL.md +51 -0
- package/skills/cicd-automation-workflow-automate/resources/implementation-playbook.md +1333 -0
- package/skills/clean-markdown/SKILL.md +23 -0
- package/skills/cloud-architect/SKILL.md +135 -0
- package/skills/code-documentation-code-explain/SKILL.md +46 -0
- package/skills/code-documentation-code-explain/resources/implementation-playbook.md +802 -0
- package/skills/code-documentation-doc-generate/SKILL.md +48 -0
- package/skills/code-documentation-doc-generate/resources/implementation-playbook.md +640 -0
- package/skills/code-refactoring-context-restore/SKILL.md +179 -0
- package/skills/code-refactoring-refactor-clean/SKILL.md +51 -0
- package/skills/code-refactoring-refactor-clean/resources/implementation-playbook.md +879 -0
- package/skills/code-refactoring-tech-debt/SKILL.md +386 -0
- package/skills/code-review-ai-ai-review/SKILL.md +450 -0
- package/skills/code-review-excellence/SKILL.md +40 -0
- package/skills/code-review-excellence/resources/implementation-playbook.md +515 -0
- package/skills/code-reviewer/SKILL.md +178 -0
- package/skills/codebase-cleanup-deps-audit/SKILL.md +51 -0
- package/skills/codebase-cleanup-deps-audit/resources/implementation-playbook.md +766 -0
- package/skills/codebase-cleanup-refactor-clean/SKILL.md +51 -0
- package/skills/codebase-cleanup-refactor-clean/resources/implementation-playbook.md +879 -0
- package/skills/codebase-cleanup-tech-debt/SKILL.md +386 -0
- package/skills/competitive-landscape/SKILL.md +34 -0
- package/skills/competitive-landscape/resources/implementation-playbook.md +494 -0
- package/skills/comprehensive-review-full-review/SKILL.md +146 -0
- package/skills/comprehensive-review-pr-enhance/SKILL.md +46 -0
- package/skills/comprehensive-review-pr-enhance/resources/implementation-playbook.md +691 -0
- package/skills/conductor-implement/SKILL.md +388 -0
- package/skills/conductor-manage/SKILL.md +39 -0
- package/skills/conductor-manage/resources/implementation-playbook.md +1120 -0
- package/skills/conductor-new-track/SKILL.md +433 -0
- package/skills/conductor-revert/SKILL.md +372 -0
- package/skills/conductor-setup/SKILL.md +426 -0
- package/skills/conductor-status/SKILL.md +338 -0
- package/skills/conductor-validator/SKILL.md +62 -0
- package/skills/content-marketer/SKILL.md +170 -0
- package/skills/context-driven-development/SKILL.md +400 -0
- package/skills/context-management-context-restore/SKILL.md +179 -0
- package/skills/context-management-context-save/SKILL.md +177 -0
- package/skills/context-manager/SKILL.md +185 -0
- package/skills/cost-optimization/SKILL.md +286 -0
- package/skills/cpp-pro/SKILL.md +59 -0
- package/skills/cqrs-implementation/SKILL.md +35 -0
- package/skills/cqrs-implementation/resources/implementation-playbook.md +540 -0
- package/skills/csharp-pro/SKILL.md +59 -0
- package/skills/customer-support/SKILL.md +170 -0
- package/skills/data-engineer/SKILL.md +224 -0
- package/skills/data-engineering-data-driven-feature/SKILL.md +182 -0
- package/skills/data-engineering-data-pipeline/SKILL.md +201 -0
- package/skills/data-quality-frameworks/SKILL.md +40 -0
- package/skills/data-quality-frameworks/resources/implementation-playbook.md +573 -0
- package/skills/data-scientist/SKILL.md +199 -0
- package/skills/data-storytelling/SKILL.md +465 -0
- package/skills/database-admin/SKILL.md +165 -0
- package/skills/database-architect/SKILL.md +268 -0
- package/skills/database-cloud-optimization-cost-optimize/SKILL.md +44 -0
- package/skills/database-cloud-optimization-cost-optimize/resources/implementation-playbook.md +1441 -0
- package/skills/database-migration/SKILL.md +436 -0
- package/skills/database-migrations-migration-observability/SKILL.md +420 -0
- package/skills/database-migrations-sql-migrations/SKILL.md +53 -0
- package/skills/database-migrations-sql-migrations/resources/implementation-playbook.md +499 -0
- package/skills/database-optimizer/SKILL.md +167 -0
- package/skills/dbt-transformation-patterns/SKILL.md +34 -0
- package/skills/dbt-transformation-patterns/resources/implementation-playbook.md +547 -0
- package/skills/debugger/SKILL.md +49 -0
- package/skills/debugging-strategies/SKILL.md +34 -0
- package/skills/debugging-strategies/resources/implementation-playbook.md +511 -0
- package/skills/debugging-toolkit-smart-debug/SKILL.md +197 -0
- package/skills/defi-protocol-templates/SKILL.md +466 -0
- package/skills/dependency-management-deps-audit/SKILL.md +44 -0
- package/skills/dependency-management-deps-audit/resources/implementation-playbook.md +766 -0
- package/skills/dependency-upgrade/SKILL.md +421 -0
- package/skills/deployment-engineer/SKILL.md +170 -0
- package/skills/deployment-pipeline-design/SKILL.md +371 -0
- package/skills/deployment-validation-config-validate/SKILL.md +496 -0
- package/skills/devops-troubleshooter/SKILL.md +161 -0
- package/skills/distributed-debugging-debug-trace/SKILL.md +44 -0
- package/skills/distributed-debugging-debug-trace/resources/implementation-playbook.md +1307 -0
- package/skills/distributed-tracing/SKILL.md +450 -0
- package/skills/django-pro/SKILL.md +180 -0
- package/skills/docs-architect/SKILL.md +98 -0
- package/skills/documentation-generation-doc-generate/SKILL.md +48 -0
- package/skills/documentation-generation-doc-generate/resources/implementation-playbook.md +640 -0
- package/skills/dotnet-architect/SKILL.md +197 -0
- package/skills/dotnet-backend-patterns/SKILL.md +37 -0
- package/skills/dotnet-backend-patterns/assets/repository-template.cs +523 -0
- package/skills/dotnet-backend-patterns/assets/service-template.cs +336 -0
- package/skills/dotnet-backend-patterns/references/dapper-patterns.md +544 -0
- package/skills/dotnet-backend-patterns/references/ef-core-best-practices.md +355 -0
- package/skills/dotnet-backend-patterns/resources/implementation-playbook.md +799 -0
- package/skills/dummy-skill/SKILL.md +5 -0
- package/skills/dx-optimizer/SKILL.md +83 -0
- package/skills/e2e-testing-patterns/SKILL.md +41 -0
- package/skills/e2e-testing-patterns/resources/implementation-playbook.md +531 -0
- package/skills/elixir-pro/SKILL.md +59 -0
- package/skills/embedding-strategies/SKILL.md +491 -0
- package/skills/employment-contract-templates/SKILL.md +39 -0
- package/skills/employment-contract-templates/resources/implementation-playbook.md +493 -0
- package/skills/error-debugging-error-analysis/SKILL.md +47 -0
- package/skills/error-debugging-error-analysis/resources/implementation-playbook.md +1143 -0
- package/skills/error-debugging-error-trace/SKILL.md +43 -0
- package/skills/error-debugging-error-trace/resources/implementation-playbook.md +1361 -0
- package/skills/error-debugging-multi-agent-review/SKILL.md +216 -0
- package/skills/error-detective/SKILL.md +53 -0
- package/skills/error-diagnostics-error-analysis/SKILL.md +47 -0
- package/skills/error-diagnostics-error-analysis/resources/implementation-playbook.md +1143 -0
- package/skills/error-diagnostics-error-trace/SKILL.md +48 -0
- package/skills/error-diagnostics-error-trace/resources/implementation-playbook.md +1371 -0
- package/skills/error-diagnostics-smart-debug/SKILL.md +197 -0
- package/skills/error-handling-patterns/SKILL.md +35 -0
- package/skills/error-handling-patterns/resources/implementation-playbook.md +635 -0
- package/skills/event-sourcing-architect/SKILL.md +58 -0
- package/skills/event-store-design/SKILL.md +449 -0
- package/skills/fastapi-pro/SKILL.md +192 -0
- package/skills/fastapi-templates/SKILL.md +32 -0
- package/skills/fastapi-templates/resources/implementation-playbook.md +566 -0
- package/skills/final-test/SKILL.md +5 -0
- package/skills/firmware-analyst/SKILL.md +320 -0
- package/skills/flutter-expert/SKILL.md +200 -0
- package/skills/framework-migration-code-migrate/SKILL.md +48 -0
- package/skills/framework-migration-code-migrate/resources/implementation-playbook.md +1052 -0
- package/skills/framework-migration-deps-upgrade/SKILL.md +48 -0
- package/skills/framework-migration-deps-upgrade/resources/implementation-playbook.md +755 -0
- package/skills/framework-migration-legacy-modernize/SKILL.md +132 -0
- package/skills/frontend-developer/SKILL.md +171 -0
- package/skills/frontend-mobile-development-component-scaffold/SKILL.md +403 -0
- package/skills/frontend-mobile-security-xss-scan/SKILL.md +322 -0
- package/skills/frontend-security-coder/SKILL.md +170 -0
- package/skills/full-stack-orchestration-full-stack-feature/SKILL.md +135 -0
- package/skills/gdpr-data-handling/SKILL.md +33 -0
- package/skills/gdpr-data-handling/resources/implementation-playbook.md +615 -0
- package/skills/git-advanced-workflows/SKILL.md +412 -0
- package/skills/git-pr-workflows-git-workflow/SKILL.md +140 -0
- package/skills/git-pr-workflows-onboard/SKILL.md +416 -0
- package/skills/git-pr-workflows-pr-enhance/SKILL.md +48 -0
- package/skills/git-pr-workflows-pr-enhance/resources/implementation-playbook.md +701 -0
- package/skills/github-actions-templates/SKILL.md +345 -0
- package/skills/gitlab-ci-patterns/SKILL.md +283 -0
- package/skills/gitops-workflow/SKILL.md +303 -0
- package/skills/gitops-workflow/references/argocd-setup.md +134 -0
- package/skills/gitops-workflow/references/sync-policies.md +131 -0
- package/skills/go-concurrency-patterns/SKILL.md +33 -0
- package/skills/go-concurrency-patterns/resources/implementation-playbook.md +654 -0
- package/skills/godot-gdscript-patterns/SKILL.md +33 -0
- package/skills/godot-gdscript-patterns/resources/implementation-playbook.md +804 -0
- package/skills/golang-pro/SKILL.md +179 -0
- package/skills/grafana-dashboards/SKILL.md +381 -0
- package/skills/graphql-architect/SKILL.md +182 -0
- package/skills/haskell-pro/SKILL.md +56 -0
- package/skills/helm-chart-scaffolding/SKILL.md +34 -0
- package/skills/helm-chart-scaffolding/assets/Chart.yaml.template +42 -0
- package/skills/helm-chart-scaffolding/assets/values.yaml.template +185 -0
- package/skills/helm-chart-scaffolding/references/chart-structure.md +500 -0
- package/skills/helm-chart-scaffolding/resources/implementation-playbook.md +543 -0
- package/skills/helm-chart-scaffolding/scripts/validate-chart.sh +244 -0
- package/skills/hr-pro/SKILL.md +126 -0
- package/skills/hybrid-cloud-architect/SKILL.md +168 -0
- package/skills/hybrid-cloud-networking/SKILL.md +238 -0
- package/skills/hybrid-search-implementation/SKILL.md +32 -0
- package/skills/hybrid-search-implementation/resources/implementation-playbook.md +567 -0
- package/skills/incident-responder/SKILL.md +213 -0
- package/skills/incident-response-incident-response/SKILL.md +168 -0
- package/skills/incident-response-smart-fix/SKILL.md +29 -0
- package/skills/incident-response-smart-fix/resources/implementation-playbook.md +838 -0
- package/skills/incident-runbook-templates/SKILL.md +395 -0
- package/skills/ios-developer/SKILL.md +219 -0
- package/skills/istio-traffic-management/SKILL.md +337 -0
- package/skills/java-pro/SKILL.md +177 -0
- package/skills/javascript-pro/SKILL.md +57 -0
- package/skills/javascript-testing-patterns/SKILL.md +35 -0
- package/skills/javascript-testing-patterns/resources/implementation-playbook.md +1024 -0
- package/skills/javascript-typescript-typescript-scaffold/SKILL.md +361 -0
- package/skills/julia-pro/SKILL.md +209 -0
- package/skills/k8s-manifest-generator/SKILL.md +35 -0
- package/skills/k8s-manifest-generator/assets/configmap-template.yaml +296 -0
- package/skills/k8s-manifest-generator/assets/deployment-template.yaml +203 -0
- package/skills/k8s-manifest-generator/assets/service-template.yaml +171 -0
- package/skills/k8s-manifest-generator/references/deployment-spec.md +753 -0
- package/skills/k8s-manifest-generator/references/service-spec.md +724 -0
- package/skills/k8s-manifest-generator/resources/implementation-playbook.md +510 -0
- package/skills/k8s-security-policies/SKILL.md +346 -0
- package/skills/k8s-security-policies/assets/network-policy-template.yaml +177 -0
- package/skills/k8s-security-policies/references/rbac-patterns.md +187 -0
- package/skills/kpi-dashboard-design/SKILL.md +440 -0
- package/skills/kubernetes-architect/SKILL.md +170 -0
- package/skills/langchain-architecture/SKILL.md +350 -0
- package/skills/legacy-modernizer/SKILL.md +53 -0
- package/skills/legal-advisor/SKILL.md +70 -0
- package/skills/linkerd-patterns/SKILL.md +321 -0
- package/skills/llm-application-dev-ai-assistant/SKILL.md +35 -0
- package/skills/llm-application-dev-ai-assistant/resources/implementation-playbook.md +1236 -0
- package/skills/llm-application-dev-langchain-agent/SKILL.md +246 -0
- package/skills/llm-application-dev-prompt-optimize/SKILL.md +37 -0
- package/skills/llm-application-dev-prompt-optimize/resources/implementation-playbook.md +591 -0
- package/skills/llm-evaluation/SKILL.md +483 -0
- package/skills/machine-learning-ops-ml-pipeline/SKILL.md +314 -0
- package/skills/malware-analyst/SKILL.md +247 -0
- package/skills/market-sizing-analysis/SKILL.md +425 -0
- package/skills/market-sizing-analysis/examples/saas-market-sizing.md +349 -0
- package/skills/market-sizing-analysis/references/data-sources.md +360 -0
- package/skills/memory-forensics/SKILL.md +491 -0
- package/skills/memory-safety-patterns/SKILL.md +33 -0
- package/skills/memory-safety-patterns/resources/implementation-playbook.md +603 -0
- package/skills/mermaid-expert/SKILL.md +59 -0
- package/skills/microservices-patterns/SKILL.md +35 -0
- package/skills/microservices-patterns/resources/implementation-playbook.md +607 -0
- package/skills/minecraft-bukkit-pro/SKILL.md +126 -0
- package/skills/ml-engineer/SKILL.md +168 -0
- package/skills/ml-pipeline-workflow/SKILL.md +257 -0
- package/skills/mlops-engineer/SKILL.md +219 -0
- package/skills/mobile-developer/SKILL.md +205 -0
- package/skills/mobile-security-coder/SKILL.md +184 -0
- package/skills/modern-javascript-patterns/SKILL.md +35 -0
- package/skills/modern-javascript-patterns/resources/implementation-playbook.md +910 -0
- package/skills/monorepo-architect/SKILL.md +61 -0
- package/skills/monorepo-management/SKILL.md +35 -0
- package/skills/monorepo-management/resources/implementation-playbook.md +621 -0
- package/skills/mtls-configuration/SKILL.md +359 -0
- package/skills/multi-cloud-architecture/SKILL.md +189 -0
- package/skills/multi-platform-apps-multi-platform/SKILL.md +203 -0
- package/skills/network-engineer/SKILL.md +169 -0
- package/skills/nextjs-app-router-patterns/SKILL.md +33 -0
- package/skills/nextjs-app-router-patterns/resources/implementation-playbook.md +543 -0
- package/skills/nft-standards/SKILL.md +395 -0
- package/skills/node-expert/SKILL.md +23 -0
- package/skills/nodejs-backend-patterns/SKILL.md +35 -0
- package/skills/nodejs-backend-patterns/resources/implementation-playbook.md +1019 -0
- package/skills/nx-workspace-patterns/SKILL.md +464 -0
- package/skills/observability-engineer/SKILL.md +237 -0
- package/skills/observability-monitoring-monitor-setup/SKILL.md +48 -0
- package/skills/observability-monitoring-monitor-setup/resources/implementation-playbook.md +505 -0
- package/skills/observability-monitoring-slo-implement/SKILL.md +43 -0
- package/skills/observability-monitoring-slo-implement/resources/implementation-playbook.md +1077 -0
- package/skills/on-call-handoff-patterns/SKILL.md +453 -0
- package/skills/openapi-spec-generation/SKILL.md +33 -0
- package/skills/openapi-spec-generation/resources/implementation-playbook.md +1027 -0
- package/skills/payment-integration/SKILL.md +77 -0
- package/skills/paypal-integration/SKILL.md +479 -0
- package/skills/pci-compliance/SKILL.md +478 -0
- package/skills/performance-engineer/SKILL.md +180 -0
- package/skills/performance-testing-review-ai-review/SKILL.md +450 -0
- package/skills/performance-testing-review-multi-agent-review/SKILL.md +216 -0
- package/skills/php-pro/SKILL.md +63 -0
- package/skills/posix-shell-pro/SKILL.md +304 -0
- package/skills/postgresql/SKILL.md +230 -0
- package/skills/postmortem-writing/SKILL.md +386 -0
- package/skills/projection-patterns/SKILL.md +33 -0
- package/skills/projection-patterns/resources/implementation-playbook.md +501 -0
- package/skills/prometheus-configuration/SKILL.md +404 -0
- package/skills/prompt-engineer/SKILL.md +272 -0
- package/skills/prompt-engineering-patterns/SKILL.md +213 -0
- package/skills/prompt-engineering-patterns/assets/few-shot-examples.json +106 -0
- package/skills/prompt-engineering-patterns/assets/prompt-template-library.md +246 -0
- package/skills/prompt-engineering-patterns/references/chain-of-thought.md +399 -0
- package/skills/prompt-engineering-patterns/references/few-shot-learning.md +369 -0
- package/skills/prompt-engineering-patterns/references/prompt-optimization.md +414 -0
- package/skills/prompt-engineering-patterns/references/prompt-templates.md +470 -0
- package/skills/prompt-engineering-patterns/references/system-prompts.md +189 -0
- package/skills/prompt-engineering-patterns/scripts/optimize-prompt.py +279 -0
- package/skills/protocol-reverse-engineering/SKILL.md +29 -0
- package/skills/protocol-reverse-engineering/resources/implementation-playbook.md +509 -0
- package/skills/python-development-python-scaffold/SKILL.md +331 -0
- package/skills/python-packaging/SKILL.md +36 -0
- package/skills/python-packaging/resources/implementation-playbook.md +869 -0
- package/skills/python-performance-optimization/SKILL.md +36 -0
- package/skills/python-performance-optimization/resources/implementation-playbook.md +868 -0
- package/skills/python-pro/SKILL.md +158 -0
- package/skills/python-testing-patterns/SKILL.md +37 -0
- package/skills/python-testing-patterns/resources/implementation-playbook.md +906 -0
- package/skills/quant-analyst/SKILL.md +53 -0
- package/skills/rag-implementation/SKILL.md +421 -0
- package/skills/react-modernization/SKILL.md +34 -0
- package/skills/react-modernization/resources/implementation-playbook.md +512 -0
- package/skills/react-native-architecture/SKILL.md +33 -0
- package/skills/react-native-architecture/resources/implementation-playbook.md +670 -0
- package/skills/react-state-management/SKILL.md +441 -0
- package/skills/reference-builder/SKILL.md +188 -0
- package/skills/reverse-engineer/SKILL.md +173 -0
- package/skills/risk-manager/SKILL.md +61 -0
- package/skills/risk-metrics-calculation/SKILL.md +33 -0
- package/skills/risk-metrics-calculation/resources/implementation-playbook.md +554 -0
- package/skills/ruby-pro/SKILL.md +56 -0
- package/skills/rust-async-patterns/SKILL.md +33 -0
- package/skills/rust-async-patterns/resources/implementation-playbook.md +516 -0
- package/skills/rust-pro/SKILL.md +178 -0
- package/skills/saga-orchestration/SKILL.md +496 -0
- package/skills/sales-automator/SKILL.md +55 -0
- package/skills/sast-configuration/SKILL.md +212 -0
- package/skills/scala-pro/SKILL.md +82 -0
- package/skills/screen-reader-testing/SKILL.md +33 -0
- package/skills/screen-reader-testing/resources/implementation-playbook.md +544 -0
- package/skills/search-specialist/SKILL.md +80 -0
- package/skills/secrets-management/SKILL.md +364 -0
- package/skills/security-auditor/SKILL.md +169 -0
- package/skills/security-compliance-compliance-check/SKILL.md +55 -0
- package/skills/security-compliance-compliance-check/resources/implementation-playbook.md +963 -0
- package/skills/security-requirement-extraction/SKILL.md +33 -0
- package/skills/security-requirement-extraction/resources/implementation-playbook.md +676 -0
- package/skills/security-scanning-security-dependencies/SKILL.md +43 -0
- package/skills/security-scanning-security-dependencies/resources/implementation-playbook.md +544 -0
- package/skills/security-scanning-security-hardening/SKILL.md +147 -0
- package/skills/security-scanning-security-sast/SKILL.md +495 -0
- package/skills/seo-authority-builder/SKILL.md +136 -0
- package/skills/seo-cannibalization-detector/SKILL.md +123 -0
- package/skills/seo-content-auditor/SKILL.md +83 -0
- package/skills/seo-content-planner/SKILL.md +108 -0
- package/skills/seo-content-refresher/SKILL.md +118 -0
- package/skills/seo-content-writer/SKILL.md +96 -0
- package/skills/seo-keyword-strategist/SKILL.md +95 -0
- package/skills/seo-meta-optimizer/SKILL.md +92 -0
- package/skills/seo-snippet-hunter/SKILL.md +114 -0
- package/skills/seo-structure-architect/SKILL.md +108 -0
- package/skills/service-mesh-expert/SKILL.md +58 -0
- package/skills/service-mesh-observability/SKILL.md +395 -0
- package/skills/shellcheck-configuration/SKILL.md +466 -0
- package/skills/similarity-search-patterns/SKILL.md +33 -0
- package/skills/similarity-search-patterns/resources/implementation-playbook.md +557 -0
- package/skills/slo-implementation/SKILL.md +341 -0
- package/skills/solidity-security/SKILL.md +34 -0
- package/skills/solidity-security/resources/implementation-playbook.md +524 -0
- package/skills/spark-optimization/SKILL.md +427 -0
- package/skills/sql-optimization-patterns/SKILL.md +35 -0
- package/skills/sql-optimization-patterns/resources/implementation-playbook.md +504 -0
- package/skills/sql-pro/SKILL.md +173 -0
- package/skills/startup-analyst/SKILL.md +328 -0
- package/skills/startup-business-analyst-business-case/SKILL.md +487 -0
- package/skills/startup-business-analyst-financial-projections/SKILL.md +353 -0
- package/skills/startup-business-analyst-market-opportunity/SKILL.md +240 -0
- package/skills/startup-financial-modeling/SKILL.md +467 -0
- package/skills/startup-metrics-framework/SKILL.md +34 -0
- package/skills/startup-metrics-framework/resources/implementation-playbook.md +500 -0
- package/skills/stride-analysis-patterns/SKILL.md +33 -0
- package/skills/stride-analysis-patterns/resources/implementation-playbook.md +655 -0
- package/skills/stripe-integration/SKILL.md +454 -0
- package/skills/systems-programming-rust-project/SKILL.md +440 -0
- package/skills/tailwind-design-system/SKILL.md +33 -0
- package/skills/tailwind-design-system/resources/implementation-playbook.md +665 -0
- package/skills/tdd-orchestrator/SKILL.md +205 -0
- package/skills/tdd-workflows-tdd-cycle/SKILL.md +221 -0
- package/skills/tdd-workflows-tdd-green/SKILL.md +73 -0
- package/skills/tdd-workflows-tdd-green/resources/implementation-playbook.md +870 -0
- package/skills/tdd-workflows-tdd-red/SKILL.md +164 -0
- package/skills/tdd-workflows-tdd-refactor/SKILL.md +187 -0
- package/skills/team-collaboration-issue/SKILL.md +37 -0
- package/skills/team-collaboration-issue/resources/implementation-playbook.md +640 -0
- package/skills/team-collaboration-standup-notes/SKILL.md +44 -0
- package/skills/team-collaboration-standup-notes/resources/implementation-playbook.md +768 -0
- package/skills/team-composition-analysis/SKILL.md +413 -0
- package/skills/temporal-python-pro/SKILL.md +370 -0
- package/skills/temporal-python-testing/SKILL.md +170 -0
- package/skills/temporal-python-testing/resources/integration-testing.md +455 -0
- package/skills/temporal-python-testing/resources/local-setup.md +553 -0
- package/skills/temporal-python-testing/resources/replay-testing.md +462 -0
- package/skills/temporal-python-testing/resources/unit-testing.md +328 -0
- package/skills/terraform-module-library/SKILL.md +261 -0
- package/skills/terraform-module-library/references/aws-modules.md +63 -0
- package/skills/terraform-specialist/SKILL.md +166 -0
- package/skills/test-automator/SKILL.md +224 -0
- package/skills/threat-mitigation-mapping/SKILL.md +33 -0
- package/skills/threat-mitigation-mapping/resources/implementation-playbook.md +744 -0
- package/skills/threat-modeling-expert/SKILL.md +60 -0
- package/skills/track-management/SKILL.md +38 -0
- package/skills/track-management/resources/implementation-playbook.md +591 -0
- package/skills/turborepo-caching/SKILL.md +419 -0
- package/skills/tutorial-engineer/SKILL.md +139 -0
- package/skills/typescript-advanced-types/SKILL.md +35 -0
- package/skills/typescript-advanced-types/resources/implementation-playbook.md +716 -0
- package/skills/typescript-pro/SKILL.md +55 -0
- package/skills/ui-minimal/SKILL.md +23 -0
- package/skills/ui-ux-designer/SKILL.md +209 -0
- package/skills/ui-visual-validator/SKILL.md +214 -0
- package/skills/unit-testing-test-generate/SKILL.md +319 -0
- package/skills/unity-developer/SKILL.md +230 -0
- package/skills/unity-ecs-patterns/SKILL.md +33 -0
- package/skills/unity-ecs-patterns/resources/implementation-playbook.md +625 -0
- package/skills/uv-package-manager/SKILL.md +37 -0
- package/skills/uv-package-manager/resources/implementation-playbook.md +830 -0
- package/skills/vector-database-engineer/SKILL.md +60 -0
- package/skills/vector-index-tuning/SKILL.md +42 -0
- package/skills/vector-index-tuning/resources/implementation-playbook.md +507 -0
- package/skills/wcag-audit-patterns/SKILL.md +41 -0
- package/skills/wcag-audit-patterns/resources/implementation-playbook.md +541 -0
- package/skills/web3-testing/SKILL.md +427 -0
- package/skills/workflow-orchestration-patterns/SKILL.md +333 -0
- package/skills/workflow-patterns/SKILL.md +38 -0
- package/skills/workflow-patterns/resources/implementation-playbook.md +621 -0
@@ -0,0 +1,427 @@
---
name: spark-optimization
description: Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
---

# Apache Spark Optimization

Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.

## Do not use this skill when

- The task is unrelated to Apache Spark optimization
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Use this skill when

- Optimizing slow Spark jobs
- Tuning memory and executor configuration
- Implementing efficient partitioning strategies
- Debugging Spark performance issues
- Scaling Spark pipelines for large datasets
- Reducing shuffle and data skew

## Core Concepts

### 1. Spark Execution Model

```
Driver Program
↓
Job (triggered by action)
↓
Stages (separated by shuffles)
↓
Tasks (one per partition)
```
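
As a rough illustration of the hierarchy above, the sketch below counts stages and tasks for a linear chain of operations. It assumes a simplified model in which every wide transformation adds one shuffle boundary. The `WIDE_OPS` set and both helper names are hypothetical, and Spark's actual planner may produce a different plan (for example, a broadcast join avoids the shuffle, and adaptive execution can change partition counts).

```python
# Simplified model, not Spark's planner: each wide (shuffle-inducing)
# operation ends the current stage and starts a new one.
WIDE_OPS = {"groupBy", "join", "distinct", "orderBy", "repartition"}  # illustrative set


def count_stages(ops: list[str]) -> int:
    """Stages needed for a linear chain of operations under this model."""
    return 1 + sum(1 for op in ops if op in WIDE_OPS)


def count_tasks(partitions_per_stage: list[int]) -> int:
    """One task per partition in every stage."""
    return sum(partitions_per_stage)


if __name__ == "__main__":
    # filter -> select -> groupBy: one shuffle boundary, so two stages.
    print(count_stages(["filter", "select", "groupBy"]))
    # Hypothetical job: 40 input partitions, then 200 shuffle partitions.
    print(count_tasks([40, 200]))
```

Under this model, removing a wide operation from the chain removes a whole stage, which is why minimizing shuffles tends to be the first thing to check.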

### 2. Key Performance Factors

| Factor | Impact | Solution |
|--------|--------|----------|
| **Shuffle** | Network I/O, disk I/O | Minimize wide transformations |
| **Data Skew** | Uneven task duration | Salting, broadcast joins |
| **Serialization** | CPU overhead | Use Kryo, columnar formats |
| **Memory** | GC pressure, spills | Tune executor memory |
| **Partitions** | Parallelism | Right-size partitions |
|
|
54
|
+
|
|
55
|
+
## Quick Start
|
|
56
|
+
|
|
57
|
+
```python
|
|
58
|
+
from pyspark.sql import SparkSession
|
|
59
|
+
from pyspark.sql import functions as F
|
|
60
|
+
|
|
61
|
+
# Create optimized Spark session
|
|
62
|
+
spark = (SparkSession.builder
|
|
63
|
+
.appName("OptimizedJob")
|
|
64
|
+
.config("spark.sql.adaptive.enabled", "true")
|
|
65
|
+
.config("spark.sql.adaptive.coalescePartitions.enabled", "true")
|
|
66
|
+
.config("spark.sql.adaptive.skewJoin.enabled", "true")
|
|
67
|
+
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
|
|
68
|
+
.config("spark.sql.shuffle.partitions", "200")
|
|
69
|
+
.getOrCreate())
|
|
70
|
+
|
|
71
|
+
# Read with optimized settings
|
|
72
|
+
df = (spark.read
|
|
73
|
+
.format("parquet")
|
|
74
|
+
.option("mergeSchema", "false")
|
|
75
|
+
.load("s3://bucket/data/"))
|
|
76
|
+
|
|
77
|
+
# Efficient transformations
|
|
78
|
+
result = (df
|
|
79
|
+
.filter(F.col("date") >= "2024-01-01")
|
|
80
|
+
.select("id", "amount", "category")
|
|
81
|
+
.groupBy("category")
|
|
82
|
+
.agg(F.sum("amount").alias("total")))
|
|
83
|
+
|
|
84
|
+
result.write.mode("overwrite").parquet("s3://bucket/output/")
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
## Patterns
|
|
88
|
+
|
|
89
|
+
### Pattern 1: Optimal Partitioning
|
|
90
|
+
|
|
91
|
+
```python
|
|
92
|
+
# Calculate optimal partition count
|
|
93
|
+
def calculate_partitions(data_size_gb: float, partition_size_mb: int = 128) -> int:
|
|
94
|
+
    """
|
|
95
|
+
    Optimal partition size: 128MB - 256MB
|
|
96
|
+
    Too few: Under-utilization, memory pressure
|
|
97
|
+
    Too many: Task scheduling overhead
|
|
98
|
+
    """
|
|
99
|
+
    return max(int(data_size_gb * 1024 / partition_size_mb), 1)
|
|
100
|
+
|
|
101
|
+
# Repartition for even distribution
|
|
102
|
+
df_repartitioned = df.repartition(200, "partition_key")
|
|
103
|
+
|
|
104
|
+
# Coalesce to reduce partitions (no shuffle)
|
|
105
|
+
df_coalesced = df.coalesce(100)
|
|
106
|
+
|
|
107
|
+
# Partition pruning with predicate pushdown
|
|
108
|
+
df = (spark.read.parquet("s3://bucket/data/")
|
|
109
|
+
.filter(F.col("date") == "2024-01-01")) # Spark pushes this down
|
|
110
|
+
|
|
111
|
+
# Write with partitioning for future queries
|
|
112
|
+
(df.write
|
|
113
|
+
.partitionBy("year", "month", "day")
|
|
114
|
+
.mode("overwrite")
|
|
115
|
+
.parquet("s3://bucket/partitioned_output/"))
|
|
116
|
+
```
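
As a worked example of the sizing rule above, the sketch below computes a size-based partition count and then rounds it up to a multiple of the cluster's core count so the last wave of tasks is not mostly idle. The `partitions_for` helper and the rounding heuristic are assumptions for illustration, not something Spark does for you.

```python
import math


def partitions_for(data_size_gb: float, total_cores: int, partition_size_mb: int = 128) -> int:
    """Size-based partition count, rounded up to a multiple of total_cores."""
    by_size = max(math.ceil(data_size_gb * 1024 / partition_size_mb), 1)
    waves = math.ceil(by_size / total_cores)  # full scheduling waves needed
    return waves * total_cores


# 50 GB at 128 MB per partition gives 400 partitions; with 48 cores that is 9 waves.
print(partitions_for(50, total_cores=48))  # 432
```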
|
|
117
|
+
|
|
118
|
+
### Pattern 2: Join Optimization
|
|
119
|
+
|
|
120
|
+
```python
|
|
121
|
+
from pyspark.sql import functions as F
|
|
122
|
+
from pyspark.sql.types import *
|
|
123
|
+
|
|
124
|
+
# 1. Broadcast Join - Small table joins
|
|
125
|
+
# Best when: One side < 10MB (configurable)
|
|
126
|
+
small_df = spark.read.parquet("s3://bucket/small_table/") # < 10MB
|
|
127
|
+
large_df = spark.read.parquet("s3://bucket/large_table/") # TBs
|
|
128
|
+
|
|
129
|
+
# Explicit broadcast hint
|
|
130
|
+
result = large_df.join(
|
|
131
|
+
F.broadcast(small_df),
|
|
132
|
+
on="key",
|
|
133
|
+
how="left"
|
|
134
|
+
)
|
|
135
|
+
|
|
136
|
+
# 2. Sort-Merge Join - Default for large tables
|
|
137
|
+
# Requires shuffle, but handles any size
|
|
138
|
+
result = large_df1.join(large_df2, on="key", how="inner")
|
|
139
|
+
|
|
140
|
+
# 3. Bucket Join - Pre-sorted, no shuffle at join time
|
|
141
|
+
# Write bucketed tables
|
|
142
|
+
(df.write
|
|
143
|
+
.bucketBy(200, "customer_id")
|
|
144
|
+
.sortBy("customer_id")
|
|
145
|
+
.mode("overwrite")
|
|
146
|
+
.saveAsTable("bucketed_orders"))
|
|
147
|
+
|
|
148
|
+
# Join bucketed tables (no shuffle!)
|
|
149
|
+
orders = spark.table("bucketed_orders")
|
|
150
|
+
customers = spark.table("bucketed_customers") # Same bucket count
|
|
151
|
+
result = orders.join(customers, on="customer_id")
|
|
152
|
+
|
|
153
|
+
# 4. Skew Join Handling
|
|
154
|
+
# Enable AQE skew join optimization
|
|
155
|
+
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
|
|
156
|
+
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
|
|
157
|
+
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
|
|
158
|
+
|
|
159
|
+
# Manual salting for severe skew
|
|
160
|
+
def salt_join(df_skewed, df_other, key_col, num_salts=10):
|
|
161
|
+
    """Add salt to distribute skewed keys"""
|
|
162
|
+
    # Add salt to skewed side
|
|
163
|
+
    df_salted = df_skewed.withColumn(
|
|
164
|
+
        "salt",
|
|
165
|
+
        (F.rand() * num_salts).cast("int")
|
|
166
|
+
    ).withColumn(
|
|
167
|
+
        "salted_key",
|
|
168
|
+
        F.concat(F.col(key_col), F.lit("_"), F.col("salt"))
|
|
169
|
+
    )
|
|
170
|
+
|
|
171
|
+
    # Explode other side with all salts
|
|
172
|
+
    df_exploded = df_other.crossJoin(
|
|
173
|
+
        spark.range(num_salts).withColumnRenamed("id", "salt")
|
|
174
|
+
    ).withColumn(
|
|
175
|
+
        "salted_key",
|
|
176
|
+
        F.concat(F.col(key_col), F.lit("_"), F.col("salt"))
|
|
177
|
+
    ).drop(key_col, "salt")  # avoid duplicate columns after the join
|
|
178
|
+
|
|
179
|
+
    # Join on salted key
|
|
180
|
+
    return df_salted.join(df_exploded, on="salted_key", how="inner")
|
|
181
|
+
```
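
To see why salting helps, the following plain-Python simulation counts rows per partition for a dataset dominated by one hot key, with and without a random salt suffix. It is a sketch under stated assumptions: `partition_of` uses CRC32 as a stand-in for a hash partitioner (it is not Spark's hash function), and the key distribution is made up.

```python
import random
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Stable stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rows_per_partition(keys, num_partitions, num_salts=1, seed=0):
    """Count rows per partition when each key gets a random salt suffix."""
    rng = random.Random(seed)
    counts = Counter()
    for key in keys:
        salted_key = f"{key}_{rng.randrange(num_salts)}"
        counts[partition_of(salted_key, num_partitions)] += 1
    return counts


# Hypothetical data: 90% of rows share one key.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
unsalted = rows_per_partition(keys, num_partitions=8)
salted = rows_per_partition(keys, num_partitions=8, num_salts=10)
print("largest partition without salt:", max(unsalted.values()))
print("largest partition with 10 salts:", max(salted.values()))
```

Without a salt, every row for the hot key lands in one partition. With ten salts the same rows are spread over up to ten salted keys, at the cost of replicating the other side of the join ten times.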
|
|
182
|
+
|
|
183
|
+
### Pattern 3: Caching and Persistence
|
|
184
|
+
|
|
185
|
+
```python
|
|
186
|
+
from pyspark import StorageLevel
|
|
187
|
+
|
|
188
|
+
# Cache when reusing DataFrame multiple times
|
|
189
|
+
df = spark.read.parquet("s3://bucket/data/")
|
|
190
|
+
df_filtered = df.filter(F.col("status") == "active")
|
|
191
|
+
|
|
192
|
+
# Cache in memory (MEMORY_AND_DISK is default)
|
|
193
|
+
df_filtered.cache()
|
|
194
|
+
|
|
195
|
+
# Or with specific storage level
|
|
196
|
+
df_filtered.persist(StorageLevel.MEMORY_AND_DISK)  # use instead of cache(), not in addition to it
|
|
197
|
+
|
|
198
|
+
# Force materialization
|
|
199
|
+
df_filtered.count()
|
|
200
|
+
|
|
201
|
+
# Use in multiple actions
|
|
202
|
+
agg1 = df_filtered.groupBy("category").count()
|
|
203
|
+
agg2 = df_filtered.groupBy("region").sum("amount")
|
|
204
|
+
|
|
205
|
+
# Unpersist when done
|
|
206
|
+
df_filtered.unpersist()
|
|
207
|
+
|
|
208
|
+
# Storage levels explained:
|
|
209
|
+
# MEMORY_ONLY - Fast, but may not fit
|
|
210
|
+
# MEMORY_AND_DISK - Spills to disk if needed (recommended)
|
|
211
|
+
# MEMORY_ONLY_SER - Serialized, less memory, more CPU
|
|
212
|
+
# DISK_ONLY - When memory is tight
|
|
213
|
+
# OFF_HEAP - Tungsten off-heap memory
|
|
214
|
+
|
|
215
|
+
# Checkpoint for complex lineage
|
|
216
|
+
spark.sparkContext.setCheckpointDir("s3://bucket/checkpoints/")
|
|
217
|
+
df_complex = (df
|
|
218
|
+
.join(other_df, "key")
|
|
219
|
+
.groupBy("category")
|
|
220
|
+
.agg(F.sum("amount")))
|
|
221
|
+
df_complex = df_complex.checkpoint()  # Returns a new DataFrame with truncated lineage; keep the result
|
|
222
|
+
```
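
One way to make the persist/unpersist pairing above harder to forget is a small context manager. This is a minimal sketch: `cached` is a hypothetical helper, not part of PySpark, and it only relies on the DataFrame methods `persist()`, `count()`, and `unpersist()` already used in this pattern.

```python
from contextlib import contextmanager


@contextmanager
def cached(df, storage_level=None):
    """Persist df for the duration of the block, then release it."""
    persisted = df.persist(storage_level) if storage_level is not None else df.persist()
    try:
        persisted.count()  # run an action so the cache is actually materialized
        yield persisted
    finally:
        persisted.unpersist()


# Usage sketch (assumes an existing DataFrame named df_filtered):
# with cached(df_filtered) as hot:
#     by_category = hot.groupBy("category").count().collect()
#     by_region = hot.groupBy("region").sum("amount").collect()
```

Actions that should benefit from the cache have to run inside the `with` block, because DataFrame transformations are lazy and the data is unpersisted on exit.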
|
|
223
|
+
|
|
224
|
+
### Pattern 4: Memory Tuning
|
|
225
|
+
|
|
226
|
+
```python
|
|
227
|
+
# Executor memory configuration
|
|
228
|
+
# spark-submit --executor-memory 8g --executor-cores 4
|
|
229
|
+
|
|
230
|
+
# Memory breakdown (8GB executor heap; Spark first reserves 300MB):
|
|
231
|
+
# - spark.memory.fraction = 0.6 (60% of 8GB - 300MB ≈ 4.6GB for execution + storage)
|
|
232
|
+
# - spark.memory.storageFraction = 0.5 (50% of 4.6GB ≈ 2.3GB of cache protected from eviction)
|
|
233
|
+
# - Remaining ≈ 2.3GB for execution (shuffles, joins, sorts); execution can borrow unused storage
|
|
234
|
+
# - Other 40% ≈ 3.1GB for user data structures and internal metadata
|
|
235
|
+
|
|
236
|
+
spark = (SparkSession.builder
|
|
237
|
+
.config("spark.executor.memory", "8g")
|
|
238
|
+
.config("spark.executor.memoryOverhead", "2g") # For non-JVM memory
|
|
239
|
+
.config("spark.memory.fraction", "0.6")
|
|
240
|
+
.config("spark.memory.storageFraction", "0.5")
|
|
241
|
+
.config("spark.sql.shuffle.partitions", "200")
|
|
242
|
+
# For memory-intensive operations
|
|
243
|
+
.config("spark.sql.autoBroadcastJoinThreshold", "50MB")
|
|
244
|
+
# Cap bytes per input partition when reading files
|
|
245
|
+
.config("spark.sql.files.maxPartitionBytes", "128MB")
|
|
246
|
+
.getOrCreate())
|
|
247
|
+
|
|
248
|
+
# Monitor memory usage
|
|
249
|
+
def print_memory_usage(spark):
|
|
250
|
+
    """Print storage (cache) memory per block manager; uses private py4j handles that may change"""
|
|
251
|
+
    sc = spark.sparkContext
|
|
252
|
+
    for executor in sc._jsc.sc().getExecutorMemoryStatus().keySet().toArray():
|
|
253
|
+
        mem_status = sc._jsc.sc().getExecutorMemoryStatus().get(executor)
|
|
254
|
+
        total = mem_status._1() / (1024**3)
|
|
255
|
+
        free = mem_status._2() / (1024**3)
|
|
256
|
+
        print(f"{executor}: {total:.2f}GB total, {free:.2f}GB free")
|
|
257
|
+
```
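
The memory breakdown can be checked with a few lines of arithmetic. The sketch below assumes the unified memory model, in which Spark subtracts a fixed 300MB reserve from the heap before applying `spark.memory.fraction`. The `memory_breakdown` helper is hypothetical, and the heap the JVM actually reports is usually slightly smaller than the configured value.

```python
RESERVED_MB = 300  # fixed reserve subtracted from the heap before fractions apply


def memory_breakdown(heap_mb: float, memory_fraction: float = 0.6,
                     storage_fraction: float = 0.5) -> dict:
    """Split an executor heap into unified, storage, execution, and user memory."""
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    return {
        "unified_mb": unified,              # execution + storage
        "storage_mb": storage,              # cache protected from eviction
        "execution_mb": unified - storage,  # shuffles, joins, sorts
        "user_mb": usable - unified,        # user data structures, metadata
    }


for name, mb in memory_breakdown(8 * 1024).items():
    print(f"{name}: {mb / 1024:.2f} GB")
```

The boundary between storage and execution is soft: each side can borrow from the other when it is idle, so these numbers are starting points rather than hard limits.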
|
|
258
|
+
|
|
259
|
+
### Pattern 5: Shuffle Optimization
|
|
260
|
+
|
|
261
|
+
```python
|
|
262
|
+
# Reduce shuffle data size
|
|
263
|
+
spark.conf.set("spark.sql.shuffle.partitions", "auto")  # Databricks-only value; open-source Spark requires an integer
|
|
264
|
+
spark.conf.set("spark.shuffle.compress", "true")  # core setting (default true); set at launch if a runtime set is rejected
|
|
265
|
+
spark.conf.set("spark.shuffle.spill.compress", "true")  # same caveat as above
|
|
266
|
+
|
|
267
|
+
# Two-stage aggregation: spreads a skewed "key" (Spark already pre-aggregates map-side)
|
|
268
|
+
df_optimized = (df
|
|
269
|
+
# Stage 1: partial sums on a finer-grained key
|
|
270
|
+
.groupBy("key", "partition_col")
|
|
271
|
+
.agg(F.sum("value").alias("partial_sum"))
|
|
272
|
+
# Then global aggregation
|
|
273
|
+
.groupBy("key")
|
|
274
|
+
.agg(F.sum("partial_sum").alias("total")))
|
|
275
|
+
|
|
276
|
+
# Shrink the shuffle with approximate aggregates
|
|
277
|
+
# COSTLY: exact distinct count shuffles every distinct value
|
|
278
|
+
distinct_count = df.select("category").distinct().count()
|
|
279
|
+
|
|
280
|
+
# CHEAPER: approximate distinct count exchanges only small sketches, not the values
|
|
281
|
+
approx_count = df.select(F.approx_count_distinct("category")).collect()[0][0]
|
|
282
|
+
|
|
283
|
+
# Use coalesce instead of repartition when reducing partitions
|
|
284
|
+
df_reduced = df.coalesce(10) # No shuffle
|
|
285
|
+
|
|
286
|
+
# Optimize shuffle with compression
|
|
287
|
+
spark.conf.set("spark.io.compression.codec", "lz4")  # Fast compression (lz4 is already the default)
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
### Pattern 6: Data Format Optimization
|
|
291
|
+
|
|
292
|
+
```python
|
|
293
|
+
# Parquet optimizations
|
|
294
|
+
(df.write
|
|
295
|
+
.option("compression", "snappy") # Fast compression
|
|
296
|
+
.option("parquet.block.size", 128 * 1024 * 1024) # 128MB row groups
|
|
297
|
+
.parquet("s3://bucket/output/"))
|
|
298
|
+
|
|
299
|
+
# Column pruning - only read needed columns
|
|
300
|
+
df = (spark.read.parquet("s3://bucket/data/")
|
|
301
|
+
.select("id", "amount", "date")) # Spark only reads these columns
|
|
302
|
+
|
|
303
|
+
# Predicate pushdown - filter at storage level
|
|
304
|
+
df = (spark.read.parquet("s3://bucket/partitioned/year=2024/")
|
|
305
|
+
.filter(F.col("status") == "active")) # Pushed to Parquet reader
|
|
306
|
+
|
|
307
|
+
# Delta Lake optimizations
|
|
308
|
+
(df.write
|
|
309
|
+
.format("delta")
|
|
310
|
+
.option("optimizeWrite", "true") # Bin-packing
|
|
311
|
+
.option("autoCompact", "true") # Compact small files (may need the delta.autoOptimize.autoCompact table property instead)
|
|
312
|
+
.mode("overwrite")
|
|
313
|
+
.save("s3://bucket/delta_table/"))
|
|
314
|
+
|
|
315
|
+
# Z-ordering for multi-dimensional queries
|
|
316
|
+
spark.sql("""
|
|
317
|
+
OPTIMIZE delta.`s3://bucket/delta_table/`
|
|
318
|
+
ZORDER BY (customer_id, date)
|
|
319
|
+
""")
|
|
320
|
+
```
|
|
321
|
+
|
|
322
|
+
### Pattern 7: Monitoring and Debugging
|
|
323
|
+
|
|
324
|
+
```python
|
|
325
|
+
# Execution settings to verify while debugging (they tune execution; they do not add metrics)
|
|
326
|
+
spark.conf.set("spark.sql.codegen.wholeStage", "true")
|
|
327
|
+
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
|
|
328
|
+
|
|
329
|
+
# Explain query plan
|
|
330
|
+
df.explain(mode="extended")
|
|
331
|
+
# Modes: simple, extended, codegen, cost, formatted
|
|
332
|
+
|
|
333
|
+
# Print the optimized logical plan with statistics, when available
|
|
334
|
+
df.explain(mode="cost")
|
|
335
|
+
|
|
336
|
+
# Monitor task metrics
|
|
337
|
+
def analyze_stage_metrics(spark):
|
|
338
|
+
    """Print task progress for currently active stages"""
|
|
339
|
+
    status_tracker = spark.sparkContext.statusTracker()
|
|
340
|
+
|
|
341
|
+
    for stage_id in status_tracker.getActiveStageIds():
|
|
342
|
+
        stage_info = status_tracker.getStageInfo(stage_id)
|
|
343
|
+
        print(f"Stage {stage_id}:")
|
|
344
|
+
        print(f" Tasks: {stage_info.numTasks}")
|
|
345
|
+
        print(f" Completed: {stage_info.numCompletedTasks}")
|
|
346
|
+
        print(f" Failed: {stage_info.numFailedTasks}")
|
|
347
|
+
|
|
348
|
+
# Identify data skew
|
|
349
|
+
def check_partition_skew(df):
|
|
350
|
+
    """Check for partition skew"""
|
|
351
|
+
    partition_counts = (df
|
|
352
|
+
        .withColumn("partition_id", F.spark_partition_id())
|
|
353
|
+
        .groupBy("partition_id")
|
|
354
|
+
        .count()
|
|
355
|
+
        .orderBy(F.desc("count")))
|
|
356
|
+
|
|
357
|
+
    partition_counts.show(20)
|
|
358
|
+
|
|
359
|
+
    stats = partition_counts.select(
|
|
360
|
+
        F.min("count").alias("min"),
|
|
361
|
+
        F.max("count").alias("max"),
|
|
362
|
+
        F.avg("count").alias("avg"),
|
|
363
|
+
        F.stddev("count").alias("stddev")
|
|
364
|
+
    ).collect()[0]
|
|
365
|
+
|
|
366
|
+
    skew_ratio = stats["max"] / stats["avg"]
|
|
367
|
+
    print(f"Skew ratio: {skew_ratio:.2f}x (>2x indicates skew)")
|
|
368
|
+
```
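
The skew ratio used above is simply the largest partition divided by the mean partition size. The following plain-Python sketch shows the calculation on made-up row counts. The `skew_ratio` helper is hypothetical and the 2x threshold is the rule of thumb from this pattern, not a Spark constant.

```python
from statistics import mean


def skew_ratio(partition_counts) -> float:
    """Largest partition size divided by the mean partition size."""
    counts = list(partition_counts)
    if not counts or sum(counts) == 0:
        return 0.0  # nothing to compare
    return max(counts) / mean(counts)


balanced = [100, 100, 100, 100]
skewed = [10, 10, 10, 970]  # hypothetical row counts per partition
print(f"balanced: {skew_ratio(balanced):.2f}x")
print(f"skewed:   {skew_ratio(skewed):.2f}x")
```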
|
|
369
|
+
|
|
370
|
+
## Configuration Cheat Sheet
|
|
371
|
+
|
|
372
|
+
```python
|
|
373
|
+
# Production configuration template
|
|
374
|
+
spark_configs = {
|
|
375
|
+
# Adaptive Query Execution (AQE)
|
|
376
|
+
"spark.sql.adaptive.enabled": "true",
|
|
377
|
+
"spark.sql.adaptive.coalescePartitions.enabled": "true",
|
|
378
|
+
"spark.sql.adaptive.skewJoin.enabled": "true",
|
|
379
|
+
|
|
380
|
+
# Memory
|
|
381
|
+
"spark.executor.memory": "8g",
|
|
382
|
+
"spark.executor.memoryOverhead": "2g",
|
|
383
|
+
"spark.memory.fraction": "0.6",
|
|
384
|
+
"spark.memory.storageFraction": "0.5",
|
|
385
|
+
|
|
386
|
+
# Parallelism
|
|
387
|
+
"spark.sql.shuffle.partitions": "200",
|
|
388
|
+
"spark.default.parallelism": "200",
|
|
389
|
+
|
|
390
|
+
# Serialization
|
|
391
|
+
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
|
|
392
|
+
"spark.sql.execution.arrow.pyspark.enabled": "true",
|
|
393
|
+
|
|
394
|
+
# Compression
|
|
395
|
+
"spark.io.compression.codec": "lz4",
|
|
396
|
+
"spark.shuffle.compress": "true",
|
|
397
|
+
|
|
398
|
+
# Broadcast
|
|
399
|
+
"spark.sql.autoBroadcastJoinThreshold": "50MB",
|
|
400
|
+
|
|
401
|
+
# File handling
|
|
402
|
+
"spark.sql.files.maxPartitionBytes": "128MB",
|
|
403
|
+
"spark.sql.files.openCostInBytes": "4MB",
|
|
404
|
+
}
|
|
405
|
+
```
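
A dictionary like the template above still has to be applied to the session builder. A minimal sketch, assuming the builder exposes the usual `config(key, value)` method that returns the builder. `apply_configs` is a hypothetical helper, and the two settings shown are only sample values.

```python
def apply_configs(builder, configs):
    """Apply each key/value pair through builder.config() and return the builder."""
    for key, value in configs.items():
        builder = builder.config(key, value)
    return builder


sample_configs = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.shuffle.partitions": "200",
}

# Usage sketch (requires pyspark to be installed):
# from pyspark.sql import SparkSession
# spark = apply_configs(SparkSession.builder.appName("job"), sample_configs).getOrCreate()
```

Applying settings on the builder matters for launch-time options such as executor memory, which have no effect once the session already exists.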
|
|
406
|
+
|
|
407
|
+
## Best Practices
|
|
408
|
+
|
|
409
|
+
### Do's
|
|
410
|
+
- **Enable AQE** - Adaptive query execution handles many issues
|
|
411
|
+
- **Use Parquet/Delta** - Columnar formats with compression
|
|
412
|
+
- **Broadcast small tables** - Avoid shuffle for small joins
|
|
413
|
+
- **Monitor Spark UI** - Check for skew, spills, GC
|
|
414
|
+
- **Right-size partitions** - 128MB - 256MB per partition
|
|
415
|
+
|
|
416
|
+
### Don'ts
|
|
417
|
+
- **Don't collect large data** - Keep data distributed
|
|
418
|
+
- **Don't use UDFs unnecessarily** - Use built-in functions
|
|
419
|
+
- **Don't over-cache** - Memory is limited
|
|
420
|
+
- **Don't ignore data skew** - It dominates job time
|
|
421
|
+
- **Don't use `.count()` for existence** - Use `.take(1)` or `.isEmpty()`
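
The existence check in the last item can be sketched as below. `has_rows` is a hypothetical helper that relies only on `DataFrame.take(1)`, which returns a list holding at most one row, so Spark can stop after finding the first row instead of counting everything.

```python
def has_rows(df) -> bool:
    """Return True if df has at least one row, without a full count."""
    return len(df.take(1)) > 0


# Usage sketch (assumes an existing DataFrame named df):
# if has_rows(df.filter("status = 'failed'")):
#     print("found failed records")
```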
|
|
422
|
+
|
|
423
|
+
## Resources
|
|
424
|
+
|
|
425
|
+
- [Spark Performance Tuning](https://spark.apache.org/docs/latest/sql-performance-tuning.html)
|
|
426
|
+
- [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html)
|
|
427
|
+
- [Databricks Optimization Guide](https://docs.databricks.com/en/optimizations/index.html)
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: sql-optimization-patterns
|
|
3
|
+
description: Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# SQL Optimization Patterns
|
|
7
|
+
|
|
8
|
+
Transform slow database queries into lightning-fast operations through systematic optimization, proper indexing, and query plan analysis.
|
|
9
|
+
|
|
10
|
+
## Use this skill when
|
|
11
|
+
|
|
12
|
+
- Debugging slow-running queries
|
|
13
|
+
- Designing performant database schemas
|
|
14
|
+
- Optimizing application response times
|
|
15
|
+
- Reducing database load and costs
|
|
16
|
+
- Improving scalability for growing datasets
|
|
17
|
+
- Analyzing EXPLAIN query plans
|
|
18
|
+
- Implementing efficient indexes
|
|
19
|
+
- Resolving N+1 query problems
|
|
20
|
+
|
|
21
|
+
## Do not use this skill when
|
|
22
|
+
|
|
23
|
+
- The task is unrelated to SQL optimization patterns
|
|
24
|
+
- You need a different domain or tool outside this scope
|
|
25
|
+
|
|
26
|
+
## Instructions
|
|
27
|
+
|
|
28
|
+
- Clarify goals, constraints, and required inputs.
|
|
29
|
+
- Apply relevant best practices and validate outcomes.
|
|
30
|
+
- Provide actionable steps and verification.
|
|
31
|
+
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
|
32
|
+
|
|
33
|
+
## Resources
|
|
34
|
+
|
|
35
|
+
- `resources/implementation-playbook.md` for detailed patterns and examples.
|