aigroup-workflow 2.2.1 → 2.2.2
- package/.claude/commands/fix-build.md +10 -5
- package/.claude/commands/init-project.md +13 -8
- package/.claude/commands/plan.md +15 -8
- package/.claude/commands/review.md +12 -6
- package/.claude/commands/tdd.md +11 -5
- package/.claude/commands/workflow-start.md +20 -11
- package/.claude/settings.json +28 -0
- package/.codex/agents/architect.toml +207 -0
- package/.codex/agents/build-error-resolver.toml +110 -0
- package/.codex/agents/code-reviewer.toml +233 -0
- package/.codex/agents/doc-updater.toml +103 -0
- package/.codex/agents/e2e-runner.toml +103 -0
- package/.codex/agents/get-current-datetime.toml +23 -0
- package/.codex/agents/init-architect.toml +181 -0
- package/.codex/agents/planner.toml +208 -0
- package/.codex/agents/refactor-cleaner.toml +81 -0
- package/.codex/agents/rust-reviewer.toml +90 -0
- package/.codex/agents/security-reviewer.toml +104 -0
- package/.codex/agents/tdd-guide.toml +87 -0
- package/AGENTS.md +2 -2
- package/CLAUDE.md +23 -1
- package/LICENSE +20 -20
- package/README.md +333 -333
- package/agents/a11y-architect.md +141 -141
- package/agents/architect.md +211 -211
- package/agents/build-error-resolver.md +114 -114
- package/agents/chief-of-staff.md +151 -151
- package/agents/code-architect.md +71 -71
- package/agents/code-explorer.md +69 -69
- package/agents/code-reviewer.md +237 -237
- package/agents/code-simplifier.md +47 -47
- package/agents/comment-analyzer.md +45 -45
- package/agents/conversation-analyzer.md +52 -52
- package/agents/cpp-build-resolver.md +90 -90
- package/agents/cpp-reviewer.md +72 -72
- package/agents/csharp-reviewer.md +101 -101
- package/agents/dart-build-resolver.md +201 -201
- package/agents/database-reviewer.md +91 -91
- package/agents/doc-updater.md +107 -107
- package/agents/docs-lookup.md +68 -68
- package/agents/e2e-runner.md +107 -107
- package/agents/flutter-reviewer.md +243 -243
- package/agents/gan-evaluator.md +209 -209
- package/agents/gan-generator.md +131 -131
- package/agents/gan-planner.md +99 -99
- package/agents/get-current-datetime.md +26 -26
- package/agents/go-build-resolver.md +94 -94
- package/agents/go-reviewer.md +76 -76
- package/agents/harness-optimizer.md +35 -35
- package/agents/healthcare-reviewer.md +83 -83
- package/agents/java-build-resolver.md +153 -153
- package/agents/java-reviewer.md +92 -92
- package/agents/kotlin-build-resolver.md +118 -118
- package/agents/kotlin-reviewer.md +159 -159
- package/agents/loop-operator.md +36 -36
- package/agents/opensource-forker.md +198 -198
- package/agents/opensource-packager.md +249 -249
- package/agents/opensource-sanitizer.md +188 -188
- package/agents/performance-optimizer.md +446 -446
- package/agents/planner.md +212 -212
- package/agents/pr-test-analyzer.md +45 -45
- package/agents/python-reviewer.md +98 -98
- package/agents/pytorch-build-resolver.md +120 -120
- package/agents/refactor-cleaner.md +85 -85
- package/agents/rust-build-resolver.md +148 -148
- package/agents/rust-reviewer.md +94 -94
- package/agents/security-reviewer.md +108 -108
- package/agents/seo-specialist.md +59 -59
- package/agents/silent-failure-hunter.md +50 -50
- package/agents/tdd-guide.md +91 -91
- package/agents/type-design-analyzer.md +41 -41
- package/agents/typescript-reviewer.md +112 -112
- package/cli/commands/update.mjs +1 -1
- package/cli/utils/scaffold.mjs +53 -0
- package/docs/rules/agents.md +166 -50
- package/docs/rules/cpp/coding-style.md +44 -44
- package/docs/rules/cpp/hooks.md +39 -39
- package/docs/rules/cpp/patterns.md +51 -51
- package/docs/rules/cpp/security.md +51 -51
- package/docs/rules/cpp/testing.md +44 -44
- package/docs/rules/csharp/coding-style.md +72 -72
- package/docs/rules/csharp/hooks.md +25 -25
- package/docs/rules/csharp/patterns.md +50 -50
- package/docs/rules/csharp/security.md +58 -58
- package/docs/rules/csharp/testing.md +46 -46
- package/docs/rules/dart/coding-style.md +159 -159
- package/docs/rules/dart/hooks.md +66 -66
- package/docs/rules/dart/patterns.md +261 -261
- package/docs/rules/dart/security.md +135 -135
- package/docs/rules/dart/testing.md +215 -215
- package/docs/rules/golang/coding-style.md +32 -32
- package/docs/rules/golang/hooks.md +17 -17
- package/docs/rules/golang/patterns.md +45 -45
- package/docs/rules/golang/security.md +34 -34
- package/docs/rules/golang/testing.md +31 -31
- package/docs/rules/java/coding-style.md +114 -114
- package/docs/rules/java/hooks.md +18 -18
- package/docs/rules/java/patterns.md +146 -146
- package/docs/rules/java/security.md +100 -100
- package/docs/rules/java/testing.md +131 -131
- package/docs/rules/kotlin/coding-style.md +86 -86
- package/docs/rules/kotlin/hooks.md +17 -17
- package/docs/rules/kotlin/patterns.md +146 -146
- package/docs/rules/kotlin/security.md +82 -82
- package/docs/rules/kotlin/testing.md +128 -128
- package/docs/rules/perl/coding-style.md +46 -46
- package/docs/rules/perl/hooks.md +22 -22
- package/docs/rules/perl/patterns.md +76 -76
- package/docs/rules/perl/security.md +69 -69
- package/docs/rules/perl/testing.md +54 -54
- package/docs/rules/php/coding-style.md +40 -40
- package/docs/rules/php/hooks.md +24 -24
- package/docs/rules/php/patterns.md +33 -33
- package/docs/rules/php/security.md +37 -37
- package/docs/rules/php/testing.md +39 -39
- package/docs/rules/python/coding-style.md +42 -42
- package/docs/rules/python/hooks.md +19 -19
- package/docs/rules/python/patterns.md +39 -39
- package/docs/rules/python/security.md +30 -30
- package/docs/rules/python/testing.md +38 -38
- package/docs/rules/rust/coding-style.md +151 -151
- package/docs/rules/rust/hooks.md +16 -16
- package/docs/rules/rust/patterns.md +168 -168
- package/docs/rules/rust/security.md +141 -141
- package/docs/rules/rust/testing.md +154 -154
- package/docs/rules/swift/coding-style.md +47 -47
- package/docs/rules/swift/hooks.md +20 -20
- package/docs/rules/swift/patterns.md +66 -66
- package/docs/rules/swift/security.md +33 -33
- package/docs/rules/swift/testing.md +45 -45
- package/docs/rules/typescript/coding-style.md +199 -199
- package/docs/rules/typescript/hooks.md +22 -22
- package/docs/rules/typescript/patterns.md +52 -52
- package/docs/rules/typescript/security.md +28 -28
- package/docs/rules/typescript/testing.md +18 -18
- package/docs/rules/web/coding-style.md +96 -96
- package/docs/rules/web/design-quality.md +62 -62
- package/docs/rules/web/hooks.md +120 -120
- package/docs/rules/web/patterns.md +79 -79
- package/docs/rules/web/performance.md +64 -64
- package/docs/rules/web/security.md +57 -57
- package/docs/rules/web/testing.md +55 -55
- package/docs/templates/README.md +36 -36
- package/docs/templates/ai-project-final.md +124 -124
- package/docs/templates/ai-project.md +105 -105
- package/docs/templates/api.md +157 -157
- package/docs/templates/bug.md +62 -62
- package/docs/templates/code-review.md +87 -87
- package/docs/templates/generic.md +116 -116
- package/docs/templates/implementation-plan.md +1 -1
- package/docs/templates/meeting.md +68 -68
- package/docs/templates/prd.md +98 -98
- package/docs/templates/ui.md +134 -134
- package/docs/workflow-pipeline.md +5 -5
- package/package.json +40 -39
- package/skills/SUPERPOWERS-LICENSE +21 -21
- package/skills/ai-ml/fine-tuning-expert/SKILL.md +162 -162
- package/skills/ai-ml/fine-tuning-expert/references/dataset-preparation.md +540 -540
- package/skills/ai-ml/fine-tuning-expert/references/deployment-optimization.md +673 -673
- package/skills/ai-ml/fine-tuning-expert/references/evaluation-metrics.md +597 -597
- package/skills/ai-ml/fine-tuning-expert/references/hyperparameter-tuning.md +565 -565
- package/skills/ai-ml/fine-tuning-expert/references/lora-peft.md +347 -347
- package/skills/ai-ml/ml-pipeline/SKILL.md +159 -159
- package/skills/ai-ml/ml-pipeline/references/experiment-tracking.md +833 -833
- package/skills/ai-ml/ml-pipeline/references/feature-engineering.md +631 -631
- package/skills/ai-ml/ml-pipeline/references/model-validation.md +978 -978
- package/skills/ai-ml/ml-pipeline/references/pipeline-orchestration.md +907 -907
- package/skills/ai-ml/ml-pipeline/references/training-pipelines.md +782 -782
- package/skills/ai-ml/rag-architect/SKILL.md +194 -194
- package/skills/ai-ml/rag-architect/references/chunking-strategies.md +878 -878
- package/skills/ai-ml/rag-architect/references/embedding-models.md +561 -561
- package/skills/ai-ml/rag-architect/references/rag-evaluation.md +833 -833
- package/skills/ai-ml/rag-architect/references/retrieval-optimization.md +795 -795
- package/skills/ai-ml/rag-architect/references/vector-databases.md +589 -589
- package/skills/ai-ml/spark-engineer/SKILL.md +148 -148
- package/skills/ai-ml/spark-engineer/references/partitioning-caching.md +543 -543
- package/skills/ai-ml/spark-engineer/references/performance-tuning.md +544 -544
- package/skills/ai-ml/spark-engineer/references/rdd-operations.md +599 -599
- package/skills/ai-ml/spark-engineer/references/spark-sql-dataframes.md +474 -474
- package/skills/ai-ml/spark-engineer/references/streaming-patterns.md +786 -786
- package/skills/backend/api-designer/SKILL.md +217 -217
- package/skills/backend/api-designer/references/error-handling.md +541 -541
- package/skills/backend/api-designer/references/openapi.md +824 -824
- package/skills/backend/api-designer/references/pagination.md +494 -494
- package/skills/backend/api-designer/references/rest-patterns.md +335 -335
- package/skills/backend/api-designer/references/versioning.md +391 -391
- package/skills/backend/architecture-designer/SKILL.md +117 -117
- package/skills/backend/architecture-designer/references/adr-template.md +116 -116
- package/skills/backend/architecture-designer/references/architecture-patterns.md +111 -111
- package/skills/backend/architecture-designer/references/database-selection.md +102 -102
- package/skills/backend/architecture-designer/references/nfr-checklist.md +112 -112
- package/skills/backend/architecture-designer/references/system-design.md +100 -100
- package/skills/backend/code-documenter/SKILL.md +147 -147
- package/skills/backend/code-documenter/references/api-docs-fastapi-django.md +166 -166
- package/skills/backend/code-documenter/references/api-docs-nestjs-express.md +220 -220
- package/skills/backend/code-documenter/references/coverage-reports.md +125 -125
- package/skills/backend/code-documenter/references/documentation-systems.md +333 -333
- package/skills/backend/code-documenter/references/interactive-api-docs.md +531 -531
- package/skills/backend/code-documenter/references/python-docstrings.md +121 -121
- package/skills/backend/code-documenter/references/typescript-jsdoc.md +145 -145
- package/skills/backend/code-documenter/references/user-guides-tutorials.md +530 -530
- package/skills/backend/debugging-wizard/SKILL.md +105 -105
- package/skills/backend/debugging-wizard/references/common-patterns.md +132 -132
- package/skills/backend/debugging-wizard/references/debugging-tools.md +140 -140
- package/skills/backend/debugging-wizard/references/quick-fixes.md +177 -177
- package/skills/backend/debugging-wizard/references/strategies.md +142 -142
- package/skills/backend/debugging-wizard/references/systematic-debugging.md +367 -367
- package/skills/backend/feature-forge/SKILL.md +98 -98
- package/skills/backend/feature-forge/references/acceptance-criteria.md +104 -104
- package/skills/backend/feature-forge/references/ears-syntax.md +99 -99
- package/skills/backend/feature-forge/references/interview-questions.md +150 -150
- package/skills/backend/feature-forge/references/pre-discovery-subagents.md +54 -54
- package/skills/backend/feature-forge/references/specification-template.md +103 -103
- package/skills/backend/fullstack-guardian/SKILL.md +105 -105
- package/skills/backend/fullstack-guardian/references/api-design-standards.md +307 -307
- package/skills/backend/fullstack-guardian/references/architecture-decisions.md +350 -350
- package/skills/backend/fullstack-guardian/references/backend-patterns.md +237 -237
- package/skills/backend/fullstack-guardian/references/common-patterns.md +134 -134
- package/skills/backend/fullstack-guardian/references/deliverables-checklist.md +354 -354
- package/skills/backend/fullstack-guardian/references/design-template.md +91 -91
- package/skills/backend/fullstack-guardian/references/error-handling.md +135 -135
- package/skills/backend/fullstack-guardian/references/frontend-patterns.md +340 -340
- package/skills/backend/fullstack-guardian/references/integration-patterns.md +333 -333
- package/skills/backend/fullstack-guardian/references/security-checklist.md +106 -106
- package/skills/backend/graphql-architect/SKILL.md +146 -146
- package/skills/backend/graphql-architect/references/federation.md +418 -418
- package/skills/backend/graphql-architect/references/migration-from-rest.md +1141 -1141
- package/skills/backend/graphql-architect/references/resolvers.md +425 -425
- package/skills/backend/graphql-architect/references/schema-design.md +393 -393
- package/skills/backend/graphql-architect/references/security.md +569 -569
- package/skills/backend/graphql-architect/references/subscriptions.md +510 -510
- package/skills/backend/legacy-modernizer/SKILL.md +137 -137
- package/skills/backend/legacy-modernizer/references/legacy-testing.md +381 -381
- package/skills/backend/legacy-modernizer/references/migration-strategies.md +423 -423
- package/skills/backend/legacy-modernizer/references/refactoring-patterns.md +395 -395
- package/skills/backend/legacy-modernizer/references/strangler-fig-pattern.md +281 -281
- package/skills/backend/legacy-modernizer/references/system-assessment.md +487 -487
- package/skills/backend/microservices-architect/SKILL.md +164 -164
- package/skills/backend/microservices-architect/references/communication.md +499 -499
- package/skills/backend/microservices-architect/references/data.md +721 -721
- package/skills/backend/microservices-architect/references/decomposition.md +344 -344
- package/skills/backend/microservices-architect/references/observability.md +805 -805
- package/skills/backend/microservices-architect/references/patterns.md +603 -603
- package/skills/database/database-optimizer/SKILL.md +147 -147
- package/skills/database/database-optimizer/references/index-strategies.md +331 -331
- package/skills/database/database-optimizer/references/monitoring-analysis.md +501 -501
- package/skills/database/database-optimizer/references/mysql-tuning.md +452 -452
- package/skills/database/database-optimizer/references/postgresql-tuning.md +413 -413
- package/skills/database/database-optimizer/references/query-optimization.md +251 -251
- package/skills/database/postgres-pro/SKILL.md +152 -152
- package/skills/database/postgres-pro/references/extensions.md +404 -404
- package/skills/database/postgres-pro/references/jsonb.md +321 -321
- package/skills/database/postgres-pro/references/maintenance.md +481 -481
- package/skills/database/postgres-pro/references/performance.md +265 -265
- package/skills/database/postgres-pro/references/replication.md +446 -446
- package/skills/database/sql-pro/SKILL.md +129 -129
- package/skills/database/sql-pro/references/database-design.md +402 -402
- package/skills/database/sql-pro/references/dialect-differences.md +419 -419
- package/skills/database/sql-pro/references/optimization.md +384 -384
- package/skills/database/sql-pro/references/query-patterns.md +285 -285
- package/skills/database/sql-pro/references/window-functions.md +328 -328
- package/skills/dotnet/csharp-developer/SKILL.md +125 -125
- package/skills/dotnet/csharp-developer/references/aspnet-core.md +394 -394
- package/skills/dotnet/csharp-developer/references/blazor.md +553 -553
- package/skills/dotnet/csharp-developer/references/entity-framework.md +409 -409
- package/skills/dotnet/csharp-developer/references/modern-csharp.md +248 -248
- package/skills/dotnet/csharp-developer/references/performance.md +498 -498
- package/skills/dotnet/dotnet-core-expert/SKILL.md +138 -138
- package/skills/dotnet/dotnet-core-expert/references/authentication.md +546 -546
- package/skills/dotnet/dotnet-core-expert/references/clean-architecture.md +455 -455
- package/skills/dotnet/dotnet-core-expert/references/cloud-native.md +548 -548
- package/skills/dotnet/dotnet-core-expert/references/entity-framework.md +440 -440
- package/skills/dotnet/dotnet-core-expert/references/minimal-apis.md +319 -319
- package/skills/frontend/angular-architect/SKILL.md +152 -152
- package/skills/frontend/angular-architect/references/components.md +297 -297
- package/skills/frontend/angular-architect/references/ngrx.md +401 -401
- package/skills/frontend/angular-architect/references/routing.md +361 -361
- package/skills/frontend/angular-architect/references/rxjs.md +319 -319
- package/skills/frontend/angular-architect/references/testing.md +405 -405
- package/skills/frontend/design-commands/design.md +91 -91
- package/skills/frontend/design-commands/handoff.md +97 -97
- package/skills/frontend/design-commands/prototype.md +120 -120
- package/skills/frontend/design-commands/spec.md +160 -160
- package/skills/frontend/design-commands/style.md +78 -78
- package/skills/frontend/flutter-expert/SKILL.md +138 -138
- package/skills/frontend/flutter-expert/references/bloc-state.md +259 -259
- package/skills/frontend/flutter-expert/references/gorouter-navigation.md +119 -119
- package/skills/frontend/flutter-expert/references/performance.md +99 -99
- package/skills/frontend/flutter-expert/references/project-structure.md +118 -118
- package/skills/frontend/flutter-expert/references/riverpod-state.md +130 -130
- package/skills/frontend/flutter-expert/references/widget-patterns.md +123 -123
- package/skills/frontend/nextjs-developer/SKILL.md +143 -143
- package/skills/frontend/nextjs-developer/references/app-router.md +311 -311
- package/skills/frontend/nextjs-developer/references/data-fetching.md +482 -482
- package/skills/frontend/nextjs-developer/references/deployment.md +545 -545
- package/skills/frontend/nextjs-developer/references/server-actions.md +462 -462
- package/skills/frontend/nextjs-developer/references/server-components.md +384 -384
- package/skills/frontend/react-expert/SKILL.md +149 -149
- package/skills/frontend/react-expert/references/hooks-patterns.md +162 -162
- package/skills/frontend/react-expert/references/migration-class-to-modern.md +1119 -1119
- package/skills/frontend/react-expert/references/performance.md +168 -168
- package/skills/frontend/react-expert/references/react-19-features.md +174 -174
- package/skills/frontend/react-expert/references/server-components.md +143 -143
- package/skills/frontend/react-expert/references/state-management.md +171 -171
- package/skills/frontend/react-expert/references/testing-react.md +174 -174
- package/skills/frontend/react-native-expert/SKILL.md +185 -185
- package/skills/frontend/react-native-expert/references/expo-router.md +187 -187
- package/skills/frontend/react-native-expert/references/list-optimization.md +204 -204
- package/skills/frontend/react-native-expert/references/platform-handling.md +188 -188
- package/skills/frontend/react-native-expert/references/project-structure.md +171 -171
- package/skills/frontend/react-native-expert/references/storage-hooks.md +173 -173
- package/skills/frontend/senior-frontend/SKILL.md +477 -477
- package/skills/frontend/senior-frontend/references/frontend_best_practices.md +806 -806
- package/skills/frontend/senior-frontend/references/nextjs_optimization_guide.md +724 -724
- package/skills/frontend/senior-frontend/references/react_patterns.md +746 -746
- package/skills/frontend/senior-frontend/scripts/bundle_analyzer.py +407 -407
- package/skills/frontend/senior-frontend/scripts/component_generator.py +329 -329
- package/skills/frontend/senior-frontend/scripts/frontend_scaffolder.py +1005 -1005
- package/skills/frontend/ui-ux-pro-max/SKILL.md +386 -386
- package/skills/frontend/ui-ux-pro-max/data/charts.csv +26 -26
- package/skills/frontend/ui-ux-pro-max/data/colors.csv +97 -97
- package/skills/frontend/ui-ux-pro-max/data/icons.csv +101 -101
- package/skills/frontend/ui-ux-pro-max/data/landing.csv +31 -31
- package/skills/frontend/ui-ux-pro-max/data/products.csv +96 -96
- package/skills/frontend/ui-ux-pro-max/data/react-performance.csv +45 -45
- package/skills/frontend/ui-ux-pro-max/data/stacks/astro.csv +54 -54
- package/skills/frontend/ui-ux-pro-max/data/stacks/flutter.csv +53 -53
- package/skills/frontend/ui-ux-pro-max/data/stacks/html-tailwind.csv +56 -56
- package/skills/frontend/ui-ux-pro-max/data/stacks/jetpack-compose.csv +53 -53
- package/skills/frontend/ui-ux-pro-max/data/stacks/nextjs.csv +53 -53
- package/skills/frontend/ui-ux-pro-max/data/stacks/nuxt-ui.csv +51 -51
- package/skills/frontend/ui-ux-pro-max/data/stacks/nuxtjs.csv +59 -59
- package/skills/frontend/ui-ux-pro-max/data/stacks/react-native.csv +52 -52
- package/skills/frontend/ui-ux-pro-max/data/stacks/react.csv +54 -54
- package/skills/frontend/ui-ux-pro-max/data/stacks/shadcn.csv +61 -61
- package/skills/frontend/ui-ux-pro-max/data/stacks/svelte.csv +54 -54
- package/skills/frontend/ui-ux-pro-max/data/stacks/swiftui.csv +51 -51
- package/skills/frontend/ui-ux-pro-max/data/stacks/vue.csv +50 -50
- package/skills/frontend/ui-ux-pro-max/data/styles.csv +68 -68
- package/skills/frontend/ui-ux-pro-max/data/typography.csv +57 -57
- package/skills/frontend/ui-ux-pro-max/data/ui-reasoning.csv +101 -101
- package/skills/frontend/ui-ux-pro-max/data/ux-guidelines.csv +99 -99
- package/skills/frontend/ui-ux-pro-max/data/web-interface.csv +31 -31
- package/skills/frontend/ui-ux-pro-max/scripts/core.py +253 -253
- package/skills/frontend/ui-ux-pro-max/scripts/design_system.py +1067 -1067
- package/skills/frontend/ui-ux-pro-max/scripts/search.py +114 -114
- package/skills/frontend/vue-expert/SKILL.md +98 -98
- package/skills/frontend/vue-expert/references/build-tooling.md +480 -480
- package/skills/frontend/vue-expert/references/components.md +448 -448
- package/skills/frontend/vue-expert/references/composition-api.md +299 -299
- package/skills/frontend/vue-expert/references/mobile-hybrid.md +636 -636
- package/skills/frontend/vue-expert/references/nuxt.md +669 -669
- package/skills/frontend/vue-expert/references/state-management.md +449 -449
- package/skills/frontend/vue-expert/references/typescript.md +584 -584
- package/skills/frontend/vue-expert-js/SKILL.md +167 -167
- package/skills/frontend/vue-expert-js/references/component-architecture.md +219 -219
- package/skills/frontend/vue-expert-js/references/composables-patterns.md +183 -183
- package/skills/frontend/vue-expert-js/references/jsdoc-typing.md +535 -535
- package/skills/frontend/vue-expert-js/references/state-management.md +249 -249
- package/skills/frontend/vue-expert-js/references/testing-patterns.md +237 -237
- package/skills/go-rust-cpp/cpp-pro/SKILL.md +115 -115
- package/skills/go-rust-cpp/cpp-pro/references/build-tooling.md +440 -440
- package/skills/go-rust-cpp/cpp-pro/references/concurrency.md +437 -437
- package/skills/go-rust-cpp/cpp-pro/references/memory-performance.md +397 -397
- package/skills/go-rust-cpp/cpp-pro/references/modern-cpp.md +304 -304
- package/skills/go-rust-cpp/cpp-pro/references/templates.md +357 -357
- package/skills/go-rust-cpp/golang-pro/SKILL.md +122 -122
- package/skills/go-rust-cpp/golang-pro/references/concurrency.md +329 -329
- package/skills/go-rust-cpp/golang-pro/references/generics.md +442 -442
- package/skills/go-rust-cpp/golang-pro/references/interfaces.md +432 -432
- package/skills/go-rust-cpp/golang-pro/references/project-structure.md +477 -477
- package/skills/go-rust-cpp/golang-pro/references/testing.md +451 -451
- package/skills/go-rust-cpp/rust-engineer/SKILL.md +167 -167
- package/skills/go-rust-cpp/rust-engineer/references/async.md +458 -458
- package/skills/go-rust-cpp/rust-engineer/references/error-handling.md +334 -334
- package/skills/go-rust-cpp/rust-engineer/references/ownership.md +278 -278
- package/skills/go-rust-cpp/rust-engineer/references/testing.md +470 -470
- package/skills/go-rust-cpp/rust-engineer/references/traits.md +413 -413
- package/skills/infra/cli-developer/SKILL.md +113 -113
- package/skills/infra/cli-developer/references/design-patterns.md +221 -221
- package/skills/infra/cli-developer/references/go-cli.md +540 -540
- package/skills/infra/cli-developer/references/node-cli.md +383 -383
- package/skills/infra/cli-developer/references/python-cli.md +422 -422
- package/skills/infra/cli-developer/references/ux-patterns.md +448 -448
- package/skills/infra/cloud-architect/SKILL.md +216 -216
- package/skills/infra/cloud-architect/references/aws.md +394 -394
- package/skills/infra/cloud-architect/references/azure.md +562 -562
- package/skills/infra/cloud-architect/references/cost.md +582 -582
- package/skills/infra/cloud-architect/references/gcp.md +633 -633
- package/skills/infra/cloud-architect/references/multi-cloud.md +483 -483
- package/skills/infra/devops-engineer/SKILL.md +144 -144
- package/skills/infra/devops-engineer/references/deployment-strategies.md +241 -241
- package/skills/infra/devops-engineer/references/docker-patterns.md +113 -113
- package/skills/infra/devops-engineer/references/github-actions.md +139 -139
- package/skills/infra/devops-engineer/references/incident-response.md +331 -331
- package/skills/infra/devops-engineer/references/kubernetes.md +154 -154
- package/skills/infra/devops-engineer/references/platform-engineering.md +417 -417
- package/skills/infra/devops-engineer/references/release-automation.md +527 -527
- package/skills/infra/devops-engineer/references/terraform-iac.md +141 -141
- package/skills/infra/kubernetes-specialist/SKILL.md +241 -241
- package/skills/infra/kubernetes-specialist/references/configuration.md +452 -452
- package/skills/infra/kubernetes-specialist/references/cost-optimization.md +458 -458
- package/skills/infra/kubernetes-specialist/references/custom-operators.md +563 -563
- package/skills/infra/kubernetes-specialist/references/gitops.md +530 -530
- package/skills/infra/kubernetes-specialist/references/helm-charts.md +912 -912
- package/skills/infra/kubernetes-specialist/references/multi-cluster.md +507 -507
- package/skills/infra/kubernetes-specialist/references/networking.md +447 -447
- package/skills/infra/kubernetes-specialist/references/service-mesh.md +459 -459
- package/skills/infra/kubernetes-specialist/references/storage.md +535 -535
- package/skills/infra/kubernetes-specialist/references/troubleshooting.md +414 -414
- package/skills/infra/kubernetes-specialist/references/workloads.md +377 -377
- package/skills/infra/mcp-developer/SKILL.md +143 -143
- package/skills/infra/mcp-developer/references/protocol.md +244 -244
- package/skills/infra/mcp-developer/references/python-sdk.md +367 -367
- package/skills/infra/mcp-developer/references/resources.md +554 -554
- package/skills/infra/mcp-developer/references/tools.md +480 -480
- package/skills/infra/mcp-developer/references/typescript-sdk.md +350 -350
- package/skills/infra/monitoring-expert/SKILL.md +176 -176
- package/skills/infra/monitoring-expert/references/alerting-rules.md +141 -141
- package/skills/infra/monitoring-expert/references/application-profiling.md +331 -331
- package/skills/infra/monitoring-expert/references/capacity-planning.md +344 -344
- package/skills/infra/monitoring-expert/references/dashboards.md +126 -126
- package/skills/infra/monitoring-expert/references/opentelemetry.md +123 -123
- package/skills/infra/monitoring-expert/references/performance-testing.md +269 -269
- package/skills/infra/monitoring-expert/references/prometheus-metrics.md +136 -136
- package/skills/infra/monitoring-expert/references/structured-logging.md +142 -142
- package/skills/infra/sre-engineer/SKILL.md +181 -181
- package/skills/infra/sre-engineer/references/automation-toil.md +492 -492
- package/skills/infra/sre-engineer/references/error-budget-policy.md +334 -334
- package/skills/infra/sre-engineer/references/incident-chaos.md +576 -576
- package/skills/infra/sre-engineer/references/monitoring-alerting.md +424 -424
- package/skills/infra/sre-engineer/references/slo-sli-management.md +238 -238
- package/skills/infra/terraform-engineer/SKILL.md +143 -143
- package/skills/infra/terraform-engineer/references/best-practices.md +583 -583
- package/skills/infra/terraform-engineer/references/module-patterns.md +297 -297
- package/skills/infra/terraform-engineer/references/providers.md +452 -452
- package/skills/infra/terraform-engineer/references/state-management.md +371 -371
- package/skills/infra/terraform-engineer/references/testing.md +486 -486
- package/skills/infra/websocket-engineer/SKILL.md +168 -168
- package/skills/infra/websocket-engineer/references/alternatives.md +391 -391
- package/skills/infra/websocket-engineer/references/patterns.md +400 -400
- package/skills/infra/websocket-engineer/references/protocol.md +195 -195
- package/skills/infra/websocket-engineer/references/scaling.md +333 -333
- package/skills/infra/websocket-engineer/references/security.md +474 -474
- package/skills/java/java-architect/SKILL.md +132 -132
- package/skills/java/java-architect/references/jpa-optimization.md +393 -393
- package/skills/java/java-architect/references/reactive-webflux.md +356 -356
- package/skills/java/java-architect/references/spring-boot-setup.md +269 -269
- package/skills/java/java-architect/references/spring-security.md +445 -445
- package/skills/java/java-architect/references/testing-patterns.md +500 -500
- package/skills/java/kotlin-specialist/SKILL.md +147 -147
- package/skills/java/kotlin-specialist/references/android-compose.md +419 -419
- package/skills/java/kotlin-specialist/references/coroutines-flow.md +276 -276
- package/skills/java/kotlin-specialist/references/dsl-idioms.md +421 -421
- package/skills/java/kotlin-specialist/references/ktor-server.md +426 -426
- package/skills/java/kotlin-specialist/references/multiplatform-kmp.md +380 -380
- package/skills/java/spring-boot-engineer/SKILL.md +195 -195
- package/skills/java/spring-boot-engineer/references/cloud.md +498 -498
- package/skills/java/spring-boot-engineer/references/data.md +381 -381
- package/skills/java/spring-boot-engineer/references/security.md +459 -459
- package/skills/java/spring-boot-engineer/references/testing.md +545 -545
- package/skills/java/spring-boot-engineer/references/web.md +295 -295
- package/skills/javascript/javascript-pro/SKILL.md +132 -132
- package/skills/javascript/javascript-pro/references/async-patterns.md +334 -334
- package/skills/javascript/javascript-pro/references/browser-apis.md +398 -398
- package/skills/javascript/javascript-pro/references/modern-syntax.md +272 -272
- package/skills/javascript/javascript-pro/references/modules.md +357 -357
- package/skills/javascript/javascript-pro/references/node-essentials.md +471 -471
- package/skills/javascript/nestjs-expert/SKILL.md +206 -206
- package/skills/javascript/nestjs-expert/references/authentication.md +166 -166
- package/skills/javascript/nestjs-expert/references/controllers-routing.md +111 -111
- package/skills/javascript/nestjs-expert/references/dtos-validation.md +153 -153
- package/skills/javascript/nestjs-expert/references/migration-from-express.md +1237 -1237
- package/skills/javascript/nestjs-expert/references/services-di.md +140 -140
- package/skills/javascript/nestjs-expert/references/testing-patterns.md +186 -186
- package/skills/javascript/typescript-pro/SKILL.md +145 -145
- package/skills/javascript/typescript-pro/references/advanced-types.md +259 -259
- package/skills/javascript/typescript-pro/references/configuration.md +445 -445
- package/skills/javascript/typescript-pro/references/patterns.md +484 -484
- package/skills/javascript/typescript-pro/references/type-guards.md +352 -352
- package/skills/javascript/typescript-pro/references/utility-types.md +329 -329
- package/skills/php/laravel-specialist/SKILL.md +262 -262
- package/skills/php/laravel-specialist/references/eloquent.md +351 -351
- package/skills/php/laravel-specialist/references/livewire.md +512 -512
- package/skills/php/laravel-specialist/references/queues.md +423 -423
- package/skills/php/laravel-specialist/references/routing.md +362 -362
- package/skills/php/laravel-specialist/references/testing.md +522 -522
- package/skills/php/php-pro/SKILL.md +206 -206
- package/skills/php/php-pro/references/async-patterns.md +412 -412
- package/skills/php/php-pro/references/laravel-patterns.md +377 -377
- package/skills/php/php-pro/references/modern-php-features.md +323 -323
- package/skills/php/php-pro/references/symfony-patterns.md +466 -466
- package/skills/php/php-pro/references/testing-quality.md +466 -466
- package/skills/product/competitive-analysis/SKILL.md +257 -257
- package/skills/product/meeting-notes/SKILL.md +266 -266
- package/skills/product/prd-template/SKILL.md +150 -150
- package/skills/product/stakeholder-update/SKILL.md +225 -225
- package/skills/product/user-research-synthesis/SKILL.md +235 -235
- package/skills/python/django-expert/SKILL.md +162 -162
- package/skills/python/django-expert/references/authentication.md +145 -145
- package/skills/python/django-expert/references/drf-serializers.md +148 -148
- package/skills/python/django-expert/references/models-orm.md +151 -151
- package/skills/python/django-expert/references/testing-django.md +204 -204
- package/skills/python/django-expert/references/viewsets-views.md +153 -153
- package/skills/python/fastapi-expert/SKILL.md +185 -185
- package/skills/python/fastapi-expert/references/async-sqlalchemy.md +146 -146
- package/skills/python/fastapi-expert/references/authentication.md +159 -159
- package/skills/python/fastapi-expert/references/endpoints-routing.md +142 -142
- package/skills/python/fastapi-expert/references/migration-from-django.md +996 -996
- package/skills/python/fastapi-expert/references/pydantic-v2.md +135 -135
- package/skills/python/fastapi-expert/references/testing-async.md +159 -159
- package/skills/python/pandas-pro/SKILL.md +178 -178
- package/skills/python/pandas-pro/references/aggregation-groupby.md +545 -545
- package/skills/python/pandas-pro/references/data-cleaning.md +500 -500
- package/skills/python/pandas-pro/references/dataframe-operations.md +420 -420
- package/skills/python/pandas-pro/references/merging-joining.md +596 -596
- package/skills/python/pandas-pro/references/performance-optimization.md +597 -597
- package/skills/python/python-pro/SKILL.md +177 -177
- package/skills/python/python-pro/references/async-patterns.md +356 -356
- package/skills/python/python-pro/references/packaging.md +460 -460
- package/skills/python/python-pro/references/standard-library.md +378 -378
- package/skills/python/python-pro/references/testing.md +404 -404
- package/skills/python/python-pro/references/type-system.md +290 -290
- package/skills/quality/chaos-engineer/SKILL.md +182 -182
- package/skills/quality/chaos-engineer/references/chaos-tools.md +511 -511
- package/skills/quality/chaos-engineer/references/experiment-design.md +229 -229
- package/skills/quality/chaos-engineer/references/game-days.md +434 -434
- package/skills/quality/chaos-engineer/references/infrastructure-chaos.md +348 -348
- package/skills/quality/chaos-engineer/references/kubernetes-chaos.md +432 -432
- package/skills/quality/code-reviewer/SKILL.md +119 -119
- package/skills/quality/code-reviewer/references/common-issues.md +142 -142
- package/skills/quality/code-reviewer/references/feedback-examples.md +144 -144
- package/skills/quality/code-reviewer/references/receiving-feedback.md +238 -238
- package/skills/quality/code-reviewer/references/report-template.md +109 -109
- package/skills/quality/code-reviewer/references/review-checklist.md +88 -88
- package/skills/quality/code-reviewer/references/spec-compliance-review.md +258 -258
- package/skills/quality/playwright-expert/SKILL.md +169 -169
- package/skills/quality/playwright-expert/references/api-mocking.md +140 -140
- package/skills/quality/playwright-expert/references/configuration.md +155 -155
- package/skills/quality/playwright-expert/references/debugging-flaky.md +150 -150
- package/skills/quality/playwright-expert/references/page-object-model.md +152 -152
- package/skills/quality/playwright-expert/references/selectors-locators.md +119 -119
- package/skills/quality/secure-code-guardian/SKILL.md +191 -191
- package/skills/quality/secure-code-guardian/references/authentication.md +136 -136
- package/skills/quality/secure-code-guardian/references/input-validation.md +146 -146
- package/skills/quality/secure-code-guardian/references/owasp-prevention.md +135 -135
- package/skills/quality/secure-code-guardian/references/security-headers.md +133 -133
- package/skills/quality/secure-code-guardian/references/xss-csrf.md +157 -157
- package/skills/quality/security-reviewer/SKILL.md +103 -103
- package/skills/quality/security-reviewer/references/infrastructure-security.md +268 -268
- package/skills/quality/security-reviewer/references/penetration-testing.md +268 -268
- package/skills/quality/security-reviewer/references/report-template.md +170 -170
- package/skills/quality/security-reviewer/references/sast-tools.md +117 -117
- package/skills/quality/security-reviewer/references/secret-scanning.md +125 -125
- package/skills/quality/security-reviewer/references/vulnerability-patterns.md +152 -152
- package/skills/quality/senior-qa/README.md +196 -196
- package/skills/quality/senior-qa/SKILL.md +399 -399
- package/skills/quality/senior-qa/references/qa_best_practices.md +964 -964
- package/skills/quality/senior-qa/references/test_automation_patterns.md +1009 -1009
- package/skills/quality/senior-qa/references/testing_strategies.md +649 -649
- package/skills/quality/senior-qa/scripts/coverage_analyzer.py +836 -836
- package/skills/quality/senior-qa/scripts/e2e_test_scaffolder.py +820 -820
- package/skills/quality/senior-qa/scripts/test_suite_generator.py +605 -605
- package/skills/quality/tdd-guide/HOW_TO_USE.md +313 -313
- package/skills/quality/tdd-guide/README.md +680 -680
- package/skills/quality/tdd-guide/SKILL.md +122 -122
- package/skills/quality/tdd-guide/assets/expected_output.json +77 -77
- package/skills/quality/tdd-guide/assets/sample_input_python.json +39 -39
- package/skills/quality/tdd-guide/assets/sample_input_typescript.json +36 -36
- package/skills/quality/tdd-guide/references/ci-integration.md +195 -195
- package/skills/quality/tdd-guide/references/framework-guide.md +206 -206
- package/skills/quality/tdd-guide/references/tdd-best-practices.md +128 -128
- package/skills/quality/tdd-guide/scripts/coverage_analyzer.py +434 -434
- package/skills/quality/tdd-guide/scripts/fixture_generator.py +440 -440
- package/skills/quality/tdd-guide/scripts/format_detector.py +384 -384
- package/skills/quality/tdd-guide/scripts/framework_adapter.py +428 -428
- package/skills/quality/tdd-guide/scripts/metrics_calculator.py +456 -456
- package/skills/quality/tdd-guide/scripts/output_formatter.py +354 -354
- package/skills/quality/tdd-guide/scripts/tdd_workflow.py +474 -474
- package/skills/quality/tdd-guide/scripts/test_generator.py +438 -438
- package/skills/quality/test-master/SKILL.md +94 -94
- package/skills/quality/test-master/references/automation-frameworks.md +294 -294
- package/skills/quality/test-master/references/e2e-testing.md +128 -128
- package/skills/quality/test-master/references/integration-testing.md +120 -120
- package/skills/quality/test-master/references/performance-testing.md +118 -118
- package/skills/quality/test-master/references/qa-methodology.md +247 -247
- package/skills/quality/test-master/references/security-testing.md +127 -127
- package/skills/quality/test-master/references/tdd-iron-laws.md +174 -174
- package/skills/quality/test-master/references/test-reports.md +104 -104
- package/skills/quality/test-master/references/testing-anti-patterns.md +231 -231
- package/skills/quality/test-master/references/unit-testing.md +113 -113
- package/skills/ruby/rails-expert/SKILL.md +154 -154
- package/skills/ruby/rails-expert/references/active-record.md +244 -244
- package/skills/ruby/rails-expert/references/api-development.md +401 -401
- package/skills/ruby/rails-expert/references/background-jobs.md +272 -272
- package/skills/ruby/rails-expert/references/hotwire-turbo.md +228 -228
- package/skills/ruby/rails-expert/references/rspec-testing.md +367 -367
- package/skills/swift/swift-expert/SKILL.md +163 -163
- package/skills/swift/swift-expert/references/async-concurrency.md +360 -360
- package/skills/swift/swift-expert/references/memory-performance.md +377 -377
- package/skills/swift/swift-expert/references/protocol-oriented.md +354 -354
- package/skills/swift/swift-expert/references/swiftui-patterns.md +291 -291
- package/skills/swift/swift-expert/references/testing-patterns.md +399 -399
- package/skills/workflow/brainstorming/SKILL.md +164 -164
- package/skills/workflow/brainstorming/scripts/frame-template.html +214 -214
- package/skills/workflow/brainstorming/scripts/helper.js +88 -88
- package/skills/workflow/brainstorming/scripts/server.cjs +354 -354
- package/skills/workflow/brainstorming/scripts/start-server.sh +148 -148
- package/skills/workflow/brainstorming/scripts/stop-server.sh +56 -56
- package/skills/workflow/brainstorming/spec-document-reviewer-prompt.md +49 -49
- package/skills/workflow/brainstorming/visual-companion.md +287 -287
- package/skills/workflow/documentation/SKILL.md +45 -45
- package/skills/workflow/entropy-management/SKILL.md +115 -115
- package/skills/workflow/executing-plans/SKILL.md +70 -70
- package/skills/workflow/finishing-a-development-branch/SKILL.md +200 -200
- package/skills/workflow/receiving-code-review/SKILL.md +213 -213
- package/skills/workflow/requesting-code-review/SKILL.md +105 -105
- package/skills/workflow/requesting-code-review/code-reviewer.md +146 -146
- package/skills/workflow/requirement-engineering/SKILL.md +111 -111
- package/skills/workflow/systematic-debugging/CREATION-LOG.md +119 -119
- package/skills/workflow/systematic-debugging/SKILL.md +296 -296
- package/skills/workflow/systematic-debugging/condition-based-waiting-example.ts +158 -158
- package/skills/workflow/systematic-debugging/condition-based-waiting.md +115 -115
- package/skills/workflow/systematic-debugging/defense-in-depth.md +122 -122
- package/skills/workflow/systematic-debugging/find-polluter.sh +63 -63
- package/skills/workflow/systematic-debugging/root-cause-tracing.md +169 -169
- package/skills/workflow/systematic-debugging/test-academic.md +14 -14
- package/skills/workflow/systematic-debugging/test-pressure-1.md +58 -58
- package/skills/workflow/systematic-debugging/test-pressure-2.md +68 -68
- package/skills/workflow/systematic-debugging/test-pressure-3.md +69 -69
- package/skills/workflow/using-git-worktrees/SKILL.md +218 -218
- package/skills/workflow/verification-before-completion/SKILL.md +139 -139
- package/skills/workflow/writing-plans/SKILL.md +151 -151
- package/skills/workflow/writing-plans/plan-document-reviewer-prompt.md +49 -49
- package/skills/workflow/writing-skills/SKILL.md +655 -655
- package/skills/workflow/writing-skills/anthropic-best-practices.md +1150 -1150
- package/skills/workflow/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -189
- package/skills/workflow/writing-skills/persuasion-principles.md +187 -187
- package/skills/workflow/writing-skills/render-graphs.js +168 -168
- package/skills/workflow/writing-skills/testing-skills-with-subagents.md +384 -384
@@ -1,597 +1,597 @@
# Performance Optimization

---

## Overview

Optimizing pandas performance is critical for production workflows. This reference covers memory optimization, vectorization, chunking, and profiling with pandas 2.0+.

---

## Memory Analysis

### Checking Memory Usage

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': range(1_000_000),
    'name': ['user_' + str(i) for i in range(1_000_000)],
    'category': np.random.choice(['A', 'B', 'C', 'D'], 1_000_000),
    'value': np.random.randn(1_000_000),
    'count': np.random.randint(0, 100, 1_000_000),
})

# Basic memory info (info() prints directly and returns None)
df.info(memory_usage='deep')

# Detailed memory by column
memory_usage = df.memory_usage(deep=True)
print(memory_usage)
print(f"Total: {memory_usage.sum() / 1e6:.2f} MB")

# Memory as percentage of total
memory_pct = (memory_usage / memory_usage.sum() * 100).round(2)
print(memory_pct)
```
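
Why `deep=True` matters: for object-dtype columns the shallow figure counts only the array of object pointers, not the Python strings they point to. A minimal sketch on a small made-up Series (dtype forced to `object` so the result does not depend on the default string dtype):

```python
import pandas as pd

# 1,000 short strings stored as Python objects
names = pd.Series(['user_' + str(i) for i in range(1_000)], dtype=object)

# Shallow: only the array of object pointers is counted
shallow = names.memory_usage(index=False)
# Deep: also counts the string objects the pointers refer to
deep = names.memory_usage(index=False, deep=True)

print(f"shallow={shallow} bytes, deep={deep} bytes")
```

On object columns the gap between the two numbers is usually large, which is why the profiling code in this reference always passes `deep=True`.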

### Memory Profiling Function

```python
def memory_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Profile memory usage by column with optimization suggestions."""
    # index=False keeps the result aligned with df.columns (no extra 'Index' row)
    memory_bytes = df.memory_usage(deep=True, index=False)

    profile = pd.DataFrame({
        'dtype': df.dtypes,
        'non_null': df.count(),
        'null_count': df.isna().sum(),
        'unique': df.nunique(),
        'memory_mb': (memory_bytes / 1e6).round(3),
    })

    # Add exactly one optimization suggestion per column
    suggestions = []
    for col in df.columns:
        dtype = df[col].dtype
        nunique = df[col].nunique()

        if dtype == 'object':
            if nunique / len(df) < 0.5:  # Less than 50% unique
                suggestions.append(f"Convert to category (only {nunique} unique)")
            else:
                suggestions.append("Consider string dtype")
        elif dtype == 'int64':
            col_min, col_max = df[col].min(), df[col].max()
            # Check the narrowest type first so only one suggestion is added
            if col_min >= -2**15 and col_max < 2**15:
                suggestions.append("Downcast to int16")
            elif col_min >= -2**31 and col_max < 2**31:
                suggestions.append("Downcast to int32")
            else:
                suggestions.append("OK")
        elif dtype == 'float64':
            suggestions.append("Consider float32 if precision allows")
        else:
            suggestions.append("OK")

    profile['suggestion'] = suggestions
    return profile

print(memory_profile(df))
```

---

## Memory Optimization Techniques

### Downcasting Numeric Types

```python
# Automatic downcasting for integers
df['count'] = pd.to_numeric(df['count'], downcast='integer')

# Automatic downcasting for floats
df['value'] = pd.to_numeric(df['value'], downcast='float')

# Reusable downcasting function
def downcast_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Reduce memory by downcasting numeric types."""
    df = df.copy()

    # np.integer / np.floating match every integer / float width
    for col in df.select_dtypes(include=[np.integer]).columns:
        df[col] = pd.to_numeric(df[col], downcast='integer')

    for col in df.select_dtypes(include=[np.floating]).columns:
        df[col] = pd.to_numeric(df[col], downcast='float')

    return df

df_optimized = downcast_dtypes(df)
print(f"Before: {df.memory_usage(deep=True).sum() / 1e6:.2f} MB")
print(f"After: {df_optimized.memory_usage(deep=True).sum() / 1e6:.2f} MB")
```
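
How `downcast` picks a width: it tries the smallest type of the requested kind first and keeps the first one that holds every value. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

small = pd.Series(np.arange(100, dtype='int64'))  # fits in int8
medium = pd.Series([0, 40_000], dtype='int64')     # exceeds the int16 max (32,767)
unsigned = pd.Series([0, 200], dtype='int64')      # non-negative, fits uint8
floats = pd.Series([0.5, 1.25])                    # float64 by default

small_dt = pd.to_numeric(small, downcast='integer').dtype
medium_dt = pd.to_numeric(medium, downcast='integer').dtype
unsigned_dt = pd.to_numeric(unsigned, downcast='unsigned').dtype
float_dt = pd.to_numeric(floats, downcast='float').dtype

print(small_dt, medium_dt, unsigned_dt, float_dt)
```

`downcast='float'` never goes below `float32`, so check that the lost precision is acceptable for your data.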

### Using Categorical Type

```python
# Convert low-cardinality string columns to category
# Especially effective when unique values << total rows

# Before
print(f"Object dtype: {df['category'].memory_usage(deep=True) / 1e6:.2f} MB")

# After
df['category'] = df['category'].astype('category')
print(f"Category dtype: {df['category'].memory_usage(deep=True) / 1e6:.2f} MB")

# Automatic conversion for low-cardinality columns
def optimize_categories(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Convert object columns to category if unique ratio < threshold."""
    df = df.copy()

    for col in df.select_dtypes(include=['object']).columns:
        unique_ratio = df[col].nunique() / len(df)
        if unique_ratio < threshold:
            df[col] = df[col].astype('category')

    return df
```
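
Where the savings come from: a categorical column stores each distinct label once and replaces every row with a small integer code. A minimal sketch with synthetic labels:

```python
import numpy as np
import pandas as pd

# 100,000 rows drawn from only three labels (synthetic data)
rng = np.random.default_rng(0)
labels = pd.Series(rng.choice(['red', 'green', 'blue'], 100_000), dtype=object)
as_cat = labels.astype('category')

# Distinct labels are stored once (sorted when inferred from the data)
categories = list(as_cat.cat.categories)
# Each row holds only an integer code into that list
codes_dtype = as_cat.cat.codes.dtype

print(categories, codes_dtype)
print(labels.memory_usage(deep=True), as_cat.memory_usage(deep=True))
```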

### Sparse Data Types

```python
# For data dominated by a single fill value (typically 0 or NaN)
sparse_arr = pd.arrays.SparseArray([0, 0, 1, 0, 0, 0, 2, 0, 0, 0])

# Create a DataFrame with one sparse and one dense column
df_sparse = pd.DataFrame({
    'sparse_col': pd.arrays.SparseArray([0] * 9000 + [1] * 1000),
    'dense_col': [0] * 9000 + [1] * 1000,
})

print(f"Sparse: {df_sparse['sparse_col'].memory_usage() / 1e6:.4f} MB")
print(f"Dense: {df_sparse['dense_col'].memory_usage() / 1e6:.4f} MB")
```
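
A `SparseArray` stores only the values that differ from its `fill_value` (0 by default for integer data), plus their positions. A quick sketch for inspecting that layout on the same 9,000/1,000 split used above:

```python
import pandas as pd

arr = pd.arrays.SparseArray([0] * 9000 + [1] * 1000)

print(arr.fill_value)      # value that is not stored explicitly
print(arr.density)         # fraction of positions actually stored
print(len(arr.sp_values))  # the stored (non-fill) values
print(arr.nbytes, arr.to_dense().nbytes)
```

The saving shrinks as `density` rises, because each stored value also carries an index entry.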

### Nullable Types (pandas 2.0+)

```python
# Nullable extension types keep integer/float semantics and use pd.NA for
# missing values (each adds a 1-byte-per-row validity mask)
df = df.astype({
    'id': 'Int32',           # Nullable int32
    'count': 'Int16',        # Nullable int16
    'value': 'Float32',      # Nullable float32
    'name': 'string',        # Dedicated string dtype (Python-object storage by default in 2.x)
    'category': 'category',  # Categorical
})

# Arrow-backed strings are usually far more compact than object/'string' storage
# (requires pyarrow to be installed)
df['name'] = df['name'].astype('string[pyarrow]')
```
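
The practical difference from NumPy dtypes appears as soon as a value is missing: a plain integer column is upcast to `float64`, while a nullable one stays integer and marks the gap with `pd.NA`. A small sketch:

```python
import pandas as pd

raw = [1, None, 3]

numpy_backed = pd.Series(raw)             # None forces float64 (NaN)
nullable = pd.Series(raw, dtype='Int32')  # stays integer, None -> pd.NA

print(numpy_backed.dtype, nullable.dtype)
print(nullable.isna().tolist())
```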

---

## Vectorization

### Replace Loops with Vectorized Operations

```python
# BAD: Row iteration (extremely slow)
result = []
for idx, row in df.iterrows():
    if row['value'] > 0:
        result.append(row['value'] * 2)
    else:
        result.append(0)
df['result'] = result

# GOOD: Vectorized with np.where
df['result'] = np.where(df['value'] > 0, df['value'] * 2, 0)

# GOOD: Vectorized with boolean indexing
# (start from a float column so assigning float values doesn't change its dtype)
positive = df['value'] > 0
df['result'] = 0.0
df.loc[positive, 'result'] = df.loc[positive, 'value'] * 2
```
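
For this particular rule (double positive values, zero otherwise) an even shorter vectorized form exists: clip negatives to 0, then double. One caveat: NaN stays NaN with `clip`, whereas `np.where` maps it to 0 because `NaN > 0` is False. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

values = pd.Series([-1.5, 0.0, 2.0, 3.5])

via_where = np.where(values > 0, values * 2, 0)
via_clip = values.clip(lower=0) * 2  # max(value, 0) * 2

print(via_where, via_clip.to_numpy())
```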

### Multiple Conditions with np.select

```python
# BAD: Nested if-else in apply
def categorize(row):
    if row['value'] < -1:
        return 'very_low'
    elif row['value'] < 0:
        return 'low'
    elif row['value'] < 1:
        return 'medium'
    else:
        return 'high'

# Written to a new column so the existing 'category' column is not overwritten
df['value_band'] = df.apply(categorize, axis=1)  # SLOW!

# GOOD: Vectorized with np.select (conditions are checked in order; first match wins)
conditions = [
    df['value'] < -1,
    df['value'] < 0,
    df['value'] < 1,
]
choices = ['very_low', 'low', 'medium']
df['value_band'] = np.select(conditions, choices, default='high')
```
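
Because these conditions are contiguous numeric ranges, `pd.cut` can express the same binning directly. With `right=False` each bin is half-open `[low, high)`, matching the `<` comparisons above, and the result is categorical, which also saves memory. A sketch on a few sample values, checked against the `np.select` version:

```python
import numpy as np
import pandas as pd

values = pd.Series([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])

bands = pd.cut(
    values,
    bins=[-np.inf, -1, 0, 1, np.inf],
    labels=['very_low', 'low', 'medium', 'high'],
    right=False,  # bins are [low, high), like `value < threshold`
)

# The np.select formulation on the same values
conditions = [values < -1, values < 0, values < 1]
selected = np.select(conditions, ['very_low', 'low', 'medium'], default='high')
```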

### String Operations - Vectorized

```python
# BAD: Apply for string operations (fails on missing values)
df['upper_name'] = df['name'].apply(lambda x: x.upper())

# GOOD: .str accessor methods (NA-aware; run in compiled code with string[pyarrow])
df['upper_name'] = df['name'].str.upper()

# Combine multiple string operations
df['processed'] = (
    df['name']
    .str.strip()
    .str.lower()
    .str.replace(r'\s+', '_', regex=True)
)
```
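
One concrete reason to prefer the `.str` accessor: it propagates missing values, while a plain `apply` hands the NaN (a float) to the lambda and fails. A small sketch with made-up names, reusing the chained cleanup from above:

```python
import numpy as np
import pandas as pd

names = pd.Series([' Alice Smith', 'bob  jones ', np.nan])

# .str methods skip missing values and return NaN for them
processed = (
    names.str.strip()
    .str.lower()
    .str.replace(r'\s+', '_', regex=True)
)

# apply() passes NaN to the lambda; a float has no .upper()
try:
    names.apply(lambda x: x.upper())
    apply_failed = False
except AttributeError:
    apply_failed = True

print(processed.tolist(), apply_failed)
```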

### Avoid apply() When Possible

```python
# BAD: apply for row-wise calculation
df['total'] = df.apply(lambda row: row['a'] + row['b'] + row['c'], axis=1)

# GOOD: Direct vectorized operation
df['total'] = df['a'] + df['b'] + df['c']

# BAD: apply for element-wise operation
df['squared'] = df['value'].apply(lambda x: x ** 2)

# GOOD: Vectorized
df['squared'] = df['value'] ** 2

# When apply IS appropriate: complex custom logic
def complex_calculation(row):
    # Multiple dependencies and conditional logic
    if row['type'] == 'A':
        return row['value'] * row['multiplier'] + row['offset']
    else:
        return row['value'] / row['divisor'] - row['adjustment']

# Consider rewriting as vectorized if performance critical
```
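
If `complex_calculation` does become a bottleneck, one way to vectorize it is to compute both branches for every row and pick per row with `np.where`. A sketch using the same column names, with made-up sample data. The trade-off is that both branches are evaluated everywhere: a zero `divisor` on a type-`'A'` row is still divided, producing an `inf` that is then discarded.

```python
import numpy as np
import pandas as pd

def complex_calculation(row):
    if row['type'] == 'A':
        return row['value'] * row['multiplier'] + row['offset']
    return row['value'] / row['divisor'] - row['adjustment']

def complex_calculation_vectorized(df: pd.DataFrame) -> pd.Series:
    is_a = df['type'] == 'A'
    branch_a = df['value'] * df['multiplier'] + df['offset']
    branch_b = df['value'] / df['divisor'] - df['adjustment']
    # Take branch_a where the condition holds, branch_b elsewhere
    return pd.Series(np.where(is_a, branch_a, branch_b), index=df.index)

sample = pd.DataFrame({
    'type': ['A', 'B', 'A', 'B'],
    'value': [10.0, 10.0, 4.0, 9.0],
    'multiplier': [2.0, 2.0, 3.0, 3.0],
    'offset': [1.0, 1.0, 0.5, 0.5],
    'divisor': [5.0, 5.0, 2.0, 3.0],
    'adjustment': [0.5, 0.5, 1.0, 1.0],
})

row_wise = sample.apply(complex_calculation, axis=1)
vectorized = complex_calculation_vectorized(sample)
```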

---

## Chunked Processing

### Reading Large Files in Chunks

```python
# Read CSV in chunks
chunk_size = 100_000
chunks = []

for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process each chunk
    processed = chunk[chunk['value'] > 0]  # Filter
    processed = processed.groupby('category')['value'].sum()  # Aggregate
    chunks.append(processed)

# Combine results
result = pd.concat(chunks).groupby(level=0).sum()
```
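
Summing per-chunk sums works because a sum can be split and recombined. A mean cannot: averaging chunk means is wrong whenever chunks hold different numbers of rows per group. A sketch of the usual fix, carrying sums and counts through the chunks; it assumes a synthetic in-memory CSV (`io.StringIO`) in place of `large_file.csv`:

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for a large file
rng = np.random.default_rng(42)
full = pd.DataFrame({
    'category': rng.choice(['A', 'B', 'C'], 10_000),
    'value': rng.normal(size=10_000),
})
csv_text = full.to_csv(index=False)

partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2_500):
    positive = chunk[chunk['value'] > 0]
    # Keep the pieces needed to rebuild the mean later
    partials.append(positive.groupby('category')['value'].agg(['sum', 'count']))

totals = pd.concat(partials).groupby(level=0).sum()
chunked_mean = totals['sum'] / totals['count']

# Reference result computed on the full data in memory
direct_mean = full[full['value'] > 0].groupby('category')['value'].mean()
```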

### Chunked Processing Function

```python
from typing import Callable, Optional

def process_large_csv(
    filepath: str,
    chunk_size: int = 100_000,
    filter_func: Optional[Callable[[pd.DataFrame], pd.DataFrame]] = None,
    agg_func: Optional[Callable[[pd.DataFrame], pd.DataFrame]] = None,
) -> pd.DataFrame:
    """Process large CSV files in chunks.

    agg_func must be re-aggregatable (e.g. sum, min, max): it is applied to
    each chunk and again to the combined partial results.
    """
    results = []

    for chunk in pd.read_csv(filepath, chunksize=chunk_size):
        # Apply filter if provided
        if filter_func:
            chunk = filter_func(chunk)

        # Apply aggregation if provided
        if agg_func:
            chunk = agg_func(chunk)

        results.append(chunk)

    # Combine results; keep the group index when aggregating
    combined = pd.concat(results, ignore_index=agg_func is None)

    # Re-aggregate the per-chunk partial results
    if agg_func:
        combined = agg_func(combined)

    return combined

# Usage ('category' becomes the index after the first aggregation;
# groupby() also accepts index level names, so the second pass works)
result = process_large_csv(
    'large_file.csv',
    chunk_size=50_000,
    filter_func=lambda df: df[df['value'] > 0],
    agg_func=lambda df: df.groupby('category').agg({'value': 'sum'}),
)
```

### Memory-Efficient Iteration

```python
# When you must iterate, use itertuples (not iterrows);
# itertuples is typically an order of magnitude or more faster
# because it does not build a Series for every row
# (process() stands in for your own per-row function)

# BAD: iterrows
for idx, row in df.iterrows():
    process(row['name'], row['value'])

# BETTER: itertuples
for row in df.itertuples():
    process(row.name, row.value)  # Access as attributes

# BEST: Vectorized operations (avoid iteration entirely)
```
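
A related pitfall beyond speed: `iterrows` builds a Series for each row, so mixed int/float columns are upcast to a common dtype, while `itertuples` yields the original scalar types. A minimal sketch (the column names are made up):

```python
import pandas as pd

df_small = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# iterrows: each row is a Series, so the int column is upcast to float64
_, first_row = next(df_small.iterrows())

# itertuples: a namedtuple per row; the int column stays an int
first_tuple = next(df_small.itertuples(index=False))

print(first_row['qty'], first_tuple.qty)
```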

---

## Query Optimization

### Efficient Filtering

```python
# Order matters - filter early, compute late
# BAD: Compute on all rows, then filter
df['expensive_calc'] = df['a'] * df['b'] + np.sin(df['c'])
result = df[df['category'] == 'A']

# GOOD: Filter first, compute on subset
mask = df['category'] == 'A'
result = df[mask].copy()
result['expensive_calc'] = result['a'] * result['b'] + np.sin(result['c'])
```

### Using query() for Performance

```python
# query() can be faster on large DataFrames when numexpr is installed;
# on small frames its parsing overhead can make it slower
# Traditional boolean indexing
result = df[(df['value'] > 0) & (df['category'] == 'A')]

# query() syntax
result = df.query('value > 0 and category == "A"')

# Reference local variables with @
threshold = 0
cat = 'A'
result = df.query('value > @threshold and category == @cat')
```
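
`query()` returns the same rows, in the same order, as the equivalent boolean mask; `@name` pulls in local Python variables. A quick equivalence check on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame({
    'value': rng.normal(size=1_000),
    'category': rng.choice(['A', 'B'], 1_000),
})

threshold, cat = 0, 'A'
via_mask = frame[(frame['value'] > threshold) & (frame['category'] == cat)]
via_query = frame.query('value > @threshold and category == @cat')
```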

### eval() for Complex Expressions

```python
# eval() can use numexpr (if installed) to evaluate the whole expression
# in one pass, avoiding full-size intermediate arrays
# Standard pandas
df['result'] = df['a'] + df['b'] * df['c'] - df['d']

# Using pd.eval (benefits show up mainly on large DataFrames)
df['result'] = pd.eval('df.a + df.b * df.c - df.d')

# Assign a column directly with DataFrame.eval
df.eval('result = a + b * c - d', inplace=True)
```
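
Two `DataFrame.eval` behaviours worth knowing: without an assignment it simply returns the computed Series, and a multi-line string can assign several columns in one call (returning a new DataFrame unless `inplace=True`). A small sketch with made-up numbers:

```python
import pandas as pd

frame = pd.DataFrame({
    'a': [1.0, 2.0],
    'b': [3.0, 4.0],
    'c': [5.0, 6.0],
    'd': [0.5, 1.0],
})

# No assignment: returns a Series, same as the plain pandas expression
computed = frame.eval('a + b * c - d')
expected = frame['a'] + frame['b'] * frame['c'] - frame['d']

# Multi-line expression: one new column per line
frame = frame.eval('''
total = a + b * c - d
spread = b - a
''')
```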

---

## GroupBy Optimization

### Skip Group-Key Sorting

```python
# groupby sorts the group keys by default; sort=False skips that step
# and returns groups in order of first appearance
result = df.groupby('category', sort=False)['value'].mean()

# Pre-sorting the frame (df.sort_values('category')) is not needed for this;
# it costs a full sort and only changes the order in which groups appear
```
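
What `sort=False` changes, on a tiny made-up frame: the aggregated values are identical, only the order of the group keys differs.

```python
import pandas as pd

frame = pd.DataFrame({'key': ['b', 'a', 'b', 'c'], 'value': [1, 2, 3, 4]})

sorted_keys = frame.groupby('key')['value'].sum()             # keys sorted: a, b, c
first_seen = frame.groupby('key', sort=False)['value'].sum()  # first appearance: b, a, c

print(sorted_keys.index.tolist(), first_seen.index.tolist())
```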

### Use Built-in Aggregations

```python
# BAD: Custom function via apply
result = df.groupby('category')['value'].apply(lambda x: x.mean())

# GOOD: Built-in aggregation
result = df.groupby('category')['value'].mean()

# Built-in optimized reductions:
# sum, mean, median, min, max, std, var, count, size, sem, prod, first, last
# Built-in cumulative transforms (same length as the input):
# cumsum, cummax, cummin, cumprod
# nth selects rows per group rather than reducing them
```
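
Built-in reductions can also be combined in a single `groupby` pass with named aggregation, instead of one `groupby` call per statistic. A minimal sketch with made-up data (each keyword becomes an output column):

```python
import pandas as pd

frame = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A'],
    'value': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One groupby, several built-in reductions
summary = frame.groupby('category').agg(
    total=('value', 'sum'),
    mean=('value', 'mean'),
    n=('value', 'count'),
)
print(summary)
```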

### Observed Categories

```python
# For categorical keys, pass observed explicitly: the default is False in
# pandas 2.x (deprecated since 2.1 with a FutureWarning) and is scheduled to
# become True in pandas 3.0
df['category'] = df['category'].astype('category')

# Avoid computing results for unobserved categories
result = df.groupby('category', observed=True)['value'].mean()
```
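
The effect of `observed`, on a categorical key whose category `'C'` never occurs in the data (synthetic example):

```python
import pandas as pd

frame = pd.DataFrame({
    'category': pd.Categorical(['A', 'A', 'B'], categories=['A', 'B', 'C']),
    'value': [1.0, 2.0, 3.0],
})

only_seen = frame.groupby('category', observed=True)['value'].mean()
all_cats = frame.groupby('category', observed=False)['value'].mean()  # empty 'C' -> NaN

print(only_seen, all_cats, sep='\n')
```

With many categories and few observed combinations (especially when grouping by several categorical keys), `observed=False` can produce a very large, mostly empty result.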

---

## I/O Optimization

### Efficient File Formats

```python
# Parquet - columnar and compressed; best for analytical workloads
# (requires pyarrow or fastparquet)
df.to_parquet('data.parquet', compression='snappy')
df = pd.read_parquet('data.parquet')

# Feather - fast binary interchange between pandas/Arrow processes (requires pyarrow)
df.to_feather('data.feather')
df = pd.read_feather('data.feather')

# CSV - when you must use it, limit what gets parsed
df.to_csv('data.csv', index=False)
df = pd.read_csv(
    'data.csv',
    dtype={'category': 'category', 'value': 'float32'},
    usecols=['id', 'category', 'value'],  # Only needed columns
    nrows=10000,  # Limit rows for testing
)
```

### Specify dtypes When Reading

```python
# Specify dtypes upfront to avoid inference overhead and int64/float64/object defaults
dtypes = {
    'id': 'int32',
    'name': 'string',
    'category': 'category',
    'value': 'float32',
    'count': 'int16',
}

df = pd.read_csv('data.csv', dtype=dtypes)

# Parse dates efficiently ('date_column' is a placeholder name)
df = pd.read_csv(
    'data.csv',
    dtype=dtypes,
    parse_dates=['date_column'],
    date_format='%Y-%m-%d',  # Explicit format is faster (date_format added in pandas 2.0)
)
```
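
To check what explicit dtypes buy, compare the same CSV read with and without them. A self-contained sketch that assumes a small synthetic CSV in memory (`io.StringIO`) instead of `data.csv`:

```python
import io

import pandas as pd

# Synthetic CSV: 1,000 rows, two category labels
rows = [f"{i},{'AB'[i % 2]},{i * 0.5}" for i in range(1_000)]
csv_text = "id,category,value\n" + "\n".join(rows)

inferred = pd.read_csv(io.StringIO(csv_text))
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'id': 'int32', 'category': 'category', 'value': 'float32'},
)

print(inferred.dtypes, typed.dtypes, sep='\n')
print(inferred.memory_usage(deep=True).sum(), typed.memory_usage(deep=True).sum())
```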

---

## Profiling and Benchmarking

### Timing Operations

```python
import time

# Simple timing (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result = df.groupby('category')['value'].mean()
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.4f} seconds")

# Using %%timeit in Jupyter (the magic must be the first line of the cell)
# %%timeit
# df.groupby('category')['value'].mean()
```
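
Outside Jupyter, the standard-library `timeit` module gives the same repeated-measurement behaviour as `%%timeit`. A sketch on a synthetic frame; taking the minimum of several repeats is a common way to reduce noise from other processes:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame({
    'category': rng.choice(['A', 'B', 'C', 'D'], 100_000),
    'value': rng.normal(size=100_000),
})

number = 10  # calls per measurement
runs = timeit.repeat(
    lambda: frame.groupby('category')['value'].mean(),
    repeat=5,
    number=number,
)
best_per_call = min(runs) / number
print(f"Best: {best_per_call * 1e3:.3f} ms per call")
```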

### Memory Profiling

```python
# Track memory before/after
import tracemalloc

tracemalloc.start()

# Your operation
df_result = df.groupby('category').agg({'value': 'sum'})

current, peak = tracemalloc.get_traced_memory()
print(f"Current memory: {current / 1e6:.2f} MB")
print(f"Peak memory: {peak / 1e6:.2f} MB")

tracemalloc.stop()
```
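
A reusable variant that reports the peak for a single call, using only the standard library. It assumes the work allocates memory that `tracemalloc` can see: NumPy reports its array buffers to it, while memory held by pyarrow's own allocator is likely not counted. The 8 MB array is just a calibration example:

```python
import tracemalloc

import numpy as np

def peak_memory_mb(func) -> float:
    """Return the peak traced allocation (in MB) while running func()."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6

# 1,000,000 float64 values = 8,000,000 bytes of array data
peak = peak_memory_mb(lambda: np.ones(1_000_000))
print(f"Peak: {peak:.2f} MB")
```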

### Comparison Template

```python
import time

import numpy as np
import pandas as pd

def benchmark_operations(df: pd.DataFrame, operations: dict, n_runs: int = 5) -> pd.DataFrame:
    """Benchmark multiple operations; returns mean/std/min seconds per operation."""
    results = {}

    for name, func in operations.items():
        times = []
        for _ in range(n_runs):
            start = time.perf_counter()
            func(df)
            times.append(time.perf_counter() - start)

        results[name] = {
            'mean': np.mean(times),
            'std': np.std(times),
            'min': np.min(times),
        }

    return pd.DataFrame(results).T

# Usage
operations = {
    'iterrows': lambda df: [row['value'] for _, row in df.iterrows()],
    'itertuples': lambda df: [row.value for row in df.itertuples()],
    'vectorized': lambda df: df['value'].tolist(),
}

benchmark_results = benchmark_operations(df.head(10000), operations)
print(benchmark_results)
```

---

## Best Practices Summary

1. **Profile first** - Identify actual bottlenecks before optimizing
2. **Use appropriate dtypes** - int32/float32/category save memory
3. **Vectorize everything** - Avoid loops and apply when possible
4. **Filter early** - Reduce data before expensive operations
5. **Chunk large files** - Process in manageable pieces
6. **Use efficient file formats** - Parquet/Feather over CSV
7. **Leverage built-in methods** - Faster than custom functions

---

## Performance Checklist

Before deploying pandas code:

- [ ] Memory profiled with `memory_usage(deep=True)`
- [ ] Dtypes optimized (downcast, categorical)
- [ ] No iterrows/itertuples in hot paths
- [ ] GroupBy uses built-in aggregations
- [ ] Large files processed in chunks
- [ ] Filters applied before computations
- [ ] Appropriate file format used
- [ ] Benchmarked with representative data size

---

## Anti-Patterns Summary

| Anti-Pattern | Alternative |
|--------------|-------------|
| `iterrows()` for computation | Vectorized operations |
| `apply(lambda)` for simple ops | Built-in methods |
| Loading entire large file | Chunked reading |
| String columns with low cardinality | Category dtype |
| int64 for small integers | int32/int16 |
| Multiple separate filters | Combined boolean mask |
| Repeated groupby calls | Single groupby with multiple aggs |

---

## Related References

- `dataframe-operations.md` - Efficient indexing and filtering
- `aggregation-groupby.md` - Optimized aggregation patterns
- `merging-joining.md` - Efficient merge strategies
|
|
# Performance Optimization

---

## Overview

Optimizing pandas performance is critical for production workflows. This reference covers memory analysis and optimization, vectorization, chunked processing, query and GroupBy optimization, I/O, and profiling with pandas 2.0+.

---

## Memory Analysis

### Checking Memory Usage

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': range(1_000_000),
    'name': ['user_' + str(i) for i in range(1_000_000)],
    'category': np.random.choice(['A', 'B', 'C', 'D'], 1_000_000),
    'value': np.random.randn(1_000_000),
    'count': np.random.randint(0, 100, 1_000_000),
})

# Basic memory info (info() prints directly and returns None, so don't wrap it in print)
df.info(memory_usage='deep')

# Detailed memory by column (deep=True also counts the Python objects in object columns)
memory_usage = df.memory_usage(deep=True)
print(memory_usage)
print(f"Total: {memory_usage.sum() / 1e6:.2f} MB")

# Memory as percentage of total
memory_pct = (memory_usage / memory_usage.sum() * 100).round(2)
print(memory_pct)
```
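
The `deep=True` flag matters for object columns: without it, pandas counts only the 8-byte pointer stored per row, not the Python string objects behind those pointers. A small sketch on a made-up column (the values are illustrative):

```python
import pandas as pd

# Illustrative object-dtype column of Python strings
s = pd.Series(['user_' + str(i) for i in range(10_000)], dtype=object)

shallow = s.memory_usage()        # index + one 8-byte pointer per row
deep = s.memory_usage(deep=True)  # additionally counts every Python string object

print(f"shallow: {shallow / 1e3:.1f} KB, deep: {deep / 1e3:.1f} KB")
```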

### Memory Profiling Function

```python
def memory_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Profile memory usage by column with optimization suggestions."""
    # index=False drops the 'Index' row so the profile has exactly one row per column
    memory_bytes = df.memory_usage(deep=True, index=False)

    profile = pd.DataFrame({
        'dtype': df.dtypes,
        'non_null': df.count(),
        'null_count': df.isna().sum(),
        'unique': df.nunique(),
        'memory_mb': (memory_bytes / 1e6).round(3),
    })

    # Add exactly one optimization suggestion per column
    suggestions = []
    for col in df.columns:
        dtype = df[col].dtype
        nunique = df[col].nunique()

        if dtype == 'object':
            if len(df) > 0 and nunique / len(df) < 0.5:  # Less than 50% unique
                suggestions.append(f"Convert to category (only {nunique} unique)")
            else:
                suggestions.append("Consider string dtype")
        elif dtype == 'int64':
            col_min, col_max = df[col].min(), df[col].max()
            # Check the narrowest type first so each column gets a single suggestion
            if col_min >= -2**15 and col_max < 2**15:
                suggestions.append("Downcast to int16")
            elif col_min >= -2**31 and col_max < 2**31:
                suggestions.append("Downcast to int32")
            else:
                suggestions.append("OK")
        elif dtype == 'float64':
            suggestions.append("Consider float32 if precision allows")
        else:
            suggestions.append("OK")

    profile['suggestion'] = suggestions
    return profile

print(memory_profile(df))
```

---

## Memory Optimization Techniques

### Downcasting Numeric Types

```python
# Automatic downcasting for integers (smallest signed integer type that holds every value)
df['count'] = pd.to_numeric(df['count'], downcast='integer')

# Automatic downcasting for floats (pandas downcasts no further than float32)
df['value'] = pd.to_numeric(df['value'], downcast='float')

# Helper that downcasts every numeric column
def downcast_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Reduce memory by downcasting numeric types."""
    df = df.copy()

    # np.integer / np.floating match every NumPy integer / float width
    for col in df.select_dtypes(include=[np.integer]).columns:
        df[col] = pd.to_numeric(df[col], downcast='integer')

    for col in df.select_dtypes(include=[np.floating]).columns:
        df[col] = pd.to_numeric(df[col], downcast='float')

    return df

df_optimized = downcast_dtypes(df)
print(f"Before: {df.memory_usage(deep=True).sum() / 1e6:.2f} MB")
print(f"After: {df_optimized.memory_usage(deep=True).sum() / 1e6:.2f} MB")
```
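
To see what `downcast=` actually chooses, here is a minimal sketch on made-up columns (the names `count`, `big`, and `value` are illustrative): pandas tries the types of the requested kind from smallest upward and keeps the first one that represents every value.

```python
import numpy as np
import pandas as pd

# Illustrative columns (names and values are made up for this sketch)
df_demo = pd.DataFrame({
    'count': np.arange(100, dtype='int64'),                  # 0..99 fits in int8
    'big': np.array([0, 40_000], dtype='int64').repeat(50),  # 40,000 needs int32
    'value': np.linspace(0.0, 1.0, 100),                     # float64
})

down = df_demo.copy()
down['count'] = pd.to_numeric(down['count'], downcast='integer')
down['big'] = pd.to_numeric(down['big'], downcast='integer')
down['value'] = pd.to_numeric(down['value'], downcast='float')

print(down.dtypes)  # count: int8, big: int32, value: float32
print(f"{df_demo.memory_usage().sum()} -> {down.memory_usage().sum()} bytes")
```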

### Using Categorical Type

```python
# Convert low-cardinality string columns to category
# Especially effective when unique values << total rows

# Before
print(f"Object dtype: {df['category'].memory_usage(deep=True) / 1e6:.2f} MB")

# After
df['category'] = df['category'].astype('category')
print(f"Category dtype: {df['category'].memory_usage(deep=True) / 1e6:.2f} MB")

# Automatic conversion for low-cardinality columns
def optimize_categories(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Convert object columns to category if unique ratio < threshold."""
    df = df.copy()

    for col in df.select_dtypes(include=['object']).columns:
        unique_ratio = df[col].nunique() / len(df)
        if unique_ratio < threshold:
            df[col] = df[col].astype('category')

    return df
```
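
The savings come from how a categorical is stored: each distinct label is kept once, and every row holds only a small integer code. A sketch with hypothetical labels:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical low-cardinality labels: 4 distinct values over 100,000 rows
labels = pd.Series(rng.choice(['A', 'B', 'C', 'D'], size=100_000)).astype(object)
cat = labels.astype('category')

# Distinct labels are stored once; each row stores an integer code
print(cat.cat.categories.tolist())  # ['A', 'B', 'C', 'D']
print(cat.cat.codes.dtype)          # int8 (enough for up to 127 categories)

print(f"object: {labels.memory_usage(deep=True) / 1e6:.2f} MB")
print(f"category: {cat.memory_usage(deep=True) / 1e6:.2f} MB")
```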

### Sparse Data Types

```python
# For data dominated by a single fill value (typically 0 or NaN):
# only the non-fill values and their positions are stored
sparse_array = pd.arrays.SparseArray([0, 0, 1, 0, 0, 0, 2, 0, 0, 0])

# DataFrame with a sparse column and an equivalent dense column
df_sparse = pd.DataFrame({
    'sparse_col': pd.arrays.SparseArray([0] * 9000 + [1] * 1000),
    'dense_col': [0] * 9000 + [1] * 1000,
})

print(f"Sparse: {df_sparse['sparse_col'].memory_usage() / 1e6:.4f} MB")
print(f"Dense: {df_sparse['dense_col'].memory_usage() / 1e6:.4f} MB")
```
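
The `.sparse` accessor shows how much a sparse column really stores. A minimal sketch, assuming a mostly-zero column like the one above (the data is illustrative):

```python
import pandas as pd

# Illustrative column: 90% zeros, 10% ones
dense = pd.Series([0] * 9_000 + [1] * 1_000)
sparse = dense.astype(pd.SparseDtype('int64', fill_value=0))

# Fraction of positions that hold a stored (non-fill) value
print(sparse.sparse.density)         # 0.1
print(sparse.sparse.fill_value)      # 0
print(sparse.nbytes < dense.nbytes)  # True

# Convert back when an operation needs dense data
restored = sparse.sparse.to_dense()
```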

### Nullable and Arrow-Backed Types

```python
# Nullable extension types hold missing values as pd.NA without upcasting ints to float
df = df.astype({
    'id': 'Int32',          # Nullable int32
    'count': 'Int16',       # Nullable int16
    'value': 'Float32',     # Nullable float32
    'name': 'string',       # Nullable string (Python-backed by default, so not smaller than object)
    'category': 'category', # Categorical
})

# Arrow-backed strings usually use much less memory (requires pyarrow)
df['name'] = df['name'].astype('string[pyarrow]')
```
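
In practice, "proper NA handling" means integer columns stay integers when a value is missing. A sketch contrasting a NumPy-backed Series with the nullable `Int32` type (illustrative data):

```python
import pandas as pd

# NumPy-backed integers cannot hold a missing value, so pandas upcasts to float64
plain = pd.Series([1, None, 3])
print(plain.dtype)  # float64

# The nullable Int32 extension type keeps integers and marks the gap with pd.NA
nullable = pd.Series([1, None, 3], dtype='Int32')
print(nullable.dtype)            # Int32
print(nullable.isna().tolist())  # [False, True, False]
print(nullable.sum())            # 4 (missing values are skipped by default)
```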

---

## Vectorization

### Replace Loops with Vectorized Operations

```python
# BAD: Row iteration (extremely slow: builds a Series object for every row)
result = []
for idx, row in df.iterrows():
    if row['value'] > 0:
        result.append(row['value'] * 2)
    else:
        result.append(0)
df['result'] = result

# GOOD: Vectorized with np.where
df['result'] = np.where(df['value'] > 0, df['value'] * 2, 0)

# GOOD: Vectorized with boolean indexing
positive = df['value'] > 0
df['result'] = 0.0  # Start as float so assigning float values keeps one dtype
df.loc[positive, 'result'] = df.loc[positive, 'value'] * 2
```
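
`Series.where` is a pandas-native alternative to `np.where` that keeps the index and returns a Series. A small sketch on illustrative values, checked against the `np.where` version:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.5, 0.0, 2.0, 3.5], name='value')  # illustrative data

# NumPy route: returns a bare ndarray (the index is dropped)
via_np = np.where(s > 0, s * 2, 0)

# pandas route: keep values where the condition holds, substitute 0 elsewhere;
# the result stays a Series aligned to the original index
via_pd = (s * 2).where(s > 0, 0)

print(via_pd.tolist())  # [0.0, 0.0, 4.0, 7.0]
```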

### Multiple Conditions with np.select

```python
# BAD: Nested if-else in apply
def categorize(row):
    if row['value'] < -1:
        return 'very_low'
    elif row['value'] < 0:
        return 'low'
    elif row['value'] < 1:
        return 'medium'
    else:
        return 'high'

# Write to a new column so the existing 'category' column is not overwritten
df['value_band'] = df.apply(categorize, axis=1)  # SLOW!

# GOOD: Vectorized with np.select (the first matching condition wins)
conditions = [
    df['value'] < -1,
    df['value'] < 0,
    df['value'] < 1,
]
choices = ['very_low', 'low', 'medium']
df['value_band'] = np.select(conditions, choices, default='high')
```
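
For threshold-based binning like this, `pd.cut` is a common alternative to `np.select` and returns a memory-friendly categorical. A sketch, assuming the same band boundaries as above; `right=False` makes the bins half-open `[a, b)`, which reproduces the strict `<` comparisons:

```python
import numpy as np
import pandas as pd

values = pd.Series([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])  # illustrative data

# right=False gives half-open bins [a, b): [-inf, -1), [-1, 0), [0, 1), [1, inf)
bands = pd.cut(
    values,
    bins=[-np.inf, -1, 0, 1, np.inf],
    labels=['very_low', 'low', 'medium', 'high'],
    right=False,
)

# Same banding with np.select for comparison
conditions = [values < -1, values < 0, values < 1]
expected = np.select(conditions, ['very_low', 'low', 'medium'], default='high')

print(bands.tolist())  # ['very_low', 'low', 'low', 'medium', 'medium', 'high', 'high']
print(bands.astype(str).tolist() == expected.tolist())  # True
```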

### String Operations - Vectorized

```python
# BAD: Apply for string operations (also fails on missing values, which have no .upper())
df['upper_name'] = df['name'].apply(lambda x: x.upper())

# GOOD: .str accessor methods (skip missing values; typically fastest on Arrow-backed strings)
df['upper_name'] = df['name'].str.upper()

# Combine multiple string operations
df['processed'] = (
    df['name']
    .str.strip()
    .str.lower()
    .str.replace(r'\s+', '_', regex=True)
)
```

### Avoid apply() When Possible

```python
# BAD: apply for row-wise calculation
df['total'] = df.apply(lambda row: row['a'] + row['b'] + row['c'], axis=1)

# GOOD: Direct vectorized operation
df['total'] = df['a'] + df['b'] + df['c']

# BAD: apply for element-wise operation
df['squared'] = df['value'].apply(lambda x: x ** 2)

# GOOD: Vectorized
df['squared'] = df['value'] ** 2

# When apply IS appropriate: complex custom logic
def complex_calculation(row):
    # Multiple dependencies and conditional logic
    if row['type'] == 'A':
        return row['value'] * row['multiplier'] + row['offset']
    else:
        return row['value'] / row['divisor'] - row['adjustment']

# Consider rewriting as vectorized if performance critical
```
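
The `complex_calculation` example has a single branch point, so it can be vectorized with `np.where`. A sketch using a small hypothetical frame with the same column names; `np.where` evaluates both branch expressions for every row and then picks one per row, so both must be safe to compute on all rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the columns used by complex_calculation()
df_calc = pd.DataFrame({
    'type': ['A', 'B', 'A', 'B'],
    'value': [10.0, 10.0, 4.0, 9.0],
    'multiplier': [2.0, 3.0, 0.5, 1.0],
    'offset': [1.0, 1.0, 1.0, 1.0],
    'divisor': [2.0, 5.0, 4.0, 3.0],
    'adjustment': [0.5, 0.5, 0.5, 0.5],
})


def complex_calculation(row):
    if row['type'] == 'A':
        return row['value'] * row['multiplier'] + row['offset']
    return row['value'] / row['divisor'] - row['adjustment']


slow = df_calc.apply(complex_calculation, axis=1)

# Vectorized: compute both branches for all rows, then choose per row
fast = np.where(
    df_calc['type'] == 'A',
    df_calc['value'] * df_calc['multiplier'] + df_calc['offset'],
    df_calc['value'] / df_calc['divisor'] - df_calc['adjustment'],
)

print(fast.tolist())  # [21.0, 1.5, 3.0, 2.5]
```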

---

## Chunked Processing

### Reading Large Files in Chunks

```python
# Read CSV in chunks
chunk_size = 100_000
chunks = []

for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process each chunk
    processed = chunk[chunk['value'] > 0]  # Filter
    processed = processed.groupby('category')['value'].sum()  # Aggregate
    chunks.append(processed)

# Combine results: per-chunk sums add up, so summing again by group gives the totals
result = pd.concat(chunks).groupby(level=0).sum()
```

### Chunked Processing Function

```python
def process_large_csv(
    filepath: str,
    chunk_size: int = 100_000,
    filter_func=None,
    agg_func=None,
) -> pd.DataFrame:
    """Process large CSV files in chunks.

    agg_func is applied to each chunk and again to the combined partial
    results, so it must give correct totals when re-applied (e.g. sum, min,
    max); a mean of per-chunk means is not the overall mean.
    """
    results = []

    for chunk in pd.read_csv(filepath, chunksize=chunk_size):
        # Apply filter if provided
        if filter_func is not None:
            chunk = filter_func(chunk)

        # Apply aggregation if provided; reset_index turns the group keys back
        # into columns so agg_func can group on them again after concatenation
        if agg_func is not None:
            chunk = agg_func(chunk).reset_index()

        results.append(chunk)

    # Combine results
    combined = pd.concat(results, ignore_index=True)

    # Re-aggregate the per-chunk partial results
    if agg_func is not None:
        combined = agg_func(combined)

    return combined

# Usage
result = process_large_csv(
    'large_file.csv',
    chunk_size=50_000,
    filter_func=lambda df: df[df['value'] > 0],
    agg_func=lambda df: df.groupby('category').agg({'value': 'sum'}),
)
```
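
For a per-group mean across chunks, one approach is to carry re-combinable partials (per-group sum and count) and divide only at the end. A sketch on a tiny in-memory CSV (via `io.StringIO`) standing in for a large file; the data and the `read_chunks` helper are illustrative:

```python
import io

import pandas as pd

# Tiny in-memory CSV; chunksize=3 splits it into two chunks
csv_text = "category,value\nA,1\nA,2\nA,3\nB,10\nA,4\nB,20\n"


def read_chunks():
    return pd.read_csv(io.StringIO(csv_text), chunksize=3)


# Per-group sum and count from every chunk (both combine by simple addition)
partials = [chunk.groupby('category')['value'].agg(['sum', 'count']) for chunk in read_chunks()]
totals = pd.concat(partials).groupby(level=0).sum()
chunked_mean = totals['sum'] / totals['count']

# For contrast: averaging per-chunk means weights chunks equally, not rows
naive = pd.concat(
    [chunk.groupby('category')['value'].mean() for chunk in read_chunks()]
).groupby(level=0).mean()

direct_mean = pd.read_csv(io.StringIO(csv_text)).groupby('category')['value'].mean()

print(chunked_mean.tolist())  # [2.5, 15.0]
print(naive.tolist())         # [3.0, 15.0] (wrong for A)
```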

### Memory-Efficient Iteration

```python
# When you must iterate, use itertuples (not iterrows)
# itertuples yields lightweight namedtuples instead of building a Series per row,
# so it is typically much faster than iterrows

# BAD: iterrows (process() stands in for your own per-row logic)
for idx, row in df.iterrows():
    process(row['name'], row['value'])  # With iterrows, row.name is the index label

# BETTER: itertuples
for row in df.itertuples():
    process(row.name, row.value)  # Access columns as attributes

# BEST: Vectorized operations (avoid iteration entirely)
```
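
If a loop is truly unavoidable, two lighter options are `itertuples(index=False, name=None)`, which yields plain tuples instead of namedtuples, and zipping the column arrays directly. A sketch on a tiny illustrative frame:

```python
import pandas as pd

df_iter = pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1.0, 2.0, 3.0]})

# Plain tuples: no index element and no namedtuple class
rows_tuples = list(df_iter.itertuples(index=False, name=None))

# Zip the underlying arrays: no per-row pandas objects at all
rows_zip = list(zip(df_iter['name'].to_numpy(), df_iter['value'].to_numpy()))

print(rows_tuples)  # [('a', 1.0), ('b', 2.0), ('c', 3.0)]
```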

---

## Query Optimization

### Efficient Filtering

```python
# Order matters - filter early, compute late
# BAD: Compute on all rows, then filter
df['expensive_calc'] = df['a'] * df['b'] + np.sin(df['c'])
result = df[df['category'] == 'A']

# GOOD: Filter first, compute on subset
mask = df['category'] == 'A'
result = df[mask].copy()  # .copy() avoids SettingWithCopyWarning on the next assignment
result['expensive_calc'] = result['a'] * result['b'] + np.sin(result['c'])
```
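
The same idea applies when there are several conditions: one combined boolean mask selects the rows in a single step instead of creating an intermediate DataFrame per filter. A sketch with illustrative data:

```python
import pandas as pd

df_f = pd.DataFrame({
    'category': ['A', 'B', 'A', 'A'],
    'value': [1.0, 2.0, -3.0, 4.0],
})

# Two separate filters: the first creates an intermediate DataFrame
chained = df_f[df_f['category'] == 'A']
chained = chained[chained['value'] > 0]

# One combined mask: both comparisons run on the original columns, one selection
mask = (df_f['category'] == 'A') & (df_f['value'] > 0)
combined = df_f[mask]

print(combined.index.tolist())   # [0, 3]
print(combined.equals(chained))  # True
```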

### Using query() for Performance

```python
# query() evaluates with numexpr when it is installed, which can be faster on large
# DataFrames; on small frames the expression-parsing overhead can make it slower
# Traditional boolean indexing
result = df[(df['value'] > 0) & (df['category'] == 'A')]

# query() syntax
result = df.query('value > 0 and category == "A"')

# With variables (prefix local Python names with @)
threshold = 0
cat = 'A'
result = df.query('value > @threshold and category == @cat')
```

### eval() for Complex Expressions

```python
# eval() can use numexpr, which evaluates the whole expression without allocating
# a full temporary array for each intermediate step (helps on large DataFrames)
# Standard pandas
df['result'] = df['a'] + df['b'] * df['c'] - df['d']

# Using pd.eval
df['result'] = pd.eval('df.a + df.b * df.c - df.d')

# DataFrame.eval refers to columns by name; inplace=True adds the column to df
df.eval('result = a + b * c - d', inplace=True)
```

---

## GroupBy Optimization

### Pre-sorting and sort=False

```python
# Sort by the groupby column first; this costs O(n log n), so it mainly pays off when the
# sorted order is reused (several groupbys on the same key, or sorted output needed anyway)
df = df.sort_values('category')

# sort=False skips sorting the group keys in the result; groups come out in order of
# first appearance, which here is already sorted
result = df.groupby('category', sort=False)['value'].mean()
```
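
What `sort=False` changes is the order of the result, not the values. A sketch with hypothetical keys:

```python
import pandas as pd

df_g = pd.DataFrame({
    'category': ['C', 'A', 'C', 'B', 'A'],
    'value': [1, 2, 3, 4, 5],
})

sorted_keys = df_g.groupby('category')['value'].sum()
first_seen = df_g.groupby('category', sort=False)['value'].sum()

print(sorted_keys.index.tolist())  # ['A', 'B', 'C']
print(first_seen.index.tolist())   # ['C', 'A', 'B']
```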

### Use Built-in Aggregations

```python
# BAD: Custom function via apply (calls back into Python once per group)
result = df.groupby('category')['value'].apply(lambda x: x.mean())

# GOOD: Built-in aggregation
result = df.groupby('category')['value'].mean()

# Built-in reductions (one row per group):
# sum, mean, median, min, max, std, var, sem, prod, count, size, nunique, first, last
# Built-in cumulative transforms (same length as the input):
# cumsum, cumprod, cummax, cummin
# Row selection: nth
```
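
When several statistics are needed, a single `groupby(...).agg(...)` call with named aggregations avoids repeated groupby calls. A sketch with illustrative data and output names:

```python
import pandas as pd

df_g = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A'],
    'value': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# The grouping is computed once; each keyword becomes an output column:
# output_name=(input_column, built-in function name)
summary = df_g.groupby('category').agg(
    total=('value', 'sum'),
    mean=('value', 'mean'),
    n=('value', 'count'),
)
print(summary)
```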

### Observed Categories

```python
# For categorical keys, observed=True returns only categories that actually occur.
# In pandas 2.x the default is still observed=False (deprecated since 2.1), so pass it explicitly
df['category'] = df['category'].astype('category')

# Avoid computing for unobserved categories
result = df.groupby('category', observed=True)['value'].mean()
```
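
The effect of `observed` is easiest to see with categories that never occur. A sketch with a hypothetical category list `A` to `D` where only `A` and `B` appear:

```python
import pandas as pd

cat_type = pd.CategoricalDtype(['A', 'B', 'C', 'D'])
df_c = pd.DataFrame({
    'category': pd.Series(['A', 'B', 'A'], dtype=cat_type),  # 'C' and 'D' never occur
    'value': [1.0, 2.0, 3.0],
})

all_cats = df_c.groupby('category', observed=False)['value'].sum()
seen_only = df_c.groupby('category', observed=True)['value'].sum()

print(all_cats.index.tolist())   # ['A', 'B', 'C', 'D'] (sum is 0.0 for unobserved ones)
print(seen_only.index.tolist())  # ['A', 'B']
```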

---

## I/O Optimization

### Efficient File Formats

```python
# Parquet - best for analytical workloads (requires pyarrow or fastparquet)
df.to_parquet('data.parquet', compression='snappy')
df = pd.read_parquet('data.parquet')

# Feather - best for pandas interchange (requires pyarrow; documented as needing a default index)
df.reset_index(drop=True).to_feather('data.feather')
df = pd.read_feather('data.feather')

# CSV with optimizations
df.to_csv('data.csv', index=False)
df = pd.read_csv(
    'data.csv',
    dtype={'category': 'category', 'value': 'float32'},  # dtypes for the loaded columns
    usecols=['id', 'category', 'value'],  # Only needed columns
    nrows=10000,  # Limit rows for testing
)
```

### Specify dtypes When Reading

```python
# Specify dtypes upfront to save memory and avoid type-inference overhead
dtypes = {
    'id': 'int32',
    'name': 'string',
    'category': 'category',
    'value': 'float32',
    'count': 'int16',
}

df = pd.read_csv('data.csv', dtype=dtypes)

# Parse dates efficiently ('date_column' stands in for your own date column)
df = pd.read_csv(
    'data.csv',
    dtype=dtypes,
    parse_dates=['date_column'],
    date_format='%Y-%m-%d',  # Explicit format is faster (date_format requires pandas 2.0+)
)
```
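
Combining `usecols` with a `dtype` mapping restricted to those same columns keeps both parsing work and memory down from the first read. A sketch using an in-memory CSV (via `io.StringIO`) with made-up contents in place of `data.csv`:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'
csv_text = (
    "id,name,category,value,count\n"
    "1,alice,A,0.5,3\n"
    "2,bob,B,1.25,7\n"
)

dtypes = {
    'id': 'int32',
    'name': 'string',
    'category': 'category',
    'value': 'float32',
    'count': 'int16',
}
usecols = ['id', 'category', 'value']

# Parse only the needed columns, each directly into its target dtype
df_read = pd.read_csv(
    io.StringIO(csv_text),
    usecols=usecols,
    dtype={col: dtypes[col] for col in usecols},
)
print(df_read.dtypes)
```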

---

## Profiling and Benchmarking

### Timing Operations

```python
import time

# Simple timing (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result = df.groupby('category')['value'].mean()
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.4f} seconds")

# Using %%timeit in Jupyter (must be the first line of the cell)
# %%timeit
# df.groupby('category')['value'].mean()
```
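
For more stable numbers outside Jupyter, the standard-library `timeit` module runs a statement many times. A sketch, assuming a synthetic DataFrame similar to the one at the top of this reference:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data similar in shape to the example DataFrame above
rng = np.random.default_rng(0)
df_t = pd.DataFrame({
    'category': rng.choice(['A', 'B', 'C', 'D'], size=100_000),
    'value': rng.standard_normal(100_000),
})

n_calls = 10
# repeat() returns one total time per run; each run calls the function n_calls times
runs = timeit.repeat(
    lambda: df_t.groupby('category')['value'].mean(),
    number=n_calls,
    repeat=5,
)

# The minimum is usually the least noisy estimate (slower runs include interference)
best_per_call = min(runs) / n_calls
print(f"best: {best_per_call * 1e3:.3f} ms per call")
```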

### Memory Profiling

```python
# Track memory before/after with tracemalloc (it sees Python allocations and
# NumPy buffers, but may miss memory allocated by other native libraries)
import tracemalloc

tracemalloc.start()

# Your operation
df_result = df.groupby('category').agg({'value': 'sum'})

current, peak = tracemalloc.get_traced_memory()
print(f"Current memory: {current / 1e6:.2f} MB")
print(f"Peak memory: {peak / 1e6:.2f} MB")

tracemalloc.stop()
```
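
The tracemalloc pattern can be wrapped in a small reusable helper so tracing always stops, even if the operation raises. A sketch; `peak_memory_mb` is a hypothetical name, and it only sees allocations reported to tracemalloc (Python objects and NumPy buffers):

```python
import tracemalloc

import numpy as np


def peak_memory_mb(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, peak traced memory in MB during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


# A 1,000,000-element float64 array needs about 8 MB
arr, peak_mb = peak_memory_mb(np.ones, 1_000_000)
print(f"peak: {peak_mb:.1f} MB")
```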

### Comparison Template

```python
def benchmark_operations(df: pd.DataFrame, operations: dict, n_runs: int = 5) -> pd.DataFrame:
    """Benchmark multiple operations; returns mean/std/min seconds per operation."""
    results = {}

    for name, func in operations.items():
        times = []
        for _ in range(n_runs):
            start = time.perf_counter()
            func(df)
            times.append(time.perf_counter() - start)

        results[name] = {
            'mean': np.mean(times),
            'std': np.std(times),
            'min': np.min(times),
        }

    return pd.DataFrame(results).T

# Usage
operations = {
    'iterrows': lambda df: [row['value'] for _, row in df.iterrows()],
    'itertuples': lambda df: [row.value for row in df.itertuples()],
    'vectorized': lambda df: df['value'].tolist(),
}

benchmark_results = benchmark_operations(df.head(10000), operations)
print(benchmark_results)
```

---

## Best Practices Summary

1. **Profile first** - Identify actual bottlenecks before optimizing
2. **Use appropriate dtypes** - int32/float32/category save memory
3. **Vectorize everything** - Avoid loops and apply when possible
4. **Filter early** - Reduce data before expensive operations
5. **Chunk large files** - Process in manageable pieces
6. **Use efficient file formats** - Parquet/Feather over CSV
7. **Leverage built-in methods** - Faster than custom functions

---

## Performance Checklist

Before deploying pandas code:

- [ ] Memory profiled with `memory_usage(deep=True)`
- [ ] Dtypes optimized (downcast, categorical)
- [ ] No iterrows/itertuples in hot paths
- [ ] GroupBy uses built-in aggregations
- [ ] Large files processed in chunks
- [ ] Filters applied before computations
- [ ] Appropriate file format used
- [ ] Benchmarked with representative data size

---

## Anti-Patterns Summary

| Anti-Pattern | Alternative |
|--------------|-------------|
| `iterrows()` for computation | Vectorized operations |
| `apply(lambda)` for simple ops | Built-in methods |
| Loading entire large file | Chunked reading |
| String columns with low cardinality | Category dtype |
| int64 for small integers | int32/int16 |
| Multiple separate filters | Combined boolean mask |
| Repeated groupby calls | Single groupby with multiple aggs |

---

## Related References

- `dataframe-operations.md` - Efficient indexing and filtering
- `aggregation-groupby.md` - Optimized aggregation patterns
- `merging-joining.md` - Efficient merge strategies