maestro-bundle 1.3.1 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (116)
  1. package/package.json +1 -1
  2. package/templates/bundle-ai-agents/skills/agent-orchestration/SKILL.md +107 -41
  3. package/templates/bundle-ai-agents/skills/agent-orchestration/references/graph-patterns.md +50 -0
  4. package/templates/bundle-ai-agents/skills/agent-orchestration/references/routing-strategies.md +47 -0
  5. package/templates/bundle-ai-agents/skills/api-design/SKILL.md +125 -16
  6. package/templates/bundle-ai-agents/skills/api-design/references/pydantic-patterns.md +72 -0
  7. package/templates/bundle-ai-agents/skills/api-design/references/rest-conventions.md +51 -0
  8. package/templates/bundle-ai-agents/skills/clean-architecture/SKILL.md +113 -21
  9. package/templates/bundle-ai-agents/skills/clean-architecture/references/dependency-injection.md +60 -0
  10. package/templates/bundle-ai-agents/skills/clean-architecture/references/layer-rules.md +56 -0
  11. package/templates/bundle-ai-agents/skills/context-engineering/SKILL.md +104 -36
  12. package/templates/bundle-ai-agents/skills/context-engineering/references/compression-techniques.md +76 -0
  13. package/templates/bundle-ai-agents/skills/context-engineering/references/context-budget-calculator.md +45 -0
  14. package/templates/bundle-ai-agents/skills/database-modeling/SKILL.md +146 -19
  15. package/templates/bundle-ai-agents/skills/database-modeling/references/index-strategies.md +48 -0
  16. package/templates/bundle-ai-agents/skills/database-modeling/references/naming-conventions.md +27 -0
  17. package/templates/bundle-ai-agents/skills/docker-containerization/SKILL.md +124 -15
  18. package/templates/bundle-ai-agents/skills/docker-containerization/references/compose-patterns.md +97 -0
  19. package/templates/bundle-ai-agents/skills/docker-containerization/references/dockerfile-checklist.md +37 -0
  20. package/templates/bundle-ai-agents/skills/eval-testing/SKILL.md +113 -25
  21. package/templates/bundle-ai-agents/skills/eval-testing/references/eval-types.md +52 -0
  22. package/templates/bundle-ai-agents/skills/eval-testing/references/golden-dataset-template.md +59 -0
  23. package/templates/bundle-ai-agents/skills/memory-management/SKILL.md +112 -28
  24. package/templates/bundle-ai-agents/skills/memory-management/references/memory-tiers.md +41 -0
  25. package/templates/bundle-ai-agents/skills/memory-management/references/namespace-conventions.md +41 -0
  26. package/templates/bundle-ai-agents/skills/prompt-engineering/SKILL.md +139 -47
  27. package/templates/bundle-ai-agents/skills/prompt-engineering/references/anti-patterns.md +59 -0
  28. package/templates/bundle-ai-agents/skills/prompt-engineering/references/prompt-templates.md +75 -0
  29. package/templates/bundle-ai-agents/skills/rag-pipeline/SKILL.md +104 -27
  30. package/templates/bundle-ai-agents/skills/rag-pipeline/references/chunking-strategies.md +27 -0
  31. package/templates/bundle-ai-agents/skills/rag-pipeline/references/embedding-models.md +31 -0
  32. package/templates/bundle-ai-agents/skills/rag-pipeline/references/rag-evaluation.md +39 -0
  33. package/templates/bundle-ai-agents/skills/testing-strategy/SKILL.md +127 -18
  34. package/templates/bundle-ai-agents/skills/testing-strategy/references/fixture-patterns.md +81 -0
  35. package/templates/bundle-ai-agents/skills/testing-strategy/references/naming-conventions.md +69 -0
  36. package/templates/bundle-base/skills/branch-strategy/SKILL.md +134 -21
  37. package/templates/bundle-base/skills/branch-strategy/references/branch-rules.md +40 -0
  38. package/templates/bundle-base/skills/code-review/SKILL.md +123 -38
  39. package/templates/bundle-base/skills/code-review/references/review-checklist.md +45 -0
  40. package/templates/bundle-base/skills/commit-pattern/SKILL.md +98 -39
  41. package/templates/bundle-base/skills/commit-pattern/references/conventional-commits.md +40 -0
  42. package/templates/bundle-data-pipeline/skills/data-preprocessing/SKILL.md +110 -19
  43. package/templates/bundle-data-pipeline/skills/data-preprocessing/references/pandas-cheatsheet.md +63 -0
  44. package/templates/bundle-data-pipeline/skills/data-preprocessing/references/pandera-schemas.md +44 -0
  45. package/templates/bundle-data-pipeline/skills/docker-containerization/SKILL.md +132 -16
  46. package/templates/bundle-data-pipeline/skills/docker-containerization/references/compose-patterns.md +82 -0
  47. package/templates/bundle-data-pipeline/skills/docker-containerization/references/dockerfile-best-practices.md +57 -0
  48. package/templates/bundle-data-pipeline/skills/feature-engineering/SKILL.md +143 -45
  49. package/templates/bundle-data-pipeline/skills/feature-engineering/references/encoding-guide.md +41 -0
  50. package/templates/bundle-data-pipeline/skills/feature-engineering/references/scaling-guide.md +38 -0
  51. package/templates/bundle-data-pipeline/skills/mlops-pipeline/SKILL.md +156 -37
  52. package/templates/bundle-data-pipeline/skills/mlops-pipeline/references/mlflow-commands.md +69 -0
  53. package/templates/bundle-data-pipeline/skills/model-training/SKILL.md +152 -33
  54. package/templates/bundle-data-pipeline/skills/model-training/references/evaluation-metrics.md +52 -0
  55. package/templates/bundle-data-pipeline/skills/model-training/references/model-selection-guide.md +41 -0
  56. package/templates/bundle-data-pipeline/skills/rag-pipeline/SKILL.md +127 -39
  57. package/templates/bundle-data-pipeline/skills/rag-pipeline/references/chunking-strategies.md +51 -0
  58. package/templates/bundle-data-pipeline/skills/rag-pipeline/references/embedding-models.md +49 -0
  59. package/templates/bundle-frontend-spa/skills/authentication/SKILL.md +196 -13
  60. package/templates/bundle-frontend-spa/skills/authentication/references/jwt-security.md +41 -0
  61. package/templates/bundle-frontend-spa/skills/component-design/SKILL.md +191 -41
  62. package/templates/bundle-frontend-spa/skills/component-design/references/accessibility-checklist.md +41 -0
  63. package/templates/bundle-frontend-spa/skills/component-design/references/tailwind-patterns.md +65 -0
  64. package/templates/bundle-frontend-spa/skills/e2e-testing/SKILL.md +241 -79
  65. package/templates/bundle-frontend-spa/skills/e2e-testing/references/playwright-selectors.md +66 -0
  66. package/templates/bundle-frontend-spa/skills/e2e-testing/references/test-patterns.md +82 -0
  67. package/templates/bundle-frontend-spa/skills/integration-api/SKILL.md +221 -31
  68. package/templates/bundle-frontend-spa/skills/integration-api/references/api-patterns.md +81 -0
  69. package/templates/bundle-frontend-spa/skills/react-patterns/SKILL.md +195 -70
  70. package/templates/bundle-frontend-spa/skills/react-patterns/references/component-checklist.md +22 -0
  71. package/templates/bundle-frontend-spa/skills/react-patterns/references/hook-patterns.md +63 -0
  72. package/templates/bundle-frontend-spa/skills/responsive-layout/SKILL.md +162 -22
  73. package/templates/bundle-frontend-spa/skills/responsive-layout/references/breakpoint-guide.md +63 -0
  74. package/templates/bundle-frontend-spa/skills/state-management/SKILL.md +158 -30
  75. package/templates/bundle-frontend-spa/skills/state-management/references/react-query-config.md +64 -0
  76. package/templates/bundle-frontend-spa/skills/state-management/references/state-patterns.md +78 -0
  77. package/templates/bundle-jhipster-microservices/skills/ci-cd-pipeline/SKILL.md +135 -45
  78. package/templates/bundle-jhipster-microservices/skills/ci-cd-pipeline/references/gitlab-ci-templates.md +93 -0
  79. package/templates/bundle-jhipster-microservices/skills/clean-architecture/SKILL.md +87 -21
  80. package/templates/bundle-jhipster-microservices/skills/clean-architecture/references/layer-rules.md +78 -0
  81. package/templates/bundle-jhipster-microservices/skills/ddd-tactical/SKILL.md +94 -25
  82. package/templates/bundle-jhipster-microservices/skills/ddd-tactical/references/ddd-patterns.md +48 -0
  83. package/templates/bundle-jhipster-microservices/skills/jhipster-angular/SKILL.md +63 -21
  84. package/templates/bundle-jhipster-microservices/skills/jhipster-angular/references/angular-microservices.md +40 -0
  85. package/templates/bundle-jhipster-microservices/skills/jhipster-angular/references/angular-structure.md +59 -0
  86. package/templates/bundle-jhipster-microservices/skills/jhipster-docker-k8s/SKILL.md +125 -91
  87. package/templates/bundle-jhipster-microservices/skills/jhipster-docker-k8s/references/docker-k8s-commands.md +68 -0
  88. package/templates/bundle-jhipster-microservices/skills/jhipster-entities/SKILL.md +72 -20
  89. package/templates/bundle-jhipster-microservices/skills/jhipster-entities/references/cross-service-entities.md +36 -0
  90. package/templates/bundle-jhipster-microservices/skills/jhipster-entities/references/jdl-types.md +56 -0
  91. package/templates/bundle-jhipster-microservices/skills/jhipster-gateway/SKILL.md +80 -8
  92. package/templates/bundle-jhipster-microservices/skills/jhipster-gateway/references/gateway-config.md +43 -0
  93. package/templates/bundle-jhipster-microservices/skills/jhipster-kafka/SKILL.md +115 -22
  94. package/templates/bundle-jhipster-microservices/skills/jhipster-kafka/references/kafka-events.md +39 -0
  95. package/templates/bundle-jhipster-microservices/skills/jhipster-registry/SKILL.md +92 -23
  96. package/templates/bundle-jhipster-microservices/skills/jhipster-registry/references/consul-config.md +61 -0
  97. package/templates/bundle-jhipster-microservices/skills/jhipster-service/SKILL.md +81 -18
  98. package/templates/bundle-jhipster-microservices/skills/jhipster-service/references/service-patterns.md +40 -0
  99. package/templates/bundle-jhipster-microservices/skills/testing-strategy/SKILL.md +101 -20
  100. package/templates/bundle-jhipster-microservices/skills/testing-strategy/references/test-naming.md +55 -0
  101. package/templates/bundle-jhipster-monorepo/skills/clean-architecture/SKILL.md +87 -21
  102. package/templates/bundle-jhipster-monorepo/skills/clean-architecture/references/layer-rules.md +78 -0
  103. package/templates/bundle-jhipster-monorepo/skills/ddd-tactical/SKILL.md +94 -25
  104. package/templates/bundle-jhipster-monorepo/skills/ddd-tactical/references/ddd-patterns.md +48 -0
  105. package/templates/bundle-jhipster-monorepo/skills/jhipster-angular/SKILL.md +99 -52
  106. package/templates/bundle-jhipster-monorepo/skills/jhipster-angular/references/angular-structure.md +59 -0
  107. package/templates/bundle-jhipster-monorepo/skills/jhipster-entities/SKILL.md +89 -36
  108. package/templates/bundle-jhipster-monorepo/skills/jhipster-entities/references/jdl-types.md +56 -0
  109. package/templates/bundle-jhipster-monorepo/skills/jhipster-liquibase/SKILL.md +123 -23
  110. package/templates/bundle-jhipster-monorepo/skills/jhipster-liquibase/references/liquibase-operations.md +95 -0
  111. package/templates/bundle-jhipster-monorepo/skills/jhipster-security/SKILL.md +106 -19
  112. package/templates/bundle-jhipster-monorepo/skills/jhipster-security/references/security-checklist.md +47 -0
  113. package/templates/bundle-jhipster-monorepo/skills/jhipster-spring/SKILL.md +84 -16
  114. package/templates/bundle-jhipster-monorepo/skills/jhipster-spring/references/spring-layers.md +41 -0
  115. package/templates/bundle-jhipster-monorepo/skills/testing-strategy/SKILL.md +101 -20
  116. package/templates/bundle-jhipster-monorepo/skills/testing-strategy/references/test-naming.md +55 -0
package/templates/bundle-data-pipeline/skills/data-preprocessing/references/pandera-schemas.md
@@ -0,0 +1,44 @@
+ # Pandera Schema Validation Reference
+
+ ## Basic Schema
+ ```python
+ import pandera as pa
+
+ schema = pa.DataFrameSchema({
+     "id": pa.Column(int, nullable=False, unique=True),
+     "name": pa.Column(str, nullable=False),
+     "score": pa.Column(float, pa.Check.between(0, 100)),
+     "status": pa.Column(str, pa.Check.isin(["active", "inactive"])),
+ })
+
+ validated = schema.validate(df)
+ ```
+
+ ## Common Checks
+ ```python
+ pa.Check.between(0, 100)             # range check
+ pa.Check.isin(["a", "b", "c"])       # allowed values
+ pa.Check.str_matches(r"^\d{3}$")     # regex match
+ pa.Check.gt(0)                       # greater than
+ pa.Check.le(1.0)                     # less than or equal
+ pa.Check(lambda s: s.str.len() > 3)  # custom check
+ ```
+
+ ## Schema-Level Checks
+ ```python
+ schema = pa.DataFrameSchema(
+     columns={...},
+     checks=[
+         pa.Check(lambda df: df["end_date"] > df["start_date"]),
+     ],
+     index=pa.Index(int, name="idx"),
+     coerce=True,  # auto-coerce types
+ )
+ ```
+
+ ## Decorator Validation
+ ```python
+ @pa.check_input(schema)
+ def process_data(df: pd.DataFrame) -> pd.DataFrame:
+     return df.assign(processed=True)
+ ```
package/templates/bundle-data-pipeline/skills/docker-containerization/SKILL.md
@@ -1,12 +1,50 @@
  ---
  name: docker-containerization
- description: Criar Dockerfiles otimizados com multi-stage build, security hardening e docker-compose para desenvolvimento. Use quando for containerizar aplicações, criar Dockerfiles, ou configurar ambiente de dev.
+ description: Create optimized Dockerfiles with multi-stage builds, security hardening, and docker-compose for development environments. Use when you need to containerize an application, write Dockerfiles, set up docker-compose, or debug container issues.
+ version: 1.0.0
+ author: Maestro
  ---

  # Docker Containerization

- ## Dockerfile Python Multi-stage
+ Build production-ready containers with multi-stage builds, security best practices, and full docker-compose dev environments.

+ ## When to Use
+ - User needs to containerize a Python/Node.js application
+ - User wants to create a docker-compose setup for local development
+ - User needs to optimize Docker image size with multi-stage builds
+ - User wants to add health checks, non-root users, or security hardening
+ - User needs to debug container build or runtime issues
+
+ ## Available Operations
+ 1. Create a multi-stage Dockerfile for Python (FastAPI/Flask)
+ 2. Create a multi-stage Dockerfile for React/Node.js
+ 3. Set up docker-compose with PostgreSQL, Redis, MinIO
+ 4. Configure .dockerignore for optimal build context
+ 5. Debug build failures and optimize image size
+
+ ## Multi-Step Workflow
+
+ ### Step 1: Create .dockerignore
+ ```bash
+ cat > .dockerignore << 'EOF'
+ .git
+ node_modules
+ __pycache__
+ *.pyc
+ .env
+ .venv
+ dist
+ build
+ coverage
+ .pytest_cache
+ .mypy_cache
+ *.egg-info
+ .DS_Store
+ EOF
+ ```
+
+ ### Step 2: Write Dockerfile for Python API (Multi-Stage)
  ```dockerfile
  # === Build stage ===
  FROM python:3.11-slim AS builder
@@ -18,18 +56,25 @@ RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
  # === Runtime stage ===
  FROM python:3.11-slim
  WORKDIR /app
+
+ # Security: non-root user
  RUN groupadd -r appuser && useradd -r -g appuser appuser
+
+ # Copy only installed packages from builder
  COPY --from=builder /install /usr/local
  COPY src/ ./src/
+
+ # Switch to non-root
  USER appuser
+
  EXPOSE 8000
  HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8000/health || exit 1
  CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
  ```

- ## Dockerfile React Multi-stage
-
+ ### Step 3: Write Dockerfile for React Frontend (Multi-Stage)
  ```dockerfile
+ # === Build stage ===
  FROM node:20-slim AS builder
  WORKDIR /app
  COPY package*.json ./
@@ -37,14 +82,33 @@ RUN npm ci
  COPY . .
  RUN npm run build

+ # === Runtime stage ===
  FROM nginx:alpine
  COPY --from=builder /app/dist /usr/share/nginx/html
  COPY nginx.conf /etc/nginx/conf.d/default.conf
  EXPOSE 80
+ HEALTHCHECK --interval=30s --timeout=5s CMD wget -q --spider http://localhost/ || exit 1
  ```

- ## Docker Compose Dev
+ ### Step 4: Build and Test Locally
+ ```bash
+ # Build Python API image
+ docker build -t myapp-api -f docker/Dockerfile.api .
+
+ # Build React frontend image
+ docker build -t myapp-frontend -f docker/Dockerfile.frontend .
+
+ # Verify image sizes
+ docker images | grep myapp
+
+ # Test API container
+ docker run --rm -p 8000:8000 myapp-api

+ # Test frontend container
+ docker run --rm -p 3000:80 myapp-frontend
+ ```
+
+ ### Step 5: Set Up docker-compose for Development
  ```yaml
  # docker-compose.dev.yml
  services:
@@ -98,17 +162,69 @@ volumes:
    pgdata:
  ```

- ## .dockerignore
+ ### Step 6: Start and Manage the Stack
+ ```bash
+ # Start all services
+ docker compose -f docker-compose.dev.yml up -d
+
+ # Check service health
+ docker compose -f docker-compose.dev.yml ps

+ # View logs
+ docker compose -f docker-compose.dev.yml logs -f api
+
+ # Stop everything
+ docker compose -f docker-compose.dev.yml down
+
+ # Stop and remove volumes (clean slate)
+ docker compose -f docker-compose.dev.yml down -v
  ```
- .git
- node_modules
- __pycache__
- *.pyc
- .env
- .venv
- dist
- build
- coverage
- .pytest_cache
+
+ ### Step 7: Debug Common Issues
+ ```bash
+ # Check why a container exited
+ docker compose -f docker-compose.dev.yml logs api
+
+ # Shell into a running container
+ docker compose -f docker-compose.dev.yml exec api bash
+
+ # Check container resource usage
+ docker stats
+
+ # Rebuild a single service after code changes
+ docker compose -f docker-compose.dev.yml up -d --build api
+
+ # Prune unused images to free disk space
+ docker image prune -f
  ```
+
+ ## Resources
+ - `references/dockerfile-best-practices.md` - Security and optimization patterns
+ - `references/compose-patterns.md` - Common docker-compose service configurations
+
+ ## Examples
+ ### Example 1: Containerize a FastAPI App
+ User asks: "Create a Dockerfile for our Python API"
+ Response approach:
+ 1. Create .dockerignore to exclude unnecessary files
+ 2. Write multi-stage Dockerfile (builder + runtime)
+ 3. Add non-root user and health check
+ 4. Build and test with `docker build` and `docker run`
+ 5. Verify image size with `docker images`
+
+ ### Example 2: Set Up Local Dev Environment
+ User asks: "Set up docker-compose with Postgres and Redis for development"
+ Response approach:
+ 1. Create docker-compose.dev.yml with all services
+ 2. Add health checks and dependency ordering
+ 3. Mount source code as volume for hot reload
+ 4. Start with `docker compose up -d`
+ 5. Verify services with `docker compose ps`
+
+ ## Notes
+ - Always use multi-stage builds to keep production images small
+ - Never run containers as root -- create a dedicated appuser
+ - Add HEALTHCHECK to every Dockerfile for orchestrator integration
+ - Use `.dockerignore` to keep build context small and fast
+ - Pin base image versions (python:3.11-slim, not python:latest)
+ - Mount source code as volumes only in dev, not in production builds
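Step 3's React Dockerfile copies an `nginx.conf` into the image, but that file appears nowhere in this diff. For readers who want to try the Dockerfile as-is, a minimal SPA-style config might look like the following — an illustrative assumption, not part of the published package:

```nginx
server {
    listen 80;
    root /usr/share/nginx/html;
    index index.html;

    # SPA fallback: serve index.html for client-side routes
    location / {
        try_files $uri $uri/ /index.html;
    }

    # Lightweight endpoint the wget HEALTHCHECK could probe
    location = /health {
        add_header Content-Type text/plain;
        return 200 'ok';
    }
}
```

The `try_files` fallback is what keeps deep links (e.g. `/users/42`) from returning 404 in a single-page app.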
package/templates/bundle-data-pipeline/skills/docker-containerization/references/compose-patterns.md
@@ -0,0 +1,82 @@
+ # Docker Compose Patterns
+
+ ## PostgreSQL with pgvector
+ ```yaml
+ postgres:
+   image: pgvector/pgvector:pg16
+   environment:
+     POSTGRES_DB: mydb
+     POSTGRES_USER: myuser
+     POSTGRES_PASSWORD: mypassword
+   ports:
+     - "5432:5432"
+   volumes:
+     - pgdata:/var/lib/postgresql/data
+     - ./init.sql:/docker-entrypoint-initdb.d/init.sql
+   healthcheck:
+     test: ["CMD-SHELL", "pg_isready -U myuser"]
+     interval: 5s
+     timeout: 5s
+     retries: 5
+ ```
+
+ ## Redis
+ ```yaml
+ redis:
+   image: redis:7-alpine
+   ports:
+     - "6379:6379"
+   volumes:
+     - redisdata:/data
+   healthcheck:
+     test: ["CMD", "redis-cli", "ping"]
+     interval: 5s
+     timeout: 3s
+ ```
+
+ ## MinIO (S3-compatible storage)
+ ```yaml
+ minio:
+   image: minio/minio
+   command: server /data --console-address ":9001"
+   ports:
+     - "9000:9000"
+     - "9001:9001"
+   environment:
+     MINIO_ROOT_USER: minioadmin
+     MINIO_ROOT_PASSWORD: minioadmin
+   volumes:
+     - miniodata:/data
+ ```
+
+ ## MLflow Server
+ ```yaml
+ mlflow:
+   image: ghcr.io/mlflow/mlflow:latest
+   command: mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri postgresql://mlflow:mlflow@postgres/mlflow --default-artifact-root s3://mlflow/
+   ports:
+     - "5000:5000"
+   depends_on:
+     postgres:
+       condition: service_healthy
+ ```
+
+ ## Dependency Ordering
+ ```yaml
+ services:
+   api:
+     depends_on:
+       postgres:
+         condition: service_healthy  # wait for healthy, not just started
+       redis:
+         condition: service_started
+ ```
+
+ ## Development Volumes (hot reload)
+ ```yaml
+ services:
+   api:
+     volumes:
+       - ./src:/app/src     # mount source code
+       - /app/node_modules  # exclude node_modules from mount
+ ```
package/templates/bundle-data-pipeline/skills/docker-containerization/references/dockerfile-best-practices.md
@@ -0,0 +1,57 @@
+ # Dockerfile Best Practices
+
+ ## Multi-Stage Build Pattern
+ ```dockerfile
+ # Stage 1: Build
+ FROM python:3.11-slim AS builder
+ WORKDIR /app
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
+
+ # Stage 2: Runtime (smaller image)
+ FROM python:3.11-slim
+ COPY --from=builder /install /usr/local
+ COPY src/ ./src/
+ CMD ["python", "-m", "src.main"]
+ ```
+
+ ## Security Checklist
+ - [ ] Non-root user: `RUN groupadd -r app && useradd -r -g app app` then `USER app`
+ - [ ] Pin base image versions: `python:3.11-slim` not `python:latest`
+ - [ ] No secrets in build args or ENV: use runtime environment variables
+ - [ ] Minimal packages: `--no-install-recommends` for apt-get
+ - [ ] Clean up apt cache: `rm -rf /var/lib/apt/lists/*`
+ - [ ] Scan images: `docker scout cves myimage`
+
+ ## Layer Optimization
+ ```dockerfile
+ # BAD: creates unnecessary layer cache misses
+ COPY . .
+ RUN pip install -r requirements.txt
+
+ # GOOD: dependencies cached separately from code
+ COPY requirements.txt .
+ RUN pip install -r requirements.txt
+ COPY src/ ./src/
+ ```
+
+ ## Health Checks
+ ```dockerfile
+ # HTTP health check
+ HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # HTTP health check via Python stdlib (no curl needed)
+ HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
+     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+ ```
+
+ ## Image Size Comparison
+ | Base Image | Size | Use Case |
+ |---|---|---|
+ | python:3.11 | ~900MB | Avoid |
+ | python:3.11-slim | ~120MB | Default choice |
+ | python:3.11-alpine | ~50MB | If musl-compatible |
+ | node:20 | ~1GB | Avoid |
+ | node:20-slim | ~200MB | Default choice |
+ | node:20-alpine | ~130MB | If no native deps |
package/templates/bundle-data-pipeline/skills/feature-engineering/SKILL.md
@@ -1,76 +1,174 @@
  ---
  name: feature-engineering
- description: Criar e transformar features para modelos de ML incluindo encoding, scaling, e feature selection. Use quando precisar preparar dados, criar features, ou selecionar variáveis relevantes.
+ description: Create and transform features for ML models including encoding, scaling, and feature selection. Use when you need to prepare data for training, create new features, encode categoricals, or select the most relevant variables.
+ version: 1.0.0
+ author: Maestro
  ---

  # Feature Engineering

- ## Fluxo
+ Build feature pipelines that transform raw data into model-ready inputs using scikit-learn.

- ```
- Dados brutos → Limpeza → Encoding → Scaling → Feature Selection → Dados prontos
- ```
+ ## When to Use
+ - User needs to encode categorical variables (one-hot, ordinal, label)
+ - User needs to scale or normalize numeric features
+ - User wants to select the best features for a model
+ - User needs to create derived features (interactions, aggregations, date parts)
+ - User needs to remove outliers from a dataset
+
+ ## Available Operations
+ 1. Clean data and remove outliers (IQR method)
+ 2. Encode categorical features (OneHot, Ordinal, Label)
+ 3. Scale numeric features (Standard, MinMax, Robust)
+ 4. Create derived features (date parts, interactions, aggregations)
+ 5. Select top features (statistical tests, model importance)
+ 6. Build a reusable sklearn ColumnTransformer pipeline

- ## Limpeza
+ ## Multi-Step Workflow

+ ### Step 1: Install Dependencies
+ ```bash
+ pip install pandas numpy scikit-learn joblib
+ ```
+
+ ### Step 2: Load and Split Data
  ```python
  import pandas as pd
+ import numpy as np
+ from sklearn.model_selection import train_test_split

- def clean_data(df: pd.DataFrame) -> pd.DataFrame:
-     # Remover duplicatas
-     df = df.drop_duplicates()
-
-     # Tratar nulos
-     df['age'] = df['age'].fillna(df['age'].median())
-     df['name'] = df['name'].fillna('Unknown')
+ df = pd.read_parquet("data/processed/dataset_clean.parquet")

-     # Remover outliers (IQR)
-     Q1, Q3 = df['salary'].quantile([0.25, 0.75])
-     IQR = Q3 - Q1
-     df = df[(df['salary'] >= Q1 - 1.5*IQR) & (df['salary'] <= Q3 + 1.5*IQR)]
+ # Separate target
+ X = df.drop(columns=["target"])
+ y = df["target"]

-     # Tipagem
-     df['created_at'] = pd.to_datetime(df['created_at'])
+ # Split BEFORE any fitting -- prevents data leakage
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+ print(f"Train: {X_train.shape}, Test: {X_test.shape}")
+ ```

+ ### Step 3: Remove Outliers (IQR Method)
+ ```python
+ def remove_outliers_iqr(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
+     df = df.copy()
+     for col in columns:
+         Q1, Q3 = df[col].quantile([0.25, 0.75])
+         IQR = Q3 - Q1
+         mask = (df[col] >= Q1 - 1.5 * IQR) & (df[col] <= Q3 + 1.5 * IQR)
+         before = len(df)
+         df = df[mask]
+         print(f"  {col}: removed {before - len(df)} outliers")
      return df
- ```

- ## Encoding
+ numeric_cols = X_train.select_dtypes(include=[np.number]).columns.tolist()
+ X_train = remove_outliers_iqr(X_train, numeric_cols)
+ y_train = y_train.loc[X_train.index]
+ ```

+ ### Step 4: Build Encoding and Scaling Pipeline
  ```python
- from sklearn.preprocessing import OneHotEncoder, LabelEncoder, OrdinalEncoder
+ from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
+ from sklearn.compose import ColumnTransformer
+ from sklearn.pipeline import Pipeline
+
+ numeric_features = ["age", "salary", "experience"]
+ categorical_features = ["department", "city"]
+ ordinal_features = ["level"]
+
+ preprocessor = ColumnTransformer(
+     transformers=[
+         ("num", StandardScaler(), numeric_features),
+         ("cat", OneHotEncoder(sparse_output=False, handle_unknown="ignore"), categorical_features),
+         ("ord", OrdinalEncoder(categories=[["junior", "mid", "senior"]]), ordinal_features),
+     ],
+     remainder="drop"
+ )
+
+ # Fit on train only, transform both
+ X_train_transformed = preprocessor.fit_transform(X_train)
+ X_test_transformed = preprocessor.transform(X_test)
+ print(f"Features after transform: {X_train_transformed.shape[1]}")
+ ```

- # Categorias sem ordem → OneHotEncoder
- ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
- encoded = ohe.fit_transform(df[['department', 'city']])
+ ### Step 5: Feature Selection
+ ```python
+ from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

- # Categorias com ordem → OrdinalEncoder
- oe = OrdinalEncoder(categories=[['junior', 'pleno', 'senior']])
- df['level_encoded'] = oe.fit_transform(df[['level']])
+ # Statistical filter
+ selector = SelectKBest(score_func=f_classif, k=10)
+ X_selected = selector.fit_transform(X_train_transformed, y_train)

- # Target → LabelEncoder
- le = LabelEncoder()
- y = le.fit_transform(df['target'])
+ # Get selected feature names
+ feature_names = preprocessor.get_feature_names_out()
+ selected_mask = selector.get_support()
+ selected_features = feature_names[selected_mask]
+ print(f"Selected features: {list(selected_features)}")
  ```

- ## Feature Selection
+ Alternatively, use model-based importance:
+ ```python
+ from sklearn.ensemble import RandomForestClassifier
+
+ rf = RandomForestClassifier(n_estimators=100, random_state=42)
+ rf.fit(X_train_transformed, y_train)
+ importances = pd.Series(rf.feature_importances_, index=feature_names)
+ top_features = importances.nlargest(10)
+ print(top_features)
+ ```

+ ### Step 6: Save Transformer for Reuse
+ ```bash
+ mkdir -p models
+ ```
  ```python
- from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
+ import joblib

- # Filtro estatístico
- selector = SelectKBest(score_func=f_classif, k=10)
- X_selected = selector.fit_transform(X, y)
+ joblib.dump(preprocessor, "models/preprocessor_v1.pkl")
+ print("Saved preprocessor to models/preprocessor_v1.pkl")

- # Feature importance do modelo
- model.fit(X, y)
- importances = pd.Series(model.feature_importances_, index=feature_names)
- top_features = importances.nlargest(10)
+ # To reload later:
+ # preprocessor = joblib.load("models/preprocessor_v1.pkl")
  ```

- ## Regras
+ ### Step 7: Verify Pipeline End-to-End
+ ```bash
+ python -c "
+ import joblib, pandas as pd
+ p = joblib.load('models/preprocessor_v1.pkl')
+ df = pd.read_parquet('data/processed/dataset_clean.parquet').head(5)
+ X = df.drop(columns=['target'])
+ result = p.transform(X)
+ print(f'Input: {X.shape} -> Output: {result.shape}')
+ "
+ ```

- 1. Nunca usar dados do test set para fit do scaler/encoder
- 2. Salvar transformers junto com o modelo (pickle/joblib)
- 3. Documentar cada feature criada (nome, tipo, origem)
- 4. Verificar correlação entre features (remover redundantes)
+ ## Resources
+ - `references/encoding-guide.md` - When to use which encoder
+ - `references/scaling-guide.md` - Scaler comparison and selection
+
+ ## Examples
+ ### Example 1: Encode Categoricals
+ User asks: "Encode the department and city columns for my classifier"
+ Response approach:
+ 1. Identify column cardinality with `df['col'].nunique()`
+ 2. Use OneHotEncoder for low-cardinality unordered categoricals
+ 3. Use OrdinalEncoder for ordered categoricals
+ 4. Build a ColumnTransformer and fit on training data only
+ 5. Save the fitted transformer with joblib
+
+ ### Example 2: Select Best Features
+ User asks: "Which features matter most for predicting churn?"
+ Response approach:
+ 1. Preprocess all features through the ColumnTransformer
+ 2. Run SelectKBest with f_classif to rank features
+ 3. Also run RandomForest feature_importances_ for comparison
+ 4. Report the top 10 features from both methods
+ 5. Recommend dropping low-importance features
+
+ ## Notes
+ - Never fit encoders/scalers on test data -- fit on train, transform both
+ - Save transformers alongside the model with joblib for reproducibility
+ - Document each created feature: name, type, source column, transformation
+ - Check correlation between features and remove redundants (threshold > 0.95)
+ - For high-cardinality categoricals (>50 values), consider target encoding
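The Notes in the feature-engineering skill above recommend dropping features whose pairwise correlation exceeds 0.95, but no code for that appears in the diff. A minimal sketch of the idea (the helper name `drop_correlated` is hypothetical, not from the package):

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from each pair whose |correlation| exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "c": [1.0, 0.0, 1.0, 0.0],
})
print(drop_correlated(df).columns.tolist())  # ['a', 'c']
```

Dropping from the upper triangle keeps the first feature of each redundant pair, which is the usual convention for this recipe.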
package/templates/bundle-data-pipeline/skills/feature-engineering/references/encoding-guide.md
@@ -0,0 +1,41 @@
+ # Encoding Guide
+
+ ## Decision Matrix
+
+ | Scenario | Encoder | Example |
+ |---|---|---|
+ | Unordered, low cardinality (<15) | OneHotEncoder | department, color |
+ | Ordered categories | OrdinalEncoder | level (junior/mid/senior) |
+ | Binary target variable | LabelEncoder | yes/no, churn/retain |
+ | High cardinality (>50) | TargetEncoder | zip_code, product_id |
+ | Text-like categories | HashingEncoder | free-text categories |
+
+ ## OneHotEncoder
+ ```python
+ from sklearn.preprocessing import OneHotEncoder
+ ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
+ encoded = ohe.fit_transform(df[['department', 'city']])
+ feature_names = ohe.get_feature_names_out()
+ ```
+
+ ## OrdinalEncoder
+ ```python
+ from sklearn.preprocessing import OrdinalEncoder
+ oe = OrdinalEncoder(categories=[['junior', 'mid', 'senior']])
+ df['level_encoded'] = oe.fit_transform(df[['level']])
+ ```
+
+ ## LabelEncoder (target only)
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+ le = LabelEncoder()
+ y = le.fit_transform(df['target'])
+ # Decode: le.inverse_transform(y)
+ ```
+
+ ## TargetEncoder (high cardinality)
+ ```python
+ from sklearn.preprocessing import TargetEncoder
+ te = TargetEncoder(smooth="auto")
+ df['zip_encoded'] = te.fit_transform(df[['zip_code']], y)
+ ```
package/templates/bundle-data-pipeline/skills/feature-engineering/references/scaling-guide.md
@@ -0,0 +1,38 @@
+ # Scaling Guide
+
+ ## Decision Matrix
+
+ | Scenario | Scaler | When to Use |
+ |---|---|---|
+ | Normal distribution, no outliers | StandardScaler | Default choice for most models |
+ | Need 0-1 range | MinMaxScaler | Neural networks, image data |
+ | Data has outliers | RobustScaler | Uses median/IQR, outlier-resistant |
+ | Sparse data | MaxAbsScaler | Preserves sparsity |
+
+ ## StandardScaler (z-score normalization)
+ ```python
+ from sklearn.preprocessing import StandardScaler
+ scaler = StandardScaler()
+ X_scaled = scaler.fit_transform(X_train)
+ X_test_scaled = scaler.transform(X_test)
+ ```
+
+ ## MinMaxScaler (0-1 range)
+ ```python
+ from sklearn.preprocessing import MinMaxScaler
+ scaler = MinMaxScaler(feature_range=(0, 1))
+ X_scaled = scaler.fit_transform(X_train)
+ ```
+
+ ## RobustScaler (outlier-resistant)
+ ```python
+ from sklearn.preprocessing import RobustScaler
+ scaler = RobustScaler()
+ X_scaled = scaler.fit_transform(X_train)
+ ```
+
+ ## Important Rules
+ 1. Always fit on training data only
+ 2. Save the scaler with joblib alongside the model
+ 3. Tree-based models (RF, XGBoost) do NOT need scaling
+ 4. Linear models, SVM, KNN, neural nets DO need scaling
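The outlier-resistance claim in the scaling guide above can be seen directly: one extreme value inflates StandardScaler's mean and standard deviation, squashing the ordinary points, while RobustScaler's median/IQR centering is barely affected. A small sketch with made-up data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Nine ordinary values plus one extreme outlier
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0],
              [6.0], [7.0], [8.0], [9.0], [1000.0]])

std = StandardScaler().fit_transform(X)
rob = RobustScaler().fit_transform(X)

# StandardScaler: the outlier drags the mean to ~104.5 and the std to
# ~300, so every ordinary point collapses to roughly -0.35.
# RobustScaler: centers on the median (5.5) and divides by the IQR,
# so ordinary points stay well spread out (first value lands at -1.0).
print(std[:3].round(2).ravel())
print(rob[:3].round(2).ravel())
```

This is why the matrix recommends RobustScaler whenever outliers have not already been removed upstream.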