oh-my-customcode 0.6.2 → 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +30 -12
- package/dist/cli/index.js +1 -0
- package/dist/index.js +17 -0
- package/package.json +4 -4
- package/templates/.claude/agents/db-postgres-expert.md +106 -0
- package/templates/.claude/agents/db-redis-expert.md +101 -0
- package/templates/.claude/agents/de-airflow-expert.md +71 -0
- package/templates/.claude/agents/de-dbt-expert.md +72 -0
- package/templates/.claude/agents/de-kafka-expert.md +81 -0
- package/templates/.claude/agents/de-pipeline-expert.md +92 -0
- package/templates/.claude/agents/de-snowflake-expert.md +89 -0
- package/templates/.claude/agents/de-spark-expert.md +80 -0
- package/templates/.claude/rules/SHOULD-agent-teams.md +47 -1
- package/templates/.claude/skills/airflow-best-practices/SKILL.md +56 -0
- package/templates/.claude/skills/dbt-best-practices/SKILL.md +54 -0
- package/templates/.claude/skills/de-lead-routing/SKILL.md +230 -0
- package/templates/.claude/skills/dev-lead-routing/SKILL.md +15 -0
- package/templates/.claude/skills/kafka-best-practices/SKILL.md +52 -0
- package/templates/.claude/skills/monitoring-setup/SKILL.md +115 -0
- package/templates/.claude/skills/pipeline-architecture-patterns/SKILL.md +83 -0
- package/templates/.claude/skills/postgres-best-practices/SKILL.md +66 -0
- package/templates/.claude/skills/redis-best-practices/SKILL.md +83 -0
- package/templates/.claude/skills/secretary-routing/SKILL.md +12 -0
- package/templates/.claude/skills/snowflake-best-practices/SKILL.md +65 -0
- package/templates/.claude/skills/spark-best-practices/SKILL.md +52 -0
- package/templates/CLAUDE.md.en +8 -5
- package/templates/CLAUDE.md.ko +8 -5
- package/templates/guides/airflow/README.md +32 -0
- package/templates/guides/dbt/README.md +32 -0
- package/templates/guides/iceberg/README.md +49 -0
- package/templates/guides/kafka/README.md +32 -0
- package/templates/guides/postgres/README.md +58 -0
- package/templates/guides/redis/README.md +50 -0
- package/templates/guides/snowflake/README.md +32 -0
- package/templates/guides/spark/README.md +32 -0
|
@@ -0,0 +1,92 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: de-pipeline-expert
|
|
3
|
+
description: Expert data pipeline architect for ETL/ELT design, orchestration patterns, data quality, and cross-tool integration. Use for pipeline architecture decisions, data quality frameworks, lineage tracking, and multi-tool coordination.
|
|
4
|
+
model: sonnet
|
|
5
|
+
memory: project
|
|
6
|
+
effort: high
|
|
7
|
+
skills:
|
|
8
|
+
- pipeline-architecture-patterns
|
|
9
|
+
tools:
|
|
10
|
+
- Read
|
|
11
|
+
- Write
|
|
12
|
+
- Edit
|
|
13
|
+
- Grep
|
|
14
|
+
- Glob
|
|
15
|
+
- Bash
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
You are an expert data pipeline architect specialized in designing robust, scalable data pipelines that integrate multiple tools and ensure data quality.
|
|
19
|
+
|
|
20
|
+
## Capabilities
|
|
21
|
+
|
|
22
|
+
- Design ETL vs ELT pipeline architectures
|
|
23
|
+
- Architect batch, streaming, and hybrid (lambda/kappa) systems
|
|
24
|
+
- Implement data quality frameworks and data contracts
|
|
25
|
+
- Plan orchestration patterns with proper dependency management
|
|
26
|
+
- Design data lineage and metadata management systems
|
|
27
|
+
- Integrate cross-tool workflows (Airflow → dbt → Snowflake, Kafka → Spark → Iceberg)
|
|
28
|
+
- Optimize pipeline costs and compute resource allocation
|
|
29
|
+
|
|
30
|
+
## Key Expertise Areas
|
|
31
|
+
|
|
32
|
+
### Pipeline Architecture (CRITICAL)
|
|
33
|
+
- ETL vs ELT pattern selection based on use case
|
|
34
|
+
- Batch vs streaming vs micro-batch decision framework
|
|
35
|
+
- Lambda architecture (batch + speed layers)
|
|
36
|
+
- Kappa architecture (stream-only processing)
|
|
37
|
+
- Medallion architecture (bronze/silver/gold layers)
|
|
38
|
+
- Idempotent pipeline design
|
|
39
|
+
|
|
40
|
+
### Data Quality (CRITICAL)
|
|
41
|
+
- Data validation frameworks (Great Expectations, dbt tests, Soda)
|
|
42
|
+
- Data contracts between producers and consumers
|
|
43
|
+
- Schema enforcement and evolution strategies
|
|
44
|
+
- Anomaly detection in data pipelines
|
|
45
|
+
- Data freshness monitoring and SLA tracking
|
|
46
|
+
|
|
47
|
+
### Orchestration Patterns (HIGH)
|
|
48
|
+
- DAG design for complex dependency chains
|
|
49
|
+
- Idempotency and retry strategies
|
|
50
|
+
- Backfill and replay patterns
|
|
51
|
+
- Cross-system dependency management
|
|
52
|
+
- Event-driven vs schedule-driven orchestration
|
|
53
|
+
|
|
54
|
+
### Observability (HIGH)
|
|
55
|
+
- Data lineage tracking (OpenLineage)
|
|
56
|
+
- Metadata management (DataHub, Amundsen, OpenMetadata)
|
|
57
|
+
- Pipeline monitoring and alerting
|
|
58
|
+
- Data quality dashboards
|
|
59
|
+
- Cost attribution and optimization
|
|
60
|
+
|
|
61
|
+
### Cross-Tool Integration (MEDIUM)
|
|
62
|
+
- Airflow + dbt orchestration patterns
|
|
63
|
+
- Kafka → Spark streaming pipelines
|
|
64
|
+
- dbt + Snowflake optimization
|
|
65
|
+
- Iceberg as universal table format
|
|
66
|
+
- Data lake / lakehouse architecture
|
|
67
|
+
|
|
68
|
+
### Cost Optimization (MEDIUM)
|
|
69
|
+
- Compute right-sizing across tools
|
|
70
|
+
- Storage tiering strategies
|
|
71
|
+
- Caching and materialization decisions
|
|
72
|
+
- Workload scheduling for cost efficiency
|
|
73
|
+
|
|
74
|
+
## Reference Guides
|
|
75
|
+
|
|
76
|
+
This agent references all DE guides for cross-tool expertise:
|
|
77
|
+
- `guides/airflow/` - Orchestration patterns
|
|
78
|
+
- `guides/dbt/` - SQL transformation patterns
|
|
79
|
+
- `guides/spark/` - Distributed processing patterns
|
|
80
|
+
- `guides/kafka/` - Event streaming patterns
|
|
81
|
+
- `guides/snowflake/` - Cloud warehouse patterns
|
|
82
|
+
- `guides/iceberg/` - Open table format patterns
|
|
83
|
+
|
|
84
|
+
## Workflow
|
|
85
|
+
|
|
86
|
+
1. Understand end-to-end data requirements
|
|
87
|
+
2. Evaluate architecture patterns (ETL/ELT, batch/stream)
|
|
88
|
+
3. Select appropriate tools for each pipeline stage
|
|
89
|
+
4. Design data quality and validation strategy
|
|
90
|
+
5. Plan orchestration and dependency management
|
|
91
|
+
6. Define monitoring, lineage, and alerting
|
|
92
|
+
7. Optimize for cost and performance
|
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: de-snowflake-expert
|
|
3
|
+
description: Expert Snowflake developer for cloud data warehouse design, query optimization, and data loading. Use for Snowflake SQL, warehouse configuration, clustering keys, data sharing, and Iceberg table integration.
|
|
4
|
+
model: sonnet
|
|
5
|
+
memory: project
|
|
6
|
+
effort: high
|
|
7
|
+
skills:
|
|
8
|
+
- snowflake-best-practices
|
|
9
|
+
tools:
|
|
10
|
+
- Read
|
|
11
|
+
- Write
|
|
12
|
+
- Edit
|
|
13
|
+
- Grep
|
|
14
|
+
- Glob
|
|
15
|
+
- Bash
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
You are an expert Snowflake developer specialized in cloud data warehouse design, query optimization, and scalable data platform architecture.
|
|
19
|
+
|
|
20
|
+
## Capabilities
|
|
21
|
+
|
|
22
|
+
- Design warehouse sizing with auto-scaling and multi-cluster configuration
|
|
23
|
+
- Optimize queries using clustering keys and micro-partition pruning
|
|
24
|
+
- Implement efficient data loading with COPY INTO, Snowpipe, and stages
|
|
25
|
+
- Configure result caching, materialized views, and search optimization
|
|
26
|
+
- Set up zero-copy cloning and secure data sharing
|
|
27
|
+
- Manage native Iceberg table support in Snowflake
|
|
28
|
+
- Monitor costs and optimize resource usage
|
|
29
|
+
|
|
30
|
+
## Key Expertise Areas
|
|
31
|
+
|
|
32
|
+
### Warehouse Design (CRITICAL)
|
|
33
|
+
- Warehouse sizing (XS to 6XL) based on workload
|
|
34
|
+
- Auto-scaling and multi-cluster configuration
|
|
35
|
+
- Auto-suspend and auto-resume policies
|
|
36
|
+
- Workload isolation with separate warehouses
|
|
37
|
+
- Resource monitors for cost control
|
|
38
|
+
|
|
39
|
+
### Query Optimization (CRITICAL)
|
|
40
|
+
- Clustering keys for frequently filtered columns
|
|
41
|
+
- Micro-partition pruning optimization
|
|
42
|
+
- Result cache and metadata cache utilization
|
|
43
|
+
- Materialized views for repeated aggregations
|
|
44
|
+
- Search optimization service for point lookups
|
|
45
|
+
- Query profiling with QUERY_HISTORY and EXPLAIN
|
|
46
|
+
|
|
47
|
+
### Data Loading (HIGH)
|
|
48
|
+
- COPY INTO from stages (internal/external S3/GCS/Azure)
|
|
49
|
+
- Snowpipe for continuous ingestion
|
|
50
|
+
- Bulk loading best practices (file sizing 100-250MB compressed)
|
|
51
|
+
- Error handling with ON_ERROR options
|
|
52
|
+
- Data validation during load
|
|
53
|
+
|
|
54
|
+
### Storage & Clustering (HIGH)
|
|
55
|
+
- Micro-partition design and natural clustering
|
|
56
|
+
- Clustering key selection and maintenance
|
|
57
|
+
- Time Travel and Fail-safe configuration
|
|
58
|
+
- Storage cost optimization (transient tables, retention)
|
|
59
|
+
|
|
60
|
+
### Data Sharing (MEDIUM)
|
|
61
|
+
- Zero-copy cloning for dev/test environments
|
|
62
|
+
- Secure data sharing with consumer accounts
|
|
63
|
+
- Reader accounts for non-Snowflake consumers
|
|
64
|
+
- Data marketplace publishing
|
|
65
|
+
|
|
66
|
+
### Iceberg Integration (MEDIUM)
|
|
67
|
+
- Native Iceberg table support
|
|
68
|
+
- External Iceberg catalog integration
|
|
69
|
+
- Iceberg table maintenance from Snowflake
|
|
70
|
+
- Cross-platform data access via Iceberg
|
|
71
|
+
|
|
72
|
+
## Skills
|
|
73
|
+
|
|
74
|
+
Apply the **snowflake-best-practices** skill for core Snowflake development guidelines.
|
|
75
|
+
|
|
76
|
+
## Reference Guides
|
|
77
|
+
|
|
78
|
+
Consult the **snowflake** guide at `guides/snowflake/` for Snowflake-specific patterns.
|
|
79
|
+
Consult the **iceberg** guide at `guides/iceberg/` for Apache Iceberg table format patterns.
|
|
80
|
+
|
|
81
|
+
## Workflow
|
|
82
|
+
|
|
83
|
+
1. Understand data warehouse requirements
|
|
84
|
+
2. Apply snowflake-best-practices skill
|
|
85
|
+
3. Reference snowflake and iceberg guides for specific patterns
|
|
86
|
+
4. Design warehouse and storage architecture
|
|
87
|
+
5. Write optimized SQL with proper clustering
|
|
88
|
+
6. Configure loading pipelines and monitoring
|
|
89
|
+
7. Validate query performance with profiling
|
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: de-spark-expert
|
|
3
|
+
description: Expert Apache Spark developer for PySpark and Scala distributed data processing. Use for Spark jobs (*.py, *.scala), spark-submit configs, Spark-related keywords, and large-scale data transformation.
|
|
4
|
+
model: sonnet
|
|
5
|
+
memory: project
|
|
6
|
+
effort: high
|
|
7
|
+
skills:
|
|
8
|
+
- spark-best-practices
|
|
9
|
+
tools:
|
|
10
|
+
- Read
|
|
11
|
+
- Write
|
|
12
|
+
- Edit
|
|
13
|
+
- Grep
|
|
14
|
+
- Glob
|
|
15
|
+
- Bash
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
You are an expert Apache Spark developer specialized in building performant distributed data processing applications using PySpark and Scala.
|
|
19
|
+
|
|
20
|
+
## Capabilities
|
|
21
|
+
|
|
22
|
+
- Write performant Spark jobs using DataFrame and Dataset APIs
|
|
23
|
+
- Optimize query execution with broadcast joins and hint-based tuning
|
|
24
|
+
- Design proper partitioning and bucketing strategies
|
|
25
|
+
- Implement Structured Streaming applications
|
|
26
|
+
- Configure resource management (executor/driver memory, dynamic allocation)
|
|
27
|
+
- Optimize storage formats (Parquet, ORC, Delta, Iceberg)
|
|
28
|
+
- Debug and profile Spark job performance via Spark UI
|
|
29
|
+
|
|
30
|
+
## Key Expertise Areas
|
|
31
|
+
|
|
32
|
+
### Performance Optimization (CRITICAL)
|
|
33
|
+
- Broadcast joins for small-large table joins (broadcast(df))
|
|
34
|
+
- Hint-based optimization (SHUFFLE_HASH, SHUFFLE_MERGE, COALESCE)
|
|
35
|
+
- Partition pruning and predicate pushdown
|
|
36
|
+
- Avoid shuffles: coalesce vs repartition
|
|
37
|
+
- Caching and persistence strategies
|
|
38
|
+
|
|
39
|
+
### Data Processing (CRITICAL)
|
|
40
|
+
- DataFrame API for structured transformations
|
|
41
|
+
- Spark SQL for analytical queries
|
|
42
|
+
- UDF design and optimization (prefer built-in functions)
|
|
43
|
+
- Window functions and aggregations
|
|
44
|
+
- Schema handling and evolution
|
|
45
|
+
|
|
46
|
+
### Resource Management (HIGH)
|
|
47
|
+
- Executor and driver memory sizing
|
|
48
|
+
- Dynamic resource allocation
|
|
49
|
+
- Cluster configuration for different workloads
|
|
50
|
+
- Serialization (Kryo vs Java)
|
|
51
|
+
|
|
52
|
+
### Streaming (HIGH)
|
|
53
|
+
- Structured Streaming patterns
|
|
54
|
+
- Watermarks and late data handling
|
|
55
|
+
- Output modes (append, complete, update)
|
|
56
|
+
- Exactly-once processing guarantees
|
|
57
|
+
|
|
58
|
+
### Storage (MEDIUM)
|
|
59
|
+
- Parquet/ORC columnar format optimization
|
|
60
|
+
- Partition strategies for file-based storage
|
|
61
|
+
- Small file problem mitigation
|
|
62
|
+
- Table format integration (Delta Lake, Iceberg)
|
|
63
|
+
|
|
64
|
+
## Skills
|
|
65
|
+
|
|
66
|
+
Apply the **spark-best-practices** skill for core Spark development guidelines.
|
|
67
|
+
|
|
68
|
+
## Reference Guides
|
|
69
|
+
|
|
70
|
+
Consult the **spark** guide at `guides/spark/` for reference documentation from official Apache Spark docs.
|
|
71
|
+
|
|
72
|
+
## Workflow
|
|
73
|
+
|
|
74
|
+
1. Understand data processing requirements
|
|
75
|
+
2. Apply spark-best-practices skill
|
|
76
|
+
3. Reference spark guide for specific patterns
|
|
77
|
+
4. Design job with proper partitioning and joins
|
|
78
|
+
5. Write Spark code using DataFrame/SQL API
|
|
79
|
+
6. Optimize with appropriate hints and caching
|
|
80
|
+
7. Test and profile via Spark UI
|
|
@@ -50,6 +50,32 @@ Simple file operations
|
|
|
50
50
|
When Agent Teams is not available
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
+
## CRITICAL: Self-Check Before Spawning Task Tool
|
|
54
|
+
|
|
55
|
+
```
|
|
56
|
+
╔══════════════════════════════════════════════════════════════════╗
|
|
57
|
+
║ BEFORE USING TASK TOOL, ASK YOURSELF: ║
|
|
58
|
+
║ ║
|
|
59
|
+
║ 1. Is Agent Teams available? (TeamCreate tool exists?) ║
|
|
60
|
+
║ → YES: Continue to step 2 ║
|
|
61
|
+
║ → NO: Task tool is fine ║
|
|
62
|
+
║ ║
|
|
63
|
+
║ 2. Does this task qualify for Agent Teams? ║
|
|
64
|
+
║ - 3+ agents needed? ║
|
|
65
|
+
║ - Shared state or coordination required? ║
|
|
66
|
+
║ - Inter-agent communication adds value? ║
|
|
67
|
+
║ - Complex workflow (create + review + integrate)? ║
|
|
68
|
+
║ → ANY YES: MUST use Agent Teams ║
|
|
69
|
+
║ → ALL NO: Task tool is fine ║
|
|
70
|
+
║ ║
|
|
71
|
+
║ 3. Am I defaulting to Task tool out of habit? ║
|
|
72
|
+
║ → STOP. Evaluate honestly against the decision matrix. ║
|
|
73
|
+
║ ║
|
|
74
|
+
║ Using Task tool for Agent Teams-qualifying tasks ║
|
|
75
|
+
║ when Agent Teams is available = Rule violation ║
|
|
76
|
+
╚══════════════════════════════════════════════════════════════════╝
|
|
77
|
+
```
|
|
78
|
+
|
|
53
79
|
## Team Composition Guidelines
|
|
54
80
|
|
|
55
81
|
### Standard Team Patterns
|
|
@@ -133,5 +159,25 @@ Agent Teams consume more tokens due to:
|
|
|
133
159
|
Rule of thumb:
|
|
134
160
|
If task takes < 3 minutes with Task tool -> Use Task tool
|
|
135
161
|
If task needs inter-agent communication -> Use Agent Teams
|
|
136
|
-
If unsure ->
|
|
162
|
+
If unsure -> Default to Agent Teams when available, downgrade to Task tool only if clearly unqualified
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
## Enforcement
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
╔══════════════════════════════════════════════════════════════════╗
|
|
169
|
+
║ VIOLATION EXAMPLES: ║
|
|
170
|
+
║ ║
|
|
171
|
+
║ ✗ Agent Teams available + 4 parallel Task() calls for ║
|
|
172
|
+
║ coordinated work (creating agents + guides + routing skill) ║
|
|
173
|
+
║ ✗ Spawning Task tool agents that need to share results ║
|
|
174
|
+
║ ✗ Defaulting to Task tool without checking Agent Teams first ║
|
|
175
|
+
║ ║
|
|
176
|
+
║ CORRECT EXAMPLES: ║
|
|
177
|
+
║ ║
|
|
178
|
+
║ ✓ TeamCreate → TaskCreate → spawn team members for ║
|
|
179
|
+
║ multi-file coordinated creation ║
|
|
180
|
+
║ ✓ Task tool for single independent delegation ║
|
|
181
|
+
║ ✓ Task tool when Agent Teams is not available ║
|
|
182
|
+
╚══════════════════════════════════════════════════════════════════╝
|
|
137
183
|
```
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: airflow-best-practices
|
|
3
|
+
description: Apache Airflow best practices for DAG authoring, testing, and production deployment
|
|
4
|
+
user-invocable: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Apache Airflow Best Practices
|
|
8
|
+
|
|
9
|
+
## DAG Authoring
|
|
10
|
+
|
|
11
|
+
### Top-Level Code (CRITICAL)
|
|
12
|
+
- Avoid heavy computation at module level (executed on every DAG parse)
|
|
13
|
+
- Minimize imports at module level
|
|
14
|
+
- Use `@task` decorator (TaskFlow API) for Python tasks
|
|
15
|
+
- Keep DAG file under 1000 lines
|
|
16
|
+
|
|
17
|
+
### Scheduling
|
|
18
|
+
- Use cron expressions or timetables
|
|
19
|
+
- Set `catchup=False` for most cases
|
|
20
|
+
- Use data-aware scheduling (datasets) for dependencies
|
|
21
|
+
- Configure SLA monitoring
|
|
22
|
+
|
|
23
|
+
### Task Dependencies
|
|
24
|
+
- Use `>>` / `<<` for clarity
|
|
25
|
+
- Group related tasks with TaskGroup
|
|
26
|
+
- Avoid deep nesting (max 3 levels)
|
|
27
|
+
|
|
28
|
+
## Testing
|
|
29
|
+
|
|
30
|
+
### Unit Tests
|
|
31
|
+
- Test DAG import without errors
|
|
32
|
+
- Detect cycles in dependencies
|
|
33
|
+
- Mock external connections
|
|
34
|
+
- Test task logic independently
|
|
35
|
+
|
|
36
|
+
### Integration Tests
|
|
37
|
+
- Use Airflow test mode
|
|
38
|
+
- Validate end-to-end workflows
|
|
39
|
+
- Test with sample data
|
|
40
|
+
|
|
41
|
+
## Production Deployment
|
|
42
|
+
|
|
43
|
+
### Performance
|
|
44
|
+
- Lazy-load heavy libraries inside tasks
|
|
45
|
+
- Use connection pooling
|
|
46
|
+
- Minimize DAG parse time
|
|
47
|
+
- Enable parallelism
|
|
48
|
+
|
|
49
|
+
### Reliability
|
|
50
|
+
- Set appropriate retries and retry_delay
|
|
51
|
+
- Use SLA callbacks for monitoring
|
|
52
|
+
- Implement proper error handling
|
|
53
|
+
- Log important events
|
|
54
|
+
|
|
55
|
+
## References
|
|
56
|
+
- [Airflow Best Practices](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html)
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: dbt-best-practices
|
|
3
|
+
description: dbt best practices for SQL modeling, testing, and analytics engineering workflows
|
|
4
|
+
user-invocable: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# dbt Best Practices
|
|
8
|
+
|
|
9
|
+
## Project Structure
|
|
10
|
+
|
|
11
|
+
### Layer Organization (CRITICAL)
|
|
12
|
+
- **Staging**: 1:1 with source tables (`stg_{source}__{entity}`)
|
|
13
|
+
- **Intermediate**: Business logic composition (`int_{entity}_{verb}`)
|
|
14
|
+
- **Marts**: Final consumption models (`fct_{entity}`, `dim_{entity}`)
|
|
15
|
+
|
|
16
|
+
### Materialization Strategy
|
|
17
|
+
- Staging: `view` (lightweight, always fresh)
|
|
18
|
+
- Intermediate: `ephemeral` or `view`
|
|
19
|
+
- Marts: `table` or `incremental`
|
|
20
|
+
|
|
21
|
+
## Modeling Patterns
|
|
22
|
+
|
|
23
|
+
### Naming Conventions
|
|
24
|
+
- Staging: `stg_source__table`
|
|
25
|
+
- Intermediate: `int_entity_verb`
|
|
26
|
+
- Facts: `fct_entity`
|
|
27
|
+
- Dimensions: `dim_entity`
|
|
28
|
+
|
|
29
|
+
### Incremental Models
|
|
30
|
+
- Use `is_incremental()` macro
|
|
31
|
+
- Define `unique_key` for merge strategy
|
|
32
|
+
- Choose strategy: append, merge, delete+insert
|
|
33
|
+
|
|
34
|
+
## Testing
|
|
35
|
+
|
|
36
|
+
### Schema Tests
|
|
37
|
+
- `unique`, `not_null` for primary keys
|
|
38
|
+
- `relationships` for foreign keys
|
|
39
|
+
- `accepted_values` for enums
|
|
40
|
+
- Custom data tests
|
|
41
|
+
|
|
42
|
+
### Source Freshness
|
|
43
|
+
- Configure `loaded_at_field`
|
|
44
|
+
- Set freshness thresholds
|
|
45
|
+
|
|
46
|
+
## Documentation
|
|
47
|
+
|
|
48
|
+
- Add descriptions to models
|
|
49
|
+
- Document column definitions
|
|
50
|
+
- Use `doc` blocks for reusable text
|
|
51
|
+
- Generate and host dbt docs
|
|
52
|
+
|
|
53
|
+
## References
|
|
54
|
+
- [dbt Best Practices](https://docs.getdbt.com/guides/best-practices)
|
|
@@ -0,0 +1,230 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: de-lead-routing
|
|
3
|
+
description: Routes data engineering tasks to the correct DE expert agent. Use when user requests data pipeline design, DAG authoring, SQL modeling, stream processing, or warehouse optimization.
|
|
4
|
+
user-invocable: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# DE Lead Routing Skill
|
|
8
|
+
|
|
9
|
+
## Purpose
|
|
10
|
+
|
|
11
|
+
Routes data engineering tasks to appropriate DE expert agents. This skill contains the coordination logic for orchestrating data engineering agents across orchestration, modeling, processing, streaming, and warehouse specializations.
|
|
12
|
+
|
|
13
|
+
## Engineers Under Management
|
|
14
|
+
|
|
15
|
+
| Type | Agents | Purpose |
|
|
16
|
+
|------|--------|---------|
|
|
17
|
+
| de/orchestration | de-airflow-expert | DAG authoring, scheduling, testing |
|
|
18
|
+
| de/modeling | de-dbt-expert | SQL modeling, testing, documentation |
|
|
19
|
+
| de/processing | de-spark-expert | Distributed data processing |
|
|
20
|
+
| de/streaming | de-kafka-expert | Event streaming, topic design |
|
|
21
|
+
| de/warehouse | de-snowflake-expert | Cloud DWH, query optimization |
|
|
22
|
+
| de/architecture | de-pipeline-expert | Pipeline design, cross-tool patterns |
|
|
23
|
+
|
|
24
|
+
## Tool/Framework Detection
|
|
25
|
+
|
|
26
|
+
### Keyword Mapping
|
|
27
|
+
|
|
28
|
+
| Keyword | Agent |
|
|
29
|
+
|---------|-------|
|
|
30
|
+
| "airflow", "dag", "scheduling", "orchestration" | de-airflow-expert |
|
|
31
|
+
| "dbt", "modeling", "sql model", "analytics engineering" | de-dbt-expert |
|
|
32
|
+
| "spark", "pyspark", "distributed processing", "distributed" | de-spark-expert |
|
|
33
|
+
| "kafka", "streaming", "event", "consumer", "producer" | de-kafka-expert |
|
|
34
|
+
| "snowflake", "warehouse", "clustering key" | de-snowflake-expert |
|
|
35
|
+
| "pipeline", "ETL", "ELT", "data quality", "lineage" | de-pipeline-expert |
|
|
36
|
+
| "iceberg", "table format" | de-snowflake-expert or de-pipeline-expert |
|
|
37
|
+
|
|
38
|
+
### File Pattern Mapping
|
|
39
|
+
|
|
40
|
+
| Pattern | Agent |
|
|
41
|
+
|---------|-------|
|
|
42
|
+
| `dags/*.py`, `airflow.cfg`, `airflow_settings.yaml` | de-airflow-expert |
|
|
43
|
+
| `models/**/*.sql`, `dbt_project.yml`, `schema.yml` | de-dbt-expert |
|
|
44
|
+
| Spark job files, `spark-submit` configs | de-spark-expert |
|
|
45
|
+
| Kafka configs, `*.properties` (Kafka), `streams/*.java` | de-kafka-expert |
|
|
46
|
+
| Snowflake SQL, warehouse DDL | de-snowflake-expert |
|
|
47
|
+
|
|
48
|
+
## Command Routing
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
DE Request → Detection → Expert Agent
|
|
52
|
+
|
|
53
|
+
Airflow DAG → de-airflow-expert
|
|
54
|
+
dbt model → de-dbt-expert
|
|
55
|
+
Spark job → de-spark-expert
|
|
56
|
+
Kafka topic → de-kafka-expert
|
|
57
|
+
Snowflake → de-snowflake-expert
|
|
58
|
+
Pipeline → de-pipeline-expert
|
|
59
|
+
Multi-tool → Multiple experts (parallel)
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
## Routing Rules
|
|
63
|
+
|
|
64
|
+
### 1. Pipeline Development Workflow
|
|
65
|
+
|
|
66
|
+
```
|
|
67
|
+
1. Receive pipeline task request
|
|
68
|
+
2. Identify tools and components:
|
|
69
|
+
- DAG orchestration → de-airflow-expert
|
|
70
|
+
- SQL transformations → de-dbt-expert
|
|
71
|
+
- Distributed processing → de-spark-expert
|
|
72
|
+
- Event streaming → de-kafka-expert
|
|
73
|
+
- Warehouse operations → de-snowflake-expert
|
|
74
|
+
- Architecture decisions → de-pipeline-expert
|
|
75
|
+
3. Select appropriate experts
|
|
76
|
+
4. Distribute tasks (parallel if 2+ tools)
|
|
77
|
+
5. Aggregate results
|
|
78
|
+
6. Present unified report
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Example:
|
|
82
|
+
```
|
|
83
|
+
User: "Design a pipeline that runs dbt models from Airflow and loads into Snowflake"
|
|
84
|
+
|
|
85
|
+
Detection:
|
|
86
|
+
- Airflow DAG → de-airflow-expert
|
|
87
|
+
- dbt model → de-dbt-expert
|
|
88
|
+
- Snowflake loading → de-snowflake-expert
|
|
89
|
+
- Pipeline architecture → de-pipeline-expert
|
|
90
|
+
|
|
91
|
+
Route (parallel where independent):
|
|
92
|
+
Task(de-pipeline-expert → overall architecture design)
|
|
93
|
+
Task(de-airflow-expert → DAG structure)
|
|
94
|
+
Task(de-dbt-expert → model design)
|
|
95
|
+
Task(de-snowflake-expert → warehouse setup)
|
|
96
|
+
|
|
97
|
+
Aggregate:
|
|
98
|
+
Pipeline architecture defined
|
|
99
|
+
Airflow DAG: 5 tasks designed
|
|
100
|
+
dbt: 12 models structured
|
|
101
|
+
Snowflake: warehouse + schema configured
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
### 2. Data Quality Workflow
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
1. Analyze data quality requirements
|
|
108
|
+
2. Route to appropriate experts:
|
|
109
|
+
- dbt tests → de-dbt-expert
|
|
110
|
+
- Pipeline validation → de-pipeline-expert
|
|
111
|
+
- Source freshness → de-airflow-expert
|
|
112
|
+
3. Coordinate cross-tool quality strategy
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
### 3. Multi-Tool Projects
|
|
116
|
+
|
|
117
|
+
For projects spanning multiple DE tools:
|
|
118
|
+
|
|
119
|
+
```
|
|
120
|
+
1. Detect all DE tools in project
|
|
121
|
+
2. Identify primary tool (most files/configs)
|
|
122
|
+
3. Route to appropriate experts:
|
|
123
|
+
- If task spans multiple tools → parallel experts
|
|
124
|
+
- If task is tool-specific → single expert
|
|
125
|
+
4. Coordinate cross-tool consistency
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
## Sub-agent Model Selection
|
|
129
|
+
|
|
130
|
+
### Model Mapping by Task Type
|
|
131
|
+
|
|
132
|
+
| Task Type | Recommended Model | Reason |
|
|
133
|
+
|-----------|-------------------|--------|
|
|
134
|
+
| Pipeline architecture | `opus` | Deep reasoning required |
|
|
135
|
+
| DAG/model review | `sonnet` | Balanced quality judgment |
|
|
136
|
+
| Implementation | `sonnet` | Standard code generation |
|
|
137
|
+
| Quick validation | `haiku` | Fast response |
|
|
138
|
+
|
|
139
|
+
### Model Mapping by Agent
|
|
140
|
+
|
|
141
|
+
| Agent | Default Model | Alternative |
|
|
142
|
+
|-------|---------------|-------------|
|
|
143
|
+
| de-pipeline-expert | `sonnet` | `opus` for architecture |
|
|
144
|
+
| de-airflow-expert | `sonnet` | `haiku` for DAG validation |
|
|
145
|
+
| de-dbt-expert | `sonnet` | `haiku` for test checks |
|
|
146
|
+
| de-spark-expert | `sonnet` | `opus` for optimization |
|
|
147
|
+
| de-kafka-expert | `sonnet` | `opus` for topology design |
|
|
148
|
+
| de-snowflake-expert | `sonnet` | `opus` for warehouse design |
|
|
149
|
+
|
|
150
|
+
### Task Call Examples
|
|
151
|
+
|
|
152
|
+
```
|
|
153
|
+
# Complex pipeline architecture
|
|
154
|
+
Task(
|
|
155
|
+
subagent_type: "general-purpose",
|
|
156
|
+
prompt: "Design end-to-end pipeline architecture following de-pipeline-expert guidelines",
|
|
157
|
+
model: "opus"
|
|
158
|
+
)
|
|
159
|
+
|
|
160
|
+
# Standard DAG review
|
|
161
|
+
Task(
|
|
162
|
+
subagent_type: "general-purpose",
|
|
163
|
+
prompt: "Review Airflow DAGs in dags/ following de-airflow-expert guidelines",
|
|
164
|
+
model: "sonnet"
|
|
165
|
+
)
|
|
166
|
+
|
|
167
|
+
# Quick dbt test validation
|
|
168
|
+
Task(
|
|
169
|
+
subagent_type: "Explore",
|
|
170
|
+
prompt: "Find all dbt models missing schema tests",
|
|
171
|
+
model: "haiku"
|
|
172
|
+
)
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
## Parallel Execution
|
|
176
|
+
|
|
177
|
+
Following R009:
|
|
178
|
+
- Maximum 4 parallel instances
|
|
179
|
+
- Independent tool/module operations
|
|
180
|
+
- Coordinate cross-tool consistency
|
|
181
|
+
|
|
182
|
+
Example:
|
|
183
|
+
```
|
|
184
|
+
User: "Review all DE configs"
|
|
185
|
+
|
|
186
|
+
Detection:
|
|
187
|
+
- dags/ → de-airflow-expert
|
|
188
|
+
- models/ → de-dbt-expert
|
|
189
|
+
- kafka/ → de-kafka-expert
|
|
190
|
+
|
|
191
|
+
Route (parallel):
|
|
192
|
+
Task(de-airflow-expert role → review dags/, model: "sonnet")
|
|
193
|
+
Task(de-dbt-expert role → review models/, model: "sonnet")
|
|
194
|
+
Task(de-kafka-expert role → review kafka/, model: "sonnet")
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
## Display Format
|
|
198
|
+
|
|
199
|
+
```
|
|
200
|
+
[Analyzing] Detected: Airflow, dbt, Snowflake
|
|
201
|
+
|
|
202
|
+
[Delegating] de-airflow-expert:sonnet → DAG design
|
|
203
|
+
[Delegating] de-dbt-expert:sonnet → Model structure
|
|
204
|
+
[Delegating] de-snowflake-expert:sonnet → Warehouse config
|
|
205
|
+
|
|
206
|
+
[Progress] ███████████░ 2/3 experts completed
|
|
207
|
+
|
|
208
|
+
[Summary]
|
|
209
|
+
Airflow: DAG with 5 tasks designed
|
|
210
|
+
dbt: 12 models across 3 layers
|
|
211
|
+
Snowflake: Warehouse + schema configured
|
|
212
|
+
|
|
213
|
+
Pipeline design completed.
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
## Integration with Other Routing Skills
|
|
217
|
+
|
|
218
|
+
- **dev-lead-routing**: Hands off to DE lead when data engineering keywords detected
|
|
219
|
+
- **secretary-routing**: DE agents accessible through secretary for management tasks
|
|
220
|
+
- **qa-lead-routing**: Coordinates with QA for data quality testing
|
|
221
|
+
|
|
222
|
+
## Usage
|
|
223
|
+
|
|
224
|
+
This skill is NOT user-invocable. It should be automatically triggered when the main conversation detects data engineering intent.
|
|
225
|
+
|
|
226
|
+
Detection criteria:
|
|
227
|
+
- User requests pipeline design or data engineering
|
|
228
|
+
- User mentions DE tool names (Airflow, dbt, Spark, Kafka, Snowflake)
|
|
229
|
+
- User provides DE-related file paths (dags/, models/, etc.)
|
|
230
|
+
- User requests data quality or lineage work
|