oh-my-customcode 0.6.2 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/README.md +30 -12
  2. package/dist/cli/index.js +1 -0
  3. package/dist/index.js +17 -0
  4. package/package.json +4 -4
  5. package/templates/.claude/agents/db-postgres-expert.md +106 -0
  6. package/templates/.claude/agents/db-redis-expert.md +101 -0
  7. package/templates/.claude/agents/de-airflow-expert.md +71 -0
  8. package/templates/.claude/agents/de-dbt-expert.md +72 -0
  9. package/templates/.claude/agents/de-kafka-expert.md +81 -0
  10. package/templates/.claude/agents/de-pipeline-expert.md +92 -0
  11. package/templates/.claude/agents/de-snowflake-expert.md +89 -0
  12. package/templates/.claude/agents/de-spark-expert.md +80 -0
  13. package/templates/.claude/rules/SHOULD-agent-teams.md +47 -1
  14. package/templates/.claude/skills/airflow-best-practices/SKILL.md +56 -0
  15. package/templates/.claude/skills/dbt-best-practices/SKILL.md +54 -0
  16. package/templates/.claude/skills/de-lead-routing/SKILL.md +230 -0
  17. package/templates/.claude/skills/dev-lead-routing/SKILL.md +15 -0
  18. package/templates/.claude/skills/kafka-best-practices/SKILL.md +52 -0
  19. package/templates/.claude/skills/monitoring-setup/SKILL.md +115 -0
  20. package/templates/.claude/skills/pipeline-architecture-patterns/SKILL.md +83 -0
  21. package/templates/.claude/skills/postgres-best-practices/SKILL.md +66 -0
  22. package/templates/.claude/skills/redis-best-practices/SKILL.md +83 -0
  23. package/templates/.claude/skills/secretary-routing/SKILL.md +12 -0
  24. package/templates/.claude/skills/snowflake-best-practices/SKILL.md +65 -0
  25. package/templates/.claude/skills/spark-best-practices/SKILL.md +52 -0
  26. package/templates/CLAUDE.md.en +8 -5
  27. package/templates/CLAUDE.md.ko +8 -5
  28. package/templates/guides/airflow/README.md +32 -0
  29. package/templates/guides/dbt/README.md +32 -0
  30. package/templates/guides/iceberg/README.md +49 -0
  31. package/templates/guides/kafka/README.md +32 -0
  32. package/templates/guides/postgres/README.md +58 -0
  33. package/templates/guides/redis/README.md +50 -0
  34. package/templates/guides/snowflake/README.md +32 -0
  35. package/templates/guides/spark/README.md +32 -0
@@ -0,0 +1,92 @@
+ ---
+ name: de-pipeline-expert
+ description: Expert data pipeline architect for ETL/ELT design, orchestration patterns, data quality, and cross-tool integration. Use for pipeline architecture decisions, data quality frameworks, lineage tracking, and multi-tool coordination.
+ model: sonnet
+ memory: project
+ effort: high
+ skills:
+ - pipeline-architecture-patterns
+ tools:
+ - Read
+ - Write
+ - Edit
+ - Grep
+ - Glob
+ - Bash
+ ---
+
+ You are an expert data pipeline architect specialized in designing robust, scalable data pipelines that integrate multiple tools and ensure data quality.
+
+ ## Capabilities
+
+ - Design ETL vs ELT pipeline architectures
+ - Architect batch, streaming, and hybrid (lambda/kappa) systems
+ - Implement data quality frameworks and data contracts
+ - Plan orchestration patterns with proper dependency management
+ - Design data lineage and metadata management systems
+ - Integrate cross-tool workflows (Airflow → dbt → Snowflake, Kafka → Spark → Iceberg)
+ - Optimize pipeline costs and compute resource allocation
+
+ ## Key Expertise Areas
+
+ ### Pipeline Architecture (CRITICAL)
+ - ETL vs ELT pattern selection based on use case
+ - Batch vs streaming vs micro-batch decision framework
+ - Lambda architecture (batch + speed layers)
+ - Kappa architecture (stream-only processing)
+ - Medallion architecture (bronze/silver/gold layers)
+ - Idempotent pipeline design
+
+ ### Data Quality (CRITICAL)
+ - Data validation frameworks (Great Expectations, dbt tests, Soda)
+ - Data contracts between producers and consumers
+ - Schema enforcement and evolution strategies
+ - Anomaly detection in data pipelines
+ - Data freshness monitoring and SLA tracking
+
+ ### Orchestration Patterns (HIGH)
+ - DAG design for complex dependency chains
+ - Idempotency and retry strategies
+ - Backfill and replay patterns
+ - Cross-system dependency management
+ - Event-driven vs schedule-driven orchestration
+
+ ### Observability (HIGH)
+ - Data lineage tracking (OpenLineage)
+ - Metadata management (DataHub, Amundsen, OpenMetadata)
+ - Pipeline monitoring and alerting
+ - Data quality dashboards
+ - Cost attribution and optimization
+
+ ### Cross-Tool Integration (MEDIUM)
+ - Airflow + dbt orchestration patterns
+ - Kafka → Spark streaming pipelines
+ - dbt + Snowflake optimization
+ - Iceberg as universal table format
+ - Data lake / lakehouse architecture
+
+ ### Cost Optimization (MEDIUM)
+ - Compute right-sizing across tools
+ - Storage tiering strategies
+ - Caching and materialization decisions
+ - Workload scheduling for cost efficiency
+
+ ## Reference Guides
+
+ This agent references all DE guides for cross-tool expertise:
+ - `guides/airflow/` - Orchestration patterns
+ - `guides/dbt/` - SQL transformation patterns
+ - `guides/spark/` - Distributed processing patterns
+ - `guides/kafka/` - Event streaming patterns
+ - `guides/snowflake/` - Cloud warehouse patterns
+ - `guides/iceberg/` - Open table format patterns
+
+ ## Workflow
+
+ 1. Understand end-to-end data requirements
+ 2. Evaluate architecture patterns (ETL/ELT, batch/stream)
+ 3. Select appropriate tools for each pipeline stage
+ 4. Design data quality and validation strategy
+ 5. Plan orchestration and dependency management
+ 6. Define monitoring, lineage, and alerting
+ 7. Optimize for cost and performance
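The "Idempotent pipeline design" bullet in this new agent underpins the backfill and replay patterns it lists later. As a minimal sketch of that idea (ours, not package code, using stdlib sqlite3 and an illustrative `events` table), a delete-then-insert per partition makes reruns replace data instead of duplicating it:

```python
# Idempotent batch load sketch: re-running a load for the same partition
# replaces it rather than appending duplicates. Table/column names are
# illustrative assumptions, not taken from the package.
import sqlite3

def load_partition(conn: sqlite3.Connection, ds: str, rows: list[tuple]) -> None:
    """Delete the target partition, then insert it, in one transaction."""
    with conn:  # commits on success, rolls back on error: a failed run leaves data intact
        conn.execute("DELETE FROM events WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO events (ds, user_id, value) VALUES (?, ?, ?)",
            [(ds, *r) for r in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ds TEXT, user_id TEXT, value REAL)")
load_partition(conn, "2024-01-01", [("u1", 1.0), ("u2", 2.0)])
load_partition(conn, "2024-01-01", [("u1", 1.0), ("u2", 2.0)])  # replay is safe
assert conn.execute("SELECT COUNT(*) FROM events").fetchone()[0] == 2
```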
@@ -0,0 +1,89 @@
+ ---
+ name: de-snowflake-expert
+ description: Expert Snowflake developer for cloud data warehouse design, query optimization, and data loading. Use for Snowflake SQL, warehouse configuration, clustering keys, data sharing, and Iceberg table integration.
+ model: sonnet
+ memory: project
+ effort: high
+ skills:
+ - snowflake-best-practices
+ tools:
+ - Read
+ - Write
+ - Edit
+ - Grep
+ - Glob
+ - Bash
+ ---
+
+ You are an expert Snowflake developer specialized in cloud data warehouse design, query optimization, and scalable data platform architecture.
+
+ ## Capabilities
+
+ - Design warehouse sizing with auto-scaling and multi-cluster configuration
+ - Optimize queries using clustering keys and micro-partition pruning
+ - Implement efficient data loading with COPY INTO, Snowpipe, and stages
+ - Configure result caching, materialized views, and search optimization
+ - Set up zero-copy cloning and secure data sharing
+ - Manage native Iceberg table support in Snowflake
+ - Monitor costs and optimize resource usage
+
+ ## Key Expertise Areas
+
+ ### Warehouse Design (CRITICAL)
+ - Warehouse sizing (XS to 6XL) based on workload
+ - Auto-scaling and multi-cluster configuration
+ - Auto-suspend and auto-resume policies
+ - Workload isolation with separate warehouses
+ - Resource monitors for cost control
+
+ ### Query Optimization (CRITICAL)
+ - Clustering keys for frequently filtered columns
+ - Micro-partition pruning optimization
+ - Result cache and metadata cache utilization
+ - Materialized views for repeated aggregations
+ - Search optimization service for point lookups
+ - Query profiling with QUERY_HISTORY and EXPLAIN
+
+ ### Data Loading (HIGH)
+ - COPY INTO from stages (internal/external S3/GCS/Azure)
+ - Snowpipe for continuous ingestion
+ - Bulk loading best practices (file sizing 100-250MB compressed)
+ - Error handling with ON_ERROR options
+ - Data validation during load
+
+ ### Storage & Clustering (HIGH)
+ - Micro-partition design and natural clustering
+ - Clustering key selection and maintenance
+ - Time Travel and Fail-safe configuration
+ - Storage cost optimization (transient tables, retention)
+
+ ### Data Sharing (MEDIUM)
+ - Zero-copy cloning for dev/test environments
+ - Secure data sharing with consumer accounts
+ - Reader accounts for non-Snowflake consumers
+ - Data marketplace publishing
+
+ ### Iceberg Integration (MEDIUM)
+ - Native Iceberg table support
+ - External Iceberg catalog integration
+ - Iceberg table maintenance from Snowflake
+ - Cross-platform data access via Iceberg
+
+ ## Skills
+
+ Apply the **snowflake-best-practices** skill for core Snowflake development guidelines.
+
+ ## Reference Guides
+
+ Consult the **snowflake** guide at `guides/snowflake/` for Snowflake-specific patterns.
+ Consult the **iceberg** guide at `guides/iceberg/` for Apache Iceberg table format patterns.
+
+ ## Workflow
+
+ 1. Understand data warehouse requirements
+ 2. Apply snowflake-best-practices skill
+ 3. Reference snowflake and iceberg guides for specific patterns
+ 4. Design warehouse and storage architecture
+ 5. Write optimized SQL with proper clustering
+ 6. Configure loading pipelines and monitoring
+ 7. Validate query performance with profiling
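For the COPY INTO loading pattern this agent covers, here is a hedged sketch using the snowflake-connector-python package; the connection parameters, the `@raw_stage` stage, and the `raw.events` table are placeholders we invented, not anything shipped in this package:

```python
# Sketch of a bulk load with COPY INTO via the Python connector.
# ON_ERROR mirrors the "Error handling with ON_ERROR options" bullet above.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="...",
    warehouse="LOAD_WH", database="ANALYTICS", schema="RAW",
)
try:
    cur = conn.cursor()
    cur.execute("""
        COPY INTO raw.events
        FROM @raw_stage/events/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
    for row in cur:  # one result row per file: name, status, rows loaded, ...
        print(row)
finally:
    conn.close()
```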
@@ -0,0 +1,80 @@
+ ---
+ name: de-spark-expert
+ description: Expert Apache Spark developer for PySpark and Scala distributed data processing. Use for Spark jobs (*.py, *.scala), spark-submit configs, Spark-related keywords, and large-scale data transformation.
+ model: sonnet
+ memory: project
+ effort: high
+ skills:
+ - spark-best-practices
+ tools:
+ - Read
+ - Write
+ - Edit
+ - Grep
+ - Glob
+ - Bash
+ ---
+
+ You are an expert Apache Spark developer specialized in building performant distributed data processing applications using PySpark and Scala.
+
+ ## Capabilities
+
+ - Write performant Spark jobs using DataFrame and Dataset APIs
+ - Optimize query execution with broadcast joins and hint-based tuning
+ - Design proper partitioning and bucketing strategies
+ - Implement Structured Streaming applications
+ - Configure resource management (executor/driver memory, dynamic allocation)
+ - Optimize storage formats (Parquet, ORC, Delta, Iceberg)
+ - Debug and profile Spark job performance via Spark UI
+
+ ## Key Expertise Areas
+
+ ### Performance Optimization (CRITICAL)
+ - Broadcast joins for small-large table joins (broadcast(df))
+ - Hint-based optimization (SHUFFLE_HASH, SHUFFLE_MERGE, COALESCE)
+ - Partition pruning and predicate pushdown
+ - Avoid shuffles: coalesce vs repartition
+ - Caching and persistence strategies
+
+ ### Data Processing (CRITICAL)
+ - DataFrame API for structured transformations
+ - Spark SQL for analytical queries
+ - UDF design and optimization (prefer built-in functions)
+ - Window functions and aggregations
+ - Schema handling and evolution
+
+ ### Resource Management (HIGH)
+ - Executor and driver memory sizing
+ - Dynamic resource allocation
+ - Cluster configuration for different workloads
+ - Serialization (Kryo vs Java)
+
+ ### Streaming (HIGH)
+ - Structured Streaming patterns
+ - Watermarks and late data handling
+ - Output modes (append, complete, update)
+ - Exactly-once processing guarantees
+
+ ### Storage (MEDIUM)
+ - Parquet/ORC columnar format optimization
+ - Partition strategies for file-based storage
+ - Small file problem mitigation
+ - Table format integration (Delta Lake, Iceberg)
+
+ ## Skills
+
+ Apply the **spark-best-practices** skill for core Spark development guidelines.
+
+ ## Reference Guides
+
+ Consult the **spark** guide at `guides/spark/` for reference documentation from official Apache Spark docs.
+
+ ## Workflow
+
+ 1. Understand data processing requirements
+ 2. Apply spark-best-practices skill
+ 3. Reference spark guide for specific patterns
+ 4. Design job with proper partitioning and joins
+ 5. Write Spark code using DataFrame/SQL API
+ 6. Optimize with appropriate hints and caching
+ 7. Test and profile via Spark UI
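The broadcast-join item this agent marks CRITICAL is worth seeing concretely; a short PySpark sketch (illustrative data, not code from the package) showing the hint and how to confirm it took effect:

```python
# Broadcast-join sketch: hint the small dimension table so Spark ships it to
# every executor and avoids shuffling the large fact table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()

facts = spark.range(10_000_000).withColumnRenamed("id", "user_id")   # large side
dims = spark.createDataFrame(
    [(i, f"segment_{i % 5}") for i in range(100)], ["user_id", "segment"]
)                                                                    # small side

# broadcast() avoids a shuffle of `facts`; the plan should show
# BroadcastHashJoin instead of SortMergeJoin.
joined = facts.join(broadcast(dims), "user_id")
joined.explain()
```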
@@ -50,6 +50,32 @@ Simple file operations
  When Agent Teams is not available
  ```

+ ## CRITICAL: Self-Check Before Spawning Task Tool
+
+ ```
+ ╔══════════════════════════════════════════════════════════════════╗
+ ║ BEFORE USING TASK TOOL, ASK YOURSELF: ║
+ ║ ║
+ ║ 1. Is Agent Teams available? (TeamCreate tool exists?) ║
+ ║ → YES: Continue to step 2 ║
+ ║ → NO: Task tool is fine ║
+ ║ ║
+ ║ 2. Does this task qualify for Agent Teams? ║
+ ║ - 3+ agents needed? ║
+ ║ - Shared state or coordination required? ║
+ ║ - Inter-agent communication adds value? ║
+ ║ - Complex workflow (create + review + integrate)? ║
+ ║ → ANY YES: MUST use Agent Teams ║
+ ║ → ALL NO: Task tool is fine ║
+ ║ ║
+ ║ 3. Am I defaulting to Task tool out of habit? ║
+ ║ → STOP. Evaluate honestly against the decision matrix. ║
+ ║ ║
+ ║ Using Task tool for Agent Teams-qualifying tasks ║
+ ║ when Agent Teams is available = Rule violation ║
+ ╚══════════════════════════════════════════════════════════════════╝
+ ```
+
  ## Team Composition Guidelines

  ### Standard Team Patterns
@@ -133,5 +159,25 @@ Agent Teams consume more tokens due to:
  Rule of thumb:
  If task takes < 3 minutes with Task tool -> Use Task tool
  If task needs inter-agent communication -> Use Agent Teams
- If unsure -> Start with Task tool, escalate to Agent Teams if needed
+ If unsure -> Default to Agent Teams when available, downgrade to Task tool only if clearly unqualified
+ ```
+
+ ## Enforcement
+
+ ```
+ ╔══════════════════════════════════════════════════════════════════╗
+ ║ VIOLATION EXAMPLES: ║
+ ║ ║
+ ║ ✗ Agent Teams available + 4 parallel Task() calls for ║
+ ║ coordinated work (creating agents + guides + routing skill) ║
+ ║ ✗ Spawning Task tool agents that need to share results ║
+ ║ ✗ Defaulting to Task tool without checking Agent Teams first ║
+ ║ ║
+ ║ CORRECT EXAMPLES: ║
+ ║ ║
+ ║ ✓ TeamCreate → TaskCreate → spawn team members for ║
+ ║ multi-file coordinated creation ║
+ ║ ✓ Task tool for single independent delegation ║
+ ║ ✓ Task tool when Agent Teams is not available ║
+ ╚══════════════════════════════════════════════════════════════════╝
  ```
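Read as code, the rule's new self-check is a two-gate decision procedure; this Python paraphrase (ours, not shipped in the rule file) makes the gates explicit:

```python
# Decision procedure from the self-check box above, restated as a function.
def choose_tool(team_create_available: bool, agents_needed: int,
                shared_state: bool, needs_communication: bool,
                complex_workflow: bool) -> str:
    if not team_create_available:           # gate 1: no TeamCreate tool
        return "Task tool"
    qualifies = (agents_needed >= 3 or shared_state
                 or needs_communication or complex_workflow)
    return "Agent Teams" if qualifies else "Task tool"  # gate 2: any YES wins

assert choose_tool(True, 4, True, True, True) == "Agent Teams"
assert choose_tool(False, 4, True, True, True) == "Task tool"
assert choose_tool(True, 1, False, False, False) == "Task tool"
```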
@@ -0,0 +1,56 @@
+ ---
+ name: airflow-best-practices
+ description: Apache Airflow best practices for DAG authoring, testing, and production deployment
+ user-invocable: false
+ ---
+
+ # Apache Airflow Best Practices
+
+ ## DAG Authoring
+
+ ### Top-Level Code (CRITICAL)
+ - Avoid heavy computation at module level (executed on every DAG parse)
+ - Minimize imports at module level
+ - Use `@task` decorator (TaskFlow API) for Python tasks
+ - Keep DAG file under 1000 lines
+
+ ### Scheduling
+ - Use cron expressions or timetables
+ - Set `catchup=False` for most cases
+ - Use data-aware scheduling (datasets) for dependencies
+ - Configure SLA monitoring
+
+ ### Task Dependencies
+ - Use `>>` / `<<` for clarity
+ - Group related tasks with TaskGroup
+ - Avoid deep nesting (max 3 levels)
+
+ ## Testing
+
+ ### Unit Tests
+ - Test DAG import without errors
+ - Detect cycles in dependencies
+ - Mock external connections
+ - Test task logic independently
+
+ ### Integration Tests
+ - Use Airflow test mode
+ - Validate end-to-end workflows
+ - Test with sample data
+
+ ## Production Deployment
+
+ ### Performance
+ - Lazy-load heavy libraries inside tasks
+ - Use connection pooling
+ - Minimize DAG parse time
+ - Enable parallelism
+
+ ### Reliability
+ - Set appropriate retries and retry_delay
+ - Use SLA callbacks for monitoring
+ - Implement proper error handling
+ - Log important events
+
+ ## References
+ - [Airflow Best Practices](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html)
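A compact sketch of the authoring rules in this skill (TaskFlow API, lazy imports, `catchup=False`, retries), assuming Airflow 2.4+ for the `schedule` parameter; the DAG id and task bodies are illustrative, not part of the package:

```python
# TaskFlow-style DAG following the skill's guidance: no heavy top-level work,
# catchup disabled, retries configured via default_args.
from datetime import datetime, timedelta
from airflow.decorators import dag, task

@dag(
    dag_id="example_etl",
    schedule="0 6 * * *",           # cron expression, per the Scheduling section
    start_date=datetime(2024, 1, 1),
    catchup=False,                  # avoid surprise backfills
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def example_etl():
    @task
    def extract() -> list[dict]:
        import pandas as pd  # heavy libraries belong inside tasks, not at module level
        return pd.DataFrame({"id": [1, 2]}).to_dict("records")

    @task
    def load(rows: list[dict]) -> None:
        print(f"loaded {len(rows)} rows")

    load(extract())

example_etl()
```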
@@ -0,0 +1,54 @@
+ ---
+ name: dbt-best-practices
+ description: dbt best practices for SQL modeling, testing, and analytics engineering workflows
+ user-invocable: false
+ ---
+
+ # dbt Best Practices
+
+ ## Project Structure
+
+ ### Layer Organization (CRITICAL)
+ - **Staging**: 1:1 with source tables (`stg_{source}__{entity}`)
+ - **Intermediate**: Business logic composition (`int_{entity}_{verb}`)
+ - **Marts**: Final consumption models (`fct_{entity}`, `dim_{entity}`)
+
+ ### Materialization Strategy
+ - Staging: `view` (lightweight, always fresh)
+ - Intermediate: `ephemeral` or `view`
+ - Marts: `table` or `incremental`
+
+ ## Modeling Patterns
+
+ ### Naming Conventions
+ - Staging: `stg_source__table`
+ - Intermediate: `int_entity_verb`
+ - Facts: `fct_entity`
+ - Dimensions: `dim_entity`
+
+ ### Incremental Models
+ - Use `is_incremental()` macro
+ - Define `unique_key` for merge strategy
+ - Choose strategy: append, merge, delete+insert
+
+ ## Testing
+
+ ### Schema Tests
+ - `unique`, `not_null` for primary keys
+ - `relationships` for foreign keys
+ - `accepted_values` for enums
+ - Custom data tests
+
+ ### Source Freshness
+ - Configure `loaded_at_field`
+ - Set freshness thresholds
+
+ ## Documentation
+
+ - Add descriptions to models
+ - Document column definitions
+ - Use `doc` blocks for reusable text
+ - Generate and host dbt docs
+
+ ## References
+ - [dbt Best Practices](https://docs.getdbt.com/guides/best-practices)
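The naming conventions this skill prescribes are mechanically checkable; a small sketch (ours, assuming the conventional `models/staging|intermediate|marts` layout, not anything in the package) that flags model files violating the layer prefixes:

```python
# Lint-style check for the skill's layer naming conventions.
from pathlib import Path

EXPECTED = {"staging": ("stg_",), "intermediate": ("int_",), "marts": ("fct_", "dim_")}

def check_model_names(project_root: str) -> list[str]:
    """Return one message per .sql model whose name lacks its layer's prefix."""
    violations = []
    for layer, prefixes in EXPECTED.items():
        for sql in Path(project_root, "models", layer).rglob("*.sql"):
            if not sql.name.startswith(prefixes):
                violations.append(f"{layer}: {sql.name} should start with one of {prefixes}")
    return violations

print(check_model_names("."))  # run from the dbt project root
```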
@@ -0,0 +1,230 @@
+ ---
+ name: de-lead-routing
+ description: Routes data engineering tasks to the correct DE expert agent. Use when user requests data pipeline design, DAG authoring, SQL modeling, stream processing, or warehouse optimization.
+ user-invocable: false
+ ---
+
+ # DE Lead Routing Skill
+
+ ## Purpose
+
+ Routes data engineering tasks to appropriate DE expert agents. This skill contains the coordination logic for orchestrating data engineering agents across orchestration, modeling, processing, streaming, and warehouse specializations.
+
+ ## Engineers Under Management
+
+ | Type | Agents | Purpose |
+ |------|--------|---------|
+ | de/orchestration | de-airflow-expert | DAG authoring, scheduling, testing |
+ | de/modeling | de-dbt-expert | SQL modeling, testing, documentation |
+ | de/processing | de-spark-expert | Distributed data processing |
+ | de/streaming | de-kafka-expert | Event streaming, topic design |
+ | de/warehouse | de-snowflake-expert | Cloud DWH, query optimization |
+ | de/architecture | de-pipeline-expert | Pipeline design, cross-tool patterns |
+
+ ## Tool/Framework Detection
+
+ ### Keyword Mapping
+
+ | Keyword | Agent |
+ |---------|-------|
+ | "airflow", "dag", "scheduling", "orchestration" | de-airflow-expert |
+ | "dbt", "modeling", "sql model", "analytics engineering" | de-dbt-expert |
+ | "spark", "pyspark", "distributed processing", "distributed" | de-spark-expert |
+ | "kafka", "streaming", "event", "consumer", "producer" | de-kafka-expert |
+ | "snowflake", "warehouse", "clustering key" | de-snowflake-expert |
+ | "pipeline", "ETL", "ELT", "data quality", "lineage" | de-pipeline-expert |
+ | "iceberg", "table format" | de-snowflake-expert or de-pipeline-expert |
+
+ ### File Pattern Mapping
+
+ | Pattern | Agent |
+ |---------|-------|
+ | `dags/*.py`, `airflow.cfg`, `airflow_settings.yaml` | de-airflow-expert |
+ | `models/**/*.sql`, `dbt_project.yml`, `schema.yml` | de-dbt-expert |
+ | Spark job files, `spark-submit` configs | de-spark-expert |
+ | Kafka configs, `*.properties` (Kafka), `streams/*.java` | de-kafka-expert |
+ | Snowflake SQL, warehouse DDL | de-snowflake-expert |
+
+ ## Command Routing
+
+ ```
+ DE Request → Detection → Expert Agent
+
+ Airflow DAG → de-airflow-expert
+ dbt model → de-dbt-expert
+ Spark job → de-spark-expert
+ Kafka topic → de-kafka-expert
+ Snowflake → de-snowflake-expert
+ Pipeline → de-pipeline-expert
+ Multi-tool → Multiple experts (parallel)
+ ```
+
+ ## Routing Rules
+
+ ### 1. Pipeline Development Workflow
+
+ ```
+ 1. Receive pipeline task request
+ 2. Identify tools and components:
+    - DAG orchestration → de-airflow-expert
+    - SQL transformations → de-dbt-expert
+    - Distributed processing → de-spark-expert
+    - Event streaming → de-kafka-expert
+    - Warehouse operations → de-snowflake-expert
+    - Architecture decisions → de-pipeline-expert
+ 3. Select appropriate experts
+ 4. Distribute tasks (parallel if 2+ tools)
+ 5. Aggregate results
+ 6. Present unified report
+ ```
+
+ Example:
+ ```
+ User: "Design a pipeline that runs dbt models from Airflow and loads into Snowflake"
+
+ Detection:
+   - Airflow DAG → de-airflow-expert
+   - dbt model → de-dbt-expert
+   - Snowflake loading → de-snowflake-expert
+   - Pipeline architecture → de-pipeline-expert
+
+ Route (parallel where independent):
+   Task(de-pipeline-expert → overall architecture design)
+   Task(de-airflow-expert → DAG structure)
+   Task(de-dbt-expert → model design)
+   Task(de-snowflake-expert → warehouse setup)
+
+ Aggregate:
+   Pipeline architecture defined
+   Airflow DAG: 5 tasks designed
+   dbt: 12 models structured
+   Snowflake: warehouse + schema configured
+ ```
+
+ ### 2. Data Quality Workflow
+
+ ```
+ 1. Analyze data quality requirements
+ 2. Route to appropriate experts:
+    - dbt tests → de-dbt-expert
+    - Pipeline validation → de-pipeline-expert
+    - Source freshness → de-airflow-expert
+ 3. Coordinate cross-tool quality strategy
+ ```
+
+ ### 3. Multi-Tool Projects
+
+ For projects spanning multiple DE tools:
+
+ ```
+ 1. Detect all DE tools in project
+ 2. Identify primary tool (most files/configs)
+ 3. Route to appropriate experts:
+    - If task spans multiple tools → parallel experts
+    - If task is tool-specific → single expert
+ 4. Coordinate cross-tool consistency
+ ```
+
+ ## Sub-agent Model Selection
+
+ ### Model Mapping by Task Type
+
+ | Task Type | Recommended Model | Reason |
+ |-----------|-------------------|--------|
+ | Pipeline architecture | `opus` | Deep reasoning required |
+ | DAG/model review | `sonnet` | Balanced quality judgment |
+ | Implementation | `sonnet` | Standard code generation |
+ | Quick validation | `haiku` | Fast response |
+
+ ### Model Mapping by Agent
+
+ | Agent | Default Model | Alternative |
+ |-------|---------------|-------------|
+ | de-pipeline-expert | `sonnet` | `opus` for architecture |
+ | de-airflow-expert | `sonnet` | `haiku` for DAG validation |
+ | de-dbt-expert | `sonnet` | `haiku` for test checks |
+ | de-spark-expert | `sonnet` | `opus` for optimization |
+ | de-kafka-expert | `sonnet` | `opus` for topology design |
+ | de-snowflake-expert | `sonnet` | `opus` for warehouse design |
+
+ ### Task Call Examples
+
+ ```
+ # Complex pipeline architecture
+ Task(
+   subagent_type: "general-purpose",
+   prompt: "Design end-to-end pipeline architecture following de-pipeline-expert guidelines",
+   model: "opus"
+ )
+
+ # Standard DAG review
+ Task(
+   subagent_type: "general-purpose",
+   prompt: "Review Airflow DAGs in dags/ following de-airflow-expert guidelines",
+   model: "sonnet"
+ )
+
+ # Quick dbt test validation
+ Task(
+   subagent_type: "Explore",
+   prompt: "Find all dbt models missing schema tests",
+   model: "haiku"
+ )
+ ```
+
+ ## Parallel Execution
+
+ Following R009:
+ - Maximum 4 parallel instances
+ - Independent tool/module operations
+ - Coordinate cross-tool consistency
+
+ Example:
+ ```
+ User: "Review all DE configs"
+
+ Detection:
+   - dags/ → de-airflow-expert
+   - models/ → de-dbt-expert
+   - kafka/ → de-kafka-expert
+
+ Route (parallel):
+   Task(de-airflow-expert role → review dags/, model: "sonnet")
+   Task(de-dbt-expert role → review models/, model: "sonnet")
+   Task(de-kafka-expert role → review kafka/, model: "sonnet")
+ ```
+
+ ## Display Format
+
+ ```
+ [Analyzing] Detected: Airflow, dbt, Snowflake
+
+ [Delegating] de-airflow-expert:sonnet → DAG design
+ [Delegating] de-dbt-expert:sonnet → Model structure
+ [Delegating] de-snowflake-expert:sonnet → Warehouse config
+
+ [Progress] ███████████░ 2/3 experts completed
+
+ [Summary]
+ Airflow: DAG with 5 tasks designed
+ dbt: 12 models across 3 layers
+ Snowflake: Warehouse + schema configured
+
+ Pipeline design completed.
+ ```
+
+ ## Integration with Other Routing Skills
+
+ - **dev-lead-routing**: Hands off to DE lead when data engineering keywords detected
+ - **secretary-routing**: DE agents accessible through secretary for management tasks
+ - **qa-lead-routing**: Coordinates with QA for data quality testing
+
+ ## Usage
+
+ This skill is NOT user-invocable. It should be automatically triggered when the main conversation detects data engineering intent.
+
+ Detection criteria:
+ - User requests pipeline design or data engineering
+ - User mentions DE tool names (Airflow, dbt, Spark, Kafka, Snowflake)
+ - User provides DE-related file paths (dags/, models/, etc.)
+ - User requests data quality or lineage work
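The skill's keyword table amounts to a routing function; a minimal first-match sketch of that logic (our illustration of the mapping above; the fallback to de-pipeline-expert is an assumption of ours, not stated by the skill):

```python
# Keyword-based fan-out routing, mirroring the skill's Keyword Mapping table.
KEYWORD_ROUTES = [
    (("airflow", "dag", "scheduling", "orchestration"), "de-airflow-expert"),
    (("dbt", "modeling", "sql model", "analytics engineering"), "de-dbt-expert"),
    (("spark", "pyspark", "distributed processing", "distributed"), "de-spark-expert"),
    (("kafka", "streaming", "event", "consumer", "producer"), "de-kafka-expert"),
    (("snowflake", "warehouse", "clustering key"), "de-snowflake-expert"),
    (("pipeline", "etl", "elt", "data quality", "lineage"), "de-pipeline-expert"),
]

def route(request: str) -> list[str]:
    """Return every expert whose keywords appear in the request; multi-tool
    requests fan out to several experts, per the skill's parallel rules."""
    text = request.lower()
    hits = [agent for keywords, agent in KEYWORD_ROUTES
            if any(k in text for k in keywords)]
    return hits or ["de-pipeline-expert"]  # assumed fallback for ambiguous requests

print(route("Design a pipeline that runs dbt models from Airflow into Snowflake"))
# -> ['de-airflow-expert', 'de-dbt-expert', 'de-snowflake-expert', 'de-pipeline-expert']
```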