oh-my-customcode 0.7.0 → 0.9.0
- package/README.md +38 -12
- package/dist/cli/index.js +518 -274
- package/dist/index.js +304 -101
- package/package.json +1 -1
- package/templates/.claude/agents/db-postgres-expert.md +106 -0
- package/templates/.claude/agents/db-redis-expert.md +101 -0
- package/templates/.claude/agents/de-airflow-expert.md +71 -0
- package/templates/.claude/agents/de-dbt-expert.md +72 -0
- package/templates/.claude/agents/de-kafka-expert.md +81 -0
- package/templates/.claude/agents/de-pipeline-expert.md +92 -0
- package/templates/.claude/agents/de-snowflake-expert.md +89 -0
- package/templates/.claude/agents/de-spark-expert.md +80 -0
- package/templates/.claude/rules/SHOULD-agent-teams.md +47 -1
- package/templates/.claude/skills/airflow-best-practices/SKILL.md +56 -0
- package/templates/.claude/skills/dbt-best-practices/SKILL.md +54 -0
- package/templates/.claude/skills/de-lead-routing/SKILL.md +230 -0
- package/templates/.claude/skills/dev-lead-routing/SKILL.md +15 -0
- package/templates/.claude/skills/kafka-best-practices/SKILL.md +52 -0
- package/templates/.claude/skills/pipeline-architecture-patterns/SKILL.md +83 -0
- package/templates/.claude/skills/postgres-best-practices/SKILL.md +66 -0
- package/templates/.claude/skills/redis-best-practices/SKILL.md +83 -0
- package/templates/.claude/skills/secretary-routing/SKILL.md +12 -0
- package/templates/.claude/skills/snowflake-best-practices/SKILL.md +65 -0
- package/templates/.claude/skills/spark-best-practices/SKILL.md +52 -0
- package/templates/.codex/agents/arch-documenter.md +97 -0
- package/templates/.codex/agents/arch-speckit-agent.md +134 -0
- package/templates/.codex/agents/be-express-expert.md +80 -0
- package/templates/.codex/agents/be-fastapi-expert.md +43 -0
- package/templates/.codex/agents/be-go-backend-expert.md +43 -0
- package/templates/.codex/agents/be-nestjs-expert.md +60 -0
- package/templates/.codex/agents/be-springboot-expert.md +85 -0
- package/templates/.codex/agents/db-postgres-expert.md +106 -0
- package/templates/.codex/agents/db-redis-expert.md +101 -0
- package/templates/.codex/agents/db-supabase-expert.md +71 -0
- package/templates/.codex/agents/de-airflow-expert.md +71 -0
- package/templates/.codex/agents/de-dbt-expert.md +72 -0
- package/templates/.codex/agents/de-kafka-expert.md +81 -0
- package/templates/.codex/agents/de-pipeline-expert.md +92 -0
- package/templates/.codex/agents/de-snowflake-expert.md +89 -0
- package/templates/.codex/agents/de-spark-expert.md +80 -0
- package/templates/.codex/agents/fe-svelte-agent.md +65 -0
- package/templates/.codex/agents/fe-vercel-agent.md +69 -0
- package/templates/.codex/agents/fe-vuejs-agent.md +65 -0
- package/templates/.codex/agents/infra-aws-expert.md +47 -0
- package/templates/.codex/agents/infra-docker-expert.md +47 -0
- package/templates/.codex/agents/lang-golang-expert.md +43 -0
- package/templates/.codex/agents/lang-java21-expert.md +65 -0
- package/templates/.codex/agents/lang-kotlin-expert.md +43 -0
- package/templates/.codex/agents/lang-python-expert.md +43 -0
- package/templates/.codex/agents/lang-rust-expert.md +43 -0
- package/templates/.codex/agents/lang-typescript-expert.md +43 -0
- package/templates/.codex/agents/mgr-claude-code-bible.md +246 -0
- package/templates/.codex/agents/mgr-creator.md +120 -0
- package/templates/.codex/agents/mgr-gitnerd.md +113 -0
- package/templates/.codex/agents/mgr-sauron.md +154 -0
- package/templates/.codex/agents/mgr-supplier.md +120 -0
- package/templates/.codex/agents/mgr-sync-checker.md +99 -0
- package/templates/.codex/agents/mgr-updater.md +103 -0
- package/templates/.codex/agents/qa-engineer.md +96 -0
- package/templates/.codex/agents/qa-planner.md +74 -0
- package/templates/.codex/agents/qa-writer.md +97 -0
- package/templates/.codex/agents/sys-memory-keeper.md +117 -0
- package/templates/.codex/agents/sys-naggy.md +90 -0
- package/templates/.codex/agents/tool-bun-expert.md +71 -0
- package/templates/.codex/agents/tool-npm-expert.md +88 -0
- package/templates/.codex/agents/tool-optimizer.md +87 -0
- package/templates/.codex/codex-native-hash.txt +1 -0
- package/templates/.codex/contexts/dev.md +20 -0
- package/templates/.codex/contexts/ecomode.md +63 -0
- package/templates/.codex/contexts/index.yaml +41 -0
- package/templates/.codex/contexts/research.md +28 -0
- package/templates/.codex/contexts/review.md +23 -0
- package/templates/.codex/hooks/hooks.json +151 -0
- package/templates/.codex/install-hooks.sh +100 -0
- package/templates/.codex/rules/MAY-optimization.md +93 -0
- package/templates/.codex/rules/MUST-agent-design.md +162 -0
- package/templates/.codex/rules/MUST-agent-identification.md +108 -0
- package/templates/.codex/rules/MUST-continuous-improvement.md +132 -0
- package/templates/.codex/rules/MUST-intent-transparency.md +199 -0
- package/templates/.codex/rules/MUST-language-policy.md +62 -0
- package/templates/.codex/rules/MUST-orchestrator-coordination.md +471 -0
- package/templates/.codex/rules/MUST-parallel-execution.md +469 -0
- package/templates/.codex/rules/MUST-permissions.md +84 -0
- package/templates/.codex/rules/MUST-safety.md +69 -0
- package/templates/.codex/rules/MUST-sync-verification.md +281 -0
- package/templates/.codex/rules/MUST-tool-identification.md +195 -0
- package/templates/.codex/rules/SHOULD-agent-teams.md +183 -0
- package/templates/.codex/rules/SHOULD-ecomode.md +145 -0
- package/templates/.codex/rules/SHOULD-error-handling.md +102 -0
- package/templates/.codex/rules/SHOULD-hud-statusline.md +112 -0
- package/templates/.codex/rules/SHOULD-interaction.md +103 -0
- package/templates/.codex/rules/SHOULD-memory-integration.md +132 -0
- package/templates/.codex/rules/index.yaml +141 -0
- package/templates/.codex/skills/airflow-best-practices/SKILL.md +56 -0
- package/templates/.codex/skills/audit-agents/SKILL.md +116 -0
- package/templates/.codex/skills/aws-best-practices/SKILL.md +280 -0
- package/templates/.codex/skills/claude-code-bible/SKILL.md +180 -0
- package/templates/.codex/skills/claude-code-bible/scripts/fetch-docs.js +244 -0
- package/templates/.codex/skills/create-agent/SKILL.md +91 -0
- package/templates/.codex/skills/dbt-best-practices/SKILL.md +54 -0
- package/templates/.codex/skills/de-lead-routing/SKILL.md +230 -0
- package/templates/.codex/skills/dev-lead-routing/SKILL.md +253 -0
- package/templates/.codex/skills/dev-refactor/SKILL.md +123 -0
- package/templates/.codex/skills/dev-review/SKILL.md +81 -0
- package/templates/.codex/skills/docker-best-practices/SKILL.md +275 -0
- package/templates/.codex/skills/fastapi-best-practices/SKILL.md +270 -0
- package/templates/.codex/skills/fix-refs/SKILL.md +107 -0
- package/templates/.codex/skills/go-backend-best-practices/SKILL.md +338 -0
- package/templates/.codex/skills/go-best-practices/CLAUDE.md +9 -0
- package/templates/.codex/skills/go-best-practices/SKILL.md +203 -0
- package/templates/.codex/skills/help/SKILL.md +125 -0
- package/templates/.codex/skills/intent-detection/SKILL.md +215 -0
- package/templates/.codex/skills/intent-detection/patterns/agent-triggers.yaml +349 -0
- package/templates/.codex/skills/kafka-best-practices/SKILL.md +52 -0
- package/templates/.codex/skills/kotlin-best-practices/SKILL.md +256 -0
- package/templates/.codex/skills/lists/SKILL.md +78 -0
- package/templates/.codex/skills/memory-management/SKILL.md +195 -0
- package/templates/.codex/skills/memory-recall/SKILL.md +152 -0
- package/templates/.codex/skills/memory-save/SKILL.md +126 -0
- package/templates/.codex/skills/monitoring-setup/SKILL.md +115 -0
- package/templates/.codex/skills/npm-audit/SKILL.md +72 -0
- package/templates/.codex/skills/npm-publish/SKILL.md +63 -0
- package/templates/.codex/skills/npm-version/SKILL.md +75 -0
- package/templates/.codex/skills/optimize-analyze/SKILL.md +55 -0
- package/templates/.codex/skills/optimize-bundle/SKILL.md +67 -0
- package/templates/.codex/skills/optimize-report/SKILL.md +74 -0
- package/templates/.codex/skills/pipeline-architecture-patterns/SKILL.md +83 -0
- package/templates/.codex/skills/postgres-best-practices/SKILL.md +66 -0
- package/templates/.codex/skills/python-best-practices/SKILL.md +222 -0
- package/templates/.codex/skills/qa-lead-routing/SKILL.md +277 -0
- package/templates/.codex/skills/react-best-practices/SKILL.md +101 -0
- package/templates/.codex/skills/redis-best-practices/SKILL.md +83 -0
- package/templates/.codex/skills/result-aggregation/SKILL.md +164 -0
- package/templates/.codex/skills/rust-best-practices/SKILL.md +267 -0
- package/templates/.codex/skills/sauron-watch/SKILL.md +144 -0
- package/templates/.codex/skills/secretary-routing/SKILL.md +190 -0
- package/templates/.codex/skills/snowflake-best-practices/SKILL.md +65 -0
- package/templates/.codex/skills/spark-best-practices/SKILL.md +52 -0
- package/templates/.codex/skills/springboot-best-practices/SKILL.md +357 -0
- package/templates/.codex/skills/status/SKILL.md +153 -0
- package/templates/.codex/skills/supabase-postgres-best-practices/SKILL.md +99 -0
- package/templates/.codex/skills/typescript-best-practices/SKILL.md +321 -0
- package/templates/.codex/skills/update-docs/SKILL.md +140 -0
- package/templates/.codex/skills/update-external/SKILL.md +149 -0
- package/templates/.codex/skills/vercel-deploy/SKILL.md +73 -0
- package/templates/.codex/skills/web-design-guidelines/SKILL.md +118 -0
- package/templates/.codex/skills/writing-clearly-and-concisely/SKILL.md +64 -0
- package/templates/.codex/uninstall-hooks.sh +52 -0
- package/templates/AGENTS.md.en +39 -0
- package/templates/AGENTS.md.ko +39 -0
- package/templates/CLAUDE.md.en +7 -5
- package/templates/CLAUDE.md.ko +7 -5
- package/templates/guides/airflow/README.md +32 -0
- package/templates/guides/dbt/README.md +32 -0
- package/templates/guides/iceberg/README.md +49 -0
- package/templates/guides/kafka/README.md +32 -0
- package/templates/guides/postgres/README.md +58 -0
- package/templates/guides/redis/README.md +50 -0
- package/templates/guides/snowflake/README.md +32 -0
- package/templates/guides/spark/README.md +32 -0
- package/templates/manifest.codex.json +43 -0
- package/templates/manifest.json +5 -5
@@ -0,0 +1,80 @@
+---
+name: de-spark-expert
+description: Expert Apache Spark developer for PySpark and Scala distributed data processing. Use for Spark jobs (*.py, *.scala), spark-submit configs, Spark-related keywords, and large-scale data transformation.
+model: sonnet
+memory: project
+effort: high
+skills:
+- spark-best-practices
+tools:
+- Read
+- Write
+- Edit
+- Grep
+- Glob
+- Bash
+---
+
+You are an expert Apache Spark developer specialized in building performant distributed data processing applications using PySpark and Scala.
+
+## Capabilities
+
+- Write performant Spark jobs using DataFrame and Dataset APIs
+- Optimize query execution with broadcast joins and hint-based tuning
+- Design proper partitioning and bucketing strategies
+- Implement Structured Streaming applications
+- Configure resource management (executor/driver memory, dynamic allocation)
+- Optimize storage formats (Parquet, ORC, Delta, Iceberg)
+- Debug and profile Spark job performance via Spark UI
+
+## Key Expertise Areas
+
+### Performance Optimization (CRITICAL)
+- Broadcast joins for small-large table joins (broadcast(df))
+- Hint-based optimization (SHUFFLE_HASH, SHUFFLE_MERGE, COALESCE)
+- Partition pruning and predicate pushdown
+- Avoid shuffles: coalesce vs repartition
+- Caching and persistence strategies
+
+### Data Processing (CRITICAL)
+- DataFrame API for structured transformations
+- Spark SQL for analytical queries
+- UDF design and optimization (prefer built-in functions)
+- Window functions and aggregations
+- Schema handling and evolution
+
+### Resource Management (HIGH)
+- Executor and driver memory sizing
+- Dynamic resource allocation
+- Cluster configuration for different workloads
+- Serialization (Kryo vs Java)
+
+### Streaming (HIGH)
+- Structured Streaming patterns
+- Watermarks and late data handling
+- Output modes (append, complete, update)
+- Exactly-once processing guarantees
+
+### Storage (MEDIUM)
+- Parquet/ORC columnar format optimization
+- Partition strategies for file-based storage
+- Small file problem mitigation
+- Table format integration (Delta Lake, Iceberg)
+
+## Skills
+
+Apply the **spark-best-practices** skill for core Spark development guidelines.
+
+## Reference Guides
+
+Consult the **spark** guide at `guides/spark/` for reference documentation from official Apache Spark docs.
+
+## Workflow
+
+1. Understand data processing requirements
+2. Apply spark-best-practices skill
+3. Reference spark guide for specific patterns
+4. Design job with proper partitioning and joins
+5. Write Spark code using DataFrame/SQL API
+6. Optimize with appropriate hints and caching
+7. Test and profile via Spark UI
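The Resource Management items in the agent definition above (executor and driver memory, dynamic allocation, Kryo serialization) correspond to standard Spark configuration properties. The following is a minimal sketch, not part of the package, of assembling a `spark-submit` command line from such settings. The helper name `build_spark_submit`, the job path `jobs/etl_job.py`, and every value in `EXAMPLE_CONF` are illustrative assumptions.

```python
from typing import Dict, List


def build_spark_submit(app: str, conf: Dict[str, str]) -> List[str]:
    """Return a spark-submit argv list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key in sorted(conf):  # sorted so the command line is stable across runs
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)  # the application file goes after all options
    return args


# Illustrative values only (assumptions); size them for the real cluster.
EXAMPLE_CONF = {
    "spark.executor.memory": "8g",
    "spark.driver.memory": "4g",
    "spark.dynamicAllocation.enabled": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

if __name__ == "__main__":
    # "jobs/etl_job.py" is a hypothetical job path.
    print(" ".join(build_spark_submit("jobs/etl_job.py", EXAMPLE_CONF)))
```

Keeping the settings in a plain mapping makes it straightforward to maintain separate profiles for different workloads, which is one way to read the "cluster configuration for different workloads" item.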
@@ -50,6 +50,32 @@ Simple file operations
 When Agent Teams is not available
 ```
 
+## CRITICAL: Self-Check Before Spawning Task Tool
+
+```
+╔══════════════════════════════════════════════════════════════════╗
+║ BEFORE USING TASK TOOL, ASK YOURSELF: ║
+║ ║
+║ 1. Is Agent Teams available? (TeamCreate tool exists?) ║
+║ → YES: Continue to step 2 ║
+║ → NO: Task tool is fine ║
+║ ║
+║ 2. Does this task qualify for Agent Teams? ║
+║ - 3+ agents needed? ║
+║ - Shared state or coordination required? ║
+║ - Inter-agent communication adds value? ║
+║ - Complex workflow (create + review + integrate)? ║
+║ → ANY YES: MUST use Agent Teams ║
+║ → ALL NO: Task tool is fine ║
+║ ║
+║ 3. Am I defaulting to Task tool out of habit? ║
+║ → STOP. Evaluate honestly against the decision matrix. ║
+║ ║
+║ Using Task tool for Agent Teams-qualifying tasks ║
+║ when Agent Teams is available = Rule violation ║
+╚══════════════════════════════════════════════════════════════════╝
+```
+
 ## Team Composition Guidelines
 
 ### Standard Team Patterns
@@ -133,5 +159,25 @@ Agent Teams consume more tokens due to:
 Rule of thumb:
 If task takes < 3 minutes with Task tool -> Use Task tool
 If task needs inter-agent communication -> Use Agent Teams
-If unsure ->
+If unsure -> Default to Agent Teams when available, downgrade to Task tool only if clearly unqualified
+```
+
+## Enforcement
+
+```
+╔══════════════════════════════════════════════════════════════════╗
+║ VIOLATION EXAMPLES: ║
+║ ║
+║ ✗ Agent Teams available + 4 parallel Task() calls for ║
+║ coordinated work (creating agents + guides + routing skill) ║
+║ ✗ Spawning Task tool agents that need to share results ║
+║ ✗ Defaulting to Task tool without checking Agent Teams first ║
+║ ║
+║ CORRECT EXAMPLES: ║
+║ ║
+║ ✓ TeamCreate → TaskCreate → spawn team members for ║
+║ multi-file coordinated creation ║
+║ ✓ Task tool for single independent delegation ║
+║ ✓ Task tool when Agent Teams is not available ║
+╚══════════════════════════════════════════════════════════════════╝
 ```
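The self-check added to SHOULD-agent-teams.md is a two-step decision: if Agent Teams is unavailable the Task tool is fine, otherwise any single qualifying criterion requires Agent Teams. A minimal sketch of that rule as a function follows. The names `TaskProfile` and `choose_tool` are hypothetical and do not appear in the package.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a task, one field per self-check question."""
    agents_needed: int
    shared_state: bool = False
    inter_agent_communication: bool = False
    complex_workflow: bool = False  # e.g. create + review + integrate


def choose_tool(profile: TaskProfile, agent_teams_available: bool) -> str:
    # Step 1: without Agent Teams, the Task tool is the only option.
    if not agent_teams_available:
        return "Task tool"
    # Step 2: ANY "yes" qualifies the task for Agent Teams.
    qualifies = (
        profile.agents_needed >= 3
        or profile.shared_state
        or profile.inter_agent_communication
        or profile.complex_workflow
    )
    return "Agent Teams" if qualifies else "Task tool"


if __name__ == "__main__":
    print(choose_tool(TaskProfile(agents_needed=4), agent_teams_available=True))
```

The third self-check question (defaulting out of habit) is a prompt for honest evaluation rather than a computable condition, so it is not modeled here.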
@@ -0,0 +1,56 @@
+---
+name: airflow-best-practices
+description: Apache Airflow best practices for DAG authoring, testing, and production deployment
+user-invocable: false
+---
+
+# Apache Airflow Best Practices
+
+## DAG Authoring
+
+### Top-Level Code (CRITICAL)
+- Avoid heavy computation at module level (executed on every DAG parse)
+- Minimize imports at module level
+- Use `@task` decorator (TaskFlow API) for Python tasks
+- Keep DAG file under 1000 lines
+
+### Scheduling
+- Use cron expressions or timetables
+- Set `catchup=False` for most cases
+- Use data-aware scheduling (datasets) for dependencies
+- Configure SLA monitoring
+
+### Task Dependencies
+- Use `>>` / `<<` for clarity
+- Group related tasks with TaskGroup
+- Avoid deep nesting (max 3 levels)
+
+## Testing
+
+### Unit Tests
+- Test DAG import without errors
+- Detect cycles in dependencies
+- Mock external connections
+- Test task logic independently
+
+### Integration Tests
+- Use Airflow test mode
+- Validate end-to-end workflows
+- Test with sample data
+
+## Production Deployment
+
+### Performance
+- Lazy-load heavy libraries inside tasks
+- Use connection pooling
+- Minimize DAG parse time
+- Enable parallelism
+
+### Reliability
+- Set appropriate retries and retry_delay
+- Use SLA callbacks for monitoring
+- Implement proper error handling
+- Log important events
+
+## References
+- [Airflow Best Practices](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html)
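The "Detect cycles in dependencies" item in the skill above can be illustrated without Airflow itself. The sketch below uses only the Python standard library (`graphlib`, Python 3.9+) on a plain mapping of task ids. The function name `find_cycle` and the mapping format are assumptions made for illustration and are not Airflow APIs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def find_cycle(downstream: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the tasks forming a cycle, or None if the graph is acyclic.

    `downstream` maps each task id to the tasks that run after it,
    mirroring `task >> other_task`.
    """
    sorter = TopologicalSorter()
    for task, successors in downstream.items():
        sorter.add(task)
        for successor in successors:
            sorter.add(successor, task)  # successor depends on task
    try:
        sorter.prepare()  # raises CycleError if any cycle exists
    except CycleError as err:
        # The second element of args is the cycle as a list of nodes,
        # with the first and last node being the same.
        return list(err.args[1])
    return None


if __name__ == "__main__":
    print(find_cycle({"extract": {"transform"}, "transform": {"load"}, "load": set()}))
```

In a unit test, a check like this would typically run against the dependency edges extracted from each DAG, failing the test when the result is not `None`.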
@@ -0,0 +1,54 @@
+---
+name: dbt-best-practices
+description: dbt best practices for SQL modeling, testing, and analytics engineering workflows
+user-invocable: false
+---
+
+# dbt Best Practices
+
+## Project Structure
+
+### Layer Organization (CRITICAL)
+- **Staging**: 1:1 with source tables (`stg_{source}__{entity}`)
+- **Intermediate**: Business logic composition (`int_{entity}_{verb}`)
+- **Marts**: Final consumption models (`fct_{entity}`, `dim_{entity}`)
+
+### Materialization Strategy
+- Staging: `view` (lightweight, always fresh)
+- Intermediate: `ephemeral` or `view`
+- Marts: `table` or `incremental`
+
+## Modeling Patterns
+
+### Naming Conventions
+- Staging: `stg_source__table`
+- Intermediate: `int_entity_verb`
+- Facts: `fct_entity`
+- Dimensions: `dim_entity`
+
+### Incremental Models
+- Use `is_incremental()` macro
+- Define `unique_key` for merge strategy
+- Choose strategy: append, merge, delete+insert
+
+## Testing
+
+### Schema Tests
+- `unique`, `not_null` for primary keys
+- `relationships` for foreign keys
+- `accepted_values` for enums
+- Custom data tests
+
+### Source Freshness
+- Configure `loaded_at_field`
+- Set freshness thresholds
+
+## Documentation
+
+- Add descriptions to models
+- Document column definitions
+- Use `doc` blocks for reusable text
+- Generate and host dbt docs
+
+## References
+- [dbt Best Practices](https://docs.getdbt.com/guides/best-practices)
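The layer naming conventions in the skill above (`stg_{source}__{entity}`, `int_{entity}_{verb}`, `fct_{entity}`, `dim_{entity}`) lend themselves to an automated check. The sketch below is one possible reading of those patterns as regular expressions. It assumes lowercase snake_case names, and the names `LAYER_PATTERNS` and `check_model_name` are hypothetical, not part of dbt or of this package.

```python
import re

# One snake_case token group: letters/digits separated by single underscores.
_NAME = r"[a-z0-9]+(?:_[a-z0-9]+)*"

LAYER_PATTERNS = {
    # stg_{source}__{entity}: a double underscore separates source and entity.
    "staging": re.compile(rf"^stg_{_NAME}__{_NAME}$"),
    # int_{entity}_{verb}: at least two parts after the prefix.
    "intermediate": re.compile(r"^int_[a-z0-9]+(?:_[a-z0-9]+)+$"),
    # fct_{entity} or dim_{entity}.
    "marts": re.compile(rf"^(?:fct|dim)_{_NAME}$"),
}


def check_model_name(layer: str, name: str) -> bool:
    """Return True if `name` follows the convention for `layer`."""
    return bool(LAYER_PATTERNS[layer].match(name))


if __name__ == "__main__":
    # Model names below are made-up examples.
    for layer, name in [("staging", "stg_stripe__payments"), ("marts", "orders")]:
        print(layer, name, check_model_name(layer, name))
```

A check like this could run in CI over the file names under `models/`, though how the layer is derived from each path is left open here.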
@@ -0,0 +1,230 @@
+---
+name: de-lead-routing
+description: Routes data engineering tasks to the correct DE expert agent. Use when user requests data pipeline design, DAG authoring, SQL modeling, stream processing, or warehouse optimization.
+user-invocable: false
+---
+
+# DE Lead Routing Skill
+
+## Purpose
+
+Routes data engineering tasks to appropriate DE expert agents. This skill contains the coordination logic for orchestrating data engineering agents across orchestration, modeling, processing, streaming, and warehouse specializations.
+
+## Engineers Under Management
+
+| Type | Agents | Purpose |
+|------|--------|---------|
+| de/orchestration | de-airflow-expert | DAG authoring, scheduling, testing |
+| de/modeling | de-dbt-expert | SQL modeling, testing, documentation |
+| de/processing | de-spark-expert | Distributed data processing |
+| de/streaming | de-kafka-expert | Event streaming, topic design |
+| de/warehouse | de-snowflake-expert | Cloud DWH, query optimization |
+| de/architecture | de-pipeline-expert | Pipeline design, cross-tool patterns |
+
+## Tool/Framework Detection
+
+### Keyword Mapping
+
+| Keyword | Agent |
+|---------|-------|
+| "airflow", "dag", "scheduling", "orchestration" | de-airflow-expert |
+| "dbt", "modeling", "sql model", "analytics engineering" | de-dbt-expert |
+| "spark", "pyspark", "distributed processing", "distributed" | de-spark-expert |
+| "kafka", "streaming", "event", "consumer", "producer" | de-kafka-expert |
+| "snowflake", "warehouse", "clustering key" | de-snowflake-expert |
+| "pipeline", "ETL", "ELT", "data quality", "lineage" | de-pipeline-expert |
+| "iceberg", "table format" | de-snowflake-expert or de-pipeline-expert |
+
+### File Pattern Mapping
+
+| Pattern | Agent |
+|---------|-------|
+| `dags/*.py`, `airflow.cfg`, `airflow_settings.yaml` | de-airflow-expert |
+| `models/**/*.sql`, `dbt_project.yml`, `schema.yml` | de-dbt-expert |
+| Spark job files, `spark-submit` configs | de-spark-expert |
+| Kafka configs, `*.properties` (Kafka), `streams/*.java` | de-kafka-expert |
+| Snowflake SQL, warehouse DDL | de-snowflake-expert |
+
+## Command Routing
+
+```
+DE Request → Detection → Expert Agent
+
+Airflow DAG → de-airflow-expert
+dbt model → de-dbt-expert
+Spark job → de-spark-expert
+Kafka topic → de-kafka-expert
+Snowflake → de-snowflake-expert
+Pipeline → de-pipeline-expert
+Multi-tool → Multiple experts (parallel)
+```
+
+## Routing Rules
+
+### 1. Pipeline Development Workflow
+
+```
+1. Receive pipeline task request
+2. Identify tools and components:
+- DAG orchestration → de-airflow-expert
+- SQL transformations → de-dbt-expert
+- Distributed processing → de-spark-expert
+- Event streaming → de-kafka-expert
+- Warehouse operations → de-snowflake-expert
+- Architecture decisions → de-pipeline-expert
+3. Select appropriate experts
+4. Distribute tasks (parallel if 2+ tools)
+5. Aggregate results
+6. Present unified report
+```
+
+Example:
+```
+User: "Design a pipeline that runs dbt models from Airflow and loads into Snowflake"
+
+Detection:
+- Airflow DAG → de-airflow-expert
+- dbt model → de-dbt-expert
+- Snowflake loading → de-snowflake-expert
+- Pipeline architecture → de-pipeline-expert
+
+Route (parallel where independent):
+Task(de-pipeline-expert → overall architecture design)
+Task(de-airflow-expert → DAG structure)
+Task(de-dbt-expert → model design)
+Task(de-snowflake-expert → warehouse setup)
+
+Aggregate:
+Pipeline architecture defined
+Airflow DAG: 5 tasks designed
+dbt: 12 models structured
+Snowflake: warehouse + schema configured
+```
+
+### 2. Data Quality Workflow
+
+```
+1. Analyze data quality requirements
+2. Route to appropriate experts:
+- dbt tests → de-dbt-expert
+- Pipeline validation → de-pipeline-expert
+- Source freshness → de-airflow-expert
+3. Coordinate cross-tool quality strategy
+```
+
+### 3. Multi-Tool Projects
+
+For projects spanning multiple DE tools:
+
+```
+1. Detect all DE tools in project
+2. Identify primary tool (most files/configs)
+3. Route to appropriate experts:
+- If task spans multiple tools → parallel experts
+- If task is tool-specific → single expert
+4. Coordinate cross-tool consistency
+```
+
+## Sub-agent Model Selection
+
+### Model Mapping by Task Type
+
+| Task Type | Recommended Model | Reason |
+|-----------|-------------------|--------|
+| Pipeline architecture | `opus` | Deep reasoning required |
+| DAG/model review | `sonnet` | Balanced quality judgment |
+| Implementation | `sonnet` | Standard code generation |
+| Quick validation | `haiku` | Fast response |
+
+### Model Mapping by Agent
+
+| Agent | Default Model | Alternative |
+|-------|---------------|-------------|
+| de-pipeline-expert | `sonnet` | `opus` for architecture |
+| de-airflow-expert | `sonnet` | `haiku` for DAG validation |
+| de-dbt-expert | `sonnet` | `haiku` for test checks |
+| de-spark-expert | `sonnet` | `opus` for optimization |
+| de-kafka-expert | `sonnet` | `opus` for topology design |
+| de-snowflake-expert | `sonnet` | `opus` for warehouse design |
+
+### Task Call Examples
+
+```
+# Complex pipeline architecture
+Task(
+subagent_type: "general-purpose",
+prompt: "Design end-to-end pipeline architecture following de-pipeline-expert guidelines",
+model: "opus"
+)
+
+# Standard DAG review
+Task(
+subagent_type: "general-purpose",
+prompt: "Review Airflow DAGs in dags/ following de-airflow-expert guidelines",
+model: "sonnet"
+)
+
+# Quick dbt test validation
+Task(
+subagent_type: "Explore",
+prompt: "Find all dbt models missing schema tests",
+model: "haiku"
+)
+```
+
+## Parallel Execution
+
+Following R009:
+- Maximum 4 parallel instances
+- Independent tool/module operations
+- Coordinate cross-tool consistency
+
+Example:
+```
+User: "Review all DE configs"
+
+Detection:
+- dags/ → de-airflow-expert
+- models/ → de-dbt-expert
+- kafka/ → de-kafka-expert
+
+Route (parallel):
+Task(de-airflow-expert role → review dags/, model: "sonnet")
+Task(de-dbt-expert role → review models/, model: "sonnet")
+Task(de-kafka-expert role → review kafka/, model: "sonnet")
+```
+
+## Display Format
+
+```
+[Analyzing] Detected: Airflow, dbt, Snowflake
+
+[Delegating] de-airflow-expert:sonnet → DAG design
+[Delegating] de-dbt-expert:sonnet → Model structure
+[Delegating] de-snowflake-expert:sonnet → Warehouse config
+
+[Progress] ███████████░ 2/3 experts completed
+
+[Summary]
+Airflow: DAG with 5 tasks designed
+dbt: 12 models across 3 layers
+Snowflake: Warehouse + schema configured
+
+Pipeline design completed.
+```
+
+## Integration with Other Routing Skills
+
+- **dev-lead-routing**: Hands off to DE lead when data engineering keywords detected
+- **secretary-routing**: DE agents accessible through secretary for management tasks
+- **qa-lead-routing**: Coordinates with QA for data quality testing
+
+## Usage
+
+This skill is NOT user-invocable. It should be automatically triggered when the main conversation detects data engineering intent.
+
+Detection criteria:
+- User requests pipeline design or data engineering
+- User mentions DE tool names (Airflow, dbt, Spark, Kafka, Snowflake)
+- User provides DE-related file paths (dags/, models/, etc.)
+- User requests data quality or lineage work
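The Keyword Mapping table in the routing skill above amounts to a lookup from keywords to agents, with multi-tool requests fanning out to several experts. The sketch below shows one way such detection might work. It is an assumption-laden illustration, not the package's implementation: the names `KEYWORD_AGENTS` and `route` are hypothetical, matching is whole-word and case-insensitive by choice, and the ambiguous "iceberg" / "table format" row is left out because the table does not say how to pick between its two agents.

```python
import re
from typing import List

# Keyword groups copied from the Keyword Mapping table, in table order.
KEYWORD_AGENTS = [
    (("airflow", "dag", "scheduling", "orchestration"), "de-airflow-expert"),
    (("dbt", "modeling", "sql model", "analytics engineering"), "de-dbt-expert"),
    (("spark", "pyspark", "distributed processing", "distributed"), "de-spark-expert"),
    (("kafka", "streaming", "event", "consumer", "producer"), "de-kafka-expert"),
    (("snowflake", "warehouse", "clustering key"), "de-snowflake-expert"),
    (("pipeline", "etl", "elt", "data quality", "lineage"), "de-pipeline-expert"),
]


def route(request: str) -> List[str]:
    """Return every agent whose keywords appear in the request, in table order."""
    text = request.lower()
    agents = []
    for keywords, agent in KEYWORD_AGENTS:
        # \b anchors make "event" match as a word, not inside "prevent".
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            agents.append(agent)
    return agents


if __name__ == "__main__":
    print(route("Design a pipeline that runs dbt models from Airflow and loads into Snowflake"))
```

For the example request from the skill, this selects the same four experts listed under "Detection". The R009 limit of four parallel instances is not enforced in this sketch.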
@@ -18,6 +18,9 @@ Routes development tasks to appropriate language and framework expert agents. Th
|
|
|
18
18
|
| sw-engineer/frontend | fe-vercel-agent, fe-vuejs-agent, fe-svelte-agent | Frontend frameworks |
|
|
19
19
|
| sw-engineer/backend | be-fastapi-expert, be-springboot-expert, be-go-backend-expert, be-nestjs-expert, be-express-expert | Backend frameworks |
|
|
20
20
|
| sw-engineer/tooling | tool-npm-expert, tool-optimizer, tool-bun-expert | Build tools and optimization |
|
|
21
|
+
| sw-engineer/database | db-supabase-expert, db-postgres-expert, db-redis-expert | Database design and optimization |
|
|
22
|
+
| sw-architect | arch-documenter, arch-speckit-agent | Architecture documentation and spec-driven development |
|
|
23
|
+
| infra-engineer | infra-docker-expert, infra-aws-expert | Container and cloud infrastructure |
|
|
21
24
|
|
|
22
25
|
## Language/Framework Detection
|
|
23
26
|
|
|
@@ -34,6 +37,11 @@ Routes development tasks to appropriate language and framework expert agents. Th
|
|
|
34
37
|
| `.js`, `.jsx` (React) | fe-vercel-agent | React/Next.js |
|
|
35
38
|
| `.vue` | fe-vuejs-agent | Vue.js |
|
|
36
39
|
| `.svelte` | fe-svelte-agent | Svelte |
|
|
40
|
+
| `.sql` (PostgreSQL) | db-postgres-expert | PostgreSQL |
|
|
41
|
+
| `.sql` (Supabase) | db-supabase-expert | Supabase PostgreSQL |
|
|
42
|
+
| `Dockerfile`, `*.dockerfile` | infra-docker-expert | Docker |
|
|
43
|
+
| `*.tf`, `*.tfvars` | infra-aws-expert | Terraform/IaC |
|
|
44
|
+
| `*.yaml`, `*.yml` (CloudFormation) | infra-aws-expert | AWS CloudFormation |
|
|
37
45
|
|
|
38
46
|
### Keyword Mapping
|
|
39
47
|
|
|
@@ -55,6 +63,13 @@ Routes development tasks to appropriate language and framework expert agents. Th
 | "npm" | tool-npm-expert |
 | "optimize", "bundle" | tool-optimizer |
 | "bun" | tool-bun-expert |
+| "postgres", "postgresql", "pg_stat", "psql" | db-postgres-expert |
+| "redis", "cache", "pub/sub", "sorted set" | db-redis-expert |
+| "supabase", "rls", "edge function" | db-supabase-expert |
+| "docker", "dockerfile", "container", "compose" | infra-docker-expert |
+| "aws", "cloudformation", "cdk", "terraform", "vpc", "iam", "s3", "lambda" | infra-aws-expert |
+| "architecture", "adr", "openapi", "swagger", "diagram" | arch-documenter |
+| "spec", "specification", "tdd", "requirements" | arch-speckit-agent |
 
 ## Command Routing
 
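The keyword rows added in the hunk above can be read as a first-match lookup table. The following is only an illustrative sketch of that reading: the package does not ship this code, `KEYWORD_ROUTES` and `route_by_keyword` are hypothetical names, and both the row order and the word-boundary matching rule are assumptions (the skill file does not say how ties or partial words are resolved).

```python
import re
from typing import Optional

# Only the rows added in this hunk, in the order they appear.
# Treating the order as a precedence rule is an assumption of this sketch.
KEYWORD_ROUTES = [
    (("postgres", "postgresql", "pg_stat", "psql"), "db-postgres-expert"),
    (("redis", "cache", "pub/sub", "sorted set"), "db-redis-expert"),
    (("supabase", "rls", "edge function"), "db-supabase-expert"),
    (("docker", "dockerfile", "container", "compose"), "infra-docker-expert"),
    (("aws", "cloudformation", "cdk", "terraform", "vpc", "iam", "s3", "lambda"),
     "infra-aws-expert"),
    (("architecture", "adr", "openapi", "swagger", "diagram"), "arch-documenter"),
    (("spec", "specification", "tdd", "requirements"), "arch-speckit-agent"),
]


def _mentions(text: str, keyword: str) -> bool:
    """True if keyword occurs in text and is not embedded in a longer
    run of letters or digits (so "iam" does not match "diamond")."""
    pattern = r"(?<![a-z0-9])" + re.escape(keyword) + r"(?![a-z0-9])"
    return re.search(pattern, text) is not None


def route_by_keyword(request: str) -> Optional[str]:
    """Return the first agent whose keyword list matches, else None."""
    text = request.lower()
    for keywords, agent in KEYWORD_ROUTES:
        if any(_mentions(text, keyword) for keyword in keywords):
            return agent
    return None
```

Under these assumptions a request that mentions both Supabase and Postgres goes to `db-postgres-expert`, because that row comes first; a real router would need an explicit tie-breaking rule for such overlaps.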
@@ -0,0 +1,52 @@
+---
+name: kafka-best-practices
+description: Apache Kafka best practices for event streaming, topic design, and producer-consumer patterns
+user-invocable: false
+---
+
+# Apache Kafka Best Practices
+
+## Producer Patterns
+
+### Idempotent Producer (CRITICAL)
+- Enable `enable.idempotence=true`
+- Prevents duplicate messages
+- Requires `acks=all`, `retries > 0`, `max.in.flight.requests.per.connection <= 5`
+
+### Exactly-Once Semantics
+- Use transactional API: `initTransactions()`, `beginTransaction()`, `commitTransaction()`
+- For exactly-once end-to-end processing
+
+### Performance
+- Batching: `linger.ms` (wait for batch to fill)
+- Compression: `compression.type=snappy` or `lz4`
+- `batch.size`: 16KB default, tune based on message size
+
+## Consumer Patterns
+
+### Offset Management
+- Auto-commit: `enable.auto.commit=true` (at-least-once)
+- Manual commit: `commitSync()` or `commitAsync()` (better control)
+
+### Rebalancing
+- Cooperative sticky assignor: minimal rebalancing disruption
+- `session.timeout.ms` and `heartbeat.interval.ms` tuning
+
+### At-Least-Once vs Exactly-Once
+- At-least-once: default, idempotent processing required
+- Exactly-once: transactional consumer + producer
+
+## Topic Design
+
+### Partitioning
+- Partition count: based on throughput (MB/s ÷ partition throughput)
+- Key-based partitioning for ordering guarantees
+- More partitions = higher throughput (but more overhead)
+
+### Retention
+- Time-based: `retention.ms`
+- Size-based: `retention.bytes`
+- Log compaction: for changelog topics (`cleanup.policy=compact`)
+
+## References
+- [Kafka Documentation](https://kafka.apache.org/documentation/)
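Two rules in the Kafka skill above are mechanical enough to check in code: the settings the idempotent producer requires, and the partition-count estimate (target MB/s ÷ per-partition MB/s). The sketch below uses only the Python standard library and no Kafka client. The function names are hypothetical, and it assumes all four settings are passed explicitly rather than inherited from broker or client defaults, which vary by Kafka version.

```python
import math
from typing import Dict, List


def idempotence_violations(config: Dict[str, object]) -> List[str]:
    """List the idempotent-producer requirements that a config dict breaks.

    Assumption of this sketch: a missing key counts as a violation,
    since client defaults differ between Kafka versions.
    """
    problems = []
    if str(config.get("enable.idempotence")).lower() != "true":
        problems.append("enable.idempotence must be true")
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be all (or its alias -1)")
    retries = config.get("retries")
    if retries is None or int(retries) <= 0:
        problems.append("retries must be > 0")
    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is None or int(in_flight) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems


def estimate_partitions(target_mb_s: float, per_partition_mb_s: float) -> int:
    """Partition count = target throughput / per-partition throughput, rounded up."""
    if per_partition_mb_s <= 0:
        raise ValueError("per-partition throughput must be positive")
    return max(1, math.ceil(target_mb_s / per_partition_mb_s))
```

The per-partition throughput has to be measured on the cluster in question; the estimate is a starting point and does not account for consumer parallelism or future growth.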
@@ -0,0 +1,83 @@
+---
+name: pipeline-architecture-patterns
+description: Data pipeline architecture patterns for ETL/ELT design, orchestration, and data quality frameworks
+user-invocable: false
+---
+
+# Data Pipeline Architecture Patterns
+
+## Pipeline Architectures
+
+### ETL vs ELT (CRITICAL)
+- **ETL**: Extract → Transform (staging) → Load
+  - Traditional, on-premise data warehouses
+  - Pre-aggregation, complex transformations
+- **ELT**: Extract → Load (raw) → Transform (in warehouse)
+  - Cloud warehouses (Snowflake, BigQuery)
+  - Leverage warehouse compute power
+
+### Lambda Architecture
+- Batch layer: historical data processing
+- Speed layer: real-time stream processing
+- Serving layer: merge batch + real-time views
+- Complexity: maintain two codebases
+
+### Kappa Architecture
+- Stream-only processing
+- Single codebase for batch + real-time
+- Reprocessing via replay
+- Simpler than Lambda
+
+### Medallion Architecture
+- **Bronze**: Raw data (append-only)
+- **Silver**: Cleaned, conformed data
+- **Gold**: Business-level aggregations
+- Databricks pattern
+
+## Orchestration Patterns
+
+### DAG-Based Orchestration
+- Airflow, Prefect, Dagster
+- Task dependencies as DAG
+- Retries, backfills, scheduling
+
+### Event-Driven Orchestration
+- Kafka, Pub/Sub triggers
+- Real-time, low-latency
+- Decoupled producers/consumers
+
+### Hybrid Orchestration
+- Scheduled batch + event-driven streams
+- Example: Airflow DAG triggered by Kafka event
+
+## Data Quality Frameworks
+
+### Data Contracts (CRITICAL)
+- Define schema, freshness, volume expectations
+- Producer-consumer agreement
+- Break build on violation
+
+### Validation Frameworks
+- **Great Expectations**: Python-based expectations
+- **dbt tests**: SQL-based tests
+- **Soda**: YAML-based checks
+
+### Data Lineage
+- Track data origin and transformations
+- Debug data quality issues
+- Compliance and auditing
+
+## Idempotency Patterns
+
+### Idempotent Design (CRITICAL)
+- Same input → same output (no side effects)
+- Upserts instead of inserts
+- Partition replacement instead of append
+
+### Deduplication
+- Use unique keys
+- Window-based deduplication
+- Consumer group offset management
+
+## References
+- [Data Engineering Design Patterns](https://www.oreilly.com/library/view/data-engineering-design/9781098130725/)
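The two idempotency rules in the skill above ("upserts instead of inserts" and "partition replacement instead of append") can be illustrated with an in-memory SQLite table. This is a minimal sketch under stated assumptions: the `orders` table and all function names are hypothetical, and SQLite's `INSERT OR REPLACE` stands in for whatever upsert or `MERGE` statement the target warehouse provides.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, str, float]  # (order_id, day, amount)


def make_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by order_id (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, day TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def upsert_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Upsert by primary key: re-running the same batch adds no rows."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, day, amount) VALUES (?, ?, ?)",
            rows,
        )


def replace_partition(conn: sqlite3.Connection, day: str, rows: Iterable[Row]) -> None:
    """Replace one day's rows in a single transaction instead of appending."""
    with conn:
        conn.execute("DELETE FROM orders WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO orders (order_id, day, amount) VALUES (?, ?, ?)", rows
        )


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Running either loader twice with the same input leaves the table in the same state as running it once, which is the property a retried or backfilled pipeline task relies on.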