dbt-cube-sync 0.1.0a4__tar.gz → 0.1.0a6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of dbt-cube-sync might be problematic.

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: dbt-cube-sync
- Version: 0.1.0a4
+ Version: 0.1.0a6
  Summary: Synchronization tool for dbt models to Cube.js schemas and BI tools
  Author: Ponder
  Requires-Python: >=3.9,<4.0
@@ -16,6 +16,7 @@ Requires-Dist: jinja2 (>=3.1.2,<4.0.0)
  Requires-Dist: pydantic (>=2.5.0,<3.0.0)
  Requires-Dist: pyyaml (>=6.0,<7.0)
  Requires-Dist: requests (>=2.31.0,<3.0.0)
+ Requires-Dist: sqlalchemy (>=2.0.0,<3.0.0)
  Description-Content-Type: text/markdown

  # dbt-cube-sync
@@ -25,6 +26,8 @@ A powerful synchronization tool that creates a seamless pipeline from dbt models
  ## Features

  - 🔄 **dbt → Cube.js**: Auto-generate Cube.js schemas from dbt models with metrics
+ - 🗃️ **Flexible Data Type Source**: Get column types from catalog OR directly from database via SQLAlchemy
+ - 🎯 **Model Filtering**: Process specific models instead of all models
  - 📊 **Cube.js → BI Tools**: Sync schemas to multiple BI platforms
  - 🏗️ **Extensible Architecture**: Plugin-based connector system for easy BI tool integration
  - 🐳 **Docker Support**: Containerized execution with orchestration support
@@ -46,6 +49,27 @@ poetry install
  poetry run dbt-cube-sync --help
  ```

+ ### Database Drivers (for SQLAlchemy URI feature)
+
+ If you want to use the `--sqlalchemy-uri` option to fetch column types directly from your database, you'll need to install the appropriate database driver:
+
+ ```bash
+ # PostgreSQL
+ poetry add psycopg2-binary
+
+ # MySQL
+ poetry add pymysql
+
+ # Snowflake
+ poetry add snowflake-sqlalchemy
+
+ # BigQuery
+ poetry add sqlalchemy-bigquery
+
+ # Redshift
+ poetry add sqlalchemy-redshift
+ ```
+
  ### Using Docker

  ```bash
@@ -55,42 +79,43 @@ docker run --rm dbt-cube-sync --help

  ## Quick Start

- ### 1. Create Configuration File
+ ### 1. Generate Cube.js Schemas from dbt

+ **Option A: Using catalog file (traditional method)**
  ```bash
- # Create sample config
- dbt-cube-sync create-config sync-config.yaml
-
- # Edit the config file with your BI tool credentials
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --catalog ./target/catalog.json \
+ --output ./cube_output
  ```

- ### 2. Generate Cube.js Schemas
-
+ **Option B: Using database connection (no catalog needed)**
  ```bash
- # Generate from dbt manifest
- dbt-cube-sync generate-cubes \\
- --dbt-manifest ./DbtEducationalDataProject/target/manifest.json \\
- --output-dir ./cube/conf/cube_output
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
+ --output ./cube_output
  ```

- ### 3. Sync to BI Tool
-
+ **Option C: Filter specific models**
  ```bash
- # Sync to Superset
- dbt-cube-sync sync-bi superset \\
- --cube-dir ./cube/conf/cube_output \\
- --config-file ./sync-config.yaml
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
+ --models orders,customers,products \
+ --output ./cube_output
  ```

- ### 4. Full Pipeline
+ ### 2. Sync to BI Tool (Optional)

  ```bash
- # Complete dbt → Cube.js → Superset pipeline
- dbt-cube-sync full-sync \\
- --dbt-manifest ./DbtEducationalDataProject/target/manifest.json \\
- --cube-dir ./cube/conf/cube_output \\
- --bi-connector superset \\
- --config-file ./sync-config.yaml
+ # Sync to Superset
+ dbt-cube-sync cube-to-bi superset \
+ --cube-files ./cube_output \
+ --url http://localhost:8088 \
+ --username admin \
+ --password admin \
+ --cube-connection-name Cube
  ```

  ## Configuration
@@ -119,23 +144,50 @@ connectors:

  ## CLI Commands

- ### `generate-cubes`
+ ### `dbt-to-cube`
  Generate Cube.js schema files from dbt models.

  **Options:**
- - `--dbt-manifest` / `-m`: Path to dbt manifest.json file
- - `--output-dir` / `-o`: Output directory for Cube.js files
- - `--template-dir` / `-t`: Directory containing Cube.js templates
+ - `--manifest` / `-m`: Path to dbt manifest.json file (required)
+ - `--catalog` / `-c`: Path to dbt catalog.json file (optional if --sqlalchemy-uri is provided)
+ - `--sqlalchemy-uri` / `-s`: SQLAlchemy database URI for fetching column types (optional if --catalog is provided)
+ - Example: `postgresql://user:password@localhost:5432/database`
+ - Example: `mysql://user:password@localhost:3306/database`
+ - Example: `snowflake://user:password@account/database/schema`
+ - `--models`: Comma-separated list of model names to process (optional, processes all if not specified)
+ - Example: `--models model1,model2,model3`
+ - `--output` / `-o`: Output directory for Cube.js files (required)
+ - `--template-dir` / `-t`: Directory containing Cube.js templates (default: ./cube/templates)
+
+ **Examples:**
+ ```bash
+ # Using catalog file
+ dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/
+
+ # Using database connection (no catalog needed)
+ dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@localhost/db -o output/

- ### `sync-bi`
+ # Filter specific models
+ dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@localhost/db --models users,orders -o output/
+ ```
+
+ ### `cube-to-bi`
  Sync Cube.js schemas to BI tool datasets.

  **Arguments:**
- - `connector`: BI tool type (`superset`, `tableau`, `powerbi`)
+ - `bi_tool`: BI tool type (`superset`, `tableau`, `powerbi`)

  **Options:**
- - `--cube-dir` / `-c`: Directory containing Cube.js files
- - `--config-file` / `-f`: Configuration file for BI tool connection
+ - `--cube-files` / `-c`: Directory containing Cube.js files (required)
+ - `--url` / `-u`: BI tool URL (required)
+ - `--username` / `-n`: BI tool username (required)
+ - `--password` / `-p`: BI tool password (required)
+ - `--cube-connection-name` / `-d`: Name of Cube database connection in BI tool (default: Cube)
+
+ **Example:**
+ ```bash
+ dbt-cube-sync cube-to-bi superset -c cube_output/ -u http://localhost:8088 -n admin -p admin -d Cube
+ ```

  ### `full-sync`
  Complete pipeline: dbt models → Cube.js schemas → BI tool datasets.
@@ -170,6 +222,7 @@ dbt-cube-sync/
  │ ├── config.py # Configuration management
  │ ├── core/
  │ │ ├── dbt_parser.py # dbt manifest parser
+ │ │ ├── db_inspector.py # Database column type inspector (SQLAlchemy)
  │ │ ├── cube_generator.py # Cube.js generator
  │ │ └── models.py # Pydantic data models
  │ └── connectors/
@@ -5,6 +5,8 @@ A powerful synchronization tool that creates a seamless pipeline from dbt models
  ## Features

  - 🔄 **dbt → Cube.js**: Auto-generate Cube.js schemas from dbt models with metrics
+ - 🗃️ **Flexible Data Type Source**: Get column types from catalog OR directly from database via SQLAlchemy
+ - 🎯 **Model Filtering**: Process specific models instead of all models
  - 📊 **Cube.js → BI Tools**: Sync schemas to multiple BI platforms
  - 🏗️ **Extensible Architecture**: Plugin-based connector system for easy BI tool integration
  - 🐳 **Docker Support**: Containerized execution with orchestration support
@@ -26,6 +28,27 @@ poetry install
  poetry run dbt-cube-sync --help
  ```

+ ### Database Drivers (for SQLAlchemy URI feature)
+
+ If you want to use the `--sqlalchemy-uri` option to fetch column types directly from your database, you'll need to install the appropriate database driver:
+
+ ```bash
+ # PostgreSQL
+ poetry add psycopg2-binary
+
+ # MySQL
+ poetry add pymysql
+
+ # Snowflake
+ poetry add snowflake-sqlalchemy
+
+ # BigQuery
+ poetry add sqlalchemy-bigquery
+
+ # Redshift
+ poetry add sqlalchemy-redshift
+ ```
+
  ### Using Docker

  ```bash
@@ -35,42 +58,43 @@ docker run --rm dbt-cube-sync --help

  ## Quick Start

- ### 1. Create Configuration File
+ ### 1. Generate Cube.js Schemas from dbt

+ **Option A: Using catalog file (traditional method)**
  ```bash
- # Create sample config
- dbt-cube-sync create-config sync-config.yaml
-
- # Edit the config file with your BI tool credentials
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --catalog ./target/catalog.json \
+ --output ./cube_output
  ```

- ### 2. Generate Cube.js Schemas
-
+ **Option B: Using database connection (no catalog needed)**
  ```bash
- # Generate from dbt manifest
- dbt-cube-sync generate-cubes \\
- --dbt-manifest ./DbtEducationalDataProject/target/manifest.json \\
- --output-dir ./cube/conf/cube_output
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
+ --output ./cube_output
  ```

- ### 3. Sync to BI Tool
-
+ **Option C: Filter specific models**
  ```bash
- # Sync to Superset
- dbt-cube-sync sync-bi superset \\
- --cube-dir ./cube/conf/cube_output \\
- --config-file ./sync-config.yaml
+ dbt-cube-sync dbt-to-cube \
+ --manifest ./target/manifest.json \
+ --sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
+ --models orders,customers,products \
+ --output ./cube_output
  ```

- ### 4. Full Pipeline
+ ### 2. Sync to BI Tool (Optional)

  ```bash
- # Complete dbt → Cube.js → Superset pipeline
- dbt-cube-sync full-sync \\
- --dbt-manifest ./DbtEducationalDataProject/target/manifest.json \\
- --cube-dir ./cube/conf/cube_output \\
- --bi-connector superset \\
- --config-file ./sync-config.yaml
+ # Sync to Superset
+ dbt-cube-sync cube-to-bi superset \
+ --cube-files ./cube_output \
+ --url http://localhost:8088 \
+ --username admin \
+ --password admin \
+ --cube-connection-name Cube
  ```

  ## Configuration
@@ -99,23 +123,50 @@ connectors:

  ## CLI Commands

- ### `generate-cubes`
+ ### `dbt-to-cube`
  Generate Cube.js schema files from dbt models.

  **Options:**
- - `--dbt-manifest` / `-m`: Path to dbt manifest.json file
- - `--output-dir` / `-o`: Output directory for Cube.js files
- - `--template-dir` / `-t`: Directory containing Cube.js templates
+ - `--manifest` / `-m`: Path to dbt manifest.json file (required)
+ - `--catalog` / `-c`: Path to dbt catalog.json file (optional if --sqlalchemy-uri is provided)
+ - `--sqlalchemy-uri` / `-s`: SQLAlchemy database URI for fetching column types (optional if --catalog is provided)
+ - Example: `postgresql://user:password@localhost:5432/database`
+ - Example: `mysql://user:password@localhost:3306/database`
+ - Example: `snowflake://user:password@account/database/schema`
+ - `--models`: Comma-separated list of model names to process (optional, processes all if not specified)
+ - Example: `--models model1,model2,model3`
+ - `--output` / `-o`: Output directory for Cube.js files (required)
+ - `--template-dir` / `-t`: Directory containing Cube.js templates (default: ./cube/templates)
+
+ **Examples:**
+ ```bash
+ # Using catalog file
+ dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/
+
+ # Using database connection (no catalog needed)
+ dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@localhost/db -o output/

- ### `sync-bi`
+ # Filter specific models
+ dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@localhost/db --models users,orders -o output/
+ ```
+
+ ### `cube-to-bi`
  Sync Cube.js schemas to BI tool datasets.

  **Arguments:**
- - `connector`: BI tool type (`superset`, `tableau`, `powerbi`)
+ - `bi_tool`: BI tool type (`superset`, `tableau`, `powerbi`)

  **Options:**
- - `--cube-dir` / `-c`: Directory containing Cube.js files
- - `--config-file` / `-f`: Configuration file for BI tool connection
+ - `--cube-files` / `-c`: Directory containing Cube.js files (required)
+ - `--url` / `-u`: BI tool URL (required)
+ - `--username` / `-n`: BI tool username (required)
+ - `--password` / `-p`: BI tool password (required)
+ - `--cube-connection-name` / `-d`: Name of Cube database connection in BI tool (default: Cube)
+
+ **Example:**
+ ```bash
+ dbt-cube-sync cube-to-bi superset -c cube_output/ -u http://localhost:8088 -n admin -p admin -d Cube
+ ```

  ### `full-sync`
  Complete pipeline: dbt models → Cube.js schemas → BI tool datasets.
@@ -150,6 +201,7 @@ dbt-cube-sync/
  │ ├── config.py # Configuration management
  │ ├── core/
  │ │ ├── dbt_parser.py # dbt manifest parser
+ │ │ ├── db_inspector.py # Database column type inspector (SQLAlchemy)
  │ │ ├── cube_generator.py # Cube.js generator
  │ │ └── models.py # Pydantic data models
  │ └── connectors/
@@ -24,10 +24,12 @@ class CustomGroup(click.Group):
  click.echo("\nAvailable commands:")
  click.echo(" dbt-cube-sync --help # Show help")
  click.echo(" dbt-cube-sync --version # Show version")
- click.echo(" dbt-cube-sync dbt-to-cube -m manifest -c catalog -o output # Generate Cube.js schemas")
+ click.echo(" dbt-cube-sync dbt-to-cube -m manifest -c catalog -o output # Generate with catalog")
+ click.echo(" dbt-cube-sync dbt-to-cube -m manifest -s postgresql://user:pass@host/db -o output # Generate with database")
+ click.echo(" dbt-cube-sync dbt-to-cube -m manifest -s <uri> --models model1,model2 -o output # Filter specific models")
  click.echo(" dbt-cube-sync cube-to-bi superset -c cubes -u url -n user -p pass -d Cube # Sync to BI tool")
  ctx.exit(1)
-
+
  return super().get_command(ctx, cmd_name)


@@ -39,35 +41,66 @@ def main():


  @main.command()
- @click.option('--manifest', '-m',
+ @click.option('--manifest', '-m',
  required=True,
  help='Path to dbt manifest.json file')
  @click.option('--catalog', '-c',
- required=True,
- help='Path to dbt catalog.json file')
+ required=False,
+ default=None,
+ help='Path to dbt catalog.json file (optional if --sqlalchemy-uri is provided)')
+ @click.option('--sqlalchemy-uri', '-s',
+ required=False,
+ default=None,
+ help='SQLAlchemy database URI for fetching column types (e.g., postgresql://user:pass@host:port/db)')
+ @click.option('--models',
+ required=False,
+ default=None,
+ help='Comma-separated list of model names to process (e.g., model1,model2). If not specified, processes all models')
  @click.option('--output', '-o',
  required=True,
  help='Output directory for Cube.js files')
  @click.option('--template-dir', '-t',
  default='./cube/templates',
  help='Directory containing Cube.js templates')
- def dbt_to_cube(manifest: str, catalog: str, output: str, template_dir: str):
+ def dbt_to_cube(manifest: str, catalog: Optional[str], sqlalchemy_uri: Optional[str], models: Optional[str], output: str, template_dir: str):
  """Generate Cube.js schemas from dbt models"""
  try:
+ # Validate that at least one source of column types is provided
+ if not catalog and not sqlalchemy_uri:
+ click.echo("❌ Error: You must provide either --catalog or --sqlalchemy-uri to get column data types", err=True)
+ click.echo("💡 Example with catalog: dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/", err=True)
+ click.echo("💡 Example with database: dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@host:port/db -o output/", err=True)
+ sys.exit(1)
+
+ # Parse model filter if provided
+ model_filter = None
+ if models:
+ model_filter = [m.strip() for m in models.split(',')]
+ click.echo(f"🎯 Filtering models: {', '.join(model_filter)}")
+
  click.echo("🔄 Parsing dbt manifest...")
- parser = DbtParser(manifest, catalog)
- models = parser.parse_models()
-
- click.echo(f"📊 Found {len(models)} dbt models")
-
+ parser = DbtParser(
+ manifest_path=manifest,
+ catalog_path=catalog,
+ sqlalchemy_uri=sqlalchemy_uri,
+ model_filter=model_filter
+ )
+ parsed_models = parser.parse_models()
+
+ click.echo(f"📊 Found {len(parsed_models)} dbt models")
+
+ if len(parsed_models) == 0:
+ click.echo("⚠️ No models found. Make sure your models have both columns and metrics defined.")
+ sys.exit(0)
+
  click.echo("🏗️ Generating Cube.js schemas...")
  generator = CubeGenerator(template_dir, output)
- generated_files = generator.generate_cube_files(models)
-
+ generated_files = generator.generate_cube_files(parsed_models)
+
  click.echo(f"✅ Generated {len(generated_files)} Cube.js files:")
  for file_path in generated_files:
  click.echo(f" • {file_path}")
-
+
  except Exception as e:
  click.echo(f"❌ Error: {str(e)}", err=True)
  sys.exit(1)
@@ -7,7 +7,7 @@ from pathlib import Path
  from typing import List, Dict, Any
  from jinja2 import Environment, FileSystemLoader, Template

- from .models import DbtModel, CubeSchema, CubeDimension, CubeMeasure
+ from .models import DbtModel, CubeSchema, CubeDimension, CubeMeasure, CubePreAggregation, CubeRefreshKey
  from .dbt_parser import DbtParser


@@ -98,11 +98,36 @@ class CubeGenerator:
  )
  measures.append(measure)

+ # Convert pre-aggregations
+ pre_aggregations = []
+ for pre_agg_name, pre_agg_data in model.pre_aggregations.items():
+ # Convert refresh_key if present
+ refresh_key = None
+ if pre_agg_data.refresh_key:
+ refresh_key = CubeRefreshKey(
+ every=pre_agg_data.refresh_key.every,
+ sql=pre_agg_data.refresh_key.sql,
+ incremental=pre_agg_data.refresh_key.incremental,
+ update_window=pre_agg_data.refresh_key.update_window
+ )
+
+ pre_aggregation = CubePreAggregation(
+ name=pre_agg_name,
+ type=pre_agg_data.type,
+ measures=pre_agg_data.measures,
+ dimensions=pre_agg_data.dimensions,
+ time_dimension=pre_agg_data.time_dimension,
+ granularity=pre_agg_data.granularity,
+ refresh_key=refresh_key
+ )
+ pre_aggregations.append(pre_aggregation)
+
  return CubeSchema(
  cube_name=cube_name,
  sql=sql,
  dimensions=dimensions,
- measures=measures
+ measures=measures,
+ pre_aggregations=pre_aggregations
  )

  def _write_cube_file(self, cube_schema: CubeSchema) -> Path:
@@ -116,7 +141,8 @@
  cube_name=cube_schema.cube_name,
  sql=cube_schema.sql,
  dimensions=cube_schema.dimensions,
- measures=cube_schema.measures
+ measures=cube_schema.measures,
+ pre_aggregations=cube_schema.pre_aggregations
  )
  else:
  # Fallback to hardcoded template
@@ -131,7 +157,12 @@

  def _generate_cube_content(self, cube_schema: CubeSchema) -> str:
  """Generate Cube.js content using hardcoded template"""
-
+
+ # Extract table name from SQL for refresh_key replacement
+ import re
+ table_name_match = re.search(r'FROM\s+([^\s,;]+)', cube_schema.sql, re.IGNORECASE)
+ table_name = table_name_match.group(1) if table_name_match else None
+
  # Generate dimensions
  dimensions_content = []
  for dim in cube_schema.dimensions:
@@ -152,11 +183,79 @@
  }}"""
  measures_content.append(measure_content)

+ # Generate pre-aggregations
+ pre_aggregations_content = []
+ for pre_agg in cube_schema.pre_aggregations:
+ pre_agg_parts = [f" type: `{pre_agg.type}`"]
+
+ if pre_agg.measures:
+ measures_list = ', '.join([f'CUBE.{measure}' for measure in pre_agg.measures])
+ pre_agg_parts.append(f" measures: [{measures_list}]")
+
+ if pre_agg.dimensions:
+ dims_list = ', '.join([f'CUBE.{dim}' for dim in pre_agg.dimensions])
+ pre_agg_parts.append(f" dimensions: [{dims_list}]")
+
+ if pre_agg.time_dimension:
+ pre_agg_parts.append(f" time_dimension: CUBE.{pre_agg.time_dimension}")
+
+ if pre_agg.granularity:
+ pre_agg_parts.append(f" granularity: `{pre_agg.granularity}`")
+
+ if pre_agg.refresh_key:
+ refresh_key_parts = []
+ if pre_agg.refresh_key.every:
+ refresh_key_parts.append(f" every: `{pre_agg.refresh_key.every}`")
+ if pre_agg.refresh_key.sql:
+ # Replace ${CUBE} and ${this} with actual table name
+ refresh_sql = pre_agg.refresh_key.sql
+ if table_name:
+ refresh_sql = refresh_sql.replace('${CUBE}', table_name)
+ refresh_sql = refresh_sql.replace('${this}', table_name)
+ refresh_key_parts.append(f" sql: `{refresh_sql}`")
+ if pre_agg.refresh_key.incremental is not None:
+ refresh_key_parts.append(f" incremental: {str(pre_agg.refresh_key.incremental).lower()}")
+ if pre_agg.refresh_key.update_window:
+ refresh_key_parts.append(f" update_window: `{pre_agg.refresh_key.update_window}`")
+
+ if refresh_key_parts:
+ refresh_key_content = ',\n'.join(refresh_key_parts)
+ pre_agg_parts.append(f" refresh_key: {{\n{refresh_key_content}\n }}")
+
+ parts_joined = ',\n'.join(pre_agg_parts)
+ pre_agg_content = f""" {pre_agg.name}: {{
+ {parts_joined}
+ }}"""
+ pre_aggregations_content.append(pre_agg_content)
+
  # Combine into full cube definition
  dimensions_joined = ',\n\n'.join(dimensions_content)
  measures_joined = ',\n\n'.join(measures_content)

- content = f"""cube(`{cube_schema.cube_name}`, {{
+ # Ensure we have measures (required for a useful Cube.js schema)
+ if not measures_content:
+ raise ValueError(f"Cube {cube_schema.cube_name} has no measures defined. Measures are required for Cube.js schemas.")
+
+ if pre_aggregations_content:
+ pre_aggregations_joined = ',\n\n'.join(pre_aggregations_content)
+ content = f"""cube(`{cube_schema.cube_name}`, {{
+ sql: `{cube_schema.sql}`,
+
+ dimensions: {{
+ {dimensions_joined}
+ }},
+
+ measures: {{
+ {measures_joined}
+ }},
+
+ pre_aggregations: {{
+ {pre_aggregations_joined}
+ }}
+ }});
+ """
+ else:
+ content = f"""cube(`{cube_schema.cube_name}`, {{
  sql: `{cube_schema.sql}`,

  dimensions: {{
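The table-name extraction added above feeds the `${CUBE}`/`${this}` substitution in pre-aggregation `refresh_key` SQL. A minimal sketch of that step in isolation (the sample cube SQL and refresh SQL below are hypothetical, not taken from the release):

```python
import re

# Same pattern the generator uses to pull the source table out of the cube SQL
sql = "SELECT * FROM analytics.orders"  # hypothetical cube SQL
match = re.search(r'FROM\s+([^\s,;]+)', sql, re.IGNORECASE)
table_name = match.group(1) if match else None  # -> 'analytics.orders'

# A refresh_key SQL such as "SELECT MAX(updated_at) FROM ${CUBE}" would then be rewritten
refresh_sql = "SELECT MAX(updated_at) FROM ${CUBE}"
if table_name:
    refresh_sql = refresh_sql.replace("${CUBE}", table_name).replace("${this}", table_name)
print(refresh_sql)  # SELECT MAX(updated_at) FROM analytics.orders
```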
@@ -0,0 +1,51 @@
+ """
+ Database inspector - fetches column types using SQLAlchemy
+ """
+ from typing import Dict, Optional
+ from sqlalchemy import create_engine, inspect, MetaData, Table
+ from sqlalchemy.engine import Engine
+
+
+ class DatabaseInspector:
+ """Inspects database schema to extract column type information"""
+
+ def __init__(self, sqlalchemy_uri: str):
+ """
+ Initialize the database inspector
+
+ Args:
+ sqlalchemy_uri: SQLAlchemy connection URI (e.g., postgresql://user:pass@host:port/db)
+ """
+ self.engine: Engine = create_engine(sqlalchemy_uri)
+ self.inspector = inspect(self.engine)
+
+ def get_table_columns(self, schema: str, table_name: str) -> Dict[str, str]:
+ """
+ Get column names and their data types for a specific table
+
+ Args:
+ schema: Database schema name
+ table_name: Table name
+
+ Returns:
+ Dictionary mapping column names to data types
+ """
+ columns = {}
+
+ try:
+ # Get columns from the database
+ table_columns = self.inspector.get_columns(table_name, schema=schema)
+
+ for column in table_columns:
+ col_name = column['name']
+ col_type = str(column['type'])
+ columns[col_name] = col_type
+
+ except Exception as e:
+ print(f"Warning: Could not inspect table {schema}.{table_name}: {e}")
+
+ return columns
+
+ def close(self):
+ """Close the database connection"""
+ self.engine.dispose()
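For orientation, a minimal usage sketch of the new `DatabaseInspector` added in db_inspector.py (the import path is assumed from the project layout shown in the README; the connection URI, schema, and table names are placeholders):

```python
# Sketch only: module path and credentials are assumptions, not part of the release
from dbt_cube_sync.core.db_inspector import DatabaseInspector

inspector = DatabaseInspector("postgresql://user:password@localhost:5432/mydb")
try:
    # Returns {column name: SQL type string}, e.g. {"id": "INTEGER", "created_at": "TIMESTAMP"}
    columns = inspector.get_table_columns(schema="analytics", table_name="orders")
    print(columns)
finally:
    inspector.close()  # disposes the underlying SQLAlchemy engine
```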
@@ -3,27 +3,39 @@ dbt manifest parser - extracts models, metrics, and column information
  """
  import json
  import os
- from typing import Dict, List
+ from typing import Dict, List, Optional
  from pathlib import Path

- from .models import DbtModel, DbtColumn, DbtMetric
+ from .models import DbtModel, DbtColumn, DbtMetric, DbtPreAggregation, DbtRefreshKey
+ from .db_inspector import DatabaseInspector


  class DbtParser:
  """Parses dbt manifest.json to extract model and metric information"""
-
+
- def __init__(self, manifest_path: str, catalog_path: str = None):
+ def __init__(
+ self,
+ manifest_path: str,
+ catalog_path: Optional[str] = None,
+ sqlalchemy_uri: Optional[str] = None,
+ model_filter: Optional[List[str]] = None
+ ):
  """
  Initialize the parser
-
+
  Args:
  manifest_path: Path to dbt manifest.json file
  catalog_path: Optional path to dbt catalog.json for column types
+ sqlalchemy_uri: Optional SQLAlchemy URI to connect to database for column types
+ model_filter: Optional list of model names to process (if None, processes all models)
  """
  self.manifest_path = manifest_path
  self.catalog_path = catalog_path
+ self.sqlalchemy_uri = sqlalchemy_uri
+ self.model_filter = model_filter
  self.manifest = self._load_manifest()
  self.catalog = self._load_catalog() if catalog_path else None
+ self.db_inspector = DatabaseInspector(sqlalchemy_uri) if sqlalchemy_uri else None

  def _load_manifest(self) -> dict:
  """Load the dbt manifest.json file"""
@@ -48,22 +60,32 @@ class DbtParser:
  def parse_models(self) -> List[DbtModel]:
  """
  Extract models with metrics and columns from manifest
-
+
  Returns:
  List of DbtModel instances
  """
  models = []
  nodes = self.manifest.get('nodes', {})
-
+
  for node_id, node_data in nodes.items():
  # Only process models
  if node_data.get('resource_type') != 'model':
  continue
-
+
+ # Apply model filter if specified
+ model_name = node_data.get('name', '')
+ if self.model_filter and model_name not in self.model_filter:
+ continue
+
  model = self._parse_model(node_id, node_data)
- if model and model.columns and model.metrics: # Only include models with BOTH columns AND metrics
+ # Include models that have columns AND metrics (measures are required for useful Cube.js schemas)
+ if model and model.columns and model.metrics:
  models.append(model)
-
+
+ # Close database inspector if it was used
+ if self.db_inspector:
+ self.db_inspector.close()
+
  return models

  def _parse_model(self, node_id: str, node_data: dict) -> DbtModel:
@@ -78,34 +100,49 @@
  # Parse metrics from config.meta.metrics
  metrics = self._parse_metrics(node_data)

+ # Parse pre-aggregations from config.meta.pre_aggregations
+ pre_aggregations = self._parse_pre_aggregations(node_data)
+
  return DbtModel(
  name=model_name,
  database=model_database,
  schema_name=model_schema,
  node_id=node_id,
  columns=columns,
- metrics=metrics
+ metrics=metrics,
+ pre_aggregations=pre_aggregations
  )

  def _parse_columns(self, node_id: str, node_data: dict) -> Dict[str, DbtColumn]:
- """Parse columns for a model, enhanced with catalog data if available"""
+ """Parse columns for a model, enhanced with catalog or database data if available"""
  columns = {}
  manifest_columns = node_data.get('columns', {})
-
- # Get catalog columns for type information
+
+ # Get catalog columns for type information (if catalog is available)
  catalog_columns = {}
  if self.catalog and node_id in self.catalog.get('nodes', {}):
  catalog_columns = self.catalog['nodes'][node_id].get('columns', {})
-
- # If manifest has columns, use them with catalog type info
+
+ # Get database columns for type information (if db_inspector is available)
+ db_columns = {}
+ if self.db_inspector and not self.catalog:
+ schema = node_data.get('schema', '')
+ table_name = node_data.get('name', '')
+ if schema and table_name:
+ db_columns = self.db_inspector.get_table_columns(schema, table_name)
+
+ # If manifest has columns, use them with catalog or database type info
  if manifest_columns:
  for col_name, col_data in manifest_columns.items():
  data_type = None
-
- # Try to get data type from catalog
+
+ # Try to get data type from catalog first
  if col_name in catalog_columns:
  data_type = catalog_columns[col_name].get('type', '')
-
+ # Otherwise try database
+ elif col_name in db_columns:
+ data_type = db_columns[col_name]
+
  columns[col_name] = DbtColumn(
  name=col_name,
@@ -113,15 +150,24 @@
  meta=col_data.get('meta', {})
  )
  else:
- # If no manifest columns, use all catalog columns
- for col_name, col_data in catalog_columns.items():
+ # If no manifest columns, use catalog or database columns
+ source_columns = catalog_columns or db_columns
+ for col_name in source_columns:
+ if catalog_columns:
+ col_data = catalog_columns[col_name]
+ data_type = col_data.get('type', '')
+ description = f"Column from catalog: {col_name}"
+ else:
+ data_type = db_columns[col_name]
+ description = f"Column from database: {col_name}"
+
  columns[col_name] = DbtColumn(
  name=col_name,
- data_type=col_data.get('type', ''),
- description=f"Column from catalog: {col_name}",
+ data_type=data_type,
+ description=description,
  meta={}
  )
-
+
  return columns

  def _parse_metrics(self, node_data: dict) -> Dict[str, DbtMetric]:
@@ -145,6 +191,40 @@

  return metrics

+ def _parse_pre_aggregations(self, node_data: dict) -> Dict[str, DbtPreAggregation]:
+ """Parse pre-aggregations from model configuration"""
+ pre_aggregations = {}
+
+ # Look for pre-aggregations in config.meta.pre_aggregations
+ config = node_data.get('config', {})
+ meta = config.get('meta', {})
+ pre_aggs_data = meta.get('pre_aggregations', {})
+
+ for pre_agg_name, pre_agg_config in pre_aggs_data.items():
+ if isinstance(pre_agg_config, dict):
+ # Parse refresh_key if present
+ refresh_key = None
+ refresh_key_config = pre_agg_config.get('refresh_key')
+ if refresh_key_config and isinstance(refresh_key_config, dict):
+ refresh_key = DbtRefreshKey(
+ every=refresh_key_config.get('every'),
+ sql=refresh_key_config.get('sql'),
+ incremental=refresh_key_config.get('incremental'),
+ update_window=refresh_key_config.get('update_window')
+ )
+
+ pre_aggregations[pre_agg_name] = DbtPreAggregation(
+ name=pre_agg_name,
+ type=pre_agg_config.get('type', 'rollup'),
+ measures=pre_agg_config.get('measures', []),
+ dimensions=pre_agg_config.get('dimensions', []),
+ time_dimension=pre_agg_config.get('time_dimension'),
+ granularity=pre_agg_config.get('granularity'),
+ refresh_key=refresh_key
+ )
+
+ return pre_aggregations
+
  @staticmethod
  def map_dbt_type_to_cube_type(dbt_type: str) -> str:
  """Map dbt metric types to Cube.js measure types"""
@@ -22,6 +22,25 @@ class DbtMetric(BaseModel):
  description: Optional[str] = None


+ class DbtRefreshKey(BaseModel):
+ """Represents a refresh_key configuration for pre-aggregations"""
+ every: Optional[str] = None
+ sql: Optional[str] = None
+ incremental: Optional[bool] = None
+ update_window: Optional[str] = None
+
+
+ class DbtPreAggregation(BaseModel):
+ """Represents a dbt pre-aggregation configuration"""
+ name: str
+ type: str = "rollup"
+ measures: Optional[List[str]] = None
+ dimensions: Optional[List[str]] = None
+ time_dimension: Optional[str] = None
+ granularity: Optional[str] = None
+ refresh_key: Optional[DbtRefreshKey] = None
+
+
  class DbtModel(BaseModel):
  """Represents a parsed dbt model"""
  name: str
@@ -30,6 +49,7 @@ class DbtModel(BaseModel):
  node_id: str
  columns: Dict[str, DbtColumn]
  metrics: Dict[str, DbtMetric]
+ pre_aggregations: Dict[str, DbtPreAggregation] = {}


  class CubeDimension(BaseModel):
@@ -50,12 +70,32 @@ class CubeMeasure(BaseModel):
  description: Optional[str] = None


+ class CubeRefreshKey(BaseModel):
+ """Represents a Cube.js refresh_key configuration"""
+ every: Optional[str] = None
+ sql: Optional[str] = None
+ incremental: Optional[bool] = None
+ update_window: Optional[str] = None
+
+
+ class CubePreAggregation(BaseModel):
+ """Represents a Cube.js pre-aggregation"""
+ name: str
+ type: str = "rollup"
+ measures: Optional[List[str]] = None
+ dimensions: Optional[List[str]] = None
+ time_dimension: Optional[str] = None
+ granularity: Optional[str] = None
+ refresh_key: Optional[CubeRefreshKey] = None
+
+
  class CubeSchema(BaseModel):
  """Represents a complete Cube.js schema"""
  cube_name: str
  sql: str
  dimensions: List[CubeDimension]
  measures: List[CubeMeasure]
+ pre_aggregations: List[CubePreAggregation] = []


  class SyncResult(BaseModel):
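A quick sketch of constructing the new pre-aggregation models directly, mirroring the conversion the generator performs (import path assumed from the src layout shown in the README; all values are illustrative):

```python
# Sketch only: the import path and field values are assumptions, not part of the release
from dbt_cube_sync.core.models import CubePreAggregation, CubeRefreshKey

pre_agg = CubePreAggregation(
    name="orders_daily_rollup",  # hypothetical name
    type="rollup",
    measures=["total_amount"],
    dimensions=["status"],
    time_dimension="created_at",
    granularity="day",
    refresh_key=CubeRefreshKey(every="1 hour", incremental=True, update_window="7 day"),
)
print(pre_agg.model_dump())  # pydantic v2 API, per the pinned pydantic >=2.5.0 dependency
```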
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "dbt-cube-sync"
- version = "0.1.0a4"
+ version = "0.1.0a6"
  description = "Synchronization tool for dbt models to Cube.js schemas and BI tools"
  authors = ["Ponder"]
  readme = "README.md"
@@ -13,6 +13,7 @@ pyyaml = "^6.0"
  click = "^8.1.7"
  pydantic = "^2.5.0"
  jinja2 = "^3.1.2"
+ sqlalchemy = "^2.0.0"

  [tool.poetry.group.dev.dependencies]
  pytest = "^7.4.0"