kelp-core 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. kelp_core-0.0.1/PKG-INFO +314 -0
  2. kelp_core-0.0.1/README.md +298 -0
  3. kelp_core-0.0.1/pyproject.toml +111 -0
  4. kelp_core-0.0.1/src/kelp/__init__.py +15 -0
  5. kelp_core-0.0.1/src/kelp/__main__.py +4 -0
  6. kelp_core-0.0.1/src/kelp/catalog/__init__.py +21 -0
  7. kelp_core-0.0.1/src/kelp/catalog/abac_ddl.py +66 -0
  8. kelp_core-0.0.1/src/kelp/catalog/api.py +137 -0
  9. kelp_core-0.0.1/src/kelp/catalog/function_ddl.py +115 -0
  10. kelp_core-0.0.1/src/kelp/catalog/metric_view_ddl.py +310 -0
  11. kelp_core-0.0.1/src/kelp/catalog/uc_adapter.py +357 -0
  12. kelp_core-0.0.1/src/kelp/catalog/uc_diff.py +235 -0
  13. kelp_core-0.0.1/src/kelp/catalog/uc_models.py +132 -0
  14. kelp_core-0.0.1/src/kelp/catalog/uc_query_builder.py +383 -0
  15. kelp_core-0.0.1/src/kelp/cli/__init__.py +0 -0
  16. kelp_core-0.0.1/src/kelp/cli/catalog.py +493 -0
  17. kelp_core-0.0.1/src/kelp/cli/cli.py +309 -0
  18. kelp_core-0.0.1/src/kelp/cli/init.py +130 -0
  19. kelp_core-0.0.1/src/kelp/config/__init__.py +3 -0
  20. kelp_core-0.0.1/src/kelp/config/catalog.py +65 -0
  21. kelp_core-0.0.1/src/kelp/config/catalog_spec.py +83 -0
  22. kelp_core-0.0.1/src/kelp/config/lifecycle.py +219 -0
  23. kelp_core-0.0.1/src/kelp/config/project.py +197 -0
  24. kelp_core-0.0.1/src/kelp/config/runtime.py +101 -0
  25. kelp_core-0.0.1/src/kelp/config/settings.py +243 -0
  26. kelp_core-0.0.1/src/kelp/config/vars.py +111 -0
  27. kelp_core-0.0.1/src/kelp/constants.py +3 -0
  28. kelp_core-0.0.1/src/kelp/models/__init__.py +15 -0
  29. kelp_core-0.0.1/src/kelp/models/abac.py +58 -0
  30. kelp_core-0.0.1/src/kelp/models/catalog.py +238 -0
  31. kelp_core-0.0.1/src/kelp/models/function.py +125 -0
  32. kelp_core-0.0.1/src/kelp/models/jsonschema.py +56 -0
  33. kelp_core-0.0.1/src/kelp/models/metric_view.py +64 -0
  34. kelp_core-0.0.1/src/kelp/models/project_config.py +161 -0
  35. kelp_core-0.0.1/src/kelp/models/runtime_context.py +38 -0
  36. kelp_core-0.0.1/src/kelp/models/table.py +363 -0
  37. kelp_core-0.0.1/src/kelp/pipelines/__init__.py +20 -0
  38. kelp_core-0.0.1/src/kelp/pipelines/api.py +146 -0
  39. kelp_core-0.0.1/src/kelp/pipelines/streaming_tables.py +446 -0
  40. kelp_core-0.0.1/src/kelp/pipelines/utils.py +16 -0
  41. kelp_core-0.0.1/src/kelp/service/__init__.py +3 -0
  42. kelp_core-0.0.1/src/kelp/service/pipeline_manager.py +370 -0
  43. kelp_core-0.0.1/src/kelp/service/table_manager.py +525 -0
  44. kelp_core-0.0.1/src/kelp/service/yaml_manager.py +775 -0
  45. kelp_core-0.0.1/src/kelp/tables/__init__.py +29 -0
  46. kelp_core-0.0.1/src/kelp/tables/api.py +133 -0
  47. kelp_core-0.0.1/src/kelp/transformations/__init__.py +13 -0
  48. kelp_core-0.0.1/src/kelp/transformations/functions.py +138 -0
  49. kelp_core-0.0.1/src/kelp/transformations/schema.py +527 -0
  50. kelp_core-0.0.1/src/kelp/utils/__init__.py +1 -0
  51. kelp_core-0.0.1/src/kelp/utils/common.py +80 -0
  52. kelp_core-0.0.1/src/kelp/utils/databricks.py +182 -0
  53. kelp_core-0.0.1/src/kelp/utils/dict_parser.py +100 -0
  54. kelp_core-0.0.1/src/kelp/utils/jinja_parser.py +152 -0
  55. kelp_core-0.0.1/src/kelp/utils/logging.py +28 -0
  56. kelp_core-0.0.1/src/kelp/utils/yaml_parser.py +11 -0
Metadata-Version: 2.3
Name: kelp-core
Version: 0.0.1
Summary: Metadata Toolkit for Databricks Spark and Declarative Pipelines
Author: BenSchr
Requires-Dist: databricks-sdk>=0.80.0
Requires-Dist: jinja2>=3.1.6
Requires-Dist: pydantic<3
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.23.0
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/benschr/kelp-core
Project-URL: Documentation, https://benschr.github.io/kelp-core/
Project-URL: Repository, https://github.com/benschr/kelp-core
Description-Content-Type: text/markdown
```
██╗ ██╗███████╗██╗ ██████╗
██║ ██╔╝██╔════╝██║ ██╔══██╗
█████╔╝ █████╗ ██║ ██████╔╝
██╔═██╗ ██╔══╝ ██║ ██╔═══╝
██║ ██╗███████╗███████╗██║
╚═╝ ╚═╝╚══════╝╚══════╝╚═╝
Metadata Toolkit for Databricks Spark and Declarative Pipelines
```

Kelp is a framework that simplifies managing data pipelines, quality checks, and table configurations. Follow the instructions below to set up Kelp in your environment and start building robust data solutions.

Documentation: [https://benschr.github.io/kelp-core/](https://benschr.github.io/kelp-core/)
## Why Kelp?

Kelp provides a metadata and transformation layer for Databricks Spark and Spark Declarative Pipelines (SDP). It lets you define data models, quality checks, and transformations in structured YAML while offering Python utilities for advanced logic. With Kelp you can:

### Metadata management

- Define models, metric views, functions, and ABAC policies in readable, maintainable YAML
- Keep local metadata synchronized with Unity Catalog for improved governance and discoverability
- Use variables and targets for environment-specific configuration
- Inherit directory-level settings and tags across models

### Spark Declarative Pipelines (SDP)

- Inject metadata into SDP decorators with minimal boilerplate
- Optionally use DQX quality checks instead of SDP expectations
- Apply a quarantine pattern for validation failures
- Sync metadata to Unity Catalog after pipeline runs
- Easily inject catalog and schema names for tables and functions
- Sync descriptions and tags from metadata to tables and columns without requiring the Spark schema to match exactly
- Use a low-level API (no decorators) to stay robust against SDP syntax or feature changes

### Extra utilities

- Composable DataFrame transformations for schema enforcement and function application
- CLI tools for project management and metadata synchronization
- Metric views for defining business metrics and dimensions in metadata
- ABAC policies for row- and column-level access control defined in metadata and applied in code and the catalog
- Reusable function definitions in metadata that can be referenced from code and ABAC policies for consistent logic and easier maintenance
## Installation

Install Kelp with `uv`, `pip`, or the package manager of your choice:

```
uv add kelp-core==0.0.1
```

```
pip install kelp-core==0.0.1
```
## Initialization

After installing `kelp`, initialize a new Kelp project in your desired directory by running the following command:

```
kelp init .
```

This creates a `kelp_project.yml` file in the current directory, which is the main configuration file for your Kelp project. You can customize this file to specify your project's settings, variables, and file paths.

```
kelp_project.yml  # (1)!
kelp_metadata/  # (2)!
    models/**/*.yml
    metrics/**/*.yml
    functions/**/*.yml
    abacs/**/*.yml
```

1. This is where your main project configuration file lives. Here you can set global settings, variables, and other configurations for your Kelp project.
2. This directory stores your model and metric definitions in YAML format. You can organize them in subdirectories as needed (e.g., by environment, team, or domain).
Example structure:

```markdown
kelp_project.yml
kelp_metadata/
    models/
        bronze/
            bronze_customers.yml
        silver/
            silver_customers.yml
        gold/
            gold_customers.yml
    metrics/
        customer_metrics.yml
    functions/
        functions.yml
        sql/
            mask_ssn.sql
    abacs/
        policies.yml
```
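The structure above references model files such as `bronze_customers.yml`. As a rough sketch of what such a file can carry, here is a hypothetical model definition; the `kelp_models` key and all field names below are illustrative assumptions, not Kelp's confirmed schema (see the Project Configuration guide for the real one):

```yaml
# Hypothetical model definition; key and field names are assumptions.
kelp_models:
  - name: bronze_customers
    description: Raw customer records ingested from the source system
    columns:
      - name: customer_id
        data_type: STRING
        description: Unique customer identifier
      - name: email
        data_type: STRING
    tags:
      domain: customers
```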
## Set Up Targets and Base Configurations

Targets in Kelp represent different environments or configurations for your pipelines (e.g., development, staging, production). Define targets in your `kelp_project.yml` file under the `targets` section. Each target can have its own settings, such as catalog and schema variables, as well as other environment-specific configurations.

```yaml
kelp_project:

  models_path: "./kelp_metadata/models"
  models:
    +catalog: ${ catalog } # (1)!
    bronze:
      +schema: kelp_bronze
    silver:
      +schema: kelp_silver
    gold:
      +schema: kelp_gold
    +tags:
      kelp_managed: "" # (2)!

  metrics_path: "./kelp_metadata/metrics"
  metric_views:
    +catalog: ${ catalog }
    +schema: kelp_gold
    +tags:
      kelp_managed: ""

  functions_path: "./kelp_metadata/functions"
  functions:
    +catalog: ${ security_catalog } # (4)!
    +schema: ${ security_schema }

  abacs_path: "./kelp_metadata/abacs"
  abacs: {}

  vars:
    default_catalog: my_catalog
    default_schema: my_schema
    default_security_catalog: security_catalog
    default_security_schema: security_schema

  targets:
    dev:
      vars:
        catalog: ${default_catalog}_dev # (3)!
        schema: ${default_schema}_dev
        security_catalog: ${default_security_catalog}_dev
        security_schema: ${default_security_schema}_dev
    prod:
      vars:
        catalog: ${default_catalog}_prod
        schema: ${default_schema}_prod
        security_catalog: ${default_security_catalog}_prod
        security_schema: ${default_security_schema}_prod
```
1. Set up directory-level configurations with `+` that can be inherited by all models and metric views in that directory.
2. This sets a tag on all models in this project.
3. You can override variables for each target.
4. Functions often live in a separate security schema/catalog and can be configured independently.
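References like `${ catalog }` resolve through the active target's `vars`, which can themselves reference base `vars` (e.g. `${default_catalog}_dev`). A minimal pure-Python sketch of that substitution behavior follows; it is illustrative only, and Kelp's actual resolver may differ:

```python
import re

VAR_PATTERN = re.compile(r"\$\{\s*(\w+)\s*\}")

def resolve(value: str, variables: dict[str, str]) -> str:
    """Substitute ${ var } references repeatedly until none remain."""
    while True:
        expanded = VAR_PATTERN.sub(lambda m: variables[m.group(1)], value)
        if expanded == value:
            return expanded
        value = expanded

# Base vars plus the dev target's override, as in the YAML above
variables = {
    "default_catalog": "my_catalog",
    "catalog": "${default_catalog}_dev",
}
print(resolve("${ catalog }", variables))  # my_catalog_dev
```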
## Next Steps

Explore Kelp's comprehensive guides to get the most out of the framework:

| Guide | Overview |
|-------|----------|
| [Spark Declarative Pipelines (SDP)](guides/sdp.md) | Integrate Kelp with Databricks SDP using decorators and the low-level API |
| [Normal Spark (Non-SDP)](guides/normal_spark.md) | Use Kelp in standard Spark jobs with `kelp.tables`, DDL, and DQX |
| [Sync Metadata with Your Catalog](guides/catalog.md) | Keep local metadata in sync with Unity Catalog |
| [DataFrame Transformations](guides/transformations.md) | Use composable transformations like `apply_schema()` and `apply_func()` |
| [Project Configuration](guides/project_config.md) | Master `kelp_project.yml` configuration, hierarchies, and targets |
| [CLI Reference](guides/cli.md) | Command-line tools for project management and metadata sync |
| [Functions](guides/functions.md) | Define reusable SQL and Python functions in Unity Catalog |
| [ABAC Policies](guides/abacs.md) | Implement row and column access control |
| [Metric Views](guides/metric_views.md) | Define business metrics and dimensions |
## Build Transformations

Kelp provides utilities to transform data using DataFrame transformations that can be chained together:

- **Schema enforcement** - Apply and enforce schemas from metadata via `apply_schema()`
- **Function application** - Apply Unity Catalog functions via `apply_func()`

Use Kelp's composable transformations in your pipelines:

```python
from kelp.transformations import apply_schema, apply_func
import kelp.pipelines as kp

@kp.table()
def silver_customers():
    df = spark.readStream.table(kp.ref("bronze_customers"))

    return (
        df
        .transform(apply_schema("silver_customers"))
        .transform(apply_func(
            func_name="normalize_email",
            new_column="email_clean",
            parameters="email",
        ))
    )
```

Learn more in the [DataFrame Transformations](guides/transformations.md) guide.
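The chaining style above works because each helper returns a function that maps one DataFrame to another, which `DataFrame.transform()` then applies. A pyspark-free sketch of the same pattern, using plain Python and hypothetical names:

```python
# Each "transformation factory" returns a function from rows to rows, so
# steps compose cleanly. Kelp's real helpers do this with Spark DataFrames.
def apply_func(func, new_column, parameter):
    def _transform(rows):
        return [{**row, new_column: func(row[parameter])} for row in rows]
    return _transform

def normalize_email(value):
    return value.strip().lower()

rows = [{"email": "  Alice@Example.COM "}]
step = apply_func(normalize_email, "email_clean", "email")
print(step(rows)[0]["email_clean"])  # alice@example.com
```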
## Define Functions, Metrics, and Policies

Kelp supports multiple metadata objects beyond tables:

- **`kelp_functions`** - SQL/Python Unity Catalog functions (define once, use in code and ABAC)
- **`kelp_metric_views`** - Business metrics for analytics and dashboards
- **`kelp_abacs`** - Row filters and column masking (attribute-based access control)

Example function:

```yaml
kelp_functions:
  - name: normalize_email
    language: SQL
    parameters:
      - name: email
        data_type: STRING
    returns_data_type: STRING
    body: lower(trim(email))
```
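Syncing such a definition amounts to emitting Unity Catalog `CREATE FUNCTION` DDL. The sketch below shows roughly what that statement could look like for `normalize_email`; the fully qualified name is an assumption here, since Kelp derives it from the configured `+catalog`/`+schema`:

```python
# Illustrative only: the general shape of UC SQL-function DDL for the
# YAML above. Kelp's generated statement may differ in detail.
ddl = (
    "CREATE OR REPLACE FUNCTION "
    "security_catalog.security_schema.normalize_email(email STRING)\n"
    "RETURNS STRING\n"
    "RETURN lower(trim(email))"
)
print(ddl)
# In a Databricks session you would execute it with: spark.sql(ddl)
```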
Example metric view:

```yaml
kelp_metric_views:
  - name: customer_monthly_revenue
    catalog: ${ catalog }
    schema: ${ metric_schema }
    definition:
      measures:
        - name: total_revenue
          expr: SUM(amount)
        - name: order_count
          expr: COUNT(*)
      dimensions:
        - name: order_month
          expr: DATE_TRUNC('MONTH', order_date)
      source_table: ${ catalog }.gold.orders
```
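An ABAC policy definition might look roughly like this; every field name below is an illustrative assumption, and the ABAC Policies guide documents the real schema:

```yaml
# Hypothetical ABAC policy; field names are assumptions, not confirmed.
kelp_abacs:
  - name: mask_customer_ssn
    type: column_mask
    function: ${ security_catalog }.${ security_schema }.mask_ssn
    table: silver_customers
    column: ssn
```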
Learn more in the [Functions](guides/functions.md), [Metric Views](guides/metric_views.md), and [ABAC Policies](guides/abacs.md) guides.

## Use the Kelp CLI

The Kelp CLI provides commands for project management and metadata synchronization:

```bash
# Initialize a new project
uv run kelp init project ./my_project

# Generate JSON schema for IDE support
uv run kelp json-schema --output kelp_json_schema.json

# Sync metadata from Databricks tables to YAML
uv run kelp catalog sync-from-catalog "catalog.schema.table" --output models/table.yml

# Validate project configuration
uv run kelp validate --target prod
```

Learn more in the [CLI Reference](guides/cli.md).
## Sync Metadata to Unity Catalog

After your pipeline creates tables, sync metadata (descriptions, tags, constraints) to the catalog:

```python
import kelp.catalog as kc

kc.init("kelp_project.yml", target="prod")

# Sync functions first (before pipeline runs)
for query in kc.sync_functions():
    spark.sql(query)

# Sync tables, metric views, and ABAC policies (after pipeline runs)
for query in kc.sync_catalog():
    spark.sql(query)
```

Learn more in the [Sync Metadata with Your Catalog](guides/catalog.md) guide.
## Environment Variables

If you frequently reuse a specific target and project path, you can set them as environment variables:

```bash
export KELP_TARGET=prod
export KELP_PROJECT_FILE=/path/to/kelp_project.yml

# Now commands use these defaults
uv run kelp validate
uv run kelp catalog sync-from-catalog "catalog.schema.table"
```