@clickzetta/cz-cli-darwin-x64 0.5.15 → 0.5.17
- package/bin/cz-cli +0 -0
- package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
- package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
- package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
- package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
- package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
- package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
- package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
- package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
- package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
- package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
- package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
- package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
- package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
- package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
- package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
- package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
- package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
- package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
- package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
- package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
- package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
- package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
- package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
- package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
- package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
- package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
- package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
- package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
- package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
- package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
- package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
- package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
- package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
- package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
- package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
- package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
- package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
- package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
- package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
- package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
- package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
- package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
- package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
- package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
- package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
- package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
- package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
- package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
- package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
- package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
- package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
- package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
- package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
- package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
- package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
- package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
- package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
- package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
- package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
- package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
- package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
- package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
- package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
- package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
- package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
- package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
- package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
- package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
- package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
- package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
- package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
- package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
- package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
- package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
- package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
- package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
- package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
- package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
- package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
- package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
- package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
- package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
- package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
- package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
- package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
- package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
- package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
- package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
- package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
- package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
- package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
- package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
- package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
- package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
- package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
- package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
- package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
- package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
- package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
- package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
- package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
- package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
- package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
- package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
- package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
- package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
- package/package.json +1 -1
- package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
- package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155

@@ -0,0 +1,242 @@
# dbt-databricks → dbt-clickzetta Migration in Practice: Financial Payment Data Pipeline

If your data pipeline is built with dbt + Databricks SQL, migrating to Singdata Lakehouse is far less work than you might expect: most of the change is the connection block in `profiles.yml`, plus a few one-line edits. The core dbt capabilities used here (CTE models, window functions, SCD Type 2 logic, data quality tests, unit tests, and data contracts) are compatible with Lakehouse and required no changes in this project.

This article validates that claim with a real project: a financial payment data pipeline built on dbt-databricks (staging → intermediate → marts → semantic, 30 models, 36 tests) was fully migrated to dbt-clickzetta and verified with `dbt seed` (9/9), `dbt run` (30/30), and `dbt test` (36/36), all passing.

Full code on GitHub: [dbt-databricks2lakehouse-blueprint](https://github.com/clickzetta/dbt-databricks2lakehouse-blueprint)

---

## Original Project

[dbt-databricks2lakehouse-blueprint](https://github.com/clickzetta/dbt-databricks2lakehouse-blueprint) is forked from [Alex-Teodosiu/dbt-blueprint](https://github.com/Alex-Teodosiu/dbt-blueprint) (⭐72). The original stack is dbt-databricks + Databricks SQL Warehouse. The project simulates a P2P/P2B (peer-to-peer / peer-to-business) payment platform data engineering pipeline. It covers five data domains (users, merchants, accounts, bank cards, and transactions) and implements the complete chain from raw events → cleansing → SCD Type 2 dimension tables → fact tables → semantic layer.

The migrated code is in the `03_lakehouse/` directory and can be compared file-by-file with `01_source/dbt_blueprint/`.

## Conclusion First

**The adapter is dbt's database dialect layer**: switching adapters means switching the database, while the dbt models themselves do not need to change. Changes are concentrated in `profiles.yml` (connection configuration) and a handful of Databricks-specific syntax items.

| Change | Files | Effort | Notes |
|--------|-------|--------|-------|
| Adapter switch | 1 (`profiles.yml`) | Minimal | `type: databricks` → `type: clickzetta`, different connection params |
| Project profile name | 1 (`dbt_project.yml`) | Minimal | `connection_databricks` → `dbt_blueprint` |
| `getdate()` call | 1 (`int_users.sql`) | Minimal | Replace with `current_date()` |
| Macro target name | 1 (`generate_schema_name` macro) | Minimal | `databricks_cluster` → `clickzetta_prod` |

**Parts that need no changes**: all CTE model logic, window functions (LEAD/LAG/ROW_NUMBER), SCD Type 2 logic, `dbt_utils.generate_surrogate_key()`, data quality tests, unit tests, data contracts, `SELECT * EXCEPT (col)`, `col :: type` cast syntax, and `DATEDIFF(year, start, end)`. All of these are natively supported by Lakehouse.

---

## Technology Stack Comparison

| Aspect | dbt-databricks (original) | dbt-clickzetta (migrated) |
|---|---|---|
| dbt adapter | `dbt-databricks` | `dbt-clickzetta` |
| Compute engine | Databricks SQL Warehouse | Singdata Lakehouse |
| Connection method | `host` + `http_path` + `token` | `instance` + `workspace` + `service` + `username` + `password` |
| Target catalog | `catalog: dbt_blueprint` | Not needed (workspace is the catalog) |
| SQL dialect | Databricks SQL | Lakehouse SQL (ANSI-compatible) |
| `:: cast` | Supported | **Supported** (same on both sides, no change needed) |
| `SELECT * EXCEPT` | Supported | **Supported** (same on both sides, no change needed) |
| `DATEDIFF(year, s, e)` | Supported | **Supported** (three-argument form compatible) |
| `getdate()` | Supported | Not supported → use `current_date()` |
| Model logic (CTE/Window/JOIN) | - | **Fully identical** |

---



---

## Project Background

The data comes from a simulated P2P/P2B payment platform and contains 9 raw event tables:

| Domain | Table | Rows | Description |
|--------|-------|------|-------------|
| Users | `user_events` | 196 | User registration/status change events (SCD2 source) |
| Merchants | `merchant_events` | 30 | Merchant registration/status changes |
| Merchants | `industry_codes` | 30 | Industry code reference table |
| Accounts | `account_events` | 80 | Bank account opening/change events |
| Bank cards | `card_events` | 60 | Card activation/status changes |
| Transactions | `raw_p2p_captured_events` | 150 | Successful P2P transactions |
| Transactions | `raw_p2p_failed_events` | 50 | Failed P2P transactions |
| Transactions | `raw_p2b_captured_events` | 80 | Successful P2B transactions |
| Transactions | `raw_p2b_failed_events` | 30 | Failed P2B transactions |

The dbt project uses a four-layer architecture:

```
staging     →  intermediate  →  marts       →  semantic
(view)         (table)          (table)        (table)
raw events     SCD2 logic       dim + fact     enriched
```

---

## Migration Steps

### Step 1: Switch the Adapter (Core Change)

The original `profiles.yml` uses `type: databricks`. The migration switches this to `type: clickzetta` and replaces the Databricks connection params (`host`, `http_path`, `token`) with the Lakehouse params (`instance`, `workspace`, `service`, `username`, `password`, and `vcluster`):

```yaml
# Original (dbt-databricks)
connection_databricks:
  target: dev_local
  outputs:
    dev_local:
      type: databricks
      catalog: dbt_blueprint
      schema: default
      host: dbc-a505ff10-0af5.cloud.databricks.com
      http_path: /sql/1.0/warehouses/4fa4ca06332da87f
      token: "{{ env_var('BLUEPRINT_DATABRICKS_TOKEN') }}"
      threads: 1
```

```yaml
# Migrated (dbt-clickzetta)
dbt_blueprint:
  target: dev
  outputs:
    dev:
      type: clickzetta
      instance: "{{ env_var('CZ_INSTANCE') }}"
      workspace: "{{ env_var('CZ_WORKSPACE') }}"
      schema: "{{ env_var('CZ_SCHEMA', 'dbt_blueprint_dev') }}"
      vcluster: "{{ env_var('CZ_VCLUSTER', 'DEFAULT') }}"
      username: "{{ env_var('CZ_USERNAME') }}"
      password: "{{ env_var('CZ_PASSWORD') }}"
      service: "{{ env_var('CZ_SERVICE') }}"
      threads: 4
```

Also update the `profile` name in `dbt_project.yml`:

```yaml
# Original
profile: 'connection_databricks'

# Migrated
profile: 'dbt_blueprint'
```
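
Before running dbt against the migrated profile, it can help to confirm that the environment variables it references are actually set. The following is a minimal sketch and not part of the original project: the helper names `missing_profile_vars` and `effective_optional_vars` are hypothetical, and the split into required and optional variables is simply read off the `env_var(...)` calls in the migrated profile (the calls with a second argument carry a default).

```python
import os

# Variables the migrated profile reads without a default value.
REQUIRED_VARS = ["CZ_INSTANCE", "CZ_WORKSPACE", "CZ_USERNAME", "CZ_PASSWORD", "CZ_SERVICE"]

# Variables the migrated profile reads with a fallback, mapped to that fallback.
OPTIONAL_VARS = {"CZ_SCHEMA": "dbt_blueprint_dev", "CZ_VCLUSTER": "DEFAULT"}


def missing_profile_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def effective_optional_vars(env):
    """Return the optional settings after applying the profile's defaults."""
    return {name: env.get(name, default) for name, default in OPTIONAL_VARS.items()}


if __name__ == "__main__":
    missing = missing_profile_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
    print("Optional settings in effect:", effective_optional_vars(os.environ))
```

Both helpers take the environment as a plain mapping, so the check can be exercised with a dictionary instead of the real process environment.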

### Step 2: Replace `getdate()`

`getdate()` is a SQL Server/Databricks function. Lakehouse uses `current_date()`:

```sql
-- Original (int_users.sql)
{{ calculate_age('date_of_birth', 'getdate()') }} as age

-- Migrated
{{ calculate_age('date_of_birth', 'current_date()') }} as age
```

> 💡 **Tip**: The `:: cast` syntax and the three-argument `DATEDIFF(year, start, end)` form are both **natively supported** in Lakehouse, so no changes are needed.
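
In a larger project it may be worth scanning the model files for any remaining `getdate()` calls rather than relying on memory. The sketch below is an illustration under assumptions, not tooling from the original repository: the function names are hypothetical, and the replacement defaults to `current_date()` as in this step (a call site that needs a timestamp would pass `current_timestamp()` instead).

```python
import re

# Matches getdate() case-insensitively, allowing whitespace before and inside the parentheses.
GETDATE_CALL = re.compile(r"\bgetdate\s*\(\s*\)", re.IGNORECASE)


def find_getdate_calls(sql_text):
    """Return (line_number, line) pairs for lines that still call getdate()."""
    hits = []
    for number, line in enumerate(sql_text.splitlines(), start=1):
        if GETDATE_CALL.search(line):
            hits.append((number, line.strip()))
    return hits


def replace_getdate(sql_text, replacement="current_date()"):
    """Replace every getdate() call with the given Lakehouse expression."""
    return GETDATE_CALL.sub(replacement, sql_text)


# Sample input modeled on the int_users.sql line shown in this step.
model_sql = "select\n    {{ calculate_age('date_of_birth', 'getdate()') }} as age\nfrom users"
print(find_getdate_calls(model_sql))
print(replace_getdate(model_sql))
```

Reporting line numbers before rewriting keeps the change reviewable, which matters when some call sites need a date and others a timestamp.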

### Step 3: Update the Target Name in the `generate_schema_name` Macro

The original macro uses `databricks_cluster` as the prod environment name. Change it to `clickzetta_prod`:

```sql
-- Original
{%- if target.name in ["prod", "databricks_cluster"] -%}

-- Migrated
{%- if target.name in ["prod", "clickzetta_prod"] -%}
```

---

## Fully Compatible Parts (No Changes Needed)

The following patterns are written identically on both sides and have been tested end-to-end:

```sql
-- :: cast type conversion (supported on both sides)
userId :: string as user_id,
eventTime :: timestamp as event_time,
amount :: double as transaction_amount

-- SCD Type 2 window logic
, users_scd2 as (
    select
        user_id, status, event_time as from_event_timestamp,
        lead(event_time) over w as to_event_timestamp
    from users
    window w as (partition by user_id order by event_time)
)

-- DATEDIFF three-argument form (supported on both sides)
DATEDIFF(year, date_of_birth, current_date())

-- Surrogate key (dbt_utils)
{{ dbt_utils.generate_surrogate_key(['user_id', 'from_event_timestamp', 'status']) }}

-- SELECT * EXCEPT (supported on both sides)
select * except (age)
from users
where age >= 18

-- Data contracts (model contracts), declared in YAML
config:
  contract:
    enforced: true
columns:
  - name: transaction_uid
    data_type: string
    data_tests:
      - not_null
      - unique

-- Unit tests, declared in YAML
unit_tests:
  - name: test__int_users__scd_logic
    model: int_users
    given:
      - input: ref('stg_user_events')
        rows: [...]
    expect:
      rows: [...]
```
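
To make the SCD Type 2 window logic above concrete, here is a small sketch of what `lead(event_time) over (partition by user_id order by event_time)` computes. It is an illustration only: the function name `scd2_intervals` and the sample events are hypothetical, and the real project does this in SQL rather than in Python.

```python
from collections import defaultdict


def scd2_intervals(events):
    """Pair each event with the next event time for the same user.

    events: iterable of (user_id, status, event_time) tuples.
    Returns rows of (user_id, status, from_event_timestamp, to_event_timestamp),
    where to_event_timestamp is None for the latest row of each user,
    mirroring lead(event_time) over (partition by user_id order by event_time).
    """
    by_user = defaultdict(list)
    for user_id, status, event_time in events:
        by_user[user_id].append((event_time, status))

    rows = []
    for user_id in sorted(by_user):
        history = sorted(by_user[user_id])  # order by event_time within the partition
        for index, (event_time, status) in enumerate(history):
            next_time = history[index + 1][0] if index + 1 < len(history) else None
            rows.append((user_id, status, event_time, next_time))
    return rows


# Hypothetical sample events; ISO-8601 strings sort in chronological order.
events = [
    ("u1", "registered", "2024-01-01T09:00:00"),
    ("u2", "registered", "2024-01-03T12:00:00"),
    ("u1", "verified", "2024-01-05T10:30:00"),
    ("u1", "suspended", "2024-02-01T08:00:00"),
]
for row in scd2_intervals(events):
    print(row)
```

Each row's validity interval ends where the same user's next event begins, and the most recent row per user keeps an open end, which is the usual way a current SCD2 record is identified.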

---

## dbt Test Results

Tested on the AWS Singapore instance (`aws_singapore_prod`); all 36 tests pass:

```
dbt seed:  9/9    PASS  (196 users, 310 transactions, 80 accounts, 60 cards...)
dbt run:   30/30  PASS  (9 views + 21 tables)
dbt test:  36/36  PASS  (21 data tests + 15 unit tests)
```

Data quality tests include: not_null/unique constraints on source tables, marts-layer data contracts (`contract: enforced: true`), and column type validation on fact_transaction. Unit tests cover SCD2 logic (`int_users__scd_logic`), transaction aggregation (`int_transactions__union_all`), and the age calculation macro (`calculate_age__valid_ages`).
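
The totals quoted in the results can be cross-checked against the figures given elsewhere in this article. The short sketch below only re-adds numbers already stated (the per-table row counts from the Project Background table and the view/table and test splits from the results); the variable names are illustrative.

```python
# Row counts per seed table, copied from the Project Background table.
seed_rows = {
    "user_events": 196,
    "merchant_events": 30,
    "industry_codes": 30,
    "account_events": 80,
    "card_events": 60,
    "raw_p2p_captured_events": 150,
    "raw_p2p_failed_events": 50,
    "raw_p2b_captured_events": 80,
    "raw_p2b_failed_events": 30,
}

# The four transaction tables are the ones whose names start with raw_p2.
transaction_rows = sum(count for name, count in seed_rows.items() if name.startswith("raw_p2"))

print("seed tables:", len(seed_rows))         # one seed per raw table
print("transaction rows:", transaction_rows)  # 150 + 50 + 80 + 30
print("models:", 9 + 21)                      # views + tables
print("tests:", 21 + 15)                      # data tests + unit tests
```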

---

## Notes

- **`getdate()` vs `current_date()`**: Databricks' `getdate()` returns the current timestamp. Lakehouse uses `current_date()` (date) or `current_timestamp()` (timestamp) as replacements. The `col :: type` cast syntax and the three-argument `DATEDIFF(year, s, e)` form are both **natively supported** in Lakehouse, so no changes are needed.
- **Catalog hierarchy**: Databricks uses three-part naming (`catalog.schema.table`). In dbt with Lakehouse, only `schema.table` is used; the adapter handles the rest automatically. No manual changes to table references in models are needed.
- **Seeds replacing the ingestion layer**: The original project relies on ingestion tables in the Databricks workspace. After migrating to Lakehouse, seed CSVs are used as replacements. In production, these can be replaced with Studio data integration or CDC tasks.
|
|
228
|
+
|
|
229
|
+
## Related Documentation
|
|
230
|
+
|
|
231
|
+
### dbt Development Guides
|
|
232
|
+
|
|
233
|
+
- [dbt Quick Start](use-dbt-dev.md): dbt-clickzetta adapter installation and configuration
|
|
234
|
+
- [dbt Incremental Models](dbt-incremental.md): Incremental strategy configuration and unique_key options
|
|
235
|
+
- [dbt Data Quality Checks](dbt-data-quality.md): Data tests, contract validation, and custom tests
|
|
236
|
+
- [dbt Advanced Features](dbt-advanced-features.md): Macros, snapshots, semantic layer, and more
|
|
237
|
+
|
|
238
|
+
### Other Migration Case Studies
|
|
239
|
+
|
|
240
|
+
- [Databricks Notebook → Lakehouse Migration (Retail Medallion Pipeline)](databricks-notebook-to-studio-migration.md): PySpark Notebook → ZettaPark/Studio tasks
|
|
241
|
+
- [dbt BigQuery Migration: Retail Data Warehouse Pipeline](dbt-bigquery-to-clickzetta-migration.md): BigQuery → Lakehouse adapter migration
|
|
242
|
+
- [dbt Snowflake Migration: TPC-H Data Warehouse Pipeline](dbt-snowflake-to-clickzetta-migration.md): Snowflake → Lakehouse adapter migration
|
|
@@ -1,12 +1,12 @@
|
|
|
1
1
|
# Lakehouse Column-Level Security (Dynamic Masking) User Guide
|
|
2
2
|
|
|
3
|
-
##
|
|
3
|
+
## Overview
|
|
4
4
|
|
|
5
|
-
Column-level Security provides fine-grained data protection capabilities through Dynamic Data Masking, which dynamically modifies the display of sensitive data (such as partial hiding or character replacement) based on user identity or role.
|
|
5
|
+
Column-level Security provides fine-grained data protection capabilities through Dynamic Data Masking, which dynamically modifies the display of sensitive data (such as partial hiding or character replacement) based on user identity or role. The system stores only the original data and executes the masking function at runtime, when the data is accessed. This document describes how to implement this through the SQL interface.
|
|
6
6
|
|
|
7
|
-
##
|
|
7
|
+
## Core Syntax
|
|
8
8
|
|
|
9
|
-
###
|
|
9
|
+
### 1 Creating Masking Policy Functions
|
|
10
10
|
|
|
11
11
|
Refer to the [CREATE FUNCTION (SQL)](create-sql-function.md) syntax.
|
|
12
12
|
|
|
@@ -19,12 +19,12 @@ expression_with_conditional_logic;
|
|
|
19
19
|
|
|
20
20
|
**Key Elements**:
|
|
21
21
|
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
22
|
+
* Must return the same data type as the original column.
|
|
23
|
+
* Use security context functions:
|
|
24
|
+
* `current_user()` to get the current user (note case sensitivity).
|
|
25
|
+
* `current_roles()` to get an array of user roles.
|
|
26
26
|
|
|
27
|
-
### 2
|
|
27
|
+
### 2 Binding Policies to Columns
|
|
28
28
|
|
|
29
29
|
**When Creating a Table**:
|
|
30
30
|
|
|
@@ -45,13 +45,13 @@ SET MASK schema_name.masking_function;
|
|
|
45
45
|
|
|
46
46
|
**Adding a Column with Masking**:
|
|
47
47
|
|
|
48
|
-
```
|
|
48
|
+
```SQL
|
|
49
49
|
ALTER TABLE table_name ADD COLUMN (column_name column_type MASK schema_name.masking_function);
|
|
50
50
|
```
|
|
51
51
|
|
|
52
|
-
###
|
|
52
|
+
### 3 Removing Policy Binding
|
|
53
53
|
|
|
54
|
-
```
|
|
54
|
+
```SQL
|
|
55
55
|
ALTER TABLE table_name
|
|
56
56
|
CHANGE COLUMN column_name
|
|
57
57
|
UNSET MASK;
|
|
@@ -59,9 +59,9 @@ UNSET MASK;
|
|
|
59
59
|
|
|
60
60
|
***
|
|
61
61
|
|
|
62
|
-
##
|
|
62
|
+
## Use Case Examples
|
|
63
63
|
|
|
64
|
-
###
|
|
64
|
+
### 1 Basic Masking
|
|
65
65
|
|
|
66
66
|
**Requirement**: Display the first 6 characters of an ID card number, followed by 4 asterisks, and then the last 4 characters.
|
|
67
67
|
|
|
@@ -79,7 +79,7 @@ ALTER TABLE data CHANGE COLUMN idcard SET MASK public.idcard_masking;
|
|
|
79
79
|
Original Value: 130183199901011234 → Masked: 130183****9010
|
|
80
80
|
```
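
The masking rule stated in the requirement can be sketched outside SQL to check the expected shape of the output. This is an illustration under a literal reading of "first 6 characters, 4 asterisks, last 4 characters"; the function name and the handling of short values are assumptions, and the SQL function itself is not visible in this excerpt. Note that the sample output above ends in `9010`, which a literal first-6/last-4 rule would not produce for that input, so the real function's substring offsets may differ.

```python
def mask_id_card(value: str, keep_head: int = 6, keep_tail: int = 4, stars: int = 4) -> str:
    """Keep the first keep_head and last keep_tail characters, with a fixed run of asterisks between.

    Hypothetical helper for illustration. Values too short to have a hidden
    middle section are fully masked (an assumption, chosen so that short
    values are never returned in clear text).
    """
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    return value[:keep_head] + "*" * stars + value[-keep_tail:]


if __name__ == "__main__":
    print(mask_id_card("130183199901011234"))  # 130183****1234
```

The result is always a string, which mirrors the rule that a masking function must return the same data type as the column it protects.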
|
|
81
81
|
|
|
82
|
-
###
|
|
82
|
+
### 2 Dynamic Masking Based on User
|
|
83
83
|
|
|
84
84
|
**Requirement**: Only the `UAT_TEST` user should see masked data.
|
|
85
85
|
|
|
@@ -104,7 +104,7 @@ CASE
|
|
|
104
104
|
END;
|
|
105
105
|
```
|
|
106
106
|
|
|
107
|
-
### 3
|
|
107
|
+
### 3 Dynamic Masking Based on Role
|
|
108
108
|
|
|
109
109
|
**Requirement**: Users with the `user_admin` role can view the full information.
|
|
110
110
|
|
|
@@ -121,9 +121,9 @@ END;
|
|
|
121
121
|
|
|
122
122
|
***
|
|
123
123
|
|
|
124
|
-
##
|
|
124
|
+
## Complete Operation Example
|
|
125
125
|
|
|
126
|
-
###
|
|
126
|
+
### 1 Initializing the Environment
|
|
127
127
|
|
|
128
128
|
```sql
|
|
129
129
|
CREATE SCHEMA IF NOT EXISTS security_demo;
|
|
@@ -144,7 +144,7 @@ INSERT INTO security_demo.user_data VALUES ('James', '123-45-6789', '123456789')
|
|
|
144
144
|
SELECT * FROM security_demo.user_data;
|
|
145
145
|
```
|
|
146
146
|
|
|
147
|
-
###
|
|
147
|
+
### 2 Creating Policy Functions
|
|
148
148
|
|
|
149
149
|
```sql
|
|
150
150
|
-- Exemption for privileged roles
|
|
@@ -157,7 +157,7 @@ CASE
|
|
|
157
157
|
END;
|
|
158
158
|
```
|
|
159
159
|
|
|
160
|
-
###
|
|
160
|
+
### 3 Modifying Masking Policies
|
|
161
161
|
|
|
162
162
|
```sql
|
|
163
163
|
-- Removing the previous policy
|
|
@@ -167,7 +167,7 @@ ALTER TABLE security_demo.user_data CHANGE COLUMN ssn UNSET MASK;
|
|
|
167
167
|
ALTER TABLE security_demo.user_data CHANGE COLUMN ssn SET MASK security_demo.admin_ssn_mask;
|
|
168
168
|
```
|
|
169
169
|
|
|
170
|
-
### 4
|
|
170
|
+
### 4 Verifying the Effect
|
|
171
171
|
|
|
172
172
|
**Query by a Regular User**:
|
|
173
173
|
|
|
@@ -185,18 +185,18 @@ SELECT * FROM user_data;
|
|
|
185
185
|
|
|
186
186
|
***
|
|
187
187
|
|
|
188
|
-
##
|
|
188
|
+
## Management Notes
|
|
189
189
|
|
|
190
|
-
###
|
|
190
|
+
### 1 Permission Control
|
|
191
191
|
|
|
192
|
-
|
|
193
|
-
|
|
192
|
+
* Only roles with `ALTER TABLE` permissions are allowed to modify masking policies.
|
|
193
|
+
* Function creation requires `CREATE FUNCTION` permissions.
|
|
194
194
|
|
|
195
|
-
###
|
|
195
|
+
### 2 Performance Recommendations
|
|
196
196
|
|
|
197
|
-
|
|
198
|
-
|
|
197
|
+
* Avoid using complex calculations in masking functions.
|
|
198
|
+
* Use conditional logic cautiously for columns with high query frequency.
|
|
199
199
|
|
|
200
|
-
##
|
|
200
|
+
## Limitations
|
|
201
201
|
|
|
202
|
-
|
|
202
|
+
* Only one masking policy can be bound to a single column. If you want to define multiple masking rules, you can use conditional logic within a single function to apply different policies.
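
Because a column accepts only one policy, several rules have to live in one function as ordered branches. The sketch below models that idea in Python rather than SQL. The `user_admin` role comes from the examples in this guide; the `analyst` role, the function name, and the partial-masking format are hypothetical and only serve to show a second branch.

```python
from typing import Iterable


def mask_ssn(value: str, current_roles: Iterable[str]) -> str:
    """Apply several masking rules from a single function, most privileged first.

    Hypothetical illustration of one policy with conditional branches:
    - user_admin: full value
    - analyst (assumed role): only the last four characters visible
    - everyone else: fully masked
    """
    roles = set(current_roles)
    if "user_admin" in roles:
        return value
    if "analyst" in roles:
        return "***-**-" + value[-4:]
    return "*" * len(value)


if __name__ == "__main__":
    for roles in (["user_admin"], ["analyst"], []):
        print(roles, mask_ssn("123-45-6789", roles))
```

Ordering the branches from most to least privileged keeps a user who holds several roles from being masked more than intended.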
|
|
@@ -77,7 +77,7 @@ Lakehouse currently uses a scheduling mechanism to update Dynamic Table. The fol
|
|
|
77
77
|
|
|
78
78
|
| | Usage | Advantages | Disadvantages |
|
|
79
79
|
| ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
80
|
-
| Define scheduling attributes in DDL statements | Define the refresh interval in refreshOption, refer to the documentation <
|
|
80
|
+
| Define scheduling attributes in DDL statements | Define the refresh interval in refreshOption, refer to the documentation <create-dynamic-table.md> for specific usage. Currently, the refresh interval is limited to one minute. | Simple and easy to use, can quickly set refresh options. Does not rely on any third-party tools. | Currently, Lakehouse does not support defining strict upstream and downstream dependencies on Dynamic Table. In DDL definition, it relies on time scheduling. You can ensure upstream refresh completion before scheduling downstream through time intervals. |
|
|
81
81
|
| Define scheduling in Lakehouse Studio | You can configure scheduling through a visual interface in Lakehouse Studio, refer to [Task Development Scheduling Documentation](taskdevelop.md) for specific configuration methods. Currently, the refresh interval is limited to one minute. | Visual configuration, user-friendly. Supports scheduling dependency configuration to ensure downstream refresh after upstream refresh completion. Supports single-node operation monitoring, such as failure alerts, timeout alerts, etc. | |
|
|
82
82
|
| Submit refresh jobs using third-party scheduling engines | Download the Lakehouse command-line client and use cron expressions to schedule refresh tasks. Alternatively, use the Java JDBC interface to submit refresh commands from your own code. | More flexible control over job submission and scheduling configuration, with no time interval restrictions. | Depends on a third-party scheduling system. |
|
|
83
83
|
|
|
@@ -75,7 +75,7 @@ Lakehouse currently uses a scheduling mechanism to update Dynamic Tables. The fo
|
|
|
75
75
|
|
|
76
76
|
| | Usage | Advantages | Disadvantages |
|
|
77
77
|
| ------------------------ | ------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------------------- |
|
|
78
|
-
| Define scheduling properties in DDL statements | Define the refresh interval in refreshOption, refer to the documentation
|
|
78
|
+
| Define scheduling properties in DDL statements | Define the refresh interval in refreshOption, refer to the documentation create-dynamic-table.md for specific usage. Currently, the refresh interval is limited to one minute. | Simple and easy to use, can quickly set refresh options. Does not rely on any third-party tools. | Currently, Lakehouse does not support defining strict upstream and downstream dependencies on Dynamic Tables. In DDL definitions, it relies on time scheduling. You can ensure the upstream refresh is completed before scheduling the downstream through time intervals. |
|
|
79
79
|
| Define scheduling in Lakehouse Studio | You can configure scheduling through a visual interface in Lakehouse Studio, refer to the [task development scheduling documentation](taskdevelop.md) for specific configuration methods. Currently, the refresh interval is limited to one minute. | Visual configuration, user-friendly. Supports scheduling dependency configuration to ensure the upstream refresh is completed before refreshing the downstream. Supports single-node operation monitoring, such as failure alerts, timeout alerts, etc. | |
|
|
80
80
|
| Submit refresh jobs using third-party scheduling engines | Download the Lakehouse command-line client and use cron expressions to schedule refresh tasks. Alternatively, use the Java JDBC interface to submit refresh commands from your own code. | More flexible control over job submission and scheduling configuration, with no time interval restrictions. | Depends on a third-party scheduling system. |
|
|
81
81
|
|
|
@@ -99,6 +99,6 @@ if st.button('Generate Chart') and sql:
|
|
|
99
99
|
## Reference Resources
|
|
100
100
|
|
|
101
101
|
* [Streamlit Official Documentation](https://docs.streamlit.io/library/get-started)
|
|
102
|
-
* [Singdata Lakehouse Official Documentation](https://
|
|
102
|
+
* [Singdata Lakehouse Official Documentation](https://singdata.com/documents)
|
|
103
103
|
|
|
104
104
|
^
|
|
@@ -22,7 +22,7 @@ Open the browser and go to http://localhost:8088.
|
|
|
22
22
|
### Local Installation
|
|
23
23
|
|
|
24
24
|
1. Install `clickzetta-sqlalchemy`:
|
|
25
|
-
`clickzetta-sqlalchemy` needs to be installed in an environment with Python version 3.
|
|
25
|
+
`clickzetta-sqlalchemy` needs to be installed in an environment with Python version 3.10 or above.
|
|
26
26
|
Installation command (it first uninstalls any existing `clickzetta-sqlalchemy` and `clickzetta-connector` packages to avoid dependency conflicts, so make sure nothing else in the current environment depends on them):
|
|
27
27
|
```
|
|
28
28
|
pip uninstall -y clickzetta-sqlalchemy clickzetta-connector && pip install clickzetta-connector -U
|
|
@@ -76,7 +76,6 @@ See [JDBC Driver](jdbc-driver.md) for details.
|
|
|
76
76
|
|--------|-------------|-----------|
|
|
77
77
|
| Apache Spark | Read and write Singdata Lakehouse tables via the Spark Connector; supports the DataFrame API and spark-sql | [Spark Connector](spark-connector-summary.md) |
|
|
78
78
|
| Apache Flink | Write to Singdata Lakehouse via the Flink Connector; supports CDC scenarios and append-only mode; sink tables only (write) | [Flink Connector](flink-write-connector.md) |
|
|
79
|
-
| Trino | Query Singdata Lakehouse data via Trino federated queries | [Trino Integration Guide](eco_integration/trino.md) |
|
|
80
79
|
|
|
81
80
|
**Two Flink Connector modes**:
|
|
82
81
|
- `igs-dynamic-table`: supports CDC (insert / update / delete); the target table must have a primary key
|
|
@@ -116,6 +115,5 @@ What is your use case?
|
|
|
116
115
|
│ └── Python applications → Python SDK
|
|
117
116
|
└── Compute engine
|
|
118
117
|
├── Batch processing / ML → Spark Connector
|
|
119
|
-
|
|
120
|
-
└── Federated queries → Trino
|
|
118
|
+
└── Stream processing / CDC → Flink Connector
|
|
121
119
|
```
|
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
# Ecosystem
|
|
2
|
+
|
|
3
|
+
Singdata Lakehouse is compatible with mainstream data integration, BI, AI, and development tools, and is deployed on seven public clouds including Alibaba Cloud, Tencent Cloud, and AWS. This document summarizes verified third-party tools and connection solutions organized by category.
|
|
4
|
+
|
|
5
|
+
If the tool you need is not on the list, that does not mean it is unsupported — Lakehouse provides standard access via JDBC, MySQL protocol, and Python/Java SDKs, and any tool compatible with these protocols can connect directly. If you want to develop a new connector or integration solution based on Lakehouse, feel free to contact our partner team.
|
|
6
|
+
|
|
7
|
+
## Cloud Platforms (CSP)
|
|
8
|
+
|
|
9
|
+
Lakehouse is deployed on seven clouds: Alibaba Cloud, Tencent Cloud, AWS, GCP, Huawei Cloud, Baidu AI Cloud, and Volcengine. Alibaba Cloud, Tencent Cloud, and AWS provide complete dedicated documentation (including storage connections, private network connections, and permission configuration); the configuration approach is consistent across all other cloud platforms. BYOS (Bring Your Own Storage) deployment is also supported — data is stored under the user's own cloud account and does not pass through the Singdata platform. See [Supported Cloud Platforms](supported-cloud-platforms.md) and [Private Storage Overview](byos_general.md) for details.
|
|
10
|
+
|
|
11
|
+
***
|
|
12
|
+
|
|
13
|
+
## Data Integration
|
|
14
|
+
|
|
15
|
+
The following data integration tools are compatible with Lakehouse, covering offline batch, real-time CDC, message streaming, and log collection scenarios. Lakehouse also supports [50+ data sources](data-sources.md) (MySQL, Oracle, PostgreSQL, MongoDB, Hive, MaxCompute, etc.) via Studio Data Sync for direct access without third-party tools:
|
|
16
|
+
|
|
17
|
+
| Tool | Connection | Description | Reference |
|
|
18
|
+
| ------------ | ------------------ | ------------------------------------------------------------- | ------------------------------------------------------------ |
|
|
19
|
+
| Apache Kafka | Kafka Connector | Real-time message stream writing to Lakehouse | [Kafka Data Source](DataSource_Kafka.md) |
|
|
20
|
+
| AutoMQ | Kafka Protocol | Next-generation message queue, compatible with Kafka protocol | [AutoMQ Data Source](DataSource_AutoMQ.md) |
|
|
21
|
+
| Airbyte | JDBC | Open-source ELT platform with a rich connector ecosystem | [Airbyte Integration Guide](airbyte.md) |
|
|
22
|
+
| DataX | Plugin-based | Alibaba open-source tool, suitable for batch data sync | [DataX Integration Guide](eco_integration/datax.md) |
|
|
23
|
+
| Apache Flink | Flink Connector | Stream processing engine for real-time writes to Lakehouse | [Flink Connector](flink-write-connector.md) |
|
|
24
|
+
| Apache Spark | Spark Connector | Large-scale data reads and writes for Lakehouse tables | [Spark Connector](spark-connector-summary.md) |
|
|
25
|
+
| Logstash | Logstash Connector | Import log data into Lakehouse | [Logstash Integration Guide](Logstash.md) |
|
|
26
|
+
| Bluepipe | Native integration | Real-time CDC sync from Oracle to Lakehouse | [Bluepipe Sync Guide](bluepipe-oracle-lakehouse-datasync.md) |
|
|
27
|
+
|
|
28
|
+
***
|
|
29
|
+
|
|
30
|
+
## BI and Visualization
|
|
31
|
+
|
|
32
|
+
The following BI tools are compatible with Lakehouse. Any BI tool supporting JDBC, ODBC, or MySQL protocol can connect directly and is not limited to the list below:
|
|
33
|
+
|
|
34
|
+
| Tool | Connection | Description | Reference |
|
|
35
|
+
| --------------- | -------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------- |
|
|
36
|
+
| FineBI | JDBC / MySQL | Leading domestic BI tool | [JDBC Connection](FineBI.md) · [MySQL Protocol](finebi-mysql.md) |
|
|
37
|
+
| Tableau | JDBC | Suitable for complex visualizations and exploratory analysis | [Tableau Connection Guide](tableau-connect-to-lakehouse.md) |
|
|
38
|
+
| Power BI | MySQL Protocol | Connect via MySQL protocol | [Power BI Connection Guide](PowerBI.md) |
|
|
39
|
+
| Apache Superset | SQLAlchemy | Open-source, suitable for self-service analytics | [Superset Connection Guide](eco_integration/superset.md) |
|
|
40
|
+
| Metabase | JDBC | Open-source, easy to deploy, suitable for small and medium teams | [Metabase Connection Guide](metabase.md) |
|
|
41
|
+
| Apache Zeppelin | JDBC | Notebook-style data exploration | [Zeppelin Connection Guide](eco_integration/Zeppelin.md) |
|
|
42
|
+
| Rath | JDBC | Open-source intelligent analytics with automatic insight support | [Rath Connection Guide](eco_integration/rath.md) |
|
|
43
|
+
| Streamlit | Python SDK | Rapidly build data apps for data science teams | [Streamlit Connection Guide](eco_integration/streamlit.md) |
|
|
44
|
+
|
|
45
|
+
***
|
|
46
|
+
|
|
47
|
+
## Transformation and Compute Engines
|
|
48
|
+
|
|
49
|
+
The following data transformation tools and compute engines are compatible with Lakehouse:
|
|
50
|
+
|
|
51
|
+
| Tool | Connection | Description | Reference |
|
|
52
|
+
| ------------ | ---------------------- | ------------------------------------------------------------------------ | ----------------------------------------------- |
|
|
53
|
+
| dbt | dbt-clickzetta adapter | Data modeling and transformation, supports Dynamic Table materialization | [dbt Integration Guide](eco_integration/dbt.md) |
|
|
54
|
+
| Apache Spark | Spark Connector | Large-scale batch processing and machine learning | [Spark Connector](spark-connector-summary.md) |
|
|
55
|
+
| Apache Flink | Flink Connector | Real-time stream processing | [Flink Connector](flink-write-connector.md) |
|
|
56
|
+
|
|
57
|
+
The **dbt documentation series** covers all scenarios from quick start to migration practice: jaffle-shop experience, Snowflake/BigQuery migration, incremental processing, real-time pipelines, and data quality testing. See [DBT Practice Series](dbt-practice-series.md).
|
|
58
|
+
|
|
59
|
+
***
|
|
60
|
+
|
|
61
|
+
## AI and Machine Learning
|
|
62
|
+
|
|
63
|
+
The following AI frameworks and platforms are compatible with Lakehouse, supporting vector storage, RAG applications, and AI workflow scenarios:
|
|
64
|
+
|
|
65
|
+
| Tool | Integration | Description | Reference |
|
|
66
|
+
| --------------- | ---------------- | ------------------------------------------------ | -------------------------------------------------------------------------- |
|
|
67
|
+
| LangChain | Python SDK | Vector storage and RAG application development | [LangChain Integration](langchain_integration.md) |
|
|
68
|
+
| LlamaIndex | Python SDK | Data indexing and retrieval | [LlamaIndex Integration](llama-index.md) |
|
|
69
|
+
| Dify | MCP Server / SDK | Vector database + file storage | [Dify Integration Overview](dify_yunqilakehouse_integration_overview.md) |
|
|
70
|
+
| N8N | MCP Server | Unified AI workflows | [N8N Integration](N8N_AI_Workflow_Integration.md) |
|
|
71
|
+
| MindsDB | JDBC | ML/LLM modeling and prediction on Lakehouse data | [MindsDB Integration](JDBC_MindsDB_ML_LLM.md) |
|
|
72
|
+
| Datus | MCP Server | Data engineering agent | [Datus Integration](Datus_Lakehouse_Integrated_Guide.md) |
|
|
73
|
+
| Zilliz | Joint solution | Vector database joint solution | [Zilliz Joint Solution](lakehouse-zilliz-make-data-ready-for-bi-and-ai.md) |
|
|
74
|
+
| Unstructured.io | SDK | Unstructured document parsing and vectorization | [Unstructured.io Integration](unstructured-io.md) |
|
|
75
|
+
|
|
76
|
+
Lakehouse also provides an [MCP Server](LakehouseMCPServer.md) that can be called by any AI Agent supporting the MCP protocol.
|
|
77
|
+
|
|
78
|
+
***
|
|
79
|
+
|
|
80
|
+
## Programmatic Interfaces
|
|
81
|
+
|
|
82
|
+
Lakehouse provides the following native programming interfaces and SDKs:
|
|
83
|
+
|
|
84
|
+
| Interface | Language | Description | Reference |
|
|
85
|
+
| -------------- | ---------- | ----------------------------------------------------------- | ---------------------------------------------------------- |
|
|
86
|
+
| JDBC Driver | Java / JVM | Standard JDBC interface, compatible with all JVM ecosystems | [JDBC Driver](JDBC-Driver.md) |
|
|
87
|
+
| MySQL Protocol | All | No client dependency, compatible with MySQL ecosystem | [MySQL Protocol Connection](use-mysql-client.md) |
|
|
88
|
+
| Python SDK | Python | PEP 249 compatible, supports batch/real-time writes | [Python SDK](python_reference/python-sdk-summary.md) |
|
|
89
|
+
| Java SDK | Java | Supports BulkLoad and real-time stream writes | [Java SDK Batch Upload](use-java-sdk-upload-data-local.md) |
|
|
90
|
+
| SQLAlchemy | Python | Standard Python ORM / SQL toolkit | [SQLAlchemy Connection](sqlalchemy.md) |
|
|
91
|
+
| cz-cli | Shell | Command-line client: SQL + Studio Tasks + AI Agent | [cz-cli Guide](cz-cli.md) |
|
|
92
|
+
|
|
93
|
+
***
|
|
94
|
+
|
|
95
|
+
## SQL Clients and Database Management Tools
|
|
96
|
+
|
|
97
|
+
These tools connect via JDBC or MySQL protocol, compatible with standard SQL operations:
|
|
98
|
+
|
|
99
|
+
| Tool | Connection | Description | Reference |
|
|
100
|
+
| --------------- | -------------- | ------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------- |
|
|
101
|
+
| DBeaver | JDBC | Open-source and free, community edition is sufficient, suitable for daily queries and data exploration | [DBeaver Connection Guide](eco_integration/dbeaver-lakehouse.md) |
|
|
102
|
+
| DataGrip | JDBC | JetBrains product with strong code completion and SQL analysis | [DataGrip Connection Guide](eco_integration/datagrip-lakehouse.md) |
|
|
103
|
+
| SQL Workbench/J | JDBC | Lightweight, basic SQL execution | [SQL Workbench/J Connection Guide](eco_integration/sqlworkbench-j-lakehouse.md) |
|
|
104
|
+
| Navicat | MySQL Protocol | Visual management with intuitive operations | [Navicat Connection Guide](navicat-mysql.md) |
|
|
105
|
+
|
|
106
|
+
***
|
|
107
|
+
|
|
108
|
+
## Data Lake Formats
|
|
109
|
+
|
|
110
|
+
Lakehouse is **natively based on Apache Iceberg** — tables are stored in Iceberg format, supporting time travel, partition evolution, schema evolution, and cross-engine access. Delta Lake and Hudi formats are also supported via external tables:
|
|
111
|
+
|
|
112
|
+
| Format | Relationship | Description | Reference |
|
|
113
|
+
| -------------- | -------------- | --------------------------------------------------------------- | ------------------------------------------------------------ |
|
|
114
|
+
| Apache Iceberg | Native format | Underlying format for all Lakehouse tables, cross-engine access | [Spark + Iceberg Analytics](spark-lakehouse-iceberg-rest.md) |
|
|
115
|
+
| Delta Lake | External table | Open table format from the Databricks ecosystem | [Delta Lake External Table](delta-lake.md) |
|
|
116
|
+
| Apache Hudi | External table | Open table format optimized for streaming writes | [Hudi External Table](external-hudi-table.md) |
|
|
117
|
+
|
|
118
|
+
**Federated Queries**: Query Iceberg tables in Hive, Databricks, and Snowflake OpenCatalog directly via External Catalog, without data migration. See [Federated Query](federation-query.md).
|
|
119
|
+
|
|
120
|
+
***
|
|
121
|
+
|
|
122
|
+
## Modern Data Stack
|
|
123
|
+
|
|
124
|
+
The following solution combinations show how to build a complete data platform using Lakehouse and ecosystem tools:
|
|
125
|
+
|
|
126
|
+
| Solution | Toolchain | Reference |
|
|
127
|
+
| ------------------ | ------------------------------------ | --------------------------------------------------------------------------- |
|
|
128
|
+
| ELT-oriented | Airbyte → Lakehouse → dbt → Metabase | [ELT Modern Data Stack](ELTModernDataStack.md) |
|
|
129
|
+
| Analytics-oriented | Lakehouse ← dbt → Superset | [Analytics Modern Data Stack](analytics-modern-data-stack.md) |
|
|
130
|
+
| BI + AI | Lakehouse + Zilliz | [BI + AI Joint Solution](lakehouse-zilliz-make-data-ready-for-bi-and-ai.md) |
|
|
131
|
+
|
|
132
|
+
***
|
|
133
|
+
|
|
134
|
+
> 💡 **Tip**: The list above contains verified and compatible third-party tools. Lakehouse provides standard access via JDBC, MySQL protocol, and Python/Java SDKs — any tool compatible with these protocols can be used directly. If the tool you need is not on the list, it can still connect normally.
|
|
135
|
+
|
|
136
|
+
## Quick Navigation
|
|
137
|
+
|
|
138
|
+
* **Understand product concepts**: [Key Concepts](key-concepts.md) · [Incremental Computing](incremental-computing.md)
|
|
139
|
+
* **Start ingesting data**: [Data Integration](#data-integration) · [50+ Data Source Support](data-sources.md)
|
|
140
|
+
* **Connect BI tools**: [BI and Visualization](#bi-and-visualization)
|
|
141
|
+
* **Data modeling**: [dbt Integration Guide](eco_integration/dbt.md) · [DBT Practice Series](dbt-practice-series.md)
|
|
142
|
+
* **Programmatic access**: [Programmatic Interfaces](#programmatic-interfaces)
|
|
143
|
+
* **AI application development**: [AI and Machine Learning](#ai-and-machine-learning)
|
|
144
|
+
|
|
145
|
+
^
|