@clickzetta/cz-cli-darwin-arm64 0.5.16 → 0.5.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (243)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
  3. package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
  4. package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
  5. package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
  6. package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
  7. package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
  8. package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
  9. package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
  10. package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
  11. package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
  12. package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
  13. package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
  14. package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
  15. package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
  16. package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
  17. package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
  18. package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
  19. package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
  20. package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
  21. package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
  22. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
  23. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
  24. package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
  25. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
  26. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
  27. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
  28. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
  29. package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
  30. package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
  31. package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
  32. package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
  33. package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
  34. package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
  35. package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
  36. package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
  37. package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
  38. package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
  39. package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
  40. package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
  41. package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
  42. package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
  43. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
  44. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
  45. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
  46. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
  47. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
  48. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
  49. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
  50. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
  51. package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
  52. package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
  53. package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
  54. package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
  55. package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
  56. package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
  57. package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
  58. package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
  59. package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
  60. package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
  61. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
  62. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
  63. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
  64. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
  65. package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
  66. package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
  67. package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
  68. package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
  69. package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
  70. package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
  71. package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
  72. package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
  73. package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
  74. package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
  75. package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
  76. package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
  77. package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
  78. package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
  79. package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
  80. package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
  81. package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
  82. package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
  83. package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
  84. package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
  85. package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
  86. package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
  87. package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
  88. package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
  89. package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
  90. package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
  91. package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
  92. package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
  93. package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
  94. package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
  95. package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
  96. package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
  97. package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
  98. package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
  99. package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
  100. package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
  101. package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
  102. package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
  103. package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
  104. package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
  105. package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
  106. package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
  107. package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
  108. package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
  109. package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
  110. package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
  111. package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
  112. package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
  113. package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
  114. package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
  115. package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
  116. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
  117. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
  118. package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
  119. package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
  120. package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
  121. package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
  122. package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
  123. package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
  124. package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
  125. package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
  126. package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
  127. package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
  128. package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
  129. package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
  130. package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
  131. package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
  132. package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
  133. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
  134. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
  135. package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
  136. package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
  137. package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
  138. package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
  139. package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
  140. package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
  141. package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
  142. package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
  143. package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
  144. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
  145. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
  146. package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
  147. package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
  148. package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
  149. package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
  150. package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
  151. package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
  152. package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
  153. package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
  154. package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
  155. package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
  156. package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
  157. package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
  158. package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
  159. package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
  160. package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
  161. package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
  162. package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
  163. package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
  164. package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
  165. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
  166. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
  167. package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
  168. package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
  169. package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
  170. package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
  171. package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
  172. package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
  173. package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
  174. package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
  175. package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
  176. package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
  177. package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
  178. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
  179. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
  180. package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
  181. package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
  182. package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
  183. package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
  184. package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
  185. package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
  186. package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
  187. package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
  188. package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
  189. package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
  190. package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
  191. package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
  192. package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
  193. package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
  194. package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
  195. package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
  196. package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
  197. package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
  198. package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
  199. package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
  200. package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
  201. package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
  202. package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
  203. package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
  204. package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
  205. package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
  206. package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
  207. package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
  208. package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
  209. package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
  210. package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
  211. package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
  212. package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
  213. package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
  214. package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
  215. package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
  216. package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
  217. package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
  218. package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
  219. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
  220. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
  221. package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
  222. package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
  223. package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
  224. package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
  225. package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
  226. package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
  227. package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
  228. package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
  229. package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
  230. package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
  231. package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
  232. package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
  233. package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
  234. package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
  235. package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
  236. package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
  237. package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
  238. package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
  239. package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
  240. package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
  241. package/package.json +1 -1
  242. package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
  243. package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -0,0 +1,367 @@
1
+ # Databricks Unity Catalog Federation Query Practice
2
+
3
+ Singdata Lakehouse queries tables in Databricks Unity Catalog directly through an External Catalog. Data stays in Databricks' S3 storage without moving — Lakehouse handles SQL execution and result delivery. This guide uses an AWS environment as an example and walks through the complete configuration process from scratch.
4
+
5
+ ![](.topwrite/assets/15-databricks-federation.svg)
6
+
7
+ ---
8
+
9
+ ## Prerequisites
10
+
11
+ - Databricks workspace: Unity Catalog support required (Free Edition already supports it)
12
+ - Singdata Lakehouse instance: must be on the **same cloud platform** (both AWS) as the Databricks data storage (S3)
13
+ - Tools: cz-cli with the corresponding AWS profile pre-configured
14
+
15
+ ---
16
+
17
+ ## SQL Commands Involved
18
+
19
+ | Command | Purpose |
20
+ |---------|---------|
21
+ | `CREATE CATALOG CONNECTION` | Store Databricks OAuth M2M authentication credentials |
22
+ | `CREATE EXTERNAL CATALOG` | Create an external catalog pointing to Databricks Unity Catalog |
23
+ | `SHOW SCHEMAS IN` | List schemas in a Databricks Catalog |
24
+ | `SHOW TABLES IN` | List tables in a schema |
25
+ | `SELECT` | Query Databricks table data |
26
+
27
+ ---
28
+
29
+ ## Databricks Configuration
30
+
31
+ > 💡 **Account Console vs Workspace**: Databricks has two different entry points. `https://accounts.cloud.databricks.com` is the **Account Console**, which manages the organization, users, service principals, and other account-level settings. `https://dbc-xxx.cloud.databricks.com` is the **Workspace**, used for data development. The service principal (SP) configuration below is done in the Account Console; permission grants are done in the Workspace.
32
+
33
+ ### Create a Service Principal
34
+
35
+ Open `https://accounts.cloud.databricks.com` → **User management** → **Service principals** → **Add service principal**, and give it a recognizable name (e.g., `lakehouse_connector`).
36
+
37
+ > If `https://accounts.cloud.databricks.com` redirects directly into a Workspace, you may be using the Community Edition, which does not support Unity Catalog and cannot proceed. The Free Edition supports Unity Catalog and can use this feature normally.
38
+
39
+ After creating the SP, click its name to enter the details page and complete the following:
40
+
41
+ 1. **Roles** tab → Enable **Account admin**
42
+ 2. **Principal information** tab → Record the **Application ID** (this is the `CLIENT_ID` used later, in UUID format)
43
+ 3. **Credentials & secrets** tab → **Generate secret** → Record the complete **Secret** generated (this is the `CLIENT_SECRET` used later)
44
+
45
+ > **Note**: The Application ID is visible on both the SP list page and the Principal information tab. The Secret is shown in full only at generation time, so save it immediately. Each secret has an `Expires at` time; once it expires, generate a new secret and update the Catalog Connection.
46
+
47
+ ### Add the SP to the Workspace
48
+
49
+ In the Databricks Workspace → **Settings** → **Identity and access** → **Service Principals** → **Add service principal**, and add the SP you just created.
50
+
51
+ > **Note**: Authentication succeeds only when the SP meets all three conditions: it has the Account admin role, it has been added to the Workspace, and it holds the Catalog and Schema permissions. If any one is missing, the result is an `invalid_client` error.
52
+
53
+ ### Enable Metastore External Data Access
54
+
55
+ In the Databricks Workspace → **Catalog** → gear icon → **Metastore** → **Details** tab → find **External data access** → enable the toggle.
56
+
57
+ If this option is not enabled, queries fail with:
58
+
59
+ ```
60
+ PermissionDenied: External Data Access from non Databricks Compute environment is disabled for metastore
61
+ ```
62
+
63
+ ### Grant Catalog and Schema Permissions
64
+
65
+ In Databricks Catalog Explorer, grant the following permissions to the SP:
66
+
67
+ - **Catalog level**: `USE CATALOG`
68
+ - **Schema level**: `USE SCHEMA`, `SELECT`, `EXTERNAL USE SCHEMA`
69
+
70
+ > `EXTERNAL USE SCHEMA` is a required permission for federation queries. It can be granted in the regular Permissions panel; no SQL execution is needed.
71
+
72
+ If you have a SQL execution environment (Notebook or SQL Warehouse), you can also use GRANT commands (replace `<application-id>` with the SP's Application ID in UUID format):
73
+
74
+ ```sql
75
+ GRANT USE CATALOG ON CATALOG workspace
76
+ TO `<application-id>`;
77
+
78
+ GRANT USE SCHEMA ON SCHEMA workspace.table_types_demo
79
+ TO `<application-id>`;
80
+
81
+ GRANT SELECT ON SCHEMA workspace.table_types_demo
82
+ TO `<application-id>`;
83
+
84
+ -- Required permission for federation queries
85
+ GRANT EXTERNAL USE SCHEMA ON SCHEMA workspace.table_types_demo
86
+ TO `<application-id>`;
87
+ ```
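To confirm that the grants took effect, one option is to list them afterwards. The sketch below assumes a Databricks SQL environment (Notebook or SQL Warehouse) and uses the Databricks `SHOW GRANTS` statement; `<application-id>` is the same placeholder as in the GRANT commands above.

```sql
-- Run in Databricks, not in Lakehouse.
-- List the privileges the service principal holds on the catalog.
SHOW GRANTS `<application-id>` ON CATALOG workspace;

-- List the privileges the service principal holds on the schema.
SHOW GRANTS `<application-id>` ON SCHEMA workspace.table_types_demo;
```

If the grants succeeded, the schema-level listing is expected to include `USE SCHEMA`, `SELECT`, and `EXTERNAL USE SCHEMA`.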
88
+
89
+ ---
90
+
91
+ ## Create a Catalog Connection
92
+
93
+ ```sql
94
+ CREATE CATALOG CONNECTION IF NOT EXISTS databricks_conn
95
+ TYPE databricks
96
+ HOST = 'https://<workspace-url>.cloud.databricks.com'
97
+ CLIENT_ID = '<application-id>'
98
+ CLIENT_SECRET = '<oauth-secret>'
99
+ ACCESS_REGION = '<s3-bucket-region>';
100
+ ```
101
+
102
+ > **`ACCESS_REGION` should be the region of the S3 bucket, not the Databricks workspace region.**
103
+ >
104
+ > These are often different — the region selected when creating the Databricks workspace and the region of the S3 bucket where data is actually stored may not match. Using the wrong value causes query timeouts or `PermanentRedirect` errors.
105
+ >
106
+ > **How to confirm the S3 bucket region**: In the Databricks Workspace → left sidebar **Catalog** → click any table → in the right panel find the **Details** tab (or expand the properties panel on the right after clicking the table name) → check the **Storage Location** field, e.g., `s3://my-bucket/path/`. Then go to the AWS S3 Console, search for this bucket, and check its **AWS Region** attribute.
107
+
108
+ Verify the connection:
109
+
110
+ ```sql
111
+ SHOW CATALOG CONNECTIONS;
112
+ ```
113
+
114
+ ---
115
+
116
+ ## Create an External Catalog
117
+
118
+ ```sql
119
+ CREATE EXTERNAL CATALOG IF NOT EXISTS databricks_catalog
120
+ CONNECTION databricks_conn
121
+ OPTIONS ('catalog' = '<databricks-catalog-name>');
122
+ ```
123
+
124
+ The value of `catalog` is the name of the catalog in Databricks Unity Catalog. To find it, open the Databricks Workspace, click the **Catalog** icon in the left sidebar, and expand the left panel. The catalogs listed under **My organization** are the available ones (e.g., `workspace`, `main`, `hive_metastore`).
125
+
126
+ Verify:
127
+
128
+ ```sql
129
+ SHOW SCHEMAS IN databricks_catalog;
130
+ ```
131
+
132
+ Sample output:
133
+
134
+ ```
135
+ schema_name
136
+ -----------
137
+ default
138
+ information_schema
139
+ table_types_demo
140
+ ```
141
+
142
+ ---
143
+
144
+ ## Querying Data
145
+
146
+ ```sql
147
+ -- View table list
148
+ SHOW TABLES IN databricks_catalog.table_types_demo;
149
+
150
+ -- Query data
151
+ SELECT * FROM databricks_catalog.table_types_demo.orders_external LIMIT 10;
152
+ ```
153
+
154
+ ---
155
+
156
+ ## Federation Queries: Cross-Platform SQL Analysis
157
+
158
+ The core value of federation queries is: **Lakehouse can directly query and join Databricks data, and can also write Databricks data into Lakehouse local tables**, with no separate data migration step.
159
+
160
+ ### Scenario 1: JOIN Between Databricks Tables
161
+
162
+ `orders_external` (orders) and `customers_external` (customers) are joined via `customer_id` to calculate the order count and total spend per customer. Both tables have a `price` field of type `DECIMAL(10,2)`, requiring no conversion:
163
+
164
+ ```sql
165
+ SELECT
166
+ c.customer_name,
167
+ c.country,
168
+ c.loyalty_tier,
169
+ COUNT(o.order_id) AS order_count,
170
+ SUM(o.price) AS total_revenue
171
+ FROM databricks_catalog.table_types_demo.orders_external o
172
+ JOIN databricks_catalog.table_types_demo.customers_external c
173
+ ON o.customer_id = c.customer_id
174
+ GROUP BY c.customer_name, c.country, c.loyalty_tier
175
+ ORDER BY total_revenue DESC;
176
+ ```
177
+
178
+ Query results:
179
+
180
+ | customer_name | country | loyalty_tier | order_count | total_revenue |
181
+ |---------------|-----------|--------------|-------------|---------------|
182
+ | Alice Chen | China | Gold | 2 | 1698.99 |
183
+ | Frank Liu | China | Silver | 1 | 299.99 |
184
+ | Carol Zhang | China | Platinum | 2 | 249.98 |
185
+ | David Lee | Singapore | Bronze | 1 | 129.99 |
186
+ | Emma Wang | China | Silver | 1 | 89.99 |
187
+
188
+ ### Scenario 2: Low Inventory Alert
189
+
190
+ `inventory_delta` records inventory by warehouse (fields: `product_id`, `warehouse_location`, `quantity_available`). The query below finds products whose available quantity is under 200 and summarizes them by warehouse:
191
+
192
+ ```sql
193
+ SELECT
194
+ warehouse_location,
195
+ COUNT(*) AS low_stock_products,
196
+ MIN(quantity_available) AS min_stock,
197
+ AVG(quantity_available) AS avg_stock
198
+ FROM databricks_catalog.table_types_demo.inventory_delta
199
+ WHERE quantity_available < 200
200
+ GROUP BY warehouse_location
201
+ ORDER BY min_stock;
202
+ ```
203
+
204
+ Query results:
205
+
206
+ | warehouse_location | low_stock_products | min_stock | avg_stock |
207
+ |--------------------|--------------------|-----------|-----------|
208
+ | Warehouse A | 1 | 50 | 50.0 |
209
+ | Warehouse B | 2 | 75 | 112.5 |
210
+
211
+ ### Scenario 3: Writing Databricks Data to Lakehouse
212
+
213
+ Federation query results can be written directly into Lakehouse local tables for data consolidation. In the example below, `public` is a Lakehouse local schema (not Databricks) — replace it with your actual local schema name:
214
+
215
+ ```sql
216
+ -- Extract completed orders from Databricks into a Lakehouse local table
217
+ CREATE TABLE public.orders_from_databricks AS
218
+ SELECT
219
+ order_id,
220
+ customer_id,
221
+ order_date,
222
+ product_name,
223
+ CAST(price AS DECIMAL(10,2)) AS price,
224
+ status
225
+ FROM databricks_catalog.table_types_demo.orders_external
226
+ WHERE status = 'Delivered';
227
+ ```
228
+
229
+ Later queries read the local table directly and no longer access Databricks:
230
+
231
+ ```sql
232
+ SELECT * FROM public.orders_from_databricks;
233
+ ```
234
+
235
+ | order_id | customer_id | order_date | product_name | price | status |
236
+ |----------|-------------|------------|-----------------|---------|-----------|
237
+ | 2006 | 1005 | 2026-05-20 | Webcam HD | 89.99 | Delivered |
238
+ | 2001 | 1001 | 2026-05-15 | Laptop Pro | 1299.99 | Delivered |
239
+ | 2004 | 1001 | 2026-05-18 | Monitor 27inch | 399.00 | Delivered |
240
+
241
+ ### Scenario 4: Dynamic Table Consuming Databricks Data
242
+
243
+ Use Databricks data as the upstream source for a Dynamic Table to periodically aggregate into Lakehouse. `public` is a Lakehouse local schema — replace it with your actual local schema name:
244
+
245
+ ```sql
246
+ CREATE OR REPLACE DYNAMIC TABLE public.orders_daily_summary
247
+ REFRESH INTERVAL 1 HOUR VCLUSTER DEFAULT
248
+ COMMENT 'Daily order summary aggregated from Databricks'
249
+ AS
250
+ SELECT
251
+ order_date,
252
+ COUNT(*) AS order_count,
253
+ SUM(CAST(price AS DECIMAL(10,2))) AS total_revenue,
254
+ AVG(CAST(price AS DECIMAL(10,2))) AS avg_order_value
255
+ FROM databricks_catalog.table_types_demo.orders_external
256
+ GROUP BY order_date;
257
+ ```
258
+
259
+ Trigger the first refresh manually, then query the result:
260
+
261
+ ```sql
262
+ REFRESH DYNAMIC TABLE public.orders_daily_summary;
263
+ SELECT * FROM public.orders_daily_summary ORDER BY order_date;
264
+ ```
265
+
266
+ | order_date | order_count | total_revenue | avg_order_value |
267
+ |------------|-------------|---------------|-----------------|
268
+ | 2026-05-15 | 1 | 1299.99 | 1299.99 |
269
+ | 2026-05-17 | 1 | 49.99 | 49.99 |
270
+ | 2026-05-18 | 1 | 399.00 | 399.00 |
271
+ | 2026-05-19 | 1 | 129.99 | 129.99 |
272
+ | 2026-05-20 | 1 | 89.99 | 89.99 |
273
+ | 2026-06-04 | 2 | 499.98 | 249.99 |
274
+
275
+ > **Note**: When a Dynamic Table references an External Catalog table, every refresh is a full scan because Databricks tables do not support Table Stream. For large data volumes, it is recommended to first snapshot the data into a local table using `CREATE TABLE ... AS SELECT`, then build the Dynamic Table on the local table.
276
+
277
+ ---
278
+
279
+ ## Supported Table Types
280
+
281
+ Not all Databricks tables can be queried from Lakehouse. Support depends on the table's storage type:
282
+
283
+ | Table Type | Format | Supported | Notes |
284
+ |------------|--------|-----------|-------|
285
+ | `TABLE_DELTA_EXTERNAL` | Delta | Yes | Fully supported, recommended |
286
+ | `TABLE_DELTA` | Delta | Yes | Fully supported |
287
+ | `TABLE_EXTERNAL` (Delta format) | Delta | Yes | Supported |
288
+ | `TABLE_EXTERNAL` (Parquet/CSV/JSON) | Non-Delta | No | Reports `unsupported databricks table format` |
289
+ | `TABLE_DB_STORAGE` | Managed Delta | No | Cross-platform access not supported |
290
+ | `VIEW` | — | No | Driver compatibility issue |
291
+
292
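+ **Key conclusion**: Only **Delta format** tables can be queried through federation: `TABLE_DELTA_EXTERNAL`, `TABLE_DELTA`, and Delta-format `TABLE_EXTERNAL`. External tables in Parquet, CSV, or JSON format, Managed tables in Databricks-controlled storage (`TABLE_DB_STORAGE`), and views are currently not supported.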
293
+
294
+ When creating tables in Databricks, prefer Delta format:
295
+
296
+ ```sql
297
+ -- Recommended: Delta External Table
298
+ CREATE TABLE catalog.schema.my_table
299
+ USING DELTA
300
+ LOCATION 's3://my-bucket/my-table/';
301
+
302
+ -- Not recommended (Lakehouse cannot query this)
303
+ CREATE TABLE catalog.schema.my_table
304
+ USING PARQUET
305
+ LOCATION 's3://my-bucket/my-table/';
306
+ ```
307
+
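The support matrix above can be turned into a small pre-flight check before a table is referenced in a federation query. This is a minimal sketch, not part of Lakehouse or Databricks: the function name `is_federation_queryable` and its parameters are hypothetical, and the table type strings are copied from the matrix.

```python
# Table types taken from the support matrix above.
ALWAYS_SUPPORTED = {"TABLE_DELTA_EXTERNAL", "TABLE_DELTA"}


def is_federation_queryable(table_type: str, data_format: str = "") -> bool:
    """Return True if the support matrix lists this table as queryable.

    Hypothetical helper for illustration; it only encodes the matrix.
    """
    if table_type in ALWAYS_SUPPORTED:
        return True
    if table_type == "TABLE_EXTERNAL":
        # External tables are supported only when stored as Delta.
        return data_format.upper() == "DELTA"
    # TABLE_DB_STORAGE, VIEW, and anything unlisted are treated as unsupported.
    return False


for table_type, fmt in [("TABLE_DELTA_EXTERNAL", "DELTA"),
                        ("TABLE_EXTERNAL", "PARQUET"),
                        ("TABLE_DB_STORAGE", "DELTA")]:
    print(table_type, fmt, is_federation_queryable(table_type, fmt))
```

Treating unlisted types as unsupported is a deliberate choice here: a false negative costs one manual check, while a false positive surfaces later as a query error.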
308
+ ---
309
+
310
+ ## Common Error Troubleshooting
311
+
312
+ ### `invalid_client`
313
+
314
+ OAuth authentication failed. Check in order:
315
+
316
+ 1. Does the SP (service principal) have **Account admin** enabled under Account Console → **Roles**?
317
+ 2. Has the SP been added in Workspace → **Settings** → **Service Principals**?
318
+ 3. Is the `CLIENT_SECRET` the complete value (not a masked value like `50db****7f61`)? Go to Account Console → SP → **Credentials & secrets** to generate a new one.
319
+ 4. Has the Secret expired? Check the `Expires at` field on the **Credentials & secrets** page. If expired, generate a new Secret and re-run `CREATE CATALOG CONNECTION`.
320
+
321
+ ### `PermissionDenied: External Data Access ... is disabled`
322
+
323
+ External data access is not enabled on the Metastore. In the Databricks Workspace, go to Catalog → gear icon → Metastore and enable **External data access**.
324
+
325
+ ### `PermissionDenied: User does not have USE CATALOG`
326
+
327
+ The SP does not have Catalog access permission. In Databricks Catalog Explorer, find the corresponding Catalog → Permissions → Grant → add `USE CATALOG` for the SP.
328
+
329
+ ### `PermissionDenied: User does not have EXTERNAL USE SCHEMA`
330
+
331
+ The SP does not have external access permission for the Schema. In Databricks Catalog Explorer → Schema → Permissions → Grant → add `USE SCHEMA`, `SELECT`, and `EXTERNAL USE SCHEMA` for the SP.
332
+
333
+ ### `NotFound: Catalog 'main' does not exist`
334
+
335
+ The catalog name specified in `OPTIONS ('catalog' = ...)` does not exist. Open the Databricks Workspace → Catalog panel to check the actual catalog names.
336
+
337
+ ### Query Timeout (300 seconds) or `PermanentRedirect`
338
+
339
+ `ACCESS_REGION` is incorrect; S3 requests are being redirected. Check the table's actual storage location (Catalog Explorer → table details → Storage Location), confirm the S3 bucket's region, and recreate the Catalog Connection.
340
+
341
+ ### `unsupported databricks table format {} [PARQUET/CSV/JSON]`
342
+
343
+ The table uses a non-Delta format, which is currently not supported for querying from Lakehouse. Recreate the table in Databricks using Delta format, or convert the data to a Delta table.
344
+
345
+ ### `Table cannot be accessed from outside of Databricks Compute Environment ... kind being TABLE_DB_STORAGE`
346
+
347
+ The table is a Databricks Managed Table — data is stored in Databricks-controlled storage and does not support direct cross-platform access. The table must first be converted to an External Table in Databricks before it can be queried.
348
+
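The error list above lends itself to a simple lookup when triaging failed queries in logs. The sketch below is illustrative only: `suggest_fix` and `REMEDIES` are hypothetical names, the match strings are fragments of the error messages quoted in this section, and real messages may be worded differently.

```python
# (error fragment, first thing to check), in the order listed above.
REMEDIES = [
    ("invalid_client",
     "Check the SP's Account admin role, workspace membership, and CLIENT_SECRET."),
    ("External Data Access",
     "Enable External data access on the Metastore."),
    ("USE CATALOG",
     "Grant USE CATALOG on the catalog to the SP."),
    ("EXTERNAL USE SCHEMA",
     "Grant USE SCHEMA, SELECT, and EXTERNAL USE SCHEMA on the schema to the SP."),
    ("NotFound: Catalog",
     "Check the catalog name in OPTIONS ('catalog' = ...)."),
    ("PermanentRedirect",
     "Set ACCESS_REGION to the S3 bucket's region and recreate the Catalog Connection."),
    ("unsupported databricks table format",
     "Recreate or convert the table as Delta."),
    ("TABLE_DB_STORAGE",
     "Convert the Managed Table to an External Table in Databricks."),
]


def suggest_fix(error_message: str):
    """Return the first matching remedy, or None if nothing matches."""
    for fragment, remedy in REMEDIES:
        if fragment in error_message:
            return remedy
    return None


print(suggest_fix("PermissionDenied: User does not have USE CATALOG"))
```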
349
+ ---
350
+
351
+ ## Important Notes
352
+
353
+ | Note | Description |
354
+ |------|-------------|
355
+ | **Cloud platform restriction** | Databricks' S3 storage must be on the same cloud platform as the Lakehouse instance (both AWS). Databricks on GCP/Azure cannot be interconnected with an AWS Lakehouse. |
356
+ | **Region consistency** | `ACCESS_REGION` must match the S3 bucket's region, not the workspace region. |
357
+ | **Read-only restriction** | External Catalogs are read-only — writing data from Lakehouse to Databricks (INSERT/UPDATE/DELETE) is not supported. The reverse (writing Databricks data into Lakehouse) is fully supported; see Scenario 3. |
358
+ | **Version requirement** | Requires a Databricks edition that supports Unity Catalog. Free Edition is supported; Community Edition is not. |
359
+ | **Save your Secret** | The OAuth Secret is only shown in full at generation time — save it immediately. If lost, you must generate a new one. |
360
+
361
+ ---
362
+
363
+ ## Related Documentation
364
+
365
+ - [Create Catalog Connection](create-catalog-connection.md) — Complete DDL syntax and parameter descriptions
366
+ - [Create Databricks External Catalog](create-external-catalog.md) — External Catalog DDL syntax
367
+ - [Federation Query Usage Guide](SQL_External_Catalog_Guide.md) — Complete federation query examples
@@ -0,0 +1,199 @@
1
+ # Databricks Jobs → Lakehouse Studio Migration Guide: E-Commerce ETL Pipeline
2
+
3
+ If your data pipeline runs on Databricks Jobs (multi-task DAGs, task dependencies, scheduled triggers), migrating it to Singdata Lakehouse Studio takes little effort. Task content (PySpark/SQL code) needs only four mechanical substitutions to run on ZettaPark. Task orchestration (DAG dependencies, cron scheduling) is rebuilt with a few `cz-cli` commands, all configured in one pass.
4
+
5
+ This article validates the approach with a real e-commerce ETL pipeline: Bronze ingestion → Silver cleansing and joining → Gold aggregation. Its 3 tasks, DAG dependencies, and daily 02:00 schedule were fully migrated to Lakehouse Studio and passed all 8 automated validations.
6
+
7
+ Full code on GitHub: [databricks2lakehouse-jobs](https://github.com/clickzetta/databricks2lakehouse-jobs)
8
+
9
+ ---
10
+
11
+ ## Source Project
12
+
13
+ `01_source/jobs/ecommerce_etl_job.json`: Original Databricks Jobs definition — 3 notebook tasks, dependency chain 01→02→03, daily trigger at 02:00 AM:
14
+
15
+ ```json
16
+ {
17
+ "name": "ecommerce_etl_pipeline",
18
+ "schedule": {"quartz_cron_expression": "0 0 2 * * ?"},
19
+ "tasks": [
20
+ {"task_key": "ingest_raw", "notebook_task": {...}},
21
+ {"task_key": "transform_silver", "depends_on": [{"task_key": "ingest_raw"}]},
22
+ {"task_key": "aggregate_gold", "depends_on": [{"task_key": "transform_silver"}]}
23
+ ],
24
+ "email_notifications": {"on_failure": ["oncall@company.com"]}
25
+ }
26
+ ```
27
+
28
+ The pipeline processes e-commerce clickstream: 500 events × 30 products → daily sales summary across 5 categories.
29
+
30
+ Migrated code is in `03_lakehouse/tasks/`, comparable file-by-file with `01_source/notebooks/`.
31
+
32
+ ## Conclusion First
33
+
34
+ | Change | Effort | Notes |
35
+ |--------|--------|------|
36
+ | Task content (Python code) | Very low | ZettaPark 4 substitutions (import/session/table path/saveAsTable) |
37
+ | Task creation | Low | `cz-cli task create --type PYTHON --folder <id>` |
38
+ | Dependencies | Low | `--dep-tasks '[{"taskId":N,"taskName":"x"}]'` |
39
+ | Quartz cron → standard cron | Very low | `"0 0 2 * * ?"` → `"0 2 * * *"` |
40
+ | Alert notifications | Low | Databricks email → Studio monitoring rules (email/DingTalk/Feishu) |
41
+
42
+ `dbutils.notebook.run(nb)` calls and Job cluster configuration need no migration: Studio DAG dependencies replace chained notebook invocation, and the Virtual Cluster is managed automatically.
43
+
44
+ ---
45
+
46
+ ## Tech Stack Comparison
47
+
48
+ | | Databricks Jobs | Lakehouse Studio |
49
+ |---|---|---|
50
+ | Pipeline definition | Job JSON (`tasks: [...]`) | `cz-cli task create/save-config` |
51
+ | Task dependencies | `depends_on: [{task_key}]` | `--dep-tasks '[{"taskId":N,"taskName":"x"}]'` |
52
+ | Task content | Databricks Notebook (PySpark) | Studio Python task (ZettaPark) |
53
+ | Session | `spark` (global injection) | `clickzetta_dbutils.get_active_lakehouse_engine()` |
54
+ | Schedule cron | Quartz `"0 0 2 * * ?"` | Standard `"0 2 * * *"` |
55
+ | Cluster configuration | `job_clusters: [{...EC2 config}]` | Virtual Cluster auto-managed, no configuration needed |
56
+ | `dbutils.notebook.run(nb)` | Chained invocation | Replaced by Studio DAG dependencies |
57
+ | Failure alerts | `email_notifications.on_failure` | Studio monitoring rules (email/DingTalk/Feishu) |
58
+
59
+ ---
60
+
61
+ ![](.topwrite/assets/anim-32-databricks-jobs-migration.svg)
62
+
63
+ ---
64
+
65
+ ## Migration Steps
66
+
67
+ ### Step 1: Task Content — ZettaPark 4 Substitutions
68
+
69
+ Each notebook requires minimal changes; all business logic is preserved:
70
+
71
+ ```python
72
+ # Databricks notebook (original)
73
+ from pyspark.sql import functions as F # ← pyspark
74
+
75
+ df = spark.read.csv("/Volumes/ecommerce/landing/events/") # ← spark global
76
+ events = spark.table("ecommerce.bronze.raw_events") # ← 3-level naming
77
+
78
+ df.write.saveAsTable("ecommerce.silver.events_enriched")
79
+ ```
80
+
81
+ ```python
82
+ # Studio Python task (03_lakehouse/tasks/02_transform_silver.py)
83
+ from clickzetta.zettapark import functions as F # ① import
84
+ # session injected by platform (via clickzetta_dbutils) # ② session
85
+
86
+ events = session.table("jobs_bronze.raw_events") # ③ table path
87
+ df.write.saveAsTable("jobs_silver.events_enriched") # ④ saveAsTable
88
+ ```
89
+
90
+ DataFrame logic (join/filter/groupBy/agg/withColumn) is completely unchanged.
91
+
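Because the substitutions are mechanical, they can be applied as plain text replacements before manual review. The following is a rough sketch under stated assumptions: `to_zettapark` and `schema_map` are hypothetical names, the schema mapping is the one used in this example project, and the function only rewrites text. It does not obtain the `session` object, and it leaves file reads such as `spark.read.csv(...)` untouched for manual migration.

```python
def to_zettapark(source: str, schema_map: dict[str, str]) -> str:
    """Apply the mechanical text substitutions to one notebook's source."""
    # (1) import: pyspark functions -> ZettaPark functions
    out = source.replace("from pyspark.sql import", "from clickzetta.zettapark import")
    # (2) session: table reads go through `session` instead of the `spark` global
    out = out.replace("spark.table(", "session.table(")
    # (3) + (4) table paths: "catalog.schema.table" -> "schema.table"
    for old_prefix, new_prefix in schema_map.items():
        out = out.replace(f'"{old_prefix}.', f'"{new_prefix}.')
    return out


notebook = (
    "from pyspark.sql import functions as F\n"
    'events = spark.table("ecommerce.bronze.raw_events")\n'
    'df.write.saveAsTable("ecommerce.silver.events_enriched")\n'
)
schema_map = {"ecommerce.bronze": "jobs_bronze", "ecommerce.silver": "jobs_silver"}
print(to_zettapark(notebook, schema_map))
```

A text rewrite like this is only a starting point; each migrated task should still be compared against its original, as the repository layout (`01_source/notebooks/` next to `03_lakehouse/tasks/`) allows.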
92
+ ### Step 2: Task Creation
93
+
94
+ ```bash
95
+ # "task_key" in Databricks Jobs JSON corresponds to Studio task name
96
+ # --type is required (SQL / PYTHON / SHELL etc.)
97
+ # --folder specifies task folder ID (query via cz-cli task list-folders)
98
+
99
+ cz-cli task create etl_01_ingest_raw --type PYTHON --folder 91047 --profile aws_singapore_prod
100
+ cz-cli task create etl_02_transform_silver --type PYTHON --folder 91047 --profile aws_singapore_prod
101
+ cz-cli task create etl_03_aggregate_gold --type PYTHON --folder 91047 --profile aws_singapore_prod
102
+
103
+ # Upload task scripts
104
+ cz-cli task save-content <task_id> --file 03_lakehouse/tasks/01_ingest_raw.py
105
+ ```
106
+
107
+ ### Step 3: Set DAG Dependencies
108
+
109
+ Databricks Jobs uses `depends_on: [{task_key}]`; Studio uses `--dep-tasks` (requires both taskId and taskName):
110
+
111
+ ```bash
112
+ # Databricks Job JSON:
113
+ # {"task_key": "transform_silver", "depends_on": [{"task_key": "ingest_raw"}]}
114
+
115
+ # Studio equivalent (both taskId and taskName are required):
116
+ cz-cli task save-config <id_02> \
117
+ --deps replace \
118
+ --dep-tasks '[{"taskId":10143594,"taskName":"etl_01_ingest_raw"}]'
119
+
120
+ cz-cli task save-config <id_03> \
121
+ --deps replace \
122
+ --dep-tasks '[{"taskId":10144488,"taskName":"etl_02_transform_silver"}]'
123
+ ```
124
+
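For jobs with many tasks, the `--dep-tasks` value can be derived from the Databricks Job JSON rather than typed by hand. This is a minimal sketch: `build_dep_tasks` and the `studio_tasks` mapping are hypothetical, and the mapping from each `task_key` to its Studio task ID and name has to be filled in from the IDs returned when the tasks were created (the IDs below are the ones shown in the commands above).

```python
import json


def build_dep_tasks(job: dict, task_key: str, studio_tasks: dict) -> str:
    """Return the --dep-tasks JSON string for one Databricks task_key.

    studio_tasks maps a Databricks task_key to (Studio taskId, Studio taskName).
    """
    task = next(t for t in job["tasks"] if t["task_key"] == task_key)
    deps = []
    for dep in task.get("depends_on", []):
        task_id, task_name = studio_tasks[dep["task_key"]]
        deps.append({"taskId": task_id, "taskName": task_name})
    # Compact separators match the format used in the commands above.
    return json.dumps(deps, separators=(",", ":"))


job = {
    "tasks": [
        {"task_key": "ingest_raw"},
        {"task_key": "transform_silver", "depends_on": [{"task_key": "ingest_raw"}]},
        {"task_key": "aggregate_gold", "depends_on": [{"task_key": "transform_silver"}]},
    ]
}
studio_tasks = {
    "ingest_raw": (10143594, "etl_01_ingest_raw"),
    "transform_silver": (10144488, "etl_02_transform_silver"),
}
print(build_dep_tasks(job, "transform_silver", studio_tasks))
# prints: [{"taskId":10143594,"taskName":"etl_01_ingest_raw"}]
```

The printed string can be passed as the single-quoted argument to `--dep-tasks`. A task without `depends_on` yields `[]`.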
125
+ ### Step 4: Schedule Cron
126
+
127
+ Databricks uses Quartz 6-field format; Studio uses standard 5-field cron:
128
+
129
+ ```bash
130
+ # Databricks: "quartz_cron_expression": "0 0 2 * * ?" (seconds minutes hours day month weekday)
131
+ # Studio: standard cron "0 2 * * *" (minutes hours day month weekday)
132
+
133
+ cz-cli task save-cron <id_01> --cron "0 2 * * *"
134
+ ```
135
+
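The conversion shown above (drop the seconds field, replace `?` with `*`) can be sketched in a few lines. The helper below is hypothetical and covers only that simple case. It rejects numeric day-of-week values, because Quartz counts 1 as Sunday while standard cron counts 0 as Sunday, and it does not handle Quartz-only tokens such as `L`, `W`, or `#`, nor the optional seventh (year) field. Whether Studio accepts every standard cron construct is not verified here.

```python
def quartz_to_cron(expr: str) -> str:
    """Convert a simple 6-field Quartz expression to 5-field cron."""
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 Quartz fields, got {len(fields)}")
    seconds, minutes, hours, day, month, weekday = fields
    if seconds != "0":
        # Standard cron has no seconds field, so only "0" can be dropped safely.
        raise ValueError("non-zero seconds cannot be expressed in 5-field cron")
    if any(ch.isdigit() for ch in weekday):
        # Quartz: 1 = Sunday; standard cron: 0 = Sunday. Convert by hand.
        raise ValueError("numeric day-of-week needs manual conversion")

    def star(field: str) -> str:
        # "?" (no specific value) exists only in Quartz.
        return "*" if field == "?" else field

    return " ".join([minutes, hours, star(day), month, star(weekday)])


print(quartz_to_cron("0 0 2 * * ?"))  # prints: 0 2 * * *
```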
136
+ ### Step 5: Deploy
137
+
138
+ ```bash
139
+ cz-cli task deploy <id_01> # Must deploy before task can be scheduled
140
+ cz-cli task deploy <id_02>
141
+ cz-cli task deploy <id_03>
142
+
143
+ # Manual trigger (equivalent to Databricks "Run now")
144
+ cz-cli task execute <id_01>
145
+ ```
146
+
147
+ ### Step 6: Failure Alert Configuration
148
+
149
+ Databricks configures `email_notifications` directly in the Job JSON; Studio configures via monitoring rules:
150
+
151
+ | Databricks | Studio |
152
+ |---|---|
153
+ | Job JSON `email_notifications.on_failure` | Studio UI: Alert Monitoring → New Monitoring Rule |
154
+ | Email only | Supports email, SMS, phone (high severity), Webhook (DingTalk/Feishu) |
155
+ | Task-level configuration | Notification policies can be reused across tasks |
156
+
157
+ **Configuration path**: Studio UI → Operations Monitoring → Alert Monitoring → New Monitoring Rule → Select "Task Instance Failure" event → Configure notification method (email/DingTalk/Feishu Webhook)
158
+
159
+ ---
160
+
161
+ ## E2E Validation Results
162
+
163
+ Tested on an AWS Singapore instance; all 8 checks passed:
164
+
165
+ | Check | Expected | Result |
166
+ |--------|--------|------|
167
+ | jobs_bronze.raw_events | 500 | ✅ |
168
+ | jobs_bronze.products | 30 | ✅ |
169
+ | jobs_silver.events_enriched | 500 | ✅ |
170
+ | jobs_gold.daily_sales rows | 115 | ✅ |
171
+ | Total sales amount | 12,814.84 | ✅ |
172
+ | Total order count | 119 | ✅ |
173
+ | Category count | 5 | ✅ |
174
+ | Studio tasks all ONLINE | 3/3 | ✅ |
175
+
176
+ ---
177
+
178
+ ## Notes
179
+
180
+ - **`--dep-tasks` requires both taskId and taskName**: Passing only taskId will return `taskName is required`. Both fields are mandatory.
181
+ - **`--folder` takes folder ID, not name**: Query the ID with `cz-cli task list-folders`.
182
+ - **Task names cannot contain slashes**: The `folder/taskname` format will be parsed as a path in the CLI. Use the `--folder <id>` parameter to specify the folder, and only write the task name in the name field.
183
+ - **`--type` is required**: `cz-cli task create` will error without `--type`. Common types: `PYTHON`, `SQL`, `SHELL`.
184
+ - **Cron format conversion**: Quartz 6-field (seconds minutes hours day month weekday) → standard 5-field (minutes hours day month weekday). `"0 0 2 * * ?"` → `"0 2 * * *"`.
185
+
186
+ ## Related Documentation
187
+
188
+ ### Studio Task Development
189
+
190
+ - [Task Development and Scheduling](task-develop.md): Creating, editing content, and scheduling Studio tasks
191
+ - [Task Scheduling Dependencies](task_scheduling_dependency.md): DAG dependency configuration details
192
+ - [Studio Python Task Development Guide (ZettaPark)](studio-python-task-zettapark.md)
193
+ - [Studio Task Development and Operations (cz-cli)](cz-cli-studio-tasks.md)
194
+
195
+ ### Other Migration Guides
196
+
197
+ - [Databricks Notebook → Lakehouse Migration Guide](databricks-notebook-to-studio-migration.md): Single Notebook → Studio task
198
+ - [Databricks DLT → Lakehouse Migration Guide](databricks-dlt-to-lakehouse-migration.md): Declarative pipeline migration
199
+ - [Databricks Unity Catalog → Lakehouse Migration Guide](databricks-uc-governance-to-lakehouse-migration.md): Permissions and governance