@clickzetta/cz-cli-darwin-arm64 0.5.16 → 0.5.17

Files changed (243)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
  3. package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
  4. package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
  5. package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
  6. package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
  7. package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
  8. package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
  9. package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
  10. package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
  11. package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
  12. package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
  13. package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
  14. package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
  15. package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
  16. package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
  17. package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
  18. package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
  19. package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
  20. package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
  21. package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
  22. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
  23. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
  24. package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
  25. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
  26. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
  27. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
  28. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
  29. package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
  30. package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
  31. package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
  32. package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
  33. package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
  34. package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
  35. package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
  36. package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
  37. package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
  38. package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
  39. package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
  40. package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
  41. package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
  42. package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
  43. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
  44. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
  45. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
  46. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
  47. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
  48. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
  49. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
  50. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
  51. package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
  52. package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
  53. package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
  54. package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
  55. package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
  56. package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
  57. package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
  58. package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
  59. package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
  60. package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
  61. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
  62. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
  63. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
  64. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
  65. package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
  66. package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
  67. package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
  68. package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
  69. package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
  70. package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
  71. package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
  72. package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
  73. package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
  74. package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
  75. package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
  76. package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
  77. package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
  78. package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
  79. package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
  80. package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
  81. package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
  82. package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
  83. package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
  84. package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
  85. package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
  86. package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
  87. package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
  88. package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
  89. package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
  90. package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
  91. package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
  92. package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
  93. package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
  94. package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
  95. package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
  96. package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
  97. package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
  98. package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
  99. package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
  100. package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
  101. package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
  102. package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
  103. package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
  104. package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
  105. package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
  106. package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
  107. package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
  108. package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
  109. package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
  110. package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
  111. package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
  112. package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
  113. package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
  114. package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
  115. package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
  116. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
  117. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
  118. package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
  119. package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
  120. package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
  121. package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
  122. package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
  123. package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
  124. package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
  125. package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
  126. package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
  127. package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
  128. package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
  129. package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
  130. package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
  131. package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
  132. package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
  133. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
  134. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
  135. package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
  136. package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
  137. package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
  138. package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
  139. package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
  140. package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
  141. package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
  142. package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
  143. package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
  144. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
  145. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
  146. package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
  147. package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
  148. package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
  149. package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
  150. package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
  151. package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
  152. package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
  153. package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
  154. package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
  155. package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
  156. package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
  157. package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
  158. package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
  159. package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
  160. package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
  161. package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
  162. package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
  163. package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
  164. package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
  165. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
  166. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
  167. package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
  168. package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
  169. package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
  170. package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
  171. package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
  172. package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
  173. package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
  174. package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
  175. package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
  176. package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
  177. package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
  178. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
  179. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
  180. package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
  181. package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
  182. package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
  183. package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
  184. package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
  185. package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
  186. package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
  187. package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
  188. package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
  189. package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
  190. package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
  191. package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
  192. package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
  193. package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
  194. package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
  195. package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
  196. package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
  197. package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
  198. package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
  199. package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
  200. package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
  201. package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
  202. package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
  203. package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
  204. package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
  205. package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
  206. package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
  207. package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
  208. package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
  209. package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
  210. package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
  211. package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
  212. package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
  213. package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
  214. package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
  215. package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
  216. package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
  217. package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
  218. package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
  219. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
  220. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
  221. package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
  222. package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
  223. package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
  224. package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
  225. package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
  226. package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
  227. package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
  228. package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
  229. package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
  230. package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
  231. package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
  232. package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
  233. package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
  234. package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
  235. package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
  236. package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
  237. package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
  238. package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
  239. package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
  240. package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
  241. package/package.json +1 -1
  242. package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
  243. package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -6,8 +6,6 @@ Welcome to Lakehouse! This guide has designed a series of carefully orchestrated
 
  This guide includes the following experience content:
 
- :-: ![](.topwrite/assets/lakehouse-happy-path-diagram_1747811346312.svg =820)
-
  1. **Run Your First SQL Query** (2-3 minutes)
  Experience Lakehouse's easy-to-use SQL analysis environment.
 
@@ -40,7 +38,6 @@ Log into Lakehouse Studio and create a new workspace: `lakehouse_quick_experienc
 
  ^
 
- :-: ![](.topwrite/assets/image_1747812390227.png =618)
 
  ^
 
@@ -48,13 +45,13 @@ Enter the "Development" page and switch the workspace to the newly created works
 
  ^
 
- :-: ![](.topwrite/assets/image_1747812493874.png =622)
+
 
  ^
 
- Entry for creating a new SQL worksheet:
+ Entry for creating a new SQL worksheet.
+
 
- :-: ![](.topwrite/assets/image_1747812869517.png =621)
 
  ^
 
@@ -62,7 +59,7 @@ Create a new SQL worksheet named "00\_Environment\_Preparation".
 
  ^
 
- :-: ![](.topwrite/assets/image_1747812698576.png =622)
+
 
  ^
 
@@ -77,11 +74,11 @@ USE SCHEMA happy_path;
 
  -- Create the first Virtual Compute Cluster (General type)
  -- Virtual Compute Clusters are the core concept of Lakehouse, representing on-demand allocatable computing resources
- CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
- VCLUSTER_SIZE = 1
- VCLUSTER_TYPE = GENERAL
- AUTO_SUSPEND_IN_SECOND = 60
- AUTO_RESUME = TRUE
+ CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
+ VCLUSTER_SIZE = 1
+ VCLUSTER_TYPE = GENERAL
+ AUTO_SUSPEND_IN_SECOND = 60
+ AUTO_RESUME = TRUE
  COMMENT 'My first virtual compute cluster (General)';
 
  -- Use this cluster
@@ -133,7 +130,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
  (103, 'Smart Watch', 1299.50, 'Wearables'),
  (104, 'Portable Power Bank', 159.90, 'Accessories'),
  (105, 'Mechanical Keyboard', 349.00, 'Computer Accessories');
-
+
  -- Query the inserted data
  SELECT * FROM happy_path.my_first_table;
  ```
@@ -142,7 +139,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
 
  ```sql
  -- Count products and average price by category
- SELECT
+ SELECT
  category,
  COUNT(*) as product_count,
  AVG(price) as avg_price,
@@ -174,11 +171,11 @@ Next, let's create a different type of compute cluster to understand how to choo
  ```sql
  -- Create an Analytics-type Virtual Compute Cluster
  -- Analytics clusters optimize query performance, suitable for low-latency, high-concurrency analysis scenarios
- CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
- VCLUSTER_SIZE = 1
- VCLUSTER_TYPE = ANALYTICS
- AUTO_SUSPEND_IN_SECOND = 60
- AUTO_RESUME = TRUE
+ CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
+ VCLUSTER_SIZE = 1
+ VCLUSTER_TYPE = ANALYTICS
+ AUTO_SUSPEND_IN_SECOND = 60
+ AUTO_RESUME = TRUE
  COMMENT 'My second virtual compute cluster (Analytics)';
  ```
 
@@ -283,7 +280,7 @@ Compute-storage separation is a core architectural feature of Lakehouse, allowin
  INSERT INTO happy_path.demo_dataset VALUES
  (6, 432.10, 'Text-6', CURRENT_TIMESTAMP()),
  (7, 789.65, 'Text-7', CURRENT_TIMESTAMP());
-
+
  -- Query the updated dataset
  SELECT * FROM happy_path.demo_dataset ORDER BY id;
  ```
@@ -389,7 +386,7 @@ The Lakehouse unified architecture allows you to directly query files in multipl
  (1003, 205, DATE '2023-02-02', 1, 899.00, 'C5003'),
  (1004, 204, DATE '2023-02-03', 1, 599.00, 'C5001'),
  (1005, 202, DATE '2023-02-03', 1, 3799.00, 'C5004');
-
+
  -- Export sales data to User Volume (CSV format)
  COPY INTO USER VOLUME
  SUBDIRECTORY 'lake_demo/sales_csv'
@@ -518,7 +515,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
  SELECT * FROM happy_path.all_orders ORDER BY order_time DESC;
 
  -- View order statistics
- SELECT
+ SELECT
  data_source,
  COUNT(*) as order_count,
  SUM(order_amount) as total_amount
@@ -538,7 +535,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
 
  ```sql
  -- Query order statistics again
- SELECT
+ SELECT
  data_source,
  COUNT(*) as order_count,
  SUM(order_amount) as total_amount
@@ -577,13 +574,13 @@ Lakehouse supports efficient vector search and inverted index search, which can
  description STRING,
  price DECIMAL(10,2),
  vec VECTOR(FLOAT, 16), -- 16-dimensional vector representing product features
-
+
  -- Create vector index
  INDEX product_vec_idx (vec) USING VECTOR PROPERTIES (
- "scalar.type" = "f32",
+ "scalar.type" = "f32",
  "distance.function" = "l2_distance"
  ),
-
+
  -- Create inverted index for full-text search
  INDEX product_description_idx (description) INVERTED PROPERTIES (
  'analyzer' = 'chinese'
@@ -596,31 +593,31 @@ Lakehouse supports efficient vector search and inverted index search, which can
  ```sql
  -- Insert sample data with vectors
  INSERT INTO happy_path.product_search_demo VALUES
- (1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
+ (1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
  vector(0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1)),
- (1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
+ (1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
  vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)),
- (1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
+ (1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
  vector(0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3));
 
  -- Continue inserting more data
  INSERT INTO happy_path.product_search_demo VALUES
- (1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
+ (1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
  vector(0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4)),
- (1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
+ (1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
  vector(0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5)),
- (1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
+ (1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
  vector(0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6)),
- (1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
+ (1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
  vector(0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7));
 
  -- Continue inserting remaining data
  INSERT INTO happy_path.product_search_demo VALUES
- (1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
+ (1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
  vector(0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8)),
- (1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
+ (1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
  vector(0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9)),
- (1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
+ (1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
  vector(1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0));
  ```
 
@@ -651,7 +648,7 @@ LIMIT 5;
 
  ```sql
  -- Use inverted index for keyword search - find products with "high-performance" in the description
- SELECT
+ SELECT
  product_id,
  product_name,
  category,
@@ -666,7 +663,7 @@ LIMIT 5;
666
663
 
667
664
  ```sql
668
665
  -- Hybrid query: find products similar to the reference vector and containing "gaming" in the description
669
- SELECT
666
+ SELECT
670
667
  product_id,
671
668
  product_name,
672
669
  category,
@@ -674,7 +671,7 @@ LIMIT 5;
674
671
  price,
675
672
  l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
676
673
  FROM happy_path.product_search_demo
677
- WHERE
674
+ WHERE
678
675
  match_phrase(description, 'gaming', MAP('analyzer', 'chinese')) AND
679
676
  l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
680
677
  ORDER BY distance
@@ -685,7 +682,7 @@ LIMIT 5;
685
682
 
686
683
  ```sql
687
684
  -- Find products priced between 500-10000, with "high-performance" or "professional" in the description, and high vector similarity
688
- SELECT
685
+ SELECT
689
686
  product_id,
690
687
  product_name,
691
688
  category,
@@ -693,9 +690,9 @@ LIMIT 5;
693
690
  price,
694
691
  l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
695
692
  FROM happy_path.product_search_demo
696
- WHERE
693
+ WHERE
697
694
  price BETWEEN 500 AND 10000 AND
698
- (match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
695
+ (match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
699
696
  match_phrase(description, 'professional', MAP('analyzer', 'chinese'))) AND
700
697
  l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
701
698
  ORDER BY distance
@@ -749,15 +746,15 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
749
746
  ```sql
750
747
  -- Create a date dimension table
751
748
  CREATE TABLE IF NOT EXISTS happy_path.date_dim AS
752
- SELECT DISTINCT
749
+ SELECT DISTINCT
753
750
  date_time::DATE as date_id,
754
751
  YEAR(date_time) as year,
755
752
  MONTH(date_time) as month,
756
753
  DAY(date_time) as day,
757
754
  DAYOFWEEK(date_time) as day_of_week,
758
- CASE
759
- WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
760
- ELSE false
755
+ CASE
756
+ WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
757
+ ELSE false
761
758
  END as is_weekend
762
759
  FROM happy_path.sales_data;
763
760
 
@@ -770,7 +767,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
770
767
  ```sql
771
768
  -- Create a sales summary table
772
769
  CREATE TABLE IF NOT EXISTS happy_path.sales_summary AS
773
- SELECT
770
+ SELECT
774
771
  d.date_id,
775
772
  d.year,
776
773
  d.month,
@@ -800,7 +797,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
800
797
 
801
798
  ```sql
802
799
  -- Analyze sales trends using window functions
803
- SELECT
800
+ SELECT
804
801
  date_id,
805
802
  category,
806
803
  total_sales,
@@ -816,7 +813,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
816
813
  ```sql
817
814
  -- Create a business insights view
818
815
  CREATE OR REPLACE VIEW happy_path.business_insights AS
819
- SELECT
816
+ SELECT
820
817
  category,
821
818
  year,
822
819
  month,
@@ -837,7 +834,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
837
834
 
838
835
  ```sql
839
836
  -- Analyze sales ranking and proportion by category
840
- SELECT
837
+ SELECT
841
838
  category,
842
839
  SUM(total_sales) as category_sales,
843
840
  RANK() OVER (ORDER BY SUM(total_sales) DESC) as sales_rank,
@@ -973,8 +970,8 @@ Now you can start applying Lakehouse to actual business scenarios and enjoy a si
973
970
 
974
971
  ## References
975
972
 
976
- [Key Concepts](key_concepts.md)
973
+ [Key Concepts](key-concepts.md)
977
974
  [Virtual Compute Cluster](getting_started_with_vcluster_for_processing_analytics.md)
978
975
  [Volume](datalake_volume.md)
979
- [Vector Index](create-vector-index.md)
976
+ [Vector Index](vector-search.md)
980
977
  [Inverted Index](inverted-index.md)
@@ -0,0 +1,380 @@
1
+ # Volume + Pipe + Dynamic Table End-to-End Practice
2
+
3
+ "Data lake acceleration" combines three capabilities: object storage mounting (Volume), continuous data ingestion (Pipe), and incremental computation (Dynamic Table). Together they let you query, process, and consume files in object storage directly with Serverless compute, without migrating data, replacing traditional Spark/Hive ETL and Presto/Trino ad hoc queries.
4
+
5
+ Applicable scenarios:
6
+ - **Automatic file ingestion**: CSV/Parquet files periodically uploaded to OSS/COS/S3 are automatically detected and ingested by Pipe, no manual trigger required
7
+ - **Incremental ETL**: After files are ingested, Dynamic Table incrementally computes aggregated metrics, so T+1 reports are generated without delay
8
+ - **Legacy data activation**: Large volumes of historical files in object storage can be queried directly via Volume mount, no data migration required
9
+
10
+ Core data flow:
11
+
12
+ ```
13
+ OSS/COS/S3 files → External Volume (mount) → Pipe (continuous ingestion) → Target table → Dynamic Table (incremental aggregation)
14
+ ↕ ↕
15
+ COPY INTO/SELECT FROM COPY INTO/SELECT FROM
16
+ ```
17
+
18
+ ---
19
+
20
+ ## Core Concepts
21
+
22
+ | Object | Description | Analogy |
23
+ |------|------|------|
24
+ | **External Volume** | Mounts OSS/COS/S3 path for zero-copy access | "Filesystem" of Lakehouse |
25
+ | **Pipe** | Continuously running data ingestion pipeline, automatically detects new files | Conveyor belt—files are ingested as soon as they are uploaded |
26
+ | **Dynamic Table** | Materialized aggregation table that automatically refreshes incrementally | Replaces scheduled ETL jobs |
27
+
28
+ The three work together to form a **self-driving data pipeline**: file upload → automatic ingestion → automatic aggregation, fully automated with no manual scheduling.
29
+
30
+ ---
31
+
32
+ ## SQL Commands Involved
33
+
34
+ | Command / Function | Purpose | Use Case |
35
+ |------------|------|---------|
36
+ | `CREATE STORAGE CONNECTION` | Establish object storage authentication channel | One-time setup, shared by all Volumes |
37
+ | `CREATE EXTERNAL VOLUME` | Mount object storage path to a Schema | Configure once per Bucket subdirectory |
38
+ | `COPY INTO VOLUME` | Export data to Volume | Generate files for downstream consumption |
39
+ | `SELECT FROM VOLUME` | Directly query files in Volume | Ad hoc queries, data exploration |
40
+ | `DIRECTORY()` | List files in a Volume | View file list, validate exports |
41
+ | `ALTER VOLUME REFRESH` | Manually refresh Volume directory cache | Use when `AUTO_REFRESH=FALSE` |
42
+ | `CREATE PIPE` | Create continuous data ingestion pipeline | Automatic file ingestion |
43
+ | `ALTER PIPE` | Pause/resume Pipe | Operations management |
44
+ | `DESC PIPE EXTENDED` | View Pipe status and configuration | Monitoring, troubleshooting |
45
+ | `load_history()` | Query table's historical load records | Validate Pipe loading, troubleshoot deduplication |
46
+ | `CREATE DYNAMIC TABLE` | Create auto-incrementally refreshing aggregation table | Replace scheduled ETL jobs |
47
+ | `REFRESH DYNAMIC TABLE` | Manually trigger Dynamic Table refresh | Immediately refresh after initial creation |
48
+ | `SHOW DYNAMIC TABLE REFRESH HISTORY` | View refresh history | Monitor incremental refresh status |
49
+
50
+ ---
51
+
52
+ ## Prerequisites
53
+
54
+ The following uses Alibaba Cloud OSS as an example, completing the full pipeline using the `semantic_model_test` schema and `DEFAULT` Virtual Cluster.
55
+
56
+ > ⚠️ **Prerequisites**: An OSS Bucket has been created, and you have an AccessKey ID / AccessKey Secret. The Virtual Cluster must be in the RUNNING state (in Serverless mode, queries wake it automatically).
57
+
58
+ ---
59
+
60
+ ## End-to-End Practice
61
+
62
+ ### Step 1: Create Storage Connection
63
+
64
+ Establish the authentication channel between Lakehouse and OSS.
65
+
66
+ ```sql
67
+ -- Create OSS storage connection
68
+ CREATE STORAGE CONNECTION IF NOT EXISTS my_oss_conn
69
+ TYPE OSS
70
+ access_id = '<your_access_key_id>'
71
+ access_key = '<your_access_key_secret>'
72
+ ENDPOINT = 'oss-cn-shanghai.aliyuncs.com';
73
+ ```
74
+
75
+ > **Parameter note**: Alibaba Cloud OSS uses lowercase `access_id` / `access_key`. Uppercase `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` also works. Do not use `ACCESS_KEY` / `SECRET_KEY` (missing suffixes will cause errors).
76
+
77
+ ### Step 2: Create External Volume
78
+
79
+ Mount the OSS Bucket subdirectory as a Lakehouse Volume.
80
+
81
+ ```sql
82
+ CREATE EXTERNAL VOLUME IF NOT EXISTS my_data_vol
83
+ LOCATION 'oss://my-bucket/data/'
84
+ USING CONNECTION my_oss_conn
85
+ DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = FALSE)
86
+ RECURSIVE = TRUE
87
+ COMMENT 'Dedicated Volume for data lake acceleration';
88
+ ```
89
+
90
+ Key parameter descriptions:
91
+
92
+ | Parameter | Description |
93
+ |---|---|
94
+ | `LOCATION` | OSS path, must point to a specific subdirectory, not the bucket root path |
95
+ | `USING CONNECTION` | References the Storage Connection created in step 1 |
96
+ | `DIRECTORY.ENABLE` | Enables directory metadata index; allows using `DIRECTORY()` function to query file list |
97
+ | `AUTO_REFRESH` | Set to `TRUE` for auto-refresh; when set to `FALSE`, manual `ALTER VOLUME REFRESH` is required |
98
+ | `RECURSIVE` | Recursively scan subdirectories |
99
+
100
+ ### Step 3: Create Source Table and Export to Volume
101
+
102
+ Verify bidirectional read/write capability of the Volume.
103
+
104
+ ```sql
105
+ -- 1. Create source table and insert test data
106
+ CREATE TABLE IF NOT EXISTS sales_source (
107
+ id BIGINT COMMENT 'Order ID',
108
+ product STRING COMMENT 'Product name',
109
+ category STRING COMMENT 'Category',
110
+ amount DECIMAL(10,2) COMMENT 'Amount',
111
+ dt STRING COMMENT 'Date'
112
+ ) COMMENT 'Data lake acceleration test source table';
113
+
114
+ INSERT INTO sales_source VALUES
115
+ (1, 'iPhone 15', 'Electronics', 8999.00, '2026-06-01'),
116
+ (2, 'MacBook Pro', 'Electronics', 14999.00, '2026-06-01'),
117
+ (3, 'AirPods', 'Electronics', 1299.00, '2026-06-01'),
118
+ (4, 'Nike Air Max', 'Sports', 899.00, '2026-06-01'),
119
+ (5, 'Yoga Mat', 'Sports', 199.00, '2026-06-01');
120
+
121
+ -- 2. Export as CSV to Volume
122
+ COPY INTO VOLUME my_data_vol
123
+ SUBDIRECTORY 'export/'
124
+ FROM TABLE sales_source
125
+ FILE_FORMAT = (TYPE = CSV);
126
+
127
+ -- 3. Export as Parquet to Volume
128
+ COPY INTO VOLUME my_data_vol
129
+ SUBDIRECTORY 'export/'
130
+ FROM TABLE sales_source
131
+ FILE_FORMAT = (TYPE = PARQUET);
132
+ ```
133
+
134
+ > ⚠️ **`COPY INTO VOLUME` requires `SUBDIRECTORY`**: omitting this clause will throw `Syntax error at or near 'FROM'`. To export to the Volume root path, use `SUBDIRECTORY '/'`.
135
+
136
+ > ⚠️ **Export syntax**: `COPY INTO VOLUME` uses `FILE_FORMAT = (TYPE = CSV/PARQUET)`, not `USING CSV`. `USING` is only used for `SELECT FROM VOLUME` to query files.
137
+
138
+ ### Step 4: Validate Volume File Read/Write
139
+
140
+ ```sql
141
+ -- Refresh directory cache (manual refresh required when AUTO_REFRESH=FALSE)
142
+ ALTER VOLUME my_data_vol REFRESH;
143
+
144
+ -- View exported files
145
+ SELECT relative_path, size, last_modified_time
146
+ FROM DIRECTORY(VOLUME my_data_vol)
147
+ WHERE relative_path LIKE 'export/%';
148
+
149
+ -- Directly query CSV files
150
+ SELECT * FROM VOLUME my_data_vol
151
+ USING CSV
152
+ FILES('export/part00001.csv');
153
+
154
+ -- Directly query Parquet files (preserves column names)
155
+ SELECT id, product, category, amount, dt
156
+ FROM VOLUME my_data_vol
157
+ USING PARQUET
158
+ FILES('export/part00001.parquet');
159
+ ```
160
+
161
+ > **CSV vs Parquet column name difference**: CSV files without headers auto-generate column names `f0, f1, f2...`; Parquet files preserve original column names. To use original column names with CSV, add `OPTIONS('header'='true')` on import.
162
+
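The column-naming difference described in the note above can be illustrated outside Lakehouse. The following is a minimal Python sketch, not Lakehouse's reader: `read_csv_rows` is a hypothetical helper that assigns positional names `f0, f1, ...` when there is no header line and uses the first row as names when there is one.

```python
import csv
import io

def read_csv_rows(text, header=False):
    """Return rows as dicts, naming columns the way the note describes."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if header:
        # First line carries the column names.
        names, data = rows[0], rows[1:]
    else:
        # No header line: fall back to positional names f0, f1, f2, ...
        names, data = [f"f{i}" for i in range(len(rows[0]))], rows
    return [dict(zip(names, row)) for row in data]

sample = "1,iPhone 15,Electronics,8999.00,2026-06-01\n"
no_header = read_csv_rows(sample)
with_header = read_csv_rows("id,product,category,amount,dt\n" + sample, header=True)
```

Without a header the product name is only reachable as `f1`; with a header it is reachable as `product`, which is why the note suggests a header option or Parquet when column names matter.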
163
+ ### Step 5: Create Pipe for Continuous Ingestion
164
+
165
+ Pipe continuously monitors the Volume for new files and automatically ingests them into the target table.
166
+
167
+ ```sql
168
+ -- 1. Create dedicated Volume for Pipe (must point to a separate subdirectory)
169
+ CREATE EXTERNAL VOLUME IF NOT EXISTS pipe_vol
170
+ LOCATION 'oss://my-bucket/data/incoming/'
171
+ USING CONNECTION my_oss_conn
172
+ DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = TRUE)
173
+ RECURSIVE = TRUE
174
+ COMMENT 'Dedicated Volume for Pipe continuous ingestion';
175
+
176
+ -- 2. Create target table
177
+ CREATE TABLE IF NOT EXISTS sales_ods (
178
+ id BIGINT COMMENT 'Order ID',
179
+ product STRING COMMENT 'Product name',
180
+ category STRING COMMENT 'Category',
181
+ amount DECIMAL(10,2) COMMENT 'Amount',
182
+ dt STRING COMMENT 'Date'
183
+ ) COMMENT 'ODS layer — Pipe ingestion target table';
184
+
185
+ -- 3. Create Pipe (LIST_PURGE mode)
186
+ CREATE PIPE IF NOT EXISTS sales_pipe
187
+ INGEST_MODE = 'LIST_PURGE'
188
+ VIRTUAL_CLUSTER = 'DEFAULT'
189
+ COMMENT 'Sales data continuous ingestion pipeline'
190
+ AS
191
+ COPY INTO sales_ods
192
+ FROM VOLUME pipe_vol
193
+ USING CSV PURGE = TRUE;
194
+ ```
195
+
196
+ > ⚠️ **Pipe key constraints**:
197
+ > - Each Pipe needs a dedicated Volume; multiple Pipes cannot share the same Volume
198
+ > - `LOCATION` must point to a specific subdirectory, not the bucket root path
199
+ > - `LIST_PURGE` mode **deletes source files** after successful ingestion (irreversible); use `EVENT_NOTIFICATION` mode to keep files
200
+ > - `PURGE = TRUE` must appear after `USING <format>`, not inside OPTIONS
201
+
202
+ #### Pipe Management
203
+
204
+ ```sql
205
+ -- View Pipe status
206
+ DESC PIPE EXTENDED sales_pipe;
207
+ -- Key fields: pipe_status (RUNNING/PAUSED), ingest_mode, input_name, output_name
208
+
209
+ -- Pause Pipe (stop scanning new files)
210
+ ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
211
+
212
+ -- Resume Pipe (restart scanning)
213
+ ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
214
+
215
+ -- View imported file records (7-day retention)
216
+ SELECT * FROM load_history('sales_ods');
217
+ -- Returns: file_path, last_copy_time, file_size, status, first_error_message
218
+ ```
219
+
220
+ > **Deduplication mechanism**: Pipe deduplicates by file path via `load_history` (within 7 days). Files with the same name will not be re-imported. To reload the same file, wait 7 days or rename the file before re-uploading.
221
+
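To make the deduplication rule concrete, here is a small Python simulation of path-based deduplication with a 7-day retention window. It is a sketch under stated assumptions, not Lakehouse's implementation: the `LoadHistory` class is hypothetical, and treating a record as expired at exactly 7 days is an assumption.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention window stated in the note above

class LoadHistory:
    """Toy stand-in for load_history: remembers when each file path was loaded."""

    def __init__(self):
        self._loaded_at = {}

    def should_load(self, file_path, now):
        last = self._loaded_at.get(file_path)
        # Skip the file if the same path was loaded within the retention window.
        return last is None or now - last >= RETENTION

    def ingest(self, file_path, now):
        if not self.should_load(file_path, now):
            return False
        self._loaded_at[file_path] = now
        return True

history = LoadHistory()
t0 = datetime(2026, 6, 1, 12, 0)
first = history.ingest("part00001.csv", t0)                           # new path
repeat = history.ingest("part00001.csv", t0 + timedelta(days=1))      # same path, within 7 days
renamed = history.ingest("part00001_v2.csv", t0 + timedelta(days=1))  # renamed file
expired = history.ingest("part00001.csv", t0 + timedelta(days=8))     # old record aged out
```

The repeat upload is rejected while the renamed and the expired uploads are accepted, matching the two workarounds the note gives.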
222
+ #### Trigger Pipe Loading
223
+
224
+ Pipe starts running immediately after creation (polls approximately every 30 seconds). Writing new files to the Volume path triggers loading:
225
+
226
+ ```sql
227
+ -- Simulate "new file arrival" via COPY INTO VOLUME
228
+ COPY INTO VOLUME pipe_vol
229
+ SUBDIRECTORY '/'
230
+ FROM (SELECT * FROM sales_source WHERE dt = '2026-06-01')
231
+ FILE_FORMAT = (TYPE = CSV);
232
+
233
+ -- Verify data has been loaded after a moment
234
+ SELECT COUNT(*) FROM sales_ods; -- should return 5
235
+ ```
236
+
237
+ > ⚠️ **Files written during pause**: Files written while the Pipe is paused are not loaded during the pause; after the Pipe resumes, the next scan detects them. A file whose name matches an already-loaded file is still skipped by the deduplication mechanism.
238
+
239
+ ### Step 6: Create Dynamic Table for Incremental Consumption
240
+
241
+ Based on the Pipe-ingested table, create a Dynamic Table for automatic incremental aggregation.
242
+
243
+ ```sql
244
+ -- Enable change tracking on source table (prerequisite for incremental refresh)
245
+ ALTER TABLE sales_ods SET PROPERTIES ('change_tracking' = 'true');
246
+
247
+ -- Create Dynamic Table, aggregate by category
248
+ CREATE OR REPLACE DYNAMIC TABLE sales_summary
249
+ REFRESH INTERVAL 1 HOUR vcluster DEFAULT
250
+ COMMENT 'Category summary — incremental refresh'
251
+ AS
252
+ SELECT
253
+ category,
254
+ COUNT(*) AS order_cnt,
255
+ SUM(amount) AS total_amount,
256
+ AVG(amount) AS avg_amount,
257
+ MIN(dt) AS min_date,
258
+ MAX(dt) AS max_date
259
+ FROM sales_ods
260
+ GROUP BY category;
261
+
262
+ -- Immediately trigger first refresh (resets refresh baseline time)
263
+ REFRESH DYNAMIC TABLE sales_summary;
264
+
265
+ -- Query results
266
+ SELECT * FROM sales_summary ORDER BY category;
267
+ ```
268
+
269
+ > **Refresh frequency note**: `REFRESH INTERVAL 1 HOUR` calculates the next trigger based on creation time and does not align to clock hours. To trigger at a specific time, create near the target time, or execute `REFRESH` immediately after creation to reset the baseline.
270
+
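The effect of the baseline on trigger times can be worked through with a few lines of Python. This is only an arithmetic sketch of the behavior described in the note, assuming triggers fall at baseline plus whole multiples of the interval; `next_refresh` is a hypothetical helper, not a Lakehouse function.

```python
from datetime import datetime, timedelta

def next_refresh(baseline, interval, now):
    """Next trigger time when triggers fall at baseline + k * interval (k >= 1)."""
    if now < baseline:
        return baseline + interval
    elapsed_intervals = (now - baseline) // interval  # whole intervals already passed
    return baseline + (elapsed_intervals + 1) * interval

interval = timedelta(hours=1)
now = datetime(2026, 6, 1, 10, 30)

# Baseline is the creation time: triggers stay at 17 minutes past the hour.
created = datetime(2026, 6, 1, 9, 17)
from_creation = next_refresh(created, interval, now)

# Baseline reset by a manual refresh at 10:00: triggers land on the hour.
manual_refresh = datetime(2026, 6, 1, 10, 0)
after_reset = next_refresh(manual_refresh, interval, now)
```

With the creation-time baseline the next trigger is 11:17; after the reset it is 11:00, which is why refreshing near the target time aligns the schedule.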
271
+ #### View DT Refresh History
272
+
273
+ ```sql
274
+ SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'sales_summary';
275
+ -- Key fields: state (SUCCEED), refresh_mode (INCREMENTAL/FULL), duration, source_tables
276
+ ```
277
+
278
+ ---
279
+
280
+ ## Full Data Flow Validation
281
+
282
+ ```sql
283
+ -- Validate data consistency across all stages
284
+ SELECT 'Source' AS stage, COUNT(*) AS rows FROM sales_source
285
+ UNION ALL
286
+ SELECT 'ODS' AS stage, COUNT(*) AS rows FROM sales_ods
287
+ UNION ALL
288
+ SELECT 'Summary' AS stage, COUNT(*) AS rows FROM sales_summary;
289
+ ```
290
+
291
+ | Stage | Data Volume | Description |
292
+ |---|---|---|
293
+ | Source | 5 rows | Raw data (INSERT) |
294
+ | ODS | 5 rows | Pipe ingestion (CSV → table) |
295
+ | Summary | 2 rows | Dynamic Table aggregation (2 category groups: Electronics, Sports) |
296
+
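The expected row counts can be checked offline against the five sample rows from Step 3. The Python sketch below mirrors the `GROUP BY category` aggregation of the Dynamic Table definition in plain code; it is an independent cross-check, not how the Dynamic Table computes its result, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

# The five sample rows inserted into sales_source in Step 3.
rows = [
    (1, "iPhone 15", "Electronics", Decimal("8999.00"), "2026-06-01"),
    (2, "MacBook Pro", "Electronics", Decimal("14999.00"), "2026-06-01"),
    (3, "AirPods", "Electronics", Decimal("1299.00"), "2026-06-01"),
    (4, "Nike Air Max", "Sports", Decimal("899.00"), "2026-06-01"),
    (5, "Yoga Mat", "Sports", Decimal("199.00"), "2026-06-01"),
]

def summarize(rows):
    """Group by category and compute the same columns as sales_summary."""
    groups = defaultdict(list)
    for _id, _product, category, amount, dt in rows:
        groups[category].append((amount, dt))
    summary = {}
    for category, items in groups.items():
        amounts = [amount for amount, _ in items]
        dates = [dt for _, dt in items]
        summary[category] = {
            "order_cnt": len(items),
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "min_date": min(dates),
            "max_date": max(dates),
        }
    return summary

summary = summarize(rows)
```

The sample data contains two distinct categories, so the summary has one entry for Electronics and one for Sports.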
297
+ ---
298
+
299
+ ## Best Practices
300
+
301
+ ### File Size Recommendations
302
+
303
+ | Format | Recommended Size | Description |
304
+ |---|---|---|
305
+ | gzip compressed | ~50 MB | Files that are too large reduce parallelism |
306
+ | CSV uncompressed | 128-256 MB | Balance between scan speed and file count |
307
+ | Parquet uncompressed | 128-256 MB | Columnar storage, more efficient for queries |
308
+
309
+ ### Volume and Pipe Design Principles
310
+
311
+ 1. **Each Pipe has its own Volume**: Different Pipes cannot share the same Volume to avoid interference
312
+ 2. **Volume points to a subdirectory**: Do not point to the bucket root path, as this will cause Pipe creation errors
313
+ 3. **LIST_PURGE vs EVENT_NOTIFICATION**:
314
+ - `LIST_PURGE`: Simple configuration, suitable for most scenarios, deletes source files after loading
315
+ - `EVENT_NOTIFICATION`: Low latency, retains source files, but only supports OSS+S3, and requires additional MNS/SQS configuration
316
+
317
+ ### Dynamic Table Design Principles
318
+
319
+ 1. **Use GP type Virtual Cluster** (such as `DEFAULT`): GP type supports small file merging; AP type does not
320
+ 2. **Enable change_tracking**: If the source table does not have it enabled, DT performs a full refresh every time with no incremental support
321
+ 3. **REFRESH immediately after creation**: Ensures first data availability and resets the refresh baseline time
322
+
323
+ ### Data Lifecycle
324
+
325
+ ```
326
+ File upload → Pipe scan → COPY INTO ingest → PURGE delete → Dynamic Table incremental refresh
327
+ ↓ ↓ ↓ ↓ ↓
328
+ OSS write 30s polling load_history record source file deleted aggregation update
329
+ ```
330
+
331
+ ---
332
+
333
+ ## Test Validation Results
334
+
335
+ The following results are from actual testing on an Alibaba Cloud Shanghai instance (`f8866243`):
336
+
337
+ | Test Item | Result | Details |
338
+ |---|---|---|
339
+ | Storage Connection creation | ✅ | OSS connection normal |
340
+ | External Volume mount | ✅ | Directory access normal; `AUTO_REFRESH=FALSE` requires manual refresh |
341
+ | SELECT FROM VOLUME (CSV) | ✅ | Without header, column names are f0-f4; Parquet preserves column names |
342
+ | SELECT FROM VOLUME (Parquet) | ✅ | Column names and types both preserved |
343
+ | COPY INTO TABLE (CSV) | ✅ | 5 rows correctly imported |
344
+ | COPY INTO TABLE (Parquet) | ✅ | 5 rows correctly imported |
345
+ | COPY INTO VOLUME export | ⚠️ | **Must include `SUBDIRECTORY`**, otherwise syntax error |
346
+ | Pipe LIST_PURGE creation | ✅ | Status immediately becomes RUNNING |
347
+ | Pipe load trigger | ✅ | Auto-loaded in ~30 seconds; load_history records complete |
348
+ | Pipe PURGE deletion | ✅ | Source files auto-deleted after successful load |
349
+ | Pipe pause/resume | ✅ | Files not loaded during pause; re-scanned after resume |
350
+ | Pipe deduplication | ✅ | Same-name files correctly blocked by load_history (7-day retention) |
351
+ | Dynamic Table incremental refresh | ✅ | INCREMENTAL mode, aggregation completed in 346ms |
352
+
353
+ ---
354
+
355
+ ## Notes
356
+
357
+ | Note | Impact | Recommendation |
358
+ |---|---|---|
359
+ | `COPY INTO VOLUME` requires `SUBDIRECTORY` | Without it, syntax error | Use `SUBDIRECTORY '/'` for root path |
360
+ | Generic CSV column names | Without header, column names are f0-f4 | Use `OPTIONS('header'='true')` or switch to Parquet |
361
+ | Manual refresh needed when `AUTO_REFRESH=FALSE` | Directory does not update | Execute `ALTER VOLUME name REFRESH` |
362
+ | Pipe same-name file deduplication | Same-name files not loaded after pause/resume | Rename file on re-upload, or wait 7 days for expiration |
363
+ | `load_history` column name | `last_copy_time` not `last_load_time` | Pay attention to column name when querying |
364
+ | Virtual Cluster auto-sleep | Suspends after 60s without queries | Serverless mode bills on demand, so no action is needed |
365
+ | Pipe COPY statement is immutable | Ingestion logic cannot be adjusted in place | DROP PIPE, then CREATE it again |
366
+ | AP type Virtual Cluster does not support small file merging | Query performance degrades over time | Always use GP type (`DEFAULT`) |
367
+
368
+ ---
369
+
370
+ ## Related Documents
371
+
372
+ - [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — Alibaba Cloud/Tencent Cloud/AWS real-world comparison
373
+ - [Volume Overview](volume-overview.md) — Volume concepts, types, and file operations
374
+ - [Object Storage Pipe](pipe-storage-object.md) — LIST_PURGE and EVENT_NOTIFICATION complete configuration
375
+ - [Pipe Overview](pipe-overview.md) — Pipe vs Table Stream comparison
376
+ - [Dynamic Table Overview](dynamic-table-introduce.md) — Incremental computation mechanism
377
+ - [Create External Volume](create-external-volume.md) — Complete DDL syntax
378
+ - [Import Data from Volume](from_volume_to_table.md) — COPY INTO syntax
379
+ - [Export Data to Volume](from_lakehouse_to_volume.md) — Export syntax
380
+ - [Query SHOW JOBS](show-jobs.md) — Filter Pipe jobs by query_tag