@clickzetta/cz-cli-darwin-arm64 0.5.15 → 0.5.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (243)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
  3. package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
  4. package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
  5. package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
  6. package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
  7. package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
  8. package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
  9. package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
  10. package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
  11. package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
  12. package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
  13. package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
  14. package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
  15. package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
  16. package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
  17. package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
  18. package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
  19. package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
  20. package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
  21. package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
  22. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
  23. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
  24. package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
  25. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
  26. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
  27. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
  28. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
  29. package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
  30. package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
  31. package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
  32. package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
  33. package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
  34. package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
  35. package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
  36. package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
  37. package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
  38. package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
  39. package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
  40. package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
  41. package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
  42. package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
  43. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
  44. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
  45. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
  46. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
  47. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
  48. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
  49. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
  50. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
  51. package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
  52. package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
  53. package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
  54. package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
  55. package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
  56. package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
  57. package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
  58. package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
  59. package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
  60. package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
  61. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
  62. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
  63. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
  64. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
  65. package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
  66. package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
  67. package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
  68. package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
  69. package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
  70. package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
  71. package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
  72. package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
  73. package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
  74. package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
  75. package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
  76. package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
  77. package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
  78. package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
  79. package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
  80. package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
  81. package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
  82. package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
  83. package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
  84. package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
  85. package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
  86. package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
  87. package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
  88. package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
  89. package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
  90. package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
  91. package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
  92. package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
  93. package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
  94. package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
  95. package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
  96. package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
  97. package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
  98. package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
  99. package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
  100. package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
  101. package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
  102. package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
  103. package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
  104. package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
  105. package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
  106. package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
  107. package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
  108. package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
  109. package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
  110. package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
  111. package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
  112. package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
  113. package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
  114. package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
  115. package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
  116. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
  117. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
  118. package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
  119. package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
  120. package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
  121. package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
  122. package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
  123. package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
  124. package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
  125. package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
  126. package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
  127. package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
  128. package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
  129. package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
  130. package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
  131. package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
  132. package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
  133. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
  134. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
  135. package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
  136. package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
  137. package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
  138. package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
  139. package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
  140. package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
  141. package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
  142. package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
  143. package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
  144. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
  145. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
  146. package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
  147. package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
  148. package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
  149. package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
  150. package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
  151. package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
  152. package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
  153. package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
  154. package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
  155. package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
  156. package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
  157. package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
  158. package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
  159. package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
  160. package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
  161. package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
  162. package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
  163. package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
  164. package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
  165. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
  166. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
  167. package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
  168. package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
  169. package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
  170. package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
  171. package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
  172. package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
  173. package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
  174. package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
  175. package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
  176. package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
  177. package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
  178. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
  179. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
  180. package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
  181. package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
  182. package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
  183. package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
  184. package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
  185. package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
  186. package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
  187. package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
  188. package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
  189. package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
  190. package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
  191. package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
  192. package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
  193. package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
  194. package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
  195. package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
  196. package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
  197. package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
  198. package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
  199. package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
  200. package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
  201. package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
  202. package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
  203. package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
  204. package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
  205. package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
  206. package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
  207. package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
  208. package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
  209. package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
  210. package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
  211. package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
  212. package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
  213. package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
  214. package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
  215. package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
  216. package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
  217. package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
  218. package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
  219. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
  220. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
  221. package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
  222. package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
  223. package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
  224. package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
  225. package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
  226. package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
  227. package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
  228. package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
  229. package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
  230. package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
  231. package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
  232. package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
  233. package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
  234. package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
  235. package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
  236. package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
  237. package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
  238. package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
  239. package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
  240. package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
  241. package/package.json +1 -1
  242. package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
  243. package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -1,14 +1,14 @@
  # Using Table Stream and Pipe to Import Kafka Data into Lakehouse

- ## 1. Background Introduction
+ ## 1. Background

- In the field of big data processing, efficiently importing streaming data from Kafka into a Lakehouse (data lake warehouse) is a common requirement. CloudTech provides powerful Table Stream and Pipe functionalities that simplify and enhance this process. This article will detail how to use Table Stream and Pipe to import Kafka data into a Lakehouse, including the complete process of creating a Kafka external table and a Kafka Table Stream.
+ In big data processing, efficiently ingesting streaming data from Kafka into a Lakehouse is a common requirement. Singdata Lakehouse provides powerful Table Stream and Pipe functionality that makes this process simpler and more efficient. This article describes how to use Table Stream and Pipe to import Kafka data into the Lakehouse, covering the complete process of creating a Kafka external table and a Kafka Table Stream.

- ## 2. Operational Steps
+ ## 2. Steps

- ### Creating a Kafka External Table
+ ### Create a Kafka External Table

- Before using Table Stream and Pipe, we need to create an external table integrated with Kafka to access data in Kafka.
+ Before using Table Stream and Pipe, create an [external table integrated with Kafka](create-kafka-external.md) to access data in Kafka.

  ```sql
  CREATE STORAGE CONNECTION pipe_kafka
@@ -24,23 +24,24 @@ OPTIONS ( 'group_id' = 'external_table_lh', 'topics' = 'my_topic')
  CONNECTION pipe_kafka;
  ```

- ### Creating a Table Stream
+ ### Create a Table Stream

- Create a Table Stream on the Kafka external table to capture real-time data changes in Kafka.
+ [Create a Table Stream](create-table-stream.md) on the Kafka external table to capture real-time data changes from Kafka.

  ```sql
  CREATE TABLE STREAM kafka_table_stream_pipe1
  ON TABLE external_table_kafka
  WITH PROPERTIES (
  'table_stream_mode' = 'append_only'
+
  );
  ```

- - `kafka_table_stream_pipe1`: Name of the Table Stream.
- - `ON TABLE external_table_kafka`: Specifies that the Table Stream is created based on the previously created Kafka external table.
- - `table_stream_mode='append_only'`: Sets the mode of the Table Stream to append-only, meaning it will only capture newly added data rows.
+ * `kafka_table_stream_pipe1`: Name of the Table Stream.
+ * `ON TABLE external_table_kafka`: Specifies that the Table Stream is created based on the previously created Kafka external table.
+ * `table_stream_mode='append_only'`: Sets the mode to append-only, meaning only newly added data rows are captured.

- After creation, you can verify the data in the Table Stream with the following query:
+ After creation, verify the data in the Table Stream with the following query:

  ```sql
  SELECT CAST(value AS STRING) FROM kafka_table_stream_pipe1;
@@ -48,61 +49,61 @@ SELECT CAST(value AS STRING) FROM kafka_table_stream_pipe1;

  This query converts the `value` field in the Table Stream to a string type and returns it for subsequent processing.

- ### Creating a Target Table
+ ### Create a Target Table

- Next, create a target table to store data imported from Kafka.
+ Create a target table to store data imported from Kafka.

  ```sql
- CREATE TABLE kafak_sink_table_1 (
+ CREATE TABLE kafka_sink_table_1 (
  a TIMESTAMP,
  b STRING
  );
  ```

- - `kafak_sink_table_1`: Name of the target table.
- - `a TIMESTAMP`: First field for storing timestamp data.
- - `b STRING`: Second field for storing string data.
+ * `kafka_sink_table_1`: Name of the target table.
+ * `a TIMESTAMP`: First field for storing timestamp data.
+ * `b STRING`: Second field for storing string data.

- ### Creating a Pipe
+ ### Create a Pipe

- Finally, use a Pipe to continuously import data from the Table Stream into the target table.
+ Use a Pipe to continuously import data from the Table Stream into the target table.

  ```sql
  CREATE PIPE kafka_pipe_stream
  VIRTUAL_CLUSTER = 'test_alter'
  AS
- COPY INTO kafak_sink_table_1
+ COPY INTO kafka_sink_table_1
  FROM (
  SELECT CURRENT_TIMESTAMP(), CAST(value AS STRING) FROM kafka_table_stream_pipe1
  );
  ```

- - `kafka_pipe_stream`: Name of the Pipe.
- - `VIRTUAL_CLUSTER = 'test_alter'`: Specifies the virtual cluster to use.
- - `COPY INTO kafak_sink_table_1`: Copies data into the target table `kafak_sink_table_1`.
- - `SELECT CURRENT_TIMESTAMP(), CAST(value AS STRING) FROM kafka_table_stream_pipe1`: Selects data from the Table Stream, using the current timestamp and the converted `value` field as the two columns for the target table.
+ * `kafka_pipe_stream`: Name of the Pipe.
+ * `VIRTUAL_CLUSTER = 'test_alter'`: Specifies the Virtual Cluster to use.
+ * `COPY INTO kafka_sink_table_1`: Copies data into the target table `kafka_sink_table_1`.
+ * `SELECT CURRENT_TIMESTAMP(), CAST(value AS STRING) FROM kafka_table_stream_pipe1`: Selects data from the Table Stream, using the current timestamp and the converted `value` field as the two columns for the target table.

- Other Configurable Properties:
+ Other configurable properties:
  - `INITIAL_DELAY_IN_SECONDS`: Initial job scheduling delay (optional, default 0 seconds)
- - `BATCH_INTERVAL_IN_SECONDS`: (Optional) Sets the batch processing interval, default 60 seconds.
- - `BATCH_SIZE_PER_KAFKA_PARTITION`: (Optional) Sets the batch size per Kafka partition, default 500,000 records.
- - `MAX_SKIP_BATCH_COUNT_ON_ERROR`: (Optional) Sets the maximum number of batches to skip on error, default 30.
- - `RESET_KAFKA_GROUP_OFFSETS`: (Optional) Sets the initial offset for Kafka when starting the pipe. Cannot be modified. Possible values: `latest`, `earliest`, `none`, `valid`, `${TIMESTAMP_MILLISECONDS}`
- - `none`: Default, no action.
- - `valid`: Checks if the current offset in the group is expired and resets expired partitions to the current earliest.
- - `earliest`: Resets to the current earliest.
- - `latest`: Resets to the current latest.
- - `${TIMESTAMP_MILLISECONDS}`: Resets to the offset corresponding to the millisecond timestamp, e.g., '1737789688000' (2025-01-25 15:21:28).
+ - `BATCH_INTERVAL_IN_SECONDS`: (Optional) Batch processing interval, default 60 seconds.
+ - `BATCH_SIZE_PER_KAFKA_PARTITION`: (Optional) Batch size per Kafka partition, default 500,000 records.
+ - `MAX_SKIP_BATCH_COUNT_ON_ERROR`: (Optional) Maximum number of batches to skip on error, default 30.
+ - `RESET_KAFKA_GROUP_OFFSETS`: (Optional) Initial Kafka offset when starting the Pipe. Cannot be modified after creation. Possible values: `latest`, `earliest`, `none`, `valid`, `${TIMESTAMP_MILLISECONDS}`
+ - `none`: No action (default)
+ - `valid`: Checks if the current group offset is expired and resets expired partition offsets to the current earliest
+ - `earliest`: Resets to the current earliest
+ - `latest`: Resets to the current latest
+ - `${TIMESTAMP_MILLISECONDS}`: Resets to the offset corresponding to the millisecond timestamp, e.g., `1737789688000` (2025-01-25 15:21:28)

- ## 3. Verifying Results
+ ## 3. Verify Results

- You can verify whether the data has been successfully imported by querying the target table:
+ Verify whether data has been successfully imported by querying the target table:

  ```sql
- SELECT * FROM kafak_sink_table_1;
+ SELECT * FROM kafka_sink_table_1;
  ```

- Additionally, check the running status of the Pipe to ensure it is functioning properly:
+ Check the running status of the Pipe to ensure it is working properly:

  ```sql
  SHOW PIPES;
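Reviewer sketch (not part of the package): the `${TIMESTAMP_MILLISECONDS}` entry in the hunk above pairs `1737789688000` with 2025-01-25 15:21:28. The Python below converts between millisecond epoch values and timestamps so that pairing can be checked. The helper names `ms_to_datetime` and `datetime_to_ms` and the `utc_offset_hours` parameter are hypothetical, and the UTC+8 offset is an inference from the arithmetic, not something the package states.

```python
from datetime import datetime, timedelta, timezone


def ms_to_datetime(ms: int, utc_offset_hours: int = 0) -> datetime:
    """Convert a millisecond Unix timestamp to a timezone-aware datetime.

    utc_offset_hours is an illustrative parameter, not a Pipe option.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz=tz)


def datetime_to_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime back to a millisecond Unix timestamp."""
    return round(dt.timestamp() * 1000)


if __name__ == "__main__":
    # The documented example value, rendered in UTC and in UTC+8.
    print(ms_to_datetime(1737789688000))                      # 2025-01-25 07:21:28+00:00
    print(ms_to_datetime(1737789688000, utc_offset_hours=8))  # 2025-01-25 15:21:28+08:00
```

The documented wall-clock time matches the UTC+8 rendering, so a value intended for `RESET_KAFKA_GROUP_OFFSETS` is probably best derived from an explicit time zone rather than from a naive local time.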
@@ -112,14 +113,15 @@ This command lists all created Pipes and their status information, including whe

  ## 4. Status Monitoring and Management

- ### Checking Kafka Consumption Latency
+ ### Check Kafka Consumption Latency

- Use the `DESC PIPE` command. For example, the JSON string in `pipe_latency` below:
- - `lastConsumeTimestamp`: The last consumed offset.
- - `offsetLag`: The backlog of Kafka data.
- - `timeLag`: Consumption latency, calculated as the current time minus the last consumed offset. If Kafka consumption is abnormal, the value is -1.
+ Use the `DESC PIPE` command. The JSON string in `pipe_latency` contains the following fields:
+ - `lastConsumeTimestamp`: The last consumed offset timestamp
+ - `offsetLag`: The backlog of Kafka data
+ - `timeLag`: Consumption latency, calculated as the current time minus the last consumed offset timestamp. When Kafka consumption is abnormal, the value is -1

- ```sql
+
+ ````
  DESC PIPE EXTENDED kafka_pipe_stream
  +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | info_name | info_value |
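Reviewer sketch (not part of the package): the `pipe_latency` fields described in the hunk above can be checked programmatically. The Python below assumes the `info_value` of the `pipe_latency` row has already been fetched as a string; the sample is copied from the `DESC PIPE EXTENDED` output in this file. The helper `summarize_latency` and its result keys are hypothetical, and the unit of `timeLag` is not stated in the source.

```python
import json

# Sample copied from the pipe_latency row of the DESC PIPE EXTENDED output.
SAMPLE = (
    '{"kafka":{"lags":{"0":0,"1":0,"2":0,"3":0},'
    '"lastConsumeTimestamp":-1,"offsetLag":0,"timeLag":-1}}'
)


def summarize_latency(pipe_latency_json: str) -> dict:
    """Summarize the Kafka section of a pipe_latency JSON string."""
    kafka = json.loads(pipe_latency_json)["kafka"]
    # Partition ids arrive as JSON object keys, i.e. strings.
    per_partition = {int(p): lag for p, lag in kafka["lags"].items()}
    return {
        "offset_lag": kafka["offsetLag"],
        "time_lag": kafka["timeLag"],
        # Per the documentation, timeLag is -1 when consumption is abnormal.
        "consumption_abnormal": kafka["timeLag"] == -1,
        "max_partition_lag": max(per_partition.values(), default=0),
        "lagging_partitions": sorted(p for p, lag in per_partition.items() if lag > 0),
    }


if __name__ == "__main__":
    print(summarize_latency(SAMPLE))
```

A monitoring job could alert when `consumption_abnormal` is true or when `offset_lag` stays above a chosen threshold across several polls; the threshold is a deployment decision the source does not specify.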
@@ -130,53 +132,53 @@ DESC PIPE EXTENDED kafka_pipe_stream
130
132
  | last_modified_time | 2025-03-05 10:40:55.405 |
131
133
  | comment | |
132
134
  | properties | ((virtual_cluster,test_alter)) |
133
- | copy_statement | COPY INTO TABLE qingyun.pipe_schema.kafak_sink_table_1 FROM (SELECT `current_timestamp`() AS ```current_timestamp``()`, CAST(kafka_table_stream_pipe1.`value` AS string) AS `value` |
135
+ | copy_statement | COPY INTO TABLE qingyun.pipe_schema.kafka_sink_table_1 FROM (SELECT `current_timestamp`() AS ```current_timestamp``()`, CAST(kafka_table_stream_pipe1.`value` AS string) AS `value` |
134
136
  | pipe_status | RUNNING |
135
- | output_name | xxxxxxx.pipe_schema.kafak_sink_table_1 |
137
+ | output_name | xxxxxxx.pipe_schema.kafka_sink_table_1 |
136
138
  | input_name | kafka_table_stream:xxxxxxx.pipe_schema.kafka_table_stream_pipe1 |
137
139
  | invalid_reason | |
138
140
  | pipe_latency | {"kafka":{"lags":{"0":0,"1":0,"2":0,"3":0},"lastConsumeTimestamp":-1,"offsetLag":0,"timeLag":-1}} |
139
141
  +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
140
- ```
141
142
 
142
- ### Viewing Pipe Execution History
143
+ ````
143
144
 
144
- Since each Pipe execution is a copy operation, you can view all operations in the job history. Use the `query_tag` in the [Job History](web-job-history.md) to filter, as all Pipe copy jobs are tagged in the format `pipe.``workspace_name``.schema_name.pipe_name`, making it easy to track and manage.
145
+ ### View Pipe Execution History
145
146
 
146
- ### Stopping and Starting a Pipe
147
+ Since each Pipe execution is a COPY operation, you can view all operations in the job history. Filter by `query_tag` in [Job History](<web-job-history.md>). All Pipe COPY jobs are tagged in the format `pipe.workspace_name.schema_name.pipe_name` for easy tracking.
147
148
 
148
- - Pause a Pipe:
149
- ```sql
150
- ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = true;
151
- ```
149
+ ### Stop and Start a Pipe
152
150
 
151
+ - Pause a Pipe:
152
+ ```
153
+ ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = true;
154
+ ```
153
155
  - Resume a Pipe:
154
- ```sql
155
- ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = false;
156
- ```
156
+ ```
157
+ ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = false;
158
+ ```
157
159
 
158
- ### Modifying Pipe Properties
160
+ ### Modify Pipe Properties
159
161
 
160
- You can modify Pipe properties, but only one property at a time. If multiple properties need to be modified, execute the `ALTER` command multiple times. Below are the modifiable properties and their syntax:
162
+ You can modify Pipe properties one at a time. If multiple properties need to be changed, run the `ALTER` command multiple times. Below are the modifiable properties and their syntax:
161
163
 
162
- ```sql
164
+ ```SQL
163
165
  ALTER PIPE pipe_name SET
164
- [VIRTUAL_CLUSTER = 'virtual_cluster_name'],
166
+ [VIRTUAL_CLUSTER = 'virtual_cluster_name'],
165
167
  [BATCH_INTERVAL_IN_SECONDS=''],
166
- [ BATCH_SIZE_PER_KAFKA_PARTITION=''],
168
+ [BATCH_SIZE_PER_KAFKA_PARTITION=''],
167
169
  [MAX_SKIP_BATCH_COUNT_ON_ERROR=''],
168
170
  [COPY_JOB_HINT='']
169
171
  ```
170
172
 
171
173
  Examples:
172
- ```sql
173
- -- Modify compute cluster
174
- ALTER PIPE pipe_name SET VIRTUAL_CLUSTER = 'default';
175
-
174
+ ```
175
+ -- Modify the Virtual Cluster
176
+ ALTER PIPE pipe_name SET VIRTUAL_CLUSTER = 'DEFAULT'
176
177
  -- Set COPY_JOB_HINT
177
- ALTER PIPE pipe_name SET copy_hints='{"cz.mapper.kafka.message.size": "2000000"}';
178
+ ALTER PIPE pipe_name SET COPY_JOB_HINT='{"cz.mapper.kafka.message.size": "2000000"}'
179
+
178
180
  ```
179
181
 
180
- **Note**
181
- - Modifying the COPY statement logic is not supported. If needed, delete the Pipe and recreate it.
182
- - When modifying the `COPY_JOB_HINT` of a Pipe, the new settings will overwrite existing hints. If your Pipe already has hints (e.g., `{"cz.sql.split.kafka.strategy":"size"}`), you must set all required hints together when adding new ones; otherwise, existing hints will be overwritten. Separate multiple parameters with commas.
182
+ **Notes**
183
+ - Modifying the COPY statement logic is not supported. If you need to modify it, delete the Pipe and recreate it.
184
+ - When modifying the `COPY_JOB_HINT` of a Pipe, the new settings will overwrite all existing hints. If your Pipe already has hints such as `{"cz.sql.split.kafka.strategy":"size"}`, you must include all required hints together when setting new ones; otherwise existing hints will be overwritten. Separate multiple parameters with commas.
@@ -1,4 +1,10 @@
1
- # Continuous Data Import from Kafka Using Pipe
1
+ # Continuous Data Collection from Kafka Using Pipe
2
+
3
+ ## Overview
4
+
5
+ Pipe is the **continuous data ingestion** solution provided by the Lakehouse, designed to automatically and continuously import data from Kafka into Lakehouse tables. Pipe creates a persistent consumer group, maintains the consumption position, and runs continuously according to the configured scheduling strategy.
6
+
7
+ A Kafka Pipe is like a continuously running consumer group. You only need to define the consumption logic, and it automatically pulls data from the Topic and writes it to a table — no manual triggering or Cron configuration required.
2
8
 
3
9
  ## Kafka Pipe Syntax
4
10
 
@@ -6,34 +12,73 @@
6
12
  -- Syntax for creating a Pipe from Kafka
7
13
  CREATE PIPE [ IF NOT EXISTS ] <pipe_name>
8
14
  VIRTUAL_CLUSTER = 'virtual_cluster_name'
15
+ [INITIAL_DELAY_IN_SECONDS='']
9
16
  [BATCH_INTERVAL_IN_SECONDS='']
10
- [ BATCH_SIZE_PER_KAFKA_PARTITION='']
17
+ [BATCH_SIZE_PER_KAFKA_PARTITION='']
11
18
  [MAX_SKIP_BATCH_COUNT_ON_ERROR='']
12
19
  [RESET_KAFKA_GROUP_OFFSETS='']
13
20
  [COPY_JOB_HINT='']
14
21
  AS <copy_statement>;
15
22
  ```
16
23
 
17
- * `<pipe_name>`: The name of the Pipe object you want to create.
18
- * `VIRTUAL_CLUSTER`: Specify the name of the virtual cluster.
19
- * `BATCH_INTERVAL_IN_SECONDS`: (Optional) Set the batch interval time, default is 60 seconds.
20
- * `BATCH_SIZE_PER_KAFKA_PARTITION`: (Optional) Set the batch size per Kafka partition, default is 500,000 records.
21
- * `MAX_SKIP_BATCH_COUNT_ON_ERROR`: (Optional) Set the maximum retry count for skipped batches on error, default is 30.
22
- - `RESET_KAFKA_GROUP_OFFSETS`: (Optional) Sets the initial offset for Kafka when starting the pipe. This property cannot be modified after the pipe is created. Possible values are `latest`, `earliest`, `none`, `valid`, and `${TIMESTAMP_MILLISECONDS}`.
23
- - `none`: No action by default.
24
- - `valid`: Checks if the current offset in the group is expired and resets expired partitions to the current earliest offset.
25
- - `earliest`: Resets to the current earliest offset.
26
- - `latest`: Resets to the current latest offset.
27
- - `${TIMESTAMP_MILLISECONDS}`: Resets to the offset corresponding to the millisecond timestamp, for example, `'1737789688000'` (which corresponds to January 25, 2025, 15:21:28).
24
+ * `<pipe_name>`: The name of the Pipe object, used for management and monitoring.
25
+ * `VIRTUAL_CLUSTER`: Specifies the name of the Virtual Cluster to execute Pipe tasks.
26
+ * `INITIAL_DELAY_IN_SECONDS`: Initial job scheduling delay (optional, default 0 seconds).
27
+ * `BATCH_INTERVAL_IN_SECONDS`: (Optional) Controls how long to accumulate data per batch before writing. A shorter interval means fresher data; a longer interval means more efficient single writes. The default of 60 seconds works for most scenarios.
28
+ * `BATCH_SIZE_PER_KAFKA_PARTITION`: (Optional) Batch size per Kafka partition, default 500,000 records.
29
+ * `MAX_SKIP_BATCH_COUNT_ON_ERROR`: (Optional) Maximum number of batches to skip on error, default 30.
30
+ * `RESET_KAFKA_GROUP_OFFSETS`: (Optional) Controls where the Pipe starts consuming Kafka data when it starts. Only settable at startup. If not set and the consumer group has no historical position, Kafka's [auto.offset.reset](https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset) configuration is used (default `latest`). Supported values:
31
+ * `none`: No action; uses [auto.offset.reset](https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset)
32
+ * `valid`: Checks if the current group offset is expired and resets expired partition offsets to the current earliest
33
+ * `earliest`: Resets to the current earliest
34
+ * `latest`: Resets to the current latest
35
+ * `${TIMESTAMP_MILLISECONDS}`: Resets to the offset corresponding to the millisecond timestamp, e.g., `1737789688000` (2025-01-25 15:21:28)
36
+
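The `${TIMESTAMP_MILLISECONDS}` value is an epoch timestamp in milliseconds. As a minimal sketch, the conversion from a wall-clock time can be done in Python as below. The helper name `to_epoch_millis` is hypothetical, and the UTC+8 offset is an assumption inferred from the arithmetic: `1737789688000` corresponds to 15:21:28 only when the time is read in UTC+8, which the documentation does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch arithmetic.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch.

    Hypothetical helper for illustration; it is not part of any Lakehouse SDK.
    """
    if moment.tzinfo is None:
        # A naive datetime would be ambiguous, so reject it.
        raise ValueError("pass a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Assumption: the documented example time is expressed in UTC+8.
utc_plus_8 = timezone(timedelta(hours=8))
reset_value = to_epoch_millis(datetime(2025, 1, 25, 15, 21, 28, tzinfo=utc_plus_8))
print(reset_value)
```

The resulting integer would then be passed as a quoted string, in the same way as the other option values in the syntax above.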
37
+ ## Using READ\_KAFKA in a Pipe
38
+
39
+ For temporary exploration, you can use the READ_KAFKA function directly (see [READ_KAFKA Function](<sql_functions/table_functions/read_kafka.md>)). When using `READ_KAFKA` in a Pipe's COPY statement, the following **important differences** apply:
40
+
41
+ ### Parameter Passing Rules
42
+
43
+ ```sql
44
+ -- READ_KAFKA syntax in a Pipe
45
+ read_kafka (
46
+ 'bootstrap_servers', -- Required: Kafka cluster address in host:port format, multiple brokers separated by commas — 2-3 broker addresses are sufficient, no need to list all nodes
47
+ 'topic', -- Required: Topic name — one Pipe corresponds to one Topic; create multiple Pipes for multiple Topics
48
+ '', -- Required: Topic pattern (not yet supported, leave empty string)
49
+ 'group_id', -- Required: Persistent consumer group ID — use a meaningful name (e.g., pipe_orders_group); different Pipes for the same Topic must use different group_ids
50
+ '', -- Leave empty: start position is managed automatically by Pipe (when using READ_KAFKA standalone, fill starting_offsets here)
51
+ '', -- Leave empty: end position managed automatically by Pipe
52
+ '', -- Leave empty: start timestamp managed automatically by Pipe
53
+ '', -- Leave empty: end timestamp managed automatically by Pipe
54
+ 'raw', -- Key format
55
+ 'raw', -- Value format
56
+ 0, -- Max error count
57
+ map() -- Kafka config parameters — fill in SSL, SASL and other auth params here when needed, e.g., map('security.protocol','SASL_SSL',...)
58
+ )
59
+ ```
60
+
61
+ ### Key Differences
62
+
63
+ | Feature | READ\_KAFKA Function (standalone) | READ\_KAFKA (in a Pipe) |
64
+ | ------ | ------------------------ | --------------------- |
65
+ | Consumer group | Temporary, destroyed after execution | Persistent, maintains consumption position |
66
+ | Position management | Manually specify starting\_offsets etc. | Managed automatically by Pipe; position parameters must be left empty |
67
+ | Execution mode | One-time query | Continuously scheduled |
68
+ | Default start position | earliest (explore historical data) | latest (process new data) |
69
+
70
+ ### Best Practices
71
+
72
+ See [Efficiently Ingesting Kafka Data with Pipe](<pipe-kafka-bestpractice-1.md>)
28
73
 
29
74
  ## Usage Example
30
75
 
31
76
  ```SQL
32
- /*Use Lakehouse Pipe task object to continuously import Kafka data into the target table*/
77
+ /*Use a Lakehouse Pipe task object to continuously import Kafka data into a target table*/
33
78
  ---Step01: Create the target table for Kafka writes
34
79
  create table kafka_raw(value string);
35
80
 
36
- ---Step02: Create PIPE task to read from Kafka and write to the target table
81
+ ---Step02: Create a PIPE task to read from Kafka and write to the target table
37
82
  CREATE PIPE load_kafka01
38
83
  VIRTUAL_CLUSTER = 'DEFAULT'
39
84
  BATCH_INTERVAL_IN_SECONDS = '10'
@@ -48,12 +93,12 @@ FROM (
48
93
  'test',-- topic name
49
94
  '', -- topic prefix not supported yet
50
95
  'pipe_kafka_group',-- group id
51
- '',-- Point-related parameters, leave blank in pipe ddl
52
- '',-- Point-related parameters, leave blank in pipe ddl
53
- '',-- Point-related parameters, leave blank in pipe ddl
54
- '',-- Point-related parameters, leave blank in pipe ddl
55
- 'raw',-- format of key, currently only supports binary
56
- 'raw',-- format of value, currently only supports binary
96
+ '',-- offset-related parameter, leave empty in pipe ddl
97
+ '',-- offset-related parameter, leave empty in pipe ddl
98
+ '',-- offset-related parameter, leave empty in pipe ddl
99
+ '',-- offset-related parameter, leave empty in pipe ddl
100
+ 'raw',-- key format, currently only supports binary
101
+ 'raw',-- value format, currently only supports binary
57
102
  0,
58
103
  map()
59
104
  )
@@ -88,72 +133,18 @@ SELECT * FROM kafka_raw LIMIT 100;
88
133
  DROP PIPE load_kafka01;
89
134
  ```
90
135
 
91
- ## Function: read\_kafka
92
-
93
- > Note: This function is currently in preview release
94
-
95
- ## Function Description
96
-
97
- Read data from an Apache Kafka cluster and return the data in tabular form.
98
-
99
- ## Function Syntax
100
-
101
- ```SQL
102
- read_kafka (
103
- <bootstrapServers>,
104
- <topic>,
105
- <topic_prefix>,
106
- <group_id>,
107
- <STARTING_OFFSETS>,
108
- <ENDING_OFFSETS>,
109
- <STARTING_OFFSETS_TIMESTAMP>,
110
- <ENDING_OFFSETS_TIMESTAMP>,
111
- <KEY_FORMAT>,
112
- <VALUE_FORMAT>,
113
- <MAX_ERROR_NUMBER>,
114
- <kafka_parameters>
115
- )
116
- ```
117
-
118
- ## Parameter Description
119
-
120
- * bootstrap: Comma-separated Kafka broker server addresses, such as `1.2.3.1:9092,1.2.3.2:9092`.
121
- * topic: Kafka topic name, multiple topics separated by commas, such as `topicA,topicB`.
122
- * topic\_pattern: Topic regex, not supported yet, leave it empty by default. For example: ''.
123
- * group\_id: Kafka consumer group ID.
124
- * STARTING\_OFFSETS: Specifies the starting offset to read from, default is `latest`. This parameter does not need to be passed in the pipe.
125
- * ENDING\_OFFSETS: Specifies the ending offset, default is `latest`. This parameter does not need to be passed in the pipe.
126
- * STARTING\_OFFSETS\_TIMESTAMP: Specifies the timestamp for the starting offset. This parameter does not need to be passed in the pipe.
127
- * ENDING\_OFFSETS\_TIMESTAMP: Specifies the timestamp for the ending offset. This parameter does not need to be passed in the pipe.
128
- * KEY\_FORMAT: Specifies the format of the key to read, case-insensitive STRING type. Currently, only raw format is supported.
129
- * VALUE\_FORMAT: Specifies the format of the value to read, case-insensitive STRING type. Currently, only raw format is supported.
130
- * MAX\_ERROR\_NUMBER: The maximum number of allowed error rows within the reading window. Must be greater than or equal to 0. The default is 0, which means no error rows are allowed, with a range of 0-100000.
131
- * kafka\_parameters: Parameters to be passed to Kafka, prefixed with kafka., directly using Kafka's parameters. These options can be found in Kafka. The format is like MAP('kafka.security.protocol', 'PLAINTEXT', 'kafka.auto.offset.reset', 'latest'). For values, refer to the [Kafka documentation](https://kafka.apache.org/documentation/#consumerconfigs).
132
-
133
- ## Return Values
134
-
135
- | Field | Meaning | Type |
136
- | --------------- | ---------------------------- | -------------------- |
137
- | topic | Kafka topic name | STRING |
138
- | partition | Data partition ID | INT |
139
- | offset | Offset in Kafka partition | BIGINT |
140
- | timestamp | Kafka message timestamp | TIMESTAMP\_LTZ |
141
- | timestamp\_type | Kafka message timestamp type | STRING |
142
- | headers | Kafka message headers | MAP\<STRING, BINARY> |
143
- | key | Kafka key value | BINARY |
144
- | value | Kafka value | BINARY |
145
136
 
146
137
 
147
138
  ## Status Monitoring and Management
148
139
 
149
- ### Viewing Kafka Consumption Latency
140
+ ### Check Kafka Consumption Latency
150
141
 
151
- Use the `DESC PIPE` command. For example, the JSON string in `pipe_latency` below:
152
- - `lastConsumeTimestamp`: The last consumed offset.
153
- - `offsetLag`: The backlog of Kafka data.
154
- - `timeLag`: Consumption latency, calculated as the current time minus the last consumed offset. If Kafka consumption is abnormal, the value is -1.
142
+ Use the `DESC PIPE` command. The JSON string in `pipe_latency` contains the following fields:
143
+ - `lastConsumeTimestamp`: The last consumed offset timestamp
144
+ - `offsetLag`: The backlog of Kafka data
145
+ - `timeLag`: Consumption latency, calculated as the current time minus the last consumed offset timestamp. When Kafka consumption is abnormal, the value is -1
155
146
 
156
- ```sql
147
+ ````
157
148
  DESC PIPE EXTENDED kafka_pipe_stream
158
149
  +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
159
150
  | info_name | info_value |
@@ -164,54 +155,58 @@ DESC PIPE EXTENDED kafka_pipe_stream
164
155
  | last_modified_time | 2025-03-05 10:40:55.405 |
165
156
  | comment | |
166
157
  | properties | ((virtual_cluster,test_alter)) |
167
- | copy_statement | COPY INTO TABLE qingyun.pipe_schema.kafak_sink_table_1 FROM (SELECT `current_timestamp`() AS ```current_timestamp``()`, CAST(kafka_table_stream_pipe1.`value` AS string) AS `value` |
158
+ | copy_statement | COPY INTO TABLE qingyun.pipe_schema.kafka_sink_table_1 FROM (SELECT `current_timestamp`() AS ```current_timestamp``()`, CAST(kafka_table_stream_pipe1.`value` AS string) AS `value` |
168
159
  | pipe_status | RUNNING |
169
- | output_name | xxxxxxx.pipe_schema.kafak_sink_table_1 |
160
+ | output_name | xxxxxxx.pipe_schema.kafka_sink_table_1 |
170
161
  | input_name | kafka_table_stream:xxxxxxx.pipe_schema.kafka_table_stream_pipe1 |
171
162
  | invalid_reason | |
172
163
  | pipe_latency | {"kafka":{"lags":{"0":0,"1":0,"2":0,"3":0},"lastConsumeTimestamp":-1,"offsetLag":0,"timeLag":-1}} |
173
164
  +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
174
- ```
175
165
 
176
- ### Viewing Pipe Execution History
166
+ ````
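For automated monitoring, the `pipe_latency` string can be parsed as ordinary JSON. The following is a minimal sketch using the sample value from the output above. The function name `summarize_pipe_latency` is hypothetical, and treating the `lags` map as a per-partition backlog is an assumption, since the documentation describes only `lastConsumeTimestamp`, `offsetLag`, and `timeLag`.

```python
import json


def summarize_pipe_latency(pipe_latency: str) -> dict:
    """Extract the documented latency fields from a pipe_latency JSON string."""
    kafka = json.loads(pipe_latency)["kafka"]
    time_lag = kafka["timeLag"]
    return {
        "offset_lag": kafka["offsetLag"],
        "time_lag": time_lag,
        # Per the documentation, -1 signals abnormal Kafka consumption.
        "abnormal": time_lag == -1,
        # Assumption: "lags" maps partition id -> backlog for that partition.
        "lagging_partitions": sorted(
            partition for partition, lag in kafka["lags"].items() if lag > 0
        ),
    }


# Sample value copied from the DESC PIPE EXTENDED output above.
sample = (
    '{"kafka":{"lags":{"0":0,"1":0,"2":0,"3":0},'
    '"lastConsumeTimestamp":-1,"offsetLag":0,"timeLag":-1}}'
)
summary = summarize_pipe_latency(sample)
print(summary)
```

How the `info_value` column is retrieved from the `DESC PIPE` result depends on the client in use and is left out of this sketch.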
177
167
 
178
- Since each Pipe execution triggers a copy operation, you can view all operations in the job history. Use the `query_tag` in the [Job History](web-job-history.md) to filter. All Pipe copy jobs are tagged in the format `pipe.``workspace_name``.schema_name.pipe_name`, making it easy to track and manage.
168
+ ### View Pipe Execution History
179
169
 
180
- ### Stopping and Starting a Pipe
170
+ Since each Pipe execution is a COPY operation, you can view all operations in the job history. Filter by `query_tag` in the [Job History](web-job-history.md). All Pipe COPY jobs are tagged in the format `pipe.workspace_name.schema_name.pipe_name` for easy tracking.
181
171
 
182
- - Pause a Pipe:
183
- ```sql
184
- ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = true;
185
- ```
172
+ ### Stop and Start a Pipe
186
173
 
187
- - Resume a Pipe:
188
- ```sql
189
- ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = false;
190
- ```
174
+ * Pause a Pipe:
191
175
 
192
- ### Modifying Pipe Properties
176
+ ```
177
+ ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = true;
178
+ ```
193
179
 
194
- You can modify the properties of a Pipe, but only one property at a time. If multiple properties need to be modified, execute the `ALTER` command multiple times. Below are the modifiable properties and their syntax:
180
+ * Resume a Pipe:
195
181
 
196
- ```sql
182
+ ```
183
+ ALTER PIPE pipe_name SET PIPE_EXECUTION_PAUSED = false;
184
+ ```
185
+
186
+ ### Modify Pipe Properties
187
+
188
+ You can modify Pipe properties one at a time. If multiple properties need to be changed, run the `ALTER` command multiple times. Below are the modifiable properties and their syntax:
189
+
190
+ ```SQL
197
191
  ALTER PIPE pipe_name SET
198
192
  [VIRTUAL_CLUSTER = 'virtual_cluster_name'],
199
- [BATCH_INTERVAL_IN_SECONDS=''],
200
- [ BATCH_SIZE_PER_KAFKA_PARTITION=''],
201
- [MAX_SKIP_BATCH_COUNT_ON_ERROR=''],
202
- [RESET_KAFKA_GROUP_OFFSETS=''],
203
- [COPY_JOB_HINT='']
193
+ [BATCH_INTERVAL_IN_SECONDS=''],
194
+ [BATCH_SIZE_PER_KAFKA_PARTITION=''],
195
+ [MAX_SKIP_BATCH_COUNT_ON_ERROR=''],
196
+ [COPY_JOB_HINT='']
204
197
  ```
205
198
 
206
199
  Examples:
207
- ```sql
208
- -- Modify the compute cluster
209
- ALTER PIPE pipe_name SET VIRTUAL_CLUSTER = 'default';
210
200
 
201
+ ```
202
+ -- Modify the Virtual Cluster
203
+ ALTER PIPE pipe_name SET VIRTUAL_CLUSTER = 'DEFAULT'
211
204
  -- Set COPY_JOB_HINT
212
- ALTER PIPE pipe_name SET copy_hints='{"cz.mapper.kafka.message.size": "2000000"}';
205
+ ALTER PIPE pipe_name SET COPY_JOB_HINT='{"cz.mapper.kafka.message.size": "2000000"}'
206
+
213
207
  ```
214
208
 
215
- **Note**
216
- - Modifying the logic of the COPY statement is not supported. If you need to modify it, delete the Pipe and recreate it.
217
- - When modifying the `COPY_JOB_HINT` of a Pipe, the new settings will overwrite existing hints. Therefore, if your Pipe already has hints (e.g., `{"cz.sql.split.kafka.strategy":"size"}`), you must set all required hints together when adding new ones; otherwise, the existing hints will be overwritten by the new settings. Separate multiple parameters with commas.
209
+ **Notes**
210
+
211
+ * Modifying the COPY statement logic is not supported. If you need to modify it, delete the Pipe and recreate it.
212
+ * When modifying the `COPY_JOB_HINT` of a Pipe, the new settings will overwrite all existing hints. If your Pipe already has hints such as `{"cz.sql.split.kafka.strategy":"size"}`, you must include all required hints together when setting new ones; otherwise existing hints will be overwritten. Separate multiple parameters with commas.
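Because a new `COPY_JOB_HINT` replaces the whole set of hints, one way to avoid dropping existing hints is to merge old and new hints before generating the statement. The sketch below illustrates this under assumptions: `build_copy_job_hint_statement` is a hypothetical helper, the existing hints are supplied by the caller (the documentation does not show how to read them back), and the pipe name is not validated.

```python
import json


def build_copy_job_hint_statement(
    pipe_name: str, existing_hints: dict, new_hints: dict
) -> str:
    """Build an ALTER PIPE statement that keeps existing hints and adds new ones."""
    # New hints win when the same key appears in both dictionaries.
    merged = {**existing_hints, **new_hints}
    hint_json = json.dumps(merged, separators=(",", ":"))
    if "'" in hint_json:
        # A single quote would terminate the SQL string literal early.
        raise ValueError("hint keys and values must not contain single quotes")
    return f"ALTER PIPE {pipe_name} SET COPY_JOB_HINT='{hint_json}'"


statement = build_copy_job_hint_statement(
    "pipe_name",
    {"cz.sql.split.kafka.strategy": "size"},        # hint already set on the Pipe
    {"cz.mapper.kafka.message.size": "2000000"},    # hint being added
)
print(statement)
```

The generated statement carries both hints in one JSON object, separated by a comma, which matches the note above about setting all required hints together.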