@clickzetta/cz-cli-darwin-x64 0.5.16 → 0.5.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (243)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
  3. package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
  4. package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
  5. package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
  6. package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
  7. package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
  8. package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
  9. package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
  10. package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
  11. package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
  12. package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
  13. package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
  14. package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
  15. package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
  16. package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
  17. package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
  18. package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
  19. package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
  20. package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
  21. package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
  22. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
  23. package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
  24. package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
  25. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
  26. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
  27. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
  28. package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
  29. package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
  30. package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
  31. package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
  32. package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
  33. package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
  34. package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
  35. package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
  36. package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
  37. package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
  38. package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
  39. package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
  40. package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
  41. package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
  42. package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
  43. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
  44. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
  45. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
  46. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
  47. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
  48. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
  49. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
  50. package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
  51. package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
  52. package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
  53. package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
  54. package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
  55. package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
  56. package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
  57. package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
  58. package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
  59. package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
  60. package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
  61. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
  62. package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
  63. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
  64. package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
  65. package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
  66. package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
  67. package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
  68. package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
  69. package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
  70. package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
  71. package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
  72. package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
  73. package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
  74. package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
  75. package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
  76. package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
  77. package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
  78. package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
  79. package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
  80. package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
  81. package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
  82. package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
  83. package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
  84. package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
  85. package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
  86. package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
  87. package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
  88. package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
  89. package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
  90. package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
  91. package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
  92. package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
  93. package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
  94. package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
  95. package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
  96. package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
  97. package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
  98. package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
  99. package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
  100. package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
  101. package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
  102. package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
  103. package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
  104. package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
  105. package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
  106. package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
  107. package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
  108. package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
  109. package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
  110. package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
  111. package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
  112. package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
  113. package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
  114. package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
  115. package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
  116. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
  117. package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
  118. package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
  119. package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
  120. package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
  121. package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
  122. package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
  123. package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
  124. package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
  125. package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
  126. package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
  127. package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
  128. package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
  129. package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
  130. package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
  131. package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
  132. package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
  133. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
  134. package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
  135. package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
  136. package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
  137. package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
  138. package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
  139. package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
  140. package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
  141. package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
  142. package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
  143. package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
  144. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
  145. package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
  146. package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
  147. package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
  148. package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
  149. package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
  150. package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
  151. package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
  152. package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
  153. package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
  154. package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
  155. package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
  156. package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
  157. package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
  158. package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
  159. package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
  160. package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
  161. package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
  162. package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
  163. package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
  164. package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
  165. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
  166. package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
  167. package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
  168. package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
  169. package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
  170. package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
  171. package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
  172. package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
  173. package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
  174. package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
  175. package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
  176. package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
  177. package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
  178. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
  179. package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
  180. package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
  181. package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
  182. package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
  183. package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
  184. package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
  185. package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
  186. package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
  187. package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
  188. package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
  189. package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
  190. package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
  191. package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
  192. package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
  193. package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
  194. package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
  195. package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
  196. package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
  197. package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
  198. package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
  199. package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
  200. package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
  201. package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
  202. package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
  203. package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
  204. package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
  205. package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
  206. package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
  207. package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
  208. package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
  209. package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
  210. package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
  211. package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
  212. package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
  213. package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
  214. package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
  215. package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
  216. package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
  217. package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
  218. package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
  219. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
  220. package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
  221. package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
  222. package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
  223. package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
  224. package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
  225. package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
  226. package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
  227. package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
  228. package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
  229. package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
  230. package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
  231. package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
  232. package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
  233. package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
  234. package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
  235. package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
  236. package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
  237. package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
  238. package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
  239. package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
  240. package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
  241. package/package.json +1 -1
  242. package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
  243. package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -1,38 +1,33 @@
1
- # Overview
2
- **[Preview Release] This feature is currently in public preview.**
3
-
4
- External Catalog is a secure object in Lakehouse that maps databases from external data systems, allowing users to perform read-only queries on these data systems within Lakehouse. Through External Catalog, users can leverage Lakehouse's query capabilities to access and analyze data stored in external databases.
5
-
6
- # External Catalog Use Cases
7
- * **Unified Metadata Management**: Manage metadata from multiple data sources in a unified manner, simplifying data governance.
8
- - **Data Federation Query**: With External Catalog, users can perform federated queries across different data sources as if they were data within the same database. Through federated queries, users can access and analyze data stored in external systems in real-time without waiting for data synchronization.
9
- - **Data Import**: Import data scattered across different data sources into Lakehouse to build a unified data lake, facilitating big data analysis and machine learning. Retain historical data or infrequently accessed data in external storage and import it into Lakehouse through External Catalog to optimize data warehouse storage and performance.
10
-
11
- # Using External Catalog
12
-
13
- 1. **Create Connection**: First, you need to create a connection in Catalog Connection. This connection is a secure object that specifies the path and authentication information for accessing the external database system.
14
- 2. **Create External Catalog**: Using the created connection, you can create an external catalog. This catalog exists as a secure object in External Catalog, mirroring the database structure in the external data system.
15
- 3. **Execute Query**: Once the external catalog is created, users can write SQL queries in Lakehouse.
16
-
17
- # Supported Data Sources
18
-
19
- Lakehouse supports Apache Hive connection access through the Multi-Catalog feature.
20
-
21
- # EXTERNAL CATALOG Related Syntax
22
- - Create External Catalog
23
- Refer to [CREATE EXTERNAL CATALOG](<create-external-catalog.md>)
24
- - List Catalog
25
- Refer to [SHOW CATALOG](<show-catalog.md>)
26
- - View SCHEMA under Catalog
27
- Refer to [View SCHEMA under EXTERNAL CATALOG](<show-catalog-schema.md>)
28
- - List Tables under CATALOG
29
- Refer to [List Tables under CATALOG](<show-catalog-table.md>)
30
- - Query Tables under CATALOG
31
- Refer to [Query Tables under CATALOG](<show-catalog-table.md>)
32
- - View Table Structure under CATALOG
33
- Refer to [View Table Structure under CATALOG](<desc-catalog-table.md>)
34
-
35
- # Permissions
36
- Currently, only the instance admin role can query the created CATALOG.
37
- # Use Cases
38
- Refer to [Create HIVE CATALOG](<create-hive-catalog.md>)
1
+ # External Catalog
2
+
3
+ External Catalog is Lakehouse's entry point for federated queries. It maps the metadata catalogs of external data systems (Hive, Databricks, Snowflake, etc.) into Lakehouse so that you can query external data directly with standard SQL, without copying it.
4
+
5
+ **Difference from External Schema**: External Catalog is an independent top-level catalog accessed with three-level naming `catalog.schema.table`; External Schema is a Schema mounted into the current workspace, accessed with two-level naming `schema.table`, which is better suited for integrating Hive databases into an existing workspace. See [Organization Hierarchy](org-hierarchy.md).
6
+
7
+ ## Supported Data Sources
8
+
9
+ | Data Source | Connection Method |
10
+ |--------|---------|
11
+ | Apache Hive | Hive Metastore URIs |
12
+ | Databricks Unity Catalog | Databricks API |
13
+ | Iceberg REST Catalog | Iceberg REST API |
14
+ | Snowflake Open Catalog | Iceberg REST API + OAuth |
15
+
16
+ ## Use Cases
17
+
18
+ - **Cross-platform federated queries**: Query native Lakehouse tables together with Hive/Databricks data, with no ETL required
19
+ - **In-place data lake acceleration**: Keep data in OSS/HDFS and use Lakehouse to replace Spark/Hive for ETL or Presto/Trino for ad-hoc queries
20
+ - **Gradual migration**: Maintain business continuity through External Catalog during migration; switch over after verifying data consistency
21
+
22
+ ## Permissions
23
+
24
+ Currently, only the `instance_admin` role can query the created External Catalog.
25
+
26
+ ## Related Documentation
27
+
28
+ - [In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md): Rapid POC validation, replacing Spark/Hive and Presto/Trino without moving data
29
+ - [External Catalog Federated Queries](external-catalog-concept.md): Detailed usage guide, operation examples, architecture principles
30
+ - [Create External Catalog](create-external-catalog.md): CREATE EXTERNAL CATALOG syntax
31
+ - [Create Hive Catalog](create-hive-catalog.md): Hive connection configuration
32
+ - [External Schema](external-schema.md): Mount an external Hive database into a workspace
33
+ - [Organization Hierarchy](org-hierarchy.md): External Catalog vs External Schema selection guide
@@ -0,0 +1,466 @@
1
+ # Storage Connection + API Connection + External Function: Combined Practice
2
+
3
+ External Functions can look daunting at first. You need to configure an OSS Bucket, a serverless function runtime, and a RAM role, then write Python code, build a zip, and write DDL. With three separate objects, many concepts, and many steps, it is easy to get discouraged.
4
+
5
+ Once you get everything running, you'll find that the complexity is mostly concentrated in **one-time environment setup**. After that initial setup, the cost of adding new functions is extremely low: write Python logic → package → one DDL, done in three minutes. This guide walks you through four progressive scenarios to cover the full workflow across Alibaba Cloud, Tencent Cloud, and AWS.
6
+
7
+ **By the end you'll realize**: External Functions are not complex. You just configure "where to store code" and "where to run code" once, and every new function after that is just code + one SQL statement.
8
+
9
+ Full code on GitHub: [clickzetta_external_function](https://github.com/clickzetta/clickzetta_external_function)
10
+
11
+ ---
12
+
13
+ ## The Iron Triangle: How the Three Objects Work Together
14
+
15
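+ An External Function is not a single object; it is the result of three objects working together: a Storage Connection, an API Connection, and the function definition itself, supported by an External Volume for uploading code packages. Understanding their roles makes the entire mechanism clear: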
16
+
17
+ ![](.topwrite/assets/external-function-iron-triangle.svg)
18
+
19
+
20
+ | Object | Analogy | What it does | How often configured |
21
+ |------|------|----------|--------|
22
+ | **Storage Connection** | Parking lot | Authenticates object storage (OSS/COS/S3), allows Lakehouse to read and write code packages | Once per Schema |
23
+ | **API Connection** | Workshop | Authenticates cloud function runtime (FC/SCF/Lambda), defines where code runs | Once per Region |
24
+ | **External Volume** | Shelf | Mounts the object storage Bucket to the Schema, enabling the PUT command to upload files | Once per Bucket |
25
+ | **CREATE EXTERNAL FUNCTION** | Registry | Maps a function name → entry class → zip package | Once per function |
26
+
27
+ The first three are one-time setups: once the Storage Connection, API Connection, and External Volume are configured, every function shares them. Adding a new function afterwards requires only one `CREATE EXTERNAL FUNCTION` statement.
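To make the "one statement per function" point concrete, here is a minimal sketch of a generator that reuses the `CREATE EXTERNAL FUNCTION` statement from Scenario 1. It is not the repository's `3-render-sql.py`; the `SharedSetup` and `function_ddl` names, the `<name>.zip` archive naming, and the `module.class` handler convention are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedSetup:
    """One-time objects shared by every function (names are illustrative)."""
    schema: str
    volume: str       # External Volume that holds the zip packages
    connection: str   # API Connection pointing at FC/SCF/Lambda


def function_ddl(setup: SharedSetup, name: str, handler: str) -> str:
    """Render the single DDL statement needed for one new function.

    `handler` is the module.class entry point, e.g. 'my_upper.my_upper'.
    Assumes the uploaded archive is named <name>.zip.
    """
    return (
        f"CREATE EXTERNAL FUNCTION IF NOT EXISTS {setup.schema}.{name}\n"
        f"  AS '{handler}'\n"
        f"  USING ARCHIVE 'volume://{setup.volume}/{name}.zip'\n"
        f"  CONNECTION {setup.connection}\n"
        "  WITH PROPERTIES ('remote.udf.api'='python3.mc.v0',"
        "'remote.udf.protocol'='http.arrow.v0');"
    )


if __name__ == "__main__":
    # Hypothetical names mirroring the Scenario 1 example.
    setup = SharedSetup(schema="my_schema", volume="external_functions_prod",
                        connection="shanghai_func_conn")
    for fn in ["my_upper", "pii_mask"]:
        print(function_ddl(setup, fn, f"{fn}.{fn}"))
```

The shared objects appear only as names here, which mirrors the design: connections and volumes are created once, and each function adds just its own statement.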
28
+
29
+ ---
30
+
31
+ ## Prerequisites (One-time, Shared Across All Four Scenarios)
32
+
33
+ All four scenarios share the same cloud environment configuration, so **this setup is done only once**.
34
+
35
+ ### Step 1: Choose your cloud, configure config.json
36
+
37
+ Confirm which cloud your Lakehouse is on. The external function runtime (FC/SCF/Lambda) must be on the same cloud and in the same region:
38
+
39
+ ```bash
40
+ cz-cli profile list
41
+ # service column:
42
+ # alicloud.api.clickzetta.com → Alibaba Cloud
43
+ # tencentcloud.api.clickzetta.com → Tencent Cloud
44
+ # aws.api.clickzetta.com → AWS
45
+ ```
46
+
47
+ ```bash
48
+ git clone https://github.com/clickzetta/clickzetta_external_function.git
49
+ cd clickzetta_external_function
50
+ cp config.example.json config.json
51
+ ```
52
+
53
+ Open `config.json` and set the `platform` field to `aliyun`, `tencent`, or `aws`:
54
+
55
+ ```json
56
+ "platform": "aliyun"
57
+ ```
58
+
59
+ Then follow SETUP.md to complete the cloud-specific environment setup (OSS/COS/S3 Bucket, FC/SCF/Lambda, RAM/CAM/IAM role, Bailian API Key) and fill in the details in config.json.
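Because the function runtime must be on the same cloud as Lakehouse, it can help to check `config.json` against the profile's `service` host before deploying anything. The sketch below is hypothetical and is not `1-check-config.py`. It assumes the `service` value ends with one of the three hosts listed by `cz-cli profile list`, and it reads only the `platform` field.

```python
import json

# Host suffix in the `service` column -> expected config.json "platform" value.
SERVICE_TO_PLATFORM = {
    "alicloud.api.clickzetta.com": "aliyun",
    "tencentcloud.api.clickzetta.com": "tencent",
    "aws.api.clickzetta.com": "aws",
}


def check_platform(config_text: str, service: str) -> str:
    """Return the configured platform if it matches the profile's cloud; raise otherwise."""
    platform = json.loads(config_text).get("platform")
    host = service.strip().rstrip("/")
    # Suffix match, so region-prefixed hosts are also recognized (an assumption).
    expected = next((p for suffix, p in SERVICE_TO_PLATFORM.items()
                     if host.endswith(suffix)), None)
    if expected is None:
        raise ValueError(f"unrecognized service host: {service!r}")
    if platform != expected:
        raise ValueError(f"config.json has platform={platform!r}, "
                         f"but the profile is on {expected!r}")
    return platform
```

Pass in the text of `config.json` and the `service` value of the active profile; a mismatch raises `ValueError` before any cloud resources are touched.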
60
+
61
+ ### Step 2: Install cz-cli and verify
62
+
63
+ ```bash
64
+ cz-cli profile use <your-profile>
65
+ cz-cli sql "SELECT current_schema()"
66
+ # Fill the output schema name into config.json → schema field
67
+ ```
68
+
69
+ ### Step 3: Universal steps for all four scenarios
70
+
71
+ The execution flow for all four scenarios is identical:
72
+
73
+ ```
74
+ Fill config.json → check (validate config) → package (build code) → render (generate SQL) → deploy (execute deployment) → verify (call and verify)
75
+ ```
76
+
77
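+ Corresponding commands, run from inside a scenario directory such as `python_quickstart` (the shared `1-check-config.py` and `3-render-sql.py` scripts are one level up):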
78
+
79
+ ```bash
80
+ python ../1-check-config.py # ① Validate configuration
81
+ python 2-package.py # ② Package code (add --deps for AI functions)
82
+ python ../3-render-sql.py # ③ Replace placeholders, generate SQL
83
+ cz-cli sql -f dist/4-deploy_generated.sql --write # ④ Deploy
84
+ ```
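Step ③ replaces placeholders such as `<schema>` and `<your-role-arn>` in the deploy template. Here is a minimal sketch of that idea. It assumes placeholders are lowercase names in angle brackets, and the real `3-render-sql.py` may work differently. The sketch refuses to render while any placeholder is unfilled, so a half-configured SQL file never reaches `cz-cli`.

```python
import re

# Angle-bracket placeholders such as <schema>, <bucket>, <your-role-arn>.
PLACEHOLDER = re.compile(r"<([a-z][a-z0-9_-]*)>")


def render_sql(template: str, values: dict[str, str]) -> str:
    """Substitute <name> placeholders; fail if any placeholder has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template)
                      if name not in values})
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    # A function replacement inserts the value literally (no backslash handling).
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

The pattern would also match a `<word>` written inside ordinary SQL, so this approach assumes templates contain no such text apart from placeholders.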
85
+
86
+ ---
87
+
88
+ ## Scenario 1: Python External Function Quick Start
89
+
90
+ > One function, zero dependencies, up and running in 5 minutes. Understand how Storage Connection + API Connection + External Function work together.
91
+
92
+ ### Deploy
93
+
94
+ ```bash
95
+ cd python_quickstart
96
+ python ../1-check-config.py
97
+ python 2-package.py
98
+ python ../3-render-sql.py
99
+ cz-cli sql -f dist/4-deploy_generated.sql --write
100
+ ```
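The four commands must run in order, and a failure in one (for example, a config check error) should stop the rest. Below is a small driver sketch. It assumes it is started from the scenario directory, like the commands above. `run_pipeline` is a hypothetical helper, and `subprocess.run` is injectable so the control flow can be tested without calling `cz-cli`.

```python
import subprocess
from typing import Callable, Sequence

# The four deploy commands from above, run from the scenario directory.
STEPS: list[list[str]] = [
    ["python", "../1-check-config.py"],
    ["python", "2-package.py"],
    ["python", "../3-render-sql.py"],
    ["cz-cli", "sql", "-f", "dist/4-deploy_generated.sql", "--write"],
]


def run_pipeline(steps: Sequence[Sequence[str]],
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the number of steps that completed successfully.
    """
    for i, cmd in enumerate(steps):
        result = run(list(cmd))
        if result.returncode != 0:
            print(f"step {i + 1} failed ({' '.join(cmd)}), exit {result.returncode}")
            return i
    return len(steps)
```

For the AI function scenarios, append `--deps` to the packaging command in `STEPS`, as noted in the universal steps.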
101
+
102
+ What `4-deploy_generated.sql` does:
103
+
104
+ ```sql
105
+ -- 1. Storage Connection (OSS authentication)
106
+ CREATE STORAGE CONNECTION IF NOT EXISTS oss_sh_conn
107
+ TYPE OSS
108
+ access_id = '<your-id>' access_key = '<your-key>'
109
+ ENDPOINT = 'oss-cn-shanghai.aliyuncs.com';
110
+
111
+ -- 2. API Connection (Function Compute FC authentication)
112
+ CREATE API CONNECTION IF NOT EXISTS shanghai_func_conn
113
+ TYPE CLOUD_FUNCTION PROVIDER = 'aliyun'
114
+ REGION = 'cn-shanghai' ROLE_ARN = '<your-role-arn>'
115
+ CODE_BUCKET = '<your-bucket>';
116
+
117
+ -- 3. External Volume (mount Bucket)
118
+ CREATE EXTERNAL VOLUME IF NOT EXISTS external_functions_prod
119
+ LOCATION 'oss://<bucket>/' USING CONNECTION oss_sh_conn;
120
+
121
+ -- 4. Upload zip
122
+ PUT '<project>/dist/my_upper.zip' TO VOLUME external_functions_prod FILE 'my_upper.zip';
123
+
124
+ -- 5. Register function
125
+ CREATE EXTERNAL FUNCTION IF NOT EXISTS <schema>.my_upper
126
+ AS 'my_upper.my_upper'
127
+ USING ARCHIVE 'volume://external_functions_prod/my_upper.zip'
128
+ CONNECTION shanghai_func_conn
129
+ WITH PROPERTIES ('remote.udf.api'='python3.mc.v0','remote.udf.protocol'='http.arrow.v0');
130
+
131
+ -- 6. Verify
132
+ SELECT <schema>.my_upper('hello'); -- returns HELLO
133
+ ```
134
+
135
+ ### Function Source Code
136
+
137
+ `src/my_upper.py` — one class, one `evaluate` method:
138
+
139
+ ```python
140
+ @annotate("*->string")
141
+ class my_upper(object):
142
+     def evaluate(self, s):
144
+         return s.upper() if s else s
144
+ ```
145
+
146
+ ### Local Testing
147
+
148
+ On FC you get no stdout and no stack traces, so failures are hard to diagnose after deployment. **Always test locally before each deployment**:
149
+
150
+ ```bash
151
+ python3 -c "
152
+ import sys; sys.path.insert(0, 'src')
153
+ from my_upper import my_upper
154
+ print(my_upper().evaluate('hello')) # HELLO
155
+ "
156
+ ```
157
+
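`src/my_upper.py` uses `@annotate`, so importing it locally only works if that decorator can be imported on your machine. If it cannot, a no-op stand-in lets you exercise `evaluate` directly. The stub below is a hypothetical testing aid, not the runtime's decorator, and it does not validate the signature string.

```python
# Hypothetical no-op stand-in for the runtime's @annotate decorator, for local tests only.
def annotate(signature: str):
    def wrap(cls):
        cls._signature = signature  # keep the declared signature for inspection
        return cls
    return wrap


@annotate("*->string")
class my_upper(object):
    def evaluate(self, s):
        # None and "" are returned unchanged, as in src/my_upper.py.
        return s.upper() if s else s


if __name__ == "__main__":
    f = my_upper()
    for value in ["hello", "", None]:
        print(repr(value), "->", repr(f.evaluate(value)))
```

Edge cases such as `None` (SQL NULL) are worth checking locally, because FC gives no stack trace when a call fails.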
158
+ ### Key Takeaways
159
+
160
+ - `remote.udf.api='python3.mc.v0'` specifies the Python 3.10 runtime for FC
161
+ - Calling the function **requires the schema prefix**: `SELECT <schema>.my_upper('hello')` — omitting it results in `function not found`
162
+ - The first call may take 5-10 seconds (FC cold start); subsequent calls respond at normal speed
163
+
164
+ ---
165
+
166
+ ## Scenario 2: Python ML Functions + Third-Party Dependency Packaging
167
+
168
+ > Five ML/PII functions based on scikit-learn and jieba. Demonstrates how to correctly package third-party dependencies that include C extensions.
169
+
170
+ ### Function List
171
+
172
+ | Function | Libraries used | Purpose |
173
+ |------|---------|------|
174
+ | `pii_mask` | re | Phone/email/ID masking |
175
+ | `feature_normalize` | numpy + sklearn | Numeric column normalization (minmax/zscore) |
176
+ | `anomaly_detect` | numpy + sklearn | Isolation Forest anomaly detection |
177
+ | `sentiment_score` | jieba | Chinese sentiment scoring (0-1) |
178
+ | `tfidf_keywords` | sklearn | TF-IDF keyword extraction |
179
+
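The stdlib-only sketch below makes the arithmetic behind the `minmax`/`zscore` modes of `feature_normalize` concrete. It is not the `ml_toolkit` source, which uses numpy and scikit-learn. It assumes the input is a JSON array string, as in the SQL examples in this scenario, and that the output is also a JSON array. For `zscore` it uses the population standard deviation, as scikit-learn's `StandardScaler` does.

```python
import json
import statistics


def feature_normalize(values_json: str, method: str = "minmax") -> str:
    """Normalize a JSON array of numbers; returns a JSON array (assumed output format)."""
    values = [float(v) for v in json.loads(values_json)]
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread; map every value to 0.0 instead of dividing by zero.
        result = [0.0 if span == 0 else (v - lo) / span for v in values]
    elif method == "zscore":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population std (ddof=0)
        result = [0.0 if std == 0 else (v - mean) / std for v in values]
    else:
        raise ValueError(f"unknown method: {method}")
    return json.dumps(result)


if __name__ == "__main__":
    print(feature_normalize("[1,2,3,4,5]", "minmax"))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```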
180
+ ### The Core Problem: FC Runs Linux, So macOS-Built Packages Won't Load
181
+
182
+ The FC runtime is **Linux x86_64 + Python 3.10**. Running `pip install scikit-learn` on macOS installs macOS binaries (Mach-O extension modules and `.dylib` libraries) that cannot be loaded on FC.
183
+
184
+ Solution: use two separate requirements files with two different install methods.
185
+
186
+ | File | Contents | Install method |
187
+ |------|--------|---------|
188
+ | `requirements.txt` | Packages with C extensions (scikit-learn, numpy) | `pip install --platform manylinux2014_x86_64 --only-binary :all:` |
189
+ | `requirements_pure.txt` | Pure Python packages (jieba) | Normal `pip install` |
190
+
191
+ **Why can't they be in the same file?** If jieba is listed in `requirements.txt`, which is installed with `--only-binary :all:`, pip fails with `No matching distribution found`. jieba is published only as a source distribution (no wheel), and `--only-binary :all:` refuses to install from source.
192
+
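The two install modes can be sketched as two `pip` invocations. This is an assumption about what `2-package.py` does, not its actual code, and the `build_dir` name is hypothetical. The sketch also adds `--python-version 3.10`: without it, `--platform` still selects wheels for the local interpreter's Python version, which may not match FC's 3.10. pip only accepts platform-specific options together with `--target`.

```python
import sys


def pip_commands(build_dir: str = "build/deps") -> list[list[str]]:
    """Return the two pip invocations: Linux binary wheels first, then source-only packages."""
    pip = [sys.executable, "-m", "pip", "install", "--target", build_dir]
    binary = pip + [
        "-r", "requirements.txt",
        "--platform", "manylinux2014_x86_64",  # FC runs Linux x86_64
        "--python-version", "3.10",            # match FC's interpreter, not the local one
        "--only-binary", ":all:",              # never build C extensions on macOS
    ]
    pure = pip + ["-r", "requirements_pure.txt"]  # e.g. jieba, installed normally
    return [binary, pure]


if __name__ == "__main__":
    for cmd in pip_commands():
        print(" ".join(cmd))
        # To actually install: subprocess.run(cmd, check=True)
```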
193
+ ### Deploy
194
+
195
+ ```bash
196
+ cd python_advanced
197
+ python 2-package.py # dual-mode packaging (~100 MB)
198
+ python ../1-check-config.py
199
+ python ../3-render-sql.py
200
+ cz-cli sql -f dist/4-deploy_generated.sql --write
201
+ ```
202
+
203
+ ### Local Testing
204
+
205
+ ```bash
206
+ pip install -r requirements.txt -r requirements_pure.txt
207
+ python3 -c "
208
+ import sys; sys.path.insert(0, 'src')
209
+ from ml_toolkit import pii_mask, feature_normalize, sentiment_score
210
+ print(pii_mask().evaluate('My phone is 13812345678'))
211
+ print(feature_normalize().evaluate('[1,2,3,4,5]', 'minmax'))
212
+ print(sentiment_score().evaluate('The product quality is excellent'))
213
+ "
214
+ ```
215
+
216
+ ### Using in SQL
217
+
218
+ ```sql
219
+ SELECT <schema>.pii_mask('Phone 13812345678, email alice@example.com');
220
+ SELECT <schema>.feature_normalize('[10,20,30,40,50]', 'minmax');
221
+ SELECT <schema>.anomaly_detect('[1,2,3,4,100]');
222
+ SELECT <schema>.sentiment_score('The product quality is excellent, shipping was fast, very satisfied!');
223
+ SELECT <schema>.tfidf_keywords('["AI and machine learning are the future","Deep learning achieves breakthroughs in image recognition"]', 3);
224
+ ```
225
+
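For reference, regex-based masking of the kind `pii_mask` performs might look like the following. The patterns and the masking format are assumptions for illustration: the sketch keeps the first 3 and last 4 digits of an 11-digit mainland-China mobile number, and keeps only the first character of an email's local part. The function's actual output format may differ, and ID-number masking is omitted.

```python
import re

# Assumed patterns: 11-digit mobile numbers starting with 1, and simple email addresses.
PHONE = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")
EMAIL = re.compile(r"\b([A-Za-z0-9])([A-Za-z0-9._%+-]*)(@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def pii_mask(text):
    """Mask phone numbers and emails in free text (hypothetical format)."""
    if not text:
        return text
    text = PHONE.sub(r"\1****\2", text)
    # Replace the rest of the email's local part with one '*' per character.
    return EMAIL.sub(lambda m: m.group(1) + "*" * len(m.group(2)) + m.group(3), text)


if __name__ == "__main__":
    print(pii_mask("Phone 13812345678, email alice@example.com"))
    # Phone 138****5678, email a****@example.com
```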
226
+ ### Key Takeaways
227
+
228
+ - **C extension packages require Linux binary wheels**: binaries built on macOS cannot be loaded on FC
229
+ - **Pure Python packages must be kept separate**: jieba ships only as source, so it cannot share a requirements file with the binary packages installed via `--only-binary :all:`
230
+ - **Zip size drives cold-start time**: scikit-learn + numpy add up to about 100 MB, and the first call takes 5-10 seconds
231
+
232
+ ---
233
+
234
+ ## Scenario 3: 30 AI SQL Functions — Package Once, Call Anywhere
235
+
236
+ > 30 AI functions share a single zip. Perform summarization, translation, sentiment analysis, OCR, and vector similarity search directly in SQL.
237
+
238
+ ### Design Highlights
239
+
240
+ **One zip, 30 functions**: All 30 functions share a single `clickzetta_ai_functions_full.zip`. Their DDL statements differ only in the class name, `AS 'ai_functions_complete.ai_xxx'`. Packaging and uploading a separate zip per function would mean managing 30 zip files.
241
+
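Because only the class name changes, the DDL for every function can be generated from one list. The minimal sketch below reuses the properties shown in Scenario 1 and assumes that each SQL function name equals its class name. The connection name `ai_func_conn` and the volume path are hypothetical. Check the rendered `4-deploy_generated.sql` for the exact statements this scenario uses.

```python
# Hypothetical values; the real names come from config.json via the render step.
ARCHIVE = "volume://external_functions_prod/clickzetta_ai_functions_full.zip"
CONNECTION = "ai_func_conn"
MODULE = "ai_functions_complete"


def external_function_ddl(schema: str, name: str) -> str:
    """One CREATE EXTERNAL FUNCTION per class; every statement points at the same zip."""
    return (
        f"CREATE EXTERNAL FUNCTION IF NOT EXISTS {schema}.{name}\n"
        f"AS '{MODULE}.{name}'\n"
        f"USING ARCHIVE '{ARCHIVE}'\n"
        f"CONNECTION {CONNECTION}\n"
        "WITH PROPERTIES ('remote.udf.api'='python3.mc.v0',"
        "'remote.udf.protocol'='http.arrow.v0');"
    )


if __name__ == "__main__":
    for fn in ["ai_text_summarize", "ai_text_translate", "ai_image_ocr"]:
        print(external_function_ddl("my_schema", fn), end="\n\n")
```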
242
+ **API Key as a SQL parameter**: The Bailian API Key is not hardcoded in the source code or bundled into the zip — it is passed as a function parameter:
243
+
244
+ ```sql
245
+ SELECT <schema>.ai_text_summarize('Artificial intelligence is changing the world.', '<your-api-key>');
246
+ ```
247
+
248
+ If the API Key were hardcoded in the zip, a zip leak would mean a Key leak. Passing it as a parameter lets callers manage their own keys.
249
+
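The pattern can be sketched as a function class whose `evaluate` receives the key as an ordinary argument and passes it to the model client on each call. Everything here is hypothetical. The injected `call_model` callable stands in for the real DashScope/Bailian client. The `{"summary": ...}` and `{"error": ...}` shapes are assumptions, although the `$.summary` path used with `JSON_EXTRACT` in this scenario suggests a `summary` field.

```python
import json
from typing import Callable, Optional

# Hypothetical signature of the model client: (prompt, api_key) -> completion text.
ModelCall = Callable[[str, str], str]


class ai_text_summarize(object):
    def __init__(self, call_model: Optional[ModelCall] = None):
        # Injected so the class can be tested without network access.
        self.call_model = call_model

    def evaluate(self, text, api_key):
        # The key arrives as a SQL argument; nothing secret is stored in the module or the zip.
        if not text:
            return json.dumps({"summary": ""})
        if not api_key:
            return json.dumps({"error": "missing api_key"})
        if self.call_model is None:
            return json.dumps({"error": "no model client configured"})
        summary = self.call_model(f"Summarize:\n{text}", api_key)
        return json.dumps({"summary": summary}, ensure_ascii=False)


if __name__ == "__main__":
    fake = lambda prompt, key: "AI is changing the world."  # stand-in for the real client
    print(ai_text_summarize(fake).evaluate("Artificial intelligence is ...", "sk-test"))
```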
250
+ ### Function Categories
251
+
252
+ | Category | Count | Typical functions |
253
+ |------|------|---------|
254
+ | Text processing | 8 | Summarization, translation, sentiment analysis, entity extraction, keywords, classification, cleaning, tagging |
255
+ | Vector processing | 5 | Embedding, similarity, clustering preparation, similar search, document search |
256
+ | Multimodal | 8 | Image description, OCR, image analysis, image embedding, image similarity, video summarization, chart analysis, document parsing |
257
+ | Business scenarios | 9 | Customer intent, sales scoring, review analysis, risk detection, contract extraction, resume parsing, customer segmentation, product description, industry classification |
258
+
259
+ ### Deploy
260
+
261
+ ```bash
262
+ cd python_ai_function
263
+ pip install -r requirements.txt # for local testing only
264
+ python 2-package.py --deps # package code + dashscope Linux dependencies
265
+ python 1-check-config.py # scenario-specific validator (this config.json has a different structure)
266
+ python 3-render-sql.py
267
+ cz-cli sql -f dist/4-deploy_generated.sql --write
268
+ ```
269
+
270
+ ### Local Testing
271
+
272
+ FC provides no logs or stdout. **Testing locally before deploying saves a lot of time:**
273
+
274
+ ```bash
275
+ python3 -c "
276
+ import sys; sys.path.insert(0, 'src')
277
+ from ai_functions_complete import ai_text_summarize, ai_text_translate
278
+ print(ai_text_summarize().evaluate('Hello world', '<your-api-key>'))
279
+ print(ai_text_translate().evaluate('Hello', 'Chinese', '<your-api-key>'))
280
+ "
281
+ ```
282
+
283
+ ### Using in SQL
284
+
285
+ ```sql
286
+ -- Summarization
287
+ SELECT <schema>.ai_text_summarize('Artificial intelligence is changing the world...', '<key>');
288
+
289
+ -- Translation
290
+ SELECT <schema>.ai_text_translate('Hello, how are you?', 'Chinese', '<key>');
291
+
292
+ -- Sentiment analysis
293
+ SELECT <schema>.ai_text_sentiment_analyze('The product quality is excellent!', '<key>');
294
+
295
+ -- Semantic similarity between two texts
296
+ SELECT <schema>.ai_semantic_similarity('Apples are tasty', 'Apples are a healthy fruit', '<key>');
297
+
298
+ -- Image description
299
+ SELECT <schema>.ai_image_describe('<image-url>', '<key>');
300
+
301
+ -- Contract extraction
302
+ SELECT <schema>.ai_contract_extract('<contract text>', '<key>');
303
+ ```
304
+
305
+ The functions return JSON strings; use `JSON_EXTRACT` to retrieve individual fields:
306
+
307
+ ```sql
308
+ SELECT JSON_EXTRACT(
309
+ <schema>.ai_text_summarize('Artificial intelligence is changing the world...', '<key>'),
310
+ '$.summary'
311
+ );
312
+ ```
313
+
314
+ ### Key Takeaways
315
+
316
+ - **30 functions share one zip**: adding a new function takes about 20 lines of prompt code plus one DDL statement
317
+ - **API Key is not hardcoded**: it is passed as a SQL parameter, so a leaked zip does not expose the key
318
+ - **config.json serves two roles**: it is embedded in the zip at build time (runtime model configuration) and used to replace SQL placeholders at render time
319
+
320
+ ---
321
+
322
+ ## Scenario 4: Java UDF/UDAF/UDTF
323
+
324
+ > Java external functions support three types: UDF (one row in, one row out), UDAF (multiple rows in, one row out), UDTF (one row in, multiple rows out).
325
+
326
+ ### Quick Overview of the Three Types
327
+
328
+ | Type | Base class | Input → Output | DDL special property | Example function |
329
+ |------|------|-------------|-------------|---------|
330
+ | **UDF** | `GenericUDF` | 1 row → 1 row | — | `pii_mask` PII masking |
331
+ | **UDAF** | `GenericUDAFResolver2` | Multiple rows → 1 row | `AGGREGATOR` | `agg_stats` SUM/AVG/MIN/MAX/COUNT |
332
+ | **UDTF** | `GenericUDTF` | 1 row → N rows | `TABLE_VALUED` | `log_explode` log row expansion |
333
+
334
+ ### Differences from Python External Functions
335
+
336
+ | | Python | Java |
337
+ |------|--------|------|
338
+ | Runtime | Python 3.10 | **Java 8** (Java 9+ not supported) |
339
+ | DDL property | `python3.mc.v0` | `java8.hive2.v0` |
340
+ | Function types | UDF | UDF / UDAF / UDTF |
341
+ | Packaging | zip | Maven `jar-with-dependencies` → zip |
342
+ | Dependencies | pip `--platform manylinux` | Maven `scope=provided` (Hive runtime supplied by FC) |
343
+
344
+ ### Deploy
345
+
346
+ ```bash
347
+ cd java_udf
348
+ python 2-package.py # Maven compile + zip packaging
349
+ python ../1-check-config.py
350
+ python ../3-render-sql.py
351
+ cz-cli sql -f dist/4-deploy_generated.sql --write
352
+ ```
353
+
354
+ ### UDF Example
355
+
356
+ ```sql
357
+ SELECT <schema>.pii_mask('My phone is 13812345678, email alice@example.com');
358
+ ```
359
+
360
+ ### UDAF Example
361
+
362
+ ```sql
363
+ INSERT INTO <schema>.java_udf_test_scores VALUES (3.5), (4.2), (2.8), (5.0), (3.9);
364
+ SELECT <schema>.agg_stats(val) FROM <schema>.java_udf_test_scores;
365
+ -- → [sum, avg, min, max, count]
366
+ ```
367
+
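Unlike a UDF, a UDAF is fed many rows, and the engine may combine partial results computed in parallel. The sketch below models that lifecycle in Python for `agg_stats`. Its method names mirror the phases of Hive's `GenericUDAFEvaluator` (`iterate`, `terminatePartial`, `merge`, `terminate`). It is an illustration, not the Java implementation or its API. Skipping NULLs and the empty-input result are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class StatsBuffer:
    # Partial aggregation state; sum and count are enough to derive avg at the end.
    total: float = 0.0
    count: int = 0
    lo: float = math.inf
    hi: float = -math.inf


class AggStats:
    """Python model of agg_stats: returns [sum, avg, min, max, count]."""

    def iterate(self, buf: StatsBuffer, value) -> None:
        if value is None:  # assumed: SQL NULLs are skipped
            return
        buf.total += value
        buf.count += 1
        buf.lo = min(buf.lo, value)
        buf.hi = max(buf.hi, value)

    def terminate_partial(self, buf: StatsBuffer) -> StatsBuffer:
        return buf  # the whole buffer is the partial result

    def merge(self, buf: StatsBuffer, partial: StatsBuffer) -> None:
        buf.total += partial.total
        buf.count += partial.count
        buf.lo = min(buf.lo, partial.lo)
        buf.hi = max(buf.hi, partial.hi)

    def terminate(self, buf: StatsBuffer) -> list:
        if buf.count == 0:
            return [None, None, None, None, 0]
        return [buf.total, buf.total / buf.count, buf.lo, buf.hi, buf.count]


if __name__ == "__main__":
    udaf = AggStats()
    # Two "workers" each aggregate part of the rows, then the partials are merged.
    a, b = StatsBuffer(), StatsBuffer()
    for v in (3.5, 4.2):
        udaf.iterate(a, v)
    for v in (2.8, 5.0, 3.9):
        udaf.iterate(b, v)
    final = StatsBuffer()
    udaf.merge(final, udaf.terminate_partial(a))
    udaf.merge(final, udaf.terminate_partial(b))
    print(udaf.terminate(final))
```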
368
+ ### UDTF Example
369
+
370
+ A UDTF must be called through `LATERAL`; it cannot be used as `SELECT func(x)`:
371
+
372
+ ```sql
373
+ SELECT t.ts, t.event
374
+ FROM (
375
+ SELECT '[2025-01-15 10:30:00] User login
376
+ [2025-01-15 10:35:00] Query order' AS log
377
+ ) s, LATERAL <schema>.log_explode(s.log) t;
378
+ -- → each log line is expanded into one row
379
+ ```
380
+
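Conceptually, a UDTF maps one input value to zero or more output rows, which is why it is joined with `LATERAL` rather than called in the select list. The Python sketch below shows the kind of splitting `log_explode` performs. It assumes the `[YYYY-MM-DD HH:MM:SS] event` line format from the example above, returns `(ts, event)` tuples, and skips malformed lines (also an assumption). The Java UDTF emits each row with `forward()` instead of yielding.

```python
import re
from typing import Iterator, Optional, Tuple

# Assumed line format: "[2025-01-15 10:30:00] User login"
LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(.*)$")


def log_explode(log: Optional[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (ts, event) row per well-formed log line; other lines are skipped."""
    for line in (log or "").splitlines():
        m = LINE.match(line.strip())
        if m:
            yield m.group(1), m.group(2)


if __name__ == "__main__":
    log = "[2025-01-15 10:30:00] User login\n[2025-01-15 10:35:00] Query order"
    for ts, event in log_explode(log):
        print(ts, "|", event)
```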
381
+ ### Key Takeaways
382
+
383
+ - **Java version must be 8** — FC only has a Java 8 runtime
384
+ - **Mark Hive dependencies `scope=provided`**: FC already provides `hive-exec.jar`; bundling it in the zip can cause conflicts
385
+ - **UDAF DDL must include `AGGREGATOR` and UDTF DDL must include `TABLE_VALUED`**: if either is omitted, the function is created successfully but calls fail
386
+ - **UDTF must use `LATERAL`** syntax; `SELECT func(x)` directly is not supported
387
+
388
+ ---
389
+
390
+ ## Function Quick Reference
391
+
392
+ ### Python (quickstart + advanced + AI function)
393
+
394
+ | Function | Purpose | Source |
395
+ |------|------|------|
396
+ | `my_upper` | String to uppercase | quickstart |
397
+ | `pii_mask` | Phone/email/ID masking | advanced |
398
+ | `feature_normalize` | Numeric column normalization | advanced |
399
+ | `anomaly_detect` | Isolation Forest anomaly detection | advanced |
400
+ | `sentiment_score` | Chinese sentiment scoring | advanced |
401
+ | `tfidf_keywords` | TF-IDF keyword extraction | advanced |
402
+ | `ai_text_summarize` | Text summarization | AI function |
403
+ | `ai_text_translate` | Text translation | AI function |
404
+ | `ai_text_sentiment_analyze` | Sentiment analysis | AI function |
405
+ | `ai_text_extract_entities` | Entity extraction | AI function |
406
+ | `ai_text_extract_keywords` | Keyword extraction | AI function |
407
+ | `ai_text_classify` | Text classification | AI function |
408
+ | `ai_text_clean_normalize` | Text cleaning | AI function |
409
+ | `ai_auto_tag_generate` | Auto tagging | AI function |
410
+ | `ai_text_to_embedding` | Text embedding | AI function |
411
+ | `ai_semantic_similarity` | Semantic similarity | AI function |
412
+ | `ai_text_clustering_prepare` | Clustering preparation | AI function |
413
+ | `ai_find_similar_text` | Similar text search | AI function |
414
+ | `ai_document_search` | Document search | AI function |
415
+ | `ai_image_describe` | Image description | AI function |
416
+ | `ai_image_ocr` | Image OCR | AI function |
417
+ | `ai_image_analyze` | Image analysis | AI function |
418
+ | `ai_image_to_embedding` | Image embedding | AI function |
419
+ | `ai_image_similarity` | Image similarity | AI function |
420
+ | `ai_video_summarize` | Video summarization | AI function |
421
+ | `ai_chart_analyze` | Chart analysis | AI function |
422
+ | `ai_document_parse` | Document parsing | AI function |
423
+ | `ai_customer_intent_analyze` | Customer intent analysis | AI function |
424
+ | `ai_sales_lead_score` | Sales lead scoring | AI function |
425
+ | `ai_review_analyze` | Review analysis | AI function |
426
+ | `ai_risk_text_detect` | Risk text detection | AI function |
427
+ | `ai_contract_extract` | Contract extraction | AI function |
428
+ | `ai_resume_parse` | Resume parsing | AI function |
429
+ | `ai_customer_segment` | Customer segmentation | AI function |
430
+ | `ai_product_description_generate` | Product description generation | AI function |
431
+ | `ai_industry_classification` | Industry classification | AI function |
432
+
433
+ ### Java
434
+
435
+ | Function | Type | Purpose |
436
+ |------|------|------|
437
+ | `pii_mask` | UDF | PII masking |
438
+ | `agg_stats` | UDAF | SUM/AVG/MIN/MAX/COUNT |
439
+ | `log_explode` | UDTF | Log row expansion |
440
+
441
+ ---
442
+
443
+ ## Troubleshooting
444
+
445
+ | Error | Cause | Solution |
446
+ |------|------|------|
447
+ | `function not found` | Missing schema prefix | Add `<schema>.` prefix when calling |
448
+ | `HTTP_GENERAL_ERROR(640)` | RAM trust policy not configured / Bucket in a different region | Check RAM role trust policy (must include `1384322691904283`); confirm Bucket and FC are in the same region |
449
+ | `AccessDenied` | RAM role missing OSS permissions | Add `AliyunOSSFullAccess` or a custom OSS policy |
450
+ | `ImportError: No module named 'sklearn'` | Dependencies not packaged in zip | Re-run `python 2-package.py` (advanced) / `python 2-package.py --deps` (AI function) |
451
+ | `OSError: cannot open shared object file` | macOS binaries were packaged | Confirm `--platform manylinux2014_x86_64` was used |
452
+ | `ClassNotFoundException` | Wrong Java class name or package name | Check that the `AS` path matches the actual class name inside the jar |
453
+ | UDAF created successfully but call fails | DDL missing `AGGREGATOR` | Check `WITH PROPERTIES ('remote.udf.category'='AGGREGATOR')` |
454
+ | UDTF created successfully but calls fail with `not a table function` | DDL missing `TABLE_VALUED` | Check `WITH PROPERTIES ('remote.udf.category'='TABLE_VALUED')` |
455
+ | First call takes a long time to return | FC cold start | Wait 5-10 seconds; the call has not hung |
456
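+ | Changes to `config.json` not reflected after deployment | Forgot to re-render | Re-run the render step (`python ../3-render-sql.py`; `python 3-render-sql.py` in `python_ai_function`) and redeploy; for the AI functions, also re-run `python 2-package.py --deps`, because config.json is embedded in the zip |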
457
+
458
+ ---
459
+
460
+ ## Related Documentation
461
+
462
+ - [External Function Introduction](RemoteFunction-intro.md)
463
+ - [Development Guide: External Function (Python3)](RemoteFunction-dev-guide-python3.md)
464
+ - [Development Guide: External Function (Java)](external-function-dev-guide-java.md)
465
+ - [CREATE EXTERNAL FUNCTION](create_external_function.md)
466
+ - [GitHub: clickzetta_external_function](https://github.com/clickzetta/clickzetta_external_function)
@@ -39,18 +39,16 @@ If both conditions are met, the scheduling system submits the instance task for
39
39
  If either condition is not met, the task instance remains in the **Not Started** state. Configuration issues or paused upstream tasks could otherwise leave large numbers of instances in that state for a long time, wasting resources. To prevent this, the system kills any instance that is still waiting once the user-configured "Scheduling Wait Duration" has elapsed, and marks it as failed.
40
40
 
41
41
  | Note: If a periodic task in the production environment has no scheduling wait duration configured, the default is 3 days. Three days after the task's scheduled time, the instance changes from **Not Started** to **Failed**, regardless of whether all upstream tasks have succeeded. If a "Scheduling Wait Duration" is configured, the instance changes from **Not Started** to **Failed** once that duration has elapsed after the scheduled time, again regardless of the upstream tasks' status. |
42
- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
42
+ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
43
43
 
44
44
  ## Scheduling Properties Configuration
45
45
 
46
46
  In the **Development** module, open any task, click the Schedule Configuration function, and you can configure a series of scheduling property information for that task, including basic information, scheduling time, instance information, scheduling dependencies, and task outputs.
47
47
 
48
- *This document focuses on* scheduling time, instance information, and scheduling dependencies. *For other content, please refer to the help documentation.*
48
+ *This document focuses on* scheduling time, instance information, and scheduling dependencies. *For other content, please refer to the help documentation*.
49
49
 
50
50
  ### Scheduling Time
51
51
 
52
- ![](.topwrite/assets/f6fc6447ee/b35ef5af588033e07f3e757c89edfd025ec25ca6.png =582)
53
-
54
52
  ***
55
53
 
56
54
  **Scheduling Cycle**: Configure the scheduling cycle, scheduling frequency, scheduling start time, and scheduling end time. After the user configures this information, the system will automatically generate a standard cron expression that complies with the rules. This automatically generated time expression will be parsed by the scheduling engine, which uses a time-wheel algorithm to derive all specific execution instances that meet the criteria within future cycles.
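As a simplified illustration of deriving instances from a cycle, the sketch below handles only a daily schedule. It builds a standard five-field cron expression and lists the run times between a start date and an end date. The function names are hypothetical. The engine's actual cron dialect (for example, whether it includes a seconds field) is not specified here, and the time-wheel mechanism is not modeled.

```python
from datetime import date, datetime, time, timedelta
from typing import List


def daily_cron(hour: int, minute: int) -> str:
    """Standard 5-field cron: minute hour day-of-month month day-of-week."""
    return f"{minute} {hour} * * *"


def daily_instances(start: date, end: date, hour: int, minute: int) -> List[datetime]:
    """One scheduled run time per day from start to end, inclusive."""
    runs = []
    day = start
    while day <= end:
        runs.append(datetime.combine(day, time(hour, minute)))
        day += timedelta(days=1)
    return runs


if __name__ == "__main__":
    print(daily_cron(2, 30))  # 30 2 * * *
    for run in daily_instances(date(2025, 1, 1), date(2025, 1, 3), 2, 30):
        print(run.isoformat())
```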
@@ -77,7 +75,7 @@ If the user specifies a date, instances of this scheduling task will no longer b
77
75
 
78
76
  #### Instance Generation
79
77
 
80
- ![](.topwrite/assets/f6fc6447ee/9260e6641b50d8940c20836bfdb9330b4ba7eeab.png =556)
78
+ ^
81
79
 
82
80
  **Instance Generation**: Based on the user-configured **scheduling configuration information and dependency relationships**, **instances are generated according to the instance generation rule selected by the user**.
83
81
 
@@ -87,7 +85,7 @@ Effective After Publishing: Takes effect immediately after submitting the task.
87
85
  * If the submission time is later than the scheduling start time, instances are updated/generated starting from the submission/publishing time.
88
86
  * Historically generated instances are not affected; only subsequent instances are changed starting from the start time.
89
87
 
90
- ![](.topwrite/assets/f6fc6447ee/94bccbdeead608afb473198307e1466cab001784.jpeg =757)
88
+ ^
91
89
 
92
90
  Next-Day Effective: All instances to be executed on the next day are generated in one batch at 22:00 on the current day.
93
91
 
@@ -95,7 +93,7 @@ Next-Day Effective: All instances that need to be executed on the next day are u
95
93
 
96
94
  #### Instance Rerun Methods
97
95
 
98
- ![](.topwrite/assets/f6fc6447ee/6d2c0d8953eac7b7256337937f6f795fd084a154.png =522)
96
+ ^
99
97
 
100
98
  The product provides three rerun methods, which mainly affect behavior in the following two scenarios:
101
99
 
@@ -118,13 +116,13 @@ When the user configures a run timeout duration, if the task instance's run time
118
116
 
119
117
  #### Scheduling Wait Duration
120
118
 
121
- ![](.topwrite/assets/f6fc6447ee/7eb115526d02c946b30b1b2b9d78730794eb5d85.png =619)
119
+ ^
122
120
 
123
121
  Once a scheduling wait duration is configured, any task instance still waiting after that duration has elapsed past its scheduled run time is forcibly killed, regardless of whether upstream tasks have succeeded. This setting mainly prevents resource waste when large numbers of downstream instances cannot run because upstream tasks are paused. Configure it with caution.
124
122
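The wait-duration rule can be expressed as a small state check. This sketch is an illustration, not the scheduler's code. `instance_state` and the "Submitted" label are hypothetical. When no wait duration is configured, it applies the 3-day default from the note above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default wait when none is configured (3 days, per the note above).
DEFAULT_WAIT = timedelta(days=3)


def instance_state(scheduled: datetime, now: datetime, upstream_ok: bool,
                   wait: Optional[timedelta] = None) -> str:
    """State of an instance that has not started yet, evaluated at time `now`."""
    limit = wait if wait is not None else DEFAULT_WAIT
    if now < scheduled:
        return "Not Started"  # scheduled time not reached
    if upstream_ok:
        return "Submitted"    # hypothetical label: both conditions met
    if now - scheduled > limit:
        return "Failed"       # killed once the wait duration is exceeded
    return "Not Started"      # still waiting on upstream tasks


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 2, 0)
    for hours in (1, 71, 73):
        print(hours, instance_state(t0, t0 + timedelta(hours=hours), upstream_ok=False))
```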
 
125
123
  #### Delayed Run Skip Duration
126
124
 
127
- ![](.topwrite/assets/image_1780052779009.png)
125
+ ^
128
126
 
129
127
  The determination is based on the difference between the time the task instance enters the running state and the configured scheduled time.
130
128