@clickzetta/cz-cli-darwin-arm64 0.5.15 → 0.5.17
- package/bin/cz-cli +0 -0
- package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
- package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
- package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
- package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
- package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
- package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
- package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
- package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
- package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
- package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
- package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
- package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
- package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
- package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
- package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
- package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
- package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
- package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
- package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
- package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
- package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
- package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
- package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
- package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
- package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
- package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
- package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
- package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
- package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
- package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
- package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
- package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
- package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
- package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
- package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
- package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
- package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
- package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
- package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
- package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
- package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
- package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
- package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
- package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
- package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
- package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
- package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
- package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
- package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
- package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
- package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
- package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
- package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
- package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
- package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
- package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
- package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
- package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
- package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
- package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
- package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
- package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
- package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
- package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
- package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
- package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
- package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
- package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
- package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
- package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
- package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
- package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
- package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
- package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
- package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
- package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
- package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
- package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
- package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
- package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
- package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
- package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
- package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
- package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
- package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
- package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
- package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
- package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
- package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
- package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
- package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
- package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
- package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
- package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
- package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
- package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
- package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
- package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
- package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
- package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
- package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
- package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
- package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
- package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
- package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
- package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
- package/package.json +1 -1
- package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
- package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
|
@@ -1,38 +1,33 @@
-#
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Lakehouse
-
-
-
-
-
-
-
-
--
-
--
-
--
-
-
-# Permissions
-Currently, only the instance admin role can query the created CATALOG.
-# Use Cases
-Refer to [Create HIVE CATALOG](<create-hive-catalog.md>)
|
|
1
|
+
# External Catalog
|
|
2
|
+
|
|
3
|
+
External Catalog is the federated query entry point in Lakehouse, mapping the metadata catalogs of external data systems (Hive, Databricks, Snowflake, etc.) into Lakehouse, allowing you to query external data directly with standard SQL — no data copying required.
|
|
4
|
+
|
|
5
|
+
**Difference from External Schema**: External Catalog is an independent top-level catalog accessed with three-level naming `catalog.schema.table`; External Schema is a Schema mounted into the current workspace, accessed with two-level naming `schema.table`, which is better suited for integrating Hive databases into an existing workspace. See [Organization Hierarchy](org-hierarchy.md).
|
|
6
|
+
|
|
7
|
+
## Supported Data Sources
|
|
8
|
+
|
|
9
|
+
| Data Source | Connection Method |
|
|
10
|
+
|--------|---------|
|
|
11
|
+
| Apache Hive | Hive Metastore URIs |
|
|
12
|
+
| Databricks Unity Catalog | Databricks API |
|
|
13
|
+
| Iceberg REST Catalog | Iceberg REST API |
|
|
14
|
+
| Snowflake Open Catalog | Iceberg REST API + OAuth |
|
|
15
|
+
|
|
16
|
+
## Use Cases
|
|
17
|
+
|
|
18
|
+
- **Cross-platform federated queries**: Query Lakehouse local data and Hive/Databricks data simultaneously — no ETL required
|
|
19
|
+
- **In-place data lake acceleration**: Keep data in OSS/HDFS and use Lakehouse to replace Spark/Hive for ETL or Presto/Trino for ad-hoc queries
|
|
20
|
+
- **Gradual migration**: Maintain business continuity through External Catalog during migration; switch over after verifying data consistency
|
|
21
|
+
|
|
22
|
+
## Permissions
|
|
23
|
+
|
|
24
|
+
Currently, only the `instance_admin` role can query the created External Catalog.
|
|
25
|
+
|
|
26
|
+
## Related Documentation
|
|
27
|
+
|
|
28
|
+
- [In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md) — Rapid POC validation, replacing Spark/Hive and Presto/Trino without moving data
|
|
29
|
+
- [External Catalog Federated Queries](external-catalog-concept.md) — Detailed usage guide, operation examples, architecture principles
|
|
30
|
+
- [Create External Catalog](create-external-catalog.md) — CREATE EXTERNAL CATALOG syntax
|
|
31
|
+
- [Create Hive Catalog](create-hive-catalog.md) — Hive connection configuration
|
|
32
|
+
- [External Schema](external-schema.md) — Mount an external Hive database into a workspace
|
|
33
|
+
- [Organization Hierarchy](org-hierarchy.md) — External Catalog vs External Schema selection guide
|
|
@@ -0,0 +1,466 @@
|
|
|
1
|
+
# Storage Connection + API Connection + External Function: Combined Practice
|
|
2
|
+
|
|
3
|
+
External Functions can look daunting at first: you need to configure an OSS Bucket, a serverless function runtime, and a RAM role, then write Python code, build a zip, and write DDL. With three separate objects, many concepts, and many steps, it is easy to get discouraged.
|
|
4
|
+
|
|
5
|
+
Once you get everything running, you'll find that the complexity is mostly concentrated in **one-time environment setup**. After that initial setup, the cost of adding new functions is extremely low: write Python logic → package → one DDL, done in three minutes. This guide walks you through four progressive scenarios to cover the full workflow across Alibaba Cloud, Tencent Cloud, and AWS.
|
|
6
|
+
|
|
7
|
+
**By the end you'll realize**: External Functions are not complex. You just configure "where to store code" and "where to run code" once, and every new function after that is just code + one SQL statement.
|
|
8
|
+
|
|
9
|
+
Full code on GitHub: [clickzetta_external_function](https://github.com/clickzetta/clickzetta_external_function)
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## The Iron Triangle: How the Three Objects Work Together
|
|
14
|
+
|
|
15
|
+
An External Function is not just one object — it is the result of three objects working together. Understanding their roles makes the entire mechanism clear:
|
|
16
|
+
|
|
17
|
+

|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
| Object | Analogy | What it does | How often configured |
|
|
21
|
+
|------|------|----------|--------|
|
|
22
|
+
| **Storage Connection** | Parking lot | Authenticates object storage (OSS/COS/S3), allows Lakehouse to read and write code packages | Once per Schema |
|
|
23
|
+
| **API Connection** | Workshop | Authenticates cloud function runtime (FC/SCF/Lambda), defines where code runs | Once per Region |
|
|
24
|
+
| **External Volume** | Shelf | Mounts the object storage Bucket to the Schema, enabling the PUT command to upload files | Once per Bucket |
|
|
25
|
+
| **CREATE EXTERNAL FUNCTION** | Registry | Maps a function name → entry class → zip package | Once per function |
|
|
26
|
+
|
|
27
|
+
The first three are one-time setups — once you configure Storage Connection + API Connection + External Volume, all functions share them. Adding a new function afterwards only requires one `CREATE EXTERNAL FUNCTION` statement.
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Prerequisites (One-time, Shared Across All Four Scenarios)
|
|
32
|
+
|
|
33
|
+
All four scenarios share the same cloud environment configuration. **This step is done only once**, and all four scenarios can use it directly.
|
|
34
|
+
|
|
35
|
+
### Step 1: Choose your cloud, configure config.json
|
|
36
|
+
|
|
37
|
+
Confirm which cloud your Lakehouse is on. The external function runtime (FC/SCF/Lambda) must be on the same cloud and in the same region:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
cz-cli profile list
|
|
41
|
+
# service column:
|
|
42
|
+
# alicloud.api.clickzetta.com → Alibaba Cloud
|
|
43
|
+
# tencentcloud.api.clickzetta.com → Tencent Cloud
|
|
44
|
+
# aws.api.clickzetta.com → AWS
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
git clone https://github.com/clickzetta/clickzetta_external_function.git
|
|
49
|
+
cd clickzetta_external_function
|
|
50
|
+
cp config.example.json config.json
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Open `config.json` and change only the `platform` field to `aliyun`, `tencent`, or `aws`:
|
|
54
|
+
|
|
55
|
+
```json
|
|
56
|
+
"platform": "aliyun"
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
Then follow SETUP.md to complete the cloud-specific environment setup (OSS/COS/S3 Bucket, FC/SCF/Lambda, RAM/CAM/IAM role, Bailian API Key) and fill in the details in config.json.
|
|
60
|
+
|
|
61
|
+
### Step 2: Install cz-cli and verify
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
cz-cli profile use <your-profile>
|
|
65
|
+
cz-cli sql "SELECT current_schema()"
|
|
66
|
+
# Fill the output schema name into config.json → schema field
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
### Step 3: Universal steps for all four scenarios
|
|
70
|
+
|
|
71
|
+
The execution flow for all four scenarios is identical:
|
|
72
|
+
|
|
73
|
+
```
|
|
74
|
+
Fill config.json → check (validate config) → package (build code) → render (generate SQL) → deploy (execute deployment) → verify (call and verify)
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
Corresponding commands:
|
|
78
|
+
|
|
79
|
+
```bash
|
|
80
|
+
python ../1-check-config.py # ① Validate configuration
|
|
81
|
+
python 2-package.py # ② Package code (add --deps for AI functions)
|
|
82
|
+
python ../3-render-sql.py # ③ Replace placeholders, generate SQL
|
|
83
|
+
cz-cli sql -f dist/4-deploy_generated.sql --write # ④ Deploy
|
|
84
|
+
```
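The four commands can also be chained from one script so that a failed step stops the run. The following is a minimal Python sketch and is not part of the repository: `STEPS` and `run_steps` are hypothetical names, the commands are copied from the block above, and it assumes you start it from inside a scenario directory.

```python
import subprocess

# Commands copied from the steps above; the (name, argv) table itself is hypothetical.
STEPS = [
    ("check", ["python", "../1-check-config.py"]),
    ("package", ["python", "2-package.py"]),
    ("render", ["python", "../3-render-sql.py"]),
    ("deploy", ["cz-cli", "sql", "-f", "dist/4-deploy_generated.sql", "--write"]),
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully.
    """
    done = []
    for name, argv in steps:
        result = runner(argv)
        if result.returncode != 0:
            break  # do not render or deploy after a failed earlier step
        done.append(name)
    return done


if __name__ == "__main__":
    print("completed:", run_steps())
```

Passing `runner` as a parameter keeps the sketch testable without touching a real environment.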
|
|
85
|
+
|
|
86
|
+
---
|
|
87
|
+
|
|
88
|
+
## Scenario 1: Python External Function Quick Start
|
|
89
|
+
|
|
90
|
+
> One function, zero dependencies, up and running in 5 minutes. Understand how Storage Connection + API Connection + External Function work together.
|
|
91
|
+
|
|
92
|
+
### Deploy
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
cd python_quickstart
|
|
96
|
+
python ../1-check-config.py
|
|
97
|
+
python 2-package.py
|
|
98
|
+
python ../3-render-sql.py
|
|
99
|
+
cz-cli sql -f dist/4-deploy_generated.sql --write
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
What `4-deploy_generated.sql` does:
|
|
103
|
+
|
|
104
|
+
```sql
|
|
105
|
+
-- 1. Storage Connection (OSS authentication)
|
|
106
|
+
CREATE STORAGE CONNECTION IF NOT EXISTS oss_sh_conn
|
|
107
|
+
TYPE OSS
|
|
108
|
+
access_id = '<your-id>' access_key = '<your-key>'
|
|
109
|
+
ENDPOINT = 'oss-cn-shanghai.aliyuncs.com';
|
|
110
|
+
|
|
111
|
+
-- 2. API Connection (Function Compute FC authentication)
|
|
112
|
+
CREATE API CONNECTION IF NOT EXISTS shanghai_func_conn
|
|
113
|
+
TYPE CLOUD_FUNCTION PROVIDER = 'aliyun'
|
|
114
|
+
REGION = 'cn-shanghai' ROLE_ARN = '<your-role-arn>'
|
|
115
|
+
CODE_BUCKET = '<your-bucket>';
|
|
116
|
+
|
|
117
|
+
-- 3. External Volume (mount Bucket)
|
|
118
|
+
CREATE EXTERNAL VOLUME IF NOT EXISTS external_functions_prod
|
|
119
|
+
LOCATION 'oss://<bucket>/' USING CONNECTION oss_sh_conn;
|
|
120
|
+
|
|
121
|
+
-- 4. Upload zip
|
|
122
|
+
PUT '<project>/dist/my_upper.zip' TO VOLUME external_functions_prod FILE 'my_upper.zip';
|
|
123
|
+
|
|
124
|
+
-- 5. Register function
|
|
125
|
+
CREATE EXTERNAL FUNCTION IF NOT EXISTS <schema>.my_upper
|
|
126
|
+
AS 'my_upper.my_upper'
|
|
127
|
+
USING ARCHIVE 'volume://external_functions_prod/my_upper.zip'
|
|
128
|
+
CONNECTION shanghai_func_conn
|
|
129
|
+
WITH PROPERTIES ('remote.udf.api'='python3.mc.v0','remote.udf.protocol'='http.arrow.v0');
|
|
130
|
+
|
|
131
|
+
-- 6. Verify
|
|
132
|
+
SELECT <schema>.my_upper('hello'); -- returns HELLO
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
### Function Source Code
|
|
136
|
+
|
|
137
|
+
`src/my_upper.py` — one class, one `evaluate` method:
|
|
138
|
+
|
|
139
|
+
```python
|
|
140
|
+
@annotate("*->string")
|
|
141
|
+
class my_upper(object):
|
|
142
|
+
def evaluate(self, s):
|
|
143
|
+
return s.upper() if s else s
|
|
144
|
+
```
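The snippet above does not show where `annotate` comes from. To exercise the class outside the function runtime, one option is a local stand-in decorator. This is a sketch under the assumption that the decorator only needs to record the signature string; the stub below is hypothetical and is not the runtime's real `annotate`.

```python
def annotate(signature):
    """Hypothetical local stand-in: stores the declared signature on the class."""
    def wrap(cls):
        cls.signature = signature
        return cls
    return wrap


@annotate("*->string")
class my_upper(object):
    def evaluate(self, s):
        # Pass None and empty strings through unchanged.
        return s.upper() if s else s


if __name__ == "__main__":
    print(my_upper().evaluate("hello"))
```

Keeping the stub in a separate test helper, rather than in the deployed source file, avoids shadowing the real decorator once the code runs remotely.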
|
|
145
|
+
|
|
146
|
+
### Local Testing
|
|
147
|
+
|
|
148
|
+
FC environments have no stdout and no stack traces. **Always test locally before each deployment**:
|
|
149
|
+
|
|
150
|
+
```bash
|
|
151
|
+
python3 -c "
|
|
152
|
+
import sys; sys.path.insert(0, 'src')
|
|
153
|
+
from my_upper import my_upper
|
|
154
|
+
print(my_upper().evaluate('hello')) # HELLO
|
|
155
|
+
"
|
|
156
|
+
```
|
|
157
|
+
|
|
158
|
+
### Key Takeaways
|
|
159
|
+
|
|
160
|
+
- `remote.udf.api='python3.mc.v0'` specifies the Python 3.10 runtime for FC
|
|
161
|
+
- Calling the function **requires the schema prefix**: `SELECT <schema>.my_upper('hello')` — omitting it results in `function not found`
|
|
162
|
+
- The first call may take 5-10 seconds because of the FC cold start; subsequent calls return at normal speed
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
## Scenario 2: Python ML Functions + Third-Party Dependency Packaging
|
|
167
|
+
|
|
168
|
+
> 5 ML/PII functions based on scikit-learn + jieba. Demonstrates how to correctly package third-party dependencies that include C extensions.
|
|
169
|
+
|
|
170
|
+
### Function List
|
|
171
|
+
|
|
172
|
+
| Function | Libraries used | Purpose |
|
|
173
|
+
|------|---------|------|
|
|
174
|
+
| `pii_mask` | re | Phone/email/ID masking |
|
|
175
|
+
| `feature_normalize` | numpy + sklearn | Numeric column normalization (minmax/zscore) |
|
|
176
|
+
| `anomaly_detect` | numpy + sklearn | Isolation Forest anomaly detection |
|
|
177
|
+
| `sentiment_score` | jieba | Chinese sentiment scoring (0-1) |
|
|
178
|
+
| `tfidf_keywords` | sklearn | TF-IDF keyword extraction |
|
|
179
|
+
|
|
180
|
+
### The Core Problem: FC Runs Linux, macOS Packages Won't Work
|
|
181
|
+
|
|
182
|
+
The FC runtime is **Linux x86_64 + Python 3.10**. Running `pip install scikit-learn` on macOS installs macOS-native binaries (Mach-O `.so` and `.dylib` files) that cannot be loaded on FC.
|
|
183
|
+
|
|
184
|
+
Solution: use two separate requirements files with two different install methods.
|
|
185
|
+
|
|
186
|
+
| File | Contents | Install method |
|
|
187
|
+
|------|--------|---------|
|
|
188
|
+
| `requirements.txt` | Packages with C extensions (scikit-learn, numpy) | `pip install --platform manylinux2014_x86_64 --only-binary :all:` |
|
|
189
|
+
| `requirements_pure.txt` | Pure Python packages (jieba) | Normal `pip install` |
|
|
190
|
+
|
|
191
|
+
**Why can't they be in the same file?** With `--only-binary :all:`, pip refuses to build from source and accepts only prebuilt wheels. A pure Python package that is published only as a source distribution (as appears to be the case for jieba) has no wheel to select, so pip fails with `No matching distribution found`.
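The two install methods can be sketched as two pip invocations that share one staging directory. This is an illustration only: the real `2-package.py` is not shown here, the `build/deps` directory is a hypothetical name, and `--python-version`, `--implementation`, and `--target` are standard pip options added by assumption alongside the flags the table lists.

```python
import subprocess
import sys


def pip_command(requirements, target, linux_binary):
    """Build a pip invocation that installs into `target` (hypothetical staging dir)."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements, "--target", target]
    if linux_binary:
        # Ask pip for prebuilt Linux x86_64 wheels for CPython 3.10, never source builds.
        cmd += [
            "--platform", "manylinux2014_x86_64",
            "--python-version", "3.10",
            "--implementation", "cp",
            "--only-binary=:all:",
        ]
    return cmd


def install_all(target="build/deps", run=subprocess.check_call):
    """Install both requirement files; `run` is injectable so the plan can be dry-run."""
    plan = [
        pip_command("requirements.txt", target, linux_binary=True),
        pip_command("requirements_pure.txt", target, linux_binary=False),
    ]
    for cmd in plan:
        run(cmd)
    return plan


if __name__ == "__main__":
    # Dry run: print the two commands instead of executing them.
    install_all(run=lambda cmd: print(" ".join(cmd)))
```

Passing a different `run` callable lets you inspect the exact commands before anything is downloaded.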
|
|
192
|
+
|
|
193
|
+
### Deploy
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
cd python_advanced
|
|
197
|
+
python 2-package.py # dual-mode packaging (~100 MB)
|
|
198
|
+
python ../1-check-config.py
|
|
199
|
+
python ../3-render-sql.py
|
|
200
|
+
cz-cli sql -f dist/4-deploy_generated.sql --write
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
### Local Testing
|
|
204
|
+
|
|
205
|
+
```bash
|
|
206
|
+
pip install -r requirements.txt -r requirements_pure.txt
|
|
207
|
+
python3 -c "
|
|
208
|
+
import sys; sys.path.insert(0, 'src')
|
|
209
|
+
from ml_toolkit import pii_mask, feature_normalize, sentiment_score
|
|
210
|
+
print(pii_mask().evaluate('My phone is 13812345678'))
|
|
211
|
+
print(feature_normalize().evaluate('[1,2,3,4,5]', 'minmax'))
|
|
212
|
+
print(sentiment_score().evaluate('The product quality is excellent'))
|
|
213
|
+
"
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
### Using in SQL
|
|
217
|
+
|
|
218
|
+
```sql
|
|
219
|
+
SELECT <schema>.pii_mask('Phone 13812345678, email alice@example.com');
|
|
220
|
+
SELECT <schema>.feature_normalize('[10,20,30,40,50]', 'minmax');
|
|
221
|
+
SELECT <schema>.anomaly_detect('[1,2,3,4,100]');
|
|
222
|
+
SELECT <schema>.sentiment_score('The product quality is excellent, shipping was fast, very satisfied!');
|
|
223
|
+
SELECT <schema>.tfidf_keywords('["AI and machine learning are the future","Deep learning achieves breakthroughs in image recognition"]', 3);
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
### Key Takeaways
|
|
227
|
+
|
|
228
|
+
- **C extension packages require Linux binary wheels** — macOS `.dylib` files cannot run on FC
|
|
229
|
+
- **Pure Python packages without wheels must be listed separately**: they fail to install under `--only-binary :all:`, so keep them out of the binary requirements file
|
|
230
|
+
- **Zip size determines cold start time**: scikit-learn + numpy comes to about 100 MB, and the first call takes 5-10 seconds
|
|
231
|
+
|
|
232
|
+
---
|
|
233
|
+
|
|
234
|
+
## Scenario 3: 30 AI SQL Functions — Package Once, Call Anywhere
|
|
235
|
+
|
|
236
|
+
> 30 AI functions share a single zip. Perform summarization, translation, sentiment analysis, OCR, and vector similarity search directly in SQL.
|
|
237
|
+
|
|
238
|
+
### Design Highlights
|
|
239
|
+
|
|
240
|
+
**One zip, 30 functions**: All 30 functions share a single `clickzetta_ai_functions_full.zip`. The DDL statements differ only in the class name, `AS 'ai_functions_complete.ai_xxx'`. Packaging and uploading a separate zip per function would mean 30 archives to build, upload, and keep in sync.
|
|
241
|
+
|
|
242
|
+
**API Key as a SQL parameter**: The Bailian API Key is not hardcoded in the source code or bundled into the zip — it is passed as a function parameter:
|
|
243
|
+
|
|
244
|
+
```sql
|
|
245
|
+
SELECT <schema>.ai_text_summarize('Artificial intelligence is changing the world.', '<your-api-key>');
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
If the API Key were hardcoded in the zip, a zip leak would mean a Key leak. Passing it as a parameter lets callers manage their own keys.
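In code, the pattern amounts to taking the key as an `evaluate` argument instead of reading it from the package. The sketch below is hypothetical: the class name `ai_demo_summarize`, the injectable `client` callable, and the error payload are assumptions for illustration, not the contents of `ai_functions_complete`.

```python
import json


class ai_demo_summarize(object):
    """Hypothetical function class: the key arrives per call, not from the package."""

    def __init__(self, client=None):
        # `client` stands in for whatever performs the model request (assumption).
        self._client = client or (lambda text, api_key: text[:20])

    def evaluate(self, text, api_key):
        if not api_key:
            # Fail with a JSON payload so the SQL caller still gets a parseable value.
            return json.dumps({"error": "api_key is required"})
        summary = self._client(text, api_key)
        return json.dumps({"summary": summary}, ensure_ascii=False)
```

Because the key never touches module state or files in the zip, nothing in the archive needs to change when a caller rotates a key.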
|
|
249
|
+
|
|
250
|
+
### Function Categories
|
|
251
|
+
|
|
252
|
+
| Category | Count | Typical functions |
|
|
253
|
+
|------|------|---------|
|
|
254
|
+
| Text processing | 8 | Summarization, translation, sentiment analysis, entity extraction, keywords, classification, cleaning, tagging |
|
|
255
|
+
| Vector processing | 5 | Embedding, similarity, clustering preparation, similar search, document search |
|
|
256
|
+
| Multimodal | 8 | Image description, OCR, image analysis, image embedding, image similarity, video summarization, chart analysis, document parsing |
|
|
257
|
+
| Business scenarios | 9 | Customer intent, sales scoring, review analysis, risk detection, contract extraction, resume parsing, customer segmentation, product description, industry classification |
|
|
258
|
+
|
|
259
|
+
### Deploy
|
|
260
|
+
|
|
261
|
+
```bash
|
|
262
|
+
cd python_ai_function
|
|
263
|
+
pip install -r requirements.txt # for local testing only
|
|
264
|
+
python 2-package.py --deps # package code + dashscope Linux dependencies
|
|
265
|
+
python 1-check-config.py # standalone validation (different config structure)
|
|
266
|
+
python 3-render-sql.py
|
|
267
|
+
cz-cli sql -f dist/4-deploy_generated.sql --write
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### Local Testing
|
|
271
|
+
|
|
272
|
+
FC has no logs and no stdout. **Test locally before deploying to save a lot of time:**
|
|
273
|
+
|
|
274
|
+
```bash
|
|
275
|
+
python3 -c "
|
|
276
|
+
import sys; sys.path.insert(0, 'src')
|
|
277
|
+
from ai_functions_complete import ai_text_summarize, ai_text_translate
|
|
278
|
+
print(ai_text_summarize().evaluate('Hello world', '<your-api-key>'))
|
|
279
|
+
print(ai_text_translate().evaluate('Hello', 'Chinese', '<your-api-key>'))
|
|
280
|
+
"
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
### Using in SQL
|
|
284
|
+
|
|
285
|
+
```sql
|
|
286
|
+
-- Summarization
|
|
287
|
+
SELECT <schema>.ai_text_summarize('Artificial intelligence is changing the world...', '<key>');
|
|
288
|
+
|
|
289
|
+
-- Translation
|
|
290
|
+
SELECT <schema>.ai_text_translate('Hello, how are you?', 'Chinese', '<key>');
|
|
291
|
+
|
|
292
|
+
-- Sentiment analysis
|
|
293
|
+
SELECT <schema>.ai_text_sentiment_analyze('The product quality is excellent!', '<key>');
|
|
294
|
+
|
|
295
|
+
-- Semantic similarity between two texts
|
|
296
|
+
SELECT <schema>.ai_semantic_similarity('Apples are tasty', 'Apples are a healthy fruit', '<key>');
|
|
297
|
+
|
|
298
|
+
-- Image description
|
|
299
|
+
SELECT <schema>.ai_image_describe('<image-url>', '<key>');
|
|
300
|
+
|
|
301
|
+
-- Contract extraction
|
|
302
|
+
SELECT <schema>.ai_contract_extract('<contract text>', '<key>');
|
|
303
|
+
```
|
|
304
|
+
|
|
305
|
+
The functions return JSON; use `JSON_EXTRACT` to retrieve individual fields:
|
|
306
|
+
|
|
307
|
+
```sql
|
|
308
|
+
SELECT JSON_EXTRACT(
|
|
309
|
+
<schema>.ai_text_summarize('Artificial intelligence is changing the world...', '<key>'),
|
|
310
|
+
'$.summary'
|
|
311
|
+
);
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
### Key Takeaways
|
|
315
|
+
|
|
316
|
+
- **30 functions share one zip**: adding a new function takes about 20 lines of prompt code plus one DDL statement
|
|
317
|
+
- **API Key is not hardcoded**: it is passed as a SQL parameter, so a leaked zip does not expose the key
|
|
318
|
+
- **config.json serves dual roles**: embedded in the zip at build time (for runtime model config), and used to replace SQL placeholders at render time
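The render-time role of `config.json` can be pictured as plain placeholder substitution. This is a minimal sketch under stated assumptions: the `{{name}}` placeholder syntax and the `render_sql` helper are hypothetical, since the actual template format used by `3-render-sql.py` is not shown here.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # hypothetical {{name}} syntax


def render_sql(template, config):
    """Replace every {{name}} in `template` with config[name]; fail on unknown names."""
    def substitute(match):
        key = match.group(1)
        if key not in config:
            raise KeyError(f"config.json has no value for placeholder '{key}'")
        return str(config[key])
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    config = json.loads('{"schema": "demo_schema"}')  # stands in for reading config.json
    print(render_sql("SELECT {{schema}}.my_upper('hello');", config))
```

Failing loudly on an unknown placeholder is a deliberate choice in this sketch: a silently unrendered placeholder would only surface later as a confusing SQL error.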
|
|
319
|
+
|
|
320
|
+
---
|
|
321
|
+
|
|
322
|
+
## Scenario 4: Java UDF/UDAF/UDTF
|
|
323
|
+
|
|
324
|
+
> Java external functions support three types: UDF (one row in, one row out), UDAF (multiple rows in, one row out), UDTF (one row in, multiple rows out).
|
|
325
|
+
|
|
326
|
+
### Quick Overview of the Three Types
|
|
327
|
+
|
|
328
|
+
| Type | Base class or interface | Input → Output | DDL special property | Example function |
|
|
329
|
+
|------|------|-------------|-------------|---------|
|
|
330
|
+
| **UDF** | `GenericUDF` | 1 row → 1 row | — | `pii_mask` PII masking |
|
|
331
|
+
| **UDAF** | `GenericUDAFResolver2` | Multiple rows → 1 row | `AGGREGATOR` | `agg_stats` SUM/AVG/MIN/MAX/COUNT |
|
|
332
|
+
| **UDTF** | `GenericUDTF` | 1 row → N rows | `TABLE_VALUED` | `log_explode` log row expansion |
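The three row shapes can be modeled in a few lines of Python. This is a conceptual sketch, not the Java implementation: `udf_upper` is a stand-in for any one-in, one-out function, the `[timestamp] event` log format is inferred from the UDTF example in this guide, and the empty-input behavior of `udaf_agg_stats` is an assumption.

```python
import re

LOG_LINE = re.compile(r"\[(?P<ts>[^\]]+)\]\s*(?P<event>.*)")  # format assumed from the example


def udf_upper(value):
    """UDF shape: one input row produces exactly one output value."""
    return value.upper() if value else value


def udaf_agg_stats(values):
    """UDAF shape: many input rows collapse into one result."""
    values = list(values)
    if not values:
        return None  # assumption: no rows means no result
    total = sum(values)
    return [total, total / len(values), min(values), max(values), len(values)]


def udtf_log_explode(log):
    """UDTF shape: one input row yields zero or more output rows."""
    for line in log.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            yield match.group("ts"), match.group("event")
```

The generator in `udtf_log_explode` mirrors why a UDTF behaves like a table rather than a scalar: its result is a sequence of rows, so it has to appear where a row source is allowed.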
|
|
333
|
+
|
|
334
|
+
### Differences from Python External Functions
|
|
335
|
+
|
|
336
|
+
| | Python | Java |
|
|
337
|
+
|------|--------|------|
|
|
338
|
+
| Runtime | Python 3.10 | **Java 8** (Java 9+ not supported) |
|
|
339
|
+
| DDL property | `python3.mc.v0` | `java8.hive2.v0` |
|
|
340
|
+
| Function types | UDF | UDF / UDAF / UDTF |
|
|
341
|
+
| Packaging | zip | Maven `jar-with-dependencies` → zip |
|
|
342
|
+
| Dependencies | pip `--platform manylinux` | Maven `scope=provided` (Hive runtime included) |
|
|
343
|
+
|
|
344
|
+
### Deploy
|
|
345
|
+
|
|
346
|
+
```bash
|
|
347
|
+
cd java_udf
|
|
348
|
+
python 2-package.py # Maven compile + zip packaging
|
|
349
|
+
python ../1-check-config.py
|
|
350
|
+
python ../3-render-sql.py
|
|
351
|
+
cz-cli sql -f dist/4-deploy_generated.sql --write
|
|
352
|
+
```
|
|
353
|
+
|
|
354
|
+
### UDF Example
|
|
355
|
+
|
|
356
|
+
```sql
|
|
357
|
+
SELECT <schema>.pii_mask('My phone is 13812345678, email alice@example.com');
|
|
358
|
+
```
|
|
359
|
+
|
|
360
|
+
### UDAF Example
|
|
361
|
+
|
|
362
|
+
```sql
|
|
363
|
+
INSERT INTO <schema>.java_udf_test_scores VALUES (3.5), (4.2), (2.8), (5.0), (3.9);
|
|
364
|
+
SELECT <schema>.agg_stats(val) FROM <schema>.java_udf_test_scores;
|
|
365
|
+
-- → [sum, avg, min, max, count]
|
|
366
|
+
```
|
|
367
|
+
|
|
368
|
+
### UDTF Example
|
|
369
|
+
|
|
370
|
+
A UDTF must be invoked through the `LATERAL` syntax; calling it as `SELECT func(x)` is not supported:
|
|
371
|
+
|
|
372
|
+
```sql
|
|
373
|
+
SELECT t.ts, t.event
|
|
374
|
+
FROM (
|
|
375
|
+
SELECT '[2025-01-15 10:30:00] User login
|
|
376
|
+
[2025-01-15 10:35:00] Query order' AS log
|
|
377
|
+
) s, LATERAL <schema>.log_explode(s.log) t;
|
|
378
|
+
-- → each log line is expanded into one row
|
|
379
|
+
```
|
|
380
|
+
|
|
381
|
+
### Key Takeaways
|
|
382
|
+
|
|
383
|
+
- **Java version must be 8** — FC only has a Java 8 runtime
|
|
384
|
+
- **Hive dependency scope=provided**: FC includes `hive-exec.jar`; do not bundle it in the zip to avoid conflicts
|
|
385
|
+
- **UDAF DDL must include `AGGREGATOR`, and UDTF DDL must include `TABLE_VALUED`**: if either is omitted, function creation succeeds but calls to the function fail
|
|
386
|
+
- **UDTF must use `LATERAL`** syntax; `SELECT func(x)` directly is not supported
|
|
387
|
+
|
|
388
|
+
---
|
|
389
|
+
|
|
390
|
+
## Function Quick Reference
|
|
391
|
+
|
|
392
|
+
### Python (quickstart + advanced + AI function)
|
|
393
|
+
|
|
394
|
+
| Function | Purpose | Source |
|
|
395
|
+
|------|------|------|
|
|
396
|
+
| `my_upper` | String to uppercase | quickstart |
|
|
397
|
+
| `pii_mask` | Phone/email/ID masking | advanced |
|
|
398
|
+
| `feature_normalize` | Numeric column normalization | advanced |
|
|
399
|
+
| `anomaly_detect` | Isolation Forest anomaly detection | advanced |
|
|
400
|
+
| `sentiment_score` | Chinese sentiment scoring | advanced |
|
|
401
|
+
| `tfidf_keywords` | TF-IDF keyword extraction | advanced |
|
|
402
|
+
| `ai_text_summarize` | Text summarization | AI function |
|
|
403
|
+
| `ai_text_translate` | Text translation | AI function |
|
|
404
|
+
| `ai_text_sentiment_analyze` | Sentiment analysis | AI function |
|
|
405
|
+
| `ai_text_extract_entities` | Entity extraction | AI function |
|
|
406
|
+
| `ai_text_extract_keywords` | Keyword extraction | AI function |
|
|
407
|
+
| `ai_text_classify` | Text classification | AI function |
|
|
408
|
+
| `ai_text_clean_normalize` | Text cleaning | AI function |
|
|
409
|
+
| `ai_auto_tag_generate` | Auto tagging | AI function |
|
|
410
|
+
| `ai_text_to_embedding` | Text embedding | AI function |
|
|
411
|
+
| `ai_semantic_similarity` | Semantic similarity | AI function |
|
|
412
|
+
| `ai_text_clustering_prepare` | Clustering preparation | AI function |
|
|
413
|
+
| `ai_find_similar_text` | Similar text search | AI function |
|
|
414
|
+
| `ai_document_search` | Document search | AI function |
|
|
415
|
+
| `ai_image_describe` | Image description | AI function |
|
|
416
|
+
| `ai_image_ocr` | Image OCR | AI function |
|
|
417
|
+
| `ai_image_analyze` | Image analysis | AI function |
|
|
418
|
+
| `ai_image_to_embedding` | Image embedding | AI function |
|
|
419
|
+
| `ai_image_similarity` | Image similarity | AI function |
|
|
420
|
+
| `ai_video_summarize` | Video summarization | AI function |
|
|
421
|
+
| `ai_chart_analyze` | Chart analysis | AI function |
|
|
422
|
+
| `ai_document_parse` | Document parsing | AI function |
|
|
423
|
+
| `ai_customer_intent_analyze` | Customer intent analysis | AI function |
|
|
424
|
+
| `ai_sales_lead_score` | Sales lead scoring | AI function |
|
|
425
|
+
| `ai_review_analyze` | Review analysis | AI function |
|
|
426
|
+
| `ai_risk_text_detect` | Risk text detection | AI function |
|
|
427
|
+
| `ai_contract_extract` | Contract extraction | AI function |
|
|
428
|
+
| `ai_resume_parse` | Resume parsing | AI function |
|
|
429
|
+
| `ai_customer_segment` | Customer segmentation | AI function |
|
|
430
|
+
| `ai_product_description_generate` | Product description generation | AI function |
|
|
431
|
+
| `ai_industry_classification` | Industry classification | AI function |
|
|
432
|
+
|
|
433
|
+
### Java
|
|
434
|
+
|
|
435
|
+
| Function | Type | Purpose |
|
|
436
|
+
|------|------|------|
|
|
437
|
+
| `pii_mask` | UDF | PII masking |
|
|
438
|
+
| `agg_stats` | UDAF | SUM/AVG/MIN/MAX/COUNT |
|
|
439
|
+
| `log_explode` | UDTF | Log row expansion |
|
|
440
|
+
|
|
441
|
+
---
|
|
442
|
+
|
|
443
|
+
## Troubleshooting
|
|
444
|
+
|
|
445
|
+
| Error | Cause | Solution |
|
|
446
|
+
|------|------|------|
|
|
447
|
+
| `function not found` | Missing schema prefix | Add `<schema>.` prefix when calling |
|
|
448
|
+
| `HTTP_GENERAL_ERROR(640)` | RAM trust policy not configured / Bucket in a different region | Check RAM role trust policy (must include `1384322691904283`); confirm Bucket and FC are in the same region |
|
|
449
|
+
| `AccessDenied` | RAM role missing OSS permissions | Add `AliyunOSSFullAccess` or a custom OSS policy |
|
|
450
|
+
| `ImportError: No module named 'sklearn'` | Dependencies not packaged in zip | Re-run `python 2-package.py` (advanced) / `python 2-package.py --deps` (AI function) |
|
|
451
|
+
| `OSError: cannot open shared object file` | macOS `.dylib` was packaged | Confirm `--platform manylinux2014_x86_64` was used |
|
|
452
|
+
| `ClassNotFoundException` | Wrong Java class name or package name | Check that the `AS` path matches the actual class name inside the jar |
|
|
453
|
+
| UDAF created successfully but call fails | DDL missing `AGGREGATOR` | Check `WITH PROPERTIES ('remote.udf.category'='AGGREGATOR')` |
|
|
454
|
+
| UDTF created successfully but `not a table function` | DDL missing `TABLE_VALUED` | Check `WITH PROPERTIES ('remote.udf.category'='TABLE_VALUED')` |
|
|
455
|
+
| First call takes a long time to return | FC cold start | Wait 5-10 seconds; the call has not hung |
|
|
456
|
+
| Changes to `config.json` not reflected after deployment | Forgot to re-render | Re-run `python ../3-render-sql.py` |
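The `cannot open shared object file` case can be caught before upload by inspecting the zip. The sketch below assumes that native libraries in the package end in `.so` (or carry a versioned `.so.N` suffix) and flags any that do not start with the ELF magic bytes, along with any `.dylib` file; the helper name `find_foreign_binaries` is hypothetical.

```python
import zipfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every Linux shared object


def find_foreign_binaries(zip_path):
    """Return archive members that look like native libraries but are not ELF files."""
    suspects = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = info.filename
            if name.endswith(".dylib"):
                # macOS dynamic libraries can never load on a Linux runtime.
                suspects.append(name)
            elif name.endswith(".so") or ".so." in name:
                with archive.open(info) as member:
                    if member.read(4) != ELF_MAGIC:
                        suspects.append(name)
    return suspects
```

An empty list does not prove the package will load, since an ELF file can still target the wrong architecture, but a non-empty list is a reliable sign that packages were installed for the wrong platform.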
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## Related Documentation
|
|
461
|
+
|
|
462
|
+
- [External Function Introduction](RemoteFunction-intro.md)
|
|
463
|
+
- [Development Guide: External Function (Python3)](RemoteFunction-dev-guide-python3.md)
|
|
464
|
+
- [Development Guide: External Function (Java)](external-function-dev-guide-java.md)
|
|
465
|
+
- [CREATE EXTERNAL FUNCTION](create_external_function.md)
|
|
466
|
+
- [GitHub: clickzetta_external_function](https://github.com/clickzetta/clickzetta_external_function)
|
|
@@ -39,18 +39,16 @@ If both conditions are met, the scheduling system submits the instance task for
|
|
|
39
39
|
If either condition is not met, the task instance remains in the **Not Started** state. To keep large numbers of instances from lingering in that state because of configuration issues or paused upstream tasks, which would waste resources, the system kills an instance once the user-configured "Scheduling Wait Duration" is exceeded and marks it as failed.
|
|
40
40
|
|
|
41
41
|
| Note: If a periodic task in the production environment has no scheduling wait duration configured, the default of 3 days applies: 3 days after its scheduled time, a task instance that is still **Not Started** changes to **Failed**, regardless of whether all upstream tasks have succeeded. If the user has configured a "Scheduling Wait Duration" in the scheduling configuration, the same transition happens as soon as that duration is exceeded. |
|
|
42
|
-
|
|
|
42
|
+
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
|
43
43
|
|
|
44
44
|
## Scheduling Properties Configuration
|
|
45
45
|
|
|
46
46
|
In the **Development** module, open any task and click Schedule Configuration to configure its scheduling properties: basic information, scheduling time, instance information, scheduling dependencies, and task outputs.
|
|
47
47
|
|
|
48
|
-
*This document focuses on* scheduling time, instance information, and scheduling dependencies. *For other content, please refer to the help documentation
|
|
48
|
+
*This document focuses on* scheduling time, instance information, and scheduling dependencies. *For other content, please refer to the help documentation*.
|
|
49
49
|
|
|
50
50
|
### Scheduling Time
|
|
51
51
|
|
|
52
|
-

|
|
53
|
-
|
|
54
52
|
***
|
|
55
53
|
|
|
56
54
|
**Scheduling Cycle**: Configure the scheduling cycle, scheduling frequency, scheduling start time, and scheduling end time. After the user configures this information, the system will automatically generate a standard cron expression that complies with the rules. This automatically generated time expression will be parsed by the scheduling engine, which uses a time-wheel algorithm to derive all specific execution instances that meet the criteria within future cycles.
|
|
@@ -77,7 +75,7 @@ If the user specifies a date, instances of this scheduling task will no longer b
|
|
|
77
75
|
|
|
78
76
|
#### Instance Generation
|
|
79
77
|
|
|
80
|
-
|
|
78
|
+
^
|
|
81
79
|
|
|
82
80
|
**Instance Generation**: Based on the user-configured **scheduling configuration information and dependency relationships**, **instances are generated according to the instance generation rule selected by the user**.
|
|
83
81
|
|
|
@@ -87,7 +85,7 @@ Effective After Publishing: Takes effect immediately after submitting the task.
|
|
|
87
85
|
* If the submission time is later than the scheduling start time, instances are updated/generated starting from the submission/publishing time.
|
|
88
86
|
* Historically generated instances are not affected; only subsequent instances are changed starting from the start time.
|
|
89
87
|
|
|
90
|
-
|
|
88
|
+
^
|
|
91
89
|
|
|
92
90
|
Next-Day Effective: All instances that need to be executed on the next day are uniformly generated at 22:00 of the current day.
|
|
93
91
|
|
|
@@ -95,7 +93,7 @@ Next-Day Effective: All instances that need to be executed on the next day are u
|
|
|
95
93
|
|
|
96
94
|
#### Instance Rerun Methods
|
|
97
95
|
|
|
98
|
-
|
|
96
|
+
^
|
|
99
97
|
|
|
100
98
|
The product provides three rerun methods, which mainly affect behavior in the following two scenarios:
|
|
101
99
|
|
|
@@ -118,13 +116,13 @@ When the user configures a run timeout duration, if the task instance's run time
|
|
|
118
116
|
|
|
119
117
|
#### Scheduling Wait Duration
|
|
120
118
|
|
|
121
|
-
|
|
119
|
+
^
|
|
122
120
|
|
|
123
121
|
Once a scheduling wait duration is configured, a task instance that is still waiting when that duration has elapsed after its scheduled run time is forcibly killed, regardless of whether its upstream tasks have succeeded. This setting is mainly used to prevent the resource waste caused by large numbers of downstream instances that cannot execute because upstream tasks are paused. Configure it with caution.
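The rule can be modeled as a small state function. This is a sketch under stated assumptions: it takes the two submission conditions to be "scheduled time reached" and "upstream tasks succeeded", uses the 3-day default stated earlier in this document, and the `Submitted` label and the `instance_state` helper are illustrative names, not product terminology.

```python
from datetime import datetime, timedelta

DEFAULT_WAIT = timedelta(days=3)  # default wait stated earlier in this document


def instance_state(scheduled_at, now, upstream_succeeded, wait=DEFAULT_WAIT):
    """Classify a task instance at time `now` under the scheduling wait rule."""
    if now < scheduled_at:
        return "Not Started"  # scheduled time not reached yet
    if upstream_succeeded:
        return "Submitted"  # both conditions met (label is illustrative)
    if now - scheduled_at > wait:
        return "Failed"  # wait duration exceeded: the instance is killed
    return "Not Started"  # still waiting on upstream tasks


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1)
    print(instance_state(t0, t0 + timedelta(days=4), upstream_succeeded=False))
```

A short custom `wait` makes blocked instances fail fast, which is why the setting deserves caution: an upstream task that is merely slow will take its downstream instances down with it.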
|
|
124
122
|
|
|
125
123
|
#### Delayed Run Skip Duration
|
|
126
124
|
|
|
127
|
-
|
|
125
|
+
^
|
|
128
126
|
|
|
129
127
|
The determination is based on the difference between the time the task instance enters the running state and the configured scheduled time.
|
|
130
128
|
|