@clickzetta/cz-cli-darwin-arm64 0.5.15 → 0.5.17
- package/bin/cz-cli +0 -0
- package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
- package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
- package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
- package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
- package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
- package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
- package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
- package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
- package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
- package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
- package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
- package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
- package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
- package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
- package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
- package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
- package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
- package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
- package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
- package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
- package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
- package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
- package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
- package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
- package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
- package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
- package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
- package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
- package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
- package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
- package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
- package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
- package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
- package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
- package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
- package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
- package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
- package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
- package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
- package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
- package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
- package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
- package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
- package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
- package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
- package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
- package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
- package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
- package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
- package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
- package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
- package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
- package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
- package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
- package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
- package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
- package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
- package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
- package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
- package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
- package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
- package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
- package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
- package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
- package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
- package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
- package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
- package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
- package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
- package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
- package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
- package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
- package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
- package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
- package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
- package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
- package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
- package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
- package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
- package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
- package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
- package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
- package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
- package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
- package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
- package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
- package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
- package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
- package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
- package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
- package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
- package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
- package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
- package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
- package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
- package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
- package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
- package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
- package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
- package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
- package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
- package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
- package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
- package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
- package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
- package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
- package/package.json +1 -1
- package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
- package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -6,8 +6,6 @@ Welcome to Lakehouse! This guide has designed a series of carefully orchestrated
 
 This guide includes the following experience content:
 
-:-: ![Lakehouse Experience Roadmap](.topwrite/assets/image_1748525041884.png)
-
 1. **Run Your First SQL Query** (2-3 minutes)
 Experience Lakehouse's easy-to-use SQL analysis environment.
 
@@ -40,7 +38,6 @@ Log into Lakehouse Studio and create a new workspace: `lakehouse_quick_experienc
 
 ^
 
-:-: ![Create Workspace](.topwrite/assets/image_1748525095697.png)
 
 ^
 
@@ -48,13 +45,13 @@ Enter the "Development" page and switch the workspace to the newly created works
 
 ^
 
-
+
 
 ^
 
-Entry for creating a new SQL worksheet
+Entry for creating a new SQL worksheet.
+
 
-:-: ![Create SQL Worksheet](.topwrite/assets/image_1748525121776.png)
 
 ^
 
@@ -62,7 +59,7 @@ Create a new SQL worksheet named "00\_Environment\_Preparation".
 
 ^
 
-
+
 
 ^
 
@@ -77,11 +74,11 @@ USE SCHEMA happy_path;
 
 -- Create the first Virtual Compute Cluster (General type)
 -- Virtual Compute Clusters are the core concept of Lakehouse, representing on-demand allocatable computing resources
-CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
-VCLUSTER_SIZE = 1
-VCLUSTER_TYPE = GENERAL
-AUTO_SUSPEND_IN_SECOND = 60
-AUTO_RESUME = TRUE
+CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
+VCLUSTER_SIZE = 1
+VCLUSTER_TYPE = GENERAL
+AUTO_SUSPEND_IN_SECOND = 60
+AUTO_RESUME = TRUE
 COMMENT 'My first virtual compute cluster (General)';
 
 -- Use this cluster
@@ -133,7 +130,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
 (103, 'Smart Watch', 1299.50, 'Wearables'),
 (104, 'Portable Power Bank', 159.90, 'Accessories'),
 (105, 'Mechanical Keyboard', 349.00, 'Computer Accessories');
-
+
 -- Query the inserted data
 SELECT * FROM happy_path.my_first_table;
 ```
@@ -142,7 +139,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
 
 ```sql
 -- Count products and average price by category
-SELECT
+SELECT
 category,
 COUNT(*) as product_count,
 AVG(price) as avg_price,
@@ -174,11 +171,11 @@ Next, let's create a different type of compute cluster to understand how to choo
 ```sql
 -- Create an Analytics-type Virtual Compute Cluster
 -- Analytics clusters optimize query performance, suitable for low-latency, high-concurrency analysis scenarios
-CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
-VCLUSTER_SIZE = 1
-VCLUSTER_TYPE = ANALYTICS
-AUTO_SUSPEND_IN_SECOND = 60
-AUTO_RESUME = TRUE
+CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
+VCLUSTER_SIZE = 1
+VCLUSTER_TYPE = ANALYTICS
+AUTO_SUSPEND_IN_SECOND = 60
+AUTO_RESUME = TRUE
 COMMENT 'My second virtual compute cluster (Analytics)';
 ```
 
@@ -283,7 +280,7 @@ Compute-storage separation is a core architectural feature of Lakehouse, allowin
 INSERT INTO happy_path.demo_dataset VALUES
 (6, 432.10, 'Text-6', CURRENT_TIMESTAMP()),
 (7, 789.65, 'Text-7', CURRENT_TIMESTAMP());
-
+
 -- Query the updated dataset
 SELECT * FROM happy_path.demo_dataset ORDER BY id;
 ```
@@ -389,7 +386,7 @@ The Lakehouse unified architecture allows you to directly query files in multipl
 (1003, 205, DATE '2023-02-02', 1, 899.00, 'C5003'),
 (1004, 204, DATE '2023-02-03', 1, 599.00, 'C5001'),
 (1005, 202, DATE '2023-02-03', 1, 3799.00, 'C5004');
-
+
 -- Export sales data to User Volume (CSV format)
 COPY INTO USER VOLUME
 SUBDIRECTORY 'lake_demo/sales_csv'
@@ -518,7 +515,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
 SELECT * FROM happy_path.all_orders ORDER BY order_time DESC;
 
 -- View order statistics
-SELECT
+SELECT
 data_source,
 COUNT(*) as order_count,
 SUM(order_amount) as total_amount
@@ -538,7 +535,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
 
 ```sql
 -- Query order statistics again
-SELECT
+SELECT
 data_source,
 COUNT(*) as order_count,
 SUM(order_amount) as total_amount
@@ -577,13 +574,13 @@ Lakehouse supports efficient vector search and inverted index search, which can
 description STRING,
 price DECIMAL(10,2),
 vec VECTOR(FLOAT, 16), -- 16-dimensional vector representing product features
-
+
 -- Create vector index
 INDEX product_vec_idx (vec) USING VECTOR PROPERTIES (
-"scalar.type" = "f32",
+"scalar.type" = "f32",
 "distance.function" = "l2_distance"
 ),
-
+
 -- Create inverted index for full-text search
 INDEX product_description_idx (description) INVERTED PROPERTIES (
 'analyzer' = 'chinese'
@@ -596,31 +593,31 @@ Lakehouse supports efficient vector search and inverted index search, which can
 ```sql
 -- Insert sample data with vectors
 INSERT INTO happy_path.product_search_demo VALUES
-(1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
+(1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
 vector(0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1)),
-(1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
+(1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
 vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)),
-(1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
+(1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
 vector(0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3));
 
 -- Continue inserting more data
 INSERT INTO happy_path.product_search_demo VALUES
-(1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
+(1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
 vector(0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4)),
-(1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
+(1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
 vector(0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5)),
-(1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
+(1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
 vector(0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6)),
-(1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
+(1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
 vector(0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7));
 
 -- Continue inserting remaining data
 INSERT INTO happy_path.product_search_demo VALUES
-(1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
+(1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
 vector(0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8)),
-(1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
+(1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
 vector(0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9)),
-(1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
+(1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
 vector(1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0));
 ```
 
@@ -651,7 +648,7 @@ LIMIT 5;
 
 ```sql
 -- Use inverted index for keyword search - find products with "high-performance" in the description
-SELECT
+SELECT
 product_id,
 product_name,
 category,
@@ -666,7 +663,7 @@ LIMIT 5;
 
 ```sql
 -- Hybrid query: find products similar to the reference vector and containing "gaming" in the description
-SELECT
+SELECT
 product_id,
 product_name,
 category,
@@ -674,7 +671,7 @@ LIMIT 5;
 price,
 l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
 FROM happy_path.product_search_demo
-WHERE
+WHERE
 match_phrase(description, 'gaming', MAP('analyzer', 'chinese')) AND
 l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
 ORDER BY distance
@@ -685,7 +682,7 @@ LIMIT 5;
|
|
|
685
682
|
|
|
686
683
|
```sql
|
|
687
684
|
-- Find products priced between 500-10000, with "high-performance" or "professional" in the description, and high vector similarity
|
|
688
|
-
SELECT
|
|
685
|
+
SELECT
|
|
689
686
|
product_id,
|
|
690
687
|
product_name,
|
|
691
688
|
category,
|
|
@@ -693,9 +690,9 @@ LIMIT 5;
|
|
|
693
690
|
price,
|
|
694
691
|
l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
|
|
695
692
|
FROM happy_path.product_search_demo
|
|
696
|
-
WHERE
|
|
693
|
+
WHERE
|
|
697
694
|
price BETWEEN 500 AND 10000 AND
|
|
698
|
-
(match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
|
|
695
|
+
(match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
|
|
699
696
|
match_phrase(description, 'professional', MAP('analyzer', 'chinese'))) AND
|
|
700
697
|
l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
|
|
701
698
|
ORDER BY distance
|
|
@@ -749,15 +746,15 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
|
|
|
749
746
|
```sql
|
|
750
747
|
-- Create a date dimension table
|
|
751
748
|
CREATE TABLE IF NOT EXISTS happy_path.date_dim AS
|
|
752
|
-
SELECT DISTINCT
|
|
749
|
+
SELECT DISTINCT
|
|
753
750
|
date_time::DATE as date_id,
|
|
754
751
|
YEAR(date_time) as year,
|
|
755
752
|
MONTH(date_time) as month,
|
|
756
753
|
DAY(date_time) as day,
|
|
757
754
|
DAYOFWEEK(date_time) as day_of_week,
|
|
758
|
-
CASE
|
|
759
|
-
WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
|
|
760
|
-
ELSE false
|
|
755
|
+
CASE
|
|
756
|
+
WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
|
|
757
|
+
ELSE false
|
|
761
758
|
END as is_weekend
|
|
762
759
|
FROM happy_path.sales_data;
|
|
763
760
|
|
|
@@ -770,7 +767,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
|
|
|
770
767
|
```sql
|
|
771
768
|
-- Create a sales summary table
|
|
772
769
|
CREATE TABLE IF NOT EXISTS happy_path.sales_summary AS
|
|
773
|
-
SELECT
|
|
770
|
+
SELECT
|
|
774
771
|
d.date_id,
|
|
775
772
|
d.year,
|
|
776
773
|
d.month,
|
|
@@ -800,7 +797,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
|
|
|
800
797
|
|
|
801
798
|
```sql
|
|
802
799
|
-- Analyze sales trends using window functions
|
|
803
|
-
SELECT
|
|
800
|
+
SELECT
|
|
804
801
|
date_id,
|
|
805
802
|
category,
|
|
806
803
|
total_sales,
|
|
@@ -816,7 +813,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
|
|
|
816
813
|
```sql
|
|
817
814
|
-- Create a business insights view
|
|
818
815
|
CREATE OR REPLACE VIEW happy_path.business_insights AS
|
|
819
|
-
SELECT
|
|
816
|
+
SELECT
|
|
820
817
|
category,
|
|
821
818
|
year,
|
|
822
819
|
month,
|
|
@@ -837,7 +834,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
|
|
|
837
834
|
|
|
838
835
|
```sql
|
|
839
836
|
-- Analyze sales ranking and proportion by category
|
|
840
|
-
SELECT
|
|
837
|
+
SELECT
|
|
841
838
|
category,
|
|
842
839
|
SUM(total_sales) as category_sales,
|
|
843
840
|
RANK() OVER (ORDER BY SUM(total_sales) DESC) as sales_rank,
|
|
@@ -973,8 +970,8 @@ Now you can start applying Lakehouse to actual business scenarios and enjoy a si
|
|
|
973
970
|
|
|
974
971
|
## References
|
|
975
972
|
|
|
976
|
-
[Key Concepts](
|
|
973
|
+
[Key Concepts](key-concepts.md)
|
|
977
974
|
[Virtual Compute Cluster](getting_started_with_vcluster_for_processing_analytics.md)
|
|
978
975
|
[Volume](datalake_volume.md)
|
|
979
|
-
[Vector Index](
|
|
976
|
+
[Vector Index](vector-search.md)
|
|
980
977
|
[Inverted Index](inverted-index.md)
|
|
@@ -0,0 +1,380 @@
|
|
|
1
|
+
# Volume + Pipe + Dynamic Table End-to-End Practice
|
|
2
|
+
|
|
3
|
+
"Data lake acceleration" refers to using the three capabilities of object storage mounting (Volume), continuous data ingestion (Pipe), and incremental computation (Dynamic Table) to directly query, process, and consume file data in object storage using Serverless compute—without migrating data—replacing traditional Spark/Hive ETL and Presto/Trino ad hoc queries.
|
|
4
|
+
|
|
5
|
+
Applicable scenarios:
|
|
6
|
+
- **Automatic file ingestion**: CSV/Parquet files periodically uploaded to OSS/COS/S3 are automatically detected and ingested by Pipe, no manual trigger required
|
|
7
|
+
- **Incremental ETL**: After files are ingested, Dynamic Table automatically computes aggregated metrics incrementally, so T+1 reports are generated without delay
|
|
8
|
+
- **Legacy data activation**: Large volumes of historical files in object storage can be queried directly via Volume mount, no data migration required
|
|
9
|
+
|
|
10
|
+
Core data flow:
|
|
11
|
+
|
|
12
|
+
```
|
|
13
|
+
OSS/COS/S3 files → External Volume (mount) → Pipe (continuous ingestion) → Target table → Dynamic Table (incremental aggregation)
|
|
14
|
+
↕ ↕
|
|
15
|
+
COPY INTO/SELECT FROM COPY INTO/SELECT FROM
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Core Concepts
|
|
21
|
+
|
|
22
|
+
| Object | Description | Analogy |
|
|
23
|
+
|------|------|------|
|
|
24
|
+
| **External Volume** | Mounts OSS/COS/S3 path for zero-copy access | "Filesystem" of Lakehouse |
|
|
25
|
+
| **Pipe** | Continuously running data ingestion pipeline, automatically detects new files | Conveyor belt—files are ingested as soon as they are uploaded |
|
|
26
|
+
| **Dynamic Table** | Materialized aggregation table that automatically refreshes incrementally | Replaces scheduled ETL jobs |
|
|
27
|
+
|
|
28
|
+
The three work together to form a **self-driving data pipeline**: file upload → automatic ingestion → automatic aggregation, fully automated with no manual scheduling.
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## SQL Commands Involved
|
|
33
|
+
|
|
34
|
+
| Command / Function | Purpose | Use Case |
|
|
35
|
+
|------------|------|---------|
|
|
36
|
+
| `CREATE STORAGE CONNECTION` | Establish object storage authentication channel | One-time setup, shared by all Volumes |
|
|
37
|
+
| `CREATE EXTERNAL VOLUME` | Mount object storage path to a Schema | Configure once per Bucket subdirectory |
|
|
38
|
+
| `COPY INTO VOLUME` | Export data to Volume | Generate files for downstream consumption |
|
|
39
|
+
| `SELECT FROM VOLUME` | Directly query files in Volume | Ad hoc queries, data exploration |
|
|
40
|
+
| `DIRECTORY()` | List files in a Volume | View file list, validate exports |
|
|
41
|
+
| `ALTER VOLUME REFRESH` | Manually refresh Volume directory cache | Use when `AUTO_REFRESH=FALSE` |
|
|
42
|
+
| `CREATE PIPE` | Create continuous data ingestion pipeline | Automatic file ingestion |
|
|
43
|
+
| `ALTER PIPE` | Pause/resume Pipe | Operations management |
|
|
44
|
+
| `DESC PIPE EXTENDED` | View Pipe status and configuration | Monitoring, troubleshooting |
|
|
45
|
+
| `load_history()` | Query table's historical load records | Validate Pipe loading, troubleshoot deduplication |
|
|
46
|
+
| `CREATE DYNAMIC TABLE` | Create auto-incrementally refreshing aggregation table | Replace scheduled ETL jobs |
|
|
47
|
+
| `REFRESH DYNAMIC TABLE` | Manually trigger Dynamic Table refresh | Immediately refresh after initial creation |
|
|
48
|
+
| `SHOW DYNAMIC TABLE REFRESH HISTORY` | View refresh history | Monitor incremental refresh status |
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Prerequisites
|
|
53
|
+
|
|
54
|
+
The following walkthrough uses Alibaba Cloud OSS as an example and runs the full pipeline in the `semantic_model_test` schema on the `DEFAULT` Virtual Cluster.
|
|
55
|
+
|
|
56
|
+
> ⚠️ **Prerequisites**: An OSS Bucket has been created, and you have an AccessKey ID and AccessKey Secret. The Virtual Cluster must be in the RUNNING state (in Serverless mode, queries wake it automatically).
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## End-to-End Practice
|
|
61
|
+
|
|
62
|
+
### Step 1: Create Storage Connection
|
|
63
|
+
|
|
64
|
+
Establish the authentication channel between Lakehouse and OSS.
|
|
65
|
+
|
|
66
|
+
```sql
|
|
67
|
+
-- Create OSS storage connection
|
|
68
|
+
CREATE STORAGE CONNECTION IF NOT EXISTS my_oss_conn
|
|
69
|
+
TYPE OSS
|
|
70
|
+
access_id = '<your_access_key_id>'
|
|
71
|
+
access_key = '<your_access_key_secret>'
|
|
72
|
+
ENDPOINT = 'oss-cn-shanghai.aliyuncs.com';
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
> **Parameter note**: Alibaba Cloud OSS uses lowercase `access_id` / `access_key`. Uppercase `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` also works. Do not use `ACCESS_KEY` / `SECRET_KEY` (missing suffixes will cause errors).
|
|
76
|
+
|
|
77
|
+
### Step 2: Create External Volume
|
|
78
|
+
|
|
79
|
+
Mount the OSS Bucket subdirectory as a Lakehouse Volume.
|
|
80
|
+
|
|
81
|
+
```sql
|
|
82
|
+
CREATE EXTERNAL VOLUME IF NOT EXISTS my_data_vol
|
|
83
|
+
LOCATION 'oss://my-bucket/data/'
|
|
84
|
+
USING CONNECTION my_oss_conn
|
|
85
|
+
DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = FALSE)
|
|
86
|
+
RECURSIVE = TRUE
|
|
87
|
+
COMMENT 'Dedicated Volume for data lake acceleration';
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
Key parameter descriptions:
|
|
91
|
+
|
|
92
|
+
| Parameter | Description |
|
|
93
|
+
|---|---|
|
|
94
|
+
| `LOCATION` | OSS path, must point to a specific subdirectory, not the bucket root path |
|
|
95
|
+
| `USING CONNECTION` | References the Storage Connection created in step 1 |
|
|
96
|
+
| `DIRECTORY.ENABLE` | Enables directory metadata index; allows using `DIRECTORY()` function to query file list |
|
|
97
|
+
| `AUTO_REFRESH` | Set to `TRUE` for auto-refresh; when set to `FALSE`, manual `ALTER VOLUME REFRESH` is required |
|
|
98
|
+
| `RECURSIVE` | Recursively scan subdirectories |
|
|
99
|
+
|
|
100
|
+
### Step 3: Create Source Table and Export to Volume
|
|
101
|
+
|
|
102
|
+
Verify bidirectional read/write capability of the Volume.
|
|
103
|
+
|
|
104
|
+
```sql
|
|
105
|
+
-- 1. Create source table and insert test data
|
|
106
|
+
CREATE TABLE IF NOT EXISTS sales_source (
|
|
107
|
+
id BIGINT COMMENT 'Order ID',
|
|
108
|
+
product STRING COMMENT 'Product name',
|
|
109
|
+
category STRING COMMENT 'Category',
|
|
110
|
+
amount DECIMAL(10,2) COMMENT 'Amount',
|
|
111
|
+
dt STRING COMMENT 'Date'
|
|
112
|
+
) COMMENT 'Data lake acceleration test source table';
|
|
113
|
+
|
|
114
|
+
INSERT INTO sales_source VALUES
|
|
115
|
+
(1, 'iPhone 15', 'Electronics', 8999.00, '2026-06-01'),
|
|
116
|
+
(2, 'MacBook Pro', 'Electronics', 14999.00, '2026-06-01'),
|
|
117
|
+
(3, 'AirPods', 'Electronics', 1299.00, '2026-06-01'),
|
|
118
|
+
(4, 'Nike Air Max', 'Sports', 899.00, '2026-06-01'),
|
|
119
|
+
(5, 'Yoga Mat', 'Sports', 199.00, '2026-06-01');
|
|
120
|
+
|
|
121
|
+
-- 2. Export as CSV to Volume
|
|
122
|
+
COPY INTO VOLUME my_data_vol
|
|
123
|
+
SUBDIRECTORY 'export/'
|
|
124
|
+
FROM TABLE sales_source
|
|
125
|
+
FILE_FORMAT = (TYPE = CSV);
|
|
126
|
+
|
|
127
|
+
-- 3. Export as Parquet to Volume
|
|
128
|
+
COPY INTO VOLUME my_data_vol
|
|
129
|
+
SUBDIRECTORY 'export/'
|
|
130
|
+
FROM TABLE sales_source
|
|
131
|
+
FILE_FORMAT = (TYPE = PARQUET);
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
> ⚠️ **`COPY INTO VOLUME` requires `SUBDIRECTORY`**: omitting this clause will throw `Syntax error at or near 'FROM'`. To export to the Volume root path, use `SUBDIRECTORY '/'`.
|
|
135
|
+
|
|
136
|
+
> ⚠️ **Export syntax**: `COPY INTO VOLUME` uses `FILE_FORMAT = (TYPE = CSV/PARQUET)`, not `USING CSV`. `USING` is only used for `SELECT FROM VOLUME` to query files.
|
|
137
|
+
|
|
138
|
+
### Step 4: Validate Volume File Read/Write
|
|
139
|
+
|
|
140
|
+
```sql
|
|
141
|
+
-- Refresh directory cache (manual refresh required when AUTO_REFRESH=FALSE)
|
|
142
|
+
ALTER VOLUME my_data_vol REFRESH;
|
|
143
|
+
|
|
144
|
+
-- View exported files
|
|
145
|
+
SELECT relative_path, size, last_modified_time
|
|
146
|
+
FROM DIRECTORY(VOLUME my_data_vol)
|
|
147
|
+
WHERE relative_path LIKE 'export/%';
|
|
148
|
+
|
|
149
|
+
-- Directly query CSV files
|
|
150
|
+
SELECT * FROM VOLUME my_data_vol
|
|
151
|
+
USING CSV
|
|
152
|
+
FILES('export/part00001.csv');
|
|
153
|
+
|
|
154
|
+
-- Directly query Parquet files (preserves column names)
|
|
155
|
+
SELECT id, product, category, amount, dt
|
|
156
|
+
FROM VOLUME my_data_vol
|
|
157
|
+
USING PARQUET
|
|
158
|
+
FILES('export/part00001.parquet');
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
> **CSV vs Parquet column name difference**: CSV files without headers auto-generate column names `f0, f1, f2...`; Parquet files preserve original column names. To use original column names with CSV, add `OPTIONS('header'='true')` on import.
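
The naming difference can be illustrated outside Lakehouse with a minimal Python sketch. It only mimics the `f0, f1, f2...` convention described above; the `read_rows` helper and the sample row are hypothetical and are not part of any Lakehouse API.

```python
import csv
import io


def read_rows(text, has_header):
    """Parse CSV text into dicts, generating f0, f1, ... names when no header exists."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header:
        names, data = rows[0], rows[1:]
    else:
        # No header row: fall back to positional names, as the note describes.
        names = [f"f{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(names, row)) for row in data]


headerless = "1,iPhone 15,Electronics,8999.00,2026-06-01\n"
with_header = "id,product,category,amount,dt\n" + headerless

generic = read_rows(headerless, has_header=False)
named = read_rows(with_header, has_header=True)
```

Here `generic[0]` is keyed by `f0` through `f4`, while `named[0]` is keyed by the original column names, which is why downstream queries on headerless CSV must reference positional names.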
|
|
162
|
+
|
|
163
|
+
### Step 5: Create Pipe for Continuous Ingestion
|
|
164
|
+
|
|
165
|
+
Pipe continuously monitors the Volume for new files and automatically ingests them into the target table.
|
|
166
|
+
|
|
167
|
+
```sql
|
|
168
|
+
-- 1. Create dedicated Volume for Pipe (must point to a separate subdirectory)
|
|
169
|
+
CREATE EXTERNAL VOLUME IF NOT EXISTS pipe_vol
|
|
170
|
+
LOCATION 'oss://my-bucket/data/incoming/'
|
|
171
|
+
USING CONNECTION my_oss_conn
|
|
172
|
+
DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = TRUE)
|
|
173
|
+
RECURSIVE = TRUE
|
|
174
|
+
COMMENT 'Dedicated Volume for Pipe continuous ingestion';
|
|
175
|
+
|
|
176
|
+
-- 2. Create target table
|
|
177
|
+
CREATE TABLE IF NOT EXISTS sales_ods (
|
|
178
|
+
id BIGINT COMMENT 'Order ID',
|
|
179
|
+
product STRING COMMENT 'Product name',
|
|
180
|
+
category STRING COMMENT 'Category',
|
|
181
|
+
amount DECIMAL(10,2) COMMENT 'Amount',
|
|
182
|
+
dt STRING COMMENT 'Date'
|
|
183
|
+
) COMMENT 'ODS layer — Pipe ingestion target table';
|
|
184
|
+
|
|
185
|
+
-- 3. Create Pipe (LIST_PURGE mode)
|
|
186
|
+
CREATE PIPE IF NOT EXISTS sales_pipe
|
|
187
|
+
INGEST_MODE = 'LIST_PURGE'
|
|
188
|
+
VIRTUAL_CLUSTER = 'DEFAULT'
|
|
189
|
+
COMMENT 'Sales data continuous ingestion pipeline'
|
|
190
|
+
AS
|
|
191
|
+
COPY INTO sales_ods
|
|
192
|
+
FROM VOLUME pipe_vol
|
|
193
|
+
USING CSV PURGE = TRUE;
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
> ⚠️ **Pipe key constraints**:
|
|
197
|
+
> - Each Pipe needs a dedicated Volume; multiple Pipes cannot share the same Volume
|
|
198
|
+
> - `LOCATION` must point to a specific subdirectory, not the bucket root path
|
|
199
|
+
> - `LIST_PURGE` mode **deletes source files** after successful ingestion (irreversible); use `EVENT_NOTIFICATION` mode to keep files
|
|
200
|
+
> - `PURGE = TRUE` must appear after `USING <format>`, not inside OPTIONS
|
|
201
|
+
|
|
202
|
+
#### Pipe Management
|
|
203
|
+
|
|
204
|
+
```sql
|
|
205
|
+
-- View Pipe status
|
|
206
|
+
DESC PIPE EXTENDED sales_pipe;
|
|
207
|
+
-- Key fields: pipe_status (RUNNING/PAUSED), ingest_mode, input_name, output_name
|
|
208
|
+
|
|
209
|
+
-- Pause Pipe (stop scanning new files)
|
|
210
|
+
ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
|
|
211
|
+
|
|
212
|
+
-- Resume Pipe (restart scanning)
|
|
213
|
+
ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
|
|
214
|
+
|
|
215
|
+
-- View imported file records (7-day retention)
|
|
216
|
+
SELECT * FROM load_history('sales_ods');
|
|
217
|
+
-- Returns: file_path, last_copy_time, file_size, status, first_error_message
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
> **Deduplication mechanism**: Pipe deduplicates by file path via `load_history` (within 7 days). Files with the same name will not be re-imported. To reload the same file, wait 7 days or rename the file before re-uploading.
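
The deduplication rule above can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not the Lakehouse implementation: the `LoadHistory` class and its methods are hypothetical, and the only facts taken from the text are that matching is by file path and that records are retained for 7 days.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention window stated in the note above


class LoadHistory:
    """Toy model of path-based deduplication (hypothetical, for illustration only)."""

    def __init__(self):
        self._last_copy_time = {}  # file_path -> time of the last successful load

    def should_load(self, file_path, now):
        last = self._last_copy_time.get(file_path)
        # Load when the path was never seen, or when its record has expired.
        return last is None or now - last >= RETENTION

    def record(self, file_path, now):
        self._last_copy_time[file_path] = now


history = LoadHistory()
t0 = datetime(2026, 6, 1, 12, 0, 0)
history.record("part00001.csv", t0)

same_name_next_day = history.should_load("part00001.csv", t0 + timedelta(days=1))
renamed_next_day = history.should_load("part00001_v2.csv", t0 + timedelta(days=1))
same_name_after_expiry = history.should_load("part00001.csv", t0 + timedelta(days=8))
```

In this model a same-name file uploaded the next day is skipped, a renamed file is loaded, and the original name becomes loadable again once the 7-day record has expired, matching the two workarounds the note suggests.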
|
|
221
|
+
|
|
222
|
+
#### Trigger Pipe Loading
|
|
223
|
+
|
|
224
|
+
Pipe starts running immediately after creation (polls approximately every 30 seconds). Writing new files to the Volume path triggers loading:
|
|
225
|
+
|
|
226
|
+
```sql
|
|
227
|
+
-- Simulate "new file arrival" via COPY INTO VOLUME
|
|
228
|
+
COPY INTO VOLUME pipe_vol
|
|
229
|
+
SUBDIRECTORY '/'
|
|
230
|
+
FROM (SELECT * FROM sales_source WHERE dt = '2026-06-01')
|
|
231
|
+
FILE_FORMAT = (TYPE = CSV);
|
|
232
|
+
|
|
233
|
+
-- Verify data has been loaded after a moment
|
|
234
|
+
SELECT COUNT(*) FROM sales_ods; -- should return 5
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
> ⚠️ **Files written during pause**: Files written while the Pipe is paused are not loaded during the pause. After the Pipe resumes, they are detected in the next scan. If a file name matches an already-loaded file, the deduplication mechanism skips it.
|
|
238
|
+
|
|
239
|
+
### Step 6: Create Dynamic Table for Incremental Consumption
|
|
240
|
+
|
|
241
|
+
Based on the Pipe-ingested table, create a Dynamic Table for automatic incremental aggregation.
|
|
242
|
+
|
|
243
|
+
```sql
|
|
244
|
+
-- Enable change tracking on source table (prerequisite for incremental refresh)
|
|
245
|
+
ALTER TABLE sales_ods SET PROPERTIES ('change_tracking' = 'true');
|
|
246
|
+
|
|
247
|
+
-- Create Dynamic Table, aggregate by category
|
|
248
|
+
CREATE OR REPLACE DYNAMIC TABLE sales_summary
|
|
249
|
+
REFRESH INTERVAL 1 HOUR vcluster DEFAULT
|
|
250
|
+
COMMENT 'Category summary — incremental refresh'
|
|
251
|
+
AS
|
|
252
|
+
SELECT
|
|
253
|
+
category,
|
|
254
|
+
COUNT(*) AS order_cnt,
|
|
255
|
+
SUM(amount) AS total_amount,
|
|
256
|
+
AVG(amount) AS avg_amount,
|
|
257
|
+
MIN(dt) AS min_date,
|
|
258
|
+
MAX(dt) AS max_date
|
|
259
|
+
FROM sales_ods
|
|
260
|
+
GROUP BY category;
|
|
261
|
+
|
|
262
|
+
-- Immediately trigger first refresh (resets refresh baseline time)
|
|
263
|
+
REFRESH DYNAMIC TABLE sales_summary;
|
|
264
|
+
|
|
265
|
+
-- Query results
|
|
266
|
+
SELECT * FROM sales_summary ORDER BY category;
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
> **Refresh frequency note**: `REFRESH INTERVAL 1 HOUR` calculates the next trigger from the creation time and does not align to clock hours. To trigger at a specific time, create the Dynamic Table near the target time, or execute `REFRESH` immediately after creation to reset the baseline.
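
The scheduling behavior described in this note can be sketched as simple date arithmetic. The sketch assumes each trigger is the baseline time plus a whole multiple of the interval; that formula is an interpretation of the note, not a documented scheduler contract, and `next_triggers` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_triggers(baseline, interval, count):
    """Return the next `count` trigger times, assuming baseline + k * interval."""
    return [baseline + interval * k for k in range(1, count + 1)]


interval = timedelta(hours=1)

# Table created at 09:17:00: triggers follow the creation time, not clock hours.
created_at = datetime(2026, 6, 1, 9, 17, 0)
from_creation = next_triggers(created_at, interval, 3)

# A manual REFRESH at 09:59:30 resets the baseline (assumed behavior per the note).
manual_refresh_at = datetime(2026, 6, 1, 9, 59, 30)
from_refresh = next_triggers(manual_refresh_at, interval, 3)
```

Under this assumption, `from_creation` lands at 17 minutes past each hour, while `from_refresh` shifts the schedule to follow the manual refresh time.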
|
|
270
|
+
|
|
271
|
+
#### View DT Refresh History
|
|
272
|
+
|
|
273
|
+
```sql
|
|
274
|
+
SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'sales_summary';
|
|
275
|
+
-- Key fields: state (SUCCEED), refresh_mode (INCREMENTAL/FULL), duration, source_tables
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
---
|
|
279
|
+
|
|
280
|
+
## Full Data Flow Validation
|
|
281
|
+
|
|
282
|
+
```sql
|
|
283
|
+
-- Validate data consistency across all stages
|
|
284
|
+
SELECT 'Source' AS stage, COUNT(*) AS rows FROM sales_source
|
|
285
|
+
UNION ALL
|
|
286
|
+
SELECT 'ODS' AS stage, COUNT(*) AS rows FROM sales_ods
|
|
287
|
+
UNION ALL
|
|
288
|
+
SELECT 'Summary' AS stage, COUNT(*) AS rows FROM sales_summary;
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
| Stage | Data Volume | Description |
|
|
292
|
+
|---|---|---|
|
|
293
|
+
| Source | 5 rows | Raw data (INSERT) |
|
|
294
|
+
| ODS | 5 rows | Pipe ingestion (CSV → table) |
|
|
295
|
+
| Summary | 2 rows | Dynamic Table aggregation (2 category groups: Electronics and Sports) |
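
As a cross-check of the expected counts, the same aggregation can be reproduced locally. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, so types and syntax differ from Lakehouse (`REAL` replaces `DECIMAL`); only the five sample rows from Step 3 and the `GROUP BY category` logic are taken from the walkthrough.

```python
import sqlite3

# The five sample rows inserted into sales_source in Step 3.
rows = [
    (1, "iPhone 15", "Electronics", 8999.00, "2026-06-01"),
    (2, "MacBook Pro", "Electronics", 14999.00, "2026-06-01"),
    (3, "AirPods", "Electronics", 1299.00, "2026-06-01"),
    (4, "Nike Air Max", "Sports", 899.00, "2026-06-01"),
    (5, "Yoga Mat", "Sports", 199.00, "2026-06-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_ods (id INTEGER, product TEXT, category TEXT, amount REAL, dt TEXT)"
)
conn.executemany("INSERT INTO sales_ods VALUES (?, ?, ?, ?, ?)", rows)

ods_count = conn.execute("SELECT COUNT(*) FROM sales_ods").fetchone()[0]

# Same shape as the Dynamic Table query: one output row per category.
summary = conn.execute(
    """
    SELECT category, COUNT(*), SUM(amount), AVG(amount), MIN(dt), MAX(dt)
    FROM sales_ods
    GROUP BY category
    ORDER BY category
    """
).fetchall()
conn.close()
```

Because the sample data contains only the `Electronics` and `Sports` categories, `summary` holds one row per category, which is the row count to expect from `sales_summary` once the Pipe has loaded all five rows.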
|
|
296
|
+
|
|
297
|
+
---
|
|
298
|
+
|
|
299
|
+
## Best Practices
|
|
300
|
+
|
|
301
|
+
### File Size Recommendations
|
|
302
|
+
|
|
303
|
+
| Format | Recommended Size | Description |
|
|
304
|
+
|---|---|---|
|
|
305
|
+
| gzip compressed | ~50 MB | Files that are too large reduce parallelism |
|
|
306
|
+
| CSV uncompressed | 128-256 MB | Balance between scan speed and file count |
|
|
307
|
+
| Parquet uncompressed | 128-256 MB | Columnar storage, more efficient for queries |
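
To turn the size recommendations into a planning number, a rough file-count estimate can help. This is a back-of-the-envelope sketch: `file_count_range` is a hypothetical helper, and the 10 GB dataset is an invented example, not a figure from the tests.

```python
import math


def file_count_range(total_mb, min_file_mb=128, max_file_mb=256):
    """Return (fewest, most) files needed to keep each file within the size range."""
    fewest = math.ceil(total_mb / max_file_mb)
    most = math.ceil(total_mb / min_file_mb)
    return fewest, most


# Hypothetical 10 GB uncompressed dataset split into 128-256 MB files.
fewest, most = file_count_range(10 * 1024)
```

For this example the dataset would be written as roughly 40 to 80 files, which keeps individual files inside the recommended range while leaving room for parallel scans.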
|
|
308
|
+
|
|
309
|
+
### Volume and Pipe Design Principles
|
|
310
|
+
|
|
311
|
+
1. **Each Pipe has its own Volume**: Different Pipes cannot share the same Volume to avoid interference
|
|
312
|
+
2. **Volume points to a subdirectory**: Do not point to the bucket root path, as this will cause Pipe creation errors
|
|
313
|
+
3. **LIST_PURGE vs EVENT_NOTIFICATION**:
|
|
314
|
+
- `LIST_PURGE`: Simple configuration, suitable for most scenarios, deletes source files after loading
|
|
315
|
+
- `EVENT_NOTIFICATION`: Low latency, retains source files, but is supported only on OSS and S3 and requires additional MNS/SQS configuration
|
|
316
|
+
|
|
317
|
+
### Dynamic Table Design Principles
|
|
318
|
+
|
|
319
|
+
1. **Use GP type Virtual Cluster** (such as `DEFAULT`): GP type supports small file merging; AP type does not
|
|
320
|
+
2. **Enable change_tracking**: If the source table does not have it enabled, DT performs a full refresh every time with no incremental support
|
|
321
|
+
3. **REFRESH immediately after creation**: Ensures first data availability and resets the refresh baseline time
|
|
322
|
+
|
|
323
|
+
### Data Lifecycle
|
|
324
|
+
|
|
325
|
+
```
|
|
326
|
+
File upload → Pipe scan → COPY INTO ingest → PURGE delete → Dynamic Table incremental refresh
|
|
327
|
+
↓ ↓ ↓ ↓ ↓
|
|
328
|
+
OSS write 30s polling load_history record source file deleted aggregation update
|
|
329
|
+
```
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
## Test Validation Results
|
|
334
|
+
|
|
335
|
+
The following results are from actual testing on an Alibaba Cloud Shanghai instance (`f8866243`):
|
|
336
|
+
|
|
337
|
+
| Test Item | Result | Details |
|
|
338
|
+
|---|---|---|
|
|
339
|
+
| Storage Connection creation | ✅ | OSS connection normal |
|
|
340
|
+
| External Volume mount | ✅ | Directory access normal; `AUTO_REFRESH=FALSE` requires manual refresh |
|
|
341
|
+
| SELECT FROM VOLUME (CSV) | ✅ | Without header, column names are f0-f4; Parquet preserves column names |
|
|
342
|
+
| SELECT FROM VOLUME (Parquet) | ✅ | Column names and types both preserved |
|
|
343
|
+
| COPY INTO TABLE (CSV) | ✅ | 5 rows correctly imported |
|
|
344
|
+
| COPY INTO TABLE (Parquet) | ✅ | 5 rows correctly imported |
|
|
345
|
+
| COPY INTO VOLUME export | ⚠️ | **Must include `SUBDIRECTORY`**, otherwise syntax error |
|
|
346
|
+
| Pipe LIST_PURGE creation | ✅ | Status immediately becomes RUNNING |
|
|
347
|
+
| Pipe load trigger | ✅ | Auto-loaded in ~30 seconds; load_history records complete |
|
|
348
|
+
| Pipe PURGE deletion | ✅ | Source files auto-deleted after successful load |
|
|
349
|
+
| Pipe pause/resume | ✅ | Files not loaded during pause; re-scanned after resume |
|
|
350
|
+
| Pipe deduplication | ✅ | Same-name files correctly blocked by load_history (7-day retention) |
|
|
351
|
+
| Dynamic Table incremental refresh | ✅ | INCREMENTAL mode, aggregation completed in 346ms |
|
|
352
|
+
|
|
353
|
+
---
|
|
354
|
+
|
|
355
|
+
## Notes
|
|
356
|
+
|
|
357
|
+
| Note | Impact | Recommendation |
|
|
358
|
+
|---|---|---|
|
|
359
|
+
| `COPY INTO VOLUME` requires `SUBDIRECTORY` | Without it, syntax error | Use `SUBDIRECTORY '/'` for root path |
|
|
360
|
+
| Generic CSV column names | Without header, column names are f0-f4 | Use `OPTIONS('header'='true')` or switch to Parquet |
|
|
361
|
+
| Manual refresh needed when `AUTO_REFRESH=FALSE` | Directory does not update | Execute `ALTER VOLUME name REFRESH` |
|
|
362
|
+
| Pipe same-name file deduplication | Same-name files not loaded after pause/resume | Rename file on re-upload, or wait 7 days for expiration |
|
|
363
|
+
| `load_history` column name | `last_copy_time` not `last_load_time` | Pay attention to column name when querying |
|
|
364
|
+
| Virtual Cluster auto-sleep | Suspends after 60s without queries | Serverless mode bills on demand, so no action is needed |
|
|
365
|
+
| Pipe COPY statement is immutable | Ingestion logic cannot be modified in place | To change the logic, `DROP PIPE` and then `CREATE PIPE` again |
|
|
366
|
+
| AP type Virtual Cluster does not support small file merging | Query performance degrades over time | Always use GP type (`DEFAULT`) |
|
|
367
|
+
|
|
368
|
+
---
|
|
369
|
+
|
|
370
|
+
## Related Documents
|
|
371
|
+
|
|
372
|
+
- [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — Alibaba Cloud/Tencent Cloud/AWS real-world comparison
|
|
373
|
+
- [Volume Overview](volume-overview.md) — Volume concepts, types, and file operations
|
|
374
|
+
- [Object Storage Pipe](pipe-storage-object.md) — LIST_PURGE and EVENT_NOTIFICATION complete configuration
|
|
375
|
+
- [Pipe Overview](pipe-overview.md) — Pipe vs Table Stream comparison
|
|
376
|
+
- [Dynamic Table Overview](dynamic-table-introduce.md) — Incremental computation mechanism
|
|
377
|
+
- [Create External Volume](create-external-volume.md) — Complete DDL syntax
|
|
378
|
+
- [Import Data from Volume](from_volume_to_table.md) — COPY INTO syntax
|
|
379
|
+
- [Export Data to Volume](from_lakehouse_to_volume.md) — Export syntax
|
|
380
|
+
- [Query SHOW JOBS](show-jobs.md) — Filter Pipe jobs by query_tag
|