@clickzetta/cz-cli-darwin-arm64 0.5.15 → 0.5.17
- package/bin/cz-cli +0 -0
- package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
- package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
- package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
- package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
- package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
- package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
- package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
- package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
- package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
- package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
- package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
- package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
- package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
- package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
- package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
- package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
- package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
- package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
- package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
- package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
- package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
- package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
- package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
- package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
- package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
- package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
- package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
- package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
- package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
- package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
- package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
- package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
- package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
- package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
- package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
- package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
- package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
- package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
- package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
- package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
- package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
- package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
- package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
- package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
- package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
- package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
- package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
- package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
- package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
- package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
- package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
- package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
- package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
- package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
- package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
- package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
- package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
- package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
- package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
- package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
- package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
- package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
- package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
- package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
- package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
- package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
- package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
- package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
- package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
- package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
- package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
- package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
- package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
- package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
- package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
- package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
- package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
- package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
- package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
- package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
- package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
- package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
- package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
- package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
- package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
- package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
- package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
- package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
- package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
- package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
- package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
- package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
- package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
- package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
- package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
- package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
- package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
- package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
- package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
- package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
- package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
- package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
- package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
- package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
- package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
- package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
- package/package.json +1 -1
- package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
- package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
@@ -0,0 +1,274 @@

# Multi-Cloud Unified Data Lake Acceleration

The core idea of "data lake acceleration" is **no data migration**: Singdata Lakehouse's Serverless compute queries and processes files directly in existing object storage, replacing traditional Spark/Hive ETL and Presto/Trino ad hoc queries.

The three mainstream object storage services (Alibaba Cloud OSS, Tencent Cloud COS, and AWS S3) are handled through a single approach: **Volume mount → Pipe continuous ingestion → Dynamic Table incremental aggregation**. Apart from the parameter names used when creating a Storage Connection and the protocol prefix in a Volume's `LOCATION`, the SQL syntax is identical.

---

## Why a Multi-Cloud Unified Approach

- Enterprise data is spread across multiple cloud providers and needs a unified query entry point
- Object storage APIs differ between clouds, but Lakehouse data processing logic should remain consistent
- A unified approach reduces the mental overhead and operational cost of switching between cloud environments

Singdata Lakehouse's abstraction layer addresses this: **the same SQL, with only a Connection parameter change, runs on different clouds**.

---

## SQL Commands Involved

| Command / Function | Purpose | Multi-Cloud Differences |
|------------|------|---------|
| `CREATE STORAGE CONNECTION` | Establish object storage authentication channel | **The only step with differences** (different parameter names) |
| `CREATE EXTERNAL VOLUME` | Mount object storage path | Syntax fully unified (only change protocol prefix) |
| `COPY INTO VOLUME` | Export data to Volume | Fully unified |
| `SELECT FROM VOLUME` | Directly query Volume files | Fully unified |
| `DIRECTORY()` | List files in a Volume | Fully unified |
| `COPY INTO` | Import data from Volume to table | Fully unified |
| `CREATE PIPE` | Create continuous ingestion pipeline | Fully unified |
| `ALTER PIPE` | Pause/resume Pipe | Fully unified |
| `load_history()` | View historical load records | Fully unified |
| `CREATE DYNAMIC TABLE` | Create incremental refresh aggregation table | Fully unified |
| `REFRESH DYNAMIC TABLE` | Manually trigger refresh | Fully unified |

---

## Core Architecture

---

## Cross-Cloud Comparison

The exact same Volume + Pipe test cases were executed on Alibaba Cloud Shanghai (`f8866243`) and Tencent Cloud Shanghai (`0c3c358d`):

### Configuration Differences

| Configuration | Alibaba Cloud OSS | Tencent Cloud COS |
|---|---|---|
| **Connection type** | `TYPE OSS` | `TYPE COS` |
| **Auth parameter names** | `access_id` / `access_key` (lowercase) | `ACCESS_KEY` / `SECRET_KEY` (uppercase) |
| **Endpoint** | `ENDPOINT = 'oss-cn-shanghai.aliyuncs.com'` | No ENDPOINT required |
| **Region parameter** | Embedded in ENDPOINT | `REGION = 'ap-shanghai'` |
| **APP_ID** | Not applicable | `APP_ID = '1253896122'` |
| **Volume location syntax** | `LOCATION 'oss://bucket/path/'` | `LOCATION 'cos://bucket-appid/path/'` |
| **Recommended Endpoint** | `oss-cn-shanghai-internal.aliyuncs.com` (internal) | Auto-resolved, no configuration needed |

> ⚠️ **Note**: Alibaba Cloud must use `access_id` / `access_key` or `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET`. Do not use `ACCESS_KEY` / `SECRET_KEY` (these lack the `_ID` / `_SECRET` suffixes). Tencent Cloud is the opposite: it must use `ACCESS_KEY` / `SECRET_KEY`.

### Feature Consistency

| Test Item | Alibaba Cloud OSS | Tencent Cloud COS | Conclusion |
|---|---|---|---|
| Storage Connection creation | ✅ | ✅ | Different parameter names, otherwise identical |
| External Volume creation | ✅ | ✅ | **Syntax fully identical** |
| COPY INTO VOLUME export CSV | ✅ | ✅ | Identical |
| COPY INTO VOLUME export Parquet | ✅ | ✅ | Identical |
| DIRECTORY() file listing | ✅ | ✅ | Identical |
| SELECT FROM VOLUME (CSV) | ✅ f0-f4 column names | ✅ f0-f4 column names | Identical |
| SELECT FROM VOLUME (Parquet) | ✅ preserves column names | ✅ preserves column names | Identical |
| COPY INTO TABLE from Volume | ✅ | ✅ | Identical |
| PIPE LIST_PURGE creation | ✅ | ✅ | Identical |
| PIPE load trigger | ✅ ~30s | ✅ ~30s | Identical |
| PIPE PURGE delete source files | ✅ | ✅ | Identical |
| PIPE load_history dedup | ✅ | ✅ | Identical |
| PIPE pause/resume | ✅ | ✅ | Identical |
| Dynamic Table incremental refresh | ✅ | ✅ | Identical |

**Key finding**: Apart from the parameter names used when creating the Connection, the other 13 test items are **completely identical**. Volume, Pipe, and Dynamic Table SQL syntax shows no differences.

---

## Unified Implementation

### Step 1: Create Storage Connection (the only step with differences)

```sql
-- ============ Alibaba Cloud OSS ============
CREATE STORAGE CONNECTION IF NOT EXISTS my_storage_conn
    TYPE OSS
    access_id = '<AccessKey ID>'
    access_key = '<AccessKey Secret>'
    ENDPOINT = 'oss-cn-shanghai-internal.aliyuncs.com';

-- ============ Tencent Cloud COS ============
CREATE STORAGE CONNECTION IF NOT EXISTS my_storage_conn
    TYPE COS
    ACCESS_KEY = '<SecretId>'
    SECRET_KEY = '<SecretKey>'
    REGION = 'ap-shanghai'
    APP_ID = '<APP_ID>';

-- ============ AWS S3 ============
CREATE STORAGE CONNECTION IF NOT EXISTS my_storage_conn
    TYPE S3
    ACCESS_KEY = '<Access Key ID>'
    SECRET_KEY = '<Secret Access Key>'
    REGION = 'us-east-1';
```

> 💡 **Same-region internal network acceleration**: Alibaba Cloud uses `oss-cn-shanghai-internal.aliyuncs.com` (internal endpoint). AWS S3 and Tencent Cloud COS automatically use internal routing via the Region parameter. Internal network transfers have no egress fees and lower latency.
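
Because the three statements differ only in connection type and parameter names, they could in principle be generated from a small lookup table. The following is a minimal sketch under that assumption: `CONNECTION_PARAMS` and `storage_connection_sql` are hypothetical names, not part of any Lakehouse SDK, and the parameter names are copied from the Step 1 statements.

```python
# Hypothetical helper (not a Lakehouse API): renders the per-cloud
# CREATE STORAGE CONNECTION statement from the parameter names in Step 1.
CONNECTION_PARAMS = {
    "oss": ("OSS", ["access_id", "access_key", "ENDPOINT"]),
    "cos": ("COS", ["ACCESS_KEY", "SECRET_KEY", "REGION", "APP_ID"]),
    "s3": ("S3", ["ACCESS_KEY", "SECRET_KEY", "REGION"]),
}


def storage_connection_sql(cloud: str, name: str, values: dict) -> str:
    """Return the CREATE STORAGE CONNECTION text for one cloud."""
    conn_type, param_names = CONNECTION_PARAMS[cloud]
    missing = [p for p in param_names if p not in values]
    if missing:
        raise ValueError(f"missing parameters for {cloud}: {missing}")
    lines = [
        f"CREATE STORAGE CONNECTION IF NOT EXISTS {name}",
        f"    TYPE {conn_type}",
    ]
    for p in param_names:
        value = str(values[p])
        # Values are inserted verbatim, so refuse anything that would
        # break out of the single-quoted literal.
        if "'" in value:
            raise ValueError(f"parameter {p} must not contain a single quote")
        lines.append(f"    {p} = '{value}'")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(storage_connection_sql(
        "cos",
        "my_storage_conn",
        {"ACCESS_KEY": "<SecretId>", "SECRET_KEY": "<SecretKey>",
         "REGION": "ap-shanghai", "APP_ID": "<APP_ID>"},
    ))
```

In practice the credential values would come from a secret store rather than from source code; the placeholders here mirror those in the SQL above.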

### Step 2: Create External Volume (unified syntax across all three clouds)

```sql
-- ✅ Unified syntax for all three clouds: only change the protocol prefix in LOCATION
CREATE EXTERNAL VOLUME my_data_vol
    LOCATION 'oss://my-bucket/data/'              -- Alibaba Cloud: oss://
    -- LOCATION 'cos://my-bucket-appid/data/'     -- Tencent Cloud: cos://
    -- LOCATION 's3://my-bucket/data/'            -- AWS: s3://
    USING CONNECTION my_storage_conn
    DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = FALSE)
    RECURSIVE = TRUE
    COMMENT 'Multi-cloud unified data lake acceleration Volume';
```
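
The only cloud-specific part of the Volume statement is the `LOCATION` URI, so it can be computed from the cloud name. This is a minimal sketch, assuming the `bucket-appid` form for COS shown in the Configuration Differences table; `volume_location` and `create_volume_sql` are illustrative names, not Lakehouse functions.

```python
# Hypothetical helpers: build the LOCATION URI and the CREATE EXTERNAL VOLUME
# text from Step 2. Only the URI changes between clouds.
PREFIXES = {"oss": "oss", "cos": "cos", "s3": "s3"}


def volume_location(cloud: str, bucket: str, path: str, app_id: str = "") -> str:
    """Return e.g. 'oss://my-bucket/data/' for the given cloud."""
    if cloud not in PREFIXES:
        raise ValueError(f"unknown cloud: {cloud}")
    if cloud == "cos":
        # Tencent Cloud COS locations use the bucket-appid form.
        if not app_id:
            raise ValueError("COS locations need an APP_ID")
        bucket = f"{bucket}-{app_id}"
    path = path.strip("/")
    suffix = f"{path}/" if path else ""
    return f"{PREFIXES[cloud]}://{bucket}/{suffix}"


def create_volume_sql(volume: str, location: str, connection: str) -> str:
    """Render the Step 2 statement with the given LOCATION."""
    return (
        f"CREATE EXTERNAL VOLUME {volume}\n"
        f"    LOCATION '{location}'\n"
        f"    USING CONNECTION {connection}\n"
        "    DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = FALSE)\n"
        "    RECURSIVE = TRUE;"
    )


if __name__ == "__main__":
    loc = volume_location("cos", "my-bucket", "data/", app_id="<APP_ID>")
    print(create_volume_sql("my_data_vol", loc, "my_storage_conn"))
```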

### Step 3: Data Import/Export (fully unified)

```sql
-- Export to Volume (unified across all three clouds)
COPY INTO VOLUME my_data_vol
    SUBDIRECTORY 'export/'
FROM TABLE source_table
FILE_FORMAT = (TYPE = PARQUET);

-- Directly query from Volume (unified across all three clouds)
SELECT * FROM VOLUME my_data_vol
USING PARQUET
FILES('export/part00001.parquet');

-- Import to table (unified across all three clouds)
COPY INTO target_table
FROM VOLUME my_data_vol
USING PARQUET
SUBDIRECTORY 'export/';
```

### Step 4: Pipe Continuous Ingestion (fully unified)

```sql
-- Create dedicated Volume for Pipe
CREATE EXTERNAL VOLUME pipe_vol
    LOCATION 'oss://my-bucket/incoming/'    -- change only the protocol prefix across clouds
    USING CONNECTION my_storage_conn
    DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = TRUE)
    RECURSIVE = TRUE;

-- Create Pipe (unified across all three clouds)
CREATE PIPE my_pipe
    INGEST_MODE = 'LIST_PURGE'
    VIRTUAL_CLUSTER = 'DEFAULT'
    COMMENT 'Multi-cloud unified continuous ingestion pipeline'
AS
COPY INTO target_table
FROM VOLUME pipe_vol
USING CSV PURGE = TRUE;
```
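
To make the `LIST_PURGE` behaviour easier to reason about, the sketch below models one polling cycle in plain Python. It is a conceptual model only, inferred from the test items above (load trigger, load_history dedup, PURGE deleting source files); it is not Lakehouse's implementation, and `list_purge_cycle` and its arguments are hypothetical.

```python
# Conceptual model of one LIST_PURGE cycle (an assumption, not the real engine):
# list the files in the Volume, skip any already recorded in the load history,
# load the rest, then delete the files that were loaded.
def list_purge_cycle(volume_files: dict, load_history: set, target_rows: list) -> list:
    """Run one cycle and return the names of the files loaded in it."""
    loaded = []
    for name in sorted(volume_files):
        if name in load_history:
            continue  # dedup: this file was loaded by an earlier cycle
        target_rows.extend(volume_files[name])
        load_history.add(name)
        loaded.append(name)
    for name in loaded:
        del volume_files[name]  # models PURGE = TRUE
    return loaded


if __name__ == "__main__":
    files = {"a.csv": [("x", 1)], "b.csv": [("y", 2)]}
    history, rows = set(), []
    print(list_purge_cycle(files, history, rows))  # first cycle loads both files
    print(list_purge_cycle(files, history, rows))  # second cycle finds nothing new
```

The point of the model is that a file is loaded at most once: either it has been purged, or its name is already in the load history.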

### Step 5: Dynamic Table Incremental Aggregation (fully unified)

```sql
CREATE OR REPLACE DYNAMIC TABLE summary_table
    REFRESH INTERVAL 1 DAY vcluster DEFAULT
    COMMENT 'Multi-cloud unified aggregated metrics'
AS
SELECT category, COUNT(*) AS cnt, SUM(amount) AS total
FROM target_table
GROUP BY category;
```

---

## Best Practices for Multi-Cloud Unification

### 1. Code Reuse Strategy

```
Project directory structure:
├── connections/
│   ├── aliyun_oss.sql          ← only this file differs
│   ├── tencent_cos.sql         ← only this file differs
│   └── aws_s3.sql              ← only this file differs
├── volumes/
│   └── create_volumes.sql      ← universal across all three clouds (protocol prefix can be replaced via variable)
├── tables/
│   └── ddl.sql                 ← fully unified
├── pipes/
│   └── create_pipes.sql        ← fully unified
└── dynamic_tables/
    └── aggregates.sql          ← fully unified
```

**Only the Connection creation SQL needs to be written per cloud.** The rest of the code (roughly 90%) can be reused directly.
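
One way to use this layout is to select the single cloud-specific file and then run the shared files in a fixed order. The sketch below assumes the directory names from the tree above and an execution order of connection, volumes, tables, pipes, dynamic tables (chosen so that each object exists before something references it); `deployment_plan` and the cloud keys are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed mapping from a cloud key to its connection script in the tree above.
CONNECTION_FILES = {
    "aliyun": "aliyun_oss.sql",
    "tencent": "tencent_cos.sql",
    "aws": "aws_s3.sql",
}

# Shared scripts, in an assumed dependency order.
SHARED_FILES = [
    "volumes/create_volumes.sql",
    "tables/ddl.sql",
    "pipes/create_pipes.sql",
    "dynamic_tables/aggregates.sql",
]


def deployment_plan(cloud: str) -> list:
    """Return the ordered list of SQL files to run for one cloud."""
    connection = PurePosixPath("connections") / CONNECTION_FILES[cloud]
    return [str(connection)] + SHARED_FILES


if __name__ == "__main__":
    for path in deployment_plan("tencent"):
        print(path)
```

Only the first entry of the plan changes between clouds, which matches the reuse claim above.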

### 2. Naming Convention Recommendations

| Object | Convention | Example |
|---|---|---|
| Storage Connection | `<cloud>_<purpose>` | `oss_prod_conn`, `cos_archive_conn` |
| External Volume | `<source_system>_vol` | `orders_vol`, `logs_vol` |
| Pipe | `<source_system>_pipe` | `orders_pipe`, `logs_pipe` |

Do not embed cloud provider information in Volume/Pipe names, since these objects may be reused across clouds.
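
A naming rule like this is easy to check mechanically, for example in a pre-commit hook. The sketch below is one possible check; the token list and the function name `embeds_cloud_name` are assumptions for illustration.

```python
import re

# Assumed list of tokens that identify a cloud provider or its storage service.
CLOUD_TOKENS = {"oss", "cos", "s3", "aliyun", "alibaba", "tencent", "aws"}


def embeds_cloud_name(object_name: str) -> bool:
    """Return True if any underscore- or hyphen-separated token names a cloud."""
    tokens = re.split(r"[_\-]+", object_name.lower())
    return any(token in CLOUD_TOKENS for token in tokens)


if __name__ == "__main__":
    for name in ["orders_vol", "logs_pipe", "oss_orders_vol"]:
        print(name, embeds_cloud_name(name))
```

Matching whole tokens rather than substrings avoids false positives such as `costs_vol`, which contains `cos` but does not name a cloud.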

### 3. Cost Optimization

| Strategy | Description |
|---|---|
| Use internal Endpoints | Alibaba Cloud: `*-internal.aliyuncs.com`; no egress fees, lower latency |
| T+1 refresh frequency | Most analytics scenarios do not need minute-level refresh; `1 DAY` is sufficient |
| PURGE=true | LIST_PURGE mode auto-deletes source files, preventing OSS/COS storage accumulation |
| GP type Virtual Cluster | Use `DEFAULT` (GENERAL type), Serverless on-demand billing |
| Keep file size at 128-256MB | Large CSV/Parquet files are 3-5x more efficient than many small files |

### 4. Security Recommendations

- Do not use root account AK/SK; create sub-accounts with minimum permissions (Bucket read + specific directory write)
- Internal Endpoints can be bound to VPC policies to restrict access sources
- The AK/SK stored in a Storage Connection is masked in `SHOW STORAGE CONNECTIONS` output
- When using AWS S3, prefer IAM Role (`ROLE_ARN`) over long-lived AK/SK

---
|
|
234
|
+
|
|
235
|
+
## FAQ
|
|
236
|
+
|
|
237
|
+
### Q: How is cross-cloud data transfer latency calculated?
|
|
238
|
+
|
|
239
|
+
Volume does not migrate data; files always remain in the original object storage. Query latency depends on the network link between the Lakehouse instance's region and the object storage. **A same-region internal Endpoint is recommended** (such as `oss-cn-shanghai-internal.aliyuncs.com`), where latency is typically under 10ms.
|
|
240
|
+
|
|
241
|
+
### Q: Can you access Tencent Cloud COS from an Alibaba Cloud instance?
|
|
242
|
+
|
|
243
|
+
**Not supported.** A Storage Connection must use the same cloud provider as the Lakehouse instance. For cross-cloud queries, the following alternatives are available:
|
|
244
|
+
|
|
245
|
+
1. Create a Lakehouse instance in the target cloud, use External Catalog for federated queries
|
|
246
|
+
2. Sync data cross-cloud to Lakehouse internal tables via data integration (Studio Sync)
|
|
247
|
+
3. Use Private Link to bridge the network, then access via External Schema
|
|
248
|
+
|
|
249
|
+
### Q: Is EVENT_NOTIFICATION mode for Pipe supported on all three clouds?
|
|
250
|
+
|
|
251
|
+
| Cloud | LIST_PURGE | EVENT_NOTIFICATION |
|
|
252
|
+
|---|---|---|
|
|
253
|
+
| Alibaba Cloud OSS | ✅ | ✅ (requires MNS queue configuration) |
|
|
254
|
+
| Tencent Cloud COS | ✅ | ❌ |
|
|
255
|
+
| AWS S3 | ✅ | ✅ (requires SQS queue configuration) |
|
|
256
|
+
|
|
257
|
+
`EVENT_NOTIFICATION` mode is supported on Alibaba Cloud OSS and AWS S3. It offers lower latency (near real-time) and does not delete source files. Tencent Cloud COS does not support it yet.
|
|
258
|
+
|
|
259
|
+
### Q: Why does COPY INTO VOLUME require SUBDIRECTORY?
|
|
260
|
+
|
|
261
|
+
`COPY INTO VOLUME` without the `SUBDIRECTORY` clause fails with `Syntax error at or near 'FROM'`. The clause is required by the SQL parser, regardless of cloud platform. To export to the Volume root path, use `SUBDIRECTORY '/'`.
|
|
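A minimal sketch of an export to the Volume root path is shown below. Only the `SUBDIRECTORY '/'` clause and its position before `FROM` come from the text above. The Volume name `orders_vol` is borrowed from the naming table, while the source table `orders`, the subquery form, and the `FILE_FORMAT` clause are assumptions that should be checked against the `COPY INTO VOLUME` reference.

```sql
-- Sketch only: `orders` and the FILE_FORMAT clause are assumptions.
-- SUBDIRECTORY must appear before FROM; '/' targets the Volume root.
COPY INTO VOLUME orders_vol
SUBDIRECTORY '/'
FROM (SELECT * FROM orders)
FILE_FORMAT = (TYPE = CSV);
```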
262
|
+
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
## Related Documents
|
|
266
|
+
|
|
267
|
+
- [Volume + Pipe Data Lake Acceleration Practice](lakehouse-volume-pipe-acceleration-guide.md) — Alibaba Cloud OSS end-to-end walkthrough
|
|
268
|
+
- [Medallion Pure SQL DT Architecture](lakehouse-medallion-sql-dt-guide.md) — Bronze→Silver→Gold three-layer modeling
|
|
269
|
+
- [Create Storage Connection](create-storage-connection.md) — Complete syntax for all three clouds
|
|
270
|
+
- [Create External Volume](create-external-volume.md) — Parameter details
|
|
271
|
+
- [Pipe Overview](pipe-overview.md) — LIST_PURGE vs EVENT_NOTIFICATION
|
|
272
|
+
- [Alibaba Cloud OSS Connection](aliyun_storage_connection.md) — OSS-specific configuration
|
|
273
|
+
- [Tencent Cloud COS Connection](cos_storage_connection.md) — COS-specific configuration
|
|
274
|
+
- [AWS S3 Connection](aws_storage_connection.md) — S3-specific configuration
|
|
@@ -0,0 +1,198 @@
|
|
|
1
|
+
# AI Gateway in Action: Analyzing Images in SQL to Build a Multimodal AI Pipeline
|
|
2
|
+
|
|
3
|
+
The image mode of `AI_COMPLETE` lets SQL analyze image content directly, not just text. Product image classification, document OCR, and anomaly detection in monitoring screenshots can each be done in a single SQL query. Combined with **Volume storage + presigned URLs**, it forms a fully managed pipeline from image upload to AI analysis.
|
|
4
|
+
|
|
5
|
+
This article uses the Kimi K2.6 multimodal model as an example to demonstrate how to build an end-to-end multimodal AI pipeline in Singdata Lakehouse.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Pipeline Architecture
|
|
10
|
+
|
|
11
|
+

|
|
12
|
+
|
|
13
|
+
Three key steps:
|
|
14
|
+
1. **`PUT`** uploads images to User Volume (internal storage, user-isolated)
|
|
15
|
+
2. **`GET_PRESIGNED_URL`** generates a time-limited OSS signed URL so the LLM can download the image
|
|
16
|
+
3. **`AI_COMPLETE` image mode** sends the image URL + Prompt to get analysis results
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Upload Images to Volume
|
|
21
|
+
|
|
22
|
+
```sql
|
|
23
|
+
-- Upload a single image to User Volume
|
|
24
|
+
PUT '/path/to/product.jpg' TO USER VOLUME FILE 'images/product.jpg';
|
|
25
|
+
|
|
26
|
+
-- Upload to a subdirectory
|
|
27
|
+
PUT '/path/to/screenshot.png' TO USER VOLUME FILE 'screenshots/error_001.png';
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
> User Volume is private to the user; other users cannot access it. For team-shared images, use an External Volume mounting OSS/COS/S3.
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## Generate Presigned URL
|
|
35
|
+
|
|
36
|
+
`GET_PRESIGNED_URL` generates a time-limited OSS signed URL for files in the Volume, which the LLM uses to download the image.
|
|
37
|
+
|
|
38
|
+
```sql
|
|
39
|
+
-- ⚠️ Must enable external mode first, otherwise an internal URL is generated (LLM cannot access)
|
|
40
|
+
SET cz.sql.function.get.presigned.url.force.external = true;
|
|
41
|
+
|
|
42
|
+
-- Generate a presigned URL valid for 1 hour
|
|
43
|
+
SELECT GET_PRESIGNED_URL(USER VOLUME, 'images/product.jpg', 3600) AS url;
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
Example return value:
|
|
47
|
+
```
|
|
48
|
+
http://cz-lh-sh-system-prod.oss-cn-shanghai.aliyuncs.com/.../product.jpg
|
|
49
|
+
?Expires=1780392774&OSSAccessKeyId=...&Signature=...
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
Parameter descriptions:
|
|
53
|
+
- First parameter: Volume type and name (`USER VOLUME` / `VOLUME my_vol`)
|
|
54
|
+
- Second parameter: File path within the Volume
|
|
55
|
+
- Third parameter: Validity period (seconds), recommended 3600 (1 hour), maximum 7 days
|
|
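The parameter list can be illustrated with the named-Volume form and the longest validity period. This is a sketch: `my_vol` is the placeholder Volume name from the list above, and the file path is reused from the earlier example.

```sql
-- Sketch: named External Volume instead of USER VOLUME.
SET cz.sql.function.get.presigned.url.force.external = true;

-- 7 days = 7 * 24 * 3600 = 604800 seconds, the stated maximum validity.
SELECT GET_PRESIGNED_URL(VOLUME my_vol, 'images/product.jpg', 604800) AS url;
```

A shorter validity such as 3600 seconds limits how long a leaked URL remains usable, which is why it is the recommended value.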
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## AI_COMPLETE Image Mode
|
|
60
|
+
|
|
61
|
+
Image mode uses the **row value syntax** `(prompt AS prompt, image_url AS image)` to pass both the Prompt and image URL together:
|
|
62
|
+
|
|
63
|
+
```sql
|
|
64
|
+
SET cz.sql.function.get.presigned.url.force.external = true;
|
|
65
|
+
|
|
66
|
+
SELECT AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
|
|
67
|
+
('Describe this image in one sentence' AS prompt,
|
|
68
|
+
GET_PRESIGNED_URL(USER VOLUME, 'ai_test/test_icon.png', 3600) AS image)
|
|
69
|
+
);
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
Output (1.3 seconds):
|
|
73
|
+
|
|
74
|
+
> The image is completely black, showing nothing visible.
|
|
75
|
+
|
|
76
|
+
### Syntax Details
|
|
77
|
+
|
|
78
|
+
| Parameter | Format | Description |
|
|
79
|
+
|---|---|---|
|
|
80
|
+
| model | `'ai_gateway_conn:moonshotai/kimi-k2.6'` | Must use a model that supports multimodal capabilities |
|
|
81
|
+
| prompt | `'Describe this image' AS prompt` | Text prompt; `AS prompt` cannot be omitted |
|
|
82
|
+
| image | `GET_PRESIGNED_URL(...) AS image` | Image URL; `AS image` cannot be omitted |
|
|
83
|
+
|
|
84
|
+
### Supported Models
|
|
85
|
+
|
|
86
|
+
Image mode requires a model with multimodal (vision) support, such as `moonshotai/kimi-k2.6` (confirmed working, 1.3s response). Text-only models will not throw an error when passed an image URL but cannot recognize image content.
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## In Practice: Batch Image Analysis
|
|
91
|
+
|
|
92
|
+
### Scenario: Automatic Product Image Classification
|
|
93
|
+
|
|
94
|
+
Suppose an e-commerce system uploads thousands of product images to OSS daily and needs to automatically identify the product type in each image.
|
|
95
|
+
|
|
96
|
+
```sql
|
|
97
|
+
-- 1. Create table to store analysis results
|
|
98
|
+
CREATE TABLE product_image_analysis (
|
|
99
|
+
image_path STRING COMMENT 'Image path',
|
|
100
|
+
category STRING COMMENT 'AI-recognized category',
|
|
101
|
+
description STRING COMMENT 'AI description',
|
|
102
|
+
analyzed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
|
|
103
|
+
);
|
|
104
|
+
|
|
105
|
+
-- 2. Read image list from External Volume, analyze in batch
|
|
106
|
+
SET cz.sql.function.get.presigned.url.force.external = true;
|
|
107
|
+
|
|
108
|
+
INSERT INTO product_image_analysis (image_path, category, description)
|
|
109
|
+
SELECT
|
|
110
|
+
relative_path AS image_path,
|
|
111
|
+
-- Classification
|
|
112
|
+
AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
|
|
113
|
+
('Classify the product into one of the following categories: electronics/clothing/food/furniture. Reply with the category name only.'
|
|
114
|
+
AS prompt,
|
|
115
|
+
GET_PRESIGNED_URL(VOLUME product_images_vol, relative_path, 3600) AS image)
|
|
116
|
+
) AS category,
|
|
117
|
+
-- Description
|
|
118
|
+
AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
|
|
119
|
+
('Describe this product in one sentence' AS prompt,
|
|
120
|
+
GET_PRESIGNED_URL(VOLUME product_images_vol, relative_path, 3600) AS image)
|
|
121
|
+
) AS description
|
|
122
|
+
FROM DIRECTORY(VOLUME product_images_vol)
|
|
123
|
+
WHERE relative_path LIKE '%.jpg'
|
|
124
|
+
AND last_modified_time > CURRENT_TIMESTAMP() - INTERVAL 1 DAY;
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
### Query Results
|
|
128
|
+
|
|
129
|
+
```sql
|
|
130
|
+
SELECT category, COUNT(*) AS cnt
|
|
131
|
+
FROM product_image_analysis
|
|
132
|
+
GROUP BY category
|
|
133
|
+
ORDER BY cnt DESC;
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### Dynamic Table for Automatic Incremental Analysis
|
|
137
|
+
|
|
138
|
+
Wrap the above logic in a Dynamic Table so new images are automatically analyzed after upload:
|
|
139
|
+
|
|
140
|
+
```sql
|
|
141
|
+
SET cz.sql.function.get.presigned.url.force.external = true;
|
|
142
|
+
|
|
143
|
+
CREATE OR REPLACE DYNAMIC TABLE gold.product_image_tags
|
|
144
|
+
REFRESH INTERVAL 1 HOUR vcluster DEFAULT
|
|
145
|
+
COMMENT 'Product image AI tags — auto incremental update'
|
|
146
|
+
AS
|
|
147
|
+
SELECT
|
|
148
|
+
relative_path AS image_path,
|
|
149
|
+
AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
|
|
150
|
+
('Classify the product as electronics/clothing/food/furniture, reply with category name only.'
|
|
151
|
+
AS prompt,
|
|
152
|
+
GET_PRESIGNED_URL(VOLUME product_images_vol, relative_path, 3600) AS image)
|
|
153
|
+
) AS category,
|
|
154
|
+
last_modified_time
|
|
155
|
+
FROM DIRECTORY(VOLUME product_images_vol)
|
|
156
|
+
WHERE relative_path LIKE '%.jpg';
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
> ⚠️ **Note**: A Dynamic Table (DT) re-calls `AI_COMPLETE` for all rows on every REFRESH, which can be costly. A regular table with a scheduled INSERT that analyzes only newly added images is recommended instead.
|
|
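The regular table plus scheduled INSERT approach can be sketched as follows. This is an illustrative sketch rather than a confirmed pattern: it reuses `product_image_analysis` and `product_images_vol` from the batch example, and it assumes that `DIRECTORY(VOLUME ...)` accepts a table alias and can be joined like an ordinary table. How the statement is scheduled is left open.

```sql
-- Sketch: analyze only images that have no row in the results table yet.
-- Assumption: DIRECTORY(VOLUME ...) can be aliased and joined.
SET cz.sql.function.get.presigned.url.force.external = true;

INSERT INTO product_image_analysis (image_path, category, description)
SELECT
    d.relative_path AS image_path,
    AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
        ('Classify the product into one of the following categories: electronics/clothing/food/furniture. Reply with the category name only.'
            AS prompt,
         GET_PRESIGNED_URL(VOLUME product_images_vol, d.relative_path, 3600) AS image)
    ) AS category,
    AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
        ('Describe this product in one sentence' AS prompt,
         GET_PRESIGNED_URL(VOLUME product_images_vol, d.relative_path, 3600) AS image)
    ) AS description
FROM DIRECTORY(VOLUME product_images_vol) d
LEFT JOIN product_image_analysis a
    ON a.image_path = d.relative_path
WHERE d.relative_path LIKE '%.jpg'
  AND a.image_path IS NULL;  -- keep only paths with no existing result
```

Because the anti-join matches on path alone, an image that is overwritten under the same path would not be re-analyzed by this sketch.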
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## Applicable Scenarios
|
|
164
|
+
|
|
165
|
+
| Scenario | Implementation |
|
|
166
|
+
|---|---|
|
|
167
|
+
| Automatic product image classification | AI_COMPLETE image mode + classification Prompt |
|
|
168
|
+
| Document OCR extraction | "Extract all text from the image" |
|
|
169
|
+
| Monitoring screenshot anomaly detection | "Determine if there is an anomaly in the image, reply normal/abnormal only" |
|
|
170
|
+
| Social media image content moderation | "Determine if the image content violates rules, reply safe/unsafe only" |
|
|
171
|
+
| Invoice/document information extraction | "Extract the amount, date, and company name from the image" |
|
|
172
|
+
| Preliminary medical imaging screening | "Determine if there is an abnormal area in the image, reply yes/no only" |
|
|
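Each row in the table changes only the Prompt; the call shape stays the same. As a sketch, the monitoring screenshot scenario applied to the `screenshots/error_001.png` file uploaded earlier could look like this (the `status` column alias is illustrative):

```sql
-- Sketch: anomaly check on a single screenshot stored in User Volume.
SET cz.sql.function.get.presigned.url.force.external = true;

SELECT AI_COMPLETE('ai_gateway_conn:moonshotai/kimi-k2.6',
    ('Determine if there is an anomaly in the image, reply normal/abnormal only' AS prompt,
     GET_PRESIGNED_URL(USER VOLUME, 'screenshots/error_001.png', 3600) AS image)
) AS status;
```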
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
## Notes
|
|
177
|
+
|
|
178
|
+
| Note | Description |
|
|
179
|
+
|---|---|
|
|
180
|
+
| **Presigned URL must use force.external** | `SET cz.sql.function.get.presigned.url.force.external = true`, otherwise generates internal URL |
|
|
181
|
+
| **Image formats** | Supports PNG, JPG, JPEG. SVG files have a null MIME type, which causes an `Invalid image MIME type` error |
|
|
182
|
+
| **Model selection** | Must use a multimodal (vision) model. Text-only models won't error but cannot recognize image content |
|
|
183
|
+
| **URL validity** | The recommended presigned URL validity is 3600 seconds (1 hour); the maximum is 7 days |
|
|
184
|
+
| **DT cost** | `AI_COMPLETE` in a Dynamic Table (DT) fully recomputes on every REFRESH; use a regular table + scheduled INSERT instead |
|
|
185
|
+
| **Concurrency limits** | For large-scale concurrent image analysis, watch AI Gateway TPM/RPM quota limits |
|
|
186
|
+
| **File size** | Keep images under 10MB; oversized images may time out or consume too many tokens |
|
|
187
|
+
| **User Volume isolation** | User Volume is private; other users cannot see images you upload |
|
|
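The image format note can be applied as a pre-filter before any `AI_COMPLETE` call, so that unsupported files such as SVG never reach the model. The following is a minimal sketch that reuses `product_images_vol` from the batch example and assumes the standard `LOWER` function is available for case-insensitive matching.

```sql
-- Sketch: list only files with a supported image extension.
SELECT relative_path
FROM DIRECTORY(VOLUME product_images_vol)
WHERE LOWER(relative_path) LIKE '%.png'
   OR LOWER(relative_path) LIKE '%.jpg'
   OR LOWER(relative_path) LIKE '%.jpeg';
```

Matching on the extension is only a heuristic; a file with a misleading extension could still trigger the MIME type error.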
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
## Related Documents
|
|
192
|
+
|
|
193
|
+
- [AI_COMPLETE documentation](ai_complete.md) — Image mode parameters, options configuration
|
|
194
|
+
- [AI-Enhanced Data Analysis](lakehouse-ai-sql-analysis.md) — AI_COMPLETE text mode in practice
|
|
195
|
+
- [Volume Overview](volume-overview.md) — User Volume and External Volume concepts
|
|
196
|
+
- [GET_PRESIGNED_URL](put_get_volume.md) — Presigned URL generation
|
|
197
|
+
- [AI Gateway Product Overview](AI_Gateway.md) — Endpoint management, quota control
|
|
198
|
+
- [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — Volume + Pipe cross-cloud approach
|