@clickzetta/cz-cli-darwin-x64 0.5.16 → 0.5.17
- package/bin/cz-cli +0 -0
- package/bin/skills/lakehouse-doc-en/SKILL.md +6 -11
- package/bin/skills/lakehouse-doc-en/references/AIGateway.md +58 -13
- package/bin/skills/lakehouse-doc-en/references/Computation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/DataSource_Amazon_DocumentDB.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/Foreach.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/JDBC-Driver.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/LakehouseAI-overview.md +21 -8
- package/bin/skills/lakehouse-doc-en/references/LakehouseDataGPT-tour.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/LakehouseStudio-tour.md +14 -19
- package/bin/skills/lakehouse-doc-en/references/Lakehouse_Zilliz_MakeDataReadyforBIandAI.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Logstash.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/Migrate_Spark_DataEngineeringBestPractices_Project_to_Lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/Notebook.md +17 -17
- package/bin/skills/lakehouse-doc-en/references/RemoteFunction-as-udf.md +14 -14
- package/bin/skills/lakehouse-doc-en/references/SQL_External_Catalog_Guide.md +1 -9
- package/bin/skills/lakehouse-doc-en/references/SUMMARY.md +59 -29
- package/bin/skills/lakehouse-doc-en/references/WINDOWFUNCTION.md +99 -57
- package/bin/skills/lakehouse-doc-en/references/Zettapark_Data_Engineering_Demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/access-control-configuration.md +1 -8
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-2-5-1.0.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-29-1.0.2.md +14 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-3-8-1.0.1.md +16 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-4-28-1.1.md +29 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-12-1.1.1.md +18 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-15-1.2.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-21-1.3.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-5-28-1.4.md +10 -0
- package/bin/skills/lakehouse-doc-en/references/aigw-2026-6-3-1.5.md +9 -0
- package/bin/skills/lakehouse-doc-en/references/alicloud-arn-externalid.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/answer-accuracy-improve.md +120 -103
- package/bin/skills/lakehouse-doc-en/references/application-list.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/approval-list.md +16 -17
- package/bin/skills/lakehouse-doc-en/references/batch-load-parquet-file-into-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/batch_sync.md +9 -9
- package/bin/skills/lakehouse-doc-en/references/batch_sync_Sop.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/batchloadparquetfileintoLakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/bulkloadv1-python-sdk.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/chart-auto-refresh-guide.md +12 -6
- package/bin/skills/lakehouse-doc-en/references/clickzetta-sample-data.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/code_approval.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/composite_task.md +31 -42
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_environment_and_data_generate.md +6 -9
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_javasdk_bulkload_realtime.md +4 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_kafka_realtime_sync.md +1 -10
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_local_file_into_table_by_studio.md +0 -6
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_batchload_public_network.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_python_node.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_realtime_cdc_public_network.md +13 -18
- package/bin/skills/lakehouse-doc-en/references/comprehensive_guide_to_ingesting_studio_sql_insert.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/concepts.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/config-datasource.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/connect-with-cli.md +116 -72
- package/bin/skills/lakehouse-doc-en/references/connect-with-cz-cli.md +151 -0
- package/bin/skills/lakehouse-doc-en/references/continue-job.md +9 -17
- package/bin/skills/lakehouse-doc-en/references/create-api-connection.md +315 -286
- package/bin/skills/lakehouse-doc-en/references/create-catalog-connection.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/create-dynamic-table.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/create-external-catalog.md +85 -22
- package/bin/skills/lakehouse-doc-en/references/create-table-ddl.md +45 -0
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkendpoint.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/creating_alicloud_privatelinkservice.md +4 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkendpoint.md +2 -7
- package/bin/skills/lakehouse-doc-en/references/creating_tencentcloud_privatelinkservice.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/cz-cli-agent.md +15 -10
- package/bin/skills/lakehouse-doc-en/references/cz-cli-datasource.md +0 -8
- package/bin/skills/lakehouse-doc-en/references/cz-cli-sql.md +2 -45
- package/bin/skills/lakehouse-doc-en/references/cz-cli.md +53 -42
- package/bin/skills/lakehouse-doc-en/references/dashboard-version-management-guide.md +12 -4
- package/bin/skills/lakehouse-doc-en/references/data-integration-intro.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/data-integration.md +29 -27
- package/bin/skills/lakehouse-doc-en/references/data-load-summary.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/data-quality.md +25 -25
- package/bin/skills/lakehouse-doc-en/references/data-sharing.md +31 -54
- package/bin/skills/lakehouse-doc-en/references/data-sources.md +45 -45
- package/bin/skills/lakehouse-doc-en/references/data_catalog.md +23 -25
- package/bin/skills/lakehouse-doc-en/references/data_privacy.md +5 -2
- package/bin/skills/lakehouse-doc-en/references/data_sharing_between_accounts_guide.md +0 -4
- package/bin/skills/lakehouse-doc-en/references/data_visualization.md +4 -15
- package/bin/skills/lakehouse-doc-en/references/dataagent.md +39 -7
- package/bin/skills/lakehouse-doc-en/references/databricks-delta-to-lakehouse-migration.md +168 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-dlt-to-lakehouse-migration.md +331 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-external-catalog-practice.md +367 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-jobs-to-studio-migration.md +199 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-notebook-to-studio-migration.md +350 -0
- package/bin/skills/lakehouse-doc-en/references/databricks-uc-governance-to-lakehouse-migration.md +327 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md +34 -0
- package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md +50 -37
- package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md +55 -79
- package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md +50 -64
- package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md +75 -2
- package/bin/skills/lakehouse-doc-en/references/dbt-databricks-to-clickzetta-migration.md +242 -0
- package/bin/skills/lakehouse-doc-en/references/dynamic-mask.md +30 -30
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-bestpractice.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic-table-introduce.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/dynamic_table_summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/streamlit.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/eco_integration/superset.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/ecosystem-all.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/ecosystem.md +145 -0
- package/bin/skills/lakehouse-doc-en/references/external-catalog-summary.md +33 -38
- package/bin/skills/lakehouse-doc-en/references/external-function-combo-practice.md +466 -0
- package/bin/skills/lakehouse-doc-en/references/f6fc6447ee.md +7 -9
- package/bin/skills/lakehouse-doc-en/references/federation-query.md +56 -6
- package/bin/skills/lakehouse-doc-en/references/finebi-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/get-started-with-sample-data.md +10 -11
- package/bin/skills/lakehouse-doc-en/references/gitfolder.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/grant-privileges.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/iceberg-rest-catalog-databricks.md +166 -0
- package/bin/skills/lakehouse-doc-en/references/ide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/if_else_task.md +59 -57
- package/bin/skills/lakehouse-doc-en/references/input_output.md +10 -7
- package/bin/skills/lakehouse-doc-en/references/jobprofile-bestpractices.md +60 -64
- package/bin/skills/lakehouse-doc-en/references/kafka-connection.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/key-concepts.md +146 -117
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-gateway-cz-cli.md +317 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-ai-sql-analysis.md +345 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-dqc-guide.md +300 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-medallion-sql-dt-guide.md +543 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multi-cloud-acceleration.md +274 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-multimodal-ai-pipeline.md +198 -0
- package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md +49 -52
- package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md +380 -0
- package/bin/skills/lakehouse-doc-en/references/langchain-plug-installation.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/management.md +4 -9
- package/bin/skills/lakehouse-doc-en/references/medallion-lakehouse-from-scratch.md +2 -1
- package/bin/skills/lakehouse-doc-en/references/metrics_answer_build.md +58 -21
- package/bin/skills/lakehouse-doc-en/references/migrate-spark-data-engineering-best-practices-to-lakehouse.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/mindsdb.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/monitoring_and_alerting.md +65 -60
- package/bin/skills/lakehouse-doc-en/references/monitoring_item_specification.md +33 -33
- package/bin/skills/lakehouse-doc-en/references/multitable_batch_sync.md +16 -16
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync.md +65 -72
- package/bin/skills/lakehouse-doc-en/references/multitable_realtime_sync_sop.md +54 -52
- package/bin/skills/lakehouse-doc-en/references/navicat-mysql.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/om-dynamic-table.md +71 -66
- package/bin/skills/lakehouse-doc-en/references/om-vcluster.md +2 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-create-session.md +79 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-generate-auth-token.md +63 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-overview.md +96 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-quick-start.md +286 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-response-guide.md +264 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-safe-question-poll.md +201 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-query.md +99 -0
- package/bin/skills/lakehouse-doc-en/references/open-api-text2insight-stop.md +74 -0
- package/bin/skills/lakehouse-doc-en/references/overview.md +6 -7
- package/bin/skills/lakehouse-doc-en/references/permission-application.md +5 -5
- package/bin/skills/lakehouse-doc-en/references/pipe-introduction.md +1 -0
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka-table-stream.md +72 -70
- package/bin/skills/lakehouse-doc-en/references/pipe-kafka.md +105 -110
- package/bin/skills/lakehouse-doc-en/references/pipe-overview.md +40 -40
- package/bin/skills/lakehouse-doc-en/references/pipe-storage-object.md +43 -48
- package/bin/skills/lakehouse-doc-en/references/pipe-summary.md +14 -4
- package/bin/skills/lakehouse-doc-en/references/pipe-syntax.md +58 -151
- package/bin/skills/lakehouse-doc-en/references/practice_python_task.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/pricing-ai-gateway.md +181 -0
- package/bin/skills/lakehouse-doc-en/references/pricing-lakehouse.md +316 -0
- package/bin/skills/lakehouse-doc-en/references/pricing.md +44 -288
- package/bin/skills/lakehouse-doc-en/references/private-link-general.md +0 -2
- package/bin/skills/lakehouse-doc-en/references/pyspark-to-zettapark-migration-f1.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-igs.md +7 -3
- package/bin/skills/lakehouse-doc-en/references/python-sample-put-github-rt-events.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_advanced.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_reference/connector_examples.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/python_sdk_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/python_shell_datasource.md +11 -9
- package/bin/skills/lakehouse-doc-en/references/quick_start_batch_sync_data.md +9 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_bi_analysis.md +8 -25
- package/bin/skills/lakehouse-doc-en/references/quick_start_create_workspace.md +4 -6
- package/bin/skills/lakehouse-doc-en/references/quick_start_data_quality.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_etl.md +16 -20
- package/bin/skills/lakehouse-doc-en/references/quick_start_monitoring_and_alerting.md +10 -18
- package/bin/skills/lakehouse-doc-en/references/quick_start_sql_query.md +7 -10
- package/bin/skills/lakehouse-doc-en/references/quick_start_upload_data.md +5 -7
- package/bin/skills/lakehouse-doc-en/references/quick_start_user_management.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quick_start_workspace_user.md +8 -8
- package/bin/skills/lakehouse-doc-en/references/quickstart.md +69 -56
- package/bin/skills/lakehouse-doc-en/references/quickstart_datashare_between_companies.md +0 -5
- package/bin/skills/lakehouse-doc-en/references/quickstart_envirment_for_team.md +0 -24
- package/bin/skills/lakehouse-doc-en/references/realtime-pipeline-selection-guide.md +1 -2
- package/bin/skills/lakehouse-doc-en/references/realtime-sales-dashboard-with-dynamic-table.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/realtime_sync.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/release-note-2026-05-19.md +5 -3
- package/bin/skills/lakehouse-doc-en/references/revoke-privileges.md +3 -1
- package/bin/skills/lakehouse-doc-en/references/roles.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/row-filter.md +165 -0
- package/bin/skills/lakehouse-doc-en/references/row_level_permission.md +30 -19
- package/bin/skills/lakehouse-doc-en/references/scheduled_task.md +28 -21
- package/bin/skills/lakehouse-doc-en/references/security_overview.md +99 -21
- package/bin/skills/lakehouse-doc-en/references/set-command.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/setup.md +13 -15
- package/bin/skills/lakehouse-doc-en/references/show-grants.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/snowflake-dynamic-tables-to-lakehouse.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/spark-connector-summary.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sql_functions/context_functions/current_vcluster.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/sso-configuration.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/streaming_pipeline_with_dynamic_table.md +0 -1
- package/bin/skills/lakehouse-doc-en/references/studio-incremental-sync-practice.md +27 -23
- package/bin/skills/lakehouse-doc-en/references/studio-shell-task.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/supported-cloud-platforms.md +32 -0
- package/bin/skills/lakehouse-doc-en/references/table_rendering.md +18 -12
- package/bin/skills/lakehouse-doc-en/references/task-develop.md +89 -91
- package/bin/skills/lakehouse-doc-en/references/task_development.md +19 -17
- package/bin/skills/lakehouse-doc-en/references/task_group.md +16 -14
- package/bin/skills/lakehouse-doc-en/references/task_instance.md +21 -21
- package/bin/skills/lakehouse-doc-en/references/task_param.md +38 -35
- package/bin/skills/lakehouse-doc-en/references/task_param_reference.md +81 -79
- package/bin/skills/lakehouse-doc-en/references/task_scheduling_dependency.md +20 -21
- package/bin/skills/lakehouse-doc-en/references/tencentcloud_arn_and_externalid.md +1 -5
- package/bin/skills/lakehouse-doc-en/references/trial-account-quotas-and-limits.md +1 -3
- package/bin/skills/lakehouse-doc-en/references/tutorial_connect_to_lakehouse.md +69 -0
- package/bin/skills/lakehouse-doc-en/references/tutorials.md +4 -1
- package/bin/skills/lakehouse-doc-en/references/unique-key.md +167 -0
- package/bin/skills/lakehouse-doc-en/references/usageandbillingview.md +138 -0
- package/bin/skills/lakehouse-doc-en/references/use-dbt-dev.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-realtime-uploaddata.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/use-java-sdk-upload-data-local.md +3 -3
- package/bin/skills/lakehouse-doc-en/references/use-models.md +128 -0
- package/bin/skills/lakehouse-doc-en/references/use-mysql-client.md +81 -81
- package/bin/skills/lakehouse-doc-en/references/use-python-sdk-upload-data.md +10 -12
- package/bin/skills/lakehouse-doc-en/references/user-identification.md +2 -3
- package/bin/skills/lakehouse-doc-en/references/user_permission_grand_guide.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/using-udf-in-dynamic-table.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/vc_cache.md +18 -22
- package/bin/skills/lakehouse-doc-en/references/vcluster_size_description.md +33 -31
- package/bin/skills/lakehouse-doc-en/references/virtual-cluster.md +43 -45
- package/bin/skills/lakehouse-doc-en/references/web-job-history.md +94 -108
- package/bin/skills/lakehouse-doc-en/references/web_search.md +16 -7
- package/bin/skills/lakehouse-doc-en/references/zettapark-data-engineering-demo.md +1 -1
- package/bin/skills/lakehouse-doc-en/references/zettapark-dataframe-guide.md +144 -70
- package/bin/skills/lakehouse-doc-en/references/zettapark-dynamic-table-guide.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-etl-guide.md +73 -33
- package/bin/skills/lakehouse-doc-en/references/zettapark-feature-engineering.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-functions-guide.md +75 -46
- package/bin/skills/lakehouse-doc-en/references/zettapark-quick-start.md +2 -2
- package/bin/skills/lakehouse-doc-en/references/zettapark-stream-guide.md +4 -4
- package/bin/skills/lakehouse-doc-en/references/zettapark-volume-guide.md +93 -29
- package/package.json +1 -1
- package/bin/skills/lakehouse-doc-en/references/CLAUDE.md +0 -606
- package/bin/skills/lakehouse-doc-en/references/modelprice.md +0 -155
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
# Model Selection and Configuration
|
|
2
|
+
|
|
3
|
+
Analytics Agent supports multiple large language models. Administrators can select which large language models are used for conversational analysis on behalf of the team. Once configured, users can choose from these models on the conversation page.
|
|
4
|
+
|
|
5
|
+
## Switching Models in a Conversation
|
|
6
|
+
|
|
7
|
+
Click the model icon on the left side of the input box to open a dropdown list of available models, then click a model to switch to it. Models marked **Suggest** are recommended by the system.
|
|
8
|
+
|
|
9
|
+
:-: 
|
|
10
|
+
|
|
11
|
+
> 💡 **Tip**: The **Model Configuration List** at the bottom of the list links directly to the model configuration page.
|
|
12
|
+
|
|
13
|
+
## Configuring Available Models (Admin)
|
|
14
|
+
|
|
15
|
+
This page controls which models appear in the conversation dropdown list.
|
|
16
|
+
|
|
17
|
+
**Entry point**: Left navigation bar → Admin → Model Configuration
|
|
18
|
+
|
|
19
|
+
:-: 
|
|
20
|
+
|
|
21
|
+
Toggle the switch on a model card to make that model immediately visible to all users; turning it off hides it from the list. The page supports filtering by provider and searching by name.
|
|
22
|
+
|
|
23
|
+
If the model you need is not in the list (for example, a model your enterprise hosts itself), click "Go to AI Gateway to create a new model" at the bottom of the page. After configuring it in AI Gateway, return to the model configuration page and turn on the switch on its model card.
|
|
24
|
+
|
|
25
|
+
:-: 
|
|
26
|
+
|
|
27
|
+
## Related Documentation
|
|
28
|
+
|
|
29
|
+
* [AI Gateway](AIGateway.md) — Connect self-built or third-party models
|
|
30
|
+
* [Data Source Management](datagpt_data_source.md) — Configure the data sources for your analytics domain
|
|
31
|
+
* [Improving Q\&A Accuracy](answer-accuracy-improve.md) — After completing model configuration, use the semantic layer to further improve answer quality
|
|
32
|
+
* [Conversational Data Analysis (Analytics Agent)](datagpt_introduction.md) — Return to the feature overview
|
|
33
|
+
|
|
34
|
+
^
|
|
@@ -1,57 +1,70 @@
|
|
|
1
1
|
# Data Source Management
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Data sources are the foundation of data analysis in Analytics Agent. Connect your database to the system first; Analytics Agent can then answer questions based on that data.
|
|
4
4
|
|
|
5
|
-
##
|
|
5
|
+
## Navigation Entry
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
Left navigation bar → **Management** → **Data Sources**
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+

|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
## Supported Data Source Types
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
Click + **New Data Source** in the upper right corner and select a data source type:
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
:-: 
|
|
16
16
|
|
|
17
|
-
|
|
|
18
|
-
|
|
|
19
|
-
|
|
|
20
|
-
|
|
|
21
|
-
|
|
|
22
|
-
|
|
|
23
|
-
| Analytical Computing Cluster Name | No | Specify the computing cluster used for data analysis |
|
|
17
|
+
| Type | Applicable Scenario |
|
|
18
|
+
| -------------- | --------------------------- |
|
|
19
|
+
| **LakeHouse** | Singdata Lakehouse instance |
|
|
20
|
+
| **Databricks** | Databricks data platform |
|
|
21
|
+
| **MySQL** | MySQL database |
|
|
22
|
+
| **StarRocks** | StarRocks database |
|
|
24
23
|
|
|
25
|
-
##
|
|
24
|
+
## Configuration Guide
|
|
26
25
|
|
|
27
|
-
|
|
28
|
-
2. Fill in the database connection string
|
|
26
|
+
### LakeHouse
|
|
29
27
|
|
|
30
|
-

|
|
31
29
|
|
|
32
|
-
|
|
30
|
+
| Field | Description |
|
|
31
|
+
| ------------------ | ------------------------------------------------------------------------- |
|
|
32
|
+
| Data source name | Must be unique within the system |
|
|
33
|
+
| Username | Database username |
|
|
34
|
+
| Password | Database password |
|
|
35
|
+
| JDBC URL | Connection address, format: `jdbc:clickzetta://[host]/[workspace]` |
|
|
36
|
+
| AP Virtual Cluster | Name of the analytical Virtual Cluster; defaults to DEFAULT if left blank |
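The JDBC URL format and the blank-cluster default in the table above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the host and workspace values are placeholders, and nothing here is part of the product's API.

```python
def lakehouse_jdbc_url(host: str, workspace: str) -> str:
    """Fill in the documented template jdbc:clickzetta://[host]/[workspace].

    Hypothetical helper for illustration; not a product API.
    """
    if not host or not workspace:
        raise ValueError("host and workspace are both required")
    return f"jdbc:clickzetta://{host}/{workspace}"


def effective_virtual_cluster(ap_virtual_cluster: str = "") -> str:
    """Mirror the documented rule: a blank AP Virtual Cluster means DEFAULT."""
    name = ap_virtual_cluster.strip()
    return name if name else "DEFAULT"


# Placeholder values; substitute your own host and workspace.
url = lakehouse_jdbc_url("example-host.invalid", "my_workspace")
cluster = effective_virtual_cluster("")
```

Keeping the default in one small function makes it easy to see which cluster a data source would use when the field is left empty.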
|
|
33
37
|
|
|
34
|
-
|
|
35
|
-
2. Fill in the analytical computing cluster name
|
|
36
|
-
3. Click the "Connection Test" button to verify if the configuration is correct
|
|
37
|
-
4. After passing the test, click the "Save" button to complete the addition
|
|
38
|
+
### MySQL / StarRocks
|
|
38
39
|
|
|
39
|
-
|
|
40
|
+

|
|
40
41
|
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
42
|
+
| Field | Description |
|
|
43
|
+
| ---------------- | ------------------------------------------------------------------- |
|
|
44
|
+
| Data source name | Must be unique within the system |
|
|
45
|
+
| Username | Database username |
|
|
46
|
+
| Password | Database password |
|
|
47
|
+
| JDBC URL | Connection address, format: `jdbc:mysql://[host]:[port]/[database]` |
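Before clicking Connection Test, it can help to check that a URL matches the `jdbc:mysql://[host]:[port]/[database]` format shown above. The following Python sketch does that with a strict regular expression. The function name and example values are hypothetical, and the pattern deliberately rejects anything beyond the documented three parts (for instance trailing query parameters), which real drivers may accept.

```python
import re

# Strict pattern for the documented format: jdbc:mysql://[host]:[port]/[database]
_MYSQL_JDBC = re.compile(
    r"^jdbc:mysql://(?P<host>[^/:\s]+):(?P<port>\d+)/(?P<database>[^/?\s]+)$"
)


def parse_mysql_jdbc_url(url: str) -> dict:
    """Split a MySQL-style JDBC URL into host, port, and database.

    Hypothetical helper for illustration; raises ValueError on a mismatch.
    """
    match = _MYSQL_JDBC.match(url)
    if match is None:
        raise ValueError(f"not in jdbc:mysql://[host]:[port]/[database] form: {url!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])  # convert the port from text to a number
    return parts


# Placeholder values; substitute your own host, port, and database.
parts = parse_mysql_jdbc_url("jdbc:mysql://db.example.invalid:3306/sales")
```

A URL that omits the port or the database name raises `ValueError`, which surfaces a formatting mistake before the connection attempt.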
|
|
45
48
|
|
|
46
|
-
|
|
49
|
+
### Databricks
|
|
47
50
|
|
|
48
|
-
|
|
|
49
|
-
|
|
|
50
|
-
|
|
|
51
|
-
|
|
|
52
|
-
|
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
51
|
+
| Field | Description |
|
|
52
|
+
| ---------------- | ------------------------------------------------- |
|
|
53
|
+
| Data source name | Must be unique within the system |
|
|
54
|
+
| JDBC URL | JDBC connection address of the Databricks cluster |
|
|
55
|
+
| Password | Access token (Personal Access Token) |
|
|
56
|
+
|
|
57
|
+
## Connection Test and Save
|
|
58
|
+
|
|
59
|
+
After filling in all fields, click **Connection Test** to verify connectivity. Once the test passes, click **Save** to complete the setup.
|
|
60
|
+
|
|
61
|
+
> ⚠️ **Note**: After a data source is saved, modifying the connection information may cause associated analysis domain data to become unavailable. Proceed with caution.
|
|
62
|
+
|
|
63
|
+
## Related Documentation
|
|
64
|
+
|
|
65
|
+
* [Quick Start](datagpt_quickstart.md) — After adding a data source, follow this guide to complete your first Q\&A configuration
|
|
66
|
+
* [Model Selection and Configuration](datagpt-model-config.md) — Choose the right LLM for your analysis domain
|
|
67
|
+
* [Answer Accuracy Improvement](answer-accuracy-improve.md) — After connecting a data source, improve answer quality through semantic layer configuration
|
|
68
|
+
* [Conversational Data Analytics (Analytics Agent)](datagpt_introduction.md) — Return to feature overview
|
|
56
69
|
|
|
57
70
|
^
|
|
@@ -1,129 +1,105 @@
|
|
|
1
|
-
# Conversational
|
|
1
|
+
# Conversational Data Analysis (Analytics Agent)
|
|
2
2
|
|
|
3
|
-
Analytics Agent
|
|
3
|
+
Analytics Agent (formerly known as DataGPT) is a built-in conversational data analysis product in Singdata Lakehouse. Business users ask questions in natural language, and the system automatically generates SQL, executes queries, and returns charts and insights, with no coding required. Data developers improve Q&A accuracy by configuring a semantic layer (metrics, business terms, knowledge documents, answer builders).
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-

|
|
8
|
-
|
|
9
|
-
^
|
|
5
|
+

|
|
10
6
|
|
|
11
|
-
## When to Use
|
|
7
|
+
## When to Use
|
|
12
8
|
|
|
13
9
|
| Scenario | Suitable? |
|
|
14
|
-
|
|
10
|
+
|------|----------|
|
|
15
11
|
| Business users querying data and viewing trends via natural language | ✅ Core use case |
|
|
16
12
|
| Quickly generating AI dashboards without writing SQL | ✅ |
|
|
17
13
|
| Automatic anomaly detection and alerting | ✅ |
|
|
14
|
+
| Scheduled data report delivery | ✅ |
|
|
18
15
|
| Precise SQL logic control, complex ETL | ❌ Use Studio SQL tasks |
|
|
19
|
-
| Vector search / RAG Q&A | ❌ Use
|
|
16
|
+
| Vector search / RAG Q&A | ❌ Use [Vector Search](vector_search_ai.md) + [AI Functions](AI_function_in_SQL.md) |
|
|
20
17
|
|
|
21
18
|
## Quick Start
|
|
22
19
|
|
|
23
|
-
**① Activate the service** (1
|
|
20
|
+
**① Activate the service** (1 minute)
|
|
24
21
|
|
|
25
22
|
Find the Analytics Agent product card on the management center homepage and click "Free Activation". We recommend that new users check "Also activate a Lakehouse instance as the default data source"; the system then configures sample data automatically.
|
|
26
23
|
|
|
27
|
-
**② Try with sample data** (5
|
|
24
|
+
**② Try with sample data** (5 minutes)
|
|
28
25
|
|
|
29
26
|
Go to the product homepage, find the analysis domain marked "Sample", click "Start Analysis", and ask questions in natural language:
|
|
27
|
+
|
|
30
28
|
- "What is the average second-hand housing price by district?"
|
|
31
29
|
- "Which district has the highest listing volume?"
|
|
32
30
|
- "Generate a housing price trend dashboard for me"
|
|
33
31
|
|
|
34
|
-
**③ Connect your own data** (as needed)
|
|
35
|
-
|
|
36
|
-
Add a data source (supports uploading Excel/CSV files or connecting Lakehouse data tables) → Create an analysis domain → Configure the semantic layer (business terms, metric definitions, table relationships, answer builders, knowledge documents, data annotations to help the Agent understand your business) → Start conversational analysis
|
|
37
|
-
|
|
38
|
-
## Conceptual Framework:
|
|
39
|
-
|
|
40
|
-
**Core Concepts**:
|
|
41
|
-
The core conceptual framework consists of two main components: Data Assets and Analysis Domains.
|
|
42
|
-
|
|
43
|
-
**Data Assets**
|
|
44
|
-
As the infrastructure for enterprise analytics, it encompasses all core elements available for intelligent analysis, enhanced through the Analytics Agent Semantic Layer:
|
|
45
|
-
|
|
46
|
-
* **Data Tables**: Structured basic data sources from Lakehouse.
|
|
47
|
-
|
|
48
|
-
* **Semantic Layer Elements**:
|
|
32
|
+
**③ Connect your own data** (as needed, completed by data developers)
|
|
49
33
|
|
|
50
|
-
|
|
51
|
-
* **Business Terms**: Unified naming conventions and explanatory definitions designed to provide context for the agent.
|
|
34
|
+
Add a data source → Create an analysis domain → Configure the semantic layer → Start conversational analysis.
|
|
52
35
|
|
|
53
|
-
|
|
36
|
+
Supported data source types:
|
|
54
37
|
|
|
55
|
-
|
|
38
|
+
| Type | Data Source |
|
|
39
|
+
|------|--------|
|
|
40
|
+
| Data Warehouse / Lakehouse | Lakehouse (default), Databricks |
|
|
41
|
+
| Relational Database | MySQL, StarRocks |
|
|
42
|
+
| Files | Excel, CSV upload |
|
|
56
43
|
|
|
57
|
-
|
|
44
|
+
→ [Detailed steps in the Quick Start guide](datagpt_quickstart.md)
|
|
58
45
|
|
|
59
|
-
|
|
46
|
+
## Core Concepts
|
|
60
47
|
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
## User Roles and Responsibilities:
|
|
64
|
-
|
|
65
|
-
The Analytics Agent system is designed to serve two core user groups in data analysis scenarios: data developers and business analysts. These two types of users play unique and complementary roles in the process of extracting data value:
|
|
66
|
-
|
|
67
|
-
1. **Data Developers**: Lead the full data lifecycle management, including data ingestion, quality control, model building, and semantic layer design (covering metric systems and answer builders), while continuously optimizing the Q&A experience. They leverage system capabilities to prepare data for use by business analysts.
|
|
68
|
-
2. **Business Analysts**: As the core users of the system, they explore data deeply through natural language interaction, quickly obtaining business insights and decision support. Through the feedback process, they communicate with data developers to further refine and explore data, gaining deeper understanding and insights.
|
|
69
|
-
|
|
70
|
-
## Technical Architecture:
|
|
71
|
-
|
|
72
|
-
Multi-source and multi-type data enters the Lakehouse system through warehousing and data lake ingestion (when Lakehouse is chosen as the data engine):
|
|
48
|
+
### Analysis Domain
|
|
73
49
|
|
|
74
|
-
|
|
75
|
-
* Data undergoes transformation processing and information extraction through our integrated Single Engine and AI engine.
|
|
76
|
-
* Extraction results are stored in the form of tables, vectors, and inverted indexes, building an Agentic RAG Preparation Layer for the agent. These are then further processed by the Analytics Agent Semantic Layer, which performs automated feature analysis, knowledge graph construction, and index extraction.
|
|
77
|
-
* Based on the DIKW model, the Agentic RAG layer provides "Information," while the Analytics Agent Semantic Layer elevates it to "Knowledge" by annotating, organizing, and summarizing context. This architecture enables the agent to autonomously plan and reason, laying a solid foundation for generative AI applications.
|
|
78
|
-
* Agentic RAG: A Semantic Paradigm Shift
|
|
79
|
-
Analytics Agent transcends the linear "retrieve-then-generate" pipeline. By implementing Agentic RAG, we transform the LLM from a passive text generator into a proactive Reasoning Agent within the Analysis Domain.
|
|
80
|
-
* **LLM-Driven Understanding**: Rather than relying solely on vector distance (cosine similarity), Analytics Agent leverages the LLM's internal cognition to interpret user intent. The model determines "what is needed" rather than simply matching keywords.
|
|
50
|
+
An analysis domain is the workspace for Q&A: it groups data tables, the semantic layer, and knowledge documents. Create a separate analysis domain for each business area (for example sales, finance, or operations) to reduce cross-domain interference and to isolate data permissions at the domain level.
|
|
81
51
|
|
|
82
|
-
|
|
52
|
+
### Semantic Layer
|
|
83
53
|
|
|
84
|
-
|
|
85
|
-
* Whether to read specific **Files**
|
|
86
|
-
* Whether to check **Metric** definitions
|
|
54
|
+
The semantic layer is key to improving Q&A accuracy. It includes four capabilities:
|
|
87
55
|
|
|
88
|
-
|
|
89
|
-
|
|
56
|
+
| Capability | Purpose | When to Use |
|
|
57
|
+
|------|------|----------|
|
|
58
|
+
| **Schema Description** (table/column descriptions, aliases) | Helps the model understand field meanings and business names | When the model selects the wrong table/column, or field names are ambiguous |
|
|
59
|
+
| **Metrics** | Pre-defines precise calculation definitions | When core business metrics need unified definitions |
|
|
60
|
+
| **Answer Builders** | Provides fixed SQL templates | For complex multi-table JOINs and fixed calculation logic |
|
|
61
|
+
| **Knowledge Documents** | Provides business context, rules, and terminology | When the model does not understand industry terms or business rules |
|
|
90
62
|
|
|
91
|
-
|
|
63
|
+
You can also configure **domain prompts** (role settings, answer standards, business constraints) and **row-level permissions** (control data visibility by user).
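
The four capabilities in the table above can be pictured as one configuration object per analysis domain. The following is a minimal sketch, not the product's actual data model: the class names, field names, and the `orders.payment_value` column are all hypothetical and only illustrate how aliases from a Schema Description could map a business term to a physical column.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnDescription:
    # Hypothetical shape of one Schema Description entry.
    table: str
    column: str
    description: str
    aliases: List[str] = field(default_factory=list)


@dataclass
class SemanticLayer:
    # Hypothetical container for the four capabilities listed in the table.
    columns: List[ColumnDescription] = field(default_factory=list)
    metrics: Dict[str, str] = field(default_factory=dict)          # metric name -> SQL expression
    answer_builders: Dict[str, str] = field(default_factory=dict)  # builder name -> SQL template
    knowledge_documents: List[str] = field(default_factory=list)   # document titles

    def resolve_column(self, business_name: str) -> Optional[Tuple[str, str]]:
        """Map a business term to (table, column) using column names and aliases."""
        wanted = business_name.strip().lower()
        for col in self.columns:
            names = [col.column.lower()] + [alias.lower() for alias in col.aliases]
            if wanted in names:
                return (col.table, col.column)
        return None


layer = SemanticLayer(
    columns=[
        ColumnDescription(
            table="orders",
            column="payment_value",
            description="Amount paid per order",
            aliases=["revenue", "sales amount"],
        )
    ],
    metrics={"total_revenue": "SUM(payment_value)"},
)

print(layer.resolve_column("Revenue"))
```

In this sketch an unknown term resolves to `None`, which corresponds to the case in the table where the model cannot tell which table or column a question refers to and a description or alias needs to be added.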
|
|
92
64
|
|
|
93
|
-
|
|
65
|
+
### Data Assets
|
|
94
66
|
|
|
95
|
-
|
|
67
|
+
* **Data Tables**: Structured data from Lakehouse, Databricks, MySQL, StarRocks, and other data sources, or uploaded Excel / CSV files
|
|
68
|
+
* **Dashboards**: AI-generated visual panels based on the semantic layer, supporting scheduled refresh and version management
|
|
69
|
+
* **Knowledge Base**: Document collections supporting RAG retrieval, organized with folders and linked to analysis domains
|
|
96
70
|
|
|
97
|
-
|
|
71
|
+
### User Roles
|
|
98
72
|
|
|
99
|
-
|
|
73
|
+
The responsibilities of the two user types are clearly separated — data developers are responsible for "making data analyzable", and business analysts are responsible for "using data to make decisions":
|
|
100
74
|
|
|
101
|
-
|
|
75
|
+
| Role | Responsibilities | Not Responsible For |
|
|
76
|
+
|------|-----------|-------------|
|
|
77
|
+
| **Data Developer** | Add data sources, create analysis domains, configure semantic layer (Schema description, metrics, answer builders, knowledge documents), set row-level permissions, optimize Q&A accuracy | Daily queries and data exploration |
|
|
78
|
+
| **Business Analyst** | Ask questions in natural language, view charts, generate and share dashboards, submit Q&A feedback | Data source integration, semantic layer configuration |
|
|
102
79
|
|
|
103
|
-
|
|
80
|
+
## How It Works
|
|
104
81
|
|
|
105
|
-
|
|
82
|
+
Analytics Agent uses an Agentic RAG architecture — not a simple "vector retrieval + generation" approach, but one where the LLM actively plans and reasons:
|
|
106
83
|
|
|
107
|
-
|
|
84
|
+
1. **Understand intent**: Interprets the user's question, determining which tables to query, which metrics to read, and which documents to reference
|
|
85
|
+
2. **Active orchestration**: Autonomously decides whether to execute a SQL query, read a file, or check a metric definition
|
|
86
|
+
3. **Iterative refinement**: Self-corrects when initial results are insufficient, performing multi-step reasoning until the answer is complete and accurate
|
|
108
87
|
|
|
109
|
-
|
|
88
|
+
This enables Analytics Agent to handle multi-hop queries (e.g., "the reason for the sales decline" requires correlating order data and market reports simultaneously).
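
The plan, act, and refine loop described above can be sketched in a few lines. This is an illustrative outline only, not Analytics Agent's implementation: `run_agent`, `plan`, `is_sufficient`, and the two tool names are hypothetical, and the tools return placeholder strings instead of querying real data.

```python
def run_agent(question, tools, plan, is_sufficient, max_steps=5):
    """Loop: choose the next tool, run it, stop once the evidence is sufficient."""
    evidence = []
    for _ in range(max_steps):
        tool_name = plan(question, evidence)  # understand intent, pick the next action
        if tool_name is None:
            break
        evidence.append((tool_name, tools[tool_name](question)))  # orchestrate the call
        if is_sufficient(evidence):  # iterative refinement: stop when complete
            break
    return evidence


# Hypothetical tools that return placeholder text.
tools = {
    "run_sql": lambda q: "placeholder result of a SQL query on order data",
    "read_file": lambda q: "placeholder excerpt from a market report",
}


def plan(question, evidence):
    # Stub planner: use each tool once, SQL first. A real planner would be the LLM.
    used = {name for name, _ in evidence}
    for name in ("run_sql", "read_file"):
        if name not in used:
            return name
    return None


def is_sufficient(evidence):
    # Stub check: a multi-hop answer here needs both structured data and a document.
    return {name for name, _ in evidence} >= {"run_sql", "read_file"}


steps = run_agent("Why did sales decline?", tools, plan, is_sufficient)
print([name for name, _ in steps])
```

The `max_steps` cap is a design choice of this sketch: it bounds the loop so that a planner that never reaches a sufficient answer still terminates.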
|
|
110
89
|
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
* **Email**: <service@singdata.com>
|
|
114
|
-
|
|
115
|
-
* **Enterprise WeChat**: 
|
|
116
|
-
|
|
117
|
-
^
|
|
90
|
+
All LLMs used by Analytics Agent are provided by **[AI Gateway](AIGateway.md)**, which handles unified model integration, call routing, and usage management, so Analytics Agent does not require a separate model API key. To switch the underlying model or manage usage, do so in AI Gateway.
|
|
118
91
|
|
|
119
92
|
## Related Documentation
|
|
120
93
|
|
|
121
94
|
| Document | Description |
|
|
122
|
-
|
|
95
|
+
|------|------|
|
|
123
96
|
| [Quick Start](datagpt_quickstart.md) | Get started with Analytics Agent in 5 minutes |
|
|
97
|
+
| [Data Source Management](datagpt_data_source.md) | Add and manage data sources |
|
|
124
98
|
| [User Guide](datagpt_tutorial.md) | Data source configuration, semantic layer setup, dashboard creation |
|
|
125
|
-
| [
|
|
126
|
-
| [
|
|
127
|
-
| [Lakehouse DataGPT Tour](
|
|
99
|
+
| [Q&A Accuracy Improvement](answer-accuracy-improve.md) | Detailed explanation of 4 semantic layer capabilities and best practices |
|
|
100
|
+
| [AI Gateway](AIGateway.md) | LLM model integration, routing and usage management |
|
|
101
|
+
| [Lakehouse DataGPT Tour](LakehouseDataGPT-tour.md) | Feature demo videos and screenshots |
|
|
102
|
+
|
|
103
|
+
For suggestions or questions, contact us: **Phone** 400-6767-862 · **Email** service@singdata.com
|
|
128
104
|
|
|
129
105
|
^
|
|
@@ -1,99 +1,85 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Analytics Agent Quick Start
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
This guide helps you configure Analytics Agent from scratch and run your first data Q\&A. After completing it, you will be able to ask questions about your own data in natural language and receive charts and analysis summaries.
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
* In the pop-up window, the **cloud service provider** Alibaba Cloud and **region** ap-southeast-1 will be specified by default. The system provides the option "**Activate Lakehouse instance** in **Alibaba Cloud** - **ap-southeast-1** **as the default data source**":
|
|
7
|
-
* * **Check (recommended for new users**): The system will automatically activate the Lakehouse in Alibaba Cloud - **ap-southeast-1** as the default data source, no manual configuration is required.
|
|
8
|
-
* **Uncheck**: The system will not automatically activate the Lakehouse instance in East China 2 (Shanghai). You can manually add it on the data source management page after the service is activated. Please note that in this case, **DataGPT will not include preset sample data**.
|
|
9
|
-
* Click "Activate" and after a short wait, you can enter the usage interface
|
|
5
|
+
The diagram below shows the complete user journey. This guide covers the core steps of Phase 1 and Phase 2:
|
|
10
6
|
|
|
11
|
-
|
|
7
|
+
:-: 
|
|
12
8
|
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
After the service is activated, you can start the DataGPT data analysis experience in various ways. To help you get started quickly, we provide the following analysis paths:
|
|
16
|
-
|
|
17
|
-
## Method 1: Use Sample Analysis Domain
|
|
9
|
+
## Activate Analytics Agent Service
|
|
18
10
|
|
|
19
|
-
|
|
11
|
+
* Find the Analytics Agent product card on the "Home" page of the management center and click the **Activate for Free** button. 
|
|
20
12
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-

|
|
24
|
-
|
|
25
|
-
## Method 2: Analyze Based on Your Own Data
|
|
13
|
+
:-: 
|
|
26
14
|
|
|
27
|
-
|
|
15
|
+
* In the pop-up window, the **cloud service provider** Alibaba Cloud and **region** East China 2 (Shanghai) will be specified by default. The system provides the option "**Simultaneously activate a Singdata Lakehouse instance in Alibaba Cloud - East China 2 (Shanghai) as the default data source**":
|
|
16
|
+
* **Check** (recommended for new users): The system will automatically activate Lakehouse as the default data source with pre-loaded sample data, requiring no manual configuration.
|
|
17
|
+
* **Uncheck**: After the service is activated, manually add a data source on the data source management page. No pre-loaded sample data will be included. 
|
|
18
|
+
* Click **Activate** and after a short wait, you can enter the usage interface.
|
|
28
19
|
|
|
29
|
-
|
|
20
|
+
## Method 1: Use the Sample Analysis Domain
|
|
30
21
|
|
|
31
|
-
|
|
22
|
+
We have prepared a well-configured sample dataset for you, which includes a complete table configuration and metric system. You can start asking questions directly to experience intelligent analysis. This sample can also serve as a template to help you understand how to build your own analysis domain.
|
|
32
23
|
|
|
33
|
-
|
|
24
|
+
Go to the product home page, find the analysis domain labeled "Sample", and click **Start Analysis**. 
|
|
34
25
|
|
|
35
|
-
|
|
26
|
+
:-: 
|
|
36
27
|
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
`olist_products_dataset.csv.gz (Product Information)`
|
|
40
|
-
|
|
41
|
-
**User and Seller Data**:
|
|
42
|
-
|
|
43
|
-
`olist_customers_dataset.csv.gz (Customer Information)`
|
|
28
|
+
## Method 2: Analyze Based on Your Own Data
|
|
44
29
|
|
|
45
|
-
|
|
30
|
+
The system supports importing multiple data formats including CSV, Excel, and PDF. The following uses real data from the Brazilian e-commerce platform Olist to demonstrate the complete workflow.
|
|
46
31
|
|
|
47
|
-
|
|
32
|
+
### Step 1: Create a New Analysis Domain
|
|
48
33
|
|
|
49
|
-
|
|
34
|
+
:-: 
|
|
50
35
|
|
|
51
|
-
### Step
|
|
36
|
+
### Step 2: Basic Configuration
|
|
52
37
|
|
|
53
|
-
|
|
38
|
+
* **Analysis domain name**: Enter a name, e.g., "Brazil Olist E-commerce Data Analysis"
|
|
39
|
+
* **Data source**: Select the underlying data platform (default is Lakehouse). To connect MySQL, StarRocks, Databricks, or other external databases, refer to [Data Source Management](datagpt_data_source.md).
|
|
40
|
+
* **Model**: The system uses the default model; you can switch at any time on the conversation page. To uniformly configure models available to your team, refer to [Model Selection and Configuration](datagpt-model-config.md).
|
|
54
41
|
|
|
55
|
-
|
|
42
|
+
Leave other options as default and click **Confirm** to create the analysis domain.
|
|
56
43
|
|
|
57
|
-
|
|
44
|
+
> **Note**: The tables, metrics, and answer builder base tables in an analysis domain must all come from the same data source.
|
|
58
45
|
|
|
59
|
-
|
|
60
|
-
* **Data Source**: Select LAKEHOUSE as the underlying data platform (default)
|
|
46
|
+
### Step 3: Add Data
|
|
61
47
|
|
|
62
|
-
|
|
48
|
+
* After creating the analysis domain, click **Add Data → Table**, then click **Start Adding**.
|
|
49
|
+
* Select **Upload File**, add the following data files, and click **Next** to start parsing.
|
|
50
|
+
* Click **Next** to upload data:  
|
|
63
51
|
|
|
64
|
-
|
|
52
|
+
:-: 
|
|
65
53
|
|
|
66
|
-
|
|
54
|
+
> ⚠️ **Note**: In the file parsing interface, you must click every file showing a gray dot to confirm it (the dot turns green) before you can click **Next**.
|
|
67
55
|
|
|
68
|
-
### Step
|
|
56
|
+
### Step 4: Automatic Semantic Layer Construction
|
|
69
57
|
|
|
70
|
-
|
|
71
|
-
* Select "**Upload File**" and add the above files to the system. Click **Next** to start parsing
|
|
58
|
+
After upload is complete, the system automatically analyzes the data and constructs the semantic layer, including column descriptions and aliases, column type recognition, table relationship inference, and basic metric recommendations.
|
|
72
59
|
|
|
73
|
-
:-: 
|
|
74
61
|
|
|
75
|
-
|
|
62
|
+
The semantic layer is the foundation for the Agent to understand your data. If you find that Q\&A results are inaccurate (e.g., wrong metric calculation or wrong table selected), you can improve the semantic layer to resolve it — refer to [Answer Accuracy Improvement](answer-accuracy-improve.md).
|
|
76
63
|
|
|
77
|
-
|
|
64
|
+
### Step 5: Start Q\&A
|
|
78
65
|
|
|
79
|
-
|
|
80
|
-
>
|
|
81
|
-
> 
|
|
66
|
+
Once the data is ready, ask questions in natural language directly, e.g., "What is the sales trend by region over the past 6 months?"
|
|
82
67
|
|
|
83
|
-
|
|
84
|
-
* Data Auto-Profiling: Automatically analyze the basic statistical characteristics of the dataset, including data distribution, missing values, outliers, and other key indicators
|
|
85
|
-
* Intelligent Supplement of Column Descriptions and Aliases: Note: For aliases, the system has generated alias suggestions, which will take effect after selection
|
|
86
|
-
* Column type auto-recognition: Continuous, Categorical, Date\_And\_Time, Partition, and Other
|
|
87
|
-
* Column usage: FILTER, DATETIME\_FILTER, DIM, MEASURE
|
|
88
|
-
* Relationship auto-recognition: If more than one table is uploaded, the relationships will be automatically determined
|
|
89
|
-
* Automatic metric recommendation: Automatically generate business-meaningful metrics
|
|
90
|
-

|
|
68
|
+
After you are satisfied with the results, you can further:
|
|
91
69
|
|
|
92
|
-
**
|
|
70
|
+
* **Adjust table layout**: Describe the layout and colors you want through conversation — refer to [Table Rendering](table_rendering.md)
|
|
71
|
+
* **Save as a dashboard**: Save analysis results with one click; supports multi-version management — refer to [Dashboard Version Management](dashboard-version-management-guide.md)
|
|
72
|
+
* **Set up auto-refresh**: Let dashboard data update automatically without manual refresh — refer to [Chart Auto-Refresh](chart-auto-refresh-guide.md)
|
|
73
|
+
* **Set up scheduled tasks**: Let the Agent automatically run analysis on a schedule and push results to email — refer to [Scheduled Tasks](scheduled_task.md)
|
|
74
|
+
* **Share dashboards and control permissions**: Set visible data ranges for different users — refer to [Row-Level Permissions](row_level_permission.md)
|
|
75
|
+
* **Integrate into business systems**: Embed Q\&A capabilities into your own system via API — refer to [Open API](open-api-overview.md)
|
|
93
76
|
|
|
94
|
-
|
|
77
|
+
## Related Documentation
|
|
95
78
|
|
|
96
|
-
|
|
79
|
+
* [Data Source Management](datagpt_data_source.md) — Add more types of data sources (MySQL, StarRocks, Databricks)
|
|
80
|
+
* [Model Selection and Configuration](datagpt-model-config.md) — Switch or configure the LLM used for Q\&A
|
|
81
|
+
* [Answer Accuracy Improvement](answer-accuracy-improve.md) — Make answers more accurate through semantic layer configuration
|
|
82
|
+
* [Row-Level Permissions](row_level_permission.md) — Set data access ranges for different users
|
|
83
|
+
* [Open API](open-api-overview.md) — Integrate Q\&A capabilities into your system
|
|
97
84
|
|
|
98
85
|
^
|
|
99
|
-
^
|
|
@@ -1,5 +1,78 @@
|
|
|
1
1
|
# Data Lake Acceleration
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
"Data Lake Acceleration" is like attaching a Serverless query engine to data on object storage — data stays in place, and Lakehouse mounts, queries, and processes it directly, eliminating migration time and storage redundancy. Compared to traditional solutions (Spark/Hive ETL + Presto/Trino queries), you only need to focus on SQL logic, without managing cluster operations, scheduling configurations, or incremental detection.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Four Acceleration Paths
|
|
8
|
+
|
|
9
|
+
Data Lake Acceleration is not a single feature but a combination of multiple capabilities. Based on your current data situation and goals, choose the corresponding path:
|
|
10
|
+
|
|
11
|
+
| Path | Where is the data | How to use | Best for |
|
|
12
|
+
|------|---------|--------|---------|
|
|
13
|
+
| **In-place queries** | Hive Metastore + object storage | External Schema direct connection, query directly | Existing Hive data warehouse, no migration desired |
|
|
14
|
+
| **Auto-ingestion** | Object storage files (CSV/Parquet/JSON) | Volume mount → Pipe auto-import → DT incremental aggregation | Periodic file uploads needing automated pipelines |
|
|
15
|
+
| **SQL modeling** | Already in Lakehouse tables | Dynamic Table declarative multi-layer pipeline | Data already loaded, needs cleaning/modeling/aggregation |
|
|
16
|
+
| **AI in SQL** | Code already in object storage | External Function = Storage Connection + API Connection | Calling AI/ML/external APIs in SQL |
|
|
17
|
+
|
|
18
|
+
The four paths complement each other and can be combined: use External Schema to query existing Hive tables → use Pipe to ingest incremental files → use Dynamic Table to build Silver/Gold layers → use External Function for AI analysis in SQL.
|
|
19
|
+
|
|
20
|
+
If your data is spread across Alibaba Cloud OSS, Tencent Cloud COS, and AWS S3, start with [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — the SQL syntax across all three clouds is 90% identical, with only Storage Connection parameter names differing.
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Core Capabilities Overview
|
|
25
|
+
|
|
26
|
+
| Capability | What it is | What problem it solves |
|
|
27
|
+
|------|--------|-------------|
|
|
28
|
+
| **External Schema** | Direct connection to external Hive Metastore, zero-migration queries | Existing Hive data warehouse should not be touched, but query cost needs to be reduced |
|
|
29
|
+
| **Volume** | Mount OSS/COS/S3 paths as Lakehouse directories | Files stay in object storage, Lakehouse reads and writes directly |
|
|
30
|
+
| **Pipe** | Continuously scans Volume for new files, automatic COPY INTO | No scheduled tasks needed — files are automatically loaded when they arrive |
|
|
31
|
+
| **Dynamic Table** | Declarative incremental refresh of materialized tables | No scheduling DAGs needed — system automatically detects increments and refreshes along dependency chains |
|
|
32
|
+
| **External Function** | Register Python/Java code in OSS as SQL functions | Call AI, ML, and external APIs in SQL without writing application-layer code |
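
To make the Pipe row above concrete, the idea of "scan for new files, load each one once" can be sketched as follows. This models the concept only and is not Lakehouse's implementation or SQL syntax: `scan_once`, the in-memory file list, and the `load` callback are hypothetical stand-ins for a Volume listing and a COPY INTO operation.

```python
def scan_once(volume_files, already_loaded, load):
    """Load every file in the volume listing that has not been loaded before."""
    new_files = sorted(set(volume_files) - already_loaded)
    for path in new_files:
        load(path)                # stand-in for loading the file into a table
        already_loaded.add(path)  # remember it so the next scan skips this file
    return new_files


loaded_log = []
seen = set()

# First scan: both files are new.
first = scan_once(["data/a.csv", "data/b.csv"], seen, loaded_log.append)

# Second scan: one more file has arrived, and only that file is loaded.
second = scan_once(["data/a.csv", "data/b.csv", "data/c.csv"], seen, loaded_log.append)

print(first, second)
```

Tracking already-loaded paths is what removes the need for a separately scheduled job in this sketch: running the scan repeatedly never loads a file twice.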
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## Choose Your Reading Path
|
|
37
|
+
|
|
38
|
+
### My data is on multiple clouds and I want unified management
|
|
39
|
+
|
|
40
|
+
→ [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md)
|
|
41
|
+
|
|
42
|
+
Real-world comparison of Alibaba Cloud OSS + Tencent Cloud COS + AWS S3. Beyond Storage Connection parameter names, the SQL syntax for Volume, Pipe, and Dynamic Table is completely identical. Includes code reuse strategies, private network acceleration, and security best practices.
|
|
43
|
+
|
|
44
|
+
### I want to query existing Hive data warehouses without moving data
|
|
45
|
+
|
|
46
|
+
→ [In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md)
|
|
47
|
+
|
|
48
|
+
External Schema connects directly to Hive Metastore, and Lakehouse queries Hive tables directly. Suitable for scenarios with large amounts of historical data in Hive where migration costs should be avoided.
|
|
49
|
+
|
|
50
|
+
### I want object storage files to be automatically loaded into the warehouse
|
|
51
|
+
|
|
52
|
+
→ [Volume + Pipe + Dynamic Table End-to-End Practice](lakehouse-volume-pipe-acceleration-guide.md)
|
|
53
|
+
|
|
54
|
+
Complete pipeline: create Storage Connection → mount Volume → create Pipe for automatic import → Dynamic Table incremental aggregation. When OSS/COS/S3 files arrive, the full pipeline flows automatically.
|
|
55
|
+
|
|
56
|
+
### I want to build a multi-layer data pipeline using pure SQL
|
|
57
|
+
|
|
58
|
+
→ [Medallion Architecture Practice: Pure SQL Dynamic Table Approach](lakehouse-medallion-sql-dt-guide.md)
|
|
59
|
+
|
|
60
|
+
Build a Bronze → Silver → Gold three-layer pipeline declaratively using Dynamic Table. Full example with a real NHL dataset (10 tables, ~14 million rows), including 5 Gold metric tables: top scorers, team records, goalie rankings, and more.
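
As a rough illustration of the three layers, the sketch below uses plain Python functions instead of Dynamic Tables, which the linked guide defines in SQL. The toy rows are invented for this example and are not taken from the NHL dataset; the function names are hypothetical.

```python
from collections import Counter

# Bronze: raw rows as ingested, including bad records (toy data).
bronze = [
    {"player": "A", "goals": "2"},
    {"player": "B", "goals": "1"},
    {"player": "A", "goals": "1"},
    {"player": "", "goals": "5"},     # missing player name
    {"player": "B", "goals": "n/a"},  # unparseable value
]


def to_silver(rows):
    """Silver: drop invalid rows and cast goals to integers."""
    clean = []
    for row in rows:
        if row["player"] and row["goals"].isdigit():
            clean.append({"player": row["player"], "goals": int(row["goals"])})
    return clean


def to_gold(rows):
    """Gold: aggregate cleaned rows into a top-scorers ranking."""
    totals = Counter()
    for row in rows:
        totals[row["player"]] += row["goals"]
    return totals.most_common()


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the layer before it, which mirrors the inter-layer references that the declarative approach expresses as table definitions.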
|
|
61
|
+
|
|
62
|
+
### I want to call AI or external APIs in SQL
|
|
63
|
+
|
|
64
|
+
→ [Storage Connection + API Connection + External Function Combined Practice](external-function-combo-practice.md)
|
|
65
|
+
|
|
66
|
+
Build an External Function environment from scratch, covering Python Quickstart, ML dependency packaging, 30 AI functions, and Java UDF/UDAF/UDTF — four scenarios. Supports Alibaba Cloud, Tencent Cloud, and AWS.
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Recommended Reading Order
|
|
71
|
+
|
|
72
|
+
If you are new to Data Lake Acceleration, work through the guides in the following order:
|
|
73
|
+
|
|
74
|
+
1. **[Volume + Pipe + Dynamic Table End-to-End Practice](lakehouse-volume-pipe-acceleration-guide.md)** — Understand the core pipeline for automatic data loading; run through your first end-to-end example
|
|
75
|
+
2. **[Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md)** — Master the differences across three clouds (only Connection parameters differ); establish a code reuse strategy
|
|
76
|
+
3. **[Medallion Architecture Practice: Pure SQL Dynamic Table Approach](lakehouse-medallion-sql-dt-guide.md)** — Master multi-table, multi-layer DT modeling; understand inter-layer references and incremental refresh
|
|
77
|
+
4. **[Storage Connection + API Connection + External Function Combined Practice](external-function-combo-practice.md)** — Extend SQL boundaries; call AI/ML within SQL
|
|
78
|
+
5. **[In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md)** — For scenarios with existing Hive data warehouses; zero-migration queries with External Schema
|