npm - @clickzetta/cz-cli-darwin-x64 - Versions diffs - 0.5.16 → 0.5.17 - Mend

@clickzetta/cz-cli-darwin-x64 0.5.16 → 0.5.17


Files changed (243)

package/bin/skills/lakehouse-doc-en/references/datagpt-model-config.md ADDED

@@ -0,0 +1,34 @@
+# Model Selection and Configuration
+Analytics Agent supports multiple large language models. Administrators can select which large language models are used for conversational analysis on behalf of the team. Once configured, users can choose from these models on the conversation page.
+## Switching Models in a Conversation
+Click the model icon on the left side of the input box to open a dropdown list of available models. Click to switch. Models marked as **Suggest** are system-recommended models.
+:-: ![](/.topwrite/assets/image_1780906064431.png =400)
+> 💡 **Tip**: The **Model Configuration List** at the bottom of the list links directly to the model configuration page.
+## Configuring Available Models (Admin)
+Controls which models appear in the conversation dropdown list.
+**Entry point**: Left navigation bar → Admin → Model Configuration
+:-: ![](/.topwrite/assets/image_1780906096468.png =746)
+Toggle the switch on a model card to make that model immediately visible to all users; turning it off hides it from the list. The page supports filtering by provider and searching by name.
+If the model you need is not in the list (such as an enterprise self-built model), click "Go to AI Gateway to create a new model" at the bottom of the page. After configuring it in AI Gateway, return to the model configuration page and enable the model card switch.
+:-: ![](/.topwrite/assets/image_1780906154059.png =531)
+## Related Documentation
+* [AI Gateway](AIGateway.md) — Connect self-built or third-party models
+* [Data Source Management](datagpt_data_source.md) — Configure the data sources for your analytics domain
+* [Improving Q\&A Accuracy](answer-accuracy-improve.md) — After completing model configuration, use the semantic layer to further improve answer quality
+* [Conversational Data Analysis (Analytics Agent)](datagpt_introduction.md) — Return to the feature overview
+^

package/bin/skills/lakehouse-doc-en/references/datagpt_data_source.md CHANGED

@@ -1,57 +1,70 @@
 # Data Source Management
-DataGPT will by default add the Alibaba Cloud Lakehouse instance in the same region as a built-in data source, without the need for manual addition. If you want to add a Lakehouse data source from another region, please refer to the following operations.
+Data sources are the foundation for Analytics Agent to perform data analysis. You must first connect your database to the system before Analytics Agent can answer questions based on that data.
-## Function Overview
+## Navigation Entry
-This function is used to add and configure LakeHouse data source connections, supporting users to connect LakeHouse databases to the system for data analysis.
+Left navigation bar → **Management** → **Data Sources**
-## Operation Entry
+![](/.topwrite/assets/image_1780905896057.png)
-* Location: Product homepage, left navigation bar: Management -> Data Source
+## Supported Data Source Types
-## Configuration Item Description
+Click + **New Data Source** in the upper right corner and select a data source type:
-### Basic Information
+:-: ![](/.topwrite/assets/image_1780905931540.png =751)
-| Field Name                        | Required | Description                                                 |
-| --------------------------------- | -------- | ----------------------------------------------------------- |
-| Data Source Name                  | Yes      | The unique name used to identify the data source            |
-| Connection String                 | Yes      | The connection address of the LakeHouse database            |
-| Username                          | Yes      | The user account for accessing the database, example: admin |
-| Password                          | Yes      | The password for accessing the database                     |
-| Analytical Computing Cluster Name | No       | Specify the computing cluster used for data analysis        |
+| Type           | Applicable Scenario         |
+| -------------- | --------------------------- |
+| **LakeHouse**  | Singdata Lakehouse instance |
+| **Databricks** | Databricks data platform    |
+| **MySQL**      | MySQL database              |
+| **StarRocks**  | StarRocks database          |
-## Operation Instructions
+## Configuration Guide
-1. Enter the data source name
-2. Fill in the database connection string
+### LakeHouse
-![](.topwrite/assets/20250219-114848.jpeg =691)
+![](/.topwrite/assets/image_1780906009343.png)
-![](.topwrite/assets/20250219-114853.jpeg =677)
+| Field              | Description                                                               |
+| ------------------ | ------------------------------------------------------------------------- |
+| Data source name   | Must be unique within the system                                          |
+| Username           | Database username                                                         |
+| Password           | Database password                                                         |
+| JDBC URL           | Connection address, format: `jdbc:clickzetta://[host]/[workspace]`        |
+| AP Virtual Cluster | Name of the analytical Virtual Cluster; defaults to DEFAULT if left blank |
-1. Enter the username and password
-2. Fill in the analytical computing cluster name
-3. Click the "Connection Test" button to verify if the configuration is correct
-4. After passing the test, click the "Save" button to complete the addition
+### MySQL / StarRocks
-## Precautions
+![](/.topwrite/assets/image_1780905971354.png)
-1. The data source name must be unique in the system
-2. It is recommended to perform a connection test before saving
-3. Please ensure that the username provided has sufficient database access permissions
-4. It is recommended to use a strong password to ensure security
+| Field            | Description                                                         |
+| ---------------- | ------------------------------------------------------------------- |
+| Data source name | Must be unique within the system                                    |
+| Username         | Database username                                                   |
+| Password         | Database password                                                   |
+| JDBC URL         | Connection address, format: `jdbc:mysql://[host]:[port]/[database]` |
-## Error Handling
+### Databricks
-|                        |                                       |                                                     |
-| ---------------------- | ------------------------------------- | --------------------------------------------------- |
-| Error Type             | Possible Cause                        | Solution                                            |
-| Connection Test Failed | Connection string format error        | Check if the connection string format is correct    |
-|                        | Username or password error            | Confirm the correctness of the account and password |
-|                        | Network issue                         | Check if the network connection is normal           |
-| Save Failed            | Duplicate data source name            | Use an unused name                                  |
-|                        | Required information not fully filled | Check if all required fields are filled             |
+| Field            | Description                                       |
+| ---------------- | ------------------------------------------------- |
+| Data source name | Must be unique within the system                  |
+| JDBC URL         | JDBC connection address of the Databricks cluster |
+| Password         | Access token (Personal Access Token)              |
+## Connection Test and Save
+After filling in all fields, click **Connection Test** to verify connectivity. Once the test passes, click **Save** to complete the setup.
+> ⚠️ **Note**: After a data source is saved, modifying the connection information may cause associated analysis domain data to become unavailable. Proceed with caution.
+## Related Documentation
+* [Quick Start](datagpt_quickstart.md) — After adding a data source, follow this guide to complete your first Q\&A configuration
+* [Model Selection and Configuration](datagpt-model-config.md) — Choose the right LLM for your analysis domain
+* [Answer Accuracy Improvement](answer-accuracy-improve.md) — After connecting a data source, improve answer quality through semantic layer configuration
+* [Conversational Data Analytics (Analytics Agent)](datagpt_introduction.md) — Return to feature overview
 ^
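
The JDBC URL formats listed in the added tables above (`jdbc:clickzetta://[host]/[workspace]` for LakeHouse, `jdbc:mysql://[host]:[port]/[database]` for MySQL / StarRocks) can be illustrated with a minimal sketch. This is not part of the package; the function names and the example host, workspace, and database values are hypothetical, and only the two URL templates come from the documentation.

```python
from urllib.parse import quote


def lakehouse_jdbc_url(host: str, workspace: str) -> str:
    """Build a URL in the documented form jdbc:clickzetta://[host]/[workspace]."""
    if not host or not workspace:
        raise ValueError("host and workspace are required")
    # Percent-encode the workspace so characters such as spaces cannot break the URL.
    return f"jdbc:clickzetta://{host}/{quote(workspace, safe='')}"


def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a URL in the documented form jdbc:mysql://[host]:[port]/[database].

    The documentation lists the same format for MySQL and StarRocks.
    """
    if not host or not database:
        raise ValueError("host and database are required")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:mysql://{host}:{port}/{quote(database, safe='')}"


# Hypothetical values, for illustration only.
lakehouse_url = lakehouse_jdbc_url("lakehouse.example.com", "analytics_ws")
mysql_url = mysql_jdbc_url("db.example.com", 3306, "sales")
print(lakehouse_url)
print(mysql_url)
```

Building the string from validated parts makes a malformed connection address fail early, before the **Connection Test** step in the UI.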

package/bin/skills/lakehouse-doc-en/references/datagpt_introduction.md CHANGED

@@ -1,129 +1,105 @@
-# Conversational AI Data Analysis Tool: Analytics Agent
+# Conversational Data Analysis (Analytics Agent)
-Analytics Agent is a next-generation agentic analysis assistant built on cloud-native Lakehouse architecture and other data platforms (formerly known as DataGPT). It deeply integrates AI cognitive capabilities with enterprise-grade data, going beyond simple query functionality. The agent can dynamically construct AI dashboards through natural language, providing imaginative visualization flexibility that surpasses traditional rigid BI tools. It also proactively embeds contextual AI insights into key chart metrics, instantly revealing anomalies and trends, and uncovering hidden data insights within static reports.
+Analytics Agent (formerly known as DataGPT) is a built-in conversational data analysis product in Singdata Lakehouse. Business users ask questions in natural language, and the system automatically generates SQL, executes queries, and returns charts and insights, with no coding required. Data developers improve Q&A accuracy by configuring a semantic layer (metrics, business terms, knowledge documents, answer builders).
-^
-![](/.topwrite/assets/datagpt_1.png)
-^
+![](.topwrite/assets/anim-13-analytics-agent.svg)
-## When to Use Analytics Agent
+## When to Use
 | Scenario | Suitable? |
-| ---------------------------------------------------- | ------------------------------------ |
+|------|----------|
 | Business users querying data and viewing trends via natural language | ✅ Core use case |
 | Quickly generating AI dashboards without writing SQL | ✅ |
 | Automatic anomaly detection and alerting | ✅ |
+| Scheduled data report delivery | ✅ |
 | Precise SQL logic control, complex ETL | ❌ Use Studio SQL tasks |
-| Vector search / RAG Q&A | ❌ Use vector search + AI functions |
+| Vector search / RAG Q&A | ❌ Use [Vector Search](vector_search_ai.md) + [AI Functions](AI_function_in_SQL.md) |
 ## Quick Start
-**① Activate the service** (1 min)
+**① Activate the service** (1 minute)
 Find the Analytics Agent product card on the management center homepage and click "Free Activation". New users are recommended to check "Also activate a Lakehouse instance as the default data source" — the system will automatically configure sample data.
-**② Try with sample data** (5 min)
+**② Try with sample data** (5 minutes)
 Go to the product homepage, find the analysis domain marked "Sample", click "Start Analysis", and ask questions in natural language:
 - "What is the average second-hand housing price by district?"
 - "Which district has the highest listing volume?"
 - "Generate a housing price trend dashboard for me"
-**③ Connect your own data** (as needed)
-Add a data source (supports uploading Excel/CSV files or connecting Lakehouse data tables) → Create an analysis domain → Configure the semantic layer (business terms, metric definitions, table relationships, answer builders, knowledge documents, data annotations to help the Agent understand your business) → Start conversational analysis
-## Conceptual Framework:
-**Core Concepts**:
-The core conceptual framework consists of two main components: Data Assets and Analysis Domains.
-**Data Assets**
-As the infrastructure for enterprise analytics, it encompasses all core elements available for intelligent analysis, enhanced through the Analytics Agent Semantic Layer:
-* **Data Tables**: Structured basic data sources from Lakehouse.
-* **Semantic Layer Elements**:
+**③ Connect your own data** (as needed, completed by data developers)
-  * **Metric System**: Standardized measurement indicators built on data tables.
-  * **Business Terms**: Unified naming conventions and explanatory definitions designed to provide context for the agent.
+Add a data source → Create an analysis domain → Configure the semantic layer → Start conversational analysis.
-* **Dashboards**: Visual analytics panels built using AI based on the semantic layer and data tables.
+Supported data source types:
-* **Documents**: A collection of knowledge documents supporting Agentic RAG-based Q&A.
+| Type | Data Source |
+|------|--------|
+| Data Warehouse / Lakehouse | Lakehouse (default), Databricks |
+| Relational Database | MySQL, StarRocks |
+| Files | Excel, CSV upload |
-* **Indexes**: Indexes built on data table fields to accelerate retrieval.
+→ [Detailed steps in the Quick Start guide](datagpt_quickstart.md)
-:-: ![](/.topwrite/assets/DataGPT_2.png)
+## Core Concepts
-^
-## User Roles and Responsibilities:
-The Analytics Agent system is designed to serve two core user groups in data analysis scenarios: data developers and business analysts. These two types of users play unique and complementary roles in the process of extracting data value:
-1. **Data Developers**: Lead the full data lifecycle management, including data ingestion, quality control, model building, and semantic layer design (covering metric systems and answer builders), while continuously optimizing the Q&A experience. They leverage system capabilities to prepare data for use by business analysts.
-2. **Business Analysts**: As the core users of the system, they explore data deeply through natural language interaction, quickly obtaining business insights and decision support. Through the feedback process, they communicate with data developers to further refine and explore data, gaining deeper understanding and insights.
-## Technical Architecture:
-Multi-source and multi-type data enters the Lakehouse system through warehousing and data lake ingestion (when Lakehouse is chosen as the data engine):
+### Analysis Domain
-* Metadata is managed and access-controlled uniformly according to the data warehouse's permission system.
-* Data undergoes transformation processing and information extraction through our integrated Single Engine and AI engine.
-* Extraction results are stored in the form of tables, vectors, and inverted indexes, building an Agentic RAG Preparation Layer for the agent. These are then further processed by the Analytics Agent Semantic Layer, which performs automated feature analysis, knowledge graph construction, and index extraction.
-* Based on the DIKW model, the Agentic RAG layer provides "Information," while the Analytics Agent Semantic Layer elevates it to "Knowledge" by annotating, organizing, and summarizing context. This architecture enables the agent to autonomously plan and reason, laying a solid foundation for generative AI applications.
-* Agentic RAG: A Semantic Paradigm Shift
-  Analytics Agent transcends the linear "retrieve-then-generate" pipeline. By implementing Agentic RAG, we transform the LLM from a passive text generator into a proactive Reasoning Agent within the Analysis Domain.
-  * **LLM-Driven Understanding**: Rather than relying solely on vector distance (cosine similarity), Analytics Agent leverages the LLM's internal cognition to interpret user intent. The model determines "what is needed" rather than simply matching keywords.
+An analysis domain is the workspace for Q&A, organizing data tables, the semantic layer, and knowledge documents together. Create a separate analysis domain for each business area (for example sales, finance, operations); this reduces cross-domain interference and allows data permissions to be isolated per domain.
-  * **Proactive Orchestration**: The agent acts as the central brain within the analysis domain. It autonomously decides which objects to interact with:
+### Semantic Layer
-    * Whether to query **Data Tables** via SQL
-    * Whether to read specific **Files**
-    * Whether to check **Metric** definitions
+The semantic layer is key to improving Q&A accuracy. It includes four capabilities:
-  * **Iterative Refinement**: If the initial retrieval information is insufficient, the agent self-corrects. It performs multi-step reasoning to obtain additional context, ensuring the final answer is comprehensive and accurate.
-    By internalizing retrieval logic into the LLM itself, Analytics Agent addresses the limitations of traditional RAG:
+| Capability | Purpose | When to Use |
+|------|------|----------|
+| **Schema Description** (table/column descriptions, aliases) | Helps the model understand field meanings and business names | When the model selects the wrong table/column, or field names are ambiguous |
+| **Metrics** | Pre-defines precise calculation definitions | When core business metrics need unified definitions |
+| **Answer Builders** | Provides fixed SQL templates | For complex multi-table JOINs and fixed calculation logic |
+| **Knowledge Documents** | Provides business context, rules, and terminology | When the model does not understand industry terms or business rules |
-  * **Semantic Fidelity**: We leverage the model's multi-dimensional understanding of business logic and nuances, breaking through the "ceiling" of standard vector search.
+You can also configure **domain prompts** (role settings, answer standards, business constraints) and **row-level permissions** (control data visibility by user).
-  * **Complex Problem Solving**: The agent can handle multi-hop queries, synthesizing information from different data types (e.g., correlating a sales decline in a **dashboard** with a market report in a **file**).
+### Data Assets
-  * **Dynamic Adaptation**: As new assets are added to the analysis domain, the agent can adjust its reasoning strategies in real time, without relying on rigid, hard-coded index rules.
+* **Data Tables**: Structured data from Lakehouse, Databricks, MySQL, StarRocks, and other data sources, or uploaded Excel / CSV files
+* **Dashboards**: AI-generated visual panels based on the semantic layer, supporting scheduled refresh and version management
+* **Knowledge Base**: Document collections supporting RAG retrieval, organized with folders and linked to analysis domains
-## Free Version Limitations:
+### User Roles
-Thank you for using Singdata Analytics Agent. You are currently using the free version. To ensure you fully understand the product status, please note the following:
+The responsibilities of the two user types are clearly separated — data developers are responsible for "making data analyzable", and business analysts are responsible for "using data to make decisions":
-1. Features in the current version are early-stage product features, and we reserve the right to optimize, adjust, or modify these features.
+| Role | Responsibilities | Not Responsible For |
+|------|-----------|-------------|
+| **Data Developer** | Add data sources, create analysis domains, configure semantic layer (Schema description, metrics, answer builders, knowledge documents), set row-level permissions, optimize Q&A accuracy | Daily queries and data exploration |
+| **Business Analyst** | Ask questions in natural language, view charts, generate and share dashboards, submit Q&A feedback | Data source integration, semantic layer configuration |
-2. Based on product development plans, some features may be upgraded to paid services or have their service scope adjusted. We will notify affected users in advance before such changes occur.
+## How It Works
-3. During the free usage period, the product features have the following limitations:
+Analytics Agent uses an Agentic RAG architecture — not a simple "vector retrieval + generation" approach, but one where the LLM actively plans and reasons:
-&#x20;       ![](.topwrite/assets/20250114-221654.jpeg =258)
+1. **Understand intent**: Interprets the user's question, determining which tables to query, which metrics to read, and which documents to reference
+2. **Active orchestration**: Autonomously decides whether to execute a SQL query, read a file, or check a metric definition
+3. **Iterative refinement**: Self-corrects when initial results are insufficient, performing multi-step reasoning until the answer is complete and accurate
-If you have any suggestions for the product, please feel free to provide feedback through the following channels:
+This enables Analytics Agent to handle multi-hop queries (e.g., "the reason for the sales decline" requires correlating order data and market reports simultaneously).
-* **Phone**: 400-6767-862
-* **Email**: <service@singdata.com>
-* **Enterprise WeChat**: ![](.topwrite/assets/image_1736856313196.png =116)
-^
+All LLMs used by Analytics Agent are provided by **[AI Gateway](AIGateway.md)**. AI Gateway handles unified model integration, call routing, and usage management, so Analytics Agent does not require a separate model API key. To switch the underlying model or manage usage, do so in AI Gateway.
 ## Related Documentation
 | Document | Description |
-| ---------------------------------------------------- | -------------------------------------------- |
+|------|------|
 | [Quick Start](datagpt_quickstart.md) | Get started with Analytics Agent in 5 minutes |
+| [Data Source Management](datagpt_data_source.md) | Add and manage data sources |
 | [User Guide](datagpt_tutorial.md) | Data source configuration, semantic layer setup, dashboard creation |
-| [Best Practices](datagpt_bestpractice.md) | Methods to improve Q&A accuracy |
-| [Q&A Accuracy Improvement](answer-accuracy-improve.md) | Semantic layer optimization, metric definition standards |
-| [Lakehouse DataGPT Tour](lakehousedatagpt-tour.md) | Feature demo videos and screenshots |
+| [Q&A Accuracy Improvement](answer-accuracy-improve.md) | Detailed explanation of 4 semantic layer capabilities and best practices |
+| [AI Gateway](AIGateway.md) | LLM model integration, routing and usage management |
+| [Lakehouse DataGPT Tour](LakehouseDataGPT-tour.md) | Feature demo videos and screenshots |
+For suggestions or questions, contact us: **Phone** 400-6767-862 · **Email** service@singdata.com
 ^
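
The three steps described under "How It Works" in the added text above (understand intent, active orchestration, iterative refinement) can be pictured as a small control loop. The sketch below is purely illustrative and is not the product's implementation: `AgentState`, `run_agent`, the tool names, and the toy planner are all hypothetical, and the LLM planner is replaced by a hard-coded function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    question: str
    evidence: List[str] = field(default_factory=list)


Tool = Callable[[AgentState], str]


def run_agent(question: str,
              plan: Callable[[AgentState], str],
              tools: Dict[str, Tool],
              is_sufficient: Callable[[AgentState], bool],
              max_steps: int = 5) -> AgentState:
    """Plan, act, and re-check until the evidence is sufficient or steps run out."""
    state = AgentState(question)
    for _ in range(max_steps):
        action = plan(state)            # steps 1 and 2: interpret intent, pick a tool
        tool = tools.get(action)
        if tool is None:                # planner asked for an unknown tool: stop
            break
        state.evidence.append(tool(state))
        if is_sufficient(state):        # step 3: stop once the answer is complete
            break
    return state


# Toy stand-ins for the planner and the tools (hypothetical).
def toy_plan(state: AgentState) -> str:
    return "run_sql" if not state.evidence else "read_file"


toy_tools: Dict[str, Tool] = {
    "run_sql": lambda s: "orders table: sales fell in Q3",
    "read_file": lambda s: "market report: competitor launched in Q3",
}

result = run_agent("Why did sales decline?", toy_plan, toy_tools,
                   lambda s: len(s.evidence) >= 2)
print(result.evidence)
```

In this toy run the loop first queries order data, finds that evidence insufficient, and then reads a report, which mirrors the multi-hop "reason for the sales decline" example in the text.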

package/bin/skills/lakehouse-doc-en/references/datagpt_quickstart.md CHANGED

@@ -1,99 +1,85 @@
-# DataGPT Quick Start
+# Analytics Agent Quick Start
-## Activate DataGPT Service
+This guide helps you configure Analytics Agent from scratch and run your first data Q\&A. After completing it, you will be able to ask questions about your own data in natural language and receive charts and analysis summaries.
-* Find the DataGPT product card on the "Home" page of the management center and click the "Activate for Free" button.&#x20;
-* In the pop-up window, the **cloud service provider** Alibaba Cloud and **region** ap-southeast-1 will be specified by default. The system provides the option "**Activate Lakehouse instance** in **Alibaba Cloud** - **ap-southeast-1** **as the default data source**":![](.topwrite/assets/20250218-202455.jpeg)
-* * **Check (recommended for new users**): The system will automatically activate the Lakehouse in Alibaba Cloud - **ap-southeast-1** as the default data source, no manual configuration is required.
-  * **Uncheck**: The system will not automatically activate the Lakehouse instance in East China 2 (Shanghai). You can manually add it on the data source management page after the service is activated. Please note that in this case, **DataGPT will not include preset sample data**.
-* Click "Activate" and after a short wait, you can enter the usage interface
+The diagram below shows the complete user journey. This guide covers the core steps of Phase 1 and Phase 2:
-&#x20;      ![](.topwrite/assets/20250218-200517.jpeg =783)
+:-: ![](/.topwrite/assets/image_1780894528587.png =635)
-^
-After the service is activated, you can start the DataGPT data analysis experience in various ways. To help you get started quickly, we provide the following analysis paths:
-## Method 1: Use Sample Analysis Domain
+## Activate Analytics Agent Service
-Ask questions using the sample dataset: We have prepared a well-configured sample dataset for you, which includes a complete table configuration and indicator system. You can start asking questions directly to quickly experience the intelligent analysis capabilities. At the same time, this sample can also serve as a template to help you create an analysis domain suitable for your business scenarios.
+* Find the Analytics Agent product card on the "Home" page of the management center and click the **Activate for Free** button.
-Enter the product homepage, find the analysis domain marked "Sample" on the main page, click "Start Analysis" to enter the analysis homepage, and you can start asking questions.
-![](.topwrite/assets/20250218-200800.jpeg)
-## Method 2: Analyze Based on Your Own Data
+:-: ![](/.topwrite/assets/image_1780898507274.png =536)
-The system supports importing various data formats, including CSV, Text, Excel, PDF, etc. You can create an independent analysis domain and start intelligent analysis and Q\&A after importing the data.
+* In the pop-up window, the **cloud service provider** Alibaba Cloud and the **region** East China 2 (Shanghai) are selected by default. The system provides the option "**Simultaneously activate a Singdata Lakehouse instance in Alibaba Cloud - East China 2 (Shanghai) as the default data source**":
+  * **Check** (recommended for new users): The system automatically activates Lakehouse as the default data source with pre-loaded sample data, requiring no manual configuration.
+  * **Uncheck**: After the service is activated, manually add a data source on the data source management page. No pre-loaded sample data is included.
+* Click **Activate** and after a short wait, you can enter the usage interface.
-In this case, we will use the real business data of the famous Brazilian e-commerce platform Olist to demonstrate the system's data analysis capabilities. We will import the following core data files:
+## Method 1: Use the Sample Analysis Domain
-**Core Business Data**:
+We have prepared a well-configured sample dataset for you, which includes a complete table configuration and metric system. You can start asking questions directly to experience intelligent analysis. This sample can also serve as a template to help you understand how to build your own analysis domain.
-`olist_orders_dataset.csv.gz (Order Main Table)`
+Go to the product home page, find the analysis domain labeled "Sample", and click **Start Analysis**.
-`olist_order_items_dataset.csv.gz (Order Item Details)`
+:-: ![](/.topwrite/assets/image_1780898646591.png =583)
-`olist_order_payments_dataset.csv.gz (Payment Information)`
-`olist_products_dataset.csv.gz (Product Information)`
-**User and Seller Data**:
-`olist_customers_dataset.csv.gz (Customer Information)`
+## Method 2: Analyze Based on Your Own Data
-`olist_sellers_dataset.csv.gz (Seller Information)`
+The system supports importing multiple data formats including CSV, Excel, and PDF. The following uses real data from the Brazilian e-commerce platform Olist to demonstrate the complete workflow.
-These data files are compressed in gzip format (.gz) to improve transmission efficiency. The system will automatically decompress and recognize them. The data is linked through key fields such as order number (order\_id) and product number (product\_id) to form a complete business analysis data chain.
+### Step 1: Create a New Analysis Domain
-###
+:-: ![](/.topwrite/assets/image_1780898786853.png =571)
-### Step 1: Create a New Analysis Domain
+### Step 2: Basic Configuration
-![](.topwrite/assets/20250218-200854.jpeg)
+* **Analysis domain name**: Enter a name, e.g., "Brazil Olist E-commerce Data Analysis"
+* **Data source**: Select the underlying data platform (default is LakeHouse). To connect MySQL, StarRocks, Databricks, or other external databases, refer to [Data Source Management](datagpt_data_source.md).
+* **Model**: The system uses the default model; you can switch at any time on the conversation page. To uniformly configure models available to your team, refer to [Model Selection and Configuration](datagpt-model-config.md).
-^
+Leave other options as default and click **Confirm** to create the analysis domain.
-### Step 2: Basic Configuration
+> **Note**: The tables, metrics, and answer builder base tables in an analysis domain must all come from the same data source.
-* **Analysis Domain Name**: Users need to fill in the analysis domain name, such as "Brazil Olist E-commerce Data Analysis"
-* **Data Source**: Select LAKEHOUSE as the underlying data platform (default)
+### Step 3: Add Data
-> **Data Source Restriction**: The tables, indicators, and base tables of the answer builder in the analysis domain must come from the same data source. Keep other options as default. Click **Confirm** to create the analysis domain.
+* After creating the analysis domain, click **Add Data → Table**, then click **Start Adding**.
+* Select **Upload File**, add the Olist data files, and click **Next** to start parsing.
+* Click **Next** to upload data:
-Keep other options as default. Click **Confirm** to create the analysis domain.
+:-: ![](/.topwrite/assets/image_1780898941921.png =683)
-^
+> ⚠️ **Note**: In the file parsing interface, click each file that shows a gray dot and confirm it so that the dot turns green. You can click "Next" only after every file has been confirmed.
-### Step 3: Add Data
+### Step 4: Automatic Semantic Layer Construction
-* After creating a new analysis domain, a prompt to add data will pop up. Click **Add Data -> Table**, then click **Start Adding**
-* Select "**Upload File**" and add the above files to the system. Click **Next** to start parsing
+After the upload is complete, the system automatically analyzes the data and constructs the semantic layer, including column descriptions and aliases, column type recognition, table relationship inference, and basic metric recommendations.
-:-: ![](.topwrite/assets/20250218-201126.jpeg =759)
+:-: ![](/.topwrite/assets/image_1780899052078.png =528)
-* Click **Next** to upload data:![](.topwrite/assets/20250218-201200.jpeg =781)      &#x20;
+The semantic layer is the foundation for the Agent to understand your data. If you find that Q\&A results are inaccurate (e.g., wrong metric calculation or wrong table selected), you can improve the semantic layer to resolve it — refer to [Answer Accuracy Improvement](answer-accuracy-improve.md).
-^
+### Step 5: Start Q\&A
-> ⚠️ **Note**: In the file parsing interface, there is a mandatory verification requirement: all files displayed with gray dots must click the file name and confirm to display green before clicking the "Next" button to proceed to the next steps.
->
-> ![](.topwrite/assets/20250218-201423.jpeg =703)
+Once the data is ready, ask questions in natural language directly, e.g., "What is the sales trend by region over the past 6 months?"
-* Automatic Data Semantic Layer Construction:
-  * Data Auto-Profiling: Automatically analyze the basic statistical characteristics of the dataset, including data distribution, missing values, outliers, and other key indicators
-  * Intelligent Supplement of Column Descriptions and Aliases: Note: For aliases, the system has generated alias suggestions, which will take effect after selection
-* Column type auto-recognition: Continuous, Categorical, Date\_And\_Time, Partition, and Other
-  * Column usage: FILTER, DATETIME\_FILTER, DIM, MEASURE
-  * Relationship auto-recognition: If more than one table is uploaded, the relationships will be automatically determined
-  * Automatic metric recommendation: Automatically generate business-meaningful metrics
-    ![](.topwrite/assets/20250218-201550.jpeg =775)
+After you are satisfied with the results, you can further:
-**Once the data is ready, you can start asking questions in natural language**.
+* **Adjust table layout**: Describe the layout and colors you want through conversation — refer to [Table Rendering](table_rendering.md)
+* **Save as a dashboard**: Save analysis results with one click; supports multi-version management — refer to [Dashboard Version Management](dashboard-version-management-guide.md)
+* **Set up auto-refresh**: Let dashboard data update automatically without manual refresh — refer to [Chart Auto-Refresh](chart-auto-refresh-guide.md)
+* **Set up scheduled tasks**: Let the Agent automatically run analysis on a schedule and push results to email — refer to [Scheduled Tasks](scheduled_task.md)
+* **Share dashboards and control permissions**: Set visible data ranges for different users — refer to [Row-Level Permissions](row_level_permission.md)
+* **Integrate into business systems**: Embed Q\&A capabilities into your own system via API — refer to [Open API](open-api-overview.md)
-^
+## Related Documentation
-:-: ![](.topwrite/assets/20250218-202239.jpeg =752)
+* [Data Source Management](datagpt_data_source.md) — Add more types of data sources (MySQL, StarRocks, Databricks)
+* [Model Selection and Configuration](datagpt-model-config.md) — Switch or configure the LLM used for Q\&A
+* [Answer Accuracy Improvement](answer-accuracy-improve.md) — Make answers more accurate through semantic layer configuration
+* [Row-Level Permissions](row_level_permission.md) — Set data access ranges for different users
+* [Open API](open-api-overview.md) — Integrate Q\&A capabilities into your system
 ^
-^

package/bin/skills/lakehouse-doc-en/references/datalake-acceleration.md CHANGED Viewed

@@ -1,5 +1,78 @@
 # Data Lake Acceleration
-Without moving your data, connect directly to an existing Hive Metastore and object storage via External Schema, and replace Spark/Hive ETL and Presto/Trino ad hoc queries with Singdata Serverless compute.
+"Data Lake Acceleration" is like attaching a Serverless query engine to data on object storage — data stays in place, and Lakehouse mounts, queries, and processes it directly, eliminating migration time and storage redundancy. Compared to traditional solutions (Spark/Hive ETL + Presto/Trino queries), you only need to focus on SQL logic, without managing cluster operations, scheduling configurations, or incremental detection.
-- [In-Place Data Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md)
+---
+## Acceleration Paths
+Data Lake Acceleration is not a single feature but a combination of multiple capabilities. Based on your current data situation and goals, choose the corresponding path:
+| Path | Where is the data | How to use | Best for |
+|------|---------|--------|---------|
+| **In-place queries** | Hive Metastore + object storage | External Schema direct connection, query directly | Existing Hive data warehouse, no migration desired |
+| **Auto-ingestion** | Object storage files (CSV/Parquet/JSON) | Volume mount → Pipe auto-import → DT incremental aggregation | Periodic file uploads needing automated pipelines |
+| **SQL modeling** | Already in Lakehouse tables | Dynamic Table declarative multi-layer pipeline | Data already loaded, needs cleaning/modeling/aggregation |
+| **AI in SQL** | Code already in object storage | External Function = Storage Connection + API Connection | Calling AI/ML/external APIs in SQL |
+These paths complement each other and can be combined: use External Schema to query existing Hive tables → use Pipe to ingest incremental files → use Dynamic Table to build Silver/Gold layers → use External Function for AI analysis in SQL.
+If your data is spread across Alibaba Cloud OSS, Tencent Cloud COS, and AWS S3, start with [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — the SQL syntax across all three clouds is 90% identical, with only Storage Connection parameter names differing.
+---
+## Core Capabilities Overview
+| Capability | What it is | What problem it solves |
+|------|--------|-------------|
+| **External Schema** | Direct connection to external Hive Metastore, zero-migration queries | You want to leave the existing Hive data warehouse untouched but reduce query cost |
+| **Volume** | Mount OSS/COS/S3 paths as Lakehouse directories | Files stay in object storage, Lakehouse reads and writes directly |
+| **Pipe** | Continuously scans Volume for new files, automatic COPY INTO | No scheduled tasks needed — files are automatically loaded when they arrive |
+| **Dynamic Table** | Declarative incremental refresh of materialized tables | No scheduling DAGs needed — system automatically detects increments and refreshes along dependency chains |
+| **External Function** | Register Python/Java code in OSS as SQL functions | Call AI, ML, and external APIs in SQL without writing application-layer code |
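The Pipe row above describes continuous scanning for new files followed by an automatic load. A minimal Python sketch of that idea follows; it is a toy model, not the product's implementation or API. The class `PipeSketch`, the `scan` method, and the file names are all hypothetical, and a plain dictionary stands in for the Volume listing.

```python
from dataclasses import dataclass, field


@dataclass
class PipeSketch:
    """Toy model of a Pipe: remembers which files were already loaded."""

    loaded: set[str] = field(default_factory=set)
    rows: list[str] = field(default_factory=list)

    def scan(self, volume_listing: dict[str, list[str]]) -> list[str]:
        """Load files not seen before and return their names, sorted."""
        new_files = sorted(name for name in volume_listing if name not in self.loaded)
        for name in new_files:
            # Appending rows stands in for the COPY INTO step (assumption).
            self.rows.extend(volume_listing[name])
            self.loaded.add(name)
        return new_files


# Hypothetical file names and contents, for illustration only.
pipe = PipeSketch()
volume = {"orders_01.csv": ["a", "b"]}
first = pipe.scan(volume)   # picks up the first file
volume["orders_02.csv"] = ["c"]
second = pipe.scan(volume)  # picks up only the newly arrived file
third = pipe.scan(volume)   # nothing new, nothing loaded
```

The point of the sketch is that each scan is idempotent: a file that has already been loaded is never loaded again, which is what removes the need for a hand-written scheduled task.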
+---
+## Choose Your Reading Path
+### My data is on multiple clouds and I want unified management
+→ [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md)
+Real-world comparison of Alibaba Cloud OSS + Tencent Cloud COS + AWS S3. Beyond Storage Connection parameter names, the SQL syntax for Volume, Pipe, and Dynamic Table is completely identical. Includes code reuse strategies, private network acceleration, and security best practices.
+### I want to query existing Hive data warehouses without moving data
+→ [In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md)
+External Schema connects directly to Hive Metastore, and Lakehouse queries Hive tables directly. Suitable for scenarios with large amounts of historical data in Hive where migration costs should be avoided.
+### I want object storage files to be automatically loaded into the warehouse
+→ [Volume + Pipe + Dynamic Table End-to-End Practice](lakehouse-volume-pipe-acceleration-guide.md)
+Complete pipeline: create Storage Connection → mount Volume → create Pipe for automatic import → Dynamic Table incremental aggregation. When OSS/COS/S3 files arrive, the full pipeline flows automatically.
+### I want to build a multi-layer data pipeline using pure SQL
+→ [Medallion Architecture Practice: Pure SQL Dynamic Table Approach](lakehouse-medallion-sql-dt-guide.md)
+Build a Bronze → Silver → Gold three-layer pipeline declaratively using Dynamic Table. Full example with a real NHL dataset (10 tables, ~14 million rows), including 5 Gold metric tables: top scorers, team records, goalie rankings, and more.
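To make the idea of refreshing along a Bronze → Silver → Gold dependency chain concrete, here is a small Python sketch. It only illustrates the ordering logic under stated assumptions: the table names and the `refresh_plan` helper are hypothetical, and the real Dynamic Table refresh mechanism is not described in this document.

```python
from graphlib import TopologicalSorter

# Hypothetical table names; each table maps to the tables it reads from.
DEPENDENCIES = {
    "bronze_games": set(),
    "silver_games": {"bronze_games"},
    "gold_top_scorers": {"silver_games"},
    "gold_team_records": {"silver_games"},
}


def refresh_plan(changed: set[str]) -> list[str]:
    """Return the downstream tables to refresh, upstream layers first."""
    stale = set(changed)
    plan = []
    # static_order() yields every table after all of its dependencies.
    for table in TopologicalSorter(DEPENDENCIES).static_order():
        if DEPENDENCIES[table] & stale:
            stale.add(table)
            plan.append(table)
    return plan


plan = refresh_plan({"bronze_games"})
```

With new data in the hypothetical `bronze_games` table, the plan starts with the Silver table and then covers both Gold tables; if nothing changed, the plan is empty and no refresh runs.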
+### I want to call AI or external APIs in SQL
+→ [Storage Connection + API Connection + External Function Combined Practice](external-function-combo-practice.md)
+Build an External Function environment from scratch, covering four scenarios: Python Quickstart, ML dependency packaging, 30 AI functions, and Java UDF/UDAF/UDTF. Supports Alibaba Cloud, Tencent Cloud, and AWS.
+---
+## Recommended Reading Order
+Beginners should work through the guides in the following order:
+1. **[Volume + Pipe + Dynamic Table End-to-End Practice](lakehouse-volume-pipe-acceleration-guide.md)** — Understand the core pipeline for automatic data loading; run through your first end-to-end example
+2. **[Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md)** — Master the differences across three clouds (only Connection parameters differ); establish a code reuse strategy
+3. **[Medallion Architecture Practice: Pure SQL Dynamic Table Approach](lakehouse-medallion-sql-dt-guide.md)** — Master multi-table, multi-layer DT modeling; understand inter-layer references and incremental refresh
+4. **[Storage Connection + API Connection + External Function Combined Practice](external-function-combo-practice.md)** — Extend SQL boundaries; call AI/ML within SQL
+5. **[In-Place Lake Acceleration Implementation Guide](lakehouse-acceleration-guide.md)** — For scenarios with existing Hive data warehouses; zero-migration queries with External Schema