npm - @clickzetta/cz-cli-darwin-arm64 - Versions diffs - 0.5.15 → 0.5.17 - Mend

@clickzetta/cz-cli-darwin-arm64 0.5.15 → 0.5.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (243)

package/bin/skills/lakehouse-doc-en/references/lakehouse-quick-experience_guide.md CHANGED

@@ -6,8 +6,6 @@ Welcome to Lakehouse! This guide has designed a series of carefully orchestrated
 This guide includes the following experience content:
-:-: ![](.topwrite/assets/lakehouse-happy-path-diagram_1747811346312.svg =820)
 1. **Run Your First SQL Query** (2-3 minutes)
    Experience Lakehouse's easy-to-use SQL analysis environment.
@@ -40,7 +38,6 @@ Log into Lakehouse Studio and create a new workspace: `lakehouse_quick_experienc
 ^
-:-: ![](.topwrite/assets/image_1747812390227.png =618)
 ^
@@ -48,13 +45,13 @@ Enter the "Development" page and switch the workspace to the newly created works
 ^
-:-: ![](.topwrite/assets/image_1747812493874.png =622)
 ^
-Entry for creating a new SQL worksheet:
+Entry for creating a new SQL worksheet.
-:-: ![](.topwrite/assets/image_1747812869517.png =621)
 ^
@@ -62,7 +59,7 @@ Create a new SQL worksheet named "00\_Environment\_Preparation".
 ^
-:-: ![](.topwrite/assets/image_1747812698576.png =622)
 ^
@@ -77,11 +74,11 @@ USE SCHEMA happy_path;
 -- Create the first Virtual Compute Cluster (General type)
 -- Virtual Compute Clusters are the core concept of Lakehouse, representing on-demand allocatable computing resources
-CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
-VCLUSTER_SIZE = 1
-VCLUSTER_TYPE = GENERAL
-AUTO_SUSPEND_IN_SECOND = 60
-AUTO_RESUME = TRUE
+CREATE VCLUSTER IF NOT EXISTS MY_FIRST_VC
+VCLUSTER_SIZE = 1
+VCLUSTER_TYPE = GENERAL
+AUTO_SUSPEND_IN_SECOND = 60
+AUTO_RESUME = TRUE
 COMMENT 'My first virtual compute cluster (General)';
 -- Use this cluster
@@ -133,7 +130,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
      (103, 'Smart Watch', 1299.50, 'Wearables'),
      (104, 'Portable Power Bank', 159.90, 'Accessories'),
      (105, 'Mechanical Keyboard', 349.00, 'Computer Accessories');
    -- Query the inserted data
    SELECT * FROM happy_path.my_first_table;
    ```
@@ -142,7 +139,7 @@ In this exercise, you will execute simple SQL queries, create tables, and perfor
    ```sql
    -- Count products and average price by category
-   SELECT
+   SELECT
      category,
      COUNT(*) as product_count,
      AVG(price) as avg_price,
@@ -174,11 +171,11 @@ Next, let's create a different type of compute cluster to understand how to choo
    ```sql
    -- Create an Analytics-type Virtual Compute Cluster
    -- Analytics clusters optimize query performance, suitable for low-latency, high-concurrency analysis scenarios
-   CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
-   VCLUSTER_SIZE = 1
-   VCLUSTER_TYPE = ANALYTICS
-   AUTO_SUSPEND_IN_SECOND = 60
-   AUTO_RESUME = TRUE
+   CREATE VCLUSTER IF NOT EXISTS MY_SECOND_VC
+   VCLUSTER_SIZE = 1
+   VCLUSTER_TYPE = ANALYTICS
+   AUTO_SUSPEND_IN_SECOND = 60
+   AUTO_RESUME = TRUE
    COMMENT 'My second virtual compute cluster (Analytics)';
    ```
@@ -283,7 +280,7 @@ Compute-storage separation is a core architectural feature of Lakehouse, allowin
    INSERT INTO happy_path.demo_dataset VALUES
      (6, 432.10, 'Text-6', CURRENT_TIMESTAMP()),
      (7, 789.65, 'Text-7', CURRENT_TIMESTAMP());
    -- Query the updated dataset
    SELECT * FROM happy_path.demo_dataset ORDER BY id;
    ```
@@ -389,7 +386,7 @@ The Lakehouse unified architecture allows you to directly query files in multipl
      (1003, 205, DATE '2023-02-02', 1, 899.00, 'C5003'),
      (1004, 204, DATE '2023-02-03', 1, 599.00, 'C5001'),
      (1005, 202, DATE '2023-02-03', 1, 3799.00, 'C5004');
    -- Export sales data to User Volume (CSV format)
    COPY INTO USER VOLUME
    SUBDIRECTORY 'lake_demo/sales_csv'
@@ -518,7 +515,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
    SELECT * FROM happy_path.all_orders ORDER BY order_time DESC;
    -- View order statistics
-   SELECT
+   SELECT
      data_source,
      COUNT(*) as order_count,
      SUM(order_amount) as total_amount
@@ -538,7 +535,7 @@ Lakehouse supports simultaneously processing both batch and streaming data on th
    ```sql
    -- Query order statistics again
-   SELECT
+   SELECT
      data_source,
      COUNT(*) as order_count,
      SUM(order_amount) as total_amount
@@ -577,13 +574,13 @@ Lakehouse supports efficient vector search and inverted index search, which can
      description STRING,
      price DECIMAL(10,2),
      vec VECTOR(FLOAT, 16),  -- 16-dimensional vector representing product features
      -- Create vector index
      INDEX product_vec_idx (vec) USING VECTOR PROPERTIES (
-       "scalar.type" = "f32",
+       "scalar.type" = "f32",
        "distance.function" = "l2_distance"
      ),
      -- Create inverted index for full-text search
      INDEX product_description_idx (description) INVERTED PROPERTIES (
        'analyzer' = 'chinese'
@@ -596,31 +593,31 @@ Lakehouse supports efficient vector search and inverted index search, which can
    ```sql
    -- Insert sample data with vectors
    INSERT INTO happy_path.product_search_demo VALUES
-     (1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
+     (1001, 'Ultra-thin Laptop', 'Computers', 'Thin and lightweight high-performance business laptop with the latest processor and HD display', 6999.00,
       vector(0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1)),
-     (1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
+     (1002, 'Professional Gaming Laptop', 'Computers', 'High-performance gaming laptop with dedicated graphics card, suitable for playing large games and professional design', 9999.00,
       vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)),
-     (1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
+     (1003, 'Business Office Desktop', 'Computers', 'Stable and efficient office desktop computer, suitable for enterprise and home office environments', 4599.00,
       vector(0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3));
    -- Continue inserting more data
    INSERT INTO happy_path.product_search_demo VALUES
-     (1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
+     (1004, 'Professional Photography Camera', 'Digital Devices', 'High-resolution professional DSLR camera, suitable for landscape and portrait photography with clear and detailed image quality', 12999.00,
       vector(0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7, 0.8, 0.4)),
-     (1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
+     (1005, 'Portable Bluetooth Speaker', 'Audio Devices', 'Compact and portable Bluetooth speaker with clear sound quality and long battery life, suitable for outdoor use', 299.00,
       vector(0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5)),
-     (1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
+     (1006, 'Wireless Noise-Cancelling Headphones', 'Audio Devices', 'Active noise cancellation technology, wireless connection, comfortable to wear, no ear pressure during long use', 1299.00,
       vector(0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6, 0.7, 0.8, 0.9, 1.0, 0.6)),
-     (1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
+     (1007, 'Smart Watch', 'Wearables', 'Smart watch supporting heart rate monitoring, activity tracking, and message notifications, compatible with various smartphones', 1599.00,
       vector(0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7, 0.8, 0.9, 1.0, 0.1, 0.7));
    -- Continue inserting remaining data
    INSERT INTO happy_path.product_search_demo VALUES
-     (1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
+     (1008, 'Fitness Tracker', 'Wearables', 'Professional fitness tracking band, recording daily activity, sleep quality, and exercise data, waterproof design', 399.00,
       vector(0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.1, 0.2, 0.8)),
-     (1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
+     (1009, 'Ultra HD Smart TV', 'Home Appliances', '65-inch 4K Ultra HD smart TV, supporting voice control and various streaming applications', 5999.00,
       vector(0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9, 1.0, 0.1, 0.2, 0.3, 0.9)),
-     (1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
+     (1010, 'Smart Air Purifier', 'Home Appliances', 'Efficiently filters PM2.5 and harmful gases, intelligently monitors air quality, automatically adjusts working mode', 1899.00,
       vector(1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0, 0.1, 0.2, 0.3, 0.4, 1.0));
    ```
@@ -651,7 +648,7 @@ LIMIT 5;
    ```sql
    -- Use inverted index for keyword search - find products with "high-performance" in the description
-   SELECT
+   SELECT
      product_id,
      product_name,
      category,
@@ -666,7 +663,7 @@ LIMIT 5;
    ```sql
    -- Hybrid query: find products similar to the reference vector and containing "gaming" in the description
-   SELECT
+   SELECT
      product_id,
      product_name,
      category,
@@ -674,7 +671,7 @@ LIMIT 5;
      price,
      l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
    FROM happy_path.product_search_demo
-   WHERE
+   WHERE
      match_phrase(description, 'gaming', MAP('analyzer', 'chinese')) AND
      l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
    ORDER BY distance
@@ -685,7 +682,7 @@ LIMIT 5;
    ```sql
    -- Find products priced between 500-10000, with "high-performance" or "professional" in the description, and high vector similarity
-   SELECT
+   SELECT
      product_id,
      product_name,
      category,
@@ -693,9 +690,9 @@ LIMIT 5;
      price,
      l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) AS distance
    FROM happy_path.product_search_demo
-   WHERE
+   WHERE
      price BETWEEN 500 AND 10000 AND
-     (match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
+     (match_phrase(description, 'high-performance', MAP('analyzer', 'chinese')) OR
       match_phrase(description, 'professional', MAP('analyzer', 'chinese'))) AND
      l2_distance(vec, vector(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2)) < 10
    ORDER BY distance
@@ -749,15 +746,15 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
    ```sql
    -- Create a date dimension table
    CREATE TABLE IF NOT EXISTS happy_path.date_dim AS
-   SELECT DISTINCT
+   SELECT DISTINCT
      date_time::DATE as date_id,
      YEAR(date_time) as year,
      MONTH(date_time) as month,
      DAY(date_time) as day,
      DAYOFWEEK(date_time) as day_of_week,
-     CASE
-       WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
-       ELSE false
+     CASE
+       WHEN DAYOFWEEK(date_time) IN (6, 7) THEN true
+       ELSE false
      END as is_weekend
    FROM happy_path.sales_data;
@@ -770,7 +767,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
    ```sql
    -- Create a sales summary table
    CREATE TABLE IF NOT EXISTS happy_path.sales_summary AS
-   SELECT
+   SELECT
      d.date_id,
      d.year,
      d.month,
@@ -800,7 +797,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
    ```sql
    -- Analyze sales trends using window functions
-   SELECT
+   SELECT
      date_id,
      category,
      total_sales,
@@ -816,7 +813,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
    ```sql
    -- Create a business insights view
    CREATE OR REPLACE VIEW happy_path.business_insights AS
-   SELECT
+   SELECT
      category,
      year,
      month,
@@ -837,7 +834,7 @@ Lakehouse supports offline batch processing and transformation. Estimated time:
    ```sql
    -- Analyze sales ranking and proportion by category
-   SELECT
+   SELECT
      category,
      SUM(total_sales) as category_sales,
      RANK() OVER (ORDER BY SUM(total_sales) DESC) as sales_rank,
@@ -973,8 +970,8 @@ Now you can start applying Lakehouse to actual business scenarios and enjoy a si
 ## References
-[Key Concepts](key_concepts.md)
+[Key Concepts](key-concepts.md)
 [Virtual Compute Cluster](getting_started_with_vcluster_for_processing_analytics.md)
 [Volume](datalake_volume.md)
-[Vector Index](create-vector-index.md)
+[Vector Index](vector-search.md)
 [Inverted Index](inverted-index.md)
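
Note on the vector search hunks above: the hybrid queries filter on `l2_distance(vec, vector(...)) < 10`. The sketch below reproduces that arithmetic in Python for two of the sample rows, as a rough illustration only. It assumes `l2_distance` is the plain Euclidean distance (not the squared form); the guide does not state this, and the helper defined here is a hypothetical stand-in, not the Lakehouse function.

```python
import math

# Reference vector used in the hybrid queries of the guide.
reference = [0.2, 0.3, 0.4, 0.5, 0.6] * 3 + [0.2]

# Vectors copied from sample rows 1001 and 1002 of product_search_demo.
laptop_1001 = [0.1, 0.2, 0.3, 0.4, 0.5] * 3 + [0.1]
gaming_1002 = [0.2, 0.3, 0.4, 0.5, 0.6] * 3 + [0.2]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors.

    Assumption: this mirrors what the SQL function computes.
    """
    return math.dist(a, b)


# Every component differs by 0.1 across 16 dimensions: sqrt(16 * 0.01) = 0.4
print(round(l2_distance(reference, laptop_1001), 6))
# Row 1002 stores the reference vector itself, so its distance is 0.0
print(l2_distance(reference, gaming_1002))
```

Under that assumption, all sample components lie between 0.1 and 1.0, so no pair of 16-dimensional sample vectors can be farther apart than sqrt(16 × 0.81) = 3.6. The `< 10` filter therefore excludes no sample row, and the ordering by `distance` does the actual ranking.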

package/bin/skills/lakehouse-doc-en/references/lakehouse-volume-pipe-acceleration-guide.md ADDED

@@ -0,0 +1,380 @@
+# Volume + Pipe + Dynamic Table End-to-End Practice
+"Data lake acceleration" refers to using the three capabilities of object storage mounting (Volume), continuous data ingestion (Pipe), and incremental computation (Dynamic Table) to directly query, process, and consume file data in object storage using Serverless compute—without migrating data—replacing traditional Spark/Hive ETL and Presto/Trino ad hoc queries.
+Applicable scenarios:
+- **Automatic file ingestion**: CSV/Parquet files periodically uploaded to OSS/COS/S3 are automatically detected and ingested by Pipe, no manual trigger required
+- **Incremental ETL**: After files are ingested, Dynamic Table automatically computes aggregated metrics incrementally, T+1 reports generated without delay
+- **Legacy data activation**: Large volumes of historical files in object storage can be queried directly via Volume mount, no data migration required
+Core data flow:
+```
+OSS/COS/S3 files → External Volume (mount) → Pipe (continuous ingestion) → Target table → Dynamic Table (incremental aggregation)
+                        ↕                              ↕
+                 COPY INTO/SELECT FROM         COPY INTO/SELECT FROM
+```
+---
+## Core Concepts
+| Object | Description | Analogy |
+|------|------|------|
+| **External Volume** | Mounts OSS/COS/S3 path for zero-copy access | "Filesystem" of Lakehouse |
+| **Pipe** | Continuously running data ingestion pipeline, automatically detects new files | Conveyor belt—files are ingested as soon as they are uploaded |
+| **Dynamic Table** | Materialized aggregation table that automatically refreshes incrementally | Replaces scheduled ETL jobs |
+The three work together to form a **self-driving data pipeline**: file upload → automatic ingestion → automatic aggregation, fully automated with no manual scheduling.
+---
+## SQL Commands Involved
+| Command / Function | Purpose | Use Case |
+|------------|------|---------|
+| `CREATE STORAGE CONNECTION` | Establish object storage authentication channel | One-time setup, shared by all Volumes |
+| `CREATE EXTERNAL VOLUME` | Mount object storage path to a Schema | Configure once per Bucket subdirectory |
+| `COPY INTO VOLUME` | Export data to Volume | Generate files for downstream consumption |
+| `SELECT FROM VOLUME` | Directly query files in Volume | Ad hoc queries, data exploration |
+| `DIRECTORY()` | List files in a Volume | View file list, validate exports |
+| `ALTER VOLUME REFRESH` | Manually refresh Volume directory cache | Use when `AUTO_REFRESH=FALSE` |
+| `CREATE PIPE` | Create continuous data ingestion pipeline | Automatic file ingestion |
+| `ALTER PIPE` | Pause/resume Pipe | Operations management |
+| `DESC PIPE EXTENDED` | View Pipe status and configuration | Monitoring, troubleshooting |
+| `load_history()` | Query table's historical load records | Validate Pipe loading, troubleshoot deduplication |
+| `CREATE DYNAMIC TABLE` | Create auto-incrementally refreshing aggregation table | Replace scheduled ETL jobs |
+| `REFRESH DYNAMIC TABLE` | Manually trigger Dynamic Table refresh | Immediately refresh after initial creation |
+| `SHOW DYNAMIC TABLE REFRESH HISTORY` | View refresh history | Monitor incremental refresh status |
+---
+## Prerequisites
+The walkthrough below uses Alibaba Cloud OSS as an example and runs the full pipeline in the `semantic_model_test` schema on the `DEFAULT` Virtual Cluster.
+> ⚠️ **Prerequisites**: An OSS bucket has been created, and you have an AccessKey ID and AccessKey Secret. The Virtual Cluster must be in the RUNNING state (in Serverless mode, queries wake it automatically).
+---
+## End-to-End Practice
+### Step 1: Create Storage Connection
+Establish the authentication channel between Lakehouse and OSS.
+```sql
+-- Create OSS storage connection
+CREATE STORAGE CONNECTION IF NOT EXISTS my_oss_conn
+    TYPE OSS
+    access_id = '<your_access_key_id>'
+    access_key = '<your_access_key_secret>'
+    ENDPOINT = 'oss-cn-shanghai.aliyuncs.com';
+```
+> **Parameter note**: Alibaba Cloud OSS uses lowercase `access_id` / `access_key`. Uppercase `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` also works. Do not use `ACCESS_KEY` / `SECRET_KEY` (missing suffixes will cause errors).
+### Step 2: Create External Volume
+Mount the OSS Bucket subdirectory as a Lakehouse Volume.
+```sql
+CREATE EXTERNAL VOLUME IF NOT EXISTS my_data_vol
+    LOCATION 'oss://my-bucket/data/'
+    USING CONNECTION my_oss_conn
+    DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = FALSE)
+    RECURSIVE = TRUE
+    COMMENT 'Dedicated Volume for data lake acceleration';
+```
+Key parameter descriptions:
+| Parameter | Description |
+|---|---|
+| `LOCATION` | OSS path, must point to a specific subdirectory, not the bucket root path |
+| `USING CONNECTION` | References the Storage Connection created in step 1 |
+| `DIRECTORY.ENABLE` | Enables directory metadata index; allows using `DIRECTORY()` function to query file list |
+| `AUTO_REFRESH` | Set to `TRUE` for auto-refresh; when set to `FALSE`, manual `ALTER VOLUME REFRESH` is required |
+| `RECURSIVE` | Recursively scan subdirectories |
+### Step 3: Create Source Table and Export to Volume
+Verify bidirectional read/write capability of the Volume.
+```sql
+-- 1. Create source table and insert test data
+CREATE TABLE IF NOT EXISTS sales_source (
+    id       BIGINT        COMMENT 'Order ID',
+    product  STRING        COMMENT 'Product name',
+    category STRING        COMMENT 'Category',
+    amount   DECIMAL(10,2) COMMENT 'Amount',
+    dt       STRING        COMMENT 'Date'
+) COMMENT 'Data lake acceleration test source table';
+INSERT INTO sales_source VALUES
+    (1, 'iPhone 15',    'Electronics', 8999.00, '2026-06-01'),
+    (2, 'MacBook Pro',  'Electronics', 14999.00, '2026-06-01'),
+    (3, 'AirPods',      'Electronics', 1299.00, '2026-06-01'),
+    (4, 'Nike Air Max', 'Sports',      899.00, '2026-06-01'),
+    (5, 'Yoga Mat',     'Sports',      199.00, '2026-06-01');
+-- 2. Export as CSV to Volume
+COPY INTO VOLUME my_data_vol
+    SUBDIRECTORY 'export/'
+FROM TABLE sales_source
+    FILE_FORMAT = (TYPE = CSV);
+-- 3. Export as Parquet to Volume
+COPY INTO VOLUME my_data_vol
+    SUBDIRECTORY 'export/'
+FROM TABLE sales_source
+    FILE_FORMAT = (TYPE = PARQUET);
+```
+> ⚠️ **`COPY INTO VOLUME` requires `SUBDIRECTORY`**: omitting this clause will throw `Syntax error at or near 'FROM'`. To export to the Volume root path, use `SUBDIRECTORY '/'`.
+> ⚠️ **Export syntax**: `COPY INTO VOLUME` uses `FILE_FORMAT = (TYPE = CSV/PARQUET)`, not `USING CSV`. `USING` is only used for `SELECT FROM VOLUME` to query files.
+### Step 4: Validate Volume File Read/Write
+```sql
+-- Refresh directory cache (manual refresh required when AUTO_REFRESH=FALSE)
+ALTER VOLUME my_data_vol REFRESH;
+-- View exported files
+SELECT relative_path, size, last_modified_time
+FROM DIRECTORY(VOLUME my_data_vol)
+WHERE relative_path LIKE 'export/%';
+-- Directly query CSV files
+SELECT * FROM VOLUME my_data_vol
+    USING CSV
+    FILES('export/part00001.csv');
+-- Directly query Parquet files (preserves column names)
+SELECT id, product, category, amount, dt
+FROM VOLUME my_data_vol
+    USING PARQUET
+    FILES('export/part00001.parquet');
+```
+> **CSV vs Parquet column name difference**: CSV files without headers auto-generate column names `f0, f1, f2...`; Parquet files preserve original column names. To use original column names with CSV, add `OPTIONS('header'='true')` on import.
+### Step 5: Create Pipe for Continuous Ingestion
+Pipe continuously monitors the Volume for new files and automatically ingests them into the target table.
+```sql
+-- 1. Create dedicated Volume for Pipe (must point to a separate subdirectory)
+CREATE EXTERNAL VOLUME IF NOT EXISTS pipe_vol
+    LOCATION 'oss://my-bucket/data/incoming/'
+    USING CONNECTION my_oss_conn
+    DIRECTORY = (ENABLE = TRUE, AUTO_REFRESH = TRUE)
+    RECURSIVE = TRUE
+    COMMENT 'Dedicated Volume for Pipe continuous ingestion';
+-- 2. Create target table
+CREATE TABLE IF NOT EXISTS sales_ods (
+    id       BIGINT        COMMENT 'Order ID',
+    product  STRING        COMMENT 'Product name',
+    category STRING        COMMENT 'Category',
+    amount   DECIMAL(10,2) COMMENT 'Amount',
+    dt       STRING        COMMENT 'Date'
+) COMMENT 'ODS layer — Pipe ingestion target table';
+-- 3. Create Pipe (LIST_PURGE mode)
+CREATE PIPE IF NOT EXISTS sales_pipe
+    INGEST_MODE = 'LIST_PURGE'
+    VIRTUAL_CLUSTER = 'DEFAULT'
+    COMMENT 'Sales data continuous ingestion pipeline'
+AS
+COPY INTO sales_ods
+FROM VOLUME pipe_vol
+USING CSV PURGE = TRUE;
+```
+> ⚠️ **Pipe key constraints**:
+> - Each Pipe needs a dedicated Volume; multiple Pipes cannot share the same Volume
+> - `LOCATION` must point to a specific subdirectory, not the bucket root path
+> - `LIST_PURGE` mode **deletes source files** after successful ingestion (irreversible); use `EVENT_NOTIFICATION` mode to keep files
+> - `PURGE = TRUE` must appear after `USING <format>`, not inside OPTIONS
+#### Pipe Management
+```sql
+-- View Pipe status
+DESC PIPE EXTENDED sales_pipe;
+-- Key fields: pipe_status (RUNNING/PAUSED), ingest_mode, input_name, output_name
+-- Pause Pipe (stop scanning new files)
+ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
+-- Resume Pipe (restart scanning)
+ALTER PIPE sales_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
+-- View imported file records (7-day retention)
+SELECT * FROM load_history('sales_ods');
+-- Returns: file_path, last_copy_time, file_size, status, first_error_message
+```
+> **Deduplication mechanism**: Pipe deduplicates by file path via `load_history` (within 7 days). Files with the same name will not be re-imported. To reload the same file, wait 7 days or rename the file before re-uploading.
+#### Trigger Pipe Loading
+Pipe starts running immediately after creation (polls approximately every 30 seconds). Writing new files to the Volume path triggers loading:
+```sql
+-- Simulate "new file arrival" via COPY INTO VOLUME
+COPY INTO VOLUME pipe_vol
+    SUBDIRECTORY '/'
+FROM (SELECT * FROM sales_source WHERE dt = '2026-06-01')
+    FILE_FORMAT = (TYPE = CSV);
+-- Verify data has been loaded after a moment
+SELECT COUNT(*) FROM sales_ods;  -- should return 5
+```
+> ⚠️ **Files written during pause**: Files written while the Pipe is paused are not loaded during the pause. After the Pipe resumes, the next scan detects and loads them, unless a file's name matches an already-loaded file, in which case the deduplication mechanism skips it.
+### Step 6: Create Dynamic Table for Incremental Consumption
+Based on the Pipe-ingested table, create a Dynamic Table for automatic incremental aggregation.
+```sql
+-- Enable change tracking on source table (prerequisite for incremental refresh)
+ALTER TABLE sales_ods SET PROPERTIES ('change_tracking' = 'true');
+-- Create Dynamic Table, aggregate by category
+CREATE OR REPLACE DYNAMIC TABLE sales_summary
+    REFRESH INTERVAL 1 HOUR vcluster DEFAULT
+    COMMENT 'Category summary — incremental refresh'
+AS
+SELECT
+    category,
+    COUNT(*)     AS order_cnt,
+    SUM(amount)  AS total_amount,
+    AVG(amount)  AS avg_amount,
+    MIN(dt)      AS min_date,
+    MAX(dt)      AS max_date
+FROM sales_ods
+GROUP BY category;
+-- Immediately trigger first refresh (resets refresh baseline time)
+REFRESH DYNAMIC TABLE sales_summary;
+-- Query results
+SELECT * FROM sales_summary ORDER BY category;
+```
+> **Refresh frequency note**: `REFRESH INTERVAL 1 HOUR` calculates the next trigger based on creation time and does not align to clock hours. To trigger at a specific time, create near the target time, or execute `REFRESH` immediately after creation to reset the baseline.
+#### View DT Refresh History
+```sql
+SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'sales_summary';
+-- Key fields: state (SUCCEED), refresh_mode (INCREMENTAL/FULL), duration, source_tables
+```
+---
+## Full Data Flow Validation
+```sql
+-- Validate data consistency across all stages
+SELECT 'Source' AS stage, COUNT(*) AS rows FROM sales_source
+UNION ALL
+SELECT 'ODS' AS stage, COUNT(*) AS rows FROM sales_ods
+UNION ALL
+SELECT 'Summary' AS stage, COUNT(*) AS rows FROM sales_summary;
+```
+| Stage | Data Volume | Description |
+|---|---|---|
+| Source | 5 rows | Raw data (INSERT) |
+| ODS | 5 rows | Pipe ingestion (CSV → table) |
+| Summary | 2 rows | Dynamic Table aggregation (2 category groups: Electronics, Sports) |
+---
+## Best Practices
+### File Size Recommendations
+| Format | Recommended Size | Description |
+|---|---|---|
+| gzip compressed | ~50 MB | Files that are too large reduce parallelism |
+| CSV uncompressed | 128-256 MB | Balance between scan speed and file count |
+| Parquet uncompressed | 128-256 MB | Columnar storage, more efficient for queries |
+### Volume and Pipe Design Principles
+1. **Each Pipe has its own Volume**: Different Pipes cannot share the same Volume to avoid interference
+2. **Volume points to a subdirectory**: Do not point to the bucket root path, as this will cause Pipe creation errors
+3. **LIST_PURGE vs EVENT_NOTIFICATION**:
+   - `LIST_PURGE`: Simple configuration, suitable for most scenarios, deletes source files after loading
+   - `EVENT_NOTIFICATION`: Low latency, retains source files, but only supports OSS+S3, and requires additional MNS/SQS configuration
+### Dynamic Table Design Principles
+1. **Use GP type Virtual Cluster** (such as `DEFAULT`): GP type supports small file merging; AP type does not
+2. **Enable change_tracking**: If the source table does not have it enabled, DT performs a full refresh every time with no incremental support
+3. **REFRESH immediately after creation**: Ensures first data availability and resets the refresh baseline time
+### Data Lifecycle
+```
+File upload → Pipe scan → COPY INTO ingest → PURGE delete → Dynamic Table incremental refresh
+     ↓              ↓              ↓               ↓                        ↓
+ OSS write     30s polling   load_history record  source file deleted   aggregation update
+```
+---
+## Test Validation Results
+The following results are from actual testing on an Alibaba Cloud Shanghai instance (`f8866243`):
+| Test Item | Result | Details |
+|---|---|---|
+| Storage Connection creation | ✅ | OSS connection normal |
+| External Volume mount | ✅ | Directory access normal; `AUTO_REFRESH=FALSE` requires manual refresh |
+| SELECT FROM VOLUME (CSV) | ✅ | Without header, column names are f0-f4; Parquet preserves column names |
+| SELECT FROM VOLUME (Parquet) | ✅ | Column names and types both preserved |
+| COPY INTO TABLE (CSV) | ✅ | 5 rows correctly imported |
+| COPY INTO TABLE (Parquet) | ✅ | 5 rows correctly imported |
+| COPY INTO VOLUME export | ⚠️ | **Must include `SUBDIRECTORY`**, otherwise syntax error |
+| Pipe LIST_PURGE creation | ✅ | Status immediately becomes RUNNING |
+| Pipe load trigger | ✅ | Auto-loaded in ~30 seconds; load_history records complete |
+| Pipe PURGE deletion | ✅ | Source files auto-deleted after successful load |
+| Pipe pause/resume | ✅ | Files not loaded during pause; re-scanned after resume |
+| Pipe deduplication | ✅ | Same-name files correctly blocked by load_history (7-day retention) |
+| Dynamic Table incremental refresh | ✅ | INCREMENTAL mode, aggregation completed in 346ms |
+---
+## Notes
+| Note | Impact | Recommendation |
+|---|---|---|
+| `COPY INTO VOLUME` requires `SUBDIRECTORY` | Without it, syntax error | Use `SUBDIRECTORY '/'` for root path |
+| Generic CSV column names | Without header, column names are f0-f4 | Use `OPTIONS('header'='true')` or switch to Parquet |
+| Manual refresh needed when `AUTO_REFRESH=FALSE` | Directory does not update | Execute `ALTER VOLUME name REFRESH` |
+| Pipe same-name file deduplication | Same-name files not loaded after pause/resume | Rename file on re-upload, or wait 7 days for expiration |
+| `load_history` column name | `last_copy_time` not `last_load_time` | Pay attention to column name when querying |
+| Virtual Cluster auto-sleep | Suspends after 60s without queries | Serverless mode bills on demand; no action needed |
+| Pipe COPY statement is immutable | Ingestion logic cannot be changed in place | DROP PIPE, then CREATE it again |
+| AP type Virtual Cluster does not support small file merging | Query performance degrades over time | Always use GP type (`DEFAULT`) |
+---
+## Related Documents
+- [Multi-Cloud Unified Data Lake Acceleration](lakehouse-multi-cloud-acceleration.md) — Alibaba Cloud/Tencent Cloud/AWS real-world comparison
+- [Volume Overview](volume-overview.md) — Volume concepts, types, and file operations
+- [Object Storage Pipe](pipe-storage-object.md) — LIST_PURGE and EVENT_NOTIFICATION complete configuration
+- [Pipe Overview](pipe-overview.md) — Pipe vs Table Stream comparison
+- [Dynamic Table Overview](dynamic-table-introduce.md) — Incremental computation mechanism
+- [Create External Volume](create-external-volume.md) — Complete DDL syntax
+- [Import Data from Volume](from_volume_to_table.md) — COPY INTO syntax
+- [Export Data to Volume](from_lakehouse_to_volume.md) — Export syntax
+- [Query SHOW JOBS](show-jobs.md) — Filter Pipe jobs by query_tag
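
Note on the deduplication rule in the added guide: it states that Pipe skips any file whose path already appears in `load_history` within the last 7 days. The Python sketch below models that rule so the rename-or-wait advice is easier to follow. It is an illustration under stated assumptions, not the Pipe implementation: `should_load`, the in-memory `history` dict, and the behaviour at exactly 7 days are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed from the guide's "7-day retention" note.
RETENTION = timedelta(days=7)


def should_load(path, history, now):
    """Return True when `path` has no load record inside the retention window.

    `history` maps file path -> time of last successful load (a stand-in
    for load_history). Treating a record exactly 7 days old as expired is
    an assumption; the guide does not specify the boundary.
    """
    last_copy_time = history.get(path)
    return last_copy_time is None or now - last_copy_time >= RETENTION


history = {}
t0 = datetime(2026, 6, 1, 12, 0)

# First arrival of the file: no record yet, so it is loaded and recorded.
if should_load("part00001.csv", history, t0):
    history["part00001.csv"] = t0

# Same name re-uploaded one day later: skipped by deduplication.
print(should_load("part00001.csv", history, t0 + timedelta(days=1)))
# Renamed file: loaded, because the path differs.
print(should_load("part00001_v2.csv", history, t0 + timedelta(days=1)))
# Same name after the record has expired: loaded again.
print(should_load("part00001.csv", history, t0 + timedelta(days=8)))
```

The three calls print `False`, `True`, and `True`, matching the two workarounds the guide lists: rename the file before re-uploading, or wait for the 7-day record to expire.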