npm - @clickzetta/cz-cli-darwin-x64 - Versions diffs - 0.3.92 → 0.3.94 - Mend

@clickzetta/cz-cli-darwin-x64 0.3.92 → 0.3.94

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (69)

package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md CHANGED

@@ -1,20 +1,35 @@
-# Volume 管理参考
+# Volume Management Reference
-> 来源：https://www.yunqi.tech/documents/datalake_volume_object 等
+> Source: https://www.yunqi.tech/documents/datalake_volume_object and others
-## Volume 类型
+## Volume Types
-| 类型 | 说明 |
-|---|---|
-| 外部 Volume（External Volume） | 挂载 OSS/COS/S3 等对象存储路径 |
-| 内部 Volume（Internal Volume） | 系统托管存储，含 User Volume、Table Volume、命名 Volume |
+| Type | Description | Lifecycle |
+|---|---|---|
+| External Volume | Mount OSS/COS/S3 object storage paths via Storage Connection | User creates/drops |
+| Managed Volume | ClickZetta-managed storage, no connection needed | User creates/drops |
+| User Volume | Auto-created per user per workspace, user-scoped access | Auto-managed; data removed when user deleted |
+| Table Volume | Auto-created per table, access tied to table permissions | Auto-managed; data removed when table dropped |
+## SQL Reference Patterns
+```sql
+-- External Volume / Managed Volume
+VOLUME [[<workspace>].<schema>].volume_name
+-- User Volume
+USER VOLUME
+-- Table Volume
+TABLE VOLUME [[<workspace>].<schema>].table_name
+```
 ---
 ## CREATE EXTERNAL VOLUME
 ```sql
--- OSS（Connection 必须使用小写 access_id/access_key）
+-- OSS
 CREATE EXTERNAL VOLUME my_oss_volume
   LOCATION 'oss://<bucket>/<path>'
   USING CONNECTION my_oss_conn
@@ -36,20 +51,33 @@ CREATE EXTERNAL VOLUME my_s3_volume
   RECURSIVE = TRUE;
 ```
-参数说明：
-- `LOCATION`：对象存储路径
-- `USING CONNECTION`：已创建的 STORAGE CONNECTION 名称
-- `DIRECTORY`：目录功能配置，`ENABLE=TRUE` 开启目录索引，`AUTO_REFRESH=TRUE` 自动刷新
-- `RECURSIVE`：是否递归扫描子目录
+Parameters:
+- `LOCATION`: Object storage path
+- `USING CONNECTION`: Name of an existing STORAGE CONNECTION
+- `DIRECTORY`: Directory configuration, `ENABLE=TRUE` enables directory indexing, `AUTO_REFRESH=TRUE` enables auto-refresh
+- `RECURSIVE`: Whether to recursively scan subdirectories
+> If new files are not visible via `SHOW VOLUME DIRECTORY` after upload, run `ALTER VOLUME name REFRESH` manually.
+---
+## CREATE VOLUME (Managed Volume)
+Managed Volumes use ClickZetta-managed object storage. No Storage Connection or location is required.
+```sql
+CREATE VOLUME my_managed_volume RECURSIVE = TRUE;
+```
-> ⚠️ 上传新文件后如果 `SHOW VOLUME DIRECTORY` 未显示，执行 `ALTER VOLUME name REFRESH` 手动刷新。
+Parameters:
+- `RECURSIVE`: Whether to recursively scan subdirectories
 ---
 ## ALTER VOLUME
 ```sql
--- 刷新目录元数据
+-- Refresh directory metadata
 ALTER VOLUME my_oss_volume REFRESH;
 ```
@@ -57,8 +85,14 @@ ALTER VOLUME my_oss_volume REFRESH;
 ## DROP VOLUME
+Only External Volumes and Managed Volumes can be explicitly dropped. User Volume and Table Volume are auto-managed and cannot be dropped.
 ```sql
+-- Drop External Volume
 DROP VOLUME IF EXISTS my_oss_volume;
+-- Drop Managed Volume
+DROP VOLUME IF EXISTS my_managed_volume;
 ```
 ---
@@ -66,134 +100,201 @@ DROP VOLUME IF EXISTS my_oss_volume;
 ## SHOW / DESC VOLUME
 ```sql
--- 列出所有 Volume
+-- List all Volumes
 SHOW VOLUMES;
--- 按条件过滤（SHOW VOLUMES 不支持 WHERE，使用 information_schema）
+-- Filter by condition (SHOW VOLUMES does not support WHERE, use information_schema)
 SELECT volume_name, volume_type, volume_region, volume_creator
 FROM information_schema.volumes
 WHERE volume_type = 'EXTERNAL';
--- 按名称查找
+-- Find by name
 SELECT * FROM information_schema.volumes
 WHERE volume_name = 'my_oss_volume';
--- 查看 Volume 详情
+-- View Volume details
 DESC VOLUME my_oss_volume;
--- 查看 Volume 目录下的文件
+-- View files in Volume directory
 SHOW VOLUME DIRECTORY my_oss_volume;
 ```
 ---
-## 查看目录元数据（DIRECTORY 函数）
+## Viewing Directory Metadata (DIRECTORY Function)
 ```sql
--- 查看 Volume 目录元数据（需先 ALTER VOLUME REFRESH）
+-- View Volume directory metadata (requires prior ALTER VOLUME REFRESH)
 SELECT * FROM DIRECTORY(VOLUME my_oss_volume);
 ```
 ---
-## User Volume 操作
+## User Volume Operations
+User Volume is auto-created per user per workspace and bound to the user. It can only be accessed by that user. Cannot be explicitly created or dropped. When the user is deleted, the User Volume becomes unavailable and its data is removed.
+All four Volume types support file-level operations. `PUT` and `GET` require client-side support (e.g., cz-cli, Java JDBC driver, Python connector). **ClickZetta Studio Web does not support PUT/GET.**
 ```sql
--- 查看 User Volume 文件列表
+-- List files (all types)
+SHOW VOLUME DIRECTORY my_oss_volume;
+SHOW VOLUME DIRECTORY my_managed_volume;
 SHOW USER VOLUME DIRECTORY;
+SHOW TABLE VOLUME DIRECTORY my_table;
--- 上传文件到 User Volume 根目录
-PUT '/local/path/file.csv' TO USER VOLUME;
+-- Upload files (External / Managed Volume)
+PUT '/local/path/file.csv' TO VOLUME my_oss_volume;
+PUT '/local/path/file.csv' TO VOLUME my_managed_volume;
--- 上传并指定目标路径
+-- Upload to User Volume
+PUT '/local/path/file.csv' TO USER VOLUME;
 PUT '/local/path/file.csv' TO USER VOLUME FILE 'subdir/file.csv';
--- 通配符上传多个文件
 PUT '/local/path/images/*' TO USER VOLUME SUBDIRECTORY 'images/';
--- 下载文件
+-- Upload to Table Volume
+PUT '/local/path/file.csv' TO TABLE VOLUME my_table;
+-- Download files (External / Managed Volume)
+GET VOLUME my_oss_volume FILE 'subdir/file.csv' TO '/local/output/';
+GET VOLUME my_managed_volume FILE 'subdir/file.csv' TO '/local/output/';
+-- Download from User Volume
 GET USER VOLUME FILE 'subdir/file.csv' TO '/local/output/';
--- 删除文件
+-- Download from Table Volume
+GET TABLE VOLUME my_table FILE 'subdir/file.csv' TO '/local/output/';
+-- Delete files (all types)
+REMOVE VOLUME my_oss_volume FILE 'subdir/file.csv';
+REMOVE VOLUME my_managed_volume FILE 'subdir/file.csv';
 REMOVE USER VOLUME FILE 'subdir/file.csv';
+REMOVE TABLE VOLUME my_table FILE 'subdir/file.csv';
--- 删除目录下所有文件
+-- Delete all files in a directory
 REMOVE USER VOLUME SUBDIRECTORY '/';
 ```
 ---
-## 从 Volume 查询数据（SELECT FROM VOLUME）
+## Querying Data from Volume (SELECT FROM VOLUME)
 ```sql
--- 查询 CSV 文件
+-- Query External Volume files
 SELECT * FROM VOLUME my_oss_volume
 USING CSV
 OPTIONS('header' = 'true', 'sep' = ',')
 SUBDIRECTORY 'data/'
 LIMIT 100;
--- 查询 Parquet 文件
+-- Query Managed Volume files
+SELECT * FROM VOLUME my_managed_volume
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+-- Query Parquet files
 SELECT * FROM VOLUME my_oss_volume
 USING PARQUET
 FILES('part-00001.parquet', 'part-00002.parquet');
--- 正则匹配文件
+-- Regex match files
 SELECT * FROM VOLUME my_oss_volume
 USING PARQUET
 REGEXP '.*2024-0[1-3].parquet';
--- 查询 User Volume 文件
+-- Query User Volume files
 SELECT * FROM USER VOLUME
 USING CSV
 OPTIONS('header' = 'true')
 FILES('data.csv')
 LIMIT 10;
+-- Query Table Volume files
+SELECT * FROM TABLE VOLUME my_table
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv')
+LIMIT 10;
 ```
-支持格式：`CSV`、`PARQUET`、`ORC`、`JSON`、`BSON`
+Supported formats: `CSV`, `PARQUET`, `ORC`, `JSON`, `BSON`
-CSV OPTIONS 常用参数：
-- `header`：是否有表头，默认 `false`
-- `sep`：列分隔符，默认 `,`
-- `compression`：压缩格式（gzip/zstd/zlib）
-- `multiLine`：是否支持多行字段，默认 `false`
+Common CSV OPTIONS parameters:
+- `header`: Whether the file has a header row, default `false`
+- `sep`: Column delimiter, default `,`
+- `compression`: Compression format (gzip/zstd/zlib)
+- `multiLine`: Whether multi-line fields are supported, default `false`
 ---
-## COPY INTO TABLE（从 Volume 导入）
+## COPY INTO TABLE (Import from Volume)
 ```sql
+-- Import from External Volume
 COPY INTO my_table
 FROM VOLUME my_oss_volume
 USING CSV
 OPTIONS('header' = 'true')
 SUBDIRECTORY 'data/';
+-- Import from Managed Volume
+COPY INTO my_table
+FROM VOLUME my_managed_volume
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+-- Import from User Volume
+COPY INTO my_table
+FROM USER VOLUME
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+-- Import from Table Volume
+COPY INTO my_table
+FROM TABLE VOLUME source_table
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
 ```
-## COPY INTO VOLUME（导出到 Volume）
+## COPY INTO VOLUME (Export to Volume)
 ```sql
--- 导出表到 External Volume
+-- Export table to External Volume
 COPY INTO VOLUME my_oss_volume
 SUBDIRECTORY 'export/'
 FROM TABLE my_table
 FILE_FORMAT = (TYPE = CSV);
--- 导出查询结果
+-- Export query result
 COPY INTO VOLUME my_oss_volume
 SUBDIRECTORY 'export/'
 FROM (SELECT * FROM orders WHERE year = 2024)
 FILE_FORMAT = (TYPE = PARQUET COMPRESSION = 'GZIP');
--- 导出到 User Volume
+-- Export to Managed Volume
+COPY INTO VOLUME my_managed_volume
+SUBDIRECTORY 'export/'
+FROM TABLE my_table
+FILE_FORMAT = (TYPE = CSV);
+-- Export to User Volume
 COPY INTO USER VOLUME
 SUBDIRECTORY 'export/'
 FROM TABLE my_table
 FILE_FORMAT = (TYPE = CSV);
+-- Export to Table Volume
+COPY INTO TABLE VOLUME target_table
+SUBDIRECTORY 'export/'
+FROM TABLE my_table
+FILE_FORMAT = (TYPE = CSV);
 ```
-> ⚠️ **关键区分**：
-> - **导入**（COPY INTO TABLE / SELECT FROM VOLUME）：用 `USING CSV/PARQUET/JSON` + `OPTIONS(...)`
-> - **导出**（COPY INTO VOLUME）：用 `FILE_FORMAT = (TYPE = CSV/PARQUET/JSON)`
-> - 两者语法不可混用！
+> **Key distinction**:
+> - **Import** (COPY INTO TABLE / SELECT FROM VOLUME): Use `USING CSV/PARQUET/JSON` + `OPTIONS(...)`
+> - **Export** (COPY INTO VOLUME): Use `FILE_FORMAT = (TYPE = CSV/PARQUET/JSON)`
+> - These two syntaxes are not interchangeable!
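
The import/export distinction in the added text above can be illustrated with a small statement builder. This is only a sketch: `volume_ref`, `import_sql`, and `export_sql` are hypothetical helper names that do not exist in cz-cli, and the code only assembles strings in the shape of the examples in this file. It does no identifier quoting or escaping and does not talk to ClickZetta.

```python
from typing import Dict, Optional, Sequence


def volume_ref(kind: str, name: Optional[str] = None) -> str:
    """Return a volume reference in the shape of the 'SQL Reference Patterns' section.

    `kind` is a hypothetical label: 'external', 'managed', 'user' or 'table'.
    """
    if kind == "user":
        return "USER VOLUME"
    if kind == "table":
        if not name:
            raise ValueError("a Table Volume reference needs a table name")
        return f"TABLE VOLUME {name}"
    if kind in ("external", "managed"):
        if not name:
            raise ValueError("an External/Managed Volume reference needs a volume name")
        return f"VOLUME {name}"
    raise ValueError(f"unknown volume kind: {kind!r}")


def import_sql(table: str, source: str, fmt: str = "CSV",
               options: Optional[Dict[str, str]] = None,
               files: Sequence[str] = ()) -> str:
    """Import direction (COPY INTO TABLE): USING <format> plus OPTIONS(...)."""
    parts = [f"COPY INTO {table}", f"FROM {source}", f"USING {fmt}"]
    if options:
        rendered = ", ".join(f"'{k}' = '{v}'" for k, v in options.items())
        parts.append(f"OPTIONS({rendered})")
    if files:
        parts.append("FILES(" + ", ".join(f"'{f}'" for f in files) + ")")
    return "\n".join(parts) + ";"


def export_sql(target: str, table: str, subdirectory: str, fmt: str = "CSV") -> str:
    """Export direction (COPY INTO VOLUME): FILE_FORMAT = (TYPE = <format>)."""
    return "\n".join([
        f"COPY INTO {target}",
        f"SUBDIRECTORY '{subdirectory}'",
        f"FROM TABLE {table}",
        f"FILE_FORMAT = (TYPE = {fmt});",
    ])


if __name__ == "__main__":
    print(import_sql("my_table", volume_ref("user"),
                     options={"header": "true"}, files=["data.csv"]))
    print(export_sql(volume_ref("table", "target_table"), "my_table", "export/"))
```

Keeping the two directions in separate functions makes it hard to mix `USING ... OPTIONS(...)` with `FILE_FORMAT = (...)` by accident, which is the mistake the note warns about.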

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@clickzetta/cz-cli-darwin-x64",
-  "version": "0.3.92",
+  "version": "0.3.94",
   "description": "cz-cli binary for macOS x64 (Intel)",
   "os": [
     "darwin"

package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md DELETED

@@ -1,135 +0,0 @@
-# Dynamic Table Scheduling Method Selection Guide
-## Comparison of Two Scheduling Methods
-| Method | Approach | Advantages | Disadvantages |
-|------|------|------|------|
-| **DDL built-in scheduling** (REFRESH INTERVAL) | Write a `REFRESH INTERVAL` clause in CREATE DYNAMIC TABLE; Lakehouse triggers automatically | Simple; no additional configuration needed | No alerts, no dependency orchestration; refresh status can only be checked via manual SQL |
-| **Studio Task scheduling** (recommended) | Create a scheduled task in Studio; task content is the `REFRESH DYNAMIC TABLE` command | Supports upstream/downstream dependencies, unified alerts, visual monitoring | Requires creating an additional Task |
-**Studio Task scheduling is recommended for production environments.** DDL built-in scheduling is suitable for quick validation and development/testing phases.
----
-## DDL Built-in Scheduling
-Define the refresh frequency via the `REFRESH INTERVAL` clause in the CREATE statement; Lakehouse triggers periodically:
-```sql
-CREATE DYNAMIC TABLE sales_daily
-REFRESH INTERVAL 1 DAY
-VCLUSTER default
-AS
-SELECT DATE(created_at) AS dt, SUM(amount) AS total
-FROM orders
-GROUP BY 1;
-```
-### Drawbacks
-- **No alerts**: refresh failures are not proactively notified; status can only be checked by manually executing SQL
-- **No dependency orchestration**: cannot declare "refresh only after upstream task completes"; can only stagger by time interval
-- **High monitoring cost**: need to periodically manually execute the following command to check whether refresh is normal
-```sql
--- View refresh history; confirm state is SUCCEED
-SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'your_dt_name';
-```
-Key field descriptions:
-| Field | Meaning |
-|------|------|
-| `state` | SUCCEED / FAILED / RUNNING / QUEUED |
-| `refresh_mode` | INCREMENTAL / FULL / NO_DATA |
-| `error_message` | Error message on failure |
-| `duration` | Duration of this refresh |
-| `stats` | Incremental row count (rows_inserted / rows_deleted) |
----
-## Studio Task Scheduling (Recommended for Production)
-Create a SQL task in Studio; task content is the REFRESH command; managed by Studio's scheduling system.
-### Task Content
-**Non-partitioned DT:**
-```sql
-REFRESH DYNAMIC TABLE schema_name.dt_name;
-```
-**Partitioned DT (with parameters):**
-```sql
-SET dt.args.ds = '${bizdate}';
-REFRESH DYNAMIC TABLE schema_name.dt_name PARTITION (ds = '${bizdate}');
-```
-`${bizdate}` is automatically replaced with the business date by the Studio scheduling engine at each execution.
-### Must Configure Self-dependency
-Concurrent REFRESH on the same DT is prohibited (causes write conflicts or data inconsistency). The Task must enable **self-dependency** to ensure the next instance starts only after the previous one completes.
-### Upstream Dependency Configuration
-- If the DT's source table data needs to wait for an upstream task to produce before refreshing → configure upstream dependency
-- If source table data does not require synchronized readiness (e.g., real-time write table) → upstream dependency is optional
-### Alert Configuration
-Studio Tasks support the following alert rules; all are recommended for production environments:
-- **Failure alert**: notify when task execution fails
-- **Timeout alert**: notify when refresh duration exceeds a threshold (used to detect performance regression)
-- **Not-run alert**: notify when the task has not started within the expected time
----
-## Scheduling Orchestration for Multi-level DT Pipelines
-When multiple DTs form upstream/downstream dependencies (e.g., DT_A → DT_B → DT_C), each DT corresponds to one Studio Task; task dependency relationships ensure execution order:
-```
-Task_A (REFRESH DT_A)
-    └─ Task_B (REFRESH DT_B, depends on Task_A)
-        └─ Task_C (REFRESH DT_C, depends on Task_B)
-```
-REFRESHes for different partitions can run in parallel (assigned to different Task instances); concurrent refresh of the same partition/non-partitioned DT is prohibited.
----
-## Decision Logic: Recommend Scheduling Method to Users
-When helping users create or configure a DT, recommend based on the following logic:
-1. **Is Studio available?**
-   - Yes → always recommend Studio Task scheduling, regardless of development or production environment
-   - No → use DDL built-in scheduling or a third-party scheduling engine
-2. **Are there upstream/downstream dependencies?**
-   - Yes (e.g., source table is produced by another task) → must use Studio Task; configure upstream dependency
-   - No → still recommend Studio Task to gain alert capability
-3. **User has already written a REFRESH INTERVAL clause?**
-   - Suggest: the REFRESH INTERVAL clause can be removed and replaced with Studio Task scheduling to gain alert and dependency management capability
-   - REFRESH INTERVAL and Studio Task can coexist, but will cause double triggering; choosing one is recommended
----
-## Alert Message Template
-When the user is using DDL built-in scheduling, use the following message:
-> 💡 **Suggestion**: You are currently using DDL built-in scheduling (REFRESH INTERVAL), which has the following limitations:
->
-> 1. **No alerts**: refresh failures are not proactively notified; you need to manually execute `SHOW DYNAMIC TABLE REFRESH HISTORY` to check status
-> 2. **No dependency orchestration**: upstream/downstream task dependencies cannot be declared; can only stagger by time interval
->
-> **Recommendation**: Create a scheduled task in Studio with content `REFRESH DYNAMIC TABLE schema.dt_name`, and configure:
-> - Self-dependency (prevent concurrent refresh)
-> - Failure alert + timeout alert
-> - Upstream dependency (if source table is produced by other tasks)

package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md DELETED

@@ -1,185 +0,0 @@
-# Dynamic Table Declaration Strategy
-DT has two creation syntaxes: static partition DT and dynamic partition DT (non-partitioned DT can be viewed as a special case of dynamic partition). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.
-## Core Concepts
-### Static Partition DT (Partitioned DT with SESSION_CONFIGS args)
-The SQL references partition parameters via `SESSION_CONFIGS()`, and a specific partition value is specified at each REFRESH. Each partition refreshes independently — each partition refresh unit can be viewed as an independent DT.
-```sql
-CREATE DYNAMIC TABLE order_daily (
-    id BIGINT, amount DECIMAL(12,2), ds STRING
-)
-PARTITIONED BY (ds)
-AS
-SELECT id, amount, SESSION_CONFIGS()['dt.args.ds'] AS ds
-FROM orders
-WHERE ds = SESSION_CONFIGS()['dt.args.ds'];
--- Specify partition at refresh time
-set dt.args.ds=2025-01-01
-REFRESH DYNAMIC TABLE order_daily PARTITION(ds = '2025-01-01');
-```
-### Dynamic Partition DT (Non-partitioned DT / DT without args)
-The SQL does not reference `SESSION_CONFIGS()`, or although partitioned, the partition values are dynamically produced by the query logic. Each REFRESH processes all incremental data from all source tables.
-Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT/UPDATE/DELETE/MERGE are all unavailable); data is driven entirely by REFRESH.
-Therefore, the following ETL scenarios are not suitable for dynamic partition DT:
-- Need to manually patch data (e.g., a few rows are found to be incorrect and need to be directly UPDATEd)
-- Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
-- Need MERGE INTO for upsert (e.g., consuming a stream and merging into a target table in a CDC scenario)
-- Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
-- Need to backfill or re-refresh partitions independently (dynamic partition DT can only do a full table refresh; individual partitions cannot be refreshed separately)
-- Downstream tasks need to write to the same table (DT has exclusive write ownership)
-```sql
-CREATE DYNAMIC TABLE order_summary (
-    category STRING, total_amount DECIMAL(12,2)
-)
-AS
-SELECT category, SUM(amount) AS total_amount
-FROM orders
-GROUP BY category;
--- No partition specified at refresh time
-REFRESH DYNAMIC TABLE order_summary;
-```
-## Key Differences
-| Dimension | Static Partition DT | Dynamic Partition DT |
-|------|-----------|-----------|
-| Does SQL contain `SESSION_CONFIGS()`? | Yes, used to reference partition parameters | No |
-| REFRESH syntax | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...` (no PARTITION) |
-| Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
-| Scheduling method | External scheduler triggers one partition at a time | External scheduler triggers on a timer |
-| Data lifecycle | Managed per partition; can backfill/delete independently | Managed as a whole table |
-| State tables | Maintained independently per partition | Maintained globally |
-| Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregation, no clear partition key |
-## Selection Decision Tree
-```
-Does your data have a clear time/business partition key?
-│
-├─ Yes → Was the original ETL doing INSERT OVERWRITE by partition?
-│       │
-│       ├─ Yes → Use static partition DT
-│       │       (maintain the original partition granularity; each partition refreshes independently)
-│       │
-│       └─ No → Is the data volume large? Do you need per-partition lifecycle management?
-│               │
-│               ├─ Yes → Use static partition DT
-│               │       (even if the original was not partitioned, adding partitions is recommended for manageability)
-│               │
-│               └─ No → Use dynamic partition DT
-│                       (simple scenario; no partition management needed)
-│
-└─ No → Use dynamic partition DT
-        (global aggregation, real-time summary, etc.)
-```
-## Static Partition DT — Details
-### Applicable Scenarios
-1. **T+1 batch ETL migration**
-   - Original SQL follows the `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` pattern
-   - Refreshes once per day/hour by partition
-   - Needs to support historical partition backfill
-2. **Sliding window computation**
-   - E.g., aggregation over the last 7 days, period-over-period comparison
-   - SQL references `SESSION_CONFIGS()['dt.args.ds']` and `sub_days(...)` for window range
-3. **Per-partition data lifecycle management**
-   - Automatically clean up expired partitions via `data_lifecycle`
-   - Can backfill a single partition without affecting others
-4. **Self-referencing DT (daily comparison, SCD)**
-   - Current partition depends on the result of the previous partition
-   - Must use static partition, because "current partition" and "previous partition" need to be explicitly specified
-### Refresh Method
-```sql
--- Refresh one partition at a time
-set dt.args.ds=2025-01-15
-REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-15');
--- Multi-level partition
-set dt.args.pt=20250411
-set dt.args.pt_hour=01
-REFRESH DYNAMIC TABLE my_dt PARTITION(pt = '20250411', pt_hour = '01');
-```
-### Notes
-- Use `cz.optimizer.incremental.backfill.enabled=TRUE` for backfill; it will automatically use full refresh
-- Partition parameters are passed via `set dt.args.xxx=value`; the PARTITION clause in the REFRESH statement specifies the partition value
-## Dynamic Partition DT — Details
-### Applicable Scenarios
-1. **Real-time stream data aggregation**
-   - Source table continuously writes; DT refreshes on a schedule
-   - No partition management needed; each refresh processes all new data
-2. **Global summary tables**
-   - E.g., global TopN, global count, global deduplication
-   - No clear partition key
-3. **Simple JOIN + filter**
-   - Simple transformations without partition parameters
-   - E.g., fact table JOIN dimension table, output wide table
-4. **Multi-source merge (UNION ALL)**
-   - Data from multiple source tables merged into one table
-   - No partition management needed
-### Refresh Method
-```sql
--- Refresh directly; processes all incremental data from all source tables
-REFRESH DYNAMIC TABLE my_dt;
-```
-### Notes
-- Each refresh processes all incremental data from all source tables; if source table change volume is large, refresh may be slow
-- State tables are maintained globally and may grow as data volume increases
-- Per-partition backfill is not supported; only full table refresh is possible
-- Suitable for scenarios where the change ratio is small (< 5%)
-## Partition Granularity Selection
-When choosing a static partition DT, you also need to decide on partition granularity:
-| Data pattern | Recommended granularity | Notes |
-|---------|------------|------|
-| Strictly ordered time series (e.g., logs) | Minute-level (`dt_min`) | High data volume, frequent writes |
-| Roughly ordered, small amount of late data | Hour-level (`dt_hour`) | Balance between granularity and management complexity |
-| T+1 batch import | Day-level (`ds`) | Most common ETL scenario |
-| By business cycle | Weekly/monthly | Reporting scenarios |
-| Multi-level partition | Day + hour (`ds`, `hour`) | Finer-grained lifecycle management needed |
-Selection principles:
-- Finer granularity → smaller data volume per refresh → higher incremental efficiency
-- Finer granularity → more partitions → more complex management and scheduling
-- Granularity should match the data write frequency: if data is written hourly, partition granularity should not be finer than hourly
-## Determining Partition Strategy from Original ETL
-| Original ETL pattern | Recommended DT partition strategy |
-|--------------|----------------|
-| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | Static partition DT, day-level |
-| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | Static partition DT, day+hour level |
-| `INSERT OVERWRITE TABLE t PARTITION(ds)` (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether per-partition management is needed) |
-| `INSERT INTO TABLE t SELECT ...` (no partition) | Dynamic partition DT |
-| `INSERT OVERWRITE TABLE t SELECT ...` (full table overwrite) | Dynamic partition DT |
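
The mapping in the last table of the deleted file can be sketched as a small classifier over the original ETL statement. The function name `recommend_partition_strategy` and the regular expression are illustrative assumptions, not part of cz-cli. A partition spec that mixes assigned and unassigned keys is not covered by the table and is treated here like the dynamic-partition-write row.

```python
import re

# Matches the PARTITION(...) clause of an INSERT statement. It deliberately does
# not match "PARTITIONED BY (...)" or "PARTITION BY ..." inside window functions.
_PARTITION_RE = re.compile(r"\bPARTITION\s*\(([^)]*)\)", re.IGNORECASE)


def recommend_partition_strategy(insert_sql: str) -> str:
    """Map an original ETL INSERT pattern to a DT partition strategy."""
    match = _PARTITION_RE.search(insert_sql)
    if match is None:
        # INSERT INTO / INSERT OVERWRITE without a partition clause.
        return "dynamic partition DT"

    specs = [s.strip() for s in match.group(1).split(",") if s.strip()]
    if specs and all("=" in s for s in specs):
        # Every partition key has an explicit value, e.g. ds='${ds}'.
        columns = "+".join(s.split("=", 1)[0].strip() for s in specs)
        return f"static partition DT ({columns})"

    # Partition keys without values: dynamic partition write. The choice then
    # depends on whether per-partition management is needed.
    return "dynamic or static partition DT"


if __name__ == "__main__":
    for sql in (
        "INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')",
        "INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')",
        "INSERT OVERWRITE TABLE t PARTITION(ds)",
        "INSERT INTO TABLE t SELECT ...",
    ):
        print(sql, "->", recommend_partition_strategy(sql))
```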