npm - @clickzetta/cz-cli-darwin-x64 - Versions diffs - 0.3.92 → 0.3.93 - Mend

@clickzetta/cz-cli-darwin-x64 0.3.92 → 0.3.93

Files changed (69)

package/bin/skills/clickzetta-sql-migration/SKILL.md ADDED

@@ -0,0 +1,128 @@
+---
+name: clickzetta-sql-migration
+description: |
+  Migration guide for SQL workloads moving to ClickZetta Lakehouse from Snowflake,
+  Databricks, or Spark SQL. Covers object concept mapping, syntax differences,
+  function mapping tables, MERGE INTO limitations, the strict implicit type
+  conversion rule, and migration pitfalls. Use this skill ONLY for migration or
+  cross-platform comparison questions. For native ClickZetta SQL syntax (DDL,
+  DML, DQL, functions) reference the ClickZetta Lakehouse documentation.
+  Triggered when the user mentions migration source platforms (Snowflake,
+  Databricks, Delta Lake, Spark SQL) together with ClickZetta, asks "how do I
+  write X (from Snowflake/Spark) in ClickZetta", asks about specific Snowflake
+  or Spark functions/syntax (IFF, ARRAY_SIZE, LISTAGG, FLATTEN, METADATA$ACTION,
+  TARGET_LAG, APPLY CHANGES INTO, ZORDER, WITH RECURSIVE, WHEN NOT MATCHED BY
+  SOURCE, OBJECT_CONSTRUCT, VARIANT colon syntax, CHARINDEX, ZEROIFNULL,
+  DATEADD/DATEDIFF parameter order), asks about implicit type conversion errors,
+  or asks about compatibility/differences between ClickZetta and these
+  platforms.
+  Keywords: Snowflake migration, Databricks migration, Spark SQL migration,
+  Snowflake to ClickZetta, Databricks to ClickZetta, vs Snowflake, vs Spark,
+  vs Databricks, syntax differences, function mapping, implicit type conversion,
+  WHEN NOT MATCHED BY SOURCE, APPLY CHANGES INTO, WITH RECURSIVE, METADATA$ACTION,
+  TARGET_LAG, FLATTEN, IFF, LISTAGG, OBJECT_CONSTRUCT, VARIANT, CHARINDEX
+---
+# ClickZetta SQL Migration Guide
+Use this skill when migrating SQL workloads from Snowflake, Databricks (Delta Lake), or Spark SQL to ClickZetta Lakehouse, or when answering "how does ClickZetta differ from <other system>" questions.
+For native ClickZetta SQL syntax that does not differ from standard SQL, refer to the ClickZetta Lakehouse documentation.
+## Reference Documents
+| Document | When to read |
+|---|---|
+| [Snowflake migration guide](references/migration-snowflake.md) | Migrating from Snowflake — object mapping, type mapping, syntax + function differences |
+| [Databricks migration guide](references/migration-databricks.md) | Migrating from Databricks/Delta Lake — APPLY CHANGES, ZORDER, WHEN NOT MATCHED BY SOURCE alternatives |
+| [vs Snowflake summary](references/vs-snowflake.md) | Cross-platform comparison summary |
+| [vs Spark SQL summary](references/vs-spark.md) | Cross-platform comparison summary |
+| [DML differences](references/dml-differences.md) | INSERT/UPDATE/DELETE/MERGE/COPY syntax that differs from other systems (concise migration view) |
+| [Implicit type conversion](references/implicit-type-conversion.md) | The #1 migration error — strict CAST rules for INSERT/UPDATE |
+| [Function mapping](references/function-mapping.md) | Function-by-function mapping tables (Snowflake/Spark/Databricks → ClickZetta) and unsupported functions |
+| [DDL reference](references/ddl-reference.md) | Detailed DDL syntax — kept for migration completeness; for native ClickZetta DDL prefer the official documentation |
+| [DML reference](references/dml-reference.md) | Detailed DML syntax — kept for migration completeness; for native ClickZetta DML prefer the official documentation |
+| [DQL reference](references/dql-reference.md) | Detailed DQL syntax — kept for migration completeness; for native ClickZetta DQL prefer the official documentation |
+| [Functions reference](references/functions-reference.md) | Detailed function list — kept for migration completeness; for native ClickZetta functions prefer the official documentation |
+---
+## ⚠️ Most Common Migration Pitfalls (Quick Reference)
+| Scenario | Snowflake / Spark / Databricks | ClickZetta |
+|---|---|---|
+| Implicit string→DATE/TIMESTAMP/BOOLEAN/JSON in INSERT | ✅ allowed | ❌ Error — must use `CAST` or typed literals (`DATE '...'`, `TIMESTAMP '...'`, `TRUE`/`FALSE`, `PARSE_JSON(...)`) |
+| `IFF(cond, a, b)` (SF) | — | `IF(cond, a, b)` |
+| `ARRAY_SIZE(arr)` (SF) | `size(arr)` (Spark) | `SIZE(arr)` ✅ or `ARRAY_SIZE(arr)` ✅ — both supported |
+| `LISTAGG(col, ',') WITHIN GROUP (...)` (SF) | — | `GROUP_CONCAT(col ORDER BY col SEPARATOR ',')` |
+| `LATERAL FLATTEN(input => arr)` (SF) | — | `LATERAL VIEW EXPLODE(arr)` |
+| `data:key` JSON access (SF) | — | `data['key']` |
+| `OBJECT_CONSTRUCT('k', v)` (SF) | `STRUCT(v AS k)` (Spark) | `named_struct('k', v)` |
+| `VARIANT` type (SF) | — | `JSON` type |
+| `NUMBER(p, s)` (SF) | — | `DECIMAL(p, s)` |
+| `CHARINDEX(sub, s)` (SF) | — | `INSTR(s, sub)` ⚠️ parameter order reversed |
+| `DATEDIFF(day, start, end)` (SF) | `DATEDIFF(end, start)` (Spark) | both supported, ⚠️ Snowflake order has unit as first arg |
+| `WHEN NOT MATCHED BY SOURCE THEN DELETE` (Databricks) | — | ❌ Not supported — use MERGE INTO + separate DELETE |
+| `APPLY CHANGES INTO` (DLT) | — | TABLE STREAM + MERGE INTO |
+| `WITH RECURSIVE` (SF/Databricks) | ✅ supported | ❌ Not supported — iterate via Python/ZettaPark or pre-build helper tables |
+| `BEGIN; COMMIT; ROLLBACK;` (transactions) | ✅ | ❌ Not supported — use MERGE INTO for atomic operations |
+| `TARGET_LAG = '1 minute'` for dynamic tables (SF) | — | `REFRESH INTERVAL 1 MINUTE VCLUSTER xx` |
+| `METADATA$ACTION` for streams (SF) | — | `__change_type` (values: INSERT / UPDATE_BEFORE / UPDATE_AFTER / DELETE) |
+| `OPTIMIZE t ZORDER BY (col)` (Databricks) | — | `OPTIMIZE t` (small file compaction only, no ZORDER) |
+| `STRUCT(1 AS id, 'a' AS name)` (Spark) | — | `named_struct('id', 1, 'name', 'a')` |
+| `TABLESAMPLE (50 PERCENT)` | — | ❌ PERCENT not supported — use `ORDER BY RAND() LIMIT n` |
+| `CREATE SEQUENCE` (SF) | — | ❌ Not supported — use `IDENTITY(seed)` column (BIGINT only) |
+| `CREATE TEMPORARY TABLE` (SF) | — | ❌ Not supported — use CTE |
+| `EDITDISTANCE` / `SOUNDEX` (SF) | — | `EDITDISTANCE`: Python UDF; `SOUNDEX`: no equivalent |
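+A worked rewrite may make the table easier to apply. The sketch below converts one hypothetical Snowflake query using only the `IFF`, `LISTAGG`, and `CHARINDEX` rows above. The `orders` columns, including `email`, are assumptions for illustration, and the statements have not been run against either engine.
+```sql
+-- Snowflake (original)
+SELECT
+    customer_id,
+    IFF(SUM(amount) > 1000, 'high', 'low')              AS tier,
+    LISTAGG(status, ',') WITHIN GROUP (ORDER BY status) AS statuses,
+    CHARINDEX('@', MIN(email))                          AS at_pos
+FROM orders
+GROUP BY customer_id;
+-- ClickZetta (rewritten per the table above)
+SELECT
+    customer_id,
+    IF(SUM(amount) > 1000, 'high', 'low')               AS tier,
+    GROUP_CONCAT(status ORDER BY status SEPARATOR ',')  AS statuses,
+    INSTR(MIN(email), '@')                              AS at_pos   -- arguments swapped relative to CHARINDEX
+FROM orders
+GROUP BY customer_id;
+```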
+---
+## Object Concept Mapping
+| Snowflake | Databricks | ClickZetta |
+|---|---|---|
+| DATABASE | Catalog (internal) | WORKSPACE |
+| SCHEMA / DATABASE.SCHEMA | Database / Schema | SCHEMA |
+| WAREHOUSE | Cluster / SQL Warehouse | VCLUSTER |
+| STAGE | External Location | VOLUME (+ STORAGE CONNECTION) |
+| STORAGE INTEGRATION | — | STORAGE CONNECTION |
+| SNOWPIPE | Auto Loader | PIPE |
+| STREAM | (Delta CDF / DLT CDC) | TABLE STREAM |
+| DYNAMIC TABLE | DLT (Live Tables) | DYNAMIC TABLE (different syntax) |
+| TASK | Job | Studio Task |
+| SEQUENCE | — | IDENTITY column |
+| SHARE | Delta Sharing | SHARE |
+| — | Unity Catalog (federation) | EXTERNAL CATALOG |
+---
+## Data Type Mapping Quick Reference
+| Snowflake | Spark / Databricks | ClickZetta |
+|---|---|---|
+| `NUMBER(p, s)` / `NUMERIC` | `DECIMAL(p, s)` | `DECIMAL(p, s)` |
+| `INTEGER` / `NUMBER(10,0)` | `INT` / `BIGINT` | `INT` / `BIGINT` |
+| `VARCHAR(n)` / `TEXT` | `STRING` | `STRING` (recommended) or `VARCHAR(n)` |
+| `TIMESTAMP_LTZ` | `TIMESTAMP` | `TIMESTAMP` |
+| `TIMESTAMP_NTZ` | `TIMESTAMP_NTZ` | `TIMESTAMP_NTZ` |
+| `VARIANT` | — | `JSON` |
+| `ARRAY` (untyped) | `ARRAY<T>` | `ARRAY<T>` (must specify element type) |
+| `OBJECT` | `MAP<K,V>` / `STRUCT<...>` | `MAP<K,V>` or `STRUCT<...>` |
+| `GEOGRAPHY` | — | not supported |
+| — | — | `VECTOR(FLOAT, N)` (ClickZetta-specific) |
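+As an illustration of the mapping, here is a hypothetical Snowflake table definition and a possible ClickZetta counterpart. The table name, column names, and the choice of `STRING` as the array element type are assumptions; the sketch is untested.
+```sql
+-- Snowflake (original)
+CREATE TABLE events (
+    id       NUMBER(18, 0),
+    name     VARCHAR(100),
+    payload  VARIANT,
+    tags     ARRAY,
+    seen_at  TIMESTAMP_LTZ
+);
+-- ClickZetta (rewritten per the mapping table)
+CREATE TABLE events (
+    id       DECIMAL(18, 0),
+    name     STRING,
+    payload  JSON,
+    tags     ARRAY<STRING>,   -- element type must be chosen explicitly
+    seen_at  TIMESTAMP
+);
+```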
+---
+## Migration Workflow Pointers
+This skill focuses on **SQL syntax compatibility**. A complete migration involves more than SQL rewrites:
+1. **Object mapping** — see table above
+2. **Schema/DDL conversion** — see [migration-snowflake.md](references/migration-snowflake.md) and [migration-databricks.md](references/migration-databricks.md)
+3. **Data movement** — typically via object storage (S3/OSS) staging + COPY INTO; not covered in detail here
+4. **SQL rewrites** — see this skill's reference documents
+5. **Application/driver layer** — JDBC, Python connector, BI tool reconnection; refer to `clickzetta-lakehouse-connect` skill
+6. **Permission migration** — RBAC concept comparison; refer to `clickzetta-access-control` skill
+7. **Performance tuning re-mapping** — Snowflake CLUSTER BY / Databricks ZORDER → ClickZetta partitioning + indexes; refer to `clickzetta-query-optimizer` skill
+For end-to-end migration planning, combine this skill with the skills listed above.

package/bin/skills/clickzetta-sql-migration/eval_cases.jsonl ADDED

@@ -0,0 +1,10 @@
+{"case_id":"001","type":"should_call","user_input":"How do I write Snowflake's IFF, ARRAY_SIZE, and LISTAGG in ClickZetta?","expected_skill":"clickzetta-sql-migration","expected_output_contains":["IF(","SIZE("]}
+{"case_id":"002","type":"should_call","user_input":"How to replace Databricks APPLY CHANGES INTO in ClickZetta?","expected_skill":"clickzetta-sql-migration","expected_output_contains":["MERGE INTO"]}
+{"case_id":"003","type":"should_call","user_input":"What are ClickZetta's implicit type conversion rules when migrating from Snowflake?","expected_skill":"clickzetta-sql-migration","expected_output_contains":["implicit","conversion"]}
+{"case_id":"004","type":"should_call","user_input":"How do I migrate a Snowflake VARIANT column to ClickZetta?","expected_skill":"clickzetta-sql-migration","expected_output_contains":["JSON"]}
+{"case_id":"005","type":"should_call","user_input":"Databricks ZORDER equivalent in ClickZetta","expected_skill":"clickzetta-sql-migration","expected_output_contains":["OPTIMIZE"]}
+{"case_id":"006","type":"should_call","user_input":"How to write Snowflake LATERAL FLATTEN in ClickZetta?","expected_skill":"clickzetta-sql-migration","expected_output_contains":["LATERAL VIEW EXPLODE"]}
+{"case_id":"007","type":"should_not_call","user_input":"How do I create a partitioned table in ClickZetta?","forbidden_skill":"clickzetta-sql-migration"}
+{"case_id":"008","type":"should_not_call","user_input":"What is the syntax for SELECT in ClickZetta?","forbidden_skill":"clickzetta-sql-migration"}
+{"case_id":"009","type":"should_not_call","user_input":"How to use window functions in ClickZetta?","forbidden_skill":"clickzetta-sql-migration"}
+{"case_id":"010","type":"should_not_call","user_input":"How do I create a Bloom Filter index?","forbidden_skill":"clickzetta-sql-migration"}

package/bin/skills/clickzetta-sql-migration/references/ddl-reference.md ADDED

@@ -0,0 +1,350 @@
+# DDL Complete Syntax Reference
+> Based on ClickZetta Lakehouse product documentation, with Snowflake / Spark SQL difference annotations
+---
+## SCHEMA Operations
+```sql
+-- Create
+CREATE SCHEMA IF NOT EXISTS my_schema COMMENT 'description';
+-- Alter
+ALTER SCHEMA my_schema RENAME TO new_schema;
+ALTER SCHEMA my_schema SET COMMENT 'new comment';
+-- Drop (cascades all objects)
+DROP SCHEMA IF EXISTS my_schema;
+-- Show
+SHOW SCHEMAS;
+SHOW SCHEMAS EXTENDED;                          -- includes type column (MANAGED/EXTERNAL)
+SHOW SCHEMAS LIKE 'sales%';
+SHOW SCHEMAS WHERE schema_name = 'public';
+-- Switch
+USE SCHEMA my_schema;
+USE my_schema;                                  -- SCHEMA keyword is optional
+```
+**Differences from Snowflake:**
+- Snowflake uses `USE DATABASE` + `USE SCHEMA`; ClickZetta has no DATABASE layer, use `USE SCHEMA` directly
+- Snowflake supports `CREATE OR REPLACE SCHEMA`; ClickZetta does not, use `IF NOT EXISTS`
+---
+## TABLE Operations
+### CREATE TABLE
+```sql
+-- Basic table creation
+CREATE TABLE IF NOT EXISTS orders (
+    id          BIGINT,
+    customer_id INT,
+    amount      DECIMAL(18, 2)  NOT NULL,
+    status      STRING          DEFAULT 'pending',
+    created_at  TIMESTAMP,
+    tags        ARRAY<STRING>,
+    meta        JSON
+)
+COMMENT 'Orders table';                        -- table comment follows the column list
+-- Primary key table (ENABLE VALIDATE RELY: SQL writes also deduplicate)
+CREATE TABLE pk_orders (
+    id     BIGINT PRIMARY KEY,
+    amount DECIMAL(18, 2)
+);
+-- Primary key table (DISABLE NOVALIDATE RELY: only real-time writes deduplicate, SQL writes do not)
+CREATE TABLE cdc_orders (
+    id     BIGINT PRIMARY KEY DISABLE NOVALIDATE RELY,
+    amount DECIMAL(18, 2)
+);
+-- Auto-increment column (BIGINT only, not guaranteed sequential)
+CREATE TABLE auto_id_table (
+    id  BIGINT IDENTITY(1),    -- starts from 1
+    col STRING
+);
+-- Generated column (deterministic expression, cannot be manually inserted)
+CREATE TABLE orders_with_year (
+    id         BIGINT,
+    created_at TIMESTAMP,
+    year       INT GENERATED ALWAYS AS (YEAR(created_at))
+);
+-- Default values (supports non-deterministic functions)
+CREATE TABLE t_default (
+    id         INT,
+    created_at TIMESTAMP DEFAULT current_timestamp(),
+    status     STRING    DEFAULT 'active',
+    score      DOUBLE    DEFAULT random()
+);
+-- Partitioned table (Iceberg hidden partitions)
+CREATE TABLE orders_partitioned (
+    id         BIGINT,
+    amount     DECIMAL(18, 2),
+    created_at TIMESTAMP
+)
+PARTITIONED BY (days(created_at));             -- partition by day
+-- Partition transform functions
+-- years(col)   months(col)   days(col)   hours(col)
+-- bucket(N, col)   truncate(col, W)
+-- Bucketed table
+CREATE TABLE orders_bucketed (
+    id         BIGINT,
+    customer_id INT,
+    amount     DECIMAL(18, 2)
+)
+CLUSTERED BY (customer_id)
+SORTED BY (id ASC)
+INTO 16 BUCKETS;
+-- Data retention period
+CREATE TABLE orders (id BIGINT)
+PROPERTIES ('data_lifecycle' = '30');          -- retain for 30 days
+-- CTAS (Create Table As Select)
+CREATE TABLE orders_copy AS
+SELECT * FROM orders WHERE status = 'completed';
+-- External table (maps to object storage)
+CREATE EXTERNAL TABLE ext_orders (
+    id     BIGINT,
+    amount DECIMAL(18, 2)
+)
+LOCATION 'oss://bucket/orders/'
+STORED AS PARQUET;
+```
+**Differences from Snowflake:**
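+- Snowflake `CREATE OR REPLACE TABLE` → ClickZetta `CREATE TABLE IF NOT EXISTS` (not equivalent: an existing table is left unchanged, so run `DROP TABLE IF EXISTS` first when the definition must be replaced)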
+- Snowflake `CLUSTER BY (col)` → ClickZetta `CLUSTERED BY (col) INTO N BUCKETS`
+- Snowflake `AUTOINCREMENT` → ClickZetta `IDENTITY[(seed)]`
+- Snowflake `TRANSIENT TABLE` → ClickZetta has no equivalent (use `data_lifecycle` to control retention)
+- Snowflake `TEMPORARY TABLE` → ClickZetta has no temporary table concept
+- Snowflake `COPY GRANTS` → ClickZetta does not support
+**Differences from Spark SQL:**
+- Spark `USING PARQUET` → ClickZetta does not need it (default is Parquet)
+- Spark `TBLPROPERTIES` → ClickZetta `PROPERTIES`
+- Spark `LOCATION` external table syntax is basically the same
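+The Snowflake differences listed above can be sketched on a single statement. The bucket count of 16 and the seed of 1 are arbitrary illustrative choices, and the rewrite is untested.
+```sql
+-- Snowflake (original)
+CREATE OR REPLACE TABLE orders (
+    id          NUMBER AUTOINCREMENT,
+    customer_id INT,
+    amount      NUMBER(18, 2)
+)
+CLUSTER BY (customer_id);
+-- ClickZetta (rewritten)
+CREATE TABLE IF NOT EXISTS orders (
+    id          BIGINT IDENTITY(1),      -- AUTOINCREMENT becomes IDENTITY (BIGINT only)
+    customer_id INT,
+    amount      DECIMAL(18, 2)
+)
+CLUSTERED BY (customer_id)
+INTO 16 BUCKETS;                         -- CLUSTER BY becomes CLUSTERED BY ... INTO N BUCKETS
+```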
+### ALTER TABLE
+```sql
+-- Rename
+ALTER TABLE orders RENAME TO orders_v2;
+-- Comment
+ALTER TABLE orders SET COMMENT 'new comment';
+-- Time Travel retention period (days)
+ALTER TABLE orders SET PROPERTIES ('data_retention_days' = '7');
+-- Add column
+ALTER TABLE orders ADD COLUMN region STRING AFTER status;
+ALTER TABLE orders ADD COLUMN region STRING FIRST;
+-- Add nested field in complex types
+ALTER TABLE t ADD COLUMN address.zip STRING;           -- STRUCT nested
+ALTER TABLE t ADD COLUMN items.ELEMENT.price DOUBLE;   -- ARRAY<STRUCT> nested
+-- Alter column type (limited)
+ALTER TABLE orders ALTER COLUMN amount TYPE DOUBLE;
+-- Rename column
+ALTER TABLE orders RENAME COLUMN old_col TO new_col;
+-- Drop column
+ALTER TABLE orders DROP COLUMN unnecessary_col;
+-- Alter column comment
+ALTER TABLE orders ALTER COLUMN amount COMMENT 'Order amount';
+-- Add index (tables with ARRAY/JSON columns must add separately)
+-- ⚠️ Index syntax: BLOOMFILTER (not USING BLOOM_FILTER)
+CREATE BLOOMFILTER INDEX IF NOT EXISTS id_bf ON TABLE orders(id);
+CREATE BLOOMFILTER INDEX IF NOT EXISTS name_bf ON TABLE orders(name)
+    PROPERTIES ('analyzer' = 'ngram', 'n' = '3');  -- ngram tokenizer
+-- Inverted index
+CREATE INVERTED INDEX IF NOT EXISTS content_inv ON TABLE articles(content);
+-- Vector index (inline at table creation)
+-- See CREATE TABLE examples
+-- Drop index (⚠️ does not need ON table_name)
+DROP INDEX IF EXISTS id_bf;
+DROP INDEX id_bf;
+```
+**Differences from Snowflake:**
+- Snowflake `ALTER TABLE ... ADD COLUMN` can only add to the end; ClickZetta supports `FIRST/AFTER/BEFORE`
+- Both Snowflake and ClickZetta support `DROP COLUMN`
+- Snowflake has no BLOOM_FILTER/INVERTED/VECTOR indexes
+### DROP / TRUNCATE TABLE
+```sql
+-- Drop table (can be recovered with UNDROP)
+DROP TABLE IF EXISTS orders;
+DROP TABLE my_schema.orders;
+-- Truncate table (preserves structure)
+TRUNCATE TABLE orders;
+TRUNCATE TABLE IF EXISTS orders;               -- ✅ supports IF EXISTS
+-- Truncate specific partition
+TRUNCATE TABLE orders PARTITION (dt = '2024-01-01');
+TRUNCATE TABLE orders PARTITION (dt > '2024-01-01');
+TRUNCATE TABLE orders PARTITION (dt >= '2024-01-01' AND dt < '2024-02-01');
+```
+**Differences from Snowflake:**
+- Snowflake `TRUNCATE TABLE` does not support partition conditions; ClickZetta does
+- Snowflake `DROP TABLE` has no `PURGE` option (`PURGE` is Hive/Spark syntax); in both Snowflake and ClickZetta a dropped table can be restored with `UNDROP` within the retention period
+---
+## VIEW Operations
+```sql
+-- Create view
+CREATE VIEW IF NOT EXISTS order_summary AS
+SELECT customer_id, COUNT(*) AS cnt, SUM(amount) AS total
+FROM orders GROUP BY customer_id;
+-- Replace view (ClickZetta supports OR REPLACE, same as Snowflake)
+CREATE OR REPLACE VIEW order_summary AS
+SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;
+-- With column aliases and comments
+CREATE VIEW order_summary (cust_id COMMENT 'Customer ID', total COMMENT 'Total amount')
+COMMENT 'Order summary view'
+AS SELECT customer_id, SUM(amount) FROM orders GROUP BY 1;
+-- Drop
+DROP VIEW IF EXISTS order_summary;
+-- Show
+SHOW TABLES WHERE is_view = true;
+SHOW TABLES IN my_schema WHERE is_view = true;
+```
+**Note:** ClickZetta's `CREATE OR REPLACE VIEW` is the same as Snowflake, but `CREATE OR REPLACE TABLE` is not supported.
+---
+## INDEX Operations
+```sql
+-- Show indexes
+SHOW INDEX FROM table_name;
+SHOW INDEX FROM my_schema.table_name;
+-- Show index details
+DESC INDEX index_name;
+DESC INDEX EXTENDED index_name;
+-- Build index on existing data (vector and inverted indexes only, not Bloom Filter)
+BUILD INDEX index_name ON table_name;
+BUILD INDEX index_name ON table_name WHERE partition_col = '2024-01-01';
+```
+---
+## Viewing Object Information
+```sql
+-- Table structure
+DESC table_name;
+DESC EXTENDED table_name;                      -- includes size, record count, etc.
+DESCRIBE TABLE table_name;                     -- same as DESC
+-- Column information
+SHOW COLUMNS IN table_name;
+SHOW COLUMNS FROM table_name IN schema_name;
+-- Create table statement
+SHOW CREATE TABLE table_name;
+-- Table list
+SHOW TABLES;
+SHOW TABLES IN my_schema;
+SHOW TABLES LIKE 'order%';
+SHOW TABLES WHERE is_view = false AND is_materialized_view = false;
+SHOW TABLES WHERE is_dynamic = true;
+SHOW TABLES WHERE is_external = true;
+-- Partition information
+SHOW PARTITIONS table_name;
+SHOW PARTITIONS EXTENDED table_name;           -- includes file count, size, modification time
+SHOW PARTITIONS table_name PARTITION (dt = '2024-01-01');
+SHOW PARTITIONS table_name WHERE total_rows > 1000;
+-- History versions
+DESC HISTORY table_name;
+SHOW TABLES HISTORY;                           -- includes deleted tables
+```
+---
+## SYNONYM Operations
+```sql
+-- Create synonym for a table (cross-schema access)
+CREATE SYNONYM my_orders FOR TABLE other_schema.orders;
+-- Create synonym for a Volume
+CREATE SYNONYM my_vol FOR VOLUME other_schema.data_volume;
+-- Create synonym for a function
+CREATE SYNONYM my_func FOR FUNCTION other_schema.udf_name;
+-- Show synonyms
+SHOW SYNONYMS;
+SHOW SYNONYMS IN my_schema;
+SHOW SYNONYMS LIKE 'my_%';
+-- Drop synonym (must specify object type)
+DROP SYNONYM my_orders FOR TABLE;
+DROP SYNONYM my_vol FOR VOLUME;
+DROP SYNONYM my_func FOR FUNCTION;
+```
+> Supported object types for synonyms: TABLE (including regular tables, Table Streams, materialized views, dynamic tables), VOLUME, FUNCTION.
+> Use cases: cross-schema access, data consistency maintenance, application layer decoupling.
+---
+## Time Travel & Data Recovery
+```sql
+-- Query historical version
+SELECT * FROM orders TIMESTAMP AS OF '2024-01-01 00:00:00';
+SELECT * FROM orders TIMESTAMP AS OF CURRENT_TIMESTAMP() - INTERVAL 12 HOURS;
+SELECT * FROM orders TIMESTAMP AS OF CAST('2024-01-01' AS TIMESTAMP);
+-- Restore table to historical version (table not deleted)
+RESTORE TABLE orders TO TIMESTAMP AS OF '2024-01-01 00:00:00';
+-- Restore deleted table
+UNDROP TABLE orders;
+UNDROP TABLE my_schema.orders;
+-- Set retention period (0-90 days, default 1 day)
+ALTER TABLE orders SET PROPERTIES ('data_retention_days' = '7');
+```
+**Differences from Snowflake:**
+- Snowflake `AT (TIMESTAMP => ...)` → ClickZetta `TIMESTAMP AS OF ...`
+- Snowflake `BEFORE (STATEMENT => ...)` → ClickZetta does not support rollback by statement_id
+- Snowflake `UNDROP TABLE` → ClickZetta same
+- Snowflake default retention is 1 day (up to 90 days on Enterprise edition); ClickZetta default is 1 day, max 90 days
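+A side-by-side sketch of the first difference, plus one common Snowflake restore pattern for comparison. The Snowflake statements use its documented `AT (TIMESTAMP => ...)` and `CLONE` syntax; the name `orders_restored` is hypothetical, and nothing here has been run.
+```sql
+-- Snowflake: query a historical version
+SELECT * FROM orders AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);
+-- ClickZetta: query a historical version
+SELECT * FROM orders TIMESTAMP AS OF '2024-01-01 00:00:00';
+-- Snowflake: recover old data by cloning into a new table
+CREATE TABLE orders_restored CLONE orders AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);
+-- ClickZetta: restore the table in place
+RESTORE TABLE orders TO TIMESTAMP AS OF '2024-01-01 00:00:00';
+```
+Note that the two restore statements are not equivalent: the Snowflake clone creates a second table, while `RESTORE TABLE` rewinds the existing one.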

package/bin/skills/clickzetta-sql-migration/references/dml-differences.md ADDED

@@ -0,0 +1,192 @@
+# DML Differences vs Snowflake / Databricks / Spark
+> Focuses only on the DML (INSERT/UPDATE/DELETE/MERGE/COPY) syntax that **differs** from Snowflake, Databricks, or Spark SQL.
+> For the basic ClickZetta DML syntax that works the same as standard SQL, refer to the official ClickZetta Lakehouse documentation.
+---
+## Critical: Implicit Type Conversion
+⚠️ **The single most common migration error.** See [implicit-type-conversion.md](implicit-type-conversion.md) for the full rules table.
+Short version: ClickZetta rejects implicit string→date/timestamp/boolean/json/numeric conversion in INSERT/UPDATE. You must use explicit `CAST` or typed literals.
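+As a minimal sketch of the rule, the first INSERT below relies on implicit string conversion and the second spells every conversion out. The table `t_typed` and its columns are hypothetical, `PARSE_JSON` is taken from the pitfalls table in SKILL.md, and the statements are untested.
+```sql
+-- Hypothetical table, for illustration only
+CREATE TABLE t_typed (d DATE, ts TIMESTAMP, flag BOOLEAN, doc JSON);
+-- Relies on implicit string conversion: rejected by ClickZetta per the rule above
+INSERT INTO t_typed
+SELECT '2024-01-01', '2024-01-01 10:00:00', 'true', '{"k": 1}';
+-- Explicit form: typed literals, CAST, and PARSE_JSON
+INSERT INTO t_typed
+SELECT
+    DATE '2024-01-01',
+    CAST('2024-01-01 10:00:00' AS TIMESTAMP),
+    TRUE,
+    PARSE_JSON('{"k": 1}');
+```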
+---
+## INSERT Differences
+### Snowflake → ClickZetta
+| Snowflake | ClickZetta | Notes |
+|---|---|---|
+| `INSERT OVERWRITE INTO t SELECT ...` | `INSERT OVERWRITE TABLE t SELECT ...` ✅ | Snowflake also supports overwrite, but with the `INTO` keyword; the ClickZetta form shown here uses `TABLE` |
+| No `PARTITION (...)` clause | `INSERT INTO t PARTITION (dt='2024-01-01') VALUES ...` ✅ | Hive-style static partition |
+| No dynamic partition syntax | `INSERT INTO t PARTITION (dt) SELECT ..., dt FROM s` ✅ | Hive-style dynamic partition |
+### Spark → ClickZetta
+INSERT syntax is largely identical; most Spark INSERT statements run unchanged, subject to the implicit type conversion rule above.
+---
+## UPDATE Differences
+### Snowflake → ClickZetta
+```sql
+-- Snowflake: UPDATE ... FROM (JOIN-style update)
+UPDATE orders o SET amount = c.discount * o.amount
+FROM customers c WHERE o.customer_id = c.id;
+-- ClickZetta: use subquery
+-- (the subquery returns only the discount, so amount is multiplied once, as in the Snowflake version)
+UPDATE orders SET amount = (
+    SELECT discount FROM customers WHERE customers.id = orders.customer_id
+) * amount WHERE customer_id IN (SELECT id FROM customers);
+```
+ClickZetta additionally supports `ORDER BY + LIMIT` in UPDATE, which Snowflake does not:
+```sql
+-- ClickZetta-only: batch update
+UPDATE orders SET status = 'archived'
+WHERE created_at < '2020-01-01'
+ORDER BY created_at ASC
+LIMIT 10000;
+```
+### Spark → ClickZetta
+Spark SQL itself does not support UPDATE (only Delta Lake does). ClickZetta natively supports UPDATE on all tables.
+---
+## DELETE Differences
+### Spark → ClickZetta
+Spark SQL itself does not support DELETE (only Delta Lake does). ClickZetta natively supports DELETE on all tables.
+Snowflake DELETE syntax is essentially identical to ClickZetta.
+---
+## MERGE INTO: Important Limitations
+### Multiple WHEN NOT MATCHED clauses
+```sql
+-- ❌ Snowflake supports multiple WHEN NOT MATCHED — ClickZetta does NOT
+MERGE INTO t USING s ON t.id = s.id
+WHEN NOT MATCHED AND s.type = 'A' THEN INSERT ...
+WHEN NOT MATCHED AND s.type = 'B' THEN INSERT ...;
+-- ✅ ClickZetta: only one WHEN NOT MATCHED — combine logic with CASE
+MERGE INTO t USING s ON t.id = s.id
+WHEN NOT MATCHED THEN INSERT (id, val) VALUES (
+    s.id,
+    CASE s.type WHEN 'A' THEN ... WHEN 'B' THEN ... END
+);
+```
+### WHEN NOT MATCHED BY SOURCE (Databricks Delta Lake)
+```sql
+-- ❌ Databricks supports WHEN NOT MATCHED BY SOURCE — ClickZetta does NOT
+MERGE INTO target t USING source s ON t.id = s.id
+WHEN MATCHED THEN UPDATE ...
+WHEN NOT MATCHED THEN INSERT ...
+WHEN NOT MATCHED BY SOURCE THEN DELETE;  -- ❌ unsupported
+-- ✅ ClickZetta: split into two operations
+MERGE INTO target t USING source s ON t.id = s.id
+WHEN MATCHED THEN UPDATE SET t.val = s.val
+WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);
+DELETE FROM target WHERE id NOT IN (SELECT id FROM source);
+```
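+One caveat with the `NOT IN` form: under standard SQL three-valued logic, if `source.id` contains any NULL, the predicate is never true and the DELETE removes nothing. A hedged variant, assuming ClickZetta follows standard NULL semantics here, filters NULLs out of the subquery:
+```sql
+-- DELETE step that ignores NULL ids in source
+DELETE FROM target
+WHERE id NOT IN (SELECT id FROM source WHERE id IS NOT NULL);
+```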
+### Order of Multiple WHEN MATCHED clauses
+```sql
+-- ⚠️ ClickZetta requires UPDATE clauses BEFORE DELETE clauses
+MERGE INTO target t USING source s ON t.id = s.id
+WHEN MATCHED AND s.is_deleted = 0 THEN UPDATE SET ...   -- UPDATE first
+WHEN MATCHED AND s.is_deleted = 1 THEN DELETE          -- DELETE after
+WHEN NOT MATCHED THEN INSERT ...;
+```
+In Snowflake/Databricks, DELETE may appear before UPDATE.
+---
+## Transactions: Not Supported
+```sql
+-- ❌ All of these are unsupported in ClickZetta
+BEGIN;
+BEGIN TRANSACTION;
+START TRANSACTION;
+COMMIT;
+ROLLBACK;
+-- ✅ Use MERGE INTO for atomic UPSERT
+MERGE INTO target t USING source s ON t.id = s.id
+WHEN MATCHED THEN UPDATE SET ...
+WHEN NOT MATCHED THEN INSERT ...;
+```
+For multi-statement atomicity, design idempotent operations or use the `__commit_version` from Time Travel for compensating reads.
+---
+## Bulk Load: Stage → Volume, COPY INTO Differences
+### Snowflake → ClickZetta
+```sql
+-- Snowflake
+COPY INTO orders
+FROM @my_stage/data/2024/
+FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)
+PATTERN = '.*\.csv';
+-- ClickZetta
+COPY INTO orders
+FROM VOLUME my_oss_volume
+USING CSV
+OPTIONS('header' = 'true', 'sep' = ',')
+SUBDIRECTORY 'data/2024/'
+REGEXP '.*\.csv';
+```
+| Snowflake | ClickZetta |
+|---|---|
+| `@stage_name` | `VOLUME volume_name` |
+| `FILE_FORMAT = (TYPE = CSV ...)` | `USING CSV OPTIONS(...)` |
+| `PATTERN = '...'` | `REGEXP '...'` |
+| `FILES = ('a.csv','b.csv')` | `FILES('a.csv','b.csv')` |
+### Export
+```sql
+-- Snowflake
+COPY INTO @my_stage FROM orders FILE_FORMAT = (TYPE = PARQUET);
+-- ClickZetta
+COPY INTO VOLUME my_oss_volume
+SUBDIRECTORY 'export/orders/'
+FROM orders
+USING PARQUET;
+```
+---
+## Other ClickZetta-Specific DML Notes
+These ClickZetta features have no direct Snowflake equivalent, and some also differ from Databricks/Spark:
+- `INSERT INTO ... PARTITION (col)` — Hive-style dynamic partition (Snowflake auto-clusters via CLUSTER BY)
+- `COPY OVERWRITE INTO` — atomic overwrite-on-load
+- `RESTORE TABLE ... TO TIMESTAMP AS OF ...` — Time Travel restore (Snowflake uses different syntax, Delta uses VERSION AS OF)
+For the full DML syntax of these features, refer to ClickZetta Lakehouse documentation.