ducklake-delta-exporter 0.1.0__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,72 @@
+ Metadata-Version: 2.4
+ Name: ducklake-delta-exporter
+ Version: 0.1.2
+ Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
+ Home-page: https://github.com/djouallah/ducklake_delta_exporter
+ Author: mim
+ Author-email: your.email@example.com
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Development Status :: 3 - Alpha
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: duckdb
+ Requires-Dist: pyarrow
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # 🦆 DuckLake Delta Exporter
+
+ A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+ This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS**.
+
+ It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+ **Goal?**
+ Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+ 🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+ ---
+
+ ## ✨ Features
+
+ - **DuckLake → Delta Sync**
+   Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+ - **Schema Mapping**
+   Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+ - **Change Detection**
+   Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+ - **Checkpointing**
+   Automatically writes Delta checkpoints (at every version in the current release), so readers don’t have to replay the entire log from scratch.
+
+ ---
+
+ ## ⚙️ Installation & Usage
+
+ Install via pip:
+
+ ```bash
+ pip install ducklake-delta-exporter
+ ```
+
+ ```python
+ from ducklake_delta_exporter import generate_latest_delta_log
+
+ generate_latest_delta_log('/lakehouse/default/Files/meta.db', '/lakehouse/default/Tables')
+ ```
+ The data path is optional; when omitted, the exporter reads the `data_path` value from the DuckLake metadata. Passing it explicitly is handy when the metadata stores relative paths.
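The "Schema Mapping" feature above amounts to a lookup from DuckDB type names to Spark SQL type names. The exporter's actual mapping table is not part of this diff, so the sketch below is illustrative only (the `duckdb_to_spark_type` helper and its entries are assumptions, not the package's API):

```python
# Illustrative DuckDB -> Spark SQL type mapping, in the spirit of what a
# Delta schema exporter needs. Not the actual table used by the package.
DUCKDB_TO_SPARK = {
    "BOOLEAN": "boolean",
    "TINYINT": "byte",
    "SMALLINT": "short",
    "INTEGER": "integer",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "VARCHAR": "string",
    "DATE": "date",
    "TIMESTAMP": "timestamp",
    "BLOB": "binary",
}

def duckdb_to_spark_type(duckdb_type: str) -> str:
    """Map a DuckDB type name to its Spark SQL equivalent.

    DECIMAL(p,s) keeps its precision and scale; unknown types fall
    back to string here (one possible policy, not necessarily the
    exporter's).
    """
    t = duckdb_type.upper()
    if t.startswith("DECIMAL"):
        return t.lower()  # e.g. DECIMAL(18,3) -> decimal(18,3)
    return DUCKDB_TO_SPARK.get(t, "string")
```

Falling back to `string` for unknown types is a choice made for this sketch; a stricter exporter might raise on unmapped types instead.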
@@ -0,0 +1,45 @@
+ # 🦆 DuckLake Delta Exporter
+
+ A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+ This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS**.
+
+ It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+ **Goal?**
+ Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+ 🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+ ---
+
+ ## ✨ Features
+
+ - **DuckLake → Delta Sync**
+   Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+ - **Schema Mapping**
+   Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+ - **Change Detection**
+   Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+ - **Checkpointing**
+   Automatically writes Delta checkpoints (at every version in the current release), so readers don’t have to replay the entire log from scratch.
+
+ ---
+
+ ## ⚙️ Installation & Usage
+
+ Install via pip:
+
+ ```bash
+ pip install ducklake-delta-exporter
+ ```
+
+ ```python
+ from ducklake_delta_exporter import generate_latest_delta_log
+
+ generate_latest_delta_log('/lakehouse/default/Files/meta.db', '/lakehouse/default/Tables')
+ ```
+ The data path is optional; when omitted, the exporter reads the `data_path` value from the DuckLake metadata. Passing it explicitly is handy when the metadata stores relative paths.
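Since the log is handcrafted rather than produced by the `deltalake` package, each commit is just a newline-delimited JSON file under `_delta_log/`, as described by the public Delta transaction log protocol. A minimal sketch of that layout (the `write_delta_commit` helper and all literal values are illustrative, not the exporter's code):

```python
import json
import os
import tempfile
import time
import uuid

def write_delta_commit(table_path: str, version: int, actions: list) -> str:
    """Write one Delta commit: newline-delimited JSON actions in _delta_log/."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Commit files are named with a zero-padded 20-digit version number.
    commit_path = os.path.join(log_dir, f"{version:020d}.json")
    with open(commit_path, "w", encoding="utf-8") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return commit_path

# Version 0 must establish the protocol and table metadata, then list files.
schema = {"type": "struct", "fields": [
    {"name": "id", "type": "long", "nullable": True, "metadata": {}},
]}
now_ms = int(time.time() * 1000)
actions = [
    {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
    {"metaData": {"id": str(uuid.uuid4()),
                  "format": {"provider": "parquet", "options": {}},
                  "schemaString": json.dumps(schema),
                  "partitionColumns": [], "configuration": {},
                  "createdTime": now_ms}},
    {"add": {"path": "part-00000.parquet", "partitionValues": {},
             "size": 1024, "modificationTime": now_ms, "dataChange": True}},
]

table = tempfile.mkdtemp()
commit_file = write_delta_commit(table, 0, actions)
```

Later commits carry only `add`/`remove` actions for the files that changed, which is what the incremental change detection above feeds.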
@@ -160,7 +160,7 @@ def get_latest_delta_version_info(delta_log_path, con, table_id):
      return last_delta_version_idx, files_in_last_delta_version, last_exported_ducklake_snapshot_id, meta_id_from_delta_log
 
 
- def generate_latest_delta_log(db_path: str, data_root: str='/lakehouse/default/Tables', checkpoint_interval: int = 1):
+ def generate_latest_delta_log(db_path: str, data_root: str = None):
      """
      Generates a Delta Lake transaction log for the LATEST state of each table in a DuckLake database.
      This creates incremental updates to Delta, not a full history.
@@ -168,10 +168,11 @@ def generate_latest_delta_log(db_path: str, data_root: str='/lakehouse/default/T
      Args:
          db_path (str): The path to the DuckLake database file.
          data_root (str): The root directory for the lakehouse data.
-         checkpoint_interval (int): The interval at which to create checkpoint files.
      """
+     checkpoint_interval = 1
      con = duckdb.connect(db_path, read_only=True)
-
+     if data_root is None:  # Only fetch from DB if not provided by user
+         data_root = con.sql(""" SELECT value FROM ducklake_metadata WHERE key = 'data_path' """).fetchone()[0]
      tables = con.sql("""
          SELECT
              t.table_id,
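The hunk above changes how the data root is chosen: use the caller's `data_root` if given, otherwise read the `data_path` key from the `ducklake_metadata` table. That resolution logic, isolated as a standalone sketch (the `resolve_data_root` helper is hypothetical; the real code queries DuckDB directly):

```python
def resolve_data_root(explicit_root, metadata_rows):
    """Prefer the caller's data_root; otherwise fall back to the
    'data_path' entry of ducklake_metadata (modeled here as (key, value) rows)."""
    if explicit_root is not None:
        return explicit_root
    for key, value in metadata_rows:
        if key == "data_path":
            return value
    raise ValueError("ducklake_metadata has no 'data_path' entry")

# Rows as they might come back from: SELECT key, value FROM ducklake_metadata
rows = [("version", "0.1"), ("data_path", "/lakehouse/default/Tables")]
```

This is why the `data_root` argument became optional in 0.1.2: the metadata database already records where the data lives.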
@@ -0,0 +1,72 @@
+ Metadata-Version: 2.4
+ Name: ducklake-delta-exporter
+ Version: 0.1.2
+ Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
+ Home-page: https://github.com/djouallah/ducklake_delta_exporter
+ Author: mim
+ Author-email: your.email@example.com
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Development Status :: 3 - Alpha
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: duckdb
+ Requires-Dist: pyarrow
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # 🦆 DuckLake Delta Exporter
+
+ A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+ This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS**.
+
+ It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+ **Goal?**
+ Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+ 🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+ ---
+
+ ## ✨ Features
+
+ - **DuckLake → Delta Sync**
+   Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+ - **Schema Mapping**
+   Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+ - **Change Detection**
+   Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+ - **Checkpointing**
+   Automatically writes Delta checkpoints (at every version in the current release), so readers don’t have to replay the entire log from scratch.
+
+ ---
+
+ ## ⚙️ Installation & Usage
+
+ Install via pip:
+
+ ```bash
+ pip install ducklake-delta-exporter
+ ```
+
+ ```python
+ from ducklake_delta_exporter import generate_latest_delta_log
+
+ generate_latest_delta_log('/lakehouse/default/Files/meta.db', '/lakehouse/default/Tables')
+ ```
+ The data path is optional; when omitted, the exporter reads the `data_path` value from the DuckLake metadata. Passing it explicitly is handy when the metadata stores relative paths.
@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
 
  setup(
      name='ducklake-delta-exporter',
-     version='0.1.0',
+     version='0.1.2',
      packages=find_packages(),
      install_requires=[
          'duckdb',
@@ -12,9 +12,9 @@ setup(
      author='mim',
      author_email='your.email@example.com',
      description='A utility to export DuckLake database metadata to Delta Lake transaction logs.',
-     long_description=open('README.md').read(),
+     long_description=open('README.md', encoding='utf-8').read(),
      long_description_content_type='text/markdown',
-     url='https://github.com/djouallah/ducklake-delta-exporter',
+     url='https://github.com/djouallah/ducklake_delta_exporter',
      classifiers=[
          'Programming Language :: Python :: 3',
          'License :: OSI Approved :: MIT License',
@@ -1,52 +0,0 @@
- Metadata-Version: 2.4
- Name: ducklake-delta-exporter
- Version: 0.1.0
- Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
- Home-page: https://github.com/djouallah/ducklake-delta-exporter
- Author: mim
- Author-email: your.email@example.com
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Intended Audience :: Developers
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Classifier: Development Status :: 3 - Alpha
- Requires-Python: >=3.8
- Description-Content-Type: text/markdown
- Requires-Dist: duckdb
- Requires-Dist: pyarrow
- Dynamic: author
- Dynamic: author-email
- Dynamic: classifier
- Dynamic: description
- Dynamic: description-content-type
- Dynamic: home-page
- Dynamic: requires-dist
- Dynamic: requires-python
- Dynamic: summary
-
-
- # DuckLake Delta Exporter
- A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
-
- # Features
- DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
-
- Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
-
- Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
-
- Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
-
- # Installation
- You can install this package using pip:
-
- pip install ducklake-delta-exporter
-
-
-
- # Usage
- ```
- from ducklake_delta_exporter import generate_latest_delta_log
- generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
- ```
1
-
2
- # DuckLake Delta Exporter
3
- A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
4
-
5
- # Features
6
- DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
7
-
8
- Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
9
-
10
- Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
11
-
12
- Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
13
-
14
- # Installation
15
- You can install this package using pip:
16
-
17
- pip install ducklake-delta-exporter
18
-
19
-
20
-
21
- # Usage
22
- ```
23
- from ducklake_delta_exporter import generate_latest_delta_log
24
- generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
25
- ```
@@ -1,52 +0,0 @@
- Metadata-Version: 2.4
- Name: ducklake-delta-exporter
- Version: 0.1.0
- Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
- Home-page: https://github.com/djouallah/ducklake-delta-exporter
- Author: mim
- Author-email: your.email@example.com
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Intended Audience :: Developers
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Classifier: Development Status :: 3 - Alpha
- Requires-Python: >=3.8
- Description-Content-Type: text/markdown
- Requires-Dist: duckdb
- Requires-Dist: pyarrow
- Dynamic: author
- Dynamic: author-email
- Dynamic: classifier
- Dynamic: description
- Dynamic: description-content-type
- Dynamic: home-page
- Dynamic: requires-dist
- Dynamic: requires-python
- Dynamic: summary
-
-
- # DuckLake Delta Exporter
- A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
-
- # Features
- DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
-
- Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
-
- Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
-
- Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
-
- # Installation
- You can install this package using pip:
-
- pip install ducklake-delta-exporter
-
-
-
- # Usage
- ```
- from ducklake_delta_exporter import generate_latest_delta_log
- generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
- ```