ducklake-delta-exporter 0.1.0__tar.gz → 0.1.1__tar.gz
- ducklake_delta_exporter-0.1.1/PKG-INFO +71 -0
- ducklake_delta_exporter-0.1.1/README.md +44 -0
- ducklake_delta_exporter-0.1.1/ducklake_delta_exporter.egg-info/PKG-INFO +71 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/setup.py +3 -3
- ducklake_delta_exporter-0.1.0/PKG-INFO +0 -52
- ducklake_delta_exporter-0.1.0/README.md +0 -25
- ducklake_delta_exporter-0.1.0/ducklake_delta_exporter.egg-info/PKG-INFO +0 -52
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter/__init__.py +0 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/SOURCES.txt +0 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/dependency_links.txt +0 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/requires.txt +0 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/top_level.txt +0 -0
- {ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/setup.cfg +0 -0

ducklake_delta_exporter-0.1.1/PKG-INFO
@@ -0,0 +1,71 @@
+Metadata-Version: 2.4
+Name: ducklake-delta-exporter
+Version: 0.1.1
+Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
+Home-page: https://github.com/djouallah/ducklake_delta_exporter
+Author: mim
+Author-email: your.email@example.com
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Development Status :: 3 - Alpha
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: duckdb
+Requires-Dist: pyarrow
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# 🦆 DuckLake Delta Exporter
+
+A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS** .
+
+It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+**Goal?**
+Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+---
+
+## ✨ Features
+
+- **DuckLake → Delta Sync**
+Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+- **Schema Mapping**
+Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+- **Change Detection**
+Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+- **Checkpointing**
+Automatically writes Delta checkpoints every N versions (configurable), so readers don’t have to replay the entire log from scratch.
+
+---
+
+## ⚙️ Installation & Usage
+
+Install via pip:
+
+```bash
+pip install ducklake-delta-exporter
+```
+
+```
+from ducklake_delta_exporter import generate_latest_delta_log
+
+generate_latest_delta_log('/lakehouse/default/Files/meta.db')
+```
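
The long description lists "DuckLake → Delta Sync", which writes Delta Lake `_delta_log/*.json` transaction files. The package's own source (`ducklake_delta_exporter/__init__.py`) is unchanged in this diff and not shown, so what follows is only a minimal sketch of what such a commit file looks like under the Delta Lake protocol: newline-delimited JSON actions in a file named after the table version, zero-padded to 20 digits. The helper names, file paths and sizes are hypothetical, and the sketch assumes a new, unpartitioned table.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def commit_file_name(version: int) -> str:
    # Delta commit files are named by version, zero-padded to 20 digits.
    return f"{version:020d}.json"


def initial_commit_actions(schema_string: str, data_files: list) -> list:
    # Hypothetical helper: actions for version 0 of an unpartitioned table.
    # data_files holds (path, size_in_bytes) pairs.
    now_ms = int(time.time() * 1000)
    actions = [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {
            "id": str(uuid.uuid4()),
            "format": {"provider": "parquet", "options": {}},
            "schemaString": schema_string,
            "partitionColumns": [],
            "configuration": {},
            "createdTime": now_ms,
        }},
    ]
    for path, size in data_files:
        actions.append({"add": {
            "path": path,  # relative to the table root in this sketch
            "partitionValues": {},
            "size": size,
            "modificationTime": now_ms,
            "dataChange": True,
        }})
    return actions


def write_commit(table_root: Path, version: int, actions: list) -> Path:
    log_dir = table_root / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / commit_file_name(version)
    # One JSON object per line (newline-delimited JSON).
    body = "\n".join(json.dumps(action) for action in actions) + "\n"
    target.write_text(body, encoding="utf-8")
    return target


schema = '{"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}}]}'
with tempfile.TemporaryDirectory() as tmp:
    written = write_commit(Path(tmp), 0, initial_commit_actions(schema, [("part-0001.parquet", 1024)]))
    written_name = written.name
    lines = written.read_text(encoding="utf-8").splitlines()
```

Version 0 must carry the `protocol` and `metaData` actions; later commits would typically contain only `add`/`remove` actions (plus optional `commitInfo`).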

ducklake_delta_exporter-0.1.1/README.md
@@ -0,0 +1,44 @@
+# 🦆 DuckLake Delta Exporter
+
+A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS** .
+
+It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+**Goal?**
+Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+---
+
+## ✨ Features
+
+- **DuckLake → Delta Sync**
+Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+- **Schema Mapping**
+Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+- **Change Detection**
+Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+- **Checkpointing**
+Automatically writes Delta checkpoints every N versions (configurable), so readers don’t have to replay the entire log from scratch.
+
+---
+
+## ⚙️ Installation & Usage
+
+Install via pip:
+
+```bash
+pip install ducklake-delta-exporter
+```
+
+```
+from ducklake_delta_exporter import generate_latest_delta_log
+
+generate_latest_delta_log('/lakehouse/default/Files/meta.db')
+```
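
The README's "Schema Mapping" feature converts DuckDB types to their Spark SQL equivalents for the Delta schema. The mapping table below is an assumption for illustration, not the package's actual table: it covers common DuckDB scalar types and builds the `schemaString` JSON that a Delta `metaData` action carries. Marking every column `nullable` is also an assumption, and nested types (LIST, STRUCT, MAP) are deliberately left out.

```python
import json
import re

# Hypothetical mapping from DuckDB type names to Delta/Spark primitive type names.
DUCKDB_TO_DELTA = {
    "BOOLEAN": "boolean",
    "TINYINT": "byte",
    "SMALLINT": "short",
    "INTEGER": "integer",
    "BIGINT": "long",
    "FLOAT": "float",
    "REAL": "float",
    "DOUBLE": "double",
    "VARCHAR": "string",
    "BLOB": "binary",
    "DATE": "date",
    "TIMESTAMP": "timestamp",
}

_DECIMAL = re.compile(r"DECIMAL\((\d+),\s*(\d+)\)")


def map_type(duckdb_type: str) -> str:
    t = duckdb_type.strip().upper()
    match = _DECIMAL.fullmatch(t)
    if match:
        # Delta serializes decimals as "decimal(precision,scale)".
        return f"decimal({match.group(1)},{match.group(2)})"
    try:
        return DUCKDB_TO_DELTA[t]
    except KeyError:
        raise ValueError(f"unsupported DuckDB type: {duckdb_type}") from None


def schema_string(columns: list) -> str:
    # columns: (name, duckdb_type) pairs, e.g. from DuckDB's DESCRIBE output.
    fields = [
        {"name": name, "type": map_type(dtype), "nullable": True, "metadata": {}}
        for name, dtype in columns
    ]
    return json.dumps({"type": "struct", "fields": fields})
```

Raising on unknown types, rather than silently falling back to `string`, keeps a mismatch visible instead of producing a log that readers will misinterpret.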

ducklake_delta_exporter-0.1.1/ducklake_delta_exporter.egg-info/PKG-INFO
@@ -0,0 +1,71 @@
+Metadata-Version: 2.4
+Name: ducklake-delta-exporter
+Version: 0.1.1
+Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
+Home-page: https://github.com/djouallah/ducklake_delta_exporter
+Author: mim
+Author-email: your.email@example.com
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Development Status :: 3 - Alpha
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: duckdb
+Requires-Dist: pyarrow
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# 🦆 DuckLake Delta Exporter
+
+A Python utility to **bridge the gap between DuckLake and Delta Lake** by generating Delta-compatible transaction logs directly from DuckLake metadata.
+
+This isn’t your typical general-purpose library. It’s mostly battle-tested with **OneLake mounted storage**, and while it *should* work with local filesystems, there’s **no support for S3, GCS, or ABFSS** .
+
+It doesn’t use the `deltalake` Python package either. The metadata is handcrafted from scratch — because why not reinvent the wheel for fun and learning?
+
+**Goal?**
+Mostly to annoy DuckDB developers into finally shipping a proper Delta Lake metadata exporter 😎
+
+🔗 [Source code on GitHub](https://github.com/djouallah/ducklake_delta_exporter)
+
+---
+
+## ✨ Features
+
+- **DuckLake → Delta Sync**
+Generates Delta Lake `_delta_log/*.json` transaction files and Parquet checkpoints from the latest DuckLake state.
+
+- **Schema Mapping**
+Converts DuckDB types to their Spark SQL equivalents so Delta can understand them without throwing a tantrum.
+
+- **Change Detection**
+Detects file-level additions/removals since the last export — keeps things incremental and tidy.
+
+- **Checkpointing**
+Automatically writes Delta checkpoints every N versions (configurable), so readers don’t have to replay the entire log from scratch.
+
+---
+
+## ⚙️ Installation & Usage
+
+Install via pip:
+
+```bash
+pip install ducklake-delta-exporter
+```
+
+```
+from ducklake_delta_exporter import generate_latest_delta_log
+
+generate_latest_delta_log('/lakehouse/default/Files/meta.db')
+```
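
"Change Detection" is described as detecting file-level additions and removals since the last export. One straightforward way to compute that, sketched here under the assumption that both the last exported Delta state and the current DuckLake state can be reduced to sets of data-file paths, is a pair of set differences that become Delta `add` and `remove` actions. The helper name and inputs are hypothetical, and file statistics are omitted for brevity.

```python
import time


def diff_actions(exported: set, current: dict) -> list:
    """Hypothetical helper.

    exported: data-file paths already live in the Delta log after the last export.
    current:  path -> size in bytes for the data files DuckLake reports now.
    """
    now_ms = int(time.time() * 1000)
    current_paths = set(current)
    actions = []
    # Files present now but not yet exported become "add" actions.
    for path in sorted(current_paths - exported):
        actions.append({"add": {
            "path": path,
            "partitionValues": {},
            "size": current[path],
            "modificationTime": now_ms,
            "dataChange": True,
        }})
    # Files exported earlier but no longer present become "remove" actions.
    for path in sorted(exported - current_paths):
        actions.append({"remove": {
            "path": path,
            "deletionTimestamp": now_ms,
            "dataChange": True,
        }})
    return actions
```

An empty result means nothing changed, so no new commit file needs to be written; sorting the paths keeps the emitted log deterministic across runs.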

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/setup.py
@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
 
 setup(
 name='ducklake-delta-exporter',
-version='0.1.
+version='0.1.1',
 packages=find_packages(),
 install_requires=[
 'duckdb',
@@ -12,9 +12,9 @@ setup(
 author='mim',
 author_email='your.email@example.com',
 description='A utility to export DuckLake database metadata to Delta Lake transaction logs.',
-long_description=open('README.md').read(),
+long_description=open('README.md', encoding='utf-8').read(),
 long_description_content_type='text/markdown',
-url='https://github.com/djouallah/
+url='https://github.com/djouallah/ducklake_delta_exporter',
 classifiers=[
 'Programming Language :: Python :: 3',
 'License :: OSI Approved :: MIT License',
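
The `setup.py` change adds `encoding='utf-8'` to the `open()` call that reads `README.md`. Without it, `open()` decodes with the platform's locale-dependent default encoding, and the 0.1.1 README now contains non-ASCII characters (🦆, →, ✨, ⚙️, ’) that such a default may fail to decode, for instance under an ASCII-only locale. That is plausibly the motivation for the change. The sketch below reproduces the situation with a throwaway file; `read_readme` and `decodes_as` are hypothetical helpers, not part of the package.

```python
import tempfile
from pathlib import Path

README_TEXT = "# 🦆 DuckLake Delta Exporter\n- **DuckLake → Delta Sync**\n"


def read_readme(path: Path) -> str:
    # Explicit encoding: the result no longer depends on the platform's locale.
    with open(path, encoding="utf-8") as f:
        return f.read()


def decodes_as(path: Path, encoding: str) -> bool:
    # Hypothetical helper: does the file's raw content decode under this codec?
    try:
        path.read_bytes().decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text(README_TEXT, encoding="utf-8")
    text = read_readme(readme)
    ascii_ok = decodes_as(readme, "ascii")
```

Here `text` round-trips exactly, while decoding the same bytes as ASCII fails on the first emoji.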

ducklake_delta_exporter-0.1.0/PKG-INFO
@@ -1,52 +0,0 @@
-Metadata-Version: 2.4
-Name: ducklake-delta-exporter
-Version: 0.1.0
-Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
-Home-page: https://github.com/djouallah/ducklake-delta-exporter
-Author: mim
-Author-email: your.email@example.com
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Intended Audience :: Developers
-Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Classifier: Development Status :: 3 - Alpha
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-Requires-Dist: duckdb
-Requires-Dist: pyarrow
-Dynamic: author
-Dynamic: author-email
-Dynamic: classifier
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: home-page
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-
-
-# DuckLake Delta Exporter
-A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
-
-# Features
-DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
-
-Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
-
-Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
-
-Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
-
-# Installation
-You can install this package using pip:
-
-pip install ducklake-delta-exporter
-
-
-
-# Usage
-```
-from ducklake_delta_exporter import generate_latest_delta_log
-generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
-```
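
The removed 0.1.0 description mentions checkpoint files (`_delta_log/*.checkpoint.parquet`) written at a configurable interval, and its usage example passes `checkpoint_interval=1`. Under the Delta Lake protocol, a single-part checkpoint for version v is named with v zero-padded to 20 digits followed by `.checkpoint.parquet`, and `_delta_log/_last_checkpoint` holds a small JSON document whose `version` and `size` (number of actions in the checkpoint) fields point readers at the newest checkpoint. The sketch below covers only the decision and naming logic; the rule "checkpoint at every positive multiple of N" is an assumption, and writing the Parquet body itself (e.g. with pyarrow, which the package lists as a dependency) is omitted.

```python
import json


def should_checkpoint(version: int, checkpoint_interval: int) -> bool:
    # Assumed rule: checkpoint whenever the new version is a positive multiple of N.
    if checkpoint_interval < 1:
        raise ValueError("checkpoint_interval must be >= 1")
    return version > 0 and version % checkpoint_interval == 0


def checkpoint_file_name(version: int) -> str:
    # Single-part checkpoint name for the given table version.
    return f"{version:020d}.checkpoint.parquet"


def last_checkpoint_payload(version: int, num_actions: int) -> str:
    # Contents of _delta_log/_last_checkpoint: newest checkpoint version and its action count.
    return json.dumps({"version": version, "size": num_actions})
```

With `checkpoint_interval=1`, as in the 0.1.0 example, every commit after version 0 would also get a checkpoint, trading extra Parquet writes for the shortest possible log replay.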

ducklake_delta_exporter-0.1.0/README.md
@@ -1,25 +0,0 @@
-
-# DuckLake Delta Exporter
-A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
-
-# Features
-DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
-
-Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
-
-Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
-
-Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
-
-# Installation
-You can install this package using pip:
-
-pip install ducklake-delta-exporter
-
-
-
-# Usage
-```
-from ducklake_delta_exporter import generate_latest_delta_log
-generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
-```

ducklake_delta_exporter-0.1.0/ducklake_delta_exporter.egg-info/PKG-INFO
@@ -1,52 +0,0 @@
-Metadata-Version: 2.4
-Name: ducklake-delta-exporter
-Version: 0.1.0
-Summary: A utility to export DuckLake database metadata to Delta Lake transaction logs.
-Home-page: https://github.com/djouallah/ducklake-delta-exporter
-Author: mim
-Author-email: your.email@example.com
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Intended Audience :: Developers
-Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Classifier: Development Status :: 3 - Alpha
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-Requires-Dist: duckdb
-Requires-Dist: pyarrow
-Dynamic: author
-Dynamic: author-email
-Dynamic: classifier
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: home-page
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-
-
-# DuckLake Delta Exporter
-A Python utility to synchronize metadata from a DuckLake database with Delta Lake transaction logs. This allows you to manage data in DuckLake and make it discoverable and queryable by Delta Lake compatible tools (e.g., Spark, Delta Lake Rust/Python clients).
-
-# Features
-DuckLake to Delta Sync: Generates incremental Delta Lake transaction logs (_delta_log/*.json) and checkpoint files (_delta_log/*.checkpoint.parquet) based on the latest state of tables in a DuckLake database.
-
-Schema Mapping: Automatically maps DuckDB data types to their Spark SQL equivalents for Delta Lake schema definitions.
-
-Change Detection: Identifies added and removed data files since the last Delta export, ensuring only necessary updates are written to the log.
-
-Checkpointing: Supports creating Delta Lake checkpoint files at a configurable interval for efficient state reconstruction.
-
-# Installation
-You can install this package using pip:
-
-pip install ducklake-delta-exporter
-
-
-
-# Usage
-```
-from ducklake_delta_exporter import generate_latest_delta_log
-generate_latest_delta_log('path/to/your/ducklake.db', data_root='/lakehouse/default/Tables', checkpoint_interval=1)
-```
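
{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter/__init__.py
RENAMED
File without changes

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/SOURCES.txt
RENAMED
File without changes

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/dependency_links.txt
RENAMED
File without changes

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/requires.txt
RENAMED
File without changes

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/ducklake_delta_exporter.egg-info/top_level.txt
RENAMED
File without changes

{ducklake_delta_exporter-0.1.0 → ducklake_delta_exporter-0.1.1}/setup.cfg
RENAMED
File without changes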