duckrun 0.0.0__tar.gz → 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
duckrun-0.1.1/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Mimoune
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
duckrun-0.1.1/PKG-INFO ADDED
@@ -0,0 +1,183 @@
+ Metadata-Version: 2.4
+ Name: duckrun
+ Version: 0.1.1
+ Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/djouallah/duckrun
+ Project-URL: Repository, https://github.com/djouallah/duckrun
+ Project-URL: Issues, https://github.com/djouallah/duckrun/issues
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: duckdb>=1.2.0
+ Requires-Dist: deltalake>=0.18.2
+ Requires-Dist: requests>=2.28.0
+ Dynamic: license-file
+
+ # 🦆 Duckrun
+
+ Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and delta-rs.
+
+ ## Installation
+
+ ```bash
+ pip install duckrun
+ ```
+
+ ## Quick Start
+
+ ```python
+ import duckrun as dr
+
+ # Connect to your Fabric lakehouse
+ lakehouse = dr.connect(
+     workspace="my_workspace",
+     lakehouse_name="my_lakehouse",
+     schema="dbo",
+     sql_folder="./sql"  # folder containing your .sql and .py files
+ )
+
+ # Define your pipeline
+ pipeline = [
+     ('load_data', (url, path)),   # Python task
+     ('clean_data', 'overwrite'),  # SQL task
+     ('aggregate', 'append')       # SQL task
+ ]
+
+ # Run it
+ lakehouse.run(pipeline)
+ ```
+
+ ## How It Works
+
+ Duckrun runs two types of tasks:
+
+ ### 1. Python Tasks
+ Format: `('function_name', (arg1, arg2, ...))`
+
+ Create a file `sql_folder/function_name.py` containing a function of the same name:
+
+ ```python
+ # sql_folder/load_data.py
+ def load_data(url, path):
+     # your code here
+     # IMPORTANT: must return 1 for success, 0 for failure
+     return 1
+ ```
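+
+ For instance, a download task might wrap its work in try/except so it can report the 1/0 result honestly (a hypothetical sketch; the file name, function name, and URL handling are illustrative, not part of duckrun):
+
+ ```python
+ # sql_folder/fetch_file.py (hypothetical task)
+ import requests
+
+ def fetch_file(url, path):
+     try:
+         resp = requests.get(url, timeout=30)
+         resp.raise_for_status()
+         with open(path, "wb") as f:
+             f.write(resp.content)
+         return 1  # success
+     except Exception as e:
+         print(f"fetch_file failed: {e}")
+         return 0  # failure
+ ```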
+
+ ### 2. SQL Tasks
+ Format: `('table_name', 'mode')` or `('table_name', 'mode', {params})`
+
+ Create a file `sql_folder/table_name.sql`:
+
+ ```sql
+ -- sql_folder/clean_data.sql
+ SELECT
+     id,
+     TRIM(name) AS name,
+     date
+ FROM raw_data
+ WHERE date >= '2024-01-01'
+ ```
+
+ **Modes:**
+ - `overwrite` - Replace the table completely
+ - `append` - Add rows to the existing table
+ - `ignore` - Create the table only if it doesn't already exist (see the sketch below)
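+
+ For example, `ignore` suits one-time seed tables that later runs should leave alone (an illustrative pipeline; the table names are hypothetical):
+
+ ```python
+ pipeline = [
+     ('dim_date', 'ignore'),           # created once, skipped on later runs
+     ('staging_orders', 'overwrite'),  # rebuilt on every run
+     ('fact_sales', 'append')          # accumulates across runs
+ ]
+ lakehouse.run(pipeline)
+ ```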
+
+ ## Task Files
+
+ The `sql_folder` can contain a mix of `.sql` and `.py` files, so a single pipeline can combine SQL transformations with Python logic.
+
+ ### SQL Files
+ Your SQL files automatically have access to:
+ - `$ws` - workspace name
+ - `$lh` - lakehouse name
+ - `$schema` - schema name
+
+ Pass custom parameters:
+
+ ```python
+ pipeline = [
+     ('sales', 'append', {'start_date': '2024-01-01', 'end_date': '2024-12-31'})
+ ]
+ ```
+
+ ```sql
+ -- sql_folder/sales.sql
+ SELECT * FROM transactions
+ WHERE date BETWEEN '$start_date' AND '$end_date'
+ ```
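+
+ The `$name` placeholders follow the same convention as Python's `string.Template`, so the substitution step can be pictured like this (a minimal sketch of the mechanism, not necessarily duckrun's exact internals):
+
+ ```python
+ from string import Template
+
+ sql = Template(open('sql_folder/sales.sql').read())
+ query = sql.substitute(
+     ws='my_workspace', lh='my_lakehouse', schema='dbo',
+     start_date='2024-01-01', end_date='2024-12-31'
+ )
+ # 'query' is now plain SQL that DuckDB can execute
+ ```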
+
+ ## Table Name Convention
+
+ Use `__` to create variants of the same table:
+
+ ```python
+ pipeline = [
+     ('sales__initial', 'overwrite', {}),   # writes to 'sales' table
+     ('sales__incremental', 'append', {}),  # appends to 'sales' table
+ ]
+ ```
+
+ Both write to the same `sales` table, but use different SQL files.
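+
+ In other words, the target table is everything before the first `__` (a plausible sketch of the convention, assuming a plain string split):
+
+ ```python
+ def target_table(task_name):
+     # 'sales__incremental' -> 'sales'; names without '__' pass through
+     return task_name.split('__')[0]
+
+ assert target_table('sales__initial') == 'sales'
+ assert target_table('dim_customer') == 'dim_customer'
+ ```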
+
+ ## Query Data
+
+ ```python
+ # Run queries
+ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
+
+ # Get as DataFrame
+ df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
+ ```
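+
+ The `.show()` / `.df()` pair matches DuckDB's relation API, so if `sql()` returns a standard DuckDB relation (an assumption, not stated above), the other relation methods should work as well:
+
+ ```python
+ tbl = lakehouse.sql("SELECT * FROM sales").arrow()       # pyarrow.Table
+ rows = lakehouse.sql("SELECT id FROM sales").fetchall()  # list of tuples
+ ```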
+
+ ## Real-World Example
+
+ ```python
+ import duckrun as dr
+
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="./etl"
+ )
+
+ # Daily pipeline
+ daily = [
+     ('download_files', (api_url, local_path)),
+     ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
+     ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
+     ('fact_sales', 'append'),
+     ('dim_customer', 'overwrite')
+ ]
+
+ lakehouse.run(daily)
+
+ # Check results
+ lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
+ ```
+
+ ## Remote SQL Files
+
+ You can load SQL and Python task files from a URL:
+
+ ```python
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="https://raw.githubusercontent.com/user/repo/main/sql"
+ )
+ ```
+
+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
+ ## License
+
+ MIT
duckrun-0.1.1/README.md ADDED
@@ -0,0 +1,167 @@
duckrun-0.1.1/duckrun.egg-info/PKG-INFO ADDED
@@ -0,0 +1,183 @@
duckrun-0.1.1/duckrun.egg-info/SOURCES.txt CHANGED
@@ -6,4 +6,5 @@ duckrun/core.py
  duckrun.egg-info/PKG-INFO
  duckrun.egg-info/SOURCES.txt
  duckrun.egg-info/dependency_links.txt
+ duckrun.egg-info/requires.txt
  duckrun.egg-info/top_level.txt
duckrun-0.1.1/duckrun.egg-info/requires.txt ADDED
@@ -0,0 +1,3 @@
+ duckdb>=1.2.0
+ deltalake>=0.18.2
+ requests>=2.28.0
duckrun-0.1.1/pyproject.toml ADDED
@@ -0,0 +1,23 @@
+
+ [build-system]
+ requires = ["setuptools>=61.0", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "duckrun"
+ version = "0.1.1"
+ description = "Lakehouse task runner powered by DuckDB for Microsoft Fabric"
+ readme = "README.md"
+ license = "MIT"
+ requires-python = ">=3.9"
+ dependencies = [
+     "duckdb>=1.2.0",
+     "deltalake>=0.18.2",
+     "requests>=2.28.0"
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/djouallah/duckrun"
+ Repository = "https://github.com/djouallah/duckrun"
+ Issues = "https://github.com/djouallah/duckrun/issues"
+
duckrun-0.0.0/LICENSE DELETED
@@ -1 +0,0 @@
- ### **5. `LICENSE`**
duckrun-0.0.0/PKG-INFO DELETED
@@ -1,5 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.0.0
- License-File: LICENSE
- Dynamic: license-file
duckrun-0.0.0/README.md DELETED
@@ -1,39 +0,0 @@
- # 🦆 Duckrun
-
- Lakehouse task runner powered by DuckDB for Microsoft Fabric.
-
- ## Features
-
- - 🦆 **DuckDB-powered**: Fast in-memory processing
- - 📦 **Delta Lake**: Native Delta table support
- - 🔄 **Simple API**: Clean tuple-based pipeline definition
- - 🎯 **Fabric-native**: Built for Microsoft Fabric lakehouses
- - 🐍 **Python + SQL**: Mix Python and SQL tasks seamlessly
-
- ## Installation
- ```bash
- pip install duckrun
-
- from duckrun import Duckrun
-
- # Connect to your lakehouse
- dr = Duckrun.connect(
-     workspace="your_workspace",
-     lakehouse_name="your_lakehouse",
-     schema="dbo",
-     sql_folder="./sql"
- )
-
- # Define pipeline
- pipeline = [
-     ('download', (urls, paths, depth)),
-     ('staging', 'overwrite', {'run_date': '2024-06-01'}),
-     ('transform', 'append'),
-     ('fact_sales', 'append')
- ]
-
- # Run it
- dr.run(pipeline)
-
- # Query directly
- dr.sql("SELECT * FROM staging").show()
duckrun-0.0.0/duckrun.egg-info/PKG-INFO DELETED
@@ -1,5 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.0.0
- License-File: LICENSE
- Dynamic: license-file