duckrun 0.1.0__tar.gz → 0.1.1__tar.gz

duckrun-0.1.1/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Mimoune
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
duckrun-0.1.1/PKG-INFO ADDED
@@ -0,0 +1,183 @@
+ Metadata-Version: 2.4
+ Name: duckrun
+ Version: 0.1.1
+ Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/djouallah/duckrun
+ Project-URL: Repository, https://github.com/djouallah/duckrun
+ Project-URL: Issues, https://github.com/djouallah/duckrun/issues
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: duckdb>=1.2.0
+ Requires-Dist: deltalake>=0.18.2
+ Requires-Dist: requests>=2.28.0
+ Dynamic: license-file
+
+ # 🦆 Duckrun
+
+ Simple task runner for Microsoft Fabric Python notebook, powered by DuckDB and Delta_rs.
+
+ ## Installation
+
+ ```bash
+ pip install duckrun
+ ```
+
+
+
+ ## Quick Start
+
+ ```python
+ import duckrun as dr
+
+ # Connect to your Fabric lakehouse
+ lakehouse = dr.connect(
+     workspace="my_workspace",
+     lakehouse_name="my_lakehouse",
+     schema="dbo",
+     sql_folder="./sql" # folder containing your .sql and .py files
+ )
+
+ # Define your pipeline
+ pipeline = [
+     ('load_data', (url, path)), # Python task
+     ('clean_data', 'overwrite'), # SQL task
+     ('aggregate', 'append') # SQL task
+ ]
+
+ # Run it
+ lakehouse.run(pipeline)
+ ```
+
+ ## How It Works
+
+ Duckrun runs two types of tasks:
+
+ ### 1. Python Tasks
+ Format: `('function_name', (arg1, arg2, ...))`
+
+ Create a file `sql_folder/function_name.py` with a function matching the name:
+
+ ```python
+ # sql_folder/load_data.py
+ def load_data(url, path):
+     # your code here
+     # IMPORTANT: Must return 1 for success, 0 for failure
+     return 1
+ ```
+
+ ### 2. SQL Tasks
+ Format: `('table_name', 'mode')` or `('table_name', 'mode', {params})`
+
+ Create a file `sql_folder/table_name.sql`:
+
+ ```sql
+ -- sql_folder/clean_data.sql
+ SELECT
+     id,
+     TRIM(name) as name,
+     date
+ FROM raw_data
+ WHERE date >= '2024-01-01'
+ ```
+
+ **Modes:**
+ - `overwrite` - Replace table completely
+ - `append` - Add to existing table
+ - `ignore` - Create only if doesn't exist
+
+ ## Task Files
+
+ The `sql_folder` can contain a mixture of both `.sql` and `.py` files. This allows you to combine SQL transformations and Python logic in your pipelines.
+
+ ### SQL Files
+ Your SQL files automatically have access to:
+ - `$ws` - workspace name
+ - `$lh` - lakehouse name
+ - `$schema` - schema name
+
+ Pass custom parameters:
+
+ ```python
+ pipeline = [
+     ('sales', 'append', {'start_date': '2024-01-01', 'end_date': '2024-12-31'})
+ ]
+ ```
+
+ ```sql
+ -- sql_folder/sales.sql
+ SELECT * FROM transactions
+ WHERE date BETWEEN '$start_date' AND '$end_date'
+ ```
+
+ ## Table Name Convention
+
+ Use `__` to create variants of the same table:
+
+ ```python
+ pipeline = [
+     ('sales__initial', 'overwrite', {}), # writes to 'sales' table
+     ('sales__incremental', 'append', {}), # appends to 'sales' table
+ ]
+ ```
+
+ Both write to the same `sales` table, but use different SQL files.
+
+ ## Query Data
+
+ ```python
+ # Run queries
+ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
+
+ # Get as DataFrame
+ df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
+ ```
+
+ ## Real-World Example
+
+ ```python
+ import duckrun as dr
+
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="./etl"
+ )
+
+ # Daily pipeline
+ daily = [
+     ('download_files', (api_url, local_path)),
+     ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
+     ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
+     ('fact_sales', 'append'),
+     ('dim_customer', 'overwrite')
+ ]
+
+ lakehouse.run(daily)
+
+ # Check results
+ lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
+ ```
+
+ ## Remote SQL Files
+
+ You can load SQL/Python files from a URL:
+
+ ```python
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="https://raw.githubusercontent.com/user/repo/main/sql"
+ )
+ ```
+
+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
+ ## License
+
+ MIT
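Note on the SQL parameters described in the README above: SQL files can reference `$ws`, `$lh`, `$schema`, and custom parameters such as `$start_date`. This diff does not show how duckrun performs the substitution, so the following is only a sketch of the idea using Python's standard `string.Template`. The helper name `render_sql` and its signature are hypothetical and not part of duckrun.

```python
from string import Template


def render_sql(sql_text, workspace, lakehouse_name, schema, params=None):
    """Hypothetical helper: fill $-placeholders in a SQL string.

    Assumption: built-in names (ws, lh, schema) are merged with the
    custom params dict from the pipeline tuple. duckrun's real logic
    may differ.
    """
    values = {"ws": workspace, "lh": lakehouse_name, "schema": schema}
    values.update(params or {})
    # substitute() raises KeyError if the SQL uses a placeholder
    # that has no value, which surfaces typos early.
    return Template(sql_text).substitute(values)


sql = (
    "SELECT * FROM transactions\n"
    "WHERE date BETWEEN '$start_date' AND '$end_date'"
)
rendered = render_sql(
    sql,
    workspace="Analytics",
    lakehouse_name="Sales",
    schema="dbo",
    params={"start_date": "2024-01-01", "end_date": "2024-12-31"},
)
print(rendered)
```

Plain text substitution like this does no quoting or escaping, so parameter values should come from trusted pipeline definitions only.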
@@ -1,6 +1,6 @@
  # 🦆 Duckrun

- Simple lakehouse task runner for Microsoft Fabric, powered by DuckDB.
+ Simple task runner for Microsoft Fabric Python notebook, powered by DuckDB and Delta_rs.

  ## Installation

@@ -8,10 +8,7 @@ Simple lakehouse task runner for Microsoft Fabric, powered by DuckDB.
  pip install duckrun
  ```

- For local development (enables Azure CLI authentication):
- ```bash
- pip install duckrun[local]
- ```
+

  ## Quick Start

@@ -50,7 +47,8 @@ Create a file `sql_folder/function_name.py` with a function matching the name:
  # sql_folder/load_data.py
  def load_data(url, path):
      # your code here
-     return result
+     # IMPORTANT: Must return 1 for success, 0 for failure
+     return 1
  ```

  ### 2. SQL Tasks
@@ -73,8 +71,11 @@ WHERE date >= '2024-01-01'
  - `append` - Add to existing table
  - `ignore` - Create only if doesn't exist

- ## SQL Parameters
+ ## Task Files

+ The `sql_folder` can contain a mixture of both `.sql` and `.py` files. This allows you to combine SQL transformations and Python logic in your pipelines.
+
+ ### SQL Files
  Your SQL files automatically have access to:
  - `$ws` - workspace name
  - `$lh` - lakehouse name
@@ -157,6 +158,10 @@ lakehouse = dr.connect(
  )
  ```

+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
  ## License

  MIT
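Note on the changed return convention in the README hunk above: Python tasks previously returned `result` and are now documented as returning 1 for success and 0 for failure. A minimal sketch of a task that follows this convention is given below. The task body (writing the URL to a marker file) is a placeholder chosen so the example runs without network access; it is not what duckrun or the author's tasks do.

```python
import os
import tempfile
from pathlib import Path


def load_data(url, path):
    """Illustrative task: return 1 on success, 0 on failure."""
    try:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL: {url!r}")
        # Placeholder for real work (download, parse, write files).
        Path(path).write_text(url + "\n", encoding="utf-8")
        return 1
    except (ValueError, OSError) as exc:
        # Report the problem and signal failure with 0 instead of raising.
        print(f"load_data failed: {exc}")
        return 0


marker = os.path.join(tempfile.mkdtemp(), "marker.txt")
ok = load_data("https://example.com/data.csv", marker)
bad = load_data("ftp://example.com/data.csv", marker)
print(ok, bad)
```

Catching the expected exceptions inside the task keeps the 1/0 contract explicit; how duckrun reacts to a 0 return is not shown in this diff.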
@@ -0,0 +1,183 @@
+ Metadata-Version: 2.4
+ Name: duckrun
+ Version: 0.1.1
+ Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/djouallah/duckrun
+ Project-URL: Repository, https://github.com/djouallah/duckrun
+ Project-URL: Issues, https://github.com/djouallah/duckrun/issues
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: duckdb>=1.2.0
+ Requires-Dist: deltalake>=0.18.2
+ Requires-Dist: requests>=2.28.0
+ Dynamic: license-file
+
+ # 🦆 Duckrun
+
+ Simple task runner for Microsoft Fabric Python notebook, powered by DuckDB and Delta_rs.
+
+ ## Installation
+
+ ```bash
+ pip install duckrun
+ ```
+
+
+
+ ## Quick Start
+
+ ```python
+ import duckrun as dr
+
+ # Connect to your Fabric lakehouse
+ lakehouse = dr.connect(
+     workspace="my_workspace",
+     lakehouse_name="my_lakehouse",
+     schema="dbo",
+     sql_folder="./sql" # folder containing your .sql and .py files
+ )
+
+ # Define your pipeline
+ pipeline = [
+     ('load_data', (url, path)), # Python task
+     ('clean_data', 'overwrite'), # SQL task
+     ('aggregate', 'append') # SQL task
+ ]
+
+ # Run it
+ lakehouse.run(pipeline)
+ ```
+
+ ## How It Works
+
+ Duckrun runs two types of tasks:
+
+ ### 1. Python Tasks
+ Format: `('function_name', (arg1, arg2, ...))`
+
+ Create a file `sql_folder/function_name.py` with a function matching the name:
+
+ ```python
+ # sql_folder/load_data.py
+ def load_data(url, path):
+     # your code here
+     # IMPORTANT: Must return 1 for success, 0 for failure
+     return 1
+ ```
+
+ ### 2. SQL Tasks
+ Format: `('table_name', 'mode')` or `('table_name', 'mode', {params})`
+
+ Create a file `sql_folder/table_name.sql`:
+
+ ```sql
+ -- sql_folder/clean_data.sql
+ SELECT
+     id,
+     TRIM(name) as name,
+     date
+ FROM raw_data
+ WHERE date >= '2024-01-01'
+ ```
+
+ **Modes:**
+ - `overwrite` - Replace table completely
+ - `append` - Add to existing table
+ - `ignore` - Create only if doesn't exist
+
+ ## Task Files
+
+ The `sql_folder` can contain a mixture of both `.sql` and `.py` files. This allows you to combine SQL transformations and Python logic in your pipelines.
+
+ ### SQL Files
+ Your SQL files automatically have access to:
+ - `$ws` - workspace name
+ - `$lh` - lakehouse name
+ - `$schema` - schema name
+
+ Pass custom parameters:
+
+ ```python
+ pipeline = [
+     ('sales', 'append', {'start_date': '2024-01-01', 'end_date': '2024-12-31'})
+ ]
+ ```
+
+ ```sql
+ -- sql_folder/sales.sql
+ SELECT * FROM transactions
+ WHERE date BETWEEN '$start_date' AND '$end_date'
+ ```
+
+ ## Table Name Convention
+
+ Use `__` to create variants of the same table:
+
+ ```python
+ pipeline = [
+     ('sales__initial', 'overwrite', {}), # writes to 'sales' table
+     ('sales__incremental', 'append', {}), # appends to 'sales' table
+ ]
+ ```
+
+ Both write to the same `sales` table, but use different SQL files.
+
+ ## Query Data
+
+ ```python
+ # Run queries
+ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
+
+ # Get as DataFrame
+ df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
+ ```
+
+ ## Real-World Example
+
+ ```python
+ import duckrun as dr
+
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="./etl"
+ )
+
+ # Daily pipeline
+ daily = [
+     ('download_files', (api_url, local_path)),
+     ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
+     ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
+     ('fact_sales', 'append'),
+     ('dim_customer', 'overwrite')
+ ]
+
+ lakehouse.run(daily)
+
+ # Check results
+ lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
+ ```
+
+ ## Remote SQL Files
+
+ You can load SQL/Python files from a URL:
+
+ ```python
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="https://raw.githubusercontent.com/user/repo/main/sql"
+ )
+ ```
+
+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
+ ## License
+
+ MIT
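Note on the task formats and the `__` table name convention documented above: the README distinguishes Python tasks (`(name, (args...))`) from SQL tasks (`(name, mode)` or `(name, mode, {params})`), and says `sales__initial` and `sales__incremental` both target the `sales` table. The dispatch code in `duckrun/core.py` is not part of this diff, so the sketch below only illustrates how such tuples could be interpreted. The function `classify_task` and the dictionary keys it returns are hypothetical.

```python
VALID_MODES = {"overwrite", "append", "ignore"}


def classify_task(task):
    """Hypothetical interpreter for one pipeline tuple."""
    name, second = task[0], task[1]
    if isinstance(second, tuple):
        # Python task: second element is the argument tuple.
        return {"kind": "python", "file": f"{name}.py",
                "function": name, "args": second}
    if second not in VALID_MODES:
        raise ValueError(f"unknown mode: {second!r}")
    # SQL task: optional third element holds custom parameters.
    params = task[2] if len(task) > 2 else {}
    # Assumption: the target table is the part before the first '__'.
    table = name.split("__", 1)[0]
    return {"kind": "sql", "file": f"{name}.sql", "table": table,
            "mode": second, "params": params}


pipeline = [
    ("load_data", ("https://example.com/data.csv", "/tmp/data.csv")),
    ("sales__initial", "overwrite", {}),
    ("sales__incremental", "append"),
]
plan = [classify_task(t) for t in pipeline]
for step in plan:
    print(step["kind"], step["file"])
```

Under these assumptions the two `sales__*` entries read different SQL files but resolve to the same `sales` table, which matches the behavior the README describes.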
@@ -6,4 +6,5 @@ duckrun/core.py
  duckrun.egg-info/PKG-INFO
  duckrun.egg-info/SOURCES.txt
  duckrun.egg-info/dependency_links.txt
+ duckrun.egg-info/requires.txt
  duckrun.egg-info/top_level.txt
@@ -0,0 +1,3 @@
+ duckdb>=1.2.0
+ deltalake>=0.18.2
+ requests>=2.28.0
@@ -5,10 +5,16 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "duckrun"
- version = "0.1.0"
+ version = "0.1.1"
  description = "Lakehouse task runner powered by DuckDB for Microsoft Fabric"
+ readme = "README.md"
  license = "MIT"
  requires-python = ">=3.9"
+ dependencies = [
+     "duckdb>=1.2.0",
+     "deltalake>=0.18.2",
+     "requests>=2.28.0"
+ ]

  [project.urls]
  Homepage = "https://github.com/djouallah/duckrun"
duckrun-0.1.0/LICENSE DELETED
@@ -1 +0,0 @@
- ### **5. `LICENSE`**
duckrun-0.1.0/PKG-INFO DELETED
@@ -1,11 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.1.0
- Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
- License-Expression: MIT
- Project-URL: Homepage, https://github.com/djouallah/duckrun
- Project-URL: Repository, https://github.com/djouallah/duckrun
- Project-URL: Issues, https://github.com/djouallah/duckrun/issues
- Requires-Python: >=3.9
- License-File: LICENSE
- Dynamic: license-file
@@ -1,11 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.1.0
- Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
- License-Expression: MIT
- Project-URL: Homepage, https://github.com/djouallah/duckrun
- Project-URL: Repository, https://github.com/djouallah/duckrun
- Project-URL: Issues, https://github.com/djouallah/duckrun/issues
- Requires-Python: >=3.9
- License-File: LICENSE
- Dynamic: license-file
File without changes
File without changes
File without changes