duckrun 0.1.0__tar.gz → 0.1.2__tar.gz

This diff shows the content of publicly available package versions as they appear in their respective public registries, and is provided for informational purposes only.
duckrun-0.1.2/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Mimoune
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
duckrun-0.1.2/PKG-INFO ADDED
@@ -0,0 +1,168 @@
+ Metadata-Version: 2.4
+ Name: duckrun
+ Version: 0.1.2
+ Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/djouallah/duckrun
+ Project-URL: Repository, https://github.com/djouallah/duckrun
+ Project-URL: Issues, https://github.com/djouallah/duckrun/issues
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: duckdb>=1.2.0
+ Requires-Dist: deltalake>=0.18.2
+ Requires-Dist: requests>=2.28.0
+ Dynamic: license-file
+
+
+ <img src="duckrun.png" width="400" alt="Duckrun">
+
+ Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta_rs.
+
+
+ ## Known Limitations
+
+ Supports only schema-enabled Lakehouses; workspace and lakehouse names must not contain spaces.
+
+ ## Installation
+
+ ```bash
+ pip install duckrun
+ ```
+
+
+
+ ## Quick Start
+
+ ```python
+ import duckrun as dr
+
+ # Connect to your Fabric lakehouse
+ lakehouse = dr.connect(
+     workspace="my_workspace",
+     lakehouse_name="my_lakehouse",
+     schema="dbo",
+     sql_folder="./sql"  # folder containing your .sql and .py files
+ )
+
+ # Define your pipeline
+ pipeline = [
+     ('load_data', (url, path)),   # Python task
+     ('clean_data', 'overwrite'),  # SQL task
+     ('aggregate', 'append')       # SQL task
+ ]
+
+ # Run it
+ lakehouse.run(pipeline)
+ ```
+
+ ## Early Exit
+
+ If a task fails, the pipeline stops without running the subsequent tasks.
+
+ ## How It Works
+
+ Duckrun runs two types of tasks:
+
+ ### 1. Python Tasks
+ Format: `('function_name', (arg1, arg2, ...))`
+
+ Create a file `sql_folder/function_name.py` with a function matching the name:
+
+ ```python
+ # sql_folder/load_data.py
+ def load_data(url, path):
+     # your code here
+     # IMPORTANT: must return 1 for success, 0 for failure
+     return 1
+ ```
+
+ ### 2. SQL Tasks
+ Format: `('table_name', 'mode')` or `('table_name', 'mode', {params})`
+
+ Create a file `sql_folder/table_name.sql`:
+
+ ```sql
+ -- sql_folder/clean_data.sql
+ SELECT
+     id,
+     TRIM(name) AS name,
+     date
+ FROM raw_data
+ WHERE date >= '2024-01-01'
+ ```
+
+ **Modes:**
+ - `overwrite` - Replace the table completely
+ - `append` - Add to the existing table
+ - `ignore` - Create the table only if it doesn't exist
+
+ ## Task Files
+
+ The `sql_folder` can contain a mixture of `.sql` and `.py` files, so you can combine SQL transformations and Python logic in your pipelines.
+
+ ### SQL Files
+ Your SQL files automatically have access to:
+ - `$ws` - workspace name
+ - `$lh` - lakehouse name
+ - `$schema` - schema name
+
+ Pass custom parameters:
+
+ ```python
+ pipeline = [
+     ('sales', 'append', {'start_date': '2024-01-01', 'end_date': '2024-12-31'})
+ ]
+ ```
+
+ ```sql
+ -- sql_folder/sales.sql
+ SELECT * FROM transactions
+ WHERE date BETWEEN '$start_date' AND '$end_date'
+ ```
+
+ ## Table Name Convention
+
+ Use `__` to create variants of the same table:
+
+ ```python
+ pipeline = [
+     ('sales__initial', 'overwrite'),    # writes to the 'sales' table
+     ('sales__incremental', 'append'),   # appends to the 'sales' table
+ ]
+ ```
+
+ Both write to the same `sales` table but use different SQL files.
+
+ ## Query Data
+
+ ```python
+ # Run queries
+ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
+
+ # Get as DataFrame
+ df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
+ ```
+
+
+
+ ## Remote SQL Files
+
+ You can load SQL/Python files from a URL:
+
+ ```python
+ lakehouse = dr.connect(
+     workspace="Analytics",
+     lakehouse_name="Sales",
+     schema="dbo",
+     sql_folder="https://raw.githubusercontent.com/user/repo/main/sql"
+ )
+ ```
+
+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
+ ## License
+
+ MIT
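The "Early Exit" section above is the package's key control-flow contract: Python tasks return 1 or 0, and the first failure stops the run. Below is a minimal sketch of that contract; `run_task` is a hypothetical stand-in for duckrun's internal dispatcher, not its real API.

```python
# Minimal sketch of the early-exit contract; not duckrun's actual code.
def run_pipeline(tasks, run_task):
    """Run tasks in order; stop at the first failure (return value 0)."""
    for i, task in enumerate(tasks):
        if run_task(task) == 0:  # Python tasks signal failure by returning 0
            remaining = len(tasks) - i - 1
            print(f"Task {task[0]!r} failed; skipping {remaining} remaining task(s)")
            return False
    return True
```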
duckrun-0.1.2/README.md CHANGED
@@ -1,6 +1,12 @@
- # 🦆 Duckrun
 
- Simple lakehouse task runner for Microsoft Fabric, powered by DuckDB.
+ <img src="duckrun.png" width="400" alt="Duckrun">
+
+ Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta_rs.
+
+
+ ## Known Limitations
+
+ Supports only schema-enabled Lakehouses; workspace and lakehouse names must not contain spaces.
 
  ## Installation
 
@@ -8,10 +14,7 @@ Simple lakehouse task runner for Microsoft Fabric, powered by DuckDB.
  pip install duckrun
  ```
 
- For local development (enables Azure CLI authentication):
- ```bash
- pip install duckrun[local]
- ```
+
 
  ## Quick Start
 
@@ -37,6 +40,10 @@ pipeline = [
  lakehouse.run(pipeline)
  ```
 
+ ## Early Exit
+
+ If a task fails, the pipeline stops without running the subsequent tasks.
+
  ## How It Works
 
  Duckrun runs two types of tasks:
@@ -50,7 +57,8 @@ Create a file `sql_folder/function_name.py` with a function matching the name:
  # sql_folder/load_data.py
  def load_data(url, path):
      # your code here
-     return result
+     # IMPORTANT: must return 1 for success, 0 for failure
+     return 1
  ```
 
  ### 2. SQL Tasks
@@ -73,8 +81,11 @@ WHERE date >= '2024-01-01'
  - `append` - Add to the existing table
  - `ignore` - Create the table only if it doesn't exist
 
- ## SQL Parameters
+ ## Task Files
+
+ The `sql_folder` can contain a mixture of `.sql` and `.py` files, so you can combine SQL transformations and Python logic in your pipelines.
 
+ ### SQL Files
  Your SQL files automatically have access to:
  - `$ws` - workspace name
  - `$lh` - lakehouse name
@@ -100,8 +111,8 @@ Use `__` to create variants of the same table:
 
  ```python
  pipeline = [
-     ('sales__initial', 'overwrite', {}),   # writes to the 'sales' table
-     ('sales__incremental', 'append', {}),  # appends to the 'sales' table
+     ('sales__initial', 'overwrite'),    # writes to the 'sales' table
+     ('sales__incremental', 'append'),   # appends to the 'sales' table
  ]
  ```
 
@@ -117,32 +128,7 @@ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
  df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
  ```
 
- ## Real-World Example
 
- ```python
- import duckrun as dr
-
- lakehouse = dr.connect(
-     workspace="Analytics",
-     lakehouse_name="Sales",
-     schema="dbo",
-     sql_folder="./etl"
- )
-
- # Daily pipeline
- daily = [
-     ('download_files', (api_url, local_path)),
-     ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
-     ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
-     ('fact_sales', 'append'),
-     ('dim_customer', 'overwrite')
- ]
-
- lakehouse.run(daily)
-
- # Check results
- lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
- ```
 
  ## Remote SQL Files
 
@@ -157,6 +143,10 @@ lakehouse = dr.connect(
  )
  ```
 
+ ## Real-Life Usage
+
+ For a complete, production-style example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
+
  ## License
 
  MIT
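The `$ws` and `$start_date` placeholders in the README use the same syntax as Python's built-in `string.Template`, which makes the substitution step easy to sketch. Whether duckrun uses `string.Template` internally is an assumption; this example only reproduces the documented placeholder behavior.

```python
# Hedged sketch of the $-placeholder substitution the README documents.
from string import Template

sql = Template("""
SELECT * FROM transactions
WHERE date BETWEEN '$start_date' AND '$end_date'
""")

params = {'start_date': '2024-01-01', 'end_date': '2024-12-31'}
print(sql.substitute(params))  # the fully-resolved SQL that would run in DuckDB
```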
duckrun-0.1.2/duckrun/core.py CHANGED
@@ -64,9 +64,14 @@ class Duckrun:
      def _attach_lakehouse(self):
          self._create_onelake_secret()
          try:
+             # Exclude Iceberg metadata folders when scanning for Delta tables
              list_tables_query = f"""
                  SELECT DISTINCT(split_part(file, '_delta_log', 1)) as tables
                  FROM glob ("abfss://{self.workspace}@onelake.dfs.fabric.microsoft.com/{self.lakehouse_name}.Lakehouse/Tables/*/*/_delta_log/*.json")
+                 WHERE file NOT LIKE '%/metadata/%'
+                   AND file NOT LIKE '%/iceberg/%'
+                   AND split_part(file, '_delta_log', 1) NOT LIKE '%/metadata'
+                   AND split_part(file, '_delta_log', 1) NOT LIKE '%/iceberg'
              """
              list_tables_df = self.con.sql(list_tables_query).df()
              list_tables = list_tables_df['tables'].tolist() if not list_tables_df.empty else []
@@ -82,18 +87,27 @@ class Duckrun:
              if len(parts) >= 2:
                  potential_schema = parts[-2]
                  table = parts[-1]
+
+                 # Skip Iceberg-related folders
+                 if table in ('metadata', 'iceberg') or potential_schema in ('metadata', 'iceberg'):
+                     continue
+
                  if potential_schema == self.schema:
                      try:
                          self.con.sql(f"""
                              CREATE OR REPLACE VIEW {table}
                              AS SELECT * FROM delta_scan('{self.table_base_url}{self.schema}/{table}');
                          """)
+                         print(f"  ✓ Attached: {table}")
                      except Exception as e:
-                         print(f"Error creating view for table {table}: {e}")
+                         print(f"  Skipped {table}: {str(e)[:100]}")
+                         continue
+
              print("\nAttached tables (views) in DuckDB:")
              self.con.sql("SELECT name FROM (SHOW ALL TABLES) WHERE database='memory'").show()
          except Exception as e:
              print(f"Error attaching lakehouse: {e}")
+             print("Continuing without pre-attached tables.")
 
      def _normalize_table_name(self, name: str) -> str:
          """Extract base table name before first '__'"""
duckrun-0.1.2/duckrun.egg-info/PKG-INFO ADDED
(168 lines, identical to duckrun-0.1.2/PKG-INFO shown above)
duckrun-0.1.2/duckrun.egg-info/SOURCES.txt CHANGED
@@ -6,4 +6,5 @@ duckrun/core.py
  duckrun.egg-info/PKG-INFO
  duckrun.egg-info/SOURCES.txt
  duckrun.egg-info/dependency_links.txt
+ duckrun.egg-info/requires.txt
  duckrun.egg-info/top_level.txt
duckrun-0.1.2/duckrun.egg-info/requires.txt ADDED
@@ -0,0 +1,3 @@
+ duckdb>=1.2.0
+ deltalake>=0.18.2
+ requests>=2.28.0
duckrun-0.1.2/pyproject.toml CHANGED
@@ -5,10 +5,16 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "duckrun"
- version = "0.1.0"
+ version = "0.1.2"
  description = "Lakehouse task runner powered by DuckDB for Microsoft Fabric"
+ readme = "README.md"
  license = "MIT"
  requires-python = ">=3.9"
+ dependencies = [
+     "duckdb>=1.2.0",
+     "deltalake>=0.18.2",
+     "requests>=2.28.0"
+ ]
 
  [project.urls]
  Homepage = "https://github.com/djouallah/duckrun"
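The new `dependencies` list makes deltalake a hard requirement, which lines up with the README's write modes: `overwrite`, `append`, and `ignore` are exactly the modes exposed by deltalake's `write_deltalake`. Below is a sketch of that mapping, assuming duckrun delegates writes to `write_deltalake` (the diff itself does not confirm this).

```python
# Sketch: README write modes mapped onto deltalake's writer (assumed delegation).
import duckdb
from deltalake import write_deltalake

con = duckdb.connect()
df = con.sql("SELECT 1 AS id, 'x' AS name").df()  # stand-in for a SQL task's result

write_deltalake("/tmp/tables/sales", df, mode="overwrite")  # replace table completely
write_deltalake("/tmp/tables/sales", df, mode="append")     # add to existing table
write_deltalake("/tmp/tables/sales", df, mode="ignore")     # create only if missing
```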
duckrun-0.1.0/LICENSE DELETED
@@ -1 +0,0 @@
- ### **5. `LICENSE`**
duckrun-0.1.0/PKG-INFO DELETED
@@ -1,11 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.1.0
- Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
- License-Expression: MIT
- Project-URL: Homepage, https://github.com/djouallah/duckrun
- Project-URL: Repository, https://github.com/djouallah/duckrun
- Project-URL: Issues, https://github.com/djouallah/duckrun/issues
- Requires-Python: >=3.9
- License-File: LICENSE
- Dynamic: license-file
duckrun-0.1.0/duckrun.egg-info/PKG-INFO DELETED
@@ -1,11 +0,0 @@
- Metadata-Version: 2.4
- Name: duckrun
- Version: 0.1.0
- Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
- License-Expression: MIT
- Project-URL: Homepage, https://github.com/djouallah/duckrun
- Project-URL: Repository, https://github.com/djouallah/duckrun
- Project-URL: Issues, https://github.com/djouallah/duckrun/issues
- Requires-Python: >=3.9
- License-File: LICENSE
- Dynamic: license-file
2 files without changes