duckrun 0.1.5.1__tar.gz → 0.1.5.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,303 @@
1
+ Metadata-Version: 2.4
2
+ Name: duckrun
3
+ Version: 0.1.5.3
4
+ Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
5
+ Author: mim
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/djouallah/duckrun
8
+ Project-URL: Repository, https://github.com/djouallah/duckrun
9
+ Project-URL: Issues, https://github.com/djouallah/duckrun/issues
10
+ Requires-Python: >=3.9
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: duckdb>=1.2.0
14
+ Requires-Dist: deltalake>=0.18.2
15
+ Requires-Dist: requests>=2.28.0
16
+ Dynamic: license-file
17
+
18
+ <img src="duckrun.png" width="400" alt="Duckrun">
19
+
20
+ Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta Lake.
21
+
22
+ ## Important Notes
23
+
24
+ **Requirements:**
25
+ - Lakehouse must have a schema (e.g., `dbo`, `sales`, `analytics`)
26
+ - Workspace and lakehouse names cannot contain spaces
27
+
28
+ **Why no spaces?** Duckrun builds simple name-based paths instead of resolving GUIDs, which keeps the code clean and readable and suits data engineering workspaces where naming conventions are already established. Use underscores or hyphens instead: `my_workspace` or `my-lakehouse`.
29
+
30
+ ## Installation
31
+
32
+ ```bash
33
+ pip install duckrun
34
+ ```
35
+
36
+ ## Quick Start
37
+
38
+ ```python
39
+ import duckrun
40
+
41
+ # Connect to your Fabric lakehouse
42
+ con = duckrun.connect(
43
+ workspace="my_workspace",
44
+ lakehouse_name="my_lakehouse",
45
+ schema="dbo"
46
+ )
47
+
48
+ # Explore data
49
+ con.sql("SELECT * FROM my_table LIMIT 10").show()
50
+
51
+ # Write to Delta tables (Spark-style API)
52
+ con.sql("SELECT * FROM source").write.mode("overwrite").saveAsTable("target")
53
+ ```
54
+
55
+ That's it! No `sql_folder` needed for data exploration.
56
+
57
+ ## Two Ways to Use Duckrun
58
+
59
+ ### 1. Data Exploration (Spark-Style API)
60
+
61
+ Perfect for ad-hoc analysis and interactive notebooks:
62
+
63
+ ```python
64
+ con = duckrun.connect("workspace", "lakehouse", "dbo")
65
+
66
+ # Query existing tables
67
+ con.sql("SELECT * FROM sales WHERE year = 2024").show()
68
+
69
+ # Get DataFrame
70
+ df = con.sql("SELECT COUNT(*) FROM orders").df()
71
+
72
+ # Write results to Delta tables
73
+ con.sql("""
74
+ SELECT
75
+ customer_id,
76
+ SUM(amount) as total
77
+ FROM orders
78
+ GROUP BY customer_id
79
+ """).write.mode("overwrite").saveAsTable("customer_totals")
80
+
81
+ # Append mode
82
+ con.sql("SELECT * FROM new_orders").write.mode("append").saveAsTable("orders")
83
+ ```
84
+
85
+ **Note:** `.format("delta")` is optional - Delta is the default format!
86
+
87
+ ### 2. Pipeline Orchestration
88
+
89
+ For production workflows with reusable SQL and Python tasks:
90
+
91
+ ```python
92
+ con = duckrun.connect(
93
+ workspace="my_workspace",
94
+ lakehouse_name="my_lakehouse",
95
+ schema="dbo",
96
+ sql_folder="./sql" # folder with .sql and .py files
97
+ )
98
+
99
+ # Define pipeline
100
+ pipeline = [
101
+ ('download_data', (url, path)), # Python task
102
+ ('clean_data', 'overwrite'), # SQL task
103
+ ('aggregate', 'append') # SQL task
104
+ ]
105
+
106
+ # Run it
107
+ con.run(pipeline)
108
+ ```
109
+
110
+ ## Pipeline Tasks
111
+
112
+ ### Python Tasks
113
+
114
+ **Format:** `('function_name', (arg1, arg2, ...))`
115
+
116
+ Create `sql_folder/function_name.py`:
117
+
118
+ ```python
119
+ # sql_folder/download_data.py
120
+ def download_data(url, path):
121
+ # your code here
122
+ return 1 # 1 = success, 0 = failure
123
+ ```
124
+
125
+ ### SQL Tasks
126
+
127
+ **Format:** `('table_name', 'mode')` or `('table_name', 'mode', {params})`
128
+
129
+ Create `sql_folder/table_name.sql`:
130
+
131
+ ```sql
132
+ -- sql_folder/clean_data.sql
133
+ SELECT
134
+ id,
135
+ TRIM(name) as name,
136
+ date
137
+ FROM raw_data
138
+ WHERE date >= '2024-01-01'
139
+ ```
140
+
141
+ **Write Modes:**
142
+ - `overwrite` - Replace table completely
143
+ - `append` - Add to existing table
144
+ - `ignore` - Create the table only if it doesn't already exist (see the example below)
145
+
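+ As an illustration, a hypothetical pipeline might combine all three modes (the table names below are placeholders, not part of duckrun):
+ 
+ ```python
+ pipeline = [
+     ('dim_calendar', 'ignore'),     # seed table: created once, never rewritten
+     ('clean_data', 'overwrite'),    # rebuilt from scratch on every run
+     ('sales_history', 'append')     # accumulates new rows each run
+ ]
+ 
+ con.run(pipeline)
+ ```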
146
+ ### Parameterized SQL
147
+
148
+ Built-in parameters (always available):
149
+ - `$ws` - workspace name
150
+ - `$lh` - lakehouse name
151
+ - `$schema` - schema name
152
+
153
+ Custom parameters:
154
+
155
+ ```python
156
+ pipeline = [
157
+ ('sales', 'append', {'start_date': '2024-01-01', 'end_date': '2024-12-31'})
158
+ ]
159
+ ```
160
+
161
+ ```sql
162
+ -- sql_folder/sales.sql
163
+ SELECT * FROM transactions
164
+ WHERE date BETWEEN '$start_date' AND '$end_date'
165
+ ```
166
+
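+ Parameter substitution is plain `$name` templating over the SQL text. As a rough sketch of the idea (not duckrun's actual code), using Python's `string.Template`:
+ 
+ ```python
+ from string import Template
+ 
+ # Hypothetical illustration of how a $-parameterized SQL task gets rendered
+ sql_text = "SELECT * FROM transactions WHERE date BETWEEN '$start_date' AND '$end_date'"
+ 
+ params = {
+     'ws': 'Analytics', 'lh': 'Sales', 'schema': 'dbo',      # built-ins
+     'start_date': '2024-01-01', 'end_date': '2024-12-31',   # task parameters
+ }
+ 
+ print(Template(sql_text).substitute(params))
+ ```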
167
+ ## Advanced Features
168
+
169
+ ### Table Name Variants
170
+
171
+ Use a `__` suffix to define multiple tasks that write to the same table:
172
+
173
+ ```python
174
+ pipeline = [
175
+ ('sales__initial', 'overwrite'), # writes to 'sales'
176
+ ('sales__incremental', 'append'), # appends to 'sales'
177
+ ]
178
+ ```
179
+
180
+ Both tasks write to the `sales` table but use different SQL files (`sales__initial.sql` and `sales__incremental.sql`).
181
+
182
+ ### Remote SQL Files
183
+
184
+ Load tasks from GitHub or any URL:
185
+
186
+ ```python
187
+ con = duckrun.connect(
188
+ workspace="Analytics",
189
+ lakehouse_name="Sales",
190
+ schema="dbo",
191
+ sql_folder="https://raw.githubusercontent.com/user/repo/main/sql"
192
+ )
193
+ ```
194
+
195
+ ### Early Exit on Failure
196
+
197
+ **Pipelines automatically stop when any task fails** - subsequent tasks won't run.
198
+
199
+ For **SQL tasks**, failure is automatic:
200
+ - If the query has a syntax error or runtime error, the task fails
201
+ - The pipeline stops immediately
202
+
203
+ For **Python tasks**, you control success/failure by returning:
204
+ - `1` = Success → pipeline continues to next task
205
+ - `0` = Failure → pipeline stops, remaining tasks are skipped
206
+
207
+ Example:
208
+
209
+ ```python
210
+ # sql_folder/download_data.py
+ import requests
+ 
211
+ def download_data(url, path):
212
+ try:
213
+ response = requests.get(url)
214
+ response.raise_for_status()
215
+ # save data...
216
+ return 1 # Success - pipeline continues
217
+ except Exception as e:
218
+ print(f"Download failed: {e}")
219
+ return 0 # Failure - pipeline stops here
220
+ ```
221
+
222
+ ```python
223
+ pipeline = [
224
+ ('download_data', (url, path)), # If returns 0, stops here
225
+ ('clean_data', 'overwrite'), # Won't run if download failed
226
+ ('aggregate', 'append') # Won't run if download failed
227
+ ]
228
+
229
+ success = con.run(pipeline) # Returns True only if ALL tasks succeed
230
+ ```
231
+
232
+ This prevents downstream tasks from processing incomplete or corrupted data.
233
+
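+ In a scheduled notebook you will usually want that failure to surface rather than pass silently; a minimal follow-up (raising is just one option, not a duckrun API):
+ 
+ ```python
+ success = con.run(pipeline)
+ if not success:
+     raise RuntimeError("duckrun pipeline failed; see task output above")
+ ```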
234
+ ### Delta Lake Optimization
235
+
236
+ Duckrun performs the following maintenance automatically (a rough sketch follows this list):
237
+ - Compacts small files when file count exceeds threshold (default: 100)
238
+ - Vacuums old versions on overwrite
239
+ - Cleans up metadata
240
+
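+ The exact behavior lives inside duckrun's writer, but roughly the same maintenance can be expressed with the `deltalake` package duckrun depends on. This is only an illustration; the table URI and threshold are placeholders:
+ 
+ ```python
+ from deltalake import DeltaTable
+ 
+ table_uri = "abfss://.../Tables/dbo/sales"  # placeholder OneLake path
+ dt = DeltaTable(table_uri)
+ 
+ if len(dt.files()) > 100:          # compaction threshold
+     dt.optimize.compact()          # rewrite many small files into fewer large ones
+     dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
+     dt.cleanup_metadata()          # drop expired Delta log entries
+ ```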
241
+ Customize compaction threshold:
242
+
243
+ ```python
244
+ con = duckrun.connect(
245
+ workspace="workspace",
246
+ lakehouse_name="lakehouse",
247
+ schema="dbo",
248
+ compaction_threshold=50 # compact after 50 files
249
+ )
250
+ ```
251
+
252
+ ## Complete Example
253
+
254
+ ```python
255
+ import duckrun
256
+
257
+ # Connect
258
+ con = duckrun.connect("Analytics", "Sales", "dbo", "./sql")
259
+
260
+ # Pipeline with mixed tasks
261
+ pipeline = [
262
+ # Download raw data (Python)
263
+ ('fetch_api_data', ('https://api.example.com/sales', 'raw')),
264
+
265
+ # Clean and transform (SQL)
266
+ ('clean_sales', 'overwrite'),
267
+
268
+ # Aggregate by region (SQL with params)
269
+ ('regional_summary', 'overwrite', {'min_amount': 1000}),
270
+
271
+ # Append to history (SQL)
272
+ ('sales_history', 'append')
273
+ ]
274
+
275
+ # Run
276
+ success = con.run(pipeline)
277
+
278
+ # Explore results
279
+ con.sql("SELECT * FROM regional_summary").show()
280
+
281
+ # Export to new table
282
+ con.sql("""
283
+ SELECT region, SUM(total) as grand_total
284
+ FROM regional_summary
285
+ GROUP BY region
286
+ """).write.mode("overwrite").saveAsTable("region_totals")
287
+ ```
288
+
289
+ ## How It Works
290
+
291
+ 1. **Connection**: Duckrun connects to your Fabric lakehouse using OneLake and Azure authentication
292
+ 2. **Table Discovery**: Automatically scans for Delta tables in your schema and creates DuckDB views over them (see the sketch after this list)
293
+ 3. **Query Execution**: Run SQL queries directly against Delta tables using DuckDB's speed
294
+ 4. **Write Operations**: Results are written back as Delta tables with automatic optimization
295
+ 5. **Pipelines**: Orchestrate complex workflows with reusable SQL and Python tasks
296
+
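+ A minimal sketch of steps 2 and 3, assuming DuckDB's `delta` extension and an already-configured Azure credential (this is not duckrun's actual code, and the OneLake URL shape is an assumption):
+ 
+ ```python
+ import duckdb
+ 
+ con = duckdb.connect()
+ con.sql("INSTALL delta;")
+ con.sql("LOAD delta;")
+ 
+ # Table names would come from listing <lakehouse>/Tables/<schema> in OneLake
+ tables = ["sales", "orders"]
+ base = "abfss://my_workspace@onelake.dfs.fabric.microsoft.com/my_lakehouse.Lakehouse/Tables/dbo"
+ 
+ for t in tables:
+     # Each Delta table is exposed as a DuckDB view, so plain SQL works against it
+     con.sql(f"CREATE OR REPLACE VIEW {t} AS SELECT * FROM delta_scan('{base}/{t}')")
+ 
+ con.sql("SELECT COUNT(*) FROM sales").show()
+ ```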
297
+ ## Real-World Example
298
+
299
+ For a complete production example, see [fabric_demo](https://github.com/djouallah/fabric_demo).
300
+
301
+ ## License
302
+
303
+ MIT
@@ -111,16 +111,17 @@ class Duckrun:
111
111
 
112
112
  Usage:
113
113
  # For pipelines:
114
- dr = Duckrun.connect(workspace, lakehouse, schema, sql_folder)
114
+ dr = Duckrun.connect("workspace/lakehouse.lakehouse/schema", sql_folder="./sql")
115
+ dr = Duckrun.connect("workspace/lakehouse.lakehouse") # defaults to dbo schema
115
116
  dr.run(pipeline)
116
117
 
117
118
  # For data exploration with Spark-style API:
118
- dr = Duckrun.connect(workspace, lakehouse, schema)
119
+ dr = Duckrun.connect("workspace/lakehouse.lakehouse")
119
120
  dr.sql("SELECT * FROM table").show()
120
121
  dr.sql("SELECT 43").write.mode("append").saveAsTable("test")
121
122
  """
122
123
 
123
- def __init__(self, workspace: str, lakehouse_name: str, schema: str,
124
+ def __init__(self, workspace: str, lakehouse_name: str, schema: str = "dbo",
124
125
  sql_folder: Optional[str] = None, compaction_threshold: int = 10):
125
126
  self.workspace = workspace
126
127
  self.lakehouse_name = lakehouse_name
@@ -133,10 +134,57 @@ class Duckrun:
133
134
  self._attach_lakehouse()
134
135
 
135
136
  @classmethod
136
- def connect(cls, workspace: str, lakehouse_name: str, schema: str,
137
- sql_folder: Optional[str] = None, compaction_threshold: int = 100):
138
- """Create and connect to lakehouse"""
137
+ def connect(cls, workspace: Union[str, None] = None, lakehouse_name: Optional[str] = None,
138
+ schema: str = "dbo", sql_folder: Optional[str] = None,
139
+ compaction_threshold: int = 100):
140
+ """
141
+ Create and connect to lakehouse.
142
+
143
+ Supports two formats:
144
+ 1. Compact: connect("ws/lh.lakehouse/schema") or connect("ws/lh.lakehouse")
145
+ 2. Traditional: connect("ws", "lh", "schema") or connect("ws", "lh")
146
+
147
+ Schema defaults to "dbo" if not specified.
148
+
149
+ Examples:
150
+ dr = Duckrun.connect("myworkspace/mylakehouse.lakehouse/bronze")
151
+ dr = Duckrun.connect("myworkspace/mylakehouse.lakehouse") # uses dbo
152
+ dr = Duckrun.connect("myworkspace", "mylakehouse", "bronze")
153
+ dr = Duckrun.connect("myworkspace", "mylakehouse") # uses dbo
154
+ dr = Duckrun.connect("ws/lh.lakehouse", sql_folder="./sql")
155
+ """
139
156
  print("Connecting to Lakehouse...")
157
+
158
+ # Check if using compact format: "ws/lh.lakehouse/schema" or "ws/lh.lakehouse"
159
+ if workspace and "/" in workspace and lakehouse_name is None:
160
+ parts = workspace.split("/")
161
+ if len(parts) == 2:
162
+ # Format: "ws/lh.lakehouse" (schema will use default)
163
+ workspace, lakehouse_name = parts
164
+ # schema already has default value "dbo"
165
+ elif len(parts) == 3:
166
+ # Format: "ws/lh.lakehouse/schema"
167
+ workspace, lakehouse_name, schema = parts
168
+ else:
169
+ raise ValueError(
170
+ f"Invalid connection string format: '{workspace}'. "
171
+ "Expected format: 'workspace/lakehouse.lakehouse' or 'workspace/lakehouse.lakehouse/schema'"
172
+ )
173
+
174
+ # Remove .lakehouse suffix if present
175
+ if lakehouse_name.endswith(".lakehouse"):
176
+ lakehouse_name = lakehouse_name[:-10]
177
+
178
+ # Validate all required parameters are present
179
+ if not workspace or not lakehouse_name:
180
+ raise ValueError(
181
+ "Missing required parameters. Use either:\n"
182
+ " connect('workspace/lakehouse.lakehouse/schema')\n"
183
+ " connect('workspace/lakehouse.lakehouse') # defaults to dbo\n"
184
+ " connect('workspace', 'lakehouse', 'schema')\n"
185
+ " connect('workspace', 'lakehouse') # defaults to dbo"
186
+ )
187
+
140
188
  return cls(workspace, lakehouse_name, schema, sql_folder, compaction_threshold)
141
189
 
142
190
  def _get_storage_token(self):