dagster-duckdb-pyspark 0.21.9__py3-none-any.whl → 0.21.11__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

@@ -120,17 +120,40 @@ Examples:
  def my_table() -> pyspark.sql.DataFrame: # the name of the asset will be the table name
  ...
 
- @repository
- def my_repo():
- return with_resources(
- [my_table],
- {"io_manager": duckdb_pyspark_io_manager.configured({"database": "my_db.duckdb"})}
+ defs = Definitions(
+ assets=[my_table],
+ resources={"io_manager": duckdb_pyspark_io_manager.configured({"database": "my_db.duckdb"})}
+ )
+
+ You can set a default schema to store the assets using the ``schema`` configuration value of the DuckDB I/O
+ Manager. This schema will be used if no other schema is specified directly on an asset or op.
+
+ .. code-block:: python
+
+ defs = Definitions(
+ assets=[my_table],
+ resources={"io_manager": duckdb_pyspark_io_manager.configured({"database": "my_db.duckdb", "schema": "my_schema"})}
+ )
+
+ On individual assets, you can also specify the schema where they should be stored using metadata or
+ by adding a ``key_prefix`` to the asset key. If both ``key_prefix`` and metadata are defined, the metadata will
+ take precedence.
+
+ .. code-block:: python
+
+ @asset(
+ key_prefix=["my_schema"] # will be used as the schema in duckdb
  )
+ def my_table() -> pyspark.sql.DataFrame:
+ ...
 
- If you do not provide a schema, Dagster will determine a schema based on the assets and ops using
- the I/O Manager. For assets, the schema will be determined from the asset key.
- For ops, the schema can be specified by including a "schema" entry in output metadata. If "schema" is not provided
- via config or on the asset/op, "public" will be used for the schema.
+ @asset(
+ metadata={"schema": "my_schema"} # will be used as the schema in duckdb
+ )
+ def my_other_table() -> pyspark.sql.DataFrame:
+ ...
+
+ For ops, the schema can be specified by including a "schema" entry in output metadata.
 
 .. code-block:: python
 
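The hunk above documents a precedence order for choosing the DuckDB schema: asset metadata beats `key_prefix`, both beat the resource-level `schema` config, and `"public"` is the last resort. A minimal sketch of those documented rules (this is an illustration, not the actual dagster-duckdb implementation; the function name and signature are invented for clarity):

```python
def resolve_schema(metadata=None, key_prefix=None, resource_schema=None):
    """Sketch of the schema precedence documented for the DuckDB I/O manager."""
    # 1. A "schema" entry in the asset's metadata takes precedence over everything.
    if metadata and "schema" in metadata:
        return metadata["schema"]
    # 2. Otherwise the key_prefix supplies the schema.
    if key_prefix:
        return key_prefix[-1]
    # 3. Otherwise the schema configured on the I/O manager resource is used.
    if resource_schema:
        return resource_schema
    # 4. With nothing specified anywhere, "public" is the default.
    return "public"

print(resolve_schema(metadata={"schema": "my_schema"}, key_prefix=["other"]))  # my_schema
print(resolve_schema(key_prefix=["my_schema"]))                               # my_schema
print(resolve_schema(resource_schema="warehouse"))                            # warehouse
print(resolve_schema())                                                       # public
```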
@@ -138,9 +161,10 @@ Examples:
  out={"my_table": Out(metadata={"schema": "my_schema"})}
  )
  def make_my_table() -> pyspark.sql.DataFrame:
- # the returned value will be stored at my_schema.my_table
  ...
 
+ If none of these is provided, the schema will default to "public".
+
  To only use specific columns of a table as input to a downstream op or asset, add the metadata "columns" to the
  In or AssetIn.
 
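The "columns" metadata described above restricts which columns are loaded for a downstream input. Conceptually this amounts to selecting only those columns from the schema-qualified table; a hypothetical helper sketching the SQL such a load would issue (not the library's actual query builder):

```python
def build_select(table, schema="public", columns=None):
    """Hypothetical sketch: the SELECT a column-subsetting load conceptually runs."""
    # With "columns" metadata, only those columns are read; otherwise everything.
    cols = ", ".join(columns) if columns else "*"
    return f"SELECT {cols} FROM {schema}.{table}"

print(build_select("my_table", schema="my_schema", columns=["a", "b"]))
# SELECT a, b FROM my_schema.my_table
print(build_select("my_table"))
# SELECT * FROM public.my_table
```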
@@ -180,10 +204,35 @@ class DuckDBPySparkIOManager(DuckDBIOManager):
  resources={"io_manager": DuckDBPySparkIOManager(database="my_db.duckdb")}
  )
 
- If you do not provide a schema, Dagster will determine a schema based on the assets and ops using
- the I/O Manager. For assets, the schema will be determined from the asset key, as in the above example.
- For ops, the schema can be specified by including a "schema" entry in output metadata. If "schema" is not provided
- via config or on the asset/op, "public" will be used for the schema.
+ You can set a default schema to store the assets using the ``schema`` configuration value of the DuckDB I/O
+ Manager. This schema will be used if no other schema is specified directly on an asset or op.
+
+ .. code-block:: python
+
+ defs = Definitions(
+ assets=[my_table],
+ resources={"io_manager": DuckDBPySparkIOManager(database="my_db.duckdb", schema="my_schema")}
+ )
+
+ On individual assets, you can also specify the schema where they should be stored using metadata or
+ by adding a ``key_prefix`` to the asset key. If both ``key_prefix`` and metadata are defined, the metadata will
+ take precedence.
+
+ .. code-block:: python
+
+ @asset(
+ key_prefix=["my_schema"] # will be used as the schema in duckdb
+ )
+ def my_table() -> pyspark.sql.DataFrame:
+ ...
+
+ @asset(
+ metadata={"schema": "my_schema"} # will be used as the schema in duckdb
+ )
+ def my_other_table() -> pyspark.sql.DataFrame:
+ ...
+
+ For ops, the schema can be specified by including a "schema" entry in output metadata.
 
 .. code-block:: python
 
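This hunk shows the Pythonic resource style, where `database` and the optional default `schema` are constructor arguments rather than a `.configured({...})` dict. A rough illustration of that shape (an invented stand-in class, not the real `DuckDBPySparkIOManager`), showing how an asset-level schema would override the resource default:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DuckDBIOConfig:
    """Illustrative stand-in for the I/O manager's config (not the real class)."""
    database: str                  # path to the .duckdb file
    schema: Optional[str] = None   # optional default schema for stored assets

    def target(self, table, asset_schema=None):
        # An asset-level schema overrides the resource default; "public" is the fallback.
        return f"{asset_schema or self.schema or 'public'}.{table}"

cfg = DuckDBIOConfig(database="my_db.duckdb", schema="my_schema")
print(cfg.target("my_table"))                        # my_schema.my_table
print(cfg.target("my_table", asset_schema="other"))  # other.my_table
print(DuckDBIOConfig(database="my_db.duckdb").target("t"))  # public.t
```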
@@ -191,9 +240,10 @@ class DuckDBPySparkIOManager(DuckDBIOManager):
  out={"my_table": Out(metadata={"schema": "my_schema"})}
  )
  def make_my_table() -> pyspark.sql.DataFrame:
- # the returned value will be stored at my_schema.my_table
  ...
 
+ If none of these is provided, the schema will default to "public".
+
  To only use specific columns of a table as input to a downstream op or asset, add the metadata "columns" to the
  In or AssetIn.
 
@@ -1 +1 @@
- __version__ = "0.21.9"
+ __version__ = "0.21.11"
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: dagster-duckdb-pyspark
- Version: 0.21.9
+ Version: 0.21.11
  Summary: Package for storing PySpark DataFrames in DuckDB.
  Home-page: https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-duckb-pyspark
  Author: Dagster Labs
@@ -12,8 +12,8 @@ Classifier: Programming Language :: Python :: 3.10
  Classifier: License :: OSI Approved :: Apache Software License
  Classifier: Operating System :: OS Independent
  License-File: LICENSE
- Requires-Dist: dagster ==1.5.9
- Requires-Dist: dagster-duckdb ==0.21.9
+ Requires-Dist: dagster ==1.5.11
+ Requires-Dist: dagster-duckdb ==0.21.11
  Requires-Dist: pyspark >=3
  Requires-Dist: pandas <2.1
  Requires-Dist: pyarrow
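The METADATA hunk above bumps the exact version pins on `dagster` and `dagster-duckdb`. A rough sketch of splitting such `Requires-Dist` lines into a package name and version specifier (a simplified parser for illustration, not a full PEP 508 implementation):

```python
import re

def parse_requires_dist(line):
    """Split a METADATA Requires-Dist line into (name, specifier) — a rough
    sketch for simple pins like those above, not a complete PEP 508 parser."""
    m = re.match(r"Requires-Dist:\s*([A-Za-z0-9._-]+)\s*(.*)", line)
    if not m:
        raise ValueError(f"not a Requires-Dist line: {line!r}")
    name, spec = m.groups()
    return name, spec.strip()

print(parse_requires_dist("Requires-Dist: dagster ==1.5.11"))  # ('dagster', '==1.5.11')
print(parse_requires_dist("Requires-Dist: pyspark >=3"))       # ('pyspark', '>=3')
```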
@@ -0,0 +1,9 @@
+ dagster_duckdb_pyspark/__init__.py,sha256=KjwD42HKQJslK2WPFg2F7mvHe1hPyrp02xSWM0Az39Y,382
+ dagster_duckdb_pyspark/duckdb_pyspark_type_handler.py,sha256=Tqo9McLXY_dmzgszA3nK5X7Hbws7jd8WuXMSXWfMDaQ,9588
+ dagster_duckdb_pyspark/py.typed,sha256=mDShSrm8qg9qjacQc2F-rI8ATllqP6EdgHuEYxuCXZ0,7
+ dagster_duckdb_pyspark/version.py,sha256=V1YMGft6tHE234dVnQ7jp6BkJFd87D4FSeroTZLBNT8,24
+ dagster_duckdb_pyspark-0.21.11.dist-info/LICENSE,sha256=TMatHW4_G9ldRdodEAp-l2Xa2WvsdeOh60E3v1R2jis,11349
+ dagster_duckdb_pyspark-0.21.11.dist-info/METADATA,sha256=DbqCw7SRk3t1h2VevonvpfeWWhDgQAM49WYYkTe6mf0,745
+ dagster_duckdb_pyspark-0.21.11.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
+ dagster_duckdb_pyspark-0.21.11.dist-info/top_level.txt,sha256=UYh0E2YiAlK01-DAkx0eikRaH-TIk0n9jijQK2joJBs,23
+ dagster_duckdb_pyspark-0.21.11.dist-info/RECORD,,
@@ -1,9 +0,0 @@
- dagster_duckdb_pyspark/__init__.py,sha256=KjwD42HKQJslK2WPFg2F7mvHe1hPyrp02xSWM0Az39Y,382
- dagster_duckdb_pyspark/duckdb_pyspark_type_handler.py,sha256=t9lqCpo-ibaThEFzxjqownu_yF_tFpVvQO6_ITgPLlY,7980
- dagster_duckdb_pyspark/py.typed,sha256=mDShSrm8qg9qjacQc2F-rI8ATllqP6EdgHuEYxuCXZ0,7
- dagster_duckdb_pyspark/version.py,sha256=Q2Rw67JbT-EQOHwSa5YAA48ptYmYSvagFzOgypb2BEA,23
- dagster_duckdb_pyspark-0.21.9.dist-info/LICENSE,sha256=TMatHW4_G9ldRdodEAp-l2Xa2WvsdeOh60E3v1R2jis,11349
- dagster_duckdb_pyspark-0.21.9.dist-info/METADATA,sha256=7obCEuK3bhJ69IqGxdMmA2Um_Td7-5kv47WzdJhus4Y,742
- dagster_duckdb_pyspark-0.21.9.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
- dagster_duckdb_pyspark-0.21.9.dist-info/top_level.txt,sha256=UYh0E2YiAlK01-DAkx0eikRaH-TIk0n9jijQK2joJBs,23
- dagster_duckdb_pyspark-0.21.9.dist-info/RECORD,,
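The RECORD entries above follow the wheel format: `path,sha256=<digest>,<size>`, where the digest is the urlsafe-base64 SHA-256 of the file contents with trailing `=` padding stripped. A small sketch that reproduces this format (the path and payload below are made-up examples):

```python
import base64
import hashlib

def record_entry(path, data: bytes):
    """Build a wheel RECORD line for `data`: urlsafe-base64 sha256 digest with
    padding stripped, followed by the byte size — the format in the diff above."""
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={b64},{len(data)}"

print(record_entry("pkg/example.py", b"hello"))
# pkg/example.py,sha256=LPJNul-wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ,5
```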