dataproc-spark-connect 0.6.0__py2.py3-none-any.whl → 0.7.0__py2.py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dataproc_spark_connect-0.7.0.dist-info/METADATA +98 -0
- {dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/RECORD +7 -7
- google/cloud/dataproc_spark_connect/client/proxy.py +3 -3
- google/cloud/dataproc_spark_connect/session.py +186 -172
- dataproc_spark_connect-0.6.0.dist-info/METADATA +0 -111
- {dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/LICENSE +0 -0
- {dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/WHEEL +0 -0
- {dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/top_level.txt +0 -0

dataproc_spark_connect-0.7.0.dist-info/METADATA ADDED
@@ -0,0 +1,98 @@
+Metadata-Version: 2.1
+Name: dataproc-spark-connect
+Version: 0.7.0
+Summary: Dataproc client library for Spark Connect
+Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
+Author: Google LLC
+License: Apache 2.0
+License-File: LICENSE
+Requires-Dist: google-api-core>=2.19
+Requires-Dist: google-cloud-dataproc>=5.18
+Requires-Dist: packaging>=20.0
+Requires-Dist: pyspark[connect]>=3.5
+Requires-Dist: tqdm>=4.67
+Requires-Dist: websockets>=15.0
+
+# Dataproc Spark Connect Client
+
+A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/)
+client with additional functionalities that allow applications to communicate
+with a remote Dataproc Spark Session using the Spark Connect protocol without
+requiring additional steps.
+
+## Install
+
+```sh
+pip install dataproc_spark_connect
+```
+
+## Uninstall
+
+```sh
+pip uninstall dataproc_spark_connect
+```
+
+## Setup
+
+This client requires permissions to
+manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
+If you are running the client outside of Google Cloud, you must set following
+environment variables:
+
+* `GOOGLE_CLOUD_PROJECT` - The Google Cloud project you use to run Spark
+workloads
+* `GOOGLE_CLOUD_REGION` - The Compute
+Engine [region](https://cloud.google.com/compute/docs/regions-zones#available)
+where you run the Spark workload.
+* `GOOGLE_APPLICATION_CREDENTIALS` -
+Your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
+
+## Usage
+
+1. Install the latest version of Dataproc Python client and Dataproc Spark
+Connect modules:
+
+```sh
+pip install google_cloud_dataproc dataproc_spark_connect --force-reinstall
+```
+
+2. Add the required imports into your PySpark application or notebook and start
+a Spark session with the following code instead of using
+environment variables:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+from google.cloud.dataproc_v1 import Session
+session_config = Session()
+session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
+session_config.runtime_config.version = '2.2'
+spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
+```
+
+## Developing
+
+For development instructions see [guide](DEVELOPING.md).
+
+## Contributing
+
+We'd love to accept your patches and contributions to this project. There are
+just a few small guidelines you need to follow.
+
+### Contributor License Agreement
+
+Contributions to this project must be accompanied by a Contributor License
+Agreement. You (or your employer) retain the copyright to your contribution;
+this simply gives us permission to use and redistribute your contributions as
+part of the project. Head over to <https://cla.developers.google.com> to see
+your current agreements on file or to sign a new one.
+
+You generally only need to submit a CLA once, so if you've already submitted one
+(even if it was for a different project), you probably don't need to do it
+again.
+
+### Code reviews
+
+All submissions, including submissions by project members, require review. We
+use GitHub pull requests for this purpose. Consult
+[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
+information on using pull requests.

{dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/RECORD RENAMED
@@ -1,12 +1,12 @@
 google/cloud/dataproc_spark_connect/__init__.py,sha256=dIqHNWVWWrSuRf26x11kX5e9yMKSHCtmI_GBj1-FDdE,1101
 google/cloud/dataproc_spark_connect/exceptions.py,sha256=ilGyHD5M_yBQ3IC58-Y5miRGIQVJsLaNKvEGcHuk_BE,969
 google/cloud/dataproc_spark_connect/pypi_artifacts.py,sha256=gd-VMwiVP-EJuPp9Vf9Shx8pqps3oSKp0hBcSSZQS-A,1575
-google/cloud/dataproc_spark_connect/session.py,sha256=
+google/cloud/dataproc_spark_connect/session.py,sha256=98Zrn0Vyl2ajcF5hltdSp8LgYTOzDa-eqeYxxmZVKds,26398
 google/cloud/dataproc_spark_connect/client/__init__.py,sha256=6hCNSsgYlie6GuVpc5gjFsPnyeMTScTpXSPYqp1fplY,615
 google/cloud/dataproc_spark_connect/client/core.py,sha256=m3oXTKBm3sBy6jhDu9GRecrxLb5CdEM53SgMlnJb6ag,4616
-google/cloud/dataproc_spark_connect/client/proxy.py,sha256=
-dataproc_spark_connect-0.
-dataproc_spark_connect-0.
-dataproc_spark_connect-0.
-dataproc_spark_connect-0.
-dataproc_spark_connect-0.
+google/cloud/dataproc_spark_connect/client/proxy.py,sha256=qUZXvVY1yn934vE6nlO495XUZ53AUx9O74a9ozkGI9U,8976
+dataproc_spark_connect-0.7.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+dataproc_spark_connect-0.7.0.dist-info/METADATA,sha256=fFJLyzjo3CKLx1d18U4i1csJEPgWxoAjok4qFghtOyE,3328
+dataproc_spark_connect-0.7.0.dist-info/WHEEL,sha256=OpXWERl2xLPRHTvd2ZXo_iluPEQd8uSbYkJ53NAER_Y,109
+dataproc_spark_connect-0.7.0.dist-info/top_level.txt,sha256=_1QvSJIhFAGfxb79D6DhB7SUw2X6T4rwnz_LLrbcD3c,7
+dataproc_spark_connect-0.7.0.dist-info/RECORD,,

google/cloud/dataproc_spark_connect/client/proxy.py CHANGED
@@ -18,7 +18,6 @@ import contextlib
 import logging
 import socket
 import threading
-import time
 
 import websockets.sync.client as websocketclient
 
@@ -95,6 +94,7 @@ def forward_bytes(name, from_sock, to_sock):
 This method is intended to be run in a separate thread of execution.
 
 Args:
+name: forwarding thread name
 from_sock: A socket-like object to stream bytes from.
 to_sock: A socket-like object to stream bytes to.
 """
@@ -131,7 +131,7 @@ def connect_sockets(conn_number, from_sock, to_sock):
 This method continuously streams bytes in both directions between the
 given `from_sock` and `to_sock` socket-like objects.
 
-The caller is responsible for creating and closing the supplied
+The caller is responsible for creating and closing the supplied sockets.
 """
 forward_name = f"{conn_number}-forward"
 t1 = threading.Thread(
@@ -163,7 +163,7 @@ def forward_connection(conn_number, conn, addr, target_host):
 Both the supplied incoming connection (`conn`) and the created outgoing
 connection are automatically closed when this method terminates.
 
-This method should be run inside
+This method should be run inside a daemon thread so that it will not
 block program termination.
 """
 with conn:
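The proxy.py changes above only touch docstrings and drop an unused `import time`, but the docstrings describe the module's central pattern: a local socket is bridged to the remote Spark Connect endpoint by two forwarding threads, one per direction. A minimal sketch of that pattern, assuming plain socket-like objects with `recv`/`sendall` (the real module wraps `websockets.sync.client` connections, so this is an illustration, not the package's implementation):

```python
import threading


def forward_bytes(name, from_sock, to_sock, chunk_size=4096):
    # Stream bytes from `from_sock` to `to_sock` until EOF; meant to run in its
    # own thread. `name` only identifies the forwarding thread when debugging.
    while True:
        data = from_sock.recv(chunk_size)
        if not data:
            break
        to_sock.sendall(data)


def connect_sockets(conn_number, from_sock, to_sock):
    # Pump bytes in both directions; the caller creates and closes both sockets.
    threads = [
        threading.Thread(
            target=forward_bytes,
            args=(f"{conn_number}-forward", from_sock, to_sock),
            daemon=True,
        ),
        threading.Thread(
            target=forward_bytes,
            args=(f"{conn_number}-backward", to_sock, from_sock),
            daemon=True,
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```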

google/cloud/dataproc_spark_connect/session.py CHANGED
@@ -11,39 +11,37 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
 import atexit
+import datetime
 import json
 import logging
 import os
 import random
 import string
+import threading
 import time
-import
-from time import sleep
-from typing import Any, cast, ClassVar, Dict, Optional
+import tqdm
 
 from google.api_core import retry
-from google.api_core.future.polling import POLLING_PREDICATE
 from google.api_core.client_options import ClientOptions
 from google.api_core.exceptions import Aborted, FailedPrecondition, InvalidArgument, NotFound, PermissionDenied
-from google.
-
-from google.cloud.dataproc_spark_connect.pypi_artifacts import PyPiArtifacts
+from google.api_core.future.polling import POLLING_PREDICATE
 from google.cloud.dataproc_spark_connect.client import DataprocChannelBuilder
+from google.cloud.dataproc_spark_connect.exceptions import DataprocSparkConnectException
+from google.cloud.dataproc_spark_connect.pypi_artifacts import PyPiArtifacts
 from google.cloud.dataproc_v1 import (
+AuthenticationConfig,
 CreateSessionRequest,
 GetSessionRequest,
 Session,
 SessionControllerClient,
-SessionTemplate,
 TerminateSessionRequest,
 )
-from google.
-from google.protobuf.text_format import ParseError
+from google.cloud.dataproc_v1.types import sessions
 from pyspark.sql.connect.session import SparkSession
 from pyspark.sql.utils import to_str
-
-from google.cloud.dataproc_spark_connect.exceptions import DataprocSparkConnectException
+from typing import Any, cast, ClassVar, Dict, Optional
 
 # Set up logging
 logging.basicConfig(level=logging.INFO)
@@ -69,6 +67,8 @@ class DataprocSparkSession(SparkSession):
 ... ) # doctest: +SKIP
 """
 
+_DEFAULT_RUNTIME_VERSION = "2.2"
+
 _active_s8s_session_uuid: ClassVar[Optional[str]] = None
 _project_id = None
 _region = None
@@ -77,7 +77,12 @@ class DataprocSparkSession(SparkSession):
 
 class Builder(SparkSession.Builder):
 
-
+_dataproc_runtime_to_spark_version = {
+"1.2": "3.5",
+"2.2": "3.5",
+"2.3": "3.5",
+"3.0": "4.0",
+}
 
 _session_static_configs = [
 "spark.executor.cores",
@@ -93,10 +98,10 @@ class DataprocSparkSession(SparkSession):
 self._options: Dict[str, Any] = {}
 self._channel_builder: Optional[DataprocChannelBuilder] = None
 self._dataproc_config: Optional[Session] = None
-self._project_id = os.
-self._region = os.
+self._project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
+self._region = os.getenv("GOOGLE_CLOUD_REGION")
 self._client_options = ClientOptions(
-api_endpoint=os.
+api_endpoint=os.getenv(
 "GOOGLE_CLOUD_DATAPROC_API_ENDPOINT",
 f"{self._region}-dataproc.googleapis.com",
 )
@@ -117,7 +122,7 @@ class DataprocSparkSession(SparkSession):
 
 def location(self, location):
 self._region = location
-self._client_options.api_endpoint = os.
+self._client_options.api_endpoint = os.getenv(
 "GOOGLE_CLOUD_DATAPROC_API_ENDPOINT",
 f"{self._region}-dataproc.googleapis.com",
 )
@@ -155,10 +160,7 @@ class DataprocSparkSession(SparkSession):
 spark_connect_url = session_response.runtime_info.endpoints.get(
 "Spark Connect Server"
 )
-
-if not spark_connect_url.endswith("/"):
-spark_connect_url += "/"
-url = f"{spark_connect_url.replace('.com/', '.com:443/')};session_id={session_response.uuid};use_ssl=true"
+url = f"{spark_connect_url}/;session_id={session_response.uuid};use_ssl=true"
 logger.debug(f"Spark Connect URL: {url}")
 self._channel_builder = DataprocChannelBuilder(
 url,
@@ -179,56 +181,66 @@ class DataprocSparkSession(SparkSession):
 
 if self._options.get("spark.remote", False):
 raise NotImplemented(
-"DataprocSparkSession does not support connecting to an existing remote server"
+"DataprocSparkSession does not support connecting to an existing Spark Connect remote server"
 )
 
 from google.cloud.dataproc_v1 import SessionControllerClient
 
 dataproc_config: Session = self._get_dataproc_config()
-session_template: SessionTemplate = self._get_session_template()
 
-self.
-dataproc_config, session_template
-)
+self._validate_version(dataproc_config)
 
-spark_connect_session = self._get_spark_connect_session(
-dataproc_config, session_template
-)
-
-if not spark_connect_session:
-dataproc_config.spark_connect_session = {}
-os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
-session_request = CreateSessionRequest()
 session_id = self.generate_dataproc_session_id()
-
-session_request.session_id = session_id
 dataproc_config.name = f"projects/{self._project_id}/locations/{self._region}/sessions/{session_id}"
 logger.debug(
-f"
+f"Dataproc Session configuration:\n{dataproc_config}"
 )
+
+session_request = CreateSessionRequest()
+session_request.session_id = session_id
 session_request.session = dataproc_config
 session_request.parent = (
 f"projects/{self._project_id}/locations/{self._region}"
 )
 
-logger.debug("Creating
+logger.debug("Creating Dataproc Session")
 DataprocSparkSession._active_s8s_session_id = session_id
 s8s_creation_start_time = time.time()
-
-
-
-
-
-
-
+
+stop_create_session_pbar = False
+
+def create_session_pbar():
+iterations = 150
+pbar = tqdm.trange(
+iterations,
+bar_format="{bar}",
+ncols=80,
 )
-
+for i in pbar:
+if stop_create_session_pbar:
+break
+# Last iteration
+if i >= iterations - 1:
+# Sleep until session created
+while not stop_create_session_pbar:
+time.sleep(1)
+else:
+time.sleep(1)
+
+pbar.close()
+# Print new line after the progress bar
+print()
+
+create_session_pbar_thread = threading.Thread(
+target=create_session_pbar
+)
+
+try:
 if (
-
-
-
-
-).lower()
+os.getenv(
+"DATAPROC_SPARK_CONNECT_SESSION_TERMINATE_AT_EXIT",
+"false",
+)
 == "true"
 ):
 atexit.register(
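The hunk above replaces the deleted lines with a `tqdm`-based progress indicator: `create_session_pbar` runs on a helper thread, advances roughly once per second for up to 150 iterations, and parks on the final tick until the main thread flips `stop_create_session_pbar`. A condensed, standalone sketch of that pattern (a `time.sleep` stands in for the Dataproc create operation; names are shortened for illustration):

```python
import threading
import time

import tqdm

stop_pbar = False


def run_pbar(iterations=150):
    # Render a bare progress bar that advances about once per second.
    pbar = tqdm.trange(iterations, bar_format="{bar}", ncols=80)
    for i in pbar:
        if stop_pbar:
            break
        if i >= iterations - 1:
            # Last tick: wait here until the caller signals completion.
            while not stop_pbar:
                time.sleep(1)
        else:
            time.sleep(1)
    pbar.close()
    print()  # move to a new line after the bar


pbar_thread = threading.Thread(target=run_pbar)
pbar_thread.start()
time.sleep(3)  # stand-in for operation.result(...)
stop_pbar = True
pbar_thread.join()
```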
@@ -243,18 +255,25 @@ class DataprocSparkSession(SparkSession):
 client_options=self._client_options
 ).create_session(session_request)
 print(
-f"
+f"Creating Dataproc Session: https://console.cloud.google.com/dataproc/interactive/{self._region}/{session_id}?project={self._project_id}"
 )
+create_session_pbar_thread.start()
 session_response: Session = operation.result(
-polling=
+polling=retry.Retry(
+predicate=POLLING_PREDICATE,
+initial=5.0, # seconds
+maximum=5.0, # seconds
+multiplier=1.0,
+timeout=600, # seconds
+)
 )
-
-
-
-
-
-
-
+stop_create_session_pbar = True
+create_session_pbar_thread.join()
+print("Dataproc Session was successfully created")
+file_path = (
+DataprocSparkSession._get_active_session_file_path()
+)
+if file_path is not None:
 try:
 session_data = {
 "session_name": session_response.name,
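The `polling=` argument, truncated in the 0.6.0 listing, is now an explicit `google.api_core.retry.Retry` policy: with `initial`, `maximum`, and `multiplier` pinned to a constant interval, the long-running create operation is polled every 5 seconds and abandoned after 600 seconds. Roughly, assuming `operation` is the LRO returned by `create_session` (a sketch, not the package's exact code path):

```python
from google.api_core import retry
from google.api_core.future.polling import POLLING_PREDICATE

# Fixed 5-second polling interval (no backoff), 10-minute overall timeout.
fixed_polling = retry.Retry(
    predicate=POLLING_PREDICATE,
    initial=5.0,
    maximum=5.0,
    multiplier=1.0,
    timeout=600,
)
# session_response = operation.result(polling=fixed_polling)
```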
@@ -267,21 +286,27 @@ class DataprocSparkSession(SparkSession):
 json.dump(session_data, json_file, indent=4)
 except Exception as e:
 logger.error(
-f"Exception while writing active session to file {file_path}
+f"Exception while writing active session to file {file_path}, {e}"
 )
 except (InvalidArgument, PermissionDenied) as e:
+stop_create_session_pbar = True
+if create_session_pbar_thread.is_alive():
+create_session_pbar_thread.join()
 DataprocSparkSession._active_s8s_session_id = None
 raise DataprocSparkConnectException(
-f"Error while creating
+f"Error while creating Dataproc Session: {e.message}"
 )
 except Exception as e:
+stop_create_session_pbar = True
+if create_session_pbar_thread.is_alive():
+create_session_pbar_thread.join()
 DataprocSparkSession._active_s8s_session_id = None
 raise RuntimeError(
-f"Error while creating
+f"Error while creating Dataproc Session"
 ) from e
 
 logger.debug(
-f"
+f"Dataproc Session created: {session_id} in {int(time.time() - s8s_creation_start_time)} seconds"
 )
 return self.__create_spark_connect_session_from_s8s(
 session_response, dataproc_config.name
@@ -292,17 +317,20 @@ class DataprocSparkSession(SparkSession):
 ) -> Optional["DataprocSparkSession"]:
 s8s_session_id = DataprocSparkSession._active_s8s_session_id
 session_name = f"projects/{self._project_id}/locations/{self._region}/sessions/{s8s_session_id}"
-session_response =
-
-
+session_response = None
+session = None
+if s8s_session_id is not None:
+session_response = get_active_s8s_session_response(
+session_name, self._client_options
+)
+session = DataprocSparkSession.getActiveSession()
 
-session = DataprocSparkSession.getActiveSession()
 if session is None:
 session = DataprocSparkSession._default_session
 
 if session_response is not None:
 print(
-f"Using existing
+f"Using existing Dataproc Session (configuration changes may not be applied): https://console.cloud.google.com/dataproc/interactive/{self._region}/{s8s_session_id}?project={self._project_id}"
 )
 if session is None:
 session = self.__create_spark_connect_session_from_s8s(
@@ -310,10 +338,10 @@ class DataprocSparkSession(SparkSession):
 )
 return session
 else:
-logger.info(
-f"Session: {s8s_session_id} not active, stopping previous spark session and creating new"
-)
 if session is not None:
+print(
+f"{s8s_session_id} Dataproc Session is not active, stopping and creating a new one"
+)
 session.stop()
 
 return None
@@ -333,21 +361,52 @@ class DataprocSparkSession(SparkSession):
 dataproc_config = self._dataproc_config
 for k, v in self._options.items():
 dataproc_config.runtime_config.properties[k] = v
-
-
-
+dataproc_config.spark_connect_session = (
+sessions.SparkConnectConfig()
+)
+if not dataproc_config.runtime_config.version:
+dataproc_config.runtime_config.version = (
+DataprocSparkSession._DEFAULT_RUNTIME_VERSION
+)
+if (
+not dataproc_config.environment_config.execution_config.authentication_config.user_workload_authentication_type
+and "DATAPROC_SPARK_CONNECT_AUTH_TYPE" in os.environ
+):
+dataproc_config.environment_config.execution_config.authentication_config.user_workload_authentication_type = AuthenticationConfig.AuthenticationType[
+os.getenv("DATAPROC_SPARK_CONNECT_AUTH_TYPE")
 ]
-
-
-
-
-
-
-
-
-
-
+if (
+not dataproc_config.environment_config.execution_config.service_account
+and "DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT" in os.environ
+):
+dataproc_config.environment_config.execution_config.service_account = os.getenv(
+"DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT"
+)
+if (
+not dataproc_config.environment_config.execution_config.subnetwork_uri
+and "DATAPROC_SPARK_CONNECT_SUBNET" in os.environ
+):
+dataproc_config.environment_config.execution_config.subnetwork_uri = os.getenv(
+"DATAPROC_SPARK_CONNECT_SUBNET"
+)
+if (
+not dataproc_config.environment_config.execution_config.ttl
+and "DATAPROC_SPARK_CONNECT_TTL_SECONDS" in os.environ
+):
+dataproc_config.environment_config.execution_config.ttl = {
+"seconds": int(
+os.getenv("DATAPROC_SPARK_CONNECT_TTL_SECONDS")
+)
+}
+if (
+not dataproc_config.environment_config.execution_config.idle_ttl
+and "DATAPROC_SPARK_CONNECT_IDLE_TTL_SECONDS" in os.environ
+):
+dataproc_config.environment_config.execution_config.idle_ttl = {
+"seconds": int(
+os.getenv("DATAPROC_SPARK_CONNECT_IDLE_TTL_SECONDS")
+)
+}
 if "COLAB_NOTEBOOK_RUNTIME_ID" in os.environ:
 dataproc_config.labels["colab-notebook-runtime-id"] = (
 os.environ["COLAB_NOTEBOOK_RUNTIME_ID"]
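The hunk above makes the `spark_connect_session` and runtime-version defaults unconditional and adds environment-variable fallbacks: each `DATAPROC_SPARK_CONNECT_*` variable is consulted only when the corresponding `Session` field is left unset. A hedged usage sketch of driving a session through those variables (project, region, subnet, and service-account values are placeholders):

```python
import os

from google.cloud.dataproc_spark_connect import DataprocSparkSession

# Fallbacks read by _get_dataproc_config when the Session config leaves
# these fields unset; every value here is an illustrative placeholder.
os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project"
os.environ["GOOGLE_CLOUD_REGION"] = "us-central1"
os.environ["DATAPROC_SPARK_CONNECT_SUBNET"] = "my-subnet"
os.environ["DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT"] = "sa@my-project.iam.gserviceaccount.com"
os.environ["DATAPROC_SPARK_CONNECT_TTL_SECONDS"] = "3600"
os.environ["DATAPROC_SPARK_CONNECT_IDLE_TTL_SECONDS"] = "1800"

spark = DataprocSparkSession.builder.getOrCreate()
```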
@@ -358,87 +417,38 @@ class DataprocSparkSession(SparkSession):
 ]
 return dataproc_config
 
-def
-
-GetSessionTemplateRequest,
-SessionTemplateControllerClient,
-)
-
-session_template = None
-if self._dataproc_config and self._dataproc_config.session_template:
-session_template = self._dataproc_config.session_template
-get_session_template_request = GetSessionTemplateRequest()
-get_session_template_request.name = session_template
-client = SessionTemplateControllerClient(
-client_options=self._client_options
-)
-try:
-session_template = client.get_session_template(
-get_session_template_request
-)
-except Exception as e:
-logger.error(
-f"Failed to get session template {session_template}: {e}"
-)
-raise
-return session_template
+def _validate_version(self, dataproc_config):
+trim_version = lambda v: ".".join(v.split(".")[:2])
 
-
-trimmed_version = lambda v: ".".join(v.split(".")[:2])
-version = None
+version = dataproc_config.runtime_config.version
 if (
-
-
-and dataproc_config.runtime_config.version
-):
-version = dataproc_config.runtime_config.version
-elif (
-session_template
-and session_template.runtime_config
-and session_template.runtime_config.version
-):
-version = session_template.runtime_config.version
-
-if not version:
-version = "3.0"
-dataproc_config.runtime_config.version = version
-elif (
-trimmed_version(version)
-not in self._dataproc_runtime_spark_version
+trim_version(version)
+not in self._dataproc_runtime_to_spark_version
 ):
 raise ValueError(
-f"
-f"Supported versions: {self.
+f"Specified {version} Dataproc Spark runtime version is not supported. "
+f"Supported runtime versions: {self._dataproc_runtime_to_spark_version.keys()}"
 )
 
-server_version = self.
-
+server_version = self._dataproc_runtime_to_spark_version[
+trim_version(version)
 ]
+
 import importlib.metadata
 
-
+dataproc_connect_version = importlib.metadata.version(
 "dataproc-spark-connect"
 )
 client_version = importlib.metadata.version("pyspark")
-
-
-
-
-
-logger.warning(
-f"client and server on different versions: {version_message}"
+if trim_version(client_version) != trim_version(server_version):
+print(
+f"Spark Connect client and server use different versions:\n"
+f"- Dataproc Spark Connect client {dataproc_connect_version} (PySpark {client_version})\n"
+f"- Dataproc Spark runtime {version} (Spark {server_version})"
 )
-return version
 
-
-
-if dataproc_config and dataproc_config.spark_connect_session:
-spark_connect_session = dataproc_config.spark_connect_session
-elif session_template and session_template.spark_connect_session:
-spark_connect_session = session_template.spark_connect_session
-return spark_connect_session
-
-def generate_dataproc_session_id(self):
+@staticmethod
+def generate_dataproc_session_id():
 timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
 suffix_length = 6
 random_suffix = "".join(
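The session-template lookup is gone; `_validate_version` now only checks `runtime_config.version` against `_dataproc_runtime_to_spark_version` and prints a notice when the local PySpark major.minor differs from the server-side Spark version. The check reduces to roughly this sketch (the mapping is copied from the hunk above; the runtime value is a placeholder):

```python
import importlib.metadata

RUNTIME_TO_SPARK = {"1.2": "3.5", "2.2": "3.5", "2.3": "3.5", "3.0": "4.0"}


def trim_version(v: str) -> str:
    # Keep only the major.minor part, e.g. "3.5.1" -> "3.5".
    return ".".join(v.split(".")[:2])


runtime = "2.2"  # placeholder for dataproc_config.runtime_config.version
if trim_version(runtime) not in RUNTIME_TO_SPARK:
    raise ValueError(f"Unsupported Dataproc Spark runtime version: {runtime}")

server_spark = RUNTIME_TO_SPARK[trim_version(runtime)]
client_spark = importlib.metadata.version("pyspark")
if trim_version(client_spark) != trim_version(server_spark):
    print(
        f"Spark Connect client (PySpark {client_spark}) and server "
        f"(Spark {server_spark}) use different versions"
    )
```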
@@ -451,32 +461,30 @@ class DataprocSparkSession(SparkSession):
 def _repr_html_(self) -> str:
 if not self._active_s8s_session_id:
 return """
-<div>No Active Dataproc
+<div>No Active Dataproc Session</div>
 """
 
 s8s_session = f"https://console.cloud.google.com/dataproc/interactive/{self._region}/{self._active_s8s_session_id}"
 ui = f"{s8s_session}/sparkApplications/applications"
-version = ""
 return f"""
 <div>
 <p><b>Spark Connect</b></p>
 
-<p><a href="{s8s_session}?project={self._project_id}">
+<p><a href="{s8s_session}?project={self._project_id}">Dataproc Session</a></p>
 <p><a href="{ui}?project={self._project_id}">Spark UI</a></p>
 </div>
 """
 
-
-
-
-
-]
+@staticmethod
+def _remove_stopped_session_from_file():
+file_path = DataprocSparkSession._get_active_session_file_path()
+if file_path is not None:
 try:
 with open(file_path, "w"):
 pass
 except Exception as e:
 logger.error(
-f"Exception while removing active session in file {file_path}
+f"Exception while removing active session in file {file_path}, {e}"
 )
 
 def addArtifacts(
@@ -494,7 +502,7 @@ class DataprocSparkSession(SparkSession):
 
 Parameters
 ----------
-*
+*artifact : tuple of str
 Artifact's URIs to add.
 pyfile : bool
 Whether to add them as Python dependencies such as .py, .egg, .zip or .jar files.
@@ -507,7 +515,7 @@ class DataprocSparkSession(SparkSession):
 Add a file to be downloaded with this Spark job on every node.
 The ``path`` passed can only be a local file for now.
 pypi : bool
-This option is only available with DataprocSparkSession.
+This option is only available with DataprocSparkSession. e.g. `spark.addArtifacts("spacy==3.8.4", "torch", pypi=True)`
 Installs PyPi package (with its dependencies) in the active Spark session on the driver and executors.
 
 Notes
@@ -534,6 +542,10 @@ class DataprocSparkSession(SparkSession):
 *artifact, pyfile=pyfile, archive=archive, file=file
 )
 
+@staticmethod
+def _get_active_session_file_path():
+return os.getenv("DATAPROC_SPARK_CONNECT_ACTIVE_SESSION_FILE_PATH")
+
 def stop(self) -> None:
 with DataprocSparkSession._lock:
 if DataprocSparkSession._active_s8s_session_id is not None:
@@ -544,7 +556,7 @@ class DataprocSparkSession(SparkSession):
 self._client_options,
 )
 
-self.
+self._remove_stopped_session_from_file()
 DataprocSparkSession._active_s8s_session_uuid = None
 DataprocSparkSession._active_s8s_session_id = None
 DataprocSparkSession._project_id = None
@@ -565,7 +577,7 @@ def terminate_s8s_session(
 ):
 from google.cloud.dataproc_v1 import SessionControllerClient
 
-logger.debug(f"Terminating
+logger.debug(f"Terminating Dataproc Session: {active_s8s_session_id}")
 terminate_session_request = TerminateSessionRequest()
 session_name = f"projects/{project_id}/locations/{region}/sessions/{active_s8s_session_id}"
 terminate_session_request.name = session_name
@@ -583,18 +595,20 @@ def terminate_s8s_session(
 ):
 session = session_client.get_session(get_session_request)
 state = session.state
-sleep(1)
+time.sleep(1)
 except NotFound:
-logger.debug(
+logger.debug(
+f"{active_s8s_session_id} Dataproc Session already deleted"
+)
 # Client will get 'Aborted' error if session creation is still in progress and
 # 'FailedPrecondition' if another termination is still in progress.
-# Both are retryable but we catch it and let TTL take care of cleanups.
+# Both are retryable, but we catch it and let TTL take care of cleanups.
 except (FailedPrecondition, Aborted):
 logger.debug(
-f"
+f"{active_s8s_session_id} Dataproc Session already terminated manually or automatically due to TTL"
 )
 if state is not None and state == Session.State.FAILED:
-raise RuntimeError("
+raise RuntimeError("Dataproc Session termination failed")
 
 
 def get_active_s8s_session_response(
@@ -608,7 +622,7 @@ def get_active_s8s_session_response(
 ).get_session(get_session_request)
 state = get_session_response.state
 except Exception as e:
-
+print(f"{session_name} Dataproc Session deleted: {e}")
 return None
 if state is not None and (
 state == Session.State.ACTIVE or state == Session.State.CREATING

dataproc_spark_connect-0.6.0.dist-info/METADATA DELETED
@@ -1,111 +0,0 @@
-Metadata-Version: 2.1
-Name: dataproc-spark-connect
-Version: 0.6.0
-Summary: Dataproc client library for Spark Connect
-Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
-Author: Google LLC
-License: Apache 2.0
-License-File: LICENSE
-Requires-Dist: google-api-core>=2.19.1
-Requires-Dist: google-cloud-dataproc>=5.18.0
-Requires-Dist: websockets
-Requires-Dist: pyspark[connect]>=3.5
-Requires-Dist: packaging>=20.0
-
-# Dataproc Spark Connect Client
-
-A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client with
-additional functionalities that allow applications to communicate with a remote Dataproc
-Spark cluster using the Spark Connect protocol without requiring additional steps.
-
-## Install
-
-```console
-pip install dataproc_spark_connect
-```
-
-## Uninstall
-
-```console
-pip uninstall dataproc_spark_connect
-```
-
-## Setup
-This client requires permissions to manage [Dataproc sessions and session templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
-If you are running the client outside of Google Cloud, you must set following environment variables:
-
-* GOOGLE_CLOUD_PROJECT - The Google Cloud project you use to run Spark workloads
-* GOOGLE_CLOUD_REGION - The Compute Engine [region](https://cloud.google.com/compute/docs/regions-zones#available) where you run the Spark workload.
-* GOOGLE_APPLICATION_CREDENTIALS - Your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
-* DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG (Optional) - The config location, such as `tests/integration/resources/session.textproto`
-
-## Usage
-
-1. Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:
-
-```console
-pip install google_cloud_dataproc --force-reinstall
-pip install dataproc_spark_connect --force-reinstall
-```
-
-2. Add the required import into your PySpark application or notebook:
-
-```python
-from google.cloud.dataproc_spark_connect import DataprocSparkSession
-```
-
-3. There are two ways to create a spark session,
-
-1. Start a Spark session using properties defined in `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG`:
-
-```python
-spark = DataprocSparkSession.builder.getOrCreate()
-```
-
-2. Start a Spark session with the following code instead of using a config file:
-
-```python
-from google.cloud.dataproc_v1 import SparkConnectConfig
-from google.cloud.dataproc_v1 import Session
-dataproc_session_config = Session()
-dataproc_session_config.spark_connect_session = SparkConnectConfig()
-dataproc_session_config.environment_config.execution_config.subnetwork_uri = "<subnet>"
-dataproc_session_config.runtime_config.version = '3.0'
-spark = DataprocSparkSession.builder.dataprocSessionConfig(dataproc_session_config).getOrCreate()
-```
-
-## Billing
-As this client runs the spark workload on Dataproc, your project will be billed as per [Dataproc Serverless Pricing](https://cloud.google.com/dataproc-serverless/pricing).
-This will happen even if you are running the client from a non-GCE instance.
-
-## Contributing
-### Building and Deploying SDK
-
-1. Install the requirements in virtual environment.
-
-```console
-pip install -r requirements-dev.txt
-```
-
-2. Build the code.
-
-```console
-python setup.py sdist bdist_wheel
-```
-
-3. Copy the generated `.whl` file to Cloud Storage. Use the version specified in the `setup.py` file.
-
-```sh
-VERSION=<version>
-gsutil cp dist/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>
-```
-
-4. Download the new SDK on Vertex, then uninstall the old version and install the new one.
-
-```sh
-%%bash
-export VERSION=<version>
-gsutil cp gs://<your_bucket_name>/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl .
-yes | pip uninstall dataproc_spark_connect
-pip install dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl
-```

{dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/LICENSE RENAMED
File without changes

{dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/WHEEL RENAMED
File without changes

{dataproc_spark_connect-0.6.0.dist-info → dataproc_spark_connect-0.7.0.dist-info}/top_level.txt RENAMED
File without changes