dataproc-spark-connect 0.9.0__py2.py3-none-any.whl → 1.0.0__py2.py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dataproc_spark_connect-1.0.0.dist-info/METADATA +200 -0
- dataproc_spark_connect-1.0.0.dist-info/RECORD +13 -0
- google/cloud/dataproc_spark_connect/client/core.py +5 -3
- google/cloud/dataproc_spark_connect/environment.py +25 -0
- google/cloud/dataproc_spark_connect/exceptions.py +1 -1
- google/cloud/dataproc_spark_connect/session.py +531 -86
- dataproc_spark_connect-0.9.0.dist-info/METADATA +0 -105
- dataproc_spark_connect-0.9.0.dist-info/RECORD +0 -13
- {dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/WHEEL +0 -0
- {dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/licenses/LICENSE +0 -0
- {dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/top_level.txt +0 -0
dataproc_spark_connect-1.0.0.dist-info/METADATA ADDED
@@ -0,0 +1,200 @@
+Metadata-Version: 2.4
+Name: dataproc-spark-connect
+Version: 1.0.0
+Summary: Dataproc client library for Spark Connect
+Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
+Author: Google LLC
+License: Apache 2.0
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: google-api-core>=2.19
+Requires-Dist: google-cloud-dataproc>=5.18
+Requires-Dist: packaging>=20.0
+Requires-Dist: pyspark-client~=4.0.0
+Requires-Dist: tqdm>=4.67
+Requires-Dist: websockets>=14.0
+Dynamic: author
+Dynamic: description
+Dynamic: home-page
+Dynamic: license
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: summary
+
+# Dataproc Spark Connect Client
+
+A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/)
+client with additional functionalities that allow applications to communicate
+with a remote Dataproc Spark Session using the Spark Connect protocol without
+requiring additional steps.
+
+## Install
+
+```sh
+pip install dataproc_spark_connect
+```
+
+## Uninstall
+
+```sh
+pip uninstall dataproc_spark_connect
+```
+
+## Setup
+
+This client requires permissions to
+manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
+
+If you are running the client outside of Google Cloud, you need to provide
+authentication credentials. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment
+variable to point to
+your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
+file.
+
+You can specify the project and region either via environment variables or directly
+in your code using the builder API:
+
+* Environment variables: `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_REGION`
+* Builder API: `.projectId()` and `.location()` methods (recommended)
+
+## Usage
+
+1. Install the latest version of Dataproc Spark Connect:
+
+```sh
+pip install -U dataproc-spark-connect
+```
+
+2. Add the required imports into your PySpark application or notebook and start
+a Spark session using the fluent API:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+spark = DataprocSparkSession.builder.getOrCreate()
+```
+
+3. You can configure Spark properties using the `.config()` method:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+spark = DataprocSparkSession.builder.config('spark.executor.memory', '4g').config('spark.executor.cores', '2').getOrCreate()
+```
+
+4. For advanced configuration, you can use the `Session` class to customize
+settings like subnetwork or other environment configurations:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+from google.cloud.dataproc_v1 import Session
+session_config = Session()
+session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
+session_config.runtime_config.version = '3.0'
+spark = DataprocSparkSession.builder.projectId('my-project').location('us-central1').dataprocSessionConfig(session_config).getOrCreate()
+```
+
+### Reusing Named Sessions Across Notebooks
+
+Named sessions allow you to share a single Spark session across multiple notebooks, improving efficiency by avoiding repeated session startup times and reducing costs.
+
+To create or connect to a named session:
+
+1. Create a session with a custom ID in your first notebook:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+session_id = 'my-ml-pipeline-session'
+spark = DataprocSparkSession.builder.dataprocSessionId(session_id).getOrCreate()
+df = spark.createDataFrame([(1, 'data')], ['id', 'value'])
+df.show()
+```
+
+2. Reuse the same session in another notebook by specifying the same session ID:
+
+```python
+from google.cloud.dataproc_spark_connect import DataprocSparkSession
+session_id = 'my-ml-pipeline-session'
+spark = DataprocSparkSession.builder.dataprocSessionId(session_id).getOrCreate()
+df = spark.createDataFrame([(2, 'more-data')], ['id', 'value'])
+df.show()
+```
+
+3. Session IDs must be 4-63 characters long, start with a lowercase letter, contain only lowercase letters, numbers, and hyphens, and not end with a hyphen.
+
+4. Named sessions persist until explicitly terminated or reach their configured TTL.
+
+5. A session with a given ID that is in a TERMINATED state cannot be reused. It must be deleted before a new session with the same ID can be created.
+
+### Using Spark SQL Magic Commands (Jupyter Notebooks)
+
+The package supports the [sparksql-magic](https://github.com/cryeo/sparksql-magic) library for executing Spark SQL queries directly in Jupyter notebooks.
+
+**Installation**: To use magic commands, install the required dependencies manually:
+```bash
+pip install dataproc-spark-connect
+pip install IPython sparksql-magic
+```
+
+1. Load the magic extension:
+```python
+%load_ext sparksql_magic
+```
+
+2. Configure default settings (optional):
+```python
+%config SparkSql.limit=20
+```
+
+3. Execute SQL queries:
+```python
+%%sparksql
+SELECT * FROM your_table
+```
+
+4. Advanced usage with options:
+```python
+# Cache results and create a view
+%%sparksql --cache --view result_view df
+SELECT * FROM your_table WHERE condition = true
+```
+
+Available options:
+- `--cache` / `-c`: Cache the DataFrame
+- `--eager` / `-e`: Cache with eager loading
+- `--view VIEW` / `-v VIEW`: Create a temporary view
+- `--limit N` / `-l N`: Override default row display limit
+- `variable_name`: Store result in a variable
+
+See [sparksql-magic](https://github.com/cryeo/sparksql-magic) for more examples.
+
+**Note**: Magic commands are optional. If you only need basic DataprocSparkSession functionality without Jupyter magic support, install only the base package:
+```bash
+pip install dataproc-spark-connect
+```
+
+## Developing
+
+For development instructions see [guide](DEVELOPING.md).
+
+## Contributing
+
+We'd love to accept your patches and contributions to this project. There are
+just a few small guidelines you need to follow.
+
+### Contributor License Agreement
+
+Contributions to this project must be accompanied by a Contributor License
+Agreement. You (or your employer) retain the copyright to your contribution;
+this simply gives us permission to use and redistribute your contributions as
+part of the project. Head over to <https://cla.developers.google.com> to see
+your current agreements on file or to sign a new one.
+
+You generally only need to submit a CLA once, so if you've already submitted one
+(even if it was for a different project), you probably don't need to do it
+again.
+
+### Code reviews
+
+All submissions, including submissions by project members, require review. We
+use GitHub pull requests for this purpose. Consult
+[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
+information on using pull requests.
dataproc_spark_connect-1.0.0.dist-info/RECORD ADDED
@@ -0,0 +1,13 @@
+dataproc_spark_connect-1.0.0.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+google/cloud/dataproc_spark_connect/__init__.py,sha256=dIqHNWVWWrSuRf26x11kX5e9yMKSHCtmI_GBj1-FDdE,1101
+google/cloud/dataproc_spark_connect/environment.py,sha256=o5WRKI1vyIaxZ8S2UhtDer6pdi4CXYRzI9Xdpq5hVkQ,2771
+google/cloud/dataproc_spark_connect/exceptions.py,sha256=iwaHgNabcaxqquOpktGkOWKHMf8hgdPQJUgRnIbTXVs,970
+google/cloud/dataproc_spark_connect/pypi_artifacts.py,sha256=gd-VMwiVP-EJuPp9Vf9Shx8pqps3oSKp0hBcSSZQS-A,1575
+google/cloud/dataproc_spark_connect/session.py,sha256=loEpKA2ssA89EqT9gWphmfPsZwfHjayxd97J2avdQMc,55890
+google/cloud/dataproc_spark_connect/client/__init__.py,sha256=6hCNSsgYlie6GuVpc5gjFsPnyeMTScTpXSPYqp1fplY,615
+google/cloud/dataproc_spark_connect/client/core.py,sha256=GRc4OCTBvIvdagjxOPoDO22vLtt8xDSerdREMRDeUBY,4659
+google/cloud/dataproc_spark_connect/client/proxy.py,sha256=qUZXvVY1yn934vE6nlO495XUZ53AUx9O74a9ozkGI9U,8976
+dataproc_spark_connect-1.0.0.dist-info/METADATA,sha256=HYCTM2juKp06uDL-9Ec1Ssu7tjBfnqX_LJ6bBjRjJjA,6838
+dataproc_spark_connect-1.0.0.dist-info/WHEEL,sha256=JNWh1Fm1UdwIQV075glCn4MVuCRs0sotJIq-J6rbxCU,109
+dataproc_spark_connect-1.0.0.dist-info/top_level.txt,sha256=_1QvSJIhFAGfxb79D6DhB7SUw2X6T4rwnz_LLrbcD3c,7
+dataproc_spark_connect-1.0.0.dist-info/RECORD,,
google/cloud/dataproc_spark_connect/client/core.py CHANGED
@@ -15,14 +15,14 @@ import logging
 
 import google
 import grpc
-from pyspark.sql.connect.client import
+from pyspark.sql.connect.client import DefaultChannelBuilder
 
 from . import proxy
 
 logger = logging.getLogger(__name__)
 
 
-class DataprocChannelBuilder(
+class DataprocChannelBuilder(DefaultChannelBuilder):
     """
     This is a helper class that is used to create a GRPC channel based on the given
     connection string per the documentation of Spark Connect.
@@ -88,7 +88,9 @@ class ProxiedChannel(grpc.Channel):
         self._proxy = proxy.DataprocSessionProxy(0, target_host)
         self._proxy.start()
         self._proxied_connect_url = f"sc://localhost:{self._proxy.port}"
-        self._wrapped =
+        self._wrapped = DefaultChannelBuilder(
+            self._proxied_connect_url
+        ).toChannel()
 
     def __enter__(self):
         return self
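The core.py change above swaps the Spark Connect client import for PySpark 4.0's `DefaultChannelBuilder` and builds the proxied gRPC channel through it. A minimal sketch of that pattern outside the diff; the `sc://localhost:15002` URL is a placeholder, not a value from the package:

```python
# Sketch only: build a gRPC channel from a Spark Connect URL, the way the new
# ProxiedChannel code does. The URL is illustrative.
from pyspark.sql.connect.client import DefaultChannelBuilder

connect_url = "sc://localhost:15002"  # assumed local Spark Connect endpoint
channel = DefaultChannelBuilder(connect_url).toChannel()  # returns a grpc.Channel
```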
google/cloud/dataproc_spark_connect/environment.py CHANGED
@@ -13,6 +13,7 @@
 # limitations under the License.
 
 import os
+import sys
 from typing import Callable, Tuple, List
 
 
@@ -46,6 +47,30 @@ def is_jetbrains_ide() -> bool:
     return "jetbrains" in os.getenv("TERMINAL_EMULATOR", "").lower()
 
 
+def is_interactive():
+    try:
+        from IPython import get_ipython
+
+        if get_ipython() is not None:
+            return True
+    except ImportError:
+        pass
+
+    return hasattr(sys, "ps1") or sys.flags.interactive
+
+
+def is_terminal():
+    return sys.stdin.isatty()
+
+
+def is_interactive_terminal():
+    return is_interactive() and is_terminal()
+
+
+def is_dataproc_batch() -> bool:
+    return os.getenv("DATAPROC_WORKLOAD_TYPE") == "batch"
+
+
 def get_client_environment_label() -> str:
     """
     Map current environment to a standardized client label.
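The new `environment` helpers above distinguish notebooks, interactive terminals, and non-interactive runs. A minimal sketch of how such checks can gate user-facing output, mirroring the way `session.py` uses them later in this diff (the function name and message text are illustrative):

```python
# Sketch: pick an output style based on the environment helpers added above.
from google.cloud.dataproc_spark_connect import environment


def show_status(message: str) -> None:
    if not environment.is_interactive():
        return  # batch jobs and plain scripts stay quiet
    if environment.is_interactive_terminal():
        print(message)  # interactive terminal: plain text
        return
    try:
        from IPython.display import HTML, display
        display(HTML(f"<div>{message}</div>"))  # notebook: rich HTML
    except ImportError:
        print(message)


show_status("Creating Dataproc Spark Session")
```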
google/cloud/dataproc_spark_connect/session.py CHANGED
@@ -14,6 +14,7 @@
 
 import atexit
 import datetime
+import functools
 import json
 import logging
 import os
@@ -24,8 +25,9 @@ import threading
 import time
 import uuid
 import tqdm
+from packaging import version
 from types import MethodType
-from typing import Any, cast, ClassVar, Dict, Optional, Union
+from typing import Any, cast, ClassVar, Dict, Iterable, Optional, Union
 
 from google.api_core import retry
 from google.api_core.client_options import ClientOptions
@@ -43,6 +45,7 @@ from google.cloud.dataproc_spark_connect.pypi_artifacts import PyPiArtifacts
 from google.cloud.dataproc_v1 import (
     AuthenticationConfig,
     CreateSessionRequest,
+    DeleteSessionRequest,
     GetSessionRequest,
     Session,
     SessionControllerClient,
@@ -63,6 +66,10 @@ SYSTEM_LABELS = {
     "goog-colab-notebook-id",
 }
 
+_DATAPROC_SESSIONS_BASE_URL = (
+    "https://console.cloud.google.com/dataproc/interactive"
+)
+
 
 def _is_valid_label_value(value: str) -> bool:
     """
@@ -84,6 +91,22 @@ def _is_valid_label_value(value: str) -> bool:
     return bool(re.match(pattern, value))
 
 
+def _is_valid_session_id(session_id: str) -> bool:
+    """
+    Validates if a string complies with Google Cloud session ID format.
+    - Must be 4-63 characters
+    - Only lowercase letters, numbers, and dashes are allowed
+    - Must start with a lowercase letter
+    - Cannot end with a dash
+    """
+    if not session_id:
+        return False
+
+    # The pattern is sufficient for validation and already enforces length constraints.
+    pattern = r"^[a-z][a-z0-9-]{2,61}[a-z0-9]$"
+    return bool(re.match(pattern, session_id))
+
+
 class DataprocSparkSession(SparkSession):
     """The entry point to programming Spark with the Dataset and DataFrame API.
 
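The `_is_valid_session_id` helper added above encodes the session ID rules that the new README also documents. A quick standalone check of what the same pattern accepts and rejects (re-implemented here for illustration, not imported from the package):

```python
# Sketch: the same regex as _is_valid_session_id in the hunk above.
import re

SESSION_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,61}[a-z0-9]$")

samples = {
    "my-ml-pipeline-session": True,   # valid
    "abc": False,                     # shorter than 4 characters
    "Session-1": False,               # uppercase is not allowed
    "ends-with-hyphen-": False,       # must not end with a hyphen
}
for session_id, expected in samples.items():
    assert bool(SESSION_ID_PATTERN.match(session_id)) is expected
```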
@@ -103,13 +126,16 @@ class DataprocSparkSession(SparkSession):
     ... ) # doctest: +SKIP
     """
 
-    _DEFAULT_RUNTIME_VERSION = "
+    _DEFAULT_RUNTIME_VERSION = "3.0"
+    _MIN_RUNTIME_VERSION = "3.0"
 
     _active_s8s_session_uuid: ClassVar[Optional[str]] = None
     _project_id = None
     _region = None
     _client_options = None
     _active_s8s_session_id: ClassVar[Optional[str]] = None
+    _active_session_uses_custom_id: ClassVar[bool] = False
+    _execution_progress_bar = dict()
 
     class Builder(SparkSession.Builder):
 
@@ -117,6 +143,7 @@ class DataprocSparkSession(SparkSession):
             self._options: Dict[str, Any] = {}
             self._channel_builder: Optional[DataprocChannelBuilder] = None
             self._dataproc_config: Optional[Session] = None
+            self._custom_session_id: Optional[str] = None
             self._project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
             self._region = os.getenv("GOOGLE_CLOUD_REGION")
             self._client_options = ClientOptions(
@@ -125,6 +152,18 @@ class DataprocSparkSession(SparkSession):
                     f"{self._region}-dataproc.googleapis.com",
                 )
             )
+            self._session_controller_client: Optional[
+                SessionControllerClient
+            ] = None
+
+        @property
+        def session_controller_client(self) -> SessionControllerClient:
+            """Get or create a SessionControllerClient instance."""
+            if self._session_controller_client is None:
+                self._session_controller_client = SessionControllerClient(
+                    client_options=self._client_options
+                )
+            return self._session_controller_client
 
         def projectId(self, project_id):
             self._project_id = project_id
@@ -138,6 +177,35 @@ class DataprocSparkSession(SparkSession):
             )
             return self
 
+        def dataprocSessionId(self, session_id: str):
+            """
+            Set a custom session ID for creating or reusing sessions.
+
+            The session ID must:
+            - Be 4-63 characters long
+            - Start with a lowercase letter
+            - Contain only lowercase letters, numbers, and hyphens
+            - Not end with a hyphen
+
+            Args:
+                session_id: The custom session ID to use
+
+            Returns:
+                This Builder instance for method chaining
+
+            Raises:
+                ValueError: If the session ID format is invalid
+            """
+            if not _is_valid_session_id(session_id):
+                raise ValueError(
+                    f"Invalid session ID: '{session_id}'. "
+                    "Session ID must be 4-63 characters, start with a lowercase letter, "
+                    "contain only lowercase letters, numbers, and hyphens, "
+                    "and not end with a hyphen."
+                )
+            self._custom_session_id = session_id
+            return self
+
         def dataprocSessionConfig(self, dataproc_config: Session):
             self._dataproc_config = dataproc_config
             for k, v in dataproc_config.runtime_config.properties.items():
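As the `dataprocSessionId` docstring above states, an ID that violates the format is rejected on the client before any API call is made. A brief illustration (both ID strings are made up):

```python
# Sketch: invalid IDs fail fast with ValueError; valid ones are kept for reuse.
from google.cloud.dataproc_spark_connect import DataprocSparkSession

builder = DataprocSparkSession.builder
try:
    builder.dataprocSessionId("Bad_ID")  # uppercase and underscore are rejected
except ValueError as error:
    print(error)

builder.dataprocSessionId("nightly-etl-session")  # accepted; returns the builder
```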
@@ -158,19 +226,6 @@ class DataprocSparkSession(SparkSession):
             self.dataproc_config.environment_config.execution_config.service_account = (
                 account
             )
-            # Automatically set auth type to SERVICE_ACCOUNT when service account is provided
-            # This overrides any env var setting to simplify user experience
-            self.dataproc_config.environment_config.execution_config.authentication_config.user_workload_authentication_type = (
-                AuthenticationConfig.AuthenticationType.SERVICE_ACCOUNT
-            )
-            return self
-
-        def authType(
-            self, auth_type: "AuthenticationConfig.AuthenticationType"
-        ):
-            self.dataproc_config.environment_config.execution_config.authentication_config.user_workload_authentication_type = (
-                auth_type
-            )
             return self
 
         def subnetwork(self, subnet: str):
@@ -181,10 +236,7 @@ class DataprocSparkSession(SparkSession):
 
         def ttl(self, duration: datetime.timedelta):
             """Set the time-to-live (TTL) for the session using a timedelta object."""
-            self.
-                "seconds": int(duration.total_seconds())
-            }
-            return self
+            return self.ttlSeconds(int(duration.total_seconds()))
 
         def ttlSeconds(self, seconds: int):
             """Set the time-to-live (TTL) for the session in seconds."""
@@ -195,10 +247,7 @@ class DataprocSparkSession(SparkSession):
 
         def idleTtl(self, duration: datetime.timedelta):
             """Set the idle time-to-live (idle TTL) for the session using a timedelta object."""
-            self.
-                "seconds": int(duration.total_seconds())
-            }
-            return self
+            return self.idleTtlSeconds(int(duration.total_seconds()))
 
         def idleTtlSeconds(self, seconds: int):
             """Set the idle time-to-live (idle TTL) for the session in seconds."""
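With the change above, `ttl()` and `idleTtl()` no longer build the duration dict themselves; they convert the `timedelta` and delegate to the seconds-based setters. A short illustration of the equivalence (the two-hour value is arbitrary):

```python
# Sketch: both calls configure the same 2-hour session TTL.
import datetime
from google.cloud.dataproc_spark_connect import DataprocSparkSession

DataprocSparkSession.builder.ttl(datetime.timedelta(hours=2))
DataprocSparkSession.builder.ttlSeconds(7200)  # == int(timedelta(hours=2).total_seconds())
```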
@@ -266,7 +315,11 @@ class DataprocSparkSession(SparkSession):
             assert self._channel_builder is not None
             session = DataprocSparkSession(connection=self._channel_builder)
 
+            # Register handler for Cell Execution Progress bar
+            session._register_progress_execution_handler()
+
             DataprocSparkSession._set_default_and_active_session(session)
+
             return session
 
         def __create(self) -> "DataprocSparkSession":
@@ -281,7 +334,16 @@ class DataprocSparkSession(SparkSession):
 
             dataproc_config: Session = self._get_dataproc_config()
 
-
+            # Check runtime version compatibility before creating session
+            self._check_runtime_compatibility(dataproc_config)
+
+            # Use custom session ID if provided, otherwise generate one
+            session_id = (
+                self._custom_session_id
+                if self._custom_session_id
+                else self.generate_dataproc_session_id()
+            )
+
             dataproc_config.name = f"projects/{self._project_id}/locations/{self._region}/sessions/{session_id}"
             logger.debug(
                 f"Dataproc Session configuration:\n{dataproc_config}"
@@ -296,6 +358,10 @@ class DataprocSparkSession(SparkSession):
 
             logger.debug("Creating Dataproc Session")
             DataprocSparkSession._active_s8s_session_id = session_id
+            # Track whether this session uses a custom ID (unmanaged) or auto-generated ID (managed)
+            DataprocSparkSession._active_session_uses_custom_id = (
+                self._custom_session_id is not None
+            )
             s8s_creation_start_time = time.time()
 
             stop_create_session_pbar_event = threading.Event()
@@ -386,6 +452,7 @@ class DataprocSparkSession(SparkSession):
                 if create_session_pbar_thread.is_alive():
                     create_session_pbar_thread.join()
                 DataprocSparkSession._active_s8s_session_id = None
+                DataprocSparkSession._active_session_uses_custom_id = False
                 raise DataprocSparkConnectException(
                     f"Error while creating Dataproc Session: {e.message}"
                 )
@@ -394,6 +461,7 @@ class DataprocSparkSession(SparkSession):
                 if create_session_pbar_thread.is_alive():
                     create_session_pbar_thread.join()
                 DataprocSparkSession._active_s8s_session_id = None
+                DataprocSparkSession._active_session_uses_custom_id = False
                 raise RuntimeError(
                     f"Error while creating Dataproc Session"
                 ) from e
@@ -407,16 +475,43 @@ class DataprocSparkSession(SparkSession):
                 session_response, dataproc_config.name
             )
 
+        def _wait_for_session_available(
+            self, session_name: str, timeout: int = 300
+        ) -> Session:
+            start_time = time.time()
+            while time.time() - start_time < timeout:
+                try:
+                    session = self.session_controller_client.get_session(
+                        name=session_name
+                    )
+                    if "Spark Connect Server" in session.runtime_info.endpoints:
+                        return session
+                    time.sleep(5)
+                except Exception as e:
+                    logger.warning(
+                        f"Error while polling for Spark Connect endpoint: {e}"
+                    )
+                    time.sleep(5)
+            raise RuntimeError(
+                f"Spark Connect endpoint not available for session {session_name} after {timeout} seconds."
+            )
+
         def _display_session_link_on_creation(self, session_id):
-            session_url = f"
+            session_url = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{session_id}?project={self._project_id}"
             plain_message = f"Creating Dataproc Session: {session_url}"
-
+            if environment.is_colab_enterprise():
+                html_element = f"""
                 <div>
                 <p>Creating Dataproc Spark Session<p>
-                <p><a href="{session_url}">Dataproc Session</a></p>
                 </div>
-
-
+                """
+            else:
+                html_element = f"""
+                <div>
+                <p>Creating Dataproc Spark Session<p>
+                <p><a href="{session_url}">Dataproc Session</a></p>
+                </div>
+                """
             self._output_element_or_message(plain_message, html_element)
 
         def _print_session_created_message(self):
@@ -435,16 +530,19 @@ class DataprocSparkSession(SparkSession):
             :param html_element: HTML element to display for interactive IPython
             environment
             """
+            # Don't print any output (Rich or Plain) for non-interactive
+            if not environment.is_interactive():
+                return
+
+            if environment.is_interactive_terminal():
+                print(plain_message)
+                return
+
             try:
                 from IPython.display import display, HTML
-                from IPython.core.interactiveshell import InteractiveShell
 
-                if not InteractiveShell.initialized():
-                    raise DataprocSparkConnectException(
-                        "Not in an Interactive IPython Environment"
-                    )
                 display(HTML(html_element))
-            except
+            except ImportError:
                 print(plain_message)
 
         def _get_exiting_active_session(
@@ -465,10 +563,13 @@ class DataprocSparkSession(SparkSession):
 
             if session_response is not None:
                 print(
-                    f"Using existing Dataproc Session (configuration changes may not be applied):
+                    f"Using existing Dataproc Session (configuration changes may not be applied): {_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{s8s_session_id}?project={self._project_id}"
                 )
                 self._display_view_session_details_button(s8s_session_id)
                 if session is None:
+                    session_response = self._wait_for_session_available(
+                        session_name
+                    )
                     session = self.__create_spark_connect_session_from_s8s(
                         session_response, session_name
                     )
@@ -484,11 +585,54 @@ class DataprocSparkSession(SparkSession):
 
         def getOrCreate(self) -> "DataprocSparkSession":
             with DataprocSparkSession._lock:
+                if environment.is_dataproc_batch():
+                    # For Dataproc batch workloads, connect to the already initialized local SparkSession
+                    from pyspark.sql import SparkSession as PySparkSQLSession
+
+                    session = PySparkSQLSession.builder.getOrCreate()
+                    return session  # type: ignore
+
+                if self._project_id is None:
+                    raise DataprocSparkConnectException(
+                        f"Error while creating Dataproc Session: project ID is not set"
+                    )
+
+                if self._region is None:
+                    raise DataprocSparkConnectException(
+                        f"Error while creating Dataproc Session: location is not set"
+                    )
+
+                # Handle custom session ID by setting it early and letting existing logic handle it
+                if self._custom_session_id:
+                    self._handle_custom_session_id()
+
                 session = self._get_exiting_active_session()
                 if session is None:
                     session = self.__create()
+
+                # Register this session as the instantiated SparkSession for compatibility
+                # with tools and libraries that expect SparkSession._instantiatedSession
+                from pyspark.sql import SparkSession as PySparkSQLSession
+
+                PySparkSQLSession._instantiatedSession = session
+
                 return session
 
+        def _handle_custom_session_id(self):
+            """Handle custom session ID by checking if it exists and setting _active_s8s_session_id."""
+            session_response = self._get_session_by_id(self._custom_session_id)
+            if session_response is not None:
+                # Found an active session with the custom ID, set it as the active session
+                DataprocSparkSession._active_s8s_session_id = (
+                    self._custom_session_id
+                )
+                # Mark that this session uses a custom ID
+                DataprocSparkSession._active_session_uses_custom_id = True
+            else:
+                # No existing session found, clear any existing active session ID
+                # so we'll create a new one with the custom ID
+                DataprocSparkSession._active_s8s_session_id = None
+
         def _get_dataproc_config(self):
             # Use the property to ensure we always have a config
             dataproc_config = self.dataproc_config
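The reworked `getOrCreate` above requires a project ID and location and raises `DataprocSparkConnectException` when either is missing. A minimal sketch of satisfying those checks through environment variables; the project and region values are placeholders, and valid application default credentials are assumed:

```python
# Sketch: supply project and region via the environment instead of the builder.
import os
from google.cloud.dataproc_spark_connect import DataprocSparkSession

os.environ.setdefault("GOOGLE_CLOUD_PROJECT", "my-project")   # placeholder
os.environ.setdefault("GOOGLE_CLOUD_REGION", "us-central1")   # placeholder

# Without either value, getOrCreate() raises DataprocSparkConnectException.
spark = DataprocSparkSession.builder.getOrCreate()
```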
@@ -506,20 +650,33 @@ class DataprocSparkSession(SparkSession):
             self._check_python_version_compatibility(
                 dataproc_config.runtime_config.version
             )
+
+            # Use local variable to improve readability of deeply nested attribute access
+            exec_config = dataproc_config.environment_config.execution_config
+
+            # Set service account from environment if not already set
             if (
-                not
-                and "DATAPROC_SPARK_CONNECT_AUTH_TYPE" in os.environ
-            ):
-                dataproc_config.environment_config.execution_config.authentication_config.user_workload_authentication_type = AuthenticationConfig.AuthenticationType[
-                    os.getenv("DATAPROC_SPARK_CONNECT_AUTH_TYPE")
-                ]
-            if (
-                not dataproc_config.environment_config.execution_config.service_account
+                not exec_config.service_account
                 and "DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT" in os.environ
             ):
-
+                exec_config.service_account = os.getenv(
                     "DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT"
                 )
+
+            # Auto-set authentication type to SERVICE_ACCOUNT when service account is provided
+            if exec_config.service_account:
+                # When service account is provided, explicitly set auth type to SERVICE_ACCOUNT
+                exec_config.authentication_config.user_workload_authentication_type = (
+                    AuthenticationConfig.AuthenticationType.SERVICE_ACCOUNT
+                )
+            elif (
+                not exec_config.authentication_config.user_workload_authentication_type
+                and "DATAPROC_SPARK_CONNECT_AUTH_TYPE" in os.environ
+            ):
+                # Only set auth type from environment if no service account is present
+                exec_config.authentication_config.user_workload_authentication_type = AuthenticationConfig.AuthenticationType[
+                    os.getenv("DATAPROC_SPARK_CONNECT_AUTH_TYPE")
+                ]
             if (
                 not dataproc_config.environment_config.execution_config.subnetwork_uri
                 and "DATAPROC_SPARK_CONNECT_SUBNET" in os.environ
@@ -568,27 +725,23 @@ class DataprocSparkSession(SparkSession):
             default_datasource = os.getenv(
                 "DATAPROC_SPARK_CONNECT_DEFAULT_DATASOURCE"
            )
-
-
-
-
-
-            bq_datasource_properties = {
-                "spark.datasource.bigquery.viewsEnabled": "true",
-                "spark.datasource.bigquery.writeMethod": "direct",
+            match default_datasource:
+                case "bigquery":
+                    # Merge default configs with existing properties,
+                    # user configs take precedence
+                    for k, v in {
                         "spark.sql.catalog.spark_catalog": "com.google.cloud.spark.bigquery.BigQuerySparkSessionCatalog",
-                "spark.sql.legacy.createHiveTableByDefault": "false",
                         "spark.sql.sources.default": "bigquery",
-            }
-            # Merge default configs with existing properties, user configs take precedence
-            for k, v in bq_datasource_properties.items():
+                    }.items():
                         if k not in dataproc_config.runtime_config.properties:
                             dataproc_config.runtime_config.properties[k] = v
+                case _:
+                    if default_datasource:
+                        logger.warning(
+                            f"DATAPROC_SPARK_CONNECT_DEFAULT_DATASOURCE is set to an invalid value:"
+                            f" {default_datasource}. Supported value is 'bigquery'."
+                        )
+
             return dataproc_config
 
         def _check_python_version_compatibility(self, runtime_version):
@@ -598,9 +751,7 @@ class DataprocSparkSession(SparkSession):
 
             # Runtime version to server Python version mapping
             RUNTIME_PYTHON_MAP = {
-                "
-                "2.2": (3, 12),
-                "2.3": (3, 11),
+                "3.0": (3, 12),
             }
 
             client_python = sys.version_info[:2]  # (major, minor)
@@ -617,9 +768,54 @@ class DataprocSparkSession(SparkSession):
                     stacklevel=3,
                 )
 
+        def _check_runtime_compatibility(self, dataproc_config):
+            """Check if runtime version 3.0 client is compatible with older runtime versions.
+
+            Runtime version 3.0 clients do not support older runtime versions (pre-3.0).
+            There is no backward or forward compatibility between different runtime versions.
+
+            Args:
+                dataproc_config: The Session configuration containing runtime version
+
+            Raises:
+                DataprocSparkConnectException: If server is using pre-3.0 runtime version
+            """
+            runtime_version = dataproc_config.runtime_config.version
+
+            if not runtime_version:
+                return
+
+            logger.debug(f"Detected server runtime version: {runtime_version}")
+
+            # Parse runtime version to check if it's below minimum supported version
+            try:
+                server_version = version.parse(runtime_version)
+                min_version = version.parse(
+                    DataprocSparkSession._MIN_RUNTIME_VERSION
+                )
+
+                if server_version < min_version:
+                    raise DataprocSparkConnectException(
+                        f"Specified {runtime_version} Dataproc Runtime version is not supported, "
+                        f"use {DataprocSparkSession._MIN_RUNTIME_VERSION} version or higher."
+                    )
+            except version.InvalidVersion:
+                # If we can't parse the version, log a warning but continue
+                logger.warning(
+                    f"Could not parse runtime version: {runtime_version}"
+                )
+
         def _display_view_session_details_button(self, session_id):
+            # Display button is only supported in colab enterprise
+            if not environment.is_colab_enterprise():
+                return
+
+            # Skip button display for colab enterprise IPython terminals
+            if environment.is_interactive_terminal():
+                return
+
             try:
-                session_url = f"
+                session_url = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{session_id}?project={self._project_id}"
                 from IPython.core.interactiveshell import InteractiveShell
 
                 if not InteractiveShell.initialized():
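The `_check_runtime_compatibility` method above leans on `packaging.version` for the comparison and treats unparseable versions as a warning rather than an error. The same idea as a standalone sketch (the version strings are examples):

```python
# Sketch: reject server runtime versions older than the minimum this client supports.
from packaging import version

MIN_RUNTIME_VERSION = "3.0"


def is_supported(runtime_version: str) -> bool:
    try:
        return version.parse(runtime_version) >= version.parse(MIN_RUNTIME_VERSION)
    except version.InvalidVersion:
        return True  # mirrors the diff: unparseable versions only log a warning


assert is_supported("3.0")
assert not is_supported("2.3")  # the client raises DataprocSparkConnectException here
```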
@@ -633,6 +829,90 @@ class DataprocSparkSession(SparkSession):
         except ImportError as e:
             logger.debug(f"Import error: {e}")
 
+        def _get_session_by_id(self, session_id: str) -> Optional[Session]:
+            """
+            Get existing session by ID.
+
+            Returns:
+                Session if ACTIVE/CREATING, None if not found or not usable
+            """
+            session_name = f"projects/{self._project_id}/locations/{self._region}/sessions/{session_id}"
+
+            try:
+                get_request = GetSessionRequest(name=session_name)
+                session = self.session_controller_client.get_session(
+                    get_request
+                )
+
+                logger.debug(
+                    f"Found existing session {session_id} in state: {session.state}"
+                )
+
+                if session.state in [
+                    Session.State.ACTIVE,
+                    Session.State.CREATING,
+                ]:
+                    # Reuse the active session
+                    logger.info(f"Reusing existing session: {session_id}")
+                    return session
+                else:
+                    # Session exists but is not usable (terminated/failed/terminating)
+                    logger.info(
+                        f"Session {session_id} in {session.state.name} state, cannot reuse"
+                    )
+                    return None
+
+            except NotFound:
+                # Session doesn't exist, can create new one
+                logger.debug(
+                    f"Session {session_id} not found, can create new one"
+                )
+                return None
+            except Exception as e:
+                logger.error(f"Error checking session {session_id}: {e}")
+                return None
+
+        def _delete_session(self, session_name: str):
+            """Delete a session to free up the session ID for reuse."""
+            try:
+                delete_request = DeleteSessionRequest(name=session_name)
+                self.session_controller_client.delete_session(delete_request)
+                logger.debug(f"Deleted session: {session_name}")
+            except NotFound:
+                logger.debug(f"Session already deleted: {session_name}")
+
+        def _wait_for_termination(self, session_name: str, timeout: int = 180):
+            """Wait for a session to finish terminating."""
+            start_time = time.time()
+
+            while time.time() - start_time < timeout:
+                try:
+                    get_request = GetSessionRequest(name=session_name)
+                    session = self.session_controller_client.get_session(
+                        get_request
+                    )
+
+                    if session.state in [
+                        Session.State.TERMINATED,
+                        Session.State.FAILED,
+                    ]:
+                        return
+                    elif session.state != Session.State.TERMINATING:
+                        # Session is in unexpected state
+                        logger.warning(
+                            f"Session {session_name} in unexpected state while waiting for termination: {session.state}"
+                        )
+                        return
+
+                    time.sleep(2)
+                except NotFound:
+                    # Session was deleted
+                    return
+
+            logger.warning(
+                f"Timeout waiting for session {session_name} to terminate"
+            )
+
         @staticmethod
         def generate_dataproc_session_id():
             timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
@@ -706,16 +986,111 @@ class DataprocSparkSession(SparkSession):
             execute_and_fetch_as_iterator_wrapped_method, self.client
         )
 
+        # Patching clearProgressHandlers method to not remove Dataproc Progress Handler
+        clearProgressHandlers_base_method = self.clearProgressHandlers
+
+        def clearProgressHandlers_wrapper_method(_, *args, **kwargs):
+            clearProgressHandlers_base_method(*args, **kwargs)
+
+            self._register_progress_execution_handler()
+
+        self.clearProgressHandlers = MethodType(
+            clearProgressHandlers_wrapper_method, self
+        )
+
+    @staticmethod
+    @functools.lru_cache(maxsize=1)
+    def get_tqdm_bar():
+        """
+        Return a tqdm implementation that works in the current environment.
+
+        - Uses CLI tqdm for interactive terminals.
+        - Uses the notebook tqdm if available, otherwise falls back to CLI tqdm.
+        """
+        from tqdm import tqdm as cli_tqdm
+
+        if environment.is_interactive_terminal():
+            return cli_tqdm
+
+        try:
+            import ipywidgets
+            from tqdm.notebook import tqdm as notebook_tqdm
+
+            return notebook_tqdm
+        except ImportError:
+            return cli_tqdm
+
+    def _register_progress_execution_handler(self):
+        from pyspark.sql.connect.shell.progress import StageInfo
+
+        def handler(
+            stages: Optional[Iterable[StageInfo]],
+            inflight_tasks: int,
+            operation_id: Optional[str],
+            done: bool,
+        ):
+            if operation_id is None:
+                return
+
+            # Don't build / render progress bar for non-interactive (despite
+            # Ipython or non-IPython)
+            if not environment.is_interactive():
+                return
+
+            total_tasks = 0
+            completed_tasks = 0
+
+            for stage in stages or []:
+                total_tasks += stage.num_tasks
+                completed_tasks += stage.num_completed_tasks
+
+            # Don't show progress bar till we receive some tasks
+            if total_tasks == 0:
+                return
+
+            # Get correct tqdm (notebook or CLI)
+            tqdm_pbar = self.get_tqdm_bar()
+
+            # Use a lock to ensure only one thread can access and modify
+            # the shared dictionaries at a time.
+            with self._lock:
+                if operation_id in self._execution_progress_bar:
+                    pbar = self._execution_progress_bar[operation_id]
+                    if pbar.total != total_tasks:
+                        pbar.reset(
+                            total=total_tasks
+                        )  # This force resets the progress bar % too on next refresh
+                else:
+                    pbar = tqdm_pbar(
+                        total=total_tasks,
+                        leave=True,
+                        dynamic_ncols=True,
+                        bar_format="{l_bar}{bar} {n_fmt}/{total_fmt} Tasks",
+                    )
+                    self._execution_progress_bar[operation_id] = pbar
+
+                # To handle skipped or failed tasks.
+                # StageInfo proto doesn't have skipped and failed tasks information to process.
+                if done and completed_tasks < total_tasks:
+                    completed_tasks = total_tasks
+
+                pbar.n = completed_tasks
+                pbar.refresh()
+
+                if done:
+                    pbar.close()
+                    self._execution_progress_bar.pop(operation_id, None)
+
+        self.registerProgressHandler(handler)
+
     @staticmethod
     def _sql_lazy_transformation(req):
         # Select SQL command
-
-
-
-
-
-
-        return False
+        try:
+            query = req.plan.command.sql_command.input.sql.query
+            return "select" in query.strip().lower().split()
+        except AttributeError:
+            return False
 
     def _repr_html_(self) -> str:
         if not self._active_s8s_session_id:
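The handler registered above feeds Spark Connect's progress callbacks into a tqdm bar keyed by operation ID. A trimmed sketch of the same hook with a plain-text handler instead of tqdm; it assumes an already created Spark Connect session named `spark` on PySpark 4.x:

```python
# Sketch: a minimal progress handler; the code in the diff drives tqdm instead.
def print_progress(stages, inflight_tasks, operation_id, done):
    total = sum(stage.num_tasks for stage in stages or [])
    completed = sum(stage.num_completed_tasks for stage in stages or [])
    if total:
        state = "done" if done else "running"
        print(f"[{operation_id}] {completed}/{total} tasks ({state})")


spark.registerProgressHandler(print_progress)  # `spark` is assumed to exist
spark.sql("SELECT count(*) FROM range(1000000)").show()
spark.clearProgressHandlers()
```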
@@ -723,7 +1098,7 @@ class DataprocSparkSession(SparkSession):
             <div>No Active Dataproc Session</div>
             """
 
-        s8s_session = f"
+        s8s_session = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{self._active_s8s_session_id}"
         ui = f"{s8s_session}/sparkApplications/applications"
         return f"""
         <div>
@@ -735,6 +1110,11 @@ class DataprocSparkSession(SparkSession):
         """
 
     def _display_operation_link(self, operation_id: str):
+        # Don't print per-operation Spark UI link for non-interactive (despite
+        # Ipython or non-IPython)
+        if not environment.is_interactive():
+            return
+
         assert all(
             [
                 operation_id is not None,
@@ -745,17 +1125,18 @@ class DataprocSparkSession(SparkSession):
         )
 
         url = (
-            f"
+            f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/"
             f"{self._active_s8s_session_id}/sparkApplications/application;"
             f"associatedSqlOperationId={operation_id}?project={self._project_id}"
         )
 
+        if environment.is_interactive_terminal():
+            print(f"Spark Query: {url}")
+            return
+
         try:
             from IPython.display import display, HTML
-            from IPython.core.interactiveshell import InteractiveShell
 
-            if not InteractiveShell.initialized():
-                return
             html_element = f"""
             <div>
             <p><a href="{url}">Spark Query</a> (Operation: {operation_id})</p>
@@ -813,7 +1194,7 @@ class DataprocSparkSession(SparkSession):
         This is an API dedicated to Spark Connect client only. With regular Spark Session, it throws
         an exception.
         Regarding pypi: Popular packages are already pre-installed in s8s runtime.
-        https://cloud.google.com/dataproc-serverless/docs/concepts/versions/spark-runtime-2.
+        https://cloud.google.com/dataproc-serverless/docs/concepts/versions/spark-runtime-2.3#python_libraries
         If there are conflicts/package doesn't exist, it throws an exception.
         """
         if sum([pypi, file, pyfile, archive]) > 1:
@@ -836,19 +1217,83 @@ class DataprocSparkSession(SparkSession):
     def _get_active_session_file_path():
         return os.getenv("DATAPROC_SPARK_CONNECT_ACTIVE_SESSION_FILE_PATH")
 
-    def stop(self) -> None:
+    def stop(self, terminate: Optional[bool] = None) -> None:
+        """
+        Stop the Spark session and optionally terminate the server-side session.
+
+        Parameters
+        ----------
+        terminate : bool, optional
+            Control server-side termination behavior.
+
+            - None (default): Auto-detect based on session type
+
+              - Managed sessions (auto-generated ID): terminate server
+              - Named sessions (custom ID): client-side cleanup only
+
+            - True: Always terminate the server-side session
+            - False: Never terminate the server-side session (client cleanup only)
+
+        Examples
+        --------
+        Auto-detect termination behavior (existing behavior):
+
+        >>> spark.stop()
+
+        Force terminate a named session:
+
+        >>> spark.stop(terminate=True)
+
+        Prevent termination of a managed session:
+
+        >>> spark.stop(terminate=False)
+        """
         with DataprocSparkSession._lock:
             if DataprocSparkSession._active_s8s_session_id is not None:
-
-
-
-
-
-
+                # Determine if we should terminate the server-side session
+                if terminate is None:
+                    # Auto-detect: managed sessions terminate, named sessions don't
+                    should_terminate = (
+                        not DataprocSparkSession._active_session_uses_custom_id
+                    )
+                else:
+                    should_terminate = terminate
+
+                if should_terminate:
+                    # Terminate the server-side session
+                    logger.debug(
+                        f"Terminating session {DataprocSparkSession._active_s8s_session_id}"
+                    )
+                    terminate_s8s_session(
+                        DataprocSparkSession._project_id,
+                        DataprocSparkSession._region,
+                        DataprocSparkSession._active_s8s_session_id,
+                        self._client_options,
+                    )
+                else:
+                    # Client-side cleanup only
+                    logger.debug(
+                        f"Stopping session {DataprocSparkSession._active_s8s_session_id} without termination"
+                    )
 
             self._remove_stopped_session_from_file()
+
+            # Clean up SparkSession._instantiatedSession if it points to this session
+            try:
+                from pyspark.sql import SparkSession as PySparkSQLSession
+
+                if PySparkSQLSession._instantiatedSession is self:
+                    PySparkSQLSession._instantiatedSession = None
+                    logger.debug(
+                        "Cleared SparkSession._instantiatedSession reference"
+                    )
+            except (ImportError, AttributeError):
+                # PySpark not available or _instantiatedSession doesn't exist
+                pass
+
             DataprocSparkSession._active_s8s_session_uuid = None
             DataprocSparkSession._active_s8s_session_id = None
+            DataprocSparkSession._active_session_uses_custom_id = False
             DataprocSparkSession._project_id = None
             DataprocSparkSession._region = None
             DataprocSparkSession._client_options = None
dataproc_spark_connect-0.9.0.dist-info/METADATA DELETED
@@ -1,105 +0,0 @@
-Metadata-Version: 2.4
-Name: dataproc-spark-connect
-Version: 0.9.0
-Summary: Dataproc client library for Spark Connect
-Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
-Author: Google LLC
-License: Apache 2.0
-License-File: LICENSE
-Requires-Dist: google-api-core>=2.19
-Requires-Dist: google-cloud-dataproc>=5.18
-Requires-Dist: packaging>=20.0
-Requires-Dist: pyspark[connect]~=3.5.1
-Requires-Dist: tqdm>=4.67
-Requires-Dist: websockets>=14.0
-Dynamic: author
-Dynamic: description
-Dynamic: home-page
-Dynamic: license
-Dynamic: license-file
-Dynamic: requires-dist
-Dynamic: summary
-
-# Dataproc Spark Connect Client
-
-A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/)
-client with additional functionalities that allow applications to communicate
-with a remote Dataproc Spark Session using the Spark Connect protocol without
-requiring additional steps.
-
-## Install
-
-```sh
-pip install dataproc_spark_connect
-```
-
-## Uninstall
-
-```sh
-pip uninstall dataproc_spark_connect
-```
-
-## Setup
-
-This client requires permissions to
-manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
-If you are running the client outside of Google Cloud, you must set following
-environment variables:
-
-* `GOOGLE_CLOUD_PROJECT` - The Google Cloud project you use to run Spark
-workloads
-* `GOOGLE_CLOUD_REGION` - The Compute
-Engine [region](https://cloud.google.com/compute/docs/regions-zones#available)
-where you run the Spark workload.
-* `GOOGLE_APPLICATION_CREDENTIALS` -
-Your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
-
-## Usage
-
-1. Install the latest version of Dataproc Python client and Dataproc Spark
-Connect modules:
-
-```sh
-pip install google_cloud_dataproc dataproc_spark_connect --force-reinstall
-```
-
-2. Add the required imports into your PySpark application or notebook and start
-a Spark session with the following code instead of using
-environment variables:
-
-```python
-from google.cloud.dataproc_spark_connect import DataprocSparkSession
-from google.cloud.dataproc_v1 import Session
-session_config = Session()
-session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
-session_config.runtime_config.version = '2.2'
-spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
-```
-
-## Developing
-
-For development instructions see [guide](DEVELOPING.md).
-
-## Contributing
-
-We'd love to accept your patches and contributions to this project. There are
-just a few small guidelines you need to follow.
-
-### Contributor License Agreement
-
-Contributions to this project must be accompanied by a Contributor License
-Agreement. You (or your employer) retain the copyright to your contribution;
-this simply gives us permission to use and redistribute your contributions as
-part of the project. Head over to <https://cla.developers.google.com> to see
-your current agreements on file or to sign a new one.
-
-You generally only need to submit a CLA once, so if you've already submitted one
-(even if it was for a different project), you probably don't need to do it
-again.
-
-### Code reviews
-
-All submissions, including submissions by project members, require review. We
-use GitHub pull requests for this purpose. Consult
-[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
-information on using pull requests.
dataproc_spark_connect-0.9.0.dist-info/RECORD DELETED
@@ -1,13 +0,0 @@
-dataproc_spark_connect-0.9.0.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
-google/cloud/dataproc_spark_connect/__init__.py,sha256=dIqHNWVWWrSuRf26x11kX5e9yMKSHCtmI_GBj1-FDdE,1101
-google/cloud/dataproc_spark_connect/environment.py,sha256=UICy9XyqAxL-cryVWx7GZPRAxoir5LKk0dtqqY_l--c,2307
-google/cloud/dataproc_spark_connect/exceptions.py,sha256=WF-qdzgdofRwILCriIkjjsmjObZfF0P3Ecg4lv-Hmec,968
-google/cloud/dataproc_spark_connect/pypi_artifacts.py,sha256=gd-VMwiVP-EJuPp9Vf9Shx8pqps3oSKp0hBcSSZQS-A,1575
-google/cloud/dataproc_spark_connect/session.py,sha256=ELj5hDhofK1967eE5YaG_LP5B80KWFQWJn5gxi9yYt0,38577
-google/cloud/dataproc_spark_connect/client/__init__.py,sha256=6hCNSsgYlie6GuVpc5gjFsPnyeMTScTpXSPYqp1fplY,615
-google/cloud/dataproc_spark_connect/client/core.py,sha256=m3oXTKBm3sBy6jhDu9GRecrxLb5CdEM53SgMlnJb6ag,4616
-google/cloud/dataproc_spark_connect/client/proxy.py,sha256=qUZXvVY1yn934vE6nlO495XUZ53AUx9O74a9ozkGI9U,8976
-dataproc_spark_connect-0.9.0.dist-info/METADATA,sha256=1z8Ag1P_Lh9db0Rk9nGFoOu6sdeRs0UlrgtOqN_OhIQ,3465
-dataproc_spark_connect-0.9.0.dist-info/WHEEL,sha256=JNWh1Fm1UdwIQV075glCn4MVuCRs0sotJIq-J6rbxCU,109
-dataproc_spark_connect-0.9.0.dist-info/top_level.txt,sha256=_1QvSJIhFAGfxb79D6DhB7SUw2X6T4rwnz_LLrbcD3c,7
-dataproc_spark_connect-0.9.0.dist-info/RECORD,,
{dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/WHEEL RENAMED
File without changes
{dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/licenses/LICENSE RENAMED
File without changes
{dataproc_spark_connect-0.9.0.dist-info → dataproc_spark_connect-1.0.0.dist-info}/top_level.txt RENAMED
File without changes