dataproc-spark-connect 0.8.3__py2.py3-none-any.whl → 1.0.0__py2.py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dataproc_spark_connect-1.0.0.dist-info/METADATA +200 -0
- dataproc_spark_connect-1.0.0.dist-info/RECORD +13 -0
- google/cloud/dataproc_spark_connect/client/core.py +5 -3
- google/cloud/dataproc_spark_connect/environment.py +101 -0
- google/cloud/dataproc_spark_connect/exceptions.py +1 -1
- google/cloud/dataproc_spark_connect/session.py +644 -76
- dataproc_spark_connect-0.8.3.dist-info/METADATA +0 -105
- dataproc_spark_connect-0.8.3.dist-info/RECORD +0 -12
- {dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/WHEEL +0 -0
- {dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/licenses/LICENSE +0 -0
- {dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/top_level.txt +0 -0
dataproc_spark_connect-0.8.3.dist-info/METADATA
@@ -1,105 +0,0 @@
-Metadata-Version: 2.4
-Name: dataproc-spark-connect
-Version: 0.8.3
-Summary: Dataproc client library for Spark Connect
-Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
-Author: Google LLC
-License: Apache 2.0
-License-File: LICENSE
-Requires-Dist: google-api-core>=2.19
-Requires-Dist: google-cloud-dataproc>=5.18
-Requires-Dist: packaging>=20.0
-Requires-Dist: pyspark[connect]~=3.5.1
-Requires-Dist: tqdm>=4.67
-Requires-Dist: websockets>=14.0
-Dynamic: author
-Dynamic: description
-Dynamic: home-page
-Dynamic: license
-Dynamic: license-file
-Dynamic: requires-dist
-Dynamic: summary
-
-# Dataproc Spark Connect Client
-
-A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/)
-client with additional functionalities that allow applications to communicate
-with a remote Dataproc Spark Session using the Spark Connect protocol without
-requiring additional steps.
-
-## Install
-
-```sh
-pip install dataproc_spark_connect
-```
-
-## Uninstall
-
-```sh
-pip uninstall dataproc_spark_connect
-```
-
-## Setup
-
-This client requires permissions to
-manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
-If you are running the client outside of Google Cloud, you must set following
-environment variables:
-
-* `GOOGLE_CLOUD_PROJECT` - The Google Cloud project you use to run Spark
-workloads
-* `GOOGLE_CLOUD_REGION` - The Compute
-Engine [region](https://cloud.google.com/compute/docs/regions-zones#available)
-where you run the Spark workload.
-* `GOOGLE_APPLICATION_CREDENTIALS` -
-Your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
-
-## Usage
-
-1. Install the latest version of Dataproc Python client and Dataproc Spark
-Connect modules:
-
-```sh
-pip install google_cloud_dataproc dataproc_spark_connect --force-reinstall
-```
-
-2. Add the required imports into your PySpark application or notebook and start
-a Spark session with the following code instead of using
-environment variables:
-
-```python
-from google.cloud.dataproc_spark_connect import DataprocSparkSession
-from google.cloud.dataproc_v1 import Session
-session_config = Session()
-session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
-session_config.runtime_config.version = '2.2'
-spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
-```
-
-## Developing
-
-For development instructions see [guide](DEVELOPING.md).
-
-## Contributing
-
-We'd love to accept your patches and contributions to this project. There are
-just a few small guidelines you need to follow.
-
-### Contributor License Agreement
-
-Contributions to this project must be accompanied by a Contributor License
-Agreement. You (or your employer) retain the copyright to your contribution;
-this simply gives us permission to use and redistribute your contributions as
-part of the project. Head over to <https://cla.developers.google.com> to see
-your current agreements on file or to sign a new one.
-
-You generally only need to submit a CLA once, so if you've already submitted one
-(even if it was for a different project), you probably don't need to do it
-again.
-
-### Code reviews
-
-All submissions, including submissions by project members, require review. We
-use GitHub pull requests for this purpose. Consult
-[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
-information on using pull requests.
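The Setup section of the removed 0.8.3 README names three environment variables that must be set when the client runs outside Google Cloud. A minimal sketch of a pre-flight check follows. The helper `missing_dataproc_env` is hypothetical and not part of the package; only the variable names come from the README text.

```python
import os

# Variable names as listed in the README's Setup section.
REQUIRED_ENV_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_REGION",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_dataproc_env(environ=None):
    """Return the required variables that are unset or empty.

    Hypothetical helper for illustration; the package itself may
    validate its configuration differently.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_dataproc_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running a check like this before building a session reports all missing variables in one pass rather than one per failed attempt.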
dataproc_spark_connect-0.8.3.dist-info/RECORD
@@ -1,12 +0,0 @@
-dataproc_spark_connect-0.8.3.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
-google/cloud/dataproc_spark_connect/__init__.py,sha256=dIqHNWVWWrSuRf26x11kX5e9yMKSHCtmI_GBj1-FDdE,1101
-google/cloud/dataproc_spark_connect/exceptions.py,sha256=WF-qdzgdofRwILCriIkjjsmjObZfF0P3Ecg4lv-Hmec,968
-google/cloud/dataproc_spark_connect/pypi_artifacts.py,sha256=gd-VMwiVP-EJuPp9Vf9Shx8pqps3oSKp0hBcSSZQS-A,1575
-google/cloud/dataproc_spark_connect/session.py,sha256=ZWoW9-otaCJnttPt7h9W3pmhHpdbQsAOl8ypOX3fVbo,33556
-google/cloud/dataproc_spark_connect/client/__init__.py,sha256=6hCNSsgYlie6GuVpc5gjFsPnyeMTScTpXSPYqp1fplY,615
-google/cloud/dataproc_spark_connect/client/core.py,sha256=m3oXTKBm3sBy6jhDu9GRecrxLb5CdEM53SgMlnJb6ag,4616
-google/cloud/dataproc_spark_connect/client/proxy.py,sha256=qUZXvVY1yn934vE6nlO495XUZ53AUx9O74a9ozkGI9U,8976
-dataproc_spark_connect-0.8.3.dist-info/METADATA,sha256=croGipnWGtSrd2NLyMCHrcVagYCk9yJ6cEOqCEAm-Qc,3465
-dataproc_spark_connect-0.8.3.dist-info/WHEEL,sha256=JNWh1Fm1UdwIQV075glCn4MVuCRs0sotJIq-J6rbxCU,109
-dataproc_spark_connect-0.8.3.dist-info/top_level.txt,sha256=_1QvSJIhFAGfxb79D6DhB7SUw2X6T4rwnz_LLrbcD3c,7
-dataproc_spark_connect-0.8.3.dist-info/RECORD,,
{dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/WHEEL
RENAMED
File without changes
{dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/licenses/LICENSE
RENAMED
File without changes
{dataproc_spark_connect-0.8.3.dist-info → dataproc_spark_connect-1.0.0.dist-info}/top_level.txt
RENAMED
File without changes