dataproc-spark-connect 0.2.1__py2.py3-none-any.whl → 0.7.0__py2.py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,119 +0,0 @@
- Metadata-Version: 2.1
- Name: dataproc-spark-connect
- Version: 0.2.1
- Summary: Dataproc client library for Spark Connect
- Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
- Author: Google LLC
- License: Apache 2.0
- License-File: LICENSE
- Requires-Dist: google-api-core>=2.19.1
- Requires-Dist: google-cloud-dataproc>=5.15.1
- Requires-Dist: wheel
- Requires-Dist: websockets
- Requires-Dist: pyspark>=3.5
- Requires-Dist: pandas
- Requires-Dist: pyarrow
-
- # Dataproc Spark Connect Client
-
- > ⚠️ **Warning:**
- The package `dataproc-spark-connect` has been renamed to `google-spark-connect`, and `dataproc-spark-connect` will no longer be updated.
- For help using `google-spark-connect`, see the [guide](https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python/blob/main/README.md).
-
-
- A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client with
- additional functionality that lets applications communicate with a remote Dataproc
- Spark cluster over the Spark Connect protocol without additional setup steps.
-
- ## Install
-
- .. code-block:: console
-
-     pip install dataproc_spark_connect
-
- ## Uninstall
-
- .. code-block:: console
-
-     pip uninstall dataproc_spark_connect
-
-
- ## Setup
- This client requires permissions to manage [Dataproc sessions and session templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
- If you are running the client outside of Google Cloud, you must set the following environment variables:
-
- * GOOGLE_CLOUD_PROJECT - The Google Cloud project you use to run Spark workloads.
- * GOOGLE_CLOUD_REGION - The Compute Engine [region](https://cloud.google.com/compute/docs/regions-zones#available) where you run the Spark workload.
- * GOOGLE_APPLICATION_CREDENTIALS - Your [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc).
- * DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG (Optional) - The session config location, such as `tests/integration/resources/session.textproto`.
-
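Outside Google Cloud, these variables can be exported in the shell before starting your Python session. A minimal sketch; the project, region, and paths below are placeholder values for illustration, not real resources:

```shell
# Placeholder values -- substitute your own project, region, and credentials path.
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_REGION="us-central1"
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"
# Optional: default session config picked up by DataprocSparkSession
export DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG="tests/integration/resources/session.textproto"
echo "Project: $GOOGLE_CLOUD_PROJECT, region: $GOOGLE_CLOUD_REGION"
```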
- ## Usage
-
- 1. Install the latest versions of the Dataproc Python client and Dataproc Spark Connect modules:
-
-    .. code-block:: console
-
-        pip install google_cloud_dataproc --force-reinstall
-        pip install dataproc_spark_connect --force-reinstall
-
- 2. Add the required import to your PySpark application or notebook:
-
-    .. code-block:: python
-
-        from google.cloud.dataproc_spark_connect import DataprocSparkSession
-
- 3. There are two ways to create a Spark session:
-
-    1. Start a Spark session using the properties defined in `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG`:
-
-       .. code-block:: python
-
-           spark = DataprocSparkSession.builder.getOrCreate()
-
-    2. Start a Spark session with the following code instead of using a config file:
-
-       .. code-block:: python
-
-           from google.cloud.dataproc_v1 import SparkConnectConfig
-           from google.cloud.dataproc_v1 import Session
-
-           dataproc_config = Session()
-           dataproc_config.spark_connect_session = SparkConnectConfig()
-           dataproc_config.environment_config.execution_config.subnetwork_uri = "<subnet>"
-           dataproc_config.runtime_config.version = '3.0'
-           spark = DataprocSparkSession.builder.dataprocConfig(dataproc_config).getOrCreate()
-
- ## Billing
- Because this client runs the Spark workload on Dataproc, your project will be billed according to [Dataproc Serverless pricing](https://cloud.google.com/dataproc-serverless/pricing).
- This applies even if you run the client from a machine that is not a Compute Engine instance.
-
- ## Contributing
- ### Building and Deploying SDK
-
- 1. Install the requirements in a virtual environment:
-
-    .. code-block:: console
-
-        pip install -r requirements.txt
-
- 2. Build the code:
-
-    .. code-block:: console
-
-        python setup.py sdist bdist_wheel
-
- 3. Copy the generated `.whl` file to Cloud Storage, using the version specified in the `setup.py` file:
-
-    .. code-block:: console
-
-        export VERSION=<version>
-        gsutil cp dist/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>
-
- 4. Download the new SDK on Vertex, then uninstall the old version and install the new one:
-
-    .. code-block:: console
-
-        %%bash
-        export VERSION=<version>
-        gsutil cp gs://<your_bucket_name>/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl .
-        yes | pip uninstall dataproc_spark_connect
-        pip install dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl
@@ -1,10 +0,0 @@
- google/cloud/dataproc_spark_connect/__init__.py,sha256=G5Gy26z7Z2u_7EUSYGfcBArKLNTWlSG5OmzuCYzcwjA,969
- google/cloud/dataproc_spark_connect/session.py,sha256=A42Wo87VSunG0D3sB-biWyNvU33WhI92mmrJbXI1oNo,23017
- google/cloud/dataproc_spark_connect/client/__init__.py,sha256=6hCNSsgYlie6GuVpc5gjFsPnyeMTScTpXSPYqp1fplY,615
- google/cloud/dataproc_spark_connect/client/core.py,sha256=7Wy6QwkcWxlHBdo4NsktJEknggPpGkx9F5CS5IpQ7iM,3630
- google/cloud/dataproc_spark_connect/client/proxy.py,sha256=ScrbaGsEvqi8wp4ngfD-T9K9mFHXBkVMZkTSr7mdNBs,8926
- dataproc_spark_connect-0.2.1.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
- dataproc_spark_connect-0.2.1.dist-info/METADATA,sha256=S4BrILUItFH_SFQ7fciFargIF5HQsSck4VF8zqFOGN4,4509
- dataproc_spark_connect-0.2.1.dist-info/WHEEL,sha256=OpXWERl2xLPRHTvd2ZXo_iluPEQd8uSbYkJ53NAER_Y,109
- dataproc_spark_connect-0.2.1.dist-info/top_level.txt,sha256=_1QvSJIhFAGfxb79D6DhB7SUw2X6T4rwnz_LLrbcD3c,7
- dataproc_spark_connect-0.2.1.dist-info/RECORD,,
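The `sha256=` values in the RECORD entries above follow the wheel format: each is the urlsafe-base64-encoded SHA-256 digest of the file's contents with the trailing `=` padding stripped (per PEP 376 / PEP 427). A minimal sketch of how such a hash is produced:

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Compute a wheel RECORD-style hash for the given file contents.

    RECORD entries use urlsafe base64 of the SHA-256 digest,
    with '=' padding removed (PEP 376 / PEP 427).
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


# Example with empty input (the well-known empty SHA-256 digest):
print(record_hash(b""))
# sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU
```

Tools that verify installed packages recompute this value for each listed file and compare it against the RECORD entry; the final `RECORD,,` line has no hash because the file cannot contain its own digest.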