dataproc-spark-connect 1.0.0rc6__tar.gz → 1.0.0rc7__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/PKG-INFO +66 -18
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/README.md +63 -16
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/PKG-INFO +66 -18
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/requires.txt +1 -1
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/session.py +100 -33
- dataproc_spark_connect-1.0.0rc7/setup.cfg +14 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/setup.py +2 -2
- dataproc_spark_connect-1.0.0rc6/setup.cfg +0 -7
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/LICENSE +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/SOURCES.txt +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/dependency_links.txt +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/top_level.txt +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/__init__.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/client/__init__.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/client/core.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/client/proxy.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/environment.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/exceptions.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/pypi_artifacts.py +0 -0
- {dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/pyproject.toml +0 -0
{dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/PKG-INFO

@@ -1,15 +1,16 @@
 Metadata-Version: 2.4
 Name: dataproc-spark-connect
-Version: 1.0.0rc6
+Version: 1.0.0rc7
 Summary: Dataproc client library for Spark Connect
 Home-page: https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python
 Author: Google LLC
 License: Apache 2.0
+Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: google-api-core>=2.19
 Requires-Dist: google-cloud-dataproc>=5.18
 Requires-Dist: packaging>=20.0
-Requires-Dist: pyspark
+Requires-Dist: pyspark-client~=4.0.0
 Requires-Dist: tqdm>=4.67
 Requires-Dist: websockets>=14.0
 Dynamic: author
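The dependency switch from `pyspark` to `pyspark-client~=4.0.0` is the most consequential metadata change: the client-only distribution ships the Spark Connect Python API without the full JVM-backed runtime. A quick post-upgrade sanity check, a sketch that assumes nothing beyond the distribution names appearing in this diff:

```python
# Report which of the relevant distributions are installed after upgrading.
from importlib.metadata import PackageNotFoundError, version

for dist in ("pyspark", "pyspark-client", "dataproc-spark-connect"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```

If the old full `pyspark` distribution is still present alongside `pyspark-client`, uninstalling it avoids ambiguity over which `pyspark` package gets imported.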
@@ -43,39 +44,86 @@ pip uninstall dataproc_spark_connect
 
 This client requires permissions to
 manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
-If you are running the client outside of Google Cloud, you must set following
-environment variables:
 
-
-
-
-
-
-
-
+If you are running the client outside of Google Cloud, you need to provide
+authentication credentials. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment
+variable to point to
+your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
+file.
+
+You can specify the project and region either via environment variables or directly
+in your code using the builder API:
+
+* Environment variables: `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_REGION`
+* Builder API: `.projectId()` and `.location()` methods (recommended)
 
 ## Usage
 
-1. Install the latest version of Dataproc
-   Connect modules:
+1. Install the latest version of Dataproc Spark Connect:
 
    ```sh
-   pip install
+   pip install -U dataproc-spark-connect
   ```
 
 2. Add the required imports into your PySpark application or notebook and start
-   a Spark session
-
+   a Spark session using the fluent API:
+
+   ```python
+   from google.cloud.dataproc_spark_connect import DataprocSparkSession
+   spark = DataprocSparkSession.builder.getOrCreate()
+   ```
+
+3. You can configure Spark properties using the `.config()` method:
+
+   ```python
+   from google.cloud.dataproc_spark_connect import DataprocSparkSession
+   spark = DataprocSparkSession.builder.config('spark.executor.memory', '4g').config('spark.executor.cores', '2').getOrCreate()
+   ```
+
+4. For advanced configuration, you can use the `Session` class to customize
+   settings like subnetwork or other environment configurations:
 
   ```python
   from google.cloud.dataproc_spark_connect import DataprocSparkSession
   from google.cloud.dataproc_v1 import Session
   session_config = Session()
   session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
-   session_config.runtime_config.version = '
-   spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
+   session_config.runtime_config.version = '3.0'
+   spark = DataprocSparkSession.builder.projectId('my-project').location('us-central1').dataprocSessionConfig(session_config).getOrCreate()
+   ```
+
+### Reusing Named Sessions Across Notebooks
+
+Named sessions allow you to share a single Spark session across multiple notebooks, improving efficiency by avoiding repeated session startup times and reducing costs.
+
+To create or connect to a named session:
+
+1. Create a session with a custom ID in your first notebook:
+
+   ```python
+   from google.cloud.dataproc_spark_connect import DataprocSparkSession
+   session_id = 'my-ml-pipeline-session'
+   spark = DataprocSparkSession.builder.dataprocSessionId(session_id).getOrCreate()
+   df = spark.createDataFrame([(1, 'data')], ['id', 'value'])
+   df.show()
+   ```
+
+2. Reuse the same session in another notebook by specifying the same session ID:
+
+   ```python
+   from google.cloud.dataproc_spark_connect import DataprocSparkSession
+   session_id = 'my-ml-pipeline-session'
+   spark = DataprocSparkSession.builder.dataprocSessionId(session_id).getOrCreate()
+   df = spark.createDataFrame([(2, 'more-data')], ['id', 'value'])
+   df.show()
   ```
 
+3. Session IDs must be 4-63 characters long, start with a lowercase letter, contain only lowercase letters, numbers, and hyphens, and not end with a hyphen.
+
+4. Named sessions persist until explicitly terminated or reach their configured TTL.
+
+5. A session with a given ID that is in a TERMINATED state cannot be reused. It must be deleted before a new session with the same ID can be created.
+
 ### Using Spark SQL Magic Commands (Jupyter Notebooks)
 
 The package supports the [sparksql-magic](https://github.com/cryeo/sparksql-magic) library for executing Spark SQL queries directly in Jupyter notebooks.
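Point 5 of the named-session notes implies a cleanup step the README does not spell out: deleting a TERMINATED named session so its ID can be reused. Below is a sketch of one way to do that with the `google-cloud-dataproc` Sessions API; the `SessionControllerClient` class, regional endpoint format, and resource-name pattern are assumptions to verify against your library version, and the project, region, and ID values are illustrative:

```python
# Delete a TERMINATED named session so its ID can be reused (hypothetical
# values; the client class and resource-name format are assumed, not taken
# from this diff).
from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"
session_id = "my-ml-pipeline-session"

client = dataproc_v1.SessionControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
name = f"projects/{project}/locations/{region}/sessions/{session_id}"
operation = client.delete_session(name=name)
operation.result()  # wait for deletion; afterwards the ID is free again
```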
{dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/README.md

@@ -21,39 +21,86 @@ pip uninstall dataproc_spark_connect
(identical to the README-body hunk shown above for PKG-INFO; README.md and the PKG-INFO long description carry the same content, offset by the metadata header)
{dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/dataproc_spark_connect.egg-info/PKG-INFO

@@ -1,15 +1,16 @@ and @@ -43,39 +44,86 @@
(identical to the PKG-INFO changes shown above; the egg-info copy is regenerated from the same metadata)
{dataproc_spark_connect-1.0.0rc6 → dataproc_spark_connect-1.0.0rc7}/google/cloud/dataproc_spark_connect/session.py

@@ -14,6 +14,7 @@
 
 import atexit
 import datetime
+import functools
 import json
 import logging
 import os
@@ -25,8 +26,6 @@ import time
 import uuid
 import tqdm
 from packaging import version
-from tqdm import tqdm as cli_tqdm
-from tqdm.notebook import tqdm as notebook_tqdm
 from types import MethodType
 from typing import Any, cast, ClassVar, Dict, Iterable, Optional, Union
 
@@ -67,6 +66,10 @@ SYSTEM_LABELS = {
     "goog-colab-notebook-id",
 }
 
+_DATAPROC_SESSIONS_BASE_URL = (
+    "https://console.cloud.google.com/dataproc/interactive"
+)
+
 
 def _is_valid_label_value(value: str) -> bool:
     """
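The new module-level constant replaces hard-coded console URLs at every call site changed below. How the pieces combine, with illustrative region, session ID, and project values:

```python
# Assemble the Dataproc console link the way the updated call sites do.
_DATAPROC_SESSIONS_BASE_URL = "https://console.cloud.google.com/dataproc/interactive"

region, session_id, project_id = "us-central1", "my-session", "my-project"
session_url = f"{_DATAPROC_SESSIONS_BASE_URL}/{region}/{session_id}?project={project_id}"
# -> https://console.cloud.google.com/dataproc/interactive/us-central1/my-session?project=my-project
```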
@@ -494,15 +497,21 @@ class DataprocSparkSession(SparkSession):
         )
 
     def _display_session_link_on_creation(self, session_id):
-        session_url = f"
+        session_url = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{session_id}?project={self._project_id}"
         plain_message = f"Creating Dataproc Session: {session_url}"
-        html_element = f"""
+        if environment.is_colab_enterprise():
+            html_element = f"""
             <div>
             <p>Creating Dataproc Spark Session<p>
-            <p><a href="{session_url}">Dataproc Session</a></p>
             </div>
-            """
-
+            """
+        else:
+            html_element = f"""
+            <div>
+            <p>Creating Dataproc Spark Session<p>
+            <p><a href="{session_url}">Dataproc Session</a></p>
+            </div>
+            """
         self._output_element_or_message(plain_message, html_element)
 
     def _print_session_created_message(self):
@@ -554,7 +563,7 @@ class DataprocSparkSession(SparkSession):
 
         if session_response is not None:
             print(
-                f"Using existing Dataproc Session (configuration changes may not be applied):
+                f"Using existing Dataproc Session (configuration changes may not be applied): {_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{s8s_session_id}?project={self._project_id}"
             )
             self._display_view_session_details_button(s8s_session_id)
         if session is None:
@@ -711,8 +720,6 @@ class DataprocSparkSession(SparkSession):
             # Merge default configs with existing properties,
             # user configs take precedence
             for k, v in {
-                "spark.datasource.bigquery.viewsEnabled": "true",
-                "spark.datasource.bigquery.writeMethod": "direct",
                 "spark.sql.catalog.spark_catalog": "com.google.cloud.spark.bigquery.BigQuerySparkSessionCatalog",
                 "spark.sql.sources.default": "bigquery",
             }.items():
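This hunk quietly drops two BigQuery defaults that new sessions previously received. If a workload depended on them, they can be set back explicitly; a sketch using the exact property names removed above:

```python
# Re-apply the removed BigQuery defaults per session. Per the comment in the
# diff, user-supplied configs take precedence over library defaults, so
# setting these explicitly is safe either way.
from google.cloud.dataproc_spark_connect import DataprocSparkSession

spark = (
    DataprocSparkSession.builder
    .config("spark.datasource.bigquery.viewsEnabled", "true")
    .config("spark.datasource.bigquery.writeMethod", "direct")
    .getOrCreate()
)
```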
@@ -734,7 +741,7 @@ class DataprocSparkSession(SparkSession):
 
         # Runtime version to server Python version mapping
         RUNTIME_PYTHON_MAP = {
-            "3.0": (3,
+            "3.0": (3, 12),
         }
 
         client_python = sys.version_info[:2]  # (major, minor)
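The completed mapping pins runtime `3.0` to server-side Python 3.12, which the surrounding code compares against the client interpreter. A standalone sketch of the same comparison (the warning text is illustrative, not the library's):

```python
# Compare the local interpreter against the server Python for a runtime.
import sys

RUNTIME_PYTHON_MAP = {"3.0": (3, 12)}  # runtime version -> server (major, minor)

runtime = "3.0"
client_python = sys.version_info[:2]
server_python = RUNTIME_PYTHON_MAP.get(runtime)
if server_python and client_python != server_python:
    print(
        f"client Python {client_python} differs from "
        f"server Python {server_python} for runtime {runtime}"
    )
```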
@@ -798,7 +805,7 @@ class DataprocSparkSession(SparkSession):
             return
 
         try:
-            session_url = f"
+            session_url = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{session_id}?project={self._project_id}"
             from IPython.core.interactiveshell import InteractiveShell
 
             if not InteractiveShell.initialized():
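The context lines show the pattern used here: rich output is gated on `InteractiveShell.initialized()`. A minimal standalone version of that guard, falling back to plain text outside IPython (the function name is mine, not the library's):

```python
def emit(plain_message: str, html_element: str) -> None:
    """Render HTML when an IPython shell is active, else print plain text."""
    try:
        from IPython.core.interactiveshell import InteractiveShell
        from IPython.display import HTML, display

        if InteractiveShell.initialized():
            display(HTML(html_element))
            return
    except ImportError:
        pass
    print(plain_message)
```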
@@ -981,6 +988,28 @@ class DataprocSparkSession(SparkSession):
             clearProgressHandlers_wrapper_method, self
         )
 
+    @staticmethod
+    @functools.lru_cache(maxsize=1)
+    def get_tqdm_bar():
+        """
+        Return a tqdm implementation that works in the current environment.
+
+        - Uses CLI tqdm for interactive terminals.
+        - Uses the notebook tqdm if available, otherwise falls back to CLI tqdm.
+        """
+        from tqdm import tqdm as cli_tqdm
+
+        if environment.is_interactive_terminal():
+            return cli_tqdm
+
+        try:
+            import ipywidgets
+            from tqdm.notebook import tqdm as notebook_tqdm
+
+            return notebook_tqdm
+        except ImportError:
+            return cli_tqdm
+
     def _register_progress_execution_handler(self):
         from pyspark.sql.connect.shell.progress import StageInfo
 
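Note the `@functools.lru_cache(maxsize=1)` on a zero-argument method: the environment is probed once, and every later progress update reuses the cached tqdm class. The same pattern in isolation, omitting the interactive-terminal branch for brevity:

```python
import functools

@functools.lru_cache(maxsize=1)
def pick_progress_bar():
    # Probe once; subsequent calls return the cached class immediately.
    try:
        import ipywidgets  # noqa: F401 -- presence check only
        from tqdm.notebook import tqdm as notebook_tqdm
        return notebook_tqdm
    except ImportError:
        from tqdm import tqdm as cli_tqdm
        return cli_tqdm

bar_cls = pick_progress_bar()
for _ in bar_cls(range(3)):
    pass
```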
@@ -1005,9 +1034,12 @@ class DataprocSparkSession(SparkSession):
                 total_tasks += stage.num_tasks
                 completed_tasks += stage.num_completed_tasks
 
-
-            if
-
+            # Don't show progress bar till we receive some tasks
+            if total_tasks == 0:
+                return
+
+            # Get correct tqdm (notebook or CLI)
+            tqdm_pbar = self.get_tqdm_bar()
 
             # Use a lock to ensure only one thread can access and modify
             # the shared dictionaries at a time.
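The two additions give the handler a sensible cold start: no bar is drawn until the server reports at least one task, and the bar class is resolved through the cached helper above. A compressed sketch of that flow; the real handler keeps lock-guarded shared state across callbacks, so this single-shot version is illustrative only:

```python
def handle_progress(stages, get_tqdm_bar):
    # StageInfo-style objects expose num_tasks / num_completed_tasks.
    total_tasks = sum(stage.num_tasks for stage in stages)
    completed_tasks = sum(stage.num_completed_tasks for stage in stages)

    # Don't show a progress bar till we receive some tasks.
    if total_tasks == 0:
        return

    tqdm_pbar = get_tqdm_bar()  # cached notebook-or-CLI choice
    with tqdm_pbar(total=total_tasks) as pbar:
        pbar.update(completed_tasks)
```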
@@ -1044,13 +1076,11 @@ class DataprocSparkSession(SparkSession):
     @staticmethod
     def _sql_lazy_transformation(req):
         # Select SQL command
-
-
-
-
-
-
-        return False
+        try:
+            query = req.plan.command.sql_command.input.sql.query
+            return "select" in query.strip().lower().split()
+        except AttributeError:
+            return False
 
     def _repr_html_(self) -> str:
         if not self._active_s8s_session_id:
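The rewritten predicate walks the Spark Connect request proto (`req.plan.command.sql_command.input.sql.query`) and tokenizes the statement rather than matching raw substrings. Why tokenizing matters, shown with the same check extracted into a plain function:

```python
def is_lazy_select(query: str) -> bool:
    # Splitting on whitespace means a word like "selected_items" is not
    # mistaken for the SELECT keyword.
    return "select" in query.strip().lower().split()

assert is_lazy_select("SELECT * FROM sales")
assert is_lazy_select("INSERT INTO t SELECT id FROM src")
assert not is_lazy_select("DROP TABLE selected_items")
```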
@@ -1058,7 +1088,7 @@ class DataprocSparkSession(SparkSession):
             <div>No Active Dataproc Session</div>
             """
 
-        s8s_session = f"
+        s8s_session = f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/{self._active_s8s_session_id}"
         ui = f"{s8s_session}/sparkApplications/applications"
         return f"""
         <div>
@@ -1085,7 +1115,7 @@ class DataprocSparkSession(SparkSession):
         )
 
         url = (
-            f"
+            f"{_DATAPROC_SESSIONS_BASE_URL}/{self._region}/"
             f"{self._active_s8s_session_id}/sparkApplications/application;"
            f"associatedSqlOperationId={operation_id}?project={self._project_id}"
        )
@@ -1177,20 +1207,52 @@ class DataprocSparkSession(SparkSession):
     def _get_active_session_file_path():
         return os.getenv("DATAPROC_SPARK_CONNECT_ACTIVE_SESSION_FILE_PATH")
 
-    def stop(self) -> None:
+    def stop(self, terminate: Optional[bool] = None) -> None:
+        """
+        Stop the Spark session and optionally terminate the server-side session.
+
+        Parameters
+        ----------
+        terminate : bool, optional
+            Control server-side termination behavior.
+
+            - None (default): Auto-detect based on session type
+
+              - Managed sessions (auto-generated ID): terminate server
+              - Named sessions (custom ID): client-side cleanup only
+
+            - True: Always terminate the server-side session
+            - False: Never terminate the server-side session (client cleanup only)
+
+        Examples
+        --------
+        Auto-detect termination behavior (existing behavior):
+
+        >>> spark.stop()
+
+        Force terminate a named session:
+
+        >>> spark.stop(terminate=True)
+
+        Prevent termination of a managed session:
+
+        >>> spark.stop(terminate=False)
+        """
         with DataprocSparkSession._lock:
             if DataprocSparkSession._active_s8s_session_id is not None:
-                #
-                if
-                #
-
-
-                        f"Stopping unmanaged session {DataprocSparkSession._active_s8s_session_id} without termination"
+                # Determine if we should terminate the server-side session
+                if terminate is None:
+                    # Auto-detect: managed sessions terminate, named sessions don't
+                    should_terminate = (
+                        not DataprocSparkSession._active_session_uses_custom_id
                    )
                else:
-
+                    should_terminate = terminate
+
+                if should_terminate:
+                    # Terminate the server-side session
                    logger.debug(
-                        f"Terminating
+                        f"Terminating session {DataprocSparkSession._active_s8s_session_id}"
                    )
                    terminate_s8s_session(
                        DataprocSparkSession._project_id,
@@ -1198,6 +1260,11 @@ class DataprocSparkSession(SparkSession):
                         DataprocSparkSession._active_s8s_session_id,
                         self._client_options,
                     )
+                else:
+                    # Client-side cleanup only
+                    logger.debug(
+                        f"Stopping session {DataprocSparkSession._active_s8s_session_id} without termination"
+                    )
 
             self._remove_stopped_session_from_file()
 
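Taken together, the two `stop()` hunks make session teardown explicit: the default preserves termination for managed sessions while leaving named sessions running, and the new flag overrides either way. A short workflow sketch grounded in the docstring above (the session ID is illustrative):

```python
from google.cloud.dataproc_spark_connect import DataprocSparkSession

# Named session: stop() defaults to client-side cleanup only, so the
# server-side session survives for other notebooks.
spark = DataprocSparkSession.builder.dataprocSessionId("shared-etl").getOrCreate()
spark.stop()

# Reattach later, then explicitly terminate the server-side session too.
spark = DataprocSparkSession.builder.dataprocSessionId("shared-etl").getOrCreate()
spark.stop(terminate=True)
```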
@@ -20,7 +20,7 @@ long_description = (this_directory / "README.md").read_text()
|
|
|
20
20
|
|
|
21
21
|
setup(
|
|
22
22
|
name="dataproc-spark-connect",
|
|
23
|
-
version="1.0.
|
|
23
|
+
version="1.0.0rc7",
|
|
24
24
|
description="Dataproc client library for Spark Connect",
|
|
25
25
|
long_description=long_description,
|
|
26
26
|
author="Google LLC",
|
|
@@ -31,7 +31,7 @@ setup(
|
|
|
31
31
|
"google-api-core>=2.19",
|
|
32
32
|
"google-cloud-dataproc>=5.18",
|
|
33
33
|
"packaging>=20.0",
|
|
34
|
-
"pyspark
|
|
34
|
+
"pyspark-client~=4.0.0",
|
|
35
35
|
"tqdm>=4.67",
|
|
36
36
|
"websockets>=14.0",
|
|
37
37
|
],
|
|