spark-nlp 5.2.2.tar.gz → 5.3.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of spark-nlp might be problematic.
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/PKG-INFO +89 -82
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/README.md +88 -81
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/setup.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/spark_nlp.egg-info/PKG-INFO +89 -82
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/spark_nlp.egg-info/SOURCES.txt +5 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/__init__.py +2 -2
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/__init__.py +4 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/bert_for_zero_shot_classification.py +3 -3
- spark-nlp-5.3.0/sparknlp/annotator/classifier_dl/deberta_for_zero_shot_classification.py +206 -0
- spark-nlp-5.3.0/sparknlp/annotator/classifier_dl/mpnet_for_question_answering.py +148 -0
- spark-nlp-5.3.0/sparknlp/annotator/classifier_dl/mpnet_for_sequence_classification.py +188 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/er/entity_ruler.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/seq2seq/__init__.py +2 -0
- spark-nlp-5.3.0/sparknlp/annotator/seq2seq/llama2_transformer.py +343 -0
- spark-nlp-5.3.0/sparknlp/annotator/seq2seq/m2m100_transformer.py +392 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/similarity/document_similarity_ranker.py +19 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/tf_ner_dl_graph_builder.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/finisher.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/light_pipeline.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/multi_document_assembler.py +1 -1
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/__init__.py +33 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/com/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/com/johnsnowlabs/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/com/johnsnowlabs/nlp/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/setup.cfg +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/spark_nlp.egg-info/dependency_links.txt +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/spark_nlp.egg-info/top_level.txt +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotation.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotation_audio.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotation_image.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/audio/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/audio/hubert_for_ctc.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/audio/wav2vec2_for_ctc.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/audio/whisper_for_ctc.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/chunk2_doc.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/chunker.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/albert_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/albert_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/albert_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/bart_for_zero_shot_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/bert_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/bert_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/bert_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/camembert_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/camembert_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/camembert_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/classifier_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/deberta_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/deberta_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/deberta_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/distil_bert_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/distil_bert_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/distil_bert_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/distil_bert_for_zero_shot_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/longformer_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/longformer_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/longformer_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/multi_classifier_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/roberta_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/roberta_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/roberta_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/roberta_for_zero_shot_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/sentiment_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/tapas_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlm_roberta_for_question_answering.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlm_roberta_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlm_roberta_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlm_roberta_for_zero_shot_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlnet_for_sequence_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/classifier_dl/xlnet_for_token_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/coref/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/coref/spanbert_coref.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/clip_for_zero_shot_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/convnext_for_image_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/swin_for_image_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/vision_encoder_decoder_for_image_captioning.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/cv/vit_for_image_classification.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/date2_chunk.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/dependency/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/dependency/dependency_parser.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/dependency/typed_dependency_parser.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/document_character_text_splitter.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/document_normalizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/document_token_splitter.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/document_token_splitter_test.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/albert_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/bert_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/bert_sentence_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/bge_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/camembert_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/chunk_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/deberta_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/distil_bert_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/doc2vec.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/e5_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/elmo_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/instructor_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/longformer_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/mpnet_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/roberta_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/roberta_sentence_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/sentence_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/universal_sentence_encoder.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/word2vec.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/word_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/xlm_roberta_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/xlm_roberta_sentence_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/embeddings/xlnet_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/er/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/graph_extraction.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/keyword_extraction/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/keyword_extraction/yake_keyword_extraction.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ld_dl/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ld_dl/language_detector_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/lemmatizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/big_text_matcher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/date_matcher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/multi_date_matcher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/regex_matcher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/matcher/text_matcher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/n_gram_generator.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/ner_approach.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/ner_converter.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/ner_crf.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/ner_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/ner_overwriter.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ner/zero_shot_ner_model.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/normalizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/openai/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/openai/openai_completion.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/openai/openai_embeddings.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/param/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/param/classifier_encoder.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/param/evaluation_dl_params.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/pos/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/pos/perceptron.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentence/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentence/sentence_detector.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentence/sentence_detector_dl.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentiment/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentiment/sentiment_detector.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/sentiment/vivekn_sentiment.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/seq2seq/bart_transformer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/seq2seq/gpt2_transformer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/seq2seq/marian_transformer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/seq2seq/t5_transformer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/similarity/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/spell_check/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/spell_check/context_spell_checker.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/spell_check/norvig_sweeting.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/spell_check/symmetric_delete.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/stemmer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/stop_words_cleaner.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token/chunk_tokenizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token/recursive_tokenizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token/regex_tokenizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token/tokenizer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/token2_chunk.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ws/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/annotator/ws/word_segmenter.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/audio_assembler.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/doc2_chunk.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/document_assembler.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/embeddings_finisher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/graph_finisher.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/has_recursive_fit.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/has_recursive_transform.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/image_assembler.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/recursive_pipeline.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/table_assembler.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/base/token_assembler.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/annotator_approach.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/annotator_model.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/annotator_properties.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/annotator_type.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/coverage_result.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/match_strategy.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/properties.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/read_as.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/recursive_annotator_approach.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/storage.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/common/utils.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/functions.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/annotator_java_ml.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/annotator_transformer.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/extended_java_wrapper.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/params_getters_setters.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/internal/recursive.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/logging/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/logging/comet.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/pretrained/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/pretrained/pretrained_pipeline.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/pretrained/resource_downloader.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/pretrained/utils.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/graph_builders.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/create_graph.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/dataset_encoder.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/ner_model.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/ner_model_saver.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/ner_dl/sentence_grouper.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/core_rnn_cell.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/fused_rnn_cell.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/gru_ops.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/lstm_ops.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/rnn.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders/tf2contrib/rnn_cell.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/graph_builders.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/__init__.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/create_graph.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/dataset_encoder.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model_saver.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/_tf_graph_builders_1x/ner_dl/sentence_grouper.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/conll.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/conllu.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/pos.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/pub_tator.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/spacy_to_annotation.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/training/tfgraphs.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/upload_to_hub.py +0 -0
- {spark-nlp-5.2.2 → spark-nlp-5.3.0}/sparknlp/util.py +0 -0
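
Five new Python annotator modules ship in this release: deberta_for_zero_shot_classification.py, mpnet_for_question_answering.py, mpnet_for_sequence_classification.py, llama2_transformer.py, and m2m100_transformer.py. As a minimal sketch of how one of them slots into a pipeline — the class name is inferred from the new file name, and the no-argument `pretrained()` default is an assumption to verify against the 5.3.0 documentation and Models Hub:

```python
# Hedged sketch: MPNet extractive question answering in Spark NLP 5.3.0.
# Mirrors the established *ForQuestionAnswering pattern; verify the class
# and pretrained model names against the official 5.3.0 docs.
from pyspark.ml import Pipeline
import sparknlp
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import MPNetForQuestionAnswering

spark = sparknlp.start()

# Question answering annotators take a (question, context) pair of documents.
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

span_classifier = MPNetForQuestionAnswering.pretrained() \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline(stages=[document_assembler, span_classifier])

data = spark.createDataFrame(
    [["What is my name?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

pipeline.fit(data).transform(data).select("answer.result").show(truncate=False)
```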
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: spark-nlp
-Version: 5.2.2
+Version: 5.3.0
 Summary: John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
 Home-page: https://github.com/JohnSnowLabs/spark-nlp
 Author: John Snow Labs
@@ -51,10 +51,10 @@ Description-Content-Type: text/markdown
 
 Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
 environment.
-Spark NLP comes with **
+Spark NLP comes with **36000+** pretrained **pipelines** and **models** in more than **200+** languages.
 It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
 
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
 
 ## Project's website
 
@@ -143,42 +143,34 @@ documentation and examples
 - BERT Sentence Embeddings (TF Hub & HuggingFace models)
 - RoBerta Sentence Embeddings (HuggingFace models)
 - XLM-RoBerta Sentence Embeddings (HuggingFace models)
--
+- INSTRUCTOR Embeddings (HuggingFace models)
 - E5 Embeddings (HuggingFace models)
 - MPNet Embeddings (HuggingFace models)
 - OpenAI Embeddings
-- Sentence Embeddings
-- Chunk Embeddings
+- Sentence & Chunk Embeddings
 - Unsupervised keywords extraction
 - Language Detection & Identification (up to 375 languages)
-- Multi-class Sentiment analysis (Deep learning)
-- Multi-label Sentiment analysis (Deep learning)
+- Multi-class & Multi-label Sentiment analysis (Deep learning)
 - Multi-class Text Classification (Deep learning)
-- BERT for Token & Sequence Classification
-- DistilBERT for Token & Sequence Classification
-- CamemBERT for Token & Sequence Classification
-- ALBERT for Token & Sequence Classification
-- RoBERTa for Token & Sequence Classification
-- DeBERTa for Token & Sequence Classification
-- XLM-RoBERTa for Token & Sequence Classification
+- BERT for Token & Sequence Classification & Question Answering
+- DistilBERT for Token & Sequence Classification & Question Answering
+- CamemBERT for Token & Sequence Classification & Question Answering
+- ALBERT for Token & Sequence Classification & Question Answering
+- RoBERTa for Token & Sequence Classification & Question Answering
+- DeBERTa for Token & Sequence Classification & Question Answering
+- XLM-RoBERTa for Token & Sequence Classification & Question Answering
+- Longformer for Token & Sequence Classification & Question Answering
+- MPNet for Token & Sequence Classification & Question Answering
 - XLNet for Token & Sequence Classification
-- Longformer for Token & Sequence Classification
-- BERT for Token & Sequence Classification
-- BERT for Question Answering
-- CamemBERT for Question Answering
-- DistilBERT for Question Answering
-- ALBERT for Question Answering
-- RoBERTa for Question Answering
-- DeBERTa for Question Answering
-- XLM-RoBERTa for Question Answering
-- Longformer for Question Answering
-- Table Question Answering (TAPAS)
 - Zero-Shot NER Model
 - Zero-Shot Text Classification by Transformers (ZSL)
 - Neural Machine Translation (MarianMT)
+- Many-to-Many multilingual translation model (Facebook M2M100)
+- Table Question Answering (TAPAS)
 - Text-To-Text Transfer Transformer (Google T5)
 - Generative Pre-trained Transformer 2 (OpenAI GPT2)
 - Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+- Chat and Conversational LLMs (Facebook Llama-2)
 - Vision Transformer (Google ViT)
 - Swin Image Classification (Microsoft Swin Transformer)
 - ConvNext Image Classification (Facebook ConvNext)
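Among the feature-list additions in the hunk above, the M2M100 many-to-many translation model and Llama-2 are new in 5.3.0. A hedged sketch of the M2M100 annotator, with the class name taken from the new seq2seq/m2m100_transformer.py module and the setter names assumed from Spark NLP's seq2seq conventions (verify against the 5.3.0 API reference):

```python
# Hedged sketch: many-to-many translation with the new M2M100 annotator.
from pyspark.ml import Pipeline
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import M2M100Transformer

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Source and target languages are set explicitly for M2M100-style models.
m2m100 = M2M100Transformer.pretrained() \
    .setInputCols(["documents"]) \
    .setSrcLang("fr") \
    .setTgtLang("en") \
    .setOutputCol("generation")

pipeline = Pipeline(stages=[document_assembler, m2m100])
data = spark.createDataFrame([["La vie est belle."]]).toDF("text")
pipeline.fit(data).transform(data).select("generation.result").show(truncate=False)
```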
@@ -191,7 +183,7 @@ documentation and examples
 - Easy ONNX and TensorFlow integrations
 - GPU Support
 - Full integration with Spark ML functions
-- +
+- +30000 pre-trained models in +200 languages!
 - +6000 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
   Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
@@ -205,7 +197,7 @@ To use Spark NLP you need the following requirements:
 
 **GPU (optional):**
 
-Spark NLP 5.
+Spark NLP 5.3.0 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The following minimum NVIDIA® software is required for GPU support:
 
 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
@@ -221,7 +213,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.
+$ pip install spark-nlp==5.3.0 pyspark==3.3.1
 ```
 
 In Python console or Jupyter `Python3` kernel:
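The quick-start hunk above stops at the README's prompt line; the unchanged lines that follow it start a session the standard way. Shown here for convenience (standard Spark NLP usage, not part of this diff):

```python
# The standard quick-start the elided README lines continue with:
import sparknlp

spark = sparknlp.start()   # starts Spark with the matching spark-nlp package
print(sparknlp.version())  # e.g. '5.3.0'
print(spark.version)
```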
@@ -266,11 +258,12 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh
 
 ## Apache Spark Support
 
-Spark NLP *5.
+Spark NLP *5.3.0* has been built on top of Apache Spark 3.4 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
 
 | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
-| 5.
+| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
+| 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
 | 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO |
 | 5.0.x | YES | YES | YES | YES | YES | YES | NO | NO |
 | 4.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
@@ -291,6 +284,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
 
 | Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
 |-----------|------------|------------|------------|------------|------------|------------|------------|
+| 5.3.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.2.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.1.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.0.x | NO | YES | YES | YES | YES | NO | YES |
@@ -308,7 +302,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
 
 ## Databricks Support
 
-Spark NLP 5.2.2 has been tested and is compatible with the following runtimes:
+Spark NLP 5.3.0 has been tested and is compatible with the following runtimes:
 
 **CPU:**
 
@@ -350,6 +344,10 @@ Spark NLP 5.2.2 has been tested and is compatible with the following runtimes:
 - 14.0 ML
 - 14.1
 - 14.1 ML
+- 14.2
+- 14.2 ML
+- 14.3
+- 14.3 ML
 
 **GPU:**
 
@@ -372,10 +370,12 @@ Spark NLP 5.2.2 has been tested and is compatible with the following runtimes:
 - 13.3 ML & GPU
 - 14.0 ML & GPU
 - 14.1 ML & GPU
+- 14.2 ML & GPU
+- 14.3 ML & GPU
 
 ## EMR Support
 
-Spark NLP 5.2.2 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.3.0 has been tested and is compatible with the following EMR releases:
 
 - emr-6.2.0
 - emr-6.3.0
@@ -391,8 +391,11 @@ Spark NLP 5.2.2 has been tested and is compatible with the following EMR release
 - emr-6.12.0
 - emr-6.13.0
 - emr-6.14.0
+- emr-6.15.0
+- emr-7.0.0
 
 Full list of [Amazon EMR 6.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html)
+Full list of [Amazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html)
 
 NOTE: The EMR 6.1.0 and 6.1.1 are not supported.
 
@@ -422,11 +425,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
 ```sh
 # CPU
 
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 The `spark-nlp` has been published to
@@ -435,11 +438,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
 ```sh
 # GPU
 
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0
 
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0
 
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0
 
 ```
 
@@ -449,11 +452,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
 ```sh
 # AArch64
 
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0
 
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0
 
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0
 
 ```
 
@@ -463,11 +466,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
 ```sh
 # M1/M2 (Apple Silicon)
 
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0
 
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0
 
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0
 
 ```
 
@@ -481,7 +484,7 @@ set in your SparkSession:
 spark-shell \
   --driver-memory 16g \
   --conf spark.kryoserializer.buffer.max=2000M \
-  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 ## Scala
@@ -499,7 +502,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.2.2</version>
+    <version>5.3.0</version>
 </dependency>
 ```
 
@@ -510,7 +513,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.2.2</version>
+    <version>5.3.0</version>
 </dependency>
 ```
 
@@ -521,7 +524,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.2.2</version>
+    <version>5.3.0</version>
 </dependency>
 ```
 
@@ -532,7 +535,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.2.2</version>
+    <version>5.3.0</version>
 </dependency>
 ```
 
@@ -542,28 +545,28 @@ coordinates:
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.2.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.3.0"
 ```
 
 **spark-nlp-gpu:**
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.2.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.3.0"
 ```
 
 **spark-nlp-aarch64:**
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.2.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.3.0"
 ```
 
 **spark-nlp-silicon:**
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.2.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.3.0"
 ```
 
 Maven
@@ -585,7 +588,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
 Pip:
 
 ```bash
-pip install spark-nlp==5.2.2
+pip install spark-nlp==5.3.0
 ```
 
 Conda:
@@ -614,7 +617,7 @@ spark = SparkSession.builder
     .config("spark.driver.memory", "16G")
     .config("spark.driver.maxResultSize", "0")
    .config("spark.kryoserializer.buffer.max", "2000M")
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0")
     .getOrCreate()
 ```
 
@@ -685,7 +688,7 @@ Use either one of the following options
 - Add the following Maven Coordinates to the interpreter's library list
 
 ```bash
-com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is
@@ -696,7 +699,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
 Apart from the previous step, install the python module through pip
 
 ```bash
-pip install spark-nlp==5.2.2
+pip install spark-nlp==5.3.0
 ```
 
 Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -724,7 +727,7 @@ launch the Jupyter from the same Python environment:
 $ conda create -n sparknlp python=3.8 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.
+$ pip install spark-nlp==5.3.0 pyspark==3.3.1 jupyter
 $ jupyter notebook
 ```
 
@@ -741,7 +744,7 @@ export PYSPARK_PYTHON=python3
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook
 
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -768,7 +771,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
 # -s is for spark-nlp
 # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
 # by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.2.2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.0
 ```
 
 [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)
@@ -791,7 +794,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
 # -s is for spark-nlp
 # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
 # by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.2.2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.0
 ```
 
 [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live
@@ -810,9 +813,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP
 
 3. In `Libraries` tab inside your cluster you need to follow these steps:
 
-    3.1. Install New -> PyPI -> `spark-nlp==5.2.2` -> Install
+    3.1. Install New -> PyPI -> `spark-nlp==5.3.0` -> Install
 
-    3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2` -> Install
+    3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0` -> Install
 
 4. Now you can attach your notebook to the cluster and use Spark NLP!
 
@@ -863,7 +866,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
         "spark.kryoserializer.buffer.max": "2000M",
         "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
         "spark.driver.maxResultSize": "0",
-        "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2"
+        "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0"
     }
 }]
 ```
@@ -872,7 +875,7 @@ A sample of AWS CLI to launch EMR cluster:
 
 ```.sh
 aws emr create-cluster \
---name "Spark NLP 5.2.2" \
+--name "Spark NLP 5.3.0" \
 --release-label emr-6.2.0 \
 --applications Name=Hadoop Name=Spark Name=Hive \
 --instance-type m4.4xlarge \
@@ -936,7 +939,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
   --enable-component-gateway \
   --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
   --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
-  --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+  --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -947,16 +950,20 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
 
 You can change the following Spark NLP configurations via Spark Configuration:
 
-| Property Name
-|
-| `spark.jsl.settings.pretrained.cache_folder`
-| `spark.jsl.settings.storage.cluster_tmp_dir`
-| `spark.jsl.settings.annotator.log_folder`
-| `spark.jsl.settings.aws.credentials.access_key_id`
-| `spark.jsl.settings.aws.credentials.secret_access_key`
-| `spark.jsl.settings.aws.credentials.session_token`
-| `spark.jsl.settings.aws.s3_bucket`
-| `spark.jsl.settings.aws.region`
+| Property Name | Default | Meaning |
+|---------------|---------|---------|
+| `spark.jsl.settings.pretrained.cache_folder` | `~/cache_pretrained` | The location to download and extract pretrained `Models` and `Pipelines`. By default, it will be in the user's home directory under `cache_pretrained` |
+| `spark.jsl.settings.storage.cluster_tmp_dir` | `hadoop.tmp.dir` | The location to use on a cluster for temporary files such as unpacking indexes for WordEmbeddings. By default, this is the location of `hadoop.tmp.dir` set via Hadoop configuration for Apache Spark. NOTE: `S3` is not supported and it must be local, HDFS, or DBFS |
+| `spark.jsl.settings.annotator.log_folder` | `~/annotator_logs` | The location to save logs from annotators during training such as `NerDLApproach`, `ClassifierDLApproach`, `SentimentDLApproach`, `MultiClassifierDLApproach`, etc. By default, it will be in the user's home directory under `annotator_logs` |
+| `spark.jsl.settings.aws.credentials.access_key_id` | `None` | Your AWS access key to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` |
+| `spark.jsl.settings.aws.credentials.secret_access_key` | `None` | Your AWS secret access key to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` |
+| `spark.jsl.settings.aws.credentials.session_token` | `None` | Your AWS MFA session token to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` |
+| `spark.jsl.settings.aws.s3_bucket` | `None` | Your AWS S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` |
+| `spark.jsl.settings.aws.region` | `None` | Your AWS region to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` |
+| `spark.jsl.settings.onnx.gpuDeviceId` | `0` | Constructs CUDA execution provider options for the specified non-negative device id. |
+| `spark.jsl.settings.onnx.intraOpNumThreads` | `6` | Sets the size of the CPU thread pool used for executing a single graph, if executing on a CPU. |
+| `spark.jsl.settings.onnx.optimizationLevel` | `ALL_OPT` | Sets the optimization level of this options object, overriding the old setting. |
+| `spark.jsl.settings.onnx.executionMode` | `SEQUENTIAL` | Sets the execution mode of this options object, overriding the old setting. |
 
 ### How to set Spark NLP Configuration
 
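The four `spark.jsl.settings.onnx.*` properties are new in this release. A minimal sketch of overriding them at session start, using only the property names from the table above (the values shown are the documented defaults):

```python
# Hedged sketch: tuning the new ONNX runtime settings via SparkSession config.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Spark NLP") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0") \
    .config("spark.jsl.settings.onnx.gpuDeviceId", "0") \
    .config("spark.jsl.settings.onnx.intraOpNumThreads", "6") \
    .config("spark.jsl.settings.onnx.optimizationLevel", "ALL_OPT") \
    .config("spark.jsl.settings.onnx.executionMode", "SEQUENTIAL") \
    .getOrCreate()
```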
@@ -975,7 +982,7 @@ spark = SparkSession.builder
     .config("spark.kryoserializer.buffer.max", "2000m")
     .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
     .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0")
     .getOrCreate()
 ```
 
@@ -989,7 +996,7 @@ spark-shell \
   --conf spark.kryoserializer.buffer.max=2000M \
   --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
   --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
-  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 **pyspark:**
@@ -1002,7 +1009,7 @@ pyspark \
   --conf spark.kryoserializer.buffer.max=2000M \
   --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
   --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
-  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2
+  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
 ```
 
 **Databricks:**
@@ -1274,7 +1281,7 @@ spark = SparkSession.builder
     .config("spark.driver.memory", "16G")
     .config("spark.driver.maxResultSize", "0")
     .config("spark.kryoserializer.buffer.max", "2000M")
-    .config("spark.jars", "/tmp/spark-nlp-assembly-5.2.2.jar")
+    .config("spark.jars", "/tmp/spark-nlp-assembly-5.3.0.jar")
     .getOrCreate()
 ```
 
@@ -1283,7 +1290,7 @@ spark = SparkSession.builder
 version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
 - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need
   to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
-  i.e., `hdfs:///tmp/spark-nlp-assembly-5.2.2.jar`)
+  i.e., `hdfs:///tmp/spark-nlp-assembly-5.3.0.jar`)
 
 Example of using pretrained Models and Pipelines in offline:
 
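The final hunk ends just before the README's offline example. For context, the offline pattern the README goes on to show loads a downloaded model folder by path instead of calling `pretrained()` — a hedged sketch with an illustrative path (not taken from this diff):

```python
# Hedged sketch of offline loading: download a model from the Models Hub,
# put the extracted folder on HDFS/DBFS, then load it by path.
from sparknlp.annotator import PerceptronModel

pos = PerceptronModel.load("hdfs:///models/pos_anc_en/") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("pos")
```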