spark-nlp 5.2.3__py2.py3-none-any.whl → 5.3.0__py2.py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.


@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: spark-nlp
- Version: 5.2.3
+ Version: 5.3.0
  Summary: John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
  Home-page: https://github.com/JohnSnowLabs/spark-nlp
  Author: John Snow Labs
@@ -54,7 +54,7 @@ environment.
  Spark NLP comes with **36000+** pretrained **pipelines** and **models** in more than **200+** languages.
  It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).

- **Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+ **Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.

  ## Project's website

@@ -143,42 +143,34 @@ documentation and examples
  - BERT Sentence Embeddings (TF Hub & HuggingFace models)
  - RoBerta Sentence Embeddings (HuggingFace models)
  - XLM-RoBerta Sentence Embeddings (HuggingFace models)
- - Instructor Embeddings (HuggingFace models)
+ - INSTRUCTOR Embeddings (HuggingFace models)
  - E5 Embeddings (HuggingFace models)
  - MPNet Embeddings (HuggingFace models)
  - OpenAI Embeddings
- - Sentence Embeddings
- - Chunk Embeddings
+ - Sentence & Chunk Embeddings
  - Unsupervised keywords extraction
  - Language Detection & Identification (up to 375 languages)
- - Multi-class Sentiment analysis (Deep learning)
- - Multi-label Sentiment analysis (Deep learning)
+ - Multi-class & Multi-label Sentiment analysis (Deep learning)
  - Multi-class Text Classification (Deep learning)
- - BERT for Token & Sequence Classification
- - DistilBERT for Token & Sequence Classification
- - CamemBERT for Token & Sequence Classification
- - ALBERT for Token & Sequence Classification
- - RoBERTa for Token & Sequence Classification
- - DeBERTa for Token & Sequence Classification
- - XLM-RoBERTa for Token & Sequence Classification
+ - BERT for Token & Sequence Classification & Question Answering
+ - DistilBERT for Token & Sequence Classification & Question Answering
+ - CamemBERT for Token & Sequence Classification & Question Answering
+ - ALBERT for Token & Sequence Classification & Question Answering
+ - RoBERTa for Token & Sequence Classification & Question Answering
+ - DeBERTa for Token & Sequence Classification & Question Answering
+ - XLM-RoBERTa for Token & Sequence Classification & Question Answering
+ - Longformer for Token & Sequence Classification & Question Answering
+ - MPNet for Token & Sequence Classification & Question Answering
  - XLNet for Token & Sequence Classification
- - Longformer for Token & Sequence Classification
- - BERT for Token & Sequence Classification
- - BERT for Question Answering
- - CamemBERT for Question Answering
- - DistilBERT for Question Answering
- - ALBERT for Question Answering
- - RoBERTa for Question Answering
- - DeBERTa for Question Answering
- - XLM-RoBERTa for Question Answering
- - Longformer for Question Answering
- - Table Question Answering (TAPAS)
  - Zero-Shot NER Model
  - Zero-Shot Text Classification by Transformers (ZSL)
  - Neural Machine Translation (MarianMT)
+ - Many-to-Many multilingual translation model (Facebook M2M100)
+ - Table Question Answering (TAPAS)
  - Text-To-Text Transfer Transformer (Google T5)
  - Generative Pre-trained Transformer 2 (OpenAI GPT2)
  - Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+ - Chat and Conversational LLMs (Facebook Llama-2)
  - Vision Transformer (Google ViT)
  - Swin Image Classification (Microsoft Swin Transformer)
  - ConvNext Image Classification (Facebook ConvNext)
@@ -205,7 +197,7 @@ To use Spark NLP you need the following requirements:

  **GPU (optional):**

- Spark NLP 5.2.3 is built with ONNX 1.16.3 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:
+ Spark NLP 5.3.0 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:

  - NVIDIA® GPU drivers version 450.80.02 or higher
  - CUDA® Toolkit 11.2
@@ -221,7 +213,7 @@ $ java -version
  $ conda create -n sparknlp python=3.7 -y
  $ conda activate sparknlp
  # spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.2.3 pyspark==3.3.1
+ $ pip install spark-nlp==5.3.0 pyspark==3.3.1
  ```

  In Python console or Jupyter `Python3` kernel:
@@ -266,11 +258,12 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh

  ## Apache Spark Support

- Spark NLP *5.2.3* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+ Spark NLP *5.3.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x

  | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
  |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
- | 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
+ | 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
+ | 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
  | 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO |
  | 5.0.x | YES | YES | YES | YES | YES | YES | NO | NO |
  | 4.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
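The compatibility table above is easy to transcribe into a lookup for CI or provisioning scripts. A minimal sketch (the dict encodes only the rows visible in this hunk; `supports` is a hypothetical helper, and the partial 5.1.x/Spark 3.5 entry is treated as unsupported):

```python
# Spark NLP ↔ Apache Spark support matrix, transcribed from the table above.
SUPPORT = {
    "5.3.x": {"3.0", "3.1", "3.2", "3.3", "3.4", "3.5"},
    "5.2.x": {"3.0", "3.1", "3.2", "3.3", "3.4", "3.5"},
    "5.1.x": {"3.0", "3.1", "3.2", "3.3", "3.4"},  # 3.5 only partially supported
    "5.0.x": {"3.0", "3.1", "3.2", "3.3", "3.4", "3.5"},
    "4.4.x": {"3.0", "3.1", "3.2", "3.3", "3.4", "3.5"},
}

def supports(sparknlp_line: str, spark_minor: str) -> bool:
    """True if the given Spark NLP release line fully supports that Spark minor version."""
    return spark_minor in SUPPORT.get(sparknlp_line, set())
```

For example, `supports("5.3.x", "3.5")` is true, while `supports("5.1.x", "3.5")` is false because the table marks it as only partial.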
@@ -291,6 +284,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

  | Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
  |-----------|------------|------------|------------|------------|------------|------------|------------|
+ | 5.3.x | NO | YES | YES | YES | YES | NO | YES |
  | 5.2.x | NO | YES | YES | YES | YES | NO | YES |
  | 5.1.x | NO | YES | YES | YES | YES | NO | YES |
  | 5.0.x | NO | YES | YES | YES | YES | NO | YES |
@@ -308,7 +302,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

  ## Databricks Support

- Spark NLP 5.2.3 has been tested and is compatible with the following runtimes:
+ Spark NLP 5.3.0 has been tested and is compatible with the following runtimes:

  **CPU:**

@@ -350,6 +344,10 @@ Spark NLP 5.2.3 has been tested and is compatible with the following runtimes:
  - 14.0 ML
  - 14.1
  - 14.1 ML
+ - 14.2
+ - 14.2 ML
+ - 14.3
+ - 14.3 ML

  **GPU:**

@@ -372,10 +370,12 @@ Spark NLP 5.2.3 has been tested and is compatible with the following runtimes:
  - 13.3 ML & GPU
  - 14.0 ML & GPU
  - 14.1 ML & GPU
+ - 14.2 ML & GPU
+ - 14.3 ML & GPU

  ## EMR Support

- Spark NLP 5.2.3 has been tested and is compatible with the following EMR releases:
+ Spark NLP 5.3.0 has been tested and is compatible with the following EMR releases:

  - emr-6.2.0
  - emr-6.3.0
@@ -391,8 +391,11 @@ Spark NLP 5.2.3 has been tested and is compatible with the following EMR release
  - emr-6.12.0
  - emr-6.13.0
  - emr-6.14.0
+ - emr-6.15.0
+ - emr-7.0.0

  Full list of [Amazon EMR 6.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html)
+ Full list of [Amazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html)

  NOTE: The EMR 6.1.0 and 6.1.1 are not supported.

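Provisioning scripts can validate an EMR `--release-label` against the releases listed here before launching a cluster. A minimal sketch (it encodes only the labels visible in these diff hunks; the README's full list includes releases the hunk context elides, so `is_supported_emr` is a hypothetical, deliberately incomplete check):

```python
# EMR release labels visible in this diff's hunks (the README's full list also
# covers releases between 6.3.0 and 6.12.0 that the diff context elides).
SUPPORTED_EMR = {
    "emr-6.2.0", "emr-6.3.0",
    "emr-6.12.0", "emr-6.13.0", "emr-6.14.0",
    "emr-6.15.0", "emr-7.0.0",  # newly listed for 5.3.0
}
UNSUPPORTED_EMR = {"emr-6.1.0", "emr-6.1.1"}  # explicitly called out as not supported

def is_supported_emr(release_label: str) -> bool:
    """Check a cluster's --release-label against the releases listed in this section."""
    return release_label in SUPPORTED_EMR and release_label not in UNSUPPORTED_EMR
```

The explicit `UNSUPPORTED_EMR` set mirrors the NOTE above, so 6.1.x labels are rejected even if the supported set is later broadened to a version-range check.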
@@ -422,11 +425,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
  ```sh
  # CPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  The `spark-nlp` has been published to
@@ -435,11 +438,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # GPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.3
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.3
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.2.3
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0

  ```

@@ -449,11 +452,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # AArch64

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.3
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.3
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.2.3
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.0

  ```

@@ -463,11 +466,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # M1/M2 (Apple Silicon)

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.3
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.3
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.2.3
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.0

  ```

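The four coordinate families above (CPU, GPU, AArch64, Apple Silicon) differ only in an artifact suffix, so a launcher script can derive the right `--packages` value instead of hard-coding twelve strings. A minimal sketch (`coordinate` is a hypothetical helper; the suffixes and the `_2.12` Scala suffix come from this section):

```python
# Maven artifact suffix per hardware flavor, as listed in this section.
SUFFIX = {"cpu": "", "gpu": "-gpu", "aarch64": "-aarch64", "silicon": "-silicon"}

def coordinate(flavor: str = "cpu", version: str = "5.3.0") -> str:
    """Build the --packages coordinate for spark-shell, pyspark, or spark-submit."""
    return f"com.johnsnowlabs.nlp:spark-nlp{SUFFIX[flavor]}_2.12:{version}"
```

For example, `coordinate("gpu")` yields `com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0`, matching the GPU block above.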
@@ -481,7 +484,7 @@ set in your SparkSession:
  spark-shell \
  --driver-memory 16g \
  --conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  ## Scala
@@ -499,7 +502,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp_2.12</artifactId>
- <version>5.2.3</version>
+ <version>5.3.0</version>
  </dependency>
  ```

@@ -510,7 +513,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-gpu_2.12</artifactId>
- <version>5.2.3</version>
+ <version>5.3.0</version>
  </dependency>
  ```

@@ -521,7 +524,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-aarch64_2.12</artifactId>
- <version>5.2.3</version>
+ <version>5.3.0</version>
  </dependency>
  ```

@@ -532,7 +535,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-silicon_2.12</artifactId>
- <version>5.2.3</version>
+ <version>5.3.0</version>
  </dependency>
  ```

@@ -542,28 +545,28 @@ coordinates:

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.2.3"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.3.0"
  ```

  **spark-nlp-gpu:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.2.3"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.3.0"
  ```

  **spark-nlp-aarch64:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.2.3"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.3.0"
  ```

  **spark-nlp-silicon:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.2.3"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.3.0"
  ```

  Maven
@@ -585,7 +588,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
  Pip:

  ```bash
- pip install spark-nlp==5.2.3
+ pip install spark-nlp==5.3.0
  ```

  Conda:
@@ -614,7 +617,7 @@ spark = SparkSession.builder
  .config("spark.driver.memory", "16G")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0")
  .getOrCreate()
  ```

@@ -685,7 +688,7 @@ Use either one of the following options
  - Add the following Maven Coordinates to the interpreter's library list

  ```bash
- com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is
@@ -696,7 +699,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
  Apart from the previous step, install the python module through pip

  ```bash
- pip install spark-nlp==5.2.3
+ pip install spark-nlp==5.3.0
  ```

  Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -724,7 +727,7 @@ launch the Jupyter from the same Python environment:
  $ conda create -n sparknlp python=3.8 -y
  $ conda activate sparknlp
  # spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.2.3 pyspark==3.3.1 jupyter
+ $ pip install spark-nlp==5.3.0 pyspark==3.3.1 jupyter
  $ jupyter notebook
  ```

@@ -741,7 +744,7 @@ export PYSPARK_PYTHON=python3
  export PYSPARK_DRIVER_PYTHON=jupyter
  export PYSPARK_DRIVER_PYTHON_OPTS=notebook

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -768,7 +771,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
  # -s is for spark-nlp
  # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
  # by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.2.3
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.0
  ```

  [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)
@@ -791,7 +794,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
  # -s is for spark-nlp
  # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
  # by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.2.3
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.0
  ```

  [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live
@@ -810,9 +813,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP

  3. In `Libraries` tab inside your cluster you need to follow these steps:

- 3.1. Install New -> PyPI -> `spark-nlp==5.2.3` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==5.3.0` -> Install

- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0` -> Install

  4. Now you can attach your notebook to the cluster and use Spark NLP!

@@ -863,7 +866,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
  "spark.kryoserializer.buffer.max": "2000M",
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0"
  }
  }]
  ```
@@ -872,7 +875,7 @@ A sample of AWS CLI to launch EMR cluster:

  ```.sh
  aws emr create-cluster \
- --name "Spark NLP 5.2.3" \
+ --name "Spark NLP 5.3.0" \
  --release-label emr-6.2.0 \
  --applications Name=Hadoop Name=Spark Name=Hive \
  --instance-type m4.4xlarge \
@@ -936,7 +939,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
  --enable-component-gateway \
  --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
  --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -979,7 +982,7 @@ spark = SparkSession.builder
  .config("spark.kryoserializer.buffer.max", "2000m")
  .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
  .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0")
  .getOrCreate()
  ```

@@ -993,7 +996,7 @@ spark-shell \
  --conf spark.kryoserializer.buffer.max=2000M \
  --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
  --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  **pyspark:**
@@ -1006,7 +1009,7 @@ pyspark \
  --conf spark.kryoserializer.buffer.max=2000M \
  --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
  --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
  ```

  **Databricks:**
@@ -1278,7 +1281,7 @@ spark = SparkSession.builder
  .config("spark.driver.memory", "16G")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars", "/tmp/spark-nlp-assembly-5.2.3.jar")
+ .config("spark.jars", "/tmp/spark-nlp-assembly-5.3.0.jar")
  .getOrCreate()
  ```

@@ -1287,7 +1290,7 @@ spark = SparkSession.builder
  version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
  - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need
  to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
- i.e., `hdfs:///tmp/spark-nlp-assembly-5.2.3.jar`)
+ i.e., `hdfs:///tmp/spark-nlp-assembly-5.3.0.jar`)

  Example of using pretrained Models and Pipelines in offline:

@@ -1,7 +1,7 @@
1
1
  com/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
2
  com/johnsnowlabs/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
3
3
  com/johnsnowlabs/nlp/__init__.py,sha256=DPIVXtONO5xXyOk-HB0-sNiHAcco17NN13zPS_6Uw8c,294
4
- sparknlp/__init__.py,sha256=qzDxFYDRyF2Jw1kVlbunQjoL6qtiJ5EA9td1vsm1J5w,13588
4
+ sparknlp/__init__.py,sha256=Q51KCuoFbaMIgFWGe-we8xtE3uhZuEC3wFc2XeZZcMU,13588
5
5
  sparknlp/annotation.py,sha256=I5zOxG5vV2RfPZfqN9enT1i4mo6oBcn3Lrzs37QiOiA,5635
6
6
  sparknlp/annotation_audio.py,sha256=iRV_InSVhgvAwSRe9NTbUH9v6OGvTM-FPCpSAKVu0mE,1917
7
7
  sparknlp/annotation_image.py,sha256=xhCe8Ko-77XqWVuuYHFrjKqF6zPd8Z-RY_rmZXNwCXU,2547
@@ -22,13 +22,13 @@ sparknlp/annotator/n_gram_generator.py,sha256=KRX5xfxmorOfYQkQHZWkkXjwjC13gDTAXn
22
22
  sparknlp/annotator/normalizer.py,sha256=7AkAOB-e8b2uyUBwYoq9HvMPijOwV3wEoxcB3BVsr4w,8780
23
23
  sparknlp/annotator/stemmer.py,sha256=Tl48voyG9wqbT5MAA1hDKW90NorU8rIDhttJxOo1s3Q,2948
24
24
  sparknlp/annotator/stop_words_cleaner.py,sha256=Z9yI9AWDIAXbPM2X6n84voiW31Z20XofCL-tTQNo5ro,7015
25
- sparknlp/annotator/tf_ner_dl_graph_builder.py,sha256=jYciU0fTWg4q1MSwPG5oha6vMeALIDdCK8TPR8-78hg,6373
25
+ sparknlp/annotator/tf_ner_dl_graph_builder.py,sha256=ovsRBUfw9lJkuetmrcYRmW1Ll-33sdDPi4xJ0M_Fs7k,6379
26
26
  sparknlp/annotator/token2_chunk.py,sha256=FtS2Doav9xL1IrC9ZUU4iXqyipp-iT3g68kZt-7YCcQ,2674
27
27
  sparknlp/annotator/audio/__init__.py,sha256=dXjtvi5c0aTZFq1Q_JciUd1uFTBVSJoUdcq0hiYd8yk,757
28
28
  sparknlp/annotator/audio/hubert_for_ctc.py,sha256=76PfwPZZvOHU5kfDqLueCFbmqa4W8pMNRGoCvOqjsEA,7859
29
29
  sparknlp/annotator/audio/wav2vec2_for_ctc.py,sha256=K78P1U6vA4O1UufsLYzy0H7arsKNmwPcIV7kzDFsA5Q,6210
30
30
  sparknlp/annotator/audio/whisper_for_ctc.py,sha256=uII51umuohqwnAW0Q7VdxEFyr_j5LMnfpcRlf8TbetA,9800
31
- sparknlp/annotator/classifier_dl/__init__.py,sha256=WYVdQaqdVEbOm3guZsoiNAPDPkli7qt3fx3xhUU2stQ,3444
31
+ sparknlp/annotator/classifier_dl/__init__.py,sha256=tGg78A8LUgobZFre_3ySN51KGNyl0Zx0inxT9yfL_g8,3686
32
32
  sparknlp/annotator/classifier_dl/albert_for_question_answering.py,sha256=LG2dL6Fky1T35yXTUZBfIihIIGnkRFQ7ECQ3HRXXEG8,6517
33
33
  sparknlp/annotator/classifier_dl/albert_for_sequence_classification.py,sha256=kWx7f9pcKE2qw319gn8FN0Md5dX38gbmfeoY9gWCLNk,7842
34
34
  sparknlp/annotator/classifier_dl/albert_for_token_classification.py,sha256=5rdsjWnsAVmtP-idU7ATKJ8lkH2rtlKZLnpi4Mq27eI,6839
@@ -36,7 +36,7 @@ sparknlp/annotator/classifier_dl/bart_for_zero_shot_classification.py,sha256=yqQ
36
36
  sparknlp/annotator/classifier_dl/bert_for_question_answering.py,sha256=2euY_RAdMPA4IHJXZAd5MkQojFOtFNhB_hSc1iVQ5DQ,6433
37
37
  sparknlp/annotator/classifier_dl/bert_for_sequence_classification.py,sha256=AzD3RQcRuQc0DDTbL6vGiacTtHlZnbAqksNvRQq7EQE,7800
38
38
  sparknlp/annotator/classifier_dl/bert_for_token_classification.py,sha256=uJXoDLPfPWiRmKqtw_3lLBvneIirj87S2JWwfd33zq8,6668
39
- sparknlp/annotator/classifier_dl/bert_for_zero_shot_classification.py,sha256=0DW8bn257A8NO4uOss2meBDnbCcASDT7cQHQzbvT-X0,8351
39
+ sparknlp/annotator/classifier_dl/bert_for_zero_shot_classification.py,sha256=mli7_TZjrFs6GPwWtgpPty6HrRKIXrEZKjcR00NKyBo,8318
40
40
  sparknlp/annotator/classifier_dl/camembert_for_question_answering.py,sha256=BeE-62tFkXMoyiy3PtcnwgT2-wqzTFo5VZHrWUqsWmM,6510
41
41
  sparknlp/annotator/classifier_dl/camembert_for_sequence_classification.py,sha256=06bkwhNBcmNS5gR_JrMjBDW3jAdjEI5YL4SuV16Va7E,7962
42
42
  sparknlp/annotator/classifier_dl/camembert_for_token_classification.py,sha256=vjwDE_kZiBupENaYvUZOTTqVOb3KCsGse-QX3QOutz4,6522
@@ -44,6 +44,7 @@ sparknlp/annotator/classifier_dl/classifier_dl.py,sha256=Dj-T5ByCgzgFpah7LVz_07Q
44
44
  sparknlp/annotator/classifier_dl/deberta_for_question_answering.py,sha256=oikVBeVohsSR9HPV_yq_0U7zHps94UO4lXbYu9G7MF0,6486
45
45
  sparknlp/annotator/classifier_dl/deberta_for_sequence_classification.py,sha256=H2LDT8ttD9hxfFDrymsyCq0EwCuWl5FE2-XVqT9LcRQ,7773
46
46
  sparknlp/annotator/classifier_dl/deberta_for_token_classification.py,sha256=jj5hB9AV-0Of505E6z62lYPIWmsqNeTX0vRRq3_7T9I,6807
47
+ sparknlp/annotator/classifier_dl/deberta_for_zero_shot_classification.py,sha256=AmCSFpR0xsxwus6spCpiw6zduGtvg4B_lLS5PUDXjvc,8711
47
48
  sparknlp/annotator/classifier_dl/distil_bert_for_question_answering.py,sha256=yA4LrI4RN4f44wbIrdpwqderTJBhAkjAHpUxcCeCROE,6552
48
49
  sparknlp/annotator/classifier_dl/distil_bert_for_sequence_classification.py,sha256=Cax3LcVLppiHs1dyahsBSq_TLHSwI2-K7LGCZHZNs1I,7926
49
50
  sparknlp/annotator/classifier_dl/distil_bert_for_token_classification.py,sha256=y9S83LW0Mfn4fRzopRXFj8l2gb-Nrm1rr9zRftOckJU,6832
@@ -51,6 +52,8 @@ sparknlp/annotator/classifier_dl/distil_bert_for_zero_shot_classification.py,sha
51
52
  sparknlp/annotator/classifier_dl/longformer_for_question_answering.py,sha256=VKbOKSTtwdeSsSzB2oKiRlFwSOcpHuMfkvgGM3ofBIo,6553
52
53
  sparknlp/annotator/classifier_dl/longformer_for_sequence_classification.py,sha256=_XO3Ufl_wHyUgUIechZ6J1VCE2G2W-FUPZfHmJSfQvk,7932
53
54
  sparknlp/annotator/classifier_dl/longformer_for_token_classification.py,sha256=RmiFuBRhIAoJoQ8Rgcu997-PxBK1hhWuLVlS1qztMyk,6848
55
+ sparknlp/annotator/classifier_dl/mpnet_for_question_answering.py,sha256=w9hHLrQbDIUHAdCKiXNDneAbohMKopixAKU2wkYkqbs,5522
56
+ sparknlp/annotator/classifier_dl/mpnet_for_sequence_classification.py,sha256=M__giFElL6Q3I88QD6OoXDzdQDk_Zp5sS__Kh_XpLdo,7308
54
57
  sparknlp/annotator/classifier_dl/multi_classifier_dl.py,sha256=ylKQzS7ROyeKeiOF4BZiIkQV1sfrnfUUQ9LXFSFK_Vo,16045
55
58
  sparknlp/annotator/classifier_dl/roberta_for_question_answering.py,sha256=WRxu1uhXnY9C4UHdtJ8qiVGhPSX7sCdSaML0AWHOdJw,6471
56
59
  sparknlp/annotator/classifier_dl/roberta_for_sequence_classification.py,sha256=z97uH5WkG8kPX1Y9qtpLwD7egl0kzbVmxtq4xzZgNNI,7857
@@ -100,7 +103,7 @@ sparknlp/annotator/embeddings/xlm_roberta_embeddings.py,sha256=t-Bg1bQcqI_fIqUWQ
100
103
  sparknlp/annotator/embeddings/xlm_roberta_sentence_embeddings.py,sha256=ojxD3H2VgDEn-RzDdCz0X485pojHBAFrlzsNemI05bY,8602
101
104
  sparknlp/annotator/embeddings/xlnet_embeddings.py,sha256=hJrlsJeO3D7uz54xiEiqqXEbq24YGuWz8U652PV9fNE,9336
102
105
  sparknlp/annotator/er/__init__.py,sha256=eF9Z-PanVfZWSVN2HSFbE7QjCDb6NYV5ESn6geYKlek,692
103
- sparknlp/annotator/er/entity_ruler.py,sha256=NFJgUMh6PV6XdzAdONX9icDbGxdBLze9NrOm_lhezPo,8785
106
+ sparknlp/annotator/er/entity_ruler.py,sha256=7eZtAwoixkl88jTyKEqTKf9Wzo459VXQkYmFBozUY6A,8784
104
107
  sparknlp/annotator/keyword_extraction/__init__.py,sha256=KotCR238x7LgisinsRGaARgPygWUIwC624FmH-sHacE,720
105
108
  sparknlp/annotator/keyword_extraction/yake_keyword_extraction.py,sha256=oeB-8qdMoljG-mgFOCsfnpxyK5jFBZnX7jAUQwsnHTc,13215
106
109
  sparknlp/annotator/ld_dl/__init__.py,sha256=gWNGOaozABT83J4Mn7JmNQsXzm27s3PHpMQmlXl-5L8,704
@@ -132,13 +135,15 @@ sparknlp/annotator/sentence/sentence_detector_dl.py,sha256=-Osj9Bm9KyZRTAWkOsK9c
  sparknlp/annotator/sentiment/__init__.py,sha256=Lq3vKaZS1YATLMg0VNXSVtkWL5q5G9taGBvdrvSwnfg,766
  sparknlp/annotator/sentiment/sentiment_detector.py,sha256=m545NGU0Xzg_PO6_qIfpli1uZj7JQcyFgqe9R6wAPFI,8154
  sparknlp/annotator/sentiment/vivekn_sentiment.py,sha256=4rpXWDgzU6ddnbrSCp9VdLb2epCc9oZ3c6XcqxEw8nk,9655
- sparknlp/annotator/seq2seq/__init__.py,sha256=qSfofkO7oQAdgxFct7U-8eTfewEL2V-I_u-EnWnD89s,891
+ sparknlp/annotator/seq2seq/__init__.py,sha256=UQK-_3wLkUdW1piGudCx1_k3Tg3tERZJYOBnfMRj8pA,1011
  sparknlp/annotator/seq2seq/bart_transformer.py,sha256=I1flM4yeCzEAKOdQllBC30XuedxVJ7ferkFhZ6gwEbE,18481
  sparknlp/annotator/seq2seq/gpt2_transformer.py,sha256=Oz95R_NRR4tWHu_bW6Ak2832ZILXycp3ify7LfRSi8o,15310
+ sparknlp/annotator/seq2seq/llama2_transformer.py,sha256=YPge5f4qfv7XZY_LoH2HRzvbZ--XoTTY_BupxxYaCd8,13862
+ sparknlp/annotator/seq2seq/m2m100_transformer.py,sha256=fTFGFWaFfJt5kaLvnYknf_23PVyjBuha48asFsE_NaE,16082
  sparknlp/annotator/seq2seq/marian_transformer.py,sha256=mQ4Ylh7ZzXAOue8f-x0gqzfS3vAz3XUdD7eQ2XhcEs4,13781
  sparknlp/annotator/seq2seq/t5_transformer.py,sha256=wDVxNLluIU1HGZFqaKKc4YTt4l-elPlAtQ7EEa0f5tg,17308
  sparknlp/annotator/similarity/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- sparknlp/annotator/similarity/document_similarity_ranker.py,sha256=2UQwHQA05lkQ9e_Z_YVAbpWFdBv0ac4oJDroqCJNbF0,14226
+ sparknlp/annotator/similarity/document_similarity_ranker.py,sha256=OFAXEBuALFJglwThsGK8YaJ_pgW1tcevB7jVq-8SyKM,14991
  sparknlp/annotator/spell_check/__init__.py,sha256=sdnPR3f3Q9mHiv-n4g_O7KpRWPRPweyATSF6Tth_Niw,830
  sparknlp/annotator/spell_check/context_spell_checker.py,sha256=OtjN51K3TyQpFmZrhPrvxZwCJsENFwTkeNKQYWrP-Gw,31992
  sparknlp/annotator/spell_check/norvig_sweeting.py,sha256=6ET9KnAqXIQDJ5U9px1ixUbC6R63ln_ljruvh_oLiwA,13197
@@ -155,13 +160,13 @@ sparknlp/base/audio_assembler.py,sha256=HKa9mXvmuMUrjTihUZkppGj-WJjcUrm2BGapNuPi
  sparknlp/base/doc2_chunk.py,sha256=TyvbdJNkVo9favHlOEoH5JwKbjpk5ZVJ75p8Cilp9jM,6551
  sparknlp/base/document_assembler.py,sha256=zl-SXWMTR3B0EZ8z6SWYchCwEo-61FhU6u7dHUKDIOg,6697
  sparknlp/base/embeddings_finisher.py,sha256=5QU1Okgl2ULrPVf4ze1H0SsRCMYXWGARtUsT7dagBYA,7659
- sparknlp/base/finisher.py,sha256=kQxR50A82xNrP2ainEAXoVWz0ZZG_ZHmcgzBIAwZqos,8577
+ sparknlp/base/finisher.py,sha256=V4wkMm9Ug09q4zTQc9T9Wr-awmu2Hu-eNaJ039YgZXM,8583
  sparknlp/base/graph_finisher.py,sha256=a8fxk3ei2YQw6s0Y9Yy8oMOF1i1XUrgqaiwVE0VPt4w,4834
  sparknlp/base/has_recursive_fit.py,sha256=P55rSHLIXhihXWS2bOC_DskcQTc3njieVD1JkjS2bcA,849
  sparknlp/base/has_recursive_transform.py,sha256=UkGNgo4LMsjQC-Coeefg4bJcg7FoPcPiG382zEa6Ywk,841
  sparknlp/base/image_assembler.py,sha256=HytRoYJTLMqGtvScHoFnp6CasG9IVNYAHYiT2_rrmeE,3719
- sparknlp/base/light_pipeline.py,sha256=7uS9RV2dJF6Xjo0qrhdZabNrd82ERMy9SQpTmboU-RY,16541
- sparknlp/base/multi_document_assembler.py,sha256=tos6BYCtfTAplcmP2zphqdqPW9eQbHkzy9t-fwQ33Ww,7064
+ sparknlp/base/light_pipeline.py,sha256=Jk2DLpT4PLHCANlOo_WetTdPba_5lYs3ywiyY3lM-PE,16577
+ sparknlp/base/multi_document_assembler.py,sha256=4htET1fRAeOB6zhsNXsBq5rKZvn-LGD4vrFRjPZeqow,7070
  sparknlp/base/recursive_pipeline.py,sha256=V9rTnu8KMwgjoceykN9pF1mKGtOkkuiC_n9v8dE3LDk,4279
  sparknlp/base/table_assembler.py,sha256=Kxu3R2fY6JgCxEc07ibsMsjip6dgcPDHLiWAZ8gC_d8,5102
  sparknlp/base/token_assembler.py,sha256=qiHry07L7mVCqeHSH6hHxLygv1AsfZIE4jy1L75L3Do,5075
@@ -177,7 +182,7 @@ sparknlp/common/read_as.py,sha256=imxPGwV7jr4Li_acbo0OAHHRGCBbYv-akzEGaBWEfcY,12
  sparknlp/common/recursive_annotator_approach.py,sha256=vqugBw22cE3Ff7PIpRlnYFuOlchgL0nM26D8j-NdpqU,1449
  sparknlp/common/storage.py,sha256=D91H3p8EIjNspjqAYu6ephRpCUtdcAir4_PrAbkIQWE,4842
  sparknlp/common/utils.py,sha256=Yne6yYcwKxhOZC-U4qfYoDhWUP_6BIaAjI5X_P_df1E,1306
- sparknlp/internal/__init__.py,sha256=Al75CEdkKLyJSwXMzMIETKYo9vczWXtkPTagZ4TGydw,25119
+ sparknlp/internal/__init__.py,sha256=g4REY_0X2Sr05szDb9681oiPqRWlT4KaOpcAOj3q32A,26496
  sparknlp/internal/annotator_java_ml.py,sha256=UGPoThG0rGXUOXGSQnDzEDW81Mu1s5RPF29v7DFyE3c,1187
  sparknlp/internal/annotator_transformer.py,sha256=fXmc2IWXGybqZpbEU9obmbdBYPc798y42zvSB4tqV9U,1448
  sparknlp/internal/extended_java_wrapper.py,sha256=hwP0133-hDiDf5sBF-P3MtUsuuDj1PpQbtGZQIRwzfk,2240
@@ -219,7 +224,7 @@ sparknlp/training/_tf_graph_builders_1x/ner_dl/dataset_encoder.py,sha256=R4yHFN3
  sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model.py,sha256=EoCSdcIjqQ3wv13MAuuWrKV8wyVBP0SbOEW41omHlR0,23189
  sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model_saver.py,sha256=k5CQ7gKV6HZbZMB8cKLUJuZxoZWlP_DFWdZ--aIDwsc,2356
  sparknlp/training/_tf_graph_builders_1x/ner_dl/sentence_grouper.py,sha256=pAxjWhjazSX8Vg0MFqJiuRVw1IbnQNSs-8Xp26L4nko,870
- spark_nlp-5.2.3.dist-info/METADATA,sha256=QXMxdjxt8d8HEmdpys1UOmdWUvb1KfIdwYhfQ8pnSU0,56589
- spark_nlp-5.2.3.dist-info/WHEEL,sha256=bb2Ot9scclHKMOLDEHY6B2sicWOgugjFKaJsT7vwMQo,110
- spark_nlp-5.2.3.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
- spark_nlp-5.2.3.dist-info/RECORD,,
+ spark_nlp-5.3.0.dist-info/METADATA,sha256=t_H-q3uAb32zOrySfga2iCrqy3oCXZyfws1a7JaCGz8,57087
+ spark_nlp-5.3.0.dist-info/WHEEL,sha256=bb2Ot9scclHKMOLDEHY6B2sicWOgugjFKaJsT7vwMQo,110
+ spark_nlp-5.3.0.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
+ spark_nlp-5.3.0.dist-info/RECORD,,
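Each RECORD entry above pairs a file path with a hash of the form `sha256=<digest>` and a byte size, where the digest is the urlsafe-base64-encoded SHA-256 of the file with `=` padding stripped, per the wheel format (PEP 427 / PEP 376). A minimal sketch of how such an entry's hash is computed:

```python
import base64
import hashlib

def record_hash(data: bytes) -> str:
    """Hash file bytes the way a wheel RECORD does: urlsafe base64
    of the SHA-256 digest, with trailing '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# The zero-byte similarity/__init__.py in the listing hashes to this value.
print(record_hash(b""))  # → sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU
```

This is why unchanged files (e.g. `WHEEL`) keep the same hash across 5.2.3 and 5.3.0, while any edited file, such as `entity_ruler.py`, gets a new hash even when its size barely changes.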
sparknlp/__init__.py CHANGED
@@ -128,7 +128,7 @@ def start(gpu=False,
  The initiated Spark session.
 
  """
- current_version = "5.2.3"
+ current_version = "5.3.0"
 
  if params is None:
  params = {}
@@ -309,4 +309,4 @@ def version():
  str
  The current Spark NLP version.
  """
- return '5.2.3'
+ return '5.3.0'
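Bumping the hard-coded `current_version` matters because `sparknlp.start()` uses it to resolve the Maven package that gets passed to `spark.jars.packages`. A rough sketch of that mapping, with the artifact names assumed from Spark NLP's published coordinates (`com.johnsnowlabs.nlp:spark-nlp_2.12`, plus a `-gpu` variant):

```python
def maven_coordinate(version: str, gpu: bool = False, scala: str = "2.12") -> str:
    """Illustrative only: build the Maven coordinate start() would resolve
    for a given Spark NLP version. Artifact naming is an assumption based
    on the published com.johnsnowlabs.nlp artifacts, not the library's code."""
    artifact = "spark-nlp-gpu" if gpu else "spark-nlp"
    return f"com.johnsnowlabs.nlp:{artifact}_{scala}:{version}"

print(maven_coordinate("5.3.0"))            # → com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.0
print(maven_coordinate("5.3.0", gpu=True))  # → com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.0
```

A PyPI wheel at 5.2.3 would therefore pull 5.2.3 jars, which is why `version()` and the jar version must move together.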
@@ -47,4 +47,7 @@ from sparknlp.annotator.classifier_dl.bert_for_zero_shot_classification import *
  from sparknlp.annotator.classifier_dl.distil_bert_for_zero_shot_classification import *
  from sparknlp.annotator.classifier_dl.roberta_for_zero_shot_classification import *
  from sparknlp.annotator.classifier_dl.xlm_roberta_for_zero_shot_classification import *
- from sparknlp.annotator.classifier_dl.bart_for_zero_shot_classification import *
+ from sparknlp.annotator.classifier_dl.bart_for_zero_shot_classification import *
+ from sparknlp.annotator.classifier_dl.deberta_for_zero_shot_classification import *
+ from sparknlp.annotator.classifier_dl.mpnet_for_sequence_classification import *
+ from sparknlp.annotator.classifier_dl.mpnet_for_question_answering import *
@@ -41,7 +41,7 @@ class BertForZeroShotClassification(AnnotatorModel,
  ... .setInputCols(["token", "document"]) \\
  ... .setOutputCol("label")
 
- The default model is ``"bert_base_cased_zero_shot_classifier_xnli"``, if no name is
+ The default model is ``"bert_zero_shot_classifier_mnli"``, if no name is
  provided.
 
  For available pretrained models please see the `Models Hub
@@ -189,14 +189,14 @@ class BertForZeroShotClassification(AnnotatorModel,
  return BertForZeroShotClassification(java_model=jModel)
 
  @staticmethod
- def pretrained(name="bert_base_cased_zero_shot_classifier_xnli", lang="en", remote_loc=None):
+ def pretrained(name="bert_zero_shot_classifier_mnli", lang="xx", remote_loc=None):
  """Downloads and loads a pretrained model.
 
  Parameters
  ----------
  name : str, optional
  Name of the pretrained model, by default
- "bert_base_cased_zero_shot_classifier_xnli"
+ "bert_zero_shot_classifier_mnli"
  lang : str, optional
  Language of the pretrained model, by default "en"
  remote_loc : str, optional