spark-nlp 5.4.0__py2.py3-none-any.whl → 5.4.0rc1__py2.py3-none-any.whl

This diff shows the content changes between two publicly released versions of the package, as they appear in their respective public registries. It is provided for informational purposes only.

Note: this version of spark-nlp has been flagged as a potentially problematic release.
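The target of this diff is a release candidate. Under PEP 440, `5.4.0rc1` (and the spelling `5.4.0-rc1` used in the install commands below, which normalizes to the same version) is a pre-release that orders before the final `5.4.0`, so a plain `pip install spark-nlp` will not select it unless you pass `--pre` or pin the rc explicitly. A simplified stdlib-only sketch of that ordering follows; it is not a full PEP 440 parser (the `packaging` library's `Version` class handles the general case, including normalization):

```python
# Minimal sketch of PEP 440 pre-release ordering: an "rcN" suffix makes a
# version sort *before* the final release with the same release tuple.
import re

def sort_key(version: str):
    # "5.4.0rc1" -> ((5, 4, 0), (0, 1)); "5.4.0" -> ((5, 4, 0), (1, 0)).
    # The second element ranks any rc below the final release.
    m = re.fullmatch(r"(\d+(?:\.\d+)*)(?:rc(\d+))?", version)
    release = tuple(int(part) for part in m.group(1).split("."))
    pre = (0, int(m.group(2))) if m.group(2) else (1, 0)
    return (release, pre)

assert sort_key("5.4.0rc1") < sort_key("5.4.0")
```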

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: spark-nlp
- Version: 5.4.0
+ Version: 5.4.0rc1
  Summary: John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
  Home-page: https://github.com/JohnSnowLabs/spark-nlp
  Author: John Snow Labs
@@ -146,7 +146,6 @@ documentation and examples
  - INSTRUCTOR Embeddings (HuggingFace models)
  - E5 Embeddings (HuggingFace models)
  - MPNet Embeddings (HuggingFace models)
- - UAE Embeddings (HuggingFace models)
  - OpenAI Embeddings
  - Sentence & Chunk Embeddings
  - Unsupervised keywords extraction
@@ -171,7 +170,7 @@ documentation and examples
  - Text-To-Text Transfer Transformer (Google T5)
  - Generative Pre-trained Transformer 2 (OpenAI GPT2)
  - Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
- - Chat and Conversational LLMs (Facebook Llama-2)
+ - Chat and Conversational LLMs (Facebook Llama-22)
  - Vision Transformer (Google ViT)
  - Swin Image Classification (Microsoft Swin Transformer)
  - ConvNext Image Classification (Facebook ConvNext)
@@ -181,10 +180,10 @@ documentation and examples
  - Automatic Speech Recognition (HuBERT)
  - Automatic Speech Recognition (OpenAI Whisper)
  - Named entity recognition (Deep learning)
- - Easy ONNX, OpenVINO, and TensorFlow integrations
+ - Easy ONNX and TensorFlow integrations
  - GPU Support
  - Full integration with Spark ML functions
- - +31000 pre-trained models in +200 languages!
+ - +30000 pre-trained models in +200 languages!
  - +6000 pre-trained pipelines in +200 languages!
  - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
  Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
@@ -198,7 +197,7 @@ To use Spark NLP you need the following requirements:

  **GPU (optional):**

- Spark NLP 5.4.0 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:
+ Spark NLP 5.4.0-rc1 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:

  - NVIDIA® GPU drivers version 450.80.02 or higher
  - CUDA® Toolkit 11.2
@@ -214,7 +213,7 @@ $ java -version
  $ conda create -n sparknlp python=3.7 -y
  $ conda activate sparknlp
  # spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.4.0 pyspark==3.3.1
+ $ pip install spark-nlp==5.4.0-rc1 pyspark==3.3.1
  ```

  In Python console or Jupyter `Python3` kernel:
@@ -259,11 +258,10 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh

  ## Apache Spark Support

- Spark NLP *5.4.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+ Spark NLP *5.4.0-rc1* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x

  | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
  |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
- | 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
  | 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
  | 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
  | 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO |
@@ -273,6 +271,12 @@ Spark NLP *5.4.0* has been built on top of Apache Spark 3.4 while fully supports
  | 4.2.x | NO | NO | YES | YES | YES | YES | NO | NO |
  | 4.1.x | NO | NO | YES | YES | YES | YES | NO | NO |
  | 4.0.x | NO | NO | YES | YES | YES | YES | NO | NO |
+ | 3.4.x | NO | NO | N/A | Partially | YES | YES | YES | YES |
+ | 3.3.x | NO | NO | NO | NO | YES | YES | YES | YES |
+ | 3.2.x | NO | NO | NO | NO | YES | YES | YES | YES |
+ | 3.1.x | NO | NO | NO | NO | YES | YES | YES | YES |
+ | 3.0.x | NO | NO | NO | NO | YES | YES | YES | YES |
+ | 2.7.x | NO | NO | NO | NO | NO | NO | YES | YES |

  Find out more about `Spark NLP` versions from our [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases).

@@ -289,10 +293,16 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
  | 4.2.x | YES | YES | YES | YES | YES | NO | YES |
  | 4.1.x | YES | YES | YES | YES | NO | NO | YES |
  | 4.0.x | YES | YES | YES | YES | NO | NO | YES |
+ | 3.4.x | YES | YES | YES | YES | NO | YES | YES |
+ | 3.3.x | YES | YES | YES | NO | NO | YES | YES |
+ | 3.2.x | YES | YES | YES | NO | NO | YES | YES |
+ | 3.1.x | YES | YES | YES | NO | NO | YES | YES |
+ | 3.0.x | YES | YES | YES | NO | NO | YES | YES |
+ | 2.7.x | YES | YES | NO | NO | NO | YES | NO |

  ## Databricks Support

- Spark NLP 5.4.0 has been tested and is compatible with the following runtimes:
+ Spark NLP 5.4.0-rc1 has been tested and is compatible with the following runtimes:

  **CPU:**

@@ -365,7 +375,7 @@ Spark NLP 5.4.0 has been tested and is compatible with the following runtimes:

  ## EMR Support

- Spark NLP 5.4.0 has been tested and is compatible with the following EMR releases:
+ Spark NLP 5.4.0-rc1 has been tested and is compatible with the following EMR releases:

  - emr-6.2.0
  - emr-6.3.0
@@ -415,11 +425,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
  ```sh
  # CPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  The `spark-nlp` has been published to
@@ -428,11 +438,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # GPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1

  ```

@@ -442,11 +452,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # AArch64

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1

  ```

@@ -456,11 +466,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
  ```sh
  # M1/M2 (Apple Silicon)

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1

  ```

@@ -474,7 +484,7 @@ set in your SparkSession:
  spark-shell \
  --driver-memory 16g \
  --conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  ## Scala
@@ -492,7 +502,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp_2.12</artifactId>
- <version>5.4.0</version>
+ <version>5.4.0-rc1</version>
  </dependency>
  ```

@@ -503,7 +513,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-gpu_2.12</artifactId>
- <version>5.4.0</version>
+ <version>5.4.0-rc1</version>
  </dependency>
  ```

@@ -514,7 +524,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-aarch64_2.12</artifactId>
- <version>5.4.0</version>
+ <version>5.4.0-rc1</version>
  </dependency>
  ```

@@ -525,7 +535,7 @@ coordinates:
  <dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-silicon_2.12</artifactId>
- <version>5.4.0</version>
+ <version>5.4.0-rc1</version>
  </dependency>
  ```

@@ -535,28 +545,28 @@ coordinates:

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc1"
  ```

  **spark-nlp-gpu:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc1"
  ```

  **spark-nlp-aarch64:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc1"
  ```

  **spark-nlp-silicon:**

  ```sbtshell
  // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc1"
  ```

  Maven
@@ -578,7 +588,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
  Pip:

  ```bash
- pip install spark-nlp==5.4.0
+ pip install spark-nlp==5.4.0-rc1
  ```

  Conda:
@@ -607,7 +617,7 @@ spark = SparkSession.builder
  .config("spark.driver.memory", "16G")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1")
  .getOrCreate()
  ```

@@ -678,7 +688,7 @@ Use either one of the following options
  - Add the following Maven Coordinates to the interpreter's library list

  ```bash
- com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is
@@ -689,7 +699,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
  Apart from the previous step, install the python module through pip

  ```bash
- pip install spark-nlp==5.4.0
+ pip install spark-nlp==5.4.0-rc1
  ```

  Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -717,7 +727,7 @@ launch the Jupyter from the same Python environment:
  $ conda create -n sparknlp python=3.8 -y
  $ conda activate sparknlp
  # spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.4.0 pyspark==3.3.1 jupyter
+ $ pip install spark-nlp==5.4.0-rc1 pyspark==3.3.1 jupyter
  $ jupyter notebook
  ```

@@ -734,7 +744,7 @@ export PYSPARK_PYTHON=python3
  export PYSPARK_DRIVER_PYTHON=jupyter
  export PYSPARK_DRIVER_PYTHON_OPTS=notebook

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -761,7 +771,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
  # -s is for spark-nlp
  # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
  # by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc1
  ```

  [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)
@@ -784,7 +794,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
  # -s is for spark-nlp
  # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
  # by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc1
  ```

  [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live
@@ -803,9 +813,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP

  3. In `Libraries` tab inside your cluster you need to follow these steps:

- 3.1. Install New -> PyPI -> `spark-nlp==5.4.0` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==5.4.0-rc1` -> Install

- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1` -> Install

  4. Now you can attach your notebook to the cluster and use Spark NLP!

@@ -856,7 +866,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
  "spark.kryoserializer.buffer.max": "2000M",
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1"
  }
  }]
  ```
@@ -865,7 +875,7 @@ A sample of AWS CLI to launch EMR cluster:

  ```.sh
  aws emr create-cluster \
- --name "Spark NLP 5.4.0" \
+ --name "Spark NLP 5.4.0-rc1" \
  --release-label emr-6.2.0 \
  --applications Name=Hadoop Name=Spark Name=Hive \
  --instance-type m4.4xlarge \
@@ -929,7 +939,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
  --enable-component-gateway \
  --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
  --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -972,7 +982,7 @@ spark = SparkSession.builder
  .config("spark.kryoserializer.buffer.max", "2000m")
  .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
  .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1")
  .getOrCreate()
  ```

@@ -986,7 +996,7 @@ spark-shell \
  --conf spark.kryoserializer.buffer.max=2000M \
  --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
  --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  **pyspark:**
@@ -999,7 +1009,7 @@ pyspark \
  --conf spark.kryoserializer.buffer.max=2000M \
  --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
  --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
  ```

  **Databricks:**
@@ -1271,7 +1281,7 @@ spark = SparkSession.builder
  .config("spark.driver.memory", "16G")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0.jar")
+ .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc1.jar")
  .getOrCreate()
  ```

@@ -1280,7 +1290,7 @@ spark = SparkSession.builder
  version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
  - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need
  to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
- i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0.jar`)
+ i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc1.jar`)

  Example of using pretrained Models and Pipelines in offline:

@@ -1,9 +1,7 @@
  com/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  com/johnsnowlabs/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- com/johnsnowlabs/ml/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- com/johnsnowlabs/ml/ai/__init__.py,sha256=YQiK2M7U4d8y5irPy_HB8ae0mSpqS9583MH44pnKJXc,295
  com/johnsnowlabs/nlp/__init__.py,sha256=DPIVXtONO5xXyOk-HB0-sNiHAcco17NN13zPS_6Uw8c,294
- sparknlp/__init__.py,sha256=lRQR3K0noT97MQlXrjnJEgvD4QIvuUUMrbC7VCND4w4,13638
+ sparknlp/__init__.py,sha256=LfjmvNVvUTsHxDW1JoM5qb3yWDjqz97vHf1iSybI1b4,13596
  sparknlp/annotation.py,sha256=I5zOxG5vV2RfPZfqN9enT1i4mo6oBcn3Lrzs37QiOiA,5635
  sparknlp/annotation_audio.py,sha256=iRV_InSVhgvAwSRe9NTbUH9v6OGvTM-FPCpSAKVu0mE,1917
  sparknlp/annotation_image.py,sha256=xhCe8Ko-77XqWVuuYHFrjKqF6zPd8Z-RY_rmZXNwCXU,2547
@@ -30,7 +28,7 @@ sparknlp/annotator/audio/__init__.py,sha256=dXjtvi5c0aTZFq1Q_JciUd1uFTBVSJoUdcq0
  sparknlp/annotator/audio/hubert_for_ctc.py,sha256=76PfwPZZvOHU5kfDqLueCFbmqa4W8pMNRGoCvOqjsEA,7859
  sparknlp/annotator/audio/wav2vec2_for_ctc.py,sha256=K78P1U6vA4O1UufsLYzy0H7arsKNmwPcIV7kzDFsA5Q,6210
  sparknlp/annotator/audio/whisper_for_ctc.py,sha256=uII51umuohqwnAW0Q7VdxEFyr_j5LMnfpcRlf8TbetA,9800
- sparknlp/annotator/classifier_dl/__init__.py,sha256=dsbceLBdAsk0VlvgcCcGANHMcyFMKi7-sdyu-Eg41ws,3763
+ sparknlp/annotator/classifier_dl/__init__.py,sha256=tGg78A8LUgobZFre_3ySN51KGNyl0Zx0inxT9yfL_g8,3686
  sparknlp/annotator/classifier_dl/albert_for_question_answering.py,sha256=LG2dL6Fky1T35yXTUZBfIihIIGnkRFQ7ECQ3HRXXEG8,6517
  sparknlp/annotator/classifier_dl/albert_for_sequence_classification.py,sha256=kWx7f9pcKE2qw319gn8FN0Md5dX38gbmfeoY9gWCLNk,7842
  sparknlp/annotator/classifier_dl/albert_for_token_classification.py,sha256=5rdsjWnsAVmtP-idU7ATKJ8lkH2rtlKZLnpi4Mq27eI,6839
@@ -56,7 +54,6 @@ sparknlp/annotator/classifier_dl/longformer_for_sequence_classification.py,sha25
  sparknlp/annotator/classifier_dl/longformer_for_token_classification.py,sha256=RmiFuBRhIAoJoQ8Rgcu997-PxBK1hhWuLVlS1qztMyk,6848
  sparknlp/annotator/classifier_dl/mpnet_for_question_answering.py,sha256=w9hHLrQbDIUHAdCKiXNDneAbohMKopixAKU2wkYkqbs,5522
  sparknlp/annotator/classifier_dl/mpnet_for_sequence_classification.py,sha256=M__giFElL6Q3I88QD6OoXDzdQDk_Zp5sS__Kh_XpLdo,7308
- sparknlp/annotator/classifier_dl/mpnet_for_token_classification.py,sha256=SgFAJcv7ZE3BmJOehK_CjAaueqaaK6PR33zA5aE9-Ww,6754
  sparknlp/annotator/classifier_dl/multi_classifier_dl.py,sha256=ylKQzS7ROyeKeiOF4BZiIkQV1sfrnfUUQ9LXFSFK_Vo,16045
  sparknlp/annotator/classifier_dl/roberta_for_question_answering.py,sha256=WRxu1uhXnY9C4UHdtJ8qiVGhPSX7sCdSaML0AWHOdJw,6471
  sparknlp/annotator/classifier_dl/roberta_for_sequence_classification.py,sha256=z97uH5WkG8kPX1Y9qtpLwD7egl0kzbVmxtq4xzZgNNI,7857
@@ -66,7 +63,7 @@ sparknlp/annotator/classifier_dl/sentiment_dl.py,sha256=6Z7X3-ykxoaUz6vz-YIXkv2m
  sparknlp/annotator/classifier_dl/tapas_for_question_answering.py,sha256=2YBODMDUZT-j5ceOFTixrEkOqrztIM1kU-tsW_wao18,6317
  sparknlp/annotator/classifier_dl/xlm_roberta_for_question_answering.py,sha256=t_zCnKGCjDccKNj_2mjRkysOaNCWNBMKXehbuFSphQc,6538
  sparknlp/annotator/classifier_dl/xlm_roberta_for_sequence_classification.py,sha256=sudgwa8_QZQzaYvEMSt6J1bDDwyK2Hp1VFhh98P08hY,7930
- sparknlp/annotator/classifier_dl/xlm_roberta_for_token_classification.py,sha256=ub5mMiZYKP4eBmXRzjkjfv_FFFR8E01XJs0RC__RxPo,6808
+ sparknlp/annotator/classifier_dl/xlm_roberta_for_token_classification.py,sha256=pe4Y1XDxDMQs1q32bwhbPC5_oKcJ4n5JFu-dsofLdSA,6850
  sparknlp/annotator/classifier_dl/xlm_roberta_for_zero_shot_classification.py,sha256=4dBzpPj-VJcZul5hGcyjYkVMQ1PiaXZEGwvEaob3rss,8899
  sparknlp/annotator/classifier_dl/xlnet_for_sequence_classification.py,sha256=CI9Ah2lyHkqwDHWGCbkk_gPbQd0NudpC7oXiHtWOucs,7811
  sparknlp/annotator/classifier_dl/xlnet_for_token_classification.py,sha256=SndQpIfslsSYEOX-myLjpUS6-wVIeDG8MOhJYcu2_7M,6739
@@ -85,17 +82,17 @@ sparknlp/annotator/embeddings/__init__.py,sha256=XQ6-UMsfvH54u3f0yceKiM8XJOAugIT
  sparknlp/annotator/embeddings/albert_embeddings.py,sha256=6Rd1LIn8oFIpq_ALcJh-RUjPEO7Ht8wsHY6JHSFyMkw,9995
  sparknlp/annotator/embeddings/bert_embeddings.py,sha256=HVUjkg56kBcpGZCo-fmPG5uatMDF3swW_lnbpy1SgSI,8463
  sparknlp/annotator/embeddings/bert_sentence_embeddings.py,sha256=NQy9KuXT9aKsTpYCR5RAeoFWI2YqEGorbdYrf_0KKmw,9148
- sparknlp/annotator/embeddings/bge_embeddings.py,sha256=hXFFd9HOru1w2L9N5YGSZlaKyxqMsZccpaI4Z8-bNUE,7919
+ sparknlp/annotator/embeddings/bge_embeddings.py,sha256=FNmYxcynM1iLJvg5ZNmrZKkyIF0Gtr7G-CgZ72mrVyU,7842
  sparknlp/annotator/embeddings/camembert_embeddings.py,sha256=dBTXas-2Tas_JUR9Xt_GtHLcyqi_cdvT5EHRnyVrSSQ,8817
  sparknlp/annotator/embeddings/chunk_embeddings.py,sha256=WUmkJimSuFkdcLJnvcxOV0QlCLgGlhub29ZTrZb70WE,6052
  sparknlp/annotator/embeddings/deberta_embeddings.py,sha256=_b5nzLb7heFQNN-uT2oBNO6-YmM8bHmAdnGXg47HOWw,8649
  sparknlp/annotator/embeddings/distil_bert_embeddings.py,sha256=4pyMCsbvvXYeTGIMVUir9wCDKR_1f_HKtXZrTDO1Thc,9275
  sparknlp/annotator/embeddings/doc2vec.py,sha256=Xk3MdEkXatX9lRgbFbAdnIDrLgIxzUIGWFBZeo9BTq0,13226
- sparknlp/annotator/embeddings/e5_embeddings.py,sha256=Esuvrq9JlogGaSSzFVVDkOFMwgYwFwr17I62ZiCDm0k,7858
+ sparknlp/annotator/embeddings/e5_embeddings.py,sha256=_f5k-EDa_zSox4INeLBGS3gYO16WrVVKBsU0guVqxkk,7779
  sparknlp/annotator/embeddings/elmo_embeddings.py,sha256=KV-KPs0Pq_OpPaHsnqBz2k_S7VdzyFZ4632IeFNKqJ8,9858
  sparknlp/annotator/embeddings/instructor_embeddings.py,sha256=CTKmbuBOx_KBM4JM-Y1U5LyR-6rrnpoBGbgGE_axS1c,8670
  sparknlp/annotator/embeddings/longformer_embeddings.py,sha256=jS4fxB5O0-d9ta9VKv8ai-17n5YHt5rML8QxUw7K4Io,8754
- sparknlp/annotator/embeddings/mpnet_embeddings.py,sha256=7d6E4lS7jjkppDPvty1UHNNrbykkriFiysrxZ_RzL0U,7875
+ sparknlp/annotator/embeddings/mpnet_embeddings.py,sha256=2sabImn5spYGzfNwBSH2zUU90Wjqrm2btCVbDbtsqPg,7796
  sparknlp/annotator/embeddings/roberta_embeddings.py,sha256=q_WHby2lDcPc5bVHkGc6X_GwT3qyDUBLUVz5ZW4HCSY,9229
  sparknlp/annotator/embeddings/roberta_sentence_embeddings.py,sha256=KVrD4z_tIU-sphK6dmbbnHBBt8-Y89C_BFQAkN99kZo,8181
  sparknlp/annotator/embeddings/sentence_embeddings.py,sha256=azuA1FKMtTJ9suwJqTEHeWHumT6kYdfURTe_1fsqcB8,5402
@@ -127,7 +124,7 @@ sparknlp/annotator/ner/ner_overwriter.py,sha256=en5OxXIP46yTXokIE96YDP9kcHA9oxiR
  sparknlp/annotator/ner/zero_shot_ner_model.py,sha256=DohhnkGSG-JxjW72t8AOx3GY7R_qT-LA3I0KF9TBz-Y,7501
  sparknlp/annotator/openai/__init__.py,sha256=u6SpV_xS8UpBE95WnTl0IefOI5TrTRl7ZHuYoeTetiA,759
  sparknlp/annotator/openai/openai_completion.py,sha256=OqDODelDAxlS66a4mAqJqXMFlEhaeiKZD4XBzR98k-g,16859
- sparknlp/annotator/openai/openai_embeddings.py,sha256=i1ABDRmK6vMzzWP1rVxFiWnvXG4zfrTGGDjq4lvWQeE,108802
+ sparknlp/annotator/openai/openai_embeddings.py,sha256=TJgd6sLfUWqJz6fd3jGfoKb-j2nrzzJbhr1S-e-71MI,109860
  sparknlp/annotator/param/__init__.py,sha256=MKBZs6NWRKxrpeof3Jr4PVmoa75wyRSdWzSt0A9lpfY,750
  sparknlp/annotator/param/classifier_encoder.py,sha256=PDyOdUX2GOFVr6MLtB7RUPBdtDrzDNJNRe_r9bY5JpE,3005
  sparknlp/annotator/param/evaluation_dl_params.py,sha256=qxMP_98zaKbO1Y20yOvvarmrTCiU24VskJRo8NNI9CA,4998
@@ -139,14 +136,12 @@ sparknlp/annotator/sentence/sentence_detector_dl.py,sha256=-Osj9Bm9KyZRTAWkOsK9c
  sparknlp/annotator/sentiment/__init__.py,sha256=Lq3vKaZS1YATLMg0VNXSVtkWL5q5G9taGBvdrvSwnfg,766
  sparknlp/annotator/sentiment/sentiment_detector.py,sha256=m545NGU0Xzg_PO6_qIfpli1uZj7JQcyFgqe9R6wAPFI,8154
  sparknlp/annotator/sentiment/vivekn_sentiment.py,sha256=4rpXWDgzU6ddnbrSCp9VdLb2epCc9oZ3c6XcqxEw8nk,9655
- sparknlp/annotator/seq2seq/__init__.py,sha256=3pF-b9ubgAs8ofggiNyuc1NQseq_oe231UVjVkZWTmU,1130
+ sparknlp/annotator/seq2seq/__init__.py,sha256=UQK-_3wLkUdW1piGudCx1_k3Tg3tERZJYOBnfMRj8pA,1011
  sparknlp/annotator/seq2seq/bart_transformer.py,sha256=I1flM4yeCzEAKOdQllBC30XuedxVJ7ferkFhZ6gwEbE,18481
  sparknlp/annotator/seq2seq/gpt2_transformer.py,sha256=Oz95R_NRR4tWHu_bW6Ak2832ZILXycp3ify7LfRSi8o,15310
  sparknlp/annotator/seq2seq/llama2_transformer.py,sha256=3LzTR0VerFdFmOizsrs2Q7HTnjELJ5WtfUgx5XnOqGM,13898
- sparknlp/annotator/seq2seq/m2m100_transformer.py,sha256=uIL9RZuuryTIdAy9TbJf9wbz6RekhW8S079bJhaB6i4,16116
+ sparknlp/annotator/seq2seq/m2m100_transformer.py,sha256=fTFGFWaFfJt5kaLvnYknf_23PVyjBuha48asFsE_NaE,16082
  sparknlp/annotator/seq2seq/marian_transformer.py,sha256=mQ4Ylh7ZzXAOue8f-x0gqzfS3vAz3XUdD7eQ2XhcEs4,13781
- sparknlp/annotator/seq2seq/mistral_transformer.py,sha256=hq5-Emut7qYnwFolYQ6cFOEY4j5-8PdlPi2Vs72qCig,14254
- sparknlp/annotator/seq2seq/phi2_transformer.py,sha256=YuqEcvJunKKZMmfqD3thXHR5FsPbqjjwbHFExWjbDWk,13796
  sparknlp/annotator/seq2seq/t5_transformer.py,sha256=wDVxNLluIU1HGZFqaKKc4YTt4l-elPlAtQ7EEa0f5tg,17308
  sparknlp/annotator/similarity/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  sparknlp/annotator/similarity/document_similarity_ranker.py,sha256=OFAXEBuALFJglwThsGK8YaJ_pgW1tcevB7jVq-8SyKM,14991
@@ -188,7 +183,7 @@ sparknlp/common/read_as.py,sha256=imxPGwV7jr4Li_acbo0OAHHRGCBbYv-akzEGaBWEfcY,12
  sparknlp/common/recursive_annotator_approach.py,sha256=vqugBw22cE3Ff7PIpRlnYFuOlchgL0nM26D8j-NdpqU,1449
  sparknlp/common/storage.py,sha256=D91H3p8EIjNspjqAYu6ephRpCUtdcAir4_PrAbkIQWE,4842
  sparknlp/common/utils.py,sha256=Yne6yYcwKxhOZC-U4qfYoDhWUP_6BIaAjI5X_P_df1E,1306
- sparknlp/internal/__init__.py,sha256=X38S3vTHB0c4EkzczDv-J7hpJl0g6A9Xe_3u8jGJTCU,30239
+ sparknlp/internal/__init__.py,sha256=7kGauV0ncpqnrPzagaXefApSKyzuxYZTO1myeVZ6LJ8,26929
  sparknlp/internal/annotator_java_ml.py,sha256=UGPoThG0rGXUOXGSQnDzEDW81Mu1s5RPF29v7DFyE3c,1187
  sparknlp/internal/annotator_transformer.py,sha256=fXmc2IWXGybqZpbEU9obmbdBYPc798y42zvSB4tqV9U,1448
  sparknlp/internal/extended_java_wrapper.py,sha256=hwP0133-hDiDf5sBF-P3MtUsuuDj1PpQbtGZQIRwzfk,2240
@@ -230,8 +225,8 @@ sparknlp/training/_tf_graph_builders_1x/ner_dl/dataset_encoder.py,sha256=R4yHFN3
  sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model.py,sha256=EoCSdcIjqQ3wv13MAuuWrKV8wyVBP0SbOEW41omHlR0,23189
  sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model_saver.py,sha256=k5CQ7gKV6HZbZMB8cKLUJuZxoZWlP_DFWdZ--aIDwsc,2356
  sparknlp/training/_tf_graph_builders_1x/ner_dl/sentence_grouper.py,sha256=pAxjWhjazSX8Vg0MFqJiuRVw1IbnQNSs-8Xp26L4nko,870
- spark_nlp-5.4.0.dist-info/.uuid,sha256=1f6hF51aIuv9yCvh31NU9lOpS34NE-h3a0Et7R9yR6A,36
- spark_nlp-5.4.0.dist-info/METADATA,sha256=fzEL08vmQeHH_Y9OCF3QfU_CWtGtWk5bexyFxXOGoSs,55595
- spark_nlp-5.4.0.dist-info/WHEEL,sha256=bb2Ot9scclHKMOLDEHY6B2sicWOgugjFKaJsT7vwMQo,110
- spark_nlp-5.4.0.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
- spark_nlp-5.4.0.dist-info/RECORD,,
+ spark_nlp-5.4.0rc1.dist-info/.uuid,sha256=1f6hF51aIuv9yCvh31NU9lOpS34NE-h3a0Et7R9yR6A,36
+ spark_nlp-5.4.0rc1.dist-info/METADATA,sha256=cEBGxVSbWrCQInnlujccn9xgqznzSHsX1QiNjQJmobU,57266
+ spark_nlp-5.4.0rc1.dist-info/WHEEL,sha256=bb2Ot9scclHKMOLDEHY6B2sicWOgugjFKaJsT7vwMQo,110
+ spark_nlp-5.4.0rc1.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
+ spark_nlp-5.4.0rc1.dist-info/RECORD,,
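The RECORD entries above pair each file path with a `sha256=` digest and a size in bytes. Per the wheel/installed-distribution record format (PEP 376 and the binary distribution spec), the digest field is the urlsafe base64 encoding of the file's SHA-256 hash with trailing `=` padding stripped, which is why every zero-byte `__init__.py` carries the same `47DEQpj8...` value. A small sketch of that encoding:

```python
# Compute a wheel RECORD-style hash field: urlsafe base64 of the SHA-256
# digest, with the trailing "=" padding removed.
import base64
import hashlib

def record_hash(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# A zero-byte file yields the well-known empty-input SHA-256, matching the
# `47DEQpj8...,0` entries for the empty __init__.py files listed above.
assert record_hash(b"") == "sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU"
```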
sparknlp/__init__.py CHANGED
@@ -58,7 +58,6 @@ sys.modules['com.johnsnowlabs.nlp.annotators.er'] = annotator
  sys.modules['com.johnsnowlabs.nlp.annotators.coref'] = annotator
  sys.modules['com.johnsnowlabs.nlp.annotators.cv'] = annotator
  sys.modules['com.johnsnowlabs.nlp.annotators.audio'] = annotator
- sys.modules['com.johnsnowlabs.ml.ai'] = annotator

  annotators = annotator
  embeddings = annotator
@@ -129,7 +128,7 @@ def start(gpu=False,
  The initiated Spark session.

  """
- current_version = "5.4.0"
+ current_version = "5.4.0-rc1"

  if params is None:
  params = {}
@@ -310,4 +309,4 @@ def version():
  str
  The current Spark NLP version.
  """
- return '5.4.0'
+ return '5.4.0-rc1'
@@ -51,4 +51,3 @@ from sparknlp.annotator.classifier_dl.bart_for_zero_shot_classification import *
  from sparknlp.annotator.classifier_dl.deberta_for_zero_shot_classification import *
  from sparknlp.annotator.classifier_dl.mpnet_for_sequence_classification import *
  from sparknlp.annotator.classifier_dl.mpnet_for_question_answering import *
- from sparknlp.annotator.classifier_dl.mpnet_for_token_classification import *
@@ -31,7 +31,7 @@ class XlmRoBertaForTokenClassification(AnnotatorModel,
  >>> token_classifier = XlmRoBertaForTokenClassification.pretrained() \\
  ... .setInputCols(["token", "document"]) \\
  ... .setOutputCol("label")
- The default model is ``"mpnet_base_token_classifier"``, if no
+ The default model is ``"xlm_roberta_base_token_classifier_conll03"``, if no
  name is provided.

  For available pretrained models please see the `Models Hub
@@ -150,14 +150,14 @@ class XlmRoBertaForTokenClassification(AnnotatorModel,
  return XlmRoBertaForTokenClassification(java_model=jModel)

  @staticmethod
- def pretrained(name="mpnet_base_token_classifier", lang="en", remote_loc=None):
+ def pretrained(name="xlm_roberta_base_token_classifier_conll03", lang="en", remote_loc=None):
  """Downloads and loads a pretrained model.

  Parameters
  ----------
  name : str, optional
  Name of the pretrained model, by default
- "mpnet_base_token_classifier"
+ "xlm_roberta_base_token_classifier_conll03"
  lang : str, optional
  Language of the pretrained model, by default "en"
  remote_loc : str, optional
@@ -26,8 +26,6 @@ class BGEEmbeddings(AnnotatorModel,

  BGE, or BAAI General Embeddings, a model that can map any text to a low-dimensional dense
  vector which can be used for tasks like retrieval, classification, clustering, or semantic search.
-
- Note that this annotator is only supported for Spark Versions 3.4 and up.

  Pretrained models can be loaded with `pretrained` of the companion object:

@@ -25,8 +25,6 @@ class E5Embeddings(AnnotatorModel,
  """Sentence embeddings using E5.

  E5, a weakly supervised text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.)
- Note that this annotator is only supported for Spark Versions 3.4 and up.
-
  Pretrained models can be loaded with :meth:`.pretrained` of the companion
  object:

@@ -28,8 +28,6 @@ class MPNetEmbeddings(AnnotatorModel,
  to inherit the advantages of masked language modeling and permuted language modeling for
  natural language understanding.

- Note that this annotator is only supported for Spark Versions 3.4 and up.
-
  Pretrained models can be loaded with :meth:`.pretrained` of the companion
  object: