spark-nlp 6.1.5__py2.py3-none-any.whl → 6.2.0__py2.py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

spark_nlp-6.2.0.dist-info/METADATA CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: spark-nlp
-Version: 6.1.5
+Version: 6.2.0
 Summary: John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
 Home-page: https://github.com/JohnSnowLabs/spark-nlp
 Author: John Snow Labs
@@ -102,7 +102,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==6.1.5 pyspark==3.3.1
+$ pip install spark-nlp==6.2.0 pyspark==3.3.1
 ```
 
 In Python console or Jupyter `Python3` kernel:
@@ -168,7 +168,7 @@ For a quick example of using pipelines and models take a look at our official [d
 
 ### Apache Spark Support
 
-Spark NLP *6.1.5* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *6.2.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
 
 | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -198,7 +198,7 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http
 
 ### Databricks Support
 
-Spark NLP 6.1.5 has been tested and is compatible with the following runtimes:
+Spark NLP 6.2.0 has been tested and is compatible with the following runtimes:
 
 | **CPU** | **GPU** |
 |--------------------|--------------------|
@@ -216,7 +216,7 @@ We are compatible with older runtimes. For a full list check databricks support
 
 ### EMR Support
 
-Spark NLP 6.1.5 has been tested and is compatible with the following EMR releases:
+Spark NLP 6.2.0 has been tested and is compatible with the following EMR releases:
 
 | **EMR Release** |
 |--------------------|
@@ -306,7 +306,7 @@ Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integr
 Need more **examples**? Check out our dedicated [Spark NLP Examples](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples)
 repository to showcase all Spark NLP use cases!
 
-Also, don't forget to check [Spark NLP in Action](https://sparknlp.org/demo) built by Streamlit.
+Also, don't forget to check [Spark NLP in Action](https://sparknlp.org/demos) built by Streamlit.
 
 #### All examples: [spark-nlp/examples](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples)
 
spark_nlp-6.2.0.dist-info/RECORD CHANGED
@@ -3,7 +3,7 @@ com/johnsnowlabs/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,
 com/johnsnowlabs/ml/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 com/johnsnowlabs/ml/ai/__init__.py,sha256=YQiK2M7U4d8y5irPy_HB8ae0mSpqS9583MH44pnKJXc,295
 com/johnsnowlabs/nlp/__init__.py,sha256=DPIVXtONO5xXyOk-HB0-sNiHAcco17NN13zPS_6Uw8c,294
-sparknlp/__init__.py,sha256=DDZfgU_lB6Bzk_7AAy81w1hkRwYlrOerC4dP-hQFpm8,13814
+sparknlp/__init__.py,sha256=6cuRDo27cGHCq7oJzF7sAB4sxm8jd9e8ciB_UH1dRT0,13814
 sparknlp/annotation.py,sha256=I5zOxG5vV2RfPZfqN9enT1i4mo6oBcn3Lrzs37QiOiA,5635
 sparknlp/annotation_audio.py,sha256=iRV_InSVhgvAwSRe9NTbUH9v6OGvTM-FPCpSAKVu0mE,1917
 sparknlp/annotation_image.py,sha256=xhCe8Ko-77XqWVuuYHFrjKqF6zPd8Z-RY_rmZXNwCXU,2547
@@ -16,7 +16,7 @@ sparknlp/annotator/chunker.py,sha256=8nz9B7R_mxKxcfJRfKvz2x_T29W3u4izE9k0wfYPzgE
 sparknlp/annotator/dataframe_optimizer.py,sha256=P4GySLzz1lRCZX0UBRF9_IDuXlRS1XvRWz-B2L0zqMA,7771
 sparknlp/annotator/date2_chunk.py,sha256=tW3m_LExmhx8LMFWOGXqMyfNRXSr2dnoEHD-6DrnpXI,3153
 sparknlp/annotator/document_character_text_splitter.py,sha256=oNrOKJAKO2h1wr0bEuSqYrrltIU_Y6J6cTHy70yKy6s,9877
-sparknlp/annotator/document_normalizer.py,sha256=hU2fG6vaPfdngQapoeSu-_zS_LiBZNp2tcVBGl6eTpk,10973
+sparknlp/annotator/document_normalizer.py,sha256=OOqPd6zp7FbtmlLHn1zAxPg9oxDzYRPKLYKr5k0Y5ck,12155
 sparknlp/annotator/document_token_splitter.py,sha256=-9xbQ9pVAjcKHQQrSk6Cb7f8W1cblCLwWXTNR8kFptA,7499
 sparknlp/annotator/document_token_splitter_test.py,sha256=NWO9mwhAIUJFuxPofB3c39iUm_6vKp4pteDsBOTH8ng,2684
 sparknlp/annotator/graph_extraction.py,sha256=b4SB3B_hFgCJT4e5Jcscyxdzfbvw3ujKTa6UNgX5Lhc,14471
@@ -105,7 +105,7 @@ sparknlp/annotator/dependency/dependency_parser.py,sha256=SxyvHPp8Hs1Xnm5X1nLTMi
 sparknlp/annotator/dependency/typed_dependency_parser.py,sha256=60vPdYkbFk9MPGegg3m9Uik9cMXpMZd8tBvXG39gNww,12456
 sparknlp/annotator/embeddings/__init__.py,sha256=Aw1oaP5DI0OS6259c0TEZZ6j3VFSvYFEerah5a-udVw,2528
 sparknlp/annotator/embeddings/albert_embeddings.py,sha256=6Rd1LIn8oFIpq_ALcJh-RUjPEO7Ht8wsHY6JHSFyMkw,9995
-sparknlp/annotator/embeddings/auto_gguf_embeddings.py,sha256=TRAYbhGS4K8uSpsScvDr6uD3lYdxMpCUjwDMhV_74rM,19977
+sparknlp/annotator/embeddings/auto_gguf_embeddings.py,sha256=-64uQKkvWsE2By3LEP9Hv10Eox10QAyVz0vSc_BduvY,20146
 sparknlp/annotator/embeddings/bert_embeddings.py,sha256=HVUjkg56kBcpGZCo-fmPG5uatMDF3swW_lnbpy1SgSI,8463
 sparknlp/annotator/embeddings/bert_sentence_embeddings.py,sha256=NQy9KuXT9aKsTpYCR5RAeoFWI2YqEGorbdYrf_0KKmw,9148
 sparknlp/annotator/embeddings/bge_embeddings.py,sha256=ZGbxssjJFaSfbcgqAPV5hsu81SnC0obgCVNOoJkArDA,8105
@@ -135,7 +135,7 @@ sparknlp/annotator/embeddings/xlm_roberta_embeddings.py,sha256=S2HHXOrSFXMAyloZU
 sparknlp/annotator/embeddings/xlm_roberta_sentence_embeddings.py,sha256=ojxD3H2VgDEn-RzDdCz0X485pojHBAFrlzsNemI05bY,8602
 sparknlp/annotator/embeddings/xlnet_embeddings.py,sha256=hJrlsJeO3D7uz54xiEiqqXEbq24YGuWz8U652PV9fNE,9336
 sparknlp/annotator/er/__init__.py,sha256=eF9Z-PanVfZWSVN2HSFbE7QjCDb6NYV5ESn6geYKlek,692
-sparknlp/annotator/er/entity_ruler.py,sha256=7eZtAwoixkl88jTyKEqTKf9Wzo459VXQkYmFBozUY6A,8784
+sparknlp/annotator/er/entity_ruler.py,sha256=eg9-I9yWQ_vjaKI5g5T4s575VZEjN1Sq7WJJpCImSVg,10007
 sparknlp/annotator/keyword_extraction/__init__.py,sha256=KotCR238x7LgisinsRGaARgPygWUIwC624FmH-sHacE,720
 sparknlp/annotator/keyword_extraction/yake_keyword_extraction.py,sha256=oeB-8qdMoljG-mgFOCsfnpxyK5jFBZnX7jAUQwsnHTc,13215
 sparknlp/annotator/ld_dl/__init__.py,sha256=gWNGOaozABT83J4Mn7JmNQsXzm27s3PHpMQmlXl-5L8,704
@@ -169,9 +169,9 @@ sparknlp/annotator/sentiment/__init__.py,sha256=Lq3vKaZS1YATLMg0VNXSVtkWL5q5G9ta
 sparknlp/annotator/sentiment/sentiment_detector.py,sha256=m545NGU0Xzg_PO6_qIfpli1uZj7JQcyFgqe9R6wAPFI,8154
 sparknlp/annotator/sentiment/vivekn_sentiment.py,sha256=4rpXWDgzU6ddnbrSCp9VdLb2epCc9oZ3c6XcqxEw8nk,9655
 sparknlp/annotator/seq2seq/__init__.py,sha256=aDiph00Hyq7L8uDY0frtyuHtqFodBqTMbixx_nq4z1I,1841
-sparknlp/annotator/seq2seq/auto_gguf_model.py,sha256=yhZQHMHfp88rQvLHTWyS-8imZrwqp-8RQQwnw6PmHfc,11749
-sparknlp/annotator/seq2seq/auto_gguf_reranker.py,sha256=MS4wCm2A2YiQfkB4HVVZKuN-3A1yGzqSCF69nu7J2rQ,12640
-sparknlp/annotator/seq2seq/auto_gguf_vision_model.py,sha256=swBek2026dW6BOX5O9P8Uq41X2GC71VGW0ADFeUIvs0,15299
+sparknlp/annotator/seq2seq/auto_gguf_model.py,sha256=FaKxJaF7BdlQcf3T-nPZWnXRClF8dcYa71QHIaXFigI,11912
+sparknlp/annotator/seq2seq/auto_gguf_reranker.py,sha256=a_70sNooY_9N6KHXVeuM4cDEbHVDlHa1KUWwu0A-l9s,12809
+sparknlp/annotator/seq2seq/auto_gguf_vision_model.py,sha256=59UZKJbI6oYnSNkk2qqf1nhHtB8h3upGRcjZJyl9bGQ,15494
 sparknlp/annotator/seq2seq/bart_transformer.py,sha256=I1flM4yeCzEAKOdQllBC30XuedxVJ7ferkFhZ6gwEbE,18481
 sparknlp/annotator/seq2seq/cohere_transformer.py,sha256=43LZBVazZMgJRCsN7HaYjVYfJ5hRMV95QZyxMtXq-m4,13496
 sparknlp/annotator/seq2seq/cpm_transformer.py,sha256=0CnBFMlxMu0pD2QZMHyoGtIYgXqfUQm68vr6zEAa6Eg,13290
@@ -219,11 +219,12 @@ sparknlp/base/prompt_assembler.py,sha256=_C_9MdHqsxUjSOa3TqCV-6sSfSiRyhfHBQG5m7R
 sparknlp/base/recursive_pipeline.py,sha256=V9rTnu8KMwgjoceykN9pF1mKGtOkkuiC_n9v8dE3LDk,4279
 sparknlp/base/table_assembler.py,sha256=Kxu3R2fY6JgCxEc07ibsMsjip6dgcPDHLiWAZ8gC_d8,5102
 sparknlp/base/token_assembler.py,sha256=qiHry07L7mVCqeHSH6hHxLygv1AsfZIE4jy1L75L3Do,5075
-sparknlp/common/__init__.py,sha256=MJuE__T1YS8f3As7X5sgzHibGjDeiFkQ5vc2bEEf0Ww,1148
+sparknlp/common/__init__.py,sha256=bdnDseYWsKnsBk4KdO_NbPJshF_CeqhO2NFXV1Vu_Ts,1205
 sparknlp/common/annotator_approach.py,sha256=CbkyaWl6rRX_VaXz2xJCjofijRGJGeJCsqQTDQgNTAw,1765
 sparknlp/common/annotator_model.py,sha256=l1vDFi2m_WbWg47Jq0F8DygjndUQhv9Ftfcc8Iceb8s,1880
 sparknlp/common/annotator_properties.py,sha256=7B1os7pBUfHo6b7IPQAXQ-nir0u3tQLzDpAg83h_iqQ,4332
 sparknlp/common/annotator_type.py,sha256=ash2Ip1IOOiJamPVyy_XQj8Ja_DRHm0b9Vj4Ni75oKM,1225
+sparknlp/common/completion_post_processing.py,sha256=sqcjewfrpIBZ4KFQ1XPYJI7luHIStnv6PovkehFxeOg,1423
 sparknlp/common/coverage_result.py,sha256=No4PSh1HSs3PyRI1zC47x65tWgfirqPI290icHQoXEI,823
 sparknlp/common/match_strategy.py,sha256=kt1MUPqU1wCwk5qCdYk6jubHbU-5yfAYxb9jjAOrdnY,1678
 sparknlp/common/properties.py,sha256=7eBxODxKmFQAgOtrxUH9ly4LugUlkNRVXNQcM60AUK4,53025
@@ -285,7 +286,7 @@ sparknlp/training/_tf_graph_builders_1x/ner_dl/dataset_encoder.py,sha256=R4yHFN3
 sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model.py,sha256=EoCSdcIjqQ3wv13MAuuWrKV8wyVBP0SbOEW41omHlR0,23189
 sparknlp/training/_tf_graph_builders_1x/ner_dl/ner_model_saver.py,sha256=k5CQ7gKV6HZbZMB8cKLUJuZxoZWlP_DFWdZ--aIDwsc,2356
 sparknlp/training/_tf_graph_builders_1x/ner_dl/sentence_grouper.py,sha256=pAxjWhjazSX8Vg0MFqJiuRVw1IbnQNSs-8Xp26L4nko,870
-spark_nlp-6.1.5.dist-info/METADATA,sha256=0eZlMGP1ltriZNLI0gw5Amp2ByB_TJyLqTZza9E2pxY,19774
-spark_nlp-6.1.5.dist-info/WHEEL,sha256=JNWh1Fm1UdwIQV075glCn4MVuCRs0sotJIq-J6rbxCU,109
-spark_nlp-6.1.5.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
-spark_nlp-6.1.5.dist-info/RECORD,,
+spark_nlp-6.2.0.dist-info/METADATA,sha256=8UP-KdKAwIzGuwXPTaPgk3ytBpsjpSDWQI4kvfxrD7E,19775
+spark_nlp-6.2.0.dist-info/WHEEL,sha256=JNWh1Fm1UdwIQV075glCn4MVuCRs0sotJIq-J6rbxCU,109
+spark_nlp-6.2.0.dist-info/top_level.txt,sha256=uuytur4pyMRw2H_txNY2ZkaucZHUs22QF8-R03ch_-E,13
+spark_nlp-6.2.0.dist-info/RECORD,,
sparknlp/__init__.py CHANGED
@@ -66,7 +66,7 @@ sys.modules['com.johnsnowlabs.ml.ai'] = annotator
 annotators = annotator
 embeddings = annotator
 
-__version__ = "6.1.5"
+__version__ = "6.2.0"
 
 
 def start(gpu=False,
sparknlp/annotator/document_normalizer.py CHANGED
@@ -122,6 +122,21 @@ class DocumentNormalizer(AnnotatorModel):
                      "file encoding to apply on normalized documents",
                      typeConverter=TypeConverters.toString)
 
+    presetPattern = Param(
+        Params._dummy(),
+        "presetPattern",
+        "Selects a single text cleaning function from the functional presets (e.g., 'CLEAN_BULLETS', 'CLEAN_DASHES', etc.).",
+        typeConverter=TypeConverters.toString
+    )
+
+    autoMode = Param(
+        Params._dummy(),
+        "autoMode",
+        "Enables a predefined cleaning mode combining multiple text cleaner functions (e.g., 'light_clean', 'document_clean', 'html_clean', 'full_auto').",
+        typeConverter=TypeConverters.toString
+    )
+
+
     @keyword_only
     def __init__(self):
         super(DocumentNormalizer, self).__init__(classname="com.johnsnowlabs.nlp.annotators.DocumentNormalizer")
@@ -197,3 +212,24 @@ class DocumentNormalizer(AnnotatorModel):
             File encoding to apply on normalized documents, by default "UTF-8"
         """
         return self._set(encoding=value)
+
+    def setPresetPattern(self, value):
+        """Sets a single text cleaning preset pattern.
+
+        Parameters
+        ----------
+        value : str
+            Preset cleaning pattern name, e.g., 'CLEAN_BULLETS', 'CLEAN_DASHES'.
+        """
+        return self._set(presetPattern=value)
+
+
+    def setAutoMode(self, value):
+        """Sets an automatic text cleaning mode using predefined groups of cleaning functions.
+
+        Parameters
+        ----------
+        value : str
+            Auto cleaning mode, e.g., 'light_clean', 'document_clean', 'social_clean', 'html_clean', 'full_auto'.
+        """
+        return self._set(autoMode=value)
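The `presetPattern`/`autoMode` additions above map names to cleaning functions, applied either one at a time or as a predefined chain. A rough pure-Python sketch of those semantics follows; the helper functions and the exact contents of each preset here are illustrative assumptions, not Spark NLP's actual implementations (which run on the JVM side):

```python
import re

# Hypothetical stand-ins for the built-in presets.
def clean_bullets(text):
    # Strip leading bullet characters such as "-", "*", or "•".
    return re.sub(r"^\s*[-*\u2022]\s+", "", text, flags=re.MULTILINE)

def clean_dashes(text):
    # Collapse runs of dashes into a single space.
    return re.sub(r"-{2,}", " ", text)

def clean_extra_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()

# presetPattern selects one function by name ...
PRESETS = {
    "CLEAN_BULLETS": clean_bullets,
    "CLEAN_DASHES": clean_dashes,
}

# ... while autoMode chains several presets in order (assumed composition).
AUTO_MODES = {
    "light_clean": [clean_bullets, clean_dashes, clean_extra_whitespace],
}

def normalize(text, preset=None, auto_mode=None):
    fns = [PRESETS[preset]] if preset else AUTO_MODES.get(auto_mode, [])
    for fn in fns:
        text = fn(text)
    return text

print(normalize("• item one --- details", auto_mode="light_clean"))
```

With the real annotator this would instead be expressed as `DocumentNormalizer().setAutoMode("light_clean")` (or `.setPresetPattern("CLEAN_BULLETS")`) inside a Spark pipeline.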
sparknlp/annotator/embeddings/auto_gguf_embeddings.py CHANGED
@@ -532,3 +532,8 @@ class AutoGGUFEmbeddings(AnnotatorModel, HasBatchedAnnotate):
         return ResourceDownloader.downloadModel(
             AutoGGUFEmbeddings, name, lang, remote_loc
         )
+
+    def close(self):
+        """Closes the llama.cpp model backend freeing resources. The model is reloaded when used again.
+        """
+        self._java_obj.close()
sparknlp/annotator/er/entity_ruler.py CHANGED
@@ -215,6 +215,20 @@ class EntityRulerModel(AnnotatorModel, HasStorageModel):
 
     outputAnnotatorType = AnnotatorType.CHUNK
 
+    autoMode = Param(
+        Params._dummy(),
+        "autoMode",
+        "Enable built-in regex presets that combine related entity patterns (e.g., 'communication_entities', 'network_entities', 'media_entities', etc.).",
+        typeConverter=TypeConverters.toString
+    )
+
+    extractEntities = Param(
+        Params._dummy(),
+        "extractEntities",
+        "List of entity types to extract. If not set, all entities in the active autoMode or from regexPatterns are used.",
+        typeConverter=TypeConverters.toListString
+    )
+
     def __init__(self, classname="com.johnsnowlabs.nlp.annotators.er.EntityRulerModel", java_model=None):
         super(EntityRulerModel, self).__init__(
             classname=classname,
@@ -230,3 +244,24 @@ class EntityRulerModel(AnnotatorModel, HasStorageModel):
     def loadStorage(path, spark, storage_ref):
         HasStorageModel.loadStorages(path, spark, storage_ref, EntityRulerModel.database)
 
+
+    def setAutoMode(self, value):
+        """Sets the auto mode for predefined regex entity groups.
+
+        Parameters
+        ----------
+        value : str
+            Name of the auto mode to activate (e.g., 'communication_entities', 'network_entities', etc.)
+        """
+        return self._set(autoMode=value)
+
+
+    def setExtractEntities(self, value):
+        """Sets specific entities to extract, filtering only those defined in regexPatterns or autoMode.
+
+        Parameters
+        ----------
+        value : list[str]
+            List of entity names to extract, e.g., ['EMAIL_ADDRESS_PATTERN', 'IPV4_PATTERN'].
+        """
+        return self._set(extractEntities=value)
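The `autoMode`/`extractEntities` pair above selects a preset group of regex patterns and optionally narrows it to specific entity names. A minimal plain-Python sketch of that filtering logic follows; the regexes and the contents of the group are illustrative assumptions (only the names 'communication_entities', 'EMAIL_ADDRESS_PATTERN', and 'IPV4_PATTERN' come from the docstrings above):

```python
import re

# Illustrative stand-in for the 'communication_entities' auto mode;
# the real preset groups are defined inside Spark NLP.
COMMUNICATION_ENTITIES = {
    "EMAIL_ADDRESS_PATTERN": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "IPV4_PATTERN": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}

def extract(text, auto_mode_patterns, extract_entities=None):
    # When extract_entities is unset, every pattern in the active
    # mode is applied; otherwise only the named subset is used.
    names = extract_entities or list(auto_mode_patterns)
    chunks = []
    for name in names:
        for match in re.finditer(auto_mode_patterns[name], text):
            chunks.append((name, match.group()))
    return chunks

text = "Contact admin@example.com from 10.0.0.1"
print(extract(text, COMMUNICATION_ENTITIES, ["EMAIL_ADDRESS_PATTERN"]))
```

In pipeline terms this corresponds to `EntityRulerModel().setAutoMode("communication_entities").setExtractEntities(["EMAIL_ADDRESS_PATTERN"])`, with matches emitted as CHUNK annotations.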
sparknlp/annotator/seq2seq/auto_gguf_model.py CHANGED
@@ -12,12 +12,10 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Contains classes for the AutoGGUFModel."""
-from typing import List, Dict
-
 from sparknlp.common import *
 
 
-class AutoGGUFModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties):
+class AutoGGUFModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties, CompletionPostProcessing):
     """
     Annotator that uses the llama.cpp library to generate text completions with large language
     models.
@@ -243,7 +241,6 @@ class AutoGGUFModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties):
     inputAnnotatorTypes = [AnnotatorType.DOCUMENT]
     outputAnnotatorType = AnnotatorType.DOCUMENT
 
-
     @keyword_only
     def __init__(self, classname="com.johnsnowlabs.nlp.annotators.seq2seq.AutoGGUFModel", java_model=None):
         super(AutoGGUFModel, self).__init__(
@@ -300,3 +297,8 @@ class AutoGGUFModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties):
         """
         from sparknlp.pretrained import ResourceDownloader
         return ResourceDownloader.downloadModel(AutoGGUFModel, name, lang, remote_loc)
+
+    def close(self):
+        """Closes the llama.cpp model backend freeing resources. The model is reloaded when used again.
+        """
+        self._java_obj.close()
sparknlp/annotator/seq2seq/auto_gguf_reranker.py CHANGED
@@ -327,3 +327,8 @@ class AutoGGUFReranker(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties
         """
         from sparknlp.pretrained import ResourceDownloader
         return ResourceDownloader.downloadModel(AutoGGUFReranker, name, lang, remote_loc)
+
+    def close(self):
+        """Closes the llama.cpp model backend freeing resources. The model is reloaded when used again.
+        """
+        self._java_obj.close()
sparknlp/annotator/seq2seq/auto_gguf_vision_model.py CHANGED
@@ -15,7 +15,7 @@
 from sparknlp.common import *
 
 
-class AutoGGUFVisionModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties):
+class AutoGGUFVisionModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppProperties, CompletionPostProcessing):
     """Multimodal annotator that uses the llama.cpp library to generate text completions with large
     language models. It supports ingesting images for captioning.
 
@@ -329,3 +329,8 @@ class AutoGGUFVisionModel(AnnotatorModel, HasBatchedAnnotate, HasLlamaCppPropert
         """
         from sparknlp.pretrained import ResourceDownloader
         return ResourceDownloader.downloadModel(AutoGGUFVisionModel, name, lang, remote_loc)
+
+    def close(self):
+        """Closes the llama.cpp model backend freeing resources. The model is reloaded when used again.
+        """
+        self._java_obj.close()
sparknlp/common/__init__.py CHANGED
@@ -23,3 +23,4 @@ from sparknlp.common.storage import *
 from sparknlp.common.utils import *
 from sparknlp.common.annotator_type import *
 from sparknlp.common.match_strategy import *
+from sparknlp.common.completion_post_processing import *
sparknlp/common/completion_post_processing.py ADDED
@@ -0,0 +1,37 @@
+# Copyright 2017-2025 John Snow Labs
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from pyspark.ml.param import Param, Params, TypeConverters
+
+
+class CompletionPostProcessing:
+    removeThinkingTag = Param(
+        Params._dummy(),
+        "removeThinkingTag",
+        "Set a thinking tag (e.g. think) to be removed from output. Will match <TAG>...</TAG>",
+        typeConverter=TypeConverters.toString,
+    )
+
+    def setRemoveThinkingTag(self, value: str):
+        """Set a thinking tag (e.g. `think`) to be removed from output.
+        Will produce the regex: `(?s)<$TAG>.+?</$TAG>`
+        """
+        self._set(removeThinkingTag=value)
+        return self
+
+    def getRemoveThinkingTag(self):
+        """Get the thinking tag to be removed from output."""
+        value = None
+        if self.removeThinkingTag in self._paramMap:
+            value = self._paramMap[self.removeThinkingTag]
+        return value
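The docstring in the new file states that `setRemoveThinkingTag("think")` yields the regex `(?s)<think>.+?</think>`. A standalone check of what that pattern removes, using plain `re` and independent of Spark NLP's JVM-side implementation:

```python
import re

def strip_thinking_tag(text, tag):
    # (?s) lets "." span newlines; ".+?" is non-greedy so each
    # <tag>...</tag> block is removed separately.
    pattern = rf"(?s)<{tag}>.+?</{tag}>"
    return re.sub(pattern, "", text)

out = strip_thinking_tag("<think>step 1\nstep 2</think>The answer is 42.", "think")
print(out)  # The answer is 42.
```

The non-greedy `.+?` matters: with a greedy `.+`, two separate thinking blocks in one completion would be merged and everything between them deleted as well.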