NLPTemplateEngine-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
MIT License

Copyright (c) 2026 Anton Antonov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Metadata-Version: 2.4
Name: NLPTemplateEngine
Version: 0.1.0
Summary: Natural Language Processing template engine for workflow code generation.
License: MIT License
Project-URL: homepage, https://github.com/antononcube/Python-NLPTemplateEngine
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: LLMTextualAnswer
Requires-Dist: pandas
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Dynamic: license-file

# NLPTemplateEngine

This Python package aims to create (nearly) executable code for various computational workflows.

The package's data and implementation constitute a Natural Language Processing (NLP)
[Template Engine (TE)](https://en.wikipedia.org/wiki/Template_processor), [Wk1],
that incorporates
[Question Answering Systems (QASs)](https://en.wikipedia.org/wiki/Question_answering), [Wk2],
and Machine Learning (ML) classifiers.

The current version of the package's NLP-TE heavily relies on Large Language Models (LLMs) for its QAS component.

Future plans involve incorporating other types of QAS implementations.

This Python package implementation closely follows the Raku implementation in
["ML::NLPTemplateEngine"](https://raku.land/zef:antononcube/ML::NLPTemplateEngine), [AAp4],
which, in turn, closely follows the Wolfram Language (WL) implementations in
["NLP Template Engine"](https://github.com/antononcube/NLP-Template-Engine), [AAr1, AAv1],
and the WL paclet
["NLPTemplateEngine"](https://resources.wolframcloud.com/PacletRepository/resources/AntonAntonov/NLPTemplateEngine/), [AAp1, AAv2].

An alternative, more comprehensive approach to building workflow code is given in [AAp2].
Another alternative is to use few-shot training of LLMs with examples provided by, say,
the Python package ["DSLExamples"](https://pypi.org/project/DSLExamples), [AAp5].

**Remark:** See the [vignette notebook](https://github.com/antononcube/Python-NLPTemplateEngine/blob/main/docs/NLPTemplateEngine-vignette.ipynb) corresponding to this README file.

### Problem formulation

We want to have a system (i.e., a TE) that:

1. Generates relevant, correct, executable programming code based on natural language specifications of computational workflows

2. Can automatically recognize the workflow types

3. Can generate code for different programming languages and related software packages

The points above are listed in order of importance, most important first.

### Reliability of results

One of the main reasons to re-implement the WL NLP-TE, [AAr1, AAp1], in Python is to have a more robust way
of utilizing LLMs to generate code. That goal is more or less achieved with this package, but
your mileage may vary: if incomplete or wrong results are obtained, run the NLP-TE with different LLM parameter settings
or with different LLMs.

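One simple way to follow this advice programmatically is a retry loop over candidate settings, keeping the first result that passes a sanity check. The sketch below is purely illustrative: `generate_with_retries`, `fake_generate`, and `looks_complete` are hypothetical stand-ins, not part of the NLPTemplateEngine API.

```python
# Illustrative retry helper: try each LLM setting in turn and return the
# first generated result that passes a user-supplied sanity check.
# All names here are hypothetical, not NLPTemplateEngine API.

def generate_with_retries(spec, settings, generate, looks_complete):
    result = None
    for setting in settings:
        result = generate(spec, setting)
        if looks_complete(result):
            break
    return result

# Toy demonstration with a stubbed generator: only temperature 0.7 "works".
def fake_generate(spec, temperature):
    return "qrObj = QRMonUnit[dfTempBoston]" if temperature == 0.7 else ""

code = generate_with_retries(
    "Compute quantile regression for dfTempBoston.",
    settings=[0.2, 0.7, 1.0],
    generate=fake_generate,
    looks_complete=lambda s: bool(s),
)
print(code)  # qrObj = QRMonUnit[dfTempBoston]
```

In practice `generate` would wrap a `concretize` call with a freshly configured LLM object, and `looks_complete` would check for, say, a non-empty result or a parseable code fragment.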
------

## Installation

From PyPI:

```shell
python3 -m pip install NLPTemplateEngine
```

----

## Setup

Load packages and define an LLM access object:

```python
from NLPTemplateEngine import *
from langchain_ollama import ChatOllama
import os

llm = ChatOllama(model=os.getenv("OLLAMA_MODEL", "gemma3:12b"))
```

-----

## Usage examples

### Quantile Regression (WL)

Here the template is automatically determined:

```python
from NLPTemplateEngine import *

qrCommand = """
Compute quantile regression with probabilities 0.4 and 0.6, with interpolation order 2, for the dataset dfTempBoston.
"""

concretize(qrCommand, llm=llm)
```

```
# qrObj=
# QRMonUnit[dfTempBoston]⟹
# QRMonEchoDataSummary[]⟹
# QRMonQuantileRegression[12, {0.4,0.6}, InterpolationOrder->2]⟹
# QRMonPlot["DateListPlot"->False,PlotTheme->"Detailed"]⟹
# QRMonErrorPlots["RelativeErrors"->False,"DateListPlot"->False,PlotTheme->"Detailed"];
```

**Remark:** In the code above, the template type, "QuantileRegression", was determined using an LLM-based classifier.

### Latent Semantic Analysis (R)

```python
lsaCommand = """
Extract 20 topics from the text corpus aAbstracts using the method NNMF.
Show statistical thesaurus with the words neural, function, and notebook.
"""

concretize(lsaCommand, template = 'LatentSemanticAnalysis', lang = 'R', llm=llm)
```

```
# lsaObj <-
# LSAMonUnit(aAbstracts) %>%
# LSAMonMakeDocumentTermMatrix(stemWordsQ = Automatic, stopWords = Automatic) %>%
# LSAMonEchoDocumentTermMatrixStatistics(logBase = 10) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "IDF", localWeightFunction = "None", normalizerFunction = "Cosine") %>%
# LSAMonExtractTopics(numberOfTopics = 20, method = "NNMF", maxSteps = 16, minNumberOfDocumentsPerTerm = 20) %>%
# LSAMonEchoTopicsTable(numberOfTerms = 10, wideFormQ = TRUE) %>%
# LSAMonEchoStatisticalThesaurus(words = c("neural", "function", "notebook"))
```

### Random tabular data generation (Raku)

```python
command = """
Make random table with 6 rows and 4 columns with the names <A1 B2 C3 D4>.
"""

concretize(command, template = 'RandomTabularDataset', lang = 'Raku', llm=llm)
```

```
# random-tabular-dataset(6, 4, "column-names-generator" => <A1 B2 C3 D4>, "form" => "table", "max-number-of-values" => 24, "min-number-of-values" => 24, "row-names" => False)
```

**Remark:** In the code above, the LLM object defined in the Setup section (an Ollama-hosted model) is used via the `llm` argument.

### Recommender workflow (Raku)

```python
command = """
Make a recommender over the data set @dsTitanic and compute 8 recommendations for the profile (passengerSex:male, passengerClass:2nd).
"""

concretize(command, lang = 'Raku', llm=llm)
```

```
# my $smrObj = ML::SparseMatrixRecommender.new
# .create-from-wide-form(@dsTitanic, item-column-name='id', :add-tag-types-to-column-names, tag-value-separator=':')
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile(["passengerSex:male", "passengerClass:2nd"], 8, :!normalize)
# .join-across(@dsTitanic)
# .echo-value();
```

------

## How does it work?

The following flowchart describes the steps the NLP Template Engine takes to process a computation
specification and produce executable code and results:

```mermaid
flowchart TD
    spec[/Computation spec/] --> workSpecQ{"Is workflow type<br>specified?"}
    workSpecQ --> |No| guess[[Guess relevant<br>workflow type]]
    workSpecQ -->|Yes| raw[Get raw answers]
    guess -.- classifier[[Classifier:<br>text to workflow type]]
    guess --> raw
    raw --> process[Process raw answers]
    process --> template[Complete<br>computation<br>template]
    template --> execute[/Executable code/]
    execute --> results[/Computation results/]

    llm{{LLM}} -.- find[[find-textual-answer]]
    llm -.- classifier
    subgraph LLM-based functionalities
        classifier
        find
    end

    find --> raw
    raw --> find
    template -.- compData[(Computation<br>templates<br>data)]
    compData -.- process

    classDef highlighted fill:Salmon,stroke:Coral,stroke-width:2px;
    class spec,results highlighted
```

Here is a detailed narration of the process:

1. **Computation Specification**:
   - The process begins with a "Computation spec", which is the initial input defining the requirements or parameters of the computation task.

2. **Workflow Type Decision**:
   - A decision node asks whether the workflow type is specified.

3. **Guess Workflow Type**:
   - If the workflow type is not specified, the system utilizes a classifier to guess the relevant workflow type.

4. **Raw Answers**:
   - Regardless of how the workflow type is determined (directly specified or guessed), the system retrieves "raw answers", which are crucial for further processing.

5. **Processing and Templating**:
   - The raw answers undergo processing ("Process raw answers") to organize and refine the data into a usable format.
   - The processed data is then utilized to "Complete computation template", preparing for executable operations.

6. **Executable Code and Results**:
   - The computation template is transformed into "Executable code", which, when run, produces the final "Computation results".

7. **LLM-Based Functionalities**:
   - The classifier and the answers finder are LLM-based.

8. **Data and Templates**:
   - Code templates are selected based on the specifics of the initial spec and the processed data.

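The flow above can be sketched as a minimal, self-contained Python pipeline. Everything below is illustrative only: the keyword classifier, the regex "question answering", and the template strings are toy stand-ins for the package's LLM-based components, not its actual API.

```python
import re

# Toy stand-ins for the engine's components (the real package uses LLMs).

# 1. Classifier: guess the workflow type from the specification text.
def guess_workflow_type(spec):
    if "quantile regression" in spec.lower():
        return "QuantileRegression"
    if "topics" in spec.lower():
        return "LatentSemanticAnalysis"
    return "Unknown"

# 2. QAS: get "raw answers" for the template's parameter questions.
def get_raw_answers(spec):
    m = re.search(r"dataset\s+(\w+)", spec)
    probs = re.findall(r"\d\.\d+", spec)
    return {"data": m.group(1) if m else "data",
            "probs": ",".join(probs) or "0.5"}

# 3. Computation templates data: one template per workflow type.
templates = {
    "QuantileRegression":
        "QRMonUnit[{data}]==>QRMonQuantileRegression[12, {{{probs}}}]"
}

# 4. Complete the computation template with the processed answers.
def concretize_sketch(spec):
    wf = guess_workflow_type(spec)
    answers = get_raw_answers(spec)
    return templates[wf].format(**answers)

print(concretize_sketch(
    "Compute quantile regression with probabilities 0.4 and 0.6 "
    "for the dataset dfTempBoston."))
# QRMonUnit[dfTempBoston]==>QRMonQuantileRegression[12, {0.4,0.6}]
```

The real engine replaces the keyword classifier and regexes with LLM calls, but the shape of the pipeline (classify, answer parameter questions, fill the template) is the same.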
------

## Bring your own templates

**0.** Load the NLP-Template-Engine package (and others):

```python
from NLPTemplateEngine import *
import pandas as pd
```

**1.** Get the "training" templates data (from a CSV file you have created or changed) for a new workflow
(["SendMail"](https://github.com/antononcube/NLP-Template-Engine/blob/main/TemplateData/dsQASParameters-SendMail.csv)):

```python
url = 'https://raw.githubusercontent.com/antononcube/NLP-Template-Engine/main/TemplateData/dsQASParameters-SendMail.csv'
dsSendMail = pd.read_csv(url)

dsSendMail.describe()
```

**2.** Add the ingested data for the new workflow (from the CSV file) into the NLP-Template-Engine:

```python
add_template_data(dsSendMail, llm=llm)
```

```
# (ParameterTypePatterns Defaults ParameterQuestions Questions Shortcuts Templates)
```

**3.** Parse a natural language specification using the newly ingested and onboarded workflow ("SendMail"):

```python
cmd = "Send email to joedoe@gmail.com with content RandomReal[343], and the subject this is a random real call."
concretize(cmd, template = "SendMail", lang = 'WL', llm=llm)
```

```
# SendMail[<|"To"->{"joedoe@gmail.com"},"Subject"->"this is a random real call","Body"->RandomReal[343],"AttachedFiles"->None|>]
```

**4.** Experiment with running the generated code!

------

## TODO

- [ ] TODO Templates data
  - [ ] TODO Using JSON instead of CSV format for the templates
    - [ ] TODO Derive suitable data structure
    - [ ] TODO Implement export to JSON
    - [ ] TODO Implement ingestion
  - [ ] TODO Review wrong parameter type specifications
    - A few were found.
- [ ] TODO New workflows
  - [ ] TODO LLM-workflows
  - [ ] TODO Clustering
  - [ ] TODO Association rule learning
- [ ] TODO Unit tests
  - What are good unit tests?
  - [ ] TODO Make ingestion unit tests
  - [ ] TODO Make suitable extended (xt) unit tests
- [ ] TODO Documentation
  - [ ] TODO Comparison with LLM code generation using few-shot examples
  - [ ] TODO Video demonstrating the functionalities

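For the JSON-instead-of-CSV item, the conversion itself is straightforward; here is a minimal sketch with the standard library's `csv` and `json` modules. The column names below are made up for illustration and are not the actual schema of the `dsQASParameters-*.csv` files.

```python
import csv
import io
import json

# Toy templates table; the real schema of dsQASParameters-*.csv differs.
csv_text = """WorkflowType,Parameter,Question
SendMail,To,Who is the email recipient?
SendMail,Subject,What is the subject?
"""

# Group rows by workflow type so each workflow becomes one JSON object,
# a natural "suitable data structure" for template ingestion.
rows = list(csv.DictReader(io.StringIO(csv_text)))
by_workflow = {}
for row in rows:
    wf = row.pop("WorkflowType")
    by_workflow.setdefault(wf, []).append(row)

print(json.dumps(by_workflow, indent=2))
```

The grouped-by-workflow layout is one candidate data structure; nesting by parameter name instead is another option the TODO item would have to settle.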
------

## References

### Articles, blog posts

[AA1] Anton Antonov,
["DSL examples with LangChain"](https://pythonforprediction.wordpress.com/2026/02/12/dsl-examples-with-langchain/),
(2026),
[PythonForPrediction at WordPress](https://pythonforprediction.wordpress.com).

[Wk1] Wikipedia entry, [Template processor](https://en.wikipedia.org/wiki/Template_processor).

[Wk2] Wikipedia entry, [Question answering](https://en.wikipedia.org/wiki/Question_answering).

### Functions, packages, repositories

[AAr1] Anton Antonov,
["NLP Template Engine"](https://github.com/antononcube/NLP-Template-Engine),
(2021-2022),
[GitHub/antononcube](https://github.com/antononcube).

[AAp1] Anton Antonov,
[NLPTemplateEngine, WL paclet](https://resources.wolframcloud.com/PacletRepository/resources/AntonAntonov/NLPTemplateEngine/),
(2023),
[Wolfram Language Paclet Repository](https://resources.wolframcloud.com/PacletRepository/).

[AAp2] Anton Antonov,
[DSL::Translators, Raku package](https://github.com/antononcube/Raku-DSL-Translators),
(2020-2025),
[GitHub/antononcube](https://github.com/antononcube).

[AAp3] Anton Antonov,
[DSL::Examples, Raku package](https://github.com/antononcube/Raku-DSL-Examples),
(2024-2025),
[GitHub/antononcube](https://github.com/antononcube).

[AAp4] Anton Antonov,
[ML::NLPTemplateEngine, Raku package](https://github.com/antononcube/Raku-ML-TemplateEngine),
(2023-2025),
[GitHub/antononcube](https://github.com/antononcube).

[AAp5] Anton Antonov,
[DSLExamples, Python package](https://github.com/antononcube/Python-DSLExamples),
(2026),
[GitHub/antononcube](https://github.com/antononcube).

[WRI1] Wolfram Research,
[FindTextualAnswer](https://reference.wolfram.com/language/ref/FindTextualAnswer.html),
(2018),
[Wolfram Language function](https://reference.wolfram.com), (updated 2020).

### Videos

[AAv1] Anton Antonov,
["NLP Template Engine, Part 1"](https://youtu.be/a6PvmZnvF9I),
(2021),
[YouTube/@AAA4Prediction](https://www.youtube.com/@AAA4Prediction).

[AAv2] Anton Antonov,
["Natural Language Processing Template Engine"](https://www.youtube.com/watch?v=IrIW9dB5sRM),
WTC-2022 presentation,
(2023),
[YouTube/@Wolfram](https://www.youtube.com/@Wolfram).