hamtaa-texttools 1.1.21__py3-none-any.whl → 1.1.22__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/METADATA +46 -87
- {hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/RECORD +12 -12
- texttools/__init__.py +3 -3
- texttools/batch/batch_runner.py +1 -1
- texttools/internals/async_operator.py +23 -32
- texttools/internals/operator_utils.py +24 -2
- texttools/internals/sync_operator.py +23 -32
- texttools/tools/async_tools.py +0 -2
- texttools/tools/sync_tools.py +0 -2
- {hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/WHEEL +0 -0
- {hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/licenses/LICENSE +0 -0
- {hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/top_level.txt +0 -0
{hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/METADATA
CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 1.1.21
+Version: 1.1.22
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, MoosaviNejad <erfanmoosavi84@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
 License: MIT License
@@ -37,61 +37,53 @@ Dynamic: license-file
 
 ## 📌 Overview
 
-**TextTools** is a high-level **NLP toolkit** built on top of
+**TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.
 
 It provides both **sync (`TheTool`)** and **async (`AsyncTheTool`)** APIs for maximum flexibility.
 
 It provides ready-to-use utilities for **translation, question detection, keyword extraction, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.
 
+**Note:** Most features of `texttools` are reliable when you use the `google/gemma-3n-e4b-it` model.
+
 ---
 
 ## ✨ Features
 
 TextTools provides a rich collection of high-level NLP utilities,
-Each tool is designed to work with structured outputs
+Each tool is designed to work with structured outputs (JSON / Pydantic).
 
 - **`categorize()`** - Classifies text into given categories
-- **`extract_keywords()`** - Extracts keywords from text
+- **`extract_keywords()`** - Extracts keywords from the text
 - **`extract_entities()`** - Named Entity Recognition (NER) system
-- **`is_question()`** - Binary detection
+- **`is_question()`** - Binary question detection
 - **`text_to_question()`** - Generates questions from text
-- **`merge_questions()`** - Merges multiple questions
-- **`rewrite()`** - Rewrites text
+- **`merge_questions()`** - Merges multiple questions into one
+- **`rewrite()`** - Rewrites text in a different way
 - **`subject_to_question()`** - Generates questions about a specific subject
 - **`summarize()`** - Text summarization
-- **`translate()`** - Text translation
+- **`translate()`** - Text translation
 - **`propositionize()`** - Converts text to atomic, independent, meaningful sentences
 - **`check_fact()`** - Checks whether a statement is relevant to the source text
 - **`run_custom()`** - Allows users to define a custom tool with an arbitrary BaseModel
 
 ---
 
+## 🚀 Installation
+
+Install the latest release via PyPI:
+
+```bash
+pip install -U hamtaa-texttools
+```
+
+---
+
 ## 📊 Tool Quality Tiers
 
-| Status | Meaning | Use in Production? |
-|--------|---------|-------------------|
-| **✅ Production** | Evaluated, tested, stable. | **Yes** - ready for reliable use. |
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. Functional, but quality may vary. | **Use with caution** - outputs not yet validated. |
-
-### Current Status
-**Production Tools:**
-- `categorize()` (list mode)
-- `extract_keywords()`
-- `extract_entities()`
-- `is_question()`
-- `text_to_question()`
-- `merge_questions()`
-- `rewrite()`
-- `subject_to_question()`
-- `summarize()`
-- `run_custom()` (fine in most cases)
-
-**Experimental Tools:**
-- `categorize()` (tree mode)
-- `translate()`
-- `propositionize()`
-- `check_fact()`
-- `run_custom()` (not evaluated in all scenarios)
+| Status | Meaning | Tools | Use in Production? |
+|--------|---------|----------|-------------------|
+| **✅ Production** | Evaluated, tested, stable. | `categorize()` (list mode), `extract_keywords()`, `extract_entities()`, `is_question()`, `text_to_question()`, `merge_questions()`, `rewrite()`, `subject_to_question()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. Functional, but quality may vary. | `categorize()` (tree mode), `translate()`, `propositionize()`, `check_fact()` | **Use with caution** - outputs not yet validated. |
 
 ---
 
@@ -100,49 +92,37 @@ Each tool is designed to work with structured outputs (JSON / Pydantic).
 TextTools provides several optional flags to customize LLM behavior:
 
 - **`with_analysis: bool`** → Adds a reasoning step before generating the final output.
-  **Note:** This doubles token usage per call
+  **Note:** This doubles token usage per call.
 
 - **`logprobs: bool`** → Returns token-level probabilities for the generated output. You can also specify `top_logprobs=<N>` to get the top N alternative tokens and their probabilities.
   **Note:** This feature works only if it's supported by the model.
 
-- **`output_lang: str`** → Forces the model to respond in a specific language.
+- **`output_lang: str`** → Forces the model to respond in a specific language.
 
-- **`user_prompt: str`** → Allows you to inject a custom instruction or
+- **`user_prompt: str`** → Allows you to inject a custom instruction into the model alongside the main template. This gives you fine-grained control over how the model interprets or modifies the input text.
 
 - **`temperature: float`** → Determines how creatively the model should respond. Takes a float number from `0.0` to `2.0`.
 
-- **`validator: Callable (Experimental)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return a
+- **`validator: Callable (Experimental)`** → Forces TheTool to validate the output result based on your custom validator. The validator should return a boolean. If the validator fails, TheTool will retry to get another output by modifying `temperature`. You can also specify `max_validation_retries=<N>`.
 
-- **`priority: int (Experimental)`** → Task execution priority level.
+- **`priority: int (Experimental)`** → Task execution priority level. Affects processing order in queues.
   **Note:** This feature works only if it's supported by the model and vLLM.
 
-**Note:** There might be some tools that don't support some of the parameters above.
-
 ---
 
 ## 🧩 ToolOutput
 
 Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel with attributes:
-- **`result: Any`**
-- **`analysis: str`**
-- **`logprobs: list`**
-- **`errors: list[str]`**
+- **`result: Any`**
+- **`analysis: str`**
+- **`logprobs: list`**
+- **`errors: list[str]`**
 - **`ToolOutputMetadata`** →
-  - **`tool_name: str`**
-  - **`processed_at: datetime`**
-  - **`execution_time: float`**
+  - **`tool_name: str`**
+  - **`processed_at: datetime`**
+  - **`execution_time: float`**
 
-**Note:** You can use `repr(ToolOutput)` to
-
----
-
-## 🚀 Installation
-
-Install the latest release via PyPI:
-
-```bash
-pip install -U hamtaa-texttools
-```
+**Note:** You can use `repr(ToolOutput)` to print your output with all the details.
 
 ---
 
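The `validator` flag described above only requires a callable that returns a boolean, so it can be exercised without any model call. A minimal sketch, assuming a hypothetical validator that accepts only non-empty ASCII output (the function name and rule are illustrative, not part of the package):

```python
# Hypothetical validator: a callable returning a boolean, as the README requires.
# It would be passed as, e.g., translate(..., validator=english_only, max_validation_retries=3).
def english_only(result: str) -> bool:
    # Accept only non-empty output that is plain ASCII (i.e., looks like English text).
    return bool(result) and result.isascii()

print(english_only("Hi! How are you?"))  # True: non-empty ASCII
print(english_only("سلام"))              # False: non-ASCII characters
```

If the validator returns False, per the README TheTool retries with a modified `temperature`, up to `max_validation_retries` times.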
@@ -160,26 +140,13 @@ pip install -U hamtaa-texttools
 from openai import OpenAI
 from texttools import TheTool
 
-# Create your OpenAI client
 client = OpenAI(base_url="your_url", api_key="your_api_key")
+model = "model_name"
 
-# Specify the model
-model = "gpt-4o-mini"
-
-# Create an instance of TheTool
 the_tool = TheTool(client=client, model=model)
 
-
-detection
-print(detection.result)
-print(detection.logprobs)
-# Output: True + logprobs
-
-# Example: Translation
-translation = the_tool.translate("سلام، حالت چطوره؟", target_language="English", with_analysis=True)
-print(translation.result)
-print(translation.analysis)
-# Output: "Hi! How are you?" + analysis
+detection = the_tool.is_question("Is this project open source?")
+print(repr(detection))
 ```
 
 ---
@@ -192,22 +159,17 @@ from openai import AsyncOpenAI
 from texttools import AsyncTheTool
 
 async def main():
-    # Create your AsyncOpenAI client
     async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
 
-    # Specify the model
-    model = "gpt-4o-mini"
-
-    # Create an instance of AsyncTheTool
     async_the_tool = AsyncTheTool(client=async_client, model=model)
 
-    # Example: Async Translation and Keyword Extraction
     translation_task = async_the_tool.translate("سلام، حالت چطوره؟", target_language="English")
     keywords_task = async_the_tool.extract_keywords("Tomorrow, we will be dead by the car crash")
 
     (translation, keywords) = await asyncio.gather(translation_task, keywords_task)
-    print(translation
-    print(keywords
+    print(repr(translation))
+    print(repr(keywords))
 
 asyncio.run(main())
 ```
@@ -229,13 +191,12 @@ Use **TextTools** when you need to:
 
 Process large datasets efficiently using OpenAI's batch API.
 
-## ⚡ Quick Start (Batch)
+## ⚡ Quick Start (Batch Runner)
 
 ```python
 from pydantic import BaseModel
-from texttools import
+from texttools import BatchRunner, BatchConfig
 
-# Configure your batch job
 config = BatchConfig(
     system_prompt="Extract entities from the text",
     job_name="entity_extraction",
@@ -244,12 +205,10 @@ config = BatchConfig(
     model="gpt-4o-mini"
 )
 
-# Define your output schema
 class Output(BaseModel):
     entities: list[str]
 
-
-runner = BatchJobRunner(config, output_model=Output)
+runner = BatchRunner(config, output_model=Output)
 runner.run()
 ```
 
{hamtaa_texttools-1.1.21.dist-info → hamtaa_texttools-1.1.22.dist-info}/RECORD
CHANGED
@@ -1,14 +1,14 @@
-hamtaa_texttools-1.1.
-texttools/__init__.py,sha256=
+hamtaa_texttools-1.1.22.dist-info/licenses/LICENSE,sha256=Hb2YOBKy2MJQLnyLrX37B4ZVuac8eaIcE71SvVIMOLg,1082
+texttools/__init__.py,sha256=fqGafzxcnGw0_ivi-vUyLfytWOkjLOumiaB0-I612iY,305
 texttools/batch/batch_config.py,sha256=scWYQBDuaTj8-b2x_a33Zu-zxm7eqEf5FFoquD-Sv94,1029
 texttools/batch/batch_manager.py,sha256=6HfsexU0PHGGBH7HKReZ-CQxaQI9DXYKAPsFXxovb_I,8740
-texttools/batch/batch_runner.py,sha256=
-texttools/internals/async_operator.py,sha256=
+texttools/batch/batch_runner.py,sha256=bpgRnFZiaxqAP6sm3kzb-waeNhIRxXYhttGikGFeXXU,10013
+texttools/internals/async_operator.py,sha256=VHs06Yd_OZqUVyhCOMn7iujEChqhJg8aRS8NXpHBO1w,6719
 texttools/internals/exceptions.py,sha256=h_yp_5i_5IfmqTBQ4S6ZOISrrliJBQ3HTEAjwJXrplk,495
 texttools/internals/models.py,sha256=9uoCAe2TLrSzyS9lMJja5orPAYaCvVL1zoCb6FNdkfs,4541
-texttools/internals/operator_utils.py,sha256=
+texttools/internals/operator_utils.py,sha256=p44-YovUiLefJ-akB3o7Tk1o73ITFxx7E77pod4Aa1Y,2491
 texttools/internals/prompt_loader.py,sha256=yYXDD4YYG2zohGPAmvZwmv5f6xV_RSl5yOrObTh9w7I,3352
-texttools/internals/sync_operator.py,sha256=
+texttools/internals/sync_operator.py,sha256=23mIxk96SOOkYb_7VXjmkNKuWqPTRQVhO4cTKQ_4Mtw,6624
 texttools/internals/text_to_chunks.py,sha256=vY3odhgCZK4E44k_SGlLoSiKkdN0ib6-lQAsPcplAHA,3843
 texttools/prompts/README.md,sha256=ztajRJcmFLhyrUF0_qmOXaCwGsTGCFabfMjch2LAJG0,1375
 texttools/prompts/categorize.yaml,sha256=016b1uGtbKXEwB8_2_bBgVuUelBlu_rgT85XK_c3Yv0,1219
@@ -24,9 +24,9 @@ texttools/prompts/subject_to_question.yaml,sha256=TfVmZ6gDgaHRqJWCVkFlKpuJczpMvJ
 texttools/prompts/summarize.yaml,sha256=CKx4vjhHbGus1TdjDz_oc0bNEQtq7zfHsZkV2WeYHDU,457
 texttools/prompts/text_to_question.yaml,sha256=mnArBoYu7gpGHriaU2-Aw5SixB2ZIgoHMt99PnTPKD0,1003
 texttools/prompts/translate.yaml,sha256=ew9RERAVSzg0cvxAinNwTSFIaOIjdwIsekbUsgAuNgo,632
-texttools/tools/async_tools.py,sha256=
-texttools/tools/sync_tools.py,sha256=
-hamtaa_texttools-1.1.
-hamtaa_texttools-1.1.
-hamtaa_texttools-1.1.
-hamtaa_texttools-1.1.
+texttools/tools/async_tools.py,sha256=s3g6_8Jmg2KvdItWa3sXGfWI8YaOUPnfIRtWhWRMd1c,44543
+texttools/tools/sync_tools.py,sha256=AcApMy_XvT47rBtqGdAFrKE1QDZq30f0uJsqiWYUWQg,44349
+hamtaa_texttools-1.1.22.dist-info/METADATA,sha256=RF431cau25sLMmynuSHXKssKt3ipFt5M9ZKJJA3C9UI,8718
+hamtaa_texttools-1.1.22.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+hamtaa_texttools-1.1.22.dist-info/top_level.txt,sha256=5Mh0jIxxZ5rOXHGJ6Mp-JPKviywwN0MYuH0xk5bEWqE,10
+hamtaa_texttools-1.1.22.dist-info/RECORD,,
texttools/__init__.py
CHANGED
@@ -1,7 +1,7 @@
-from .batch.batch_runner import BatchJobRunner
-from .batch.batch_config import BatchConfig
 from .tools.sync_tools import TheTool
 from .tools.async_tools import AsyncTheTool
 from .internals.models import CategoryTree
+from .batch.batch_runner import BatchRunner
+from .batch.batch_config import BatchConfig
 
-__all__ = ["TheTool", "AsyncTheTool", "
+__all__ = ["TheTool", "AsyncTheTool", "CategoryTree", "BatchRunner", "BatchConfig"]
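The consumer-visible effect of this hunk is the exported name set: `BatchJobRunner` is gone and `BatchRunner` replaces it, which is a breaking rename for downstream imports. A standalone check over the new `__all__` literal copied from the + line (not package code):

```python
# New public surface, copied verbatim from the + side of the __init__.py diff.
NEW_ALL = ["TheTool", "AsyncTheTool", "CategoryTree", "BatchRunner", "BatchConfig"]

# The rename means old imports of BatchJobRunner must move to BatchRunner.
print("BatchRunner" in NEW_ALL)      # True
print("BatchJobRunner" in NEW_ALL)   # False
```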
texttools/batch/batch_runner.py
CHANGED
texttools/internals/async_operator.py
CHANGED
@@ -27,17 +27,11 @@ class AsyncOperator:
         self._client = client
         self._model = model
 
-    async def _analyze_completion(self,
+    async def _analyze_completion(self, analyze_message: list[dict[str, str]]) -> str:
         try:
-            if not analyze_prompt:
-                raise PromptError("Analyze template is empty")
-
-            analyze_message = OperatorUtils.build_user_message(analyze_prompt)
-
             completion = await self._client.chat.completions.create(
                 model=self._model,
                 messages=analyze_message,
-                temperature=temperature,
             )
 
             if not completion.choices:
@@ -57,7 +51,7 @@ class AsyncOperator:
 
     async def _parse_completion(
         self,
-
+        main_message: list[dict[str, str]],
         output_model: Type[T],
         temperature: float,
         logprobs: bool,
@@ -69,8 +63,6 @@ class AsyncOperator:
         Returns both the parsed object and the raw completion for logprobs.
         """
         try:
-            main_message = OperatorUtils.build_user_message(main_prompt)
-
             request_kwargs = {
                 "model": self._model,
                 "messages": main_message,
@@ -124,11 +116,13 @@ class AsyncOperator:
         **extra_kwargs,
     ) -> OperatorOutput:
         """
-        Execute the LLM pipeline with the given input text.
+        Execute the LLM pipeline with the given input text.
         """
         try:
-
+            if logprobs and (not isinstance(top_logprobs, int) or top_logprobs < 2):
+                raise ValueError("top_logprobs should be an int greater than 1")
 
+            prompt_loader = PromptLoader()
             prompt_configs = prompt_loader.load(
                 prompt_file=prompt_file,
                 text=text.strip(),
@@ -136,28 +130,27 @@ class AsyncOperator:
                 **extra_kwargs,
             )
 
-
-            analysis = ""
+            analysis: str | None = None
 
             if with_analysis:
-
-                prompt_configs["analyze_template"]
+                analyze_message = OperatorUtils.build_message(
+                    prompt_configs["analyze_template"]
                 )
-
+                analysis = await self._analyze_completion(analyze_message)
 
-
-
-
-
-
-            main_prompt += prompt_configs["main_template"]
-
-            if logprobs and (not isinstance(top_logprobs, int) or top_logprobs < 2):
-                raise ValueError("top_logprobs should be an integer greater than 1")
+            main_message = OperatorUtils.build_message(
+                OperatorUtils.build_main_prompt(
+                    prompt_configs["main_template"], analysis, output_lang, user_prompt
+                )
+            )
 
             parsed, completion = await self._parse_completion(
-
+                main_message,
+                output_model,
+                temperature,
+                logprobs,
+                top_logprobs,
+                priority,
             )
 
             # Retry logic if validation fails
@@ -166,9 +159,7 @@ class AsyncOperator:
                 not isinstance(max_validation_retries, int)
                 or max_validation_retries < 1
             ):
-                raise ValueError(
-                    "max_validation_retries should be a positive integer"
-                )
+                raise ValueError("max_validation_retries should be a positive int")
 
             succeeded = False
             for _ in range(max_validation_retries):
@@ -177,7 +168,7 @@ class AsyncOperator:
 
                 try:
                     parsed, completion = await self._parse_completion(
-
+                        main_message,
                         output_model,
                         retry_temperature,
                         logprobs,
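The guard hoisted to the top of the pipeline (identical in the sync and async operators) can be checked in isolation. This is a standalone copy of the diffed condition, not an import from the package:

```python
def check_top_logprobs(logprobs: bool, top_logprobs) -> None:
    # Mirrors the validation moved to the start of the operator's run path:
    # when logprobs are requested, top_logprobs must be an int of at least 2.
    if logprobs and (not isinstance(top_logprobs, int) or top_logprobs < 2):
        raise ValueError("top_logprobs should be an int greater than 1")

check_top_logprobs(False, None)  # no logprobs requested: nothing to validate
check_top_logprobs(True, 5)      # valid: int >= 2
```

Moving this check ahead of prompt loading means an invalid `top_logprobs` now fails fast, before any template work is done.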
texttools/internals/operator_utils.py
CHANGED
@@ -5,7 +5,29 @@ import random
 
 class OperatorUtils:
     @staticmethod
-    def
+    def build_main_prompt(
+        main_template: str,
+        analysis: str | None,
+        output_lang: str | None,
+        user_prompt: str | None,
+    ) -> str:
+        main_prompt = ""
+
+        if analysis:
+            main_prompt += f"Based on this analysis:\n{analysis}\n"
+
+        if output_lang:
+            main_prompt += f"Respond only in the {output_lang} language.\n"
+
+        if user_prompt:
+            main_prompt += f"Consider this instruction {user_prompt}\n"
+
+        main_prompt += main_template
+
+        return main_prompt
+
+    @staticmethod
+    def build_message(prompt: str) -> list[dict[str, str]]:
         return [{"role": "user", "content": prompt}]
 
     @staticmethod
@@ -20,7 +42,7 @@ class OperatorUtils:
 
         for choice in completion.choices:
             if not getattr(choice, "logprobs", None):
-
+                raise ValueError("Your model does not support logprobs")
 
             for logprob_item in choice.logprobs.content:
                 if ignore_pattern.match(logprob_item.token):
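The assembly order introduced by `build_main_prompt` (analysis first, then the language constraint, then the user instruction, then the main template) can be verified standalone. This copy reproduces the diffed body with an illustrative template string:

```python
def build_main_prompt(main_template, analysis=None, output_lang=None, user_prompt=None):
    # Standalone copy of the new OperatorUtils.build_main_prompt body.
    main_prompt = ""
    if analysis:
        main_prompt += f"Based on this analysis:\n{analysis}\n"
    if output_lang:
        main_prompt += f"Respond only in the {output_lang} language.\n"
    if user_prompt:
        main_prompt += f"Consider this instruction {user_prompt}\n"
    main_prompt += main_template
    return main_prompt

prompt = build_main_prompt("Summarize the text.", output_lang="English")
print(prompt)
# Respond only in the English language.
# Summarize the text.
```

Centralizing this assembly is what lets both operators drop their inline `main_prompt +=` bookkeeping in the hunks above and below.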
texttools/internals/sync_operator.py
CHANGED
@@ -27,17 +27,11 @@ class Operator:
         self._client = client
         self._model = model
 
-    def _analyze_completion(self,
+    def _analyze_completion(self, analyze_message: list[dict[str, str]]) -> str:
         try:
-            if not analyze_prompt:
-                raise PromptError("Analyze template is empty")
-
-            analyze_message = OperatorUtils.build_user_message(analyze_prompt)
-
             completion = self._client.chat.completions.create(
                 model=self._model,
                 messages=analyze_message,
-                temperature=temperature,
             )
 
             if not completion.choices:
@@ -57,7 +51,7 @@ class Operator:
 
     def _parse_completion(
         self,
-
+        main_message: list[dict[str, str]],
         output_model: Type[T],
         temperature: float,
         logprobs: bool,
@@ -69,8 +63,6 @@ class Operator:
         Returns both the parsed object and the raw completion for logprobs.
         """
         try:
-            main_message = OperatorUtils.build_user_message(main_prompt)
-
             request_kwargs = {
                 "model": self._model,
                 "messages": main_message,
@@ -122,11 +114,13 @@ class Operator:
         **extra_kwargs,
     ) -> OperatorOutput:
         """
-        Execute the LLM pipeline with the given input text.
+        Execute the LLM pipeline with the given input text.
         """
         try:
-
+            if logprobs and (not isinstance(top_logprobs, int) or top_logprobs < 2):
+                raise ValueError("top_logprobs should be an int greater than 1")
 
+            prompt_loader = PromptLoader()
             prompt_configs = prompt_loader.load(
                 prompt_file=prompt_file,
                 text=text.strip(),
@@ -134,28 +128,27 @@ class Operator:
                 **extra_kwargs,
             )
 
-
-            analysis = ""
+            analysis: str | None = None
 
             if with_analysis:
-
-                prompt_configs["analyze_template"]
+                analyze_message = OperatorUtils.build_message(
+                    prompt_configs["analyze_template"]
                 )
-
+                analysis = self._analyze_completion(analyze_message)
 
-
-
-
-
-
-            main_prompt += prompt_configs["main_template"]
-
-            if logprobs and (not isinstance(top_logprobs, int) or top_logprobs < 2):
-                raise ValueError("top_logprobs should be an integer greater than 1")
+            main_message = OperatorUtils.build_message(
+                OperatorUtils.build_main_prompt(
+                    prompt_configs["main_template"], analysis, output_lang, user_prompt
+                )
+            )
 
             parsed, completion = self._parse_completion(
-
+                main_message,
+                output_model,
+                temperature,
+                logprobs,
+                top_logprobs,
+                priority,
             )
 
             # Retry logic if validation fails
@@ -164,9 +157,7 @@ class Operator:
                 not isinstance(max_validation_retries, int)
                 or max_validation_retries < 1
             ):
-                raise ValueError(
-                    "max_validation_retries should be a positive integer"
-                )
+                raise ValueError("max_validation_retries should be a positive int")
 
             succeeded = False
             for _ in range(max_validation_retries):
@@ -175,7 +166,7 @@ class Operator:
 
                 try:
                     parsed, completion = self._parse_completion(
-
+                        main_message,
                         output_model,
                         retry_temperature,
                         logprobs,
texttools/tools/async_tools.py
CHANGED
@@ -1044,8 +1044,6 @@ class AsyncTheTool:
         """
         Custom tool that can do almost anything!
 
-        Important Note: This tool is EXPERIMENTAL, you can use it but it isn't reliable.
-
         Arguments:
             prompt: The user prompt
             output_model: Pydantic BaseModel used for structured output
texttools/tools/sync_tools.py
CHANGED
@@ -1044,8 +1044,6 @@ class TheTool:
         """
         Custom tool that can do almost anything!
 
-        Important Note: This tool is EXPERIMENTAL, you can use it but it isn't reliable.
-
         Arguments:
            prompt: The user prompt
            output_model: Pydantic BaseModel used for structured output

File without changes

File without changes

File without changes