hamtaa-texttools 2.1.0__py3-none-any.whl → 2.2.0__py3-none-any.whl

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: hamtaa-texttools
- Version: 2.1.0
+ Version: 2.2.0
  Summary: A high-level NLP toolkit built on top of modern LLMs.
  Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Erfan Moosavi <erfanmoosavi84@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
  Maintainer-email: Erfan Moosavi <erfanmoosavi84@gmail.com>, Tohidi <the.mohammad.tohidi@gmail.com>
@@ -29,7 +29,10 @@ Dynamic: license-file
 
  **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.
 
- It provides both **sync (`TheTool`)** and **async (`AsyncTheTool`)** APIs for maximum flexibility.
+ It provides three API styles for maximum flexibility:
+ - Sync API (`TheTool`) - Simple, sequential operations
+ - Async API (`AsyncTheTool`) - High-performance async operations
+ - Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control
 
  It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.
 
@@ -76,8 +79,6 @@ pip install -U hamtaa-texttools
 
  ## ⚙️ Additional Parameters
 
- - **`raise_on_error: bool`** → (`TheTool/AsyncTheTool` parameter) Raise errors (True) or return them in output (False). Default is True.
-
  - **`with_analysis: bool`** → Adds a reasoning step before generating the final output.
  **Note:** This doubles token usage per call.
 
@@ -98,6 +99,9 @@ pip install -U hamtaa-texttools
  - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
  **Note:** This feature is only available in `AsyncTheTool`.
 
+ - **`raise_on_error: bool`** → (`TheTool/AsyncTheTool`) Raise errors (True) or return them in output (False). Default is True.
+
+ - **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.
 
  ---
 
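A minimal sketch of how these parameters fit together, following the constructor and method signatures shown in the Quick Start sections and the `batch_tools.py` hunk below (`your_url`, `your_api_key`, and `model_name` are placeholders):

```python
import asyncio
from openai import AsyncOpenAI
from texttools import AsyncTheTool, BatchTheTool

client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")

# raise_on_error=False: failures come back inside the output object
# instead of being raised, so one bad input cannot abort a whole run.
tool = AsyncTheTool(client, "model_name", raise_on_error=False)

# max_concurrency caps simultaneous API calls (BatchTheTool only).
batch = BatchTheTool(client=client, model="model_name", max_concurrency=3)

async def main():
    # with_analysis adds the reasoning step (doubling token usage);
    # timeout bounds the wait and applies to the async tools.
    out = await tool.summarize(text="...", with_analysis=True, timeout=30.0)
    print(out.is_successful())

asyncio.run(main())
```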
@@ -117,13 +121,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
  - Verify operation success with the `is_successful()` method.
  - Convert output to a dictionary with the `to_dict()` method.
 
+ **Note:** For `BatchTheTool`, each method returns a `list[ToolOutput]` containing results for all input texts.
+
 
 
- ## 🧨 Sync vs Async
- | Tool | Style | Use case |
- |--------------|---------|---------------------------------------------|
- | `TheTool` | Sync | Simple scripts, sequential workflows |
- | `AsyncTheTool` | Async | High-throughput apps, APIs, concurrent tasks |
+ ## 🧨 Sync vs Async vs Batch
+ | Tool | Style | Use Case | Best For |
+ |------|-------|----------|----------|
+ | `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+ | `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+ | `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |
 
  ---
 
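As a sketch of consuming these objects, assuming only the `ToolOutput` methods listed above and the `.result` field used in the Quick Start examples:

```python
import asyncio
from openai import AsyncOpenAI
from texttools import BatchTheTool

async def main():
    client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
    batch = BatchTheTool(client=client, model="model_name", raise_on_error=False)

    texts = ["Is water wet?", "Water is wet."]
    results = await batch.is_question(texts=texts)  # one ToolOutput per input text

    for text, out in zip(texts, results):
        if out.is_successful():
            print(text, "->", out.result)
        else:
            print(text, "-> failed:", out.to_dict())

asyncio.run(main())
```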
@@ -168,6 +175,35 @@ async def main():
  asyncio.run(main())
  ```
 
+ ## ⚡ Quick Start (Batch)
+
+ ```python
+ import asyncio
+ from openai import AsyncOpenAI
+ from texttools import BatchTheTool
+
+ async def main():
+     async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+     model = "model_name"
+
+     batch_the_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+     categories = await batch_the_tool.categorize(
+         texts=[
+             "Climate change impacts on agriculture",
+             "Artificial intelligence in healthcare",
+             "Economic effects of remote work",
+             "Advancements in quantum computing",
+         ],
+         categories=["Science", "Technology", "Economics", "Environment"],
+     )
+
+     for i, result in enumerate(categories):
+         print(f"Text {i+1}: {result.result}")
+
+ asyncio.run(main())
+ ```
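A note on ordering: per the `texttools/tools/batch_tools.py` implementation later in this diff, each batch method fans out one throttled task per input and collects them with `asyncio.gather`, which preserves input order, so `categories[i]` above corresponds to `texts[i]`.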
+
 
  ---
 
  ## ✅ Use Cases
@@ -176,4 +212,20 @@ Use **TextTools** when you need to:
 
  - 🔍 **Classify** large datasets quickly without model training
  - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
- - 📊 **Analyze** large text collections using embeddings and categorization
+ - 📊 **Analyze** large text collections using embeddings and categorization
+
+ ---
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ---
+
+ ## 🤝 Contributing
+
+ We welcome contributions from the community! See the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+ ## 📚 Documentation
+
+ For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
hamtaa_texttools-2.1.0.dist-info/RECORD → hamtaa_texttools-2.2.0.dist-info/RECORD
@@ -1,5 +1,5 @@
- hamtaa_texttools-2.1.0.dist-info/licenses/LICENSE,sha256=gqxbR8wqI3utd__l3Yn6_dQ3Pou1a17W4KmydbvZGok,1084
- texttools/__init__.py,sha256=AHpTq1BbL3sWCaFiIjlSkqNfNqweq-qm2EIOSmUZRJ0,175
+ hamtaa_texttools-2.2.0.dist-info/licenses/LICENSE,sha256=gqxbR8wqI3utd__l3Yn6_dQ3Pou1a17W4KmydbvZGok,1084
+ texttools/__init__.py,sha256=2bIFP0BdsDeOC7aQNTQjSX6OBmWQEweltUPRowwrhmg,236
  texttools/models.py,sha256=CQnO1zkKHFyqeMWrYGA4IyXQ7YYLVc3Xz1WaXbXzDLw,4634
  texttools/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  texttools/core/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -9,7 +9,7 @@ texttools/core/utils.py,sha256=jqXHXU1DWDKWhK0HHSjnjq4_TLg3FMcnRzrwTF1eqqc,9744
  texttools/core/operators/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  texttools/core/operators/async_operator.py,sha256=HOi9gUwIffJUtyp8WLNbMpxI8jnafNDrbtLl6vyPcUs,6221
  texttools/core/operators/sync_operator.py,sha256=yM14fsku-4Nf60lPUVePaB9Lu8HbGKb4ubwoizVWuYQ,6126
- texttools/prompts/augment.yaml,sha256=O-LMVyrihr0GQ8hp2Lx6uIR8Jh83bUDS9UZ-dvYOP7k,5453
+ texttools/prompts/augment.yaml,sha256=uJnnP-uEafiATdBx74LiOQWX6spvwcC0J-yfhySfoAM,5423
  texttools/prompts/categorize.yaml,sha256=kN4uRPOC7q6A13bdCIox60vZZ8sgRiTtquv-kqIvTsk,1133
  texttools/prompts/extract_entities.yaml,sha256=-qe1eEvN-8nJ2_GLjeoFAPVORCPYUzsIt7UGXD485bE,648
  texttools/prompts/extract_keywords.yaml,sha256=jP74HFa4Dka01d1COStEBbdzW5onqwocwyyVsmNpECs,3276
@@ -22,9 +22,10 @@ texttools/prompts/summarize.yaml,sha256=0aKYFRDxODqOOEhSexi-hn3twLwkMFVmi7rtAifn
  texttools/prompts/to_question.yaml,sha256=n8Bn28QjvSHwPHQLwRYpZ2IsaaBsq4pK9Dp_i0xk8eg,2210
  texttools/prompts/translate.yaml,sha256=omtC-TlFYMidy8WqRe7idUtKNiK4g3IhEl-iyufOwjk,649
  texttools/tools/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- texttools/tools/async_tools.py,sha256=2ZJ8K1-SSRSyyQ5VfDBZof0HDeRjEuakZJyHAlswrLw,46089
- texttools/tools/sync_tools.py,sha256=WqHaUQscOd6RbMCGjhFbC4muw1VZxu-W5qCOA9JIwVc,41835
- hamtaa_texttools-2.1.0.dist-info/METADATA,sha256=Sq4pywPSrBvHxp6sundpF2LFblcJqYgkhONx8V3XNyU,6958
- hamtaa_texttools-2.1.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
- hamtaa_texttools-2.1.0.dist-info/top_level.txt,sha256=5Mh0jIxxZ5rOXHGJ6Mp-JPKviywwN0MYuH0xk5bEWqE,10
- hamtaa_texttools-2.1.0.dist-info/RECORD,,
+ texttools/tools/async_tools.py,sha256=_Dr5bo7RFp4f6eGNgNr549YIv5VoVpUq_ex_R5vsD2M,46087
+ texttools/tools/batch_tools.py,sha256=hwWutcSWc2k79vZX5Urft1arTgHpDnnxztHZba54xtg,29899
+ texttools/tools/sync_tools.py,sha256=UxXKUhnALoTCw2wpzfoBZVmhOZIGi6qv8tZAVXGIqFI,41833
+ hamtaa_texttools-2.2.0.dist-info/METADATA,sha256=qnmDDJ24KJ6BI-kJ31vxaigEET0-gVM0DBBnOlL9B-M,8928
+ hamtaa_texttools-2.2.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
+ hamtaa_texttools-2.2.0.dist-info/top_level.txt,sha256=5Mh0jIxxZ5rOXHGJ6Mp-JPKviywwN0MYuH0xk5bEWqE,10
+ hamtaa_texttools-2.2.0.dist-info/RECORD,,
texttools/__init__.py CHANGED
@@ -1,5 +1,6 @@
  from .models import CategoryTree
  from .tools.async_tools import AsyncTheTool
+ from .tools.batch_tools import BatchTheTool
  from .tools.sync_tools import TheTool
 
- __all__ = ["CategoryTree", "AsyncTheTool", "TheTool"]
+ __all__ = ["CategoryTree", "AsyncTheTool", "TheTool", "BatchTheTool"]
texttools/prompts/augment.yaml CHANGED
@@ -38,25 +38,25 @@ main_template:
  "{text}"
 
  hard_negative: |
- You are an AI assistant designed to generate high-quality training data for semantic text embedding models.
- Your task is to create a hard-negative sample for a given "Anchor" text.
+ You are an AI assistant designed to generate high-quality training data for semantic text embedding models.
+ Your task is to create a hard-negative sample for a given "Anchor" text.
 
- A high-quality hard-negative sample is a sentence that is topically related but semantically distinct from the Anchor.
- It should share some context (e.g., same domain, same entities) but differ in a crucial piece of information, action, conclusion, or specific detail.
+ A high-quality hard-negative sample is a sentence that is topically related but semantically distinct from the Anchor.
+ It should share some context (e.g., same domain, same entities) but differ in a crucial piece of information, action, conclusion, or specific detail.
 
- Instructions:
- - Stay in General Domain: Remain in the same broad domain (e.g., religious topics), but choose a completely different subject matter.
- - Maintain Topical Overlap: Keep the same domain, subject, or entities (e.g., people, products, concepts) as the Anchor.
- - Alter a Key Semantic Element: Reverse a key word or condition or place or proper name that completely reverses the meaning of the sentence.
- - Avoid Being a Paraphrase: The sentence must NOT be semantically equivalent. The core factual claim or intent must be different.
- - Make it Challenging: The difference should be subtle enough that it requires a deep understanding of the text to identify, not just a simple keyword mismatch.
- - Maintain Similar Length: The generated sentence should be of roughly the same length and level of detail as the Anchor.
+ Instructions:
+ - Stay in General Domain: Remain in the same broad domain (e.g., religious topics), but choose a completely different subject matter.
+ - Maintain Topical Overlap: Keep the same domain, subject, or entities (e.g., people, products, concepts) as the Anchor.
+ - Alter a Key Semantic Element: Reverse a key word or condition or place or proper name that completely reverses the meaning of the sentence.
+ - Avoid Being a Paraphrase: The sentence must NOT be semantically equivalent. The core factual claim or intent must be different.
+ - Make it Challenging: The difference should be subtle enough that it requires a deep understanding of the text to identify, not just a simple keyword mismatch.
+ - Maintain Similar Length: The generated sentence should be of roughly the same length and level of detail as the Anchor.
 
- Respond only in JSON format:
- {{"result": "rewriteen_text"}}
+ Respond only in JSON format:
+ {{"result": "rewriteen_text"}}
 
- Anchor Text:
- "{text}"
+ Anchor Text:
+ "{text}"
 
 
  analyze_template:
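The re-indented `hard_negative` prompt above backs the `hard_negative` mode of the `augment` tools. A minimal sketch of exercising it through the batch API (mirroring the `BatchTheTool.augment` signature in the new `batch_tools.py` below; the example text is illustrative):

```python
import asyncio
from openai import AsyncOpenAI
from texttools import BatchTheTool

async def main():
    client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
    batch = BatchTheTool(client=client, model="model_name")

    # mode="hard_negative" selects the prompt shown above: a topically
    # related rewrite whose key claim differs from the anchor text.
    outs = await batch.augment(
        texts=["The museum opens at 9 a.m. on weekdays."],
        mode="hard_negative",
    )
    print(outs[0].result)

asyncio.run(main())
```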
texttools/tools/async_tools.py CHANGED
@@ -62,7 +62,6 @@ class AsyncTheTool:
 
  Returns:
  ToolOutput
-
  """
  tool_name = "categorize"
  start = perf_counter()
texttools/tools/batch_tools.py ADDED
@@ -0,0 +1,688 @@
+ import asyncio
+ from typing import Any, Callable, Literal
+
+ from openai import AsyncOpenAI
+
+ from ..models import CategoryTree, ToolOutput
+ from .async_tools import AsyncTheTool
+
+
+ class BatchTheTool:
+     def __init__(
+         self,
+         client: AsyncOpenAI,
+         model: str,
+         raise_on_error: bool = True,
+         max_concurrency: int = 5,
+     ):
+         self.tool = AsyncTheTool(client, model, raise_on_error)
+         self.semaphore = asyncio.Semaphore(max_concurrency)
+
+     async def categorize(
+         self,
+         texts: list[str],
+         categories: list[str] | CategoryTree,
+         with_analysis: bool = False,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Classify texts into given categories
+
+         Arguments:
+             texts: The input texts
+             categories: The category list / category tree
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.categorize(
+                     text=text,
+                     categories=categories,
+                     with_analysis=with_analysis,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def extract_keywords(
+         self,
+         texts: list[str],
+         mode: Literal["auto", "threshold", "count"],
+         number_of_keywords: int | None = None,
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Extract keywords from the texts
+
+         Arguments:
+             texts: The input texts
+             mode: auto -> decide n of keywords automatically, threshold -> decide n of keywords by a threshold, count -> takes number of keywords as the parameter
+             number_of_keywords: Must be set only when using "count" mode
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.extract_keywords(
+                     text=text,
+                     mode=mode,
+                     number_of_keywords=number_of_keywords,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def extract_entities(
+         self,
+         texts: list[str],
+         entities: list[str] = ["all named entities"],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Perform Named Entity Recognition (NER) on texts
+
+         Arguments:
+             texts: The input texts
+             entities: List of entities
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.extract_entities(
+                     text=text,
+                     entities=entities,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def is_question(
+         self,
+         texts: list[str],
+         with_analysis: bool = False,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Detect if the inputs are phrased as questions.
+
+         Arguments:
+             texts: The input texts
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.is_question(
+                     text=text,
+                     with_analysis=with_analysis,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def to_question(
+         self,
+         texts: list[str],
+         number_of_questions: int,
+         mode: Literal["from_text", "from_subject"],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Generate questions from the given texts / subjects
+
+         Arguments:
+             texts: The input texts
+             mode: from_text -> generate questions from an answer, from_subject -> generate questions from a subject
+             number_of_questions: Number of questions to generate
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.to_question(
+                     text=text,
+                     number_of_questions=number_of_questions,
+                     mode=mode,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def merge_questions(
+         self,
+         texts_list: list[list[str]],
+         mode: Literal["simple", "stepwise"],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Merge multiple questions into a single unified question for each group
+
+         Arguments:
+             texts_list: List of groups of questions to merge
+             mode: simple -> regular question merging, stepwise -> merge questions in two steps
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(texts: list[str]) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.merge_questions(
+                     text=texts,
+                     mode=mode,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts_list]
+         return await asyncio.gather(*tasks)
+
+     async def augment(
+         self,
+         texts: list[str],
+         mode: Literal["positive", "negative", "hard_negative"],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Rewrite texts in different augmentations
+
+         Arguments:
+             texts: The input texts
+             mode: positive -> positive augmentation, negative -> negative augmentation, hard_negative -> hard negative augmentation
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.augment(
+                     text=text,
+                     mode=mode,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def summarize(
+         self,
+         texts: list[str],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Summarize the given texts
+
+         Arguments:
+             texts: The input texts
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.summarize(
+                     text=text,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def translate(
+         self,
+         texts: list[str],
+         target_lang: str,
+         use_chunker: bool = True,
+         with_analysis: bool = False,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Translate texts between languages
+
+         Important Note: This tool is EXPERIMENTAL, you can use it but it isn't reliable.
+
+         Arguments:
+             texts: The input texts
+             target_lang: The target language for translation
+             use_chunker: Whether to use text chunker for large texts
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.translate(
+                     text=text,
+                     target_lang=target_lang,
+                     use_chunker=use_chunker,
+                     with_analysis=with_analysis,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def propositionize(
+         self,
+         texts: list[str],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Convert texts into atomic, independent, meaningful sentences
+
+         Important Note: This tool is EXPERIMENTAL, you can use it but it isn't reliable.
+
+         Arguments:
+             texts: The input texts
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.propositionize(
+                     text=text,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t) for t in texts]
+         return await asyncio.gather(*tasks)
+
+     async def is_fact(
+         self,
+         texts: list[str],
+         source_texts: list[str],
+         with_analysis: bool = False,
+         output_lang: str | None = None,
+         user_prompt: str | None = None,
+         temperature: float | None = 0.0,
+         logprobs: bool = False,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Check whether statements are facts based on source texts
+
+         Important Note: This tool is EXPERIMENTAL, you can use it but it isn't reliable.
+
+         Arguments:
+             texts: The input texts (statements to check)
+             source_texts: The source texts
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             output_lang: Forces the model to respond in a specific language
+             user_prompt: Additional instructions
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(text: str, source_text: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.is_fact(
+                     text=text,
+                     source_text=source_text,
+                     with_analysis=with_analysis,
+                     output_lang=output_lang,
+                     user_prompt=user_prompt,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(t, s) for t, s in zip(texts, source_texts)]
+         return await asyncio.gather(*tasks)
+
+     async def run_custom(
+         self,
+         prompts: list[str],
+         output_model: Any,
+         with_analysis: bool = False,
+         analyze_template: str | None = None,
+         output_lang: str | None = None,
+         temperature: float | None = None,
+         logprobs: bool | None = None,
+         top_logprobs: int = 3,
+         validator: Callable[[Any], bool] | None = None,
+         max_validation_retries: int | None = None,
+         priority: int | None = None,
+         timeout: float | None = None,
+     ) -> list[ToolOutput]:
+         """
+         Custom tool that can do almost anything for multiple prompts
+
+         Arguments:
+             prompts: The user prompts
+             output_model: Pydantic BaseModel used for structured output
+             with_analysis: Adds a reasoning step before generating the final output. Note: This doubles token usage per call
+             analyze_template: The analyze template used for reasoning analysis
+             output_lang: Forces the model to respond in a specific language
+             temperature: Controls randomness
+             logprobs: Whether to return token probability information
+             top_logprobs: Number of top token alternatives to return if logprobs enabled
+             validator: Custom validation function to validate the output
+             max_validation_retries: Maximum number of retry attempts if validation fails
+             priority: Task execution priority (if enabled by vLLM and the model)
+             timeout: Maximum time in seconds to wait for the response before raising a timeout error
+
+         Returns:
+             list[ToolOutput]
+         """
+
+         async def _throttled_task(prompt: str) -> ToolOutput:
+             async with self.semaphore:
+                 return await self.tool.run_custom(
+                     prompt=prompt,
+                     output_model=output_model,
+                     with_analysis=with_analysis,
+                     analyze_template=analyze_template,
+                     output_lang=output_lang,
+                     temperature=temperature,
+                     logprobs=logprobs,
+                     top_logprobs=top_logprobs,
+                     validator=validator,
+                     max_validation_retries=max_validation_retries,
+                     priority=priority,
+                     timeout=timeout,
+                 )
+
+         tasks = [_throttled_task(p) for p in prompts]
+         return await asyncio.gather(*tasks)
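Every method above repeats the same throttling recipe: wrap the per-item call in a semaphore guard, then gather the tasks. A standalone sketch of that pattern (illustrative names, not part of the package):

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def bounded_gather(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 5,
) -> list[R]:
    # At most max_concurrency workers run at once; asyncio.gather
    # preserves input order, so results[i] matches the i-th item.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def throttled(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(throttled(i) for i in items))

async def demo(n: int) -> int:
    await asyncio.sleep(0.1)  # stand-in for an API call
    return n * 2

print(asyncio.run(bounded_gather(range(5), demo, max_concurrency=2)))
```

One caveat worth noting in the class above: `is_fact` pairs `texts` with `source_texts` via `zip`, which silently drops trailing items when the two lists differ in length.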
texttools/tools/sync_tools.py CHANGED
@@ -60,7 +60,6 @@ class TheTool:
 
  Returns:
  ToolOutput
-
  """
  tool_name = "categorize"
  start = perf_counter()