hamtaa-texttools 2.0.0__tar.gz → 2.2.0__tar.gz
This diff shows the changes between two publicly released versions of the package, as published to a supported registry. It is provided for informational purposes only and reflects the package contents exactly as they appear in the public registry.
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/PKG-INFO +60 -12
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/README.md +58 -11
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/PKG-INFO +60 -12
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/SOURCES.txt +1 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/requires.txt +1 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/pyproject.toml +46 -45
- hamtaa_texttools-2.2.0/texttools/__init__.py +6 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/augment.yaml +15 -15
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/async_tools.py +64 -3
- hamtaa_texttools-2.2.0/texttools/tools/batch_tools.py +688 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/sync_tools.py +64 -3
- hamtaa_texttools-2.0.0/texttools/__init__.py +0 -5
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/LICENSE +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/dependency_links.txt +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/top_level.txt +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/setup.cfg +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/tests/test_category_tree.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/tests/test_to_chunks.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/__init__.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/exceptions.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/internal_models.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/__init__.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/async_operator.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/sync_operator.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/utils.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/models.py +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/categorize.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/extract_entities.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/extract_keywords.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/is_fact.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/is_question.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/merge_questions.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/propositionize.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/run_custom.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/summarize.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/to_question.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/translate.yaml +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/py.typed +0 -0
- {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/__init__.py +0 -0
--- hamtaa_texttools-2.0.0/PKG-INFO
+++ hamtaa_texttools-2.2.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 2.0.0
+Version: 2.2.0
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Erfan Moosavi <erfanmoosavi84@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
 Maintainer-email: Erfan Moosavi <erfanmoosavi84@gmail.com>, Tohidi <the.mohammad.tohidi@gmail.com>
@@ -14,6 +14,7 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: dotenv>=0.9.9
 Requires-Dist: openai>=1.97.1
 Requires-Dist: pydantic>=2.0.0
 Requires-Dist: pyyaml>=6.0
@@ -28,7 +29,10 @@ Dynamic: license-file

 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.

-It provides
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control

 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.

@@ -68,8 +72,8 @@ pip install -U hamtaa-texttools

 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |

 ---

@@ -95,6 +99,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.

+- **`raise_on_error: bool`** → (`TheTool/AsyncTheTool`) Raise errors (True) or return them in output (False). Default is True.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.

 ---

@@ -114,13 +121,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.

+**Note:** For BatchTheTool: Each method returns a list[ToolOutput] containing results for all input texts.
+
 ---

-## 🧨 Sync vs Async
-| Tool
-
-| `TheTool`
-| `AsyncTheTool` | Async | High-throughput
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |

 ---

@@ -165,6 +175,35 @@ async def main():
 asyncio.run(main())
 ```

+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_the_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---

 ## ✅ Use Cases
@@ -173,11 +212,20 @@ Use **TextTools** when you need to:

 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

 ---

 ## 🤝 Contributing

-
-
+We welcome contributions from the community! - see the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
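The new "Quick Start (Batch)" section added above constructs `batch_the_tool` but then calls `batch_tool.categorize(...)`. Below is a minimal, consistently named sketch of the same flow; it assumes the `BatchTheTool` constructor arguments and `categorize()` keywords are exactly as documented in the README text, and nothing here has been checked against the new `texttools/tools/batch_tools.py`.

```python
# Minimal sketch of the documented batch flow, with one variable name throughout.
# Assumed (from the README only): BatchTheTool(client, model, max_concurrency),
# categorize(texts=..., categories=...) -> list[ToolOutput], ToolOutput.result.
import asyncio

from openai import AsyncOpenAI
from texttools import BatchTheTool


async def main() -> None:
    client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
    batch_tool = BatchTheTool(client=client, model="model_name", max_concurrency=3)

    # One ToolOutput per input text, in the same order as `texts`.
    outputs = await batch_tool.categorize(
        texts=[
            "Climate change impacts on agriculture",
            "Artificial intelligence in healthcare",
        ],
        categories=["Science", "Technology", "Economics", "Environment"],
    )
    for i, output in enumerate(outputs, start=1):
        print(f"Text {i}: {output.result}")


asyncio.run(main())
```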
--- hamtaa_texttools-2.0.0/README.md
+++ hamtaa_texttools-2.2.0/README.md
@@ -7,7 +7,10 @@

 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.

-It provides
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control

 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.

@@ -47,8 +50,8 @@ pip install -U hamtaa-texttools

 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |

 ---

@@ -74,6 +77,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.

+- **`raise_on_error: bool`** → (`TheTool/AsyncTheTool`) Raise errors (True) or return them in output (False). Default is True.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.

 ---

@@ -93,13 +99,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.

+**Note:** For BatchTheTool: Each method returns a list[ToolOutput] containing results for all input texts.
+
 ---

-## 🧨 Sync vs Async
-| Tool
-
-| `TheTool`
-| `AsyncTheTool` | Async | High-throughput
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |

 ---

@@ -144,6 +153,35 @@ async def main():
 asyncio.run(main())
 ```

+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_the_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---

 ## ✅ Use Cases
@@ -152,11 +190,20 @@ Use **TextTools** when you need to:

 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

 ---

 ## 🤝 Contributing

-
-
+We welcome contributions from the community! - see the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
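The parameter notes above introduce `raise_on_error` (errors raised or returned in the output, default `True`) and reiterate that `timeout` is async-only, while the `ToolOutput` section documents `is_successful()` and `to_dict()`. A hedged sketch of how those pieces might be combined follows; whether `raise_on_error` and `timeout` are per-call keywords or constructor options is not shown in this diff, so treat the exact call sites as assumptions.

```python
# Hedged sketch: consuming a ToolOutput with raise_on_error=False.
# Assumed (not shown in this diff): AsyncTheTool(client, model), an is_question()
# method as listed in the production table, and per-call raise_on_error/timeout.
import asyncio

from openai import AsyncOpenAI
from texttools import AsyncTheTool


async def main() -> None:
    client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
    tool = AsyncTheTool(client=client, model="model_name")

    output = await tool.is_question(
        "Is the sky blue?",
        raise_on_error=False,  # errors come back in the output instead of raising
        timeout=30.0,          # documented as available only in AsyncTheTool
    )

    if output.is_successful():
        print(output.result)
    else:
        print(output.to_dict())  # full payload, including any error details


asyncio.run(main())
```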
--- hamtaa_texttools-2.0.0/hamtaa_texttools.egg-info/PKG-INFO
+++ hamtaa_texttools-2.2.0/hamtaa_texttools.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 2.0.0
+Version: 2.2.0
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Erfan Moosavi <erfanmoosavi84@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
 Maintainer-email: Erfan Moosavi <erfanmoosavi84@gmail.com>, Tohidi <the.mohammad.tohidi@gmail.com>
@@ -14,6 +14,7 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: dotenv>=0.9.9
 Requires-Dist: openai>=1.97.1
 Requires-Dist: pydantic>=2.0.0
 Requires-Dist: pyyaml>=6.0
@@ -28,7 +29,10 @@ Dynamic: license-file

 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.

-It provides
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control

 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.

@@ -68,8 +72,8 @@ pip install -U hamtaa-texttools

 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |

 ---

@@ -95,6 +99,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.

+- **`raise_on_error: bool`** → (`TheTool/AsyncTheTool`) Raise errors (True) or return them in output (False). Default is True.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.

 ---

@@ -114,13 +121,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.

+**Note:** For BatchTheTool: Each method returns a list[ToolOutput] containing results for all input texts.
+
 ---

-## 🧨 Sync vs Async
-| Tool
-
-| `TheTool`
-| `AsyncTheTool` | Async | High-throughput
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |

 ---

@@ -165,6 +175,35 @@ async def main():
 asyncio.run(main())
 ```

+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_the_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---

 ## ✅ Use Cases
@@ -173,11 +212,20 @@ Use **TextTools** when you need to:

 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

 ---

 ## 🤝 Contributing

-
-
+We welcome contributions from the community! - see the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
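`BatchTheTool` caps parallel API calls via `max_concurrency` (default 5). The implementation in the new `batch_tools.py` (+688 lines) is not part of this diff, so the snippet below is only a generic illustration of how such a cap is commonly enforced with `asyncio.Semaphore`, not the package's actual code.

```python
# Generic concurrency-capping pattern; illustrative only, not taken from
# texttools/tools/batch_tools.py.
import asyncio
from collections.abc import Awaitable, Iterable


async def run_bounded(coros: Iterable[Awaitable], max_concurrency: int = 5) -> list:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(coro: Awaitable):
        async with semaphore:  # at most max_concurrency awaits run at once
            return await coro

    # Results are returned in input order, mirroring the documented
    # list[ToolOutput] that BatchTheTool methods return.
    return await asyncio.gather(*(run_one(c) for c in coros))
```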
--- hamtaa_texttools-2.0.0/pyproject.toml
+++ hamtaa_texttools-2.2.0/pyproject.toml
@@ -1,45 +1,46 @@
-[build-system]
-requires = ["setuptools>=61.0", "wheel"]
-build-backend = "setuptools.build_meta"
-
-[project]
-name = "hamtaa-texttools"
-version = "2.0.0"
-authors = [
-    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
-    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
-    {name = "Montazer", email = "montazerh82@gmail.com"},
-    {name = "Givechi", email = "mohamad.m.givechi@gmail.com"},
-    {name = "Zareshahi", email = "a.zareshahi1377@gmail.com"},
-]
-maintainers = [
-    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
-    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
-]
-description = "A high-level NLP toolkit built on top of modern LLMs."
-readme = "README.md"
-license = {text = "MIT"}
-requires-python = ">=3.11"
-dependencies = [
-    "
-    "
-    "
-
-
-
-
-    "
-    "
-    "Topic ::
-    "
-
-
-
-
-
-
-
-
-
-
-
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "hamtaa-texttools"
+version = "2.2.0"
+authors = [
+    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
+    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
+    {name = "Montazer", email = "montazerh82@gmail.com"},
+    {name = "Givechi", email = "mohamad.m.givechi@gmail.com"},
+    {name = "Zareshahi", email = "a.zareshahi1377@gmail.com"},
+]
+maintainers = [
+    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
+    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
+]
+description = "A high-level NLP toolkit built on top of modern LLMs."
+readme = "README.md"
+license = {text = "MIT"}
+requires-python = ">=3.11"
+dependencies = [
+    "dotenv>=0.9.9",
+    "openai>=1.97.1",
+    "pydantic>=2.0.0",
+    "pyyaml>=6.0",
+]
+keywords = ["nlp", "llm", "text-processing", "openai"]
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "License :: OSI Approved :: MIT License",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Text Processing",
+    "Operating System :: OS Independent",
+]
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["texttools*"]
+
+[tool.setuptools]
+include-package-data = true
+
+[tool.setuptools.package-data]
+"texttools" = ["prompts/*.yaml", "py.typed"]
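The rewritten `pyproject.toml` enables `include-package-data` and ships `prompts/*.yaml` and `py.typed` as package data under `texttools`. A quick sanity check that the prompt files actually end up in an installed copy of the package could look like this (it assumes `hamtaa-texttools` is installed in the current environment):

```python
# Sketch: list the bundled prompt YAML files from an installed package,
# relying on the [tool.setuptools.package-data] entry shown above.
from importlib import resources

prompts_dir = resources.files("texttools") / "prompts"
for entry in sorted(prompts_dir.iterdir(), key=lambda e: e.name):
    if entry.name.endswith(".yaml"):
        print(entry.name)  # augment.yaml, categorize.yaml, ...
```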
--- hamtaa_texttools-2.0.0/texttools/prompts/augment.yaml
+++ hamtaa_texttools-2.2.0/texttools/prompts/augment.yaml
@@ -38,25 +38,25 @@ main_template:
   "{text}"

 hard_negative: |
-
-
+  You are an AI assistant designed to generate high-quality training data for semantic text embedding models.
+  Your task is to create a hard-negative sample for a given "Anchor" text.

-
-
+  A high-quality hard-negative sample is a sentence that is topically related but semantically distinct from the Anchor.
+  It should share some context (e.g., same domain, same entities) but differ in a crucial piece of information, action, conclusion, or specific detail.

-
-
-
-
-
-
-
+  Instructions:
+  - Stay in General Domain: Remain in the same broad domain (e.g., religious topics), but choose a completely different subject matter.
+  - Maintain Topical Overlap: Keep the same domain, subject, or entities (e.g., people, products, concepts) as the Anchor.
+  - Alter a Key Semantic Element: Reverse a key word or condition or place or proper name that completely reverses the meaning of the sentence.
+  - Avoid Being a Paraphrase: The sentence must NOT be semantically equivalent. The core factual claim or intent must be different.
+  - Make it Challenging: The difference should be subtle enough that it requires a deep understanding of the text to identify, not just a simple keyword mismatch.
+  - Maintain Similar Length: The generated sentence should be of roughly the same length and level of detail as the Anchor.

-
-
+  Respond only in JSON format:
+  {{"result": "rewriteen_text"}}

-
-
+  Anchor Text:
+  "{text}"


 analyze_template:
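The rewritten `hard_negative` template ends by asking the model to respond only in JSON with a single `result` field (the doubled braces are presumably literal-brace escapes around the template's `{text}`-style placeholders). If you reuse the template outside TextTools' own operators, a pydantic model matching that contract might look like the following sketch; the package's internal parsing may differ.

```python
# Hedged sketch: validate the {"result": "..."} reply that the hard_negative
# prompt requests. Not taken from texttools' internal operators.
from pydantic import BaseModel


class HardNegativeReply(BaseModel):
    result: str


raw_reply = '{"result": "The treaty was signed in a different city a decade later."}'
reply = HardNegativeReply.model_validate_json(raw_reply)
print(reply.result)
```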