hamtaa-texttools 2.0.0__tar.gz → 2.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/PKG-INFO +60 -12
  2. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/README.md +58 -11
  3. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/PKG-INFO +60 -12
  4. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/SOURCES.txt +1 -0
  5. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/requires.txt +1 -0
  6. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/pyproject.toml +46 -45
  7. hamtaa_texttools-2.2.0/texttools/__init__.py +6 -0
  8. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/augment.yaml +15 -15
  9. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/async_tools.py +64 -3
  10. hamtaa_texttools-2.2.0/texttools/tools/batch_tools.py +688 -0
  11. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/sync_tools.py +64 -3
  12. hamtaa_texttools-2.0.0/texttools/__init__.py +0 -5
  13. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/LICENSE +0 -0
  14. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/dependency_links.txt +0 -0
  15. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/hamtaa_texttools.egg-info/top_level.txt +0 -0
  16. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/setup.cfg +0 -0
  17. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/tests/test_category_tree.py +0 -0
  18. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/tests/test_to_chunks.py +0 -0
  19. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/__init__.py +0 -0
  20. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/exceptions.py +0 -0
  21. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/internal_models.py +0 -0
  22. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/__init__.py +0 -0
  23. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/async_operator.py +0 -0
  24. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/operators/sync_operator.py +0 -0
  25. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/core/utils.py +0 -0
  26. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/models.py +0 -0
  27. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/categorize.yaml +0 -0
  28. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/extract_entities.yaml +0 -0
  29. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/extract_keywords.yaml +0 -0
  30. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/is_fact.yaml +0 -0
  31. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/is_question.yaml +0 -0
  32. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/merge_questions.yaml +0 -0
  33. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/propositionize.yaml +0 -0
  34. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/run_custom.yaml +0 -0
  35. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/summarize.yaml +0 -0
  36. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/to_question.yaml +0 -0
  37. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/prompts/translate.yaml +0 -0
  38. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/py.typed +0 -0
  39. {hamtaa_texttools-2.0.0 → hamtaa_texttools-2.2.0}/texttools/tools/__init__.py +0 -0
--- hamtaa_texttools-2.0.0/PKG-INFO
+++ hamtaa_texttools-2.2.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 2.0.0
+Version: 2.2.0
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Erfan Moosavi <erfanmoosavi84@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
 Maintainer-email: Erfan Moosavi <erfanmoosavi84@gmail.com>, Tohidi <the.mohammad.tohidi@gmail.com>
@@ -14,6 +14,7 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: dotenv>=0.9.9
 Requires-Dist: openai>=1.97.1
 Requires-Dist: pydantic>=2.0.0
 Requires-Dist: pyyaml>=6.0
@@ -28,7 +29,10 @@ Dynamic: license-file
 
 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.
 
-It provides both **sync (`TheTool`)** and **async (`AsyncTheTool`)** APIs for maximum flexibility.
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control
 
 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.
 
@@ -68,8 +72,8 @@ pip install -U hamtaa-texttools
 
 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()` (list mode), `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `categorize()` (tree mode), `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
 
 ---
 
@@ -95,6 +99,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.
 
+- **`raise_on_error: bool`** → (`TheTool`/`AsyncTheTool`) Raise errors (`True`) or return them in the output (`False`). Default is `True`.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.
 
 ---
 
@@ -114,13 +121,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.
 
+**Note:** For `BatchTheTool`, each method returns a `list[ToolOutput]` containing results for all input texts.
+
 ---
 
-## 🧨 Sync vs Async
-| Tool | Style | Use case |
-|--------------|---------|---------------------------------------------|
-| `TheTool` | Sync | Simple scripts, sequential workflows |
-| `AsyncTheTool` | Async | High-throughput apps, APIs, concurrent tasks |
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |
 
 ---
 
@@ -165,6 +175,35 @@ async def main():
 asyncio.run(main())
 ```
 
+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---
 
 ## ✅ Use Cases
@@ -173,11 +212,20 @@ Use **TextTools** when you need to:
 
 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ---
 
 ## 🤝 Contributing
 
-Contributions are welcome!
-Feel free to **open issues, suggest new features, or submit pull requests**.
+We welcome contributions from the community! See the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
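The hunks above introduce a `max_concurrency` setting for `BatchTheTool`. The package's actual implementation lives in `texttools/tools/batch_tools.py` (not shown in this diff); a common way to implement such a cap over async calls is an `asyncio.Semaphore`. The sketch below is an illustrative stand-in under that assumption — `process_batch` and `fake_llm_call` are hypothetical names, not part of the package:

```python
import asyncio

# Illustrative sketch (NOT the package's code): the semaphore-bounded
# fan-out that a `max_concurrency` parameter typically implies.
async def process_batch(texts, worker, max_concurrency=5):
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(text):
        async with sem:  # at most `max_concurrency` workers run at once
            return await worker(text)

    # gather() preserves input order, so results[i] matches texts[i]
    return await asyncio.gather(*(bounded(t) for t in texts))

async def demo():
    async def fake_llm_call(text):
        await asyncio.sleep(0)  # stand-in for a real API round-trip
        return text.upper()

    return await process_batch(["a", "b", "c"], fake_llm_call, max_concurrency=2)

results = asyncio.run(demo())
print(results)  # → ['A', 'B', 'C']
```

Ordered results are what makes the README's `list[ToolOutput]` contract workable: the caller can zip outputs back onto the input texts.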
--- hamtaa_texttools-2.0.0/README.md
+++ hamtaa_texttools-2.2.0/README.md
@@ -7,7 +7,10 @@
 
 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.
 
-It provides both **sync (`TheTool`)** and **async (`AsyncTheTool`)** APIs for maximum flexibility.
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control
 
 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.
 
@@ -47,8 +50,8 @@ pip install -U hamtaa-texttools
 
 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()` (list mode), `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `categorize()` (tree mode), `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
 
 ---
 
@@ -74,6 +77,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.
 
+- **`raise_on_error: bool`** → (`TheTool`/`AsyncTheTool`) Raise errors (`True`) or return them in the output (`False`). Default is `True`.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.
 
 ---
 
@@ -93,13 +99,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.
 
+**Note:** For `BatchTheTool`, each method returns a `list[ToolOutput]` containing results for all input texts.
+
 ---
 
-## 🧨 Sync vs Async
-| Tool | Style | Use case |
-|--------------|---------|---------------------------------------------|
-| `TheTool` | Sync | Simple scripts, sequential workflows |
-| `AsyncTheTool` | Async | High-throughput apps, APIs, concurrent tasks |
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |
 
 ---
 
@@ -144,6 +153,35 @@ async def main():
 asyncio.run(main())
 ```
 
+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---
 
 ## ✅ Use Cases
@@ -152,11 +190,20 @@ Use **TextTools** when you need to:
 
 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ---
 
 ## 🤝 Contributing
 
-Contributions are welcome!
-Feel free to **open issues, suggest new features, or submit pull requests**.
+We welcome contributions from the community! See the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
--- hamtaa_texttools-2.0.0/hamtaa_texttools.egg-info/PKG-INFO
+++ hamtaa_texttools-2.2.0/hamtaa_texttools.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 2.0.0
+Version: 2.2.0
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Erfan Moosavi <erfanmoosavi84@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, Zareshahi <a.zareshahi1377@gmail.com>
 Maintainer-email: Erfan Moosavi <erfanmoosavi84@gmail.com>, Tohidi <the.mohammad.tohidi@gmail.com>
@@ -14,6 +14,7 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: dotenv>=0.9.9
 Requires-Dist: openai>=1.97.1
 Requires-Dist: pydantic>=2.0.0
 Requires-Dist: pyyaml>=6.0
@@ -28,7 +29,10 @@ Dynamic: license-file
 
 **TextTools** is a high-level **NLP toolkit** built on top of **LLMs**.
 
-It provides both **sync (`TheTool`)** and **async (`AsyncTheTool`)** APIs for maximum flexibility.
+It provides three API styles for maximum flexibility:
+- Sync API (`TheTool`) - Simple, sequential operations
+- Async API (`AsyncTheTool`) - High-performance async operations
+- Batch API (`BatchTheTool`) - Process multiple texts in parallel with built-in concurrency control
 
 It provides ready-to-use utilities for **translation, question detection, categorization, NER extraction, and more** - designed to help you integrate AI-powered text processing into your applications with minimal effort.
 
@@ -68,8 +72,8 @@ pip install -U hamtaa-texttools
 
 | Status | Meaning | Tools | Safe for Production? |
 |--------|---------|----------|-------------------|
-| **✅ Production** | Evaluated and tested. | `categorize()` (list mode), `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
-| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `categorize()` (tree mode), `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
+| **✅ Production** | Evaluated and tested. | `categorize()`, `extract_keywords()`, `extract_entities()`, `is_question()`, `to_question()`, `merge_questions()`, `augment()`, `summarize()`, `run_custom()` | **Yes** - ready for reliable use. |
+| **🧪 Experimental** | Added to the package but **not fully evaluated**. | `translate()`, `propositionize()`, `is_fact()` | **Use with caution** |
 
 ---
 
@@ -95,6 +99,9 @@ pip install -U hamtaa-texttools
 - **`timeout: float`** → Maximum time in seconds to wait for the response before raising a timeout error.
 **Note:** This feature is only available in `AsyncTheTool`.
 
+- **`raise_on_error: bool`** → (`TheTool`/`AsyncTheTool`) Raise errors (`True`) or return them in the output (`False`). Default is `True`.
+
+- **`max_concurrency: int`** → (`BatchTheTool` only) Maximum number of concurrent API calls. Default is 5.
 
 ---
 
@@ -114,13 +121,16 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - Verify operation success with the `is_successful()` method.
 - Convert output to a dictionary with the `to_dict()` method.
 
+**Note:** For `BatchTheTool`, each method returns a `list[ToolOutput]` containing results for all input texts.
+
 ---
 
-## 🧨 Sync vs Async
-| Tool | Style | Use case |
-|--------------|---------|---------------------------------------------|
-| `TheTool` | Sync | Simple scripts, sequential workflows |
-| `AsyncTheTool` | Async | High-throughput apps, APIs, concurrent tasks |
+## 🧨 Sync vs Async vs Batch
+| Tool | Style | Use Case | Best For |
+|------|-------|----------|----------|
+| `TheTool` | **Sync** | Simple scripts, sequential workflows | • Quick prototyping<br>• Simple scripts<br>• Sequential processing<br>• Debugging |
+| `AsyncTheTool` | **Async** | High-throughput applications, APIs, concurrent tasks | • Web APIs<br>• Concurrent operations<br>• High-performance apps<br>• Real-time processing |
+| `BatchTheTool` | **Batch** | Process multiple texts efficiently with controlled concurrency | • Bulk processing<br>• Large datasets<br>• Parallel execution<br>• Resource optimization |
 
 ---
 
@@ -165,6 +175,35 @@ async def main():
 asyncio.run(main())
 ```
 
+## ⚡ Quick Start (Batch)
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+from texttools import BatchTheTool
+
+async def main():
+    async_client = AsyncOpenAI(base_url="your_url", api_key="your_api_key")
+    model = "model_name"
+
+    batch_tool = BatchTheTool(client=async_client, model=model, max_concurrency=3)
+
+    categories = await batch_tool.categorize(
+        texts=[
+            "Climate change impacts on agriculture",
+            "Artificial intelligence in healthcare",
+            "Economic effects of remote work",
+            "Advancements in quantum computing",
+        ],
+        categories=["Science", "Technology", "Economics", "Environment"],
+    )
+
+    for i, result in enumerate(categories):
+        print(f"Text {i+1}: {result.result}")
+
+asyncio.run(main())
+```
+
 ---
 
 ## ✅ Use Cases
@@ -173,11 +212,20 @@ Use **TextTools** when you need to:
 
 - 🔍 **Classify** large datasets quickly without model training
 - 🧩 **Integrate** LLMs into production pipelines (structured outputs)
-- 📊 **Analyze** large text collections using embeddings and categorization
+- 📊 **Analyze** large text collections using embeddings and categorization
+
+---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ---
 
 ## 🤝 Contributing
 
-Contributions are welcome!
-Feel free to **open issues, suggest new features, or submit pull requests**.
+We welcome contributions from the community! See the [CONTRIBUTING](CONTRIBUTING.md) file for details.
+
+## 📚 Documentation
+
+For detailed documentation, architecture overview, and implementation details, please visit the [docs](docs) directory.
--- hamtaa_texttools-2.0.0/hamtaa_texttools.egg-info/SOURCES.txt
+++ hamtaa_texttools-2.2.0/hamtaa_texttools.egg-info/SOURCES.txt
@@ -32,4 +32,5 @@ texttools/prompts/to_question.yaml
 texttools/prompts/translate.yaml
 texttools/tools/__init__.py
 texttools/tools/async_tools.py
+texttools/tools/batch_tools.py
 texttools/tools/sync_tools.py
--- hamtaa_texttools-2.0.0/hamtaa_texttools.egg-info/requires.txt
+++ hamtaa_texttools-2.2.0/hamtaa_texttools.egg-info/requires.txt
@@ -1,3 +1,4 @@
+dotenv>=0.9.9
 openai>=1.97.1
 pydantic>=2.0.0
 pyyaml>=6.0
--- hamtaa_texttools-2.0.0/pyproject.toml
+++ hamtaa_texttools-2.2.0/pyproject.toml
@@ -1,45 +1,46 @@
-[build-system]
-requires = ["setuptools>=61.0", "wheel"]
-build-backend = "setuptools.build_meta"
-
-[project]
-name = "hamtaa-texttools"
-version = "2.0.0"
-authors = [
-    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
-    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
-    {name = "Montazer", email = "montazerh82@gmail.com"},
-    {name = "Givechi", email = "mohamad.m.givechi@gmail.com"},
-    {name = "Zareshahi", email = "a.zareshahi1377@gmail.com"},
-]
-maintainers = [
-    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
-    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
-]
-description = "A high-level NLP toolkit built on top of modern LLMs."
-readme = "README.md"
-license = {text = "MIT"}
-requires-python = ">=3.11"
-dependencies = [
-    "openai>=1.97.1",
-    "pydantic>=2.0.0",
-    "pyyaml>=6.0",
-]
-keywords = ["nlp", "llm", "text-processing", "openai"]
-classifiers = [
-    "Development Status :: 5 - Production/Stable",
-    "License :: OSI Approved :: MIT License",
-    "Topic :: Scientific/Engineering :: Artificial Intelligence",
-    "Topic :: Text Processing",
-    "Operating System :: OS Independent",
-]
-
-[tool.setuptools.packages.find]
-where = ["."]
-include = ["texttools*"]
-
-[tool.setuptools]
-include-package-data = true
-
-[tool.setuptools.package-data]
-"texttools" = ["prompts/*.yaml", "py.typed"]
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "hamtaa-texttools"
+version = "2.2.0"
+authors = [
+    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
+    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
+    {name = "Montazer", email = "montazerh82@gmail.com"},
+    {name = "Givechi", email = "mohamad.m.givechi@gmail.com"},
+    {name = "Zareshahi", email = "a.zareshahi1377@gmail.com"},
+]
+maintainers = [
+    {name = "Erfan Moosavi", email = "erfanmoosavi84@gmail.com"},
+    {name = "Tohidi", email = "the.mohammad.tohidi@gmail.com"},
+]
+description = "A high-level NLP toolkit built on top of modern LLMs."
+readme = "README.md"
+license = {text = "MIT"}
+requires-python = ">=3.11"
+dependencies = [
+    "dotenv>=0.9.9",
+    "openai>=1.97.1",
+    "pydantic>=2.0.0",
+    "pyyaml>=6.0",
+]
+keywords = ["nlp", "llm", "text-processing", "openai"]
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "License :: OSI Approved :: MIT License",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Text Processing",
+    "Operating System :: OS Independent",
+]
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["texttools*"]
+
+[tool.setuptools]
+include-package-data = true
+
+[tool.setuptools.package-data]
+"texttools" = ["prompts/*.yaml", "py.typed"]
--- /dev/null
+++ hamtaa_texttools-2.2.0/texttools/__init__.py
@@ -0,0 +1,6 @@
+from .models import CategoryTree
+from .tools.async_tools import AsyncTheTool
+from .tools.batch_tools import BatchTheTool
+from .tools.sync_tools import TheTool
+
+__all__ = ["CategoryTree", "AsyncTheTool", "TheTool", "BatchTheTool"]
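The README changes in this release also document a `raise_on_error` flag on `TheTool`/`AsyncTheTool`: raise on failure, or return the error inside the output object. As a rough illustration of those semantics only — `ToolOutputSketch` and `run_tool` are hypothetical names, not the package's actual classes:

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical stand-in for the package's ToolOutput model.
@dataclass
class ToolOutputSketch:
    result: object = None
    error: str | None = None

    def is_successful(self) -> bool:
        return self.error is None

def run_tool(fn, text, raise_on_error=True):
    try:
        return ToolOutputSketch(result=fn(text))
    except Exception as exc:
        if raise_on_error:
            raise  # raise_on_error=True: propagate to the caller
        return ToolOutputSketch(error=str(exc))  # False: error travels in the output

ok = run_tool(str.upper, "hi")
bad = run_tool(lambda t: 1 / 0, "hi", raise_on_error=False)
print(ok.result, ok.is_successful(), bad.is_successful())  # HI True False
```

The return-the-error style pairs naturally with batch processing, where one failed text should not abort the whole run.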
--- hamtaa_texttools-2.0.0/texttools/prompts/augment.yaml
+++ hamtaa_texttools-2.2.0/texttools/prompts/augment.yaml
@@ -38,25 +38,25 @@ main_template:
 "{text}"
 
 hard_negative: |
-You are an AI assistant designed to generate high-quality training data for semantic text embedding models.
-Your task is to create a hard-negative sample for a given "Anchor" text.
+You are an AI assistant designed to generate high-quality training data for semantic text embedding models.
+Your task is to create a hard-negative sample for a given "Anchor" text.
 
-A high-quality hard-negative sample is a sentence that is topically related but semantically distinct from the Anchor.
-It should share some context (e.g., same domain, same entities) but differ in a crucial piece of information, action, conclusion, or specific detail.
+A high-quality hard-negative sample is a sentence that is topically related but semantically distinct from the Anchor.
+It should share some context (e.g., same domain, same entities) but differ in a crucial piece of information, action, conclusion, or specific detail.
 
-Instructions:
-- Stay in General Domain: Remain in the same broad domain (e.g., religious topics), but choose a completely different subject matter.
-- Maintain Topical Overlap: Keep the same domain, subject, or entities (e.g., people, products, concepts) as the Anchor.
-- Alter a Key Semantic Element: Reverse a key word or condition or place or proper name that completely reverses the meaning of the sentence.
-- Avoid Being a Paraphrase: The sentence must NOT be semantically equivalent. The core factual claim or intent must be different.
-- Make it Challenging: The difference should be subtle enough that it requires a deep understanding of the text to identify, not just a simple keyword mismatch.
-- Maintain Similar Length: The generated sentence should be of roughly the same length and level of detail as the Anchor.
+Instructions:
+- Stay in General Domain: Remain in the same broad domain (e.g., religious topics), but choose a completely different subject matter.
+- Maintain Topical Overlap: Keep the same domain, subject, or entities (e.g., people, products, concepts) as the Anchor.
+- Alter a Key Semantic Element: Reverse a key word or condition or place or proper name that completely reverses the meaning of the sentence.
+- Avoid Being a Paraphrase: The sentence must NOT be semantically equivalent. The core factual claim or intent must be different.
+- Make it Challenging: The difference should be subtle enough that it requires a deep understanding of the text to identify, not just a simple keyword mismatch.
+- Maintain Similar Length: The generated sentence should be of roughly the same length and level of detail as the Anchor.
 
-Respond only in JSON format:
-{{"result": "rewriteen_text"}}
+Respond only in JSON format:
+{{"result": "rewriteen_text"}}
 
-Anchor Text:
-"{text}"
+Anchor Text:
+"{text}"
 
 
 analyze_template: