hamtaa-texttools 1.1.13__tar.gz → 1.1.15__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {hamtaa_texttools-1.1.13/hamtaa_texttools.egg-info → hamtaa_texttools-1.1.15}/PKG-INFO +9 -6
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/README.md +8 -5
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15/hamtaa_texttools.egg-info}/PKG-INFO +9 -6
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/hamtaa_texttools.egg-info/SOURCES.txt +3 -3
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/pyproject.toml +33 -33
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/tests/test_all_async_tools.py +3 -1
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/tests/test_all_tools.py +24 -5
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/__init__.py +2 -1
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/batch/batch_config.py +1 -1
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/batch/batch_runner.py +1 -1
- hamtaa_texttools-1.1.15/texttools/prompts/categorize.yaml +77 -0
- hamtaa_texttools-1.1.15/texttools/prompts/detect_entity.yaml +22 -0
- hamtaa_texttools-1.1.15/texttools/prompts/extract_keywords.yaml +68 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/async_tools.py +277 -53
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/internals/async_operator.py +10 -4
- hamtaa_texttools-1.1.15/texttools/tools/internals/models.py +183 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/internals/sync_operator.py +11 -4
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/sync_tools.py +277 -51
- hamtaa_texttools-1.1.13/tests/test_logprobs.py +0 -38
- hamtaa_texttools-1.1.13/texttools/prompts/categorizer.yaml +0 -28
- hamtaa_texttools-1.1.13/texttools/prompts/extract_keywords.yaml +0 -18
- hamtaa_texttools-1.1.13/texttools/tools/internals/output_models.py +0 -62
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/LICENSE +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/MANIFEST.in +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/hamtaa_texttools.egg-info/dependency_links.txt +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/hamtaa_texttools.egg-info/requires.txt +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/hamtaa_texttools.egg-info/top_level.txt +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/setup.cfg +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/tests/test_output_validation.py +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/batch/internals/batch_manager.py +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/batch/internals/utils.py +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/README.md +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/extract_entities.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/is_question.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/merge_questions.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/rewrite.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/run_custom.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/subject_to_question.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/summarize.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/text_to_question.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/prompts/translate.yaml +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/internals/formatters.py +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/internals/operator_utils.py +0 -0
- {hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/tools/internals/prompt_loader.py +0 -0
{hamtaa_texttools-1.1.13/hamtaa_texttools.egg-info → hamtaa_texttools-1.1.15}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 1.1.13
+Version: 1.1.15
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, MoosaviNejad <erfanmoosavi84@gmail.com>
 License: MIT License
@@ -50,7 +50,7 @@ It provides ready-to-use utilities for **translation, question detection, keywor
 TextTools provides a rich collection of high-level NLP utilities,
 Each tool is designed to work with structured outputs (JSON / Pydantic).

-- **`categorize()`** - Classifies text into
+- **`categorize()`** - Classifies text into given categories (You have to create a category tree)
 - **`extract_keywords()`** - Extracts keywords from text
 - **`extract_entities()`** - Named Entity Recognition (NER) system
 - **`is_question()`** - Binary detection of whether input is a question
@@ -64,7 +64,7 @@ Each tool is designed to work with structured outputs (JSON / Pydantic).

 ---

-## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature` and `
+## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature`, `validator` and `priority` parameters

 TextTools provides several optional flags to customize LLM behavior:

@@ -72,6 +72,7 @@ TextTools provides several optional flags to customize LLM behavior:
 **Note:** This doubles token usage per call because it triggers an additional LLM request.

 - **`logprobs (bool)`** → Returns token-level probabilities for the generated output. You can also specify `top_logprobs=<N>` to get the top N alternative tokens and their probabilities.
+**Note:** This feature works if it's supported by the model.

 - **`output_lang (str)`** → Forces the model to respond in a specific language. The model will ignore other instructions about language and respond strictly in the requested language.

@@ -79,9 +80,10 @@ TextTools provides several optional flags to customize LLM behavior:

 - **`temperature (float)`** → Determines how creative the model should respond. Takes a float number from `0.0` to `2.0`.

-- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return bool (True if there were no problem, False if the validation
+- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return a bool (True if there were no problem, False if the validation fails.) If the validator fails, TheTool will retry to get another output by modifying `temperature`. You can specify `max_validation_retries=<N>` to change the number of retries.

-
+- **`priority (int)`** → Task execution priority level. Higher values = higher priority. Affects processing order in queues.
+**Note:** This feature works if it's supported by the model and vLLM.

 **Note:** There might be some tools that don't support some of the parameters above.

@@ -93,9 +95,10 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - **`result (Any)`** → The output of LLM
 - **`analysis (str)`** → The reasoning step before generating the final output
 - **`logprobs (list)`** → Token-level probabilities for the generated output
+- **`process (str)`** → The tool name which processed the input
 - **`errors (list[str])`** → Any error that have occured during calling LLM

-**
+**Note:** You can use `repr(ToolOutput)` to see details of an output.

 ---

{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/README.md

@@ -15,7 +15,7 @@ It provides ready-to-use utilities for **translation, question detection, keywor
 TextTools provides a rich collection of high-level NLP utilities,
 Each tool is designed to work with structured outputs (JSON / Pydantic).

-- **`categorize()`** - Classifies text into
+- **`categorize()`** - Classifies text into given categories (You have to create a category tree)
 - **`extract_keywords()`** - Extracts keywords from text
 - **`extract_entities()`** - Named Entity Recognition (NER) system
 - **`is_question()`** - Binary detection of whether input is a question
@@ -29,7 +29,7 @@ Each tool is designed to work with structured outputs (JSON / Pydantic).

 ---

-## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature` and `
+## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature`, `validator` and `priority` parameters

 TextTools provides several optional flags to customize LLM behavior:

@@ -37,6 +37,7 @@ TextTools provides several optional flags to customize LLM behavior:
 **Note:** This doubles token usage per call because it triggers an additional LLM request.

 - **`logprobs (bool)`** → Returns token-level probabilities for the generated output. You can also specify `top_logprobs=<N>` to get the top N alternative tokens and their probabilities.
+**Note:** This feature works if it's supported by the model.

 - **`output_lang (str)`** → Forces the model to respond in a specific language. The model will ignore other instructions about language and respond strictly in the requested language.

@@ -44,9 +45,10 @@ TextTools provides several optional flags to customize LLM behavior:

 - **`temperature (float)`** → Determines how creative the model should respond. Takes a float number from `0.0` to `2.0`.

-- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return bool (True if there were no problem, False if the validation
+- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return a bool (True if there were no problem, False if the validation fails.) If the validator fails, TheTool will retry to get another output by modifying `temperature`. You can specify `max_validation_retries=<N>` to change the number of retries.

-
+- **`priority (int)`** → Task execution priority level. Higher values = higher priority. Affects processing order in queues.
+**Note:** This feature works if it's supported by the model and vLLM.

 **Note:** There might be some tools that don't support some of the parameters above.

@@ -58,9 +60,10 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - **`result (Any)`** → The output of LLM
 - **`analysis (str)`** → The reasoning step before generating the final output
 - **`logprobs (list)`** → Token-level probabilities for the generated output
+- **`process (str)`** → The tool name which processed the input
 - **`errors (list[str])`** → Any error that have occured during calling LLM

-**
+**Note:** You can use `repr(ToolOutput)` to see details of an output.

 ---

{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15/hamtaa_texttools.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hamtaa-texttools
-Version: 1.1.13
+Version: 1.1.15
 Summary: A high-level NLP toolkit built on top of modern LLMs.
 Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, MoosaviNejad <erfanmoosavi84@gmail.com>
 License: MIT License
@@ -50,7 +50,7 @@ It provides ready-to-use utilities for **translation, question detection, keywor
 TextTools provides a rich collection of high-level NLP utilities,
 Each tool is designed to work with structured outputs (JSON / Pydantic).

-- **`categorize()`** - Classifies text into
+- **`categorize()`** - Classifies text into given categories (You have to create a category tree)
 - **`extract_keywords()`** - Extracts keywords from text
 - **`extract_entities()`** - Named Entity Recognition (NER) system
 - **`is_question()`** - Binary detection of whether input is a question
@@ -64,7 +64,7 @@ Each tool is designed to work with structured outputs (JSON / Pydantic).

 ---

-## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature` and `
+## ⚙️ `with_analysis`, `logprobs`, `output_lang`, `user_prompt`, `temperature`, `validator` and `priority` parameters

 TextTools provides several optional flags to customize LLM behavior:

@@ -72,6 +72,7 @@ TextTools provides several optional flags to customize LLM behavior:
 **Note:** This doubles token usage per call because it triggers an additional LLM request.

 - **`logprobs (bool)`** → Returns token-level probabilities for the generated output. You can also specify `top_logprobs=<N>` to get the top N alternative tokens and their probabilities.
+**Note:** This feature works if it's supported by the model.

 - **`output_lang (str)`** → Forces the model to respond in a specific language. The model will ignore other instructions about language and respond strictly in the requested language.

@@ -79,9 +80,10 @@ TextTools provides several optional flags to customize LLM behavior:

 - **`temperature (float)`** → Determines how creative the model should respond. Takes a float number from `0.0` to `2.0`.

-- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return bool (True if there were no problem, False if the validation
+- **`validator (Callable)`** → Forces TheTool to validate the output result based on your custom validator. Validator should return a bool (True if there were no problem, False if the validation fails.) If the validator fails, TheTool will retry to get another output by modifying `temperature`. You can specify `max_validation_retries=<N>` to change the number of retries.

-
+- **`priority (int)`** → Task execution priority level. Higher values = higher priority. Affects processing order in queues.
+**Note:** This feature works if it's supported by the model and vLLM.

 **Note:** There might be some tools that don't support some of the parameters above.

@@ -93,9 +95,10 @@ Every tool of `TextTools` returns a `ToolOutput` object which is a BaseModel wit
 - **`result (Any)`** → The output of LLM
 - **`analysis (str)`** → The reasoning step before generating the final output
 - **`logprobs (list)`** → Token-level probabilities for the generated output
+- **`process (str)`** → The tool name which processed the input
 - **`errors (list[str])`** → Any error that have occured during calling LLM

-**
+**Note:** You can use `repr(ToolOutput)` to see details of an output.

 ---

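The README and PKG-INFO changes above document two new call-level parameters (`validator`, together with `max_validation_retries`, and `priority`) and a new `process` field on `ToolOutput`. Below is a minimal sketch of how they might be used, assuming an OpenAI-compatible endpoint as in the package's own tests; the base URL, model name, and exact keyword signatures are assumptions inferred from the README text above, not taken from the library source.

```python
from openai import OpenAI

from texttools import TheTool

# Placeholder endpoint and model name - substitute your own values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
t = TheTool(client=client, model="my-model")


def non_empty_list(result) -> bool:
    """Custom validator: accept the output only if it is a non-empty list."""
    return isinstance(result, list) and len(result) > 0


# `validator`, `max_validation_retries` and `priority` as documented in the README diff.
keywords = t.extract_keywords(
    "Tomorrow, we will be dead by the car crash",
    validator=non_empty_list,   # on failure, the tool retries with a modified temperature
    max_validation_retries=2,   # number of validation retries
    priority=5,                 # higher value = higher queue priority (vLLM-dependent)
)

# ToolOutput fields listed in the README, including the new `process` field.
print(keywords.result)   # the LLM output
print(keywords.process)  # name of the tool that produced this output
print(keywords.errors)   # any errors raised while calling the LLM
print(repr(keywords))    # full details of the ToolOutput object
```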
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/hamtaa_texttools.egg-info/SOURCES.txt

@@ -9,7 +9,6 @@ hamtaa_texttools.egg-info/requires.txt
 hamtaa_texttools.egg-info/top_level.txt
 tests/test_all_async_tools.py
 tests/test_all_tools.py
-tests/test_logprobs.py
 tests/test_output_validation.py
 texttools/__init__.py
 texttools/batch/batch_config.py
@@ -17,7 +16,8 @@ texttools/batch/batch_runner.py
 texttools/batch/internals/batch_manager.py
 texttools/batch/internals/utils.py
 texttools/prompts/README.md
-texttools/prompts/categorizer.yaml
+texttools/prompts/categorize.yaml
+texttools/prompts/detect_entity.yaml
 texttools/prompts/extract_entities.yaml
 texttools/prompts/extract_keywords.yaml
 texttools/prompts/is_question.yaml
@@ -32,7 +32,7 @@ texttools/tools/async_tools.py
 texttools/tools/sync_tools.py
 texttools/tools/internals/async_operator.py
 texttools/tools/internals/formatters.py
+texttools/tools/internals/models.py
 texttools/tools/internals/operator_utils.py
-texttools/tools/internals/output_models.py
 texttools/tools/internals/prompt_loader.py
 texttools/tools/internals/sync_operator.py
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/pyproject.toml

@@ -1,33 +1,33 @@
-[build-system]
-requires = ["setuptools>=61.0", "wheel"]
-build-backend = "setuptools.build_meta"
-
-[project]
-name = "hamtaa-texttools"
-version = "1.1.13"
-authors = [
-    { name = "Tohidi", email = "the.mohammad.tohidi@gmail.com" },
-    { name = "Montazer", email = "montazerh82@gmail.com" },
-    { name = "Givechi", email = "mohamad.m.givechi@gmail.com" },
-    { name = "MoosaviNejad", email = "erfanmoosavi84@gmail.com" },
-]
-description = "A high-level NLP toolkit built on top of modern LLMs."
-readme = "README.md"
-license = {file = "LICENSE"}
-requires-python = ">=3.8"
-dependencies = [
-    "openai==1.97.1",
-    "pydantic>=2.0.0",
-    "pyyaml>=6.0",
-]
-keywords = ["nlp", "llm", "text-processing", "openai"]
-
-[tool.setuptools.packages.find]
-where = ["."]
-include = ["texttools*"]
-
-[tool.setuptools]
-include-package-data = true
-
-[tool.setuptools.package-data]
-"texttools" = ["prompts/*.yaml", "prompts/*.yml"]
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "hamtaa-texttools"
+version = "1.1.15"
+authors = [
+    { name = "Tohidi", email = "the.mohammad.tohidi@gmail.com" },
+    { name = "Montazer", email = "montazerh82@gmail.com" },
+    { name = "Givechi", email = "mohamad.m.givechi@gmail.com" },
+    { name = "MoosaviNejad", email = "erfanmoosavi84@gmail.com" },
+]
+description = "A high-level NLP toolkit built on top of modern LLMs."
+readme = "README.md"
+license = {file = "LICENSE"}
+requires-python = ">=3.8"
+dependencies = [
+    "openai==1.97.1",
+    "pydantic>=2.0.0",
+    "pyyaml>=6.0",
+]
+keywords = ["nlp", "llm", "text-processing", "openai"]
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["texttools*"]
+
+[tool.setuptools]
+include-package-data = true
+
+[tool.setuptools.package-data]
+"texttools" = ["prompts/*.yaml", "prompts/*.yml"]
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/tests/test_all_async_tools.py

@@ -20,7 +20,9 @@ t = AsyncTheTool(client=client, model=MODEL)


 async def main():
-    category_task = t.categorize(
+    category_task = t.categorize(
+        "سلام حالت چطوره؟", categories=["هیچکدام", "دینی", "فلسفه"]
+    )
     keywords_task = t.extract_keywords("Tomorrow, we will be dead by the car crash")
     entities_task = t.extract_entities("We will be dead by the car crash")
     detection_task = t.is_question("We will be dead by the car crash")
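The async test hunk above only shows the coroutines being created inside `main()`; how they are awaited lies outside the diff. A self-contained sketch of one way to drive them concurrently, assuming an `AsyncOpenAI` client and `asyncio.gather` (neither appears in the shown hunk). The Persian test strings are translated here for readability: "سلام حالت چطوره؟" is roughly "Hello, how are you?" and the categories are "None", "Religious", "Philosophy".

```python
import asyncio

from openai import AsyncOpenAI

from texttools import AsyncTheTool

# Placeholder endpoint and model name - substitute your own values.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
t = AsyncTheTool(client=client, model="my-model")


async def main():
    # Build the coroutines first (as in the test above), then run them concurrently.
    category, keywords = await asyncio.gather(
        t.categorize("Hello, how are you?", categories=["None", "Religious", "Philosophy"]),
        t.extract_keywords("Tomorrow, we will be dead by the car crash"),
    )
    print(repr(category))
    print(repr(keywords))


if __name__ == "__main__":
    asyncio.run(main())
```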
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/tests/test_all_tools.py

@@ -4,7 +4,7 @@ from dotenv import load_dotenv
 from openai import OpenAI
 from pydantic import BaseModel

-from texttools import TheTool
+from texttools import TheTool, CategoryTree

 # Load environment variables from .env
 load_dotenv()
@@ -18,18 +18,38 @@ client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
 # Create an instance of TheTool
 t = TheTool(client=client, model=MODEL)

-# Categorizer
-category = t.categorize("سلام حالت چطوره؟")
+# Categorizer: list mode
+category = t.categorize("سلام حالت چطوره؟", categories=["هیچکدام", "دینی", "فلسفه"])
 print(repr(category))

+# Categorizer: tree mode
+tree = CategoryTree("category_test_tree")
+tree.add_node("اخلاق")
+tree.add_node("معرفت شناسی")
+tree.add_node("متافیزیک", description="اراده قدرت در حیطه متافیزیک است")
+tree.add_node("فلسفه ذهن", description="فلسفه ذهن به چگونگی درک ما از جهان می پردازد")
+tree.add_node("آگاهی", "فلسفه ذهن", description="آگاهی خیلی مهم است")
+tree.add_node("ذهن و بدن", "فلسفه ذهن")
+tree.add_node("امکان و ضرورت", "متافیزیک")
+
+categories = t.categorize(
+    "اراده قدرت مفهومی مهم در مابعد الطبیعه است که توسط نیچه مطرح شده",
+    tree,
+    mode="category_tree",
+)
+print(repr(categories))
+
 # Keyword Extractor
-keywords = t.extract_keywords(
+keywords = t.extract_keywords(
+    "Tomorrow, we will be dead by the car crash", mode="count", number_of_keywords=3
+)
 print(repr(keywords))

 # NER Extractor
 entities = t.extract_entities("We will be dead by the car crash")
 print(repr(entities))

+
 # Question Detector
 detection = t.is_question("We will be dead by the car crash")
 print(repr(detection))
@@ -49,7 +69,6 @@ print(repr(merged))
 rewritten = t.rewrite(
     "چرا ما انسان ها، موجوداتی اجتماعی هستیم؟",
     mode="positive",
-    with_analysis=True,
 )
 print(repr(rewritten))

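For readers who do not read Persian, the tree-mode example added to `tests/test_all_tools.py` above builds a small philosophy taxonomy and then categorizes a sentence about Nietzsche's "will to power". Below are the same calls with the node names and descriptions translated into English; the `add_node(name, parent, description=...)` pattern is taken from the test code itself, and the translations are glosses rather than part of the package.

```python
from texttools import CategoryTree

# English gloss of the category tree built in the test above.
tree = CategoryTree("category_test_tree")
tree.add_node("Ethics")
tree.add_node("Epistemology")
tree.add_node("Metaphysics", description="The will to power falls under metaphysics")
tree.add_node("Philosophy of Mind", description="Philosophy of mind deals with how we perceive the world")
tree.add_node("Consciousness", "Philosophy of Mind", description="Consciousness is very important")
tree.add_node("Mind and Body", "Philosophy of Mind")
tree.add_node("Possibility and Necessity", "Metaphysics")

# The categorized sentence translates roughly to:
# "The will to power is an important concept in metaphysics, introduced by Nietzsche."
# categories = t.categorize(text, tree, mode="category_tree")  # as in the test above
```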
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/__init__.py

@@ -2,5 +2,6 @@ from .batch.batch_runner import BatchJobRunner
 from .batch.batch_config import BatchConfig
 from .tools.sync_tools import TheTool
 from .tools.async_tools import AsyncTheTool
+from .tools.internals.models import CategoryTree

-__all__ = ["TheTool", "AsyncTheTool", "BatchJobRunner", "BatchConfig"]
+__all__ = ["TheTool", "AsyncTheTool", "BatchJobRunner", "BatchConfig", "CategoryTree"]
{hamtaa_texttools-1.1.13 → hamtaa_texttools-1.1.15}/texttools/batch/batch_runner.py

@@ -11,7 +11,7 @@ from pydantic import BaseModel

 from texttools.batch.internals.batch_manager import BatchManager
 from texttools.batch.batch_config import BatchConfig
-from texttools.tools.internals.output_models import StrOutput
+from texttools.tools.internals.models import StrOutput

 # Base Model type for output models
 T = TypeVar("T", bound=BaseModel)
hamtaa_texttools-1.1.15/texttools/prompts/categorize.yaml (new file)

@@ -0,0 +1,77 @@
+main_template:
+
+  category_list: |
+    You are an expert classification agent.
+    You receive a list of categories.
+
+    Your task:
+    - Read all provided categories carefully.
+    - Consider the user query, intent, and task explanation.
+    - Select exactly one category name from the list that best matches the user’s intent.
+    - Return only the category name, nothing else.
+
+    Rules:
+    - Never invent categories that are not in the list.
+    - If multiple categories seem possible, choose the closest match based on the description and user intent.
+    - If descriptions are missing or empty, rely on the category name.
+    - If the correct answer cannot be determined with certainty, choose the most likely one.
+
+    Output format:
+    {{
+      "reason": "Explanation of why the input belongs to the category"
+      "result": "<category_name_only>"
+    }}
+
+    Available categories with their descriptions:
+    {category_list}
+
+    The text that has to be categorized:
+    {input}
+
+  category_tree: |
+    You are an expert classification agent.
+    You receive a list of categories at the current level of a hierarchical category tree.
+
+    Your task:
+    - Read all provided categories carefully.
+    - Consider the user query, intent, and task explanation.
+    - Select exactly one category name from the list that best matches the user’s intent.
+    - Return only the category name, nothing else.
+
+    Rules:
+    - Never invent categories that are not in the list.
+    - If multiple categories seem possible, choose the closest match based on the description and user intent.
+    - If descriptions are missing or empty, rely on the category name.
+    - If the correct answer cannot be determined with certainty, choose the most likely one.
+
+    Output format:
+    {{
+      "reason": "Explanation of why the input belongs to the category"
+      "result": "<category_name_only>"
+    }}
+
+    Available categories with their descriptions at this level:
+    {category_list}
+
+    Do not include category descriptions at all. Only write the raw category.
+
+    The text that has to be categorized:
+    {input}
+
+analyze_template:
+
+  category_list: |
+    We want to categorize the given text.
+    To improve categorization, we need an analysis of the text.
+    Analyze the given text and write its main idea and a short analysis of that.
+    Analysis should be very short.
+    Text:
+    {input}
+
+  category_tree: |
+    We want to categorize the given text.
+    To improve categorization, we need an analysis of the text.
+    Analyze the given text and write its main idea and a short analysis of that.
+    Analysis should be very short.
+    Text:
+    {input}
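The new prompt files are Python `str.format` templates: single-brace placeholders such as `{category_list}` and `{input}` are filled in by the library, while the doubled braces `{{ ... }}` around the JSON example survive formatting as literal braces. A rough sketch of that mechanism follows; the real loading code lives in `texttools/tools/internals/prompt_loader.py`, which is not part of this diff, so the snippet below only illustrates the template format, not the library's actual loader.

```python
import yaml

# Load the new categorize prompt file (path relative to the package source).
with open("texttools/prompts/categorize.yaml", encoding="utf-8") as f:
    prompts = yaml.safe_load(f)

template = prompts["main_template"]["category_list"]

# Render the two placeholders; {{ }} in the YAML comes out as literal { }.
category_list = "- Religion\n- Philosophy\n- None of these"
prompt = template.format(category_list=category_list, input="Hello, how are you?")
print(prompt)
```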
hamtaa_texttools-1.1.15/texttools/prompts/detect_entity.yaml (new file)

@@ -0,0 +1,22 @@
+main_template: |
+  You are an expert Named Entity Recognition (NER) system. Extract entities from the text.
+  The output must strictly follow the provided Pydantic schema.
+
+  Mapping Rule:
+  - Person: شخص
+  - Location: مکان
+  - Time: زمان
+  - Living Beings: موجود زنده
+  - Organization: سازمان
+  - Concept: مفهوم
+
+  CRITICAL:
+  1. The final output structure must be a complete JSON object matching the Pydantic schema (List[Entity]).
+  2. Both the extracted text and the type must be in Persian, using the exact mapping provided above.
+
+  Here is the text: {input}
+
+analyze_template: |
+  Analyze the following text to identify all potential named entities and their categories (Person, Location, Time, Living Beings, Organization, Concept).
+  Provide a brief summary of the entities identified that will help the main process to extract them accurately and apply the correct Persian type label.
+  Here is the text: {input}
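`detect_entity.yaml` refers to a Pydantic schema (`List[Entity]`) whose items carry the extracted span and one of the six Persian type labels. The actual model is defined in the new `texttools/tools/internals/models.py`, which is not shown in this diff; the sketch below is only a guess at the general shape the prompt describes.

```python
from typing import List, Literal

from pydantic import BaseModel


class Entity(BaseModel):
    """Hypothetical shape of the Entity schema the prompt refers to."""

    text: str  # the extracted span, in Persian
    type: Literal["شخص", "مکان", "زمان", "موجود زنده", "سازمان", "مفهوم"]


class Entities(BaseModel):
    result: List[Entity]
```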
hamtaa_texttools-1.1.15/texttools/prompts/extract_keywords.yaml (new file)

@@ -0,0 +1,68 @@
+main_template:
+
+  auto: |
+    You are an expert keyword extractor.
+    Extract the most relevant keywords from the given text.
+    Guidelines:
+    - Keywords must represent the main concepts of the text.
+    - If two words have overlapping meanings, choose only one.
+    - Do not include generic or unrelated words.
+    - Keywords must be single, self-contained words (no phrases).
+    - Output between 3 and 7 keywords based on the input length.
+    - Respond only in JSON format:
+    {{"result": ["keyword1", "keyword2", etc.]}}
+    Here is the text:
+    {input}
+
+  threshold: |
+    You are an expert keyword extractor specialized in fine-grained concept identification.
+    Extract the most specific, content-bearing keywords from the text.
+
+    Requirements:
+    - Choose fine-grained conceptual terms, not general domain labels.
+    - Avoid words that only describe the broad topic (e.g., Islam, religion, philosophy, history).
+    - Prefer specific names, concepts, doctrines, events, arguments, or terminology.
+    - Do not select words only because they appear frequently. A keyword must represent a central conceptual idea, not a repeated surface term.
+    - If multiple words express overlapping meaning, select the more specific one.
+    - Keywords must be single words (no multi-word expressions).
+    - Extract N keywords depending on input length:
+      - Short texts (a few sentences): 3 keywords
+      - Medium texts (1–4 paragraphs): 4–5 keywords
+      - Long texts (more than 4 paragraphs): 6–7 keywords
+    - Respond only in JSON format:
+    {{"result": ["keyword1", "keyword2", etc.]}}
+    Here is the text:
+    {input}
+
+  count: |
+    You are an expert keyword extractor with precise output requirements.
+    Extract exactly {number_of_keywords} keywords from the given text.
+
+    Requirements:
+    - Extract exactly {number_of_keywords} keywords, no more, no less.
+    - Select the {number_of_keywords} most relevant and specific keywords that represent core concepts.
+    - Prefer specific terms, names, and concepts over general topic labels.
+    - If the text doesn't contain enough distinct keywords, include the most relevant ones even if some are less specific.
+    - Keywords must be single words (no multi-word expressions).
+    - Order keywords by relevance (most relevant first).
+    - Respond only in JSON format:
+    {{"result": ["keyword1", "keyword2", "keyword3", ...]}}
+
+    Here is the text:
+    {input}
+
+analyze_template:
+  auto: |
+    Analyze the following text to identify its main topics, concepts, and important terms.
+    Provide a concise summary of your findings that will help in extracting relevant keywords.
+    {input}
+
+  threshold: |
+    Analyze the following text to identify its main topics, concepts, and important terms.
+    Provide a concise summary of your findings that will help in extracting relevant keywords.
+    {input}
+
+  count: |
+    Analyze the following text to identify its main topics, concepts, and important terms.
+    Provide a concise summary of your findings that will help in extracting relevant keywords.
+    {input}
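`extract_keywords.yaml` defines three prompt variants keyed `auto`, `threshold`, and `count`, and the updated `tests/test_all_tools.py` above calls `extract_keywords(..., mode="count", number_of_keywords=3)`. A short sketch of how the three variants might line up with the call site; only the `count` call appears in the diff, so treating `auto` and `threshold` as literal `mode` values is an assumption based on the YAML keys.

```python
from openai import OpenAI

from texttools import TheTool

# Placeholder endpoint and model name - substitute your own values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
t = TheTool(client=client, model="my-model")

text = "Tomorrow, we will be dead by the car crash"

# "count" mode is exercised in the test above: exactly N keywords, most relevant first.
exact = t.extract_keywords(text, mode="count", number_of_keywords=3)

# "auto" (3-7 keywords by length) and "threshold" (fine-grained terms) mirror
# the other template keys in the YAML; the mode strings here are assumed.
auto = t.extract_keywords(text, mode="auto")
fine_grained = t.extract_keywords(text, mode="threshold")

for output in (exact, auto, fine_grained):
    print(repr(output))
```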