langroid 0.31.1__py3-none-any.whl → 0.33.3__py3-none-any.whl
- {langroid-0.31.1.dist-info → langroid-0.33.3.dist-info}/METADATA +150 -124
- langroid-0.33.3.dist-info/RECORD +7 -0
- {langroid-0.31.1.dist-info → langroid-0.33.3.dist-info}/WHEEL +1 -1
- langroid-0.33.3.dist-info/entry_points.txt +4 -0
- pyproject.toml +317 -212
- langroid/__init__.py +0 -106
- langroid/agent/.chainlit/config.toml +0 -121
- langroid/agent/.chainlit/translations/bn.json +0 -231
- langroid/agent/.chainlit/translations/en-US.json +0 -229
- langroid/agent/.chainlit/translations/gu.json +0 -231
- langroid/agent/.chainlit/translations/he-IL.json +0 -231
- langroid/agent/.chainlit/translations/hi.json +0 -231
- langroid/agent/.chainlit/translations/kn.json +0 -231
- langroid/agent/.chainlit/translations/ml.json +0 -231
- langroid/agent/.chainlit/translations/mr.json +0 -231
- langroid/agent/.chainlit/translations/ta.json +0 -231
- langroid/agent/.chainlit/translations/te.json +0 -231
- langroid/agent/.chainlit/translations/zh-CN.json +0 -229
- langroid/agent/__init__.py +0 -41
- langroid/agent/base.py +0 -1981
- langroid/agent/batch.py +0 -398
- langroid/agent/callbacks/__init__.py +0 -0
- langroid/agent/callbacks/chainlit.py +0 -598
- langroid/agent/chat_agent.py +0 -1899
- langroid/agent/chat_document.py +0 -454
- langroid/agent/helpers.py +0 -0
- langroid/agent/junk +0 -13
- langroid/agent/openai_assistant.py +0 -882
- langroid/agent/special/__init__.py +0 -59
- langroid/agent/special/arangodb/__init__.py +0 -0
- langroid/agent/special/arangodb/arangodb_agent.py +0 -656
- langroid/agent/special/arangodb/system_messages.py +0 -186
- langroid/agent/special/arangodb/tools.py +0 -107
- langroid/agent/special/arangodb/utils.py +0 -36
- langroid/agent/special/doc_chat_agent.py +0 -1466
- langroid/agent/special/lance_doc_chat_agent.py +0 -262
- langroid/agent/special/lance_rag/__init__.py +0 -9
- langroid/agent/special/lance_rag/critic_agent.py +0 -198
- langroid/agent/special/lance_rag/lance_rag_task.py +0 -82
- langroid/agent/special/lance_rag/query_planner_agent.py +0 -260
- langroid/agent/special/lance_tools.py +0 -61
- langroid/agent/special/neo4j/__init__.py +0 -0
- langroid/agent/special/neo4j/csv_kg_chat.py +0 -174
- langroid/agent/special/neo4j/neo4j_chat_agent.py +0 -433
- langroid/agent/special/neo4j/system_messages.py +0 -120
- langroid/agent/special/neo4j/tools.py +0 -32
- langroid/agent/special/relevance_extractor_agent.py +0 -127
- langroid/agent/special/retriever_agent.py +0 -56
- langroid/agent/special/sql/__init__.py +0 -17
- langroid/agent/special/sql/sql_chat_agent.py +0 -654
- langroid/agent/special/sql/utils/__init__.py +0 -21
- langroid/agent/special/sql/utils/description_extractors.py +0 -190
- langroid/agent/special/sql/utils/populate_metadata.py +0 -85
- langroid/agent/special/sql/utils/system_message.py +0 -35
- langroid/agent/special/sql/utils/tools.py +0 -64
- langroid/agent/special/table_chat_agent.py +0 -263
- langroid/agent/structured_message.py +0 -9
- langroid/agent/task.py +0 -2093
- langroid/agent/tool_message.py +0 -393
- langroid/agent/tools/__init__.py +0 -38
- langroid/agent/tools/duckduckgo_search_tool.py +0 -50
- langroid/agent/tools/file_tools.py +0 -234
- langroid/agent/tools/google_search_tool.py +0 -39
- langroid/agent/tools/metaphor_search_tool.py +0 -67
- langroid/agent/tools/orchestration.py +0 -303
- langroid/agent/tools/recipient_tool.py +0 -235
- langroid/agent/tools/retrieval_tool.py +0 -32
- langroid/agent/tools/rewind_tool.py +0 -137
- langroid/agent/tools/segment_extract_tool.py +0 -41
- langroid/agent/typed_task.py +0 -19
- langroid/agent/xml_tool_message.py +0 -382
- langroid/agent_config.py +0 -0
- langroid/cachedb/__init__.py +0 -17
- langroid/cachedb/base.py +0 -58
- langroid/cachedb/momento_cachedb.py +0 -108
- langroid/cachedb/redis_cachedb.py +0 -153
- langroid/embedding_models/__init__.py +0 -39
- langroid/embedding_models/base.py +0 -74
- langroid/embedding_models/clustering.py +0 -189
- langroid/embedding_models/models.py +0 -461
- langroid/embedding_models/protoc/__init__.py +0 -0
- langroid/embedding_models/protoc/embeddings.proto +0 -19
- langroid/embedding_models/protoc/embeddings_pb2.py +0 -33
- langroid/embedding_models/protoc/embeddings_pb2.pyi +0 -50
- langroid/embedding_models/protoc/embeddings_pb2_grpc.py +0 -79
- langroid/embedding_models/remote_embeds.py +0 -153
- langroid/exceptions.py +0 -65
- langroid/experimental/team-save.py +0 -391
- langroid/language_models/.chainlit/config.toml +0 -121
- langroid/language_models/.chainlit/translations/en-US.json +0 -231
- langroid/language_models/__init__.py +0 -53
- langroid/language_models/azure_openai.py +0 -153
- langroid/language_models/base.py +0 -678
- langroid/language_models/config.py +0 -18
- langroid/language_models/mock_lm.py +0 -124
- langroid/language_models/openai_gpt.py +0 -1923
- langroid/language_models/prompt_formatter/__init__.py +0 -16
- langroid/language_models/prompt_formatter/base.py +0 -40
- langroid/language_models/prompt_formatter/hf_formatter.py +0 -132
- langroid/language_models/prompt_formatter/llama2_formatter.py +0 -75
- langroid/language_models/utils.py +0 -147
- langroid/mytypes.py +0 -84
- langroid/parsing/__init__.py +0 -52
- langroid/parsing/agent_chats.py +0 -38
- langroid/parsing/code-parsing.md +0 -86
- langroid/parsing/code_parser.py +0 -121
- langroid/parsing/config.py +0 -0
- langroid/parsing/document_parser.py +0 -718
- langroid/parsing/image_text.py +0 -32
- langroid/parsing/para_sentence_split.py +0 -62
- langroid/parsing/parse_json.py +0 -155
- langroid/parsing/parser.py +0 -313
- langroid/parsing/repo_loader.py +0 -790
- langroid/parsing/routing.py +0 -36
- langroid/parsing/search.py +0 -275
- langroid/parsing/spider.py +0 -102
- langroid/parsing/table_loader.py +0 -94
- langroid/parsing/url_loader.py +0 -111
- langroid/parsing/url_loader_cookies.py +0 -73
- langroid/parsing/urls.py +0 -273
- langroid/parsing/utils.py +0 -373
- langroid/parsing/web_search.py +0 -155
- langroid/prompts/__init__.py +0 -9
- langroid/prompts/chat-gpt4-system-prompt.md +0 -68
- langroid/prompts/dialog.py +0 -17
- langroid/prompts/prompts_config.py +0 -5
- langroid/prompts/templates.py +0 -141
- langroid/pydantic_v1/__init__.py +0 -10
- langroid/pydantic_v1/main.py +0 -4
- langroid/utils/.chainlit/config.toml +0 -121
- langroid/utils/.chainlit/translations/en-US.json +0 -231
- langroid/utils/__init__.py +0 -19
- langroid/utils/algorithms/__init__.py +0 -3
- langroid/utils/algorithms/graph.py +0 -103
- langroid/utils/configuration.py +0 -98
- langroid/utils/constants.py +0 -30
- langroid/utils/docker.py +0 -37
- langroid/utils/git_utils.py +0 -252
- langroid/utils/globals.py +0 -49
- langroid/utils/llms/__init__.py +0 -0
- langroid/utils/llms/strings.py +0 -8
- langroid/utils/logging.py +0 -135
- langroid/utils/object_registry.py +0 -66
- langroid/utils/output/__init__.py +0 -20
- langroid/utils/output/citations.py +0 -41
- langroid/utils/output/printing.py +0 -99
- langroid/utils/output/status.py +0 -40
- langroid/utils/pandas_utils.py +0 -30
- langroid/utils/pydantic_utils.py +0 -602
- langroid/utils/system.py +0 -286
- langroid/utils/types.py +0 -93
- langroid/utils/web/__init__.py +0 -0
- langroid/utils/web/login.py +0 -83
- langroid/vector_store/__init__.py +0 -50
- langroid/vector_store/base.py +0 -357
- langroid/vector_store/chromadb.py +0 -214
- langroid/vector_store/lancedb.py +0 -401
- langroid/vector_store/meilisearch.py +0 -299
- langroid/vector_store/momento.py +0 -278
- langroid/vector_store/qdrant_cloud.py +0 -6
- langroid/vector_store/qdrantdb.py +0 -468
- langroid-0.31.1.dist-info/RECORD +0 -162
- {langroid-0.31.1.dist-info → langroid-0.33.3.dist-info/licenses}/LICENSE +0 -0
langroid/language_models/prompt_formatter/__init__.py
DELETED
@@ -1,16 +0,0 @@
-from . import base
-from . import llama2_formatter
-from .base import PromptFormatter
-from .llama2_formatter import Llama2Formatter
-from ..config import PromptFormatterConfig
-from ..config import Llama2FormatterConfig
-
-
-__all__ = [
-    "PromptFormatter",
-    "Llama2Formatter",
-    "PromptFormatterConfig",
-    "Llama2FormatterConfig",
-    "base",
-    "llama2_formatter",
-]
langroid/language_models/prompt_formatter/base.py
DELETED
@@ -1,40 +0,0 @@
-import logging
-from abc import ABC, abstractmethod
-from typing import List
-
-from langroid.language_models.base import LLMMessage
-from langroid.language_models.config import PromptFormatterConfig
-
-logger = logging.getLogger(__name__)
-
-
-class PromptFormatter(ABC):
-    """
-    Abstract base class for a prompt formatter
-    """
-
-    def __init__(self, config: PromptFormatterConfig):
-        self.config = config
-
-    @staticmethod
-    def create(formatter: str) -> "PromptFormatter":
-        from langroid.language_models.config import HFPromptFormatterConfig
-        from langroid.language_models.prompt_formatter.hf_formatter import HFFormatter
-
-        return HFFormatter(HFPromptFormatterConfig(model_name=formatter))
-
-    @abstractmethod
-    def format(self, messages: List[LLMMessage]) -> str:
-        """
-        Convert sequence of messages (system, user, assistant, user, assistant...user)
-        to a single prompt formatted according to the specific format type,
-        to be used in a /completions endpoint.
-
-        Args:
-            messages (List[LLMMessage]): chat history as a sequence of messages
-
-        Returns:
-            (str): formatted version of chat history
-
-        """
-        pass
langroid/language_models/prompt_formatter/hf_formatter.py
DELETED
@@ -1,132 +0,0 @@
-"""
-Prompt formatter based on HuggingFace `AutoTokenizer.apply_chat_template` method
-from their Transformers library. It searches the hub for a model matching the
-specified name, and uses the first one it finds. We assume that all matching
-models will have the same tokenizer, so we just use the first one.
-"""
-
-import logging
-import re
-from typing import Any, List, Set, Tuple, Type
-
-from jinja2.exceptions import TemplateError
-
-from langroid.language_models.base import LanguageModel, LLMMessage, Role
-from langroid.language_models.config import HFPromptFormatterConfig
-from langroid.language_models.prompt_formatter.base import PromptFormatter
-
-logger = logging.getLogger(__name__)
-
-
-def try_import_hf_modules() -> Tuple[Type[Any], Type[Any]]:
-    """
-    Attempts to import the AutoTokenizer class from the transformers package.
-    Returns:
-        The AutoTokenizer class if successful.
-    Raises:
-        ImportError: If the transformers package is not installed.
-    """
-    try:
-        from huggingface_hub import HfApi
-        from transformers import AutoTokenizer
-
-        return AutoTokenizer, HfApi
-    except ImportError:
-        raise ImportError(
-            """
-            You are trying to use some/all of:
-            HuggingFace transformers.AutoTokenizer,
-            huggingface_hub.HfApi,
-            but these are not not installed
-            by default with Langroid. Please install langroid using the
-            `transformers` extra, like so:
-            pip install "langroid[transformers]"
-            or equivalent.
-            """
-        )
-
-
-def find_hf_formatter(model_name: str) -> str:
-    AutoTokenizer, HfApi = try_import_hf_modules()
-    hf_api = HfApi()
-    # try to find a matching model, with progressivly shorter prefixes of model_name
-    model_name = model_name.lower().split("/")[-1]
-    parts = re.split("[:\\-_]", model_name)
-    parts = [p.lower() for p in parts if p != ""]
-    for i in range(len(parts), 0, -1):
-        prefix = "-".join(parts[:i])
-        models = hf_api.list_models(
-            task="text-generation",
-            model_name=prefix,
-        )
-        try:
-            mdl = next(models)
-            tokenizer = AutoTokenizer.from_pretrained(mdl.id)
-            if tokenizer.chat_template is not None:
-                return str(mdl.id)
-            else:
-                continue
-        except Exception:
-            continue
-
-    return ""
-
-
-class HFFormatter(PromptFormatter):
-    models: Set[str] = set()  # which models have been used for formatting
-
-    def __init__(self, config: HFPromptFormatterConfig):
-        super().__init__(config)
-        AutoTokenizer, HfApi = try_import_hf_modules()
-        self.config: HFPromptFormatterConfig = config
-        hf_api = HfApi()
-        models = hf_api.list_models(
-            task="text-generation",
-            model_name=config.model_name,
-        )
-        try:
-            mdl = next(models)
-        except StopIteration:
-            raise ValueError(f"Model {config.model_name} not found on HuggingFace Hub")
-
-        self.tokenizer = AutoTokenizer.from_pretrained(mdl.id)
-        if self.tokenizer.chat_template is None:
-            raise ValueError(
-                f"Model {config.model_name} does not support chat template"
-            )
-        elif mdl.id not in HFFormatter.models:
-            # only warn if this is the first time we've used this mdl.id
-            logger.warning(
-                f"""
-            Using HuggingFace {mdl.id} for prompt formatting:
-            This is the CHAT TEMPLATE. If this is not what you intended,
-            consider specifying a more complete model name for the formatter.
-
-            {self.tokenizer.chat_template}
-            """
-            )
-        HFFormatter.models.add(mdl.id)
-
-    def format(self, messages: List[LLMMessage]) -> str:
-        sys_msg, chat_msgs, user_msg = LanguageModel.get_chat_history_components(
-            messages
-        )
-        # build msg dicts expected by AutoTokenizer.apply_chat_template
-        sys_msg_dict = dict(role=Role.SYSTEM.value, content=sys_msg)
-        chat_dicts = []
-        for user, assistant in chat_msgs:
-            chat_dicts.append(dict(role=Role.USER.value, content=user))
-            chat_dicts.append(dict(role=Role.ASSISTANT.value, content=assistant))
-        chat_dicts.append(dict(role=Role.USER.value, content=user_msg))
-        all_dicts = [sys_msg_dict] + chat_dicts
-        try:
-            # apply chat template
-            result = self.tokenizer.apply_chat_template(all_dicts, tokenize=False)
-        except TemplateError:
-            # this likely means the model doesn't support a system msg,
-            # so combine it with the first user msg
-            first_user_msg = chat_msgs[0][0] if len(chat_msgs) > 0 else user_msg
-            first_user_msg = sys_msg + "\n\n" + first_user_msg
-            chat_dicts[0] = dict(role=Role.USER.value, content=first_user_msg)
-            result = self.tokenizer.apply_chat_template(chat_dicts, tokenize=False)
-        return str(result)
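Reviewer note: the removed `find_hf_formatter` searches the hub with progressively shorter prefixes of the model name. The sketch below isolates only that prefix-generation step so the search order is easy to see. `candidate_prefixes` is a hypothetical helper name introduced here for illustration (it is not part of langroid), and the hub lookup itself is deliberately left out.

```python
import re
from typing import List


def candidate_prefixes(model_name: str) -> List[str]:
    """Return search prefixes, longest first (hypothetical helper, not langroid API)."""
    # Keep only the part after the last "/" and lowercase it,
    # as the removed function does.
    name = model_name.lower().split("/")[-1]
    # Split on ":", "-" and "_" and drop empty pieces.
    parts = [p for p in re.split(r"[:\-_]", name) if p != ""]
    # Join the first i parts with "-", for i = len(parts) down to 1.
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


if __name__ == "__main__":
    for prefix in candidate_prefixes("TheBloke/Mistral-7B_Instruct:v0.2"):
        print(prefix)
```

In the removed code each prefix is passed to `HfApi.list_models`, and the first hit whose tokenizer has a chat template wins, so a short prefix can match a different model than the one intended.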
langroid/language_models/prompt_formatter/llama2_formatter.py
DELETED
@@ -1,75 +0,0 @@
-import logging
-from typing import List, Tuple
-
-from langroid.language_models.base import LanguageModel, LLMMessage
-from langroid.language_models.config import Llama2FormatterConfig
-from langroid.language_models.prompt_formatter.base import PromptFormatter
-
-logger = logging.getLogger(__name__)
-
-
-BOS: str = "<s>"
-EOS: str = "</s>"
-B_INST: str = "[INST]"
-E_INST: str = "[/INST]"
-B_SYS: str = "<<SYS>>\n"
-E_SYS: str = "\n<</SYS>>\n\n"
-SPECIAL_TAGS: List[str] = [B_INST, E_INST, BOS, EOS, "<<SYS>>", "<</SYS>>"]
-
-
-class Llama2Formatter(PromptFormatter):
-    def __int__(self, config: Llama2FormatterConfig) -> None:
-        super().__init__(config)
-        self.config: Llama2FormatterConfig = config
-
-    def format(self, messages: List[LLMMessage]) -> str:
-        sys_msg, chat_msgs, user_msg = LanguageModel.get_chat_history_components(
-            messages
-        )
-        return self._get_prompt_from_components(sys_msg, chat_msgs, user_msg)
-
-    def _get_prompt_from_components(
-        self,
-        system_prompt: str,
-        chat_history: List[Tuple[str, str]],
-        user_message: str,
-    ) -> str:
-        """
-        For llama2 models, convert chat history into a single
-        prompt for Llama2 models, for use in the /completions endpoint
-        (as opposed to the /chat/completions endpoint).
-        See:
-        https://www.reddit.com/r/LocalLLaMA/comments/155po2p/get_llama_2_prompt_format_right/
-        https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L44
-
-        Args:
-            system_prompt (str): system prompt, typically specifying role/task.
-            chat_history (List[Tuple[str,str]]): List of (user, assistant) pairs
-            user_message (str): user message, at the end of the chat, i.e. the message
-                for which we want to generate a response.
-
-        Returns:
-            str: Prompt for Llama2 models
-
-        Typical structure of the formatted prompt:
-        Note important that the first [INST], [/INST] surrounds the system prompt,
-        together with the first user message. A lot of libs seem to miss this detail.
-
-        <s>[INST] <<SYS>>
-        You are are a helpful... bla bla.. assistant
-        <</SYS>>
-
-        Hi there! [/INST] Hello! How can I help you today? </s><s>[INST]
-        What is a neutron star? [/INST] A neutron star is a ... </s><s>
-        [INST] Okay cool, thank you! [/INST] You're welcome! </s><s>
-        [INST] Ah, I have one more question.. [/INST]
-        """
-        bos = BOS if self.config.use_bos_eos else ""
-        eos = EOS if self.config.use_bos_eos else ""
-        text = f"{bos}{B_INST} {B_SYS}{system_prompt}{E_SYS}"
-        for user_input, response in chat_history:
-            text += (
-                f"{user_input.strip()} {E_INST} {response.strip()} {eos}{bos} {B_INST} "
-            )
-        text += f"{user_message.strip()} {E_INST}"
-        return text
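Reviewer note: the prompt layout described in the removed `_get_prompt_from_components` docstring can be exercised without langroid. The following is a standalone sketch that mirrors the string assembly shown in the hunk above; `llama2_prompt` and its `use_bos_eos` parameter are names chosen for this illustration (the removed code reads the flag from its config object instead).

```python
from typing import List, Tuple

# Tag constants copied from the removed module.
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def llama2_prompt(
    system_prompt: str,
    chat_history: List[Tuple[str, str]],
    user_message: str,
    use_bos_eos: bool = True,
) -> str:
    """Assemble a Llama-2 style prompt (illustrative, not the langroid API)."""
    bos = BOS if use_bos_eos else ""
    eos = EOS if use_bos_eos else ""
    # The system prompt sits inside the first [INST] block,
    # together with the first user message.
    text = f"{bos}{B_INST} {B_SYS}{system_prompt}{E_SYS}"
    for user_input, response in chat_history:
        # Close the current turn and open the next [INST] block.
        text += f"{user_input.strip()} {E_INST} {response.strip()} {eos}{bos} {B_INST} "
    # The final user message is left open for the model to answer.
    text += f"{user_message.strip()} {E_INST}"
    return text


if __name__ == "__main__":
    print(llama2_prompt("Be brief.", [("Hi", "Hello!")], "Bye"))
```

With an empty history the result is a single `[INST]` block containing both the system prompt and the user message, which is the detail the docstring says many libraries miss.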
langroid/language_models/utils.py
DELETED
@@ -1,147 +0,0 @@
-# from openai-cookbook
-import asyncio
-import logging
-import random
-import time
-from typing import Any, Callable, Dict, List
-
-import aiohttp
-import openai
-import requests
-
-logger = logging.getLogger(__name__)
-# setlevel to warning
-logger.setLevel(logging.WARNING)
-
-
-# define a retry decorator
-def retry_with_exponential_backoff(
-    func: Callable[..., Any],
-    initial_delay: float = 1,
-    exponential_base: float = 1.3,
-    jitter: bool = True,
-    max_retries: int = 5,
-    errors: tuple = (  # type: ignore
-        requests.exceptions.RequestException,
-        openai.APITimeoutError,
-        openai.RateLimitError,
-        openai.AuthenticationError,
-        openai.APIError,
-        aiohttp.ServerTimeoutError,
-        asyncio.TimeoutError,
-    ),
-) -> Callable[..., Any]:
-    """Retry a function with exponential backoff."""
-
-    def wrapper(*args: List[Any], **kwargs: Dict[Any, Any]) -> Any:
-        # Initialize variables
-        num_retries = 0
-        delay = initial_delay
-
-        # Loop until a successful response or max_retries is hit or exception is raised
-        while True:
-            try:
-                return func(*args, **kwargs)
-
-            except openai.BadRequestError as e:
-                # do not retry when the request itself is invalid,
-                # e.g. when context is too long
-                logger.error(f"OpenAI API request failed with error: {e}.")
-                raise e
-            except openai.AuthenticationError as e:
-                # do not retry when there's an auth error
-                logger.error(f"OpenAI API request failed with error: {e}.")
-                raise e
-
-            # Retry on specified errors
-            except errors as e:
-                # Increment retries
-                num_retries += 1
-
-                # Check if max retries has been reached
-                if num_retries > max_retries:
-                    raise Exception(
-                        f"Maximum number of retries ({max_retries}) exceeded."
-                        f" Last error: {str(e)}."
-                    )
-
-                # Increment the delay
-                delay *= exponential_base * (1 + jitter * random.random())
-                logger.warning(
-                    f"""OpenAI API request failed with error:
-                    {e}.
-                    Retrying in {delay} seconds..."""
-                )
-                # Sleep for the delay
-                time.sleep(delay)
-
-            # Raise exceptions for any errors not specified
-            except Exception as e:
-                raise e
-
-    return wrapper
-
-
-def async_retry_with_exponential_backoff(
-    func: Callable[..., Any],
-    initial_delay: float = 1,
-    exponential_base: float = 1.3,
-    jitter: bool = True,
-    max_retries: int = 5,
-    errors: tuple = (  # type: ignore
-        openai.APITimeoutError,
-        openai.RateLimitError,
-        openai.AuthenticationError,
-        openai.APIError,
-        aiohttp.ServerTimeoutError,
-        asyncio.TimeoutError,
-    ),
-) -> Callable[..., Any]:
-    """Retry a function with exponential backoff."""
-
-    async def wrapper(*args: List[Any], **kwargs: Dict[Any, Any]) -> Any:
-        # Initialize variables
-        num_retries = 0
-        delay = initial_delay
-
-        # Loop until a successful response or max_retries is hit or exception is raised
-        while True:
-            try:
-                result = await func(*args, **kwargs)
-                return result
-
-            except openai.BadRequestError as e:
-                # do not retry when the request itself is invalid,
-                # e.g. when context is too long
-                logger.error(f"OpenAI API request failed with error: {e}.")
-                raise e
-            except openai.AuthenticationError as e:
-                # do not retry when there's an auth error
-                logger.error(f"OpenAI API request failed with error: {e}.")
-                raise e
-            # Retry on specified errors
-            except errors as e:
-                # Increment retries
-                num_retries += 1
-
-                # Check if max retries has been reached
-                if num_retries > max_retries:
-                    raise Exception(
-                        f"Maximum number of retries ({max_retries}) exceeded."
-                        f" Last error: {str(e)}."
-                    )
-
-                # Increment the delay
-                delay *= exponential_base * (1 + jitter * random.random())
-                logger.warning(
-                    f"""OpenAI API request failed with error{e}.
-                    Retrying in {delay} seconds..."""
-                )
-                # Sleep for the delay
-                time.sleep(delay)
-
-            # Raise exceptions for any errors not specified
-            except Exception as e:
-                raise e
-
-    return wrapper
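Reviewer note: the removed async wrapper calls `time.sleep(delay)` inside a coroutine, which blocks the event loop for the whole backoff period. A minimal stdlib-only sketch of the same backoff arithmetic using `asyncio.sleep` is shown below. `async_retry` is a hypothetical name used for illustration, the default `errors` tuple is narrowed to `asyncio.TimeoutError` so the example has no third-party imports, and nothing here is claimed to be what later langroid releases ship.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Tuple, Type


def async_retry(
    func: Callable[..., Awaitable[Any]],
    initial_delay: float = 1.0,
    exponential_base: float = 1.3,
    jitter: bool = True,
    max_retries: int = 5,
    errors: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError,),
) -> Callable[..., Awaitable[Any]]:
    """Wrap an async callable with exponential backoff (illustrative sketch)."""

    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        num_retries = 0
        delay = initial_delay
        while True:
            try:
                return await func(*args, **kwargs)
            except errors as e:
                num_retries += 1
                if num_retries > max_retries:
                    raise RuntimeError(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    ) from e
                # Same growth rule as the removed code: base times optional jitter.
                delay *= exponential_base * (1 + jitter * random.random())
                # Yield to the event loop instead of blocking it.
                await asyncio.sleep(delay)

    return wrapper
```

Exceptions outside `errors` propagate unchanged, which matches the effect of the final `except Exception: raise e` clause in the removed code.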
langroid/mytypes.py
DELETED
@@ -1,84 +0,0 @@
-from enum import Enum
-from textwrap import dedent
-from typing import Any, Callable, Dict, List, Union
-from uuid import uuid4
-
-from langroid.pydantic_v1 import BaseModel, Extra, Field
-
-Number = Union[int, float]
-Embedding = List[Number]
-Embeddings = List[Embedding]
-EmbeddingFunction = Callable[[List[str]], Embeddings]
-
-
-class Entity(str, Enum):
-    """
-    Enum for the different types of entities that can respond to the current message.
-    """
-
-    AGENT = "Agent"
-    LLM = "LLM"
-    USER = "User"
-    SYSTEM = "System"
-
-    def __eq__(self, other: object) -> bool:
-        """Allow case-insensitive equality (==) comparison with strings."""
-        if other is None:
-            return False
-        if isinstance(other, str):
-            return self.value.lower() == other.lower()
-        return super().__eq__(other)
-
-    def __ne__(self, other: object) -> bool:
-        """Allow case-insensitive non-equality (!=) comparison with strings."""
-        return not self.__eq__(other)
-
-    def __hash__(self) -> int:
-        """Override this to ensure hashability of the enum,
-        so it can be used sets and dictionary keys.
-        """
-        return hash(self.value.lower())
-
-
-class DocMetaData(BaseModel):
-    """Metadata for a document."""
-
-    source: str = "context"
-    is_chunk: bool = False  # if it is a chunk, don't split
-    id: str = Field(default_factory=lambda: str(uuid4()))
-    window_ids: List[str] = []  # for RAG: ids of chunks around this one
-
-    def dict_bool_int(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
-        """
-        Special dict method to convert bool fields to int, to appease some
-        downstream libraries, e.g. Chroma which complains about bool fields in
-        metadata.
-        """
-        original_dict = super().dict(*args, **kwargs)
-
-        for key, value in original_dict.items():
-            if isinstance(value, bool):
-                original_dict[key] = 1 * value
-
-        return original_dict
-
-    class Config:
-        extra = Extra.allow
-
-
-class Document(BaseModel):
-    """Interface for interacting with a document."""
-
-    content: str
-    metadata: DocMetaData
-
-    def id(self) -> str:
-        return self.metadata.id
-
-    def __str__(self) -> str:
-        return dedent(
-            f"""
-        CONTENT: {self.content}
-        SOURCE:{self.metadata.source}
-        """
-        )
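Reviewer note: the removed `Entity` enum overrides `__eq__`, `__ne__` and `__hash__` together. The reduced sketch below, with two members only and no langroid imports, shows why all three are needed: a class that defines `__eq__` loses its inherited `__hash__`, and because the enum also subclasses `str`, the inherited `str.__ne__` would otherwise stay case-sensitive.

```python
from enum import Enum


class Entity(str, Enum):
    """Reduced copy of the removed enum, for illustration only."""

    LLM = "LLM"
    USER = "User"

    def __eq__(self, other: object) -> bool:
        # Case-insensitive comparison against plain strings.
        if isinstance(other, str):
            return self.value.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other: object) -> bool:
        # Without this, != would fall back to str's case-sensitive comparison.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        # Defining __eq__ removes the inherited hash, so restore one
        # that agrees with the case-insensitive equality.
        return hash(self.value.lower())


if __name__ == "__main__":
    print(Entity.LLM == "llm")
    print(Entity.USER != "USER")
    print({Entity.LLM: "language model"}[Entity.LLM])
```

One caveat of this design: the hash is computed from the lowercased value, so only a lowercase string such as `"llm"` is guaranteed to land in the same hash bucket as `Entity.LLM` when used as a dictionary key.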
langroid/parsing/__init__.py
DELETED
@@ -1,52 +0,0 @@
-from . import parser
-from . import agent_chats
-from . import code_parser
-from . import document_parser
-from . import parse_json
-from . import para_sentence_split
-from . import repo_loader
-from . import url_loader
-from . import table_loader
-from . import urls
-from . import utils
-from . import search
-from . import web_search
-
-from .parser import (
-    Splitter,
-    PdfParsingConfig,
-    DocxParsingConfig,
-    DocParsingConfig,
-    ParsingConfig,
-    Parser,
-)
-
-__all__ = [
-    "parser",
-    "agent_chats",
-    "code_parser",
-    "document_parser",
-    "parse_json",
-    "para_sentence_split",
-    "repo_loader",
-    "url_loader",
-    "table_loader",
-    "urls",
-    "utils",
-    "search",
-    "web_search",
-    "Splitter",
-    "PdfParsingConfig",
-    "DocxParsingConfig",
-    "DocParsingConfig",
-    "ParsingConfig",
-    "Parser",
-]
-
-try:
-    from . import spider
-
-    spider
-    __all__.append("spider")
-except ImportError:
-    pass
langroid/parsing/agent_chats.py
DELETED
@@ -1,38 +0,0 @@
-from typing import Tuple, no_type_check
-
-from pyparsing import Empty, Literal, ParseException, SkipTo, StringEnd, Word, alphanums
-
-
-@no_type_check
-def parse_message(msg: str) -> Tuple[str, str]:
-    """
-    Parse the intended recipient and content of a message.
-    Message format is assumed to be TO[<recipient>]:<message>.
-    The TO[<recipient>]: part is optional.
-
-    Args:
-        msg (str): message to parse
-
-    Returns:
-        str, str: task-name of intended recipient, and content of message
-            (if recipient is not specified, task-name is empty string)
-
-    """
-    if msg is None:
-        return "", ""
-
-    # Grammar definition
-    name = Word(alphanums)
-    to_start = Literal("TO[").suppress()
-    to_end = Literal("]:").suppress()
-    to_field = (to_start + name("name") + to_end) | Empty().suppress()
-    message = SkipTo(StringEnd())("text")
-
-    # Parser definition
-    parser = to_field + message
-
-    try:
-        parsed = parser.parseString(msg)
-        return parsed.name, parsed.text
-    except ParseException:
-        return "", msg
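Reviewer note: the `TO[<recipient>]:<message>` format handled by the removed `parse_message` can be approximated with the standard library alone. The sketch below swaps the pyparsing grammar for a regular expression. It is an approximation under stated assumptions, not a drop-in replacement: whitespace handling around the prefix and inside the message may differ from pyparsing's defaults.

```python
import re
from typing import Optional, Tuple

# Optional leading whitespace, then TO[<alphanumeric name>]: and the rest.
_TO_PATTERN = re.compile(r"\s*TO\[([A-Za-z0-9]+)\]:(.*)", re.DOTALL)


def parse_message(msg: Optional[str]) -> Tuple[str, str]:
    """Return (recipient, content); recipient is "" when no TO[...] prefix is present."""
    if msg is None:
        return "", ""
    match = _TO_PATTERN.fullmatch(msg)
    if match is None:
        # No well-formed prefix: the whole message is the content.
        return "", msg
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_message("TO[Alice]:hello"))
    print(parse_message("hello"))
```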
langroid/parsing/code-parsing.md
DELETED
@@ -1,86 +0,0 @@
-To split Python code files into meaningful chunks, you can use the `tree-sitter` library, which is a parser generator tool and an incremental parsing library. It can be used to parse source code into an abstract syntax tree (AST) and extract meaningful code blocks from it. Here's how you can use `tree-sitter` to achieve this:
-
-1. Install the `tree-sitter` Python package:
-```python
-pip install tree-sitter
-```
-
-2. Install the `tree-sitter-python` language grammar:
-```bash
-git clone https://github.com/tree-sitter/tree-sitter-python
-```
-
-3. Use `tree-sitter` to parse Python code files and extract meaningful code
-blocks:
-
-```python
-from tree_sitter import Language, Parser
-
-# Set the path to the tree-sitter-python language grammar
-TREE_SITTER_PYTHON_PATH = './tree-sitter-python'
-
-# Build the Python language
-Language.build_library(
-  'build/my-languages.so',
-  [TREE_SITTER_PYTHON_PATH]
-)
-
-PYTHON_LANGUAGE = Language('build/my-languages.so', 'python')
-
-# Create a parser
-parser = Parser()
-parser.set_language(PYTHON_LANGUAGE)
-
-# Parse the code
-code = """
-def foo():
-    return "Hello, World!"
-
-def bar():
-    return "Goodbye, World!"
-"""
-
-tree = parser.parse(bytes(code, 'utf8'))
-
-# Extract meaningful code blocks (e.g., function definitions)
-def extract_functions(node):
-    functions = []
-    for child in node.children:
-        if child.type == 'function_definition':
-            start_byte = child.start_byte
-            end_byte = child.end_byte
-            functions.append(code[start_byte:end_byte])
-        functions.extend(extract_functions(child))
-    return functions
-
-functions = extract_functions(tree.root_node)
-print(functions)
-```
-
-In the example provided, the `tree-sitter` library is used to parse the Python
-code into an abstract syntax tree (AST). The `extract_functions` function is
-then used to recursively traverse the AST and extract code blocks corresponding
-to function definitions. The extracted code blocks are stored in the `functions`
-list.
-
-The `extract_functions` function takes an AST node as input and returns a list
-of code blocks corresponding to function definitions. It checks whether the
-current node is of type `'function_definition'` (which corresponds to a function
-definition in Python code). If it is, the function extracts the corresponding
-code block from the original code using the `start_byte` and `end_byte`
-attributes of the node. The function then recursively processes the children of
-the current node to extract any nested function definitions.
-
-The resulting list `functions` contains the extracted code blocks, each
-representing a function definition from the original code. You can modify
-the `extract_functions` function to extract other types of code blocks (e.g.,
-class definitions, loops) by checking for different node types in the AST.
-
-Once you have extracted the code blocks, you can proceed with further
-processing, such as converting them into vectors and storing them in a vector
-database, as mentioned in the previous response.
-
-Note: The code provided in this response is a basic example to demonstrate the
-concept. Depending on your specific use case and requirements, you may need to
-extend or modify the code to handle more complex scenarios, such as handling
-comments, docstrings, and other code constructs.
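Reviewer note: the removed `code-parsing.md` walks through extracting function definitions with `tree-sitter`, which requires a compiled grammar. For Python sources specifically, the same chunking idea can be sketched with the standard-library `ast` module instead. This is a swapped-in technique for illustration, not what the removed document describes, and it only handles Python input.

```python
import ast
from typing import List


def extract_functions(source: str) -> List[str]:
    """Return the source text of every function definition, nested ones included."""
    tree = ast.parse(source)
    chunks: List[str] = []
    # ast.walk visits every node in the tree, so nested functions are found too.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment slices the original text using the node's
            # recorded start and end positions.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


code = '''
def foo():
    return "Hello, World!"

def bar():
    return "Goodbye, World!"
'''

functions = extract_functions(code)
print(functions)
```

Checking for `ast.ClassDef` in the same loop would extract class definitions as well, which corresponds to the "other node types" extension the removed text suggests.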