PyPI - vanna - Versions diffs - 0.7.4__tar.gz → 0.7.6__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (76)

{vanna-0.7.4 → vanna-0.7.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.3
 Name: vanna
-Version: 0.7.4
+Version: 0.7.6
 Summary: Generate SQL queries from natural language
 Author-email: Zain Hoda <zain@vanna.ai>
 Requires-Python: >=3.9
@@ -52,6 +52,9 @@ Requires-Dist: boto3 ; extra == "all"
 Requires-Dist: botocore ; extra == "all"
 Requires-Dist: langchain_core ; extra == "all"
 Requires-Dist: langchain_postgres ; extra == "all"
+Requires-Dist: langchain-community ; extra == "all"
+Requires-Dist: langchain-huggingface ; extra == "all"
+Requires-Dist: xinference-client ; extra == "all"
 Requires-Dist: anthropic ; extra == "anthropic"
 Requires-Dist: azure-search-documents ; extra == "azuresearch"
 Requires-Dist: azure-identity ; extra == "azuresearch"
@@ -78,6 +81,10 @@ Requires-Dist: httpx ; extra == "ollama"
 Requires-Dist: openai ; extra == "openai"
 Requires-Dist: opensearch-py ; extra == "opensearch"
 Requires-Dist: opensearch-dsl ; extra == "opensearch"
+Requires-Dist: langchain-community ; extra == "opensearch"
+Requires-Dist: langchain-huggingface ; extra == "opensearch"
+Requires-Dist: oracledb ; extra == "oracle"
+Requires-Dist: chromadb ; extra == "oracle"
 Requires-Dist: langchain-postgres>=0.0.12 ; extra == "pgvector"
 Requires-Dist: pinecone-client ; extra == "pinecone"
 Requires-Dist: fastembed ; extra == "pinecone"
@@ -90,6 +97,7 @@ Requires-Dist: snowflake-connector-python ; extra == "snowflake"
 Requires-Dist: tox ; extra == "test"
 Requires-Dist: vllm ; extra == "vllm"
 Requires-Dist: weaviate-client ; extra == "weaviate"
+Requires-Dist: xinference-client ; extra == "xinference-client"
 Requires-Dist: zhipuai ; extra == "zhipuai"
 Project-URL: Bug Tracker, https://github.com/vanna-ai/vanna/issues
 Project-URL: Homepage, https://github.com/vanna-ai/vanna
@@ -113,6 +121,7 @@ Provides-Extra: mysql
 Provides-Extra: ollama
 Provides-Extra: openai
 Provides-Extra: opensearch
+Provides-Extra: oracle
 Provides-Extra: pgvector
 Provides-Extra: pinecone
 Provides-Extra: postgres
@@ -122,13 +131,14 @@ Provides-Extra: snowflake
 Provides-Extra: test
 Provides-Extra: vllm
 Provides-Extra: weaviate
+Provides-Extra: xinference-client
 Provides-Extra: zhipuai
-| GitHub | PyPI | Documentation |
-| ------ | ---- | ------------- |
-| [![GitHub](https://img.shields.io/badge/GitHub-vanna-blue?logo=github)](https://github.com/vanna-ai/vanna) | [![PyPI](https://img.shields.io/pypi/v/vanna?logo=pypi)](https://pypi.org/project/vanna/) | [![Documentation](https://img.shields.io/badge/Documentation-vanna-blue?logo=read-the-docs)](https://vanna.ai/docs/) |
+| GitHub | PyPI | Documentation | Gurubase |
+| ------ | ---- | ------------- | -------- |
+| [![GitHub](https://img.shields.io/badge/GitHub-vanna-blue?logo=github)](https://github.com/vanna-ai/vanna) | [![PyPI](https://img.shields.io/pypi/v/vanna?logo=pypi)](https://pypi.org/project/vanna/) | [![Documentation](https://img.shields.io/badge/Documentation-vanna-blue?logo=read-the-docs)](https://vanna.ai/docs/) | [![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20Vanna%20Guru-006BFF)](https://gurubase.io/g/vanna) |
 # Vanna
 Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
@@ -161,6 +171,46 @@ These are some of the user interfaces that we've built using Vanna. You can use
 - [vanna-ai/vanna-flask](https://github.com/vanna-ai/vanna-flask)
 - [vanna-ai/vanna-slack](https://github.com/vanna-ai/vanna-slack)
+## Supported LLMs
+- [OpenAI](https://github.com/vanna-ai/vanna/tree/main/src/vanna/openai)
+- [Anthropic](https://github.com/vanna-ai/vanna/tree/main/src/vanna/anthropic)
+- [Gemini](https://github.com/vanna-ai/vanna/blob/main/src/vanna/google/gemini_chat.py)
+- [HuggingFace](https://github.com/vanna-ai/vanna/blob/main/src/vanna/hf/hf.py)
+- [AWS Bedrock](https://github.com/vanna-ai/vanna/tree/main/src/vanna/bedrock)
+- [Ollama](https://github.com/vanna-ai/vanna/tree/main/src/vanna/ollama)
+- [Qianwen](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qianwen)
+- [Qianfan](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qianfan)
+- [Zhipu](https://github.com/vanna-ai/vanna/tree/main/src/vanna/ZhipuAI)
+## Supported VectorStores
+- [AzureSearch](https://github.com/vanna-ai/vanna/tree/main/src/vanna/azuresearch)
+- [Opensearch](https://github.com/vanna-ai/vanna/tree/main/src/vanna/opensearch)
+- [PgVector](https://github.com/vanna-ai/vanna/tree/main/src/vanna/pgvector)
+- [PineCone](https://github.com/vanna-ai/vanna/tree/main/src/vanna/pinecone)
+- [ChromaDB](https://github.com/vanna-ai/vanna/tree/main/src/vanna/chromadb)
+- [FAISS](https://github.com/vanna-ai/vanna/tree/main/src/vanna/faiss)
+- [Marqo](https://github.com/vanna-ai/vanna/tree/main/src/vanna/marqo)
+- [Milvus](https://github.com/vanna-ai/vanna/tree/main/src/vanna/milvus)
+- [Qdrant](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qdrant)
+- [Weaviate](https://github.com/vanna-ai/vanna/tree/main/src/vanna/weaviate)
+- [Oracle](https://github.com/vanna-ai/vanna/tree/main/src/vanna/oracle)
+## Supported Databases
+- [PostgreSQL](https://www.postgresql.org/)
+- [MySQL](https://www.mysql.com/)
+- [PrestoDB](https://prestodb.io/)
+- [Apache Hive](https://hive.apache.org/)
+- [ClickHouse](https://clickhouse.com/)
+- [Snowflake](https://www.snowflake.com/en/)
+- [Oracle](https://www.oracle.com/)
+- [Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server/sql-server-downloads)
+- [BigQuery](https://cloud.google.com/bigquery)
+- [SQLite](https://www.sqlite.org/)
+- [DuckDB](https://duckdb.org/)
 ## Getting started
 See the [documentation](https://vanna.ai/docs/) for specifics on your desired database, LLM, etc.

{vanna-0.7.4 → vanna-0.7.6}/README.md RENAMED

@@ -1,8 +1,8 @@
-| GitHub | PyPI | Documentation |
-| ------ | ---- | ------------- |
-| [![GitHub](https://img.shields.io/badge/GitHub-vanna-blue?logo=github)](https://github.com/vanna-ai/vanna) | [![PyPI](https://img.shields.io/pypi/v/vanna?logo=pypi)](https://pypi.org/project/vanna/) | [![Documentation](https://img.shields.io/badge/Documentation-vanna-blue?logo=read-the-docs)](https://vanna.ai/docs/) |
+| GitHub | PyPI | Documentation | Gurubase |
+| ------ | ---- | ------------- | -------- |
+| [![GitHub](https://img.shields.io/badge/GitHub-vanna-blue?logo=github)](https://github.com/vanna-ai/vanna) | [![PyPI](https://img.shields.io/pypi/v/vanna?logo=pypi)](https://pypi.org/project/vanna/) | [![Documentation](https://img.shields.io/badge/Documentation-vanna-blue?logo=read-the-docs)](https://vanna.ai/docs/) | [![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20Vanna%20Guru-006BFF)](https://gurubase.io/g/vanna) |
 # Vanna
 Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
@@ -35,6 +35,46 @@ These are some of the user interfaces that we've built using Vanna. You can use
 - [vanna-ai/vanna-flask](https://github.com/vanna-ai/vanna-flask)
 - [vanna-ai/vanna-slack](https://github.com/vanna-ai/vanna-slack)
+## Supported LLMs
+- [OpenAI](https://github.com/vanna-ai/vanna/tree/main/src/vanna/openai)
+- [Anthropic](https://github.com/vanna-ai/vanna/tree/main/src/vanna/anthropic)
+- [Gemini](https://github.com/vanna-ai/vanna/blob/main/src/vanna/google/gemini_chat.py)
+- [HuggingFace](https://github.com/vanna-ai/vanna/blob/main/src/vanna/hf/hf.py)
+- [AWS Bedrock](https://github.com/vanna-ai/vanna/tree/main/src/vanna/bedrock)
+- [Ollama](https://github.com/vanna-ai/vanna/tree/main/src/vanna/ollama)
+- [Qianwen](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qianwen)
+- [Qianfan](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qianfan)
+- [Zhipu](https://github.com/vanna-ai/vanna/tree/main/src/vanna/ZhipuAI)
+## Supported VectorStores
+- [AzureSearch](https://github.com/vanna-ai/vanna/tree/main/src/vanna/azuresearch)
+- [Opensearch](https://github.com/vanna-ai/vanna/tree/main/src/vanna/opensearch)
+- [PgVector](https://github.com/vanna-ai/vanna/tree/main/src/vanna/pgvector)
+- [PineCone](https://github.com/vanna-ai/vanna/tree/main/src/vanna/pinecone)
+- [ChromaDB](https://github.com/vanna-ai/vanna/tree/main/src/vanna/chromadb)
+- [FAISS](https://github.com/vanna-ai/vanna/tree/main/src/vanna/faiss)
+- [Marqo](https://github.com/vanna-ai/vanna/tree/main/src/vanna/marqo)
+- [Milvus](https://github.com/vanna-ai/vanna/tree/main/src/vanna/milvus)
+- [Qdrant](https://github.com/vanna-ai/vanna/tree/main/src/vanna/qdrant)
+- [Weaviate](https://github.com/vanna-ai/vanna/tree/main/src/vanna/weaviate)
+- [Oracle](https://github.com/vanna-ai/vanna/tree/main/src/vanna/oracle)
+## Supported Databases
+- [PostgreSQL](https://www.postgresql.org/)
+- [MySQL](https://www.mysql.com/)
+- [PrestoDB](https://prestodb.io/)
+- [Apache Hive](https://hive.apache.org/)
+- [ClickHouse](https://clickhouse.com/)
+- [Snowflake](https://www.snowflake.com/en/)
+- [Oracle](https://www.oracle.com/)
+- [Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server/sql-server-downloads)
+- [BigQuery](https://cloud.google.com/bigquery)
+- [SQLite](https://www.sqlite.org/)
+- [DuckDB](https://duckdb.org/)
 ## Getting started
 See the [documentation](https://vanna.ai/docs/) for specifics on your desired database, LLM, etc.

{vanna-0.7.4 → vanna-0.7.6}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "flit_core.buildapi"
 [project]
 name = "vanna"
-version = "0.7.4"
+version = "0.7.6"
 authors = [
   { name="Zain Hoda", email="zain@vanna.ai" },
 ]
@@ -33,7 +33,7 @@ bigquery = ["google-cloud-bigquery"]
 snowflake = ["snowflake-connector-python"]
 duckdb = ["duckdb"]
 google = ["google-generativeai", "google-cloud-aiplatform"]
-all = ["psycopg2-binary", "db-dtypes", "PyMySQL", "google-cloud-bigquery", "snowflake-connector-python", "duckdb", "openai", "qianfan", "mistralai>=1.0.0", "chromadb", "anthropic", "zhipuai", "marqo", "google-generativeai", "google-cloud-aiplatform", "qdrant-client", "fastembed", "ollama", "httpx", "opensearch-py", "opensearch-dsl", "transformers", "pinecone-client", "pymilvus[model]","weaviate-client", "azure-search-documents", "azure-identity", "azure-common", "faiss-cpu", "boto", "boto3", "botocore", "langchain_core", "langchain_postgres"]
+all = ["psycopg2-binary", "db-dtypes", "PyMySQL", "google-cloud-bigquery", "snowflake-connector-python", "duckdb", "openai", "qianfan", "mistralai>=1.0.0", "chromadb", "anthropic", "zhipuai", "marqo", "google-generativeai", "google-cloud-aiplatform", "qdrant-client", "fastembed", "ollama", "httpx", "opensearch-py", "opensearch-dsl", "transformers", "pinecone-client", "pymilvus[model]","weaviate-client", "azure-search-documents", "azure-identity", "azure-common", "faiss-cpu", "boto", "boto3", "botocore", "langchain_core", "langchain_postgres", "langchain-community", "langchain-huggingface", "xinference-client"]
 test = ["tox"]
 chromadb = ["chromadb"]
 openai = ["openai"]
@@ -47,7 +47,7 @@ ollama = ["ollama", "httpx"]
 qdrant = ["qdrant-client", "fastembed"]
 vllm = ["vllm"]
 pinecone = ["pinecone-client", "fastembed"]
-opensearch = ["opensearch-py", "opensearch-dsl"]
+opensearch = ["opensearch-py", "opensearch-dsl", "langchain-community", "langchain-huggingface"]
 hf = ["transformers"]
 milvus = ["pymilvus[model]"]
 bedrock = ["boto3", "botocore"]
@@ -56,3 +56,5 @@ azuresearch = ["azure-search-documents", "azure-identity", "azure-common", "fast
 pgvector = ["langchain-postgres>=0.0.12"]
 faiss-cpu = ["faiss-cpu"]
 faiss-gpu = ["faiss-gpu"]
+xinference-client = ["xinference-client"]
+oracle = ["oracledb", "chromadb"]

{vanna-0.7.4 → vanna-0.7.6}/src/vanna/base/base.py RENAMED

@@ -306,7 +306,7 @@ class VannaBase(ABC):
         message_log = [
             self.system_message(
-                f"You are a helpful data assistant. The user asked the question: '{question}'\n\nThe SQL query for this question was: {sql}\n\nThe following is a pandas DataFrame with the results of the query: \n{df.to_markdown()}\n\n"
+                f"You are a helpful data assistant. The user asked the question: '{question}'\n\nThe SQL query for this question was: {sql}\n\nThe following is a pandas DataFrame with the results of the query: \n{df.head(25).to_markdown()}\n\n"
             ),
             self.user_message(
                 f"Generate a list of {n_questions} followup questions that the user might ask about this data. Respond with a list of questions, one per line. Do not answer with any explanations -- just the questions. Remember that there should be an unambiguous SQL query that can be generated from the question. Prefer questions that are answerable outside of the context of this conversation. Prefer questions that are slight modifications of the SQL query that was generated that allow digging deeper into the data. Each question will be turned into a button that the user can click to generate a new SQL query so don't use 'example' type questions. Each question must have a one-to-one correspondence with an instantiated SQL query." +
@@ -689,6 +689,9 @@ class VannaBase(ABC):
         return response
     def _extract_python_code(self, markdown_string: str) -> str:
+        # Strip whitespace to avoid indentation errors in LLM-generated code
+        markdown_string = markdown_string.strip()
         # Regex pattern to match Python code blocks
         pattern = r"```[\w\s]*python\n([\s\S]*?)```|```([\s\S]*?)```"
@@ -1167,7 +1170,7 @@ class VannaBase(ABC):
         vn.connect_to_oracle(
         user="username",
         password="password",
-        dns="host:port/sid",
+        dsn="host:port/sid",
         )
         ```
         Args:
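
Note on the `_extract_python_code` hunk above: the sketch below illustrates why stripping the LLM reply first helps. It is not the library's implementation. Only the `strip()` call and the regex come from the diff; the fallback behaviour, the `re.IGNORECASE` flag, and the standalone function name `extract_python_code` are assumptions made for illustration. The fence marker is built from a constant so the example can sit inside a fenced block.

```python
import re

# Three backticks, built indirectly so this example can live inside a fence.
FENCE = "`" * 3


def extract_python_code(markdown_string: str) -> str:
    """Return the first fenced code block, or the stripped text if none exists."""
    # Strip whitespace to avoid indentation errors in LLM-generated code
    markdown_string = markdown_string.strip()

    # Same pattern as in the diff: a python-tagged fence, else any fence.
    pattern = (
        FENCE + r"[\w\s]*python\n([\s\S]*?)" + FENCE
        + "|" + FENCE + r"([\s\S]*?)" + FENCE
    )
    matches = re.findall(pattern, markdown_string, re.IGNORECASE)

    # Each match is a (python_block, generic_block) tuple; keep the non-empty one.
    blocks = [m[0] if m[0] else m[1] for m in matches]

    # Assumed fallback: with no fences, treat the whole reply as code.
    return blocks[0] if blocks else markdown_string


fenced_reply = "\n   " + FENCE + "python\nprint(1)\n" + FENCE + "\n"
bare_reply = "   x = 1\n"

print(repr(extract_python_code(fenced_reply)))
print(repr(extract_python_code(bare_reply)))
```

Without the leading `strip()`, the unfenced reply would keep its three leading spaces and raise an `IndentationError` when executed.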

vanna-0.7.6/src/vanna/deepseek/__init__.py ADDED

@@ -0,0 +1 @@
+from .deepseek_chat import DeepSeekChat

vanna-0.7.6/src/vanna/deepseek/deepseek_chat.py ADDED

@@ -0,0 +1,60 @@
+import os
+from openai import OpenAI
+from ..base import VannaBase
+# from vanna.chromadb import ChromaDB_VectorStore
+# class DeepSeekVanna(ChromaDB_VectorStore, DeepSeekChat):
+#     def __init__(self, config=None):
+#         ChromaDB_VectorStore.__init__(self, config=config)
+#         DeepSeekChat.__init__(self, config=config)
+# vn = DeepSeekVanna(config={"api_key": "sk-************", "model": "deepseek-chat"})
+class DeepSeekChat(VannaBase):
+    def __init__(self, config=None):
+        if config is None:
+            raise ValueError(
+                "For DeepSeek, config must be provided with an api_key and model"
+            )
+        if "api_key" not in config:
+            raise ValueError("config must contain a DeepSeek api_key")
+        if "model" not in config:
+            raise ValueError("config must contain a DeepSeek model")
+        api_key = config["api_key"]
+        model = config["model"]
+        self.model = model
+        self.client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com/v1")
+    def system_message(self, message: str) -> any:
+        return {"role": "system", "content": message}
+    def user_message(self, message: str) -> any:
+        return {"role": "user", "content": message}
+    def assistant_message(self, message: str) -> any:
+        return {"role": "assistant", "content": message}
+    def generate_sql(self, question: str, **kwargs) -> str:
+        # Use the parent class's generate_sql
+        sql = super().generate_sql(question, **kwargs)
+        # Replace "\_" with "_"
+        sql = sql.replace("\\_", "_")
+        return sql
+    def submit_prompt(self, prompt, **kwargs) -> str:
+        chat_response = self.client.chat.completions.create(
+            model=self.model,
+            messages=prompt,
+        )
+        return chat_response.choices[0].message.content
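
Note on `DeepSeekChat.generate_sql` above: the override post-processes the generated SQL by replacing backslash-escaped underscores, presumably because the model sometimes emits Markdown-style escapes in identifiers. A minimal, self-contained sketch of just that cleanup step follows. The helper name `unescape_underscores` and the sample query are hypothetical; only the `replace("\\_", "_")` call comes from the diff.

```python
def unescape_underscores(sql: str) -> str:
    """Turn Markdown-style escaped underscores back into plain ones."""
    # "\\_" in Python source is the two-character sequence backslash + underscore.
    return sql.replace("\\_", "_")


# Hypothetical model output with escaped identifiers.
raw_sql = "SELECT order\\_id FROM sales\\_orders"
clean_sql = unescape_underscores(raw_sql)
print(clean_sql)
```

The replacement is applied to the whole string, so a backslash-underscore that is intentional (for example inside a `LIKE` pattern with an escape clause) would also be rewritten.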

{vanna-0.7.4 → vanna-0.7.6}/src/vanna/google/gemini_chat.py RENAMED

@@ -1,4 +1,5 @@
 import os
 from ..base import VannaBase
@@ -30,8 +31,29 @@ class GoogleGeminiChat(VannaBase):
             self.chat_model = genai.GenerativeModel(model_name)
         else:
             # Authenticate using VertexAI
+            import google.auth
+            import vertexai
             from vertexai.generative_models import GenerativeModel
-            self.chat_model = GenerativeModel(model_name)
+            json_file_path = config.get("google_credentials")  # Assuming the JSON file path is provided in the config
+            if not json_file_path or not os.path.exists(json_file_path):
+                raise FileNotFoundError(f"JSON credentials file not found at: {json_file_path}")
+            try:
+                # Validate and set the JSON file path for GOOGLE_APPLICATION_CREDENTIALS
+                os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = json_file_path
+                # Initialize VertexAI with the credentials
+                credentials, _ = google.auth.default()
+                vertexai.init(credentials=credentials)
+                self.chat_model = GenerativeModel(model_name)
+            except google.auth.exceptions.DefaultCredentialsError as e:
+                raise RuntimeError(f"Default credentials error: {e}")
+            except google.auth.exceptions.TransportError as e:
+                raise RuntimeError(f"Transport error during authentication: {e}")
+            except Exception as e:
+                raise RuntimeError(f"Failed to authenticate using JSON file: {e}")
     def system_message(self, message: str) -> any:
         return message

{vanna-0.7.4 → vanna-0.7.6}/src/vanna/ollama/ollama.py RENAMED

@@ -91,7 +91,7 @@ class Ollama(VannaBase):
       f"model={self.model},\n"
       f"options={self.ollama_options},\n"
       f"keep_alive={self.keep_alive}")
-    self.log(f"Prompt Content:\n{json.dumps(prompt)}")
+    self.log(f"Prompt Content:\n{json.dumps(prompt, ensure_ascii=False)}")
     response_dict = self.ollama_client.chat(model=self.model,
                                             messages=prompt,
                                             stream=False,
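
Note on the `ensure_ascii=False` change above: by default `json.dumps` escapes every non-ASCII character as a `\uXXXX` sequence, which makes logged prompts in languages such as Chinese hard to read. The short example below shows the difference; the sample prompt is made up for illustration.

```python
import json

# Hypothetical prompt in the message format used elsewhere in the diff.
prompt = [{"role": "user", "content": "查询销售额"}]

escaped = json.dumps(prompt)                       # non-ASCII becomes \uXXXX
readable = json.dumps(prompt, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)
```

Both strings decode to the same data, so the change only affects how the log line reads.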

vanna-0.7.6/src/vanna/opensearch/__init__.py ADDED

@@ -0,0 +1,2 @@
+from .opensearch_vector import OpenSearch_VectorStore
+from .opensearch_vector_semantic import OpenSearch_Semantic_VectorStore

vanna-0.7.6/src/vanna/opensearch/opensearch_vector_semantic.py ADDED

@@ -0,0 +1,175 @@
+import json
+import pandas as pd
+from langchain_community.vectorstores import OpenSearchVectorSearch
+from ..base import VannaBase
+from ..utils import deterministic_uuid
+class OpenSearch_Semantic_VectorStore(VannaBase):
+  def __init__(self, config=None):
+    VannaBase.__init__(self, config=config)
+    if config is None:
+      config = {}
+    if "embedding_function" in config:
+      self.embedding_function = config.get("embedding_function")
+    else:
+      from langchain_huggingface import HuggingFaceEmbeddings
+      self.embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
+    self.n_results_sql = config.get("n_results_sql", config.get("n_results", 10))
+    self.n_results_documentation = config.get("n_results_documentation", config.get("n_results", 10))
+    self.n_results_ddl = config.get("n_results_ddl", config.get("n_results", 10))
+    self.document_index = config.get("es_document_index", "vanna_document_index")
+    self.ddl_index = config.get("es_ddl_index", "vanna_ddl_index")
+    self.question_sql_index = config.get("es_question_sql_index", "vanna_questions_sql_index")
+    self.log(f"OpenSearch_Semantic_VectorStore initialized with document_index: {self.document_index}, ddl_index: {self.ddl_index}, question_sql_index: {self.question_sql_index}")
+    es_urls = config.get("es_urls", "https://localhost:9200")
+    ssl = config.get("es_ssl", True)
+    verify_certs = config.get("es_verify_certs", True)
+    if "es_user" in config:
+      auth = (config["es_user"], config["es_password"])
+    else:
+      auth = None
+    headers = config.get("es_headers", None)
+    timeout = config.get("es_timeout", 60)
+    max_retries = config.get("es_max_retries", 10)
+    common_args = {
+        "opensearch_url": es_urls,
+        "embedding_function": self.embedding_function,
+        "engine": "faiss",
+        "http_auth": auth,
+        "use_ssl": ssl,
+        "verify_certs": verify_certs,
+        "timeout": timeout,
+        "max_retries": max_retries,
+        "retry_on_timeout": True,
+        "headers": headers,
+    }
+    self.documentation_store = OpenSearchVectorSearch(index_name=self.document_index, **common_args)
+    self.ddl_store = OpenSearchVectorSearch(index_name=self.ddl_index, **common_args)
+    self.sql_store = OpenSearchVectorSearch(index_name=self.question_sql_index, **common_args)
+  def add_ddl(self, ddl: str, **kwargs) -> str:
+    _id = deterministic_uuid(ddl) + "-ddl"
+    self.ddl_store.add_texts(texts=[ddl], ids=[_id], **kwargs)
+    return _id
+  def add_documentation(self, documentation: str, **kwargs) -> str:
+    _id = deterministic_uuid(documentation) + "-doc"
+    self.documentation_store.add_texts(texts=[documentation], ids=[_id], **kwargs)
+    return _id
+  def add_question_sql(self, question: str, sql: str, **kwargs) -> str:
+    question_sql_json = json.dumps(
+      {
+        "question": question,
+        "sql": sql,
+      },
+      ensure_ascii=False,
+    )
+    _id = deterministic_uuid(question_sql_json) + "-sql"
+    self.sql_store.add_texts(texts=[question_sql_json], ids=[_id], **kwargs)
+    return _id
+  def get_related_ddl(self, question: str, **kwargs) -> list:
+    documents = self.ddl_store.similarity_search(query=question, k=self.n_results_ddl)
+    return [document.page_content for document in documents]
+  def get_related_documentation(self, question: str, **kwargs) -> list:
+    documents = self.documentation_store.similarity_search(query=question, k=self.n_results_documentation)
+    return [document.page_content for document in documents]
+  def get_similar_question_sql(self, question: str, **kwargs) -> list:
+    documents = self.sql_store.similarity_search(query=question, k=self.n_results_sql)
+    return [json.loads(document.page_content) for document in documents]
+  def get_training_data(self, **kwargs) -> pd.DataFrame:
+    data = []
+    query = {
+      "query": {
+        "match_all": {}
+      }
+    }
+    indices = [
+      {"index": self.document_index, "type": "documentation"},
+      {"index": self.question_sql_index, "type": "sql"},
+      {"index": self.ddl_index, "type": "ddl"},
+    ]
+    # Use documentation_store.client consistently for search on all indices
+    opensearch_client = self.documentation_store.client
+    for index_info in indices:
+      index_name = index_info["index"]
+      training_data_type = index_info["type"]
+      scroll = '1m'  # keep scroll context for 1 minute
+      response = opensearch_client.search(
+        index=index_name,
+        ignore_unavailable=True,
+        body=query,
+        scroll=scroll,
+        size=1000
+      )
+      scroll_id = response.get('_scroll_id')
+      while scroll_id:
+        hits = response['hits']['hits']
+        if not hits:
+          break  # No more hits, exit loop
+        for hit in hits:
+          source = hit['_source']
+          if training_data_type == "sql":
+            try:
+              doc_dict = json.loads(source['text'])
+              content = doc_dict.get("sql")
+              question = doc_dict.get("question")
+            except json.JSONDecodeError as e:
+              self.log(f"Skipping row with custom_id {hit['_id']} due to JSON parsing error: {e}","Error")
+              continue
+          else:  # documentation or ddl
+            content = source['text']
+            question = None
+          data.append({
+            "id": hit["_id"],
+            "training_data_type": training_data_type,
+            "question": question,
+            "content": content,
+          })
+        # Get next batch of results, using documentation_store.client.scroll
+        response = opensearch_client.scroll(scroll_id=scroll_id, scroll=scroll)
+        scroll_id = response.get('_scroll_id')
+    return pd.DataFrame(data)
+  def remove_training_data(self, id: str, **kwargs) -> bool:
+    try:
+      if id.endswith("-sql"):
+        return self.sql_store.delete(ids=[id], **kwargs)
+      elif id.endswith("-ddl"):
+        return self.ddl_store.delete(ids=[id], **kwargs)
+      elif id.endswith("-doc"):
+        return self.documentation_store.delete(ids=[id], **kwargs)
+      else:
+        return False
+    except Exception as e:
+      self.log(f"Error deleting training data: {e}", "Error")
+      return False
+  def generate_embedding(self, data: str, **kwargs) -> list[float]:
+    pass
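
Note on `get_training_data` above: the method pages through each index with the scroll pattern (an initial `search` with a `scroll` window, then repeated `scroll` calls until a page comes back empty). The sketch below reproduces that loop against a hypothetical in-memory stand-in, `FakeScrollClient`, so it runs without a cluster. The fake client, its `page_size` parameter, and the `collect_all` helper are assumptions for illustration and not part of `opensearch-py`.

```python
class FakeScrollClient:
    """Hypothetical in-memory stand-in for an OpenSearch client (not a real API)."""

    def __init__(self, docs, page_size):
        self.docs = docs
        self.page_size = page_size
        self.offset = 0

    def _page(self):
        # Serve the next slice of documents; an empty slice signals the end.
        hits = self.docs[self.offset:self.offset + self.page_size]
        self.offset += len(hits)
        return {"_scroll_id": "scroll-1", "hits": {"hits": hits}}

    def search(self, index, body, scroll, size):
        self.offset = 0
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()


def collect_all(client, index_name):
    """Drain one index using the same loop shape as the diff."""
    rows = []
    scroll = "1m"  # keep scroll context for 1 minute
    response = client.search(
        index=index_name,
        body={"query": {"match_all": {}}},
        scroll=scroll,
        size=1000,
    )
    scroll_id = response.get("_scroll_id")
    while scroll_id:
        hits = response["hits"]["hits"]
        if not hits:
            break  # no more hits, exit loop
        for hit in hits:
            rows.append({"id": hit["_id"], "content": hit["_source"]["text"]})
        # Fetch the next batch with the scroll id from the previous response.
        response = client.scroll(scroll_id=scroll_id, scroll=scroll)
        scroll_id = response.get("_scroll_id")
    return rows


docs = [{"_id": f"{i}-doc", "_source": {"text": f"note {i}"}} for i in range(5)]
rows = collect_all(FakeScrollClient(docs, page_size=2), "vanna_document_index")
print(len(rows), rows[0]["id"], rows[-1]["id"])
```

The empty-page check is what ends the loop, since a scroll id is normally still returned on the final, empty response.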

vanna-0.7.6/src/vanna/oracle/__init__.py ADDED

@@ -0,0 +1 @@
+from .oracle_vector import Oracle_VectorStore
