haiku.rag 0.5.4__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


Files changed (83)
  1. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/PKG-INFO +3 -8
  2. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/README.md +1 -1
  3. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/configuration.md +24 -28
  4. haiku_rag-0.6.0/docs/installation.md +35 -0
  5. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/python.md +4 -1
  6. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/pyproject.toml +2 -4
  7. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/client.py +7 -3
  8. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/embeddings/__init__.py +3 -9
  9. haiku_rag-0.6.0/src/haiku/rag/embeddings/openai.py +13 -0
  10. haiku_rag-0.6.0/src/haiku/rag/qa/__init__.py +15 -0
  11. haiku_rag-0.6.0/src/haiku/rag/qa/agent.py +76 -0
  12. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/qa/prompts.py +2 -0
  13. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/ollama.py +29 -32
  14. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/llm_judge.py +45 -50
  15. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_client.py +89 -91
  16. haiku_rag-0.6.0/tests/test_embedder.py +113 -0
  17. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_qa.py +11 -23
  18. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_reranker.py +4 -0
  19. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/uv.lock +495 -26
  20. haiku_rag-0.5.4/docs/installation.md +0 -37
  21. haiku_rag-0.5.4/src/haiku/rag/embeddings/openai.py +0 -16
  22. haiku_rag-0.5.4/src/haiku/rag/qa/__init__.py +0 -44
  23. haiku_rag-0.5.4/src/haiku/rag/qa/anthropic.py +0 -108
  24. haiku_rag-0.5.4/src/haiku/rag/qa/base.py +0 -89
  25. haiku_rag-0.5.4/src/haiku/rag/qa/ollama.py +0 -60
  26. haiku_rag-0.5.4/src/haiku/rag/qa/openai.py +0 -97
  27. haiku_rag-0.5.4/tests/test_embedder.py +0 -128
  28. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.github/FUNDING.yml +0 -0
  29. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.github/workflows/build-docs.yml +0 -0
  30. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.github/workflows/build-publish.yml +0 -0
  31. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.gitignore +0 -0
  32. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.pre-commit-config.yaml +0 -0
  33. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/.python-version +0 -0
  34. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/LICENSE +0 -0
  35. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/benchmarks.md +0 -0
  36. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/cli.md +0 -0
  37. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/index.md +0 -0
  38. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/mcp.md +0 -0
  39. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/server.md +0 -0
  40. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/mkdocs.yml +0 -0
  41. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/__init__.py +0 -0
  42. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/app.py +0 -0
  43. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/chunker.py +0 -0
  44. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/cli.py +0 -0
  45. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/config.py +0 -0
  46. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/embeddings/base.py +0 -0
  47. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/embeddings/ollama.py +0 -0
  48. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/embeddings/voyageai.py +0 -0
  49. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/logging.py +0 -0
  50. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/mcp.py +0 -0
  51. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/monitor.py +0 -0
  52. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reader.py +0 -0
  53. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/__init__.py +0 -0
  54. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/base.py +0 -0
  55. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/cohere.py +0 -0
  56. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/mxbai.py +0 -0
  57. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/__init__.py +0 -0
  58. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/engine.py +0 -0
  59. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/models/__init__.py +0 -0
  60. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/models/chunk.py +0 -0
  61. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/models/document.py +0 -0
  62. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/repositories/__init__.py +0 -0
  63. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/repositories/base.py +0 -0
  64. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/repositories/chunk.py +0 -0
  65. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/repositories/document.py +0 -0
  66. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/repositories/settings.py +0 -0
  67. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/upgrades/__init__.py +0 -0
  68. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/store/upgrades/v0_3_4.py +0 -0
  69. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/utils.py +0 -0
  70. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/__init__.py +0 -0
  71. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/conftest.py +0 -0
  72. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/generate_benchmark_db.py +0 -0
  73. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_app.py +0 -0
  74. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_chunk.py +0 -0
  75. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_chunker.py +0 -0
  76. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_cli.py +0 -0
  77. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_document.py +0 -0
  78. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_monitor.py +0 -0
  79. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_reader.py +0 -0
  80. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_rebuild.py +0 -0
  81. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_search.py +0 -0
  82. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_settings.py +0 -0
  83. {haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/test_utils.py +0 -0

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: haiku.rag
-Version: 0.5.4
+Version: 0.6.0
 Summary: Retrieval Augmented Generation (RAG) with SQLite
 Author-email: Yiorgis Gozadinos <ggozadinos@gmail.com>
 License: MIT
@@ -22,6 +22,7 @@ Requires-Dist: docling>=2.15.0
 Requires-Dist: fastmcp>=2.8.1
 Requires-Dist: httpx>=0.28.1
 Requires-Dist: ollama>=0.5.3
+Requires-Dist: pydantic-ai>=0.7.2
 Requires-Dist: pydantic>=2.11.7
 Requires-Dist: python-dotenv>=1.1.0
 Requires-Dist: rich>=14.0.0
@@ -29,14 +30,8 @@ Requires-Dist: sqlite-vec>=0.1.6
 Requires-Dist: tiktoken>=0.9.0
 Requires-Dist: typer>=0.16.0
 Requires-Dist: watchfiles>=1.1.0
-Provides-Extra: anthropic
-Requires-Dist: anthropic>=0.56.0; extra == 'anthropic'
-Provides-Extra: cohere
-Requires-Dist: cohere>=5.16.1; extra == 'cohere'
 Provides-Extra: mxbai
 Requires-Dist: mxbai-rerank>=0.1.6; extra == 'mxbai'
-Provides-Extra: openai
-Requires-Dist: openai>=1.0.0; extra == 'openai'
 Provides-Extra: voyageai
 Requires-Dist: voyageai>=0.3.2; extra == 'voyageai'
 Description-Content-Type: text/markdown
@@ -51,7 +46,7 @@ Retrieval-Augmented Generation (RAG) library on SQLite.

 - **Local SQLite**: No external servers required
 - **Multiple embedding providers**: Ollama, VoyageAI, OpenAI
-- **Multiple QA providers**: Ollama, OpenAI, Anthropic
+- **Multiple QA providers**: Any provider/model supported by Pydantic AI
 - **Hybrid search**: Vector + full-text search with Reciprocal Rank Fusion
 - **Reranking**: Default search result reranking with MixedBread AI or Cohere
 - **Question answering**: Built-in QA agents on your documents

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/README.md

@@ -8,7 +8,7 @@ Retrieval-Augmented Generation (RAG) library on SQLite.

 - **Local SQLite**: No external servers required
 - **Multiple embedding providers**: Ollama, VoyageAI, OpenAI
-- **Multiple QA providers**: Ollama, OpenAI, Anthropic
+- **Multiple QA providers**: Any provider/model supported by Pydantic AI
 - **Hybrid search**: Vector + full-text search with Reciprocal Rank Fusion
 - **Reranking**: Default search result reranking with MixedBread AI or Cohere
 - **Question answering**: Built-in QA agents on your documents

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/configuration.md

@@ -44,13 +44,7 @@ VOYAGE_API_KEY="your-api-key"
 ```

 ### OpenAI
-If you want to use OpenAI embeddings you will need to install `haiku.rag` with the VoyageAI extras,
-
-```bash
-uv pip install haiku.rag[openai]
-```
-
-and set environment variables.
+OpenAI embeddings are included in the default installation. Simply set environment variables:

 ```bash
 EMBEDDINGS_PROVIDER="openai"
@@ -61,7 +55,7 @@ OPENAI_API_KEY="your-api-key"

 ## Question Answering Providers

-Configure which LLM provider to use for question answering.
+Configure which LLM provider to use for question answering. Any provider and model supported by [Pydantic AI](https://ai.pydantic.dev/models/) can be used.

 ### Ollama (Default)

@@ -73,13 +67,7 @@ OLLAMA_BASE_URL="http://localhost:11434"

 ### OpenAI

-For OpenAI QA, you need to install haiku.rag with OpenAI extras:
-
-```bash
-uv pip install haiku.rag[openai]
-```
-
-Then configure:
+OpenAI QA is included in the default installation. Simply configure:

 ```bash
 QA_PROVIDER="openai"
@@ -89,20 +77,34 @@ OPENAI_API_KEY="your-api-key"

 ### Anthropic

-For Anthropic QA, you need to install haiku.rag with Anthropic extras:
+Anthropic QA is included in the default installation. Simply configure:

 ```bash
-uv pip install haiku.rag[anthropic]
+QA_PROVIDER="anthropic"
+QA_MODEL="claude-3-5-haiku-20241022" # or claude-3-5-sonnet-20241022, etc.
+ANTHROPIC_API_KEY="your-api-key"
 ```

-Then configure:
+### Other Providers
+
+Any provider supported by Pydantic AI can be used. Examples include:

 ```bash
-QA_PROVIDER="anthropic"
-QA_MODEL="claude-3-5-haiku-20241022" # or claude-3-5-sonnet-20241022, etc.
-ANTHROPIC_API_KEY="your-api-key"
+# Google Gemini
+QA_PROVIDER="gemini"
+QA_MODEL="gemini-1.5-flash"
+
+# Groq
+QA_PROVIDER="groq"
+QA_MODEL="llama-3.3-70b-versatile"
+
+# Mistral
+QA_PROVIDER="mistral"
+QA_MODEL="mistral-small-latest"
 ```

+See the [Pydantic AI documentation](https://ai.pydantic.dev/models/) for the complete list of supported providers and models.
+
 ## Reranking

 Reranking improves search quality by re-ordering the initial search results using specialized models. When enabled, the system retrieves more candidates (3x the requested limit) and then reranks them to return the most relevant results.
@@ -144,13 +146,7 @@ RERANK_MODEL="mixedbread-ai/mxbai-rerank-base-v2"

 ### Cohere

-For Cohere reranking, install with Cohere extras:
-
-```bash
-uv pip install haiku.rag[cohere]
-```
-
-Then configure:
+Cohere reranking is included in the default installation. Simply configure:

 ```bash
 RERANK_PROVIDER="cohere"

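How the new QA_PROVIDER / QA_MODEL pairs are consumed is shown by the Pydantic AI based agent added later in this diff: for every provider except `ollama`, the two values are simply joined into Pydantic AI's `provider:model` string. A minimal illustration of that mapping (the variable values below are examples, not package defaults, and the run call is commented out because it needs a live API key):

```python
# Illustration only: a QA_PROVIDER / QA_MODEL pair mapped onto a Pydantic AI model
# identifier, the same shape src/haiku/rag/qa/agent.py builds internally for
# non-Ollama providers.
from pydantic_ai import Agent

qa_provider = "groq"                   # e.g. read from the environment
qa_model = "llama-3.3-70b-versatile"

agent = Agent(f"{qa_provider}:{qa_model}", system_prompt="Answer briefly.")
# result = agent.run_sync("What does haiku.rag use for storage?")
# print(result.output)
```
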
haiku_rag-0.6.0/docs/installation.md

@@ -0,0 +1,35 @@
+# Installation
+
+## Basic Installation
+
+```bash
+uv pip install haiku.rag
+```
+
+This includes support for:
+- **Ollama** (default embedding provider using `mxbai-embed-large`)
+- **OpenAI** (GPT models for QA and embeddings)
+- **Anthropic** (Claude models for QA)
+- **Cohere** (reranking models)
+
+## Provider-Specific Installation
+
+For additional embedding providers, install with extras:
+
+### VoyageAI
+
+```bash
+uv pip install haiku.rag[voyageai]
+```
+
+### MixedBread AI Reranking
+
+```bash
+uv pip install haiku.rag[mxbai]
+```
+
+## Requirements
+
+- Python 3.10+
+- SQLite 3.38+
+- Ollama (for default embeddings)

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/docs/python.md

@@ -138,9 +138,12 @@ Expand search results with adjacent chunks for more complete context:
 # Get initial search results
 search_results = await client.search("machine learning", limit=3)

-# Expand with adjacent chunks based on CONTEXT_CHUNK_RADIUS setting
+# Expand with adjacent chunks using config setting
 expanded_results = await client.expand_context(search_results)

+# Or specify a custom radius
+expanded_results = await client.expand_context(search_results, radius=2)
+
 # The expanded results contain chunks with combined content from adjacent chunks
 for chunk, score in expanded_results:
     print(f"Expanded content: {chunk.content}")  # Now includes before/after chunks

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "haiku.rag"
-version = "0.5.4"
+version = "0.6.0"
 description = "Retrieval Augmented Generation (RAG) with SQLite"
 authors = [{ name = "Yiorgis Gozadinos", email = "ggozadinos@gmail.com" }]
 license = { text = "MIT" }
@@ -27,6 +27,7 @@ dependencies = [
     "httpx>=0.28.1",
     "ollama>=0.5.3",
     "pydantic>=2.11.7",
+    "pydantic-ai>=0.7.2",
     "python-dotenv>=1.1.0",
     "rich>=14.0.0",
     "sqlite-vec>=0.1.6",
@@ -37,9 +38,6 @@ dependencies = [

 [project.optional-dependencies]
 voyageai = ["voyageai>=0.3.2"]
-openai = ["openai>=1.0.0"]
-anthropic = ["anthropic>=0.56.0"]
-cohere = ["cohere>=5.16.1"]
 mxbai = ["mxbai-rerank>=0.1.6"]

 [project.scripts]

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/client.py

@@ -349,17 +349,21 @@ class HaikuRAG:
         return reranked_results

     async def expand_context(
-        self, search_results: list[tuple[Chunk, float]]
+        self,
+        search_results: list[tuple[Chunk, float]],
+        radius: int = Config.CONTEXT_CHUNK_RADIUS,
     ) -> list[tuple[Chunk, float]]:
         """Expand search results with adjacent chunks, merging overlapping chunks.

         Args:
             search_results: List of (chunk, score) tuples from search.
+            radius: Number of adjacent chunks to include before/after each chunk.
+                Defaults to CONTEXT_CHUNK_RADIUS config setting.

         Returns:
             List of (chunk, score) tuples with expanded and merged context chunks.
         """
-        if Config.CONTEXT_CHUNK_RADIUS == 0:
+        if radius == 0:
             return search_results

         # Group chunks by document_id to handle merging within documents
@@ -377,7 +381,7 @@ class HaikuRAG:
         expanded_ranges = []
         for chunk, score in doc_chunks:
             adjacent_chunks = await self.chunk_repository.get_adjacent_chunks(
-                chunk, Config.CONTEXT_CHUNK_RADIUS
+                chunk, radius
             )

             all_chunks = adjacent_chunks + [chunk]

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/embeddings/__init__.py

@@ -17,20 +17,14 @@ def get_embedder() -> EmbedderBase:
         except ImportError:
             raise ImportError(
                 "VoyageAI embedder requires the 'voyageai' package. "
-                "Please install haiku.rag with the 'voyageai' extra:"
+                "Please install haiku.rag with the 'voyageai' extra: "
                 "uv pip install haiku.rag[voyageai]"
             )
         return VoyageAIEmbedder(Config.EMBEDDINGS_MODEL, Config.EMBEDDINGS_VECTOR_DIM)

     if Config.EMBEDDINGS_PROVIDER == "openai":
-        try:
-            from haiku.rag.embeddings.openai import Embedder as OpenAIEmbedder
-        except ImportError:
-            raise ImportError(
-                "OpenAI embedder requires the 'openai' package. "
-                "Please install haiku.rag with the 'openai' extra:"
-                "uv pip install haiku.rag[openai]"
-            )
+        from haiku.rag.embeddings.openai import Embedder as OpenAIEmbedder
+
         return OpenAIEmbedder(Config.EMBEDDINGS_MODEL, Config.EMBEDDINGS_VECTOR_DIM)

     raise ValueError(f"Unsupported embedding provider: {Config.EMBEDDINGS_PROVIDER}")

haiku_rag-0.6.0/src/haiku/rag/embeddings/openai.py

@@ -0,0 +1,13 @@
+from openai import AsyncOpenAI
+
+from haiku.rag.embeddings.base import EmbedderBase
+
+
+class Embedder(EmbedderBase):
+    async def embed(self, text: str) -> list[float]:
+        client = AsyncOpenAI()
+        response = await client.embeddings.create(
+            model=self._model,
+            input=text,
+        )
+        return response.data[0].embedding

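A minimal usage sketch for the new embedder (not taken from the package docs): the constructor arguments mirror the call in embeddings/__init__.py above, and OPENAI_API_KEY must be set in the environment for AsyncOpenAI() to authenticate. The model name and dimension below are illustrative.

```python
import asyncio

from haiku.rag.embeddings.openai import Embedder


async def main() -> None:
    # Normally these values come from Config.EMBEDDINGS_MODEL and
    # Config.EMBEDDINGS_VECTOR_DIM; hard-coded here for illustration.
    embedder = Embedder("text-embedding-3-small", 1536)
    vector = await embedder.embed("haiku.rag keeps its index in SQLite")
    print(len(vector))  # should match the configured vector dimension


asyncio.run(main())
```
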
haiku_rag-0.6.0/src/haiku/rag/qa/__init__.py

@@ -0,0 +1,15 @@
+from haiku.rag.client import HaikuRAG
+from haiku.rag.config import Config
+from haiku.rag.qa.agent import QuestionAnswerAgent
+
+
+def get_qa_agent(client: HaikuRAG, use_citations: bool = False) -> QuestionAnswerAgent:
+    provider = Config.QA_PROVIDER
+    model_name = Config.QA_MODEL
+
+    return QuestionAnswerAgent(
+        client=client,
+        provider=provider,
+        model=model_name,
+        use_citations=use_citations,
+    )

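A usage sketch for the factory. It assumes HaikuRAG can be opened as an async context manager, which is not shown in this diff, and the database path is hypothetical.

```python
import asyncio

from haiku.rag.client import HaikuRAG
from haiku.rag.qa import get_qa_agent


async def main() -> None:
    # Assumed usage pattern; see the haiku.rag docs for the actual client API.
    async with HaikuRAG("knowledge.db") as client:
        qa = get_qa_agent(client, use_citations=True)
        print(await qa.answer("Which embedding providers are supported?"))


asyncio.run(main())
```
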
haiku_rag-0.6.0/src/haiku/rag/qa/agent.py

@@ -0,0 +1,76 @@
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent, RunContext
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.providers.ollama import OllamaProvider
+
+from haiku.rag.client import HaikuRAG
+from haiku.rag.config import Config
+from haiku.rag.qa.prompts import SYSTEM_PROMPT, SYSTEM_PROMPT_WITH_CITATIONS
+
+
+class SearchResult(BaseModel):
+    content: str = Field(description="The document text content")
+    score: float = Field(description="Relevance score (higher is more relevant)")
+    document_uri: str = Field(description="Source URI/path of the document")
+
+
+class Dependencies(BaseModel):
+    model_config = {"arbitrary_types_allowed": True}
+    client: HaikuRAG
+
+
+class QuestionAnswerAgent:
+    def __init__(
+        self,
+        client: HaikuRAG,
+        provider: str,
+        model: str,
+        use_citations: bool = False,
+        q: float = 0.0,
+    ):
+        self._client = client
+
+        system_prompt = SYSTEM_PROMPT_WITH_CITATIONS if use_citations else SYSTEM_PROMPT
+        model_obj = self._get_model(provider, model)
+
+        self._agent = Agent(
+            model=model_obj,
+            deps_type=Dependencies,
+            system_prompt=system_prompt,
+        )
+
+        @self._agent.tool
+        async def search_documents(
+            ctx: RunContext[Dependencies],
+            query: str,
+            limit: int = 3,
+        ) -> list[SearchResult]:
+            """Search the knowledge base for relevant documents."""
+            search_results = await ctx.deps.client.search(query, limit=limit)
+            expanded_results = await ctx.deps.client.expand_context(search_results)
+
+            return [
+                SearchResult(
+                    content=chunk.content,
+                    score=score,
+                    document_uri=chunk.document_uri or "",
+                )
+                for chunk, score in expanded_results
+            ]
+
+    def _get_model(self, provider: str, model: str):
+        """Get the appropriate model object for the provider."""
+        if provider == "ollama":
+            return OpenAIModel(
+                model_name=model,
+                provider=OllamaProvider(base_url=f"{Config.OLLAMA_BASE_URL}/v1"),
+            )
+        else:
+            # For all other providers, use the provider:model format
+            return f"{provider}:{model}"
+
+    async def answer(self, question: str) -> str:
+        """Answer a question using the RAG system."""
+        deps = Dependencies(client=self._client)
+        result = await self._agent.run(question, deps=deps)
+        return result.output

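Because `_get_model` falls back to Pydantic AI's `provider:model` string for everything except Ollama, the agent can also be constructed directly with an explicit provider, bypassing the config-driven factory. A sketch under that assumption (model name illustrative; the matching provider API key must be set at run time):

```python
from haiku.rag.client import HaikuRAG
from haiku.rag.qa.agent import QuestionAnswerAgent


async def ask(client: HaikuRAG, question: str) -> str:
    # "openai:gpt-4o-mini" is the string _get_model builds from these two arguments.
    qa = QuestionAnswerAgent(client=client, provider="openai", model="gpt-4o-mini")
    return await qa.answer(question)
```
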
{haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/qa/prompts.py

@@ -18,6 +18,7 @@ Guidelines:
 - Stick to the answer, do not ellaborate or provide context unless explicitly asked for it.

 Be concise, and always maintain accuracy over completeness. Prefer short, direct answers that are well-supported by the documents.
+/no_think
 """

 SYSTEM_PROMPT_WITH_CITATIONS = """
@@ -55,4 +56,5 @@ Citations:
 - /path/to/document2.pdf: "The manual provides guidance on military procedures and..."

 Be concise, and always maintain accuracy over completeness. Prefer short, direct answers that are well-supported by the documents.
+/no_think
 """

{haiku_rag-0.5.4 → haiku_rag-0.6.0}/src/haiku/rag/reranking/ollama.py

@@ -1,14 +1,12 @@
-import json
-
-from ollama import AsyncClient
 from pydantic import BaseModel
+from pydantic_ai import Agent
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.providers.ollama import OllamaProvider

 from haiku.rag.config import Config
 from haiku.rag.reranking.base import RerankerBase
 from haiku.rag.store.models.chunk import Chunk

-OLLAMA_OPTIONS = {"temperature": 0.0, "seed": 42, "num_ctx": 16384}
-

 class RerankResult(BaseModel):
     """Individual rerank result with index and relevance score."""
@@ -26,7 +24,28 @@ class RerankResponse(BaseModel):
 class OllamaReranker(RerankerBase):
     def __init__(self, model: str = Config.RERANK_MODEL):
         self._model = model
-        self._client = AsyncClient(host=Config.OLLAMA_BASE_URL)
+
+        # Create the reranking prompt
+        system_prompt = """You are a document reranking assistant. Given a query and a list of document chunks, you must rank them by relevance to the query.
+
+Return your response as a JSON object with a "results" array. Each result should have:
+- "index": the original index of the document (integer)
+- "relevance_score": a score between 0.0 and 1.0 indicating relevance (float, where 1.0 is most relevant)
+
+Only return the top documents up to the requested limit, ordered by decreasing relevance score.
+/no_think
+"""
+
+        model_obj = OpenAIModel(
+            model_name=model,
+            provider=OllamaProvider(base_url=f"{Config.OLLAMA_BASE_URL}/v1"),
+        )
+
+        self._agent = Agent(
+            model=model_obj,
+            output_type=RerankResponse,
+            system_prompt=system_prompt,
+        )

     async def rerank(
         self, query: str, chunks: list[Chunk], top_n: int = 10
@@ -38,15 +57,6 @@ class OllamaReranker(RerankerBase):
         for i, chunk in enumerate(chunks):
             documents.append({"index": i, "content": chunk.content})

-        # Create the prompt for reranking
-        system_prompt = """You are a document reranking assistant. Given a query and a list of document chunks, you must rank them by relevance to the query.
-
-Return your response as a JSON object with a "results" array. Each result should have:
-- "index": the original index of the document (integer)
-- "relevance_score": a score between 0.0 and 1.0 indicating relevance (float, where 1.0 is most relevant)
-
-Only return the top documents up to the requested limit, ordered by decreasing relevance score."""
-
         documents_text = ""
         for doc in documents:
             documents_text += f"Index {doc['index']}: {doc['content']}\n\n"
@@ -56,27 +66,14 @@ Only return the top documents up to the requested limit, ordered by decreasing r
 Documents to rerank:
 {documents_text.strip()}

-Please rank these documents by relevance to the query and return the top {top_n} results as JSON."""
-
-        messages = [
-            {"role": "system", "content": system_prompt},
-            {"role": "user", "content": user_prompt},
-        ]
+Rank these documents by relevance to the query and return the top {top_n} results as JSON."""

         try:
-            response = await self._client.chat(
-                model=self._model,
-                messages=messages,
-                format=RerankResponse.model_json_schema(),
-                options=OLLAMA_OPTIONS,
-            )
-
-            content = response["message"]["content"]
+            result = await self._agent.run(user_prompt)

-            parsed_response = RerankResponse.model_validate(json.loads(content))
             return [
-                (chunks[result.index], result.relevance_score)
-                for result in parsed_response.results[:top_n]
+                (chunks[result_item.index], result_item.relevance_score)
+                for result_item in result.output.results[:top_n]
             ]

         except Exception:

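A usage sketch for the reworked reranker (illustrative, assuming a HaikuRAG client and a local Ollama server): fetch a generous candidate set, then let the structured-output agent order it.

```python
from haiku.rag.reranking.ollama import OllamaReranker


async def rerank_search(client, query: str):
    results = await client.search(query, limit=30)  # list of (Chunk, score) tuples
    chunks = [chunk for chunk, _ in results]
    reranker = OllamaReranker()  # defaults to Config.RERANK_MODEL
    # Returns up to top_n (chunk, relevance_score) pairs, best first.
    return await reranker.rerank(query, chunks, top_n=10)
```
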
{haiku_rag-0.5.4 → haiku_rag-0.6.0}/tests/llm_judge.py

@@ -1,21 +1,55 @@
-import json
-
-from ollama import AsyncClient
 from pydantic import BaseModel
+from pydantic_ai import Agent
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.providers.ollama import OllamaProvider

 from haiku.rag.config import Config

+# Shared rubric/prompt for answer equivalence evaluation
+ANSWER_EQUIVALENCE_RUBRIC = """You are evaluating whether two answers to the same question are semantically equivalent.
+
+EVALUATION CRITERIA:
+Rate as EQUIVALENT if:
+✓ Both answers contain the same core factual information
+✓ Both directly address the question asked
+✓ The key claims and conclusions are consistent
+✓ Any additional detail in one answer doesn't contradict the other
+
+Rate as NOT EQUIVALENT if:
+✗ Factual contradictions exist between the answers
+✗ One answer fails to address the core question
+✗ Key information is missing that changes the meaning
+✗ The answers lead to different conclusions or implications
+
+GUIDELINES:
+- Ignore minor differences in phrasing, style, or formatting
+- Focus on semantic meaning rather than exact wording
+- Consider both answers correct if they convey the same essential information
+- Be tolerant of different levels of detail if the core answer is preserved
+- Evaluate based on what a person asking this question would need to know
+/no_think"""
+

 class LLMJudgeResponseSchema(BaseModel):
     equivalent: bool


 class LLMJudge:
-    """LLM-as-judge for evaluating answer equivalence using Ollama."""
+    """LLM-as-judge for evaluating answer equivalence using Pydantic AI."""

     def __init__(self, model: str = Config.QA_MODEL):
-        self.model = model
-        self.client = AsyncClient(host=Config.OLLAMA_BASE_URL)
+        # Create Ollama model
+        ollama_model = OpenAIModel(
+            model_name=model,
+            provider=OllamaProvider(base_url=f"{Config.OLLAMA_BASE_URL}/v1"),
+        )
+
+        # Create Pydantic AI agent
+        self._agent = Agent(
+            model=ollama_model,
+            output_type=LLMJudgeResponseSchema,
+            system_prompt=ANSWER_EQUIVALENCE_RUBRIC,
+        )

     async def judge_answers(
         self, question: str, answer: str, expected_answer: str
@@ -29,53 +63,14 @@ class LLMJudge:
             expected_answer: The reference/expected answer

         Returns:
-            Dictionary with judgment result:
-            - equivalent: bool indicating if answers are equivalent
-            - explanation: str explaining the reasoning
-            - score: str rating from 1-5
+            bool indicating if answers are equivalent
         """

-        prompt = f"""You are an expert evaluator determining whether two answers to the same question are semantically equivalent.
-
-QUESTION: {question}
+        prompt = f"""QUESTION: {question}

 GENERATED ANSWER: {answer}

-EXPECTED ANSWER: {expected_answer}
-
-EVALUATION CRITERIA:
-Rate as EQUIVALENT (true) if:
-✓ Both answers contain the same core factual information
-✓ Both directly address the question asked
-✓ The key claims and conclusions are consistent
-✓ Any additional detail in one answer doesn't contradict the other
-
-Rate as NOT EQUIVALENT (false) if:
-✗ Factual contradictions exist between the answers
-✗ One answer fails to address the core question
-✗ Key information is missing from one answer that changes the meaning
-✗ The answers lead to different conclusions or implications
-
-GUIDELINES:
-- Ignore minor differences in phrasing, style, or formatting
-- Focus on semantic meaning rather than exact wording
-- Consider both answers correct if they convey the same essential information
-- Be tolerant of different levels of detail if the core answer is preserved
-- Evaluate based on what a person asking this question would need to know
-
-Respond with JSON containing only: {{"equivalent": true}} or {{"equivalent": false}}"""
-
-        response = await self.client.chat(
-            model=self.model,
-            messages=[{"role": "user", "content": prompt}],
-            format=LLMJudgeResponseSchema.model_json_schema(),
-            think=False,
-        )
+EXPECTED ANSWER: {expected_answer}"""

-        answer = response["message"]["content"].strip()
-        try:
-            res = json.loads(answer)
-            assert "equivalent" in res, "Response must contain 'equivalent' key"
-            return res["equivalent"]
-        except json.JSONDecodeError:
-            assert False, "Response is not valid JSON"
+        result = await self._agent.run(prompt)
+        return result.output.equivalent

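A sketch of how the slimmed-down judge could be exercised in a test. It assumes an async-capable pytest setup such as pytest-asyncio (not shown in this diff) and a local Ollama server serving the configured QA model; the question and answers are illustrative.

```python
import pytest

from tests.llm_judge import LLMJudge


@pytest.mark.asyncio
async def test_judge_answers_returns_bool():
    judge = LLMJudge()  # defaults to Config.QA_MODEL
    equivalent = await judge.judge_answers(
        question="Where does haiku.rag store its data?",
        answer="In a local SQLite database.",
        expected_answer="Everything is kept in SQLite; no external server is needed.",
    )
    # The structured output now yields a plain bool instead of parsed JSON.
    assert isinstance(equivalent, bool)
```
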