haiku.rag 0.1.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

@@ -0,0 +1,195 @@
1
+ Metadata-Version: 2.4
2
+ Name: haiku.rag
3
+ Version: 0.1.0
4
+ Summary: Retrieval Augmented Generation (RAG) with SQLite
5
+ Author-email: Yiorgis Gozadinos <ggozadinos@gmail.com>
6
+ License: MIT
7
+ License-File: LICENSE
8
+ Classifier: Development Status :: 4 - Beta
9
+ Classifier: Environment :: Console
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Operating System :: MacOS
12
+ Classifier: Operating System :: Microsoft :: Windows :: Windows 10
13
+ Classifier: Operating System :: Microsoft :: Windows :: Windows 11
14
+ Classifier: Operating System :: POSIX :: Linux
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Typing :: Typed
19
+ Requires-Python: >=3.10
20
+ Requires-Dist: fastmcp>=2.8.1
21
+ Requires-Dist: httpx>=0.28.1
22
+ Requires-Dist: markitdown[audio-transcription,docx,pdf,pptx,xlsx]>=0.1.2
23
+ Requires-Dist: ollama>=0.5.1
24
+ Requires-Dist: pydantic>=2.11.7
25
+ Requires-Dist: python-dotenv>=1.1.0
26
+ Requires-Dist: rich>=14.0.0
27
+ Requires-Dist: sqlite-vec>=0.1.6
28
+ Requires-Dist: tiktoken>=0.9.0
29
+ Requires-Dist: typer>=0.16.0
30
+ Requires-Dist: watchfiles>=1.1.0
31
+ Provides-Extra: voyageai
32
+ Requires-Dist: voyageai>=0.3.2; extra == 'voyageai'
33
+ Description-Content-Type: text/markdown
34
+
35
+ # Haiku SQLite RAG
36
+
37
+ A SQLite-based Retrieval-Augmented Generation (RAG) system built for efficient document storage, chunking, and hybrid search capabilities.
38
+
39
+ ## Features
40
+ - **Local SQLite**: No need to run additional servers
41
+ - **Support for various embedding providers**: You can use Ollama, VoyageAI or add your own
42
+ - **Hybrid Search**: Vector search using `sqlite-vec` combined with `FTS5` full-text search, merged using Reciprocal Rank Fusion
43
+ - **Multi-format Support**: Parse 40+ file formats including PDF, DOCX, HTML, Markdown, audio and more. You can also add a URL.
44
+
45
+ ## Installation
46
+
47
+ ```bash
48
+ uv pip install haiku.rag
49
+ ```
50
+
51
+ By default Ollama (with the `mxbai-embed-large` model) is used for the embeddings.
52
+ For other providers use:
53
+
54
+ - **VoyageAI**: `uv pip install "haiku.rag[voyageai]"`
55
+
56
+ ## Configuration
57
+
58
+ To use an embeddings provider other than the default (Ollama), set the provider details through environment variables.
59
+
60
+ The defaults are:
61
+
62
+ ```bash
63
+ EMBEDDING_PROVIDER="ollama"
64
+ EMBEDDING_MODEL="mxbai-embed-large" # or any other model
65
+ EMBEDDING_VECTOR_DIM=1024
66
+ ```
67
+
68
+ For VoyageAI:
69
+ ```bash
70
+ EMBEDDING_PROVIDER="voyageai"
71
+ EMBEDDING_MODEL="voyage-3.5" # or any other model
72
+ EMBEDDING_VECTOR_DIM=1024
73
+ ```
74
+
75
+ ## Command Line Interface
76
+
77
+ `haiku.rag` includes a CLI application for managing documents and performing searches from the command line:
78
+
79
+ ### Available Commands
80
+
81
+ ```bash
82
+ # List all documents
83
+ haiku-rag list
84
+
85
+ # Add document from text
86
+ haiku-rag add "Your document content here"
87
+
88
+ # Add document from file or URL
89
+ haiku-rag add-src /path/to/document.pdf
90
+ haiku-rag add-src https://example.com/article.html
91
+
92
+ # Get and display a specific document
93
+ haiku-rag get 1
94
+
95
+ # Delete a document by ID
96
+ haiku-rag delete 1
97
+
98
+ # Search documents
99
+ haiku-rag search "machine learning"
100
+
101
+ # Search with custom options
102
+ haiku-rag search "python programming" --limit 10 --k 100
103
+
104
+ # Start MCP server (default HTTP transport)
105
+ haiku-rag serve # --stdio for stdio transport or --sse for SSE transport
106
+ ```
107
+
108
+ All commands support the `--db` option to specify a custom database path. Run
109
+ ```bash
110
+ haiku-rag command -h
111
+ ```
112
+ to see additional parameters for a command.
113
+
114
+ ## MCP Server
115
+
116
+ `haiku.rag` includes a Model Context Protocol (MCP) server that exposes RAG functionality as tools for AI assistants like Claude Desktop. The MCP server provides the following tools:
117
+
118
+ - `add_document_from_file` - Add documents from local file paths
119
+ - `add_document_from_url` - Add documents from URLs
120
+ - `add_document_from_text` - Add documents from raw text content
121
+ - `search_documents` - Search documents using hybrid search
122
+ - `get_document` - Retrieve specific documents by ID
123
+ - `list_documents` - List all documents with pagination
124
+ - `delete_document` - Delete documents by ID
125
+
126
+ You can start the server (using Streamable HTTP, stdio or SSE transports) with:
127
+
128
+ ```bash
129
+ # Start with default HTTP transport
130
+ haiku-rag serve # --stdio for stdio transport or --sse for SSE transport
131
+ ```
132
+
133
+ ## Using `haiku.rag` from Python
134
+
135
+ ### Managing documents
136
+
137
+ ```python
138
+ from pathlib import Path
139
+ from haiku.rag.client import HaikuRAG
140
+
141
+ # Use as async context manager (recommended)
142
+ async with HaikuRAG("path/to/database.db") as client:
143
+     # Create document from text
144
+     doc = await client.create_document(
145
+         content="Your document content here",
146
+         uri="doc://example",
147
+         metadata={"source": "manual", "topic": "example"}
148
+     )
149
+
150
+     # Create document from file (auto-parses content)
151
+     doc = await client.create_document_from_source("path/to/document.pdf")
152
+
153
+     # Create document from URL
154
+     doc = await client.create_document_from_source("https://example.com/article.html")
155
+
156
+     # Retrieve documents
157
+     doc = await client.get_document_by_id(1)
158
+     doc = await client.get_document_by_uri("file:///path/to/document.pdf")
159
+
160
+     # List all documents with pagination
161
+     docs = await client.list_documents(limit=10, offset=0)
162
+
163
+     # Update document content
164
+     doc.content = "Updated content"
165
+     await client.update_document(doc)
166
+
167
+     # Delete document
168
+     await client.delete_document(doc.id)
169
+
170
+     # Search documents using hybrid search (vector + full-text)
171
+     results = await client.search("machine learning algorithms", limit=5)
172
+     for chunk, score in results:
173
+         print(f"Score: {score:.3f}")
174
+         print(f"Content: {chunk.content}")
175
+         print(f"Document ID: {chunk.document_id}")
176
+         print("---")
177
+ ```
178
+
179
+ ### Searching documents
180
+
181
+ ```python
182
+ async with HaikuRAG("database.db") as client:
183
+
184
+     results = await client.search(
185
+         query="machine learning",
186
+         limit=5,  # Maximum results to return, defaults to 5
187
+         k=60  # RRF parameter for reciprocal rank fusion, defaults to 60
188
+     )
189
+
190
+     # Process results
191
+     for chunk, relevance_score in results:
192
+         print(f"Relevance: {relevance_score:.3f}")
193
+         print(f"Content: {chunk.content}")
194
+         print(f"From document: {chunk.document_id}")
195
+ ```
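
Note on the README above: it describes hybrid search as merging `sqlite-vec` vector results with `FTS5` results through Reciprocal Rank Fusion, with `k` defaulting to 60. The package's actual fusion code is not shown in this diff, so the following is only a minimal sketch of the general RRF technique. The function name `rrf_fuse` and the sample ID lists are hypothetical and are not part of `haiku.rag`.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=5):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; not haiku.rag's implementation.
    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:limit]


# Made-up chunk IDs standing in for vector-search and full-text results.
vector_hits = [3, 1, 2]
fts_hits = [1, 4, 3]
for chunk_id, score in rrf_fuse([vector_hits, fts_hits]):
    print(chunk_id, round(score, 4))
```

A larger `k` flattens the difference between top and lower ranks, so items that appear in both lists tend to dominate items that rank first in only one.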
@@ -0,0 +1,27 @@
1
+ haiku/rag/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
+ haiku/rag/app.py,sha256=jJb5THgH3nbh2K8uiYsMVlkqVbSkIGEyxPMISM3epMA,4546
3
+ haiku/rag/chunker.py,sha256=lSSPWgNAe7gNZL_yNLmDtqxJix4YclOiG7gbARcEpV8,1871
4
+ haiku/rag/cli.py,sha256=XOxl7H86La7fB4DvsEJxtNuSfZgOtwqQDmECSaxv4sY,4020
5
+ haiku/rag/client.py,sha256=H5zE-HO8Asxo-_vEcnxFqvQixdiTFTqvNH8EkH7Xo4E,9713
6
+ haiku/rag/config.py,sha256=GxpfUwsQmfzQcknIAPEET_Qu-0WFYtPkHrV3arvNdxM,596
7
+ haiku/rag/mcp.py,sha256=tMN6fNX7ZtAER1R6DL1GkC9HZozTC4HzuQs199p7icI,4551
8
+ haiku/rag/reader.py,sha256=S7-Z72pDvSHedvgt4-RkTOwZadG88Oed9keJ69SVITk,962
9
+ haiku/rag/utils.py,sha256=6xVM6z2OmhzB4FEDlPbMsr_ZBBmCbMQb83nP6E2UdxY,629
10
+ haiku/rag/embeddings/__init__.py,sha256=jOamqhoeFX9J-ThwvVyHGd2s8jqJzA8B6J4sxHGZ39o,1007
11
+ haiku/rag/embeddings/base.py,sha256=PTAWKTU-Q-hXIhbRK1o6pIdpaW7DFdzJXQ0Nzc6VI-w,379
12
+ haiku/rag/embeddings/ollama.py,sha256=i_w7hbh-_ukysco274fLkQuFRgaFq0zIwIs8CNmRcLE,440
13
+ haiku/rag/embeddings/voyageai.py,sha256=MPioqQ0duzjglqvnN_8ftVq11fvBrcpV03p9MMLwflM,533
14
+ haiku/rag/store/__init__.py,sha256=hq0W0DAC7ysqhWSP2M2uHX8cbG6kbr-sWHxhq6qQcY0,103
15
+ haiku/rag/store/engine.py,sha256=BeYZRZ08zaYeeu375ysnAL3tGz4roA3GzP7WRNwznCo,2603
16
+ haiku/rag/store/models/__init__.py,sha256=s0E72zneGlowvZrFWaNxHYjOAUjgWdLxzdYsnvNRVlY,88
17
+ haiku/rag/store/models/chunk.py,sha256=D-fLHXtItXXyClj_KaE1OV-QQ-urDGS7lTE-qv2VHjw,223
18
+ haiku/rag/store/models/document.py,sha256=TVXVY-nQs-1vCORQEs9rA7zOtndeGC4dgCoujLAS054,396
19
+ haiku/rag/store/repositories/__init__.py,sha256=uIBhxjQh-4o3O-ck8b7BQ58qXQTuJdPvrDIHVhY5T1A,263
20
+ haiku/rag/store/repositories/base.py,sha256=cm3VyQXhtxvRfk1uJHpA0fDSxMpYN-mjQmRiDiLsQ68,1008
21
+ haiku/rag/store/repositories/chunk.py,sha256=6zABVlb5zbMQ4s50z9qb53ieHYaiv4CjgxpbsXxs814,14639
22
+ haiku/rag/store/repositories/document.py,sha256=xpWOpjHFbhVwNJ1gpusEKNY6l_Qyibg9y_bdHCwcfpk,7133
23
+ haiku_rag-0.1.0.dist-info/METADATA,sha256=kDmX6IcmvyL8ss4Go30_UDaSBA4TTzpkp6unzcDOgnM,6141
24
+ haiku_rag-0.1.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
25
+ haiku_rag-0.1.0.dist-info/entry_points.txt,sha256=G1U3nAkNd5YDYd4v0tuYFbriz0i-JheCsFuT9kIoGCI,48
26
+ haiku_rag-0.1.0.dist-info/licenses/LICENSE,sha256=eXZrWjSk9PwYFNK9yUczl3oPl95Z4V9UXH7bPN46iPo,1065
27
+ haiku_rag-0.1.0.dist-info/RECORD,,
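
Note on the RECORD entries above: each line has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed. A minimal sketch of how such a line could be recomputed to check a file from an unpacked wheel follows. The helper names `record_hash` and `record_line` are hypothetical.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return a RECORD-style hash: URL-safe base64 SHA-256 without '=' padding."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def record_line(path: str, data: bytes) -> str:
    """Build the RECORD line for a file given its archive path and contents."""
    return f"{path},{record_hash(data)},{len(data)}"


# haiku/rag/__init__.py is listed with size 0, so its contents are empty bytes.
print(record_line("haiku/rag/__init__.py", b""))
```

For the empty `haiku/rag/__init__.py`, the computed line matches the first RECORD entry in the hunk above.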
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.27.0
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ haiku-rag = haiku.rag.cli:cli
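
Note on the entry point above: `haiku-rag = haiku.rag.cli:cli` maps the `haiku-rag` command to the object `cli` in the module `haiku.rag.cli`. A rough sketch of how a `module:attribute` reference of this kind is resolved is shown below. The function `resolve_entry_point` is a hypothetical illustration, and the demo resolves a standard-library target so that it runs without the package installed.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attr.path' reference to a Python object.

    Hypothetical helper for illustration; installers generate a small
    launcher script that performs an equivalent import-and-lookup.
    """
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, if any.
    for attr in attr_path.split(".") if attr_path else []:
        obj = getattr(obj, attr)
    return obj


# Standard-library target used as a stand-in for "haiku.rag.cli:cli".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```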
@@ -0,0 +1,7 @@
1
+ Copyright 2025 Yiorgis Gozadinos
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.