nishant-ragkit 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. nishant_ragkit-0.1.2/LICENSE +21 -0
  2. nishant_ragkit-0.1.2/PKG-INFO +373 -0
  3. nishant_ragkit-0.1.2/README.md +343 -0
  4. nishant_ragkit-0.1.2/pyproject.toml +41 -0
  5. nishant_ragkit-0.1.2/setup.cfg +4 -0
  6. nishant_ragkit-0.1.2/src/nishant_ragkit.egg-info/PKG-INFO +373 -0
  7. nishant_ragkit-0.1.2/src/nishant_ragkit.egg-info/SOURCES.txt +29 -0
  8. nishant_ragkit-0.1.2/src/nishant_ragkit.egg-info/dependency_links.txt +1 -0
  9. nishant_ragkit-0.1.2/src/nishant_ragkit.egg-info/requires.txt +24 -0
  10. nishant_ragkit-0.1.2/src/nishant_ragkit.egg-info/top_level.txt +1 -0
  11. nishant_ragkit-0.1.2/src/ragkit/__init__.py +4 -0
  12. nishant_ragkit-0.1.2/src/ragkit/chains/__init__.py +4 -0
  13. nishant_ragkit-0.1.2/src/ragkit/chains/answer.py +37 -0
  14. nishant_ragkit-0.1.2/src/ragkit/chains/rewrite.py +26 -0
  15. nishant_ragkit-0.1.2/src/ragkit/config.py +20 -0
  16. nishant_ragkit-0.1.2/src/ragkit/core.py +286 -0
  17. nishant_ragkit-0.1.2/src/ragkit/embeddings/__init__.py +3 -0
  18. nishant_ragkit-0.1.2/src/ragkit/embeddings/factory.py +7 -0
  19. nishant_ragkit-0.1.2/src/ragkit/ingestion/__init__.py +4 -0
  20. nishant_ragkit-0.1.2/src/ragkit/ingestion/loaders.py +7 -0
  21. nishant_ragkit-0.1.2/src/ragkit/ingestion/splitters.py +9 -0
  22. nishant_ragkit-0.1.2/src/ragkit/memory/__init__.py +3 -0
  23. nishant_ragkit-0.1.2/src/ragkit/memory/history.py +17 -0
  24. nishant_ragkit-0.1.2/src/ragkit/providers/__init__.py +3 -0
  25. nishant_ragkit-0.1.2/src/ragkit/providers/llm.py +129 -0
  26. nishant_ragkit-0.1.2/src/ragkit/retrieval/__init__.py +3 -0
  27. nishant_ragkit-0.1.2/src/ragkit/retrieval/retriever.py +17 -0
  28. nishant_ragkit-0.1.2/src/ragkit/store/__init__.py +3 -0
  29. nishant_ragkit-0.1.2/src/ragkit/store/chroma_store.py +18 -0
  30. nishant_ragkit-0.1.2/src/ragkit/utils/__init__.py +3 -0
  31. nishant_ragkit-0.1.2/src/ragkit/utils/text.py +22 -0
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 J-Libraries

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,373 @@
Metadata-Version: 2.4
Name: nishant-ragkit
Version: 0.1.2
Summary: A Python library for document-based RAG
Author: Nishant Mishra
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain
Requires-Dist: langchain-core
Requires-Dist: langchain-community
Requires-Dist: langchain-text-splitters
Requires-Dist: langchain-classic
Requires-Dist: langchain-huggingface
Requires-Dist: chromadb
Requires-Dist: pypdf
Requires-Dist: sentence-transformers
Requires-Dist: python-dotenv
Provides-Extra: sarvam
Requires-Dist: langchain-sarvam; extra == "sarvam"
Provides-Extra: openai
Requires-Dist: langchain-openai; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic; extra == "anthropic"
Provides-Extra: all
Requires-Dist: langchain-sarvam; extra == "all"
Requires-Dist: langchain-openai; extra == "all"
Requires-Dist: langchain-anthropic; extra == "all"
Dynamic: license-file

# rag-kit

`rag-kit` is a simple, modular Python library for building PDF-based RAG applications with conversational memory and flexible LLM provider support.

It is designed to hide most of the LangChain complexity behind a clean API:

```python
from ragkit import PDFRAG

rag = PDFRAG("data/sample.pdf")
print(rag.ask("What is LangChain?"))
```

---

## Features

- PDF-based RAG
- Conversational chat with session memory
- Follow-up handling for queries like:
  - `hindi m batao`
  - `tell me in english`
  - `what did I ask earlier?`
- Query rewriting for better retrieval
- Source return support
- Configurable chunking and retrieval
- Multiple LLM provider support:
  - Sarvam (default)
  - OpenAI
  - Anthropic / Claude
  - Custom LangChain-compatible chat models

---

## Installation

### Basic install

```bash
pip install rag-kit
```

### Optional provider extras

```bash
pip install "rag-kit[openai]"
pip install "rag-kit[anthropic]"
pip install "rag-kit[all]"
```

### Local development install

```bash
pip install -e .
```

---

## Environment Variables

Create a `.env` file in your project root:

```env
SARVAM_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
```

An example template is provided in `.env.example`.
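Since `rag-kit` depends on `python-dotenv`, these keys are presumably read into the process environment via `load_dotenv()` (exactly where ragkit invokes it is an internal detail). As a rough illustration of what that load step does, here is a minimal stdlib-only sketch of a `.env` parser — not python-dotenv itself, which additionally handles quoting, `export` prefixes, and variable interpolation:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: reads KEY=VALUE lines into os.environ,
    skipping blank lines, '#' comments, and lines without '='.
    Existing environment variables are not overwritten."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

In your own scripts, prefer the real thing: `from dotenv import load_dotenv; load_dotenv()` before constructing `PDFRAG`.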

---

## Quick Start

### Stateless Q&A

```python
from ragkit import PDFRAG

rag = PDFRAG("data/sample.pdf")
answer = rag.ask("What is memory?")
print(answer)
```

### Chat with memory

```python
from ragkit import PDFRAG

rag = PDFRAG("data/sample.pdf")

session_id = "user1"

print(rag.chat("What is memory?", session_id=session_id))
print(rag.chat("hindi m batao", session_id=session_id))
print(rag.chat("tell me in english", session_id=session_id))
```
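`chat()` keys its memory by `session_id`, so different users get independent histories (the implementation lives in `src/ragkit/memory/history.py`). Conceptually it behaves like a dict of per-session turn lists; the sketch below illustrates that pattern and is not ragkit's actual class:

```python
from collections import defaultdict

class SessionHistory:
    """Illustrative per-session memory store: keeps (role, message)
    turns keyed by session_id, as chat(..., session_id=...) implies."""

    def __init__(self) -> None:
        self._store: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, session_id: str, role: str, message: str) -> None:
        self._store[session_id].append((role, message))

    def get(self, session_id: str) -> list[tuple[str, str]]:
        return list(self._store[session_id])

    def reset(self, session_id: str) -> None:
        # Dropping the key wipes that session without touching others.
        self._store.pop(session_id, None)
```

In this picture, `reset(session_id)` corresponds to `rag.reset_chat("user1")`.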

### Return sources

```python
from ragkit import PDFRAG

rag = PDFRAG("data/sample.pdf")
result = rag.ask("What is memory?", return_sources=True)

print(result["answer"])
print(result["sources"])
```

Example shape:

```python
{
    "answer": "Memory in LangChain stores previous conversation turns...",
    "sources": [
        {
            "content": "Memory in chat applications is created by storing earlier conversation turns...",
            "page": 2,
            "source": "data/sample.pdf",
            "metadata": {
                "page": 2,
                "source": "data/sample.pdf"
            }
        }
    ]
}
```
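Because each `sources` entry carries `source` and `page`, simple citations can be built from the result directly. The helper below is hypothetical (not part of ragkit's API) and assumes only the documented shape:

```python
def format_citations(result: dict) -> list[str]:
    """Turn the `sources` list from ask(..., return_sources=True)
    into "<file>, p. <page>" strings, de-duplicated in order."""
    seen: set[tuple[str, int]] = set()
    citations: list[str] = []
    for src in result.get("sources", []):
        key = (src["source"], src["page"])
        if key not in seen:
            seen.add(key)
            citations.append(f'{src["source"]}, p. {src["page"]}')
    return citations
```

For the example above this yields `["data/sample.pdf, p. 2"]`.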

---

## ask() vs chat()

| Method | Purpose |
|---|---|
| `ask()` | Stateless document Q&A |
| `chat()` | History-aware conversational interaction |

Use `ask()` when you want a direct answer from the document.

Use `chat()` when you want:

- follow-up questions
- translation of the previous answer
- history-based conversation

---

## LLM Providers

### Default: Sarvam

```python
from ragkit import PDFRAG

rag = PDFRAG("file.pdf")
```

### OpenAI

```python
from ragkit import PDFRAG

rag = PDFRAG(
    "file.pdf",
    llm_provider="openai",
    llm_config={
        "model": "gpt-4o-mini",
        "temperature": 0.1,
    },
)
```

### Claude

```python
from ragkit import PDFRAG

rag = PDFRAG(
    "file.pdf",
    llm_provider="claude",
    llm_config={
        "model": "claude-3-5-haiku-latest",
        "temperature": 0.2,
    },
)
```

### Custom LLM

```python
from langchain_openai import ChatOpenAI
from ragkit import PDFRAG

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
rag = PDFRAG("file.pdf", llm=llm)
```

---

## Configuration

```python
from ragkit import PDFRAG, RAGConfig

config = RAGConfig(
    chunk_size=800,
    chunk_overlap=150,
    top_k=5,
    use_multi_query=True,
    enable_query_rewrite=True,
)

rag = PDFRAG("file.pdf", config=config)
```

Configurable options currently include:

- `persist_directory`
- `chunk_size`
- `chunk_overlap`
- `top_k`
- `use_multi_query`
- `enable_query_rewrite`
- `collection_name`
- `verbose`
- `llm_provider`
- `llm_model`
- `llm_temperature`
- `llm_kwargs`
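ragkit delegates splitting to `langchain-text-splitters`, but the effect of `chunk_size` and `chunk_overlap` is easy to see with a simplified character-window splitter (an illustration only — the real splitter also prefers natural boundaries such as paragraph breaks):

```python
def split_text(text: str, chunk_size: int = 800, chunk_overlap: int = 150) -> list[str]:
    """Simplified character-window splitter: emit chunk_size-character
    windows, stepping forward by chunk_size - chunk_overlap so that
    consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With `chunk_size=800` and `chunk_overlap=150`, adjacent chunks share 150 characters, which helps retrieval when an answer straddles a chunk boundary; larger `top_k` then controls how many of these chunks the retriever passes to the LLM.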

---

## Add More Documents

```python
rag.add_documents("data/another.pdf")
```

---

## Reset Chat

```python
rag.reset_chat("user1")
```

---

## Project Structure

```text
rag-kit/
├── .env.example
├── .gitignore
├── README.md
├── pyproject.toml
├── examples/
├── data/
├── src/
│   └── ragkit/
└── third_party/
```

---

## Do You Need `requirements.txt`?

Not necessarily. For modern Python packaging, `pyproject.toml` is sufficient and should be the single source of truth for dependencies.

Add a `requirements.txt` only if you want one of the following:

- easier local setup for teammates
- a pinned development environment
- a quick install path for people who do not use packaging workflows

### Recommendation

Keep:

- `pyproject.toml` as the main dependency file

Optional:

- `requirements-dev.txt` for local development and testing

Example `requirements-dev.txt`:

```txt
pytest
black
ruff
build
twine
```

You can also generate a plain `requirements.txt` if you like, but it should not replace `pyproject.toml`.

---

## Current Limitations

- Primarily optimized for PDF-based RAG
- Sarvam support may depend on vendored or local integration setup
- No streaming support yet
- No FastAPI server or UI layer yet
- Agent support is planned but not included in the current public API

---

## Roadmap

- Better source citations
- Improved multi-file indexing isolation
- Streaming responses
- FastAPI server mode
- Playground / UI
- Agent support via `ragkit.agent`

---

## Examples

Check the `examples/` folder for runnable examples such as:

- `basic_ask.py`
- `chat_example.py`
- `provider_openai.py`

---

## License

MIT License

---

## Version

Current version: `0.1.2`

APIs may evolve in future releases.