visual-rag-toolkit 0.1.1__py3-none-any.whl → 0.1.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: visual-rag-toolkit
- Version: 0.1.1
+ Version: 0.1.3
  Summary: End-to-end visual document retrieval with ColPali, featuring two-stage pooling for scalable search
  Project-URL: Homepage, https://github.com/Ara-Yeroyan/visual-rag-toolkit
  Project-URL: Documentation, https://github.com/Ara-Yeroyan/visual-rag-toolkit#readme
@@ -85,10 +85,9 @@ Description-Content-Type: text/markdown
 
  # Visual RAG Toolkit
 
- [![PyPI version](https://badge.fury.io/py/visual-rag-toolkit.svg)](https://badge.fury.io/py/visual-rag-toolkit)
- [![CI](https://github.com/Ara-Yeroyan/visual-rag-toolkit/actions/workflows/ci.yaml/badge.svg)](https://github.com/Ara-Yeroyan/visual-rag-toolkit/actions/workflows/ci.yaml)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+ [![PyPI](https://img.shields.io/pypi/v/visual-rag-toolkit)](https://pypi.org/project/visual-rag-toolkit/)
+ [![Python](https://img.shields.io/pypi/pyversions/visual-rag-toolkit)](https://pypi.org/project/visual-rag-toolkit/)
+ [![License](https://img.shields.io/pypi/l/visual-rag-toolkit)](LICENSE)
 
  End-to-end visual document retrieval toolkit featuring **fast multi-stage retrieval** (prefetch with pooled vectors + exact MaxSim reranking).
 
@@ -112,11 +111,10 @@ This repo contains:
  pip install visual-rag-toolkit
 
  # With specific features
- pip install visual-rag-toolkit[embedding] # ColSmol/ColPali embedding support
- pip install visual-rag-toolkit[pdf] # PDF processing
+ pip install visual-rag-toolkit[ui] # Streamlit demo dependencies
  pip install visual-rag-toolkit[qdrant] # Vector database
+ pip install visual-rag-toolkit[embedding] # ColSmol/ColPali embedding support
  pip install visual-rag-toolkit[cloudinary] # Image CDN
- pip install visual-rag-toolkit[ui] # Streamlit demo dependencies
 
  # All dependencies
  pip install visual-rag-toolkit[all]
@@ -157,6 +155,95 @@ for r in results[:3]:
      print(r["id"], r["score_final"])
  ```
 
+ ### End-to-end: ingest PDFs (with cropping) → index in Qdrant
+
+ This is the "SDK-style" pipeline: PDF → images → optional crop → embed → store vectors + payload in Qdrant.
+
+ ```python
+ import os
+ from pathlib import Path
+
+ import numpy as np
+ import torch
+
+ from visual_rag import VisualEmbedder
+ from visual_rag.indexing import ProcessingPipeline, QdrantIndexer
+
+ QDRANT_URL = os.environ["QDRANT_URL"]
+ QDRANT_KEY = os.getenv("QDRANT_API_KEY", "")
+
+ collection = "my_visual_docs"
+
+ embedder = VisualEmbedder(
+     model_name="vidore/colSmol-500M",
+     torch_dtype=torch.float16,
+     output_dtype=np.float16,
+     batch_size=8,
+ )
+
+ indexer = QdrantIndexer(
+     url=QDRANT_URL,
+     api_key=QDRANT_KEY,
+     collection_name=collection,
+     prefer_grpc=True,
+     vector_datatype="float16",
+ )
+
+ # Creates collection + required payload indexes (e.g., "filename" for skip_existing)
+ indexer.create_collection(force_recreate=False)
+
+ pipeline = ProcessingPipeline(
+     embedder=embedder,
+     indexer=indexer,
+     embedding_strategy="all", # store full tokens + pooled vectors in one pass
+     crop_empty=True,
+     crop_empty_percentage_to_remove=0.99, # kept for traceability
+     crop_empty_remove_page_number=True,
+     crop_empty_preserve_border_px=1,
+     crop_empty_uniform_rowcol_std_threshold=3.0,
+ )
+
+ pdfs = [Path("docs/a.pdf"), Path("docs/b.pdf")]
+ for pdf_path in pdfs:
+     result = pipeline.process_pdf(
+         pdf_path,
+         skip_existing=True, # Skip pages already in Qdrant (uses filename index)
+         upload_to_cloudinary=False,
+         upload_to_qdrant=True,
+     )
+     # Logs automatically shown:
+     # [10:23:45] 📚 Processing PDF: a.pdf
+     # [10:23:45] 🖼️ Converting PDF to images...
+     # [10:23:46] ✅ Converted 12 pages
+     # [10:23:46] 📦 Processing pages 1-8/12
+     # [10:23:46] 🤖 Generating embeddings for 8 pages...
+     # [10:23:48] 📤 Uploading batch of 8 pages...
+     # [10:23:48] ✅ Uploaded 8 points to Qdrant
+     # [10:23:48] 📦 Processing pages 9-12/12
+     # [10:23:48] 🤖 Generating embeddings for 4 pages...
+     # [10:23:50] 📤 Uploading batch of 4 pages...
+     # [10:23:50] ✅ Uploaded 4 points to Qdrant
+     # [10:23:50] ✅ Completed a.pdf: 12 uploaded, 0 skipped, 0 failed
+ ```
+
+ CLI equivalent:
+
+ ```bash
+ export QDRANT_URL="https://YOUR_QDRANT"
+ export QDRANT_API_KEY="YOUR_KEY"
+
+ visual-rag process \
+ --reports-dir ./docs \
+ --collection my_visual_docs \
+ --model vidore/colSmol-500M \
+ --strategy all \
+ --batch-size 8 \
+ --qdrant-vector-dtype float16 \
+ --prefer-grpc \
+ --crop-empty \
+ --crop-empty-remove-page-number
+ ```
+
  ### Process a PDF into images (no embedding, no vector DB)
 
  ```python
@@ -186,7 +273,7 @@ Stage 2: Exact MaxSim reranking on candidates
  └── Return top-k results (e.g., 10)
  ```
 
- Three-stage extends this with an additional cheap prefetch stage before stage 2.
+ Three-stage extends this with an additional "cheap prefetch" stage before stage 2.
 
  ## 📁 Package Structure
 
@@ -209,16 +296,11 @@ visual-rag-toolkit/
  Configure via environment variables or YAML:
 
  ```bash
- # Qdrant credentials (preferred names used by the demo + scripts)
- export SIGIR_QDRANT_URL="https://your-cluster.qdrant.io"
- export SIGIR_QDRANT_KEY="your-api-key"
 
- # Backwards-compatible fallbacks (also supported)
+ # Qdrant credentials (preferred names used by the demo + scripts)
  export QDRANT_URL="https://your-cluster.qdrant.io"
  export QDRANT_API_KEY="your-api-key"
 
- export VISUALRAG_MODEL="vidore/colSmol-500M"
-
  # Special token handling (default: filter them out)
  export VISUALRAG_INCLUDE_SPECIAL_TOKENS=true # Include special tokens
  ```
@@ -269,7 +351,7 @@ python -m benchmarks.vidore_beir_qdrant.run_qdrant_beir \
  ```
 
  More commands (including multi-stage variants and cropping configs) live in:
- - `benchmarks/vidore_tatdqa_test/COMMANDS.md`
+ - `examples/COMMANDS.md`
 
  ## 🔧 Development
 
@@ -302,4 +384,3 @@ MIT License - see [LICENSE](LICENSE) for details.
  - [Qdrant](https://qdrant.tech/) - Vector database with multi-vector support
  - [ColPali](https://github.com/illuin-tech/colpali) - Visual document retrieval models
  - [ViDoRe](https://huggingface.co/spaces/vidore/vidore-leaderboard) - Benchmark dataset
-
@@ -6,31 +6,30 @@ benchmarks/prepare_submission.py,sha256=wD9sLWDqkQw_OANmVOdwe7OQlv4ZVf4sTQiQs7La
  benchmarks/quick_test.py,sha256=Mdcf2FNYSqWpYVfCmQLQzUVWLG-FiKUnyHyHKnAR3z4,20531
  benchmarks/run_vidore.py,sha256=RuDaEJ0wIV-hLHRtcd8PsRGOEEUFYDcrjUlor-HAajc,16373
  benchmarks/vidore_beir_qdrant/run_qdrant_beir.py,sha256=0lqIA6Qv53CreJpOg-h48sl4c8m7c_pVoQCp-oscnG0,56715
- benchmarks/vidore_tatdqa_test/COMMANDS.md,sha256=lhobkqHLZJjIPE-Lo3VuBuKh5XpbT2WS_sK-6dasPcE,1890
  benchmarks/vidore_tatdqa_test/__init__.py,sha256=WZiwKx8BGNuc0-oz1V3yiq8m_gWc5woEWy-WGb4F14E,18
  benchmarks/vidore_tatdqa_test/dataset_loader.py,sha256=gCCneGAKWQm0WlJHLvGjoMrAbm5b9cPEflkoMimtA2s,12795
  benchmarks/vidore_tatdqa_test/metrics.py,sha256=cLdYbRt5VcxInO1cN79ve6ZLP3kaSxRkdzRX3IbPPMs,1112
  benchmarks/vidore_tatdqa_test/run_qdrant.py,sha256=_PikeqIYpWPim-KEQOwvT-aqwYoAWASjqJVisi8PfQg,28681
  benchmarks/vidore_tatdqa_test/sweep_eval.py,sha256=d_kbyNTJ1LoFfIVnsZyiRO1nKyMqmRB5jEweZL6kYd4,12688
  demo/__init__.py,sha256=jVzjsVKZl5ZZuFxawA8Pxj3yuIKL7llkao3rBpde-aQ,204
- demo/app.py,sha256=1GZJ_JhVWvqoBewngc8tHeiuM1fNbxddEO6ZsEdwBfg,1029
+ demo/app.py,sha256=nZbCz1mpRK-GZTgOHyz4m4AfgKFgsH-09JwXeL3d3ng,1405
  demo/commands.py,sha256=qxRE2x610yZvcjwEfSKiR9CyFonX-vRxFqQNJCUKfyA,13690
  demo/config.py,sha256=BNkV4NSEEMIV9e6Z-cxds2v247uVmTPCgL-M5ItPzMg,757
  demo/download_models.py,sha256=J10qQt2TpEshVOxvCX_ZSbV7YozIBqDATZnt8fUKFHs,2868
- demo/evaluation.py,sha256=wiVxzRu3UZ5wAwHlpSKQ6srZjnSR06dgQw3G0OOV2Eg,28954
+ demo/evaluation.py,sha256=4ixJGg50KAVNiZ_mr5FMVv-QKCrZRooJ80LbrjKXM1s,27467
  demo/example_metadata_mapping_sigir.json,sha256=UCgqZtr6Wnq_vS7zxPxpvuokk9gxOVgKydC7f1lauw8,824
- demo/indexing.py,sha256=NLtGYnuCCb3uHGCgs8KHlLqKR-FSD6sxW3PlEw9UhYM,12853
- demo/qdrant_utils.py,sha256=VWEC7BwhMjjB7iIS5iaVDMGt_CMh9mQG4F94k1Pt0yA,7677
+ demo/indexing.py,sha256=qUVEB3QrIolS53Ggxurccbh-QyeLLbzcY5TLyVBVKME,10620
+ demo/qdrant_utils.py,sha256=Xh-thLIrACrYkFCrqazYNH0p3vS8_yMCaTbvt4HAy98,7778
  demo/results.py,sha256=dprvxnyHwxJvkAQuh4deaCsiEG1wm0n9svPyxI37vJg,1050
  demo/test_qdrant_connection.py,sha256=hkbyl3zGsw_GdBBp5MkW_3SBKTHXbwH3Sr_pUE54_po,3866
  demo/ui/__init__.py,sha256=EyBCvnXYfPbdyxJzyp9TjQBeJJUgmOY1yRHkUeC6JFQ,412
  demo/ui/benchmark.py,sha256=HiGCN4HrqeOC7L6t2kuzIiyWdcVE_cP2JTxoewrmPSo,14218
  demo/ui/header.py,sha256=J2hXr_nNyg1H9rmrd-EGx3WUl7lYo-Ca30ptgzBCfBs,806
- demo/ui/playground.py,sha256=Z3OgCWOzzTld1I3eN1IcTadaSzsqDQf7MiHwTbxbvJA,13692
- demo/ui/sidebar.py,sha256=muVCnvoeMOm1rHx7UPt68yLXlG3OERdXvJ3QqIXAUoc,7839
- demo/ui/upload.py,sha256=BHJmbIQOAYdMF_svxlRSYIe163Y5UX5P_gilJ09YHSA,20372
- visual_rag/__init__.py,sha256=UkGFXjPmjbO6Iad8ty1uJOMQsVMpV_s63ihchHltLx8,2555
- visual_rag/config.py,sha256=pd48M3j3n8ZV1HhaabMmP_uoEJnqhBC-Bma9vuvc8V4,7368
+ demo/ui/playground.py,sha256=yRlWWzJgsc596vALn5f0PHhmhtJCMmfv61nYakW75GQ,13672
+ demo/ui/sidebar.py,sha256=DLVhEj-8xAJCXUwOhndNv8ZFT4K3u8iE6FVOoH-jRuA,7699
+ demo/ui/upload.py,sha256=6iv4xDsacMtUF1FrquRBE_xNb92HevgxCMS0LBK4Ay0,20455
+ visual_rag/__init__.py,sha256=4NksVCaN_p32ezMF1N-oxpPFKeOm8xRo70VC4OSa2a0,3911
+ visual_rag/config.py,sha256=qqSQk2lM5MiRji-6xQNGS2gSiXA4NgyJnCbgGx7uGJQ,7395
  visual_rag/demo_runner.py,sha256=wi0Wz3gZ39l4aovMd6zURq_CKUSgma4kGjF6hpQHwGY,2793
  visual_rag/qdrant_admin.py,sha256=NNczko2S5-K3qATNUxgYn51hNWgWb6boheL7vlCQGpM,7055
  visual_rag/cli/__init__.py,sha256=WgBRXm0VACfLltvVlLcSs3FTM1uQ7Uuw3CVD4-zWZwc,46
@@ -38,22 +37,22 @@ visual_rag/cli/main.py,sha256=QmpnQ0lbC6Q9lwxaSCDh6paEEzI78IPY1jwc3_9y7VI,21083
  visual_rag/embedding/__init__.py,sha256=7QIENmxwRnwnUzsYKRY3VQTyF3HJkRiL1D7Au9XHF0w,682
  visual_rag/embedding/pooling.py,sha256=x8uY4VHbxEnsJRM2JeOkzPHDiwOkbi5NK4XW21U1hAc,11401
  visual_rag/embedding/visual_embedder.py,sha256=he9JpVHmo_szOiXCwtJdrCseGmf2y5Gi0UEFjwazzVY,23198
- visual_rag/indexing/__init__.py,sha256=pMLuinCIERbwWechn176nMrtlmTp0ySfuj8gdkNvRks,679
+ visual_rag/indexing/__init__.py,sha256=rloBEBt3x8BQut1Tj1n8fuaQ3iXMS3pm64o8n-NlSAw,985
  visual_rag/indexing/cloudinary_uploader.py,sha256=e-G5du4D7z6mWWl2lahMidG-Wdc-baImFFILTojebpA,8826
  visual_rag/indexing/pdf_processor.py,sha256=V3RAKpwgIFicqUaXzaaljePxh_oP4UV5W0aiJyfv0BY,10247
  visual_rag/indexing/pipeline.py,sha256=1ScpVRlLCq2FWi3IPvlQcIfDCQQ2F64IlRd9ZZHiTaA,25037
- visual_rag/indexing/qdrant_indexer.py,sha256=uUOA-6Qkd_vEeP1LdgGyoh1FHu1ZNEyYKuNxJAqetBU,17121
+ visual_rag/indexing/qdrant_indexer.py,sha256=Q0e8JCr9B1OxgOMW7BWeg7MlWiLPaBUmjoFof4gZFYo,19519
  visual_rag/preprocessing/__init__.py,sha256=rCzfBO0jaVKp6MpPRRused_4gasHfobAbG-139Y806E,121
  visual_rag/preprocessing/crop_empty.py,sha256=iHXITFkRlF40VPJ4k9d432RUAi_89BhAEvK4wOEn96Q,5211
  visual_rag/retrieval/__init__.py,sha256=J9pnbeB83Fqs9n4g3GcIp1VR9dnuyAlcsIDVsf0lSb8,601
- visual_rag/retrieval/multi_vector.py,sha256=m5PKjkj0TFeWNccKNmCqghTM5b9ARr43Lq3sRhOxnjw,7381
- visual_rag/retrieval/single_stage.py,sha256=TSndnh4Kz9aT_0kKhNyLEvokbDLkgq--lXuyldzP5sU,4105
+ visual_rag/retrieval/multi_vector.py,sha256=ZZ_O4x7MZbhF--kRp8T4UJG5GuenfjJ91FKicklhK3Q,7006
+ visual_rag/retrieval/single_stage.py,sha256=Ba06V-KRSFSZm0xzbjFR3EBEWaQkDo7U_pWNx25W8H0,4425
  visual_rag/retrieval/three_stage.py,sha256=YC0CVEohxTT5zhilcQHI7nYAk08E5jC3zkQ3-rNdLMw,5951
- visual_rag/retrieval/two_stage.py,sha256=_RnEgIx_qY4yu2iIk0a3w47D7WiKHlmBivm5gLEpyI4,16779
+ visual_rag/retrieval/two_stage.py,sha256=JJ6rXv_3_3WLIjAcxOY7NuhSyuPzIMyHf3ooiGFTp9k,16776
  visual_rag/visualization/__init__.py,sha256=SITKNvBEseDp7F3K6UzLPA-6OQFqYfY5azS5nlDdihQ,447
  visual_rag/visualization/saliency.py,sha256=F3Plc18Sf3tzWcyncuaruTmENm1IfW5j9NFGEQR93cY,11248
- visual_rag_toolkit-0.1.1.dist-info/METADATA,sha256=SL55eEexz2ogZPD5q-gfzpF2TVZ_U1ZwykPlHaggEdU,11070
- visual_rag_toolkit-0.1.1.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
- visual_rag_toolkit-0.1.1.dist-info/entry_points.txt,sha256=6Tob1GPg_ILGELjYTPsAnNMZ1W0NS939nfI7xyW2DIY,102
- visual_rag_toolkit-0.1.1.dist-info/licenses/LICENSE,sha256=hEg_weKnHXJakQRR3sw2ygcZ101zCI00zMhBOPb3yfA,1069
- visual_rag_toolkit-0.1.1.dist-info/RECORD,,
+ visual_rag_toolkit-0.1.3.dist-info/METADATA,sha256=IQaXJV0GkBuRZG5JTkFA-Zv6pboaAeoWoQIeWzu7-Z4,13180
+ visual_rag_toolkit-0.1.3.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
+ visual_rag_toolkit-0.1.3.dist-info/entry_points.txt,sha256=6Tob1GPg_ILGELjYTPsAnNMZ1W0NS939nfI7xyW2DIY,102
+ visual_rag_toolkit-0.1.3.dist-info/licenses/LICENSE,sha256=hEg_weKnHXJakQRR3sw2ygcZ101zCI00zMhBOPb3yfA,1069
+ visual_rag_toolkit-0.1.3.dist-info/RECORD,,
@@ -1,83 +0,0 @@
- # ViDoRe TAT-DQA (Qdrant) — commands
-
- ## Environment
-
- Either export:
-
- ```bash
- export QDRANT_URL="..."
- export QDRANT_API_KEY="..." # optional
- ```
-
- Or create a `.env` file in `visual-rag-toolkit/` with the same variables.
-
- ## Index + evaluate (single run)
-
- This is the “all-in-one” script (indexes, then evaluates once):
-
- ```bash
- python -m benchmarks.vidore_tatdqa_test.run_qdrant \
- --dataset vidore/tatdqa_test \
- --collection vidore_tatdqa_test \
- --recreate --index \
- --indexing-threshold 0 \
- --batch-size 6 \
- --upload-batch-size 12 \
- --upload-workers 0 \
- --loader-workers 0 \
- --prefer-grpc \
- --torch-dtype float16 \
- --no-upsert-wait \
- --qdrant-vector-dtype float16
- ```
-
- ## Evaluate only (no re-index) — baseline + sweeps
-
- These commands assume the Qdrant collection already exists and is populated.
-
- ### Baseline: single-stage full MaxSim
-
- ```bash
- python -m benchmarks.vidore_tatdqa_test.sweep_eval \
- --dataset vidore/tatdqa_test \
- --collection vidore_tatdqa_test \
- --prefer-grpc \
- --mode single_full \
- --torch-dtype auto \
- --query-batch-size 32 \
- --top-k 10 \
- --out-dir results/sweeps
- ```
-
- ### Two-stage sweep (preferred): stage-1 tokens vs tiles, stage-2 full rerank
-
- ```bash
- python -m benchmarks.vidore_tatdqa_test.sweep_eval \
- --dataset vidore/tatdqa_test \
- --collection vidore_tatdqa_test \
- --prefer-grpc \
- --mode two_stage \
- --stage1-mode tokens_vs_tiles \
- --prefetch-ks 20,50,100,200,400 \
- --torch-dtype auto \
- --query-batch-size 32 \
- --top-k 10 \
- --out-dir results/sweeps
- ```
-
- ### Smoke test (optional): run only N queries
-
- ```bash
- python -m benchmarks.vidore_tatdqa_test.sweep_eval \
- --dataset vidore/tatdqa_test \
- --collection vidore_tatdqa_test \
- --prefer-grpc \
- --mode single_full \
- --torch-dtype auto \
- --query-batch-size 32 \
- --top-k 10 \
- --max-queries 50 \
- --out-dir results/sweeps
- ```
-
-