projectdavid 1.32.21.tar.gz → 1.33.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of projectdavid might be problematic.

Files changed (68)
  1. {projectdavid-1.32.21 → projectdavid-1.33.1}/CHANGELOG.md +25 -0
  2. {projectdavid-1.32.21/src/projectdavid.egg-info → projectdavid-1.33.1}/PKG-INFO +19 -1
  3. {projectdavid-1.32.21 → projectdavid-1.33.1}/pyproject.toml +29 -2
  4. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/file_processor.py +232 -46
  5. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/vector_store_manager.py +50 -12
  6. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/vectors.py +266 -23
  7. {projectdavid-1.32.21 → projectdavid-1.33.1/src/projectdavid.egg-info}/PKG-INFO +19 -1
  8. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid.egg-info/requires.txt +19 -0
  9. {projectdavid-1.32.21 → projectdavid-1.33.1}/LICENSE +0 -0
  10. {projectdavid-1.32.21 → projectdavid-1.33.1}/MANIFEST.in +0 -0
  11. {projectdavid-1.32.21 → projectdavid-1.33.1}/README.md +0 -0
  12. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/assistants.md +0 -0
  13. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/code_interpretation.md +0 -0
  14. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/database.md +0 -0
  15. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/database_assistant_example.md +0 -0
  16. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/docker_comtainers.md +0 -0
  17. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/file_search.md +0 -0
  18. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/files.md +0 -0
  19. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/function_call_definition.md +0 -0
  20. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/function_calls.md +0 -0
  21. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/handling_function_calls.md +0 -0
  22. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/inference.md +0 -0
  23. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/messages.md +0 -0
  24. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/runs.md +0 -0
  25. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/streams.md +0 -0
  26. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/threads.md +0 -0
  27. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/tools.md +0 -0
  28. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/users.md +0 -0
  29. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/vector_store.md +0 -0
  30. {projectdavid-1.32.21 → projectdavid-1.33.1}/docs/versioning.md +0 -0
  31. {projectdavid-1.32.21 → projectdavid-1.33.1}/setup.cfg +0 -0
  32. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/__init__.py +0 -0
  33. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/_version.py +0 -0
  34. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/actions_client.py +0 -0
  35. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/api_key_client.py +0 -0
  36. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/assistants_client.py +0 -0
  37. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/base_client.py +0 -0
  38. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/base_vector_store.py +0 -0
  39. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/event_handler.py +0 -0
  40. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/file_search.py +0 -0
  41. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/files_client.py +0 -0
  42. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/inference_client.py +0 -0
  43. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/messages_client.py +0 -0
  44. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/runs.py +0 -0
  45. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/synchronous_inference_wrapper.py +0 -0
  46. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/threads_client.py +0 -0
  47. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/tools_client.py +0 -0
  48. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/users_client.py +0 -0
  49. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/constants/platform.py +0 -0
  50. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/entity.py +0 -0
  51. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/events.py +0 -0
  52. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/serializers.py +0 -0
  53. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/services/logging_service.py +0 -0
  54. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/synthesis/__init__.py +0 -0
  55. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/synthesis/llm_synthesizer.py +0 -0
  56. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/synthesis/prompt.py +0 -0
  57. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/synthesis/reranker.py +0 -0
  58. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/synthesis/retriever.py +0 -0
  59. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/__init__.py +0 -0
  60. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/function_call_suppressor.py +0 -0
  61. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/monitor_launcher.py +0 -0
  62. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/peek_gate.py +0 -0
  63. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/run_monitor.py +0 -0
  64. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/utils/vector_search_formatter.py +0 -0
  65. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid.egg-info/SOURCES.txt +0 -0
  66. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid.egg-info/dependency_links.txt +0 -0
  67. {projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid.egg-info/top_level.txt +0 -0
  68. {projectdavid-1.32.21 → projectdavid-1.33.1}/tests/test_clients.py +0 -0
{projectdavid-1.32.21 → projectdavid-1.33.1}/CHANGELOG.md

@@ -1,3 +1,28 @@
+ ## [1.33.1](https://github.com/frankie336/projectdavid/compare/v1.33.0...v1.33.1) (2025-06-10)
+
+
+ ### Bug Fixes
+
+ * Add create_vector_vision_store_for_user ([392813b](https://github.com/frankie336/projectdavid/commit/392813bef20e12c2aca456e349b6d937e686f78c))
+
+ # [1.33.0](https://github.com/frankie336/projectdavid/compare/v1.32.21...v1.33.0) (2025-06-10)
+
+
+ ### Features
+
+ * Add support for multi-modal image search ([58e7e27](https://github.com/frankie336/projectdavid/commit/58e7e270be849e36bcd93e6a19942fa3e8abbd25))
+ * Add support for multi-modal image search-1 ([b8ebc7c](https://github.com/frankie336/projectdavid/commit/b8ebc7c4fb73cec0bff1b98ee45fa5b52e41a9b3))
+ * Add support for multi-modal image search-1 ([2362069](https://github.com/frankie336/projectdavid/commit/2362069e4b5390b4eb2b1007a413a6adb1a8bc7b))
+ * Add support for multi-modal image search-2 ([07f81fe](https://github.com/frankie336/projectdavid/commit/07f81fe0a475652bc6d316f3dc45e341452f43b7))
+ * Add support for multi-modal image search-3 ([29bce72](https://github.com/frankie336/projectdavid/commit/29bce72b12e3b2b5d2daeafe2367908e0cc3b402))
+ * Add support for multi-modal image search-3 ([3f8149e](https://github.com/frankie336/projectdavid/commit/3f8149e31371efa8727b96fa16d92fbe5474f727))
+ * Add support for multi-modal image search-4 ([b434d6d](https://github.com/frankie336/projectdavid/commit/b434d6d035324f444b46bd49dd15cbed528527a5))
+ * Add support for multi-modal image search-4 ([6acddf0](https://github.com/frankie336/projectdavid/commit/6acddf0c3b38ed6ca9e786ddb6d8ebf1a1328ac5))
+ * Add support for multi-modal image search-5 ([1dd9dd9](https://github.com/frankie336/projectdavid/commit/1dd9dd9d91556df8a0089255efad82bfe3f9a6b6))
+ * Add support for multi-modal image search-6 ([33a6069](https://github.com/frankie336/projectdavid/commit/33a6069b9f7a9e9007c156d511b3cb8abf859760))
+ * Add support for multi-modal image search-7 ([01d68e5](https://github.com/frankie336/projectdavid/commit/01d68e591c8dbc52c81b6bfcd522bb95d27c9ddd))
+ * Add support for multi-modal image search-8 ([8663b2a](https://github.com/frankie336/projectdavid/commit/8663b2ab7f0f035ae953281d86ba01a0db926839))
+
  ## [1.32.21](https://github.com/frankie336/projectdavid/compare/v1.32.20...v1.32.21) (2025-06-10)


{projectdavid-1.32.21/src/projectdavid.egg-info → projectdavid-1.33.1}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: projectdavid
- Version: 1.32.21
+ Version: 1.33.1
  Summary: Python SDK for interacting with the Entities Assistant API.
  Author-email: Francis Neequaye Armah <francis.neequaye@projectdavid.co.uk>
  License: PolyForm Noncommercial License 1.0.0
@@ -29,6 +29,13 @@ Requires-Dist: sseclient-py
  Requires-Dist: requests
  Requires-Dist: python-docx
  Requires-Dist: python-pptx
+ Requires-Dist: open_clip_torch>=2.24
+ Requires-Dist: pillow>=10.2
+ Requires-Dist: transformers>=4.41
+ Requires-Dist: accelerate>=0.28
+ Requires-Dist: sentencepiece>=0.2
+ Requires-Dist: ultralytics>=8.2.21
+ Requires-Dist: pytesseract>=0.3
  Provides-Extra: dev
  Requires-Dist: black>=23.3; extra == "dev"
  Requires-Dist: isort>=5.12; extra == "dev"
@@ -36,6 +43,17 @@ Requires-Dist: pytest>=7.2; extra == "dev"
  Requires-Dist: mypy>=1.0; extra == "dev"
  Requires-Dist: build; extra == "dev"
  Requires-Dist: twine; extra == "dev"
+ Provides-Extra: vision
+ Requires-Dist: torch>=2.2.1; extra == "vision"
+ Requires-Dist: torchvision>=0.17.1; extra == "vision"
+ Requires-Dist: torchaudio>=2.2.1; extra == "vision"
+ Requires-Dist: open_clip_torch>=2.24; extra == "vision"
+ Requires-Dist: pillow>=10.2; extra == "vision"
+ Requires-Dist: transformers>=4.41; extra == "vision"
+ Requires-Dist: accelerate>=0.28; extra == "vision"
+ Requires-Dist: sentencepiece>=0.2; extra == "vision"
+ Requires-Dist: ultralytics>=8.2.21; extra == "vision"
+ Requires-Dist: pytesseract>=0.3; extra == "vision"
  Dynamic: license-file

  # Entity — by Project David
{projectdavid-1.32.21 → projectdavid-1.33.1}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "projectdavid"
- version = "1.32.21"
+ version = "1.33.1"
  description = "Python SDK for interacting with the Entities Assistant API."
  readme = "README.md"
  authors = [
@@ -26,10 +26,18 @@ dependencies = [
      "validators>=0.29.0,<0.35.0",
      "sentence-transformers>=3.4.0,<5.0",
      "sseclient-py",
-     "requests",
+     "requests",
      "python-docx",
      "python-pptx",

+     # Vision / multimodal dependencies
+     "open_clip_torch>=2.24",
+     "pillow>=10.2",
+     "transformers>=4.41",
+     "accelerate>=0.28",
+     "sentencepiece>=0.2",
+     "ultralytics>=8.2.21",
+     "pytesseract>=0.3",
  ]

  classifiers = [
@@ -52,5 +60,24 @@ dev = [
      "twine"
  ]

+ vision = [
+     # Users must supply the correct torch wheel (cpu / cu121 / cu118) at install time
+     "torch>=2.2.1",
+     "torchvision>=0.17.1",
+     "torchaudio>=2.2.1",
+
+     # OpenCLIP + captioning stack
+     "open_clip_torch>=2.24",
+     "pillow>=10.2",
+     "transformers>=4.41",
+     "accelerate>=0.28",
+     "sentencepiece>=0.2",
+     "ultralytics>=8.2.21",
+     "pytesseract>=0.3",
+
+     # Geolocation package pending release (uncomment when available)
+     # "geoloc-regio-net>=0.2.0 ; extra == 'vision'",
+ ]
+
  [tool.isort]
  profile = "black"
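
A note on the new `vision` extra above: the torch / torchvision / torchaudio pins are deliberately generic, so (as the inline comment says) the hardware-specific PyTorch wheel has to be chosen at install time. A hypothetical install command, shown for illustration only — the extra name comes from this diff, and the index URLs are the standard public PyTorch wheel indexes:

    # CUDA 12.1 host
    pip install "projectdavid[vision]" --extra-index-url https://download.pytorch.org/whl/cu121
    # CPU-only host
    pip install "projectdavid[vision]" --extra-index-url https://download.pytorch.org/whl/cpu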
{projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/file_processor.py

@@ -1,6 +1,8 @@
  import asyncio
  import csv
+ import hashlib
  import json
+ import math
  import re
  import textwrap
  from concurrent.futures import ThreadPoolExecutor
@@ -13,34 +15,124 @@ except ImportError: # 3.9–3.10
      from typing_extensions import LiteralString

  import numpy as np
+ import open_clip
  import pdfplumber
+ import torch
  from docx import Document
+ from PIL import Image
  from pptx import Presentation
+ from transformers import Blip2ForConditionalGeneration, Blip2Processor
+ from ultralytics import YOLO
+
+ # OCR fallback – optional
+ try:
+     import pytesseract  # noqa: F401  # pylint: disable=unused-import
+ except ImportError:
+     pytesseract = None
+
  from projectdavid_common import UtilsInterface
  from sentence_transformers import SentenceTransformer

  log = UtilsInterface.LoggingUtility()


+ def latlon_to_unit_vec(lat: float, lon: float) -> List[float]:
+     """Convert geographic lat/lon (deg) to a 3-D unit vector for Qdrant."""
+     lat_r = math.radians(lat)
+     lon_r = math.radians(lon)
+     return [
+         math.cos(lat_r) * math.cos(lon_r),
+         math.cos(lat_r) * math.sin(lon_r),
+         math.sin(lat_r),
+     ]
+
+
  class FileProcessor:
+     """Unified processor for text, tabular, office, JSON, **and image** files.
+
+     Each modality is embedded with its optimal model:
+       • Text    → paraphrase‑MiniLM‑L6‑v2 (384‑D)
+       • Image   → OpenCLIP ViT‑H/14 (1024‑D)
+       • Caption → OpenCLIP text head (1024‑D)
+
+     Rich captions are generated via BLIP‑2 Flan‑T5‑XL.
+     GPU usage is optional; pass `use_gpu=False` to stay on CPU.
+     """
+
      # ------------------------------------------------------------------ #
      # Construction
      # ------------------------------------------------------------------ #
-     def __init__(self, max_workers: int = 4, chunk_size: int = 512):
-         self.embedding_model = SentenceTransformer("paraphrase-MiniLM-L6-v2")
+     def __init__(
+         self,
+         *,
+         max_workers: int = 4,
+         chunk_size: int = 512,
+         use_gpu: bool = True,
+         use_ocr: bool = True,
+         use_detection: bool = False,
+         image_model_name: str = "ViT-H-14",
+         caption_model_name: str = "Salesforce/blip2-flan-t5-xl",
+     ):
+         # Device selection
+         if use_gpu and torch.cuda.is_available():
+             self.device = torch.device("cuda")
+             self.torch_dtype = torch.float16
+         else:
+             self.device = torch.device("cpu")
+             self.torch_dtype = torch.float32
+
+         # Feature flags
+         self.use_ocr = use_ocr and pytesseract is not None
+         self.use_detection = use_detection
+         if use_ocr and pytesseract is None:
+             log.warning("OCR requested but pytesseract not installed – skipping.")
+         if self.use_detection:
+             self.detector = YOLO("yolov8x.pt").to(self.device)
+
+         # Text embedder
          self.embedding_model_name = "paraphrase-MiniLM-L6-v2"
-         self._executor = ThreadPoolExecutor(max_workers=max_workers)
+         self.embedding_model = SentenceTransformer(self.embedding_model_name)
+         self.embedding_model.to(str(self.device))

-         # token limits
+         # Chunking parameters
          self.max_seq_length = self.embedding_model.get_max_seq_length()
          self.special_tokens_count = 2
          self.effective_max_length = self.max_seq_length - self.special_tokens_count
          self.chunk_size = min(chunk_size, self.effective_max_length * 4)

-         log.info("Initialized optimized FileProcessor")
+         # Image embedder
+         self.clip_model, _, self.clip_preprocess = (
+             open_clip.create_model_and_transforms(
+                 image_model_name,
+                 pretrained="laion2b_s32b_b79k",
+                 precision="fp16" if self.device.type == "cuda" else "fp32",
+             )
+         )
+         self.clip_model = self.clip_model.to(self.device).eval()
+         self.clip_tokenizer = open_clip.get_tokenizer(image_model_name)
+
+         # Caption generator
+         self.blip_processor = Blip2Processor.from_pretrained(caption_model_name)
+         self.blip_model = (
+             Blip2ForConditionalGeneration.from_pretrained(
+                 caption_model_name,
+                 torch_dtype=self.torch_dtype,
+             )
+             .to(self.device)
+             .eval()
+         )
+
+         # Executor & logging
+         self._executor = ThreadPoolExecutor(max_workers=max_workers)
+         log.info(
+             "FileProcessor ready (device=%s, OCR=%s, detection=%s)",
+             self.device,
+             self.use_ocr,
+             self.use_detection,
+         )

      # ------------------------------------------------------------------ #
-     # Generic validators
+     # Generic validators *
      # ------------------------------------------------------------------ #
      def validate_file(self, file_path: Path):
          """Ensure file exists and is under 100 MB."""
@@ -52,20 +144,10 @@ class FileProcessor:
              raise ValueError(f"{file_path.name} > {mb} MB limit")

      # ------------------------------------------------------------------ #
-     # File-type detection (simple extension map NO libmagic)
+     # Filetype detection (extension‑based – no libmagic)
      # ------------------------------------------------------------------ #
      def _detect_file_type(self, file_path: Path) -> str:
-         """
-         Return one of:
-
-           • 'pdf' • 'csv' • 'json'
-           • 'office' (.doc/.docx/.pptx)
-           • 'text' (code / markup / plain text)
-
-         Raises *ValueError* if the extension is not recognised.
-         """
          suffix = file_path.suffix.lower()
-
          if suffix == ".pdf":
              return "pdf"
          if suffix == ".csv":
@@ -74,7 +156,8 @@ class FileProcessor:
              return "json"
          if suffix in {".doc", ".docx", ".pptx"}:
              return "office"
-
+         if suffix in {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif", ".tiff"}:
+             return "image"
          text_exts = {
              ".txt",
              ".md",
@@ -96,29 +179,100 @@ class FileProcessor:
          }
          if suffix in text_exts:
              return "text"
-
          raise ValueError(f"Unsupported file type: {file_path.name} (ext={suffix})")

      # ------------------------------------------------------------------ #
-     # Public entry-point
+     # Dispatcher
      # ------------------------------------------------------------------ #
      async def process_file(self, file_path: Union[str, Path]) -> Dict[str, Any]:
-         """Validate → detect → dispatch to the appropriate processor."""
-         file_path = Path(file_path)
-         self.validate_file(file_path)
-         ftype = self._detect_file_type(file_path)
-
-         dispatch_map = {
-             "pdf": self._process_pdf,
-             "text": self._process_text,
-             "csv": self._process_csv,
-             "office": self._process_office,
-             "json": self._process_json,
+         path = Path(file_path)
+         self.validate_file(path)
+         ftype = self._detect_file_type(path)
+         return await getattr(self, f"_process_{ftype}")(path)
+
+     # ------------------------------------------------------------------ #
+     # Image processing (OpenCLIP + BLIP-2 + OCR + YOLO)
+     # ------------------------------------------------------------------ #
+     async def _process_image(self, file_path: Path) -> Dict[str, Any]:
+         loop = asyncio.get_event_loop()
+         img = await loop.run_in_executor(self._executor, Image.open, file_path)
+
+         # 1) Image vector
+         def enc_img():
+             with torch.no_grad():
+                 t = self.clip_preprocess(img).unsqueeze(0).to(self.device)
+                 v = self.clip_model.encode_image(t).squeeze()
+                 return (v / v.norm()).float().cpu().numpy()
+
+         image_vec = await loop.run_in_executor(self._executor, enc_img)
+
+         # 2) Caption
+         def gen_cap():
+             inp = self.blip_processor(images=img, return_tensors="pt").to(self.device)
+             with torch.no_grad():
+                 ids = self.blip_model.generate(**inp, max_new_tokens=50)
+             return self.blip_processor.decode(ids[0], skip_special_tokens=True)
+
+         caption = await loop.run_in_executor(self._executor, gen_cap)
+
+         # 3) OCR
+         if self.use_ocr:
+             text = await loop.run_in_executor(
+                 self._executor, pytesseract.image_to_string, img
+             )
+             if t := text.strip():
+                 caption += "\n" + t
+
+         # 4) Caption vector
+         def enc_txt():
+             with torch.no_grad():
+                 tok = self.clip_tokenizer(caption).unsqueeze(0).to(self.device)
+                 v = self.clip_model.encode_text(tok).squeeze()
+                 return (v / v.norm()).float().cpu().numpy()
+
+         caption_vec = await loop.run_in_executor(self._executor, enc_txt)
+
+         # 5) YOLO regions
+         region_vectors = []
+         if self.use_detection:
+             dets = self.detector(img)[0]
+             for box in dets.boxes:
+                 x1, y1, x2, y2 = map(int, box.xyxy[0].cpu().tolist())
+                 crop = img.crop((x1, y1, x2, y2))
+                 vec = self.encode_image(crop)
+                 region_vectors.append(
+                     {
+                         "vector": vec.tolist(),
+                         "bbox": [x1, y1, x2, y2],
+                         "label": dets.names[int(box.cls)],
+                         "conf": float(box.conf),
+                     }
+                 )
+
+         # Metadata
+         sha = hashlib.sha256(file_path.read_bytes()).hexdigest()
+         w, h = img.size
+         meta = {
+             "source": str(file_path),
+             "type": "image",
+             "width": w,
+             "height": h,
+             "mime": f"image/{file_path.suffix.lstrip('.')}",
+             "sha256": sha,
+             "embedding_model": "openclip-vit-h-14",
+             "caption": caption,
          }
-         if ftype not in dispatch_map:
-             raise ValueError(f"Unsupported file type: {file_path.suffix}")

-         return await dispatch_map[ftype](file_path)
+         result = {
+             "content": None,
+             "metadata": meta,
+             "chunks": [caption],
+             "vectors": [image_vec.tolist()],
+             "caption_vector": caption_vec.tolist(),
+         }
+         if region_vectors:
+             result["region_vectors"] = region_vectors
+         return result

      # ------------------------------------------------------------------ #
      # PDF
@@ -126,7 +280,6 @@ class FileProcessor:
      async def _process_pdf(self, file_path: Path) -> Dict[str, Any]:
          page_chunks, doc_meta = await self._extract_text(file_path)
          all_chunks, line_data = [], []
-
          for page_text, page_num, line_nums in page_chunks:
              lines = page_text.split("\n")
              buf, buf_lines, length = [], [], 0
@@ -165,7 +318,7 @@ class FileProcessor:
          }

      # ------------------------------------------------------------------ #
-     # Plain-text / code / markup
+     # Plaintext / code / markup
      # ------------------------------------------------------------------ #
      async def _process_text(self, file_path: Path) -> Dict[str, Any]:
          text, extra_meta, _ = await self._extract_text(file_path)
@@ -198,7 +351,6 @@ class FileProcessor:
                  continue
              texts.append(txt)
              metas.append({k: v for k, v in row.items() if k != text_field and v})
-
          vectors = await asyncio.gather(*[self._encode_chunk_async(t) for t in texts])
          return {
              "content": None,
@@ -209,7 +361,7 @@ class FileProcessor:
          }

      # ------------------------------------------------------------------ #
-     # Office docs (.doc/.docx/.pptx)
+     # Office docs
      # ------------------------------------------------------------------ #
      async def _process_office(self, file_path: Path) -> Dict[str, Any]:
          loop = asyncio.get_event_loop()
@@ -217,11 +369,10 @@ class FileProcessor:
              text = await loop.run_in_executor(
                  self._executor, self._read_docx, file_path
              )
-         else:  # .pptx
+         else:
              text = await loop.run_in_executor(
                  self._executor, self._read_pptx, file_path
              )
-
          chunks = self._chunk_text(text)
          vectors = await asyncio.gather(*[self._encode_chunk_async(c) for c in chunks])
          return {
@@ -267,11 +418,25 @@ class FileProcessor:
              return await loop.run_in_executor(
                  self._executor, self._extract_pdf_text, file_path
              )
-         else:
-             text = await loop.run_in_executor(
-                 self._executor, self._read_text_file, file_path
+         text = await loop.run_in_executor(
+             self._executor, self._read_text_file, file_path
+         )
+         return text, {}, []
+
+     # ------------------------------------------------------------------ #
+     # util: clip‑text encoder (public)
+     # ------------------------------------------------------------------ #
+     def encode_clip_text(self, text: Union[str, List[str]]) -> np.ndarray:
+         with torch.no_grad():
+             toks = (
+                 self.clip_tokenizer(text)
+                 if isinstance(text, str)
+                 else self.clip_tokenizer(text, truncate=True)
              )
-             return text, {}, []
+             tensor = toks.unsqueeze(0).to(self.device)
+             feat = self.clip_model.encode_text(tensor).squeeze()
+             feat = feat / feat.norm()
+             return feat.float().cpu().numpy()

      def _extract_pdf_text(self, file_path: Path):
          page_chunks, meta = [], {}
@@ -287,8 +452,8 @@ class FileProcessor:
                  lines = page.extract_text_lines()
                  sorted_lines = sorted(lines, key=lambda x: x["top"])
                  txts, nums = [], []
-                 for ln_idx, L in enumerate(sorted_lines, start=1):
-                     t = L.get("text", "").strip()
+                 for ln_idx, line in enumerate(sorted_lines, start=1):
+                     t = line.get("text", "").strip()
                      if t:
                          txts.append(t)
                          nums.append(ln_idx)
@@ -362,3 +527,24 @@ class FileProcessor:
              seg = tokens[i : i + self.effective_max_length]
              out.append(self.embedding_model.tokenizer.convert_tokens_to_string(seg))
          return out
+
+     # ------------------------------------------------------------------ #
+     # Retrieval helpers (optional use)
+     # ------------------------------------------------------------------ #
+     def encode_text(self, text: Union[str, List[str]]) -> np.ndarray:
+         """Embed raw text with the SentenceTransformer model."""
+         single = isinstance(text, str)
+         out = self.embedding_model.encode(
+             text,
+             convert_to_numpy=True,
+             normalize_embeddings=True,
+             show_progress_bar=False,
+         )
+         return out if not single else out[0]
+
+     def encode_image(self, img: Image.Image) -> np.ndarray:
+         with torch.no_grad():
+             tensor = self.clip_preprocess(img).unsqueeze(0).to(self.device)
+             feat = self.clip_model.encode_image(tensor).squeeze()
+             feat = feat / feat.norm()
+             return feat.float().cpu().numpy()
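
The file_processor.py changes above turn FileProcessor into a multi-modal embedder: images are routed to `_process_image`, which returns the BLIP-2 caption as the only text chunk, a 1024-D OpenCLIP image vector, and a separate 1024-D caption vector. A minimal usage sketch, assuming the module path implied by the file layout (`projectdavid.clients.file_processor`) and noting that the OpenCLIP and BLIP-2 weights are downloaded on first use:

    import asyncio

    from projectdavid.clients.file_processor import FileProcessor

    # Keyword-only constructor per the diff; CPU-only, no YOLO region detection.
    processor = FileProcessor(use_gpu=False, use_detection=False)

    result = asyncio.run(processor.process_file("holiday_photo.jpg"))

    print(result["metadata"]["caption"])   # BLIP-2 caption (plus OCR text when pytesseract is present)
    print(len(result["vectors"][0]))       # 1024-D OpenCLIP image embedding
    print(len(result["caption_vector"]))   # 1024-D OpenCLIP embedding of the caption text

Keeping the image vector and the caption vector separate is what allows both image-to-image search (queries built with `encode_image`) and text-to-image search (queries built with `encode_clip_text`) against the same store.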
{projectdavid-1.32.21 → projectdavid-1.33.1}/src/projectdavid/clients/vector_store_manager.py

@@ -50,11 +50,18 @@ class VectorStoreManager(BaseVectorStore):
      def create_store(
          self,
          collection_name: str,
+         *,
          vector_size: int = 384,
          distance: str = "COSINE",
+         vectors_config: Optional[Dict[str, qdrant.VectorParams]] = None,
      ) -> dict:
+         """
+         Create or recreate a Qdrant collection. By default creates a single-vector
+         collection with `vector_size`. To define multi-vector schema, pass
+         `vectors_config` mapping field names to VectorParams.
+         """
          try:
-             # quick existence check
+             # existence check
              if any(
                  col.name == collection_name
                  for col in self.client.get_collections().collections
@@ -65,16 +72,27 @@ class VectorStoreManager(BaseVectorStore):
              if dist not in qdrant.Distance.__members__:
                  raise ValueError(f"Invalid distance metric '{distance}'")

+             # choose schema
+             if vectors_config:
+                 config = vectors_config
+             else:
+                 config = {
+                     "_default": qdrant.VectorParams(
+                         size=vector_size, distance=qdrant.Distance[dist]
+                     )
+                 }
+
+             # recreate with full schema
              self.client.recreate_collection(
                  collection_name=collection_name,
-                 vectors_config=qdrant.VectorParams(
-                     size=vector_size, distance=qdrant.Distance[dist]
-                 ),
+                 vectors_config=config,
              )
+             # record metadata for each field
              self.active_stores[collection_name] = {
                  "created_at": int(time.time()),
                  "vector_size": vector_size,
                  "distance": dist,
+                 "fields": list(config.keys()),
              }
              log.info("Created Qdrant collection %s", collection_name)
              return {"collection_name": collection_name, "status": "created"}
@@ -103,8 +121,9 @@ class VectorStoreManager(BaseVectorStore):
                  "name": store_name,
                  "status": "active",
                  "vectors_count": info.points_count,
-                 "configuration": info.config.params["default"],
+                 "configuration": info.config.params,
                  "created_at": self.active_stores[store_name]["created_at"],
+                 "fields": self.active_stores[store_name].get("fields"),
              }
          except Exception as e:
              log.error("Store info failed: %s", e)
@@ -119,6 +138,8 @@ class VectorStoreManager(BaseVectorStore):
          texts: List[str],
          vectors: List[List[float]],
          metadata: List[dict],
+         *,
+         vector_name: Optional[str] = None,  # NEW
      ):
          if not vectors:
              raise ValueError("Empty vectors list")
@@ -136,7 +157,13 @@ class VectorStoreManager(BaseVectorStore):
              for txt, vec, meta in zip(texts, vectors, metadata)
          ]
          try:
-             self.client.upsert(collection_name=store_name, points=points, wait=True)
+             # pass vector_name if multi-column
+             self.client.upsert(
+                 collection_name=store_name,
+                 points=points,
+                 wait=True,
+                 vector_name=vector_name,  # ignored if None
+             )
              return {"status": "success", "points_inserted": len(points)}
          except Exception as e:
              log.error("Add‑to‑store failed: %s", e)
@@ -189,15 +216,25 @@ class VectorStoreManager(BaseVectorStore):
          query_vector: List[float],
          top_k: int = 5,
          filters: Optional[dict] = None,
+         *,
+         vector_field: Optional[str] = None,  # ← NEW
          score_threshold: float = 0.0,
          offset: int = 0,
          limit: Optional[int] = None,
      ) -> List[dict]:
-         """Run a similarity search that works with any 1.x qdrant‑client."""
+         """
+         Run a similarity search against *store_name*.
+
+         • Works with any Qdrant-client ≥ 1.0
+         • `vector_field` lets you target a non-default vector column
+           (e.g. ``\"caption_vector\"`` for image stores). Pass **None**
+           to use the collection’s default vector.
+         """

          limit = limit or top_k
          flt = self._dict_to_filter(filters) if filters else None

+         # ── shared kwargs ----------------------------------------------------
          common: Dict[str, Any] = dict(
              collection_name=store_name,
              query_vector=query_vector,
@@ -207,20 +244,21 @@ class VectorStoreManager(BaseVectorStore):
              with_payload=True,
              with_vectors=False,
          )
+         if vector_field:  # ← inject when requested
+             common["vector_name"] = vector_field

+         # ── call search (new client first, fallback to old) ------------------
          try:
-             # Newer clients (≥ 1.6) use `filter=`
-             res = self.client.search(**common, filter=flt)  # type: ignore[arg-type]
+             res = self.client.search(**common, filter=flt)  # ≥ 1.6
          except AssertionError as ae:
              if "Unknown arguments" not in str(ae):
                  raise
-             # Older clients use `query_filter=`
-             res = self.client.search(**common, query_filter=flt)  # type: ignore[arg-type]
-
+             res = self.client.search(**common, query_filter=flt)  # < 1.6
          except Exception as e:
              log.error("Query failed: %s", e)
              raise VectorStoreError(f"Query failed: {e}") from e

+         # ── normalise result -------------------------------------------------
          return [
              {
                  "id": p.id,