PyPI - cocoindex - Versions diffs - 0.2.18__cp311-abi3-manylinux_2_28_aarch64.whl → 0.2.20__cp311-abi3-manylinux_2_28_aarch64.whl - Mend

Files changed (9)

cocoindex/_engine.abi3.so CHANGED

Binary file

cocoindex/llm.py CHANGED

@@ -14,6 +14,7 @@ class LlmApiType(Enum):
     OPEN_ROUTER = "OpenRouter"
     VOYAGE = "Voyage"
     VLLM = "Vllm"
+    BEDROCK = "Bedrock"
 @dataclass
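
The hunk above adds a `BEDROCK` member to `LlmApiType`. As a minimal sketch, the stand-in enum below lists only the members visible in this hunk (the enum in `cocoindex/llm.py` has more), and `parse_api_type` is a hypothetical helper, not part of the package. It shows how the new member resolves from its string value:

```python
from enum import Enum


class LlmApiType(Enum):
    """Stand-in limited to the members visible in the hunk above."""

    OPEN_ROUTER = "OpenRouter"
    VOYAGE = "Voyage"
    VLLM = "Vllm"
    BEDROCK = "Bedrock"  # added in 0.2.20


def parse_api_type(value: str) -> LlmApiType:
    """Hypothetical helper: map a config string to an enum member."""
    try:
        # Enum lookup by value is case-sensitive: "Bedrock", not "bedrock".
        return LlmApiType(value)
    except ValueError:
        known = ", ".join(member.value for member in LlmApiType)
        raise ValueError(
            f"Unknown API type {value!r}; expected one of: {known}"
        ) from None
```

Code pinned to 0.2.18 would not have this member, so a configuration that names `"Bedrock"` presumably needs 0.2.20 or later.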

cocoindex/sources/_engine_builtin_specs.py CHANGED

@@ -100,3 +100,6 @@ class Postgres(op.SourceSpec):
     # Optional: when set, supports change capture from PostgreSQL notification.
     notification: PostgresNotification | None = None
+    # Optional: SQL expression filter for rows (arbitrary SQL boolean expression)
+    filter: str | None = None
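
The new `filter` field takes an arbitrary SQL boolean expression. The diff does not show how the engine applies it (that logic lives in the binary `_engine.abi3.so`), so the following is only a sketch under assumptions: `PostgresSourceSketch`, its `table_name` field, and `build_select` are hypothetical names used to illustrate one way a row filter could end up in a `WHERE` clause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostgresSourceSketch:
    """Stand-in for the source spec; only `filter` comes from the diff."""

    table_name: str  # hypothetical field, for illustration only
    filter: Optional[str] = None  # mirrors the field added in 0.2.20


def build_select(spec: PostgresSourceSketch) -> str:
    """Hypothetical: apply the optional row filter as a WHERE clause."""
    query = f'SELECT * FROM "{spec.table_name}"'
    if spec.filter is not None:
        # Parenthesize so the expression stays a single boolean unit.
        query += f" WHERE ({spec.filter})"
    return query


print(build_select(PostgresSourceSketch("orders", filter="status = 'active'")))
```

Because the expression is arbitrary SQL, it is safest to treat it as trusted configuration and not to build it from user input.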

{cocoindex-0.2.18.dist-info → cocoindex-0.2.20.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cocoindex
-Version: 0.2.18
+Version: 0.2.20
 Classifier: Development Status :: 3 - Alpha
 Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Operating System :: OS Independent
@@ -75,7 +75,6 @@ Project-URL: Homepage, https://cocoindex.io/
     <a href="https://trendshift.io/repositories/13939" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13939" alt="cocoindex-io%2Fcocoindex | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 </div>
 Ultra performant data transformation framework for AI, with core engine written in Rust. Support incremental processing and data lineage out-of-box.  Exceptional developer velocity. Production-ready at day 0.
 ⭐ Drop a star to help us grow!
@@ -113,9 +112,8 @@ CocoIndex makes it effortless to transform data with AI, and keep source data an
 </br>
 ## Exceptional velocity
 Just declare transformation in dataflow with ~100 lines of python
 ```python
@@ -139,6 +137,7 @@ CocoIndex follows the idea of [Dataflow](https://en.wikipedia.org/wiki/Dataflow_
 **Particularly**, developers don't explicitly mutate data by creating, updating and deleting. They just need to define transformation/formula for a set of source data.
 ## Plug-and-Play Building Blocks
 Native builtins for different source, targets and transformations. Standardize interface, make it 1-line code switch between different components - as easy as assembling building blocks.
 <p align="center">
@@ -146,6 +145,7 @@ Native builtins for different source, targets and transformations. Standardize i
 </p>
 ## Data Freshness
 CocoIndex keep source data and target in sync effortlessly.
 <p align="center">
@@ -153,11 +153,14 @@ CocoIndex keep source data and target in sync effortlessly.
 </p>
 It has out-of-box support for incremental indexing:
 - minimal recomputation on source or logic change.
 - (re-)processing necessary portions; reuse cache when possible
-## Quick Start:
+## Quick Start
 If you're new to CocoIndex, we recommend checking out
 - 📖 [Documentation](https://cocoindex.io/docs)
 - ⚡  [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart)
 - 🎬 [Quick Start Video Tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT)
@@ -172,7 +175,6 @@ pip install -U cocoindex
 2. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. CocoIndex uses it for incremental processing.
 ## Define data flow
 Follow [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart) to define your first indexing flow. An example flow looks like:
@@ -228,6 +230,7 @@ It defines an index flow like this:
 | [Text Embedding](examples/text_embedding) | Index text documents with embeddings for semantic search |
 | [Code Embedding](examples/code_embedding) | Index code embeddings for semantic search |
 | [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
+| [PDF Elements Embedding](examples/pdf_elements_embedding) | Extract text and images from PDFs; embed text with SentenceTransformers and images with CLIP; store in Qdrant for multimodal search |
 | [Manuals LLM Extraction](examples/manuals_llm_extraction) | Extract structured information from a manual using LLM |
 | [Amazon S3 Embedding](examples/amazon_s3_embedding) | Index text documents from Amazon S3 |
 | [Azure Blob Storage Embedding](examples/azure_blob_embedding) | Index text documents from Azure Blob Storage |
@@ -244,16 +247,18 @@ It defines an index flow like this:
 | [Custom Output Files](examples/custom_output_files) | Convert markdown files to HTML files and save them to a local directory, using *CocoIndex Custom Targets* |
 | [Patient intake form extraction](examples/patient_intake_extraction) | Use LLM to extract structured data from patient intake forms with different formats |
 More coming and stay tuned 👀!
 ## 📖 Documentation
 For detailed documentation, visit [CocoIndex Documentation](https://cocoindex.io/docs), including a [Quickstart guide](https://cocoindex.io/docs/getting_started/quickstart).
 ## 🤝 Contributing
 We love contributions from our community ❤️. For details on contributing or running the project for development, check out our [contributing guide](https://cocoindex.io/docs/about/contributing).
 ## 👥 Community
 Welcome with a huge coconut hug 🥥⋆｡˚🤗. We are super excited for community contributions of all kinds - whether it's code improvements, documentation updates, issue reports, feature requests, and discussions in our Discord.
 Join our community here:
@@ -263,9 +268,11 @@ Join our community here:
 - ▶️ [Subscribe to our YouTube channel](https://www.youtube.com/@cocoindex-io)
 - 📜 [Read our blog posts](https://cocoindex.io/blogs/)
-## Support us:
+## Support us
 We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at GitHub repo [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex) to stay tuned and help us grow.
 ## License
 CocoIndex is Apache 2.0 licensed.

{cocoindex-0.2.18.dist-info → cocoindex-0.2.20.dist-info}/RECORD RENAMED

@@ -1,22 +1,21 @@
-cocoindex-0.2.18.dist-info/METADATA,sha256=W5VPkE2KMAEDvAV-JO66Ca8EAMSbP_w1QykrjRXejuY,13444
-cocoindex-0.2.18.dist-info/WHEEL,sha256=T94Vf-8hBLuJYmQaKIvspCD375-5CHbUeNmaNVtwQwY,108
-cocoindex-0.2.18.dist-info/entry_points.txt,sha256=_NretjYVzBdNTn7dK-zgwr7YfG2afz1u1uSE-5bZXF8,46
-cocoindex-0.2.18.dist-info/licenses/THIRD_PARTY_NOTICES.html,sha256=MpyWY1rkfN94bYXo7IgLpY4F9cc1S4Vn5muesIpG5VM,719620
+cocoindex-0.2.20.dist-info/METADATA,sha256=PMLqa8rFhhAtRQCDWSvUSQbKy3vLYHdHatftA49W0e4,13644
+cocoindex-0.2.20.dist-info/WHEEL,sha256=T94Vf-8hBLuJYmQaKIvspCD375-5CHbUeNmaNVtwQwY,108
+cocoindex-0.2.20.dist-info/entry_points.txt,sha256=_NretjYVzBdNTn7dK-zgwr7YfG2afz1u1uSE-5bZXF8,46
+cocoindex-0.2.20.dist-info/licenses/THIRD_PARTY_NOTICES.html,sha256=SJ-7q0eqT40cFyT1cXqQkxWocFEuLT6PrETn5dhxiX8,719620
 cocoindex/__init__.py,sha256=6qZWVkK4WZ01BIAg3CPh_bRRdA6Clk4d4Q6OnZ2jFa4,2630
-cocoindex/_engine.abi3.so,sha256=Foh8iKEC5LxeHl1EzzUZfAmBqDbbyJ0f1OXG6BVWTK8,74683152
+cocoindex/_engine.abi3.so,sha256=YiQOMxjygiJrhOFTPvtHZ_mpvpfvDRy2KEhrRG_XwBQ,74720848
 cocoindex/auth_registry.py,sha256=g-uLDWLYW5NMbYe7q4Y-sU5dSyrlJXBEciyWtAiP9KE,1340
 cocoindex/cli.py,sha256=19IszBXOzqGn0xOV1SaS-oR9NupTmIm18uzFNET7NTQ,23978
 cocoindex/engine_object.py,sha256=5YTuWoR3WILhyt3PW-d9es3MAas_xD6tZZqvipN-sjg,10050
 cocoindex/engine_value.py,sha256=8M7MbwVG2bfd3kFptGGbQHBAp9pD3TVjrBiBDOAhD5M,23211
 cocoindex/flow.py,sha256=JWPTR2G6TdPJkO5ZlrCcyDyQ8utUS4zZWNR8zsHTeW8,40074
-cocoindex/functions.py,sha256=V4ljBnCprvA25XlCVvNLwK5ergXiEcKU76jkOGC-X3A,12882
 cocoindex/functions/__init__.py,sha256=V2IF4h-Cqq4OD_GN3Oqdry-FArORyRCKmqJ7g5UlJr8,1021
 cocoindex/functions/_engine_builtin_specs.py,sha256=WpCGrjUfJBa8xZP5JiEmA8kLu7fp9Rcs7ynpuJmvSGg,1786
 cocoindex/functions/colpali.py,sha256=oACyG3qG2dquyCJ6bT7FkMkua5rXDLSxnOHcgoz9waU,8865
 cocoindex/functions/sbert.py,sha256=1z5OJT-blXT6tVN5vEvEzvYAzOnzs1RCnu1UbCUP6wM,2162
 cocoindex/index.py,sha256=tz5ilvmOp0BtroGehCQDqWK_pIX9m6ghkhcxsDVU8WE,982
 cocoindex/lib.py,sha256=spfdU4IbzdffHyGdrQPIw_qGo9aX0OAAboqsjj8bTiQ,2290
-cocoindex/llm.py,sha256=Pv_cdnRngTLtuLU9AUmS8izIHhcKVnuBNolC33f9BDI,851
+cocoindex/llm.py,sha256=8ZdJhOmhdb2xEcCxk6rDpnj6hlhCyFBmJdhCNMqAOP4,875
 cocoindex/op.py,sha256=Ycvr6lJf7hcCCjYUqHtXZqzSeDD-FQdP3_jcmZUV_zI,26896
 cocoindex/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 cocoindex/query_handler.py,sha256=X-SQT71LHiOOXn6-TJlQcGodJk-iT8p_1TcIMvRLBRI,1344
@@ -24,7 +23,7 @@ cocoindex/runtime.py,sha256=4NxcltaDZvA3RR3Pnt6gH_f99jcWSyMH_1Xi5BjbtwY,1342
 cocoindex/setting.py,sha256=1Dx8ktjwf-8BiXrbsmfn5Mzudb2SQYqFdRnSNGVKaLk,4960
 cocoindex/setup.py,sha256=7uIHKN4FOCuoidPXcKyGTrkqpkl9luL49-6UcnMxYzw,3068
 cocoindex/sources/__init__.py,sha256=Yu9VHNaGlOEE3jpqfIseswsg25Le3HzwDr6XJAn22Ns,78
-cocoindex/sources/_engine_builtin_specs.py,sha256=NI4Uq6ffX5nfsgzneDErSBK4YH8hccaYj-InEBihJpo,3191
+cocoindex/sources/_engine_builtin_specs.py,sha256=s4AxMLi2j3ZHmzACVEGAdVe05gY8PRZ_mYMxWR7scDY,3304
 cocoindex/subprocess_exec.py,sha256=r1xO84uek4VP4I6i87JMwsH5xFm3vKW0ABvgn0jskt4,10088
 cocoindex/targets/__init__.py,sha256=HQG7I4U0xQhHiYctiUvwEBLxT2727oHP3xwrqotjmhk,78
 cocoindex/targets/_engine_builtin_specs.py,sha256=glXUN5bj11Jxky1VPvmGnWnMHXTQWEh08INcbldo3F4,3375
@@ -40,4 +39,4 @@ cocoindex/typing.py,sha256=so_RusbhBmg_uLoZTY7W_pqU0aIJwFarkTF5NQufl4o,23944
 cocoindex/user_app_loader.py,sha256=bc3Af-gYRxJ9GpObtpjegZY855oQBCv5FGkrkWV2yGY,1873
 cocoindex/utils.py,sha256=hUhX-XV6XGCtJSEIpBOuDv6VvqImwPlgBxztBTw7u0U,598
 cocoindex/validation.py,sha256=PZnJoby4sLbsmPv9fOjOQXuefjfZ7gmtsiTGU8SH-tc,3090
-cocoindex-0.2.18.dist-info/RECORD,,
+cocoindex-0.2.20.dist-info/RECORD,,
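
Each RECORD line has the form `path,sha256=<digest>,<size>`, where the digest is the file's SHA-256 hash in URL-safe base64 with the trailing `=` padding removed. A minimal sketch for checking one entry against a file's bytes follows; the helper names `record_digest` and `matches_record_line` are illustrative and not part of any packaging tool.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return the sha256 digest in RECORD's urlsafe-base64, unpadded form."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def matches_record_line(line: str, data: bytes) -> bool:
    """Check file bytes against one `path,sha256=<digest>,<size>` line."""
    # Split from the right so a comma inside the path cannot break parsing.
    _path, hash_field, size = line.rsplit(",", 2)
    algo, _, expected = hash_field.partition("=")
    return (
        algo == "sha256"
        and expected == record_digest(data)
        and int(size) == len(data)
    )


# The zero-byte `cocoindex/py.typed` entry from the listing above.
line = "cocoindex/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
print(matches_record_line(line, b""))
```

The same comparison explains the listing: entries such as `WHEEL` and `entry_points.txt` carry identical digests in both versions, while `llm.py` and `sources/_engine_builtin_specs.py` change in both digest and size.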

{cocoindex-0.2.18.dist-info → cocoindex-0.2.20.dist-info}/licenses/THIRD_PARTY_NOTICES.html RENAMED

@@ -2428,7 +2428,7 @@ Software.
                 <h3 id="Apache-2.0">Apache License 2.0</h3>
                 <h4>Used by:</h4>
                 <ul class="license-used-by">
-                    <li><a href=" https://crates.io/crates/cocoindex ">cocoindex 0.2.18</a></li>
+                    <li><a href=" https://crates.io/crates/cocoindex ">cocoindex 0.2.20</a></li>
                     <li><a href=" https://github.com/awesomized/crc-fast-rust ">crc-fast 1.3.0</a></li>
                     <li><a href=" https://github.com/qdrant/rust-client ">qdrant-client 1.15.0</a></li>
                 </ul>

cocoindex/functions.py DELETED

@@ -1,375 +0,0 @@
-"""All builtin functions."""
-import dataclasses
-import functools
-from typing import Any, Literal
-import numpy as np
-from numpy.typing import NDArray
-from . import llm, op
-from .typing import Vector
-class ParseJson(op.FunctionSpec):
-    """Parse a text into a JSON object."""
-@dataclasses.dataclass
-class CustomLanguageSpec:
-    """Custom language specification."""
-    language_name: str
-    separators_regex: list[str]
-    aliases: list[str] = dataclasses.field(default_factory=list)
-@dataclasses.dataclass
-class ColPaliModelInfo:
-    """Data structure for ColPali model and processor."""
-    model: Any
-    processor: Any
-    dimension: int
-    device: Any
-class SplitRecursively(op.FunctionSpec):
-    """Split a document (in string) recursively."""
-    custom_languages: list[CustomLanguageSpec] = dataclasses.field(default_factory=list)
-class SplitBySeparators(op.FunctionSpec):
-    """
-    Split text by specified regex separators only.
-    Output schema matches SplitRecursively for drop-in compatibility:
-        KTable rows with fields: location (Range), text (Str), start, end.
-    Args:
-        separators_regex: list[str]  # e.g., [r"\\n\\n+"]
-        keep_separator: Literal["NONE", "LEFT", "RIGHT"] = "NONE"
-        include_empty: bool = False
-        trim: bool = True
-    """
-    separators_regex: list[str] = dataclasses.field(default_factory=list)
-    keep_separator: Literal["NONE", "LEFT", "RIGHT"] = "NONE"
-    include_empty: bool = False
-    trim: bool = True
-class EmbedText(op.FunctionSpec):
-    """Embed a text into a vector space."""
-    api_type: llm.LlmApiType
-    model: str
-    address: str | None = None
-    output_dimension: int | None = None
-    task_type: str | None = None
-    api_config: llm.VertexAiConfig | None = None
-class ExtractByLlm(op.FunctionSpec):
-    """Extract information from a text using a LLM."""
-    llm_spec: llm.LlmSpec
-    output_type: type
-    instruction: str | None = None
-class SentenceTransformerEmbed(op.FunctionSpec):
-    """
-    `SentenceTransformerEmbed` embeds a text into a vector space using the [SentenceTransformer](https://huggingface.co/sentence-transformers) library.
-    Args:
-        model: The name of the SentenceTransformer model to use.
-        args: Additional arguments to pass to the SentenceTransformer constructor. e.g. {"trust_remote_code": True}
-    Note:
-        This function requires the optional sentence-transformers dependency.
-        Install it with: pip install 'cocoindex[embeddings]'
-    """
-    model: str
-    args: dict[str, Any] | None = None
-@op.executor_class(
-    gpu=True,
-    cache=True,
-    behavior_version=1,
-    arg_relationship=(op.ArgRelationship.EMBEDDING_ORIGIN_TEXT, "text"),
-)
-class SentenceTransformerEmbedExecutor:
-    """Executor for SentenceTransformerEmbed."""
-    spec: SentenceTransformerEmbed
-    _model: Any | None = None
-    def analyze(self) -> type:
-        try:
-            # Only import sentence_transformers locally when it's needed, as its import is very slow.
-            import sentence_transformers  # pylint: disable=import-outside-toplevel
-        except ImportError as e:
-            raise ImportError(
-                "sentence_transformers is required for SentenceTransformerEmbed function. "
-                "Install it with one of these commands:\n"
-                "  pip install 'cocoindex[embeddings]'\n"
-                "  pip install sentence-transformers"
-            ) from e
-        args = self.spec.args or {}
-        self._model = sentence_transformers.SentenceTransformer(self.spec.model, **args)
-        dim = self._model.get_sentence_embedding_dimension()
-        return Vector[np.float32, Literal[dim]]  # type: ignore
-    def __call__(self, text: str) -> NDArray[np.float32]:
-        assert self._model is not None
-        result: NDArray[np.float32] = self._model.encode(text, convert_to_numpy=True)
-        return result
-@functools.cache
-def _get_colpali_model_and_processor(model_name: str) -> ColPaliModelInfo:
-    """Get or load ColPali model and processor, with caching."""
-    try:
-        from colpali_engine.models import (  # type: ignore[import-untyped]
-            ColPali,
-            ColPaliProcessor,
-            ColQwen2,
-            ColQwen2Processor,
-            ColQwen2_5,
-            ColQwen2_5_Processor,
-            ColIdefics3,
-            ColIdefics3Processor,
-        )
-        from colpali_engine.utils.torch_utils import get_torch_device  # type: ignore[import-untyped]
-        import torch
-    except ImportError as e:
-        raise ImportError(
-            "ColVision models are not available. Make sure cocoindex is installed with ColPali support."
-        ) from e
-    device = get_torch_device("auto")
-    # Manual model detection based on model name
-    model_name_lower = model_name.lower()
-    try:
-        if "qwen2.5" in model_name_lower:
-            model = ColQwen2_5.from_pretrained(
-                model_name,
-                torch_dtype=torch.bfloat16,
-                device_map=device,
-            ).eval()
-            processor = ColQwen2_5_Processor.from_pretrained(model_name)
-        elif "qwen2" in model_name_lower:
-            model = ColQwen2.from_pretrained(
-                model_name,
-                torch_dtype=torch.bfloat16,
-                device_map=device,
-            ).eval()
-            processor = ColQwen2Processor.from_pretrained(model_name)
-        elif "colsmol" in model_name_lower or "smol" in model_name_lower:
-            # ColSmol models use Idefics3 architecture
-            model = ColIdefics3.from_pretrained(
-                model_name,
-                torch_dtype=torch.bfloat16,
-                device_map=device,
-            ).eval()
-            processor = ColIdefics3Processor.from_pretrained(model_name)
-        else:
-            # Default to ColPali
-            model = ColPali.from_pretrained(
-                model_name,
-                torch_dtype=torch.bfloat16,
-                device_map=device,
-            ).eval()
-            processor = ColPaliProcessor.from_pretrained(model_name)
-    except Exception as e:
-        raise RuntimeError(f"Failed to load model {model_name}: {e}")
-    # Get dimension from the actual model
-    dimension = _detect_colpali_dimension(model, processor, device)
-    return ColPaliModelInfo(
-        model=model,
-        processor=processor,
-        dimension=dimension,
-        device=device,
-    )
-def _detect_colpali_dimension(model: Any, processor: Any, device: Any) -> int:
-    """Detect ColPali embedding dimension from the actual model config."""
-    # Try to access embedding dimension
-    if hasattr(model.config, "embedding_dim"):
-        dim = model.config.embedding_dim
-    else:
-        # Fallback: infer from output shape with dummy data
-        from PIL import Image
-        import numpy as np
-        import torch
-        dummy_img = Image.fromarray(np.zeros((224, 224, 3), np.uint8))
-        # Use the processor to process the dummy image
-        processed = processor.process_images([dummy_img]).to(device)
-        with torch.no_grad():
-            output = model(**processed)
-        dim = int(output.shape[-1])
-    if isinstance(dim, int):
-        return dim
-    else:
-        raise ValueError(f"Expected integer dimension, got {type(dim)}: {dim}")
-    return dim
-class ColPaliEmbedImage(op.FunctionSpec):
-    """
-    `ColPaliEmbedImage` embeds images using ColVision multimodal models.
-    Supports ALL models available in the colpali-engine library, including:
-    - ColPali models (colpali-*): PaliGemma-based, best for general document retrieval
-    - ColQwen2 models (colqwen-*): Qwen2-VL-based, excellent for multilingual text (29+ languages) and general vision
-    - ColSmol models (colsmol-*): Lightweight, good for resource-constrained environments
-    - Any future ColVision models supported by colpali-engine
-    These models use late interaction between image patch embeddings and text token
-    embeddings for retrieval.
-    Args:
-        model: Any ColVision model name supported by colpali-engine
-               (e.g., "vidore/colpali-v1.2", "vidore/colqwen2.5-v0.2", "vidore/colsmol-v1.0")
-               See https://github.com/illuin-tech/colpali for the complete list of supported models.
-    Note:
-        This function requires the optional colpali-engine dependency.
-        Install it with: pip install 'cocoindex[colpali]'
-    """
-    model: str
-@op.executor_class(
-    gpu=True,
-    cache=True,
-    behavior_version=1,
-)
-class ColPaliEmbedImageExecutor:
-    """Executor for ColVision image embedding (ColPali, ColQwen2, ColSmol, etc.)."""
-    spec: ColPaliEmbedImage
-    _model_info: ColPaliModelInfo
-    def analyze(self) -> type:
-        # Get shared model and dimension
-        self._model_info = _get_colpali_model_and_processor(self.spec.model)
-        # Return multi-vector type: Variable patches x Fixed hidden dimension
-        dimension = self._model_info.dimension
-        return Vector[Vector[np.float32, Literal[dimension]]]  # type: ignore
-    def __call__(self, img_bytes: bytes) -> Any:
-        try:
-            from PIL import Image
-            import torch
-            import io
-        except ImportError as e:
-            raise ImportError(
-                "Required dependencies (PIL, torch) are missing for ColVision image embedding."
-            ) from e
-        model = self._model_info.model
-        processor = self._model_info.processor
-        device = self._model_info.device
-        pil_image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
-        inputs = processor.process_images([pil_image]).to(device)
-        with torch.no_grad():
-            embeddings = model(**inputs)
-        # Return multi-vector format: [patches, hidden_dim]
-        if len(embeddings.shape) != 3:
-            raise ValueError(
-                f"Expected 3D tensor [batch, patches, hidden_dim], got shape {embeddings.shape}"
-            )
-        # Keep patch-level embeddings: [batch, patches, hidden_dim] -> [patches, hidden_dim]
-        patch_embeddings = embeddings[0]  # Remove batch dimension
-        return patch_embeddings.cpu().to(torch.float32).numpy()
-class ColPaliEmbedQuery(op.FunctionSpec):
-    """
-    `ColPaliEmbedQuery` embeds text queries using ColVision multimodal models.
-    Supports ALL models available in the colpali-engine library, including:
-    - ColPali models (colpali-*): PaliGemma-based, best for general document retrieval
-    - ColQwen2 models (colqwen-*): Qwen2-VL-based, excellent for multilingual text (29+ languages) and general vision
-    - ColSmol models (colsmol-*): Lightweight, good for resource-constrained environments
-    - Any future ColVision models supported by colpali-engine
-    This produces query embeddings compatible with ColVision image embeddings
-    for late interaction scoring (MaxSim).
-    Args:
-        model: Any ColVision model name supported by colpali-engine
-               (e.g., "vidore/colpali-v1.2", "vidore/colqwen2.5-v0.2", "vidore/colsmol-v1.0")
-               See https://github.com/illuin-tech/colpali for the complete list of supported models.
-    Note:
-        This function requires the optional colpali-engine dependency.
-        Install it with: pip install 'cocoindex[colpali]'
-    """
-    model: str
-@op.executor_class(
-    gpu=True,
-    cache=True,
-    behavior_version=1,
-)
-class ColPaliEmbedQueryExecutor:
-    """Executor for ColVision query embedding (ColPali, ColQwen2, ColSmol, etc.)."""
-    spec: ColPaliEmbedQuery
-    _model_info: ColPaliModelInfo
-    def analyze(self) -> type:
-        # Get shared model and dimension
-        self._model_info = _get_colpali_model_and_processor(self.spec.model)
-        # Return multi-vector type: Variable tokens x Fixed hidden dimension
-        dimension = self._model_info.dimension
-        return Vector[Vector[np.float32, Literal[dimension]]]  # type: ignore
-    def __call__(self, query: str) -> Any:
-        try:
-            import torch
-        except ImportError as e:
-            raise ImportError(
-                "Required dependencies (torch) are missing for ColVision query embedding."
-            ) from e
-        model = self._model_info.model
-        processor = self._model_info.processor
-        device = self._model_info.device
-        inputs = processor.process_queries([query]).to(device)
-        with torch.no_grad():
-            embeddings = model(**inputs)
-        # Return multi-vector format: [tokens, hidden_dim]
-        if len(embeddings.shape) != 3:
-            raise ValueError(
-                f"Expected 3D tensor [batch, tokens, hidden_dim], got shape {embeddings.shape}"
-            )
-        # Keep token-level embeddings: [batch, tokens, hidden_dim] -> [tokens, hidden_dim]
-        token_embeddings = embeddings[0]  # Remove batch dimension
-        return token_embeddings.cpu().to(torch.float32).numpy()

{cocoindex-0.2.18.dist-info → cocoindex-0.2.20.dist-info}/WHEEL RENAMED

File without changes

{cocoindex-0.2.18.dist-info → cocoindex-0.2.20.dist-info}/entry_points.txt RENAMED

File without changes