langchain-postgres 0.0.15__tar.gz → 0.0.16__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/DEVELOPMENT.md +1 -1
- langchain_postgres-0.0.15/README.md → langchain_postgres-0.0.16/PKG-INFO +34 -0
- langchain_postgres-0.0.15/PKG-INFO → langchain_postgres-0.0.16/README.md +18 -16
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/pg_vectorstore_how_to.ipynb +254 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/async_vectorstore.py +15 -17
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/engine.py +2 -2
- langchain_postgres-0.0.16/langchain_postgres/v2/hybrid_search_config.py +212 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/vectorstores.py +18 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/pyproject.toml +7 -4
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_async_pg_vectorstore_search.py +3 -3
- langchain_postgres-0.0.16/tests/unit_tests/v2/test_hybrid_search_config.py +314 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_pg_vectorstore_index.py +116 -3
- langchain_postgres-0.0.16/uv.lock +1819 -0
- langchain_postgres-0.0.15/langchain_postgres/v2/hybrid_search_config.py +0 -149
- langchain_postgres-0.0.15/tests/unit_tests/v2/test_hybrid_search_config.py +0 -229
- langchain_postgres-0.0.15/uv.lock +0 -1456
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.github/actions/uv_setup/action.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.github/workflows/_lint.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.github/workflows/_release.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.github/workflows/_test.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.github/workflows/ci.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/.gitignore +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/CONTRIBUTING.md +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/LICENSE +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/Makefile +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/docker-compose.yml +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/docs/v2_design_overview.md +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/migrate_pgvector_to_pgvectorstore.ipynb +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/migrate_pgvector_to_pgvectorstore.md +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/pg_vectorstore.ipynb +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/vectorstore.ipynb +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/_utils.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/chat_message_histories.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/py.typed +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/translator.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/utils/pgvector_migrator.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/indexes.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/vectorstores.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/security.md +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/fake_embeddings.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/fixtures/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/fixtures/filtering_test_cases.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/fixtures/metadata_filtering_data.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/query_constructors/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/query_constructors/test_pgvector.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/test_imports.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v1/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v1/test_chat_histories.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v1/test_vectorstore.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v1/test_vectorstore_standard_tests.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/__init__.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_async_pg_vectorstore.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_async_pg_vectorstore_from_methods.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_async_pg_vectorstore_index.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_engine.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_indexes.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_pg_vectorstore.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_pg_vectorstore_from_methods.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_pg_vectorstore_search.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/unit_tests/v2/test_pg_vectorstore_standard_suite.py +0 -0
- {langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/tests/utils.py +0 -0
{langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/DEVELOPMENT.md
RENAMED
@@ -21,7 +21,7 @@ Start PostgreSQL/PGVector.
 docker run --rm -it --name pgvector-container \
   -e POSTGRES_USER=langchain \
   -e POSTGRES_PASSWORD=langchain \
-  -e POSTGRES_DB=
+  -e POSTGRES_DB=langchain_test \
   -p 6024:5432 pgvector/pgvector:pg16 \
   postgres -c log_statement=all
 ```
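As a quick check that the renamed database is reachable, the container flags above translate directly into a connection string. A minimal sketch using this package's v2 `PGEngine` helper (the user, password, `langchain_test` database name, and host port 6024 all come from the `docker run` flags; nothing else here is from the diff):

```python
from langchain_postgres import PGEngine

# User, password, database, and port mirror the docker run flags above.
CONNECTION_STRING = "postgresql+asyncpg://langchain:langchain@localhost:6024/langchain_test"
engine = PGEngine.from_connection_string(url=CONNECTION_STRING)
```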
langchain_postgres-0.0.15/README.md → langchain_postgres-0.0.16/PKG-INFO
RENAMED
@@ -1,3 +1,19 @@
+Metadata-Version: 2.4
+Name: langchain-postgres
+Version: 0.0.16
+Summary: An integration package connecting Postgres and LangChain
+License-Expression: MIT
+License-File: LICENSE
+Requires-Python: >=3.9
+Requires-Dist: asyncpg>=0.30.0
+Requires-Dist: langchain-core<2.0,>=0.2.13
+Requires-Dist: numpy<3,>=1.21
+Requires-Dist: pgvector<0.4,>=0.2.5
+Requires-Dist: psycopg-pool<4,>=3.2.1
+Requires-Dist: psycopg[binary]<4,>=3
+Requires-Dist: sqlalchemy[asyncio]<3,>=2
+Description-Content-Type: text/markdown
+
 # langchain-postgres
 
 [](https://github.com/langchain-ai/langchain-postgres/releases)
@@ -79,6 +95,24 @@ print(docs)
 > [!TIP]
 > All synchronous functions have corresponding asynchronous functions
 
+### Hybrid Search with PGVectorStore
+
+With PGVectorStore you can use hybrid search for more comprehensive and relevant search results.
+
+```python
+vs = PGVectorStore.create_sync(
+    engine=engine,
+    table_name=TABLE_NAME,
+    embedding_service=embedding,
+    hybrid_search_config=HybridSearchConfig(
+        fusion_function=reciprocal_rank_fusion
+    ),
+)
+hybrid_docs = vector_store.similarity_search("products", k=5)
+```
+
+For a detailed guide on how to use hybrid search, see the [documentation](/examples/pg_vectorstore_how_to.ipynb#hybrid-search-with-pgvectorstore).
+
 ## ChatMessageHistory
 
 The chat message history abstraction helps to persist chat message history
langchain_postgres-0.0.15/PKG-INFO → langchain_postgres-0.0.16/README.md
RENAMED
@@ -1,19 +1,3 @@
-Metadata-Version: 2.4
-Name: langchain-postgres
-Version: 0.0.15
-Summary: An integration package connecting Postgres and LangChain
-License-Expression: MIT
-License-File: LICENSE
-Requires-Python: >=3.9
-Requires-Dist: asyncpg>=0.30.0
-Requires-Dist: langchain-core<0.4.0,>=0.2.13
-Requires-Dist: numpy<3,>=1.21
-Requires-Dist: pgvector<0.4,>=0.2.5
-Requires-Dist: psycopg-pool<4,>=3.2.1
-Requires-Dist: psycopg<4,>=3
-Requires-Dist: sqlalchemy<3,>=2
-Description-Content-Type: text/markdown
-
 # langchain-postgres
 
 [](https://github.com/langchain-ai/langchain-postgres/releases)
@@ -95,6 +79,24 @@ print(docs)
 > [!TIP]
 > All synchronous functions have corresponding asynchronous functions
 
+### Hybrid Search with PGVectorStore
+
+With PGVectorStore you can use hybrid search for more comprehensive and relevant search results.
+
+```python
+vs = PGVectorStore.create_sync(
+    engine=engine,
+    table_name=TABLE_NAME,
+    embedding_service=embedding,
+    hybrid_search_config=HybridSearchConfig(
+        fusion_function=reciprocal_rank_fusion
+    ),
+)
+hybrid_docs = vector_store.similarity_search("products", k=5)
+```
+
+For a detailed guide on how to use hybrid search, see the [documentation](/examples/pg_vectorstore_how_to.ipynb#hybrid-search-with-pgvectorstore).
+
 ## ChatMessageHistory
 
 The chat message history abstraction helps to persist chat message history
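Note that the README snippet shipped in both renditions above assigns the store to `vs` but then searches via `vector_store`, which would raise a `NameError` if run verbatim. A consistent version of the same example (a sketch; `engine`, `TABLE_NAME`, and `embedding` are assumed to be defined as in the surrounding README):

```python
from langchain_postgres import PGVectorStore
from langchain_postgres.v2.hybrid_search_config import (
    HybridSearchConfig,
    reciprocal_rank_fusion,
)

# One name for the store, used for both creation and search.
vector_store = PGVectorStore.create_sync(
    engine=engine,
    table_name=TABLE_NAME,
    embedding_service=embedding,
    hybrid_search_config=HybridSearchConfig(fusion_function=reciprocal_rank_fusion),
)
hybrid_docs = vector_store.similarity_search("products", k=5)
```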
{langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/examples/pg_vectorstore_how_to.ipynb
RENAMED
@@ -686,6 +686,260 @@
  "1. For new records, added via `VectorStore` embeddings are automatically generated."
  ]
 },
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "# Hybrid Search with PGVectorStore\n",
+  "\n",
+  "A Hybrid Search combines multiple lookup strategies to provide more comprehensive and relevant search results. Specifically, it leverages both dense embedding vector search as the primary search (for semantic similarity) and TSV (Text Search Vector) based keyword search as the secondary search (for lexical matching). This approach is particularly powerful for applications requiring efficient searching through customized text and metadata, especially when a specialized embedding model isn't feasible or necessary.\n",
+  "\n",
+  "By integrating both semantic and lexical capabilities, hybrid search helps overcome the limitations of each individual method:\n",
+  "* **Semantic Search**: Excellent for understanding the meaning of a query, even if the exact keywords aren't present. However, it can sometimes miss highly relevant documents that contain the precise keywords but have a slightly different semantic context.\n",
+  "* **Keyword Search**: Highly effective for finding documents with exact keyword matches and is generally fast. Its weakness lies in its inability to understand synonyms, misspellings, or conceptual relationships."
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "## Hybrid Search Config\n",
+  "\n",
+  "You can take advantage of hybrid search with PGVectorStore using the `HybridSearchConfig`.\n",
+  "\n",
+  "With a `HybridSearchConfig` provided, the `PGVectorStore` class can efficiently manage a hybrid search vector store using PostgreSQL as the backend, automatically handling the creation and population of the necessary TSV columns when possible."
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "### Building the config\n",
+  "\n",
+  "Here are the parameters to the hybrid search config:\n",
+  "* **tsv_column:** The column name for the TSV column. Default: `<content_column>_tsv`\n",
+  "* **tsv_lang:** Value representing a supported language. Default: `pg_catalog.english`\n",
+  "* **fts_query:** If provided, this is used for secondary retrieval instead of the user-provided query.\n",
+  "* **fusion_function:** Determines how the results are merged; the default is an equally weighted sum ranking.\n",
+  "* **fusion_function_parameters:** Parameters for the fusion function\n",
+  "* **primary_top_k:** Max results fetched for primary retrieval. Default: `4`\n",
+  "* **secondary_top_k:** Max results fetched for secondary retrieval. Default: `4`\n",
+  "* **index_name:** Name of the index built on the `tsv_column`\n",
+  "* **index_type:** GIN or GIST. Default: `GIN`"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "Here is an example `HybridSearchConfig`:"
+ ]
+},
+{
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "from langchain_postgres.v2.hybrid_search_config import (\n",
+  "    HybridSearchConfig,\n",
+  "    reciprocal_rank_fusion,\n",
+  ")\n",
+  "\n",
+  "hybrid_search_config = HybridSearchConfig(\n",
+  "    tsv_column=\"hybrid_description\",\n",
+  "    tsv_lang=\"pg_catalog.english\",\n",
+  "    fusion_function=reciprocal_rank_fusion,\n",
+  "    fusion_function_parameters={\n",
+  "        \"rrf_k\": 60,\n",
+  "        \"fetch_top_k\": 10,\n",
+  "    },\n",
+  ")"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "**Note:** In this case we have set the fusion function to `reciprocal_rank_fusion`, but you can also use `weighted_sum_ranking`.\n",
+  "\n",
+  "Make sure to use the right fusion function parameters:\n",
+  "\n",
+  "`reciprocal_rank_fusion`:\n",
+  "* rrf_k: The RRF parameter k. Defaults to 60\n",
+  "* fetch_top_k: The number of documents to fetch after merging the results. Defaults to 4\n",
+  "\n",
+  "`weighted_sum_ranking`:\n",
+  "* primary_results_weight: The weight for the primary source's scores. Defaults to 0.5\n",
+  "* secondary_results_weight: The weight for the secondary source's scores. Defaults to 0.5\n",
+  "* fetch_top_k: The number of documents to fetch after merging the results. Defaults to 4\n"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "## Usage\n",
+  "\n",
+  "Let's assume we are using the previously mentioned table [`products`](#create-a-vector-store-using-existing-table), which stores product details for an e-commerce venture.\n"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "### With a new hybrid search table\n",
+  "To create a new Postgres table with the TSV column, specify the hybrid search config during the initialization of the vector store.\n",
+  "\n",
+  "In this case, all the similarity searches will make use of hybrid search."
+ ]
+},
+{
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "from langchain_postgres import PGVectorStore\n",
+  "\n",
+  "TABLE_NAME = \"hybrid_search_products\"\n",
+  "\n",
+  "await pg_engine.ainit_vectorstore_table(\n",
+  "    table_name=TABLE_NAME,\n",
+  "    # schema_name=SCHEMA_NAME,\n",
+  "    vector_size=VECTOR_SIZE,\n",
+  "    id_column=\"product_id\",\n",
+  "    content_column=\"description\",\n",
+  "    embedding_column=\"embed\",\n",
+  "    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
+  "    metadata_json_column=\"metadata\",\n",
+  "    hybrid_search_config=hybrid_search_config,\n",
+  "    store_metadata=True,\n",
+  ")\n",
+  "\n",
+  "vs_hybrid = await PGVectorStore.create(\n",
+  "    pg_engine,\n",
+  "    table_name=TABLE_NAME,\n",
+  "    # schema_name=SCHEMA_NAME,\n",
+  "    embedding_service=embedding,\n",
+  "    # Connect to an existing vector store by customizing the column names below\n",
+  "    id_column=\"product_id\",\n",
+  "    content_column=\"description\",\n",
+  "    embedding_column=\"embed\",\n",
+  "    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
+  "    metadata_json_column=\"metadata\",\n",
+  "    hybrid_search_config=hybrid_search_config,\n",
+  ")\n",
+  "\n",
+  "# Fetch product documents from the previously created store\n",
+  "docs = await custom_store.asimilarity_search(\"products\", k=5)\n",
+  "# Add data normally to the hybrid search vector store, which also populates the tsv_column\n",
+  "await vs_hybrid.aadd_documents(docs)\n",
+  "\n",
+  "# Use hybrid search\n",
+  "hybrid_docs = await vs_hybrid.asimilarity_search(\"products\", k=5)\n",
+  "print(hybrid_docs)"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "### With a pre-existing table\n",
+  "\n",
+  "If a hybrid search config is **NOT** provided during `init_vectorstore_table` while creating a table, the table will not contain a tsv_column. In this case you can still take advantage of hybrid search using the `HybridSearchConfig`.\n",
+  "\n",
+  "The specified TSV column is not present, but the TSV vectors are created on the fly for hybrid search."
+ ]
+},
+{
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "from langchain_postgres import PGVectorStore\n",
+  "\n",
+  "# Set the existing table name\n",
+  "TABLE_NAME = \"products\"\n",
+  "# SCHEMA_NAME = \"my_schema\"\n",
+  "\n",
+  "hybrid_search_config = HybridSearchConfig(\n",
+  "    tsv_lang=\"pg_catalog.english\",\n",
+  "    fusion_function=reciprocal_rank_fusion,\n",
+  "    fusion_function_parameters={\n",
+  "        \"rrf_k\": 60,\n",
+  "        \"fetch_top_k\": 10,\n",
+  "    },\n",
+  ")\n",
+  "\n",
+  "# Initialize PGVectorStore with the hybrid search config\n",
+  "custom_hybrid_store = await PGVectorStore.create(\n",
+  "    pg_engine,\n",
+  "    table_name=TABLE_NAME,\n",
+  "    # schema_name=SCHEMA_NAME,\n",
+  "    embedding_service=embedding,\n",
+  "    # Connect to an existing vector store by customizing the column names below\n",
+  "    id_column=\"product_id\",\n",
+  "    content_column=\"description\",\n",
+  "    embedding_column=\"embed\",\n",
+  "    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
+  "    metadata_json_column=\"metadata\",\n",
+  "    hybrid_search_config=hybrid_search_config,\n",
+  ")\n",
+  "\n",
+  "# Use hybrid search\n",
+  "hybrid_docs = await custom_hybrid_store.asimilarity_search(\"products\", k=5)\n",
+  "print(hybrid_docs)"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "In this case, all the similarity searches will make use of hybrid search."
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "### Applying Hybrid Search to Specific Queries\n",
+  "\n",
+  "To use hybrid search only for certain queries, omit the configuration during initialization and pass it directly to the search method when needed."
+ ]
+},
+{
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "# Use hybrid search\n",
+  "hybrid_docs = await custom_store.asimilarity_search(\n",
+  "    \"products\", k=5, hybrid_search_config=hybrid_search_config\n",
+  ")\n",
+  "print(hybrid_docs)"
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "## Hybrid Search Index\n",
+  "\n",
+  "Optionally, if you have created a Postgres table with a tsv_column, you can create an index."
+ ]
+},
+{
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "await vs_hybrid.aapply_hybrid_search_index()"
+ ]
+},
 {
  "cell_type": "markdown",
  "metadata": {},
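To make the fusion step concrete, here is a toy, self-contained illustration of the `reciprocal_rank_fusion` function added in this release. Plain dicts stand in for SQLAlchemy row mappings, which works because the function only relies on `.values()` ordering (document id first) and a `distance` key; the product data is invented for the example:

```python
from langchain_postgres.v2.hybrid_search_config import reciprocal_rank_fusion

# Dense (vector) results: cosine distance, lower is better.
dense = [
    {"product_id": "a", "description": "red shoe", "distance": 0.10},
    {"product_id": "b", "description": "blue shoe", "distance": 0.25},
]
# Sparse (full-text) results: ts_rank-style relevance, higher is better.
sparse = [
    {"product_id": "b", "description": "blue shoe", "distance": 0.90},
    {"product_id": "c", "description": "shoe rack", "distance": 0.40},
]

# "b" appears in both lists, so it accumulates two 1 / (rank + rrf_k)
# contributions and is ranked first; "a" and "c" get one each.
for row in reciprocal_rank_fusion(dense, sparse, rrf_k=60, fetch_top_k=3):
    print(row["product_id"], round(row["distance"], 4))
```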
{langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/async_vectorstore.py
RENAMED
@@ -210,7 +210,7 @@ class AsyncPGVectorStore(VectorStore):
             hybrid_search_config.tsv_column = ""
         if embedding_column not in columns:
             raise ValueError(f"Embedding column, {embedding_column}, does not exist.")
-        if columns[embedding_column]
+        if columns[embedding_column] not in ["USER-DEFINED", "vector"]:
             raise ValueError(
                 f"Embedding column, {embedding_column}, is not type Vector."
             )
@@ -580,16 +580,16 @@ class AsyncPGVectorStore(VectorStore):
         For best hybrid search performance, consider creating a TSV column
         and adding GIN index.
         """
-
-
-
-
-
-
-
-
-
-
+        hybrid_search_config = kwargs.get(
+            "hybrid_search_config", self.hybrid_search_config
+        )
+
+        final_k = k if k is not None else self.k
+
+        dense_limit = final_k
+        if hybrid_search_config:
+            dense_limit = hybrid_search_config.primary_top_k
+
         operator = self.distance_strategy.operator
         search_function = self.distance_strategy.search_function
 
@@ -617,9 +617,9 @@ class AsyncPGVectorStore(VectorStore):
         embedding_data_string = ":query_embedding"
         where_filters = f"WHERE {safe_filter}" if safe_filter else ""
         dense_query_stmt = f"""SELECT {column_names}, {search_function}("{self.embedding_column}", {embedding_data_string}) as distance
-        FROM "{self.schema_name}"."{self.table_name}" {where_filters} ORDER BY "{self.embedding_column}" {operator} {embedding_data_string} LIMIT :
+        FROM "{self.schema_name}"."{self.table_name}" {where_filters} ORDER BY "{self.embedding_column}" {operator} {embedding_data_string} LIMIT :dense_limit;
         """
-        param_dict = {"query_embedding": query_embedding, "
+        param_dict = {"query_embedding": query_embedding, "dense_limit": dense_limit}
         if filter_dict:
             param_dict.update(filter_dict)
         if self.index_query_options:
@@ -637,16 +637,13 @@ class AsyncPGVectorStore(VectorStore):
         result_map = result.mappings()
         dense_results = result_map.fetchall()
 
-        hybrid_search_config = kwargs.get(
-            "hybrid_search_config", self.hybrid_search_config
-        )
         fts_query = (
             hybrid_search_config.fts_query
             if hybrid_search_config and hybrid_search_config.fts_query
             else kwargs.get("fts_query", "")
         )
         if hybrid_search_config and fts_query:
-            hybrid_search_config.fusion_function_parameters["fetch_top_k"] =
+            hybrid_search_config.fusion_function_parameters["fetch_top_k"] = final_k
             # do the sparse query
             lang = (
                 f"'{hybrid_search_config.tsv_lang}',"
@@ -670,6 +667,7 @@ class AsyncPGVectorStore(VectorStore):
                 dense_results,
                 sparse_results,
                 **hybrid_search_config.fusion_function_parameters,
+                distance_strategy=self.distance_strategy,
             )
             return combined_results
         return dense_results
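The net effect of these changes is that the caller's `k` now controls only the size of the fused result: the dense leg fetches `primary_top_k` rows via `LIMIT :dense_limit`, the sparse leg fetches up to `secondary_top_k`, and `fetch_top_k` is overwritten with `k` before fusion. A sketch under those semantics (`store` is a hypothetical PGVectorStore; the per-call `hybrid_search_config` kwarg is the one demonstrated in the notebook above):

```python
from langchain_postgres.v2.hybrid_search_config import (
    HybridSearchConfig,
    reciprocal_rank_fusion,
)

config = HybridSearchConfig(
    primary_top_k=20,    # dense candidates fetched via LIMIT :dense_limit
    secondary_top_k=20,  # sparse candidates fetched by the full-text query
    fusion_function=reciprocal_rank_fusion,
)
# fetch_top_k is overwritten with k=5 internally, so up to 40 candidates
# are fused and the merged ranking is trimmed to five documents.
docs = await store.asimilarity_search(
    "waterproof shoes", k=5, hybrid_search_config=config
)
```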
{langchain_postgres-0.0.15 → langchain_postgres-0.0.16}/langchain_postgres/v2/engine.py
RENAMED
@@ -119,7 +119,7 @@ class PGEngine:
             return await coro
         # Otherwise, run in the background thread
         return await asyncio.wrap_future(
-            asyncio.run_coroutine_threadsafe(coro, self._loop)
+            asyncio.run_coroutine_threadsafe(coro, self._loop)  # type: ignore[arg-type]
         )
 
     def _run_as_sync(self, coro: Awaitable[T]) -> T:
@@ -128,7 +128,7 @@
             raise Exception(
                 "Engine was initialized without a background loop and cannot call sync methods."
             )
-        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
+        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()  # type: ignore[arg-type]
 
     async def close(self) -> None:
         """Dispose of connection pool"""
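The two `# type: ignore[arg-type]` additions silence mypy, which expects a `Coroutine` where `PGEngine` passes the broader `Awaitable`. For context, a minimal standalone sketch of the pattern these lines sit inside (stdlib only, not PGEngine's actual code): a dedicated background event loop thread, with sync callers blocking on the `concurrent.futures.Future` and async callers awaiting it via `asyncio.wrap_future`:

```python
import asyncio
import threading

# Background loop on its own thread, as PGEngine maintains internally.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def work() -> str:
    await asyncio.sleep(0.1)
    return "done"

# Sync bridge: block on the concurrent.futures.Future.
print(asyncio.run_coroutine_threadsafe(work(), loop).result())

# Async bridge: wrap the future so another event loop can await it.
async def caller() -> None:
    print(await asyncio.wrap_future(asyncio.run_coroutine_threadsafe(work(), loop)))

asyncio.run(caller())
```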
langchain_postgres-0.0.16/langchain_postgres/v2/hybrid_search_config.py
ADDED
@@ -0,0 +1,212 @@
+from abc import ABC
+from dataclasses import dataclass, field
+from typing import Any, Callable, Optional, Sequence
+
+from sqlalchemy import RowMapping
+
+from .indexes import DistanceStrategy
+
+
+def _normalize_scores(
+    results: Sequence[dict[str, Any]], is_distance_metric: bool
+) -> Sequence[dict[str, Any]]:
+    """Normalizes scores to a 0-1 scale, where 1 is best."""
+    if not results:
+        return []
+
+    # Get scores from the last column of each result
+    scores = [float(list(item.values())[-1]) for item in results]
+    min_score, max_score = min(scores), max(scores)
+    score_range = max_score - min_score
+
+    if score_range == 0:
+        # All documents are of the highest quality (1.0)
+        for item in results:
+            item["normalized_score"] = 1.0
+        return list(results)
+
+    for item in results:
+        # Access the score again from the last column for calculation
+        score = list(item.values())[-1]
+        normalized = (score - min_score) / score_range
+        if is_distance_metric:
+            # For distance, a lower score is better, so we invert the result.
+            item["normalized_score"] = 1.0 - normalized
+        else:
+            # For similarity (like keyword search), a higher score is better.
+            item["normalized_score"] = normalized
+
+    return list(results)
+
+
+def weighted_sum_ranking(
+    primary_search_results: Sequence[RowMapping],
+    secondary_search_results: Sequence[RowMapping],
+    primary_results_weight: float = 0.5,
+    secondary_results_weight: float = 0.5,
+    fetch_top_k: int = 4,
+    **kwargs: Any,
+) -> Sequence[dict[str, Any]]:
+    """
+    Ranks documents using a weighted sum of scores from two sources.
+
+    Args:
+        primary_search_results: A list of (document, distance) tuples from
+            the primary search.
+        secondary_search_results: A list of (document, distance) tuples from
+            the secondary search.
+        primary_results_weight: The weight for the primary source's scores.
+            Defaults to 0.5.
+        secondary_results_weight: The weight for the secondary source's scores.
+            Defaults to 0.5.
+        fetch_top_k: The number of documents to fetch after merging the results.
+            Defaults to 4.
+
+    Returns:
+        A list of (document, distance) tuples, sorted by weighted_score in
+        descending order.
+    """
+
+    distance_strategy = kwargs.get(
+        "distance_strategy", DistanceStrategy.COSINE_DISTANCE
+    )
+    is_primary_distance = distance_strategy != DistanceStrategy.INNER_PRODUCT
+
+    # Normalize both sets of results onto a 0-1 scale
+    normalized_primary = _normalize_scores(
+        [dict(row) for row in primary_search_results],
+        is_distance_metric=is_primary_distance,
+    )
+
+    # Keyword search relevance is a similarity score (higher is better)
+    normalized_secondary = _normalize_scores(
+        [dict(row) for row in secondary_search_results], is_distance_metric=False
+    )
+
+    # stores computed metric with provided distance metric and weights
+    weighted_scores: dict[str, dict[str, Any]] = {}
+
+    # Process primary results
+    for item in normalized_primary:
+        doc_id = str(list(item.values())[0])
+        # Set the 'distance' key with the weighted primary score
+        item["distance"] = item["normalized_score"] * primary_results_weight
+        weighted_scores[doc_id] = item
+
+    # Process secondary results
+    for item in normalized_secondary:
+        doc_id = str(list(item.values())[0])
+        secondary_weighted_score = item["normalized_score"] * secondary_results_weight
+
+        if doc_id in weighted_scores:
+            # Add to the existing 'distance' score
+            weighted_scores[doc_id]["distance"] += secondary_weighted_score
+        else:
+            # Set the 'distance' key for the new item
+            item["distance"] = secondary_weighted_score
+            weighted_scores[doc_id] = item
+
+    ranked_results = sorted(
+        weighted_scores.values(), key=lambda item: item["distance"], reverse=True
+    )
+
+    for result in ranked_results:
+        result.pop("normalized_score", None)
+
+    return ranked_results[:fetch_top_k]
+
+
+def reciprocal_rank_fusion(
+    primary_search_results: Sequence[RowMapping],
+    secondary_search_results: Sequence[RowMapping],
+    rrf_k: float = 60,
+    fetch_top_k: int = 4,
+    **kwargs: Any,
+) -> Sequence[dict[str, Any]]:
+    """
+    Ranks documents using Reciprocal Rank Fusion (RRF) of scores from two sources.
+
+    Args:
+        primary_search_results: A list of (document, distance) tuples from
+            the primary search.
+        secondary_search_results: A list of (document, distance) tuples from
+            the secondary search.
+        rrf_k: The RRF parameter k.
+            Defaults to 60.
+        fetch_top_k: The number of documents to fetch after merging the results.
+            Defaults to 4.
+
+    Returns:
+        A list of (document_id, rrf_score) tuples, sorted by rrf_score
+        in descending order.
+    """
+    distance_strategy = kwargs.get(
+        "distance_strategy", DistanceStrategy.COSINE_DISTANCE
+    )
+    rrf_scores: dict[str, dict[str, Any]] = {}
+
+    # Process results from primary source
+    # Determine sorting order based on the vector distance strategy.
+    # For COSINE & EUCLIDEAN (distance), we sort ascending (reverse=False).
+    # For INNER_PRODUCT (similarity), we sort descending (reverse=True).
+    is_similarity_metric = distance_strategy == DistanceStrategy.INNER_PRODUCT
+    sorted_primary = sorted(
+        primary_search_results,
+        key=lambda item: item["distance"],
+        reverse=is_similarity_metric,
+    )
+
+    for rank, row in enumerate(sorted_primary):
+        doc_id = str(list(row.values())[0])
+        if doc_id not in rrf_scores:
+            rrf_scores[doc_id] = dict(row)
+            rrf_scores[doc_id]["distance"] = 0.0
+        # Add the "normalized" rank score
+        rrf_scores[doc_id]["distance"] += 1.0 / (rank + rrf_k)
+
+    # Process results from secondary source
+    # Keyword search relevance is always "higher is better" -> sort descending
+    sorted_secondary = sorted(
+        secondary_search_results,
+        key=lambda item: item["distance"],
+        reverse=True,
+    )
+
+    for rank, row in enumerate(sorted_secondary):
+        doc_id = str(list(row.values())[0])
+        if doc_id not in rrf_scores:
+            rrf_scores[doc_id] = dict(row)
+            rrf_scores[doc_id]["distance"] = 0.0
+        # Add the rank score from this list to the existing score
+        rrf_scores[doc_id]["distance"] += 1.0 / (rank + rrf_k)
+
+    # Sort the results by RRF score in descending order
+    ranked_results = sorted(
+        rrf_scores.values(), key=lambda item: item["distance"], reverse=True
+    )
+    # Return only the top results
+    return ranked_results[:fetch_top_k]
+
+
+@dataclass
+class HybridSearchConfig(ABC):
+    """
+    AlloyDB Vector Store Hybrid Search Config.
+
+    Queries might be slow if the hybrid search column does not exist.
+    For best hybrid search performance, consider creating a TSV column
+    and adding a GIN index.
+    """
+
+    tsv_column: Optional[str] = ""
+    tsv_lang: Optional[str] = "pg_catalog.english"
+    fts_query: Optional[str] = ""
+    fusion_function: Callable[
+        [Sequence[RowMapping], Sequence[RowMapping], Any], Sequence[Any]
+    ] = weighted_sum_ranking  # Updated default
+    fusion_function_parameters: dict[str, Any] = field(default_factory=dict)
+    primary_top_k: int = 4
+    secondary_top_k: int = 4
+    index_name: str = "langchain_tsv_index"
+    index_type: str = "GIN"