PyPI - cbrkit - Versions diffs - 0.6.0__tar.gz → 0.6.2__tar.gz - Mend

cbrkit 0.6.0tar.gz → 0.6.2tar.gz

Files changed (24)

{cbrkit-0.6.0 → cbrkit-0.6.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: cbrkit
-Version: 0.6.0
+Version: 0.6.2
 Summary: Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
 Home-page: https://wi2trier.github.io/cbrkit/
 License: MIT
@@ -41,9 +41,9 @@ Requires-Dist: sentence-transformers (>=2.2,<3.0) ; extra == "all" or extra == "
 Requires-Dist: spacy (>=3.7,<4.0) ; extra == "all" or extra == "all" or extra == "nlp"
 Requires-Dist: torch (>=2.1.1,<3.0.0) ; extra == "all" or extra == "transformers"
 Requires-Dist: transformers (>=4.35,<5.0) ; extra == "all" or extra == "transformers"
-Requires-Dist: typer[all] (>=0.9,<0.10) ; extra == "all" or extra == "cli"
+Requires-Dist: typer[all] (>=0.9,<1.0) ; extra == "all" or extra == "cli"
 Requires-Dist: uvicorn[standard] (>=0.24,<1.0) ; extra == "all" or extra == "api"
-Requires-Dist: xmltodict (>=0.13,<0.14)
+Requires-Dist: xmltodict (>=0.13,<1.0)
 Project-URL: Repository, https://github.com/wi2trier/cbrkit
 Description-Content-Type: text/markdown
@@ -161,7 +161,7 @@ It is possible to define custom measures, use built-in ones, or combine both.
 In CBRkit, a similarity measure is defined as a function that takes two arguments (a case and a query) and returns a similarity score: `sim = f(x, y)`.
 It also supports pipeline-based similarity measures that are popular in NLP where a list of tuples is passed to the similarity measure: `sims = f([(x1, y1), (x2, y2), ...])`.
 This generic approach allows you to define custom similarity measures for your specific use case.
-For instance, you may define the following function for comparing colors:
+For instance, the following function not only checks for strict equality, but also for partial matches (e.g., `x = "blue"` and `y = "light blue"`):
 ```python
 def color_similarity(x: str, y: str) -> float:
@@ -173,7 +173,8 @@ def color_similarity(x: str, y: str) -> float:
     return 0.0
 ```
-In addition to checking for strict equality, our function also checks for partial matches (e.g., `x = "blue"` and `y = "light blue"`).
+**Please note:** CBRkit inspects the signature of custom similarity functions to perform some checks.
+You need to make sure that the two parameters are named `x` and `y`, otherwise CBRkit will throw an error.
 ### Built-in Similarity Measures
@@ -207,7 +208,7 @@ For the common use case of attribute-value based data, CBRkit provides a predefi
 cbrkit.sim.attribute_value(
     attributes={
         "price": cbrkit.sim.numbers.linear(),
-        "color": color_similarity
+        "color": color_similarity # custom measure
         ...
     },
     aggregator=cbrkit.sim.aggregator(pooling="mean"),
@@ -216,7 +217,8 @@ cbrkit.sim.attribute_value(
 The `attribute_value` function lets you define measures for each attribute of the cases/queries as well as the aggregation function.
 It also allows to use custom measures like the `color_similarity` function defined above.
-**Please note:** The custom measure is not called directly but passed as a reference to the `attribute_value` function since it is not a generator function.
+**Please note:** The custom measure is not executed (i.e., there are **no** parenthesis at the end), but instead passed as a reference to the `attribute_value` function.
 You may even nest similarity functions to create measures for object-oriented cases:
@@ -230,7 +232,7 @@ cbrkit.sim.attribute_value(
             },
             aggregator=cbrkit.sim.aggregator(pooling="mean"),
         ),
-        "color": color_similarity
+        "color": color_similarity # custom measure
         ...
     },
     aggregator=cbrkit.sim.aggregator(pooling="mean"),
@@ -268,19 +270,14 @@ In some cases, it is useful to combine multiple retrieval pipelines, for example
 To use this pattern, first create the corresponding retrievers using the builder:
 ```python
-retriever1 = cbrkit.retrieval.build(..., limit=10)
-# since retriever2 only receives the cases from retriever1, we do not need a limit
-retriever2 = cbrkit.retrieval.build(..., limit=None)
+retriever1 = cbrkit.retrieval.build(..., min_similarity=0.5, limit=20)
+retriever2 = cbrkit.retrieval.build(..., limit=10)
 ```
 Then apply all of them sequentially by passing them as a list or tuple to the `apply` function:
 ```python
-result = cbrkit.retrieval.apply(
-    casebase,
-    query,
-    (retriever1, retriever2)
-)
+result = cbrkit.retrieval.apply(casebase, query, (retriever1, retriever2))
 ```
 The result has the following two attributes:

{cbrkit-0.6.0 → cbrkit-0.6.2}/README.md RENAMED

@@ -112,7 +112,7 @@ It is possible to define custom measures, use built-in ones, or combine both.
 In CBRkit, a similarity measure is defined as a function that takes two arguments (a case and a query) and returns a similarity score: `sim = f(x, y)`.
 It also supports pipeline-based similarity measures that are popular in NLP where a list of tuples is passed to the similarity measure: `sims = f([(x1, y1), (x2, y2), ...])`.
 This generic approach allows you to define custom similarity measures for your specific use case.
-For instance, you may define the following function for comparing colors:
+For instance, the following function not only checks for strict equality, but also for partial matches (e.g., `x = "blue"` and `y = "light blue"`):
 ```python
 def color_similarity(x: str, y: str) -> float:
@@ -124,7 +124,8 @@ def color_similarity(x: str, y: str) -> float:
     return 0.0
 ```
-In addition to checking for strict equality, our function also checks for partial matches (e.g., `x = "blue"` and `y = "light blue"`).
+**Please note:** CBRkit inspects the signature of custom similarity functions to perform some checks.
+You need to make sure that the two parameters are named `x` and `y`, otherwise CBRkit will throw an error.
 ### Built-in Similarity Measures
@@ -158,7 +159,7 @@ For the common use case of attribute-value based data, CBRkit provides a predefi
 cbrkit.sim.attribute_value(
     attributes={
         "price": cbrkit.sim.numbers.linear(),
-        "color": color_similarity
+        "color": color_similarity # custom measure
         ...
     },
     aggregator=cbrkit.sim.aggregator(pooling="mean"),
@@ -167,7 +168,8 @@ cbrkit.sim.attribute_value(
 The `attribute_value` function lets you define measures for each attribute of the cases/queries as well as the aggregation function.
 It also allows to use custom measures like the `color_similarity` function defined above.
-**Please note:** The custom measure is not called directly but passed as a reference to the `attribute_value` function since it is not a generator function.
+**Please note:** The custom measure is not executed (i.e., there are **no** parenthesis at the end), but instead passed as a reference to the `attribute_value` function.
 You may even nest similarity functions to create measures for object-oriented cases:
@@ -181,7 +183,7 @@ cbrkit.sim.attribute_value(
             },
             aggregator=cbrkit.sim.aggregator(pooling="mean"),
         ),
-        "color": color_similarity
+        "color": color_similarity # custom measure
         ...
     },
     aggregator=cbrkit.sim.aggregator(pooling="mean"),
@@ -219,19 +221,14 @@ In some cases, it is useful to combine multiple retrieval pipelines, for example
 To use this pattern, first create the corresponding retrievers using the builder:
 ```python
-retriever1 = cbrkit.retrieval.build(..., limit=10)
-# since retriever2 only receives the cases from retriever1, we do not need a limit
-retriever2 = cbrkit.retrieval.build(..., limit=None)
+retriever1 = cbrkit.retrieval.build(..., min_similarity=0.5, limit=20)
+retriever2 = cbrkit.retrieval.build(..., limit=10)
 ```
 Then apply all of them sequentially by passing them as a list or tuple to the `apply` function:
 ```python
-result = cbrkit.retrieval.apply(
-    casebase,
-    query,
-    (retriever1, retriever2)
-)
+result = cbrkit.retrieval.apply(casebase, query, (retriever1, retriever2))
 ```
 The result has the following two attributes:

{cbrkit-0.6.0 → cbrkit-0.6.2}/cbrkit/helpers.py RENAMED

@@ -97,7 +97,12 @@ def sim2seq(
         return wrapped_func
-    return cast(SimSeqFunc[ValueType, SimType], func)
+    elif len(signature.parameters) == 1:
+        return cast(SimSeqFunc[ValueType, SimType], func)
+    raise TypeError(
+        f"Invalid signature for similarity function: {signature.parameters}"
+    )
 def sim2map(
@@ -107,7 +112,13 @@ def sim2map(
 ) -> SimMapFunc[KeyType, ValueType, SimType]:
     signature = inspect_signature(func)
-    if len(signature.parameters) == 2 and signature.parameters.keys() == {"x", "y"}:
+    if len(signature.parameters) == 2 and signature.parameters.keys() in (
+        {"x_map", "y"},
+        {"casebase", "query"},
+    ):
+        return cast(SimMapFunc[KeyType, ValueType, SimType], func)
+    elif len(signature.parameters) == 2:
         sim_pair_func = cast(SimPairFunc[ValueType, SimType], func)
         def wrapped_sim_pair_func(
@@ -131,7 +142,9 @@ def sim2map(
         return wrapped_sim_seq_func
-    return cast(SimMapFunc[KeyType, ValueType, SimType], func)
+    raise TypeError(
+        f"Invalid signature for similarity function: {signature.parameters}"
+    )
 def unpack_sim(sim: AnyFloat) -> float:
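
Review note: the `sim2map` hunk above replaces a silent fallthrough with dispatch on the inspected signature, and raises `TypeError` when nothing matches. The sketch below illustrates that kind of dispatch in isolation. It is not the library's code: `to_map_func_sketch` and its inner wrappers are hypothetical names, and the one-parameter branch is an assumption based on the `wrapped_sim_seq_func` line that the hunk only shows in passing.

```python
import inspect
from collections.abc import Callable, Mapping
from typing import Any


def to_map_func_sketch(
    func: Callable[..., Any],
) -> Callable[[Mapping[Any, Any], Any], Mapping[Any, Any]]:
    """Hypothetical stand-in for sim2map: dispatch on the function's signature."""
    params = inspect.signature(func).parameters
    names = set(params)

    # Two parameters with retriever-style names: already map-shaped, pass through.
    if len(params) == 2 and names in ({"x_map", "y"}, {"casebase", "query"}):
        return func

    # Any other two-parameter function is treated as a pairwise measure f(x, y).
    if len(params) == 2:
        def wrapped_pair(x_map: Mapping[Any, Any], y: Any) -> dict[Any, Any]:
            return {key: func(x, y) for key, x in x_map.items()}

        return wrapped_pair

    # One parameter (assumption): a sequence measure f([(x1, y1), (x2, y2), ...]).
    if len(params) == 1:
        def wrapped_seq(x_map: Mapping[Any, Any], y: Any) -> dict[Any, Any]:
            sims = func([(x, y) for x in x_map.values()])
            return dict(zip(x_map.keys(), sims))

        return wrapped_seq

    # Anything else is rejected instead of being cast and returned unchanged.
    raise TypeError(f"Invalid signature for similarity function: {params}")
```

With this shape, a function that takes zero or three parameters fails early at construction time rather than later when it is first called on a casebase.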

{cbrkit-0.6.0 → cbrkit-0.6.2}/cbrkit/retrieval.py RENAMED

@@ -8,8 +8,8 @@ from cbrkit.typing import (
     AnySimFunc,
     Casebase,
     KeyType,
-    RetrieveFunc,
     SimMap,
+    SimMapFunc,
     SimType,
     ValueType,
 )
@@ -76,8 +76,8 @@ class Result(Generic[KeyType, ValueType, SimType]):
 def apply(
     casebase: Casebase[KeyType, ValueType],
     query: ValueType,
-    retrievers: RetrieveFunc[KeyType, ValueType, SimType]
-    | Sequence[RetrieveFunc[KeyType, ValueType, SimType]],
+    retrievers: SimMapFunc[KeyType, ValueType, SimType]
+    | Sequence[SimMapFunc[KeyType, ValueType, SimType]],
 ) -> Result[KeyType, ValueType, SimType]:
     """Applies a query to a Casebase using retriever functions.
@@ -135,7 +135,7 @@ def build(
     limit: int | None = None,
     min_similarity: float | None = None,
     max_similarity: float | None = None,
-) -> RetrieveFunc[KeyType, ValueType, SimType]:
+) -> SimMapFunc[KeyType, ValueType, SimType]:
     """Based on the similarity function this function creates a retriever function.
     The given limit will be applied after filtering for min/max similarity.
@@ -174,10 +174,10 @@ def build(
     sim_func = sim2map(similarity_func)
     def wrapped_func(
-        casebase: Casebase[KeyType, ValueType],
-        query: ValueType,
+        x_map: Casebase[KeyType, ValueType],
+        y: ValueType,
     ) -> SimMap[KeyType, SimType]:
-        similarities = sim_func(casebase, query)
+        similarities = sim_func(x_map, y)
         ranking = _similarities2ranking(similarities)
         if min_similarity is not None:
@@ -200,11 +200,11 @@ def build(
 def load(
     import_names: Sequence[str] | str,
-) -> list[RetrieveFunc[Any, Any, Any]]:
+) -> list[SimMapFunc[Any, Any, Any]]:
     if isinstance(import_names, str):
         import_names = [import_names]
-    retrievers: list[RetrieveFunc] = []
+    retrievers: list[SimMapFunc] = []
     for import_path in import_names:
         obj = load_python(import_path)
@@ -220,11 +220,11 @@ def load(
 def load_map(
     import_names: Collection[str] | str,
-) -> dict[str, RetrieveFunc[Any, Any, Any]]:
+) -> dict[str, SimMapFunc[Any, Any, Any]]:
     if isinstance(import_names, str):
         import_names = [import_names]
-    retrievers: dict[str, RetrieveFunc] = {}
+    retrievers: dict[str, SimMapFunc] = {}
     for import_path in import_names:
         obj = load_python(import_path)

{cbrkit-0.6.0 → cbrkit-0.6.2}/cbrkit/typing.py RENAMED

@@ -28,9 +28,10 @@ SimSeq = Sequence[SimType]
 SimSeqOrMap = SimMap[KeyType, SimType] | SimSeq[SimType]
+# Parameter names must match so that the signature can be inspected, do not add `/` here!
 class SimMapFunc(Protocol[KeyType, ValueType_contra, SimType_cov]):
     def __call__(
-        self, x_map: Mapping[KeyType, ValueType_contra], y: ValueType_contra, /
+        self, x_map: Mapping[KeyType, ValueType_contra], y: ValueType_contra
     ) -> SimMap[KeyType, SimType_cov]:
         ...
@@ -42,9 +43,8 @@ class SimSeqFunc(Protocol[ValueType_contra, SimType_cov]):
         ...
-# Parameter names must match so that the signature can be inspected, do not add `/` here!
 class SimPairFunc(Protocol[ValueType_contra, SimType_cov]):
-    def __call__(self, x: ValueType_contra, y: ValueType_contra) -> SimType_cov:
+    def __call__(self, x: ValueType_contra, y: ValueType_contra, /) -> SimType_cov:
         ...
@@ -54,8 +54,6 @@ AnySimFunc = (
     | SimPairFunc[ValueType, SimType]
 )
-RetrieveFunc = SimMapFunc[KeyType, ValueType, SimType]
 class AggregatorFunc(Protocol[KeyType, SimType_contra]):
     def __call__(

{cbrkit-0.6.0 → cbrkit-0.6.2}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "cbrkit"
-version = "0.6.0"
+version = "0.6.2"
 description = "Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI."
 authors = ["Mirko Lenz <mirko@mirkolenz.com>"]
 license = "MIT"
@@ -52,13 +52,13 @@ sentence-transformers = { version = "^2.2", optional = true }
 spacy = { version = "^3.7", optional = true }
 torch = { version = "^2.1.1", optional = true }
 transformers = { version = "^4.35", optional = true }
-typer = { version = "^0.9", extras = ["all"], optional = true }
+typer = { version = ">=0.9, <1.0", extras = ["all"], optional = true }
 uvicorn = { version = ">=0.24, <1.0", optional = true, extras = ["standard"] }
-xmltodict = "^0.13"
+xmltodict = ">=0.13, <1.0"
 [tool.poetry.group.dev.dependencies]
 pytest = "^8.0.0"
-pytest-cov = "^4.1"
+pytest-cov = "^5.0.0"
 [tool.poetry.group.docs.dependencies]
 pdoc = "^14.4"
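
Review note: the `typer` and `xmltodict` constraints move from caret form to an explicit range, which matches the PKG-INFO change from `<0.10` and `<0.14` to `<1.0`. The sketch below works through the arithmetic under the usual caret rule (bump the leftmost non-zero component). `caret_range` and `allows` are hypothetical helpers that handle plain numeric versions only, and the candidate version `0.12.0` is an arbitrary illustration, not a statement about any actual release.

```python
def as_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '0.9' or '5.0.0'."""
    return tuple(int(part) for part in version.split("."))


def caret_range(version: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint ^version."""
    parts = list(as_tuple(version))
    # Bump the leftmost non-zero component; if all are zero, bump the last one given.
    idx = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)
    upper = parts[: idx + 1] + [0] * (len(parts) - idx - 1)
    upper[idx] += 1
    return version, ".".join(str(part) for part in upper)


def allows(candidate: str, lower: str, upper: str) -> bool:
    """Check lower <= candidate < upper using tuple comparison."""
    return as_tuple(lower) <= as_tuple(candidate) < as_tuple(upper)


old_lower, old_upper = caret_range("0.9")        # the 0.6.0 constraint ^0.9
old_allows = allows("0.12.0", old_lower, old_upper)
new_allows = allows("0.12.0", "0.9", "1.0")      # the 0.6.2 constraint >=0.9, <1.0
```

Under these assumptions the caret form caps a `0.x` dependency at the next minor version, while the explicit range admits every `0.x` release from the lower bound up to, but not including, `1.0`.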