PyPI - dataeval - Versions diffs - 0.74.2__tar.gz → 0.75.0__tar.gz - Mend

dataeval 0.74.2tar.gz → 0.75.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (77)

dataeval-0.75.0/PKG-INFO ADDED

@@ -0,0 +1,136 @@
+Metadata-Version: 2.1
+Name: dataeval
+Version: 0.75.0
+Summary: DataEval provides a simple interface to characterize image data and its impact on model performance across classification and object-detection tasks
+Home-page: https://dataeval.ai/
+License: MIT
+Author: Andrew Weng
+Author-email: andrew.weng@ariacoustics.com
+Maintainer: ARiA
+Maintainer-email: dataeval@ariacoustics.com
+Requires-Python: >=3.9,<3.13
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Topic :: Scientific/Engineering
+Provides-Extra: all
+Requires-Dist: matplotlib ; extra == "all"
+Requires-Dist: numpy (>=1.24.3)
+Requires-Dist: pillow (>=10.3.0)
+Requires-Dist: requests
+Requires-Dist: scikit-learn (>=1.5.0)
+Requires-Dist: scipy (>=1.10)
+Requires-Dist: torch (>=2.2.0)
+Requires-Dist: torchvision (>=0.17.0)
+Requires-Dist: tqdm
+Requires-Dist: typing-extensions (>=4.12) ; python_version >= "3.9" and python_version < "4.0"
+Requires-Dist: xxhash (>=3.3)
+Project-URL: Documentation, https://dataeval.readthedocs.io/
+Project-URL: Repository, https://github.com/aria-ml/dataeval/
+Description-Content-Type: text/markdown
+# DataEval
+To view our extensive collection of tutorials, how-to's, explanation guides, and reference material, please visit our documentation on **[Read the Docs](https://dataeval.readthedocs.io/)**
+## About DataEval
+<!-- start tagline -->
+DataEval curates datasets to train and test performant, robust, unbiased and reliable AI models and monitors for data shifts that impact performance of deployed models.
+<!-- end tagline -->
+### Our mission
+<!-- start needs -->
+DataEval is an effective, powerful, and reliable set of tools for any T&E engineer. Throughout all stages of the machine learning lifecycle, DataEval supports **model development, data analysis, and monitoring with state-of-the-art algorithms to help you solve difficult problems. With a focus on computer vision tasks, DataEval provides simple, but effective metrics for performance estimation, bias detection, and dataset linting.
+<!-- end needs -->
+<!-- start JATIC interop -->
+DataEval is easy to install, supports a wide range of Python versions, and is compatible with many of the most popular packages in the scientific and T&E communities.
+DataEval also has native interopability between JATIC's suite of tools when using MAITE-compliant datasets and models.
+<!-- end JATIC interop -->
+## Getting Started
+**Python versions:** 3.9 - 3.12
+**Supported packages**: *NumPy*, *Pandas*, *Sci-kit learn*, *MAITE*, *NRTK*, *Gradient*
+Choose your preferred method of installation below or follow our [installation guide](https://dataeval.readthedocs.io/en/v0.74.2/installation.html).
+* [Installing with pip](#installing-with-pip)
+* [Installing with conda/mamba](#installing-with-conda)
+* [Installing from GitHub](#installing-from-github)
+### **Installing with pip**
+You can install DataEval directly from pypi.org using the following command.  The optional dependencies of DataEval are `all`.
+```
+pip install dataeval[all]
+```
+### **Installing with conda**
+DataEval can be installed in a Conda/Mamba environment using the provided `environment.yaml` file.  As some dependencies
+are installed from the `pytorch` channel, the channel is specified in the below example.
+```
+micromamba create -f environment\environment.yaml -c pytorch
+```
+### **Installing from GitHub**
+To install DataEval from source locally on Ubuntu, you will need `git-lfs` to download larger, binary source files and `poetry` for project dependency management.
+```
+sudo apt-get install git-lfs
+pip install poetry
+```
+Pull the source down and change to the DataEval project directory.
+```
+git clone https://github.com/aria-ml/dataeval.git
+cd dataeval
+```
+Install DataEval with optional dependencies for development.
+```
+poetry install --all-extras --with dev
+```
+Now that DataEval is installed, you can run commands in the poetry virtual environment by prefixing shell commands with `poetry run`, or activate the virtual environment directly in the shell.
+```
+poetry shell
+```
+## Contact Us
+If you have any questions, feel free to reach out to the people below:
+- **POC**: Scott Swan @scott.swan
+- **DPOC**: Andrew Weng @aweng
+## Acknowledgement
+<!-- start attribution -->
+### Alibi-Detect
+This project uses code from the [Alibi-Detect](https://github.com/SeldonIO/alibi-detect) Python library developed by SeldonIO.\
+Additional documentation from their developers is available on the [Alibi-Detect documentation page](https://docs.seldon.io/projects/alibi-detect/en/stable/).
+### CDAO Funding Acknowledgement
+This material is based upon work supported by the Chief Digital and Artificial Intelligence Office under Contract No. W519TC-23-9-2033. The views and conclusions contained herein are those of the author(s) and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.
+<!-- end attribution -->

dataeval-0.75.0/README.md ADDED

@@ -0,0 +1,97 @@
+# DataEval
+To view our extensive collection of tutorials, how-to's, explanation guides, and reference material, please visit our documentation on **[Read the Docs](https://dataeval.readthedocs.io/)**
+## About DataEval
+<!-- start tagline -->
+DataEval curates datasets to train and test performant, robust, unbiased and reliable AI models and monitors for data shifts that impact performance of deployed models.
+<!-- end tagline -->
+### Our mission
+<!-- start needs -->
+DataEval is an effective, powerful, and reliable set of tools for any T&E engineer. Throughout all stages of the machine learning lifecycle, DataEval supports **model development, data analysis, and monitoring with state-of-the-art algorithms to help you solve difficult problems. With a focus on computer vision tasks, DataEval provides simple, but effective metrics for performance estimation, bias detection, and dataset linting.
+<!-- end needs -->
+<!-- start JATIC interop -->
+DataEval is easy to install, supports a wide range of Python versions, and is compatible with many of the most popular packages in the scientific and T&E communities.
+DataEval also has native interopability between JATIC's suite of tools when using MAITE-compliant datasets and models.
+<!-- end JATIC interop -->
+## Getting Started
+**Python versions:** 3.9 - 3.12
+**Supported packages**: *NumPy*, *Pandas*, *Sci-kit learn*, *MAITE*, *NRTK*, *Gradient*
+Choose your preferred method of installation below or follow our [installation guide](https://dataeval.readthedocs.io/en/v0.74.2/installation.html).
+* [Installing with pip](#installing-with-pip)
+* [Installing with conda/mamba](#installing-with-conda)
+* [Installing from GitHub](#installing-from-github)
+### **Installing with pip**
+You can install DataEval directly from pypi.org using the following command.  The optional dependencies of DataEval are `all`.
+```
+pip install dataeval[all]
+```
+### **Installing with conda**
+DataEval can be installed in a Conda/Mamba environment using the provided `environment.yaml` file.  As some dependencies
+are installed from the `pytorch` channel, the channel is specified in the below example.
+```
+micromamba create -f environment\environment.yaml -c pytorch
+```
+### **Installing from GitHub**
+To install DataEval from source locally on Ubuntu, you will need `git-lfs` to download larger, binary source files and `poetry` for project dependency management.
+```
+sudo apt-get install git-lfs
+pip install poetry
+```
+Pull the source down and change to the DataEval project directory.
+```
+git clone https://github.com/aria-ml/dataeval.git
+cd dataeval
+```
+Install DataEval with optional dependencies for development.
+```
+poetry install --all-extras --with dev
+```
+Now that DataEval is installed, you can run commands in the poetry virtual environment by prefixing shell commands with `poetry run`, or activate the virtual environment directly in the shell.
+```
+poetry shell
+```
+## Contact Us
+If you have any questions, feel free to reach out to the people below:
+- **POC**: Scott Swan @scott.swan
+- **DPOC**: Andrew Weng @aweng
+## Acknowledgement
+<!-- start attribution -->
+### Alibi-Detect
+This project uses code from the [Alibi-Detect](https://github.com/SeldonIO/alibi-detect) Python library developed by SeldonIO.\
+Additional documentation from their developers is available on the [Alibi-Detect documentation page](https://docs.seldon.io/projects/alibi-detect/en/stable/).
+### CDAO Funding Acknowledgement
+This material is based upon work supported by the Chief Digital and Artificial Intelligence Office under Contract No. W519TC-23-9-2033. The views and conclusions contained herein are those of the author(s) and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.
+<!-- end attribution -->

{dataeval-0.74.2 → dataeval-0.75.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "dataeval"
-version = "0.74.2" # dynamic
+version = "0.75.0" # dynamic
 description = "DataEval provides a simple interface to characterize image data and its impact on model performance across classification and object-detection tasks"
 license = "MIT"
 readme = "README.md"
@@ -44,20 +44,20 @@ packages = [
 python = ">=3.9,<3.13"
 numpy = {version = ">=1.24.3"}
 pillow = {version = ">=10.3.0"}
+requests = {version = "*"}
 scipy = {version = ">=1.10"}
 scikit-learn = {version = ">=1.5.0"}
+torch = {version = ">=2.2.0", source = "pytorch"}
+torchvision = {version = ">=0.17.0", source = "pytorch"}
 tqdm = {version = "*"}
-typing-extensions = {version = ">=4.12", python = ">=3.9,<3.10"}  # ParamSpec
+typing-extensions = {version = ">=4.12", python = "^3.9"}  # ParamSpec
 xxhash = {version = ">=3.3"}
 # optional
 matplotlib = {version = "*", optional = true}
-torch = {version = ">=2.2.0", source = "pytorch", optional = true}
-torchvision = {version = ">=0.17.0", source = "pytorch", optional = true}
 [tool.poetry.extras]
-torch = ["torch", "torchvision"]
-all = ["matplotlib", "torch", "torchvision"]
+all = ["matplotlib"]
 [tool.poetry.group.dev]
 optional = true
@@ -65,9 +65,10 @@ optional = true
 [tool.poetry.group.dev.dependencies]
 nox = {version = "*", extras = ["uv"]}
 uv = {version = "*"}
-poetry = {version = "*"}
+poetry = {version = "<2"}
 poetry-lock-groups-plugin = {version = "*"}
 poetry2conda = {version = "*"}
+numpy = {version = ">=2.0.2"}
 # lint
 ruff = {version = "*"}
 codespell = {version = "*", extras = ["toml"]}
@@ -76,14 +77,12 @@ pytest = {version = "*"}
 pytest-cov = {version = "*"}
 pytest-xdist = {version = "*"}
 coverage = {version = "*", extras = ["toml"]}
-torchmetrics = {version = ">=1.0.0", source = "pytorch"}
 # type
 pyright = {version = "*", extras = ["nodejs"]}
 # prototype
 maite = {version = "*"}
 pandas = {version = "*"}
 seaborn = {version = "*"}
-numpy = {version = ">=2.0.2"}
 # docs
 certifi = {version = ">=2024.07.04"}
 enum_tools = {version = ">=0.12.0", extras = ["sphinx"]}
@@ -93,9 +92,11 @@ jupyter-client = {version = ">=8.6.0"}
 jupyter-cache = {version = "*"}
 myst-nb = {version = ">=1.0.0"}
 pydata-sphinx-theme = {version = ">=0.15.4"}
+sphinx-autoapi = {version = "*"}
 sphinx-design = {version = "*"}
 sphinx-tabs = {version = "*"}
 Sphinx = {version = ">=7.2.6"}
+torchmetrics = {version = ">=1.0.0", source = "pytorch"}
 markupsafe = {version = "<3.0.2", optional = true}
 [[tool.poetry.source]]
@@ -136,8 +137,6 @@ parallel = true
 [tool.coverage.report]
 exclude_also = [
   "raise NotImplementedError",
-  "if _IS_TORCH_AVAILABLE",
-  "if _IS_TORCHVISION_AVAILABLE",
 ]
 include = ["*/src/dataeval/*"]
 omit = [
@@ -155,7 +154,7 @@ exclude = [
   ".jupyter_cache",
   "*env*",
   "output",
-  "_build",
+  "build",
   ".nox",
   ".tox",
   "prototype",
@@ -185,7 +184,7 @@ docstring-code-format = true
 docstring-code-line-length = "dynamic"
 [tool.codespell]
-skip = './*env*,./prototype,./output,./docs/_build,./docs/.jupyter_cache,CHANGELOG.md,poetry.lock,*.html'
+skip = './*env*,./prototype,./output,./docs/build,./docs/.jupyter_cache,CHANGELOG.md,poetry.lock,*.html'
 ignore-words-list = ["Hart"]
 [build-system]

dataeval-0.75.0/src/dataeval/__init__.py ADDED

@@ -0,0 +1,40 @@
+"""
+DataEval provides a simple interface to characterize image data and its impact on model performance
+across classification and object-detection tasks. It also provides capabilities to select and curate
+datasets to test and train performant, robust, unbiased and reliable AI models and monitor for data
+shifts that impact performance of deployed models.
+"""
+from __future__ import annotations
+__all__ = ["detectors", "log", "metrics", "utils", "workflows"]
+__version__ = "0.75.0"
+import logging
+from dataeval import detectors, metrics, utils, workflows
+logging.getLogger(__name__).addHandler(logging.NullHandler())
+def log(level: int = logging.DEBUG, handler: logging.Handler | None = None) -> None:
+    """
+    Helper for quickly adding a StreamHandler to the logger. Useful for debugging.
+    Parameters
+    ----------
+    level : int, default logging.DEBUG(10)
+        Set the logging level for the logger
+    handler : logging.Handler, optional
+        Sets the logging handler for the logger if provided, otherwise logger will be
+        provided with a StreamHandler
+    """
+    import logging
+    logger = logging.getLogger(__name__)
+    if handler is None:
+        handler = logging.StreamHandler() if handler is None else handler
+        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
+    logger.addHandler(handler)
+    logger.setLevel(level)
+    logger.debug(f"Added logging handler {handler} to logger: {__name__}")

dataeval-0.75.0/src/dataeval/detectors/drift/__init__.py ADDED

@@ -0,0 +1,22 @@
+"""
+:term:`Drift` detectors identify if the statistical properties of the data has changed.
+"""
+__all__ = [
+    "DriftCVM",
+    "DriftKS",
+    "DriftMMD",
+    "DriftMMDOutput",
+    "DriftOutput",
+    "DriftUncertainty",
+    "preprocess_drift",
+    "updates",
+]
+from dataeval.detectors.drift import updates
+from dataeval.detectors.drift.base import DriftOutput
+from dataeval.detectors.drift.cvm import DriftCVM
+from dataeval.detectors.drift.ks import DriftKS
+from dataeval.detectors.drift.mmd import DriftMMD, DriftMMDOutput
+from dataeval.detectors.drift.torch import preprocess_drift
+from dataeval.detectors.drift.uncertainty import DriftUncertainty

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/base.py RENAMED

@@ -8,7 +8,7 @@ Licensed under Apache Software License (Apache 2.0)
 from __future__ import annotations
-__all__ = ["DriftOutput"]
+__all__ = []
 from abc import ABC, abstractmethod
 from dataclasses import dataclass

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/cvm.py RENAMED

@@ -8,7 +8,7 @@ Licensed under Apache Software License (Apache 2.0)
 from __future__ import annotations
-__all__ = ["DriftCVM"]
+__all__ = []
 from typing import Callable, Literal

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/ks.py RENAMED

@@ -8,7 +8,7 @@ Licensed under Apache Software License (Apache 2.0)
 from __future__ import annotations
-__all__ = ["DriftKS"]
+__all__ = []
 from typing import Callable, Literal

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/mmd.py RENAMED

@@ -8,7 +8,7 @@ Licensed under Apache Software License (Apache 2.0)
 from __future__ import annotations
-__all__ = ["DriftMMD", "DriftMMDOutput"]
+__all__ = []
 from dataclasses import dataclass
 from typing import Callable
@@ -17,9 +17,10 @@ import torch
 from numpy.typing import ArrayLike
 from dataeval.detectors.drift.base import BaseDrift, DriftBaseOutput, UpdateStrategy, preprocess_x, update_x_ref
-from dataeval.detectors.drift.torch import _GaussianRBF, _mmd2_from_kernel_matrix, get_device
+from dataeval.detectors.drift.torch import GaussianRBF, mmd2_from_kernel_matrix
 from dataeval.interop import as_numpy
 from dataeval.output import set_metadata
+from dataeval.utils.torch.internal import get_device
 @dataclass(frozen=True)
@@ -109,7 +110,7 @@ class DriftMMD(BaseDrift):
         # initialize kernel
         sigma_tensor = torch.from_numpy(as_numpy(sigma)).to(self.device) if sigma is not None else None
-        self._kernel = _GaussianRBF(sigma_tensor).to(self.device)
+        self._kernel = GaussianRBF(sigma_tensor).to(self.device)
         # compute kernel matrix for the reference data
         if self._infer_sigma or isinstance(sigma_tensor, torch.Tensor):
@@ -150,9 +151,9 @@ class DriftMMD(BaseDrift):
         n = x.shape[0]
         kernel_mat = self._kernel_matrix(x_ref, torch.from_numpy(x).to(self.device))
         kernel_mat = kernel_mat - torch.diag(kernel_mat.diag())  # zero diagonal
-        mmd2 = _mmd2_from_kernel_matrix(kernel_mat, n, permute=False, zero_diag=False)
+        mmd2 = mmd2_from_kernel_matrix(kernel_mat, n, permute=False, zero_diag=False)
         mmd2_permuted = torch.Tensor(
-            [_mmd2_from_kernel_matrix(kernel_mat, n, permute=True, zero_diag=False) for _ in range(self.n_permutations)]
+            [mmd2_from_kernel_matrix(kernel_mat, n, permute=True, zero_diag=False) for _ in range(self.n_permutations)]
         )
         mmd2, mmd2_permuted = mmd2.detach().cpu(), mmd2_permuted.detach().cpu()
         p_val = (mmd2 <= mmd2_permuted).float().mean()

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/torch.py RENAMED

@@ -17,10 +17,10 @@ import torch
 import torch.nn as nn
 from numpy.typing import NDArray
-from dataeval.utils.torch.utils import get_device, predict_batch
+from dataeval.utils.torch.internal import get_device, predict_batch
-def _mmd2_from_kernel_matrix(
+def mmd2_from_kernel_matrix(
     kernel_mat: torch.Tensor, m: int, permute: bool = False, zero_diag: bool = True
 ) -> torch.Tensor:
     """
@@ -127,7 +127,7 @@ def _squared_pairwise_distance(
 def sigma_median(x: torch.Tensor, y: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
     """
-    Bandwidth estimation using the median heuristic :cite:t:`Gretton2012`.
+    Bandwidth estimation using the median heuristic `Gretton2012`
     Parameters
     ----------
@@ -151,7 +151,7 @@ def sigma_median(x: torch.Tensor, y: torch.Tensor, dist: torch.Tensor) -> torch.
     return sigma
-class _GaussianRBF(nn.Module):
+class GaussianRBF(nn.Module):
     """
     Gaussian RBF kernel: k(x,y) = exp(-(1/(2*sigma^2)||x-y||^2).
@@ -179,18 +179,18 @@ class _GaussianRBF(nn.Module):
     ) -> None:
         super().__init__()
         init_sigma_fn = sigma_median if init_sigma_fn is None else init_sigma_fn
-        self.config = {
+        self.config: dict[str, Any] = {
             "sigma": sigma,
             "trainable": trainable,
             "init_sigma_fn": init_sigma_fn,
         }
         if sigma is None:
-            self.log_sigma = nn.Parameter(torch.empty(1), requires_grad=trainable)
-            self.init_required = True
+            self.log_sigma: nn.Parameter = nn.Parameter(torch.empty(1), requires_grad=trainable)
+            self.init_required: bool = True
         else:
             sigma = sigma.reshape(-1)  # [Ns,]
-            self.log_sigma = nn.Parameter(sigma.log(), requires_grad=trainable)
-            self.init_required = False
+            self.log_sigma: nn.Parameter = nn.Parameter(sigma.log(), requires_grad=trainable)
+            self.init_required: bool = False
         self.init_sigma_fn = init_sigma_fn
         self.trainable = trainable
@@ -200,8 +200,8 @@ class _GaussianRBF(nn.Module):
     def forward(
         self,
-        x: np.ndarray | torch.Tensor,
-        y: np.ndarray | torch.Tensor,
+        x: np.ndarray[Any, Any] | torch.Tensor,
+        y: np.ndarray[Any, Any] | torch.Tensor,
         infer_sigma: bool = False,
     ) -> torch.Tensor:
         x, y = torch.as_tensor(x), torch.as_tensor(y)
@@ -213,7 +213,7 @@ class _GaussianRBF(nn.Module):
             sigma = self.init_sigma_fn(x, y, dist)
             with torch.no_grad():
                 self.log_sigma.copy_(sigma.log().clone())
-            self.init_required = False
+            self.init_required: bool = False
         gamma = 1.0 / (2.0 * self.sigma**2)  # [Ns,]
         # TODO: do matrix multiplication after all?

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/drift/uncertainty.py RENAMED

@@ -8,7 +8,7 @@ Licensed under Apache Software License (Apache 2.0)
 from __future__ import annotations
-__all__ = ["DriftUncertainty"]
+__all__ = []
 from functools import partial
 from typing import Callable, Literal
@@ -20,7 +20,8 @@ from scipy.stats import entropy
 from dataeval.detectors.drift.base import DriftOutput, UpdateStrategy
 from dataeval.detectors.drift.ks import DriftKS
-from dataeval.detectors.drift.torch import get_device, preprocess_drift
+from dataeval.detectors.drift.torch import preprocess_drift
+from dataeval.utils.torch.internal import get_device
 def classifier_uncertainty(

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/linters/clusterer.py RENAMED

@@ -1,6 +1,6 @@
 from __future__ import annotations
-__all__ = ["ClustererOutput", "Clusterer"]
+__all__ = []
 from dataclasses import dataclass
 from typing import Any, Iterable, NamedTuple, cast
@@ -147,12 +147,6 @@ class Clusterer:
     ----
     The Clusterer works best when the length of the feature dimension, P, is less than 500.
     If flattening a CxHxW image results in a dimension larger than 500, then it is recommended to reduce the dimensions.
-    Example
-    -------
-    Initialize the Clusterer class:
-    >>> cluster = Clusterer(dataset)
     """
     def __init__(self, dataset: ArrayLike) -> None:
@@ -506,6 +500,7 @@ class Clusterer:
         Example
         -------
+        >>> cluster = Clusterer(clusterer_images)
         >>> cluster.evaluate()
         ClustererOutput(outliers=[18, 21, 34, 35, 45], potential_outliers=[13, 15, 42], duplicates=[[9, 24], [23, 48]], potential_duplicates=[[1, 11]])
         """  # noqa: E501

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/linters/duplicates.py RENAMED

@@ -1,6 +1,6 @@
 from __future__ import annotations
-__all__ = ["DuplicatesOutput", "Duplicates"]
+__all__ = []
 from dataclasses import dataclass
 from typing import Generic, Iterable, Sequence, TypeVar, overload
@@ -51,13 +51,6 @@ class Duplicates:
     ----------
     only_exact : bool, default False
         Only inspect the dataset for exact image matches
-    Example
-    -------
-    Initialize the Duplicates class:
-    >>> all_dupes = Duplicates()
-    >>> exact_dupes = Duplicates(only_exact=True)
     """
     def __init__(self, only_exact: bool = False) -> None:
@@ -73,7 +66,8 @@ class Duplicates:
         if not self.only_exact:
             near_dict: dict[int, list] = {}
             for i, value in enumerate(stats["pchash"]):
-                near_dict.setdefault(value, []).append(i)
+                if value:
+                    near_dict.setdefault(value, []).append(i)
             near = [sorted(v) for v in near_dict.values() if len(v) > 1 and not any(set(v).issubset(x) for x in exact)]
         else:
             near = []
@@ -112,6 +106,7 @@ class Duplicates:
         Example
         -------
+        >>> exact_dupes = Duplicates(only_exact=True)
         >>> exact_dupes.from_stats([hashes1, hashes2])
         DuplicatesOutput(exact=[{0: [3, 20]}, {0: [16], 1: [12]}], near=[])
         """
@@ -159,7 +154,8 @@ class Duplicates:
         Example
         -------
-        >>> all_dupes.evaluate(images)
+        >>> all_dupes = Duplicates()
+        >>> all_dupes.evaluate(duplicate_images)
         DuplicatesOutput(exact=[[3, 20], [16, 37]], near=[[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]])
         """  # noqa: E501
         self.stats = hashstats(data)

{dataeval-0.74.2 → dataeval-0.75.0}/src/dataeval/detectors/linters/outliers.py RENAMED

@@ -1,6 +1,6 @@
 from __future__ import annotations
-__all__ = ["OutliersOutput", "Outliers"]
+__all__ = []
 from dataclasses import dataclass
 from typing import Generic, Iterable, Literal, Sequence, TypeVar, Union, overload
@@ -188,6 +188,7 @@ class Outliers:
         -------
         Evaluate the dataset:
+        >>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
         >>> results = outliers.from_stats([stats1, stats2])
         >>> len(results)
         2
@@ -248,7 +249,8 @@ class Outliers:
         -------
         Evaluate the dataset:
-        >>> results = outliers.evaluate(images)
+        >>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
+        >>> results = outliers.evaluate(outlier_images)
         >>> list(results.issues)
         [10, 12]
         >>> results.issues[10]

dataeval-0.75.0/src/dataeval/detectors/ood/__init__.py ADDED

@@ -0,0 +1,8 @@
+"""
+Out-of-distribution (OOD)` detectors identify data that is different from the data used to train a particular model.
+"""
+__all__ = ["OODOutput", "OODScoreOutput", "OOD_AE"]
+from dataeval.detectors.ood.ae import OOD_AE
+from dataeval.detectors.ood.output import OODOutput, OODScoreOutput
