PyPI - dataeval - Versions diffs - 1.0.5__tar.gz → 1.1.0rc0__tar.gz - Mend

dataeval 1.0.5.tar.gz → 1.1.0rc0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (141)

{dataeval-1.0.5 → dataeval-1.1.0rc0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dataeval
-Version: 1.0.5
+Version: 1.1.0rc0
 Summary: DataEval provides a simple interface to characterize image data and its impact on model performance across classification and object-detection tasks
 Project-URL: Homepage, https://dataeval.ai/
 Project-URL: Repository, https://github.com/aria-ml/dataeval/
@@ -21,6 +21,7 @@ Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Scientific/Engineering
 Requires-Python: >=3.10
 Requires-Dist: lightgbm>=4
+Requires-Dist: maite>=0.9.4
 Requires-Dist: numba>=0.61.0
 Requires-Dist: numpy>=1.24.2
 Requires-Dist: polars>=1.0.0
@@ -37,23 +38,27 @@ Requires-Dist: torchvision>=0.17.0; extra == 'cpu'
 Provides-Extra: cu118
 Requires-Dist: torch>=2.2.0; extra == 'cu118'
 Requires-Dist: torchvision>=0.17.0; extra == 'cu118'
-Provides-Extra: cu124
-Requires-Dist: torch>=2.2.0; extra == 'cu124'
-Requires-Dist: torchvision>=0.17.0; extra == 'cu124'
 Provides-Extra: cu128
 Requires-Dist: torch>=2.2.0; extra == 'cu128'
 Requires-Dist: torchvision>=0.17.0; extra == 'cu128'
+Provides-Extra: litert
+Requires-Dist: ai-edge-litert>=2.0; (python_version <= '3.14') and extra == 'litert'
 Provides-Extra: onnx
-Requires-Dist: onnx; extra == 'onnx'
-Requires-Dist: onnxruntime>=1.14.0; extra == 'onnx'
+Requires-Dist: onnx>=1.14.0; extra == 'onnx'
+Requires-Dist: onnxruntime<1.24,>=1.14.0; (python_version == '3.10') and extra == 'onnx'
+Requires-Dist: onnxruntime>=1.14.0; (python_version >= '3.11') and extra == 'onnx'
 Provides-Extra: onnx-gpu
-Requires-Dist: onnx; extra == 'onnx-gpu'
-Requires-Dist: onnxruntime-gpu>=1.14.0; extra == 'onnx-gpu'
+Requires-Dist: onnx>=1.14.0; extra == 'onnx-gpu'
+Requires-Dist: onnxruntime-gpu<1.24,>=1.14.0; (python_version == '3.10') and extra == 'onnx-gpu'
+Requires-Dist: onnxruntime-gpu>=1.14.0; (python_version >= '3.11') and extra == 'onnx-gpu'
+Provides-Extra: ontology
+Requires-Dist: rdflib>=7.0; extra == 'ontology'
 Provides-Extra: opencv
 Requires-Dist: opencv-python-headless>=4.8.0; extra == 'opencv'
 Description-Content-Type: text/markdown
 <!-- markdownlint-disable MD041 -->
 ![dataeval-logo](docs/source/_static/images/DataEval_ImageText.png)
 <!-- :auto badges: -->
@@ -130,14 +135,28 @@ You can install DataEval directly from pypi.org using the following command.
 pip install dataeval
 ```
+By default, PyTorch is installed from PyPI which includes CUDA support on Linux.
+To install a specific PyTorch variant, use `--extra-index-url`:
+```bash
+# CPU only
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cpu
+# CUDA 11.8
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cu118
+# CUDA 12.8
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cu128
+```
 ### **Installing with conda**
 DataEval can be installed in a Conda/Mamba environment using the provided
-`environment.yaml` file. As some dependencies are installed from the `pytorch`
+`environment.yml` file. As some dependencies are installed from the `pytorch`
 channel, the channel is specified in the below example.
 ```bash
-micromamba create -f environment\environment.yaml -c pytorch
+micromamba create -f environment\environment.yml -c pytorch
 ```
 ### **Installing from GitHub**
@@ -401,7 +420,7 @@ shape: (3, 5)
 A result with many large groups is a signal that your dataset contains
 repeated collection events. Before training, remove all but one sample from
-each group. See the [deduplication how-to guide](./docs/source/notebooks/h2_deduplicate.md)
+each group. See the [deduplication how-to guide](./docs/source/notebooks/h2_deduplicate.py)
 for a complete walkthrough, including how to choose which sample to keep.
 ### Where to go next

{dataeval-1.0.5 → dataeval-1.1.0rc0}/README.md RENAMED

@@ -1,4 +1,5 @@
 <!-- markdownlint-disable MD041 -->
 ![dataeval-logo](docs/source/_static/images/DataEval_ImageText.png)
 <!-- :auto badges: -->
@@ -75,14 +76,28 @@ You can install DataEval directly from pypi.org using the following command.
 pip install dataeval
 ```
+By default, PyTorch is installed from PyPI which includes CUDA support on Linux.
+To install a specific PyTorch variant, use `--extra-index-url`:
+```bash
+# CPU only
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cpu
+# CUDA 11.8
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cu118
+# CUDA 12.8
+pip install dataeval --extra-index-url https://download.pytorch.org/whl/cu128
+```
 ### **Installing with conda**
 DataEval can be installed in a Conda/Mamba environment using the provided
-`environment.yaml` file. As some dependencies are installed from the `pytorch`
+`environment.yml` file. As some dependencies are installed from the `pytorch`
 channel, the channel is specified in the below example.
 ```bash
-micromamba create -f environment\environment.yaml -c pytorch
+micromamba create -f environment\environment.yml -c pytorch
 ```
 ### **Installing from GitHub**
@@ -346,7 +361,7 @@ shape: (3, 5)
 A result with many large groups is a signal that your dataset contains
 repeated collection events. Before training, remove all but one sample from
-each group. See the [deduplication how-to guide](./docs/source/notebooks/h2_deduplicate.md)
+each group. See the [deduplication how-to guide](./docs/source/notebooks/h2_deduplicate.py)
 for a complete walkthrough, including how to choose which sample to keep.
 ### Where to go next

{dataeval-1.0.5 → dataeval-1.1.0rc0}/pyproject.toml RENAMED

@@ -31,6 +31,7 @@ classifiers = [
   "Topic :: Scientific/Engineering",
 ]
 dependencies = [
+  "maite>=0.9.4",
   "numba>=0.61.0",
   "lightgbm>=4",
   "numpy>=1.24.2",
@@ -47,17 +48,43 @@ dependencies = [
 [project.optional-dependencies]
 cpu = ["torch>=2.2.0", "torchvision>=0.17.0"]
 cu118 = ["torch>=2.2.0", "torchvision>=0.17.0"]
-cu124 = ["torch>=2.2.0", "torchvision>=0.17.0"]
 cu128 = ["torch>=2.2.0", "torchvision>=0.17.0"]
+litert = ["ai-edge-litert>=2.0; python_version <= '3.14'"]
 opencv = ["opencv-python-headless>=4.8.0"]
-onnx = ["onnx", "onnxruntime>=1.14.0"]
-onnx-gpu = ["onnx", "onnxruntime-gpu>=1.14.0"]
+onnx = [
+  "onnx>=1.14.0",
+  "onnxruntime>=1.14.0,<1.24; python_version == '3.10'",
+  "onnxruntime>=1.14.0; python_version >= '3.11'",
+]
+onnx-gpu = [
+  "onnx>=1.14.0",
+  "onnxruntime-gpu>=1.14.0,<1.24; python_version == '3.10'",
+  "onnxruntime-gpu>=1.14.0; python_version >= '3.11'",
+]
+ontology = ["rdflib>=7.0"]
 [project.urls]
 Homepage = "https://dataeval.ai/"
 Repository = "https://github.com/aria-ml/dataeval/"
 Documentation = "https://dataeval.readthedocs.io/"
+# MAITE interoperability entry-points.
+[project.entry-points."maite.tasks"]
+dataeval_balance = "dataeval.bias:Balance"
+dataeval_diversity = "dataeval.bias:Diversity"
+dataeval_parity = "dataeval.bias:Parity"
+dataeval_outliers = "dataeval.quality:Outliers"
+dataeval_duplicates = "dataeval.quality:Duplicates"
+dataeval_sufficiency = "dataeval.performance:Sufficiency"
+[project.entry-points."maite.protocols.image_classification.Model"]
+dataeval_OnnxImageClassifier = "dataeval.models:OnnxImageClassifier"
+dataeval_LiteRtImageClassifier = "dataeval.models:LiteRtImageClassifier"
+[project.entry-points."maite.protocols.object_detection.Model"]
+dataeval_OnnxObjectDetector = "dataeval.models:OnnxObjectDetector"
+dataeval_LiteRtObjectDetector = "dataeval.models:LiteRtObjectDetector"
 [dependency-groups]
 base = [
   "uv>=0.8.0",
@@ -65,7 +92,7 @@ base = [
 lock = [
   { include-group = "base" },
   "pyproject2conda>=0.22",
-  "poetry>=2; python_version<'3.14'",
+  "poetry==2.2.0; python_version<'3.14'",
 ]
 lint = [
   "ruff>=0.11",
@@ -73,19 +100,15 @@ lint = [
 ]
 docsync = [
   "jupytext>=1.19.1",
-  "mdformat-myst",
-]
-doclint = [
-  { include-group = "docs"},
-  "ruff>=0.11",
-  "pyright[nodejs]>=1.1.400",
 ]
 test = [
   "coverage[toml]>=7.6",
   "onnx>=1.14.0",
+  "onnxscript>=0.6.0",
   "pytest>=8.3",
   "pytest-cov>=6.1",
   "pytest-xdist>=3.6.1",
+  "rdflib>=7.0",
 ]
 verify = [
   "pytest>=8.3",
@@ -103,10 +126,13 @@ docs = [
   "jinja2>=3.1.6",
   "jupyter-client>=8.6.0",
   "jupyter-cache>=1.0",
-  "maite-datasets>=0.0.12",
+  "maite-datasets>=0.0.15",
   "myst-nb>=1.0",
   "opencv-python-headless>=4.8.0",
+  "pandas>=2.0.0",
   "plotly>=6.2.0",
+  "rapidfuzz>=3.0",
+  "rdflib>=7.0",
   "sphinx-autoapi>=3.6.0",
   "sphinx-design>=0.6.1",
   "sphinx-immaterial>=0.12.5",
@@ -114,9 +140,9 @@ docs = [
   "sphinx-tabs>=3.4.7",
   "Sphinx>=7.2.6,<9.0.0", # sphinx-immaterial <= 0.13.9 is not compatible with sphinx >=9.0
   "torchmetrics>=1.0.0",
-  "torchvision>=0.17.0",
   "markupsafe>=3,<3.0.2",
   "jupytext>=1.19.1",
+  "ultralytics>=8.0.0",
 ]
 security = [  # keep in sync with [tool.uv.constraint-dependencies]
   "cryptography>=46.0.5",    # CVE-2026-26007: Missing Subgroup Validation for SECT Curves
@@ -143,7 +169,6 @@ conflicts = [
   [
     { extra = "cpu" },
     { extra = "cu118" },
-    { extra = "cu124" },
     { extra = "cu128" },
   ],
 ]
@@ -166,11 +191,6 @@ name = "pytorch-cu118"
 url = "https://download.pytorch.org/whl/cu118"
 explicit = true
-[[tool.uv.index]]
-name = "pytorch-cu124"
-url = "https://download.pytorch.org/whl/cu124"
-explicit = true
 [[tool.uv.index]]
 name = "pytorch-cu128"
 url = "https://download.pytorch.org/whl/cu128"
@@ -180,19 +200,28 @@ explicit = true
 torch = [
   { index = "pytorch-cpu", extra = "cpu" },
   { index = "pytorch-cu118", extra = "cu118" },
-  { index = "pytorch-cu124", extra = "cu124" },
   { index = "pytorch-cu128", extra = "cu128" },
 ]
 torchvision = [
   { index = "pytorch-cpu", extra = "cpu" },
   { index = "pytorch-cu118", extra = "cu118" },
-  { index = "pytorch-cu124", extra = "cu124" },
   { index = "pytorch-cu128", extra = "cu128" },
 ]
 [tool.uv.extra-build-dependencies]
 numba = ["tbb>=2021.6"]
+[tool.poetry]
+version = "0.0.0"  # overwritten by poetry-dynamic-versioning
+[[tool.poetry.source]]
+name = "pytorch-cpu"
+url = "https://download.pytorch.org/whl/cpu"
+priority = "supplemental"
+[tool.poetry.dependencies]
+torch = { version = ">=2.2.0", source = "pytorch-cpu" }
 [tool.hatch.build.targets.sdist]
 include = ["src/dataeval"]
@@ -208,8 +237,11 @@ source = "vcs"
 [tool.hatch.build.hooks.vcs]
 version-file = "src/dataeval/_version.py"
-[tool.poetry]
-version = "0.0.0"  # unused
+[tool.poetry-dynamic-versioning]
+enable = true
+vcs = "git"
+style = "pep440"
+pattern = "^v?(?P<base>\\d+\\.\\d+\\.\\d+)"
 [tool.pyproject2conda.dependencies]
 numpy = { skip = true, packages = "numpy>=1.24.2" }
@@ -219,7 +251,7 @@ torch = { pip = true }  # PyTorch is no longer maintained on conda-forge
 xxhash = { skip = true, packages = "python-xxhash>=3.3" }
 [tool.pyright]
-include = ["src", "tests"]
+include = ["src", "tests", "verification", "docs/source/notebooks"]
 exclude = [
   "**/__pycache__",
   "**/node_modules",
@@ -232,6 +264,10 @@ reportMissingImports = false
 [tool.pytest.ini_options]
 testpaths = ["tests"]
+filterwarnings = [
+  "ignore:The default value of normalize_pixel_values changed:FutureWarning",
+  "ignore:Clustering metrics expect discrete values but received continuous values:UserWarning",
+]
 addopts = [
   "--pythonwarnings=ignore::DeprecationWarning",
   "--verbose",
@@ -278,6 +314,7 @@ exclude = [
   ".tox",
   "prototype",
   "src/dataeval/_version.py",
+  "*.ipynb",
 ]
 line-length = 120
 indent-width = 4
@@ -292,7 +329,7 @@ ignore = ["ANN101", "ANN102", "ANN401", "C408", "C416", "COM812", "NPY002", "SLF
 fixable = ["ALL"]
 unfixable = []
 dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
-per-file-ignores = { "!src/*" = ["ANN", "ARG", "BLE", "S", "SLF", "RET", "C90", "D", "FIX", "N", "PERF"], "docs/*" = ["B904"] }
+per-file-ignores = { "!src/*" = ["ANN", "ARG", "BLE", "S", "SLF", "RET", "C90", "D", "FIX", "N", "PERF"], "docs/*" = ["B904"], "docs/source/notebooks/*" = ["E402", "E501", "E703", "RUF100", "SIM105", "UP009"] }
 [tool.ruff.lint.flake8-builtins]
 builtins-strict-checking = false
@@ -307,6 +344,7 @@ max-complexity = 5
 convention = "numpy"
 [tool.ruff.format]
+preview = true
 quote-style = "double"
 indent-style = "space"
 skip-magic-trailing-comma = false
@@ -315,8 +353,8 @@ docstring-code-format = true
 docstring-code-line-length = "dynamic"
 [tool.codespell]
-skip = './*env*,./output,./docs/build,./docs/source/.jupyter_cache,./docs/source/*/data,CHANGELOG.md,uv.lock,requirements.txt,*.html,*.lock,*.ipynb'
-ignore-words-list = ["Hart","FPR"]
+skip = './*env*,./output,./docs/build,./docs/source/.jupyter_cache,./docs/source/*/data,CHANGELOG.md,uv.lock,requirements.*.txt,*.html,*.lock,*.ipynb'
+ignore-words-list = ["Hart","FPR", "MOT", "mot"]
 [build-system]
 requires = ["hatchling", "hatch-vcs"]

{dataeval-1.0.5 → dataeval-1.1.0rc0}/src/dataeval/__init__.py RENAMED

@@ -20,24 +20,30 @@ __all__ = [
     "exceptions",
     "flags",
     "log",
+    "models",
     "protocols",
     "types",
     "Embeddings",
     "Metadata",
+    "Ontology",
 ]
 import logging
-from . import config, exceptions, flags, protocols, types
+from . import config, exceptions, flags, models, protocols, types
 from ._embeddings import Embeddings
 from ._metadata import Metadata
+from ._ontology import Ontology
 logging.getLogger(__name__).addHandler(logging.NullHandler())
 def log(level: int = logging.DEBUG, handler: logging.Handler | None = None) -> None:
     """
-    Add a StreamHandler to the logger quickly for debugging.
+    Add a handler to the logger quickly for debugging.
+    Calling this more than once is idempotent: a handler equal to one already
+    attached to the logger is not added again, so log lines are not duplicated.
     Parameters
     ----------
@@ -45,18 +51,21 @@ def log(level: int = logging.DEBUG, handler: logging.Handler | None = None) -> N
         Set the logging level for the logger.
     handler : logging.Handler, optional
         Sets the logging handler for the logger if provided, otherwise logger will be
-        provided with a StreamHandler.
+        provided with a StreamHandler. When a custom handler is supplied its formatter
+        is left untouched; the default StreamHandler is given a verbose debugging
+        formatter.
     """
     import logging
-    logger = logging.getLogger(__name__)
+    _logger = logging.getLogger(__name__)
     if handler is None:
-        handler = logging.StreamHandler() if handler is None else handler
+        handler = logging.StreamHandler()
         handler.setFormatter(
             logging.Formatter(
                 "%(asctime)s %(levelname)-8s %(name)s.%(filename)s:%(lineno)s - %(funcName)10s() | %(message)s",
             ),
         )
-    logger.addHandler(handler)
-    logger.setLevel(level)
-    logger.debug(f"Added logging handler {handler} to logger: {__name__}")
+    if handler not in _logger.handlers:
+        _logger.addHandler(handler)
+    _logger.setLevel(level)
+    _logger.debug("Added logging handler %s to logger: %s", handler, __name__)
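
The `if handler not in _logger.handlers` guard above can be illustrated in isolation. This is a minimal standard-library sketch, not DataEval's code: `attach_once` and the logger name `demo.idempotent` are hypothetical names chosen for the example.

```python
import logging


def attach_once(logger: logging.Logger, handler: logging.Handler) -> bool:
    """Attach ``handler`` unless it is already attached; return True if added."""
    if handler in logger.handlers:
        return False
    logger.addHandler(handler)
    return True


demo_logger = logging.getLogger("demo.idempotent")  # hypothetical logger name
shared_handler = logging.NullHandler()

first = attach_once(demo_logger, shared_handler)   # handler is new to the logger
second = attach_once(demo_logger, shared_handler)  # same object, so it is skipped
```

The membership test compares with `==`, which for `logging.Handler` falls back to object identity unless a subclass defines `__eq__`. The guard therefore recognizes the same handler object passed again, while a freshly constructed handler counts as a new one.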

{dataeval-1.0.5 → dataeval-1.1.0rc0}/src/dataeval/_embeddings.py RENAMED

@@ -1,6 +1,6 @@
 """Embeddings class for extracting and managing image embeddings."""
-__all__ = []
+__all__ = ["Embeddings"]
 import logging
 import os
@@ -14,7 +14,7 @@ import xxhash as xxh
 from numpy.typing import NDArray
 from typing_extensions import Self
-from dataeval.config import get_batch_size
+from dataeval.config import resolve_batch_size
 from dataeval.exceptions import NotFittedError
 from dataeval.extractors import FlattenExtractor
 from dataeval.protocols import (
@@ -25,6 +25,8 @@ from dataeval.protocols import (
     FeatureExtractor,
     ProgressCallback,
 )
+from dataeval.utils._internal import unwrap_image
+from dataeval.utils._validate import requires_maite_dataset
 _logger = logging.getLogger(__name__)
@@ -53,8 +55,14 @@ class Embeddings(Array, FeatureExtractor):
         :class:`~dataeval.extractors.FlattenExtractor` for simple baseline
         compatibility with all DataEval tools.
     batch_size : int or None, default None
-        Number of samples to process per batch. When None, uses DataEval's
-        configured batch size via :func:`~dataeval.config.get_batch_size`.
+        I/O chunk size: how many images are loaded from the dataset, encoded, and
+        written to storage per step. Resolved via
+        :func:`~dataeval.config.resolve_batch_size` as the first set of
+        ``batch_size`` (this argument), the extractor's own ``batch_size``, then
+        the global default. This is distinct from an extractor's own forward-pass
+        (compute) batch size: an extractor with its own ``batch_size`` sub-batches
+        each chunk for the model, so the smaller of the two bounds the forward
+        pass. Batching never changes the resulting embeddings.
     path : Path, str, or None, default None
         File path for memory-mapped storage. When None, caches embeddings in memory only.
         When Path or string is provided, uses memory-mapped storage for large embeddings
@@ -93,6 +101,7 @@ class Embeddings(Array, FeatureExtractor):
     memory_threshold: float
+    @requires_maite_dataset("dataset", expected="image_only")
     def __init__(
         self,
         # Technically more permissive than ImageClassificationDataset or ObjectDetectionDataset
@@ -104,7 +113,7 @@ class Embeddings(Array, FeatureExtractor):
         progress_callback: ProgressCallback | None = None,
     ) -> None:
         self._extractor = extractor if extractor is not None else FlattenExtractor()
-        self._batch_size = get_batch_size(batch_size)
+        self._batch_size = resolve_batch_size(batch_size, getattr(self._extractor, "batch_size", None))
         self.memory_threshold = max(0.0, min(1.0, memory_threshold))
         self._progress_callback = progress_callback
@@ -159,6 +168,7 @@ class Embeddings(Array, FeatureExtractor):
         """
         return self._dataset is not None
+    @requires_maite_dataset("dataset", expected="image_only")
     def bind(self, dataset: Dataset[tuple[ArrayLike, Any, Any]] | Dataset[ArrayLike]) -> Self:
         """Bind this instance to a dataset.
@@ -502,12 +512,7 @@ class Embeddings(Array, FeatureExtractor):
         if self._dataset is None:
             raise NotFittedError("No dataset bound. Call bind() first.")
-        images: list[Any] = []
-        for idx in indices:
-            item = self._dataset[idx]
-            image = item[0] if isinstance(item, tuple) else item
-            images.append(image)
-        return images
+        return [unwrap_image(self._dataset[idx]) for idx in indices]
     def _batch(self, indices: Sequence[int]) -> Iterator[NDArray[Any]]:  # noqa: C901
         """Process indices in batches using the extractor."""

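The updated `batch_size` docstring describes two ideas: a "first set" precedence (argument, then the extractor's own `batch_size`, then the global default) and an I/O chunk that an extractor may sub-batch for the forward pass. The sketch below models both under stated assumptions. `first_set_batch_size`, `forward_pass_sizes`, and the default of 32 are hypothetical stand-ins, and treating "set" as "not None" is an assumption. The real `resolve_batch_size` implementation is not part of this diff.

```python
from __future__ import annotations

GLOBAL_DEFAULT_BATCH_SIZE = 32  # hypothetical value, not DataEval's actual default


def first_set_batch_size(*candidates: int | None, default: int = GLOBAL_DEFAULT_BATCH_SIZE) -> int:
    """Return the first candidate that is not None, else the default (assumed rule)."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return default


def forward_pass_sizes(n_items: int, io_batch: int, compute_batch: int) -> list[int]:
    """List the size of every forward pass when each I/O chunk is sub-batched."""
    sizes: list[int] = []
    for start in range(0, n_items, io_batch):
        chunk = min(io_batch, n_items - start)  # items loaded in this I/O step
        for sub_start in range(0, chunk, compute_batch):
            sizes.append(min(compute_batch, chunk - sub_start))
    return sizes


# Precedence: explicit argument, then the extractor's value, then the default.
explicit = first_set_batch_size(16, 8)
from_extractor = first_set_batch_size(None, 8)
fallback = first_set_batch_size(None, None)

# 10 images, I/O chunks of 6, extractor sub-batches of 4.
passes = forward_pass_sizes(10, io_batch=6, compute_batch=4)
```

In this model no forward pass exceeds `min(io_batch, compute_batch)`, which matches the docstring's statement that the smaller of the two bounds the forward pass.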
{dataeval-1.0.5 → dataeval-1.1.0rc0}/src/dataeval/_experimental.py RENAMED

@@ -39,14 +39,6 @@ def _make_warning_message(  # noqa: C901
     return msg
-def _prepend_doc_note(doc: str | None, note: str) -> str:
-    """Prepend a status note to a docstring."""
-    header = f".. warning::\n    {note}"
-    if doc:
-        return f"{header}\n\n{doc}"
-    return header
 @overload
 def experimental(_target: F) -> F: ...
 @overload
@@ -89,7 +81,7 @@ def experimental(  # noqa: C901
                 original_init(self, *args, **kwargs)
             target.__init__ = new_init  # type: ignore[attr-defined]
-            target.__doc__ = _prepend_doc_note(target.__doc__, msg)
+            target.__experimental__ = msg  # type: ignore[attr-defined]
             return target  # type: ignore[return-value]
         @functools.wraps(target)
@@ -100,7 +92,7 @@ def experimental(  # noqa: C901
                 warned = True
             return target(*args, **kwargs)
-        wrapper.__doc__ = _prepend_doc_note(target.__doc__, msg)
+        wrapper.__experimental__ = msg  # type: ignore[attr-defined]
         return wrapper  # type: ignore[return-value]
     if _target is not None:
@@ -165,7 +157,7 @@ def deprecated(  # noqa: C901
                 original_init(self, *args, **kwargs)
             target.__init__ = new_init  # type: ignore[attr-defined]
-            target.__doc__ = _prepend_doc_note(target.__doc__, msg)
+            target.__deprecated__ = msg  # type: ignore[attr-defined]
             return target  # type: ignore[return-value]
         @functools.wraps(target)
@@ -176,7 +168,7 @@ def deprecated(  # noqa: C901
                 warned = True
             return target(*args, **kwargs)
-        wrapper.__doc__ = _prepend_doc_note(target.__doc__, msg)
+        wrapper.__deprecated__ = msg  # type: ignore[attr-defined]
         return wrapper  # type: ignore[return-value]
     if _target is not None:
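
The hunks above stop rewriting `__doc__` and instead record the status message on a marker attribute (`__experimental__` or `__deprecated__`). A minimal sketch of that pattern for plain functions follows. The decorator name `mark_experimental`, the message text, and the `FutureWarning` category are assumptions made for the example. DataEval's actual decorator signature and warning category are not visible in this diff.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def mark_experimental(func: F) -> F:
    """Warn once on first call and tag the wrapper with ``__experimental__``."""
    msg = f"{func.__qualname__} is experimental and may change."  # assumed wording
    warned = False

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal warned
        if not warned:
            warnings.warn(msg, FutureWarning, stacklevel=2)  # assumed category
            warned = True
        return func(*args, **kwargs)

    # The docstring is left untouched; tools can read the marker instead.
    wrapper.__experimental__ = msg  # type: ignore[attr-defined]
    return wrapper  # type: ignore[return-value]


@mark_experimental
def double(x: int) -> int:
    """Return twice the input."""
    return 2 * x
```

Keeping the docstring intact means documentation tooling can decide how to render the status from the attribute, for example with `getattr(obj, "__experimental__", None)`.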

{dataeval-1.0.5 → dataeval-1.1.0rc0}/src/dataeval/_metadata.py RENAMED

@@ -1,4 +1,4 @@
-__all__ = []
+__all__ = ["Metadata"]
 import logging
 from collections.abc import Callable, Iterable, Iterator, Mapping, Sequence, Sized
@@ -22,6 +22,7 @@ from dataeval.protocols import (
 )
 from dataeval.types import Array1D
 from dataeval.utils._internal import as_numpy, merge_metadata
+from dataeval.utils._validate import requires_maite_dataset
 _logger = logging.getLogger(__name__)
@@ -105,6 +106,7 @@ class Metadata(Array, FeatureExtractor):
     >>> test_factors = metadata(test_dataset)  # Extract from new dataset
     """
+    @requires_maite_dataset("dataset", expected="any_target")
     def __init__(
         self,
         dataset: AnnotatedDataset[tuple[Any, Any, DatumMetadata]] | None = None,
@@ -168,6 +170,7 @@ class Metadata(Array, FeatureExtractor):
         """
         return self._dataset is not None
+    @requires_maite_dataset("dataset", expected="any_target")
     def bind(self, dataset: AnnotatedDataset[tuple[Any, Any, DatumMetadata]]) -> Self:
         """Bind this instance to a dataset.
@@ -573,6 +576,11 @@ class Metadata(Array, FeatureExtractor):
             Rows where target_index is None contain datum-level data.
             Rows where target_index is an integer contain target/detection-level data.
+        See Also
+        --------
+        :attr:`~dataeval.Metadata.image_data` : Filter to image-level rows only
+        :attr:`~dataeval.Metadata.target_data` : Filter to target-level rows only
         Notes
         -----
         This property triggers dataset structure analysis on first access.
@@ -581,11 +589,6 @@ class Metadata(Array, FeatureExtractor):
         For Object Detection datasets, the dataframe now contains:
         - Image-level rows (target_index=None): One per image with image-level factors
         - Target-level rows (target_index=0,1,2...): One per detection with detection data
-        See Also
-        --------
-        image_data : Filter to image-level rows only
-        target_data : Filter to target-level rows only
         """
         self._structure()
         return self._dataframe
@@ -650,7 +653,7 @@ class Metadata(Array, FeatureExtractor):
         -------
         Sequence[str]
             List of factor names that passed filtering and preprocessing steps.
-            Order matches columns in factor_data and binned_data.
+            Order matches columns in factor_data.
         Notes
         -----
