scTimeBench 0.1.0__tar.gz

Files changed (75)

sctimebench-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Eric Haoran Huang and Adrien Osakwe
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

sctimebench-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,150 @@
+Metadata-Version: 2.4
+Name: scTimeBench
+Version: 0.1.0
+Summary: A streamlined benchmarking platform for single-cell time-series analysis
+Author-email: Eric Haoran Huang <eric.h.huang@mail.mcgill.ca>, Adrien Osakwe <adrien.osakwe@mail.mcgill.ca>, Yue Li <yueli@cs.mcgill.ca>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/li-lab-mcgill/scTimeBench
+Project-URL: Issues, https://github.com/li-lab-mcgill/scTimeBench/issues
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Classifier: Programming Language :: Python :: 3.10
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: PyYAML
+Requires-Dist: scanpy
+Requires-Dist: scikit-learn
+Requires-Dist: fastparquet<2026.3.0
+Provides-Extra: dev
+Requires-Dist: pre-commit; extra == "dev"
+Requires-Dist: pytest; extra == "dev"
+Provides-Extra: benchmark
+Requires-Dist: networkx; extra == "benchmark"
+Requires-Dist: numpy; extra == "benchmark"
+Requires-Dist: geomloss; extra == "benchmark"
+Requires-Dist: pykeops; extra == "benchmark"
+Requires-Dist: scanpy[leiden]; extra == "benchmark"
+Requires-Dist: graphviz; extra == "benchmark"
+Requires-Dist: celltypist; extra == "benchmark"
+Dynamic: license-file
+<!-- for this to work on Pypi, we need to point to the absolute path -->
+<h1>
+	<img src="https://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/logo.png" alt="scTimeBench-Logo" height="150" align="absmiddle" /> scTimeBench
+</h1>
+[![python](https://img.shields.io/badge/Python-3.10%2B-blue.svg?logo=python&style=flat-square)](https://www.python.org/downloads/release/python-31012/)
+[![License](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](https://opensource.org/license/mit)
+[![bioRXiv](https://img.shields.io/badge/bioRXiv-10.64898/2026.03.16.712069v1-red.svg?style=flat-square)](https://www.biorxiv.org/content/10.64898/2026.03.16.712069v1)
+[![Google Colab](https://img.shields.io/badge/Google-Colab-orange?logo=googlecolab&style=flat-square)](https://colab.research.google.com/drive/1J-yNXu_FcSnhrCwTDQKjWCBSHsmdbohJ?usp=sharing)
+<!-- TODO: --> <!-- [![Documentation](https://img.shields.io/badge/Documentation-Online-green.svg?style=flat-square)]() -->
+![scTimeBench Overview](http://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/scTimeBench.png)
+## Table of Contents
+- [Environment Setup](#environment-setup)
+  - [Suggested UV Installation](#suggested-uv-installation)
+  - [Standard Pip](#standard-pip)
+- [Benchmark Architecture](#benchmark-architecture)
+  - [Detailed Layout of File Structure](#detailed-layout-of-file-structure)
+  - [Command-Line Interface Details](#command-line-interface-details)
+  - [Example Run](#example-run)
+- [Contributing to scTimeBench](#contributing-to-sctimebench)
+- [Citation](#citation)
+## Environment Setup
+scTimeBench is tested and supported on Python 3.10. If another 3.10+ version does not work with the benchmark, please open an issue on the GitHub repository.
+Important Note: scTimeBench must be installed twice: once in the environment that runs the benchmark metrics, and once in each method's environment, because the method needs to import `scTimeBench.method_utils.method_runner` and other shared constants. In short, install scTimeBench into two separate kinds of virtual environment:
+1. The environment you run the benchmark from. This requires the extra "\[benchmark\]" group (pip) or `--extra benchmark` (uv).
+2. Each method's virtual environment, where a plain scTimeBench installation is enough.
+### Suggested UV Installation
+Due to external dependencies and a more complex setup, we have decided to package everything under `uv` (see: https://github.com/astral-sh/uv). To start, fetch the necessary external dependencies by running:
+```
+git submodule update --init extern/
+```
+If you wish to benchmark across all methods, feel free to clone the submodules for all the methods as well with:
+```
+git submodule update --init
+```
+Then, install `uv` and run the following:
+```
+uv sync --extra benchmark
+```
+If you're using uv under a method's virtual environment, either the pip installation or the following will suffice:
+```
+uv sync
+```
+### Standard Pip
+If external dependencies such as pypsupertime or sceptic are not used (they are not used by default), you can install with pip as follows:
+```
+pip install -e ".[benchmark]"
+```
+to run the benchmark. For your own method, simply install without the extra benchmarking requirements with
+```
+pip install -e .
+```
+There are extra dependencies that can be found under `pyproject.toml`.
+## Benchmark Architecture
+![Benchmark Architecture](https://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/architecture.png)
+scTimeBench is controlled by a central configuration file which determines which datasets, methods, and metrics to run. An example of this can be found under `configs/scNODE/gex.yaml`.
+### Detailed Layout of File Structure
+* `configs/`: possible yaml config files to use as a starting point
+* `extern/`: external packages that required edits for compatibility, such as pyPsupertime and Sceptic.
+* `methods/`: the different methods that are possible to use, including defined submodules. Add your own methodology here.
+* `src/`: where the scTimeBench package lies. See `src/ReadMe.md` for more documentation on the modules that exist there.
+* `test/`: unit tests for each method, dataset, metric, and other important modules.
+### Command-Line Interface Details
+The entrypoint for the benchmark is defined as `scTimeBench`. Run `scTimeBench --help` for more details, or refer to `src/scTimeBench/config.py` and the documentation.
+### Example Run
+Run the package with:
+```
+scTimeBench --config configs/scNODE/gex.yaml
+```
+For a full running example using scNODE, refer to our example [Jupyter Notebook](https://colab.research.google.com/drive/1J-yNXu_FcSnhrCwTDQKjWCBSHsmdbohJ?usp=sharing).
+## Contributing to scTimeBench
+If you want to contribute, please install the dev environments with:
+```
+uv sync --extra dev --extra benchmark
+```
+or
+```
+pip install -e ".[dev, benchmark]"
+```
+To enable the autoformatting, please run:
+```
+pre-commit install
+```
+before committing.
+Follow our example tutorials on adding new methods, datasets, and metrics in our documentation here: TODO-ADD-THIS.
+### Testing
+If your change heavily modifies the architecture, please run the relevant tests under the `test/` directory using pytest. Read more about the available tests in `test/ReadMe.md`. For more information, see the pytest documentation: https://docs.pytest.org/en/stable/. A useful flag is `-s`, which shows the full output of each test.
+## Citation
+```bibtex
+@article {scTimeBench,
+	author = {Osakwe, Adrien and Huang, Eric Haoran and Li, Yue},
+	title = {scTimeBench: A streamlined benchmarking platform for single-cell time-series analysis},
+	elocation-id = {2026.03.16.712069},
+	year = {2026},
+	doi = {10.64898/2026.03.16.712069},
+	publisher = {Cold Spring Harbor Laboratory},
+	URL = {https://www.biorxiv.org/content/early/2026/03/18/2026.03.16.712069},
+	eprint = {https://www.biorxiv.org/content/early/2026/03/18/2026.03.16.712069.full.pdf},
+	journal = {bioRxiv}
+}
+```
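
The `Requires-Dist` lines in the PKG-INFO above attach optional dependencies to extras through `extra == "..."` markers. As a minimal sketch, assuming only the Python standard library and markers of exactly that simple form, the entries can be grouped by extra as follows. `PKG_INFO_EXCERPT` is a shortened copy of the metadata, and the `"core"` label and the `group_requirements_by_extra` helper are hypothetical names used for illustration.

```python
from collections import defaultdict
from email import message_from_string

# Shortened excerpt of the PKG-INFO shown above (not the full file).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: scTimeBench
Version: 0.1.0
Requires-Dist: PyYAML
Requires-Dist: scanpy
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: networkx; extra == "benchmark"
Requires-Dist: scanpy[leiden]; extra == "benchmark"
"""


def group_requirements_by_extra(pkg_info_text):
    """Group Requires-Dist entries by extra name.

    Entries without an `extra == "..."` marker go under the hypothetical
    label "core". Only this simple marker form is handled.
    """
    # PKG-INFO uses an email-header-like layout, so the stdlib parser can read it.
    message = message_from_string(pkg_info_text)
    groups = defaultdict(list)
    for entry in message.get_all("Requires-Dist", []):
        requirement, _, marker = entry.partition(";")
        marker = marker.strip()
        extra = "core"
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
        groups[extra].append(requirement.strip())
    return dict(groups)


groups = group_requirements_by_extra(PKG_INFO_EXCERPT)
for extra, requirements in groups.items():
    print(extra, requirements)
```

Installing with `pip install -e ".[benchmark]"` corresponds to the "core" group plus the `benchmark` group in this view.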

sctimebench-0.1.0/README.md ADDED

@@ -0,0 +1,118 @@
+<!-- for this to work on Pypi, we need to point to the absolute path -->
+<h1>
+	<img src="https://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/logo.png" alt="scTimeBench-Logo" height="150" align="absmiddle" /> scTimeBench
+</h1>
+[![python](https://img.shields.io/badge/Python-3.10%2B-blue.svg?logo=python&style=flat-square)](https://www.python.org/downloads/release/python-31012/)
+[![License](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](https://opensource.org/license/mit)
+[![bioRXiv](https://img.shields.io/badge/bioRXiv-10.64898/2026.03.16.712069v1-red.svg?style=flat-square)](https://www.biorxiv.org/content/10.64898/2026.03.16.712069v1)
+[![Google Colab](https://img.shields.io/badge/Google-Colab-orange?logo=googlecolab&style=flat-square)](https://colab.research.google.com/drive/1J-yNXu_FcSnhrCwTDQKjWCBSHsmdbohJ?usp=sharing)
+<!-- TODO: --> <!-- [![Documentation](https://img.shields.io/badge/Documentation-Online-green.svg?style=flat-square)]() -->
+![scTimeBench Overview](http://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/scTimeBench.png)
+## Table of Contents
+- [Environment Setup](#environment-setup)
+  - [Suggested UV Installation](#suggested-uv-installation)
+  - [Standard Pip](#standard-pip)
+- [Benchmark Architecture](#benchmark-architecture)
+  - [Detailed Layout of File Structure](#detailed-layout-of-file-structure)
+  - [Command-Line Interface Details](#command-line-interface-details)
+  - [Example Run](#example-run)
+- [Contributing to scTimeBench](#contributing-to-sctimebench)
+- [Citation](#citation)
+## Environment Setup
+scTimeBench is tested and supported on Python 3.10. If another 3.10+ version does not work with the benchmark, please open an issue on the GitHub repository.
+Important Note: scTimeBench must be installed twice: once in the environment that runs the benchmark metrics, and once in each method's environment, because the method needs to import `scTimeBench.method_utils.method_runner` and other shared constants. In short, install scTimeBench into two separate kinds of virtual environment:
+1. The environment you run the benchmark from. This requires the extra "\[benchmark\]" group (pip) or `--extra benchmark` (uv).
+2. Each method's virtual environment, where a plain scTimeBench installation is enough.
+### Suggested UV Installation
+Due to external dependencies and a more complex setup, we have decided to package everything under `uv` (see: https://github.com/astral-sh/uv). To start, fetch the necessary external dependencies by running:
+```
+git submodule update --init extern/
+```
+If you wish to benchmark across all methods, feel free to clone the submodules for all the methods as well with:
+```
+git submodule update --init
+```
+Then, install `uv` and run the following:
+```
+uv sync --extra benchmark
+```
+If you're using uv under a method's virtual environment, either the pip installation or the following will suffice:
+```
+uv sync
+```
+### Standard Pip
+If external dependencies such as pypsupertime or sceptic are not used (they are not used by default), you can install with pip as follows:
+```
+pip install -e ".[benchmark]"
+```
+to run the benchmark. For your own method, simply install without the extra benchmarking requirements with
+```
+pip install -e .
+```
+There are extra dependencies that can be found under `pyproject.toml`.
+## Benchmark Architecture
+![Benchmark Architecture](https://raw.githubusercontent.com/li-lab-mcgill/scTimeBench/refs/heads/main/assets/architecture.png)
+scTimeBench is controlled by a central configuration file which determines which datasets, methods, and metrics to run. An example of this can be found under `configs/scNODE/gex.yaml`.
+### Detailed Layout of File Structure
+* `configs/`: possible yaml config files to use as a starting point
+* `extern/`: external packages that required edits for compatibility, such as pyPsupertime and Sceptic.
+* `methods/`: the different methods that are possible to use, including defined submodules. Add your own methodology here.
+* `src/`: where the scTimeBench package lies. See `src/ReadMe.md` for more documentation on the modules that exist there.
+* `test/`: unit tests for each method, dataset, metric, and other important modules.
+### Command-Line Interface Details
+The entrypoint for the benchmark is defined as `scTimeBench`. Run `scTimeBench --help` for more details, or refer to `src/scTimeBench/config.py` and the documentation.
+### Example Run
+Run the package with:
+```
+scTimeBench --config configs/scNODE/gex.yaml
+```
+For a full running example using scNODE, refer to our example [Jupyter Notebook](https://colab.research.google.com/drive/1J-yNXu_FcSnhrCwTDQKjWCBSHsmdbohJ?usp=sharing).
+## Contributing to scTimeBench
+If you want to contribute, please install the dev environments with:
+```
+uv sync --extra dev --extra benchmark
+```
+or
+```
+pip install -e ".[dev, benchmark]"
+```
+To enable the autoformatting, please run:
+```
+pre-commit install
+```
+before committing.
+Follow our example tutorials on adding new methods, datasets, and metrics in our documentation here: TODO-ADD-THIS.
+### Testing
+If your change heavily modifies the architecture, please run the relevant tests under the `test/` directory using pytest. Read more about the available tests in `test/ReadMe.md`. For more information, see the pytest documentation: https://docs.pytest.org/en/stable/. A useful flag is `-s`, which shows the full output of each test.
+## Citation
+```bibtex
+@article {scTimeBench,
+	author = {Osakwe, Adrien and Huang, Eric Haoran and Li, Yue},
+	title = {scTimeBench: A streamlined benchmarking platform for single-cell time-series analysis},
+	elocation-id = {2026.03.16.712069},
+	year = {2026},
+	doi = {10.64898/2026.03.16.712069},
+	publisher = {Cold Spring Harbor Laboratory},
+	URL = {https://www.biorxiv.org/content/early/2026/03/18/2026.03.16.712069},
+	eprint = {https://www.biorxiv.org/content/early/2026/03/18/2026.03.16.712069.full.pdf},
+	journal = {bioRxiv}
+}
+```
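
The README's example run can also be scripted. The sketch below only assembles an argument list from flags that `src/scTimeBench/config.py` defines (`--config`, `--run_type`, `--log_level`, `--force_rerun`). `build_benchmark_command` is a hypothetical helper, not part of the package, and actually launching the command assumes scTimeBench is installed in the active environment.

```python
# Run types accepted by --run_type, taken from the RunType enum in config.py.
RUN_TYPES = ("auto_train_test", "preprocess", "eval_only", "train_only")


def build_benchmark_command(config_path, run_type=None, log_level=None, force_rerun=False):
    """Return the argv list for one scTimeBench invocation (hypothetical helper)."""
    command = ["scTimeBench", "--config", config_path]
    if run_type is not None:
        if run_type not in RUN_TYPES:
            raise ValueError(f"Unknown run type: {run_type!r}")
        command += ["--run_type", run_type]
    if log_level is not None:
        command += ["--log_level", log_level]
    if force_rerun:
        command.append("--force_rerun")
    return command


command = build_benchmark_command(
    "configs/scNODE/gex.yaml", run_type="eval_only", log_level="DEBUG"
)
print(" ".join(command))

# To actually launch the benchmark (requires an installed scTimeBench):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the config path contains spaces.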

sctimebench-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,61 @@
+[build-system]
+requires = ["setuptools", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "scTimeBench"  # This is the name you use for 'pip install'
+version = "0.1.0"
+dependencies = [
+    "PyYAML",
+    "scanpy",
+    "scikit-learn",
+    "fastparquet<2026.3.0",
+]
+requires-python = ">=3.10"
+authors = [
+    { name="Eric Haoran Huang", email="eric.h.huang@mail.mcgill.ca" },
+    { name="Adrien Osakwe", email="adrien.osakwe@mail.mcgill.ca" },
+    { name="Yue Li", email="yueli@cs.mcgill.ca" },
+]
+description = "A streamlined benchmarking platform for single-cell time-series analysis"
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Science/Research",
+    "Topic :: Scientific/Engineering :: Bio-Informatics",
+    "Programming Language :: Python :: 3.10",
+]
+license = "MIT"
+license-files = ["LICENSE"]
+readme = "README.md"
+[project.urls]
+Homepage = "https://github.com/li-lab-mcgill/scTimeBench"
+Issues = "https://github.com/li-lab-mcgill/scTimeBench/issues"
+# TODO: have documentation here!
+[tool.setuptools.packages.find]
+where = ["src"]  # Tells Python: "The actual code lives inside the src folder"
+[tool.setuptools.package-data]
+scTimeBench = [
+    "metrics/shared/*.yaml",
+    "metrics/shared/cell_lineages/**/*.txt",
+]
+[project.optional-dependencies]
+dev = [
+    "pre-commit",
+    "pytest",
+]
+benchmark = [
+    "networkx",
+    "numpy",
+    "geomloss",
+    "pykeops",
+    "scanpy[leiden]",
+    "graphviz",
+    "celltypist"
+]
+[project.scripts]
+scTimeBench = "scTimeBench.main:main"
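
The `[project.scripts]` entry maps the `scTimeBench` command to the object reference `scTimeBench.main:main`. A rough sketch of how such a `module:attribute` reference is resolved is shown below. `resolve_object_reference` is a hypothetical helper; real installers generate a wrapper script rather than calling anything like it, and the demonstration uses the standard-library target `json:dumps` so that it runs without scTimeBench installed.

```python
import importlib
from functools import reduce


def resolve_object_reference(reference):
    """Resolve a 'package.module:object.attr' reference to a Python object."""
    module_name, _, attribute_path = reference.partition(":")
    module = importlib.import_module(module_name)
    if not attribute_path:
        # A reference without ':' names the module itself.
        return module
    # Walk dotted attributes one step at a time, starting from the module.
    return reduce(getattr, attribute_path.split("."), module)


# Stand-in target from the standard library; with scTimeBench installed,
# "scTimeBench.main:main" would resolve the same way.
dumps = resolve_object_reference("json:dumps")
print(dumps({"ok": True}))
```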

sctimebench-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

sctimebench-0.1.0/src/scTimeBench/config.py ADDED

@@ -0,0 +1,318 @@
+"""
+config.py
+Configuration management for YAML-based configs, similar to the tf-binding project.
+Handles both YAML file loading and command-line argument parsing.
+"""
+import argparse
+import logging
+import os
+import yaml
+from enum import Enum
+# enum for the different run types, primarily:
+# 1) auto_train_test: automatically run training and testing for methods that support it,
+# by running the training and testing script specified.
+# 2) preprocess: we preprocess the data and then save out a yaml file specifying requirements.
+# The user handles training and testing outside of this framework.
+# 3) eval_only: we only evaluate the metric on already generated data based on step 2).
+class RunType(Enum):
+    AUTO_TRAIN_TEST = "auto_train_test"
+    PREPROCESS = "preprocess"
+    EVAL_ONLY = "eval_only"
+    TRAIN_ONLY = "train_only"
+class Config:
+    """Config class for both yaml and cli arguments."""
+    def __init__(self):
+        """
+        Initialize config by parsing YAML file and command-line arguments.
+        CLI arguments override YAML settings.
+        """
+        # Initiate parser and parse arguments
+        parser = argparse.ArgumentParser(
+            description="Single-cell trajectory analysis configuration"
+        )
+        # Config file argument
+        parser.add_argument(
+            "-c", "--config", type=str, help="Path to YAML configuration file"
+        )
+        # add metrics argument
+        parser.add_argument(
+            "--metrics",
+            type=str,
+            nargs="+",
+            help="List of metrics to compute",
+        )
+        parser.add_argument(
+            "--available",
+            action="store_true",
+            help="Show available methods, datasets, and metrics",
+        )
+        parser.add_argument(
+            "--print_all",
+            action="store_true",
+            help="Print all entries in the database tables",
+        )
+        parser.add_argument(
+            "--graph_sim_to_csv",
+            action="store_true",
+            help="Print graph similarity evaluations as CSV to stdout",
+        )
+        parser.add_argument(
+            "--output_csv_path",
+            type=str,
+            default="graph_sim.csv",
+            help="Optional path to save CSV output of graph similarity evaluations; if omitted, outputs to graph_sim.csv",
+        )
+        parser.add_argument(
+            "--clear_tables",
+            action="store_true",
+            help="Clear all entries in the database tables",
+        )
+        parser.add_argument(
+            "--view_evals_by_method",
+            action="store_true",
+            help="View existing evaluations of all metrics in the database per method set in the configuration",
+        )
+        parser.add_argument(
+            "--view_evals_by_metric",
+            action="store_true",
+            help="View existing evaluations of all methods in the database per metric set in the configuration",
+        )
+        parser.add_argument(
+            "--database_path",
+            type=str,
+            help="Path to the SQLite database file for storing results",
+        )
+        parser.add_argument(
+            "--run_type",
+            type=str,
+            choices=[rt.value for rt in RunType],
+            help="Type of run to perform: (default) auto_train_test, preprocess, eval_only, train_only. Defaults to auto_train_test.",
+        )
+        parser.add_argument(
+            "--output_dir",
+            type=str,
+            help="Directory to store outputs",
+        )
+        parser.add_argument(
+            "--log_level",
+            type=str,
+            choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
+            help="Logging level for the run (default: INFO)",
+        )
+        parser.add_argument(
+            "--log_file",
+            type=str,
+            help="Optional path to a log file; if omitted logs only go to stdout",
+        )
+        parser.add_argument(
+            "--data_dir",
+            type=str,
+            help="Optional base directory for dataset files, otherwise uses root paths specified in the config. If used, treats paths in config as either absolute or relative to this directory.",
+        )
+        parser.add_argument(
+            "--force_rerun",
+            action="store_true",
+            help="Usually duplicate method evaluations are skipped. This flag forces re-running even if evaluations already exist.",
+        )
+        parser.add_argument(
+            "-cf",
+            "--crispy-fishstick",
+            action="store_true",
+            help=argparse.SUPPRESS,  # hide this from help since it's an Easter egg
+        )
+        # Parse arguments
+        args = parser.parse_args()
+        # first handle the Easter egg
+        if args.crispy_fishstick:
+            from scTimeBench.shared.utils import animate, restore_interrupts
+            import sys
+            try:
+                animate()
+            finally:
+                restore_interrupts()
+                sys.stdout.write("\033[?25h")  # Re-show the cursor if you hid it
+            sys.exit()
+        # Get all config keys
+        config_keys = list(args.__dict__.keys())
+        # other keys to add from the yaml file
+        config_keys.extend(["method", "datasets"])
+        # First read the config file if provided
+        assert (
+            args.config is not None
+        ), "Config file path must be provided with --config"
+        if not os.path.exists(args.config):
+            raise FileNotFoundError(f"Config file not found: {args.config}")
+        with open(args.config, "r") as file:
+            data = yaml.safe_load(file)
+        # Set attributes from YAML file
+        for key in config_keys:
+            if key in data.keys():
+                setattr(self, key, data[key])
+        # Override with command-line arguments
+        for key, value in args._get_kwargs():
+            if value is not None:
+                setattr(self, key, value)
+        # Set defaults for optional parameters
+        defaults = {
+            "database_path": "scTimeBench.db",
+            "run_type": RunType.AUTO_TRAIN_TEST.value,
+            "output_dir": "outputs/",
+            "datasets": [],
+            "log_level": "INFO",
+            "log_file": None,
+        }
+        for key, value in defaults.items():
+            if not hasattr(self, key) or getattr(self, key) is None:
+                setattr(self, key, value)
+        # Configure logging with stdout always enabled and optional file output.
+        resolved_log_level = getattr(logging, str(self.log_level).upper(), logging.INFO)
+        handlers = [logging.StreamHandler()]
+        if self.log_file:
+            log_dir = os.path.dirname(self.log_file)
+            if log_dir:
+                os.makedirs(log_dir, exist_ok=True)
+            handlers.append(logging.FileHandler(self.log_file))
+        logging.basicConfig(
+            level=resolved_log_level,
+            format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
+            handlers=handlers,
+        )
+        # Validate required fields
+        required_fields = ["method", "metrics"]
+        for field in required_fields:
+            assert (
+                hasattr(self, field) and getattr(self, field) is not None
+            ), f"Required field '{field}' must be specified in config file or as --{field}"
+        # make sure no other fields exist besides this + datasets in the yaml
+        allowed_fields = set(required_fields + ["datasets", "metrics_skiplist"])
+        for field in data.keys():
+            if field not in allowed_fields:
+                raise ValueError(
+                    f"Unknown field '{field}' found in config. Allowed fields are {allowed_fields}."
+                )
+        # validate the fields within each larger section
+        # N.B.: we don't need them to specify preprocessors because it might already be preprocessed
+        # ** DATASETS **
+        dataset_required_fields = ["data_path", "name"]
+        dataset_optional_fields = ["data_preprocessing_steps"]
+        dataset_alternate_field = "tag"
+        for dataset in self.datasets:
+            # we want to make sure either all of the required fields are specified,
+            # or the alternate field is specified, but not a mix of both
+            if dataset_alternate_field in dataset:
+                if any([field in dataset for field in dataset_required_fields]):
+                    raise ValueError(
+                        f"Dataset config cannot have both '{dataset_alternate_field}' and any other fields {dataset_required_fields}."
+                    )
+                continue
+            for field in dataset_required_fields:
+                assert (
+                    field in dataset
+                ), f"Required dataset field '{field}' must be specified in config file, or use alternate flag \"tag\" instead."
+            # also ensure that all the fields found in the dataset are only of the required fields
+            # this is to ensure caching also works properly
+            for field in dataset.keys():
+                if (
+                    field not in dataset_required_fields
+                    and field not in dataset_optional_fields
+                ):
+                    raise ValueError(
+                        f"Unknown field '{field}' found in dataset config. Allowed fields are {dataset_required_fields + dataset_optional_fields} or '{dataset_alternate_field}'."
+                    )
+        # ** METHOD **
+        method_required_fields = ["name"]
+        for field in method_required_fields:
+            assert (
+                field in self.method
+            ), f"Required method field '{field}' must be specified in config file"
+        # Validate paths exist
+        dataset_path_keys = [
+            "data_path",
+            "cell_lineage_file",
+            "cell_equivalence_file",
+        ]
+        method_path_keys = [
+            "train_and_test_script",
+        ]
+        paths = {
+            *{
+                value
+                for dataset in self.datasets
+                for key, value in dataset.items()
+                if key in dataset_path_keys
+            },
+            *{value for key, value in self.method.items() if key in method_path_keys},
+        }
+        for path in paths:
+            assert os.path.exists(path), f"Path does not exist: {path}"
+        # set the run type
+        self.run_type = RunType(self.run_type)
+        # verify that the train and test script is specified if auto_train_test is set
+        if self.run_type == RunType.AUTO_TRAIN_TEST:
+            assert (
+                "train_and_test_script" in self.method
+            ), "Method must specify 'train_and_test_script' to use run type 'auto_train_test'"
+        # finally, to be used later for saving the original config
+        # into the method output yaml config:
+        self.method_yaml_data = data["method"]
+        # then let's also add in the metric skip list
+        self.metrics_skiplist = data.get("metrics_skiplist", [])
+        self.metrics_skiplist = [
+            metric if isinstance(metric, str) else metric.get("name", "")
+            for metric in self.metrics_skiplist
+        ]
+        logging.info("Configuration successfully loaded")
+        logging.debug("Configuration details: %s", self.__dict__)
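
The `Config` class above layers three sources: YAML values are applied first, any command-line value that is not `None` overrides them, and defaults fill whatever is still missing or `None`. It also accepts a dataset entry either as a `tag` or as explicit `data_path`/`name` fields, but not both. The following is a simplified sketch of those two rules, not the package's implementation: `merge_settings` and `validate_dataset_entry` are hypothetical helpers, `"example_metric"` and the dataset values are placeholders, and missing fields raise `ValueError` here where the original uses `assert`.

```python
REQUIRED_DATASET_FIELDS = ("data_path", "name")
OPTIONAL_DATASET_FIELDS = ("data_preprocessing_steps",)
ALTERNATE_DATASET_FIELD = "tag"


def merge_settings(yaml_data, cli_args, defaults):
    """Apply YAML values, then non-None CLI values, then defaults for gaps."""
    settings = dict(yaml_data)
    for key, value in cli_args.items():
        if value is not None:  # store_true flags are False, never None, so they always apply
            settings[key] = value
    for key, value in defaults.items():
        if settings.get(key) is None:  # missing or explicitly None
            settings[key] = value
    return settings


def validate_dataset_entry(dataset):
    """Accept either a 'tag' entry or one with all required fields."""
    if ALTERNATE_DATASET_FIELD in dataset:
        if any(field in dataset for field in REQUIRED_DATASET_FIELDS):
            raise ValueError("'tag' cannot be combined with data_path or name")
        return
    missing = [f for f in REQUIRED_DATASET_FIELDS if f not in dataset]
    if missing:
        raise ValueError(f"Missing required dataset fields: {missing}")
    allowed = REQUIRED_DATASET_FIELDS + OPTIONAL_DATASET_FIELDS
    unknown = [f for f in dataset if f not in allowed]
    if unknown:
        raise ValueError(f"Unknown dataset fields: {unknown}")


settings = merge_settings(
    yaml_data={"method": {"name": "scNODE"}, "metrics": ["example_metric"]},
    cli_args={"log_level": "DEBUG", "output_dir": None, "force_rerun": False},
    defaults={"output_dir": "outputs/", "log_level": "INFO", "datasets": []},
)
print(settings["log_level"], settings["output_dir"])

validate_dataset_entry({"tag": "example_tag"})
validate_dataset_entry({"name": "example", "data_path": "data/example.h5ad"})
```

One consequence worth noting: as written, `config.py` rejects any top-level YAML key other than `method`, `metrics`, `datasets`, and `metrics_skiplist`, so options such as `output_dir` or `log_level` appear to be settable only from the command line or through the defaults.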