syckpt 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
syckpt-0.0.1/PKG-INFO ADDED
@@ -0,0 +1,134 @@
+ Metadata-Version: 2.4
+ Name: syckpt
+ Version: 0.0.1
+ Summary: Git-like experiment tracking for deep learning with exact computational resumption
+ Home-page: https://github.com/sykchw/syckpt
+ Author: Sayak Chowdhury
+ Author-email: Sayak Chowdhury <sayak.iiitb@gmail.com>
+ License: MIT
+ Project-URL: Bug Reports, https://github.com/sykchw/syckpt/issues
+ Project-URL: Source, https://github.com/sykchw/syckpt
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: torch>=2.0.0
+ Requires-Dist: numpy>=1.20.0
+ Requires-Dist: safetensors>=0.4.0
+ Requires-Dist: fsspec>=2023.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: black>=23.0.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Requires-Dist: twine>=4.0.0; extra == "dev"
+ Requires-Dist: build>=1.0.0; extra == "dev"
+ Dynamic: author
+ Dynamic: home-page
+ Dynamic: requires-python
+
+ # Syckpt v0.0.1
+
+ **Git-like experiment tracking for deep learning with exact computational resumption, zero-copy safetensors memory-mapping, and delta-compression.**
+
+ `syckpt` is a lightweight, local-first experiment version control system designed to exactly reconstruct massive computational states (model weights, optimizer momentum, mixed-precision GradScalers, random number generators, and stateful DataLoaders) without perturbing the loss curve.
+
+ ---
+
+ ## How `syckpt` Works (The Architecture)
+
+ When training massive deep learning models, saving a full checkpoint at every epoch typically costs gigabytes of duplicated disk space and high write latency. `syckpt` solves this by treating machine-learning checkpoints like a Git repository.
+
+ 1. **Content-Addressable Storage (CAS) & Delta-Compression**:
+    Instead of saving a full 5 GB `.pt` weight file at every step, `syckpt` finds the most similar historical checkpoint and stores only the `float32` difference (`delta = current - base`). Because gradient steps are small, this delta is highly compressible. `syckpt` keeps these deltas in a hidden `.syckpt/objects/` directory, saving up to 90% of disk space.
+
+ 2. **Locality-Sensitive Hashing (LSH)**:
+    To find the "most similar" historical checkpoint quickly, `syckpt` uses LSH over your hyperparameters (learning rate, batch size, seed, and so on). Similar hyperparameter vectors collide into identical hash prefixes, letting the system rapidly query the Git-tree.
+
+ 3. **Zero-Copy Memory-Mapping via Safetensors**:
+    `syckpt` bypasses Python's insecure and memory-heavy `pickle` module. It uses the Rust-backed `safetensors` format to memory-map delta-blobs straight from disk ("zero-copy"), so tensors are paged in lazily instead of being fully deserialized into CPU RAM first, avoiding loader-side out-of-memory (OOM) errors.
+
+ 4. **Exact Mathematical Resumption**:
+    Standard PyTorch training loops show "resumption spikes" in the loss curve because the DataLoader indices and random number generators (RNGs) are reset on restart. `syckpt` captures the PyTorch, CUDA, and NumPy generators and preserves the internal `RandomSampler` permutations of your DataLoaders, so a resumed run is mathematically identical to one that was never interrupted.
+
+ ## Installation
+
+ Install from PyPI with `pip`, or with the Rust-accelerated `uv` package manager:
+
+ ```bash
+ pip install syckpt
+ # Or using uv
+ uv pip install syckpt
+ ```
+
+ ## Quick Start
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.optim as optim
+ from syckpt import CheckpointManager
+ from syckpt.dataloader import StatefulDataLoader
+ from torch.utils.data import DataLoader, TensorDataset
+
+ # Typical PyTorch components
+ model = nn.Linear(10, 2)
+ optimizer = optim.SGD(model.parameters(), lr=0.01)
+
+ dummy_data = TensorDataset(torch.randn(100, 10), torch.randn(100, 2))
+ # Wrap the standard (non-deterministic) DataLoader so it can be resumed exactly
+ loader = StatefulDataLoader(DataLoader(dummy_data, batch_size=32, shuffle=True))
+
+ # Point at an S3 or local URL; atomic locks handle concurrent writers natively
+ with CheckpointManager("s3://my-experiments-bucket/.syckpt") as ckpt:
+     # 1. Register stateful objects (automatically mapped via flattening)
+     ckpt.model = model
+     ckpt.optimizer = optimizer
+     ckpt.dataloader = loader
+
+     # 2. Hyperparameters automatically generate the unique LSH hash
+     ckpt.config.lr = 0.01
+     ckpt.config.batch_size = 32
+
+     # 3. The training loop tracks the step and epoch counters for you
+     for epoch in ckpt.loop(epochs=10):
+         for batch_idx, batch in enumerate(loader):
+             loss = torch.randn(1)  # Fake loss
+             ckpt.step_up()
+
+         # Delta-compression kicks in automatically
+         if epoch % 2 == 0:
+             ckpt.save(metric=loss.item())
+
+     print(f"Run state saved at LSH commit: {ckpt.hash}")
+ ```
+
+ ## Feature Reference
+
+ ### Exporting Monolithic Assets (`.ckpt`)
+ If you are deploying your model and no longer need `.syckpt` branching, you can collapse the Git-tree into a standard monolithic PyTorch `.ckpt` file for deployment or upload to Hugging Face:
+ ```python
+ with CheckpointManager("./experiments") as ckpt:
+     # Recursively loads the flat delta-tensors and reconstitutes a standard state dict
+     ckpt.export_ckpt(hash_or_branch="main", output_path="final-model.ckpt")
+ ```
+
+ ### Full Distributed Resumption (DDP)
+ `syckpt` broadcasts LSH hashes and uses `dist.gather` to collect the per-rank RNG states across your entire multi-GPU cluster.
+ ```python
+ import numpy as np
+
+ with CheckpointManager("./") as ckpt:
+     # Register your modern NumPy Generator; the state manager captures its bit-generator state
+     ckpt.numpy_rng = np.random.default_rng()
+     ckpt.save()
+ ```
+
+ ---
+
+ ## Architectural Deep-Dive
+ Curious how `syckpt v0.0.1` leverages Git-style pointers and `fsspec`'s atomic cloud primitives, manages PyTorch tensors, and accelerates loading via zero-copy safetensors?
+
+ Read the definitive educational walkthrough: [Implementation Guide (`implementation.md`)](./implementation.md).
syckpt-0.0.1/README.md ADDED
@@ -0,0 +1,103 @@
+ # Syckpt v0.0.1
+
+ **Git-like experiment tracking for deep learning with exact computational resumption, zero-copy safetensors memory-mapping, and delta-compression.**
+
+ `syckpt` is a lightweight, local-first experiment version control system designed to exactly reconstruct massive computational states (model weights, optimizer momentum, mixed-precision GradScalers, random number generators, and stateful DataLoaders) without perturbing the loss curve.
+
+ ---
+
+ ## How `syckpt` Works (The Architecture)
+
+ When training massive deep learning models, saving a full checkpoint at every epoch typically costs gigabytes of duplicated disk space and high write latency. `syckpt` solves this by treating machine-learning checkpoints like a Git repository.
+
+ 1. **Content-Addressable Storage (CAS) & Delta-Compression**:
+    Instead of saving a full 5 GB `.pt` weight file at every step, `syckpt` finds the most similar historical checkpoint and stores only the `float32` difference (`delta = current - base`). Because gradient steps are small, this delta is highly compressible. `syckpt` keeps these deltas in a hidden `.syckpt/objects/` directory, saving up to 90% of disk space (see the delta-encoding sketch after this list).
+
+ 2. **Locality-Sensitive Hashing (LSH)**:
+    To find the "most similar" historical checkpoint quickly, `syckpt` uses LSH over your hyperparameters (learning rate, batch size, seed, and so on). Similar hyperparameter vectors collide into identical hash prefixes, letting the system rapidly query the Git-tree (sketched after this list).
+
+ 3. **Zero-Copy Memory-Mapping via Safetensors**:
+    `syckpt` bypasses Python's insecure and memory-heavy `pickle` module. It uses the Rust-backed `safetensors` format to memory-map delta-blobs straight from disk ("zero-copy"), so tensors are paged in lazily instead of being fully deserialized into CPU RAM first, avoiding loader-side out-of-memory (OOM) errors.
+
+ 4. **Exact Mathematical Resumption**:
+    Standard PyTorch training loops show "resumption spikes" in the loss curve because the DataLoader indices and random number generators (RNGs) are reset on restart. `syckpt` captures the PyTorch, CUDA, and NumPy generators and preserves the internal `RandomSampler` permutations of your DataLoaders, so a resumed run is mathematically identical to one that was never interrupted.
+
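+ To make the first two mechanisms concrete, here is a minimal, illustrative sketch of delta-encoding a checkpoint against a base with `safetensors` (an editor's sketch, not `syckpt`'s internal code):
+
+ ```python
+ import torch
+ from safetensors.torch import save_file, load_file
+
+ def save_delta(current: dict, base: dict, path: str) -> None:
+     # Persist only the float32 difference against the base checkpoint;
+     # because gradient steps are small, these tensors compress well.
+     delta = {k: (current[k] - base[k]).to(torch.float32) for k in current}
+     save_file(delta, path)
+
+ def apply_delta(base: dict, path: str) -> dict:
+     # load_file memory-maps the blob, so tensors are paged in lazily
+     # rather than round-tripping through pickle.
+     delta = load_file(path)
+     return {k: base[k] + delta[k] for k in base}
+ ```
+
+ And a SimHash-style random-hyperplane LSH over a hyperparameter vector, showing why nearby configurations tend to share hash prefixes (again a sketch under assumed details, not the shipped `LSHHashGenerator`):
+
+ ```python
+ import numpy as np
+
+ def lsh_prefix(hparams: dict, n_bits: int = 16, seed: int = 0) -> str:
+     # Project the sorted, numeric hyperparameter vector onto fixed random
+     # hyperplanes; each sign contributes one bit of the hash prefix.
+     vec = np.array([float(v) for _, v in sorted(hparams.items())])
+     planes = np.random.default_rng(seed).standard_normal((n_bits, vec.size))
+     return "".join(str(int(b)) for b in (planes @ vec > 0))
+ ```
+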
+ ## Installation
+
+ Install from PyPI with `pip`, or with the Rust-accelerated `uv` package manager:
+
+ ```bash
+ pip install syckpt
+ # Or using uv
+ uv pip install syckpt
+ ```
+
+ ## Quick Start
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.optim as optim
+ from syckpt import CheckpointManager
+ from syckpt.dataloader import StatefulDataLoader
+ from torch.utils.data import DataLoader, TensorDataset
+
+ # Typical PyTorch components
+ model = nn.Linear(10, 2)
+ optimizer = optim.SGD(model.parameters(), lr=0.01)
+
+ dummy_data = TensorDataset(torch.randn(100, 10), torch.randn(100, 2))
+ # Wrap the standard (non-deterministic) DataLoader so it can be resumed exactly
+ loader = StatefulDataLoader(DataLoader(dummy_data, batch_size=32, shuffle=True))
+
+ # Point at an S3 or local URL; atomic locks handle concurrent writers natively
+ with CheckpointManager("s3://my-experiments-bucket/.syckpt") as ckpt:
+     # 1. Register stateful objects (automatically mapped via flattening)
+     ckpt.model = model
+     ckpt.optimizer = optimizer
+     ckpt.dataloader = loader
+
+     # 2. Hyperparameters automatically generate the unique LSH hash
+     ckpt.config.lr = 0.01
+     ckpt.config.batch_size = 32
+
+     # 3. The training loop tracks the step and epoch counters for you
+     for epoch in ckpt.loop(epochs=10):
+         for batch_idx, batch in enumerate(loader):
+             loss = torch.randn(1)  # Fake loss
+             ckpt.step_up()
+
+         # Delta-compression kicks in automatically
+         if epoch % 2 == 0:
+             ckpt.save(metric=loss.item())
+
+     print(f"Run state saved at LSH commit: {ckpt.hash}")
+ ```
+
+ ## Feature Reference
+
+ ### Exporting Monolithic Assets (`.ckpt`)
+ If you are deploying your model and no longer need `.syckpt` branching, you can collapse the Git-tree into a standard monolithic PyTorch `.ckpt` file for deployment or upload to Hugging Face:
+ ```python
+ with CheckpointManager("./experiments") as ckpt:
+     # Recursively loads the flat delta-tensors and reconstitutes a standard state dict
+     ckpt.export_ckpt(hash_or_branch="main", output_path="final-model.ckpt")
+ ```
+
+ ### Full Distributed Resumption (DDP)
+ `syckpt` broadcasts LSH hashes and uses `dist.gather` to collect the per-rank RNG states across your entire multi-GPU cluster (see the sketch after this code block).
+ ```python
+ import numpy as np
+
+ with CheckpointManager("./") as ckpt:
+     # Register your modern NumPy Generator; the state manager captures its bit-generator state
+     ckpt.numpy_rng = np.random.default_rng()
+     ckpt.save()
+ ```
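+
+ For reference, the rank-state collection described above can be sketched with plain `torch.distributed` (illustrative only, assuming an initialized process group; `syckpt`'s internals may differ):
+
+ ```python
+ import torch
+ import torch.distributed as dist
+
+ def gather_rng_states(dst: int = 0):
+     # Each rank contributes its CPU RNG state (a ByteTensor); rank `dst`
+     # receives the full list and can embed it in the checkpoint commit.
+     state = torch.get_rng_state()
+     states = [None] * dist.get_world_size() if dist.get_rank() == dst else None
+     dist.gather_object(state, states, dst=dst)
+     return states  # list of per-rank states on dst, None elsewhere
+ ```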
+
+ ---
+
+ ## Architectural Deep-Dive
+ Curious how `syckpt v0.0.1` leverages Git-style pointers and `fsspec`'s atomic cloud primitives, manages PyTorch tensors, and accelerates loading via zero-copy safetensors?
+
+ Read the definitive educational walkthrough: [Implementation Guide (`implementation.md`)](./implementation.md).
syckpt-0.0.1/pyproject.toml ADDED
@@ -0,0 +1,44 @@
+ [build-system]
+ requires = ["setuptools>=61.0.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "syckpt"
+ version = "0.0.1"
+ description = "Git-like experiment tracking for deep learning with exact computational resumption"
+ readme = "README.md"
+ requires-python = ">=3.8"
+ license = { text = "MIT" }
+ authors = [
+     { name = "Sayak Chowdhury", email = "sayak.iiitb@gmail.com" }
+ ]
+ classifiers = [
+     "Development Status :: 3 - Alpha",
+     "Intended Audience :: Developers",
+     "Intended Audience :: Science/Research",
+     "License :: OSI Approved :: MIT License",
+ ]
+ dependencies = [
+     "torch>=2.0.0",
+     "numpy>=1.20.0",
+     "safetensors>=0.4.0",
+     "fsspec>=2023.0.0"
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=7.0.0",
+     "pytest-cov>=4.0.0",
+     "black>=23.0.0",
+     "mypy>=1.0.0",
+     "twine>=4.0.0",
+     "build>=1.0.0"
+ ]
+
+ [project.urls]
+ "Bug Reports" = "https://github.com/sykchw/syckpt/issues"
+ "Source" = "https://github.com/sykchw/syckpt"
+
+ [tool.setuptools]
+ packages = ["syckpt"]
+
syckpt-0.0.1/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
syckpt-0.0.1/setup.py ADDED
@@ -0,0 +1,58 @@
+ """Setup for syckpt package - Git-like experiment tracking for DL."""
+
+ from setuptools import setup, find_packages
+ from pathlib import Path
+
+ this_directory = Path(__file__).parent
+ long_description = (
+     (this_directory / "README.md").read_text()
+     if (this_directory / "README.md").exists()
+     else ""
+ )
+
+ setup(
+     name="syckpt",
+     version="0.0.1",
+     description="Git-like experiment tracking for deep learning with LSH hashing",
+     long_description=long_description,
+     long_description_content_type="text/markdown",
+     author="Sayak Chowdhury",
+     author_email="sayak.iiitb@gmail.com",
+     url="https://github.com/sykchw/syckpt",
+     packages=find_packages(exclude=["tests*", "docs*"]),
+     python_requires=">=3.8",
+     install_requires=[
+         "torch>=2.0.0",
+         "numpy>=1.20.0",
+         "safetensors>=0.4.0",
+         "fsspec>=2023.0.0"
+     ],
+     extras_require={
+         "dev": [
+             "pytest>=7.0.0",
+             "pytest-cov>=4.0.0",
+             "black>=23.0.0",
+             "mypy>=1.0.0",
+             "twine>=4.0.0",
+             "build>=1.0.0",
+         ],
+     },
+     classifiers=[
+         "Development Status :: 3 - Alpha",
+         "Intended Audience :: Developers",
+         "Intended Audience :: Science/Research",
+         "License :: OSI Approved :: MIT License",
+         "Programming Language :: Python :: 3",
+         "Programming Language :: Python :: 3.8",
+         "Programming Language :: Python :: 3.9",
+         "Programming Language :: Python :: 3.10",
+         "Programming Language :: Python :: 3.11",
+         "Programming Language :: Python :: 3.12",
+         "Topic :: Scientific/Engineering :: Artificial Intelligence",
+     ],
+     keywords="machine-learning deep-learning experiment-tracking checkpoint reproducibility lsh",
+     project_urls={
+         "Bug Reports": "https://github.com/sykchw/syckpt/issues",
+         "Source": "https://github.com/sykchw/syckpt",
+     },
+ )
syckpt-0.0.1/syckpt/__init__.py ADDED
@@ -0,0 +1,20 @@
+ """Checkpoint - Git-like experiment tracking for deep learning with LSH hashing."""
+
+ from syckpt.manager import CheckpointManager, create_checkpoint, Commit
+ from syckpt.config import HyperConfig
+ from syckpt.hash import LSHHashGenerator, DEFAULT_HASH_FACTORS
+ from syckpt.state import set_seed, get_rng_state, set_rng_state
+
+ __version__ = "0.0.1"
+
+ __all__ = [
+     "CheckpointManager",
+     "create_checkpoint",
+     "Commit",
+     "HyperConfig",
+     "LSHHashGenerator",
+     "DEFAULT_HASH_FACTORS",
+     "set_seed",
+     "get_rng_state",
+     "set_rng_state",
+ ]
syckpt-0.0.1/syckpt/config.py ADDED
@@ -0,0 +1,161 @@
+ """Configuration system with attribute and dict access."""
+
+ import copy
+ from typing import Any, Dict, Optional, Union
+ from collections.abc import Mapping
+
+
+ class HyperConfig(Mapping):
+     """A configuration object that supports both attribute and dict access.
+
+     Supports nested configuration via dot notation (e.g., config.a.b.c)
+     and provides full dict-like access (config['key'], config.get('key')).
+     """
+
+     def __init__(self, data: Optional[Dict[str, Any]] = None, **kwargs):
+         self._data: Dict[str, Any] = {}
+         if data:
+             self._data = self._flatten_dict(data) if isinstance(data, dict) else {}
+         self._data.update(kwargs)
+
+     def _flatten_dict(
+         self, d: Dict[str, Any], parent_key: str = "", sep: str = "."
+     ) -> Dict[str, Any]:
+         """Flatten nested dict into dot-notation keys."""
+         items = []
+         for k, v in d.items():
+             new_key = f"{parent_key}{sep}{k}" if parent_key else k
+             if isinstance(v, dict):
+                 items.extend(self._flatten_dict(v, new_key, sep=sep).items())
+             else:
+                 items.append((new_key, v))
+         return dict(items)
+
+     def _unflatten_dict(self, d: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
+         """Unflatten dot-notation keys back to nested dict."""
+         result = {}
+         for key, value in d.items():
+             parts = key.split(sep)
+             d_obj = result
+             for part in parts[:-1]:
+                 if part not in d_obj:
+                     d_obj[part] = {}
+                 d_obj = d_obj[part]
+             d_obj[parts[-1]] = value
+         return result
+
+     def __getattr__(self, name: str) -> Any:
+         if name.startswith("_"):
+             return object.__getattribute__(self, name)
+         unflattened = self._unflatten_dict(self._data)
+         if name in unflattened:
+             val = unflattened[name]
+             if isinstance(val, dict):
+                 # Nested section: wrap in a HyperConfig so chained dot
+                 # access (config.a.b.c) keeps working.
+                 return HyperConfig(val)
+             return val
+         if name in self._data:
+             return self._data[name]
+         raise AttributeError(f"'{type(self).__name__}' has no attribute '{name}'")
+
+     def __setattr__(self, name: str, value: Any) -> None:
+         if name.startswith("_"):
+             object.__setattr__(self, name, value)
+         else:
+             if isinstance(value, dict):
+                 for k, v in self._flatten_dict({name: value}).items():
+                     self._data[k] = v
+             else:
+                 self._data[name] = value
+
+     def __delattr__(self, name: str) -> None:
+         if name in self._data:
+             del self._data[name]
+             return  # deleted a flat dot-key; nothing left to do
+         unflattened = self._unflatten_dict(self._data)
+         if name in unflattened:
+             del unflattened[name]
+             self._data = self._flatten_dict(unflattened)
+             return
+         raise AttributeError(f"'{type(self).__name__}' has no attribute '{name}'")
+
+     def __getitem__(self, key: str) -> Any:
+         # Accept both flattened dot-keys ("a.b") and top-level nested keys
+         # ("a") so that iteration and item access agree (Mapping protocol).
+         if key in self._data:
+             return self._data[key]
+         unflattened = self._unflatten_dict(self._data)
+         if key in unflattened:
+             val = unflattened[key]
+             return HyperConfig(val) if isinstance(val, dict) else val
+         raise KeyError(key)
+
+     def __setitem__(self, key: str, value: Any) -> None:
+         if isinstance(value, dict):
+             for k, v in self._flatten_dict({key: value}).items():
+                 self._data[k] = v
+         else:
+             self._data[key] = value
+
+     def __delitem__(self, key: str) -> None:
+         del self._data[key]
+
+     def __contains__(self, key: object) -> bool:
+         if not isinstance(key, str):
+             return False
+         return key in self._data or key in self._unflatten_dict(self._data)
+
+     def __iter__(self):
+         return iter(self._unflatten_dict(self._data))
+
+     def __len__(self) -> int:
+         return len(self._unflatten_dict(self._data))
+
+     def __repr__(self) -> str:
+         return f"{type(self).__name__}({self._unflatten_dict(self._data)})"
+
+     def __str__(self) -> str:
+         import json
+
+         return json.dumps(self._unflatten_dict(self._data), indent=2)
+
+     def __bool__(self) -> bool:
+         return bool(self._data)
+
+     def get(self, key: str, default: Any = None) -> Any:
+         # Mirror __getitem__ so flat dot-keys and top-level keys both resolve.
+         try:
+             return self[key]
+         except KeyError:
+             return default
+
+     def update(
+         self, other: Union[Dict[str, Any], "HyperConfig"], **kwargs
+     ) -> "HyperConfig":
+         """Update config with new values."""
+         if other:
+             if isinstance(other, HyperConfig):
+                 self._data.update(other._data)
+             elif isinstance(other, dict):
+                 self._data.update(self._flatten_dict(other))
+         self._data.update(self._flatten_dict(kwargs))
+         return self
+
+     def to_dict(self) -> Dict[str, Any]:
+         """Return unflattened dict representation."""
+         return self._unflatten_dict(self._data)
+
+     def copy(self) -> "HyperConfig":
+         """Return a shallow copy."""
+         return HyperConfig(copy.copy(self._data))
+
+     def deep_copy(self) -> "HyperConfig":
+         """Return a deep copy."""
+         return HyperConfig(copy.deepcopy(self._data))
+
+     @classmethod
+     def from_dict(cls, data: Dict[str, Any]) -> "HyperConfig":
+         """Create config from dict."""
+         return cls(data)
+
+     def items(self):
+         """Return unflattened items."""
+         return self._unflatten_dict(self._data).items()
+
+     def keys(self):
+         """Return unflattened keys."""
+         return self._unflatten_dict(self._data).keys()
+
+     def values(self):
+         """Return unflattened values."""
+         return self._unflatten_dict(self._data).values()
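+
+
+ # --- Usage sketch (editor's illustration, not part of the shipped file) ---
+ # HyperConfig supports both attribute access and flattened dot-key access;
+ # behavior follows the class defined above.
+ if __name__ == "__main__":
+     cfg = HyperConfig({"optim": {"lr": 0.01, "momentum": 0.9}}, batch_size=32)
+     assert cfg.optim.lr == 0.01      # attribute access on a nested section
+     assert cfg["optim.lr"] == 0.01   # flattened dot-key access
+     assert "batch_size" in cfg and cfg.batch_size == 32
+     cfg["optim.weight_decay"] = 1e-4  # writes persist via the flat key
+     print(cfg.to_dict())  # nested view: {'optim': {...}, 'batch_size': 32}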
syckpt-0.0.1/syckpt/dataloader.py ADDED
@@ -0,0 +1,81 @@
+ import torch
+ from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
+
+
+ class StatefulDataLoader:
+     """A wrapper around a PyTorch DataLoader that allows exact resumption.
+
+     It tracks the batch index and the exact permuted index list produced by
+     the RandomSampler. Upon resumption it skips the batches that were
+     already consumed, so only the remaining batches of the epoch are yielded.
+     """
+
+     def __init__(self, dataloader: DataLoader, base_seed: int = 42):
+         self.dataloader = dataloader
+         self.base_seed = base_seed
+         self.batch_idx = 0
+         self.epoch = 0
+         self._indices = []
+         self._iterator = None
+         self._generator = torch.Generator()
+
+     def __iter__(self):
+         # Seed deterministically per epoch so each epoch's shuffle is
+         # exactly reproducible after a restart
+         self._generator.manual_seed(self.base_seed + self.epoch)
+
+         sampler = self.dataloader.sampler
+         if isinstance(sampler, RandomSampler):
+             # Point the RandomSampler at our seeded generator so the
+             # DataLoader replays exactly the permutation we record below
+             sampler.generator = self._generator
+             # Record the permutation with an identically seeded generator;
+             # recording must not consume the sampler's generator state
+             recorder = torch.Generator()
+             recorder.manual_seed(self.base_seed + self.epoch)
+             n = len(self.dataloader.dataset)
+             self._indices = torch.randperm(n, generator=recorder).tolist()
+         elif isinstance(sampler, SequentialSampler):
+             self._indices = list(range(len(self.dataloader.dataset)))
+         else:
+             # For custom samplers, extract indices if we can
+             try:
+                 self._indices = list(sampler)
+             except Exception:
+                 self._indices = []
+
+         self._iterator = iter(self.dataloader)
+
+         # Fast-forward if we resumed mid-epoch
+         if self.batch_idx > 0 and self._indices:
+             # Drop the items we already yielded from the recorded permutation
+             items_to_skip = self.batch_idx * self.dataloader.batch_size
+             self._indices = self._indices[items_to_skip:]
+
+             # A standard DataLoader cannot be sliced, so skip batch by batch.
+             # This wastes I/O on the skipped batches; the scalable fix is a
+             # sampler subclass that slices the permutation directly.
+             for _ in range(self.batch_idx):
+                 next(self._iterator, None)
+
+         return self
+
+     def __next__(self):
+         try:
+             batch = next(self._iterator)
+             self.batch_idx += 1
+             return batch
+         except StopIteration:
+             self.epoch += 1
+             self.batch_idx = 0
+             raise
+
+     def state_dict(self):
+         return {
+             "batch_idx": self.batch_idx,
+             "epoch": self.epoch,
+             "indices": self._indices,
+             "base_seed": self.base_seed,
+         }
+
+     def load_state_dict(self, state):
+         self.batch_idx = state.get("batch_idx", 0)
+         self.epoch = state.get("epoch", 0)
+         self._indices = state.get("indices", [])
+         self.base_seed = state.get("base_seed", 42)
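+
+
+ # --- Resumption sketch (editor's illustration, not part of the shipped file) ---
+ # Round-trips the loader's position through state_dict()/load_state_dict(),
+ # presumably the hooks CheckpointManager uses when a dataloader is registered.
+ if __name__ == "__main__":
+     from torch.utils.data import TensorDataset
+
+     dataset = TensorDataset(torch.randn(10, 3))
+     loader = StatefulDataLoader(DataLoader(dataset, batch_size=2, shuffle=True))
+
+     it = iter(loader)
+     next(it)                        # consume one batch...
+     snapshot = loader.state_dict()  # ...and snapshot the position
+
+     resumed = StatefulDataLoader(DataLoader(dataset, batch_size=2, shuffle=True))
+     resumed.load_state_dict(snapshot)
+     remaining = list(resumed)  # yields only the 4 batches left in the epoch
+     assert len(remaining) == 4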