episodevault 0.0.2__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 [AUTHOR]
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,152 @@
1
+ Metadata-Version: 2.4
2
+ Name: episodevault
3
+ Version: 0.0.2
4
+ Summary: Version control and lineage tracking for robot training episode data
5
+ License: MIT
6
+ Requires-Python: >=3.10
7
+ Description-Content-Type: text/markdown
8
+ License-File: LICENSE
9
+ Requires-Dist: pyarrow>=14.0
10
+ Requires-Dist: pandas>=2.0
11
+ Requires-Dist: duckdb>=0.10
12
+ Requires-Dist: deltalake>=0.17
13
+ Requires-Dist: click>=8.1
14
+ Requires-Dist: rich>=13.0
15
+ Requires-Dist: pydantic>=2.0
16
+ Requires-Dist: xxhash>=3.4
17
+ Provides-Extra: dev
18
+ Requires-Dist: pytest>=8.0; extra == "dev"
19
+ Requires-Dist: pytest-cov>=5.0; extra == "dev"
20
+ Requires-Dist: ruff>=0.4; extra == "dev"
21
+ Requires-Dist: mypy>=1.10; extra == "dev"
22
+ Dynamic: license-file
23
+
24
+ # EpisodeVault: find out exactly why your robot model regressed.
25
+
26
+ [![PyPI](https://img.shields.io/pypi/v/episodevault)](https://pypi.org/project/episodevault/)
27
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
28
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
29
+ [![LeRobot v3](https://img.shields.io/badge/LeRobot-v3-green.svg)](https://github.com/huggingface/lerobot)
30
+
31
+ ---
32
+
33
+ ## The problem
34
+
35
+ Every robotics ML engineer has retrained a model and watched performance drop with no clear cause. DVC tracks which files changed. MLflow tracks which hyperparameters ran. Neither tracks what changed at the episode level between v1 and v2 of your dataset: which tasks dropped out, which quality metrics shifted, or how the task distribution moved.
36
+
37
+ EpisodeVault fills that gap.
38
+
39
+ ---
40
+
41
+ ## What EpisodeVault does
42
+
43
+ Run `episodevault diff v1.0 v2.0` and get this:
44
+
45
+ ```
46
+ Dataset diff: v1.0 → v2.0
47
+ ────────────────────────────────────────────────────
48
+ Episodes added: +0
49
+ Episodes removed: -7
50
+
51
+ Distribution shift:
52
+ factory_pick 2 → 6 ↑ 200% ⚠️
53
+ kitchen_grasp 4 → 1 ↓ 75% ⚠️
54
+
55
+ Quality metrics:
56
+ avg episode length: 3.7s → 3.0s ↓
57
+ success_rate: 0.88 → 0.38 ↓
58
+ camera_sync_score: 1.00 → 1.00 →
59
+
60
+ Regression candidates (ranked by magnitude; correlate with your eval):
61
+ - 'kitchen_grasp' episodes dropped 75% (4 → 1). Restore from prior
62
+ version if this task is in your eval benchmark.
63
+ - Success rate fell 50% (0.88 → 0.38). New episodes may contain failed
64
+ demonstrations. Run score_lerobot_episodes to identify low-quality additions.
65
+ ```
66
+
67
+ ---
68
+
69
+ ## Install
70
+
71
+ ```bash
72
+ pip install episodevault
73
+ ```
74
+
75
+ Requires Python 3.10+. Key dependencies: `pyarrow`, `pandas`, `duckdb`, `click`, `rich`, `pydantic`.
76
+
77
+ ---
78
+
79
+ ## Quickstart
80
+
81
+ ```bash
82
+ # Start tracking a local LeRobot dataset
83
+ episodevault track ./my_dataset
84
+
85
+ # Snapshot the current state with a message
86
+ episodevault commit ./my_dataset -m "added 500 kitchen episodes"
87
+
88
+ # Compare two snapshots
89
+ episodevault diff v1.0 v2.0 ./my_dataset
90
+
91
+ # Find what dataset a model was trained on and diff against the prior version
92
+ episodevault blame model_v3 ./my_dataset
93
+ ```
94
+
95
+ `track` initializes a `.episodevault/` store inside your dataset directory. `commit` takes the dataset path and snapshots the episode manifest, not the raw sensor data, so it is fast. `diff` computes task distribution shift and quality deltas between any two versions. `blame` looks up which dataset version trained a given model and diffs it against the version before. `diff` and `blame` accept the dataset path as an optional last argument that defaults to the current directory.
96
+
97
+ ---
98
+
99
+ ## Python API
100
+
101
+ Log a training run from your training script so `blame` can trace it back:
102
+
103
+ ```python
104
+ import episodevault as ev
105
+
106
+ ev.log_training_run(
107
+     model_version="model_v3",
108
+     dataset_version="v2.0",
109
+     framework="lerobot",
110
+ )
111
+ ```
112
+
113
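+ One call. That's all `blame` needs. The run is recorded in `.episodevault/` under `dataset_path`, which defaults to the current directory; pass it explicitly if your training script runs elsewhere.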
114
+
115
+ ---
116
+
117
+ ## Compatibility
118
+
119
+ Tested against real HuggingFace LeRobot v3 datasets:
120
+
121
+ | Dataset | Robot | Format | Episodes | Parse time | Status |
122
+ |------------------|--------|------------|----------|------------|--------|
123
+ | aloha_pencil | aloha | LeRobot v3 | 25 | 0.33s | OK |
124
+ | aloha_shrimp | aloha | LeRobot v3 | 18 | 0.38s | OK |
125
+ | so100_stacking | so100 | LeRobot v3 | 56 | 0.65s | OK |
126
+ | aloha_cabinet | aloha | LeRobot v3 | 85 | 2.65s | OK |
127
+
128
+ Parse time is for the episode manifest only. Raw sensor data (video, joint trajectories) is never loaded.
129
+
130
+ ---
131
+
132
+ ## How it works
133
+
134
+ - Parses episode manifests (`meta/episodes/`, `meta/tasks.parquet`, `meta/info.json`) without loading raw sensor data -- parse time is independent of frame count and video size (0.33s to 2.65s on the datasets listed under Compatibility).
135
+ - Snapshots manifests into a version store on every `commit` -- diff and time travel are built in from the start.
136
+ - Diff engine computes task distribution shift and quality deltas between any two snapshots -- regression candidates are ranked by a normalized severity score and the top few are surfaced, not asserted as proven causes.
137
+
138
+
139
+ ---
140
+
141
+ ## Credits
142
+
143
+ - [HuggingFace LeRobot](https://github.com/huggingface/lerobot) team for the v3 dataset format that EpisodeVault parses.
144
+ - Berkeley AutoLab (Kaiyuan Chen et al.) for [Robo-DM / fog_x](https://github.com/BerkeleyAutomation/fog_x), prior work on robot dataset management.
145
+ - [score_lerobot_episodes](https://github.com/RobotData/score-lerobot-episodes) by RobotData for quality signal methodology.
146
+ - [Evidently AI](https://github.com/evidentlyai/evidently) for drift detection methodology that informed the distribution shift logic.
147
+
148
+ ---
149
+
150
+ ## License
151
+
152
+ MIT. See [LICENSE](LICENSE).
@@ -0,0 +1,129 @@
1
+ # EpisodeVault: find out exactly why your robot model regressed.
2
+
3
+ [![PyPI](https://img.shields.io/pypi/v/episodevault)](https://pypi.org/project/episodevault/)
4
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
6
+ [![LeRobot v3](https://img.shields.io/badge/LeRobot-v3-green.svg)](https://github.com/huggingface/lerobot)
7
+
8
+ ---
9
+
10
+ ## The problem
11
+
12
+ Every robotics ML engineer has retrained a model and watched performance drop with no clear cause. DVC tracks which files changed. MLflow tracks which hyperparameters ran. Neither tracks what changed at the episode level between v1 and v2 of your dataset: which tasks dropped out, which quality metrics shifted, or how the task distribution moved.
13
+
14
+ EpisodeVault fills that gap.
15
+
16
+ ---
17
+
18
+ ## What EpisodeVault does
19
+
20
+ Run `episodevault diff v1.0 v2.0` and get this:
21
+
22
+ ```
23
+ Dataset diff: v1.0 → v2.0
24
+ ────────────────────────────────────────────────────
25
+ Episodes added: +0
26
+ Episodes removed: -7
27
+
28
+ Distribution shift:
29
+ factory_pick 2 → 6 ↑ 200% ⚠️
30
+ kitchen_grasp 4 → 1 ↓ 75% ⚠️
31
+
32
+ Quality metrics:
33
+ avg episode length: 3.7s → 3.0s ↓
34
+ success_rate: 0.88 → 0.38 ↓
35
+ camera_sync_score: 1.00 → 1.00 →
36
+
37
+ Regression candidates (ranked by magnitude; correlate with your eval):
38
+ - 'kitchen_grasp' episodes dropped 75% (4 → 1). Restore from prior
39
+ version if this task is in your eval benchmark.
40
+ - Success rate fell 50% (0.88 → 0.38). New episodes may contain failed
41
+ demonstrations. Run score_lerobot_episodes to identify low-quality additions.
42
+ ```
43
+
44
+ ---
45
+
46
+ ## Install
47
+
48
+ ```bash
49
+ pip install episodevault
50
+ ```
51
+
52
+ Requires Python 3.10+. Key dependencies: `pyarrow`, `pandas`, `duckdb`, `click`, `rich`, `pydantic`.
53
+
54
+ ---
55
+
56
+ ## Quickstart
57
+
58
+ ```bash
59
+ # Start tracking a local LeRobot dataset
60
+ episodevault track ./my_dataset
61
+
62
+ # Snapshot the current state with a message
63
+ episodevault commit ./my_dataset -m "added 500 kitchen episodes"
64
+
65
+ # Compare two snapshots
66
+ episodevault diff v1.0 v2.0 ./my_dataset
67
+
68
+ # Find what dataset a model was trained on and diff against the prior version
69
+ episodevault blame model_v3 ./my_dataset
70
+ ```
71
+
72
+ `track` initializes a `.episodevault/` store inside your dataset directory. `commit` takes the dataset path and snapshots the episode manifest, not the raw sensor data, so it is fast. `diff` computes task distribution shift and quality deltas between any two versions. `blame` looks up which dataset version trained a given model and diffs it against the version before. `diff` and `blame` accept the dataset path as an optional last argument that defaults to the current directory.
73
+
74
+ ---
75
+
76
+ ## Python API
77
+
78
+ Log a training run from your training script so `blame` can trace it back:
79
+
80
+ ```python
81
+ import episodevault as ev
82
+
83
+ ev.log_training_run(
84
+     model_version="model_v3",
85
+     dataset_version="v2.0",
86
+     framework="lerobot",
87
+ )
88
+ ```
89
+
90
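+ One call. That's all `blame` needs. The run is recorded in `.episodevault/` under `dataset_path`, which defaults to the current directory; pass it explicitly if your training script runs elsewhere.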
91
+
92
+ ---
93
+
94
+ ## Compatibility
95
+
96
+ Tested against real HuggingFace LeRobot v3 datasets:
97
+
98
+ | Dataset | Robot | Format | Episodes | Parse time | Status |
99
+ |------------------|--------|------------|----------|------------|--------|
100
+ | aloha_pencil | aloha | LeRobot v3 | 25 | 0.33s | OK |
101
+ | aloha_shrimp | aloha | LeRobot v3 | 18 | 0.38s | OK |
102
+ | so100_stacking | so100 | LeRobot v3 | 56 | 0.65s | OK |
103
+ | aloha_cabinet | aloha | LeRobot v3 | 85 | 2.65s | OK |
104
+
105
+ Parse time is for the episode manifest only. Raw sensor data (video, joint trajectories) is never loaded.
106
+
107
+ ---
108
+
109
+ ## How it works
110
+
111
+ - Parses episode manifests (`meta/episodes/`, `meta/tasks.parquet`, `meta/info.json`) without loading raw sensor data -- parse time is independent of frame count and video size (0.33s to 2.65s on the datasets listed under Compatibility).
112
+ - Snapshots manifests into a version store on every `commit` -- diff and time travel are built in from the start.
113
+ - Diff engine computes task distribution shift and quality deltas between any two snapshots -- regression candidates are ranked by a normalized severity score and the top few are surfaced, not asserted as proven causes.
114
+
115
+
116
+ ---
117
+
118
+ ## Credits
119
+
120
+ - [HuggingFace LeRobot](https://github.com/huggingface/lerobot) team for the v3 dataset format that EpisodeVault parses.
121
+ - Berkeley AutoLab (Kaiyuan Chen et al.) for [Robo-DM / fog_x](https://github.com/BerkeleyAutomation/fog_x), prior work on robot dataset management.
122
+ - [score_lerobot_episodes](https://github.com/RobotData/score-lerobot-episodes) by RobotData for quality signal methodology.
123
+ - [Evidently AI](https://github.com/evidentlyai/evidently) for drift detection methodology that informed the distribution shift logic.
124
+
125
+ ---
126
+
127
+ ## License
128
+
129
+ MIT. See [LICENSE](LICENSE).
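The distribution-shift figures in the README's sample output can be reproduced with a short sketch. This is not EpisodeVault's diff engine, whose source is not part of this diff; `task_shift` and `relative_change` are hypothetical helpers that only illustrate the arithmetic, assuming each episode carries a single task label.

```python
from __future__ import annotations

from collections import Counter


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def task_shift(before: list[str], after: list[str]) -> dict[str, tuple[int, int, float | None]]:
    """Per-task episode counts before/after, plus relative change in percent."""
    counts_before = Counter(before)
    counts_after = Counter(after)
    shift: dict[str, tuple[int, int, float | None]] = {}
    for task in sorted(set(counts_before) | set(counts_after)):
        old, new = counts_before[task], counts_after[task]
        # Relative change is undefined for a task that had no episodes before.
        pct = None if old == 0 else relative_change(old, new)
        shift[task] = (old, new, pct)
    return shift


# Task labels chosen to match the counts in the README sample output.
before = ["factory_pick"] * 2 + ["kitchen_grasp"] * 4
after = ["factory_pick"] * 6 + ["kitchen_grasp"] * 1

result = task_shift(before, after)
for task, (old, new, pct) in result.items():
    change = "new task" if pct is None else f"{pct:+.0f}%"
    print(f"{task}: {old} -> {new} ({change})")

# The success-rate line (0.88 -> 0.38) is a drop of 0.50 in absolute terms,
# which is about 57% in relative terms.
success_rate_change = relative_change(0.88, 0.38)
```

Note that the task rows in the sample output use relative change (200%, 75%), while "Success rate fell 50%" matches the absolute difference; the two conventions give different numbers for the same pair of values.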
@@ -0,0 +1,19 @@
+ from episodevault.api import log_training_run
+ from episodevault.diff.engine import diff
+ from episodevault.models import DatasetManifest, EpisodeManifest, EpisodeQuality
+ from episodevault.parsers.lerobot import parse as parse_lerobot
+ from episodevault.store.lineage_store import LineageStore
+ from episodevault.store.version_store import VersionStore
+
+ __all__ = [
+     "parse_lerobot",
+     "VersionStore",
+     "LineageStore",
+     "diff",
+     "log_training_run",
+     "DatasetManifest",
+     "EpisodeManifest",
+     "EpisodeQuality",
+ ]
+
+ __version__ = "0.0.2"
@@ -0,0 +1,23 @@
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import Any
+
+ from episodevault.store.lineage_store import LineageStore
+
+
+ def log_training_run(
+     model_version: str,
+     dataset_version: str,
+     dataset_path: str | Path = ".",
+     framework: str = "lerobot",
+     **extras: Any,
+ ) -> None:
+     store_path = Path(dataset_path) / ".episodevault"
+     lineage = LineageStore(store_path)
+     lineage.log_training_run(
+         model_version=model_version,
+         dataset_version=dataset_version,
+         framework=framework,
+         **extras,
+     )
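`log_training_run` delegates to `LineageStore`, whose source is not included in this diff. As a rough illustration of the contract the CLI relies on (record a model-to-dataset mapping, then look it up by model version), here is a minimal sketch. The class `JsonlLineageStore`, the file name `lineage.jsonl`, and the "last logged run wins" rule are all assumptions made for illustration, not the package's actual storage format.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class JsonlLineageStore:
    """Hypothetical stand-in for LineageStore: one JSON record per training run."""

    def __init__(self, store_path: Path) -> None:
        self._file = Path(store_path) / "lineage.jsonl"  # assumed file name

    def log_training_run(
        self,
        model_version: str,
        dataset_version: str,
        framework: str = "lerobot",
        **extras: object,
    ) -> None:
        self._file.parent.mkdir(parents=True, exist_ok=True)
        record = {
            "model_version": model_version,
            "dataset_version": dataset_version,
            "framework": framework,
            **extras,
        }
        # Append-only log: earlier runs are never rewritten.
        with self._file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def dataset_version_for_model(self, model_version: str) -> str | None:
        if not self._file.exists():
            return None
        found: str | None = None
        with self._file.open(encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)
                if record["model_version"] == model_version:
                    found = record["dataset_version"]  # assumption: last run wins
        return found


with tempfile.TemporaryDirectory() as tmp:
    store = JsonlLineageStore(Path(tmp) / ".episodevault")
    store.log_training_run("model_v3", "v2.0", learning_rate=1e-4)
    looked_up = store.dataset_version_for_model("model_v3")
    missing = store.dataset_version_for_model("model_v9")

print(looked_up, missing)
```

Returning `None` for an unknown model mirrors how the `blame` command treats a missing training run.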
File without changes
@@ -0,0 +1,178 @@
+ from __future__ import annotations
+
+ import sys
+ from pathlib import Path
+
+ import click
+ from rich.console import Console
+ from rich.table import Table
+
+ from episodevault.diff.engine import diff as compute_diff
+ from episodevault.parsers.lerobot import parse as parse_lerobot
+ from episodevault.store.lineage_store import LineageStore
+ from episodevault.store.version_store import VersionStore
+
+ _console = Console(force_terminal=True)
+
+ def _resolve_store(dataset_path: Path) -> Path:
+     return dataset_path / ".episodevault"
+
+
+ @click.group()
+ def cli() -> None:
+     pass
+
+
+ @cli.command()
+ @click.argument("dataset_path")
+ def track(dataset_path: str) -> None:
+     if dataset_path.count("/") == 1 and not dataset_path.startswith((".", "/", "~")) and not Path(dataset_path).exists():
+         local_name = dataset_path.replace("/", "__")
+         _console.print(
+             f"[red]error:[/red] '{dataset_path}' looks like a HuggingFace repo ID, "
+             "not a local path."
+         )
+         _console.print(
+             f"Download first with:\n"
+             f" huggingface-cli download --repo-type dataset {dataset_path} "
+             f"--local-dir ./{local_name}"
+         )
+         sys.exit(1)
+
+     root = Path(dataset_path)
+     if not root.exists():
+         _console.print(f"[red]error:[/red] path '{dataset_path}' does not exist.")
+         sys.exit(1)
+
+     store_path = _resolve_store(root)
+     store_path.mkdir(parents=True, exist_ok=True)
+     (store_path / ".gitignore").write_text("*.parquet\n")  # keep snapshot files out of git
+     _console.print(f"[green]Tracking[/green] {root.resolve()}")
+     _console.print(f"Store initialised at {store_path}")
+
+
+ @cli.command()
+ @click.argument("dataset_path", type=click.Path(exists=True))
+ @click.option("-m", "--message", required=True, help="Commit message")
+ def commit(dataset_path: str, message: str) -> None:
+     root = Path(dataset_path)
+     store_path = _resolve_store(root)
+
+     if not store_path.exists():
+         _console.print(
+             "[red]error:[/red] dataset not tracked. Run `episodevault track` first."
+         )
+         sys.exit(1)
+
+     _console.print("Parsing dataset…")
+     try:
+         manifest = parse_lerobot(root)
+     except (FileNotFoundError, ValueError) as exc:
+         _console.print(f"[red]error:[/red] {exc}")
+         sys.exit(1)
+
+     store = VersionStore(store_path)
+     version_id = store.commit(manifest, message)
+
+     _console.print(f"[green]Committed[/green] {version_id} {message}")
+     _console.print(
+         f" {manifest.total_episodes} episodes · "
+         f"{len(manifest.tasks)} tasks · "
+         f"{manifest.robot_type}"
+     )
+
+
+ @cli.command(name="diff")
+ @click.argument("version_before")
+ @click.argument("version_after")
+ @click.argument("dataset_path", type=click.Path(exists=True), default=".")
+ def diff_cmd(version_before: str, version_after: str, dataset_path: str) -> None:
+     root = Path(dataset_path)
+     store_path = _resolve_store(root)
+     store = VersionStore(store_path)
+
+     try:
+         manifest_before = store.read_version(version_before)
+         manifest_after = store.read_version(version_after)
+     except KeyError as exc:
+         _console.print(f"[red]error:[/red] {exc}")
+         sys.exit(1)
+
+     object.__setattr__(manifest_before, "_version_id", version_before)
+     object.__setattr__(manifest_after, "_version_id", version_after)
+
+     result = compute_diff(manifest_before, manifest_after)
+     _console.print(result.format())
+
+
+ @cli.command()
+ @click.argument("model_version")
+ @click.argument("dataset_path", type=click.Path(exists=True), default=".")
+ def blame(model_version: str, dataset_path: str) -> None:
+     root = Path(dataset_path)
+     store_path = _resolve_store(root)
+     lineage = LineageStore(store_path)
+     store = VersionStore(store_path)
+
+     dataset_version = lineage.dataset_version_for_model(model_version)
+     if dataset_version is None:
+         _console.print(
+             f"[red]error:[/red] No training run found for model '{model_version}'. "
+             "Log runs with ev.log_training_run() in your training script."
+         )
+         sys.exit(1)
+
+     versions = store.list_versions()
+     version_ids = [v["version_id"] for v in versions]
+
+     if dataset_version not in version_ids:
+         _console.print(
+             f"[yellow]warning:[/yellow] Model '{model_version}' trained on "
+             f"dataset version '{dataset_version}' which is not in the store."
+         )
+         sys.exit(1)
+
+     idx = version_ids.index(dataset_version)
+     _console.print(
+         f"[bold]{model_version}[/bold] was trained on dataset version "
+         f"[bold]{dataset_version}[/bold]"
+     )
+
+     if idx == 0:
+         _console.print("No prior version to diff against.")
+         return
+
+     prior_version = version_ids[idx - 1]
+     _console.print(f"Diffing {prior_version} → {dataset_version}:\n")
+
+     manifest_before = store.read_version(prior_version)
+     manifest_after = store.read_version(dataset_version)
+     result = compute_diff(manifest_before, manifest_after)
+     _console.print(result.format())
+
+
+ @cli.command()
+ @click.argument("dataset_path", type=click.Path(exists=True), default=".")
+ def log(dataset_path: str) -> None:
+     root = Path(dataset_path)
+     store_path = _resolve_store(root)
+     store = VersionStore(store_path)
+     versions = store.list_versions()
+
+     if not versions:
+         _console.print("No commits yet.")
+         return
+
+     table = Table(show_header=True, header_style="bold")
+     table.add_column("Version")
+     table.add_column("Episodes", justify="right")
+     table.add_column("Message")
+
+     for v in reversed(versions):  # newest commit first
+         table.add_row(
+             str(v["version_id"]),
+             str(v["total_episodes"]),
+             str(v["commit_message"]),
+         )
+
+     _console.print(table)
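The version lookup inside `blame` can be isolated into a small pure function, which makes the edge cases easier to see. The sketch below assumes that `VersionStore.list_versions()` returns commits oldest first, which is what the `reversed(versions)` call in `log` suggests but which this diff does not confirm. `prior_version` and the sample `history` records are hypothetical.

```python
from __future__ import annotations


def prior_version(versions: list[dict[str, object]], dataset_version: str) -> str | None:
    """Return the version committed just before `dataset_version`.

    Returns None when `dataset_version` is the first commit, and raises
    KeyError when it is not in the store at all.
    """
    version_ids = [str(v["version_id"]) for v in versions]
    if dataset_version not in version_ids:
        raise KeyError(f"dataset version '{dataset_version}' is not in the store")
    idx = version_ids.index(dataset_version)
    return None if idx == 0 else version_ids[idx - 1]


# Assumed shape of list_versions() output, oldest commit first.
history = [
    {"version_id": "v1.0", "total_episodes": 13, "commit_message": "initial import"},
    {"version_id": "v2.0", "total_episodes": 7, "commit_message": "pruned episodes"},
]

print(prior_version(history, "v2.0"))
print(prior_version(history, "v1.0"))
```

Separating "first commit" (nothing to diff against) from "unknown version" (a lineage record that points outside the store) matches the two distinct branches in the command above.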
File without changes