vlakit 0.1.0__tar.gz
- vlakit-0.1.0/.gitignore +16 -0
- vlakit-0.1.0/LICENSE +21 -0
- vlakit-0.1.0/PKG-INFO +121 -0
- vlakit-0.1.0/README.md +91 -0
- vlakit-0.1.0/pyproject.toml +45 -0
- vlakit-0.1.0/src/vlakit/__init__.py +8 -0
- vlakit-0.1.0/src/vlakit/cli.py +199 -0
- vlakit-0.1.0/src/vlakit/scripts/data/README.md +45 -0
- vlakit-0.1.0/src/vlakit/scripts/data/compute_stats.py +128 -0
- vlakit-0.1.0/src/vlakit/scripts/data/convert_v21_to_v30.sh +41 -0
- vlakit-0.1.0/src/vlakit/scripts/data/heldout_split.py +72 -0
- vlakit-0.1.0/src/vlakit/scripts/eval/README.md +59 -0
- vlakit-0.1.0/src/vlakit/scripts/eval/rollout.py +263 -0
- vlakit-0.1.0/src/vlakit/scripts/lib/README.md +43 -0
- vlakit-0.1.0/src/vlakit/scripts/lib/common.sh +64 -0
- vlakit-0.1.0/src/vlakit/scripts/lib/config.sh +121 -0
- vlakit-0.1.0/src/vlakit/scripts/lib/remote.sh +40 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/README.md +46 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/autoresume.sh +34 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/autoresume_dispatch.sh +29 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/bootstrap_box.sh +122 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/ensure_swap.sh +47 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/finalize.sh +101 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/launch.sh +60 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/monitor.sh +125 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/publish_checkpoint.py +47 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/pull_artifacts_to_storage.py +51 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/remote.sh +66 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/rescale_gpus.sh +61 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/safe_kill.sh +12 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/salvage_checkpoint.sh +101 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/stage_assets.sh +26 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/teardown.sh +126 -0
- vlakit-0.1.0/src/vlakit/scripts/ops/verify_checkpoint.py +37 -0
- vlakit-0.1.0/src/vlakit/scripts/patches/README.md +19 -0
- vlakit-0.1.0/src/vlakit/scripts/patches/patch_lerobot_resume_overrides.py +94 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/lerobot/README.md +47 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/lerobot/launch.sh +77 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/molmo/README.md +47 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/molmo/launch.sh +81 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/openpi/README.md +52 -0
- vlakit-0.1.0/src/vlakit/scripts/stacks/openpi/launch.sh +67 -0
- vlakit-0.1.0/src/vlakit/templates/configs/.gitignore +10 -0
- vlakit-0.1.0/src/vlakit/templates/configs/README.md +68 -0
- vlakit-0.1.0/src/vlakit/templates/configs/_defaults.yaml +39 -0
- vlakit-0.1.0/src/vlakit/templates/configs/baselines.yaml +38 -0
- vlakit-0.1.0/src/vlakit/templates/configs/boxes.yaml +46 -0
- vlakit-0.1.0/src/vlakit/templates/configs/datasets.yaml +35 -0
- vlakit-0.1.0/src/vlakit/templates/configs/runs/mixed_qa_lerobot.yaml +34 -0
- vlakit-0.1.0/src/vlakit/templates/configs/runs/mixed_qa_openpi.yaml +33 -0
- vlakit-0.1.0/src/vlakit/templates/configs/runs/towel_lerobot.yaml +31 -0
- vlakit-0.1.0/src/vlakit/templates/configs/runs/towel_molmo.yaml +38 -0
- vlakit-0.1.0/src/vlakit/templates/configs/runs/towel_openpi.yaml +33 -0
- vlakit-0.1.0/src/vlakit/templates/configs/secrets.example.env +23 -0
vlakit-0.1.0/.gitignore
ADDED
vlakit-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 kkipngenokoech
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
vlakit-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,121 @@
+Metadata-Version: 2.4
+Name: vlakit
+Version: 0.1.0
+Summary: Config-driven CLI to launch, monitor, and ship VLA fine-tunes across ephemeral GPU boxes.
+Project-URL: Homepage, https://github.com/kkipngenokoech/vlakit
+Project-URL: Issues, https://github.com/kkipngenokoech/vlakit/issues
+Author-email: kkipngenokoech <kkipngenokoech22@gmail.com>
+License: MIT
+License-File: LICENSE
+Keywords: fine-tuning,lerobot,molmoact,openpi,robotics,vla
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: POSIX
+Classifier: Programming Language :: Python :: 3
+Requires-Python: >=3.10
+Provides-Extra: all
+Requires-Dist: numpy; extra == 'all'
+Requires-Dist: pyarrow; extra == 'all'
+Requires-Dist: pyyaml; extra == 'all'
+Requires-Dist: wandb; extra == 'all'
+Provides-Extra: stats
+Requires-Dist: numpy; extra == 'stats'
+Requires-Dist: pyarrow; extra == 'stats'
+Provides-Extra: wandb
+Requires-Dist: wandb; extra == 'wandb'
+Provides-Extra: yaml
+Requires-Dist: pyyaml; extra == 'yaml'
+Description-Content-Type: text/markdown
+
+# vlakit
+
+`vlakit` is a config-driven command-line tool for launching VLA fine-tunes across **ephemeral GPU boxes**, keeping them alive through crashes, and shipping verified weights to durable storage. You describe a run in YAML and drive everything with a single `vla` command from your laptop; the actual work runs on a GPU box over ssh, and the durable artifacts land on Weights & Biases and a storage box — never on the GPU box, which you throw away.
+
+It packages a set of battle-tested shell and Python scripts (the ones that encode the hard-won operational lessons) behind a friendly CLI. The scripts ship with the install as read-only package data, while your environment — the boxes, datasets, baselines, and runs — lives in a `configs/` directory you own and edit.
+
+## Install
+
+`vlakit` is best installed as an isolated CLI with [pipx](https://pipx.pypa.io):
+
+```bash
+pipx install vlakit
+```
+
+Or with pip. The core install is dependency-light because the laptop-side commands need almost nothing; the heavier pieces are opt-in extras:
+
+```bash
+pip install vlakit # laptop-side: config / remote / launch
+pip install "vlakit[stats]" # adds numpy + pyarrow for `vla stats`
+pip install "vlakit[wandb]" # adds wandb for publish / pull / eval logging
+pip install "vlakit[all]" # everything
+```
+
+## Quickstart
+
+```bash
+vla init # scaffold an editable ./configs from templates
+# edit configs/boxes.yaml, datasets.yaml, baselines.yaml, and a runs/<name>.yaml
+# then copy configs/secrets.example.env -> configs/secrets.local.env and fill it
+
+vla config <run> # resolve + print the run config (local, no box)
+vla remote <box> deploy # rsync the toolkit + your configs onto the box
+vla remote <box> push-secrets # install ~/.secrets.env on the box (mode 600)
+vla remote <box> ensure-swap # provision swap (absorbs the checkpoint-save spike)
+vla launch <run> # launch detached + auto-resume (box read from the run cfg)
+vla remote <box> monitor # step / rate / ETA + liveness
+```
+
+## Where each command runs
+
+`vlakit` keeps a clean split between your laptop and the GPU box. Commands that resolve or inspect configuration — `init`, `config`, `stats`, `split`, `eval`, `doctor` — run entirely on your **laptop** and need no box. Commands that operate on a machine — everything under `vla remote ...`, and `vla launch` — open an ssh connection from your laptop and run the work **on the box** defined in your `boxes.yaml`.
+
+Run `vla doctor` to see exactly which scripts directory and config directory were resolved, and which optional dependencies are installed.
+
+## Commands
+
+| Command | Runs | What it does |
+|---|---|---|
+| `vla init [dir]` | laptop | Scaffolds an editable `configs/` directory from the bundled templates. |
+| `vla config <run>` | laptop | Resolves a run (defaults merged under the run) and prints the config plus the exact command, running nothing. |
+| `vla remote <box> <subcmd> [args]` | box | Runs an operational subcommand on the box: `deploy`, `push-secrets`, `ensure-swap`, `launch`, `autoresume`, `monitor`, `kill`, `rescale`, `pull`, `gpus`, `exec`, `shell`. |
+| `vla launch <run>` | box | Launches the run detached and auto-resuming; the box is read from the run's `box:` field. |
+| `vla stats [args]` | laptop/box | Computes the full dataset statistics (quantiles + image stats) that lerobot and molmo need. Requires the `[stats]` extra. |
+| `vla split [args]` | laptop | Produces a deterministic held-out episode split for validation/eval. |
+| `vla eval [args]` | laptop/box | Ranks a checkpoint by held-out error or rollout, not loss. Try `vla eval --self-test` to verify the harness with no box. |
+| `vla doctor` | laptop | Prints the resolved scripts/config directories and optional-dependency status. |
+
+## Configuration
+
+Your `configs/` directory holds everything dynamic, and no secrets ever live in it: keys resolve on the box via `~/.secrets.env`. The directory is resolved from `--config-dir`, then the `VLA_CONFIG_DIR` environment variable, then `./configs`. A run file under `runs/` is a thin recipe that names a `box`, a `dataset`, and a `baseline` — each a pointer into the corresponding registry — plus a few hyperparameters; everything else is inherited from `_defaults.yaml`.
+
+## Status
+
+The local commands (`init`, `config`, `stats`, `split`, `eval`, `doctor`) are implemented and tested. The remote commands shell out to the bundled, proven ops scripts; `vla remote <box> deploy` now ships both the toolkit and your `configs/` to the box. The `eval` offline comparator is implemented and self-tested (`vla eval --self-test`); its sim and robot rollout modes are still stubs.
+
+## Publishing (maintainers)
+
+Releases publish to PyPI automatically through [Trusted Publishing](https://docs.pypi.org/trusted-publishers/) (OIDC), so no API token is stored anywhere. The workflow is [`.github/workflows/release.yml`](.github/workflows/release.yml).
+
+One-time setup:
+
+1. On PyPI, add a **pending** Trusted Publisher (Account → Publishing) with these exact values:
+   - PyPI Project Name: `vlakit`
+   - Owner: `kkipngenokoech`
+   - Repository name: `vlakit`
+   - Workflow name: `release.yml`
+   - Environment name: `pypi`
+2. In the GitHub repo, create an Environment named `pypi` (Settings → Environments).
+
+To cut a release, tag a version that matches `pyproject.toml` and push it:
+
+```bash
+git tag v0.1.0
+git push origin v0.1.0
+```
+
+The workflow builds the sdist + wheel, verifies the bundled scripts/templates are inside the wheel, checks the tag matches the package version, and publishes. After the first successful run the pending publisher becomes a normal one, and `pipx install vlakit` works for everyone.
+
+## License
+
+MIT — see [LICENSE](LICENSE).
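The `Provides-Extra` / `Requires-Dist` headers in the PKG-INFO above are what installers read when you ask for `vlakit[stats]` or `vlakit[all]`. As a rough illustration only, the sketch below groups those headers by extra using the standard-library `email` parser (core metadata uses an RFC 822-style header layout). The helper name `extras_from_metadata` is hypothetical, and the marker handling is deliberately naive: it understands only the `extra == '<name>'` form that appears in this file.

```python
from email.parser import Parser

# Trimmed copy of the PKG-INFO headers shown in the hunk above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: vlakit
Version: 0.1.0
Requires-Python: >=3.10
Provides-Extra: all
Requires-Dist: numpy; extra == 'all'
Requires-Dist: pyarrow; extra == 'all'
Requires-Dist: pyyaml; extra == 'all'
Requires-Dist: wandb; extra == 'all'
Provides-Extra: stats
Requires-Dist: numpy; extra == 'stats'
Requires-Dist: pyarrow; extra == 'stats'
Provides-Extra: wandb
Requires-Dist: wandb; extra == 'wandb'
Provides-Extra: yaml
Requires-Dist: pyyaml; extra == 'yaml'
"""


def extras_from_metadata(text):
    """Group Requires-Dist entries by the extra named in their marker (hypothetical helper)."""
    msg = Parser().parsestr(text, headersonly=True)
    extras = {name: [] for name in msg.get_all("Provides-Extra", [])}
    for requirement in msg.get_all("Requires-Dist", []):
        dist, _, marker = requirement.partition(";")
        # Naive: only handles the simple `extra == '<name>'` marker used in this file.
        if "extra ==" in marker:
            extra = marker.split("==", 1)[1].strip().strip("'\"")
            extras.setdefault(extra, []).append(dist.strip())
    return extras


EXTRAS = extras_from_metadata(PKG_INFO)
```

For anything beyond a quick look, a real marker parser (for example the `packaging` library) would be the safer choice; the string splitting here is only meant to make the header layout visible.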
vlakit-0.1.0/README.md
ADDED
@@ -0,0 +1,91 @@
+# vlakit
+
+`vlakit` is a config-driven command-line tool for launching VLA fine-tunes across **ephemeral GPU boxes**, keeping them alive through crashes, and shipping verified weights to durable storage. You describe a run in YAML and drive everything with a single `vla` command from your laptop; the actual work runs on a GPU box over ssh, and the durable artifacts land on Weights & Biases and a storage box — never on the GPU box, which you throw away.
+
+It packages a set of battle-tested shell and Python scripts (the ones that encode the hard-won operational lessons) behind a friendly CLI. The scripts ship with the install as read-only package data, while your environment — the boxes, datasets, baselines, and runs — lives in a `configs/` directory you own and edit.
+
+## Install
+
+`vlakit` is best installed as an isolated CLI with [pipx](https://pipx.pypa.io):
+
+```bash
+pipx install vlakit
+```
+
+Or with pip. The core install is dependency-light because the laptop-side commands need almost nothing; the heavier pieces are opt-in extras:
+
+```bash
+pip install vlakit # laptop-side: config / remote / launch
+pip install "vlakit[stats]" # adds numpy + pyarrow for `vla stats`
+pip install "vlakit[wandb]" # adds wandb for publish / pull / eval logging
+pip install "vlakit[all]" # everything
+```
+
+## Quickstart
+
+```bash
+vla init # scaffold an editable ./configs from templates
+# edit configs/boxes.yaml, datasets.yaml, baselines.yaml, and a runs/<name>.yaml
+# then copy configs/secrets.example.env -> configs/secrets.local.env and fill it
+
+vla config <run> # resolve + print the run config (local, no box)
+vla remote <box> deploy # rsync the toolkit + your configs onto the box
+vla remote <box> push-secrets # install ~/.secrets.env on the box (mode 600)
+vla remote <box> ensure-swap # provision swap (absorbs the checkpoint-save spike)
+vla launch <run> # launch detached + auto-resume (box read from the run cfg)
+vla remote <box> monitor # step / rate / ETA + liveness
+```
+
+## Where each command runs
+
+`vlakit` keeps a clean split between your laptop and the GPU box. Commands that resolve or inspect configuration — `init`, `config`, `stats`, `split`, `eval`, `doctor` — run entirely on your **laptop** and need no box. Commands that operate on a machine — everything under `vla remote ...`, and `vla launch` — open an ssh connection from your laptop and run the work **on the box** defined in your `boxes.yaml`.
+
+Run `vla doctor` to see exactly which scripts directory and config directory were resolved, and which optional dependencies are installed.
+
+## Commands
+
+| Command | Runs | What it does |
+|---|---|---|
+| `vla init [dir]` | laptop | Scaffolds an editable `configs/` directory from the bundled templates. |
+| `vla config <run>` | laptop | Resolves a run (defaults merged under the run) and prints the config plus the exact command, running nothing. |
+| `vla remote <box> <subcmd> [args]` | box | Runs an operational subcommand on the box: `deploy`, `push-secrets`, `ensure-swap`, `launch`, `autoresume`, `monitor`, `kill`, `rescale`, `pull`, `gpus`, `exec`, `shell`. |
+| `vla launch <run>` | box | Launches the run detached and auto-resuming; the box is read from the run's `box:` field. |
+| `vla stats [args]` | laptop/box | Computes the full dataset statistics (quantiles + image stats) that lerobot and molmo need. Requires the `[stats]` extra. |
+| `vla split [args]` | laptop | Produces a deterministic held-out episode split for validation/eval. |
+| `vla eval [args]` | laptop/box | Ranks a checkpoint by held-out error or rollout, not loss. Try `vla eval --self-test` to verify the harness with no box. |
+| `vla doctor` | laptop | Prints the resolved scripts/config directories and optional-dependency status. |
+
+## Configuration
+
+Your `configs/` directory holds everything dynamic, and no secrets ever live in it: keys resolve on the box via `~/.secrets.env`. The directory is resolved from `--config-dir`, then the `VLA_CONFIG_DIR` environment variable, then `./configs`. A run file under `runs/` is a thin recipe that names a `box`, a `dataset`, and a `baseline` — each a pointer into the corresponding registry — plus a few hyperparameters; everything else is inherited from `_defaults.yaml`.
+
+## Status
+
+The local commands (`init`, `config`, `stats`, `split`, `eval`, `doctor`) are implemented and tested. The remote commands shell out to the bundled, proven ops scripts; `vla remote <box> deploy` now ships both the toolkit and your `configs/` to the box. The `eval` offline comparator is implemented and self-tested (`vla eval --self-test`); its sim and robot rollout modes are still stubs.
+
+## Publishing (maintainers)
+
+Releases publish to PyPI automatically through [Trusted Publishing](https://docs.pypi.org/trusted-publishers/) (OIDC), so no API token is stored anywhere. The workflow is [`.github/workflows/release.yml`](.github/workflows/release.yml).
+
+One-time setup:
+
+1. On PyPI, add a **pending** Trusted Publisher (Account → Publishing) with these exact values:
+   - PyPI Project Name: `vlakit`
+   - Owner: `kkipngenokoech`
+   - Repository name: `vlakit`
+   - Workflow name: `release.yml`
+   - Environment name: `pypi`
+2. In the GitHub repo, create an Environment named `pypi` (Settings → Environments).
+
+To cut a release, tag a version that matches `pyproject.toml` and push it:
+
+```bash
+git tag v0.1.0
+git push origin v0.1.0
+```
+
+The workflow builds the sdist + wheel, verifies the bundled scripts/templates are inside the wheel, checks the tag matches the package version, and publishes. After the first successful run the pending publisher becomes a normal one, and `pipx install vlakit` works for everyone.
+
+## License
+
+MIT — see [LICENSE](LICENSE).
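The Configuration section of the README above describes a run file as a thin recipe: it names a `box`, a `dataset`, and a `baseline` (each a pointer into a registry) and inherits everything else from `_defaults.yaml`. The sketch below is one way to picture that layering in plain Python. It is not the bundled resolver, which lives in `lib/config.sh` and is not shown in this diff. The shallow merge, the helper name `resolve_run`, and every key and value in the sample data (`steps`, `batch_size`, `gpu-a`, the host and path) are assumptions made up for illustration.

```python
def resolve_run(defaults, run, registries):
    """Layer a run recipe over the defaults, then expand its registry pointers.

    Hypothetical helper: assumes a shallow merge in which run values win.
    """
    resolved = {**defaults, **run}
    for field in ("box", "dataset", "baseline"):
        name = resolved[field]
        registry = registries[field]
        if name not in registry:
            raise KeyError(f"{field} {name!r} is not defined in the {field} registry")
        # Replace the pointer with the registry entry it names.
        resolved[field] = {"name": name, **registry[name]}
    return resolved


# Made-up stand-ins for _defaults.yaml, a runs/<name>.yaml, and the three registries.
DEFAULTS = {"steps": 1000, "batch_size": 8}
RUN = {"box": "gpu-a", "dataset": "towel", "baseline": "base", "steps": 5000}
REGISTRIES = {
    "box": {"gpu-a": {"host": "gpu-a.example.com"}},
    "dataset": {"towel": {"path": "/data/towel"}},
    "baseline": {"base": {"stack": "lerobot"}},
}

RESOLVED = resolve_run(DEFAULTS, RUN, REGISTRIES)
```

Here `RESOLVED["steps"]` comes from the run file while `RESOLVED["batch_size"]` falls through from the defaults, which is the behaviour the README describes as "defaults merged under the run". To see what the real resolver produces for your own files, `vla config <run>` prints the resolved config without touching a box.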
vlakit-0.1.0/pyproject.toml
ADDED
@@ -0,0 +1,45 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[project]
+name = "vlakit"
+version = "0.1.0"
+description = "Config-driven CLI to launch, monitor, and ship VLA fine-tunes across ephemeral GPU boxes."
+readme = "README.md"
+requires-python = ">=3.10"
+license = { text = "MIT" }
+authors = [{ name = "kkipngenokoech", email = "kkipngenokoech22@gmail.com" }]
+keywords = ["vla", "fine-tuning", "robotics", "openpi", "lerobot", "molmoact"]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Environment :: Console",
+    "Intended Audience :: Science/Research",
+    "Operating System :: POSIX",
+]
+# Core stays light: the laptop-side commands (config/remote/launch) need nothing extra,
+# and PyYAML is optional because the bundled config reader falls back to grep/sed.
+dependencies = []
+
+[project.optional-dependencies]
+yaml = ["pyyaml"] # faster/robust config parsing (else grep/sed fallback)
+stats = ["numpy", "pyarrow"] # `vla stats` (compute_stats.py)
+wandb = ["wandb"] # publish / pull / eval logging
+all = ["pyyaml", "numpy", "pyarrow", "wandb"]
+
+[project.scripts]
+vla = "vlakit.cli:main"
+
+[project.urls]
+Homepage = "https://github.com/kkipngenokoech/vlakit"
+Issues = "https://github.com/kkipngenokoech/vlakit/issues"
+
+# src-layout: map src/vlakit -> vlakit. hatchling includes ALL files under the
+# package (the bundled scripts/ + templates/, not just .py) in the wheel by default,
+# so no force-include is needed — adding one would double-add files and fail the build.
+[tool.hatch.build.targets.wheel]
+packages = ["src/vlakit"]
+
+[tool.hatch.build.targets.sdist]
+include = ["src/vlakit", "README.md", "LICENSE"]
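The `[project.scripts]` table in the `pyproject.toml` hunk above maps the `vla` command to the object reference `vlakit.cli:main`. As a minimal sketch of what such a `module:attr` reference means, the helper below imports the module part and walks the attribute part. The name `load_object_reference` is hypothetical, and this is not how any particular installer generates its launcher script; it only shows the lookup that the reference string encodes.

```python
import importlib
from functools import reduce


def load_object_reference(ref):
    """Resolve a 'package.module:attr' reference to the object it names (hypothetical helper)."""
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attr', got {ref!r}")
    module = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'SomeClass.method'.
    return reduce(getattr, attr_path.split("."), module)


# With vlakit installed, load_object_reference("vlakit.cli:main") would return the
# function that the `vla` command calls. A stdlib target keeps the sketch runnable anywhere.
dumps = load_object_reference("json:dumps")
```

Calling `dumps({"a": 1})` then behaves exactly like `json.dumps`, since the helper returns the same function object.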
vlakit-0.1.0/src/vlakit/__init__.py
ADDED
@@ -0,0 +1,8 @@
+"""vlakit — a config-driven CLI for launching and shipping VLA fine-tunes.
+
+The heavy lifting lives in battle-tested shell + Python scripts shipped as package
+data under `vlakit/scripts/`; this package's `cli` module is the friendly `vla`
+front door that resolves your config dir and dispatches to them.
+"""
+
+__version__ = "0.1.0"
vlakit-0.1.0/src/vlakit/cli.py
ADDED
@@ -0,0 +1,199 @@
+"""vla — the command-line front door for vlakit.
+
+This module resolves two things and then hands off to the shipped scripts:
+
+  * the SCRIPTS dir — the proven shell/Python toolkit, bundled as package data
+    (vlakit/scripts/); exported to the scripts as FT_ROOT.
+  * the CONFIG dir — YOUR editable configs (boxes/datasets/baselines/runs);
+    resolved from --config-dir, then $VLA_CONFIG_DIR, then
+    ./configs, and exported to the scripts as VLA_CONFIG_DIR.
+
+So the toolkit code is read-only and lives in the install, while your environment
+lives in a configs/ dir you own. `vla init` scaffolds that dir from templates.
+
+Commands that touch a GPU box (remote/launch/deploy/...) shell out to the bundled
+ops/remote.sh, which ssh's to the box defined in your boxes.yaml. Commands that run
+locally (config/stats/split/eval/init) do not need a box.
+"""
+import argparse
+import os
+import shutil
+import subprocess
+import sys
+from contextlib import ExitStack
+from importlib.resources import files, as_file
+
+from . import __version__
+
+
+def _scripts_root(stack):
+    """Return a real filesystem path to the bundled scripts/ dir.
+
+    Uses importlib.resources so it works whether installed as a wheel, an editable
+    install, or run straight from the source tree. `stack` is an ExitStack that keeps
+    any extracted temp dir alive for the duration of the call.
+    """
+    return str(stack.enter_context(as_file(files("vlakit") / "scripts")))
+
+
+def _templates_configs(stack):
+    return str(stack.enter_context(as_file(files("vlakit") / "templates" / "configs")))
+
+
+def _config_dir(args, *, required=True):
+    """Resolve the user's config dir: --config-dir, then $VLA_CONFIG_DIR, then ./configs."""
+    cand = args.config_dir or os.environ.get("VLA_CONFIG_DIR") or os.path.join(os.getcwd(), "configs")
+    if required and not os.path.isdir(cand):
+        sys.exit(f"config dir not found: {cand}\n run `vla init` to scaffold one, "
+                 f"or pass --config-dir / set VLA_CONFIG_DIR")
+    return cand
+
+
+def _env(scripts, config_dir=None):
+    """Build the environment the scripts expect: FT_ROOT + (optionally) VLA_CONFIG_DIR."""
+    env = dict(os.environ)
+    env["FT_ROOT"] = scripts
+    if config_dir:
+        env["VLA_CONFIG_DIR"] = config_dir
+    return env
+
+
+def _run(cmd, env):
+    """Run a subprocess, echo it, and propagate its exit code."""
+    print(f"# $ {' '.join(cmd)}", file=sys.stderr)
+    return subprocess.run(cmd, env=env).returncode
+
+
+# --------------------------------------------------------------------------- #
+# commands
+# --------------------------------------------------------------------------- #
+def cmd_version(args):
+    print(f"vlakit {__version__}")
+    return 0
+
+
+def cmd_init(args):
+    """Scaffold an editable configs/ dir from the bundled templates."""
+    dest = os.path.abspath(args.dir or os.path.join(os.getcwd(), "configs"))
+    if os.path.exists(dest) and not args.force:
+        sys.exit(f"{dest} already exists — pass --force to overwrite")
+    with ExitStack() as stack:
+        src = _templates_configs(stack)
+        if os.path.exists(dest) and args.force:
+            shutil.rmtree(dest)
+        shutil.copytree(src, dest)
+    print(f"scaffolded configs -> {dest}")
+    print("next: edit boxes.yaml / datasets.yaml / baselines.yaml / runs/, then "
+          "copy secrets.example.env -> secrets.local.env and fill it.")
+    return 0
+
+
+def cmd_config(args):
+    """Resolve a run config (defaults merged under the run) and print it. Runs locally."""
+    with ExitStack() as stack:
+        scripts = _scripts_root(stack)
+        cfg = _config_dir(args)
+        # source the resolver and print — exactly what ops/launch.sh shows in dry-run,
+        # without needing a box or ssh.
+        snippet = ('source "$FT_ROOT/lib/config.sh"; '
+                   'load_run_config "$1" && print_run_config')
+        return _run(["bash", "-c", snippet, "_", args.run], _env(scripts, cfg))
+
+
+def cmd_remote(args):
+    """Pass through to ops/remote.sh <box> <subcmd> [args...]."""
+    with ExitStack() as stack:
+        scripts = _scripts_root(stack)
+        cfg = _config_dir(args)
+        return _run(["bash", os.path.join(scripts, "ops", "remote.sh"),
+                     args.box, args.subcmd, *args.extra], _env(scripts, cfg))
+
+
+def cmd_launch(args):
+    """Launch detached + auto-resume; the box is read from the run's config (auto)."""
+    with ExitStack() as stack:
+        scripts = _scripts_root(stack)
+        cfg = _config_dir(args)
+        return _run(["bash", os.path.join(scripts, "ops", "remote.sh"),
+                     "auto", "autoresume", args.run], _env(scripts, cfg))
+
+
+def _run_pyscript(args, rel, passthrough):
+    with ExitStack() as stack:
+        scripts = _scripts_root(stack)
+        return _run([sys.executable, os.path.join(scripts, *rel), *passthrough],
+                    _env(scripts))
+
+
+def cmd_stats(args):
+    """data/compute_stats.py — needs the [stats] extra (numpy + pyarrow)."""
+    return _run_pyscript(args, ("data", "compute_stats.py"), args.extra)
+
+
+def cmd_split(args):
+    """data/heldout_split.py — deterministic held-out episode ids (stdlib only)."""
+    return _run_pyscript(args, ("data", "heldout_split.py"), args.extra)
+
+
+def cmd_eval(args):
+    """eval/rollout.py — offline comparator / rollout (use --self-test to verify the harness)."""
+    return _run_pyscript(args, ("eval", "rollout.py"), args.extra)
+
+
+def cmd_doctor(args):
+    """Print what vlakit resolved, so you can see where scripts/configs are coming from."""
+    with ExitStack() as stack:
+        scripts = _scripts_root(stack)
+        print(f"vlakit {__version__}")
+        print(f"python {sys.version.split()[0]}")
+        print(f"scripts (FT_ROOT) {scripts}")
+        cfg = _config_dir(args, required=False)
+        print(f"config dir {cfg} {'(exists)' if os.path.isdir(cfg) else '(MISSING — run `vla init`)'}")
+        for mod in ("yaml", "numpy", "pyarrow", "wandb"):
+            try:
+                __import__(mod)
+                print(f" {mod:8} ok")
+            except ImportError:
+                print(f" {mod:8} not installed")
+    return 0
+
+
+def main(argv=None):
+    p = argparse.ArgumentParser(prog="vla", description=__doc__,
+                                formatter_class=argparse.RawDescriptionHelpFormatter)
+    p.add_argument("--config-dir", help="path to your configs/ (else $VLA_CONFIG_DIR, else ./configs)")
+    sub = p.add_subparsers(dest="cmd", required=True)
+
+    sub.add_parser("version", help="print the vlakit version").set_defaults(fn=cmd_version)
+    sub.add_parser("doctor", help="show resolved scripts/config dirs + optional deps").set_defaults(fn=cmd_doctor)
+
+    sp = sub.add_parser("init", help="scaffold a configs/ dir from templates")
+    sp.add_argument("dir", nargs="?", help="target dir (default ./configs)")
+    sp.add_argument("--force", action="store_true", help="overwrite an existing dir")
+    sp.set_defaults(fn=cmd_init)
+
+    sp = sub.add_parser("config", help="resolve + print a run config (local, no box)")
+    sp.add_argument("run", help="run name (a file in configs/runs/)")
+    sp.set_defaults(fn=cmd_config)
+
+    sp = sub.add_parser("remote", help="run an ops subcommand on a box (deploy/monitor/gpus/...)")
+    sp.add_argument("box", help="box alias from boxes.yaml, or 'auto'")
+    sp.add_argument("subcmd", help="deploy|push-secrets|ensure-swap|launch|autoresume|monitor|kill|rescale|pull|gpus|exec|shell")
+    sp.set_defaults(fn=cmd_remote)
+
+    sub.add_parser("launch", help="launch a run detached + auto-resume (box from the run cfg)").add_argument("run", help="run name")
+    sub.choices["launch"].set_defaults(fn=cmd_launch)
+
+    sub.add_parser("stats", help="compute full dataset stats (needs [stats] extra)").set_defaults(fn=cmd_stats)
+    sub.add_parser("split", help="deterministic held-out episode split").set_defaults(fn=cmd_split)
+    sub.add_parser("eval", help="held-out comparator / rollout (try: vla eval --self-test)").set_defaults(fn=cmd_eval)
+
+    # parse_known_args so unknown flags (e.g. `eval --self-test`, `split --episodes N`) forward
+    # cleanly to the wrapped script — argparse.REMAINDER drops a *leading* option.
+    args, extra = p.parse_known_args(argv)
+    args.extra = extra
+    return args.fn(args)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
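The comment near the end of `main()` in `cli.py` explains that `parse_known_args` is used so that flags the wrapper does not know about are forwarded to the wrapped script. The sketch below reproduces that pattern in isolation with a cut-down parser, so the forwarding behaviour can be checked without installing anything. The `demo` program name, the `--ckpt` flag, and the sample argument values are made up for illustration; only `--config-dir`, `eval`, and `--self-test` come from the source.

```python
import argparse


def build_parser():
    """A cut-down stand-in for the `vla` parser: one global option, one subcommand."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--config-dir")
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("eval")  # declares no options of its own
    return parser


def parse(argv):
    """Return (known args, leftover tokens to forward to the wrapped script)."""
    return build_parser().parse_known_args(argv)


# Flags the subcommand does not declare end up, in order, in the leftover list.
ARGS, EXTRA = parse(["--config-dir", "cfg", "eval", "--self-test", "--ckpt", "x"])

# A global option written AFTER the subcommand is not seen by the top-level parser:
# it is treated as unknown and forwarded along with everything else.
LATE_ARGS, LATE_EXTRA = parse(["eval", "--config-dir", "cfg"])
```

In the first call `EXTRA` is `["--self-test", "--ckpt", "x"]` and `ARGS.config_dir` is `"cfg"`. In the second call `LATE_ARGS.config_dir` stays `None` and `--config-dir cfg` lands in `LATE_EXTRA`. Since `cli.py` declares `--config-dir` on the top-level parser in the same way, it seems safest to put that option before the subcommand when invoking `vla`.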
vlakit-0.1.0/src/vlakit/scripts/data/README.md
ADDED
@@ -0,0 +1,45 @@
+# data/ — dataset conversion + stats
+
+## The v2.1 / v3.0 split (ui.md §1.4)
+| Stack | Format it reads |
+|---|---|
+| **openpi** | **v2.1** directly — no conversion, no extra stats step |
+| **lerobot** | **v3.0** + full stats |
+| **molmo** | **v3.0** + full stats |
+
+Convert **in place** — do NOT make a second full copy (towel_300h is ~2.6 TB).
+
+## Why a custom stats step (E6) — the important part
+lerobot's `gen_episode_stats.py` only computes **mean/std/min/max**. But lerobot &
+molmo read, at launch:
+- **quantiles q01–q99** (every percentile), and
+- **per-image-key stats** (channel mean/std for each camera/observation image key).
+
+Missing either => **KeyError at launch**. So [`compute_stats.py`](compute_stats.py) computes the FULL
+set and validates it before you launch. Performance: read parquet **columns with
+pyarrow** (numpy buffers); the naive `.to_pylist()` path was ~100× too slow (E6).
+
+## Pipeline
+```bash
+# lerobot/molmo only (openpi skips this entirely):
+./convert_v21_to_v30.sh /path/to/dataset # in place, then chains compute_stats.py
+# or stats alone:
+python compute_stats.py --dataset /path/to/dataset
+```
+
+## Files
+
+- **convert_v21_to_v30.sh** — convert a LeRobot dataset v2.1→v3.0 in place, then chain stats. [deep dive →](../docs/reference/data.md#convert_v21_to_v30sh)
+- **compute_stats.py** — compute the full stats set (quantiles q01–q99 + image stats) and validate before launch. [deep dive →](../docs/reference/data.md#compute_statspy)
+- **heldout_split.py** — pick a deterministic, strided held-out episode set for validation/eval (no native val split exists since E20). Used by [`../eval/rollout.py`](../eval/rollout.py) offline mode and any in-training val. [deep dive →](../docs/reference/data.md#heldout_splitpy)
+
+## Honesty: what's verified vs template
+- **Verified from ui.md:** the v2.1/v3.0 split, in-place conversion, the requirement
+  for quantiles + image stats, the pyarrow-not-`to_pylist` performance lesson, and
+  validating stats before launch.
+- **TEMPLATE / TODO (need the box's lerobot version):**
+  - [`convert_v21_to_v30.sh`](convert_v21_to_v30.sh) — the exact upstream converter entrypoint (`UPSTREAM_CONVERT`).
+  - [`compute_stats.py`](compute_stats.py) — exact stats file path/format, the low-dim column names, and
+    the image-key decode/sampling path (`compute_image_stats` returns the right
+    SHAPE but placeholder values; wire it to the real frame decode).
+  These are clearly marked `TODO` in the files.
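The data README above says that mean/std/min/max alone are not enough, because a missing quantile key surfaces as a `KeyError` at launch, and that the stats should therefore be validated beforehand. The sketch below illustrates that idea for a single low-dimensional column using only the standard library. It is not `compute_stats.py`: the real script reads parquet columns with pyarrow and also handles image stats, and its stats file format is marked as a TODO in the README. The key names `q01` to `q99`, the choice of the "inclusive" quantile method, and both helper names are assumptions.

```python
import statistics

QUANTILE_KEYS = [f"q{i:02d}" for i in range(1, 100)]  # q01 .. q99 (assumed key naming)
REQUIRED_KEYS = ["mean", "std", "min", "max", *QUANTILE_KEYS]


def column_stats(values):
    """Compute mean/std/min/max plus every percentile for one column of numbers."""
    # n=100 yields 99 cut points, one per percentile from the 1st to the 99th.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    stats = {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }
    stats.update(zip(QUANTILE_KEYS, cuts))
    return stats


def validate_stats(stats_by_feature):
    """Return the missing '<feature>.<key>' entries; an empty list means nothing is missing."""
    return [
        f"{feature}.{key}"
        for feature, stats in stats_by_feature.items()
        for key in REQUIRED_KEYS
        if key not in stats
    ]


FULL = {"action": column_stats(list(range(101)))}
PARTIAL = {"action": {"mean": 0.0, "std": 1.0, "min": 0.0, "max": 1.0}}
```

`validate_stats(FULL)` returns an empty list, while `validate_stats(PARTIAL)` lists all 99 missing quantile keys, which is the kind of failure the README says would otherwise only appear at launch. Running a check like this before launch turns a late `KeyError` into an early, readable report.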