VLASH Training
Fine-tuning a state-of-the-art robot policy such as π₀.₅ on your own task requires multi-GPU distributed training infrastructure, and no such infrastructure has been publicly released for the VLASH framework. This project removes that barrier.
It packages VLASH — a VLA fine-tuning and deployment framework built on LeRobot — into a single container image that runs on a cloud VM, an HPC cluster, or a local workstation with one command. You bring your robot demonstrations; the pipeline handles the rest.
Workflow
1. Collect demonstrations on your robot → upload to HuggingFace Hub
2. Run: ./scripts/train.sh config.yaml → fine-tuned checkpoint on HF Hub
3. Load checkpoint on your inference hardware → deploy
What VLASH adds over standard LeRobot
| Feature | Description |
|---|---|
| Asynchronous inference | Temporal Delay Augmentation — 29.5× lower latency on embedded hardware |
| Memory-efficient fine-tuning | LoRA + shared observation encoding — fine-tune π₀.₅ on 12 GB VRAM |
| Distributed training | DeepSpeed ZeRO-2 (LoRA) and FSDP (full fine-tuning) |
| Portable containers | Same Docker/Singularity image on AWS, GCP, NSCC ASPIRE, or a local GPU |
Validated end-to-end
The pipeline is validated on a ball pick-and-place task using a Piper arm with inference on a Jetson AGX Orin at 30 Hz:
- 65% task success rate with async inference vs. 5% for the synchronous baseline
- 29.5× latency reduction: 184.5 ms (async) vs. 5444.1 ms (synchronous) per inference call
- 2.9× faster task completion: 52 s (async) vs. 153 s (synchronous)
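The two ratios quoted above follow directly from the raw measurements. As a quick sanity check, the sketch below recomputes them with `awk`; the `speedup` helper is a hypothetical name introduced here for illustration and is not part of the project's scripts.

```shell
# Hypothetical helper (not part of this project): prints BASELINE / IMPROVED
# rounded to one decimal place.
speedup() {
  awk -v baseline="$1" -v improved="$2" 'BEGIN { printf "%.1f\n", baseline / improved }'
}

speedup 5444.1 184.5   # latency per inference call, in ms
speedup 153 52         # task completion time, in s
```

The first call prints `29.5` and the second prints `2.9`, matching the reported figures.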
Quick start
# 1. Set required environment variables
export SCRATCH=/your/persistent/storage # mounted disk
export HF_TOKEN=hf_xxx # HuggingFace token
export DATASET_REPO_ID=your-org/your-dataset
# 2. Pull the container
singularity pull vlash.sif docker://frieddeli/vlash-forge:latest
# or: docker pull frieddeli/vlash-forge:latest
# 3. Train
./scripts/train.sh examples/train/pi05/cloud.yaml
See First-time Setup for prerequisites.
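Because training depends on the three environment variables from step 1, it can help to check them before launching a long job. The following is a minimal sketch of such a preflight check. The `check_env` function is an assumption made for illustration and is not something the project ships; only the variable names come from the quick start above.

```shell
# Hypothetical preflight helper (not part of this project's scripts).
# Reports each required variable that is unset or empty and returns
# non-zero if any is missing.
check_env() {
  missing=0
  for name in SCRATCH HF_TOKEN DATASET_REPO_ID; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it is `check_env && ./scripts/train.sh examples/train/pi05/cloud.yaml`, so that training starts only when all three variables are set.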