SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational costs and can be deployed on consumer-grade hardware.
This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.
For a complete walkthrough, see the training guide. The short version of how to train the policy and run inference/eval follows.
Training writes checkpoints to `outputs/train/ /checkpoints/`.
For evaluation, prefix the dataset repo name with `eval_` and supply `--policy.path` pointing to a local or Hub checkpoint.
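
The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of LeRobot: the helper name `eval_repo_id` is hypothetical, and it assumes the repo id has the form `<hf_user>/<dataset>` with the `eval_` prefix applied to the dataset name rather than to the user name.

```python
def eval_repo_id(dataset_repo_id: str) -> str:
    """Return the repo id with `eval_` prefixed to the dataset name.

    Hypothetical helper: assumes ids look like "<hf_user>/<dataset>"
    (or a bare "<dataset>") and that the prefix goes on the dataset part.
    """
    # Split on the last "/" so only the dataset name is modified.
    owner, sep, name = dataset_repo_id.rpartition("/")
    if not name:
        raise ValueError(f"missing dataset name in repo id: {dataset_repo_id!r}")
    # Leave ids that already follow the convention untouched.
    if name.startswith("eval_"):
        return dataset_repo_id
    return f"{owner}{sep}eval_{name}"


if __name__ == "__main__":
    # "alice/pick_cube" is a made-up repo id used only for illustration.
    print(eval_repo_id("alice/pick_cube"))
```

The resulting string is what would be passed as the evaluation dataset repo, alongside `--policy.path` for the checkpoint.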