👉 Ctrl-World: A Controllable Generative World Model for Robot Manipulation

Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn
*Equal contribution; Stanford University, Tsinghua University

TL;DR: Ctrl-World is an action-conditioned world model that is compatible with modern VLA policies. It enables policy-in-the-loop rollouts entirely in imagination, which can be used to evaluate and improve the instruction-following ability of VLA policies.

Model Details: This repo includes the Ctrl-World model checkpoint trained on the open-source DROID dataset (~95k trajectories, 564 scenes). The DROID platform consists of a Franka Panda robotic arm equipped with a Robotiq gripper and three cameras: two randomly placed third-person cameras and one wrist-mounted camera.

Usage

See the official Ctrl-World GitHub repo for detailed usage.

Ctrl-World is developed from the open-source video foundation model Stable-Video-Diffusion. The VLA model used in this repo is from openpi. We thank the authors for their efforts!

Bibtex

If you find our work helpful, please leave us a star and cite our paper. Thank you!