physical-intelligence

# FAST: Efficient Action Tokenization for Vision-Language-Action Models

This is the official repo for the FAST action tokenizer. The action tokenizer maps any sequence of robot actions into a sequence of dense, discrete action tokens for training autoregressive VLA models.

Here, we provide:

1. FAST+, our universal action tokenizer, trained on 1M real robot action sequences.
2. Code for quickly training new action tokenizers on your custom dataset.

FAST can be used as a convenient HuggingFace `AutoProcessor`. To use it, simply install the `transformers` package (and `scipy` for the underlying DCT algorithm). We recommend applying the tokenizer to 1-second action "chunks" that have been pre-normalized to a range of [-1, 1] (we use quantile normalization for this step; see our paper for details). Encoding and decoding support batched inference.

Note: during decoding, the tokenizer needs to map the decoded sequence of actions back into a `[time_horizon, action_dim]` matrix. There are two ways to provide the necessary dimensions to the tokenizer: (1) they are automatically saved on the first `forward()` call, or (2) you can set them manually as arguments to the `decode()` call.

In our experiments, we found the FAST+ universal tokenizer to work well across a wide range of robot setups, action dimensions, and control frequencies. If, however, you want to train a custom FAST tokenizer for the dataset at hand, that is easy to do with the `.fit()` convenience function we provide. When called on a dataset of action chunks (of the same or different lengths), it returns a new tokenizer instance, which you can save and optionally push to the HuggingFace hub. Training should typically take only a few seconds to minutes.
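The workflow described above might be sketched as follows. The hub id `physical-intelligence/fast`, the `time_horizon`/`action_dim` keyword names, and the chunk shapes are assumptions taken from this card, not verified here; the code is wrapped in a function because loading the processor requires hub access.

```python
# Sketch (not verified) of the FAST+ AutoProcessor workflow described above.
# Requires `transformers` and `scipy`, plus network access to the hub, so it
# is wrapped in a function rather than executed at import time.
import numpy as np

def fast_workflow_sketch():
    from transformers import AutoProcessor

    # Load the universal FAST+ tokenizer (custom code lives on the hub).
    tokenizer = AutoProcessor.from_pretrained(
        "physical-intelligence/fast", trust_remote_code=True
    )

    # A batch of 1-second action chunks, pre-normalized to [-1, 1]:
    # 8 chunks, 50 timesteps each, 7 action dimensions (illustrative shapes).
    chunks = np.random.uniform(-1.0, 1.0, size=(8, 50, 7))

    # Encode to discrete token sequences; decoding needs the chunk
    # dimensions, passed here explicitly.
    tokens = tokenizer(chunks)
    actions = tokenizer.decode(tokens, time_horizon=50, action_dim=7)

    # Optionally fit a custom tokenizer on your own chunks and save it.
    custom = tokenizer.fit(chunks)
    custom.save_pretrained("./my_fast_tokenizer")
    return actions
```

Since the decoded dimensions are also cached on the first `forward()` call, the explicit `time_horizon`/`action_dim` arguments are only strictly needed on a freshly loaded tokenizer.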

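The compression idea behind the tokenizer (a discrete cosine transform over time followed by quantization, consistent with the `scipy` DCT dependency noted above) can be illustrated with a self-contained round trip. This is not the official implementation (which also compresses the quantized coefficients further, e.g. with BPE); the `SCALE` constant and shapes are illustrative choices.

```python
# Minimal, self-contained illustration of DCT-based action compression in the
# spirit of FAST: DCT an action chunk over time, quantize the coefficients to
# discrete integer tokens, then invert. NOT the official tokenizer.
import numpy as np
from scipy.fft import dct, idct

SCALE = 32.0  # quantization granularity (illustrative)

def encode(chunk: np.ndarray) -> np.ndarray:
    """chunk: [time_horizon, action_dim] in [-1, 1] -> flat integer tokens."""
    coeffs = dct(chunk, axis=0, norm="ortho")  # per-dimension DCT over time
    return np.round(coeffs * SCALE).astype(np.int64).ravel()

def decode(tokens: np.ndarray, time_horizon: int, action_dim: int) -> np.ndarray:
    """Flat tokens -> [time_horizon, action_dim] actions (lossy round trip).
    As in the README, the dimensions must be supplied to reshape the flat
    token sequence back into a matrix."""
    coeffs = tokens.reshape(time_horizon, action_dim) / SCALE
    return idct(coeffs, axis=0, norm="ortho")

rng = np.random.default_rng(0)
chunk = np.clip(rng.normal(0.0, 0.3, size=(50, 7)), -1.0, 1.0)
tokens = encode(chunk)
recon = decode(tokens, time_horizon=50, action_dim=7)
err = float(np.max(np.abs(recon - chunk)))  # small quantization error
```

Smooth action trajectories concentrate energy in the low-frequency DCT coefficients, which is what makes the quantized representation dense compared with per-timestep binning.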
license:apache-2.0