jadechoghari
Ferret-UI-Gemma2b
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it is capable of executing complex UI tasks. This is the Gemma-2B version of Ferret-UI. It follows from this paper by Apple. You will first need to download `builder.py`, `conversation.py`, `inference.py`, `modelUI.py`, and `mmutils.py` locally.
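A minimal sketch of fetching those helper files programmatically with `huggingface_hub` (the repo id and file names are taken from this card; the actual download only runs when the file is executed as a script):

```python
REPO_ID = "jadechoghari/Ferret-UI-Gemma2b"
HELPER_FILES = ["builder.py", "conversation.py", "inference.py", "modelUI.py", "mmutils.py"]

def fetch_helpers(repo_id=REPO_ID, filenames=HELPER_FILES):
    """Download each helper module from the Hub and return its local cached path."""
    # Imported lazily so the file list can be inspected without the dependency.
    from huggingface_hub import hf_hub_download
    return [hf_hub_download(repo_id=repo_id, filename=name) for name in filenames]

if __name__ == "__main__":
    for path in fetch_helpers():
        print(path)
```

Equivalently, you can download the five files by hand from the repo's "Files" tab and place them next to your inference script.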
mar
RT-DETRv2
robustsam-vit-large
Ferret-UI-Llama8b
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it is capable of executing complex UI tasks. This is the Llama-3-8B version of Ferret-UI. It follows from this paper by Apple. You will first need to download `builder.py`, `conversation.py`, `inference.py`, `modelUI.py`, and `mmutils.py` locally.
LongVU_Qwen2_7B
vfusion3d
smolvla_metaworld
SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational cost and can be deployed on consumer-grade hardware. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
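The train/eval commands stripped from this card might look like the following LeRobot CLI sketch. `--policy.path` comes from the card itself; the script paths, the other flags, and all `<...>` placeholders are assumptions based on the LeRobot CLI and should be adapted to your install:

```shell
# Train (sketch; dataset and run names are placeholders):
python lerobot/scripts/train.py \
  --policy.type=smolvla \
  --dataset.repo_id=<user>/<dataset> \
  --output_dir=outputs/train/<run_name>

# Evaluate: point --policy.path at a local or Hub checkpoint,
# using an eval_-prefixed dataset repo as described above.
python lerobot/scripts/eval.py \
  --policy.path=outputs/train/<run_name>/checkpoints/last/pretrained_model
```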
VoiceRestore
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss. It is based on this repo & demo of audio restorations: VoiceRestore.

- Universal Restoration: The model can handle any level and type of voice recording degradation. Pure magic.
- Easy to Use: Simple interface for processing degraded audio files.
- Pretrained Model: Includes a 301-million-parameter transformer model with pre-trained weights. (The model is still training; there will be further checkpoint updates.)
- Architecture: Flow-matching transformer
- Parameters: 300M+
- Input: Degraded speech audio (various formats supported)
- Output: Restored speech

Limitations:

- The current model is optimized for speech; it may not perform optimally on music or other audio types.
- Research is ongoing to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.

If you use VoiceRestore in your research, please cite our paper. This project is licensed under the MIT License; see the LICENSE file for details. Based on the E2-TTS implementation by Lucidrains; special thanks to the open-source community for their invaluable contributions.
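To make the degradation types above concrete, here is a small, self-contained sketch (not part of VoiceRestore, and purely illustrative) that simulates two of them, additive background noise and clipping distortion, on a synthetic waveform:

```python
import numpy as np

def degrade(waveform, noise_std=0.05, clip_level=0.5, seed=0):
    """Simulate two degradations named on this card:
    additive background noise and clipping distortion."""
    rng = np.random.default_rng(seed)
    noisy = waveform + rng.normal(0.0, noise_std, size=waveform.shape)
    return np.clip(noisy, -clip_level, clip_level)

# A one-second synthetic tone at 16 kHz standing in for a voice recording.
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
clean = 0.8 * np.sin(2 * np.pi * 220.0 * t)
degraded = degrade(clean)
```

A restoration model's job is the inverse mapping: recovering something close to `clean` given only `degraded`.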
openmusic
Official Hugging Face Diffusers implementation of QA-MDT: Quality-Aware Diffusion for Text-to-Music 🎶 QA-MDT brings a new approach to text-to-music generation by using quality-aware training to tackle issues like low-fidelity audio and weak labeling in datasets. With a masked diffusion transformer (MDT), QA-MDT delivers SOTA results on MusicCaps and Song-Describer, enhancing both quality and musicality. It follows from this paper by the University of Science and Technology of China, authored by @changli et al. Usage: after downloading, rename the model folder from `openmusic` to `qamdt`.
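The stripped "Usage" command was most likely a plain folder rename; a sketch, assuming the repo has already been cloned or downloaded into `./openmusic`:

```shell
# After cloning/downloading the model repo into ./openmusic,
# rename it so the loader finds it under the expected name:
mv openmusic qamdt
```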
robustsam-vit-base
VidToMe
MODEL_20k
pi05-libero-10-256-quantiles_no_ki
xvla-thor
test-lerobot-act
Action Chunking with Transformers (ACT) is an imitation-learning method that predicts short action chunks instead of single steps. It learns from teleoperated data and often achieves high success rates. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
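The train/eval commands stripped from this card might look like the following LeRobot CLI sketch. `--policy.path` comes from the card itself; the script paths, the other flags, and all `<...>` placeholders are assumptions based on the LeRobot CLI and should be adapted to your install:

```shell
# Train an ACT policy (sketch; dataset and run names are placeholders):
python lerobot/scripts/train.py \
  --policy.type=act \
  --dataset.repo_id=<user>/<dataset> \
  --output_dir=outputs/train/<run_name>

# Evaluate: point --policy.path at a local or Hub checkpoint,
# using an eval_-prefixed dataset repo as described above.
python lerobot/scripts/eval.py \
  --policy.path=outputs/train/<run_name>/checkpoints/last/pretrained_model
```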
textnet-base
pusht_pipe
pifast-80k
act-so101-phone
act-so101-ee
spad
act_pipe
Action Chunking with Transformers (ACT) is an imitation-learning method that predicts short action chunks instead of single steps. It learns from teleoperated data and often achieves high success rates. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
smolvla-new-libero
aya-23-8B-quantized
dot_pusht_keypoints_best
dot_pusht_images
smolvla-libero-ckpts
SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational cost and can be deployed on consumer-grade hardware. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.