jadechoghari
Ferret-UI-Gemma2b
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it is capable of executing complex UI tasks. This is the Gemma-2B version of Ferret-UI. It follows from this paper by Apple. You will first need to download `builder.py`, `conversation.py`, `inference.py`, `modelUI.py`, and `mmutils.py` locally.
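A minimal sketch of fetching those helper files programmatically with `huggingface_hub` (the repo id and file names are taken from this card; the actual download only runs when the file is executed as a script):

```python
REPO_ID = "jadechoghari/Ferret-UI-Gemma2b"
HELPER_FILES = ["builder.py", "conversation.py", "inference.py", "modelUI.py", "mmutils.py"]

def fetch_helpers(repo_id=REPO_ID, filenames=HELPER_FILES):
    """Download each helper module from the Hub and return its local cached path."""
    # Imported lazily so the file list can be inspected without the dependency.
    from huggingface_hub import hf_hub_download
    return [hf_hub_download(repo_id=repo_id, filename=name) for name in filenames]

if __name__ == "__main__":
    for path in fetch_helpers():
        print(path)
```

Equivalently, you can download the five files by hand from the repo's "Files" tab and place them next to your inference script.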
mar
RT-DETRv2
robustsam-vit-large
Ferret-UI-Llama8b
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it is capable of executing complex UI tasks. This is the Llama-3-8B version of Ferret-UI. It follows from this paper by Apple. You will first need to download `builder.py`, `conversation.py`, `inference.py`, `modelUI.py`, and `mmutils.py` locally.
LongVU_Qwen2_7B
vfusion3d
smolvla_metaworld
SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational cost and can be deployed on consumer-grade hardware. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
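The train/eval commands stripped from this card might look like the following LeRobot CLI sketch. `--policy.path` comes from the card itself; the script paths, the other flags, and all `<...>` placeholders are assumptions based on the LeRobot CLI and should be adapted to your install:

```shell
# Train (sketch; dataset and run names are placeholders):
python lerobot/scripts/train.py \
  --policy.type=smolvla \
  --dataset.repo_id=<user>/<dataset> \
  --output_dir=outputs/train/<run_name>

# Evaluate: point --policy.path at a local or Hub checkpoint,
# using an eval_-prefixed dataset repo as described above.
python lerobot/scripts/eval.py \
  --policy.path=outputs/train/<run_name>/checkpoints/last/pretrained_model
```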
VoiceRestore
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss. It is based on this repo & demo of audio restorations: VoiceRestore.

- Universal Restoration: The model can handle any level and type of voice recording degradation. Pure magic.
- Easy to Use: Simple interface for processing degraded audio files.
- Pretrained Model: Includes a 301-million-parameter transformer model with pre-trained weights. (The model is still training; there will be further checkpoint updates.)
- Architecture: Flow-matching transformer
- Parameters: 300M+
- Input: Degraded speech audio (various formats supported)
- Output: Restored speech

Limitations:

- The current model is optimized for speech; it may not perform optimally on music or other audio types.
- Research is ongoing to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.

If you use VoiceRestore in your research, please cite our paper. This project is licensed under the MIT License; see the LICENSE file for details. Based on the E2-TTS implementation by Lucidrains; special thanks to the open-source community for their invaluable contributions.
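To make the degradation types above concrete, here is a small, self-contained sketch (not part of VoiceRestore, and purely illustrative) that simulates two of them, additive background noise and clipping distortion, on a synthetic waveform:

```python
import numpy as np

def degrade(waveform, noise_std=0.05, clip_level=0.5, seed=0):
    """Simulate two degradations named on this card:
    additive background noise and clipping distortion."""
    rng = np.random.default_rng(seed)
    noisy = waveform + rng.normal(0.0, noise_std, size=waveform.shape)
    return np.clip(noisy, -clip_level, clip_level)

# A one-second synthetic tone at 16 kHz standing in for a voice recording.
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
clean = 0.8 * np.sin(2 * np.pi * 220.0 * t)
degraded = degrade(clean)
```

A restoration model's job is the inverse mapping: recovering something close to `clean` given only `degraded`.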
openmusic
Official Hugging Face Diffusers implementation of QA-MDT: Quality-Aware Diffusion for Text-to-Music 🎶 QA-MDT brings a new approach to text-to-music generation by using quality-aware training to tackle issues like low-fidelity audio and weak labeling in datasets. With a masked diffusion transformer (MDT), QA-MDT delivers SOTA results on MusicCaps and Song-Describer, enhancing both quality and musicality. It follows from this paper by the University of Science and Technology of China, authored by @changli et al. Usage: after downloading, rename the model folder from `openmusic` to `qamdt`.
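The stripped "Usage" command was most likely a plain folder rename; a sketch, assuming the repo has already been cloned or downloaded into `./openmusic`:

```shell
# After cloning/downloading the model repo into ./openmusic,
# rename it so the loader finds it under the expected name:
mv openmusic qamdt
```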
robustsam-vit-base
VidToMe
MODEL_20k
pi05-libero-10-256-quantiles_no_ki
xvla-thor
test-lerobot-act
Action Chunking with Transformers (ACT) is an imitation-learning method that predicts short action chunks instead of single steps. It learns from teleoperated data and often achieves high success rates. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
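The train/eval commands stripped from this card might look like the following LeRobot CLI sketch. `--policy.path` comes from the card itself; the script paths, the other flags, and all `<...>` placeholders are assumptions based on the LeRobot CLI and should be adapted to your install:

```shell
# Train an ACT policy (sketch; dataset and run names are placeholders):
python lerobot/scripts/train.py \
  --policy.type=act \
  --dataset.repo_id=<user>/<dataset> \
  --output_dir=outputs/train/<run_name>

# Evaluate: point --policy.path at a local or Hub checkpoint,
# using an eval_-prefixed dataset repo as described above.
python lerobot/scripts/eval.py \
  --policy.path=outputs/train/<run_name>/checkpoints/last/pretrained_model
```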
textnet-base
pusht_pipe
pifast-80k
act-so101-phone
act-so101-ee
spad
act_pipe
Action Chunking with Transformers (ACT) is an imitation-learning method that predicts short action chunks instead of single steps. It learns from teleoperated data and often achieves high success rates. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.
smolvla-new-libero
aya-23-8B-quantized
dot_pusht_keypoints_best
dot_pusht_images
smolvla-libero-ckpts
SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational cost and can be deployed on consumer-grade hardware. This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs and, for a complete walkthrough, the training guide. The short version: training writes checkpoints to `outputs/train/<run_name>/checkpoints/`; to evaluate, prefix the dataset repo id with `eval_` and pass `--policy.path` pointing to a local or Hub checkpoint.