# Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

✅ Real-time speech-to-speech conversational capabilities; no extra ASR or TTS models required.
✅ Talking while thinking: text and audio are generated at the same time.
✅ "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.

NOTE: please refer to the code repository for more details.

Create a new conda environment and install the required packages.

NOTE: you need to run Streamlit locally, with PyAudio installed.

NOTE: you need to unmute first. Gradio does not seem to play audio streams instantly, so the latency feels a bit longer.

## Acknowledgements

- Qwen2 as the LLM backbone.
- litGPT for training and inference.
- whisper for audio encoding.
- snac for audio decoding.
- CosyVoice for generating synthetic speech.
- OpenOrca and MOSS for alignment.
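The environment setup mentioned above can be sketched as follows. This is an assumed sequence: the environment name `omni`, the Python version, and the `requirements.txt` filename are placeholders, so check the code repository for the exact commands.

```shell
# Sketch of the setup steps; names below are assumptions, not taken from the repo.
conda create -n omni python=3.10 -y   # "omni" is a placeholder env name
conda activate omni
pip install -r requirements.txt       # assumes a requirements.txt at the repo root
```

After installing, the Streamlit demo additionally needs PyAudio available locally, as noted below.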