# Audio-Reasoner

We implement inference scaling on Audio-Reasoner, a large audio language model, enabling deep thinking and structured chain-of-thought (CoT) reasoning for multimodal understanding and reasoning. To achieve this, we constructed CoTA, a high-quality dataset of 1.2M reasoning-rich samples built with structured CoT techniques. Audio-Reasoner achieves state-of-the-art results on the MMAU-mini (+25.42%) and AIR-Bench-Chat (+14.57%) benchmarks.

Audio-Reasoner-7B 🤗 | CoTA Dataset 🤗 (coming soon) | Paper 📑 | WeChat 💭 | Code ⚙️

## News and Updates
- 2025.03.05: ✅ Audio-Reasoner-7B checkpoint released on HuggingFace 🤗.
- 2025.03.05: ✅ Audio-Reasoner paper uploaded to arXiv 📑.
- 2025.03.04: ✅ Demos, inference code, and evaluation results released.
- 2025.03.04: ✅ Repository created.

## Roadmap
- 2025.03: 🔜 Upload the CoTA dataset to HuggingFace 🤗.
- 2025.04: 🔜 Open-source the data synthesis pipeline and training code.

## Features
- ✅ Audio-Reasoner enables deep reasoning and inference scaling on audio-based tasks; it is built on Qwen2-Audio-Instruct with structured CoT training.
- ✅ CoTA offers 1.2M high-quality captions and QA pairs across domains for structured reasoning and enhanced pretraining.
- ✅ The pretrained model and dataset cover various types of audio, including sound, music, and speech, and achieve state-of-the-art results across multiple benchmarks. Refer to our paper for details.

## FAQ
1. **What kinds of audio can Audio-Reasoner understand, and what kind of thinking does it perform?**
   Audio-Reasoner can understand various types of audio, including sound, music, and speech. It conducts in-depth thinking in four parts: planning, caption, reasoning, and summary (see the inference sketch below).
2. **Why is transformers installed after ms-swift in the environment configuration?**
   The transformers version has a significant impact on the model's performance; we have tested that `transformers==4.49.1` is one of the suitable versions. Installing ms-swift first gives a more stable environment for the subsequent transformers installation and avoids version conflicts that could affect the model's performance (see the version-check sketch below).

If you have any questions, please feel free to contact us via `[email protected]`.

## Citation
Please cite our paper if you find our model and dataset useful. Thanks!
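## Inference sketch (unofficial, for illustration)

The README states that Audio-Reasoner is built on Qwen2-Audio-Instruct and that the Audio-Reasoner-7B checkpoint is released on HuggingFace. The following is a minimal sketch of what loading and querying such a model could look like using the standard Qwen2-Audio interface in the transformers library. It is not the official pipeline: the repository id `zhifeixie/Audio-Reasoner`, the audio file path, and the assumption that the checkpoint loads with `Qwen2AudioForConditionalGeneration` are all unverified; please prefer the released inference code in the project repository.

```python
# Minimal inference sketch (NOT the official pipeline). Assumptions:
#   - the checkpoint loads via the standard Qwen2-Audio classes in transformers,
#     since the README says Audio-Reasoner is built on Qwen2-Audio-Instruct;
#   - "zhifeixie/Audio-Reasoner" is a placeholder repo id;
#   - "example.wav" is a placeholder audio file.
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "zhifeixie/Audio-Reasoner"  # placeholder; use the released Audio-Reasoner-7B repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": "example.wav"},
            {"type": "text", "text": "What is happening in this audio? Think step by step."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=prompt, audios=[audio], return_tensors="pt", padding=True).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
output_ids = output_ids[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
reply = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

# Per the FAQ above, the structured reply is expected to walk through four stages:
# planning, caption, reasoning, and summary.
print(reply)
```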
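## Environment version check (sketch)

The FAQ notes that the transformers version matters and that `transformers==4.49.1` was tested, installed after ms-swift. A small, hypothetical sanity check along these lines can be run after setting up the environment; the install order and version pin come from the FAQ, everything else is illustrative.

```python
# Hypothetical post-install sanity check, based on the FAQ above:
# ms-swift is installed first, then transformers is pinned to the tested version, e.g.
#   pip install ms-swift
#   pip install transformers==4.49.1
from importlib.metadata import version

TESTED_TRANSFORMERS = "4.49.1"  # version reported as suitable in the FAQ

installed = version("transformers")
if installed != TESTED_TRANSFORMERS:
    print(
        f"Warning: transformers=={installed} is installed, but the model was "
        f"tested with transformers=={TESTED_TRANSFORMERS}; results may differ."
    )
else:
    print(f"transformers=={installed} matches the tested version.")
```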

License: MIT