This model contains the weights of NExT-GPT covering text-image-video-audio (tiva). It is built upon:

- Vicuna-7B, version 0
- ImageBind
- Stable Diffusion, version `v1-5`
- AudioLDM, version `l-full`
- ZeroScope, version `v2_576w`

For more details about the usage of the model, please refer to our code repository.