xcodec2
license: cc-by-nc-4.0
by litagin
Audio Model · Other
New · 3 downloads · Early-stage
Edge AI: Mobile, Laptop, Server (status unknown)
Quick Summary
A neural speech codec for 16 kHz audio: it encodes waveforms into discrete VQ codes and reconstructs them, and serves as the codec for inference and LLaSA fine-tuning.
Code Examples
```bash
conda create -n xcodec2 python=3.9
conda activate xcodec2
pip install xcodec2
```

Use `xcodec2==0.1.5` for codec inference and LLaSA fine-tuning. I've removed unnecessary dependencies and it works fine in my testing, but I'm not sure whether other problems may arise. If you prefer more stability, I recommend `xcodec2==0.1.3`, which matches the setup used during my codec training.
```python
import torch
import soundfile as sf
from transformers import AutoConfig
from xcodec2.modeling_xcodec2 import XCodec2Model

model_path = "HKUSTAudio/xcodec2"
model = XCodec2Model.from_pretrained(model_path)
model.eval().cuda()

wav, sr = sf.read("test.wav")
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0)  # Shape: (1, T)

with torch.no_grad():
    # Only 16 kHz speech is supported.
    # Only a single input is supported; for batch inference, please refer to the link below.
    vq_code = model.encode_code(input_waveform=wav_tensor)
    print("Code:", vq_code)

    recon_wav = model.decode_code(vq_code).cpu()  # Shape: (1, 1, T')

sf.write("reconstructed.wav", recon_wav[0, 0, :].numpy(), sr)
print("Done! Check reconstructed.wav")
```
Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free API
Replicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy Now
Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.