Step-Audio-EditX-AWQ-4bit

by stepfun-ai
Code Model
OTHER
4B params
Edge AI: Mobile, Laptop, Server
9GB+ RAM
Quick Summary

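Audio model for zero-shot voice cloning and speech editing: emotion, speaking style, paralinguistic tags, denoising, VAD, and speed. As the name indicates, this is an AWQ 4-bit build of Step-Audio-EditX.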

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 4GB+ RAM

Code Examples

Local Inference Demo (bash)
# zero-shot cloning
# The path of the generated audio file is output/fear_zh_female_prompt_cloned.wav
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "我总觉得,有人在跟着我,我能听到奇怪的脚步声。" \
    --prompt-audio "examples/fear_zh_female_prompt.wav" \
    --generated-text "可惜没有如果,已经发生的事情终究是发生了。" \
    --edit-type "clone" \
    --output-dir ./output 

python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "His political stance was conservative, and he was particularly close to margaret thatcher." \
    --prompt-audio "examples/zero_shot_en_prompt.wav" \
    --generated-text "Underneath the courtyard is a large underground exhibition room which connects the two buildings." \
    --edit-type "clone" \
    --output-dir ./output 

# edit
# Each edit iteration writes its own wav file, for example: output/fear_zh_female_prompt_edited_iter1.wav, output/fear_zh_female_prompt_edited_iter2.wav, ...
# emotion; fear
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "我总觉得,有人在跟着我,我能听到奇怪的脚步声。" \
    --prompt-audio "examples/fear_zh_female_prompt.wav" \
    --edit-type "emotion" \
    --edit-info "fear" \
    --output-dir ./output 

# emotion; happy
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "You know, I just finished that big project and feel so relieved. Everything seems easier and more colorful, what a wonderful feeling!" \
    --prompt-audio "examples/en_happy_prompt.wav" \
    --edit-type "emotion" \
    --edit-info "happy" \
    --output-dir ./output 

# style; whisper
# For the whisper style, set the number of edit iterations to more than 1 for better results.
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "比如在工作间隙,做一些简单的伸展运动,放松一下身体,这样,会让你更有精力." \
    --prompt-audio "examples/whisper_prompt.wav" \
    --edit-type "style" \
    --edit-info "whisper" \
    --output-dir ./output 

# paralinguistic
# supported tags: Breathing, Laughter, Surprise-oh, Confirmation-en, Uhm, Surprise-ah, Surprise-wa, Sigh, Question-ei, Dissatisfaction-hnn
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "我觉得这个计划大概是可行的,不过还需要再仔细考虑一下。" \
    --prompt-audio "examples/paralingustic_prompt.wav" \
    --generated-text "我觉得这个计划大概是可行的,[Uhm]不过还需要再仔细考虑一下。" \
    --edit-type "paralinguistic" \
    --output-dir ./output 

# denoise
# Prompt text is not needed.
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-audio "examples/denoise_prompt.wav" \
    --edit-type "denoise" \
    --output-dir ./output 

# vad 
# Prompt text is not needed.
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-audio "examples/vad_prompt.wav" \
    --edit-type "vad" \
    --output-dir ./output 

# speed
# supported edit-info: faster, slower, more faster, more slower
python3 tts_infer.py \
    --model-path where_you_download_dir \
    --tokenizer-path where_you_download_dir \
    --prompt-text "上次你说鞋子有点磨脚,我给你买了一双软软的鞋垫。" \
    --prompt-audio "examples/speed_prompt.wav" \
    --edit-type "speed" \
    --edit-info "more faster" \
    --output-dir ./output
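
Since the edit commands above write one file per iteration, a script that chains steps often needs the path of the last one. The following is a minimal sketch, not part of the Step-Audio-EditX repository: `latest_edit_iter` is a hypothetical helper name, and it assumes the naming pattern quoted in the edit comment above (`<output_dir>/<prompt_stem>_edited_iter<N>.wav`) holds for every edit type.

```shell
# Hypothetical helper (an assumption, not shipped with the model).
# Prints the highest-numbered edit iteration file for a prompt stem,
# assuming files are named <output_dir>/<stem>_edited_iter<N>.wav.
# Returns 1 and prints nothing when no matching file exists.
latest_edit_iter() {
    output_dir="$1"
    stem="$2"
    best_n=0
    best_file=""
    for f in "$output_dir/${stem}_edited_iter"*.wav; do
        [ -e "$f" ] || continue        # glob matched nothing
        n="${f##*_edited_iter}"        # drop everything up to the iteration number
        n="${n%.wav}"                  # drop the extension
        case "$n" in
            ''|*[!0-9]*) continue ;;   # skip suffixes that are not purely numeric
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n="$n"
            best_file="$f"
        fi
    done
    if [ -z "$best_file" ]; then
        return 1
    fi
    printf '%s\n' "$best_file"
}
```

For example, `latest_edit_iter ./output fear_zh_female_prompt` would print the last iteration written by the "emotion; fear" command. The comparison is numeric, so iter10 sorts after iter2, which a plain `ls | tail -n 1` would get wrong.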
