Step-Audio-EditX-AWQ-4bit
by stepfun-ai · Audio Model · 4B params · 2 downloads
Early-stage Edge AI: runs on mobile, laptop, or server (9GB+ RAM).
Quick Summary
AWQ 4-bit quantized build of Step-Audio-EditX, a speech model for zero-shot voice cloning and audio editing: emotion and style transfer, paralinguistic tags, denoising, voice activity detection (VAD), and speed control.
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 4GB+ RAM
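A quick way to check whether a machine meets the 4GB minimum before pulling the weights is a sketch like the following (it reads /proc/meminfo, so it is Linux-only):

```shell
#!/bin/sh
# Read total system memory from /proc/meminfo (Linux-only) and compare it
# against the 4GB+ minimum suggested above.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))
echo "Total RAM: ${total_gb} GiB"
if [ "$total_gb" -lt 4 ]; then
    echo "Below the 4GB+ minimum recommended for this model" >&2
fi
```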
Code Examples
Local Inference Demo
# zero-shot cloning
# Prompt text (Chinese): "I always feel like someone is following me; I can hear strange footsteps."
# Generated text (Chinese): "Too bad there are no what-ifs; what has happened has happened."
# The path of the generated audio file is output/fear_zh_female_prompt_cloned.wav
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "我总觉得,有人在跟着我,我能听到奇怪的脚步声。" \
--prompt-audio "examples/fear_zh_female_prompt.wav" \
--generated-text "可惜没有如果,已经发生的事情终究是发生了。" \
--edit-type "clone" \
--output-dir ./output
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "His political stance was conservative, and he was particularly close to Margaret Thatcher." \
--prompt-audio "examples/zero_shot_en_prompt.wav" \
--generated-text "Underneath the courtyard is a large underground exhibition room which connects the two buildings. " \
--edit-type "clone" \
--output-dir ./output
# edit
# There will be one or multiple wave files corresponding to each edit iteration, for example: output/fear_zh_female_prompt_edited_iter1.wav, output/fear_zh_female_prompt_edited_iter2.wav, ...
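Because each edit run can emit several iteration files, a small helper like this can pick out the highest-numbered one (a sketch; the filenames follow the pattern described above, and the placeholder files stand in for what tts_infer.py would write):

```shell
#!/bin/sh
# Create placeholder files matching the documented naming pattern so the
# sketch is runnable without the model; in practice tts_infer.py writes them.
mkdir -p output
touch output/fear_zh_female_prompt_edited_iter1.wav \
      output/fear_zh_female_prompt_edited_iter2.wav \
      output/fear_zh_female_prompt_edited_iter3.wav

# Version-sort so e.g. iter10 ranks after iter2, then keep the last
# iteration, which has had the edit applied the most times.
last=$(ls output/*_edited_iter*.wav | sort -V | tail -n 1)
echo "Last iteration: $last"
```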
# emotion; fear
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "我总觉得,有人在跟着我,我能听到奇怪的脚步声。" \
--prompt-audio "examples/fear_zh_female_prompt.wav" \
--edit-type "emotion" \
--edit-info "fear" \
--output-dir ./output
# emotion; happy
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "You know, I just finished that big project and feel so relieved. Everything seems easier and more colorful, what a wonderful feeling!" \
--prompt-audio "examples/en_happy_prompt.wav" \
--edit-type "emotion" \
--edit-info "happy" \
--output-dir ./output
# style; whisper
# For the whisper style, set the edit iteration number greater than 1 for better results.
# Prompt text (Chinese): "For example, during work breaks, do some simple stretching exercises to relax your body; this will give you more energy."
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "比如在工作间隙,做一些简单的伸展运动,放松一下身体,这样,会让你更有精力." \
--prompt-audio "examples/whisper_prompt.wav" \
--edit-type "style" \
--edit-info "whisper" \
--output-dir ./output
# paralinguistic
# Supported tags: Breathing, Laughter, Surprise-oh, Confirmation-en, Uhm, Surprise-ah, Surprise-wa, Sigh, Question-ei, Dissatisfaction-hnn
# Prompt text (Chinese): "I think this plan is probably feasible, but it still needs more careful consideration." The generated text inserts [Uhm] before the second clause.
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "我觉得这个计划大概是可行的,不过还需要再仔细考虑一下。" \
--prompt-audio "examples/paralingustic_prompt.wav" \
--generated-text "我觉得这个计划大概是可行的,[Uhm]不过还需要再仔细考虑一下。" \
--edit-type "paralinguistic" \
--output-dir ./output
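Since a tag outside the supported list will not render, it can help to validate the tag before splicing it into --generated-text. A minimal sketch (the `tag` variable is a placeholder for whichever tag you intend to use):

```shell
#!/bin/sh
# Supported paralinguistic tags, per the list above.
supported="Breathing Laughter Surprise-oh Confirmation-en Uhm Surprise-ah Surprise-wa Sigh Question-ei Dissatisfaction-hnn"

tag="Uhm"   # the tag to be spliced into --generated-text as [Uhm]

# Match the tag as a whole word inside the space-delimited list.
case " $supported " in
    *" $tag "*) valid=1; echo "ok: [$tag] is a supported tag" ;;
    *)          valid=0; echo "error: [$tag] is not supported" >&2 ;;
esac
```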
# denoise
# Prompt text is not needed.
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-audio "examples/denoise_prompt.wav" \
--edit-type "denoise" \
--output-dir ./output
# vad
# Prompt text is not needed.
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-audio "examples/vad_prompt.wav" \
--edit-type "vad" \
--output-dir ./output
# speed
# supported edit-info: faster, slower, more faster, more slower
# Prompt text (Chinese): "Last time you said your shoes rubbed your feet a bit, so I bought you a pair of soft insoles."
python3 tts_infer.py \
--model-path where_you_download_dir \
--tokenizer-path where_you_download_dir \
--prompt-text "上次你说鞋子有点磨脚,我给你买了一双软软的鞋垫。" \
--prompt-audio "examples/speed_prompt.wav" \
--edit-type "speed" \
--edit-info "more faster" \
--output-dir ./output
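To compare all four speed settings at once, the command above can be wrapped in a loop over the supported edit-info values. This sketch echoes each command rather than running it, so it is safe to execute without the model weights:

```shell
#!/bin/sh
# Iterate over the supported speed values listed above; two of them contain
# a space, so each value is kept quoted as it is passed along.
for speed in faster slower "more faster" "more slower"; do
    echo python3 tts_infer.py \
        --model-path where_you_download_dir \
        --tokenizer-path where_you_download_dir \
        --prompt-audio "examples/speed_prompt.wav" \
        --edit-type speed \
        --edit-info "\"$speed\"" \
        --output-dir ./output
done
```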