uer

63 models

roberta-base-finetuned-dianping-chinese

Chinese RoBERTa-Base Models for Text Classification

This is the set of 5 Chinese RoBERTa-Base classification models fine-tuned by UER-py, which is introduced in this paper. The models could also be fine-tuned by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can download the 5 Chinese RoBERTa-Base classification models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Dataset   | Link                                                  |
| :-------: | :---------------------------------------------------: |
| JD full   | [roberta-base-finetuned-jd-full-chinese][jdfull]      |
| JD binary | [roberta-base-finetuned-jd-binary-chinese][jdbinary]  |
| Dianping  | [roberta-base-finetuned-dianping-chinese][dianping]   |
| Ifeng     | [roberta-base-finetuned-ifeng-chinese][ifeng]         |
| Chinanews | [roberta-base-finetuned-chinanews-chinese][chinanews] |

You can use these models directly with a pipeline for text classification (take the case of roberta-base-finetuned-chinanews-chinese).

Five Chinese text classification datasets are used. The JD full, JD binary, and Dianping datasets consist of user reviews with different sentiment polarities; Ifeng and Chinanews consist of the first paragraphs of news articles in different topic classes. The datasets were collected by the Glyph project, and more details are discussed in the corresponding paper.

The models are fine-tuned by UER-py on Tencent Cloud. We fine-tune for three epochs with a sequence length of 512 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. We use the same hyper-parameters for all models.
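The card's pipeline snippet did not survive extraction. Below is a minimal sketch of the text classification usage with the transformers library, assuming the example sentence is purely illustrative:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the Chinanews topic classification model from the Hugging Face Hub.
model_name = "uer/roberta-base-finetuned-chinanews-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Build a text classification pipeline and classify an example headline.
text_classification = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = text_classification("北京上个月召开了两会")
print(result)  # a list with one dict containing "label" and "score"
```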
Taking roberta-base-finetuned-chinanews-chinese as an example, we run the fine-tuning with UER-py and finally convert the fine-tuned model into Huggingface's format:

[jdfull]: https://huggingface.co/uer/roberta-base-finetuned-jd-full-chinese
[jdbinary]: https://huggingface.co/uer/roberta-base-finetuned-jd-binary-chinese
[dianping]: https://huggingface.co/uer/roberta-base-finetuned-dianping-chinese
[ifeng]: https://huggingface.co/uer/roberta-base-finetuned-ifeng-chinese
[chinanews]: https://huggingface.co/uer/roberta-base-finetuned-chinanews-chinese
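The conversion into Huggingface's format is done with a UER-py conversion script. The script name, model path, and layer count below are assumptions based on the UER-py repository layout, not verbatim from the card:

```shell
# Assumed: run from the UER-py repository root; the input model path is illustrative.
python3 scripts/convert_bert_text_classification_from_uer_to_huggingface.py \
    --input_model_path models/chinanews_classifier_model.bin \
    --output_model_path pytorch_model.bin \
    --layers_num 12
```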

9,885
68

gpt2-chinese-cluecorpussmall

The set of GPT2 models, except for the GPT2-xlarge model, is pre-trained by UER-py, which is introduced in this paper. The GPT2-xlarge model is pre-trained by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework. The other models could also be pre-trained by TencentPretrain. The models are used to generate Chinese texts.

You can download the set of Chinese GPT2 models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Model       | Link                  |
| ----------- | :-------------------: |
| GPT2-distil | [L=6/H=768][distil]   |
| GPT2        | [L=12/H=768][base]    |
| GPT2-medium | [L=24/H=1024][medium] |
| GPT2-large  | [L=36/H=1280][large]  |
| GPT2-xlarge | [L=48/H=1600][xlarge] |

Note that the 6-layer model is called GPT2-distil because it follows the configuration of distilgpt2, although its pre-training does not involve the supervision of a larger model.

You can use the models directly with a pipeline for text generation (take the case of GPT2-distil).

The GPT2-xlarge model is pre-trained by TencentPretrain, and the others are pre-trained by UER-py on Tencent Cloud. We pre-train for 1,000,000 steps with a sequence length of 128 and then for 250,000 additional steps with a sequence length of 1024.
For the models pre-trained by UER-py (take the case of GPT2-distil), we run the two pre-training stages and finally convert the pre-trained model into Huggingface's format. For the GPT2-xlarge model pre-trained by TencentPretrain with DeepSpeed, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints before the second pre-training stage, extract them again after it, and finally convert the pre-trained model into Huggingface's format:

[distil]: https://huggingface.co/uer/gpt2-distil-chinese-cluecorpussmall
[base]: https://huggingface.co/uer/gpt2-chinese-cluecorpussmall
[medium]: https://huggingface.co/uer/gpt2-medium-chinese-cluecorpussmall
[large]: https://huggingface.co/uer/gpt2-large-chinese-cluecorpussmall
[xlarge]: https://huggingface.co/uer/gpt2-xlarge-chinese-cluecorpussmall
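The weight-extraction step described above can be sketched with DeepSpeed's bundled `zero_to_fp32.py` helper, which consolidates sharded ZeRO checkpoints into a single fp32 state dict; the checkpoint directory and output filename here are assumptions:

```shell
# Assumed layout: zero_to_fp32.py is placed by DeepSpeed inside the checkpoint directory.
# Consolidate the sharded ZeRO checkpoint in the current directory into one fp32 file.
python3 zero_to_fp32.py . pytorch_model.bin
```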

9,864
237

roberta-base-finetuned-jd-binary-chinese

7,527
40

roberta-base-finetuned-chinanews-chinese

3,531
76

roberta-base-finetuned-cluener2020-chinese

The model is used for named entity recognition. It is fine-tuned by UER-py, which is introduced in this paper. The model could also be fine-tuned by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can download the model either from the UER-py Modelzoo page, or via HuggingFace from the link roberta-base-finetuned-cluener2020-chinese. You can use this model directly with a pipeline for token classification.

CLUENER2020 is used as training data. We only use the train set of the dataset.

The model is fine-tuned by UER-py on Tencent Cloud. We fine-tune for five epochs with a sequence length of 512 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. Finally, we convert the fine-tuned model into Huggingface's format:
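The token classification snippet from the card is missing. A minimal sketch with the transformers pipeline, assuming the example sentence is illustrative:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the CLUENER2020 named entity recognition model.
model_name = "uer/roberta-base-finetuned-cluener2020-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Run the token classification pipeline on an example sentence.
ner = pipeline("ner", model=model, tokenizer=tokenizer)
entities = ner("江苏警方通报特斯拉冲进店铺")
print(entities)  # a list of dicts, one per predicted entity token
```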

2,670
44

roberta-base-chinese-extractive-qa

2,626
102

sbert-base-chinese-nli

This is the sentence embedding model fine-tuned by UER-py, which is introduced in this paper. The model could also be trained with TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can use this model to extract sentence embeddings for sentence similarity tasks. We use cosine distance to calculate the embedding similarity.

The model is fine-tuned by UER-py on Tencent Cloud. We fine-tune for five epochs with a sequence length of 128 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. Finally, we convert the fine-tuned model into Huggingface's format:

license:apache-2.0
2,564
133

t5-base-chinese-cluecorpussmall

This is the set of Chinese T5 models pre-trained by UER-py, which is introduced in this paper. The models could also be pre-trained by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

The Text-to-Text Transfer Transformer (T5) leverages a unified text-to-text format and attains state-of-the-art results on a wide variety of English-language NLP tasks. Following their work, we released a series of Chinese T5 models.

You can download the set of Chinese T5 models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Model    | Link                       |
| -------- | :------------------------: |
| T5-Small | [L=6/H=512 (Small)][small] |
| T5-Base  | [L=12/H=768 (Base)][base]  |

In T5, spans of the input sequence are masked by so-called sentinel tokens. Each sentinel token represents a unique mask token for the input sequence, of the form `<extra_id_0>`, `<extra_id_1>`, and so on. However, `<extra_id_xxx>` is separated into multiple parts in Huggingface's Hosted inference API. Therefore, we replace `<extra_id_xxx>` with `extraxxx` in the vocabulary, and BertTokenizer regards `extraxxx` as one sentinel token.

You can use this model directly with a pipeline for text2text generation (take the case of T5-Small).

The models are pre-trained by UER-py on Tencent Cloud. We pre-train for 1,000,000 steps with a sequence length of 128 and then for 250,000 additional steps with a sequence length of 512. We use the same hyper-parameters for the different model sizes. Finally, we convert the pre-trained models into Huggingface's format:

[small]: https://huggingface.co/uer/t5-small-chinese-cluecorpussmall
[base]: https://huggingface.co/uer/t5-base-chinese-cluecorpussmall

1,139
26

gpt2-chinese-poem

805
39

t5-small-chinese-cluecorpussmall

762
20

chinese_roberta_L-4_H-512

731
11

albert-base-chinese-cluecorpussmall

677
38

roberta-base-finetuned-jd-full-chinese

555
14

roberta-small-wwm-chinese-cluecorpussmall

417
2

gpt2-distil-chinese-cluecorpussmall

406
20

chinese_roberta_L-8_H-512

251
3

roberta-tiny-wwm-chinese-cluecorpussmall

241
2

t5-v1_1-small-chinese-cluecorpussmall

187
7

gpt2-large-chinese-cluecorpussmall

179
4

chinese_roberta_L-12_H-768

115
16

gpt2-chinese-lyric

90
32

gpt2-chinese-ancient

83
18

bart-base-chinese-cluecorpussmall

82
18

chinese_roberta_L-2_H-128

75
11

pegasus-base-chinese-cluecorpussmall

55
4

gpt2-chinese-couplet

54
10

chinese_roberta_L-4_H-256

53
4

gpt2-medium-chinese-cluecorpussmall

51
3

chinese_roberta_L-6_H-768

51
2

roberta-base-finetuned-ifeng-chinese

44
1

roberta-base-wwm-chinese-cluecorpussmall

30
3

roberta-mini-wwm-chinese-cluecorpussmall

27
0

gpt2-xlarge-chinese-cluecorpussmall

21
5

t5-v1_1-base-chinese-cluecorpussmall

18
12

roberta-medium-word-chinese-cluecorpussmall

18
2

bart-large-chinese-cluecorpussmall

17
2

chinese_roberta_L-2_H-512

16
1

albert-large-chinese-cluecorpussmall

15
4

roberta-small-word-chinese-cluecorpussmall

11
2

chinese_roberta_L-6_H-512

11
0

chinese_roberta_L-6_H-256

10
1

roberta-base-word-chinese-cluecorpussmall

9
9

chinese_roberta_L-10_H-256

9
0

chinese_roberta_L-4_H-128

8
0

chinese_roberta_L-10_H-128

7
1

chinese_roberta_L-8_H-128

7
0

chinese_roberta_L-8_H-768

7
0

roberta-tiny-word-chinese-cluecorpussmall

6
3

chinese_roberta_L-10_H-768

6
2

roberta-mini-word-chinese-cluecorpussmall

6
1

roberta-large-wwm-chinese-cluecorpussmall

6
0

chinese_roberta_L-8_H-256

5
1

chinese_roberta_L-6_H-128

5
0

chinese_roberta_L-2_H-256

4
1

chinese_roberta_L-4_H-768

4
0

chinese_roberta_L-12_H-128

3
1

chinese_roberta_L-12_H-512

3
1

roberta-medium-wwm-chinese-cluecorpussmall

3
1

chinese_roberta_L-10_H-512

3
0

chinese_roberta_L-12_H-256

3
0

roberta-xlarge-wwm-chinese-cluecorpussmall

2
1

chinese_roberta_L-2_H-768

2
0

pegasus-large-chinese-cluecorpussmall

1
2