uer

63 models

roberta-base-finetuned-dianping-chinese

Chinese RoBERTa-Base Models for Text Classification

This is the set of 5 Chinese RoBERTa-Base classification models fine-tuned by UER-py, which is introduced in this paper. The models could also be fine-tuned by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can download the 5 Chinese RoBERTa-Base classification models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Dataset   | Link                                                  |
| :-------: | :---------------------------------------------------: |
| JD full   | [roberta-base-finetuned-jd-full-chinese][jdfull]      |
| JD binary | [roberta-base-finetuned-jd-binary-chinese][jdbinary]  |
| Dianping  | [roberta-base-finetuned-dianping-chinese][dianping]   |
| Ifeng     | [roberta-base-finetuned-ifeng-chinese][ifeng]         |
| Chinanews | [roberta-base-finetuned-chinanews-chinese][chinanews] |

You can use these models directly with a pipeline for text classification (take the case of roberta-base-finetuned-chinanews-chinese).

Five Chinese text classification datasets are used. The JD full, JD binary, and Dianping datasets consist of user reviews with different sentiment polarities; Ifeng and Chinanews consist of the first paragraphs of news articles in different topic classes. The datasets were collected by the Glyph project, and more details are discussed in the corresponding paper.

The models are fine-tuned by UER-py on Tencent Cloud. We fine-tune for three epochs with a sequence length of 512 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. We use the same hyper-parameters for all models.
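The card's pipeline snippet did not survive extraction. Below is a minimal sketch of the text classification usage with the transformers library, assuming the example sentence is purely illustrative:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the Chinanews topic classification model from the Hugging Face Hub.
model_name = "uer/roberta-base-finetuned-chinanews-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Build a text classification pipeline and classify an example headline.
text_classification = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = text_classification("北京上个月召开了两会")
print(result)  # a list with one dict containing "label" and "score"
```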
Taking roberta-base-finetuned-chinanews-chinese as an example, we run the fine-tuning with UER-py and finally convert the fine-tuned model into Huggingface's format:

[jdfull]: https://huggingface.co/uer/roberta-base-finetuned-jd-full-chinese
[jdbinary]: https://huggingface.co/uer/roberta-base-finetuned-jd-binary-chinese
[dianping]: https://huggingface.co/uer/roberta-base-finetuned-dianping-chinese
[ifeng]: https://huggingface.co/uer/roberta-base-finetuned-ifeng-chinese
[chinanews]: https://huggingface.co/uer/roberta-base-finetuned-chinanews-chinese
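The conversion into Huggingface's format is done with a UER-py conversion script. The script name, model path, and layer count below are assumptions based on the UER-py repository layout, not verbatim from the card:

```shell
# Assumed: run from the UER-py repository root; the input model path is illustrative.
python3 scripts/convert_bert_text_classification_from_uer_to_huggingface.py \
    --input_model_path models/chinanews_classifier_model.bin \
    --output_model_path pytorch_model.bin \
    --layers_num 12
```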

9,885
68

gpt2-chinese-cluecorpussmall

The set of GPT2 models, except for the GPT2-xlarge model, is pre-trained by UER-py, which is introduced in this paper. The GPT2-xlarge model is pre-trained by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework. The other models could also be pre-trained by TencentPretrain. The models are used to generate Chinese texts.

You can download the set of Chinese GPT2 models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Model       | Link                  |
| ----------- | :-------------------: |
| GPT2-distil | [L=6/H=768][distil]   |
| GPT2        | [L=12/H=768][base]    |
| GPT2-medium | [L=24/H=1024][medium] |
| GPT2-large  | [L=36/H=1280][large]  |
| GPT2-xlarge | [L=48/H=1600][xlarge] |

Note that the 6-layer model is called GPT2-distil because it follows the configuration of distilgpt2, although its pre-training does not involve the supervision of a larger model.

You can use the models directly with a pipeline for text generation (take the case of GPT2-distil).

The GPT2-xlarge model is pre-trained by TencentPretrain, and the others are pre-trained by UER-py on Tencent Cloud. We pre-train for 1,000,000 steps with a sequence length of 128 and then for 250,000 additional steps with a sequence length of 1024.
For the models pre-trained by UER-py (take the case of GPT2-distil), we run the two pre-training stages and finally convert the pre-trained model into Huggingface's format. For the GPT2-xlarge model pre-trained by TencentPretrain with DeepSpeed, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints before the second pre-training stage, extract them again after it, and finally convert the pre-trained model into Huggingface's format:

[distil]: https://huggingface.co/uer/gpt2-distil-chinese-cluecorpussmall
[base]: https://huggingface.co/uer/gpt2-chinese-cluecorpussmall
[medium]: https://huggingface.co/uer/gpt2-medium-chinese-cluecorpussmall
[large]: https://huggingface.co/uer/gpt2-large-chinese-cluecorpussmall
[xlarge]: https://huggingface.co/uer/gpt2-xlarge-chinese-cluecorpussmall
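The weight-extraction step described above can be sketched with DeepSpeed's bundled `zero_to_fp32.py` helper, which consolidates sharded ZeRO checkpoints into a single fp32 state dict; the checkpoint directory and output filename here are assumptions:

```shell
# Assumed layout: zero_to_fp32.py is placed by DeepSpeed inside the checkpoint directory.
# Consolidate the sharded ZeRO checkpoint in the current directory into one fp32 file.
python3 zero_to_fp32.py . pytorch_model.bin
```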

9,864
237

roberta-base-finetuned-jd-binary-chinese

7,527
40

roberta-base-finetuned-chinanews-chinese

3,531
76

roberta-base-finetuned-cluener2020-chinese

The model is used for named entity recognition. It is fine-tuned by UER-py, which is introduced in this paper. The model could also be fine-tuned by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can download the model either from the UER-py Modelzoo page, or via HuggingFace from the link roberta-base-finetuned-cluener2020-chinese. You can use this model directly with a pipeline for token classification.

CLUENER2020 is used as training data. We only use the train set of the dataset.

The model is fine-tuned by UER-py on Tencent Cloud. We fine-tune for five epochs with a sequence length of 512 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. Finally, we convert the fine-tuned model into Huggingface's format:
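The token classification snippet from the card is missing. A minimal sketch with the transformers pipeline, assuming the example sentence is illustrative:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the CLUENER2020 named entity recognition model.
model_name = "uer/roberta-base-finetuned-cluener2020-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Run the token classification pipeline on an example sentence.
ner = pipeline("ner", model=model, tokenizer=tokenizer)
entities = ner("江苏警方通报特斯拉冲进店铺")
print(entities)  # a list of dicts, one per predicted entity token
```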

2,670
44

roberta-base-chinese-extractive-qa

2,626
102

sbert-base-chinese-nli

This is the sentence embedding model fine-tuned by UER-py, which is introduced in this paper. The model could also be trained with TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

You can use this model to extract sentence embeddings for sentence similarity tasks. We use cosine distance to calculate the embedding similarity.

The model is fine-tuned by UER-py on Tencent Cloud. We fine-tune for five epochs with a sequence length of 128 on the basis of the pre-trained model chinese_roberta_L-12_H-768. At the end of each epoch, the model is saved when the best performance on the development set is achieved. Finally, we convert the fine-tuned model into Huggingface's format:

license:apache-2.0
2,564
133

t5-base-chinese-cluecorpussmall

This is the set of Chinese T5 models pre-trained by UER-py, which is introduced in this paper. The models could also be pre-trained by TencentPretrain, introduced in this paper, which inherits UER-py to support models with more than one billion parameters and extends it to a multimodal pre-training framework.

The Text-to-Text Transfer Transformer (T5) leverages a unified text-to-text format and attains state-of-the-art results on a wide variety of English-language NLP tasks. Following their work, we released a series of Chinese T5 models.

You can download the set of Chinese T5 models either from the UER-py Modelzoo page, or via HuggingFace from the links below:

| Model    | Link                       |
| -------- | :------------------------: |
| T5-Small | [L=6/H=512 (Small)][small] |
| T5-Base  | [L=12/H=768 (Base)][base]  |

In T5, spans of the input sequence are masked by so-called sentinel tokens. Each sentinel token represents a unique mask token for the input sequence, of the form `<extra_id_0>`, `<extra_id_1>`, and so on. However, `<extra_id_xxx>` is separated into multiple parts in Huggingface's Hosted inference API. Therefore, we replace `<extra_id_xxx>` with `extraxxx` in the vocabulary, and BertTokenizer regards `extraxxx` as one sentinel token.

You can use this model directly with a pipeline for text2text generation (take the case of T5-Small).

The models are pre-trained by UER-py on Tencent Cloud. We pre-train for 1,000,000 steps with a sequence length of 128 and then for 250,000 additional steps with a sequence length of 512. We use the same hyper-parameters for the different model sizes. Finally, we convert the pre-trained models into Huggingface's format:

[small]: https://huggingface.co/uer/t5-small-chinese-cluecorpussmall
[base]: https://huggingface.co/uer/t5-base-chinese-cluecorpussmall

1,139
26

gpt2-chinese-poem

805
39

t5-small-chinese-cluecorpussmall

762
20

chinese_roberta_L-4_H-512

731
11

albert-base-chinese-cluecorpussmall

677
38

roberta-base-finetuned-jd-full-chinese

555
14

roberta-small-wwm-chinese-cluecorpussmall

417
2

gpt2-distil-chinese-cluecorpussmall

406
20

chinese_roberta_L-8_H-512

251
3

roberta-tiny-wwm-chinese-cluecorpussmall

241
2

t5-v1_1-small-chinese-cluecorpussmall

187
7

gpt2-large-chinese-cluecorpussmall

179
4

chinese_roberta_L-12_H-768

115
16

gpt2-chinese-lyric

90
32

gpt2-chinese-ancient

83
18

bart-base-chinese-cluecorpussmall

82
18

chinese_roberta_L-2_H-128

75
11

pegasus-base-chinese-cluecorpussmall

55
4

gpt2-chinese-couplet

54
10

chinese_roberta_L-4_H-256

53
4

gpt2-medium-chinese-cluecorpussmall

51
3

chinese_roberta_L-6_H-768

51
2

roberta-base-finetuned-ifeng-chinese

44
1

roberta-base-wwm-chinese-cluecorpussmall

30
3

roberta-mini-wwm-chinese-cluecorpussmall

27
0

gpt2-xlarge-chinese-cluecorpussmall

21
5

t5-v1_1-base-chinese-cluecorpussmall

18
12

roberta-medium-word-chinese-cluecorpussmall

18
2

bart-large-chinese-cluecorpussmall

17
2

chinese_roberta_L-2_H-512

16
1

albert-large-chinese-cluecorpussmall

15
4

roberta-small-word-chinese-cluecorpussmall

11
2

chinese_roberta_L-6_H-512

11
0

chinese_roberta_L-6_H-256

10
1

roberta-base-word-chinese-cluecorpussmall

9
9

chinese_roberta_L-10_H-256

9
0

chinese_roberta_L-4_H-128

8
0

chinese_roberta_L-10_H-128

7
1

chinese_roberta_L-8_H-128

7
0

chinese_roberta_L-8_H-768

7
0

roberta-tiny-word-chinese-cluecorpussmall

6
3

chinese_roberta_L-10_H-768

6
2

roberta-mini-word-chinese-cluecorpussmall

6
1

roberta-large-wwm-chinese-cluecorpussmall

6
0

chinese_roberta_L-8_H-256

5
1

chinese_roberta_L-6_H-128

5
0

chinese_roberta_L-2_H-256

4
1

chinese_roberta_L-4_H-768

4
0

chinese_roberta_L-12_H-128

3
1

chinese_roberta_L-12_H-512

3
1

roberta-medium-wwm-chinese-cluecorpussmall

3
1

chinese_roberta_L-10_H-512

3
0

chinese_roberta_L-12_H-256

3
0

roberta-xlarge-wwm-chinese-cluecorpussmall

2
1

chinese_roberta_L-2_H-768

2
0

pegasus-large-chinese-cluecorpussmall

1
2