hfl

133 models • 4 total models in database

chinese-roberta-wwm-ext

license:apache-2.0

476,377

367

chinese-bert-wwm

license:apache-2.0

52,446

chinese-bert-wwm-ext

Chinese BERT with Whole Word Masking

To further accelerate Chinese natural language processing, we provide Chinese pre-trained BERT with Whole Word Masking.

Pre-Training with Whole Word Masking for Chinese BERT
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu

This repository is developed based on: https://github.com/google-research/bert

You may also be interested in:
- Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
- Chinese MacBERT: https://github.com/ymcui/MacBERT
- Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
- Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
- Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology

Citation

If you find the technical report or resource useful, please cite the following technical report in your paper.
- Primary: https://arxiv.org/abs/2004.13922
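To make the idea of whole word masking concrete, here is a minimal sketch, not the authors' implementation. It assumes WordPiece-style tokens in which a `##` prefix marks a continuation piece, and it masks every piece of a selected word together. The function names, the `mask_prob` default, and the English example are illustrative assumptions; for Chinese text, where the tokenizer typically emits single characters, the word groups would have to come from a word segmenter, which is not shown here.

```python
import random


def group_wordpieces(tokens):
    """Group token indices into whole words.

    A token starting with '##' is treated as a continuation of the
    previous word (WordPiece convention).
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask whole words: if a word is selected, all of its pieces are masked.

    Simplification (assumption): every selected piece becomes mask_token;
    no random-replacement or keep-original cases are modelled.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    for word in group_wordpieces(tokens):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked


if __name__ == "__main__":
    tokens = ["we", "use", "a", "language", "model", "to", "pre", "##di", "##ct"]
    print(whole_word_mask(tokens, mask_prob=0.5, seed=0))
```

The point of the grouping step is that the pieces `pre ##di ##ct` are never split between masked and unmasked, which is what distinguishes this from piece-level masking.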

license:apache-2.0

10,075

186

chinese-macbert-large

license:apache-2.0

9,255

llama-3-chinese-8b-instruct-v3

llama

9,018

chinese-roberta-wwm-ext-large

Please use 'Bert' related functions to load this model!

Chinese BERT with Whole Word Masking

To further accelerate Chinese natural language processing, we provide Chinese pre-trained BERT with Whole Word Masking.

Pre-Training with Whole Word Masking for Chinese BERT
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu

This repository is developed based on: https://github.com/google-research/bert

You may also be interested in:
- Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
- Chinese MacBERT: https://github.com/ymcui/MacBERT
- Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
- Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
- Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology

Citation

If you find the technical report or resource useful, please cite the following technical report in your paper.
- Primary: https://arxiv.org/abs/2004.13922
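The note about using 'Bert' related functions can be illustrated with the Hugging Face `transformers` library. The sketch below is one plausible reading of that note: it loads the checkpoint with `BertTokenizer` and `BertModel` rather than the RoBERTa classes. The Hub id is assembled from the organization and model name shown on this page, and the helper names `load_with_bert_classes` and `encode` are made up for illustration. Calling the loader downloads the weights, so it needs network access plus `transformers` and `torch` installed.

```python
# Hub id assembled from the organization ("hfl") and the model name above.
MODEL_ID = "hfl/chinese-roberta-wwm-ext-large"


def load_with_bert_classes(model_id=MODEL_ID):
    """Load the checkpoint with the BERT classes, as the model card asks.

    The import is done lazily so this module can be imported even when
    transformers is not installed.
    """
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BertModel.from_pretrained(model_id)
    return tokenizer, model


def encode(text, tokenizer, model):
    """Return the last hidden states for one input string."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state


if __name__ == "__main__":
    tok, bert = load_with_bert_classes()
    hidden = encode("使用语言模型来预测下一个词的概率。", tok, bert)
    print(hidden.shape)  # (batch, sequence_length, hidden_size)
```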

license:apache-2.0

8,876

221

chinese-macbert-base

Please use 'Bert' related functions to load this model!

This repository contains the resources in our paper "Revisiting Pre-trained Models for Chinese Natural Language Processing", which will be published in "Findings of EMNLP". You can read our camera-ready paper through ACL Anthology or arXiv pre-print.

Revisiting Pre-trained Models for Chinese Natural Language Processing
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu

You may also be interested in:
- Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
- Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
- Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
- Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology

Introduction

MacBERT is an improved BERT with a novel MLM as correction pre-training task, which mitigates the discrepancy between pre-training and fine-tuning.

Instead of masking with the [MASK] token, which never appears in the fine-tuning stage, we propose to use similar words for the masking purpose. A similar word is obtained by using the Synonyms toolkit (Wang and Hu, 2017), which is based on word2vec (Mikolov et al., 2013) similarity calculations. If an N-gram is selected for masking, we find similar words for each word individually. In rare cases, when there is no similar word, we fall back to random word replacement.

Here is an example of our pre-training task.

| | Example |
| -------------- | ----------------- |
| Original Sentence | we use a language model to predict the probability of the next word. |
| MLM | we use a language [M] to [M] ##di ##ct the pro [M] ##bility of the next word . |
| Whole word masking | we use a language [M] to [M] [M] [M] the [M] [M] [M] of the next word . |
| N-gram masking | we use a [M] [M] to [M] [M] [M] the [M] [M] [M] [M] [M] next word . |
| MLM as correction | we use a text system to ca ##lc ##ulate the po ##si ##bility of the next word . |

In addition to the new pre-training task, we also incorporate the following techniques.
- Whole Word Masking (WWM)
- N-gram masking
- Sentence-Order Prediction (SOP)

Note that MacBERT can directly replace the original BERT, as there are no differences in the main neural architecture.

For more technical details, please check our paper: Revisiting Pre-trained Models for Chinese Natural Language Processing

Citation

If you find our resource or paper useful, please consider including the following citation in your paper.
- https://arxiv.org/abs/2004.13922
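The MLM as correction idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the MacBERT training code: the `SIMILAR` dictionary is a hypothetical stand-in for the Synonyms toolkit, the function name `mac_corrupt` is made up, and the positions to corrupt are passed in explicitly rather than sampled. It works on whole words and does not model WordPiece splitting or N-gram selection.

```python
import random

# Hypothetical similar-word table standing in for the Synonyms toolkit.
SIMILAR = {
    "language": ["text"],
    "model": ["system"],
    "predict": ["calculate"],
    "probability": ["possibility"],
}


def mac_corrupt(words, positions, similar=SIMILAR, vocab=None, seed=None):
    """Replace the words at `positions` with similar words.

    Falls back to a random word from `vocab` when no similar word is
    known. Returns the corrupted sequence and a {position: original}
    mapping, which is what the model would be trained to recover.
    """
    rng = random.Random(seed)
    vocab = vocab or sorted(set(words))
    corrupted = list(words)
    labels = {}
    for i in positions:
        labels[i] = words[i]
        candidates = similar.get(words[i])
        if candidates:
            corrupted[i] = rng.choice(candidates)
        else:
            # No similar word available: random word replacement.
            corrupted[i] = rng.choice(vocab)
    return corrupted, labels


if __name__ == "__main__":
    sentence = "we use a language model to predict the probability of the next word"
    corrupted, labels = mac_corrupt(sentence.split(), positions=[3, 4, 6, 8])
    print(" ".join(corrupted))
    print(labels)
```

Because the corrupted input contains only real words, the model never sees an artificial mask token during pre-training, which is the discrepancy the text says this task is meant to reduce.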

hfl

chinese-roberta-wwm-ext

chinese-bert-wwm

chinese-bert-wwm-ext

chinese-macbert-large

llama-3-chinese-8b-instruct-v3

chinese-roberta-wwm-ext-large

chinese-alpaca-2-13b-16k

chinese-alpaca-2-13b

chinese-mixtral

chinese-llama-2-13b-16k

chinese-mixtral-instruct

chinese-llama-2-13b

llama-3-chinese-8b-instruct

llama-3-chinese-8b-instruct-v2

chinese-macbert-base

rbt3

minirbt-h256

chinese-electra-180g-small-ex-discriminator

Qwen2.5-VL-7B-Instruct-GPTQ-Int4

llama-3-chinese-8b

llama-3-chinese-8b-instruct-v3-gguf

llama-3-chinese-8b-gguf

rbt4-h312

Qwen2.5-VL-3B-Instruct-GPTQ-Int4

chinese-mixtral-instruct-gguf

chinese-xlnet-base

chinese-alpaca-2-13b-gguf

llama-3-chinese-8b-instruct-gguf

chinese-lert-base

chinese-llama-2-1.3b

chinese-electra-base-discriminator

chinese-electra-180g-large-discriminator

chinese-lert-small

chinese-llama-2-7b

chinese-electra-180g-base-discriminator

chinese-alpaca-2-1.3b

minirbt-h288

chinese-legal-electra-base-discriminator

chinese-electra-small-discriminator

llama-3-chinese-8b-instruct-v2-gguf

chinese-electra-180g-small-discriminator

rbt6

chinese-alpaca-2-7b-64k

chinese-llama-2-7b-64k

chinese-llama-2-lora-7b-64k

chinese-alpaca-2-lora-7b-64k

chinese-alpaca-2-1.3b-gguf

chinese-alpaca-2-1.3b-rlhf-gguf

chinese-alpaca-2-7b-64k-gguf

chinese-alpaca-2-13b-16k-gguf

chinese-alpaca-2-7b

chinese-llama-2-13b-gguf

chinese-mixtral-gguf

chinese-llama-2-7b-gguf

rbtl3

chinese-alpaca-2-7b-gguf

chinese-electra-small-ex-discriminator

chinese-alpaca-2-lora-7b-16k

chinese-electra-180g-base-generator

chinese-alpaca-2-7b-16k

chinese-llama-2-7b-16k

chinese-alpaca-2-1.3b-rlhf

chinese-llama-2-lora-7b-16k

chinese-electra-small-generator

chinese-electra-180g-small-generator

chinese-llama-2-lora-13b-16k

chinese-electra-small-ex-generator

chinese-legal-electra-large-discriminator

chinese-alpaca-2-7b-rlhf

chinese-electra-180g-large-generator

chinese-alpaca-2-7b-rlhf-gguf

chinese-electra-base-generator

chinese-alpaca-2-lora-13b-16k

chinese-electra-large-discriminator

chinese-llama-2-7b-64k-gguf

chinese-electra-large-generator

chinese-llama-2-13b-16k-gguf

chinese-electra-180g-small-ex-generator

cino-large-v2