t5-base-chinese-cluecorpussmall
1.1K
26
2 languages
—
by
uer
Language Model
OTHER
New
1K downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
Unknown
Mobile
Laptop
Server
Quick Summary
This is the set of Chinese T5 models pre-trained by UER-py, which is introduced in this paper.
Training Data Analysis
🔵 Good (6.0/10)
Researched training datasets used by t5-base-chinese-cluecorpussmall with quality assessment
Specialized For
general
multilingual
Training Datasets (1)
c4
🔵 6/10
general
multilingual
Key Strengths
- •Scale and Accessibility: 750GB of publicly available, filtered text
- •Systematic Filtering: Documented heuristics enable reproducibility
- •Language Diversity: Despite English-only, captures diverse writing styles
Considerations
- •English-Only: Limits multilingual applications
- •Filtering Limitations: Offensive content and low-quality text remain despite filtering
Explore our comprehensive training dataset analysis
View All DatasetsCode Examples
How to usepythontransformers
>>> from transformers import BertTokenizer, T5ForConditionalGeneration, Text2TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
>>> model = T5ForConditionalGeneration.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
>>> text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
>>> text2text_generator("中国的首都是extra0京", max_length=50, do_sample=False)
[{'generated_text': 'extra0 北 extra1 extra2 extra3 extra4 extra5'}]Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.