IDEA-CCNL
Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1
Taiyi-CLIP-Roberta-102M-Chinese
Erlangshen-Roberta-110M-Sentiment
Erlangshen-Roberta-330M-Sentiment
Ziya LLaMA 13B V1
A large language model based on LLaMA for a wide range of natural language processing tasks, supporting English and Chinese.
Ziya-LLaMA-13B-Pretrain-v1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1

Ziya-LLaMA-13B-Pretrain-v1 is a large-scale pre-trained model based on LLaMA with 13 billion parameters. We optimized the LLaMA tokenizer for Chinese and incrementally pre-trained on 110B tokens of Chinese and English data on top of LLaMA-13B, which significantly improved its Chinese understanding and generation ability. Building on this model, the general-purpose Ziya-LLaMA-13B-v1 has further completed multi-task supervised fine-tuning (SFT) and human-feedback learning (RM, PPO), giving it capabilities such as translation, programming, text classification, information extraction, summarization, copywriting, common-sense Q&A, and mathematical calculation.

Note: To comply with the license of the LLaMA model released by Meta, we only release the incremental weights, i.e. the difference between the weights before and after continual pre-training. The final model can easily be obtained via a script (refer to the steps in Usage).
| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 通用 General | AGI模型 AGI model | 姜子牙 Ziya | LLaMA | 13B | English&Chinese |

The original data contains both English and Chinese: the English data comes from openwebtext, Books, Wikipedia, and Code, and the Chinese data comes from the cleaned Wudao dataset and a self-built Chinese dataset. After deduplication, model scoring, data bucketing, rule filtering, sensitive-topic filtering, and data evaluation, we obtained 125B tokens of valid data.

To address the low efficiency of the native LLaMA tokenizer when encoding and decoding Chinese, we added more than 7,000 commonly used Chinese characters to the LLaMA SentencePiece vocabulary. After deduplicating against the original LLaMA vocabulary, we obtained a vocabulary of 39,410 tokens, implemented by reusing the LlamaTokenizer in Transformers.

During incremental training, we used 160 A100 GPUs with 40GB of memory each, a batch size of 2.6M tokens, and FP16 mixed precision, reaching a throughput of 118 TFLOPS per GPU. This allowed us to incrementally train 110B tokens of data on top of the native LLaMA-13B model in just 8 days. To our knowledge, this is the largest-scale incremental training on LLaMA-13B to date.

Throughout the training process, we encountered various issues such as machine crashes, underlying framework bugs, and loss spikes.
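The vocabulary-extension step described above (append new Chinese characters, deduplicate against the base vocabulary, keep original token IDs stable) can be sketched in a few lines. This is a minimal illustration with made-up tokens, not the actual Fengshenbang tooling, which operates on the SentencePiece model behind Transformers' LlamaTokenizer:

```python
# Minimal sketch of the vocabulary-extension step: all tokens below are
# hypothetical stand-ins for the real LLaMA pieces and added characters.

def merge_vocab(base_vocab, extra_tokens):
    """Append extra tokens to the base vocabulary, skipping duplicates.

    Token IDs of the base vocabulary are preserved; new tokens receive
    the next free IDs, mirroring how an extended SentencePiece vocabulary
    keeps the original pieces stable.
    """
    seen = set(base_vocab)
    merged = list(base_vocab)
    for tok in extra_tokens:
        if tok not in seen:
            seen.add(tok)
            merged.append(tok)
    return merged

base = ["<s>", "</s>", "he", "llo"]   # stand-in for the 32,000-piece LLaMA vocab
extra = ["中", "文", "he"]            # stand-in for the 7,000+ added Chinese characters
merged = merge_vocab(base, extra)
print(len(merged))  # 6: "he" already exists, so only 2 new pieces are added
```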
However, we ensured the stability of the incremental training by making rapid adjustments. We have also released the loss curve from the training process to help others understand the issues that may arise.

Below is a comparison of the Ziya-LLaMA-13B-Pretrain-v1 model and the LLaMA model before continual pre-training, evaluated on the English benchmark HELM and on our Chinese multiple-choice evaluation datasets.

| Model | Mean win rate | MMLU | BoolQ | NarrativeQA | NaturalQuestions (closed-book) | NaturalQuestions (open-book) | QuAC | TruthfulQA | IMDB |
| -------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| LLaMA-13B | 0.500 | 0.424 | 0.718 | 0.440 | 0.349 | 0.591 | 0.318 | 0.326 | 0.487 |
| Ziya-LLaMA-13B-Pretrain-v1 | 0.650 | 0.433 | 0.753 | 0.445 | 0.348 | 0.528 | 0.335 | 0.249 | 0.497 |

| Model | In-context | C3 | Common sense | Chinese | Math | English | Physics | Chemistry | Biology | History | Politics | Geography |
|-------------------------|------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| LLaMA-13B | 0-shot | 0.4817 | 0.3088 | 0.2674 | 0.2882 | 0.3399 | 0.2581 | 0.2478 | 0.2271 | 0.3380 | 0.3275 | 0.296 |
| Ziya-LLaMA-13B-Pretrain-v1 | 0-shot | 0.5354 | 0.3373 | 0.2925 | 0.3059 | 0.3428 | 0.2903 | 0.2655 | 0.3215 | 0.4190 | 0.4123 | 0.4425 |
| LLaMA-13B | 5-shot | 0.5314 | 0.3586 | 0.2813 | 0.2912 | 0.4476 | 0.2939 | 0.2301 | 0.2330 | 0.3268 | 0.3187 | 0.3103 |
| Ziya-LLaMA-13B-Pretrain-v1 | 5-shot | 0.6037 | 0.4330 | 0.2802 | 0.2912 | 0.4363 | 0.2975 | 0.2802 | 0.3422 | 0.4358 | 0.4357 | 0.4540 |

Due to the license restrictions of the LLaMA weights, this model cannot be used for commercial purposes; please strictly follow LLaMA's usage policy. Because of these restrictions, we cannot release the complete model weights directly. Instead, building on the open-source FastChat tooling and further optimizing it, we computed and released the difference between the Ziya-LLaMA-13B-v1 weights and the original LLaMA weights. Users can obtain the full Ziya-LLaMA-13B-v1 weights with the following steps:

Step 1: Obtain the LLaMA weights and convert them into the Hugging Face Transformers format; you can refer to the conversion script (skip this step if you already have the Hugging Face weights).

Step 2: Download the delta weights of Ziya-LLaMA-13B-v1 and the original LLaMA weights converted in Step 1, then convert them using the following script: https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/utils/applydelta.py

Step 3: Load the model obtained in Step 2 for inference.

If you are using the resource for your work, please cite our paper:
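Conceptually, the delta application in Step 2 is just elementwise addition: final = base + delta for every parameter. The sketch below illustrates the idea with plain Python lists standing in for checkpoint tensors; the real logic lives in the linked applydelta.py script and operates on PyTorch state dicts:

```python
# Illustrative sketch of applying delta weights to base weights.
# Real checkpoints hold torch tensors; plain lists of floats are used
# here so the example stays self-contained.

def apply_delta(base_state, delta_state):
    """Return final weights: final[name] = base[name] + delta[name]."""
    if base_state.keys() != delta_state.keys():
        raise ValueError("base and delta checkpoints must have the same parameters")
    return {
        name: [round(b + d, 6) for b, d in zip(base_state[name], delta_state[name])]
        for name in base_state
    }

base = {"embed.weight": [0.1, -0.2], "lm_head.weight": [0.3, 0.0]}
delta = {"embed.weight": [0.05, 0.05], "lm_head.weight": [-0.1, 0.2]}
final = apply_delta(base, delta)
print(final["embed.weight"])  # [0.15, -0.15]
```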
Erlangshen-Roberta-110M-NLI
Randeng-Pegasus-238M-Summary-Chinese
Wenzhong-GPT2-110M
Randeng-BART-139M-SUMMARY
Taiyi-Stable-Diffusion-1B-Chinese-v0.1
Taiyi-CLIP-Roberta-large-326M-Chinese
Erlangshen-DeBERTa-v2-320M-Chinese
Erlangshen-DeBERTa-v2-710M-Chinese
Erlangshen-MegatronBert-1.3B-Sentiment
Randeng-Pegasus-523M-Summary-Chinese
Erlangshen-Roberta-330M-NLI
Randeng-T5-784M
Erlangshen-DeBERTa-v2-97M-Chinese
Taiyi-BLIP-750M-Chinese
Randeng-T5-Char-57M-Chinese
Taiyi-Stable-Diffusion-1B-Anime-Chinese-v0.1
Taiyi-Stable-Diffusion-XL-3.5B
Randeng-T5-77M
Randeng-T5-784M-MultiTask-Chinese
Erlangshen-MegatronBert-1.3B
Erlangshen-UniMC-RoBERTa-110M-Chinese
Randeng-Pegasus-523M-Chinese
Wenzhong2.0 GPT2 3.5B Chinese
Pretrained on the Wudao corpus and focused on NLG tasks, this is currently the largest Chinese GPT2.

| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 通用 General | 自然语言生成 NLG | 闻仲 Wenzhong | GPT2 | 3.5B | 中文 Chinese |

To obtain a powerful unidirectional language model, we adopted the GPT model structure and applied it to a Chinese corpus. Like Wenzhong-GPT2-3.5B, this model has 30 decoder layers and 3.5 billion parameters, which is larger than the original GPT2-XL. The difference is that we pre-trained this model on the Wudao (300G version) corpus. To the best of our knowledge, it is the largest Chinese GPT model currently available.

If you are using the resource for your work, please cite our paper:
Randeng-T5-77M-MultiTask-Chinese
Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese
Randeng-Pegasus-238M-Chinese
Ziya-BLIP2-14B-Visual-v1
- Ziya-BLIP2-14B-Visual-v1
- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1

The Ziya-Visual multimodal large model is trained on top of the general-purpose Ziya-LLaMA-13B-v1 model and has visual question-answering and dialogue capabilities. In March 2023, OpenAI released GPT-4, a multimodal large model with image-understanding capabilities; unfortunately, to date the vast majority of users still have no access to GPT-4's image input. Ziya-Visual draws on excellent open-source implementations such as Mini-GPT4 and LLaVA to add image understanding to Ziya, so that the Chinese user community can experience the capabilities of a large model that combines the two modalities of vision and language.

| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 多模态 Multi-Modal | 通用 General | 姜子牙-多模态 Ziya-Visual | BLIP2 LLaMA | 14B | English&Chinese |

This example demonstrates the model's image-understanding, knowledge, and composition abilities. In the first question, the model recognizes the picture as a scene from the movie Titanic and gives information about the movie's director, release date, and awards; in the second question, the model composes a modern love poem according to the user's request.

The next example demonstrates Ziya-Visual's ability to recognize and understand traditional Chinese culture.
The model identifies the information in the Chinese painting and, after being given the hint 清明上河图 (Along the River During the Qingming Festival), also gives the painter Zhang Zeduan and the historical background of the Northern Song Dynasty.

What if multiple images are provided for question answering? Ziya-Visual is also up to the task. In this example, Ziya-Visual demonstrates strong multi-image, multi-turn interaction ability: based on three images given by the user, it narrates a short story of a lady who encounters a mother cat and her kitten in a city night scene, talks with them, and then parts ways.

The biggest problems in training a Chinese visual question-answering model are the small amount and poor quality of available data. First, the Fengshenbang team cleaned and accumulated some high-quality data on top of open-source data. Second, we obtained part of an English-Chinese bilingual dataset through a translation API; although translated data suffers from problems such as "translationese", the bilingual ability of Ziya-v1 alleviates this in the final language output. Finally, the team combined advanced vision technologies such as BLIP and Grounded SAM to extract coarse-grained information from image captions and fine-grained information such as objects and spatial relations in images, converting them into language descriptions to construct another portion of high-quality data. In the end, Ziya-Visual constructed roughly 20 million high-quality samples for training.
Like Mini-GPT4 and LLaVA, Ziya-Visual-v1 is primarily a data-centric effort, so the quantity and quality of the data are very important.

To better combine the capabilities of the vision pre-trained model and the LLM, as in the Mini-GPT4 and LLaVA work, the training of Ziya-Visual-v1 follows the classic network structure and two-stage training paradigm proposed by BLIP2. Moreover, we found during our experiments that whether or not the Vision Encoder's parameters are trained has very little impact on the final generation quality. Therefore, in the overall model, the vision-processing part inherits the ViT + QFormer parameters from BLIP2 and the LLM part inherits the weights of Ziya-v1; both parts are frozen and do not participate in training. The component we mainly train is the visual projection layer. In the first stage, we use image-caption data to train the projection layer so that the image features extracted by the Vision Encoder are aligned with the text feature space of the LLM; in the second stage, we use image question-answering datasets to further fine-tune Ziya-Visual's visual-language abilities.
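The division of labor described above (frozen ViT + QFormer, frozen LLM, trainable projection layer) can be sketched abstractly. The toy code below uses made-up dimensions and is not the real Ziya-Visual implementation: only the projection, a single linear map from the vision feature space into the LLM embedding space, carries trainable parameters:

```python
# Conceptual sketch: the vision encoder and the LLM are frozen, and only
# the projection layer mapping visual features into the LLM's text
# feature space is trained. Dimensions and names are illustrative.
import random

class ProjectionLayer:
    """A single trainable linear map: vision feature -> LLM embedding."""
    def __init__(self, vision_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(vision_dim)]
                       for _ in range(llm_dim)]
        self.trainable = True  # the only module updated in both training stages

    def forward(self, vision_feature):
        # Plain matrix-vector product: project into the LLM embedding space.
        return [sum(w * x for w, x in zip(row, vision_feature))
                for row in self.weight]

# Frozen components are represented here only by their feature sizes.
VISION_DIM, LLM_DIM = 4, 6   # stand-ins for the real feature dimensions
proj = ProjectionLayer(VISION_DIM, LLM_DIM)
visual_tokens = proj.forward([0.5, -0.2, 0.1, 0.9])
print(len(visual_tokens))  # 6: visual features now live in the LLM embedding space
```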
First, the evaluation of VQA performance: the Ziya-Visual model outperforms VisualGLM on most metrics on both the Chinese and English test sets of GQA, while scoring lower on BLEU-4. This indicates that Ziya-Visual generates more generalized and accurate answers on most open-domain multimodal questions, while its answers to more divergent questions show greater autonomy. For the mPLUG-Owl model, the mPLUG-Owl 7B Instruction tuning (LoRA) version was used for English and the multilingual mPLUG-Owl 7B (Multilingual) Instruction tuning (LoRA) version for Chinese; its English evaluation scores are therefore higher than those of the bilingual Ziya-Visual. On the other hand, the LLaMA used by Ziya-Visual has better multilingual understanding and generation abilities, and a multilingual multimodal training corpus was introduced through translation tools in the second stage of Ziya-Visual's training, so it has an advantage on the Chinese evaluations.
Second, following the LLaVA approach, we used GPT-4 for scoring: the captions and object-detection box information from the COCO dataset are given to GPT-4; the answers of Ziya-Visual and VisualGLM to image questions are then fed to GPT-4, which is asked to rate them (on a scale of 1-10) for usefulness, relevance, accuracy, and level of detail. LLaVA divides the dialogue tasks into conv (simple dialogue), detail (detailed dialogue), and complex (complex reasoning), with "all" being the combined average score of the three. The final results show that Ziya-Visual outperforms VisualGLM in simple and detailed dialogues, loses slightly to VisualGLM in complex reasoning, and beats VisualGLM in the overall average. Comparing against mPLUG-Owl, we reach a similar conclusion: Ziya-Visual outperforms mPLUG-Owl in the overall average.

First, load the Ziya-Visual model. Note that the Visual-Ziya model repository contains only the parameters of the visual part; the parameters of the Ziya LLM part are obtained through Ziya-LLaMA-13B-v1.
Once we have the parameters for both parts of the model, we can load the model. After the model has been loaded, we can use the Ziya-Visual model. If you are using the resource for your work, please cite our paper:
Wenzhong2.0-GPT2-110M-BertTokenizer-chinese
Wenzhong-GPT2-3.5B
Erlangshen-SimCSE-110M-Chinese
Erlangshen-Roberta-330M-Similarity
Erlangshen-Roberta-330M-Causal-Chinese
Erlangshen-MegatronBert-1.3B-NLI
Erlangshen-DeBERTa-v2-186M-Chinese-SentencePiece
Erlangshen-MacBERT-325M-NLI-Chinese
Randeng-T5-Char-700M-Chinese
Erlangshen-UniMC-DeBERTa-v2-1.4B-Chinese
Randeng-Deltalm-362M-Zh-En
Erlangshen-Longformer-110M
Erlangshen-MegatronBert-3.9B-Chinese
Randeng-T5-784M-QA-Chinese
Taiyi-Diffusion-532M-Cyberpunk-Chinese
Zhouwenwang-Unified-1.3B
Taiyi-CLIP-RoBERTa-326M-ViT-H-Chinese
Randeng-BART-139M
Taiyi-Diffusion-532M-Nature-Chinese
Erlangshen-Ubert-330M-Chinese
Erlangshen-Ubert-110M-Chinese
Randeng-TransformerXL-5B-Deduction-Chinese
Erlangshen-DeBERTa-v2-97M-CWS-Chinese
Erlangshen-UniMC-Albert-235M-English
Ziya2-13B-Base
Randeng-T5-Char-57M-MultiTask-Chinese
Randeng-Pegasus-523M-Summary-Chinese-V1
Zhouwenwang-Unified-110M
Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1

Introduction

Ziya-LLaMA-7B-Reward is based on the Ziya-LLaMA model and was trained on the following preference-ranking data:

- 40,190 self-labeled high-quality preference-ranking samples
- 3,600 strictly filtered samples from external open-source data, including `OpenAssistant Conversations Dataset (OASST1)`, `Anthropic HH-RLHF`, `GPT-4-LLM`, and `webgptcomparisions`

The model is able to simulate a bilingual reward environment and provide accurate reward feedback on LLM generation results. It can more reliably detect low-quality generations, such as text repetition, truncated output, and failure to follow the instruction, and assign them lower reward values, and it can compare different generations for the same instruction and assign reward values according to their quality.

Limitation

Due to the limitations of the base model's capabilities and the training data, Ziya-LLaMA-7B-Reward also has some shortcomings: for example, the model has difficulty accurately judging the correctness of factual questions, and it is not accurate enough when judging generated texts of similar quality. The model ranks different generations for the same instruction fairly accurately, but comparisons across different types of instructions are harder; for example, the reward values of a correctly answered math question and an accurately answered writing question may not be close.
We will continue training to improve the model's capabilities.
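The card does not spell out the training objective, but reward models trained on preference rankings like those above typically use a pairwise ranking loss: the reward of the preferred response is pushed above that of the rejected one. A minimal sketch of that standard loss (an assumption about the general technique, not Ziya's confirmed recipe):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_ranking_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): near zero when the chosen
    response already scores clearly higher, large when the ranking is violated."""
    return -math.log(sigmoid(r_chosen - r_rejected))

# A good reward model gives a repetitive or truncated answer the lower
# reward, so the loss on a correctly ordered pair is small:
print(pairwise_ranking_loss(2.0, -1.0) < pairwise_ranking_loss(-1.0, 2.0))  # True
```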
Yuyuan-GPT2-3.5B
Ziya-Visual-Lyrics-14B
Randeng-DELLA-226M-Chinese
Erlangshen-MegatronBert-1.3B-Similarity
Erlangshen-UniMC-DeBERTa-v2-110M-Chinese
Erlangshen-TCBert-330M-Sentence-Embedding-Chinese
Taiyi-vit-87M-D
Ziya-LLaMA-13B-v1.1
Ziya (姜子牙) model series:

- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1

Brief Introduction

We have continued to optimize the Ziya-LLaMA-13B-v1 model and released the open-source version Ziya-LLaMA-13B-v1.1. By adjusting the proportions of the fine-tuning data and adopting a better reinforcement-learning strategy, this version has improved in question-answering accuracy, mathematical ability, and safety, as the detailed capability analysis in the figure below shows.

Note: After merging, three .bin files are generated by default, with MD5 values of 59194d10b1553d66131d8717c9ef03d6, cc14eebe2408ddfe06b727b4a76e86bb, and 4a8495d64aa06aee96b5a1cc8cc55fa7, respectively.

If you are using the resource for your work, please cite our paper:
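To verify the merge, you can check the three .bin files against the MD5 values listed above. A small helper (our own sketch, not part of the official release; the shard file names in the comment are hypothetical):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    multi-gigabyte checkpoint shards do not have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = [
    "59194d10b1553d66131d8717c9ef03d6",
    "cc14eebe2408ddfe06b727b4a76e86bb",
    "4a8495d64aa06aee96b5a1cc8cc55fa7",
]

# Hypothetical shard names; substitute the actual files produced by the merge:
# for path, want in zip(["model-1.bin", "model-2.bin", "model-3.bin"], EXPECTED):
#     assert md5_of_file(path) == want, f"checksum mismatch: {path}"
```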
Ziya-Writing-LLaMa-13B-v1
- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1

Ziya-Writing-LLaMa-13B-v1 is a 13-billion-parameter instruction fine-tuned model based on LLaMa with enhanced performance on writing tasks; it is a large model focused on writing. It can complete many types of writing tasks, including official reports, speeches and letters, creative copywriting, and more.

| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 写作 Writing | AGI模型 AGI model | 姜子牙 Ziya | LLaMA | 13B | English&Chinese |

We collected and cleaned a large amount of real human writing data from the internet, used GPT-3.5 to generate the corresponding writing instructions, and carried out extremely strict manual verification. On this basis, we used a reward model and certain cleaning logic to carefully select more challenging writing instructions, removing simple data and ensuring the diversity of the instructions. We also used the evol-instruct method to generate about 300,000 high-quality general instruction samples. We mixed the general instruction data with the writing instruction data, so that ziya-writing not only has good intent understanding but can also generate excellent responses.
In our experiments, we found that training the model with reinforcement learning on a small amount of high-quality, human-annotated writing ranking data can further raise the model's writing quality. To further improve the model, enabling it to fully understand human intentions and reduce "hallucinations" and unsafe outputs, we performed Human-Feedback Training (HFT) on top of the instruction fine-tuned model, mainly using reinforcement learning from human feedback (RM, PPO).

We implemented the HFT training pipeline on an internally developed framework, which can complete full-parameter training of Ziya-Writing-LLaMA-13B-v1 with as few as 8 A100 GPUs with 40GB of memory. In the PPO training, we did not limit the length of the generated samples, to ensure the accuracy of rewards on long-text tasks. The total experience pool for each training run exceeded 100k samples, ensuring the sufficiency of the training.

Judging the quality of written copy is fairly subjective and hard to measure with a single accuracy or satisfaction score. Therefore, we used an anonymized, multi-annotator side-by-side evaluation mechanism and collected 100 writing instructions of varying difficulty for evaluation; we will also release this evaluation set later. We use the win rate as an indicator of the quality of a model.
The formula for a model's win rate is:

Win Rate = (number of wins + number of draws / 2) / total number of annotations

Generally, since most language models generate responses by sampling, a win rate greater than 55% indicates that the model significantly outperforms the other model, a win rate less than 45% indicates that it clearly lags behind, and a win rate between 45% and 55% means the two models are essentially on par.

| Ziya-Writing-LLaMa-13B-v1 | Average win rate | Max win rate | Min win rate |
| :----: | :----: | :----: | :----: |
| vs Ziya-LLaMa-13B-v1.1 | 70.7 | 73.5 | 69 |
| vs baichuan-vicuna-7b | 69.6 | 73.5 | 68 |
| vs Moss-16B | 65.1 | 69 | 62 |
| vs ChatGLM2-6B | 58.3 | 61.5 | 56 |
| vs Minimax-abab5 | 52.3 | 53 | 50.5 |
| vs GPT-3.5-turbo | 44.7 | 49.5 | 38 |

(Note: the maximum and minimum win rates are computed separately from each individual annotator's results; the average win rate is computed by aggregating the results of all annotators.)

If you are using the resource for your work, please cite our paper:
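The win-rate formula above in code form, a trivial helper for reproducing the table's aggregation:

```python
def win_rate(wins, draws, total):
    """Win Rate = (number of wins + number of draws / 2) / total annotations."""
    return (wins + draws / 2) / total

# e.g. 70 wins and 10 draws out of 100 annotated side-by-side comparisons:
print(win_rate(wins=70, draws=10, total=100))  # 0.75
```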
Erlangshen-UniMC-MegatronBERT-1.3B-Chinese
Randeng-TransformerXL-1.1B-Paraphrasing-Chinese
Taiyi-Roberta-124M-D-v2
Ziya-Reader-13B-v1.0
- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1
- Ziya-Writing-LLaMa-13B-v1
- Ziya-Coding-15B-v1
- Ziya-Coding-34B-v1.0

Ziya-Reader-13B-v1.0 is a knowledge question-answering model: given a question and knowledge documents, it answers the question accurately, and it is suitable for both multi-document and single-document question answering. The model has an 8k context window; compared with models with longer windows, it wins on several long-text task evaluations, including multi-document question answering, a synthetic task (document retrieval), and long-text summarization.

The model is aimed mainly at scenarios such as knowledge-base QA, retrieval QA, and e-commerce customer service. It performs well on private-domain knowledge QA and can be widely applied in vertical domains such as law, finance, and healthcare, because it solves the problem in multi-document QA where answer accuracy drops sharply when the correct information is not in the first or last document.

In addition, the model's general abilities are also outstanding, so it can be used for general question answering. Its performance on our general-ability evaluation set surpasses that of Ziya-Llama-13B-v1.1.
| model | Multi-doc QA (%) | Synthetic task (%) | Summarization |
|:---:|:---:|:---:|:---:|
| GPT3.5-turbo-16k | 28.7 | 77.5 | 16.0 |
| Longchat-v1.5-7B-32k | 19.5 | 7.6 | 9.9 |
| Xgen-7B-8k | 11.0 | 3.5 | 2.2 |
| InternlM-7B-8k | 16.3 | 0.9 | 12.4 |
| ChatGLM2-6B-32k | 37.6 | 64.5 | 16.2 |
| Vicuna-v1.5-7B-16k | 19.3 | 5.0 | 15.1 |
| Ziya-Reader-13B-v1.0 | 44.7 | 98.5 | 15.6 |

Multi-doc QA is a multi-document question-answering task: given a question and multiple documents, the model answers based on the documents that contain the correct information. It measures the model's relevance judgment, memory, and question-answering ability. On this task, Ziya-Reader-13B-v1.0 leads all models by a wide margin, including models with longer windows.

The synthetic task is a document-retrieval task: given a summary, find the corresponding document among many documents. It measures the model's semantic matching ability. On this task, our model surpasses all open-source models.

Summarization is a long-text summarization task: given a meeting transcript with multiple speakers, generate a meeting summary over an extremely long context. On this task, our model is very competitive: with only an 8k context window, it is within 1% of models with 16k or longer windows, and it is the strongest among 8k-window models.

| model | LongBench Chinese Multi-doc QA (%) | LongBench Chinese Multi-doc QA, shuffled (%) |
|:---|:---:|:---:|
| gpt3.5-turbo-16k | 28.7 | 23.1 |
| chatGLM2-32k | 34.3 | 20.3 |
| Baichuan-13B-Chat2 | 32.4 | 27.2 |
| Ziya-Reader-13B-v1.0 | 44.7 | 40.9 |

We found that the documents in Multi-doc QA are arranged from high to low relevance, with the correct answer often in the first or first few documents, which does not truly reflect a model's relevance judgment.
Therefore, we shuffled the document order in this test set and re-evaluated each model. Most models' performance dropped significantly, by 5% to 17%, while our model proved very robust, dropping by less than 2%.

| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 问答QA,阅读理解MRC | AGI模型 AGI model | 姜子牙 Ziya | Llama2 | 13B | Chinese |

模型信息 Model Information

We used position interpolation (PI) to fine-tune on carefully selected long-document corpora, extending the context window to 8k. Models are fed by data: we filtered high-quality data from nearly ten million samples, and with only about 100k heavily filtered samples we could turn an unremarkable model into a strong knowledge question-answering model. In addition, we designed special tasks tailored to retrieval and carefully crafted data for them, so that the model learns to find the relevant documents and answer the question.

For more details, please read our official-account article: 姜子牙大模型系列 | 为知识检索而生,Ziya-Reader开源,多个长文本中文任务第一

For reading-comprehension question answering: put the question first, then the context (knowledge documents), and put the instruction last. When there are multiple retrieved passages, separate each passage with "<eod>\n" and mark its index with square brackets at the start, e.g. "[1] xxxxxxx<eod>\n". The generated output occasionally begins with phrases like "according to the information numbered xx above"; the real answer starts after "我的答案是" ("my answer is"), so truncate the preceding text when decoding.

dtype: bfloat16

If you are using the resource for your work, please cite our paper:
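The prompt layout described above (question first, numbered passages separated by "<eod>\n", instruction last) and the answer-truncation rule can be sketched as follows; the helper names are our own, not an official API:

```python
def build_prompt(question, passages, instruction):
    """Question first, then "[i] passage<eod>\n" blocks, instruction last."""
    numbered = "".join(f"[{i}] {p}<eod>\n" for i, p in enumerate(passages, 1))
    return f"{question}\n{numbered}{instruction}"

def extract_answer(generation, marker="我的答案是"):
    """Keep only the text after the answer marker, dropping any
    "according to the information numbered xx above" preamble."""
    idx = generation.find(marker)
    return generation[idx + len(marker):].strip() if idx != -1 else generation.strip()

prompt = build_prompt(
    "《清明上河图》的作者是谁?",
    ["doc about Zhang Zeduan", "doc about the Northern Song"],
    "请根据以上检索结果回答问题。",
)
print(extract_answer("根据上面编号为1的信息,我的答案是张择端。"))  # 张择端。
```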
Ziya-Visual-14B-Chat
Randeng-BART-139M-QG-Chinese
Randeng-T5-Char-700M-MultiTask-Chinese
Erlangshen-Roberta-110M-Similarity
Randeng-BART-759M-Chinese-BertTokenizer
Randeng-TransformerXL-5B-Abduction-Chinese
Randeng-PPVAE-1.2B-Augmentation-Chinese
Erlangshen-TCBert-110M-Classification-Chinese
Ziya2-13B-Chat
Erlangshen-TCBert-1.3B-Sentence-Embedding-Chinese
YuyuanQA-GPT2-3.5B
Erlangshen-Longformer-330M
Randeng-DELLA-CVAE-226M-NER-Chinese
Erlangshen-UniEX-RoBERTa-110M-Chinese
Erlangshen-ZEN1-224M-Chinese
Ziya-Writing-13B-v2
Randeng-MegatronT5-770M
Erlangshen-TCBert-110M-Sentence-Embedding-Chinese
Erlangshen-UniEX-RoBERTa-330M-Chinese
Ziya-Coding-15B-v1
Erlangshen-UniMC-RoBERTa-330M-Chinese
Erlangshen-UniMC-DeBERTa-v2-330M-Chinese
Yuyuan-Bart-139M
Erlangshen-MacBERT-325M-TextMatch-Chinese
Erlangshen-MacBERT-110M-BinaryClassification-Chinese
Erlangshen-ZEN2-345M-Chinese
Erlangshen-ZEN2-668M-Chinese
Randeng-DAVAE-1.2B-General-Chinese
Yuyuan-GPT2-110M-SciFi-Chinese
Ziya-Coding-34B-v1.0
- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1
- Ziya-Writing-LLaMa-13B-v1
- Ziya-Coding-15B-v1

Generating high-quality code from natural language is a frequent requirement when deploying large models. Today, the IDEA Research Institute's Fengshenbang team officially open-sources its latest code model, Ziya-Coding-34B-v1.0. We achieved a score of 75.5 on the HumanEval Pass@1 evaluation, surpassing GPT-4's score (67.0) and setting a new high among known open-source models. The Fengshenbang team is providing the community with advanced large-model technology and experience, helping to produce and customize more excellent vertical models and advancing the large-model ecosystem.

In early September, we open-sourced Ziya-Coding-15B-v1, a code model based on StarCoder-15B, and we transferred the training experience accumulated there to the training of the new version. We collected and constructed about 450,000 instruction samples covering almost all code-related tasks for the first stage of fine-tuning, including about 100,000 Chinese and 350,000 English instructions, ensuring data diversity.
When constructing the data, we made full use of high-quality non-instructional code data, using an LLM to generate the corresponding instructions and thereby expanding the pool of high-quality code-instruction data. During our experiments we noticed that the difficulty and correctness of code instructions are key to successfully training code models, so we introduced a second fine-tuning stage. We used the evol-instruct method to generate a large amount of high-difficulty, multi-requirement code-instruction data, used a code compiler as feedback to keep only code that compiles, and finally used an LLM to generate unit tests to further verify the correctness of the code. We ultimately filtered down to 46k samples and fine-tuned the first-stage model with a lower learning rate to obtain Ziya-Coding-34B-v1.0.

| Model | HumanEval (pass@1) |
|:----------------------------|:-----------------:|
| Ziya-Coding-34B-v1.0 | 75.5% |
| CodeFuse-CodeLlama-34B | 74.4% |
| Phind-CodeLLaMa-34B-v2 | 73.8% |
| WizardCoder-Python-34B-V1.0 | 73.2% |
| GPT-4 | 67.0% |
| PanGu-Coder2 15B | 61.6% |
| WizardCoder-15B-V1.0 | 59.8% |
| CodeLlama-34b-Python | 53.7% |
| Ziya-Coding-15B-v1 | 50.1% |
| CodeLlama-34b | 48.8% |
| GPT-3.5 | 48.1% |
| StarCoder-15B | 33.6% |

We decontaminated the fine-tuning dataset to avoid data leakage; the HumanEval pass@1 figures are the results of greedy generation. Thanks to the excellent work of the community, you can also use the quantized version of Ziya-Coding-34B-v1.0 trained by community developers.

If you are using the resource for your work, please cite our paper:
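The pass@1 numbers above come from greedy decoding. For sampled generations, the community standard is the unbiased pass@k estimator introduced with HumanEval: given n samples of which c are correct, pass@k = 1 - C(n-c, k)/C(n, k). A sketch of that estimator, provided as general background rather than as the team's stated evaluation script:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples
    (drawn without replacement from n, of which c are correct) passes."""
    if n - c < k:
        return 1.0  # too few failures to fill all k slots: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=5, k=1))  # 0.5
```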
Randeng-Deltalm-362M-En-Zh
Using the Fengshen-LM framework, we fine-tuned DeltaLM base on a collected Chinese-English dataset (30 million pairs in total) together with the IWSLT Chinese-English parallel data (200 thousand pairs), obtaining an English → Chinese translation model.

| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 通用 General | 自然语言转换 NLT | 燃灯 Randeng | Deltalm | 362M | 翻译任务 En-Zh |

Reference paper: DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

| datasets | BLEU |
| ---- | ---- |
| flores101-en-zh | 40.22 |

If you are using the resource for your work, please cite our paper:
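The score above is a corpus-level BLEU figure. As background, the core of BLEU is clipped n-gram precision combined with a brevity penalty; below is a simplified single-reference, sentence-level sketch for intuition only. Real evaluation should use a standard implementation such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: single reference, uniform weights,
    clipped n-gram precision, brevity penalty. Illustrative only."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if overlap == 0:
            return 0.0  # no smoothing: any empty order zeroes the score
        precisions.append(overlap / max(sum(hyp.values()), 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))  # 1.0: a perfect match scores 1
```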