BlinkDL
rwkv-4-raven
rwkv-5-world
rwkv-4-world
RWKV-4 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
World = SomePile + SomeRedPajama + SomeOSCAR + AllWikipedia + all the ChatGPT data I can find.
XXXtuned = finetune of World on MC4, OSCAR, wiki, etc.
How to use:
- Use https://github.com/josStorer/RWKV-Runner for a GUI.
- Use the latest rwkv pip package (0.8.0+).
- Use https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark_world.py and https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_WORLD.py to test it.
The differences between World & Raven:
- Set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE; "rwkv_vocab_v20230424" is included in rwkv 0.7.4+).
- Use Question/Answer, User/AI, or Human/Bot for chat. DO NOT use Bob/Alice or Q/A.
- For the 0.1/0.4/1.5B models, use fp32 for the first layer (it will overflow in fp16 at the moment; fixable in the future), or bf16 if you have a 30xx/40xx GPU. Example strategy: cuda fp32 *1 -> cuda fp16
NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as one single token instead of ['\n','\n'].
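The greedy tokenizer note above boils down to longest-prefix matching over the vocabulary: at each position, the longest vocabulary entry that matches wins, which is why '\n\n' becomes one token rather than two. Below is a minimal sketch of that rule; the toy vocabulary and its token ids are hypothetical stand-ins (the real rwkv_vocab_v20230424 has tens of thousands of entries), not the actual RWKV implementation.

```python
# Sketch of greedy longest-match tokenization, the rule used by the new
# RWKV world tokenizer (rwkv_tokenizer.py). TOY_VOCAB is hypothetical.
TOY_VOCAB = {"\n": 1, "\n\n": 2, "U": 3, "User": 4, ":": 5, " ": 6, "hi": 7}

def greedy_encode(text: str, vocab: dict) -> list:
    """Repeatedly take the longest vocab entry that prefixes the remaining text."""
    max_len = max(len(t) for t in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first, then shorter ones.
        for l in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + l]
            if piece in vocab:
                tokens.append(vocab[piece])
                i += l
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return tokens

# '\n\n' is matched as ONE token (id 2), not ['\n', '\n']:
print(greedy_encode("User: hi\n\n", TOY_VOCAB))  # -> [4, 5, 6, 7, 2]
```

This is why trailing whitespace in a prompt matters: a stray space changes which longest match fires at the boundary, so the model sees a different token sequence than it saw in training.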
rwkv7-g1
These are BASE models (pretrained with web/code/synthetic + instruction/chat/reasoning data), suitable for post-training and fine-tuning (check https://huggingface.co/spaces/Jellyfish042/Uncheatabl...
rwkv-4-pile-14b
[UPDATE: Try RWKV-4-World (https://huggingface.co/BlinkDL/rwkv-4-world) for generation, chat & code in 100+ world languages, with great English zero-shot & in-context learning ability too.]
RWKV-4 14B is an L40-D5120 causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
RWKV-4-Pile-14B-2023xxxx-ctx8192-testxxx.pth : fine-tuned to ctxlen 8192. The best general model.
"Raven": RWKV alpaca+vicuna-style model: https://huggingface.co/BlinkDL/rwkv-4-raven (highly recommended). It is a strong chat model too. You can use +i for "Alpaca Instruct" in the latest ChatRWKV v2.
RWKV-4-Pile-14B-20230213-8019.pth : trained on the Pile for 331B tokens.
Pile loss 1.7579 (ctxlen 1024); LAMBADA ppl 3.81, acc 71.05%; PIQA acc 77.42%; SC2016 acc 75.57%; Hellaswag acc_norm 70.24%; WinoGrande acc 62.98%.
rwkv-4-pile-7b
[UPDATE: Try RWKV-4-World (https://huggingface.co/BlinkDL/rwkv-4-world) for generation, chat & code in 100+ world languages, with great English zero-shot & in-context learning ability too.]
RWKV-4 7B is an L32-D4096 causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
RWKV-4-Pile-7B-20230109-ctx4096.pth : fine-tuned to ctxlen 4096. Likely the best. Please test.
"Raven": RWKV alpaca+vicuna-style model: https://huggingface.co/BlinkDL/rwkv-4-raven (highly recommended). It is a strong chat model too. You can use +i for "Alpaca Instruct" in the latest ChatRWKV v2.
RWKV-4-Pile-7B-20230xxx-ctx8192-testxxx : fine-tuned to ctxlen 8192. Slightly weaker than the ctx4096 model when ctxlen < 3k.
RWKV-4-Pile-7B-20221115-8047.pth : trained on the Pile for 332B tokens.
Pile loss 1.8415; LAMBADA ppl 4.38, acc 67.18%; PIQA acc 76.06%; SC2016 acc 73.44%; Hellaswag acc_norm 65.51%.
Instruct-test models (OLD): only useful if you construct your prompt following the dataset templates. Note: I am using a "Q: instruct\n\nA: result" prompt for all instructs.
RWKV-4-Pile-7B-Instruct-test1 : instruct-tuned on https://huggingface.co/datasets/bigscience/xP3all/viewer/en/train
RWKV-4-Pile-7B-Instruct-test2 : instruct-tuned on https://huggingface.co/datasets/Muennighoff/flan & NIv2
RWKV-4-Pile-7B-EngChn-testNovel-xxx : for writing Chinese novels (trained on 200 GB of Chinese novels).
Rwkv 6 World
Use rwkv pip package 0.8.24+ for RWKV-6 inference: https://pypi.org/project/rwkv/ (pipeline = PIPELINE(model, "rwkv_vocab_v20230424") for rwkv-world models).
Online Demo 1: https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-2
Online Demo 2: https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-1
GUI: https://github.com/josStorer/RWKV-Runner (see Releases)
For developers: https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py
RWKV-6 7B v3 MMLU = 54.2% (using the same code as the 47.9% result below).
RWKV-6 7B v2.1 MMLU = 47.9%: https://github.com/Jellyfish042/rwkv_mmlu
RWKV-6 0.1B (using the pythia-160m tokenizer): https://huggingface.co/BlinkDL/temp-latest-training-models/blob/main/temp/rwkv-x060-173m-pile-20240515-ctx4k.pth
RWKV-6 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
World = SomePile + SomeSlimPajama + SomeStarCoder + SomeOSCAR + AllWikipedia + all the ChatGPT data I can find.
Recommended fine-tuning format (use \n for newlines):
A good chat prompt (replace any "\n\n" inside xxx with "\n", so that there are no blank lines inside xxx):
QA prompt (replace any "\n\n" inside xxx with "\n", so that there are no blank lines inside xxx):
!!! There must not be any space after your final ":" or you will upset the tokenizer and see a non-English response !!!
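The two prompt rules above (collapse "\n\n" inside the message, and never leave a space after the final ":") can be sketched as a small helper. The User/Assistant role names and the exact single-turn template below are my assumption of the commonly used RWKV world-model chat format; they are not quoted from this card, so check them against your fine-tuning data before relying on them.

```python
def build_chat_prompt(user_msg: str) -> str:
    """Build a single-turn chat prompt for an RWKV world model (hypothetical
    User/Assistant template; adjust role names to match your training data).

    - Blank lines inside the message are collapsed ('\n\n' -> '\n'), because
      '\n\n' is tokenized as one token and acts as the turn separator.
    - The prompt ends with 'Assistant:' and NO trailing space: a space after
      the final ':' upsets the tokenizer and can produce non-English output.
    """
    msg = user_msg.strip()
    while "\n\n" in msg:  # remove all blank lines inside the message
        msg = msg.replace("\n\n", "\n")
    return f"User: {msg}\n\nAssistant:"

prompt = build_chat_prompt("Hello!\n\nWho are you?")
print(repr(prompt))  # -> 'User: Hello!\nWho are you?\n\nAssistant:'
```

The same sanitization applies when you assemble fine-tuning samples: each turn ends with "\n\n", and no turn's body may contain a blank line.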
rwkv-7-world
rwkv-4-novel
Currently I am doing this for Chinese novels; more languages to come.
Use https://github.com/BlinkDL/ChatRWKV to run them. See https://github.com/BlinkDL/RWKV-LM for details on the RWKV Language Model (100% RNN).
RWKV-4-Novel-ChnEng : 50% Chinese + 50% Pile
RWKV-4-Novel-ChnEng-ChnPro : RWKV-4-Novel-ChnEng fine-tuned on high-quality professional Chinese novels
RWKV-4-Novel-Chn : 100% Chinese
temp-latest-training-models
rwkv-4-music
rwkv-4-pile-3b
rwkv-4-pileplus
clip-guided-binary-autoencoder
rwkv-4-pile-1b5
rwkv-6-misc
rwkv-5-music
Rwkv 8 Pile
RWKV-8 trained on the Pile with the "20B tokenizer" (332,115,325,534 tokens).