mesolitica
llama2-embedding-1b-8k
wav2vec2-xls-r-300m-mixed
translation-t5-small-standard-bahasa-cased-v2
Trained on a 1536 context length; able to translate Malay, pasar Malay (social media texts or local context), English, Manglish, Javanese, Banjarese and Indonesian into the target language. It is also able to maintain the text structure as-is and translate only the necessary text, e.g. inside programming code. Added more coding translation data, noisy b.cari.com.my translations, noisy ChatGPT4 translations and heavy post-filtering.
sentiment-analysis-nanot5-small-malaysian-cased
Malaysian-whisper-large-v3-turbo-v3
malay-parler-tts-mini-v1
Malaysian-Qwen2.5-7B-Reasoning-SFT
malaysian-whisper-small-v2
emotion-analysis-nanot5-small-malaysian-cased
bert-base-standard-bahasa-cased
llama2-embedding-2b-8k-contrastive
malaysian-whisper-small-v3
mallam-3b-20k-instructions
Malaysian-Podcast-Dia-1.6B
Full parameter finetuning of nari-labs/Dia-1.6B on the Malaysian Podcast subset of mesolitica/Malaysian-Emilia, where the voice-conversion permutation only selects pairs that are at least 80% similar. A complete tutorial on usage is at mesolitica/malaya-speech/Dia-TTS. 1. The finetuning was done in FP32-BF16 mixed precision training. 2. Multipacking encoder-decoder. 3. WandB at https://wandb.ai/huseinzol05/dia-tts-malaysian-emilia-full-mixed-precision-podcast Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/dia-tts Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
Malaysian-Llama-3.1-8B-Instruct
nanot5-base-malaysian-translation-v2
Qwen2.5-72B-Instruct-FP8
This is an FP8 Dynamic Quantization (A8W8) of https://huggingface.co/Qwen/Qwen2.5-72B-Instruct; we use it with vLLM==0.8.5.post1 and above.
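The core idea behind FP8 dynamic quantization is per-tensor rescaling at runtime: each tensor's maximum magnitude is mapped onto the FP8 E4M3 representable range (±448), and the scale is kept for dequantization. A minimal pure-Python sketch of that scaling step (function names are illustrative; real FP8 kernels also round values to the actual FP8 grid):

```python
# Illustrative sketch of per-tensor dynamic scaling into the FP8 E4M3
# range (max magnitude 448); real kernels additionally round each value
# to an FP8-representable number.
E4M3_MAX = 448.0

def quantize_dequantize(values):
    """Scale a tensor into the FP8 range and back (fake quantization)."""
    amax = max(abs(v) for v in values)
    # Dynamic: the scale is computed from the tensor itself at runtime.
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    # Quantize: divide by the scale and clamp into the representable range.
    q = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]
    # Dequantize: multiply back by the stored per-tensor scale.
    return [v * scale for v in q], scale

deq, scale = quantize_dequantize([0.5, -2.0, 3.0])
```

Without the rounding step this round-trips losslessly; the sketch only shows where the per-tensor scale comes from and how A8W8 applies it to both activations and weights.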
finetune-dependency-t5-tiny-standard-bahasa-cased
Malaysian-TTS-4B-v0.1
Continued pretraining of Qwen/Qwen3-4B-Base on mesolitica/Malaysian-TTS-v2. 1. Uses DistilCodec as the speech detokenizer; output at a 24k sample rate. 2. Optional controllable pitch and speed for each word. 3. Supports context switching between Malay and English. 4. Supports streamable text segments. 5. Supports `husein` and `idayu` speakers only. Training: 1. Dataset purely synthetic, generated using mesolitica/Malaysian-Podcast-Dia-1.6B. 2. Multipacking with proper document masking on a 4096 context length. 3. FP32-BF16 mixed precision training. 4. Full parameter finetuning. 5. WandB at https://wandb.ai/huseinzol05/Qwen-Qwen3-4B-Base-4k-TTS-distilcodec Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/qwen-tts Special thanks to https://www.sns.com.my and Nvidia for 1x H100!
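"Multipacking with proper document masking" means several documents share one packed training sequence, position ids restart at each document boundary, and the causal mask never lets tokens attend across document boundaries. A stdlib-only sketch of those two invariants (function names are illustrative, not the training code's API):

```python
def pack(doc_lengths):
    """Build per-token document ids and restarted position ids for one
    packed sequence, e.g. doc_lengths=[3, 2] -> positions [0, 1, 2, 0, 1]."""
    doc_ids, positions = [], []
    for d, length in enumerate(doc_lengths):
        doc_ids.extend([d] * length)
        positions.extend(range(length))
    return doc_ids, positions

def allowed(doc_ids, i, j):
    """Causal attention within a document only: token i may attend to
    token j iff j <= i and both tokens belong to the same document."""
    return j <= i and doc_ids[i] == doc_ids[j]

doc_ids, positions = pack([3, 2])
```

In practice this mask is materialized as a block-diagonal causal mask passed to SDPA, so packing never contaminates one document with another's context.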
translation-t5-small-standard-bahasa-cased
mallam-1.1B-4096
pos-t5-small-standard-bahasa-cased
malaysian-whisper-medium
Malaysian-Llama-3.2-1B-Instruct
ner-t5-small-standard-bahasa-cased
Malaysian-Dia-1.6B
Full parameter finetuning of nari-labs/Dia-1.6B on mesolitica/Malaysian-Emilia. A complete tutorial on usage is at mesolitica/malaya-speech/Dia-TTS. 1. The finetuning was done in FP32-BF16 mixed precision training. 2. Multipacking encoder-decoder. 3. WandB at https://wandb.ai/huseinzol05/dia-tts-malaysian-emilia-full-mixed-precision-multipacking-v2 Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/dia-tts Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
translation-t5-base-standard-bahasa-cased
jawi-nanot5-small-malaysian-cased
malaysian-whisper-tiny
VITS-female-singlish
nanot5-small-malaysian-translation-v2
mistral-embedding-191m-8k-contrastive
roberta-base-bahasa-cased
VITS-yasmin
roberta-tiny-bahasa-cased
nanot5-small-malaysian-cased
nanot5-small-malaysian-translation-v2.1
malaysian-whisper-base
nanot5-base-malaysian-translation-v2.1
MeloTTS-MS
MeloTTS continued training on MS, forked at https://github.com/malaysia-ai/MeloTTS-MS We uploaded full checkpoints with optimizer states under checkpoints.
finetune-mnli-nanot5-small
mallam-3B-4096
sentiment-analysis-nanot5-tiny-malaysian-cased
embedding-malaysian-mistral-64M-32k
malaysian-llama2-7b-32k-instructions
mallam-1.1b-20k-instructions-v2
finetune-mnli-t5-super-tiny-standard-bahasa-cased
VITS-osman
finetune-qa-t5-small-standard-bahasa-cased
nanot5-base-malaysian-cased
malaysian-debertav2-base
Malaysian-TTS-1.7B-v0.1
Continued pretraining of Qwen/Qwen3-1.7B-Base on mesolitica/Malaysian-TTS-v2. 1. Uses DistilCodec as the speech detokenizer; output at a 24k sample rate. 2. Optional controllable pitch and speed for each word. 3. Supports context switching between Malay and English. 4. Supports streamable text segments. 5. Supports `husein` and `idayu` speakers only. Training: 1. Dataset purely synthetic, generated using mesolitica/Malaysian-Podcast-Dia-1.6B. 2. Multipacking with proper document masking on a 4096 context length. 3. FP32-BF16 mixed precision training. 4. Full parameter finetuning. 5. WandB at https://wandb.ai/huseinzol05/Qwen-Qwen3-1.7B-Base-4k-TTS-distilcodec Samples: 1. output-idayu-chunk.mp3 2. output-husein-chunk.mp3 Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/qwen-tts Special thanks to https://www.sns.com.my and Nvidia for 1x H100!
Malaysian-TTS-0.6B-v1
Continued pretraining of mesolitica/Malaysian-TTS-0.6B-v0.1 on a more consistent dataset. 1. Uses DistilCodec as the speech detokenizer; output at a 24k sample rate. 2. Supports context switching between Malay and English. 3. Better pronunciation of letters. 4. Better tolerance of repetition. Speakers: 1. husein 2. idayu 3. singaporean 4. DisfluencySpeech 5. singlish-speaker2050 6. singlish-speaker2202 7. haqkiem, a private dataset. Training: 1. Multipacking with proper document masking on a 4096 context length. 2. FP32-BF16 mixed precision training. 3. Full parameter finetuning. 4. WandB at https://wandb.ai/huseinzol05/Malaysian-TTS-0.6B-v1 Samples: 1. husein-0.6b.mp3 2. idayu-0.6b.mp3 3. singaporean-0.6b.mp3 4. DisfluencySpeech-0.6b.mp3 5. singlish-speaker2050-0.6b.mp3 6. singlish-speaker2202-0.6b.mp3 7. haqkiem-0.6b.mp3 Notes: 1. This model was trained on normalized text, so if you have text such as `123`, you have to normalize it first into `one two three`, `one hundred twenty three`, `satu dua tiga` or `seratus dua puluh tiga`. Feel free to use Malaya for normalization; Malaya supports Malay and English normalization, read more at https://github.com/mesolitica/malaya/issues/247#issuecomment-3030313021 2. The repetitive pronunciation dataset does not consistently use commas for pauses. For example, `A, A, A, A, B, B` in our recordings is spoken as `A A A A B B`. We have no intention to improve this due to cost, but continued finetuning on a properly punctuated dataset should solve it. Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/qwen-tts Special thanks to https://www.sns.com.my and Nvidia for 1x H100!
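The normalization requirement above can be illustrated with a toy digit-by-digit normalizer, turning `123` into `satu dua tiga` before synthesis. This is a minimal stand-in for Malaya's normalizer, which additionally produces full number words like `seratus dua puluh tiga`:

```python
# Toy digit-by-digit normalizer illustrating why text such as `123`
# must be spelled out before being fed to the TTS model. Malaya's
# normalizer is the real tool; this sketch only covers bare digits.
MALAY_DIGITS = {
    '0': 'kosong', '1': 'satu', '2': 'dua', '3': 'tiga', '4': 'empat',
    '5': 'lima', '6': 'enam', '7': 'tujuh', '8': 'lapan', '9': 'sembilan',
}

def normalize_digits(text):
    """Replace whole-digit tokens with their spoken Malay digit names."""
    out = []
    for token in text.split():
        if token.isdigit():
            out.append(' '.join(MALAY_DIGITS[c] for c in token))
        else:
            out.append(token)
    return ' '.join(out)
```

For example, `normalize_digits('nombor 123')` yields `nombor satu dua tiga`, which matches one of the accepted normalized forms listed above.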
Malaysian-Qwen2.5-7B-Instruct
t5-tiny-standard-bahasa-cased
mallam-5B-4096
Malaysian-Qwen2.5-14B-Reasoning-GRPO
Online reinforcement learning using GRPO with full parameters on top of the warmup reasoning SFT https://huggingface.co/mesolitica/Malaysian-Qwen2.5-14B-Reasoning-SFT, on a highly curated Malaysian Reasoning dataset. 1. Multitask reasoning; each datapoint is replicated into 4 generations. 2. Actual online reinforcement learning. To get better performance, use the system prompt `You are going to enter reasoning mode. First, you try to think step-by-step in Malay. After that, put your final answer within $\\boxed{}$.` Finetuned on combine/combined-malaysian-reasoning.jsonl, the train set from mesolitica/Malaysian-Reasoning. Training: 1. GRPO with full parameters. 2. WandB at https://wandb.ai/huseinzol05/fpf-Malaysian-Qwen2.5-14B-Reasoning-SFT-GRPO Checkpoints: 1. Epoch 1.0, revision cc1032dfe961a56a3e33e36f03c37ed09b33c7fe 2. Epoch 2.0, revision 90896edeb1eb18cb48ac682ad606d4ec51172941 Source code at https://github.com/mesolitica/malaya/blob/master/session/qwen2.5/14b-grpo-fsdp.sh All the benchmarks are generated using vLLM; evaluation is based on sacrebleu CHRF max@5. Source code for the dialect evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-dialect Source code for the MalayMMLU evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-malaymmlu Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
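The group-relative idea behind GRPO, with each datapoint replicated into 4 generations, can be sketched as: sample a group of completions per prompt, then take each reward's deviation from the group mean, scaled by the group standard deviation, as the advantage. A stdlib-only sketch (names are illustrative, not the training script's API):

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: normalize each sampled
    completion's reward by its group's mean and standard deviation,
    so no learned value function (critic) is needed."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt replicated into 4 generations, as described above;
# rewards are 1.0 for a correct boxed answer, 0.0 otherwise.
adv = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions in the group get positive advantages and incorrect ones negative, which is what drives the policy update.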
pos-t5-tiny-standard-bahasa-cased
Malaysian-TTS-1.7B-v1
Continued pretraining of mesolitica/Malaysian-TTS-1.7B-v0.1 on a more consistent dataset. 1. Uses DistilCodec as the speech detokenizer; output at a 24k sample rate. 2. Supports context switching between Malay and English. 3. Better pronunciation of letters. 4. Better tolerance of repetition. Speakers: 1. husein 2. idayu 3. singaporean 4. DisfluencySpeech 5. singlish-speaker2050 6. singlish-speaker2202 7. haqkiem, a private dataset. Training: 1. Multipacking with proper document masking on a 4096 context length. 2. FP32-BF16 mixed precision training. 3. Full parameter finetuning. 4. WandB at https://wandb.ai/huseinzol05/Malaysian-TTS-1.7B-v1 Samples: 1. husein-v1.mp3 2. idayu-v1.mp3 3. singaporean-v1.mp3 4. DisfluencySpeech-v1.mp3 5. singlish-speaker2050-v1.mp3 6. singlish-speaker2202-v1.mp3 7. haqkiem-v1.mp3 Only `singlish-speaker2202` and `haqkiem` had to be generated twice to get output that follows the exact text input. Notes: 1. This model was trained on normalized text, so if you have text such as `123`, you have to normalize it first into `one two three`, `one hundred twenty three`, `satu dua tiga` or `seratus dua puluh tiga`. Feel free to use Malaya for normalization; Malaya supports Malay and English normalization, read more at https://github.com/mesolitica/malaya/issues/247#issuecomment-3030313021 2. The repetitive pronunciation dataset does not consistently use commas for pauses. For example, `A, A, A, A, B, B` in our recordings is spoken as `A A A A B B`. We have no intention to improve this due to cost, but continued finetuning on a properly punctuated dataset should solve it. Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/qwen-tts Special thanks to https://www.sns.com.my and Nvidia for 1x H100!
Malaysian-Qwen2.5-14B-Instruct
roberta-base-standard-bahasa-cased
Malaysian-Qwen2.5-3B-Instruct
malaysian-mistral-191M-4096
malaysian-mistral-7b-32k-instructions-v4
finetune-tatabahasa-t5-small-standard-bahasa-cased
malay-VITS-multispeaker
malaysian-mistral-7b-32k-instructions
VITS-multispeaker-clean
finetune-mnli-nanot5-base
Malaysian-TTS-0.6B-v0.1
malaysian-parler-tts-tiny-v1
gemma-3n-e4b-it-audio-encoder
malaysian-llama2-13b-32k-instructions
malaysian-tinyllama-1.1b-16k-instructions
malaysian-tinyllama-1.1b-16k-instructions-v2
Malaysian-Qwen2.5-14B-Reasoning-SFT
Continued finetuning of https://huggingface.co/mesolitica/Malaysian-Qwen2.5-14B-Instruct on a highly curated Malaysian Reasoning dataset. 1. Reasoning on Math, Science, Translation, Dialects, Multiple choice, coding and Maktabah Al Bakri. 2. Warmup reasoning. Finetuned on mesolitica/Malaysian-Reasoning to improve the model's reasoning in a Malaysian context. Training: 1. Full parameters on a 12k context length. 2. WandB at https://wandb.ai/huseinzol05/fpf-qwen2.5-14b-malaysian-12k-reasoning Source code at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5 All the benchmarks are generated using vLLM; evaluation is based on sacrebleu CHRF max@5. Source code for the dialect evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-dialect Source code for the MalayMMLU evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-malaymmlu Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
malaysian-mistral-7b-32k-instructions-v2
malaysian-distil-whisper-large-v3
Malaysian-Qwen2.5-7B-Speech-Instruct
Speech model on top of mesolitica/Malaysian-Qwen2.5-7B-Audio-Instruct. It is designed for general voice-assistant question answering: speech instructions and actual conversations related to coding, politics, chat assistance and general QA. - We use a frozen Whisper Large V3 encoder without any pooling, meaning 30 seconds of audio consumes 1500 tokens, i.e. 1 token equals 0.02 seconds. - The projection, embedding and LM head layers are fully finetuned. - LoRA for the other linear layers, with rank 64 and alpha 128. - Training was done with multipacking at a 10240 context length. - WandB at https://wandb.ai/huseinzol05/lora-embedding-64-audio-qwen2.5-7b-malaysian-10k-stage2 - Revision 513a900f40d372e8d7eb774e0561af043c704449 Datasets: 1. mesolitica/Malaysian-UltraChat-Speech-Multiturn-Instructions, 1 epoch. 2. mesolitica/Malaysian-Multiturn-Chat-Assistant, 1 epoch. 3. mesolitica/Malaysian-Speech-Instructions, 1 epoch. 4. mesolitica/Malaysian-Reasoning-Speech-Instructions, 1 epoch. 5. mesolitica/Malaysian-Speech-Description-Timestamp-Instructions, random sampling, 0.2 epoch. 6. mesolitica/Cantonese-Radio-Description-Instructions, random sampling, 0.2 epoch. 7. mesolitica/Emilia-Mandarin-Description-Instructions, random sampling, 0.2 epoch. 8. mesolitica/Malaysian-SFT/combined-malaysian-sft-5k-sample.jsonl, text corpus, 1 epoch. 9. mesolitica/Malaysian-Instructions/voiceassistant, text-only instructions, 1 epoch. 10. mesolitica/Malaysian-Instructions/mixedmanglish, text-only instructions, 1 epoch. 11. mesolitica/Malaysian-Instructions/manglish, text-only instructions, 1 epoch. 12. mesolitica/Malaysian-Instructions/longerrespond, text-only instructions, 1 epoch. In total 3.14B tokens (including text-only instructions), or 9584.595 audio hours.
You can try more examples at https://github.com/mesolitica/malaya-speech/tree/master/speech/speech-instructions We cover more examples, such as RAG multi-turn, forcing specific languages, voice assistant mode, reasoning and longer responses, at https://github.com/mesolitica/malaya/wiki/Malaysian-Speech-Instruct You can use this fork to serve the model in vLLM: https://github.com/mesolitica/vllm-llmaudio Source code at https://github.com/mesolitica/malaya/tree/master/session/audiollm
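The encoder token budget stated above (30 seconds of audio = 1500 tokens, since the frozen Whisper encoder is used without pooling) gives a quick way to estimate how much of the context length an audio clip consumes. A trivial sketch (function name is illustrative):

```python
# 30 s of audio -> 1500 encoder tokens, i.e. 50 tokens per second or
# 1 token per 0.02 s, as stated for the frozen Whisper Large V3 encoder
# used without any pooling.
TOKENS_PER_SECOND = 1500 / 30  # = 50.0

def audio_tokens(seconds):
    """Encoder tokens consumed by an audio clip of the given duration."""
    return int(seconds * TOKENS_PER_SECOND)
```

For instance, a 12.5-second clip costs 625 tokens, so several clips plus text instructions fit comfortably inside the 10240-token packed context used in training.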
gpt2-117m-bahasa-cased-v2
gpt2-117m-bahasa-cased
bert-tiny-standard-bahasa-cased
finetune-paraphrase-t5-tiny-standard-bahasa-cased
electra-base-generator-bahasa-cased
finetune-dependency-t5-small-standard-bahasa-cased
llama-1b-hf-32768-fpf
llama2-embedding-1b-8k-contrastive
malaysian-mistral-7b-32k-instructions-v3
conformer-tiny-ctc
t5-super-super-tiny-standard-bahasa-cased
VITS-female
malaysian-llama2-7b-32k-instructions-v2
llama-3-8b-8192-hf
VITS-haqkiem
finetune-paraphrase-t5-base-standard-bahasa-cased
finetune-summarization-t5-small-standard-bahasa-cased
finetune-true-case-t5-tiny-standard-bahasa-cased
nanot5-large-malaysian-cased
Malaysian-Llama-3.2-3B-Instruct
Malaysian-Qwen2.5-1.5B-Reasoning-GRPO
Online reinforcement learning using GRPO with full parameters on top of the warmup reasoning SFT https://huggingface.co/mesolitica/Malaysian-Qwen2.5-1.5B-Reasoning-SFT, on a highly curated Malaysian Reasoning dataset. 1. Multitask reasoning; each datapoint is replicated into 4 generations. 2. Actual online reinforcement learning. To get better performance, use the system prompt `You are going to enter reasoning mode. First, you try to think step-by-step in Malay. After that, put your final answer within $\\boxed{}$.` Finetuned on combine/combined-malaysian-reasoning.jsonl, the train set from mesolitica/Malaysian-Reasoning. Training: 1. GRPO with full parameters. 2. WandB at https://wandb.ai/huseinzol05/fpf-Malaysian-Qwen2.5-1.5B-Reasoning-SFT-GRPO Checkpoints: 1. Epoch 5.0, revision b4c3d2b391ff08141a0728c6f1868bffed313be6 Source code at https://github.com/mesolitica/malaya/blob/master/session/qwen2.5/1.5b-grpo-fsdp.sh Source code for the dialect evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-dialect Source code for the MalayMMLU evaluation at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5/evaluate-malaymmlu Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
Malaysian-orpheus-3b-0.1-ft
translation-t5-tiny-standard-bahasa-cased
mistral-embedding-349m-8k-contrastive
electra-small-discriminator-bahasa-cased
malaysian-mistral-349M-4096
malaysian-tinyllama-1.1b-siglip-large-384-vision
malaysian-mistral-64M-4096
malaysian-whisper-large-v2
Malaysian-Qwen2.5-1.5B-Instruct-v0.1
Malaysian-Qwen2.5-32B-Instruct
Malaysian-Qwen2.5-72B-Instruct
finetune-whisper-base-ms-singlish-v2
llama-2b-hf-32768-fpf
finetune-mnli-t5-small-standard-bahasa-cased
translation-nanot5-base-malaysian-cased
translation-nanot5-tiny-malaysian-cased
malaysian-mistral-474M-MLM-512
malaysian-whisper-medium-v2
Malaysian-Qwen2.5-1.5B-Instruct
Malaysian-Qwen2.5-1.5B-Reasoning-SFT
Malaysian-Qwen2.5-7B-Audio-Instruct
Audio model on top of mesolitica/Malaysian-Qwen2.5-7B-Instruct, focused on audio understanding; this introduces audio datasets to the LLM. - We use a frozen Whisper Large V3 encoder without any pooling, meaning 30 seconds of audio consumes 1500 tokens, i.e. 1 token equals 0.02 seconds. - The projection, embedding and LM head layers are fully finetuned. - LoRA for the other linear layers, with rank 64 and alpha 128. - Training was done with multipacking at an 8192 context length. - WandB at https://wandb.ai/huseinzol05/lora-embedding-64-audio-qwen2.5-7b-malaysian-8k Datasets: 1. mesolitica/AudioSet-Audio-Instruction, 1 epoch. 2. mesolitica/Classification-Speech-Instructions, 1 epoch. 3. mesolitica/Animal-Sound-Instructions, 3 epochs. 4. mesolitica/Transcription-Instructions, 1 epoch. 5. mesolitica/Speaker-Diarization-Instructions, 4 epochs. 6. mesolitica/Speech-Translation-Instructions, 2 epochs. 7. mesolitica/CoVoST2-Instructions, 1 epoch. 8. mesolitica/MusicBench-Instructions, 2 epochs. 9. mesolitica/Sampling-Multitask-National-Speech-Corpus-v1, 1 epoch. 10. mesolitica/Malaysian-Speech-Description-Timestamp-Instructions, 1 epoch. 11. mesolitica/Cantonese-Radio-Description-Instructions, 1 epoch. 12. mesolitica/Emilia-Mandarin-Description-Instructions, 1 epoch. 13. mesolitica/Audio-Adversarial-Instructions, revision 4536d60ab09a190e7d12536811be404062d5d38c, 1 epoch. 14. mesolitica/Zeroshot-Audio-Classification-Instructions, revision 7d22438bdcd697af1ce4281228860c6b8663fb76, 1 epoch. Because most of the datasets are about audio understanding, for end-to-end Speech-LLM chat instructions please use mesolitica/Malaysian-Qwen2.5-7B-Speech-Instruct. You can use this fork to serve the model in vLLM: https://github.com/mesolitica/vllm-llmaudio Source code at https://github.com/mesolitica/malaya/tree/master/session/audiollm
Malaysian-sesame-csm-1b
Full parameter finetuning of sesame/csm-1b on mesolitica/Malaysian-Emilia. 1. The finetuning was done in FP32-BF16 mixed precision training. 2. Multipacking decoder. 3. WandB at https://wandb.ai/huseinzol05/sesame-1b-malaysian-emilia-full-mixed-precision Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/sesame-tts Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
finetune-keyword-t5-base-standard-bahasa-cased
llama-7b-hf-2048-fpf
malaysian-llama-3-8b-instruct-16k
finetune-keyword-t5-small-standard-bahasa-cased
malaysian-parler-tts-mini-v1
Finetuned https://huggingface.co/parler-tts/parler-tts-mini-v1 on mesolitica/TTS. WandB at https://wandb.ai/huseinzol05/malaysian-parler-tts-mini-v1 Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts
t5-small-standard-bahasa-cased
t5-super-tiny-bahasa-cased
finetune-paraphrase-t5-small-standard-bahasa-cased
gpt2-355m-bahasa-cased
finetune-whisper-base-ms-singlish
VITS-orkid
VITS-bunga
VITS-tuah
VITS-male
translation-nanot5-small-malaysian-cased
emotion-analysis-nanot5-tiny-malaysian-cased
ner-t5-tiny-standard-bahasa-cased
mistral-7b-4096-fpf
llama2-embedding-600m-8k-contrastive
Malaysian-Llama-3.2-3B-Instruct-v0.2
Malaysian-Qwen2.5-0.5B-Instruct
Malaysian-gemma-3-1b-it
Malaysian-Podcast-sesame-csm-1b
Full parameter finetuning of sesame/csm-1b on the Malaysian Podcast subset of mesolitica/Malaysian-Emilia, where the voice-conversion permutation only selects pairs that are at least 80% similar. 1. The finetuning was done in FP32-BF16 mixed precision training. 2. Multipacking decoder. 3. WandB at https://wandb.ai/huseinzol05/sesame-1b-malaysian-emilia-full-mixed-precision-podcast Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/sesame-tts Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO
llama-13b-hf-2048-fpf
Malaysian-Llama-3.2-3B-Instruct-v0.1
finetune-extractive-qa-t5-base-standard-bahasa-cased
llama2-embedding-600m-8k
mallam-5b-20k-instructions
mallam-5b-20k-instructions-v2
malaysian-llama-3-8b-262k
t5-3x-super-tiny-standard-bahasa-cased
t5-small-bahasa-cased
finetune-isi-penting-generator-t5-base-standard-bahasa-cased
finetune-isi-penting-generator-t5-small-standard-bahasa-cased
electra-small-generator-bahasa-cased
wav2vec2-base-ms-singlish
finetune-qa-t5-base-standard-bahasa-cased
finetune-keyword-t5-tiny-standard-bahasa-cased
finetune-whisper-tiny-ms-singlish
finetune-whisper-tiny-ms-singlish-v2
VITS-jebat
nanot5-tiny-malaysian-cased
llama-7b-hf-32768-fpf
llama-600m-hf-32768-fpf
jawi-nanot5-tiny-malaysian-cased
constituency-parsing-t5-base-standard-bahasa-cased
malaysian-tinyllama-1.1b-siglip-large-384-vision-alignment
malaysian-mistral-siglip-base-384-vision-alignment
malaysian-mistral-474M-4096
reranker-malaysian-mistral-474M-32k
malaysian-mistral-64M-MLM-512
mnli-malaysian-mistral-191M-MLM-512
llava-v1.6-vicuna-13b-hf-awq
Malaysian-Llama-3.2-1B-Instruct-v0.2
Malaysian-Llama-3.1-8B-Instruct-Marlin
Malaysian-Llama-3.1-70B-Instruct
Malaysian-gemma-3-27b-it
Malaysian-Mistral-Small-3.1-24B-Instruct-2503
Continued finetuning of https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503 on a highly curated 1.5B-token Malaysian instruction dataset. 1. Supports responding in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu. 2. Able to code in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu. 3. Multi-turn Malaysian context, such as Malaysian legislation, politics, religions and languages. Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context. Training: 1. LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`. 2. Rank 256 with alpha 512, i.e. an alpha/rank scale of 2.0. 3. Multipacking at an 8192 context length with proper SDPA causal masking to prevent cross-document contamination and ensure proper position ids. 4. Chunked CCE loss for LoRA. 5. WandB at https://wandb.ai/huseinzol05/lora-embedding-256-Mistral-Small-3.1-24B-Instruct-2503-malaysian-8k Source code at https://github.com/mesolitica/malaya/tree/master/session/mistral3 Benchmarks (tables omitted here): based on 0-shot official MalayMMLU first-token accuracy, and on 0-shot exact first-token match using vLLM Guided Decoding. Special thanks to https://www.sns.com.my and Nvidia for an 8x H100 node!
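The "alpha of 2.0" mentioned above is the LoRA scaling factor alpha/rank = 512/256: LoRA adds a low-rank update scaled by this factor to the frozen weight's output. A minimal scalar sketch of where the factor sits (the 1-D shapes and names are illustrative, not the training framework's API):

```python
def lora_scale(alpha, rank):
    """LoRA scales its low-rank update B(Ax) by alpha / rank."""
    return alpha / rank

def lora_delta(x, a, b, alpha, rank):
    """Scalar toy LoRA update: delta = (alpha/rank) * b * (a * x),
    where a and b stand in for the rank-decomposed A and B matrices.
    Shows only where the alpha/rank factor enters the forward pass."""
    return lora_scale(alpha, rank) * b * (a * x)
```

With rank 256 and alpha 512 the update is simply doubled, which is why the card can describe the configuration equivalently as "alpha of 2.0".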