NYTK
named-entity-recognition-nerkor-hubert-hungarian
sentiment-hts5-xlm-roberta-hungarian
sentiment-hts5-hubert-hungarian
PULI-GPTrio
Language support for Hungarian and English.
sentence-transformers-experimental-hubert-hungarian
PULI-BERT-Large
sentiment-ohb3-hubert-hungarian
text-generation-news-gpt2-small-hungarian
translation-m2m100-1.2B-multi12-hungarian
translation-bart-128-en-hu
PULI-GPT-2
translation-bart-hu-en
PULI-GPT-3SX
summarization-hi-mbart-large-50-hungarian
text-generation-poem-petofi-gpt2-small-hungarian
PULI-LlumiX-Llama-3.1
translation-nllb-200-3.3B-multi12-hungarian
PULI-LlumiX-32K
License: llama2. Language: hu.
reading-comprehension-hurc-hubert-hungarian
morphological-generator-emmorph-mt5-hungarian
hucola-puli-bert-large-hungarian
morphological-generator-ud-mt5-hungarian
hurte-puli-bert-large-hungarian
PULI-HuBA-mamba-130M
PULI-HuBA 130M is a monolingual Hungarian foundation model based on the Mamba configuration (https://huggingface.co/state-spaces/mamba-130m-hf).

Parameters:

    MambaForCausalLM(
      (backbone): MambaModel(
        (embeddings): Embedding(52000, 768)
        (layers): ModuleList(
          (0-23): 24 x MambaBlock(
            (norm): MambaRMSNorm(768, eps=1e-05)
            (mixer): MambaMixer(
              (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
              (act): SiLU()
              (in_proj): Linear(in_features=768, out_features=3072, bias=False)
              (x_proj): Linear(in_features=1536, out_features=80, bias=False)
              (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
              (out_proj): Linear(in_features=1536, out_features=768, bias=False)
            )
          )
        )
        (norm_f): MambaRMSNorm(768, eps=1e-05)
      )
      (lm_head): Linear(in_features=768, out_features=52000, bias=False)
    )

The model was trained on a ~3.48B-token, toxicity-filtered, deduplicated, and semantically segmented dataset.

- License: Apache 2.0
- Hardware: 4 × NVIDIA A100 (80 GB) GPUs
- Year of training: 2024
- Input/output: text only
- Parameter count: 130 million
- Available model size: single variant
- Data type: float32
- Batch size: 10 per GPU
- Learning rate: 3e-4 (reference: GitHub issue)
- Limitations: potential for biased, incorrect, or harmful content generation

To generate text with this model using Hugging Face's `pipeline`, use the following Python code:

If you have any questions, please contact me: [email protected] or [email protected]
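A minimal sketch of the `pipeline` call referenced above; the Hub model ID `NYTK/PULI-HuBA-mamba-130M` and the prompt are assumptions based on this listing, not taken from the original card:

```python
from transformers import pipeline

# Model ID assumed from this listing (NYTK organization, PULI-HuBA-mamba-130M entry).
generator = pipeline("text-generation", model="NYTK/PULI-HuBA-mamba-130M")

prompt = "Magyarország fővárosa"  # "The capital of Hungary"; illustrative prompt
print(generator(prompt, max_new_tokens=50)[0]["generated_text"])
```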
PULI-Trio-Q
- Trained with LLaMA-Factory (GitHub)
- The Qwen2.5 7B Instruct model was continually pretrained on a Hungarian dataset (a rough sketch of this step follows after the citation note below)
- Hungarian (8.08 billion words): documents (763K) that exceed 5,000 words in length + Hungarian Wikipedia
- English: Long Context QA (2 billion words), BookSum (78 million words)
- Chinese (3 billion Chinese characters): Wudao
- The training was completed using a Hungarian-only dataset:
  - 626 million Hungarian words (1 epoch): Hungarian Wikipedia + news articles

Citation

If you use this model, please cite the following paper:
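As a rough illustration of the continual pretraining described above: the sketch below uses the plain Hugging Face `Trainer` rather than LLaMA-Factory, and the corpus file name, context length, and hyperparameters are illustrative assumptions, not the configuration used for PULI-Trio-Q.

```python
# Minimal sketch of continued (causal LM) pretraining in the spirit described above.
# The actual model was trained with LLaMA-Factory; this uses the plain Trainer instead.
# File names and hyperparameters below are assumptions, not the authors' settings.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "Qwen/Qwen2.5-7B-Instruct"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Hypothetical Hungarian plain-text corpus, one document per line.
raw = load_dataset("text", data_files={"train": "hu_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="puli-trio-q-cpt",   # illustrative output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,             # card: 1 epoch over the Hungarian-only data
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```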