HoangHa
Pensez-v0.1-e5
# Pensez: Less Data, Better Reasoning - Rethinking French LLM

About | How to Run Locally | Models and Datasets | Benchmarks | Training Details

Paper: *Pensez: Less Data, Better Reasoning - Rethinking French LLM*

## About

Pensez is a bilingual (French-English) reasoning model designed to maximize efficiency with significantly reduced training data. The model leverages a curated dataset focusing on daily reasoning tasks and scientific questions to enhance performance.

Key strategies for improved reasoning:

- Concise reasoning for simple tasks to prevent overthinking.
- Extended reasoning for complex domains such as mathematics, coding, and science.
- Special tokens (` ... `) to explicitly guide the model's reasoning process.

These optimizations yield superior reasoning capabilities while maintaining robust general understanding, compared to models like DeepSeek-R1-Distill-Qwen-7B. Pensez is built upon Qwen2.5-7B-Instruct and trained over five epochs.

## Models and Datasets

| Model | Backbone | Size | Download Link |
|----------------|---------------------|------|-------------------|
| Pensez-v0.1-e1 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e1 |
| Pensez-v0.1-e2 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e2 |
| Pensez-v0.1-e3 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e3 |
| Pensez-v0.1-e4 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e4 |
| Pensez-v0.1-e5 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e5 |

Pensez was trained on the hand-curated Pensez v0.1 dataset, containing 2,000 samples (1,000 French, 1,000 English).
| Dataset | Description | Size | Link |
|-------------|----------------------|------------|----------------|
| Pensez v0.1 | SFT Training Dataset | 2K samples | 🤗 Pensez v0.1 |

## Benchmarks

Pensez was evaluated on French-specific benchmarks, demonstrating strong reasoning ability and improved task-specific performance:

| Benchmark | Pensez-v0.1-e5 | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B-Instruct |
|----------------|--------|--------|--------|
| Math-hard (fr) | 0.3458 | 0.3403 | 0.2253 |
| MMLU (fr)      | 0.5766 | 0.4961 | 0.6612 |
| BoolQA (fr)    | 0.9157 | 0.7079 | 0.9382 |
| Trivia (en)    | 0.4421 | 0.2711 | 0.5316 |
| HellaSwag (en) | 0.5050 | 0.3540 | 0.5258 |

Key observations:

- Pensez outperforms Qwen2.5-7B-Instruct on reasoning tasks.
- It is comparable to DeepSeek-R1-Distill-Qwen-7B on reasoning while retaining stronger general understanding.
- Degradation on knowledge-based tasks is reduced.

Full per-epoch results:

| Tasks | Pensez v0.1 e1 | Pensez v0.1 e2 | Pensez v0.1 e3 | Pensez v0.1 e4 | Pensez v0.1 e5 | Qwen 7B instruct | R1 distil |
|---------------------------------------------|--------|--------|--------|--------|--------|--------|--------|
| leaderboard_math_hard_fr                    | 0.0918 | 0.2547 | 0.2783 | 0.3035 | 0.3458 | 0.2253 | 0.3403 |
| leaderboard_math_algebra_hard_fr            | 0.1029 | 0.3914 | 0.3971 | 0.5114 | 0.5000 | 0.4229 | 0.4771 |
| leaderboard_math_counting_and_prob_hard_fr  | 0.0765 | 0.1378 | 0.1939 | 0.2041 | 0.2398 | 0.1224 | 0.2347 |
| leaderboard_math_geometry_hard_fr           | 0.0388 | 0.1019 | 0.1408 | 0.1359 | 0.1748 | 0.1019 | 0.2330 |
| leaderboard_math_num_theory_hard_fr         | 0.1198 | 0.2581 | 0.3502 | 0.3548 | 0.4332 | 0.3180 | 0.3963 |
| leaderboard_math_prealgebra_hard_fr         | 0.1681 | 0.4425 | 0.4690 | 0.4956 | 0.5841 | 0.3274 | 0.4867 |
| leaderboard_math_precalculus_hard_fr        | 0.0357 | 0.0714 | 0.1190 | 0.1190 | 0.1429 | 0.0595 | 0.2143 |
| leaderboard_mmlu_fr                         | 0.3806 | 0.3329 | -      | -      | 0.5766 | 0.6612 | 0.4961 |
| french_bench_arc_challenge                  | 0.5047 | 0.5021 | 0.4919 | 0.4859 | 0.4842 | 0.5518 | 0.3447 |
| french_bench_boolqa                         | 0.9326 | 0.9326 | 0.9326 | 0.9270 | 0.9157 | 0.9382 | 0.7079 |
| french_bench_fquadv2                        | 0.4325 | 0.4400 | 0.4412 | 0.4375 | 0.4387 | 0.4800 | 0.2988 |
| french_bench_hellaswag                      | 0.4970 | 0.5055 | 0.5092 | 0.5058 | 0.5050 | 0.5258 | 0.3540 |
| french_bench_trivia                         | 0.4763 | 0.4763 | 0.4553 | 0.4395 | 0.4421 | 0.5316 | 0.2711 |

## How to Run Locally

You can run Pensez using Hugging Face's `transformers` library.

## Training Details

Pensez was trained with:

- Packing Inputs Without Cross-Contamination Attention (Reference)
- Liger Kernel (Reference)
- DeepSpeed ZeRO Stage 3 (Reference)
- NEFTune Noise (Reference) for robustness

| Parameter | Value |
|---------------------|--------|
| Epochs | 5 |
| Global batch size | 200 |
| Learning rate | 1e-5 |
| Scheduler | Cosine |
| Optimizer | AdamW |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 16,384 |

## Acknowledgements

- llama-factory
- DeepSeek R1
- Qwen 2.5
- NEFTune Noise
- Packing Inputs Without Cross-Contamination Attention
- Liger Kernel
- DeepSpeed
- lm-evaluation-harness
- Hyperbolic
- Modal
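The card says Pensez can be run with Hugging Face's `transformers` library, but the snippet itself did not survive extraction. A minimal sketch under stated assumptions: the repo id `HoangHa/Pensez-v0.1-e5` is hypothetical (the card's 🤗 download links carry no URLs), and the model is assumed to use the standard Qwen2.5 chat template of its backbone.

```python
# Hypothetical repo id -- the card's download links do not include URLs.
MODEL_ID = "HoangHa/Pensez-v0.1-e5"


def build_messages(question: str) -> list:
    """Wrap a user question in the chat format used by Qwen2.5-based models."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 1024) -> str:
    """Load the model and answer a question (French or English)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example (downloads ~15 GB of weights; GPU recommended):
#   print(generate("Combien font 15 % de 240 ?"))
```

`max_new_tokens` is set generously because, per the card, the model emits an explicit reasoning trace before its final answer.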
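Among the training tricks listed above, NEFTune is the simplest to illustrate: during fine-tuning, uniform noise scaled by `alpha / sqrt(seq_len * dim)` is added to the token embeddings. The following is a rough NumPy sketch of the idea, not the card's actual training code; `alpha=5.0` is an illustrative value (the card does not state which alpha was used).

```python
import numpy as np


def neftune_noise(embeddings: np.ndarray, alpha: float = 5.0, rng=None) -> np.ndarray:
    """Add NEFTune-style uniform noise to a (seq_len, dim) embedding matrix.

    Noise is drawn from U(-scale, scale) with scale = alpha / sqrt(seq_len * dim).
    Applied only during training; inference uses the clean embeddings.
    """
    # Fixed seed by default so the sketch is reproducible.
    rng = rng if rng is not None else np.random.default_rng(0)
    seq_len, dim = embeddings.shape[-2], embeddings.shape[-1]
    scale = alpha / np.sqrt(seq_len * dim)
    return embeddings + rng.uniform(-scale, scale, size=embeddings.shape)
```

Because the noise magnitude shrinks with sequence length and embedding width, the perturbation stays small relative to the activations while still regularizing the fine-tune.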