codelion
dhara-70m
gpt-2-70m
A 70M-parameter GPT-2 model trained on 1 billion tokens using an optimized 50-30-20 dataset mixing strategy. The model shows how careful dataset composition can make language model pretraining more efficient. It uses 10x fewer training tokens than GPT-2 (1B vs 10B), yet it achieves competitive performance by drawing on an optimized mixture of high-quality data sources.

Model details:
- Architecture: GPT-2
- Parameters: 70M (64.09M trainable)
- Layers: 12
- Hidden Size: 512
- Attention Heads: 8
- Context Length: 1024 tokens
- Vocabulary Size: 50,257

The model was trained on 1 billion tokens with the following composition:
- 50% FinePDFs (500M tokens): high-quality PDF content
- 30% DCLM Baseline (300M tokens): filtered web content
- 20% FineWeb-Edu (200M tokens): educational web content

This 50-30-20 mixing ratio was identified through systematic experimentation as optimal for balanced performance across multiple domains.

Training configuration:
- Total Tokens: 1,000,000,000
- Batch Size: 24 (effective: 120 with gradient accumulation)
- Learning Rate: 5e-4 → 5e-5 (cosine decay)
- Warmup Steps: 162 (2% of total)
- Precision: BFloat16
- Optimizer: AdamW
- Final Loss: 2.92

| Benchmark | Our Model | Random | GPT-2 | vs Random (pp) | vs GPT-2 (pp) |
|-----------|-----------|--------|-------|----------------|---------------|
| MMLU (5-shot) | 24.11% | 25.00% | 26.00% | -0.89 | -1.89 |
| HellaSwag (0-shot) | 27.03% | 25.00% | 30.00% | +2.03 | -2.97 |
| ARC-Challenge (0-shot) | 21.67% | 25.00% | 24.00% | -3.33 | -2.33 |
| PIQA (0-shot) | 57.29% | 50.00% | 63.00% | +7.29 | -5.71 |
| WinoGrande (0-shot) | 51.46% | 50.00% | 51.00% | +1.46 | +0.46 |
| TruthfulQA MC2 (0-shot) | 47.31% | 25.00% | 40.00% | +22.31 | +7.31 |
| Average | 38.15% | 33.33% | 39.00% | +4.81 | -0.85 |

The last two columns are differences in percentage points (pp).

- Performance Gap: only 0.85 percentage points behind the GPT-2 baseline average (39.00%)
- Efficiency: achieves 84.9% of GPT-2's average improvement over random guessing (4.81 vs 5.67 points)
- Data Efficiency: competitive results with 10x less training data
- TruthfulQA Excellence: 7.31 percentage points above the GPT-2 baseline, demonstrating superior factual accuracy

Key findings:
1. Data Quality > Quantity: The 50-30-20 mixing strategy demonstrates that careful dataset composition can achieve strong performance with significantly reduced compute
2. Factual Accuracy: The model excels at truthfulness (TruthfulQA), likely due to its large share of high-quality FinePDFs content (50%)
3. Practical Commonsense: Above-random scores on PIQA and WinoGrande show effective real-world reasoning
4. Knowledge Gaps: Below-random performance on MMLU and ARC-Challenge indicates insufficient academic/scientific knowledge at this scale

Limitations:
- Academic Knowledge: Limited performance on academic benchmarks (MMLU, ARC-Challenge)
- Training Scale: 1B tokens is insufficient for comprehensive world knowledge
- Parameter Count: 70M parameters may limit capacity for complex reasoning

For questions or issues, please open an issue on the model repository.
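The derived figures in this card (warmup steps as 2% of training, the table averages, and the 84.9% efficiency figure) can be rechecked with a short Python sketch. It assumes every sequence is packed to the full 1024-token context and that warmup is linear before the cosine decay. The card states neither assumption, and the helper names are illustrative rather than taken from the training code.

```python
import math

# Training figures from the card.
TOTAL_TOKENS = 1_000_000_000
CONTEXT_LEN = 1024
EFFECTIVE_BATCH = 120            # sequences per optimizer step (24 x gradient accumulation)
PEAK_LR, MIN_LR = 5e-4, 5e-5

# Assumption: every sequence is packed to the full context length.
TOTAL_STEPS = TOTAL_TOKENS // (EFFECTIVE_BATCH * CONTEXT_LEN)
WARMUP_STEPS = int(0.02 * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup (assumed shape), then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


# Benchmark accuracies (%) from the table: (ours, random, GPT-2).
SCORES = {
    "MMLU": (24.11, 25.00, 26.00),
    "HellaSwag": (27.03, 25.00, 30.00),
    "ARC-Challenge": (21.67, 25.00, 24.00),
    "PIQA": (57.29, 50.00, 63.00),
    "WinoGrande": (51.46, 50.00, 51.00),
    "TruthfulQA MC2": (47.31, 25.00, 40.00),
}


def averages(scores: dict) -> tuple:
    """Column means over all benchmarks, in the order (ours, random, GPT-2)."""
    n = len(scores)
    return tuple(sum(row[i] for row in scores.values()) / n for i in range(3))


def share_of_gpt2_gain(scores: dict) -> float:
    """Fraction of GPT-2's average gain over random that this model recovers."""
    ours, rand, gpt2 = averages(scores)
    return (ours - rand) / (gpt2 - rand)


if __name__ == "__main__":
    print(TOTAL_STEPS, WARMUP_STEPS)              # 8138 162
    print(f"{share_of_gpt2_gain(SCORES):.1%}")    # 84.9%
```

Under the packing assumption, 1B tokens at 120 × 1024 tokens per step gives about 8,138 optimizer steps. Two percent of that is 162, which agrees with the card's warmup figure.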
Llama-3.3-70B-o1
Llama-3.2-1B-Instruct-tool-calling-lora
gemma-3-1b-it-reasoning-grpo-lora
This LoRA adapter enhances google/gemma-3-1b-it with structured reasoning capabilities using thinking tags. It was trained with GRPO (Group Relative Policy Optimization) on self-generated preference data.

- Structured Thinking: Teaches the model to use thinking tags for chain-of-thought reasoning
- GRPO Training: Uses preference learning to optimize reasoning quality
- Self-Generated Data: No external datasets required; uses the Magpie approach
- Multi-Domain: Effective across mathematics, logic, science, and problem-solving

Model details:
- Base Model: google/gemma-3-1b-it
- Training Method: GRPO (Group Relative Policy Optimization)
- LoRA Rank: 64
- LoRA Alpha: 128
- Training Samples: 107
- Thinking Tag Usage: 60.0%
- Average Quality Score: 5.60

The model generates responses with structured thinking.

Training details:
- Method: GRPO (Group Relative Policy Optimization)
- Data Generation: Magpie approach with reasoning-focused prompts
- Preference Learning: Multiple responses ranked by reasoning quality
- Domains: Mathematics, logic puzzles, science, programming, philosophy
- Quality Scoring: Based on thinking tag usage, reasoning markers, and structure

The model was trained on self-generated reasoning problems across multiple domains:
- Mathematical problem-solving
- Logic puzzles and riddles
- Scientific analysis
- Programming challenges
- Philosophical reasoning
- Decision-making scenarios

Reasoning patterns:
- Step-by-step analysis: Breaking complex problems into smaller parts
- Causal reasoning: Using "because", "therefore", "since" connections
- Sequential thinking: "First", "next", "then", "finally" progression
- Structured output: Clear separation of thinking and final response

The adapter was evaluated on diverse reasoning tasks:
- Thinking tag usage rate: 60.0%
- Average reasoning quality score: 5.60
- Response comprehensiveness: 454 words on average

Resources:
- Dataset: codelion/gemma-3-1b-it-magpie-reasoning
- Base Model: google/gemma-3-1b-it
- Framework: PEFT
- Training Method: GRPO (Group Relative Policy Optimization)

This adapter is part of the Ellora project: standardized recipes for enhancing LLM capabilities.
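The Python sketch below illustrates the training signal this card describes: several responses per prompt are scored for reasoning quality and then compared within their group. The scoring heuristic, its weights, and the `<think>`/`</think>` tag names are hypothetical stand-ins, not the adapter's actual scorer. Only the group-relative normalization follows the usual GRPO formulation, where each reward has the group mean subtracted and is then divided by the group's standard deviation.

```python
import re
import statistics

# Hypothetical marker lists, taken from the reasoning patterns described in the card.
CAUSAL = ("because", "therefore", "since")
SEQUENTIAL = ("first", "next", "then", "finally")


def quality_score(response: str, open_tag: str = "<think>", close_tag: str = "</think>") -> float:
    """Hypothetical heuristic: reward tag usage, reasoning markers and structure.

    Tag names and weights are assumptions, not the adapter's actual scorer.
    """
    score = 0.0
    has_tags = open_tag in response and close_tag in response
    if has_tags:
        score += 3.0
    words = re.findall(r"[a-z]+", response.lower())
    score += min(2.0, 0.5 * sum(w in CAUSAL for w in words))      # causal connectives
    score += min(2.0, 0.5 * sum(w in SEQUENTIAL for w in words))  # sequencing words
    # Structure: a separate final answer follows the closing tag.
    if has_tags and response.split(close_tag, 1)[1].strip():
        score += 1.0
    return score


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """GRPO-style advantages: normalize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>First, 2 + 2 = 4 because addition.</think> The answer is 4.",
        "The answer is 4.",
    ]
    rewards = [quality_score(r) for r in group]
    print(rewards, group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, a response is pushed up only when it beats its siblings for the same prompt. This is what lets GRPO work without a separate value model.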
Qwen3-0.6B-accuracy-recovery-lora
Qwen3 4B Execution World Model LoRA
This LoRA adapter adds execution awareness capabilities to Qwen/Qwen3-4B-Thinking-2507. Inspired by Meta's CWM (Code World Model) research, it enables the model to predict and understand program execution step by step.

- Step-by-Step Execution Prediction: Predicts variable states at each line
- Dynamic World Model: Understands how code behaves at runtime
- Execution Tracing: Generates detailed execution traces with variable states
- Debugging Support: Can identify and explain execution behavior
- GRPO-Trained: Uses preference learning with real execution feedback

Model details:
- Base Model: Qwen/Qwen3-4B-Thinking-2507
- Training Method: GRPO (Group Relative Policy Optimization) with Real Execution Traces
- LoRA Rank: 64
- LoRA Alpha: 128
- Training Samples: 298
- Evaluation Samples: 323
- Execution Prediction Accuracy: 20.0%
- Mean State Accuracy: 33.3%

Training details:
- Method: GRPO (Group Relative Policy Optimization)
- Data: Self-generated code with real execution traces
- Epochs: 3
- Reward: Gradual scoring (0.0-1.0) based on execution accuracy

Training data:
- Python code (3-20 lines)
- Real execution traces via `sys.settrace()`
- Ground-truth variable states

Resources:
- Dataset: codelion/execution-world-model-dataset
- Base Model: Qwen/Qwen3-4B-Thinking-2507
- Project: Ellora Recipes

Part of the Ellora project: standardized recipes for enhancing LLM capabilities.
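The Python sketch below shows one way to collect ground-truth variable states with `sys.settrace()` and turn them into a graded 0.0-1.0 reward. `trace_states`, `state_accuracy_reward` and `running_total` are hypothetical helpers. The dataset's real trace format and the exact reward used for this adapter are not documented here, so the position-by-position exact-match scoring is an assumption.

```python
import sys


def trace_states(func, *args):
    """Run func(*args) and record its local variables as execution proceeds.

    Returns (result, states). Each state is an (event, line_number, locals) tuple.
    "line" entries are snapshots taken just before that line runs; the final
    "return" entry holds the locals at the moment the function exits.
    """
    states = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None                    # do not trace other functions' frames
        if event in ("line", "return"):
            states.append((event, frame.f_lineno, dict(frame.f_locals)))
        return tracer                      # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)             # restore any existing tracer (e.g. a debugger)
    return result, states


def state_accuracy_reward(predicted: list, actual: list) -> float:
    """Hypothetical graded reward in [0.0, 1.0]: the share of ground-truth states
    the model predicted exactly, compared position by position."""
    if not actual:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def running_total(n):
    """A small example program of the kind the card describes (3-20 lines)."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, states = trace_states(running_total, 3)
    for event, line, local_vars in states:
        print(event, line, local_vars)
```

A partial-credit reward of this kind gives the policy a signal even when only some steps of its predicted trace are right. This is consistent with the card's description of "gradual scoring".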
Llama-3.2-3B-o1
Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora
codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora

This LoRA adapter enhances Qwen/Qwen2.5-Coder-0.5B-Instruct to generate secure code by default. It was trained using GRPO (Group Relative Policy Optimization) with automated security analysis via Semgrep.

- Automated Security Analysis: Uses Semgrep for consistent vulnerability detection
- Self-Supervised Training: No manually curated secure/insecure datasets required
- Comprehensive Coverage: Addresses OWASP Top 10 and CWE Top 25 vulnerabilities
- Language Focus: Specialized for Python security patterns
- Preference Learning: GRPO training to prefer secure coding patterns

Model details:
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Method: GRPO with security-based preferences
- LoRA Rank: 64
- LoRA Alpha: 128
- Training Samples: 195
- Security Evaluation Pass Rate: 20.0%
- Average Security Score: 0.40 (lower is better)

| Vulnerability Type | Score | Status |
|--------------------|-------|--------|
| SQL Injection | 0 | ✅ |
| Command Injection | 0 | ✅ |
| Path Traversal | 2 | ✅ |
| Weak Cryptography | 0 | ✅ |
| Hardcoded Secrets | 0 | ✅ |

The model generates code with security best practices:
- SQL Injection Prevention: Parameterized queries, prepared statements
- Password Security: Bcrypt/Argon2 hashing, no plaintext storage
- Input Validation: Comprehensive validation and sanitization
- Error Handling: Safe error messages without information disclosure
- Secure Randomness: Using the `secrets` module instead of `random`
- Path Security: Proper path joining and validation
- Command Injection Prevention: Avoiding shell=True, using subprocess safely

Data generation:
- Method: Self-supervised with Magpie-style generation
- Scenarios: 7 security categories
- Analysis: Automated using Semgrep security rules
- Preference Pairs: Based on security score differences

GRPO training:
- Objective: Minimize security vulnerabilities while maintaining functionality
- Reward Signal: Negative correlation with Semgrep security score
- Batch Size: 1 with 8x gradient accumulation
- Learning Rate: 3e-06
- Epochs: 5

The adapter was evaluated on comprehensive security test cases:
- CWE Coverage: Top 25 most dangerous software weaknesses
- OWASP Alignment: Addresses OWASP Top 10 vulnerabilities
- Practical Scenarios: Real-world security challenges
- Pattern Recognition: Identifies and applies secure coding patterns

Limitations:
1. Language Focus: Currently optimized for Python; other languages may need additional training
2. Context Awareness: Best results with clear security-focused prompts
3. Not a Security Scanner: Complements but does not replace security tools
4. Continuous Updates: The security landscape evolves; periodic retraining is recommended

Resources:
- Dataset: codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Ellora Project: GitHub Repository
- Semgrep: Security Analysis Tool

This adapter is part of the Ellora project: standardized recipes for enhancing LLM capabilities.
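Several of the secure patterns listed in this card map directly onto Python standard-library calls. The sketch below illustrates those patterns and is not the adapter's actual output. The function names are hypothetical, and password hashing is left out because bcrypt and Argon2 require third-party packages.

```python
import secrets
import sqlite3
import subprocess
import sys
from pathlib import Path


def find_user(conn: sqlite3.Connection, username: str):
    """SQL injection prevention: bind the value as a parameter instead of formatting SQL."""
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchone()


def new_api_token() -> str:
    """Secure randomness: `secrets` draws from the OS CSPRNG, unlike the `random` module."""
    return secrets.token_urlsafe(32)


def safe_join(base_dir: str, user_path: str) -> Path:
    """Path security: resolve the joined path and reject anything outside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):            # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return target


def echo_untrusted(user_input: str) -> str:
    """Command injection prevention: pass an argument list and no shell=True, so
    `user_input` reaches the child process as one literal argument."""
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
    print(find_user(conn, "alice"), find_user(conn, "' OR '1'='1"))
    print(echo_untrusted("a; echo pwned"))
```

The injection-style inputs in the usage lines are treated as plain data. The SQL payload matches no row, and the shell metacharacters are printed literally rather than executed.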
qwen2-5-coder-0-5b-instruct-progressive-2000k-lora
Qwen3-0.6B-ICM-DPO-mlx-fp16
The model codelion/Qwen3-0.6B-ICM-DPO-mlx-fp16 was converted to MLX format from codelion/Qwen3-0.6B-ICM-DPO using mlx-lm version 0.25.3.
MathCoT
gemma-3-1b-it-ICM-DPO-mlx-fp16
The model codelion/gemma-3-1b-it-ICM-DPO-mlx-fp16 was converted to MLX format from codelion/gemma-3-1b-it-ICM-DPO using mlx-lm version 0.25.3.
Llama-3.3-70B-o1-gguf
This repository contains the GGUF quants for the Llama-3.3-70B-o1 model. You can use them for inference with local inference servers such as Ollama or llama.cpp.
Qwen3-4B-Instruct-2507-self-verify-lora
malm-165m
public-domain-mickey-mouse
Qwen3-0.6B-ICM-DPO
- Developed by: codelion
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3-0.6B

This qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
SmolLM2-70M
Qwen3-0.6B-PTS-DPO
gemma-3-1b-it-ICM-DPO
- Developed by: codelion
- License: apache-2.0
- Finetuned from model: unsloth/gemma-3-1b-it

This gemma3text model was trained 2x faster with Unsloth and Hugging Face's TRL library.
whisper-age-estimator
Qwen3-0.6B-GRPO-mlx-fp16
DeepSeek-R1-Distill-Qwen-1.5B-PTS-DPO
- Developed by: codelion
- License: apache-2.0
- Finetuned from model: unsloth/DeepSeek-R1-Distill-Qwen-1.5B

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.