LHL3341
Caco CodeGen
# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[Paper](https://arxiv.org/abs/2510.04081) · [NeurIPS](https://neurips.cc/) · [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

Caco-CodeGen is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

Traditional Chain-of-Thought (CoT) data often lacks verifiability and diversity. Caco addresses this by grounding reasoning in executable programs, enabling automatic correctness checks and scalable reasoning synthesis.

| Property           | Description                                                        |
| ------------------ | ------------------------------------------------------------------ |
| Model Type         | Code LLM (Code-Aware Generator)                                    |
| Base Model         | Qwen2.5-Coder-7B                                                   |
| Training Objective | Next-token prediction on executable reasoning traces               |
| Training Data      | Code CoTs extracted and unified from math and algorithmic datasets |
| Output Type        | Python-like executable reasoning steps (`codecot`)                 |
| Verification       | Code execution + output consistency filter                         |

Caco constructs reasoning data through three scalable stages:

1. Collect diverse seed reasoning traces (mathematical + algorithmic) and normalize them into a unified executable format.
2. Train a Code Generator to expand reasoning traces via Pattern-level Augmentation, which restructures logic (e.g., decomposition, reformulation, alternative solution paths).
3. Back-translate executable reasoning into natural language problems and solutions, and apply dual correctness verification.
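The "code execution + output consistency" filter can be pictured with a minimal sketch. The exact checks Caco performs are not spelled out here, so this assumes a Code CoT prints its final answer and that a sample is kept only when the code runs without error and the printed value matches the answer claimed for it. The names `run_code_cot` and `keep_sample` are hypothetical, not part of any released Caco API.

```python
import contextlib
import io


def run_code_cot(code: str) -> tuple[bool, str]:
    """Execute a Code CoT and capture what it prints.

    Returns (ok, output); ok is False if the code raised an exception.
    NOTE: exec() is for illustration only. Untrusted generated code
    should run in a sandboxed subprocess with a timeout.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__codecot__"})
    except Exception:
        return False, buffer.getvalue().strip()
    return True, buffer.getvalue().strip()


def keep_sample(code: str, claimed_answer: str) -> bool:
    """Keep a sample only if it executes and its output matches the claimed answer."""
    ok, output = run_code_cot(code)
    return ok and output == claimed_answer.strip()


# Hypothetical Code CoT: average speed over two legs of a trip.
sample = """
distance_km = 120 + 180
time_h = 2 + 3
print(distance_km / time_h)
"""

print(keep_sample(sample, "60.0"))       # consistent answer is kept
print(keep_sample(sample, "50.0"))       # inconsistent answer is filtered out
print(keep_sample("print(1 / 0)", "0"))  # runtime error is filtered out
```

Exact string matching is the simplest possible consistency check; a real pipeline would likely normalize numeric formats before comparing.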
Intended uses:

- Fine-tuning reasoning LLMs (math, logic, or code tasks)
- Verifiable reasoning data augmentation
- Program-based RL reward modeling (RLVR)
- Cross-domain reasoning transfer experiments

| Model                | MATH | Olympiad | Theorem-QA |
| -------------------- | ---- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2 | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | 82.4 | 46.5     | 46.0       |
| Llama3-8B-Caco       | 70.6 | 34.1     | 31.0       |

Models trained on Caco show consistent improvements across multiple reasoning benchmarks and domains.

License: Apache 2.0, free for academic and commercial use, with attribution.

Related resources:

- [Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
- Caco-1.3M Dataset

Future directions:

- Raising Difficulty: integrate harder datasets (AM-Thinking-distill, DAPO)
- Expanding Diversity: add science, proofs, procedural planning
- RL with Verifiable Rewards (RLVR): use code execution as a low-noise reward signal
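To illustrate the RLVR direction, here is a hedged sketch of what an execution-based reward could look like. It assumes a binary reward and a reference program that prints its result; neither assumption comes from the Caco paper, and `execution_reward` is a hypothetical name.

```python
import contextlib
import io
import math


def execution_reward(program: str, model_answer: str, rel_tol: float = 1e-6) -> float:
    """Return 1.0 if model_answer matches the program's printed result, else 0.0.

    Hypothetical reward shape: binary, compared numerically when both
    sides parse as floats and as exact strings otherwise.
    NOTE: exec() is illustrative; real training would sandbox execution.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(program, {})
    except Exception:
        return 0.0  # a reference program that fails gives no usable signal
    reference = buffer.getvalue().strip()
    candidate = model_answer.strip()
    try:
        matches = math.isclose(float(reference), float(candidate), rel_tol=rel_tol)
    except ValueError:
        matches = reference == candidate
    return 1.0 if matches else 0.0


program = "print(sum(k * k for k in range(1, 11)))"  # sum of squares 1..10
rewards = [execution_reward(program, a) for a in ["385", "385.0", "380"]]
print(rewards)
```

Because the reference value comes from running code rather than from a learned judge, the signal is deterministic for a given program, which is the sense in which it is low-noise.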