GuoxinChen
ReForm-8B
🧩 ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization ReForm is a reflective Autoformalization framework that enables large language models to generate → verify → refine formal mathematical statements in an integrated self-corrective loop. It introduces Prospective Bounded Sequence Optimization (PBSO) — a novel reinforcement learning algorithm designed for heterogeneous rewards at different sequence positions — enabling stable, reflective training and substantial gains in semantic fidelity. - 🪞 Reflective Autoformalization Paradigm Turns single-pass translation into an iterative “generate–validate–refine” cycle, allowing the model to autonomously detect and correct semantic errors. - ⚖️ Prospective Bounded Sequence Optimization (PBSO) A reinforcement learning algorithm with position-specific rewards for both task accuracy and critique quality, ensuring stable and interpretable optimization. - 📈 State-of-the-art Semantic Consistency ReForm achieves an average +22.6pp improvement over the strongest baseline across four formalization benchmarks (miniF2F, ProofNet, Putnam, and AIME 2025). Figure: ReForm consistently outperforms Goedel-V2 and Kimina-Autoformalizer on all benchmarks. If you find ReForm useful for your research, please cite:
ReForm-32B
🧩 ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization ReForm is a reflective Autoformalization framework that enables large language models to generate → verify → refine formal mathematical statements in an integrated self-corrective loop. It introduces Prospective Bounded Sequence Optimization (PBSO) — a novel reinforcement learning algorithm designed for heterogeneous rewards at different sequence positions — enabling stable, reflective training and substantial gains in semantic fidelity. - 🪞 Reflective Autoformalization Paradigm Turns single-pass translation into an iterative “generate–validate–refine” cycle, allowing the model to autonomously detect and correct semantic errors. - ⚖️ Prospective Bounded Sequence Optimization (PBSO) A reinforcement learning algorithm with position-specific rewards for both task accuracy and critique quality, ensuring stable and interpretable optimization. - 📈 State-of-the-art Semantic Consistency ReForm achieves an average +22.6pp improvement over the strongest baseline across four formalization benchmarks (miniF2F, ProofNet, Putnam, and AIME 2025). Figure: ReForm consistently outperforms Goedel-V2 and Kimina-Autoformalizer on all benchmarks. If you find ReForm useful for your research, please cite: