DreamW1ngs
AdaR Qwen 2.5 Math 7B 9K
Large Language Models (LLMs) have shown impressive reasoning capabilities, yet they often rely on spurious reasoning, producing answers from superficial features and leading to failures in robustness and generalization. We propose the AdaR framework to enable adaptive reasoning, wherein models rely on problem-solving logic to produce answers. AdaR synthesizes logically equivalent queries by varying variable values and trains models with RLVR on these data to penalize spurious logic while encouraging...
DCSQE-ende-syn
DCSQE-heen-syn
DCSQE-zhen-syn
DCSQE-ende-sent
The Spearman correlation coefficient is used to select the best checkpoint for the sentence-level task. With the DCSQE framework, synthetic data is generated from the WMT2023 parallel corpus for pre-training; the model is then fine-tuned on the WMT2022 QE EN-DE training set. All training is implemented with the Fairseq framework. For a detailed description of the DCSQE framework, please refer to the paper "Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation".
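As a minimal sketch of the sentence-level selection metric above, here is how Spearman's rank correlation can be computed between gold and predicted quality scores. The scores and the helper function are illustrative, not taken from the DCSQE code; the no-ties closed-form rho = 1 - 6*sum(d^2)/(n*(n^2-1)) is assumed.

```python
# Hypothetical sketch (not from the model card): Spearman's rho between
# gold and predicted sentence-level quality scores, assuming no tied scores.

def spearman(gold, pred):
    """Spearman's rank correlation for score lists without ties."""
    def ranks(xs):
        # Rank each value 1..n by sorting indices by value.
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    n = len(gold)
    rg, rp = ranks(gold), ranks(pred)
    d2 = sum((a - b) ** 2 for a, b in zip(rg, rp))
    # Closed-form Spearman's rho for untied ranks.
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative gold vs. predicted quality scores:
gold = [0.82, 0.10, 0.55, 0.91, 0.33]
pred = [0.78, 0.25, 0.60, 0.70, 0.30]
rho = spearman(gold, pred)
print(f"Spearman rho: {rho:.2f}")  # -> Spearman rho: 0.90
```

A real evaluation would use a library implementation (e.g. `scipy.stats.spearmanr`), which also handles tied scores.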
DCSQE-ende-word
The Matthews correlation coefficient is used to select the best checkpoint for the word-level task. With the DCSQE framework, synthetic data is generated from the WMT2023 parallel corpus for pre-training; the model is then fine-tuned on the WMT2022 QE EN-DE training set. All training is implemented with the Fairseq framework. For a detailed description of the DCSQE framework, please refer to the paper "Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation".
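For the word-level selection metric above, a minimal sketch of the Matthews correlation coefficient over binary OK/BAD word tags (here encoded as 1/0). The tag sequences are illustrative, not DCSQE outputs.

```python
# Hypothetical sketch (not from the model card): MCC over binary
# word-level QE tags, OK = 1 and BAD = 0.
import math

def mcc(gold, pred):
    """Matthews correlation coefficient for two binary label lists."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    tn = sum(g == 0 and p == 0 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Illustrative gold vs. predicted word tags:
gold = [1, 1, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"MCC: {mcc(gold, pred):.3f}")  # -> MCC: 0.500
```

In practice a library implementation such as `sklearn.metrics.matthews_corrcoef` would be used.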
DCSQE-zhen-word
The Matthews correlation coefficient is used to select the best checkpoint for the word-level task. With the DCSQE framework, synthetic data is generated from the WMT2023 parallel corpus for pre-training; the model is then fine-tuned on the WMT2022 QE ZH-EN training set. All training is implemented with the Fairseq framework. For a detailed description of the DCSQE framework, please refer to the paper "Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation".
DCSQE-zhen-sent
The Spearman correlation coefficient is used to select the best checkpoint for the sentence-level task. With the DCSQE framework, synthetic data is generated from the WMT2023 parallel corpus for pre-training; the model is then fine-tuned on the WMT2022 QE ZH-EN training set. All training is implemented with the Fairseq framework. For a detailed description of the DCSQE framework, please refer to the paper "Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation".