kalpeshk2011
dipper-paraphraser-xxl
This is the HuggingFace model release of our paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense".

Paper: https://arxiv.org/abs/2303.13408
Code: https://github.com/martiansideofthemoon/ai-detection-paraphrases
Usage instructions: https://github.com/martiansideofthemoon/ai-detection-paraphrases#running-the-paraphraser-model-dipper

DIPPER ("Discourse Paraphraser") is an 11B-parameter paraphrase generation model built by fine-tuning T5-XXL. DIPPER possesses two unique features that help its outputs evade AI-generated text detectors:

1. Paraphrasing long-form text in context: Most modern paraphrasers are trained exclusively on sentence-level data, ignoring discourse-level information. However, many critical use cases of LLMs involve generating long-form text in response to detailed user-specified prompts. We therefore train DIPPER to paraphrase paragraph-length texts, re-order content, and optionally leverage context such as input prompts.

2. Controlling output diversity: Another weakness of existing paraphrasers is that they lack an easy way to control output diversity. An attacker may want to apply just the minimum amount of lexical and syntactic modification necessary to evade a detection algorithm. DIPPER provides users with two intuitive scalar control knobs at inference time that are trained end-to-end: one controls the lexical diversity of the paraphrase, and the other controls the amount of content re-ordering.

We train DIPPER on the PAR3 dataset publicly released by Thai et al. (2022). This dataset contains multiple translations of non-English novels into English, aligned at the paragraph level (e.g., it contains both the Henry Morley and Robert Adams translations of Voltaire’s Candide), which we treat as paragraph-level paraphrases and use to train our paraphraser.

We suggest using the code below to use the model correctly:
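The sketch below follows the interface described in the usage instructions linked above: the two diversity knobs are embedded in the input string as control codes (each equal to 100 minus the requested diversity), the text is paraphrased a few sentences at a time, and earlier paraphrased output is fed back as context. The sampling settings and the three-sentence window mirror the reference script's defaults; treat anything not stated above as an assumption and defer to the linked instructions.

```python
import torch
from nltk.tokenize import sent_tokenize  # requires a one-time nltk.download("punkt")
from transformers import T5Tokenizer, T5ForConditionalGeneration

# The tokenizer is the base T5-XXL v1.1 tokenizer; the model is this release.
tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
model = T5ForConditionalGeneration.from_pretrained("kalpeshk2011/dipper-paraphraser-xxl")
model.cuda().eval()

def paraphrase(input_text, lex_diversity=60, order_diversity=0, prefix="", sent_interval=3):
    """Paraphrase `input_text` a few sentences at a time.

    lex_diversity / order_diversity must be multiples of 20 in [0, 100];
    the model was trained on control codes equal to 100 minus the diversity.
    `prefix` is optional context (e.g., the prompt) to condition on.
    """
    assert lex_diversity in range(0, 101, 20) and order_diversity in range(0, 101, 20)
    lex_code, order_code = 100 - lex_diversity, 100 - order_diversity

    sentences = sent_tokenize(" ".join(input_text.split()))
    prefix = " ".join(prefix.split())
    output = ""
    for i in range(0, len(sentences), sent_interval):
        window = " ".join(sentences[i:i + sent_interval])
        model_input = f"lexical = {lex_code}, order = {order_code} {prefix} <sent> {window} </sent>"
        batch = tokenizer([model_input], return_tensors="pt").to(model.device)
        with torch.inference_mode():
            out = model.generate(**batch, do_sample=True, top_p=0.75, top_k=None, max_length=512)
        decoded = tokenizer.batch_decode(out, skip_special_tokens=True)[0]
        # Grow the context with the paraphrase produced so far.
        prefix = f"{prefix} {decoded}".strip()
        output = f"{output} {decoded}".strip()
    return output

print(paraphrase("They had been walking for hours. The forest was dark, "
                 "and the trail had long since vanished.",
                 lex_diversity=60, order_diversity=40))
```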
dipper-paraphraser-xxl-no-context
Instruct Llama 7b Wdiff
This is the HuggingFace model release of the instruction-tuned LLAMA-7B model used in our paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation". Please refer to the README for instructions on how to set up the model (link). Credits to Yizhong Wang for originally training this model.
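Because this release is a weight diff ("Wdiff"), the original LLaMA-7B weights must be combined with it before use. Below is a minimal sketch of the usual add-the-diff recovery, assuming the diff was stored as (instruction-tuned minus base); the base path and repo id are placeholders, and the recovery script in the linked README is authoritative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths/ids for illustration only; see the linked README.
BASE = "path/to/llama-7b"                      # original LLaMA-7B weights
DIFF = "kalpeshk2011/instruct-llama-7b-wdiff"  # hypothetical id for this release

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
diff = AutoModelForCausalLM.from_pretrained(DIFF, torch_dtype=torch.float16)

# Assuming diff = finetuned - base, add the diff back parameter by parameter.
diff_state = diff.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        param.add_(diff_state[name])

base.save_pretrained("instruct-llama-7b-recovered")
AutoTokenizer.from_pretrained(BASE).save_pretrained("instruct-llama-7b-recovered")
```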