# Clover-Hill

8 models

## MemoryDecoder Gpt2 Small

License: apache-2.0

## gpt2-xl-finetuned-wikitext103

A fine-tuned version of GPT2-XL on the WikiText-103 dataset.

- Base Model: GPT2-XL
- Training Dataset: WikiText-103
- Model Size: 1.5B parameters

| Model | Perplexity | Improvement |
|:------|:----------:|:-----------:|
| GPT2-XL (baseline) | 14.39 | - |
| GPT2-XL-Finetuned | 10.16 | -4.23 |

Training details:

- Training Data: WikiText-103 (103M tokens)
- Optimizer: AdamW
- Learning Rate: 1e-5 with cosine schedule

This model was released as part of the paper "Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models" (NeurIPS 2025). For more information, see https://github.com/LUMIA-Group/MemoryDecoder.
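
As a quick check of the numbers above, perplexity can be measured with the standard sliding-window evaluation. A minimal sketch, assuming the checkpoint is published under the repo id `Clover-Hill/gpt2-xl-finetuned-wikitext103` (inferred from this listing, not confirmed):

```python
# Sliding-window perplexity on the WikiText-103 test split, following the
# usual transformers evaluation recipe. The repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

repo = "Clover-Hill/gpt2-xl-finetuned-wikitext103"  # assumed repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo).eval()

test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

max_len, stride, seq_len = 1024, 512, ids.size(1)
nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    target = ids[:, begin:end].clone()
    target[:, : -(end - prev_end)] = -100  # score only the unseen tokens
    with torch.no_grad():
        nlls.append(model(ids[:, begin:end], labels=target).loss)
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```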

License: apache-2.0

## MemoryDecoder-Llama-finance

This Memory Decoder model is trained on the Finance domain and can be adapted to enhance any model in the Llama3, Llama3.1, and Llama3.2 families.

> [!IMPORTANT]
> These Llama models are initialized from Qwen models with the embedding layer adapted to fit the Llama tokenizer. This enables efficient cross-model family knowledge transfer.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder
Finance Domain Dataset: yahoofinancestockmarketnews

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3-8B | 8.63 | 4.32 |
| Llama3-70B | 6.87 | 4.01 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.1-8B | 8.46 | 4.30 |
| Llama3.1-70B | 6.68 | 3.97 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.2-1B | 11.85 | 4.85 |
| Llama3.2-3B | 9.70 | 4.45 |

Perplexity scores on the Finance domain test set. Lower is better.
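
The card does not spell out the combination rule, but the paper describes interpolating the memory decoder's next-token distribution with the base model's, in the spirit of kNN-LM. A minimal sketch under that assumption; the `MEMDEC` repo id is inferred from this listing and the mixing weight `LAM` is an illustrative placeholder:

```python
# Plug-and-play inference sketch: mix the base LM's next-token distribution
# with the memory decoder's. The interpolation rule and LAM are assumptions
# based on the paper's description, not the repo's documented API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3-8B"
MEMDEC = "Clover-Hill/MemoryDecoder-Llama-finance"  # id assumed from this listing
LAM = 0.5  # illustrative mixing weight

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).eval()
mem = AutoModelForCausalLM.from_pretrained(MEMDEC, torch_dtype=torch.bfloat16).eval()

@torch.no_grad()
def next_token(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    p_base = base(ids).logits[:, -1].float().softmax(-1)
    p_mem = mem(ids).logits[:, -1].float().softmax(-1)  # shared Llama vocab
    p = LAM * p_mem + (1 - LAM) * p_base                # interpolated distribution
    return tok.decode(p.argmax(-1))

print(next_token("The central bank announced that"))
```

Because only output distributions are combined, the same finance decoder plugs into any size in the family, which is why one checkpoint covers every row of the tables above.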

License: apache-2.0

## MemoryDecoder-Qwen-finance

This Memory Decoder model is trained on the Finance domain and can be adapted to enhance any model in the Qwen2 and Qwen2.5 families.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder
Finance Domain Dataset: yahoofinancestockmarketnews

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2-0.5B | 16.00 | 3.84 |
| Qwen2-1.5B | 10.96 | 3.61 |
| Qwen2-7B | 8.31 | 3.38 |
| Qwen2-72B | 6.62 | 3.20 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2.5-0.5B | 16.04 | 3.87 |
| Qwen2.5-1.5B | 11.20 | 3.61 |
| Qwen2.5-3B | 9.83 | 3.52 |
| Qwen2.5-7B | 8.61 | 3.42 |
| Qwen2.5-14B | 7.60 | 3.31 |
| Qwen2.5-32B | 7.38 | 3.29 |
| Qwen2.5-72B | 6.80 | 3.23 |

Perplexity scores on the Finance domain test set. Lower is better.
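
The "Base + MemDec" column reports perplexity of the combined model rather than the base model alone. A sketch of how such a number could be computed for a single document, reusing the interpolation assumption from the Llama card above (repo ids and `LAM` remain assumptions; the official evaluation script is in the GitHub repo):

```python
# Perplexity of one text under the interpolated base + memory-decoder
# distribution. Repo ids and LAM are illustrative assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-0.5B"
MEMDEC = "Clover-Hill/MemoryDecoder-Qwen-finance"  # id assumed from this listing
LAM = 0.5  # illustrative mixing weight

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE).eval()
mem = AutoModelForCausalLM.from_pretrained(MEMDEC).eval()

@torch.no_grad()
def combined_ppl(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    p = LAM * mem(ids).logits.float().softmax(-1) \
        + (1 - LAM) * base(ids).logits.float().softmax(-1)
    # NLL of each actual next token under the interpolated distribution
    logp = p[:, :-1].gather(-1, ids[:, 1:, None]).clamp_min(1e-20).log()
    return math.exp(-logp.mean().item())

print(combined_ppl("Shares of the lender fell 4% after the earnings call."))
```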

License: apache-2.0

## MemoryDecoder-Qwen-biomed

This Memory Decoder model is trained on the Biomedical domain and can be adapted to enhance any model in the Qwen2 and Qwen2.5 families.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder
Biomedical Domain Dataset: mimiciiidiagnosisanonymous

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2-0.5B | 18.41 | 3.75 |
| Qwen2-1.5B | 12.42 | 3.68 |
| Qwen2-7B | 8.36 | 3.59 |
| Qwen2-72B | 6.15 | 3.45 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2.5-0.5B | 17.01 | 3.74 |
| Qwen2.5-1.5B | 11.33 | 3.67 |
| Qwen2.5-3B | 9.70 | 3.63 |
| Qwen2.5-7B | 8.19 | 3.57 |
| Qwen2.5-14B | 7.01 | 3.51 |
| Qwen2.5-32B | 6.65 | 3.48 |
| Qwen2.5-72B | 5.90 | 3.44 |

Perplexity scores on the Biomedical domain test set. Lower is better.

License: apache-2.0

## MemoryDecoder-Llama-law

This Memory Decoder model is trained on the Law domain and can be adapted to enhance any model in the Llama3, Llama3.1, and Llama3.2 families.

> [!IMPORTANT]
> These Llama models are initialized from Qwen models with the embedding layer adapted to fit the Llama tokenizer. This enables efficient cross-model family knowledge transfer.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3-8B | 5.96 | 4.46 |
| Llama3-70B | 4.90 | 4.07 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.1-8B | 5.88 | 4.42 |
| Llama3.1-70B | 4.89 | 4.06 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.2-1B | 8.23 | 5.11 |
| Llama3.2-3B | 6.83 | 4.76 |

Perplexity scores on the Law domain test set. Lower is better.
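
The note above says the embedding layer of a Qwen-initialized decoder is adapted to fit the Llama tokenizer, but the card does not give the procedure. One common heuristic for such tokenizer transplants, shown purely as an illustrative sketch and not as the authors' confirmed method, initializes each Llama-token embedding from the mean of the Qwen embeddings of the source tokens its surface string decodes into:

```python
# Illustrative embedding-transplant heuristic: build a target-tokenizer
# embedding matrix from a source model's embeddings by mean-pooling.
# This is NOT the documented MemoryDecoder procedure, just a common baseline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
tgt_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
src_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
src_emb = src_model.get_input_embeddings().weight.detach()  # [V_src, d]

tgt_emb = torch.empty(len(tgt_tok), src_emb.size(1))
for tgt_id in range(len(tgt_tok)):
    text = tgt_tok.decode([tgt_id])  # surface string of the target token
    src_ids = src_tok(text, add_special_tokens=False).input_ids
    if src_ids:
        tgt_emb[tgt_id] = src_emb[src_ids].mean(0)  # mean of source embeddings
    else:
        tgt_emb[tgt_id] = src_emb.mean(0)           # fallback for empty pieces
```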

License: apache-2.0

## MemoryDecoder-Qwen-law

This Memory Decoder model is trained on the Law domain and can be adapted to enhance any model in the Qwen2 and Qwen2.5 families.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2-0.5B | 10.23 | 4.57 |
| Qwen2-1.5B | 7.69 | 4.32 |
| Qwen2-7B | 5.92 | 4.00 |
| Qwen2-72B | 4.84 | 3.69 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Qwen2.5-0.5B | 9.86 | 4.57 |
| Qwen2.5-1.5B | 7.42 | 4.29 |
| Qwen2.5-3B | 6.68 | 4.16 |
| Qwen2.5-7B | 5.94 | 4.01 |
| Qwen2.5-14B | 5.35 | 3.86 |
| Qwen2.5-32B | 5.18 | 3.81 |
| Qwen2.5-72B | 4.84 | 3.70 |

Perplexity scores on the Law domain test set. Lower is better.

License: apache-2.0

## MemoryDecoder-Llama-biomed

This Memory Decoder model is trained on the Biomedical domain and can be adapted to enhance any model in the Llama3, Llama3.1, and Llama3.2 families.

> [!IMPORTANT]
> These Llama models are initialized from Qwen models with the embedding layer adapted to fit the Llama tokenizer. This enables efficient cross-model family knowledge transfer.

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder
Biomedical Domain Dataset: mimiciiidiagnosisanonymous

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3-8B | 7.95 | 3.92 |
| Llama3-70B | 5.92 | 3.74 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.1-8B | 7.82 | 3.91 |
| Llama3.1-70B | 5.85 | 3.73 |

| Model | Base Model | Base + MemDec |
|-------|------------|---------------|
| Llama3.2-1B | 12.81 | 4.06 |
| Llama3.2-3B | 9.83 | 3.99 |

Perplexity scores on the Biomedical domain test set. Lower is better.

License: apache-2.0