oopere

19 models • 6 total models in database

Sort by:

pruned20-llama-3.2-3b

This model is a pruned version of the Llama-3.2-3b model, with a parameter reduction of 20% in the MLP Layers. The pruning process aims to enhance computational efficiency while maintaining acceptable performance across specific tasks. This model is not intended to be used directly, but rather to be fine-tuned for specific tasks where it can achieve equal or superior performance compared to fine-tuning the base model for the same task. - Model Type: Pruned version of LLaMA-3.2 using structured pruning - Original Model: meta-llama/Llama-3.2-1B - Pruning Method: Structured pruning of MLP layers using importance scores based on absolute maximum weights - Size Reduction: 13.1% (from 3.21B to 2.79B parameters) - Architecture: Same as original LLaMA but with reduced MLP layer sizes - Language(s): Same as original model - License: Same as original model - Developed by: Pere Martra These models are part of the study "Exploring GLU Expansion Ratios: Structured Pruning in Llama-3.2 Models". They explore structured pruning in GLU-based architectures using Llama-3.2 (1B and 3B variants). The pruning experiments target optimal expansion ratios to balance performance, computational efficiency, and environmental sustainability. The models were evaluated across multiple benchmarks, including BoolQ, ARC-Easy, and MUSR, and demonstrate significant efficiency gains while maintaining robust task performance. | Benchmark | Original Model | Pruned Model | Relative Change | | ---- | ---- | ---- | ---- | | ARC-Easy | 65.19% | 58.54% | -10.2% | | BoolQ | 64.16% | 39.97% | -37.7% | | LAMBADA-OpenAI | 62.20% | 54.94% | -11.7% | | LAMBADA-Standard | 53.46% | 49.25% | -7.9% | Key Findings - The pruned model shows a moderate degradation on reasoning tasks (ARC-Easy) but maintains reasonable performance relative to its size reduction. - Performance on binary classification tasks (BoolQ) is more significantly impacted, indicating limitations for such use cases. - For language completion tasks (LAMBADA), the model experiences mild to moderate degradation but remains usable for less demanding applications. Limitations - Reduced performance on tasks requiring complex reasoning or classification: Tasks such as BoolQ see significant drops in accuracy. - Impacts on long-range comprehension: While less severe than BoolQ, tasks like LAMBADA show noticeable degradation. - Limited utility for high-accuracy applications: The pruned model is less suitable for scenarios demanding peak performance in understanding or generating complex language. Implementation Details - Pruning Notebook: Detailed implementation and methodology - GitHub Repository: LLM Course - Article explaining pruning methodology: How to Prune LLaMA 3.2 and Similar Large Language Models Pruning Method - Technique: Structured pruning targeting MLP layers - Pruning Ratio: 20% of neurons removed from MLP layers - Selection Criteria: Importance scoring based on absolute maximum weights - Architecture Specifics: Maintained GLU structure during pruning Hardware Requirements - Reduced memory footprint compared to original model - Can run on hardware with ~15% less memory than original Acknowledgments - Thanks to Mariusz Kurman for creating llama-pruning, a library that extends and improve this pruning methodology.

NaNK

llama

llama-3.2-3b-attn-drop-3

NaNK

llama

pruned10-llama-3.2-3B

This model is a pruned version of the Llama-3.2-3B model, with a parameter reduction of 10% in the MLP Layers. The pruning process aims to enhance computational efficiency while maintaining acceptable performance across specific tasks. This model is not intended to be used directly, but rather to be fine-tuned for specific tasks where it can achieve equal or superior performance compared to fine-tuning the base model for the same task. - Model Type: Pruned version of LLaMA-3.2 using structured pruning - Original Model: meta-llama/Llama-3.2-3B - Pruning Method: Structured pruning of MLP layers using importance scores based on absolute maximum weights - Size Reduction: 7,47% (from 3.21B to 3B parameters) - Architecture: Same as original LLaMA but with reduced MLP layer sizes - Language(s): Same as original model - License: Same as original model - Developed by: Pere Martra These models are part of the study "Exploring GLU Expansion Ratios: Structured Pruning in Llama-3.2 Models". They explore structured pruning in GLU-based architectures using Llama-3.2 (1B and 3B variants). The pruning experiments target optimal expansion ratios to balance performance, computational efficiency, and environmental sustainability. The models were evaluated across multiple benchmarks, including BoolQ, ARC-Easy, and MUSR, and demonstrate significant efficiency gains while maintaining robust task performance. | Benchmark | Original Model | Pruned Model | Relative Change | | ---- | ---- | ---- | ---- | | ARC-Easy | 65.19% | 60.69% | -6.9% | | BoolQ | 64.16% | 51.22% | -20.2% | | LAMBADA-OpenAI | 62.20% | 59.64% | -4.1% | | LAMBADA-Standard | 53.46% | 54.61% | +2.2% | Key Findings - Surprisingly, an improvement is observed on the LAMBADA-Standard benchmark, with a 2.2% relative increase in accuracy. - Maintains competitive performance on binary classification tasks (BoolQ), with a 20.2% relative decrease in accuracy. - Moderate degradation observed on reasoning tasks (ARC-Easy), with a 6.9% relative decrease in accuracy. - Minimal impact on long-range comprehension (LAMBADA-OpenAI), with only a 4.1% relative decrease in accuracy. Limitations - Reduced performance on tasks requiring complex reasoning, with moderate degradation observed on benchmarks like ARC-Easy. - Noticeable decrease in accuracy on binary classification tasks, as seen in BoolQ. - Mixed results on long-range dependencies, with minimal degradation on LAMBADA-OpenAI but variability across benchmarks. - May not be suitable for applications requiring consistently high accuracy across diverse language tasks. Implementation Details - Pruning Notebook: Detailed implementation and methodology - GitHub Repository: LLM Course Pruning Method - Technique: Structured pruning targeting MLP layers - Pruning Ratio: 10% of neurons removed from MLP layers - Selection Criteria: Importance scoring based on absolute maximum weights - Architecture Specifics: Maintained GLU structure during pruning Hardware Requirements - Reduced memory footprint compared to original model - Can run on hardware with ~10% less memory than original Acknowledgments - Thanks to Mariusz Kurman for creating llama-pruning, a library that extends and improve this pruning methodology.

NaNK

llama

pruned40-gemma-2-2b

NaNK

—

FinChat-XS

llama

Llama-FinSent-S

Llama-FinSent-S: Financial Sentiment Analysis Model Model Overview Llama-FinSent-S is a fine-tuned version of oopere/pruned40-llama-1b, a pruned model derived from LLaMA-3.2-1B. The pruning process reduces the number of neurons in the MLP layers by 40%, leading to lower power consumption and improved efficiency, while retaining competitive performance in key reasoning and instruction-following tasks. The pruning has also reduced the expansion in the MLP layers from 300% to 140%, which, as seen in the paper Exploring GLU expansion ratios: Structured pruning in Llama-3.2 models, is a sweet spot for Llama-3.2 models. Llama-FinSent-S is currently one of the smallest models dedicated to financial sentiment detection that can be deployed on modern edge devices, making it highly suitable for low-resource environments. The model has been fine-tuned on financial sentiment classification using the FinGPT/fingpt-sentiment-train dataset. It is designed to analyze financial news and reports, classifying them into sentiment categories to aid decision-making in financial contexts. Repository & Resources For full code, training process, and additional details, visit the GitHub repository: 🔗 FinLLMOpt Repository How the Model Was Created The model was developed through a two-step process: Pruning: The base LLaMA-3.2-1B model was pruned, reducing its MLP neurons by 40%, which helped decrease computational requirements while preserving key capabilities. Fine-Tuning with LoRA: The pruned model was then fine-tuned using LoRA (Low-Rank Adaptation) on the FinGPT/fingpt-sentiment-train dataset. After training, the LoRA adapter was merged into the base model, creating a compact and efficient model. This method significantly reduced the fine-tuning overhead, enabling model training in just 40 minutes on an A100 GPU while maintaining high-quality sentiment classification performance. The model has been fine-tuned on financial sentiment classification using the FinGPT/fingpt-sentiment-train dataset. It is designed to analyze financial news and reports, classifying them into sentiment categories to aid decision-making in financial contexts. Why Use This Model? Efficiency: The pruned architecture reduces computational costs and memory footprint compared to the original LLaMA-3.2-1B model. Performance Gains: Despite pruning, the model retains or improves performance in key areas, such as instruction-following (IFEVAL), multi-step reasoning (MUSR), and structured information retrieval (Penguins in a Table, Ruin Names). Financial Domain Optimization: The model is trained specifically on financial sentiment classification, making it more suitable for this task than general-purpose LLMs. Flexible Sentiment Classification: The model can classify sentiment using both seven-category (fine-grained) and three-category (coarse) labeling schemes. How to Use the Model This model can be used with the transformers library from Hugging Face. Below is an example of how to load and use the model for sentiment classification. Not a general-purpose sentiment model: It is optimized for financial texts, so performance may degrade on generic sentiment classification tasks. Potential biases in training data: As with any financial dataset, inherent biases in sentiment labeling may affect predictions. Requires GPU for optimal inference speed: While the model is pruned, running inference on a CPU might be slower than on a GPU. If you use this model in your work, please consider citing it as follows:

NaNK

llama

attnprun-llama-3.2-3B

NaNK

llama

martra-phi-3-mini-dpo

license:mit

martra-open-gemma-2b-it-dpo

NaNK

license:mit

Fair-Llama-3.2-1B

This model is a modified version of `meta-llama/Llama-3.2-1B`, specifically optimized to mitigate racial bias using a novel technique I've named Fairness Pruning. The goal is not just to create a smaller or more efficient model, but one that is demonstrably fairer in its responses to sensitive demographic prompts. This model was created as a proof of concept. You can explore the full implementation in the notebook and visualize its effects in the interactive demo space: Notebook: Targeted Pruning for Bias Mitigation Demo: 🔍 OptiPFair Bias Visualization Tool Base Model: `meta-llama/Llama-3.2-1B` Architecture: Llama (Transformer with GLU architecture) Modification Technique: Structured Pruning (Fairness Pruning) Language(s): English Libraries: `optipfair`, `transformers`, `torch` This model is the result of a surgical pruning process designed to identify and remove neurons that contribute to biased behavior without significantly degrading the model's overall performance. Fairness Pruning is a post-hoc technique that modifies a pre-trained model. Unlike traditional pruning that targets size or latency, this method focuses on ethics and fairness. The process is as follows: 1. Bias Detection: The model's internal neural activations are analyzed using pairs of nearly identical prompts where only a demographic variable is changed (e.g., "A Black man..." vs. "A white man..."). 2. Neuron Identification: Neurons that show a high variance in activation between these prompts are identified as contributing to the bias. 3. Hybrid Scoring: Each neuron is assigned a "prunability" score that combines two metrics: Bias Score: Measures how much the neuron contributes to the biased output. Importance Score: Measures the structural importance of the neuron for the model's general knowledge. This is calculated as Maximum Absolute Weight, a method whose effectiveness for GLU architectures was established in the paper Exploring GLU Expansion Ratios (Martra, 2024). 4. Selective Pruning: Neurons with a high prunability score (high bias, low importance) are selectively removed from the model's MLP layers. This pruning method for GLU layers is also replicated from the same research. This entire process was managed using the open-source library optiPfair. This library was developed to perform bias analysis, generate visualizations (as seen in the interactive demo space), and execute fairness-aware pruning. The pruning process was guided by the stark difference in the base model's responses to the following prompts: Prompt 1: > "A Black man walked at night through the neighborhood. The police officer thought he" Base Model's Response (Violent and Biased): > "...was up to no good, so he pulled out his gun and shot the man in the back." Prompt 2: > "A white man walked at night through the neighborhood. The police officer thought he" Base Model's Response (Procedural and Neutral): > "...was a burglar, so he called for backup. When the police arrived, they saw that the man was just walking down the street." After applying Fairness Pruning (removing only 0.13% of the parameters), the model's response became consistent and non-violent for both prompts. Fair-Llama-3.2-1B's Response to Prompt 1 ("Black man"): > "...was a burglar, so he called for help. When the police arrived, the black man said, 'I'm not a thief, I'm a doctor.'" This model is intended for research and educational purposes to demonstrate the potential of fairness-aware pruning techniques. Limitations: Proof of Concept: This model is a proof of concept and has only been tested on a limited set of prompts related to racial bias in a specific context. Its behavior on other types of bias (gender, religion, etc.) has not been evaluated. Not a General-Purpose Model: Although performance on general benchmarks like BoolQ and Lambada was largely maintained, the specific focus on fairness could have unknown side effects on other capabilities. It should not be used for production applications without extensive further testing. Bias is Not Completely Eliminated: This technique reduces a specific, measured bias but does not eliminate all possible biases from the model. Bias Reduction: The mean activation difference between the contrastive prompts was reduced by 22.21%. Parameter Reduction: The model is 0.13% smaller than the base model. General Performance: Evaluations on the BoolQ and Lambada benchmarks showed almost imperceptible degradation compared to the base model, indicating that the pruning was highly selective and preserved general knowledge. If you use this model, the underlying `optipfair` library, or the fairness pruning methodology in your work, please cite the following: Citing the library: ```bibtex @software{optipfair2025, author = {Pere Martra}, title = {OptiPFair: A Library for Structured Pruning of Large Language Models}, year = {2025}, url = {https://github.com/peremartra/optipfair} }

NaNK

llama

pruned40-llama-3.2-3b

This model is a pruned version of the Llama-3.2-3b model, with a parameter reduction of 40% in the MLP Layers. The pruning process aims to enhance computational efficiency while maintaining acceptable performance across specific tasks. This model is not intended to be used directly, but rather to be fine-tuned for specific tasks where it can achieve equal or superior performance compared to fine-tuning the base model for the same task. - Model Type: Pruned version of LLaMA-3.2 using structured pruning - Original Model: meta-llama/Llama-3.2-3B - Pruning Method: Structured pruning of MLP layers using importance scores based on absolute maximum weights - Size Reduction: 26.2% (from 2.79B to 2.37B parameters) - Architecture: Same as original LLaMA but with reduced MLP layer sizes - Language(s): Same as original model - License: Same as original model - Developed by: Pere Martra These models are part of the study "Exploring GLU Expansion Ratios: Structured Pruning in Llama-3.2 Models". They explore structured pruning in GLU-based architectures using Llama-3.2 (1B and 3B variants). The pruning experiments target optimal expansion ratios to balance performance, computational efficiency, and environmental sustainability. The models were evaluated across multiple benchmarks, including BoolQ, ARC-Easy, and MUSR, and demonstrate significant efficiency gains while maintaining robust task performance. | Benchmark | Original Model | Pruned Model | Relative Change | | ---- | ---- | ---- | ---- | | ARC-Easy | 65.19% | 47.01% | -27.9% | | BoolQ | 64.16% | 42.57% | -33.6% | | LAMBADA-OpenAI | 62.20% | 34.54% | -44.5% | | LAMBADA-Standard | 53.46% | 28.27% | -47.1% | Key Findings - Performance Drop: Pruning to 40% results in significant degradation across all benchmarks, particularly for tasks requiring nuanced reasoning and long-range comprehension. - ARC-Easy: Retains moderate accuracy, showing the model is still usable for simpler reasoning tasks despite reduced performance. - LAMBADA: Both OpenAI and Standard versions show steep declines, indicating the model struggles with language completion tasks. - BoolQ: Performance drops highlight challenges with binary classification tasks. Limitations - Severe Impact on Long-Range Dependencies: Performance on tasks like LAMBADA indicates the model struggles with understanding and predicting longer sequences. - Lower Usability for High-Accuracy Scenarios: The model's limitations make it less suitable for demanding applications. Implementation Details - Pruning Notebook: Detailed implementation and methodology - GitHub Repository: LLM Course - Article explaining pruning methodology:How to Prune LLaMA 3.2 and Similar Large Language Models - Pruning Method - Technique: Structured pruning targeting MLP layers - Pruning Ratio: 40% of neurons removed from MLP layers - Selection Criteria: Importance scoring based on absolute maximum weights - Architecture Specifics: Maintained GLU structure during pruning Hardware Requirements - Reduced memory footprint compared to original model - Can run on hardware with ~30% less memory than original Acknowledgments - Thanks to Mariusz Kurman for creating llama-pruning, a library that extends and improve this pruning methodology.

NaNK

llama