sumink
Qwenftmodel
Fine-Tuned Qwen 2.5-Coder-1.5B is a causal language model fine-tuned for generating contextually relevant responses. The base model, Qwen/Qwen2.5-Coder-1.5B, features a Transformer-based architecture with 1.5 billion parameters.

The model was fine-tuned on a custom dataset named subset5, consisting of prompt-response pairs tokenized with a maximum sequence length of 128 tokens. During training, inputs were padded and truncated appropriately, and labels were aligned for causal language modeling. Key hyperparameters: learning rate 2e-5, batch size 1, gradient accumulation steps 32, and 3 epochs, with the AdamW optimizer and a weight decay of 0.01. Training was performed on CPU without CUDA.

The model can be used for tasks like answering questions, completing sentences, or generating responses. To use it, load the model and tokenizer with the Hugging Face Transformers library, tokenize your input prompt, and generate responses with the model's generate method. Example input-output pairs demonstrate the model's ability to produce concise, informative answers. The model should not be used for harmful, malicious, or unethical content, and users are responsible for adhering to applicable laws and ethical standards.
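The usage described above can be sketched as follows. This is a minimal example, not the card's official snippet; it assumes the checkpoint is published under the repo id sumink/Qwenftmodel (the name this collection uses elsewhere):

```python
def generate_response(prompt: str, model_id: str = "sumink/Qwenftmodel") -> str:
    """Generate a completion from the fine-tuned checkpoint.

    `model_id` is an assumption: it should point at the published
    Hugging Face repo for this card.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # The card reports a 128-token training sequence length,
    # so keep generations short.
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For example, `print(generate_response("What is a causal language model?"))` would download the checkpoint and print a short completion.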
somer
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge: upstage/SOLAR-10.7B-Instruct-v1.0 upstage/SOLAR-10.7B-v1.0 The following YAML configuration was used to produce this model:
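Since the YAML itself is not reproduced here, the following is an illustrative sketch of what a mergekit SLERP configuration for this pair typically looks like. The layer range, base model choice, and interpolation values are assumptions, not the actual file used:

```yaml
slices:
  - sources:
      - model: upstage/SOLAR-10.7B-Instruct-v1.0
        layer_range: [0, 48]
      - model: upstage/SOLAR-10.7B-v1.0
        layer_range: [0, 48]
merge_method: slerp
base_model: upstage/SOLAR-10.7B-v1.0
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```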
somer2
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge: upstage/SOLAR-10.7B-Instruct-v1.0 upstage/SOLAR-10.7B-v1.0 The following YAML configuration was used to produce this model:
solarmer3
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge: upstage/SOLAR-10.7B-Instruct-v1.0 upstage/SOLAR-10.7B-v1.0 The following YAML configuration was used to produce this model:
flflmillama
Model Overview
This model is a fine-tuned version of Llama-3.2-3B, specifically trained on a dataset processed using Facility Location (FL) and Facility Location Mutual Information (FLMI) techniques. These data selection methods reduce the dataset size while retaining high-quality, representative samples, ensuring the model is trained on the most informative and diverse data points.

Dataset Details
- Original Dataset: A filtered subset of a conversational dataset, containing examples of chosen and rejected responses.
- Data Preprocessing: An initial Facility Location (FL) pass selected 1,000 samples from the original dataset. Further refinement using Facility Location Mutual Information (FLMI) reduced the dataset to 500 highly informative samples. These methods ensured that the final dataset preserved critical information and diversity, optimizing training efficiency and model performance.

Training Configuration
- Base Model: Llama-3.2-3B
- Fine-Tuning Dataset: The final dataset of 500 samples refined through FL and FLMI techniques.
- Objective: Enhance the model's ability to generate high-quality, contextually accurate responses in conversational settings.
- Training Framework: Hugging Face Transformers library with PyTorch backend.
- Training Hardware: Multi-GPU setup (e.g., NVIDIA A100 GPUs).
- Batch Size: 16
- Learning Rate: 5e-5 with linear decay.
- Optimizer: AdamW
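Facility Location selection as described above can be sketched as greedy submodular maximization over pairwise similarities. This is an illustrative implementation; the actual feature representation and similarity measure used for this model are not specified in the card:

```python
import numpy as np

def facility_location_select(features: np.ndarray, k: int) -> list[int]:
    """Greedy facility-location selection.

    Picks k indices S maximizing sum_i max_{j in S} sim(i, j),
    i.e. every point in the dataset is well covered by some
    selected representative.
    """
    # Cosine similarity between all pairs of feature vectors.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    selected: list[int] = []
    best_cover = np.zeros(n)  # current max similarity to the selected set
    for _ in range(k):
        # Marginal gain of adding each candidate column j.
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected
```

Because the objective is monotone submodular, this greedy procedure carries the usual (1 - 1/e) approximation guarantee, which is why FL-style selection can shrink a corpus sharply while keeping coverage high.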
llftfl7
Model Overview
This model is a fine-tuned version of the Llama-3.2-3B model, trained on a curated and optimized dataset derived through Facility Location (FL) techniques. The base model, Llama-3.2-3B, is a state-of-the-art large language model designed for various natural language processing tasks, and it has been further adapted to improve its task-specific performance.

Dataset Details
- Original Dataset: The dataset initially consisted of 10,000 samples, combining diverse conversational pairs for instruction tuning and response generation tasks.
- Data Selection Process: The Facility Location (FL) algorithm was applied to the original dataset to identify the most representative and diverse samples. This method maximized dataset utility by ensuring a balanced and informative subset while maintaining the richness of the original data. As a result, the dataset was reduced to 7,000 high-quality samples, retaining only the most relevant and representative data points.
- Dataset Characteristics:
  - Chosen-Response Pairs: 7,000 samples of question-response pairs refined to optimize learning efficiency.
  - Diversity & Balance: The FL algorithm ensured the dataset captures diverse language usage and contexts without redundancy.
bbhqwen
Qwen2.5-3B Fine-Tuned on BBH Dataset - Model Card

📌 Model Overview
- Model Name: Qwen2.5-3B Fine-Tuned on BBH
- Base Model: Qwen2.5-3B-Instruct
- Fine-Tuned Dataset: BBH (BigBench Hard)
- Task: Causal Language Modeling (CLM)
- Fine-Tuning Objective: Improve performance on reasoning and knowledge-based multiple-choice tasks

📌 Dataset Information
The model was fine-tuned on BigBench Hard (BBH), a dataset designed to evaluate complex reasoning tasks. Key subsets used for training:
- Causal Judgement 🧠: Evaluating causality understanding
- Date Understanding 📆: Temporal reasoning and date manipulation
- Boolean Expressions ✅❌: Logical reasoning

Dataset characteristics:
- Format: Multiple-choice questions
- Domains: Logic, Mathematics, Commonsense Reasoning
- Label Mapping: Answers converted into numerical classes (e.g., A → 0, B → 1, etc.)
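The label mapping above is a simple letter-to-index conversion. A minimal sketch (the exact preprocessing code used in training is an assumption):

```python
def label_to_class(answer: str) -> int:
    """Map a multiple-choice letter like 'A' or '(B)' to a numeric class.

    A -> 0, B -> 1, C -> 2, and so on; whitespace and the
    parentheses BBH uses around option letters are stripped first.
    """
    letter = answer.strip().strip("()").upper()
    return ord(letter) - ord("A")
```

For example, `label_to_class("(C)")` returns 2.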
Qwensci
Fine-Tuned Qwen 2.5-Coder-1.5B is a causal language model fine-tuned for generating contextually relevant responses. The base model, Qwen/Qwen2.5-Coder-1.5B, features a Transformer-based architecture with 1.5 billion parameters.

The model was fine-tuned on a custom dataset named subset5, consisting of prompt-response pairs tokenized with a maximum sequence length of 128 tokens. The dataset includes diverse mathematical problems and solutions, along with general prompt-response pairs, to improve performance on mathematical reasoning tasks. The model was additionally fine-tuned on a science dataset of 7,787 genuine grade-school-level multiple-choice science questions, strengthening its ability to handle scientific reasoning and knowledge-based tasks. During training, inputs were padded and truncated appropriately, and labels were aligned for causal language modeling. Key hyperparameters: learning rate 2e-5, batch size 1, gradient accumulation steps 32, and 3 epochs, with the AdamW optimizer and a weight decay of 0.01. Training was performed on CPU without CUDA.

The model can be used for tasks like answering questions, completing sentences, solving mathematical problems, tackling scientific queries, or generating responses. To use it, load the model and tokenizer with the Hugging Face Transformers library, tokenize your input prompt, and generate responses with the model's generate method. Example input-output pairs demonstrate the model's ability to produce concise, informative answers, including solving mathematical problems and answering scientific questions accurately. The model should not be used for harmful, malicious, or unethical content, and users are responsible for adhering to applicable laws and ethical standards.
llamamerge
This document outlines the process of merging two pre-trained models, psmathur/orca_mini_v3_13b and garage-bAInd/Platypus2-13B, using Spherical Linear Interpolation (SLERP). The base model for the merge is psmathur/orca_mini_v3_13b, and the weights are processed in float16 format to optimize memory usage. SLERP ensures a smooth blending of the model weights, allowing the merged model to benefit from the strengths of both original models.

The merging parameters include specific interpolation values for different components: [0.0, 0.5, 0.3, 0.7, 1.0] for the self_attn layers and [1.0, 0.5, 0.7, 0.3, 0.0] for the mlp layers. For all other layers, a default value of 0.5 is applied. The layer slices from both models are merged within the range of layers 0-40. To replicate this process, the merging script is run using the configuration provided in the gradient-slerp.yml file, and the merged model is saved in the designated output directory. This approach combines the unique capabilities of both input models while maintaining balanced performance.
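Based on the parameters above, the gradient-slerp.yml file would look roughly like the following. This is a reconstruction, not the original file, with model identifiers written as their standard Hugging Face repo names:

```yaml
slices:
  - sources:
      - model: psmathur/orca_mini_v3_13b
        layer_range: [0, 40]
      - model: garage-bAInd/Platypus2-13B
        layer_range: [0, 40]
merge_method: slerp
base_model: psmathur/orca_mini_v3_13b
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]
    - filter: mlp
      value: [1.0, 0.5, 0.7, 0.3, 0.0]
    - value: 0.5
dtype: float16
```

The opposing gradients for self_attn and mlp mean early layers take attention weights mostly from the first model and feed-forward weights mostly from the second, with the balance reversing toward the final layers.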
qwmer
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge: Qwen/Qwen2-7B-Instruct Qwen/Qwen2.5-7B-Instruct The following YAML configuration was used to produce this model:
Qmerft
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge: Qwen/Qwen2.5-1.5B sumink/Qwenftmodel The following YAML configuration was used to produce this model:
somerft
This repository contains a fine-tuned version of the SOLAR Model, merged and customized for enhanced performance on specific tasks.

Overview
The model was fine-tuned starting from the merged SOLAR model. The merging process combined the SOLAR-10.7B-Instruct and SOLAR-10.7B models using the slerp merge method, focusing on specific layers to optimize the self-attention and feed-forward modules. This fine-tuning process specializes the model for domain-specific tasks while maintaining the high performance and generalization capabilities of the original SOLAR model.
llamaft
This model is a fine-tuned version of LLaMA 3.2-3B, trained on a carefully curated dataset of 500 samples selected using Facility Location (FL) optimization. The dataset was refined from a larger corpus through representative sample selection, ensuring that the most informative and diverse data points were retained while redundant and uninformative samples were removed. Fine-tuning was conducted to improve task-specific performance while significantly reducing training cost and data inefficiencies. By leveraging FL-based data selection, we ensured that the final dataset maintained high coverage and diversity while requiring only 5% of the original dataset size.
bbhqwen2
🚀 Qwen2.5-3B Fine-Tuned on BBH (Disambiguation QA) - Model Card

📌 Model Overview
- Model Name: Qwen2.5-3B Fine-Tuned on BBH (Disambiguation QA)
- Base Model: Qwen2.5-3B-Instruct
- Fine-Tuned Dataset: BBH (BigBench Hard) - Disambiguation QA
- Task: Causal Language Modeling (CLM)
- Fine-Tuning Objective: Improve performance on pronoun disambiguation and antecedent reasoning tasks

📌 Dataset Information
This model was fine-tuned on the Disambiguation QA subset of the BigBench Hard (BBH) dataset.
- Task Type: Pronoun resolution & antecedent disambiguation
- Input Format: Multiple-choice questions (MCQ) with ambiguous pronouns
- Target Labels: (A), (B), (C), or Ambiguous

Example:
Input: "In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous. Sentence: The patient was referred to the specialist because he had a rare skin condition. Options: (A) The patient had a skin condition (B) The specialist had a skin condition (C) Ambiguous"

This dataset evaluates a model's ability to correctly resolve pronouns in ambiguous sentences, which is crucial for natural language understanding (NLU), legal/medical document processing, and AI explainability.
bbhqwen5
🚀 Qwen2.5-3B Fine-Tuned on BBH (Geometric Shapes) - Model Card

📌 Model Overview
- Model Name: Qwen2.5-3B Fine-Tuned on BBH (Geometric Shapes)
- Base Model: Qwen2.5-3B-Instruct
- Fine-Tuned Dataset: BBH (BigBench Hard) - Geometric Shapes
- Task: Shape Classification from SVG Path Data
- Fine-Tuning Objective: Enhance the model's ability to recognize and classify geometric shapes based on SVG path descriptions.

📌 Dataset Information
This model was fine-tuned on the Geometric Shapes subset of the BigBench Hard (BBH) dataset.
- Task Type: Shape classification from SVG path representations
- Input Format: SVG path data describing a geometric shape
- Target Labels: "A" to "K" representing different shapes

Example:
Input: "This SVG path element draws a Options: (A) circle (B) heptagon (C) hexagon (D) kite (E) line (F) octagon (G) pentagon (H) rectangle (I) sector (J) triangle"

This dataset evaluates a model's ability to decode and classify shape data from vector graphics formats, which is crucial for AI-assisted design, graphics processing, and computational geometry.