swap-uniba
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
📣 NEW MODEL FAMILY❗ https://huggingface.co/m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA ("Built with Meta Llama 3").

LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be a Multilingual Model 🏁 (EN 🇺🇸 + ITA 🇮🇹), suitable for further fine-tuning on specific tasks in Italian.

The 🌟ANITA project🌟 (Advanced Natural-based interaction for the ITAlian language) aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.

Live DEMO: https://chat.llamantino.it/ (it works only from an Italian connection).

| Model | HF | GGUF | EXL2 |
|-------|----|------|------|
| swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | Link | Link | Link |

- Model developers: Ph.D. Marco Polignano, University of Bari Aldo Moro, Italy (SWAP Research Group)
- Variations: The model has been supervised fine-tuned (SFT) using QLoRA 4-bit on instruction-based datasets. A DPO pass over the mlabonne/orpo-dpo-mix-40k dataset is used to align the model with human preferences for helpfulness and safety.
- Input: models take text only as input.
- Language: Multilingual 🏁 + Italian 🇮🇹
- Output: models generate text and code only.
- Model Architecture: Llama 3 architecture.
- Context length: 8K (8,192 tokens).
- Library used: Unsloth

To use the model directly, there are many ways to get started; choose one of the following ways to experience it. For direct use with `transformers`, you can get started with the following steps:

- First, install transformers via `pip`.
- You can then start using the model directly.
- Additionally, you can load the model with 4-bit quantization to reduce the required resources.
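The chat formatting the steps above rely on can be sketched in pure Python. This is a minimal illustration of the standard Llama 3 instruct prompt layout that the tokenizer's `apply_chat_template` produces for this model family; in practice you would install `transformers` and call the tokenizer directly (and, for the 4-bit path, pass a `BitsAndBytesConfig(load_in_4bit=True)` to `from_pretrained`), so treat this as a sketch of the expected structure rather than the loading code itself.

```python
# Minimal sketch of the single-turn Llama 3 chat prompt layout used by
# this model family. The special-token strings are the standard Llama 3
# ones; with transformers installed you would instead call
# tokenizer.apply_chat_template(messages, add_generation_prompt=True).

def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 instruct prompt."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example with an Italian system prompt, matching the model's target use case.
prompt = build_llama3_prompt(
    "Sei un assistente AI per la lingua Italiana.",
    "Chi ha scritto la Divina Commedia?",
)
```

The generated text is everything the model emits after the final assistant header, up to the next `<|eot_id|>` token.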
Evaluated with lm-evaluation-harness for the Open Italian LLMs Leaderboard:

| Metric      | Value  |
|-------------|--------|
| Avg.        | 0.6160 |
| ArcIT       | 0.5714 |
| HellaswagIT | 0.7093 |
| MMLUIT      | 0.5672 |

Unsloth is a great tool that helped us develop the model easily, at a lower cost than expected.

Acknowledgments

We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007), under the NRRP MUR program funded by NextGenerationEU. Models are built on the Leonardo supercomputer with the support of the CINECA Italian Super Computing Resource Allocation, class C project IscrC_Pro_MRS (HP10CQO70G).

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 75.12 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot)               | 92.75 |
| MMLU (5-Shot)                     | 66.85 |
| TruthfulQA (0-shot)               | 75.93 |
| Winogrande (5-shot)               | 82.00 |
| GSM8k (5-shot)                    | 58.61 |
LLaMAntino-2-7b-hf-ITA
LLaMAntino-2-chat-13b-hf-UltraChat-ITA
LLaMAntino-2-70b-hf-UltraChat-ITA
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA_GGUF
siglip2-large-patch16-256-VWSD-ft
Refer to the GitHub repository: https://github.com/swapUniba/VWSD-VLMs
LLM-wsd-FT-ALL
LLM-wsd-FT-ALL is a Large Language Model (LLM) instruction-tuned over meta-llama/Meta-Llama-3.1-8B-Instruct. This model has been trained for the WSD task over the entire training dataset, without machine translation. It is capable of providing the definition of a word in a given sentence. Specifically, it can answer both:

1) Open-ended questions, where the model generates the definition of the target word;
2) Closed-ended questions, where the model generates the identifier of the correct option out of a list of alternatives.

More details regarding the training procedure (e.g. hyperparameters, dataset construction, and so on) can be found in Section 4.2 of the paper.

- Developed by: Pierpaolo Basile, Lucia Siciliani, Elio Musacchio
- Model type: LLaMA 3.1 Instruct
- Language(s) (NLP): English, French, German, Italian and Spanish
- License: LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

The model has been trained using several instructions, depending on the language, the task (open-ended or closed-ended), and the number of occurrences of the target word in the sentence. In Instructions, we provide the instructions used for all cases. The following placeholder variables have to be replaced:

- {targetword}: the target word in the input to disambiguate;
- {options}: the options to provide to the model, for the closed-ended task only. The options should be newline-separated, and each option should be identified by a number. Refer to the closed-ended example for an example of options formatting;
- {occurrence}: the ordinal number of the {targetword} occurrence (e.g. "second"). This is required only when the input sentence contains multiple occurrences of {targetword}.

Please note that the complete prompt also has the following string after the instruction: where {sentence} is the input sentence containing the word to disambiguate.

Below you can find two examples of model usage, for open-ended and closed-ended generation respectively.
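The placeholder scheme above can be sketched in plain Python. The instruction wording below is hypothetical (the actual instructions ship in the Instructions section of the card); the sketch only shows how {targetword}, {options} and {occurrence} slot into an open-ended and a closed-ended prompt:

```python
# Hypothetical prompt templates illustrating the placeholder scheme.
# The exact instruction text is the one listed in the model card's
# Instructions section; only the placeholder mechanics are shown here.

def open_ended_prompt(target_word: str, sentence: str,
                      occurrence: str = "") -> str:
    # {occurrence} is only needed when the target word appears
    # more than once in the sentence.
    which = f" ({occurrence} occurrence)" if occurrence else ""
    instruction = (f'Give the definition of the word "{target_word}"{which} '
                   "in the following sentence.")
    return f"{instruction}\nSentence: {sentence}"

def closed_ended_prompt(target_word: str, sentence: str,
                        options: list) -> str:
    # Options are newline-separated, each identified by a number.
    numbered = "\n".join(f"{i}) {opt}" for i, opt in enumerate(options, start=1))
    instruction = (f'Choose the correct definition of the word "{target_word}" '
                   "in the following sentence. Answer with the option number.\n"
                   f"{numbered}")
    return f"{instruction}\nSentence: {sentence}"
```

Either prompt would then be wrapped in the Llama 3.1 chat template before generation.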
If you use this model in your research, please cite the following:
LLaVA-NDiNO_pt_long
LLaMAntino-2-chat-7b-hf-UltraChat-ITA
LLaMAntino-2-13b-hf-ITA
LLaMAntino-2-7b-hf-dolly-ITA
bloom-1b7-evalita-it
LLaMAntino-2-chat-13b-hf-ITA
LLaVA-NDiNO_pt
xVLM2Vec_image_loss
xVLM2Vec_image_loss is a Large Vision-Language Model (LVLM) aligned over TIGER-Lab/VLM2Vec-LoRA. This model has been trained for increased performance in multilingual retrieval tasks; specifically, it was trained on a machine-translated parallel corpus. It is capable of performing several multimodal retrieval tasks (e.g. Text-to-Image, Image-to-Text, VQA, Visual Grounding and Classification). It was trained with a different loss w.r.t. swap-uniba/xVLM2Vec; however, no significant performance differences were found. More details regarding the training procedure (e.g. hyperparameters, dataset construction, and so on) can be found in the paper.

- Developed by: Elio Musacchio, Lucia Siciliani, Pierpaolo Basile
- Model type: Phi-3.5-vision-instruct
- Language(s) (NLP): English, French, German, Italian and Spanish
- License: Apache 2.0
- Finetuned from model: TIGER-Lab/VLM2Vec-LoRA

Below you can find an example of model usage. To facilitate its usage, we recommend pulling from GitHub the version of the VLM2Vec source code we used for both training and inference. This is a use case where the model is being used to retrieve an image caption in Italian.

If you use this model in your research, please cite the following:
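Independently of the VLM2Vec source code (whose exact calls live in the GitHub repository referenced above), the retrieval step itself can be sketched in a few lines: once the model has produced an embedding for the query (e.g. an image) and for each candidate caption, retrieval reduces to ranking candidates by cosine similarity. This is a generic sketch, not the VLM2Vec API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending similarity
    to the query embedding; index 0 of the result is the best match."""
    scores = [cosine(query_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

For the Italian caption-retrieval use case, `query_emb` would be the image embedding and `candidate_embs` the embeddings of the candidate Italian captions, all produced by the model.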
llama-latin-wsd-binary
bloom-1b7-comoscio-it
LLaVA-NDiNO_short_it
llama-latin-wsd
LLaMAntino-2-13b-hf-dolly-ITA
LLM-wsd-TT-10000
Qwen2.5-VL-7B-Instruct-VWSD-ft
Refer to the GitHub repository: https://github.com/swapUniba/VWSD-VLMs
LLaMAntino-2-13b-hf-evalita-ITA
bloom-1b7-it
bloom-1b7-evalita
llama3-it-pa-100k-adapter
llama3-it-pa-300k-adapter
LLaVA-NDiNO_pt_short_it
LLM-wsd-FT-20000
LLM-wsd-FT-20000 is a Large Language Model (LLM) instruction-tuned over meta-llama/Meta-Llama-3.1-8B-Instruct. This model has been trained for the WSD task over a balanced training dataset (20000 instances per language), without machine translation. It is capable of providing the definition of a word in a given sentence. Specifically, it can answer both:

1) Open-ended questions, where the model generates the definition of the target word;
2) Closed-ended questions, where the model generates the identifier of the correct option out of a list of alternatives.

More details regarding the training procedure (e.g. hyperparameters, dataset construction, and so on) can be found in Section 4.2 of the paper.

- Developed by: Pierpaolo Basile, Lucia Siciliani, Elio Musacchio
- Model type: LLaMA 3.1 Instruct
- Language(s) (NLP): English, French, German, Italian and Spanish
- License: LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

The model has been trained using several instructions, depending on the language, the task (open-ended or closed-ended), and the number of occurrences of the target word in the sentence. In Instructions, we provide the instructions used for all cases. The following placeholder variables have to be replaced:

- {targetword}: the target word in the input to disambiguate;
- {options}: the options to provide to the model, for the closed-ended task only. The options should be newline-separated, and each option should be identified by a number. Refer to the closed-ended example for an example of options formatting;
- {occurrence}: the ordinal number of the {targetword} occurrence (e.g. "second"). This is required only when the input sentence contains multiple occurrences of {targetword}.

Please note that the complete prompt also has the following string after the instruction: where {sentence} is the input sentence containing the word to disambiguate.
Below you can find two examples of model usage, for open-ended and closed-ended generation respectively.

If you use this model in your research, please cite the following:
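For the closed-ended task, the model is expected to answer with the identifier of the correct option. A small hypothetical helper for pulling that identifier out of the generated text (not part of the released code) might look like this:

```python
import re

def parse_option_id(generated: str, n_options: int):
    """Extract the first option number the model emitted.

    Returns the 1-based option index, or None when no valid
    identifier (within 1..n_options) is found in the output.
    """
    match = re.search(r"\d+", generated)
    if match:
        idx = int(match.group())
        if 1 <= idx <= n_options:
            return idx
    return None
```

A well-behaved instruction-tuned model usually answers with the bare number, so the regex is only a fallback for slightly verbose outputs such as "The answer is 2.".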
LLM-wsd-FT-10000
LLM-wsd-FT-10000 is a Large Language Model (LLM) instruction-tuned over meta-llama/Meta-Llama-3.1-8B-Instruct. This model has been trained for the WSD task over a balanced training dataset (10000 instances per language), without machine translation. It is capable of providing the definition of a word in a given sentence. Specifically, it can answer both:

1) Open-ended questions, where the model generates the definition of the target word;
2) Closed-ended questions, where the model generates the identifier of the correct option out of a list of alternatives.

More details regarding the training procedure (e.g. hyperparameters, dataset construction, and so on) can be found in Section 4.2 of the paper.

- Developed by: Pierpaolo Basile, Lucia Siciliani, Elio Musacchio
- Model type: LLaMA 3.1 Instruct
- Language(s) (NLP): English, French, German, Italian and Spanish
- License: LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

The model has been trained using several instructions, depending on the language, the task (open-ended or closed-ended), and the number of occurrences of the target word in the sentence. In Instructions, we provide the instructions used for all cases. The following placeholder variables have to be replaced:

- {targetword}: the target word in the input to disambiguate;
- {options}: the options to provide to the model, for the closed-ended task only. The options should be newline-separated, and each option should be identified by a number. Refer to the closed-ended example for an example of options formatting;
- {occurrence}: the ordinal number of the {targetword} occurrence (e.g. "second"). This is required only when the input sentence contains multiple occurrences of {targetword}.

Please note that the complete prompt also has the following string after the instruction: where {sentence} is the input sentence containing the word to disambiguate.
Below you can find two examples of model usage, for open-ended and closed-ended generation respectively.

If you use this model in your research, please cite the following: