anakin87
Phi-3.5-mini-ITA
gemma-2-2b-neogenesis-ita
Llama-3-8b-ita-slerp
Llama-3-8b-ita-ties
gemma-2-9b-neogenesis-ita
Llama-3-8b-ita-ties-pro
zephyr-7b-alpha-sharded
UPDATE: the original model (Zephyr 7B Alpha) has since been sharded upstream, so you can now use the original model directly. 💻 With this sharded version, you can smoothly load the model on Colab and play with it!

From the original model card:

> Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful. However, this means that the model is likely to generate problematic text when prompted to do so and should only be used for educational and research purposes.

Usage

This version of the model is meant primarily to run smoothly on Colab. I suggest loading the model with 8-bit quantization, so that you keep some free GPU memory for inference. However, it is perfectly fine to load the model in half precision or with stronger quantization (4-bit).
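The 8-bit loading suggested above can be sketched as follows. This is a minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed; the repository name comes from this card, while the prompt is only illustrative:

```python
# Sketch: load the sharded Zephyr checkpoint with 8-bit quantization (e.g. on Colab).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "anakin87/zephyr-7b-alpha-sharded"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # place layers on the available GPU automatically
)

# Zephyr-alpha chat format: system / user / assistant turns.
prompt = "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For half precision instead, drop `quantization_config` and pass `torch_dtype=torch.float16`; for 4-bit, use `BitsAndBytesConfig(load_in_4bit=True)`.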
Electra Italian Xxl Cased Squad It
Electra model for (Extractive) Question Answering on Italian texts

Model description

This model has been fine-tuned on the SQuAD-it dataset, starting from the pre-trained model dbmdz/electra-base-italian-xxl-cased-discriminator. It can be used for extractive Q&A on Italian texts.

| Metric | Value |
| ------ | ----- |
| EM     | 0.660 |
| F1     | 0.775 |

Usage in Transformers 🤗

Model checkpoints are available for usage in PyTorch. They can be used directly with pipelines.

With the Haystack NLP framework, you can use this model to create a scalable Question Answering system that works across millions of documents.

| Model | EM | F1 | Model size (PyTorch) | Architecture |
|-----------------------------------------------------------|-------|-------|----------|-----------------|
| it5/it5-large-question-answering | 69.10 | 78.00 | 3.13 GB | encoder-decoder |
| anakin87/electra-italian-xxl-cased-squad-it (this one) | 66.03 | 77.47 | 437 MB | encoder |
| it5/it5-base-question-answering | 66.30 | 76.10 | 990 MB | encoder-decoder |
| it5/mt5-base-question-answering | 66.30 | 75.70 | 2.33 GB | encoder-decoder |
| antoniocappiello/bert-base-italian-uncased-squad-it | 63.80 | 75.30 | 440 MB | encoder |
| luigisaetta/squaditxxlcasedhub1 | 63.95 | 75.27 | 440 MB | encoder |
| it5/it5-efficient-small-el32-question-answering | 64.50 | 74.70 | 569 MB | encoder-decoder |
| mrm8488/bert-italian-finedtuned-squadv1-it-alfa | 62.51 | 74.16 | 440 MB | encoder |
| mrm8488/umberto-wikipedia-uncased-v1-finetuned-squadv1-it | 60.50 | 72.41 | 443 MB | encoder |
| it5/it5-small-question-answering | 61.90 | 71.60 | 308 MB | encoder-decoder |
| it5/mt5-small-question-answering | 56.00 | 66.00 | 1.2 GB | encoder-decoder |
| DrQA-it trained on SQuAD-it | 56.10 | 65.90 | ? | ? |
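The pipeline usage mentioned above can be sketched as follows. This is a minimal sketch: the model name comes from this card, while the Italian question and context are only illustrative:

```python
from transformers import pipeline

# Extractive QA pipeline backed by the Electra checkpoint from this card.
qa = pipeline(
    "question-answering",
    model="anakin87/electra-italian-xxl-cased-squad-it",
)

result = qa(
    question="Qual è la capitale d'Italia?",
    context="Roma è la capitale d'Italia.",
)
print(result["answer"])
```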
Hyperparameters

- learning_rate: 2e-05
- batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP

> Created by Stefano Fiorucci/anakin87
>
> Made with ♥ in Italy
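As an illustration of the linear scheduler listed above, here is a minimal sketch of a learning rate decaying linearly from `learning_rate` to zero over training. The step count is hypothetical, and Transformers' real scheduler additionally supports warmup:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

# With a hypothetical 1000 training steps:
print(linear_lr(0, 1000))     # start of training: the full learning rate
print(linear_lr(500, 1000))   # halfway: half the learning rate
print(linear_lr(1000, 1000))  # end of training: 0.0
```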
gemma-2b-orpo-GGUF
Qwen3-0.6B-alphabet-sort-grpo
This model was trained using GRPO with the alphabet-sort RL environment. Compared to the original model, it shows improved performance on this alphabetical sorting task. ➡️ For a training walkthrough, evaluation, and other details, refer to this article.
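One reason sorting tasks suit RL training is that answers are cheap to verify, which makes a reward signal easy to compute. A minimal sketch of such a verifier, assuming a comma-separated answer format (this reward function is hypothetical, not the environment's actual implementation):

```python
def sort_reward(words: list[str], model_answer: str) -> float:
    """Return 1.0 if the model's comma-separated answer lists the input
    words in alphabetical order, else 0.0 (binary reward, hypothetical)."""
    predicted = [w.strip() for w in model_answer.split(",")]
    return 1.0 if predicted == sorted(words) else 0.0

print(sort_reward(["pear", "apple", "kiwi"], "apple, kiwi, pear"))  # 1.0
print(sort_reward(["pear", "apple", "kiwi"], "pear, apple, kiwi"))  # 0.0
```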
LFM2-2.6B-mr-tictactoe
gemma-2-2b-ita-sft
This model is based on google/gemma-2-2b-it and has been trained on the Italian language using instruction fine-tuning. ⚠️ This checkpoint is provided for documentation purposes only. For better performance on the Italian language, use anakin87/gemma-2-2b-neogenesis-ita, which builds upon the current model and applies DPO for improved results.