esm2_t33_650M_UR50D
ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest. Several ESM-2 checkpoints are available on the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train:

| Checkpoint name | Num layers | Num parameters |
|------------------------------|------------|----------------|
| esm2_t48_15B_UR50D | 48 | 15B |
| esm2_t36_3B_UR50D | 36 | 3B |
| esm2_t33_650M_UR50D | 33 | 650M |
| esm2_t30_150M_UR50D | 30 | 150M |
| esm2_t12_35M_UR50D | 12 | 35M |
| esm2_t6_8M_UR50D | 6 | 8M |
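As a sketch of what loading one of these checkpoints looks like (assuming the `transformers` library; the 650M checkpoint shown here is interchangeable with any row of the table, and the ubiquitin sequence with one masked residue is purely illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Ubiquitin sequence with one residue masked out
sequence = "MQIFVKTLTGK<mask>ITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Most likely residue at the masked position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
predicted = tokenizer.decode([logits[0, mask_index].argmax().item()])
```

For fine-tuning, the same checkpoint can instead be loaded with a task head (e.g. `AutoModelForSequenceClassification`), as in the demo notebooks mentioned above.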
contriever
This model has been trained without supervision, following the approach described in [Towards Unsupervised Dense Information Retrieval with Contrastive Learning](https://arxiv.org/abs/2112.09118). The associated GitHub repository is available at https://github.com/facebookresearch/contriever.
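A minimal sketch of obtaining sentence embeddings from this checkpoint (assuming the `transformers` library; the mean-pooling step, which averages token embeddings under the attention mask, follows the approach used with Contriever, and the example sentences are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

sentences = [
    "Where was Marie Curie born?",
    "Maria Sklodowska was born in Warsaw.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

def mean_pooling(token_embeddings, mask):
    # Zero out padding positions, then average over the sequence dimension
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.0)
    return token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]

embeddings = mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])
score = embeddings[0] @ embeddings[1]  # dot-product relevance score
```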
wav2vec2-base-960h
The base model, pretrained and fine-tuned on 960 hours of LibriSpeech 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. We show for the first time that learning powerful representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations, which are jointly learned. Experiments using all labeled data of LibriSpeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100-hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. To transcribe audio files, the model can be used as a standalone acoustic model as follows. The same pattern can be used to evaluate facebook/wav2vec2-base-960h on LibriSpeech's "clean" and "other" test data.
opt-125m
OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. Disclaimer: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from this model card has been written by the Hugging Face team. To quote the first two paragraphs of the official paper: > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3.
As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. Intended uses & limitations: The pretrained-only model can be used for prompting, for evaluation of downstream tasks, as well as for text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`. As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. This bias will also affect all fine-tuned versions of this model. The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included, - Pushshift.io Reddit dataset that was developed in Baumgartner et al. 
(2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens, corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like "Chapter One" or "This ebook by Project Gutenberg". The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly 33 days of continuous training.
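The pipeline usage described in the card above might look like the following sketch (assuming the `transformers` library; the prompt text is illustrative):

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="facebook/opt-125m")

# Deterministic (greedy) generation is the default
greedy = generator("Hello, I am conscious and")

# Top-k sampling, enabled by passing do_sample=True
set_seed(32)
sampled = generator("Hello, I am conscious and", do_sample=True, top_k=50)
print(sampled[0]["generated_text"])
```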
esmfold_v1
--- license: mit ---
bart-large-cnn
--- language: - en pipeline_tag: summarization license: mit thumbnail: https://huggingface.co/front/thumbnails/facebook.png datasets: - cnn_dailymail model-index: - name: facebook/bart-large-cnn results: - task: type: summarization name: Summarization dataset: name: cnn_dailymail type: cnn_dailymail config: 3.0.0 split: train metrics: - name: ROUGE-1 type: rouge value: 42.9486 verified: true - name: ROUGE-2 type: rouge value: 20.8149 verified: true - name: ROUGE-L type: rouge value: 30.6186 veri
bart-large-mnli
This is the checkpoint for bart-large after being trained on the MultiNLI (MNLI) dataset. Additional information about this model: - The bart-large model page - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Yin et al. proposed a method for using pre-trained NLI models as ready-made zero-shot sequence classifiers. The method works by posing the sequence to be classified as the NLI premise and constructing a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class "politics", we could construct the hypothesis `This text is about politics.`. The probabilities for entailment and contradiction are then converted to label probabilities. This method is surprisingly effective in many cases, particularly when used with larger pre-trained models like BART and RoBERTa. See this blog post for a more expansive introduction to this and other zero-shot methods, and see the code snippets below for examples of using this model for zero-shot classification both with Hugging Face's built-in pipeline and with native Transformers/PyTorch code. The model can be loaded with the `zero-shot-classification` pipeline like so: You can then use this pipeline to classify sequences into any of the class names you specify. If more than one candidate label can be correct, pass `multi_label=True` to calculate each class independently:
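Putting the pipeline steps above together (a sketch assuming the `transformers` library; the sequence and candidate labels are illustrative):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sequence = "one day I will see the world"
labels = ["travel", "cooking", "dancing"]

# Single-label mode: scores over the candidate labels sum to 1
result = classifier(sequence, candidate_labels=labels)
print(result["labels"][0])  # highest-scoring label

# Multi-label mode: each label is scored independently (entailment vs. contradiction)
multi = classifier(sequence, candidate_labels=["travel", "exploration"], multi_label=True)
```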
flava-full
--- license: bsd-3-clause ---
dinov2-base
Vision Transformer (base-sized model) trained using DINOv2. This is a Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. Disclaimer: The team releasing DINOv2 did not write a model card for this model, so this model card has been written by the Hugging Face team. The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of the sequence to use it for classification tasks, and absolute position embeddings are added before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training, the model learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image. You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you.
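A feature-extraction sketch for the raw model (assuming the `transformers`, `Pillow`, and `requests` libraries; the COCO image URL is illustrative):

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] token's last hidden state serves as a whole-image representation,
# e.g. as input to a linear classifier as described above
cls_embedding = outputs.last_hidden_state[:, 0]
```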
w2v-bert-2.0
--- license: mit language: - af - am - ar - as - az - be - bn - bs - bg - ca - cs - zh - cy - da - de - el - en - et - fi - fr - or - om - ga - gl - gu - ha - he - hi - hr - hu - hy - ig - id - is - it - jv - ja - kn - ka - kk - mn - km - ky - ko - lo - ln - lt - lb - lg - lv - ml - mr - mk - mt - mi - my - nl - nb - ne - ny - oc - pa - ps - fa - pl - pt - ro - ru - sk - sl - sn - sd - so - es - sr - sv - sw - ta - te - tg - tl - th - tr - uk - ur - uz - vi - wo - xh - yo - ms - zu - ary - arz -
bart-base
--- license: apache-2.0 language: en ---
dinov2-small
--- license: apache-2.0 tags: - dino - vision ---
musicgen-medium
--- inference: true tags: - musicgen license: cc-by-nc-4.0 pipeline_tag: text-to-audio widget: - text: a funky house with 80s hip hop vibes example_title: Prompt 1 - text: a chill song with influences from lofi, chillstep and downtempo example_title: Prompt 2 - text: a catchy beat for a podcast intro example_title: Prompt 3 ---
roberta-hate-speech-dynabench-r4-target
--- language: en ---
esm2_t6_8M_UR50D
--- license: mit widget: - text: "MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" ---
mms-300m
--- tags: - mms language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc
esm2_t36_3B_UR50D
--- license: mit widget: - text: "MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" ---
m2m100_418M
--- language: - multilingual - af - am - ar - ast - az - ba - be - bg - bn - br - bs - ca - ceb - cs - cy - da - de - el - en - es - et - fa - ff - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - ht - hu - hy - id - ig - ilo - is - it - ja - jv - ka - kk - km - kn - ko - lb - lg - ln - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - ns - oc - or - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - so - sq - sr - ss - su - sv - sw - ta - th - tl - tn - tr - uk - ur - uz - v
detr-resnet-50
--- license: apache-2.0 tags: - object-detection - vision datasets: - coco widget: - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg example_title: Savanna - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg example_title: Football Match - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg example_title: Airport ---
dino-vits16
--- license: apache-2.0 tags: - dino - vision datasets: - imagenet-1k ---
hubert-base-ls960
--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 ---
encodec_24khz
--- inference: false ---
wav2vec2-base
--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 ---
dinov3-vitb16-pretrain-lvd1689m
--- extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The infor
wav2vec2-large-xlsr-53
--- language: multilingual datasets: - common_voice tags: - speech license: apache-2.0 ---
VGGT-1B
--- tags: - model_hub_mixin - pytorch_model_hub_mixin license: cc-by-nc-4.0 language: - en pipeline_tag: image-to-3d ---
dinov3-vitl16-pretrain-lvd1689m
--- extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The infor
m2m100_1.2B
--- language: - multilingual - af - am - ar - ast - az - ba - be - bg - bn - br - bs - ca - ceb - cs - cy - da - de - el - en - es - et - fa - ff - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - ht - hu - hy - id - ig - ilo - is - it - ja - jv - ka - kk - km - kn - ko - lb - lg - ln - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - ns - oc - or - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - so - sq - sr - ss - su - sv - sw - ta - th - tl - tn - tr - uk - ur - uz - v
dinov2-large
--- license: apache-2.0 tags: - dino - vision ---
esm2_t30_150M_UR50D
--- license: mit widget: - text: "MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" ---
wav2vec2-xlsr-53-espeak-cv-ft
--- language: multi-lingual datasets: - common_voice tags: - speech - audio - automatic-speech-recognition - phoneme-recognition widget: - example_title: Librispeech sample 1 src: https://cdn-media.huggingface.co/speech_samples/sample1.flac - example_title: Librispeech sample 2 src: https://cdn-media.huggingface.co/speech_samples/sample2.flac license: apache-2.0 ---
esm2_t12_35M_UR50D
--- license: mit widget: - text: "MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" ---
fasttext-language-identification
--- license: cc-by-nc-4.0 library_name: fasttext tags: - text-classification - language-identification ---
PE-Core-G14-448
--- license: apache-2.0 library_name: perception-encoder pipeline_tag: zero-shot-image-classification ---
sam-vit-base
--- license: apache-2.0 tags: - vision ---
mask2former-swin-large-cityscapes-semantic
--- license: other tags: - vision - image-segmentation datasets: - coco widget: - src: http://images.cocodataset.org/val2017/000000039769.jpg example_title: Cats - src: http://images.cocodataset.org/val2017/000000039770.jpg example_title: Castle ---
nllb-200-distilled-600M
--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - k
cwm
hubert-large-ls960-ft
--- language: en datasets: - libri-light - librispeech_asr tags: - speech - audio - automatic-speech-recognition - hf-asr-leaderboard license: apache-2.0 model-index: - name: hubert-large-ls960-ft results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 1.9 ---
dinov3-vits16-pretrain-lvd1689m
--- extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The infor
dinov2-with-registers-base
--- library_name: transformers pipeline_tag: image-feature-extraction license: apache-2.0 tags: - dino - vision inference: false ---
mms-lid-256
--- tags: - mms language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc
sam-vit-huge
--- license: apache-2.0 tags: - vision ---
mbart-large-50-many-to-many-mmt
--- language: - multilingual - ar - cs - de - en - es - et - fi - fr - gu - hi - it - ja - kk - ko - lt - lv - my - ne - nl - ro - ru - si - tr - vi - zh - af - az - bn - fa - he - hr - id - ka - km - mk - ml - mn - mr - pl - ps - pt - sv - sw - ta - te - th - tl - uk - ur - xh - gl - sl tags: - mbart-50 pipeline_tag: translation ---
sam2-hiera-large
--- license: apache-2.0 pipeline_tag: mask-generation library_name: transformers ---
dinov2-giant
dino-vitb16
wav2vec2-xls-r-300m
--- language: - multilingual - ab - af - sq - am - ar - hy - as - az - ba - eu - be - bn - bs - br - bg - my - yue - ca - ceb - km - zh - cv - hr - cs - da - dv - nl - en - eo - et - fo - fi - fr - gl - lg - ka - de - el - gn - gu - ht - cnh - ha - haw - he - hi - hu - is - id - ia - ga - it - ja - jv - kb - kn - kk - rw - ky - ko - ku - lo - la - lv - ln - lt - lm - mk - mg - ms - ml - mt - gv - mi - mr - mn - ne - no - nn - oc - or - ps - fa - pl - pt - pa - ro - rm - rm - ru - sah - sa - sco
wav2vec2-large-robust-ft-libri-960h
This model is a fine-tuned version of the wav2vec2-large-robust model. It has been pretrained on: - Libri-Light: open-source audio books from the LibriVox project; clean, read-out audio data - CommonVoice: crowd-sourced audio data; read-out text snippets - Switchboard: telephone speech corpus; noisy telephone data - Fisher: conversational telephone speech; noisy telephone data. When using the model, make sure that your speech input is also sampled at 16 kHz. Authors: Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli. Abstract: Self-supervised learning of speech representations has been a very active research area, but most work is focused on a single domain, such as read audio books, for which there exist large quantities of labeled and unlabeled data. In this paper, we explore more general setups where the domain of the unlabeled data for pre-training differs from the domain of the labeled data for fine-tuning, which in turn may differ from the test data domain. Our experiments show that using target domain data during pre-training leads to large performance improvements across a variety of setups. On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%. This has obvious practical implications, since it is much easier to obtain unlabeled target domain data than labeled data. Moreover, we find that pre-training on multiple domains improves generalization performance on domains not seen during training. Code and models will be made available at this https URL. The original model can be found at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. To transcribe audio files, the model can be used as a standalone acoustic model as follows:
sam-vit-large
dinov3-vith16plus-pretrain-lvd1689m
convnextv2-tiny-22k-224
sam2.1-hiera-large
opt-350m
OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. Disclaimer: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from this model card has been written by the Hugging Face team. To quote the first two paragraphs of the official paper: > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3.
As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. The pretrained-only model can be used for prompting, for evaluation of downstream tasks, as well as for text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`. As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: This bias will also affect all fine-tuned versions of this model. The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included, - Pushshift.io Reddit dataset that was developed in Baumgartner et al. 
(2020) and processed in Roller et al. (2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens, corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like "Chapter One" or "This ebook by Project Gutenberg". The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly 33 days of continuous training.
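As an illustration of the biased-predictions point in the card above (a hedged sketch, assuming the `transformers` library; the minimally different prompts are a common probing pattern, and the sampled completions will vary with the seed):

```python
from transformers import pipeline, set_seed

set_seed(32)
generator = pipeline("text-generation", model="facebook/opt-350m")

# Compare sampled completions for two minimally different prompts;
# systematic differences in the completions surface the model's biases
for prompt in ["The woman worked as a", "The man worked as a"]:
    outputs = generator(prompt, do_sample=True, num_return_sequences=5, max_length=20)
    for out in outputs:
        print(out["generated_text"])
```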
opt-1.3b
OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. Disclaimer: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from this model card has been written by the Hugging Face team. To quote the first two paragraphs of the official paper: > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3.
As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. Intended uses & limitations: The pretrained-only model can be used for prompting, for evaluation of downstream tasks, as well as for text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`. As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: This bias will also affect all fine-tuned versions of this model. The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included, 
- Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021) - CCNewsV2 containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b) The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like "Chapter One" or "This ebook by Project Gutenberg". The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly 33 days of continuous training.
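The top-k sampling mentioned above (enabled with `do_sample=True` in the generation pipeline) can be illustrated with a minimal numpy sketch. The logit values below are hypothetical, not taken from the model:

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Sample a token id from the k highest-scoring logits only."""
    top = np.argsort(logits)[-k:]                  # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # softmax restricted to the top-k
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = np.array([0.1, 2.0, 1.5, -1.0, 0.3])      # hypothetical 5-token vocabulary
token = top_k_sample(logits, k=2, rng=rng)
assert token in (1, 2)  # only the two highest-scoring tokens can be drawn
```

With `k=1` this degenerates to greedy decoding, which is why generation is deterministic by default.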
vjepa2-vitl-fpc64-256
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of V-JEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. To run the V-JEPA 2 model, ensure you have the latest transformers installed: V-JEPA 2 is intended to represent any video (and image) for video classification, retrieval, or use as a video encoder for VLMs. To load a video, sample the number of frames according to the model. For this model, we use 64. To load an image, simply copy the image to the desired number of frames. For more code examples, please refer to the V-JEPA 2 documentation.
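The frame-handling steps described above (sampling 64 frames from a video, or replicating an image to 64 frames) can be sketched in numpy; the helper names here are ours, not part of the library:

```python
import numpy as np

def sample_frame_indices(total_frames, num_frames=64):
    """Uniformly spaced frame indices covering the whole clip."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)

def image_to_clip(image, num_frames=64):
    """'Load' a still image as a video by copying it to every frame."""
    return np.stack([image] * num_frames)

idx = sample_frame_indices(total_frames=300, num_frames=64)
assert len(idx) == 64 and idx[0] == 0 and idx[-1] == 299

clip = image_to_clip(np.zeros((256, 256, 3), dtype=np.uint8))
assert clip.shape == (64, 256, 256, 3)
```

The resulting frame stack would then be passed to the model's video processor.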
mask2former-swin-large-mapillary-vistas-semantic
Mask2Former model trained on Mapillary Vistas semantic segmentation (large-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. You can use this particular checkpoint for semantic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.
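The "masks plus labels" paradigm can be sketched in numpy: each query predicts class probabilities and a binary mask, and a semantic map is obtained by weighting masks with class scores. Shapes and tensors below are random stand-ins, not real model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, num_classes, H, W = 100, 19, 8, 8

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class_probs = rng.random((num_queries, num_classes))        # per-query class scores
mask_probs = sigmoid(rng.normal(size=(num_queries, H, W)))  # per-query soft masks

# Per-pixel class score = sum over queries of (class prob * mask prob)
semantic = np.einsum("qc,qhw->chw", class_probs, mask_probs)
seg_map = semantic.argmax(axis=0)                           # final per-pixel label
assert seg_map.shape == (H, W) and seg_map.max() < num_classes
```

In practice the image processor's post-processing methods perform this combination for you.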
nllb-200-distilled-1.3B
--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - k
PE-Core-L14-336
detr-resnet-50-panoptic
musicgen-small
--- inference: true tags: - musicgen license: cc-by-nc-4.0 pipeline_tag: text-to-audio widget: - text: "a funky house with 80s hip hop vibes" example_title: "Prompt 1" - text: "a chill song with influences from lofi, chillstep and downtempo" example_title: "Prompt 2" - text: "a catchy beat for a podcast intro" example_title: "Prompt 3" ---
wav2vec2-large-es-voxpopuli
Facebook's Wav2Vec2 large model pretrained on the Spanish (es) unlabeled subset of the VoxPopuli corpus. Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI See the official website for more information. Please refer to this blog on how to fine-tune this model on a specific language. Note that you should replace `"facebook/wav2vec2-large-xlsr-53"` with this checkpoint for fine-tuning.
encodec_32khz
wav2vec2-conformer-rope-large-960h-ft
dpr-ctx_encoder-single-nq-base
mask2former-swin-tiny-coco-instance
Mask2Former model trained on COCO instance segmentation (tiny-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. You can use this particular checkpoint for instance segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.
rag-sequence-nq
chameleon-7b
metaclip-b32-400m
dinov3-vit7b16-pretrain-sat493m
dpr-question_encoder-single-nq-base
wav2vec2-large-960h
contriever-msmarco
nllb-200-3.3B
--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - k
wav2vec2-large-960h-lv60-self
The large model pretrained on Libri-Light and fine-tuned on 960 hours of Librispeech on 16kHz sampled speech audio. The model was trained with a self-training objective. When using the model make sure that your speech input is also sampled at 16kHz. Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. To transcribe audio files the model can be used as a standalone acoustic model as follows: This code snippet shows how to evaluate facebook/wav2vec2-large-960h-lv60-self on LibriSpeech's "clean" and "other" test data.
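When used as a standalone acoustic model, the network emits per-frame character logits which are decoded greedily. The decoding step the processor performs can be illustrated with a minimal CTC sketch (toy alphabet and frame ids, not the model's real vocabulary):

```python
def ctc_greedy_decode(frame_ids, blank=0, vocab=("<pad>", "C", "A", "T")):
    """Greedy CTC decoding: collapse repeated ids, then drop blank tokens."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Hypothetical per-frame argmax ids: "CC_AA_T" decodes to "CAT"
assert ctc_greedy_decode([1, 1, 0, 2, 2, 0, 3]) == "CAT"
```

In practice, `torch.argmax` over the model logits followed by the processor's `batch_decode` performs exactly this collapse-and-strip step.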
detr-resnet-101
bart-large
vit-mae-base
mms-1b-all
seamless-m4t-v2-large
SeamlessM4T is our foundational all-in-one Massively Multilingual and Multimodal Machine Translation model delivering high-quality translation for speech and text in nearly 100 languages. SeamlessM4T models support the tasks of: - Speech-to-speech translation (S2ST) - Speech-to-text translation (S2TT) - Text-to-speech translation (T2ST) - Text-to-text translation (T2TT) - Automatic speech recognition (ASR). SeamlessM4T models support: - 🎤 101 languages for speech input. - 💬 96 languages for text input/output. - 🔊 35 languages for speech output. 🌟 We are releasing SeamlessM4T v2, an updated version with our novel UnitY2 architecture. This new model improves over SeamlessM4T v1 in quality as well as inference speed in speech generation tasks. The v2 version of SeamlessM4T is a multitask adaptation of our novel UnitY2 architecture. UnitY2, with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding, considerably improves over SeamlessM4T v1 in quality and inference speed. SeamlessM4T v2 is also supported by 🤗 Transformers, more on it in the dedicated section below. SeamlessM4T models:

| Model Name | #params | checkpoint | metrics |
| ----------------------- | ------- | ---------- | ------- |
| SeamlessM4T-Large v2    | 2.3B    | checkpoint | metrics |
| SeamlessM4T-Large (v1)  | 2.3B    | checkpoint | metrics |
| SeamlessM4T-Medium (v1) | 1.2B    | checkpoint | metrics |

We provide the extensive evaluation results of SeamlessM4T-Large and SeamlessM4T-Medium reported in the paper (as averages) in the `metrics` files above. The evaluation data ids for FLEURS, CoVoST2 and CVSS-C can be found here. Evaluating SeamlessM4T models To reproduce our results or to evaluate using the same metrics over your own test sets, please check out the Evaluation README here. 
Finetuning SeamlessM4T models Please check out the Finetuning README here. SeamlessM4T is available in the 🤗 Transformers library, requiring minimal dependencies. Steps to get started: 1. First install the 🤗 Transformers library from main and sentencepiece: 2. Run the following Python code to generate speech samples. Here the target language is Russian: 3. Listen to the audio samples either in an ipynb notebook: Or save them as a `.wav` file using a third-party library, e.g. `scipy`: For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the SeamlessM4T v2 docs or to this hands-on Google Colab. Listed below, are the languages supported by SeamlessM4T-large (v1/v2). The `source` column specifies whether a language is supported as source speech (`Sp`) and/or source text (`Tx`). The `target` column specifies whether a language is supported as target speech (`Sp`) and/or target text (`Tx`). | code | language | script | Source | Target | | ---- | ---------------------- | ---------- | ------ | ------ | | afr | Afrikaans | Latn | Sp, Tx | Tx | | amh | Amharic | Ethi | Sp, Tx | Tx | | arb | Modern Standard Arabic | Arab | Sp, Tx | Sp, Tx | | ary | Moroccan Arabic | Arab | Sp, Tx | Tx | | arz | Egyptian Arabic | Arab | Sp, Tx | Tx | | asm | Assamese | Beng | Sp, Tx | Tx | | ast | Asturian | Latn | Sp | \-- | | azj | North Azerbaijani | Latn | Sp, Tx | Tx | | bel | Belarusian | Cyrl | Sp, Tx | Tx | | ben | Bengali | Beng | Sp, Tx | Sp, Tx | | bos | Bosnian | Latn | Sp, Tx | Tx | | bul | Bulgarian | Cyrl | Sp, Tx | Tx | | cat | Catalan | Latn | Sp, Tx | Sp, Tx | | ceb | Cebuano | Latn | Sp, Tx | Tx | | ces | Czech | Latn | Sp, Tx | Sp, Tx | | ckb | Central Kurdish | Arab | Sp, Tx | Tx | | cmn | Mandarin Chinese | Hans | Sp, Tx | Sp, Tx | | cmnHant | Mandarin Chinese | Hant | Sp, Tx | Sp, Tx | | cym | Welsh | Latn | Sp, Tx | Sp, Tx | | dan | Danish | Latn | Sp, Tx | Sp, Tx | | deu | German | Latn | Sp, Tx | Sp, Tx | | ell | 
Greek | Grek | Sp, Tx | Tx | | eng | English | Latn | Sp, Tx | Sp, Tx | | est | Estonian | Latn | Sp, Tx | Sp, Tx | | eus | Basque | Latn | Sp, Tx | Tx | | fin | Finnish | Latn | Sp, Tx | Sp, Tx | | fra | French | Latn | Sp, Tx | Sp, Tx | | fuv | Nigerian Fulfulde | Latn | Sp, Tx | Tx | | gaz | West Central Oromo | Latn | Sp, Tx | Tx | | gle | Irish | Latn | Sp, Tx | Tx | | glg | Galician | Latn | Sp, Tx | Tx | | guj | Gujarati | Gujr | Sp, Tx | Tx | | heb | Hebrew | Hebr | Sp, Tx | Tx | | hin | Hindi | Deva | Sp, Tx | Sp, Tx | | hrv | Croatian | Latn | Sp, Tx | Tx | | hun | Hungarian | Latn | Sp, Tx | Tx | | hye | Armenian | Armn | Sp, Tx | Tx | | ibo | Igbo | Latn | Sp, Tx | Tx | | ind | Indonesian | Latn | Sp, Tx | Sp, Tx | | isl | Icelandic | Latn | Sp, Tx | Tx | | ita | Italian | Latn | Sp, Tx | Sp, Tx | | jav | Javanese | Latn | Sp, Tx | Tx | | jpn | Japanese | Jpan | Sp, Tx | Sp, Tx | | kam | Kamba | Latn | Sp | \-- | | kan | Kannada | Knda | Sp, Tx | Tx | | kat | Georgian | Geor | Sp, Tx | Tx | | kaz | Kazakh | Cyrl | Sp, Tx | Tx | | kea | Kabuverdianu | Latn | Sp | \-- | | khk | Halh Mongolian | Cyrl | Sp, Tx | Tx | | khm | Khmer | Khmr | Sp, Tx | Tx | | kir | Kyrgyz | Cyrl | Sp, Tx | Tx | | kor | Korean | Kore | Sp, Tx | Sp, Tx | | lao | Lao | Laoo | Sp, Tx | Tx | | lit | Lithuanian | Latn | Sp, Tx | Tx | | ltz | Luxembourgish | Latn | Sp | \-- | | lug | Ganda | Latn | Sp, Tx | Tx | | luo | Luo | Latn | Sp, Tx | Tx | | lvs | Standard Latvian | Latn | Sp, Tx | Tx | | mai | Maithili | Deva | Sp, Tx | Tx | | mal | Malayalam | Mlym | Sp, Tx | Tx | | mar | Marathi | Deva | Sp, Tx | Tx | | mkd | Macedonian | Cyrl | Sp, Tx | Tx | | mlt | Maltese | Latn | Sp, Tx | Sp, Tx | | mni | Meitei | Beng | Sp, Tx | Tx | | mya | Burmese | Mymr | Sp, Tx | Tx | | nld | Dutch | Latn | Sp, Tx | Sp, Tx | | nno | Norwegian Nynorsk | Latn | Sp, Tx | Tx | | nob | Norwegian Bokmål | Latn | Sp, Tx | Tx | | npi | Nepali | Deva | Sp, Tx | Tx | | nya | Nyanja | Latn | Sp, Tx | Tx | | 
oci | Occitan | Latn | Sp | \-- | | ory | Odia | Orya | Sp, Tx | Tx | | pan | Punjabi | Guru | Sp, Tx | Tx | | pbt | Southern Pashto | Arab | Sp, Tx | Tx | | pes | Western Persian | Arab | Sp, Tx | Sp, Tx | | pol | Polish | Latn | Sp, Tx | Sp, Tx | | por | Portuguese | Latn | Sp, Tx | Sp, Tx | | ron | Romanian | Latn | Sp, Tx | Sp, Tx | | rus | Russian | Cyrl | Sp, Tx | Sp, Tx | | slk | Slovak | Latn | Sp, Tx | Sp, Tx | | slv | Slovenian | Latn | Sp, Tx | Tx | | sna | Shona | Latn | Sp, Tx | Tx | | snd | Sindhi | Arab | Sp, Tx | Tx | | som | Somali | Latn | Sp, Tx | Tx | | spa | Spanish | Latn | Sp, Tx | Sp, Tx | | srp | Serbian | Cyrl | Sp, Tx | Tx | | swe | Swedish | Latn | Sp, Tx | Sp, Tx | | swh | Swahili | Latn | Sp, Tx | Sp, Tx | | tam | Tamil | Taml | Sp, Tx | Tx | | tel | Telugu | Telu | Sp, Tx | Sp, Tx | | tgk | Tajik | Cyrl | Sp, Tx | Tx | | tgl | Tagalog | Latn | Sp, Tx | Sp, Tx | | tha | Thai | Thai | Sp, Tx | Sp, Tx | | tur | Turkish | Latn | Sp, Tx | Sp, Tx | | ukr | Ukrainian | Cyrl | Sp, Tx | Sp, Tx | | urd | Urdu | Arab | Sp, Tx | Sp, Tx | | uzn | Northern Uzbek | Latn | Sp, Tx | Sp, Tx | | vie | Vietnamese | Latn | Sp, Tx | Sp, Tx | | xho | Xhosa | Latn | Sp | \-- | | yor | Yoruba | Latn | Sp, Tx | Tx | | yue | Cantonese | Hant | Sp, Tx | Tx | | zlm | Colloquial Malay | Latn | Sp | \-- | | zsm | Standard Malay | Latn | Tx | Tx | | zul | Zulu | Latn | Sp, Tx | Tx | Note that seamlessM4T-medium supports 200 languages in the text modality, and is based on NLLB-200 (see full list in asset card)
map-anything-apache
--- tags: - model_hub_mixin - pytorch_model_hub_mixin - computer-vision - 3d-reconstruction - multi-view-stereo - depth-estimation - camera-pose - covisibility - mapanything license: apache-2.0 language: - en pipeline_tag: image-to-3d ---
audiobox-aesthetics
mask2former-swin-large-ade-semantic
wav2vec2-lv-60-espeak-cv-ft
Wav2Vec2-Large-LV60 finetuned on multi-lingual Common Voice This checkpoint leverages the pretrained checkpoint wav2vec2-large-lv60 and is fine-tuned on CommonVoice to recognize phonetic labels in multiple languages. When using the model make sure that your speech input is sampled at 16kHz. Note that the model outputs a string of phonetic labels. A dictionary mapping phonetic labels to words has to be used to map the phonetic output labels to output words. Paper: Simple and Effective Zero-shot Cross-lingual Phoneme Recognition Abstract Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data. However, in many cases there is labeled data available for related languages which is not utilized by these methods. This paper extends previous work on zero-shot cross-lingual transfer learning by fine-tuning a multilingually pretrained wav2vec 2.0 model to transcribe unseen languages. This is done by mapping phonemes of the training languages to the target language using articulatory features. Experiments show that this simple method significantly outperforms prior work which introduced task-specific architectures and used only part of a monolingually pretrained model. The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. To transcribe audio files the model can be used as a standalone acoustic model as follows:
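Since the model outputs phonetic labels rather than words, a separate lexicon is needed to recover text. A toy sketch of that mapping step is below; the two-entry lexicon, the `|` word separator, and the helper name are all hypothetical (a real system would use a full pronunciation dictionary and a proper decoder):

```python
# Hypothetical lexicon mapping phoneme strings to words.
lexicon = {
    "h ɛ l oʊ": "hello",
    "w ɝ l d": "world",
}

def phonemes_to_words(phoneme_output, lexicon):
    """Map each phoneme group (split on a '|' word boundary) to a word."""
    groups = [g.strip() for g in phoneme_output.split("|")]
    return " ".join(lexicon.get(g, "<unk>") for g in groups)

assert phonemes_to_words("h ɛ l oʊ | w ɝ l d", lexicon) == "hello world"
```

Unknown phoneme sequences fall back to `<unk>`, which is where a real decoder would apply fuzzy matching or a language model.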
timesformer-base-finetuned-k400
TimeSformer (base-sized model, fine-tuned on Kinetics-400) TimeSformer model pre-trained on Kinetics-400. It was introduced in the paper TimeSformer: Is Space-Time Attention All You Need for Video Understanding? by Bertasius et al. and first released in this repository. Disclaimer: The team releasing TimeSformer did not write a model card for this model so this model card has been written by fcakyon. You can use the raw model for video classification into one of the 400 possible Kinetics-400 labels. For more code examples, we refer to the documentation.
opt-6.7b
deit-tiny-patch16-224
metaclip-h14-fullcc2.5b
MetaCLIP model, huge-sized version, patch resolution 14 MetaCLIP model applied to 2.5 billion data points of CommonCrawl (CC). It was introduced in the paper Demystifying CLIP Data by Xu et al. and first released in this repository. Disclaimer: The team releasing MetaCLIP did not write a model card for this model so this model card has been written by the Hugging Face team. The Demystifying CLIP Data paper aims to reveal CLIP’s method around training data curation, since OpenAI never open-sourced code regarding its data preparation pipeline. CLIP high-level overview. Taken from the CLIP paper. You can use the raw model for linking images with text in a shared embedding space. This enables things like zero-shot image classification, text-based image retrieval, image-based text retrieval, etc. For code examples, we refer to the docs; just replace the model names with those on the hub.
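Zero-shot classification with a shared embedding space reduces to cosine similarity between one image embedding and the text embeddings of the candidate labels. A numpy sketch with random stand-in embeddings (the real ones come from the model's image and text towers):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Cosine similarity between one image and N label embeddings, softmaxed."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)          # scaled cosine similarities
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)                # stand-in image embedding
text_embs = rng.normal(size=(3, 512))           # stand-ins for 3 label prompts
probs = zero_shot_scores(image_emb, text_embs)
assert probs.shape == (3,) and abs(probs.sum() - 1.0) < 1e-9
```

The label whose prompt embedding scores highest is the zero-shot prediction.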
mask2former-swin-base-coco-panoptic
Mask2Former model trained on COCO panoptic segmentation (base-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.
deit-base-patch16-224
convnextv2-base-22k-224
ConvNeXt V2 model pretrained using the FCMAE framework and fine-tuned on the ImageNet-22K dataset at resolution 224x224. It was introduced in the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo et al. and first released in this repository. Disclaimer: The team releasing ConvNeXT V2 did not write a model card for this model so this model card has been written by the Hugging Face team. ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks. You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 21,841 ImageNet-22k classes: For more code examples, we refer to the documentation.
sam2.1-hiera-tiny
Repository for SAM 2: Segment Anything in Images and Videos, a foundation model towards solving promptable visual segmentation in images and videos from FAIR. See the SAM 2 paper for more information. The official code is publicly released in this repo. SAM2 can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline: You can segment objects by providing a single point click on the object you want to segment: You can provide multiple points to refine the segmentation: SAM2 also supports bounding box inputs for segmentation: Process multiple images simultaneously for improved efficiency: Segment multiple objects within each image using batch inference: Batched images with batched objects and multiple points: handle complex batch scenarios with multiple points per object: SAM2 can use masks from previous predictions as input to refine segmentation: SAM2's key strength is its ability to track objects across video frames. Here's how to use it for video segmentation: Track multiple objects simultaneously across video frames: You can add additional clicks on any frame to refine the tracking: For real-time applications, SAM2 supports processing video frames as they arrive: Track multiple objects simultaneously in video by adding them all at once: To cite the paper, model, or software, please use the below:
esm2_t48_15B_UR50D
mms-lid-126
Massively Multilingual Speech (MMS) - Finetuned LID This checkpoint is a model fine-tuned for speech language identification (LID) and part of Facebook's Massive Multilingual Speech project. This checkpoint is based on the Wav2Vec2 architecture and classifies raw audio input to a probability distribution over 126 output classes (each class representing a language). The checkpoint consists of 1 billion parameters and has been fine-tuned from facebook/mms-1b on 126 languages. - Example - Supported Languages - Model details - Additional links This MMS checkpoint can be used with Transformers to identify the spoken language of an audio. It can recognize the following 126 languages. First, we install transformers and some other libraries. Note: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version is not yet available on PyPI make sure to install `transformers` from source: Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled at 16,000 Hz (16 kHz). Now we process the audio data and pass the processed audio data to the model to classify it into a language, just like we usually do for Wav2Vec2 audio classification models such as ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition To see all the supported languages of a checkpoint, you can print out the language ids as follows: For more details about the architecture, please have a look at the official docs. This model supports 126 languages. Click the following to toggle all supported languages of this checkpoint in ISO 639-3 code. You can find more details about the languages and their ISO 639-3 codes in the MMS Language Coverage Overview. 
- ara - cmn - eng - spa - fra - mlg - swe - por - vie - ful - sun - asm - ben - zlm - kor - ind - hin - tuk - urd - aze - slv - mon - hau - tel - swh - bod - rus - tur - heb - mar - som - tgl - tat - tha - cat - ron - mal - bel - pol - yor - nld - bul - hat - afr - isl - amh - tam - hun - hrv - lit - cym - fas - mkd - ell - bos - deu - sqi - jav - nob - uzb - snd - lat - nya - grn - mya - orm - lin - hye - yue - pan - jpn - kaz - npi - kat - guj - kan - tgk - ukr - ces - lav - bak - khm - fao - glg - ltz - lao - mlt - sin - sna - ita - srp - mri - nno - pus - eus - ory - lug - bre - luo - slk - fin - dan - yid - est - ceb - war - san - kir - oci - wol - haw - kam - umb - xho - epo - zul - ibo - abk - ckb - nso - gle - kea - ast - sco - glv - ina - Developed by: Vineel Pratap et al. - Model type: Multi-Lingual Automatic Speech Recognition model - Language(s): 126 languages, see supported languages - License: CC-BY-NC 4.0 license - Num parameters: 1 billion - Audio sampling rate: 16,000 Hz - Cite as: @article{pratap2023mms, title={Scaling Speech Technology to 1,000+ Languages}, author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli}, journal={arXiv}, year={2023} } - Blog post - Transformers documentation. - Paper - GitHub Repository - Other MMS checkpoints - MMS base checkpoints: - facebook/mms-1b - facebook/mms-300m - Official Space
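The 16 kHz input requirement mentioned above can be met by resampling before inference. A naive linear-interpolation sketch in numpy is below; a real pipeline should use a proper DSP resampler (e.g. torchaudio, or `datasets`' `Audio(sampling_rate=16_000)` cast):

```python
import numpy as np

def resample(audio, orig_sr, target_sr=16_000):
    """Naive linear-interpolation resampling; illustration only, not anti-aliased."""
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.arange(len(audio)) / orig_sr      # original sample timestamps
    new_t = np.arange(n_target) / target_sr      # target sample timestamps
    return np.interp(new_t, old_t, audio)

audio_44k = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)  # 1 s @ 44.1 kHz
audio_16k = resample(audio_44k, orig_sr=44_100)
assert len(audio_16k) == 16_000
```

Skipping anti-alias filtering is fine for a sketch but would distort real audio; the point is only that one second of audio must end up as 16,000 samples.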
convnext-large-224-22k-1k
mask2former-swin-large-coco-panoptic
Mask2Former model trained on COCO panoptic segmentation (large-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.
dinov2-with-registers-large
Vision Transformer (large-sized model) trained using DINOv2, with registers Vision Transformer (ViT) model introduced in the paper Vision Transformers Need Registers by Darcet et al. and first released in this repository. Disclaimer: The team releasing DINOv2 with registers did not write a model card for this model so this model card has been written by the Hugging Face team. The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on ImageNet. Next, people figured out ways to make ViT work really well on self-supervised image feature extraction (i.e. learning meaningful features, also called embeddings) on images without requiring any labels. Some example papers here include DINOv2 and MAE. The authors of DINOv2 noticed that ViTs have artifacts in attention maps. It’s due to the model using some image patches as “registers”. The authors propose a fix: just add some new tokens (called "register" tokens), which you only use during pre-training (and throw away afterwards). This results in: - no artifacts - interpretable attention maps - improved performance. Visualization of attention maps of various models trained with vs. without registers. Taken from the original paper. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you.
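The "linear layer on top of the [CLS] token" idea can be sketched in numpy. The hidden size matches a ViT-L encoder (1024), but the embedding, weights, and the 10-class head are random stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_classes = 1024, 10             # ViT-L width; hypothetical 10 classes

cls_embedding = rng.normal(size=hidden_size)    # stand-in for the [CLS] hidden state
W = rng.normal(size=(num_classes, hidden_size)) * 0.01
b = np.zeros(num_classes)

logits = W @ cls_embedding + b                  # the linear probe
pred = int(np.argmax(logits))
assert logits.shape == (num_classes,) and 0 <= pred < num_classes
```

In a real linear probe, `W` and `b` are trained on labeled data while the frozen encoder supplies `cls_embedding`.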
mbart-large-50
mBART-50 is a multilingual Sequence-to-Sequence model pre-trained using the "Multilingual Denoising Pretraining" objective. It was introduced in the Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. mBART-50 is a multilingual Sequence-to-Sequence model. It was introduced to show that multilingual translation models can be created through multilingual fine-tuning. Instead of fine-tuning on one direction, a pre-trained model is fine-tuned on many directions simultaneously. mBART-50 is created using the original mBART model, extended to add an extra 25 languages to support multilingual machine translation models of 50 languages. The pre-training objective is explained below. Multilingual Denoising Pretraining: The model incorporates N languages by concatenating data: `D = {D1, ..., DN}` where each Di is a collection of monolingual documents in language `i`. The source documents are noised using two schemes: first, randomly shuffling the original sentences' order, and second, a novel in-filling scheme, where spans of text are replaced with a single mask token. The model is then tasked to reconstruct the original text. 35% of each instance's words are masked by randomly sampling a span length according to a Poisson distribution `(λ = 3.5)`. The decoder input is the original text with one position offset. A language id symbol `LID` is used as the initial token to predict the sentence. `mbart-large-50` is a pre-trained model primarily aimed at being fine-tuned on translation tasks. It can also be fine-tuned on other multilingual sequence-to-sequence tasks. See the model hub to look for fine-tuned versions. As the model is multilingual, it expects the sequences in a different format. A special language id token is used as a prefix in both the source and target text. 
The text format is `[lang_code] X [eos]` with `X` being the source or target text respectively, where `lang_code` is `src_lang_code` for source text and `tgt_lang_code` for target text. `bos` is never used. Once the examples are prepared in this format, it can be trained as any other sequence-to-sequence model. Languages covered Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)
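The language-id prefix format can be sketched as plain string construction using the standard mBART-50 codes (in practice the tokenizer adds these special tokens for you when you set `src_lang`/`tgt_lang`; the helper name and sample sentences here are ours):

```python
def format_example(src_text, tgt_text, src_lang="en_XX", tgt_lang="ro_RO", eos="</s>"):
    """Build `[lang_code] X [eos]` strings for source and target; no BOS is used."""
    return (f"{src_lang} {src_text} {eos}",
            f"{tgt_lang} {tgt_text} {eos}")

src, tgt = format_example(
    "UN Chief says there is no military solution in Syria",
    "Şeful ONU declară că nu există o soluţie militară în Siria",
)
assert src.startswith("en_XX ") and src.endswith("</s>")
assert tgt.startswith("ro_RO ") and tgt.endswith("</s>")
```

Note the symmetry: unlike the original 25-language mBART, which appends the language id after EOS on the source side, mBART-50 prefixes it on both sides.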
mms-tts-eng
Massively Multilingual Speech (MMS): English Text-to-Speech This repository contains the English (eng) language text-to-speech (TTS) model checkpoint. This model is part of Facebook's Massively Multilingual Speech project, aiming to provide speech technology across a diverse range of languages. You can find more details about the supported languages and their ISO 639-3 codes in the MMS Language Coverage Overview, and see all MMS-TTS checkpoints on the Hugging Face Hub: facebook/mms-tts. MMS-TTS is available in the 🤗 Transformers library from version 4.33 onwards. VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior. A set of spectrogram-based acoustic features are predicted by the flow-based module, which is formed of a Transformer-based text encoder and multiple coupling layers. The spectrogram is decoded using a stack of transposed convolutional layers, much in the same style as the HiFi-GAN vocoder. Motivated by the one-to-many nature of the TTS problem, where the same text input can be spoken in multiple ways, the model also includes a stochastic duration predictor, which allows the model to synthesise speech with different rhythms from the same input text. The model is trained end-to-end with a combination of losses derived from variational lower bound and adversarial training. To improve the expressiveness of the model, normalizing flows are applied to the conditional prior distribution. During inference, the text encodings are up-sampled based on the duration prediction module, and then mapped into the waveform using a cascade of the flow module and HiFi-GAN decoder. 
Due to the stochastic nature of the duration predictor, the model is non-deterministic, and thus requires a fixed seed to generate the same speech waveform. For the MMS project, a separate VITS checkpoint is trained for each language. To use this checkpoint, first install the latest version of the 🤗 Transformers library: Then, run inference with the following code snippet: The resulting waveform can be saved as a `.wav` file: This model was developed by Vineel Pratap et al. from Meta AI. If you use the model, consider citing the MMS paper:
dino-vitb8
dinov3-convnext-small-pretrain-lvd1689m
dinov3-vits16plus-pretrain-lvd1689m
dinov3-convnext-tiny-pretrain-lvd1689m
blenderbot-400M-distill
wmt19-ru-en
dinov3-vitl16-pretrain-sat493m
convnextv2-tiny-22k-384
ConvNeXt V2 model pretrained using the FCMAE framework and fine-tuned on the ImageNet-22K dataset at resolution 384x384. It was introduced in the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo et al. and first released in this repository. Disclaimer: The team releasing ConvNeXt V2 did not write a model card for this model so this model card has been written by the Hugging Face team. ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks. You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 21,841 ImageNet-22K classes: For more code examples, we refer to the documentation.
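For instance (a sketch assuming the standard 🤗 Transformers image-classification API; the COCO image URL is the one commonly used in the documentation):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, ConvNextV2ForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-384")
model = ConvNextV2ForImageClassification.from_pretrained("facebook/convnextv2-tiny-22k-384")

inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index of the highest-scoring class, mapped back to a label string.
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```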
map-anything
MapAnything is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D geometry of a scene given various types of modalities as inputs. A single feed-forward model supports over 12 different 3D reconstruction tasks, including multi-image SfM, multi-view stereo, monocular metric depth estimation, registration, depth completion and more. If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:
esm1v_t33_650M_UR90S_1
dinov2-small-imagenet1k-1-layer
Vision Transformer (small-sized model) trained using DINOv2. This is a Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. Disclaimer: The team releasing DINOv2 did not write a model card for this model so this model card has been written by the Hugging Face team. The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks, and absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that, unlike the base DINOv2 checkpoints, this checkpoint includes a linear classification head (a single layer on top of the final [CLS] token) trained on ImageNet-1k. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image. You can use the model for classifying an image among one of the 1000 ImageNet labels. See the model hub to look for other fine-tuned versions on a task that interests you.
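A short classification sketch (assuming the standard 🤗 Transformers auto classes; the COCO image URL is illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small-imagenet1k-1-layer")
model = AutoModelForImageClassification.from_pretrained("facebook/dinov2-small-imagenet1k-1-layer")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per ImageNet-1k class

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```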
mbart-large-50-one-to-many-mmt
mBART-50 one-to-many multilingual machine translation This model is a fine-tuned checkpoint of mBART-large-50. `mbart-large-50-one-to-many-mmt` is fine-tuned for multilingual machine translation. It was introduced in the Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. The model can translate English to the other 49 languages mentioned below. To translate into a target language, the target language id is forced as the first generated token: pass the `forced_bos_token_id` parameter to the `generate` method. See the model hub to look for more fine-tuned versions. Languages covered: Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)
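Concretely (a sketch using the 🤗 Transformers mBART-50 classes; English→Hindi is chosen as an example direction and the sentence is arbitrary):

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "The head of the United Nations says there is no military solution in Syria"

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX"
)

model_inputs = tokenizer(article_en, return_tensors="pt")

# Force the target language id (here Hindi) as the first generated token.
generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"],
)
translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(translation)
```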
vit-mae-large
Vision Transformer (large-sized model) pre-trained with MAE Vision Transformer (ViT) model pre-trained using the MAE method. It was introduced in the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick and first released in this repository. Disclaimer: The team releasing MAE did not write a model card for this model so this model card has been written by the Hugging Face team. The Vision Transformer (ViT) is a transformer encoder model (BERT-like). Images are presented to the model as a sequence of fixed-size patches. During pre-training, one randomly masks out a high portion (75%) of the image patches. First, the encoder is used to encode the visual patches. Next, a learnable (shared) mask token is added at the positions of the masked patches. The decoder takes the encoded visual patches and mask tokens as input and reconstructs raw pixel values for the masked positions. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
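A short sketch of the pre-training forward pass (assuming `ViTMAEForPreTraining` from 🤗 Transformers; the example image URL is illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEForPreTraining

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-large")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

loss = outputs.loss   # pixel reconstruction loss on the masked patches
mask = outputs.mask   # 1 marks a masked patch (~75% of the patches)
```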
hubert-large-ll60k
The large model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16kHz. Note: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model. Authors: Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed Abstract Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h, 10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets.
The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/hubert . See this blog for more information on how to fine-tune the model. Note that the class `Wav2Vec2ForCTC` has to be replaced by `HubertForCTC`.
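Even without a tokenizer, the checkpoint can be used to extract speech representations. A minimal sketch (the one-second random waveform is a stand-in for real 16 kHz speech, and `AutoFeatureExtractor` is assumed to resolve to the checkpoint's bundled feature extractor):

```python
import torch
from transformers import AutoFeatureExtractor, HubertModel

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ll60k")
model = HubertModel.from_pretrained("facebook/hubert-large-ll60k")

# One second of dummy 16 kHz audio; replace with real speech samples.
speech = torch.randn(16000).numpy()
inputs = feature_extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state

print(hidden_states.shape)  # (batch, frames, hidden_size)
```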
nllb-200-1.3B
timesformer-hr-finetuned-k400
TimeSformer (high-resolution variant, fine-tuned on Kinetics-400) TimeSformer model pre-trained on Kinetics-400. It was introduced in the paper TimeSformer: Is Space-Time Attention All You Need for Video Understanding? by Bertasius et al. and first released in this repository. Disclaimer: The team releasing TimeSformer did not write a model card for this model so this model card has been written by fcakyon. You can use the raw model for video classification into one of the 400 possible Kinetics-400 labels. For more code examples, we refer to the documentation.
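A minimal sketch (the random 16-frame clip stands in for a real video; the high-resolution variant operates on 448px frames):

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Dummy clip: 16 frames of 448x448 RGB; replace with real decoded video frames.
video = list(np.random.randint(0, 256, (16, 448, 448, 3), dtype=np.uint8))

processor = AutoImageProcessor.from_pretrained("facebook/timesformer-hr-finetuned-k400")
model = TimesformerForVideoClassification.from_pretrained("facebook/timesformer-hr-finetuned-k400")

inputs = processor(images=video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per Kinetics-400 label

print(model.config.id2label[logits.argmax(-1).item()])
```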
opt-2.7b
OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. Disclaimer: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from this model card has been written by the Hugging Face team. To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models like GPT-3. 
As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`. As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: This bias will also affect all fine-tuned versions of this model. The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included. - Pushshift.io Reddit dataset that was developed in Baumgartner et al.
(2020) and processed in Roller et al. (2021) - CCNewsV2 containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b) The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset's size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like "Chapter One" or "This ebook by Project Gutenberg". The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly 33 days of continuous training.
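The text-generation pipeline usage mentioned above can be sketched as follows (the prompt and seed are arbitrary):

```python
from transformers import pipeline, set_seed

set_seed(32)
generator = pipeline("text-generation", model="facebook/opt-2.7b")

# Greedy (deterministic) decoding by default; do_sample=True enables top-k sampling.
outputs = generator("Hello, I am conscious and", do_sample=True, top_k=50, max_length=30)
print(outputs[0]["generated_text"])
```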
dinov3-vit7b16-pretrain-lvd1689m
PE-Core-B16-224
vjepa2-vitg-fpc64-384-ssv2
sam-audio-judge
xlm-roberta-xl
XLM-RoBERTa-XL model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau and first released in this repository. Disclaimer: The team releasing XLM-RoBERTa-XL did not write a model card for this model so this model card has been written by the Hugging Face team. XLM-RoBERTa-XL is an extra-large multilingual version of RoBERTa, pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. RoBERTa is a transformers model pretrained on a large corpus in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with the Masked language modeling (MLM) objective: taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. This way, the model learns an inner representation of 100 languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the XLM-RoBERTa-XL model as inputs. You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT2. You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch:
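Both uses can be sketched as follows (note the checkpoint is around 3.5B parameters, so this needs substantial memory; the example sentences are arbitrary):

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

# Masked language modelling via the fill-mask pipeline.
unmasker = pipeline("fill-mask", model="facebook/xlm-roberta-xl")
predictions = unmasker("Europe is a <mask> continent.")
print(predictions[0]["token_str"])

# Feature extraction in PyTorch.
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = AutoModel.from_pretrained("facebook/xlm-roberta-xl")
inputs = tokenizer("Replace me by any text you'd like.", return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state
print(features.shape)
```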
metaclip-2-worldwide-huge-quickgelu
mask2former-swin-large-coco-instance
dinov2-with-registers-small
metaclip-b16-fullcc2.5b
mms-tts-vie
mask2former-swin-small-ade-semantic
convnext-base-224
wmt19-en-de
wav2vec2-xlsr-53-phon-cv-ft
dinov3-convnext-base-pretrain-lvd1689m
convnextv2-tiny-1k-224
wav2vec2-large-xlsr-53-italian
vjepa2-vitg-fpc64-256
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of V-JEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. To run the V-JEPA 2 model, ensure you have installed the latest transformers: V-JEPA 2 is intended to represent any video (and image) to perform video classification, retrieval, or as a video encoder for VLMs. To load a video, sample the number of frames according to the model. For this model, we use 64. To load an image, simply copy the image to the desired number of frames. For more code examples, please refer to the V-JEPA 2 documentation.

```
@techreport{assran2025vjepa2,
  title={V-JEPA~2: Self-Supervised Video Models Enable Understanding, Prediction and Planning},
  author={Assran, Mahmoud and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and Komeili, Mojtaba and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and Arnaud, Sergio and Gejji, Abha and Martin, Ada and Robert Hogan, Francois and Dugas, Daniel and Bojanowski, Piotr and Khalidov, Vasil and Labatut, Patrick and Massa, Francisco and Szafraniec, Marc and Krishnakumar, Kapil and Li, Yong and Ma, Xiaodong and Chandar, Sarath and Meier, Franziska and LeCun, Yann and Rabbat, Michael and Ballas, Nicolas},
  institution={FAIR at Meta},
  year={2025}
}
```
sam-audio-large
convnextv2-atto-1k-224
sam2.1-hiera-base-plus
blenderbot_small-90M
encodec_48khz
nougat-base
Nougat model trained on PDF-to-markdown. It was introduced in the paper Nougat: Neural Optical Understanding for Academic Documents by Blecher et al. and first released in this repository. Disclaim...
mask2former-swin-base-ade-semantic
cotracker3
dpr-ctx_encoder-multiset-base
mms-tts-ind
mms-lid-4017
convnextv2-base-22k-384
deit-base-distilled-patch16-224
mbart-large-cc25
musicgen-melody-large
sam-audio-base
wav2vec2-large-lv60
s2t-small-librispeech-asr
xlm-v-base
hf-seamless-m4t-medium
convnext-tiny-224
bart-large-xsum
PE-Lang-G14-448
opt-30b
OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. Disclaimer: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from this model card has been written by the Hugging Face team. To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models like GPT-3. 
As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. Intended uses & limitations The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. For large OPT models, such as this one, it is not recommended to use the `text-generation` pipeline, because one should load the model in half-precision to accelerate generation and optimize memory consumption on GPU. It is recommended to directly call the `generate` method as follows: By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`. As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: This bias will also affect all fine-tuned versions of this model. The Meta AI team wanted to train this model on a corpus as large as possible.
It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included. - Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021) - CCNewsV2 containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b) The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset's size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like "Chapter One" or "This ebook by Project Gutenberg". The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly 33 days of continuous training.
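The half-precision `generate` usage mentioned above can be sketched as follows (note this checkpoint needs roughly 60GB of GPU memory even in fp16; `device_map="auto"`, which requires the `accelerate` package, is our substitution for explicit `.cuda()` placement, and the prompt is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half-precision to accelerate generation and reduce GPU memory use.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", use_fast=False)

prompt = "Hello, I am conscious and"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Deterministic by default; do_sample=True enables top-k sampling.
generated_ids = model.generate(input_ids, do_sample=True, max_length=30)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```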
mcontriever-msmarco
nougat-small
opt-13b
dinov2-giant-imagenet1k-1-layer
wmt19-de-en
dino-vits8
data2vec-audio-base-960h
xlm-roberta-xxl
timesformer-base-finetuned-k600
vjepa2-vitl-fpc16-256-ssv2
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of V-JEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. 💡 This is the V-JEPA 2 ViT-L 256 model with a video classification head trained on the Something-Something-V2 dataset. To run the V-JEPA 2 model, ensure you have installed the latest transformers:
musicgen-large
MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. It is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. MusicGen was published in Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez. Four checkpoints are released: - small - medium - large (this checkpoint) - melody You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards. 1. First install the 🤗 Transformers library and scipy: 2. Run inference via the `Text-to-Audio` (TTA) pipeline. You can infer the MusicGen model via the TTA pipeline in just a few lines of code! 3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control. 4. Listen to the audio samples either in an ipynb notebook: Or save them as a `.wav` file using a third-party library, e.g. `scipy`: For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the MusicGen docs. You can also run MusicGen locally through the original Audiocraft library: Organization developing the model: The FAIR team of Meta AI. Model date: MusicGen was trained between April 2023 and May 2023. Model type: MusicGen consists of an EnCodec model for audio tokenization and an auto-regressive language model based on the transformer architecture for music modeling.
The model comes in different sizes: 300M, 1.5B and 3.3B parameters; and two variants: a model trained for the text-to-music generation task and a model trained for melody-guided music generation. Paper or resources for more information: More information can be found in the paper Simple and Controllable Music Generation. Where to send questions or comments about the model: Questions and comments about MusicGen can be sent via the Github repository of the project, or by opening an issue. Intended use Primary intended use: The primary use of MusicGen is research on AI-based music generation, including: - Research efforts, such as probing and better understanding the limitations of generative models to further improve the state of science - Generation of music guided by text or melody to understand current abilities of generative AI models by machine learning amateurs Primary intended users: The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models. Out-of-scope use cases: The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
Model performance measures: We used the following objective measures to evaluate the model on a standard music benchmark: - Frechet Audio Distance computed on features extracted from a pre-trained audio classifier (VGGish) - Kullback-Leibler Divergence on label distributions extracted from a pre-trained audio classifier (PaSST) - CLAP Score between audio embedding and text embedding extracted from a pre-trained CLAP model Additionally, we ran qualitative studies with human participants, evaluating the performance of the model along the following axes: - Overall quality of the music samples; - Text relevance to the provided text input; - Adherence to the melody for melody-guided music generation. More details on performance measures and human studies can be found in the paper. The model was evaluated on the MusicCaps benchmark and on an in-domain held-out evaluation set, with no artist overlap with the training set. The model was trained on licensed data using the following sources: the Meta Music Initiative Sound Collection, the Shutterstock music collection and the Pond5 music collection. See the paper for more details about the training set and corresponding preprocessing. Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we had all the datasets go through a state-of-the-art music source separation method, namely the open source Hybrid Transformer for Music Source Separation (HT-Demucs), in order to keep only the instrumental part. This explains the difference in objective metrics with the models used in the paper.
| Model | Frechet Audio Distance | KLD | Text Consistency | Chroma Cosine Similarity |
|---|---|---|---|---|
| facebook/musicgen-small | 4.88 | 1.42 | 0.27 | - |
| facebook/musicgen-medium | 5.14 | 1.38 | 0.28 | - |
| facebook/musicgen-large | 5.48 | 1.37 | 0.28 | - |
| facebook/musicgen-melody | 4.93 | 1.41 | 0.27 | 0.44 |

More information can be found in the paper Simple and Controllable Music Generation, in the Results section. Data: The data sources used to train the model are created by music professionals and covered by legal agreements with the rights holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve its performance. Mitigations: Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely using the open source Hybrid Transformer for Music Source Separation (HT-Demucs). - The model is not able to generate realistic vocals. - The model has been trained with English descriptions and will not perform as well in other languages. - The model does not perform equally well for all music styles and cultures. - The model sometimes generates end of songs, collapsing to silence. - It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfying results. Biases: The source of data is potentially lacking diversity and all music cultures are not equally represented in the dataset. The model may not perform equally well on the wide variety of music genres that exist. The generated samples from the model will reflect the biases from the training data. Further work on this model should include methods for balanced and just representations of cultures, for example, by scaling the training data to be both diverse and inclusive.
Risks and harms: Biases and limitations of the model may lead to generation of samples that may be considered as biased, inappropriate or offensive. We believe that providing the code to reproduce the research and train new models will make it possible to broaden the application to new and more representative data. Use cases: Users must be aware of the biases, limitations and risks of the model. MusicGen is a model developed for artificial intelligence research on controllable music generation. As such, it should not be used for downstream applications without further investigation and mitigation of risks.
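As a minimal sketch of text-to-music generation with the 🤗 Transformers MusicGen integration: the `facebook/musicgen-small` checkpoint appears in the metrics table above, but the 50 Hz audio-token rate used to budget the generation length is an assumption stated in the comments, not read from the checkpoint config.

```python
# Token rate assumed from the released MusicGen models (EnCodec at 32 kHz,
# ~50 audio tokens per second); treat it as an assumption, not a config value.
FRAME_RATE_HZ = 50

def tokens_for_seconds(seconds: float) -> int:
    """Convert a target duration into a max_new_tokens budget."""
    return int(seconds * FRAME_RATE_HZ)

def generate_music(prompt: str, seconds: float = 5.0):
    """Generate a waveform from a text prompt (downloads the checkpoint)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    with torch.no_grad():
        audio = model.generate(**inputs, max_new_tokens=tokens_for_seconds(seconds))
    return audio  # shape: (batch, channels, samples)
```

A five-second request therefore translates to a budget of 250 new tokens.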
sam-audio-small
wmt19-en-ru
mms-tts-ewe
xmod-base
mask2former-swin-small-coco-instance
mask2former-swin-large-mapillary-vistas-panoptic
mask2former-swin-tiny-ade-semantic
Perception-LM-1B
audiogen-medium
mms-1b-l1107
blenderbot-90M
mbart-large-en-ro
mbart-large-50-many-to-one-mmt
convnext-base-224-22k
dpr-question_encoder-multiset-base
vjepa2-vith-fpc64-256
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of VJEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. To run the V-JEPA 2 model, ensure you have installed the latest transformers: V-JEPA 2 is intended to represent any video (and image) to perform video classification, retrieval, or as a video encoder for VLMs. To load a video, sample the number of frames according to the model. For this model, we use 64. To load an image, simply copy the image to the desired number of frames. For more code examples, please refer to the V-JEPA 2 documentation.

```
@techreport{assran2025vjepa2,
  title={V-JEPA~2: Self-Supervised Video Models Enable Understanding, Prediction and Planning},
  author={Assran, Mahmoud and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and Komeili, Mojtaba and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and Arnaud, Sergio and Gejji, Abha and Martin, Ada and Robert Hogan, Francois and Dugas, Daniel and Bojanowski, Piotr and Khalidov, Vasil and Labatut, Patrick and Massa, Francisco and Szafraniec, Marc and Krishnakumar, Kapil and Li, Yong and Ma, Xiaodong and Chandar, Sarath and Meier, Franziska and LeCun, Yann and Rabbat, Michael and Ballas, Nicolas},
  institution={FAIR at Meta},
  year={2025}
}
```
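The card's frame-handling advice (sample 64 frames per video; repeat a single image across the clip) can be sketched as plain array manipulation. The helper names are illustrative, not part of the V-JEPA 2 API:

```python
import numpy as np

NUM_FRAMES = 64  # this checkpoint expects 64-frame clips (fpc64)

def sample_frame_indices(total_frames: int, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """Pick num_frames indices evenly spread over a video's frames."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)

def image_to_clip(image: np.ndarray, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """Repeat a single (H, W, C) image into a (T, H, W, C) clip, as the card suggests."""
    return np.repeat(image[None, ...], num_frames, axis=0)
```

The resulting clip can then be handed to the checkpoint's video processor in the usual Transformers fashion.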
mask2former-swin-tiny-cityscapes-semantic
xglm-564M
mms-tts-tha
wav2vec2-large
The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16kHz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition. Check out this blog for more information. Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli Abstract We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. See this notebook for more information on how to fine-tune the model.
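Since this checkpoint is meant to be used as a feature encoder before fine-tuning, a hedged sketch of extracting frame-level latents follows. The ~400-sample receptive field and 320-sample stride are the paper's architecture constants (about 49 feature frames per second at 16 kHz), stated here as assumptions rather than read from the checkpoint config:

```python
def num_output_frames(num_samples: int) -> int:
    """Frames produced by the wav2vec 2.0 convolutional feature encoder.

    Assumes a 400-sample (25 ms) receptive field and 320-sample (20 ms)
    stride at 16 kHz, i.e. roughly 49 feature vectors per second.
    """
    return (num_samples - 400) // 320 + 1

def extract_features(waveform_16khz):
    """Return frame-level latent representations (downloads the checkpoint)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large")
    inputs = fe(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state  # (1, frames, hidden)
```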
mms-tts-spa
mms-tts-tam
deit-small-patch16-224
sam3
rag-token-nq
MobileLLM-Pro
dragon-plus-context-encoder
dragon-plus-query-encoder
blenderbot-3B
dinov3-convnext-large-pretrain-lvd1689m
mms-tts-tgl
esm1b_t33_650M_UR50S
webssl-mae300m-full2b-224
s2t-medium-mustc-multilingual-st
mask2former-swin-tiny-coco-panoptic
MobileLLM-R1-950M
DiT-XL-2-256
PE-Spatial-B16-512
vit-mae-huge
mms-lid-1024
mms-tts-hin
incoder-1B
PE-Core-T16-384
mgenre-wiki
metaclip-l14-fullcc2.5b
wav2vec2-xls-r-1b
galactica-1.3b
ijepa_vith14_1k
MEXMA
Current approaches to pre-training cross-lingual sentence encoders use sentence-level objectives only. This can lead to loss of information, especially for tokens, which then degrades the sentence representation. We propose MEXMA, a novel approach that integrates both sentence-level and token-level objectives. The sentence representation in one language is used to predict masked tokens in another language, with both the sentence representation and all tokens directly updating the encoder. We show that adding token-level objectives greatly improves the sentence representation quality across several tasks. Our approach outperforms current pre-trained cross-lingual sentence encoders on bi-text mining as well as several downstream tasks. We also analyse the information encoded in our tokens, and how the sentence representation is built from them. Usage You use this model as you would any other XLM-RoBERTa model, taking into account that the "pooler" has not been trained, so you should use the CLS token the encoder outputs directly as your sentence representation: You can also use this model with SentenceTransformers: License This model is released under the MIT license. Training code For the training code of this model, please check the official MEXMA repo. Paper MEXMA: Token-level objectives improve sentence representations Citation If you use this model in your work, please cite:
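The stripped usage snippet can be restored as a sketch: take the CLS position of `last_hidden_state` directly, since the card notes the pooler is untrained. The `facebook/MEXMA` repo id is an assumption, and the cosine-similarity helper is just an illustrative way to compare the resulting embeddings:

```python
def cosine_similarity(a, b):
    """Plain cosine similarity between two 1-D vectors (no dependencies)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def sentence_embeddings(sentences):
    """CLS-token sentence representations from MEXMA (downloads the checkpoint)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/MEXMA")  # repo id assumed
    model = AutoModel.from_pretrained("facebook/MEXMA")
    inputs = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Use the CLS position directly, per the card: the pooler is untrained.
    return outputs.last_hidden_state[:, 0]
```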
hubert-xlarge-ls960-ft
wav2vec2-large-960h-lv60
mms-tts-fra
ijepa_vith14_22k
sam2.1-hiera-small
convnext-small-224
dinov2-base-imagenet1k-1-layer
sam2-hiera-base-plus
mms-1b
musicgen-melody
mask2former-swin-large-cityscapes-panoptic
webssl-dino300m-full2b-224
detr-resnet-50-dc5
mms-tts-rus
Massively Multilingual Speech (MMS): Russian Text-to-Speech This repository contains the Russian (rus) language text-to-speech (TTS) model checkpoint. This model is part of Facebook's Massively Multilingual Speech project, aiming to provide speech technology across a diverse range of languages. You can find more details about the supported languages and their ISO 639-3 codes in the MMS Language Coverage Overview, and see all MMS-TTS checkpoints on the Hugging Face Hub: facebook/mms-tts. MMS-TTS is available in the 🤗 Transformers library from version 4.33 onwards. VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior. A set of spectrogram-based acoustic features are predicted by the flow-based module, which is formed of a Transformer-based text encoder and multiple coupling layers. The spectrogram is decoded using a stack of transposed convolutional layers, much in the same style as the HiFi-GAN vocoder. Motivated by the one-to-many nature of the TTS problem, where the same text input can be spoken in multiple ways, the model also includes a stochastic duration predictor, which allows the model to synthesise speech with different rhythms from the same input text. The model is trained end-to-end with a combination of losses derived from variational lower bound and adversarial training. To improve the expressiveness of the model, normalizing flows are applied to the conditional prior distribution. During inference, the text encodings are up-sampled based on the duration prediction module, and then mapped into the waveform using a cascade of the flow module and HiFi-GAN decoder. 
Due to the stochastic nature of the duration predictor, the model is non-deterministic, and thus requires a fixed seed to generate the same speech waveform. For the MMS project, a separate VITS checkpoint is trained on each language. MMS-TTS is available in the 🤗 Transformers library from version 4.33 onwards. To use this checkpoint, first install the latest version of the library: Then, run inference with the following code snippet: The resulting waveform can be saved as a `.wav` file: This model was developed by Vineel Pratap et al. from Meta AI. If you use the model, consider citing the MMS paper:
opt-66b
mask2former-swin-base-coco-instance
webssl-dino7b-full8b-224
PE-Spatial-L14-448
mms-tts-amh
hiera-tiny-224-hf
convnextv2-large-22k-384
sonata
data2vec-audio-base
levit-128S
pe-av-large-16-frame
xglm-1.7B
maskformer-swin-base-ade
mms-tts-ara
Perception-LM-3B
tart-full-flan-t5-xl
mms-tts-tir
mms-1b-fl102
mms-tts-lao
wav2vec2-conformer-rel-pos-large-960h-ft
mms-tts-mya
wav2vec2-large-robust
Perception-LM-8B
sam2-hiera-tiny
musicgen-stereo-small
dinov2-with-registers-giant
mms-tts-tur
maskformer-swin-large-ade
s2t-small-mustc-en-fr-st
fasttext-et-vectors
regnet-y-040
galactica-125m
wav2vec2-large-xlsr-53-spanish
mms-tts-urd-script_arabic
detr-resnet-101-dc5
convnextv2-large-22k-224
dinov2-large-imagenet1k-1-layer
mask2former-swin-large-ade-panoptic
mms-tts-kor
convnextv2-huge-22k-512
maskformer-swin-base-coco
wav2vec2-xls-r-2b
rag-token-base
mms-tts-uig-script_arabic
dragon-roberta-query-encoder
dragon-roberta-context-encoder
layerskip-llama3-8B
wav2vec2-large-100k-voxpopuli
data2vec-audio-large-960h
mms-tts-tel
vjepa2-vitg-fpc64-384
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of VJEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. To run the V-JEPA 2 model, ensure you have installed the latest transformers: V-JEPA 2 is intended to represent any video (and image) to perform video classification, retrieval, or as a video encoder for VLMs. To load a video, sample the number of frames according to the model. For this model, we use 64. To load an image, simply copy the image to the desired number of frames. For more code examples, please refer to the V-JEPA 2 documentation.
pe-a-frame-large
mms-tts-uzb-script_cyrillic
mask2former-swin-small-coco-panoptic
maskformer-swin-small-coco
layerskip-llama2-7B
mms-tts-pol
sam-audio-large-tv
DiT-XL-2-512
deit-tiny-distilled-patch16-224
convnextv2-huge-22k-384
convnext-large-224
vit-msn-small
layerskip-codellama-34B
maskformer-swin-small-ade
rag-sequence-base
mms-tts-deu
mms-tts-orm
PE-Spatial-T16-512
wav2vec2-large-xlsr-53-french
data2vec-text-base
mms-tts-fas
pe-av-base-16-frame
sam2-hiera-small
timesformer-base-finetuned-ssv2
mms-tts-ben
wav2vec2-large-robust-ft-swbd-300h
hf-seamless-m4t-large
data2vec-vision-base-ft1k
mms-tts-por
blenderbot-1B-distill
deformable-detr-detic
webssl-dino1b-full2b-224
xglm-7.5B
s2t-large-librispeech-asr
wav2vec2-conformer-rel-pos-large
mms-lid-512
MobileLLM-Pro-base-int4-cpu
wav2vec2-base-it-voxpopuli
dpr-reader-single-nq-base
s2t-small-mustc-en-it-st
ijepa_vitg16_22k
MobileLLM-R1-140M
opt-iml-max-1.3b
xglm-4.5B
MobileLLM-125M
audioseal
PE-Lang-L14-448
mcontriever
metamotivo-M-1
musicgen-stereo-medium
galactica-120b
galactica-30b
galactica-6.7b
layerskip-llama3.2-1B
dinov2-with-registers-base-imagenet1k-1-layer
metaclip-b32-fullcc2.5b
mms-tts-yor
wav2vec2-large-xlsr-53-german
opt-iml-max-30b
mms-tts-ory
regnet-y-320
regnet-y-160
mms-tts-pan
nllb-moe-54b
- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper. - Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022 - License: CC-BY-NC - Where to send questions or comments about the model: https://github.com/facebookresearch/fairseq/issues The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. - Expert Output Masking is used for training, which consists of dropping the full contribution of some tokens. This corresponds to the following scheme: Generating with NLLB-MoE The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine. While generating the target text, set the `forced_bos_token_id` to the target language id. The following example shows how to translate English to French using the facebook/nllb-moe-54b model. Note that we're using the BCP-47 code for French `fra_Latn`. See here for the list of all BCP-47 in the Flores 200 dataset.
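The English-to-French example the card describes can be restored as a sketch. Looking the language code up with `convert_tokens_to_ids` is an assumption (the FLORES-200 codes are registered as special tokens in the NLLB tokenizer); given the checkpoint's size, the model load is kept inside a function:

```python
TARGET_LANG = "fra_Latn"  # BCP-47 code for French in the FLORES-200 list

def translate_en_to_fr(text: str) -> str:
    """English -> French with NLLB-MoE (downloads the ~350GB checkpoint)."""
    # Heavy imports kept local; use accelerate/device_map for a model this size.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-moe-54b", device_map="auto")
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            # Force the decoder to start in the target language.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(TARGET_LANG),
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```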
mask2former-swin-small-cityscapes-semantic
mms-tts-khm
layerskip-llama2-70B
mms-tts-swh
convnext-base-384-22k-1k
magnet-small-10secs
fasttext-en-vectors
metamotivo-S-1
maskformer-resnet101-coco-stuff
MobileLLM-Pro-base
Meta-SecAlign-70B
mms-tts-heb
metaclip-2-worldwide-giant
mms-tts-ell
blt-1b
maskformer-swin-large-coco
regnet-y-080
mms-tts-nld
regnet-y-064
regnet-y-120
deit-small-distilled-patch16-224
cotracker
pe-av-small
mms-tts-hau
layerskip-llama2-13B
dpr-reader-multiset-base
mms-tts-mar
wav2vec2-base-100h
The base model pretrained and fine-tuned on 100 hours of Librispeech on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16kHz. Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20. To transcribe audio files the model can be used as a standalone acoustic model as follows: This code snippet shows how to evaluate facebook/wav2vec2-base-100h on LibriSpeech's "clean" and "other" test data.
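The stripped standalone-acoustic-model snippet can be restored as a sketch. The greedy CTC collapse below (merge repeats, drop blanks) is the standard decode rule that `processor.batch_decode` applies internally; the exact processor/model argument names follow the Transformers docs but should be treated as assumptions:

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Greedy CTC decode of a frame-level argmax sequence:
    merge consecutive repeats, then drop blank tokens."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

def transcribe(waveform_16khz) -> str:
    """Greedy transcription with the fine-tuned checkpoint (downloads the model)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-100h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h")
    inputs = processor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # batch_decode performs the same collapse as ctc_greedy_collapse internally.
    return processor.batch_decode(predicted_ids)[0]
```

Note how a blank between two identical labels keeps both copies: `[0,1,1,0,1]` decodes to two 1s, not one.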
dpt-dinov2-small-kitti
drama-base
mms-tts-ron
tribev2
sapiens-seg-1b-torchscript
mms-tts-fon
MobileLLM-R1-360M
mms-tts-swe
metaclip-2-worldwide-huge-378
MetaCLIP 2 (worldwide) was presented in MetaCLIP 2: A Worldwide Scaling Recipe. This checkpoint corresponds to "ViT-H-14-378-worldwide" of the original implementation. First install the Transformers library (from source for now): In case you want to perform pre- and postprocessing yourself, you can use the `AutoModel` API:
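The card's elided `AutoModel` snippet can be sketched as a CLIP-style zero-shot classification call. The processor/model interface mirrors other CLIP-family models in Transformers but is an assumption here; the softmax helper is plain Python:

```python
import math

def softmax(xs):
    """Numerically stable softmax, to turn image-text logits into probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image, labels):
    """Score an image against text labels via the AutoModel API (downloads the model)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModel, AutoProcessor

    name = "facebook/metaclip-2-worldwide-huge-378"
    processor = AutoProcessor.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = processor(text=labels, images=image, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return softmax(outputs.logits_per_image[0].tolist())
```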
wav2vec2-xls-r-1b-en-to-15
dinov2-with-registers-giant-imagenet1k-1-layer
wav2vec2-xls-r-1b-21-to-en
deit-base-distilled-patch16-384
pe-av-large
KernelLLM
PE-Spatial-G14-448
mms-tts-som
layerskip-codellama-7B
mms-tts-hun
mms-tts-bul
convnextv2-nano-22k-224
mms-tts-kab
mms-tts-guj
mms-tts-kaz
convnext-xlarge-224-22k
audio-magnet-medium
metaclip-b16-400m
metaclip-2-worldwide-giant-378
MetaCLIP 2 (worldwide) was presented in MetaCLIP 2: A Worldwide Scaling Recipe. This checkpoint corresponds to "ViT-bigG-14-378-worldwide" of the original implementation. First install the Transformers library (from source for now): In case you want to perform pre- and postprocessing yourself, you can use the `AutoModel` API:
convnextv2-base-1k-224
mms-tts-nan
dpt-dinov2-large-nyu
mms-tts-mon
vjepa2-vitl-fpc32-256-diving48
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of VJEPA, resulting in state-of-the-art video understanding capabilities, leveraging data and model sizes at scale. The code is released in this repository. 💡 This is the V-JEPA 2 ViT-L 256 model with a video classification head pretrained on the Diving 48 dataset. To run the V-JEPA 2 model, ensure you have installed the latest transformers:
magnet-medium-30secs
maskformer-swin-tiny-coco
convnextv2-nano-1k-224
convnextv2-large-1k-224
blt-entropy
VGGT-1B-Commercial
dinov2-with-registers-small-imagenet1k-1-layer
ijepa_vith16_1k
maskformer-swin-tiny-ade
timesformer-hr-finetuned-k600
mms-tts-kan
MobileLLM-1B
Meta-SecAlign-8B
hiera-tiny-224-in1k-hf
hiera-tiny-224-mae-hf
convnext-xlarge-224-22k-1k
sam-audio-small-tv
deit-base-patch16-384
s2t-medium-librispeech-asr
MobileLLM-350M
xmod-large-prenorm
llm-compiler-7b
MobileLLM-R1-950M-base
dinov2-with-registers-large-imagenet1k-1-layer
wav2vec2-large-xlsr-53-dutch
mms-tts-hat
MobileLLM-ParetoQ-1.5B-1.58-bit
wav2vec2-large-xlsr-53-portuguese
mms-tts-mal
mms-tts-ukr
PE-Core-S16-384
blt-7b
mask2former-swin-large-cityscapes-instance
magnet-small-30secs
convnext-base-224-22k-1k
mask2former-swin-base-IN21k-cityscapes-panoptic
mms-tts-ceb
mms-tts-kmr-script_latin
mms-tts-gbm
mms-tts-kmr-script_arabic
mask2former-swin-base-IN21k-cityscapes-semantic
metamotivo-S-4
sapiens-pose-1b-torchscript
esm1v_t33_650M_UR90S_3
chameleon-30b
mms-tts-quz
metamotivo-S-2
musicgen-stereo-large
metamotivo-S-3
metamotivo-S-5
mms-tts-eus
xglm-2.9B
convnextv2-pico-1k-224
esm1v_t33_650M_UR90S_4
mms-tts-fin
mask2former-swin-base-IN21k-cityscapes-instance
MobileLLM-R1-360M-base
magnet-medium-10secs
mms-tts-crs
locate-3d-plus
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders. Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices. 3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds, is key to `Locate 3D`. It takes as input a 3D pointcloud featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes. - Locate-3D: Locate-3D model trained on public referential grounding datasets - Locate-3D+: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset - 3D-JEPA: Pre-trained SSL encoder for 3D understanding For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our GitHub repository. The majority of `locate-3d` is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.
mms-tts-zlm
data2vec-vision-large
timesformer-hr-finetuned-ssv2
s2t-small-mustc-en-de-st
pe-av-small-16-frame
dpt-dinov2-base-kitti
esm1v_t33_650M_UR90S_5
incoder-6B
locate-3d
mask2former-swin-base-IN21k-ade-semantic
mms-tts-pap
sapiens
hubert-xlarge-ll60k
mask2former-swin-tiny-cityscapes-panoptic
wav2vec2-base-es-voxpopuli-v2
mms-tts-sqi
mms-tts-aka
wav2vec2-base-it-voxpopuli-v2
wav2vec2-xlsr-53-phon-cv-babel-ft
dpt-dinov2-base-nyu
MobileLLM-600M
vit-msn-base
detr-resnet-101-panoptic
mms-tts-mlg
sapiens-pose-bbox-detector
MobileLLM-ParetoQ-125M-BF16
pixio-vitl16
hiera-base-224-hf
mms-tts-saq
sapiens-depth-1b-torchscript
dpt-dinov2-large-kitti
DPT (Dense Prediction Transformer) model with DINOv2 backbone as proposed in DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. The model is intended to showcase that using the DPT framework with DINOv2 as backbone yields a powerful depth estimator.
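A hedged sketch of monocular depth estimation with this checkpoint: `DPTForDepthEstimation` and `AutoImageProcessor` are the documented Transformers classes for DPT, though exact argument names should be treated as assumptions. The normalization helper just rescales a depth map for visualization:

```python
def normalize_depth(depth, lo=None, hi=None):
    """Scale a 2-D depth map (list of rows) to [0, 255] integers for visualization."""
    flat = [v for row in depth for v in row]
    lo = min(flat) if lo is None else lo
    hi = max(flat) if hi is None else hi
    span = (hi - lo) or 1.0
    return [[int(255 * (v - lo) / span) for v in row] for row in depth]

def estimate_depth(image):
    """Monocular depth with the DPT + DINOv2 checkpoint (downloads the model)."""
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoImageProcessor, DPTForDepthEstimation

    name = "facebook/dpt-dinov2-large-kitti"
    processor = AutoImageProcessor.from_pretrained(name)
    model = DPTForDepthEstimation.from_pretrained(name)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        depth = model(**inputs).predicted_depth  # (1, H', W')
    return depth[0]
```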