naver-clova-ix

9 models

donut-base-finetuned-docvqa

---
license: mit
pipeline_tag: document-question-answering
tags:
- donut
- image-to-text
- vision
widget:
- text: "What is the invoice number?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
- text: "What is the purchase amount?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
---

License: MIT • 223,744 downloads • 253 likes
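The widget questions above are plain strings. According to the Hugging Face Transformers Donut documentation, this checkpoint is instead prompted with a task string of the form <s_docvqa><s_question>{user_input}</s_question><s_answer>, and the answer is read back out of the tagged sequence the decoder generates. The dependency-free sketch below covers only that string handling. build_docvqa_prompt and parse_flat_tags are hypothetical helpers: the second is a simplified stand-in for DonutProcessor.token2json that handles flat tags only. The "</s>" and "<pad>" token strings are assumptions, and the decoded sequence in the usage example is made up for illustration. Loading the model and running generation are left out.

```python
import re

# Task prompt for donut-base-finetuned-docvqa as given in the Transformers
# Donut documentation (assumption: verify against the version you use).
DOCVQA_PROMPT = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"


def build_docvqa_prompt(question: str) -> str:
    """Insert the user's question into the DocVQA task prompt (hypothetical helper)."""
    return DOCVQA_PROMPT.replace("{user_input}", question)


def parse_flat_tags(sequence: str, eos: str = "</s>", pad: str = "<pad>") -> dict:
    """Hypothetical, simplified stand-in for DonutProcessor.token2json.

    Only handles flat <s_key>value</s_key> pairs, not nested fields.
    The eos/pad token strings are assumed defaults.
    """
    # Strip end-of-sequence and padding tokens.
    sequence = sequence.replace(eos, "").replace(pad, "")
    # Drop the leading task start token, e.g. "<s_docvqa>".
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    # Collect every <s_key>value</s_key> pair in order.
    return {
        key: value.strip()
        for key, value in re.findall(r"<s_(.+?)>(.*?)</s_\1>", sequence)
    }


if __name__ == "__main__":
    prompt = build_docvqa_prompt("What is the invoice number?")
    print(prompt)
    # A made-up decoded sequence, for illustration only.
    decoded = prompt + "12345</s_answer></s>"
    print(parse_flat_tags(decoded))
    # {'question': 'What is the invoice number?', 'answer': '12345'}
```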

donut-base

Donut model, pre-trained only. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in the authors' repository.

Disclaimer: The team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size). The decoder then autoregressively generates text, conditioned on the encoder's output.

This model is meant to be fine-tuned on a downstream task, such as document image classification or document parsing. See the Hugging Face model hub for fine-tuned versions on a task that interests you, and refer to the Donut documentation in Hugging Face Transformers, which includes code examples.

License: MIT • 90,557 downloads • 234 likes
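The seq_len in the encoder's (batch_size, seq_len, hidden_size) output follows from the input resolution. The Swin encoder turns 4x4-pixel patches into tokens, and each patch-merging step between stages halves the grid in both directions while doubling the channel count. The sketch below does that arithmetic. It assumes donut-base encoder settings of a 2560x1920 input, patch size 4, embedding dimension 128 and four stages; these values are recalled from the checkpoint's config.json rather than stated on this page, so check them there. SwinEncoderConfig and encoder_output_shape are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwinEncoderConfig:
    """Hypothetical container; defaults are assumed donut-base encoder values."""
    image_height: int = 2560
    image_width: int = 1920
    patch_size: int = 4    # each token initially covers a 4x4 pixel patch
    embed_dim: int = 128   # channels right after patch embedding
    num_stages: int = 4    # patch merging happens between consecutive stages


def encoder_output_shape(batch_size: int, cfg: SwinEncoderConfig = SwinEncoderConfig()) -> tuple:
    """Return (batch_size, seq_len, hidden_size) after the final encoder stage."""
    # Patch embedding divides each side by patch_size; each of the
    # (num_stages - 1) patch-merging steps divides it by 2 again.
    downsample = cfg.patch_size * 2 ** (cfg.num_stages - 1)
    grid_h = cfg.image_height // downsample
    grid_w = cfg.image_width // downsample
    seq_len = grid_h * grid_w
    # Each patch-merging step doubles the channel count.
    hidden_size = cfg.embed_dim * 2 ** (cfg.num_stages - 1)
    return (batch_size, seq_len, hidden_size)


if __name__ == "__main__":
    print(encoder_output_shape(1))  # (1, 4800, 1024)
    # Same arithmetic at a smaller, purely illustrative resolution.
    print(encoder_output_shape(1, SwinEncoderConfig(image_height=1280, image_width=960)))  # (1, 1200, 1024)
```

Under these assumptions the total downsampling factor is 32, so a 2560x1920 page becomes an 80x60 grid, i.e. 4,800 embeddings that the decoder attends to at every generation step.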

donut-base-finetuned-cord-v2

License: MIT • 45,281 downloads • 109 likes

donut-base-finetuned-rvlcdip

License: MIT • 2,518 downloads • 18 likes

donut-base-finetuned-zhtrainticket

License: MIT • 233 downloads • 0 likes

donut-base-finetuned-cord-v1

License: MIT • 61 downloads • 0 likes

donut-proto

License: MIT • 29 downloads • 7 likes

donut-base-finetuned-cord-v1-2560

License: MIT • 21 downloads • 1 like

donut-base-finetuned-kuzushiji

1 download • 1 like