yifeihu

7 models

TFT-ID-1.0

TFT-ID: Table/Figure/Text IDentifier for academic papers

TFT-ID (Table/Figure/Text IDentifier) is an object detection model finetuned to extract tables, figures, and text sections from academic papers. It was created by Yifei Hu and is finetuned from microsoft/Florence-2 checkpoints.

- The model was finetuned on papers from Hugging Face Daily Papers. All 36,000+ bounding boxes were manually annotated and checked by Yifei Hu.
- TFT-ID takes an image of a single paper page as input and returns bounding boxes for all tables, figures, and text sections on that page.
- The text sections contain clean text content suited to downstream OCR workflows. I recommend using TB-OCR-preview-0.1 [[HF]](https://huggingface.co/yifeihu/TB-OCR-preview-0.1) as the OCR model to convert the text sections into clean markdown and math LaTeX output.

Object Detection results format:

`{'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}`

Training Code and Dataset

- Dataset: Coming soon.
- Code: github.com/ai8hyf/TF-ID

The model was tested on paper pages outside the training dataset. The papers are a subset of Hugging Face Daily Papers.

Correct output: the model draws correct bounding boxes for every table, figure, and text section on the given page and does not miss any content.
Task 1: Table, Figure, and Text Section Identification

| Model                                                         | Total Images | Correct Output | Success Rate |
|---------------------------------------------------------------|--------------|----------------|--------------|
| TFT-ID-1.0 [[HF]](https://huggingface.co/yifeihu/TFT-ID-1.0)  | 373          | 361            | 96.78%       |

Task 2: Table and Figure Identification

| Model                                                          | Total Images | Correct Output | Success Rate |
|----------------------------------------------------------------|--------------|----------------|--------------|
| TFT-ID-1.0 [[HF]](https://huggingface.co/yifeihu/TFT-ID-1.0)   | 258          | 255            | 98.84%       |
| TF-ID-large [[HF]](https://huggingface.co/yifeihu/TF-ID-large) | 258          | 253            | 98.06%       |

Note: Depending on the use case, some "incorrect" output could be totally usable. For example, the model draws two bounding boxes for one figure with two child components.

For non-CUDA environments, please check out this post for a simple patch: https://huggingface.co/microsoft/Florence-2-base/discussions/4

To visualize the results, see this tutorial notebook for more details.
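
The results format described in this model card pairs each bounding box with a label by position. As a minimal sketch of how that output might be consumed downstream, the snippet below groups boxes by label so that, for example, only text sections are passed on to an OCR model. The helper names `group_boxes_by_label` and `bbox_area`, the label strings (`"figure"`, `"text"`, `"table"`), the `"<OD>"` task key, and the coordinates are all illustrative assumptions, not values confirmed by the model card.

```python
from collections import defaultdict


def group_boxes_by_label(result):
    """Map each label to the list of bounding boxes carrying that label.

    `result` is assumed to follow the documented shape:
    {task_key: {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}
    """
    grouped = defaultdict(list)
    # Iterate over values so the code does not depend on the task key name.
    for detections in result.values():
        for bbox, label in zip(detections["bboxes"], detections["labels"]):
            grouped[label].append(tuple(bbox))
    return dict(grouped)


def bbox_area(bbox):
    """Area in square pixels of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = bbox
    return max(0, x2 - x1) * max(0, y2 - y1)


# Hypothetical detection result for one page (made-up coordinates and labels).
sample = {
    "<OD>": {
        "bboxes": [[50, 40, 550, 300], [50, 320, 550, 700], [60, 720, 540, 900]],
        "labels": ["figure", "text", "table"],
    }
}

grouped = group_boxes_by_label(sample)
for label, boxes in grouped.items():
    print(label, boxes, [bbox_area(b) for b in boxes])
```

Each grouped box can then be used as a crop rectangle on the page image, for instance with Pillow's `Image.crop`, before handing the text regions to an OCR step.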

license:mit

106

Florence-2-DocLayNet-Fixed

license:apache-2.0

TB-OCR-preview-0.1

license:mit

129

TF-ID-base-no-caption

license:mit

TF-ID-large

TF-ID-base

TF-ID-large-no-caption
