infly

27 models

Infinity-Parser-7B

We develop Infinity-Parser, an end-to-end scanned-document parsing model trained with reinforcement learning. By incorporating verifiable rewards based on layout and content, Infinity-Parser preserves the original document's structure and content with high fidelity. Extensive evaluations on benchmarks including OmniDocBench, olmOCR-Bench, PubTabNet, and FinTabNet show that Infinity-Parser consistently achieves state-of-the-art performance across a broad range of document types, languages, and structural complexities. It substantially outperforms both specialized document parsing systems and general-purpose vision-language models while preserving the model's general multimodal understanding capability.

- LayoutRL Framework: a reinforcement learning framework that explicitly trains models to be layout-aware through verifiable multi-aspect rewards combining edit distance, paragraph accuracy, and reading-order preservation.
- Infinity-Doc-400K Dataset: a large-scale dataset of 400K scanned documents that integrates high-quality synthetic data with diverse real-world samples, featuring rich layout variations and comprehensive structural annotations.
- Infinity-Parser Model: a VLM-based parser that achieves new state-of-the-art performance on OCR, table and formula extraction, and reading-order detection benchmarks in both English and Chinese, while maintaining nearly the same general multimodal understanding capability as the base model.

Overview of the Infinity-Parser training framework. Our model is optimized via reinforcement fine-tuning with edit-distance, layout, and order-based rewards.

> Note: The baseline model is Qwen2.5-VL-7B, and all metrics are evaluated using the LMMS-Eval framework.

Before starting, make sure that PyTorch is correctly installed according to the official installation guide at https://pytorch.org/.

vLLM Inference

We recommend using the vLLM backend for accelerated inference. It supports image and PDF inputs, automatically parses the document content, and exports the results in Markdown format to a specified directory. Adjust the tensor parallelism (tp) value (1, 2, or 4) and the batch size according to the number of GPUs and the available memory.

Result folder

The result folder contains the following contents:

The generation code is available at Infinity-Synth.

Limitations

- Layout / BBox: The current model does not provide layout or bounding box (bbox) information, which limits its ability to support downstream tasks such as structured document reconstruction or reading-order prediction.
- Charts & Figures: The model lacks perception and understanding of charts and figures, and therefore cannot perform visual reasoning or structured extraction for graphical elements.

We are dedicated to enabling our model to read like humans, and we firmly believe that Vision-Language Models (VLMs) can make this vision possible. We have conducted preliminary explorations of reinforcement learning (RL) for document parsing and achieved promising initial results. In future research, we will continue to deepen our efforts in the following directions:

- Chart & Figure Understanding: Extend the model's capability to handle chart detection, semantic interpretation, and structured data extraction from graphical elements.
- General-Purpose Perception: Move toward a unified vision-language perception model that integrates detection, image captioning, OCR, layout analysis, and chart understanding into a single framework.

Acknowledgments

We would like to thank Qwen2.5-VL, MinerU, MonkeyOCR, EasyR1, LLaMA-Factory, OmniDocBench, and dots.ocr for providing code and models.
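The LayoutRL bullet describes a verifiable reward that combines edit distance, paragraph accuracy, and reading-order preservation. The sketch below shows one way such a composite reward could be computed from a predicted Markdown string and a reference string. It is not Infinity-Parser's actual reward: the blank-line paragraph split, the specific definitions of the three terms, the equal default weights, and every function name here (`edit_reward`, `paragraph_reward`, `order_reward`, `layout_reward`) are hypothetical assumptions chosen for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling 1-D DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def edit_reward(pred: str, ref: str) -> float:
    """1 - normalized edit distance, in [0, 1]."""
    denom = max(len(pred), len(ref))
    return 1.0 if denom == 0 else 1.0 - edit_distance(pred, ref) / denom


def split_paragraphs(text: str) -> list[str]:
    # Assumption: paragraphs are separated by blank lines (Markdown convention).
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def paragraph_reward(pred: str, ref: str) -> float:
    """Hypothetical paragraph term: penalize a mismatch in paragraph count."""
    n_pred, n_ref = len(split_paragraphs(pred)), len(split_paragraphs(ref))
    denom = max(n_pred, n_ref)
    return 1.0 if denom == 0 else 1.0 - abs(n_pred - n_ref) / denom


def order_reward(pred: str, ref: str) -> float:
    """Fraction of reference paragraph pairs kept in the same relative order.

    Only paragraphs appearing verbatim in both outputs are compared; this is a
    simplification standing in for a proper paragraph-matching step.
    """
    pred_pos = {p: i for i, p in enumerate(split_paragraphs(pred))}
    shared = [p for p in split_paragraphs(ref) if p in pred_pos]
    pairs = list(combinations(shared, 2))  # each pair is in reference order
    if not pairs:
        return 1.0 if shared else 0.0
    concordant = sum(pred_pos[a] < pred_pos[b] for a, b in pairs)
    return concordant / len(pairs)


def layout_reward(pred: str, ref: str,
                  weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three terms (equal weights are an assumption)."""
    terms = (edit_reward(pred, ref),
             paragraph_reward(pred, ref),
             order_reward(pred, ref))
    return sum(w * t for w, t in zip(weights, terms)) / sum(weights)
```

Swapping two paragraphs keeps the paragraph count intact but lowers both the edit and order terms, so `layout_reward` drops below 1.0. Because every term is computed deterministically from the reference, the reward stays "verifiable" in the sense used above.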

1,756
16

inf-retriever-v1-1.5b

license:apache-2.0
1,625
44

OpenCoder-8B-Instruct

llama
1,496
199

inf-retriever-v1

license:apache-2.0
1,331
42

inf-wse-v2-base-zh

867
3

OpenCoder-1.5B-Instruct

llama
723
46

inf-retriever-v1-pro

license:apache-2.0
550
4

Infinity-Parser2-Pro

297
12

OpenCoder-8B-Base

llama
267
28

inf-query-aligner

license:apache-2.0
189
5

OpenCoder-1.5B-Base

llama
157
23

inf-o1-pi0

82
8

inf-rl-qwen-coder-32b-2746

79
2

INFLogic-Qwen2.5-32B-RL-Preview

license:apache-2.0
75
4

INFIndo-Qwen3-32B-Preview

license:cc-by-nc-4.0
75
1

inf-rl-qwen-coder-32b-864

74
0

INF-34B-Chat

67
4

Universal-PRM-7B

license:apache-2.0
65
8

INF-34B-Chat-AWQ

63
3

InfMLLM2_7B_chat

license:mit
63
2

INF-34B-Base

62
7

INF-34B-Chat-GPTQ-8bit

62
0

INF-ORM-Llama3.1-70B

llama
59
26

INF-34B-Chat-GPTQ-4bit

59
2

INF-AZ-7B-0524

license:cc-by-nc-4.0
32
2

inf-wse-v1-base-zh

31
5

INFRL-Qwen2.5-VL-72B-Preview

license:apache-2.0
31
2