infinity-parser2 0.1.0__tar.gz → 0.2.0__tar.gz
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/PKG-INFO +43 -5
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/README.md +42 -4
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/__init__.py +1 -1
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/cli.py +9 -9
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/parser.py +6 -5
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/PKG-INFO +43 -5
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/setup.py +1 -1
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/__main__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/base.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/transformers.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/vllm_engine.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/vllm_server.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/prompts.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/file.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/image.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/model.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/pdf.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/utils.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/SOURCES.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/dependency_links.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/entry_points.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/requires.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/top_level.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/setup.cfg +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_backends.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_parser.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_utils.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: infinity_parser2
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.2.0
|
|
4
4
|
Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
|
|
5
5
|
Home-page: https://github.com/infly-ai/INF-MLLM
|
|
6
6
|
Author: INF Tech
|
|
@@ -53,7 +53,33 @@ Dynamic: summary
|
|
|
53
53
|
|
|
54
54
|
# Infinity-Parser2
|
|
55
55
|
|
|
56
|
-
|
|
56
|
+
<p align="center">
|
|
57
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
|
|
58
|
+
<p>
|
|
59
|
+
|
|
60
|
+
<p align="center">
|
|
61
|
+
🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
|
|
62
|
+
📊 <a>Dataset (coming soon...)</a> |
|
|
63
|
+
📄 <a>Paper (coming soon...)</a> |
|
|
64
|
+
🚀 <a>Demo (coming soon...)</a>
|
|
65
|
+
</p>
|
|
66
|
+
|
|
67
|
+
## Introduction
|
|
68
|
+
|
|
69
|
+
We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
|
|
70
|
+
|
|
71
|
+
### Key Features
|
|
72
|
+
|
|
73
|
+
- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
|
|
74
|
+
- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
|
|
75
|
+
- **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
|
|
76
|
+
- **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
|
|
77
|
+
|
|
78
|
+
## Performance
|
|
79
|
+
|
|
80
|
+
<p align="left">
|
|
81
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
|
|
82
|
+
<p>
|
|
57
83
|
|
|
58
84
|
## Quick Start
|
|
59
85
|
|
|
@@ -62,13 +88,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
|
|
|
62
88
|
#### Pre-requisites
|
|
63
89
|
|
|
64
90
|
```bash
|
|
65
|
-
#
|
|
91
|
+
# Create a Conda environment (Optional)
|
|
92
|
+
conda create -n infinity_parser2 python=3.12
|
|
93
|
+
conda activate infinity_parser2
|
|
94
|
+
|
|
95
|
+
# Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
|
|
66
96
|
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
|
|
67
97
|
|
|
68
98
|
# Install FlashAttention (required for NVIDIA GPUs).
|
|
69
99
|
# This command builds flash-attn from source, which can take 10 to 30 minutes.
|
|
70
100
|
pip install flash-attn==2.8.3 --no-build-isolation
|
|
71
|
-
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the
|
|
101
|
+
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.
|
|
72
102
|
|
|
73
103
|
# Install vLLM
|
|
74
104
|
# NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
|
|
@@ -95,6 +125,8 @@ pip install -e .
|
|
|
95
125
|
The `parser` command is the fastest way to get started.
|
|
96
126
|
|
|
97
127
|
```bash
|
|
128
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
129
|
+
|
|
98
130
|
# Parse a PDF (outputs Markdown by default)
|
|
99
131
|
parser demo_data/demo.pdf
|
|
100
132
|
|
|
@@ -122,6 +154,8 @@ parser --help
|
|
|
122
154
|
#### Python API
|
|
123
155
|
|
|
124
156
|
```python
|
|
157
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
158
|
+
|
|
125
159
|
from infinity_parser2 import InfinityParser2
|
|
126
160
|
|
|
127
161
|
parser = InfinityParser2()
|
|
@@ -154,7 +188,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")
|
|
|
154
188
|
|
|
155
189
|
# Custom prompt
|
|
156
190
|
result = parser.parse("demo_data/demo.pdf", task_type="custom",
|
|
157
|
-
custom_prompt="
|
|
191
|
+
custom_prompt="Please transform the document's contents into Markdown format.")
|
|
158
192
|
|
|
159
193
|
# Batch processing with custom batch size
|
|
160
194
|
result = parser.parse("demo_data", batch_size=8)
|
|
@@ -308,3 +342,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
|
|
|
308
342
|
- Python 3.12+
|
|
309
343
|
- CUDA-compatible GPU
|
|
310
344
|
- See `setup.py` for full dependency list.
|
|
345
|
+
|
|
346
|
+
## Acknowledgments
|
|
347
|
+
|
|
348
|
+
We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
|
|
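The throughput figures quoted in the Key Features list above (441 to 534 tokens/sec, described as a 21% increase) can be checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers; the helper name `relative_increase` is illustrative and not part of the package.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the percentage increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Figures quoted in the README: 441 -> 534 tokens/sec.
gain = relative_increase(441, 534)
print(f"{gain:.1f}%")  # prints 21.1%
```

The result rounds to the 21% stated in the README.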
@@ -1,6 +1,32 @@
|
|
|
1
1
|
# Infinity-Parser2
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
<p align="center">
|
|
4
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
|
|
5
|
+
<p>
|
|
6
|
+
|
|
7
|
+
<p align="center">
|
|
8
|
+
🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
|
|
9
|
+
📊 <a>Dataset (coming soon...)</a> |
|
|
10
|
+
📄 <a>Paper (coming soon...)</a> |
|
|
11
|
+
🚀 <a>Demo (coming soon...)</a>
|
|
12
|
+
</p>
|
|
13
|
+
|
|
14
|
+
## Introduction
|
|
15
|
+
|
|
16
|
+
We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
|
|
17
|
+
|
|
18
|
+
### Key Features
|
|
19
|
+
|
|
20
|
+
- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
|
|
21
|
+
- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
|
|
22
|
+
- **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
|
|
23
|
+
- **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
|
|
24
|
+
|
|
25
|
+
## Performance
|
|
26
|
+
|
|
27
|
+
<p align="left">
|
|
28
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
|
|
29
|
+
<p>
|
|
4
30
|
|
|
5
31
|
## Quick Start
|
|
6
32
|
|
|
@@ -9,13 +35,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
|
|
|
9
35
|
#### Pre-requisites
|
|
10
36
|
|
|
11
37
|
```bash
|
|
12
|
-
#
|
|
38
|
+
# Create a Conda environment (Optional)
|
|
39
|
+
conda create -n infinity_parser2 python=3.12
|
|
40
|
+
conda activate infinity_parser2
|
|
41
|
+
|
|
42
|
+
# Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
|
|
13
43
|
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
|
|
14
44
|
|
|
15
45
|
# Install FlashAttention (required for NVIDIA GPUs).
|
|
16
46
|
# This command builds flash-attn from source, which can take 10 to 30 minutes.
|
|
17
47
|
pip install flash-attn==2.8.3 --no-build-isolation
|
|
18
|
-
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the
|
|
48
|
+
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.
|
|
19
49
|
|
|
20
50
|
# Install vLLM
|
|
21
51
|
# NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
|
|
@@ -42,6 +72,8 @@ pip install -e .
|
|
|
42
72
|
The `parser` command is the fastest way to get started.
|
|
43
73
|
|
|
44
74
|
```bash
|
|
75
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
76
|
+
|
|
45
77
|
# Parse a PDF (outputs Markdown by default)
|
|
46
78
|
parser demo_data/demo.pdf
|
|
47
79
|
|
|
@@ -69,6 +101,8 @@ parser --help
|
|
|
69
101
|
#### Python API
|
|
70
102
|
|
|
71
103
|
```python
|
|
104
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
105
|
+
|
|
72
106
|
from infinity_parser2 import InfinityParser2
|
|
73
107
|
|
|
74
108
|
parser = InfinityParser2()
|
|
@@ -101,7 +135,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")
|
|
|
101
135
|
|
|
102
136
|
# Custom prompt
|
|
103
137
|
result = parser.parse("demo_data/demo.pdf", task_type="custom",
|
|
104
|
-
custom_prompt="
|
|
138
|
+
custom_prompt="Please transform the document's contents into Markdown format.")
|
|
105
139
|
|
|
106
140
|
# Batch processing with custom batch size
|
|
107
141
|
result = parser.parse("demo_data", batch_size=8)
|
|
@@ -255,3 +289,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
|
|
|
255
289
|
- Python 3.12+
|
|
256
290
|
- CUDA-compatible GPU
|
|
257
291
|
- See `setup.py` for full dependency list.
|
|
292
|
+
|
|
293
|
+
## Acknowledgments
|
|
294
|
+
|
|
295
|
+
We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
|
|
@@ -26,28 +26,28 @@ def build_parser() -> argparse.ArgumentParser:
|
|
|
26
26
|
epilog="""
|
|
27
27
|
Examples:
|
|
28
28
|
# Parse a PDF file (default: doc2json -> markdown output)
|
|
29
|
-
parser
|
|
29
|
+
parser demo_data/demo.pdf
|
|
30
30
|
|
|
31
31
|
# Parse with doc2md task type
|
|
32
|
-
parser
|
|
32
|
+
parser demo_data/demo.pdf --task doc2md
|
|
33
33
|
|
|
34
34
|
# Parse with custom prompt
|
|
35
|
-
parser
|
|
35
|
+
parser demo_data/demo.pdf --task custom --prompt "Please transform the document's contents into Markdown format."
|
|
36
36
|
|
|
37
37
|
# Parse multiple files
|
|
38
|
-
parser
|
|
38
|
+
parser demo_data/demo.pdf demo_data/demo.png --output-dir ./results
|
|
39
39
|
|
|
40
40
|
# Parse a directory
|
|
41
|
-
parser
|
|
41
|
+
parser demo_data --output-dir ./results
|
|
42
42
|
|
|
43
43
|
# Output raw JSON
|
|
44
|
-
parser
|
|
44
|
+
parser demo_data/demo.pdf --output-format json
|
|
45
45
|
|
|
46
46
|
# Use transformers backend
|
|
47
|
-
parser
|
|
47
|
+
parser demo_data/demo.pdf --backend transformers
|
|
48
48
|
|
|
49
49
|
# Use vllm-server backend
|
|
50
|
-
parser
|
|
50
|
+
parser demo_data/demo.pdf --backend vllm-server --api-url http://localhost:8000/v1/chat/completions
|
|
51
51
|
""",
|
|
52
52
|
)
|
|
53
53
|
|
|
@@ -136,7 +136,7 @@ Examples:
|
|
|
136
136
|
parser.add_argument(
|
|
137
137
|
"--version",
|
|
138
138
|
action="version",
|
|
139
|
-
version="Infinity-Parser2 0.
|
|
139
|
+
version="Infinity-Parser2 0.2.0",
|
|
140
140
|
)
|
|
141
141
|
|
|
142
142
|
return parser
|
|
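The epilog examples and the `--version` action above rely on standard `argparse` features. The following is a minimal sketch of how such a parser could be wired up, not the package's actual `cli.py`. The flag names are taken from the epilog examples, and the `doc2json` default follows the epilog comment; the positional name `inputs`, the other defaults, and the function name `build_demo_parser` are assumptions made for illustration.

```python
import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for build_parser(); most defaults are assumed."""
    parser = argparse.ArgumentParser(
        prog="parser",
        # Keep the epilog's line breaks instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="Examples:\n  parser demo_data/demo.pdf --task doc2md\n",
    )
    parser.add_argument("inputs", nargs="+", help="files or a directory")
    parser.add_argument("--task", default="doc2json")
    parser.add_argument("--prompt", default=None)
    parser.add_argument("--output-dir", default=None)
    parser.add_argument("--output-format", default="md")  # assumed default
    parser.add_argument("--backend", default="vllm-engine")  # assumed default
    parser.add_argument("--api-url", default=None)
    # action="version" prints the string and exits with status 0.
    parser.add_argument(
        "--version", action="version", version="Infinity-Parser2 0.2.0"
    )
    return parser


args = build_demo_parser().parse_args(
    ["demo_data/demo.pdf", "demo_data/demo.png", "--output-dir", "./results"]
)
print(args.inputs, args.output_dir)
```

Because the version string is a literal here, as in the diff, it has to be bumped by hand on each release alongside `setup.py`.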
@@ -52,7 +52,7 @@ class InfinityParser2:
|
|
|
52
52
|
Example:
|
|
53
53
|
>>> from infinity_parser2 import InfinityParser2
|
|
54
54
|
>>> parser = InfinityParser2(model_name="infly/Infinity-Parser2-Pro")
|
|
55
|
-
>>> result = parser.parse("
|
|
55
|
+
>>> result = parser.parse("demo_data/demo.pdf")
|
|
56
56
|
"""
|
|
57
57
|
|
|
58
58
|
def __init__(
|
|
@@ -183,13 +183,13 @@ class InfinityParser2:
|
|
|
183
183
|
Example:
|
|
184
184
|
>>> parser = InfinityParser2()
|
|
185
185
|
>>> # Single file, returns str
|
|
186
|
-
>>> result = parser.parse("
|
|
186
|
+
>>> result = parser.parse("demo_data/demo.pdf")
|
|
187
187
|
>>> # Multiple files, returns List[str]
|
|
188
|
-
>>> result = parser.parse(["
|
|
188
|
+
>>> result = parser.parse(["demo_data/demo.pdf", "demo_data/demo.png"])
|
|
189
189
|
>>> # Directory, returns Dict[str, str]
|
|
190
|
-
>>> result = parser.parse("
|
|
190
|
+
>>> result = parser.parse("./demo_data")
|
|
191
191
|
>>> # Save results to output_dir, returns None
|
|
192
|
-
>>> parser.parse("
|
|
192
|
+
>>> parser.parse("demo_data/demo.pdf", output_dir="./output")
|
|
193
193
|
"""
|
|
194
194
|
if task_type not in SUPPORTED_TASK_TYPES:
|
|
195
195
|
raise ValueError(f"task_type must be one of {SUPPORTED_TASK_TYPES}, got '{task_type}'")
|
|
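The docstring above describes four return shapes for `parse`: a `str` for a single file, a `List[str]` for a list of files, a `Dict[str, str]` for a directory, and `None` when `output_dir` is set. A caller that accepts any of these might normalize them as in the sketch below. The helper name `iter_documents` and the generated labels are illustrative and not part of the package; the assumption that dictionary keys identify the input files is also not confirmed by the diff.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

ParseResult = Optional[Union[str, List[str], Dict[str, str]]]


def iter_documents(result: ParseResult) -> Iterator[Tuple[str, str]]:
    """Yield (label, text) pairs for any of the documented return shapes."""
    if result is None:
        # Results were written to output_dir; there is nothing to iterate.
        return
    if isinstance(result, str):
        yield ("input", result)
    elif isinstance(result, dict):
        # Directory input: keys are assumed to identify the files.
        yield from result.items()
    else:
        for index, text in enumerate(result):
            yield (f"input[{index}]", text)


# Stand-in values; a real call would pass the value returned by parser.parse(...).
print(list(iter_documents("# Title")))
print(list(iter_documents(["a", "b"])))
print(list(iter_documents({"demo.pdf": "text"})))
print(list(iter_documents(None)))
```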
@@ -204,6 +204,7 @@ class InfinityParser2:
|
|
|
204
204
|
)
|
|
205
205
|
|
|
206
206
|
prompt = self._resolve_prompt(task_type, custom_prompt)
|
|
207
|
+
print(f"[Infinity-Parser2] task_type: {task_type}, prompt: {prompt}")
|
|
207
208
|
|
|
208
209
|
is_directory = isinstance(input_data, str) and os.path.isdir(input_data)
|
|
209
210
|
file_paths = normalize_input(input_data)
|
|
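The hunks above show `parse` validating `task_type` against `SUPPORTED_TASK_TYPES`, resolving a prompt, and (new in 0.2.0) printing both. The body of `_resolve_prompt` is not part of this diff, so the following is only a sketch of what such a step might look like. The task names come from the README; the prompt texts, the `DEFAULT_PROMPTS` table, and the rule that a custom task requires a non-empty prompt are assumptions.

```python
from typing import Optional

# Assumed values: task names appear in the README, prompt texts are placeholders.
SUPPORTED_TASK_TYPES = ("doc2json", "doc2md", "custom")
DEFAULT_PROMPTS = {
    "doc2json": "<placeholder doc2json prompt>",
    "doc2md": "<placeholder doc2md prompt>",
}


def resolve_prompt(task_type: str, custom_prompt: Optional[str] = None) -> str:
    """Hypothetical stand-in for InfinityParser2._resolve_prompt."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"task_type must be one of {SUPPORTED_TASK_TYPES}, got '{task_type}'"
        )
    if task_type == "custom":
        if not custom_prompt:
            raise ValueError("custom_prompt is required when task_type is 'custom'")
        return custom_prompt
    return DEFAULT_PROMPTS[task_type]


prompt = resolve_prompt(
    "custom", "Please transform the document's contents into Markdown format."
)
# Mirrors the message format added in 0.2.0.
print(f"[Infinity-Parser2] task_type: custom, prompt: {prompt}")
```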
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: infinity_parser2
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.2.0
|
|
4
4
|
Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
|
|
5
5
|
Home-page: https://github.com/infly-ai/INF-MLLM
|
|
6
6
|
Author: INF Tech
|
|
@@ -53,7 +53,33 @@ Dynamic: summary
|
|
|
53
53
|
|
|
54
54
|
# Infinity-Parser2
|
|
55
55
|
|
|
56
|
-
|
|
56
|
+
<p align="center">
|
|
57
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
|
|
58
|
+
<p>
|
|
59
|
+
|
|
60
|
+
<p align="center">
|
|
61
|
+
🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
|
|
62
|
+
📊 <a>Dataset (coming soon...)</a> |
|
|
63
|
+
📄 <a>Paper (coming soon...)</a> |
|
|
64
|
+
🚀 <a>Demo (coming soon...)</a>
|
|
65
|
+
</p>
|
|
66
|
+
|
|
67
|
+
## Introduction
|
|
68
|
+
|
|
69
|
+
We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
|
|
70
|
+
|
|
71
|
+
### Key Features
|
|
72
|
+
|
|
73
|
+
- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
|
|
74
|
+
- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
|
|
75
|
+
- **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
|
|
76
|
+
- **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
|
|
77
|
+
|
|
78
|
+
## Performance
|
|
79
|
+
|
|
80
|
+
<p align="left">
|
|
81
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
|
|
82
|
+
<p>
|
|
57
83
|
|
|
58
84
|
## Quick Start
|
|
59
85
|
|
|
@@ -62,13 +88,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
|
|
|
62
88
|
#### Pre-requisites
|
|
63
89
|
|
|
64
90
|
```bash
|
|
65
|
-
#
|
|
91
|
+
# Create a Conda environment (Optional)
|
|
92
|
+
conda create -n infinity_parser2 python=3.12
|
|
93
|
+
conda activate infinity_parser2
|
|
94
|
+
|
|
95
|
+
# Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
|
|
66
96
|
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
|
|
67
97
|
|
|
68
98
|
# Install FlashAttention (required for NVIDIA GPUs).
|
|
69
99
|
# This command builds flash-attn from source, which can take 10 to 30 minutes.
|
|
70
100
|
pip install flash-attn==2.8.3 --no-build-isolation
|
|
71
|
-
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the
|
|
101
|
+
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.
|
|
72
102
|
|
|
73
103
|
# Install vLLM
|
|
74
104
|
# NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
|
|
@@ -95,6 +125,8 @@ pip install -e .
|
|
|
95
125
|
The `parser` command is the fastest way to get started.
|
|
96
126
|
|
|
97
127
|
```bash
|
|
128
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
129
|
+
|
|
98
130
|
# Parse a PDF (outputs Markdown by default)
|
|
99
131
|
parser demo_data/demo.pdf
|
|
100
132
|
|
|
@@ -122,6 +154,8 @@ parser --help
|
|
|
122
154
|
#### Python API
|
|
123
155
|
|
|
124
156
|
```python
|
|
157
|
+
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
|
|
158
|
+
|
|
125
159
|
from infinity_parser2 import InfinityParser2
|
|
126
160
|
|
|
127
161
|
parser = InfinityParser2()
|
|
@@ -154,7 +188,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")
|
|
|
154
188
|
|
|
155
189
|
# Custom prompt
|
|
156
190
|
result = parser.parse("demo_data/demo.pdf", task_type="custom",
|
|
157
|
-
custom_prompt="
|
|
191
|
+
custom_prompt="Please transform the document's contents into Markdown format.")
|
|
158
192
|
|
|
159
193
|
# Batch processing with custom batch size
|
|
160
194
|
result = parser.parse("demo_data", batch_size=8)
|
|
@@ -308,3 +342,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
|
|
|
308
342
|
- Python 3.12+
|
|
309
343
|
- CUDA-compatible GPU
|
|
310
344
|
- See `setup.py` for full dependency list.
|
|
345
|
+
|
|
346
|
+
## Acknowledgments
|
|
347
|
+
|
|
348
|
+
We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
|
|
@@ -32,7 +32,7 @@ install_requires = [
|
|
|
32
32
|
|
|
33
33
|
setup(
|
|
34
34
|
name="infinity_parser2",
|
|
35
|
-
version="0.
|
|
35
|
+
version="0.2.0",
|
|
36
36
|
description="Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.",
|
|
37
37
|
long_description=open("README.md", "r", encoding="utf-8").read(),
|
|
38
38
|
long_description_content_type="text/markdown",
|
|
The remaining 23 files (listed above with +0 -0) were renamed without content changes.