infinity-parser2 0.1.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/PKG-INFO +151 -13
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/README.md +150 -12
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/__init__.py +1 -1
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/cli.py +9 -9
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/parser.py +11 -7
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/PKG-INFO +151 -13
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/setup.py +1 -1
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/__main__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/backends/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/backends/base.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/backends/transformers.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/backends/vllm_engine.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/backends/vllm_server.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/prompts.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/file.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/image.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/model.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/pdf.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2/utils/utils.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/SOURCES.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/dependency_links.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/entry_points.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/requires.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/top_level.txt +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/setup.cfg +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/tests/__init__.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/tests/test_backends.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/tests/test_parser.py +0 -0
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/tests/test_utils.py +0 -0
--- infinity_parser2-0.1.0/PKG-INFO
+++ infinity_parser2-0.3.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: infinity_parser2
-Version: 0.1.0
+Version: 0.3.0
 Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
 Home-page: https://github.com/infly-ai/INF-MLLM
 Author: INF Tech
@@ -53,22 +53,148 @@ Dynamic: summary
 
 # Infinity-Parser2
 
-
+<p align="center">
+<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
+<p>
+
+<p align="center">
+🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
+📊 <a>Dataset (coming soon...)</a> |
+📄 <a>Paper (coming soon...)</a> |
+🚀 <a>Demo (coming soon...)</a>
+</p>
+
+## Introduction
+
+We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
+
+### Key Features
+
+- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
+- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
+- **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
+- **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
+
+## Performance
+
+<p align="left">
+<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
+<p>
 
 ## Quick Start
 
-###
+### 1. Minimal "Hello World" (Native Transformers)
+
+If you are looking for a minimal script to parse a single image to Markdown using the native `transformers` library, here is a simple snippet:
+
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForImageTextToText, AutoProcessor
+from qwen_vl_utils import process_vision_info
+
+# Load the model and processor
+model = AutoModelForImageTextToText.from_pretrained(
+    "infly/Infinity-Parser2-Pro",
+    torch_dtype="float16",
+    device_map="auto",
+)
+processor = AutoProcessor.from_pretrained("infly/Infinity-Parser2-Pro")
+
+# Build the messages for the model
+pil_image = Image.open("demo_data/demo.png").convert("RGB")
+min_pixels = 2048  # 32 * 64
+max_pixels = 16777216  # 4096 * 4096
+prompt = """
+Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
+1. Bbox format: [x1, y1, x2, y2]
+2. Layout Categories: The possible categories are ['header', 'title', 'text', 'figure', 'table', 'formula', 'figure_caption', 'table_caption', 'formula_caption', 'figure_footnote', 'table_footnote', 'page_footnote', 'footer'].
+3. Text Extraction & Formatting Rules:
+    - Figure: For the 'figure' category, the text field should be empty string.
+    - Formula: Format its text as LaTeX.
+    - Table: Format its text as HTML.
+    - All Others (Text, Title, etc.): Format their text as Markdown.
+4. Constraints:
+    - The output text must be the original text from the image, with no translation.
+    - All layout elements must be sorted according to human reading order.
+5. Final Output: The entire output must be a single JSON object.
+"""
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": pil_image,
+                "min_pixels": min_pixels,
+                "max_pixels": max_pixels,
+            },
+            {"type": "text", "text": prompt},
+        ],
+    }
+]
+
+chat_template_kwargs = {"enable_thinking": False}
+
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True, **chat_template_kwargs
+)
+image_inputs, _ = process_vision_info(messages, image_patch_size=16)
+
+inputs = processor(
+    text=text,
+    images=image_inputs,
+    do_resize=False,
+    padding=True,
+    return_tensors="pt",
+)
+
+# Move all tensors to the same device as the model
+inputs = {
+    k: v.to(model.device) if isinstance(v, torch.Tensor) else v
+    for k, v in inputs.items()
+}
+
+# Generate the response
+generated_ids = model.generate(
+    **inputs,
+    max_new_tokens=32768,
+    temperature=0.0,
+    top_p=1.0,
+)
+
+# Strip input tokens, keeping only the newly generated response
+generated_ids_trimmed = [
+    out_ids[len(in_ids):]
+    for in_ids, out_ids in zip(inputs["input_ids"], generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)
+print(output_text)
+```
+
+### 2. Advanced Pipeline (infinity_parser2)
+
+For bulk processing, advanced features, or an end-to-end PDF parsing pipeline, we recommend using our infinity_parser2 wrapper.
 
 #### Pre-requisites
 
 ```bash
-#
+# Create a Conda environment (Optional)
+conda create -n infinity_parser2 python=3.12
+conda activate infinity_parser2
+
+# Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
 pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
 
-# Install FlashAttention (
-#
+# Install FlashAttention (FlashAttention-2 is recommended by default)
+# Standard install (compiles from source, ~10-30 min):
 pip install flash-attn==2.8.3 --no-build-isolation
-#
+# Faster install: download wheel from https://github.com/Dao-AILab/flash-attention/releases. Then run: pip install /path/to/<wheel_filename>.whl
+# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See: https://github.com/Dao-AILab/flash-attention
+# NOTE: The code will prioritize detecting FlashAttention-3. If not found, it falls back to FlashAttention-2.
 
 # Install vLLM
 # NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
@@ -78,23 +204,29 @@ pip install vllm==0.17.1
 
 #### Install infinity_parser2
 
+Install from PyPI
+
 ```bash
-# From PyPI
 pip install infinity_parser2
+```
+
+Install from source code
 
-
+```bash
 git clone https://github.com/infly-ai/INF-MLLM.git
 cd INF-MLLM/Infinity-Parser2
 pip install -e .
 ```
 
-
+#### Usage
 
-
+##### Command Line
 
 The `parser` command is the fastest way to get started.
 
 ```bash
+# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
 # Parse a PDF (outputs Markdown by default)
 parser demo_data/demo.pdf
 
@@ -119,9 +251,11 @@ parser demo_data/demo.png --task doc2md
 parser --help
 ```
 
-
+##### Python API
 
 ```python
+# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
 from infinity_parser2 import InfinityParser2
 
 parser = InfinityParser2()
@@ -154,7 +288,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")
 
 # Custom prompt
 result = parser.parse("demo_data/demo.pdf", task_type="custom",
-                      custom_prompt="
+                      custom_prompt="Please transform the document's contents into Markdown format.")
 
 # Batch processing with custom batch size
 result = parser.parse("demo_data", batch_size=8)
@@ -308,3 +442,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
 - Python 3.12+
 - CUDA-compatible GPU
 - See `setup.py` for full dependency list.
+
+## Acknowledgments
+
+We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
--- infinity_parser2-0.1.0/infinity_parser2/cli.py
+++ infinity_parser2-0.3.0/infinity_parser2/cli.py
@@ -26,28 +26,28 @@ def build_parser() -> argparse.ArgumentParser:
         epilog="""
 Examples:
   # Parse a PDF file (default: doc2json -> markdown output)
-  parser
+  parser demo_data/demo.pdf
 
   # Parse with doc2md task type
-  parser
+  parser demo_data/demo.pdf --task doc2md
 
   # Parse with custom prompt
-  parser
+  parser demo_data/demo.pdf --task custom --prompt "Please transform the document's contents into Markdown format."
 
   # Parse multiple files
-  parser
+  parser demo_data/demo.pdf demo_data/demo.png --output-dir ./results
 
   # Parse a directory
-  parser
+  parser demo_data --output-dir ./results
 
   # Output raw JSON
-  parser
+  parser demo_data/demo.pdf --output-format json
 
   # Use transformers backend
-  parser
+  parser demo_data/demo.pdf --backend transformers
 
   # Use vllm-server backend
-  parser
+  parser demo_data/demo.pdf --backend vllm-server --api-url http://localhost:8000/v1/chat/completions
 """,
     )
 
@@ -136,7 +136,7 @@ Examples:
     parser.add_argument(
         "--version",
         action="version",
-        version="Infinity-Parser2 0.1.0",
+        version="Infinity-Parser2 0.3.0",
     )
 
     return parser
--- infinity_parser2-0.1.0/infinity_parser2/parser.py
+++ infinity_parser2-0.3.0/infinity_parser2/parser.py
@@ -52,7 +52,7 @@ class InfinityParser2:
     Example:
         >>> from infinity_parser2 import InfinityParser2
         >>> parser = InfinityParser2(model_name="infly/Infinity-Parser2-Pro")
-        >>> result = parser.parse("
+        >>> result = parser.parse("demo_data/demo.pdf")
     """
 
     def __init__(
@@ -86,8 +86,11 @@ class InfinityParser2:
         self.kwargs = kwargs
 
         # Initialize model cache and resolve model path (stored separately)
-
-
+        if self.backend_name == "vllm-server":
+            self._model_path = self.model_name
+        else:
+            cache = get_model_cache(model_cache_dir)
+            self._model_path = cache.resolve_model_path(self.model_name)
 
         self._backend: BaseBackend = self._init_backend()
 
@@ -183,13 +186,13 @@ class InfinityParser2:
         Example:
             >>> parser = InfinityParser2()
            >>> # Single file, returns str
-            >>> result = parser.parse("
+            >>> result = parser.parse("demo_data/demo.pdf")
            >>> # Multiple files, returns List[str]
-            >>> result = parser.parse(["
+            >>> result = parser.parse(["demo_data/demo.pdf", "demo_data/demo.png"])
            >>> # Directory, returns Dict[str, str]
-            >>> result = parser.parse("
+            >>> result = parser.parse("./demo_data")
            >>> # Save results to output_dir, returns None
-            >>> parser.parse("
+            >>> parser.parse("demo_data/demo.pdf", output_dir="./output")
         """
         if task_type not in SUPPORTED_TASK_TYPES:
             raise ValueError(f"task_type must be one of {SUPPORTED_TASK_TYPES}, got '{task_type}'")
@@ -204,6 +207,7 @@ class InfinityParser2:
         )
 
         prompt = self._resolve_prompt(task_type, custom_prompt)
+        print(f"[Infinity-Parser2] task_type: {task_type}, prompt: {prompt}")
 
         is_directory = isinstance(input_data, str) and os.path.isdir(input_data)
         file_paths = normalize_input(input_data)
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: infinity_parser2
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.3.0
|
|
4
4
|
Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
|
|
5
5
|
Home-page: https://github.com/infly-ai/INF-MLLM
|
|
6
6
|
Author: INF Tech
|
|
@@ -53,22 +53,148 @@ Dynamic: summary
|
|
|
53
53
|
|
|
54
54
|
# Infinity-Parser2
|
|
55
55
|
|
|
56
|
-
|
|
56
|
+
<p align="center">
|
|
57
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
|
|
58
|
+
<p>
|
|
59
|
+
|
|
60
|
+
<p align="center">
|
|
61
|
+
🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
|
|
62
|
+
📊 <a>Dataset (coming soon...)</a> |
|
|
63
|
+
📄 <a>Paper (coming soon...)</a> |
|
|
64
|
+
🚀 <a>Demo (coming soon...)</a>
|
|
65
|
+
</p>
|
|
66
|
+
|
|
67
|
+
## Introduction
|
|
68
|
+
|
|
69
|
+
We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
|
|
70
|
+
|
|
71
|
+
### Key Features
|
|
72
|
+
|
|
73
|
+
- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
|
|
74
|
+
- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
|
|
75
|
+
- **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
|
|
76
|
+
- **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
|
|
77
|
+
|
|
78
|
+
## Performance
|
|
79
|
+
|
|
80
|
+
<p align="left">
|
|
81
|
+
<img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
|
|
82
|
+
<p>
|
|
57
83
|
|
|
58
84
|
## Quick Start
|
|
59
85
|
|
|
60
|
-
###
|
|
86
|
+
### 1. Minimal "Hello World" (Native Transformers)
|
|
87
|
+
|
|
88
|
+
If you are looking for a minimal script to parse a single image to Markdown using the native `transformers` library, here is a simple snippet:
|
|
89
|
+
|
|
90
|
+
```python
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and processor
model = AutoModelForImageTextToText.from_pretrained(
    "infly/Infinity-Parser2-Pro",
    torch_dtype="float16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("infly/Infinity-Parser2-Pro")

# Build the messages for the model
pil_image = Image.open("demo_data/demo.png").convert("RGB")
min_pixels = 2048  # 32 * 64
max_pixels = 16777216  # 4096 * 4096
prompt = """
Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: The possible categories are ['header', 'title', 'text', 'figure', 'table', 'formula', 'figure_caption', 'table_caption', 'formula_caption', 'figure_footnote', 'table_footnote', 'page_footnote', 'footer'].
3. Text Extraction & Formatting Rules:
    - Figure: For the 'figure' category, the text field should be an empty string.
    - Formula: Format its text as LaTeX.
    - Table: Format its text as HTML.
    - All Others (Text, Title, etc.): Format their text as Markdown.
4. Constraints:
    - The output text must be the original text from the image, with no translation.
    - All layout elements must be sorted according to human reading order.
5. Final Output: The entire output must be a single JSON object.
"""

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": pil_image,
                "min_pixels": min_pixels,
                "max_pixels": max_pixels,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

chat_template_kwargs = {"enable_thinking": False}

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, **chat_template_kwargs
)
image_inputs, _ = process_vision_info(messages, image_patch_size=16)

inputs = processor(
    text=text,
    images=image_inputs,
    do_resize=False,
    padding=True,
    return_tensors="pt",
)

# Move all tensors to the same device as the model
inputs = {
    k: v.to(model.device) if isinstance(v, torch.Tensor) else v
    for k, v in inputs.items()
}

# Generate the response (greedy decoding)
generated_ids = model.generate(
    **inputs,
    max_new_tokens=32768,
    do_sample=False,
)

# Strip input tokens, keeping only the newly generated response
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs["input_ids"], generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
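
The prompt above requests a single JSON object describing the layout. Once decoded, the elements can be post-processed, for example flattened into Markdown. A minimal sketch, assuming the bbox/category/text schema described in the prompt; the top-level `"elements"` key is an illustrative assumption, not the documented output format:

```python
import json

# Example output following the schema requested in the prompt
# (the top-level "elements" key is an assumption for illustration).
raw = """
{"elements": [
  {"category": "title",  "bbox": [40, 30, 560, 70],   "text": "# A Demo Title"},
  {"category": "text",   "bbox": [40, 90, 560, 180],  "text": "Body paragraph."},
  {"category": "figure", "bbox": [40, 200, 560, 400], "text": ""}
]}
"""

layout = json.loads(raw)

# Elements arrive in reading order; drop text-less ones (e.g. figures)
markdown = "\n\n".join(e["text"] for e in layout["elements"] if e["text"])
print(markdown)
```

The same loop could instead route each element by category, e.g. rendering `table` entries (HTML) and `formula` entries (LaTeX) with dedicated handlers.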

### 2. Advanced Pipeline (infinity_parser2)

For bulk processing, advanced features, or an end-to-end PDF parsing pipeline, we recommend using our `infinity_parser2` wrapper.

#### Pre-requisites
```bash
# Create a Conda environment (optional)
conda create -n infinity_parser2 python=3.12
conda activate infinity_parser2

# Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

# Install FlashAttention (FlashAttention-2 is recommended by default)
# Standard install (compiles from source, ~10-30 min):
pip install flash-attn==2.8.3 --no-build-isolation
# Faster install: download a prebuilt wheel from https://github.com/Dao-AILab/flash-attention/releases, then run: pip install /path/to/<wheel_filename>.whl
# For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See: https://github.com/Dao-AILab/flash-attention
# NOTE: The code will prioritize detecting FlashAttention-3. If not found, it falls back to FlashAttention-2.

# Install vLLM
# NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
pip install vllm==0.17.1
```

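The FlashAttention fallback mentioned in the note above can be pictured as a simple module-availability probe. This is an illustrative sketch only (the FlashAttention-3 module name `flash_attn_interface` is an assumption; the package's real detection logic may differ):

```python
import importlib.util

def pick_flash_attention() -> str:
    """Prefer FlashAttention-3, fall back to FlashAttention-2, else SDPA."""
    # FA3 is assumed to ship the `flash_attn_interface` module
    if importlib.util.find_spec("flash_attn_interface") is not None:
        return "flash_attention_3"
    # FA2 ships the `flash_attn` package
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # PyTorch's built-in scaled-dot-product attention as a last resort
    return "sdpa"

print(pick_flash_attention())
```

Running this after the installation steps is a quick way to confirm which attention backend your environment will actually use.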
#### Install infinity_parser2

Install from PyPI:

```bash
pip install infinity_parser2
```

Install from source:

```bash
git clone https://github.com/infly-ai/INF-MLLM.git
cd INF-MLLM/Infinity-Parser2
pip install -e .
```
#### Usage
##### Command Line
The `parser` command is the fastest way to get started.
```bash
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
# Parse a PDF (outputs Markdown by default)
parser demo_data/demo.pdf

# Parse an image with an explicit task
parser demo_data/demo.png --task doc2md

# Show all options
parser --help
```
##### Python API
```python
# NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
from infinity_parser2 import InfinityParser2
parser = InfinityParser2()

# Markdown task
result = parser.parse("demo_data/demo.pdf", task_type="doc2md")

# Custom prompt
result = parser.parse("demo_data/demo.pdf", task_type="custom",
                      custom_prompt="Please transform the document's contents into Markdown format.")
# Batch processing with custom batch size
result = parser.parse("demo_data", batch_size=8)
```

## Requirements

- Python 3.12+
- CUDA-compatible GPU
- See `setup.py` for the full dependency list.
## Acknowledgments
We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), and [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing datasets, code, and models.

setup(
    name="infinity_parser2",
    version="0.3.0",
    description="Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",

Renamed without changes:
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/dependency_links.txt
- {infinity_parser2-0.1.0 → infinity_parser2-0.3.0}/infinity_parser2.egg-info/entry_points.txt