infinity-parser2 0.1.0__tar.gz → 0.2.0__tar.gz

Files changed (30)
  1. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/PKG-INFO +43 -5
  2. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/README.md +42 -4
  3. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/__init__.py +1 -1
  4. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/cli.py +9 -9
  5. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/parser.py +6 -5
  6. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/PKG-INFO +43 -5
  7. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/setup.py +1 -1
  8. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/__main__.py +0 -0
  9. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/__init__.py +0 -0
  10. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/base.py +0 -0
  11. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/transformers.py +0 -0
  12. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/vllm_engine.py +0 -0
  13. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/backends/vllm_server.py +0 -0
  14. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/prompts.py +0 -0
  15. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/__init__.py +0 -0
  16. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/file.py +0 -0
  17. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/image.py +0 -0
  18. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/model.py +0 -0
  19. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/pdf.py +0 -0
  20. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/utils/utils.py +0 -0
  21. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/SOURCES.txt +0 -0
  22. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/dependency_links.txt +0 -0
  23. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/entry_points.txt +0 -0
  24. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/requires.txt +0 -0
  25. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/top_level.txt +0 -0
  26. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/setup.cfg +0 -0
  27. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/__init__.py +0 -0
  28. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_backends.py +0 -0
  29. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_parser.py +0 -0
  30. {infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/tests/test_utils.py +0 -0
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: infinity_parser2
- Version: 0.1.0
+ Version: 0.2.0
  Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
  Home-page: https://github.com/infly-ai/INF-MLLM
  Author: INF Tech
@@ -53,7 +53,33 @@ Dynamic: summary

  # Infinity-Parser2

- Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro model. It converts **PDF files** and **images** (PNG, JPG, WEBP) into structured Markdown or JSON with layout information.
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
+ <p>
+
+ <p align="center">
+ 🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
+ 📊 <a>Dataset (coming soon...)</a> |
+ 📄 <a>Paper (coming soon...)</a> |
+ 🚀 <a>Demo (coming soon...)</a>
+ </p>
+
+ ## Introduction
+
+ We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
+
+ ### Key Features
+
+ - **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
+ - **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
+ - **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
+ - **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
+
+ ## Performance
+
+ <p align="left">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
+ <p>

  ## Quick Start

@@ -62,13 +88,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
  #### Pre-requisites

  ```bash
- # Install PyTorch (CUDA). Find the proper version on the [official site](https://pytorch.org/get-started/previous-versions) based on your CUDA version.
+ # Create a Conda environment (Optional)
+ conda create -n infinity_parser2 python=3.12
+ conda activate infinity_parser2
+
+ # Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
  pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

  # Install FlashAttention (required for NVIDIA GPUs).
  # This command builds flash-attn from source, which can take 10 to 30 minutes.
  pip install flash-attn==2.8.3 --no-build-isolation
- # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the [official guide](https://github.com/Dao-AILab/flash-attention).
+ # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.

  # Install vLLM
  # NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
@@ -95,6 +125,8 @@ pip install -e .
  The `parser` command is the fastest way to get started.

  ```bash
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  # Parse a PDF (outputs Markdown by default)
  parser demo_data/demo.pdf

@@ -122,6 +154,8 @@ parser --help
  #### Python API

  ```python
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  from infinity_parser2 import InfinityParser2

  parser = InfinityParser2()
@@ -154,7 +188,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")

  # Custom prompt
  result = parser.parse("demo_data/demo.pdf", task_type="custom",
- custom_prompt="Extract the title and authors only.")
+ custom_prompt="Please transform the document's contents into Markdown format.")

  # Batch processing with custom batch size
  result = parser.parse("demo_data", batch_size=8)
@@ -308,3 +342,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
  - Python 3.12+
  - CUDA-compatible GPU
  - See `setup.py` for full dependency list.
+
+ ## Acknowledgments
+
+ We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/README.md
@@ -1,6 +1,32 @@
  # Infinity-Parser2

- Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro model. It converts **PDF files** and **images** (PNG, JPG, WEBP) into structured Markdown or JSON with layout information.
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
+ <p>
+
+ <p align="center">
+ 🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
+ 📊 <a>Dataset (coming soon...)</a> |
+ 📄 <a>Paper (coming soon...)</a> |
+ 🚀 <a>Demo (coming soon...)</a>
+ </p>
+
+ ## Introduction
+
+ We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
+
+ ### Key Features
+
+ - **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
+ - **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
+ - **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
+ - **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
+
+ ## Performance
+
+ <p align="left">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
+ <p>

  ## Quick Start

@@ -9,13 +35,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
  #### Pre-requisites

  ```bash
- # Install PyTorch (CUDA). Find the proper version on the [official site](https://pytorch.org/get-started/previous-versions) based on your CUDA version.
+ # Create a Conda environment (Optional)
+ conda create -n infinity_parser2 python=3.12
+ conda activate infinity_parser2
+
+ # Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
  pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

  # Install FlashAttention (required for NVIDIA GPUs).
  # This command builds flash-attn from source, which can take 10 to 30 minutes.
  pip install flash-attn==2.8.3 --no-build-isolation
- # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the [official guide](https://github.com/Dao-AILab/flash-attention).
+ # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.

  # Install vLLM
  # NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
@@ -42,6 +72,8 @@ pip install -e .
  The `parser` command is the fastest way to get started.

  ```bash
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  # Parse a PDF (outputs Markdown by default)
  parser demo_data/demo.pdf

@@ -69,6 +101,8 @@ parser --help
  #### Python API

  ```python
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  from infinity_parser2 import InfinityParser2

  parser = InfinityParser2()
@@ -101,7 +135,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")

  # Custom prompt
  result = parser.parse("demo_data/demo.pdf", task_type="custom",
- custom_prompt="Extract the title and authors only.")
+ custom_prompt="Please transform the document's contents into Markdown format.")

  # Batch processing with custom batch size
  result = parser.parse("demo_data", batch_size=8)
@@ -255,3 +289,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
  - Python 3.12+
  - CUDA-compatible GPU
  - See `setup.py` for full dependency list.
+
+ ## Acknowledgments
+
+ We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/__init__.py
@@ -1,6 +1,6 @@
  """Infinity-Parser2: Document parsing Python package."""

- __version__ = "0.1.0"
+ __version__ = "0.2.0"

  from .parser import InfinityParser2
  from .backends import (
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/cli.py
@@ -26,28 +26,28 @@ def build_parser() -> argparse.ArgumentParser:
  epilog="""
  Examples:
  # Parse a PDF file (default: doc2json -> markdown output)
- parser document.pdf
+ parser demo_data/demo.pdf

  # Parse with doc2md task type
- parser document.pdf --task doc2md
+ parser demo_data/demo.pdf --task doc2md

  # Parse with custom prompt
- parser document.pdf --task custom --prompt "Extract the title and authors"
+ parser demo_data/demo.pdf --task custom --prompt "Please transform the document's contents into Markdown format."

  # Parse multiple files
- parser doc1.pdf doc2.png --output-dir ./results
+ parser demo_data/demo.pdf demo_data/demo.png --output-dir ./results

  # Parse a directory
- parser ./docs --output-dir ./results
+ parser demo_data --output-dir ./results

  # Output raw JSON
- parser document.pdf --output-format json
+ parser demo_data/demo.pdf --output-format json

  # Use transformers backend
- parser document.pdf --backend transformers
+ parser demo_data/demo.pdf --backend transformers

  # Use vllm-server backend
- parser document.pdf --backend vllm-server --api-url http://localhost:8000/v1/chat/completions
+ parser demo_data/demo.pdf --backend vllm-server --api-url http://localhost:8000/v1/chat/completions
  """,
  )

@@ -136,7 +136,7 @@ Examples:
  parser.add_argument(
  "--version",
  action="version",
- version="Infinity-Parser2 0.1.0",
+ version="Infinity-Parser2 0.2.0",
  )

  return parser
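Review note on the `--version` hunk: in this diff the version string is bumped by hand in `__init__.py`, `cli.py`, and `setup.py`. One way to avoid editing the CLI on every release is to build the `--version` text from `__version__`. The following is a minimal sketch, not the package's actual code: the local `__version__` constant stands in for `infinity_parser2.__version__`, and `version_output` is a hypothetical helper added only to show the result.

```python
import argparse
import contextlib
import io

# Assumption: stand-in for infinity_parser2.__version__, defined locally
# so the sketch runs without the package installed.
__version__ = "0.2.0"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version text follows __version__."""
    parser = argparse.ArgumentParser(prog="parser")
    parser.add_argument(
        "--version",
        action="version",
        version=f"Infinity-Parser2 {__version__}",
    )
    return parser


def version_output() -> str:
    """Hypothetical helper: capture what `parser --version` prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            build_parser().parse_args(["--version"])
        except SystemExit:
            pass  # argparse exits after printing the version string
    return buffer.getvalue().strip()
```

With this shape, only the single `__version__` assignment would need to change for the CLI output to follow.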
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2/parser.py
@@ -52,7 +52,7 @@ class InfinityParser2:
  Example:
  >>> from infinity_parser2 import InfinityParser2
  >>> parser = InfinityParser2(model_name="infly/Infinity-Parser2-Pro")
- >>> result = parser.parse("document.pdf")
+ >>> result = parser.parse("demo_data/demo.pdf")
  """

  def __init__(
@@ -183,13 +183,13 @@ class InfinityParser2:
  Example:
  >>> parser = InfinityParser2()
  >>> # Single file, returns str
- >>> result = parser.parse("document.pdf")
+ >>> result = parser.parse("demo_data/demo.pdf")
  >>> # Multiple files, returns List[str]
- >>> result = parser.parse(["doc1.pdf", "doc2.pdf"])
+ >>> result = parser.parse(["demo_data/demo.pdf", "demo_data/demo.png"])
  >>> # Directory, returns Dict[str, str]
- >>> result = parser.parse("/path/to/docs")
+ >>> result = parser.parse("./demo_data")
  >>> # Save results to output_dir, returns None
- >>> parser.parse("demo_data/demo.pdf", output_dir="./output")
+ >>> parser.parse("demo_data/demo.pdf", output_dir="./output")
  """
  if task_type not in SUPPORTED_TASK_TYPES:
  raise ValueError(f"task_type must be one of {SUPPORTED_TASK_TYPES}, got '{task_type}'")
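Review note on the `parse` docstring: the examples describe four return shapes (`str` for a single file, `List[str]` for several files, `Dict[str, str]` for a directory, `None` when `output_dir` is given). Callers that accept any of these may want to branch on the type. The sketch below is illustrative only: `describe_result` and the `ParseResult` alias are hypothetical names, and it assumes the dictionary has one entry per parsed file, which the docstring does not state.

```python
from typing import Dict, List, Optional, Union

# Hypothetical alias covering the four shapes listed in the docstring.
ParseResult = Optional[Union[str, List[str], Dict[str, str]]]


def describe_result(result: ParseResult) -> str:
    """Summarize a parse() return value according to its shape."""
    if result is None:
        # Results were written to output_dir instead of being returned.
        return "saved to output_dir"
    if isinstance(result, str):
        return f"single document, {len(result)} characters"
    if isinstance(result, dict):
        # Assumption: one dictionary entry per parsed file.
        return f"directory, {len(result)} file(s)"
    return f"list, {len(result)} document(s)"
```

The `str` check comes before the list branch because a string is itself a sequence, so testing it first keeps a single-file result from being counted as a list.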
@@ -204,6 +204,7 @@ class InfinityParser2:
  )

  prompt = self._resolve_prompt(task_type, custom_prompt)
+ print(f"[Infinity-Parser2] task_type: {task_type}, prompt: {prompt}")

  is_directory = isinstance(input_data, str) and os.path.isdir(input_data)
  file_paths = normalize_input(input_data)
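Review note on the added `print` call: it writes the resolved prompt to stdout on every `parse` call, which library users cannot switch off. A possible alternative, sketched here under assumptions, is to route the same message through the standard `logging` module. The logger name `infinity_parser2` and the helper `report_prompt` are hypothetical and do not appear in the package.

```python
import logging

# Assumption: a package-level logger; the name is illustrative only.
logger = logging.getLogger("infinity_parser2")


def report_prompt(task_type: str, prompt: str) -> str:
    """Log the resolved prompt at INFO level and return the message text."""
    message = f"[Infinity-Parser2] task_type: {task_type}, prompt: {prompt}"
    logger.info(message)
    return message
```

An application that wants the output could call `logging.basicConfig(level=logging.INFO)`. Without any logging configuration the message is not shown, because the default level suppresses INFO records.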
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/infinity_parser2.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: infinity_parser2
- Version: 0.1.0
+ Version: 0.2.0
  Summary: Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.
  Home-page: https://github.com/infly-ai/INF-MLLM
  Author: INF Tech
@@ -53,7 +53,33 @@ Dynamic: summary

  # Infinity-Parser2

- Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro model. It converts **PDF files** and **images** (PNG, JPG, WEBP) into structured Markdown or JSON with layout information.
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/logo.png" width="400"/>
+ <p>
+
+ <p align="center">
+ 🤗 <a href="https://huggingface.co/infly/Infinity-Parser2-Pro">Model</a> |
+ 📊 <a>Dataset (coming soon...)</a> |
+ 📄 <a>Paper (coming soon...)</a> |
+ 🚀 <a>Demo (coming soon...)</a>
+ </p>
+
+ ## Introduction
+
+ We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model that achieves a new state-of-the-art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. This enables the model to consolidate robust multi-modal parsing capabilities into a unified architecture, delivering brand-new zero-shot capabilities for diverse real-world business scenarios.
+
+ ### Key Features
+
+ - **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
+ - **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
+ - **Breakthrough Parsing Performance**: It substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
+ - **Inference Acceleration**: By adopting the highly efficient MoE architecture, our inference throughput has increased by 21% (from 441 to 534 tokens/sec), reducing deployment latency and costs.
+
+ ## Performance
+
+ <p align="left">
+ <img src="https://raw.githubusercontent.com/infly-ai/INF-MLLM/main/Infinity-Parser2/assets/document_parsing_performance_evaluation.png" width="1200"/>
+ <p>

  ## Quick Start

@@ -62,13 +88,17 @@ Infinity-Parser2 is a document parsing tool powered by the Infinity-Parser2-Pro
  #### Pre-requisites

  ```bash
- # Install PyTorch (CUDA). Find the proper version on the [official site](https://pytorch.org/get-started/previous-versions) based on your CUDA version.
+ # Create a Conda environment (Optional)
+ conda create -n infinity_parser2 python=3.12
+ conda activate infinity_parser2
+
+ # Install PyTorch (CUDA). Find the proper version at https://pytorch.org/get-started/previous-versions based on your CUDA version.
  pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

  # Install FlashAttention (required for NVIDIA GPUs).
  # This command builds flash-attn from source, which can take 10 to 30 minutes.
  pip install flash-attn==2.8.3 --no-build-isolation
- # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the [official guide](https://github.com/Dao-AILab/flash-attention).
+ # For Hopper GPUs (e.g. H100, H800), we recommend FlashAttention-3 instead. See the official guide at https://github.com/Dao-AILab/flash-attention.

  # Install vLLM
  # NOTE: you may need to run the command below to resolve triton and numpy conflicts before installing vllm.
@@ -95,6 +125,8 @@ pip install -e .
  The `parser` command is the fastest way to get started.

  ```bash
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  # Parse a PDF (outputs Markdown by default)
  parser demo_data/demo.pdf

@@ -122,6 +154,8 @@ parser --help
  #### Python API

  ```python
+ # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
+
  from infinity_parser2 import InfinityParser2

  parser = InfinityParser2()
@@ -154,7 +188,7 @@ result = parser.parse("demo_data/demo.pdf", task_type="doc2md")

  # Custom prompt
  result = parser.parse("demo_data/demo.pdf", task_type="custom",
- custom_prompt="Extract the title and authors only.")
+ custom_prompt="Please transform the document's contents into Markdown format.")

  # Batch processing with custom batch size
  result = parser.parse("demo_data", batch_size=8)
@@ -308,3 +342,7 @@ print(cache.resolve_model_path("infly/Infinity-Parser2-Pro"))
  - Python 3.12+
  - CUDA-compatible GPU
  - See `setup.py` for full dependency list.
+
+ ## Acknowledgments
+
+ We would like to thank [Qwen3.5](https://github.com/QwenLM/Qwen3.5), [ms-swift](https://github.com/modelscope/ms-swift), [VeRL](https://github.com/verl-project/verl), [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), [olmocr](https://huggingface.co/datasets/allenai/olmOCR-bench), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), [dots.ocr](https://github.com/rednote-hilab/dots.ocr), [Chandra-OCR-2](https://github.com/datalab-to/chandra) for providing dataset, code and models.
{infinity_parser2-0.1.0 → infinity_parser2-0.2.0}/setup.py
@@ -32,7 +32,7 @@ install_requires = [

  setup(
  name="infinity_parser2",
- version="0.1.0",
+ version="0.2.0",
  description="Document parsing Python package supporting PDF and image parsing using Infinity-Parser2-Pro model.",
  long_description=open("README.md", "r", encoding="utf-8").read(),
  long_description_content_type="text/markdown",