sinapsis-deepseek-ocr 0.1.0__py3-none-any.whl
- sinapsis_deepseek_ocr/__init__.py +0 -0
- sinapsis_deepseek_ocr/helpers/__init__.py +0 -0
- sinapsis_deepseek_ocr/helpers/bbox_utils.py +18 -0
- sinapsis_deepseek_ocr/helpers/grounding_parser.py +38 -0
- sinapsis_deepseek_ocr/helpers/mode_registry.py +33 -0
- sinapsis_deepseek_ocr/helpers/schemas.py +59 -0
- sinapsis_deepseek_ocr/helpers/tags.py +19 -0
- sinapsis_deepseek_ocr/templates/__init__.py +18 -0
- sinapsis_deepseek_ocr/templates/deepseek_ocr_inference.py +218 -0
- sinapsis_deepseek_ocr-0.1.0.dist-info/METADATA +295 -0
- sinapsis_deepseek_ocr-0.1.0.dist-info/RECORD +14 -0
- sinapsis_deepseek_ocr-0.1.0.dist-info/WHEEL +5 -0
- sinapsis_deepseek_ocr-0.1.0.dist-info/licenses/LICENSE +661 -0
- sinapsis_deepseek_ocr-0.1.0.dist-info/top_level.txt +1 -0
@@ -0,0 +1,295 @@
Metadata-Version: 2.4
Name: sinapsis-deepseek-ocr
Version: 0.1.0
Summary: A powerful DeepSeek-based Optical Character Recognition (OCR) implementation supporting text extraction and grounding.
Author-email: SinapsisAI <dev@sinapsis.tech>
Project-URL: Homepage, https://sinapsis.tech
Project-URL: Documentation, https://docs.sinapsis.tech/docs
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-ocr.git
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sinapsis>=0.2.24
Requires-Dist: flash-attn>=2.8.3
Requires-Dist: torch==2.8.0
Requires-Dist: transformers==4.46.3
Requires-Dist: tokenizers==0.20.3
Requires-Dist: einops>=0.8.1
Requires-Dist: addict>=2.4.0
Requires-Dist: easydict>=1.13
Requires-Dist: accelerate>=1.12.0
Requires-Dist: sinapsis-generic-data-tools>=0.1.11
Provides-Extra: sinapsis-data-readers
Requires-Dist: sinapsis-data-readers[opencv]>=0.1.0; extra == "sinapsis-data-readers"
Provides-Extra: sinapsis-data-writers
Requires-Dist: sinapsis-data-writers>=0.1.0; extra == "sinapsis-data-writers"
Provides-Extra: sinapsis-data-visualization
Requires-Dist: sinapsis-data-visualization[visualization-matplotlib]>=0.1.0; extra == "sinapsis-data-visualization"
Provides-Extra: all
Requires-Dist: sinapsis-deepseek-ocr[sinapsis-data-readers]; extra == "all"
Requires-Dist: sinapsis-deepseek-ocr[sinapsis-data-writers]; extra == "all"
Requires-Dist: sinapsis-deepseek-ocr[sinapsis-data-visualization]; extra == "all"
Dynamic: license-file

<h1 align="center">
<br>
<a href="https://sinapsis.tech/">
<img
src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
alt="" width="300">
</a><br>
Sinapsis DeepSeek OCR
<br>
</h1>

<h4 align="center">DeepSeek-based Optical Character Recognition (OCR) for images</h4>

<p align="center">
<a href="#installation">🐍 Installation</a> •
<a href="#features">🚀 Features</a> •
<a href="#usage">📚 Usage example</a> •
<a href="#webapp">🌐 Webapp</a> •
<a href="#documentation">📙 Documentation</a> •
<a href="#license">🔍 License</a>
</p>

**Sinapsis DeepSeek OCR** provides a powerful implementation for extracting text from images using DeepSeek's OCR model. It supports optional grounding for bounding box extraction.

<h2 id="installation">🐍 Installation</h2>

Install using your package manager of choice. We encourage the use of <code>uv</code>.

Example with <code>uv</code>:

```bash
uv pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech
```

or with raw <code>pip</code>:

```bash
pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech
```

> [!IMPORTANT]
> Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with <code>uv</code>:

```bash
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
```

or with raw <code>pip</code>:

```bash
pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
```

> [!TIP]
> Use the CLI command `sinapsis info --all-template-names` to show a list of all the template names available with Sinapsis OCR.

> [!TIP]
> Use the CLI command `sinapsis info --example-template-config DeepSeekOCRInference` to produce an example agent config for the DeepSeekOCRInference template.

<h2 id="features">🚀 Features</h2>

<h3>Templates Supported</h3>

This module includes a template tailored for the DeepSeek OCR engine:

- **DeepSeekOCRInference**: Uses DeepSeek's OCR model to extract text from images. Supports optional grounding for bounding box extraction.

<details>
<summary><strong><span style="font-size: 1.25em;">DeepSeekOCRInference Attributes</span></strong></summary>

- **`prompt`** (str): The prompt to send to the model. Defaults to `"OCR the image."`.
- **`enable_grounding`** (bool): Whether to enable grounding for bounding box extraction. Defaults to `False`.
- **`mode`** (str): The inference mode. Options: `"tiny"`, `"small"`, `"gundam"`, `"base"`, `"large"`. Defaults to `"base"`.
- **`init_args`** (DeepSeekOCRInitArgs): Initialization arguments for the model, including:
  - `pretrained_model_name_or_path`: Model identifier. Defaults to `"deepseek-ai/DeepSeek-OCR"`.
  - `torch_dtype`: Model precision (`"float16"`, `"bfloat16"`, `"auto"`). Defaults to `"auto"`.
  - `attn_implementation`: Attention implementation. Defaults to `"flash_attention_2"`.
- **Note**: This model requires CUDA. CPU inference is not supported.

</details>

<h2 id="usage">📚 Usage example</h2>

<details>
<summary><strong><span style="font-size: 1.4em;">Text Extraction (No Grounding)</span></strong></summary>

```yaml
agent:
  name: deepseek_ocr_agent
  description: Agent to run inference with DeepSeek OCR

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Perform OCR."
    enable_grounding: false
    mode: base
```
</details>

<details>
<summary><strong><span style="font-size: 1.4em;">With Grounding (Bounding Boxes)</span></strong></summary>

```yaml
agent:
  name: deepseek_ocr_grounding_agent
  description: Agent with grounding for bbox extraction

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Convert the document to markdown."
    enable_grounding: true
    mode: base

- template_name: BBoxDrawer
  class_name: BBoxDrawer
  template_input: DeepSeekOCRInference
  attributes:
    draw_confidence: true
    draw_extra_labels: true

- template_name: ImageSaver
  class_name: ImageSaver
  template_input: BBoxDrawer
  attributes:
    save_dir: output
    root_dir: dataset
```
</details>
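When grounding is enabled, DeepSeek's OCR models interleave the transcription with grounding tokens of the form `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`, with coordinates normalized to a 0–999 grid. A minimal sketch of a parser that recovers pixel-space boxes follows; the token format and normalization are assumptions based on DeepSeek's published grounding convention, and the package's own `grounding_parser`/`bbox_utils` helpers may work differently:

```python
import ast
import re

# Assumed grounded-output convention (DeepSeek's published grounding format):
# <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2], ...]<|/det|>, coords on a 0-999 grid.
GROUNDING_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(\[\[.*?\]\])<\|/det\|>", re.DOTALL
)


def parse_grounding(text: str, width: int, height: int) -> list[dict]:
    """Recover (label, pixel-space bbox) pairs from a grounded transcript."""
    detections = []
    for label, raw_boxes in GROUNDING_PATTERN.findall(text):
        # raw_boxes is a literal list of [x1, y1, x2, y2] quadruples.
        for x1, y1, x2, y2 in ast.literal_eval(raw_boxes):
            detections.append({
                "label": label.strip(),
                # Rescale from the normalized 0-999 grid to pixel coordinates.
                "bbox": (
                    int(x1 / 999 * width),
                    int(y1 / 999 * height),
                    int(x2 / 999 * width),
                    int(y2 / 999 * height),
                ),
            })
    return detections


# Example on a hypothetical 1000x500 image:
sample = "<|ref|>Invoice<|/ref|><|det|>[[100, 50, 400, 90]]<|/det|>"
print(parse_grounding(sample, 1000, 500))
```

The recovered boxes are what a downstream drawer template (such as BBoxDrawer above) would render onto the image.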

To run, simply use:

```bash
sinapsis run name_of_the_config.yml
```

<h2 id="webapp">🌐 Webapp</h2>

The webapp provides a simple interface to extract text from images using DeepSeek OCR. Upload your image, and the app will process it and display the detected text.

> [!IMPORTANT]
> To run the app you first need to clone the sinapsis-ocr repository:

```bash
git clone https://github.com/Sinapsis-ai/sinapsis-ocr.git
cd sinapsis-ocr
```

> [!NOTE]
> If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`

> [!IMPORTANT]
> To use DeepSeek OCR in the webapp, set the environment variable:
> `AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml`
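Put together, the two environment variables above can be set before launching the app (`GRADIO_SHARE_APP` is optional; the config path is the one given in the note above):

```shell
# Optional: expose a public Gradio share link
export GRADIO_SHARE_APP=True
# Point the webapp at the DeepSeek OCR agent config
export AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml
```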
<details>
<summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>

**IMPORTANT**: This Docker image depends on the sinapsis:base image. Please refer to the official [sinapsis](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker) instructions to build with Docker.

1. **Build the sinapsis-ocr image**:

```bash
docker compose -f docker/compose.yaml build
```

2. **Start the app container**:

```bash
docker compose -f docker/compose_app.yaml up
```

3. **Check the status**:

```bash
docker logs -f sinapsis-ocr-app
```

4. The logs will display the URL to access the webapp, e.g.:

**NOTE**: The URL may differ; check the log output.

```bash
Running on local URL: http://127.0.0.1:7860
```

5. To stop the app:

```bash
docker compose -f docker/compose_app.yaml down
```

</details>

<details>
<summary id="uv"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>

To run the webapp using the <code>uv</code> package manager, follow these steps:

1. **Create the virtual environment and sync the dependencies**:

```bash
uv sync --frozen
```

2. **Install packages**:

```bash
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
```

3. **Run the webapp**:

```bash
uv run webapps/gradio_ocr.py
```

4. **The terminal will display the URL to access the webapp, e.g.**:

```bash
Running on local URL: http://127.0.0.1:7860
```

**NOTE**: The URL may differ; check the terminal output.

5. To stop the app, press `Ctrl + C` in the terminal.

</details>

<h2 id="documentation">📙 Documentation</h2>

Documentation for this and other Sinapsis packages is available on the [sinapsis website](https://docs.sinapsis.tech/docs).

Tutorials for different projects within Sinapsis are available on the [sinapsis tutorials page](https://docs.sinapsis.tech/tutorials).

<h2 id="license">🔍 License</h2>

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.

For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.

@@ -0,0 +1,14 @@
sinapsis_deepseek_ocr/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
sinapsis_deepseek_ocr/helpers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
sinapsis_deepseek_ocr/helpers/bbox_utils.py,sha256=3lkYXPUP_bM0G1PClBA8364dtU9x9JCq_r_FaDyqX7I,565
sinapsis_deepseek_ocr/helpers/grounding_parser.py,sha256=nFKjP1oR1wMgIGhPImM49I2DJ_8ThESsUER8rDUaVpA,1176
sinapsis_deepseek_ocr/helpers/mode_registry.py,sha256=Waj27UsI9dOB9ATjfIxJgM-vOXUM_9JPlfqKzfmvBnY,1267
sinapsis_deepseek_ocr/helpers/schemas.py,sha256=b6TsFzVii6wYe5FvoW4r1y1wg2906D8anDboDRqTQw4,2396
sinapsis_deepseek_ocr/helpers/tags.py,sha256=-drXXyKOvPDlA5ykdpNMGrRd1nJJ7JZQ6xNUmU1FI-I,547
sinapsis_deepseek_ocr/templates/__init__.py,sha256=4bVYh-bWmJj-g9ysLoFLcfIlHQzqlg-Pl-BADYuBl74,498
sinapsis_deepseek_ocr/templates/deepseek_ocr_inference.py,sha256=PrHYjzP9TzjKVZg7bwApocq7MSmoyv2Mi8mVluWt6KM,7809
sinapsis_deepseek_ocr-0.1.0.dist-info/licenses/LICENSE,sha256=ILBn-G3jdarm2w8oOrLmXeJNU3czuJvVhDLBASWdhM8,34522
sinapsis_deepseek_ocr-0.1.0.dist-info/METADATA,sha256=2hFdUbA425nS20ZrRobECG0D57jZg86m13uHh6enWkk,9304
sinapsis_deepseek_ocr-0.1.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
sinapsis_deepseek_ocr-0.1.0.dist-info/top_level.txt,sha256=hb_Gw4JQgFdjuF9wjm4wwajLTiEhI79jzug9vgRbHv0,22
sinapsis_deepseek_ocr-0.1.0.dist-info/RECORD,,