mdify-cli 1.6.0__tar.gz → 2.5.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {mdify_cli-1.6.0/mdify_cli.egg-info → mdify_cli-2.5.0}/PKG-INFO +38 -15
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/README.md +36 -14
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify/__init__.py +1 -1
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify/cli.py +316 -128
- mdify_cli-2.5.0/mdify/container.py +132 -0
- mdify_cli-2.5.0/mdify/docling_client.py +224 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0/mdify_cli.egg-info}/PKG-INFO +38 -15
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify_cli.egg-info/SOURCES.txt +5 -1
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify_cli.egg-info/requires.txt +1 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/pyproject.toml +2 -2
- mdify_cli-2.5.0/tests/test_cli.py +1193 -0
- mdify_cli-2.5.0/tests/test_container.py +317 -0
- mdify_cli-2.5.0/tests/test_docling_client.py +358 -0
- mdify_cli-1.6.0/tests/test_cli.py +0 -77
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/LICENSE +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/assets/mdify.png +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify/__main__.py +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify_cli.egg-info/dependency_links.txt +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify_cli.egg-info/entry_points.txt +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/mdify_cli.egg-info/top_level.txt +0 -0
- {mdify_cli-1.6.0 → mdify_cli-2.5.0}/setup.cfg +0 -0
````diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mdify-cli
-Version:
+Version: 2.5.0
 Summary: Convert PDFs and document images into structured Markdown for LLM workflows
 Author: tiroq
 License-Expression: MIT
````
````diff
@@ -24,6 +24,7 @@ Classifier: Topic :: Utilities
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: requests
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == "dev"
 Dynamic: license-file
````
````diff
@@ -100,15 +101,32 @@ Recursively convert files:
 mdify /path/to/documents -r -g "*.pdf"
 ```
 
-###
+### GPU Acceleration
 
-
+For faster processing with NVIDIA GPU:
 ```bash
-mdify
-mdify document.pdf --mask
+mdify --gpu documents/*.pdf
 ```
 
-
+Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+### ⚠️ PII Masking (Deprecated)
+
+The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+## Performance
+
+mdify now uses docling-serve for significantly faster batch processing:
+
+- **Single model load**: Models are loaded once per session, not per file
+- **~10-20x speedup** for multiple file conversions compared to previous versions
+- **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+### First Run Behavior
+
+The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
 ## Options
 
````
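The "First Run Behavior" figures support a quick back-of-the-envelope estimate of batch time. The Python sketch below is illustrative only. `estimate_batch_seconds` is a hypothetical helper and is not part of mdify. It assumes the first conversion absorbs the whole 30-60 s model-loading cost and that every later file in the same session takes the quoted 1-3 s.

```python
from typing import Tuple

# Ranges quoted in the "First Run Behavior" section, in seconds (assumed, not measured).
FIRST_FILE_RANGE = (30.0, 60.0)  # first conversion, including model loading
PER_FILE_RANGE = (1.0, 3.0)      # each subsequent file in the same batch


def estimate_batch_seconds(n_files: int) -> Tuple[float, float]:
    """Return a (low, high) estimate in seconds for converting n_files in one session.

    Hypothetical helper: the first file pays the warm-up cost and every
    later file costs the per-file range.
    """
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    if n_files == 0:
        return (0.0, 0.0)
    remaining = n_files - 1
    low = FIRST_FILE_RANGE[0] + remaining * PER_FILE_RANGE[0]
    high = FIRST_FILE_RANGE[1] + remaining * PER_FILE_RANGE[1]
    return (low, high)


if __name__ == "__main__":
    low, high = estimate_batch_seconds(100)
    print(f"100 files: roughly {low:.0f}-{high:.0f} s")
```

Under these assumptions, 100 files come to roughly 129-357 s, compared with 30-60 s for a single file. The one-time warm-up is therefore amortized as the batch grows.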
````diff
@@ -121,9 +139,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
 | `--flat` | Disable directory structure preservation |
 | `--overwrite` | Overwrite existing output files |
 | `-q, --quiet` | Suppress progress messages |
-| `-m, --mask` |
+| `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+| `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+| `--port PORT` | Container port (default: 5001) |
 | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
-| `--image IMAGE` | Custom container image (default: ghcr.io/
+| `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
 | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
 | `--check-update` | Check for available updates and exit |
 | `--version` | Show version and exit |
````
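The updated options table keeps `-m, --mask` but marks it deprecated, and it adds `--gpu` and `--port`. The sketch below shows one way a CLI can keep accepting a deprecated flag while ignoring it. It uses Python's standard `argparse`. It is not mdify's actual `cli.py`, whose contents are not shown here. `build_parser` and `parse_args` are hypothetical names, and the parser covers only the options that changed.

```python
import argparse
import warnings
from typing import List, Optional

# Default image as listed in the options table.
DEFAULT_IMAGE = "ghcr.io/docling-project/docling-serve-cpu:main"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the options changed in this release."""
    parser = argparse.ArgumentParser(prog="mdify")
    parser.add_argument("inputs", nargs="*", help="files or directories to convert")
    parser.add_argument("-m", "--mask", action="store_true",
                        help="deprecated: PII masking is not supported and is ignored")
    parser.add_argument("--gpu", action="store_true",
                        help="use the GPU-accelerated container")
    parser.add_argument("--port", type=int, default=5001,
                        help="container port (default: 5001)")
    parser.add_argument("--image", default=DEFAULT_IMAGE,
                        help="custom container image")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.mask:
        # Accept the old flag so existing scripts keep working, but warn that it has no effect.
        warnings.warn("--mask is deprecated and will be ignored", FutureWarning, stacklevel=2)
        args.mask = False
    return args
```

Keeping `--mask` as an accepted no-op means existing scripts that still pass it do not fail with an "unrecognized arguments" error. `FutureWarning` is used because Python displays it to end users by default. `DeprecationWarning`, by contrast, is hidden outside `__main__`.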
````diff
@@ -177,19 +197,22 @@ The CLI:
 - Pulls the runtime container on first use
 - Mounts files and runs conversions in the container
 
-## Container
+## Container Images
+
+mdify uses official docling-serve containers:
 
-
+**CPU Version** (default):
 ```
-ghcr.io/
+ghcr.io/docling-project/docling-serve-cpu:main
 ```
 
-
-```
-
-docker build -t mdify-runtime .
+**GPU Version** (use with `--gpu` flag):
+```
+ghcr.io/docling-project/docling-serve-cu126:main
 ```
 
+These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
 ## Updates
 
 mdify checks for updates daily. When a new version is available:
````
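With two published images, the CLI has to choose one per run. The sketch below shows that choice using the image references from the "Container Images" section. `select_image` is a hypothetical helper. Its precedence is an assumption, not documented mdify behavior: an explicit `--image` wins, then `--gpu` selects the GPU image, and otherwise the CPU image is the default.

```python
from typing import Optional

# Image references listed in the "Container Images" section.
CPU_IMAGE = "ghcr.io/docling-project/docling-serve-cpu:main"
GPU_IMAGE = "ghcr.io/docling-project/docling-serve-cu126:main"


def select_image(gpu: bool = False, custom_image: Optional[str] = None) -> str:
    """Pick the container image to run (hypothetical precedence, see lead-in)."""
    if custom_image:
        # An explicit --image overrides both defaults.
        return custom_image
    return GPU_IMAGE if gpu else CPU_IMAGE
```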
````diff
@@ -70,15 +70,32 @@ Recursively convert files:
 mdify /path/to/documents -r -g "*.pdf"
 ```
 
-###
+### GPU Acceleration
 
-
+For faster processing with NVIDIA GPU:
 ```bash
-mdify
-mdify document.pdf --mask
+mdify --gpu documents/*.pdf
 ```
 
-
+Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+### ⚠️ PII Masking (Deprecated)
+
+The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+## Performance
+
+mdify now uses docling-serve for significantly faster batch processing:
+
+- **Single model load**: Models are loaded once per session, not per file
+- **~10-20x speedup** for multiple file conversions compared to previous versions
+- **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+### First Run Behavior
+
+The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
 ## Options
 
````
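The GPU section depends on nvidia-container-toolkit, and docker and podman expose GPUs to a container in different ways. The sketch below builds a detached `run` command for a docling-serve container. `build_run_command` is a hypothetical helper and not mdify's code. It assumes the server listens on port 5001 inside the container and maps the chosen host port onto it. The GPU flags are the standard ones for each runtime: `--gpus all` for docker, and a CDI device for podman.

```python
from typing import List


def build_run_command(runtime: str, image: str, port: int = 5001, gpu: bool = False) -> List[str]:
    """Assemble a detached `run` command for a docling-serve container (hypothetical sketch)."""
    if runtime not in ("docker", "podman"):
        raise ValueError(f"unsupported runtime: {runtime}")
    # Assumption: the server listens on 5001 inside the container; `port` is the host side.
    cmd = [runtime, "run", "--rm", "-d", "-p", f"{port}:5001"]
    if gpu:
        if runtime == "docker":
            # Docker's GPU flag; needs nvidia-container-toolkit installed on the host.
            cmd += ["--gpus", "all"]
        else:
            # Podman exposes NVIDIA GPUs through CDI device names.
            cmd += ["--device", "nvidia.com/gpu=all"]
    cmd.append(image)
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with image names and paths.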
````diff
@@ -91,9 +108,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
 | `--flat` | Disable directory structure preservation |
 | `--overwrite` | Overwrite existing output files |
 | `-q, --quiet` | Suppress progress messages |
-| `-m, --mask` |
+| `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+| `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+| `--port PORT` | Container port (default: 5001) |
 | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
-| `--image IMAGE` | Custom container image (default: ghcr.io/
+| `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
 | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
 | `--check-update` | Check for available updates and exit |
 | `--version` | Show version and exit |
````
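The `--pull POLICY` option accepts `always`, `missing` and `never`. The sketch below shows a minimal reading of those three values. `should_pull` and `image_exists_locally` are hypothetical helpers, not mdify functions. Treating `never` with no local image as an error is an assumption.

```python
import subprocess


def image_exists_locally(runtime: str, image: str) -> bool:
    # `<runtime> image inspect` exits non-zero when the image is not present locally
    # (works for both docker and podman; raises FileNotFoundError if the runtime is missing).
    result = subprocess.run(
        [runtime, "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def should_pull(policy: str, image_present: bool) -> bool:
    """Decide whether to pull the image under an always/missing/never policy."""
    if policy == "always":
        return True
    if policy == "missing":
        return not image_present
    if policy == "never":
        if not image_present:
            raise RuntimeError("image not available locally and pull policy is 'never'")
        return False
    raise ValueError(f"unknown pull policy: {policy!r}")
```

Keeping the decision in a pure function separate from the runtime call makes the policy logic testable without docker or podman installed.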
````diff
@@ -147,19 +166,22 @@ The CLI:
 - Pulls the runtime container on first use
 - Mounts files and runs conversions in the container
 
-## Container
+## Container Images
+
+mdify uses official docling-serve containers:
 
-
+**CPU Version** (default):
 ```
-ghcr.io/
+ghcr.io/docling-project/docling-serve-cpu:main
 ```
 
-
-```
-
-docker build -t mdify-runtime .
+**GPU Version** (use with `--gpu` flag):
+```
+ghcr.io/docling-project/docling-serve-cu126:main
 ```
 
+These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
 ## Updates
 
 mdify checks for updates daily. When a new version is available:
````
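The Updates section says mdify checks for updates daily. A daily check needs two small pieces: a throttle so the check runs at most once per day, and a version comparison. The sketch below is hypothetical and does not show how mdify stores or fetches anything. It compares versions numerically so that, for example, `2.10.0` ranks above `2.5.0`. It handles only plain `X.Y.Z` strings. For pre-releases or local versions, `packaging.version.Version` is the more robust choice.

```python
import time
from typing import Optional, Tuple

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # "daily"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain release string like '2.5.0' into (2, 5, 0); pre-release tags are not handled."""
    return tuple(int(part) for part in version.split("."))


def is_check_due(last_check: Optional[float], now: Optional[float] = None) -> bool:
    """True if no check has been recorded or the last one is at least a day old."""
    if last_check is None:
        return True
    current = time.time() if now is None else now
    return current - last_check >= CHECK_INTERVAL_SECONDS


def update_available(current: str, latest: str) -> bool:
    # Tuple comparison is numeric per component, unlike plain string comparison.
    return parse_version(latest) > parse_version(current)
```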