mdify-cli 1.5.0__tar.gz → 2.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {mdify_cli-1.5.0/mdify_cli.egg-info → mdify_cli-2.0.0}/PKG-INFO +40 -15
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/README.md +36 -14
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify/__init__.py +1 -1
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify/cli.py +251 -204
- mdify_cli-2.0.0/mdify/container.py +128 -0
- mdify_cli-2.0.0/mdify/docling_client.py +224 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0/mdify_cli.egg-info}/PKG-INFO +40 -15
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify_cli.egg-info/SOURCES.txt +7 -1
- mdify_cli-2.0.0/mdify_cli.egg-info/requires.txt +4 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/pyproject.toml +5 -2
- mdify_cli-2.0.0/tests/test_cli.py +137 -0
- mdify_cli-2.0.0/tests/test_container.py +317 -0
- mdify_cli-2.0.0/tests/test_docling_client.py +358 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/LICENSE +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/assets/mdify.png +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify/__main__.py +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify_cli.egg-info/dependency_links.txt +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify_cli.egg-info/entry_points.txt +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/mdify_cli.egg-info/top_level.txt +0 -0
- {mdify_cli-1.5.0 → mdify_cli-2.0.0}/setup.cfg +0 -0
````diff
--- mdify_cli-1.5.0/mdify_cli.egg-info/PKG-INFO
+++ mdify_cli-2.0.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mdify-cli
-Version:
+Version: 2.0.0
 Summary: Convert PDFs and document images into structured Markdown for LLM workflows
 Author: tiroq
 License-Expression: MIT
````
````diff
@@ -24,6 +24,9 @@ Classifier: Topic :: Utilities
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: requests
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0; extra == "dev"
 Dynamic: license-file
 
 # mdify
````
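The three new lines above are core-metadata fields in the same `Key: value` header format as the rest of PKG-INFO, so Python's standard `email` parser can read them. Below is a minimal sketch, using a trimmed copy of the 2.0.0 fields shown in this diff, that separates the unconditional runtime dependency (`requests`) from the one gated on the `dev` extra (`pytest>=7.0`). `split_requirements` is a hypothetical helper, and splitting on `;` only detects that an environment marker is present; it does not evaluate it.

```python
from email.parser import Parser
from typing import List, Tuple

# Trimmed copy of the 2.0.0 PKG-INFO header fields from the hunks above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: mdify-cli
Version: 2.0.0
Requires-Python: >=3.8
Requires-Dist: requests
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
"""


def split_requirements(pkg_info: str) -> Tuple[List[str], List[str]]:
    """Split Requires-Dist entries into (unconditional, marker-gated) lists.

    Simplification: any entry with an environment marker after ';' counts as
    conditional; the marker itself is not evaluated.
    """
    msg = Parser().parsestr(pkg_info, headersonly=True)
    always: List[str] = []
    gated: List[str] = []
    for req in msg.get_all("Requires-Dist") or []:
        spec, _, marker = req.partition(";")
        (gated if marker.strip() else always).append(spec.strip())
    return always, gated


meta = Parser().parsestr(PKG_INFO, headersonly=True)
print(meta["Version"], meta.get_all("Provides-Extra"))  # 2.0.0 ['dev']
print(split_requirements(PKG_INFO))  # (['requests'], ['pytest>=7.0'])
```

In practice, `pip install mdify-cli` pulls in `requests`, while `pytest` is installed only with `pip install "mdify-cli[dev]"`.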
````diff
@@ -98,15 +101,32 @@ Recursively convert files:
 mdify /path/to/documents -r -g "*.pdf"
 ```
 
-###
+### GPU Acceleration
 
-
+For faster processing with NVIDIA GPU:
 ```bash
-mdify
-mdify document.pdf --mask
+mdify --gpu documents/*.pdf
 ```
 
-
+Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+### ⚠️ PII Masking (Deprecated)
+
+The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+## Performance
+
+mdify now uses docling-serve for significantly faster batch processing:
+
+- **Single model load**: Models are loaded once per session, not per file
+- **~10-20x speedup** for multiple file conversions compared to previous versions
+- **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+### First Run Behavior
+
+The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
 ## Options
 
````
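The First Run Behavior note implies that a client must wait for the container to finish loading models before sending the first file. A minimal standard-library polling sketch follows. It assumes the server answers `GET /health` with HTTP 200 once ready and is reachable on the default port 5001 from the options table; the `/health` path and the `wait_until_ready` name are assumptions for illustration, not taken from mdify's source.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:5001",
                     timeout: float = 120.0,
                     interval: float = 1.0) -> bool:
    """Poll an assumed /health endpoint until it answers HTTP 200.

    Returns True as soon as the server responds with 200, or False once
    `timeout` seconds have elapsed. The default timeout leaves headroom for
    the ~30-60 s first-run model load.
    """
    url = base_url.rstrip("/") + "/health"  # assumption: health-check path
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and non-2xx replies
            # (HTTPError is an OSError subclass) all mean "not ready yet".
            pass
        time.sleep(interval)
    return False
```

A caller would start the container, call `wait_until_ready()` once, and then send every file in the batch, which is where the once-per-session model load pays off.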
````diff
@@ -119,9 +139,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
 | `--flat` | Disable directory structure preservation |
 | `--overwrite` | Overwrite existing output files |
 | `-q, --quiet` | Suppress progress messages |
-| `-m, --mask` |
+| `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+| `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+| `--port PORT` | Container port (default: 5001) |
 | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
-| `--image IMAGE` | Custom container image (default: ghcr.io/
+| `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
 | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
 | `--check-update` | Check for available updates and exit |
 | `--version` | Show version and exit |
````
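The new table rows translate directly into `argparse` declarations. The sketch below is a hypothetical parser limited to the options touched by this hunk and is not mdify's actual `cli.py`. It shows one way to keep `-m/--mask` accepted for backward compatibility while warning that it is ignored, and it restricts `--runtime` and `--pull` to the values the table lists. Leaving `--image` unset by default, rather than hard-coding the CPU image, is a design choice that lets `--gpu` decide the image after parsing.

```python
import argparse
import sys
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the options added or changed in 2.0.0."""
    p = argparse.ArgumentParser(prog="mdify")
    p.add_argument("inputs", nargs="*", help="Files or directories to convert")
    # Kept so existing scripts do not break, but it no longer does anything.
    p.add_argument("-m", "--mask", action="store_true",
                   help="Deprecated: PII masking is not supported")
    p.add_argument("--gpu", action="store_true",
                   help="Use the GPU-accelerated container")
    p.add_argument("--port", type=int, default=5001, help="Container port")
    p.add_argument("--runtime", choices=["docker", "podman"],
                   help="Container runtime (auto-detected when omitted)")
    # None here means "pick the CPU or GPU image later, based on --gpu".
    p.add_argument("--image", help="Custom container image")
    p.add_argument("--pull", choices=["always", "missing", "never"],
                   default="missing", help="Image pull policy")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and warn (without failing) if --mask was given."""
    args = build_parser().parse_args(argv)
    if args.mask:
        print("warning: --mask is deprecated and will be ignored", file=sys.stderr)
    return args
```

For example, `parse(["--gpu", "--port", "8080", "report.pdf"])` yields `gpu=True`, `port=8080`, and `pull="missing"`.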
````diff
@@ -175,19 +197,22 @@ The CLI:
 - Pulls the runtime container on first use
 - Mounts files and runs conversions in the container
 
-## Container
+## Container Images
+
+mdify uses official docling-serve containers:
 
-
+**CPU Version** (default):
 ```
-ghcr.io/
+ghcr.io/docling-project/docling-serve-cpu:main
 ```
 
-
-```
-
-docker build -t mdify-runtime .
+**GPU Version** (use with `--gpu` flag):
+```
+ghcr.io/docling-project/docling-serve-cu126:main
 ```
 
+These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
 ## Updates
 
 mdify checks for updates daily. When a new version is available:
````
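The CPU/GPU split in the Container Images section reduces to a small selection rule: an explicit `--image` wins, otherwise `--gpu` selects the `cu126` image and the CPU image is the default. The sketch pairs that rule with a `docker run` argument list. The helper names, the container-side port 5001, and Docker's `--gpus all` flag (which relies on nvidia-container-toolkit) are assumptions for illustration; the sketch covers Docker only, since Podman exposes GPUs through a different mechanism.

```python
from typing import List, Optional

CPU_IMAGE = "ghcr.io/docling-project/docling-serve-cpu:main"
GPU_IMAGE = "ghcr.io/docling-project/docling-serve-cu126:main"
CONTAINER_PORT = 5001  # assumption: port the server listens on inside the container


def select_image(gpu: bool, image: Optional[str] = None) -> str:
    """An explicit --image override wins; otherwise choose by --gpu."""
    if image:
        return image
    return GPU_IMAGE if gpu else CPU_IMAGE


def docker_run_command(image: str, host_port: int = 5001, gpu: bool = False,
                       pull: str = "missing") -> List[str]:
    """Build a detached `docker run` argv that publishes host_port."""
    cmd = ["docker", "run", "--rm", "-d",
           f"--pull={pull}",  # always | missing | never
           "-p", f"{host_port}:{CONTAINER_PORT}"]
    if gpu:
        cmd += ["--gpus", "all"]  # needs NVIDIA driver + nvidia-container-toolkit
    cmd.append(image)
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; `--rm` removes the container when it stops.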
````diff
--- mdify_cli-1.5.0/README.md
+++ mdify_cli-2.0.0/README.md
@@ -70,15 +70,32 @@ Recursively convert files:
 mdify /path/to/documents -r -g "*.pdf"
 ```
 
-###
+### GPU Acceleration
 
-
+For faster processing with NVIDIA GPU:
 ```bash
-mdify
-mdify document.pdf --mask
+mdify --gpu documents/*.pdf
 ```
 
-
+Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+### ⚠️ PII Masking (Deprecated)
+
+The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+## Performance
+
+mdify now uses docling-serve for significantly faster batch processing:
+
+- **Single model load**: Models are loaded once per session, not per file
+- **~10-20x speedup** for multiple file conversions compared to previous versions
+- **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+### First Run Behavior
+
+The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
 ## Options
 
````
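The context line `mdify /path/to/documents -r -g "*.pdf"` combines recursion with a glob, and the options table describes `--flat` as disabling directory-structure preservation. Below is a minimal `pathlib` sketch of how matching inputs could be mapped to Markdown outputs under those flags; `plan_outputs`, the `out_dir` argument, and the `.md` naming are assumptions about the output layout, not mdify's documented behavior.

```python
from pathlib import Path
from typing import Dict


def plan_outputs(src: Path, pattern: str, out_dir: Path,
                 recursive: bool = True, flat: bool = False) -> Dict[Path, Path]:
    """Map each matching input file to a hypothetical .md output path.

    recursive mirrors `-r`, pattern mirrors `-g`, flat mirrors `--flat`.
    """
    files = src.rglob(pattern) if recursive else src.glob(pattern)
    plan: Dict[Path, Path] = {}
    for f in sorted(p for p in files if p.is_file()):
        if flat:
            # Everything lands in out_dir; same-named files would collide.
            target = out_dir / (f.stem + ".md")
        else:
            # Mirror the input tree below out_dir.
            target = out_dir / f.relative_to(src).with_suffix(".md")
        plan[f] = target
    return plan
```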
````diff
@@ -91,9 +108,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
 | `--flat` | Disable directory structure preservation |
 | `--overwrite` | Overwrite existing output files |
 | `-q, --quiet` | Suppress progress messages |
-| `-m, --mask` |
+| `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+| `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+| `--port PORT` | Container port (default: 5001) |
 | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
-| `--image IMAGE` | Custom container image (default: ghcr.io/
+| `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
 | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
 | `--check-update` | Check for available updates and exit |
 | `--version` | Show version and exit |
````
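The `--runtime` row says Docker or Podman is auto-detected. One plausible approach is to look for each executable on `PATH` with `shutil.which`. The Docker-first preference order and the `detect_runtime` helper are assumptions; the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Optional

RUNTIMES = ("docker", "podman")  # assumed preference order


def detect_runtime(explicit: Optional[str] = None,
                   which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the container runtime to use, or None if none is installed.

    An explicit --runtime value is honored only if its binary is on PATH.
    """
    candidates = (explicit,) if explicit else RUNTIMES
    for name in candidates:
        if which(name):
            return name
    return None
```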
````diff
@@ -147,19 +166,22 @@ The CLI:
 - Pulls the runtime container on first use
 - Mounts files and runs conversions in the container
 
-## Container
+## Container Images
+
+mdify uses official docling-serve containers:
 
-
+**CPU Version** (default):
 ```
-ghcr.io/
+ghcr.io/docling-project/docling-serve-cpu:main
 ```
 
-
-```
-
-docker build -t mdify-runtime .
+**GPU Version** (use with `--gpu` flag):
+```
+ghcr.io/docling-project/docling-serve-cu126:main
 ```
 
+These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
 ## Updates
 
 mdify checks for updates daily. When a new version is available:
````
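The Updates section says mdify checks for updates daily. A common way to get that cadence is to record when the last check ran and skip the network request until 24 hours have passed. This sketch stores the timestamp in a JSON file whose location is left to the caller; the helper names and file format are hypothetical, and the actual release lookup is omitted.

```python
import json
import time
from pathlib import Path
from typing import Optional

ONE_DAY = 24 * 60 * 60  # seconds


def should_check(state_file: Path, now: Optional[float] = None) -> bool:
    """Return True if no update check was recorded in the last 24 hours."""
    now = time.time() if now is None else now
    try:
        last = float(json.loads(state_file.read_text())["last_check"])
    except (OSError, ValueError, KeyError, TypeError):
        return True  # missing or corrupt state file: check now
    return now - last >= ONE_DAY


def record_check(state_file: Path, now: Optional[float] = None) -> None:
    """Persist the time of the latest update check."""
    stamp = time.time() if now is None else now
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"last_check": stamp}))
```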