mdify-cli 1.6.0__tar.gz → 2.5.0__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: mdify-cli
- Version: 1.6.0
+ Version: 2.5.0
  Summary: Convert PDFs and document images into structured Markdown for LLM workflows
  Author: tiroq
  License-Expression: MIT
@@ -24,6 +24,7 @@ Classifier: Topic :: Utilities
  Requires-Python: >=3.8
  Description-Content-Type: text/markdown
  License-File: LICENSE
+ Requires-Dist: requests
  Provides-Extra: dev
  Requires-Dist: pytest>=7.0; extra == "dev"
  Dynamic: license-file
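Version 2.5.0 adds `requests` as a runtime dependency. That fits the switch to a docling-serve container, which the CLI reaches over HTTP. The sketch below shows what such a client call might look like. It is not mdify's code.

Several details are assumptions about the docling-serve API and may differ between server versions:

- the endpoint path `/v1/convert/file`
- the multipart field name `files`
- the `to_formats` form option
- the `document.md_content` response key

`build_url`, `extract_markdown` and `convert_file` are hypothetical names.

```python
"""Hypothetical sketch: converting one file through a docling-serve HTTP API.

Assumptions (not taken from mdify's source): the endpoint path, the multipart
field name "files", the "to_formats" option, and the response shape
{"document": {"md_content": ...}}.
"""
from pathlib import Path
from typing import Any, Dict

DEFAULT_ENDPOINT = "/v1/convert/file"  # assumed docling-serve path; may differ by version


def build_url(port: int, endpoint: str = DEFAULT_ENDPOINT, host: str = "localhost") -> str:
    # The container is reached on the host port chosen with --port (default 5001).
    return f"http://{host}:{port}/{endpoint.lstrip('/')}"


def extract_markdown(payload: Dict[str, Any]) -> str:
    # Pull the Markdown text out of the (assumed) JSON response shape.
    document = payload.get("document") or {}
    md = document.get("md_content")
    if md is None:
        raise ValueError("response did not contain document.md_content")
    return md


def convert_file(path: Path, port: int = 5001, timeout: float = 300.0) -> str:
    # Imported lazily so the pure helpers above work without requests installed.
    import requests

    with path.open("rb") as fh:
        response = requests.post(
            build_url(port),
            files={"files": (path.name, fh)},
            data={"to_formats": "md"},
            timeout=timeout,
        )
    response.raise_for_status()
    return extract_markdown(response.json())
```

Only `convert_file` touches the network. URL construction and response parsing are kept separate so they can be checked without a running server.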
@@ -100,15 +101,32 @@ Recursively convert files:
  mdify /path/to/documents -r -g "*.pdf"
  ```
 
- ### Masking sensitive content
+ ### GPU Acceleration
 
- Mask PII and sensitive content in images:
+ For faster processing with NVIDIA GPU:
  ```bash
- mdify document.pdf -m
- mdify document.pdf --mask
+ mdify --gpu documents/*.pdf
  ```
 
- This uses Docling's content-aware masking to obscure sensitive information in embedded images.
+ Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+ ### ⚠️ PII Masking (Deprecated)
+
+ The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+ If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+ ## Performance
+
+ mdify now uses docling-serve for significantly faster batch processing:
+
+ - **Single model load**: Models are loaded once per session, not per file
+ - **~10-20x speedup** for multiple file conversions compared to previous versions
+ - **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+ ### First Run Behavior
+
+ The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
  ## Options
 
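Because of the first-run delay described in this hunk, a client has to wait for the container to become ready before sending files. Below is a minimal polling sketch.

It assumes the server exposes a health endpoint at `/health`. That path, and the helper names `wait_until_ready` and `http_health_check`, are assumptions rather than mdify's implementation. The clock and sleep functions are injectable so the loop can be exercised without a container.

```python
"""Hypothetical sketch: waiting for a freshly started docling-serve container."""
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_ready() until it returns True and return the seconds waited.

    Raises TimeoutError if the deadline passes first.
    """
    start = clock()
    deadline = start + timeout
    while True:
        if is_ready():
            return clock() - start
        if clock() >= deadline:
            raise TimeoutError(f"server not ready after {timeout:.0f}s")
        sleep(interval)


def http_health_check(port: int = 5001, path: str = "/health") -> Callable[[], bool]:
    # Assumed health path; any connection error simply means "not ready yet".
    def check() -> bool:
        import requests

        try:
            return requests.get(f"http://localhost:{port}{path}", timeout=5).ok
        except requests.RequestException:
            return False

    return check
```

The 120-second default timeout is an illustrative choice that leaves headroom over the quoted 30-60 s model load. Typical use would be `wait_until_ready(http_health_check(5001))`.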
@@ -121,9 +139,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
  | `--flat` | Disable directory structure preservation |
  | `--overwrite` | Overwrite existing output files |
  | `-q, --quiet` | Suppress progress messages |
- | `-m, --mask` | Mask PII and sensitive content in images |
+ | `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+ | `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+ | `--port PORT` | Container port (default: 5001) |
  | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
- | `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
+ | `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
  | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
  | `--check-update` | Check for available updates and exit |
  | `--version` | Show version and exit |
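The option changes in this table can be expressed with `argparse` roughly as follows: `-m/--mask` is deprecated but still accepted, and `--gpu` and `--port` are new. This is a hypothetical parser, not mdify's own. `parse_options`, the positional `inputs` argument and the stderr notice are illustrative. Accepting `--mask` and ignoring it keeps old command lines working instead of failing with an unrecognized-argument error.

```python
"""Hypothetical argparse sketch of the 2.5.0 flags (not mdify's actual parser)."""
import argparse
import sys
from typing import List, Optional

CPU_IMAGE = "ghcr.io/docling-project/docling-serve-cpu:main"


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="mdify")
    parser.add_argument("inputs", nargs="*", help="files or directories to convert")
    # Still accepted so existing scripts keep working, but it has no effect.
    parser.add_argument("-m", "--mask", action="store_true",
                        help="deprecated: PII masking is not supported")
    parser.add_argument("--gpu", action="store_true",
                        help="use the GPU-accelerated container")
    parser.add_argument("--port", type=int, default=5001,
                        help="container port (default: 5001)")
    # None means "choose the CPU or GPU image based on --gpu".
    parser.add_argument("--image", default=None,
                        help=f"custom container image (default: {CPU_IMAGE})")
    args = parser.parse_args(argv)
    if args.mask:
        print("warning: --mask is deprecated and will be ignored", file=sys.stderr)
    return args
```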
@@ -177,19 +197,22 @@ The CLI:
  - Pulls the runtime container on first use
  - Mounts files and runs conversions in the container
 
- ## Container Image
+ ## Container Images
+
+ mdify uses official docling-serve containers:
 
- The runtime container is hosted at:
+ **CPU Version** (default):
  ```
- ghcr.io/tiroq/mdify-runtime:latest
+ ghcr.io/docling-project/docling-serve-cpu:main
  ```
 
- To build locally:
- ```bash
- cd runtime
- docker build -t mdify-runtime .
+ **GPU Version** (use with `--gpu` flag):
+ ```
+ ghcr.io/docling-project/docling-serve-cu126:main
  ```
 
+ These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
  ## Updates
 
  mdify checks for updates daily. When a new version is available:
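The sketch below picks between the two images listed in this hunk and turns the `--gpu` and `--port` options into a container command. The image names come from this diff. Everything else is an assumption:

- docling-serve listens on port 5001 inside the container.
- Docker exposes the GPU via `--gpus all`, which needs nvidia-container-toolkit.
- Podman does so through CDI with `--device nvidia.com/gpu=all`.

`select_image` and `build_run_command` are hypothetical helpers, and the sketch omits any volume mounts mdify may also set up.

```python
"""Hypothetical sketch: choosing a docling-serve image and building a run command."""
from typing import List, Optional

CPU_IMAGE = "ghcr.io/docling-project/docling-serve-cpu:main"
GPU_IMAGE = "ghcr.io/docling-project/docling-serve-cu126:main"
CONTAINER_PORT = 5001  # assumed docling-serve listening port inside the container


def select_image(gpu: bool, custom: Optional[str] = None) -> str:
    # An explicit --image wins; otherwise --gpu switches to the CUDA build.
    if custom:
        return custom
    return GPU_IMAGE if gpu else CPU_IMAGE


def build_run_command(runtime: str, image: str, host_port: int = 5001, gpu: bool = False) -> List[str]:
    # Detached, auto-removed container with the host port mapped to the server port.
    cmd = [runtime, "run", "--rm", "-d", "-p", f"{host_port}:{CONTAINER_PORT}"]
    if gpu:
        if runtime == "podman":
            cmd += ["--device", "nvidia.com/gpu=all"]  # CDI device name (assumes nvidia-ctk setup)
        else:
            cmd += ["--gpus", "all"]  # Docker with nvidia-container-toolkit
    cmd.append(image)
    return cmd
```

Returning an argument list rather than a shell string lets the result go straight to `subprocess.run` without quoting issues.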
@@ -70,15 +70,32 @@ Recursively convert files:
  mdify /path/to/documents -r -g "*.pdf"
  ```
 
- ### Masking sensitive content
+ ### GPU Acceleration
 
- Mask PII and sensitive content in images:
+ For faster processing with NVIDIA GPU:
  ```bash
- mdify document.pdf -m
- mdify document.pdf --mask
+ mdify --gpu documents/*.pdf
  ```
 
- This uses Docling's content-aware masking to obscure sensitive information in embedded images.
+ Requires NVIDIA GPU with CUDA support and nvidia-container-toolkit.
+
+ ### ⚠️ PII Masking (Deprecated)
+
+ The `--mask` flag is deprecated and will be ignored in this version. PII masking functionality was available in older versions using a custom runtime but is not supported with the current docling-serve backend.
+
+ If PII masking is critical for your use case, please use mdify v1.5.x or earlier versions.
+
+ ## Performance
+
+ mdify now uses docling-serve for significantly faster batch processing:
+
+ - **Single model load**: Models are loaded once per session, not per file
+ - **~10-20x speedup** for multiple file conversions compared to previous versions
+ - **GPU acceleration**: Use `--gpu` for additional 2-6x speedup (requires NVIDIA GPU)
+
+ ### First Run Behavior
+
+ The first conversion takes longer (~30-60s) as the container loads ML models into memory. Subsequent files in the same batch process quickly, typically in 1-3 seconds per file.
 
  ## Options
 
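A simple cost model makes the "single model load" point in the Performance section easier to see. It assumes older versions paid the model-load cost once per file, while 2.x pays it once per session. Under that assumption, a batch of n files costs n * (load + convert) before and load + n * convert after. The inputs used below (45 s to load, 2 s per file) are illustrative picks from the ranges quoted in this hunk, not measurements.

```python
"""Back-of-envelope cost model for the "single model load" claim (illustrative only)."""


def batch_seconds(n_files: int, load_s: float, per_file_s: float, load_per_file: bool) -> float:
    # Old behaviour (assumed): every file pays the model-load cost.
    if load_per_file:
        return n_files * (load_s + per_file_s)
    # New behaviour: one load per session, then per-file conversion only.
    return load_s + n_files * per_file_s


def speedup(n_files: int, load_s: float, per_file_s: float) -> float:
    old = batch_seconds(n_files, load_s, per_file_s, load_per_file=True)
    new = batch_seconds(n_files, load_s, per_file_s, load_per_file=False)
    return old / new
```

With those inputs, `speedup(50, 45.0, 2.0)` is 2350 / 145, roughly 16x. A single file sees no gain (`speedup(1, 45.0, 2.0) == 1.0`). For large batches the ratio approaches (load + convert) / convert = 23.5.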
@@ -91,9 +108,11 @@ This uses Docling's content-aware masking to obscure sensitive information in em
  | `--flat` | Disable directory structure preservation |
  | `--overwrite` | Overwrite existing output files |
  | `-q, --quiet` | Suppress progress messages |
- | `-m, --mask` | Mask PII and sensitive content in images |
+ | `-m, --mask` | ⚠️ **Deprecated**: PII masking not supported in current version |
+ | `--gpu` | Use GPU-accelerated container (requires NVIDIA GPU and nvidia-container-toolkit) |
+ | `--port PORT` | Container port (default: 5001) |
  | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
- | `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
+ | `--image IMAGE` | Custom container image (default: ghcr.io/docling-project/docling-serve-cpu:main) |
  | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
  | `--check-update` | Check for available updates and exit |
  | `--version` | Show version and exit |
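The unchanged `--pull` option takes `always`, `missing` or `never`. The sketch below gives one plausible reading of those values, mirroring Docker's own `--pull` semantics. `should_pull` and `image_present` are hypothetical helpers. The presence check relies on `docker image inspect` / `podman image inspect` exiting non-zero when the image is not available locally.

```python
"""Hypothetical sketch of the --pull policy: always, missing, never."""
import subprocess


def image_present(runtime: str, image: str) -> bool:
    # `<runtime> image inspect` exits non-zero when the image is absent.
    # Raises FileNotFoundError if the runtime binary itself is not installed.
    result = subprocess.run([runtime, "image", "inspect", image], capture_output=True)
    return result.returncode == 0


def should_pull(policy: str, present: bool) -> bool:
    if policy == "always":
        return True
    if policy == "missing":
        return not present
    if policy == "never":
        # Nothing to pull, but nothing to run either if the image is absent.
        if not present:
            raise RuntimeError("image not available locally and --pull never was given")
        return False
    raise ValueError(f"unknown pull policy: {policy!r}")
```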
@@ -147,19 +166,22 @@ The CLI:
  - Pulls the runtime container on first use
  - Mounts files and runs conversions in the container
 
- ## Container Image
+ ## Container Images
+
+ mdify uses official docling-serve containers:
 
- The runtime container is hosted at:
+ **CPU Version** (default):
  ```
- ghcr.io/tiroq/mdify-runtime:latest
+ ghcr.io/docling-project/docling-serve-cpu:main
  ```
 
- To build locally:
- ```bash
- cd runtime
- docker build -t mdify-runtime .
+ **GPU Version** (use with `--gpu` flag):
+ ```
+ ghcr.io/docling-project/docling-serve-cu126:main
  ```
 
+ These are official images from the [docling-serve project](https://github.com/DS4SD/docling-serve).
+
  ## Updates
 
  mdify checks for updates daily. When a new version is available:
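The `--runtime` option is documented as choosing between docker and podman automatically. One simple way to do that is to look each binary up on `PATH`. The sketch below assumes docker is preferred when both are installed, which may not match mdify's actual order. `detect_runtime` is a hypothetical name, and the `which` parameter exists only so the logic can be tested without either tool installed.

```python
"""Hypothetical sketch of container-runtime auto-detection."""
import shutil
from typing import Callable, Optional, Sequence


def detect_runtime(
    preferred: Optional[str] = None,
    candidates: Sequence[str] = ("docker", "podman"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    # An explicit --runtime must exist on PATH; no silent fallback.
    if preferred:
        if which(preferred) is None:
            raise RuntimeError(f"{preferred} not found on PATH")
        return preferred
    # Otherwise take the first candidate that is installed.
    for name in candidates:
        if which(name) is not None:
            return name
    raise RuntimeError("no container runtime found; install docker or podman")
```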
@@ -1,3 +1,3 @@
  """mdify - Convert documents to Markdown via Docling container."""
 
- __version__ = "1.6.0"
+ __version__ = "2.5.0"
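The README states that mdify checks for updates daily, and this `__version__` bump is the value such a check compares against. Below is a minimal sketch of that comparison. It assumes plain numeric `MAJOR.MINOR.PATCH` strings. `parse_version`, `is_newer` and `due_for_check` are hypothetical names, and `packaging.version.Version` would be the more robust choice once pre-release tags appear. Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" sorts before "2.9.1".

```python
"""Hypothetical sketch of an update check's version comparison and daily throttle."""
from typing import Tuple

DAY_SECONDS = 24 * 60 * 60


def parse_version(text: str) -> Tuple[int, ...]:
    # "v2.5.0" or "2.5.0" -> (2, 5, 0); pre-release suffixes are not handled.
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_newer(latest: str, current: str) -> bool:
    return parse_version(latest) > parse_version(current)


def due_for_check(last_check: float, now: float, interval: float = DAY_SECONDS) -> bool:
    # Timestamps in seconds, e.g. from time.time().
    return now - last_check >= interval
```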