mdify-cli 1.2.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Stranger
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,243 @@
1
+ Metadata-Version: 2.4
2
+ Name: mdify-cli
3
+ Version: 1.2.0
4
+ Summary: Lightweight CLI for converting documents to Markdown via Docling container
5
+ Author: tiroq
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/tiroq/mdify
8
+ Project-URL: Repository, https://github.com/tiroq/mdify
9
+ Project-URL: Issues, https://github.com/tiroq/mdify/issues
10
+ Keywords: markdown,conversion,pdf,docling,cli,document,docker
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Environment :: Console
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: End Users/Desktop
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.8
18
+ Classifier: Programming Language :: Python :: 3.9
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Topic :: Text Processing :: Markup :: Markdown
23
+ Classifier: Topic :: Utilities
24
+ Requires-Python: >=3.8
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Dynamic: license-file
28
+
29
+ # mdify
30
+
31
+ A lightweight CLI for converting documents to Markdown. The CLI is fast to install via pipx, while the heavy ML conversion (Docling) runs inside a container.
32
+
33
+ ## Requirements
34
+
35
+ - **Python 3.8+**
36
+ - **Docker** or **Podman** (for document conversion)
37
+
38
+ ## Installation
39
+
40
+ ### macOS (recommended)
41
+
42
+ ```bash
43
+ brew install pipx
44
+ pipx ensurepath
45
+ pipx install mdify-cli
46
+ ```
47
+
48
+ Restart your terminal after installation.
49
+
50
+ ### Linux
51
+
52
+ ```bash
53
+ python3 -m pip install --user pipx
54
+ pipx ensurepath
55
+ pipx install mdify-cli
56
+ ```
57
+
58
+ ### Install via pip
59
+
60
+ ```bash
61
+ pip install mdify-cli
62
+ ```
63
+
64
+ ### Development install
65
+
66
+ ```bash
67
+ git clone https://github.com/tiroq/mdify.git
68
+ cd mdify
69
+ pip install -e .
70
+ ```
71
+
72
+ ## Usage
73
+
74
+ ### Basic conversion
75
+
76
+ Convert a single file:
77
+ ```bash
78
+ mdify document.pdf
79
+ ```
80
+
81
+ The first run will automatically pull the container image (~2GB) if not present.
82
+
83
+ ### Convert multiple files
84
+
85
+ Convert all PDFs in a directory:
86
+ ```bash
87
+ mdify /path/to/documents -g "*.pdf"
88
+ ```
89
+
90
+ Recursively convert files:
91
+ ```bash
92
+ mdify /path/to/documents -r -g "*.pdf"
93
+ ```
94
+
95
+ ### Masking sensitive content
96
+
97
+ Mask PII and sensitive content in images:
98
+ ```bash
99
+ mdify document.pdf -m
100
+ mdify document.pdf --mask
101
+ ```
102
+
103
+ This uses Docling's content-aware masking to obscure sensitive information in embedded images.
104
+
105
+ ## Options
106
+
107
+ | Option | Description |
108
+ |--------|-------------|
109
+ | `input` | Input file or directory to convert (required) |
110
+ | `-o, --out-dir DIR` | Output directory for converted files (default: output) |
111
+ | `-g, --glob PATTERN` | Glob pattern for filtering files (default: *) |
112
+ | `-r, --recursive` | Recursively scan directories |
113
+ | `--flat` | Write all output files directly into the output directory instead of preserving the input directory structure |
114
+ | `--overwrite` | Overwrite existing output files |
115
+ | `-q, --quiet` | Suppress progress messages |
116
+ | `-m, --mask` | Mask PII and sensitive content in images |
117
+ | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
118
+ | `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
119
+ | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
120
+ | `--check-update` | Check for available updates and exit |
121
+ | `--version` | Show version and exit |
122
+
123
+ ### Flat Mode
124
+
125
+ With `--flat`, all output files are placed directly in the output directory. Directory paths are incorporated into filenames to prevent collisions:
126
+
127
+ - `docs/subdir1/file.pdf` → `output/subdir1_file.md`
128
+ - `docs/subdir2/file.pdf` → `output/subdir2_file.md`
129
+
130
+ ## Examples
131
+
132
+ Convert all PDFs recursively, preserving structure:
133
+ ```bash
134
+ mdify documents/ -r -g "*.pdf" -o markdown_output
135
+ ```
136
+
137
+ Convert with Podman instead of Docker:
138
+ ```bash
139
+ mdify document.pdf --runtime podman
140
+ ```
141
+
142
+ Use a custom/local container image:
143
+ ```bash
144
+ mdify document.pdf --image my-custom-image:latest
145
+ ```
146
+
147
+ Force pull latest container image:
148
+ ```bash
149
+ mdify document.pdf --pull always
150
+ ```
151
+
152
+ ## Architecture
153
+
154
+ ```
155
+ ┌──────────────────┐ ┌─────────────────────────────────┐
156
+ │ mdify CLI │ │ Container (Docker/Podman) │
157
+ │ (lightweight) │────▶│ ┌───────────────────────────┐ │
158
+ │ │ │ │ Docling + ML Models │ │
159
+ │ - File handling │◀────│ │ - PDF parsing │ │
160
+ │ - Container │ │ │ - OCR (Tesseract) │ │
161
+ │ orchestration │ │ │ - Document conversion │ │
162
+ └──────────────────┘ │ └───────────────────────────┘ │
163
+ └─────────────────────────────────┘
164
+ ```
165
+
166
+ The CLI:
167
+ - Installs in seconds via pipx (no ML dependencies)
168
+ - Automatically detects Docker or Podman
169
+ - Pulls the runtime container on first use
170
+ - Mounts files and runs conversions in the container
171
+
172
+ ## Container Image
173
+
174
+ The runtime container is hosted at:
175
+ ```
176
+ ghcr.io/tiroq/mdify-runtime:latest
177
+ ```
178
+
179
+ To build locally:
180
+ ```bash
181
+ cd runtime
182
+ docker build -t mdify-runtime .
183
+ ```
184
+
185
+ ## Updates
186
+
187
+ mdify checks for updates daily. When a new version is available:
188
+
189
+ ```
190
+ ==================================================
191
+ A new version of mdify is available!
192
+ Current version: 0.3.0
193
+ Latest version: 0.4.0
194
+ ==================================================
195
+
196
+ Run upgrade now? [y/N]
197
+ ```
198
+
199
+ ### Disable update checks
200
+
201
+ ```bash
202
+ export MDIFY_NO_UPDATE_CHECK=1
203
+ ```
204
+
205
+ ## Uninstall
206
+
207
+ ```bash
208
+ pipx uninstall mdify-cli
209
+ ```
210
+
211
+ Or if installed via pip:
212
+
213
+ ```bash
214
+ pip uninstall mdify-cli
215
+ ```
216
+
217
+ ## Development
218
+
219
+ ### Task automation
220
+
221
+ This project uses [Task](https://taskfile.dev) for automation:
222
+
223
+ ```bash
224
+ # Show available tasks
225
+ task
226
+
227
+ # Build package
228
+ task build
229
+
230
+ # Build container locally
231
+ task container-build
232
+
233
+ # Release workflow
234
+ task release-patch
235
+ ```
236
+
237
+ ### Building for PyPI
238
+
239
+ See [PUBLISHING.md](PUBLISHING.md) for complete publishing instructions.
240
+
241
+ ## License
242
+
243
+ MIT
@@ -0,0 +1,215 @@
1
+ # mdify
2
+
3
+ A lightweight CLI for converting documents to Markdown. The CLI is fast to install via pipx, while the heavy ML conversion (Docling) runs inside a container.
4
+
5
+ ## Requirements
6
+
7
+ - **Python 3.8+**
8
+ - **Docker** or **Podman** (for document conversion)
9
+
10
+ ## Installation
11
+
12
+ ### macOS (recommended)
13
+
14
+ ```bash
15
+ brew install pipx
16
+ pipx ensurepath
17
+ pipx install mdify-cli
18
+ ```
19
+
20
+ Restart your terminal after installation.
21
+
22
+ ### Linux
23
+
24
+ ```bash
25
+ python3 -m pip install --user pipx
26
+ pipx ensurepath
27
+ pipx install mdify-cli
28
+ ```
29
+
30
+ ### Install via pip
31
+
32
+ ```bash
33
+ pip install mdify-cli
34
+ ```
35
+
36
+ ### Development install
37
+
38
+ ```bash
39
+ git clone https://github.com/tiroq/mdify.git
40
+ cd mdify
41
+ pip install -e .
42
+ ```
43
+
44
+ ## Usage
45
+
46
+ ### Basic conversion
47
+
48
+ Convert a single file:
49
+ ```bash
50
+ mdify document.pdf
51
+ ```
52
+
53
+ The first run will automatically pull the container image (~2GB) if not present.
54
+
55
+ ### Convert multiple files
56
+
57
+ Convert all PDFs in a directory:
58
+ ```bash
59
+ mdify /path/to/documents -g "*.pdf"
60
+ ```
61
+
62
+ Recursively convert files:
63
+ ```bash
64
+ mdify /path/to/documents -r -g "*.pdf"
65
+ ```
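The `-g` and `-r` options shown above can be pictured with Python's `pathlib`. This is a minimal sketch, not mdify's actual implementation (the package's `cli.py` is not part of this diff); the function name `find_inputs` and the sorted ordering are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def find_inputs(root: Path, pattern: str = "*", recursive: bool = False) -> List[Path]:
    """Hypothetical helper: collect the files a run would convert."""
    # A single file is converted as-is; the glob pattern only applies to directories.
    if root.is_file():
        return [root]
    # rglob descends into subdirectories (like -r); glob stays at the top level.
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    # Keep regular files only and sort for a stable processing order.
    return sorted(p for p in matches if p.is_file())
```

With this sketch, `find_inputs(Path("/path/to/documents"), "*.pdf", recursive=True)` corresponds to `mdify /path/to/documents -r -g "*.pdf"`.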
66
+
67
+ ### Masking sensitive content
68
+
69
+ Mask PII and sensitive content in images:
70
+ ```bash
71
+ mdify document.pdf -m
72
+ mdify document.pdf --mask
73
+ ```
74
+
75
+ This uses Docling's content-aware masking to obscure sensitive information in embedded images.
76
+
77
+ ## Options
78
+
79
+ | Option | Description |
80
+ |--------|-------------|
81
+ | `input` | Input file or directory to convert (required) |
82
+ | `-o, --out-dir DIR` | Output directory for converted files (default: output) |
83
+ | `-g, --glob PATTERN` | Glob pattern for filtering files (default: *) |
84
+ | `-r, --recursive` | Recursively scan directories |
85
+ | `--flat` | Write all output files directly into the output directory instead of preserving the input directory structure |
86
+ | `--overwrite` | Overwrite existing output files |
87
+ | `-q, --quiet` | Suppress progress messages |
88
+ | `-m, --mask` | Mask PII and sensitive content in images |
89
+ | `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
90
+ | `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
91
+ | `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
92
+ | `--check-update` | Check for available updates and exit |
93
+ | `--version` | Show version and exit |
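The three `--pull` policies in the table reduce to a small decision. The sketch below shows one plausible reading, assuming the semantics follow the usual container-tool meaning of `always`, `missing`, and `never`; the function `should_pull` is hypothetical and not taken from mdify's source.

```python
def should_pull(policy: str, image_present: bool) -> bool:
    """Hypothetical helper: decide whether to pull the image before running."""
    if policy == "always":
        return True  # refresh the image on every run
    if policy == "missing":
        return not image_present  # documented default: pull only when absent
    if policy == "never":
        return False  # rely on whatever is available locally
    raise ValueError(f"unknown pull policy: {policy!r}")
```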
94
+
95
+ ### Flat Mode
96
+
97
+ With `--flat`, all output files are placed directly in the output directory. Directory paths are incorporated into filenames to prevent collisions:
98
+
99
+ - `docs/subdir1/file.pdf` → `output/subdir1_file.md`
100
+ - `docs/subdir2/file.pdf` → `output/subdir2_file.md`
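The mapping in these two examples can be reproduced with a few lines of `pathlib`. This is a sketch under the assumption that path components relative to the input directory are joined with underscores; `flat_output_path` is a hypothetical name, and mdify's real naming rules may differ in edge cases.

```python
from pathlib import Path


def flat_output_path(src: Path, input_root: Path, out_dir: Path) -> Path:
    """Hypothetical helper: flatten a nested input path into one output filename."""
    # Path relative to the input directory, with the extension swapped for .md.
    rel = src.relative_to(input_root).with_suffix(".md")
    # Join the directory components and the filename with underscores.
    return out_dir / "_".join(rel.parts)
```

For example, `flat_output_path(Path("docs/subdir1/file.pdf"), Path("docs"), Path("output"))` yields `output/subdir1_file.md`, matching the first example above.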
101
+
102
+ ## Examples
103
+
104
+ Convert all PDFs recursively, preserving structure:
105
+ ```bash
106
+ mdify documents/ -r -g "*.pdf" -o markdown_output
107
+ ```
108
+
109
+ Convert with Podman instead of Docker:
110
+ ```bash
111
+ mdify document.pdf --runtime podman
112
+ ```
113
+
114
+ Use a custom/local container image:
115
+ ```bash
116
+ mdify document.pdf --image my-custom-image:latest
117
+ ```
118
+
119
+ Force pull latest container image:
120
+ ```bash
121
+ mdify document.pdf --pull always
122
+ ```
123
+
124
+ ## Architecture
125
+
126
+ ```
127
+ ┌──────────────────┐ ┌─────────────────────────────────┐
128
+ │ mdify CLI │ │ Container (Docker/Podman) │
129
+ │ (lightweight) │────▶│ ┌───────────────────────────┐ │
130
+ │ │ │ │ Docling + ML Models │ │
131
+ │ - File handling │◀────│ │ - PDF parsing │ │
132
+ │ - Container │ │ │ - OCR (Tesseract) │ │
133
+ │ orchestration │ │ │ - Document conversion │ │
134
+ └──────────────────┘ │ └───────────────────────────┘ │
135
+ └─────────────────────────────────┘
136
+ ```
137
+
138
+ The CLI:
139
+ - Installs in seconds via pipx (no ML dependencies)
140
+ - Automatically detects Docker or Podman
141
+ - Pulls the runtime container on first use
142
+ - Mounts files and runs conversions in the container
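The detection and mounting steps listed above might look roughly like the following. This is an illustrative sketch only: the helper names, the docker-before-podman preference, and the `/work` mount point are assumptions, and the arguments the real CLI passes to the container are not visible in this diff.

```python
import shutil
from typing import Callable, List, Optional


def detect_runtime(preferred: Optional[str] = None,
                   which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Hypothetical helper: return the first container runtime found on PATH."""
    # An explicit --runtime value is the only candidate; otherwise try both.
    candidates = [preferred] if preferred else ["docker", "podman"]
    for name in candidates:
        if which(name):
            return name
    return None


def build_run_command(runtime: str, image: str, host_dir: str,
                      container_dir: str = "/work") -> List[str]:
    """Hypothetical helper: build a run command that bind-mounts host_dir."""
    # --rm removes the container afterwards; -v mounts the host directory.
    return [runtime, "run", "--rm", "-v", f"{host_dir}:{container_dir}", image]
```

The resulting list can be handed to `subprocess.run`; building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.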
143
+
144
+ ## Container Image
145
+
146
+ The runtime container is hosted at:
147
+ ```
148
+ ghcr.io/tiroq/mdify-runtime:latest
149
+ ```
150
+
151
+ To build locally:
152
+ ```bash
153
+ cd runtime
154
+ docker build -t mdify-runtime .
155
+ ```
156
+
157
+ ## Updates
158
+
159
+ mdify checks for updates daily. When a new version is available:
160
+
161
+ ```
162
+ ==================================================
163
+ A new version of mdify is available!
164
+ Current version: 0.3.0
165
+ Latest version: 0.4.0
166
+ ==================================================
167
+
168
+ Run upgrade now? [y/N]
169
+ ```
170
+
171
+ ### Disable update checks
172
+
173
+ ```bash
174
+ export MDIFY_NO_UPDATE_CHECK=1
175
+ ```
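A daily check gated by this environment variable could be structured as below. This is a sketch, not mdify's code: it assumes that any non-empty value of `MDIFY_NO_UPDATE_CHECK` disables the check and that the time of the last check is stored somewhere by the caller; `should_check_for_update` is a hypothetical name.

```python
import os
import time
from typing import Mapping, Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def should_check_for_update(last_check: Optional[float],
                            now: Optional[float] = None,
                            env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether an update check is due."""
    env = os.environ if env is None else env
    # Assumption: any non-empty value opts out of update checks.
    if env.get("MDIFY_NO_UPDATE_CHECK"):
        return False
    # No recorded check yet, so one is due.
    if last_check is None:
        return True
    now = time.time() if now is None else now
    # Check at most once per day.
    return now - last_check >= ONE_DAY_SECONDS
```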
176
+
177
+ ## Uninstall
178
+
179
+ ```bash
180
+ pipx uninstall mdify-cli
181
+ ```
182
+
183
+ Or if installed via pip:
184
+
185
+ ```bash
186
+ pip uninstall mdify-cli
187
+ ```
188
+
189
+ ## Development
190
+
191
+ ### Task automation
192
+
193
+ This project uses [Task](https://taskfile.dev) for automation:
194
+
195
+ ```bash
196
+ # Show available tasks
197
+ task
198
+
199
+ # Build package
200
+ task build
201
+
202
+ # Build container locally
203
+ task container-build
204
+
205
+ # Release workflow
206
+ task release-patch
207
+ ```
208
+
209
+ ### Building for PyPI
210
+
211
+ See [PUBLISHING.md](PUBLISHING.md) for complete publishing instructions.
212
+
213
+ ## License
214
+
215
+ MIT
@@ -0,0 +1,3 @@
1
+ """mdify - Convert documents to Markdown via Docling container."""
2
+
3
+ __version__ = "1.2.0"
@@ -0,0 +1,7 @@
1
+ """Allow running mdify as a module: python -m mdify"""
2
+
3
+ import sys
4
+ from mdify.cli import main
5
+
6
+ if __name__ == "__main__":
7
+ sys.exit(main())