mdify-cli 1.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- mdify_cli-1.2.0/LICENSE +21 -0
- mdify_cli-1.2.0/PKG-INFO +243 -0
- mdify_cli-1.2.0/README.md +215 -0
- mdify_cli-1.2.0/mdify/__init__.py +3 -0
- mdify_cli-1.2.0/mdify/__main__.py +7 -0
- mdify_cli-1.2.0/mdify/cli.py +647 -0
- mdify_cli-1.2.0/mdify_cli.egg-info/PKG-INFO +243 -0
- mdify_cli-1.2.0/mdify_cli.egg-info/SOURCES.txt +11 -0
- mdify_cli-1.2.0/mdify_cli.egg-info/dependency_links.txt +1 -0
- mdify_cli-1.2.0/mdify_cli.egg-info/entry_points.txt +2 -0
- mdify_cli-1.2.0/mdify_cli.egg-info/top_level.txt +1 -0
- mdify_cli-1.2.0/pyproject.toml +43 -0
- mdify_cli-1.2.0/setup.cfg +4 -0
mdify_cli-1.2.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Stranger
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
mdify_cli-1.2.0/PKG-INFO
ADDED
@@ -0,0 +1,243 @@
+Metadata-Version: 2.4
+Name: mdify-cli
+Version: 1.2.0
+Summary: Lightweight CLI for converting documents to Markdown via Docling container
+Author: tiroq
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/tiroq/mdify
+Project-URL: Repository, https://github.com/tiroq/mdify
+Project-URL: Issues, https://github.com/tiroq/mdify/issues
+Keywords: markdown,conversion,pdf,docling,cli,document,docker
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Text Processing :: Markup :: Markdown
+Classifier: Topic :: Utilities
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+
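The header fields above follow the RFC 822-style key/value layout used by Python core metadata, so the standard library's email parser can read them. A minimal sketch, assuming the file text is already in a string; the `PKG_INFO` sample copies only a few of the fields shown above, and `read_metadata` is a hypothetical helper name:

```python
from email.parser import Parser

# Abbreviated sample: a few header fields from the PKG-INFO above,
# followed by the blank line that separates headers from the description.
PKG_INFO = """\
Metadata-Version: 2.4
Name: mdify-cli
Version: 1.2.0
Requires-Python: >=3.8
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8

# mdify
"""


def read_metadata(text):
    """Parse core-metadata headers; the body after the blank line is the description."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Repeated fields such as Classifier need get_all().
        "classifiers": msg.get_all("Classifier", []),
        "description": msg.get_payload(),
    }


meta = read_metadata(PKG_INFO)
```

Repeated fields are the main reason to prefer a real parser over splitting lines on `:` by hand.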
+# mdify
+
+A lightweight CLI for converting documents to Markdown. The CLI is fast to install via pipx, while the heavy ML conversion (Docling) runs inside a container.
+
+## Requirements
+
+- **Python 3.8+**
+- **Docker** or **Podman** (for document conversion)
+
+## Installation
+
+### macOS (recommended)
+
+```bash
+brew install pipx
+pipx ensurepath
+pipx install mdify-cli
+```
+
+Restart your terminal after installation.
+
+### Linux
+
+```bash
+python3 -m pip install --user pipx
+pipx ensurepath
+pipx install mdify-cli
+```
+
+### Install via pip
+
+```bash
+pip install mdify-cli
+```
+
+### Development install
+
+```bash
+git clone https://github.com/tiroq/mdify.git
+cd mdify
+pip install -e .
+```
+
+## Usage
+
+### Basic conversion
+
+Convert a single file:
+```bash
+mdify document.pdf
+```
+
+The first run will automatically pull the container image (~2GB) if not present.
+
+### Convert multiple files
+
+Convert all PDFs in a directory:
+```bash
+mdify /path/to/documents -g "*.pdf"
+```
+
+Recursively convert files:
+```bash
+mdify /path/to/documents -r -g "*.pdf"
+```
+
+### Masking sensitive content
+
+Mask PII and sensitive content in images:
+```bash
+mdify document.pdf -m
+mdify document.pdf --mask
+```
+
+This uses Docling's content-aware masking to obscure sensitive information in embedded images.
+
+## Options
+
+| Option | Description |
+|--------|-------------|
+| `input` | Input file or directory to convert (required) |
+| `-o, --out-dir DIR` | Output directory for converted files (default: output) |
+| `-g, --glob PATTERN` | Glob pattern for filtering files (default: *) |
+| `-r, --recursive` | Recursively scan directories |
+| `--flat` | Disable directory structure preservation |
+| `--overwrite` | Overwrite existing output files |
+| `-q, --quiet` | Suppress progress messages |
+| `-m, --mask` | Mask PII and sensitive content in images |
+| `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
+| `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
+| `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
+| `--check-update` | Check for available updates and exit |
+| `--version` | Show version and exit |
+
+### Flat Mode
+
+With `--flat`, all output files are placed directly in the output directory. Directory paths are incorporated into filenames to prevent collisions:
+
+- `docs/subdir1/file.pdf` → `output/subdir1_file.md`
+- `docs/subdir2/file.pdf` → `output/subdir2_file.md`
+
+## Examples
+
+Convert all PDFs recursively, preserving structure:
+```bash
+mdify documents/ -r -g "*.pdf" -o markdown_output
+```
+
+Convert with Podman instead of Docker:
+```bash
+mdify document.pdf --runtime podman
+```
+
+Use a custom/local container image:
+```bash
+mdify document.pdf --image my-custom-image:latest
+```
+
+Force pull latest container image:
+```bash
+mdify document.pdf --pull
+```
+
+## Architecture
+
+```
+┌──────────────────┐     ┌─────────────────────────────────┐
+│   mdify CLI      │     │  Container (Docker/Podman)      │
+│   (lightweight)  │────▶│  ┌───────────────────────────┐  │
+│                  │     │  │  Docling + ML Models      │  │
+│  - File handling │◀────│  │  - PDF parsing            │  │
+│  - Container     │     │  │  - OCR (Tesseract)        │  │
+│    orchestration │     │  │  - Document conversion    │  │
+└──────────────────┘     │  └───────────────────────────┘  │
+                         └─────────────────────────────────┘
+```
+
+The CLI:
+- Installs in seconds via pipx (no ML dependencies)
+- Automatically detects Docker or Podman
+- Pulls the runtime container on first use
+- Mounts files and runs conversions in the container
+
+## Container Image
+
+The runtime container is hosted at:
+```
+ghcr.io/tiroq/mdify-runtime:latest
+```
+
+To build locally:
+```bash
+cd runtime
+docker build -t mdify-runtime .
+```
+
+## Updates
+
+mdify checks for updates daily. When a new version is available:
+
+```
+==================================================
+A new version of mdify is available!
+Current version: 0.3.0
+Latest version: 0.4.0
+==================================================
+
+Run upgrade now? [y/N]
+```
+
+### Disable update checks
+
+```bash
+export MDIFY_NO_UPDATE_CHECK=1
+```
+
+## Uninstall
+
+```bash
+pipx uninstall mdify-cli
+```
+
+Or if installed via pip:
+
+```bash
+pip uninstall mdify-cli
+```
+
+## Development
+
+### Task automation
+
+This project uses [Task](https://taskfile.dev) for automation:
+
+```bash
+# Show available tasks
+task
+
+# Build package
+task build
+
+# Build container locally
+task container-build
+
+# Release workflow
+task release-patch
+```
+
+### Building for PyPI
+
+See [PUBLISHING.md](PUBLISHING.md) for complete publishing instructions.
+
+## License
+
+MIT
mdify_cli-1.2.0/README.md
ADDED
@@ -0,0 +1,215 @@
+# mdify
+
+A lightweight CLI for converting documents to Markdown. The CLI is fast to install via pipx, while the heavy ML conversion (Docling) runs inside a container.
+
+## Requirements
+
+- **Python 3.8+**
+- **Docker** or **Podman** (for document conversion)
+
+## Installation
+
+### macOS (recommended)
+
+```bash
+brew install pipx
+pipx ensurepath
+pipx install mdify-cli
+```
+
+Restart your terminal after installation.
+
+### Linux
+
+```bash
+python3 -m pip install --user pipx
+pipx ensurepath
+pipx install mdify-cli
+```
+
+### Install via pip
+
+```bash
+pip install mdify-cli
+```
+
+### Development install
+
+```bash
+git clone https://github.com/tiroq/mdify.git
+cd mdify
+pip install -e .
+```
+
+## Usage
+
+### Basic conversion
+
+Convert a single file:
+```bash
+mdify document.pdf
+```
+
+The first run will automatically pull the container image (~2GB) if not present.
+
+### Convert multiple files
+
+Convert all PDFs in a directory:
+```bash
+mdify /path/to/documents -g "*.pdf"
+```
+
+Recursively convert files:
+```bash
+mdify /path/to/documents -r -g "*.pdf"
+```
+
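The `-g` and `-r` options in the README text above decide which files get converted. A rough sketch of how such a selection could be done with `pathlib`; `find_inputs` is a hypothetical helper written for illustration and is not taken from `mdify/cli.py`, which is not shown in this part of the diff:

```python
from pathlib import Path


def find_inputs(root, pattern="*", recursive=False):
    """Return the files under root that match pattern, sorted for a stable order."""
    root = Path(root)
    if root.is_file():
        # A single input file is used as-is, regardless of the pattern.
        return [root]
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

With this sketch, `find_inputs("/path/to/documents", "*.pdf", recursive=True)` would correspond to the `-r -g "*.pdf"` invocation.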
+### Masking sensitive content
+
+Mask PII and sensitive content in images:
+```bash
+mdify document.pdf -m
+mdify document.pdf --mask
+```
+
+This uses Docling's content-aware masking to obscure sensitive information in embedded images.
+
+## Options
+
+| Option | Description |
+|--------|-------------|
+| `input` | Input file or directory to convert (required) |
+| `-o, --out-dir DIR` | Output directory for converted files (default: output) |
+| `-g, --glob PATTERN` | Glob pattern for filtering files (default: *) |
+| `-r, --recursive` | Recursively scan directories |
+| `--flat` | Disable directory structure preservation |
+| `--overwrite` | Overwrite existing output files |
+| `-q, --quiet` | Suppress progress messages |
+| `-m, --mask` | Mask PII and sensitive content in images |
+| `--runtime RUNTIME` | Container runtime: docker or podman (auto-detected) |
+| `--image IMAGE` | Custom container image (default: ghcr.io/tiroq/mdify-runtime:latest) |
+| `--pull POLICY` | Image pull policy: always, missing, never (default: missing) |
+| `--check-update` | Check for available updates and exit |
+| `--version` | Show version and exit |
+
+### Flat Mode
+
+With `--flat`, all output files are placed directly in the output directory. Directory paths are incorporated into filenames to prevent collisions:
+
+- `docs/subdir1/file.pdf` → `output/subdir1_file.md`
+- `docs/subdir2/file.pdf` → `output/subdir2_file.md`
+
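The two flat-mode mappings above are consistent with joining the path components below the input directory with underscores. A minimal sketch of that rule, assuming this is how the names are built; `flat_output_path` is a hypothetical name and the actual logic in `mdify/cli.py` may differ:

```python
from pathlib import Path


def flat_output_path(src, input_root, out_dir="output"):
    """Map a source file under input_root to a single-level path in out_dir."""
    # Path relative to the input directory, with the extension swapped to .md.
    rel = Path(src).relative_to(input_root).with_suffix(".md")
    # Assumed rule: directory components become an underscore-joined prefix.
    return Path(out_dir) / "_".join(rel.parts)
```

Under this assumed rule, a top-level file named `subdir1_file.pdf` would still collide with `subdir1/file.pdf`, so the scheme reduces collisions rather than ruling them out.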
+## Examples
+
+Convert all PDFs recursively, preserving structure:
+```bash
+mdify documents/ -r -g "*.pdf" -o markdown_output
+```
+
+Convert with Podman instead of Docker:
+```bash
+mdify document.pdf --runtime podman
+```
+
+Use a custom/local container image:
+```bash
+mdify document.pdf --image my-custom-image:latest
+```
+
+Force pull latest container image:
+```bash
+mdify document.pdf --pull
+```
+
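The options table in this README lists `--pull POLICY` with the values `always`, `missing`, and `never`, while the last example above passes `--pull` with no value. If the option is declared the way the table suggests (an assumption, since `mdify/cli.py` is not shown in this part of the diff), an `argparse` parser like the sketch below would reject the bare flag, and forcing a pull would be written `--pull always`:

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the options relevant here."""
    parser = argparse.ArgumentParser(prog="mdify")
    parser.add_argument("input")
    parser.add_argument(
        "--pull",
        metavar="POLICY",
        choices=["always", "missing", "never"],
        default="missing",  # default taken from the options table
    )
    return parser


args = build_parser().parse_args(["document.pdf", "--pull", "always"])
```

With this declaration, `mdify document.pdf --pull` exits with an "expected one argument" error rather than pulling the image.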
+## Architecture
+
+```
+┌──────────────────┐     ┌─────────────────────────────────┐
+│   mdify CLI      │     │  Container (Docker/Podman)      │
+│   (lightweight)  │────▶│  ┌───────────────────────────┐  │
+│                  │     │  │  Docling + ML Models      │  │
+│  - File handling │◀────│  │  - PDF parsing            │  │
+│  - Container     │     │  │  - OCR (Tesseract)        │  │
+│    orchestration │     │  │  - Document conversion    │  │
+└──────────────────┘     │  └───────────────────────────┘  │
+                         └─────────────────────────────────┘
+```
+
+The CLI:
+- Installs in seconds via pipx (no ML dependencies)
+- Automatically detects Docker or Podman
+- Pulls the runtime container on first use
+- Mounts files and runs conversions in the container
+
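The "automatically detects Docker or Podman" step in the list above can be approximated by looking for either executable on `PATH`. A sketch under stated assumptions: the function name and the docker-before-podman order are illustrative choices, not confirmed by this diff:

```python
import shutil


def detect_runtime(preferred=None, candidates=("docker", "podman")):
    """Return the path of the first available container runtime, or None.

    preferred mirrors an explicit --runtime choice; when given, only that
    executable is looked up.
    """
    names = (preferred,) if preferred else candidates
    for name in names:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path:
            return path
    return None
```

A caller would typically treat a `None` result as a fatal error and tell the user to install Docker or Podman.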
+## Container Image
+
+The runtime container is hosted at:
+```
+ghcr.io/tiroq/mdify-runtime:latest
+```
+
+To build locally:
+```bash
+cd runtime
+docker build -t mdify-runtime .
+```
+
+## Updates
+
+mdify checks for updates daily. When a new version is available:
+
+```
+==================================================
+A new version of mdify is available!
+Current version: 0.3.0
+Latest version: 0.4.0
+==================================================
+
+Run upgrade now? [y/N]
+```
+
+### Disable update checks
+
+```bash
+export MDIFY_NO_UPDATE_CHECK=1
+```
+
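The daily update check and the `MDIFY_NO_UPDATE_CHECK` switch described above amount to a small gating decision. A minimal sketch, assuming the time of the last check is kept as a Unix timestamp and that any non-empty value of the variable disables the check (the README only shows `=1`); the function name is hypothetical:

```python
import os
import time

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # "daily", per the README


def should_check_for_update(last_check, now=None, env=None):
    """Decide whether an update check is due.

    last_check is the Unix timestamp of the previous check, or None if no
    check has run yet.
    """
    env = os.environ if env is None else env
    if env.get("MDIFY_NO_UPDATE_CHECK"):  # assumption: any non-empty value disables
        return False
    now = time.time() if now is None else now
    return last_check is None or now - last_check >= CHECK_INTERVAL_SECONDS
```

Passing `now` and `env` explicitly keeps the decision testable without touching the real clock or environment.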
+## Uninstall
+
+```bash
+pipx uninstall mdify-cli
+```
+
+Or if installed via pip:
+
+```bash
+pip uninstall mdify-cli
+```
+
+## Development
+
+### Task automation
+
+This project uses [Task](https://taskfile.dev) for automation:
+
+```bash
+# Show available tasks
+task
+
+# Build package
+task build
+
+# Build container locally
+task container-build
+
+# Release workflow
+task release-patch
+```
+
+### Building for PyPI
+
+See [PUBLISHING.md](PUBLISHING.md) for complete publishing instructions.
+
+## License
+
+MIT