docslight 0.1.1__tar.gz → 0.1.3__tar.gz

Files changed (56)
  1. {docslight-0.1.1 → docslight-0.1.3}/PKG-INFO +19 -18
  2. {docslight-0.1.1 → docslight-0.1.3}/README.md +46 -45
  3. {docslight-0.1.1 → docslight-0.1.3}/docslight/__init__.py +1 -1
  4. {docslight-0.1.1 → docslight-0.1.3}/docslight/cli.py +63 -33
  5. {docslight-0.1.1 → docslight-0.1.3}/docslight/cloud/client.py +6 -6
  6. {docslight-0.1.1 → docslight-0.1.3}/docslight/preview.py +1 -1
  7. {docslight-0.1.1 → docslight-0.1.3}/docslight/schemas/fields.py +2 -2
  8. {docslight-0.1.1 → docslight-0.1.3}/docslight/web_app.py +32 -31
  9. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/PKG-INFO +19 -18
  10. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/SOURCES.txt +12 -9
  11. {docslight-0.1.1 → docslight-0.1.3}/pyproject.toml +4 -7
  12. docslight-0.1.3/tests/test_cli.py +450 -0
  13. docslight-0.1.3/tests/test_cli_entrypoint.py +15 -0
  14. docslight-0.1.3/tests/test_client.py +255 -0
  15. docslight-0.1.3/tests/test_cloud_client.py +771 -0
  16. docslight-0.1.3/tests/test_config_result.py +231 -0
  17. docslight-0.1.3/tests/test_examples.py +20 -0
  18. docslight-0.1.3/tests/test_local_llm.py +401 -0
  19. docslight-0.1.3/tests/test_local_loader_parser.py +300 -0
  20. docslight-0.1.3/tests/test_local_office_loader.py +108 -0
  21. docslight-0.1.3/tests/test_local_pipeline.py +825 -0
  22. docslight-0.1.3/tests/test_schema_helpers.py +117 -0
  23. docslight-0.1.3/tests/test_web_app.py +442 -0
  24. docslight-0.1.1/docslight/static/app/common.js +0 -668
  25. docslight-0.1.1/docslight/static/app/docslight-extract.json +0 -307
  26. docslight-0.1.1/docslight/static/app/extract.js +0 -394
  27. docslight-0.1.1/docslight/static/app/i18n.js +0 -405
  28. docslight-0.1.1/docslight/static/app/parse.js +0 -161
  29. docslight-0.1.1/docslight/static/styles.css +0 -878
  30. docslight-0.1.1/docslight/templates/base.html +0 -36
  31. docslight-0.1.1/docslight/templates/extract.html +0 -123
  32. docslight-0.1.1/docslight/templates/parse.html +0 -81
  33. {docslight-0.1.1 → docslight-0.1.3}/LICENSE +0 -0
  34. {docslight-0.1.1 → docslight-0.1.3}/docslight/client.py +0 -0
  35. {docslight-0.1.1 → docslight-0.1.3}/docslight/cloud/__init__.py +0 -0
  36. {docslight-0.1.1 → docslight-0.1.3}/docslight/config.py +0 -0
  37. {docslight-0.1.1 → docslight-0.1.3}/docslight/exceptions.py +0 -0
  38. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/__init__.py +0 -0
  39. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/layout_blocks.py +0 -0
  40. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/llm_extractor.py +0 -0
  41. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/loaders.py +0 -0
  42. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/markdown.py +0 -0
  43. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/office_loader.py +0 -0
  44. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/paddle_parser.py +0 -0
  45. {docslight-0.1.1 → docslight-0.1.3}/docslight/local/pipeline.py +0 -0
  46. {docslight-0.1.1 → docslight-0.1.3}/docslight/providers/__init__.py +0 -0
  47. {docslight-0.1.1 → docslight-0.1.3}/docslight/providers/ollama.py +0 -0
  48. {docslight-0.1.1 → docslight-0.1.3}/docslight/providers/openai_compatible.py +0 -0
  49. {docslight-0.1.1 → docslight-0.1.3}/docslight/result.py +0 -0
  50. {docslight-0.1.1 → docslight-0.1.3}/docslight/schemas/__init__.py +0 -0
  51. {docslight-0.1.1 → docslight-0.1.3}/docslight/standard_json.py +0 -0
  52. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/dependency_links.txt +0 -0
  53. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/entry_points.txt +0 -0
  54. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/requires.txt +0 -0
  55. {docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/top_level.txt +0 -0
  56. {docslight-0.1.1 → docslight-0.1.3}/setup.cfg +0 -0
{docslight-0.1.1 → docslight-0.1.3}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: docslight
- Version: 0.1.1
+ Version: 0.1.3
  Summary: Lightweight ComPDF document parsing and extraction SDK
  Author-email: ComPDF AI <support@compdf.com>
  License-Expression: MIT
@@ -78,6 +78,7 @@ Parse any document:
 
  ```bash
  docslight parse invoice.pdf --output invoice.md
+ docslight parse invoice.pdf --format zip --output invoice.zip
  ```
 
  Extract specific fields:
@@ -86,14 +87,13 @@ Extract specific fields:
  docslight extract invoice.pdf --fields invoice_number,total_amount
  ```
 
- Launch the local Web UI workbench:
+ Launch the local API server:
 
  ```bash
- pip install "docslight[web]"
  docslight web
- # Open http://127.0.0.1:8000
+ # Health: http://127.0.0.1:8000/api/health
 
- # Or run the same Web UI directly as a module
+ # Or run the same API server directly as a module
  python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  ```
 
@@ -103,7 +103,7 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  - **Parse → Markdown** — Convert PDF, DOCX, PPTX, XLSX, and images (PNG, JPG, TIFF, BMP, WebP) to clean Markdown
  - **Extract → JSON** — Pull structured data by field list, JSON Schema, or structured template (key-value + table extraction)
  - **CLI first** — Full-featured command-line interface, script-friendly
- - **Web UI** — Local Flask workbench with drag-and-drop, live preview with bbox highlights, and a Fields Builder UI
+ - **API server** — Local Flask backend exposing parse, extract, preview, health, and system-info endpoints
  - **Batch processing** — `parse_batch()` / `extract_batch()` for multiple files
  - **Local LLM extraction** — Ollama or any OpenAI-compatible provider for offline extraction
  - **Document types** — Classify and route documents by type for cloud extraction
@@ -116,7 +116,7 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  | Core SDK & CLI | `pip install docslight` |
  | + Local parsing (OCR, Office) | `pip install "docslight[local]"` |
  | + Local LLM extraction | `pip install "docslight[local,local-llm]"` |
- | + Web UI workbench | `pip install "docslight[web]"` |
+ | + API server | `pip install "docslight[web]"` |
 
  > Local CPU parsing is experimental. Validate accuracy and latency on your own documents before production use.
 
@@ -216,35 +216,36 @@ for r in results:
 
  ```bash
  # Parse
- docslight parse invoice.pdf --mode cloud -o invoice.md
- docslight parse invoice.pdf --mode local -o invoice.md
+ docslight parse invoice.pdf --mode cloud -o invoice.zip
+ docslight parse invoice.pdf --mode cloud --format zip -o invoice.zip
+ docslight parse invoice.pdf --mode local -o invoice.zip
 
  # Extract
  docslight extract invoice.pdf --mode cloud --fields invoice_number,total_amount
  docslight extract invoice.pdf --mode local --fields invoice_number --local-llm-provider ollama --local-llm-model llama3.1
+ docslight extract "D:\pdf\invoice\1.pdf" --mode local --fields invoice_number --local-llm-provider ollama
 
  # Extract with schema
  docslight extract invoice.pdf --schema schema.json
 
- # Web UI
+ # API server
  docslight web --host 127.0.0.1 --port 8000
  ```
 
- ## Web UI Workbench
+ ## API Server
 
- DocSlight Workbench is a local Flask app for visual document processing.
+ DocSlight includes a local Flask API server for document processing. Frontend assets are not bundled in this package.
 
  ```bash
- pip install "docslight[web]"
  docslight web
  python -m docslight.web_app
  ```
 
- - **Parse & Extract tabs** — Switch between parsing and extraction workflows
- - **Drag-and-drop upload** — PDF, images, DOCX, PPTX, XLSX
- - **Live preview** — PDF page rendering with bbox highlight overlays
- - **Fields Builder** — Structured UI for building key-value and table extraction templates
- - **Download results** — One-click download of Markdown or JSON output
+ - `GET /api/health`
+ - `GET /api/system-info`
+ - `POST /api/parse`
+ - `POST /api/extract`
+ - `POST /api/preview`
 
  ## Environment Variables
 
{docslight-0.1.1 → docslight-0.1.3}/README.md
@@ -1,14 +1,14 @@
  <p align="center">
- <h1 align="center">DocSlight</h1>
+ <h1 align="center">DocSlight</h1>
  <p align="center">Lightweight Python SDK & CLI for document parsing and structured extraction</p>
  <p align="center">
- <a href="https://pypi.org/project/docslight/"><img src="https://img.shields.io/pypi/v/docslight" alt="PyPI"></a>
- <a href="https://pypi.org/project/docslight/"><img src="https://img.shields.io/pypi/pyversions/docslight" alt="Python versions"></a>
- <a href="LICENSE"><img src="https://img.shields.io/github/license/kdanmobile/docslight" alt="License"></a>
+ <a href="https://pypi.org/project/docslight/"><img src="https://img.shields.io/pypi/v/docslight" alt="PyPI"></a>
+ <a href="https://pypi.org/project/docslight/"><img src="https://img.shields.io/pypi/pyversions/docslight" alt="Python versions"></a>
+ <a href="LICENSE"><img src="https://img.shields.io/github/license/kdanmobile/docslight" alt="License"></a>
  </p>
  </p>
 
- ## What is DocSlight?
+ ## What is DocSlight?
 
  A lightweight Python library that turns PDFs, images, and Office documents into clean Markdown or structured JSON — with one line of code. Works with ComPDF Cloud (recommended) or fully offline with local parsers.
 
@@ -23,13 +23,14 @@ print(result.to_markdown())
  ## Quick Start
 
  ```bash
- pip install docslight
+ pip install docslight
  ```
 
  Parse any document:
 
  ```bash
  docslight parse invoice.pdf --output invoice.md
+ docslight parse invoice.pdf --format zip --output invoice.zip
  ```
 
  Extract specific fields:
@@ -38,16 +39,15 @@ Extract specific fields:
  docslight extract invoice.pdf --fields invoice_number,total_amount
  ```
 
- Launch the local Web UI workbench:
-
- ```bash
- pip install "docslight[web]"
- docslight web
- # Open http://127.0.0.1:8000
-
- # Or run the same Web UI directly as a module
- python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
- ```
+ Launch the local API server:
+
+ ```bash
+ docslight web
+ # Health: http://127.0.0.1:8000/api/health
+
+ # Or run the same API server directly as a module
+ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
+ ```
 
  ## Features
 
@@ -55,7 +55,7 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  - **Parse → Markdown** — Convert PDF, DOCX, PPTX, XLSX, and images (PNG, JPG, TIFF, BMP, WebP) to clean Markdown
  - **Extract → JSON** — Pull structured data by field list, JSON Schema, or structured template (key-value + table extraction)
  - **CLI first** — Full-featured command-line interface, script-friendly
- - **Web UI** — Local Flask workbench with drag-and-drop, live preview with bbox highlights, and a Fields Builder UI
+ - **API server** — Local Flask backend exposing parse, extract, preview, health, and system-info endpoints
  - **Batch processing** — `parse_batch()` / `extract_batch()` for multiple files
  - **Local LLM extraction** — Ollama or any OpenAI-compatible provider for offline extraction
  - **Document types** — Classify and route documents by type for cloud extraction
@@ -65,10 +65,10 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
 
  | Scenario | Command |
  |----------|---------|
- | Core SDK & CLI | `pip install docslight` |
- | + Local parsing (OCR, Office) | `pip install "docslight[local]"` |
- | + Local LLM extraction | `pip install "docslight[local,local-llm]"` |
- | + Web UI workbench | `pip install "docslight[web]"` |
+ | Core SDK & CLI | `pip install docslight` |
+ | + Local parsing (OCR, Office) | `pip install "docslight[local]"` |
+ | + Local LLM extraction | `pip install "docslight[local,local-llm]"` |
+ | + API server | `pip install "docslight[web]"` |
 
  > Local CPU parsing is experimental. Validate accuracy and latency on your own documents before production use.
 
@@ -159,44 +159,45 @@ client = DocSlight(
  ### Batch Processing
 
  ```python
- results = client.parse_batch(["doc1.pdf", "doc2.pdf", "doc3.pdf"])
- for r in results:
-     print(r.to_markdown()[:200])
+ results = client.parse_batch(["doc1.pdf", "doc2.pdf", "doc3.pdf"])
+ for r in results:
+     print(r.to_markdown()[:200])
  ```
 
  ## CLI Usage
 
  ```bash
  # Parse
- docslight parse invoice.pdf --mode cloud -o invoice.md
- docslight parse invoice.pdf --mode local -o invoice.md
+ docslight parse invoice.pdf --mode cloud -o invoice.zip
+ docslight parse invoice.pdf --mode cloud --format zip -o invoice.zip
+ docslight parse invoice.pdf --mode local -o invoice.zip
 
  # Extract
  docslight extract invoice.pdf --mode cloud --fields invoice_number,total_amount
  docslight extract invoice.pdf --mode local --fields invoice_number --local-llm-provider ollama --local-llm-model llama3.1
+ docslight extract "D:\pdf\invoice\1.pdf" --mode local --fields invoice_number --local-llm-provider ollama
 
  # Extract with schema
  docslight extract invoice.pdf --schema schema.json
 
- # Web UI
- docslight web --host 127.0.0.1 --port 8000
- ```
-
- ## Web UI Workbench
-
- DocSlight Workbench is a local Flask app for visual document processing.
-
- ```bash
- pip install "docslight[web]"
- docslight web
- python -m docslight.web_app
- ```
-
- - **Parse & Extract tabs** — Switch between parsing and extraction workflows
- - **Drag-and-drop upload** — PDF, images, DOCX, PPTX, XLSX
- - **Live preview** — PDF page rendering with bbox highlight overlays
- - **Fields Builder** — Structured UI for building key-value and table extraction templates
- - **Download results** — One-click download of Markdown or JSON output
+ # API server
+ docslight web --host 127.0.0.1 --port 8000
+ ```
+
+ ## API Server
+
+ DocSlight includes a local Flask API server for document processing. Frontend assets are not bundled in this package.
+
+ ```bash
+ docslight web
+ python -m docslight.web_app
+ ```
+
+ - `GET /api/health`
+ - `GET /api/system-info`
+ - `POST /api/parse`
+ - `POST /api/extract`
+ - `POST /api/preview`
 
  ## Environment Variables
 
{docslight-0.1.1 → docslight-0.1.3}/docslight/__init__.py
@@ -19,7 +19,7 @@ from docslight.exceptions import (
  )
  from docslight.result import ExtractResult, ParseResult
 
- __version__ = "0.1.1"
+ __version__ = "0.1.2"
 
  __all__ = [
      "AuthenticationError",
{docslight-0.1.1 → docslight-0.1.3}/docslight/cli.py
@@ -81,20 +81,34 @@ def _client_from_args(args: argparse.Namespace) -> DocSlight:
      )
 
 
- def _write_output(content: str, output_path: str | None) -> None:
-     if output_path is None:
-         sys.stdout.write(content)
-         if not content.endswith("\n"):
-             sys.stdout.write("\n")
-         return
-     Path(output_path).write_text(content, encoding="utf-8")
+ def _write_output(content: str, output_path: str | None) -> None:
+     if output_path is None:
+         sys.stdout.write(content)
+         if not content.endswith("\n"):
+             sys.stdout.write("\n")
+         return
+     path = Path(output_path)
+     path.write_text(content, encoding="utf-8")
+     sys.stderr.write(f"Wrote {path.resolve()}\n")
+
+
+ def _write_binary_output(content: bytes, output_path: str | None) -> None:
+     if output_path is None:
+         output_buffer = getattr(sys.stdout, "buffer", None)
+         if output_buffer is None:
+             raise CLIUsageError("binary output requires --output")
+         output_buffer.write(content)
+         return
+     path = Path(output_path)
+     path.write_bytes(content)
+     sys.stderr.write(f"Wrote {path.resolve()}\n")
 
 
  def _to_pretty_json(data: Any) -> str:
      return json.dumps(data, ensure_ascii=False, indent=2)
 
 
- def run_web_app(host: str, port: int, debug: bool) -> None:
+ def run_web_app(host: str, port: int, debug: bool) -> None:
      """Run the optional Flask web application."""
      if importlib.util.find_spec("docslight.web_app") is None:
          raise CLIUsageError(WEB_EXTRA_ERROR)
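
The `_write_binary_output` helper added in this hunk looks up `sys.stdout.buffer` with `getattr` rather than accessing it directly. A minimal sketch of why that guard matters, assuming only the standard library: `write_binary` and the `ValueError` below are hypothetical stand-ins for the package's helper and its `CLIUsageError`, not the package's own code.

```python
import io
import sys


def write_binary(content: bytes, stream=None) -> None:
    """Write bytes to a text stream's underlying binary buffer, if it has one."""
    stream = sys.stdout if stream is None else stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # Hypothetical stand-in for the CLIUsageError raised in the diff.
        raise ValueError("binary output requires --output")
    buffer.write(content)


# A TextIOWrapper (the type of the real sys.stdout) exposes .buffer ...
raw = io.BytesIO()
wrapped = io.TextIOWrapper(raw, encoding="utf-8")
write_binary(b"PK\x03\x04", wrapped)
written = raw.getvalue()

# ... while an io.StringIO replacement, common in tests, does not.
try:
    write_binary(b"PK\x03\x04", io.StringIO())
    fallback_error = None
except ValueError as exc:
    fallback_error = str(exc)
```

Writing through the binary buffer avoids pushing ZIP bytes through a text encoder, and the explicit error gives a clearer message than an `AttributeError` when stdout has been replaced by a text-only stream.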
@@ -105,8 +119,8 @@ def run_web_app(host: str, port: int, debug: bool) -> None:
          if exc.name in {"flask", "werkzeug"}:
              raise CLIUsageError(WEB_EXTRA_ERROR) from exc
          raise
-     _run_web_app = web_app.run_web_app
-     _run_web_app(host, port, debug)
+     _run_web_app = web_app.run_web_app
+     _run_web_app(host, port, debug)
 
 
  def _print_cli_error(error: Exception) -> int:
@@ -125,11 +139,11 @@ def build_parser() -> argparse.ArgumentParser:
      parse_parser = subparsers.add_parser("parse", help="Parse a document")
      parse_parser.add_argument("input")
      parse_parser.add_argument("--output", "-o")
-     parse_parser.add_argument(
-         "--format",
-         choices=("markdown", "json", "standard-json"),
-         default="markdown",
-     )
+     parse_parser.add_argument(
+         "--format",
+         choices=("markdown", "json", "standard-json", "zip"),
+         default=None,
+     )
      _add_common_options(parse_parser)
      parse_parser.set_defaults(func=_run_parse)
 
@@ -151,25 +165,41 @@ def build_parser() -> argparse.ArgumentParser:
      extract_parser.set_defaults(func=_run_extract)
 
      web_parser = subparsers.add_parser("web", help="Run the web application")
-     web_parser.add_argument("--host", default="127.0.0.1")
-     web_parser.add_argument("--port", type=int, default=8000)
-     web_parser.add_argument("--debug", action="store_true")
-     web_parser.set_defaults(func=_run_web)
+     web_parser.add_argument("--host", default="127.0.0.1")
+     web_parser.add_argument("--port", type=int, default=8000)
+     web_parser.add_argument("--debug", action="store_true")
+     web_parser.set_defaults(func=_run_web)
 
      return parser
 
 
- def _run_parse(args: argparse.Namespace) -> int:
-     parse_output = "json" if args.format == "standard-json" else args.format
-     result = _client_from_args(args).parse(args.input, output=parse_output)
-     if args.format == "json":
-         content = _to_pretty_json(result.to_json())
-     elif args.format == "standard-json":
-         content = _to_pretty_json(result.to_standard_json())
-     else:
-         content = result.to_markdown()
-     _write_output(content, args.output)
-     return 0
+ def _run_parse(args: argparse.Namespace) -> int:
+     parse_format = _resolve_parse_format(args.format, args.output)
+     parse_output = "json" if parse_format == "standard-json" else "markdown"
+     result = _client_from_args(args).parse(args.input, output=parse_output)
+     if parse_format == "zip":
+         raw_archive = getattr(result, "raw_archive", None)
+         if not isinstance(raw_archive, bytes):
+             raise CLIUsageError("parse result did not include a ZIP archive")
+         _write_binary_output(raw_archive, args.output)
+     elif parse_format == "json":
+         content = _to_pretty_json(result.to_json())
+         _write_output(content, args.output)
+     elif parse_format == "standard-json":
+         content = _to_pretty_json(result.to_standard_json())
+         _write_output(content, args.output)
+     else:
+         content = result.to_markdown()
+         _write_output(content, args.output)
+     return 0
+
+
+ def _resolve_parse_format(parse_format: str | None, output_path: str | None) -> str:
+     if parse_format is not None:
+         return parse_format
+     if output_path is not None and Path(output_path).suffix.lower() == ".zip":
+         return "zip"
+     return "markdown"
 
 
  def _run_convert_parse_json(args: argparse.Namespace) -> int:
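
With `--format` now defaulting to `None`, the output format is resolved in three steps: an explicit `--format` wins, then a `.zip` output suffix, then Markdown. The sketch below copies that logic from `_resolve_parse_format` into a standalone function (renamed `resolve_parse_format` here for illustration) so the precedence can be checked without installing the package.

```python
from pathlib import Path


def resolve_parse_format(parse_format, output_path):
    """Standalone copy of the resolution order shown in the diff."""
    if parse_format is not None:
        return parse_format  # an explicit --format always wins
    if output_path is not None and Path(output_path).suffix.lower() == ".zip":
        return "zip"  # inferred from the output file extension
    return "markdown"  # previous default, now the fallback


resolved = {
    "no flags": resolve_parse_format(None, None),
    "-o invoice.md": resolve_parse_format(None, "invoice.md"),
    "-o invoice.zip": resolve_parse_format(None, "invoice.zip"),
    "-o INVOICE.ZIP": resolve_parse_format(None, "INVOICE.ZIP"),
    "--format json -o invoice.zip": resolve_parse_format("json", "invoice.zip"),
}
```

Under this logic, the README command `docslight parse invoice.pdf --mode local -o invoice.zip` selects `zip` purely from the file extension. Whether a local-mode result actually carries a `raw_archive` is not visible in this diff; if it does not, `_run_parse` raises `CLIUsageError`.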
@@ -197,9 +227,9 @@ def _run_extract(args: argparse.Namespace) -> int:
      return 0
 
 
- def _run_web(args: argparse.Namespace) -> int:
-     run_web_app(args.host, args.port, args.debug)
-     return 0
+ def _run_web(args: argparse.Namespace) -> int:
+     run_web_app(args.host, args.port, args.debug)
+     return 0
 
 
  def main(argv: Sequence[str] | None = None) -> int:
{docslight-0.1.1 → docslight-0.1.3}/docslight/cloud/client.py
@@ -171,11 +171,11 @@ class CloudClient:
              return _read_downloaded_result_payload(content)
          return self._response_json(response), None
 
-     def _prepare_options(self, operation: str, options: dict[str, Any]) -> dict[str, Any]:
-         if operation != "extract" or not self._uses_custom_operation_urls():
-             return options
-
-         prepared = dict(options)
+     def _prepare_options(self, operation: str, options: dict[str, Any]) -> dict[str, Any]:
+         if operation != "extract":
+             return options
+
+         prepared = dict(options)
          if "extract_fields" in prepared:
              return prepared
 
@@ -219,7 +219,7 @@ class CloudClient:
          return compacted
 
      def _headers(self) -> dict[str, str]:
-         headers = {"User-Agent": "docslight/0.1.1"}
+         headers = {"User-Agent": "docslight/0.1.2"}
          if self.api_key:
              headers["Authorization"] = f"Bearer {self.api_key}"
              headers["x-api-key"] = self.api_key
{docslight-0.1.1 → docslight-0.1.3}/docslight/preview.py
@@ -1,4 +1,4 @@
- """Preview rendering helpers for the local Web UI."""
+ """Preview rendering helpers for the local API server."""
 
  from __future__ import annotations
 
{docslight-0.1.1 → docslight-0.1.3}/docslight/schemas/fields.py
@@ -11,8 +11,8 @@ NormalizedFields = list[str] | StructuredFields | None
  ExtractSchema = dict[str, Any]
 
 
- def normalize_fields(fields: list[str] | str | StructuredFields | None) -> NormalizedFields:
-     """Normalize extraction fields from SDK, CLI, or Web UI inputs."""
+ def normalize_fields(fields: list[str] | str | StructuredFields | None) -> NormalizedFields:
+     """Normalize extraction fields from SDK, CLI, or API inputs."""
      if fields is None:
          return None
      if isinstance(fields, str):
{docslight-0.1.1 → docslight-0.1.3}/docslight/web_app.py
@@ -2,19 +2,19 @@
 
  from __future__ import annotations
 
- import argparse
- import base64
- import json
- import logging
- import sys
- import tempfile
+ import argparse
+ import base64
+ import json
+ import logging
+ import sys
+ import tempfile
  from collections.abc import Callable
  from io import BytesIO
  from json import JSONDecodeError
  from pathlib import Path
  from typing import Any, cast
 
- from flask import Flask, Response, jsonify, redirect, render_template, request, send_file, url_for
+ from flask import Flask, Response, jsonify, request, send_file
  from werkzeug.datastructures import FileStorage
  from werkzeug.utils import secure_filename
 
@@ -58,21 +58,18 @@ OFFICE_PREVIEW_UNSUPPORTED_MESSAGE = (
  LOG_FORMAT = "%(levelname)s:%(name)s:%(message)s"
 
 
- def create_app(docslight_factory: Callable[..., Any] = DocSlight) -> Flask:
-     """Create the local DocSlight Flask application."""
-     app = Flask(__name__)
-
-     @app.get("/")
-     def index() -> Any:
-         return redirect(url_for("parse_page"))
-
-     @app.get("/parse")
-     def parse_page() -> str:
-         return render_template("parse.html", active_page="parse")
-
-     @app.get("/extract")
-     def extract_page() -> str:
-         return render_template("extract.html", active_page="extract")
+ def create_app(docslight_factory: Callable[..., Any] = DocSlight) -> Flask:
+     """Create the local DocSlight Flask application."""
+     app = Flask(__name__)
+
+     @app.get("/")
+     def index() -> Any:
+         return jsonify(
+             {
+                 "status": "healthy",
+                 "service": "docslight-web",
+             }
+         )
 
      @app.get("/api/health")
      def health() -> Any:
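
The root route now returns a small JSON status document instead of redirecting to an HTML page. A minimal sketch of a client for that route, assuming only the standard library: `StubHandler` is a hypothetical stand-in that serves the same payload as the new `index()` so the example runs without Flask or docslight installed, and `fetch_status` is an illustrative helper, not part of the package. The response shape of `/api/health` is not shown in this diff, so only `/` is exercised.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the Flask app: serves the root payload shown in the diff."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        body = json.dumps(
            {"status": "healthy", "service": "docslight-web"}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_status(base_url: str) -> dict:
    """GET the root route and decode the JSON body."""
    opener = build_opener(ProxyHandler({}))  # ignore proxy env vars for localhost
    with opener.open(f"{base_url}/", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: let the OS pick
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    payload = fetch_status(f"http://127.0.0.1:{server.server_port}")
finally:
    server.shutdown()
    server.server_close()
```

Against a real server started with `docslight web`, the same `fetch_status("http://127.0.0.1:8000")` call would be expected to return the payload shown in the hunk above.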
@@ -160,7 +157,11 @@ def create_app(docslight_factory: Callable[..., Any] = DocSlight) -> Flask:
      return app
 
 
- def run_web_app(host: str = "127.0.0.1", port: int = 8000, debug: bool = False) -> None:
+ def run_web_app(
+     host: str = "127.0.0.1",
+     port: int = 8000,
+     debug: bool = False,
+ ) -> None:
      """Run the local DocSlight web application."""
      _configure_web_logging(debug)
      create_app().run(host=host, port=port, debug=debug)
@@ -183,17 +184,17 @@ def build_parser() -> argparse.ArgumentParser:
          prog="python -m docslight.web_app",
          description="Run the DocSlight web application.",
      )
-     parser.add_argument("--host", default="127.0.0.1")
-     parser.add_argument("--port", type=int, default=8000)
-     parser.add_argument("--debug", action="store_true")
-     return parser
+     parser.add_argument("--host", default="127.0.0.1")
+     parser.add_argument("--port", type=int, default=8000)
+     parser.add_argument("--debug", action="store_true")
+     return parser
 
 
  def main(argv: list[str] | None = None) -> int:
-     """Run the standalone DocSlight web application entrypoint."""
-     args = build_parser().parse_args(argv)
-     run_web_app(args.host, args.port, args.debug)
-     return 0
+     """Run the standalone DocSlight web application entrypoint."""
+     args = build_parser().parse_args(argv)
+     run_web_app(args.host, args.port, args.debug)
+     return 0
 
 
  def local_llm_from_form(form: Any) -> dict[str, str] | None:
{docslight-0.1.1 → docslight-0.1.3}/docslight.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: docslight
- Version: 0.1.1
+ Version: 0.1.3
  Summary: Lightweight ComPDF document parsing and extraction SDK
  Author-email: ComPDF AI <support@compdf.com>
  License-Expression: MIT
@@ -78,6 +78,7 @@ Parse any document:
 
  ```bash
  docslight parse invoice.pdf --output invoice.md
+ docslight parse invoice.pdf --format zip --output invoice.zip
  ```
 
  Extract specific fields:
@@ -86,14 +87,13 @@ Extract specific fields:
  docslight extract invoice.pdf --fields invoice_number,total_amount
  ```
 
- Launch the local Web UI workbench:
+ Launch the local API server:
 
  ```bash
- pip install "docslight[web]"
  docslight web
- # Open http://127.0.0.1:8000
+ # Health: http://127.0.0.1:8000/api/health
 
- # Or run the same Web UI directly as a module
+ # Or run the same API server directly as a module
  python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  ```
 
@@ -103,7 +103,7 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  - **Parse → Markdown** — Convert PDF, DOCX, PPTX, XLSX, and images (PNG, JPG, TIFF, BMP, WebP) to clean Markdown
  - **Extract → JSON** — Pull structured data by field list, JSON Schema, or structured template (key-value + table extraction)
  - **CLI first** — Full-featured command-line interface, script-friendly
- - **Web UI** — Local Flask workbench with drag-and-drop, live preview with bbox highlights, and a Fields Builder UI
+ - **API server** — Local Flask backend exposing parse, extract, preview, health, and system-info endpoints
  - **Batch processing** — `parse_batch()` / `extract_batch()` for multiple files
  - **Local LLM extraction** — Ollama or any OpenAI-compatible provider for offline extraction
  - **Document types** — Classify and route documents by type for cloud extraction
@@ -116,7 +116,7 @@ python -m docslight.web_app --host 0.0.0.0 --port 8000 --debug
  | Core SDK & CLI | `pip install docslight` |
  | + Local parsing (OCR, Office) | `pip install "docslight[local]"` |
  | + Local LLM extraction | `pip install "docslight[local,local-llm]"` |
- | + Web UI workbench | `pip install "docslight[web]"` |
+ | + API server | `pip install "docslight[web]"` |
 
  > Local CPU parsing is experimental. Validate accuracy and latency on your own documents before production use.
 
@@ -216,35 +216,36 @@ for r in results:
 
  ```bash
  # Parse
- docslight parse invoice.pdf --mode cloud -o invoice.md
- docslight parse invoice.pdf --mode local -o invoice.md
+ docslight parse invoice.pdf --mode cloud -o invoice.zip
+ docslight parse invoice.pdf --mode cloud --format zip -o invoice.zip
+ docslight parse invoice.pdf --mode local -o invoice.zip
 
  # Extract
  docslight extract invoice.pdf --mode cloud --fields invoice_number,total_amount
  docslight extract invoice.pdf --mode local --fields invoice_number --local-llm-provider ollama --local-llm-model llama3.1
+ docslight extract "D:\pdf\invoice\1.pdf" --mode local --fields invoice_number --local-llm-provider ollama
 
  # Extract with schema
  docslight extract invoice.pdf --schema schema.json
 
- # Web UI
+ # API server
  docslight web --host 127.0.0.1 --port 8000
  ```
 
- ## Web UI Workbench
+ ## API Server
 
- DocSlight Workbench is a local Flask app for visual document processing.
+ DocSlight includes a local Flask API server for document processing. Frontend assets are not bundled in this package.
 
  ```bash
- pip install "docslight[web]"
  docslight web
  python -m docslight.web_app
  ```
 
- - **Parse & Extract tabs** — Switch between parsing and extraction workflows
- - **Drag-and-drop upload** — PDF, images, DOCX, PPTX, XLSX
- - **Live preview** — PDF page rendering with bbox highlight overlays
- - **Fields Builder** — Structured UI for building key-value and table extraction templates
- - **Download results** — One-click download of Markdown or JSON output
+ - `GET /api/health`
+ - `GET /api/system-info`
+ - `POST /api/parse`
+ - `POST /api/extract`
+ - `POST /api/preview`
 
  ## Environment Variables
 