datalab-python-sdk 0.1.1__tar.gz → 0.1.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/.github/workflows/ci.yml +1 -1
  2. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/.github/workflows/publish.yml +0 -1
  3. datalab_python_sdk-0.1.3/PKG-INFO +68 -0
  4. datalab_python_sdk-0.1.3/README.md +53 -0
  5. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/__init__.py +5 -7
  6. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/cli.py +4 -4
  7. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/client.py +11 -11
  8. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/models.py +22 -14
  9. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/settings.py +1 -0
  10. datalab_python_sdk-0.1.3/integration/README.md +36 -0
  11. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/integration/test_live_api.py +7 -7
  12. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/integration/test_readme_examples.py +22 -22
  13. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/pyproject.toml +17 -8
  14. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/tests/conftest.py +2 -3
  15. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/tests/test_client_methods.py +3 -3
  16. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/uv.lock +24 -119
  17. datalab_python_sdk-0.1.1/PKG-INFO +0 -17
  18. datalab_python_sdk-0.1.1/README.md +0 -178
  19. datalab_python_sdk-0.1.1/integration/README.md +0 -71
  20. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/.gitignore +0 -0
  21. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/.pre-commit-config.yaml +0 -0
  22. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/.python-version +0 -0
  23. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/LICENSE +0 -0
  24. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/08-Lambda-Calculus.pptx +0 -0
  25. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/adversarial.pdf +0 -0
  26. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/bid_evaluation.docx +0 -0
  27. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/book_review.ppt +0 -0
  28. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/book_store.xls +0 -0
  29. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/chi_hind.png +0 -0
  30. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/how_to_read.doc +0 -0
  31. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/normandy.epub +0 -0
  32. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/sample-1-sheet.xlsx +0 -0
  33. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/thinkpython.pdf +0 -0
  34. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/data/vibe.html +0 -0
  35. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/exceptions.py +0 -0
  36. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/datalab_sdk/mimetypes.py +0 -0
  37. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/integration/__init__.py +0 -0
  38. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/poetry.lock +0 -0
  39. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/pytest.ini +0 -0
  40. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/tests/__init__.py +0 -0
  41. {datalab_python_sdk-0.1.1 → datalab_python_sdk-0.1.3}/tests/test_cli_simple.py +0 -0
@@ -14,6 +14,6 @@ jobs:
14
14
  - name: Install python dependencies
15
15
  run: |
16
16
  pip install uv
17
- uv sync
17
+ uv sync --group dev
18
18
  - name: Run tests
19
19
  run: uv run pytest tests/
@@ -35,5 +35,4 @@ jobs:
35
35
  env:
36
36
  PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
37
37
  run: |
38
- cd python
39
38
  uv publish --token "$PYPI_TOKEN"
@@ -0,0 +1,68 @@
1
+ Metadata-Version: 2.4
2
+ Name: datalab-python-sdk
3
+ Version: 0.1.3
4
+ Summary: SDK for the Datalab document intelligence API
5
+ Author-email: Datalab Team <hi@datalab.to>
6
+ License-Expression: MIT
7
+ License-File: LICENSE
8
+ Keywords: api,datalab,document-intelligence,sdk
9
+ Requires-Python: >=3.10
10
+ Requires-Dist: aiohttp>=3.12.14
11
+ Requires-Dist: click>=8.2.1
12
+ Requires-Dist: pydantic-settings<3.0.0,>=2.10.1
13
+ Requires-Dist: pydantic<3.0.0,>=2.11.7
14
+ Description-Content-Type: text/markdown
15
+
16
+ # Datalab SDK
17
+
18
+ A Python SDK for the [Datalab API](https://www.datalab.to) - a document intelligence platform powered by [marker](https://github.com/VikParuchuri/marker) and [surya](https://github.com/VikParuchuri/surya).
19
+
20
+ See the full documentation at [https://documentation.datalab.to](https://documentation.datalab.to).
21
+
22
+ ## Installation
23
+
24
+ ```bash
25
+ pip install datalab-python-sdk
26
+ ```
27
+
28
+ ## Quick Start
29
+
30
+ ### Authentication
31
+
32
+ Get your API key from [https://www.datalab.to/app/keys](https://www.datalab.to/app/keys):
33
+
34
+ ```bash
35
+ export DATALAB_API_KEY="your_api_key_here"
36
+ ```
37
+
38
+ ### Basic Usage
39
+
40
+ ```python
41
+ from datalab_sdk import DatalabClient
42
+
43
+ client = DatalabClient() # use env var from above, or pass api_key="your_api_key_here"
44
+
45
+ # Convert PDF to markdown
46
+ result = client.convert("document.pdf")
47
+ print(result.markdown)
48
+
49
+ # OCR a document
50
+ ocr_result = client.ocr("document.pdf")
51
+ print(ocr_result.pages) # Get all text as string
52
+ ```
53
+
54
+ ## CLI Usage
55
+
56
+ The SDK includes a command-line interface:
57
+
58
+ ```bash
59
+ # Convert document to markdown
60
+ datalab convert document.pdf
61
+
62
+ # OCR with JSON output
63
+ datalab ocr document.pdf --output-format json
64
+ ```
65
+
66
+ ## License
67
+
68
+ MIT License
@@ -0,0 +1,53 @@
1
+ # Datalab SDK
2
+
3
+ A Python SDK for the [Datalab API](https://www.datalab.to) - a document intelligence platform powered by [marker](https://github.com/VikParuchuri/marker) and [surya](https://github.com/VikParuchuri/surya).
4
+
5
+ See the full documentation at [https://documentation.datalab.to](https://documentation.datalab.to).
6
+
7
+ ## Installation
8
+
9
+ ```bash
10
+ pip install datalab-python-sdk
11
+ ```
12
+
13
+ ## Quick Start
14
+
15
+ ### Authentication
16
+
17
+ Get your API key from [https://www.datalab.to/app/keys](https://www.datalab.to/app/keys):
18
+
19
+ ```bash
20
+ export DATALAB_API_KEY="your_api_key_here"
21
+ ```
22
+
23
+ ### Basic Usage
24
+
25
+ ```python
26
+ from datalab_sdk import DatalabClient
27
+
28
+ client = DatalabClient() # use env var from above, or pass api_key="your_api_key_here"
29
+
30
+ # Convert PDF to markdown
31
+ result = client.convert("document.pdf")
32
+ print(result.markdown)
33
+
34
+ # OCR a document
35
+ ocr_result = client.ocr("document.pdf")
36
+ print(ocr_result.pages) # Get all text as string
37
+ ```
38
+
39
+ ## CLI Usage
40
+
41
+ The SDK includes a command-line interface:
42
+
43
+ ```bash
44
+ # Convert document to markdown
45
+ datalab convert document.pdf
46
+
47
+ # OCR with JSON output
48
+ datalab ocr document.pdf --output-format json
49
+ ```
50
+
51
+ ## License
52
+
53
+ MIT License
@@ -7,13 +7,10 @@ supporting document conversion, OCR, layout analysis, and table recognition.
7
7
 
8
8
  from .client import DatalabClient, AsyncDatalabClient
9
9
  from .exceptions import DatalabError, DatalabAPIError, DatalabTimeoutError
10
- from .models import (
11
- ConversionResult,
12
- OCRResult,
13
- ProcessingOptions,
14
- )
10
+ from .models import ConversionResult, OCRResult, ConvertOptions, OCROptions
11
+ from .settings import settings
15
12
 
16
- __version__ = "1.0.0"
13
+ __version__ = settings.VERSION
17
14
  __all__ = [
18
15
  "DatalabClient",
19
16
  "AsyncDatalabClient",
@@ -22,5 +19,6 @@ __all__ = [
22
19
  "DatalabTimeoutError",
23
20
  "ConversionResult",
24
21
  "OCRResult",
25
- "ProcessingOptions",
22
+ "ConvertOptions",
23
+ "OCROptions",
26
24
  ]
@@ -12,7 +12,7 @@ import click
12
12
 
13
13
  from datalab_sdk.client import DatalabClient, AsyncDatalabClient
14
14
  from datalab_sdk.mimetypes import SUPPORTED_EXTENSIONS
15
- from datalab_sdk.models import ProcessingOptions
15
+ from datalab_sdk.models import OCROptions, ConvertOptions, ProcessingOptions
16
16
  from datalab_sdk.exceptions import DatalabError
17
17
  from datalab_sdk.settings import settings
18
18
 
@@ -186,7 +186,7 @@ def process_single_file_sync(
186
186
 
187
187
 
188
188
  @click.group()
189
- @click.version_option(version="1.0.0")
189
+ @click.version_option(version=settings.VERSION)
190
190
  def cli():
191
191
  """Datalab SDK - Command line interface for document processing"""
192
192
  pass
@@ -242,7 +242,7 @@ def convert(
242
242
  ]
243
243
 
244
244
  # Create processing options
245
- options = ProcessingOptions(
245
+ options = ConvertOptions(
246
246
  output_format=output_format,
247
247
  max_pages=max_pages,
248
248
  force_ocr=force_ocr,
@@ -366,7 +366,7 @@ def ocr(
366
366
  click.echo(f"❌ Skipping {path}: unsupported file type", err=True)
367
367
  sys.exit(1)
368
368
 
369
- options = ProcessingOptions(
369
+ options = OCROptions(
370
370
  max_pages=max_pages,
371
371
  page_range=page_range,
372
372
  skip_cache=skip_cache,
@@ -14,7 +14,13 @@ from datalab_sdk.exceptions import (
14
14
  DatalabFileError,
15
15
  )
16
16
  from datalab_sdk.mimetypes import MIMETYPE_MAP
17
- from datalab_sdk.models import ConversionResult, OCRResult, ProcessingOptions
17
+ from datalab_sdk.models import (
18
+ ConversionResult,
19
+ OCRResult,
20
+ ProcessingOptions,
21
+ ConvertOptions,
22
+ OCROptions,
23
+ )
18
24
  from datalab_sdk.settings import settings
19
25
 
20
26
 
@@ -62,7 +68,7 @@ class AsyncDatalabClient:
62
68
  timeout=timeout,
63
69
  headers={
64
70
  "X-Api-Key": self.api_key,
65
- "User-Agent": "datalab-python-sdk/0.1.0",
71
+ "User-Agent": f"datalab-python-sdk/{settings.VERSION}",
66
72
  },
67
73
  )
68
74
 
@@ -170,7 +176,7 @@ class AsyncDatalabClient:
170
176
  ) -> ConversionResult:
171
177
  """Convert a document using the marker endpoint"""
172
178
  if options is None:
173
- options = ProcessingOptions()
179
+ options = ConvertOptions()
174
180
 
175
181
  initial_data = await self._make_request(
176
182
  "POST", "/api/v1/marker", data=self.get_form_params(file_path, options)
@@ -212,7 +218,7 @@ class AsyncDatalabClient:
212
218
  ) -> OCRResult:
213
219
  """Perform OCR on a document"""
214
220
  if options is None:
215
- options = ProcessingOptions()
221
+ options = OCROptions()
216
222
 
217
223
  initial_data = await self._make_request(
218
224
  "POST", "/api/v1/ocr", data=self.get_form_params(file_path, options)
@@ -263,13 +269,7 @@ class DatalabClient:
263
269
 
264
270
  def _run_async(self, coro):
265
271
  """Run async coroutine in sync context"""
266
- try:
267
- loop = asyncio.get_event_loop()
268
- except RuntimeError:
269
- loop = asyncio.new_event_loop()
270
- asyncio.set_event_loop(loop)
271
-
272
- return loop.run_until_complete(self._async_wrapper(coro))
272
+ return asyncio.run(self._async_wrapper(coro))
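Note on the `_run_async` change above: `asyncio.run` creates a fresh event loop for each call and closes it afterwards, and it raises `RuntimeError` when an event loop is already running in the current thread. The sketch below illustrates both behaviours with the standard library only. `fake_convert` and `run_sync` are hypothetical stand-ins, not SDK code, and nothing here touches the network.

```python
import asyncio


async def fake_convert(path: str) -> str:
    # Hypothetical stand-in for an SDK coroutine; no network access.
    await asyncio.sleep(0)
    return f"converted:{path}"


def run_sync(coro):
    # Same shape as the new one-line body in the diff: one fresh loop per call.
    return asyncio.run(coro)


async def call_from_running_loop() -> str:
    # asyncio.run() refuses to start while another loop is already running.
    coro = fake_convert("b.pdf")
    try:
        run_sync(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine was never started; close it to avoid a warning
        return type(exc).__name__
    return "no error"


first = run_sync(fake_convert("a.pdf"))
nested = asyncio.run(call_from_running_loop())
print(first)
print(nested)
```

If the same holds for the SDK, the synchronous `DatalabClient` would suit plain scripts, while code that already runs inside an event loop would likely need to await `AsyncDatalabClient` instead.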
273
273
 
274
274
  async def _async_wrapper(self, coro):
275
275
  """Wrapper to ensure session management"""
@@ -11,25 +11,11 @@ import base64
11
11
 
12
12
  @dataclass
13
13
  class ProcessingOptions:
14
- """Options for document processing"""
15
-
16
14
  # Common options
17
15
  max_pages: Optional[int] = None
18
16
  skip_cache: bool = True
19
17
  page_range: Optional[str] = None
20
18
 
21
- # Marker specific options
22
- force_ocr: bool = False
23
- format_lines: bool = False
24
- paginate: bool = False
25
- use_llm: bool = False
26
- strip_existing_ocr: bool = False
27
- disable_image_extraction: bool = False
28
- block_correction_prompt: Optional[str] = None
29
- additional_config: Optional[Dict[str, Any]] = None
30
- page_schema: Optional[Dict[str, Any]] = None
31
- output_format: str = "markdown" # markdown, json, html
32
-
33
19
  def to_form_data(self) -> Dict[str, Any]:
34
20
  """Convert to form data format for API requests"""
35
21
  form_data = {}
@@ -47,6 +33,28 @@ class ProcessingOptions:
47
33
  return form_data
48
34
 
49
35
 
36
+ @dataclass
37
+ class ConvertOptions(ProcessingOptions):
38
+ """Options for marker conversion"""
39
+
40
+ # Marker specific options
41
+ force_ocr: bool = False
42
+ format_lines: bool = False
43
+ paginate: bool = False
44
+ use_llm: bool = False
45
+ strip_existing_ocr: bool = False
46
+ disable_image_extraction: bool = False
47
+ block_correction_prompt: Optional[str] = None
48
+ additional_config: Optional[Dict[str, Any]] = None
49
+ page_schema: Optional[Dict[str, Any]] = None
50
+ output_format: str = "markdown" # markdown, json, html
51
+
52
+
53
+ @dataclass
54
+ class OCROptions(ProcessingOptions):
55
+ pass
56
+
57
+
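Note on the option classes added above: the fields are now split between a base dataclass and two subclasses. The sketch below mirrors that shape with local stand-in classes (not imports from the package), an abridged field list, and `to_form_data` left out because its body is elided in the diff. It suggests the practical effect: marker-only fields are accepted by `ConvertOptions` and rejected by `OCROptions`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProcessingOptions:
    # Common options (defaults copied from the diff)
    max_pages: Optional[int] = None
    skip_cache: bool = True
    page_range: Optional[str] = None


@dataclass
class ConvertOptions(ProcessingOptions):
    # Marker-specific options (abridged for the sketch)
    force_ocr: bool = False
    use_llm: bool = False
    output_format: str = "markdown"


@dataclass
class OCROptions(ProcessingOptions):
    pass


convert_opts = ConvertOptions(max_pages=2, output_format="html")
ocr_field_names = [f.name for f in fields(OCROptions)]

# A marker-only keyword is not a field of OCROptions, so the
# generated __init__ rejects it with TypeError.
try:
    OCROptions(force_ocr=True)
    ocr_accepts_force_ocr = True
except TypeError:
    ocr_accepts_force_ocr = False

print(convert_opts)
print(ocr_field_names)
print(ocr_accepts_force_ocr)
```

Judging by the `__init__.py` hunk, `ProcessingOptions` is no longer exported from the top-level `datalab_sdk` package, although `datalab_sdk.models` still defines it. Callers that imported it from the top level would need to switch to `ConvertOptions` or `OCROptions`.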
50
58
  @dataclass
51
59
  class ConversionResult:
52
60
  """Result from document conversion (marker endpoint)"""
@@ -6,6 +6,7 @@ class Settings(BaseSettings):
6
6
  # Paths
7
7
  BASE_DIR: str = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
8
8
  LOGLEVEL: str = "DEBUG"
9
+ VERSION: str = "0.1.3"
9
10
 
10
11
  # Base settings
11
12
  DATALAB_API_KEY: str | None = None
@@ -0,0 +1,36 @@
1
+ # Integration Tests
2
+
3
+ This directory contains integration tests that run against the live Datalab API.
4
+
5
+ ## Setup
6
+
7
+ 1. **Set your API key** as an environment variable:
8
+ ```bash
9
+ export DATALAB_API_KEY="your_api_key_here"
10
+ ```
11
+
12
+ 2. **Optional: Set custom base URL** if testing against a different server:
13
+ ```bash
14
+ export DATALAB_BASE_URL="https://custom.datalab.to"
15
+ ```
16
+
17
+
18
+ ## Running the Tests
19
+
20
+ Run all integration tests:
21
+ ```bash
22
+ pytest integration/ -v
23
+ ```
24
+
25
+ Run specific test classes:
26
+ ```bash
27
+ pytest integration/test_live_api.py::TestMarkerIntegration -v
28
+ pytest integration/test_live_api.py::TestOCRIntegration -v
29
+ ```
30
+
31
+ Run individual tests:
32
+ ```bash
33
+ pytest integration/test_live_api.py::TestMarkerIntegration::test_convert_pdf_basic -v
34
+ ```
35
+
36
+ Set `-n 4` to run 4 parallel test workers.
@@ -14,7 +14,7 @@ import pytest
14
14
  import os
15
15
  from pathlib import Path
16
16
  from datalab_sdk import DatalabClient, AsyncDatalabClient
17
- from datalab_sdk.models import ProcessingOptions, ConversionResult, OCRResult
17
+ from datalab_sdk.models import ConversionResult, OCRResult, ConvertOptions, OCROptions
18
18
  from datalab_sdk.exceptions import DatalabError
19
19
 
20
20
  # Test data files
@@ -32,7 +32,7 @@ class TestMarkerIntegration:
32
32
  pdf_file = DATA_DIR / "adversarial.pdf"
33
33
 
34
34
  # Convert with limited pages to keep test fast
35
- options = ProcessingOptions(max_pages=2)
35
+ options = ConvertOptions(max_pages=2)
36
36
  result = client.convert(pdf_file, options=options)
37
37
 
38
38
  # Verify result
@@ -52,7 +52,7 @@ class TestMarkerIntegration:
52
52
  doc_file = DATA_DIR / "bid_evaluation.docx"
53
53
 
54
54
  # Convert to HTML format
55
- options = ProcessingOptions(output_format="html", max_pages=1)
55
+ options = ConvertOptions(output_format="html", max_pages=1)
56
56
  result = client.convert(doc_file, options=options)
57
57
 
58
58
  # Verify result
@@ -70,7 +70,7 @@ class TestMarkerIntegration:
70
70
  ppt_file = DATA_DIR / "08-Lambda-Calculus.pptx"
71
71
 
72
72
  # Convert to JSON format
73
- options = ProcessingOptions(output_format="json", max_pages=1)
73
+ options = ConvertOptions(output_format="json", max_pages=1)
74
74
  result = await client.convert(ppt_file, options=options)
75
75
 
76
76
  # Verify result
@@ -94,7 +94,7 @@ class TestOCRIntegration:
94
94
  pdf_file = DATA_DIR / "thinkpython.pdf"
95
95
 
96
96
  # OCR with limited pages
97
- options = ProcessingOptions(max_pages=1)
97
+ options = OCROptions(max_pages=1)
98
98
  result = client.ocr(pdf_file, options)
99
99
 
100
100
  # Verify result
@@ -149,7 +149,7 @@ class TestOCRIntegration:
149
149
  pdf_file = DATA_DIR / "adversarial.pdf"
150
150
 
151
151
  # OCR with limited pages
152
- options = ProcessingOptions(max_pages=2)
152
+ options = OCROptions(max_pages=2)
153
153
  result = await client.ocr(pdf_file, options)
154
154
 
155
155
  # Verify result
@@ -223,7 +223,7 @@ class TestSaveOutput:
223
223
  output_path = tmp_path / "test_output"
224
224
 
225
225
  # Convert with save_output
226
- options = ProcessingOptions(max_pages=1)
226
+ options = ConvertOptions(max_pages=1)
227
227
  result = client.convert(pdf_file, options=options, save_output=output_path)
228
228
 
229
229
  # Verify result
@@ -15,7 +15,7 @@ import json
15
15
  import tempfile
16
16
  from pathlib import Path
17
17
  from datalab_sdk import DatalabClient, AsyncDatalabClient
18
- from datalab_sdk.models import ProcessingOptions
18
+ from datalab_sdk.models import ConvertOptions, OCROptions
19
19
  from datalab_sdk.settings import settings
20
20
 
21
21
  # Test data files
@@ -34,7 +34,7 @@ class TestBasicUsageExamples:
34
34
 
35
35
  # Convert PDF to markdown (using test data)
36
36
  result = client.convert(
37
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
37
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
38
38
  )
39
39
  print(result.markdown)
40
40
 
@@ -61,7 +61,7 @@ class TestBasicUsageExamples:
61
61
  async with AsyncDatalabClient() as client:
62
62
  # Convert PDF to markdown
63
63
  result = await client.convert(
64
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
64
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
65
65
  )
66
66
  print(result.markdown)
67
67
 
@@ -85,19 +85,19 @@ class TestAPIMethodExamples:
85
85
 
86
86
  def test_document_conversion_examples(self):
87
87
  """Test Document Conversion section examples"""
88
- from datalab_sdk import DatalabClient, ProcessingOptions
88
+ from datalab_sdk import DatalabClient, ConvertOptions
89
89
 
90
90
  client = DatalabClient()
91
91
 
92
92
  # Basic conversion
93
93
  result = client.convert(
94
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
94
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
95
95
  )
96
96
  assert result.success is True
97
97
  assert result.markdown is not None
98
98
 
99
99
  # With options
100
- options = ProcessingOptions(
100
+ options = ConvertOptions(
101
101
  force_ocr=True,
102
102
  output_format="html",
103
103
  use_llm=False, # Keep false for cost reasons
@@ -112,7 +112,7 @@ class TestAPIMethodExamples:
112
112
  output_path = Path(tmp_dir) / "result"
113
113
  result = client.convert(
114
114
  DATA_DIR / "adversarial.pdf",
115
- options=ProcessingOptions(max_pages=1),
115
+ options=ConvertOptions(max_pages=1),
116
116
  save_output=output_path,
117
117
  )
118
118
  assert result.success is True
@@ -132,7 +132,7 @@ class TestAPIMethodExamples:
132
132
  assert isinstance(text, str)
133
133
 
134
134
  # OCR with options
135
- options = ProcessingOptions(max_pages=1)
135
+ options = OCROptions(max_pages=1)
136
136
  result = client.ocr(DATA_DIR / "adversarial.pdf", options)
137
137
  assert result.success is True
138
138
  assert len(result.pages) > 0
@@ -158,7 +158,7 @@ class TestErrorHandlingExamples:
158
158
  # Test with valid file (should not raise error)
159
159
  try:
160
160
  result = client.convert(
161
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
161
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
162
162
  )
163
163
  assert result.success is True
164
164
  except DatalabAPIError as e:
@@ -171,7 +171,7 @@ class TestErrorHandlingExamples:
171
171
 
172
172
  with pytest.raises(DatalabAPIError):
173
173
  invalid_client.convert(
174
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
174
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
175
175
  )
176
176
 
177
177
 
@@ -180,10 +180,10 @@ class TestExamplesSectionFromReadme:
180
180
 
181
181
  def test_extract_json_data_example(self):
182
182
  """Test Extract JSON Data example"""
183
- from datalab_sdk import DatalabClient, ProcessingOptions
183
+ from datalab_sdk import DatalabClient, ConvertOptions
184
184
 
185
185
  client = DatalabClient()
186
- options = ProcessingOptions(output_format="json", max_pages=1)
186
+ options = ConvertOptions(output_format="json", max_pages=1)
187
187
  result = client.convert(DATA_DIR / "adversarial.pdf", options=options)
188
188
 
189
189
  # Parse JSON to find equations (modified to not fail if no equations)
@@ -221,7 +221,7 @@ class TestExamplesSectionFromReadme:
221
221
  if file.suffix == ".pdf":
222
222
  result = await client.convert(
223
223
  str(file),
224
- options=ProcessingOptions(max_pages=1),
224
+ options=ConvertOptions(max_pages=1),
225
225
  save_output=output_path,
226
226
  )
227
227
  print(f"{file.name}: {result.page_count} pages")
@@ -247,7 +247,7 @@ class TestClientInitializationVariations:
247
247
 
248
248
  client = DatalabClient()
249
249
  result = client.convert(
250
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
250
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
251
251
  )
252
252
  assert result.success is True
253
253
 
@@ -262,7 +262,7 @@ class TestClientInitializationVariations:
262
262
  # Client should use environment variable
263
263
  client = DatalabClient()
264
264
  result = client.convert(
265
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
265
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
266
266
  )
267
267
  assert result.success is True
268
268
  finally:
@@ -280,7 +280,7 @@ class TestClientInitializationVariations:
280
280
  api_key=settings.DATALAB_API_KEY, base_url=settings.DATALAB_HOST
281
281
  ) as client:
282
282
  result = await client.convert(
283
- DATA_DIR / "adversarial.pdf", options=ProcessingOptions(max_pages=1)
283
+ DATA_DIR / "adversarial.pdf", options=ConvertOptions(max_pages=1)
284
284
  )
285
285
  assert result.success is True
286
286
  assert result.markdown is not None
@@ -291,9 +291,9 @@ class TestProcessingOptionsVariations:
291
291
 
292
292
  def test_processing_options_defaults(self):
293
293
  """Test ProcessingOptions with default values"""
294
- from datalab_sdk import ProcessingOptions
294
+ from datalab_sdk import ConvertOptions
295
295
 
296
- options = ProcessingOptions()
296
+ options = ConvertOptions()
297
297
  assert options.force_ocr is False
298
298
  assert options.output_format == "markdown"
299
299
  assert options.use_llm is False
@@ -301,9 +301,9 @@ class TestProcessingOptionsVariations:
301
301
 
302
302
  def test_processing_options_custom_values(self):
303
303
  """Test ProcessingOptions with custom values"""
304
- from datalab_sdk import ProcessingOptions
304
+ from datalab_sdk import ConvertOptions
305
305
 
306
- options = ProcessingOptions(
306
+ options = ConvertOptions(
307
307
  force_ocr=True,
308
308
  output_format="html",
309
309
  use_llm=False, # Keep false for cost reasons
@@ -319,9 +319,9 @@ class TestProcessingOptionsVariations:
319
319
 
320
320
  def test_processing_options_json_output(self):
321
321
  """Test ProcessingOptions with JSON output"""
322
- from datalab_sdk import ProcessingOptions
322
+ from datalab_sdk import ConvertOptions
323
323
 
324
- options = ProcessingOptions(output_format="json", max_pages=1)
324
+ options = ConvertOptions(output_format="json", max_pages=1)
325
325
 
326
326
  client = DatalabClient()
327
327
  result = client.convert(DATA_DIR / "adversarial.pdf", options=options)
@@ -1,24 +1,29 @@
1
1
  [project]
2
2
  name = "datalab-python-sdk"
3
- version = "0.1.1"
4
- description = "Auto-generated SDK for Datalab API"
3
+ authors = [
4
+ {name = "Datalab Team", email = "hi@datalab.to"}
5
+ ]
6
+ readme = "README.md"
7
+ license = "MIT"
8
+ repository = "https://github.com/datalab-to/sdk"
9
+ keywords = ["datalab", "sdk", "document-intelligence", "api"]
10
+ version = "0.1.3"
11
+ description = "SDK for the Datalab document intelligence API"
5
12
  requires-python = ">=3.10"
6
13
  dependencies = [
7
14
  "aiohttp>=3.12.14",
8
15
  "click>=8.2.1",
9
- "pydantic (>=2.11.7,<3.0.0)",
10
- "pydantic-settings (>=2.10.1,<3.0.0)",
11
- "pytest-asyncio>=1.0.0",
16
+ "pydantic>=2.11.7,<3.0.0",
17
+ "pydantic-settings>=2.10.1,<3.0.0",
12
18
  ]
13
19
 
14
-
15
20
  [project.scripts]
16
21
  datalab = "datalab_sdk.cli:cli"
17
22
 
18
- [project.optional-dependencies]
23
+ [project.dev-dependencies]
19
24
  test = [
20
25
  "pytest>=7.4.0",
21
- "pytest-asyncio>=0.21.0",
26
+ "pytest-asyncio>=1.0.0",
22
27
  "pytest-mock>=3.11.0",
23
28
  "pytest-cov>=4.1.0",
24
29
  "aiofiles>=23.2.0",
@@ -33,7 +38,11 @@ packages = ["datalab_sdk"]
33
38
 
34
39
  [dependency-groups]
35
40
  dev = [
41
+ "aiohttp>=3.12.14",
42
+ "click>=8.2.1",
36
43
  "pre-commit>=4.2.0",
37
44
  "pytest>=8.4.1",
45
+ "pytest-asyncio>=1.0.0",
46
+ "pytest-xdist>=3.8.0",
38
47
  "ruff>=0.12.2",
39
48
  ]
@@ -10,8 +10,7 @@ from aiohttp.test_utils import TestServer
10
10
  import json
11
11
  import tempfile
12
12
 
13
- from datalab_sdk import DatalabClient, AsyncDatalabClient
14
- from datalab_sdk.models import ProcessingOptions
13
+ from datalab_sdk import DatalabClient, AsyncDatalabClient, ConvertOptions
15
14
 
16
15
 
17
16
  @pytest.fixture
@@ -144,7 +143,7 @@ async def mock_async_client(mock_server):
144
143
  @pytest.fixture
145
144
  def processing_options():
146
145
  """Create sample processing options"""
147
- return ProcessingOptions(
146
+ return ConvertOptions(
148
147
  force_ocr=True, output_format="markdown", use_llm=False, max_pages=10
149
148
  )
150
149
 
@@ -7,7 +7,7 @@ from unittest.mock import patch, AsyncMock
7
7
  import json
8
8
 
9
9
  from datalab_sdk import DatalabClient, AsyncDatalabClient
10
- from datalab_sdk.models import ProcessingOptions, ConversionResult, OCRResult
10
+ from datalab_sdk.models import ConversionResult, OCRResult, ConvertOptions, OCROptions
11
11
  from datalab_sdk.exceptions import DatalabAPIError, DatalabFileError
12
12
 
13
13
 
@@ -124,7 +124,7 @@ class TestConvertMethod:
124
124
  pdf_file.write_bytes(b"%PDF-1.4\n%Test PDF content\n%%EOF\n")
125
125
 
126
126
  # Create processing options
127
- options = ProcessingOptions(
127
+ options = ConvertOptions(
128
128
  force_ocr=True, output_format="html", use_llm=True, max_pages=5
129
129
  )
130
130
 
@@ -338,7 +338,7 @@ class TestOCRMethod:
338
338
  mock_request.return_value = mock_initial_response
339
339
  mock_poll.return_value = mock_result_response
340
340
 
341
- options = ProcessingOptions(
341
+ options = OCROptions(
342
342
  max_pages=2,
343
343
  )
344
344
 
@@ -1,19 +1,6 @@
1
1
  version = 1
2
2
  revision = 2
3
3
  requires-python = ">=3.10"
4
- resolution-markers = [
5
- "python_full_version >= '3.11'",
6
- "python_full_version < '3.11'",
7
- ]
8
-
9
- [[package]]
10
- name = "aiofiles"
11
- version = "24.1.0"
12
- source = { registry = "https://pypi.org/simple" }
13
- sdist = { url = "https://files.pythonhosted.org/packages/0b/03/a88171e277e8caa88a4c77808c20ebb04ba74cc4681bf1e9416c862de237/aiofiles-24.1.0.tar.gz", hash = "sha256:22a075c9e5a3810f0c2e48f3008c94d68c65d763b9b03857924c99e57355166c", size = 30247, upload-time = "2024-06-24T11:02:03.584Z" }
14
- wheels = [
15
- { url = "https://files.pythonhosted.org/packages/a5/45/30bb92d442636f570cb5651bc661f52b610e2eec3f891a5dc3a4c3667db0/aiofiles-24.1.0-py3-none-any.whl", hash = "sha256:b4ec55f4195e3eb5d7abd1bf7e061763e864dd4954231fb8539a0ef8bb8260e5", size = 15896, upload-time = "2024-06-24T11:02:01.529Z" },
16
- ]
17
4
 
18
5
  [[package]]
19
6
  name = "aiohappyeyeballs"
@@ -180,122 +167,44 @@ wheels = [
180
167
  { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
181
168
  ]
182
169
 
183
- [[package]]
184
- name = "coverage"
185
- version = "7.9.2"
186
- source = { registry = "https://pypi.org/simple" }
187
- sdist = { url = "https://files.pythonhosted.org/packages/04/b7/c0465ca253df10a9e8dae0692a4ae6e9726d245390aaef92360e1d6d3832/coverage-7.9.2.tar.gz", hash = "sha256:997024fa51e3290264ffd7492ec97d0690293ccd2b45a6cd7d82d945a4a80c8b", size = 813556, upload-time = "2025-07-03T10:54:15.101Z" }
188
- wheels = [
189
- { url = "https://files.pythonhosted.org/packages/a1/0d/5c2114fd776c207bd55068ae8dc1bef63ecd1b767b3389984a8e58f2b926/coverage-7.9.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:66283a192a14a3854b2e7f3418d7db05cdf411012ab7ff5db98ff3b181e1f912", size = 212039, upload-time = "2025-07-03T10:52:38.955Z" },
190
- { url = "https://files.pythonhosted.org/packages/cf/ad/dc51f40492dc2d5fcd31bb44577bc0cc8920757d6bc5d3e4293146524ef9/coverage-7.9.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4e01d138540ef34fcf35c1aa24d06c3de2a4cffa349e29a10056544f35cca15f", size = 212428, upload-time = "2025-07-03T10:52:41.36Z" },
191
- { url = "https://files.pythonhosted.org/packages/a2/a3/55cb3ff1b36f00df04439c3993d8529193cdf165a2467bf1402539070f16/coverage-7.9.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f22627c1fe2745ee98d3ab87679ca73a97e75ca75eb5faee48660d060875465f", size = 241534, upload-time = "2025-07-03T10:52:42.956Z" },
192
- { url = "https://files.pythonhosted.org/packages/eb/c9/a8410b91b6be4f6e9c2e9f0dce93749b6b40b751d7065b4410bf89cb654b/coverage-7.9.2-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4b1c2d8363247b46bd51f393f86c94096e64a1cf6906803fa8d5a9d03784bdbf", size = 239408, upload-time = "2025-07-03T10:52:44.199Z" },
193
- { url = "https://files.pythonhosted.org/packages/ff/c4/6f3e56d467c612b9070ae71d5d3b114c0b899b5788e1ca3c93068ccb7018/coverage-7.9.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c10c882b114faf82dbd33e876d0cbd5e1d1ebc0d2a74ceef642c6152f3f4d547", size = 240552, upload-time = "2025-07-03T10:52:45.477Z" },
194
- { url = "https://files.pythonhosted.org/packages/fd/20/04eda789d15af1ce79bce5cc5fd64057c3a0ac08fd0576377a3096c24663/coverage-7.9.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:de3c0378bdf7066c3988d66cd5232d161e933b87103b014ab1b0b4676098fa45", size = 240464, upload-time = "2025-07-03T10:52:46.809Z" },
195
- { url = "https://files.pythonhosted.org/packages/a9/5a/217b32c94cc1a0b90f253514815332d08ec0812194a1ce9cca97dda1cd20/coverage-7.9.2-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:1e2f097eae0e5991e7623958a24ced3282676c93c013dde41399ff63e230fcf2", size = 239134, upload-time = "2025-07-03T10:52:48.149Z" },
196
- { url = "https://files.pythonhosted.org/packages/34/73/1d019c48f413465eb5d3b6898b6279e87141c80049f7dbf73fd020138549/coverage-7.9.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:28dc1f67e83a14e7079b6cea4d314bc8b24d1aed42d3582ff89c0295f09b181e", size = 239405, upload-time = "2025-07-03T10:52:49.687Z" },
197
- { url = "https://files.pythonhosted.org/packages/49/6c/a2beca7aa2595dad0c0d3f350382c381c92400efe5261e2631f734a0e3fe/coverage-7.9.2-cp310-cp310-win32.whl", hash = "sha256:bf7d773da6af9e10dbddacbf4e5cab13d06d0ed93561d44dae0188a42c65be7e", size = 214519, upload-time = "2025-07-03T10:52:51.036Z" },
198
- { url = "https://files.pythonhosted.org/packages/fc/c8/91e5e4a21f9a51e2c7cdd86e587ae01a4fcff06fc3fa8cde4d6f7cf68df4/coverage-7.9.2-cp310-cp310-win_amd64.whl", hash = "sha256:0c0378ba787681ab1897f7c89b415bd56b0b2d9a47e5a3d8dc0ea55aac118d6c", size = 215400, upload-time = "2025-07-03T10:52:52.313Z" },
199
- { url = "https://files.pythonhosted.org/packages/39/40/916786453bcfafa4c788abee4ccd6f592b5b5eca0cd61a32a4e5a7ef6e02/coverage-7.9.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:a7a56a2964a9687b6aba5b5ced6971af308ef6f79a91043c05dd4ee3ebc3e9ba", size = 212152, upload-time = "2025-07-03T10:52:53.562Z" },
200
- { url = "https://files.pythonhosted.org/packages/9f/66/cc13bae303284b546a030762957322bbbff1ee6b6cb8dc70a40f8a78512f/coverage-7.9.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:123d589f32c11d9be7fe2e66d823a236fe759b0096f5db3fb1b75b2fa414a4fa", size = 212540, upload-time = "2025-07-03T10:52:55.196Z" },
201
- { url = "https://files.pythonhosted.org/packages/0f/3c/d56a764b2e5a3d43257c36af4a62c379df44636817bb5f89265de4bf8bd7/coverage-7.9.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:333b2e0ca576a7dbd66e85ab402e35c03b0b22f525eed82681c4b866e2e2653a", size = 245097, upload-time = "2025-07-03T10:52:56.509Z" },
202
- { url = "https://files.pythonhosted.org/packages/b1/46/bd064ea8b3c94eb4ca5d90e34d15b806cba091ffb2b8e89a0d7066c45791/coverage-7.9.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:326802760da234baf9f2f85a39e4a4b5861b94f6c8d95251f699e4f73b1835dc", size = 242812, upload-time = "2025-07-03T10:52:57.842Z" },
203
- { url = "https://files.pythonhosted.org/packages/43/02/d91992c2b29bc7afb729463bc918ebe5f361be7f1daae93375a5759d1e28/coverage-7.9.2-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:19e7be4cfec248df38ce40968c95d3952fbffd57b400d4b9bb580f28179556d2", size = 244617, upload-time = "2025-07-03T10:52:59.239Z" },
204
- { url = "https://files.pythonhosted.org/packages/b7/4f/8fadff6bf56595a16d2d6e33415841b0163ac660873ed9a4e9046194f779/coverage-7.9.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:0b4a4cb73b9f2b891c1788711408ef9707666501ba23684387277ededab1097c", size = 244263, upload-time = "2025-07-03T10:53:00.601Z" },
205
- { url = "https://files.pythonhosted.org/packages/9b/d2/e0be7446a2bba11739edb9f9ba4eff30b30d8257370e237418eb44a14d11/coverage-7.9.2-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:2c8937fa16c8c9fbbd9f118588756e7bcdc7e16a470766a9aef912dd3f117dbd", size = 242314, upload-time = "2025-07-03T10:53:01.932Z" },
206
- { url = "https://files.pythonhosted.org/packages/9d/7d/dcbac9345000121b8b57a3094c2dfcf1ccc52d8a14a40c1d4bc89f936f80/coverage-7.9.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:42da2280c4d30c57a9b578bafd1d4494fa6c056d4c419d9689e66d775539be74", size = 242904, upload-time = "2025-07-03T10:53:03.478Z" },
207
- { url = "https://files.pythonhosted.org/packages/41/58/11e8db0a0c0510cf31bbbdc8caf5d74a358b696302a45948d7c768dfd1cf/coverage-7.9.2-cp311-cp311-win32.whl", hash = "sha256:14fa8d3da147f5fdf9d298cacc18791818f3f1a9f542c8958b80c228320e90c6", size = 214553, upload-time = "2025-07-03T10:53:05.174Z" },
208
- { url = "https://files.pythonhosted.org/packages/3a/7d/751794ec8907a15e257136e48dc1021b1f671220ecccfd6c4eaf30802714/coverage-7.9.2-cp311-cp311-win_amd64.whl", hash = "sha256:549cab4892fc82004f9739963163fd3aac7a7b0df430669b75b86d293d2df2a7", size = 215441, upload-time = "2025-07-03T10:53:06.472Z" },
209
- { url = "https://files.pythonhosted.org/packages/62/5b/34abcedf7b946c1c9e15b44f326cb5b0da852885312b30e916f674913428/coverage-7.9.2-cp311-cp311-win_arm64.whl", hash = "sha256:c2667a2b913e307f06aa4e5677f01a9746cd08e4b35e14ebcde6420a9ebb4c62", size = 213873, upload-time = "2025-07-03T10:53:07.699Z" },
210
- { url = "https://files.pythonhosted.org/packages/53/d7/7deefc6fd4f0f1d4c58051f4004e366afc9e7ab60217ac393f247a1de70a/coverage-7.9.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ae9eb07f1cfacd9cfe8eaee6f4ff4b8a289a668c39c165cd0c8548484920ffc0", size = 212344, upload-time = "2025-07-03T10:53:09.3Z" },
211
- { url = "https://files.pythonhosted.org/packages/95/0c/ee03c95d32be4d519e6a02e601267769ce2e9a91fc8faa1b540e3626c680/coverage-7.9.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:9ce85551f9a1119f02adc46d3014b5ee3f765deac166acf20dbb851ceb79b6f3", size = 212580, upload-time = "2025-07-03T10:53:11.52Z" },
212
- { url = "https://files.pythonhosted.org/packages/8b/9f/826fa4b544b27620086211b87a52ca67592622e1f3af9e0a62c87aea153a/coverage-7.9.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f8f6389ac977c5fb322e0e38885fbbf901743f79d47f50db706e7644dcdcb6e1", size = 246383, upload-time = "2025-07-03T10:53:13.134Z" },
213
- { url = "https://files.pythonhosted.org/packages/7f/b3/4477aafe2a546427b58b9c540665feff874f4db651f4d3cb21b308b3a6d2/coverage-7.9.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ff0d9eae8cdfcd58fe7893b88993723583a6ce4dfbfd9f29e001922544f95615", size = 243400, upload-time = "2025-07-03T10:53:14.614Z" },
214
- { url = "https://files.pythonhosted.org/packages/f8/c2/efffa43778490c226d9d434827702f2dfbc8041d79101a795f11cbb2cf1e/coverage-7.9.2-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fae939811e14e53ed8a9818dad51d434a41ee09df9305663735f2e2d2d7d959b", size = 245591, upload-time = "2025-07-03T10:53:15.872Z" },
215
- { url = "https://files.pythonhosted.org/packages/c6/e7/a59888e882c9a5f0192d8627a30ae57910d5d449c80229b55e7643c078c4/coverage-7.9.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:31991156251ec202c798501e0a42bbdf2169dcb0f137b1f5c0f4267f3fc68ef9", size = 245402, upload-time = "2025-07-03T10:53:17.124Z" },
216
- { url = "https://files.pythonhosted.org/packages/92/a5/72fcd653ae3d214927edc100ce67440ed8a0a1e3576b8d5e6d066ed239db/coverage-7.9.2-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:d0d67963f9cbfc7c7f96d4ac74ed60ecbebd2ea6eeb51887af0f8dce205e545f", size = 243583, upload-time = "2025-07-03T10:53:18.781Z" },
217
- { url = "https://files.pythonhosted.org/packages/5c/f5/84e70e4df28f4a131d580d7d510aa1ffd95037293da66fd20d446090a13b/coverage-7.9.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:49b752a2858b10580969ec6af6f090a9a440a64a301ac1528d7ca5f7ed497f4d", size = 244815, upload-time = "2025-07-03T10:53:20.168Z" },
218
- { url = "https://files.pythonhosted.org/packages/39/e7/d73d7cbdbd09fdcf4642655ae843ad403d9cbda55d725721965f3580a314/coverage-7.9.2-cp312-cp312-win32.whl", hash = "sha256:88d7598b8ee130f32f8a43198ee02edd16d7f77692fa056cb779616bbea1b355", size = 214719, upload-time = "2025-07-03T10:53:21.521Z" },
219
- { url = "https://files.pythonhosted.org/packages/9f/d6/7486dcc3474e2e6ad26a2af2db7e7c162ccd889c4c68fa14ea8ec189c9e9/coverage-7.9.2-cp312-cp312-win_amd64.whl", hash = "sha256:9dfb070f830739ee49d7c83e4941cc767e503e4394fdecb3b54bfdac1d7662c0", size = 215509, upload-time = "2025-07-03T10:53:22.853Z" },
220
- { url = "https://files.pythonhosted.org/packages/b7/34/0439f1ae2593b0346164d907cdf96a529b40b7721a45fdcf8b03c95fcd90/coverage-7.9.2-cp312-cp312-win_arm64.whl", hash = "sha256:4e2c058aef613e79df00e86b6d42a641c877211384ce5bd07585ed7ba71ab31b", size = 213910, upload-time = "2025-07-03T10:53:24.472Z" },
221
- { url = "https://files.pythonhosted.org/packages/94/9d/7a8edf7acbcaa5e5c489a646226bed9591ee1c5e6a84733c0140e9ce1ae1/coverage-7.9.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:985abe7f242e0d7bba228ab01070fde1d6c8fa12f142e43debe9ed1dde686038", size = 212367, upload-time = "2025-07-03T10:53:25.811Z" },
222
- { url = "https://files.pythonhosted.org/packages/e8/9e/5cd6f130150712301f7e40fb5865c1bc27b97689ec57297e568d972eec3c/coverage-7.9.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:82c3939264a76d44fde7f213924021ed31f55ef28111a19649fec90c0f109e6d", size = 212632, upload-time = "2025-07-03T10:53:27.075Z" },
223
- { url = "https://files.pythonhosted.org/packages/a8/de/6287a2c2036f9fd991c61cefa8c64e57390e30c894ad3aa52fac4c1e14a8/coverage-7.9.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ae5d563e970dbe04382f736ec214ef48103d1b875967c89d83c6e3f21706d5b3", size = 245793, upload-time = "2025-07-03T10:53:28.408Z" },
224
- { url = "https://files.pythonhosted.org/packages/06/cc/9b5a9961d8160e3cb0b558c71f8051fe08aa2dd4b502ee937225da564ed1/coverage-7.9.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bdd612e59baed2a93c8843c9a7cb902260f181370f1d772f4842987535071d14", size = 243006, upload-time = "2025-07-03T10:53:29.754Z" },
225
- { url = "https://files.pythonhosted.org/packages/49/d9/4616b787d9f597d6443f5588619c1c9f659e1f5fc9eebf63699eb6d34b78/coverage-7.9.2-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:256ea87cb2a1ed992bcdfc349d8042dcea1b80436f4ddf6e246d6bee4b5d73b6", size = 244990, upload-time = "2025-07-03T10:53:31.098Z" },
226
- { url = "https://files.pythonhosted.org/packages/48/83/801cdc10f137b2d02b005a761661649ffa60eb173dcdaeb77f571e4dc192/coverage-7.9.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f44ae036b63c8ea432f610534a2668b0c3aee810e7037ab9d8ff6883de480f5b", size = 245157, upload-time = "2025-07-03T10:53:32.717Z" },
227
- { url = "https://files.pythonhosted.org/packages/c8/a4/41911ed7e9d3ceb0ffb019e7635468df7499f5cc3edca5f7dfc078e9c5ec/coverage-7.9.2-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:82d76ad87c932935417a19b10cfe7abb15fd3f923cfe47dbdaa74ef4e503752d", size = 243128, upload-time = "2025-07-03T10:53:34.009Z" },
228
- { url = "https://files.pythonhosted.org/packages/10/41/344543b71d31ac9cb00a664d5d0c9ef134a0fe87cb7d8430003b20fa0b7d/coverage-7.9.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:619317bb86de4193debc712b9e59d5cffd91dc1d178627ab2a77b9870deb2868", size = 244511, upload-time = "2025-07-03T10:53:35.434Z" },
229
- { url = "https://files.pythonhosted.org/packages/d5/81/3b68c77e4812105e2a060f6946ba9e6f898ddcdc0d2bfc8b4b152a9ae522/coverage-7.9.2-cp313-cp313-win32.whl", hash = "sha256:0a07757de9feb1dfafd16ab651e0f628fd7ce551604d1bf23e47e1ddca93f08a", size = 214765, upload-time = "2025-07-03T10:53:36.787Z" },
230
- { url = "https://files.pythonhosted.org/packages/06/a2/7fac400f6a346bb1a4004eb2a76fbff0e242cd48926a2ce37a22a6a1d917/coverage-7.9.2-cp313-cp313-win_amd64.whl", hash = "sha256:115db3d1f4d3f35f5bb021e270edd85011934ff97c8797216b62f461dd69374b", size = 215536, upload-time = "2025-07-03T10:53:38.188Z" },
231
- { url = "https://files.pythonhosted.org/packages/08/47/2c6c215452b4f90d87017e61ea0fd9e0486bb734cb515e3de56e2c32075f/coverage-7.9.2-cp313-cp313-win_arm64.whl", hash = "sha256:48f82f889c80af8b2a7bb6e158d95a3fbec6a3453a1004d04e4f3b5945a02694", size = 213943, upload-time = "2025-07-03T10:53:39.492Z" },
232
- { url = "https://files.pythonhosted.org/packages/a3/46/e211e942b22d6af5e0f323faa8a9bc7c447a1cf1923b64c47523f36ed488/coverage-7.9.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:55a28954545f9d2f96870b40f6c3386a59ba8ed50caf2d949676dac3ecab99f5", size = 213088, upload-time = "2025-07-03T10:53:40.874Z" },
233
- { url = "https://files.pythonhosted.org/packages/d2/2f/762551f97e124442eccd907bf8b0de54348635b8866a73567eb4e6417acf/coverage-7.9.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:cdef6504637731a63c133bb2e6f0f0214e2748495ec15fe42d1e219d1b133f0b", size = 213298, upload-time = "2025-07-03T10:53:42.218Z" },
234
- { url = "https://files.pythonhosted.org/packages/7a/b7/76d2d132b7baf7360ed69be0bcab968f151fa31abe6d067f0384439d9edb/coverage-7.9.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bcd5ebe66c7a97273d5d2ddd4ad0ed2e706b39630ed4b53e713d360626c3dbb3", size = 256541, upload-time = "2025-07-03T10:53:43.823Z" },
235
- { url = "https://files.pythonhosted.org/packages/a0/17/392b219837d7ad47d8e5974ce5f8dc3deb9f99a53b3bd4d123602f960c81/coverage-7.9.2-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9303aed20872d7a3c9cb39c5d2b9bdbe44e3a9a1aecb52920f7e7495410dfab8", size = 252761, upload-time = "2025-07-03T10:53:45.19Z" },
236
- { url = "https://files.pythonhosted.org/packages/d5/77/4256d3577fe1b0daa8d3836a1ebe68eaa07dd2cbaf20cf5ab1115d6949d4/coverage-7.9.2-cp313-cp313t-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bc18ea9e417a04d1920a9a76fe9ebd2f43ca505b81994598482f938d5c315f46", size = 254917, upload-time = "2025-07-03T10:53:46.931Z" },
237
- { url = "https://files.pythonhosted.org/packages/53/99/fc1a008eef1805e1ddb123cf17af864743354479ea5129a8f838c433cc2c/coverage-7.9.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6406cff19880aaaadc932152242523e892faff224da29e241ce2fca329866584", size = 256147, upload-time = "2025-07-03T10:53:48.289Z" },
238
- { url = "https://files.pythonhosted.org/packages/92/c0/f63bf667e18b7f88c2bdb3160870e277c4874ced87e21426128d70aa741f/coverage-7.9.2-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:2d0d4f6ecdf37fcc19c88fec3e2277d5dee740fb51ffdd69b9579b8c31e4232e", size = 254261, upload-time = "2025-07-03T10:53:49.99Z" },
239
- { url = "https://files.pythonhosted.org/packages/8c/32/37dd1c42ce3016ff8ec9e4b607650d2e34845c0585d3518b2a93b4830c1a/coverage-7.9.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:c33624f50cf8de418ab2b4d6ca9eda96dc45b2c4231336bac91454520e8d1fac", size = 255099, upload-time = "2025-07-03T10:53:51.354Z" },
240
- { url = "https://files.pythonhosted.org/packages/da/2e/af6b86f7c95441ce82f035b3affe1cd147f727bbd92f563be35e2d585683/coverage-7.9.2-cp313-cp313t-win32.whl", hash = "sha256:1df6b76e737c6a92210eebcb2390af59a141f9e9430210595251fbaf02d46926", size = 215440, upload-time = "2025-07-03T10:53:52.808Z" },
241
- { url = "https://files.pythonhosted.org/packages/4d/bb/8a785d91b308867f6b2e36e41c569b367c00b70c17f54b13ac29bcd2d8c8/coverage-7.9.2-cp313-cp313t-win_amd64.whl", hash = "sha256:f5fd54310b92741ebe00d9c0d1d7b2b27463952c022da6d47c175d246a98d1bd", size = 216537, upload-time = "2025-07-03T10:53:54.273Z" },
242
- { url = "https://files.pythonhosted.org/packages/1d/a0/a6bffb5e0f41a47279fd45a8f3155bf193f77990ae1c30f9c224b61cacb0/coverage-7.9.2-cp313-cp313t-win_arm64.whl", hash = "sha256:c48c2375287108c887ee87d13b4070a381c6537d30e8487b24ec721bf2a781cb", size = 214398, upload-time = "2025-07-03T10:53:56.715Z" },
243
- { url = "https://files.pythonhosted.org/packages/d7/85/f8bbefac27d286386961c25515431482a425967e23d3698b75a250872924/coverage-7.9.2-pp39.pp310.pp311-none-any.whl", hash = "sha256:8a1166db2fb62473285bcb092f586e081e92656c7dfa8e9f62b4d39d7e6b5050", size = 204013, upload-time = "2025-07-03T10:54:12.084Z" },
244
- { url = "https://files.pythonhosted.org/packages/3c/38/bbe2e63902847cf79036ecc75550d0698af31c91c7575352eb25190d0fb3/coverage-7.9.2-py3-none-any.whl", hash = "sha256:e425cd5b00f6fc0ed7cdbd766c70be8baab4b7839e4d4fe5fac48581dd968ea4", size = 204005, upload-time = "2025-07-03T10:54:13.491Z" },
245
- ]
246
-
247
- [package.optional-dependencies]
248
- toml = [
249
- { name = "tomli", marker = "python_full_version <= '3.11'" },
250
- ]
251
-
252
170
  [[package]]
253
171
  name = "datalab-python-sdk"
254
- version = "0.1.1"
172
+ version = "0.1.3"
255
173
  source = { editable = "." }
256
174
  dependencies = [
257
175
  { name = "aiohttp" },
258
176
  { name = "click" },
259
177
  { name = "pydantic" },
260
178
  { name = "pydantic-settings" },
261
- { name = "pytest-asyncio" },
262
- ]
263
-
264
- [package.optional-dependencies]
265
- test = [
266
- { name = "aiofiles" },
267
- { name = "pytest" },
268
- { name = "pytest-asyncio" },
269
- { name = "pytest-cov" },
270
- { name = "pytest-mock" },
271
179
  ]
272
180
 
273
181
  [package.dev-dependencies]
274
182
  dev = [
183
+ { name = "aiohttp" },
184
+ { name = "click" },
275
185
  { name = "pre-commit" },
276
186
  { name = "pytest" },
187
+ { name = "pytest-asyncio" },
188
+ { name = "pytest-xdist" },
277
189
  { name = "ruff" },
278
190
  ]
279
191
 
280
192
  [package.metadata]
281
193
  requires-dist = [
282
- { name = "aiofiles", marker = "extra == 'test'", specifier = ">=23.2.0" },
283
194
  { name = "aiohttp", specifier = ">=3.12.14" },
284
195
  { name = "click", specifier = ">=8.2.1" },
285
196
  { name = "pydantic", specifier = ">=2.11.7,<3.0.0" },
286
197
  { name = "pydantic-settings", specifier = ">=2.10.1,<3.0.0" },
287
- { name = "pytest", marker = "extra == 'test'", specifier = ">=7.4.0" },
288
- { name = "pytest-asyncio", specifier = ">=1.0.0" },
289
- { name = "pytest-asyncio", marker = "extra == 'test'", specifier = ">=0.21.0" },
290
- { name = "pytest-cov", marker = "extra == 'test'", specifier = ">=4.1.0" },
291
- { name = "pytest-mock", marker = "extra == 'test'", specifier = ">=3.11.0" },
292
198
  ]
293
- provides-extras = ["test"]
294
199
 
295
200
  [package.metadata.requires-dev]
296
201
  dev = [
202
+ { name = "aiohttp", specifier = ">=3.12.14" },
203
+ { name = "click", specifier = ">=8.2.1" },
297
204
  { name = "pre-commit", specifier = ">=4.2.0" },
298
205
  { name = "pytest", specifier = ">=8.4.1" },
206
+ { name = "pytest-asyncio", specifier = ">=1.0.0" },
207
+ { name = "pytest-xdist", specifier = ">=3.8.0" },
299
208
  { name = "ruff", specifier = ">=0.12.2" },
300
209
  ]
301
210
 
@@ -313,13 +222,22 @@ name = "exceptiongroup"
313
222
  version = "1.3.0"
314
223
  source = { registry = "https://pypi.org/simple" }
315
224
  dependencies = [
316
- { name = "typing-extensions", marker = "python_full_version < '3.11'" },
225
+ { name = "typing-extensions", marker = "python_full_version < '3.13'" },
317
226
  ]
318
227
  sdist = { url = "https://files.pythonhosted.org/packages/0b/9f/a65090624ecf468cdca03533906e7c69ed7588582240cfe7cc9e770b50eb/exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88", size = 29749, upload-time = "2025-05-10T17:42:51.123Z" }
319
228
  wheels = [
320
229
  { url = "https://files.pythonhosted.org/packages/36/f4/c6e662dade71f56cd2f3735141b265c3c79293c109549c1e6933b0651ffc/exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10", size = 16674, upload-time = "2025-05-10T17:42:49.33Z" },
321
230
  ]
322
231
 
232
+ [[package]]
233
+ name = "execnet"
234
+ version = "2.1.1"
235
+ source = { registry = "https://pypi.org/simple" }
236
+ sdist = { url = "https://files.pythonhosted.org/packages/bb/ff/b4c0dc78fbe20c3e59c0c7334de0c27eb4001a2b2017999af398bf730817/execnet-2.1.1.tar.gz", hash = "sha256:5189b52c6121c24feae288166ab41b32549c7e2348652736540b9e6e7d4e72e3", size = 166524, upload-time = "2024-04-08T09:04:19.245Z" }
237
+ wheels = [
238
+ { url = "https://files.pythonhosted.org/packages/43/09/2aea36ff60d16dd8879bdb2f5b3ee0ba8d08cbbdcdfe870e695ce3784385/execnet-2.1.1-py3-none-any.whl", hash = "sha256:26dee51f1b80cebd6d0ca8e74dd8745419761d3bef34163928cbebbdc4749fdc", size = 40612, upload-time = "2024-04-08T09:04:17.414Z" },
239
+ ]
240
+
323
241
  [[package]]
324
242
  name = "filelock"
325
243
  version = "3.18.0"
@@ -849,29 +767,16 @@ wheels = [
849
767
  ]
850
768
 
851
769
  [[package]]
852
- name = "pytest-cov"
853
- version = "6.2.1"
854
- source = { registry = "https://pypi.org/simple" }
855
- dependencies = [
856
- { name = "coverage", extra = ["toml"] },
857
- { name = "pluggy" },
858
- { name = "pytest" },
859
- ]
860
- sdist = { url = "https://files.pythonhosted.org/packages/18/99/668cade231f434aaa59bbfbf49469068d2ddd945000621d3d165d2e7dd7b/pytest_cov-6.2.1.tar.gz", hash = "sha256:25cc6cc0a5358204b8108ecedc51a9b57b34cc6b8c967cc2c01a4e00d8a67da2", size = 69432, upload-time = "2025-06-12T10:47:47.684Z" }
861
- wheels = [
862
- { url = "https://files.pythonhosted.org/packages/bc/16/4ea354101abb1287856baa4af2732be351c7bee728065aed451b678153fd/pytest_cov-6.2.1-py3-none-any.whl", hash = "sha256:f5bc4c23f42f1cdd23c70b1dab1bbaef4fc505ba950d53e0081d0730dd7e86d5", size = 24644, upload-time = "2025-06-12T10:47:45.932Z" },
863
- ]
864
-
865
- [[package]]
866
- name = "pytest-mock"
867
- version = "3.14.1"
770
+ name = "pytest-xdist"
771
+ version = "3.8.0"
868
772
  source = { registry = "https://pypi.org/simple" }
869
773
  dependencies = [
774
+ { name = "execnet" },
870
775
  { name = "pytest" },
871
776
  ]
872
- sdist = { url = "https://files.pythonhosted.org/packages/71/28/67172c96ba684058a4d24ffe144d64783d2a270d0af0d9e792737bddc75c/pytest_mock-3.14.1.tar.gz", hash = "sha256:159e9edac4c451ce77a5cdb9fc5d1100708d2dd4ba3c3df572f14097351af80e", size = 33241, upload-time = "2025-05-26T13:58:45.167Z" }
777
+ sdist = { url = "https://files.pythonhosted.org/packages/78/b4/439b179d1ff526791eb921115fca8e44e596a13efeda518b9d845a619450/pytest_xdist-3.8.0.tar.gz", hash = "sha256:7e578125ec9bc6050861aa93f2d59f1d8d085595d6551c2c90b6f4fad8d3a9f1", size = 88069, upload-time = "2025-07-01T13:30:59.346Z" }
873
778
  wheels = [
874
- { url = "https://files.pythonhosted.org/packages/b2/05/77b60e520511c53d1c1ca75f1930c7dd8e971d0c4379b7f4b3f9644685ba/pytest_mock-3.14.1-py3-none-any.whl", hash = "sha256:178aefcd11307d874b4cd3100344e7e2d888d9791a6a1d9bfe90fbc1b74fd1d0", size = 9923, upload-time = "2025-05-26T13:58:43.487Z" },
779
+ { url = "https://files.pythonhosted.org/packages/ca/31/d4e37e9e550c2b92a9cbc2e4d0b7420a27224968580b5a447f420847c975/pytest_xdist-3.8.0-py3-none-any.whl", hash = "sha256:202ca578cfeb7370784a8c33d6d05bc6e13b4f25b5053c30a152269fd10f0b88", size = 46396, upload-time = "2025-07-01T13:30:56.632Z" },
875
780
  ]
876
781
 
877
782
  [[package]]
@@ -1,17 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: datalab-python-sdk
3
- Version: 0.1.1
4
- Summary: Auto-generated SDK for Datalab API
5
- License-File: LICENSE
6
- Requires-Python: >=3.10
7
- Requires-Dist: aiohttp>=3.12.14
8
- Requires-Dist: click>=8.2.1
9
- Requires-Dist: pydantic-settings<3.0.0,>=2.10.1
10
- Requires-Dist: pydantic<3.0.0,>=2.11.7
11
- Requires-Dist: pytest-asyncio>=1.0.0
12
- Provides-Extra: test
13
- Requires-Dist: aiofiles>=23.2.0; extra == 'test'
14
- Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
15
- Requires-Dist: pytest-cov>=4.1.0; extra == 'test'
16
- Requires-Dist: pytest-mock>=3.11.0; extra == 'test'
17
- Requires-Dist: pytest>=7.4.0; extra == 'test'
@@ -1,178 +0,0 @@
1
- # Datalab SDK
2
-
3
- A Python SDK for the [Datalab API](https://www.datalab.to) - a document intelligence platform powered by [marker](https://github.com/VikParuchuri/marker) and [surya](https://github.com/VikParuchuri/surya).
4
-
5
- ## Installation
6
-
7
- ```bash
8
- pip install datalab-sdk
9
- ```
10
-
11
- ## Quick Start
12
-
13
- ### Authentication
14
-
15
- Get your API key from [https://www.datalab.to/app/keys](https://www.datalab.to/app/keys):
16
-
17
- ```bash
18
- export DATALAB_API_KEY="your_api_key_here"
19
- ```
20
-
21
- ### Basic Usage
22
-
23
- ```python
24
- from datalab_sdk import DatalabClient
25
-
26
- client = DatalabClient() # use env var from above, or pass api_key="your_api_key_here"
27
-
28
- # Convert PDF to markdown
29
- result = client.convert("document.pdf")
30
- print(result.markdown)
31
-
32
- # OCR a document
33
- ocr_result = client.ocr("document.pdf")
34
- print(ocr_result.get_text()) # Get all text as string
35
- ```
36
-
37
- ### Async Usage
38
-
39
- ```python
40
- import asyncio
41
- from datalab_sdk import AsyncDatalabClient
42
-
43
- async def main():
44
-     async with AsyncDatalabClient(api_key="YOUR_API_KEY") as client:
45
-         # Convert PDF to markdown
46
-         result = await client.convert("document.pdf")
47
-         print(result.markdown)
48
-
49
-         # OCR a document
50
-         ocr_result = await client.ocr("document.pdf")
51
-         print(f"OCR found {len(ocr_result.pages)} pages")
52
-
53
- asyncio.run(main())
54
- ```
55
-
56
- ## API Methods
57
-
58
- ### Document Conversion
59
-
60
- Convert PDFs, Office documents, and images to markdown, HTML, or JSON.
61
-
62
- ```python
63
- # Basic conversion
64
- result = client.convert("document.pdf")
65
-
66
- # With options
67
- from datalab_sdk import ProcessingOptions
68
- options = ProcessingOptions(
69
-     force_ocr=True,
70
-     output_format="html",
71
-     use_llm=True,
72
-     max_pages=10
73
- )
74
- result = client.convert("document.pdf", options=options)
75
-
76
- # Convert and save automatically
77
- result = client.convert("document.pdf", save_output="output/result")
78
- ```
79
-
80
- ### OCR
81
-
82
- Extract text with bounding boxes from documents.
83
-
84
- ```python
85
- # Basic OCR
86
- result = client.ocr("document.pdf")
87
- print(result.get_text())
88
-
89
- # OCR with options
90
- from datalab_sdk import ProcessingOptions
91
- options = ProcessingOptions(
92
-     max_pages=2
93
- )
94
- result = client.ocr("document.pdf", options)
95
-
96
- # OCR and save automatically
97
- result = client.ocr("document.pdf", save_output="output/ocr_result")
98
- ```
99
-
100
- ## CLI Usage
101
-
102
- The SDK includes a command-line interface:
103
-
104
- ```bash
105
- # Convert document to markdown
106
- datalab convert document.pdf
107
-
108
- # OCR with JSON output
109
- datalab ocr document.pdf --output-format json
110
- ```
111
-
112
- ## Error Handling
113
-
114
- ```python
115
- from datalab_sdk import DatalabAPIError, DatalabTimeoutError
116
-
117
- try:
118
-     result = client.convert("document.pdf")
119
- except DatalabAPIError as e:
120
-     print(f"API Error: {e}")
121
- except DatalabTimeoutError as e:
122
-     print(f"Timeout: {e}")
123
- ```
124
-
125
- ## Supported File Types
126
-
127
- - **PDF**: `pdf`
128
- - **Images**: `png`, `jpeg`, `webp`, `gif`, `tiff`
129
- - **Office Documents**: `docx`, `xlsx`, `pptx`, `doc`, `xls`, `ppt`
130
- - **Other**: `html`, `epub`, `odt`, `ods`, `odp`
131
-
132
- ## Rate Limits
133
-
134
- - 200 requests per 60 seconds
135
- - Maximum 200 concurrent requests
136
- - 200MB file size limit
137
-
138
- * email hi@datalab.to for higher limits.
139
-
140
- ## Examples
141
-
142
- ### Extract JSON Data
143
-
144
- ```python
145
- from datalab_sdk import DatalabClient, ProcessingOptions
146
-
147
- client = DatalabClient(api_key="YOUR_API_KEY")
148
- options = ProcessingOptions(output_format="json")
149
- result = client.convert("research_paper.pdf", options=options)
150
-
151
- # Parse JSON to find equations
152
- import json
153
- data = json.loads(result.json)
154
- equations = [block for block in data if block.get('block_type') == 'Formula']
155
- print(f"Found {len(equations)} equations")
156
- ```
157
-
158
- ### Batch Process Documents
159
-
160
- ```python
161
- import asyncio
162
- from pathlib import Path
163
- from datalab_sdk import AsyncDatalabClient
164
-
165
- async def process_documents():
166
-     files = list(Path("documents/").glob("*.pdf"))
167
-
168
-     async with AsyncDatalabClient(api_key="YOUR_API_KEY") as client:
169
-         for file in files[:5]:
170
-             result = await client.convert(str(file), save_output=f"output/{file.stem}")
171
-             print(f"{file.name}: {result.page_count} pages")
172
-
173
- asyncio.run(process_documents())
174
- ```
175
-
176
- ## License
177
-
178
- MIT License
@@ -1,71 +0,0 @@
1
- # Integration Tests
2
-
3
- This directory contains integration tests that run against the live Datalab API.
4
-
5
- ## Setup
6
-
7
- 1. **Set your API key** as an environment variable:
8
- ```bash
9
- export DATALAB_API_KEY="your_api_key_here"
10
- ```
11
-
12
- 2. **Optional: Set custom base URL** if testing against a different server:
13
- ```bash
14
- export DATALAB_BASE_URL="https://custom.datalab.to"
15
- ```
16
-
17
- ## Running the Tests
18
-
19
- Run all integration tests:
20
- ```bash
21
- pytest integration/ -v
22
- ```
23
-
24
- Run specific test classes:
25
- ```bash
26
- pytest integration/test_live_api.py::TestMarkerIntegration -v
27
- pytest integration/test_live_api.py::TestOCRIntegration -v
28
- ```
29
-
30
- Run individual tests:
31
- ```bash
32
- pytest integration/test_live_api.py::TestMarkerIntegration::test_convert_pdf_basic -v
33
- ```
34
-
35
- ## Test Coverage
36
-
37
- ### Marker/Convert Tests
38
- - **test_convert_pdf_basic**: Basic PDF to markdown conversion
39
- - **test_convert_office_document**: Word document to HTML conversion
40
- - **test_convert_async_with_json**: Async PowerPoint to JSON conversion
41
-
42
- ### OCR Tests
43
- - **test_ocr_pdf_basic**: Basic PDF OCR with text extraction
44
- - **test_ocr_image_file**: OCR on PNG image file
45
- - **test_ocr_async_multiple_pages**: Async OCR with multiple pages
46
-
47
- ### Error Handling Tests
48
- - **test_invalid_api_key**: Invalid API key handling
49
- - **test_nonexistent_file**: Nonexistent file handling
50
- - **test_unsupported_file_type**: Unsupported file type handling
51
-
52
- ### Save Output Tests
53
- - **test_convert_with_save_output**: Automatic file saving for conversion
54
- - **test_ocr_with_save_output**: Automatic file saving for OCR
55
-
56
- ## Test Data Files Used
57
-
58
- The tests use sample files from the `data/` directory:
59
- - `adversarial.pdf` - PDF document
60
- - `bid_evaluation.docx` - Word document
61
- - `08-Lambda-Calculus.pptx` - PowerPoint presentation
62
- - `thinkpython.pdf` - PDF book
63
- - `chi_hind.png` - Image file
64
-
65
- ## Notes
66
-
67
- - Tests use `max_pages=1` or `max_pages=2` to keep API usage minimal
68
- - LLM mode is disabled to avoid extra costs
69
- - All tests require a valid API key and will be skipped if not provided
70
- - Tests make actual API calls and will consume API credits
71
- - Some tests may take time to complete due to processing delays