chub-dev 0.1.0 → 0.1.2-beta.0
- package/README.md +55 -0
- package/bin/chub-mcp +2 -0
- package/dist/airtable/docs/database/javascript/DOC.md +1437 -0
- package/dist/airtable/docs/database/python/DOC.md +1735 -0
- package/dist/amplitude/docs/analytics/javascript/DOC.md +1282 -0
- package/dist/amplitude/docs/analytics/python/DOC.md +1199 -0
- package/dist/anthropic/docs/claude-api/javascript/DOC.md +503 -0
- package/dist/anthropic/docs/claude-api/python/DOC.md +389 -0
- package/dist/asana/docs/tasks/DOC.md +1396 -0
- package/dist/assemblyai/docs/transcription/DOC.md +1043 -0
- package/dist/atlassian/docs/confluence/javascript/DOC.md +1347 -0
- package/dist/atlassian/docs/confluence/python/DOC.md +1604 -0
- package/dist/auth0/docs/identity/javascript/DOC.md +968 -0
- package/dist/auth0/docs/identity/python/DOC.md +1199 -0
- package/dist/aws/docs/s3/javascript/DOC.md +1773 -0
- package/dist/aws/docs/s3/python/DOC.md +1807 -0
- package/dist/binance/docs/trading/javascript/DOC.md +1315 -0
- package/dist/binance/docs/trading/python/DOC.md +1454 -0
- package/dist/braintree/docs/gateway/javascript/DOC.md +1278 -0
- package/dist/braintree/docs/gateway/python/DOC.md +1179 -0
- package/dist/chromadb/docs/embeddings-db/javascript/DOC.md +1263 -0
- package/dist/chromadb/docs/embeddings-db/python/DOC.md +1707 -0
- package/dist/clerk/docs/auth/javascript/DOC.md +1220 -0
- package/dist/clerk/docs/auth/python/DOC.md +274 -0
- package/dist/cloudflare/docs/workers/javascript/DOC.md +918 -0
- package/dist/cloudflare/docs/workers/python/DOC.md +994 -0
- package/dist/cockroachdb/docs/distributed-db/DOC.md +1500 -0
- package/dist/cohere/docs/llm/DOC.md +1335 -0
- package/dist/datadog/docs/monitoring/javascript/DOC.md +1740 -0
- package/dist/datadog/docs/monitoring/python/DOC.md +1815 -0
- package/dist/deepgram/docs/speech/javascript/DOC.md +885 -0
- package/dist/deepgram/docs/speech/python/DOC.md +685 -0
- package/dist/deepl/docs/translation/javascript/DOC.md +887 -0
- package/dist/deepl/docs/translation/python/DOC.md +944 -0
- package/dist/deepseek/docs/llm/DOC.md +1220 -0
- package/dist/directus/docs/headless-cms/javascript/DOC.md +1128 -0
- package/dist/directus/docs/headless-cms/python/DOC.md +1276 -0
- package/dist/discord/docs/bot/javascript/DOC.md +1090 -0
- package/dist/discord/docs/bot/python/DOC.md +1130 -0
- package/dist/elasticsearch/docs/search/DOC.md +1634 -0
- package/dist/elevenlabs/docs/text-to-speech/javascript/DOC.md +336 -0
- package/dist/elevenlabs/docs/text-to-speech/python/DOC.md +552 -0
- package/dist/firebase/docs/auth/DOC.md +1015 -0
- package/dist/gemini/docs/genai/javascript/DOC.md +691 -0
- package/dist/gemini/docs/genai/python/DOC.md +555 -0
- package/dist/github/docs/octokit/DOC.md +1560 -0
- package/dist/google/docs/bigquery/javascript/DOC.md +1688 -0
- package/dist/google/docs/bigquery/python/DOC.md +1503 -0
- package/dist/hubspot/docs/crm/javascript/DOC.md +1805 -0
- package/dist/hubspot/docs/crm/python/DOC.md +2033 -0
- package/dist/huggingface/docs/transformers/DOC.md +948 -0
- package/dist/intercom/docs/messaging/javascript/DOC.md +1844 -0
- package/dist/intercom/docs/messaging/python/DOC.md +1797 -0
- package/dist/jira/docs/issues/javascript/DOC.md +1420 -0
- package/dist/jira/docs/issues/python/DOC.md +1492 -0
- package/dist/kafka/docs/streaming/javascript/DOC.md +1671 -0
- package/dist/kafka/docs/streaming/python/DOC.md +1464 -0
- package/dist/landingai-ade/docs/api/DOC.md +620 -0
- package/dist/landingai-ade/docs/sdk/python/DOC.md +489 -0
- package/dist/landingai-ade/docs/sdk/typescript/DOC.md +542 -0
- package/dist/landingai-ade/skills/SKILL.md +489 -0
- package/dist/launchdarkly/docs/feature-flags/javascript/DOC.md +1191 -0
- package/dist/launchdarkly/docs/feature-flags/python/DOC.md +1671 -0
- package/dist/linear/docs/tracker/DOC.md +1554 -0
- package/dist/livekit/docs/realtime/javascript/DOC.md +303 -0
- package/dist/livekit/docs/realtime/python/DOC.md +163 -0
- package/dist/mailchimp/docs/marketing/DOC.md +1420 -0
- package/dist/meilisearch/docs/search/DOC.md +1241 -0
- package/dist/microsoft/docs/onedrive/javascript/DOC.md +1421 -0
- package/dist/microsoft/docs/onedrive/python/DOC.md +1549 -0
- package/dist/mongodb/docs/atlas/DOC.md +2041 -0
- package/dist/notion/docs/workspace-api/javascript/DOC.md +1435 -0
- package/dist/notion/docs/workspace-api/python/DOC.md +1400 -0
- package/dist/okta/docs/identity/javascript/DOC.md +1171 -0
- package/dist/okta/docs/identity/python/DOC.md +1401 -0
- package/dist/openai/docs/chat/javascript/DOC.md +407 -0
- package/dist/openai/docs/chat/python/DOC.md +568 -0
- package/dist/paypal/docs/checkout/DOC.md +278 -0
- package/dist/pinecone/docs/sdk/javascript/DOC.md +984 -0
- package/dist/pinecone/docs/sdk/python/DOC.md +1395 -0
- package/dist/plaid/docs/banking/javascript/DOC.md +1163 -0
- package/dist/plaid/docs/banking/python/DOC.md +1203 -0
- package/dist/playwright-community/skills/login-flows/SKILL.md +108 -0
- package/dist/postmark/docs/transactional-email/DOC.md +1168 -0
- package/dist/prisma/docs/orm/javascript/DOC.md +1419 -0
- package/dist/prisma/docs/orm/python/DOC.md +1317 -0
- package/dist/qdrant/docs/vector-search/javascript/DOC.md +1221 -0
- package/dist/qdrant/docs/vector-search/python/DOC.md +1653 -0
- package/dist/rabbitmq/docs/message-queue/javascript/DOC.md +1193 -0
- package/dist/rabbitmq/docs/message-queue/python/DOC.md +1243 -0
- package/dist/razorpay/docs/payments/javascript/DOC.md +1219 -0
- package/dist/razorpay/docs/payments/python/DOC.md +1330 -0
- package/dist/redis/docs/key-value/javascript/DOC.md +1851 -0
- package/dist/redis/docs/key-value/python/DOC.md +2054 -0
- package/dist/registry.json +2817 -0
- package/dist/replicate/docs/model-hosting/DOC.md +1318 -0
- package/dist/resend/docs/email/DOC.md +1271 -0
- package/dist/salesforce/docs/crm/javascript/DOC.md +1241 -0
- package/dist/salesforce/docs/crm/python/DOC.md +1183 -0
- package/dist/search-index.json +1 -0
- package/dist/sendgrid/docs/email-api/javascript/DOC.md +371 -0
- package/dist/sendgrid/docs/email-api/python/DOC.md +656 -0
- package/dist/sentry/docs/error-tracking/javascript/DOC.md +1073 -0
- package/dist/sentry/docs/error-tracking/python/DOC.md +1309 -0
- package/dist/shopify/docs/storefront/DOC.md +457 -0
- package/dist/slack/docs/workspace/javascript/DOC.md +933 -0
- package/dist/slack/docs/workspace/python/DOC.md +271 -0
- package/dist/square/docs/payments/javascript/DOC.md +1855 -0
- package/dist/square/docs/payments/python/DOC.md +1728 -0
- package/dist/stripe/docs/api/DOC.md +1727 -0
- package/dist/stripe/docs/payments/DOC.md +1726 -0
- package/dist/stytch/docs/auth/javascript/DOC.md +1813 -0
- package/dist/stytch/docs/auth/python/DOC.md +1962 -0
- package/dist/supabase/docs/client/DOC.md +1606 -0
- package/dist/twilio/docs/messaging/python/DOC.md +469 -0
- package/dist/twilio/docs/messaging/typescript/DOC.md +946 -0
- package/dist/vercel/docs/platform/DOC.md +1940 -0
- package/dist/weaviate/docs/vector-db/javascript/DOC.md +1268 -0
- package/dist/weaviate/docs/vector-db/python/DOC.md +1388 -0
- package/dist/zendesk/docs/support/javascript/DOC.md +2150 -0
- package/dist/zendesk/docs/support/python/DOC.md +2297 -0
- package/package.json +22 -6
- package/skills/get-api-docs/SKILL.md +84 -0
- package/src/commands/annotate.js +83 -0
- package/src/commands/build.js +12 -1
- package/src/commands/feedback.js +150 -0
- package/src/commands/get.js +83 -42
- package/src/commands/search.js +7 -0
- package/src/index.js +43 -17
- package/src/lib/analytics.js +90 -0
- package/src/lib/annotations.js +57 -0
- package/src/lib/bm25.js +170 -0
- package/src/lib/cache.js +69 -6
- package/src/lib/config.js +8 -3
- package/src/lib/identity.js +99 -0
- package/src/lib/registry.js +103 -20
- package/src/lib/telemetry.js +86 -0
- package/src/mcp/server.js +177 -0
- package/src/mcp/tools.js +251 -0

---
name: sdk
description: "Python SDK reference for LandingAI's Agentic Document Extraction (ADE). Includes Pydantic schema extraction, async processing, error handling, save_to, visual grounding, table cell lookup, and complete API context."
metadata:
  languages: "python"
  versions: "0.1.0"
  updated-on: "2026-03-04"
  source: maintainer
  tags: "landingai,ade,python,sdk,pydantic,document-extraction,parse,extract,split,async"
---

# LandingAI ADE — Python SDK Reference

Python SDK for LandingAI's Agentic Document Extraction (ADE).

## Installation

```bash
pip install landingai-ade
export VISION_AGENT_API_KEY="v2_..."
```

## Client Setup

```python
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()  # reads the API key from the VISION_AGENT_API_KEY env var
```

### Constructor Arguments

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | `str \| None` | env `VISION_AGENT_API_KEY` | API key |
| `environment` | `"production" \| "eu"` | `"production"` | Region: `"production"` (US) or `"eu"` |
| `base_url` | `str \| None` | — | Overrides the region's base URL |
| `timeout` | `float \| Timeout \| None` | SDK default | Request timeout in seconds |
| `max_retries` | `int` | SDK default | Maximum retry attempts for transient errors |
| `http_client` | `httpx.Client \| None` | — | Custom httpx client |

```python
# EU region
client = LandingAIADE(environment="eu")

# Pass the key directly instead of reading it from the environment
client = LandingAIADE(api_key="v2_...")
```

---

## 1. Parse

Converts documents to structured markdown with visual grounding.

### Arguments

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `document` | `FileTypes \| None` | Either this or `document_url` | Local file (Path, bytes, or file-like object) |
| `document_url` | `str \| None` | Either this or `document` | Remote document URL |
| `model` | `str \| None` | No | Model version (default: `dpt-2-latest`) |
| `split` | `"page" \| None` | No | Split the output by page |
| `save_to` | `str \| None` | No | Directory in which to save `{filename}_parse_output.json` |

### Returns `ParseResponse`

```
.markdown  → str: full document as markdown
.chunks[]  → Chunk: {id, type, markdown, grounding: {page, box}}
.grounding → dict: {id → Grounding} with bounding boxes and tableCell positions
.splits[]  → Split: {chunks[], class, identifier, markdown, pages[]} (only if split="page")
.metadata  → ParseMetadata: {filename, page_count, duration_ms, credit_usage, version, job_id, failed_pages}
```

### Example

```python
response = client.parse(
    document=Path("invoice.pdf"),
    model="dpt-2-latest",
    save_to="./output",
)

print(response.markdown)
print(f"{len(response.chunks)} chunks, {response.metadata.page_count} pages")

tables = [c for c in response.chunks if c.type == "table"]
```

### Visual Grounding and Table Cells

```python
for chunk in response.chunks:
    if chunk.grounding is None:  # grounding is null for spreadsheet (XLSX/CSV) tables
        continue
    box = chunk.grounding.box
    print(f"{chunk.type} on page {chunk.grounding.page}: "
          f"({box.left:.3f}, {box.top:.3f}) → ({box.right:.3f}, {box.bottom:.3f})")

for gid, grounding in (response.grounding or {}).items():
    if grounding.type == "tableCell":
        pos = grounding.position
        print(f"Cell {gid} at ({pos.row}, {pos.col}), span {pos.rowspan}x{pos.colspan}")
```
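
Grounding boxes are normalized to the 0–1 range, so overlaying them on a rendered page image means scaling by that image's pixel size. The helper below is a minimal sketch under stated assumptions: `Box` is a stand-in dataclass with the same `left`/`top`/`right`/`bottom` fields as the SDK's box, the origin is assumed to be the page's top-left corner (as the field names suggest), and the page image and its dimensions come from your own renderer, not from the SDK. `box_to_pixels` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Stand-in for a grounding box; coordinates are normalized to [0, 1]."""
    left: float
    top: float
    right: float
    bottom: float


def box_to_pixels(box: Box, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates (x0, y0, x1, y1).

    Assumption: origin at the top-left corner, with y growing downward.
    """
    return (
        round(box.left * width),
        round(box.top * height),
        round(box.right * width),
        round(box.bottom * height),
    )


# A box covering the top-left quarter of a 1000 x 800 page render
print(box_to_pixels(Box(0.0, 0.0, 0.5, 0.5), width=1000, height=800))  # (0, 0, 500, 400)
```

Because the function only reads the four attributes, passing `chunk.grounding.box` from a real response should work the same way.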

### Extract a Cell Value by Row and Column (PDF)

```python
import re

table = next(c for c in response.chunks if c.type == "table")

rows = re.findall(r"<tr[^>]*>(.*?)</tr>", table.markdown, re.DOTALL)
grid = {}
for r, row_html in enumerate(rows):
    # Match header (<th>) and data (<td>) cells, then strip any inner tags
    cells = re.finditer(r"<t([dh])\b[^>]*>(.*?)</t\1>", row_html, re.DOTALL)
    for c, m in enumerate(cells):
        grid[(r, c)] = re.sub(r"<[^>]+>", "", m.group(2)).strip()

value = grid[(1, 0)]  # zero-indexed (row, col); rowspan/colspan are not accounted for
```
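
The regex approach above numbers cells left to right, so a cell with `rowspan` or `colspan` shifts every later column. A span-aware alternative, sketched here with the standard library's `html.parser`, fills every grid slot that a merged cell covers. It assumes `table.markdown` holds an HTML `<table>`, as the example above does; `TableGrid` and `table_to_grid` are hypothetical names, not part of the SDK.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect {(row, col): text} from an HTML table, honoring rowspan/colspan."""

    def __init__(self) -> None:
        super().__init__()
        self.grid: dict[tuple[int, int], str] = {}
        self.row = -1
        self.col = 0
        self.text: list[str] | None = None  # text parts of the cell being read
        self.span = (1, 1)  # (rowspan, colspan) of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            # Skip slots already filled by a rowspan from an earlier row
            while (self.row, self.col) in self.grid:
                self.col += 1
            a = dict(attrs)
            self.span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self.text = []

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.text is not None:
            value = "".join(self.text).strip()
            rowspan, colspan = self.span
            # Repeat the text in every slot the merged cell covers
            for r in range(self.row, self.row + rowspan):
                for c in range(self.col, self.col + colspan):
                    self.grid[(r, c)] = value
            self.col += colspan
            self.text = None


def table_to_grid(html: str) -> dict[tuple[int, int], str]:
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    return parser.grid
```

With a chunk from the example above, `grid = table_to_grid(table.markdown)` gives the same `grid[(row, col)]` lookups. Repeating a merged cell's text in each slot it spans keeps lookups simple at the cost of duplicated values.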

### Read a Spreadsheet Cell by Reference

```python
import re

response = client.parse(document=Path("report.xlsx"))
table = next(c for c in response.chunks if c.type == "table")

# Spreadsheet cell IDs are "{tab_name}-{cell_ref}" (e.g., "Sheet 1-B2").
# Grounding is null for spreadsheets, so read the IDs directly from the table HTML.
cell_text = {}
for m in re.finditer(
    r'<td[^>]*\bid=["\']([^"\']+)["\'][^>]*>(.*?)</td>',
    table.markdown,
    re.DOTALL,
):
    cell_text[m.group(1)] = re.sub(r"<[^>]+>", "", m.group(2)).strip()

value = cell_text["Sheet 1-B2"]
```
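
To line spreadsheet lookups up with the zero-indexed (row, col) grids used for PDF tables, a cell ID can be split into its tab name and coordinates. This sketch relies only on the documented `{tab_name}-{cell_ref}` format; `parse_cell_id` is a hypothetical helper. It splits on the last hyphen so that tab names containing hyphens (assumed possible) stay intact, and it assumes upper-case A1-style references such as `B2` or `AA10`.

```python
import re


def parse_cell_id(cell_id: str) -> tuple[str, int, int]:
    """Split "{tab_name}-{cell_ref}" into (tab_name, row, col), all zero-indexed."""
    # rpartition splits on the LAST hyphen, so "Q1-Sales-B2" keeps tab "Q1-Sales"
    tab, _, ref = cell_id.rpartition("-")
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if not tab or m is None:
        raise ValueError(f"not a spreadsheet cell ID: {cell_id!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:  # base-26 column letters: A=1, ..., Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return tab, int(digits) - 1, col - 1
```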

---

## 2. Extract

Extracts structured data from markdown using a JSON schema.

### Arguments

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `schema` | `str` | Yes | JSON schema string (use `pydantic_to_json_schema()` to generate one from a Pydantic model) |
| `markdown` | `FileTypes \| str \| None` | Either this or `markdown_url` | Markdown content as a string or file |
| `markdown_url` | `str \| None` | Either this or `markdown` | URL of the markdown |
| `model` | `str \| None` | No | Model version (default: `extract-latest`) |
| `save_to` | `str \| None` | No | Directory in which to save `{filename}_extract_output.json` |

### Returns `ExtractResponse`

```
.extraction          → dict: extracted key-value pairs matching the schema
.extraction_metadata → dict: {field → {references: [chunk_ids]}} for grounding
.metadata            → Metadata: {credit_usage, duration_ms, filename, job_id, version, schema_violation_error}
```

### Pydantic Schema Extraction

```python
from pydantic import BaseModel, Field

from landingai_ade.lib import pydantic_to_json_schema


class InvoiceData(BaseModel):
    invoice_number: str = Field(description="Invoice number or ID")
    total_amount: float = Field(description="Total amount to be paid")
    vendor_name: str = Field(description="Vendor or supplier name")
    line_items: list[dict] | None = Field(default=None, description="Line items")


# Parse once, then reuse the markdown for as many extractions as needed
parsed = client.parse(document=Path("invoice.pdf"))

response = client.extract(
    markdown=parsed.markdown,
    schema=pydantic_to_json_schema(InvoiceData),
)

invoice = InvoiceData(**response.extraction)  # validates the extracted values
print(f"Invoice {invoice.invoice_number}: ${invoice.total_amount}")
```

### Grounding References (Tracing Back to Source)

```python
# Index the parsed chunks by ID so extraction references can be resolved
chunk_map = {c.id: c for c in parsed.chunks}

for field, meta in response.extraction_metadata.items():
    # Only top-level fields with a {"references": [...]} entry are handled here
    if isinstance(meta, dict) and meta.get("references"):
        chunk = chunk_map.get(meta["references"][0])  # first referenced chunk
        if chunk and chunk.grounding:
            print(f"{field}: page {chunk.grounding.page}, type={chunk.type}")
```
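
The loop above reports only the first reference for each field. When a value draws on several chunks, collecting every referenced page can be more useful. The sketch below uses small stand-in dataclasses in place of the SDK's chunk objects so it runs on its own; `pages_by_field` is a hypothetical helper, and like the loop above it skips nested metadata (for example on list fields), whose shape this reference does not document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Grounding:
    page: int


@dataclass
class Chunk:
    """Stand-in for a parsed chunk; only the fields used here."""
    id: str
    grounding: Grounding | None


def pages_by_field(extraction_metadata: dict, chunks: list[Chunk]) -> dict[str, list[int]]:
    """Map each top-level field to the sorted pages of ALL chunks it references."""
    chunk_map = {c.id: c for c in chunks}
    result: dict[str, list[int]] = {}
    for field, meta in extraction_metadata.items():
        if not isinstance(meta, dict):
            continue  # nested metadata (e.g. for list fields) is not handled here
        pages = {
            chunk_map[ref].grounding.page
            for ref in meta.get("references", [])
            # ignore unknown IDs and chunks without grounding (e.g. spreadsheets)
            if ref in chunk_map and chunk_map[ref].grounding is not None
        }
        result[field] = sorted(pages)
    return result
```

With real responses, the call would be `pages_by_field(response.extraction_metadata, parsed.chunks)`.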

### `pydantic_to_json_schema(model)`

Converts a Pydantic `BaseModel` class into a JSON schema string with every `$ref` resolved inline. Pass the result directly to `schema=`.

```python
from landingai_ade.lib import pydantic_to_json_schema

schema_str = pydantic_to_json_schema(InvoiceData)  # → JSON string
```
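
To make "all `$ref` inlined" concrete: Pydantic v2's `model_json_schema()` places nested models under `$defs` and points to them with `{"$ref": "#/$defs/Name"}`, and an inlined schema replaces each pointer with the definition itself. The following is an illustrative sketch of that transformation on a plain dict, not the SDK's implementation; `inline_refs` is a hypothetical name, and it does not handle recursive models (a definition that refers to itself would loop forever).

```python
import json
from typing import Any


def inline_refs(schema: dict[str, Any]) -> str:
    """Return `schema` as a JSON string with local "#/$defs/..." references inlined."""
    defs = schema.get("$defs", {})

    def resolve(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            rest = {k: resolve(v) for k, v in node.items() if k not in ("$ref", "$defs")}
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                # Keys next to the $ref (e.g. a field description) override the definition's
                return {**resolve(defs[ref.removeprefix("#/$defs/")]), **rest}
            if ref is not None:
                rest["$ref"] = ref  # leave non-local references untouched
            return rest
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return json.dumps(resolve(schema))
```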

---

## 3. Split

Classifies a mixed document and splits it into sections by document type.

### Arguments

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `split_class` | `Iterable[SplitClass]` | Yes | List of dicts, each with a required `name` and optional `description` and `identifier` |
| `markdown` | `FileTypes \| str \| None` | Either this or `markdown_url` | Markdown content as a string or file |
| `markdown_url` | `str \| None` | Either this or `markdown` | URL of the markdown |
| `model` | `str \| None` | No | Model version (default: `split-latest`) |
| `save_to` | `str \| None` | No | Directory in which to save `{filename}_split_output.json` |

### Returns `SplitResponse`

```
.splits[] → Split: {classification, identifier, markdowns[], pages[]}
.metadata → Metadata: {credit_usage, duration_ms, filename, page_count}
```

### Split → Extract Pipeline

```python
parsed = client.parse(document=Path("mixed_invoices.pdf"))

splits = client.split(
    markdown=parsed.markdown,
    split_class=[
        {"name": "Invoice", "description": "Sales invoice", "identifier": "Invoice Number"},
        {"name": "Receipt", "description": "Payment receipt", "identifier": "Receipt Number"},
    ],
)

for split in splits.splits:
    print(f"{split.classification}: {split.identifier} (pages {split.pages})")

# Extract from each split with the InvoiceData model defined above
schema = pydantic_to_json_schema(InvoiceData)
results = []
for split in splits.splits:
    # Join all of the split's markdown so multi-page sections are not truncated
    split_markdown = "\n\n".join(split.markdowns)
    extracted = client.extract(markdown=split_markdown, schema=schema)
    results.append({"type": split.classification, "id": split.identifier, **extracted.extraction})
```

---

## 4. Parse Jobs (Async, Large Files)

For files larger than 50 MB, use asynchronous parse jobs. They are also the fallback when a synchronous `parse` call fails with 413 (payload too large) or 504 (timeout).

### `parse_jobs.create()` Arguments

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `document` | `FileTypes \| None` | Either this or `document_url` | Local file |
| `document_url` | `str \| None` | Either this or `document` | Remote document URL |
| `model` | `str \| None` | No | Model version (default: `dpt-2-latest`) |
| `split` | `"page" \| None` | No | Split the output by page |
| `output_save_url` | `str \| None` | Only with zero data retention (ZDR) | URL where the output is saved under ZDR |

### Returns `ParseJobCreateResponse`

```
.job_id → str: unique job identifier
```

### `parse_jobs.get(job_id)` Returns `ParseJobGetResponse`

```
.job_id         → str
.status         → str: pending | processing | completed | failed | cancelled
.progress       → float: 0.0 to 1.0
.failure_reason → str | None: error message if the job failed
.data           → ParseResponse | None: full result when completed
.output_url     → str | None: presigned URL if the result is larger than 1 MB (expires after 1 hour)
```
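
Since `.data` and `.output_url` are alternatives, code that consumes a finished job has to handle both. The sketch below assumes, without confirmation from this reference, that `.data` is `None` whenever the result is delivered through `.output_url`, and that the URL serves the parse result as JSON. `job_result` and `fetch_json` are hypothetical helpers; the fetcher is injectable so the logic can be tested offline.

```python
import json
import urllib.request
from typing import Any, Callable


def fetch_json(url: str) -> Any:
    """Download and decode a JSON document, e.g. a presigned output URL."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def job_result(status: Any, fetch: Callable[[str], Any] = fetch_json) -> Any:
    """Return a completed job's result, whether inline (.data) or behind .output_url."""
    if status.status != "completed":
        raise ValueError(f"job {status.job_id} is {status.status}, not completed")
    if status.data is not None:
        return status.data  # ParseResponse object
    if status.output_url:
        # Assumption: the presigned URL serves the parse result as JSON (a plain dict)
        return fetch(status.output_url)
    raise RuntimeError(f"job {status.job_id} completed without data or output_url")
```

The two branches return different types: a `ParseResponse` object from `.data`, but a plain dict from the downloaded JSON. Because the presigned URL expires after an hour, fetch it soon after the job completes.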

### `parse_jobs.list()` Arguments & Returns

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `status` | `"pending" \| "processing" \| "completed" \| "failed" \| "cancelled"` | No | Filter by status |
| `page` | `int \| None` | No | Page number (0-indexed) |
| `page_size` | `int \| None` | No | Items per page |

```
.jobs[]   → Job: {job_id, status, progress, received_at, failure_reason}
.has_more → bool | None
```
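
To walk every job rather than a single page, request successive pages until `has_more` is falsy. This is a sketch that uses only the parameters documented above; `iter_jobs` is a hypothetical helper, the page size of 50 is an arbitrary choice, and the `status` filter is left out of the call when not given rather than passed as `None`.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_jobs(client: Any, status: str | None = None, page_size: int = 50) -> Iterator[Any]:
    """Yield every parse job, one page at a time (pages are 0-indexed)."""
    page = 0
    while True:
        kwargs: dict[str, Any] = {"page": page, "page_size": page_size}
        if status is not None:
            kwargs["status"] = status  # only send the filter when one is given
        resp = client.parse_jobs.list(**kwargs)
        yield from resp.jobs
        if not resp.has_more or not resp.jobs:  # has_more may be None
            return
        page += 1
```

For example, `for job in iter_jobs(client, status="failed"): print(job.job_id, job.failure_reason)` lists every failed job.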

### Example

```python
import time

job = client.parse_jobs.create(document=Path("large.pdf"))
print(f"Job ID: {job.job_id}")

while True:
    status = client.parse_jobs.get(job.job_id)
    print(f"Status: {status.status}, Progress: {status.progress * 100:.0f}%")

    if status.status == "completed":
        result = status.data  # ParseResponse; for results over 1 MB, see status.output_url
        break
    elif status.status in ("failed", "cancelled"):
        raise RuntimeError(f"Job {status.status}: {status.failure_reason}")

    time.sleep(5)
```

---

## Error Handling

### Exception Classes

All exceptions inherit from `LandingAiadeError`:

| Exception | HTTP Status | Description |
|-----------|-------------|-------------|
| `BadRequestError` | 400 | Invalid parameters |
| `AuthenticationError` | 401 | Invalid API key |
| `PermissionDeniedError` | 403 | Forbidden |
| `NotFoundError` | 404 | Resource not found |
| `UnprocessableEntityError` | 422 | Invalid file type or malformed schema |
| `RateLimitError` | 429 | Too many requests |
| `InternalServerError` | 5xx | Server error |
| `APIConnectionError` | — | Network failure |
| `APITimeoutError` | — | Request timeout |

`APIStatusError` is the base class for all HTTP errors and exposes the response code as `status_code`.

### Retry with Fallback to Jobs

```python
import time
from pathlib import Path

from landingai_ade import APIConnectionError, APIStatusError, APITimeoutError, RateLimitError


def parse_with_retry(client, file_path, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.parse(document=Path(file_path))
        except RateLimitError:
            time.sleep(2 ** attempt * 10)  # exponential backoff: 10 s, 20 s, 40 s
        except (APITimeoutError, APIStatusError) as e:
            # Only 413 (payload too large) and 504 (timeout) fall back to parse jobs
            if isinstance(e, APIStatusError) and e.status_code not in (413, 504):
                raise
            print("Timeout or payload too large; switching to parse jobs")
            job = client.parse_jobs.create(document=Path(file_path))
            return poll_job(client, job.job_id)
        except APIConnectionError:
            time.sleep(2)
    raise RuntimeError("Failed after retries")


def poll_job(client, job_id, timeout=300):
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        status = client.parse_jobs.get(job_id)
        if status.status == "completed":
            return status.data
        if status.status in ("failed", "cancelled"):
            raise RuntimeError(f"Job {status.status}: {status.failure_reason}")
        time.sleep(5)
    raise TimeoutError("Job did not complete in time")
```
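
The `except` clauses above encode a small decision table: 413 and 504 fall back to parse jobs, rate limits back off, and other client errors fail fast. Pulling those rules into a pure function of `APIStatusError.status_code` makes them testable without network calls. This is a hypothetical sketch that follows the Error Codes table under API Reference; unlike the function above, it also treats other 5xx responses as retryable, as that table recommends for 500, and the 120-second backoff cap is an assumption.

```python
from enum import Enum


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    USE_PARSE_JOBS = "use_parse_jobs"
    FAIL = "fail"


def action_for_status(status_code: int) -> Action:
    """Map an HTTP status code to the next step suggested by the Error Codes table."""
    if status_code in (413, 504):  # payload too large / timeout: checked before other 5xx
        return Action.USE_PARSE_JOBS
    if status_code == 429 or 500 <= status_code <= 599:  # rate limit / server error
        return Action.RETRY_WITH_BACKOFF
    return Action.FAIL  # 400, 401, 403, 404, 422: fix the request, key, or input


def backoff_delay(attempt: int, base: float = 10.0, cap: float = 120.0) -> float:
    """Exponential backoff in seconds (base * 2**attempt), capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```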

---

## Async / Concurrent Processing

```python
import asyncio
from pathlib import Path

from landingai_ade import AsyncLandingAIADE


async def parse_multiple(files: list[str]):
    client = AsyncLandingAIADE()
    tasks = [client.parse(document=Path(f)) for f in files]
    # return_exceptions=True returns failures as values instead of raising the first one
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]  # keeps successes only
```
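
The example above starts every request at once and discards failed files. For larger batches, the sketch below caps the number of in-flight requests with `asyncio.Semaphore` and keeps failures alongside successes. `gather_bounded` is a hypothetical helper, and the default limit of 4 is an arbitrary assumption to tune against your rate limits.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def gather_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 4,
) -> list[Any]:
    """Await worker(item) for every item, with at most `limit` calls in flight.

    Results keep the input order. Failures come back as exception objects
    (as with gather(return_exceptions=True)), so callers can tell which item failed.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(item: Any) -> Any:
        async with semaphore:  # waits here while `limit` calls are already running
            return await worker(item)

    return await asyncio.gather(*(run(item) for item in items), return_exceptions=True)
```

With the SDK, the worker can wrap the async client's own method, e.g. `await gather_bounded(files, lambda f: client.parse(document=Path(f)))` inside an `async` function that created `client = AsyncLandingAIADE()`. Because results stay in input order, `zip(files, results)` pairs each file with its response or exception.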

`AsyncLandingAIADE` takes the same constructor arguments and exposes the same methods as `LandingAIADE`; every method is a coroutine that must be awaited.

---

## API Reference

The following sections provide the underlying REST API context so that this document is self-contained.

### Base Configuration

| Region | Base URL |
|--------|----------|
| US (default) | `https://api.va.landing.ai/v1/ade` |
| EU | `https://api.va.eu-west-1.landing.ai/v1/ade` |

**Authentication**: every request requires the header `Authorization: Bearer $VISION_AGENT_API_KEY`.

### Quick Reference

| Endpoint | Method | Path | Model | Input |
|----------|--------|------|-------|-------|
| Parse | POST | `/v1/ade/parse` | `dpt-2-latest` | `document` (file) or `document_url` |
| Extract | POST | `/v1/ade/extract` | `extract-latest` | `markdown` (file/string) or `markdown_url`, plus `schema` |
| Split | POST | `/v1/ade/split` | `split-latest` | `markdown` (file/string) or `markdown_url`, plus `split_class` |
| Create Job | POST | `/v1/ade/parse/jobs` | `dpt-2-latest` | `document` or `document_url` |
| Get Job | GET | `/v1/ade/parse/jobs/{id}` | — | — |
| List Jobs | GET | `/v1/ade/parse/jobs` | — | `?status=&page=&pageSize=` |
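
Outside the SDK, the same endpoints can be called over plain HTTPS. The sketch below uses only the standard library to build (but not send) a List Jobs request from the Base Configuration and Quick Reference tables; `list_jobs_request` and `HOSTS` are hypothetical names, and the `pageSize` spelling is taken from the table. Sending the request would be `urllib.request.urlopen(req)`.

```python
from __future__ import annotations

import urllib.parse
import urllib.request

# Hosts from the Base Configuration table (the base URLs without the /v1/ade path)
HOSTS = {
    "production": "https://api.va.landing.ai",
    "eu": "https://api.va.eu-west-1.landing.ai",
}


def list_jobs_request(
    api_key: str,
    region: str = "production",
    status: str | None = None,
    page: int | None = None,
    page_size: int | None = None,
) -> urllib.request.Request:
    """Build, but do not send, a GET request for the List Jobs endpoint."""
    # The REST query uses camelCase pageSize; the Python SDK argument is page_size
    params = {"status": status, "page": page, "pageSize": page_size}
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{HOSTS[region]}/v1/ade/parse/jobs" + (f"?{query}" if query else "")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )
```

The region keys mirror the SDK's `environment` values so the same setting can drive both code paths.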

### Data Types

#### Chunk Types

- `text`: characters, paragraphs, headings, lists, form fields, checkboxes, code blocks
- `table`: grid of rows and columns; includes spreadsheets and receipts
- `figure`: visual or graphical non-text content such as images, graphs, flowcharts, and diagrams
- `marginalia`: content in document margins such as headers, footers, page numbers, and handwritten notes
- `logo`: logos (DPT-2 only)
- `card`: ID cards and driver's licenses (DPT-2 only)
- `attestation`: signatures, stamps, and seals (DPT-2 only)
- `scan_code`: QR codes and barcodes (DPT-2 only)

#### Grounding Types

- Chunk grounding: `chunkText`, `chunkTable`, `chunkFigure`, `chunkMarginalia`, `chunkLogo`, `chunkCard`, `chunkAttestation`, `chunkScanCode`
- Structure: `table`, `tableCell` (with position data)

#### Bounding Box

All coordinates are normalized to the 0–1 range: `{ left, top, right, bottom }`.

#### Table Cell Position
`{ row, col, rowspan, colspan, chunk_id }` — zero-indexed.
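
With these positions, a table's cells can be laid out on a (row, col) grid without parsing HTML. The sketch below uses a stand-in `Position` dataclass with the documented fields (minus `chunk_id`), and `cell_id_grid` is a hypothetical helper; each slot that a merged cell spans maps to that cell's ID.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Stand-in for a tableCell grounding's position; indices are zero-based."""
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def cell_id_grid(positions: dict[str, Position]) -> dict[tuple[int, int], str]:
    """Map each (row, col) slot to the ID of the tableCell that covers it."""
    grid: dict[tuple[int, int], str] = {}
    for cell_id, pos in positions.items():
        # A merged cell claims every slot in its rowspan x colspan rectangle
        for r in range(pos.row, pos.row + pos.rowspan):
            for c in range(pos.col, pos.col + pos.colspan):
                grid[(r, c)] = cell_id
    return grid
```

To restrict the grid to one table, filter the grounding entries by `chunk_id`, e.g. `{gid: g.position for gid, g in response.grounding.items() if g.type == "tableCell" and g.position.chunk_id == table.id}`. Mapping those IDs to cell text assumes the same IDs appear as `id` attributes in the table HTML, which this reference states only for spreadsheets.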

#### Table Chunk Formats

**PDF/image tables**: element IDs use the form `{page}-{base62_seq}`. The grounding object contains bounding boxes and `tableCell` entries.

**Spreadsheet tables (XLSX/CSV)**: element IDs use the form `{tab_name}-{cell_ref}` (e.g., `Sheet 1-B2`). **Grounding is null**; cell positions are encoded in the IDs instead.

### Error Codes

| Status | Error Type | Description | Solution |
|--------|------------|-------------|----------|
| 400 | `validation_error` | Invalid parameters | Check the request format |
| 401 | `authentication_error` | Invalid API key | Check `VISION_AGENT_API_KEY` |
| 413 | `payload_too_large` | File too large | Use the Parse Jobs API |
| 422 | `unprocessable_entity` | Invalid file type or malformed schema | Validate the file format and schema JSON |
| 429 | `rate_limit_error` | Too many requests | Implement backoff |
| 500 | `internal_error` | Server error | Retry with backoff |
| 504 | `timeout_error` | Request timeout | Use the Parse Jobs API |

### Supported File Types

| Category | Formats | Notes |
|----------|---------|-------|
| **PDF** | PDF | Up to 100 pages; password-protected files are not supported |
| **Images** | JPEG, JPG, PNG, APNG, BMP, DCX, DDS, DIB, GD, GIF, ICNS, JP2, PCX, PPM, PSD, TGA, TIF, TIFF, WEBP | |
| **Text Documents** | DOC, DOCX, ODT | Converted to PDF before parsing |
| **Presentations** | ODP, PPT, PPTX | Converted to PDF before parsing |
| **Spreadsheets** | CSV, XLSX | Up to 10 MB in the Playground; no sheet, column, or row limits |

> **Note:** Word, PowerPoint, and OpenDocument files are converted to PDF server-side before parsing.
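
A client-side check against this table can reject unsupported files before they use up a request. The sketch below is a hypothetical helper that looks only at the file extension; it does not check the PDF page limit, password protection, or file size, which still need separate handling.

```python
from __future__ import annotations

from pathlib import Path

# Extensions from the Supported File Types table, grouped by category
SUPPORTED = {
    "pdf": {"pdf"},
    "image": {"jpeg", "jpg", "png", "apng", "bmp", "dcx", "dds", "dib", "gd", "gif",
              "icns", "jp2", "pcx", "ppm", "psd", "tga", "tif", "tiff", "webp"},
    "text_document": {"doc", "docx", "odt"},
    "presentation": {"odp", "ppt", "pptx"},
    "spreadsheet": {"csv", "xlsx"},
}


def file_category(path: str | Path) -> str:
    """Return the table category for a file, or raise ValueError if it is not listed."""
    ext = Path(path).suffix.lower().lstrip(".")  # case-insensitive: ".TIFF" -> "tiff"
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    raise ValueError(f"unsupported file type: {Path(path).name}")
```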

### Model Versions

| Operation | Current Version | Description |
|-----------|-----------------|-------------|
| Parse | `dpt-2-latest` | Document parsing and OCR |
| Extract | `extract-latest` | Schema-based extraction |
| Split | `split-latest` | Document classification |

---

## External Links

- [Python SDK Documentation](https://docs.landing.ai/ade/ade-python)
- [Python SDK GitHub](https://github.com/landing-ai/ade-python)