chub-dev 0.1.0 → 0.1.2-beta.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (139)
  1. package/README.md +55 -0
  2. package/bin/chub-mcp +2 -0
  3. package/dist/airtable/docs/database/javascript/DOC.md +1437 -0
  4. package/dist/airtable/docs/database/python/DOC.md +1735 -0
  5. package/dist/amplitude/docs/analytics/javascript/DOC.md +1282 -0
  6. package/dist/amplitude/docs/analytics/python/DOC.md +1199 -0
  7. package/dist/anthropic/docs/claude-api/javascript/DOC.md +503 -0
  8. package/dist/anthropic/docs/claude-api/python/DOC.md +389 -0
  9. package/dist/asana/docs/tasks/DOC.md +1396 -0
  10. package/dist/assemblyai/docs/transcription/DOC.md +1043 -0
  11. package/dist/atlassian/docs/confluence/javascript/DOC.md +1347 -0
  12. package/dist/atlassian/docs/confluence/python/DOC.md +1604 -0
  13. package/dist/auth0/docs/identity/javascript/DOC.md +968 -0
  14. package/dist/auth0/docs/identity/python/DOC.md +1199 -0
  15. package/dist/aws/docs/s3/javascript/DOC.md +1773 -0
  16. package/dist/aws/docs/s3/python/DOC.md +1807 -0
  17. package/dist/binance/docs/trading/javascript/DOC.md +1315 -0
  18. package/dist/binance/docs/trading/python/DOC.md +1454 -0
  19. package/dist/braintree/docs/gateway/javascript/DOC.md +1278 -0
  20. package/dist/braintree/docs/gateway/python/DOC.md +1179 -0
  21. package/dist/chromadb/docs/embeddings-db/javascript/DOC.md +1263 -0
  22. package/dist/chromadb/docs/embeddings-db/python/DOC.md +1707 -0
  23. package/dist/clerk/docs/auth/javascript/DOC.md +1220 -0
  24. package/dist/clerk/docs/auth/python/DOC.md +274 -0
  25. package/dist/cloudflare/docs/workers/javascript/DOC.md +918 -0
  26. package/dist/cloudflare/docs/workers/python/DOC.md +994 -0
  27. package/dist/cockroachdb/docs/distributed-db/DOC.md +1500 -0
  28. package/dist/cohere/docs/llm/DOC.md +1335 -0
  29. package/dist/datadog/docs/monitoring/javascript/DOC.md +1740 -0
  30. package/dist/datadog/docs/monitoring/python/DOC.md +1815 -0
  31. package/dist/deepgram/docs/speech/javascript/DOC.md +885 -0
  32. package/dist/deepgram/docs/speech/python/DOC.md +685 -0
  33. package/dist/deepl/docs/translation/javascript/DOC.md +887 -0
  34. package/dist/deepl/docs/translation/python/DOC.md +944 -0
  35. package/dist/deepseek/docs/llm/DOC.md +1220 -0
  36. package/dist/directus/docs/headless-cms/javascript/DOC.md +1128 -0
  37. package/dist/directus/docs/headless-cms/python/DOC.md +1276 -0
  38. package/dist/discord/docs/bot/javascript/DOC.md +1090 -0
  39. package/dist/discord/docs/bot/python/DOC.md +1130 -0
  40. package/dist/elasticsearch/docs/search/DOC.md +1634 -0
  41. package/dist/elevenlabs/docs/text-to-speech/javascript/DOC.md +336 -0
  42. package/dist/elevenlabs/docs/text-to-speech/python/DOC.md +552 -0
  43. package/dist/firebase/docs/auth/DOC.md +1015 -0
  44. package/dist/gemini/docs/genai/javascript/DOC.md +691 -0
  45. package/dist/gemini/docs/genai/python/DOC.md +555 -0
  46. package/dist/github/docs/octokit/DOC.md +1560 -0
  47. package/dist/google/docs/bigquery/javascript/DOC.md +1688 -0
  48. package/dist/google/docs/bigquery/python/DOC.md +1503 -0
  49. package/dist/hubspot/docs/crm/javascript/DOC.md +1805 -0
  50. package/dist/hubspot/docs/crm/python/DOC.md +2033 -0
  51. package/dist/huggingface/docs/transformers/DOC.md +948 -0
  52. package/dist/intercom/docs/messaging/javascript/DOC.md +1844 -0
  53. package/dist/intercom/docs/messaging/python/DOC.md +1797 -0
  54. package/dist/jira/docs/issues/javascript/DOC.md +1420 -0
  55. package/dist/jira/docs/issues/python/DOC.md +1492 -0
  56. package/dist/kafka/docs/streaming/javascript/DOC.md +1671 -0
  57. package/dist/kafka/docs/streaming/python/DOC.md +1464 -0
  58. package/dist/landingai-ade/docs/api/DOC.md +620 -0
  59. package/dist/landingai-ade/docs/sdk/python/DOC.md +489 -0
  60. package/dist/landingai-ade/docs/sdk/typescript/DOC.md +542 -0
  61. package/dist/landingai-ade/skills/SKILL.md +489 -0
  62. package/dist/launchdarkly/docs/feature-flags/javascript/DOC.md +1191 -0
  63. package/dist/launchdarkly/docs/feature-flags/python/DOC.md +1671 -0
  64. package/dist/linear/docs/tracker/DOC.md +1554 -0
  65. package/dist/livekit/docs/realtime/javascript/DOC.md +303 -0
  66. package/dist/livekit/docs/realtime/python/DOC.md +163 -0
  67. package/dist/mailchimp/docs/marketing/DOC.md +1420 -0
  68. package/dist/meilisearch/docs/search/DOC.md +1241 -0
  69. package/dist/microsoft/docs/onedrive/javascript/DOC.md +1421 -0
  70. package/dist/microsoft/docs/onedrive/python/DOC.md +1549 -0
  71. package/dist/mongodb/docs/atlas/DOC.md +2041 -0
  72. package/dist/notion/docs/workspace-api/javascript/DOC.md +1435 -0
  73. package/dist/notion/docs/workspace-api/python/DOC.md +1400 -0
  74. package/dist/okta/docs/identity/javascript/DOC.md +1171 -0
  75. package/dist/okta/docs/identity/python/DOC.md +1401 -0
  76. package/dist/openai/docs/chat/javascript/DOC.md +407 -0
  77. package/dist/openai/docs/chat/python/DOC.md +568 -0
  78. package/dist/paypal/docs/checkout/DOC.md +278 -0
  79. package/dist/pinecone/docs/sdk/javascript/DOC.md +984 -0
  80. package/dist/pinecone/docs/sdk/python/DOC.md +1395 -0
  81. package/dist/plaid/docs/banking/javascript/DOC.md +1163 -0
  82. package/dist/plaid/docs/banking/python/DOC.md +1203 -0
  83. package/dist/playwright-community/skills/login-flows/SKILL.md +108 -0
  84. package/dist/postmark/docs/transactional-email/DOC.md +1168 -0
  85. package/dist/prisma/docs/orm/javascript/DOC.md +1419 -0
  86. package/dist/prisma/docs/orm/python/DOC.md +1317 -0
  87. package/dist/qdrant/docs/vector-search/javascript/DOC.md +1221 -0
  88. package/dist/qdrant/docs/vector-search/python/DOC.md +1653 -0
  89. package/dist/rabbitmq/docs/message-queue/javascript/DOC.md +1193 -0
  90. package/dist/rabbitmq/docs/message-queue/python/DOC.md +1243 -0
  91. package/dist/razorpay/docs/payments/javascript/DOC.md +1219 -0
  92. package/dist/razorpay/docs/payments/python/DOC.md +1330 -0
  93. package/dist/redis/docs/key-value/javascript/DOC.md +1851 -0
  94. package/dist/redis/docs/key-value/python/DOC.md +2054 -0
  95. package/dist/registry.json +2817 -0
  96. package/dist/replicate/docs/model-hosting/DOC.md +1318 -0
  97. package/dist/resend/docs/email/DOC.md +1271 -0
  98. package/dist/salesforce/docs/crm/javascript/DOC.md +1241 -0
  99. package/dist/salesforce/docs/crm/python/DOC.md +1183 -0
  100. package/dist/search-index.json +1 -0
  101. package/dist/sendgrid/docs/email-api/javascript/DOC.md +371 -0
  102. package/dist/sendgrid/docs/email-api/python/DOC.md +656 -0
  103. package/dist/sentry/docs/error-tracking/javascript/DOC.md +1073 -0
  104. package/dist/sentry/docs/error-tracking/python/DOC.md +1309 -0
  105. package/dist/shopify/docs/storefront/DOC.md +457 -0
  106. package/dist/slack/docs/workspace/javascript/DOC.md +933 -0
  107. package/dist/slack/docs/workspace/python/DOC.md +271 -0
  108. package/dist/square/docs/payments/javascript/DOC.md +1855 -0
  109. package/dist/square/docs/payments/python/DOC.md +1728 -0
  110. package/dist/stripe/docs/api/DOC.md +1727 -0
  111. package/dist/stripe/docs/payments/DOC.md +1726 -0
  112. package/dist/stytch/docs/auth/javascript/DOC.md +1813 -0
  113. package/dist/stytch/docs/auth/python/DOC.md +1962 -0
  114. package/dist/supabase/docs/client/DOC.md +1606 -0
  115. package/dist/twilio/docs/messaging/python/DOC.md +469 -0
  116. package/dist/twilio/docs/messaging/typescript/DOC.md +946 -0
  117. package/dist/vercel/docs/platform/DOC.md +1940 -0
  118. package/dist/weaviate/docs/vector-db/javascript/DOC.md +1268 -0
  119. package/dist/weaviate/docs/vector-db/python/DOC.md +1388 -0
  120. package/dist/zendesk/docs/support/javascript/DOC.md +2150 -0
  121. package/dist/zendesk/docs/support/python/DOC.md +2297 -0
  122. package/package.json +22 -6
  123. package/skills/get-api-docs/SKILL.md +84 -0
  124. package/src/commands/annotate.js +83 -0
  125. package/src/commands/build.js +12 -1
  126. package/src/commands/feedback.js +150 -0
  127. package/src/commands/get.js +83 -42
  128. package/src/commands/search.js +7 -0
  129. package/src/index.js +43 -17
  130. package/src/lib/analytics.js +90 -0
  131. package/src/lib/annotations.js +57 -0
  132. package/src/lib/bm25.js +170 -0
  133. package/src/lib/cache.js +69 -6
  134. package/src/lib/config.js +8 -3
  135. package/src/lib/identity.js +99 -0
  136. package/src/lib/registry.js +103 -20
  137. package/src/lib/telemetry.js +86 -0
  138. package/src/mcp/server.js +177 -0
  139. package/src/mcp/tools.js +251 -0
@@ -0,0 +1,489 @@
1
+ ---
2
+ name: sdk
3
+ description: "Python SDK reference for LandingAI's Agentic Document Extraction (ADE). Includes Pydantic schema extraction, async processing, error handling, save_to, visual grounding, table cell lookup, and complete API context."
4
+ metadata:
5
+ languages: "python"
6
+ versions: "0.1.0"
7
+ updated-on: "2026-03-04"
8
+ source: maintainer
9
+ tags: "landingai,ade,python,sdk,pydantic,document-extraction,parse,extract,split,async"
10
+ ---
11
+
12
+ # LandingAI ADE — Python SDK Reference
13
+
14
+ Python SDK for LandingAI's Agentic Document Extraction.
15
+
16
+ ## Installation
17
+
18
+ ```bash
19
+ pip install landingai-ade
20
+ export VISION_AGENT_API_KEY="v2_..."
21
+ ```
22
+
23
+ ## Client Setup
24
+
25
+ ```python
26
+ from landingai_ade import LandingAIADE
27
+ from pathlib import Path
28
+
29
+ client = LandingAIADE() # Uses VISION_AGENT_API_KEY env var
30
+ ```
31
+
32
+ ### Constructor Arguments
33
+
34
+ | Parameter | Type | Default | Description |
35
+ |-----------|------|---------|-------------|
36
+ | `api_key` | `str \| None` | env `VISION_AGENT_API_KEY` | API key |
37
+ | `environment` | `"production" \| "eu"` | `"production"` | Region — `"production"` (US) or `"eu"` |
38
+ | `base_url` | `str \| None` | — | Override base URL |
39
+ | `timeout` | `float \| Timeout \| None` | SDK default | Request timeout in seconds |
40
+ | `max_retries` | `int` | SDK default | Max retry attempts for transient errors |
41
+ | `http_client` | `httpx.Client \| None` | — | Custom httpx client |
42
+
43
+ ```python
44
+ # EU region
45
+ client = LandingAIADE(environment="eu")
46
+
47
+ # Pass key directly
48
+ client = LandingAIADE(api_key="v2_...")
49
+ ```
50
+
51
+ ---
52
+
53
+ ## 1. Parse
54
+
55
+ Converts documents to structured markdown with visual grounding.
56
+
57
+ ### Arguments
58
+
59
+ | Parameter | Type | Required | Description |
60
+ |-----------|------|----------|-------------|
61
+ | `document` | `FileTypes \| None` | One required | Local file (Path, bytes, file-like) |
62
+ | `document_url` | `str \| None` | One required | Remote document URL |
63
+ | `model` | `str \| None` | No | Model version (default: `dpt-2-latest`) |
64
+ | `split` | `"page" \| None` | No | Split by pages |
65
+ | `save_to` | `str \| None` | No | Directory to save `{filename}_parse_output.json` |
66
+
67
+ ### Returns `ParseResponse`
68
+
69
+ ```
70
+ .markdown → str: full document as markdown
71
+ .chunks[] → Chunk: {id, type, markdown, grounding: {page, box}}
72
+ .grounding → dict: {id → Grounding} with bounding boxes and tableCell positions
73
+ .splits[] → Split: {chunks[], class, identifier, markdown, pages[]} (only if split="page")
74
+ .metadata → ParseMetadata: {filename, page_count, duration_ms, credit_usage, version, job_id, failed_pages}
75
+ ```
76
+
77
+ ### Example
78
+
79
+ ```python
80
+ response = client.parse(
81
+     document=Path("invoice.pdf"),
82
+     model="dpt-2-latest",
83
+     save_to="./output",
84
+ )
85
+
86
+ print(response.markdown)
87
+ print(f"{len(response.chunks)} chunks, {response.metadata.page_count} pages")
88
+
89
+ tables = [c for c in response.chunks if c.type == "table"]
90
+ ```
91
+
92
+ ### Visual Grounding and Table Cells
93
+
94
+ ```python
95
+ for chunk in response.chunks:
96
+     box = chunk.grounding.box
97
+     print(f"{chunk.type} on page {chunk.grounding.page}: "
98
+           f"({box.left:.3f}, {box.top:.3f}) → ({box.right:.3f}, {box.bottom:.3f})")
99
+
100
+ for gid, grounding in response.grounding.items():
101
+     if grounding.type == "tableCell":
102
+         pos = grounding.position
103
+         print(f"Cell ({pos.row}, {pos.col}) span=({pos.rowspan}x{pos.colspan})")
104
+ ```
105
+
106
+ ### Extract a Cell Value by Row and Column (PDF)
107
+
108
+ ```python
109
+ import re
110
+
111
+ table = next(c for c in response.chunks if c.type == "table")
112
+
113
+ rows = re.findall(r'<tr[^>]*>(.*?)</tr>', table.markdown, re.DOTALL)
114
+ grid = {}
115
+ for r, row_html in enumerate(rows):
116
+     for c, m in enumerate(re.finditer(r'<td[^>]*>(.*?)</td>', row_html, re.DOTALL)):
117
+         grid[(r, c)] = re.sub(r'<[^>]+>', '', m.group(1)).strip()
118
+
119
+ value = grid[(1, 0)] # zero-indexed row, col
120
+ ```
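The regex above only matches `<td>` cells, so a table whose markup also contains `<th>` header cells would lose those columns. As a hedged alternative, here is a minimal sketch that builds the same `(row, col)` grid with the standard-library `html.parser`. The `TableGrid` class and the sample HTML (including its element IDs) are illustrative assumptions, not part of the SDK, and the sketch does not expand `rowspan` or `colspan`.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect the text of every <td>/<th> cell into a {(row, col): text} dict."""

    def __init__(self):
        super().__init__()
        self.grid = {}
        self._row = -1
        self._col = -1
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # New row: advance the row counter and reset the column counter.
            self._row += 1
            self._col = -1
        elif tag in ("td", "th"):
            self._col += 1
            self._in_cell = True
            self.grid[(self._row, self._col)] = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
            key = (self._row, self._col)
            self.grid[key] = self.grid[key].strip()

    def handle_data(self, data):
        # Nested tags such as <b> are skipped; only their text is kept.
        if self._in_cell:
            self.grid[(self._row, self._col)] += data


# Hypothetical table markup standing in for `table.markdown`.
sample = (
    '<table><tr><th id="0-1">Item</th><th id="0-2">Qty</th></tr>'
    '<tr><td id="0-3">Widget</td><td id="0-4"><b>3</b></td></tr></table>'
)
parser = TableGrid()
parser.feed(sample)
parser.close()
print(parser.grid[(1, 1)])  # 3
```

With a real response, `parser.feed(table.markdown)` would replace the sample string.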
121
+
122
+ ### Read a Spreadsheet Cell by Reference
123
+
124
+ ```python
125
+ import re
126
+
127
+ response = client.parse(document=Path("report.xlsx"))
128
+ table = next(c for c in response.chunks if c.type == "table")
129
+
130
+ # Spreadsheet cell IDs are "{tab_name}-{cell_ref}" (e.g., "Sheet 1-B2").
131
+ # grounding is null for spreadsheets, so parse IDs directly from HTML.
132
+ cell_text = {}
133
+ for m in re.finditer(
134
+     r'<td[^>]*\bid=["\']([^"\']+)["\'][^>]*>(.*?)</td>',
135
+     table.markdown, re.DOTALL,
136
+ ):
137
+     cell_text[m.group(1)] = re.sub(r"<[^>]+>", "", m.group(2)).strip()
138
+
139
+ value = cell_text["Sheet 1-B2"]
140
+ ```
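Because spreadsheet positions are encoded in the IDs, it can help to turn an ID such as `Sheet 1-B2` into a tab name plus zero-based row and column indices. The sketch below is one way to do that. The helper names `split_cell_id` and `a1_to_index` are hypothetical, and splitting at the last hyphen is an assumption made so that tab names containing hyphens still work.

```python
import re


def split_cell_id(cell_id: str) -> tuple[str, str]:
    """Split "{tab_name}-{cell_ref}" at the last hyphen."""
    tab, _, ref = cell_id.rpartition("-")
    return tab, ref


def a1_to_index(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference such as "B2" to zero-based (row, col)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", ref)
    if m is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


tab, ref = split_cell_id("Sheet 1-B2")
print(tab, ref, a1_to_index(ref))  # Sheet 1 B2 (1, 1)
```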
141
+
142
+ ---
143
+
144
+ ## 2. Extract
145
+
146
+ Extracts structured data from markdown using a JSON schema.
147
+
148
+ ### Arguments
149
+
150
+ | Parameter | Type | Required | Description |
151
+ |-----------|------|----------|-------------|
152
+ | `schema` | `str` | Yes | JSON schema string (use `pydantic_to_json_schema()` to generate from Pydantic models) |
153
+ | `markdown` | `FileTypes \| str \| None` | One required | Markdown content, string, or file |
154
+ | `markdown_url` | `str \| None` | One required | URL to markdown |
155
+ | `model` | `str \| None` | No | Model version (default: `extract-latest`) |
156
+ | `save_to` | `str \| None` | No | Directory to save `{filename}_extract_output.json` |
157
+
158
+ ### Returns `ExtractResponse`
159
+
160
+ ```
161
+ .extraction → dict: extracted key-value pairs matching schema
162
+ .extraction_metadata → dict: {field → {references: [chunk_ids]}} for grounding
163
+ .metadata → Metadata: {credit_usage, duration_ms, filename, job_id, version, schema_violation_error}
164
+ ```
165
+
166
+ ### Pydantic Schema Extraction
167
+
168
+ ```python
169
+ from pydantic import BaseModel, Field
170
+ from landingai_ade.lib import pydantic_to_json_schema
171
+
172
+ class InvoiceData(BaseModel):
173
+     invoice_number: str = Field(description="Invoice number or ID")
174
+     total_amount: float = Field(description="Total amount to be paid")
175
+     vendor_name: str = Field(description="Vendor or supplier name")
176
+     line_items: list[dict] | None = Field(default=None, description="Line items")
177
+
178
+ # Parse once, extract many
179
+ parsed = client.parse(document=Path("invoice.pdf"))
180
+
181
+ response = client.extract(
182
+     markdown=parsed.markdown,
183
+     schema=pydantic_to_json_schema(InvoiceData),
184
+ )
185
+
186
+ invoice = InvoiceData(**response.extraction)
187
+ print(f"Invoice {invoice.invoice_number}: ${invoice.total_amount}")
188
+ ```
189
+
190
+ ### Grounding References (Tracing Back to Source)
191
+
192
+ ```python
193
+ chunk_map = {c.id: c for c in parsed.chunks}
194
+
195
+ for field, meta in response.extraction_metadata.items():
196
+     if meta.get("references"):
197
+         chunk = chunk_map.get(meta["references"][0])
198
+         if chunk:
199
+             print(f"{field}: page {chunk.grounding.page}, type={chunk.type}")
200
+ ```
201
+
202
+ ### `pydantic_to_json_schema(model)`
203
+
204
+ Converts a Pydantic `BaseModel` class to a resolved JSON schema string (all `$ref` inlined). Pass the result directly to `schema=`.
205
+
206
+ ```python
207
+ from landingai_ade.lib import pydantic_to_json_schema
208
+
209
+ schema_str = pydantic_to_json_schema(InvoiceData) # → JSON string
210
+ ```
211
+
212
+ ---
213
+
214
+ ## 3. Split
215
+
216
+ Classifies and splits mixed documents by type.
217
+
218
+ ### Arguments
219
+
220
+ | Parameter | Type | Required | Description |
221
+ |-----------|------|----------|-------------|
222
+ | `split_class` | `Iterable[SplitClass]` | Yes | List of `{"name": str, "description"?: str, "identifier"?: str}` |
223
+ | `markdown` | `FileTypes \| str \| None` | One required | Markdown content or file |
224
+ | `markdown_url` | `str \| None` | One required | URL to markdown |
225
+ | `model` | `str \| None` | No | Model version (default: `split-latest`) |
226
+ | `save_to` | `str \| None` | No | Directory to save `{filename}_split_output.json` |
227
+
228
+ ### Returns `SplitResponse`
229
+
230
+ ```
231
+ .splits[] → Split: {classification, identifier, markdowns[], pages[]}
232
+ .metadata → Metadata: {credit_usage, duration_ms, filename, page_count}
233
+ ```
234
+
235
+ ### Split → Extract Pipeline
236
+
237
+ ```python
238
+ parsed = client.parse(document=Path("mixed_invoices.pdf"))
239
+
240
+ splits = client.split(
241
+     markdown=parsed.markdown,
242
+     split_class=[
243
+         {"name": "Invoice", "description": "Sales invoice", "identifier": "Invoice Number"},
244
+         {"name": "Receipt", "description": "Payment receipt", "identifier": "Receipt Number"},
245
+     ],
246
+ )
247
+
248
+ for split in splits.splits:
249
+     print(f"{split.classification}: {split.identifier} (pages {split.pages})")
250
+
251
+ # Extract from each split
252
+ schema = pydantic_to_json_schema(InvoiceData)
253
+ results = []
254
+ for split in splits.splits:
255
+     extracted = client.extract(markdown=split.markdowns[0], schema=schema)
256
+     results.append({"type": split.classification, "id": split.identifier, **extracted.extraction})
257
+ ```
258
+
259
+ ---
260
+
261
+ ## 4. Parse Jobs (Async, Large Files)
262
+
263
+ For files >50MB, use asynchronous processing.
264
+
265
+ ### `parse_jobs.create()` Arguments
266
+
267
+ | Parameter | Type | Required | Description |
268
+ |-----------|------|----------|-------------|
269
+ | `document` | `FileTypes \| None` | One required | Local file |
270
+ | `document_url` | `str \| None` | One required | Remote document URL |
271
+ | `model` | `str \| None` | No | Model version (default: `dpt-2-latest`) |
272
+ | `split` | `"page" \| None` | No | Split by pages |
273
+ | `output_save_url` | `str \| None` | If ZDR | URL for zero data retention output |
274
+
275
+ ### Returns `ParseJobCreateResponse`
276
+
277
+ ```
278
+ .job_id → str: unique job identifier
279
+ ```
280
+
281
+ ### `parse_jobs.get(job_id)` Returns `ParseJobGetResponse`
282
+
283
+ ```
284
+ .job_id → str
285
+ .status → str: pending|processing|completed|failed|cancelled
286
+ .progress → float: 0.0 to 1.0
287
+ .failure_reason → str | None: error message if failed
288
+ .data → ParseResponse | None: full result when completed
289
+ .output_url → str | None: presigned URL if result >1MB (expires 1hr)
290
+ ```
291
+
292
+ ### `parse_jobs.list()` Arguments & Returns
293
+
294
+ | Parameter | Type | Required | Description |
295
+ |-----------|------|----------|-------------|
296
+ | `status` | `"pending" \| "processing" \| "completed" \| "failed" \| "cancelled"` | No | Filter by status |
297
+ | `page` | `int \| None` | No | Page number (0-indexed) |
298
+ | `page_size` | `int \| None` | No | Items per page |
299
+
300
+ ```
301
+ .jobs[] → Job: {job_id, status, progress, received_at, failure_reason}
302
+ .has_more → bool | None
303
+ ```
304
+
305
+ ### Example
306
+
307
+ ```python
308
+ import time
309
+
310
+ job = client.parse_jobs.create(document=Path("large.pdf"))
311
+ print(f"Job ID: {job.job_id}")
312
+
313
+ while True:
314
+     status = client.parse_jobs.get(job.job_id)
315
+     print(f"Status: {status.status}, Progress: {status.progress * 100:.0f}%")
316
+
317
+     if status.status == "completed":
318
+         result = status.data # ParseResponse
319
+         break
320
+     elif status.status in ("failed", "cancelled"):  # both are terminal; stop polling
321
+         raise RuntimeError(f"Job {status.status}: {status.failure_reason}")
322
+
323
+     time.sleep(5)
324
+ ```
325
+
326
+ ---
327
+
328
+ ## Error Handling
329
+
330
+ ### Exception Classes
331
+
332
+ All exceptions inherit from `LandingAiadeError`:
333
+
334
+ | Exception | HTTP Status | Description |
335
+ |-----------|-------------|-------------|
336
+ | `BadRequestError` | 400 | Invalid parameters |
337
+ | `AuthenticationError` | 401 | Invalid API key |
338
+ | `PermissionDeniedError` | 403 | Forbidden |
339
+ | `NotFoundError` | 404 | Resource not found |
340
+ | `UnprocessableEntityError` | 422 | Invalid file type or malformed schema |
341
+ | `RateLimitError` | 429 | Too many requests |
342
+ | `InternalServerError` | 5xx | Server error |
343
+ | `APIConnectionError` | — | Network failure |
344
+ | `APITimeoutError` | — | Request timeout |
345
+
346
+ `APIStatusError` is the base for all HTTP errors and has a `status_code` attribute.
347
+
348
+ ### Retry with Fallback to Jobs
349
+
350
+ ```python
351
+ from landingai_ade import RateLimitError, APITimeoutError, APIStatusError, APIConnectionError
352
+
353
+ def parse_with_retry(client, file_path, max_retries=3):
354
+     for attempt in range(max_retries):
355
+         try:
356
+             return client.parse(document=Path(file_path))
357
+         except RateLimitError:
358
+             time.sleep(2 ** attempt * 10)
359
+         except (APITimeoutError, APIStatusError) as e:
360
+             if isinstance(e, APIStatusError) and e.status_code not in (413, 504):
361
+                 raise
362
+             print("Timeout or too large — switching to parse jobs")
363
+             job = client.parse_jobs.create(document=Path(file_path))
364
+             return poll_job(client, job.job_id)
365
+         except APIConnectionError:
366
+             time.sleep(2)
367
+     raise RuntimeError("Failed after retries")
368
+
369
+ def poll_job(client, job_id, timeout=300):
370
+     start = time.time()
371
+     while time.time() - start < timeout:
372
+         status = client.parse_jobs.get(job_id)
373
+         if status.status == "completed":
374
+             return status.data
375
+         if status.status in ("failed", "cancelled"):  # both are terminal; stop polling
376
+             raise RuntimeError(f"Job {status.status}: {status.failure_reason}")
377
+         time.sleep(5)
378
+     raise TimeoutError("Job did not complete in time")
379
+ ```
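The rate-limit branch above sleeps for `2 ** attempt * 10` seconds, which grows without bound as `max_retries` increases. A minimal sketch of a capped variant with optional jitter follows. The helper `backoff_delay` and its `base`, `cap`, and `rng` parameters are hypothetical and not part of the SDK.

```python
import random


def backoff_delay(attempt, base=10.0, cap=120.0, rng=None):
    """Return base * 2**attempt seconds, capped at `cap`.

    If `rng` is given, apply full jitter: a uniform draw between 0 and the
    capped delay, which spreads out clients that were throttled together.
    """
    delay = min(cap, base * 2 ** attempt)
    if rng is None:
        return delay
    return rng.uniform(0, delay)


print([backoff_delay(a) for a in range(5)])  # [10.0, 20.0, 40.0, 80.0, 120.0]
print(backoff_delay(2, rng=random.Random()))  # some value between 0 and 40
```

In `parse_with_retry`, the call `time.sleep(backoff_delay(attempt))` could stand in for the fixed formula.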
380
+
381
+ ---
382
+
383
+ ## Async / Concurrent Processing
384
+
385
+ ```python
386
+ import asyncio
387
+ from landingai_ade import AsyncLandingAIADE
388
+
389
+ async def parse_multiple(files: list[str]):
390
+     client = AsyncLandingAIADE()
391
+     tasks = [client.parse(document=Path(f)) for f in files]
392
+     results = await asyncio.gather(*tasks, return_exceptions=True)
393
+     return [r for r in results if not isinstance(r, Exception)]
394
+ ```
395
+
396
+ `AsyncLandingAIADE` has the same constructor and methods as `LandingAIADE` — all methods are `async`.
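Launching every parse at once with `asyncio.gather` may run into rate limits on large batches. The sketch below bounds the number of in-flight calls with `asyncio.Semaphore` and keeps failures separate from successes. The names `gather_bounded` and `fake_parse` are hypothetical; `fake_parse` is only a stand-in for `await client.parse(document=Path(name))` so the example runs without the SDK or network access.

```python
import asyncio


async def gather_bounded(items, worker, limit=4):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def run(item):
        async with sem:
            return await worker(item)

    # return_exceptions=True keeps one failure from discarding the other results.
    return await asyncio.gather(*(run(i) for i in items), return_exceptions=True)


async def fake_parse(name):
    # Hypothetical stand-in for the real async parse call.
    await asyncio.sleep(0.01)
    if name.endswith(".bad"):
        raise ValueError(f"cannot parse {name}")
    return f"parsed:{name}"


results = asyncio.run(gather_bounded(["a.pdf", "b.bad", "c.pdf"], fake_parse, limit=2))
ok = [r for r in results if not isinstance(r, Exception)]
failed = [r for r in results if isinstance(r, Exception)]
print(ok, len(failed))  # ['parsed:a.pdf', 'parsed:c.pdf'] 1
```

Results come back in input order, so failures can be matched to file names by position.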
397
+
398
+ ---
399
+
400
+ ## API Reference
401
+
402
+ The following sections provide the complete API context so this document is fully self-contained.
403
+
404
+ ### Base Configuration
405
+
406
+ | Region | Base URL |
407
+ |--------|----------|
408
+ | US (default) | `https://api.va.landing.ai/v1/ade` |
409
+ | EU | `https://api.va.eu-west-1.landing.ai/v1/ade` |
410
+
411
+ **Authentication**: All requests require `Authorization: Bearer $VISION_AGENT_API_KEY`
412
+
413
+ ### Quick Reference
414
+
415
+ | Endpoint | Method | Path | Model | Input |
416
+ |----------|--------|------|-------|-------|
417
+ | Parse | POST | `/v1/ade/parse` | `dpt-2-latest` | `document` (file) or `document_url` |
418
+ | Extract | POST | `/v1/ade/extract` | `extract-latest` | `markdown` (file/string) or `markdown_url` + `schema` |
419
+ | Split | POST | `/v1/ade/split` | `split-latest` | `markdown` (file/string) or `markdown_url` + `split_class` |
420
+ | Create Job | POST | `/v1/ade/parse/jobs` | `dpt-2-latest` | `document` or `document_url` |
421
+ | Get Job | GET | `/v1/ade/parse/jobs/{id}` | — | — |
422
+ | List Jobs | GET | `/v1/ade/parse/jobs` | — | `?status=&page=&pageSize=` |
423
+
424
+ ### Data Types
425
+
426
+ #### Chunk Types
427
+ - `text` — Characters, paragraphs, headings, lists, form fields, checkboxes, code blocks
428
+ - `table` — Grid of rows and columns; includes spreadsheets and receipts
429
+ - `figure` — Visual/graphical non-text content — images, graphs, flowcharts, diagrams
430
+ - `marginalia` — Content in document margins — headers, footers, page numbers, handwritten notes
431
+ - `logo` — Logos (DPT-2 only)
432
+ - `card` — ID cards and driver's licenses (DPT-2 only)
433
+ - `attestation` — Signatures, stamps, and seals (DPT-2 only)
434
+ - `scan_code` — QR codes and barcodes (DPT-2 only)
435
+
436
+ #### Grounding Types
437
+ - Chunk grounding: `chunkText`, `chunkTable`, `chunkFigure`, `chunkMarginalia`, `chunkLogo`, `chunkCard`, `chunkAttestation`, `chunkScanCode`
438
+ - Structure: `table`, `tableCell` (with position data)
439
+
440
+ #### Bounding Box
441
+ All coordinates normalized 0–1: `{ left, top, right, bottom }`.
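Since the coordinates are normalized, drawing a box on a rendered page means scaling by that page's pixel size. A minimal sketch, assuming a rendered page of known width and height; the `Box` dataclass and `to_pixels` helper are illustrative stand-ins and not SDK types:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Stand-in for a grounding box with normalized 0-1 coordinates."""
    left: float
    top: float
    right: float
    bottom: float


def to_pixels(box, page_width, page_height):
    """Scale a normalized box to integer pixel coordinates (left, top, right, bottom)."""
    return (
        round(box.left * page_width),
        round(box.top * page_height),
        round(box.right * page_width),
        round(box.bottom * page_height),
    )


print(to_pixels(Box(0.1, 0.25, 0.5, 0.75), 1000, 2000))  # (100, 500, 500, 1500)
```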
442
+
443
+ #### Table Cell Position
444
+ `{ row, col, rowspan, colspan, chunk_id }` — zero-indexed.
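A cell with `rowspan` or `colspan` greater than 1 covers several grid positions. The sketch below expands such cells into a dense `(row, col)` lookup. It uses plain dicts with the position fields listed above plus a hypothetical `label` key added purely for illustration; real grounding objects may expose these fields as attributes instead.

```python
def expand_spans(cells):
    """Map every (row, col) covered by a cell to that cell's label."""
    grid = {}
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell["rowspan"]):
            for c in range(cell["col"], cell["col"] + cell["colspan"]):
                grid[(r, c)] = cell["label"]
    return grid


# Hypothetical positions: a header spanning two columns above two plain cells.
cells = [
    {"row": 0, "col": 0, "rowspan": 1, "colspan": 2, "label": "header"},
    {"row": 1, "col": 0, "rowspan": 1, "colspan": 1, "label": "a"},
    {"row": 1, "col": 1, "rowspan": 1, "colspan": 1, "label": "b"},
]
grid = expand_spans(cells)
print(grid[(0, 1)])  # header
```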
445
+
446
+ #### Table Chunk Formats
447
+
448
+ **PDF/Image tables**: Element IDs use `{page}-{base62_seq}`. Grounding object has bounding boxes and `tableCell` entries.
449
+
450
+ **Spreadsheet tables (XLSX/CSV)**: Element IDs use `{tab_name}-{cell_ref}` (e.g., `Sheet 1-B2`). **Grounding is null** — positions are encoded in IDs.
451
+
452
+ ### Error Codes
453
+
454
+ | Status | Error Type | Description | Solution |
455
+ |--------|------------|-------------|----------|
456
+ | 400 | `validation_error` | Invalid parameters | Check request format |
457
+ | 401 | `authentication_error` | Invalid API key | Check VISION_AGENT_API_KEY |
458
+ | 413 | `payload_too_large` | File too large | Use Parse Jobs API |
459
+ | 422 | `unprocessable_entity` | Invalid file type or malformed schema | Validate file format and schema JSON |
460
+ | 429 | `rate_limit_error` | Too many requests | Implement backoff |
461
+ | 500 | `internal_error` | Server error | Retry with backoff |
462
+ | 504 | `timeout_error` | Request timeout | Use Parse Jobs API |
463
+
464
+ ### Supported File Types
465
+
466
+ | Category | Formats | Notes |
467
+ |----------|---------|-------|
468
+ | **PDF** | PDF | Up to 100 pages; no password-protected files |
469
+ | **Images** | JPEG, JPG, PNG, APNG, BMP, DCX, DDS, DIB, GD, GIF, ICNS, JP2, PCX, PPM, PSD, TGA, TIF, TIFF, WEBP | |
470
+ | **Text Documents** | DOC, DOCX, ODT | Converted to PDF before parsing |
471
+ | **Presentations** | ODP, PPT, PPTX | Converted to PDF before parsing |
472
+ | **Spreadsheets** | CSV, XLSX | Up to 10 MB in Playground; no sheet/column/row limits |
473
+
474
+ > **Note:** Word, PowerPoint, and OpenDocument files are converted to PDF server-side before parsing.
475
+
476
+ ### Model Versions
477
+
478
+ | Operation | Current Version | Description |
479
+ |-----------|----------------|-------------|
480
+ | Parse | `dpt-2-latest` | Document parsing and OCR |
481
+ | Extract | `extract-latest` | Schema-based extraction |
482
+ | Split | `split-latest` | Document classification |
483
+
484
+ ---
485
+
486
+ ## External Links
487
+
488
+ - [Python SDK Documentation](https://docs.landing.ai/ade/ade-python)
489
+ - [Python SDK GitHub](https://github.com/landing-ai/ade-python)