chub-dev 0.1.0 → 0.1.2-beta.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (139)
  1. package/README.md +55 -0
  2. package/bin/chub-mcp +2 -0
  3. package/dist/airtable/docs/database/javascript/DOC.md +1437 -0
  4. package/dist/airtable/docs/database/python/DOC.md +1735 -0
  5. package/dist/amplitude/docs/analytics/javascript/DOC.md +1282 -0
  6. package/dist/amplitude/docs/analytics/python/DOC.md +1199 -0
  7. package/dist/anthropic/docs/claude-api/javascript/DOC.md +503 -0
  8. package/dist/anthropic/docs/claude-api/python/DOC.md +389 -0
  9. package/dist/asana/docs/tasks/DOC.md +1396 -0
  10. package/dist/assemblyai/docs/transcription/DOC.md +1043 -0
  11. package/dist/atlassian/docs/confluence/javascript/DOC.md +1347 -0
  12. package/dist/atlassian/docs/confluence/python/DOC.md +1604 -0
  13. package/dist/auth0/docs/identity/javascript/DOC.md +968 -0
  14. package/dist/auth0/docs/identity/python/DOC.md +1199 -0
  15. package/dist/aws/docs/s3/javascript/DOC.md +1773 -0
  16. package/dist/aws/docs/s3/python/DOC.md +1807 -0
  17. package/dist/binance/docs/trading/javascript/DOC.md +1315 -0
  18. package/dist/binance/docs/trading/python/DOC.md +1454 -0
  19. package/dist/braintree/docs/gateway/javascript/DOC.md +1278 -0
  20. package/dist/braintree/docs/gateway/python/DOC.md +1179 -0
  21. package/dist/chromadb/docs/embeddings-db/javascript/DOC.md +1263 -0
  22. package/dist/chromadb/docs/embeddings-db/python/DOC.md +1707 -0
  23. package/dist/clerk/docs/auth/javascript/DOC.md +1220 -0
  24. package/dist/clerk/docs/auth/python/DOC.md +274 -0
  25. package/dist/cloudflare/docs/workers/javascript/DOC.md +918 -0
  26. package/dist/cloudflare/docs/workers/python/DOC.md +994 -0
  27. package/dist/cockroachdb/docs/distributed-db/DOC.md +1500 -0
  28. package/dist/cohere/docs/llm/DOC.md +1335 -0
  29. package/dist/datadog/docs/monitoring/javascript/DOC.md +1740 -0
  30. package/dist/datadog/docs/monitoring/python/DOC.md +1815 -0
  31. package/dist/deepgram/docs/speech/javascript/DOC.md +885 -0
  32. package/dist/deepgram/docs/speech/python/DOC.md +685 -0
  33. package/dist/deepl/docs/translation/javascript/DOC.md +887 -0
  34. package/dist/deepl/docs/translation/python/DOC.md +944 -0
  35. package/dist/deepseek/docs/llm/DOC.md +1220 -0
  36. package/dist/directus/docs/headless-cms/javascript/DOC.md +1128 -0
  37. package/dist/directus/docs/headless-cms/python/DOC.md +1276 -0
  38. package/dist/discord/docs/bot/javascript/DOC.md +1090 -0
  39. package/dist/discord/docs/bot/python/DOC.md +1130 -0
  40. package/dist/elasticsearch/docs/search/DOC.md +1634 -0
  41. package/dist/elevenlabs/docs/text-to-speech/javascript/DOC.md +336 -0
  42. package/dist/elevenlabs/docs/text-to-speech/python/DOC.md +552 -0
  43. package/dist/firebase/docs/auth/DOC.md +1015 -0
  44. package/dist/gemini/docs/genai/javascript/DOC.md +691 -0
  45. package/dist/gemini/docs/genai/python/DOC.md +555 -0
  46. package/dist/github/docs/octokit/DOC.md +1560 -0
  47. package/dist/google/docs/bigquery/javascript/DOC.md +1688 -0
  48. package/dist/google/docs/bigquery/python/DOC.md +1503 -0
  49. package/dist/hubspot/docs/crm/javascript/DOC.md +1805 -0
  50. package/dist/hubspot/docs/crm/python/DOC.md +2033 -0
  51. package/dist/huggingface/docs/transformers/DOC.md +948 -0
  52. package/dist/intercom/docs/messaging/javascript/DOC.md +1844 -0
  53. package/dist/intercom/docs/messaging/python/DOC.md +1797 -0
  54. package/dist/jira/docs/issues/javascript/DOC.md +1420 -0
  55. package/dist/jira/docs/issues/python/DOC.md +1492 -0
  56. package/dist/kafka/docs/streaming/javascript/DOC.md +1671 -0
  57. package/dist/kafka/docs/streaming/python/DOC.md +1464 -0
  58. package/dist/landingai-ade/docs/api/DOC.md +620 -0
  59. package/dist/landingai-ade/docs/sdk/python/DOC.md +489 -0
  60. package/dist/landingai-ade/docs/sdk/typescript/DOC.md +542 -0
  61. package/dist/landingai-ade/skills/SKILL.md +489 -0
  62. package/dist/launchdarkly/docs/feature-flags/javascript/DOC.md +1191 -0
  63. package/dist/launchdarkly/docs/feature-flags/python/DOC.md +1671 -0
  64. package/dist/linear/docs/tracker/DOC.md +1554 -0
  65. package/dist/livekit/docs/realtime/javascript/DOC.md +303 -0
  66. package/dist/livekit/docs/realtime/python/DOC.md +163 -0
  67. package/dist/mailchimp/docs/marketing/DOC.md +1420 -0
  68. package/dist/meilisearch/docs/search/DOC.md +1241 -0
  69. package/dist/microsoft/docs/onedrive/javascript/DOC.md +1421 -0
  70. package/dist/microsoft/docs/onedrive/python/DOC.md +1549 -0
  71. package/dist/mongodb/docs/atlas/DOC.md +2041 -0
  72. package/dist/notion/docs/workspace-api/javascript/DOC.md +1435 -0
  73. package/dist/notion/docs/workspace-api/python/DOC.md +1400 -0
  74. package/dist/okta/docs/identity/javascript/DOC.md +1171 -0
  75. package/dist/okta/docs/identity/python/DOC.md +1401 -0
  76. package/dist/openai/docs/chat/javascript/DOC.md +407 -0
  77. package/dist/openai/docs/chat/python/DOC.md +568 -0
  78. package/dist/paypal/docs/checkout/DOC.md +278 -0
  79. package/dist/pinecone/docs/sdk/javascript/DOC.md +984 -0
  80. package/dist/pinecone/docs/sdk/python/DOC.md +1395 -0
  81. package/dist/plaid/docs/banking/javascript/DOC.md +1163 -0
  82. package/dist/plaid/docs/banking/python/DOC.md +1203 -0
  83. package/dist/playwright-community/skills/login-flows/SKILL.md +108 -0
  84. package/dist/postmark/docs/transactional-email/DOC.md +1168 -0
  85. package/dist/prisma/docs/orm/javascript/DOC.md +1419 -0
  86. package/dist/prisma/docs/orm/python/DOC.md +1317 -0
  87. package/dist/qdrant/docs/vector-search/javascript/DOC.md +1221 -0
  88. package/dist/qdrant/docs/vector-search/python/DOC.md +1653 -0
  89. package/dist/rabbitmq/docs/message-queue/javascript/DOC.md +1193 -0
  90. package/dist/rabbitmq/docs/message-queue/python/DOC.md +1243 -0
  91. package/dist/razorpay/docs/payments/javascript/DOC.md +1219 -0
  92. package/dist/razorpay/docs/payments/python/DOC.md +1330 -0
  93. package/dist/redis/docs/key-value/javascript/DOC.md +1851 -0
  94. package/dist/redis/docs/key-value/python/DOC.md +2054 -0
  95. package/dist/registry.json +2817 -0
  96. package/dist/replicate/docs/model-hosting/DOC.md +1318 -0
  97. package/dist/resend/docs/email/DOC.md +1271 -0
  98. package/dist/salesforce/docs/crm/javascript/DOC.md +1241 -0
  99. package/dist/salesforce/docs/crm/python/DOC.md +1183 -0
  100. package/dist/search-index.json +1 -0
  101. package/dist/sendgrid/docs/email-api/javascript/DOC.md +371 -0
  102. package/dist/sendgrid/docs/email-api/python/DOC.md +656 -0
  103. package/dist/sentry/docs/error-tracking/javascript/DOC.md +1073 -0
  104. package/dist/sentry/docs/error-tracking/python/DOC.md +1309 -0
  105. package/dist/shopify/docs/storefront/DOC.md +457 -0
  106. package/dist/slack/docs/workspace/javascript/DOC.md +933 -0
  107. package/dist/slack/docs/workspace/python/DOC.md +271 -0
  108. package/dist/square/docs/payments/javascript/DOC.md +1855 -0
  109. package/dist/square/docs/payments/python/DOC.md +1728 -0
  110. package/dist/stripe/docs/api/DOC.md +1727 -0
  111. package/dist/stripe/docs/payments/DOC.md +1726 -0
  112. package/dist/stytch/docs/auth/javascript/DOC.md +1813 -0
  113. package/dist/stytch/docs/auth/python/DOC.md +1962 -0
  114. package/dist/supabase/docs/client/DOC.md +1606 -0
  115. package/dist/twilio/docs/messaging/python/DOC.md +469 -0
  116. package/dist/twilio/docs/messaging/typescript/DOC.md +946 -0
  117. package/dist/vercel/docs/platform/DOC.md +1940 -0
  118. package/dist/weaviate/docs/vector-db/javascript/DOC.md +1268 -0
  119. package/dist/weaviate/docs/vector-db/python/DOC.md +1388 -0
  120. package/dist/zendesk/docs/support/javascript/DOC.md +2150 -0
  121. package/dist/zendesk/docs/support/python/DOC.md +2297 -0
  122. package/package.json +22 -6
  123. package/skills/get-api-docs/SKILL.md +84 -0
  124. package/src/commands/annotate.js +83 -0
  125. package/src/commands/build.js +12 -1
  126. package/src/commands/feedback.js +150 -0
  127. package/src/commands/get.js +83 -42
  128. package/src/commands/search.js +7 -0
  129. package/src/index.js +43 -17
  130. package/src/lib/analytics.js +90 -0
  131. package/src/lib/annotations.js +57 -0
  132. package/src/lib/bm25.js +170 -0
  133. package/src/lib/cache.js +69 -6
  134. package/src/lib/config.js +8 -3
  135. package/src/lib/identity.js +99 -0
  136. package/src/lib/registry.js +103 -20
  137. package/src/lib/telemetry.js +86 -0
  138. package/src/mcp/server.js +177 -0
  139. package/src/mcp/tools.js +251 -0
@@ -0,0 +1,489 @@
1
+ ---
2
+ name: skill
3
+ description: Use when the user mentions document parsing, PDF extraction, OCR, markdown extraction, structured data extraction from documents, document classification/splitting, LandingAI, ADE API, or wants to pull data out of a PDF/image/spreadsheet
4
+ metadata:
5
+ updated-on: "2026-03-04"
6
+ source: maintainer
7
+ tags: "landingai,ade,document-extraction,parse,extract,split,ocr,pdf"
8
+ ---
9
+
10
+ # LandingAI ADE — Interactive Document Extraction
11
+
12
+ ## Overview
13
+
14
+ Guided wizard for LandingAI's Agentic Document Extraction API. Collects all config from the user via `AskUserQuestion`, then executes `curl` commands via `Bash`. Never write Python — always use curl.
15
+
16
+ ## When to Use
17
+
18
+ - User wants to parse a document (PDF, image, spreadsheet) into markdown
19
+ - User wants to extract structured fields from a document
20
+ - User wants to classify/split a multi-document PDF
21
+ - User mentions LandingAI, ADE, vision agent, document AI
22
+
23
+ ## CRITICAL RULES
24
+
25
+ 1. **Extract and Split accept MARKDOWN, not raw files.** Always parse first if the user has a PDF/image.
26
+ 2. **Auth is Bearer, not Basic.** Header: `Authorization: Bearer $VISION_AGENT_API_KEY`
27
+ 3. **File field names:** `document` for parse, `markdown` for extract/split. Never `pdf`, `file`, etc.
28
+ 4. **Always use `-F` (multipart form), never `-d` (JSON body).**
29
+ 5. **Use `jq -r` when extracting markdown** to avoid escaped/quoted strings.
30
+ 6. **NEVER read full output files into your context.** See Token Warnings below.
31
+
32
+ ## TOKEN WARNINGS
33
+
34
+ Parse output is ~55,000 tokens per page (grounding bounding boxes = ~36,000 of that). 10 pages = ~550,000 tokens.
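The arithmetic in this warning can be wrapped in a small planning helper. This is only a sketch: `estimate_parse_tokens` is a hypothetical name, and the 55,000 tokens-per-page figure is the approximation quoted here, not a measured constant.

```bash
# Hypothetical helper: estimate parse-output size in tokens for a page count.
estimate_parse_tokens() {
  local pages="$1"
  local tokens_per_page=55000   # approximate per-page figure from this document
  echo $(( pages * tokens_per_page ))
}

estimate_parse_tokens 10
```

An estimate like this helps decide up front whether output can ever be shown in full or must always be queried with `jq`.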
35
+
36
+ **Output handling rules:**
37
+ - **Always pipe curl output to a file:** `| jq . > output.json`
38
+ - **Show a small jq summary** after each operation (see Step 4)
39
+ - **If the user wants to see the full output:** use `cat` via Bash so it displays in their terminal
40
+ - **If the user wants to analyze the output:** use Bash commands (`jq`, `grep`, `wc`, `head`, `tail`, etc.) to query the file and return only the targeted answer. Do NOT read the whole file into context. Examples:
41
+ ```bash
42
+ # Count chunks by type
43
+ jq '[.chunks[] | .type] | group_by(.) | map({type: .[0], count: length})' output.json
44
+ # Find chunks containing a keyword
45
+ jq '.chunks[] | select(.markdown | test("invoice"; "i")) | {id, type, markdown}' output.json
46
+ # Get page count
47
+ jq '.metadata.page_count' output.json
48
+ # List all unique grounding types
49
+ jq '[.grounding | to_entries[].value.type] | unique' output.json
50
+ ```
51
+ - **Never use the Read tool** on parse/extract/split output files
52
+ - **For markdown preview:** `head -20` on the saved .md file
53
+
54
+ **Summary queries (use after every operation):**
55
+ ```bash
56
+ # Parse (~430 tokens instead of ~55,000):
57
+ jq '{md_preview: (.markdown | .[0:500]), chunks: (.chunks | length), types: ([.chunks[].type] | unique), metadata: .metadata}' parse_output.json
58
+ # Extract (~500 tokens):
59
+ jq '.extraction' extract_output.json
60
+ # Split (~200 tokens):
61
+ jq '[.splits[] | {classification, pages, identifier}]' split_output.json
62
+ ```
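One way to make sure the right summary always runs is to look up the filter by operation name. The sketch below assumes a hypothetical helper called `summary_filter`; the filter strings themselves are copied from the summary queries listed here.

```bash
# Hypothetical helper: print the jq summary filter for one operation.
summary_filter() {
  case "$1" in
    parse)
      printf '%s\n' '{md_preview: (.markdown | .[0:500]), chunks: (.chunks | length), types: ([.chunks[].type] | unique), metadata: .metadata}'
      ;;
    extract) printf '%s\n' '.extraction' ;;
    split)   printf '%s\n' '[.splits[] | {classification, pages, identifier}]' ;;
    *)       echo "unknown operation: $1" >&2; return 1 ;;
  esac
}

# Example use (requires jq and a saved output file):
#   jq "$(summary_filter extract)" extract_output.json
```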
63
+
64
+ ## Workflow
65
+
66
+ ```dot
67
+ digraph ade_wizard {
68
+ rankdir=TB;
69
+ node [shape=box];
70
+
71
+ collect_config [label="Step 1: Collect Config\n(API key, region, output dir)"];
72
+ choose_op [label="Step 2: Choose Operation" shape=diamond];
73
+ parse [label="Parse"];
74
+ extract [label="Extract"];
75
+ split [label="Split"];
76
+ parse_job [label="Parse Job (async)"];
77
+
78
+ extract_has_md [label="Has markdown already?" shape=diamond];
79
+ split_has_md [label="Has markdown already?" shape=diamond];
80
+
81
+ parse_first_e [label="Parse first → get markdown"];
82
+ parse_first_s [label="Parse first → get markdown"];
83
+ show_preview [label="Show markdown preview to user"];
84
+ show_preview_s [label="Show markdown preview to user"];
85
+
86
+ schema_choice [label="Schema: build new, load file,\nor generate from doc?" shape=diamond];
87
+ build_schema [label="Interactive schema builder\n(iterate until done)"];
88
+ load_schema [label="Load schema from file"];
89
+ gen_schema [label="Generate schema from\ndocument content"];
90
+ save_schema [label="Save schema to file?"];
91
+
92
+ split_choice [label="Split config: build new\nor load file?" shape=diamond];
93
+ build_split [label="Interactive split builder\n(iterate until done)"];
94
+ load_split [label="Load split config from file"];
95
+ save_split [label="Save split config to file?"];
96
+
97
+ run_extract [label="Run extract curl"];
98
+ run_split [label="Run split curl"];
99
+ run_parse [label="Run parse curl"];
100
+ run_job [label="Run parse job curl\n(poll if requested)"];
101
+
102
+ save_output [label="Save output to file"];
103
+
104
+ collect_config -> choose_op;
105
+ choose_op -> parse [label="Parse"];
106
+ choose_op -> extract [label="Extract"];
107
+ choose_op -> split [label="Split"];
108
+ choose_op -> parse_job [label="Parse Job"];
109
+
110
+ parse -> run_parse -> save_output;
111
+ parse_job -> run_job -> save_output;
112
+
113
+ extract -> extract_has_md;
114
+ extract_has_md -> parse_first_e [label="no"];
115
+ extract_has_md -> schema_choice [label="yes"];
116
+ parse_first_e -> show_preview -> schema_choice;
117
+ schema_choice -> build_schema [label="build"];
118
+ schema_choice -> load_schema [label="load"];
119
+ schema_choice -> gen_schema [label="generate"];
120
+ build_schema -> save_schema -> run_extract -> save_output;
121
+ load_schema -> run_extract;
122
+ gen_schema -> save_schema;
123
+
124
+ split -> split_has_md;
125
+ split_has_md -> parse_first_s [label="no"];
126
+ split_has_md -> split_choice [label="yes"];
127
+ parse_first_s -> show_preview_s -> split_choice;
128
+ split_choice -> build_split [label="build"];
129
+ split_choice -> load_split [label="load"];
130
+ build_split -> save_split -> run_split -> save_output;
131
+ load_split -> run_split;
132
+ }
133
+ ```
134
+
135
+ ## Step 1: Collect Configuration
136
+
137
+ Use `AskUserQuestion` to gather ALL of these upfront:
138
+
139
+ **Question 1 — API Key:**
140
+
141
+ > "What is your VISION_AGENT_API_KEY? (Type 'env' if it's already set as an environment variable)"
142
+
143
+ - If `env`: use `$VISION_AGENT_API_KEY` in all commands. Validate with:
144
+ ```bash
145
+ if [ -z "$VISION_AGENT_API_KEY" ]; then echo "ERROR: VISION_AGENT_API_KEY not set"; fi
146
+ ```
147
+ - Otherwise: store the provided value and use it directly in commands.
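The two branches of this question could be folded into one function, so that later commands only ever see a single `API_KEY` value. This is a minimal sketch; `resolve_api_key` is a hypothetical name, not part of the skill.

```bash
# Hypothetical helper: turn the user's answer into the key used in commands.
resolve_api_key() {
  local answer="$1"
  if [ "$answer" = "env" ]; then
    # The user says the key is already exported: verify before using it.
    if [ -z "${VISION_AGENT_API_KEY:-}" ]; then
      echo "ERROR: VISION_AGENT_API_KEY not set" >&2
      return 1
    fi
    printf '%s\n' "$VISION_AGENT_API_KEY"
  else
    # The user typed the key itself.
    printf '%s\n' "$answer"
  fi
}

# Example use:
#   API_KEY=$(resolve_api_key env) || exit 1
```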
148
+
149
+ **Question 2 — Region:**
150
+
151
+ > Options: `US (default)`, `EU`
152
+
153
+ | Region | Base URL |
154
+ | ------ | ------------------------------------- |
155
+ | US | `https://api.va.landing.ai` |
156
+ | EU | `https://api.va.eu-west-1.landing.ai` |
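The region table maps directly onto a `case` statement. In the sketch below, `base_url_for_region` is a hypothetical helper name; the two URLs are the ones from the table, and an empty answer is treated as the US default.

```bash
# Hypothetical helper: map the region answer to the API base URL.
base_url_for_region() {
  case "${1:-US}" in
    US) echo "https://api.va.landing.ai" ;;
    EU) echo "https://api.va.eu-west-1.landing.ai" ;;
    *)  echo "unknown region: $1" >&2; return 1 ;;
  esac
}

BASE_URL=$(base_url_for_region EU)
echo "$BASE_URL"
```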
157
+
158
+ **Question 3 — Operation:**
159
+
160
+ > Options: `Parse`, `Extract`, `Split`, `Parse Job (async)`
161
+
162
+ **Question 4 — Output Directory:**
163
+
164
+ > "Where should output files be saved? (e.g., ./output)"
165
+
166
+ Then `mkdir -p` the output directory.
167
+
168
+ ## Step 2: Collect Operation-Specific Inputs
169
+
170
+ ### For Parse
171
+
172
+ Ask:
173
+
174
+ 1. "Local file path or URL?" → determines `document=@/path` vs `document_url=https://...`
175
+ 2. "Split by page?" → yes adds `-F "split=page"`
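The first answer decides which multipart field is sent. A minimal sketch, assuming a hypothetical helper named `parse_document_field` that treats anything starting with `http://` or `https://` as a URL and everything else as a local path:

```bash
# Hypothetical helper: build the multipart field for the parse input.
parse_document_field() {
  case "$1" in
    http://*|https://*) printf 'document_url=%s\n' "$1" ;;  # remote document
    *)                  printf 'document=@%s\n' "$1" ;;     # local file: note the @
  esac
}

parse_document_field "./invoice.pdf"
parse_document_field "https://example.com/invoice.pdf"
```

The result can be passed to curl as `-F "$(parse_document_field "$INPUT")"`, with `-F "split=page"` appended when the user answers yes to the second question.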
176
+
177
+ ### For Extract
178
+
179
+ Ask:
180
+
181
+ 1. "Local file path or URL to your document? (PDF/image for raw file, or .md if already parsed)"
182
+ 2. Detect file type:
183
+ - If `.md` file → use directly as markdown input, skip to schema step
184
+ - If PDF/image → **parse first**, save markdown, show preview to user
185
+ 3. Schema source (see Schema Builder section below)
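The file-type check in step 2 can be done on the extension alone. The sketch below assumes that only a `.md` extension (in any letter case) counts as already-parsed markdown; `needs_parse_first` is a hypothetical name.

```bash
# Hypothetical helper: exit 0 when the input must be parsed before extract/split.
needs_parse_first() {
  local lower
  # Lowercase the path so .MD and .md are treated alike.
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *.md) return 1 ;;   # already markdown: use directly
    *)    return 0 ;;   # PDF, image, spreadsheet: parse first
  esac
}

if needs_parse_first "./statement.pdf"; then
  echo "parse first"
else
  echo "use markdown directly"
fi
```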
186
+
187
+ ### For Split
188
+
189
+ Ask:
190
+
191
+ 1. "Local file path or URL to your document?"
192
+ 2. Same parse-first logic as Extract
193
+ 3. Split config source (see Split Builder section below)
194
+
195
+ ### For Parse Job
196
+
197
+ Ask:
198
+
199
+ 1. "Local file path or URL?"
200
+ 2. "Poll until complete?" → yes/no
201
+
202
+ ## Schema Builder (for Extract)
203
+
204
+ After the user has markdown (either provided or from parsing), offer three choices via `AskUserQuestion`:
205
+
206
+ > "How do you want to define your extraction schema?"
207
+ >
208
+ > - **Build interactively** — I'll walk you through adding fields one by one
209
+ > - **Generate from document** — I'll analyze the parsed markdown and suggest a schema
210
+ > - **Load from file** — Load a previously saved schema JSON file
211
+
212
+ ### Option A: Build Interactively
213
+
214
+ Loop until the user says done:
215
+
216
+ 1. Ask: "Field name? (e.g., invoice_number, vendor_name, total_amount)"
217
+ 2. Ask: "Field type?"
218
+ - Options: `string`, `number`, `boolean`, `array of strings`, `array of objects`
219
+ 3. Ask: "Description? (helps the model understand what to look for)"
220
+ 4. If `array of objects`: recursively ask for sub-fields
221
+ 5. Show the current schema so far
222
+ 6. Ask: "Add another field, edit a field, remove a field, or done?"
223
+ - **Add another** → repeat from step 1
224
+ - **Edit** → ask which field, then re-ask type/description
225
+ - **Remove** → ask which field to remove
226
+ - **Done** → finalize schema
227
+
228
+ Assemble into JSON:
229
+
230
+ ```json
231
+ {
232
+ "type": "object",
233
+ "properties": {
234
+ "invoice_number": { "type": "string", "description": "Invoice number" },
235
+ "total_amount": { "type": "number", "description": "Total dollar amount" },
236
+ "line_items": {
237
+ "type": "array",
238
+ "items": {
239
+ "type": "object",
240
+ "properties": {
241
+ "description": {
242
+ "type": "string",
243
+ "description": "Item description"
244
+ },
245
+ "amount": { "type": "number", "description": "Line item amount" }
246
+ }
247
+ }
248
+ }
249
+ }
250
+ }
251
+ ```
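The interactive loop can build a schema of this shape one field at a time with `jq`. The following is a sketch under stated assumptions: `add_schema_field` is a hypothetical helper, it handles only scalar types (`string`, `number`, `boolean`), and array fields would need their own `items` handling.

```bash
# Hypothetical helper: add one scalar field to the schema JSON read from stdin.
# $1 = field name, $2 = field type, $3 = description
add_schema_field() {
  jq --arg name "$1" --arg type "$2" --arg desc "$3" \
    '.properties[$name] = {type: $type, description: $desc}'
}

# Example use (requires jq):
#   echo '{"type": "object", "properties": {}}' \
#     | add_schema_field invoice_number string "Invoice number" \
#     | add_schema_field total_amount number "Total dollar amount"
```

Because each call reads and writes a complete schema, the "edit" action can reuse the same helper: assigning to an existing field name overwrites it.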
252
+
253
+ ### Option B: Generate from Document
254
+
255
+ 1. Read the parsed markdown content (from the saved markdown file, not the full parse JSON)
256
+ 2. Analyze the markdown to identify extractable fields — look for:
257
+ - **Key-value patterns**: lines like `Invoice #: 12345`, `Name: ___`, `Date: 01/15/2024`
258
+ - **Table headers**: column names in markdown tables suggest array-of-object fields
259
+ - **Labeled sections**: `Section 2: Insurance Information` suggests grouped fields
260
+ - **Repeated structures**: multiple similar entries suggest `array` types
261
+ - **Checkboxes/booleans**: `[x]` or `Yes/No` fields → `boolean` type
262
+ - **Numeric values**: amounts, totals, quantities → `number` type
263
+ - **Dates**: any date-like content → `string` with date description
264
+ 3. Build a JSON schema from the detected fields (use `string` as default type when uncertain)
265
+ 4. Present the suggested schema to the user showing each field name, type, and description
266
+ 5. Ask: "Does this look right? Edit any fields, or accept?"
267
+ 6. Allow iterative edits (same add/edit/remove loop as Option A)
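The key-value heuristic from step 2 can be approximated with standard text tools, which also keeps the document out of context. This is a crude sketch, not the skill's actual detection logic: `suggest_field_names` is a hypothetical helper, it covers only `Label: value` lines, and it will also pick up labeled sections such as `Section 2: ...`.

```bash
# Hypothetical helper: list candidate field names from "Label: value" lines.
# $1 = path to the saved markdown file
suggest_field_names() {
  # 1. keep lines that start with a short label followed by ": "
  # 2. drop everything from the first colon onward
  # 3. lowercase, collapse non-alphanumerics to "_", trim stray "_"
  grep -E '^[A-Za-z][A-Za-z0-9 #_/-]{0,40}:[[:space:]]' "$1" \
    | sed -E 's/:.*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/_/g; s/^_+//; s/_+$//' \
    | sort -u
}
```

The names it prints are only starting points; types and descriptions still need to be proposed and confirmed with the user.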
268
+
269
+ ### Option C: Load from File
270
+
271
+ 1. Ask: "Path to your schema JSON file?"
272
+ 2. Read and validate the file:
273
+ ```bash
274
+ jq . /path/to/schema.json
275
+ ```
276
+ 3. Show it to the user for confirmation
277
+ 4. Allow edits if needed
278
+
279
+ ### Save Schema
280
+
281
+ After finalizing (any option), ask:
282
+
283
+ > "Save this schema for reuse? (provide a file path, or 'no')"
284
+
285
+ If yes:
286
+
287
+ ```bash
288
+ cat << 'SCHEMA_EOF' > /path/to/schema.json
289
+ { ... the schema ... }
290
+ SCHEMA_EOF
291
+ ```
292
+
293
+ ## Split Config Builder (for Split)
294
+
295
+ After the user has markdown, offer two choices:
296
+
297
+ > "How do you want to define your split classifications?"
298
+ >
299
+ > - **Build interactively** — I'll walk you through adding categories
300
+ > - **Load from file** — Load a previously saved split config
301
+
302
+ ### Option A: Build Interactively
303
+
304
+ Loop until done:
305
+
306
+ 1. Ask: "Category name? (e.g., 'Bank Statement', 'Pay Stub', 'Invoice')"
307
+ 2. Ask: "Description? (what does this document type look like?)"
308
+ 3. Ask: "Identifier field? (optional — a field to group/partition by, e.g., 'Account Number', 'Invoice Date'). Type 'none' to skip."
309
+ 4. Show current config so far
310
+ 5. Ask: "Add another category, edit one, remove one, or done?"
311
+
312
+ Assemble into JSON array:
313
+
314
+ ```json
315
+ [
316
+ {
317
+ "name": "Bank Statement",
318
+ "description": "Bank account activity summary over a period"
319
+ },
320
+ {
321
+ "name": "Pay Stub",
322
+ "description": "Employee earnings and deductions",
323
+ "identifier": "Pay Stub Date"
324
+ },
325
+ {
326
+ "name": "Invoice",
327
+ "description": "Bill for goods or services",
328
+ "identifier": "Invoice Number"
329
+ }
330
+ ]
331
+ ```
332
+
333
+ ### Option B: Load from File
334
+
335
+ Same pattern as schema loading — read, validate, confirm, allow edits.
336
+
337
+ ### Save Split Config
338
+
339
+ After finalizing, ask:
340
+
341
+ > "Save this split config for reuse?"
342
+
343
+ If yes, write to the specified path.
344
+
345
+ ## Step 3: Execute
346
+
347
+ ### Parse Command
348
+
349
+ ```bash
350
+ curl -s -X POST "${BASE_URL}/v1/ade/parse" \
351
+ -H "Authorization: Bearer ${API_KEY}" \
352
+ -F "document=@/path/to/file.pdf" \
353
+ -F "model=dpt-2-latest" | jq . > "${OUTPUT_DIR}/parse_output.json"
354
+ ```
355
+
356
+ For URL input, replace `-F "document=@..."` with `-F "document_url=https://..."`.
357
+ For page splitting, add `-F "split=page"`.
358
+
359
+ ### Extract Command (after parse + schema built)
360
+
361
+ ```bash
362
+ curl -s -X POST "${BASE_URL}/v1/ade/extract" \
363
+ -H "Authorization: Bearer ${API_KEY}" \
364
+ -F "markdown=@/path/to/parsed_markdown.md" \
365
+ -F "model=extract-latest" \
366
+ -F "schema=$(cat /path/to/schema.json)" | jq . > "${OUTPUT_DIR}/extract_output.json"
367
+ ```
368
+
369
+ ### Split Command (after parse + split config built)
370
+
371
+ ```bash
372
+ curl -s -X POST "${BASE_URL}/v1/ade/split" \
373
+ -H "Authorization: Bearer ${API_KEY}" \
374
+ -F "markdown=@/path/to/parsed_markdown.md" \
375
+ -F "model=split-latest" \
376
+ -F "split_class=$(cat /path/to/split_config.json)" | jq . > "${OUTPUT_DIR}/split_output.json"
377
+ ```
378
+
379
+ ### Parse Job Commands
380
+
381
+ ```bash
382
+ # Create
383
+ JOB_ID=$(curl -s -X POST "${BASE_URL}/v1/ade/parse/jobs" \
384
+ -H "Authorization: Bearer ${API_KEY}" \
385
+ -F "document=@/path/to/file.pdf" \
386
+ -F "model=dpt-2-latest" | jq -r '.job_id')
387
+ echo "Job: $JOB_ID"
388
+
389
+ # Poll
390
+ while true; do
391
+ RESP=$(curl -s "${BASE_URL}/v1/ade/parse/jobs/$JOB_ID" \
392
+ -H "Authorization: Bearer ${API_KEY}")
393
+ STATUS=$(echo "$RESP" | jq -r '.status')
394
+ echo "Status: $STATUS | Progress: $(echo "$RESP" | jq -r '.progress')"
395
+ case "$STATUS" in completed|failed|cancelled) break ;; esac
396
+ sleep 5
397
+ done
398
+ echo "$RESP" | jq . > "${OUTPUT_DIR}/job_result.json"
399
+
400
+ # List jobs
401
+ curl -s "${BASE_URL}/v1/ade/parse/jobs?status=completed&page=0&pageSize=10" \
402
+ -H "Authorization: Bearer ${API_KEY}" | jq .
403
+ ```
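A polling loop should stop on every terminal status and should not run forever. The sketch below is one possible shape, not the skill's own implementation: `is_terminal_status` and `poll_job` are hypothetical helpers, the status names are taken from the Job Response reference in this file, and the 60-attempt cap is an arbitrary assumption.

```bash
# Hypothetical helper: exit 0 when a job status means polling should stop.
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;   # pending, processing, or anything unexpected
  esac
}

# Hypothetical bounded poll: prints the final response, or gives up.
# $1 = job id, $2 = max attempts (assumed default: 60)
poll_job() {
  local job_id="$1" max_attempts="${2:-60}" attempt=0 resp status
  while [ "$attempt" -lt "$max_attempts" ]; do
    resp=$(curl -s "${BASE_URL}/v1/ade/parse/jobs/${job_id}" \
      -H "Authorization: Bearer ${API_KEY}")
    status=$(printf '%s' "$resp" | jq -r '.status')
    if is_terminal_status "$status"; then
      printf '%s\n' "$resp"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 5
  done
  echo "gave up after ${max_attempts} attempts" >&2
  return 1
}
```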
404
+
405
+ ## Step 4: Present Results
406
+
407
+ After execution:
408
+
409
+ 1. Show the user a summary of the output (key fields, not the full JSON dump)
410
+ 2. Tell them where the file was saved
411
+ 3. For Extract: show the `.extraction` object formatted nicely
412
+ 4. For Split: show each `.splits[].classification` with page ranges
413
+ 5. Ask: "Want to run another operation on this document?"
414
+
415
+ ## Response Structure Reference
416
+
417
+ ### Parse Response
418
+
419
+ ```
420
+ .markdown → full document markdown
421
+ .chunks[] → {id, markdown, type, grounding: {box, page}}
422
+ .metadata → {credit_usage, duration_ms, filename, job_id, page_count}
423
+ .splits[] → {class, identifier, markdown, pages[], chunks[]}
424
+ ```
425
+
426
+ ### Extract Response
427
+
428
+ ```
429
+ .extraction → the extracted key-value pairs (matches your schema)
430
+ .extraction_metadata → key-values with chunk_reference for grounding
431
+ .metadata → {credit_usage, duration_ms, filename, job_id}
432
+ .metadata.schema_violation_error → non-null if extraction didn't match schema
433
+ ```
434
+
435
+ ### Split Response
436
+
437
+ ```
438
+ .splits[] → {classification, identifier, markdowns[], pages[]}
439
+ .metadata → {credit_usage, duration_ms, filename, page_count}
440
+ ```
441
+
442
+ ### Job Response
443
+
444
+ ```
445
+ .job_id, .status → pending|processing|completed|failed|cancelled
446
+ .progress → 0.0 to 1.0
447
+ .data → full parse response when completed
448
+ .output_url → presigned URL if result > 1MB (expires 1hr)
449
+ .failure_reason → error details if failed
450
+ ```
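Since a large result arrives as `.output_url` rather than inline, saving a finished job takes one extra branch. This sketch assumes that `.data` is missing or null whenever `.output_url` is set, which the reference here implies but does not state; `save_job_result` is a hypothetical helper.

```bash
# Hypothetical helper: write a completed job's parse result to a file.
# $1 = saved job response JSON, $2 = destination file
save_job_result() {
  local job_json="$1" dest="$2" url
  url=$(jq -r '.output_url // empty' "$job_json")
  if [ -n "$url" ]; then
    # Large result: download from the presigned URL before it expires.
    curl -s "$url" -o "$dest"
  else
    # Small result: the parse response is inline under .data.
    jq '.data' "$job_json" > "$dest"
  fi
}
```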
451
+
452
+ ## Quick Reference
453
+
454
+ | Endpoint | Method | Path | Model | Input |
455
+ | ---------- | ------ | ------------------------- | ---------------- | ---------------------------------------------------------- |
456
+ | Parse | POST | `/v1/ade/parse` | `dpt-2-latest` | `document` (file) or `document_url` |
457
+ | Extract | POST | `/v1/ade/extract` | `extract-latest` | `markdown` (file/string) or `markdown_url` + `schema` |
458
+ | Split | POST | `/v1/ade/split` | `split-latest` | `markdown` (file/string) or `markdown_url` + `split_class` |
459
+ | Create Job | POST | `/v1/ade/parse/jobs` | `dpt-2-latest` | `document` or `document_url` |
460
+ | Get Job | GET | `/v1/ade/parse/jobs/{id}` | — | — |
461
+ | List Jobs | GET | `/v1/ade/parse/jobs` | — | `?status=&page=&pageSize=` |
462
+
463
+ | Supported Files | Types |
464
+ | --------------- | ------------------------------------------ |
465
+ | Documents | PDF, PNG, JPG, JPEG, TIFF, BMP, WEBP, HEIC |
466
+ | Spreadsheets | XLSX, CSV |
467
+
468
+ ## Common Mistakes
469
+
470
+ | Mistake | Fix |
471
+ | ------------------------------------- | --------------------------------------------------------- |
472
+ | Sending PDF to `/extract` or `/split` | Parse first to get markdown |
473
+ | `Authorization: Basic` | Must be `Authorization: Bearer` |
474
+ | `-F "pdf=@..."` or `-F "file=@..."` | Field is `document` (parse) or `markdown` (extract/split) |
475
+ | Missing `@` before file path | `-F "document=@/path"` needs the `@` |
476
+ | Using `-d` instead of `-F` | Always use `-F` for multipart form |
477
+ | Missing `schema` on extract | Required — build one using the schema builder |
478
+ | Not using `jq -r` for markdown | Avoids escaped/quoted strings in output |
479
+ | Sync parse for huge docs | Use `/v1/ade/parse/jobs` for 50+ pages |
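The last row of this table amounts to a threshold check. A minimal sketch, assuming the page count is known in advance and reading "50+" as 50 pages or more; `parse_path_for_pages` is a hypothetical helper.

```bash
# Hypothetical helper: pick the sync or async parse path by page count.
parse_path_for_pages() {
  if [ "$1" -ge 50 ]; then
    echo "/v1/ade/parse/jobs"   # async job for large documents
  else
    echo "/v1/ade/parse"        # synchronous parse
  fi
}

parse_path_for_pages 12
parse_path_for_pages 50
```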
480
+
481
+ ## Error Codes
482
+
483
+ | Code | Meaning | Action |
484
+ | ---- | ------------------- | ------------------------------------- |
485
+ | 401 | Bad/missing API key | Check `VISION_AGENT_API_KEY` |
486
+ | 400 | Bad request | Validate inputs, check file format |
487
+ | 422 | Unprocessable | Invalid file type or malformed schema |
488
+ | 429 | Rate limited | Wait and retry |
489
+ | 500+ | Server error | Retry after a few seconds |
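Because `curl -s` does not fail on HTTP errors by itself, the status code has to be captured explicitly before this table can be applied. The sketch below assumes a hypothetical helper, `action_for_http_code`; the `2??` branch is an addition for completeness and is not part of the table.

```bash
# Hypothetical helper: map an HTTP status code to the action from the error table.
action_for_http_code() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "validate inputs, check file format" ;;
    401) echo "check VISION_AGENT_API_KEY" ;;
    422) echo "check file type and schema" ;;
    429) echo "wait and retry" ;;
    5??) echo "retry after a few seconds" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Example use: write the body to a file and capture only the status code.
#   code=$(curl -s -o "${OUTPUT_DIR}/parse_output.json" -w '%{http_code}' \
#     -X POST "${BASE_URL}/v1/ade/parse" \
#     -H "Authorization: Bearer ${API_KEY}" \
#     -F "document=@/path/to/file.pdf" -F "model=dpt-2-latest")
#   action_for_http_code "$code"
```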