paddleocr-skills 1.0.0

Files changed (29)
  1. package/README.md +220 -0
  2. package/bin/paddleocr-skills.js +20 -0
  3. package/lib/copy.js +39 -0
  4. package/lib/installer.js +70 -0
  5. package/lib/prompts.js +67 -0
  6. package/lib/python.js +75 -0
  7. package/lib/verify.js +121 -0
  8. package/package.json +42 -0
  9. package/templates/.env.example +12 -0
  10. package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md +64 -0
  11. package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md +154 -0
  12. package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md +157 -0
  13. package/templates/paddleocr-vl/scripts/paddleocr-vl/_lib.py +780 -0
  14. package/templates/paddleocr-vl/scripts/paddleocr-vl/configure.py +270 -0
  15. package/templates/paddleocr-vl/scripts/paddleocr-vl/optimize_file.py +226 -0
  16. package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements-optimize.txt +8 -0
  17. package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements.txt +7 -0
  18. package/templates/paddleocr-vl/scripts/paddleocr-vl/smoke_test.py +199 -0
  19. package/templates/paddleocr-vl/scripts/paddleocr-vl/vl_caller.py +232 -0
  20. package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md +481 -0
  21. package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -0
  22. package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -0
  23. package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -0
  24. package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -0
  25. package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -0
  26. package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -0
  27. package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -0
  28. package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -0
  29. package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -0
@@ -0,0 +1,258 @@
1
+ # Agent Policy: Auto Mode Strategy
2
+
3
+ This document defines the execution strategies for the three OCR modes, with emphasis on the auto mode's adaptive retry logic.
4
+
5
+ ## Mode Overview
6
+
7
+ | Mode | Strategy | Use Case |
8
+ |------|----------|----------|
9
+ | `fast` | Single call, all corrections off | Lowest latency, suitable for high-quality scans |
10
+ | `quality` | Single call, orientation and unwarping corrections on | Maximum quality, slower, suitable for complex documents |
11
+ | `auto` | Adaptive multiple attempts, quality scoring | Balanced performance, production default |
12
+
13
+ ## Mode: fast
14
+
15
+ **Single attempt**, minimal preprocessing.
16
+
17
+ **Options:**
18
+ ```json
19
+ {
20
+ "use_doc_orientation_classify": false,
21
+ "use_doc_unwarping": false,
22
+ "use_textline_orientation": false
23
+ }
24
+ ```
25
+
26
+ **Suitable scenarios:**
27
+ - Input is known to be high quality (scanned documents, screenshots)
28
+ - Latency is critical
29
+ - Text is already correctly oriented
30
+
31
+ ## Mode: quality
32
+
33
+ **Single attempt**, document orientation and unwarping corrections enabled (textline orientation stays off).
34
+
35
+ **Options:**
36
+ ```json
37
+ {
38
+ "use_doc_orientation_classify": true,
39
+ "use_doc_unwarping": true,
40
+ "use_textline_orientation": false
41
+ }
42
+ ```
43
+
44
+ **Suitable scenarios:**
45
+ - Input quality is unknown or poor (photos, rotated PDFs)
46
+ - Maximum accuracy is required
47
+ - Higher latency is acceptable (roughly 2-3x slower than `fast`)
48
+
49
+ ## Mode: auto (default)
50
+
51
+ **Adaptive strategy**: Start fast, escalate progressively if quality is insufficient.
52
+
53
+ ### Attempt Sequence
54
+
55
+ #### Attempt 1: Fast Path
56
+ ```json
57
+ {
58
+ "use_doc_orientation_classify": false,
59
+ "use_doc_unwarping": false,
60
+ "use_textline_orientation": false
61
+ }
62
+ ```
63
+
64
+ **If `quality_score >= quality_target`**: Stop and return result.
65
+
66
+ **Otherwise**: Continue to Attempt 2.
67
+
68
+ ---
69
+
70
+ #### Attempt 2: Orientation Correction
71
+ ```json
72
+ {
73
+ "use_doc_orientation_classify": true,
74
+ "use_doc_unwarping": false,
75
+ "use_textline_orientation": false
76
+ }
77
+ ```
78
+
79
+ Enable page-level orientation detection (0°/90°/180°/270°).
80
+
81
+ **If `quality_score >= quality_target`**: Stop and return result.
82
+
83
+ **Otherwise**: Continue to Attempt 3.
84
+
85
+ ---
86
+
87
+ #### Attempt 3: Unwarping Correction
88
+ ```json
89
+ {
90
+ "use_doc_orientation_classify": true,
91
+ "use_doc_unwarping": true,
92
+ "use_textline_orientation": false
93
+ }
94
+ ```
95
+
96
+ Add perspective correction and skew correction.
97
+
98
+ **If `quality_score >= quality_target`**: Stop and return result.
99
+
100
+ **Otherwise**: Stop, since the default `max_attempts` of 3 is reached, and select the highest-scoring attempt so far.
101
+
102
+ ---
103
+
104
+ #### Optional Attempt 4: Textline Orientation
105
+
106
+ *(Reserved for future use)*
107
+
108
+ ```json
109
+ {
110
+ "use_doc_orientation_classify": true,
111
+ "use_doc_unwarping": true,
112
+ "use_textline_orientation": true
113
+ }
114
+ ```
115
+
116
+ Add line-by-line angle correction (rarely needed).
117
+
118
+ ### Stop Conditions
119
+
120
+ Auto mode stops when **any** of the following conditions is met:
121
+
122
+ 1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
123
+ 2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
124
+ 3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)
125
+
126
+ ### Selection Strategy
127
+
128
+ If multiple attempts are completed, **select the attempt with the highest `quality_score`**.
129
+
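The attempt sequence, stop conditions, and selection rule above can be sketched as one loop. This is a minimal illustration under stated assumptions, not the package's actual implementation: `run_auto`, `run_attempt`, and the injectable `clock` are hypothetical names, and the sketch assumes at least one attempt is always made. The option sets and defaults are the ones listed in this document.

```python
import time
from typing import Callable, Dict, List

# Option sets for attempts 1-3, copied from the Attempt Sequence above.
ATTEMPT_OPTIONS: List[Dict[str, bool]] = [
    {"use_doc_orientation_classify": False, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": True,
     "use_textline_orientation": False},
]


def run_auto(
    run_attempt: Callable[[Dict[str, bool]], dict],  # hypothetical provider call
    quality_target: float = 0.72,
    max_attempts: int = 3,
    budget_ms: float = 25000,
    clock: Callable[[], float] = time.monotonic,  # injectable for tests
) -> dict:
    """Run attempts until a stop condition is met, then pick the best one."""
    start = clock()
    attempts: List[dict] = []
    for options in ATTEMPT_OPTIONS[:max_attempts]:
        elapsed_ms = (clock() - start) * 1000
        # Budget check before each attempt (assumption: never skip attempt 1).
        if attempts and elapsed_ms >= budget_ms:
            break
        result = run_attempt(options)
        attempts.append(result)
        if result["quality_score"] >= quality_target:
            break  # quality target reached
    # max() returns the first maximal element, so ties go to the earliest attempt.
    best = max(range(len(attempts)), key=lambda i: attempts[i]["quality_score"])
    return {"selected_attempt": best + 1, "attempts": attempts}
```

Resolving ties in favor of the earliest attempt matches the "No Improvement" example later in this document, where Attempt 1 is selected when every score is 0.0.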
130
+ ## Quality Scoring
131
+
132
+ The quality score balances text quantity against recognition confidence.
133
+
134
+ ### Formula
135
+
136
+ ```
137
+ quality_score = 0 if text_items == 0
138
+ = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
139
+
140
+ norm(n) = min(1, log(1+n) / log(1+50))
141
+ ```
142
+
143
+ Where:
144
+ - `text_items`: Number of recognized text blocks
145
+ - `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0)
146
+ - If `rec_scores` is missing, default to `0.5`
147
+
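As a rough illustration, the formula above translates directly into a few lines of Python. The function name `quality_score` and its signature are assumptions made for this sketch; only the weights, the `log(1+50)` normalizer, and the `0.5` fallback come from the text.

```python
import math
from typing import Optional, Sequence


def quality_score(text_items: int,
                  rec_scores: Optional[Sequence[float]] = None) -> float:
    """Compute the quality score described above (sketch, hypothetical name)."""
    if text_items == 0:
        return 0.0
    # Fall back to 0.5 when the provider returned no confidence scores.
    avg_rec_score = sum(rec_scores) / len(rec_scores) if rec_scores else 0.5
    # Log-scaled item count, saturating at 50 items. The ratio of two logs
    # does not depend on the log base.
    norm = min(1.0, math.log(1 + text_items) / math.log(1 + 50))
    return 0.6 * norm + 0.4 * avg_rec_score
```

Because `norm` only saturates at 50 items, a short document scores low even at high confidence: 5 items at an average confidence of 0.9 give about 0.63.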
148
+ ### Interpretation
149
+
150
+ | Quality Score | Interpretation |
151
+ |---------------|----------------|
152
+ | 0.90 - 1.00 | Excellent (high confidence, many items) |
153
+ | 0.72 - 0.89 | Good (default target) |
154
+ | 0.50 - 0.71 | Fair (may need retry) |
155
+ | 0.00 - 0.49 | Poor (may be low-quality input or blank) |
156
+
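A small helper can turn a score into the labels from the table above. This is only a sketch: `interpret_quality` is a hypothetical name, and treating each band's lower bound as a threshold (so that, say, 0.895 counts as "good") is an assumption, since the table leaves the gaps between bands unspecified.

```python
def interpret_quality(score: float) -> str:
    """Map a quality score to the interpretation bands in the table above."""
    if score >= 0.90:
        return "excellent"
    if score >= 0.72:  # default quality_target
        return "good"
    if score >= 0.50:
        return "fair"
    return "poor"
```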
157
+ ### Default Target
158
+
159
+ `quality_target = 0.72`
160
+
161
+ This balances cost (API calls) and quality. Adjust with `--quality-target` if needed.
162
+
163
+ ## Budget Management
164
+
165
+ Auto mode respects a **total time budget** (default 25000ms).
166
+
167
+ - Before each attempt, check: `total_elapsed_ms < budget_ms`
168
+ - If budget exceeded, stop and return best attempt so far
169
+ - Provider timeout is independent (`PADDLE_OCR_TIMEOUT_MS`, default 25000ms)
170
+
171
+ **Example:**
172
+ - Attempt 1: 1200ms, quality 0.61 → Continue
173
+ - Attempt 2: 1800ms, quality 0.79 → Stop (target reached)
174
+ - Total: 3000ms
175
+
176
+ ## Error Handling
177
+
178
+ ### Provider Errors
179
+
180
+ If an attempt fails with a provider error (authentication, quota, etc.):
181
+ - **Stop immediately** (do not retry with different options)
182
+ - Return error response with failed attempt shown in trace
183
+
184
+ ### Temporary Errors (503/504)
185
+
186
+ - Retry **within the same attempt** (up to 2 retries with backoff)
187
+ - If all retries fail, treat as error and stop
188
+
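The retry rule for temporary errors might look like the following sketch. The names `call_with_retry` and `send` and the 0.5 s base delay are assumptions for illustration. The text above specifies only "up to 2 retries with backoff"; the doubling schedule used here is one common choice.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {503, 504}  # temporary errors named above


def call_with_retry(
    send: Callable[[], Tuple[int, object]],  # hypothetical: returns (status, body)
    max_retries: int = 2,
    base_delay_s: float = 0.5,  # assumed value, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Retry 503/504 within one attempt; return any other status at once."""
    for retry in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or retry == max_retries:
            return status, body
        sleep(base_delay_s * (2 ** retry))  # 0.5 s, then 1.0 s
    raise AssertionError("unreachable for max_retries >= 0")
```

Non-retryable failures such as 403 or 429 pass straight through, so the caller can stop the whole auto sequence as described under Provider Errors.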
189
+ ## Examples
190
+
191
+ ### Example 1: Fast Path Success
192
+
193
+ **Input:** High-quality scanned invoice
194
+
195
+ **Execution:**
196
+ 1. Attempt 1 (fast): quality_score = 0.85 → **Stop**
197
+
198
+ **Result:** Select Attempt 1, total time 1.2s
199
+
200
+ ---
201
+
202
+ ### Example 2: Retry with Improvement
203
+
204
+ **Input:** Photo of rotated document
205
+
206
+ **Execution:**
207
+ 1. Attempt 1 (fast): quality_score = 0.48 → Continue
208
+ 2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**
209
+
210
+ **Result:** Select Attempt 2, total time 3.1s
211
+
212
+ ---
213
+
214
+ ### Example 3: All Attempts Needed
215
+
216
+ **Input:** Photo of warped document
217
+
218
+ **Execution:**
219
+ 1. Attempt 1 (fast): quality_score = 0.35 → Continue
220
+ 2. Attempt 2 (orientation): quality_score = 0.58 → Continue
221
+ 3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**
222
+
223
+ **Result:** Select Attempt 3, total time 5.2s
224
+
225
+ ---
226
+
227
+ ### Example 4: No Improvement
228
+
229
+ **Input:** Blank or corrupted image
230
+
231
+ **Execution:**
232
+ 1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
233
+ 2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
234
+ 3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop** (max attempts reached)
235
+
236
+ **Result:** Select Attempt 1 (all scores tie at 0.0) and return it with a warning
237
+
238
+ ## Tuning Parameters
239
+
240
+ Users can override defaults via CLI:
241
+
242
+ ```bash
243
+ --mode auto \
244
+ --max-attempts 2 \
245
+ --budget-ms 15000 \
246
+ --quality-target 0.80  # fewer attempts (faster, less robust), stricter budget, higher quality standard
247
+ ```
248
+
249
+ ## Testing Auto Mode
250
+
251
+ For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:
252
+
253
+ 1. Quality score calculation
254
+ 2. Attempt selection (highest score wins)
255
+ 3. Early stopping (when target is reached)
256
+ 4. Budget enforcement
257
+
258
+ See `scripts/tests/test_agent_policy.py`.
@@ -0,0 +1,257 @@
1
+ # Normalized Output Schema
2
+
3
+ This document defines the unified output format returned by `ocr_caller.py`, and the format that downstream agents/tools should expect.
4
+
5
+ ## Schema Version
6
+
7
+ **v0.1** (stable)
8
+
9
+ ## Output Structure
10
+
11
+ All responses follow this top-level structure:
12
+
13
+ ```typescript
14
+ {
15
+ ok: boolean, // true indicates OCR success
16
+ request_id: string, // Unique request ID (e.g.: "req_abc123")
17
+ provider: ProviderInfo, // Provider API metadata
18
+ result: Result | null, // OCR results (null on error)
19
+ quality: Quality | null, // Quality metrics (null on error)
20
+ agent_trace: AgentTrace, // Execution trace for transparency
21
+ raw_provider: any | null, // Raw provider response (if --return-raw-provider is used)
22
+ error: Error | null // Error details (null on success)
23
+ }
24
+ ```
25
+
26
+ ## ProviderInfo
27
+
28
+ ```typescript
29
+ {
30
+ api_url: string, // Full API endpoint used
31
+ status_code: number, // HTTP status code
32
+ log_id: string | null // Provider's log ID (if available)
33
+ }
34
+ ```
35
+
36
+ ## Result (success only)
37
+
38
+ ```typescript
39
+ {
40
+ pages: Page[], // Array of pages (one per image/PDF page)
41
+ full_text: string // All text joined by "\n\n"
42
+ }
43
+ ```
44
+
45
+ ### Page
46
+
47
+ ```typescript
48
+ {
49
+ page_index: number, // 0-based page number
50
+ text: string, // Page text (items joined by "\n")
51
+ avg_confidence: number, // Average confidence for this page (0.0-1.0)
52
+ items: TextItem[] // Individual text blocks/lines
53
+ }
54
+ ```
55
+
56
+ ### TextItem
57
+
58
+ ```typescript
59
+ {
60
+ text: string, // Recognized text
61
+ score?: number, // Confidence score (0.0-1.0), may be missing
62
+ box?: number[] | number[][] // Bounding box or polygon, may be missing
63
+ }
64
+ ```
65
+
66
+ **Box format:**
67
+ - `[xmin, ymin, xmax, ymax]` - Bounding box (4 numbers)
68
+ - Or polygon points (array of arrays)
69
+
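Since `box` can arrive in either shape, consumers may want to reduce both to a plain bounding box. The helper below is a sketch with a hypothetical name (`to_bounding_box`); it assumes polygon points are `[x, y]` pairs, which the schema above implies but does not state.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def to_bounding_box(
    box: Optional[Sequence],
) -> Optional[List[Number]]:
    """Return [xmin, ymin, xmax, ymax] for a flat box or a polygon."""
    if not box:
        return None  # box is optional and may be missing
    if isinstance(box[0], (int, float)):
        return list(box)  # already [xmin, ymin, xmax, ymax]
    # Polygon: take the extremes of all [x, y] points.
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return [min(xs), min(ys), max(xs), max(ys)]
```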
70
+ ## Quality (success only)
71
+
72
+ ```typescript
73
+ {
74
+ quality_score: number, // Overall quality (0.0-1.0)
75
+ avg_rec_score: number, // Average recognition confidence (0.0-1.0)
76
+ text_items: number, // Total text items detected
77
+ warnings: string[] // Warnings (e.g.: "rec_scores missing")
78
+ }
79
+ ```
80
+
81
+ ### Quality Score Formula
82
+
83
+ ```
84
+ quality_score = 0 if text_items == 0
85
+ = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
86
+
87
+ norm(n) = min(1, log(1+n) / log(1+50))
88
+ ```
89
+
90
+ ## AgentTrace
91
+
92
+ ```typescript
93
+ {
94
+ mode: "fast" | "quality" | "auto", // Execution mode
95
+ selected_attempt: number, // Attempt used (1-indexed)
96
+ attempts: Attempt[] // Details of all attempts
97
+ }
98
+ ```
99
+
100
+ ### Attempt
101
+
102
+ ```typescript
103
+ {
104
+ attempt: number, // Attempt number (1-indexed)
105
+ provider_time_ms: number, // Provider call latency
106
+ quality_score: number, // Quality score for this attempt
107
+ avg_rec_score: number, // Average recognition score
108
+ text_items: number, // Number of text items
109
+ warnings: string[], // Warnings for this attempt
110
+ options_effective: { // OCR options used
111
+ use_doc_orientation_classify: boolean,
112
+ use_doc_unwarping: boolean,
113
+ use_textline_orientation: boolean
114
+ }
115
+ }
116
+ ```
117
+
118
+ ## Error (error only)
119
+
120
+ ```typescript
121
+ {
122
+ code: ErrorCode, // Unified error code
123
+ message: string, // Human-readable error message
124
+ details: object // Additional error context
125
+ }
126
+ ```
127
+
128
+ ### ErrorCode Enum
129
+
130
+ - `PROVIDER_AUTH_ERROR` - Authentication failed (403)
131
+ - `PROVIDER_QUOTA_EXCEEDED` - Quota/rate limit exceeded (429)
132
+ - `PROVIDER_BAD_REQUEST` - Invalid parameters (500)
133
+ - `PROVIDER_OVERLOADED` - Service overloaded (503)
134
+ - `PROVIDER_TIMEOUT` - Gateway timeout (504)
135
+ - `PROVIDER_ERROR` - Other provider errors
136
+
137
+ ## Example: Success Response
138
+
139
+ ```json
140
+ {
141
+ "ok": true,
142
+ "request_id": "req_abc123",
143
+ "provider": {
144
+ "api_url": "https://example.aistudio-app.com/ocr",
145
+ "status_code": 200,
146
+ "log_id": "log_xyz"
147
+ },
148
+ "result": {
149
+ "pages": [
150
+ {
151
+ "page_index": 0,
152
+ "text": "Invoice\nAmount: $123.45",
153
+ "avg_confidence": 0.95,
154
+ "items": [
155
+ {"text": "Invoice", "score": 0.98, "box": [10, 20, 100, 50]},
156
+ {"text": "Amount: $123.45", "score": 0.92, "box": [10, 60, 200, 90]}
157
+ ]
158
+ }
159
+ ],
160
+ "full_text": "Invoice\nAmount: $123.45"
161
+ },
162
+ "quality": {
163
+ "quality_score": 0.55,
164
+ "avg_rec_score": 0.95,
165
+ "text_items": 2,
166
+ "warnings": []
167
+ },
168
+ "agent_trace": {
169
+ "mode": "fast",
170
+ "selected_attempt": 1,
171
+ "attempts": [
172
+ {
173
+ "attempt": 1,
174
+ "provider_time_ms": 1200,
175
+ "quality_score": 0.55,
176
+ "avg_rec_score": 0.95,
177
+ "text_items": 2,
178
+ "warnings": [],
179
+ "options_effective": {
180
+ "use_doc_orientation_classify": false,
181
+ "use_doc_unwarping": false,
182
+ "use_textline_orientation": false
183
+ }
184
+ }
185
+ ]
186
+ },
187
+ "raw_provider": null,
188
+ "error": null
189
+ }
190
+ ```
191
+
192
+ ## Example: Error Response
193
+
194
+ ```json
195
+ {
196
+ "ok": false,
197
+ "request_id": "req_def456",
198
+ "provider": {
199
+ "api_url": "https://example.aistudio-app.com/ocr",
200
+ "status_code": 403,
201
+ "log_id": null
202
+ },
203
+ "result": null,
204
+ "quality": null,
205
+ "agent_trace": {
206
+ "mode": "auto",
207
+ "selected_attempt": 1,
208
+ "attempts": [
209
+ {
210
+ "attempt": 1,
211
+ "provider_time_ms": 150,
212
+ "quality_score": 0.0,
213
+ "avg_rec_score": 0.0,
214
+ "text_items": 0,
215
+ "warnings": ["Provider error: Authentication failed"],
216
+ "options_effective": {
217
+ "use_doc_orientation_classify": false,
218
+ "use_doc_unwarping": false,
219
+ "use_textline_orientation": false
220
+ }
221
+ }
222
+ ]
223
+ },
224
+ "raw_provider": null,
225
+ "error": {
226
+ "code": "PROVIDER_AUTH_ERROR",
227
+ "message": "Authentication failed",
228
+ "details": {
229
+ "error_code": 403,
230
+ "status_code": 403
231
+ }
232
+ }
233
+ }
234
+ ```
235
+
236
+ ## Usage Guide
237
+
238
+ ### For Agents/Scripts
239
+
240
+ 1. **Check `ok` first**: `if response.ok:`
241
+ 2. **Extract text**: `response.result.full_text`
242
+ 3. **Extract structured data**: `response.result.pages[].items[]`
243
+ 4. **Check quality**: `response.quality.quality_score` (0.72+ is usually good)
244
+ 5. **Handle errors**: `response.error.code` and `response.error.message`
245
+
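The steps above can be sketched for a response that has already been parsed from JSON into a Python dict. `read_response` and the keys of its return value are hypothetical. Note that a parsed dict is accessed with `response["ok"]` rather than the attribute-style `response.ok` shorthand used in the list.

```python
from typing import Any, Dict


def read_response(response: Dict[str, Any],
                  quality_target: float = 0.72) -> Dict[str, Any]:
    """Check `ok`, then pull text, items, and quality from a parsed response."""
    if not response["ok"]:
        error = response["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    result = response["result"]
    # Flatten pages[].items[] into one list of text items.
    items = [item for page in result["pages"] for item in page["items"]]
    score = response["quality"]["quality_score"]
    return {
        "text": result["full_text"],
        "items": items,
        "quality_score": score,
        "meets_target": score >= quality_target,
    }
```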
246
+ ### For Debugging
247
+
248
+ 1. **Check trace**: `response.agent_trace.attempts` shows all attempts and their quality
249
+ 2. **Selected attempt**: `response.agent_trace.selected_attempt` indicates which one was selected
250
+ 3. **Raw provider**: Use `--return-raw-provider` to see raw API response
251
+
252
+ ## Compatibility Notes
253
+
254
+ - **Missing scores**: `items[].score` is absent when the provider did not return scores
255
+ - **Missing boxes**: `items[].box` is absent when the provider did not return geometry
256
+ - **Empty results**: `text_items == 0` means no text detected (not necessarily an error)
257
+ - **Warnings**: Check `quality.warnings` for non-fatal issues (e.g., missing fields)
@@ -0,0 +1,140 @@
1
+ # Provider API Reference: Paddle AI Studio PP-OCRv5
2
+
3
+ This document describes the external provider API contract that this skill depends on.
4
+
5
+ ## Endpoint
6
+
7
+ **POST** `https://<AISTUDIO_HOST>/ocr`
8
+
9
+ Where `<AISTUDIO_HOST>` is the host supplied by the user (e.g., `your-subdomain.aistudio-app.com`).
10
+
11
+ ## Authentication
12
+
13
+ **Header:**
14
+ ```
15
+ Authorization: token <ACCESS_TOKEN>
16
+ ```
17
+
18
+ Where `<ACCESS_TOKEN>` is the API token obtained by the user from Paddle AI Studio.
19
+
20
+ ## Request Body
21
+
22
+ ```jsonc
23
+ {
24
+ "file": "https://example.com/image.png", // URL or base64 (without data: prefix)
25
+ "fileType": 1, // 0=PDF, 1=Image
26
+ "visualize": false, // Default false (avoid large responses)
27
+
28
+ // Text detection options
29
+ "textDetLimitSideLen": 736, // Maximum side length for detection
30
+ "textDetLimitType": "max", // "min" or "max"
31
+ "textDetThresh": 0.3, // Detection threshold
32
+ "textDetBoxThresh": 0.6, // Box threshold
33
+ "textDetUnclipRatio": 1.5, // Unclip ratio
34
+
35
+ // Text recognition options
36
+ "textRecScoreThresh": 0.0, // Recognition score threshold
37
+
38
+ // Document preprocessing options
39
+ "useDocOrientationClassify": false, // Enable orientation correction
40
+ "useDocUnwarping": false, // Enable unwarping/skew correction
41
+ "useTextlineOrientation": false // Enable textline orientation
42
+ }
43
+ ```
44
+
45
+ ### Key Parameters
46
+
47
+ - **file**: URL or base64 string of image/PDF (without `data:` URI prefix)
48
+ - **fileType**:
49
+ - `0` = PDF
50
+ - `1` = Image
51
+ - **visualize**: If `true`, returns visualization image (increases response size)
52
+ - **useDocOrientationClassify**: Correct page orientation (0°/90°/180°/270°)
53
+ - **useDocUnwarping**: Correct perspective distortion and skew
54
+ - **useTextlineOrientation**: Correct individual text line angles
55
+
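As an illustration of how a caller might assemble this request, the sketch below builds the JSON body and headers without sending anything. `build_request`, `encode_file`, and the snake_case-to-camelCase table are assumptions. The pairing of names such as `use_doc_unwarping` with `useDocUnwarping` is inferred from the matching option names in the skill's own documents, not confirmed by the provider.

```python
import base64
from typing import Dict, Tuple

# Assumed correspondence between skill option names and provider fields.
OPTION_KEYS = {
    "use_doc_orientation_classify": "useDocOrientationClassify",
    "use_doc_unwarping": "useDocUnwarping",
    "use_textline_orientation": "useTextlineOrientation",
}


def encode_file(data: bytes) -> str:
    """Base64-encode raw file bytes, without any `data:` URI prefix."""
    return base64.b64encode(data).decode("ascii")


def build_request(file_ref: str, is_pdf: bool, options: Dict[str, bool],
                  token: str) -> Tuple[dict, dict]:
    """Return (payload, headers) for POST https://<AISTUDIO_HOST>/ocr."""
    payload = {
        "file": file_ref,               # URL or base64 string
        "fileType": 0 if is_pdf else 1,  # 0=PDF, 1=Image
        "visualize": False,             # keep responses small
    }
    for snake, camel in OPTION_KEYS.items():
        payload[camel] = bool(options.get(snake, False))
    headers = {
        "Authorization": f"token {token}",  # never log this value
        "Content-Type": "application/json",
    }
    return payload, headers
```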
56
+ ## Response Structure
57
+
58
+ ### Success Response (errorCode == 0)
59
+
60
+ ```jsonc
61
+ {
62
+ "errorCode": 0,
63
+ "errorMsg": "",
64
+ "logId": "unique-log-id",
65
+ "result": {
66
+ "ocrResults": [
67
+ {
68
+ "prunedResult": {
69
+ "rec_texts": ["Invoice", "Amount", "123.45"], // Recognized text
70
+ "rec_scores": [0.98, 0.95, 0.92], // Confidence scores (may be missing)
71
+ "rec_boxes": [ // Bounding boxes (may be missing)
72
+ [10, 20, 100, 50],
73
+ [10, 60, 150, 90],
74
+ [200, 60, 300, 90]
75
+ ],
76
+ "rec_polys": [...] // Polygons (alternative to boxes)
77
+ }
78
+ }
79
+ ]
80
+ }
81
+ }
82
+ ```
83
+
84
+ ### Error Response (errorCode != 0)
85
+
86
+ ```jsonc
87
+ {
88
+ "errorCode": 500,
89
+ "errorMsg": "Invalid parameter",
90
+ "logId": "unique-log-id"
91
+ }
92
+ ```
93
+
94
+ ## Error Codes
95
+
96
+ | HTTP Status | errorCode | Meaning | Mapped Error Code |
97
+ |-------------|-----------|---------|-------------------|
98
+ | 403 | N/A | Authentication failed | `PROVIDER_AUTH_ERROR` |
99
+ | 429 | N/A | Quota/rate limit exceeded | `PROVIDER_QUOTA_EXCEEDED` |
100
+ | 500 | 500 | Invalid parameters | `PROVIDER_BAD_REQUEST` |
101
+ | 503 | N/A | Service overloaded | `PROVIDER_OVERLOADED` |
102
+ | 504 | N/A | Gateway timeout | `PROVIDER_TIMEOUT` |
103
+ | Other | Other | Unknown error | `PROVIDER_ERROR` |
104
+
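The table above can be expressed as a lookup. In this sketch `map_error` is a hypothetical name, and two behaviors are assumptions rather than documented provider behavior: an HTTP 200 with `errorCode` 0 counts as success, and a body-level `errorCode` of 500 is mapped even when the HTTP status is not 500.

```python
from typing import Optional

HTTP_STATUS_TO_ERROR = {
    403: "PROVIDER_AUTH_ERROR",
    429: "PROVIDER_QUOTA_EXCEEDED",
    500: "PROVIDER_BAD_REQUEST",
    503: "PROVIDER_OVERLOADED",
    504: "PROVIDER_TIMEOUT",
}


def map_error(status_code: int, error_code: int = 0) -> Optional[str]:
    """Return the mapped error code, or None when the call succeeded."""
    if status_code == 200 and error_code == 0:
        return None
    if status_code in HTTP_STATUS_TO_ERROR:
        return HTTP_STATUS_TO_ERROR[status_code]
    if error_code == 500:
        return "PROVIDER_BAD_REQUEST"
    return "PROVIDER_ERROR"  # catch-all row of the table
```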
105
+ ## Field Compatibility Notes
106
+
107
+ - **rec_scores**: May be missing or empty. When computing the average confidence, default to 0.5.
108
+ - **rec_boxes**: May be missing. Use `rec_polys` as fallback.
109
+ - **rec_polys**: May also be missing, in which case no geometry is available for the item.
110
+ - **visualize result**: Only returned when `visualize: true` (not recommended for auto mode).
111
+
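One way to apply these fallbacks when turning a `prunedResult` into a normalized page is sketched below. `normalize_page` is a hypothetical name, and treating a `rec_scores` list whose length differs from `rec_texts` as missing is an assumption. The output keys follow the skill's normalized schema (`page_index`, `text`, `avg_confidence`, `items`).

```python
from typing import List, Tuple


def normalize_page(pruned: dict, page_index: int = 0) -> Tuple[dict, List[str]]:
    """Convert one prunedResult into (page, warnings), tolerating gaps."""
    texts = pruned.get("rec_texts") or []
    scores = pruned.get("rec_scores") or []
    # Prefer rec_boxes; fall back to rec_polys; otherwise no geometry.
    boxes = pruned.get("rec_boxes") or pruned.get("rec_polys") or []
    warnings: List[str] = []
    if len(scores) != len(texts):
        warnings.append("rec_scores missing")
        scores = []
    items = []
    for index, text in enumerate(texts):
        item = {"text": text}
        if scores:
            item["score"] = scores[index]
        if index < len(boxes):
            item["box"] = boxes[index]
        items.append(item)
    avg_confidence = sum(scores) / len(scores) if scores else 0.5
    page = {
        "page_index": page_index,
        "text": "\n".join(texts),
        "avg_confidence": avg_confidence,
        "items": items,
    }
    return page, warnings
```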
112
+ ## Best Practices
113
+
114
+ 1. **Always set `visualize` to `false`** unless the user explicitly requests it (reduces response size and latency)
115
+ 2. **Handle missing fields gracefully** (rec_scores, rec_boxes, rec_polys may not exist)
116
+ 3. **Retry on 503/504** with exponential backoff (up to 2 retries)
117
+ 4. **Never log or print tokens** in any output or logs
118
+ 5. **Normalize host input** to tolerate common user mistakes (a leading `https://`, a trailing `/ocr`, etc.)
119
+
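Host normalization (practice 5) could be handled along these lines. Both function names are hypothetical, and the set of mistakes covered (scheme prefix, trailing slashes, trailing `/ocr`, surrounding whitespace) is an assumption about what "etc." includes.

```python
def normalize_host(raw: str) -> str:
    """Reduce user input to a bare host such as 'sub.aistudio-app.com'."""
    host = raw.strip()
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    host = host.rstrip("/")
    if host.lower().endswith("/ocr"):
        host = host[: -len("/ocr")]
    return host.rstrip("/")


def api_url(raw_host: str) -> str:
    """Build the full endpoint URL from possibly messy user input."""
    return f"https://{normalize_host(raw_host)}/ocr"
```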
120
+ ## Request Example
121
+
122
+ ```bash
123
+ curl -X POST https://your-subdomain.aistudio-app.com/ocr \
124
+ -H "Authorization: token YOUR_ACCESS_TOKEN" \
125
+ -H "Content-Type: application/json" \
126
+ -d '{
127
+ "file": "https://example.com/test.png",
128
+ "fileType": 1,
129
+ "visualize": false,
130
+ "useDocOrientationClassify": true,
131
+ "useDocUnwarping": false,
132
+ "useTextlineOrientation": false,
133
+ "textDetLimitSideLen": 736,
134
+ "textDetLimitType": "max",
135
+ "textDetThresh": 0.3,
136
+ "textDetBoxThresh": 0.6,
137
+ "textDetUnclipRatio": 1.5,
138
+ "textRecScoreThresh": 0.0
139
+ }'
140
+ ```