paddleocr-skills 1.0.0
- package/README.md +220 -0
- package/bin/paddleocr-skills.js +20 -0
- package/lib/copy.js +39 -0
- package/lib/installer.js +70 -0
- package/lib/prompts.js +67 -0
- package/lib/python.js +75 -0
- package/lib/verify.js +121 -0
- package/package.json +42 -0
- package/templates/.env.example +12 -0
- package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md +64 -0
- package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md +154 -0
- package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md +157 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/_lib.py +780 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/configure.py +270 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/optimize_file.py +226 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements-optimize.txt +8 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements.txt +7 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/smoke_test.py +199 -0
- package/templates/paddleocr-vl/scripts/paddleocr-vl/vl_caller.py +232 -0
- package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md +481 -0
- package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -0
- package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -0
- package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -0
- package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -0
- package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -0
- package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -0
- package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -0
- package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -0
- package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -0
@@ -0,0 +1,258 @@
# Agent Policy: Auto Mode Strategy

This document defines the execution strategies for the three OCR modes, with emphasis on the adaptive retry logic of auto mode.

## Mode Overview

| Mode | Strategy | Use Case |
|------|----------|----------|
| `fast` | Single call, all corrections off | Lowest latency; suitable for high-quality scans |
| `quality` | Single call, document orientation and unwarping corrections on | Maximum quality, slower; suitable for complex documents |
| `auto` | Adaptive multiple attempts with quality scoring | Balanced performance; production default |

## Mode: fast

**Single attempt**, minimal preprocessing.

**Options:**

```json
{
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "use_textline_orientation": false
}
```

**Suitable scenarios:**

- Input is known to be high-quality images (scanned documents, screenshots)
- Latency is critical
- Text is already correctly oriented

## Mode: quality

**Single attempt**, with the document-level corrections (orientation classification and unwarping) enabled. Textline orientation correction stays off.

**Options:**

```json
{
  "use_doc_orientation_classify": true,
  "use_doc_unwarping": true,
  "use_textline_orientation": false
}
```

**Suitable scenarios:**

- Input quality is unknown or poor (photos, rotated PDFs)
- Maximum accuracy is required
- Higher latency is acceptable (roughly 2-3x slower than fast)
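
The two single-call presets can be kept as a small table. The following is a minimal Python sketch. It assumes the snake_case option names map one-to-one onto the provider's camelCase request fields (`useDocOrientationClassify`, `useDocUnwarping`, `useTextlineOrientation`). `MODE_OPTIONS`, `options_for_mode`, and `to_provider_params` are hypothetical names, not the skill's actual API.

```python
# Hypothetical sketch: single-call presets copied from the JSON blocks above.
MODE_OPTIONS = {
    "fast": {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    },
    "quality": {
        "use_doc_orientation_classify": True,
        "use_doc_unwarping": True,
        "use_textline_orientation": False,
    },
}


def options_for_mode(mode: str) -> dict:
    """Return a copy of the preset so callers cannot mutate the shared table."""
    if mode not in MODE_OPTIONS:
        raise ValueError(f"no single-call preset for mode {mode!r}")
    return dict(MODE_OPTIONS[mode])


def to_provider_params(options: dict) -> dict:
    """Assumed mapping: snake_case option name -> camelCase request field."""
    params = {}
    for key, value in options.items():
        head, *rest = key.split("_")
        params[head + "".join(part.capitalize() for part in rest)] = value
    return params
```

Auto mode is intentionally absent from this table, because it picks its options per attempt rather than from one preset.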

## Mode: auto (default)

**Adaptive strategy**: Start with the fast path and escalate progressively if quality is insufficient.

### Attempt Sequence

#### Attempt 1: Fast Path

```json
{
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "use_textline_orientation": false
}
```

**If `quality_score >= quality_target`**: Stop and return the result.

**Otherwise**: Continue to Attempt 2.

---

#### Attempt 2: Orientation Correction

```json
{
  "use_doc_orientation_classify": true,
  "use_doc_unwarping": false,
  "use_textline_orientation": false
}
```

Enables page-level orientation classification (0°/90°/180°/270°).

**If `quality_score >= quality_target`**: Stop and return the result.

**Otherwise**: Continue to Attempt 3.

---

#### Attempt 3: Unwarping Correction

```json
{
  "use_doc_orientation_classify": true,
  "use_doc_unwarping": true,
  "use_textline_orientation": false
}
```

Additionally enables unwarping, which corrects perspective distortion and skew.

**If `quality_score >= quality_target`**: Stop and return the result.

**Otherwise**: Select the best attempt so far. With the default `max_attempts` of 3, this is the final attempt.

---

#### Optional Attempt 4: Textline Orientation

*(Reserved for future use)*

```json
{
  "use_doc_orientation_classify": true,
  "use_doc_unwarping": true,
  "use_textline_orientation": true
}
```

Additionally enables per-line orientation correction (rarely needed).
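
The escalation order can also be written as data. In the hypothetical sketch below, each rung enables one more correction than the previous one, and the plan is capped at `max_attempts`. Under that cap, the reserved Attempt 4 only appears in the plan when the limit is raised above 3. `AUTO_LADDER` and `plan_attempts` are illustrative names.

```python
from __future__ import annotations

# Hypothetical escalation ladder mirroring Attempts 1-4 above:
# (label, use_doc_orientation_classify, use_doc_unwarping, use_textline_orientation)
AUTO_LADDER = [
    ("fast", False, False, False),
    ("orientation", True, False, False),
    ("unwarping", True, True, False),
    ("textline", True, True, True),  # optional Attempt 4, reserved for future use
]


def plan_attempts(max_attempts: int = 3) -> list[dict]:
    """Return the option sets auto mode would try, in order, capped at max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    plan = []
    for label, orient, unwarp, textline in AUTO_LADDER[:max_attempts]:
        plan.append({
            "label": label,
            "use_doc_orientation_classify": orient,
            "use_doc_unwarping": unwarp,
            "use_textline_orientation": textline,
        })
    return plan
```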

### Stop Conditions

Auto mode stops when **any** of the following conditions is met:

1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)

### Selection Strategy

If multiple attempts complete, **select the attempt with the highest `quality_score`**.
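
Below is a minimal sketch of the three stop conditions and the selection rule. It assumes a caller-supplied `call_provider(options)` that returns `(quality_score, payload)`, and an injectable clock so the budget check can be tested. `run_auto` and its parameters are hypothetical. In this sketch the first attempt always runs, so there is always a result to select.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def run_auto(
    ladder: list[dict],
    call_provider: Callable[[dict], tuple[float, Any]],
    quality_target: float = 0.72,
    max_attempts: int = 3,
    budget_ms: float = 25000,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[int, list[dict]]:
    """Return (selected_attempt, attempts) following the stop conditions above."""
    start = clock()
    attempts: list[dict] = []
    for options in ladder[:max_attempts]:  # condition 2: max attempts
        elapsed_ms = (clock() - start) * 1000
        if attempts and elapsed_ms >= budget_ms:  # condition 3: budget
            break
        score, payload = call_provider(options)
        attempts.append({
            "attempt": len(attempts) + 1,
            "quality_score": score,
            "options_effective": options,
            "payload": payload,
        })
        if score >= quality_target:  # condition 1: quality target
            break
    # Highest score wins; max() returns the first maximum, so ties keep the earliest attempt.
    best = max(attempts, key=lambda a: a["quality_score"])
    return best["attempt"], attempts
```

Passing a fake clock, as a unit test would, makes the budget check deterministic.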

## Quality Scoring

The quality score balances text quantity against recognition confidence.

### Formula

```
quality_score = 0                                              if text_items == 0
              = 0.6 * norm(text_items) + 0.4 * avg_rec_score   otherwise

norm(n) = min(1, log(1 + n) / log(1 + 50))
```

Where:

- `text_items`: Number of recognized text blocks
- `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0); defaults to `0.5` if `rec_scores` is missing
- `norm(n)`: Grows logarithmically with the item count and saturates at 1.0 from 50 items upward
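
Here is a direct Python transcription of the formula, offered as a sketch. It uses the natural log, and the base cancels in the ratio. It also assumes that an empty `rec_scores` list is handled the same way as a missing one, since the provider reference says the field may be "missing or empty".

```python
import math
from typing import Optional, Sequence


def norm(n: int) -> float:
    """Log-scaled item count, saturating at 1.0 once n reaches 50."""
    return min(1.0, math.log(1 + n) / math.log(1 + 50))


def quality_score(text_items: int, rec_scores: Optional[Sequence[float]]) -> float:
    """Score per the formula above; missing or empty rec_scores count as 0.5."""
    if text_items == 0:
        return 0.0
    avg_rec_score = sum(rec_scores) / len(rec_scores) if rec_scores else 0.5
    return 0.6 * norm(text_items) + 0.4 * avg_rec_score
```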

### Interpretation

| Quality Score | Interpretation |
|---------------|----------------|
| 0.90 to 1.00 | Excellent (high confidence, many items) |
| 0.72 to < 0.90 | Good (meets the default target) |
| 0.50 to < 0.72 | Fair (may need a retry) |
| < 0.50 | Poor (input may be low-quality or blank) |

### Default Target

`quality_target = 0.72`

This value balances cost (API calls) against quality. Adjust it with `--quality-target` if needed.

## Budget Management

Auto mode respects a **total time budget** (default 25000 ms).

- Before each attempt, check `elapsed_ms < budget_ms`.
- If the budget is exceeded, stop and return the best attempt so far.
- The provider timeout is configured independently (`PADDLE_OCR_TIMEOUT_MS`, default 25000 ms).

**Example:**

- Attempt 1: 1200 ms, quality 0.61 → continue
- Attempt 2: 1800 ms, quality 0.79 → stop (target reached)
- Total: 3000 ms

## Error Handling

### Provider Errors

If an attempt fails with a provider error (authentication, quota, etc.):

- **Stop immediately** (do not retry with different options).
- Return an error response; the failed attempt is recorded in `agent_trace`.

### Temporary Errors (503/504)

- Retry **within the same attempt** (up to 2 retries with backoff).
- If all retries fail, treat the failure as a provider error and stop.
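
The following is a minimal retry sketch for temporary errors. It assumes the provider call raises an exception that carries the HTTP status. `ProviderHTTPError`, `call_with_retries`, and the 0.5 s base delay are hypothetical choices. Only 503 and 504 are retried. Any other status propagates immediately, so the attempt stops.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {503, 504}


class ProviderHTTPError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(call: Callable[[], T], max_retries: int = 2,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run one attempt, retrying only 503/504 with exponential backoff."""
    for retry in range(max_retries):
        try:
            return call()
        except ProviderHTTPError as exc:
            if exc.status_code not in RETRYABLE_STATUS:
                raise  # 403, 429, 500, ...: stop immediately
            sleep(base_delay_s * 2 ** retry)  # 0.5 s, then 1.0 s with the defaults
    return call()  # final try; any error now propagates to the caller
```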

## Examples

### Example 1: Fast Path Success

**Input:** High-quality scanned invoice

**Execution:**

1. Attempt 1 (fast): quality_score = 0.85 → **Stop**

**Result:** Select Attempt 1; total time 1.2 s

---

### Example 2: Retry with Improvement

**Input:** Photo of a rotated document

**Execution:**

1. Attempt 1 (fast): quality_score = 0.48 → Continue
2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**

**Result:** Select Attempt 2; total time 3.1 s

---

### Example 3: All Attempts Needed

**Input:** Photo of a warped document

**Execution:**

1. Attempt 1 (fast): quality_score = 0.35 → Continue
2. Attempt 2 (orientation): quality_score = 0.58 → Continue
3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**

**Result:** Select Attempt 3; total time 5.2 s

---

### Example 4: No Improvement

**Input:** Blank or corrupted image

**Execution:**

1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop** (max attempts reached)

**Result:** Select Attempt 1 (all scores tie at 0.0, so the earliest attempt is kept) and return it with a warning.

## Tuning Parameters

Users can override the defaults via the CLI:

```bash
# Flags to add to the ocr_caller.py command line (input arguments omitted)
--mode auto \
--max-attempts 2 \
--budget-ms 15000 \
--quality-target 0.80
```

- `--max-attempts 2`: Fewer attempts for faster execution, at the cost of robustness
- `--budget-ms 15000`: A stricter time budget
- `--quality-target 0.80`: A higher quality standard

## Testing Auto Mode

For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:

1. Quality score calculation
2. Attempt selection (highest score wins)
3. Early stopping (when the target is reached)
4. Budget enforcement

See `scripts/tests/test_agent_policy.py`.

@@ -0,0 +1,257 @@
# Normalized Output Schema

This document defines the unified output format returned by `ocr_caller.py`, which downstream agents and tools should expect.

## Schema Version

**v0.1** (stable)

## Output Structure

All responses follow this top-level structure:

```typescript
{
  ok: boolean,              // true indicates OCR success
  request_id: string,       // Unique request ID (e.g. "req_abc123")
  provider: ProviderInfo,   // Provider API metadata
  result: Result | null,    // OCR results (null on error)
  quality: Quality | null,  // Quality metrics (null on error)
  agent_trace: AgentTrace,  // Execution trace for transparency
  raw_provider: any | null, // Raw provider response (only with --return-raw-provider)
  error: Error | null       // Error details (null on success)
}
```

## ProviderInfo

```typescript
{
  api_url: string,      // Full API endpoint used
  status_code: number,  // HTTP status code
  log_id: string | null // Provider's log ID (if available)
}
```

## Result (success only)

```typescript
{
  pages: Page[],    // One entry per image or PDF page
  full_text: string // Text of all pages joined by "\n\n"
}
```

### Page

```typescript
{
  page_index: number,     // 0-based page number
  text: string,           // Page text (items joined by "\n")
  avg_confidence: number, // Average confidence for this page (0.0-1.0)
  items: TextItem[]       // Individual text blocks/lines
}
```

### TextItem

```typescript
{
  text: string,               // Recognized text
  score?: number,             // Confidence score (0.0-1.0); may be missing
  box?: number[] | number[][] // Bounding box or polygon; may be missing
}
```

**Box format:**

- `[xmin, ymin, xmax, ymax]`: Axis-aligned bounding box (4 numbers)
- Or a polygon given as an array of points (array of arrays)
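
Consumers that only need an axis-aligned rectangle can collapse either form into one. This is a sketch that assumes polygon points are `[x, y]` pairs. `to_bbox` is a hypothetical helper, not part of the schema.

```python
from __future__ import annotations

from typing import Sequence, Union

Box = Union[Sequence[float], Sequence[Sequence[float]]]


def to_bbox(box: Box) -> list[float]:
    """Return [xmin, ymin, xmax, ymax] for a 4-number box or a polygon."""
    if not box:
        raise ValueError("empty box")
    if isinstance(box[0], (int, float)):  # already [xmin, ymin, xmax, ymax]
        if len(box) != 4:
            raise ValueError("a flat box must have exactly 4 numbers")
        return [float(v) for v in box]
    xs = [float(point[0]) for point in box]  # polygon: [[x, y], ...]
    ys = [float(point[1]) for point in box]
    return [min(xs), min(ys), max(xs), max(ys)]
```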

## Quality (success only)

```typescript
{
  quality_score: number, // Overall quality (0.0-1.0)
  avg_rec_score: number, // Average recognition confidence (0.0-1.0)
  text_items: number,    // Total text items detected
  warnings: string[]     // Warnings (e.g. "rec_scores missing")
}
```

### Quality Score Formula

```
quality_score = 0                                              if text_items == 0
              = 0.6 * norm(text_items) + 0.4 * avg_rec_score   otherwise

norm(n) = min(1, log(1 + n) / log(1 + 50))
```
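
As a worked check of the formula, take two text items with an average confidence of 0.95. Natural logs are used here; the base cancels in the ratio.

$$
\operatorname{norm}(2) = \min\left(1, \frac{\ln 3}{\ln 51}\right) \approx \frac{1.0986}{3.9318} \approx 0.2794
$$

$$
\text{quality\_score} = 0.6 \times 0.2794 + 0.4 \times 0.95 \approx 0.1676 + 0.3800 = 0.5476
$$

The item count carries 60% of the weight, so short results score low even at high confidence. With perfect confidence, 7 items give about 0.717 and 8 items about 0.735, so at least 8 items are needed to reach the default 0.72 target.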

## AgentTrace

```typescript
{
  mode: "fast" | "quality" | "auto", // Execution mode
  selected_attempt: number,          // Attempt used (1-indexed)
  attempts: Attempt[]                // Details of all attempts
}
```

### Attempt

```typescript
{
  attempt: number,          // Attempt number (1-indexed)
  provider_time_ms: number, // Provider call latency
  quality_score: number,    // Quality score for this attempt
  avg_rec_score: number,    // Average recognition score
  text_items: number,       // Number of text items
  warnings: string[],       // Warnings for this attempt
  options_effective: {      // OCR options used
    use_doc_orientation_classify: boolean,
    use_doc_unwarping: boolean,
    use_textline_orientation: boolean
  }
}
```

## Error (error only)

```typescript
{
  code: ErrorCode, // Unified error code
  message: string, // Human-readable error message
  details: object  // Additional error context
}
```

### ErrorCode Enum

- `PROVIDER_AUTH_ERROR` - Authentication failed (HTTP 403)
- `PROVIDER_QUOTA_EXCEEDED` - Quota or rate limit exceeded (HTTP 429)
- `PROVIDER_BAD_REQUEST` - Invalid parameters (HTTP 500)
- `PROVIDER_OVERLOADED` - Service overloaded (HTTP 503)
- `PROVIDER_TIMEOUT` - Gateway timeout (HTTP 504)
- `PROVIDER_ERROR` - Other provider errors

## Example: Success Response

```json
{
  "ok": true,
  "request_id": "req_abc123",
  "provider": {
    "api_url": "https://example.aistudio-app.com/ocr",
    "status_code": 200,
    "log_id": "log_xyz"
  },
  "result": {
    "pages": [
      {
        "page_index": 0,
        "text": "Invoice\nAmount: $123.45",
        "avg_confidence": 0.95,
        "items": [
          {"text": "Invoice", "score": 0.98, "box": [10, 20, 100, 50]},
          {"text": "Amount: $123.45", "score": 0.92, "box": [10, 60, 200, 90]}
        ]
      }
    ],
    "full_text": "Invoice\nAmount: $123.45"
  },
  "quality": {
    "quality_score": 0.55,
    "avg_rec_score": 0.95,
    "text_items": 2,
    "warnings": []
  },
  "agent_trace": {
    "mode": "fast",
    "selected_attempt": 1,
    "attempts": [
      {
        "attempt": 1,
        "provider_time_ms": 1200,
        "quality_score": 0.55,
        "avg_rec_score": 0.95,
        "text_items": 2,
        "warnings": [],
        "options_effective": {
          "use_doc_orientation_classify": false,
          "use_doc_unwarping": false,
          "use_textline_orientation": false
        }
      }
    ]
  },
  "raw_provider": null,
  "error": null
}
```

## Example: Error Response

```json
{
  "ok": false,
  "request_id": "req_def456",
  "provider": {
    "api_url": "https://example.aistudio-app.com/ocr",
    "status_code": 403,
    "log_id": null
  },
  "result": null,
  "quality": null,
  "agent_trace": {
    "mode": "auto",
    "selected_attempt": 1,
    "attempts": [
      {
        "attempt": 1,
        "provider_time_ms": 150,
        "quality_score": 0.0,
        "avg_rec_score": 0.0,
        "text_items": 0,
        "warnings": ["Provider error: Authentication failed"],
        "options_effective": {
          "use_doc_orientation_classify": false,
          "use_doc_unwarping": false,
          "use_textline_orientation": false
        }
      }
    ]
  },
  "raw_provider": null,
  "error": {
    "code": "PROVIDER_AUTH_ERROR",
    "message": "Authentication failed",
    "details": {
      "error_code": 403,
      "status_code": 403
    }
  }
}
```

## Usage Guide

### For Agents/Scripts

1. **Check `ok` first**: `if response.ok:`
2. **Extract text**: `response.result.full_text`
3. **Extract structured data**: `response.result.pages[].items[]`
4. **Check quality**: `response.quality.quality_score` (0.72 or higher is usually good)
5. **Handle errors**: `response.error.code` and `response.error.message`
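
The checklist above can be written as a small Python sketch. It assumes the response has already been captured as a JSON string (for example, from `ocr_caller.py`'s output). It uses dictionary access in place of the attribute-style notation. `summarize` and its return shape are hypothetical.

```python
import json


def summarize(response_json: str, min_quality: float = 0.72) -> dict:
    """Follow steps 1-5: check ok, then pull text, items, and quality (or the error)."""
    response = json.loads(response_json)
    if not response["ok"]:  # step 1, then step 5 on failure
        error = response["error"]
        return {"ok": False, "code": error["code"], "message": error["message"]}
    result = response["result"]
    items = [item for page in result["pages"] for item in page["items"]]  # step 3
    score = response["quality"]["quality_score"]  # step 4
    return {
        "ok": True,
        "text": result["full_text"],  # step 2
        "items": items,
        "low_quality": score < min_quality,
        "warnings": response["quality"]["warnings"],
    }
```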

### For Debugging

1. **Check the trace**: `response.agent_trace.attempts` lists every attempt and its quality.
2. **Selected attempt**: `response.agent_trace.selected_attempt` indicates which attempt was used.
3. **Raw provider response**: Use `--return-raw-provider` to include the raw API response.

## Compatibility Notes

- **Missing scores**: `items[].score` may be absent if the provider did not return scores.
- **Missing boxes**: `items[].box` may be absent if the provider did not return geometry.
- **Empty results**: `text_items == 0` means no text was detected (not necessarily an error).
- **Warnings**: Check `quality.warnings` for non-fatal issues (e.g. missing fields).

@@ -0,0 +1,140 @@
# Provider API Reference: Paddle AI Studio PP-OCRv5

This document describes the external provider API contract that this skill depends on.

## Endpoint

**POST** `https://<AISTUDIO_HOST>/ocr`

Where `<AISTUDIO_HOST>` is supplied by the user (e.g. `your-subdomain.aistudio-app.com`).

## Authentication

**Header:**

```
Authorization: token <ACCESS_TOKEN>
```

Where `<ACCESS_TOKEN>` is the API token the user obtains from Paddle AI Studio.

## Request Body

```jsonc
{
  "file": "https://example.com/image.png", // URL or base64 (without data: prefix)
  "fileType": 1,                           // 0 = PDF, 1 = Image
  "visualize": false,                      // Default false (avoids large responses)

  // Text detection options
  "textDetLimitSideLen": 736,  // Side-length limit for detection resizing (see textDetLimitType)
  "textDetLimitType": "max",   // "min" or "max"
  "textDetThresh": 0.3,        // Detection threshold
  "textDetBoxThresh": 0.6,     // Box threshold
  "textDetUnclipRatio": 1.5,   // Unclip ratio

  // Text recognition options
  "textRecScoreThresh": 0.0,   // Recognition score threshold

  // Document preprocessing options
  "useDocOrientationClassify": false, // Enable orientation correction
  "useDocUnwarping": false,           // Enable unwarping/skew correction
  "useTextlineOrientation": false     // Enable textline orientation
}
```

### Key Parameters

- **file**: URL or base64 string of the image/PDF (without a `data:` URI prefix)
- **fileType**:
  - `0` = PDF
  - `1` = Image
- **visualize**: If `true`, returns a visualization image (increases response size)
- **useDocOrientationClassify**: Correct page orientation (0°/90°/180°/270°)
- **useDocUnwarping**: Correct perspective distortion and skew
- **useTextlineOrientation**: Correct individual text line angles
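
The following hypothetical helper builds a request body for a local file. It encodes raw bytes as plain base64 and keeps `visualize` off. It infers `fileType` from the file extension, which is an assumption. It omits the `textDet*` and `textRecScoreThresh` tuning fields, so pass them as keyword arguments if your endpoint requires them. `build_payload` is an illustrative name.

```python
import base64

PDF_FILE_TYPE, IMAGE_FILE_TYPE = 0, 1


def build_payload(data: bytes, filename: str, **options) -> dict:
    """Build a request body for raw file bytes; keyword args override defaults."""
    payload = {
        # Plain base64 with no "data:...;base64," URI prefix.
        "file": base64.b64encode(data).decode("ascii"),
        # Assumption: infer fileType from the extension (.pdf -> 0, else image -> 1).
        "fileType": PDF_FILE_TYPE if filename.lower().endswith(".pdf") else IMAGE_FILE_TYPE,
        "visualize": False,
        "useDocOrientationClassify": False,
        "useDocUnwarping": False,
        "useTextlineOrientation": False,
    }
    payload.update(options)  # e.g. useDocOrientationClassify=True
    return payload
```

For a file on disk, pass `pathlib.Path("scan.pdf").read_bytes()` as `data` and `"scan.pdf"` as `filename`.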

## Response Structure

### Success Response (errorCode == 0)

```jsonc
{
  "errorCode": 0,
  "errorMsg": "",
  "logId": "unique-log-id",
  "result": {
    "ocrResults": [
      {
        "prunedResult": {
          "rec_texts": ["Invoice", "Amount", "123.45"], // Recognized text
          "rec_scores": [0.98, 0.95, 0.92],             // Confidence scores (may be missing)
          "rec_boxes": [                                // Bounding boxes (may be missing)
            [10, 20, 100, 50],
            [10, 60, 150, 90],
            [200, 60, 300, 90]
          ],
          "rec_polys": [...]                            // Polygons (alternative to boxes)
        }
      }
    ]
  }
}
```

### Error Response (errorCode != 0)

```jsonc
{
  "errorCode": 500,
  "errorMsg": "Invalid parameter",
  "logId": "unique-log-id"
}
```

## Error Codes

| HTTP Status | errorCode | Meaning | Mapped Error Code |
|-------------|-----------|---------|-------------------|
| 403 | N/A | Authentication failed | `PROVIDER_AUTH_ERROR` |
| 429 | N/A | Quota/rate limit exceeded | `PROVIDER_QUOTA_EXCEEDED` |
| 500 | 500 | Invalid parameters | `PROVIDER_BAD_REQUEST` |
| 503 | N/A | Service overloaded | `PROVIDER_OVERLOADED` |
| 504 | N/A | Gateway timeout | `PROVIDER_TIMEOUT` |
| Other | Other | Unknown error | `PROVIDER_ERROR` |
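
Here is a sketch of the table as a lookup keyed on HTTP status. As simplifying assumptions, it maps HTTP 500 to `PROVIDER_BAD_REQUEST` whatever the `errorCode`, and it treats a 200 response with `errorCode` 0 (or absent) as success. `STATUS_TO_CODE` and `map_error` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical sketch of the table above, keyed on HTTP status.
STATUS_TO_CODE = {
    403: "PROVIDER_AUTH_ERROR",
    429: "PROVIDER_QUOTA_EXCEEDED",
    500: "PROVIDER_BAD_REQUEST",
    503: "PROVIDER_OVERLOADED",
    504: "PROVIDER_TIMEOUT",
}


def map_error(http_status: int, error_code: int | None = None) -> str | None:
    """Return the unified error code, or None for a successful response."""
    if http_status == 200 and error_code in (0, None):
        return None
    # Anything not in the table, including a 200 with a non-zero errorCode, is generic.
    return STATUS_TO_CODE.get(http_status, "PROVIDER_ERROR")
```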

## Field Compatibility Notes

- **rec_scores**: May be missing or empty. Default to 0.5 if needed.
- **rec_boxes**: May be missing. Use `rec_polys` as a fallback.
- **rec_polys**: May also be missing, in which case no bounding box information is available.
- **visualize result**: Only returned when `visualize: true` (not recommended for auto mode).
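
A sketch of how these fallbacks might be applied when converting one `prunedResult` into the normalized `TextItem` shape (`text`, optional `score`, optional `box`). `normalize_page` is hypothetical. Pairing entries by index assumes the score and geometry arrays are aligned with `rec_texts`.

```python
from __future__ import annotations


def normalize_page(pruned: dict) -> tuple[list[dict], float, list[str]]:
    """Turn one prunedResult into TextItem dicts, an average score, and warnings."""
    texts = pruned.get("rec_texts") or []
    scores = pruned.get("rec_scores") or []
    # Prefer rec_boxes, fall back to rec_polys, otherwise no geometry at all.
    geometry = pruned.get("rec_boxes") or pruned.get("rec_polys") or []
    warnings = [] if scores else ["rec_scores missing"]
    items = []
    for i, text in enumerate(texts):
        item = {"text": text}
        if i < len(scores):
            item["score"] = float(scores[i])
        if i < len(geometry):
            item["box"] = geometry[i]
        items.append(item)
    avg_score = sum(scores) / len(scores) if scores else 0.5  # 0.5 default per the notes
    return items, avg_score, warnings
```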

## Best Practices

1. **Always set `visualize` to false** unless the user explicitly requests it (reduces response size and latency).
2. **Handle missing fields gracefully** (`rec_scores`, `rec_boxes`, and `rec_polys` may be absent).
3. **Retry on 503/504** with exponential backoff (up to 2 retries).
4. **Never log or print tokens** in any output.
5. **Normalize host input** to tolerate common user mistakes (a leading `https://`, a trailing `/ocr`, etc.).
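
A minimal host-normalization sketch for the last point. It strips surrounding whitespace, an `http://` or `https://` scheme, trailing slashes, and a trailing `/ocr` path. It does not validate the result. `normalize_host` and `endpoint_url` are hypothetical helpers.

```python
def normalize_host(raw: str) -> str:
    """Reduce input like ' https://x.aistudio-app.com/ocr/ ' to 'x.aistudio-app.com'."""
    host = raw.strip()
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
            break
    host = host.rstrip("/")
    if host.lower().endswith("/ocr"):
        host = host[: -len("/ocr")]
    return host.rstrip("/")


def endpoint_url(raw_host: str) -> str:
    """Build the POST endpoint from user-supplied host input."""
    return f"https://{normalize_host(raw_host)}/ocr"
```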

## Request Example

```bash
curl -X POST https://your-subdomain.aistudio-app.com/ocr \
  -H "Authorization: token YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/test.png",
    "fileType": 1,
    "visualize": false,
    "useDocOrientationClassify": true,
    "useDocUnwarping": false,
    "useTextlineOrientation": false,
    "textDetLimitSideLen": 736,
    "textDetLimitType": "max",
    "textDetThresh": 0.3,
    "textDetBoxThresh": 0.6,
    "textDetUnclipRatio": 1.5,
    "textRecScoreThresh": 0.0
  }'
```
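
The same request can be built with Python's standard library. This is a sketch: `build_ocr_request` is a hypothetical helper that constructs the `urllib.request.Request` but does not send it. The commented lines show one way to send it. The 25 s timeout was chosen to match the default `PADDLE_OCR_TIMEOUT_MS`.

```python
import json
import urllib.request


def build_ocr_request(host: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        url=f"https://{host}/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",  # never log this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending requires network access and a real token:
# req = build_ocr_request("your-subdomain.aistudio-app.com", token, {"file": url, "fileType": 1})
# with urllib.request.urlopen(req, timeout=25) as resp:
#     body = json.load(resp)
```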