paddleocr-skills 1.0.0 → 1.1.0
- package/README.md +220 -220
- package/bin/paddleocr-skills.js +33 -20
- package/lib/copy.js +39 -39
- package/lib/installer.js +76 -70
- package/lib/prompts.js +67 -67
- package/lib/python.js +75 -75
- package/lib/verify.js +121 -121
- package/package.json +42 -42
- package/templates/.env.example +12 -12
- package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/layout_schema.md +64 -64
- package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/output_format.md +154 -154
- package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/vl_model_spec.md +157 -157
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/_lib.py +780 -780
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/configure.py +270 -270
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/optimize_file.py +226 -226
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements-optimize.txt +8 -8
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements.txt +7 -7
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/smoke_test.py +199 -199
- package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/vl_caller.py +232 -232
- package/templates/{paddleocr-vl/skills/paddleocr-vl → paddleocr-vl-1.5/skills/paddleocr-vl-1.5}/SKILL.md +481 -481
- package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -258
- package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -257
- package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -140
- package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -635
- package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -346
- package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -684
- package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -4
- package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -139
- package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -272
@@ -1,154 +1,154 @@
-# PaddleOCR-VL Output Format Specification
-
-## Response Structure
-
-All PaddleOCR-VL API responses follow this structure:
-
-```json
-{
-"ok": boolean,
-"result": { ... },
-"metadata": { ... },
-"error": { ... } // Only present if ok = false
-}
-```
-
-## Success Response
-
-### JSON Output
-
-```json
-{
-"ok": true,
-"result": {
-"layout": {
-"regions": [...],
-"reading_order": [...]
-},
-"elements": [
-{
-"type": "text" | "table" | "formula" | "figure",
-"content": "...",
-"bbox": [x1, y1, x2, y2],
-"confidence": 0.0-1.0
-}
-],
-"full_text": "Combined text from all elements"
-},
-"metadata": {
-"processing_time_ms": 3500,
-"total_pages": 1,
-"languages_detected": ["en", "zh"],
-"model_version": "paddleocr-vl-
-}
-}
-```
-
-### Markdown Output
-
-When `--format markdown` is specified, the output is formatted as:
-
-```markdown
-# [Title from document]
-
-[Paragraph text...]
-
-| Table | Headers |
-|-------|---------|
-| Data | Values |
-
-Formula: $E = mc^2$
-
-[More content...]
-```
-
-## Element Types
-
-### Text Element
-```json
-{
-"type": "text",
-"content": "The actual text content",
-"bbox": [100, 200, 500, 250],
-"confidence": 0.95,
-"language": "en"
-}
-```
-
-### Table Element
-```json
-{
-"type": "table",
-"content": {
-"rows": 3,
-"cols": 2,
-"cells": [
-["Header 1", "Header 2"],
-["Data 1", "Data 2"],
-["Data 3", "Data 4"]
-]
-},
-"bbox": [100, 300, 500, 450],
-"confidence": 0.88
-}
-```
-
-### Formula Element
-```json
-{
-"type": "formula",
-"content": "E = mc^2",
-"latex": "$E = mc^2$",
-"bbox": [200, 500, 400, 530],
-"confidence": 0.92
-}
-```
-
-### Figure Element
-```json
-{
-"type": "figure",
-"content": {
-"description": "Bar chart showing sales data",
-"extracted_data": {...} // If available
-},
-"bbox": [100, 600, 500, 800],
-"confidence": 0.85
-}
-```
-
-## Error Response
-
-```json
-{
-"ok": false,
-"error": {
-"code": "ERROR_CODE",
-"message": "Human-readable error message",
-"details": {
-"field": "Additional context"
-}
-}
-}
-```
-
-### Error Codes
-
-- `INVALID_INPUT`: Invalid file or URL
-- `UNSUPPORTED_FORMAT`: File format not supported
-- `PROVIDER_AUTH_ERROR`: Authentication failed
-- `PROVIDER_QUOTA_EXCEEDED`: API quota exceeded
-- `PROCESSING_ERROR`: Error during document processing
-- `TIMEOUT`: Request timeout
-
-## Quality Metrics
-
-Each element includes a confidence score (0.0 to 1.0):
-- **0.90 - 1.00**: Excellent quality, highly reliable
-- **0.75 - 0.89**: Good quality, generally reliable
-- **0.50 - 0.74**: Fair quality, may have some errors
-- **0.00 - 0.49**: Poor quality, likely has errors
-
----
-
-*This is a placeholder document. Full output format specifications will be added when integration is complete.*
+# PaddleOCR-VL 1.5 Output Format Specification
+
+## Response Structure
+
+All PaddleOCR-VL 1.5 API responses follow this structure:
+
+```json
+{
+"ok": boolean,
+"result": { ... },
+"metadata": { ... },
+"error": { ... } // Only present if ok = false
+}
+```
+
+## Success Response
+
+### JSON Output
+
+```json
+{
+"ok": true,
+"result": {
+"layout": {
+"regions": [...],
+"reading_order": [...]
+},
+"elements": [
+{
+"type": "text" | "table" | "formula" | "figure",
+"content": "...",
+"bbox": [x1, y1, x2, y2],
+"confidence": 0.0-1.0
+}
+],
+"full_text": "Combined text from all elements"
+},
+"metadata": {
+"processing_time_ms": 3500,
+"total_pages": 1,
+"languages_detected": ["en", "zh"],
+"model_version": "paddleocr-vl-1.5"
+}
+}
+```
+
+### Markdown Output
+
+When `--format markdown` is specified, the output is formatted as:
+
+```markdown
+# [Title from document]
+
+[Paragraph text...]
+
+| Table | Headers |
+|-------|---------|
+| Data | Values |
+
+Formula: $E = mc^2$
+
+[More content...]
+```
+
+## Element Types
+
+### Text Element
+```json
+{
+"type": "text",
+"content": "The actual text content",
+"bbox": [100, 200, 500, 250],
+"confidence": 0.95,
+"language": "en"
+}
+```
+
+### Table Element
+```json
+{
+"type": "table",
+"content": {
+"rows": 3,
+"cols": 2,
+"cells": [
+["Header 1", "Header 2"],
+["Data 1", "Data 2"],
+["Data 3", "Data 4"]
+]
+},
+"bbox": [100, 300, 500, 450],
+"confidence": 0.88
+}
+```
+
+### Formula Element
+```json
+{
+"type": "formula",
+"content": "E = mc^2",
+"latex": "$E = mc^2$",
+"bbox": [200, 500, 400, 530],
+"confidence": 0.92
+}
+```
+
+### Figure Element
+```json
+{
+"type": "figure",
+"content": {
+"description": "Bar chart showing sales data",
+"extracted_data": {...} // If available
+},
+"bbox": [100, 600, 500, 800],
+"confidence": 0.85
+}
+```
+
+## Error Response
+
+```json
+{
+"ok": false,
+"error": {
+"code": "ERROR_CODE",
+"message": "Human-readable error message",
+"details": {
+"field": "Additional context"
+}
+}
+}
+```
+
+### Error Codes
+
+- `INVALID_INPUT`: Invalid file or URL
+- `UNSUPPORTED_FORMAT`: File format not supported
+- `PROVIDER_AUTH_ERROR`: Authentication failed
+- `PROVIDER_QUOTA_EXCEEDED`: API quota exceeded
+- `PROCESSING_ERROR`: Error during document processing
+- `TIMEOUT`: Request timeout
+
+## Quality Metrics
+
+Each element includes a confidence score (0.0 to 1.0):
+- **0.90 - 1.00**: Excellent quality, highly reliable
+- **0.75 - 0.89**: Good quality, generally reliable
+- **0.50 - 0.74**: Fair quality, may have some errors
+- **0.00 - 0.49**: Poor quality, likely has errors
+
+---
+
+*This is a placeholder document. Full output format specifications will be added when integration is complete.*
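Reviewer note: the output format specification in the hunk above describes a response envelope (`ok`, `result`, `metadata`, `error`) and four confidence bands. The following is a minimal Python sketch of how a consumer might unwrap that envelope and label each element, assuming the response has already been parsed into a dictionary. The names `OcrError`, `unwrap`, `confidence_band`, and `summarize` are hypothetical and do not come from the package; the band thresholds treat each band's lower bound as inclusive, which the specification does not state explicitly.

```python
from typing import Any


class OcrError(Exception):
    """Raised when a response envelope reports ok = false (hypothetical helper)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def confidence_band(score: float) -> str:
    """Map a confidence score to the bands listed under Quality Metrics."""
    if score >= 0.90:
        return "excellent"
    if score >= 0.75:
        return "good"
    if score >= 0.50:
        return "fair"
    return "poor"


def unwrap(response: dict[str, Any]) -> dict[str, Any]:
    """Return the `result` object, or raise OcrError for an error envelope."""
    if not response.get("ok"):
        error = response.get("error") or {}
        raise OcrError(error.get("code", "UNKNOWN"), error.get("message", ""))
    return response["result"]


def summarize(response: dict[str, Any]) -> list[tuple[str, str]]:
    """List (element type, confidence band) for each element of a success response."""
    result = unwrap(response)
    return [
        (element["type"], confidence_band(element["confidence"]))
        for element in result.get("elements", [])
    ]


# Illustrative data shaped like the documented success response.
sample = {
    "ok": True,
    "result": {
        "elements": [
            {"type": "text", "content": "Hello", "bbox": [100, 200, 500, 250], "confidence": 0.95},
            {"type": "table", "content": {"rows": 1, "cols": 1, "cells": [["x"]]},
             "bbox": [100, 300, 500, 450], "confidence": 0.88},
            {"type": "figure", "content": {"description": "chart"},
             "bbox": [100, 600, 500, 800], "confidence": 0.42},
        ],
        "full_text": "Hello",
    },
    "metadata": {"total_pages": 1},
}

print(summarize(sample))
```

Raising a typed exception for `ok = false` keeps the documented error codes (for example `TIMEOUT`) available to callers through `OcrError.code` instead of leaving them buried in the payload.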
@@ -1,157 +1,157 @@
-# PaddleOCR-VL Model Specification
-
-## Overview
-
-PaddleOCR-VL is a vision-language model designed for advanced document parsing. This document provides the technical specifications and capabilities of the model.
-
-## Model Architecture
-
-- **Model Size**:
-- **Vision Encoder**: NaViT-style dynamic high-resolution visual encoder
-- **Language Model**: ERNIE-4.5-0.3B
-- **Two-Stage Processing**:
-1. PP-DocLayoutV2 for layout analysis
-2. PaddleOCR-VL-
-
-## Supported Languages
-
-109 languages including:
-- English, Chinese, Japanese, Korean
-- European languages (French, German, Spanish, Italian, etc.)
-- Asian languages (Thai, Vietnamese, Arabic, etc.)
-- And many more...
-
-## Supported Elements
-
-### Text Recognition
-- Plain text paragraphs
-- Headings and titles
-- Captions and labels
-- Multi-column text
-
-### Table Recognition
-- Simple tables
-- Complex nested tables
-- Borderless tables
-- Multi-page tables
-
-### Formula Recognition
-- Inline formulas
-- Display formulas
-- Mathematical equations
-- Chemical formulas
-- LaTeX output support
-
-### Chart Recognition
-- Bar charts
-- Line charts
-- Pie charts
-- Diagrams and flowcharts
-
-## Performance Characteristics
-
-- **Accuracy**: SOTA on document parsing benchmarks
-- **Speed**: ~3-5 seconds per page (depending on complexity)
-- **Input Resolution**: Dynamic, up to 4K resolution
-- **Max File Size**: 20MB per request
-
-## API Limitations
-
-- Maximum pages per request: 10
-- Maximum requests per minute: 30
-- Supported formats: PDF, PNG, JPG, JPEG
-- Maximum file size: 20MB per document
-- Timeout: 30 seconds default (configurable up to 60s)
-
-## Output Format
-
-### Region Types
-
-The model identifies and classifies document regions into the following types:
-
-| Type | Description | Typical Use |
-|------|-------------|-------------|
-| `header` | Page headers, chapter titles | Navigation, structure |
-| `text` | Main body text, paragraphs | Primary content |
-| `table` | Tabular data with rows/columns | Structured data extraction |
-| `formula` | Mathematical equations | Scientific documents |
-| `figure` | Images, charts, diagrams | Visual content |
-| `footnote` | References, citations | Supporting information |
-| `footer` | Page footers, metadata | Document metadata |
-| `page_number` | Page numbering | Navigation |
-| `margin_note` | Annotations, comments | Supplementary notes |
-
-### Confidence Scoring
-
-Each recognized element includes a confidence score (0.0-1.0):
-
-- **0.90-1.00**: Excellent recognition quality
-- **0.75-0.89**: Good quality, minor uncertainties
-- **0.60-0.74**: Acceptable, may have some errors
-- **Below 0.60**: Poor quality, manual review recommended
-
-### Layout Analysis
-
-The model performs reading order detection and provides:
-- Bounding box coordinates for each element
-- Reading order sequence
-- Element relationships (e.g., caption-to-figure association)
-- Multi-column layout handling
-
-## Quality Considerations
-
-### Best Performance On:
-- High-resolution scanned documents (300 DPI+)
-- Clean, well-formatted PDFs
-- Standard fonts and layouts
-- Good contrast between text and background
-
-### Challenging Scenarios:
-- Handwritten text (not supported)
-- Very low resolution images (<150 DPI)
-- Heavy distortion or skew
-- Unusual fonts or decorative text
-
-## API Response Structure
-
-The API returns a comprehensive JSON structure:
-
-```json
-{
-"ok": true,
-"result": {
-"full_text": "Complete extracted text...",
-"layout": {
-"regions": [
-{
-"id": 0,
-"type": "text|table|formula|figure|header|footer|...",
-"content": "...",
-"bbox": [x1, y1, x2, y2],
-"confidence": 0.95,
-"page": 1
-}
-],
-"reading_order": [0, 1, 2, ...]
-}
-},
-"metadata": {
-"processing_time_ms": 3500,
-"total_pages": 1,
-"model_version": "paddleocr-vl-
-}
-}
-```
-
-## Integration Notes
-
-- All requests must include valid authentication token
-- Responses are cached for 10 minutes by default
-- Failed requests return standardized error codes
-- Retry logic recommended for transient failures (503, 504)
-
----
-
-**Document Version**: 1.0
-**Last Updated**: 2026-01-28
-**Status**: Production Ready
+# PaddleOCR-VL 1.5 Model Specification
+
+## Overview
+
+PaddleOCR-VL 1.5 is a vision-language model designed for advanced document parsing. This document provides the technical specifications and capabilities of the model.
+
+## Model Architecture
+
+- **Model Size**: 1.5B parameters
+- **Vision Encoder**: NaViT-style dynamic high-resolution visual encoder
+- **Language Model**: ERNIE-4.5-0.3B
+- **Two-Stage Processing**:
+1. PP-DocLayoutV2 for layout analysis
+2. PaddleOCR-VL-1.5B for content recognition
+
+## Supported Languages
+
+109 languages including:
+- English, Chinese, Japanese, Korean
+- European languages (French, German, Spanish, Italian, etc.)
+- Asian languages (Thai, Vietnamese, Arabic, etc.)
+- And many more...
+
+## Supported Elements
+
+### Text Recognition
+- Plain text paragraphs
+- Headings and titles
+- Captions and labels
+- Multi-column text
+
+### Table Recognition
+- Simple tables
+- Complex nested tables
+- Borderless tables
+- Multi-page tables
+
+### Formula Recognition
+- Inline formulas
+- Display formulas
+- Mathematical equations
+- Chemical formulas
+- LaTeX output support
+
+### Chart Recognition
+- Bar charts
+- Line charts
+- Pie charts
+- Diagrams and flowcharts
+
+## Performance Characteristics
+
+- **Accuracy**: SOTA on document parsing benchmarks
+- **Speed**: ~3-5 seconds per page (depending on complexity)
+- **Input Resolution**: Dynamic, up to 4K resolution
+- **Max File Size**: 20MB per request
+
+## API Limitations
+
+- Maximum pages per request: 10
+- Maximum requests per minute: 30
+- Supported formats: PDF, PNG, JPG, JPEG
+- Maximum file size: 20MB per document
+- Timeout: 30 seconds default (configurable up to 60s)
+
+## Output Format
+
+### Region Types
+
+The model identifies and classifies document regions into the following types:
+
+| Type | Description | Typical Use |
+|------|-------------|-------------|
+| `header` | Page headers, chapter titles | Navigation, structure |
+| `text` | Main body text, paragraphs | Primary content |
+| `table` | Tabular data with rows/columns | Structured data extraction |
+| `formula` | Mathematical equations | Scientific documents |
+| `figure` | Images, charts, diagrams | Visual content |
+| `footnote` | References, citations | Supporting information |
+| `footer` | Page footers, metadata | Document metadata |
+| `page_number` | Page numbering | Navigation |
+| `margin_note` | Annotations, comments | Supplementary notes |
+
+### Confidence Scoring
+
+Each recognized element includes a confidence score (0.0-1.0):
+
+- **0.90-1.00**: Excellent recognition quality
+- **0.75-0.89**: Good quality, minor uncertainties
+- **0.60-0.74**: Acceptable, may have some errors
+- **Below 0.60**: Poor quality, manual review recommended
+
+### Layout Analysis
+
+The model performs reading order detection and provides:
+- Bounding box coordinates for each element
+- Reading order sequence
+- Element relationships (e.g., caption-to-figure association)
+- Multi-column layout handling
+
+## Quality Considerations
+
+### Best Performance On:
+- High-resolution scanned documents (300 DPI+)
+- Clean, well-formatted PDFs
+- Standard fonts and layouts
+- Good contrast between text and background
+
+### Challenging Scenarios:
+- Handwritten text (not supported)
+- Very low resolution images (<150 DPI)
+- Heavy distortion or skew
+- Unusual fonts or decorative text
+
+## API Response Structure
+
+The API returns a comprehensive JSON structure:
+
+```json
+{
+"ok": true,
+"result": {
+"full_text": "Complete extracted text...",
+"layout": {
+"regions": [
+{
+"id": 0,
+"type": "text|table|formula|figure|header|footer|...",
+"content": "...",
+"bbox": [x1, y1, x2, y2],
+"confidence": 0.95,
+"page": 1
+}
+],
+"reading_order": [0, 1, 2, ...]
+}
+},
+"metadata": {
+"processing_time_ms": 3500,
+"total_pages": 1,
+"model_version": "paddleocr-vl-1.5"
+}
+}
+```
+
+## Integration Notes
+
+- All requests must include valid authentication token
+- Responses are cached for 10 minutes by default
+- Failed requests return standardized error codes
+- Retry logic recommended for transient failures (503, 504)
+
+---
+
+**Document Version**: 1.0
+**Last Updated**: 2026-01-28
+**Status**: Production Ready
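Reviewer note: the model specification in the hunk above pairs a list of `regions` (each with an `id`) with a `reading_order` array, and recommends manual review below a confidence of 0.60. The sketch below shows one way a consumer might walk regions in reading order and flag low-confidence ones. It assumes that `reading_order` holds region `id` values, which the specification implies but does not state; `regions_in_reading_order`, `needs_review`, and `REVIEW_THRESHOLD` are hypothetical names, not part of the package.

```python
from typing import Any

# Threshold taken from the "Below 0.60" band in the Confidence Scoring list.
REVIEW_THRESHOLD = 0.60


def regions_in_reading_order(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return layout regions sorted by the reading_order sequence.

    Assumption: each entry of reading_order is a region id. Ids that do not
    match any region are skipped rather than raising.
    """
    layout = result["layout"]
    by_id = {region["id"]: region for region in layout["regions"]}
    return [by_id[rid] for rid in layout["reading_order"] if rid in by_id]


def needs_review(region: dict[str, Any]) -> bool:
    """True when the region falls in the 'manual review recommended' band."""
    return region["confidence"] < REVIEW_THRESHOLD


# Illustrative data shaped like the documented `result` object.
sample_result = {
    "full_text": "Title ... body ...",
    "layout": {
        "regions": [
            {"id": 0, "type": "header", "content": "Title", "bbox": [0, 0, 10, 10],
             "confidence": 0.97, "page": 1},
            {"id": 1, "type": "text", "content": "body", "bbox": [0, 40, 10, 60],
             "confidence": 0.55, "page": 1},
            {"id": 2, "type": "table", "content": "...", "bbox": [0, 20, 10, 30],
             "confidence": 0.91, "page": 1},
        ],
        "reading_order": [0, 2, 1],
    },
}

ordered = regions_in_reading_order(sample_result)
print([region["type"] for region in ordered])
print([region["id"] for region in ordered if needs_review(region)])
```

Indexing regions by `id` first avoids relying on the array position matching the identifier, which the specification does not guarantee.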