paddleocr-skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. package/README.md +220 -0
  2. package/bin/paddleocr-skills.js +20 -0
  3. package/lib/copy.js +39 -0
  4. package/lib/installer.js +70 -0
  5. package/lib/prompts.js +67 -0
  6. package/lib/python.js +75 -0
  7. package/lib/verify.js +121 -0
  8. package/package.json +42 -0
  9. package/templates/.env.example +12 -0
  10. package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md +64 -0
  11. package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md +154 -0
  12. package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md +157 -0
  13. package/templates/paddleocr-vl/scripts/paddleocr-vl/_lib.py +780 -0
  14. package/templates/paddleocr-vl/scripts/paddleocr-vl/configure.py +270 -0
  15. package/templates/paddleocr-vl/scripts/paddleocr-vl/optimize_file.py +226 -0
  16. package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements-optimize.txt +8 -0
  17. package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements.txt +7 -0
  18. package/templates/paddleocr-vl/scripts/paddleocr-vl/smoke_test.py +199 -0
  19. package/templates/paddleocr-vl/scripts/paddleocr-vl/vl_caller.py +232 -0
  20. package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md +481 -0
  21. package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -0
  22. package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -0
  23. package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -0
  24. package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -0
  25. package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -0
  26. package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -0
  27. package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -0
  28. package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -0
  29. package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -0
@@ -0,0 +1,64 @@
+ # PaddleOCR-VL Layout Schema
+
+ ## Layout Detection Overview
+
+ PaddleOCR-VL uses PP-DocLayoutV2 for automatic layout analysis, detecting semantic regions and determining reading order.
+
+ ## Region Types
+
+ ### Text Regions
+ - **paragraph**: Regular text paragraphs
+ - **title**: Headings and titles
+ - **caption**: Image/table captions
+ - **footnote**: Footnotes and references
+
+ ### Non-Text Regions
+ - **table**: Tabular data
+ - **figure**: Images, charts, diagrams
+ - **formula**: Mathematical formulas
+ - **header**: Page headers
+ - **footer**: Page footers
+
+ ## Layout Structure
+
+ ```json
+ {
+   "layout": {
+     "regions": [
+       {
+         "id": 0,
+         "type": "title",
+         "bbox": [x1, y1, x2, y2],
+         "confidence": 0.95
+       },
+       {
+         "id": 1,
+         "type": "paragraph",
+         "bbox": [x1, y1, x2, y2],
+         "confidence": 0.92
+       }
+     ],
+     "reading_order": [0, 1],
+     "page_number": 1
+   }
+ }
+ ```
+
+ ## Reading Order Algorithm
+
+ The model determines reading order automatically, based on:
+ - Spatial layout (top-to-bottom, left-to-right)
+ - Document structure (titles before content)
+ - Column detection (multi-column layouts)
+ - Semantic relationships
+
+ ## Bounding Box Format
+
+ All bounding boxes use the format `[x1, y1, x2, y2]`, where:
+ - (x1, y1): Top-left corner
+ - (x2, y2): Bottom-right corner
+ - Coordinates are absolute pixel positions
+
+ ---
+
+ *This is a placeholder document. Full layout schema specifications will be added when integration is complete.*
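The layout schema above stores reading order as a list of region ids and each box as absolute `[x1, y1, x2, y2]` pixel coordinates. The following is a minimal Python sketch of consuming that structure, assuming the response has already been parsed into a dict shaped like the example. `regions_in_reading_order` and `normalize_bbox` are hypothetical helper names, not part of PaddleOCR-VL or this package, and the page width and height must be supplied by the caller because the schema does not include them.

```python
from typing import Any, Dict, List, Tuple


def regions_in_reading_order(layout: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the layout's regions ordered by its `reading_order` list of region ids."""
    by_id = {region["id"]: region for region in layout["regions"]}
    # Ids with no matching region are skipped rather than raising KeyError.
    return [by_id[region_id] for region_id in layout["reading_order"] if region_id in by_id]


def normalize_bbox(
    bbox: List[float], page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Convert an absolute [x1, y1, x2, y2] pixel box to page-relative 0-1 coordinates."""
    x1, y1, x2, y2 = bbox
    # (x1, y1) must be the top-left corner and (x2, y2) the bottom-right corner.
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"expected [x1, y1, x2, y2] with x2 > x1 and y2 > y1, got {bbox}")
    return (x1 / page_width, y1 / page_height, x2 / page_width, y2 / page_height)


if __name__ == "__main__":
    # Hypothetical layout shaped like the schema; coordinates are made up for illustration.
    example = {
        "regions": [
            {"id": 1, "type": "paragraph", "bbox": [50, 120, 550, 400], "confidence": 0.92},
            {"id": 0, "type": "title", "bbox": [50, 40, 550, 90], "confidence": 0.95},
        ],
        "reading_order": [0, 1],
        "page_number": 1,
    }
    print([region["type"] for region in regions_in_reading_order(example)])  # ['title', 'paragraph']
```

Skipping unknown ids keeps the helper tolerant of partial layouts; raising instead would be the stricter choice if a mismatch between `regions` and `reading_order` should be treated as a bug.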
@@ -0,0 +1,154 @@
+ # PaddleOCR-VL Output Format Specification
+
+ ## Response Structure
+
+ All PaddleOCR-VL API responses follow this structure:
+
+ ```json
+ {
+   "ok": boolean,
+   "result": { ... },
+   "metadata": { ... },
+   "error": { ... } // Only present if ok = false
+ }
+ ```
+
+ ## Success Response
+
+ ### JSON Output
+
+ ```json
+ {
+   "ok": true,
+   "result": {
+     "layout": {
+       "regions": [...],
+       "reading_order": [...]
+     },
+     "elements": [
+       {
+         "type": "text" | "table" | "formula" | "figure",
+         "content": "...",
+         "bbox": [x1, y1, x2, y2],
+         "confidence": 0.0-1.0
+       }
+     ],
+     "full_text": "Combined text from all elements"
+   },
+   "metadata": {
+     "processing_time_ms": 3500,
+     "total_pages": 1,
+     "languages_detected": ["en", "zh"],
+     "model_version": "paddleocr-vl-0.9b"
+   }
+ }
+ ```
+
+ ### Markdown Output
+
+ When `--format markdown` is specified, the output is formatted as:
+
+ ```markdown
+ # [Title from document]
+
+ [Paragraph text...]
+
+ | Table | Headers |
+ |-------|---------|
+ | Data | Values |
+
+ Formula: $E = mc^2$
+
+ [More content...]
+ ```
+
+ ## Element Types
+
+ ### Text Element
+ ```json
+ {
+   "type": "text",
+   "content": "The actual text content",
+   "bbox": [100, 200, 500, 250],
+   "confidence": 0.95,
+   "language": "en"
+ }
+ ```
+
+ ### Table Element
+ ```json
+ {
+   "type": "table",
+   "content": {
+     "rows": 3,
+     "cols": 2,
+     "cells": [
+       ["Header 1", "Header 2"],
+       ["Data 1", "Data 2"],
+       ["Data 3", "Data 4"]
+     ]
+   },
+   "bbox": [100, 300, 500, 450],
+   "confidence": 0.88
+ }
+ ```
+
+ ### Formula Element
+ ```json
+ {
+   "type": "formula",
+   "content": "E = mc^2",
+   "latex": "$E = mc^2$",
+   "bbox": [200, 500, 400, 530],
+   "confidence": 0.92
+ }
+ ```
+
+ ### Figure Element
+ ```json
+ {
+   "type": "figure",
+   "content": {
+     "description": "Bar chart showing sales data",
+     "extracted_data": {...} // If available
+   },
+   "bbox": [100, 600, 500, 800],
+   "confidence": 0.85
+ }
+ ```
+
+ ## Error Response
+
+ ```json
+ {
+   "ok": false,
+   "error": {
+     "code": "ERROR_CODE",
+     "message": "Human-readable error message",
+     "details": {
+       "field": "Additional context"
+     }
+   }
+ }
+ ```
+
+ ### Error Codes
+
+ - `INVALID_INPUT`: Invalid file or URL
+ - `UNSUPPORTED_FORMAT`: File format not supported
+ - `PROVIDER_AUTH_ERROR`: Authentication failed
+ - `PROVIDER_QUOTA_EXCEEDED`: API quota exceeded
+ - `PROCESSING_ERROR`: Error during document processing
+ - `TIMEOUT`: Request timeout
+
+ ## Quality Metrics
+
+ Each element includes a confidence score (0.0 to 1.0):
+ - **0.90 - 1.00**: Excellent quality, highly reliable
+ - **0.75 - 0.89**: Good quality, generally reliable
+ - **0.50 - 0.74**: Fair quality, may have some errors
+ - **0.00 - 0.49**: Poor quality, likely has errors
+
+ ---
+
+ *This is a placeholder document. Full output format specifications will be added when integration is complete.*
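The output format above wraps every response in an `ok` / `result` / `metadata` / `error` envelope and grades elements by confidence. The Python sketch below shows one way a client might unwrap that envelope, bucket confidence scores into the documented quality bands, and render a table element's `cells` as Markdown. It assumes the JSON field names shown in this placeholder spec; `PaddleOCRVLError`, `parse_response`, `quality_band`, and `table_to_markdown` are hypothetical names, and treating the first row of `cells` as the header is an assumption taken from the example.

```python
import json
from typing import Any, Dict, List, Optional


class PaddleOCRVLError(Exception):
    """Hypothetical exception carrying the documented `error` object."""

    def __init__(self, code: str, message: str, details: Optional[Dict[str, Any]] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}


def parse_response(raw: str) -> Dict[str, Any]:
    """Return `result` from a response envelope, or raise if `ok` is false."""
    payload = json.loads(raw)
    if not payload.get("ok", False):
        error = payload.get("error") or {}
        raise PaddleOCRVLError(
            error.get("code", "UNKNOWN"),
            error.get("message", "no error message supplied"),
            error.get("details"),
        )
    return payload["result"]


# Lower bounds of the bands under "Quality Metrics", highest first.
QUALITY_BANDS = [(0.90, "excellent"), (0.75, "good"), (0.50, "fair"), (0.0, "poor")]


def quality_band(confidence: float) -> str:
    """Map a 0.0-1.0 confidence score to its quality band label."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0, 1], got {confidence}")
    for lower_bound, label in QUALITY_BANDS:
        if confidence >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0.0")


def _markdown_row(cells: List[Any]) -> str:
    """Format one row of cells as a Markdown table row."""
    return "| " + " | ".join(str(cell) for cell in cells) + " |"


def table_to_markdown(content: Dict[str, Any]) -> str:
    """Render a table element's `content` as Markdown, using row 0 as the header."""
    header, *rows = content["cells"]
    lines = [_markdown_row(header), "|" + "---|" * len(header)]
    lines += [_markdown_row(row) for row in rows]
    return "\n".join(lines)
```

The bands are checked by lower bound, so a score such as 0.895, which falls between the documented 0.75 - 0.89 and 0.90 - 1.00 ranges, is classed as "good" rather than left unclassified.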
@@ -0,0 +1,157 @@
+ # PaddleOCR-VL Model Specification
+
+ ## Overview
+
+ PaddleOCR-VL is a vision-language model designed for advanced document parsing. This document describes the model's technical specifications and capabilities.
+
+ ## Model Architecture
+
+ - **Model Size**: 0.9B parameters
+ - **Vision Encoder**: NaViT-style dynamic high-resolution visual encoder
+ - **Language Model**: ERNIE-4.5-0.3B
+ - **Two-Stage Processing**:
+   1. PP-DocLayoutV2 for layout analysis
+   2. PaddleOCR-VL-0.9B for content recognition
+
+ ## Supported Languages
+
+ 109 languages, including:
+ - English, Chinese, Japanese, Korean
+ - European languages (French, German, Spanish, Italian, etc.)
+ - Asian and Middle Eastern languages (Thai, Vietnamese, Arabic, etc.)
+ - And many more
+
+ ## Supported Elements
+
+ ### Text Recognition
+ - Plain text paragraphs
+ - Headings and titles
+ - Captions and labels
+ - Multi-column text
+
+ ### Table Recognition
+ - Simple tables
+ - Complex nested tables
+ - Borderless tables
+ - Multi-page tables
+
+ ### Formula Recognition
+ - Inline formulas
+ - Display formulas
+ - Mathematical equations
+ - Chemical formulas
+ - LaTeX output support
+
+ ### Chart Recognition
+ - Bar charts
+ - Line charts
+ - Pie charts
+ - Diagrams and flowcharts
+
+
51
+ ## Performance Characteristics
52
+
53
+ - **Accuracy**: SOTA on document parsing benchmarks
54
+ - **Speed**: ~3-5 seconds per page (depending on complexity)
55
+ - **Input Resolution**: Dynamic, up to 4K resolution
56
+ - **Max File Size**: 20MB per request
57
+
58
+ ## API Limitations
59
+
60
+ - Maximum pages per request: 10
61
+ - Maximum requests per minute: 30
62
+ - Supported formats: PDF, PNG, JPG, JPEG
63
+ - Maximum file size: 20MB per document
64
+ - Timeout: 30 seconds default (configurable up to 60s)
65
+
66
+ ## Output Format
67
+
68
+ ### Region Types
69
+
70
+ The model identifies and classifies document regions into the following types:
71
+
72
+ | Type | Description | Typical Use |
73
+ |------|-------------|-------------|
74
+ | `header` | Page headers, chapter titles | Navigation, structure |
75
+ | `text` | Main body text, paragraphs | Primary content |
76
+ | `table` | Tabular data with rows/columns | Structured data extraction |
77
+ | `formula` | Mathematical equations | Scientific documents |
78
+ | `figure` | Images, charts, diagrams | Visual content |
79
+ | `footnote` | References, citations | Supporting information |
80
+ | `footer` | Page footers, metadata | Document metadata |
81
+ | `page_number` | Page numbering | Navigation |
82
+ | `margin_note` | Annotations, comments | Supplementary notes |
83
+
84
+ ### Confidence Scoring
85
+
86
+ Each recognized element includes a confidence score (0.0-1.0):
87
+
88
+ - **0.90-1.00**: Excellent recognition quality
89
+ - **0.75-0.89**: Good quality, minor uncertainties
90
+ - **0.60-0.74**: Acceptable, may have some errors
91
+ - **Below 0.60**: Poor quality, manual review recommended
92
+
93
+ ### Layout Analysis
94
+
95
+ The model performs reading order detection and provides:
96
+ - Bounding box coordinates for each element
97
+ - Reading order sequence
98
+ - Element relationships (e.g., caption-to-figure association)
99
+ - Multi-column layout handling
100
+
101
+ ## Quality Considerations
102
+
103
+ ### Best Performance On:
104
+ - High-resolution scanned documents (300 DPI+)
105
+ - Clean, well-formatted PDFs
106
+ - Standard fonts and layouts
107
+ - Good contrast between text and background
108
+
109
+ ### Challenging Scenarios:
110
+ - Handwritten text (not supported)
111
+ - Very low resolution images (<150 DPI)
112
+ - Heavy distortion or skew
113
+ - Unusual fonts or decorative text
114
+
+ ## API Response Structure
+
+ The API returns a JSON structure of the following shape:
+
+ ```json
+ {
+   "ok": true,
+   "result": {
+     "full_text": "Complete extracted text...",
+     "layout": {
+       "regions": [
+         {
+           "id": 0,
+           "type": "text|table|formula|figure|header|footer|...",
+           "content": "...",
+           "bbox": [x1, y1, x2, y2],
+           "confidence": 0.95,
+           "page": 1
+         }
+       ],
+       "reading_order": [0, 1, 2, ...]
+     }
+   },
+   "metadata": {
+     "processing_time_ms": 3500,
+     "total_pages": 1,
+     "model_version": "paddleocr-vl-0.9b"
+   }
+ }
+ ```
+
+ ## Integration Notes
+
+ - All requests must include a valid authentication token
+ - Responses are cached for 10 minutes by default
+ - Failed requests return standardized error codes
+ - Retry logic is recommended for transient failures (HTTP 503, 504)
+
+ ---
+
+ **Document Version**: 1.0
+ **Last Updated**: 2026-01-28
+ **Status**: Production Ready
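The limits under API Limitations and the retry advice under Integration Notes can be enforced on the client side. Here is a Python sketch, assuming the caller already knows the page count (counting PDF pages needs a separate library) and supplies a `send` callable that performs the request and returns an `(http_status, body)` pair. The helper names, the retry count, and the backoff delays are illustrative choices, not values from the spec, and "20MB" is read as 20 MiB.

```python
import time
from pathlib import Path
from typing import Callable, Tuple

# Limits copied from "API Limitations"; "20MB" is assumed to mean 20 MiB.
MAX_FILE_BYTES = 20 * 1024 * 1024
MAX_PAGES = 10
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}
# Transient failures named in "Integration Notes".
TRANSIENT_STATUSES = {503, 504}


def check_upload(path: Path, page_count: int) -> None:
    """Raise ValueError if a file would exceed the documented per-request limits."""
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format: {path.suffix or '(no extension)'}")
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
    size = path.stat().st_size  # checked last because it needs the file on disk
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{size} bytes exceeds the {MAX_FILE_BYTES}-byte limit")


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff on HTTP 503/504.

    Returns the first non-transient response, or the last response once
    `max_attempts` calls have been made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in TRANSIENT_STATUSES or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise AssertionError("unreachable: the loop always returns")
```

Injecting `sleep` keeps the backoff testable without real delays. The 30 requests-per-minute limit would additionally require spacing calls at least 2 seconds apart, which this sketch does not enforce.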