paddleocr-skills 1.0.0 → 1.1.0

Files changed (29)
  1. package/README.md +220 -220
  2. package/bin/paddleocr-skills.js +33 -20
  3. package/lib/copy.js +39 -39
  4. package/lib/installer.js +76 -70
  5. package/lib/prompts.js +67 -67
  6. package/lib/python.js +75 -75
  7. package/lib/verify.js +121 -121
  8. package/package.json +42 -42
  9. package/templates/.env.example +12 -12
  10. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/layout_schema.md +64 -64
  11. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/output_format.md +154 -154
  12. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/vl_model_spec.md +157 -157
  13. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/_lib.py +780 -780
  14. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/configure.py +270 -270
  15. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/optimize_file.py +226 -226
  16. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements-optimize.txt +8 -8
  17. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements.txt +7 -7
  18. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/smoke_test.py +199 -199
  19. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/vl_caller.py +232 -232
  20. package/templates/{paddleocr-vl/skills/paddleocr-vl → paddleocr-vl-1.5/skills/paddleocr-vl-1.5}/SKILL.md +481 -481
  21. package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -258
  22. package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -257
  23. package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -140
  24. package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -635
  25. package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -346
  26. package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -684
  27. package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -4
  28. package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -139
  29. package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -272
@@ -1,257 +1,257 @@
- # Normalized Output Schema
-
- This document defines the unified output format returned by `ocr_caller.py`, and the format that downstream agents/tools should expect.
-
- ## Schema Version
-
- **v0.1** (stable)
-
- ## Output Structure
-
- All responses follow this top-level structure:
-
- ```typescript
- {
-   ok: boolean, // true indicates OCR success
-   request_id: string, // Unique request ID (e.g.: "req_abc123")
-   provider: ProviderInfo, // Provider API metadata
-   result: Result | null, // OCR results (null on error)
-   quality: Quality | null, // Quality metrics (null on error)
-   agent_trace: AgentTrace, // Execution trace for transparency
-   raw_provider: any | null, // Raw provider response (if --return-raw-provider is used)
-   error: Error | null // Error details (null on success)
- }
- ```
-
- ## ProviderInfo
-
- ```typescript
- {
-   api_url: string, // Full API endpoint used
-   status_code: number, // HTTP status code
-   log_id: string | null // Provider's log ID (if available)
- }
- ```
-
- ## Result (success only)
-
- ```typescript
- {
-   pages: Page[], // Array of pages (one per image/PDF page)
-   full_text: string // All text joined by "\n\n"
- }
- ```
-
- ### Page
-
- ```typescript
- {
-   page_index: number, // 0-based page number
-   text: string, // Page text (items joined by "\n")
-   avg_confidence: number, // Average confidence for this page (0.0-1.0)
-   items: TextItem[] // Individual text blocks/lines
- }
- ```
-
- ### TextItem
-
- ```typescript
- {
-   text: string, // Recognized text
-   score?: number, // Confidence score (0.0-1.0), may be missing
-   box?: number[] | number[][] // Bounding box or polygon, may be missing
- }
- ```
-
- **Box format:**
- - `[xmin, ymin, xmax, ymax]` - Bounding box (4 numbers)
- - Or polygon points (array of arrays)
-
- ## Quality (success only)
-
- ```typescript
- {
-   quality_score: number, // Overall quality (0.0-1.0)
-   avg_rec_score: number, // Average recognition confidence (0.0-1.0)
-   text_items: number, // Total text items detected
-   warnings: string[] // Warnings (e.g.: "rec_scores missing")
- }
- ```
-
- ### Quality Score Formula
-
- ```
- quality_score = 0 if text_items == 0
-               = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
-
- norm(n) = min(1, log(1+n) / log(1+50))
- ```
-
- ## AgentTrace
-
- ```typescript
- {
-   mode: "fast" | "quality" | "auto", // Execution mode
-   selected_attempt: number, // Attempt used (1-indexed)
-   attempts: Attempt[] // Details of all attempts
- }
- ```
-
- ### Attempt
-
- ```typescript
- {
-   attempt: number, // Attempt number (1-indexed)
-   provider_time_ms: number, // Provider call latency
-   quality_score: number, // Quality score for this attempt
-   avg_rec_score: number, // Average recognition score
-   text_items: number, // Number of text items
-   warnings: string[], // Warnings for this attempt
-   options_effective: { // OCR options used
-     use_doc_orientation_classify: boolean,
-     use_doc_unwarping: boolean,
-     use_textline_orientation: boolean
-   }
- }
- ```
-
- ## Error (error only)
-
- ```typescript
- {
-   code: ErrorCode, // Unified error code
-   message: string, // Human-readable error message
-   details: object // Additional error context
- }
- ```
-
- ### ErrorCode Enum
-
- - `PROVIDER_AUTH_ERROR` - Authentication failed (403)
- - `PROVIDER_QUOTA_EXCEEDED` - Quota/rate limit exceeded (429)
- - `PROVIDER_BAD_REQUEST` - Invalid parameters (500)
- - `PROVIDER_OVERLOADED` - Service overloaded (503)
- - `PROVIDER_TIMEOUT` - Gateway timeout (504)
- - `PROVIDER_ERROR` - Other provider errors
-
- ## Example: Success Response
-
- ```json
- {
-   "ok": true,
-   "request_id": "req_abc123",
-   "provider": {
-     "api_url": "https://example.aistudio-app.com/ocr",
-     "status_code": 200,
-     "log_id": "log_xyz"
-   },
-   "result": {
-     "pages": [
-       {
-         "page_index": 0,
-         "text": "Invoice\nAmount: $123.45",
-         "avg_confidence": 0.95,
-         "items": [
-           {"text": "Invoice", "score": 0.98, "box": [10, 20, 100, 50]},
-           {"text": "Amount: $123.45", "score": 0.92, "box": [10, 60, 200, 90]}
-         ]
-       }
-     ],
-     "full_text": "Invoice\nAmount: $123.45"
-   },
-   "quality": {
-     "quality_score": 0.79,
-     "avg_rec_score": 0.95,
-     "text_items": 2,
-     "warnings": []
-   },
-   "agent_trace": {
-     "mode": "auto",
-     "selected_attempt": 1,
-     "attempts": [
-       {
-         "attempt": 1,
-         "provider_time_ms": 1200,
-         "quality_score": 0.79,
-         "avg_rec_score": 0.95,
-         "text_items": 2,
-         "warnings": [],
-         "options_effective": {
-           "use_doc_orientation_classify": false,
-           "use_doc_unwarping": false,
-           "use_textline_orientation": false
-         }
-       }
-     ]
-   },
-   "raw_provider": null,
-   "error": null
- }
- ```
-
- ## Example: Error Response
-
- ```json
- {
-   "ok": false,
-   "request_id": "req_def456",
-   "provider": {
-     "api_url": "https://example.aistudio-app.com/ocr",
-     "status_code": 403,
-     "log_id": null
-   },
-   "result": null,
-   "quality": null,
-   "agent_trace": {
-     "mode": "auto",
-     "selected_attempt": 1,
-     "attempts": [
-       {
-         "attempt": 1,
-         "provider_time_ms": 150,
-         "quality_score": 0.0,
-         "avg_rec_score": 0.0,
-         "text_items": 0,
-         "warnings": ["Provider error: Authentication failed"],
-         "options_effective": {
-           "use_doc_orientation_classify": false,
-           "use_doc_unwarping": false,
-           "use_textline_orientation": false
-         }
-       }
-     ]
-   },
-   "raw_provider": null,
-   "error": {
-     "code": "PROVIDER_AUTH_ERROR",
-     "message": "Authentication failed",
-     "details": {
-       "error_code": 403,
-       "status_code": 403
-     }
-   }
- }
- ```
-
- ## Usage Guide
-
- ### For Agents/Scripts
-
- 1. **Check `ok` first**: `if response.ok:`
- 2. **Extract text**: `response.result.full_text`
- 3. **Extract structured data**: `response.result.pages[].items[]`
- 4. **Check quality**: `response.quality.quality_score` (0.72+ is usually good)
- 5. **Handle errors**: `response.error.code` and `response.error.message`
-
- ### For Debugging
-
- 1. **Check trace**: `response.agent_trace.attempts` shows all attempts and their quality
- 2. **Selected attempt**: `response.agent_trace.selected_attempt` indicates which one was selected
- 3. **Raw provider**: Use `--return-raw-provider` to see raw API response
-
- ## Compatibility Notes
-
- - **Missing scores**: `items[].score` may not exist if provider didn't return scores
- - **Missing boxes**: `items[].box` may not exist if provider didn't return geometry
- - **Empty results**: `text_items == 0` means no text detected (not necessarily an error)
- - **Warnings**: Check `quality.warnings` for non-fatal issues (e.g.: missing fields)
+ # Normalized Output Schema
+
+ This document defines the unified output format returned by `ocr_caller.py`, and the format that downstream agents/tools should expect.
+
+ ## Schema Version
+
+ **v0.1** (stable)
+
+ ## Output Structure
+
+ All responses follow this top-level structure:
+
+ ```typescript
+ {
+   ok: boolean, // true indicates OCR success
+   request_id: string, // Unique request ID (e.g.: "req_abc123")
+   provider: ProviderInfo, // Provider API metadata
+   result: Result | null, // OCR results (null on error)
+   quality: Quality | null, // Quality metrics (null on error)
+   agent_trace: AgentTrace, // Execution trace for transparency
+   raw_provider: any | null, // Raw provider response (if --return-raw-provider is used)
+   error: Error | null // Error details (null on success)
+ }
+ ```
+
+ ## ProviderInfo
+
+ ```typescript
+ {
+   api_url: string, // Full API endpoint used
+   status_code: number, // HTTP status code
+   log_id: string | null // Provider's log ID (if available)
+ }
+ ```
+
+ ## Result (success only)
+
+ ```typescript
+ {
+   pages: Page[], // Array of pages (one per image/PDF page)
+   full_text: string // All text joined by "\n\n"
+ }
+ ```
+
+ ### Page
+
+ ```typescript
+ {
+   page_index: number, // 0-based page number
+   text: string, // Page text (items joined by "\n")
+   avg_confidence: number, // Average confidence for this page (0.0-1.0)
+   items: TextItem[] // Individual text blocks/lines
+ }
+ ```
+
+ ### TextItem
+
+ ```typescript
+ {
+   text: string, // Recognized text
+   score?: number, // Confidence score (0.0-1.0), may be missing
+   box?: number[] | number[][] // Bounding box or polygon, may be missing
+ }
+ ```
+
+ **Box format:**
+ - `[xmin, ymin, xmax, ymax]` - Bounding box (4 numbers)
+ - Or polygon points (array of arrays)
+
+ ## Quality (success only)
+
+ ```typescript
+ {
+   quality_score: number, // Overall quality (0.0-1.0)
+   avg_rec_score: number, // Average recognition confidence (0.0-1.0)
+   text_items: number, // Total text items detected
+   warnings: string[] // Warnings (e.g.: "rec_scores missing")
+ }
+ ```
+
+ ### Quality Score Formula
+
+ ```
+ quality_score = 0 if text_items == 0
+               = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
+
+ norm(n) = min(1, log(1+n) / log(1+50))
+ ```
+
+ ## AgentTrace
+
+ ```typescript
+ {
+   mode: "fast" | "quality" | "auto", // Execution mode
+   selected_attempt: number, // Attempt used (1-indexed)
+   attempts: Attempt[] // Details of all attempts
+ }
+ ```
+
+ ### Attempt
+
+ ```typescript
+ {
+   attempt: number, // Attempt number (1-indexed)
+   provider_time_ms: number, // Provider call latency
+   quality_score: number, // Quality score for this attempt
+   avg_rec_score: number, // Average recognition score
+   text_items: number, // Number of text items
+   warnings: string[], // Warnings for this attempt
+   options_effective: { // OCR options used
+     use_doc_orientation_classify: boolean,
+     use_doc_unwarping: boolean,
+     use_textline_orientation: boolean
+   }
+ }
+ ```
+
+ ## Error (error only)
+
+ ```typescript
+ {
+   code: ErrorCode, // Unified error code
+   message: string, // Human-readable error message
+   details: object // Additional error context
+ }
+ ```
+
+ ### ErrorCode Enum
+
+ - `PROVIDER_AUTH_ERROR` - Authentication failed (403)
+ - `PROVIDER_QUOTA_EXCEEDED` - Quota/rate limit exceeded (429)
+ - `PROVIDER_BAD_REQUEST` - Invalid parameters (500)
+ - `PROVIDER_OVERLOADED` - Service overloaded (503)
+ - `PROVIDER_TIMEOUT` - Gateway timeout (504)
+ - `PROVIDER_ERROR` - Other provider errors
+
+ ## Example: Success Response
+
+ ```json
+ {
+   "ok": true,
+   "request_id": "req_abc123",
+   "provider": {
+     "api_url": "https://example.aistudio-app.com/ocr",
+     "status_code": 200,
+     "log_id": "log_xyz"
+   },
+   "result": {
+     "pages": [
+       {
+         "page_index": 0,
+         "text": "Invoice\nAmount: $123.45",
+         "avg_confidence": 0.95,
+         "items": [
+           {"text": "Invoice", "score": 0.98, "box": [10, 20, 100, 50]},
+           {"text": "Amount: $123.45", "score": 0.92, "box": [10, 60, 200, 90]}
+         ]
+       }
+     ],
+     "full_text": "Invoice\nAmount: $123.45"
+   },
+   "quality": {
+     "quality_score": 0.79,
+     "avg_rec_score": 0.95,
+     "text_items": 2,
+     "warnings": []
+   },
+   "agent_trace": {
+     "mode": "auto",
+     "selected_attempt": 1,
+     "attempts": [
+       {
+         "attempt": 1,
+         "provider_time_ms": 1200,
+         "quality_score": 0.79,
+         "avg_rec_score": 0.95,
+         "text_items": 2,
+         "warnings": [],
+         "options_effective": {
+           "use_doc_orientation_classify": false,
+           "use_doc_unwarping": false,
+           "use_textline_orientation": false
+         }
+       }
+     ]
+   },
+   "raw_provider": null,
+   "error": null
+ }
+ ```
+
+ ## Example: Error Response
+
+ ```json
+ {
+   "ok": false,
+   "request_id": "req_def456",
+   "provider": {
+     "api_url": "https://example.aistudio-app.com/ocr",
+     "status_code": 403,
+     "log_id": null
+   },
+   "result": null,
+   "quality": null,
+   "agent_trace": {
+     "mode": "auto",
+     "selected_attempt": 1,
+     "attempts": [
+       {
+         "attempt": 1,
+         "provider_time_ms": 150,
+         "quality_score": 0.0,
+         "avg_rec_score": 0.0,
+         "text_items": 0,
+         "warnings": ["Provider error: Authentication failed"],
+         "options_effective": {
+           "use_doc_orientation_classify": false,
+           "use_doc_unwarping": false,
+           "use_textline_orientation": false
+         }
+       }
+     ]
+   },
+   "raw_provider": null,
+   "error": {
+     "code": "PROVIDER_AUTH_ERROR",
+     "message": "Authentication failed",
+     "details": {
+       "error_code": 403,
+       "status_code": 403
+     }
+   }
+ }
+ ```
+
+ ## Usage Guide
+
+ ### For Agents/Scripts
+
+ 1. **Check `ok` first**: `if response.ok:`
+ 2. **Extract text**: `response.result.full_text`
+ 3. **Extract structured data**: `response.result.pages[].items[]`
+ 4. **Check quality**: `response.quality.quality_score` (0.72+ is usually good)
+ 5. **Handle errors**: `response.error.code` and `response.error.message`
+
+ ### For Debugging
+
+ 1. **Check trace**: `response.agent_trace.attempts` shows all attempts and their quality
+ 2. **Selected attempt**: `response.agent_trace.selected_attempt` indicates which one was selected
+ 3. **Raw provider**: Use `--return-raw-provider` to see raw API response
+
+ ## Compatibility Notes
+
+ - **Missing scores**: `items[].score` may not exist if provider didn't return scores
+ - **Missing boxes**: `items[].box` may not exist if provider didn't return geometry
+ - **Empty results**: `text_items == 0` means no text detected (not necessarily an error)
+ - **Warnings**: Check `quality.warnings` for non-fatal issues (e.g.: missing fields)
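
The schema in this hunk describes a quality formula and a short consumption checklist ("For Agents/Scripts"). The sketch below illustrates both in Python, assuming the response has already been parsed from JSON into a plain `dict`. The function names `norm`, `quality_score`, `low_confidence_items`, and `handle` are hypothetical helpers introduced here for illustration; they are not taken from `ocr_caller.py`, and `log` is assumed to be the natural logarithm (the ratio is the same for any base).

```python
import math
from typing import Any, Dict, List


def norm(n: int) -> float:
    """norm(n) from the schema: log-scaled item count, capped at 1.0 (reached at n = 50)."""
    return min(1.0, math.log(1 + n) / math.log(1 + 50))


def quality_score(text_items: int, avg_rec_score: float) -> float:
    """Quality Score Formula as written in the schema document."""
    if text_items == 0:
        return 0.0
    return 0.6 * norm(text_items) + 0.4 * avg_rec_score


def low_confidence_items(response: Dict[str, Any], threshold: float = 0.9) -> List[str]:
    """Hypothetical helper: texts of items whose score is present and below threshold."""
    if not response.get("ok"):
        return []
    flagged = []
    for page in response["result"]["pages"]:
        for item in page["items"]:
            # `score` is optional per the Compatibility Notes, so use .get()
            score = item.get("score")
            if score is not None and score < threshold:
                flagged.append(item["text"])
    return flagged


def handle(response: Dict[str, Any], min_quality: float = 0.72) -> str:
    """Hypothetical helper following the 'For Agents/Scripts' steps in order."""
    if not response["ok"]:
        err = response["error"]
        return f"{err['code']}: {err['message']}"
    text = response["result"]["full_text"]
    if response["quality"]["quality_score"] < min_quality:
        return f"[low quality] {text}"
    return text
```

Applying the formula as written to the success example (2 items, average score 0.95) gives roughly 0.55 rather than the 0.79 shown, so the example values are probably best read as illustrative rather than computed.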