paddleocr-skills 1.0.0 → 1.1.0

Files changed (29)
  1. package/README.md +220 -220
  2. package/bin/paddleocr-skills.js +33 -20
  3. package/lib/copy.js +39 -39
  4. package/lib/installer.js +76 -70
  5. package/lib/prompts.js +67 -67
  6. package/lib/python.js +75 -75
  7. package/lib/verify.js +121 -121
  8. package/package.json +42 -42
  9. package/templates/.env.example +12 -12
  10. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/layout_schema.md +64 -64
  11. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/output_format.md +154 -154
  12. package/templates/{paddleocr-vl/references/paddleocr-vl → paddleocr-vl-1.5/references/paddleocr-vl-1.5}/vl_model_spec.md +157 -157
  13. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/_lib.py +780 -780
  14. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/configure.py +270 -270
  15. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/optimize_file.py +226 -226
  16. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements-optimize.txt +8 -8
  17. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/requirements.txt +7 -7
  18. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/smoke_test.py +199 -199
  19. package/templates/{paddleocr-vl/scripts/paddleocr-vl → paddleocr-vl-1.5/scripts/paddleocr-vl-1.5}/vl_caller.py +232 -232
  20. package/templates/{paddleocr-vl/skills/paddleocr-vl → paddleocr-vl-1.5/skills/paddleocr-vl-1.5}/SKILL.md +481 -481
  21. package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -258
  22. package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -257
  23. package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -140
  24. package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -635
  25. package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -346
  26. package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -684
  27. package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -4
  28. package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -139
  29. package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -272
@@ -1,258 +1,258 @@
1
- # Agent Policy: Auto Mode Strategy
2
-
3
- This document defines the execution strategies for the three OCR modes, with emphasis on the auto mode's adaptive retry logic.
4
-
5
- ## Mode Overview
6
-
7
- | Mode | Strategy | Use Case |
8
- |------|----------|----------|
9
- | `fast` | Single call, all corrections off | Lowest latency, suitable for high-quality scans |
10
- | `quality` | Single call, all corrections on | Maximum quality, slower, suitable for complex documents |
11
- | `auto` | Adaptive multiple attempts, quality scoring | Balanced performance, production default |
12
-
13
- ## Mode: fast
14
-
15
- **Single attempt**, minimal preprocessing.
16
-
17
- **Options:**
18
- ```json
19
- {
20
- "use_doc_orientation_classify": false,
21
- "use_doc_unwarping": false,
22
- "use_textline_orientation": false
23
- }
24
- ```
25
-
26
- **Suitable scenarios:**
27
- - Input is known high-quality images (scanned documents, screenshots)
28
- - Latency is critical
29
- - Text is already correctly oriented
30
-
31
- ## Mode: quality
32
-
33
- **Single attempt**, all corrections enabled.
34
-
35
- **Options:**
36
- ```json
37
- {
38
- "use_doc_orientation_classify": true,
39
- "use_doc_unwarping": true,
40
- "use_textline_orientation": false
41
- }
42
- ```
43
-
44
- **Suitable scenarios:**
45
- - Input quality is unknown or poor (photos, rotated PDFs)
46
- - Maximum accuracy is required
47
- - Latency is acceptable (2-3x slower than fast)
48
-
49
- ## Mode: auto (default)
50
-
51
- **Adaptive strategy**: Start fast, escalate progressively if quality is insufficient.
52
-
53
- ### Attempt Sequence
54
-
55
- #### Attempt 1: Fast Path
56
- ```json
57
- {
58
- "use_doc_orientation_classify": false,
59
- "use_doc_unwarping": false,
60
- "use_textline_orientation": false
61
- }
62
- ```
63
-
64
- **If `quality_score >= quality_target`**: Stop and return result.
65
-
66
- **Otherwise**: Continue to Attempt 2.
67
-
68
- ---
69
-
70
- #### Attempt 2: Orientation Correction
71
- ```json
72
- {
73
- "use_doc_orientation_classify": true,
74
- "use_doc_unwarping": false,
75
- "use_textline_orientation": false
76
- }
77
- ```
78
-
79
- Enable page-level orientation detection (0°/90°/180°/270°).
80
-
81
- **If `quality_score >= quality_target`**: Stop and return result.
82
-
83
- **Otherwise**: Continue to Attempt 3.
84
-
85
- ---
86
-
87
- #### Attempt 3: Unwarping Correction
88
- ```json
89
- {
90
- "use_doc_orientation_classify": true,
91
- "use_doc_unwarping": true,
92
- "use_textline_orientation": false
93
- }
94
- ```
95
-
96
- Add perspective correction and skew correction.
97
-
98
- **If `quality_score >= quality_target`**: Stop and return result.
99
-
100
- **Otherwise**: Select the best attempt so far.
101
-
102
- ---
103
-
104
- #### Optional Attempt 4: Textline Orientation
105
-
106
- *(Reserved for future use)*
107
-
108
- ```json
109
- {
110
- "use_doc_orientation_classify": true,
111
- "use_doc_unwarping": true,
112
- "use_textline_orientation": true
113
- }
114
- ```
115
-
116
- Add line-by-line angle correction (rarely needed).
117
-
118
- ### Stop Conditions
119
-
120
- Auto mode stops when **any** of the following conditions are met:
121
-
122
- 1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
123
- 2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
124
- 3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)
125
-
126
- ### Selection Strategy
127
-
128
- If multiple attempts are completed, **select the attempt with the highest `quality_score`**.
129
-
130
- ## Quality Scoring
131
-
132
- Quality score balances text quantity and recognition confidence.
133
-
134
- ### Formula
135
-
136
- ```
137
- quality_score = 0 if text_items == 0
138
- = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
139
-
140
- norm(n) = min(1, log(1+n) / log(1+50))
141
- ```
142
-
143
- Where:
144
- - `text_items`: Number of recognized text blocks
145
- - `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0)
146
- - If `rec_scores` is missing, default to `0.5`
147
-
148
- ### Interpretation
149
-
150
- | Quality Score | Interpretation |
151
- |---------------|----------------|
152
- | 0.90 - 1.00 | Excellent (high confidence, many items) |
153
- | 0.72 - 0.89 | Good (default target) |
154
- | 0.50 - 0.71 | Fair (may need retry) |
155
- | 0.00 - 0.49 | Poor (may be low-quality input or blank) |
156
-
157
- ### Default Target
158
-
159
- `quality_target = 0.72`
160
-
161
- This balances cost (API calls) and quality. Adjust with `--quality-target` if needed.
162
-
163
- ## Budget Management
164
-
165
- Auto mode respects a **total time budget** (default 25000ms).
166
-
167
- - Before each attempt, check: `elapsed_ms < budget_ms`
168
- - If budget exceeded, stop and return best attempt so far
169
- - Provider timeout is independent (`PADDLE_OCR_TIMEOUT_MS`, default 25000ms)
170
-
171
- **Example:**
172
- - Attempt 1: 1200ms, quality 0.61 → Continue
173
- - Attempt 2: 1800ms, quality 0.79 → Stop (target reached)
174
- - Total: 3000ms
175
-
176
- ## Error Handling
177
-
178
- ### Provider Errors
179
-
180
- If an attempt fails with a provider error (authentication, quota, etc.):
181
- - **Stop immediately** (do not retry with different options)
182
- - Return error response with failed attempt shown in trace
183
-
184
- ### Temporary Errors (503/504)
185
-
186
- - Retry **within the same attempt** (up to 2 retries with backoff)
187
- - If all retries fail, treat as error and stop
188
-
189
- ## Examples
190
-
191
- ### Example 1: Fast Path Success
192
-
193
- **Input:** High-quality scanned invoice
194
-
195
- **Execution:**
196
- 1. Attempt 1 (fast): quality_score = 0.85 → **Stop**
197
-
198
- **Result:** Select Attempt 1, total time 1.2s
199
-
200
- ---
201
-
202
- ### Example 2: Retry with Improvement
203
-
204
- **Input:** Photo of rotated document
205
-
206
- **Execution:**
207
- 1. Attempt 1 (fast): quality_score = 0.48 → Continue
208
- 2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**
209
-
210
- **Result:** Select Attempt 2, total time 3.1s
211
-
212
- ---
213
-
214
- ### Example 3: All Attempts Needed
215
-
216
- **Input:** Photo of warped document
217
-
218
- **Execution:**
219
- 1. Attempt 1 (fast): quality_score = 0.35 → Continue
220
- 2. Attempt 2 (orientation): quality_score = 0.58 → Continue
221
- 3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**
222
-
223
- **Result:** Select Attempt 3, total time 5.2s
224
-
225
- ---
226
-
227
- ### Example 4: No Improvement
228
-
229
- **Input:** Blank or corrupted image
230
-
231
- **Execution:**
232
- 1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
233
- 2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
234
- 3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop**
235
-
236
- **Result:** Select Attempt 1 (all are 0.0), return with warning
237
-
238
- ## Tuning Parameters
239
-
240
- Users can override defaults via CLI:
241
-
242
- ```bash
243
- --mode auto \
244
- --max-attempts 2 \ # Reduce to 2 for faster execution (but less robustness)
245
- --budget-ms 15000 \ # Stricter budget
246
- --quality-target 0.80 # Higher quality standard
247
- ```
248
-
249
- ## Testing Auto Mode
250
-
251
- For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:
252
-
253
- 1. Quality score calculation
254
- 2. Attempt selection (highest score wins)
255
- 3. Early stopping (when target is reached)
256
- 4. Budget enforcement
257
-
258
- See `scripts/tests/test_agent_policy.py`.
1
+ # Agent Policy: Auto Mode Strategy
2
+
3
+ This document defines the execution strategies for the three OCR modes, with emphasis on the auto mode's adaptive retry logic.
4
+
5
+ ## Mode Overview
6
+
7
+ | Mode | Strategy | Use Case |
8
+ |------|----------|----------|
9
+ | `fast` | Single call, all corrections off | Lowest latency, suitable for high-quality scans |
10
+ | `quality` | Single call, all corrections on | Maximum quality, slower, suitable for complex documents |
11
+ | `auto` | Adaptive multiple attempts, quality scoring | Balanced performance, production default |
12
+
13
+ ## Mode: fast
14
+
15
+ **Single attempt**, minimal preprocessing.
16
+
17
+ **Options:**
18
+ ```json
19
+ {
20
+ "use_doc_orientation_classify": false,
21
+ "use_doc_unwarping": false,
22
+ "use_textline_orientation": false
23
+ }
24
+ ```
25
+
26
+ **Suitable scenarios:**
27
+ - Input is known high-quality images (scanned documents, screenshots)
28
+ - Latency is critical
29
+ - Text is already correctly oriented
30
+
31
+ ## Mode: quality
32
+
33
+ **Single attempt**, all corrections enabled.
34
+
35
+ **Options:**
36
+ ```json
37
+ {
38
+ "use_doc_orientation_classify": true,
39
+ "use_doc_unwarping": true,
40
+ "use_textline_orientation": false
41
+ }
42
+ ```
43
+
44
+ **Suitable scenarios:**
45
+ - Input quality is unknown or poor (photos, rotated PDFs)
46
+ - Maximum accuracy is required
47
+ - Latency is acceptable (2-3x slower than fast)
48
+
49
+ ## Mode: auto (default)
50
+
51
+ **Adaptive strategy**: Start fast, escalate progressively if quality is insufficient.
52
+
53
+ ### Attempt Sequence
54
+
55
+ #### Attempt 1: Fast Path
56
+ ```json
57
+ {
58
+ "use_doc_orientation_classify": false,
59
+ "use_doc_unwarping": false,
60
+ "use_textline_orientation": false
61
+ }
62
+ ```
63
+
64
+ **If `quality_score >= quality_target`**: Stop and return result.
65
+
66
+ **Otherwise**: Continue to Attempt 2.
67
+
68
+ ---
69
+
70
+ #### Attempt 2: Orientation Correction
71
+ ```json
72
+ {
73
+ "use_doc_orientation_classify": true,
74
+ "use_doc_unwarping": false,
75
+ "use_textline_orientation": false
76
+ }
77
+ ```
78
+
79
+ Enable page-level orientation detection (0°/90°/180°/270°).
80
+
81
+ **If `quality_score >= quality_target`**: Stop and return result.
82
+
83
+ **Otherwise**: Continue to Attempt 3.
84
+
85
+ ---
86
+
87
+ #### Attempt 3: Unwarping Correction
88
+ ```json
89
+ {
90
+ "use_doc_orientation_classify": true,
91
+ "use_doc_unwarping": true,
92
+ "use_textline_orientation": false
93
+ }
94
+ ```
95
+
96
+ Add perspective correction and skew correction.
97
+
98
+ **If `quality_score >= quality_target`**: Stop and return result.
99
+
100
+ **Otherwise**: Select the best attempt so far.
101
+
102
+ ---
103
+
104
+ #### Optional Attempt 4: Textline Orientation
105
+
106
+ *(Reserved for future use)*
107
+
108
+ ```json
109
+ {
110
+ "use_doc_orientation_classify": true,
111
+ "use_doc_unwarping": true,
112
+ "use_textline_orientation": true
113
+ }
114
+ ```
115
+
116
+ Add line-by-line angle correction (rarely needed).
117
+
118
+ ### Stop Conditions
119
+
120
+ Auto mode stops when **any** of the following conditions are met:
121
+
122
+ 1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
123
+ 2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
124
+ 3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)
125
+
126
+ ### Selection Strategy
127
+
128
+ If multiple attempts are completed, **select the attempt with the highest `quality_score`**.
129
+
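The attempt sequence, stop conditions, and selection rule described above can be sketched as a single loop. This is a minimal illustration, not the package's implementation: `run_auto`, `call_provider`, and the injectable `clock` are hypothetical names, and the provider call is assumed to return a `quality_score` directly.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Option sets for attempts 1-3, copied from the attempt sequence in the policy.
ATTEMPT_OPTIONS: List[Dict[str, bool]] = [
    {"use_doc_orientation_classify": False, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": True,
     "use_textline_orientation": False},
]


def run_auto(
    call_provider: Callable[[Dict[str, bool]], float],  # hypothetical: returns quality_score
    quality_target: float = 0.72,
    max_attempts: int = 3,
    budget_ms: float = 25000,
    clock: Callable[[], float] = time.monotonic,  # injectable so tests can fake time
) -> Optional[Tuple[int, float]]:
    """Return (attempt_number, quality_score) of the best attempt, or None if none ran."""
    start = clock()
    best: Optional[Tuple[int, float]] = None
    for number, options in enumerate(ATTEMPT_OPTIONS[:max_attempts], start=1):
        # Budget check happens before each attempt.
        if (clock() - start) * 1000 >= budget_ms:
            break
        score = call_provider(options)
        # Strict '>' keeps the earliest attempt on ties (assumption, matches Example 4).
        if best is None or score > best[1]:
            best = (number, score)
        # Stop early once the quality target is reached.
        if score >= quality_target:
            break
    return best
```

Keeping the earliest attempt on ties is a design choice made here so that an all-zero run selects Attempt 1, as in Example 4 of the policy.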
130
+ ## Quality Scoring
131
+
132
+ Quality score balances text quantity and recognition confidence.
133
+
134
+ ### Formula
135
+
136
+ ```
137
+ quality_score = 0 if text_items == 0
138
+ = 0.6 * norm(text_items) + 0.4 * avg_rec_score otherwise
139
+
140
+ norm(n) = min(1, log(1+n) / log(1+50))
141
+ ```
142
+
143
+ Where:
144
+ - `text_items`: Number of recognized text blocks
145
+ - `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0)
146
+ - If `rec_scores` is missing, default to `0.5`
147
+
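As a rough illustration, the formula above translates to a few lines of Python. The function names and signature are assumptions made for this sketch, as is treating an empty `rec_scores` list the same as a missing one.

```python
import math
from typing import Optional, Sequence


def norm(n: int, saturation: int = 50) -> float:
    """Log-scaled item count, capped at 1.0 once n reaches `saturation`."""
    return min(1.0, math.log(1 + n) / math.log(1 + saturation))


def quality_score(text_items: int,
                  rec_scores: Optional[Sequence[float]] = None) -> float:
    """0 for empty results, else 0.6 * norm(text_items) + 0.4 * avg_rec_score."""
    if text_items == 0:
        return 0.0
    # Missing rec_scores default to 0.5 (empty is treated as missing: assumption).
    avg_rec_score = sum(rec_scores) / len(rec_scores) if rec_scores else 0.5
    return 0.6 * norm(text_items) + 0.4 * avg_rec_score
```

For instance, 10 text items with an average confidence of 0.9 score roughly 0.73, just above the default target of 0.72, while 50 or more items saturate the quantity term at 1.0.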
148
+ ### Interpretation
149
+
150
+ | Quality Score | Interpretation |
151
+ |---------------|----------------|
152
+ | 0.90 - 1.00 | Excellent (high confidence, many items) |
153
+ | 0.72 - 0.89 | Good (default target) |
154
+ | 0.50 - 0.71 | Fair (may need retry) |
155
+ | 0.00 - 0.49 | Poor (may be low-quality input or blank) |
156
+
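The interpretation bands can be expressed as a small lookup. This sketch assumes each band is defined by its lower bound, so values between the printed ranges (for example 0.895) fall into the band below; `interpret` is a hypothetical helper name.

```python
def interpret(score: float) -> str:
    """Map a quality score to the band label from the interpretation table."""
    if score >= 0.90:
        return "Excellent"
    if score >= 0.72:
        return "Good"
    if score >= 0.50:
        return "Fair"
    return "Poor"
```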
157
+ ### Default Target
158
+
159
+ `quality_target = 0.72`
160
+
161
+ This balances cost (API calls) and quality. Adjust with `--quality-target` if needed.
162
+
163
+ ## Budget Management
164
+
165
+ Auto mode respects a **total time budget** (default 25000ms).
166
+
167
+ - Before each attempt, check: `elapsed_ms < budget_ms`
168
+ - If budget exceeded, stop and return best attempt so far
169
+ - Provider timeout is independent (`PADDLE_OCR_TIMEOUT_MS`, default 25000ms)
170
+
171
+ **Example:**
172
+ - Attempt 1: 1200ms, quality 0.61 → Continue
173
+ - Attempt 2: 1800ms, quality 0.79 → Stop (target reached)
174
+ - Total: 3000ms
175
+
176
+ ## Error Handling
177
+
178
+ ### Provider Errors
179
+
180
+ If an attempt fails with a provider error (authentication, quota, etc.):
181
+ - **Stop immediately** (do not retry with different options)
182
+ - Return error response with failed attempt shown in trace
183
+
184
+ ### Temporary Errors (503/504)
185
+
186
+ - Retry **within the same attempt** (up to 2 retries with backoff)
187
+ - If all retries fail, treat as error and stop
188
+
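The two error classes above could be handled along the following lines. This is a sketch under stated assumptions: `ProviderError`, `call_with_retries`, the `send` callable returning a `(status, body)` pair, and the exponential backoff delays are all hypothetical; the policy only specifies up to 2 retries with backoff for 503/504.

```python
import time
from typing import Callable, Tuple

TEMPORARY_STATUSES = {503, 504}


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of the failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_retries(
    send: Callable[[], Tuple[int, str]],  # hypothetical: returns (status, body)
    max_retries: int = 2,
    base_delay_s: float = 0.5,            # assumed backoff base, not from the policy
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry temporary errors within one attempt; raise on anything else."""
    for retry in range(max_retries + 1):
        status, body = send()
        if status == 200:
            return body
        if status in TEMPORARY_STATUSES and retry < max_retries:
            # Exponential backoff: base, 2 * base, ...
            sleep(base_delay_s * (2 ** retry))
            continue
        # Non-temporary error, or retries exhausted: stop immediately.
        raise ProviderError(status)
    raise RuntimeError("max_retries must be non-negative")
```

A caller running auto mode would let `ProviderError` propagate and end the run instead of moving on to the next option set.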
189
+ ## Examples
190
+
191
+ ### Example 1: Fast Path Success
192
+
193
+ **Input:** High-quality scanned invoice
194
+
195
+ **Execution:**
196
+ 1. Attempt 1 (fast): quality_score = 0.85 → **Stop**
197
+
198
+ **Result:** Select Attempt 1, total time 1.2s
199
+
200
+ ---
201
+
202
+ ### Example 2: Retry with Improvement
203
+
204
+ **Input:** Photo of rotated document
205
+
206
+ **Execution:**
207
+ 1. Attempt 1 (fast): quality_score = 0.48 → Continue
208
+ 2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**
209
+
210
+ **Result:** Select Attempt 2, total time 3.1s
211
+
212
+ ---
213
+
214
+ ### Example 3: All Attempts Needed
215
+
216
+ **Input:** Photo of warped document
217
+
218
+ **Execution:**
219
+ 1. Attempt 1 (fast): quality_score = 0.35 → Continue
220
+ 2. Attempt 2 (orientation): quality_score = 0.58 → Continue
221
+ 3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**
222
+
223
+ **Result:** Select Attempt 3, total time 5.2s
224
+
225
+ ---
226
+
227
+ ### Example 4: No Improvement
228
+
229
+ **Input:** Blank or corrupted image
230
+
231
+ **Execution:**
232
+ 1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
233
+ 2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
234
+ 3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop**
235
+
236
+ **Result:** Select Attempt 1 (all are 0.0), return with warning
237
+
238
+ ## Tuning Parameters
239
+
240
+ Users can override defaults via CLI:
241
+
242
+ ```bash
243
+ --mode auto \
244
+ --max-attempts 2 \ # Reduce to 2 for faster execution (but less robustness)
245
+ --budget-ms 15000 \ # Stricter budget
246
+ --quality-target 0.80 # Higher quality standard
247
+ ```
248
+
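One way these flags might be parsed is sketched below with `argparse`. The flag names and defaults come from the policy text; `build_parser` and the rest of the wiring are assumptions, not the package's actual CLI code. Note that in a real shell a comment placed after a trailing backslash ends the line continuation, so the annotated flags in the snippet above are illustrative and need the comments removed before being pasted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the auto-mode tuning flags."""
    parser = argparse.ArgumentParser(description="Sketch of the auto-mode tuning flags")
    parser.add_argument("--mode", choices=["fast", "quality", "auto"], default="auto")
    parser.add_argument("--max-attempts", type=int, default=3)
    parser.add_argument("--budget-ms", type=int, default=25000)
    parser.add_argument("--quality-target", type=float, default=0.72)
    return parser


# Same overrides as the shell snippet, passed as an explicit argument list.
args = build_parser().parse_args(
    ["--mode", "auto", "--max-attempts", "2",
     "--budget-ms", "15000", "--quality-target", "0.80"]
)
```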
249
+ ## Testing Auto Mode
250
+
251
+ For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:
252
+
253
+ 1. Quality score calculation
254
+ 2. Attempt selection (highest score wins)
255
+ 3. Early stopping (when target is reached)
256
+ 4. Budget enforcement
257
+
258
+ See `scripts/tests/test_agent_policy.py`.