paddleocr-skills 1.0.0 → 1.1.0

Files changed (29)

package/templates/ppocrv5/references/ppocrv5/agent_policy.md CHANGED

@@ -1,258 +1,258 @@
-# Agent Policy: Auto Mode Strategy
-This document defines the execution strategies for the three OCR modes, with emphasis on the auto mode's adaptive retry logic.
-## Mode Overview
-| Mode | Strategy | Use Case |
-|------|----------|----------|
-| `fast` | Single call, all corrections off | Lowest latency, suitable for high-quality scans |
-| `quality` | Single call, all corrections on | Maximum quality, slower, suitable for complex documents |
-| `auto` | Adaptive multiple attempts, quality scoring | Balanced performance, production default |
-## Mode: fast
-**Single attempt**, minimal preprocessing.
-**Options:**
-```json
-{
-  "use_doc_orientation_classify": false,
-  "use_doc_unwarping": false,
-  "use_textline_orientation": false
-}
-```
-**Suitable scenarios:**
-- Input is known high-quality images (scanned documents, screenshots)
-- Latency is critical
-- Text is already correctly oriented
-## Mode: quality
-**Single attempt**, all corrections enabled.
-**Options:**
-```json
-{
-  "use_doc_orientation_classify": true,
-  "use_doc_unwarping": true,
-  "use_textline_orientation": false
-}
-```
-**Suitable scenarios:**
-- Input quality is unknown or poor (photos, rotated PDFs)
-- Maximum accuracy is required
-- Latency is acceptable (2-3x slower than fast)
-## Mode: auto (default)
-**Adaptive strategy**: Start fast, escalate progressively if quality is insufficient.
-### Attempt Sequence
-#### Attempt 1: Fast Path
-```json
-{
-  "use_doc_orientation_classify": false,
-  "use_doc_unwarping": false,
-  "use_textline_orientation": false
-}
-```
-**If `quality_score >= quality_target`**: Stop and return result.
-**Otherwise**: Continue to Attempt 2.
----
-#### Attempt 2: Orientation Correction
-```json
-{
-  "use_doc_orientation_classify": true,
-  "use_doc_unwarping": false,
-  "use_textline_orientation": false
-}
-```
-Enable page-level orientation detection (0°/90°/180°/270°).
-**If `quality_score >= quality_target`**: Stop and return result.
-**Otherwise**: Continue to Attempt 3.
----
-#### Attempt 3: Unwarping Correction
-```json
-{
-  "use_doc_orientation_classify": true,
-  "use_doc_unwarping": true,
-  "use_textline_orientation": false
-}
-```
-Add perspective correction and skew correction.
-**If `quality_score >= quality_target`**: Stop and return result.
-**Otherwise**: Select the best attempt so far.
----
-#### Optional Attempt 4: Textline Orientation
-*(Reserved for future use)*
-```json
-{
-  "use_doc_orientation_classify": true,
-  "use_doc_unwarping": true,
-  "use_textline_orientation": true
-}
-```
-Add line-by-line angle correction (rarely needed).
-### Stop Conditions
-Auto mode stops when **any** of the following conditions are met:
-1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
-2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
-3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)
-### Selection Strategy
-If multiple attempts are completed, **select the attempt with the highest `quality_score`**.
-## Quality Scoring
-Quality score balances text quantity and recognition confidence.
-### Formula
-```
-quality_score = 0                                        if text_items == 0
-              = 0.6 * norm(text_items) + 0.4 * avg_rec_score  otherwise
-norm(n) = min(1, log(1+n) / log(1+50))
-```
-Where:
-- `text_items`: Number of recognized text blocks
-- `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0)
-- If `rec_scores` is missing, default to `0.5`
-### Interpretation
-| Quality Score | Interpretation |
-|---------------|----------------|
-| 0.90 - 1.00 | Excellent (high confidence, many items) |
-| 0.72 - 0.89 | Good (default target) |
-| 0.50 - 0.71 | Fair (may need retry) |
-| 0.00 - 0.49 | Poor (may be low-quality input or blank) |
-### Default Target
-`quality_target = 0.72`
-This balances cost (API calls) and quality. Adjust with `--quality-target` if needed.
-## Budget Management
-Auto mode respects a **total time budget** (default 25000ms).
-- Before each attempt, check: `elapsed_ms < budget_ms`
-- If budget exceeded, stop and return best attempt so far
-- Provider timeout is independent (`PADDLE_OCR_TIMEOUT_MS`, default 25000ms)
-**Example:**
-- Attempt 1: 1200ms, quality 0.61 → Continue
-- Attempt 2: 1800ms, quality 0.79 → Stop (target reached)
-- Total: 3000ms
-## Error Handling
-### Provider Errors
-If an attempt fails with a provider error (authentication, quota, etc.):
-- **Stop immediately** (do not retry with different options)
-- Return error response with failed attempt shown in trace
-### Temporary Errors (503/504)
-- Retry **within the same attempt** (up to 2 retries with backoff)
-- If all retries fail, treat as error and stop
-## Examples
-### Example 1: Fast Path Success
-**Input:** High-quality scanned invoice
-**Execution:**
-1. Attempt 1 (fast): quality_score = 0.85 → **Stop**
-**Result:** Select Attempt 1, total time 1.2s
----
-### Example 2: Retry with Improvement
-**Input:** Photo of rotated document
-**Execution:**
-1. Attempt 1 (fast): quality_score = 0.48 → Continue
-2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**
-**Result:** Select Attempt 2, total time 3.1s
----
-### Example 3: All Attempts Needed
-**Input:** Photo of warped document
-**Execution:**
-1. Attempt 1 (fast): quality_score = 0.35 → Continue
-2. Attempt 2 (orientation): quality_score = 0.58 → Continue
-3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**
-**Result:** Select Attempt 3, total time 5.2s
----
-### Example 4: No Improvement
-**Input:** Blank or corrupted image
-**Execution:**
-1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
-2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
-3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop**
-**Result:** Select Attempt 1 (all are 0.0), return with warning
-## Tuning Parameters
-Users can override defaults via CLI:
-```bash
---mode auto \
---max-attempts 2 \             # Reduce to 2 for faster execution (but less robustness)
---budget-ms 15000 \            # Stricter budget
---quality-target 0.80          # Higher quality standard
-```
-## Testing Auto Mode
-For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:
-1. Quality score calculation
-2. Attempt selection (highest score wins)
-3. Early stopping (when target is reached)
-4. Budget enforcement
-See `scripts/tests/test_agent_policy.py`.
+# Agent Policy: Auto Mode Strategy
+This document defines the execution strategies for the three OCR modes, with emphasis on the auto mode's adaptive retry logic.
+## Mode Overview
+| Mode | Strategy | Use Case |
+|------|----------|----------|
+| `fast` | Single call, all corrections off | Lowest latency, suitable for high-quality scans |
+| `quality` | Single call, all corrections on | Maximum quality, slower, suitable for complex documents |
+| `auto` | Adaptive multiple attempts, quality scoring | Balanced performance, production default |
+## Mode: fast
+**Single attempt**, minimal preprocessing.
+**Options:**
+```json
+{
+  "use_doc_orientation_classify": false,
+  "use_doc_unwarping": false,
+  "use_textline_orientation": false
+}
+```
+**Suitable scenarios:**
+- Input is known high-quality images (scanned documents, screenshots)
+- Latency is critical
+- Text is already correctly oriented
+## Mode: quality
+**Single attempt**, all corrections enabled.
+**Options:**
+```json
+{
+  "use_doc_orientation_classify": true,
+  "use_doc_unwarping": true,
+  "use_textline_orientation": false
+}
+```
+**Suitable scenarios:**
+- Input quality is unknown or poor (photos, rotated PDFs)
+- Maximum accuracy is required
+- Latency is acceptable (2-3x slower than fast)
+## Mode: auto (default)
+**Adaptive strategy**: Start fast, escalate progressively if quality is insufficient.
+### Attempt Sequence
+#### Attempt 1: Fast Path
+```json
+{
+  "use_doc_orientation_classify": false,
+  "use_doc_unwarping": false,
+  "use_textline_orientation": false
+}
+```
+**If `quality_score >= quality_target`**: Stop and return result.
+**Otherwise**: Continue to Attempt 2.
+---
+#### Attempt 2: Orientation Correction
+```json
+{
+  "use_doc_orientation_classify": true,
+  "use_doc_unwarping": false,
+  "use_textline_orientation": false
+}
+```
+Enable page-level orientation detection (0°/90°/180°/270°).
+**If `quality_score >= quality_target`**: Stop and return result.
+**Otherwise**: Continue to Attempt 3.
+---
+#### Attempt 3: Unwarping Correction
+```json
+{
+  "use_doc_orientation_classify": true,
+  "use_doc_unwarping": true,
+  "use_textline_orientation": false
+}
+```
+Add perspective correction and skew correction.
+**If `quality_score >= quality_target`**: Stop and return result.
+**Otherwise**: Select the best attempt so far.
+---
+#### Optional Attempt 4: Textline Orientation
+*(Reserved for future use)*
+```json
+{
+  "use_doc_orientation_classify": true,
+  "use_doc_unwarping": true,
+  "use_textline_orientation": true
+}
+```
+Add line-by-line angle correction (rarely needed).
+### Stop Conditions
+Auto mode stops when **any** of the following conditions are met:
+1. **Quality target reached**: `quality_score >= quality_target` (default 0.72)
+2. **Max attempts reached**: `attempt_count >= max_attempts` (default 3)
+3. **Budget exceeded**: `total_elapsed_ms >= budget_ms` (default 25000)
+### Selection Strategy
+If multiple attempts are completed, **select the attempt with the highest `quality_score`**.
+## Quality Scoring
+Quality score balances text quantity and recognition confidence.
+### Formula
+```
+quality_score = 0                                        if text_items == 0
+              = 0.6 * norm(text_items) + 0.4 * avg_rec_score  otherwise
+norm(n) = min(1, log(1+n) / log(1+50))
+```
+Where:
+- `text_items`: Number of recognized text blocks
+- `avg_rec_score`: Average of all recognition confidence scores (0.0-1.0)
+- If `rec_scores` is missing, default to `0.5`
+### Interpretation
+| Quality Score | Interpretation |
+|---------------|----------------|
+| 0.90 - 1.00 | Excellent (high confidence, many items) |
+| 0.72 - 0.89 | Good (default target) |
+| 0.50 - 0.71 | Fair (may need retry) |
+| 0.00 - 0.49 | Poor (may be low-quality input or blank) |
+### Default Target
+`quality_target = 0.72`
+This balances cost (API calls) and quality. Adjust with `--quality-target` if needed.
+## Budget Management
+Auto mode respects a **total time budget** (default 25000ms).
+- Before each attempt, check: `elapsed_ms < budget_ms`
+- If budget exceeded, stop and return best attempt so far
+- Provider timeout is independent (`PADDLE_OCR_TIMEOUT_MS`, default 25000ms)
+**Example:**
+- Attempt 1: 1200ms, quality 0.61 → Continue
+- Attempt 2: 1800ms, quality 0.79 → Stop (target reached)
+- Total: 3000ms
+## Error Handling
+### Provider Errors
+If an attempt fails with a provider error (authentication, quota, etc.):
+- **Stop immediately** (do not retry with different options)
+- Return error response with failed attempt shown in trace
+### Temporary Errors (503/504)
+- Retry **within the same attempt** (up to 2 retries with backoff)
+- If all retries fail, treat as error and stop
+## Examples
+### Example 1: Fast Path Success
+**Input:** High-quality scanned invoice
+**Execution:**
+1. Attempt 1 (fast): quality_score = 0.85 → **Stop**
+**Result:** Select Attempt 1, total time 1.2s
+---
+### Example 2: Retry with Improvement
+**Input:** Photo of rotated document
+**Execution:**
+1. Attempt 1 (fast): quality_score = 0.48 → Continue
+2. Attempt 2 (orientation): quality_score = 0.81 → **Stop**
+**Result:** Select Attempt 2, total time 3.1s
+---
+### Example 3: All Attempts Needed
+**Input:** Photo of warped document
+**Execution:**
+1. Attempt 1 (fast): quality_score = 0.35 → Continue
+2. Attempt 2 (orientation): quality_score = 0.58 → Continue
+3. Attempt 3 (unwarping): quality_score = 0.76 → **Stop**
+**Result:** Select Attempt 3, total time 5.2s
+---
+### Example 4: No Improvement
+**Input:** Blank or corrupted image
+**Execution:**
+1. Attempt 1 (fast): quality_score = 0.0, text_items = 0 → Continue
+2. Attempt 2 (orientation): quality_score = 0.0, text_items = 0 → Continue
+3. Attempt 3 (unwarping): quality_score = 0.0, text_items = 0 → **Stop**
+**Result:** Select Attempt 1 (all are 0.0), return with warning
+## Tuning Parameters
+Users can override defaults via CLI:
+```bash
+--mode auto \
+--max-attempts 2 \             # Reduce to 2 for faster execution (but less robustness)
+--budget-ms 15000 \            # Stricter budget
+--quality-target 0.80          # Higher quality standard
+```
+## Testing Auto Mode
+For unit tests, simulate provider responses with different `rec_texts` and `rec_scores` to verify:
+1. Quality score calculation
+2. Attempt selection (highest score wins)
+3. Early stopping (when target is reached)
+4. Budget enforcement
+See `scripts/tests/test_agent_policy.py`.
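
The quality formula, stop conditions, and selection strategy described in `agent_policy.md` above can be sketched in Python as follows. This is a minimal illustration, not the package's actual implementation: the names `norm`, `quality_score`, `run_auto`, `ATTEMPT_OPTIONS`, and the `run_ocr` callback (assumed to return a `(rec_texts, rec_scores)` pair) are hypothetical, and the count of `rec_texts` is assumed to stand in for `text_items`. Only the option keys, the formula, and the defaults (0.72, 3, 25000) come from the document.

```python
import math
import time
from typing import Callable, Optional, Sequence

# Option sets for attempts 1-3, copied from the "Attempt Sequence" section.
ATTEMPT_OPTIONS = [
    {"use_doc_orientation_classify": False, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": False,
     "use_textline_orientation": False},
    {"use_doc_orientation_classify": True, "use_doc_unwarping": True,
     "use_textline_orientation": False},
]


def norm(n: int) -> float:
    """norm(n) = min(1, log(1+n) / log(1+50))."""
    return min(1.0, math.log(1 + n) / math.log(1 + 50))


def quality_score(text_items: int,
                  rec_scores: Optional[Sequence[float]] = None) -> float:
    """0 for empty results, else 0.6 * norm(text_items) + 0.4 * avg_rec_score."""
    if text_items == 0:
        return 0.0
    # Missing scores fall back to 0.5, as the policy states.
    avg = sum(rec_scores) / len(rec_scores) if rec_scores else 0.5
    return 0.6 * norm(text_items) + 0.4 * avg


def run_auto(run_ocr: Callable[[dict], tuple],
             quality_target: float = 0.72,
             max_attempts: int = 3,
             budget_ms: float = 25000,
             clock: Callable[[], float] = time.monotonic):
    """Escalate through the option sets until a stop condition is met.

    Returns (best_attempt, trace). best_attempt is None if no attempt ran.
    """
    start = clock()
    best = None
    trace = []
    for index, options in enumerate(ATTEMPT_OPTIONS[:max_attempts], start=1):
        # Budget check happens before each attempt.
        if (clock() - start) * 1000 >= budget_ms:
            break
        rec_texts, rec_scores = run_ocr(options)
        score = quality_score(len(rec_texts), rec_scores)
        attempt = {"attempt": index, "options": options, "quality_score": score}
        trace.append(attempt)
        # Strict ">" keeps the earliest attempt on ties (compare Example 4).
        if best is None or score > best["quality_score"]:
            best = attempt
        if score >= quality_target:
            break
    return best, trace
```

Passing `clock` as a parameter is a design choice made here so that budget enforcement can be tested with a fake clock instead of real delays; provider error handling and the 503/504 retries are deliberately left out of this sketch.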