npm - paddleocr-skills - Versions diffs - 1.0.0 - Mend

paddleocr-skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +220 -0
package/bin/paddleocr-skills.js +20 -0
package/lib/copy.js +39 -0
package/lib/installer.js +70 -0
package/lib/prompts.js +67 -0
package/lib/python.js +75 -0
package/lib/verify.js +121 -0
package/package.json +42 -0
package/templates/.env.example +12 -0
package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md +64 -0
package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md +154 -0
package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md +157 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/_lib.py +780 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/configure.py +270 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/optimize_file.py +226 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements-optimize.txt +8 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements.txt +7 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/smoke_test.py +199 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/vl_caller.py +232 -0
package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md +481 -0
package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -0
package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -0
package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -0
package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -0
package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -0
package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -0
package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -0
package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -0
package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -0

package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md ADDED

@@ -0,0 +1,64 @@
+# PaddleOCR-VL Layout Schema
+## Layout Detection Overview
+PaddleOCR-VL uses PP-DocLayoutV2 for automatic layout analysis, detecting semantic regions and determining reading order.
+## Region Types
+### Text Regions
+- **paragraph**: Regular text paragraphs
+- **title**: Headings and titles
+- **caption**: Image/table captions
+- **footnote**: Footnotes and references
+### Non-Text Regions
+- **table**: Tabular data
+- **figure**: Images, charts, diagrams
+- **formula**: Mathematical formulas
+- **header**: Page headers
+- **footer**: Page footers
+## Layout Structure
+```json
+{
+  "layout": {
+    "regions": [
+      {
+        "id": 0,
+        "type": "title",
+        "bbox": [x1, y1, x2, y2],
+        "confidence": 0.95
+      },
+      {
+        "id": 1,
+        "type": "paragraph",
+        "bbox": [x1, y1, x2, y2],
+        "confidence": 0.92
+      }
+    ],
+    "reading_order": [0, 1, 2, 3],
+    "page_number": 1
+  }
+}
+```
+## Reading Order Algorithm
+The model automatically determines the correct reading order based on:
+- Spatial layout (top-to-bottom, left-to-right)
+- Document structure (titles before content)
+- Column detection (multi-column layouts)
+- Semantic relationships
+## Bounding Box Format
+All bounding boxes use the format: `[x1, y1, x2, y2]`
+- (x1, y1): Top-left corner
+- (x2, y2): Bottom-right corner
+- Coordinates are absolute pixel positions
+---
+*This is a placeholder document. Full layout schema specifications will be added when integration is complete.*
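
As an illustration (not part of the package files), here is a minimal sketch of how a consumer might walk the `regions` in `reading_order` and derive box sizes from the `[x1, y1, x2, y2]` format described above. The helper names `bbox_size` and `regions_in_reading_order` are hypothetical, and the pixel values in `sample` are invented for the example. Since the schema's own sample lists ids in `reading_order` that have no matching region, the sketch assumes unknown ids should simply be skipped.

```python
from typing import Any


def bbox_size(bbox: list[float]) -> tuple[float, float]:
    """Return (width, height) for an [x1, y1, x2, y2] box in absolute pixels."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


def regions_in_reading_order(layout: dict[str, Any]) -> list[dict[str, Any]]:
    """Return regions sorted by the layout's reading_order list.

    Ids listed in reading_order that have no matching region are skipped
    (an assumption; the placeholder schema does not say how to treat them).
    """
    by_id = {region["id"]: region for region in layout.get("regions", [])}
    return [by_id[i] for i in layout.get("reading_order", []) if i in by_id]


# Invented sample data shaped like the "Layout Structure" example.
sample = {
    "layout": {
        "regions": [
            {"id": 1, "type": "paragraph", "bbox": [100, 200, 500, 250], "confidence": 0.92},
            {"id": 0, "type": "title", "bbox": [100, 120, 500, 170], "confidence": 0.95},
        ],
        "reading_order": [0, 1, 2, 3],
        "page_number": 1,
    }
}

ordered = regions_in_reading_order(sample["layout"])
for region in ordered:
    width, height = bbox_size(region["bbox"])
    print(region["id"], region["type"], width, height)
```

Looking regions up by `id` rather than by list position keeps the traversal correct even when `regions` is not stored in reading order.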

package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md ADDED

@@ -0,0 +1,154 @@
+# PaddleOCR-VL Output Format Specification
+## Response Structure
+All PaddleOCR-VL API responses follow this structure:
+```json
+{
+  "ok": boolean,
+  "result": { ... },
+  "metadata": { ... },
+  "error": { ... }  // Only present if ok = false
+}
+```
+## Success Response
+### JSON Output
+```json
+{
+  "ok": true,
+  "result": {
+    "layout": {
+      "regions": [...],
+      "reading_order": [...]
+    },
+    "elements": [
+      {
+        "type": "text" | "table" | "formula" | "figure",
+        "content": "...",
+        "bbox": [x1, y1, x2, y2],
+        "confidence": 0.0-1.0
+      }
+    ],
+    "full_text": "Combined text from all elements"
+  },
+  "metadata": {
+    "processing_time_ms": 3500,
+    "total_pages": 1,
+    "languages_detected": ["en", "zh"],
+    "model_version": "paddleocr-vl-0.9b"
+  }
+}
+```
+### Markdown Output
+When `--format markdown` is specified, the output is formatted as:
+```markdown
+# [Title from document]
+[Paragraph text...]
+| Table | Headers |
+|-------|---------|
+| Data  | Values  |
+Formula: $E = mc^2$
+[More content...]
+```
+## Element Types
+### Text Element
+```json
+{
+  "type": "text",
+  "content": "The actual text content",
+  "bbox": [100, 200, 500, 250],
+  "confidence": 0.95,
+  "language": "en"
+}
+```
+### Table Element
+```json
+{
+  "type": "table",
+  "content": {
+    "rows": 3,
+    "cols": 2,
+    "cells": [
+      ["Header 1", "Header 2"],
+      ["Data 1", "Data 2"],
+      ["Data 3", "Data 4"]
+    ]
+  },
+  "bbox": [100, 300, 500, 450],
+  "confidence": 0.88
+}
+```
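
As an illustration (not part of the package files), a table element shaped like the one above could be rendered into the Markdown table form shown under "Markdown Output". This is a sketch under two assumptions: the first row of `cells` is treated as the header row, and the function name `table_to_markdown` is hypothetical rather than something the package provides.

```python
def table_to_markdown(table: dict) -> str:
    """Render a table element's cells as a Markdown table.

    Assumes the first row of content.cells is the header row.
    """
    cells = table["content"]["cells"]
    if not cells:
        return ""

    def format_row(row: list) -> str:
        # Escape pipes so cell text cannot break the table structure.
        return "| " + " | ".join(str(cell).replace("|", "\\|") for cell in row) + " |"

    header, *rows = cells
    lines = [format_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(format_row(row) for row in rows)
    return "\n".join(lines)


table_element = {
    "type": "table",
    "content": {
        "rows": 3,
        "cols": 2,
        "cells": [
            ["Header 1", "Header 2"],
            ["Data 1", "Data 2"],
            ["Data 3", "Data 4"],
        ],
    },
    "bbox": [100, 300, 500, 450],
    "confidence": 0.88,
}

print(table_to_markdown(table_element))
```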
+### Formula Element
+```json
+{
+  "type": "formula",
+  "content": "E = mc^2",
+  "latex": "$E = mc^2$",
+  "bbox": [200, 500, 400, 530],
+  "confidence": 0.92
+}
+```
+### Figure Element
+```json
+{
+  "type": "figure",
+  "content": {
+    "description": "Bar chart showing sales data",
+    "extracted_data": {...}  // If available
+  },
+  "bbox": [100, 600, 500, 800],
+  "confidence": 0.85
+}
+```
+## Error Response
+```json
+{
+  "ok": false,
+  "error": {
+    "code": "ERROR_CODE",
+    "message": "Human-readable error message",
+    "details": {
+      "field": "Additional context"
+    }
+  }
+}
+```
+### Error Codes
+- `INVALID_INPUT`: Invalid file or URL
+- `UNSUPPORTED_FORMAT`: File format not supported
+- `PROVIDER_AUTH_ERROR`: Authentication failed
+- `PROVIDER_QUOTA_EXCEEDED`: API quota exceeded
+- `PROCESSING_ERROR`: Error during document processing
+- `TIMEOUT`: Request timeout
+## Quality Metrics
+Each element includes a confidence score (0.0 to 1.0):
+- **0.90 - 1.00**: Excellent quality, highly reliable
+- **0.75 - 0.89**: Good quality, generally reliable
+- **0.50 - 0.74**: Fair quality, may have some errors
+- **0.00 - 0.49**: Poor quality, likely has errors
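
As an illustration (not part of the package files), the confidence bands above can be mapped to labels as in this sketch. The function name `quality_band` and the label strings are hypothetical. The lower cutoff is left as a parameter because `vl_model_spec.md` in the same package lists 0.60, not 0.50, as the floor of its third band; which one applies is not stated in either document.

```python
def quality_band(confidence: float, fair_floor: float = 0.50) -> str:
    """Map a confidence score in [0.0, 1.0] to a quality label.

    Cutoffs 0.90 and 0.75 follow the bands listed above; fair_floor
    defaults to 0.50 and can be set to 0.60 to match vl_model_spec.md.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.90:
        return "excellent"
    if confidence >= 0.75:
        return "good"
    if confidence >= fair_floor:
        return "fair"
    return "poor"


for score in (0.95, 0.88, 0.55, 0.30):
    print(score, quality_band(score))
```

Using `>=` comparisons also covers scores such as 0.895 that fall between the two-decimal band edges in the list.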
+---
+*This is a placeholder document. Full output format specifications will be added when integration is complete.*

package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md ADDED

@@ -0,0 +1,157 @@
+# PaddleOCR-VL Model Specification
+## Overview
+PaddleOCR-VL is a vision-language model designed for advanced document parsing. This document provides the technical specifications and capabilities of the model.
+## Model Architecture
+- **Model Size**: 0.9B parameters
+- **Vision Encoder**: NaViT-style dynamic high-resolution visual encoder
+- **Language Model**: ERNIE-4.5-0.3B
+- **Two-Stage Processing**:
+  1. PP-DocLayoutV2 for layout analysis
+  2. PaddleOCR-VL-0.9B for content recognition
+## Supported Languages
+109 languages including:
+- English, Chinese, Japanese, Korean
+- European languages (French, German, Spanish, Italian, etc.)
+- Asian languages (Thai, Vietnamese, Arabic, etc.)
+- And many more...
+## Supported Elements
+### Text Recognition
+- Plain text paragraphs
+- Headings and titles
+- Captions and labels
+- Multi-column text
+### Table Recognition
+- Simple tables
+- Complex nested tables
+- Borderless tables
+- Multi-page tables
+### Formula Recognition
+- Inline formulas
+- Display formulas
+- Mathematical equations
+- Chemical formulas
+- LaTeX output support
+### Chart Recognition
+- Bar charts
+- Line charts
+- Pie charts
+- Diagrams and flowcharts
+## Performance Characteristics
+- **Accuracy**: SOTA on document parsing benchmarks
+- **Speed**: ~3-5 seconds per page (depending on complexity)
+- **Input Resolution**: Dynamic, up to 4K resolution
+- **Max File Size**: 20MB per request
+## API Limitations
+- Maximum pages per request: 10
+- Maximum requests per minute: 30
+- Supported formats: PDF, PNG, JPG, JPEG
+- Maximum file size: 20MB per document
+- Timeout: 30 seconds default (configurable up to 60s)
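
As an illustration (not part of the package files), the limits listed above could be checked locally before a request is sent. This is a sketch with a hypothetical function name, `preflight_problems`. It assumes that "20MB" means 20 × 1024 × 1024 bytes and that the caller already knows the page count; neither point is specified in the document.

```python
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024  # "20MB" read as MiB; an assumption
MAX_PAGES = 10
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def preflight_problems(filename: str, size_bytes: int, page_count: int = 1) -> list[str]:
    """Return human-readable violations of the documented API limits."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"too many pages: {page_count} > {MAX_PAGES}")
    return problems


print(preflight_problems("scan.PDF", 1024, page_count=3))
print(preflight_problems("notes.docx", 30 * 1024 * 1024, page_count=12))
```

Returning a list of problems instead of raising on the first one lets a caller report every violation at once.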
+## Output Format
+### Region Types
+The model identifies and classifies document regions into the following types:
+| Type | Description | Typical Use |
+|------|-------------|-------------|
+| `header` | Page headers, chapter titles | Navigation, structure |
+| `text` | Main body text, paragraphs | Primary content |
+| `table` | Tabular data with rows/columns | Structured data extraction |
+| `formula` | Mathematical equations | Scientific documents |
+| `figure` | Images, charts, diagrams | Visual content |
+| `footnote` | References, citations | Supporting information |
+| `footer` | Page footers, metadata | Document metadata |
+| `page_number` | Page numbering | Navigation |
+| `margin_note` | Annotations, comments | Supplementary notes |
+### Confidence Scoring
+Each recognized element includes a confidence score (0.0-1.0):
+- **0.90-1.00**: Excellent recognition quality
+- **0.75-0.89**: Good quality, minor uncertainties
+- **0.60-0.74**: Acceptable, may have some errors
+- **Below 0.60**: Poor quality, manual review recommended
+### Layout Analysis
+The model performs reading order detection and provides:
+- Bounding box coordinates for each element
+- Reading order sequence
+- Element relationships (e.g., caption-to-figure association)
+- Multi-column layout handling
+## Quality Considerations
+### Best Performance On:
+- High-resolution scanned documents (300 DPI+)
+- Clean, well-formatted PDFs
+- Standard fonts and layouts
+- Good contrast between text and background
+### Challenging Scenarios:
+- Handwritten text (not supported)
+- Very low resolution images (<150 DPI)
+- Heavy distortion or skew
+- Unusual fonts or decorative text
+## API Response Structure
+The API returns a comprehensive JSON structure:
+```json
+{
+  "ok": true,
+  "result": {
+    "full_text": "Complete extracted text...",
+    "layout": {
+      "regions": [
+        {
+          "id": 0,
+          "type": "text|table|formula|figure|header|footer|...",
+          "content": "...",
+          "bbox": [x1, y1, x2, y2],
+          "confidence": 0.95,
+          "page": 1
+        }
+      ],
+      "reading_order": [0, 1, 2, ...]
+    }
+  },
+  "metadata": {
+    "processing_time_ms": 3500,
+    "total_pages": 1,
+    "model_version": "paddleocr-vl-0.9b"
+  }
+}
+```
+## Integration Notes
+- All requests must include a valid authentication token
+- Responses are cached for 10 minutes by default
+- Failed requests return standardized error codes
+- Retry logic recommended for transient failures (503, 504)
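
As an illustration (not part of the package files), the retry recommendation above could be implemented with exponential backoff, as sketched here. The document does not prescribe a backoff strategy, attempt count, or delay, so all three are assumptions. `call_with_retry` and its `send` callback (any function returning a `(status, body)` pair) are hypothetical and stand in for whatever HTTP client is actually used.

```python
import time
from typing import Callable

TRANSIENT_STATUSES = {503, 504}  # the two statuses named in the notes above


def call_with_retry(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call send() until it returns a non-transient status or attempts run out.

    Waits base_delay * 2**attempt seconds between attempts (1s, 2s, 4s, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in TRANSIENT_STATUSES:
            break
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    return status, body


# Fake transport for demonstration: fails twice with 503, then succeeds.
attempts = []


def fake_send() -> tuple[int, str]:
    attempts.append(1)
    return (503, "busy") if len(attempts) < 3 else (200, "ok")


print(call_with_retry(fake_send, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.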
+---
+**Document Version**: 1.0
+**Last Updated**: 2026-01-28
+**Status**: Production Ready