npm - paddleocr-skills - Versions diffs - 1.0.0 - Mend

paddleocr-skills 1.0.0


Files changed (29)

package/README.md +220 -0
package/bin/paddleocr-skills.js +20 -0
package/lib/copy.js +39 -0
package/lib/installer.js +70 -0
package/lib/prompts.js +67 -0
package/lib/python.js +75 -0
package/lib/verify.js +121 -0
package/package.json +42 -0
package/templates/.env.example +12 -0
package/templates/paddleocr-vl/references/paddleocr-vl/layout_schema.md +64 -0
package/templates/paddleocr-vl/references/paddleocr-vl/output_format.md +154 -0
package/templates/paddleocr-vl/references/paddleocr-vl/vl_model_spec.md +157 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/_lib.py +780 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/configure.py +270 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/optimize_file.py +226 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements-optimize.txt +8 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/requirements.txt +7 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/smoke_test.py +199 -0
package/templates/paddleocr-vl/scripts/paddleocr-vl/vl_caller.py +232 -0
package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md +481 -0
package/templates/ppocrv5/references/ppocrv5/agent_policy.md +258 -0
package/templates/ppocrv5/references/ppocrv5/normalized_schema.md +257 -0
package/templates/ppocrv5/references/ppocrv5/provider_api.md +140 -0
package/templates/ppocrv5/scripts/ppocrv5/_lib.py +635 -0
package/templates/ppocrv5/scripts/ppocrv5/configure.py +346 -0
package/templates/ppocrv5/scripts/ppocrv5/ocr_caller.py +684 -0
package/templates/ppocrv5/scripts/ppocrv5/requirements.txt +4 -0
package/templates/ppocrv5/scripts/ppocrv5/smoke_test.py +139 -0
package/templates/ppocrv5/skills/ppocrv5/SKILL.md +272 -0

package/templates/paddleocr-vl/skills/paddleocr-vl/SKILL.md ADDED

@@ -0,0 +1,481 @@
+---
+name: paddleocr-vl
+description: >
+  Advanced document parsing with PaddleOCR-VL vision-language model. Returns complete document
+  structure including text, tables, formulas, charts, and layout information. Claude extracts
+  relevant content based on user needs.
+---
+# PaddleOCR-VL Document Parsing Skill
+## When to Use This Skill
+✅ **Use PaddleOCR-VL for**:
+- Documents with tables (invoices, financial reports, spreadsheets)
+- Documents with mathematical formulas (academic papers, scientific documents)
+- Documents with charts and diagrams
+- Multi-column layouts (newspapers, magazines, brochures)
+- Complex document structures requiring layout analysis
+- Any document requiring structured understanding
+❌ **Use PP-OCRv5 instead for**:
+- Simple text-only extraction
+- Quick OCR tasks where speed is critical
+- Screenshots or simple images with clear text
+## How to Use This Skill
+**⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔**
+1. **ONLY use PaddleOCR-VL API** - Execute the script `python scripts/paddleocr-vl/vl_caller.py`
+2. **NEVER use Claude's built-in vision** - Do NOT parse documents yourself
+3. **NEVER offer alternatives** - Do NOT suggest "I can try to analyze it" or similar
+4. **IF API fails** - Display the error message and STOP immediately
+5. **NO fallback methods** - Do NOT attempt document parsing any other way
+If the script execution fails (API not configured, network error, etc.):
+- Show the error message to the user
+- Do NOT offer to help using your vision capabilities
+- Do NOT ask "Would you like me to try parsing it?"
+- Simply stop and wait for the user to fix the configuration
+### Basic Workflow
+1. **Execute document parsing**:
+   ```bash
+   python scripts/paddleocr-vl/vl_caller.py --file-url "URL provided by user"
+   ```
+   Or for local files:
+   ```bash
+   python scripts/paddleocr-vl/vl_caller.py --file-path "file path"
+   ```
+   **Save result to file** (recommended):
+   ```bash
+   python scripts/paddleocr-vl/vl_caller.py --file-url "URL" --output result.json --pretty
+   ```
+   - The script will display: `Result saved to: /absolute/path/to/result.json`
+   - This message is written to stderr; the JSON itself is saved to the file
+   - **Tell the user the file path** shown in the message
+2. **The script returns COMPLETE JSON** with all document content:
+   - Headers, footers, page numbers
+   - Main text content
+   - Tables with structure
+   - Formulas (with LaTeX)
+   - Figures and charts
+   - Footnotes and references
+   - Layout and reading order
+3. **Extract what the user needs** from the complete data based on their request.
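The workflow above can also be driven from Python. The following is a minimal sketch, not part of the package: `build_vl_command` is a hypothetical helper that assembles the command line from the flags documented above (`--file-url`, `--file-path`, `--output`, `--pretty`). It assumes the script is run from the skill's root directory so that the relative script path resolves.

```python
import sys

SCRIPT = "scripts/paddleocr-vl/vl_caller.py"  # path as documented above


def build_vl_command(file_url=None, file_path=None, output=None, pretty=False):
    """Assemble the vl_caller.py command line (hypothetical helper)."""
    # Exactly one input source must be given.
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    cmd = [sys.executable, SCRIPT]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file-path", file_path]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd


cmd = build_vl_command(
    file_url="https://example.com/paper.pdf", output="result.json", pretty=True
)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Building the command as a list rather than a shell string avoids quoting problems with URLs and file paths that contain spaces.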
+### IMPORTANT: Complete Content Display
+**CRITICAL**: You must display the COMPLETE extracted content to the user based on their needs.
+- The script returns ALL document content in a structured format
+- **Display the full content requested by the user**, do NOT truncate or summarize
+- If the user asks for "all text", show the entire `result.full_text`
+- If the user asks for "tables", show ALL tables in the document
+- If the user asks for "main content", filter out headers/footers but show ALL body text
+**What this means**:
+- ✅ **DO**: Display complete text, all tables, all formulas as requested
+- ✅ **DO**: Present content in reading order using `reading_order` array
+- ❌ **DON'T**: Truncate with "..." unless content is excessively long (>10,000 chars)
+- ❌ **DON'T**: Summarize or provide excerpts when user asks for full content
+- ❌ **DON'T**: Say "Here's a preview" when user expects complete output
+**Example - Correct**:
+```
+User: "Extract all the text from this document"
+Claude: I've parsed the complete document. Here's all the extracted text:
+[Display entire result.full_text or concatenated regions in reading order]
+Document Statistics:
+- Total regions: 25
+- Text blocks: 15
+- Tables: 3
+- Formulas: 2
+Quality: Excellent (confidence: 0.92)
+```
+**Example - Incorrect** ❌:
+```
+User: "Extract all the text"
+Claude: "I found a document with multiple sections. Here's the beginning:
+'Introduction...' (content truncated for brevity)"
+```
+### Understanding the JSON Response
+The script returns a complete JSON structure:
+```json
+{
+  "ok": true,
+  "result": {
+    "full_text": "Complete text with all content including headers, footers, etc.",
+    "layout": {
+      "regions": [
+        {
+          "id": 0,
+          "type": "header",
+          "content": "Chapter 3: Methods",
+          "bbox": [100, 50, 500, 100]
+        },
+        {
+          "id": 1,
+          "type": "text",
+          "content": "Main body text content...",
+          "bbox": [100, 150, 500, 300]
+        },
+        {
+          "id": 2,
+          "type": "table",
+          "content": {
+            "rows": 3,
+            "cols": 2,
+            "cells": [["Header1", "Header2"], ["Data1", "Data2"]]
+          },
+          "bbox": [100, 350, 500, 550]
+        },
+        {
+          "id": 3,
+          "type": "formula",
+          "content": "E = mc^2",
+          "latex": "$E = mc^2$",
+          "bbox": [200, 600, 400, 630]
+        },
+        {
+          "id": 4,
+          "type": "footnote",
+          "content": "[1] Reference citation",
+          "bbox": [100, 650, 500, 680]
+        },
+        {
+          "id": 5,
+          "type": "footer",
+          "content": "University Name 2024",
+          "bbox": [100, 750, 500, 800]
+        },
+        {
+          "id": 6,
+          "type": "page_number",
+          "content": "25",
+          "bbox": [250, 770, 280, 790]
+        }
+      ],
+      "reading_order": [0, 1, 2, 3, 4, 5, 6]
+    }
+  },
+  "metadata": {
+    "processing_time_ms": 3500,
+    "total_pages": 1,
+    "model_version": "paddleocr-vl-0.9b"
+  }
+}
+```
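As an illustration of how this structure can be consumed, the sketch below counts regions by `type`, which is enough to produce the kind of "Document Statistics" summary shown earlier. The `response` dict is a trimmed copy of the sample above; `count_region_types` is a hypothetical helper, and treating a falsy `ok` as a failure is an assumption, since the failure shape is not documented here.

```python
from collections import Counter

# Trimmed copy of the sample response above (only the fields used here).
response = {
    "ok": True,
    "result": {
        "layout": {
            "regions": [
                {"id": 0, "type": "header"},
                {"id": 1, "type": "text"},
                {"id": 2, "type": "table"},
                {"id": 3, "type": "formula"},
            ]
        }
    },
}


def count_region_types(resp):
    """Count regions per type (hypothetical helper)."""
    # Assumption: a falsy "ok" means the call failed.
    if not resp.get("ok"):
        raise ValueError("response reports failure")
    regions = resp["result"]["layout"]["regions"]
    return Counter(r["type"] for r in regions)


counts = count_region_types(response)
print(f"Total regions: {sum(counts.values())}")
for region_type, n in sorted(counts.items()):
    print(f"- {region_type}: {n}")
```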
+### Region Types
+The `type` field indicates the element category:
+| Type | Description | Typically Include? |
+|------|-------------|-------------------|
+| `header` | Page headers (chapter/section titles) | Exclude (usually repetitive) |
+| `text` | Main body text | **Include** |
+| `table` | Tables with structured data | **Include** |
+| `formula` | Mathematical formulas | **Include** |
+| `figure` | Images, charts, diagrams | **Include** |
+| `footnote` | Footnotes and references | **Include** (often important) |
+| `footer` | Page footers (author/institution) | Exclude (usually repetitive) |
+| `page_number` | Page numbers | Exclude (not content) |
+| `margin_note` | Margin annotations | Context-dependent |
+### Content Extraction Guidelines
+**Based on user intent, filter the regions**:
+| User Says | What to Extract | How |
+|-----------|-----------------|-----|
+| "Extract main content" | text, table, formula, figure, footnote | Skip header, footer, page_number |
+| "Get all tables" | table only | Filter by type="table" |
+| "Extract formulas" | formula only | Filter by type="formula" |
+| "Complete document" | Everything | Use all regions or full_text |
+| "Without headers/footers" | Core content | Skip header, footer types |
+| "Include page numbers" | Core + page_number | Keep page_number type |
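The table above amounts to a lookup from user intent to a set of region types. A minimal sketch of that lookup follows; the intent keys (`main_content`, `tables`, and so on) and the `select_regions` helper are hypothetical names chosen for illustration, while the type sets are taken from the table.

```python
CORE_TYPES = {"text", "table", "formula", "figure", "footnote"}

# Region types to keep for each kind of request (keys are hypothetical labels).
INCLUDE_BY_INTENT = {
    "main_content": CORE_TYPES,
    "tables": {"table"},
    "formulas": {"formula"},
    "with_page_numbers": CORE_TYPES | {"page_number"},
}


def select_regions(regions, intent):
    """Return the regions matching an intent; 'complete' keeps everything."""
    if intent == "complete":
        return list(regions)
    keep = INCLUDE_BY_INTENT[intent]
    return [r for r in regions if r["type"] in keep]


regions = [
    {"id": 0, "type": "header"},
    {"id": 1, "type": "text"},
    {"id": 2, "type": "table"},
    {"id": 3, "type": "page_number"},
]
print([r["id"] for r in select_regions(regions, "main_content")])
```

Filtering preserves the original list order, so the result can still be arranged by `reading_order` afterwards.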
+### Usage Examples
+**Example 1: Extract Main Content** (default behavior)
+```bash
+python scripts/paddleocr-vl/vl_caller.py \
+  --file-url "https://example.com/paper.pdf" \
+  --pretty
+```
+Then filter JSON to extract core content:
+- Include: text, table, formula, figure, footnote
+- Exclude: header, footer, page_number
+**Example 2: Extract Tables Only**
+```bash
+python scripts/paddleocr-vl/vl_caller.py \
+  --file-path "./financial_report.pdf" \
+  --pretty
+```
+Then filter JSON:
+- Only keep regions where type="table"
+- Present table content in markdown format
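One way to present a table region in markdown is sketched below. `table_to_markdown` is a hypothetical helper; it assumes the `content` object has the `cells` layout shown in the sample JSON and that the first row is the header row. Pipe characters inside cells are not escaped in this sketch.

```python
def table_to_markdown(content):
    """Render a table region's `content` as a Markdown table.

    Assumes the first row of `cells` is the header row.
    """
    cells = content["cells"]
    if not cells:
        return ""
    width = max(len(row) for row in cells)
    # Pad short rows so every row has the same number of columns.
    rows = [[str(c) for c in row] + [""] * (width - len(row)) for row in cells]
    lines = [
        "| " + " | ".join(rows[0]) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


sample = {"rows": 3, "cols": 2, "cells": [["Header1", "Header2"], ["Data1", "Data2"]]}
markdown = table_to_markdown(sample)
print(markdown)
```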
+**Example 3: Complete Document with Everything**
+```bash
+python scripts/paddleocr-vl/vl_caller.py \
+  --file-url "URL" \
+  --pretty
+```
+Then use `result.full_text` or present all regions in reading_order.
+### First-Time Configuration
+**When API is not configured**:
+The error will show:
+```
+Configuration error: API not configured. Get your API at: https://aistudio.baidu.com/paddleocr/task
+```
+**Auto-configuration workflow**:
+1. **Show the exact error message** to user (including the URL)
+2. **Tell user to provide credentials**:
+   ```
+   Please visit the URL above to get your VL_API_URL and VL_TOKEN.
+   Once you have them, send them to me and I'll configure it automatically.
+   ```
+3. **When user provides credentials** (accept any format):
+   - `VL_API_URL=https://xxx.com/v1, VL_TOKEN=abc123...`
+   - `Here's my API: https://xxx and token: abc123`
+   - Copy-pasted code format
+   - Any other reasonable format
+4. **Parse credentials from user's message**:
+   - Extract VL_API_URL value (look for URLs)
+   - Extract VL_TOKEN value (long alphanumeric string, usually 40+ chars)
+5. **Configure automatically**:
+   ```bash
+   python scripts/paddleocr-vl/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
+   ```
+6. **If configuration succeeds**:
+   - Inform user: "Configuration complete! Parsing document now..."
+   - Retry the original parsing task
+7. **If configuration fails**:
+   - Show the error
+   - Ask user to verify the credentials
+**IMPORTANT**: The error message format is STRICT and must be shown exactly as provided by the script. Do not modify or paraphrase it.
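Steps 3 and 4 describe pulling a URL and a long alphanumeric token out of a free-form message. The sketch below shows one way this could be done with regular expressions. `parse_credentials` is a hypothetical helper, and the 40-character threshold simply follows the "usually 40+ chars" hint above; a shorter token would not be detected, in which case the user should be asked to resend it.

```python
import re

URL_RE = re.compile(r"https?://[^\s,;\"'<>]+")
TOKEN_RE = re.compile(r"[A-Za-z0-9]{40,}")  # threshold is an assumption


def parse_credentials(message):
    """Return (api_url, token); either may be None if not found."""
    url_match = URL_RE.search(message)
    # Trim trailing punctuation that often follows a pasted URL.
    api_url = url_match.group(0).rstrip(".)") if url_match else None
    # Remove URLs first so a long URL path is not mistaken for the token.
    remainder = URL_RE.sub(" ", message)
    token_match = TOKEN_RE.search(remainder)
    token = token_match.group(0) if token_match else None
    return api_url, token


fake_token = "a1B2c3D4e5" * 4  # 40-character placeholder, not a real token
api_url, token = parse_credentials(
    f"Here's my API: https://xxx.example/v1 and token: {fake_token}"
)
print(api_url, len(token))
```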
+### Handling Large Files (>20MB)
+**Problem**: Local files larger than 20MB are rejected by default.
+**Solutions** (Choose based on your situation):
+#### Solution 1: Use URL Upload (Recommended) ⭐
+Upload your file to a web server and use `--file-url`:
+```bash
+# Instead of local file
+python scripts/paddleocr-vl/vl_caller.py --file-path "large_file.pdf"  # ❌ May fail
+# Use URL instead
+python scripts/paddleocr-vl/vl_caller.py --file-url "https://your-server.com/large_file.pdf"  # ✅ No size limit
+```
+Benefits:
+- No local file size limit applies
+- No upload from your machine (the API server fetches the file directly from the URL)
+- Suitable for very large files (>100MB)
+#### Solution 2: Increase Size Limit
+Adjust the limit in `.env` file:
+```bash
+# .env
+VL_MAX_FILE_SIZE_MB=50  # Increase from 20MB to 50MB
+```
+Then process as usual:
+```bash
+python scripts/paddleocr-vl/vl_caller.py --file-path "large_file.pdf"
+```
+**Note**: Your API provider may still have upload limits. Check with your VL API service.
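To check in advance whether a local file would exceed the limit, something like the following sketch can be used. It is not the package's own check: `check_file_size` is a hypothetical helper that reads `VL_MAX_FILE_SIZE_MB` from an environment mapping (rather than parsing `.env`), falls back to the 20MB default stated above, and assumes 1MB means 1024 × 1024 bytes.

```python
import os
import tempfile

DEFAULT_MAX_MB = 20  # default limit stated above


def check_file_size(path, env=os.environ):
    """Return (ok, size_mb, limit_mb) for a local file (hypothetical helper)."""
    limit_mb = float(env.get("VL_MAX_FILE_SIZE_MB", DEFAULT_MAX_MB))
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= limit_mb, size_mb, limit_mb


# Demonstrate with a 2MB dummy file and two different limits.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024))
path = tmp.name
try:
    ok_default, size_mb, limit_mb = check_file_size(path, env={})
    ok_small, _, _ = check_file_size(path, env={"VL_MAX_FILE_SIZE_MB": "1"})
finally:
    os.remove(path)
print(ok_default, ok_small)
```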
+#### Solution 3: Compress/Optimize File
+Use the built-in optimizer to reduce file size:
+```bash
+# Install optimization dependencies first
+pip install -r scripts/paddleocr-vl/requirements-optimize.txt
+# Optimize image (reduce quality)
+python scripts/paddleocr-vl/optimize_file.py input.png output.png --quality 70
+# Optimize PDF (compress images within)
+python scripts/paddleocr-vl/optimize_file.py input.pdf output.pdf --target-size 15
+# Then process optimized file
+python scripts/paddleocr-vl/vl_caller.py --file-path "output.pdf" --pretty
+```
+The optimizer will:
+- Compress images (adjust JPEG quality)
+- Resize images if needed (maintain aspect ratio)
+- Compress PDF images (reduce DPI to 150)
+- Show before/after size comparison
+#### Solution 4: Process Specific Pages (PDF Only)
+If you only need certain pages from a large PDF, extract them first:
+```bash
+# Using PyMuPDF (requires: pip install PyMuPDF)
+python -c "
+import fitz
+doc = fitz.open('large.pdf')
+writer = fitz.open()
+writer.insert_pdf(doc, from_page=0, to_page=4)  # Pages 1-5
+writer.save('pages_1_5.pdf')
+"
+# Then process the smaller file
+python scripts/paddleocr-vl/vl_caller.py --file-path "pages_1_5.pdf"
+```
+#### Solution 5: Upload to Cloud Storage
+For extremely large files (>100MB), use cloud storage with public URLs:
+```bash
+# Example: Upload to AWS S3, Google Drive, or similar
+# Get public URL: https://storage.example.com/my-document.pdf
+# Process via URL
+python scripts/paddleocr-vl/vl_caller.py --file-url "https://storage.example.com/my-document.pdf"
+```
+**Comparison Table**:
+| Solution | Max Size | Speed | Complexity | Best For |
+|----------|----------|-------|------------|----------|
+| URL Upload | Unlimited | Fast | Low | Any large file |
+| Increase Limit | Configurable | Medium | Very Low | Slightly over limit |
+| Compress | ~70% reduction | Slow | Medium | Images/PDFs with images |
+| Extract Pages | As needed | Fast | Medium | Multi-page PDFs |
+| Cloud Storage | Unlimited | Fast | High | Very large files (>100MB) |
+### Error Handling
+**Authentication failed (401/403)**:
+```
+error: Authentication failed
+```
+→ The token is invalid; reconfigure with the correct credentials
+**API quota exceeded (429)**:
+```
+error: API quota exceeded
+```
+→ The daily API quota is exhausted; inform the user to wait or upgrade
+**Unsupported format**:
+```
+error: Unsupported file format
+```
+→ The file format is not supported; convert the file to PDF, PNG, or JPG
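The three cases above can be expressed as a small lookup from error text to the suggested next step. This is only a sketch: `hint_for_error` is a hypothetical helper, and it assumes the script's error output contains the message strings exactly as listed above.

```python
# Message fragments are taken from the error cases listed above.
ERROR_HINTS = {
    "Authentication failed": "Token is invalid; reconfigure with correct credentials.",
    "API quota exceeded": "Daily API quota exhausted; wait or upgrade.",
    "Unsupported file format": "Convert the file to PDF, PNG, or JPG.",
}


def hint_for_error(error_text):
    """Return the suggested action for a known error, else None."""
    lowered = error_text.lower()
    for needle, hint in ERROR_HINTS.items():
        if needle.lower() in lowered:
            return hint
    return None


print(hint_for_error("error: API quota exceeded"))
```

Unknown errors return `None`, in which case the original message should simply be shown to the user unchanged.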
+### Pseudo-Code: Content Extraction
+**Extract main content** (most common):
+```python
+def extract_main_content(json_response):
+    regions = json_response['result']['layout']['regions']
+    # Keep only core content types
+    core_types = ['text', 'table', 'formula', 'figure', 'footnote']
+    main_regions = [r for r in regions if r['type'] in core_types]
+    # Sort by position in reading_order (entries are assumed to be region ids)
+    reading_order = json_response['result']['layout']['reading_order']
+    rank = {region_id: pos for pos, region_id in enumerate(reading_order)}
+    sorted_regions = sorted(main_regions, key=lambda r: rank.get(r['id'], len(rank)))
+    # Present to user
+    for region in sorted_regions:
+        if region['type'] == 'table':
+            print_as_markdown_table(region['content'])  # helper not defined here
+        elif region['type'] == 'formula':
+            print(f"Formula: {region['latex']}")
+        else:
+            # text, figure, and footnote regions
+            print(region['content'])
+```
+**Extract tables only**:
+```python
+def extract_tables(json_response):
+    regions = json_response['result']['layout']['regions']
+    tables = [r for r in regions if r['type'] == 'table']
+    for i, table in enumerate(tables):
+        print(f"Table {i+1}:")
+        print_as_markdown_table(table['content'])
+```
+**Extract everything**:
+```python
+def extract_complete(json_response):
+    # Simply use the full_text which includes everything
+    print(json_response['result']['full_text'])
+    # Or present all regions in order
+    regions = json_response['result']['layout']['regions']
+    reading_order = json_response['result']['layout']['reading_order']
+    for idx in reading_order:
+        region = regions[idx]
+        print(f"[{region['type']}] {region['content']}")
+```
+## Important Notes
+- **The script NEVER filters content** - It always returns complete data
+- **Claude decides what to present** - Based on user's specific request
+- **All data is always available** - Can be re-interpreted for different needs
+- **No information is lost** - Complete document structure preserved
+## Reference Documentation
+For in-depth understanding of the PaddleOCR-VL system, refer to:
+- `references/paddleocr-vl/vl_model_spec.md` - VL model specifications
+- `references/paddleocr-vl/layout_schema.md` - Layout detection schema
+- `references/paddleocr-vl/output_format.md` - Complete output format specification
+Load these reference documents into context when:
+- Debugging complex parsing issues
+- Understanding the layout detection algorithm
+- Working with special document types
+- Customizing content extraction logic
+## Testing the Skill
+To verify the skill is working properly:
+```bash
+python scripts/paddleocr-vl/smoke_test.py
+```
+This tests configuration and optionally API connectivity.