npm - structurecc - Versions diffs - 2.0.0 → 2.0.1 - Mend

structurecc 2.0.0 → 2.0.1

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (2)

package/README.md +34 -307
package/package.json +3 -8
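
The `+N -M` figures next to each file are counts of added and removed lines. As a minimal sketch of how such counts could be derived from a unified diff body (the helper name `diffstat` and the sample text are hypothetical, and this is not necessarily how the diff viewer computes them):

```python
def diffstat(diff_text: str) -> tuple[int, int]:
    """Count added and removed lines in a unified diff body.

    Assumption: file headers look like '+++ path' / '--- path' and are
    skipped. A removed line whose content itself starts with '-- ' would
    be miscounted by this simple check.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        # Skip file header lines, which are not content changes.
        if line.startswith("+++ ") or line.startswith("--- "):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
        # Hunk headers ('@@') and context lines (leading space) are ignored.
    return added, removed


# Hypothetical two-line change modelled on the package.json version bump.
sample = """@@ -1,3 +1,3 @@
 {
-  "version": "2.0.0",
+  "version": "2.0.1",
 }"""

print(diffstat(sample))
```

Applied to a full per-file diff, the two numbers returned correspond to the `+` and `-` columns in the file list.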

package/README.md CHANGED

@@ -1,7 +1,7 @@
-<h1 align="center">STRUCTURE v2.0</h1>
+<h1 align="center">STRUCTURE</h1>
 <p align="center">
-<strong>Landing AI charges $500/month for agentic document structuring.<br>This is free.</strong>
+<strong>Extract structured data from PDFs, Word docs, and images using Claude Code.</strong>
 </p>
 <p align="center">
@@ -13,339 +13,84 @@
 <img src="assets/terminal.png" alt="structurecc" width="550">
 </p>
-<p align="center">
-<em>Works on Mac, Windows, and Linux</em>
-</p>
----
-## What's New in v2.0
-**3-Phase Pipeline with Quality Verification**
-```
-Image → [Classify] → [Extract] → [Verify] → Output
-                         ↑_______↻_______↓
-```
-| Phase | Agent | Purpose |
-|-------|-------|---------|
-| 1. Classify | `structurecc-classifier` | Fast triage to route to correct extractor |
-| 2. Extract | 6 specialized extractors | Type-specific verbatim extraction |
-| 3. Verify | `structurecc-verifier` | Quality scoring with auto-revision |
-**Verbatim Extraction** - Text is copied EXACTLY as shown. No paraphrasing, no "cleanup."
-**Quality Scoring** - Each extraction gets a 0.0-1.0 score. Failures auto-retry up to 2x.
-**Specialized Extractors** - Tables, charts, heatmaps, diagrams each get dedicated agents.
----
-## The Problem
-You have a 50-page PDF with figures, tables, and charts. You need that data.
-**Manual approach:** Screenshot each figure. Transcribe tables cell by cell. Spend hours on one document.
-**With structurecc:** One command. Walk away. Come back to perfectly structured markdown with quality verification.
-```
-/structure paper.pdf
-```
-Spawns parallel AI agents. Each agent analyzes one visual element. All run simultaneously. Quality verified. Done in minutes, not hours.
----
-## Specialized Extractors
-| Extractor | Handles |
-|-----------|---------|
-| `structurecc-extract-table` | Tables with cell-by-cell accuracy, merged cells, footnotes |
-| `structurecc-extract-chart` | Kaplan-Meier, bar, line, scatter, forest plots with axes, legends, data |
-| `structurecc-extract-heatmap` | Expression heatmaps, correlation matrices with full label extraction |
-| `structurecc-extract-diagram` | CONSORT flows, timelines, network diagrams with all node text |
-| `structurecc-extract-multipanel` | Multi-panel figures (A, B, C, D) with per-panel extraction |
-| `structurecc-extract-generic` | Photographs, schematics, equations, other visuals |
----
-## Quality Verification
-Every extraction is verified against the source image:
-```json
-{
-  "scores": {
-    "completeness": 0.95,
-    "accuracy": 0.92,
-    "verbatim_compliance": 0.88,
-    "structure_correctness": 0.97,
-    "overall": 0.93
-  },
-  "pass": true,
-  "threshold": 0.90
-}
-```
-| Score | Meaning |
-|-------|---------|
-| **completeness** | Was every visible element captured? |
-| **accuracy** | Are values (numbers, stats) correct? |
-| **verbatim_compliance** | Was text copied exactly as shown? |
-| **structure_correctness** | Is the JSON structure valid? |
-**Auto-revision:** If score < 0.90, extraction is re-run with specific feedback. Max 2 attempts.
----
-## Before You Start
-You need two things:
-### 1. Node.js
-Check if you have it:
-```bash
-node --version
-```
-If you see a version number, you're good. If you see "command not found", download Node.js from **[nodejs.org](https://nodejs.org/)** and install it.
-### 2. Anthropic API Key or Pro/Max Plan
-You need one of these to use Claude Code:
-- **API key:** Get one at **[console.anthropic.com](https://console.anthropic.com/)**. Requires a payment method.
-- **Pro or Max plan:** If you subscribe to Claude Pro ($20/mo) or Max ($100/mo), you can use Claude Code without a separate API key.
 ---
-## Setup (5 minutes)
-### Step 1: Open your terminal
+## Requirements
-**Mac:** Press `Cmd + Space`, type `Terminal`, press Enter
-**Windows:** Press `Win + X`, click "Terminal" or "PowerShell"
-**Linux:** Press `Ctrl + Alt + T`
+- **Node.js** - [nodejs.org](https://nodejs.org/)
+- **Claude Code** - Requires API key or Pro/Max subscription
 ---
-### Step 2: Install Claude Code
+## Install
-Copy this command and paste it into your terminal:
+### Step 1: Install Claude Code
 ```bash
 npm install -g @anthropic-ai/claude-code
 ```
-Wait for it to finish.
----
-### Step 3: Install structurecc
+<p align="center">
+<img src="assets/screenshots/step0.png" alt="Install Claude Code" width="550">
+</p>
-Copy and run this:
+### Step 2: Install structurecc
 ```bash
 npx structurecc
 ```
-You will see a STRUCTURE banner and 8 agents being installed. You only do this once.
----
-### Step 4: Set up your document folder
-Create a folder with your document:
-```
-documents/
-└── document.pdf          ← your PDF, DOCX, or image
-```
-**Put your document in a folder. That's it.**
----
+<p align="center">
+<img src="assets/screenshots/step1.png" alt="Install structurecc" width="420">
+</p>
-### Step 5: Open Claude Code
+### Step 3: Start Claude Code
-Navigate to your document folder and start Claude Code:
+Navigate to your document folder and run:
 ```bash
 cd ~/Desktop/documents
 claude
 ```
-**Windows users:** Replace `~/Desktop/documents` with your actual path, like `C:\Users\YourName\Desktop\documents`
-The first time you run `claude`, it will ask for your API key. Paste it in.
----
+<p align="center">
+<img src="assets/screenshots/step3a.png" alt="Start Claude Code" width="460">
+</p>
-### Step 6: Run structure
+### Step 4: Run structure
-Now you are inside Claude Code. Type this command:
+Inside Claude Code:
 ```
 /structure document.pdf
 ```
-**Important:** The `/structure` command only works inside Claude Code. If you type it in your regular terminal, it will not work.
+<p align="center">
+<img src="assets/screenshots/step3.png" alt="Run /structure" width="520">
+</p>
-structurecc will:
-1. Extract every image from your document
-2. Classify each image (table, chart, heatmap, diagram, etc.)
-3. Spawn specialized extractors in parallel
-4. Verify each extraction against the source
-5. Auto-revise failed extractions
-6. Combine everything into `STRUCTURED.md`
+Supports **PDF**, **DOCX**, **PNG**, **JPG**, and **TIFF**.
 ---
-## What You Get
-A comprehensive output directory with full traceability:
+## Output
 ```
 document_extracted/
-├── images/                    # All extracted visuals
-├── classifications/           # Phase 1: type detection
-│   ├── element_001_class.json
-│   └── ...
-├── extractions/              # Phase 2: JSON extractions
-│   ├── element_001.json
-│   └── ...
-├── verifications/            # Phase 3: quality scores
-│   ├── element_001_verify.json
-│   └── ...
-├── elements/                 # Markdown per element
-│   ├── element_001.md
-│   └── ...
-├── STRUCTURED.md             # Combined output
-└── extraction_report.json    # Quality metrics summary
-```
-### Quality Report
-```json
-{
-  "document": "clinical_trial.pdf",
-  "pipeline_version": "2.0.0",
-  "elements_total": 15,
-  "elements_passed": 13,
-  "elements_revised": 2,
-  "elements_human_review": 0,
-  "average_quality_score": 0.92
-}
-```
-### Example: Table Extraction
-```markdown
-# Patient Demographics
-**Type:** Table
-**Source:** Page 3, clinical_trial.pdf
-## Data
-| Characteristic | Treatment (n=245) | Placebo (n=248) | P-value |
-|---|---|---|---|
-| Age, years | 54.3 ± 12.1 | 53.8 ± 11.9 | 0.67 |
-| Male (%) | 58.4 | 56.9 | 0.73 |
-| BMI (kg/m²) | 28.7 ± 4.2 | 28.4 ± 4.1 | 0.42 |
-## Footnotes
-- * Missing data excluded from analysis
-- † Adjusted for baseline
-```
-### Example: Kaplan-Meier Extraction
-```markdown
-# Kaplan-Meier Survival Curves
-**Type:** kaplan_meier
-**Source:** Page 7, clinical_trial.pdf
-## Axes
-- **X-axis:** Time (Days) Since HSV Diagnosis
-  - Range: 0 to 7000
-- **Y-axis:** Cumulative Risk of Dementia
-  - Range: 0 to 0.6
-## Legend
-- **HSV: Dementia Risk**: purple solid
-- **Control: Dementia Risk**: dark blue solid
-- **HSV: Dementia Risk 95% CI**: light purple shaded area
-- **Control: Dementia Risk 95% CI**: light orange shaded area
-## Statistical Annotations
-- p_value: < 0.001
-- hazard_ratio: 1.52 (95% CI: 1.38-1.68)
-## Risk Table
-| Time (days) | 0 | 1000 | 2000 | 3000 | 4000 | 5000 | 6000 | 7000 |
-|---|---|---|---|---|---|---|---|---|
-| HSV | 8,362 | 7,891 | 6,543 | 5,102 | 3,876 | 2,654 | 1,432 | 521 |
-| Control | 41,810 | 39,765 | 33,421 | 26,543 | 19,876 | 13,543 | 7,654 | 2,876 |
+├── images/              # Extracted visuals
+├── elements/            # Markdown per element
+└── STRUCTURED.md        # Combined output
 ```
 ---
-## Cost
-| Document | Elements | ~Cost |
-|----------|----------|-------|
-| Simple paper | 5-10 | $1-$2 |
-| Full paper | 15-25 | $3-$6 |
-| Dense report | 40+ | $8-$15 |
-Uses Claude's multimodal vision with model-appropriate routing:
-- **Haiku** for classification (fast, cheap)
-- **Opus** for extraction (highest quality)
-- **Sonnet** for verification (balanced)
----
-## Supported Formats
-- **PDF** - Extracts embedded images via PyMuPDF
-- **DOCX** - Extracts images from Word's media folder
-- **PNG/JPG/TIFF** - Analyzes images directly
----
 ## Troubleshooting
-**"npm: command not found"**
-You need Node.js. Download it from [nodejs.org](https://nodejs.org/).
-**"bash: /structure: No such file or directory"**
-You typed `/structure` in your regular terminal. You need to type it inside Claude Code. First run `claude` to start Claude Code, then type `/structure`.
-**"No images found"**
-Make sure your PDF contains actual images, not just text. Some PDFs render everything as text.
-**Low quality scores**
-Check `verifications/` for specific issues. Complex tables or poor image quality may need human review.
-**Claude Code asks for an API key**
-Either get an API key at [console.anthropic.com](https://console.anthropic.com/), or subscribe to Claude Pro/Max at [claude.ai](https://claude.ai/).
+| Issue | Solution |
+|-------|----------|
+| `npm: command not found` | Install Node.js from [nodejs.org](https://nodejs.org/) |
+| `/structure: No such file` | Run `claude` first, then type `/structure` inside Claude Code |
+| No images found | PDF may be text-only with no embedded images |
 ---
@@ -357,24 +102,6 @@ npx structurecc --uninstall
 ---
-## Upgrade from v1.x
-Just run the installer again:
-```bash
-npx structurecc
-```
-The installer automatically removes the old `structurecc-extractor` and installs the new 8-agent pipeline.
----
 ## License
 MIT
----
-<p align="center">
-<strong>Verbatim in. Quality verified out.</strong>
-</p>

package/package.json CHANGED

@@ -1,24 +1,19 @@
 {
   "name": "structurecc",
-  "version": "2.0.0",
-  "description": "Agentic document structuring for Claude Code with verbatim extraction and quality verification. 3-phase pipeline: Classify → Extract → Verify.",
+  "version": "2.0.1",
+  "description": "Extract structured data from PDFs, Word docs, and images using Claude Code.",
   "keywords": [
     "document-extraction",
     "pdf",
     "structure",
-    "agentic",
     "claude-code",
     "llm",
     "multimodal",
     "tables",
     "figures",
     "charts",
-    "heatmaps",
     "markdown",
-    "ai-agents",
-    "ocr",
-    "verbatim",
-    "quality-assurance"
+    "ai-agents"
   ],
   "author": "James Weatherhead",
   "license": "MIT",