structurecc 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 James Weatherhead
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,179 @@
1
+ <h1 align="center">STRUCTUREIT</h1>
2
+
3
+ <p align="center">
4
+ <strong>Agentic Document Extraction for Claude Code</strong><br>
5
+ <em>One command. Every figure. Every table.</em>
6
+ </p>
7
+
8
+ <p align="center">
9
+ <a href="https://www.npmjs.com/package/structurecc"><img src="https://img.shields.io/npm/v/structurecc.svg" alt="npm version"></a>
10
+ <a href="https://github.com/JamesWeatherhead/structurecc/stargazers"><img src="https://img.shields.io/github/stars/JamesWeatherhead/structurecc" alt="GitHub stars"></a>
11
+ <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
12
+ </p>
13
+
14
+ <p align="center">
15
+ <em>Unstructured in. Structured out.</em>
16
+ </p>
17
+
18
+ ---
19
+
20
+ ## The Problem
21
+
22
+ You have a PDF with figures, tables, and charts. You need that data.
23
+
24
+ **Manual approach:** Screenshot each figure. Copy tables cell by cell. Spend hours on one document.
25
+
26
+ **structurecc:**
27
+ ```
28
+ /structure paper.pdf
29
+ ```
30
+
31
+ Done.
32
+
33
+ ---
34
+
35
+ ## What It Does
36
+
37
+ ```
38
+ PDF ───▶ [Agent 1] ───┐
39
+ [Agent 2] ───┤
40
+ [Agent 3] ───┼───▶ STRUCTURED.md
41
+ [Agent N] ───┘
42
+ ```
43
+
44
+ 1. **Extracts** every image from your document
45
+ 2. **Spawns** one AI agent per image (running in parallel)
46
+ 3. **Analyzes** each element exhaustively
47
+ 4. **Outputs** clean, structured markdown
48
+
49
+ Like [Landing AI's Agentic Document Extraction](https://landing.ai/agentic-document-extraction), but orchestrated from your local Claude Code session (the images are still analyzed by Claude through your API key or subscription).
50
+
51
+ ---
52
+
53
+ ## Install
54
+
55
+ ```bash
56
+ npx structurecc
57
+ ```
58
+
59
+ ## Use
60
+
61
+ In Claude Code:
62
+
63
+ ```
64
+ /structure path/to/document.pdf
65
+ ```
66
+
67
+ Works with: **PDF, DOCX, PNG, JPG**
68
+
69
+ ---
70
+
71
+ ## What You Get
72
+
73
+ ```
74
+ document_extracted/
75
+ ├── images/ # All extracted visuals
76
+ ├── elements/ # One markdown file per element
77
+ │ ├── element_1.md # Table fully extracted
78
+ │ ├── element_2.md # Figure analyzed
79
+ │ └── ...
80
+ └── STRUCTURED.md # Everything combined
81
+ ```
82
+
83
+ ### Example: Table Extraction
84
+
85
+ ```markdown
86
+ # Patient Demographics
87
+
88
+ **Type:** Table
89
+ **Source:** Page 3, clinical_trial.pdf
90
+
91
+ ## Content
92
+
93
+ | Group | N | Age (mean±SD) | Male (%) |
94
+ |-------|---|---------------|----------|
95
+ | Treatment | 245 | 54.3±12.1 | 58.4 |
96
+ | Placebo | 248 | 53.8±11.9 | 56.9 |
97
+ | p-value | - | 0.67 | 0.73 |
98
+
99
+ ## Notes
100
+ - * Missing data excluded
101
+ ```
102
+
103
+ ### Example: Figure Analysis
104
+
105
+ ```markdown
106
+ # Kaplan-Meier Survival Curves
107
+
108
+ **Type:** Figure
109
+ **Source:** Page 7, clinical_trial.pdf
110
+
111
+ ## Content
112
+
113
+ Survival curves comparing treatment (blue) vs placebo (red) over 24 months.
114
+
115
+ - 12-month survival: Treatment 0.89, Placebo 0.78
116
+ - 24-month survival: Treatment 0.76, Placebo 0.61
117
+ - Log-rank p = 0.003
118
+
119
+ ## Labels & Text
120
+ - "Survival Probability"
121
+ - "Time (months)"
122
+ - "Treatment (n=245)"
123
+ - "Placebo (n=248)"
124
+ ```
125
+
126
+ ---
127
+
128
+ ## How It Works
129
+
130
+ 1. **Extract** - PyMuPDF pulls all embedded images from a PDF (DOCX files are unzipped and their `word/media/` folder copied)
131
+ 2. **Swarm** - Launch N parallel agents, one per image
132
+ 3. **Analyze** - Each agent reads its image, extracts everything, writes markdown
133
+ 4. **Combine** - Merge all element files into STRUCTURED.md
134
+
135
+ Agents run simultaneously. 10 images = 10 agents = fast.
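Claude Code schedules the parallel Task calls itself, so this step involves no Python. Purely to illustrate the fan-out/fan-in shape described above, here is a minimal sketch using the standard library's `concurrent.futures`; `run_swarm`, `combine`, and the `analyze` callable (a stand-in for one extraction agent) are hypothetical names, not part of structurecc.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_swarm(images: List[str], analyze: Callable[[str], str], max_workers: int = 10) -> List[str]:
    """Fan out: call `analyze` once per image, concurrently (hypothetical stand-in for agents)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the calls in parallel but yields results in input order,
        # so result i still belongs to image i.
        return list(pool.map(analyze, images))


def combine(results: List[str]) -> str:
    """Fan in: merge per-element markdown under numbered headings."""
    return "\n\n".join(f"### Element {i}\n{md}" for i, md in enumerate(results, start=1))


if __name__ == "__main__":
    stub = lambda path: f"# Stub extraction of {path}"
    print(combine(run_swarm(["p1_img1.png", "p3_img1.png"], stub)))
```

With enough concurrency, wall-clock time is roughly bounded by the slowest element rather than the sum of all of them, which is the sense in which more agents means faster.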
136
+
137
+ ---
138
+
139
+ ## Cost
140
+
141
+ Cost depends on document complexity, roughly tracking the number of visual elements:
142
+
143
+ | Document | Elements | Approx. cost (USD) |
144
+ |----------|----------|-------|
145
+ | Simple paper | 5-10 | $0.50-$1 |
146
+ | Full paper | 15-25 | $2-$4 |
147
+ | Dense report | 40+ | $5-$10 |
148
+
149
+ Uses Claude's multimodal vision. Works best with **Opus 4.5**.
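Dividing each row's cost range by its element range implies roughly $0.10 to $0.25 per element ($0.50 / 5 and $1 / 10 both give $0.10; $10 / 40 gives $0.25). A back-of-the-envelope estimator under that assumption; `estimate_cost` is a hypothetical helper and the rates are inferred from the table above, not from any pricing page.

```python
def estimate_cost(n_elements: int, low_rate: float = 0.10, high_rate: float = 0.25) -> tuple:
    """Rough USD range for a document with `n_elements` visual elements.

    Default per-element rates are inferred from the README's cost table
    (an assumption); adjust them to match your own usage.
    """
    if n_elements < 0:
        raise ValueError("n_elements must be non-negative")
    return (round(n_elements * low_rate, 2), round(n_elements * high_rate, 2))


print(estimate_cost(20))  # (2.0, 5.0)
```

The band is deliberately wide: the table's rows imply different rates, and actual cost also depends on image resolution and on which model runs the agents.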
150
+
151
+ ---
152
+
153
+ ## Requirements
154
+
155
+ - Node.js
156
+ - Claude Code (`npm install -g @anthropic-ai/claude-code`)
157
+ - Anthropic API key or Claude Pro/Max
158
+
159
+ PyMuPDF is installed automatically (via `pip3`) if needed, so Python 3 must be available.
160
+
161
+ ---
162
+
163
+ ## Uninstall
164
+
165
+ ```bash
166
+ npx structurecc --uninstall
167
+ ```
168
+
169
+ ---
170
+
171
+ ## License
172
+
173
+ MIT
174
+
175
+ ---
176
+
177
+ <p align="center">
178
+ <strong>Stop copying tables by hand.</strong>
179
+ </p>
@@ -0,0 +1,70 @@
1
+ ---
2
+ name: structureit-extractor
3
+ description: Extract and analyze any visual element from documents
4
+ ---
5
+
6
+ # Visual Element Extractor
7
+
8
+ You are an expert at extracting structured data from document images.
9
+
10
+ ## Your Task
11
+
12
+ Given an image from a document, you:
13
+
14
+ 1. **Identify** what type of visual this is
15
+ 2. **Extract** every piece of data and text visible
16
+ 3. **Structure** the output as clean markdown
17
+ 4. **Ground** your extraction to the source location
18
+
19
+ ## Visual Types You Handle
20
+
21
+ - Tables (any format - simple, complex, merged cells)
22
+ - Figures (scientific, photographs, illustrations)
23
+ - Charts (bar, line, pie, scatter, area, box plots)
24
+ - Heatmaps (color matrices, correlation plots, expression data)
25
+ - Diagrams (flowcharts, architectures, schematics, networks)
26
+ - Forms (fields, checkboxes, filled forms)
27
+ - Mixed (images containing multiple element types)
28
+
29
+ ## Extraction Rules
30
+
31
+ 1. **Be exhaustive** - Extract EVERY visible data point, label, annotation
32
+ 2. **Be exact** - Copy text verbatim, preserve numbers precisely
33
+ 3. **Be structured** - Output clean markdown tables when appropriate
34
+ 4. **Be honest** - Mark unclear items with [?], note confidence levels
35
+ 5. **Be grounded** - Always cite source page/location
36
+
37
+ ## Output Format
38
+
39
+ ```markdown
40
+ # [Descriptive Title Based on Content]
41
+
42
+ **Type:** [table/figure/chart/heatmap/diagram/form/mixed]
43
+ **Source:** Page [N], [document name]
44
+
45
+ ## Content
46
+
47
+ [Primary extraction - tables as markdown tables, descriptions for figures,
48
+ data points for charts, etc.]
49
+
50
+ ## Labels & Annotations
51
+
52
+ [All text visible in the image, verbatim]
53
+
54
+ ## Notes
55
+
56
+ [Extraction observations, confidence levels, anything unclear]
57
+ ```
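Because every agent writes to a fixed template, element files can be checked mechanically before they are merged. A minimal sketch, assuming the headings in this Output Format; `check_element` and `AGENT_TYPES` are hypothetical names. The /structure command's inline prompt asks for `## Labels & Text` and an `other` type instead, so both the labels heading and the allowed types are parameters.

```python
import re
from typing import Iterable, List

# Types listed in this agent's Output Format; the /structure prompt uses "other" instead of form/mixed.
AGENT_TYPES = {"table", "figure", "chart", "heatmap", "diagram", "form", "mixed"}


def check_element(md: str, labels_heading: str = "## Labels & Annotations",
                  allowed_types: Iterable[str] = AGENT_TYPES) -> List[str]:
    """Return a list of template problems; an empty list means the file looks well-formed."""
    allowed = {t.lower() for t in allowed_types}
    problems = []
    if not re.search(r"^# \S", md, flags=re.MULTILINE):
        problems.append("missing '# Title' line")
    match = re.search(r"^\*\*Type:\*\*[ \t]*(\S+)", md, flags=re.MULTILINE)
    if match is None:
        problems.append("missing **Type:** line")
    elif match.group(1).lower() not in allowed:
        problems.append(f"unexpected type {match.group(1)!r}")
    if not re.search(r"^\*\*Source:\*\*[ \t]*Page \d+", md, flags=re.MULTILINE):
        problems.append("missing **Source:** Page N line")
    # Each required section must appear as its own heading line.
    for heading in ("## Content", labels_heading, "## Notes"):
        if not re.search(rf"^{re.escape(heading)}\s*$", md, flags=re.MULTILINE):
            problems.append(f"missing section {heading!r}")
    return problems
```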
58
+
59
+ ## Special Handling
60
+
61
+ **For complex heatmaps:**
62
+ - Extract all row/column labels
63
+ - Describe the color scale
64
+ - Identify patterns and notable regions
65
+ - Sample representative values if matrix is very large
66
+
67
+ **For multi-panel figures:**
68
+ - Label each panel (A, B, C, etc.)
69
+ - Extract each panel's content separately
70
+ - Note relationships between panels
package/bin/install.js ADDED
@@ -0,0 +1,173 @@
1
+ #!/usr/bin/env node
2
+
3
+ const fs = require('fs');
4
+ const path = require('path');
5
+ const os = require('os');
6
+
7
+ const VERSION = '1.0.0';
8
+ const PACKAGE_NAME = 'structurecc';
9
+
10
+ // Colors
11
+ const colors = {
12
+ reset: '\x1b[0m',
13
+ bright: '\x1b[1m',
14
+ dim: '\x1b[2m',
15
+ red: '\x1b[31m',
16
+ green: '\x1b[32m',
17
+ yellow: '\x1b[33m',
18
+ cyan: '\x1b[36m',
19
+ magenta: '\x1b[35m',
20
+ white: '\x1b[37m'
21
+ };
22
+
23
+ function log(msg, color = '') {
24
+ console.log(`${color}${msg}${colors.reset}`);
25
+ }
26
+
27
+ function banner() {
28
+ console.log(`
29
+ ${colors.cyan}
30
+ ╔══════════════════════════════════════════════════════════════════════════════╗
31
+ ║ ║
32
+ ║ ███████╗████████╗██████╗ ██╗ ██╗ ██████╗████████╗██╗ ██╗██████╗ ███████╗║
33
+ ║ ██╔════╝╚══██╔══╝██╔══██╗██║ ██║██╔════╝╚══██╔══╝██║ ██║██╔══██╗██╔════╝║
34
+ ║ ███████╗ ██║ ██████╔╝██║ ██║██║ ██║ ██║ ██║██████╔╝█████╗ ║
35
+ ║ ╚════██║ ██║ ██╔══██╗██║ ██║██║ ██║ ██║ ██║██╔══██╗██╔══╝ ║
36
+ ║ ███████║ ██║ ██║ ██║╚██████╔╝╚██████╗ ██║ ╚██████╔╝██║ ██║███████╗║
37
+ ║ ╚══════╝ ╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝║
38
+ ║ ║
39
+ ║ ${colors.reset}${colors.bright}Agentic Document Extraction${colors.reset}${colors.cyan} ║
40
+ ║ ${colors.reset}${colors.dim}One command. Every figure. Every table.${colors.reset}${colors.cyan} ║
41
+ ║ ║
42
+ ╠══════════════════════════════════════════════════════════════════════════════╣
43
+ ║ ║
44
+ ║ ${colors.reset}${colors.yellow}PDF${colors.cyan} ───▶ ${colors.green}[Agent 1]${colors.cyan} ───┐ ║
45
+ ║ ${colors.green}[Agent 2]${colors.cyan} ───┤ ║
46
+ ║ ${colors.green}[Agent 3]${colors.cyan} ───┼───▶ ${colors.magenta}STRUCTURED.md${colors.cyan} ║
47
+ ║ ${colors.green}[Agent N]${colors.cyan} ───┘ ║
48
+ ║ ║
49
+ ║ ${colors.reset}${colors.white}Unstructured in. Structured out.${colors.reset}${colors.cyan} ║
50
+ ║ ║
51
+ ╚══════════════════════════════════════════════════════════════════════════════╝
52
+ ${colors.reset}
53
+ ${colors.bright}structurecc${colors.reset} v${VERSION}
54
+ `);
55
+ }
56
+
57
+ function getClaudeDir() {
58
+ return path.join(os.homedir(), '.claude');
59
+ }
60
+
61
+ function ensureDir(dir) {
62
+ if (!fs.existsSync(dir)) {
63
+ fs.mkdirSync(dir, { recursive: true });
64
+ }
65
+ }
66
+
67
+ function copyDir(src, dest) {
68
+ ensureDir(dest);
69
+ const entries = fs.readdirSync(src, { withFileTypes: true });
70
+
71
+ for (const entry of entries) {
72
+ const srcPath = path.join(src, entry.name);
73
+ const destPath = path.join(dest, entry.name);
74
+
75
+ if (entry.isDirectory()) {
76
+ copyDir(srcPath, destPath);
77
+ } else {
78
+ fs.copyFileSync(srcPath, destPath);
79
+ }
80
+ }
81
+ }
82
+
83
+ function install() {
84
+ const claudeDir = getClaudeDir();
85
+ const commandsDir = path.join(claudeDir, 'commands', 'structure');
86
+ const agentsDir = path.join(claudeDir, 'agents');
87
+
88
+ const packageDir = path.resolve(__dirname, '..');
89
+ const srcCommandsDir = path.join(packageDir, 'commands', 'structure');
90
+ const srcAgentsDir = path.join(packageDir, 'agents');
91
+
92
+ log('Installing structurecc...', colors.yellow);
93
+ log('');
94
+
95
+ // Install command
96
+ if (fs.existsSync(srcCommandsDir)) {
97
+ copyDir(srcCommandsDir, commandsDir);
98
+ log(' ✓ Installed /structure command', colors.green);
99
+ }
100
+
101
+ // Install agents
102
+ if (fs.existsSync(srcAgentsDir)) {
103
+ const agentFiles = fs.readdirSync(srcAgentsDir);
104
+ ensureDir(agentsDir);
105
+ for (const file of agentFiles) {
106
+ if (file.startsWith('structureit-')) {
107
+ fs.copyFileSync(
108
+ path.join(srcAgentsDir, file),
109
+ path.join(agentsDir, file)
110
+ );
111
+ log(` ✓ Installed ${file.replace('.md', '')}`, colors.green);
112
+ }
113
+ }
114
+ }
115
+
116
+ log('');
117
+ log(`${colors.green}Done!${colors.reset}`);
118
+ log('');
119
+ log(`Run in Claude Code:`, colors.bright);
120
+ log(` /structure path/to/document.pdf`, colors.cyan);
121
+ log('');
122
+ log(`${colors.dim}Supports: PDF, DOCX, PNG, JPG, TIFF${colors.reset}`);
123
+ log('');
124
+ }
125
+
126
+ function uninstall() {
127
+ const claudeDir = getClaudeDir();
128
+ const commandsDir = path.join(claudeDir, 'commands', 'structure');
129
+ const agentsDir = path.join(claudeDir, 'agents');
130
+
131
+ log('Uninstalling structurecc...', colors.yellow);
132
+
133
+ if (fs.existsSync(commandsDir)) {
134
+ fs.rmSync(commandsDir, { recursive: true });
135
+ log(' ✓ /structure command removed', colors.green);
136
+ }
137
+
138
+ if (fs.existsSync(agentsDir)) {
139
+ const agentFiles = fs.readdirSync(agentsDir);
140
+ for (const file of agentFiles) {
141
+ if (file.startsWith('structureit-')) {
142
+ fs.unlinkSync(path.join(agentsDir, file));
143
+ log(` ✓ Removed ${file}`, colors.green);
144
+ }
145
+ }
146
+ }
147
+
148
+ log('');
149
+ log('Uninstall complete.', colors.green);
150
+ log('');
151
+ }
152
+
153
+ // Main
154
+ const args = process.argv.slice(2);
155
+
156
+ banner();
157
+
158
+ if (args.includes('--uninstall') || args.includes('-u')) {
159
+ uninstall();
160
+ } else if (args.includes('--help') || args.includes('-h')) {
161
+ log('Usage: npx structurecc [options]', colors.bright);
162
+ log('');
163
+ log('Options:', colors.bright);
164
+ log(' --help, -h Show this help', colors.dim);
165
+ log(' --uninstall, -u Remove from Claude Code', colors.dim);
166
+ log('');
167
+ log('After install, use in Claude Code:', colors.bright);
168
+ log(' /structure path/to/document.pdf', colors.cyan);
169
+ log(' /structure path/to/document.docx', colors.cyan);
170
+ log('');
171
+ } else {
172
+ install();
173
+ }
@@ -0,0 +1,242 @@
1
+ ---
2
+ name: structure
3
+ description: Extract structured data from PDFs and Word docs using AI agent swarms
4
+ arguments:
5
+ - name: path
6
+ description: Path to document (PDF, DOCX, or image)
7
+ required: true
8
+ ---
9
+
10
+ # /structure - Agentic Document Extraction
11
+
12
+ Turn complex documents into structured markdown using parallel AI subagents.
13
+
14
+ ## Overview
15
+
16
+ 1. Extract all images from the document
17
+ 2. Spawn ONE subagent PER IMAGE (all in parallel)
18
+ 3. Each agent analyzes its image and writes structured markdown
19
+ 4. Combine into final STRUCTURED.md
20
+
21
+ ## Step 1: Setup
22
+
23
+ Create output directory next to the document:
24
+ ```
25
+ <document_name>_extracted/
26
+ ├── images/ # Extracted visuals
27
+ ├── elements/ # Per-element markdown
28
+ └── STRUCTURED.md # Final output
29
+ ```
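A minimal Python sketch of this setup step, assuming `<document_name>` means the file name without its extension (so `paper.pdf` gets `paper_extracted/`, matching the README's `document_extracted/` example); `make_output_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def make_output_dirs(document_path: str) -> Path:
    """Create <document_name>_extracted/{images,elements} next to the document."""
    doc = Path(document_path).resolve()
    out = doc.parent / f"{doc.stem}_extracted"
    for sub in ("images", "elements"):
        # parents=True also creates `out`; exist_ok=True makes reruns harmless.
        (out / sub).mkdir(parents=True, exist_ok=True)
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_output_dirs(str(Path(tmp) / "paper.pdf")))
```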
30
+
31
+ ## Step 2: Extract Images
32
+
33
+ **For PDF files** - Use PyMuPDF:
34
+
35
+ ```python
36
+ import fitz
37
+ import os
38
+
39
+ pdf_path = "<document_path>"
40
+ output_dir = "<output_dir>"
41
+ images_dir = os.path.join(output_dir, "images")
42
+ os.makedirs(images_dir, exist_ok=True)
43
+
44
+ doc = fitz.open(pdf_path)
45
+ extracted = []
46
+
47
+ for page_num in range(len(doc)):
48
+ page = doc[page_num]
49
+ for img_idx, img in enumerate(page.get_images()):
50
+ xref = img[0]
51
+ pix = fitz.Pixmap(doc, xref)
52
+ if pix.n - pix.alpha > 3:
53
+ pix = fitz.Pixmap(fitz.csRGB, pix)
54
+
55
+ img_name = f"p{page_num + 1}_img{img_idx + 1}.png"
56
+ pix.save(os.path.join(images_dir, img_name))
57
+ extracted.append({"path": os.path.join(images_dir, img_name), "page": page_num + 1, "name": img_name})
58
+ pix = None
59
+
60
+ doc.close()
61
+ print(f"Extracted {len(extracted)} images")
62
+ ```
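Two gaps in the loop above are worth knowing about. `page.get_images()` lists only embedded raster images, so charts drawn as vector graphics and tables typeset as text never reach an agent; and an image reused on several pages (a journal logo, say) is saved and analyzed once per page. A sketch of both fixes, with hypothetical helper names: `plan_image_names` keeps only the first occurrence of each xref, and `render_pages` rasterizes whole pages with `page.get_pixmap(dpi=...)` (available in recent PyMuPDF releases) as a fallback.

```python
import os


def plan_image_names(page_xrefs):
    """page_xrefs: [(page_number, [xref, ...]), ...] in page order.

    Returns [(xref, page_number, file_name), ...], keeping only the first
    occurrence of each xref so a repeated logo is analyzed once.
    """
    seen = set()
    plan = []
    for page_num, xrefs in page_xrefs:
        for idx, xref in enumerate(xrefs, start=1):
            if xref in seen:
                continue
            seen.add(xref)
            plan.append((xref, page_num, f"p{page_num}_img{idx}.png"))
    return plan


def render_pages(pdf_path, images_dir, dpi=150):
    """Rasterize whole pages so vector charts and text-based tables are captured too."""
    import fitz  # PyMuPDF; imported lazily so plan_image_names works without it

    os.makedirs(images_dir, exist_ok=True)
    rendered = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            name = f"page{page.number + 1}.png"
            path = os.path.join(images_dir, name)
            page.get_pixmap(dpi=dpi).save(path)
            rendered.append({"path": path, "page": page.number + 1, "name": name})
    return rendered
```

To feed `plan_image_names`, collect `[(n + 1, [img[0] for img in doc[n].get_images()]) for n in range(len(doc))]`. Rendering whole pages likely costs more per agent than cropped figures, in exchange for not missing vector content.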
63
+
64
+ **For DOCX files** - Unzip and extract media:
65
+
66
+ ```python
67
+ from zipfile import ZipFile
68
+ import os
69
+
70
+ docx_path = "<document_path>"
71
+ output_dir = "<output_dir>"
72
+ images_dir = os.path.join(output_dir, "images")
73
+ os.makedirs(images_dir, exist_ok=True)
74
+
75
+ extracted = []
76
+ with ZipFile(docx_path, 'r') as z:
77
+ for f in z.namelist():
78
+ if f.startswith('word/media/'):
79
+ name = os.path.basename(f)
80
+ path = os.path.join(images_dir, name)
81
+ with z.open(f) as src, open(path, 'wb') as dst:
82
+ dst.write(src.read())
83
+ extracted.append({"path": path, "name": name})
84
+
85
+ print(f"Extracted {len(extracted)} images")
86
+ ```
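Not everything under `word/media/` is necessarily an image a model can read: Office documents often store charts and drawings as EMF or WMF vector files. The Anthropic vision documentation lists JPEG, PNG, GIF and WebP as supported formats, so a check like the following (hypothetical `split_media` helper) can flag files that may need converting, for example to PNG, before an agent is spawned for them. TIFF, which the installer's "Supports" line mentions, falls in the same bucket.

```python
import os

# Image formats listed as supported in Anthropic's vision documentation.
SUPPORTED = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_media(names):
    """Split extracted media file names into (ready, needs_conversion)."""
    ready, convert = [], []
    for name in names:
        ext = os.path.splitext(name)[1].lower()  # case-insensitive extension check
        (ready if ext in SUPPORTED else convert).append(name)
    return ready, convert
```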
87
+
88
+ **For standalone images** - Skip extraction and hand the image straight to a single agent.
89
+
90
+ Also extract main text:
91
+ - PDF: `page.get_text()` for each page
92
+ - DOCX: `textutil -convert txt "<path>" -stdout` (macOS only)
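For Linux or Windows, a standard-library fallback can read paragraph text straight out of `word/document.xml` inside the DOCX zip. A minimal sketch (hypothetical `docx_text` helper) that reads only `<w:t>` text runs and ignores headers, footers, footnotes, tabs and line breaks:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return paragraph text from a .docx given a path, file object, or raw bytes."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    # iter() is recursive, so paragraphs inside table cells are included too.
    for p in root.iter(f"{W}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{W}t")))
    return "\n".join(paragraphs)
```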
93
+
94
+ ## Step 3: Spawn Agent Swarm
95
+
96
+ **CRITICAL:** Launch ALL agents in ONE message with MULTIPLE Task tool calls.
97
+
98
+ For EACH extracted image:
99
+
100
+ ```
101
+ Task(
102
+ subagent_type: "general-purpose",
103
+ description: "Extract element [N]",
104
+ prompt: """
105
+ You are extracting structured data from a document image.
106
+
107
+ **Image:** <full_path_to_image>
108
+ **Source:** Page <page_number> of <document_name>
109
+ **Output:** Write to <output_dir>/elements/element_<N>.md
110
+
111
+ ## Instructions
112
+
113
+ 1. Read the image carefully
114
+ 2. Identify what it contains (table, figure, chart, heatmap, diagram, etc.)
115
+ 3. Extract ALL visible data - be exhaustive
116
+ 4. Structure as clean markdown
117
+
118
+ ## Output Format
119
+
120
+ Write this to the output file:
121
+
122
+ ```markdown
123
+ # [Descriptive Title]
124
+
125
+ **Type:** [table/figure/chart/heatmap/diagram/other]
126
+ **Source:** Page [N], [document name]
127
+
128
+ ## Content
129
+
130
+ [For tables: markdown table with all data]
131
+ [For figures: detailed description + all visible text/labels]
132
+ [For charts: data points, axes, trends]
133
+ [For heatmaps: labels, color scale, patterns]
134
+ [For diagrams: components, relationships, flow]
135
+
136
+ ## Labels & Text
137
+
138
+ [Every piece of text visible, verbatim]
139
+
140
+ ## Notes
141
+
142
+ [Confidence level, unclear items marked with [?]]
143
+ ```
144
+
145
+ Be thorough. Extract every data point.
146
+ """
147
+ )
148
+ ```
149
+
150
+ For example, 10 images means 10 Task calls in ONE message, which then run in parallel.
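The Task calls are issued by Claude Code, not by a script, but their per-image fields follow a simple template. The hypothetical `build_tasks` below renders an abbreviated version of that prompt and keeps apart two numbers that are easy to conflate: the element index (one per image, used in `element_<N>.md`) and the page number (several images can share a page, and the DOCX snippet records no `page` key at all).

```python
from typing import Dict, List


def build_tasks(extracted: List[Dict], document_name: str, output_dir: str) -> List[Dict[str, str]]:
    """Return one Task payload (with an abbreviated prompt) per extracted image."""
    tasks = []
    for n, item in enumerate(extracted, start=1):  # n is the element index, not the page
        page = item.get("page", "?")  # DOCX and standalone images carry no page number
        tasks.append({
            "subagent_type": "general-purpose",
            "description": f"Extract element {n}",
            "prompt": (
                "You are extracting structured data from a document image.\n\n"
                f"**Image:** {item['path']}\n"
                f"**Source:** Page {page} of {document_name}\n"
                f"**Output:** Write to {output_dir}/elements/element_{n}.md\n"
            ),
        })
    return tasks
```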
151
+
152
+ ## Step 4: Extract Main Text
153
+
154
+ Save document text to `elements/main_text.md`:
155
+
156
+ ```markdown
157
+ # Main Document Text
158
+
159
+ **Source:** [document name]
160
+
161
+ ---
162
+
163
+ [Full text extracted from document, preserving structure]
164
+ ```
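For PDFs, `page.get_text()` yields one string per page. A small sketch (hypothetical `format_main_text`) that fills this template and, as an added assumption beyond it, inserts a `## Page N` heading before each page so the text stays grounded to page numbers the way the element files are:

```python
from typing import List


def format_main_text(document_name: str, page_texts: List[str]) -> str:
    """Build elements/main_text.md with a '## Page N' marker before each page's text."""
    lines = ["# Main Document Text", "", f"**Source:** {document_name}", "", "---", ""]
    for number, text in enumerate(page_texts, start=1):
        lines += [f"## Page {number}", "", text.strip(), ""]
    return "\n".join(lines)
```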
165
+
166
+ ## Step 5: Combine Results
167
+
168
+ After all agents complete, read `elements/main_text.md` and every `elements/element_*.md` file, then create:
169
+
170
+ **STRUCTURED.md:**
171
+
172
+ ```markdown
173
+ # [Document Name] - Structured Extraction
174
+
175
+ **Original:** [filename]
176
+ **Extracted:** [date/time]
177
+ **Elements:** [N] visual elements processed
178
+
179
+ ---
180
+
181
+ ## Main Text
182
+
183
+ [Content from main_text.md]
184
+
185
+ ---
186
+
187
+ ## Visual Elements
188
+
189
+ ### Element 1
190
+ [Content from element_1.md]
191
+
192
+ ### Element 2
193
+ [Content from element_2.md]
194
+
195
+ [... continue for all elements ...]
196
+
197
+ ---
198
+
199
+ ## Extraction Summary
200
+
201
+ | # | Type | Source | Status |
202
+ |---|------|--------|--------|
203
+ | 1 | Table | Page 2 | ✓ |
204
+ | 2 | Figure | Page 3 | ✓ |
205
+ | ... | ... | ... | ... |
206
+ ```
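Two details matter when merging: a plain alphabetical sort puts `element_10.md` before `element_2.md`, and a glob such as `elements/*.md` also picks up `main_text.md`, which belongs in the Main Text section. A minimal sketch under those assumptions (hypothetical `natural_key`, `build_structured` and `combine_dir`; the Extraction Summary table is left out for brevity):

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    # Split digit runs out so "element_10.md" sorts after "element_2.md".
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def build_structured(doc_name: str, main_text: str, elements: Dict[str, str], when: str) -> str:
    """Assemble STRUCTURED.md text from the main text plus {file name: markdown}."""
    names = sorted(elements, key=natural_key)
    out = [
        f"# {doc_name} - Structured Extraction", "",
        f"**Original:** {doc_name}",
        f"**Extracted:** {when}",
        f"**Elements:** {len(names)} visual elements processed", "",
        "---", "", "## Main Text", "", main_text.strip(), "",
        "---", "", "## Visual Elements", "",
    ]
    for i, name in enumerate(names, start=1):
        out += [f"### Element {i}", elements[name].strip(), ""]
    return "\n".join(out)


def combine_dir(output_dir: str, doc_name: str) -> Path:
    """Read <output_dir>/elements and write <output_dir>/STRUCTURED.md."""
    elements_dir = Path(output_dir) / "elements"
    main_path = elements_dir / "main_text.md"
    main_text = main_path.read_text(encoding="utf-8") if main_path.exists() else ""
    # Only element_*.md files are visual elements; main_text.md is handled above.
    elements = {p.name: p.read_text(encoding="utf-8") for p in elements_dir.glob("element_*.md")}
    target = Path(output_dir) / "STRUCTURED.md"
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    target.write_text(build_structured(doc_name, main_text, elements, stamp), encoding="utf-8")
    return target
```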
207
+
208
+ ## Step 6: Display Results
209
+
210
+ ```
211
+ ╔═══════════════════════════════════════════════════════════╗
212
+ ║ EXTRACTION COMPLETE ║
213
+ ╠═══════════════════════════════════════════════════════════╣
214
+ ║ ║
215
+ ║ Document: [name] ║
216
+ ║ Output: [path]_extracted/ ║
217
+ ║ ║
218
+ ║ Extracted: [N] visual elements ║
219
+ ║ ║
220
+ ║ Files: ║
221
+ ║ images/ [N] extracted images ║
222
+ ║ elements/ [N] element markdown files ║
223
+ ║ STRUCTURED.md Combined output ║
224
+ ║ ║
225
+ ╚═══════════════════════════════════════════════════════════╝
226
+ ```
227
+
228
+ Then open it: `open "<output_dir>/STRUCTURED.md"` on macOS, or `xdg-open` on Linux.
229
+
230
+ ## Dependencies
231
+
232
+ Install PyMuPDF if not present:
233
+ ```bash
234
+ pip3 install PyMuPDF --quiet
235
+ ```
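A sketch of the "if not present" check (hypothetical `has_module` and `ensure_pymupdf`). PyMuPDF is imported as `fitz`, and `importlib.util.find_spec` tests for it without importing it; running pip as `sys.executable -m pip` installs into the same interpreter that will run the extraction script, which a bare `pip3` on `PATH` does not guarantee.

```python
import importlib.util
import subprocess
import sys


def has_module(name: str) -> bool:
    """True if `name` is importable by this interpreter (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


def ensure_pymupdf() -> None:
    # PyMuPDF is imported as `fitz`; install it only when it is missing.
    if not has_module("fitz"):
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet", "PyMuPDF"])
```

Calling `ensure_pymupdf()` once before the Step 2 snippet is enough; later runs skip the install.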
236
+
237
+ ## Tips
238
+
239
+ - Use the Opus model for the best extraction quality on complex visuals
240
+ - Each image = one agent, so cost and API usage scale with the number of images
241
+ - Agents run in parallel for speed
242
+ - Check individual element files if one extraction looks wrong
package/package.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "name": "structurecc",
3
+ "version": "1.0.0",
4
+ "description": "Agentic document extraction for Claude Code. One command. Every figure. Every table.",
5
+ "keywords": [
6
+ "document-extraction",
7
+ "pdf",
8
+ "structure",
9
+ "agentic",
10
+ "claude-code",
11
+ "llm",
12
+ "multimodal",
13
+ "tables",
14
+ "figures",
15
+ "markdown",
16
+ "ai-agents",
17
+ "ocr"
18
+ ],
19
+ "author": "James Weatherhead",
20
+ "license": "MIT",
21
+ "repository": {
22
+ "type": "git",
23
+ "url": "https://github.com/JamesWeatherhead/structurecc"
24
+ },
25
+ "homepage": "https://github.com/JamesWeatherhead/structurecc#readme",
26
+ "bin": {
27
+ "structurecc": "./bin/install.js"
28
+ },
29
+ "files": [
30
+ "bin/",
31
+ "commands/",
32
+ "agents/"
33
+ ]
34
+ }