structurecc 1.0.0 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md
CHANGED
@@ -1,75 +1,183 @@
-<h1 align="center">
+<h1 align="center">STRUCTURE</h1>
 
 <p align="center">
-<strong>
-<em>One command. Every figure. Every table.</em>
+<strong>Landing AI charges $500/month for agentic document extraction.<br>This is free.</strong>
 </p>
 
 <p align="center">
 <a href="https://www.npmjs.com/package/structurecc"><img src="https://img.shields.io/npm/v/structurecc.svg" alt="npm version"></a>
-<a href="https://github.com/JamesWeatherhead/structurecc/stargazers"><img src="https://img.shields.io/github/stars/JamesWeatherhead/structurecc" alt="GitHub stars"></a>
 <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
 </p>
 
 <p align="center">
-<
+<img src="assets/terminal.png" alt="structurecc" width="550">
+</p>
+
+<p align="center">
+<em>Works on Mac, Windows, and Linux</em>
 </p>
 
 ---
 
 ## The Problem
 
-You have a PDF with figures, tables, and charts. You need that data.
+You have a 50-page PDF with figures, tables, and charts. You need that data.
 
-**Manual approach:** Screenshot each figure.
+**Manual approach:** Screenshot each figure. Transcribe tables cell by cell. Spend hours on one document.
+
+**With structurecc:** One command. Walk away. Come back to perfectly structured markdown.
 
-**structurecc:**
 ```
 /structure paper.pdf
 ```
 
-Done.
+Spawns parallel AI agents. Each agent analyzes one visual element. All run simultaneously. Done in minutes, not hours.
+
+---
+
+## What is this?
+
+Give it a document. It extracts every image. Spawns one AI agent per image. Each agent exhaustively analyzes its element—tables become markdown tables, figures get descriptions, charts get data points extracted.
+
+Runs inside **[Claude Code](https://docs.anthropic.com/en/docs/claude-code)** (Anthropic's terminal assistant). One command. ~$0.50-$5 per document.
+
+Like [Landing AI's Agentic Document Extraction](https://landing.ai/agentic-document-extraction), but running locally via Claude Code.
 
 ---
 
-##
+## Before You Start
 
+You need two things:
+
+### 1. Node.js
+
+Check if you have it:
+
+```bash
+node --version
 ```
-
-
-
-
+
+If you see a version number, you're good. If you see "command not found", download Node.js from **[nodejs.org](https://nodejs.org/)** and install it.
+
+### 2. Anthropic API Key or Pro/Max Plan
+
+You need one of these to use Claude Code:
+
+- **API key:** Get one at **[console.anthropic.com](https://console.anthropic.com/)**. Requires a payment method.
+- **Pro or Max plan:** If you subscribe to Claude Pro ($20/mo) or Max ($100/mo), you can use Claude Code without a separate API key.
+
+---
+
+## Setup (5 minutes)
+
+### Step 1: Open your terminal
+
+**Mac:** Press `Cmd + Space`, type `Terminal`, press Enter
+
+**Windows:** Press `Win + X`, click "Terminal" or "PowerShell"
+
+**Linux:** Press `Ctrl + Alt + T`
+
+---
+
+### Step 2: Install Claude Code
+
+Copy this command and paste it into your terminal:
+
+```bash
+npm install -g @anthropic-ai/claude-code
 ```
 
-
-
-
-4. **Outputs** clean, structured markdown
+<p align="center">
+<img src="assets/screenshots/step0.png" alt="Install Claude Code" width="550">
+</p>
 
-
+Wait for it to finish.
 
 ---
 
-
+### Step 3: Install structurecc
+
+Copy and run this:
 
 ```bash
 npx structurecc
 ```
 
-
+<p align="center">
+<img src="assets/screenshots/step1.png" alt="Install structurecc" width="420">
+</p>
+
+You will see a STRUCTURE banner. That means it worked. You only do this once.
+
+---
+
+### Step 4: Set up your document folder
+
+Create a folder with your document:
+
+<p align="center">
+<img src="assets/screenshots/step2.png" alt="Folder structure" width="380">
+</p>
+
+```
+documents/
+├── document.pdf ← your PDF, DOCX, or image
+└── images/ ← extracted images go here (created automatically)
+    ├── figure_1.png
+    ├── table_2.png
+    └── chart_3.png
+```
+
+**Put your document in a folder. That's it.**
+
+---
+
+### Step 5: Open Claude Code
+
+Navigate to your document folder and start Claude Code:
+
+```bash
+cd ~/Desktop/documents
+claude
+```
+
+<p align="center">
+<img src="assets/screenshots/step3a.png" alt="Start Claude Code" width="460">
+</p>
+
+**Windows users:** Replace `~/Desktop/documents` with your actual path, like `C:\Users\YourName\Desktop\documents`
+
+The first time you run `claude`, it will ask for your API key. Paste it in.
+
+---
+
+### Step 6: Run structure
 
-
+Now you are inside Claude Code. Type this command:
 
 ```
-/structure
+/structure document.pdf
 ```
 
-
+<p align="center">
+<img src="assets/screenshots/step3.png" alt="Run /structure" width="500">
+</p>
+
+**Important:** The `/structure` command only works inside Claude Code. If you type it in your regular terminal, it will not work.
+
+structurecc will:
+1. Extract every image from your document
+2. Spawn one agent per image (all running in parallel)
+3. Each agent exhaustively analyzes its visual element
+4. Combine everything into `STRUCTURED.md`
 
 ---
 
 ## What You Get
 
+A comprehensive markdown file with every visual element extracted:
+
 ```
 document_extracted/
 ├── images/ # All extracted visuals
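The 1.0.2 README lists four steps: extract every image, spawn one agent per image in parallel, have each agent analyze its element, and combine everything into `STRUCTURED.md`. A minimal JavaScript sketch of that fan-out/fan-in shape is below. It assumes a hypothetical `analyzeImage` stub in place of a real agent; structurecc itself runs its agents inside Claude Code, so this illustrates the control flow only, not the package's implementation.

```javascript
// Fan-out/fan-in sketch of the four steps listed in the README.
// ASSUMPTION: `analyzeImage` is a hypothetical stub for one agent;
// structurecc itself hands each image to an agent inside Claude Code.

async function analyzeImage(imageName) {
  // Simulate agents that take different amounts of time to finish.
  await new Promise((resolve) => setTimeout(resolve, Math.random() * 50));
  // A real agent would read the image and return markdown
  // (a table, a description, or extracted data points).
  return `## ${imageName}\n\n(analysis of ${imageName})\n`;
}

async function structureDocument(imageNames) {
  // Fan out: start one analysis per image before awaiting any of them.
  const sections = await Promise.all(imageNames.map(analyzeImage));
  // Fan in: Promise.all keeps input order, so the combined output follows
  // the document's image order no matter which analysis finishes first.
  return sections.join('\n');
}

structureDocument(['figure_1.png', 'table_2.png', 'chart_3.png'])
  .then((markdown) => console.log(markdown));
```

Starting every analysis before awaiting any is what makes total time track the slowest element rather than the sum of all of them; `Promise.all` then restores input order for the combined file.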
@@ -97,66 +205,72 @@ document_extracted/
 | p-value | - | 0.67 | 0.73 |
 
 ## Notes
--
+- Confidence level: High
+- * Missing data excluded from analysis
 ```
 
-### Example:
+### Example: Chart Analysis
 
 ```markdown
 # Kaplan-Meier Survival Curves
 
-**Type:**
+**Type:** Chart (Line/Survival)
 **Source:** Page 7, clinical_trial.pdf
 
 ## Content
 
 Survival curves comparing treatment (blue) vs placebo (red) over 24 months.
 
+Key data points:
 - 12-month survival: Treatment 0.89, Placebo 0.78
 - 24-month survival: Treatment 0.76, Placebo 0.61
 - Log-rank p = 0.003
 
-## Labels &
-- "Survival Probability"
-- "Time (months)"
-- "Treatment (n=245)"
-- "Placebo (n=248)"
+## Labels & Annotations
+- Y-axis: "Survival Probability"
+- X-axis: "Time (months)"
+- Legend: "Treatment (n=245)", "Placebo (n=248)"
 ```
 
 ---
 
-## How It Works
-
-1. **Extract** - PyMuPDF pulls all images from PDF (or unzip DOCX media folder)
-2. **Swarm** - Launch N parallel agents, one per image
-3. **Analyze** - Each agent reads its image, extracts everything, writes markdown
-4. **Combine** - Merge all element files into STRUCTURED.md
-
-Agents run simultaneously. 10 images = 10 agents = fast.
-
----
-
 ## Cost
 
-Depends on document complexity:
-
 | Document | Elements | ~Cost |
 |----------|----------|-------|
 | Simple paper | 5-10 | $0.50-$1 |
 | Full paper | 15-25 | $2-$4 |
 | Dense report | 40+ | $5-$10 |
 
-Uses Claude's multimodal vision. Works best with **Opus 4.5
+Uses Claude's multimodal vision. Works best with **Opus 4.5** for complex tables and charts.
 
 ---
 
-##
+## Supported Formats
+
+- **PDF** - Extracts embedded images via PyMuPDF
+- **DOCX** - Extracts images from Word's media folder
+- **PNG/JPG/TIFF** - Analyzes images directly
+
+---
+
+## Troubleshooting
+
+**"npm: command not found"**
+
+You need Node.js. Download it from [nodejs.org](https://nodejs.org/).
+
+**"bash: /structure: No such file or directory"**
+
+You typed `/structure` in your regular terminal. You need to type it inside Claude Code. First run `claude` to start Claude Code, then type `/structure`.
+
+**"No images found"**
+
+Make sure your PDF contains actual images, not just text. Some PDFs render everything as text.
 
-
-- Claude Code (`npm install -g @anthropic-ai/claude-code`)
-- Anthropic API key or Claude Pro/Max
+**Claude Code asks for an API key**
 
-
+Either get an API key at [console.anthropic.com](https://console.anthropic.com/), or subscribe to Claude Pro/Max at [claude.ai](https://claude.ai/).
 
 ---
 
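For the "No images found" case, one rough way to check a PDF before running `/structure` is to count image XObjects in its raw bytes. The PDF specification does not allow stream objects inside compressed object streams, so an image's dictionary (the one carrying `/Subtype /Image`) normally appears as plain text in the file. This is a heuristic sketch, not structurecc's extractor (the README says PDF images come from PyMuPDF): it misses inline images and charts drawn as vector paths, and files saved with incremental updates can be overcounted. `countImageXObjects` and `countImagesInFile` are hypothetical names.

```javascript
// Rough pre-check: count image XObject dictionaries in a PDF's raw bytes.
// ASSUMPTION: an illustrative heuristic, not structurecc's extractor.
// Misses inline images and vector-drawn charts; may overcount objects
// rewritten by incremental updates.
const fs = require('fs');

function countImageXObjects(pdfBytes) {
  // 'latin1' maps every byte to one character, so binary stream data
  // cannot corrupt the decoding.
  const text = pdfBytes.toString('latin1');
  // PDF allows "/Subtype /Image" and "/Subtype/Image" alike.
  const matches = text.match(/\/Subtype\s*\/Image\b/g);
  return matches ? matches.length : 0;
}

function countImagesInFile(pdfPath) {
  return countImageXObjects(fs.readFileSync(pdfPath));
}
```

A result of 0 from `countImagesInFile('document.pdf')` is consistent with the README's note that some PDFs render everything as text (or as vector drawings), in which case there are no embedded images to extract.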
@@ -175,5 +289,5 @@ MIT
 ---
 
 <p align="center">
-<strong>
+<strong>Unstructured in. Structured out.</strong>
 </p>
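The README's Supported Formats list says DOCX images are pulled from Word's media folder. A .docx file is a ZIP archive, and Word normally stores embedded pictures as entries under `word/media/`. The sketch below lists those entries with Node's standard library alone by walking the ZIP central directory. It is an illustration, not structurecc's code: `listZipEntries` and `listDocxMedia` are hypothetical names, ZIP64 archives are not handled, and entries are only listed, not decompressed.

```javascript
// List the media entries of a .docx by reading its ZIP central directory.
// ASSUMPTIONS: classic (non-ZIP64) archive; names only, no decompression.

function listZipEntries(buf) {
  // The End Of Central Directory record (signature 0x06054b50) sits near
  // the end of the file, possibly followed by a comment, so scan backwards.
  let eocd = -1;
  for (let i = buf.length - 22; i >= 0; i--) {
    if (buf.readUInt32LE(i) === 0x06054b50) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error('not a ZIP archive');

  const entryCount = buf.readUInt16LE(eocd + 10); // total entries
  let offset = buf.readUInt32LE(eocd + 16);       // start of central directory
  const names = [];
  for (let n = 0; n < entryCount; n++) {
    // Each central directory header starts with signature 0x02014b50.
    if (buf.readUInt32LE(offset) !== 0x02014b50) {
      throw new Error('corrupt central directory');
    }
    const nameLen = buf.readUInt16LE(offset + 28);
    const extraLen = buf.readUInt16LE(offset + 30);
    const commentLen = buf.readUInt16LE(offset + 32);
    // DOCX part names are ASCII, so utf8 decoding is safe here.
    names.push(buf.toString('utf8', offset + 46, offset + 46 + nameLen));
    // Fixed 46-byte header, then name, extra field, and comment.
    offset += 46 + nameLen + extraLen + commentLen;
  }
  return names;
}

function listDocxMedia(buf) {
  return listZipEntries(buf).filter((name) => name.startsWith('word/media/'));
}
```

For example, `listDocxMedia(fs.readFileSync('document.docx'))` would return the entry names under `word/media/`. Extracting the image bytes would additionally require inflating each entry, for instance with Node's `zlib.inflateRawSync`.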
package/bin/install.js
CHANGED
@@ -4,7 +4,7 @@ const fs = require('fs');
 const path = require('path');
 const os = require('os');
 
-const VERSION = '1.0.
+const VERSION = '1.0.2';
 const PACKAGE_NAME = 'structurecc';
 
 // Colors
@@ -103,7 +103,7 @@ function install() {
 const agentFiles = fs.readdirSync(srcAgentsDir);
 ensureDir(agentsDir);
 for (const file of agentFiles) {
-if (file.startsWith('
+if (file.startsWith('structurecc-')) {
 fs.copyFileSync(
 path.join(srcAgentsDir, file),
 path.join(agentsDir, file)
@@ -138,7 +138,7 @@ function uninstall() {
 if (fs.existsSync(agentsDir)) {
 const agentFiles = fs.readdirSync(agentsDir);
 for (const file of agentFiles) {
-if (file.startsWith('
+if (file.startsWith('structurecc-')) {
 fs.unlinkSync(path.join(agentsDir, file));
 log(` ✓ Removed ${file}`, colors.green);
 }
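Both `install.js` hunks change the same guard: in 1.0.2 the copy loop in `install()` and the delete loop in `uninstall()` act only on agent files whose names start with `structurecc-`. The sketch below shows why scoping both loops to one prefix matters when the agents directory is shared: uninstall removes exactly what install added and leaves other agents alone. It is a hypothetical, self-contained re-creation. `installAgents`, `uninstallAgents`, the temp-directory layout, and the file names are illustrative, and `fs.mkdirSync(..., { recursive: true })` stands in for the script's `ensureDir` helper, whose body is not part of this diff.

```javascript
// Prefix-scoped install/uninstall, mirroring the 1.0.2 guard
// `file.startsWith('structurecc-')`. HYPOTHETICAL re-creation: function
// names, directories, and file names are illustrative only.
const fs = require('fs');
const os = require('os');
const path = require('path');

const PREFIX = 'structurecc-';

function installAgents(srcDir, destDir) {
  // Stands in for the script's ensureDir helper.
  fs.mkdirSync(destDir, { recursive: true });
  for (const file of fs.readdirSync(srcDir)) {
    // Copy only files this package owns.
    if (file.startsWith(PREFIX)) {
      fs.copyFileSync(path.join(srcDir, file), path.join(destDir, file));
    }
  }
}

function uninstallAgents(destDir) {
  if (!fs.existsSync(destDir)) return;
  for (const file of fs.readdirSync(destDir)) {
    // Delete only prefixed files; agents installed by anything else survive.
    if (file.startsWith(PREFIX)) {
      fs.unlinkSync(path.join(destDir, file));
    }
  }
}

// Demo in a throwaway temp directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'agents-demo-'));
const src = path.join(root, 'src');
const dest = path.join(root, 'agents');
fs.mkdirSync(src);
fs.mkdirSync(dest);
fs.writeFileSync(path.join(src, 'structurecc-extractor.md'), '');
fs.writeFileSync(path.join(dest, 'someone-elses-agent.md'), '');

installAgents(src, dest);
uninstallAgents(dest);
console.log(fs.readdirSync(dest)); // [ 'someone-elses-agent.md' ]

fs.rmSync(root, { recursive: true, force: true }); // needs Node 14.14+
```

With a looser or mismatched guard, uninstall could either leave the package's own agents behind or delete files it never installed; sharing one prefix between both loops keeps the two operations symmetric.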