structurecc 1.0.1 → 1.0.3

Files changed (3)
  1. package/README.md +166 -52
  2. package/bin/install.js +1 -1
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -1,75 +1,183 @@
1
- <h1 align="center">STRUCTUREIT</h1>
1
+ <h1 align="center">STRUCTURE</h1>
2
2
 
3
3
  <p align="center">
4
- <strong>Agentic Document Extraction for Claude Code</strong><br>
5
- <em>One command. Every figure. Every table.</em>
4
+ <strong>Landing AI charges $500/month for agentic document extraction.<br>This is free.</strong>
6
5
  </p>
7
6
 
8
7
  <p align="center">
9
8
  <a href="https://www.npmjs.com/package/structurecc"><img src="https://img.shields.io/npm/v/structurecc.svg" alt="npm version"></a>
10
- <a href="https://github.com/JamesWeatherhead/structurecc/stargazers"><img src="https://img.shields.io/github/stars/JamesWeatherhead/structurecc" alt="GitHub stars"></a>
11
9
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
12
10
  </p>
13
11
 
14
12
  <p align="center">
15
- <em>Unstructured in. Structured out.</em>
13
+ <img src="assets/terminal.png" alt="structurecc" width="550">
14
+ </p>
15
+
16
+ <p align="center">
17
+ <em>Works on Mac, Windows, and Linux</em>
16
18
  </p>
17
19
 
18
20
  ---
19
21
 
20
22
  ## The Problem
21
23
 
22
- You have a PDF with figures, tables, and charts. You need that data.
24
+ You have a 50-page PDF with figures, tables, and charts. You need that data.
23
25
 
24
- **Manual approach:** Screenshot each figure. Copy tables cell by cell. Spend hours on one document.
26
+ **Manual approach:** Screenshot each figure. Transcribe tables cell by cell. Spend hours on one document.
27
+
28
+ **With structurecc:** One command. Walk away. Come back to perfectly structured markdown.
25
29
 
26
- **structurecc:**
27
30
  ```
28
31
  /structure paper.pdf
29
32
  ```
30
33
 
31
- Done.
34
+ Spawns parallel AI agents. Each agent analyzes one visual element. All run simultaneously. Done in minutes, not hours.
35
+
36
+ ---
37
+
38
+ ## What is this?
39
+
40
+ Give it a document. It extracts every image. Spawns one AI agent per image. Each agent exhaustively analyzes its element—tables become markdown tables, figures get descriptions, charts get data points extracted.
41
+
42
+ Runs inside **[Claude Code](https://docs.anthropic.com/en/docs/claude-code)** (Anthropic's terminal assistant). One command. ~$0.50-$5 per document.
43
+
44
+ Like [Landing AI's Agentic Document Extraction](https://landing.ai/agentic-document-extraction), but running locally via Claude Code.
32
45
 
33
46
  ---
34
47
 
35
- ## What It Does
48
+ ## Before You Start
36
49
 
50
+ You need two things:
51
+
52
+ ### 1. Node.js
53
+
54
+ Check if you have it:
55
+
56
+ ```bash
57
+ node --version
37
58
  ```
38
- PDF ───▶ [Agent 1] ───┐
39
- [Agent 2] ───┤
40
- [Agent 3] ───┼───▶ STRUCTURED.md
41
- [Agent N] ───┘
59
+
60
+ If you see a version number, you're good. If you see "command not found", download Node.js from **[nodejs.org](https://nodejs.org/)** and install it.
61
+
62
+ ### 2. Anthropic API Key or Pro/Max Plan
63
+
64
+ You need one of these to use Claude Code:
65
+
66
+ - **API key:** Get one at **[console.anthropic.com](https://console.anthropic.com/)**. Requires a payment method.
67
+ - **Pro or Max plan:** If you subscribe to Claude Pro ($20/mo) or Max ($100/mo), you can use Claude Code without a separate API key.
68
+
69
+ ---
70
+
71
+ ## Setup (5 minutes)
72
+
73
+ ### Step 1: Open your terminal
74
+
75
+ **Mac:** Press `Cmd + Space`, type `Terminal`, press Enter
76
+
77
+ **Windows:** Press `Win + X`, click "Terminal" or "PowerShell"
78
+
79
+ **Linux:** Press `Ctrl + Alt + T`
80
+
81
+ ---
82
+
83
+ ### Step 2: Install Claude Code
84
+
85
+ Copy this command and paste it into your terminal:
86
+
87
+ ```bash
88
+ npm install -g @anthropic-ai/claude-code
42
89
  ```
43
90
 
44
- 1. **Extracts** every image from your document
45
- 2. **Spawns** one AI agent per image (running in parallel)
46
- 3. **Analyzes** each element exhaustively
47
- 4. **Outputs** clean, structured markdown
91
+ <p align="center">
92
+ <img src="assets/screenshots/step0.png" alt="Install Claude Code" width="550">
93
+ </p>
48
94
 
49
- Like [Landing AI's Agentic Document Extraction](https://landing.ai/agentic-document-extraction), but running locally via Claude Code.
95
+ Wait for it to finish.
50
96
 
51
97
  ---
52
98
 
53
- ## Install
99
+ ### Step 3: Install structurecc
100
+
101
+ Copy and run this:
54
102
 
55
103
  ```bash
56
104
  npx structurecc
57
105
  ```
58
106
 
59
- ## Use
107
+ <p align="center">
108
+ <img src="assets/screenshots/step1.png" alt="Install structurecc" width="420">
109
+ </p>
110
+
111
+ You will see a STRUCTURE banner. That means it worked. You only do this once.
112
+
113
+ ---
114
+
115
+ ### Step 4: Set up your document folder
116
+
117
+ Create a folder with your document:
118
+
119
+ <p align="center">
120
+ <img src="assets/screenshots/step2.png" alt="Folder structure" width="380">
121
+ </p>
122
+
123
+ ```
124
+ documents/
125
+ ├── document.pdf ← your PDF, DOCX, or image
126
+ └── images/ ← extracted images go here (created automatically)
127
+ ├── figure_1.png
128
+ ├── table_2.png
129
+ └── chart_3.png
130
+ ```
131
+
132
+ **Put your document in a folder. That's it.**
133
+
134
+ ---
135
+
136
+ ### Step 5: Open Claude Code
137
+
138
+ Navigate to your document folder and start Claude Code:
139
+
140
+ ```bash
141
+ cd ~/Desktop/documents
142
+ claude
143
+ ```
144
+
145
+ <p align="center">
146
+ <img src="assets/screenshots/step3a.png" alt="Start Claude Code" width="460">
147
+ </p>
148
+
149
+ **Windows users:** Replace `~/Desktop/documents` with your actual path, like `C:\Users\YourName\Desktop\documents`
150
+
151
+ The first time you run `claude`, it will ask for your API key. Paste it in.
152
+
153
+ ---
154
+
155
+ ### Step 6: Run structure
60
156
 
61
- In Claude Code:
157
+ Now you are inside Claude Code. Type this command:
62
158
 
63
159
  ```
64
- /structure path/to/document.pdf
160
+ /structure document.pdf
65
161
  ```
66
162
 
67
- Works with: **PDF, DOCX, PNG, JPG**
163
+ <p align="center">
164
+ <img src="assets/screenshots/step3.png" alt="Run /structure" width="500">
165
+ </p>
166
+
167
+ **Important:** The `/structure` command only works inside Claude Code. If you type it in your regular terminal, it will not work.
168
+
169
+ structurecc will:
170
+ 1. Extract every image from your document
171
+ 2. Spawn one agent per image (all running in parallel)
172
+ 3. Each agent exhaustively analyzes its visual element
173
+ 4. Combine everything into `STRUCTURED.md`
68
174
 
69
175
  ---
70
176
 
71
177
  ## What You Get
72
178
 
179
+ A comprehensive markdown file with every visual element extracted:
180
+
73
181
  ```
74
182
  document_extracted/
75
183
  ├── images/ # All extracted visuals
@@ -97,66 +205,72 @@ document_extracted/
97
205
  | p-value | - | 0.67 | 0.73 |
98
206
 
99
207
  ## Notes
100
- - * Missing data excluded
208
+ - Confidence level: High
209
+ - * Missing data excluded from analysis
101
210
  ```
102
211
 
103
- ### Example: Figure Analysis
212
+ ### Example: Chart Analysis
104
213
 
105
214
  ```markdown
106
215
  # Kaplan-Meier Survival Curves
107
216
 
108
- **Type:** Figure
217
+ **Type:** Chart (Line/Survival)
109
218
  **Source:** Page 7, clinical_trial.pdf
110
219
 
111
220
  ## Content
112
221
 
113
222
  Survival curves comparing treatment (blue) vs placebo (red) over 24 months.
114
223
 
224
+ Key data points:
115
225
  - 12-month survival: Treatment 0.89, Placebo 0.78
116
226
  - 24-month survival: Treatment 0.76, Placebo 0.61
117
227
  - Log-rank p = 0.003
118
228
 
119
- ## Labels & Text
120
- - "Survival Probability"
121
- - "Time (months)"
122
- - "Treatment (n=245)"
123
- - "Placebo (n=248)"
229
+ ## Labels & Annotations
230
+ - Y-axis: "Survival Probability"
231
+ - X-axis: "Time (months)"
232
+ - Legend: "Treatment (n=245)", "Placebo (n=248)"
124
233
  ```
125
234
 
126
235
  ---
127
236
 
128
- ## How It Works
129
-
130
- 1. **Extract** - PyMuPDF pulls all images from PDF (or unzip DOCX media folder)
131
- 2. **Swarm** - Launch N parallel agents, one per image
132
- 3. **Analyze** - Each agent reads its image, extracts everything, writes markdown
133
- 4. **Combine** - Merge all element files into STRUCTURED.md
134
-
135
- Agents run simultaneously. 10 images = 10 agents = fast.
136
-
137
- ---
138
-
139
237
  ## Cost
140
238
 
141
- Depends on document complexity:
142
-
143
239
  | Document | Elements | ~Cost |
144
240
  |----------|----------|-------|
145
241
  | Simple paper | 5-10 | $0.50-$1 |
146
242
  | Full paper | 15-25 | $2-$4 |
147
243
  | Dense report | 40+ | $5-$10 |
148
244
 
149
- Uses Claude's multimodal vision. Works best with **Opus 4.5**.
245
+ Uses Claude's multimodal vision. Works best with **Opus 4.5** for complex tables and charts.
150
246
 
151
247
  ---
152
248
 
153
- ## Requirements
249
+ ## Supported Formats
250
+
251
+ - **PDF** - Extracts embedded images via PyMuPDF
252
+ - **DOCX** - Extracts images from Word's media folder
253
+ - **PNG/JPG/TIFF** - Analyzes images directly
254
+
255
+ ---
256
+
257
+ ## Troubleshooting
258
+
259
+ **"npm: command not found"**
260
+
261
+ You need Node.js. Download it from [nodejs.org](https://nodejs.org/).
262
+
263
+ **"bash: /structure: No such file or directory"**
264
+
265
+ You typed `/structure` in your regular terminal. You need to type it inside Claude Code. First run `claude` to start Claude Code, then type `/structure`.
266
+
267
+ **"No images found"**
268
+
269
+ Make sure your PDF contains actual images, not just text. Some PDFs render everything as text.
154
270
 
155
- - Node.js
156
- - Claude Code (`npm install -g @anthropic-ai/claude-code`)
157
- - Anthropic API key or Claude Pro/Max
271
+ **Claude Code asks for an API key**
158
272
 
159
- PyMuPDF installed automatically if needed.
273
+ Either get an API key at [console.anthropic.com](https://console.anthropic.com/), or subscribe to Claude Pro/Max at [claude.ai](https://claude.ai/).
160
274
 
161
275
  ---
162
276
 
@@ -175,5 +289,5 @@ MIT
175
289
  ---
176
290
 
177
291
  <p align="center">
178
- <strong>Stop copying tables by hand.</strong>
292
+ <strong>Unstructured in. Structured out.</strong>
179
293
  </p>
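The README changes above describe a fan-out/fan-in flow: one agent per extracted image, all running in parallel, with the results combined into `STRUCTURED.md`. A minimal sketch of that pattern follows. It is not taken from the package: `analyzeImage` and `structureDocument` are hypothetical names, and the analyzer is a stub standing in for whatever each agent actually does.

```javascript
// Hypothetical stand-in for one agent's work (assumption, not structurecc's API).
async function analyzeImage(imagePath) {
  const name = imagePath.split('/').pop();
  return `## ${name}\n\n(analysis of ${name} would go here)\n`;
}

// Fan out: start one analysis per image at once.
// Fan in: Promise.all resolves with results in input order, whichever finishes first.
async function structureDocument(imagePaths, analyze = analyzeImage) {
  const sections = await Promise.all(imagePaths.map((p) => analyze(p)));
  return ['# STRUCTURED', '', ...sections].join('\n');
}
```

Calling `structureDocument(['images/figure_1.png', 'images/table_2.png'])` would resolve to a single markdown string with one section per image. Because `Promise.all` preserves input order, the combined file stays in document order even when the analyses complete out of order.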
package/bin/install.js CHANGED
@@ -4,7 +4,7 @@ const fs = require('fs');
4
4
  const path = require('path');
5
5
  const os = require('os');
6
6
 
7
- const VERSION = '1.0.1';
7
+ const VERSION = '1.0.3';
8
8
  const PACKAGE_NAME = 'structurecc';
9
9
 
10
10
  // Colors
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "structurecc",
3
- "version": "1.0.1",
3
+ "version": "1.0.3",
4
4
  "description": "Agentic document extraction for Claude Code. One command. Every figure. Every table.",
5
5
  "keywords": [
6
6
  "document-extraction",