structurecc 1.0.5 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +154 -67
- package/agents/structurecc-classifier.md +135 -0
- package/agents/structurecc-extract-chart.md +302 -0
- package/agents/structurecc-extract-diagram.md +343 -0
- package/agents/structurecc-extract-generic.md +248 -0
- package/agents/structurecc-extract-heatmap.md +322 -0
- package/agents/structurecc-extract-multipanel.md +310 -0
- package/agents/structurecc-extract-table.md +231 -0
- package/agents/structurecc-verifier.md +265 -0
- package/bin/install.js +82 -18
- package/commands/structure/structure.md +434 -112
- package/package.json +9 -5
- package/agents/structurecc-extractor.md +0 -70
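The per-file line counts in the list above can be totalled mechanically. This is a minimal sketch that assumes every entry keeps the `- path +added -removed` shape used here. `FILE_LIST` copies the thirteen entries verbatim, and `summarize` is a hypothetical helper, not something structurecc ships.

```python
import re

# The file list above, copied verbatim: "- <path> +<added> -<removed>".
FILE_LIST = """\
- package/README.md +154 -67
- package/agents/structurecc-classifier.md +135 -0
- package/agents/structurecc-extract-chart.md +302 -0
- package/agents/structurecc-extract-diagram.md +343 -0
- package/agents/structurecc-extract-generic.md +248 -0
- package/agents/structurecc-extract-heatmap.md +322 -0
- package/agents/structurecc-extract-multipanel.md +310 -0
- package/agents/structurecc-extract-table.md +231 -0
- package/agents/structurecc-verifier.md +265 -0
- package/bin/install.js +82 -18
- package/commands/structure/structure.md +434 -112
- package/package.json +9 -5
- package/agents/structurecc-extractor.md +0 -70
"""

# One entry per line: path, then added and removed line counts.
ENTRY = re.compile(r"^- (\S+) \+(\d+) -(\d+)$")


def summarize(text: str) -> dict[str, int]:
    """Total the added/removed line counts across all file entries."""
    files = added = removed = 0
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip anything that is not a file entry
        files += 1
        added += int(match.group(2))
        removed += int(match.group(3))
    return {"files": files, "added": added, "removed": removed, "net": added - removed}


print(summarize(FILE_LIST))
# → {'files': 13, 'added': 2835, 'removed': 272, 'net': 2563}
```

The counts are lines changed as reported by the registry diff, so a `+N -0` entry alone does not prove a file is new. Only the `+0 -70` entry is confirmed as a full removal, by its `@@ -1,70 +0,0 @@` hunk.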
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "structurecc",
-  "version": "
-  "description": "Agentic document structuring for Claude Code
+  "version": "2.0.0",
+  "description": "Agentic document structuring for Claude Code with verbatim extraction and quality verification. 3-phase pipeline: Classify → Extract → Verify.",
   "keywords": [
     "document-extraction",
     "pdf",
@@ -12,17 +12,21 @@
     "multimodal",
     "tables",
     "figures",
+    "charts",
+    "heatmaps",
     "markdown",
     "ai-agents",
-    "ocr"
+    "ocr",
+    "verbatim",
+    "quality-assurance"
   ],
   "author": "James Weatherhead",
   "license": "MIT",
   "repository": {
     "type": "git",
-    "url": "https://github.com/JamesWeatherhead/
+    "url": "https://github.com/JamesWeatherhead/structure"
   },
-  "homepage": "https://github.com/JamesWeatherhead/
+  "homepage": "https://github.com/JamesWeatherhead/structure#readme",
   "bin": {
     "structurecc": "./bin/install.js"
   },
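The new description names a 3-phase pipeline (Classify → Extract → Verify), and the file list adds one classifier, six type-specific extractor agents and a verifier. The diff does not show how `commands/structure/structure.md` wires them together. The following is therefore only a hypothetical sketch. It assumes the classifier emits a label matching an extractor file suffix, and it falls back to the generic extractor for anything else. The `route` and `plan` functions and the label strings are illustrative, not structurecc code.

```python
# Agent names come from the 2.0.0 file list; the routing logic is an assumption.
CLASSIFIER = "structurecc-classifier"
VERIFIER = "structurecc-verifier"
GENERIC = "structurecc-extract-generic"
EXTRACTORS = {
    "chart": "structurecc-extract-chart",
    "diagram": "structurecc-extract-diagram",
    "heatmap": "structurecc-extract-heatmap",
    "multipanel": "structurecc-extract-multipanel",
    "table": "structurecc-extract-table",
}


def route(label: str) -> str:
    """Pick the extractor agent for a (hypothetical) classifier label."""
    return EXTRACTORS.get(label.strip().lower(), GENERIC)


def plan(labels: list[str]) -> list[tuple[str, str, str]]:
    """Return one (classify, extract, verify) agent triple per visual element."""
    return [(CLASSIFIER, route(label), VERIFIER) for label in labels]


for step in plan(["table", "Heatmap", "form"]):
    print(" -> ".join(step))
# → structurecc-classifier -> structurecc-extract-table -> structurecc-verifier
# → structurecc-classifier -> structurecc-extract-heatmap -> structurecc-verifier
# → structurecc-classifier -> structurecc-extract-generic -> structurecc-verifier
```

A lookup table with a generic fallback keeps the classifier's label set and the extractor set loosely coupled. An unexpected label degrades to the generic agent instead of failing.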
package/agents/structurecc-extractor.md
@@ -1,70 +0,0 @@
----
-name: structurecc-extractor
-description: Extract and analyze any visual element from documents
----
-
-# Visual Element Extractor
-
-You are an expert at extracting structured data from document images.
-
-## Your Task
-
-Given an image from a document, you:
-
-1. **Identify** what type of visual this is
-2. **Extract** every piece of data and text visible
-3. **Structure** the output as clean markdown
-4. **Ground** your extraction to the source location
-
-## Visual Types You Handle
-
-- Tables (any format - simple, complex, merged cells)
-- Figures (scientific, photographs, illustrations)
-- Charts (bar, line, pie, scatter, area, box plots)
-- Heatmaps (color matrices, correlation plots, expression data)
-- Diagrams (flowcharts, architectures, schematics, networks)
-- Forms (fields, checkboxes, filled forms)
-- Mixed (images containing multiple element types)
-
-## Extraction Rules
-
-1. **Be exhaustive** - Extract EVERY visible data point, label, annotation
-2. **Be exact** - Copy text verbatim, preserve numbers precisely
-3. **Be structured** - Output clean markdown tables when appropriate
-4. **Be honest** - Mark unclear items with [?], note confidence levels
-5. **Be grounded** - Always cite source page/location
-
-## Output Format
-
-```markdown
-# [Descriptive Title Based on Content]
-
-**Type:** [table/figure/chart/heatmap/diagram/form/mixed]
-**Source:** Page [N], [document name]
-
-## Content
-
-[Primary extraction - tables as markdown tables, descriptions for figures,
-data points for charts, etc.]
-
-## Labels & Annotations
-
-[All text visible in the image, verbatim]
-
-## Notes
-
-[Extraction observations, confidence levels, anything unclear]
-```
-
-## Special Handling
-
-**For complex heatmaps:**
-- Extract all row/column labels
-- Describe the color scale
-- Identify patterns and notable regions
-- Sample representative values if matrix is very large
-
-**For multi-panel figures:**
-- Label each panel (A, B, C, etc.)
-- Extract each panel's content separately
-- Note relationships between panels
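For comparison with the 2.0.0 agents, the single 1.0.5 extractor removed above asked for one fixed markdown layout per visual. The following minimal sketch expresses that layout as a render function and assumes every field is a plain string. `render_extraction` is a hypothetical helper. The section order and the seven `Type` values are copied from the deleted template, and the usage inputs are made-up samples.

```python
# Type values listed in the removed structurecc-extractor template.
VISUAL_TYPES = {"table", "figure", "chart", "heatmap", "diagram", "form", "mixed"}


def render_extraction(
    title: str,
    visual_type: str,
    page: int,
    document: str,
    content: str,
    labels: str,
    notes: str,
) -> str:
    """Fill the v1 Output Format template; reject types it did not list."""
    if visual_type not in VISUAL_TYPES:
        raise ValueError(f"unknown visual type: {visual_type!r}")
    sections = [
        f"# {title}",
        "",
        f"**Type:** {visual_type}",
        f"**Source:** Page {page}, {document}",
        "",
        "## Content",
        "",
        content,
        "",
        "## Labels & Annotations",
        "",
        labels,
        "",
        "## Notes",
        "",
        notes,
    ]
    return "\n".join(sections)


# Made-up sample input, for illustration only.
print(render_extraction(
    title="Example Table",
    visual_type="table",
    page=1,
    document="example.pdf",
    content="| Column A | Column B |\n|---|---|\n| 1 | 2 |",
    labels="Column A, Column B",
    notes="All cells legible.",
))
```

Validating `visual_type` up front mirrors the template's closed list of types. An unlisted value such as `photo` raises an error rather than producing a mislabeled document.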