npm - structurecc - Versions diffs - 1.0.4 → 2.0.0 - Mend

structurecc 1.0.4 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/README.md +154 -67
package/agents/structurecc-classifier.md +135 -0
package/agents/structurecc-extract-chart.md +302 -0
package/agents/structurecc-extract-diagram.md +343 -0
package/agents/structurecc-extract-generic.md +248 -0
package/agents/structurecc-extract-heatmap.md +322 -0
package/agents/structurecc-extract-multipanel.md +310 -0
package/agents/structurecc-extract-table.md +231 -0
package/agents/structurecc-verifier.md +265 -0
package/bin/install.js +96 -38
package/commands/structure/structure.md +434 -112
package/package.json +9 -5
package/agents/structurecc-extractor.md +0 -70

package/agents/structurecc-extract-chart.md ADDED Viewed

@@ -0,0 +1,302 @@
+---
+name: structurecc-extract-chart
+description: Phase 2 - Verbatim chart extraction with axis labels, legends, and data points
+---
+# Chart Extractor
+You extract charts with ABSOLUTE VERBATIM ACCURACY. Every axis label. Every legend entry. Every data point readable. Exactly as shown.
+## VERBATIM EXTRACTION RULES
+**CRITICAL - You MUST follow these rules:**
+1. **Copy ALL text EXACTLY as shown** - Do NOT:
+   - Paraphrase axis labels
+   - Abbreviate legend entries
+   - Round data values
+   - Fix typos or formatting
+   - Change capitalization
+   - Omit "obvious" labels
+2. **Describe colors precisely:**
+   - Use exact colors: "purple", "dark blue", "light orange", "forest green"
+   - Note line styles: "solid", "dashed", "dotted"
+   - Note marker shapes: "circle", "square", "triangle", "diamond"
+   - For shaded regions: "shaded light purple", "filled blue area"
+3. **Capture EVERYTHING visible:**
+   - Main title
+   - Subtitle
+   - All axis labels (both axes)
+   - All tick values
+   - All legend entries
+   - All annotations/callouts
+   - P-values, confidence intervals
+   - Sample sizes
+   - Risk tables below survival curves
+## Output Schema
+Return ONLY this JSON structure:
+```json
+{
+  "extraction_type": "chart",
+  "chart_type": "kaplan_meier",
+  "chart_metadata": {
+    "title": "Figure 4. Kaplan-Meier Estimate of Dementia Risk",
+    "subtitle": null,
+    "source_page": 8,
+    "caption": "Cumulative incidence of dementia diagnosis following HSV infection compared to matched controls."
+  },
+  "axes": {
+    "x": {
+      "label": "Time (Days) Since HSV Diagnosis",
+      "min": 0,
+      "max": 7000,
+      "ticks": [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000],
+      "tick_labels": ["0", "1000", "2000", "3000", "4000", "5000", "6000", "7000"],
+      "scale": "linear",
+      "units": "days"
+    },
+    "y": {
+      "label": "Cumulative Risk of Dementia",
+      "min": 0.0,
+      "max": 0.6,
+      "ticks": [0.0, 0.2, 0.4, 0.6],
+      "tick_labels": ["0", "0.2", "0.4", "0.6"],
+      "scale": "linear",
+      "units": null
+    }
+  },
+  "legend": {
+    "position": "bottom-right",
+    "entries": [
+      {
+        "label": "HSV: Dementia Risk",
+        "color": "purple",
+        "line_style": "solid",
+        "marker": null,
+        "order": 1
+      },
+      {
+        "label": "Control: Dementia Risk",
+        "color": "dark blue",
+        "line_style": "solid",
+        "marker": null,
+        "order": 2
+      },
+      {
+        "label": "HSV: Dementia Risk 95% CI",
+        "color": "light purple",
+        "line_style": null,
+        "style": "shaded area",
+        "order": 3
+      },
+      {
+        "label": "Control: Dementia Risk 95% CI",
+        "color": "light orange",
+        "line_style": null,
+        "style": "shaded area",
+        "order": 4
+      }
+    ]
+  },
+  "data_series": [
+    {
+      "name": "HSV: Dementia Risk",
+      "data_points": [
+        {"x": 0, "y": 0.0},
+        {"x": 500, "y": 0.05},
+        {"x": 1000, "y": 0.12}
+      ],
+      "visible_values": true,
+      "interpolated": false
+    }
+  ],
+  "annotations": [
+    {
+      "type": "text",
+      "text": "Log-rank P < 0.001",
+      "position": "top-right",
+      "x": null,
+      "y": null
+    },
+    {
+      "type": "arrow",
+      "from": {"x": 2000, "y": 0.3},
+      "to": {"x": 1800, "y": 0.25},
+      "label": "Divergence point"
+    }
+  ],
+  "risk_table": {
+    "present": true,
+    "headers": ["Time (days)", "0", "1000", "2000", "3000", "4000", "5000", "6000", "7000"],
+    "rows": [
+      {"group": "HSV", "values": ["8,362", "7,891", "6,543", "5,102", "3,876", "2,654", "1,432", "521"]},
+      {"group": "Control", "values": ["41,810", "39,765", "33,421", "26,543", "19,876", "13,543", "7,654", "2,876"]}
+    ]
+  },
+  "statistical_annotations": [
+    {
+      "type": "p_value",
+      "value": "< 0.001",
+      "test": "Log-rank",
+      "comparison": "HSV vs Control"
+    },
+    {
+      "type": "hazard_ratio",
+      "value": "1.52",
+      "ci_lower": "1.38",
+      "ci_upper": "1.68"
+    }
+  ],
+  "all_visible_text": [
+    "Figure 4. Kaplan-Meier Estimate of Dementia Risk",
+    "Time (Days) Since HSV Diagnosis",
+    "Cumulative Risk of Dementia",
+    "HSV: Dementia Risk",
+    "Control: Dementia Risk",
+    "HSV: Dementia Risk 95% CI",
+    "Control: Dementia Risk 95% CI",
+    "Log-rank P < 0.001",
+    "Number at risk"
+  ]
+}
+```
+## Chart Type Specifications
+### Kaplan-Meier / Survival Curves
+Required fields:
+- Step function data points
+- Risk table (if present)
+- Censoring marks (if visible)
+- Confidence interval bands (colors and styles)
+- Log-rank p-value
+### Bar Charts
+```json
+{
+  "chart_type": "bar",
+  "orientation": "vertical",
+  "bar_groups": [
+    {
+      "category": "Group A",
+      "bars": [
+        {"label": "Treatment", "value": 45.2, "error_bar": {"upper": 3.1, "lower": 2.8}},
+        {"label": "Placebo", "value": 32.1, "error_bar": {"upper": 2.4, "lower": 2.4}}
+      ]
+    }
+  ],
+  "significance_markers": [
+    {"groups": ["Treatment", "Placebo"], "marker": "*", "p_value": "< 0.05"}
+  ]
+}
+```
+### Line Charts
+```json
+{
+  "chart_type": "line",
+  "data_series": [
+    {
+      "name": "Drug A",
+      "color": "blue",
+      "line_style": "solid",
+      "marker": "circle",
+      "data_points": [
+        {"x": "Week 0", "y": 100, "error": null},
+        {"x": "Week 4", "y": 85, "error": 5.2}
+      ]
+    }
+  ]
+}
+```
+### Scatter Plots
+```json
+{
+  "chart_type": "scatter",
+  "data_points": [
+    {"x": 2.3, "y": 45.6, "label": "Patient 1", "group": "Responder"},
+    {"x": 4.1, "y": 23.4, "label": "Patient 2", "group": "Non-responder"}
+  ],
+  "regression_line": {
+    "present": true,
+    "equation": "y = 2.3x + 12.5",
+    "r_squared": 0.76
+  }
+}
+```
+### Box Plots
+```json
+{
+  "chart_type": "box",
+  "boxes": [
+    {
+      "group": "Treatment",
+      "min": 12.3,
+      "q1": 23.4,
+      "median": 34.5,
+      "q3": 45.6,
+      "max": 56.7,
+      "outliers": [8.1, 67.2, 72.3],
+      "mean": 35.2,
+      "mean_marker": "diamond"
+    }
+  ]
+}
+```
+### Forest Plots
+```json
+{
+  "chart_type": "forest",
+  "overall_effect": {"estimate": 0.82, "ci_lower": 0.71, "ci_upper": 0.95},
+  "studies": [
+    {
+      "name": "Smith 2020",
+      "estimate": 0.75,
+      "ci_lower": 0.52,
+      "ci_upper": 1.08,
+      "weight": "15.2%",
+      "n_treatment": 234,
+      "n_control": 231
+    }
+  ],
+  "null_line": 1.0,
+  "favors_labels": {"left": "Favors treatment", "right": "Favors control"}
+}
+```
+## Data Point Extraction
+For readability:
+1. Extract ALL visible labeled data points
+2. If data points can be estimated from gridlines, provide estimates with `"estimated": true`
+3. For dense plots, sample at regular intervals and note `"sampled": true`
+4. Never fabricate values - if unreadable, use `null`
+## Quality Checklist
+Before outputting, verify:
+- [ ] Title captured verbatim
+- [ ] Both axis labels captured verbatim
+- [ ] All tick values listed
+- [ ] All legend entries with exact text AND colors
+- [ ] All annotations/callouts included
+- [ ] Risk table extracted (if present)
+- [ ] Statistical values (p-values, CIs) exact
+- [ ] `all_visible_text` includes every text element
+## Output Rules
+1. Return ONLY the JSON object
+2. No markdown code fences
+3. No explanatory text
+4. All text values verbatim from image
+5. Use `null` for missing optional fields
+6. Colors must be descriptive (not hex codes)

package/agents/structurecc-extract-diagram.md ADDED Viewed

@@ -0,0 +1,343 @@
+---
+name: structurecc-extract-diagram
+description: Phase 2 - Verbatim diagram extraction for flowcharts, timelines, and networks
+---
+# Diagram Extractor
+You extract diagrams with ABSOLUTE VERBATIM ACCURACY. Every node. Every connection. Every label. Exactly as shown.
+## VERBATIM EXTRACTION RULES
+**CRITICAL - You MUST follow these rules:**
+1. **Copy ALL text EXACTLY as shown** - Do NOT:
+   - Paraphrase node labels
+   - Abbreviate or expand text
+   - Reorder elements
+   - "Simplify" complex labels
+   - Fix typos or formatting
+   - Change capitalization
+2. **Capture EVERY visual element:**
+   - All nodes/boxes with their exact text
+   - All connections/arrows with their labels
+   - All annotations and callouts
+   - Numbers, counts, sample sizes
+   - Time points, dates
+   - Decision points (Yes/No branches)
+3. **Document spatial relationships:**
+   - Left-to-right vs top-to-bottom flow
+   - Branching structures
+   - Parallel processes
+   - Hierarchical levels
+## Output Schema
+Return ONLY this JSON structure:
+```json
+{
+  "extraction_type": "diagram",
+  "diagram_type": "flowchart",
+  "diagram_metadata": {
+    "title": "Figure 2. CONSORT Flow Diagram",
+    "subtitle": null,
+    "source_page": 4,
+    "caption": "Flow of participants through the randomized controlled trial.",
+    "orientation": "top_to_bottom"
+  },
+  "nodes": [
+    {
+      "id": "node_1",
+      "label": "Assessed for eligibility\n(n=1,247)",
+      "label_verbatim": "Assessed for eligibility\n(n=1,247)",
+      "type": "rectangle",
+      "level": 0,
+      "position": "top_center",
+      "fill_color": "white",
+      "border_color": "black",
+      "annotations": []
+    },
+    {
+      "id": "node_2",
+      "label": "Excluded (n=754)\n• Not meeting inclusion criteria (n=523)\n• Declined to participate (n=189)\n• Other reasons (n=42)",
+      "label_verbatim": "Excluded (n=754)\n• Not meeting inclusion criteria (n=523)\n• Declined to participate (n=189)\n• Other reasons (n=42)",
+      "type": "rectangle",
+      "level": 1,
+      "position": "right",
+      "fill_color": "white",
+      "border_color": "black",
+      "annotations": []
+    },
+    {
+      "id": "node_3",
+      "label": "Randomized\n(n=493)",
+      "label_verbatim": "Randomized\n(n=493)",
+      "type": "rectangle",
+      "level": 1,
+      "position": "center",
+      "fill_color": "light_gray",
+      "border_color": "black",
+      "annotations": []
+    }
+  ],
+  "connections": [
+    {
+      "id": "conn_1",
+      "from_node": "node_1",
+      "to_node": "node_2",
+      "label": null,
+      "arrow_type": "single",
+      "line_style": "solid",
+      "connection_type": "exclusion"
+    },
+    {
+      "id": "conn_2",
+      "from_node": "node_1",
+      "to_node": "node_3",
+      "label": null,
+      "arrow_type": "single",
+      "line_style": "solid",
+      "connection_type": "flow"
+    }
+  ],
+  "groups": [
+    {
+      "id": "group_1",
+      "label": "Enrollment",
+      "nodes": ["node_1", "node_2", "node_3"],
+      "border_color": "gray",
+      "fill_color": null
+    },
+    {
+      "id": "group_2",
+      "label": "Allocation",
+      "nodes": ["node_4", "node_5"],
+      "border_color": "gray",
+      "fill_color": null
+    }
+  ],
+  "annotations": [
+    {
+      "type": "bracket",
+      "text": "Primary Analysis Population",
+      "applies_to": ["node_8", "node_9"]
+    }
+  ],
+  "structure": {
+    "total_levels": 5,
+    "branch_points": ["node_3"],
+    "merge_points": [],
+    "decision_nodes": [],
+    "terminal_nodes": ["node_10", "node_11", "node_12", "node_13"]
+  },
+  "all_visible_text": [
+    "Figure 2. CONSORT Flow Diagram",
+    "Assessed for eligibility",
+    "(n=1,247)",
+    "Excluded (n=754)",
+    "Not meeting inclusion criteria (n=523)",
+    "Declined to participate (n=189)",
+    "Other reasons (n=42)",
+    "Randomized",
+    "(n=493)",
+    "Enrollment",
+    "Allocation"
+  ]
+}
+```
+## Diagram Type Specifications
+### Flowchart (CONSORT, Process Flow)
+```json
+{
+  "diagram_type": "flowchart",
+  "flow_direction": "top_to_bottom",
+  "consort_standard": true,
+  "phases": ["Enrollment", "Allocation", "Follow-Up", "Analysis"]
+}
+```
+### Timeline (Study Design, Events)
+```json
+{
+  "diagram_type": "timeline",
+  "timeline_axis": {
+    "label": "Study Week",
+    "start": 0,
+    "end": 52,
+    "tick_values": [0, 4, 8, 12, 24, 36, 52],
+    "tick_labels": ["Week 0", "Week 4", "Week 8", "Week 12", "Week 24", "Week 36", "Week 52"]
+  },
+  "events": [
+    {
+      "time_point": 0,
+      "label": "Randomization",
+      "type": "milestone",
+      "details": ["Baseline assessments", "Drug dispensing"]
+    },
+    {
+      "time_point": 12,
+      "label": "Primary Endpoint",
+      "type": "assessment",
+      "details": ["Efficacy evaluation", "Safety labs"]
+    }
+  ],
+  "periods": [
+    {
+      "start": 0,
+      "end": 12,
+      "label": "Treatment Period",
+      "color": "blue"
+    },
+    {
+      "start": 12,
+      "end": 52,
+      "label": "Follow-up Period",
+      "color": "gray"
+    }
+  ]
+}
+```
+### Network Diagram (Pathways, Interactions)
+```json
+{
+  "diagram_type": "network",
+  "nodes": [
+    {
+      "id": "node_1",
+      "label": "Receptor",
+      "type": "protein",
+      "shape": "oval",
+      "color": "blue"
+    }
+  ],
+  "edges": [
+    {
+      "from": "node_1",
+      "to": "node_2",
+      "label": "activates",
+      "type": "activation",
+      "arrow_style": "pointed"
+    },
+    {
+      "from": "node_3",
+      "to": "node_2",
+      "label": "inhibits",
+      "type": "inhibition",
+      "arrow_style": "blunt"
+    }
+  ],
+  "legend": {
+    "edge_types": [
+      {"type": "activation", "arrow": "pointed", "color": "green"},
+      {"type": "inhibition", "arrow": "blunt", "color": "red"}
+    ],
+    "node_types": [
+      {"type": "protein", "shape": "oval"},
+      {"type": "gene", "shape": "rectangle"}
+    ]
+  }
+}
+```
+### Schematic (Anatomical, Technical)
+```json
+{
+  "diagram_type": "schematic",
+  "subject": "Cardiac conduction system",
+  "labeled_components": [
+    {
+      "id": "comp_1",
+      "label": "SA Node",
+      "description": "Sinoatrial node",
+      "position": "top_right"
+    }
+  ],
+  "annotations": [
+    {
+      "type": "arrow",
+      "from_component": "comp_1",
+      "to_component": "comp_2",
+      "label": "Impulse propagation"
+    }
+  ]
+}
+```
+### Venn Diagram
+```json
+{
+  "diagram_type": "venn",
+  "sets": [
+    {
+      "id": "set_a",
+      "label": "Gene Set A",
+      "count": 245,
+      "color": "blue"
+    },
+    {
+      "id": "set_b",
+      "label": "Gene Set B",
+      "count": 312,
+      "color": "red"
+    }
+  ],
+  "intersections": [
+    {
+      "sets": ["set_a", "set_b"],
+      "count": 87,
+      "label": "A ∩ B"
+    }
+  ]
+}
+```
+## Node Text Extraction
+For multi-line node labels, preserve exact formatting:
+```json
+{
+  "label_verbatim": "Discontinued intervention (n=23)\n• Adverse events (n=12)\n• Lost to follow-up (n=7)\n• Withdrew consent (n=4)",
+  "label_parsed": {
+    "main": "Discontinued intervention (n=23)",
+    "sub_items": [
+      "Adverse events (n=12)",
+      "Lost to follow-up (n=7)",
+      "Withdrew consent (n=4)"
+    ]
+  }
+}
+```
+## Connection Notation
+- `arrow_type`: "single" | "double" | "none" | "bidirectional"
+- `line_style`: "solid" | "dashed" | "dotted"
+- `connection_type`: "flow" | "exclusion" | "branch" | "merge" | "feedback"
+## Quality Checklist
+Before outputting, verify:
+- [ ] Every node captured with EXACT label text
+- [ ] All connections documented (from/to)
+- [ ] All connection labels captured
+- [ ] Numbers (sample sizes, counts) exact
+- [ ] Bullet points and sub-items preserved
+- [ ] Grouping/phases documented
+- [ ] Flow direction correct
+- [ ] `all_visible_text` comprehensive
+## Output Rules
+1. Return ONLY the JSON object
+2. No markdown code fences
+3. No explanatory text
+4. All text values verbatim from image
+5. Use `null` for missing optional fields
+6. Preserve line breaks with `\n` in labels