npm - structurecc - Versions diffs - 3.1.0 → 3.2.0 - Mend

structurecc 3.1.0 → 3.2.0

Files changed (2)

package/commands/structure.md +129 -12
package/package.json +1 -1

package/commands/structure.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Extract structured data from documents (PDF, DOCX, images) using Claude vision
-argument-hint: <path>
+argument-hint: <path> [--verbose]
 ---
 # Document Structure Extraction
@@ -11,13 +11,23 @@ You are extracting structured data from a document using Claude's native vision
 **Document path:** $ARGUMENTS
+## Flags
+- `--verbose` - Keep all intermediate files (chunks/, pages/, debug logs). Default behavior is clean output only.
+Parse the arguments to detect `--verbose`:
+- If `--verbose` is present anywhere in $ARGUMENTS, set VERBOSE_MODE=true
+- Remove `--verbose` from the path to get the actual document path
+- Default: VERBOSE_MODE=false (clean output)
 ## Workflow
 ### Step 1: Validate Input
-1. Check if the file exists at the provided path
-2. Determine the file type (PDF, DOCX, PNG, JPG, TIFF, etc.)
-3. If the path is invalid, inform the user and stop
+1. Parse arguments for `--verbose` flag
+2. Check if the file exists at the provided path
+3. Determine the file type (PDF, DOCX, PNG, JPG, TIFF, etc.)
+4. If the path is invalid, inform the user and stop
 ### Step 2: Determine Processing Strategy
@@ -40,13 +50,28 @@ Based on document type:
 ### Step 3: Create Output Directory
 Create output directory structure:
+**Default (clean output):**
+```
+<source_dir>/<filename>_extracted/
+├── structure.json    # Final merged JSON (machine-readable)
+├── STRUCTURE.md      # Human-readable markdown summary
+└── images/           # Extracted figures (if any)
+```
+**With --verbose (keeps intermediates):**
 ```
 <source_dir>/<filename>_extracted/
-├── chunks/           # Individual chunk JSON files
 ├── structure.json    # Final merged JSON
-└── STRUCTURE.md      # Human-readable markdown
+├── STRUCTURE.md      # Human-readable markdown
+├── images/           # Extracted figures (if any)
+├── chunks/           # Individual chunk JSON files
+├── pages/            # Per-page PNG images (if generated)
+└── debug/            # Processing logs
 ```
+During processing, create a temporary `_processing/` subdirectory for intermediate files. This will be cleaned up at the end (unless --verbose).
 ### Step 4: Launch Chunk Agents (Parallel)
 For each chunk, launch a Task agent with subagent_type="general-purpose":
@@ -57,7 +82,7 @@ Each agent receives:
 1. The document path
 2. Their assigned page range (e.g., pages 1-5)
 3. The chunk extractor prompt (embedded below)
-4. Output path for their chunk JSON
+4. Output path for their chunk JSON (write to `_processing/chunks/` subdirectory)
 **Chunk Extractor Prompt for Agents:**
@@ -207,13 +232,13 @@ Write your JSON to: {output_path}
 ### Step 5: Wait for Chunk Agents
 After launching all chunk agents, wait for them to complete.
-Each agent will write their chunk JSON to the chunks/ directory.
+Each agent will write their chunk JSON to the `_processing/chunks/` directory.
 ### Step 6: Merge Chunks
 Once all chunks are complete:
-1. Read all chunk JSON files from chunks/ directory
+1. Read all chunk JSON files from `_processing/chunks/` directory
 2. Merge into single structure with page offset correction:
 ```python
@@ -289,8 +314,72 @@ Create STRUCTURE.md with human-readable format:
 ---
 ```
-### Step 8: Display Completion
+### Step 8: Clean Up Intermediate Files
+After generating the final outputs, clean up intermediate files **unless --verbose flag was provided**.
+**Default behavior (VERBOSE_MODE=false):**
+1. Move any extracted images from `_processing/` to `images/` directory
+2. Delete the entire `_processing/` directory and its contents:
+   - `_processing/chunks/` - intermediate chunk JSON files
+   - `_processing/pages/` - per-page images (if generated)
+   - `_processing/debug/` - any debug logs
+```bash
+# Move images if they exist
+if [ -d "_processing/images" ]; then
+    mv _processing/images ./images
+fi
+# Remove processing directory
+rm -rf _processing/
+```
+**Verbose behavior (VERBOSE_MODE=true):**
+1. Move intermediate files to permanent locations:
+   - `_processing/chunks/` → `chunks/`
+   - `_processing/pages/` → `pages/`
+   - `_processing/images/` → `images/`
+   - `_processing/debug/` → `debug/`
+2. Keep all files for debugging/inspection
+```bash
+# Move to permanent locations
+mv _processing/chunks ./chunks 2>/dev/null
+mv _processing/pages ./pages 2>/dev/null
+mv _processing/images ./images 2>/dev/null
+mv _processing/debug ./debug 2>/dev/null
+# Remove empty processing directory
+rmdir _processing 2>/dev/null
+```
+**Final output structure:**
+Default (clean):
+```
+<filename>_extracted/
+├── structure.json    # 28 KB - complete machine-readable data
+├── STRUCTURE.md      # 12 KB - human-readable summary
+└── images/           # Extracted figures (if any)
+```
+With --verbose:
+```
+<filename>_extracted/
+├── structure.json
+├── STRUCTURE.md
+├── images/
+├── chunks/           # Per-chunk JSON files
+├── pages/            # Per-page images
+└── debug/            # Processing logs
+```
+### Step 9: Display Completion
+**Default (clean) completion message:**
 ```
 ┌──────────────────────────────────────────────────────────────────────┐
 │  EXTRACTION COMPLETE                                                 │
@@ -300,9 +389,34 @@ Create STRUCTURE.md with human-readable format:
 │  Pages: {total} | Tables: {count} | Figures: {count}                 │
 │  Average Confidence: {score}                                         │
 │                                                                      │
+│  Output (2 files):                                                   │
+│    {output_dir}/structure.json    (machine-readable)                 │
+│    {output_dir}/STRUCTURE.md      (human-readable)                   │
+│                                                                      │
+│  Low Confidence Items: {count}                                       │
+│  {List any elements with confidence < 0.8}                           │
+│                                                                      │
+│  Tip: Use --verbose to keep intermediate files                       │
+│                                                                      │
+└──────────────────────────────────────────────────────────────────────┘
+```
+**Verbose completion message:**
+```
+┌──────────────────────────────────────────────────────────────────────┐
+│  EXTRACTION COMPLETE (verbose mode)                                  │
+├──────────────────────────────────────────────────────────────────────┤
+│                                                                      │
+│  Source: {filename}                                                  │
+│  Pages: {total} | Tables: {count} | Figures: {count}                 │
+│  Average Confidence: {score}                                         │
+│                                                                      │
 │  Output:                                                             │
-│    {output_dir}/structure.json                                       │
-│    {output_dir}/STRUCTURE.md                                         │
+│    {output_dir}/structure.json    (final merged JSON)                │
+│    {output_dir}/STRUCTURE.md      (human-readable summary)           │
+│    {output_dir}/chunks/           (intermediate chunk files)         │
+│    {output_dir}/pages/            (per-page images)                  │
+│    {output_dir}/images/           (extracted figures)                │
 │                                                                      │
 │  Low Confidence Items: {count}                                       │
 │  {List any elements with confidence < 0.8}                           │
@@ -324,3 +438,6 @@ Create STRUCTURE.md with human-readable format:
 - Each chunk agent has 200K context - plenty for 5 pages
 - Chunks preserve figure-caption relationships (usually within same chunk)
 - Edge cases (figure on page 5, caption on page 6) are rare but detectable
+- **Default output is clean** (~40 KB total): structure.json + STRUCTURE.md + images/
+- Use `--verbose` to keep all intermediate files for debugging
+- Intermediate files are processed in `_processing/` directory during extraction
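
The `--verbose` handling and the Step 8 cleanup added in this version can be sketched in Python as follows. This is only an illustration under stated assumptions: the package ships a prompt file, not this code, and the function names `parse_arguments` and `finalize_output` are hypothetical. The sketch also assumes the permanent target directories (`images/`, `chunks/`, and so on) do not exist yet when the move happens.

```python
import shlex
import shutil
from pathlib import Path

# Subdirectories of _processing/ named in the diff.
INTERMEDIATE_DIRS = ("chunks", "pages", "images", "debug")


def parse_arguments(raw_args: str) -> tuple[str, bool]:
    """Split the raw argument string into (document_path, verbose_mode).

    Hypothetical helper: --verbose may appear anywhere and is removed
    from the path, as the Flags section describes.
    """
    tokens = shlex.split(raw_args)
    verbose = "--verbose" in tokens
    path_tokens = [t for t in tokens if t != "--verbose"]
    return " ".join(path_tokens), verbose


def finalize_output(output_dir: Path, verbose: bool) -> None:
    """Apply the Step 8 cleanup to <filename>_extracted/ (hypothetical helper)."""
    processing = output_dir / "_processing"
    if not processing.is_dir():
        return

    # Default mode keeps only images/; verbose mode keeps every subdirectory.
    keep = INTERMEDIATE_DIRS if verbose else ("images",)
    for name in keep:
        src = processing / name
        if src.is_dir():
            # Assumes output_dir/name does not exist; otherwise shutil.move
            # would nest src inside the existing directory.
            shutil.move(str(src), str(output_dir / name))

    if verbose:
        # Mirror `rmdir`: remove _processing/ only if it is now empty.
        try:
            processing.rmdir()
        except OSError:
            pass
    else:
        # Mirror `rm -rf _processing/`.
        shutil.rmtree(processing)
```

Calling `parse_arguments("report.pdf --verbose")` yields the path `report.pdf` with verbose mode on, and `finalize_output` then decides which subdirectories of `_processing/` survive next to `structure.json` and `STRUCTURE.md`.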

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "structurecc",
-  "version": "3.1.0",
+  "version": "3.2.0",
   "description": "Claude Code plugin for extracting structured data from documents using native vision and parallel Task agents",
   "author": "UTMB Diagnostic Center",
   "license": "MIT",