structurecc 3.0.1 → 3.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  ---
  description: Extract structured data from documents (PDF, DOCX, images) using Claude vision
- argument-hint: <path>
+ argument-hint: <path> [--verbose]
  ---

  # Document Structure Extraction
@@ -11,13 +11,23 @@ You are extracting structured data from a document using Claude's native vision

  **Document path:** $ARGUMENTS

+ ## Flags
+
+ - `--verbose` - Keep all intermediate files (chunks/, pages/, debug logs). Default behavior is clean output only.
+
+ Parse the arguments to detect `--verbose`:
+ - If `--verbose` is present anywhere in $ARGUMENTS, set VERBOSE_MODE=true
+ - Remove `--verbose` from the path to get the actual document path
+ - Default: VERBOSE_MODE=false (clean output)
+
  ## Workflow

  ### Step 1: Validate Input

- 1. Check if the file exists at the provided path
- 2. Determine the file type (PDF, DOCX, PNG, JPG, TIFF, etc.)
- 3. If the path is invalid, inform the user and stop
+ 1. Parse arguments for `--verbose` flag
+ 2. Check if the file exists at the provided path
+ 3. Determine the file type (PDF, DOCX, PNG, JPG, TIFF, etc.)
+ 4. If the path is invalid, inform the user and stop

  ### Step 2: Determine Processing Strategy

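The `--verbose` rule in this hunk (detect the token anywhere in `$ARGUMENTS`, then strip it to recover the path) can be sketched in JavaScript, the language of the package's `install.js`. `parseStructureArgs` and `VERBOSE_TOKEN` are hypothetical names, since the command file states the rule only in prose. The sketch assumes the flag appears as its own whitespace-separated token, so paths containing spaces, or file names that merely start with `--verbose`, are left intact.

```javascript
// Hypothetical helper: the command file describes this rule in prose only.
// Matches `--verbose` only as a standalone token (start of string or
// whitespace before it, whitespace or end of string after it).
const VERBOSE_TOKEN = /(^|\s)--verbose(?=\s|$)/;

function parseStructureArgs(rawArguments) {
  // Non-global regex, so .test() carries no lastIndex state between calls.
  const verbose = VERBOSE_TOKEN.test(rawArguments);
  // Remove every occurrence of the flag (plus the whitespace before it) and
  // trim; whatever remains is treated as the document path.
  const documentPath = rawArguments
    .replace(new RegExp(VERBOSE_TOKEN.source, 'g'), '')
    .trim();
  return { verbose, documentPath };
}

console.log(parseStructureArgs('reports/q3 summary.pdf --verbose'));
// → { verbose: true, documentPath: 'reports/q3 summary.pdf' }
```

Matching on token boundaries rather than a plain substring check is the main design choice: `rawArguments.includes('--verbose')` would also fire on a file named `--verbose-notes.pdf`.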
@@ -40,13 +50,28 @@ Based on document type:
  ### Step 3: Create Output Directory

  Create output directory structure:
+
+ **Default (clean output):**
+ ```
+ <source_dir>/<filename>_extracted/
+ ├── structure.json # Final merged JSON (machine-readable)
+ ├── STRUCTURE.md # Human-readable markdown summary
+ └── images/ # Extracted figures (if any)
+ ```
+
+ **With --verbose (keeps intermediates):**
  ```
  <source_dir>/<filename>_extracted/
- ├── chunks/ # Individual chunk JSON files
  ├── structure.json # Final merged JSON
- └── STRUCTURE.md # Human-readable markdown
+ ├── STRUCTURE.md # Human-readable markdown
+ ├── images/ # Extracted figures (if any)
+ ├── chunks/ # Individual chunk JSON files
+ ├── pages/ # Per-page PNG images (if generated)
+ └── debug/ # Processing logs
  ```

+ During processing, create a temporary `_processing/` subdirectory for intermediate files. This will be cleaned up at the end (unless --verbose).
+
  ### Step 4: Launch Chunk Agents (Parallel)

  For each chunk, launch a Task agent with subagent_type="general-purpose":
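The Step 3 layout, together with the temporary `_processing/` directory, could be prepared along the following lines. This is a sketch under one stated assumption: `<filename>` is taken to be the document's base name without its extension (the hunk does not say whether the extension is kept). `prepareOutputDir` is a hypothetical helper. It only creates `_processing/chunks/` up front, because the final files are written later.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper. Assumes <filename> is the base name without its
// extension, e.g. /data/report.pdf -> /data/report_extracted/.
function prepareOutputDir(documentPath) {
  const { dir, name } = path.parse(path.resolve(documentPath));
  const outputDir = path.join(dir, `${name}_extracted`);
  const processingDir = path.join(outputDir, '_processing');
  // recursive: true also creates outputDir and does not fail if it exists,
  // so re-running an extraction on the same document is safe.
  fs.mkdirSync(path.join(processingDir, 'chunks'), { recursive: true });
  return { outputDir, processingDir };
}
```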
@@ -57,7 +82,7 @@ Each agent receives:
  1. The document path
  2. Their assigned page range (e.g., pages 1-5)
  3. The chunk extractor prompt (embedded below)
- 4. Output path for their chunk JSON
+ 4. Output path for their chunk JSON (write to `_processing/chunks/` subdirectory)

  **Chunk Extractor Prompt for Agents:**

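Each agent receives a page range such as "pages 1-5", and the notes at the end of this file's diff size chunks at 5 pages. A minimal sketch of that split follows, assuming fixed-size consecutive chunks with a shorter final chunk; `pageRanges` is a hypothetical helper, and the command file itself does not spell out how the last partial chunk is handled.

```javascript
// Hypothetical helper: consecutive, 1-based, inclusive page ranges of at
// most `pagesPerChunk` pages (5 by default, matching "pages 1-5").
function pageRanges(totalPages, pagesPerChunk = 5) {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError('pagesPerChunk must be a positive integer');
  }
  const ranges = [];
  for (let start = 1; start <= totalPages; start += pagesPerChunk) {
    // The last chunk is cut short at the final page.
    ranges.push({ start, end: Math.min(start + pagesPerChunk - 1, totalPages) });
  }
  return ranges;
}

// A 12-page PDF yields three chunks: pages 1-5, 6-10 and 11-12.
console.log(pageRanges(12));
```

Each range would then be handed to one Task agent, so the number of parallel agents is `Math.ceil(totalPages / pagesPerChunk)`.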
@@ -207,13 +232,13 @@ Write your JSON to: {output_path}
  ### Step 5: Wait for Chunk Agents

  After launching all chunk agents, wait for them to complete.
- Each agent will write their chunk JSON to the chunks/ directory.
+ Each agent will write their chunk JSON to the `_processing/chunks/` directory.

  ### Step 6: Merge Chunks

  Once all chunks are complete:

- 1. Read all chunk JSON files from chunks/ directory
+ 1. Read all chunk JSON files from `_processing/chunks/` directory
  2. Merge into single structure with page offset correction:

  ```python
@@ -289,8 +314,72 @@ Create STRUCTURE.md with human-readable format:
  ---
  ```

- ### Step 8: Display Completion
+ ### Step 8: Clean Up Intermediate Files
+
+ After generating the final outputs, clean up intermediate files **unless --verbose flag was provided**.
+
+ **Default behavior (VERBOSE_MODE=false):**
+
+ 1. Move any extracted images from `_processing/` to `images/` directory
+ 2. Delete the entire `_processing/` directory and its contents:
+    - `_processing/chunks/` - intermediate chunk JSON files
+    - `_processing/pages/` - per-page images (if generated)
+    - `_processing/debug/` - any debug logs
+
+ ```bash
+ # Move images if they exist
+ if [ -d "_processing/images" ]; then
+   mv _processing/images ./images
+ fi
+
+ # Remove processing directory
+ rm -rf _processing/
+ ```
+
+ **Verbose behavior (VERBOSE_MODE=true):**
+
+ 1. Move intermediate files to permanent locations:
+    - `_processing/chunks/` → `chunks/`
+    - `_processing/pages/` → `pages/`
+    - `_processing/images/` → `images/`
+    - `_processing/debug/` → `debug/`
+ 2. Keep all files for debugging/inspection
+
+ ```bash
+ # Move to permanent locations
+ mv _processing/chunks ./chunks 2>/dev/null
+ mv _processing/pages ./pages 2>/dev/null
+ mv _processing/images ./images 2>/dev/null
+ mv _processing/debug ./debug 2>/dev/null
+
+ # Remove empty processing directory
+ rmdir _processing 2>/dev/null
+ ```
+
+ **Final output structure:**

+ Default (clean):
+ ```
+ <filename>_extracted/
+ ├── structure.json # 28 KB - complete machine-readable data
+ ├── STRUCTURE.md # 12 KB - human-readable summary
+ └── images/ # Extracted figures (if any)
+ ```
+
+ With --verbose:
+ ```
+ <filename>_extracted/
+ ├── structure.json
+ ├── STRUCTURE.md
+ ├── images/
+ ├── chunks/ # Per-chunk JSON files
+ ├── pages/ # Per-page images
+ └── debug/ # Processing logs
+ ```
+
+ ### Step 9: Display Completion
+
+ **Default (clean) completion message:**
  ```
  ┌──────────────────────────────────────────────────────────────────────┐
  │ EXTRACTION COMPLETE │
@@ -300,9 +389,34 @@ Create STRUCTURE.md with human-readable format:
  │ Pages: {total} | Tables: {count} | Figures: {count} │
  │ Average Confidence: {score} │
  │ │
+ │ Output (2 files): │
+ │ {output_dir}/structure.json (machine-readable) │
+ │ {output_dir}/STRUCTURE.md (human-readable) │
+ │ │
+ │ Low Confidence Items: {count} │
+ │ {List any elements with confidence < 0.8} │
+ │ │
+ │ Tip: Use --verbose to keep intermediate files │
+ │ │
+ └──────────────────────────────────────────────────────────────────────┘
+ ```
+
+ **Verbose completion message:**
+ ```
+ ┌──────────────────────────────────────────────────────────────────────┐
+ │ EXTRACTION COMPLETE (verbose mode) │
+ ├──────────────────────────────────────────────────────────────────────┤
+ │ │
+ │ Source: {filename} │
+ │ Pages: {total} | Tables: {count} | Figures: {count} │
+ │ Average Confidence: {score} │
+ │ │
  │ Output: │
- │ {output_dir}/structure.json
- │ {output_dir}/STRUCTURE.md
+ │ {output_dir}/structure.json (final merged JSON)
+ │ {output_dir}/STRUCTURE.md (human-readable summary)
+ │ {output_dir}/chunks/ (intermediate chunk files) │
+ │ {output_dir}/pages/ (per-page images) │
+ │ {output_dir}/images/ (extracted figures) │
  │ │
  │ Low Confidence Items: {count} │
  │ {List any elements with confidence < 0.8} │
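Both completion boxes report table and figure counts, an average confidence, and the elements whose confidence is below 0.8. A minimal sketch of computing those figures follows, assuming a flat list of elements shaped `{ id, type, confidence }`. That shape, the `summarize` name, and the `'table'`/`'figure'` type strings are hypothetical, because the `structure.json` schema is not part of this diff.

```javascript
// Hypothetical element shape: { id, type, confidence }. The real
// structure.json schema is not shown in this diff.
const LOW_CONFIDENCE_THRESHOLD = 0.8; // "confidence < 0.8" in the message box

function summarize(elements) {
  const countOf = (type) => elements.filter((e) => e.type === type).length;
  const total = elements.reduce((sum, e) => sum + e.confidence, 0);
  return {
    tables: countOf('table'),
    figures: countOf('figure'),
    // null for an empty document instead of dividing by zero (NaN).
    averageConfidence: elements.length ? total / elements.length : null,
    // Strictly below the threshold, so an element at exactly 0.8 is not listed.
    lowConfidence: elements.filter((e) => e.confidence < LOW_CONFIDENCE_THRESHOLD),
  };
}
```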
@@ -324,3 +438,6 @@ Create STRUCTURE.md with human-readable format:
  - Each chunk agent has 200K context - plenty for 5 pages
  - Chunks preserve figure-caption relationships (usually within same chunk)
  - Edge cases (figure on page 5, caption on page 6) are rare but detectable
+ - **Default output is clean** (~40 KB total): structure.json + STRUCTURE.md + images/
+ - Use `--verbose` to keep all intermediate files for debugging
+ - Intermediate files are processed in `_processing/` directory during extraction
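The Step 8 cleanup can also be expressed in Node, which makes two details explicit that the bash snippets leave implicit. First, the snippets use relative paths, so they only work when run from inside the output directory. Second, `mv _processing/images ./images` moves the source *into* `./images` when that directory already exists, which would produce `images/images/`. The sketch below resolves every path against the output directory and merges into an existing target instead. `moveInto` and `cleanupOutput` are hypothetical names; unlike the verbose bash branch, which only runs `rmdir` on an empty `_processing/`, this version deletes any leftovers. `fs.rmSync` requires Node.js 14.14 or later.

```javascript
const fs = require('fs');
const path = require('path');

// Move `src` to `dest`. If `dest` already exists, move the entries of `src`
// into it one by one instead of nesting `src` inside `dest`.
// Assumes entry names do not collide with existing subdirectories of `dest`.
function moveInto(src, dest) {
  if (!fs.existsSync(src)) return; // nothing was produced for this folder
  if (!fs.existsSync(dest)) {
    fs.renameSync(src, dest);
    return;
  }
  for (const entry of fs.readdirSync(src)) {
    fs.renameSync(path.join(src, entry), path.join(dest, entry));
  }
  fs.rmdirSync(src); // empty at this point
}

// Keep figures in both modes, keep chunks/pages/debug only in verbose mode,
// then delete whatever remains under _processing/.
function cleanupOutput(outputDir, verbose) {
  const processingDir = path.join(outputDir, '_processing');
  moveInto(path.join(processingDir, 'images'), path.join(outputDir, 'images'));
  if (verbose) {
    for (const sub of ['chunks', 'pages', 'debug']) {
      moveInto(path.join(processingDir, sub), path.join(outputDir, sub));
    }
  }
  // force: true makes this a no-op if _processing/ is already gone.
  fs.rmSync(processingDir, { recursive: true, force: true });
}
```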
package/install.js CHANGED
@@ -4,17 +4,8 @@ const fs = require('fs');
  const path = require('path');
  const os = require('os');

- const PLUGIN_NAME = 'structurecc';
- const DEST_DIR = path.join(os.homedir(), '.claude', 'plugins', PLUGIN_NAME);
-
- // Files to copy
- const FILES = [
-   '.claude-plugin/plugin.json',
-   'commands/structure.md',
-   'commands/structure-batch.md',
-   'prompts/chunk-extractor.md',
-   'README.md'
- ];
+ const CLAUDE_DIR = path.join(os.homedir(), '.claude');
+ const COMMANDS_DIR = path.join(CLAUDE_DIR, 'commands');

  function copyFile(src, dest) {
    const destDir = path.dirname(dest);
@@ -31,39 +22,44 @@ function install() {
  ║ STRUCTURECC ║
  ║ Document Structure Extraction ║
  ║ ║
- ║ Claude Code Plugin | v3.0
+ ║ Claude Code Plugin | v3.1
  ║ ║
  ╚═══════════════════════════════════════════════════╝
  `);

    const sourceDir = __dirname;

-   console.log(`Installing to: ${DEST_DIR}\n`);
-
-   // Create destination directory
-   if (!fs.existsSync(DEST_DIR)) {
-     fs.mkdirSync(DEST_DIR, { recursive: true });
+   // Ensure commands directory exists
+   if (!fs.existsSync(COMMANDS_DIR)) {
+     fs.mkdirSync(COMMANDS_DIR, { recursive: true });
    }

-   // Copy each file
+   console.log(`Installing to: ${COMMANDS_DIR}\n`);
+
+   // Copy command files to ~/.claude/commands/
+   const commands = [
+     { src: 'commands/structure.md', dest: 'structure.md' },
+     { src: 'commands/structure-batch.md', dest: 'structure-batch.md' }
+   ];
+
    let copied = 0;
-   for (const file of FILES) {
-     const src = path.join(sourceDir, file);
-     const dest = path.join(DEST_DIR, file);
+   for (const cmd of commands) {
+     const src = path.join(sourceDir, cmd.src);
+     const dest = path.join(COMMANDS_DIR, cmd.dest);

      if (fs.existsSync(src)) {
        copyFile(src, dest);
-       console.log(` ✓ ${file}`);
+       console.log(` ✓ ${cmd.dest}`);
        copied++;
      } else {
-       console.log(` ✗ ${file} (not found)`);
+       console.log(` ✗ ${cmd.dest} (source not found)`);
      }
    }

    console.log(`
  ────────────────────────────────────────────────────────────────────────

- Installation complete! ${copied}/${FILES.length} files installed.
+ Installation complete! ${copied}/2 commands installed.

  USAGE:
  /structure <path> Extract from a single document
@@ -81,6 +77,8 @@ EXAMPLE:

  Output: JSON + Markdown in same directory as source document

+ NOTE: Restart Claude Code for commands to appear.
+
  ────────────────────────────────────────────────────────────────────────
  `);
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "structurecc",
-   "version": "3.0.1",
+   "version": "3.2.0",
    "description": "Claude Code plugin for extracting structured data from documents using native vision and parallel Task agents",
    "author": "UTMB Diagnostic Center",
    "license": "MIT",