@yeyuan98/opencode-bioresearcher-plugin 1.4.1 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. package/README.md +49 -22
  2. package/dist/db-tools/backends/index.d.ts +11 -0
  3. package/dist/db-tools/backends/index.js +48 -0
  4. package/dist/db-tools/backends/mongodb/backend.d.ts +15 -0
  5. package/dist/db-tools/backends/mongodb/backend.js +76 -0
  6. package/dist/db-tools/backends/mongodb/connection.d.ts +27 -0
  7. package/dist/db-tools/backends/mongodb/connection.js +107 -0
  8. package/dist/db-tools/backends/mongodb/index.d.ts +4 -0
  9. package/dist/db-tools/backends/mongodb/index.js +3 -0
  10. package/dist/db-tools/backends/mongodb/translator.d.ts +30 -0
  11. package/dist/db-tools/backends/mongodb/translator.js +407 -0
  12. package/dist/db-tools/backends/mysql/backend.d.ts +15 -0
  13. package/dist/db-tools/backends/mysql/backend.js +57 -0
  14. package/dist/db-tools/backends/mysql/connection.d.ts +25 -0
  15. package/dist/db-tools/backends/mysql/connection.js +83 -0
  16. package/dist/db-tools/backends/mysql/index.d.ts +3 -0
  17. package/dist/db-tools/backends/mysql/index.js +2 -0
  18. package/dist/db-tools/backends/mysql/translator.d.ts +7 -0
  19. package/dist/db-tools/backends/mysql/translator.js +67 -0
  20. package/dist/db-tools/core/base.d.ts +17 -0
  21. package/dist/db-tools/core/base.js +51 -0
  22. package/dist/db-tools/core/config-loader.d.ts +3 -0
  23. package/dist/db-tools/core/config-loader.js +46 -0
  24. package/dist/db-tools/core/index.d.ts +2 -0
  25. package/dist/db-tools/core/index.js +2 -0
  26. package/dist/db-tools/core/jsonc-parser.d.ts +2 -0
  27. package/dist/db-tools/core/jsonc-parser.js +77 -0
  28. package/dist/db-tools/core/validator.d.ts +16 -0
  29. package/dist/db-tools/core/validator.js +118 -0
  30. package/dist/db-tools/executor.d.ts +13 -0
  31. package/dist/db-tools/executor.js +54 -0
  32. package/dist/db-tools/index.d.ts +51 -0
  33. package/dist/db-tools/index.js +27 -0
  34. package/dist/db-tools/interface/backend.d.ts +24 -0
  35. package/dist/db-tools/interface/backend.js +1 -0
  36. package/dist/db-tools/interface/connection.d.ts +21 -0
  37. package/dist/db-tools/interface/connection.js +11 -0
  38. package/dist/db-tools/interface/index.d.ts +4 -0
  39. package/dist/db-tools/interface/index.js +4 -0
  40. package/dist/db-tools/interface/query.d.ts +60 -0
  41. package/dist/db-tools/interface/query.js +1 -0
  42. package/dist/db-tools/interface/schema.d.ts +22 -0
  43. package/dist/db-tools/interface/schema.js +1 -0
  44. package/dist/db-tools/pool.d.ts +8 -0
  45. package/dist/db-tools/pool.js +49 -0
  46. package/dist/db-tools/tools/index.d.ts +27 -0
  47. package/dist/db-tools/tools/index.js +191 -0
  48. package/dist/db-tools/tools.d.ts +27 -0
  49. package/dist/db-tools/tools.js +111 -0
  50. package/dist/db-tools/types.d.ts +94 -0
  51. package/dist/db-tools/types.js +40 -0
  52. package/dist/db-tools/utils.d.ts +33 -0
  53. package/dist/db-tools/utils.js +94 -0
  54. package/dist/index.js +5 -1
  55. package/dist/parser-tools/obo/index.d.ts +2 -0
  56. package/dist/parser-tools/obo/index.js +2 -0
  57. package/dist/parser-tools/obo/obo.d.ts +17 -0
  58. package/dist/parser-tools/obo/obo.js +216 -0
  59. package/dist/parser-tools/obo/types.d.ts +166 -0
  60. package/dist/parser-tools/obo/types.js +1 -0
  61. package/dist/parser-tools/obo/utils.d.ts +21 -0
  62. package/dist/parser-tools/obo/utils.js +411 -0
  63. package/dist/skills/env-jsonc-setup/SKILL.md +206 -0
  64. package/dist/skills/long-table-summary/SKILL.md +437 -374
  65. package/dist/skills/long-table-summary/combine_outputs.py +5 -14
  66. package/dist/skills/long-table-summary/generate_prompts.py +211 -0
  67. package/dist/skills/long-table-summary/pyproject.toml +8 -11
  68. package/package.json +3 -1
  69. package/dist/skills/long-table-summary/__init__.py +0 -3
@@ -1,374 +1,437 @@
1
- ---
2
- name: long-table-summary
3
- description: Batch-process large tables using parallel subagents for summarization
4
- allowedTools:
5
- - Bash
6
- - Read
7
- - Write
8
- - Question
9
- - Task
10
- - tableListSheets
11
- - tableGetSheetPreview
12
- - tableGetHeaders
13
- - tableGetRange
14
- - blockingTimer
15
- dependencies:
16
- - bioresearcher-core
17
- ---
18
-
19
- # Long Table Summary
20
-
21
- Batch-process large tables (xlsx, csv) using parallel subagents.
22
-
23
- ## bioresearcher-core Dependencies
24
-
25
- This skill uses these resources from `bioresearcher-core`:
26
-
27
- **Python Utility:**
28
- | Script | Usage | Step |
29
- |--------|-------|------|
30
- | `python/template.py` | Generate batch prompts from template + contexts | Step 6 |
31
-
32
- **Patterns:**
33
- | Pattern | Usage | Step |
34
- |---------|-------|------|
35
- | `patterns/subagent-waves.md` | Launch subagents in parallel waves | Step 7 |
36
- | `patterns/progress.md` | Report processing progress | Step 8 |
37
- | `patterns/retry.md` | Retry failed batches with backoff | Step 9 |
38
-
39
- **Load bioresearcher-core first:**
40
- ```
41
- skill bioresearcher-core
42
- ```
43
-
44
- Extract `<core_skill_path>` from the skill tool output.
45
-
46
- ---
47
-
48
- ## Workflow
49
-
50
- ```
51
- Step 1: File Discovery & Validation
52
- Step 2: Sheet Selection (conditional)
53
- Step 3: Summarization Instructions
54
- Step 4: Auto-Configuration
55
- Step 5: Environment Setup
56
- Step 6: Generate Prompts
57
- Step 7: Launch Subagents
58
- Step 8: Monitor Progress
59
- Step 9: Auto-Retry & Notify
60
- Step 10: Combine & Report
61
- ```
62
-
63
- ---
64
-
65
- ## Steps
66
-
67
- ### Step 1: File Discovery & Validation
68
-
69
- Use `question` tool to ask for the table file path:
70
-
71
- **Question:**
72
- - "What is the full path to the table file you want to process?" (supports .xlsx, .csv, .ods)
73
-
74
- After receiving the path, immediately verify and fetch metadata:
75
- 1. Use `tableListSheets` to verify file exists and list sheets
76
- 2. Use `tableGetSheetPreview` to get row count
77
- 3. Use `tableGetHeaders` to get column structure
78
-
79
- **Error handling:**
80
- - If file doesn't exist: "File not found. Please verify the path." Then re-prompt in same interaction.
81
- - If row_count <= 1: "This table has no data rows (only header found)." Then re-prompt.
82
-
83
- ---
84
-
85
- ### Step 2: Sheet Selection (Conditional)
86
-
87
- **Skip this step for CSV files.**
88
-
89
- For Excel (.xlsx) and ODS (.ods) files with multiple sheets, use `question` tool:
90
-
91
- **Question:**
92
- - "This file has multiple sheets. Which sheet would you like to process?"
93
-
94
- **Options:** List all available sheet names.
95
-
96
- For CSV files or single-sheet files: Use the sheet name from `tableListSheets` automatically.
97
-
98
- ---
99
-
100
- ### Step 3: Summarization Instructions
101
-
102
- Use `question` tool to request JSON format summarization requirements.
103
-
104
- **Question:**
105
- - "Please provide your summarization requirements in JSON format. Each key is the output field name, each value describes what to extract."
106
-
107
- **Show example:**
108
- ```json
109
- {
110
- "species": "Species: Tier1 (human/monkey), Tier2 (other animals), or NA",
111
- "topic": "Main topic: Oncology, Immunology, General Biology, or Other"
112
- }
113
- ```
114
-
115
- **After receiving JSON:**
116
- 1. Parse and validate JSON format
117
- 2. Show formatted JSON back to user
118
- 3. Ask single confirmation: "Is this correct?" (Yes/No)
119
- 4. If No: User provides corrected JSON in full
120
- 5. Repeat until confirmed
121
-
122
- ---
123
-
124
- ### Step 4: Auto-Configuration
125
-
126
- Automatically configure (no user questions):
127
-
128
- 1. **Topic name:** Filename without extension
129
- - Example: `clinical_trials_2024_data.xlsx` → `clinical_trials_2024_data`
130
-
131
- 2. **Batch size:** Default 30 rows per batch
132
-
133
- 3. **Calculate batches:**
134
- - Data rows = total_rows - 1 (excluding header)
135
- - num_batches = ceil(data_rows / 30)
136
-
137
- 4. **Report to user:**
138
- - "Processing {data_rows} data rows in {num_batches} batches"
139
-
140
- ---
141
-
142
- ### Step 5: Environment Setup
143
-
144
- Create directory structure:
145
-
146
- ```bash
147
- mkdir -p .long-table-summary/{topic}/prompts
148
- mkdir -p .long-table-summary/{topic}/outputs
149
- ```
150
-
151
- **Write `subagent_template.md`:**
152
-
153
- ```markdown
154
- # Batch Data Summarization Task
155
-
156
- ## Input File
157
- - Full path: `{file_path}`
158
- - Sheet name: `{sheet_name}`
159
-
160
- ## Row Range
161
- - Batch number: {batch_number}
162
- - Start row: {row_start}
163
- - End row: {row_end}
164
-
165
- ## Summarization Instructions
166
-
167
- Extract the following fields from each row:
168
-
169
- {instructions_json}
170
-
171
- ## Output Format
172
-
173
- Your output must be a valid JSON file with this structure:
174
-
175
- ```json
176
- {
177
- "batch_number": {batch_number},
178
- "row_count": <number_of_rows_processed>,
179
- "summaries": [
180
- {
181
- "row_number": <row_number>,
182
- <field_1>: "<extracted_value>",
183
- <field_2>: "<extracted_value>"
184
- }
185
- ]
186
- }
187
- ```
188
-
189
- **Important:** JSON keys must match field names in Summarization Instructions.
190
-
191
- ## Instructions
192
-
193
- 1. Read the specified row range using `tableGetRange` tool
194
- 2. For each row, extract the requested fields
195
- 3. Save output to: `{output_file}`
196
-
197
- ## Output File Path
198
- Full path: `{output_file}`
199
-
200
- **CRITICAL:** Write final output as markdown (.md) containing ONLY the JSON object.
201
- ```
202
-
203
- **Write `contexts.json`:**
204
-
205
- ```json
206
- [
207
- {
208
- "batch_number": 1,
209
- "row_start": 2,
210
- "row_end": 31,
211
- "file_path": "<absolute_path>",
212
- "sheet_name": "<sheet_name>",
213
- "output_file": ".long-table-summary/<topic>/outputs/batch001.md",
214
- "instructions_json": "<user_json_escaped>"
215
- }
216
- ]
217
- ```
218
-
219
- Calculate contexts for all batches. Row 1 is header, data starts at row 2.
220
-
221
- ---
222
-
223
- ### Step 6: Generate Prompts
224
-
225
- **Utility:** `<core_skill_path>/python/template.py`
226
-
227
- Use the template.py script from bioresearcher-core.
228
-
229
- **Unix-like shells:**
230
- ```bash
231
- uv run python <core_skill_path>/python/template.py generate-batches \
232
- --template .long-table-summary/{topic}/subagent_template.md \
233
- --contexts .long-table-summary/{topic}/contexts.json \
234
- --output-dir .long-table-summary/{topic}/prompts \
235
- --filename-pattern "batch{index:03d}.md" \
236
- --escape
237
- ```
238
-
239
- **Windows cmd.exe:**
240
- ```bash
241
- uv.exe run python <core_skill_path>\python\template.py generate-batches ^
242
- --template .long-table-summary\{topic}\subagent_template.md ^
243
- --contexts .long-table-summary\{topic}\contexts.json ^
244
- --output-dir .long-table-summary\{topic}\prompts ^
245
- --filename-pattern "batch{index:03d}.md" ^
246
- --escape
247
- ```
248
-
249
- ---
250
-
251
- ### Step 7: Launch Subagents
252
-
253
- **Pattern:** `<core_skill_path>/patterns/subagent-waves.md`
254
-
255
- Launch in waves of 3 using Task tool.
256
-
257
- **Wave pattern:**
258
- ```
259
- task(subagent_type="general", description="Process batch 001", prompt="Read .long-table-summary/{topic}/prompts/batch001.md and execute the task exactly as written.")
260
- task(subagent_type="general", description="Process batch 002", prompt="Read .long-table-summary/{topic}/prompts/batch002.md and execute the task exactly as written.")
261
- task(subagent_type="general", description="Process batch 003", prompt="Read .long-table-summary/{topic}/prompts/batch003.md and execute the task exactly as written.")
262
- ```
263
-
264
- Wait for wave to complete, then launch next wave.
265
-
266
- **Important:** Always direct subagents to read prompt files. Never pass full prompts inline.
267
-
268
- ---
269
-
270
- ### Step 8: Monitor Progress
271
-
272
- **Pattern:** `<core_skill_path>/patterns/progress.md`
273
-
274
- Report every 3 completions:
275
-
276
- - "Progress: 3/{num_batches} batches (30%)"
277
- - "Progress: 6/{num_batches} batches (60%)"
278
- - "Progress: {num_batches}/{num_batches} batches (100%)"
279
-
280
- Do NOT inspect individual outputs during processing.
281
-
282
- ---
283
-
284
- ### Step 9: Auto-Retry & Notify
285
-
286
- **Pattern:** `<core_skill_path>/patterns/retry.md`
287
-
288
- After all batches complete, check for missing outputs:
289
-
290
- ```bash
291
- ls .long-table-summary/{topic}/outputs/
292
- ```
293
-
294
- **Auto-retry logic:**
295
-
296
- 1. Detect failures: Compare expected batch files vs actual files
297
- 2. For each failed batch:
298
- - Use `blockingTimer(delay=2)` to wait 2 seconds
299
- - Re-launch with same prompt
300
- - Up to 3 attempts per batch
301
- 3. Track retry count
302
- 4. Only notify if failures persist after all retries:
303
- - "Warning: {count} batches failed after 3 attempts: {batch_list}"
304
- - "Proceeding with available data."
305
-
306
- ---
307
-
308
- ### Step 10: Combine & Report
309
-
310
- Use `combine_outputs.py` from this skill.
311
-
312
- **Extract `<skill_path>` from this skill's tool output.**
313
-
314
- **Unix-like shells:**
315
- ```bash
316
- uv run python <skill_path>/combine_outputs.py \
317
- --input-dir .long-table-summary/{topic}/outputs \
318
- --output-file .long-table-summary/{topic}/combined_summary.xlsx
319
- ```
320
-
321
- **Windows cmd.exe:**
322
- ```bash
323
- uv.exe run python <skill_path>\combine_outputs.py ^
324
- --input-dir .long-table-summary\{topic}\outputs ^
325
- --output-file .long-table-summary\{topic}\combined_summary.xlsx
326
- ```
327
-
328
- **Final Report:**
329
-
330
- ```
331
- Processing Complete
332
- ━━━━━━━━━━━━━━━━━━━━
333
- Topic: {topic}
334
- Total batches: {num_batches}
335
- Successful: {success_count}
336
- Retries: {retry_count}
337
- Failed: {failed_count}
338
-
339
- Output: .long-table-summary/{topic}/combined_summary.xlsx
340
- Rows processed: {total_rows}
341
- Expected rows: {expected_rows}
342
- Completeness: {completeness}%
343
- ```
344
-
345
- ---
346
-
347
- ## Python Scripts
348
-
349
- ### combine_outputs.py
350
-
351
- Combines batch JSON outputs into a single Excel file.
352
-
353
- **Arguments:**
354
- - `--input-dir`: Directory containing batch*.md files (required)
355
- - `--output-file`: Path for output Excel file (required)
356
- - `--deduplicate`: Remove duplicate row numbers (optional)
357
- - `--column-order`: "preserve" or "alphabetical" (default: preserve)
358
- - `--verbose`: Enable debug output (optional)
359
- - `--dry-run`: Validate without writing (optional)
360
-
361
- **Output:**
362
- - Excel file with columns: row_number, <user_fields...>
363
- - Sorted by row_number ascending
364
-
365
- ---
366
-
367
- ## Notes
368
-
369
- - Default batch size: 30 rows
370
- - Topic name: filename without extension
371
- - Subagent type: "general"
372
- - Wave size: 3 subagents
373
- - Retry attempts: 3 per failed batch
374
- - Retry delay: 2 seconds
1
+ ---
2
+ name: long-table-summary
3
+ description: Batch-process large tables using parallel subagents for summarization
4
+ allowedTools:
5
+ - Bash
6
+ - Read
7
+ - Write
8
+ - Question
9
+ - Task
10
+ - tableListSheets
11
+ - tableGetSheetPreview
12
+ - tableGetHeaders
13
+ - tableGetRange
14
+ ---
15
+
16
+ # Long Table Summary
17
+
18
+ This skill enables batched processing of large tables (xlsx, csv) using parallel subagents.
19
+
20
+ ## Workflow Overview
21
+
22
+ 1. **Table Discovery**: Interview user to locate table file and confirm existence
23
+ 2. **Sheet Selection**: If multiple sheets, prompt user to choose one
24
+ 3. **Summarization Instructions**: Interview user for summary requirements (JSON format)
25
+ 4. **Instruction Refinement**: Iterate to refine summarization instructions
26
+ 5. **Batch Size Prompting**: Use `question` tool to ask user for batch size
27
+ 6. **Topic Generation**: Autogenerate topic name from filename + comprehension of JSON
28
+ 7. **Template Creation**: Draft subagent prompt template with JSON output schema
29
+ 8. **Template Writing**: Write finalized template
30
+ 9. **Prompt Generation**: Use Python script to generate batch-specific prompts
31
+ 10. **Parallel Processing**: Launch subagents in waves of 3
32
+ 11. **Progress Monitoring**: Report every 3 completed subagents
33
+ 12. **Retry Failed Batches**: Up to 3 retry attempts for failed batches
34
+ 13. **Output Combination**: Automatically combine all JSON outputs into single table
35
+
36
+ ## Steps
37
+
38
+ ### Step 1: Interview User for Table Location
39
+
40
+ Use `question` tool to ask for table file path:
41
+
42
+ **Question:**
43
+ - "What is the full path to the table file you want to process?" (supports .xlsx, .csv, .ods)
44
+
45
+ ### Step 2: Confirm Table Existence and List Sheets
46
+
47
+ Use `tableListSheets` tool to verify:
48
+
49
+ ```bash
50
+ tableListSheets(file_path="<user_provided_path>")
51
+ ```
52
+
53
+ If the file doesn't exist or is invalid, prompt user to verify the path.
54
+
55
+ ### Step 3: Handle Multiple Sheets (if applicable)
56
+
57
+ **Important:** CSV files have only one sheet. Always skip this step for CSV files.
58
+
59
+ If the file is an Excel (.xlsx) or ODS (.ods) file AND there is more than one sheet, use `question` tool to ask user to choose one.
60
+
61
+ For CSV files: Use the single sheet name automatically (either the filename returned by `tableListSheets` or the default "Sheet1") without asking the user.
62
+
63
+ ### Step 4: Get Table Metadata
64
+
65
+ Use `tableGetSheetPreview` and `tableGetHeaders` to get row count and column structure.
66
+
67
+ ### Step 5: Interview User for Summarization Instructions
68
+
69
+ Use `question` tool to ask user to provide summarization instructions in JSON format.
70
+
71
+ **Question:**
72
+ - "Please provide your summarization requirements in JSON format. For each field you want extracted, specify as key the field name, and as value a brief description of what it represents."
73
+
74
+ **Show example:**
75
+ ```json
76
+ {
77
+ "species": "Species, one of Tier1/Tier2/NA. Tier1 includes human and monkey, Tier2 includes other animals. NA otherwise.",
78
+ "topic": "Main topic, one of Oncology/Immunology/General Biology/Others."
79
+ }
80
+ ```
81
+
82
+ **Additional example:**
83
+ ```json
84
+ {
85
+ "gene_mutation": "Gene mutation pattern, e.g., V600E, R173, Wild Type",
86
+ "clinical_significance": "Clinical relevance, one of High/Medium/Low/Unknown",
87
+ "therapeutic_target": "Is this a drug target? Answer Yes/No/Unknown"
88
+ }
89
+ ```
90
+
91
+ **Instructions to user:**
92
+ - Provide any number of fields
93
+ - Each field should be a key (JSON property name) with a description as the value
94
+ - Use clear, specific descriptions that a subagent can interpret
95
+ - Include allowed values or examples if applicable
96
+
97
+ ### Step 6: Summarization Instruction Refinement
98
+
99
+ After receiving user's JSON instructions, iteratively refine them.
100
+
101
+ **Question:**
102
+ - "Here's your summarization instruction template. Would you like to modify any field descriptions or add new fields?"
103
+
104
+ **Show the current JSON to user**
105
+
106
+ If user selects "No, it's correct", proceed to Step 7.
107
+
108
+ If user selects "Yes, I want to modify":
109
+ - Ask which field to modify or add
110
+ - Update the JSON accordingly
111
+ - Repeat the approval question
112
+
113
+ Continue until user explicitly confirms that the instruction JSON is correct.
114
+
115
+ ### Step 7: Autogenerate Topic Name
116
+
117
+ Generate the topic name by combining:
118
+ - Base filename (without extension)
119
+ - Summary words derived from manual comprehension of user's JSON
120
+
121
+ **Algorithm:**
122
+ 1. Extract filename: `clinical_trials_2024_data.xlsx` → `clinical_trials_2024_data`
123
+ - **Note:** The filename may have more than 6 words; keep the full name
124
+ 2. Comprehend the overall content of user's JSON instructions:
125
+ - Read the JSON manually to understand what it's about
126
+ - Identify the main idea or subject matter
127
+ - Generate summary words (maximum 6 words, lowercase, hyphenated)
128
+ - Examples: `species-classification`, `gene-mutation`, `clinical-significance`
129
+ 3. Combine with hyphens: `clinical_trials_2024_data-species-classification`
130
+ - **Total words may exceed 6** (filename + summary words)
131
+ - **Only the summary words part has a 6-word limit**
132
+
133
+ **Examples:**
134
+ - `clinical_trials_2024_data.xlsx` + topic about species → `clinical_trials_2024_data-species-classification`
135
+ - `literature_review_comprehensive.csv` + gene mutation analysis → `literature_review_comprehensive-gene-mutation`
136
+ - `long_descriptive_filename_multiple_words.xlsx` + immunology → `long_descriptive_filename_multiple_words-immunology`
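The topic-name assembly described above can be sketched in Python (the `topic_name` helper is illustrative only, not part of the package):

```python
from pathlib import Path

def topic_name(file_path, summary_words):
    """Combine the base filename with up to 6 lowercase, hyphenated summary words."""
    base = Path(file_path).stem                      # filename without extension
    summary = "-".join(w.lower() for w in summary_words[:6])
    return f"{base}-{summary}"

print(topic_name("clinical_trials_2024_data.xlsx", ["species", "classification"]))
# clinical_trials_2024_data-species-classification
```

Note that only the summary suffix is capped at 6 words; the filename part is kept whole, matching the rule above.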
137
+
138
+ ### Step 8: Ask User for Batch Size
139
+
140
+ Use `question` tool to explicitly prompt user about batch size.
141
+
142
+ **Question:**
143
+ - "How many rows should each batch contain? Recommended default: 30 rows per batch."
144
+
145
+ Use the user's provided value. Calculate the number of batches needed as `ceil(data_rows / batch_size)`, where `data_rows = total_rows - 1` (the header row is excluded).
146
+
147
+ ### Step 9: Calculate Batch Ranges
148
+
149
+ Example for 90 rows with 30 per batch:
150
+ - Batch 1: Rows 2-31
151
+ - Batch 2: Rows 32-61
152
+ - Batch 3: Rows 62-90
153
+
154
+ **Note:** Row 1 is the header, data starts at row 2.
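The range arithmetic above can be sketched as follows (the `batch_ranges` helper is hypothetical; the header row is excluded before dividing):

```python
import math

def batch_ranges(total_rows, batch_size):
    """Yield (start_row, end_row) per batch; row 1 is the header, data starts at row 2."""
    data_rows = total_rows - 1                       # exclude the header row
    num_batches = math.ceil(data_rows / batch_size)
    for i in range(num_batches):
        start = 2 + i * batch_size
        end = min(start + batch_size - 1, total_rows)
        yield start, end

# 90 total rows with 30 per batch -> three batches
print(list(batch_ranges(90, 30)))
# [(2, 31), (32, 61), (62, 90)]
```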
155
+
156
+ ### Step 10: Create Subagent Prompt Template
157
+
158
+ Create a template with `{placeholder}` format (single braces):
159
+
160
+ ```markdown
161
+ # Batch Data Summarization Task
162
+
163
+ ## Input File
164
+ - Full path: `{file_path}`
165
+ - Sheet name: `{sheet_name}`
166
+
167
+ ## Row Range
168
+ - Batch number: {batch_number}
169
+ - Start row: {row_start}
170
+ - End row: {row_end}
171
+
172
+ ## Summarization Instructions
173
+
174
+ Extract the following fields from each row:
175
+
176
+ {instructions_json}
177
+
178
+ ## Output Format
179
+
180
+ Your output must be a valid JSON file with this structure:
181
+
182
+ ```json
183
+ {
184
+ "batch_number": {batch_number},
185
+ "row_count": <number_of_rows_processed>,
186
+ "summaries": [
187
+ {
188
+ "row_number": <row_number>,
189
+ <field_1>: "<extracted_value>",
190
+ <field_2>: "<extracted_value>"
191
+ }
192
+ ]
193
+ }
194
+ ```
195
+
196
+ **Important:** The JSON keys for extracted values must match the field names specified in the Summarization Instructions.
197
+
198
+ ## Instructions
199
+
200
+ 1. Read the specified row range from the input file using the `tableGetRange` tool
201
+ 2. For each row, extract the requested fields according to the instructions above
202
+ 3. Map the extracted values to the JSON keys specified in the instructions
203
+ 4. Generate concise summaries based on the extracted data
204
+ 5. Save your output to: `{output_file}`
205
+
206
+ ## Output File Path
207
+ Full path: `{output_file}`
208
+
209
+ **CRITICAL:** Write your final output as a markdown file (.md) containing ONLY the JSON object (no additional text or explanation).
210
+ ```
211
+
212
+ ### Step 11: Create Directory Structure
213
+
214
+ ```bash
215
+ mkdir -p .long-table-summary/{topic}/prompts
216
+ mkdir -p .long-table-summary/{topic}/outputs
217
+ ```
218
+
219
+ ### Step 12: Write Template
220
+
221
+ Write the finalized template to `.long-table-summary/{topic}/subagent_template.md`.
222
+
223
+ ### Step 13: Generate Subagent Prompts
224
+
225
+ Use `generate_prompts.py`:
226
+
227
+ **Before Step 13 and Step 17:** Extract the full path to the skill directory from the `<skill_files>` section in the skill tool output. Use this path as `<skill_path>` in the commands below.
228
+
229
+ **Unix-like shells:**
230
+ ```bash
231
+ uv run python <skill_path>/generate_prompts.py \
232
+ --template .long-table-summary/{topic}/subagent_template.md \
233
+ --output-dir .long-table-summary/{topic}/prompts \
234
+ --num-batches {num_batches} \
235
+ --sheet-name "{sheet_name}" \
236
+ --file-path "{input_file}" \
237
+ --start-row 2 \
238
+ --batch-size {batch_size} \
239
+ --instructions '{instructions_json}'
240
+ ```
241
+
242
+ **For Windows cmd.exe:**
243
+ ```bash
244
+ uv.exe run python <skill_path>\generate_prompts.py ^
245
+ --template .long-table-summary\{topic}\subagent_template.md ^
246
+ --output-dir .long-table-summary\{topic}\prompts ^
247
+ --num-batches {num_batches} ^
248
+ --sheet-name "{sheet_name}" ^
249
+ --file-path "{input_file}" ^
250
+ --start-row 2 ^
251
+ --batch-size {batch_size} ^
252
+ --instructions "{instructions_json}"
253
+ ```
254
+
255
+ **Note:** The `{instructions_json}` is the user-confirmed JSON from Step 6.
256
+
257
+ ### Step 14: Launch Subagents in Waves of 3
258
+
259
+ Launch subagents in waves of 3, waiting for each wave to complete before starting the next.
260
+
261
+ **Wave 1:**
262
+ ```typescript
263
+ task(subagent_type="general", description="Process batch 001", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch001.md and perform the task described there exactly as written.", run_in_background=true)
264
+ task(subagent_type="general", description="Process batch 002", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch002.md and perform the task described there exactly as written.", run_in_background=true)
265
+ task(subagent_type="general", description="Process batch 003", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch003.md and perform the task described there exactly as written.", run_in_background=true)
266
+ ```
267
+
268
+ **Wait for Wave 1 to complete.**
269
+
270
+ **Wave 2 (if more batches):**
271
+ ```typescript
272
+ task(subagent_type="general", description="Process batch 004", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch004.md and perform the task described there exactly as written.", run_in_background=true)
273
+ task(subagent_type="general", description="Process batch 005", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch005.md and perform the task described there exactly as written.", run_in_background=true)
274
+ task(subagent_type="general", description="Process batch 006", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch006.md and perform the task described there exactly as written.", run_in_background=true)
275
+ ```
276
+
277
+ **Continue** launching waves of 3 until all batches are started.
278
+
279
+ **Important:** Do NOT pass the full generated prompts directly to subagents. Always direct subagents to read their respective prompt files.
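Grouping batches into waves of 3 amounts to simple chunking; a sketch (the `waves` helper is illustrative, not part of the skill):

```python
def waves(num_batches, wave_size=3):
    """Group batch identifiers into launch waves of wave_size."""
    batches = [f"batch{n:03d}" for n in range(1, num_batches + 1)]
    return [batches[i:i + wave_size] for i in range(0, len(batches), wave_size)]

print(waves(7))
# [['batch001', 'batch002', 'batch003'], ['batch004', 'batch005', 'batch006'], ['batch007']]
```

Each inner list corresponds to one wave of `task(...)` calls; the final wave may be shorter.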
280
+
281
+ ### Step 15: Monitor Progress
282
+
283
+ Every time 3 subagents complete, report progress:
284
+
285
+ **Progress report format:**
286
+ - "Progress: X/Y batches completed (Z%)"
287
+
288
+ For example:
289
+ - "Progress: 3/10 batches completed (30%)"
290
+ - "Progress: 6/10 batches completed (60%)"
291
+ - "Progress: 9/10 batches completed (90%)"
292
+ - "Progress: 10/10 batches completed (100%)"
293
+
294
+ Do NOT inspect individual subagent outputs midway.
295
+
296
+ ### Step 16: Retry Failed Batches
297
+
298
+ After all batches are done, check for missing outputs:
299
+
300
+ ```bash
301
+ ls .long-table-summary/{topic}/outputs/
302
+ ```
303
+
304
+ **Missing files** = failed batches. Collect the batch numbers of all missing files.
305
+
306
+ If there are failed batches:
307
+
308
+ 1. Ask user using the `question` tool:
309
+ - "Continue with failed batches or stop?
310
+
311
+ Failed batches: batch003, batch007 (2 failures)
312
+
313
+ • Continue - Will retry each failed batch up to 3 times
314
+ • Stop - Keep current outputs and proceed to final report"
315
+
316
+ (Replace the bracketed list with the actual missing batch numbers)
317
+
318
+ 2. **Options:**
319
+ - "Continue with failed batches"
320
+ - "Stop and keep current outputs"
321
+
322
+ 3. **If user selects "Continue":**
323
+ - For each failed batch:
324
+ a. Wait 2 seconds
325
+ b. Retry with the same `subagent_type="general"`
326
+ c. Up to 3 retry attempts
327
+ - After retries, check again for remaining failures
328
+ - If batches are still failing, repeat the question with the updated failed list
329
+
330
+ 4. **If user selects "Stop":**
331
+ - Do not retry any more batches
332
+ - Proceed to Step 17 with whatever outputs exist
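Detecting failed batches by comparing expected versus present output files might look like this (the `missing_batches` helper is an assumption, not code shipped with the skill):

```python
from pathlib import Path

def missing_batches(output_dir, num_batches):
    """Return batch numbers whose expected batchNNN.md output file is absent."""
    present = {p.name for p in Path(output_dir).glob("batch*.md")}
    return [n for n in range(1, num_batches + 1)
            if f"batch{n:03d}.md" not in present]
```

The returned numbers feed directly into the retry question shown above.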
333
+
334
+ ### Step 17: Combine All JSON Outputs
335
+
336
+ After all batches are complete (or user stops retrying), use `combine_outputs.py`:
337
+
338
+ **Before Step 13 and Step 17:** Extract the full path to the skill directory from the `<skill_files>` section in the skill tool output. Use this path as `<skill_path>` in the commands below.
339
+
340
+ **Unix-like shells:**
341
+ ```bash
342
+ uv run python <skill_path>/combine_outputs.py \
343
+ --input-dir .long-table-summary/{topic}/outputs \
344
+ --output-file .long-table-summary/{topic}/combined_summary.xlsx
345
+ ```
346
+
347
+ **For Windows cmd.exe:**
348
+ ```bash
349
+ uv.exe run python <skill_path>\combine_outputs.py ^
350
+ --input-dir .long-table-summary\{topic}\outputs ^
351
+ --output-file .long-table-summary\{topic}\combined_summary.xlsx
352
+ ```
353
+
354
+ **Expected output:**
355
+ - A single Excel file with all summaries combined
356
+ - Table format: row_number, <field_1>, <field_2>, ... (all user-requested fields)
357
+ - One row per input table row
358
+ - Sorted by row_number ascending
359
+
360
+ **Script behavior:**
361
+ - Reads all `batch*.md` files from the output directory
362
+ - Parses JSON from each file
363
+ - Dynamically determines all columns from the first batch's summaries (user's JSON keys)
364
+ - Merges all summaries into a structured table
365
+ - Writes the combined Excel file
366
+
367
+ ### Step 18: Final Report
368
+
369
+ Provide user with:
370
+ 1. Topic name used
371
+ 2. Total batches processed
372
+ 3. Number of retries (if any)
373
+ 4. Combined output location
374
+ 5. Row count in the combined file
375
+
376
+ ## Python Scripts
377
+
378
+ ### Script 1: `generate_prompts.py`
379
+
380
+ **Arguments:**
381
+ - `--template`: Path to subagent_template.md
382
+ - `--output-dir`: Directory for generated prompts
383
+ - `--num-batches`: Total number of batches
384
+ - `--sheet-name`: Sheet name
385
+ - `--file-path`: Full path to the input table file
386
+ - `--start-row`: Starting data row (default: 2)
387
+ - `--batch-size`: Rows per batch (default: 30)
388
+ - `--instructions`: User-confirmed JSON with summarization fields
389
+ - `--dry-run`: Validate without creating files (optional)
390
+ - `--verbose`: Enable verbose output for debugging (optional)
391
+
392
+ **Placeholders to replace:**
393
+ - `{file_path}` → Absolute input file path
394
+ - `{sheet_name}` → Sheet name
395
+ - `{batch_number}` → Batch number (001, 002, etc.)
396
+ - `{row_start}` → Start row
397
+ - `{row_end}` → End row
398
+ - `{output_file}` → Output file path
399
+ - `{instructions_json}` → User's JSON instruction (properly escaped for markdown code block)
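Because the template itself contains literal JSON braces, a naive `str.format` call would fail on it; one plausible approach, shown as a sketch (not the actual `generate_prompts.py`), is to substitute only the known placeholder names:

```python
import re

# Only these names are treated as template slots; any other braces
# (e.g. the literal JSON in the Output Format section) pass through.
PLACEHOLDERS = {"file_path", "sheet_name", "batch_number",
                "row_start", "row_end", "output_file", "instructions_json"}

def render(template, context):
    """Fill known {placeholder} slots, leaving all other braces untouched."""
    def fill(match):
        key = match.group(1)
        if key in PLACEHOLDERS and key in context:
            return str(context[key])
        return match.group(0)
    return re.sub(r"\{(\w+)\}", fill, template)
```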
400
+
401
+ ### Script 2: `combine_outputs.py`
402
+
403
+ **Arguments:**
404
+ - `--input-dir`: Directory containing batch output JSON files
405
+ - `--output-file`: Path for the combined Excel output file
406
+ - `--dry-run`: Validate inputs without writing output file (optional)
407
+ - `--verbose`: Enable verbose output for debugging (optional)
408
+ - `--deduplicate`: Remove duplicate row numbers (keep first occurrence) (optional)
409
+ - `--column-order`: Column order - 'preserve' (from first batch) or 'alphabetical' (default) (optional)
410
+
411
+ **Behavior:**
412
+ 1. Scan the input directory for `batch*.md` files
413
+ 2. For each file, extract the JSON content
414
+ 3. Dynamically determine all columns from the first batch's summaries (extract all JSON keys from the first summary, excluding `batch_number` and `row_count`)
415
+ 4. Merge all summaries by row_number
416
+ 5. Create an Excel file with columns: row_number, <all user fields>
417
+ 6. Sort by row_number ascending
418
+ 7. Write to the output path
419
+
420
+ **Error handling:**
421
+ - If no output files are found → Return error JSON
422
+ - If JSON parse fails → Log error and continue with other files
423
+ - If duplicate row numbers exist → The last write wins (or use --deduplicate flag to keep first)
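The merge logic might be sketched as below (illustrative only; the real `combine_outputs.py` additionally writes an Excel file and supports the flags listed above):

```python
import json
from pathlib import Path

def combine_outputs(input_dir):
    """Merge batch*.md JSON payloads into row dicts sorted by row_number."""
    rows = {}
    for path in sorted(Path(input_dir).glob("batch*.md")):
        try:
            payload = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue                              # log and skip unparseable files
        for summary in payload.get("summaries", []):
            rows[summary["row_number"]] = summary  # duplicate rows: last write wins
    return [rows[key] for key in sorted(rows)]
```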
424
+
425
+ ## Notes
426
+
427
+ - The default batch size recommendation is 30 rows per batch
428
+ - Summarization instructions are provided as JSON with explicit field descriptions
429
+ - The topic name is autogenerated from the filename + manual comprehension of the user's JSON
430
+ - The summary words part (derived from JSON) has a maximum of 6 words
431
+ - The total topic name (filename + summary words) may exceed 6 words
432
+ - Subagent type is always `general`
433
+ - Subagents are launched in waves of 3 (not 5)
434
+ - Progress is reported every 3 completions
435
+ - Failed batches are retried up to 3 times
436
+ - Output is always combined automatically via the Python script
437
+ - The main agent does NOT manually read the JSON outputs