@yeyuan98/opencode-bioresearcher-plugin 1.4.1 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. package/README.md +49 -22
  2. package/dist/db-tools/backends/index.d.ts +11 -0
  3. package/dist/db-tools/backends/index.js +48 -0
  4. package/dist/db-tools/backends/mongodb/backend.d.ts +15 -0
  5. package/dist/db-tools/backends/mongodb/backend.js +76 -0
  6. package/dist/db-tools/backends/mongodb/connection.d.ts +27 -0
  7. package/dist/db-tools/backends/mongodb/connection.js +107 -0
  8. package/dist/db-tools/backends/mongodb/index.d.ts +4 -0
  9. package/dist/db-tools/backends/mongodb/index.js +3 -0
  10. package/dist/db-tools/backends/mongodb/translator.d.ts +30 -0
  11. package/dist/db-tools/backends/mongodb/translator.js +407 -0
  12. package/dist/db-tools/backends/mysql/backend.d.ts +15 -0
  13. package/dist/db-tools/backends/mysql/backend.js +57 -0
  14. package/dist/db-tools/backends/mysql/connection.d.ts +25 -0
  15. package/dist/db-tools/backends/mysql/connection.js +83 -0
  16. package/dist/db-tools/backends/mysql/index.d.ts +3 -0
  17. package/dist/db-tools/backends/mysql/index.js +2 -0
  18. package/dist/db-tools/backends/mysql/translator.d.ts +7 -0
  19. package/dist/db-tools/backends/mysql/translator.js +67 -0
  20. package/dist/db-tools/core/base.d.ts +17 -0
  21. package/dist/db-tools/core/base.js +51 -0
  22. package/dist/db-tools/core/config-loader.d.ts +3 -0
  23. package/dist/db-tools/core/config-loader.js +46 -0
  24. package/dist/db-tools/core/index.d.ts +2 -0
  25. package/dist/db-tools/core/index.js +2 -0
  26. package/dist/db-tools/core/jsonc-parser.d.ts +2 -0
  27. package/dist/db-tools/core/jsonc-parser.js +77 -0
  28. package/dist/db-tools/core/validator.d.ts +16 -0
  29. package/dist/db-tools/core/validator.js +118 -0
  30. package/dist/db-tools/executor.d.ts +13 -0
  31. package/dist/db-tools/executor.js +54 -0
  32. package/dist/db-tools/index.d.ts +51 -0
  33. package/dist/db-tools/index.js +27 -0
  34. package/dist/db-tools/interface/backend.d.ts +24 -0
  35. package/dist/db-tools/interface/backend.js +1 -0
  36. package/dist/db-tools/interface/connection.d.ts +21 -0
  37. package/dist/db-tools/interface/connection.js +11 -0
  38. package/dist/db-tools/interface/index.d.ts +4 -0
  39. package/dist/db-tools/interface/index.js +4 -0
  40. package/dist/db-tools/interface/query.d.ts +60 -0
  41. package/dist/db-tools/interface/query.js +1 -0
  42. package/dist/db-tools/interface/schema.d.ts +22 -0
  43. package/dist/db-tools/interface/schema.js +1 -0
  44. package/dist/db-tools/pool.d.ts +8 -0
  45. package/dist/db-tools/pool.js +49 -0
  46. package/dist/db-tools/tools/index.d.ts +27 -0
  47. package/dist/db-tools/tools/index.js +191 -0
  48. package/dist/db-tools/tools.d.ts +27 -0
  49. package/dist/db-tools/tools.js +111 -0
  50. package/dist/db-tools/types.d.ts +94 -0
  51. package/dist/db-tools/types.js +40 -0
  52. package/dist/db-tools/utils.d.ts +33 -0
  53. package/dist/db-tools/utils.js +94 -0
  54. package/dist/index.js +5 -1
  55. package/dist/parser-tools/obo/index.d.ts +2 -0
  56. package/dist/parser-tools/obo/index.js +2 -0
  57. package/dist/parser-tools/obo/obo.d.ts +17 -0
  58. package/dist/parser-tools/obo/obo.js +216 -0
  59. package/dist/parser-tools/obo/types.d.ts +166 -0
  60. package/dist/parser-tools/obo/types.js +1 -0
  61. package/dist/parser-tools/obo/utils.d.ts +21 -0
  62. package/dist/parser-tools/obo/utils.js +411 -0
  63. package/dist/skills/env-jsonc-setup/SKILL.md +206 -0
  64. package/dist/skills/long-table-summary/SKILL.md +437 -374
  65. package/dist/skills/long-table-summary/combine_outputs.py +5 -14
  66. package/dist/skills/long-table-summary/generate_prompts.py +211 -0
  67. package/dist/skills/long-table-summary/pyproject.toml +8 -11
  68. package/package.json +3 -1
  69. package/dist/skills/long-table-summary/__init__.py +0 -3
@@ -1,374 +1,437 @@
1
- ---
2
- name: long-table-summary
3
- description: Batch-process large tables using parallel subagents for summarization
4
- allowedTools:
5
- - Bash
6
- - Read
7
- - Write
8
- - Question
9
- - Task
10
- - tableListSheets
11
- - tableGetSheetPreview
12
- - tableGetHeaders
13
- - tableGetRange
14
- - blockingTimer
15
- dependencies:
16
- - bioresearcher-core
17
- ---
18
-
19
- # Long Table Summary
20
-
21
- Batch-process large tables (xlsx, csv) using parallel subagents.
22
-
23
- ## bioresearcher-core Dependencies
24
-
25
- This skill uses these resources from `bioresearcher-core`:
26
-
27
- **Python Utility:**
28
- | Script | Usage | Step |
29
- |--------|-------|------|
30
- | `python/template.py` | Generate batch prompts from template + contexts | Step 6 |
31
-
32
- **Patterns:**
33
- | Pattern | Usage | Step |
34
- |---------|-------|------|
35
- | `patterns/subagent-waves.md` | Launch subagents in parallel waves | Step 7 |
36
- | `patterns/progress.md` | Report processing progress | Step 8 |
37
- | `patterns/retry.md` | Retry failed batches with backoff | Step 9 |
38
-
39
- **Load bioresearcher-core first:**
40
- ```
41
- skill bioresearcher-core
42
- ```
43
-
44
- Extract `<core_skill_path>` from the skill tool output.
45
-
46
- ---
47
-
48
- ## Workflow
49
-
50
- ```
51
- Step 1: File Discovery & Validation
52
- Step 2: Sheet Selection (conditional)
53
- Step 3: Summarization Instructions
54
- Step 4: Auto-Configuration
55
- Step 5: Environment Setup
56
- Step 6: Generate Prompts
57
- Step 7: Launch Subagents
58
- Step 8: Monitor Progress
59
- Step 9: Auto-Retry & Notify
60
- Step 10: Combine & Report
61
- ```
62
-
63
- ---
64
-
65
- ## Steps
66
-
67
- ### Step 1: File Discovery & Validation
68
-
69
- Use `question` tool to ask for the table file path:
70
-
71
- **Question:**
72
- - "What is the full path to the table file you want to process?" (supports .xlsx, .csv, .ods)
73
-
74
- After receiving the path, immediately verify and fetch metadata:
75
- 1. Use `tableListSheets` to verify file exists and list sheets
76
- 2. Use `tableGetSheetPreview` to get row count
77
- 3. Use `tableGetHeaders` to get column structure
78
-
79
- **Error handling:**
80
- - If file doesn't exist: "File not found. Please verify the path." Then re-prompt in same interaction.
81
- - If row_count <= 1: "This table has no data rows (only header found)." Then re-prompt.
82
-
83
- ---
84
-
85
- ### Step 2: Sheet Selection (Conditional)
86
-
87
- **Skip this step for CSV files.**
88
-
89
- For Excel (.xlsx) and ODS (.ods) files with multiple sheets, use `question` tool:
90
-
91
- **Question:**
92
- - "This file has multiple sheets. Which sheet would you like to process?"
93
-
94
- **Options:** List all available sheet names.
95
-
96
- For CSV files or single-sheet files: Use the sheet name from `tableListSheets` automatically.
97
-
98
- ---
99
-
100
- ### Step 3: Summarization Instructions
101
-
102
- Use `question` tool to request JSON format summarization requirements.
103
-
104
- **Question:**
105
- - "Please provide your summarization requirements in JSON format. Each key is the output field name, each value describes what to extract."
106
-
107
- **Show example:**
108
- ```json
109
- {
110
- "species": "Species: Tier1 (human/monkey), Tier2 (other animals), or NA",
111
- "topic": "Main topic: Oncology, Immunology, General Biology, or Other"
112
- }
113
- ```
114
-
115
- **After receiving JSON:**
116
- 1. Parse and validate JSON format
117
- 2. Show formatted JSON back to user
118
- 3. Ask single confirmation: "Is this correct?" (Yes/No)
119
- 4. If No: User provides corrected JSON in full
120
- 5. Repeat until confirmed
121
-
122
- ---
123
-
124
- ### Step 4: Auto-Configuration
125
-
126
- Automatically configure (no user questions):
127
-
128
- 1. **Topic name:** Filename without extension
129
- - Example: `clinical_trials_2024_data.xlsx` → `clinical_trials_2024_data`
130
-
131
- 2. **Batch size:** Default 30 rows per batch
132
-
133
- 3. **Calculate batches:**
134
- - Data rows = total_rows - 1 (excluding header)
135
- - num_batches = ceil(data_rows / 30)
136
-
137
- 4. **Report to user:**
138
- - "Processing {data_rows} data rows in {num_batches} batches"
139
-
140
- ---
141
-
142
- ### Step 5: Environment Setup
143
-
144
- Create directory structure:
145
-
146
- ```bash
147
- mkdir -p .long-table-summary/{topic}/prompts
148
- mkdir -p .long-table-summary/{topic}/outputs
149
- ```
150
-
151
- **Write `subagent_template.md`:**
152
-
153
- ```markdown
154
- # Batch Data Summarization Task
155
-
156
- ## Input File
157
- - Full path: `{file_path}`
158
- - Sheet name: `{sheet_name}`
159
-
160
- ## Row Range
161
- - Batch number: {batch_number}
162
- - Start row: {row_start}
163
- - End row: {row_end}
164
-
165
- ## Summarization Instructions
166
-
167
- Extract the following fields from each row:
168
-
169
- {instructions_json}
170
-
171
- ## Output Format
172
-
173
- Your output must be a valid JSON file with this structure:
174
-
175
- ```json
176
- {
177
- "batch_number": {batch_number},
178
- "row_count": <number_of_rows_processed>,
179
- "summaries": [
180
- {
181
- "row_number": <row_number>,
182
- <field_1>: "<extracted_value>",
183
- <field_2>: "<extracted_value>"
184
- }
185
- ]
186
- }
187
- ```
188
-
189
- **Important:** JSON keys must match field names in Summarization Instructions.
190
-
191
- ## Instructions
192
-
193
- 1. Read the specified row range using `tableGetRange` tool
194
- 2. For each row, extract the requested fields
195
- 3. Save output to: `{output_file}`
196
-
197
- ## Output File Path
198
- Full path: `{output_file}`
199
-
200
- **CRITICAL:** Write final output as markdown (.md) containing ONLY the JSON object.
201
- ```
202
-
203
- **Write `contexts.json`:**
204
-
205
- ```json
206
- [
207
- {
208
- "batch_number": 1,
209
- "row_start": 2,
210
- "row_end": 31,
211
- "file_path": "<absolute_path>",
212
- "sheet_name": "<sheet_name>",
213
- "output_file": ".long-table-summary/<topic>/outputs/batch001.md",
214
- "instructions_json": "<user_json_escaped>"
215
- }
216
- ]
217
- ```
218
-
219
- Calculate contexts for all batches. Row 1 is header, data starts at row 2.
220
-
221
- ---
222
-
223
- ### Step 6: Generate Prompts
224
-
225
- **Utility:** `<core_skill_path>/python/template.py`
226
-
227
- Use the template.py script from bioresearcher-core.
228
-
229
- **Unix-like shells:**
230
- ```bash
231
- uv run python <core_skill_path>/python/template.py generate-batches \
232
- --template .long-table-summary/{topic}/subagent_template.md \
233
- --contexts .long-table-summary/{topic}/contexts.json \
234
- --output-dir .long-table-summary/{topic}/prompts \
235
- --filename-pattern "batch{index:03d}.md" \
236
- --escape
237
- ```
238
-
239
- **Windows cmd.exe:**
240
- ```bash
241
- uv.exe run python <core_skill_path>\python\template.py generate-batches ^
242
- --template .long-table-summary\{topic}\subagent_template.md ^
243
- --contexts .long-table-summary\{topic}\contexts.json ^
244
- --output-dir .long-table-summary\{topic}\prompts ^
245
- --filename-pattern "batch{index:03d}.md" ^
246
- --escape
247
- ```
248
-
249
- ---
250
-
251
- ### Step 7: Launch Subagents
252
-
253
- **Pattern:** `<core_skill_path>/patterns/subagent-waves.md`
254
-
255
- Launch in waves of 3 using Task tool.
256
-
257
- **Wave pattern:**
258
- ```
259
- task(subagent_type="general", description="Process batch 001", prompt="Read .long-table-summary/{topic}/prompts/batch001.md and execute the task exactly as written.")
260
- task(subagent_type="general", description="Process batch 002", prompt="Read .long-table-summary/{topic}/prompts/batch002.md and execute the task exactly as written.")
261
- task(subagent_type="general", description="Process batch 003", prompt="Read .long-table-summary/{topic}/prompts/batch003.md and execute the task exactly as written.")
262
- ```
263
-
264
- Wait for wave to complete, then launch next wave.
265
-
266
- **Important:** Always direct subagents to read prompt files. Never pass full prompts inline.
267
-
268
- ---
269
-
270
- ### Step 8: Monitor Progress
271
-
272
- **Pattern:** `<core_skill_path>/patterns/progress.md`
273
-
274
- Report every 3 completions:
275
-
276
- - "Progress: 3/{num_batches} batches (30%)"
277
- - "Progress: 6/{num_batches} batches (60%)"
278
- - "Progress: {num_batches}/{num_batches} batches (100%)"
279
-
280
- Do NOT inspect individual outputs during processing.
281
-
282
- ---
283
-
284
- ### Step 9: Auto-Retry & Notify
285
-
286
- **Pattern:** `<core_skill_path>/patterns/retry.md`
287
-
288
- After all batches complete, check for missing outputs:
289
-
290
- ```bash
291
- ls .long-table-summary/{topic}/outputs/
292
- ```
293
-
294
- **Auto-retry logic:**
295
-
296
- 1. Detect failures: Compare expected batch files vs actual files
297
- 2. For each failed batch:
298
- - Use `blockingTimer(delay=2)` to wait 2 seconds
299
- - Re-launch with same prompt
300
- - Up to 3 attempts per batch
301
- 3. Track retry count
302
- 4. Only notify if failures persist after all retries:
303
- - "Warning: {count} batches failed after 3 attempts: {batch_list}"
304
- - "Proceeding with available data."
305
-
306
- ---
307
-
308
- ### Step 10: Combine & Report
309
-
310
- Use `combine_outputs.py` from this skill.
311
-
312
- **Extract `<skill_path>` from this skill's tool output.**
313
-
314
- **Unix-like shells:**
315
- ```bash
316
- uv run python <skill_path>/combine_outputs.py \
317
- --input-dir .long-table-summary/{topic}/outputs \
318
- --output-file .long-table-summary/{topic}/combined_summary.xlsx
319
- ```
320
-
321
- **Windows cmd.exe:**
322
- ```bash
323
- uv.exe run python <skill_path>\combine_outputs.py ^
324
- --input-dir .long-table-summary\{topic}\outputs ^
325
- --output-file .long-table-summary\{topic}\combined_summary.xlsx
326
- ```
327
-
328
- **Final Report:**
329
-
330
- ```
331
- Processing Complete
332
- ━━━━━━━━━━━━━━━━━━━━
333
- Topic: {topic}
334
- Total batches: {num_batches}
335
- Successful: {success_count}
336
- Retries: {retry_count}
337
- Failed: {failed_count}
338
-
339
- Output: .long-table-summary/{topic}/combined_summary.xlsx
340
- Rows processed: {total_rows}
341
- Expected rows: {expected_rows}
342
- Completeness: {completeness}%
343
- ```
344
-
345
- ---
346
-
347
- ## Python Scripts
348
-
349
- ### combine_outputs.py
350
-
351
- Combines batch JSON outputs into a single Excel file.
352
-
353
- **Arguments:**
354
- - `--input-dir`: Directory containing batch*.md files (required)
355
- - `--output-file`: Path for output Excel file (required)
356
- - `--deduplicate`: Remove duplicate row numbers (optional)
357
- - `--column-order`: "preserve" or "alphabetical" (default: preserve)
358
- - `--verbose`: Enable debug output (optional)
359
- - `--dry-run`: Validate without writing (optional)
360
-
361
- **Output:**
362
- - Excel file with columns: row_number, <user_fields...>
363
- - Sorted by row_number ascending
364
-
365
- ---
366
-
367
- ## Notes
368
-
369
- - Default batch size: 30 rows
370
- - Topic name: filename without extension
371
- - Subagent type: "general"
372
- - Wave size: 3 subagents
373
- - Retry attempts: 3 per failed batch
374
- - Retry delay: 2 seconds
1
+ ---
2
+ name: long-table-summary
3
+ description: Batch-process large tables using parallel subagents for summarization
4
+ allowedTools:
5
+ - Bash
6
+ - Read
7
+ - Write
8
+ - Question
9
+ - Task
10
+ - tableListSheets
11
+ - tableGetSheetPreview
12
+ - tableGetHeaders
13
+ - tableGetRange
14
+ ---
15
+
16
+ # Long Table Summary
17
+
18
+ This skill enables batched processing of large tables (xlsx, csv) using parallel subagents.
19
+
20
+ ## Workflow Overview
21
+
22
+ 1. **Table Discovery**: Interview user to locate table file and confirm existence
23
+ 2. **Sheet Selection**: If multiple sheets, prompt user to choose one
24
+ 3. **Summarization Instructions**: Interview user for summary requirements (JSON format)
25
+ 4. **Instruction Refinement**: Iterate to refine summarization instructions
26
+ 5. **Batch Size Prompting**: Use `question` tool to ask user for batch size
27
+ 6. **Topic Generation**: Autogenerate topic name from filename + comprehension of JSON
28
+ 7. **Template Creation**: Draft subagent prompt template with JSON output schema
29
+ 8. **Template Writing**: Write finalized template
30
+ 9. **Prompt Generation**: Use Python script to generate batch-specific prompts
31
+ 10. **Parallel Processing**: Launch subagents in waves of 3
32
+ 11. **Progress Monitoring**: Report every 3 completed subagents
33
+ 12. **Retry Failed Batches**: Up to 3 retry attempts for failed batches
34
+ 13. **Output Combination**: Automatically combine all JSON outputs into single table
35
+
36
+ ## Steps
37
+
38
+ ### Step 1: Interview User for Table Location
39
+
40
+ Use `question` tool to ask for table file path:
41
+
42
+ **Question:**
43
+ - "What is the full path to the table file you want to process?" (supports .xlsx, .csv, .ods)
44
+
45
+ ### Step 2: Confirm Table Existence and List Sheets
46
+
47
+ Use `tableListSheets` tool to verify:
48
+
49
+ ```bash
50
+ tableListSheets(file_path="<user_provided_path>")
51
+ ```
52
+
53
+ If the file doesn't exist or is invalid, prompt user to verify the path.
54
+
55
+ ### Step 3: Handle Multiple Sheets (if applicable)
56
+
57
+ **Important:** CSV files have only one sheet. Always skip this step for CSV files.
58
+
59
+ If the file is an Excel (.xlsx) or ODS (.ods) file AND there is more than one sheet, use `question` tool to ask user to choose one.
60
+
61
+ For CSV files: Use the single sheet name automatically (either the filename returned by `tableListSheets` or the default "Sheet1") without asking the user.
62
+
63
+ ### Step 4: Get Table Metadata
64
+
65
+ Use `tableGetSheetPreview` and `tableGetHeaders` to get row count and column structure.
66
+
67
+ ### Step 5: Interview User for Summarization Instructions
68
+
69
+ Use `question` tool to ask user to provide summarization instructions in JSON format.
70
+
71
+ **Question:**
72
+ - "Please provide your summarization requirements in JSON format. For each field you want extracted, specify as key the field name, and as value a brief description of what it represents."
73
+
74
+ **Show example:**
75
+ ```json
76
+ {
77
+ "species": "Species, one of Tier1/Tier2/NA. Tier1 includes human and monkey, Tier2 includes other animals. NA otherwise.",
78
+ "topic": "Main topic, one of Oncology/Immunology/General Biology/Others."
79
+ }
80
+ ```
81
+
82
+ **Additional example:**
83
+ ```json
84
+ {
85
+ "gene_mutation": "Gene mutation pattern, e.g., V600E, R173, Wild Type",
86
+ "clinical_significance": "Clinical relevance, one of High/Medium/Low/Unknown",
87
+ "therapeutic_target": "Is this a drug target? Answer Yes/No/Unknown"
88
+ }
89
+ ```
90
+
91
+ **Instructions to user:**
92
+ - Provide any number of fields
93
+ - Each field should be a key (JSON property name) with a description as the value
94
+ - Use clear, specific descriptions that a subagent can interpret
95
+ - Include allowed values or examples if applicable
96
+
97
+ ### Step 6: Summarization Instruction Refinement
98
+
99
+ After receiving user's JSON instructions, iteratively refine them.
100
+
101
+ **Question:**
102
+ - "Here's your summarization instruction template. Would you like to modify any field descriptions or add new fields?"
103
+
104
+ **Show the current JSON to user**
105
+
106
+ If user selects "No, it's correct", proceed to Step 7.
107
+
108
+ If user selects "Yes, I want to modify":
109
+ - Ask which field to modify or add
110
+ - Update the JSON accordingly
111
+ - Repeat the approval question
112
+
113
+ Continue until user explicitly confirms that the instruction JSON is correct.
114
+
115
+ ### Step 7: Autogenerate Topic Name
116
+
117
+ Generate the topic name by combining:
118
+ - Base filename (without extension)
119
+ - Summary words derived from manual comprehension of user's JSON
120
+
121
+ **Algorithm:**
122
+ 1. Extract filename: `clinical_trials_2024_data.xlsx` → `clinical_trials_2024_data`
123
+ - **Note:** The filename may have more than 6 words; keep the full name
124
+ 2. Comprehend the overall content of user's JSON instructions:
125
+ - Read the JSON manually to understand what it's about
126
+ - Identify the main idea or subject matter
127
+ - Generate summary words (maximum 6 words, lowercase, hyphenated)
128
+ - Examples: `species-classification`, `gene-mutation`, `clinical-significance`
129
+ 3. Combine with hyphens: `clinical_trials_2024_data-species-classification`
130
+ - **Total words may exceed 6** (filename + summary words)
131
+ - **Only the summary words part has a 6-word limit**
132
+
133
+ **Examples:**
134
+ - `clinical_trials_2024_data.xlsx` + topic about species → `clinical_trials_2024_data-species-classification`
135
+ - `literature_review_comprehensive.csv` + gene mutation analysis → `literature_review_comprehensive-gene-mutation`
136
+ - `long_descriptive_filename_multiple_words.xlsx` + immunology → `long_descriptive_filename_multiple_words-immunology`
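The topic-name assembly described above can be sketched in Python (the `topic_name` helper is illustrative only, not part of the package):

```python
from pathlib import Path

def topic_name(file_path, summary_words):
    """Combine the base filename with up to 6 lowercase, hyphenated summary words."""
    base = Path(file_path).stem                      # filename without extension
    summary = "-".join(w.lower() for w in summary_words[:6])
    return f"{base}-{summary}"

print(topic_name("clinical_trials_2024_data.xlsx", ["species", "classification"]))
# clinical_trials_2024_data-species-classification
```

Note that only the summary suffix is capped at 6 words; the filename part is kept whole, matching the rule above.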
137
+
138
+ ### Step 8: Ask User for Batch Size
139
+
140
+ Use `question` tool to explicitly prompt user about batch size.
141
+
142
+ **Question:**
143
+ - "How many rows should each batch contain? Recommended default: 30 rows per batch."
144
+
145
+ Use the user's provided value. Calculate the number of batches needed as `ceil(data_rows / batch_size)`, where `data_rows = total_rows - 1` (the header row is excluded).
146
+
147
+ ### Step 9: Calculate Batch Ranges
148
+
149
+ Example for 90 rows with 30 per batch:
150
+ - Batch 1: Rows 2-31
151
+ - Batch 2: Rows 32-61
152
+ - Batch 3: Rows 62-90
153
+
154
+ **Note:** Row 1 is the header, data starts at row 2.
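The range arithmetic above can be sketched as follows (the `batch_ranges` helper is hypothetical; the header row is excluded before dividing):

```python
import math

def batch_ranges(total_rows, batch_size):
    """Yield (start_row, end_row) per batch; row 1 is the header, data starts at row 2."""
    data_rows = total_rows - 1                       # exclude the header row
    num_batches = math.ceil(data_rows / batch_size)
    for i in range(num_batches):
        start = 2 + i * batch_size
        end = min(start + batch_size - 1, total_rows)
        yield start, end

# 90 total rows with 30 per batch -> three batches
print(list(batch_ranges(90, 30)))
# [(2, 31), (32, 61), (62, 90)]
```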
155
+
156
+ ### Step 10: Create Subagent Prompt Template
157
+
158
+ Create a template with `{placeholder}` format (single braces):
159
+
160
+ ```markdown
161
+ # Batch Data Summarization Task
162
+
163
+ ## Input File
164
+ - Full path: `{file_path}`
165
+ - Sheet name: `{sheet_name}`
166
+
167
+ ## Row Range
168
+ - Batch number: {batch_number}
169
+ - Start row: {row_start}
170
+ - End row: {row_end}
171
+
172
+ ## Summarization Instructions
173
+
174
+ Extract the following fields from each row:
175
+
176
+ {instructions_json}
177
+
178
+ ## Output Format
179
+
180
+ Your output must be a valid JSON file with this structure:
181
+
182
+ ```json
183
+ {
184
+ "batch_number": {batch_number},
185
+ "row_count": <number_of_rows_processed>,
186
+ "summaries": [
187
+ {
188
+ "row_number": <row_number>,
189
+ <field_1>: "<extracted_value>",
190
+ <field_2>: "<extracted_value>"
191
+ }
192
+ ]
193
+ }
194
+ ```
195
+
196
+ **Important:** The JSON keys for extracted values must match the field names specified in the Summarization Instructions.
197
+
198
+ ## Instructions
199
+
200
+ 1. Read the specified row range from the input file using the `tableGetRange` tool
201
+ 2. For each row, extract the requested fields according to the instructions above
202
+ 3. Map the extracted values to the JSON keys specified in the instructions
203
+ 4. Generate concise summaries based on the extracted data
204
+ 5. Save your output to: `{output_file}`
205
+
206
+ ## Output File Path
207
+ Full path: `{output_file}`
208
+
209
+ **CRITICAL:** Write your final output as a markdown file (.md) containing ONLY the JSON object (no additional text or explanation).
210
+ ```
211
+
212
+ ### Step 11: Create Directory Structure
213
+
214
+ ```bash
215
+ mkdir -p .long-table-summary/{topic}/prompts
216
+ mkdir -p .long-table-summary/{topic}/outputs
217
+ ```
218
+
219
+ ### Step 12: Write Template
220
+
221
+ Write the finalized template to `.long-table-summary/{topic}/subagent_template.md`.
222
+
223
+ ### Step 13: Generate Subagent Prompts
224
+
225
+ Use `generate_prompts.py`:
226
+
227
+ **Before Step 13 and Step 17:** Extract the full path to the skill directory from the `<skill_files>` section in the skill tool output. Use this path as `<skill_path>` in the commands below.
228
+
229
+ **Unix-like shells:**
230
+ ```bash
231
+ uv run python <skill_path>/generate_prompts.py \
232
+ --template .long-table-summary/{topic}/subagent_template.md \
233
+ --output-dir .long-table-summary/{topic}/prompts \
234
+ --num-batches {num_batches} \
235
+ --sheet-name "{sheet_name}" \
236
+ --file-path "{input_file}" \
237
+ --start-row 2 \
238
+ --batch-size {batch_size} \
239
+ --instructions '{instructions_json}'
240
+ ```
241
+
242
+ **For Windows cmd.exe:**
243
+ ```bash
244
+ uv.exe run python <skill_path>\generate_prompts.py ^
245
+ --template .long-table-summary\{topic}\subagent_template.md ^
246
+ --output-dir .long-table-summary\{topic}\prompts ^
247
+ --num-batches {num_batches} ^
248
+ --sheet-name "{sheet_name}" ^
249
+ --file-path "{input_file}" ^
250
+ --start-row 2 ^
251
+ --batch-size {batch_size} ^
252
+ --instructions "{instructions_json}"
253
+ ```
254
+
255
+ **Note:** The `{instructions_json}` is the user-confirmed JSON from Step 6.
256
+
257
+ ### Step 14: Launch Subagents in Waves of 3
258
+
259
+ Launch subagents in waves of 3, waiting for each wave to complete before starting the next.
260
+
261
+ **Wave 1:**
262
+ ```typescript
263
+ task(subagent_type="general", description="Process batch 001", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch001.md and perform the task described there exactly as written.", run_in_background=true)
264
+ task(subagent_type="general", description="Process batch 002", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch002.md and perform the task described there exactly as written.", run_in_background=true)
265
+ task(subagent_type="general", description="Process batch 003", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch003.md and perform the task described there exactly as written.", run_in_background=true)
266
+ ```
267
+
268
+ **Wait for Wave 1 to complete.**
269
+
270
+ **Wave 2 (if more batches):**
271
+ ```typescript
272
+ task(subagent_type="general", description="Process batch 004", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch004.md and perform the task described there exactly as written.", run_in_background=true)
273
+ task(subagent_type="general", description="Process batch 005", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch005.md and perform the task described there exactly as written.", run_in_background=true)
274
+ task(subagent_type="general", description="Process batch 006", prompt="Read your prompt from .long-table-summary/{topic}/prompts/batch006.md and perform the task described there exactly as written.", run_in_background=true)
275
+ ```
276
+
277
+ **Continue** launching waves of 3 until all batches are started.
278
+
279
+ **Important:** Do NOT pass the full generated prompts directly to subagents. Always direct subagents to read their respective prompt files.
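Grouping batches into waves of 3 amounts to simple chunking; a sketch (the `waves` helper is illustrative, not part of the skill):

```python
def waves(num_batches, wave_size=3):
    """Group batch identifiers into launch waves of wave_size."""
    batches = [f"batch{n:03d}" for n in range(1, num_batches + 1)]
    return [batches[i:i + wave_size] for i in range(0, len(batches), wave_size)]

print(waves(7))
# [['batch001', 'batch002', 'batch003'], ['batch004', 'batch005', 'batch006'], ['batch007']]
```

Each inner list corresponds to one wave of `task(...)` calls; the final wave may be shorter.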
280
+
281
+ ### Step 15: Monitor Progress
282
+
283
+ Every time 3 subagents complete, report progress:
284
+
285
+ **Progress report format:**
286
+ - "Progress: X/Y batches completed (Z%)"
287
+
288
+ For example:
289
+ - "Progress: 3/10 batches completed (30%)"
290
+ - "Progress: 6/10 batches completed (60%)"
291
+ - "Progress: 9/10 batches completed (90%)"
292
+ - "Progress: 10/10 batches completed (100%)"
293
+
294
+ Do NOT inspect individual subagent outputs midway.
295
+
296
+ ### Step 16: Retry Failed Batches
297
+
298
+ After all batches are done, check for missing outputs:
299
+
300
+ ```bash
301
+ ls .long-table-summary/{topic}/outputs/
302
+ ```
303
+
304
+ **Missing files** = failed batches. Collect the batch numbers of all missing files.
305
+
306
+ If there are failed batches:
307
+
308
+ 1. Ask user using the `question` tool:
309
+ - "Continue with failed batches or stop?
310
+
311
+ Failed batches: batch003, batch007 (2 failures)
312
+
313
+ • Continue - Will retry each failed batch up to 3 times
314
+ • Stop - Keep current outputs and proceed to final report"
315
+
316
+ (Replace the bracketed list with the actual missing batch numbers)
317
+
318
+ 2. **Options:**
319
+ - "Continue with failed batches"
320
+ - "Stop and keep current outputs"
321
+
322
+ 3. **If user selects "Continue":**
323
+ - For each failed batch:
324
+ a. Wait 2 seconds
325
+ b. Retry with the same `subagent_type="general"`
326
+ c. Up to 3 retry attempts
327
+ - After retries, check again for remaining failures
328
+ - If batches are still failing, repeat the question with the updated failed list
329
+
330
+ 4. **If user selects "Stop":**
331
+ - Do not retry any more batches
332
+ - Proceed to Step 17 with whatever outputs exist
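Detecting failed batches by comparing expected versus present output files might look like this (the `missing_batches` helper is an assumption, not code shipped with the skill):

```python
from pathlib import Path

def missing_batches(output_dir, num_batches):
    """Return batch numbers whose expected batchNNN.md output file is absent."""
    present = {p.name for p in Path(output_dir).glob("batch*.md")}
    return [n for n in range(1, num_batches + 1)
            if f"batch{n:03d}.md" not in present]
```

The returned numbers feed directly into the retry question shown above.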
333
+
334
+ ### Step 17: Combine All JSON Outputs
335
+
336
+ After all batches are complete (or user stops retrying), use `combine_outputs.py`:
337
+
338
+ **Before Step 13 and Step 17:** Extract the full path to the skill directory from the `<skill_files>` section in the skill tool output. Use this path as `<skill_path>` in the commands below.
339
+
340
+ **Unix-like shells:**
341
+ ```bash
342
+ uv run python <skill_path>/combine_outputs.py \
343
+ --input-dir .long-table-summary/{topic}/outputs \
344
+ --output-file .long-table-summary/{topic}/combined_summary.xlsx
345
+ ```
346
+
347
+ **For Windows cmd.exe:**
348
+ ```bash
349
+ uv.exe run python <skill_path>\combine_outputs.py ^
350
+ --input-dir .long-table-summary\{topic}\outputs ^
351
+ --output-file .long-table-summary\{topic}\combined_summary.xlsx
352
+ ```
353
+
354
+ **Expected output:**
355
+ - A single Excel file with all summaries combined
356
+ - Table format: row_number, <field_1>, <field_2>, ... (all user-requested fields)
357
+ - One row per input table row
358
+ - Sorted by row_number ascending
359
+
360
+ **Script behavior:**
361
+ - Reads all `batch*.md` files from the output directory
362
+ - Parses JSON from each file
363
+ - Dynamically determines all columns from the first batch's summaries (user's JSON keys)
364
+ - Merges all summaries into a structured table
365
+ - Writes the combined Excel file
366
+
367
+ ### Step 18: Final Report
368
+
369
+ Provide user with:
370
+ 1. Topic name used
371
+ 2. Total batches processed
372
+ 3. Number of retries (if any)
373
+ 4. Combined output location
374
+ 5. Row count in the combined file
375
+
376
+ ## Python Scripts
377
+
378
+ ### Script 1: `generate_prompts.py`
379
+
380
+ **Arguments:**
381
+ - `--template`: Path to subagent_template.md
382
+ - `--output-dir`: Directory for generated prompts
383
+ - `--num-batches`: Total number of batches
384
+ - `--sheet-name`: Sheet name
385
+ - `--file-path`: Full path to the input table file
386
+ - `--start-row`: Starting data row (default: 2)
387
+ - `--batch-size`: Rows per batch (default: 30)
388
+ - `--instructions`: User-confirmed JSON with summarization fields
389
+ - `--dry-run`: Validate without creating files (optional)
390
+ - `--verbose`: Enable verbose output for debugging (optional)
391
+
392
+ **Placeholders to replace:**
393
+ - `{file_path}` → Absolute input file path
394
+ - `{sheet_name}` → Sheet name
395
+ - `{batch_number}` → Batch number (001, 002, etc.)
396
+ - `{row_start}` → Start row
397
+ - `{row_end}` → End row
398
+ - `{output_file}` → Output file path
399
+ - `{instructions_json}` → User's JSON instruction (properly escaped for markdown code block)
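Because the template itself contains literal JSON braces, a naive `str.format` call would fail on it; one plausible approach, shown as a sketch (not the actual `generate_prompts.py`), is to substitute only the known placeholder names:

```python
import re

# Only these names are treated as template slots; any other braces
# (e.g. the literal JSON in the Output Format section) pass through.
PLACEHOLDERS = {"file_path", "sheet_name", "batch_number",
                "row_start", "row_end", "output_file", "instructions_json"}

def render(template, context):
    """Fill known {placeholder} slots, leaving all other braces untouched."""
    def fill(match):
        key = match.group(1)
        if key in PLACEHOLDERS and key in context:
            return str(context[key])
        return match.group(0)
    return re.sub(r"\{(\w+)\}", fill, template)
```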
400
+
401
+ ### Script 2: `combine_outputs.py`
402
+
403
+ **Arguments:**
404
+ - `--input-dir`: Directory containing batch output JSON files
405
+ - `--output-file`: Path for the combined Excel output file
406
+ - `--dry-run`: Validate inputs without writing output file (optional)
407
+ - `--verbose`: Enable verbose output for debugging (optional)
408
+ - `--deduplicate`: Remove duplicate row numbers (keep first occurrence) (optional)
409
+ - `--column-order`: Column order - 'preserve' (from first batch) or 'alphabetical' (default) (optional)
410
+
411
+ **Behavior:**
412
+ 1. Scan the input directory for `batch*.md` files
413
+ 2. For each file, extract the JSON content
414
+ 3. Dynamically determine all columns from the first batch's summaries (extract all JSON keys from the first summary, excluding `batch_number` and `row_count`)
415
+ 4. Merge all summaries by row_number
416
+ 5. Create an Excel file with columns: row_number, <all user fields>
417
+ 6. Sort by row_number ascending
418
+ 7. Write to the output path
419
+
420
+ **Error handling:**
421
+ - If no output files are found → Return error JSON
422
+ - If JSON parse fails → Log error and continue with other files
423
+ - If duplicate row numbers exist → The last write wins (or use --deduplicate flag to keep first)
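The merge logic might be sketched as below (illustrative only; the real `combine_outputs.py` additionally writes an Excel file and supports the flags listed above):

```python
import json
from pathlib import Path

def combine_outputs(input_dir):
    """Merge batch*.md JSON payloads into row dicts sorted by row_number."""
    rows = {}
    for path in sorted(Path(input_dir).glob("batch*.md")):
        try:
            payload = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue                              # log and skip unparseable files
        for summary in payload.get("summaries", []):
            rows[summary["row_number"]] = summary  # duplicate rows: last write wins
    return [rows[key] for key in sorted(rows)]
```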
424
+
425
+ ## Notes
426
+
427
+ - The default batch size recommendation is 30 rows per batch
428
+ - Summarization instructions are provided as JSON with explicit field descriptions
429
+ - The topic name is autogenerated from the filename + manual comprehension of the user's JSON
430
+ - The summary words part (derived from JSON) has a maximum of 6 words
431
+ - The total topic name (filename + summary words) may exceed 6 words
432
+ - Subagent type is always `general`
433
+ - Subagents are launched in waves of 3 (not 5)
434
+ - Progress is reported every 3 completions
435
+ - Failed batches are retried up to 3 times
436
+ - Output is always combined automatically via the Python script
437
+ - The main agent does NOT manually read the JSON outputs