npm - @yeyuan98/opencode-bioresearcher-plugin - Versions diffs - 1.5.0-alpha.0 → 1.5.1 - Mend

@yeyuan98/opencode-bioresearcher-plugin 1.5.0-alpha.0 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (133)

package/dist/skills/long-table-summary/SKILL.md CHANGED

@@ -1,17 +1,19 @@
 ---
 name: long-table-summary
 description: Batch-process large tables using parallel subagents for summarization
-allowedTools:
-  - Bash
-  - Read
-  - Write
-  - Question
-  - Task
-  - tableListSheets
-  - tableGetSheetPreview
-  - tableGetHeaders
-  - tableGetRange
----
+allowedTools:
+  - Bash
+  - Read
+  - Write
+  - Question
+  - Task
+  - tableListSheets
+  - tableGetSheetPreview
+  - tableGetHeaders
+  - tableGetRange
+  - jsonValidate
+  - jsonInfer
+---
 # Long Table Summary
@@ -110,9 +112,70 @@ If user selects "Yes, I want to modify":
 - Update the JSON accordingly
 - Repeat the approval question
-Continue until user explicitly confirms that the instruction JSON is correct.
-### Step 7: Autogenerate Topic Name
+Continue until user explicitly confirms that the instruction JSON is correct.
+### Step 6.5: Generate Output JSON Schema
+Generate a JSON Schema that defines the exact output structure. All fields are required.
+**Default value for unavailable data:** Use `"NA"` (string) for any field where data cannot be extracted.
+**Construct example output:**
+1. Start with the base structure:
+```json
+{
+  "batch_number": 1,
+  "row_count": 30,
+  "summaries": [
+    {
+      "row_number": 2
+    }
+  ]
+}
+```
+2. Add each user-specified field with an example value (use `"NA"` if the field might be empty):
+For example, if user provided:
+```json
+{
+  "species": "Species classification: Tier1/Tier2/NA",
+  "topic": "Main topic: Oncology/Immunology/Other"
+}
+```
+Construct the example output:
+```json
+{
+  "batch_number": 1,
+  "row_count": 30,
+  "summaries": [
+    {
+      "row_number": 2,
+      "species": "Tier1",
+      "topic": "Oncology"
+    }
+  ]
+}
+```
+**Generate schema with strict mode:**
+```typescript
+jsonInfer data='<example_output_json>' strict=true
+```
+**Save the returned schema to:**
+```bash
+Write file: .long-table-summary/{topic}/schema.json
+Content: <schema_from_jsonInfer>
+```
+This schema file will be used by all subagents to validate their outputs before writing.
+### Step 7: Autogenerate Topic Name
 Generate the topic name by combining:
 - Base filename (without extension)
@@ -153,61 +216,77 @@ Example for 90 rows with 30 per batch:
 **Note:** Row 1 is the header, data starts at row 2.
-### Step 10: Create Subagent Prompt Template
-Create a template with `{placeholder}` format (single braces):
-```markdown
-# Batch Data Summarization Task
-## Input File
-- Full path: `{file_path}`
-- Sheet name: `{sheet_name}`
-## Row Range
-- Batch number: {batch_number}
-- Start row: {row_start}
-- End row: {row_end}
-## Summarization Instructions
-Extract the following fields from each row:
-{instructions_json}
-## Output Format
-Your output must be a valid JSON file with this structure:
-```json
-{
-  "batch_number": {batch_number},
-  "row_count": <number_of_rows_processed>,
-  "summaries": [
-    {
-      "row_number": <row_number>,
-      <field_1>: "<extracted_value>",
-      <field_2>: "<extracted_value>"
-    }
-  ]
-}
-```
-**Important:** The JSON keys for extracted values must match the field names specified in the Summarization Instructions.
-## Instructions
-1. Read the specified row range from the input file using the `tableGetRange` tool
-2. For each row, extract the requested fields according to the instructions above
-3. Map the extracted values to the JSON keys specified in the instructions
-4. Generate concise summaries based on the extracted data
-5. Save your output to: `{output_file}`
-## Output File Path
-Full path: `{output_file}`
-**CRITICAL:** Write your final output as a markdown file (.md) containing ONLY the JSON object (no additional text or explanation).
-```
+### Step 10: Create Subagent Prompt Template
+Create a template with `{placeholder}` format (single braces):
+```markdown
+# Batch Data Summarization Task
+## Input File
+- Path: `{file_path}`
+- Sheet: `{sheet_name}`
+## Row Range
+- Batch: {batch_number}
+- Rows: {row_start} to {row_end}
+## Summarization Instructions
+For each row, extract these fields:
+{instructions_json}
+**Default for unavailable data:** If a field cannot be extracted, use `"NA"` as the value.
+## Output Structure
+```json
+{
+  "batch_number": {batch_number},
+  "row_count": <number_of_rows_in_this_batch>,
+  "summaries": [
+    {
+      "row_number": <row_number>,
+      "<field_1>": "<value_or_NA>",
+      "<field_2>": "<value_or_NA>"
+    }
+  ]
+}
+```
+## Output Schema
+Your output must conform to this schema: `{schema_path}`
+All fields are required. Use `"NA"` for unavailable values.
+## Mandatory Workflow
+**Step 1:** Read rows using `tableGetRange`:
+```typescript
+tableGetRange file_path="{file_path}" sheet_name="{sheet_name}" range="A{row_start}:Z{row_end}"
+```
+**Step 2:** Build JSON in memory with all required fields
+**Step 3:** Validate BEFORE writing:
+```typescript
+jsonValidate data='<your_complete_json>' schema="{schema_path}"
+```
+**Step 4:** Check result:
+- If `valid: true` → Go to Step 5
+- If `valid: false` → Fix errors listed in `errors` array, return to Step 3
+**Step 5:** Write validated JSON to `{output_file}`
+Output file should contain ONLY the JSON object (no markdown, no extra text).
+## Output Path
+`{output_file}`
+```
+```
 ### Step 11: Create Directory Structure
@@ -226,31 +305,33 @@ Use `generate_prompts.py`:
 **Before Step 13 and Step 17:** Extract the full path to the skill directory from the `<skill_files>` section in the skill tool output. Use this path as `<skill_path>` in the commands below.
-**Unix-like shells:**
-```bash
-uv run python <skill_path>/generate_prompts.py \
-  --template .long-table-summary/{topic}/subagent_template.md \
-  --output-dir .long-table-summary/{topic}/prompts \
-  --num-batches {num_batches} \
-  --sheet-name "{sheet_name}" \
-  --file-path "{input_file}" \
-  --start-row 2 \
-  --batch-size {batch_size} \
-  --instructions '{instructions_json}'
-```
-**For Windows cmd.exe:**
-```bash
-uv.exe run python <skill_path>\generate_prompts.py ^
-  --template .long-table-summary\{topic}\subagent_template.md ^
-  --output-dir .long-table-summary\{topic}\prompts ^
-  --num-batches {num_batches} ^
-  --sheet-name "{sheet_name}" ^
-  --file-path "{input_file}" ^
-  --start-row 2 ^
-  --batch-size {batch_size} ^
-  --instructions "{instructions_json}"
-```
+**Unix-like shells:**
+```bash
+uv run python <skill_path>/generate_prompts.py \
+  --template .long-table-summary/{topic}/subagent_template.md \
+  --output-dir .long-table-summary/{topic}/prompts \
+  --num-batches {num_batches} \
+  --sheet-name "{sheet_name}" \
+  --file-path "{input_file}" \
+  --start-row 2 \
+  --batch-size {batch_size} \
+  --instructions '{instructions_json}' \
+  --schema-path ".long-table-summary/{topic}/schema.json"
+```
+**For Windows cmd.exe:**
+```bash
+uv.exe run python <skill_path>\generate_prompts.py ^
+  --template .long-table-summary\{topic}\subagent_template.md ^
+  --output-dir .long-table-summary\{topic}\prompts ^
+  --num-batches {num_batches} ^
+  --sheet-name "{sheet_name}" ^
+  --file-path "{input_file}" ^
+  --start-row 2 ^
+  --batch-size {batch_size} ^
+  --instructions "{instructions_json}" ^
+  --schema-path ".long-table-summary\{topic}\schema.json"
+```
 **Note:** The `{instructions_json}` is the user-confirmed JSON from Step 6.
@@ -293,43 +374,31 @@ For example:
 Do NOT inspect individual subagent outputs midway.
-### Step 16: Retry Failed Batches
-After all batches are done, check for missing outputs:
-```bash
-ls .long-table-summary/{topic}/outputs/
-```
-**Missing files** = failed batches. Collect the batch numbers of all missing files.
-If there are failed batches:
-1. Ask user using the `question` tool:
-   - "Continue with failed batches or stop?
-Failed batches: batch003, batch007 (2 failures)
-• Continue - Will retry each failed batch up to 3 times
-• Stop - Keep current outputs and proceed to final report"
-   (Replace the bracketed list with the actual missing batch numbers)
-2. **Options:**
-   - "Continue with failed batches"
-   - "Stop and keep current outputs"
-3. **If user selects "Continue":**
-   - For each failed batch:
-     a. Wait 2 seconds
-     b. Retry with the same `subagent_type="general"`
-     c. Up to 3 retry attempts
-   - After retries, check again for remaining failures
-   - If batches are still failing, repeat the question with the updated failed list
-4. **If user selects "Stop":**
-   - Do not retry any more batches
-   - Proceed to Step 17 with whatever outputs exist
+### Step 16: Check for Missing Outputs
+After all batches are done, check for missing outputs:
+```bash
+ls .long-table-summary/{topic}/outputs/
+```
+Missing files indicate subagent failure. If any are missing:
+1. Ask user using the `question` tool:
+   - "{number} batches failed. Retry failed batches or proceed with available outputs?"
+2. **Options:**
+   - "Retry failed batches"
+   - "Proceed with available outputs"
+3. **If user selects "Retry":**
+   - Re-launch subagent with same prompt file for each failed batch
+4. **If user selects "Proceed":**
+   - Continue to Step 17 with available outputs
+Note: Since subagents validate their outputs before writing, existing files should contain valid JSON.
 ### Step 17: Combine All JSON Outputs
@@ -375,28 +444,30 @@ Provide user with:
 ## Python Scripts
-### Script 1: `generate_prompts.py`
-**Arguments:**
-- `--template`: Path to subagent_template.md
-- `--output-dir`: Directory for generated prompts
-- `--num-batches`: Total number of batches
-- `--sheet-name`: Sheet name
-- `--file-path`: Full path to the input table file
-- `--start-row`: Starting data row (default: 2)
-- `--batch-size`: Rows per batch (default: 30)
-- `--instructions`: User-confirmed JSON with summarization fields
-- `--dry-run`: Validate without creating files (optional)
-- `--verbose`: Enable verbose output for debugging (optional)
-**Placeholders to replace:**
-- `{file_path}` → Absolute input file path
-- `{sheet_name}` → Sheet name
-- `{batch_number}` → Batch number (001, 002, etc.)
-- `{row_start}` → Start row
-- `{row_end}` → End row
-- `{output_file}` → Output file path
-- `{instructions_json}` → User's JSON instruction (properly escaped for markdown code block)
+### Script 1: `generate_prompts.py`
+**Arguments:**
+- `--template`: Path to subagent_template.md
+- `--output-dir`: Directory for generated prompts
+- `--num-batches`: Total number of batches
+- `--sheet-name`: Sheet name
+- `--file-path`: Full path to the input table file
+- `--start-row`: Starting data row (default: 2)
+- `--batch-size`: Rows per batch (default: 30)
+- `--instructions`: User-confirmed JSON with summarization fields
+- `--schema-path`: Path to output JSON Schema file (required)
+- `--dry-run`: Validate without creating files (optional)
+- `--verbose`: Enable verbose output for debugging (optional)
+**Placeholders to replace:**
+- `{file_path}` → Absolute input file path
+- `{sheet_name}` → Sheet name
+- `{batch_number}` → Batch number (001, 002, etc.)
+- `{row_start}` → Start row
+- `{row_end}` → End row
+- `{output_file}` → Output file path
+- `{instructions_json}` → User's JSON instruction (properly escaped for markdown code block)
+- `{schema_path}` → Path to output JSON Schema file
 ### Script 2: `combine_outputs.py`
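
Note on the SKILL.md changes above: Step 6.5 and the new subagent workflow depend on the plugin's `jsonInfer` and `jsonValidate` tools, whose implementations are not part of this diff. As a rough illustration of what "strict" inference with all fields required could look like, the following standard-library Python sketch derives a schema from an example output and checks a batch against it. The names `infer_strict_schema` and `validate` are hypothetical, and the behavior (every observed key required, no extra keys allowed) is an assumption based on the wording of Step 6.5, not the plugin's actual logic.

```python
from typing import Any

# Scalar JSON Schema type names mapped to Python types (sketch only).
_SCALARS = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def infer_strict_schema(example: Any) -> dict:
    """Infer a JSON-Schema-like dict from an example value.

    Assumption: "strict" means every observed key is required and
    additional keys are rejected.
    """
    if isinstance(example, dict):
        return {
            "type": "object",
            "properties": {k: infer_strict_schema(v) for k, v in example.items()},
            "required": list(example.keys()),
            "additionalProperties": False,
        }
    if isinstance(example, list):
        # The first element serves as the template for all items.
        items = infer_strict_schema(example[0]) if example else {}
        return {"type": "array", "items": items}
    if isinstance(example, bool):  # bool is a subclass of int, so check it first
        return {"type": "boolean"}
    if isinstance(example, int):
        return {"type": "integer"}
    if isinstance(example, float):
        return {"type": "number"}
    if example is None:
        return {"type": "null"}
    return {"type": "string"}


def validate(data: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of error strings; an empty list means valid."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}.{key}: missing required field")
        props = schema.get("properties", {})
        for key, value in data.items():
            if key in props:
                errors.extend(validate(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: unexpected field")
    elif expected == "array":
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        for i, item in enumerate(data):
            errors.extend(validate(item, schema.get("items", {}), f"{path}[{i}]"))
    elif expected in _SCALARS:
        ok = isinstance(data, _SCALARS[expected])
        if expected in ("integer", "number") and isinstance(data, bool):
            ok = False  # reject True/False where a number is expected
        if not ok:
            errors.append(f"{path}: expected {expected}")
    return errors


# Example output from Step 6.5 of the skill file.
example = {
    "batch_number": 1,
    "row_count": 30,
    "summaries": [{"row_number": 2, "species": "Tier1", "topic": "Oncology"}],
}
schema = infer_strict_schema(example)

good = {
    "batch_number": 2,
    "row_count": 1,
    "summaries": [{"row_number": 32, "species": "NA", "topic": "Other"}],
}
bad = {
    "batch_number": 2,
    "row_count": 1,
    "summaries": [{"row_number": 32, "species": "Tier2"}],
}

print(validate(good, schema))  # []
print(validate(bad, schema))   # ['$.summaries[0].topic: missing required field']
```

Because every field is required under this reading, a row with no extractable value still has to carry the key, which is why the skill file prescribes the string `"NA"` instead of omitting the field.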

package/dist/skills/long-table-summary/combine_outputs.py CHANGED

@@ -37,20 +37,17 @@ def read_json_outputs(input_dir: str, verbose: bool = False) -> Dict[str, Any]:
             with open(batch_file, "r", encoding="utf-8") as f:
                 content = f.read().strip()
-            # Find JSON in markdown (typically the entire content)
-            json_start = content.find("{")
-            json_end = content.rfind("}") + 1
-            if json_start == -1 or json_end == 0:
+            # Extract JSON using brace matching (string-aware, handles nested structures)
+            extracted = extract_json_from_content(content)
+            if extracted is None:
                 if verbose:
-                    print(f"Warning: No JSON found in {batch_file.name}")
+                    print(f"Warning: No valid JSON found in {batch_file.name}")
                 continue
-            json_str = content[json_start:json_end]
-            data = json.loads(json_str)
-            all_summaries.append(data)
+            all_summaries.append(extracted)
             if verbose:
+                data = extracted
                 print(
                     f"Parsed: {batch_file.name} - {len(data.get('summaries', []))} summaries"
                 )
@@ -67,6 +64,55 @@ def read_json_outputs(input_dir: str, verbose: bool = False) -> Dict[str, Any]:
     return {"success": True, "summaries": all_summaries}
+def extract_json_from_content(content: str) -> dict | None:
+    """Extract JSON from content using string-aware brace matching.
+    Args:
+        content: File content string
+    Returns:
+        Parsed JSON dict or None if not found
+    """
+    # Try direct parse first (file contains only JSON)
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        pass
+    # Find JSON object boundaries with brace matching
+    start = content.find("{")
+    if start == -1:
+        return None
+    depth = 0
+    in_string = False
+    escape = False
+    for i in range(start, len(content)):
+        char = content[i]
+        if escape:
+            escape = False
+            continue
+        if char == "\\":
+            escape = True
+            continue
+        if char == '"':
+            in_string = not in_string
+            continue
+        if not in_string:
+            if char == "{":
+                depth += 1
+            elif char == "}":
+                depth -= 1
+                if depth == 0:
+                    try:
+                        return json.loads(content[start : i + 1])
+                    except json.JSONDecodeError:
+                        return None
+    return None
 def merge_summaries(
     summaries: List[Dict[str, Any]],
     deduplicate: bool = False,
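
Note on the `combine_outputs.py` change above: the hunk replaces slicing from the first `{` to the last `}` with string-aware brace matching. The sketch below contrasts the two approaches on an input where the old slicing picks up trailing text. Both helpers are condensed restatements for illustration (the names `slice_first_last` and `match_braces` are hypothetical), not the shipped functions, and the old variant here returns `None` on a parse error as a simplification.

```python
import json


def slice_first_last(content: str):
    """Old approach: slice from the first '{' to the last '}'."""
    start, end = content.find("{"), content.rfind("}") + 1
    if start == -1 or end == 0:
        return None
    try:
        return json.loads(content[start:end])
    except json.JSONDecodeError:
        return None


def match_braces(content: str):
    """New approach: track nesting depth, ignoring braces inside JSON strings."""
    start = content.find("{")
    if start == -1:
        return None
    depth, in_string, escape = 0, False, False
    for i in range(start, len(content)):
        char = content[i]
        if escape:  # the character after a backslash is skipped
            escape = False
        elif char == "\\":
            escape = True
        elif char == '"':
            in_string = not in_string
        elif not in_string:
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:  # the object opened at `start` is now closed
                    try:
                        return json.loads(content[start : i + 1])
                    except json.JSONDecodeError:
                        return None
    return None


# A fenced JSON object with a '}' inside a string value and braces in trailing text.
content = 'Result:\n```json\n{"note": "a } inside", "n": {"k": 1}}\n```\nDone {ok}'

print(slice_first_last(content))  # None
print(match_braces(content))      # {'note': 'a } inside', 'n': {'k': 1}}
```

One limitation visible in the diff: the scan always starts at the first `{` in the file, so stray braces in text that precedes the JSON object (for example `See {x} then {"a": 1}`) still lead to a `None` result.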

package/dist/skills/long-table-summary/generate_prompts.py CHANGED

@@ -17,6 +17,7 @@ def generate_prompts(
     batch_size,
     file_path,
     instructions,
+    schema_path,
     dry_run=False,
     verbose=False,
 ):
@@ -31,6 +32,7 @@ def generate_prompts(
         batch_size: Rows per batch
         file_path: Full path to input table file
         instructions: User-provided summarization instructions (JSON string)
+        schema_path: Path to output JSON Schema file
         dry_run: Validate without creating files
         verbose: Enable verbose output
     """
@@ -83,6 +85,7 @@ def generate_prompts(
         content = content.replace("{row_end}", str(row_end))
         content = content.replace("{output_file}", output_file)
         content = content.replace("{instructions_json}", instructions_escaped)
+        content = content.replace("{schema_path}", schema_path)
         # Dry run mode - skip actual file writes
         if dry_run:
@@ -147,6 +150,11 @@ def main():
         required=True,
         help="User-provided summarization instructions (JSON string)",
     )
+    parser.add_argument(
+        "--schema-path",
+        required=True,
+        help="Path to output JSON Schema file (relative or absolute)",
+    )
     parser.add_argument(
         "--dry-run",
         action="store_true",
@@ -200,6 +208,7 @@ def main():
         batch_size=args.batch_size,
         file_path=args.file_path,
         instructions=args.instructions,
+        schema_path=args.schema_path,
         dry_run=args.dry_run,
         verbose=args.verbose,
     )
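
Note on the `generate_prompts.py` change above: the script fills the template through chained `str.replace` calls, now including `{schema_path}`. A minimal sketch of that substitution style follows, assuming a hypothetical helper `fill_template` and a made-up topic directory `demo`. It also shows why literal replacement is a reasonable fit here: the template contains JSON braces, which `str.format` would try to interpret as replacement fields.

```python
def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each literal {name} token; all other braces are left untouched."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


# A shortened stand-in for subagent_template.md (not the full template).
template = (
    "Rows: {row_start} to {row_end}\n"
    'Structure: {"batch_number": {batch_number}}\n'
    "Schema: `{schema_path}`"
)
values = {
    "row_start": "2",
    "row_end": "31",
    "batch_number": "001",
    "schema_path": ".long-table-summary/demo/schema.json",  # hypothetical topic name
}

print(fill_template(template, values))

# str.format parses the JSON braces as a replacement field and raises.
try:
    template.format(**values)
except (KeyError, ValueError, IndexError) as exc:
    print(f"str.format failed: {type(exc).__name__}")
```

Since the replacements run one after another, a substituted value that itself contains a `{name}` token would be rewritten by a later replacement, which matters mainly for free-form values such as the instructions JSON.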