@yeyuan98/opencode-bioresearcher-plugin 1.3.1-alpha.0 → 1.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -6,6 +6,7 @@ allowedTools:
 - Read
 - Write
 - Question
+- parse_pubmed_articleSet
 ---
 
 # PubMed Weekly Daily Updates Download
@@ -34,13 +35,13 @@ This skill integrates with the `python-setup-uv` skill to ensure Python environm
 Before starting the download process:
 
 1. **Check if uv is installed:**
-
-
-
-
-
-
-
+```bash
+if [ -f "uv" ] || [ -f "uv.exe" ]; then
+  echo "uv already installed"
+else
+  echo "uv not found, setting up..."
+fi
+```
 
 2. **If uv is not installed:**
 - Load the `python-setup-uv` skill using the skill tool
@@ -49,53 +50,46 @@ Before starting the download process:
 - Continue with this skill's Step 1 below
 
 3. **After uv is installed:**
-- All Python commands in this skill will use the bash tool with `workdir` parameter
-- Use `./uv run python` (Unix-like) or `uv.exe run python` (Windows cmd.exe)
 - The bundled script `pubmed_weekly.py` will be executed using uv
+- Extract the full script path from the `<skill_files>` section in skill tool output
 
-##
-
-This skill uses the bash tool's `workdir` parameter to handle portable script execution:
+## Steps
 
-
-- Example: `<file>C:\Users\...\plugin\skills\pubmed-weekly\pubmed_weekly.py</file>`
-- Extract directory: `C:\Users\...\plugin\skills\pubmed-weekly\`
+Follow these steps EXACTLY as described.
 
-
-```bash
-WORKING_DIR=$(pwd) # Unix-like
-# or use the default working directory from bash tool
-```
+### Step 1: Calculate Week Date Range
 
-
-```bash
-# bash tool will handle the workdir parameter
-# Python's os.getcwd() will give skill directory
-# Downloads go to working directory via --working-dir argument
-```
+First, determine the date range for the past week (Monday through Sunday).
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section in the skill tool output.
 
-
+**For Unix-like shells (Git Bash / macOS / Linux):**
+```bash
+uv run python <skill_path>/pubmed_weekly.py calculate_week
+```
 
 ### Step 1: Calculate Week Date Range
 
 First, determine the date range for the past week (Monday through Sunday).
 
-
-
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section in the skill tool output.
+
+### Step 1: Calculate Week Date Range
+
+First, determine the date range for the past week (Monday through Sunday).
 
 **For Unix-like shells (Git Bash / macOS / Linux):**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py calculate_week
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py calculate_week
+uv.exe run python <skill_path>\pubmed_weekly.py calculate_week
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
 
 **Expected output format:**
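The `calculate_week` behavior described in this hunk (last week's Monday through Sunday, formatted `YYYYMMDD-YYYYMMDD`) can be sketched as follows; the function body itself is not part of this diff, so the implementation below is an illustrative assumption:

```python
from datetime import date, timedelta

def calculate_week(today=None):
    # Sketch of the documented behavior: find last week's Monday, add six
    # days to reach its Sunday, and format both ends as YYYYMMDD.
    today = today or date.today()
    last_monday = today - timedelta(days=today.weekday() + 7)
    last_sunday = last_monday + timedelta(days=6)
    return f"{last_monday:%Y%m%d}-{last_sunday:%Y%m%d}"

print(calculate_week(date(2025, 2, 26)))  # a Wednesday -> 20250217-20250223
```

Any day of the current week maps to the same folder name, matching the `20250217-20250223` example used throughout the document.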
@@ -105,28 +99,24 @@ This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
 
 ### Step 2: Create Download Directory
 
-
-
-```bash
-mkdir -p .download/pubmed-daily/<WEEK>
-```
-
-Replace `<WEEK>` with the actual week folder name from Step 1.
+The `download_file` command will automatically create the directory structure when needed. No manual directory creation is required.
 
 ### Step 3: Fetch FTP File List
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and fetch the list of files from the NCBI FTP server.
 
 **For Unix-like shells:**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py fetch_files
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py fetch_files
+uv.exe run python <skill_path>\pubmed_weekly.py fetch_files
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will list all daily update xml.gz files available on the FTP server.
 
 **Expected output:**
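The diff does not show how `fetch_files` parses the server response; a minimal, hypothetical sketch of pulling daily-update archive names out of a raw FTP directory listing (both the helper name `extract_update_files` and the listing format are assumptions, not the script's actual code):

```python
import re

def extract_update_files(listing_text):
    # Match daily-update archive names such as pubmed24n1234.xml.gz and
    # deduplicate (checksum entries like *.xml.gz.md5 contain the same stem).
    return sorted(set(re.findall(r"pubmed\d{2}n\d{4}\.xml\.gz", listing_text)))

sample = """-r--r--r-- 1 ftp anonymous  123 Feb 18 pubmed24n1234.xml.gz
-r--r--r-- 1 ftp anonymous  456 Feb 18 pubmed24n1234.xml.gz.md5
-r--r--r-- 1 ftp anonymous  789 Feb 19 pubmed24n1235.xml.gz"""
print(extract_update_files(sample))  # ['pubmed24n1234.xml.gz', 'pubmed24n1235.xml.gz']
```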
@@ -136,19 +126,20 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 4: Filter Files for Past Week
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and filter the file list for the past week's daily updates.
 
 **For Unix-like shells:**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
+uv.exe run python <skill_path>\pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
 - `<WEEK>` is the week folder name (e.g., `20250217-20250223`)
 - `<FILE_LIST>` is the output from Step 3 (space-separated filenames, use quotes)
 
@@ -161,20 +152,25 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 5: Download Files with Retry
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and, for each file in the filtered list, download to the target directory with retry logic.
 
 **For Unix-like shells:**
 ```bash
 for file in <FILE_LIST>; do
-
+  uv run python <skill_path>/pubmed_weekly.py download_file <WEEK> $file
 done
 ```
 
 **For Windows cmd.exe:**
 ```bash
-for %f in (<FILE_LIST>) do uv.exe run python pubmed_weekly.py download_file <WEEK> %f
+for %f in (<FILE_LIST>) do uv.exe run python <skill_path>\pubmed_weekly.py download_file <WEEK> %f
 ```
 
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<FILE_LIST>` is the space-separated list from Step 4
+- `<WEEK>` is the week folder name
+
 Replace `<FILE_LIST>` with the space-separated list from Step 4.
 
 **Download behavior:**
@@ -200,6 +196,60 @@ ls -lh .download/pubmed-daily/<WEEK>/
 
 Count the number of files downloaded and report the summary to the user.
 
+### Step 7: Parse XML Files to Individual Excel Sheets
+
+For each downloaded `.xml.gz` file in `.download/pubmed-daily/<WEEK>/`, use the `parse_pubmed_articleSet` tool to convert it to an Excel file.
+
+**Tool invocation pattern:**
+
+```
+parse_pubmed_articleSet
+  filePath="<working_dir>/.download/pubmed-daily/<WEEK>/<filename>.xml.gz"
+  outputMode="excel"
+  outputFileName="<filename>.xlsx"
+  outputDir="<working_dir>/.download/pubmed-daily/<WEEK>"
+```
+
+**Example:**
+For file `pubmed24n1234.xml.gz` in week `20250217-20250223`:
+- Input: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xml.gz`
+- Output: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xlsx`
+
+**Process:**
+1. List all `.xml.gz` files in the week directory
+2. For each file, call `parse_pubmed_articleSet` with `outputMode="excel"`
+3. The output Excel file will be saved in the same directory as the input
+4. Report parsing statistics (articles processed, any errors)
+
+### Step 8: Combine Individual Excel Files
+
+After all individual Excel files are created, combine them into a single `combined.xlsx` file using the Python script.
+
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section.
+
+**For Unix-like shells (Git Bash / macOS / Linux):**
+```bash
+uv run python <skill_path>/pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+**For Windows cmd.exe:**
+```bash
+uv.exe run python <skill_path>\pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<WEEK>` is the actual week folder name (e.g., `20250217-20250223`)
+
+**Expected behavior:**
+- Finds all `.xlsx` files in the week directory (excluding `combined.xlsx`)
+- Reads each file and combines all rows
+- Writes `combined.xlsx` with all articles from all files
+- Returns summary: total rows, source files processed
+
+**Output location:**
+`.download/pubmed-daily/<WEEK>/combined.xlsx`
+
 ## Python Script Details
 
 The skill includes a bundled Python script at `pubmed_weekly.py` with the following functions:
@@ -228,11 +278,23 @@ Parameters:
 
 Behavior:
 - Downloads from `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/<filename>`
-- Saves to
--
+- Saves to `.download/pubmed-daily/<week_name>/<filename>` in current working directory
+- Creates directory structure if needed
 - Retries up to 3 times on failure
 - Returns exit code 0 on success, 1 on failure (after all retries)
 
+### 5. `combine_excel(week_name)` - Combine Excel files into combined.xlsx
+
+Parameters:
+- `week_name`: Week folder name (e.g., `20250217-20250223`)
+
+Behavior:
+- Searches for all `.xlsx` files in `.download/pubmed-daily/<week_name>/` in current working directory
+- Excludes `combined.xlsx` from the list
+- Reads each Excel file and combines all rows
+- Creates `combined.xlsx` with all articles merged
+- Returns JSON with: success, total_rows, source_files, output_file
+
 ## Output Summary
 
 After completion, provide the user with:
@@ -242,19 +304,26 @@ After completion, provide the user with:
 3. Number of files successfully downloaded
 4. Number of files failed to download (if any)
 5. Download location: `.download/pubmed-daily/<WEEK>/`
+6. Number of XML files parsed to Excel (Step 7)
+7. Total articles in combined.xlsx (Step 8)
+8. Combined file location: `.download/pubmed-daily/<WEEK>/combined.xlsx`
 
 ## Notes
 
 - This skill automatically checks for and installs uv using the `python-setup-uv` skill if not present
 - The Python script is bundled with this skill at `pubmed_weekly.py`
-- All Python commands use the
-- The
+- All Python commands use the full script path extracted from `<skill_files>` section
+- The script uses `os.getcwd()` to determine the working directory, which is naturally the opencode working directory
+- All output files (downloads, Excel files) are created in the opencode working directory
 - The FTP server path is: `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/`
 - Only `.xml.gz` files are downloaded
 - Downloads are sequential (one file at a time)
 - Retry logic includes 2-second delays between attempts
 - User has control to abort on persistent failures
-- The script uses Python's built-in `urllib.request` for FTP operations
--
+- The script uses Python's built-in `urllib.request` for FTP operations
+- The `combine_excel` command requires `openpyxl` package (auto-installed via uv)
+- Skill directory path is extracted from `<skill_files>` section for script location
 - Windows with Git Bash: Follow Unix-like shell instructions
-- Windows cmd.exe: Use `uv.exe run python` syntax
+- Windows cmd.exe: Use `uv.exe run python` syntax
+- Step 7 uses the `parse_pubmed_articleSet` tool for XML to Excel conversion
+- Step 8 combines all individual Excel files into a single `combined.xlsx`
The remaining hunks are in the bundled script `pubmed_weekly.py`:

@@ -7,19 +7,19 @@ This script handles:
 - Fetching FTP file list from NCBI
 - Filtering files for the specific week
 - Downloading files with retry logic
+- Combining Excel files into combined.xlsx
 """
 
 import os
 import sys
 import re
 import time
+import json
+import glob
 import urllib.request
 import argparse
 from datetime import datetime, timedelta
-from typing import List,
-
-# Global working directory for downloads (set via --working-dir argument)
-WORKING_DIR = None
+from typing import List, Dict, Any
 
 
 def calculate_week() -> str:
@@ -201,8 +201,8 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
     base_url = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/"
     url = f"{base_url}{filename}"
 
-    # Create download directory
-    base_dir =
+    # Create download directory in current working directory
+    base_dir = os.getcwd()
     download_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
     os.makedirs(download_dir, exist_ok=True)
 
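The retry loop itself sits outside this hunk. Based only on the behavior documented above (up to 3 attempts, 2-second delays, `urllib.request`, exit code 0/1), a standalone sketch might look like this; the helper name `download_with_retry` is an assumption, not the script's actual function:

```python
import os
import time
import urllib.request

def download_with_retry(url, dest_path, max_retries=3, delay=2):
    # Sketch of the documented retry behavior: try up to max_retries times,
    # pausing `delay` seconds between failures; 0 = success, 1 = gave up.
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    for attempt in range(1, max_retries + 1):
        try:
            urllib.request.urlretrieve(url, dest_path)
            return 0
        except OSError as e:  # URLError is a subclass of OSError
            print(f"Attempt {attempt} failed: {e}")
            if attempt < max_retries:
                time.sleep(delay)
    return 1
```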
@@ -234,26 +234,132 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
         return 1
 
 
+def combine_excel(week_name: str) -> Dict[str, Any]:
+    """Combine all Excel files in week folder into combined.xlsx.
+
+    Args:
+        week_name: Week folder name (e.g., '20250217-20250223')
+
+    Returns:
+        Dict with success, total_rows, source_files, output_file
+    """
+    try:
+        from openpyxl import load_workbook, Workbook
+    except ImportError:
+        print("Error: openpyxl package not installed.", file=sys.stderr)
+        print("Please install with: uv add openpyxl", file=sys.stderr)
+        return {
+            "success": False,
+            "error": "openpyxl not installed",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    # Use current working directory
+    base_dir = os.getcwd()
+    week_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
+
+    if not os.path.exists(week_dir):
+        return {
+            "success": False,
+            "error": f"Directory not found: {week_dir}",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    xlsx_pattern = os.path.join(week_dir, "*.xlsx")
+    all_xlsx_files = glob.glob(xlsx_pattern)
+
+    source_files = [
+        os.path.basename(f) for f in all_xlsx_files if not f.endswith("combined.xlsx")
+    ]
+    source_files.sort()
+
+    if not source_files:
+        return {
+            "success": False,
+            "error": "No Excel files found to combine",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    combined_wb = Workbook()
+    combined_ws = combined_wb.active
+    if combined_ws is None:
+        combined_ws = combined_wb.create_sheet("PubMed Articles")
+    else:
+        combined_ws.title = "PubMed Articles"
+
+    header_written = False
+    total_rows = 0
+    processed_files = []
+
+    for filename in source_files:
+        filepath = os.path.join(week_dir, filename)
+
+        try:
+            wb = load_workbook(filepath, read_only=True, data_only=True)
+            ws = wb.active
+            if ws is None:
+                print(f"Warning: {filename} has no active sheet, skipping")
+                wb.close()
+                continue
+
+            rows = list(ws.rows)
+            if not rows:
+                print(f"Warning: {filename} is empty, skipping")
+                wb.close()
+                continue
+
+            if not header_written:
+                headers = [cell.value for cell in rows[0]]
+                combined_ws.append(headers)
+                header_written = True
+                data_start = 1
+            else:
+                data_start = 1  # skip this file's header row (0 would duplicate it as data)
+
+            for row in rows[data_start:]:
+                row_values = [cell.value for cell in row]
+                if any(v is not None for v in row_values):
+                    combined_ws.append(row_values)
+                    total_rows += 1
+
+            processed_files.append(filename)
+            wb.close()
+            print(f"Processed: {filename}")
+
+        except Exception as e:
+            print(f"Warning: Error processing {filename}: {e}", file=sys.stderr)
+            continue
+
+    output_path = os.path.join(week_dir, "combined.xlsx")
+    combined_wb.save(output_path)
+
+    print(f"\nCombined {total_rows} rows from {len(processed_files)} files")
+    print(f"Output: {output_path}")
+
+    return {
+        "success": True,
+        "total_rows": total_rows,
+        "source_files": processed_files,
+        "output_file": "combined.xlsx",
+    }
+
+
 def main():
     """Main entry point for command-line usage."""
     parser = argparse.ArgumentParser(
         description="PubMed Weekly Daily Updates Downloader"
     )
-    parser.add_argument(
-        "--working-dir",
-        type=str,
-        help="Working directory for downloads (default: current directory)",
-    )
     parser.add_argument("command", type=str, help="Command to execute")
     parser.add_argument("args", nargs="*", help="Command arguments")
 
     parsed = parser.parse_args()
 
-    # Set global working directory if provided
-    global WORKING_DIR
-    if parsed.working_dir:
-        WORKING_DIR = parsed.working_dir
-
     command = parsed.command
     args = parsed.args
 
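The row-merging core of `combine_excel` can be distilled into a pure-Python helper, shown here with the hypothetical name `merge_tables` on plain lists of rows (no openpyxl needed): keep the header row of the first table, skip each subsequent table's header, and drop all-empty rows.

```python
def merge_tables(tables):
    # tables: list of row lists, each with a header row at index 0.
    combined, header_written = [], False
    for rows in tables:
        if not rows:
            continue  # empty table: nothing to merge
        if not header_written:
            combined.append(rows[0])  # keep the first header only
            header_written = True
        for row in rows[1:]:
            if any(v is not None for v in row):  # drop all-empty rows
                combined.append(row)
    return combined
```

For example, two sheets sharing the header `["pmid", "title"]` merge into a single table with one header row and all non-empty data rows.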
@@ -284,6 +390,18 @@ def main():
         filename = args[1]
         sys.exit(download_file(week_name, filename))
 
+    elif command == "combine_excel":
+        if len(args) < 1:
+            print("Usage: python pubmed_weekly.py combine_excel <week_name>")
+            sys.exit(1)
+
+        week_name = args[0]
+        result = combine_excel(week_name)
+        print(json.dumps(result, indent=2))
+
+        if not result.get("success"):
+            sys.exit(1)
+
     else:
         print(f"Unknown command: {command}")
         sys.exit(1)