@yeyuan98/opencode-bioresearcher-plugin 1.3.1-alpha.0 → 1.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -6,6 +6,7 @@ allowedTools:
   - Read
   - Write
   - Question
+  - parse_pubmed_articleSet
 ---
 
 # PubMed Weekly Daily Updates Download
@@ -34,13 +35,13 @@ This skill integrates with the `python-setup-uv` skill to ensure Python environm
 Before starting the download process:
 
 1. **Check if uv is installed:**
-   ```bash
-   if [ -f "uv" ] || [ -f "uv.exe" ]; then
-     echo "uv already installed"
-   else
-     echo "uv not found, setting up..."
-   fi
-   ```
+   ```bash
+   if [ -f "uv" ] || [ -f "uv.exe" ]; then
+     echo "uv already installed"
+   else
+     echo "uv not found, setting up..."
+   fi
+   ```
 
 2. **If uv is not installed:**
    - Load the `python-setup-uv` skill using the skill tool
@@ -49,53 +50,46 @@ Before starting the download process:
    - Continue with this skill's Step 1 below
 
 3. **After uv is installed:**
-   - All Python commands in this skill will use the bash tool with `workdir` parameter
-   - Use `./uv run python` (Unix-like) or `uv.exe run python` (Windows cmd.exe)
    - The bundled script `pubmed_weekly.py` will be executed using uv
+   - Extract the full script path from the `<skill_files>` section in the skill tool output
 
-## Path Resolution Strategy
-
-This skill uses the bash tool's `workdir` parameter to handle portable script execution:
-
-1. **Extract skill directory path** from `<skill_files>` section in skill tool output
-   - Example: `<file>C:\Users\...\plugin\skills\pubmed-weekly\pubmed_weekly.py</file>`
-   - Extract directory: `C:\Users\...\plugin\skills\pubmed-weekly\`
-
-2. **Get current working directory** using the bash tool
-   ```bash
-   WORKING_DIR=$(pwd)  # Unix-like
-   # or use the default working directory from bash tool
-   ```
-
-3. **Use bash tool with workdir** to run Python scripts from skill directory
-   ```bash
-   # bash tool will handle the workdir parameter
-   # Python's os.getcwd() will give skill directory
-   # Downloads go to working directory via --working-dir argument
-   ```
-
 ## Steps
 
 Follow these steps EXACTLY as described.
 
 ### Step 1: Calculate Week Date Range
 
 First, determine the date range for the past week (Monday through Sunday).
 
-Use the bash tool with:
-- `workdir` set to the skill directory (extracted from `<skill_files>`)
-- `command` to run the Python script
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section in the skill tool output.
 
 **For Unix-like shells (Git Bash / macOS / Linux):**
 ```bash
-./uv run python pubmed_weekly.py calculate_week --working-dir="$(pwd)"
+uv run python <skill_path>/pubmed_weekly.py calculate_week
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py calculate_week --working-dir="%CD%"
+uv.exe run python <skill_path>\pubmed_weekly.py calculate_week
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
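The body of `calculate_week` is not included in this diff. As a rough sketch only (the function name below matches the script docs, but this body is an assumption, not the released implementation), a Monday-through-Sunday range for the previous week can be derived like this:

```python
from datetime import date, timedelta

def past_week_range(today: date) -> str:
    # Monday of the current week, then step back one full week
    monday = today - timedelta(days=today.weekday()) - timedelta(days=7)
    sunday = monday + timedelta(days=6)
    # Week folder name in the YYYYMMDD-YYYYMMDD format used by later steps
    return f"{monday:%Y%m%d}-{sunday:%Y%m%d}"

print(past_week_range(date(2025, 2, 26)))  # → 20250217-20250223
```

The result matches the example week (`20250217-20250223`) used throughout this document.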
 
 **Expected output format:**
@@ -105,28 +99,24 @@ This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
 
 ### Step 2: Create Download Directory
 
-Create the directory structure for the week in the working directory:
-
-```bash
-mkdir -p .download/pubmed-daily/<WEEK>
-```
-
-Replace `<WEEK>` with the actual week folder name from Step 1.
+The `download_file` command automatically creates the directory structure when needed. No manual directory creation is required.
 
 ### Step 3: Fetch FTP File List
 
-Use the bash tool with `workdir` set to the skill directory to fetch the list of files from the NCBI FTP server:
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and fetch the list of files from the NCBI FTP server.
 
 **For Unix-like shells:**
 ```bash
-./uv run python pubmed_weekly.py fetch_files --working-dir="$(pwd)"
+uv run python <skill_path>/pubmed_weekly.py fetch_files
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py fetch_files --working-dir="%CD%"
+uv.exe run python <skill_path>\pubmed_weekly.py fetch_files
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will list all daily update `.xml.gz` files available on the FTP server.
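The body of `fetch_files` is likewise not shown in this hunk. Per the Notes, the script relies on Python's built-in `urllib.request`, which accepts `ftp://` URLs. A hedged sketch (helper names are illustrative, and a real FTP listing may be `ls -l`-style lines rather than bare filenames):

```python
import urllib.request

def xml_gz_names(listing: str) -> list[str]:
    # Keep only daily-update archives from a whitespace-separated listing
    return sorted(tok for tok in listing.split() if tok.endswith(".xml.gz"))

def fetch_update_listing(
    url: str = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/",
) -> list[str]:
    # urllib.request handles ftp:// URLs; a trailing-slash URL yields a listing
    with urllib.request.urlopen(url, timeout=30) as resp:
        return xml_gz_names(resp.read().decode("utf-8", errors="replace"))

print(xml_gz_names("pubmed24n1236.xml.gz pubmed24n1234.xml.gz pubmed24n1235.xml.gz.md5"))
# → ['pubmed24n1234.xml.gz', 'pubmed24n1236.xml.gz']
```

Note how the `.md5` checksum companion files are excluded, matching the skill's "only `.xml.gz` files are downloaded" rule.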
 
 **Expected output:**
@@ -136,19 +126,20 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 4: Filter Files for Past Week
 
-Use the bash tool with `workdir` set to the skill directory to filter the file list for the past week's daily updates:
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and filter the file list for the past week's daily updates.
 
 **For Unix-like shells:**
 ```bash
-./uv run python pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>" --working-dir="$(pwd)"
+uv run python <skill_path>/pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>" --working-dir="%CD%"
+uv.exe run python <skill_path>\pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
 - `<WEEK>` is the week folder name (e.g., `20250217-20250223`)
 - `<FILE_LIST>` is the output from Step 3 (space-separated filenames, use quotes)
@@ -161,20 +152,25 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 5: Download Files with Retry
 
-For each file in the filtered list, download to the target directory with retry logic:
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section. Then, for each file in the filtered list, download to the target directory with retry logic.
 
 **For Unix-like shells:**
 ```bash
 for file in <FILE_LIST>; do
-    ./uv run python pubmed_weekly.py download_file <WEEK> $file --working-dir="$(pwd)"
+    uv run python <skill_path>/pubmed_weekly.py download_file <WEEK> $file
 done
 ```
 
 **For Windows cmd.exe:**
 ```bash
-for %f in (<FILE_LIST>) do uv.exe run python pubmed_weekly.py download_file <WEEK> %f --working-dir="%CD%"
+for %f in (<FILE_LIST>) do uv.exe run python <skill_path>\pubmed_weekly.py download_file <WEEK> %f
 ```
 
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<WEEK>` is the week folder name
+- `<FILE_LIST>` is the space-separated list from Step 4
+
-Replace `<FILE_LIST>` with the space-separated list from Step 4.
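The per-file retry behavior (up to 3 attempts with 2-second delays, per the Notes and the `download_file` docs below) can be sketched generically. `fetch` here is a stand-in for the real download call, not the script's actual API:

```python
import time

def with_retry(fetch, max_retries: int = 3, delay: float = 2.0):
    """Call fetch(); on failure, retry up to max_retries attempts total."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except OSError as e:  # network errors in a real download
            last_error = e
            if attempt < max_retries:
                time.sleep(delay)
    raise last_error

# Usage: a fake download that fails twice, then succeeds on attempt 3
calls = {"n": 0}
def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient FTP error")
    return "ok"

print(with_retry(flaky_download, delay=0.01))  # → ok
```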
 
 **Download behavior:**
@@ -200,6 +196,60 @@ ls -lh .download/pubmed-daily/<WEEK>/
 
 Count the number of files downloaded and report the summary to the user.
 
+### Step 7: Parse XML Files to Individual Excel Sheets
+
+For each downloaded `.xml.gz` file in `.download/pubmed-daily/<WEEK>/`, use the `parse_pubmed_articleSet` tool to convert it to an Excel file.
+
+**Tool invocation pattern:**
+
+```
+parse_pubmed_articleSet
+  filePath="<working_dir>/.download/pubmed-daily/<WEEK>/<filename>.xml.gz"
+  outputMode="excel"
+  outputFileName="<filename>.xlsx"
+  outputDir="<working_dir>/.download/pubmed-daily/<WEEK>"
+```
+
+**Example:**
+For file `pubmed24n1234.xml.gz` in week `20250217-20250223`:
+- Input: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xml.gz`
+- Output: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xlsx`
+
+**Process:**
+1. List all `.xml.gz` files in the week directory
+2. For each file, call `parse_pubmed_articleSet` with `outputMode="excel"`
+3. The output Excel file will be saved in the same directory as the input
+4. Report parsing statistics (articles processed, any errors)
+
+### Step 8: Combine Individual Excel Files
+
+After all individual Excel files are created, combine them into a single `combined.xlsx` file using the Python script.
+
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section.
+
+**For Unix-like shells (Git Bash / macOS / Linux):**
+```bash
+uv run python <skill_path>/pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+**For Windows cmd.exe:**
+```bash
+uv.exe run python <skill_path>\pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<WEEK>` is the actual week folder name (e.g., `20250217-20250223`)
+
+**Expected behavior:**
+- Finds all `.xlsx` files in the week directory (excluding `combined.xlsx`)
+- Reads each file and combines all rows
+- Writes `combined.xlsx` with all articles from all files
+- Returns a summary: total rows and source files processed
+
+**Output location:**
+`.download/pubmed-daily/<WEEK>/combined.xlsx`
+
 ## Python Script Details
 
 The skill includes a bundled Python script at `pubmed_weekly.py` with the following functions:
@@ -228,11 +278,23 @@ Parameters:
 
 Behavior:
 - Downloads from `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/<filename>`
-- Saves to `<working_dir>/.download/pubmed-daily/<week_name>/<filename>`
-- Uses `--working-dir` argument if provided, otherwise uses current directory
+- Saves to `.download/pubmed-daily/<week_name>/<filename>` in the current working directory
+- Creates the directory structure if needed
 - Retries up to 3 times on failure
 - Returns exit code 0 on success, 1 on failure (after all retries)
 
+### 5. `combine_excel(week_name)` - Combine Excel files into combined.xlsx
+
+Parameters:
+- `week_name`: Week folder name (e.g., `20250217-20250223`)
+
+Behavior:
+- Searches for all `.xlsx` files in `.download/pubmed-daily/<week_name>/` in the current working directory
+- Excludes `combined.xlsx` from the list
+- Reads each Excel file and combines all rows
+- Creates `combined.xlsx` with all articles merged
+- Returns JSON with: `success`, `total_rows`, `source_files`, `output_file`
+
 ## Output Summary
 
 After completion, provide the user with:
@@ -242,19 +304,26 @@ After completion, provide the user with:
 3. Number of files successfully downloaded
 4. Number of files that failed to download (if any)
 5. Download location: `.download/pubmed-daily/<WEEK>/`
+6. Number of XML files parsed to Excel (Step 7)
+7. Total articles in `combined.xlsx` (Step 8)
+8. Combined file location: `.download/pubmed-daily/<WEEK>/combined.xlsx`
 
 ## Notes
 
 - This skill automatically checks for and installs uv using the `python-setup-uv` skill if not present
 - The Python script is bundled with this skill at `pubmed_weekly.py`
-- All Python commands use the bash tool's `workdir` parameter to run from skill directory
-- The `--working-dir` argument ensures downloads go to the user's working directory
+- All Python commands use the full script path extracted from the `<skill_files>` section
+- The script uses `os.getcwd()` to determine the working directory, which is naturally the opencode working directory
+- All output files (downloads, Excel files) are created in the opencode working directory
 - The FTP server path is: `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/`
 - Only `.xml.gz` files are downloaded
 - Downloads are sequential (one file at a time)
 - Retry logic includes 2-second delays between attempts
 - User has control to abort on persistent failures
-- The script uses Python's built-in `urllib.request` for FTP operations - no additional Python package dependencies required
-- Skill directory path is extracted from `<skill_files>` section for portability
+- The script uses Python's built-in `urllib.request` for FTP operations
+- The `combine_excel` command requires the `openpyxl` package (auto-installed via uv)
+- Skill directory path is extracted from the `<skill_files>` section for script location
 - Windows with Git Bash: Follow Unix-like shell instructions
-- Windows cmd.exe: Use `uv.exe run python` syntax and `%CD%` for working directory
+- Windows cmd.exe: Use `uv.exe run python` syntax
+- Step 7 uses the `parse_pubmed_articleSet` tool for XML to Excel conversion
+- Step 8 combines all individual Excel files into a single `combined.xlsx`
@@ -7,19 +7,19 @@ This script handles:
 - Fetching FTP file list from NCBI
 - Filtering files for the specific week
 - Downloading files with retry logic
+- Combining Excel files into combined.xlsx
 """
 
 import os
 import sys
 import re
 import time
+import json
+import glob
 import urllib.request
 import argparse
 from datetime import datetime, timedelta
-from typing import List, Tuple
-
-# Global working directory for downloads (set via --working-dir argument)
-WORKING_DIR = None
+from typing import List, Dict, Any
 
 
 def calculate_week() -> str:
@@ -201,8 +201,8 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
     base_url = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/"
     url = f"{base_url}{filename}"
 
-    # Create download directory - use WORKING_DIR if set, otherwise current directory
-    base_dir = WORKING_DIR if WORKING_DIR else os.getcwd()
+    # Create download directory in the current working directory
+    base_dir = os.getcwd()
     download_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
     os.makedirs(download_dir, exist_ok=True)
 
@@ -234,26 +234,132 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
     return 1
 
 
+def combine_excel(week_name: str) -> Dict[str, Any]:
+    """Combine all Excel files in the week folder into combined.xlsx.
+
+    Args:
+        week_name: Week folder name (e.g., '20250217-20250223')
+
+    Returns:
+        Dict with success, total_rows, source_files, output_file
+    """
+    try:
+        from openpyxl import load_workbook, Workbook
+    except ImportError:
+        print("Error: openpyxl package not installed.", file=sys.stderr)
+        print("Please install with: uv add openpyxl", file=sys.stderr)
+        return {
+            "success": False,
+            "error": "openpyxl not installed",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    # Use the current working directory
+    base_dir = os.getcwd()
+    week_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
+
+    if not os.path.exists(week_dir):
+        return {
+            "success": False,
+            "error": f"Directory not found: {week_dir}",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    xlsx_pattern = os.path.join(week_dir, "*.xlsx")
+    all_xlsx_files = glob.glob(xlsx_pattern)
+
+    source_files = [
+        os.path.basename(f) for f in all_xlsx_files if not f.endswith("combined.xlsx")
+    ]
+    source_files.sort()
+
+    if not source_files:
+        return {
+            "success": False,
+            "error": "No Excel files found to combine",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    combined_wb = Workbook()
+    combined_ws = combined_wb.active
+    if combined_ws is None:
+        combined_ws = combined_wb.create_sheet("PubMed Articles")
+    else:
+        combined_ws.title = "PubMed Articles"
+
+    header_written = False
+    total_rows = 0
+    processed_files = []
+
+    for filename in source_files:
+        filepath = os.path.join(week_dir, filename)
+
+        try:
+            wb = load_workbook(filepath, read_only=True, data_only=True)
+            ws = wb.active
+            if ws is None:
+                print(f"Warning: {filename} has no active sheet, skipping")
+                wb.close()
+                continue
+
+            rows = list(ws.rows)
+            if not rows:
+                print(f"Warning: {filename} is empty, skipping")
+                wb.close()
+                continue
+
+            if not header_written:
+                headers = [cell.value for cell in rows[0]]
+                combined_ws.append(headers)
+                header_written = True
+            # Skip row 0 of every file (its header); only the first header is kept
+            data_start = 1
+
+            for row in rows[data_start:]:
+                row_values = [cell.value for cell in row]
+                if any(v is not None for v in row_values):
+                    combined_ws.append(row_values)
+                    total_rows += 1
+
+            processed_files.append(filename)
+            wb.close()
+            print(f"Processed: {filename}")
+
+        except Exception as e:
+            print(f"Warning: Error processing {filename}: {e}", file=sys.stderr)
+            continue
+
+    output_path = os.path.join(week_dir, "combined.xlsx")
+    combined_wb.save(output_path)
+
+    print(f"\nCombined {total_rows} rows from {len(processed_files)} files")
+    print(f"Output: {output_path}")
+
+    return {
+        "success": True,
+        "total_rows": total_rows,
+        "source_files": processed_files,
+        "output_file": "combined.xlsx",
+    }
+
+
 def main():
     """Main entry point for command-line usage."""
     parser = argparse.ArgumentParser(
         description="PubMed Weekly Daily Updates Downloader"
     )
-    parser.add_argument(
-        "--working-dir",
-        type=str,
-        help="Working directory for downloads (default: current directory)",
-    )
     parser.add_argument("command", type=str, help="Command to execute")
     parser.add_argument("args", nargs="*", help="Command arguments")
 
     parsed = parser.parse_args()
 
-    # Set global working directory if provided
-    global WORKING_DIR
-    if parsed.working_dir:
-        WORKING_DIR = parsed.working_dir
-
     command = parsed.command
     args = parsed.args
@@ -284,6 +390,18 @@ def main():
         filename = args[1]
         sys.exit(download_file(week_name, filename))
 
+    elif command == "combine_excel":
+        if len(args) < 1:
+            print("Usage: python pubmed_weekly.py combine_excel <week_name>")
+            sys.exit(1)
+
+        week_name = args[0]
+        result = combine_excel(week_name)
+        print(json.dumps(result, indent=2))
+
+        if not result.get("success"):
+            sys.exit(1)
+
     else:
         print(f"Unknown command: {command}")
         sys.exit(1)
@@ -0,0 +1,8 @@
+[project]
+name = "pubmed-weekly"
+version = "1.0.0"
+description = "PubMed weekly daily updates downloader and processor"
+requires-python = ">=3.10"
+dependencies = [
+    "openpyxl>=3.1.0",
+]
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@yeyuan98/opencode-bioresearcher-plugin",
-  "version": "1.3.1-alpha.0",
+  "version": "1.3.1",
   "description": "OpenCode plugin that adds a bioresearcher agent",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",