@yeyuan98/opencode-bioresearcher-plugin 1.3.1-alpha.0 → 1.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -6,6 +6,7 @@ allowedTools:
 - Read
 - Write
 - Question
+- parse_pubmed_articleSet
 ---
 
 # PubMed Weekly Daily Updates Download
@@ -34,13 +35,13 @@ This skill integrates with the `python-setup-uv` skill to ensure Python environm
 Before starting the download process:
 
 1. **Check if uv is installed:**
-
-
-
-
-
-
-
+```bash
+if [ -f "uv" ] || [ -f "uv.exe" ]; then
+  echo "uv already installed"
+else
+  echo "uv not found, setting up..."
+fi
+```
 
 2. **If uv is not installed:**
 - Load the `python-setup-uv` skill using the skill tool
@@ -49,53 +50,46 @@ Before starting the download process:
 - Continue with this skill's Step 1 below
 
 3. **After uv is installed:**
-- All Python commands in this skill will use the bash tool with `workdir` parameter
-- Use `./uv run python` (Unix-like) or `uv.exe run python` (Windows cmd.exe)
 - The bundled script `pubmed_weekly.py` will be executed using uv
+- Extract the full script path from the `<skill_files>` section in skill tool output
 
-##
-
-This skill uses the bash tool's `workdir` parameter to handle portable script execution:
+## Steps
 
-
-- Example: `<file>C:\Users\...\plugin\skills\pubmed-weekly\pubmed_weekly.py</file>`
-- Extract directory: `C:\Users\...\plugin\skills\pubmed-weekly\`
+Follow these steps EXACTLY as described.
 
-
-```bash
-WORKING_DIR=$(pwd) # Unix-like
-# or use the default working directory from bash tool
-```
+### Step 1: Calculate Week Date Range
 
-
-```bash
-# bash tool will handle the workdir parameter
-# Python's os.getcwd() will give skill directory
-# Downloads go to working directory via --working-dir argument
-```
+First, determine the date range for the past week (Monday through Sunday).
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section in the skill tool output.
 
-
+**For Unix-like shells (Git Bash / macOS / Linux):**
+```bash
+uv run python <skill_path>/pubmed_weekly.py calculate_week
+```
 
 ### Step 1: Calculate Week Date Range
 
 First, determine the date range for the past week (Monday through Sunday).
 
-
-
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section in the skill tool output.
+
+### Step 1: Calculate Week Date Range
+
+First, determine the date range for the past week (Monday through Sunday).
 
 **For Unix-like shells (Git Bash / macOS / Linux):**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py calculate_week
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py calculate_week
+uv.exe run python <skill_path>\pubmed_weekly.py calculate_week
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
 
 **Expected output format:**
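The `calculate_week` behavior described in this hunk (last week's Monday through Sunday, formatted `YYYYMMDD-YYYYMMDD`) can be sketched as follows; the function body itself is not part of this diff, so the implementation below is an illustrative assumption:

```python
from datetime import date, timedelta

def calculate_week(today=None):
    # Sketch of the documented behavior: find last week's Monday, add six
    # days to reach its Sunday, and format both ends as YYYYMMDD.
    today = today or date.today()
    last_monday = today - timedelta(days=today.weekday() + 7)
    last_sunday = last_monday + timedelta(days=6)
    return f"{last_monday:%Y%m%d}-{last_sunday:%Y%m%d}"

print(calculate_week(date(2025, 2, 26)))  # a Wednesday -> 20250217-20250223
```

Any day of the current week maps to the same folder name, matching the `20250217-20250223` example used throughout the document.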
@@ -105,28 +99,24 @@ This will output the week folder name in format `YYYYMMDD-YYYYMMDD`.
 
 ### Step 2: Create Download Directory
 
-
-
-```bash
-mkdir -p .download/pubmed-daily/<WEEK>
-```
-
-Replace `<WEEK>` with the actual week folder name from Step 1.
+The `download_file` command will automatically create the directory structure when needed. No manual directory creation is required.
 
 ### Step 3: Fetch FTP File List
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and fetch the list of files from the NCBI FTP server.
 
 **For Unix-like shells:**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py fetch_files
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py fetch_files
+uv.exe run python <skill_path>\pubmed_weekly.py fetch_files
 ```
 
+Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.
+
 This will list all daily update xml.gz files available on the FTP server.
 
 **Expected output:**
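The diff does not show how `fetch_files` parses the server response; a minimal, hypothetical sketch of pulling daily-update archive names out of a raw FTP directory listing (both the helper name `extract_update_files` and the listing format are assumptions, not the script's actual code):

```python
import re

def extract_update_files(listing_text):
    # Match daily-update archive names such as pubmed24n1234.xml.gz and
    # deduplicate (checksum entries like *.xml.gz.md5 contain the same stem).
    return sorted(set(re.findall(r"pubmed\d{2}n\d{4}\.xml\.gz", listing_text)))

sample = """-r--r--r-- 1 ftp anonymous  123 Feb 18 pubmed24n1234.xml.gz
-r--r--r-- 1 ftp anonymous  456 Feb 18 pubmed24n1234.xml.gz.md5
-r--r--r-- 1 ftp anonymous  789 Feb 19 pubmed24n1235.xml.gz"""
print(extract_update_files(sample))  # ['pubmed24n1234.xml.gz', 'pubmed24n1235.xml.gz']
```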
@@ -136,19 +126,20 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 4: Filter Files for Past Week
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and filter the file list for the past week's daily updates.
 
 **For Unix-like shells:**
 ```bash
-
+uv run python <skill_path>/pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 **For Windows cmd.exe:**
 ```bash
-uv.exe run python pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
+uv.exe run python <skill_path>\pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
 ```
 
 Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
 - `<WEEK>` is the week folder name (e.g., `20250217-20250223`)
 - `<FILE_LIST>` is the output from Step 3 (space-separated filenames, use quotes)
 
@@ -161,20 +152,25 @@ pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
 
 ### Step 5: Download Files with Retry
 
-
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and, for each file in the filtered list, download to the target directory with retry logic.
 
 **For Unix-like shells:**
 ```bash
 for file in <FILE_LIST>; do
-
+  uv run python <skill_path>/pubmed_weekly.py download_file <WEEK> $file
 done
 ```
 
 **For Windows cmd.exe:**
 ```bash
-for %f in (<FILE_LIST>) do uv.exe run python pubmed_weekly.py download_file <WEEK> %f
+for %f in (<FILE_LIST>) do uv.exe run python <skill_path>\pubmed_weekly.py download_file <WEEK> %f
 ```
 
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<FILE_LIST>` is the space-separated list from Step 4
+- `<WEEK>` is the week folder name
+
 Replace `<FILE_LIST>` with the space-separated list from Step 4.
 
 **Download behavior:**
@@ -200,6 +196,60 @@ ls -lh .download/pubmed-daily/<WEEK>/
 
 Count the number of files downloaded and report the summary to the user.
 
+### Step 7: Parse XML Files to Individual Excel Sheets
+
+For each downloaded `.xml.gz` file in `.download/pubmed-daily/<WEEK>/`, use the `parse_pubmed_articleSet` tool to convert it to an Excel file.
+
+**Tool invocation pattern:**
+
+```
+parse_pubmed_articleSet
+  filePath="<working_dir>/.download/pubmed-daily/<WEEK>/<filename>.xml.gz"
+  outputMode="excel"
+  outputFileName="<filename>.xlsx"
+  outputDir="<working_dir>/.download/pubmed-daily/<WEEK>"
+```
+
+**Example:**
+For file `pubmed24n1234.xml.gz` in week `20250217-20250223`:
+- Input: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xml.gz`
+- Output: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xlsx`
+
+**Process:**
+1. List all `.xml.gz` files in the week directory
+2. For each file, call `parse_pubmed_articleSet` with `outputMode="excel"`
+3. The output Excel file will be saved in the same directory as the input
+4. Report parsing statistics (articles processed, any errors)
+
+### Step 8: Combine Individual Excel Files
+
+After all individual Excel files are created, combine them into a single `combined.xlsx` file using the Python script.
+
+Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section.
+
+**For Unix-like shells (Git Bash / macOS / Linux):**
+```bash
+uv run python <skill_path>/pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+**For Windows cmd.exe:**
+```bash
+uv.exe run python <skill_path>\pubmed_weekly.py combine_excel "<WEEK>"
+```
+
+Where:
+- `<skill_path>` is the full directory path extracted from `<skill_files>`
+- `<WEEK>` is the actual week folder name (e.g., `20250217-20250223`)
+
+**Expected behavior:**
+- Finds all `.xlsx` files in the week directory (excluding `combined.xlsx`)
+- Reads each file and combines all rows
+- Writes `combined.xlsx` with all articles from all files
+- Returns summary: total rows, source files processed
+
+**Output location:**
+`.download/pubmed-daily/<WEEK>/combined.xlsx`
+
 ## Python Script Details
 
 The skill includes a bundled Python script at `pubmed_weekly.py` with the following functions:
@@ -228,11 +278,23 @@ Parameters:
 
 Behavior:
 - Downloads from `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/<filename>`
-- Saves to
--
+- Saves to `.download/pubmed-daily/<week_name>/<filename>` in current working directory
+- Creates directory structure if needed
 - Retries up to 3 times on failure
 - Returns exit code 0 on success, 1 on failure (after all retries)
 
+### 5. `combine_excel(week_name)` - Combine Excel files into combined.xlsx
+
+Parameters:
+- `week_name`: Week folder name (e.g., `20250217-20250223`)
+
+Behavior:
+- Searches for all `.xlsx` files in `.download/pubmed-daily/<week_name>/` in current working directory
+- Excludes `combined.xlsx` from the list
+- Reads each Excel file and combines all rows
+- Creates `combined.xlsx` with all articles merged
+- Returns JSON with: success, total_rows, source_files, output_file
+
 ## Output Summary
 
 After completion, provide the user with:
@@ -242,19 +304,26 @@ After completion, provide the user with:
 3. Number of files successfully downloaded
 4. Number of files failed to download (if any)
 5. Download location: `.download/pubmed-daily/<WEEK>/`
+6. Number of XML files parsed to Excel (Step 7)
+7. Total articles in combined.xlsx (Step 8)
+8. Combined file location: `.download/pubmed-daily/<WEEK>/combined.xlsx`
 
 ## Notes
 
 - This skill automatically checks for and installs uv using the `python-setup-uv` skill if not present
 - The Python script is bundled with this skill at `pubmed_weekly.py`
-- All Python commands use the
-- The
+- All Python commands use the full script path extracted from `<skill_files>` section
+- The script uses `os.getcwd()` to determine the working directory, which is naturally the opencode working directory
+- All output files (downloads, Excel files) are created in the opencode working directory
 - The FTP server path is: `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/`
 - Only `.xml.gz` files are downloaded
 - Downloads are sequential (one file at a time)
 - Retry logic includes 2-second delays between attempts
 - User has control to abort on persistent failures
-- The script uses Python's built-in `urllib.request` for FTP operations
--
+- The script uses Python's built-in `urllib.request` for FTP operations
+- The `combine_excel` command requires `openpyxl` package (auto-installed via uv)
+- Skill directory path is extracted from `<skill_files>` section for script location
 - Windows with Git Bash: Follow Unix-like shell instructions
-- Windows cmd.exe: Use `uv.exe run python` syntax
+- Windows cmd.exe: Use `uv.exe run python` syntax
+- Step 7 uses the `parse_pubmed_articleSet` tool for XML to Excel conversion
+- Step 8 combines all individual Excel files into a single `combined.xlsx`
The remaining hunks are in the bundled script `pubmed_weekly.py`:

@@ -7,19 +7,19 @@ This script handles:
 - Fetching FTP file list from NCBI
 - Filtering files for the specific week
 - Downloading files with retry logic
+- Combining Excel files into combined.xlsx
 """
 
 import os
 import sys
 import re
 import time
+import json
+import glob
 import urllib.request
 import argparse
 from datetime import datetime, timedelta
-from typing import List,
-
-# Global working directory for downloads (set via --working-dir argument)
-WORKING_DIR = None
+from typing import List, Dict, Any
 
 
 def calculate_week() -> str:
@@ -201,8 +201,8 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
     base_url = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/"
     url = f"{base_url}{filename}"
 
-    # Create download directory
-    base_dir =
+    # Create download directory in current working directory
+    base_dir = os.getcwd()
     download_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
     os.makedirs(download_dir, exist_ok=True)
 
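The retry loop itself sits outside this hunk. Based only on the behavior documented above (up to 3 attempts, 2-second delays, `urllib.request`, exit code 0/1), a standalone sketch might look like this; the helper name `download_with_retry` is an assumption, not the script's actual function:

```python
import os
import time
import urllib.request

def download_with_retry(url, dest_path, max_retries=3, delay=2):
    # Sketch of the documented retry behavior: try up to max_retries times,
    # pausing `delay` seconds between failures; 0 = success, 1 = gave up.
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    for attempt in range(1, max_retries + 1):
        try:
            urllib.request.urlretrieve(url, dest_path)
            return 0
        except OSError as e:  # URLError is a subclass of OSError
            print(f"Attempt {attempt} failed: {e}")
            if attempt < max_retries:
                time.sleep(delay)
    return 1
```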
@@ -234,26 +234,132 @@ def download_file(week_name: str, filename: str, max_retries: int = 3) -> int:
         return 1
 
 
+def combine_excel(week_name: str) -> Dict[str, Any]:
+    """Combine all Excel files in week folder into combined.xlsx.
+
+    Args:
+        week_name: Week folder name (e.g., '20250217-20250223')
+
+    Returns:
+        Dict with success, total_rows, source_files, output_file
+    """
+    try:
+        from openpyxl import load_workbook, Workbook
+    except ImportError:
+        print("Error: openpyxl package not installed.", file=sys.stderr)
+        print("Please install with: uv add openpyxl", file=sys.stderr)
+        return {
+            "success": False,
+            "error": "openpyxl not installed",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    # Use current working directory
+    base_dir = os.getcwd()
+    week_dir = os.path.join(base_dir, ".download", "pubmed-daily", week_name)
+
+    if not os.path.exists(week_dir):
+        return {
+            "success": False,
+            "error": f"Directory not found: {week_dir}",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    xlsx_pattern = os.path.join(week_dir, "*.xlsx")
+    all_xlsx_files = glob.glob(xlsx_pattern)
+
+    source_files = [
+        os.path.basename(f) for f in all_xlsx_files if not f.endswith("combined.xlsx")
+    ]
+    source_files.sort()
+
+    if not source_files:
+        return {
+            "success": False,
+            "error": "No Excel files found to combine",
+            "total_rows": 0,
+            "source_files": [],
+            "output_file": None,
+        }
+
+    combined_wb = Workbook()
+    combined_ws = combined_wb.active
+    if combined_ws is None:
+        combined_ws = combined_wb.create_sheet("PubMed Articles")
+    else:
+        combined_ws.title = "PubMed Articles"
+
+    header_written = False
+    total_rows = 0
+    processed_files = []
+
+    for filename in source_files:
+        filepath = os.path.join(week_dir, filename)
+
+        try:
+            wb = load_workbook(filepath, read_only=True, data_only=True)
+            ws = wb.active
+            if ws is None:
+                print(f"Warning: {filename} has no active sheet, skipping")
+                wb.close()
+                continue
+
+            rows = list(ws.rows)
+            if not rows:
+                print(f"Warning: {filename} is empty, skipping")
+                wb.close()
+                continue
+
+            if not header_written:
+                headers = [cell.value for cell in rows[0]]
+                combined_ws.append(headers)
+                header_written = True
+                data_start = 1
+            else:
+                data_start = 1  # skip this file's header row (0 would duplicate it as data)
+
+            for row in rows[data_start:]:
+                row_values = [cell.value for cell in row]
+                if any(v is not None for v in row_values):
+                    combined_ws.append(row_values)
+                    total_rows += 1
+
+            processed_files.append(filename)
+            wb.close()
+            print(f"Processed: {filename}")
+
+        except Exception as e:
+            print(f"Warning: Error processing {filename}: {e}", file=sys.stderr)
+            continue
+
+    output_path = os.path.join(week_dir, "combined.xlsx")
+    combined_wb.save(output_path)
+
+    print(f"\nCombined {total_rows} rows from {len(processed_files)} files")
+    print(f"Output: {output_path}")
+
+    return {
+        "success": True,
+        "total_rows": total_rows,
+        "source_files": processed_files,
+        "output_file": "combined.xlsx",
+    }
+
+
 def main():
     """Main entry point for command-line usage."""
     parser = argparse.ArgumentParser(
         description="PubMed Weekly Daily Updates Downloader"
     )
-    parser.add_argument(
-        "--working-dir",
-        type=str,
-        help="Working directory for downloads (default: current directory)",
-    )
     parser.add_argument("command", type=str, help="Command to execute")
     parser.add_argument("args", nargs="*", help="Command arguments")
 
     parsed = parser.parse_args()
 
-    # Set global working directory if provided
-    global WORKING_DIR
-    if parsed.working_dir:
-        WORKING_DIR = parsed.working_dir
-
     command = parsed.command
     args = parsed.args
 
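The row-merging core of `combine_excel` can be distilled into a pure-Python helper, shown here with the hypothetical name `merge_tables` on plain lists of rows (no openpyxl needed): keep the header row of the first table, skip each subsequent table's header, and drop all-empty rows.

```python
def merge_tables(tables):
    # tables: list of row lists, each with a header row at index 0.
    combined, header_written = [], False
    for rows in tables:
        if not rows:
            continue  # empty table: nothing to merge
        if not header_written:
            combined.append(rows[0])  # keep the first header only
            header_written = True
        for row in rows[1:]:
            if any(v is not None for v in row):  # drop all-empty rows
                combined.append(row)
    return combined
```

For example, two sheets sharing the header `["pmid", "title"]` merge into a single table with one header row and all non-empty data rows.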
@@ -284,6 +390,18 @@ def main():
         filename = args[1]
         sys.exit(download_file(week_name, filename))
 
+    elif command == "combine_excel":
+        if len(args) < 1:
+            print("Usage: python pubmed_weekly.py combine_excel <week_name>")
+            sys.exit(1)
+
+        week_name = args[0]
+        result = combine_excel(week_name)
+        print(json.dumps(result, indent=2))
+
+        if not result.get("success"):
+            sys.exit(1)
+
     else:
         print(f"Unknown command: {command}")
         sys.exit(1)