@yeyuan98/opencode-bioresearcher-plugin 1.3.1 → 1.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +14 -0
- package/dist/index.js +4 -1
- package/dist/misc-tools/index.d.ts +3 -0
- package/dist/misc-tools/index.js +3 -0
- package/dist/misc-tools/json-extract.d.ts +13 -0
- package/dist/misc-tools/json-extract.js +394 -0
- package/dist/misc-tools/json-infer.d.ts +13 -0
- package/dist/misc-tools/json-infer.js +199 -0
- package/dist/misc-tools/json-tools.d.ts +33 -0
- package/dist/misc-tools/json-tools.js +187 -0
- package/dist/misc-tools/json-validate.d.ts +13 -0
- package/dist/misc-tools/json-validate.js +228 -0
- package/dist/skills/bioresearcher-core/README.md +210 -0
- package/dist/skills/bioresearcher-core/SKILL.md +128 -0
- package/dist/skills/bioresearcher-core/examples/contexts.json +29 -0
- package/dist/skills/bioresearcher-core/examples/data-exchange-example.md +303 -0
- package/dist/skills/bioresearcher-core/examples/template.md +49 -0
- package/dist/skills/bioresearcher-core/patterns/calculator.md +215 -0
- package/dist/skills/bioresearcher-core/patterns/data-exchange.md +406 -0
- package/dist/skills/bioresearcher-core/patterns/json-tools.md +263 -0
- package/dist/skills/bioresearcher-core/patterns/progress.md +127 -0
- package/dist/skills/bioresearcher-core/patterns/retry.md +110 -0
- package/dist/skills/bioresearcher-core/patterns/shell-commands.md +79 -0
- package/dist/skills/bioresearcher-core/patterns/subagent-waves.md +186 -0
- package/dist/skills/bioresearcher-core/patterns/table-tools.md +260 -0
- package/dist/skills/bioresearcher-core/patterns/user-confirmation.md +187 -0
- package/dist/skills/bioresearcher-core/python/template.md +273 -0
- package/dist/skills/bioresearcher-core/python/template.py +323 -0
- package/dist/skills/long-table-summary/SKILL.md +374 -0
- package/dist/skills/long-table-summary/__init__.py +3 -0
- package/dist/skills/long-table-summary/combine_outputs.py +345 -0
- package/dist/skills/long-table-summary/pyproject.toml +11 -0
- package/dist/skills/pubmed-weekly/SKILL.md +329 -329
- package/dist/skills/pubmed-weekly/pubmed_weekly.py +411 -411
- package/dist/skills/pubmed-weekly/pyproject.toml +8 -8
- package/package.json +7 -2
@@ -1,329 +1,329 @@
---
name: pubmed-weekly
description: Download PubMed daily update xml.gz files from the past week from the NCBI FTP server
allowedTools:
  - Bash
  - Read
  - Write
  - Question
  - parse_pubmed_articleSet
---

# PubMed Weekly Daily Updates Download

This skill downloads PubMed daily update xml.gz files from the past week (Monday-Sunday).

## Workflow Overview

1. **Python Environment Setup** (automatic): Checks for uv and installs it via the `python-setup-uv` skill if needed
2. **Date Calculation**: Calculates the past week's date range (Monday-Sunday)
3. **FTP Listing**: Fetches the available xml.gz files from the NCBI FTP server
4. **Filtering**: Filters the files to include only those from the past week
5. **Download**: Downloads the filtered files with retry logic (max 3 attempts per file)

## Prerequisites

- Internet connection
- Access to the NCBI FTP server
- uv package manager (automatically installed if not present)

## Integration with python-setup-uv

This skill integrates with the `python-setup-uv` skill to ensure the Python environment is properly configured.

### Prerequisite Check

Before starting the download process:

1. **Check if uv is installed:**
   ```bash
   if command -v uv >/dev/null 2>&1 || command -v uv.exe >/dev/null 2>&1; then
     echo "uv already installed"
   else
     echo "uv not found, setting up..."
   fi
   ```

2. **If uv is not installed:**
   - Load the `python-setup-uv` skill using the skill tool
   - Follow all steps EXACTLY as specified in the python-setup-uv skill
   - Wait for the uv installation to complete
   - Continue with this skill's Step 1 below

3. **After uv is installed:**
   - The bundled script `pubmed_weekly.py` will be executed using uv
   - Extract the full script path from the `<skill_files>` section of the skill tool output

## Steps

Follow these steps EXACTLY as described.

### Step 1: Calculate Week Date Range

First, determine the date range for the past week (Monday through Sunday).

Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section of the skill tool output.

**For Unix-like shells (Git Bash / macOS / Linux):**
```bash
uv run python <skill_path>/pubmed_weekly.py calculate_week
```

**For Windows cmd.exe:**
```bash
uv.exe run python <skill_path>\pubmed_weekly.py calculate_week
```

Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.

This outputs the week folder name in the format `YYYYMMDD-YYYYMMDD`.

**Expected output format:**
```
20250217-20250223
```
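The date arithmetic behind `calculate_week` can be sketched as follows. This is a minimal illustration of the documented behavior (past Monday-Sunday week, `YYYYMMDD-YYYYMMDD` folder name), not the bundled script itself:

```python
from datetime import date, timedelta

def calculate_week(today=None):
    """Return the past week's folder name as YYYYMMDD-YYYYMMDD (Monday-Sunday)."""
    today = today or date.today()
    # Monday of the current week, then step back one week to get the *past* week
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    last_sunday = last_monday + timedelta(days=6)
    return f"{last_monday:%Y%m%d}-{last_sunday:%Y%m%d}"

# Run on Wednesday 2025-02-26, this yields 20250217-20250223
print(calculate_week(date(2025, 2, 26)))
```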

### Step 2: Create Download Directory

The `download_file` command automatically creates the directory structure when needed. No manual directory creation is required.

### Step 3: Fetch FTP File List

Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and fetch the list of files from the NCBI FTP server.

**For Unix-like shells:**
```bash
uv run python <skill_path>/pubmed_weekly.py fetch_files
```

**For Windows cmd.exe:**
```bash
uv.exe run python <skill_path>\pubmed_weekly.py fetch_files
```

Replace `<skill_path>` with the full directory path extracted from `<skill_files>`.

This lists all daily update xml.gz files available on the FTP server.

**Expected output:**
```
pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
```
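Since the notes state that the script uses Python's built-in `urllib.request` for FTP, the listing step can be sketched roughly like this. The LIST-output parsing (last whitespace-separated field is the filename) is an assumption about typical FTP servers, not a detail taken from the bundled script:

```python
from urllib.request import urlopen

FTP_DIR = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/"

def parse_listing(listing: str) -> list[str]:
    """Extract .xml.gz filenames from an FTP LIST-style directory listing."""
    names = [line.split()[-1] for line in listing.splitlines() if line.strip()]
    return [n for n in names if n.endswith(".xml.gz")]

def fetch_files() -> list[str]:
    """Fetch and parse the FTP directory listing (requires network access)."""
    with urlopen(FTP_DIR) as resp:
        return parse_listing(resp.read().decode("utf-8", errors="replace"))
```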

### Step 4: Filter Files for Past Week

Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section and filter the file list for the past week's daily updates.

**For Unix-like shells:**
```bash
uv run python <skill_path>/pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
```

**For Windows cmd.exe:**
```bash
uv.exe run python <skill_path>\pubmed_weekly.py filter_files "<WEEK>" "<FILE_LIST>"
```

Where:
- `<skill_path>` is the full directory path extracted from `<skill_files>`
- `<WEEK>` is the week folder name (e.g., `20250217-20250223`)
- `<FILE_LIST>` is the output from Step 3 (space-separated filenames; use quotes)

This returns a space-separated list of xml.gz files from the past week.

**Expected output:**
```
pubmed24n1234.xml.gz pubmed24n1235.xml.gz pubmed24n1236.xml.gz
```

### Step 5: Download Files with Retry

Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section. For each file in the filtered list, download it to the target directory with retry logic.

**For Unix-like shells:**
```bash
for file in <FILE_LIST>; do
  uv run python <skill_path>/pubmed_weekly.py download_file <WEEK> $file
done
```

**For Windows cmd.exe:**
```bash
for %f in (<FILE_LIST>) do uv.exe run python <skill_path>\pubmed_weekly.py download_file <WEEK> %f
```

Where:
- `<skill_path>` is the full directory path extracted from `<skill_files>`
- `<FILE_LIST>` is the space-separated list from Step 4
- `<WEEK>` is the week folder name

**Download behavior:**
- Downloads one file at a time
- Retries up to 3 times if a download fails
- Waits 2 seconds between retry attempts
- After 3 failed attempts, asks the user whether to abort

**If a download fails after 3 retries:**
Use the question tool to ask:
- "Abort remaining downloads?" (options: "Yes" / "No")

If the user selects "Yes", stop the process and report a summary.
If the user selects "No", skip the failed file and continue with the next one.

### Step 6: Verify Downloads

After all downloads complete (or are aborted), verify the downloaded files:

```bash
ls -lh .download/pubmed-daily/<WEEK>/
```

Count the number of files downloaded and report the summary to the user.

### Step 7: Parse XML Files to Individual Excel Sheets

For each downloaded `.xml.gz` file in `.download/pubmed-daily/<WEEK>/`, use the `parse_pubmed_articleSet` tool to convert it to an Excel file.

**Tool invocation pattern:**

```
parse_pubmed_articleSet
  filePath="<working_dir>/.download/pubmed-daily/<WEEK>/<filename>.xml.gz"
  outputMode="excel"
  outputFileName="<filename>.xlsx"
  outputDir="<working_dir>/.download/pubmed-daily/<WEEK>"
```

**Example:**
For file `pubmed24n1234.xml.gz` in week `20250217-20250223`:
- Input: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xml.gz`
- Output: `.download/pubmed-daily/20250217-20250223/pubmed24n1234.xlsx`

**Process:**
1. List all `.xml.gz` files in the week directory
2. For each file, call `parse_pubmed_articleSet` with `outputMode="excel"`
3. The output Excel file is saved in the same directory as the input
4. Report parsing statistics (articles processed, any errors)

### Step 8: Combine Individual Excel Files

After all individual Excel files are created, combine them into a single `combined.xlsx` file using the Python script.

Extract the full path to `pubmed_weekly.py` from the `<skill_files>` section.

**For Unix-like shells (Git Bash / macOS / Linux):**
```bash
uv run python <skill_path>/pubmed_weekly.py combine_excel "<WEEK>"
```

**For Windows cmd.exe:**
```bash
uv.exe run python <skill_path>\pubmed_weekly.py combine_excel "<WEEK>"
```

Where:
- `<skill_path>` is the full directory path extracted from `<skill_files>`
- `<WEEK>` is the actual week folder name (e.g., `20250217-20250223`)

**Expected behavior:**
- Finds all `.xlsx` files in the week directory (excluding `combined.xlsx`)
- Reads each file and combines all rows
- Writes `combined.xlsx` with all articles from all files
- Returns a summary: total rows, source files processed

**Output location:**
`.download/pubmed-daily/<WEEK>/combined.xlsx`
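A rough sketch of the combining step, using `openpyxl` as the notes indicate. Skipping the repeated header row in every file after the first is an assumption for illustration; the bundled script's exact header handling is not specified here:

```python
from pathlib import Path
from openpyxl import Workbook, load_workbook

def combine_excel(week_name: str) -> dict:
    """Merge the per-file .xlsx sheets in the week directory into combined.xlsx."""
    week_dir = Path.cwd() / ".download" / "pubmed-daily" / week_name
    sources = sorted(p for p in week_dir.glob("*.xlsx") if p.name != "combined.xlsx")
    out = Workbook()
    sheet = out.active
    total_rows = 0
    for i, path in enumerate(sources):
        rows = load_workbook(path, read_only=True).active.iter_rows(values_only=True)
        if i > 0:
            next(rows, None)  # assumption: drop repeated header rows after the first file
        for row in rows:
            sheet.append(row)
            total_rows += 1
    output_file = week_dir / "combined.xlsx"
    out.save(output_file)
    # Summary keys mirror the documented JSON: success, total_rows, source_files, output_file
    return {"success": True, "total_rows": total_rows,
            "source_files": len(sources), "output_file": str(output_file)}
```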

## Python Script Details

The skill includes a bundled Python script, `pubmed_weekly.py`, with the following functions:

### 1. `calculate_week()` - Calculate week date range

Returns the week folder name in the format `YYYYMMDD-YYYYMMDD` for the past week (Monday-Sunday).

### 2. `fetch_files()` - Fetch FTP file list

Returns the list of all xml.gz filenames from the NCBI FTP server.

### 3. `filter_files(week_name, file_list)` - Filter files for the week

Parameters:
- `week_name`: Week folder name (e.g., `20250217-20250223`)
- `file_list`: List of filenames from the FTP server

Returns: Space-separated list of xml.gz files that fall within the date range.

### 4. `download_file(week_name, filename)` - Download single file with retry

Parameters:
- `week_name`: Week folder name
- `filename`: xml.gz filename to download

Behavior:
- Downloads from `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/<filename>`
- Saves to `.download/pubmed-daily/<week_name>/<filename>` in the current working directory
- Creates the directory structure if needed
- Retries up to 3 times on failure
- Returns exit code 0 on success, 1 on failure (after all retries)

### 5. `combine_excel(week_name)` - Combine Excel files into combined.xlsx

Parameters:
- `week_name`: Week folder name (e.g., `20250217-20250223`)

Behavior:
- Searches for all `.xlsx` files in `.download/pubmed-daily/<week_name>/` in the current working directory
- Excludes `combined.xlsx` from the list
- Reads each Excel file and combines all rows
- Creates `combined.xlsx` with all articles merged
- Returns JSON with: success, total_rows, source_files, output_file

## Output Summary

After completion, provide the user with:

1. Week date range processed
2. Number of files found for the week
3. Number of files successfully downloaded
4. Number of files that failed to download (if any)
5. Download location: `.download/pubmed-daily/<WEEK>/`
6. Number of XML files parsed to Excel (Step 7)
7. Total articles in combined.xlsx (Step 8)
8. Combined file location: `.download/pubmed-daily/<WEEK>/combined.xlsx`

## Notes

- This skill automatically checks for and installs uv using the `python-setup-uv` skill if it is not present
- The Python script is bundled with this skill at `pubmed_weekly.py`
- All Python commands use the full script path extracted from the `<skill_files>` section
- The script uses `os.getcwd()` to determine the working directory, which is naturally the opencode working directory
- All output files (downloads, Excel files) are created in the opencode working directory
- The FTP server path is `ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/`
- Only `.xml.gz` files are downloaded
- Downloads are sequential (one file at a time)
- Retry logic includes 2-second delays between attempts
- The user can choose to abort on persistent failures
- The script uses Python's built-in `urllib.request` for FTP operations
- The `combine_excel` command requires the `openpyxl` package (auto-installed via uv)
- The skill directory path is extracted from the `<skill_files>` section to locate the script
- Windows with Git Bash: follow the Unix-like shell instructions
- Windows cmd.exe: use the `uv.exe run python` syntax
- Step 7 uses the `parse_pubmed_articleSet` tool for XML-to-Excel conversion
- Step 8 combines all individual Excel files into a single `combined.xlsx`