@yeyuan98/opencode-bioresearcher-plugin 1.5.1 → 1.5.2-alpha.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/dist/agents/bioresearcher/prompt.d.ts +1 -1
  2. package/dist/agents/bioresearcher/prompt.js +235 -27
  3. package/dist/agents/bioresearcherDR/prompt.d.ts +1 -1
  4. package/dist/agents/bioresearcherDR/prompt.js +8 -8
  5. package/dist/agents/bioresearcherDR_worker/prompt.d.ts +3 -2
  6. package/dist/agents/bioresearcherDR_worker/prompt.js +37 -12
  7. package/dist/shared/tool-restrictions.d.ts +2 -2
  8. package/dist/shared/tool-restrictions.js +4 -3
  9. package/dist/skills/bioresearcher-core/SKILL.md +58 -1
  10. package/dist/skills/bioresearcher-core/patterns/bioresearcher/analysis-methods.md +551 -0
  11. package/dist/skills/bioresearcher-core/patterns/bioresearcher/best-practices.md +647 -0
  12. package/dist/skills/bioresearcher-core/patterns/bioresearcher/python-standards.md +944 -0
  13. package/dist/skills/bioresearcher-core/patterns/bioresearcher/report-template.md +613 -0
  14. package/dist/skills/bioresearcher-core/patterns/bioresearcher/tool-selection.md +481 -0
  15. package/dist/skills/bioresearcher-core/patterns/citations.md +234 -0
  16. package/dist/skills/bioresearcher-core/patterns/rate-limiting.md +167 -0
  17. package/dist/skills/bioresearcher-tests/README.md +90 -90
  18. package/dist/skills/bioresearcher-tests/SKILL.md +255 -255
  19. package/dist/skills/bioresearcher-tests/pyproject.toml +6 -6
  20. package/dist/skills/bioresearcher-tests/test_cases/json_tests.md +137 -137
  21. package/dist/skills/bioresearcher-tests/test_cases/misc_tests.md +141 -141
  22. package/dist/skills/bioresearcher-tests/test_cases/parser_tests.md +80 -80
  23. package/dist/skills/bioresearcher-tests/test_cases/skill_tests.md +59 -59
  24. package/dist/skills/bioresearcher-tests/test_cases/table_tests.md +194 -194
  25. package/dist/skills/bioresearcher-tests/test_runner.py +607 -607
  26. package/dist/skills/long-table-summary/SKILL.md +224 -224
  27. package/dist/tools/sandbox/bash-parser.d.ts +17 -0
  28. package/dist/tools/sandbox/bash-parser.js +166 -0
  29. package/dist/tools/sandbox/escape-scenarios.test.d.ts +7 -0
  30. package/dist/tools/sandbox/escape-scenarios.test.js +182 -0
  31. package/dist/tools/sandbox/expander.d.ts +30 -0
  32. package/dist/tools/sandbox/expander.js +57 -0
  33. package/dist/tools/sandbox/final-verification.test.d.ts +6 -0
  34. package/dist/tools/sandbox/final-verification.test.js +70 -0
  35. package/dist/tools/sandbox/hooks.d.ts +25 -0
  36. package/dist/tools/sandbox/hooks.js +217 -0
  37. package/dist/tools/sandbox/index.d.ts +19 -0
  38. package/dist/tools/sandbox/index.js +24 -0
  39. package/dist/tools/sandbox/manager.d.ts +60 -0
  40. package/dist/tools/sandbox/manager.js +113 -0
  41. package/dist/tools/sandbox/sandbox.integration.test.d.ts +7 -0
  42. package/dist/tools/sandbox/sandbox.integration.test.js +106 -0
  43. package/dist/tools/sandbox/sandbox.test.d.ts +6 -0
  44. package/dist/tools/sandbox/sandbox.test.js +160 -0
  45. package/dist/tools/sandbox/tool.d.ts +66 -0
  46. package/dist/tools/sandbox/tool.js +163 -0
  47. package/dist/tools/sandbox/types.d.ts +38 -0
  48. package/dist/tools/sandbox/types.js +6 -0
  49. package/dist/tools/sandbox/validator.d.ts +33 -0
  50. package/dist/tools/sandbox/validator.js +150 -0
  51. package/dist/tools/skill/registry.js +0 -1
  52. package/dist/tools/table/utils.js +4 -4
  53. package/package.json +1 -1
@@ -0,0 +1,551 @@
# Data Analysis Method Selection

Decision criteria for choosing between table tools, the long-table-summary skill, and custom Python scripts.

## Overview

This pattern guides selection of the optimal analysis approach based on:
- Data size (row count)
- Operation complexity
- Parallelization needs
- Output format requirements
- Reusability considerations

---

## Decision Matrix

| Criteria | Table Tools | long-table-summary Skill | Custom Python |
|----------|-------------|-------------------------|---------------|
| **Data Size** | < 30 rows | 30-1000 rows, batch processing | > 1000 rows OR complex logic |
| **Operation Complexity** | Simple (filter, group, summarize) | Structured summarization with schema | Complex transformations, ML |
| **Parallelization** | No | Yes (waves of 3 subagents) | Optional |
| **Output Format** | Table/Excel | Structured JSON → Excel | Any format |
| **Reusability** | One-time | Reusable template | Reusable functions |
| **Setup Time** | Immediate | 5-10 minutes | 10-30 minutes |
| **Context Usage** | Low | Medium (subagent waves) | Low-Medium |

---

## Decision Flowchart

```
START: Data analysis needed

├─ Can operation be done with table tools?
│   │
│   ├─ YES → Use table tools (fastest, simplest)
│   │        → See Section 1: Table Tools Approach
│   │
│   └─ NO → Continue

├─ Is data in table format (Excel/CSV)?
│   │
│   ├─ YES AND rows >= 30 AND need structured summarization
│   │   → Load skill 'long-table-summary'
│   │   → Follow 16-step workflow
│   │   → See Section 2: long-table-summary Approach
│   │
│   └─ NO OR need custom logic
│       → Write custom Python script
│       → See Section 3: Custom Python Approach
```

---

## Section 1: Table Tools Approach

### When to Use
- Data size: < 30 rows
- Operations: Filter, group, summarize, pivot
- No complex transformations needed
- Quick analysis required

### Available Operations

#### Filtering
```typescript
// Filter by single condition
tableFilterRows(
  file_path,
  column="Status",
  operator="=",
  value="Active",
  max_results=100
)

// Supported operators: =, !=, >, <, >=, <=, contains
```

#### Grouping & Aggregation
```typescript
// Group by one column, aggregate another
tableGroupBy(
  file_path,
  group_column="Phase",
  agg_column="Patient_Count",
  agg_type="sum"  // sum, count, avg, min, max
)
```

#### Statistical Summary
```typescript
// Get statistics for numeric columns
tableSummarize(
  file_path,
  columns=["Age", "Dose_mg", "Response_Rate"]
)
// Returns: sum, avg, min, max, std_dev
```

#### Pivot Tables
```typescript
// Create cross-tabulation
tablePivotSummary(
  file_path,
  row_field="Phase",
  col_field="Status",
  value_field="Trial_Count",
  agg="count"
)
```

#### Search
```typescript
// Search across all cells
tableSearch(
  file_path,
  search_term="BRAF",
  max_results=50
)
```

### Example Workflow

```markdown
User: "Summarize the trial data by phase and count recruiting trials"

Agent actions:
1. tableGetSheetPreview(file_path) → Understand structure
2. tableFilterRows(
     file_path,
     column="Status",
     operator="=",
     value="Recruiting"
   )
3. tableGroupBy(
     file_path,
     group_column="Phase",
     agg_column="NCT_ID",
     agg_type="count"
   )
4. Present results with citations
```
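For intuition, the same filter-then-count workflow can be sketched in pandas. This is an illustrative stand-in, not the table tools' implementation; the toy data and column names simply mirror the example above:

```python
import pandas as pd

# Toy table standing in for the Excel file in the workflow above
df = pd.DataFrame({
    "NCT_ID": ["NCT001", "NCT002", "NCT003", "NCT004"],
    "Phase": ["Phase 1", "Phase 2", "Phase 2", "Phase 3"],
    "Status": ["Recruiting", "Recruiting", "Completed", "Recruiting"],
})

# Step 2 equivalent: filter rows where Status = "Recruiting"
recruiting = df[df["Status"] == "Recruiting"]

# Step 3 equivalent: group by Phase and count NCT_IDs
counts = recruiting.groupby("Phase")["NCT_ID"].count()
print(counts.to_dict())  # {'Phase 1': 1, 'Phase 2': 1, 'Phase 3': 1}
```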
---

## Section 2: long-table-summary Approach

### When to Use
- Data size: 30-1000+ rows
- Need structured summarization with a specific output schema
- Batch processing with parallel subagents
- Complex field extraction/classification

### Skill Loading

```markdown
skill long-table-summary
```

### 16-Step Workflow Overview

1. Interview user for table location
2. Confirm table existence and list sheets
3. Interview for summarization instructions (JSON format)
4. Refine instructions iteratively
5. Generate output JSON schema
6. Autogenerate topic name
7. Ask user for batch size (default: 30 rows)
8. Calculate batch ranges
9. Create subagent prompt template
10. Create directory structure
11. Generate prompts using Python script
12. Launch subagents in waves of 3
13. Monitor progress (report every 3 completions)
14. Check for missing outputs and retry
15. Combine outputs using Python script
16. Generate final report
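Steps 7-8 and 12 above reduce to simple arithmetic. A minimal sketch (the `batch_ranges` helper is hypothetical, not part of the skill):

```python
import math

def batch_ranges(total_rows: int, batch_size: int = 30) -> list:
    """Step 8: 1-indexed inclusive (start, end) row ranges per batch."""
    return [(start, min(start + batch_size - 1, total_rows))
            for start in range(1, total_rows + 1, batch_size)]

batches = batch_ranges(100, 30)
print(batches)  # [(1, 30), (31, 60), (61, 90), (91, 100)]

# Step 12: subagents launch in waves of 3, so 4 batches need 2 waves
waves = math.ceil(len(batches) / 3)
print(waves)  # 2
```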
### Example Use Cases

#### Use Case 1: Species Classification
```json
{
  "species": "Species classification: Tier1 (human, monkey), Tier2 (other animals), or NA",
  "topic": "Research topic: Oncology, Immunology, General Biology, or Others"
}
```

#### Use Case 2: Gene Mutation Analysis
```json
{
  "gene_mutation": "Gene mutation pattern, e.g., V600E, R173, Wild Type",
  "clinical_significance": "Clinical relevance: High, Medium, Low, or Unknown",
  "therapeutic_target": "Is this a drug target? Yes, No, or Unknown"
}
```

#### Use Case 3: Trial Status Extraction
```json
{
  "trial_phase": "Extract trial phase: Phase 1, Phase 2, Phase 3, or Phase 4",
  "recruitment_status": "Recruitment status: Recruiting, Completed, Terminated",
  "primary_outcome": "Primary outcome measure description"
}
```
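A batch's JSON output can be spot-checked against the agreed schema (useful at step 14 when hunting for incomplete outputs). This is an illustrative helper, not part of the skill, and it assumes each batch output is a JSON array of records keyed like Use Case 3:

```python
import json

# Required keys taken from the Use Case 3 schema above
REQUIRED_FIELDS = {"trial_phase", "recruitment_status", "primary_outcome"}

def check_batch_output(raw: str) -> list:
    """Return indices of records missing any required field."""
    return [i for i, row in enumerate(json.loads(raw))
            if not REQUIRED_FIELDS <= set(row)]

sample = json.dumps([
    {"trial_phase": "Phase 2", "recruitment_status": "Recruiting",
     "primary_outcome": "Overall survival"},
    {"trial_phase": "Phase 3", "recruitment_status": "Completed"},  # incomplete
])
print(check_batch_output(sample))  # [1]
```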
### Performance Considerations

```
Batch Size Recommendations:
- Simple classification (2-3 fields): 50 rows per batch
- Moderate complexity (4-6 fields): 30 rows per batch
- Complex analysis (7+ fields): 20 rows per batch

Parallel Processing:
- Waves of 3 subagents
- Each wave processes 3 batches simultaneously
- Progress reported every 3 completions

Expected Time:
- Setup: 5-10 minutes
- Processing: ~1 minute per batch
- Total for 100 rows, 30/batch: ~15 minutes (5-10 min setup + 4 batches in 2 waves)
```

### When NOT to Use

```
❌ DON'T use long-table-summary when:
- Less than 30 rows (use table tools instead)
- Simple filtering/grouping only (use table tools)
- Need real-time results (skill takes minutes)
- Unstructured output acceptable (use Python)
- No clear JSON schema possible (use Python)
```

---

## Section 3: Custom Python Approach

### When to Use
- Data size: > 1000 rows
- Complex transformations (ML, statistical models)
- Custom output formats
- Reusable analysis pipeline needed
- Operations beyond table tool capabilities

### Decision Sub-Flowchart

```
PYTHON SCRIPT DECISION:

├─ Is this a one-time analysis?
│   ├─ YES → Write single script in .scripts/py/
│   │        → Include full documentation
│   │        → Follow DRY principle
│   │
│   └─ NO → Create reusable module
│           → Separate utilities into _utils.py
│           → Document thoroughly

├─ Can existing skills/patterns help?
│   ├─ YES → Load skill 'bioresearcher-core'
│   │        → Read relevant patterns:
│   │          - retry.md (for network operations)
│   │          - progress.md (for batch processing)
│   │          - json-tools.md (for JSON validation)
│   │          - table-tools.md (for combining outputs)
│   │
│   └─ NO → Implement from scratch
│           → Follow python-standards.md pattern
│           → Include comprehensive docstrings
```

### Python Script Categories

#### Category 1: Data Transformation
```python
# When to use: Complex data reshaping, format conversion
# Examples:
# - Pivot/unpivot operations
# - Multi-file joins
# - Data cleaning and normalization
```

#### Category 2: Statistical Analysis
```python
# When to use: Beyond basic sum/avg/min/max
# Examples:
# - Statistical tests (t-test, chi-square)
# - Regression analysis
# - Survival analysis
```

#### Category 3: Machine Learning
```python
# When to use: Predictive modeling, classification
# Examples:
# - Clustering
# - Classification
# - Feature extraction
```

#### Category 4: Custom Aggregation
```python
# When to use: Complex business logic for aggregation
# Examples:
# - Weighted averages
# - Custom scoring algorithms
# - Multi-step calculations
```
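For Category 4, a weighted average is the simplest concrete case. A minimal pandas sketch (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical trial table: response rates to be weighted by enrollment
df = pd.DataFrame({
    "phase": ["Phase 2", "Phase 2", "Phase 3"],
    "response_rate": [40.0, 60.0, 55.0],
    "enrolled": [100, 300, 200],
})

# Weight each trial's response rate by its enrollment, then combine per phase
g = df.assign(weighted=df["response_rate"] * df["enrolled"]).groupby("phase")
result = g["weighted"].sum() / g["enrolled"].sum()
print(result.to_dict())  # {'Phase 2': 55.0, 'Phase 3': 55.0}
```

A plain `mean` would give 50.0 for Phase 2; weighting by enrollment pulls it toward the larger trial.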
### Example: Statistical Analysis Script

```python
#!/usr/bin/env python3
"""Clinical trial outcome analysis with statistical testing.

This module provides functionality for:
- Comparing response rates across trial phases
- Statistical significance testing (chi-square)
- Generating publication-ready summary tables

Usage:
    uv run python trial_analysis.py analyze --input trials.xlsx --output results.json

Dependencies:
    - pandas >= 1.5.0
    - scipy >= 1.9.0
    - openpyxl >= 3.0.0

Author: BioResearcher AI Agent
Date: 2024-01-15
"""

import pandas as pd
from scipy import stats
from typing import Dict, Any


def load_trial_data(file_path: str) -> pd.DataFrame:
    """Load clinical trial data from an Excel file.

    Args:
        file_path: Path to Excel file containing trial data

    Returns:
        DataFrame with trial records

    Raises:
        FileNotFoundError: If file_path does not exist
        ValueError: If required columns are missing
    """
    df = pd.read_excel(file_path)

    required_columns = ["nct_id", "phase", "status", "response_rate"]
    missing = [col for col in required_columns if col not in df.columns]

    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    return df


def compare_response_rates(
    df: pd.DataFrame,
    group_column: str = "phase"
) -> Dict[str, Any]:
    """Compare response rates across groups with statistical testing.

    Performs a chi-square test of independence to determine whether
    response rates differ significantly across trial phases.

    Args:
        df: DataFrame containing trial data
        group_column: Column to group by (default: "phase")

    Returns:
        Dictionary containing:
        - "group_stats": Statistics per group
        - "chi_square": Chi-square test results
        - "significant": Boolean indicating significance

    Example:
        >>> df = load_trial_data("trials.xlsx")
        >>> results = compare_response_rates(df, group_column="phase")
        >>> print(results["significant"])
        True
    """
    # Group statistics
    group_stats = df.groupby(group_column)["response_rate"].agg(
        ["mean", "std", "count"]
    ).to_dict()

    # Chi-square test on responders vs. non-responders (50% threshold)
    contingency = pd.crosstab(df[group_column], df["response_rate"] > 50)
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency)

    # Cast NumPy scalars to built-ins so the result is JSON-serializable
    return {
        "group_stats": group_stats,
        "chi_square": {
            "statistic": float(chi2),
            "p_value": float(p_value),
            "degrees_of_freedom": int(dof)
        },
        "significant": bool(p_value < 0.05)
    }


if __name__ == "__main__":
    import argparse
    import json

    parser = argparse.ArgumentParser(description="Clinical trial analysis")
    parser.add_argument("command", choices=["analyze"])
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", required=True, help="Output file path")

    args = parser.parse_args()

    df = load_trial_data(args.input)
    results = compare_response_rates(df)

    # default=float handles any remaining NumPy values in group_stats
    with open(args.output, "w") as f:
        json.dump(results, f, indent=2, default=float)
```
### Python Best Practices

```
1. File Organization
   .scripts/py/
   ├── [topic]_analysis.py   # Main analysis script
   ├── [topic]_utils.py      # Reusable utilities
   └── requirements.txt      # Dependencies (if needed)

2. Documentation Requirements
   - Module-level docstring (purpose, usage, dependencies)
   - Function docstrings (Args, Returns, Raises, Examples)
   - Inline comments for complex logic
   - Type hints for all function signatures

3. DRY Principle
   - Single responsibility per function
   - No code duplication
   - Extract repeated logic to utility functions
   - Use configuration over hardcoding

4. Error Handling
   - Validate inputs early
   - Use try/except for external operations
   - Provide informative error messages
   - Log errors for debugging

5. Testing
   - Include example usage in docstrings
   - Test with sample data
   - Validate outputs
```

---

## Hybrid Approaches

### Approach 1: Table Tools → Python
```
Use table tools for initial filtering
→ Export subset
→ Use Python for complex analysis
```

### Approach 2: Python → Table Tools
```
Use Python for data cleaning
→ Export to Excel
→ Use table tools for interactive exploration
```

### Approach 3: long-table-summary → Python
```
Use skill for batch summarization
→ Combine outputs with Python
→ Further analysis with Python
```
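Approach 1 in practice: assuming the table tools already exported a filtered subset, the Python side only ever touches the small frame. A hedged sketch, with the exported file inlined as a DataFrame so it is self-contained:

```python
import pandas as pd

# Stand-in for the subset the table tools exported (e.g., Recruiting trials only)
subset = pd.DataFrame({
    "phase": ["Phase 1", "Phase 2", "Phase 2"],
    "response_rate": [20.0, 45.0, 55.0],
})

# The complex-analysis step then runs on the subset, not the full table
summary = subset.groupby("phase")["response_rate"].agg(["mean", "count"])
print(summary.loc["Phase 2", "mean"])  # 50.0
```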
---

## Performance Benchmarks

### Table Tools
- **Setup Time:** Immediate
- **Execution Time:** < 1 second for < 1000 rows
- **Context Usage:** Minimal
- **Best For:** Quick lookups, simple aggregations

### long-table-summary Skill
- **Setup Time:** 5-10 minutes
- **Execution Time:** ~1 minute per batch (30 rows)
- **Context Usage:** Medium (subagent overhead)
- **Best For:** Structured summarization, classification

### Custom Python
- **Setup Time:** 10-30 minutes
- **Execution Time:** Variable (depends on complexity)
- **Context Usage:** Low (script runs independently)
- **Best For:** Complex analysis, reusable pipelines

---

## Common Mistakes to Avoid

### Mistake 1: Using Python for Simple Operations
```
❌ BAD:  Write a Python script for simple filtering
✅ GOOD: Use tableFilterRows() for filtering
```

### Mistake 2: Using long-table-summary for Small Tables
```
❌ BAD:  Load the skill for a 15-row table
✅ GOOD: Use table tools directly for < 30 rows
```

### Mistake 3: Not Using Skills When Appropriate
```
❌ BAD:  Write custom Python for batch summarization
✅ GOOD: Load the long-table-summary skill for structured summarization
```

### Mistake 4: Ignoring Upfront Filtering
```
❌ BAD:  Load an entire 10,000-row table, then process
✅ GOOD: Apply filters first, then process the subset
```

---

## Decision Checklist

Before choosing an approach, ask:

- [ ] How many rows are in the dataset?
- [ ] What operations are needed? (filter, group, summarize, transform)
- [ ] Is the output format specified? (JSON schema vs. flexible)
- [ ] Is parallel processing needed?
- [ ] Will this analysis be reused?
- [ ] What is the time constraint?
- [ ] Are there existing skills that can help?

**Based on the answers, select an approach using the decision matrix above.**