@yeyuan98/opencode-bioresearcher-plugin 1.5.1 → 1.5.2-alpha.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/agents/bioresearcher/prompt.d.ts +1 -1
- package/dist/agents/bioresearcher/prompt.js +235 -27
- package/dist/agents/bioresearcherDR/prompt.d.ts +1 -1
- package/dist/agents/bioresearcherDR/prompt.js +8 -8
- package/dist/agents/bioresearcherDR_worker/prompt.d.ts +3 -2
- package/dist/agents/bioresearcherDR_worker/prompt.js +37 -12
- package/dist/shared/tool-restrictions.d.ts +2 -2
- package/dist/shared/tool-restrictions.js +4 -3
- package/dist/skills/bioresearcher-core/SKILL.md +58 -1
- package/dist/skills/bioresearcher-core/patterns/bioresearcher/analysis-methods.md +551 -0
- package/dist/skills/bioresearcher-core/patterns/bioresearcher/best-practices.md +647 -0
- package/dist/skills/bioresearcher-core/patterns/bioresearcher/python-standards.md +944 -0
- package/dist/skills/bioresearcher-core/patterns/bioresearcher/report-template.md +613 -0
- package/dist/skills/bioresearcher-core/patterns/bioresearcher/tool-selection.md +481 -0
- package/dist/skills/bioresearcher-core/patterns/citations.md +234 -0
- package/dist/skills/bioresearcher-core/patterns/rate-limiting.md +167 -0
- package/dist/skills/bioresearcher-tests/README.md +90 -90
- package/dist/skills/bioresearcher-tests/SKILL.md +255 -255
- package/dist/skills/bioresearcher-tests/pyproject.toml +6 -6
- package/dist/skills/bioresearcher-tests/test_cases/json_tests.md +137 -137
- package/dist/skills/bioresearcher-tests/test_cases/misc_tests.md +141 -141
- package/dist/skills/bioresearcher-tests/test_cases/parser_tests.md +80 -80
- package/dist/skills/bioresearcher-tests/test_cases/skill_tests.md +59 -59
- package/dist/skills/bioresearcher-tests/test_cases/table_tests.md +194 -194
- package/dist/skills/bioresearcher-tests/test_runner.py +607 -607
- package/dist/skills/long-table-summary/SKILL.md +224 -224
- package/dist/tools/sandbox/bash-parser.d.ts +17 -0
- package/dist/tools/sandbox/bash-parser.js +166 -0
- package/dist/tools/sandbox/escape-scenarios.test.d.ts +7 -0
- package/dist/tools/sandbox/escape-scenarios.test.js +182 -0
- package/dist/tools/sandbox/expander.d.ts +30 -0
- package/dist/tools/sandbox/expander.js +57 -0
- package/dist/tools/sandbox/final-verification.test.d.ts +6 -0
- package/dist/tools/sandbox/final-verification.test.js +70 -0
- package/dist/tools/sandbox/hooks.d.ts +25 -0
- package/dist/tools/sandbox/hooks.js +217 -0
- package/dist/tools/sandbox/index.d.ts +19 -0
- package/dist/tools/sandbox/index.js +24 -0
- package/dist/tools/sandbox/manager.d.ts +60 -0
- package/dist/tools/sandbox/manager.js +113 -0
- package/dist/tools/sandbox/sandbox.integration.test.d.ts +7 -0
- package/dist/tools/sandbox/sandbox.integration.test.js +106 -0
- package/dist/tools/sandbox/sandbox.test.d.ts +6 -0
- package/dist/tools/sandbox/sandbox.test.js +160 -0
- package/dist/tools/sandbox/tool.d.ts +66 -0
- package/dist/tools/sandbox/tool.js +163 -0
- package/dist/tools/sandbox/types.d.ts +38 -0
- package/dist/tools/sandbox/types.js +6 -0
- package/dist/tools/sandbox/validator.d.ts +33 -0
- package/dist/tools/sandbox/validator.js +150 -0
- package/dist/tools/skill/registry.js +0 -1
- package/dist/tools/table/utils.js +4 -4
- package/package.json +1 -1
@@ -0,0 +1,551 @@
# Data Analysis Method Selection

Decision criteria for choosing between table tools, the long-table-summary skill, and custom Python scripts.

## Overview

This pattern guides selection of the optimal analysis approach based on:
- Data size (row count)
- Operation complexity
- Parallelization needs
- Output format requirements
- Reusability considerations

---

## Decision Matrix

| Criteria | Table Tools | long-table-summary Skill | Custom Python |
|----------|-------------|--------------------------|---------------|
| **Data Size** | < 30 rows | 30-1000 rows, batch processing | > 1000 rows OR complex logic |
| **Operation Complexity** | Simple (filter, group, summarize) | Structured summarization with schema | Complex transformations, ML |
| **Parallelization** | No | Yes (waves of 3 subagents) | Optional |
| **Output Format** | Table/Excel | Structured JSON → Excel | Any format |
| **Reusability** | One-time | Reusable template | Reusable functions |
| **Setup Time** | Immediate | 5-10 minutes | 10-30 minutes |
| **Context Usage** | Low | Medium (subagent waves) | Low-Medium |

---

## Decision Flowchart

```
START: Data analysis needed
│
├─ Can operation be done with table tools?
│   │
│   ├─ YES → Use table tools (fastest, simplest)
│   │        → See Section 1: Table Tools Approach
│   │
│   └─ NO → Continue
│
├─ Is data in table format (Excel/CSV)?
│   │
│   ├─ YES AND rows >= 30 AND need structured summarization
│   │        → Load skill 'long-table-summary'
│   │        → Follow 16-step workflow
│   │        → See Section 2: long-table-summary Approach
│   │
│   └─ NO OR need custom logic
│            → Write custom Python script
│            → See Section 3: Custom Python Approach
```

---

## Section 1: Table Tools Approach

### When to Use
- Data size: < 30 rows
- Operations: Filter, group, summarize, pivot
- No complex transformations needed
- Quick analysis required

### Available Operations

#### Filtering
```typescript
// Filter by single condition
tableFilterRows(
  file_path,
  column="Status",
  operator="=",
  value="Active",
  max_results=100
)

// Supported operators: =, !=, >, <, >=, <=, contains
```

#### Grouping & Aggregation
```typescript
// Group by one column, aggregate another
tableGroupBy(
  file_path,
  group_column="Phase",
  agg_column="Patient_Count",
  agg_type="sum"  // sum, count, avg, min, max
)
```

#### Statistical Summary
```typescript
// Get statistics for numeric columns
tableSummarize(
  file_path,
  columns=["Age", "Dose_mg", "Response_Rate"]
)
// Returns: sum, avg, min, max, std_dev
```

#### Pivot Tables
```typescript
// Create cross-tabulation
tablePivotSummary(
  file_path,
  row_field="Phase",
  col_field="Status",
  value_field="Trial_Count",
  agg="count"
)
```

#### Search
```typescript
// Search across all cells
tableSearch(
  file_path,
  search_term="BRAF",
  max_results=50
)
```

### Example Workflow

```markdown
User: "Summarize the trial data by phase and count recruiting trials"

Agent actions:
1. tableGetSheetPreview(file_path) → Understand structure
2. tableFilterRows(
     file_path,
     column="Status",
     operator="=",
     value="Recruiting"
   )
3. tableGroupBy(
     file_path,
     group_column="Phase",
     agg_column="NCT_ID",
     agg_type="count"
   )
4. Present results with citations
```

---

## Section 2: long-table-summary Approach

### When to Use
- Data size: 30-1000+ rows
- Need structured summarization with a specific output schema
- Batch processing with parallel subagents
- Complex field extraction/classification

### Skill Loading

```markdown
skill long-table-summary
```

### 16-Step Workflow Overview

1. Interview user for table location
2. Confirm table existence and list sheets
3. Interview for summarization instructions (JSON format)
4. Refine instructions iteratively
5. Generate output JSON schema
6. Autogenerate topic name
7. Ask user for batch size (default: 30 rows)
8. Calculate batch ranges
9. Create subagent prompt template
10. Create directory structure
11. Generate prompts using Python script
12. Launch subagents in waves of 3
13. Monitor progress (report every 3 completions)
14. Check for missing outputs and retry
15. Combine outputs using Python script
16. Generate final report
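
Step 8 (calculate batch ranges) is plain arithmetic; a minimal sketch of the idea (the function name and 1-based inclusive ranges are illustrative, not the skill's actual script):

```python
def batch_ranges(total_rows: int, batch_size: int = 30) -> list[tuple[int, int]]:
    """Split total_rows into inclusive, 1-based (start, end) row ranges."""
    return [
        (start, min(start + batch_size - 1, total_rows))
        for start in range(1, total_rows + 1, batch_size)
    ]
```

For example, `batch_ranges(100, 30)` yields `[(1, 30), (31, 60), (61, 90), (91, 100)]` — four batches, dispatched in two waves of 3.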

### Example Use Cases

#### Use Case 1: Species Classification
```json
{
  "species": "Species classification: Tier1 (human, monkey), Tier2 (other animals), or NA",
  "topic": "Research topic: Oncology, Immunology, General Biology, or Others"
}
```

#### Use Case 2: Gene Mutation Analysis
```json
{
  "gene_mutation": "Gene mutation pattern, e.g., V600E, R173, Wild Type",
  "clinical_significance": "Clinical relevance: High, Medium, Low, or Unknown",
  "therapeutic_target": "Is this a drug target? Yes, No, or Unknown"
}
```

#### Use Case 3: Trial Status Extraction
```json
{
  "trial_phase": "Extract trial phase: Phase 1, Phase 2, Phase 3, or Phase 4",
  "recruitment_status": "Recruitment status: Recruiting, Completed, Terminated",
  "primary_outcome": "Primary outcome measure description"
}
```
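
Before combining outputs (step 15), each subagent's JSON can be spot-checked against the agreed schema. A minimal validation sketch for Use Case 3 (the allowed-value sets are read from the field descriptions above; real validation may need to be looser):

```python
# Allowed values inferred from the Use Case 3 field descriptions
ALLOWED = {
    "trial_phase": {"Phase 1", "Phase 2", "Phase 3", "Phase 4"},
    "recruitment_status": {"Recruiting", "Completed", "Terminated"},
}

def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one summarized row (empty = valid)."""
    errors = []
    for field in ("trial_phase", "recruitment_status", "primary_outcome"):
        if field not in row:
            errors.append(f"missing field: {field}")
    for field, allowed in ALLOWED.items():
        if field in row and row[field] not in allowed:
            errors.append(f"{field}: unexpected value {row[field]!r}")
    return errors
```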

### Performance Considerations

```
Batch Size Recommendations:
- Simple classification (2-3 fields): 50 rows per batch
- Moderate complexity (4-6 fields): 30 rows per batch
- Complex analysis (7+ fields): 20 rows per batch

Parallel Processing:
- Waves of 3 subagents
- Each wave processes 3 batches simultaneously
- Progress reported every 3 completions

Expected Time:
- Setup: 5-10 minutes
- Processing: ~1 minute per batch
- Total for 100 rows, 30/batch: ~10-15 minutes (5-10 min setup + 4 batches in 2 waves)
```
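
The total above combines as setup plus one batch-time per wave, since each wave runs its 3 batches simultaneously. A sketch of that arithmetic, with defaults taken from the figures above:

```python
import math

def estimate_minutes(total_rows: int, batch_size: int = 30,
                     wave_size: int = 3, minutes_per_batch: float = 1.0,
                     setup_minutes: float = 7.5) -> float:
    """Rough wall-clock estimate: setup plus one batch-time per wave."""
    batches = math.ceil(total_rows / batch_size)
    waves = math.ceil(batches / wave_size)
    return setup_minutes + waves * minutes_per_batch
```

For 100 rows at 30/batch this gives 4 batches in 2 waves, roughly 9.5 minutes with the midpoint setup time.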

### When NOT to Use

```
❌ DON'T use long-table-summary when:
- Less than 30 rows (use table tools instead)
- Simple filtering/grouping only (use table tools)
- Need real-time results (skill takes minutes)
- Unstructured output acceptable (use Python)
- No clear JSON schema possible (use Python)
```

---

## Section 3: Custom Python Approach

### When to Use
- Data size: > 1000 rows
- Complex transformations (ML, statistical models)
- Custom output formats
- Reusable analysis pipeline needed
- Operations beyond table tool capabilities

### Decision Sub-Flowchart

```
PYTHON SCRIPT DECISION:
│
├─ Is this a one-time analysis?
│   ├─ YES → Write single script in .scripts/py/
│   │        → Include full documentation
│   │        → Follow DRY principle
│   │
│   └─ NO → Create reusable module
│           → Separate utilities into _utils.py
│           → Document thoroughly
│
├─ Can existing skills/patterns help?
│   ├─ YES → Load skill 'bioresearcher-core'
│   │        → Read relevant patterns:
│   │          - retry.md (for network operations)
│   │          - progress.md (for batch processing)
│   │          - json-tools.md (for JSON validation)
│   │          - table-tools.md (for combining outputs)
│   │
│   └─ NO → Implement from scratch
│           → Follow python-standards.md pattern
│           → Include comprehensive docstrings
```

### Python Script Categories

#### Category 1: Data Transformation
```python
# When to use: Complex data reshaping, format conversion
# Examples:
# - Pivot/unpivot operations
# - Multi-file joins
# - Data cleaning and normalization
```

#### Category 2: Statistical Analysis
```python
# When to use: Beyond basic sum/avg/min/max
# Examples:
# - Statistical tests (t-test, chi-square)
# - Regression analysis
# - Survival analysis
```

#### Category 3: Machine Learning
```python
# When to use: Predictive modeling, classification
# Examples:
# - Clustering
# - Classification
# - Feature extraction
```

#### Category 4: Custom Aggregation
```python
# When to use: Complex business logic for aggregation
# Examples:
# - Weighted averages
# - Custom scoring algorithms
# - Multi-step calculations
```
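
The first Category 4 example, weighted averages, is a good illustration of logic the table tools' sum/count/avg/min/max cannot express. A minimal pandas sketch (column names are illustrative):

```python
import pandas as pd

def weighted_mean(df: pd.DataFrame, value_col: str, weight_col: str,
                  group_col: str) -> pd.Series:
    """Per-group weighted mean, e.g. response rate weighted by enrollment."""
    # Multiply value by weight, sum both per group, then divide
    tmp = df.assign(_weighted=df[value_col] * df[weight_col])
    sums = tmp.groupby(group_col)[["_weighted", weight_col]].sum()
    return sums["_weighted"] / sums[weight_col]
```

A plain `tableGroupBy(..., agg_type="avg")` would treat every row equally; this weights each row's contribution by the chosen column.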

### Example: Statistical Analysis Script

```python
#!/usr/bin/env python3
"""Clinical trial outcome analysis with statistical testing.

This module provides functionality for:
- Comparing response rates across trial phases
- Statistical significance testing (chi-square)
- Generating publication-ready summary tables

Usage:
    uv run python trial_analysis.py analyze --input trials.xlsx --output results.json
    uv run python trial_analysis.py compare --input trials.xlsx --output comparison.json

Dependencies:
- pandas >= 1.5.0
- scipy >= 1.9.0
- openpyxl >= 3.0.0

Author: BioResearcher AI Agent
Date: 2024-01-15
"""

import argparse
import json
from typing import Any, Dict

import pandas as pd
from scipy import stats

def load_trial_data(file_path: str) -> pd.DataFrame:
    """Load clinical trial data from Excel file.

    Args:
        file_path: Path to Excel file containing trial data

    Returns:
        DataFrame with trial records

    Raises:
        FileNotFoundError: If file_path does not exist
        ValueError: If required columns missing
    """
    df = pd.read_excel(file_path)

    required_columns = ["nct_id", "phase", "status", "response_rate"]
    missing = [col for col in required_columns if col not in df.columns]

    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    return df

def compare_response_rates(
    df: pd.DataFrame,
    group_column: str = "phase"
) -> Dict[str, Any]:
    """Compare response rates across groups with statistical testing.

    Performs chi-square test for independence to determine if
    response rates differ significantly across trial phases.

    Args:
        df: DataFrame containing trial data
        group_column: Column to group by (default: "phase")

    Returns:
        Dictionary containing:
        - "group_stats": Statistics per group
        - "chi_square": Chi-square test results
        - "significant": Boolean indicating significance

    Example:
        >>> df = load_trial_data("trials.xlsx")
        >>> results = compare_response_rates(df, group_column="phase")
        >>> print(results["significant"])
        True
    """
    # Group statistics
    group_stats = df.groupby(group_column)["response_rate"].agg(
        ["mean", "std", "count"]
    ).to_dict()

    # Chi-square test
    contingency = pd.crosstab(df[group_column], df["response_rate"] > 50)
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency)

    # Cast NumPy scalars to builtins so the result is JSON-serializable
    return {
        "group_stats": group_stats,
        "chi_square": {
            "statistic": float(chi2),
            "p_value": float(p_value),
            "degrees_of_freedom": int(dof)
        },
        "significant": bool(p_value < 0.05)
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Clinical trial analysis")
    parser.add_argument("command", choices=["analyze", "compare"])
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", required=True, help="Output file path")

    args = parser.parse_args()

    # Analysis logic here
    df = load_trial_data(args.input)
    results = compare_response_rates(df)

    # Save results
    with open(args.output, "w") as f:
        json.dump(results, f, indent=2)
```

### Python Best Practices

```
1. File Organization
   .scripts/py/
   ├── [topic]_analysis.py    # Main analysis script
   ├── [topic]_utils.py       # Reusable utilities
   └── requirements.txt       # Dependencies (if needed)

2. Documentation Requirements
   - Module-level docstring (purpose, usage, dependencies)
   - Function docstrings (Args, Returns, Raises, Examples)
   - Inline comments for complex logic
   - Type hints for all function signatures

3. DRY Principle
   - Single responsibility per function
   - No code duplication
   - Extract repeated logic to utility functions
   - Use configuration over hardcoding

4. Error Handling
   - Validate inputs early
   - Use try/except for external operations
   - Provide informative error messages
   - Log errors for debugging

5. Testing
   - Include example usage in docstrings
   - Test with sample data
   - Validate outputs
```

---

## Hybrid Approaches

### Approach 1: Table Tools → Python
```
Use table tools for initial filtering
→ Export subset
→ Use Python for complex analysis
```
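
A minimal sketch of the hand-off, assuming the filtered subset was exported as a CSV (the file name and the follow-up analysis are illustrative):

```python
import pandas as pd

def analyze_subset(path: str) -> pd.DataFrame:
    """Load a filtered subset exported by the table tools and run analysis
    beyond their capabilities -- here, a correlation matrix."""
    df = pd.read_csv(path)
    return df.select_dtypes("number").corr()
```

Filtering upfront keeps the Python step fast and the script simple: it never has to re-implement the selection logic.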

### Approach 2: Python → Table Tools
```
Use Python for data cleaning
→ Export to Excel
→ Use table tools for interactive exploration
```

### Approach 3: long-table-summary → Python
```
Use skill for batch summarization
→ Combine outputs with Python
→ Further analysis with Python
```

---

## Performance Benchmarks

### Table Tools
- **Setup Time:** Immediate
- **Execution Time:** < 1 second for < 1000 rows
- **Context Usage:** Minimal
- **Best For:** Quick lookups, simple aggregations

### long-table-summary Skill
- **Setup Time:** 5-10 minutes
- **Execution Time:** ~1 minute per batch (30 rows)
- **Context Usage:** Medium (subagent overhead)
- **Best For:** Structured summarization, classification

### Custom Python
- **Setup Time:** 10-30 minutes
- **Execution Time:** Variable (depends on complexity)
- **Context Usage:** Low (script runs independently)
- **Best For:** Complex analysis, reusable pipelines

---

## Common Mistakes to Avoid

### Mistake 1: Using Python for Simple Operations
```
❌ BAD: Write Python script for simple filtering
✅ GOOD: Use tableFilterRows() for filtering
```

### Mistake 2: Using long-table-summary for Small Tables
```
❌ BAD: Load skill for 15-row table
✅ GOOD: Use table tools directly for < 30 rows
```

### Mistake 3: Not Using Skills When Appropriate
```
❌ BAD: Write custom Python for batch summarization
✅ GOOD: Load long-table-summary skill for structured summarization
```

### Mistake 4: Ignoring Upfront Filtering
```
❌ BAD: Load entire 10,000-row table then process
✅ GOOD: Apply filters first, then process subset
```

---

## Decision Checklist

Before choosing an approach, ask:

- [ ] How many rows in the dataset?
- [ ] What operations are needed? (filter, group, summarize, transform)
- [ ] Is output format specified? (JSON schema vs. flexible)
- [ ] Is parallel processing needed?
- [ ] Will this analysis be reused?
- [ ] What is the time constraint?
- [ ] Are there existing skills that can help?

**Based on answers, select approach using the decision matrix above.**
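
The checklist answers map directly onto the decision matrix. As a compact restatement, the thresholds can be encoded in a small helper (illustrative only — it covers row count, schema, and complexity, not time constraints or reuse):

```python
def choose_approach(rows: int, needs_schema: bool, complex_logic: bool) -> str:
    """Encode the decision matrix: thresholds taken from this document."""
    if complex_logic or rows > 1000:
        return "custom-python"
    if rows >= 30 and needs_schema:
        return "long-table-summary"
    if rows < 30:
        return "table-tools"
    return "custom-python"  # mid-size data without a clear JSON schema
```

For example, a 15-row table routes to table tools, a 200-row table with a fixed JSON schema routes to the skill, and anything over 1000 rows routes to Python.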