gyoshu 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +11 -9
- package/bin/gyoshu.js +2 -2
- package/install.sh +1 -1
- package/package.json +1 -1
- package/src/lib/report-markdown.ts +21 -63
- package/src/lib/literature-client.ts +0 -1048
- package/src/skill/scientific-method/SKILL.md +0 -331
- package/src/tool/literature-search.ts +0 -389
@@ -1,331 +0,0 @@
---
name: scientific-method
description: Framework for hypothesis-driven scientific research
---

# Scientific Method Framework

## When to Use

Load this skill when conducting hypothesis-driven research that requires rigorous methodology.

## The Scientific Method

### 1. Observation
- Examine existing data or phenomena
- Note patterns, anomalies, or questions
- Document initial observations with the `[OBSERVATION]` marker

### 2. Question
- Formulate a specific, testable question
- State it clearly with the `[OBJECTIVE]` marker

### 3. Hypothesis
- Propose a testable explanation
- Make specific, falsifiable predictions
- Use the `[HYPOTHESIS]` marker

### 4. Experiment
- Design controlled experiments
- Identify the variables (independent, dependent, controlled)
- Use the `[EXPERIMENT]` marker for procedures

### 5. Analysis
- Collect and analyze the data
- Apply statistical methods appropriately
- Use the `[ANALYSIS]` marker for interpretations

### 6. Conclusion
- Accept or reject the hypothesis based on the evidence
- Acknowledge limitations
- Use the `[CONCLUSION:confidence=X]` marker

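The six steps above can be sketched as a single logged walkthrough. Everything in the strings below (the cache study, the cohorts, the confidence value) is a hypothetical placeholder, chosen only to show the marker sequence:

```python
# Hypothetical walkthrough of the marker sequence described above
log = [
    "[OBSERVATION] Cohort B shows higher response times in the raw logs",
    "[OBJECTIVE] Does enabling the cache reduce mean response time for cohort B?",
    "[HYPOTHESIS] H0: cache has no effect on mean latency; H1: cache reduces it",
    "[EXPERIMENT] A/B test; cache on/off is the independent variable, latency the dependent",
    "[ANALYSIS] Welch's t-test on per-request latencies at alpha = 0.05",
    "[CONCLUSION:confidence=0.9] H0 rejected; the cache reduces mean latency",
]
for entry in log:
    print(entry)
```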
## Best Practices

1. **Null Hypothesis**: Always consider the null hypothesis
2. **Controls**: Include appropriate control groups/conditions
3. **Sample Size**: Ensure adequate sample size for statistical power
4. **Reproducibility**: Document all steps for replication
5. **Peer Review**: Validate findings before final conclusions

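For the sample-size practice above, a minimal calculation can be sketched with the normal-approximation formula for a two-sided two-sample t-test. The targets (d = 0.5, α = 0.05, 80% power) are illustrative; exact t-based solvers (e.g. statsmodels' `TTestIndPower`) give a slightly larger n:

```python
import math
from scipy.stats import norm

# Illustrative targets: detect Cohen's d = 0.5 at alpha = 0.05 (two-tailed) with 80% power
alpha, power, d = 0.05, 0.80, 0.5
z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-tailed test
z_beta = norm.ppf(power)            # quantile for the desired power
n_per_group = math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(f"[DECISION] Sample size: n = {n_per_group} per group for d = {d}, power = {power}")
```

With these targets the approximation gives n = 63 per group.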
---

## Hypothesis-First Workflow

**CRITICAL**: State your hypotheses BEFORE looking at the data. This prevents p-hacking and confirmation bias.

### 1. State H0/H1 Before Data Analysis

```python
# ALWAYS document hypotheses before running analysis
print("[HYPOTHESIS] H0: No difference between groups (μ1 = μ2)")
print("[HYPOTHESIS] H1: Treatment group shows improvement (μ1 > μ2)")
```

**Requirements:**
- H0 (Null Hypothesis): The default assumption of no effect/difference
- H1 (Alternative Hypothesis): The specific effect you're testing
- Both must be stated BEFORE examining the data

### 2. Define Endpoints and Alpha Before Analysis

```python
# Pre-specify statistical parameters
print("[DECISION] Primary endpoint: mean response time")
print("[DECISION] Alpha level: 0.05 (two-tailed)")
print("[DECISION] Minimum effect size of interest: Cohen's d = 0.5")
```

**Before running any tests, document:**
- Primary endpoint (what you're measuring)
- Alpha level (typically 0.05)
- Directionality (one-tailed vs two-tailed)
- Minimum effect size of practical significance

### 3. Pre-registration

Document your complete analysis plan before data analysis:

```markdown
## Pre-Registration
- **Primary Hypothesis**: [H0/H1 statements]
- **Primary Endpoint**: [What metric]
- **Statistical Test**: [Which test and why]
- **Alpha Level**: [Significance threshold]
- **Sample Size Rationale**: [Power analysis results]
- **Multiple Testing**: [Correction method if >1 test]
- **Exclusion Criteria**: [When to remove data points]
```

**Why Pre-register?**
- Distinguishes confirmatory from exploratory analysis
- Prevents HARKing (Hypothesizing After Results are Known)
- Increases the credibility of findings

---

## Statistical Rigor Requirements

### Always Report Confidence Intervals

**NEVER** report only point estimates. Always include confidence intervals:

```python
# Calculate and report a 95% CI (t critical value, not the 1.96 normal approximation)
from scipy.stats import sem, t
mean_val = data.mean()
ci_margin = t.ppf(0.975, df=len(data) - 1) * sem(data)
ci_low, ci_high = mean_val - ci_margin, mean_val + ci_margin

print(f"[STAT:estimate] Mean = {mean_val:.3f}")
print(f"[STAT:ci] 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

**A CI communicates:**
- Precision of the estimate
- Range of plausible values
- Whether the effect is meaningfully different from zero

### Always Report Effect Size with Interpretation

**NEVER** claim significance without an effect size:

```python
import numpy as np

def cohens_d(group1, group2):
    # Pooled-standard-deviation Cohen's d; ddof=1 for sample variances
    n1, n2 = len(group1), len(group2)
    var1, var2 = group1.var(ddof=1), group2.var(ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (group1.mean() - group2.mean()) / pooled_std

d = cohens_d(treatment, control)
effect_label = "small" if abs(d) < 0.5 else "medium" if abs(d) < 0.8 else "large"
print(f"[STAT:effect_size] Cohen's d = {d:.3f} ({effect_label})")
```

### Use Appropriate Tests for Data Type

| Data Type | Comparison | Recommended Test |
|-----------|------------|------------------|
| Continuous, normal | 2 groups | Welch's t-test |
| Continuous, non-normal | 2 groups | Mann-Whitney U |
| Continuous, normal | >2 groups | ANOVA |
| Continuous, non-normal | >2 groups | Kruskal-Wallis |
| Categorical | 2x2 table | Chi-square or Fisher's exact |
| Proportions | 2 groups | Z-test for proportions |
| Correlation | Continuous | Pearson (normal) or Spearman |

```python
# Document test selection
print("[DECISION] Using Welch's t-test: two independent groups, unequal variance assumed")

# Check assumptions
from scipy.stats import shapiro
_, p_norm = shapiro(data)
print(f"[CHECK:normality] Shapiro-Wilk p = {p_norm:.3f}")
```
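Following the table, the two-group rows might be applied like this. The samples are synthetic, generated only so the snippet is self-contained:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(0)
treatment = rng.normal(loc=1.2, scale=1.0, size=50)  # synthetic example data
control = rng.normal(loc=1.0, scale=1.5, size=50)

# Continuous, ~normal, 2 groups -> Welch's t-test (equal_var=False)
t_stat, p_welch = ttest_ind(treatment, control, equal_var=False)
print(f"[STAT:p_value] Welch's t-test: t = {t_stat:.3f}, p = {p_welch:.4f}")

# Continuous, non-normal, 2 groups -> Mann-Whitney U
u_stat, p_mw = mannwhitneyu(treatment, control, alternative="two-sided")
print(f"[STAT:p_value] Mann-Whitney U: U = {u_stat:.1f}, p = {p_mw:.4f}")
```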

---

## Multiple Comparison Correction

When running multiple statistical tests, adjust for the inflated false-positive rate.

### Bonferroni Correction (Conservative)

**Use when:** Small number of planned comparisons (≤10)

```python
n_tests = 5
alpha = 0.05
bonferroni_alpha = alpha / n_tests

print(f"[DECISION] Bonferroni correction: α = {alpha}/{n_tests} = {bonferroni_alpha:.4f}")

# Report both raw and adjusted p-values
p_values = [0.01, 0.03, 0.02, 0.15, 0.04]
for i, p in enumerate(p_values):
    adjusted_p = min(p * n_tests, 1.0)
    sig = "***" if p < bonferroni_alpha else ""
    print(f"[STAT:p_value] Test {i+1}: raw p = {p:.4f}, adjusted p = {adjusted_p:.4f} {sig}")
```

### Benjamini-Hochberg FDR (Less Conservative)

**Use when:** Large number of tests (>10), exploratory analysis

```python
import numpy as np
from scipy.stats import false_discovery_control

p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.35])
adjusted_p = false_discovery_control(p_values, method='bh')

print("[DECISION] Benjamini-Hochberg FDR correction for 10 tests")
for i, (raw, adj) in enumerate(zip(p_values, adjusted_p)):
    sig = "***" if adj < 0.05 else ""
    print(f"[STAT:p_value] Test {i+1}: raw p = {raw:.4f}, BH-adjusted p = {adj:.4f} {sig}")
```

### Decision Guide

| Scenario | Correction | Rationale |
|----------|------------|-----------|
| 1-5 planned comparisons | Bonferroni | Simple, conservative |
| 6-20 tests | Bonferroni or BH-FDR | Balance rigor/power |
| >20 tests (genomics, etc.) | BH-FDR | Maintains power |
| Exploratory screening | BH-FDR | Focus on discovery |
| Confirmatory study | Bonferroni | Focus on rigor |

**Always report both raw AND adjusted p-values** so readers can assess significance under different criteria.

---

## Effect Size Interpretation

Use these thresholds to interpret effect magnitudes:

### Cohen's d (Group Differences)

| Cohen's d | Interpretation | Practical Meaning |
|-----------|----------------|-------------------|
| 0.2 | Small | Barely noticeable difference |
| 0.5 | Medium | Noticeable, potentially meaningful |
| 0.8 | Large | Obvious, substantial difference |

```python
d = 0.65
if abs(d) < 0.2:
    interpretation = "negligible"
elif abs(d) < 0.5:
    interpretation = "small"
elif abs(d) < 0.8:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] Cohen's d = {d:.2f} ({interpretation})")
```

### Correlation Coefficient (r)

| \|r\| | Interpretation | Variance Explained |
|-------|----------------|--------------------|
| 0.1 | Small | ~1% |
| 0.3 | Medium | ~9% |
| 0.5 | Large | ~25% |

```python
r = 0.42
r_squared = r ** 2
if abs(r) < 0.1:
    interpretation = "negligible"
elif abs(r) < 0.3:
    interpretation = "small"
elif abs(r) < 0.5:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] r = {r:.2f} ({interpretation}, explains {r_squared*100:.1f}% variance)")
```

### Odds Ratio (OR)

| Odds Ratio | Interpretation |
|------------|----------------|
| 1.5 | Small effect |
| 2.5 | Medium effect |
| 4.0 | Large effect |

**Note:** Odds ratios are symmetric around 1.0 on the log scale: an OR of 0.25 is equivalent in magnitude to an OR of 4.0.

```python
OR = 3.2
# Judge magnitude on the side >= 1; protective effects (OR < 1) use the reciprocal
ratio = OR if OR >= 1 else 1 / OR
if ratio < 1.5:
    interpretation = "negligible"
elif ratio < 2.5:
    interpretation = "small"
elif ratio < 4.0:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] OR = {OR:.2f} ({interpretation})")
```

### Quick Reference Table

| Measure | Small | Medium | Large |
|---------|-------|--------|-------|
| **Cohen's d** | 0.2 | 0.5 | 0.8 |
| **Correlation r** | 0.1 | 0.3 | 0.5 |
| **Odds Ratio** | 1.5 | 2.5 | 4.0 |
| **R² (variance)** | 1% | 9% | 25% |
| **η² (eta squared)** | 0.01 | 0.06 | 0.14 |
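η² in the last row is not computed anywhere above; for a one-way layout it is the between-group sum of squares divided by the total sum of squares. A minimal sketch on made-up groups:

```python
import numpy as np

# Made-up three-group example, for illustration only
groups = [
    np.array([3.1, 2.9, 3.4, 3.0]),
    np.array([3.6, 3.8, 3.5, 3.9]),
    np.array([4.2, 4.0, 4.4, 4.1]),
]

pooled = np.concatenate(groups)
grand_mean = pooled.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((pooled - grand_mean) ** 2).sum()
eta_sq = ss_between / ss_total  # proportion of variance explained by group membership

label = "small" if eta_sq < 0.06 else "medium" if eta_sq < 0.14 else "large"
print(f"[STAT:effect_size] η² = {eta_sq:.3f} ({label})")
```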

### From Effect Size to Practical Significance

Always translate statistical effect size into real-world meaning:

```python
# Good: connect the effect size to its practical significance
print("[STAT:effect_size] Cohen's d = 0.5 (medium)")
print("[SO_WHAT] A medium effect means the treatment reduces wait time by ~15 seconds on average")

# Bad: just reporting the number
print("[STAT:effect_size] d = 0.5")  # missing interpretation and practical meaning
```