gyoshu 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,331 +0,0 @@
---
name: scientific-method
description: Framework for hypothesis-driven scientific research
---

# Scientific Method Framework

## When to Use
Load this skill when conducting hypothesis-driven research that requires rigorous methodology.

## The Scientific Method

### 1. Observation
- Examine existing data or phenomena
- Note patterns, anomalies, or questions
- Document initial observations with `[OBSERVATION]` marker

### 2. Question
- Formulate a specific, testable question
- Use `[OBJECTIVE]` marker to state clearly

### 3. Hypothesis
- Propose a testable explanation
- Make specific, falsifiable predictions
- Use `[HYPOTHESIS]` marker

### 4. Experiment
- Design controlled experiments
- Identify variables (independent, dependent, controlled)
- Use `[EXPERIMENT]` marker for procedures

### 5. Analysis
- Collect and analyze data
- Use statistical methods appropriately
- Use `[ANALYSIS]` marker for interpretations

### 6. Conclusion
- Accept or reject hypothesis based on evidence
- Acknowledge limitations
- Use `[CONCLUSION:confidence=X]` marker
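
The markers above are plain-text tags printed alongside normal output so a run can be audited afterwards. A minimal sketch of how they might chain together in a single pass; the study itself is hypothetical:

```python
# Hypothetical end-to-end use of the markers, from observation to conclusion
print("[OBSERVATION] Error rate roughly doubles on Mondays in the last 8 weeks of logs")
print("[OBJECTIVE] Does deploy day (Monday vs. other days) affect error rate?")
print("[HYPOTHESIS] H0: error rate is independent of deploy day")
print("[HYPOTHESIS] H1: Monday deploys have a higher error rate")
print("[EXPERIMENT] Compare error rates by deploy day over 12 weeks, holding release size constant")
print("[ANALYSIS] Two-proportion z-test, alpha = 0.05, pre-specified before looking at the data")
print("[CONCLUSION:confidence=medium] H0 rejected; effect size and limitations reported with the result")
```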
## Best Practices

1. **Null Hypothesis**: Always consider the null hypothesis
2. **Controls**: Include appropriate control groups/conditions
3. **Sample Size**: Ensure adequate sample size for statistical power (a quick power calculation sketch follows this list)
4. **Reproducibility**: Document all steps for replication
5. **Peer Review**: Validate findings before final conclusions
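
For the sample-size point above, it helps to run the power calculation before any data are collected. A minimal sketch using the normal-approximation formula for two independent groups; the effect size, alpha, and power here are illustrative assumptions, not recommendations:

```python
# Approximate per-group n for a two-sample comparison:
#   n ≈ 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2
import math
from scipy.stats import norm

alpha, power, d = 0.05, 0.80, 0.5   # assumed: two-sided alpha, target power, minimum effect size
z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(power)
n_per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2

print(f"[DECISION] Powering for d = {d} at alpha = {alpha}, power = {power}")
print(f"[DECISION] Required sample size: ~{math.ceil(n_per_group)} per group")
```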
---

## Hypothesis-First Workflow

**CRITICAL**: State your hypotheses BEFORE looking at the data. This prevents p-hacking and confirmation bias.

### 1. State H0/H1 Before Data Analysis

```python
# ALWAYS document hypotheses before running analysis
print("[HYPOTHESIS] H0: No difference between groups (μ1 = μ2)")
print("[HYPOTHESIS] H1: Treatment group shows improvement (μ1 > μ2)")
```

**Requirements:**
- H0 (Null Hypothesis): The default assumption of no effect/difference
- H1 (Alternative Hypothesis): The specific effect you're testing
- Both must be stated BEFORE examining the data

### 2. Define Endpoints and Alpha Before Analysis

```python
# Pre-specify statistical parameters
print("[DECISION] Primary endpoint: mean response time")
print("[DECISION] Alpha level: 0.05 (two-tailed)")
print("[DECISION] Minimum effect size of interest: Cohen's d = 0.5")
```

**Before running any tests, document:**
- Primary endpoint (what you're measuring)
- Alpha level (typically 0.05)
- Directionality (one-tailed vs two-tailed)
- Minimum effect size of practical significance

### 3. Pre-registration

Document your complete analysis plan before data analysis:

```markdown
## Pre-Registration
- **Primary Hypothesis**: [H0/H1 statements]
- **Primary Endpoint**: [What metric]
- **Statistical Test**: [Which test and why]
- **Alpha Level**: [Significance threshold]
- **Sample Size Rationale**: [Power analysis results]
- **Multiple Testing**: [Correction method if >1 test]
- **Exclusion Criteria**: [When to remove data points]
```

**Why Pre-register?**
- Distinguishes confirmatory from exploratory analysis
- Prevents HARKing (Hypothesizing After Results are Known)
- Increases credibility of findings

---

## Statistical Rigor Requirements

### Always Report Confidence Intervals

**NEVER** report only point estimates. Always include confidence intervals:
```python
# Calculate and report CI (normal-approximation interval; assumes a reasonably large 1-D sample)
from scipy.stats import sem
mean_val = data.mean()
ci_margin = 1.96 * sem(data)
ci_low, ci_high = mean_val - ci_margin, mean_val + ci_margin

print(f"[STAT:estimate] Mean = {mean_val:.3f}")
print(f"[STAT:ci] 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

**CI communicates:**
- Precision of the estimate
- Range of plausible values
- Whether effect is meaningfully different from zero
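
The interval above relies on a normal approximation, which is reasonable for large samples. When the sample is small or clearly skewed, a percentile bootstrap is a common fallback; a minimal sketch, reusing the same 1-D `data` array assumed in the block above:

```python
import numpy as np

# Percentile bootstrap: resample with replacement, take the middle 95% of the resampled means
rng = np.random.default_rng(42)
boot_means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"[STAT:ci] 95% bootstrap CI [{ci_low:.3f}, {ci_high:.3f}]")
```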
### Always Report Effect Size with Interpretation

**NEVER** claim significance without effect size:

```python
import numpy as np

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    # Use sample variances (ddof=1) to match the pooled-variance formula below
    var1, var2 = group1.var(ddof=1), group2.var(ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (group1.mean() - group2.mean()) / pooled_std

d = cohens_d(treatment, control)
effect_label = "negligible" if abs(d) < 0.2 else "small" if abs(d) < 0.5 else "medium" if abs(d) < 0.8 else "large"
print(f"[STAT:effect_size] Cohen's d = {d:.3f} ({effect_label})")
```

### Use Appropriate Tests for Data Type

| Data Type | Comparison | Recommended Test |
|-----------|------------|------------------|
| Continuous, normal | 2 groups | Welch's t-test |
| Continuous, non-normal | 2 groups | Mann-Whitney U |
| Continuous, normal | >2 groups | ANOVA |
| Continuous, non-normal | >2 groups | Kruskal-Wallis |
| Categorical | 2x2 table | Chi-square or Fisher's exact |
| Proportions | 2 groups | Z-test for proportions |
| Correlation | Continuous | Pearson (normal) or Spearman |

```python
# Document test selection
print("[DECISION] Using Welch's t-test: two independent groups, unequal variance assumed")

# Check assumptions on each group
from scipy.stats import shapiro, levene, ttest_ind
_, p_norm_treat = shapiro(treatment)
_, p_norm_ctrl = shapiro(control)
print(f"[CHECK:normality] Shapiro-Wilk p = {p_norm_treat:.3f} (treatment), {p_norm_ctrl:.3f} (control)")
_, p_var = levene(treatment, control)
print(f"[CHECK:variance] Levene p = {p_var:.3f}")

# Run the pre-specified test (equal_var=False applies Welch's correction)
t_stat, p_val = ttest_ind(treatment, control, equal_var=False)
print(f"[STAT:p_value] Welch's t = {t_stat:.3f}, p = {p_val:.4f}")
```
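If the normality check fails, the table above points to Mann-Whitney U for two groups. A minimal sketch with the same hypothetical `treatment` and `control` arrays; the switch should itself be a pre-specified decision rule, not a post-hoc choice:

```python
from scipy.stats import mannwhitneyu

# Non-parametric fallback when Shapiro-Wilk rejects normality
print("[DECISION] Normality assumption rejected; using Mann-Whitney U per pre-specified rule")
u_stat, p_val = mannwhitneyu(treatment, control, alternative="two-sided")
print(f"[STAT:p_value] Mann-Whitney U = {u_stat:.1f}, p = {p_val:.4f}")
```
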
---

## Multiple Comparison Correction

When running multiple statistical tests, adjust for the inflated false-positive rate.
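
Why this matters: with k independent tests each run at α = 0.05, the chance of at least one false positive is 1 - (1 - α)^k, which grows quickly. A quick illustration:

```python
# Family-wise error rate for k independent tests at nominal alpha
alpha, k = 0.05, 5
fwer = 1 - (1 - alpha) ** k
print(f"[ANALYSIS] {k} tests at alpha = {alpha}: P(at least one false positive) ≈ {fwer:.2f}")  # ≈ 0.23
```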
### Bonferroni Correction (Conservative)

**Use when:** Small number of planned comparisons (≤10)

```python
n_tests = 5
alpha = 0.05
bonferroni_alpha = alpha / n_tests

print(f"[DECISION] Bonferroni correction: α = {alpha}/{n_tests} = {bonferroni_alpha:.4f}")

# Report both raw and adjusted p-values
p_values = [0.01, 0.03, 0.02, 0.15, 0.04]
for i, p in enumerate(p_values):
    adjusted_p = min(p * n_tests, 1.0)
    sig = "***" if p < bonferroni_alpha else ""
    print(f"[STAT:p_value] Test {i+1}: raw p = {p:.4f}, adjusted p = {adjusted_p:.4f} {sig}")
```

### Benjamini-Hochberg FDR (Less Conservative)

**Use when:** Large number of tests (>10), exploratory analysis

```python
import numpy as np
from scipy.stats import false_discovery_control  # requires SciPy >= 1.11

p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.35])
adjusted_p = false_discovery_control(p_values, method='bh')

print("[DECISION] Benjamini-Hochberg FDR correction for 10 tests")
for i, (raw, adj) in enumerate(zip(p_values, adjusted_p)):
    sig = "***" if adj < 0.05 else ""
    print(f"[STAT:p_value] Test {i+1}: raw p = {raw:.4f}, BH-adjusted p = {adj:.4f} {sig}")
```

### Decision Guide

| Scenario | Correction | Rationale |
|----------|------------|-----------|
| 1-5 planned comparisons | Bonferroni | Simple, conservative |
| 6-20 tests | Bonferroni or BH-FDR | Balance rigor/power |
| >20 tests (genomics, etc.) | BH-FDR | Maintains power |
| Exploratory screening | BH-FDR | Focus on discovery |
| Confirmatory study | Bonferroni | Focus on rigor |

**Always report both raw AND adjusted p-values** to allow readers to assess significance under different criteria.

---

## Effect Size Interpretation

Use these thresholds to interpret effect magnitudes:

### Cohen's d (Group Differences)

| Cohen's d | Interpretation | Practical Meaning |
|-----------|----------------|-------------------|
| 0.2 | Small | Barely noticeable difference |
| 0.5 | Medium | Noticeable, potentially meaningful |
| 0.8 | Large | Obvious, substantial difference |

```python
d = 0.65
if abs(d) < 0.2:
    interpretation = "negligible"
elif abs(d) < 0.5:
    interpretation = "small"
elif abs(d) < 0.8:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] Cohen's d = {d:.2f} ({interpretation})")
```

### Correlation Coefficient (r)

| \|r\| | Interpretation | Variance Explained |
|-----|----------------|-------------------|
| 0.1 | Small | ~1% |
| 0.3 | Medium | ~9% |
| 0.5 | Large | ~25% |

```python
r = 0.42
r_squared = r ** 2
if abs(r) < 0.1:
    interpretation = "negligible"
elif abs(r) < 0.3:
    interpretation = "small"
elif abs(r) < 0.5:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] r = {r:.2f} ({interpretation}, explains {r_squared*100:.1f}% variance)")
```

### Odds Ratio (OR)

| Odds Ratio | Interpretation |
|------------|----------------|
| 1.5 | Small effect |
| 2.5 | Medium effect |
| 4.0 | Large effect |

**Note:** Odds ratios are symmetric around 1.0 on the log scale: an OR of 0.25 has the same magnitude of effect as its reciprocal, an OR of 4.0.

```python
OR = 3.2
# For protective effects (OR < 1), interpret the reciprocal so magnitudes are comparable
magnitude = OR if OR >= 1 else 1 / OR

if magnitude < 1.5:
    interpretation = "negligible"
elif magnitude < 2.5:
    interpretation = "small"
elif magnitude < 4.0:
    interpretation = "medium"
else:
    interpretation = "large"

print(f"[STAT:effect_size] OR = {OR:.2f} ({interpretation})")
```

### Quick Reference Table

| Measure | Small | Medium | Large |
|---------|-------|--------|-------|
| **Cohen's d** | 0.2 | 0.5 | 0.8 |
| **Correlation r** | 0.1 | 0.3 | 0.5 |
| **Odds Ratio** | 1.5 | 2.5 | 4.0 |
| **R² (variance)** | 1% | 9% | 25% |
| **η² (eta squared)** | 0.01 | 0.06 | 0.14 |
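
η² appears in the table above but is not computed anywhere else in this skill. A minimal sketch for a one-way layout (η² = SS_between / SS_total), using made-up groups purely for illustration:

```python
import numpy as np

# Hypothetical one-way layout; eta squared = SS_between / SS_total
groups = [np.array([4.1, 3.8, 4.5, 4.0]),
          np.array([5.2, 4.9, 5.5, 5.1]),
          np.array([4.6, 4.4, 4.8, 4.7])]
grand_mean = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)
eta_sq = ss_between / ss_total

label = ("negligible" if eta_sq < 0.01 else "small" if eta_sq < 0.06
         else "medium" if eta_sq < 0.14 else "large")
print(f"[STAT:effect_size] eta squared = {eta_sq:.3f} ({label})")
```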
### From Effect Size to Practical Significance

Always translate statistical effect size to real-world meaning:

```python
# Good: Connect to practical significance
print("[STAT:effect_size] Cohen's d = 0.5 (medium)")
print("[SO_WHAT] A medium effect means the treatment reduces wait time by ~15 seconds on average")

# Bad: Just reporting the number
print("[STAT:effect_size] d = 0.5")  # Missing interpretation and practical meaning
```