@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (252)
  1. package/LICENSE +21 -0
  2. package/README.md +204 -0
  3. package/curated/analysis/README.md +64 -0
  4. package/curated/domains/README.md +104 -0
  5. package/curated/literature/README.md +53 -0
  6. package/curated/research/README.md +62 -0
  7. package/curated/tools/README.md +87 -0
  8. package/curated/writing/README.md +61 -0
  9. package/index.ts +39 -0
  10. package/mcp-configs/academic-db/ChatSpatial.json +17 -0
  11. package/mcp-configs/academic-db/academia-mcp.json +17 -0
  12. package/mcp-configs/academic-db/academic-paper-explorer.json +17 -0
  13. package/mcp-configs/academic-db/academic-search-mcp-server.json +17 -0
  14. package/mcp-configs/academic-db/agentinterviews-mcp.json +17 -0
  15. package/mcp-configs/academic-db/all-in-mcp.json +17 -0
  16. package/mcp-configs/academic-db/apple-health-mcp.json +17 -0
  17. package/mcp-configs/academic-db/arxiv-latex-mcp.json +17 -0
  18. package/mcp-configs/academic-db/arxiv-mcp-server.json +17 -0
  19. package/mcp-configs/academic-db/bgpt-mcp.json +17 -0
  20. package/mcp-configs/academic-db/biomcp.json +17 -0
  21. package/mcp-configs/academic-db/biothings-mcp.json +17 -0
  22. package/mcp-configs/academic-db/catalysishub-mcp-server.json +17 -0
  23. package/mcp-configs/academic-db/clinicaltrialsgov-mcp-server.json +17 -0
  24. package/mcp-configs/academic-db/deep-research-mcp.json +17 -0
  25. package/mcp-configs/academic-db/dicom-mcp.json +17 -0
  26. package/mcp-configs/academic-db/enrichr-mcp-server.json +17 -0
  27. package/mcp-configs/academic-db/fec-mcp-server.json +17 -0
  28. package/mcp-configs/academic-db/fhir-mcp-server-themomentum.json +17 -0
  29. package/mcp-configs/academic-db/fhir-mcp.json +19 -0
  30. package/mcp-configs/academic-db/gget-mcp.json +17 -0
  31. package/mcp-configs/academic-db/google-researcher-mcp.json +17 -0
  32. package/mcp-configs/academic-db/idea-reality-mcp.json +17 -0
  33. package/mcp-configs/academic-db/legiscan-mcp.json +19 -0
  34. package/mcp-configs/academic-db/lex.json +17 -0
  35. package/mcp-configs/ai-platform/Adaptive-Graph-of-Thoughts-MCP-server.json +17 -0
  36. package/mcp-configs/ai-platform/ai-counsel.json +17 -0
  37. package/mcp-configs/ai-platform/atlas-mcp-server.json +17 -0
  38. package/mcp-configs/ai-platform/counsel-mcp.json +17 -0
  39. package/mcp-configs/ai-platform/cross-llm-mcp.json +17 -0
  40. package/mcp-configs/ai-platform/gptr-mcp.json +17 -0
  41. package/mcp-configs/browser/decipher-research-agent.json +17 -0
  42. package/mcp-configs/browser/deep-research.json +17 -0
  43. package/mcp-configs/browser/everything-claude-code.json +17 -0
  44. package/mcp-configs/browser/gpt-researcher.json +17 -0
  45. package/mcp-configs/browser/heurist-agent-framework.json +17 -0
  46. package/mcp-configs/data-platform/4everland-hosting-mcp.json +17 -0
  47. package/mcp-configs/data-platform/context-keeper.json +17 -0
  48. package/mcp-configs/data-platform/context7.json +19 -0
  49. package/mcp-configs/data-platform/contextstream-mcp.json +17 -0
  50. package/mcp-configs/data-platform/email-mcp.json +17 -0
  51. package/mcp-configs/note-knowledge/ApeRAG.json +17 -0
  52. package/mcp-configs/note-knowledge/In-Memoria.json +17 -0
  53. package/mcp-configs/note-knowledge/agent-memory.json +17 -0
  54. package/mcp-configs/note-knowledge/aimemo.json +17 -0
  55. package/mcp-configs/note-knowledge/biel-mcp.json +19 -0
  56. package/mcp-configs/note-knowledge/cognee.json +17 -0
  57. package/mcp-configs/note-knowledge/context-awesome.json +17 -0
  58. package/mcp-configs/note-knowledge/context-mcp.json +17 -0
  59. package/mcp-configs/note-knowledge/conversation-handoff-mcp.json +17 -0
  60. package/mcp-configs/note-knowledge/cortex.json +17 -0
  61. package/mcp-configs/note-knowledge/devrag.json +17 -0
  62. package/mcp-configs/note-knowledge/easy-obsidian-mcp.json +17 -0
  63. package/mcp-configs/note-knowledge/engram.json +17 -0
  64. package/mcp-configs/note-knowledge/gnosis-mcp.json +17 -0
  65. package/mcp-configs/note-knowledge/graphlit-mcp-server.json +19 -0
  66. package/mcp-configs/reference-mgr/arxiv-cli.json +17 -0
  67. package/mcp-configs/reference-mgr/arxiv-search-mcp.json +17 -0
  68. package/mcp-configs/reference-mgr/chiken.json +17 -0
  69. package/mcp-configs/reference-mgr/claude-scholar.json +17 -0
  70. package/mcp-configs/reference-mgr/devonthink-mcp.json +17 -0
  71. package/mcp-configs/registry.json +447 -0
  72. package/openclaw.plugin.json +21 -0
  73. package/package.json +61 -0
  74. package/skills/analysis/dataviz/color-accessibility-guide/SKILL.md +230 -0
  75. package/skills/analysis/dataviz/geospatial-viz-guide/SKILL.md +218 -0
  76. package/skills/analysis/dataviz/interactive-viz-guide/SKILL.md +287 -0
  77. package/skills/analysis/dataviz/network-visualization-guide/SKILL.md +195 -0
  78. package/skills/analysis/dataviz/publication-figures-guide/SKILL.md +238 -0
  79. package/skills/analysis/dataviz/python-dataviz-guide/SKILL.md +195 -0
  80. package/skills/analysis/econometrics/causal-inference-guide/SKILL.md +197 -0
  81. package/skills/analysis/econometrics/iv-regression-guide/SKILL.md +198 -0
  82. package/skills/analysis/econometrics/panel-data-guide/SKILL.md +274 -0
  83. package/skills/analysis/econometrics/robustness-checks/SKILL.md +250 -0
  84. package/skills/analysis/econometrics/stata-regression/SKILL.md +117 -0
  85. package/skills/analysis/econometrics/time-series-guide/SKILL.md +235 -0
  86. package/skills/analysis/statistics/bayesian-statistics-guide/SKILL.md +221 -0
  87. package/skills/analysis/statistics/hypothesis-testing-guide/SKILL.md +210 -0
  88. package/skills/analysis/statistics/meta-analysis-guide/SKILL.md +206 -0
  89. package/skills/analysis/statistics/nonparametric-tests-guide/SKILL.md +221 -0
  90. package/skills/analysis/statistics/power-analysis-guide/SKILL.md +240 -0
  91. package/skills/analysis/statistics/sem-guide/SKILL.md +231 -0
  92. package/skills/analysis/statistics/survival-analysis-guide/SKILL.md +195 -0
  93. package/skills/analysis/wrangling/missing-data-handling/SKILL.md +224 -0
  94. package/skills/analysis/wrangling/pandas-data-wrangling/SKILL.md +242 -0
  95. package/skills/analysis/wrangling/questionnaire-design-guide/SKILL.md +234 -0
  96. package/skills/analysis/wrangling/text-mining-guide/SKILL.md +225 -0
  97. package/skills/domains/ai-ml/computer-vision-guide/SKILL.md +213 -0
  98. package/skills/domains/ai-ml/deep-learning-papers-guide/SKILL.md +200 -0
  99. package/skills/domains/ai-ml/llm-evaluation-guide/SKILL.md +194 -0
  100. package/skills/domains/ai-ml/prompt-engineering-research/SKILL.md +233 -0
  101. package/skills/domains/ai-ml/reinforcement-learning-guide/SKILL.md +254 -0
  102. package/skills/domains/ai-ml/transformer-architecture-guide/SKILL.md +233 -0
  103. package/skills/domains/biomedical/clinical-research-guide/SKILL.md +232 -0
  104. package/skills/domains/biomedical/clinicaltrials-api/SKILL.md +177 -0
  105. package/skills/domains/biomedical/epidemiology-guide/SKILL.md +200 -0
  106. package/skills/domains/biomedical/genomics-analysis-guide/SKILL.md +270 -0
  107. package/skills/domains/business/market-analysis-guide/SKILL.md +112 -0
  108. package/skills/domains/business/strategic-management-guide/SKILL.md +154 -0
  109. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +266 -0
  110. package/skills/domains/chemistry/retrosynthesis-guide/SKILL.md +215 -0
  111. package/skills/domains/cs/algorithms-complexity-guide/SKILL.md +194 -0
  112. package/skills/domains/cs/dblp-api/SKILL.md +129 -0
  113. package/skills/domains/cs/software-engineering-research/SKILL.md +218 -0
  114. package/skills/domains/ecology/biodiversity-data-guide/SKILL.md +296 -0
  115. package/skills/domains/ecology/conservation-biology-guide/SKILL.md +198 -0
  116. package/skills/domains/ecology/gbif-api/SKILL.md +158 -0
  117. package/skills/domains/ecology/inaturalist-api/SKILL.md +173 -0
  118. package/skills/domains/economics/behavioral-economics-guide/SKILL.md +239 -0
  119. package/skills/domains/economics/development-economics-guide/SKILL.md +181 -0
  120. package/skills/domains/economics/fred-api/SKILL.md +189 -0
  121. package/skills/domains/education/curriculum-design-guide/SKILL.md +144 -0
  122. package/skills/domains/education/learning-science-guide/SKILL.md +150 -0
  123. package/skills/domains/finance/financial-data-analysis/SKILL.md +152 -0
  124. package/skills/domains/finance/quantitative-finance-guide/SKILL.md +151 -0
  125. package/skills/domains/geoscience/climate-science-guide/SKILL.md +158 -0
  126. package/skills/domains/geoscience/gis-remote-sensing-guide/SKILL.md +129 -0
  127. package/skills/domains/humanities/digital-humanities-guide/SKILL.md +181 -0
  128. package/skills/domains/humanities/philosophy-research-guide/SKILL.md +148 -0
  129. package/skills/domains/law/courtlistener-api/SKILL.md +213 -0
  130. package/skills/domains/law/legal-research-guide/SKILL.md +250 -0
  131. package/skills/domains/math/linear-algebra-applications/SKILL.md +227 -0
  132. package/skills/domains/math/numerical-methods-guide/SKILL.md +236 -0
  133. package/skills/domains/math/oeis-api/SKILL.md +158 -0
  134. package/skills/domains/pharma/clinical-pharmacology-guide/SKILL.md +165 -0
  135. package/skills/domains/pharma/drug-development-guide/SKILL.md +177 -0
  136. package/skills/domains/physics/computational-physics-guide/SKILL.md +300 -0
  137. package/skills/domains/physics/nasa-ads-api/SKILL.md +150 -0
  138. package/skills/domains/physics/quantum-computing-guide/SKILL.md +234 -0
  139. package/skills/domains/social-science/social-research-methods/SKILL.md +194 -0
  140. package/skills/domains/social-science/survey-research-guide/SKILL.md +182 -0
  141. package/skills/literature/discovery/citation-alert-guide/SKILL.md +154 -0
  142. package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +142 -0
  143. package/skills/literature/discovery/literature-mapping-guide/SKILL.md +175 -0
  144. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +211 -0
  145. package/skills/literature/discovery/rss-paper-feeds/SKILL.md +214 -0
  146. package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +164 -0
  147. package/skills/literature/fulltext/doaj-api/SKILL.md +120 -0
  148. package/skills/literature/fulltext/interlibrary-loan-guide/SKILL.md +163 -0
  149. package/skills/literature/fulltext/open-access-guide/SKILL.md +183 -0
  150. package/skills/literature/fulltext/pmc-oai-api/SKILL.md +184 -0
  151. package/skills/literature/fulltext/preprint-servers-guide/SKILL.md +128 -0
  152. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +207 -0
  153. package/skills/literature/fulltext/unpaywall-api/SKILL.md +113 -0
  154. package/skills/literature/metadata/altmetrics-guide/SKILL.md +132 -0
  155. package/skills/literature/metadata/citation-network-guide/SKILL.md +236 -0
  156. package/skills/literature/metadata/crossref-api/SKILL.md +133 -0
  157. package/skills/literature/metadata/datacite-api/SKILL.md +126 -0
  158. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +168 -0
  159. package/skills/literature/metadata/h-index-guide/SKILL.md +183 -0
  160. package/skills/literature/metadata/journal-metrics-guide/SKILL.md +188 -0
  161. package/skills/literature/metadata/opencitations-api/SKILL.md +128 -0
  162. package/skills/literature/metadata/orcid-api/SKILL.md +136 -0
  163. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +178 -0
  164. package/skills/literature/search/arxiv-api/SKILL.md +95 -0
  165. package/skills/literature/search/biorxiv-api/SKILL.md +123 -0
  166. package/skills/literature/search/boolean-search-guide/SKILL.md +199 -0
  167. package/skills/literature/search/citation-chaining-guide/SKILL.md +148 -0
  168. package/skills/literature/search/database-comparison-guide/SKILL.md +100 -0
  169. package/skills/literature/search/europe-pmc-api/SKILL.md +120 -0
  170. package/skills/literature/search/google-scholar-guide/SKILL.md +182 -0
  171. package/skills/literature/search/mesh-terms-guide/SKILL.md +164 -0
  172. package/skills/literature/search/openalex-api/SKILL.md +134 -0
  173. package/skills/literature/search/pubmed-api/SKILL.md +130 -0
  174. package/skills/literature/search/scientify-literature-survey/SKILL.md +203 -0
  175. package/skills/literature/search/semantic-scholar-api/SKILL.md +134 -0
  176. package/skills/literature/search/systematic-search-strategy/SKILL.md +214 -0
  177. package/skills/research/automation/ai-scientist-guide/SKILL.md +228 -0
  178. package/skills/research/automation/data-collection-automation/SKILL.md +248 -0
  179. package/skills/research/automation/research-workflow-automation/SKILL.md +266 -0
  180. package/skills/research/deep-research/meta-synthesis-guide/SKILL.md +174 -0
  181. package/skills/research/deep-research/research-cog/SKILL.md +153 -0
  182. package/skills/research/deep-research/scoping-review-guide/SKILL.md +217 -0
  183. package/skills/research/deep-research/systematic-review-guide/SKILL.md +250 -0
  184. package/skills/research/funding/figshare-api/SKILL.md +163 -0
  185. package/skills/research/funding/grant-writing-guide/SKILL.md +233 -0
  186. package/skills/research/funding/nsf-grant-guide/SKILL.md +206 -0
  187. package/skills/research/funding/open-science-guide/SKILL.md +255 -0
  188. package/skills/research/funding/zenodo-api/SKILL.md +174 -0
  189. package/skills/research/methodology/action-research-guide/SKILL.md +201 -0
  190. package/skills/research/methodology/experimental-design-guide/SKILL.md +236 -0
  191. package/skills/research/methodology/grad-school-guide/SKILL.md +182 -0
  192. package/skills/research/methodology/grounded-theory-guide/SKILL.md +171 -0
  193. package/skills/research/methodology/mixed-methods-guide/SKILL.md +208 -0
  194. package/skills/research/methodology/qualitative-research-guide/SKILL.md +234 -0
  195. package/skills/research/methodology/scientify-idea-generation/SKILL.md +222 -0
  196. package/skills/research/paper-review/paper-reading-assistant/SKILL.md +266 -0
  197. package/skills/research/paper-review/peer-review-guide/SKILL.md +227 -0
  198. package/skills/research/paper-review/rebuttal-writing-guide/SKILL.md +185 -0
  199. package/skills/research/paper-review/scientify-write-review-paper/SKILL.md +209 -0
  200. package/skills/tools/code-exec/jupyter-notebook-guide/SKILL.md +178 -0
  201. package/skills/tools/code-exec/python-reproducibility-guide/SKILL.md +341 -0
  202. package/skills/tools/code-exec/r-reproducibility-guide/SKILL.md +236 -0
  203. package/skills/tools/code-exec/sandbox-execution-guide/SKILL.md +221 -0
  204. package/skills/tools/diagram/mermaid-diagram-guide/SKILL.md +269 -0
  205. package/skills/tools/diagram/plantuml-guide/SKILL.md +397 -0
  206. package/skills/tools/diagram/scientific-illustration-guide/SKILL.md +225 -0
  207. package/skills/tools/document/anystyle-api/SKILL.md +199 -0
  208. package/skills/tools/document/grobid-pdf-parsing/SKILL.md +294 -0
  209. package/skills/tools/document/markdown-academic-guide/SKILL.md +217 -0
  210. package/skills/tools/document/pdf-extraction-guide/SKILL.md +321 -0
  211. package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +306 -0
  212. package/skills/tools/knowledge-graph/ontology-design-guide/SKILL.md +214 -0
  213. package/skills/tools/knowledge-graph/rag-methodology-guide/SKILL.md +325 -0
  214. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +367 -0
  215. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +211 -0
  216. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +204 -0
  217. package/skills/tools/ocr-translate/multilingual-research-guide/SKILL.md +234 -0
  218. package/skills/tools/scraping/academic-web-scraping/SKILL.md +326 -0
  219. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +301 -0
  220. package/skills/tools/scraping/web-scraping-ethics-guide/SKILL.md +250 -0
  221. package/skills/writing/citation/bibtex-management-guide/SKILL.md +246 -0
  222. package/skills/writing/citation/citation-style-guide/SKILL.md +248 -0
  223. package/skills/writing/citation/reference-manager-comparison/SKILL.md +208 -0
  224. package/skills/writing/citation/zotero-api/SKILL.md +188 -0
  225. package/skills/writing/composition/abstract-writing-guide/SKILL.md +188 -0
  226. package/skills/writing/composition/discussion-writing-guide/SKILL.md +194 -0
  227. package/skills/writing/composition/introduction-writing-guide/SKILL.md +194 -0
  228. package/skills/writing/composition/literature-review-writing/SKILL.md +196 -0
  229. package/skills/writing/composition/methods-section-guide/SKILL.md +185 -0
  230. package/skills/writing/composition/response-to-reviewers/SKILL.md +215 -0
  231. package/skills/writing/composition/scientific-writing-guide/SKILL.md +152 -0
  232. package/skills/writing/latex/bibliography-management-guide/SKILL.md +206 -0
  233. package/skills/writing/latex/latex-drawing-guide/SKILL.md +234 -0
  234. package/skills/writing/latex/latex-ecosystem-guide/SKILL.md +240 -0
  235. package/skills/writing/latex/math-typesetting-guide/SKILL.md +231 -0
  236. package/skills/writing/latex/overleaf-collaboration-guide/SKILL.md +211 -0
  237. package/skills/writing/latex/tikz-diagrams-guide/SKILL.md +211 -0
  238. package/skills/writing/polish/academic-translation-guide/SKILL.md +175 -0
  239. package/skills/writing/polish/academic-writing-refiner/SKILL.md +143 -0
  240. package/skills/writing/polish/ai-writing-humanizer/SKILL.md +178 -0
  241. package/skills/writing/polish/grammar-checker-guide/SKILL.md +184 -0
  242. package/skills/writing/polish/plagiarism-detection-guide/SKILL.md +167 -0
  243. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +263 -0
  244. package/skills/writing/templates/conference-paper-template/SKILL.md +219 -0
  245. package/skills/writing/templates/thesis-template-guide/SKILL.md +200 -0
  246. package/skills/writing/templates/thesis-writing-guide/SKILL.md +220 -0
  247. package/src/tools/arxiv.ts +131 -0
  248. package/src/tools/crossref.ts +112 -0
  249. package/src/tools/openalex.ts +174 -0
  250. package/src/tools/pubmed.ts +166 -0
  251. package/src/tools/semantic-scholar.ts +108 -0
  252. package/src/tools/unpaywall.ts +58 -0
package/skills/analysis/wrangling/missing-data-handling/SKILL.md
@@ -0,0 +1,224 @@
---
name: missing-data-handling
description: "Diagnose missing data patterns and apply appropriate imputation strategies"
metadata:
  openclaw:
    emoji: "🧩"
    category: "analysis"
    subcategory: "wrangling"
    keywords: ["missing value imputation", "missing data handling", "outlier detection", "data cleaning", "multiple imputation"]
    source: "wentor"
---

# Missing Data Handling

A skill for diagnosing missing data mechanisms, selecting appropriate imputation strategies, and conducting sensitivity analyses. Covers everything from simple imputation to multiple imputation and modern machine learning approaches.

## Missing Data Mechanisms

### Rubin's Classification

Understanding the mechanism determines the appropriate handling strategy:

| Mechanism | Definition | Example | Implication |
|-----------|-----------|---------|-------------|
| MCAR | Missingness unrelated to any variable | Lab sample randomly contaminated | Listwise deletion is unbiased (but loses power) |
| MAR | Missingness related to observed variables | Older respondents skip the income question more often (age is observed) | Multiple imputation is appropriate |
| MNAR | Missingness related to the missing value itself | Depressed patients drop out of a depression study | Requires sensitivity analysis; no simple fix |

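The distinction is easier to see in simulation. A minimal sketch (hypothetical `age`/`income` variables, fixed seed): the complete-case mean stays close to the truth only under MCAR.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 10_000

# Complete data: age is always observed; income will be masked
age = rng.normal(45, 12, n)
income = 30_000 + 800 * age + rng.normal(0, 10_000, n)
df = pd.DataFrame({'age': age, 'income': income})

# MCAR: every value has the same 30% chance of being missing
mcar = df['income'].mask(rng.random(n) < 0.30)

# MAR: missingness depends only on the observed age (older -> more missing)
p_mar = 1 / (1 + np.exp(-(age - 45) / 10))
mar = df['income'].mask(rng.random(n) < p_mar)

# MNAR: missingness depends on the unobserved income value itself
z = (income - income.mean()) / income.std()
mnar = df['income'].mask(rng.random(n) < 1 / (1 + np.exp(-z)))

# Complete-case means: close to truth under MCAR, biased under MAR/MNAR
print(f"true mean: {df['income'].mean():,.0f}")
print(f"MCAR:      {mcar.mean():,.0f}")
print(f"MAR:       {mar.mean():,.0f}")
print(f"MNAR:      {mnar.mean():,.0f}")
```

Because age predicts income in this simulation, dropping the MAR rows also skews the income mean, but the bias can be recovered from the observed `age` values by an imputation model.
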
### Diagnosing the Mechanism

```python
import pandas as pd
import numpy as np
from scipy import stats

def diagnose_missing_data(df: pd.DataFrame) -> dict:
    """Diagnose missing data patterns and mechanism."""
    n_rows, n_cols = df.shape
    results = {
        'total_cells': n_rows * n_cols,
        'total_missing': df.isnull().sum().sum(),
        'pct_missing': (df.isnull().sum().sum() / (n_rows * n_cols)) * 100,
        'by_column': {}
    }

    for col in df.columns:
        n_missing = df[col].isnull().sum()
        pct = n_missing / n_rows * 100
        results['by_column'][col] = {
            'n_missing': n_missing,
            'pct_missing': round(pct, 2)
        }

    # Approximation of Little's MCAR test:
    # compare means of other variables between missing/non-missing groups
    mcar_tests = {}
    for col in df.columns:
        if df[col].isnull().sum() > 0:
            missing_mask = df[col].isnull()
            for other_col in df.select_dtypes(include=[np.number]).columns:
                if other_col != col and df[other_col].isnull().sum() == 0:
                    group_missing = df.loc[missing_mask, other_col]
                    group_observed = df.loc[~missing_mask, other_col]
                    if len(group_missing) > 1 and len(group_observed) > 1:
                        t_stat, p_val = stats.ttest_ind(group_missing, group_observed)
                        mcar_tests[f'{col}_vs_{other_col}'] = {
                            't': round(t_stat, 3),
                            'p': round(p_val, 4)
                        }

    significant_diffs = sum(1 for v in mcar_tests.values() if v['p'] < 0.05)
    results['mcar_assessment'] = (
        'Likely MCAR' if significant_diffs == 0
        else f'Likely NOT MCAR ({significant_diffs} significant differences found)'
    )
    results['mcar_tests'] = mcar_tests

    return results
```

## Imputation Methods

### Simple Imputation

```python
def simple_imputation(df: pd.DataFrame, strategy: str = 'mean',
                      fill_value=0) -> pd.DataFrame:
    """
    Apply simple imputation strategies.

    Args:
        strategy: 'mean', 'median', 'mode', 'constant', or 'forward_fill'
        fill_value: value used when strategy='constant'
    """
    imputed = df.copy()

    for col in imputed.columns:
        if imputed[col].isnull().any():
            if strategy == 'mean' and np.issubdtype(imputed[col].dtype, np.number):
                imputed[col] = imputed[col].fillna(imputed[col].mean())
            elif strategy == 'median' and np.issubdtype(imputed[col].dtype, np.number):
                imputed[col] = imputed[col].fillna(imputed[col].median())
            elif strategy == 'mode':
                imputed[col] = imputed[col].fillna(imputed[col].mode()[0])
            elif strategy == 'constant':
                imputed[col] = imputed[col].fillna(fill_value)
            elif strategy == 'forward_fill':
                imputed[col] = imputed[col].ffill()

    return imputed
```

### Multiple Imputation (MICE)

The gold standard for MAR data:

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

def multiple_imputation(df: pd.DataFrame, n_imputations: int = 20,
                        max_iter: int = 50) -> list[pd.DataFrame]:
    """
    Perform Multiple Imputation by Chained Equations (MICE).

    Args:
        df: DataFrame with missing values (numeric columns only)
        n_imputations: Number of imputed datasets (>=20 recommended)
        max_iter: Maximum iterations per imputation
    Returns:
        List of completed DataFrames
    """
    imputed_datasets = []

    for i in range(n_imputations):
        imputer = IterativeImputer(
            estimator=BayesianRidge(),
            max_iter=max_iter,
            random_state=i,
            sample_posterior=True  # Important for proper MI
        )
        imputed_data = imputer.fit_transform(df)
        imputed_df = pd.DataFrame(imputed_data, columns=df.columns, index=df.index)
        imputed_datasets.append(imputed_df)

    return imputed_datasets


def pool_mi_results(estimates: list[float], variances: list[float]) -> dict:
    """
    Pool results across multiply imputed datasets using Rubin's rules.

    Args:
        estimates: Parameter estimate from each imputed dataset
        variances: Variance of the estimate from each imputed dataset
    """
    m = len(estimates)
    q_bar = np.mean(estimates)     # Pooled estimate
    u_bar = np.mean(variances)     # Within-imputation variance
    b = np.var(estimates, ddof=1)  # Between-imputation variance

    # Total variance
    total_var = u_bar + (1 + 1/m) * b

    # Fraction of missing information and degrees of freedom (Rubin's 1987 formula)
    lambda_hat = ((1 + 1/m) * b) / total_var
    df_old = (m - 1) / lambda_hat**2

    se = np.sqrt(total_var)
    ci = (q_bar - 1.96*se, q_bar + 1.96*se)  # Normal approximation; use t(df) when df is small

    return {
        'pooled_estimate': q_bar,
        'pooled_se': se,
        'ci_95': ci,
        'df': df_old,
        'fraction_missing_info': lambda_hat,
        'relative_efficiency': 1 / (1 + lambda_hat/m)
    }
```
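A hedged end-to-end sketch of how MICE and Rubin's rules fit together (simulated MAR data; the per-dataset statistic here is just a mean with variance s²/n, standing in for a model coefficient and its squared standard error):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n, m = 1_000, 20

# Simulate MAR data: y goes missing more often when x is large
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)
df = pd.DataFrame({'x': x, 'y': y})
df.loc[rng.random(n) < 1 / (1 + np.exp(-x)), 'y'] = np.nan

# One estimate (the mean of y) and its variance per imputed dataset
estimates, variances = [], []
for i in range(m):
    imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=25,
                               sample_posterior=True, random_state=i)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    estimates.append(completed['y'].mean())
    variances.append(completed['y'].var(ddof=1) / n)

# Rubin's rules (the same algebra as pool_mi_results)
q_bar = np.mean(estimates)
u_bar = np.mean(variances)
b = np.var(estimates, ddof=1)
total_var = u_bar + (1 + 1/m) * b
print(f"complete-case mean: {df['y'].mean():.3f}")  # biased low by construction
print(f"pooled MI mean:     {q_bar:.3f} +/- {1.96 * np.sqrt(total_var):.3f}")
```

The complete-case estimate drifts away from the true mean of zero because large-x (hence large-y) rows are preferentially dropped; the pooled MI estimate recovers it from the observed `x` values.
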

## Outlier Detection

### Statistical Methods

```python
def detect_outliers(series: pd.Series, method: str = 'iqr') -> pd.Series:
    """
    Detect outliers using the specified method.

    Returns a boolean mask where True indicates an outlier.
    """
    if method == 'iqr':
        q1 = series.quantile(0.25)
        q3 = series.quantile(0.75)
        iqr = q3 - q1
        lower = q1 - 1.5 * iqr
        upper = q3 + 1.5 * iqr
        return (series < lower) | (series > upper)

    elif method == 'zscore':
        z = np.abs((series - series.mean()) / series.std())
        return z > 3

    elif method == 'mad':
        median = series.median()
        mad = (series - median).abs().median()  # pandas methods skip NaN
        modified_z = 0.6745 * (series - median) / (mad + 1e-10)
        return np.abs(modified_z) > 3.5

    else:
        raise ValueError(f"Unknown method: {method}")
```

## Reporting Standards

When reporting missing data handling in a paper:

1. Report the amount and pattern of missing data (by variable and overall)
2. State the assumed mechanism (MCAR/MAR/MNAR) with justification
3. Describe the imputation method and software used
4. Report the number of imputations (for MI)
5. Conduct sensitivity analyses (e.g., compare results from complete-case, single imputation, and multiple imputation)
6. Report results using Rubin's pooling rules for MI

Never simply delete missing data without justification. Even for MCAR data, listwise deletion reduces statistical power and is rarely the best choice.
package/skills/analysis/wrangling/pandas-data-wrangling/SKILL.md
@@ -0,0 +1,242 @@
---
name: pandas-data-wrangling
description: "Data cleaning, transformation, and exploratory analysis with pandas"
metadata:
  openclaw:
    emoji: "🧹"
    category: "analysis"
    subcategory: "wrangling"
    keywords: ["CSV data analyzer", "pandas", "data wrangling", "exploratory data analysis", "missing value imputation"]
    source: "N/A"
---

# Pandas Data Wrangling Guide

## Overview

Data wrangling -- the process of cleaning, transforming, and preparing raw data for analysis -- is often estimated to consume 60-80% of a data scientist's time. Pandas is the de facto standard library for tabular data manipulation in Python, and mastering its idioms translates directly into faster, more reliable research workflows.

This guide covers the essential pandas operations that researchers encounter daily: loading heterogeneous data sources, diagnosing data quality issues, handling missing values, reshaping data for analysis, and performing exploratory data analysis (EDA). Each section includes copy-paste code examples designed for real-world research datasets.

Whether you are cleaning survey responses, preprocessing experimental logs, merging datasets from multiple sources, or preparing features for machine learning, the patterns here will save hours of trial and error.
## Loading and Inspecting Data

### Reading Common Formats

```python
import pandas as pd
import numpy as np

# CSV with encoding and date parsing
df = pd.read_csv('data.csv', encoding='utf-8',
                 parse_dates=['timestamp'],
                 dtype={'participant_id': str})

# Excel with specific sheet
df = pd.read_excel('data.xlsx', sheet_name='Experiment1',
                   header=1)  # Skip first row

# JSON (nested); json_data is an already-parsed dict or list
df = pd.json_normalize(json_data, record_path='results',
                       meta=['experiment_id', 'date'])

# Parquet (fast, columnar)
df = pd.read_parquet('data.parquet')
```

### Initial Diagnostics

```python
# Shape and types
print(f"Shape: {df.shape}")
print(df.dtypes)
df.info(memory_usage='deep')  # prints directly; returns None

# Statistical summary
print(df.describe(include='all'))

# Missing value report
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(1)
missing_report = pd.DataFrame({
    'count': missing,
    'percent': missing_pct
}).query('count > 0').sort_values('percent', ascending=False)
print(missing_report)

# Duplicate check
n_dupes = df.duplicated().sum()
print(f"Duplicate rows: {n_dupes}")
```

## Handling Missing Data

### Strategy Decision Tree

| Situation | Strategy | pandas Method |
|-----------|----------|---------------|
| < 5% missing, random | Drop rows | `df.dropna()` |
| Numeric, moderate missing | Mean/median imputation | `df.fillna(df.median())` |
| Categorical missing | Mode or "Unknown" | `df.fillna('Unknown')` |
| Time series gaps | Forward/backward fill | `df.ffill()` / `df.bfill()` |
| Systematic missing | Multiple imputation | `sklearn.impute.IterativeImputer` |
| Feature > 50% missing | Drop column | `df.drop(columns=[...])` |

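The decision tree above can be sketched as a per-column dispatcher. This is a hypothetical helper (`choose_strategy` and `drop_threshold` are not pandas names); it only labels columns, leaving the actual imputation to the caller:

```python
import numpy as np
import pandas as pd

def choose_strategy(df: pd.DataFrame, drop_threshold: float = 0.5) -> dict:
    """Map each column with missing values to a strategy from the table above."""
    plan = {}
    for col in df.columns:
        pct = df[col].isna().mean()
        if pct == 0:
            continue  # nothing to do
        if pct > drop_threshold:
            plan[col] = 'drop_column'          # feature mostly missing
        elif pd.api.types.is_numeric_dtype(df[col]):
            plan[col] = 'drop_rows' if pct < 0.05 else 'median'
        else:
            plan[col] = 'mode_or_unknown'      # categorical fallback
    return plan

demo = pd.DataFrame({
    'age': [25, np.nan, 40, 31, 29, np.nan, 50, 33, 41, 38],  # 20% missing
    'city': ['A', 'B', None, 'A', 'B', 'A', None, 'B', 'A', None],
    'mostly_gone': [1.0] + [np.nan] * 9,                      # 90% missing
})
print(choose_strategy(demo))
# {'age': 'median', 'city': 'mode_or_unknown', 'mostly_gone': 'drop_column'}
```
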
### Implementation Examples

```python
# Conditional imputation: group-wise median
df['age'] = df['age'].fillna(df.groupby('group')['age'].transform('median'))

# Interpolation for time series (requires a DatetimeIndex)
df['temperature'] = df['temperature'].interpolate(method='time')

# Flag missing values before imputing (preserve information)
df['salary_missing'] = df['salary'].isnull().astype(int)
df['salary'] = df['salary'].fillna(df['salary'].median())
```
+
100
+ ## Data Transformation
101
+
102
+ ### Type Conversion and Cleaning
103
+
104
+ ```python
105
+ # String cleaning
106
+ df['name'] = df['name'].str.strip().str.lower()
107
+ df['email'] = df['email'].str.replace(r'\s+', '', regex=True)
108
+
109
+ # Categorical conversion (saves memory, enables ordering)
110
+ df['education'] = pd.Categorical(
111
+ df['education'],
112
+ categories=['high_school', 'bachelors', 'masters', 'phd'],
113
+ ordered=True
114
+ )
115
+
116
+ # Numeric extraction from text
117
+ df['value'] = df['text_field'].str.extract(r'(\d+\.?\d*)').astype(float)
118
+ ```
119
+
120
+ ### Reshaping Operations
121
+
122
+ ```python
123
+ # Wide to long (unpivot)
124
+ df_long = pd.melt(df,
125
+ id_vars=['subject_id', 'condition'],
126
+ value_vars=['score_t1', 'score_t2', 'score_t3'],
127
+ var_name='timepoint',
128
+ value_name='score'
129
+ )
130
+
131
+ # Long to wide (pivot)
132
+ df_wide = df_long.pivot_table(
133
+ index='subject_id',
134
+ columns='condition',
135
+ values='score',
136
+ aggfunc='mean'
137
+ ).reset_index()
138
+
139
+ # Cross-tabulation
140
+ ct = pd.crosstab(df['group'], df['outcome'],
141
+ margins=True, normalize='index')
142
+ ```

### Merging and Joining

```python
# Left join with validation
merged = pd.merge(
    experiments, participants,
    on='participant_id',
    how='left',
    validate='many_to_one',  # catch unexpected duplicate keys
    indicator=True           # adds a _merge column
)

# Check merge quality
print(merged['_merge'].value_counts())
```

## Exploratory Data Analysis (EDA)

### Automated EDA Pipeline

```python
import numpy as np
import pandas as pd

def quick_eda(df, target_col=None):
    """Run a quick EDA pipeline on a DataFrame."""
    print(f"=== Shape: {df.shape} ===\n")

    # Numeric columns
    numeric_cols = df.select_dtypes(include=np.number).columns
    print(f"Numeric columns ({len(numeric_cols)}):")
    print(df[numeric_cols].describe().round(2))

    # Categorical columns
    cat_cols = df.select_dtypes(include=['object', 'category']).columns
    print(f"\nCategorical columns ({len(cat_cols)}):")
    for col in cat_cols:
        n_unique = df[col].nunique()
        print(f"  {col}: {n_unique} unique values")
        if n_unique <= 10:
            print(f"    {df[col].value_counts().to_dict()}")

    # Correlations with the target column
    if target_col and target_col in numeric_cols:
        corr = df[numeric_cols].corr()[target_col].drop(target_col)
        print(f"\nCorrelations with '{target_col}':")
        print(corr.sort_values(ascending=False).round(3))

quick_eda(df, target_col='accuracy')
```

### GroupBy Aggregations

```python
# Multi-metric summary by group (named aggregations)
summary = df.groupby('method').agg(
    mean_acc=('accuracy', 'mean'),
    std_acc=('accuracy', 'std'),
    median_time=('runtime_sec', 'median'),
    n_runs=('run_id', 'count')
).round(3).sort_values('mean_acc', ascending=False)

print(summary.to_markdown())
```

## Performance Optimization

| Technique | When to Use | Typical Benefit |
|-----------|-------------|-----------------|
| `pd.Categorical` for strings | Repeated string values | 2-10x less memory |
| `.query()` instead of boolean indexing | Complex filters | 1.5-3x faster |
| `pd.eval()` for arithmetic | Column arithmetic | 2-5x faster |
| Parquet instead of CSV | Large datasets | 5-20x faster I/O |
| `df.pipe()` for chaining | Readable pipelines | Clarity |

```python
# Method chaining keeps each step readable top-to-bottom
result = (
    df
    .query('score > 0')
    .assign(log_score=lambda x: np.log1p(x['score']))
    .groupby('group')
    .agg(mean_log=('log_score', 'mean'))
    .sort_values('mean_log', ascending=False)
)
```

## Best Practices

- **Never modify the original DataFrame in place.** Use `.copy()` when creating derived datasets.
- **Use method chaining for readability.** Pipe operations together instead of creating intermediate variables.
- **Document your cleaning steps.** Keep a data cleaning log or use a Jupyter notebook with explanations.
- **Validate after every merge.** Check row counts, null values, and the `_merge` indicator column.
- **Profile before optimizing.** Use `df.memory_usage(deep=True)` to identify memory bottlenecks.
- **Save intermediate results as Parquet.** It preserves dtypes and is much faster than CSV.
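
The profiling advice above is easy to verify directly; a minimal sketch (the column names and sizes are illustrative):

```python
import pandas as pd

# A frame with a low-cardinality string column (illustrative data)
df = pd.DataFrame({"group": ["control", "treatment"] * 50_000,
                   "score": range(100_000)})

# Profile first: deep=True counts the actual string payloads
before = int(df.memory_usage(deep=True)["group"])

# Low-cardinality strings are prime candidates for Categorical
df["group"] = df["group"].astype("category")
after = int(df.memory_usage(deep=True)["group"])

print(f"group column: {before:,} bytes -> {after:,} bytes")
```

The same profile-then-convert loop applies to any object-dtype column with few distinct values.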

## References

- [pandas Documentation](https://pandas.pydata.org/docs/) -- Official reference
- [Python for Data Analysis, 3rd Edition](https://wesmckinney.com/book/) -- Wes McKinney
- [Effective Pandas](https://store.metasnake.com/effective-pandas-book) -- Matt Harrison
- [pandas Cookbook](https://github.com/jvns/pandas-cookbook) -- Julia Evans
@@ -0,0 +1,234 @@
---
name: questionnaire-design-guide
description: "Questionnaire and survey design with Likert scales and coding"
metadata:
  openclaw:
    emoji: "clipboard"
    category: "analysis"
    subcategory: "wrangling"
    keywords: ["questionnaire design", "survey design", "Likert scale", "data transformation"]
    source: "wentor-research-plugins"
---

# Questionnaire Design Guide

Design valid and reliable survey instruments with proper question types, Likert scale construction, response coding, and data preparation for analysis.

## Survey Design Principles

### Question Types

| Type | Example | Best For | Analysis |
|------|---------|----------|----------|
| **Likert scale** | "Rate your agreement: 1-5" | Attitudes, perceptions | Ordinal/interval statistics |
| **Multiple choice** | "Select your field" | Demographics, categories | Frequencies, chi-square |
| **Ranking** | "Rank these 5 options" | Preferences, priorities | Rank correlations |
| **Open-ended** | "Describe your experience" | Exploratory, rich data | Qualitative coding |
| **Matrix/grid** | Multiple items, same scale | Efficient battery of items | Factor analysis, reliability |
| **Slider/VAS** | 0-100 visual analog scale | Continuous measures | Parametric statistics |
| **Semantic differential** | "Easy __ __ __ __ __ Difficult" | Bipolar attitudes | Factor analysis |

### The Four C's of Good Questions

1. **Clear**: Avoid jargon, double-barreled questions, and ambiguity
2. **Concise**: Keep questions short (ideally under 20 words)
3. **Complete**: Include all relevant response options
4. **Consistent**: Use the same scale direction and format throughout

## Likert Scale Design

### Scale Points

| Points | Scale Example | Recommended Use |
|--------|---------------|-----------------|
| 4-point | Strongly Disagree to Strongly Agree | Forces choice (no neutral), less discriminating |
| 5-point | SD, D, Neutral, A, SA | Most common, good balance of simplicity and discrimination |
| 7-point | SD, D, Somewhat D, Neutral, Somewhat A, A, SA | More discriminating, better for experienced respondents |
| 11-point (0-10) | Not at all to Completely | NPS, continuous-like measures |

### Anchoring Labels

```
5-Point Agreement Scale:
1 = Strongly Disagree
2 = Disagree
3 = Neither Agree nor Disagree
4 = Agree
5 = Strongly Agree

5-Point Frequency Scale:
1 = Never
2 = Rarely
3 = Sometimes
4 = Often
5 = Always

5-Point Satisfaction Scale:
1 = Very Dissatisfied
2 = Dissatisfied
3 = Neutral
4 = Satisfied
5 = Very Satisfied
```

### Reverse-Coded Items

Include 2-3 reverse-coded items per construct to detect acquiescence bias:

```
Regular:  "I find research methods interesting." (1-5: SD to SA)
Reversed: "I find research methods tedious and dull." (1-5: SD to SA)

# Recode reversed items before analysis:
# reversed_score = (max_scale + 1) - raw_score
# For a 5-point scale: reversed_score = 6 - raw_score
```
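
The recode formula in the comment above can be wrapped in a tiny helper (the function name is illustrative):

```python
def reverse_code(raw_score: int, max_scale: int = 5) -> int:
    """Recode a reverse-worded item: reversed = (max_scale + 1) - raw."""
    if not 1 <= raw_score <= max_scale:
        raise ValueError(f"raw_score must be in 1..{max_scale}")
    return (max_scale + 1) - raw_score

# 'Strongly Agree' (5) on a negatively worded 5-point item becomes 1
print(reverse_code(5))               # -> 1
print(reverse_code(2, max_scale=7))  # -> 6
```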

## Constructing a Multi-Item Scale

### Step-by-Step Process

1. **Define the construct**: Write a clear conceptual definition
2. **Generate items**: Write 1.5-2x the number of items you plan to keep (e.g., write 15 items for an 8-item scale)
3. **Expert review**: Have 3-5 experts rate each item for relevance (Content Validity Index)
4. **Pilot test**: Administer to 30-50 respondents
5. **Item analysis**: Calculate item-total correlations, check reliability
6. **Exploratory Factor Analysis (EFA)**: Confirm dimensionality
7. **Finalize scale**: Remove weak items, re-test reliability

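Step 3 mentions the Content Validity Index; a minimal sketch of I-CVI and S-CVI/Ave under the usual convention (experts rate relevance 1-4 and ratings of 3-4 count as "relevant"; the ratings below are illustrative):

```python
# Relevance ratings (1-4) from 5 experts for 3 candidate items (illustrative)
ratings = {
    "item1": [4, 3, 4, 4, 3],
    "item2": [4, 4, 3, 2, 4],
    "item3": [2, 3, 2, 4, 1],
}

def i_cvi(item_ratings):
    """I-CVI: proportion of experts rating the item 3 or 4."""
    return sum(r >= 3 for r in item_ratings) / len(item_ratings)

icvis = {item: i_cvi(r) for item, r in ratings.items()}
scvi_ave = sum(icvis.values()) / len(icvis)  # S-CVI/Ave: mean of the I-CVIs

for item, v in icvis.items():
    verdict = "keep" if v >= 0.78 else "revise or drop"
    print(f"{item}: I-CVI = {v:.2f} ({verdict})")
print(f"S-CVI/Ave = {scvi_ave:.2f}")
```
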
### Example: Research Self-Efficacy Scale

```
Construct: Belief in one's ability to conduct academic research

Items (5-point Likert, Strongly Disagree to Strongly Agree):
RSE1: I can formulate clear research questions.
RSE2: I can design an appropriate research methodology.
RSE3: I can analyze data using statistical software.
RSE4: I can write a publishable research paper.
RSE5: I can critically evaluate published research.
RSE6: I can present research findings at a conference.
RSE7R: I struggle to interpret statistical results. [REVERSED]
RSE8R: I find it difficult to synthesize literature. [REVERSED]
```

## Data Coding and Preparation

### Coding Scheme

```python
import pandas as pd
import numpy as np

# Define coding scheme
likert_coding = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5
}

# Apply coding
df["Q1_coded"] = df["Q1_raw"].map(likert_coding)

# Reverse code specific items
reverse_items = ["RSE7R", "RSE8R"]
max_scale = 5
for item in reverse_items:
    df[f"{item}_recoded"] = (max_scale + 1) - df[item]

# Calculate composite score (mean of items)
scale_items = ["RSE1", "RSE2", "RSE3", "RSE4", "RSE5", "RSE6",
               "RSE7R_recoded", "RSE8R_recoded"]
df["RSE_mean"] = df[scale_items].mean(axis=1)
```

### Missing Data Handling

```python
# Check missing data patterns
print(df[scale_items].isnull().sum())
print(f"Complete cases: {df[scale_items].dropna().shape[0]} / {df.shape[0]}")

# Common strategies (pick one; shown together here for comparison):
# 1. Listwise deletion (if < 5% missing)
df_complete = df.dropna(subset=scale_items)

# 2. Mean imputation per item (simple but biased toward the mean)
df[scale_items] = df[scale_items].fillna(df[scale_items].mean())

# 3. Person-mean imputation (only when a person is missing a few items,
#    e.g., at most 2 of 8)
def person_mean_impute(row, items, max_missing=2):
    if row[items].isnull().sum() <= max_missing:
        return row[items].fillna(row[items].mean())
    return row[items]  # leave as NaN if too many items are missing

df[scale_items] = df.apply(lambda r: person_mean_impute(r, scale_items), axis=1)
```

## Reliability Analysis

### Cronbach's Alpha

```python
import pingouin as pg

# Calculate Cronbach's alpha; returns (alpha, 95% confidence interval)
alpha = pg.cronbach_alpha(df[scale_items])
print(f"Cronbach's alpha: {alpha[0]:.3f}")
# Interpretation: >= 0.70 acceptable, >= 0.80 good, >= 0.90 excellent
```
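
If pingouin is unavailable, alpha follows directly from its definition using numpy alone (a sketch; the simulated responses are illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated correlated Likert-style responses (illustrative)
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 8))), 1, 5)
print(f"alpha = {cronbach_alpha(items):.3f}")
```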

```r
library(psych)

# Cronbach's alpha with item-level diagnostics
alpha_result <- alpha(data[, scale_items])
print(alpha_result)
# Check "raw_alpha if item dropped" to identify weak items
```
191
+
192
+ ### Item-Total Correlations
193
+
194
+ ```r
195
+ # Corrected item-total correlations (should be > 0.30)
196
+ item_stats <- alpha_result$item.stats
197
+ print(item_stats[, c("r.drop", "raw.alpha")])
198
+ # r.drop < 0.30: consider removing the item
199
+ # raw.alpha increases if dropped: item is weakening the scale
200
+ ```
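
The same diagnostic can be produced in Python with pandas: a corrected item-total correlation is the correlation between an item and the sum of the *other* items (a sketch; the simulated responses are illustrative):

```python
import numpy as np
import pandas as pd

# Simulated one-factor responses for 6 items (illustrative)
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 1))
data = pd.DataFrame(
    np.clip(np.round(3 + latent + rng.normal(scale=0.9, size=(150, 6))), 1, 5),
    columns=[f"RSE{i}" for i in range(1, 7)],
)

# Corrected item-total correlation: item vs. sum of the remaining items
for col in data.columns:
    rest_total = data.drop(columns=col).sum(axis=1)
    r_drop = data[col].corr(rest_total)
    note = "" if r_drop >= 0.30 else "  <- consider removing"
    print(f"{col}: r.drop = {r_drop:.3f}{note}")
```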

## Validity Assessment

| Validity Type | Method | Criterion |
|--------------|--------|-----------|
| **Content validity** | Expert panel rating (CVI) | I-CVI >= 0.78, S-CVI/Ave >= 0.90 |
| **Construct validity** | Exploratory Factor Analysis (EFA) | Eigenvalue > 1, loadings > 0.40 |
| **Convergent validity** | Correlation with related construct | r > 0.30 |
| **Discriminant validity** | Correlation with unrelated construct | r < 0.30 |
| **Criterion validity** | Correlation with external criterion | Significant correlation |
| **Test-retest reliability** | ICC or Pearson r over 2-4 weeks | ICC > 0.70 |
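
The eigenvalue-greater-than-1 rule in the table can be checked from the item correlation matrix with numpy; this principal-components sketch is a rough screen only (a full EFA with rotation would use a dedicated package; the simulated data is illustrative):

```python
import numpy as np

# Simulate a one-factor item battery (illustrative)
rng = np.random.default_rng(7)
n, k = 300, 8
latent = rng.normal(size=(n, 1))
items = latent + rng.normal(scale=0.8, size=(n, k))

corr = np.corrcoef(items, rowvar=False)            # k x k correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending eigenvalues

n_factors = int((eigvals > 1).sum())               # Kaiser criterion
print("eigenvalues:", np.round(eigvals, 2))
print(f"factors retained (eigenvalue > 1): {n_factors}")
```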

## Common Design Mistakes

| Mistake | Example | Fix |
|---------|---------|-----|
| Double-barreled question | "This course is interesting and useful" | Split into two separate items |
| Leading question | "Don't you agree that X is important?" | "How important is X to you?" |
| Absolute terms | "Do you always check citations?" | "How often do you check citations?" |
| Missing option | No "Not Applicable" when needed | Add N/A option or filter logic |
| Inconsistent scale direction | Some items 1=good, others 1=bad | Standardize direction; clearly mark reversed items |
| Too many items | 100-item survey | Aim for 5-8 items per construct, 15-30 min total |
| No pilot test | Skip straight to full deployment | Always pilot with 30-50 respondents |

## Survey Platform Comparison

| Platform | Cost | Features | Best For |
|----------|------|----------|----------|
| Qualtrics | Institutional | Advanced logic, panels, API | Large academic studies |
| SurveyMonkey | Freemium | Easy to use, basic analysis | Quick surveys |
| Google Forms | Free | Simple, integrates with Sheets | Classroom, pilot testing |
| LimeSurvey | Free/self-hosted | Open source, full control | Privacy-sensitive research |
| REDCap | Free (academic) | Clinical data, HIPAA compliant | Medical/clinical research |
| Prolific | Per-response | Participant recruitment | Online experiments |