@wentorai/research-plugins 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (415)
  1. package/README.md +22 -22
  2. package/curated/analysis/README.md +82 -56
  3. package/curated/domains/README.md +225 -69
  4. package/curated/literature/README.md +115 -46
  5. package/curated/research/README.md +106 -58
  6. package/curated/tools/README.md +107 -87
  7. package/curated/writing/README.md +92 -45
  8. package/mcp-configs/academic-db/alphafold-mcp.json +20 -0
  9. package/mcp-configs/academic-db/brightspace-mcp.json +21 -0
  10. package/mcp-configs/academic-db/climatiq-mcp.json +20 -0
  11. package/mcp-configs/academic-db/gibs-mcp.json +20 -0
  12. package/mcp-configs/academic-db/gis-mcp-server.json +22 -0
  13. package/mcp-configs/academic-db/google-earth-engine-mcp.json +21 -0
  14. package/mcp-configs/academic-db/m4-clinical-mcp.json +21 -0
  15. package/mcp-configs/academic-db/medical-mcp.json +21 -0
  16. package/mcp-configs/academic-db/nexonco-mcp.json +20 -0
  17. package/mcp-configs/academic-db/omop-mcp.json +20 -0
  18. package/mcp-configs/academic-db/onekgpd-mcp.json +20 -0
  19. package/mcp-configs/academic-db/openedu-mcp.json +20 -0
  20. package/mcp-configs/academic-db/opengenes-mcp.json +20 -0
  21. package/mcp-configs/academic-db/openstax-mcp.json +21 -0
  22. package/mcp-configs/academic-db/openstreetmap-mcp.json +21 -0
  23. package/mcp-configs/academic-db/opentargets-mcp.json +21 -0
  24. package/mcp-configs/academic-db/pdb-mcp.json +21 -0
  25. package/mcp-configs/academic-db/smithsonian-mcp.json +20 -0
  26. package/mcp-configs/ai-platform/magi-researchers.json +21 -0
  27. package/mcp-configs/ai-platform/mcp-academic-researcher.json +22 -0
  28. package/mcp-configs/ai-platform/open-paper-machine.json +21 -0
  29. package/mcp-configs/ai-platform/paper-intelligence.json +21 -0
  30. package/mcp-configs/ai-platform/paper-reader.json +21 -0
  31. package/mcp-configs/ai-platform/paperdebugger.json +21 -0
  32. package/mcp-configs/browser/exa-mcp.json +20 -0
  33. package/mcp-configs/browser/mcp-searxng.json +21 -0
  34. package/mcp-configs/browser/mcp-webresearch.json +20 -0
  35. package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
  36. package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
  37. package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
  38. package/mcp-configs/communication/discord-mcp.json +29 -0
  39. package/mcp-configs/communication/discourse-mcp.json +21 -0
  40. package/mcp-configs/communication/slack-mcp.json +29 -0
  41. package/mcp-configs/communication/telegram-mcp.json +28 -0
  42. package/mcp-configs/data-platform/automl-stat-mcp.json +21 -0
  43. package/mcp-configs/data-platform/jefferson-stats-mcp.json +22 -0
  44. package/mcp-configs/data-platform/mcp-excel-server.json +21 -0
  45. package/mcp-configs/data-platform/mcp-stata.json +21 -0
  46. package/mcp-configs/data-platform/mcpstack-jupyter.json +21 -0
  47. package/mcp-configs/data-platform/ml-mcp.json +21 -0
  48. package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +20 -0
  49. package/mcp-configs/data-platform/numpy-mcp.json +21 -0
  50. package/mcp-configs/database/neo4j-mcp.json +37 -0
  51. package/mcp-configs/database/postgres-mcp.json +28 -0
  52. package/mcp-configs/database/sqlite-mcp.json +29 -0
  53. package/mcp-configs/dev-platform/geogebra-mcp.json +21 -0
  54. package/mcp-configs/dev-platform/github-mcp.json +31 -0
  55. package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
  56. package/mcp-configs/dev-platform/latex-mcp-server.json +21 -0
  57. package/mcp-configs/dev-platform/manim-mcp.json +20 -0
  58. package/mcp-configs/dev-platform/mcp-echarts.json +20 -0
  59. package/mcp-configs/dev-platform/panel-viz-mcp.json +20 -0
  60. package/mcp-configs/dev-platform/paperbanana.json +20 -0
  61. package/mcp-configs/dev-platform/texflow-mcp.json +20 -0
  62. package/mcp-configs/dev-platform/texmcp.json +20 -0
  63. package/mcp-configs/dev-platform/typst-mcp.json +21 -0
  64. package/mcp-configs/dev-platform/vizro-mcp.json +20 -0
  65. package/mcp-configs/email/email-mcp.json +40 -0
  66. package/mcp-configs/email/gmail-mcp.json +37 -0
  67. package/mcp-configs/note-knowledge/local-faiss-mcp.json +21 -0
  68. package/mcp-configs/note-knowledge/mcp-memory-service.json +21 -0
  69. package/mcp-configs/note-knowledge/mcp-obsidian.json +23 -0
  70. package/mcp-configs/note-knowledge/mcp-ragdocs.json +20 -0
  71. package/mcp-configs/note-knowledge/mcp-summarizer.json +21 -0
  72. package/mcp-configs/note-knowledge/mediawiki-mcp.json +21 -0
  73. package/mcp-configs/note-knowledge/openzim-mcp.json +20 -0
  74. package/mcp-configs/note-knowledge/zettelkasten-mcp.json +21 -0
  75. package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +20 -0
  76. package/mcp-configs/reference-mgr/academix.json +20 -0
  77. package/mcp-configs/reference-mgr/arxiv-research-mcp.json +21 -0
  78. package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +19 -0
  79. package/mcp-configs/reference-mgr/google-scholar-mcp.json +20 -0
  80. package/mcp-configs/reference-mgr/mcp-paperswithcode.json +21 -0
  81. package/mcp-configs/reference-mgr/mcp-scholarly.json +20 -0
  82. package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +20 -0
  83. package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +20 -0
  84. package/mcp-configs/reference-mgr/mcp-zotero.json +21 -0
  85. package/mcp-configs/reference-mgr/mendeley-mcp.json +20 -0
  86. package/mcp-configs/reference-mgr/ncbi-mcp-server.json +22 -0
  87. package/mcp-configs/reference-mgr/onecite.json +21 -0
  88. package/mcp-configs/reference-mgr/paper-search-mcp.json +21 -0
  89. package/mcp-configs/reference-mgr/pubmed-search-mcp.json +21 -0
  90. package/mcp-configs/reference-mgr/scholar-mcp.json +21 -0
  91. package/mcp-configs/reference-mgr/scholar-multi-mcp.json +21 -0
  92. package/mcp-configs/reference-mgr/seerai.json +21 -0
  93. package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +21 -0
  94. package/mcp-configs/reference-mgr/sourcelibrary.json +20 -0
  95. package/mcp-configs/registry.json +178 -149
  96. package/mcp-configs/repository/dataverse-mcp.json +33 -0
  97. package/mcp-configs/repository/huggingface-mcp.json +29 -0
  98. package/openclaw.plugin.json +2 -2
  99. package/package.json +2 -2
  100. package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
  101. package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
  102. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
  103. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +184 -0
  104. package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
  105. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +171 -0
  106. package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
  107. package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
  108. package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
  109. package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
  110. package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
  111. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +192 -0
  112. package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
  113. package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
  114. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +267 -0
  115. package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
  116. package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
  117. package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
  118. package/skills/analysis/econometrics/stata-reference-guide/SKILL.md +293 -0
  119. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
  120. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +226 -0
  121. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +106 -0
  122. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
  123. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
  124. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
  125. package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
  126. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +100 -0
  127. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
  128. package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
  129. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
  130. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +197 -0
  131. package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
  132. package/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md +119 -0
  133. package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
  134. package/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md +146 -0
  135. package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
  136. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
  137. package/skills/domains/ai-ml/anomaly-detection-papers-guide/SKILL.md +167 -0
  138. package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md +178 -0
  139. package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
  140. package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md +173 -0
  141. package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
  142. package/skills/domains/ai-ml/graph-learning-papers-guide/SKILL.md +125 -0
  143. package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
  144. package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
  145. package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
  146. package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
  147. package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
  148. package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
  149. package/skills/domains/ai-ml/npcpy-research-guide/SKILL.md +137 -0
  150. package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
  151. package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
  152. package/skills/domains/ai-ml/responsible-ai-guide/SKILL.md +126 -0
  153. package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
  154. package/skills/domains/ai-ml/vmas-simulator-guide/SKILL.md +129 -0
  155. package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
  156. package/skills/domains/biomedical/clawbio-guide/SKILL.md +167 -0
  157. package/skills/domains/biomedical/clinical-dialogue-agents-guide/SKILL.md +145 -0
  158. package/skills/domains/biomedical/ena-sequence-api/SKILL.md +175 -0
  159. package/skills/domains/biomedical/genomas-guide/SKILL.md +126 -0
  160. package/skills/domains/biomedical/genotex-benchmark-guide/SKILL.md +125 -0
  161. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +161 -0
  162. package/skills/domains/biomedical/med-researcher-r1-guide/SKILL.md +146 -0
  163. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
  164. package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
  165. package/skills/domains/biomedical/ncbi-blast-api/SKILL.md +195 -0
  166. package/skills/domains/biomedical/ncbi-datasets-api/SKILL.md +220 -0
  167. package/skills/domains/biomedical/quickgo-api/SKILL.md +181 -0
  168. package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
  169. package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
  170. package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
  171. package/skills/domains/business/xpert-bi-guide/SKILL.md +84 -0
  172. package/skills/domains/chemistry/cactus-cheminformatics-guide/SKILL.md +89 -0
  173. package/skills/domains/chemistry/chemeagle-guide/SKILL.md +147 -0
  174. package/skills/domains/chemistry/chemgraph-agent-guide/SKILL.md +120 -0
  175. package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
  176. package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
  177. package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
  178. package/skills/domains/cs/ai-security-papers-guide/SKILL.md +103 -0
  179. package/skills/domains/cs/code-llm-papers-guide/SKILL.md +131 -0
  180. package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
  181. package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
  182. package/skills/domains/cs/gaussian-splatting-papers-guide/SKILL.md +158 -0
  183. package/skills/domains/cs/llm-aiops-guide/SKILL.md +70 -0
  184. package/skills/domains/cs/software-heritage-api/SKILL.md +200 -0
  185. package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
  186. package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
  187. package/skills/domains/economics/nber-working-papers-api/SKILL.md +177 -0
  188. package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
  189. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
  190. package/skills/domains/economics/repec-economics-api/SKILL.md +188 -0
  191. package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
  192. package/skills/domains/education/academic-study-methods/SKILL.md +228 -0
  193. package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
  194. package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
  195. package/skills/domains/education/edumcp-guide/SKILL.md +74 -0
  196. package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
  197. package/skills/domains/education/open-syllabus-api/SKILL.md +171 -0
  198. package/skills/domains/finance/akshare-finance-data/SKILL.md +207 -0
  199. package/skills/domains/finance/finsight-research-guide/SKILL.md +113 -0
  200. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +117 -0
  201. package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
  202. package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
  203. package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
  204. package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
  205. package/skills/domains/geoscience/pangaea-data-api/SKILL.md +197 -0
  206. package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
  207. package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
  208. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +232 -0
  209. package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
  210. package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
  211. package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
  212. package/skills/domains/law/caselaw-access-api/SKILL.md +149 -0
  213. package/skills/domains/law/legal-agent-skills-guide/SKILL.md +132 -0
  214. package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
  215. package/skills/domains/law/legal-research-methods/SKILL.md +190 -0
  216. package/skills/domains/law/opencontracts-guide/SKILL.md +168 -0
  217. package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
  218. package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
  219. package/skills/domains/math/lean-theorem-proving-guide/SKILL.md +140 -0
  220. package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
  221. package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
  222. package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
  223. package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
  224. package/skills/domains/pharma/madd-drug-discovery-guide/SKILL.md +153 -0
  225. package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
  226. package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
  227. package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
  228. package/skills/domains/social-science/ipums-microdata-api/SKILL.md +211 -0
  229. package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
  230. package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
  231. package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
  232. package/skills/domains/social-science/sociology-research-methods/SKILL.md +181 -0
  233. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +233 -0
  234. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
  235. package/skills/literature/discovery/papers-we-love-guide/SKILL.md +169 -0
  236. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
  237. package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
  238. package/skills/literature/fulltext/bioc-pmc-api/SKILL.md +146 -0
  239. package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
  240. package/skills/literature/fulltext/dataverse-api/SKILL.md +215 -0
  241. package/skills/literature/fulltext/hal-archive-api/SKILL.md +218 -0
  242. package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
  243. package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
  244. package/skills/literature/fulltext/osf-api/SKILL.md +212 -0
  245. package/skills/literature/fulltext/pmc-ftp-bulk-download/SKILL.md +182 -0
  246. package/skills/literature/fulltext/zotero-ai-butler-guide/SKILL.md +166 -0
  247. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +168 -0
  248. package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
  249. package/skills/literature/metadata/bibliometrix-guide/SKILL.md +164 -0
  250. package/skills/literature/metadata/crossref-event-data-api/SKILL.md +183 -0
  251. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +202 -0
  252. package/skills/literature/metadata/orkg-api/SKILL.md +153 -0
  253. package/skills/literature/metadata/plumx-metrics-api/SKILL.md +188 -0
  254. package/skills/literature/metadata/ror-organization-api/SKILL.md +208 -0
  255. package/skills/literature/metadata/sophosia-reference-guide/SKILL.md +110 -0
  256. package/skills/literature/metadata/viaf-authority-api/SKILL.md +209 -0
  257. package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
  258. package/skills/literature/metadata/zoplicate-dedup-guide/SKILL.md +147 -0
  259. package/skills/literature/metadata/zotero-actions-tags-guide/SKILL.md +212 -0
  260. package/skills/literature/metadata/zotmoov-guide/SKILL.md +120 -0
  261. package/skills/literature/metadata/zutilo-guide/SKILL.md +140 -0
  262. package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
  263. package/skills/literature/search/arxiv-cli-tools/SKILL.md +172 -0
  264. package/skills/literature/search/arxiv-osiris/SKILL.md +199 -0
  265. package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
  266. package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
  267. package/skills/literature/search/base-academic-search/SKILL.md +196 -0
  268. package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
  269. package/skills/literature/search/citeseerx-api/SKILL.md +183 -0
  270. package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
  271. package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
  272. package/skills/literature/search/eric-education-api/SKILL.md +199 -0
  273. package/skills/literature/search/findpapers-guide/SKILL.md +177 -0
  274. package/skills/literature/search/ieee-xplore-api/SKILL.md +177 -0
  275. package/skills/literature/search/lens-scholarly-api/SKILL.md +211 -0
  276. package/skills/literature/search/multi-database-literature-search/SKILL.md +198 -0
  277. package/skills/literature/search/open-library-api/SKILL.md +196 -0
  278. package/skills/literature/search/open-semantic-search-guide/SKILL.md +190 -0
  279. package/skills/literature/search/openaire-api/SKILL.md +141 -0
  280. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +107 -0
  281. package/skills/literature/search/papers-chat-guide/SKILL.md +194 -0
  282. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
  283. package/skills/literature/search/plos-open-access-api/SKILL.md +203 -0
  284. package/skills/literature/search/scielo-api/SKILL.md +182 -0
  285. package/skills/literature/search/share-research-api/SKILL.md +129 -0
  286. package/skills/literature/search/worldcat-search-api/SKILL.md +224 -0
  287. package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
  288. package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
  289. package/skills/research/automation/claude-academic-workflow-guide/SKILL.md +202 -0
  290. package/skills/research/automation/coexist-ai-guide/SKILL.md +149 -0
  291. package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
  292. package/skills/research/automation/foam-agent-guide/SKILL.md +203 -0
  293. package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
  294. package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
  295. package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
  296. package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
  297. package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
  298. package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
  299. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
  300. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +200 -0
  301. package/skills/research/deep-research/corvus-research-guide/SKILL.md +132 -0
  302. package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
  303. package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
  304. package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
  305. package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
  306. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +205 -0
  307. package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
  308. package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +185 -0
  309. package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +178 -0
  310. package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
  311. package/skills/research/deep-research/open-researcher-guide/SKILL.md +138 -0
  312. package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
  313. package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
  314. package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
  315. package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
  316. package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
  317. package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
  318. package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
  319. package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
  320. package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
  321. package/skills/research/methodology/parsifal-slr-guide/SKILL.md +154 -0
  322. package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
  323. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +169 -0
  324. package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
  325. package/skills/research/methodology/slr-automation-guide/SKILL.md +235 -0
  326. package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
  327. package/skills/research/paper-review/latte-review-guide/SKILL.md +175 -0
  328. package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
  329. package/skills/research/paper-review/paper-critique-framework/SKILL.md +181 -0
  330. package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
  331. package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
  332. package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
  333. package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
  334. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +110 -0
  335. package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
  336. package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
  337. package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
  338. package/skills/tools/diagram/clawphd-guide/SKILL.md +149 -0
  339. package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
  340. package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
  341. package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
  342. package/skills/tools/diagram/kroki-diagram-api/SKILL.md +198 -0
  343. package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
  344. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +201 -0
  345. package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
  346. package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
  347. package/skills/tools/document/large-document-reader/SKILL.md +202 -0
  348. package/skills/tools/document/md2pdf-xelatex/SKILL.md +212 -0
  349. package/skills/tools/document/openpaper-guide/SKILL.md +232 -0
  350. package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
  351. package/skills/tools/document/weknora-guide/SKILL.md +216 -0
  352. package/skills/tools/document/zotero-addon-market-guide/SKILL.md +108 -0
  353. package/skills/tools/document/zotero-night-theme-guide/SKILL.md +142 -0
  354. package/skills/tools/document/zotero-style-guide/SKILL.md +217 -0
  355. package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
  356. package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
  357. package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
  358. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +135 -0
  359. package/skills/tools/knowledge-graph/notero-zotero-notion-guide/SKILL.md +187 -0
  360. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +156 -0
  361. package/skills/tools/knowledge-graph/openspg-guide/SKILL.md +210 -0
  362. package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +84 -0
  363. package/skills/tools/knowledge-graph/zotero-markdb-connect-guide/SKILL.md +162 -0
  364. package/skills/tools/ocr-translate/latex-translation-guide/SKILL.md +176 -0
  365. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +198 -0
  366. package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
  367. package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
  368. package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
  369. package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
  370. package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
  371. package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
  372. package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
  373. package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
  374. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +182 -0
  375. package/skills/writing/citation/citation-assistant-skill/SKILL.md +192 -0
  376. package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
  377. package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
  378. package/skills/writing/citation/mendeley-api/SKILL.md +231 -0
  379. package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
  380. package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
  381. package/skills/writing/citation/onecite-reference-guide/SKILL.md +168 -0
  382. package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
  383. package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
  384. package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
  385. package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
  386. package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
  387. package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
  388. package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
  389. package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
  390. package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
  391. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
  392. package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
  393. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +200 -0
  394. package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
  395. package/skills/writing/composition/paperforge-guide/SKILL.md +205 -0
  396. package/skills/writing/composition/research-paper-writer/SKILL.md +226 -0
  397. package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
  398. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
  399. package/skills/writing/latex/academic-writing-latex/SKILL.md +285 -0
  400. package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
  401. package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
  402. package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
  403. package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
  404. package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
  405. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +140 -0
  406. package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
  407. package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
  408. package/skills/writing/templates/arxiv-preprint-template/SKILL.md +184 -0
  409. package/skills/writing/templates/elegant-paper-template/SKILL.md +141 -0
  410. package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
  411. package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
  412. package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
  413. package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
  414. package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
  415. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
@@ -0,0 +1,293 @@
+ ---
+ name: stata-reference-guide
+ description: "Comprehensive Stata reference covering syntax, econometrics, and 20+ packages"
+ metadata:
+   openclaw:
+     emoji: "📊"
+     category: "analysis"
+     subcategory: "econometrics"
+     keywords: ["stata", "econometrics", "panel data", "causal inference", "community packages", "data management", "regression"]
+     source: "https://github.com/dylantmoore/stata-skill"
+ ---
12
+
13
+ # Stata Comprehensive Reference Guide
14
+
15
+ ## Overview
16
+
17
+ Stata is one of the most widely used statistical packages in economics, political science, public health, and sociology research. This guide provides a comprehensive reference covering core syntax, data management, estimation commands, causal inference methods, graphics, Mata programming, and 20+ community-contributed packages. It is designed as a progressive-disclosure reference: use the section relevant to your current task rather than reading end-to-end.
18
+
19
+ ## Core Syntax and Data Management
20
+
21
+ ### Data Import and Export
22
+
23
+ ```stata
24
+ * Import CSV with variable names in first row
25
+ import delimited "data.csv", clear varnames(1)
26
+
27
+ * Import Excel (specific sheet and cell range)
28
+ import excel "workbook.xlsx", sheet("Sheet1") cellrange(A1:Z1000) firstrow clear
29
+
30
+ * Import Stata format
31
+ use "dataset.dta", clear
32
+
33
+ * Export to CSV
34
+ export delimited "output.csv", replace
35
+
36
+ * Save as Stata format
37
+ save "cleaned_data.dta", replace
38
+ ```
39
+
40
+ ### Variable Management
41
+
42
+ ```stata
43
+ * Generate new variables
44
+ gen log_income = ln(income)
45
+ gen age_sq = age^2
46
+ gen treatment_post = treatment * post
47
+
48
+ * Recode and label
49
+ recode education (1/12 = 1 "HS or less") (13/16 = 2 "College") (17/20 = 3 "Graduate"), gen(edu_cat)
50
+ label variable edu_cat "Education Category"
51
+
52
+ * String operations
53
+ gen first_name = word(full_name, 1)
54
+ gen year_str = string(year)
55
+ destring price_str, gen(price) force
56
+
57
+ * Date handling
58
+ gen date = date(date_str, "YMD")
59
+ format date %td
60
+ gen year = year(date)
61
+ gen quarter = quarter(date)
62
+ ```
63
+
64
+ ### Data Cleaning Patterns
65
+
66
+ ```stata
67
+ * Identify and handle duplicates
68
+ duplicates report id year
69
+ duplicates tag id year, gen(dup_flag)
70
+ duplicates drop id year, force
71
+
72
+ * Missing values
73
+ misstable summarize
74
+ misstable patterns
75
+ replace income = . if income < 0 // recode impossible values
76
+
77
+ * Merge datasets
78
+ merge 1:1 id year using "panel_data.dta", keep(match master) nogen
79
+ merge m:1 state year using "state_controls.dta", keep(match master) nogen
80
+
81
+ * Reshape between wide and long
82
+ reshape long income_, i(id) j(year)
83
+ reshape wide income_, i(id) j(year) // back to wide: the stub must match the long variable name
84
+
85
+ * Collapse to group level
86
+ collapse (mean) avg_income=income (sd) sd_income=income (count) n=income, by(state year)
87
+ ```
88
+
89
+ ## Estimation Commands
90
+
91
+ ### Linear Regression
92
+
93
+ ```stata
94
+ * OLS with robust standard errors
95
+ reg y x1 x2 x3, robust
96
+
97
+ * Clustered standard errors
98
+ reg y x1 x2 x3, cluster(firm_id)
99
+
100
+ * Fixed effects (within estimator)
101
+ xtset firm_id year // must declare panel structure first
102
+ xtreg y x1 x2 x3, fe cluster(firm_id)
103
+
104
+ * Absorbing high-dimensional FE (reghdfe)
105
+ reghdfe y x1 x2 x3, absorb(firm_id year) cluster(firm_id)
106
+
107
+ * Instrumental variables (2SLS)
108
+ ivregress 2sls y x1 x2 (endog_var = instrument1 instrument2), robust
109
+ estat firststage
110
+ estat overid
111
+ ```
112
+
113
+ ### Panel Data Methods
114
+
115
+ ```stata
116
+ * Panel setup
117
+ xtset firm_id year
118
+
119
+ * Hausman test (FE vs RE)
120
+ quietly xtreg y x1 x2, fe
121
+ estimates store fe
122
+ quietly xtreg y x1 x2, re
123
+ estimates store re
124
+ hausman fe re
125
+
126
+ * Dynamic panel GMM (xtabond2)
127
+ xtabond2 y L.y x1 x2, gmm(L.y, lag(2 4)) iv(x1 x2) robust twostep
128
+
129
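+ * Serial correlation and overidentification tests: xtabond2 reports the
130
+ * Arellano-Bond AR(1)/AR(2) tests and Sargan/Hansen statistics in its own output;
131
+ * estat abond / estat sargan are postestimation commands for xtabond, xtdpd, xtdpdsys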
132
+ ```
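The fixed-effects (within) estimator that `xtreg, fe` reports can be reproduced by demeaning each variable within panel units and running OLS on the demeaned data. The Python sketch below is a cross-check on simulated data (all names are hypothetical), showing that the within transformation gives the same slopes as a regression with one dummy per unit; it does not reproduce Stata's standard errors, which also apply degrees-of-freedom and clustering corrections.

```python
import numpy as np

def within_estimator(y, X, unit):
    """FE slopes via the within transformation: demean y and X by unit, then OLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_dm, X_dm = y.copy(), X.copy()
    for u in np.unique(unit):
        idx = unit == u
        y_dm[idx] -= y[idx].mean()
        X_dm[idx] -= X[idx].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

def lsdv_estimator(y, X, unit):
    """Same slopes from OLS with one dummy per unit (least-squares dummy variables)."""
    units = np.unique(unit)
    D = (unit[:, None] == units[None, :]).astype(float)
    Z = np.column_stack([X, D])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[: X.shape[1]]

# Simulated panel: 50 firms x 10 years, firm effects correlated with x1
rng = np.random.default_rng(0)
n_firms, n_years = 50, 10
unit = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(size=n_firms)[unit]
X = np.column_stack([alpha + rng.normal(size=unit.size), rng.normal(size=unit.size)])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 3.0 * alpha + rng.normal(size=unit.size)

print(within_estimator(y, X, unit))  # close to the true slopes [2, -1]
print(lsdv_estimator(y, X, unit))    # identical slopes
```

Pooled OLS on these data would be biased because the firm effect is correlated with x1; demeaning removes it.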
133
+
134
+ ### Causal Inference
135
+
136
+ ```stata
137
+ * Difference-in-Differences
138
+ gen did = treatment * post
139
+ reg y did treatment post controls, cluster(state)
140
+
141
+ * Modern DiD with staggered treatment (csdid)
142
+ csdid y x1 x2, ivar(id) time(year) gvar(first_treat) method(dripw)
143
+ estat event // aggregate group-time ATTs into event-time effects
+ csdid_plot // event study plot
144
+
145
+ * Regression Discontinuity (rdrobust)
146
+ rdrobust y running_var, c(0) p(1) kernel(triangular)
147
+ rdplot y running_var, c(0) p(1)
148
+
149
+ * Propensity Score Matching (psmatch2)
150
+ psmatch2 treatment x1 x2 x3, outcome(y) logit caliper(0.05) common
151
+ pstest x1 x2 x3 // balance check
152
+
153
+ * Synthetic Control (synth)
154
+ synth y x1 x2 x3 y(1990) y(1991) y(1992), trunit(1) trperiod(1993) fig
155
+ ```
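In the canonical two-group, two-period case, the coefficient on `did` in the regression above equals the difference-in-differences of the four cell means. The Python sketch below checks that identity on simulated data (the true effect of 1.5 and all variable names are illustrative assumptions); it does not cover staggered adoption, which is what `csdid` addresses.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
treatment = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
did = treatment * post
# Simulated outcome with a true treatment effect of 1.5 for treated units after treatment
y = 1.0 + 0.5 * treatment + 0.8 * post + 1.5 * did + rng.normal(size=n)

def cell_mean(t, p):
    """Mean outcome in the (treatment == t, post == p) cell."""
    return y[(treatment == t) & (post == p)].mean()

# Difference-in-differences of the four group means
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS of y on a constant, treatment, post, and treatment*post
Z = np.column_stack([np.ones(n), treatment, post, did])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

print(did_means, coef[3])  # the two numbers agree up to floating-point error
```

The identity holds because the regression is saturated in the four cells, so its fitted values are exactly the cell means.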
156
+
157
+ ### Limited Dependent Variables
158
+
159
+ ```stata
160
+ * Logit/Probit
161
+ logit binary_y x1 x2, robust
162
+ margins, dydx(*) // average marginal effects
163
+
164
+ probit binary_y x1 x2, robust
165
+ margins, dydx(*)
166
+
167
+ * Ordered logit
168
+ ologit ordered_y x1 x2, robust
169
+ margins, predict(outcome(3)) dydx(x1)
170
+
171
+ * Tobit (censored regression)
172
+ tobit y x1 x2, ll(0)
173
+
174
+ * Poisson and Negative Binomial
175
+ poisson count_y x1 x2, robust
176
+ nbreg count_y x1 x2, robust
177
+ ```
178
+
179
+ ## Community Packages (20+)
180
+
181
+ ### Installation
182
+
183
+ ```stata
184
+ * Install from SSC (Statistical Software Components)
185
+ ssc install reghdfe
186
+ ssc install estout
187
+ ssc install coefplot
188
+ ssc install drdid // required by csdid
+ ssc install csdid
189
+ ssc install rdrobust
190
+ ssc install psmatch2
191
+ ssc install synth
192
+ ssc install ivreg2
193
+ ssc install xtabond2
194
+ ssc install winsor2
195
+ ssc install gtools
196
+ ssc install ftools
197
+ ssc install binscatter
198
+ ssc install binsreg
199
+ ssc install grstyle
200
+
201
+ * Install from GitHub
202
+ net install did_multiplegt, from("https://raw.githubusercontent.com/chaisemartinDehejia/did_multiplegt/main")
203
+ ```
204
+
205
+ ### Publication-Quality Output
206
+
207
+ ```stata
208
+ * estout / esttab — formatted regression tables
209
+ eststo clear
210
+ eststo: reg y x1 x2, robust
211
+ eststo: reg y x1 x2 x3, robust
212
+ eststo: reg y x1 x2 x3, cluster(firm_id)
213
+ esttab, se star(* 0.10 ** 0.05 *** 0.01) ///
214
+ title("Main Results") label replace ///
215
+ scalars("r2 R-squared" "N Observations")
216
+
217
+ * Export to LaTeX
218
+ esttab using "table1.tex", replace booktabs ///
219
+ se star(* 0.10 ** 0.05 *** 0.01) label
220
+
221
+ * Export to CSV/Excel
222
+ esttab using "table1.csv", replace se
223
+
224
+ * coefplot — coefficient visualization
225
+ coefplot (est1, label("Model 1")) (est2, label("Model 2")) (est3, label("Model 3")), ///
226
+ drop(_cons) xline(0) title("Coefficient Estimates")
227
+ ```
228
+
229
+ ## Graphics
230
+
231
+ ```stata
232
+ * Scatter with fit line
233
+ twoway (scatter y x) (lfit y x), title("Y vs X") ///
234
+ xtitle("X Variable") ytitle("Y Variable")
235
+
236
+ * Event study plot
237
+ coefplot, vertical drop(_cons) yline(0) ///
238
+ title("Event Study") xtitle("Periods Relative to Treatment")
239
+
240
+ * Binned scatter (binscatter)
241
+ binscatter y x, controls(z1 z2) nquantiles(20) ///
242
+ title("Binned Scatter") xtitle("X") ytitle("Y")
243
+
244
+ * Kernel density
245
+ kdensity income if year==2020, normal ///
246
+ title("Income Distribution") xtitle("Income")
247
+
248
+ * Graph styling (grstyle)
249
+ grstyle init
250
+ grstyle set plain, horizontal grid
251
+ grstyle color background white
252
+ grstyle set color economist
253
+ ```
254
+
255
+ ## Mata Programming
256
+
257
+ ```stata
258
+ * Basic Mata usage
259
+ mata:
260
+ // Matrix operations
261
+ X = st_data(., ("x1", "x2", "x3"))
262
+ y = st_data(., "y")
263
+ n = rows(X)
264
+
265
+ // OLS by hand
266
+ X = X, J(n, 1, 1) // add constant
267
+ beta = invsym(X'X) * X'y
268
+ e = y - X * beta
269
+ sigma2 = (e'e) / (n - cols(X))
270
+ V = sigma2 * invsym(X'X)
271
+ se = sqrt(diagonal(V))
272
+
273
+ beta, se
274
+ end
275
+ ```
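To cross-check the Mata computation outside Stata, the same textbook formulas (β̂ = (X'X)⁻¹X'y, σ̂² = e'e/(n − k), SE = √diag(σ̂²(X'X)⁻¹)) can be written in NumPy. This is a sketch on simulated data; like the Mata code, it returns classical (non-robust) standard errors, which correspond to `reg` without the `robust` option.

```python
import numpy as np

def ols_by_hand(X, y):
    """OLS coefficients and classical standard errors; a constant column is appended last."""
    n = X.shape[0]
    X = np.column_stack([X, np.ones(n)])   # add constant, as J(n, 1, 1) does in Mata
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                        # residuals
    sigma2 = (e @ e) / (n - X.shape[1])     # residual variance with n - k df
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta, se

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + 2.0 + rng.normal(size=200)
beta, se = ols_by_hand(X, y)
print(np.column_stack([beta, se]))  # rows: x1, x2, x3, _cons
```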
276
+
277
+ ## Workflow Best Practices
278
+
279
+ 1. **Always set a random seed** before any procedure involving randomness: `set seed 12345`
280
+ 2. **Use `preserve`/`restore`** for temporary data manipulations within a do-file
281
+ 3. **Log your sessions**: `log using "analysis_log.smcl", replace`
282
+ 4. **Version control**: Start do-files with `version 17` (or your version) for reproducibility
283
+ 5. **Use tempfiles** for intermediate datasets: `tempfile merged`, then ``save `merged'``
284
+ 6. **Profile your code** with `timer on 1` / `timer off 1` / `timer list` for long-running operations
285
+ 7. **Use `gtools`** (`greshape`, `gcollapse`, `gegen`) on large datasets; these commands are often several times faster than the built-in `reshape`, `collapse`, and `egen`
286
+
287
+ ## References
288
+
289
+ - [Stata Official Documentation](https://www.stata.com/manuals/)
290
+ - [UCLA Stata FAQ](https://stats.oarc.ucla.edu/stata/faq/)
291
+ - [Stata Journal](https://www.stata-journal.com/)
292
+ - [SSC Archive](https://ideas.repec.org/s/boc/bocode.html)
293
+ - [dylantmoore/stata-skill](https://github.com/dylantmoore/stata-skill) — Source for this reference
@@ -0,0 +1,157 @@
1
+ ---
2
+ name: data-anomaly-detection
3
+ description: "Detect anomalies and outliers in research data using statistical methods"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "🔎"
7
+ category: "analysis"
8
+ subcategory: "statistics"
9
+ keywords: ["anomaly detection", "outlier detection", "data quality", "statistical testing", "robust statistics"]
10
+ source: "https://github.com/AcademicSkills/data-anomaly-detection"
11
+ ---
12
+
13
+ # Data Anomaly Detection
14
+
15
+ A skill for identifying anomalies, outliers, and suspicious patterns in research datasets. Combines classical statistical methods with modern machine learning approaches to flag data points that deviate significantly from expected distributions, helping researchers maintain data integrity and uncover genuine scientific findings.
16
+
17
+ ## Overview
18
+
19
+ Anomalous data points in research datasets can arise from measurement errors, instrument malfunction, data entry mistakes, or genuine rare phenomena. Distinguishing between these sources is critical: blindly removing outliers can bias results, while ignoring measurement errors introduces noise. This skill provides a structured framework for detecting, classifying, and handling anomalies in univariate, multivariate, and time-series research data.
20
+
21
+ The approach follows a three-stage pipeline: detection (flagging candidate anomalies), diagnosis (determining likely cause), and decision (remove, transform, or retain with justification). Every decision is logged for reproducibility and transparent reporting.
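One way to keep the detection, diagnosis, and decision stages auditable is to record every flagged observation together with its diagnosed cause and final disposition. The Python sketch below is a hypothetical structure: the class and field names are illustrative and not part of any library, and the cause labels follow the categories in the Diagnosis Framework table of this skill.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AnomalyDecision:
    index: int           # row of the flagged observation
    value: float
    detector: str        # stage 1: e.g. "iqr", "mad", "isolation_forest"
    cause: str           # stage 2: e.g. "data entry error", "genuine extreme"
    action: str          # stage 3: "remove", "correct", or "retain"
    justification: str

@dataclass
class AnomalyLog:
    decisions: list = field(default_factory=list)

    def record(self, *args, **kwargs) -> None:
        self.decisions.append(AnomalyDecision(*args, **kwargs))

    def summary(self) -> dict:
        """Counts per action, for the disposition part of a methods section."""
        return dict(Counter(d.action for d in self.decisions))

log = AnomalyLog()
log.record(17, 1520.0, "iqr", "data entry error", "correct", "extra digit; source form shows 152.0")
log.record(42, -3.0, "iqr", "measurement error", "remove", "negative mass is physically impossible")
log.record(88, 9.7, "mad", "genuine extreme", "retain", "verified against instrument log")
print(log.summary())  # {'correct': 1, 'remove': 1, 'retain': 1}
```

Exporting `log.decisions` to a CSV gives the full list of flagged observations and their disposition for supplementary materials.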
22
+
23
+ ## Statistical Detection Methods
24
+
25
+ ### Univariate Outlier Detection
26
+
27
+ ```python
28
+ import numpy as np
29
+ from scipy import stats
30
+
31
+ def detect_univariate_outliers(data: np.ndarray, method: str = 'iqr') -> dict:
32
+     """
33
+     Detect outliers using classical univariate methods.
34
+
35
+     Methods:
36
+         'iqr': Interquartile range (1.5x IQR rule)
37
+         'zscore': Z-score threshold (|z| > 3)
38
+         'mad': Median absolute deviation (robust modified z-score, |M| > 3.5)
39
+         'grubbs': Grubbs' test for the single most extreme value (assumes normality)
40
+     """
41
+     data = np.asarray(data, dtype=float)
+     results = {'method': method, 'n_total': len(data)}
42
+
43
+     if method == 'iqr':
44
+         q1, q3 = np.percentile(data, [25, 75])
45
+         iqr = q3 - q1
46
+         lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
47
+         mask = (data < lower) | (data > upper)
48
+
49
+     elif method == 'zscore':
50
+         z = np.abs(stats.zscore(data))
51
+         mask = z > 3
52
+
53
+     elif method == 'mad':
54
+         median = np.median(data)
55
+         mad = np.median(np.abs(data - median))
56
+         modified_z = 0.6745 * (data - median) / mad if mad > 0 else np.zeros_like(data)
57
+         mask = np.abs(modified_z) > 3.5
58
+
59
+     elif method == 'grubbs':
60
+         # Grubbs' test for the single most extreme value (two-sided, alpha = 0.05)
61
+         n = len(data)
62
+         mean, sd = np.mean(data), np.std(data, ddof=1)
63
+         g = np.max(np.abs(data - mean)) / sd
64
+         t_crit = stats.t.ppf(1 - 0.05 / (2 * n), n - 2)
65
+         g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
66
+         mask = np.zeros(n, dtype=bool)
+         mask[np.argmax(np.abs(data - mean))] = g >= g_crit  # flag at most one point
67
+
+     else:
+         raise ValueError(f"Unknown method: {method!r}")
+
68
+     results['outlier_indices'] = np.where(mask)[0].tolist()
69
+     results['n_outliers'] = int(mask.sum())
70
+     results['pct_outliers'] = round(mask.sum() / len(data) * 100, 2)
71
+     return results
72
+ ```
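The robust MAD rule matters most in small samples. Because the outlier itself inflates the standard deviation, the largest attainable |z| from `scipy.stats.zscore` (which uses the population SD, ddof=0, by default) in a sample of n values is √(n − 1), so the |z| > 3 rule can never fire when n ≤ 10. The self-contained sketch below compares the two rules on such a sample; the data are made up for illustration.

```python
import numpy as np
from scipy import stats

# Ten measurements with one obvious data-entry error (1520 instead of ~15)
data = np.array([14.8, 15.1, 15.3, 14.9, 15.0, 15.2, 14.7, 15.1, 15.0, 1520.0])

z = np.abs(stats.zscore(data))           # SD is inflated by the outlier itself
print(z.max(), np.sqrt(len(data) - 1))   # max |z| cannot exceed sqrt(n - 1) = 3
print(np.where(z > 3)[0])                # []: the z-score rule flags nothing

median = np.median(data)
mad = np.median(np.abs(data - median))
modified_z = 0.6745 * (data - median) / mad
print(np.where(np.abs(modified_z) > 3.5)[0])  # [9]: the MAD rule flags the error
```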
73
+
74
+ ### Multivariate Outlier Detection
75
+
76
+ ```python
77
+ import numpy as np
+ from sklearn.covariance import EllipticEnvelope
77
+ from sklearn.ensemble import IsolationForest
78
+
79
+ def detect_multivariate_outliers(X: np.ndarray, method: str = 'mahalanobis') -> dict:
80
+     """
81
+     Detect multivariate outliers using distance-based and model-based methods.
+     'mahalanobis' uses EllipticEnvelope (robust MCD covariance); both methods
+     flag a fixed 5% of observations (contamination=0.05).
82
+     """
83
+     if method == 'mahalanobis':
84
+         detector = EllipticEnvelope(contamination=0.05, random_state=42)
85
+         labels = detector.fit_predict(X)  # -1 = outlier, 1 = inlier
86
+
87
+     elif method == 'isolation_forest':
88
+         detector = IsolationForest(
89
+             n_estimators=100, contamination=0.05, random_state=42
90
+         )
91
+         labels = detector.fit_predict(X)
+
+     else:
+         raise ValueError(f"Unknown method: {method!r}")
92
+
93
+     outlier_mask = labels == -1
94
+     return {
95
+         'method': method,
96
+         'outlier_indices': np.where(outlier_mask)[0].tolist(),
97
+         'n_outliers': int(outlier_mask.sum()),
98
+         'contamination_assumed': 0.05
99
+     }
101
+ ```
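Both detectors above flag a fixed `contamination` share of points. An alternative that does not require assuming the contamination rate is to compare squared robust Mahalanobis distances with a chi-squared quantile (df = number of variables). The sketch below uses scikit-learn's `MinCovDet`, whose `mahalanobis` method returns squared distances; the function name, the α = 0.001 cutoff, and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

def mahalanobis_chi2_outliers(X, alpha=0.001, random_state=0):
    """Flag rows whose squared robust Mahalanobis distance exceeds the chi2(1 - alpha, p) quantile."""
    X = np.asarray(X, dtype=float)
    mcd = MinCovDet(random_state=random_state).fit(X)
    d2 = mcd.mahalanobis(X)                      # squared distances to the robust center
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    mask = d2 > cutoff
    return {'outlier_indices': np.where(mask)[0].tolist(), 'cutoff': cutoff}

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=300)
# (2.5, -2.5) is within 2.5 SD on each axis but contradicts the positive correlation
X = np.vstack([X, [[6.0, 6.0], [2.5, -2.5], [-5.0, 4.0]]])
res = mahalanobis_chi2_outliers(X)
print(res['outlier_indices'])  # includes the planted rows 300, 301, 302
```

The robust (MCD) center and covariance prevent a cluster of outliers from inflating the covariance estimate and masking itself, which can happen with the classical sample mean and covariance.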
102
+
103
+ ## Diagnosis Framework
104
+
105
+ Once candidate anomalies are flagged, classify each by likely cause:
106
+
107
+ | Category | Indicators | Action |
108
+ |----------|-----------|--------|
109
+ | **Measurement error** | Value physically impossible, instrument log shows malfunction | Remove with documentation |
110
+ | **Data entry error** | Obvious typo (e.g., extra digit), inconsistent units | Correct if source available, else remove |
111
+ | **Sampling artifact** | Unusual but plausible value from edge of population | Retain; use robust methods |
112
+ | **Genuine extreme** | Verified measurement, consistent with other variables | Retain; report sensitivity analysis |
113
+ | **Contamination** | Data from wrong population or experimental condition | Remove with justification |
114
+
115
+ ### Diagnostic Checks
116
+
117
+ - **Cross-variable consistency**: Does the flagged value make sense given other columns for the same observation?
118
+ - **Temporal context**: For longitudinal data, is the spike consistent with known events?
119
+ - **Instrument logs**: Can the anomaly be traced to a calibration or equipment issue?
120
+ - **Domain knowledge**: Is the value within theoretically possible bounds?
121
+
122
+ ## Time-Series Anomaly Detection
123
+
124
+ ```python
125
+ import numpy as np
+ import pandas as pd
+
+ def detect_timeseries_anomalies(series: np.ndarray, window: int = 20) -> dict:
126
+     """
127
+     Detect anomalies in time-series data using rolling statistics.
+     Each point is compared with the mean and std of the preceding `window`
+     points, so an anomaly does not inflate its own baseline.
128
+     """
129
+     s = pd.Series(series, dtype=float)
+     baseline = s.shift(1).rolling(window=window)
+     rolling_mean = baseline.mean()
130
+     rolling_std = baseline.std()
131
+
132
+     upper_bound = rolling_mean + 3 * rolling_std
133
+     lower_bound = rolling_mean - 3 * rolling_std
134
+
135
+     # The first `window` points have NaN bounds and are never flagged
+     anomalies = ((s > upper_bound) | (s < lower_bound)).to_numpy()
136
+     return {
137
+         'anomaly_indices': np.where(anomalies)[0].tolist(),
138
+         'n_anomalies': int(anomalies.sum()),
139
+         'window_size': window
140
+     }
141
+ ```
142
+
143
+ ## Reporting Anomaly Handling
144
+
145
+ When reporting anomaly handling in publications:
146
+
147
+ 1. **State the detection method** and its parameters (e.g., "Outliers were identified using the 1.5x IQR rule").
148
+ 2. **Report the number and percentage** of observations flagged.
149
+ 3. **Describe the disposition**: how many were removed, corrected, or retained.
150
+ 4. **Provide sensitivity analysis**: show that main conclusions hold with and without outliers.
151
+ 5. **Include in supplementary materials**: full list of flagged observations and their disposition.
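Step 4 can be as simple as re-estimating the quantity of interest with and without the flagged observations and reporting both. The sketch below does this for an OLS slope; the function names, the choice of estimator, and the simulated data are illustrative assumptions, and the same pattern applies to any estimate.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    return np.polyfit(x, y, deg=1)[0]

def sensitivity_analysis(x, y, outlier_indices, estimator=ols_slope):
    """Estimate on the full data and after dropping the flagged observations."""
    keep = np.ones(len(x), dtype=bool)
    keep[list(outlier_indices)] = False
    full = estimator(x, y)
    trimmed = estimator(x[keep], y[keep])
    return {'full': full, 'without_outliers': trimmed,
            'n_dropped': int((~keep).sum()), 'change': trimmed - full}

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 0.5 * x + rng.normal(scale=1.0, size=100)
y[[5, 50]] += 25  # two gross errors
print(sensitivity_analysis(x, y, outlier_indices=[5, 50]))
```

If `full` and `without_outliers` lead to the same substantive conclusion, report both; if they do not, the disposition of the flagged points needs explicit justification.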
152
+
153
+ ## References
154
+
155
+ - Rousseeuw, P. J. & Hubert, M. (2011). Robust Statistics for Outlier Detection. *WIREs Data Mining and Knowledge Discovery*, 1(1), 73-79.
156
+ - Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. *ICDM 2008*.
157
+ - Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-Practice Recommendations for Defining, Identifying, and Handling Outliers. *Organizational Research Methods*, 16(2), 270-301.
@@ -0,0 +1,226 @@
1
+ ---
2
+ name: general-statistics-guide
3
+ description: "Conceptual foundations of statistical inference for empirical research"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "📈"
7
+ category: "analysis"
8
+ subcategory: "statistics"
9
+ keywords: ["statistical inference", "hypothesis testing", "probability", "regression", "confidence intervals", "statistical thinking"]
10
+ source: "https://clawhub.com/ivangdavila/statistics"
11
+ ---
12
+
13
+ # Statistical Foundations for Empirical Research
14
+
15
+ ## Overview
16
+
17
+ This guide builds statistical intuition from probability fundamentals through inferential methods to practical application in research. It is language-agnostic (not tied to R, Python, or Stata) and focuses on the concepts, assumptions, and interpretation of statistical methods commonly used in empirical papers. Use it as a reference when designing studies, choosing tests, or interpreting results.
18
+
19
+ ## Probability Foundations
20
+
21
+ ### Key Distributions
22
+
23
+ | Distribution | When to Use | Parameters | Example |
24
+ |-------------|-------------|-----------|---------|
25
+ | **Normal** | Continuous, symmetric data; CLT applications | μ (mean), σ (std) | Height, test scores |
26
+ | **Binomial** | Count of successes in n trials | n (trials), p (probability) | Survey yes/no responses |
27
+ | **Poisson** | Count of rare events in fixed interval | λ (rate) | Paper citations per year |
28
+ | **t-distribution** | Means when σ is estimated from the sample (matters most for small n) | df (degrees of freedom) | Pilot study comparisons |
29
+ | **Chi-squared** | Goodness of fit, contingency tables | df | Category frequency tests |
30
+ | **F-distribution** | Ratio of variances, ANOVA | df₁, df₂ | Comparing model fits |
31
+
32
+ ### Central Limit Theorem
33
+
34
+ The standardized sample mean $\bar{X}$ of n i.i.d. observations approaches a normal distribution as n increases, whatever the shape of the population distribution, provided its variance is finite:
35
+
36
+ ```
37
+ If X₁, X₂, ..., Xₙ are i.i.d. with mean μ and variance σ²:
38
+ √n(X̄ - μ) / σ → N(0, 1) as n → ∞
39
+
40
+ Practical rule: n ≥ 30 is usually sufficient
41
+ Exception: heavily skewed distributions may need n ≥ 100
42
+ ```
43
+
44
+ This is why many inferential procedures (confidence intervals, t-tests, regression coefficients) are approximately valid in large samples even when the underlying data are not normally distributed.
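A quick simulation makes the n ≥ 30 versus n ≥ 100 rule of thumb concrete. The sketch below draws sample means from a skewed population (an Exponential(1) distribution, chosen for illustration, whose skewness is 2) and shows their skewness shrinking toward the normal value of 0 as n grows; for this population the skewness of the mean is 2/√n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_reps = 20_000
skews = {}

for n in (5, 30, 100):
    # n_reps samples of size n from Exponential(1); one mean per sample
    means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    skews[n] = stats.skew(means)
    print(f"n={n:>3}: skewness of sample mean = {skews[n]:.3f} "
          f"(theory 2/sqrt(n) = {2 / np.sqrt(n):.3f})")
```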
45
+
46
+ ## Descriptive Statistics
47
+
48
+ ### Measures of Central Tendency
49
+
50
+ | Measure | Formula | When to Use | Sensitive to Outliers? |
51
+ |---------|---------|-------------|----------------------|
52
+ | Mean | Σxᵢ / n | Symmetric distributions | Yes |
53
+ | Median | Middle value when sorted | Skewed distributions, ordinal data | No |
54
+ | Mode | Most frequent value | Categorical data, multimodal distributions | No |
55
+
56
+ ### Measures of Spread
57
+
58
+ | Measure | Interpretation | When to Report |
59
+ |---------|---------------|----------------|
60
+ | Standard deviation (σ) | Typical (root-mean-square) distance from the mean | With the mean |
61
+ | IQR (Q3 - Q1) | Spread of middle 50% | With the median |
62
+ | Range (max - min) | Total spread | Rarely (sensitive to outliers) |
63
+ | Coefficient of variation (σ/μ) | Relative spread | Comparing variability across scales |
64
+
65
+ ## Hypothesis Testing
66
+
67
+ ### The Testing Framework
68
+
69
+ ```
70
+ 1. State hypotheses:
71
+ H₀: null hypothesis (no effect, no difference)
72
+ H₁: alternative hypothesis (there is an effect)
73
+
74
+ 2. Choose significance level: α = 0.05 (conventional)
75
+
76
+ 3. Compute test statistic from data
77
+
78
+ 4. Compare to critical value or compute p-value
79
+
80
+ 5. Decision:
81
+ p < α → Reject H₀ (statistically significant)
82
+ p ≥ α → Fail to reject H₀ (not significant)
83
+ ```
84
+
85
+ ### Common Errors
86
+
87
+ | | H₀ is True | H₀ is False |
88
+ |---|---|---|
89
+ | **Reject H₀** | Type I Error (α) | Correct (Power = 1 - β) |
90
+ | **Fail to Reject H₀** | Correct | Type II Error (β) |
91
+
92
+ **Practical interpretation**:
93
+ - Type I (false positive): Claiming a drug works when it doesn't
94
+ - Type II (false negative): Missing a real drug effect
95
+ - Power: Probability of detecting a real effect (target ≥ 0.80)
96
+
97
+ ### Choosing the Right Test
98
+
99
+ | Question | Data Type | Test | Assumptions |
100
+ |----------|-----------|------|-------------|
101
+ | Compare 2 means | Continuous, normal | Independent t-test | Equal variance (or Welch's) |
102
+ | Compare 2 means (paired) | Continuous, normal | Paired t-test | Paired observations |
103
+ | Compare 2 groups (non-normal) | Continuous/ordinal | Mann-Whitney U | Independent samples; compares rank distributions, not means |
104
+ | Compare >2 means | Continuous, normal | One-way ANOVA | Equal variance, normality |
105
+ | Compare >2 groups (non-normal) | Ordinal or non-normal continuous | Kruskal-Wallis | Independent samples; compares rank distributions, not means |
106
+ | Association (categorical) | Categorical × Categorical | Chi-squared test | Expected count ≥ 5 |
107
+ | Correlation | Continuous × Continuous | Pearson r | Linear relationship, bivariate normal |
108
+ | Correlation (non-normal) | Ordinal or non-normal | Spearman ρ | Monotonic relationship |
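In Python, each row of the table corresponds to a function in `scipy.stats`. The sketch below runs them on simulated data (all data and group names are illustrative). Note that `ttest_ind(..., equal_var=False)` is Welch's test, and that the chi-squared test takes a contingency table of counts rather than raw labels; indexing `[1]` is used to read the p-value because it works across SciPy versions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10.0, 2.0, 40)              # group A
b = rng.normal(11.0, 3.0, 35)              # group B (different variance)
c = rng.normal(10.5, 2.0, 30)              # group C
before = rng.normal(50, 5, 25)
after = before + rng.normal(1.0, 2.0, 25)  # paired measurements

results = {
    "Welch t-test": stats.ttest_ind(a, b, equal_var=False).pvalue,
    "Paired t-test": stats.ttest_rel(before, after).pvalue,
    "Mann-Whitney U": stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,
    "One-way ANOVA": stats.f_oneway(a, b, c).pvalue,
    "Kruskal-Wallis": stats.kruskal(a, b, c).pvalue,
    "Chi-squared": stats.chi2_contingency(np.array([[30, 10], [20, 25]]))[1],
    "Pearson r": stats.pearsonr(before, after)[1],
    "Spearman rho": stats.spearmanr(before, after)[1],
}
for test, p in results.items():
    print(f"{test:<15} p = {p:.4f}")
```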
109
+
110
+ ## Regression Analysis
111
+
112
+ ### Linear Regression
113
+
114
+ ```
115
+ Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
116
+
117
+ Interpretation:
118
+ β₁ = change in Y for a 1-unit increase in X₁, holding other X's constant
119
+ R² = proportion of variance in Y explained by the model
120
+ Adjusted R² = R² penalized for number of predictors
121
+ ```
122
+
123
+ **Key assumptions** (check before trusting results):
124
+ 1. **Linearity**: Y is a linear function of X's
125
+ 2. **Independence**: Observations are independent
126
+ 3. **Homoscedasticity**: Constant variance of residuals
127
+ 4. **Normality**: Residuals are approximately normal (for inference)
128
+ 5. **No perfect multicollinearity**: no X is an exact linear combination of the others; high (but imperfect) correlation among X's inflates standard errors
129
+
130
+ **Diagnostic checks**:
131
+
132
+ ```
133
+ Linearity: Plot residuals vs. fitted values (no pattern)
134
+ Homoscedasticity: Breusch-Pagan test or residual plot (no funnel shape)
135
+ Normality: Q-Q plot of residuals, Shapiro-Wilk test
136
+ Multicollinearity: VIF (Variance Inflation Factor) — VIF > 10 is concerning
137
+ Influential obs: Cook's distance — D > 4/n warrants investigation
138
+ ```
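Two of these diagnostics are short enough to compute directly. The NumPy sketch below implements the textbook definitions, VIFⱼ = 1/(1 − Rⱼ²) from regressing predictor j on the others, and Cook's Dᵢ = eᵢ²hᵢᵢ / (p·s²·(1 − hᵢᵢ)²) with p parameters and residual variance s². The data are simulated; `statsmodels` provides tested implementations for real work.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

def cooks_distance(X, y):
    """Cook's distance for OLS of y on X plus a constant."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    p = Z.shape[1]
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T   # hat matrix
    h = np.diag(H)                          # leverages
    e = y - H @ y                           # residuals
    s2 = (e @ e) / (n - p)
    return (e**2 / (p * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1 + x1 + x3 + rng.normal(size=n)
y[0] += 15                                  # one influential observation

print(vif(X).round(1))                      # x1 and x2 far above 10; x3 near 1
D = cooks_distance(X, y)
print(np.where(D > 4 / n)[0])               # includes observation 0
```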
139
+
140
+ ### Logistic Regression
141
+
142
+ For binary outcomes (0/1):
143
+
144
+ ```
145
+ log(p / (1-p)) = β₀ + β₁X₁ + β₂X₂ + ...
146
+
147
+ Where p = P(Y = 1 | X)
148
+
149
+ Interpretation:
150
+ exp(β₁) = odds ratio
151
+ exp(β₁) = 1.5 means "a 1-unit increase in X₁ multiplies the odds by 1.5"
152
+ Report: odds ratios with 95% CI
153
+ ```
154
+
155
+ ## Confidence Intervals
156
+
157
+ ```
158
+ Point estimate ± (critical value × standard error)
159
+
160
+ For a mean: X̄ ± z*(σ/√n) or X̄ ± t*(s/√n)
161
+
162
+ Interpretation (frequentist):
163
+ "If we repeated this study many times, 95% of the resulting intervals
164
+ would contain the true population parameter."
165
+
166
+ NOT: "There is a 95% probability that the true value is in this interval."
167
+ ```
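The repeated-sampling interpretation can be checked by simulation: draw many samples from a population with a known mean, build a 95% interval X̄ ± t*(s/√n) from each, and count how often the intervals contain the true mean. The population, sample size, and number of replications below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
mu, sigma, n, n_reps = 10.0, 3.0, 25, 10_000

samples = rng.normal(mu, sigma, size=(n_reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)      # critical value t* for a 95% interval
half_width = t_crit * s / np.sqrt(n)

covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
print(f"Coverage: {covered.mean():.3f}")   # close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run share of intervals that do.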
168
+
169
+ ## Effect Sizes
170
+
171
+ p-values measure how compatible the data are with H₀, not whether an effect exists; effect sizes tell you HOW BIG an effect is.
172
+
173
+ | Measure | Context | Small | Medium | Large |
174
+ |---------|---------|-------|--------|-------|
175
+ | Cohen's d | Mean difference | 0.2 | 0.5 | 0.8 |
176
+ | Pearson r | Correlation | 0.1 | 0.3 | 0.5 |
177
+ | η² (eta-squared) | ANOVA | 0.01 | 0.06 | 0.14 |
178
+ | Odds ratio | Logistic regression | 1.5 | 2.5 | 4.3 |
179
+ | R² | Regression | 0.02 | 0.13 | 0.26 |
180
+
181
+ **Always report effect sizes alongside p-values** — a "significant" result with d = 0.05 is trivial in practice.
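Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. A minimal sketch (the example values are made up):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d with the pooled SD: (mean_x - mean_y) / s_pooled."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    s_pooled = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / s_pooled

treated = [5, 6, 7, 8, 9]
control = [4, 5, 6, 7, 8]
print(cohens_d(treated, control))  # 1 / sqrt(2.5) ≈ 0.632
```

By the thresholds in the table, d ≈ 0.63 falls between a medium (0.5) and a large (0.8) effect.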
182
+
183
+ ## Multiple Testing
184
+
185
+ When testing multiple hypotheses simultaneously, the chance of at least one false positive increases:
186
+
187
+ ```
188
+ With α = 0.05 and 20 independent tests:
189
+ P(at least one false positive) = 1 - (1 - 0.05)^20 = 0.64
190
+
191
+ Corrections:
192
+ Bonferroni: α_adj = α / m (conservative)
193
+ Benjamini-Hochberg: Controls false discovery rate (FDR) (less conservative)
194
+ Holm-Bonferroni: Step-down procedure (more powerful than Bonferroni)
195
+ ```
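The three corrections take only a few lines each. The sketch below returns a reject / do-not-reject decision per hypothesis at family-wise level α (Bonferroni, Holm) or false discovery rate α (Benjamini-Hochberg); the p-values are made up. For real analyses, `statsmodels.stats.multitest.multipletests` implements these and other methods.

```python
import numpy as np

def bonferroni(p, alpha=0.05):
    p = np.asarray(p)
    return p <= alpha / len(p)

def holm(p, alpha=0.05):
    """Step-down: compare the k-th smallest p-value with alpha / (m - k + 1); stop at the first failure."""
    p = np.asarray(p)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(np.argsort(p)):   # k = 0, 1, ..., m - 1
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break
    return reject

def benjamini_hochberg(p, alpha=0.05):
    """Step-up: find the largest k with p_(k) <= k * alpha / m; reject the k smallest."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.nonzero(below)[0])  # 0-based rank of the largest qualifying p-value
        reject[order[: k_max + 1]] = True
    return reject

print(1 - (1 - 0.05) ** 20)   # 0.6415..., the family-wise error rate above

pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2]
print(bonferroni(pvals).sum(), holm(pvals).sum(), benjamini_hochberg(pvals).sum())  # 2 3 5
```

The counts illustrate the ordering stated above: Bonferroni is the most conservative, Holm rejects at least as many, and Benjamini-Hochberg, which controls a different error rate, rejects the most.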
196
+
197
+ ## Sample Size and Power
198
+
199
+ Before collecting data, determine the required sample size:
200
+
201
+ ```
202
+ Inputs needed:
203
+ 1. Desired power (typically 0.80)
204
+ 2. Significance level (α = 0.05)
205
+ 3. Expected effect size (from pilot study or literature)
206
+ 4. Type of test (t-test, ANOVA, regression, etc.)
207
+
208
+ Rule of thumb for two-sample t-test:
209
+ n per group ≈ 16 / d² (for 80% power, α = 0.05)
210
+ d = 0.5 → n ≈ 64 per group
211
+ d = 0.2 → n ≈ 400 per group
212
+ ```
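The 16/d² rule is a rounded form of the normal-approximation formula n per group = 2(z₁₋α/₂ + z₁₋β)²/d²; with α = 0.05 and 80% power the constant is 2(1.960 + 0.842)² ≈ 15.7. The sketch below evaluates that formula with `scipy.stats.norm`; exact t-test calculations (e.g., G*Power or `statsmodels`' `TTestIndPower`) give slightly larger n.

```python
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d), round(16 / d ** 2))
# 0.2 393 400
# 0.5 63 64
# 0.8 25 25
```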
213
+
214
+ ## Common Pitfalls
215
+
216
+ 1. **p-hacking**: Trying many analyses until p < 0.05. Fix: pre-register analyses.
217
+ 2. **Absence of evidence ≠ evidence of absence**: p > 0.05 does not prove H₀. Consider equivalence tests.
218
+ 3. **Correlation ≠ causation**: Regression coefficients are causal only with proper identification strategy.
219
+ 4. **Simpson's paradox**: A trend in subgroups can reverse when combined. Always check stratified analyses.
220
+ 5. **Overfitting**: Too many predictors relative to sample size. Rule of thumb: n ≥ 10-20 per predictor.
221
+
222
+ ## References
223
+
224
+ - Agresti, A. (2018). *Statistical Methods for the Social Sciences* (5th ed.). Pearson.
225
+ - Wasserstein, R. L., & Lazar, N. A. (2016). "The ASA Statement on p-Values." *The American Statistician*, 70(2), 129-133.
226
+ - Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences* (2nd ed.). Routledge.