@wentorai/research-plugins 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (415)
  1. package/README.md +22 -22
  2. package/curated/analysis/README.md +82 -56
  3. package/curated/domains/README.md +225 -69
  4. package/curated/literature/README.md +115 -46
  5. package/curated/research/README.md +106 -58
  6. package/curated/tools/README.md +107 -87
  7. package/curated/writing/README.md +92 -45
  8. package/mcp-configs/academic-db/alphafold-mcp.json +20 -0
  9. package/mcp-configs/academic-db/brightspace-mcp.json +21 -0
  10. package/mcp-configs/academic-db/climatiq-mcp.json +20 -0
  11. package/mcp-configs/academic-db/gibs-mcp.json +20 -0
  12. package/mcp-configs/academic-db/gis-mcp-server.json +22 -0
  13. package/mcp-configs/academic-db/google-earth-engine-mcp.json +21 -0
  14. package/mcp-configs/academic-db/m4-clinical-mcp.json +21 -0
  15. package/mcp-configs/academic-db/medical-mcp.json +21 -0
  16. package/mcp-configs/academic-db/nexonco-mcp.json +20 -0
  17. package/mcp-configs/academic-db/omop-mcp.json +20 -0
  18. package/mcp-configs/academic-db/onekgpd-mcp.json +20 -0
  19. package/mcp-configs/academic-db/openedu-mcp.json +20 -0
  20. package/mcp-configs/academic-db/opengenes-mcp.json +20 -0
  21. package/mcp-configs/academic-db/openstax-mcp.json +21 -0
  22. package/mcp-configs/academic-db/openstreetmap-mcp.json +21 -0
  23. package/mcp-configs/academic-db/opentargets-mcp.json +21 -0
  24. package/mcp-configs/academic-db/pdb-mcp.json +21 -0
  25. package/mcp-configs/academic-db/smithsonian-mcp.json +20 -0
  26. package/mcp-configs/ai-platform/magi-researchers.json +21 -0
  27. package/mcp-configs/ai-platform/mcp-academic-researcher.json +22 -0
  28. package/mcp-configs/ai-platform/open-paper-machine.json +21 -0
  29. package/mcp-configs/ai-platform/paper-intelligence.json +21 -0
  30. package/mcp-configs/ai-platform/paper-reader.json +21 -0
  31. package/mcp-configs/ai-platform/paperdebugger.json +21 -0
  32. package/mcp-configs/browser/exa-mcp.json +20 -0
  33. package/mcp-configs/browser/mcp-searxng.json +21 -0
  34. package/mcp-configs/browser/mcp-webresearch.json +20 -0
  35. package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
  36. package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
  37. package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
  38. package/mcp-configs/communication/discord-mcp.json +29 -0
  39. package/mcp-configs/communication/discourse-mcp.json +21 -0
  40. package/mcp-configs/communication/slack-mcp.json +29 -0
  41. package/mcp-configs/communication/telegram-mcp.json +28 -0
  42. package/mcp-configs/data-platform/automl-stat-mcp.json +21 -0
  43. package/mcp-configs/data-platform/jefferson-stats-mcp.json +22 -0
  44. package/mcp-configs/data-platform/mcp-excel-server.json +21 -0
  45. package/mcp-configs/data-platform/mcp-stata.json +21 -0
  46. package/mcp-configs/data-platform/mcpstack-jupyter.json +21 -0
  47. package/mcp-configs/data-platform/ml-mcp.json +21 -0
  48. package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +20 -0
  49. package/mcp-configs/data-platform/numpy-mcp.json +21 -0
  50. package/mcp-configs/database/neo4j-mcp.json +37 -0
  51. package/mcp-configs/database/postgres-mcp.json +28 -0
  52. package/mcp-configs/database/sqlite-mcp.json +29 -0
  53. package/mcp-configs/dev-platform/geogebra-mcp.json +21 -0
  54. package/mcp-configs/dev-platform/github-mcp.json +31 -0
  55. package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
  56. package/mcp-configs/dev-platform/latex-mcp-server.json +21 -0
  57. package/mcp-configs/dev-platform/manim-mcp.json +20 -0
  58. package/mcp-configs/dev-platform/mcp-echarts.json +20 -0
  59. package/mcp-configs/dev-platform/panel-viz-mcp.json +20 -0
  60. package/mcp-configs/dev-platform/paperbanana.json +20 -0
  61. package/mcp-configs/dev-platform/texflow-mcp.json +20 -0
  62. package/mcp-configs/dev-platform/texmcp.json +20 -0
  63. package/mcp-configs/dev-platform/typst-mcp.json +21 -0
  64. package/mcp-configs/dev-platform/vizro-mcp.json +20 -0
  65. package/mcp-configs/email/email-mcp.json +40 -0
  66. package/mcp-configs/email/gmail-mcp.json +37 -0
  67. package/mcp-configs/note-knowledge/local-faiss-mcp.json +21 -0
  68. package/mcp-configs/note-knowledge/mcp-memory-service.json +21 -0
  69. package/mcp-configs/note-knowledge/mcp-obsidian.json +23 -0
  70. package/mcp-configs/note-knowledge/mcp-ragdocs.json +20 -0
  71. package/mcp-configs/note-knowledge/mcp-summarizer.json +21 -0
  72. package/mcp-configs/note-knowledge/mediawiki-mcp.json +21 -0
  73. package/mcp-configs/note-knowledge/openzim-mcp.json +20 -0
  74. package/mcp-configs/note-knowledge/zettelkasten-mcp.json +21 -0
  75. package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +20 -0
  76. package/mcp-configs/reference-mgr/academix.json +20 -0
  77. package/mcp-configs/reference-mgr/arxiv-research-mcp.json +21 -0
  78. package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +19 -0
  79. package/mcp-configs/reference-mgr/google-scholar-mcp.json +20 -0
  80. package/mcp-configs/reference-mgr/mcp-paperswithcode.json +21 -0
  81. package/mcp-configs/reference-mgr/mcp-scholarly.json +20 -0
  82. package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +20 -0
  83. package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +20 -0
  84. package/mcp-configs/reference-mgr/mcp-zotero.json +21 -0
  85. package/mcp-configs/reference-mgr/mendeley-mcp.json +20 -0
  86. package/mcp-configs/reference-mgr/ncbi-mcp-server.json +22 -0
  87. package/mcp-configs/reference-mgr/onecite.json +21 -0
  88. package/mcp-configs/reference-mgr/paper-search-mcp.json +21 -0
  89. package/mcp-configs/reference-mgr/pubmed-search-mcp.json +21 -0
  90. package/mcp-configs/reference-mgr/scholar-mcp.json +21 -0
  91. package/mcp-configs/reference-mgr/scholar-multi-mcp.json +21 -0
  92. package/mcp-configs/reference-mgr/seerai.json +21 -0
  93. package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +21 -0
  94. package/mcp-configs/reference-mgr/sourcelibrary.json +20 -0
  95. package/mcp-configs/registry.json +178 -149
  96. package/mcp-configs/repository/dataverse-mcp.json +33 -0
  97. package/mcp-configs/repository/huggingface-mcp.json +29 -0
  98. package/openclaw.plugin.json +2 -2
  99. package/package.json +2 -2
  100. package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
  101. package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
  102. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
  103. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +184 -0
  104. package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
  105. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +171 -0
  106. package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
  107. package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
  108. package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
  109. package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
  110. package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
  111. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +192 -0
  112. package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
  113. package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
  114. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +267 -0
  115. package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
  116. package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
  117. package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
  118. package/skills/analysis/econometrics/stata-reference-guide/SKILL.md +293 -0
  119. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
  120. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +226 -0
  121. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +106 -0
  122. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
  123. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
  124. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
  125. package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
  126. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +100 -0
  127. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
  128. package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
  129. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
  130. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +197 -0
  131. package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
  132. package/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md +119 -0
  133. package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
  134. package/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md +146 -0
  135. package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
  136. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
  137. package/skills/domains/ai-ml/anomaly-detection-papers-guide/SKILL.md +167 -0
  138. package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md +178 -0
  139. package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
  140. package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md +173 -0
  141. package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
  142. package/skills/domains/ai-ml/graph-learning-papers-guide/SKILL.md +125 -0
  143. package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
  144. package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
  145. package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
  146. package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
  147. package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
  148. package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
  149. package/skills/domains/ai-ml/npcpy-research-guide/SKILL.md +137 -0
  150. package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
  151. package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
  152. package/skills/domains/ai-ml/responsible-ai-guide/SKILL.md +126 -0
  153. package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
  154. package/skills/domains/ai-ml/vmas-simulator-guide/SKILL.md +129 -0
  155. package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
  156. package/skills/domains/biomedical/clawbio-guide/SKILL.md +167 -0
  157. package/skills/domains/biomedical/clinical-dialogue-agents-guide/SKILL.md +145 -0
  158. package/skills/domains/biomedical/ena-sequence-api/SKILL.md +175 -0
  159. package/skills/domains/biomedical/genomas-guide/SKILL.md +126 -0
  160. package/skills/domains/biomedical/genotex-benchmark-guide/SKILL.md +125 -0
  161. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +161 -0
  162. package/skills/domains/biomedical/med-researcher-r1-guide/SKILL.md +146 -0
  163. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
  164. package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
  165. package/skills/domains/biomedical/ncbi-blast-api/SKILL.md +195 -0
  166. package/skills/domains/biomedical/ncbi-datasets-api/SKILL.md +220 -0
  167. package/skills/domains/biomedical/quickgo-api/SKILL.md +181 -0
  168. package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
  169. package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
  170. package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
  171. package/skills/domains/business/xpert-bi-guide/SKILL.md +84 -0
  172. package/skills/domains/chemistry/cactus-cheminformatics-guide/SKILL.md +89 -0
  173. package/skills/domains/chemistry/chemeagle-guide/SKILL.md +147 -0
  174. package/skills/domains/chemistry/chemgraph-agent-guide/SKILL.md +120 -0
  175. package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
  176. package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
  177. package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
  178. package/skills/domains/cs/ai-security-papers-guide/SKILL.md +103 -0
  179. package/skills/domains/cs/code-llm-papers-guide/SKILL.md +131 -0
  180. package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
  181. package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
  182. package/skills/domains/cs/gaussian-splatting-papers-guide/SKILL.md +158 -0
  183. package/skills/domains/cs/llm-aiops-guide/SKILL.md +70 -0
  184. package/skills/domains/cs/software-heritage-api/SKILL.md +200 -0
  185. package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
  186. package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
  187. package/skills/domains/economics/nber-working-papers-api/SKILL.md +177 -0
  188. package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
  189. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
  190. package/skills/domains/economics/repec-economics-api/SKILL.md +188 -0
  191. package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
  192. package/skills/domains/education/academic-study-methods/SKILL.md +228 -0
  193. package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
  194. package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
  195. package/skills/domains/education/edumcp-guide/SKILL.md +74 -0
  196. package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
  197. package/skills/domains/education/open-syllabus-api/SKILL.md +171 -0
  198. package/skills/domains/finance/akshare-finance-data/SKILL.md +207 -0
  199. package/skills/domains/finance/finsight-research-guide/SKILL.md +113 -0
  200. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +117 -0
  201. package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
  202. package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
  203. package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
  204. package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
  205. package/skills/domains/geoscience/pangaea-data-api/SKILL.md +197 -0
  206. package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
  207. package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
  208. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +232 -0
  209. package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
  210. package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
  211. package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
  212. package/skills/domains/law/caselaw-access-api/SKILL.md +149 -0
  213. package/skills/domains/law/legal-agent-skills-guide/SKILL.md +132 -0
  214. package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
  215. package/skills/domains/law/legal-research-methods/SKILL.md +190 -0
  216. package/skills/domains/law/opencontracts-guide/SKILL.md +168 -0
  217. package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
  218. package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
  219. package/skills/domains/math/lean-theorem-proving-guide/SKILL.md +140 -0
  220. package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
  221. package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
  222. package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
  223. package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
  224. package/skills/domains/pharma/madd-drug-discovery-guide/SKILL.md +153 -0
  225. package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
  226. package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
  227. package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
  228. package/skills/domains/social-science/ipums-microdata-api/SKILL.md +211 -0
  229. package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
  230. package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
  231. package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
  232. package/skills/domains/social-science/sociology-research-methods/SKILL.md +181 -0
  233. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +233 -0
  234. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
  235. package/skills/literature/discovery/papers-we-love-guide/SKILL.md +169 -0
  236. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
  237. package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
  238. package/skills/literature/fulltext/bioc-pmc-api/SKILL.md +146 -0
  239. package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
  240. package/skills/literature/fulltext/dataverse-api/SKILL.md +215 -0
  241. package/skills/literature/fulltext/hal-archive-api/SKILL.md +218 -0
  242. package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
  243. package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
  244. package/skills/literature/fulltext/osf-api/SKILL.md +212 -0
  245. package/skills/literature/fulltext/pmc-ftp-bulk-download/SKILL.md +182 -0
  246. package/skills/literature/fulltext/zotero-ai-butler-guide/SKILL.md +166 -0
  247. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +168 -0
  248. package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
  249. package/skills/literature/metadata/bibliometrix-guide/SKILL.md +164 -0
  250. package/skills/literature/metadata/crossref-event-data-api/SKILL.md +183 -0
  251. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +202 -0
  252. package/skills/literature/metadata/orkg-api/SKILL.md +153 -0
  253. package/skills/literature/metadata/plumx-metrics-api/SKILL.md +188 -0
  254. package/skills/literature/metadata/ror-organization-api/SKILL.md +208 -0
  255. package/skills/literature/metadata/sophosia-reference-guide/SKILL.md +110 -0
  256. package/skills/literature/metadata/viaf-authority-api/SKILL.md +209 -0
  257. package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
  258. package/skills/literature/metadata/zoplicate-dedup-guide/SKILL.md +147 -0
  259. package/skills/literature/metadata/zotero-actions-tags-guide/SKILL.md +212 -0
  260. package/skills/literature/metadata/zotmoov-guide/SKILL.md +120 -0
  261. package/skills/literature/metadata/zutilo-guide/SKILL.md +140 -0
  262. package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
  263. package/skills/literature/search/arxiv-cli-tools/SKILL.md +172 -0
  264. package/skills/literature/search/arxiv-osiris/SKILL.md +199 -0
  265. package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
  266. package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
  267. package/skills/literature/search/base-academic-search/SKILL.md +196 -0
  268. package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
  269. package/skills/literature/search/citeseerx-api/SKILL.md +183 -0
  270. package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
  271. package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
  272. package/skills/literature/search/eric-education-api/SKILL.md +199 -0
  273. package/skills/literature/search/findpapers-guide/SKILL.md +177 -0
  274. package/skills/literature/search/ieee-xplore-api/SKILL.md +177 -0
  275. package/skills/literature/search/lens-scholarly-api/SKILL.md +211 -0
  276. package/skills/literature/search/multi-database-literature-search/SKILL.md +198 -0
  277. package/skills/literature/search/open-library-api/SKILL.md +196 -0
  278. package/skills/literature/search/open-semantic-search-guide/SKILL.md +190 -0
  279. package/skills/literature/search/openaire-api/SKILL.md +141 -0
  280. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +107 -0
  281. package/skills/literature/search/papers-chat-guide/SKILL.md +194 -0
  282. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
  283. package/skills/literature/search/plos-open-access-api/SKILL.md +203 -0
  284. package/skills/literature/search/scielo-api/SKILL.md +182 -0
  285. package/skills/literature/search/share-research-api/SKILL.md +129 -0
  286. package/skills/literature/search/worldcat-search-api/SKILL.md +224 -0
  287. package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
  288. package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
  289. package/skills/research/automation/claude-academic-workflow-guide/SKILL.md +202 -0
  290. package/skills/research/automation/coexist-ai-guide/SKILL.md +149 -0
  291. package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
  292. package/skills/research/automation/foam-agent-guide/SKILL.md +203 -0
  293. package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
  294. package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
  295. package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
  296. package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
  297. package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
  298. package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
  299. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
  300. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +200 -0
  301. package/skills/research/deep-research/corvus-research-guide/SKILL.md +132 -0
  302. package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
  303. package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
  304. package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
  305. package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
  306. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +205 -0
  307. package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
  308. package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +185 -0
  309. package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +178 -0
  310. package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
  311. package/skills/research/deep-research/open-researcher-guide/SKILL.md +138 -0
  312. package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
  313. package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
  314. package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
  315. package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
  316. package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
  317. package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
  318. package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
  319. package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
  320. package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
  321. package/skills/research/methodology/parsifal-slr-guide/SKILL.md +154 -0
  322. package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
  323. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +169 -0
  324. package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
  325. package/skills/research/methodology/slr-automation-guide/SKILL.md +235 -0
  326. package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
  327. package/skills/research/paper-review/latte-review-guide/SKILL.md +175 -0
  328. package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
  329. package/skills/research/paper-review/paper-critique-framework/SKILL.md +181 -0
  330. package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
  331. package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
  332. package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
  333. package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
  334. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +110 -0
  335. package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
  336. package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
  337. package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
  338. package/skills/tools/diagram/clawphd-guide/SKILL.md +149 -0
  339. package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
  340. package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
  341. package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
  342. package/skills/tools/diagram/kroki-diagram-api/SKILL.md +198 -0
  343. package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
  344. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +201 -0
  345. package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
  346. package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
  347. package/skills/tools/document/large-document-reader/SKILL.md +202 -0
  348. package/skills/tools/document/md2pdf-xelatex/SKILL.md +212 -0
  349. package/skills/tools/document/openpaper-guide/SKILL.md +232 -0
  350. package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
  351. package/skills/tools/document/weknora-guide/SKILL.md +216 -0
  352. package/skills/tools/document/zotero-addon-market-guide/SKILL.md +108 -0
  353. package/skills/tools/document/zotero-night-theme-guide/SKILL.md +142 -0
  354. package/skills/tools/document/zotero-style-guide/SKILL.md +217 -0
  355. package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
  356. package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
  357. package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
  358. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +135 -0
  359. package/skills/tools/knowledge-graph/notero-zotero-notion-guide/SKILL.md +187 -0
  360. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +156 -0
  361. package/skills/tools/knowledge-graph/openspg-guide/SKILL.md +210 -0
  362. package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +84 -0
  363. package/skills/tools/knowledge-graph/zotero-markdb-connect-guide/SKILL.md +162 -0
  364. package/skills/tools/ocr-translate/latex-translation-guide/SKILL.md +176 -0
  365. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +198 -0
  366. package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
  367. package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
  368. package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
  369. package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
  370. package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
  371. package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
  372. package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
  373. package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
  374. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +182 -0
  375. package/skills/writing/citation/citation-assistant-skill/SKILL.md +192 -0
  376. package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
  377. package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
  378. package/skills/writing/citation/mendeley-api/SKILL.md +231 -0
  379. package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
  380. package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
  381. package/skills/writing/citation/onecite-reference-guide/SKILL.md +168 -0
  382. package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
  383. package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
  384. package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
  385. package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
  386. package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
  387. package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
  388. package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
  389. package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
  390. package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
  391. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
  392. package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
  393. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +200 -0
  394. package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
  395. package/skills/writing/composition/paperforge-guide/SKILL.md +205 -0
  396. package/skills/writing/composition/research-paper-writer/SKILL.md +226 -0
  397. package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
  398. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
  399. package/skills/writing/latex/academic-writing-latex/SKILL.md +285 -0
  400. package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
  401. package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
  402. package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
  403. package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
  404. package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
  405. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +140 -0
  406. package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
  407. package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
  408. package/skills/writing/templates/arxiv-preprint-template/SKILL.md +184 -0
  409. package/skills/writing/templates/elegant-paper-template/SKILL.md +141 -0
  410. package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
  411. package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
  412. package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
  413. package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
  414. package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
  415. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
@@ -0,0 +1,185 @@
+ ---
+ name: kolmogorov-arnold-networks-guide
+ description: "Papers and tutorials on KAN learnable activation networks"
+ metadata:
+   openclaw:
+     emoji: "📐"
+     category: "domains"
+     subcategory: "ai-ml"
+     keywords: ["KAN", "Kolmogorov-Arnold", "learnable activations", "spline networks", "neural architecture", "interpretability"]
+     source: "https://github.com/mintisan/awesome-kan"
+ ---
12
+
13
+ # Kolmogorov-Arnold Networks (KAN) Guide
14
+
15
+ ## Overview
16
+
17
+ Kolmogorov-Arnold Networks (KANs) are a neural network architecture that places learnable activation functions on edges, where an MLP has scalar weights, instead of fixed activations on nodes. Motivated by the Kolmogorov-Arnold representation theorem, KANs parameterize each edge activation as a B-spline. On some low-dimensional function-fitting and scientific tasks they match or exceed MLP accuracy with fewer parameters while remaining easier to interpret. This collection tracks the rapidly growing KAN literature.
18
+
19
+ ## Core Concept
20
+
21
+ ```
22
+ Traditional MLP:
23
+ x → [fixed activation(linear transform)] → y
24
+ Activations on nodes, weights on edges
25
+
26
+ KAN:
27
+ x → [learnable spline functions on edges] → sum → y
28
+ Each edge learns its own activation function (B-spline)
29
+
30
+ Kolmogorov-Arnold Theorem:
31
+ f(x₁,...,xₙ) = Σᵢ Φᵢ(Σⱼ φᵢⱼ(xⱼ)), with i = 0..2n and j = 1..n
+ Any continuous multivariate function on a bounded domain =
+ a finite composition of continuous univariate functions and addition
34
+ ```
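
To make the "learnable spline on each edge" idea concrete, the sketch below evaluates a single edge activation φ(x) = Σ cᵢ Bᵢ(x) with the Cox-de Boor recursion in plain Python. It is an illustration only, not pykan's implementation: the function names, the uniform knot layout on [-1, 1], and the coefficient values are assumptions, and the KAN paper additionally adds a SiLU base term and scale factors to each edge.

```python
def bspline_basis(x, knots, i, k):
    """Value of the i-th B-spline basis function of degree k at x (Cox-de Boor)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0 if left_den == 0 else (
        (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1))
    right = 0.0 if right_den == 0 else (
        (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1))
    return left + right


def uniform_knots(lo, hi, grid, k):
    """Uniform knot vector with k extra knots on each side of [lo, hi]."""
    h = (hi - lo) / grid
    return [lo + (j - k) * h for j in range(grid + 2 * k + 1)]


def edge_activation(x, coeffs, knots, k):
    """phi(x) = sum_i c_i * B_i(x); coeffs are the learnable parameters of one edge."""
    return sum(c * bspline_basis(x, knots, i, k) for i, c in enumerate(coeffs))


GRID, K = 5, 3                       # a grid of 5 intervals with cubic splines
knots = uniform_knots(-1.0, 1.0, GRID, K)
n_coeffs = GRID + K                  # 8 coefficients per edge
coeffs = [0.0, 0.2, -0.1, 0.5, 0.9, 0.3, -0.4, 0.1]  # hypothetical "learned" values

print(n_coeffs, edge_activation(0.3, coeffs, knots, K))
```

Training a KAN amounts to adjusting the `coeffs` of every edge by gradient descent; the basis functions themselves stay fixed for a given grid.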
35
+
36
+ ## Key Papers
37
+
38
+ ```bibtex
39
+ @article{liu2024kan,
40
+ title={KAN: Kolmogorov-Arnold Networks},
41
+ author={Liu, Ziming and Wang, Yixuan and Vaidya, Sachin and
42
+ Ruehle, Fabian and Halverson, James and
43
+ Solja{\v{c}}i{\'c}, Marin and Hou, Thomas Y. and
44
+ Tegmark, Max},
45
+ journal={arXiv preprint arXiv:2404.19756},
46
+ year={2024}
47
+ }
48
+ ```
49
+
50
+ ## Implementation
51
+
52
+ ```python
+ # Using pykan (official implementation)
+ # pip install pykan
+
+ from kan import KAN
+ import torch
+
+ # Create a KAN model
+ model = KAN(
+     width=[2, 5, 1],  # Input: 2, Hidden: 5, Output: 1
+     grid=5,           # Spline grid resolution (number of intervals)
+     k=3,              # Spline order (cubic)
+ )
+
+ # Training data
+ x = torch.randn(1000, 2)
+ y = torch.sin(x[:, 0]) + torch.cos(x[:, 1])
+ y = y.unsqueeze(1)
+
+ # Train (recent pykan releases name this method `fit`; older ones used `train`)
+ dataset = {"train_input": x[:800], "train_label": y[:800],
+            "test_input": x[800:], "test_label": y[800:]}
+ model.fit(dataset, steps=100, lr=0.01)
+
+ # Visualize learned functions
+ model.plot()
+
+ # Prune and simplify
+ model = model.prune()
+ model.plot()
82
+ ```
83
+
84
+ ## KAN vs MLP Comparison
85
+
86
+ ```python
+ # Comparison on function approximation
+ from kan import KAN
+ import torch.nn as nn
+
+ # KAN: learnable activations on edges
+ kan_model = KAN(width=[2, 5, 1], grid=5, k=3)
+ # Spline coefficients: 15 edges * (grid + k = 8) = 120,
+ # plus per-edge scale parameters (roughly 150 in total)
+
+ # MLP: fixed activations on nodes
+ class MLP(nn.Module):
+     def __init__(self):
+         super().__init__()
+         self.net = nn.Sequential(
+             nn.Linear(2, 50),
+             nn.ReLU(),
+             nn.Linear(50, 50),
+             nn.ReLU(),
+             nn.Linear(50, 1),
+         )
+
+     def forward(self, x):
+         return self.net(x)
+
+ mlp_model = MLP()
+ # Parameters: 2,751 (150 + 2,550 + 51)
+
+ # KAN advantages:
+ # - Often fewer parameters for comparable accuracy on smooth, low-dimensional targets
+ # - Interpretable (visualize learned functions)
+ # - Better for scientific discovery (symbolic regression)
+ # - Grid refinement for progressive accuracy
+
+ # MLP advantages:
+ # - Faster training
+ # - Better scaling to high dimensions
+ # - More mature tooling and optimization
122
+ ```
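
The parameter figures quoted in the comparison can be checked by hand. The sketch below counts MLP weights and biases and, for the KAN, only the spline coefficients (one set of `grid + k` per edge). The helper names are hypothetical, and pykan's real count is somewhat higher because it also stores scaling and other per-edge parameters; `sum(p.numel() for p in model.parameters())` gives the exact number for any PyTorch model.

```python
def mlp_param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def kan_spline_coeff_count(widths, grid, k):
    """Spline coefficients only: one set of (grid + k) coefficients per edge."""
    edges = sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))
    return edges * (grid + k)


print(mlp_param_count([2, 50, 50, 1]))           # 2751
print(kan_spline_coeff_count([2, 5, 1], 5, 3))   # 120
```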
123
+
124
+ ## Extensions and Variants
125
+
126
+ | Variant | Innovation | Application |
127
+ |---------|-----------|-------------|
128
+ | **KAN 2.0** | MultKAN with multiplication nodes | Improved scaling |
129
+ | **Temporal KAN** | Time-series adaptation | Forecasting |
130
+ | **ConvKAN** | KAN + convolutions | Image processing |
131
+ | **GraphKAN** | KAN on graph structures | Graph learning |
132
+ | **FourierKAN** | Fourier basis instead of splines | Periodic functions |
133
+ | **WavKAN** | Wavelet-based activations | Signal processing |
134
+ | **BSRBF-KAN** | B-spline + radial basis | Function approximation |
135
+
136
+ ## Scientific Applications
137
+
138
+ ```python
+ # KAN for symbolic regression (discovering equations)
+ from kan import KAN
+ import torch
+
+ # Generate data from an equation treated as unknown: f(x, y) = x * exp(y)
+ x = torch.rand(1000, 2) * 2
+ y = x[:, 0:1] * torch.exp(x[:, 1:2])
+
+ dataset = {"train_input": x[:800], "train_label": y[:800],
+            "test_input": x[800:], "test_label": y[800:]}
+
+ model = KAN(width=[2, 1, 1], grid=10, k=3)
+ model.fit(dataset, steps=200)  # named `train` in older pykan releases
+
+ # Symbolic fitting: snap each learned edge function to a known form
+ model.auto_symbolic()
+ print(model.symbolic_formula())
+ # With a good fit the result is equivalent to x₁ * exp(x₂);
+ # recovery is not guaranteed and depends on training and the candidate functions
+ # KAN can discover symbolic expressions from data
157
+ ```
158
+
159
+ ## Research Landscape
160
+
161
+ ```markdown
162
+ ### Key Research Directions
163
+ 1. **Scaling** — Making KANs work at LLM scale
164
+ 2. **Efficiency** — Reducing spline computation overhead
165
+ 3. **Theory** — Understanding approximation guarantees
166
+ 4. **Architecture search** — Optimal KAN topologies
167
+ 5. **Hybrid models** — Combining KAN and MLP strengths
168
+ 6. **Domain applications** — Physics, chemistry, biology
169
+ 7. **Interpretability** — Extracting symbolic knowledge
170
+ ```
171
+
172
+ ## Use Cases
173
+
174
+ 1. **Scientific discovery**: Extract equations from experimental data
175
+ 2. **Function approximation**: High-accuracy low-parameter models
176
+ 3. **Interpretable ML**: Understand what the network learned
177
+ 4. **Physics-informed**: Embed physical constraints in activations
178
+ 5. **Education**: Teach alternative neural network architectures
179
+
180
+ ## References
181
+
182
+ - [awesome-kan](https://github.com/mintisan/awesome-kan)
183
+ - [KAN Paper](https://arxiv.org/abs/2404.19756)
184
+ - [pykan Implementation](https://github.com/KindXiaoming/pykan)
185
+ - [KAN 2.0 Paper](https://arxiv.org/abs/2408.10205)
@@ -0,0 +1,124 @@
1
+ ---
2
+ name: llm-from-scratch-guide
3
+ description: "Build a ChatGPT-like LLM from scratch using PyTorch step by step"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "🧱"
7
+ category: "domains"
8
+ subcategory: "ai-ml"
9
+ keywords: ["llm", "pytorch", "transformer", "gpt", "pretraining", "finetuning"]
10
+ source: "https://github.com/rasbt/LLMs-from-scratch"
11
+ ---
12
+
13
+ # LLM From Scratch Guide
14
+
15
+ ## Overview
16
+
17
+ LLMs-from-scratch is a comprehensive educational repository with over 87,000 stars on GitHub that teaches you how to build a ChatGPT-like large language model from the ground up using PyTorch. Created by Sebastian Raschka, a machine learning researcher and author, the project provides a complete pipeline covering data preparation, tokenization, attention mechanisms, pretraining, and instruction finetuning.
18
+
19
+ Unlike tutorials that treat LLMs as black boxes, this project demystifies every component by walking through the full implementation. Each chapter corresponds to a Jupyter notebook with clear explanations, diagrams, and runnable code. The repository accompanies the book "Build a Large Language Model (From Scratch)" and serves as a standalone learning resource for researchers and engineers who want deep understanding of transformer-based language models.
20
+
21
+ The project is particularly valuable for academic researchers who need to understand the internals of LLMs for their own research, whether that involves modifying architectures, running ablation studies, or developing domain-specific language models for scientific applications.
22
+
23
+ ## Installation and Setup
24
+
25
+ Clone the repository and set up a Python environment with the required dependencies:
26
+
27
+ ```bash
28
+ git clone https://github.com/rasbt/LLMs-from-scratch.git
29
+ cd LLMs-from-scratch
30
+
31
+ # Create a virtual environment
32
+ python -m venv llm-env
33
+ source llm-env/bin/activate
34
+
35
+ # Install dependencies
36
+ pip install -r requirements.txt
37
+ ```
38
+
39
+ The project requires Python 3.10+ and PyTorch 2.0+. For GPU-accelerated training, ensure you have CUDA installed. The notebooks can also run on CPU for smaller model configurations, though training times will be significantly longer.
40
+
41
+ Key dependencies include:
42
+
43
+ - **PyTorch** >= 2.0 for model implementation and training
44
+ - **tiktoken** for BPE tokenization compatible with OpenAI models
45
+ - **matplotlib** for training visualization
46
+ - **jupyter** for interactive notebook execution
47
+
48
+ ## Core Learning Pipeline
49
+
50
+ The project is organized into sequential chapters that build on each other:
51
+
52
+ ### Chapter 1: Understanding Large Language Models
53
+ Covers the conceptual foundations of LLMs, including the transformer architecture, the difference between encoder and decoder models, and how pretraining and finetuning work at a high level.
54
+
55
+ ### Chapter 2: Working with Text Data
56
+ Implements text tokenization from scratch, including byte-pair encoding (BPE). You build a custom tokenizer and learn how text is converted to numerical representations:
57
+
58
+ ```python
59
+ # Tokenization example from the project
60
+ import tiktoken
61
+
62
+ tokenizer = tiktoken.get_encoding("gpt2")
63
+ text = "Large language models are fascinating."
64
+ token_ids = tokenizer.encode(text)
65
+ decoded = tokenizer.decode(token_ids)
66
+ ```
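
The `tiktoken` call above uses a vocabulary that was already trained. As a rough picture of how byte-pair encoding learns its merges, here is a toy character-level sketch in plain Python. It is not the project's tokenizer or tiktoken's algorithm (real BPE works on bytes, pre-splits text, and handles ties and special tokens); all function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of symbols, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    tokens, merges = list(text), []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges


tokens, merges = train_bpe("abababcab", num_merges=2)
print(merges)   # [('a', 'b'), ('ab', 'ab')]
print(tokens)   # ['abab', 'ab', 'c', 'ab']
```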
67
+
68
+ ### Chapter 3: Coding Attention Mechanisms
69
+ Implements self-attention, multi-head attention, and causal (masked) attention from scratch. This is the core computational primitive of transformers:
70
+
71
+ ```python
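+ # Simplified multi-head attention (constructor only; the forward pass is omitted)
+ import torch.nn as nn
+
+ class MultiHeadAttention(nn.Module):
+     def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.0):
+         super().__init__()
+         assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+         # Separate projections for queries, keys, and values
+         self.W_query = nn.Linear(d_in, d_out, bias=False)
+         self.W_key = nn.Linear(d_in, d_out, bias=False)
+         self.W_value = nn.Linear(d_in, d_out, bias=False)
+         self.out_proj = nn.Linear(d_out, d_out)  # combines the heads' outputs
+         self.num_heads = num_heads
+         self.head_dim = d_out // num_heads
+         # context_length and dropout are used by the causal mask and dropout
+         # layer in the full version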
82
+ ```
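
The constructor above only sets up the projections. The causal masking step itself can be sketched without any framework, as below: a single head, plain Python lists, and position `t` attending only to positions `0..t`. This is a minimal illustration under those assumptions, not the book's implementation; in the real model the queries, keys, and values come from the learned projections, and the example vectors here are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention; position t sees only positions <= t."""
    d_k = len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        # Score against positions 0..t only; later positions are masked out entirely
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d_k)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical 3-token sequence with 2-dimensional queries, keys, and values
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = causal_self_attention(q, k, v)
print(out[0])   # [1.0, 2.0]: the first token can attend only to itself
```

Changing the value vector of a later token leaves earlier outputs untouched, which is the property that makes next-token training valid.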
83
+
84
+ ### Chapter 4: Implementing a GPT Model
85
+ Assembles the full GPT architecture using the attention mechanism, layer normalization, feed-forward networks, and positional embeddings.
86
+
87
+ ### Chapter 5: Pretraining on Unlabeled Data
88
+ Trains the GPT model on a text corpus using next-token prediction. Covers the training loop, loss computation, learning rate scheduling, and gradient clipping.
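
Next-token prediction means each training target is the input sequence shifted by one position, scored with cross-entropy. The following sketch shows both pieces in plain Python. The function names, window size, and token ids are illustrative assumptions rather than the project's code, which builds the same pairs with a PyTorch `Dataset`.

```python
import math


def make_next_token_pairs(token_ids, context_length, stride):
    """Slide a window over the ids; each target is the input shifted one position left."""
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


pairs = make_next_token_pairs([10, 11, 12, 13, 14, 15], context_length=4, stride=1)
print(pairs[0])   # ([10, 11, 12, 13], [11, 12, 13, 14])

# An untrained model that is uniform over 4 tokens has loss log(4)
loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
print(loss)
```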
89
+
90
+ ### Chapter 6: Finetuning for Text Classification
91
+ Adapts the pretrained model for downstream classification tasks, demonstrating how to add a classification head and finetune on labeled data.
92
+
93
+ ### Chapter 7: Instruction Finetuning
94
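+ Converts the pretrained model into an instruction-following assistant using supervised finetuning on instruction-response pairs, similar to the supervised finetuning stage used for ChatGPT-style assistants (which typically add preference-based training afterwards).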
95
+
96
+ ## Research Applications
97
+
98
+ This resource is invaluable for several research scenarios:
99
+
100
+ - **Architecture ablation studies**: Modify individual components (attention heads, layer count, embedding dimensions) and measure their impact on performance
101
+ - **Domain-specific pretraining**: Use the pipeline to pretrain models on scientific corpora (biomedical literature, physics papers, chemical databases)
102
+ - **Tokenizer research**: Experiment with different tokenization strategies for specialized vocabularies
103
+ - **Efficient training methods**: Test techniques like gradient accumulation, mixed precision, and learning rate warmup
104
+ - **Interpretability research**: Inspect attention patterns and intermediate representations at every layer
105
+
106
+ For researchers working with limited compute, the project includes configurations for small models (124M parameters) that can be trained on a single GPU in reasonable time, making it practical for experimentation and prototyping.
107
+
108
+ ## Integration with Research Workflows
109
+
110
+ Combine this project with other tools in your research stack:
111
+
112
+ - Use **Weights & Biases** or **MLflow** for experiment tracking during pretraining runs
113
+ - Export trained models to **Hugging Face Hub** for sharing and reproducibility
114
+ - Integrate with **PyTorch Lightning** for distributed training across multiple GPUs
115
+ - Apply **LoRA** or **QLoRA** adapters from the bonus chapters for parameter-efficient finetuning
116
+
117
+ The bonus materials in the repository cover additional topics like DPO (Direct Preference Optimization), loading pretrained weights from Hugging Face, and converting models between different formats.
118
+
119
+ ## References
120
+
121
+ - Repository: https://github.com/rasbt/LLMs-from-scratch
122
+ - Book: "Build a Large Language Model (From Scratch)" by Sebastian Raschka (Manning, 2024)
123
+ - Author's blog: https://sebastianraschka.com/blog/
124
+ - PyTorch documentation: https://pytorch.org/docs/stable/
@@ -0,0 +1,295 @@
1
+ ---
2
+ name: ml-pipeline-guide
3
+ description: "Build and deploy reproducible production ML pipelines for research"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "🔧"
7
+ category: "domains"
8
+ subcategory: "ai-ml"
9
+ keywords: ["MLOps", "pipeline", "deployment", "reproducibility", "feature engineering", "CI/CD"]
10
+ source: "https://github.com/mlflow/mlflow"
11
+ ---
12
+
13
+ # ML Pipeline Guide
14
+
15
+ ## Overview
16
+
17
+ Machine learning research increasingly demands reproducible, end-to-end pipelines that go beyond a single training script. A research ML pipeline encompasses data ingestion, feature engineering, model training, evaluation, experiment tracking, and artifact management. Without a structured pipeline, research results become difficult to reproduce, ablation studies become error-prone, and collaborators cannot build on prior work.
18
+
19
+ This guide covers the practical tools and patterns for building ML pipelines in an academic research context. The focus is on reproducibility, experiment tracking, and the transition from notebook prototyping to structured experiments. The patterns use MLflow, DVC, and standard Python tooling -- chosen because they are open source, widely adopted in published research, and require minimal infrastructure.
20
+
21
+ Unlike industry MLOps guides that emphasize deployment at scale, this guide prioritizes the research workflow: running many experiments, tracking what changed between runs, and producing results that reviewers can verify.
22
+
23
+ ## Pipeline Architecture
24
+
25
+ A research ML pipeline typically has five stages:
26
+
27
+ ```
+ Data Ingestion → Feature Engineering → Training → Evaluation → Artifact Storage
+ │                │                     │          │            │
+ ├── raw data     ├── transforms        ├── model  ├── metrics  ├── models
+ ├── splits       ├── features          ├── logs   ├── plots    ├── configs
+ └── metadata     └── cache             └── ckpts  └── tables   └── reports
33
+ ```
34
+
35
+ ### Directory Structure for Reproducible Research
36
+
37
+ ```
+ project/
+ ├── configs/
+ │   ├── base.yaml               # Default hyperparameters
+ │   ├── experiment_001.yaml     # Experiment-specific overrides
+ │   └── sweep.yaml              # Hyperparameter search space
+ ├── data/
+ │   ├── raw/                    # Immutable original data
+ │   ├── processed/              # Cleaned and transformed
+ │   └── splits/                 # Train/val/test splits (versioned)
+ ├── src/
+ │   ├── data/                   # Data loading and preprocessing
+ │   ├── features/               # Feature engineering
+ │   ├── models/                 # Model definitions
+ │   ├── training/               # Training loops
+ │   └── evaluation/             # Metrics and visualization
+ ├── experiments/                # MLflow/W&B experiment logs
+ ├── notebooks/                  # Exploratory analysis only
+ ├── tests/                      # Unit tests for pipeline components
+ ├── Makefile                    # Reproducible commands
+ ├── requirements.txt            # Pinned dependencies
+ └── dvc.yaml                    # Data version control pipeline
59
+ ```
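
The split between `base.yaml` and experiment-specific override files implies a recursive merge: an experiment file states only what differs from the defaults. Hydra and OmegaConf provide this out of the box; the sketch below shows the idea with plain dictionaries. The function name and the example config contents are assumptions for illustration.

```python
import copy


def merge_configs(base, override):
    """Recursively overlay `override` on `base` without mutating either."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents of configs/base.yaml and configs/experiment_001.yaml
base = {"model": {"name": "resnet50", "hidden_dim": 256},
        "training": {"lr": 1e-3, "epochs": 100}}
experiment_001 = {"training": {"lr": 1e-4}}

config = merge_configs(base, experiment_001)
print(config["training"])   # {'lr': 0.0001, 'epochs': 100}
```

Logging the merged result, rather than the override file alone, records the full set of values a run actually used.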
60
+
61
+ ## Experiment Tracking with MLflow
62
+
63
+ ```python
+ import sys
+
+ import mlflow
+ import mlflow.pytorch
+ import torch
+
+ # Assumed to be defined elsewhere in the project: build_model, train_one_epoch,
+ # evaluate, save_evaluation_plots, the data loaders, optimizer, and config_path.
+
+ def run_experiment(config: dict):
+     """Run a single experiment with full tracking."""
+     mlflow.set_experiment(config["experiment_name"])
+
+     with mlflow.start_run(run_name=config.get("run_name")):
+         # Log configuration
+         mlflow.log_params({
+             "model": config["model_name"],
+             "learning_rate": config["lr"],
+             "batch_size": config["batch_size"],
+             "epochs": config["epochs"],
+             "optimizer": config["optimizer"],
+             "seed": config["seed"],
+         })
+
+         # Log environment
+         mlflow.log_param("python_version", sys.version)
+         mlflow.log_param("torch_version", torch.__version__)
+         mlflow.log_param("cuda_version", torch.version.cuda)
+
+         # Training
+         model = build_model(config)
+         for epoch in range(config["epochs"]):
+             train_loss = train_one_epoch(model, train_loader, optimizer)
+             val_loss, val_metrics = evaluate(model, val_loader)
+
+             mlflow.log_metrics({
+                 "train_loss": train_loss,
+                 "val_loss": val_loss,
+                 **{f"val_{k}": v for k, v in val_metrics.items()},
+             }, step=epoch)
+
+         # Log final model
+         mlflow.pytorch.log_model(model, "model")
+
+         # Log artifacts (plots, configs)
+         mlflow.log_artifact(config_path)
+         save_evaluation_plots(model, test_loader, "plots/")
+         mlflow.log_artifacts("plots/")
+
+     return val_metrics
109
+ ```
110
+
111
+ ## Data Versioning with DVC
112
+
113
+ ```yaml
+ # dvc.yaml -- Pipeline definition
+ stages:
+   prepare_data:
+     cmd: python src/data/prepare.py --config configs/base.yaml
+     deps:
+       - src/data/prepare.py
+       - data/raw/
+     outs:
+       - data/processed/
+     params:
+       - configs/base.yaml:
+           - data.split_ratio
+           - data.random_seed
+
+   extract_features:
+     cmd: python src/features/extract.py --config configs/base.yaml
+     deps:
+       - src/features/extract.py
+       - data/processed/
+     outs:
+       - data/features/
+     params:
+       - configs/base.yaml:
+           - features
+
+   train:
+     cmd: python src/training/train.py --config configs/base.yaml
+     deps:
+       - src/training/train.py
+       - src/models/
+       - data/features/
+     outs:
+       - models/
+     metrics:
+       - metrics.json:
+           cache: false
+     plots:
+       - plots/training_curve.csv:
+           x: epoch
+           y: loss
154
+ ```
155
+
156
+ ```bash
157
+ # Reproduce the full pipeline
158
+ dvc repro
159
+
160
+ # Compare experiments
161
+ dvc metrics diff
162
+
163
+ # Push data to remote storage
164
+ dvc push
165
+ ```
166
+
167
+ ## Configuration Management with Hydra
168
+
169
+ ```python
+ import hydra
+ from omegaconf import DictConfig, OmegaConf
+
+ # build_model and train are assumed to be project functions defined elsewhere.
+
+ # config_path is resolved relative to the file that defines main()
+ @hydra.main(config_path="configs", config_name="base", version_base=None)
+ def main(cfg: DictConfig):
+     print(OmegaConf.to_yaml(cfg))
+
+     model = build_model(
+         name=cfg.model.name,
+         hidden_dim=cfg.model.hidden_dim,
+         num_layers=cfg.model.num_layers,
+     )
+
+     train(
+         model=model,
+         lr=cfg.training.lr,
+         epochs=cfg.training.epochs,
+         batch_size=cfg.training.batch_size,
+     )
+
+ if __name__ == "__main__":
+     main()
+
+ # Override from command line:
+ # python train.py training.lr=1e-4 model.hidden_dim=512
+ # python train.py --multirun training.lr=1e-3,1e-4,1e-5
193
+ ```
194
+
195
+ ```yaml
+ # configs/base.yaml
+ model:
+   name: resnet50
+   hidden_dim: 256
+   num_layers: 4
+
+ training:
+   lr: 0.001  # written as a decimal so plain YAML 1.1 loaders also read it as a float
+   epochs: 100
+   batch_size: 32
+   optimizer: adamw
+   weight_decay: 0.01
+
+ data:
+   dataset: cifar10
+   split_ratio: [0.8, 0.1, 0.1]
+   random_seed: 42
+   augmentation: true
214
+ ```
215
+
216
+ ## Feature Engineering Patterns
217
+
218
+ ```python
+ from sklearn.pipeline import Pipeline
+ from sklearn.preprocessing import StandardScaler, OneHotEncoder
+ from sklearn.compose import ColumnTransformer
+ from sklearn.impute import SimpleImputer
+ import joblib
+
+ def build_feature_pipeline(numeric_cols: list, categorical_cols: list) -> ColumnTransformer:
+     """Build a reproducible feature engineering pipeline."""
+     numeric_transformer = Pipeline([
+         ("imputer", SimpleImputer(strategy="median")),
+         ("scaler", StandardScaler()),
+     ])
+
+     categorical_transformer = Pipeline([
+         ("imputer", SimpleImputer(strategy="most_frequent")),
+         ("encoder", OneHotEncoder(handle_unknown="ignore", sparse_output=False)),
+     ])
+
+     preprocessor = ColumnTransformer([
+         ("num", numeric_transformer, numeric_cols),
+         ("cat", categorical_transformer, categorical_cols),
+     ])
+
+     return preprocessor
+
+ # Save and load for reproducibility
+ # (numeric_cols, categorical_cols, and X_train come from your own dataset)
+ preprocessor = build_feature_pipeline(numeric_cols, categorical_cols)
+ preprocessor.fit(X_train)  # fit on the training split only to avoid leakage
+ joblib.dump(preprocessor, "artifacts/preprocessor.pkl")
+ # Later: preprocessor = joblib.load("artifacts/preprocessor.pkl")
248
+ ```
249
+
250
+ ## Makefile for Reproducibility
251
+
252
+ ```makefile
+ .PHONY: setup data train evaluate all sweep clean
+
+ setup:
+ 	pip install -r requirements.txt
+ 	dvc pull
+
+ data:
+ 	python src/data/prepare.py --config configs/base.yaml
+
+ train:
+ 	python src/training/train.py --config configs/base.yaml
+
+ evaluate:
+ 	python src/evaluation/evaluate.py --config configs/base.yaml
+
+ all: setup data train evaluate
+
+ sweep:
+ 	python src/training/train.py --multirun \
+ 		training.lr=1e-3,1e-4,1e-5 \
+ 		model.hidden_dim=128,256,512
+
+ clean:
+ 	rm -rf outputs/ multirun/ __pycache__/
277
+ ```
278
+
279
+ ## Best Practices
280
+
281
+ - **Never modify raw data.** All transformations should be scripted and reproducible.
282
+ - **Pin every dependency version** including CUDA, cuDNN, and OS-level libraries.
283
+ - **Separate configuration from code.** Use YAML/JSON configs, not hardcoded values.
284
+ - **Track experiments from day one.** Retrofitting experiment tracking is painful.
285
+ - **Write tests for data preprocessing.** Shape mismatches and silent data corruption are common.
286
+ - **Use `Makefile` or `dvc repro`** so any collaborator can reproduce results with one command.
287
+ - **Version your data alongside your code** using DVC, Git-LFS, or cloud storage with manifests.
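
As one example of the preprocessing tests recommended above, the sketch below implements a seeded train/val/test split and the checks a unit test might run on it: no sample lost or duplicated, and the same seed giving the same split. The function names are hypothetical; the ratio and seed mirror the `data.split_ratio` and `data.random_seed` values in `configs/base.yaml`.

```python
import random


def split_indices(n_samples, split_ratio, seed):
    """Shuffle indices with a fixed seed and cut them into train/val/test parts."""
    if abs(sum(split_ratio) - 1.0) > 1e-9:
        raise ValueError("split_ratio must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(split_ratio[0] * n_samples)
    n_val = int(split_ratio[1] * n_samples)
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


def check_split(n_samples=100, split_ratio=(0.8, 0.1, 0.1), seed=42):
    """The kind of checks a preprocessing unit test might make."""
    train, val, test = split_indices(n_samples, split_ratio, seed)
    # Nothing lost or duplicated across the three parts
    assert sorted(train + val + test) == list(range(n_samples))
    # Same seed, same split
    assert split_indices(n_samples, split_ratio, seed) == (train, val, test)
    return len(train), len(val), len(test)


print(check_split())   # (80, 10, 10)
```

Using a local `random.Random(seed)` instead of the global generator keeps the split stable even when other code draws random numbers first.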
288
+
289
+ ## References
290
+
291
+ - [MLflow documentation](https://mlflow.org/docs/latest/) -- Experiment tracking and model registry
292
+ - [DVC documentation](https://dvc.org/doc) -- Data version control for ML
293
+ - [Hydra documentation](https://hydra.cc/) -- Configuration management framework
294
+ - [Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/) -- Project structure template
295
+ - [Made With ML](https://madewithml.com/) -- MLOps best practices for researchers