@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (252)
  1. package/LICENSE +21 -0
  2. package/README.md +204 -0
  3. package/curated/analysis/README.md +64 -0
  4. package/curated/domains/README.md +104 -0
  5. package/curated/literature/README.md +53 -0
  6. package/curated/research/README.md +62 -0
  7. package/curated/tools/README.md +87 -0
  8. package/curated/writing/README.md +61 -0
  9. package/index.ts +39 -0
  10. package/mcp-configs/academic-db/ChatSpatial.json +17 -0
  11. package/mcp-configs/academic-db/academia-mcp.json +17 -0
  12. package/mcp-configs/academic-db/academic-paper-explorer.json +17 -0
  13. package/mcp-configs/academic-db/academic-search-mcp-server.json +17 -0
  14. package/mcp-configs/academic-db/agentinterviews-mcp.json +17 -0
  15. package/mcp-configs/academic-db/all-in-mcp.json +17 -0
  16. package/mcp-configs/academic-db/apple-health-mcp.json +17 -0
  17. package/mcp-configs/academic-db/arxiv-latex-mcp.json +17 -0
  18. package/mcp-configs/academic-db/arxiv-mcp-server.json +17 -0
  19. package/mcp-configs/academic-db/bgpt-mcp.json +17 -0
  20. package/mcp-configs/academic-db/biomcp.json +17 -0
  21. package/mcp-configs/academic-db/biothings-mcp.json +17 -0
  22. package/mcp-configs/academic-db/catalysishub-mcp-server.json +17 -0
  23. package/mcp-configs/academic-db/clinicaltrialsgov-mcp-server.json +17 -0
  24. package/mcp-configs/academic-db/deep-research-mcp.json +17 -0
  25. package/mcp-configs/academic-db/dicom-mcp.json +17 -0
  26. package/mcp-configs/academic-db/enrichr-mcp-server.json +17 -0
  27. package/mcp-configs/academic-db/fec-mcp-server.json +17 -0
  28. package/mcp-configs/academic-db/fhir-mcp-server-themomentum.json +17 -0
  29. package/mcp-configs/academic-db/fhir-mcp.json +19 -0
  30. package/mcp-configs/academic-db/gget-mcp.json +17 -0
  31. package/mcp-configs/academic-db/google-researcher-mcp.json +17 -0
  32. package/mcp-configs/academic-db/idea-reality-mcp.json +17 -0
  33. package/mcp-configs/academic-db/legiscan-mcp.json +19 -0
  34. package/mcp-configs/academic-db/lex.json +17 -0
  35. package/mcp-configs/ai-platform/Adaptive-Graph-of-Thoughts-MCP-server.json +17 -0
  36. package/mcp-configs/ai-platform/ai-counsel.json +17 -0
  37. package/mcp-configs/ai-platform/atlas-mcp-server.json +17 -0
  38. package/mcp-configs/ai-platform/counsel-mcp.json +17 -0
  39. package/mcp-configs/ai-platform/cross-llm-mcp.json +17 -0
  40. package/mcp-configs/ai-platform/gptr-mcp.json +17 -0
  41. package/mcp-configs/browser/decipher-research-agent.json +17 -0
  42. package/mcp-configs/browser/deep-research.json +17 -0
  43. package/mcp-configs/browser/everything-claude-code.json +17 -0
  44. package/mcp-configs/browser/gpt-researcher.json +17 -0
  45. package/mcp-configs/browser/heurist-agent-framework.json +17 -0
  46. package/mcp-configs/data-platform/4everland-hosting-mcp.json +17 -0
  47. package/mcp-configs/data-platform/context-keeper.json +17 -0
  48. package/mcp-configs/data-platform/context7.json +19 -0
  49. package/mcp-configs/data-platform/contextstream-mcp.json +17 -0
  50. package/mcp-configs/data-platform/email-mcp.json +17 -0
  51. package/mcp-configs/note-knowledge/ApeRAG.json +17 -0
  52. package/mcp-configs/note-knowledge/In-Memoria.json +17 -0
  53. package/mcp-configs/note-knowledge/agent-memory.json +17 -0
  54. package/mcp-configs/note-knowledge/aimemo.json +17 -0
  55. package/mcp-configs/note-knowledge/biel-mcp.json +19 -0
  56. package/mcp-configs/note-knowledge/cognee.json +17 -0
  57. package/mcp-configs/note-knowledge/context-awesome.json +17 -0
  58. package/mcp-configs/note-knowledge/context-mcp.json +17 -0
  59. package/mcp-configs/note-knowledge/conversation-handoff-mcp.json +17 -0
  60. package/mcp-configs/note-knowledge/cortex.json +17 -0
  61. package/mcp-configs/note-knowledge/devrag.json +17 -0
  62. package/mcp-configs/note-knowledge/easy-obsidian-mcp.json +17 -0
  63. package/mcp-configs/note-knowledge/engram.json +17 -0
  64. package/mcp-configs/note-knowledge/gnosis-mcp.json +17 -0
  65. package/mcp-configs/note-knowledge/graphlit-mcp-server.json +19 -0
  66. package/mcp-configs/reference-mgr/arxiv-cli.json +17 -0
  67. package/mcp-configs/reference-mgr/arxiv-search-mcp.json +17 -0
  68. package/mcp-configs/reference-mgr/chiken.json +17 -0
  69. package/mcp-configs/reference-mgr/claude-scholar.json +17 -0
  70. package/mcp-configs/reference-mgr/devonthink-mcp.json +17 -0
  71. package/mcp-configs/registry.json +447 -0
  72. package/openclaw.plugin.json +21 -0
  73. package/package.json +61 -0
  74. package/skills/analysis/dataviz/color-accessibility-guide/SKILL.md +230 -0
  75. package/skills/analysis/dataviz/geospatial-viz-guide/SKILL.md +218 -0
  76. package/skills/analysis/dataviz/interactive-viz-guide/SKILL.md +287 -0
  77. package/skills/analysis/dataviz/network-visualization-guide/SKILL.md +195 -0
  78. package/skills/analysis/dataviz/publication-figures-guide/SKILL.md +238 -0
  79. package/skills/analysis/dataviz/python-dataviz-guide/SKILL.md +195 -0
  80. package/skills/analysis/econometrics/causal-inference-guide/SKILL.md +197 -0
  81. package/skills/analysis/econometrics/iv-regression-guide/SKILL.md +198 -0
  82. package/skills/analysis/econometrics/panel-data-guide/SKILL.md +274 -0
  83. package/skills/analysis/econometrics/robustness-checks/SKILL.md +250 -0
  84. package/skills/analysis/econometrics/stata-regression/SKILL.md +117 -0
  85. package/skills/analysis/econometrics/time-series-guide/SKILL.md +235 -0
  86. package/skills/analysis/statistics/bayesian-statistics-guide/SKILL.md +221 -0
  87. package/skills/analysis/statistics/hypothesis-testing-guide/SKILL.md +210 -0
  88. package/skills/analysis/statistics/meta-analysis-guide/SKILL.md +206 -0
  89. package/skills/analysis/statistics/nonparametric-tests-guide/SKILL.md +221 -0
  90. package/skills/analysis/statistics/power-analysis-guide/SKILL.md +240 -0
  91. package/skills/analysis/statistics/sem-guide/SKILL.md +231 -0
  92. package/skills/analysis/statistics/survival-analysis-guide/SKILL.md +195 -0
  93. package/skills/analysis/wrangling/missing-data-handling/SKILL.md +224 -0
  94. package/skills/analysis/wrangling/pandas-data-wrangling/SKILL.md +242 -0
  95. package/skills/analysis/wrangling/questionnaire-design-guide/SKILL.md +234 -0
  96. package/skills/analysis/wrangling/text-mining-guide/SKILL.md +225 -0
  97. package/skills/domains/ai-ml/computer-vision-guide/SKILL.md +213 -0
  98. package/skills/domains/ai-ml/deep-learning-papers-guide/SKILL.md +200 -0
  99. package/skills/domains/ai-ml/llm-evaluation-guide/SKILL.md +194 -0
  100. package/skills/domains/ai-ml/prompt-engineering-research/SKILL.md +233 -0
  101. package/skills/domains/ai-ml/reinforcement-learning-guide/SKILL.md +254 -0
  102. package/skills/domains/ai-ml/transformer-architecture-guide/SKILL.md +233 -0
  103. package/skills/domains/biomedical/clinical-research-guide/SKILL.md +232 -0
  104. package/skills/domains/biomedical/clinicaltrials-api/SKILL.md +177 -0
  105. package/skills/domains/biomedical/epidemiology-guide/SKILL.md +200 -0
  106. package/skills/domains/biomedical/genomics-analysis-guide/SKILL.md +270 -0
  107. package/skills/domains/business/market-analysis-guide/SKILL.md +112 -0
  108. package/skills/domains/business/strategic-management-guide/SKILL.md +154 -0
  109. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +266 -0
  110. package/skills/domains/chemistry/retrosynthesis-guide/SKILL.md +215 -0
  111. package/skills/domains/cs/algorithms-complexity-guide/SKILL.md +194 -0
  112. package/skills/domains/cs/dblp-api/SKILL.md +129 -0
  113. package/skills/domains/cs/software-engineering-research/SKILL.md +218 -0
  114. package/skills/domains/ecology/biodiversity-data-guide/SKILL.md +296 -0
  115. package/skills/domains/ecology/conservation-biology-guide/SKILL.md +198 -0
  116. package/skills/domains/ecology/gbif-api/SKILL.md +158 -0
  117. package/skills/domains/ecology/inaturalist-api/SKILL.md +173 -0
  118. package/skills/domains/economics/behavioral-economics-guide/SKILL.md +239 -0
  119. package/skills/domains/economics/development-economics-guide/SKILL.md +181 -0
  120. package/skills/domains/economics/fred-api/SKILL.md +189 -0
  121. package/skills/domains/education/curriculum-design-guide/SKILL.md +144 -0
  122. package/skills/domains/education/learning-science-guide/SKILL.md +150 -0
  123. package/skills/domains/finance/financial-data-analysis/SKILL.md +152 -0
  124. package/skills/domains/finance/quantitative-finance-guide/SKILL.md +151 -0
  125. package/skills/domains/geoscience/climate-science-guide/SKILL.md +158 -0
  126. package/skills/domains/geoscience/gis-remote-sensing-guide/SKILL.md +129 -0
  127. package/skills/domains/humanities/digital-humanities-guide/SKILL.md +181 -0
  128. package/skills/domains/humanities/philosophy-research-guide/SKILL.md +148 -0
  129. package/skills/domains/law/courtlistener-api/SKILL.md +213 -0
  130. package/skills/domains/law/legal-research-guide/SKILL.md +250 -0
  131. package/skills/domains/math/linear-algebra-applications/SKILL.md +227 -0
  132. package/skills/domains/math/numerical-methods-guide/SKILL.md +236 -0
  133. package/skills/domains/math/oeis-api/SKILL.md +158 -0
  134. package/skills/domains/pharma/clinical-pharmacology-guide/SKILL.md +165 -0
  135. package/skills/domains/pharma/drug-development-guide/SKILL.md +177 -0
  136. package/skills/domains/physics/computational-physics-guide/SKILL.md +300 -0
  137. package/skills/domains/physics/nasa-ads-api/SKILL.md +150 -0
  138. package/skills/domains/physics/quantum-computing-guide/SKILL.md +234 -0
  139. package/skills/domains/social-science/social-research-methods/SKILL.md +194 -0
  140. package/skills/domains/social-science/survey-research-guide/SKILL.md +182 -0
  141. package/skills/literature/discovery/citation-alert-guide/SKILL.md +154 -0
  142. package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +142 -0
  143. package/skills/literature/discovery/literature-mapping-guide/SKILL.md +175 -0
  144. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +211 -0
  145. package/skills/literature/discovery/rss-paper-feeds/SKILL.md +214 -0
  146. package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +164 -0
  147. package/skills/literature/fulltext/doaj-api/SKILL.md +120 -0
  148. package/skills/literature/fulltext/interlibrary-loan-guide/SKILL.md +163 -0
  149. package/skills/literature/fulltext/open-access-guide/SKILL.md +183 -0
  150. package/skills/literature/fulltext/pmc-oai-api/SKILL.md +184 -0
  151. package/skills/literature/fulltext/preprint-servers-guide/SKILL.md +128 -0
  152. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +207 -0
  153. package/skills/literature/fulltext/unpaywall-api/SKILL.md +113 -0
  154. package/skills/literature/metadata/altmetrics-guide/SKILL.md +132 -0
  155. package/skills/literature/metadata/citation-network-guide/SKILL.md +236 -0
  156. package/skills/literature/metadata/crossref-api/SKILL.md +133 -0
  157. package/skills/literature/metadata/datacite-api/SKILL.md +126 -0
  158. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +168 -0
  159. package/skills/literature/metadata/h-index-guide/SKILL.md +183 -0
  160. package/skills/literature/metadata/journal-metrics-guide/SKILL.md +188 -0
  161. package/skills/literature/metadata/opencitations-api/SKILL.md +128 -0
  162. package/skills/literature/metadata/orcid-api/SKILL.md +136 -0
  163. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +178 -0
  164. package/skills/literature/search/arxiv-api/SKILL.md +95 -0
  165. package/skills/literature/search/biorxiv-api/SKILL.md +123 -0
  166. package/skills/literature/search/boolean-search-guide/SKILL.md +199 -0
  167. package/skills/literature/search/citation-chaining-guide/SKILL.md +148 -0
  168. package/skills/literature/search/database-comparison-guide/SKILL.md +100 -0
  169. package/skills/literature/search/europe-pmc-api/SKILL.md +120 -0
  170. package/skills/literature/search/google-scholar-guide/SKILL.md +182 -0
  171. package/skills/literature/search/mesh-terms-guide/SKILL.md +164 -0
  172. package/skills/literature/search/openalex-api/SKILL.md +134 -0
  173. package/skills/literature/search/pubmed-api/SKILL.md +130 -0
  174. package/skills/literature/search/scientify-literature-survey/SKILL.md +203 -0
  175. package/skills/literature/search/semantic-scholar-api/SKILL.md +134 -0
  176. package/skills/literature/search/systematic-search-strategy/SKILL.md +214 -0
  177. package/skills/research/automation/ai-scientist-guide/SKILL.md +228 -0
  178. package/skills/research/automation/data-collection-automation/SKILL.md +248 -0
  179. package/skills/research/automation/research-workflow-automation/SKILL.md +266 -0
  180. package/skills/research/deep-research/meta-synthesis-guide/SKILL.md +174 -0
  181. package/skills/research/deep-research/research-cog/SKILL.md +153 -0
  182. package/skills/research/deep-research/scoping-review-guide/SKILL.md +217 -0
  183. package/skills/research/deep-research/systematic-review-guide/SKILL.md +250 -0
  184. package/skills/research/funding/figshare-api/SKILL.md +163 -0
  185. package/skills/research/funding/grant-writing-guide/SKILL.md +233 -0
  186. package/skills/research/funding/nsf-grant-guide/SKILL.md +206 -0
  187. package/skills/research/funding/open-science-guide/SKILL.md +255 -0
  188. package/skills/research/funding/zenodo-api/SKILL.md +174 -0
  189. package/skills/research/methodology/action-research-guide/SKILL.md +201 -0
  190. package/skills/research/methodology/experimental-design-guide/SKILL.md +236 -0
  191. package/skills/research/methodology/grad-school-guide/SKILL.md +182 -0
  192. package/skills/research/methodology/grounded-theory-guide/SKILL.md +171 -0
  193. package/skills/research/methodology/mixed-methods-guide/SKILL.md +208 -0
  194. package/skills/research/methodology/qualitative-research-guide/SKILL.md +234 -0
  195. package/skills/research/methodology/scientify-idea-generation/SKILL.md +222 -0
  196. package/skills/research/paper-review/paper-reading-assistant/SKILL.md +266 -0
  197. package/skills/research/paper-review/peer-review-guide/SKILL.md +227 -0
  198. package/skills/research/paper-review/rebuttal-writing-guide/SKILL.md +185 -0
  199. package/skills/research/paper-review/scientify-write-review-paper/SKILL.md +209 -0
  200. package/skills/tools/code-exec/jupyter-notebook-guide/SKILL.md +178 -0
  201. package/skills/tools/code-exec/python-reproducibility-guide/SKILL.md +341 -0
  202. package/skills/tools/code-exec/r-reproducibility-guide/SKILL.md +236 -0
  203. package/skills/tools/code-exec/sandbox-execution-guide/SKILL.md +221 -0
  204. package/skills/tools/diagram/mermaid-diagram-guide/SKILL.md +269 -0
  205. package/skills/tools/diagram/plantuml-guide/SKILL.md +397 -0
  206. package/skills/tools/diagram/scientific-illustration-guide/SKILL.md +225 -0
  207. package/skills/tools/document/anystyle-api/SKILL.md +199 -0
  208. package/skills/tools/document/grobid-pdf-parsing/SKILL.md +294 -0
  209. package/skills/tools/document/markdown-academic-guide/SKILL.md +217 -0
  210. package/skills/tools/document/pdf-extraction-guide/SKILL.md +321 -0
  211. package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +306 -0
  212. package/skills/tools/knowledge-graph/ontology-design-guide/SKILL.md +214 -0
  213. package/skills/tools/knowledge-graph/rag-methodology-guide/SKILL.md +325 -0
  214. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +367 -0
  215. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +211 -0
  216. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +204 -0
  217. package/skills/tools/ocr-translate/multilingual-research-guide/SKILL.md +234 -0
  218. package/skills/tools/scraping/academic-web-scraping/SKILL.md +326 -0
  219. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +301 -0
  220. package/skills/tools/scraping/web-scraping-ethics-guide/SKILL.md +250 -0
  221. package/skills/writing/citation/bibtex-management-guide/SKILL.md +246 -0
  222. package/skills/writing/citation/citation-style-guide/SKILL.md +248 -0
  223. package/skills/writing/citation/reference-manager-comparison/SKILL.md +208 -0
  224. package/skills/writing/citation/zotero-api/SKILL.md +188 -0
  225. package/skills/writing/composition/abstract-writing-guide/SKILL.md +188 -0
  226. package/skills/writing/composition/discussion-writing-guide/SKILL.md +194 -0
  227. package/skills/writing/composition/introduction-writing-guide/SKILL.md +194 -0
  228. package/skills/writing/composition/literature-review-writing/SKILL.md +196 -0
  229. package/skills/writing/composition/methods-section-guide/SKILL.md +185 -0
  230. package/skills/writing/composition/response-to-reviewers/SKILL.md +215 -0
  231. package/skills/writing/composition/scientific-writing-guide/SKILL.md +152 -0
  232. package/skills/writing/latex/bibliography-management-guide/SKILL.md +206 -0
  233. package/skills/writing/latex/latex-drawing-guide/SKILL.md +234 -0
  234. package/skills/writing/latex/latex-ecosystem-guide/SKILL.md +240 -0
  235. package/skills/writing/latex/math-typesetting-guide/SKILL.md +231 -0
  236. package/skills/writing/latex/overleaf-collaboration-guide/SKILL.md +211 -0
  237. package/skills/writing/latex/tikz-diagrams-guide/SKILL.md +211 -0
  238. package/skills/writing/polish/academic-translation-guide/SKILL.md +175 -0
  239. package/skills/writing/polish/academic-writing-refiner/SKILL.md +143 -0
  240. package/skills/writing/polish/ai-writing-humanizer/SKILL.md +178 -0
  241. package/skills/writing/polish/grammar-checker-guide/SKILL.md +184 -0
  242. package/skills/writing/polish/plagiarism-detection-guide/SKILL.md +167 -0
  243. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +263 -0
  244. package/skills/writing/templates/conference-paper-template/SKILL.md +219 -0
  245. package/skills/writing/templates/thesis-template-guide/SKILL.md +200 -0
  246. package/skills/writing/templates/thesis-writing-guide/SKILL.md +220 -0
  247. package/src/tools/arxiv.ts +131 -0
  248. package/src/tools/crossref.ts +112 -0
  249. package/src/tools/openalex.ts +174 -0
  250. package/src/tools/pubmed.ts +166 -0
  251. package/src/tools/semantic-scholar.ts +108 -0
  252. package/src/tools/unpaywall.ts +58 -0
@@ -0,0 +1,225 @@ package/skills/analysis/wrangling/text-mining-guide/SKILL.md
---
name: text-mining-guide
description: "Apply NLP and text mining techniques to research text data"
metadata:
  openclaw:
    emoji: "mag"
    category: "analysis"
    subcategory: "wrangling"
    keywords: ["text mining", "NLP", "topic modeling", "sentiment analysis", "text preprocessing", "natural language processing"]
    source: "wentor-research-plugins"
---

# Text Mining Guide

A skill for applying natural language processing (NLP) and text mining techniques to research data. Covers text preprocessing, feature extraction, topic modeling, sentiment analysis, and named entity recognition for analyzing surveys, abstracts, social media, and document corpora.

## Text Preprocessing Pipeline

### Standard Cleaning Steps

```python
import re


def preprocess_text(text: str, lowercase: bool = True,
                    remove_numbers: bool = False,
                    min_word_length: int = 2) -> list[str]:
    """
    Preprocess text for NLP analysis.

    Args:
        text: Raw input text
        lowercase: Convert to lowercase
        remove_numbers: Remove numeric tokens
        min_word_length: Minimum token length to keep
    """
    if lowercase:
        text = text.lower()

    # Remove URLs
    text = re.sub(r"http\S+|www\.\S+", "", text)

    # Remove HTML tags
    text = re.sub(r"<[^>]+>", "", text)

    # Remove special characters (keep apostrophes for contractions)
    text = re.sub(r"[^a-zA-Z0-9\s']", " ", text)

    # Tokenize
    tokens = text.split()

    if remove_numbers:
        tokens = [t for t in tokens if not t.isdigit()]

    # Remove short tokens
    tokens = [t for t in tokens if len(t) >= min_word_length]

    return tokens


def remove_stopwords(tokens: list[str],
                     custom_stopwords: list[str] | None = None) -> list[str]:
    """
    Remove stopwords from a token list.
    """
    # Minimal English stopwords (extend as needed)
    default_stops = {
        "the", "a", "an", "and", "or", "but", "in", "on", "at",
        "to", "for", "of", "with", "by", "is", "was", "are", "were",
        "be", "been", "being", "have", "has", "had", "do", "does",
        "did", "will", "would", "could", "should", "may", "might",
        "this", "that", "these", "those", "it", "its", "not", "no"
    }

    if custom_stopwords:
        default_stops.update(custom_stopwords)

    return [t for t in tokens if t not in default_stops]
```

### Document-Term Matrix

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def build_tfidf_matrix(documents: list[str],
                       max_features: int = 5000) -> dict:
    """
    Build a TF-IDF document-term matrix.

    Args:
        documents: List of document strings
        max_features: Maximum vocabulary size
    """
    vectorizer = TfidfVectorizer(
        max_features=max_features,
        stop_words="english",
        min_df=2,            # Appear in at least 2 documents
        max_df=0.95,         # Ignore terms in >95% of documents
        ngram_range=(1, 2)   # Unigrams and bigrams
    )

    tfidf_matrix = vectorizer.fit_transform(documents)

    # Rank terms by total TF-IDF weight across the corpus; sorting
    # vectorizer.vocabulary_ by index would only return the
    # alphabetically first terms, not the most important ones
    feature_names = vectorizer.get_feature_names_out()
    total_weights = tfidf_matrix.sum(axis=0).A1
    top_indices = total_weights.argsort()[::-1][:20]

    return {
        "matrix_shape": tfidf_matrix.shape,
        "vocabulary_size": len(vectorizer.vocabulary_),
        "top_terms": [feature_names[i] for i in top_indices],
        "vectorizer": vectorizer,
        "matrix": tfidf_matrix
    }
```

## Topic Modeling

### Latent Dirichlet Allocation (LDA)

```python
from sklearn.decomposition import LatentDirichletAllocation


def run_topic_model(doc_term_matrix, vectorizer,
                    n_topics: int = 10) -> list[dict]:
    """
    Run LDA topic modeling on a document-term matrix.

    Args:
        doc_term_matrix: Sparse document-term matrix. Note that LDA
            models raw term counts, so a CountVectorizer matrix is the
            statistically sound input; a TF-IDF matrix runs in
            scikit-learn but bends the model's assumptions.
        vectorizer: Fitted vectorizer
        n_topics: Number of topics to extract
    """
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        random_state=42,
        max_iter=50,
        learning_method="online"
    )
    lda.fit(doc_term_matrix)

    feature_names = vectorizer.get_feature_names_out()
    topics = []

    for idx, topic_weights in enumerate(lda.components_):
        top_indices = topic_weights.argsort()[-10:][::-1]
        top_words = [feature_names[i] for i in top_indices]
        topics.append({
            "topic_id": idx,
            "top_words": top_words,
            "label": "Assign a human-readable label based on top words"
        })

    return topics
```

### Choosing the Number of Topics

```
Methods for selecting k (number of topics):
- Coherence score: higher is better (use gensim's CoherenceModel)
- Perplexity: lower is better (but can overfit)
- Human judgment: do the topics make interpretive sense?
- Domain knowledge: expected number of themes in the corpus

Practical advice:
- Start with k = 5, 10, 15, 20 and compare
- Examine the top words for each k -- look for coherent themes
- If topics are too broad, increase k
- If topics overlap heavily, decrease k
```
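The coherence option above can be made concrete without gensim. The sketch below is a toy UMass-style coherence computed from document co-occurrence counts, not gensim's CoherenceModel implementation; the corpus and word lists are illustrative only:

```python
import math
from itertools import combinations


def umass_coherence(top_words: list[str], documents: list[set[str]]) -> float:
    """UMass-style coherence: sum of log((D(wi, wj) + 1) / D(wj)) over
    word pairs, where D counts documents containing the word(s).
    Higher (closer to zero) indicates a more coherent topic."""
    score = 0.0
    for wi, wj in combinations(top_words, 2):
        d_wj = sum(1 for doc in documents if wj in doc)
        d_both = sum(1 for doc in documents if wi in doc and wj in doc)
        if d_wj > 0:
            score += math.log((d_both + 1) / d_wj)
    return score


# Toy corpus: two biology documents, two economics documents
docs = [
    {"cell", "gene", "protein"},
    {"cell", "gene"},
    {"market", "price"},
    {"market", "price", "trade"},
]

coherent = umass_coherence(["cell", "gene"], docs)      # words co-occur
incoherent = umass_coherence(["cell", "price"], docs)   # words never co-occur
```

Topic words that co-occur across documents score higher (closer to zero), which is the signal used when comparing candidate values of k.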

## Sentiment Analysis

### Lexicon-Based Approach

```python
def simple_sentiment(text: str, positive_words: set,
                     negative_words: set) -> dict:
    """
    Basic lexicon-based sentiment scoring.

    Args:
        text: Input text
        positive_words: Set of positive sentiment words
        negative_words: Set of negative sentiment words
    """
    tokens = text.lower().split()

    pos_count = sum(1 for t in tokens if t in positive_words)
    neg_count = sum(1 for t in tokens if t in negative_words)
    total = len(tokens)

    score = (pos_count - neg_count) / max(total, 1)

    return {
        "positive_count": pos_count,
        "negative_count": neg_count,
        "score": score,
        "label": (
            "positive" if score > 0.05
            else "negative" if score < -0.05
            else "neutral"
        )
    }
```

## Research Applications

### Common Text Mining Tasks in Research

| Task | Method | Application |
|------|--------|-------------|
| Literature mapping | Topic modeling | Identify research themes in a corpus of abstracts |
| Survey analysis | Thematic coding + sentiment | Analyze open-ended survey responses |
| Social media analysis | NER + sentiment | Track public discourse on a topic |
| Content analysis | Classification + keyword extraction | Code qualitative data at scale |
| Bibliometrics | Co-word analysis | Map intellectual structure of a field |

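The co-word analysis row in the table reduces to counting how often keyword pairs appear together in the same document. A minimal sketch with illustrative toy data (a real bibliometric study would use full keyword records):

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists: list[list[str]]) -> Counter:
    """Count how often each keyword pair co-occurs within a document.
    Pairs are stored in sorted order so (a, b) and (b, a) collapse."""
    pairs = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy paper keyword lists (hypothetical)
papers = [
    ["nlp", "topic modeling"],
    ["nlp", "sentiment"],
    ["nlp", "topic modeling", "lda"],
]
counts = coword_counts(papers)
```

The resulting pair counts form the edge weights of a co-word network, which can then be visualized or clustered to map a field's structure.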
## Validation and Reporting

Always validate text mining results against human judgment. Report preprocessing steps, parameter choices (e.g., number of topics, min_df, max_df), and model evaluation metrics. For topic models, include the top 10-15 words per topic and representative documents. For classification, report precision, recall, and F1 on a held-out test set. Acknowledge that automated text analysis supplements but does not replace close reading.
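The per-class metrics mentioned above can be sketched in a few lines (hypothetical labels; in practice use sklearn.metrics.classification_report):

```python
def precision_recall_f1(y_true: list[str], y_pred: list[str],
                        positive: str = "pos") -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from paired labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical classifier output checked against 6 hand-coded labels
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

Reporting all three numbers (not just accuracy) exposes the trade-off between over- and under-prediction on the held-out test set.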
@@ -0,0 +1,213 @@ package/skills/domains/ai-ml/computer-vision-guide/SKILL.md
---
name: computer-vision-guide
description: "Apply computer vision research methods, models, and evaluation tools"
metadata:
  openclaw:
    emoji: "eye"
    category: "domains"
    subcategory: "ai-ml"
    keywords: ["computer vision", "image classification", "object detection", "CNN", "vision transformer", "deep learning"]
    source: "wentor-research-plugins"
---

# Computer Vision Guide

A skill for conducting computer vision research, covering model architectures, dataset preparation, training pipelines, evaluation metrics, and common experimental protocols for image classification, object detection, and segmentation tasks.

## Core Tasks and Architectures

### Computer Vision Task Taxonomy

```
Image Classification:
  Input: Single image
  Output: Class label(s)
  Models: ResNet, EfficientNet, ViT, ConvNeXt

Object Detection:
  Input: Single image
  Output: Bounding boxes + class labels
  Models: YOLO (v5-v9), Faster R-CNN, DETR, RT-DETR

Semantic Segmentation:
  Input: Single image
  Output: Per-pixel class label
  Models: U-Net, DeepLab, SegFormer, Mask2Former

Instance Segmentation:
  Input: Single image
  Output: Per-pixel labels distinguishing individual objects
  Models: Mask R-CNN, Mask2Former, SAM

Image Generation:
  Input: Text prompt or noise
  Output: Generated image
  Models: Stable Diffusion, DALL-E, Imagen
```

### Model Architecture Evolution

```
CNNs (Convolutional Neural Networks):
  LeNet (1998) -> AlexNet (2012) -> VGG (2014) -> ResNet (2015)
  -> EfficientNet (2019) -> ConvNeXt (2022)

Vision Transformers:
  ViT (2020) -> DeiT (2021) -> Swin Transformer (2021)
  -> BEiT (2021) -> DINOv2 (2023)

Trend: Transformers are competitive with CNNs at scale.
Hybrid architectures combining convolutions and attention are common.
```

## Dataset Preparation

### Building a Research Dataset

```python
import random
from pathlib import Path


def organize_image_dataset(source_dir: str,
                           split_ratios: dict | None = None) -> dict:
    """
    Compute per-class train/val/test split sizes for an image dataset.

    Args:
        source_dir: Directory containing class subdirectories
        split_ratios: Dict with 'train', 'val', 'test' ratios
    """
    if split_ratios is None:
        split_ratios = {"train": 0.7, "val": 0.15, "test": 0.15}

    random.seed(42)

    stats = {}
    for class_dir in sorted(Path(source_dir).iterdir()):
        if not class_dir.is_dir():
            continue

        images = list(class_dir.glob("*.jpg")) + list(class_dir.glob("*.png"))
        random.shuffle(images)

        n = len(images)
        n_train = int(n * split_ratios["train"])
        n_val = int(n * split_ratios["val"])

        stats[class_dir.name] = {
            "total": n,
            "train": n_train,
            "val": n_val,
            "test": n - n_train - n_val
        }

    return stats
```

### Data Augmentation

```python
from torchvision import transforms


def get_training_transforms(img_size: int = 224) -> transforms.Compose:
    """
    Standard data augmentation pipeline for training.

    Args:
        img_size: Target image size
    """
    return transforms.Compose([
        transforms.RandomResizedCrop(img_size, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ColorJitter(brightness=0.2, contrast=0.2,
                               saturation=0.2, hue=0.1),
        transforms.RandomRotation(15),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )
    ])
```

## Training Pipeline

### Transfer Learning Workflow

```python
import torch.nn as nn
from torchvision import models


def create_classifier(num_classes: int,
                      backbone: str = "resnet50",
                      pretrained: bool = True) -> nn.Module:
    """
    Create an image classifier using transfer learning.

    Args:
        num_classes: Number of target classes
        backbone: Model architecture name
        pretrained: Whether to use ImageNet-pretrained weights
    """
    if backbone == "resnet50":
        weights = models.ResNet50_Weights.DEFAULT if pretrained else None
        model = models.resnet50(weights=weights)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif backbone == "vit_b_16":
        weights = models.ViT_B_16_Weights.DEFAULT if pretrained else None
        model = models.vit_b_16(weights=weights)
        model.heads.head = nn.Linear(
            model.heads.head.in_features, num_classes
        )
    else:
        raise ValueError(f"Unknown backbone: {backbone}")

    return model
```

## Evaluation Metrics

### Metrics by Task

```
Classification:
  - Top-1 Accuracy: Fraction of correct predictions
  - Top-5 Accuracy: Correct class in top 5 predictions
  - Precision, Recall, F1: Per-class and macro-averaged
  - Confusion Matrix: Visualize class-level errors

Object Detection:
  - mAP (mean Average Precision): Standard COCO metric
  - mAP@0.5: AP at IoU threshold 0.5
  - mAP@0.5:0.95: AP averaged over IoU thresholds 0.5 to 0.95
  - AP per class: Identifies weak categories

Segmentation:
  - mIoU (mean Intersection over Union): Standard metric
  - Pixel Accuracy: Fraction of correctly classified pixels
  - Dice Coefficient: F1 score at the pixel level
```

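The IoU and Dice definitions behind these metrics can be sketched directly; this is a minimal illustration, and real evaluations use library implementations such as pycocotools or torchmetrics:

```python
def box_iou(a: tuple, b: tuple) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def dice(mask_a: set, mask_b: set) -> float:
    """Dice coefficient of two pixel sets, e.g. sets of (row, col) tuples."""
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))      # overlap 1, union 7
d = dice({(0, 0), (0, 1)}, {(0, 1), (0, 2)})   # one shared pixel of four
```

mAP then layers precision-recall analysis on top of these per-prediction IoU tests at one or more thresholds.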
196
+ ## Reproducibility Checklist
197
+
198
+ ### What to Report in Papers
199
+
200
+ ```
201
+ 1. Architecture: Exact model name, number of parameters
202
+ 2. Pretraining: Dataset and weights used for initialization
203
+ 3. Training: Optimizer, learning rate schedule, batch size, epochs
204
+ 4. Augmentation: Full list of augmentations with parameters
205
+ 5. Hardware: GPU type, number, training time
206
+ 6. Evaluation: Exact metrics, test set version, evaluation protocol
207
+ 7. Code: Link to repository with training and evaluation scripts
208
+ 8. Random seeds: Report seeds used; ideally report mean over 3+ seeds
209
+ ```
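One way to keep these details together is to record a single experiment config alongside the results. The field names and values below are purely illustrative, not a standard:

```python
# Hypothetical experiment record covering the eight checklist items
experiment_config = {
    "architecture": {"name": "resnet50", "params_millions": 25.6},
    "pretraining": {"dataset": "ImageNet-1k", "weights": "torchvision DEFAULT"},
    "training": {"optimizer": "AdamW", "lr_schedule": "cosine + warmup",
                 "batch_size": 256, "epochs": 90},
    "augmentation": ["RandomResizedCrop(224)", "RandomHorizontalFlip(p=0.5)"],
    "hardware": {"gpu": "A100", "count": 4, "wall_clock_hours": 12},
    "evaluation": {"metric": "top-1 accuracy", "split": "val"},
    "code": "https://example.org/repo",  # placeholder URL
    "seeds": [0, 1, 2],
}
```

Serializing this dict (e.g. to JSON) next to each checkpoint makes the paper's reporting section a lookup rather than an archaeology exercise.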

## Ethical Considerations

When collecting or using image datasets, consider consent (especially for images of people), geographic and demographic representation, potential for bias amplification, and dual-use concerns. Document the dataset's composition and limitations, following the Datasheets for Datasets framework. For generative models, implement safeguards against generating harmful content.
@@ -0,0 +1,200 @@
---
name: deep-learning-papers-guide
description: "Annotated deep learning paper implementations with code walkthroughs"
metadata:
  openclaw:
    emoji: "🧠"
    category: "domains"
    subcategory: "ai-ml"
    keywords: ["deep learning", "neural network", "Transformer", "CNN", "NLP", "computer vision"]
    source: "https://github.com/labmlai/annotated_deep_learning_paper_implementations"
---

# Deep Learning Papers Guide

## Overview

Understanding deep learning architectures requires more than reading papers -- it requires reading and writing code. The annotated_deep_learning_paper_implementations repository (65,800+ stars) provides line-by-line annotated implementations of seminal deep learning papers in PyTorch, making it one of the most valuable learning resources in the field.

This guide organizes the key architectures by category, provides implementation patterns for the most important building blocks, and offers strategies for going from paper to working code. Whether you are implementing a Transformer variant for your research, understanding a GAN architecture for your experiments, or teaching a deep learning course, these patterns accelerate the process.

The focus is on practical understanding: what each component does, why it is designed that way, and how to implement it correctly in PyTorch.

## Core Architecture Families

### Transformer Architectures

The Transformer (Vaswani et al., 2017) is the foundation of modern NLP and increasingly of computer vision.

#### Multi-Head Self-Attention

```python
import torch
import torch.nn as nn
import math

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_k = d_model // n_heads

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)

        # Linear projections and reshape to (batch, heads, seq, d_k)
        Q = self.W_q(query).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(key).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(value).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)

        # Scaled dot-product attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = torch.softmax(scores, dim=-1)
        context = torch.matmul(attn, V)

        # Concatenate heads and project
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
        return self.W_o(context)
```
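The core computation inside `forward` can be sanity-checked in isolation, without the module wrapper. A minimal sketch of the scaled dot-product step (shapes chosen arbitrarily): the output keeps the `(batch, heads, seq, d_k)` shape, and every row of attention weights sums to 1 because softmax is applied over the key dimension.

```python
import math
import torch

batch, heads, seq, d_k = 2, 8, 10, 16
Q = torch.randn(batch, heads, seq, d_k)
K = torch.randn(batch, heads, seq, d_k)
V = torch.randn(batch, heads, seq, d_k)

# Scaled dot-product attention, as in MultiHeadAttention.forward
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
attn = torch.softmax(scores, dim=-1)
out = attn @ V

assert out.shape == (batch, heads, seq, d_k)
# attention weights over keys form a probability distribution per query
assert torch.allclose(attn.sum(dim=-1), torch.ones(batch, heads, seq), atol=1e-5)
```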

#### Transformer Encoder Block

```python
class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attention = MultiHeadAttention(d_model, n_heads)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Pre-norm variant (used in GPT-2, ViT, modern architectures):
        # normalize once, self-attend, add residual
        h = self.norm1(x)
        x = x + self.dropout(self.attention(h, h, h, mask))
        x = x + self.ffn(self.norm2(x))
        return x
```

### Convolutional Neural Networks

#### ResNet Bottleneck Block

```python
class BottleneckBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        return self.relu(out)
```

## Key Architecture Comparison

| Architecture | Year | Parameters | Key Innovation | Primary Domain |
|--------------|------|------------|----------------|----------------|
| ResNet | 2015 | 25M (ResNet-50) | Skip connections | Vision |
| Transformer | 2017 | Varies | Self-attention | NLP |
| BERT | 2018 | 340M (Large) | Masked language modeling | NLP |
| GPT-2 | 2019 | 1.5B | Autoregressive generation | NLP |
| ViT | 2020 | 86M (Base) | Patch-based image tokenization | Vision |
| Diffusion | 2020 | Varies | Iterative denoising | Generation |
| LLaMA | 2023 | 7B-70B | Efficient open LLM | NLP |

## Training Patterns

### Standard Training Loop

```python
def train_epoch(model, dataloader, optimizer, criterion, device):
    model.train()
    total_loss = 0
    for data, targets in dataloader:
        data, targets = data.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, targets)
        loss.backward()

        # Gradient clipping (crucial for Transformers)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        optimizer.step()
        total_loss += loss.item()

    return total_loss / len(dataloader)
```

### Learning Rate Scheduling

```python
# Cosine annealing with warmup (standard for Transformers)
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
warmup = LinearLR(optimizer, start_factor=0.01, total_iters=1000)
cosine = CosineAnnealingLR(optimizer, T_max=50000)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[1000])
```
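In closed form, this composition ramps the learning rate linearly from `0.01 * lr` up to `lr` over the first 1000 steps, then follows a half-cosine down toward zero. The pure-Python sketch below approximates the resulting curve (it mirrors the constants above but ignores PyTorch's exact per-step semantics, so treat it as an illustration, not a drop-in replacement):

```python
import math

BASE_LR, START_FACTOR, WARMUP, T_MAX = 3e-4, 0.01, 1000, 50000

def lr_at(step):
    """Approximate LR at a given step: linear warmup, then cosine annealing."""
    if step < WARMUP:
        frac = step / WARMUP
        return BASE_LR * (START_FACTOR + (1 - START_FACTOR) * frac)
    progress = min((step - WARMUP) / T_MAX, 1.0)
    return BASE_LR * 0.5 * (1 + math.cos(math.pi * progress))

assert lr_at(0) == BASE_LR * START_FACTOR        # warmup starts at 1% of base LR
assert abs(lr_at(WARMUP) - BASE_LR) < 1e-12      # warmup ends at full LR
assert abs(lr_at(WARMUP + T_MAX)) < 1e-12        # cosine fully decayed
```

Plotting `lr_at` over `range(0, WARMUP + T_MAX)` is a cheap way to verify a schedule before committing GPU hours to it.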

## From Paper to Code: A Methodology

1. **Read the paper twice.** First pass for high-level understanding; second pass for implementation details.
2. **Identify the core algorithm.** It is usually in Section 3 or 4 of the paper.
3. **List all hyperparameters.** Create a config dict before writing any code.
4. **Implement bottom-up.** Start with the smallest building blocks (attention, normalization), then compose.
5. **Test each component in isolation.** Verify tensor shapes and gradients at each level.
6. **Reproduce a known result first.** Match the paper's numbers on a small dataset before scaling.
7. **Use the official implementation as a reference,** but write your own code for understanding.

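Step 5 in practice: before wiring a component into a model, run a throwaway script that checks both shapes and gradient flow. A minimal sketch using a single linear layer as the stand-in component:

```python
import torch
import torch.nn as nn

# Component under test; swap in your own block (attention, norm, ...)
layer = nn.Linear(16, 32)
x = torch.randn(4, 10, 16, requires_grad=True)

# Shape check: only the last dimension should change
y = layer(x)
assert y.shape == (4, 10, 32)

# Gradient check: backprop reaches the input and every parameter
y.sum().backward()
assert x.grad is not None and x.grad.shape == x.shape
assert all(p.grad is not None for p in layer.parameters())
```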
## Best Practices

- **Always verify tensor shapes.** Add assert statements for dimensions during development.
- **Use mixed precision training.** `torch.cuda.amp` can provide roughly 2x speedup with minimal accuracy loss.
- **Log everything.** Use Weights & Biases or TensorBoard for experiment tracking.
- **Start small.** Debug on a tiny dataset before running on the full one.
- **Read the appendix.** Critical details (learning rates, initialization, data augmentation) are often in the supplementary material.
- **Join the community.** Papers With Code, Reddit r/MachineLearning, and Twitter/X are where implementation details are discussed.

## References

- [annotated_deep_learning_paper_implementations](https://github.com/labmlai/annotated_deep_learning_paper_implementations) -- Line-by-line annotated implementations (65,800+ stars)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762) -- Original Transformer paper
- [Deep Residual Learning](https://arxiv.org/abs/1512.03385) -- ResNet paper
- [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) -- Jay Alammar's visual guide
- [Papers With Code](https://paperswithcode.com/) -- Paper-implementation pairs with benchmarks