@wentorai/research-plugins 1.2.2 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (141)
  1. package/README.md +16 -8
  2. package/openclaw.plugin.json +10 -3
  3. package/package.json +2 -5
  4. package/skills/analysis/dataviz/SKILL.md +25 -0
  5. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
  6. package/skills/analysis/econometrics/SKILL.md +23 -0
  7. package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
  8. package/skills/analysis/statistics/SKILL.md +21 -0
  9. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
  10. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
  11. package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
  12. package/skills/analysis/wrangling/SKILL.md +21 -0
  13. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
  14. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
  15. package/skills/domains/ai-ml/SKILL.md +37 -0
  16. package/skills/domains/biomedical/SKILL.md +28 -0
  17. package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
  18. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
  19. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
  20. package/skills/domains/business/SKILL.md +17 -0
  21. package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
  22. package/skills/domains/chemistry/SKILL.md +19 -0
  23. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
  24. package/skills/domains/cs/SKILL.md +21 -0
  25. package/skills/domains/ecology/SKILL.md +16 -0
  26. package/skills/domains/economics/SKILL.md +20 -0
  27. package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
  28. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
  29. package/skills/domains/education/SKILL.md +19 -0
  30. package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
  31. package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
  32. package/skills/domains/finance/SKILL.md +19 -0
  33. package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
  34. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
  35. package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
  36. package/skills/domains/geoscience/SKILL.md +17 -0
  37. package/skills/domains/humanities/SKILL.md +16 -0
  38. package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
  39. package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
  40. package/skills/domains/law/SKILL.md +19 -0
  41. package/skills/domains/math/SKILL.md +17 -0
  42. package/skills/domains/pharma/SKILL.md +17 -0
  43. package/skills/domains/physics/SKILL.md +16 -0
  44. package/skills/domains/social-science/SKILL.md +17 -0
  45. package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
  46. package/skills/literature/discovery/SKILL.md +20 -0
  47. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
  48. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
  49. package/skills/literature/fulltext/SKILL.md +26 -0
  50. package/skills/literature/metadata/SKILL.md +35 -0
  51. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
  52. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
  53. package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
  54. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
  55. package/skills/literature/search/SKILL.md +43 -0
  56. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
  57. package/skills/research/automation/SKILL.md +21 -0
  58. package/skills/research/deep-research/SKILL.md +24 -0
  59. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
  60. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
  61. package/skills/research/funding/SKILL.md +20 -0
  62. package/skills/research/methodology/SKILL.md +24 -0
  63. package/skills/research/paper-review/SKILL.md +19 -0
  64. package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
  65. package/skills/tools/code-exec/SKILL.md +18 -0
  66. package/skills/tools/diagram/SKILL.md +20 -0
  67. package/skills/tools/document/SKILL.md +21 -0
  68. package/skills/tools/knowledge-graph/SKILL.md +21 -0
  69. package/skills/tools/ocr-translate/SKILL.md +18 -0
  70. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
  71. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
  72. package/skills/tools/scraping/SKILL.md +17 -0
  73. package/skills/writing/citation/SKILL.md +33 -0
  74. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
  75. package/skills/writing/composition/SKILL.md +22 -0
  76. package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
  77. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
  78. package/skills/writing/latex/SKILL.md +22 -0
  79. package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
  80. package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
  81. package/skills/writing/polish/SKILL.md +20 -0
  82. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
  83. package/skills/writing/templates/SKILL.md +22 -0
  84. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
  85. package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
  86. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
  87. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
  88. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
  89. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
  90. package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
  91. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
  92. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
  93. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
  94. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
  95. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
  96. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
  97. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
  98. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
  99. package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
  100. package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
  101. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
  102. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
  103. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
  104. package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
  105. package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
  106. package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
  107. package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
  108. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
  109. package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
  110. package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
  111. package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
  112. package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
  113. package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
  114. package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
  115. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
  116. package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
  117. package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
  118. package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
  119. package/skills/research/deep-research/research-cog/SKILL.md +0 -153
  120. package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
  121. package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
  122. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
  123. package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
  124. package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
  125. package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
  126. package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
  127. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
  128. package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
  129. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
  130. package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
  131. package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
  132. package/skills/tools/document/weknora-guide/SKILL.md +0 -216
  133. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
  134. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
  135. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
  136. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
  137. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
  138. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
  139. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
  140. package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
  141. package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,193 +0,0 @@
- ---
- name: quantitative-methods-guide
- description: "Design and execute statistical analyses with regression modeling"
- metadata:
-   openclaw:
-     emoji: "📈"
-     category: "analysis"
-     subcategory: "statistics"
-     keywords: ["regression analysis", "quantitative methods", "research design", "statistical modeling", "OLS", "logistic regression"]
-     source: "https://github.com/AcademicSkills/quantitative-methods-guide"
- ---
-
- # Quantitative Methods Guide
-
- A skill for designing and executing rigorous quantitative analyses in academic research. Covers the full pipeline from research question formulation through variable operationalization, model specification, estimation, diagnostics, and interpretation, with emphasis on regression modeling as the workhorse of empirical research.
-
- ## Overview
-
- Quantitative methods form the foundation of empirical research across the social sciences, health sciences, economics, education, and many STEM fields. This skill provides a structured approach to the entire quantitative analysis workflow, ensuring that researchers make methodologically sound choices at each stage. It treats regression analysis as the central tool, covering ordinary least squares (OLS), logistic regression, Poisson regression, and multilevel models, while also addressing the broader issues of research design, measurement, and causal inference that determine whether regression results are meaningful.
-
- The skill is designed for graduate students and researchers who have basic statistics knowledge but need guidance on applying methods correctly in their own research contexts.
-
- ## Research Design and Variable Specification
-
- ### From Question to Model
-
- ```
- Research Question: "Does mentoring frequency affect publication output among
- junior faculty, controlling for department size and funding?"
-
- Step 1: Identify variables
- - Outcome (Y): publication_count (count data)
- - Predictor (X1): mentoring_hours_per_month (continuous)
- - Controls: department_size (continuous), total_funding (continuous)
- - Potential moderator: career_stage (categorical: assistant/associate)
-
- Step 2: Choose model family
- - Count outcome → Poisson or Negative Binomial regression
- - Check for overdispersion before deciding
-
- Step 3: Specify the model
- publications ~ mentoring_hours + department_size + log(funding) + career_stage
- Optional: publications ~ mentoring_hours * career_stage + controls (interaction)
- ```
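Step 2's overdispersion check can be sketched as a quick variance-to-mean ratio. This is a rough screen rather than a formal test, and the helper name and simulated data are illustrative:

```python
import numpy as np

def overdispersion_ratio(counts) -> float:
    # Variance/mean of the raw counts. A Poisson-distributed outcome
    # gives a ratio near 1; values well above 1 suggest fitting a
    # Negative Binomial model instead.
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(42)
poisson_counts = rng.poisson(3.0, size=2000)            # ratio near 1
nb_counts = rng.negative_binomial(2, 0.3, size=2000)    # ratio well above 1
```

A more formal alternative is to fit the Poisson model first and run a score test for overdispersion on its residuals.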
-
- ### Variable Types and Measurement
-
- | Variable Type | Examples | Modeling Considerations |
- |---------------|----------|-------------------------|
- | Continuous | Income, GPA, temperature | Check distribution, consider transformations |
- | Binary | Pass/fail, treatment/control | Logistic regression |
- | Count | Publications, citations, events | Poisson or negative binomial |
- | Ordinal | Likert scales, rankings | Ordinal logistic, or treat as continuous if 5+ levels |
- | Nominal | Department, country, method | Dummy coding (k-1 indicators) |
- | Time-to-event | Months until graduation | Survival analysis |
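The k-1 dummy coding in the Nominal row can be sketched with pandas (column names illustrative):

```python
import pandas as pd

# k-1 indicator columns for a 3-level nominal variable; the dropped
# first level ("bio") becomes the reference category.
df = pd.DataFrame({"dept": ["bio", "chem", "phys", "bio"]})
dummies = pd.get_dummies(df["dept"], prefix="dept", drop_first=True)
print(dummies.columns.tolist())  # ['dept_chem', 'dept_phys']
```

In the formula interface used below, wrapping a variable as `C(group)` applies this coding automatically.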
-
- ## Regression Analysis
-
- ### Ordinary Least Squares (OLS)
-
- ```python
- import statsmodels.formula.api as smf
- import pandas as pd
-
- def run_ols_analysis(df: pd.DataFrame, formula: str) -> dict:
-     """
-     Fit an OLS regression model with full diagnostics.
-
-     Args:
-         df: DataFrame with all variables
-         formula: Patsy formula (e.g., 'y ~ x1 + x2 + C(group)')
-     """
-     model = smf.ols(formula=formula, data=df).fit(cov_type='HC3')  # robust SE
-
-     results = {
-         'coefficients': model.params.to_dict(),
-         'std_errors': model.bse.to_dict(),
-         'p_values': model.pvalues.to_dict(),
-         'conf_int': model.conf_int().to_dict(),
-         'r_squared': model.rsquared,
-         'adj_r_squared': model.rsquared_adj,
-         'f_statistic': model.fvalue,
-         'f_pvalue': model.f_pvalue,
-         'n_obs': int(model.nobs),
-         'aic': model.aic,
-         'bic': model.bic
-     }
-     return results
-
- # Example usage:
- # results = run_ols_analysis(df, 'gpa ~ study_hours + sleep_hours + C(major)')
- ```
-
- ### Logistic Regression
-
- ```python
- import numpy as np
-
- def run_logistic_analysis(df: pd.DataFrame, formula: str) -> dict:
-     """
-     Fit a logistic regression for binary outcomes.
-     Reports odds ratios alongside coefficients.
-     """
-     model = smf.logit(formula=formula, data=df).fit(disp=False)
-
-     results = {
-         'coefficients': model.params.to_dict(),
-         'odds_ratios': np.exp(model.params).to_dict(),
-         'p_values': model.pvalues.to_dict(),
-         'conf_int_OR': np.exp(model.conf_int()).to_dict(),
-         'pseudo_r_squared': model.prsquared,
-         'log_likelihood': model.llf,
-         'aic': model.aic,
-         'n_obs': int(model.nobs)
-     }
-     return results
- ```
-
- ## Model Diagnostics
-
- ### OLS Assumption Checks
-
- Run these diagnostics after fitting any OLS model:
-
- 1. **Linearity**: Plot residuals vs. fitted values. Look for no systematic pattern.
- 2. **Normality of residuals**: Q-Q plot and Shapiro-Wilk test on residuals.
- 3. **Homoscedasticity**: Breusch-Pagan test (`statsmodels.stats.diagnostic.het_breuschpagan`).
- 4. **No multicollinearity**: Variance Inflation Factor (VIF) for each predictor.
- 5. **Independence**: Durbin-Watson statistic for autocorrelation (especially panel/time data).
-
- ```python
- from scipy import stats
- from statsmodels.stats.outliers_influence import variance_inflation_factor
- from statsmodels.stats.diagnostic import het_breuschpagan
-
- def check_ols_assumptions(model, X_matrix) -> dict:
-     """
-     Run standard OLS diagnostic tests.
-     """
-     residuals = model.resid
-     fitted = model.fittedvalues
-
-     # VIF for multicollinearity
-     vif = {X_matrix.columns[i]: variance_inflation_factor(X_matrix.values, i)
-            for i in range(X_matrix.shape[1])}
-     multicollinearity_flag = any(v > 10 for v in vif.values())
-
-     # Breusch-Pagan for heteroscedasticity
-     bp_stat, bp_p, _, _ = het_breuschpagan(residuals, X_matrix)
-
-     # Shapiro-Wilk for residual normality
-     _, normality_p = stats.shapiro(residuals[:5000])  # cap at 5000
-
-     return {
-         'vif': vif,
-         'multicollinearity_problem': multicollinearity_flag,
-         'breusch_pagan_p': round(bp_p, 4),
-         'heteroscedasticity_problem': bp_p < 0.05,
-         'residual_normality_p': round(normality_p, 4),
-         'recommendation': 'Use HC3 robust standard errors if heteroscedasticity detected'
-     }
- ```
-
- ## Reporting Regression Results
-
- ### Standard Regression Table Format
-
- | Variable | Coefficient | SE | t | p | 95% CI |
- |----------|-------------|------|------|-------|---------|
- | (Intercept) | 2.34 | 0.45 | 5.20 | <.001 | [1.45, 3.23] |
- | Mentoring hours | 0.18 | 0.06 | 3.00 | .003 | [0.06, 0.30] |
- | Dept. size | -0.02 | 0.01 | -2.00 | .048 | [-0.04, -0.00] |
- | Log(Funding) | 0.31 | 0.12 | 2.58 | .011 | [0.07, 0.55] |
-
- Report: N, R-squared, Adjusted R-squared, F-statistic with df and p-value, and the type of standard errors used (e.g., HC3 robust).
-
- ### Interpretation Template
-
- "A one-unit increase in [predictor] is associated with a [coefficient] [unit] change in [outcome], holding all other variables constant (b = [coef], SE = [se], p = [p], 95% CI [[lower], [upper]])."
-
- For logistic regression: "A one-unit increase in [predictor] is associated with [OR]-times higher odds of [outcome] (OR = [or], 95% CI [[lower], [upper]], p = [p])."
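The logistic template can be filled mechanically from a log-odds coefficient and its confidence bounds. A minimal sketch, with the helper name and numbers illustrative (taken from the mentoring example above):

```python
import numpy as np

def interpret_logit(name, coef, ci_low, ci_high, p):
    # Exponentiate the log-odds coefficient and its CI to the
    # odds-ratio scale, then render the reporting template.
    or_, lo, hi = np.exp([coef, ci_low, ci_high])
    return (f"A one-unit increase in {name} is associated with "
            f"{or_:.2f}-times higher odds of the outcome "
            f"(OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p = {p:.3f}).")

sentence = interpret_logit("mentoring_hours", 0.18, 0.06, 0.30, 0.003)
```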
-
- ## Common Pitfalls
-
- - **Omitted variable bias**: Failing to control for confounders that affect both X and Y.
- - **Overfitting**: Including too many predictors relative to sample size (rule of thumb: 10-20 observations per predictor).
- - **p-hacking**: Running many models and reporting only significant results. Pre-register your analysis plan.
- - **Misinterpreting R-squared**: High R-squared does not imply causation; low R-squared does not mean the model is useless.
- - **Ignoring assumptions**: Always run diagnostics. Violated assumptions can invalidate standard errors and p-values.
-
- ## References
-
- - Wooldridge, J. M. (2019). *Introductory Econometrics* (7th ed.). Cengage.
- - Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press.
- - King, G. (1986). How Not to Lie with Statistics. *American Journal of Political Science*, 30(3), 666-687.
@@ -1,100 +0,0 @@
- ---
- name: claude-data-analysis-guide
- description: "Claude Code-based conversational data analysis agent"
- metadata:
-   openclaw:
-     emoji: "🔬"
-     category: "analysis"
-     subcategory: "wrangling"
-     keywords: ["Claude Code", "data analysis", "conversational", "pandas", "visualization", "interactive"]
-     source: "https://github.com/liangdabiao/claude-data-analysis"
- ---
-
- # Claude Data Analysis Guide
-
- ## Overview
-
- A Claude Code-based data analysis agent that provides conversational data exploration and analysis. Upload datasets and ask questions in natural language — the agent writes and executes Python code for data cleaning, statistical analysis, visualization, and reporting. Leverages Claude Code's ability to read files, run code, and iterate on results.
-
- ## Setup
-
- ```markdown
- ### CLAUDE.md Configuration
- # Data Analysis Project
-
- ## Instructions
- - Analyze data files in the data/ directory
- - Use pandas, numpy, scipy, matplotlib, seaborn
- - Always show data shape and dtypes first
- - Include statistical tests where appropriate
- - Generate publication-quality figures (300 DPI)
- - Save outputs to output/ directory
-
- ## Conventions
- - Use seaborn for statistical plots
- - Report confidence intervals, not just p-values
- - Handle missing data explicitly (report, then impute)
- - Set random_state=42 for reproducibility
- ```
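The "confidence intervals, not just p-values" convention can be sketched with a t-based interval for a sample mean (a minimal helper, assuming roughly normal data; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def mean_ci(x, level=0.95):
    # t-based confidence interval for the mean of a sample.
    x = np.asarray(x, dtype=float)
    m, se = x.mean(), stats.sem(x)
    half = se * stats.t.ppf((1 + level) / 2, len(x) - 1)
    return m - half, m + half

lo, hi = mean_ci([4.1, 3.9, 4.4, 4.0, 4.2])
```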
-
- ## Workflow
-
- ```markdown
- ### Interactive Analysis Loop
- 1. "Load the experiment data from data/results.csv"
-    → Agent reads file, shows shape, dtypes, head()
-
- 2. "How many missing values are there?"
-    → Agent runs df.isnull().sum(), reports per column
-
- 3. "Show the distribution of response time by condition"
-    → Agent creates violin plots, reports summary stats
-
- 4. "Is there a significant difference between groups?"
-    → Agent runs appropriate test (t-test, ANOVA, etc.)
-
- 5. "Build a regression model predicting response time"
-    → Agent fits model, reports coefficients, R², diagnostics
-
- 6. "Create a summary report with all findings"
-    → Agent generates markdown report with embedded figures
- ```
-
- ## Common Analysis Patterns
-
- ```python
- # Data profiling
- import pandas as pd
- df = pd.read_csv("data/experiment.csv")
- print(f"Shape: {df.shape}")
- print(f"\nDtypes:\n{df.dtypes}")
- print(f"\nMissing:\n{df.isnull().sum()}")
- print(f"\nDescribe:\n{df.describe()}")
-
- # Statistical comparison
- from scipy import stats
- group_a = df[df["condition"] == "A"]["score"]
- group_b = df[df["condition"] == "B"]["score"]
- t_stat, p_value = stats.ttest_ind(group_a, group_b)
- print(f"t={t_stat:.3f}, p={p_value:.4f}")
-
- # Visualization
- import seaborn as sns
- import matplotlib.pyplot as plt
- fig, ax = plt.subplots(figsize=(8, 5))
- sns.violinplot(data=df, x="condition", y="score", ax=ax)
- plt.savefig("output/comparison.png", dpi=300, bbox_inches="tight")
- ```
-
- ## Use Cases
-
- 1. **Experiment analysis**: Interactive analysis of lab data
- 2. **EDA**: Rapid exploration of unfamiliar datasets
- 3. **Statistical testing**: Guided hypothesis testing
- 4. **Report generation**: Analysis reports with figures
- 5. **Learning**: Interactive data science exploration
-
- ## References
-
- - [claude-data-analysis GitHub](https://github.com/liangdabiao/claude-data-analysis)
- - [Claude Code](https://docs.anthropic.com/en/docs/claude-code)
@@ -1,197 +0,0 @@
- ---
- name: open-data-scientist-guide
- description: "AI agent that performs end-to-end data science workflows"
- metadata:
-   openclaw:
-     emoji: "📊"
-     category: "analysis"
-     subcategory: "wrangling"
-     keywords: ["data science", "automated analysis", "EDA", "feature engineering", "data wrangling", "AI agent"]
-     source: "https://github.com/Open-Data-Scientist/open-data-scientist"
- ---
-
- # Open Data Scientist Guide
-
- ## Overview
-
- Open Data Scientist is an AI agent that automates end-to-end data science workflows — from data loading and cleaning through exploratory analysis, feature engineering, modeling, and report generation. It interprets natural language task descriptions, generates and executes Python code, iteratively refines analyses based on results, and produces publication-ready outputs. Designed for researchers who need quick, thorough data analyses without deep programming expertise.
-
- ## Workflow Pipeline
-
- ```
- Dataset + Task Description
-
- Data Profiling (types, distributions, missing values)
-
- Cleaning & Preprocessing (imputation, encoding, scaling)
-
- Exploratory Data Analysis (correlations, distributions, outliers)
-
- Feature Engineering (transforms, interactions, selection)
-
- Modeling (train, evaluate, compare)
-
- Report Generation (figures, tables, interpretation)
- ```
-
- ## Usage
-
- ```python
- from open_data_scientist import DataScientist
-
- ds = DataScientist(llm_provider="anthropic")
-
- # Natural language task
- result = ds.analyze(
-     data="experiment_results.csv",
-     task="Identify which experimental conditions significantly affect "
-          "the response variable. Build a predictive model and report "
-          "the most important features.",
- )
-
- # Outputs
- print(result.summary)              # Text summary of findings
- result.save_report("report.html")  # Full HTML report
- result.save_figures("figures/")    # All generated plots
- ```
-
- ## Data Profiling
-
- ```python
- # Automatic data profiling before analysis
- profile = ds.profile("dataset.csv")
-
- print(f"Rows: {profile.n_rows}, Columns: {profile.n_cols}")
- print(f"Missing values: {profile.missing_summary}")
- print(f"Data types: {profile.dtype_summary}")
- print(f"Potential issues: {profile.warnings}")
-
- # Column-level details
- for col in profile.columns:
-     print(f"\n{col.name} ({col.dtype}):")
-     print(f"  Unique: {col.n_unique}")
-     print(f"  Missing: {col.n_missing} ({col.pct_missing:.1f}%)")
-     if col.is_numeric:
-         print(f"  Range: [{col.min}, {col.max}]")
-         print(f"  Mean: {col.mean:.3f}, Std: {col.std:.3f}")
- ```
-
- ## Exploratory Data Analysis
-
- ```python
- # Guided EDA
- eda_result = ds.explore(
-     data="dataset.csv",
-     focus="relationships",  # or "distributions", "outliers", "time_trends"
-     target_column="outcome",
- )
-
- # Generated analyses include:
- # - Correlation heatmap
- # - Pairwise scatter plots for top correlations
- # - Distribution plots per group
- # - Statistical tests (t-test, ANOVA, chi-square)
- # - Outlier detection (IQR, Z-score)
-
- for finding in eda_result.findings:
-     print(f"- {finding.description} (p={finding.p_value:.4f})")
- ```
-
- ## Feature Engineering
-
- ```python
- # Automatic feature engineering
- features = ds.engineer_features(
-     data="dataset.csv",
-     target="outcome",
-     strategies=[
-         "polynomial_interactions",  # x1*x2, x1^2
-         "datetime_extraction",      # year, month, day_of_week
-         "text_embeddings",          # TF-IDF or sentence embeddings
-         "binning",                  # numeric to categorical
-         "target_encoding",          # category to target mean
-     ],
-     selection_method="mutual_information",
-     max_features=50,
- )
-
- print(f"Original features: {features.n_original}")
- print(f"Generated features: {features.n_generated}")
- print(f"Selected features: {features.n_selected}")
- ```
-
- ## Modeling Pipeline
-
- ```python
- result = ds.model(
-     data="dataset.csv",
-     target="outcome",
-     task_type="classification",  # or "regression"
-     models=["logistic_regression", "random_forest",
-             "gradient_boosting", "neural_network"],
-     cv_folds=5,
-     metric="f1_macro",
- )
-
- # Model comparison table
- print(result.comparison_table)
- # | Model              | F1 Macro | Accuracy | AUC   |
- # |--------------------|----------|----------|-------|
- # | Gradient Boosting  | 0.847    | 0.862    | 0.921 |
- # | Random Forest      | 0.831    | 0.849    | 0.908 |
- # | ...                |          |          |       |
-
- # Best model details
- best = result.best_model
- print(f"Best: {best.name}")
- print(f"Feature importance:\n{best.feature_importance.head(10)}")
- ```
-
- ## Report Generation
-
- ```python
- # Generate publication-ready report
- result = ds.analyze(
-     data="experiment_results.csv",
-     task="Full analysis with statistical tests",
-     report_config={
-         "format": "html",      # html, pdf, markdown
-         "style": "academic",   # academic, business, minimal
-         "include_code": True,  # Show generated code
-         "figure_dpi": 300,     # Publication quality
-     },
- )
-
- result.save_report("analysis_report.html")
- ```
-
- ## Configuration
-
- ```python
- ds = DataScientist(
-     llm_provider="anthropic",
-     model="claude-sonnet-4-20250514",
-     execution_config={
-         "timeout": 300,        # Max seconds per code block
-         "max_iterations": 10,  # Refinement iterations
-         "sandbox": True,       # Isolated execution
-     },
-     analysis_config={
-         "significance_level": 0.05,
-         "random_state": 42,
-         "test_size": 0.2,
-     },
- )
- ```
-
- ## Use Cases
-
- 1. **Experiment analysis**: Analyze lab or survey data with statistical tests
- 2. **Dataset exploration**: Quick EDA on unfamiliar datasets
- 3. **Baseline modeling**: Rapid prototyping of predictive models
- 4. **Report generation**: Automated analysis reports for publications
-
- ## References
-
- - [Open Data Scientist GitHub](https://github.com/Open-Data-Scientist/open-data-scientist)
- - [Pandas Profiling](https://github.com/ydataai/ydata-profiling)
@@ -1,159 +0,0 @@
1
- ---
2
- name: annotated-dl-papers-guide
3
- description: "Annotated deep learning paper implementations with side-by-side notes"
4
- metadata:
5
- openclaw:
6
- emoji: "📝"
7
- category: "domains"
8
- subcategory: "ai-ml"
9
- keywords: ["deep-learning", "paper-implementation", "annotations", "transformer", "gan", "diffusion"]
10
- source: "https://github.com/labmlai/annotated_deep_learning_paper_implementations"
11
- ---
12
-
13
- # Annotated Deep Learning Papers Guide
14
-
15
- ## Overview
16
-
17
- The annotated_deep_learning_paper_implementations project, maintained by labml.ai with over 66,000 GitHub stars, provides 60+ implementations of influential deep learning papers with detailed, line-by-line annotations. Each implementation is presented as a literate programming document where the code and explanations are interwoven, making it possible to read the paper and understand the implementation simultaneously.
18
-
19
- This project bridges the gap between reading a research paper and understanding its practical implementation. For academic researchers, this is an essential resource because many breakthrough papers omit crucial implementation details, and reproducing results from a paper description alone can take weeks. The annotated implementations cover transformers, GANs, diffusion models, reinforcement learning algorithms, optimizers, and many other core deep learning building blocks.
20
-
21
- All implementations are written in PyTorch and are designed to be self-contained, readable, and runnable. The project also provides a web interface at papers.labml.ai where you can browse implementations with syntax-highlighted code alongside formatted annotations.
22
-
23
## Installation and Setup

Install the labml packages to use the implementations and experiment tracking:

```bash
# Install the core library
pip install labml labml-nn

# Clone for direct access to all implementations
git clone https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
cd annotated_deep_learning_paper_implementations

# Install in development mode
pip install -e .
```

Requirements:

- Python 3.8+
- PyTorch >= 1.9
- labml >= 0.5 (experiment tracking and configuration)
- numpy and einops for tensor operations

The `labml` library provides experiment tracking, configuration management, and training loop utilities that are used throughout the implementations.

## Core Paper Categories

### Transformers and Attention

The project includes comprehensive implementations of the transformer family:

- **Original Transformer** (Vaswani et al., 2017): Multi-head attention, positional encoding, encoder-decoder architecture
- **GPT and GPT-2**: Autoregressive language modeling with causal attention
- **BERT**: Masked language modeling and next sentence prediction
- **Vision Transformer (ViT)**: Applying transformers to image classification
- **Flash Attention**: Memory-efficient attention computation
- **Rotary Position Embeddings (RoPE)**: Position encoding used in modern LLMs
- **Mixture of Experts (MoE)**: Sparse expert routing for scaling models

```python
# Example: Multi-head attention from the transformer implementation
from labml_nn.transformers.mha import MultiHeadAttention

# The implementation includes detailed annotations explaining
# each step of the attention computation
mha = MultiHeadAttention(
    heads=8,
    d_model=512,
    dropout_prob=0.1
)
```

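For intuition, the computation that each attention head performs, softmax(QKᵀ/√d)·V, can be sketched in plain Python. This is a minimal single-head, unbatched illustration, not labml's API:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention on plain lists: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

The query aligns with the first key, so the output is pulled toward the first value vector; the real implementation adds learned projections, multiple heads, batching, and masking.
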
### Generative Models

- **GAN** (Goodfellow et al., 2014): Original generative adversarial network
- **DCGAN**: Deep convolutional GAN with architectural guidelines
- **StyleGAN**: Style-based generator architecture
- **Diffusion Models (DDPM)**: Denoising diffusion probabilistic models
- **Stable Diffusion**: Latent diffusion with CLIP conditioning
- **VAE**: Variational autoencoders with KL divergence

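As a taste of the DDPM math, the forward (noising) process has a closed form, x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε with ᾱ_t the cumulative product of (1−β). A pure-Python sketch, assuming the standard linear β schedule (function names here are illustrative, not labml's):

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to step t (0-indexed)."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def q_sample(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    ab = alpha_bar(betas, t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in x0]

betas = linear_beta_schedule(T=1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
x_early = q_sample(x0, t=10, betas=betas, rng=rng)    # still close to x0
x_late = q_sample(x0, t=999, betas=betas, rng=rng)    # nearly pure noise
```

Because ᾱ_t decays toward zero, early steps barely perturb the data while late steps are essentially Gaussian noise; the model is trained to reverse this process.
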
### Optimization and Training

- **Adam, AdamW**: Adaptive learning rate optimizers
- **LAMB**: Large batch optimization for distributed training
- **Noam learning rate schedule**: Warmup + inverse square root decay
- **Gradient clipping**: Norm-based and value-based clipping
- **Mixed precision training**: FP16/BF16 training techniques

### Normalization and Regularization

- **Batch Normalization**: Per-batch statistics normalization
- **Layer Normalization**: Per-sample normalization for transformers
- **RMSNorm**: Simplified normalization used in LLaMA
- **Dropout and DropPath**: Stochastic regularization methods

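RMSNorm drops LayerNorm's mean-centering and bias, normalizing by the root mean square alone: yᵢ = xᵢ / RMS(x) · gᵢ with RMS(x) = √(mean(x²) + ε). A pure-Python sketch (gain initialized to ones; the ε value is illustrative):

```python
import math

def rms_norm(x, gain=None, eps=1e-8):
    """y_i = x_i / sqrt(mean(x_j^2) + eps) * g_i -- no mean subtraction, no bias."""
    if gain is None:
        gain = [1.0] * len(x)  # learnable per-feature gain; ones here
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gain)]

y = rms_norm([3.0, -4.0])
# RMS of [3, -4] is sqrt((9 + 16) / 2), so the output has unit RMS
```
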
## Using Implementations in Research

Each implementation can be used as a building block in your own research projects. The modular design allows you to swap components easily:

```python
import torch.nn as nn

from labml_nn.transformers.mha import MultiHeadAttention
from labml_nn.normalization.rmsnorm import RMSNorm

# Build a custom transformer block with RMSNorm instead of LayerNorm
class CustomTransformerBlock(nn.Module):
    def __init__(self, d_model: int, heads: int, d_ff: int):
        super().__init__()
        self.attention = MultiHeadAttention(heads, d_model)
        self.norm1 = RMSNorm(d_model)
        self.norm2 = RMSNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        # Pre-norm residual attention: normalize once, reuse for q, k, v
        z = self.norm1(x)
        x = x + self.attention(query=z, key=z, value=z)
        x = x + self.feed_forward(self.norm2(x))
        return x
```

The experiment tracking integration with labml makes it straightforward to log metrics, hyperparameters, and model checkpoints:

```python
from labml import experiment, tracker

# Create an experiment
experiment.create(name="custom_transformer_ablation")

# Start it, then track metrics during training
# (num_epochs, dataloader, and train_step are placeholders for your own loop)
with experiment.start():
    for epoch in range(num_epochs):
        for batch in dataloader:
            loss = train_step(batch)
            tracker.save({"loss": loss, "epoch": epoch})
```

## Research Workflow Integration

This project fits naturally into an academic deep learning research workflow:

1. **Literature review**: Read the annotated implementation alongside the original paper to build deep understanding
2. **Baseline reproduction**: Use the provided implementation as a verified baseline for comparison experiments
3. **Architecture modification**: Fork a specific implementation and modify components for your research hypothesis
4. **Ablation studies**: Systematically disable or replace components to measure their contribution
5. **Paper writing**: Reference the annotated implementation for accurate method descriptions

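Step 4's ablation bookkeeping often amounts to swapping one component at a time against a fixed baseline configuration. A toy sketch of that pattern (the configuration keys and the `run_trial` stub are hypothetical, not labml's API):

```python
# Hypothetical ablation grid: vary one component at a time against a baseline
baseline = {"norm": "layernorm", "pos_enc": "rope", "attention": "full"}
variants = {
    "norm": ["layernorm", "rmsnorm"],
    "pos_enc": ["rope", "learned"],
}

def run_trial(config):
    """Stub standing in for a real training run; returns a fake metric."""
    return sum(len(v) for v in config.values()) / 100.0

results = {}
for component, options in variants.items():
    for option in options:
        # Copy the baseline and override exactly one component
        config = dict(baseline, **{component: option})
        results[f"{component}={option}"] = run_trial(config)

for name, metric in sorted(results.items()):
    print(name, metric)
```
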
The web interface at nn.labml.ai provides a searchable index of all implementations, organized by topic. Each page shows the paper citation, a brief summary, and the annotated code with toggleable explanations.

## References

- Repository: https://github.com/labmlai/annotated_deep_learning_paper_implementations
- Web interface: https://nn.labml.ai/
- labml experiment tracking: https://github.com/labmlai/labml
- PyTorch documentation: https://pytorch.org/docs/stable/