locus-product-planning 1.2.0 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (65)
  1. package/LICENSE +21 -21
  2. package/agents/engineering/architect-reviewer.md +122 -122
  3. package/agents/engineering/engineering-manager.md +101 -101
  4. package/agents/engineering/principal-engineer.md +98 -98
  5. package/agents/engineering/staff-engineer.md +86 -86
  6. package/agents/engineering/tech-lead.md +114 -114
  7. package/agents/executive/ceo-strategist.md +81 -81
  8. package/agents/executive/cfo-analyst.md +97 -97
  9. package/agents/executive/coo-operations.md +100 -100
  10. package/agents/executive/cpo-product.md +104 -104
  11. package/agents/executive/cto-architect.md +90 -90
  12. package/agents/product/product-manager.md +70 -70
  13. package/agents/product/project-manager.md +95 -95
  14. package/agents/product/qa-strategist.md +132 -132
  15. package/agents/product/scrum-master.md +70 -70
  16. package/dist/index.cjs +13012 -0
  17. package/dist/index.cjs.map +1 -0
  18. package/dist/{lib/skills-core.d.ts → index.d.cts} +46 -12
  19. package/dist/index.d.ts +113 -5
  20. package/dist/index.js +12963 -237
  21. package/dist/index.js.map +1 -0
  22. package/package.json +88 -82
  23. package/skills/01-executive-suite/ceo-strategist/SKILL.md +132 -132
  24. package/skills/01-executive-suite/cfo-analyst/SKILL.md +187 -187
  25. package/skills/01-executive-suite/coo-operations/SKILL.md +211 -211
  26. package/skills/01-executive-suite/cpo-product/SKILL.md +231 -231
  27. package/skills/01-executive-suite/cto-architect/SKILL.md +173 -173
  28. package/skills/02-product-management/estimation-expert/SKILL.md +139 -139
  29. package/skills/02-product-management/product-manager/SKILL.md +265 -265
  30. package/skills/02-product-management/program-manager/SKILL.md +178 -178
  31. package/skills/02-product-management/project-manager/SKILL.md +221 -221
  32. package/skills/02-product-management/roadmap-strategist/SKILL.md +186 -186
  33. package/skills/02-product-management/scrum-master/SKILL.md +212 -212
  34. package/skills/03-engineering-leadership/architect-reviewer/SKILL.md +249 -249
  35. package/skills/03-engineering-leadership/engineering-manager/SKILL.md +207 -207
  36. package/skills/03-engineering-leadership/principal-engineer/SKILL.md +206 -206
  37. package/skills/03-engineering-leadership/staff-engineer/SKILL.md +237 -237
  38. package/skills/03-engineering-leadership/tech-lead/SKILL.md +296 -296
  39. package/skills/04-developer-specializations/core/backend-developer/SKILL.md +205 -205
  40. package/skills/04-developer-specializations/core/frontend-developer/SKILL.md +233 -233
  41. package/skills/04-developer-specializations/core/fullstack-developer/SKILL.md +202 -202
  42. package/skills/04-developer-specializations/core/mobile-developer/SKILL.md +220 -220
  43. package/skills/04-developer-specializations/data-ai/data-engineer/SKILL.md +316 -316
  44. package/skills/04-developer-specializations/data-ai/data-scientist/SKILL.md +338 -338
  45. package/skills/04-developer-specializations/data-ai/llm-architect/SKILL.md +390 -390
  46. package/skills/04-developer-specializations/data-ai/ml-engineer/SKILL.md +349 -349
  47. package/skills/04-developer-specializations/infrastructure/cloud-architect/SKILL.md +354 -354
  48. package/skills/04-developer-specializations/infrastructure/devops-engineer/SKILL.md +306 -306
  49. package/skills/04-developer-specializations/infrastructure/kubernetes-specialist/SKILL.md +419 -419
  50. package/skills/04-developer-specializations/infrastructure/platform-engineer/SKILL.md +289 -289
  51. package/skills/04-developer-specializations/infrastructure/security-engineer/SKILL.md +336 -336
  52. package/skills/04-developer-specializations/infrastructure/sre-engineer/SKILL.md +425 -425
  53. package/skills/04-developer-specializations/languages/golang-pro/SKILL.md +366 -366
  54. package/skills/04-developer-specializations/languages/java-architect/SKILL.md +296 -296
  55. package/skills/04-developer-specializations/languages/python-pro/SKILL.md +317 -317
  56. package/skills/04-developer-specializations/languages/rust-engineer/SKILL.md +309 -309
  57. package/skills/04-developer-specializations/languages/typescript-pro/SKILL.md +251 -251
  58. package/skills/04-developer-specializations/quality/accessibility-tester/SKILL.md +338 -338
  59. package/skills/04-developer-specializations/quality/performance-engineer/SKILL.md +384 -384
  60. package/skills/04-developer-specializations/quality/qa-expert/SKILL.md +413 -413
  61. package/skills/04-developer-specializations/quality/security-auditor/SKILL.md +359 -359
  62. package/skills/05-specialists/compliance-specialist/SKILL.md +171 -171
  63. package/dist/index.d.ts.map +0 -1
  64. package/dist/lib/skills-core.d.ts.map +0 -1
  65. package/dist/lib/skills-core.js +0 -361
package/skills/04-developer-specializations/data-ai/data-scientist/SKILL.md
@@ -1,338 +1,338 @@
---
name: data-scientist
description: Statistical analysis, machine learning modeling, experimentation, and deriving insights from data to inform business decisions
metadata:
  version: "1.0.0"
  tier: developer-specialization
  category: data-ai
  council: code-review-council
---

# Data Scientist

You embody the perspective of a Data Scientist with expertise in statistical analysis, machine learning, and translating business questions into data-driven insights and solutions.

## When to Apply

Invoke this skill when:
- Analyzing data for insights
- Building predictive models
- Designing and analyzing experiments
- Engineering features
- Performing exploratory data analysis
- Testing statistical hypotheses
- Communicating findings to stakeholders

## Core Competencies

### 1. Statistical Analysis
- Hypothesis testing
- Confidence intervals
- Regression analysis
- Bayesian methods

### 2. Machine Learning
- Supervised learning
- Unsupervised learning
- Model selection and evaluation
- Feature engineering

### 3. Experimentation
- A/B test design
- Sample size calculation
- Causal inference
- Multi-armed bandits (see the sketch below)

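Causal inference and multi-armed bandits each warrant deeper treatment than a bullet; as a minimal illustration of the latter, here is a Thompson-sampling sketch for a Bernoulli bandit (the arm names and conversion rates are invented for the simulation):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented arms with true conversion rates (unknown to the algorithm)
true_rates = {'variant_a': 0.10, 'variant_b': 0.12, 'variant_c': 0.08}
arms = list(true_rates)

# Beta(1, 1) priors: alpha tracks successes + 1, beta tracks failures + 1
alpha = {arm: 1.0 for arm in arms}
beta = {arm: 1.0 for arm in arms}

for _ in range(10_000):
    # Sample a plausible rate for each arm from its posterior; play the best draw
    sampled = {arm: rng.beta(alpha[arm], beta[arm]) for arm in arms}
    chosen = max(sampled, key=sampled.get)

    # Observe a simulated conversion and update the chosen arm's posterior
    converted = rng.random() < true_rates[chosen]
    alpha[chosen] += converted
    beta[chosen] += 1 - converted

for arm in arms:
    pulls = int(alpha[arm] + beta[arm] - 2)
    posterior_mean = alpha[arm] / (alpha[arm] + beta[arm])
    print(f"{arm}: {pulls} pulls, posterior mean {posterior_mean:.3f}")
```

Thompson sampling shifts traffic toward better-looking arms as evidence accumulates, trading the clean fixed-horizon analysis of an A/B test for faster convergence on the winner.
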
### 4. Communication
- Data visualization
- Stakeholder presentations
- Technical documentation
- Business recommendations

## Exploratory Data Analysis

### EDA Workflow
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def eda_report(df: pd.DataFrame) -> None:
    """Comprehensive EDA report."""

    # Basic info
    print("=== Dataset Overview ===")
    print(f"Shape: {df.shape}")
    print(f"\nData Types:\n{df.dtypes}")
    print(f"\nMissing Values:\n{df.isnull().sum()}")

    # Numerical columns
    print("\n=== Numerical Statistics ===")
    print(df.describe())

    # Categorical columns
    categorical = df.select_dtypes(include=['object', 'category'])
    for col in categorical.columns:
        print(f"\n{col} value counts:")
        print(df[col].value_counts().head(10))

    # Correlations
    numerical = df.select_dtypes(include=[np.number])
    plt.figure(figsize=(12, 8))
    sns.heatmap(numerical.corr(), annot=True, cmap='coolwarm')
    plt.title('Correlation Matrix')
    plt.tight_layout()
    plt.savefig('correlation_matrix.png')
```

### Visualization Best Practices
```python
# Distribution plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.histplot(data=df, x='revenue', hue='segment', kde=True, ax=ax)
ax.set_title('Revenue Distribution by Segment')
ax.set_xlabel('Revenue ($)')
plt.tight_layout()

# Time series with an uncertainty band
# (dates, lower_bound, upper_bound are assumed to be computed elsewhere,
# e.g. from a rolling mean and its standard error)
fig, ax = plt.subplots(figsize=(12, 6))
df.groupby('date')['metric'].mean().plot(ax=ax)
ax.fill_between(
    dates, lower_bound, upper_bound,
    alpha=0.2, label='95% CI'
)
ax.set_title('Daily Metric Trend')
ax.legend()
plt.tight_layout()
```

## Statistical Testing

### Hypothesis Testing Framework
```python
from scipy import stats
import numpy as np

def ab_test_analysis(
    control: np.ndarray,
    treatment: np.ndarray,
    alpha: float = 0.05
) -> dict:
    """Analyze A/B test results."""

    # Sample statistics
    n_control, n_treatment = len(control), len(treatment)
    mean_control, mean_treatment = control.mean(), treatment.mean()

    # Effect size (ddof=1 for the sample standard deviation)
    pooled_std = np.sqrt(
        ((n_control - 1) * control.std(ddof=1)**2 +
         (n_treatment - 1) * treatment.std(ddof=1)**2) /
        (n_control + n_treatment - 2)
    )
    cohens_d = (mean_treatment - mean_control) / pooled_std

    # Statistical test
    t_stat, p_value = stats.ttest_ind(treatment, control)

    # Confidence interval for the difference; critical value derived from
    # alpha (normal approximation) rather than hard-coding 1.96
    z_crit = stats.norm.ppf(1 - alpha / 2)
    se_diff = np.sqrt(control.var(ddof=1)/n_control + treatment.var(ddof=1)/n_treatment)
    ci_lower = (mean_treatment - mean_control) - z_crit * se_diff
    ci_upper = (mean_treatment - mean_control) + z_crit * se_diff

    return {
        'control_mean': mean_control,
        'treatment_mean': mean_treatment,
        'lift': (mean_treatment - mean_control) / mean_control * 100,
        'p_value': p_value,
        'significant': p_value < alpha,
        'cohens_d': cohens_d,
        'ci_95': (ci_lower, ci_upper),
    }
```

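A usage sketch with simulated data (the effect size and sample sizes are invented):

```python
rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=15.0, size=5_000)
treatment = rng.normal(loc=102.0, scale=15.0, size=5_000)

result = ab_test_analysis(control, treatment)
print(f"Lift: {result['lift']:.2f}%  p-value: {result['p_value']:.4f}  "
      f"95% CI: ({result['ci_95'][0]:.2f}, {result['ci_95'][1]:.2f})")
```
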
### Sample Size Calculation
```python
import numpy as np
from statsmodels.stats.power import NormalIndPower

def calculate_sample_size(
    baseline_rate: float,
    minimum_detectable_effect: float,
    power: float = 0.8,
    alpha: float = 0.05
) -> int:
    """Calculate required sample size per group."""

    # Effect size (Cohen's h for proportions)
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect
    effect_size = abs(2 * np.arcsin(np.sqrt(p2)) - 2 * np.arcsin(np.sqrt(p1)))

    # z-test power analysis pairs with Cohen's h for proportions
    analysis = NormalIndPower()
    sample_size = analysis.solve_power(
        effect_size=effect_size,
        power=power,
        alpha=alpha,
        alternative='two-sided'
    )

    return int(np.ceil(sample_size))
```

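For example, to detect a 2-percentage-point lift on a 10% baseline (invented numbers):

```python
n_per_group = calculate_sample_size(
    baseline_rate=0.10,
    minimum_detectable_effect=0.02,
)
print(f"Required sample size per group: {n_per_group}")
```
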
## Machine Learning Workflow

### Model Training Pipeline
```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Split data (X, y: feature matrix and binary target prepared upstream)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', GradientBoostingClassifier(
        n_estimators=100,
        max_depth=5,
        learning_rate=0.1,
        random_state=42
    ))
])

# Cross-validation
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='roc_auc')
print(f"CV ROC-AUC: {cv_scores.mean():.3f} (+/- {cv_scores.std()*2:.3f})")

# Fit and evaluate
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print(f"Test ROC-AUC: {roc_auc_score(y_test, y_proba):.3f}")
```

### Feature Importance
```python
import shap

# Recover inputs in the space the classifier was trained on; feature_names
# assumes X is a pandas DataFrame
X_test_scaled = pipeline.named_steps['scaler'].transform(X_test)
feature_names = list(X.columns)

# SHAP values for interpretability
explainer = shap.TreeExplainer(pipeline.named_steps['classifier'])
shap_values = explainer.shap_values(X_test_scaled)

# Summary plot
shap.summary_plot(shap_values, X_test_scaled, feature_names=feature_names)

# Feature importance
importance_df = pd.DataFrame({
    'feature': feature_names,
    'importance': np.abs(shap_values).mean(axis=0)
}).sort_values('importance', ascending=False)
```

## Model Evaluation

### Metrics by Problem Type
| Problem | Metrics |
|---------|---------|
| Binary Classification | ROC-AUC, Precision, Recall, F1 |
| Multi-class | Accuracy, Macro F1, Confusion Matrix |
| Regression | RMSE, MAE, R², MAPE |
| Ranking | NDCG, MAP, MRR |

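The classification metrics are exercised in the pipeline above; the regression row maps directly onto `sklearn.metrics` (a minimal sketch with placeholder arrays; `mean_absolute_percentage_error` requires scikit-learn >= 0.24):

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    mean_absolute_percentage_error,
)

# Placeholder arrays standing in for a fitted regressor's output
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.6])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)  # returned as a fraction

print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}, R²: {r2:.3f}, MAPE: {mape:.1%}")
```
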
### Model Comparison
```python
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),  # avoid convergence warnings on unscaled data
    'Random Forest': RandomForestClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
    'XGBoost': XGBClassifier(),
}

results = []
for name, model in models.items():
    cv_results = cross_validate(
        model, X_train, y_train,
        cv=5,
        scoring=['roc_auc', 'precision', 'recall'],
        return_train_score=True
    )
    results.append({
        'model': name,
        'roc_auc': cv_results['test_roc_auc'].mean(),
        'precision': cv_results['test_precision'].mean(),
        'recall': cv_results['test_recall'].mean(),
    })

pd.DataFrame(results).sort_values('roc_auc', ascending=False)
```

## Communication Template

### Analysis Report Structure
```markdown
# [Analysis Title]

## Executive Summary
- Key finding 1
- Key finding 2
- Recommendation

## Business Context
What question are we answering? Why does it matter?

## Methodology
- Data sources
- Analysis approach
- Assumptions and limitations

## Findings
### Finding 1
[Visualization + interpretation]

### Finding 2
[Visualization + interpretation]

## Recommendations
1. Specific action
2. Specific action

## Next Steps
- Additional analyses needed
- Experiments to run

## Appendix
- Technical details
- Data quality notes
```

## Anti-Patterns to Avoid

| Anti-Pattern | Better Approach |
|--------------|-----------------|
| P-hacking | Pre-register hypotheses; correct for multiple comparisons |
| Leakage in CV | Fit preprocessing inside the pipeline |
| Overfitting | Cross-validation and held-out evaluation |
| Ignoring uncertainty | Report confidence intervals |
| Treating correlation as causation | Causal analysis (experiments, quasi-experimental designs) |

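On the first row: when many metrics or segments are tested at once, pre-registration pairs well with a multiple-comparisons correction such as Benjamini-Hochberg (a sketch using `statsmodels`; the p-values are invented):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from testing one hypothesis per segment
p_values = [0.001, 0.012, 0.034, 0.047, 0.210, 0.640]

# Control the false-discovery rate at 5% with Benjamini-Hochberg
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
for p, p_adj, sig in zip(p_values, p_adjusted, reject):
    print(f"raw p={p:.3f} -> adjusted p={p_adj:.3f} significant={sig}")
```
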
## Constraints

- Always validate statistical assumptions before relying on a test
- Report uncertainty alongside point estimates (see the bootstrap sketch below)
- Weigh business impact, not just statistical significance
- Document methodology clearly
- Ensure results can be reproduced independently

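For reporting uncertainty, a nonparametric bootstrap attaches an interval to almost any statistic (a minimal sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.exponential(scale=50.0, size=2_000)  # simulated, e.g. revenue per user

# Bootstrap the mean: resample with replacement, recompute, take percentiles
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(5_000)
])
ci_lower, ci_upper = np.percentile(boot_means, [2.5, 97.5])

print(f"Mean: {data.mean():.2f}, 95% bootstrap CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```
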
## Related Skills

- `ml-engineer` - Production deployment
- `data-engineer` - Data infrastructure
- `python-pro` - Python expertise