cnhkmcp 2.2.0__py3-none-any.whl → 2.3.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (96)
  1. cnhkmcp/__init__.py +1 -1
  2. cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/README.md +1 -1
  3. cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/config.json +2 -2
  4. cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/main.py +1 -1
  5. cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/vector_db/chroma.sqlite3 +0 -0
  6. cnhkmcp/untracked/APP/Tranformer/Transformer.py +2 -2
  7. cnhkmcp/untracked/APP/Tranformer/transformer_config.json +1 -1
  8. cnhkmcp/untracked/APP/blueprints/feature_engineering.py +2 -2
  9. cnhkmcp/untracked/APP/blueprints/inspiration_house.py +4 -4
  10. cnhkmcp/untracked/APP/blueprints/paper_analysis.py +3 -3
  11. cnhkmcp/untracked/APP/give_me_idea/BRAIN_Alpha_Template_Expert_SystemPrompt.md +34 -73
  12. cnhkmcp/untracked/APP/give_me_idea/alpha_data_specific_template_master.py +2 -2
  13. cnhkmcp/untracked/APP/give_me_idea/what_is_Alpha_template.md +366 -1
  14. cnhkmcp/untracked/APP/static/inspiration.js +345 -13
  15. cnhkmcp/untracked/APP/templates/index.html +11 -3
  16. cnhkmcp/untracked/APP/templates/transformer_web.html +1 -1
  17. cnhkmcp/untracked/APP/trailSomeAlphas/README.md +38 -0
  18. cnhkmcp/untracked/APP/trailSomeAlphas/ace.log +66 -0
  19. cnhkmcp/untracked/APP/trailSomeAlphas/enhance_template.py +588 -0
  20. cnhkmcp/untracked/APP/trailSomeAlphas/requirements.txt +3 -0
  21. cnhkmcp/untracked/APP/trailSomeAlphas/run_pipeline.py +1001 -0
  22. cnhkmcp/untracked/APP/trailSomeAlphas/run_pipeline_step_by_step.ipynb +5258 -0
  23. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/OUTPUT_TEMPLATE.md +325 -0
  24. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/SKILL.md +503 -0
  25. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/examples.md +244 -0
  26. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/output_report/ASI_delay1_analyst11_ideas.md +285 -0
  27. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/reference.md +399 -0
  28. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/SKILL.md +40 -0
  29. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/config.json +6 -0
  30. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709385783386000.json +388 -0
  31. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709386274840400.json +131 -0
  32. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709386838244700.json +1926 -0
  33. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709387369198500.json +31 -0
  34. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709387908905800.json +1926 -0
  35. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709388486243600.json +240 -0
  36. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709389024058600.json +1926 -0
  37. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709389549608700.json +41 -0
  38. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709390068714000.json +110 -0
  39. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709390591996900.json +36 -0
  40. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709391129137100.json +31 -0
  41. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709391691643500.json +41 -0
  42. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709392192099200.json +31 -0
  43. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709392703423500.json +46 -0
  44. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709393213729400.json +246 -0
  45. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710186683932500.json +388 -0
  46. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710187165414300.json +131 -0
  47. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710187665211700.json +1926 -0
  48. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710188149193400.json +31 -0
  49. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710188667627400.json +1926 -0
  50. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710189220822000.json +240 -0
  51. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710189726189500.json +1926 -0
  52. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710190248066100.json +41 -0
  53. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710190768298700.json +110 -0
  54. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710191282588100.json +36 -0
  55. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710191838960900.json +31 -0
  56. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710192396688000.json +41 -0
  57. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710192941922400.json +31 -0
  58. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710193473524600.json +46 -0
  59. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710194001961200.json +246 -0
  60. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710420975888800.json +46 -0
  61. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710421647590100.json +196 -0
  62. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710422131378500.json +5 -0
  63. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710422644184400.json +196 -0
  64. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710423702350600.json +196 -0
  65. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710424244661800.json +5 -0
  66. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_delay1.csv +211 -0
  67. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/final_expressions.json +7062 -0
  68. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/ace.log +3 -0
  69. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/ace_lib.py +1514 -0
  70. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/fetch_dataset.py +113 -0
  71. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/helpful_functions.py +180 -0
  72. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/implement_idea.py +236 -0
  73. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/merge_expression_list.py +90 -0
  74. cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/parsetab.py +60 -0
  75. cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance/op/321/206/320/220/342/225/227/321/207/342/225/227/320/243.md +434 -0
  76. cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance/sample_prompt.md +62 -0
  77. cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance//321/205/320/235/320/245/321/205/320/253/320/260/321/205/320/275/320/240/321/206/320/220/320/255/321/210/320/220/320/223/321/211/320/220/342/225/227/321/210/342/225/233/320/241/321/211/320/243/342/225/233.md +354 -0
  78. cnhkmcp/untracked/APP/usage.md +2 -2
  79. cnhkmcp/untracked/APP//321/210/342/224/220/320/240/321/210/320/261/320/234/321/206/320/231/320/243/321/205/342/225/235/320/220/321/206/320/230/320/241.py +388 -8
  80. cnhkmcp/untracked/skills/alpha-expression-verifier/scripts/validator.py +889 -0
  81. cnhkmcp/untracked/skills/brain-feature-implementation/scripts/implement_idea.py +4 -3
  82. cnhkmcp/untracked/skills/brain-improve-alpha-performance/arXiv_API_Tool_Manual.md +490 -0
  83. cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +1 -1
  84. cnhkmcp/untracked/skills/brain-improve-alpha-performance/scripts/arxiv_api.py +229 -0
  85. cnhkmcp/untracked//321/211/320/225/320/235/321/207/342/225/234/320/276/321/205/320/231/320/235/321/210/342/224/220/320/240/321/210/320/261/320/234/321/206/320/230/320/241_/321/205/320/276/320/231/321/210/320/263/320/225/321/205/342/224/220/320/225/321/210/320/266/320/221/321/204/342/225/233/320/255/321/210/342/225/241/320/246/321/205/320/234/320/225.py +35 -11
  86. cnhkmcp/vector_db/_manifest.json +1 -0
  87. cnhkmcp/vector_db/_meta.json +1 -0
  88. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/METADATA +1 -1
  89. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/RECORD +96 -30
  90. /cnhkmcp/untracked/{skills/expression_verifier → APP/trailSomeAlphas/skills/brain-feature-implementation}/scripts/validator.py +0 -0
  91. /cnhkmcp/untracked/skills/{expression_verifier → alpha-expression-verifier}/SKILL.md +0 -0
  92. /cnhkmcp/untracked/skills/{expression_verifier → alpha-expression-verifier}/scripts/verify_expr.py +0 -0
  93. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/WHEEL +0 -0
  94. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/entry_points.txt +0 -0
  95. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/licenses/LICENSE +0 -0
  96. {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,399 @@
1
+ # Feature Engineering Mindset Patterns
2
+
3
+ This document provides a comprehensive framework for **thinking** about feature engineering, not a list of patterns to apply blindly.
4
+
5
+ ## The Core Philosophy
6
+
7
+ **Feature engineering is not about finding predictive patterns—it's about understanding what data truly means and expressing that meaning in quantifiable ways.**
8
+
9
+ ## 1. Data Semantic Understanding Framework
10
+
11
+ ### Field Deconstruction Methodology
12
+
13
+ **For each field, ask these fundamental questions:**
14
+
15
+ #### What is being measured?
16
+ - Not just the surface description—what is the actual entity or concept?
17
+ - Example: Don't think "P/E ratio", think "price divided by earnings per share"
18
+ - What is the "thing" behind the numbers?
19
+
20
+ #### How is it measured?
21
+ - Data collection method (survey, sensor, calculation)
22
+ - Assumptions embedded in measurement
23
+ - Frequency and timing considerations
24
+ - Example: Book values are quarterly, audited, historical cost; market cap is continuous, forward-looking
25
+
26
+ #### What is the time dimension?
27
+ - Instantaneous snapshot (price at moment T)
28
+ - Cumulative value (total sales to date)
29
+ - Rate of change (velocity, acceleration)
30
+ - Memory/persistence (how long effects last)
31
+
32
+ #### Why does this field exist?
33
+ - What problem was it designed to solve?
34
+ - Who uses it and for what purpose?
35
+ - What business process generates it?
36
+
37
+ ### Field Relationship Mapping
38
+
39
+ **Find the story the data tells:**
40
+
41
+ #### Identify connections:
42
+ - **Causal**: X causes Y (revenue → profit)
43
+ - **Complementary**: X and Y measure related aspects (price & volume)
44
+ - **Conflicting**: X and Y can diverge (book value vs. market cap)
45
+ - **Independent**: X and Y are unrelated (company location vs. stock price)
46
+
47
+ #### Build the narrative:
48
+ - What is the complete picture these fields paint?
49
+ - What are the key turning points?
50
+ - What is missing that would complete the story?
51
+
52
+ ### Data Quality Assessment
53
+
54
+ **Evaluate from the source:**
55
+
56
+ #### Generation mechanisms:
57
+ - Manual entry (human error, bias, gaming)
58
+ - Automated collection (sensor precision, calibration)
59
+ - Calculated values (formula assumptions, input quality)
60
+
61
+ #### Reliability indicators:
62
+ - Audit trails and verification processes
63
+ - Consistency checks across sources
64
+ - Update frequency vs. true change rate
65
+
66
+ ## 2. First-Principles Thinking
67
+
68
+ **Strip away all labels and assumptions.**
69
+
70
+ ### The Process:
71
+ 1. **Forget what you "know"**: Ignore domain-specific labels
72
+ 2. **Identify raw components**: What are the fundamental elements?
73
+ 3. **Question everything**: Why is it measured this way?
74
+ 4. **Rebuild from basics**: Construct features from fundamental truths
75
+
76
+ ### Example:
77
+ **Don't say**: "P/E ratio measures valuation"
78
+ **Do say**: "Price per share divided by earnings per share compares market price to accounting profit"
79
+
80
+ **First principles analysis**:
81
+ - Price: What market participants collectively believe value is
82
+ - Earnings: Accounting measure of profit generation
83
+ - Ratio: Comparison of two different perspectives on value
84
+ - **Insight**: The spread between perspectives is what matters, not the ratio itself
85
+
86
+ ### Exercise:
87
+ For any field, write down:
88
+ - What is literally being measured (no jargon)
89
+ - What assumptions are built in
90
+ - What could cause it to be wrong
91
+ - What it would mean if it were very high or very low
92
+
93
+ ## 3. Question-Driven Feature Generation
94
+
95
+ **Start with questions, not formulas.**
96
+
97
+ ### The Question Bank:
98
+
99
+ #### Q1: "What is stable?" (Invariance)
100
+ **Purpose**: Find what doesn't change—it's often more meaningful than what does
101
+
102
+ **Leads to features about:**
103
+ - Stability measures (coefficient of variation)
104
+ - Invariant relationships (ratios that stay constant)
105
+ - Structural constants (parameters that define the system)
106
+
107
+ **Examples**:
108
+ - "Customer acquisition cost stability" = std_dev(CAC) / mean(CAC)
109
+ - *Meaning*: Is our cost structure predictable?
110
+ - *High value*: Costs are volatile, business model is unstable
111
+ - *Low value*: Costs are predictable, scalable model
112
+
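Editorial sketch (not part of the packaged file): the stability ratio above is the coefficient of variation. The version below uses the population standard deviation and returns `None` for a zero mean. Both choices, and the sample numbers, are assumptions made for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return std_dev / |mean|, or None when the mean is zero."""
    avg = mean(values)
    if avg == 0:
        return None  # the ratio is undefined for a zero mean
    return pstdev(values) / abs(avg)


# Made-up CAC series: same mean (100), very different spread.
stable_cac = [100, 102, 98, 101, 99]
volatile_cac = [40, 160, 70, 150, 80]

print(coefficient_of_variation(stable_cac))
print(coefficient_of_variation(volatile_cac))
```

Dividing by the absolute mean keeps the result non-negative, so "high = volatile" holds even for series that average below zero.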
113
+ #### Q2: "What is changing?" (Dynamics)
114
+ **Purpose**: Understand motion, rate, and direction
115
+
116
+ **Leads to features about:**
117
+ - Velocity and acceleration
118
+ - Trend vs. noise
119
+ - Change significance
120
+
121
+ **Examples**:
122
+ - "Growth acceleration" = (revenue_t - revenue_{t-1}) - (revenue_{t-1} - revenue_{t-2})
123
+ - *Meaning*: Is growth speeding up or slowing down?
124
+ - *High value*: Accelerating growth
125
+ - *Low value*: Decelerating growth
126
+ - *Why it matters*: Acceleration is an early signal of inflection points
127
+
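Editorial sketch (not part of the packaged file): growth acceleration as defined above is the second difference of the series. The function name and the revenue figures are made up for illustration.

```python
def growth_acceleration(series):
    """Second difference: (x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) for each t >= 2."""
    return [
        (series[t] - series[t - 1]) - (series[t - 1] - series[t - 2])
        for t in range(2, len(series))
    ]


# Period-over-period growth is 10, 15, 10, 5, so the acceleration
# turns negative while revenue itself is still rising.
revenue = [100, 110, 125, 135, 140]
print(growth_acceleration(revenue))
```

The output has two fewer points than the input, because the first two periods have no second difference.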
128
+ #### Q3: "What is anomalous?" (Deviation)
129
+ **Purpose**: Identify what breaks patterns—the exceptions reveal rules
130
+
131
+ **Leads to features about:**
132
+ - Outliers and extremes
133
+ - Deviation from normal
134
+ - Pattern breaks
135
+
136
+ **Examples**:
137
+ - "Earnings surprise magnitude" = (actual - expected) / |expected|
138
+ - *Meaning*: How much did results deviate from expectations?
139
+ - *Large absolute value*: Significant surprise (positive or negative)
140
+ - *Why it matters*: Surprises often trigger re-evaluation
141
+
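Editorial sketch (not part of the packaged file): the surprise formula above divides by `|expected|` so the sign stays meaningful when the expectation is negative. Returning `None` for a zero expectation is an assumption of this sketch, not something the document specifies.

```python
def earnings_surprise(actual, expected):
    """Return (actual - expected) / |expected|, or None when expected is zero."""
    if expected == 0:
        return None  # relative surprise is undefined against a zero expectation
    return (actual - expected) / abs(expected)


# A smaller-than-expected loss is a positive surprise because of the abs().
print(earnings_surprise(-0.5, -1.0))
# Missing a positive expectation gives a negative surprise.
print(earnings_surprise(0.5, 1.0))
```

Dividing by the signed expectation would instead flip the sign of the first case and report a beat as a miss.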
142
+ #### Q4: "What is combined?" (Interaction)
143
+ **Purpose**: Understand how elements affect each other
144
+
145
+ **Leads to features about:**
146
+ - Synergies and conflicts
147
+ - Joint effects
148
+ - Conditional relationships
149
+
150
+ **Examples**:
151
+ - "Marketing-sales synergy" = (marketing_spend × sales_efficiency)
152
+ - *Meaning*: Do marketing and sales amplify each other?
153
+ - *High value*: Strong synergy (1+1=3)
154
+ - *Low value*: Weak synergy (1+1=1.5)
155
+ - *Why it matters*: Synergy indicates scalability
156
+
157
+ #### Q5: "What is structural?" (Composition)
158
+ **Purpose**: Decompose wholes into meaningful parts
159
+
160
+ **Leads to features about:**
161
+ - Component breakdowns
162
+ - Proportional relationships
163
+ - Structure changes
164
+
165
+ **Examples**:
166
+ - "Recurring revenue quality" = subscription_revenue / total_revenue
167
+ - *Meaning*: What portion of revenue is predictable?
168
+ - *High value*: High-quality recurring revenue
169
+ - *Low value*: Low-quality one-time revenue
170
+ - *Why it matters*: Predictability affects valuation
171
+
172
+ #### Q6: "What is cumulative?" (Accumulation)
173
+ **Purpose**: Capture time-based build-up and decay
174
+
175
+ **Leads to features about:**
176
+ - Running totals and diminishing returns
177
+ - Memory effects
178
+ - Time-weighted values
179
+
180
+ **Examples**:
181
+ - "Customer relationship depth" = Σ(purchase_value × 2^{-days_ago / half_life})
182
+ - *Meaning*: Time-decayed cumulative purchase value
183
+ - *High value*: Deep, recent relationship
184
+ - *Low value*: Shallow or old relationship
185
+ - *Why it matters*: Recency and frequency predict loyalty
186
+
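Editorial sketch (not part of the packaged file): a time-decayed purchase total that takes `half_life` literally, so a purchase's weight halves every `half_life_days` (weight = `0.5 ** (days_ago / half_life_days)`). The function name and the sample purchases are hypothetical.

```python
def relationship_depth(purchases, half_life_days):
    """Sum purchase values, halving each weight every `half_life_days`.

    `purchases` is an iterable of (value, days_ago) pairs.
    """
    return sum(
        value * 0.5 ** (days_ago / half_life_days)
        for value, days_ago in purchases
    )


# One purchase today and one exactly one half-life ago.
print(relationship_depth([(100, 0), (100, 30)], half_life_days=30))
```

With a base-e weight `exp(-days_ago / tau)`, the parameter `tau` is an e-folding time rather than a half-life. The two are related by `half_life = tau * ln 2`.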
187
+ #### Q7: "What is relative?" (Comparison)
188
+ **Purpose**: Understand position in context
189
+
190
+ **Leads to features about:**
191
+ - Rankings and percentiles
192
+ - Normalizations
193
+ - Context-aware measures
194
+
195
+ **Examples**:
196
+ - "Relative efficiency" = company_efficiency / industry_median_efficiency
197
+ - *Meaning*: How efficient vs. peers?
198
+ - *High value*: More efficient than typical
199
+ - *Low value*: Less efficient than typical
200
+ - *Why it matters*: Competitiveness indicator
201
+
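Editorial sketch (not part of the packaged file): relative efficiency as a ratio to the peer median. The peer values are invented, and returning `None` for a zero median is an assumption of this sketch.

```python
from statistics import median


def relative_efficiency(company_value, peer_values):
    """Return company_value / median(peer_values), or None if the median is zero."""
    mid = median(peer_values)
    if mid == 0:
        return None
    return company_value / mid


# Peer median is 1.0, so the company is 1.5x as efficient as a typical peer.
print(relative_efficiency(1.5, [0.8, 1.0, 1.2]))
```

The median keeps one extreme peer from dragging the benchmark, which a mean would not.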
202
+ #### Q8: "What is essential?" (Essence)
203
+ **Purpose**: Distill to core truths
204
+
205
+ **Leads to features about:**
206
+ - First-principles measures
207
+ - Fundamental relationships
208
+ - Stripped-down indicators
209
+
210
+ **Examples**:
211
+ - "Core profitability" = (revenue - variable_costs) / revenue
212
+ - *Meaning*: Profitability without fixed cost distortions
213
+ - *Why it matters*: Shows true unit economics
214
+
215
+ ### How to Use the Question Bank:
216
+
217
+ **For any dataset**:
218
+ 1. Go through each question
219
+ 2. Ask: "Which fields or combinations can answer this?"
220
+ 3. Formulate specific feature concepts
221
+ 4. Validate each concept has clear meaning
222
+ 5. Document the reasoning
223
+
224
+ **Example Workflow:**
225
+ ```
226
+ Dataset: Sales data with fields [customer_id, order_value, order_date, product_category]
227
+
228
+ Q: "What is stable?"
229
+ → Average order value per customer over time
230
+ → Favorite category per customer (most frequent)
231
+ → Purchase frequency pattern
232
+
233
+ Q: "What is changing?"
234
+ → Order value trend (increasing/decreasing)
235
+ → Category preference evolution
236
+ → Purchase interval changes
237
+
238
+ Q: "What is anomalous?"
239
+ → Orders far from customer's typical behavior
240
+ → Sudden category switches
241
+ → Unusually large/small orders
242
+
243
+ Q: "What is combined?"
244
+ → Order value × frequency = total value
245
+ → Category diversity × consistency = loyalty measure
246
+ → Recency × frequency = engagement score
247
+
248
+ ... (continue through all questions)
249
+ ```
250
+
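Editorial sketch (not part of the packaged file): one concrete reading of "orders far from customer's typical behavior" in the workflow above is a per-customer z-score of order value. The rows and the function name are hypothetical, and the z-score is only one of several reasonable distance measures.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up sample rows: (customer_id, order_value)
orders = [
    ("c1", 50.0), ("c1", 55.0), ("c1", 45.0), ("c1", 250.0),
    ("c2", 20.0), ("c2", 22.0), ("c2", 18.0),
]


def order_z_scores(rows):
    """Score each order against its own customer's mean and spread."""
    by_customer = defaultdict(list)
    for customer_id, value in rows:
        by_customer[customer_id].append(value)

    scored = []
    for customer_id, value in rows:
        values = by_customer[customer_id]
        spread = pstdev(values)
        # A customer with identical orders has no spread; score those as 0.
        z = 0.0 if spread == 0 else (value - mean(values)) / spread
        scored.append((customer_id, value, z))
    return scored


most_unusual = max(order_z_scores(orders), key=lambda row: row[2])
print(most_unusual[:2])
```

Scoring within each customer answers "unusual for this buyer", whereas a global z-score would only flag orders that are large for the whole dataset.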
251
+ ## 4. Field Combination Logic Patterns
252
+
253
+ ### When you combine fields, what are you really doing?
254
+
255
+ #### Addition: "X + Y" → What does this sum represent?
256
+ **Good when**: Combining parts of a whole
257
+ - Total revenue = product_A_revenue + product_B_revenue
258
+ **Bad when**: Adding unrelated concepts
259
+ - Price + volume (What does this mean?)
260
+
261
+ #### Subtraction: "X - Y" → What is the difference telling you?
262
+ **Good when**: Measuring gap or surplus
263
+ - Profit = revenue - costs
264
+ - Shortfall = target - actual
265
+ **Bad when**: Ignoring that difference scales with magnitude
266
+ - Revenue_2023 - revenue_2022 (better: percentage change)
267
+
268
+ #### Multiplication: "X × Y" → What is the joint effect?
269
+ **Good when**: Capturing interaction or scaling
270
+ - Total_value = price × quantity
271
+ - Weighted_importance = score × weight
272
+ **Bad when**: Mixing units without meaning
273
+ - Revenue × employee_count (What is "dollar-employees"?)
274
+
275
+ #### Division: "X / Y" → What ratio or rate are you computing?
276
+ **Good when**: Creating relative measures
277
+ - Efficiency = output / input
278
+ - Concentration = part / whole
279
+ **Bad when**: Denominator can be zero or meaningless
280
+ - Revenue / days_since_founded (early days distort heavily)
281
+
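Editorial sketch (not part of the packaged file): guarding a ratio against the two division problems named above, a zero denominator and a denominator too small to be meaningful. The helper names and the 90-day floor are illustrative assumptions. Any real floor would need a justification from the data.

```python
def safe_ratio(numerator, denominator, min_abs_denominator=1e-12):
    """Return numerator / denominator, or None when the denominator is missing or ~0."""
    if denominator is None or abs(denominator) < min_abs_denominator:
        return None
    return numerator / denominator


def revenue_per_day(revenue, days_since_founded, min_days=90):
    """Skip the ratio until the company has at least `min_days` of history."""
    if days_since_founded < min_days:
        return None  # early days distort the ratio heavily
    return safe_ratio(revenue, days_since_founded)


print(safe_ratio(10, 4))
print(revenue_per_day(9000, 3))
print(revenue_per_day(9000, 100))
```

Returning `None` rather than 0 or infinity keeps undefined cases distinct from genuinely small or large values downstream.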
282
+ #### Conditional: "If X then Y" → What condition matters?
283
+ **Good when**: Threshold effects exist
284
+ - If temperature > 100°C then phase = "gas"
285
+ - If churn_risk > 0.8 then intervene = true
286
+ **Bad when**: Arbitrary thresholds without justification
287
+ - If customer_age > 30 then category = "old" (why 30?)
288
+
289
+ ### The Deeper Question:
290
+ **"What new information does this combination create?"**
291
+
292
+ A good combination:
293
+ - Reveals something the individual fields hide
294
+ - Creates a new concept with clear meaning
295
+ - Has intuitive interpretation
296
+
297
+ A bad combination:
298
+ - Just applies math to numbers
299
+ - Creates meaningless units (dollar-days per employee)
300
+ - Is hard to explain
301
+
302
+ ## 5. Escaping Conventional Thinking Traps
303
+
304
+ ### Trap 1: "This is a [field type], so I should..."
305
+ **Wrong**: "This is price data, so I should calculate moving averages"
306
+ **Right**: "This is a time series of transaction values—what patterns exist?"
307
+
308
+ **Escaping method**: Pretend you don't know the field name or domain. Just look at:
309
+ - Data type (number, category, date)
310
+ - Update frequency
311
+ - Distribution
312
+ - Missingness pattern
313
+
314
+ **Ask**: What would a data scientist from a different field see?
315
+
316
+ ### Trap 2: "Everyone uses [conventional feature], so I will too"
317
+ **Wrong**: Building P/E, moving averages, RSI because "that's what you do"
318
+ **Right**: Asking "What does this ratio truly mean? Is there a better way to express that concept?"
319
+
320
+ **Example with P/E**:
321
+ - Conventional: P/E = price / earnings ("valuation metric")
322
+ - First principles: Compares market's forward-looking assessment to accounting record
323
+ - Deeper question: Why do these diverge? What does divergence mean?
324
+ - Better feature: Track divergence trend, not just level
325
+
326
+ ### Trap 3: "Complexity = better"
327
+ **Wrong**: Adding more variables, interactions, conditions to improve "sophistication"
328
+ **Right**: Simpler is often more robust and interpretable
329
+
330
+ **Test**: Can you explain the feature in one sentence to a non-expert?
331
+ - If no → It's too complex
332
+ - If yes → It might be valuable
333
+
334
+ ### Trap 4: "Feature engineering is separate from domain knowledge"
335
+ **Wrong**: Applying math without understanding what fields mean
336
+ **Right**: Deep domain understanding → Better features
337
+
338
+ **Process**:
339
+ 1. Understand the business process that generates each field
340
+ 2. Identify pain points and edge cases in that process
341
+ 3. Build features that capture those nuances
342
+ 4. Validate with domain experts
343
+
344
+ ## 6. Feature Validation Checklist
345
+
346
+ ### Before finalizing any feature, verify:
347
+
348
+ #### □ Clear Definition
349
+ - [ ] Can be explained in one sentence
350
+ - [ ] Uses precise language
351
+ - [ ] Avoids jargon and buzzwords
352
+
353
+ #### □ Logical Meaning
354
+ - [ ] Represents a real phenomenon or concept
355
+ - [ ] Not just a mathematical operation
356
+ - [ ] Has intuitive interpretation
357
+
358
+ #### □ Business Relevance
359
+ - [ ] Connects to real-world decision-making
360
+ - [ ] Answers a meaningful question
361
+ - [ ] Reveals actionable insight
362
+
363
+ #### □ Directional Understanding
364
+ - [ ] What does high value mean?
365
+ - [ ] What does low value mean?
366
+ - [ ] Is there an optimal range?
367
+
368
+ #### □ Boundary Conditions
369
+ - [ ] What do extreme values indicate?
370
+ - [ ] What happens at zero/infinity?
371
+ - [ ] Are there theoretical limits?
372
+
373
+ #### □ Data Quality Awareness
374
+ - [ ] What are sources of noise?
375
+ - [ ] When might this be unreliable?
376
+ - [ ] What biases could affect it?
377
+
378
+ #### □ Novelty Check
379
+ - [ ] Does this reveal something new?
380
+ - [ ] Or just repackage existing information?
381
+ - [ ] Would an expert learn something?
382
+
383
+ ### Example Validation:
384
+
385
+ **Feature**: Customer purchase velocity = total_purchases / account_age_days
386
+
387
+ - **Clear definition**: "Average number of purchases per day since account creation"
388
+ - **Logical meaning**: Measures purchase frequency over customer lifetime
389
+ - **Business relevance**: Indicates customer engagement and habit formation
390
+ - **Directional**: High = frequent buyer, Low = infrequent buyer
391
+ - **Boundaries**: Zero = no purchases, Very high = possible data error or bulk buyer; undefined when account_age_days is 0 (brand-new account)
392
+ - **Data quality**: Affected by returns, multi-item orders, gift purchases
393
+ - **Novelty**: Reveals engagement pattern beyond simple total purchases
394
+
395
+ ## 7. Creative Thinking Techniques
396
+
397
+ ### A. Lateral Thinking (Borrow from other domains)
398
+
399
+ **Ask**: How would a physicist/biologist/sociologist approach this?
@@ -0,0 +1,40 @@
1
+ ---
2
+ name: brain-feature-implementation
3
+ description: Automate conversion of Brain idea documents into actionable Alpha expressions using local CSV data.
4
+ ---
5
+
6
+ # Brain Feature Implementation
7
+
8
+ ## Description
9
+ This skill automates the process of converting a WorldQuant Brain idea document (Markdown) into actionable Alpha expressions.
10
+
11
+ ## Instructions
12
+
13
+ 1. **Analyze the Idea Document**
14
+ * Read the provided markdown file.
15
+ * Extract the following metadata:
16
+ * **Dataset ID** (e.g., `analyst15`)
17
+ * **Region** (e.g., `GLB`)
18
+ * **Delay** (e.g., `1` or `0`)
19
+ * *If any metadata is missing, ask the user to clarify.*
20
+
21
+ 2. **Plan Implementation**
22
+ * Scan the markdown file for **Feature Definitions** or **Formulas**.
23
+ * Look for patterns like `Definition: <formula>` or code blocks describing math.
24
+ * Use the `manage_todo_list` tool to create a plan with one entry for each unique idea/formula found.
25
+ * *Title*: The Idea Name or ID (e.g., "3.1.1 Estimate Stability Score").
26
+ * *Description*: The specific template formula (e.g., `template: "{st_dev} / abs({mean})"`).
27
+
28
+ 3. **Execute Implementation**
29
+ * For each item in the Todo List:
30
+ * **Construct the Template**:
31
+ * Use Python format string syntax `{variable}`.
32
+ * The `{variable}` must match the **suffix** of the fields in the dataset (e.g., `mean`, `st_dev`, `gro`).
33
+ * **CRITICAL**: Do NOT include the full prefix or horizon in the template. The script auto-detects these.
34
+ * *Correct Example*: For `anl15_gr_12_m_gro / anl15_gr_12_m_pe`, use template: `{gro} / {pe}`.
35
+ * *Incorrect Example*: `{anl15_gr_12_m_gro} / {pe}` (Includes prefix).
36
+ * *Incorrect Example*: `${gro} / ${pe}` (Shell syntax).
37
+ * *Note*: The script ONLY accepts `--template` and `--dataset`. Do not pass any other arguments like `--filters` or `--groupby`.
38
+ * Verify the output (number of expressions generated).
39
+ * Mark the Todo item as completed.
40
+
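Editorial sketch (not part of the packaged file): one way the suffix-based template rule in step 3 could work. This is a guess at the behavior, not the logic of the packaged `implement_idea.py`. `expand_template` and the field list are hypothetical, and the sketch assumes each field name is `<prefix>_<suffix>`.

```python
import re
from collections import defaultdict


def expand_template(template, field_names):
    """Fill `template` once per field prefix that has every required suffix."""
    needed = re.findall(r"\{(\w+)\}", template)
    by_prefix = defaultdict(dict)
    for name in field_names:
        for suffix in needed:
            if name.endswith("_" + suffix):
                prefix = name[: -len(suffix) - 1]
                by_prefix[prefix][suffix] = name
    return [
        template.format(**found)
        for _, found in sorted(by_prefix.items())
        if all(suffix in found for suffix in needed)
    ]


# Hypothetical field names following the anl15_gr_<horizon>_m_<suffix> pattern.
fields = [
    "anl15_gr_12_m_gro", "anl15_gr_12_m_pe",
    "anl15_gr_18_m_gro", "anl15_gr_18_m_pe",
    "anl15_gr_24_m_gro",  # no matching _pe field, so this prefix is skipped
]
for expression in expand_template("{gro} / {pe}", fields):
    print(expression)
```

This shows why the template must contain only suffixes: the prefix and horizon come from the dataset, so one template yields one expression per matching prefix.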
@@ -0,0 +1,6 @@
1
+ {
2
+ "BRAIN_CREDENTIALS": {
3
+ "email": "xxxxx",
4
+ "password": "xxx"
5
+ }
6
+ }
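Editorial sketch (not part of the packaged file): reading a `config.json` with the shape shown above and refusing to continue while the placeholder values are still in place. `load_credentials` is a hypothetical helper. How the packaged scripts actually read this file is not shown in this diff.

```python
import json


def load_credentials(path):
    """Return (email, password) from a config.json shaped like the one above."""
    with open(path, encoding="utf-8") as handle:
        creds = json.load(handle)["BRAIN_CREDENTIALS"]
    for key in ("email", "password"):
        value = creds.get(key, "")
        # The shipped file uses runs of "x" as placeholders.
        if not value or set(value) == {"x"}:
            raise ValueError(f"{key} in {path} is still a placeholder")
    return creds["email"], creds["password"]
```

Failing early here avoids sending placeholder credentials to a remote login endpoint.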