cnhkmcp 2.1.7__py3-none-any.whl → 2.1.9__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (26)
  1. cnhkmcp/__init__.py +1 -1
  2. cnhkmcp/untracked/AI打工人/BRAIN_AI打工人Mac_Linux/版本.zip +0 -0
  3. cnhkmcp/untracked/AI打工人/双击安装AI打工人_Windows/版本.exe +0 -0
  4. cnhkmcp/untracked/AI桌面插件/config.json +1 -1
  5. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/SKILL.md +25 -0
  6. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/reference.md +59 -0
  7. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/requirements.txt +4 -0
  8. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/skill.py +734 -0
  9. cnhkmcp/untracked/skills/brain-datafield-exploration-general/SKILL.md +45 -0
  10. cnhkmcp/untracked/skills/brain-datafield-exploration-general/reference.md +194 -0
  11. cnhkmcp/untracked/skills/brain-dataset-exploration-general/SKILL.md +39 -0
  12. cnhkmcp/untracked/skills/brain-dataset-exploration-general/reference.md +436 -0
  13. cnhkmcp/untracked/skills/brain-explain-alphas/SKILL.md +39 -0
  14. cnhkmcp/untracked/skills/brain-explain-alphas/reference.md +56 -0
  15. cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/SKILL.md +72 -0
  16. cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/reference.md +202 -0
  17. cnhkmcp/untracked/skills/brain-improve-alpha-performance/SKILL.md +44 -0
  18. cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +101 -0
  19. cnhkmcp/untracked/skills/brain-nextMove-analysis/SKILL.md +37 -0
  20. cnhkmcp/untracked/skills/brain-nextMove-analysis/reference.md +128 -0
  21. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/METADATA +1 -1
  22. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/RECORD +26 -10
  23. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/WHEEL +0 -0
  24. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/entry_points.txt +0 -0
  25. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/licenses/LICENSE +0 -0
  26. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,436 @@
+ # Dataset Exploration Expert - Job Duty Manual
+ ## WorldQuant BRAIN Platform
+
+ ### Position Overview
+ The Dataset Exploration Expert is a specialized role focused on deep analysis and categorization of datasets within the WorldQuant BRAIN platform. This expert serves as a dedicated assistant that excels at exploring individual datasets, grouping data fields into logical categories, and providing comprehensive insights into data field characteristics and relationships.
+
+ ### Core Responsibilities
+
+ #### 1. Dataset Deep Dive Analysis
+ - **Single Dataset Focus**: Concentrate on one dataset at a time for comprehensive understanding
+ - **Data Field Inventory**: Catalog and analyze all available data fields within the target dataset
+ - **Coverage Analysis**: Assess data availability across different instruments, regions, and time periods
+ - **Quality Assessment**: Evaluate data reliability, consistency, and completeness
+
+ #### 2. Data Field Categorization & Grouping
+ - **Logical Grouping**: Organize data fields into meaningful categories based on:
+   - Business function (e.g., financial metrics, operational data, market indicators)
+   - Data type (e.g., matrix, vector, group fields)
+   - Update frequency (e.g., daily, quarterly, annual)
+   - Coverage patterns (e.g., high-coverage vs. low-coverage fields)
+   - Usage patterns (e.g., frequently used vs. underutilized fields)
+
+ - **Hierarchical Organization**: Create multi-level categorization systems (a sketch follows this list):
+   - Primary categories (e.g., Financial Statements, Market Data, Analyst Estimates)
+   - Secondary categories (e.g., Balance Sheet, Income Statement, Cash Flow)
+   - Tertiary categories (e.g., Assets, Liabilities, Revenue, Expenses)
+
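+ A minimal sketch of such a multi-level system as a plain Python mapping; the category and field names below are illustrative examples, not the platform's official taxonomy:
+
+ ```python
+ # Illustrative three-level category tree for a fundamentals dataset.
+ # Names are examples only; real categories come from dataset analysis.
+ CATEGORY_TREE = {
+     "Financial Statements": {
+         "Balance Sheet": ["Assets", "Liabilities"],
+         "Income Statement": ["Revenue", "Expenses"],
+     },
+     "Market Data": {
+         "Prices": ["Open", "Close"],
+         "Volume": ["Shares Traded"],
+     },
+ }
+ ```
+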
+ #### 3. Enhanced Data Field Descriptions
+ - **Current Description Analysis**: Review existing field descriptions for clarity and completeness
+ - **Enhanced Documentation**: Write improved, detailed descriptions that include:
+   - Business context and significance
+   - Calculation methodology (if applicable)
+   - Typical value ranges and distributions
+   - Relationships to other fields
+   - Common use cases in alpha creation
+   - Coverage limitations and considerations
+
+ #### 4. Exploratory Data Analysis
+ - **Statistical Profiling**: Analyze data field characteristics including:
+   - Value distributions and ranges
+   - Temporal patterns and seasonality
+   - Cross-sectional relationships
+   - Missing data patterns
+   - Outlier identification
+
+ - **Feature Engineering Insights**: Identify potential derived features and combinations
+ - **Alpha Creation Opportunities**: Discover patterns that could lead to profitable trading strategies
+
+ #### 5. Cross-Platform Research & Integration
+ - **BRAIN Platform Exploration**: Leverage all available platform tools:
+   - Data Explorer for field discovery
+   - Simulation capabilities for data testing
+   - Research papers and documentation
+   - User community insights and best practices
+
+ - **Forum & Community Engagement**: Research and integrate knowledge from:
+   - BRAIN support forum discussions
+   - User-generated content and tutorials
+   - Expert insights and case studies
+   - Platform updates and new features
+
+ ### Technical Skills Required
+
+ #### 1. BRAIN Platform Proficiency
+ - **Data Explorer Mastery**: Expert use of BRAIN's AI-powered data discovery tools
+ - **Simulation Tools**: Ability to run test simulations to understand data behavior
+ - **API Knowledge**: Understanding of the BRAIN API for automated data exploration
+ - **Documentation Navigation**: Efficient use of platform documentation and resources
+
+ #### 2. Data Analysis Capabilities
+ - **Statistical Analysis**: Understanding of descriptive statistics, distributions, and relationships
+ - **Financial Knowledge**: Familiarity with financial statements, ratios, and market data
+ - **Pattern Recognition**: Ability to identify meaningful patterns in complex datasets
+ - **Data Quality Assessment**: Skills in evaluating data reliability and consistency
+
+ #### 3. Documentation & Communication
+ - **Technical Writing**: Ability to create clear, comprehensive field descriptions
+ - **Visual Organization**: Skills in creating logical categorization systems
+ - **Knowledge Management**: Ability to organize and present complex information clearly
+
+ ### Workflow & Methodology
+
+ #### Phase 1: Dataset Selection & Initial Assessment
+ 1. **Dataset Identification**: Select the target dataset based on:
+    - Strategic importance
+    - Current usage levels
+    - Data quality scores
+    - User community needs
+
+ 2. **Initial Exploration**: Use the MCP tools to:
+    - Review the dataset overview and description
+    - Identify total field count and coverage
+    - Assess value scores and pyramid multipliers
+    - Review research papers and documentation
+
+ **MCP Tool Calls for Phase 1** (a call sketch follows this list):
+ - **`mcp_brain-api_get_datasets`**: Discover available datasets with coverage filters
+ - **`mcp_brain-api_get_datafields`**: Get field count and coverage statistics for the selected dataset
+ - **`mcp_brain-api_get_documentations`**: Access platform documentation and research papers
+ - **`mcp_brain-api_get_documentation_page`**: Read specific documentation pages for dataset context
+
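+ A hypothetical call sketch for this phase. The `client` wrapper and its `call_tool(name, arguments)` signature, as well as the argument names, are assumptions for illustration; only the tool names come from the list above:
+
+ ```python
+ # Hypothetical driver for the Phase 1 tools above. `client` and its
+ # call_tool signature are assumed, not part of any published MCP spec.
+ def phase1_overview(client, dataset_id: str, region: str = "USA"):
+     datasets = client.call_tool(
+         "mcp_brain-api_get_datasets",
+         {"region": region, "delay": 1, "instrument_type": "EQUITY"},
+     )
+     fields = client.call_tool(
+         "mcp_brain-api_get_datafields",
+         {"dataset_id": dataset_id, "region": region, "delay": 1},
+     )
+     return {"datasets_found": len(datasets), "field_count": len(fields)}
+ ```
+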
+ #### Phase 2: Comprehensive Field Analysis
+ 1. **Field Inventory**: Catalog all data fields with:
+    - Field ID and name
+    - Current description
+    - Data type (matrix/vector/group). Note that different data types have different characteristics and usage patterns; use the MCP documentation tools to check how each type should be handled.
+    - Coverage statistics
+    - Usage metrics (user count, alpha count)
+
+ 2. **Preliminary Categorization**: Group fields by:
+    - Business function
+    - Data characteristics
+    - Update patterns
+    - Coverage levels
+
+ **MCP Tool Calls for Phase 2:**
+ - **`mcp_brain-api_get_datafields`**: Retrieve the complete field inventory with metadata
+ - **`mcp_brain-api_get_documentation_page`**: Read data-type handling documentation (e.g., "vector-datafields", "group-data-fields")
+ - **`mcp_brain-api_get_operators`**: Understand available operators for different data types
+ - **`mcp_brain-api_get_documentations`**: Access data handling best practices and examples
+
+ #### Phase 3: Initial Data Exploration
+ 1. **Statistical Profiling Using BRAIN 6-Tips Methodology**: Run systematic exploratory simulations following the proven BRAIN platform approach. This methodology provides a comprehensive framework for understanding new datafields efficiently. **Critical Settings for All Tests** (a settings sketch follows this list):
+    - **Neutralization**: "None" (to see raw data behavior without masking important patterns)
+    - **Decay**: 0 (to preserve actual data values and avoid smoothing out variations)
+    - **Test Period**: P0Y0M (for focused analysis)
+    - **Focus**: Long Count and Short Count in the IS Summary section for insights
+
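+ A minimal sketch of these settings as a simulation-payload fragment. The key names below are assumptions modeled on common BRAIN API payloads, not confirmed by this document:
+
+ ```python
+ # Sketch of exploration-simulation settings (key names assumed; verify
+ # against mcp_brain-api_get_platform_setting_options before use).
+ EXPLORATION_SETTINGS = {
+     "instrumentType": "EQUITY",
+     "region": "USA",
+     "universe": "TOP3000",
+     "delay": 1,
+     "neutralization": "NONE",  # raw behavior, no masking
+     "decay": 0,                # preserve actual values
+     "testPeriod": "P0Y0M",     # focused analysis window
+ }
+ ```
+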
+ **A. Basic Coverage Analysis**
+ - **Expression**: `datafield` (for matrix data type) or `vector_operator(datafield)` (for vector data type)
+ - **Purpose**: Determine % coverage = (Long Count + Short Count) / Universe Size
+ - **Insight**: Understand basic data availability across instruments
+ - **What it tells you**: How many instruments have data for this field on average
+ - **Implementation**: Start with this test to establish baseline coverage understanding
+
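+ For example, plugging illustrative counts into the coverage formula from Test A:
+
+ ```python
+ # Worked example of the coverage formula above (numbers illustrative).
+ long_count, short_count, universe_size = 1450, 1380, 3000
+ coverage = (long_count + short_count) / universe_size
+ print(f"coverage = {coverage:.1%}")  # -> coverage = 94.3%
+ ```
+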
+ **B. Non-Zero Value Coverage**
+ - **Expression**: `datafield != 0 ? 1 : 0` (for matrix) or `vector_operator(datafield) != 0 ? 1 : 0` (for vector)
+ - **Purpose**: Distinguish between missing data and actual zero values
+ - **Insight**: Long Count indicates the average number of non-zero values per day
+ - **What it tells you**: Whether the field has meaningful data vs. just coverage gaps
+ - **Implementation**: Run after basic coverage to understand data quality vs. availability
+
+ **C. Data Update Frequency Analysis**
+ - **Expression**: `ts_std_dev(datafield,N) != 0 ? 1 : 0` (for matrix) or `ts_std_dev(vector_operator(datafield),N) != 0 ? 1 : 0` (for vector)
+ - **Purpose**: Understand how often data actually changes vs. being backfilled
+ - **Insight**: Frequency of unique data updates (daily, weekly, monthly, quarterly)
+ - **Key Testing Strategy**:
+   - **N = 5 (weekly)**: Long Count + Short Count will be lowest (approx. 1/5th of coverage)
+   - **N = 22 (monthly)**: Long Count + Short Count will be lower (approx. 1/3rd of coverage)
+   - **N = 66 (quarterly)**: Long Count + Short Count will be closest to actual coverage
+ - **What it tells you**: Data freshness patterns and whether data is actively updated or backfilled
+ - **Implementation**: Test with various N values to identify the actual update frequency
+
+ **D. Data Bounds Analysis**
+ - **Expression**: `abs(datafield) > X` (for matrix) or `abs(vector_operator(datafield)) > X` (for vector)
+ - **Purpose**: Understand data range, scale, and normalization
+ - **Insight**: Bounds of the datafield values
+ - **Testing Strategy**: Vary X systematically:
+   - **X = 1**: Test if data is normalized to values between -1 and +1
+   - **X = 0.1, 0.5, 1, 5, 10**: Test various thresholds to understand the value distribution
+ - **What it tells you**: Whether data is normalized, typical value ranges, and data scale
+ - **Implementation**: Start with X = 1, then adjust based on results to map the full value range
+
+ **E. Central Tendency Analysis**
+ - **Expression**: `ts_median(datafield, 1000) > X` (for matrix) or `ts_median(vector_operator(datafield), 1000) > X` (for vector)
+ - **Purpose**: Understand typical values and central tendency over roughly four years (1,000 trading days)
+ - **Insight**: Median of the datafield over an extended period
+ - **Testing Strategy**: Vary X to map the value distribution:
+   - **X = 0**: Test if the median is positive
+   - **X = 0.1, 0.5, 1, 5, 10**: Test various thresholds to map central tendency
+ - **What it tells you**: Whether data is skewed, what typical values look like, and data characteristics
+ - **Alternative**: Can also use `ts_mean(datafield, 1000) > X` for mean-based analysis
+ - **Implementation**: Test with increasing X values until Long Count approaches zero
+
+ **F. Data Distribution Analysis**
+ - **Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y` (for matrix) or `X < scale_down(vector_operator(datafield)) && scale_down(vector_operator(datafield)) < Y` (for vector)
+ - **Purpose**: Understand how data distributes across its range
+ - **Insight**: Distribution characteristics and patterns
+ - **Key Understanding**: `scale_down` acts as a MinMaxScaler that preserves the original distribution
+ - **Testing Strategy**: Vary X and Y between 0 and 1 to test different distribution segments:
+   - **X = 0, Y = 0.25**: Test bottom quartile distribution
+   - **X = 0.25, Y = 0.5**: Test second quartile distribution
+   - **X = 0.5, Y = 0.75**: Test third quartile distribution
+   - **X = 0.75, Y = 1**: Test top quartile distribution
+ - **What it tells you**: Whether data is evenly distributed, clustered, or has specific patterns
+ - **Implementation**: Test quartile ranges first, then adjust for finer granularity
+
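+ The six expressions above can be generated mechanically for any matrix datafield. A minimal sketch; the helper itself is illustrative, not a platform utility, and vector fields would first need a vector operator wrapped around the field name:
+
+ ```python
+ # Build the BRAIN 6-Tips exploration expressions for a matrix datafield.
+ # Expressions mirror tests A-F above; thresholds are starting points.
+ def six_tips_expressions(field: str, n: int = 66, x: float = 1.0) -> dict:
+     return {
+         "A_coverage": field,
+         "B_non_zero": f"{field} != 0 ? 1 : 0",
+         "C_update_freq": f"ts_std_dev({field},{n}) != 0 ? 1 : 0",
+         "D_bounds": f"abs({field}) > {x}",
+         "E_tendency": f"ts_median({field}, 1000) > {x}",
+         "F_distribution": f"0 < scale_down({field}) && scale_down({field}) < 0.25",
+     }
+
+ for name, expr in six_tips_expressions("close").items():
+     print(name, "->", expr)
+ ```
+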
+ **Data Type Considerations**:
+ - **Matrix Data Type**: Use expressions directly as shown above
+ - **Vector Data Type**: Must use appropriate vector operators (found via MCP) to convert to matrix format
+ - **Group Data Type**: Requires special handling; consult the MCP documentation for group field operators
+ - **Critical**: Always verify the data type before testing and use appropriate operators accordingly
+
+ **Implementation Workflow for BRAIN 6-Tips**:
+ 1. **Setup Phase**: Configure the simulation with "None" neutralization, decay 0, and a P0Y0M test period
+ 2. **Sequential Testing**: Run tests A through F in order for systematic understanding
+ 3. **Iterative Refinement**: Adjust thresholds based on initial results for deeper insights
+ 4. **Documentation**: Record Long Count and Short Count for each test to build a comprehensive profile
+ 5. **Validation**: Cross-reference results across different N values and thresholds for consistency
+
+ **Expected Results Interpretation**:
+ - **Coverage Tests (A & B)**: Should show Long Count + Short Count ≤ Universe Size
+ - **Frequency Tests (C)**: Lower N values should show proportionally lower counts
+ - **Bounds Tests (D)**: Should reveal data normalization and typical ranges
+ - **Tendency Tests (E)**: Should show data skewness and central value characteristics
+ - **Distribution Tests (F)**: Should reveal clustering, patterns, and data spread
+
+ **Common Patterns to Watch For**:
+ - **Normalized Data**: Values consistently between -1 and +1
+ - **Quarterly Updates**: Significant count differences between N=22 and N=66
+ - **Sparse Data**: High coverage but low non-zero counts
+ - **Skewed Distributions**: Uneven quartile distributions in `scale_down` tests
+ - **Data Quality Issues**: Inconsistent results across different test parameters
+
+ **Practical Example - Closing Price Analysis**:
+ - **Test A (Basic Coverage)**: `close` → High Long Count + Short Count indicates universal coverage
+ - **Test B (Non-Zero)**: `close != 0 ? 1 : 0` → Should show the same high counts (prices are never zero)
+ - **Test C (Frequency)**: `ts_std_dev(close,5) != 0 ? 1 : 0` → High counts indicate daily price changes
+ - **Test D (Bounds)**: `abs(close) > 1` → Should show high counts (prices typically > $1)
+ - **Test E (Tendency)**: `ts_median(close,1000) > 0` → Should show high counts (median prices are positive)
+ - **Test F (Distribution)**: `0 < scale_down(close) && scale_down(close) < 0.25` → Tests the bottom quartile of the distribution
+
+ **What This Example Demonstrates**:
+ - **Validation**: Confirms expected behavior (prices are positive, change daily, have good coverage)
+ - **Pattern Recognition**: Shows how to identify normal vs. abnormal data characteristics
+ - **Quality Assessment**: Reveals data consistency and reliability
+ - **Alpha Creation Insights**: Understanding price behavior helps in strategy development
+
+ **Troubleshooting Common Issues**:
+ - **Zero Counts**: Check that the datafield name is correct and the data type is appropriate
+ - **Unexpected Results**: Verify neutralization is "None" and decay is 0
+ - **Vector Field Errors**: Ensure a proper vector operator is used for vector data types
+ - **Inconsistent Patterns**: Test with different N values and thresholds for validation
+ - **Low Coverage**: Consider universe size and data availability in the selected region/timeframe
+
+ **Best Practices for Efficient Exploration**:
+ - **Start Simple**: Begin with basic coverage tests before complex analysis
+ - **Document Everything**: Record all test parameters and results systematically
+ - **Iterate Intelligently**: Use initial results to guide subsequent test parameters
+ - **Cross-Validate**: Compare results across different test methods for consistency
+ - **Focus on Insights**: Prioritize understanding data behavior over exhaustive testing
+
+ 2. **Advanced Statistical Analysis**:
+    - Value distributions and ranges
+    - Temporal patterns and seasonality
+    - Cross-sectional relationships
+    - Missing data patterns
+    - Outlier identification
+    - Data quality consistency over time
+
+ **MCP Tool Calls for Phase 3:**
+ - **`mcp_brain-api_create_multi_regularAlpha_simulation`**: Execute BRAIN 6-Tips methodology simulations
+ - **`mcp_brain-api_get_platform_setting_options`**: Validate simulation settings and parameters
+ - **`mcp_brain-api_get_operators`**: Access time series operators (`ts_std_dev`, `ts_median`, `scale_down`)
+ - **`mcp_brain-api_get_documentation_page`**: Read simulation settings documentation ("simulation-settings")
+ - **`mcp_brain-api_get_documentation_page`**: Access data analysis best practices ("data")
+
+ 3. **Relationship Mapping**: Identify:
+    - Field interdependencies and correlations
+    - Logical groupings and hierarchies
+    - Potential derived features and combinations
+    - Alpha creation opportunities
+    - Risk factors and limitations
+
+ #### Phase 4: Enhanced Documentation
+ 1. **Description Enhancement**: Improve field descriptions with:
+    - Business context
+    - Calculation details and data units
+    - Usage examples
+    - Limitations and considerations
+
+ 2. **Categorization Refinement**: Finalize logical groupings with:
+    - Clear category names
+    - Hierarchical structure
+    - Cross-references
+    - Usage guidelines
+
+ **MCP Tool Calls for Phase 4:**
+ - **`mcp_brain-api_get_documentation_page`**: Access field description best practices ("data")
+ - **`mcp_brain-api_get_documentations`**: Review documentation structure and organization
+ - **`mcp_brain-api_get_alpha_examples`**: Find usage examples in the documentation ("19-alpha-examples")
+ - **`mcp_brain-api_get_documentation_page`**: Access categorization guidelines ("how-use-data-explorer")
+
+ #### Phase 5: Knowledge Integration & Validation
+ 1. **Community Research**: Review forum discussions and user insights; search for and read related documents and guidelines.
+ 2. **Best Practice Integration**: Incorporate platform-specific knowledge from related documents and competition guidelines.
+ 3. **Validation**: Test the categorization with sample use cases
+ 4. **Documentation**: Create the final comprehensive dataset guide
+
+ **MCP Tool Calls for Phase 5:**
+ - **`mcp_brain-forum_search_forum_posts`**: Search community discussions and user insights
+ - **`mcp_brain-forum_read_full_forum_post`**: Read detailed forum discussions and best practices
+ - **`mcp_brain-api_get_events`**: Access competition guidelines and rules
+ - **`mcp_brain-api_get_competition_details`**: Review specific competition requirements
+ - **`mcp_brain-api_get_documentation_page`**: Access platform best practices and guidelines
+ - **`mcp_brain-api_get_alpha_examples`**: Review alpha strategy examples for validation
+
+ ### Deliverables
+
+ #### 1. Dataset Field Catalog
+ - Complete inventory of all data fields
+ - Enhanced descriptions for each field
+ - Coverage and usage statistics
+ - Quality indicators and limitations
+
+ #### 2. Logical Categorization System
+ - Hierarchical field grouping
+ - Category descriptions and rationale
+ - Cross-reference system
+ - Usage guidelines and examples
+
+ #### 3. Initial Data Exploration Report
+ - Coverage analysis by instrument and time
+ - Data consistency evaluation
+ - Missing data patterns
+ - Quality improvement recommendations
+
+ #### 4. Alpha Creation Insights
+ - Identified patterns and relationships
+ - Potential strategy opportunities
+ - Risk considerations
+ - Implementation guidelines
+
+ #### 5. Comprehensive Dataset Guide
+ - Executive summary
+ - Detailed field documentation
+ - Categorization system
+ - Best practices and examples
+ - Troubleshooting guide
+
+ ### Success Metrics
+
+ #### 1. Documentation Quality
+ - **Completeness**: All fields documented with enhanced descriptions
+ - **Clarity**: Descriptions are clear and actionable
+ - **Organization**: Logical, intuitive categorization system
+ - **Accuracy**: Information is current and correct
+
+ #### 2. User Experience Improvement
+ - **Discovery**: Users can quickly find relevant fields
+ - **Understanding**: Clear comprehension of field purpose and usage
+ - **Efficiency**: Reduced time to identify appropriate data
+ - **Confidence**: Users trust the information provided
+
+ #### 3. Platform Knowledge Enhancement
+ - **Coverage**: Comprehensive understanding of dataset capabilities
+ - **Insights**: Discovery of new patterns and opportunities
+ - **Integration**: Knowledge connects to broader platform understanding
+ - **Innovation**: Identification of new use cases and applications
+
+ ### Tools & Resources
+
+ #### 1. BRAIN Platform Tools
+ - **Data Explorer**: Primary field discovery and analysis tool
+ - **Simulation Engine**: Data behavior testing and validation
+ - **Documentation System**: Platform knowledge and best practices
+ - **API Access**: Automated data exploration and analysis
+ - **BRAIN 6-Tips Methodology**: Proven systematic approach to datafield exploration
+
+ **MCP Tool Integration for Platform Tools:**
+ - **Data Explorer**: Use `mcp_brain-api_get_datasets` and `mcp_brain-api_get_datafields`
+ - **Simulation Engine**: Use `mcp_brain-api_create_simulation` with proper settings
+ - **Documentation System**: Use `mcp_brain-api_get_documentations` and `mcp_brain-api_get_documentation_page`
+ - **API Access**: All MCP tools provide automated API access
+ - **BRAIN 6-Tips**: Implemented through `mcp_brain-api_create_simulation` calls
+
+ #### 2. External Resources
+ - **Financial Databases**: Additional context for financial fields
+ - **Industry Publications**: Market knowledge and trends
+ - **Academic Research**: Statistical methods and best practices
+ - **Community Forums**: User insights and experiences
+
+ #### 3. Analysis Tools
+ - **Statistical Software**: Data analysis and visualization
+ - **Documentation Tools**: Knowledge management and organization
+ - **Collaboration Platforms**: Team coordination and knowledge sharing
+
+ **MCP-Enhanced Analysis Capabilities:**
+ - **Statistical Analysis**: Use `mcp_brain-api_create_simulation` for data behavior testing
+ - **Data Quality Assessment**: Use `mcp_brain-api_get_platform_setting_options` for validation
+ - **Pattern Recognition**: Use `mcp_brain-api_get_operators` for available analysis functions
+ - **Documentation Management**: Use `mcp_brain-api_get_documentations` for comprehensive knowledge access
+ - **Community Integration**: Use `mcp_brain-forum_*` tools for collaborative insights
+
+ ### Professional Development
+
+ #### 1. Continuous Learning
+ - **Platform Updates**: Stay current with BRAIN platform developments
+ - **Industry Trends**: Monitor financial data and technology advances
+ - **Best Practices**: Learn from community and expert insights
+ - **Skill Enhancement**: Develop additional technical and analytical capabilities
+
+ #### 2. Knowledge Sharing
+ - **Team Training**: Share expertise with colleagues
+ - **Community Contribution**: Contribute to BRAIN community knowledge
+ - **Documentation Updates**: Maintain current and accurate information
+ - **Best Practice Development**: Create and refine methodologies
+
+ ### Conclusion
+
+ The Dataset Exploration Expert role is critical for maximizing the value of WorldQuant BRAIN's extensive data resources. By providing deep insights, logical organization, and comprehensive documentation, this expert enables users to discover new opportunities, create more effective alphas, and leverage the platform's full potential.
+
+ Success in this role requires a combination of technical expertise, analytical thinking, and communication skills, along with a deep understanding of both financial markets and data science principles. The expert serves as a bridge between raw data and actionable insights, transforming complex datasets into accessible, well-organized knowledge resources that drive innovation and success on the BRAIN platform.
+
+ ---
+
+ ## 🔧 **MCP Tool Reference Guide**
+
+ ### **Core Data Exploration Tools**
+ - **`mcp_brain-api_get_datasets`**: Discover and filter available datasets
+ - **`mcp_brain-api_get_datafields`**: Retrieve field inventory and metadata
+ - **`mcp_brain-api_create_simulation`**: Execute data analysis simulations
+ - **`mcp_brain-api_get_platform_setting_options`**: Validate simulation parameters
+
+ ### **Documentation & Knowledge Tools**
+ - **`mcp_brain-api_get_documentations`**: Access platform documentation structure
+ - **`mcp_brain-api_get_documentation_page`**: Read specific documentation content
+ - **`mcp_brain-api_get_operators`**: Discover available analysis operators
+ - **`mcp_brain-api_get_alpha_examples`**: Access strategy examples and templates
+
+ ### **Community & Forum Tools**
+ - **`mcp_brain-forum_search_forum_posts`**: Search community discussions
+ - **`mcp_brain-forum_read_full_forum_post`**: Read detailed forum content
+ - **`mcp_brain-forum_get_glossary_terms`**: Access community terminology
+
+ ### **Competition & Event Tools**
+ - **`mcp_brain-api_get_events`**: Discover available competitions
+ - **`mcp_brain-api_get_competition_details`**: Get competition guidelines
+ - **`mcp_brain-api_get_competition_agreement`**: Access competition rules
+
+ ### **Best Practices for MCP Tool Usage**
+ 1. **Always authenticate first** using `mcp_brain-api_authenticate`
+ 2. **Validate parameters** using `mcp_brain-api_get_platform_setting_options`
+ 3. **Handle errors gracefully** and retry with corrected parameters
+ 4. **Use appropriate delays** between API calls to avoid rate limiting
+ 5. **Document tool usage** in your exploration reports for reproducibility
@@ -0,0 +1,39 @@
+ ---
+ name: brain-explain-alphas
+ description: >-
+   Provides a step-by-step workflow for analyzing and explaining WorldQuant BRAIN alpha expressions.
+   Use this when the user asks to explain a specific alpha expression, what a datafield does, or how operators work together.
+   Includes steps for data field lookup, operator analysis, and external research.
+ ---
+
+ # Alpha Explanation Workflow
+
+ This manual provides a workflow for analyzing and explaining a WorldQuant BRAIN alpha expression.
+ For the full detailed workflow and examples, see [reference.md](reference.md).
+
+ ## Step 1: Deconstruct the Alpha Expression
+ Break down the alpha expression into its fundamental components: data fields and operators.
+ *Example:* `quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90))`
+ - **Data Fields**: `oth423_find`, `shrt3_bar`
+ - **Operators**: `quantile`, `ts_regression`, `group_mean`, `vec_max`
+
+ ## Step 2: Analyze Data Fields
+ Use the `get_datafields` tool to get details about each data field.
+ - Identify: Instrument Type, Region, Delay, Universe, Data Type (Matrix/Vector).
+ - Note: Vector data requires aggregation (e.g., `vec_max`).
+
+ ## Step 3: Understand the Operators
+ Use the `get_operators` tool to understand what each operator does.
+
+ ## Step 4: Consult Official Documentation
+ Use `get_documentations` and `read_specific_documentation` for deep dives into concepts (e.g., vector data handling).
+
+ ## Step 5: Synthesize and Explain
+ Structure the explanation:
+ 1. **Idea**: High-level summary of the strategy.
+ 2. **Rationale for data**: Why these fields? What do they represent?
+ 3. **Rationale for operators**: How do they transform the data?
+ 4. **Further Inspiration**: Potential improvements.
+
+ ## Appendix: Vector Data
+ Vector data records multiple events per day per instrument (e.g., news). It requires aggregation (like `vec_mean`, `vec_sum`) to become a matrix value usable by other operators.
@@ -0,0 +1,56 @@
+ # Alpha Explanation Workflow
+
+ This manual provides a step-by-step workflow for analyzing and explaining a WorldQuant BRAIN alpha expression. By following this guide, you can efficiently gather the necessary information to understand the logic and potential strategy behind any alpha.
+
+ ## Step 1: Deconstruct the Alpha Expression
+
+ The first step is to break down the alpha expression into its fundamental components: data fields and operators.
+
+ For example, given the expression `quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90))`:
+
+ - **Data Fields**: `oth423_find`, `shrt3_bar`
+ - **Operators**: `quantile`, `ts_regression`, `group_mean`, `vec_max`
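+
+ Deconstruction can also be done mechanically. A minimal sketch; the operator list below contains only the handful named above, not the platform's full operator set:
+
+ ```python
+ # Split an alpha expression into candidate operators and datafields.
+ # KNOWN_OPERATORS holds only the operators named above; extend as needed.
+ import re
+
+ KNOWN_OPERATORS = {"quantile", "ts_regression", "group_mean", "vec_max"}
+
+ def deconstruct(expression: str):
+     identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression))
+     operators = identifiers & KNOWN_OPERATORS
+     datafields = identifiers - operators - {"country"}  # 'country' is a grouping key
+     return operators, datafields
+
+ ops, fields = deconstruct(
+     "quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90))"
+ )
+ print("operators:", sorted(ops))    # ['group_mean', 'quantile', 'ts_regression', 'vec_max']
+ print("datafields:", sorted(fields))  # ['oth423_find', 'shrt3_bar']
+ ```
+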
+ ## Step 2: Analyze Data Fields
+
+ Use the brain-platform-mcp tool `get_datafields` to get detailed information about each data field.
+
+ Tool Usage:
+
+ ```xml
+ <use_mcp_tool>
+   <server_name>brain-platform-mcp</server_name>
+   <tool_name>get_datafields</tool_name>
+   <arguments>
+     { "instrument_type": "EQUITY", "region": "ASI", "delay": 1, "universe": "MINVOL1M", "data_type": "VECTOR", "search": "shrt3_bar" }
+   </arguments>
+ </use_mcp_tool>
+ ```
+
+ Tips for effective searching:
+
+ - **Specify Parameters**: Always provide as much information as you know, including `instrument_type`, `region`, `delay`, `universe`, and `data_type` (MATRIX or VECTOR).
+ - **Iterate**: If you don't find the data field on your first try, try different combinations of parameters. The ASI region, for example, has two universes: MINVOL1M and ILLIQUID_MINVOL1M.
+ - **Check Data Type**: Be sure to check whether the data is a MATRIX (one value per stock per day) or a VECTOR (multiple values per stock per day). This is crucial for understanding how the data is used.
+
+ Example Data Field Information:
+
+ - `oth423_find`: A matrix data field from the "Fundamental Income and Dividend Model" dataset in the ASI region. It represents a "Find score," likely indicating fundamental attractiveness.
+ - `shrt3_bar`: A vector data field from the "Securities Lending Files Data" dataset in the ASI region. It provides a vector of ratings (1-10) indicating the demand to borrow a stock, which is a proxy for short-selling interest.
+
+ ## Step 3: Understand the Operators
+
+ Use the brain-platform-mcp tool `get_operators` to get a list of all available operators and their descriptions.
+
+ Tool Usage:
+
+ ```xml
+ <use_mcp_tool>
+   <server_name>brain-platform-mcp</server_name>
+   <tool_name>get_operators</tool_name>
+   <arguments>
+     {}
+   </arguments>
+ </use_mcp_tool>
+ ```
+
+ The output of this command contains a wealth of information. For your convenience, a table of the most common operators is included in the Appendix of this manual.
+
+ ## Step 4: Consult Official Documentation
+
+ For more complex topics, the official BRAIN documentation is an invaluable resource. Use the `get_documentations` tool to see a list of available documents, and `get_documentation_page` to read a specific page.
+
+ Example: To understand vector data fields better, I consulted the "Vector Data Fields 🥉" document (`vector-datafields`). This revealed that vector data contains multiple values per instrument per day and must be aggregated by a vector operator before being used with other operators.
+
+ ## Step 5: Broaden Understanding with External Research (must call the `arxiv_api.py` script to get the latest research papers)
+
+ For cutting-edge ideas and inspiration, you can search for academic papers on arXiv using the provided `arxiv_api.py` script.
+
+ Workflow:
+
+ 1. **Identify Keywords**: Based on your analysis of the alpha, identify relevant keywords. For our example, these were: "short interest", "fundamental analysis", "relative value", and "news sentiment".
+ 2. **Run the Script**: Use the wrapper script to avoid SSL errors.
+
+ ```bash
+ python arxiv_api.py "your keywords here" -n 10
+ ```
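+
+ The script's internals are not shown here, but a minimal sketch of the kind of query it performs against the public arXiv API might look like the following; `search_arxiv` is a hypothetical stand-in, not the actual contents of `arxiv_api.py`:
+
+ ```python
+ # Minimal sketch of an arXiv query via the public Atom API.
+ # arxiv_api.py's real implementation and SSL handling may differ.
+ import urllib.parse
+ import urllib.request
+
+ def search_arxiv(keywords: str, n: int = 10) -> str:
+     query = urllib.parse.urlencode(
+         {"search_query": f"all:{keywords}", "max_results": n}
+     )
+     url = f"https://export.arxiv.org/api/query?{query}"
+     with urllib.request.urlopen(url) as resp:
+         return resp.read().decode("utf-8")  # Atom XML feed
+
+ print(search_arxiv("short interest fundamental analysis")[:200])
+ ```
+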
+ ## Step 6: Synthesize and Explain
+
+ Once you have gathered all the necessary information, structure your explanation in a clear and concise format. The following template is recommended:
+
+ - **Idea**: A high-level summary of the alpha's strategy.
+ - **Rationale for data used**: An explanation of why each data field was chosen and what it represents.
+ - **Rationale for operators used**: A step-by-step explanation of how the operators transform the data to generate the final signal.
+ - **Further Inspiration**: Ideas for new alphas based on your research.
+
+ ## Troubleshooting
+
+ - **SSL Errors**: If you encounter a `CERTIFICATE_VERIFY_FAILED` error when running Python scripts that access the internet, ask the AI to modify the script or generate a wrapper so your command can run.
+
+ ## Appendix A: Understanding Vector Data
+
+ Vector Data is a distinct type of data field where the number of events recorded per day, per instrument, can vary. This is in contrast to standard matrix data, which has a single value for each instrument per day.
+
+ For example, news sentiment data is often a vector because a stock can have multiple news articles on a single day. To use this data in most BRAIN operators, it must first be aggregated into a single value using a vector operator.
@@ -0,0 +1,72 @@
+ ---
+ name: brain-how-to-pass-AlphaTest
+ description: >-
+   Provides detailed requirements, thresholds, and improvement tips for WorldQuant BRAIN Alpha submission tests.
+   Covers Fitness, Sharpe, Turnover, Weight, Sub-universe, and Self-Correlation tests.
+   Use this when the user asks about alpha submission failures, how to improve alpha metrics, or test requirements.
+ ---
+
+ # BRAIN Alpha Submission Tests: Requirements and Improvement Tips
+
+ This skill provides key requirements and expert tips for passing alpha submission tests.
+ For comprehensive details, thresholds, and community-sourced strategies, please read [reference.md](reference.md).
+
+ ## Overview
+
+ Alphas must pass a series of pre-submission checks to ensure they meet quality thresholds.
+
+ ## 1. Fitness
+ ### Requirements
+ - At least "Average": greater than 1.3 for Delay-0, or greater than 1 for Delay-1.
+ - Fitness = Sharpe * sqrt(abs(Returns) / max(Turnover, 0.125)) (worked example below).
+
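+ A quick worked example of this formula with illustrative numbers:
+
+ ```python
+ # Worked example of the fitness formula above (numbers illustrative).
+ from math import sqrt
+
+ sharpe, returns, turnover = 1.4, 0.12, 0.30
+ fitness = sharpe * sqrt(abs(returns) / max(turnover, 0.125))
+ print(round(fitness, 2))  # 1.4 * sqrt(0.12 / 0.30) = 0.89
+ ```
+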
+ ### Tips to Improve
+ - Increase Sharpe/Returns and reduce Turnover.
+ - Use group operators (e.g., with pv13) to boost fitness.
+ - Check with the `check_submission` tool.
+
+ ## 2. Sharpe Ratio
+ ### Requirements
+ - Greater than 2 for Delay-0, or greater than 1.25 for Delay-1.
+ - Sharpe = sqrt(252) * IR, where IR = mean(PnL) / stdev(PnL) (worked example below).
+
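+ A minimal sketch of this annualization from a daily PnL series; the sample numbers are illustrative only:
+
+ ```python
+ # Annualized Sharpe from daily PnL: sqrt(252) * mean(PnL) / stdev(PnL).
+ from math import sqrt
+ from statistics import mean, stdev
+
+ daily_pnl = [120.0, -40.0, 85.0, 30.0, -15.0, 60.0]  # illustrative
+ ir = mean(daily_pnl) / stdev(daily_pnl)              # information ratio
+ sharpe = sqrt(252) * ir
+ print(round(sharpe, 2))
+ ```
+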
+ ### Tips to Improve
+ - Focus on consistent PnL with low volatility.
+ - Decay signals separately for liquid/non-liquid stocks.
+ - If Sharpe is negative (e.g., -1 to -2), try flipping the sign: `-original_expression`.
+
+ ## 3. Turnover
+ ### Requirements
+ - 1% < Turnover < 70%.
+
+ ### Tips to Improve
+ - Use decay functions (`ts_decay_linear`) to smooth signals.
+
+ ## 4. Weight Test
+ ### Requirements
+ - Max weight in any stock < 10%.
+
+ ### Tips to Improve
+ - Use neutralization (e.g., `neutralize(x, "MARKET")`) to distribute weights.
+
+ ## 5. Sub-universe Test
+ ### Requirements
+ - Sub-universe Sharpe >= 0.75 * sqrt(subuniverse_size / alpha_universe_size) * alpha_sharpe (worked example below).
+
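+ A worked example of this threshold with illustrative numbers:
+
+ ```python
+ # Sub-universe Sharpe threshold (numbers illustrative): an alpha with
+ # Sharpe 1.5 on a 3000-stock universe, checked on a 1000-stock subset.
+ from math import sqrt
+
+ alpha_sharpe, universe, subuniverse = 1.5, 3000, 1000
+ threshold = 0.75 * sqrt(subuniverse / universe) * alpha_sharpe
+ print(round(threshold, 2))  # 0.75 * sqrt(1/3) * 1.5 = 0.65
+ ```
+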
+ ### Tips to Improve
+ - Avoid size-related multipliers.
+ - Decay liquid/non-liquid parts separately.
+
+ ## 6. Self-Correlation
+ ### Requirements
+ - < 0.7 PnL correlation with your own previously submitted alphas.
+
+ ### Tips to Improve
+ - Submit diverse ideas.
+ - Use the `check_correlation` tool.
+ - Transform negatively correlated alphas.
+
+ ## General Guidance
+ - **Start Simple**: Use basic operators like `ts_rank` first.
+ - **Optimize Settings**: Choose universes like TOP3000 (USA, D1).
+ - **ATOM Principle**: Avoid mixing datasets to benefit from relaxed "ATOM" submission criteria (Last 2Y Sharpe).