cnhkmcp 2.1.7__py3-none-any.whl → 2.1.9__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- cnhkmcp/__init__.py +1 -1
- cnhkmcp/untracked/AI打工人/BRAIN_AI打工人Mac_Linux/版本.zip +0 -0
- cnhkmcp/untracked/AI打工人/双击安装AI打工人_Windows/版本.exe +0 -0
- cnhkmcp/untracked/AI桌面插件/config.json +1 -1
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/SKILL.md +25 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/reference.md +59 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/requirements.txt +4 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/skill.py +734 -0
- cnhkmcp/untracked/skills/brain-datafield-exploration-general/SKILL.md +45 -0
- cnhkmcp/untracked/skills/brain-datafield-exploration-general/reference.md +194 -0
- cnhkmcp/untracked/skills/brain-dataset-exploration-general/SKILL.md +39 -0
- cnhkmcp/untracked/skills/brain-dataset-exploration-general/reference.md +436 -0
- cnhkmcp/untracked/skills/brain-explain-alphas/SKILL.md +39 -0
- cnhkmcp/untracked/skills/brain-explain-alphas/reference.md +56 -0
- cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/SKILL.md +72 -0
- cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/reference.md +202 -0
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/SKILL.md +44 -0
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +101 -0
- cnhkmcp/untracked/skills/brain-nextMove-analysis/SKILL.md +37 -0
- cnhkmcp/untracked/skills/brain-nextMove-analysis/reference.md +128 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/METADATA +1 -1
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/RECORD +26 -10
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/WHEEL +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/entry_points.txt +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/licenses/LICENSE +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,436 @@ cnhkmcp/untracked/skills/brain-dataset-exploration-general/reference.md

# Dataset Exploration Expert - Job Duty Manual

## WorldQuant BRAIN Platform

### Position Overview
The Dataset Exploration Expert is a specialized role focused on deep analysis and categorization of datasets within the WorldQuant BRAIN platform. This expert acts as a specialist assistant that excels at exploring individual datasets, grouping data fields into logical categories, and providing comprehensive insights into data field characteristics and relationships.

### Core Responsibilities

#### 1. Dataset Deep Dive Analysis
- **Single Dataset Focus**: Concentrate on one dataset at a time for comprehensive understanding
- **Data Field Inventory**: Catalog and analyze all available data fields within the target dataset
- **Coverage Analysis**: Assess data availability across different instruments, regions, and time periods
- **Quality Assessment**: Evaluate data reliability, consistency, and completeness

#### 2. Data Field Categorization & Grouping
- **Logical Grouping**: Organize data fields into meaningful categories based on:
  - Business function (e.g., financial metrics, operational data, market indicators)
  - Data type (e.g., matrix, vector, group fields)
  - Update frequency (e.g., daily, quarterly, annual)
  - Coverage patterns (e.g., high-coverage vs. low-coverage fields)
  - Usage patterns (e.g., frequently used vs. underutilized fields)

- **Hierarchical Organization**: Create multi-level categorization systems:
  - Primary categories (e.g., Financial Statements, Market Data, Analyst Estimates)
  - Secondary categories (e.g., Balance Sheet, Income Statement, Cash Flow)
  - Tertiary categories (e.g., Assets, Liabilities, Revenue, Expenses)

#### 3. Enhanced Data Field Descriptions
- **Current Description Analysis**: Review existing field descriptions for clarity and completeness
- **Enhanced Documentation**: Write improved, detailed descriptions that include:
  - Business context and significance
  - Calculation methodology (if applicable)
  - Typical value ranges and distributions
  - Relationships to other fields
  - Common use cases in alpha creation
  - Coverage limitations and considerations

#### 4. Exploratory Data Analysis
- **Statistical Profiling**: Analyze data field characteristics including:
  - Value distributions and ranges
  - Temporal patterns and seasonality
  - Cross-sectional relationships
  - Missing data patterns
  - Outlier identification

- **Feature Engineering Insights**: Identify potential derived features and combinations
- **Alpha Creation Opportunities**: Discover patterns that could lead to profitable trading strategies

#### 5. Cross-Platform Research & Integration
- **BRAIN Platform Exploration**: Leverage all available platform tools:
  - Data Explorer for field discovery
  - Simulation capabilities for data testing
  - Research papers and documentation
  - User community insights and best practices

- **Forum & Community Engagement**: Research and integrate knowledge from:
  - BRAIN support forum discussions
  - User-generated content and tutorials
  - Expert insights and case studies
  - Platform updates and new features

### Technical Skills Required

#### 1. BRAIN Platform Proficiency
- **Data Explorer Mastery**: Expert use of BRAIN's AI-powered data discovery tools
- **Simulation Tools**: Ability to run test simulations to understand data behavior
- **API Knowledge**: Understanding of the BRAIN API for automated data exploration
- **Documentation Navigation**: Efficient use of platform documentation and resources

#### 2. Data Analysis Capabilities
- **Statistical Analysis**: Understanding of descriptive statistics, distributions, and relationships
- **Financial Knowledge**: Familiarity with financial statements, ratios, and market data
- **Pattern Recognition**: Ability to identify meaningful patterns in complex datasets
- **Data Quality Assessment**: Skills in evaluating data reliability and consistency

#### 3. Documentation & Communication
- **Technical Writing**: Ability to create clear, comprehensive field descriptions
- **Visual Organization**: Skills in creating logical categorization systems
- **Knowledge Management**: Ability to organize and present complex information clearly

### Workflow & Methodology

#### Phase 1: Dataset Selection & Initial Assessment
1. **Dataset Identification**: Select the target dataset based on:
   - Strategic importance
   - Current usage levels
   - Data quality scores
   - User community needs

2. **Initial Exploration**: Use MCP to:
   - Review the dataset overview and description
   - Identify total field count and coverage
   - Assess value scores and pyramid multipliers
   - Review research papers and documentation

**MCP Tool Calls for Phase 1** (a usage sketch follows the list):
- **`mcp_brain-api_get_datasets`**: Discover available datasets with coverage filters
- **`mcp_brain-api_get_datafields`**: Get field count and coverage statistics for the selected dataset
- **`mcp_brain-api_get_documentations`**: Access platform documentation and research papers
- **`mcp_brain-api_get_documentation_page`**: Read specific documentation pages for dataset context
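To make Phase 1 concrete, here is a minimal Python sketch of the discovery flow. `call_tool` is a stand-in for whatever MCP client invokes the brain-api tools, and the argument names (`region`, `universe`, `dataset_id`) and response shapes are assumptions, not confirmed tool schemas.

```python
# Hypothetical Phase 1 flow: discover datasets, then size up one target dataset.

def call_tool(name: str, arguments: dict) -> dict:
    """Placeholder for an MCP client call; replace with your client's invoke method."""
    raise NotImplementedError(f"wire this to your MCP client: {name}({arguments})")

def phase1_initial_assessment(region: str = "USA", universe: str = "TOP3000") -> dict:
    # 1. Discover candidate datasets (filter names are assumptions).
    datasets = call_tool("mcp_brain-api_get_datasets",
                         {"region": region, "universe": universe})
    # 2. Pull the field inventory for one target dataset to gauge its size.
    target = datasets["results"][0]["id"]
    fields = call_tool("mcp_brain-api_get_datafields",
                       {"dataset_id": target, "region": region, "universe": universe})
    # 3. Record the field count and coverage for the exploration report.
    return {"dataset": target, "field_count": len(fields["results"])}
```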
#### Phase 2: Comprehensive Field Analysis
1. **Field Inventory**: Catalog all data fields with:
   - Field ID and name
   - Current description
   - Data type (matrix/vector/group); note that different data types have different characteristics and usage patterns, so use MCP to read the related documentation on how to handle each type
   - Coverage statistics
   - Usage metrics (user count, alpha count)

2. **Preliminary Categorization**: Group fields by (see the sketch after the tool list):
   - Business function
   - Data characteristics
   - Update patterns
   - Coverage levels

**MCP Tool Calls for Phase 2:**
- **`mcp_brain-api_get_datafields`**: Retrieve the complete field inventory with metadata
- **`mcp_brain-api_get_documentation_page`**: Read data type handling documentation (e.g., "vector-datafields", "group-data-fields")
- **`mcp_brain-api_get_operators`**: Understand available operators for different data types
- **`mcp_brain-api_get_documentations`**: Access data handling best practices and examples
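As referenced in the categorization step above, here is a minimal sketch of a preliminary grouping pass, assuming the field inventory has already been retrieved via `mcp_brain-api_get_datafields`. The record keys (`id`, `type`, `coverage`) and the 80% coverage threshold are illustrative assumptions.

```python
# Bucket an already retrieved field inventory by data type and coverage band.
from collections import defaultdict

def categorize_fields(fields: list[dict]) -> dict[str, list[str]]:
    buckets: dict[str, list[str]] = defaultdict(list)
    for f in fields:
        dtype = f.get("type", "UNKNOWN")  # MATRIX / VECTOR / GROUP
        band = "high-coverage" if f.get("coverage", 0.0) >= 0.8 else "low-coverage"
        buckets[f"{dtype}/{band}"].append(f["id"])
    return dict(buckets)

# Example with stub records:
print(categorize_fields([
    {"id": "close", "type": "MATRIX", "coverage": 0.99},
    {"id": "nws_sent", "type": "VECTOR", "coverage": 0.42},
]))
```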
#### Phase 3: Initial Data Exploration
1. **Statistical Profiling Using the BRAIN 6-Tips Methodology**: Run systematic exploratory simulations following the proven BRAIN platform approach. This methodology provides a comprehensive framework for understanding new datafields efficiently. **Critical Settings for All Tests** (expressed as a payload sketch below):
   - **Neutralization**: "None" (to see raw data behavior without masking important patterns)
   - **Decay**: 0 (to preserve actual data values and avoid smoothing out variations)
   - **Test Period**: P0Y0M (for a focused analysis window)
   - **Focus**: Long Count and Short Count in the IS Summary section for insights
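The payload sketch referenced above: the critical settings as a Python dict. The key names follow common BRAIN simulation settings but are assumptions here; validate them with `mcp_brain-api_get_platform_setting_options` before use.

```python
# Assumed simulation-settings payload for all 6-tips exploration runs.
EXPLORATION_SETTINGS = {
    "instrumentType": "EQUITY",
    "region": "USA",
    "universe": "TOP3000",
    "delay": 1,
    "neutralization": "NONE",  # raw behavior, no masking
    "decay": 0,                # no smoothing of actual values
    "testPeriod": "P0Y0M",     # focused analysis window
}
```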
**A. Basic Coverage Analysis**
- **Expression**: `datafield` (for matrix data types) or `vector_operator(datafield)` (for vector data types)
- **Purpose**: Determine % coverage = (Long Count + Short Count) / Universe Size (e.g., 1,500 long + 1,300 short in a 3,000-stock universe is roughly 93% coverage)
- **Insight**: Understand basic data availability across instruments
- **What it tells you**: How many instruments have data for this field on average
- **Implementation**: Start with this test to establish a baseline coverage understanding

**B. Non-Zero Value Coverage**
- **Expression**: `datafield != 0 ? 1 : 0` (for matrix) or `vector_operator(datafield) != 0 ? 1 : 0` (for vector)
- **Purpose**: Distinguish between missing data and actual zero values
- **Insight**: Long Count indicates the average number of non-zero values on a daily basis
- **What it tells you**: Whether the field has meaningful data vs. just coverage gaps
- **Implementation**: Run after basic coverage to separate data quality from availability

**C. Data Update Frequency Analysis**
- **Expression**: `ts_std_dev(datafield,N) != 0 ? 1 : 0` (for matrix) or `ts_std_dev(vector_operator(datafield),N) != 0 ? 1 : 0` (for vector)
- **Purpose**: Understand how often data actually changes vs. being backfilled
- **Insight**: Frequency of unique data updates (daily, weekly, monthly, quarterly)
- **Key Testing Strategy**:
  - **N = 5 (weekly)**: Long Count + Short Count will be lowest (approx. 1/5th of coverage)
  - **N = 22 (monthly)**: Long Count + Short Count will be lower (approx. 1/3rd of coverage)
  - **N = 66 (quarterly)**: Long Count + Short Count will be closest to actual coverage
- **What it tells you**: Data freshness patterns and whether data is actively updated or backfilled
- **Implementation**: Test with various N values to identify the actual update frequency

**D. Data Bounds Analysis**
- **Expression**: `abs(datafield) > X` (for matrix) or `abs(vector_operator(datafield)) > X` (for vector)
- **Purpose**: Understand data range, scale, and normalization
- **Insight**: Bounds of the datafield values
- **Testing Strategy**: Vary X values systematically:
  - **X = 1**: Test whether data is normalized to values between -1 and +1
  - **X = 0.1, 0.5, 1, 5, 10**: Test various thresholds to understand the value distribution
- **What it tells you**: Whether data is normalized, typical value ranges, and data scale
- **Implementation**: Start with X = 1, then adjust based on results to map the full value range

**E. Central Tendency Analysis**
- **Expression**: `ts_median(datafield, 1000) > X` (for matrix) or `ts_median(vector_operator(datafield), 1000) > X` (for vector)
- **Purpose**: Understand typical values and central tendency over roughly four years (1,000 trading days)
- **Insight**: Median of the datafield over an extended period
- **Testing Strategy**: Vary X values to understand the value distribution:
  - **X = 0**: Test whether the median is positive
  - **X = 0.1, 0.5, 1, 5, 10**: Test various thresholds to map the central tendency
- **What it tells you**: Whether data is skewed, what typical values look like, and data characteristics
- **Alternative**: Can also use `ts_mean(datafield, 1000) > X` for mean-based analysis
- **Implementation**: Test with increasing X values until Long Count approaches zero

**F. Data Distribution Analysis**
- **Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y` (for matrix) or `X < scale_down(vector_operator(datafield)) && scale_down(vector_operator(datafield)) < Y` (for vector)
- **Purpose**: Understand how data distributes across its range
- **Insight**: Distribution characteristics and patterns
- **Key Understanding**: `scale_down` acts as a MinMaxScaler that preserves the original distribution
- **Testing Strategy**: Vary X and Y between 0 and 1 to test different distribution segments:
  - **X = 0, Y = 0.25**: Test the bottom quartile
  - **X = 0.25, Y = 0.5**: Test the second quartile
  - **X = 0.5, Y = 0.75**: Test the third quartile
  - **X = 0.75, Y = 1**: Test the top quartile
- **What it tells you**: Whether data is evenly distributed, clustered, or has specific patterns
- **Implementation**: Test quartile ranges first, then adjust for finer granularity

**Data Type Considerations**:
- **Matrix Data Type**: Use the expressions directly as shown above
- **Vector Data Type**: Must use appropriate vector operators (found via MCP) to convert to matrix format
- **Group Data Type**: Requires special handling; consult the MCP documentation for group field operators
- **Critical**: Always verify the data type before testing and use the appropriate operators accordingly

**Implementation Workflow for BRAIN 6-Tips** (a batch-runner sketch follows this list):
1. **Setup Phase**: Configure the simulation with "None" neutralization, decay 0, and a P0Y0M test period
2. **Sequential Testing**: Run tests A through F in order for systematic understanding
3. **Iterative Refinement**: Adjust thresholds based on initial results for deeper insights
4. **Documentation**: Record Long Count and Short Count for each test to build a comprehensive profile
5. **Validation**: Cross-reference results across different N values and thresholds for consistency
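A sketch of the sequential-testing workflow for a matrix datafield: generate one representative expression per test (the thresholds and the N=66 window are starting points taken from the strategies above) and submit each through the simulation tool. `submit_simulation` is a placeholder for the actual MCP call.

```python
# Generate the six diagnostic expressions for a matrix datafield and run them.

def six_tips_expressions(field: str) -> dict[str, str]:
    return {
        "A_coverage":     field,
        "B_nonzero":      f"{field} != 0 ? 1 : 0",
        "C_frequency":    f"ts_std_dev({field},66) != 0 ? 1 : 0",  # quarterly window
        "D_bounds":       f"abs({field}) > 1",
        "E_tendency":     f"ts_median({field}, 1000) > 0",
        "F_distribution": f"0 < scale_down({field}) && scale_down({field}) < 0.25",
    }

def submit_simulation(expression: str, settings: dict) -> None:
    raise NotImplementedError("wire to mcp_brain-api_create_multi_regularAlpha_simulation")

for name, expr in six_tips_expressions("close").items():
    print(name, "->", expr)
    # submit_simulation(expr, EXPLORATION_SETTINGS)  # settings dict from the sketch above;
    # record the Long Count and Short Count from each run's IS Summary.
```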
**Expected Results Interpretation**:
- **Coverage Tests (A & B)**: Should show Long Count + Short Count ≤ Universe Size
- **Frequency Tests (C)**: Lower N values should show proportionally lower counts
- **Bounds Tests (D)**: Should reveal data normalization and typical ranges
- **Tendency Tests (E)**: Should show data skewness and central value characteristics
- **Distribution Tests (F)**: Should reveal clustering, patterns, and data spread

**Common Patterns to Watch For**:
- **Normalized Data**: Values consistently between -1 and +1
- **Quarterly Updates**: Significant count differences between N=22 and N=66
- **Sparse Data**: High coverage but low non-zero counts
- **Skewed Distributions**: Uneven quartile distributions in scale_down tests
- **Data Quality Issues**: Inconsistent results across different test parameters

**Practical Example - Closing Price Analysis**:
- **Test A (Basic Coverage)**: `close` → high Long Count + Short Count indicates universal coverage
- **Test B (Non-Zero)**: `close != 0 ? 1 : 0` → should show the same high counts (prices are never zero)
- **Test C (Frequency)**: `ts_std_dev(close,5) != 0 ? 1 : 0` → high counts indicate daily price changes
- **Test D (Bounds)**: `abs(close) > 1` → should show high counts (prices are typically > $1)
- **Test E (Tendency)**: `ts_median(close,1000) > 0` → should show high counts (median prices are positive)
- **Test F (Distribution)**: `0 < scale_down(close) && scale_down(close) < 0.25` → tests the bottom-quartile distribution

**What This Example Demonstrates**:
- **Validation**: Confirms expected behavior (prices are positive, change daily, have good coverage)
- **Pattern Recognition**: Shows how to identify normal vs. abnormal data characteristics
- **Quality Assessment**: Reveals data consistency and reliability
- **Alpha Creation Insights**: Understanding price behavior helps in strategy development

**Troubleshooting Common Issues**:
- **Zero Counts**: Check that the datafield name is correct and the data type is appropriate
- **Unexpected Results**: Verify that neutralization is "None" and decay is 0
- **Vector Field Errors**: Ensure the proper vector operator is used for vector data types
- **Inconsistent Patterns**: Test with different N values and thresholds for validation
- **Low Coverage**: Consider universe size and data availability in the selected region/timeframe

**Best Practices for Efficient Exploration**:
- **Start Simple**: Begin with basic coverage tests before complex analysis
- **Document Everything**: Record all test parameters and results systematically
- **Iterate Intelligently**: Use initial results to guide subsequent test parameters
- **Cross-Validate**: Compare results across different test methods for consistency
- **Focus on Insights**: Prioritize understanding data behavior over exhaustive testing

2. **Advanced Statistical Analysis**:
   - Value distributions and ranges
   - Temporal patterns and seasonality
   - Cross-sectional relationships
   - Missing data patterns
   - Outlier identification
   - Data quality consistency over time

**MCP Tool Calls for Phase 3:**
- **`mcp_brain-api_create_multi_regularAlpha_simulation`**: Execute the BRAIN 6-tips methodology simulations
- **`mcp_brain-api_get_platform_setting_options`**: Validate simulation settings and parameters
- **`mcp_brain-api_get_operators`**: Access time series operators (ts_std_dev, ts_median, scale_down)
- **`mcp_brain-api_get_documentation_page`**: Read the simulation settings documentation ("simulation-settings")
- **`mcp_brain-api_get_documentation_page`**: Access data analysis best practices ("data")

3. **Relationship Mapping**: Identify:
   - Field interdependencies and correlations
   - Logical groupings and hierarchies
   - Potential derived features and combinations
   - Alpha creation opportunities
   - Risk factors and limitations

#### Phase 4: Enhanced Documentation
1. **Description Enhancement**: Improve field descriptions with:
   - Business context
   - Calculation details and data units
   - Usage examples
   - Limitations and considerations

2. **Categorization Refinement**: Finalize logical groupings with:
   - Clear category names
   - Hierarchical structure
   - Cross-references
   - Usage guidelines

**MCP Tool Calls for Phase 4:**
- **`mcp_brain-api_get_documentation_page`**: Access field description best practices ("data")
- **`mcp_brain-api_get_documentations`**: Review documentation structure and organization
- **`mcp_brain-api_get_alpha_examples`**: Find usage examples in the documentation ("19-alpha-examples")
- **`mcp_brain-api_get_documentation_page`**: Access categorization guidelines ("how-use-data-explorer")

#### Phase 5: Knowledge Integration & Validation
1. **Community Research**: Review forum discussions and user insights; search for and read related documents and guidelines
2. **Best Practice Integration**: Incorporate platform-specific knowledge from related documents and the relevant competitions' guidelines
3. **Validation**: Test the categorization with sample use cases
4. **Documentation**: Create the final comprehensive dataset guide

**MCP Tool Calls for Phase 5:**
- **`mcp_brain-forum_search_forum_posts`**: Search community discussions and user insights
- **`mcp_brain-forum_read_full_forum_post`**: Read detailed forum discussions and best practices
- **`mcp_brain-api_get_events`**: Access competition guidelines and rules
- **`mcp_brain-api_get_competition_details`**: Review specific competition requirements
- **`mcp_brain-api_get_documentation_page`**: Access platform best practices and guidelines
- **`mcp_brain-api_get_alpha_examples`**: Review alpha strategy examples for validation

### Deliverables

#### 1. Dataset Field Catalog
- Complete inventory of all data fields
- Enhanced descriptions for each field
- Coverage and usage statistics
- Quality indicators and limitations

#### 2. Logical Categorization System
- Hierarchical field grouping
- Category descriptions and rationale
- Cross-reference system
- Usage guidelines and examples

#### 3. Initial Data Exploration Report
- Coverage analysis by instrument and time
- Data consistency evaluation
- Missing data patterns
- Quality improvement recommendations

#### 4. Alpha Creation Insights
- Identified patterns and relationships
- Potential strategy opportunities
- Risk considerations
- Implementation guidelines

#### 5. Comprehensive Dataset Guide
- Executive summary
- Detailed field documentation
- Categorization system
- Best practices and examples
- Troubleshooting guide

### Success Metrics

#### 1. Documentation Quality
- **Completeness**: All fields documented with enhanced descriptions
- **Clarity**: Descriptions are clear and actionable
- **Organization**: Logical, intuitive categorization system
- **Accuracy**: Information is current and correct

#### 2. User Experience Improvement
- **Discovery**: Users can quickly find relevant fields
- **Understanding**: Clear comprehension of field purpose and usage
- **Efficiency**: Reduced time to identify appropriate data
- **Confidence**: Users trust the information provided

#### 3. Platform Knowledge Enhancement
- **Coverage**: Comprehensive understanding of dataset capabilities
- **Insights**: Discovery of new patterns and opportunities
- **Integration**: Knowledge connects to broader platform understanding
- **Innovation**: Identification of new use cases and applications

### Tools & Resources

#### 1. BRAIN Platform Tools
- **Data Explorer**: Primary field discovery and analysis tool
- **Simulation Engine**: Data behavior testing and validation
- **Documentation System**: Platform knowledge and best practices
- **API Access**: Automated data exploration and analysis
- **BRAIN 6-Tips Methodology**: Proven systematic approach to datafield exploration

**MCP Tool Integration for Platform Tools:**
- **Data Explorer**: Use `mcp_brain-api_get_datasets` and `mcp_brain-api_get_datafields`
- **Simulation Engine**: Use `mcp_brain-api_create_simulation` with the proper settings
- **Documentation System**: Use `mcp_brain-api_get_documentations` and `mcp_brain-api_get_documentation_page`
- **API Access**: All MCP tools provide automated API access
- **BRAIN 6-Tips**: Implemented through `mcp_brain-api_create_simulation` calls

#### 2. External Resources
- **Financial Databases**: Additional context for financial fields
- **Industry Publications**: Market knowledge and trends
- **Academic Research**: Statistical methods and best practices
- **Community Forums**: User insights and experiences

#### 3. Analysis Tools
- **Statistical Software**: Data analysis and visualization
- **Documentation Tools**: Knowledge management and organization
- **Collaboration Platforms**: Team coordination and knowledge sharing

**MCP-Enhanced Analysis Capabilities:**
- **Statistical Analysis**: Use `mcp_brain-api_create_simulation` for data behavior testing
- **Data Quality Assessment**: Use `mcp_brain-api_get_platform_setting_options` for validation
- **Pattern Recognition**: Use `mcp_brain-api_get_operators` for available analysis functions
- **Documentation Management**: Use `mcp_brain-api_get_documentations` for comprehensive knowledge access
- **Community Integration**: Use `mcp_brain-forum_*` tools for collaborative insights

### Professional Development

#### 1. Continuous Learning
- **Platform Updates**: Stay current with BRAIN platform developments
- **Industry Trends**: Monitor financial data and technology advances
- **Best Practices**: Learn from community and expert insights
- **Skill Enhancement**: Develop additional technical and analytical capabilities

#### 2. Knowledge Sharing
- **Team Training**: Share expertise with colleagues
- **Community Contribution**: Contribute to BRAIN community knowledge
- **Documentation Updates**: Maintain current and accurate information
- **Best Practice Development**: Create and refine methodologies

### Conclusion

The Dataset Exploration Expert role is critical for maximizing the value of WorldQuant BRAIN's extensive data resources. By providing deep insights, logical organization, and comprehensive documentation, this expert enables users to discover new opportunities, create more effective alphas, and leverage the platform's full potential.

Success in this role requires a combination of technical expertise, analytical thinking, and communication skills, along with a deep understanding of both financial markets and data science principles. The expert serves as a bridge between raw data and actionable insights, transforming complex datasets into accessible, well-organized knowledge resources that drive innovation and success on the BRAIN platform.

---

## 🔧 **MCP Tool Reference Guide**

### **Core Data Exploration Tools**
- **`mcp_brain-api_get_datasets`**: Discover and filter available datasets
- **`mcp_brain-api_get_datafields`**: Retrieve the field inventory and metadata
- **`mcp_brain-api_create_simulation`**: Execute data analysis simulations
- **`mcp_brain-api_get_platform_setting_options`**: Validate simulation parameters

### **Documentation & Knowledge Tools**
- **`mcp_brain-api_get_documentations`**: Access the platform documentation structure
- **`mcp_brain-api_get_documentation_page`**: Read specific documentation content
- **`mcp_brain-api_get_operators`**: Discover available analysis operators
- **`mcp_brain-api_get_alpha_examples`**: Access strategy examples and templates

### **Community & Forum Tools**
- **`mcp_brain-forum_search_forum_posts`**: Search community discussions
- **`mcp_brain-forum_read_full_forum_post`**: Read detailed forum content
- **`mcp_brain-forum_get_glossary_terms`**: Access community terminology

### **Competition & Event Tools**
- **`mcp_brain-api_get_events`**: Discover available competitions
- **`mcp_brain-api_get_competition_details`**: Get competition guidelines
- **`mcp_brain-api_get_competition_agreement`**: Access competition rules

### **Best Practices for MCP Tool Usage**
1. **Always authenticate first** using `mcp_brain-api_authenticate`
2. **Validate parameters** using `mcp_brain-api_get_platform_setting_options`
3. **Handle errors gracefully** and retry with corrected parameters
4. **Use appropriate delays** between API calls to avoid rate limiting (see the retry sketch below)
5. **Document tool usage** in your exploration reports for reproducibility
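A minimal sketch of points 3 and 4, assuming a simple fixed delay and a generic exception; adapt the error type and backoff policy to your MCP client.

```python
# Retry helper with a delay between API calls to stay under rate limits.
import time

def call_with_retry(fn, *args, attempts: int = 3, delay_s: float = 1.0):
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args)
        except Exception:            # replace with your client's error type
            if attempt == attempts:  # out of retries: surface the error
                raise
            time.sleep(delay_s)      # back off before retrying
```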
@@ -0,0 +1,39 @@ cnhkmcp/untracked/skills/brain-explain-alphas/SKILL.md

---
name: brain-explain-alphas
description: >-
  Provides a step-by-step workflow for analyzing and explaining WorldQuant BRAIN alpha expressions.
  Use this when the user asks to explain a specific alpha expression, what a datafield does, or how operators work together.
  Includes steps for data field lookup, operator analysis, and external research.
---

# Alpha Explanation Workflow

This manual provides a workflow for analyzing and explaining a WorldQuant BRAIN alpha expression.
For the full detailed workflow and examples, see [reference.md](reference.md).

## Step 1: Deconstruct the Alpha Expression
Break down the alpha expression into its fundamental components: data fields and operators.
*Example:* `quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90))`
- **Data Fields**: `oth423_find`, `shrt3_bar`
- **Operators**: `quantile`, `ts_regression`, `group_mean`, `vec_max`

## Step 2: Analyze Data Fields
Use the `get_datafields` tool to get details about each data field.
- Identify: Instrument Type, Region, Delay, Universe, Data Type (Matrix/Vector).
- Note: Vector data requires aggregation (e.g., `vec_max`).

## Step 3: Understand the Operators
Use the `get_operators` tool to understand what each operator does.

## Step 4: Consult Official Documentation
Use `get_documentations` and `read_specific_documentation` for deep dives into concepts (e.g., vector data handling).

## Step 5: Synthesize and Explain
Structure the explanation:
1. **Idea**: High-level summary of the strategy.
2. **Rationale for data**: Why these fields? What do they represent?
3. **Rationale for operators**: How do they transform the data?
4. **Further Inspiration**: Potential improvements.

## Appendix: Vector Data
Vector data records multiple events per day per instrument (e.g., news). It requires aggregation (e.g., `vec_mean`, `vec_sum`) to become a matrix value usable by other operators.
@@ -0,0 +1,56 @@ cnhkmcp/untracked/skills/brain-explain-alphas/reference.md

# Alpha Explanation Workflow

This manual provides a step-by-step workflow for analyzing and explaining a WorldQuant BRAIN alpha expression. By following this guide, you can efficiently gather the necessary information to understand the logic and potential strategy behind any alpha.

## Step 1: Deconstruct the Alpha Expression
The first step is to break down the alpha expression into its fundamental components: data fields and operators.

For example, given the expression `quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90))`:

- Data Fields: `oth423_find`, `shrt3_bar`
- Operators: `quantile`, `ts_regression`, `group_mean`, `vec_max`

## Step 2: Analyze Data Fields
Use the brain-platform-mcp tool `get_datafields` to get detailed information about each data field.

Tool Usage:

```xml
<use_mcp_tool>
  <server_name>brain-platform-mcp</server_name>
  <tool_name>get_datafields</tool_name>
  <arguments>
    {
      "instrument_type": "EQUITY",
      "region": "ASI",
      "delay": 1,
      "universe": "MINVOL1M",
      "data_type": "VECTOR",
      "search": "shrt3_bar"
    }
  </arguments>
</use_mcp_tool>
```

Tips for effective searching:

- Specify Parameters: Always provide as much information as you know, including instrument_type, region, delay, universe, and data_type (MATRIX or VECTOR).
- Iterate: If you don't find the data field on your first try, try different combinations of parameters. The ASI region, for example, has two universes: MINVOL1M and ILLIQUID_MINVOL1M.
- Check Data Type: Be sure to check whether the data is a MATRIX (one value per stock per day) or a VECTOR (multiple values per stock per day). This is crucial for understanding how the data is used.

Example Data Field Information:

- `oth423_find`: A matrix data field from the "Fundamental Income and Dividend Model" dataset in the ASI region. It represents a "Find score," likely indicating fundamental attractiveness.
- `shrt3_bar`: A vector data field from the "Securities Lending Files Data" dataset in the ASI region. It provides a vector of ratings (1-10) indicating the demand to borrow a stock, which is a proxy for short-selling interest.

## Step 3: Understand the Operators
Use the brain-platform-mcp tool `get_operators` to get a list of all available operators and their descriptions.

Tool Usage:

```xml
<use_mcp_tool>
  <server_name>brain-platform-mcp</server_name>
  <tool_name>get_operators</tool_name>
  <arguments>
    {}
  </arguments>
</use_mcp_tool>
```

The output of this command contains a wealth of information. For your convenience, a table of the most common operators is included in the Appendix of this manual.

## Step 4: Consult Official Documentation
For more complex topics, the official BRAIN documentation is an invaluable resource. Use the `get_documentations` tool to see a list of available documents, and `get_documentation_page` to read a specific page.

Example: To understand vector data fields better, I consulted the "Vector Data Fields 🥉" document (vector-datafields). This revealed that vector data contains multiple values per instrument per day and must be aggregated by a vector operator before being used with other operators.

## Step 5: Broaden Understanding with External Research (call the `arxiv_api.py` script to get the latest research papers)
For cutting-edge ideas and inspiration, you can search for academic papers on arXiv using the provided `arxiv_api.py` script.

Workflow:

1. Identify Keywords: Based on your analysis of the alpha, identify relevant keywords. For our example, these were: "short interest", "fundamental analysis", "relative value", and "news sentiment".
2. Run the Script: run it through a wrapper script if needed to avoid SSL errors (see Troubleshooting):

   `python arxiv_api.py "your keywords here" -n 10`
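The bundled `arxiv_api.py` is not shown in this diff; as a hypothetical stand-in, the sketch below queries the public arXiv Atom API directly, roughly matching the command line above.

```python
# Minimal keyword search against the public arXiv Atom API.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def search_arxiv(keywords: str, max_results: int = 10) -> list[str]:
    query = urllib.parse.urlencode({
        "search_query": f"all:{keywords}",
        "max_results": max_results,
    })
    with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{query}") as resp:
        feed = ET.parse(resp)
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    # Return the paper titles from the Atom feed entries.
    return [entry.findtext("atom:title", default="", namespaces=ns).strip()
            for entry in feed.getroot().findall("atom:entry", ns)]

if __name__ == "__main__":
    print(search_arxiv("short interest fundamental analysis", 5))
```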
## Step 6: Synthesize and Explain
Once you have gathered all the necessary information, structure your explanation in a clear and concise format. The following template is recommended:

- Idea: A high-level summary of the alpha's strategy.
- Rationale for data used: An explanation of why each data field was chosen and what it represents.
- Rationale for operators used: A step-by-step explanation of how the operators transform the data to generate the final signal.
- Further Inspiration: Ideas for new alphas based on your research.

## Troubleshooting
SSL Errors: If you encounter a CERTIFICATE_VERIFY_FAILED error when running Python scripts that access the internet, ask the AI assistant to help you modify the script, or create a wrapper script, to execute your command.

## Appendix A: Understanding Vector Data
Vector Data is a distinct type of data field where the number of events recorded per day, per instrument, can vary. This is in contrast to standard matrix data, which has a single value for each instrument per day.

For example, news sentiment data is often a vector because a stock can have multiple news articles on a single day. To use this data in most BRAIN operators, it must first be aggregated into a single value using a vector operator.
@@ -0,0 +1,72 @@ cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/SKILL.md

---
name: brain-how-to-pass-AlphaTest
description: >-
  Provides detailed requirements, thresholds, and improvement tips for WorldQuant BRAIN Alpha submission tests.
  Covers Fitness, Sharpe, Turnover, Weight, Sub-universe, and Self-Correlation tests.
  Use this when the user asks about alpha submission failures, how to improve alpha metrics, or test requirements.
---

# BRAIN Alpha Submission Tests: Requirements and Improvement Tips

This skill provides key requirements and expert tips for passing alpha submission tests.
For comprehensive details, thresholds, and community-sourced strategies, please read [reference.md](reference.md).

## Overview

Alphas must pass a series of pre-submission checks to ensure they meet quality thresholds.

## 1. Fitness
### Requirements
- At least "Average": greater than 1.3 for Delay-0 or greater than 1 for Delay-1.
- Fitness = Sharpe * sqrt(abs(Returns) / max(Turnover, 0.125)).

### Tips to Improve
- Increase Sharpe/Returns and reduce Turnover.
- Use group operators (e.g., with pv13) to boost fitness.
- Check with the `check_submission` tool.

## 2. Sharpe Ratio
### Requirements
- Greater than 2 for Delay-0 or greater than 1.25 for Delay-1.
- Sharpe = sqrt(252) * IR, where IR = mean(PnL) / stdev(PnL).

### Tips to Improve
- Focus on consistent PnL with low volatility.
- Decay signals separately for liquid and non-liquid stocks.
- If Sharpe is negative (e.g., -1 to -2), try flipping the sign: `-original_expression`.
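The two definitions above, sketched as plain Python functions so a candidate alpha's daily PnL can be sanity-checked before submission. The formulas come directly from this section; the example numbers are illustrative.

```python
# Sharpe and Fitness exactly as defined above: Sharpe = sqrt(252) * mean/stdev of
# daily PnL; Fitness = Sharpe * sqrt(abs(Returns) / max(Turnover, 0.125)).
import math
import statistics

def sharpe(daily_pnl: list[float]) -> float:
    ir = statistics.mean(daily_pnl) / statistics.stdev(daily_pnl)  # IR = mean/stdev
    return math.sqrt(252) * ir                                     # annualized

def fitness(sharpe_ratio: float, returns: float, turnover: float) -> float:
    return sharpe_ratio * math.sqrt(abs(returns) / max(turnover, 0.125))

# e.g., Sharpe 1.4, annual returns 12%, turnover 20%:
# fitness = 1.4 * sqrt(0.12 / 0.20) ≈ 1.08, which clears the Delay-1 bar of 1.
print(fitness(1.4, 0.12, 0.20))
```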
## 3. Turnover
### Requirements
- 1% < Turnover < 70%.

### Tips to Improve
- Use decay functions (`ts_decay_linear`) to smooth signals.

## 4. Weight Test
### Requirements
- Max weight in any single stock < 10%.

### Tips to Improve
- Use neutralization (e.g., `neutralize(x, "MARKET")`) to distribute weights.

## 5. Sub-universe Test
### Requirements
- Sub-universe Sharpe >= 0.75 * sqrt(subuniverse_size / alpha_universe_size) * alpha_sharpe.

### Tips to Improve
- Avoid size-related multipliers.
- Decay the liquid and non-liquid parts separately.
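The sub-universe requirement above, expressed as a boolean check. The sizes and Sharpe values are illustrative; in practice they come from your simulation results.

```python
# Check: sub-universe Sharpe >= 0.75 * sqrt(sub_size / universe_size) * alpha_sharpe.
import math

def passes_subuniverse(sub_sharpe: float, sub_size: int,
                       alpha_sharpe: float, universe_size: int) -> bool:
    threshold = 0.75 * math.sqrt(sub_size / universe_size) * alpha_sharpe
    return sub_sharpe >= threshold

# e.g., a TOP3000 alpha with Sharpe 1.5 and a 1000-stock sub-universe:
# threshold = 0.75 * sqrt(1/3) * 1.5 ≈ 0.65, so a sub-universe Sharpe of 0.8 passes.
print(passes_subuniverse(0.8, 1000, 1.5, 3000))
```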
## 6. Self-Correlation
### Requirements
- < 0.7 PnL correlation with your own submitted alphas.

### Tips to Improve
- Submit diverse ideas.
- Use the `check_correlation` tool.
- Transform negatively correlated alphas.
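A minimal sketch of the self-correlation check using a plain Pearson correlation between two daily PnL series. The platform's `check_correlation` tool is the authoritative check; this is only a local approximation with stand-in data.

```python
# Pearson correlation between a new alpha's PnL and an already submitted one's.
import statistics

def pnl_correlation(pnl_a: list[float], pnl_b: list[float]) -> float:
    return statistics.correlation(pnl_a, pnl_b)  # Pearson r; Python 3.10+

new_alpha = [0.3, -0.1, 0.2, 0.4, -0.2]   # illustrative daily PnL
submitted = [0.2, -0.2, 0.1, 0.5, -0.1]
print(pnl_correlation(new_alpha, submitted))  # the requirement is r < 0.7
```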
## General Guidance
- **Start Simple**: Use basic operators like `ts_rank` first.
- **Optimize Settings**: Choose universes like TOP3000 (USA, D1).
- **ATOM Principle**: Avoid mixing datasets to benefit from the relaxed "ATOM" submission criteria (Last 2Y Sharpe).
|