cnhkmcp 2.1.7__py3-none-any.whl → 2.1.8__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- cnhkmcp/__init__.py +1 -1
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/SKILL.md +25 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/reference.md +59 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/requirements.txt +4 -0
- cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/skill.py +734 -0
- cnhkmcp/untracked/skills/brain-datafield-exploration-general/SKILL.md +45 -0
- cnhkmcp/untracked/skills/brain-datafield-exploration-general/reference.md +194 -0
- cnhkmcp/untracked/skills/brain-dataset-exploration-general/SKILL.md +39 -0
- cnhkmcp/untracked/skills/brain-dataset-exploration-general/reference.md +436 -0
- cnhkmcp/untracked/skills/brain-explain-alphas/SKILL.md +39 -0
- cnhkmcp/untracked/skills/brain-explain-alphas/reference.md +56 -0
- cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/SKILL.md +72 -0
- cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/reference.md +202 -0
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/SKILL.md +44 -0
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +101 -0
- cnhkmcp/untracked/skills/brain-nextMove-analysis/SKILL.md +37 -0
- cnhkmcp/untracked/skills/brain-nextMove-analysis/reference.md +128 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/METADATA +1 -1
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/RECORD +23 -7
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/WHEEL +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/entry_points.txt +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/licenses/LICENSE +0 -0
- {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.8.dist-info}/top_level.txt +0 -0
cnhkmcp/untracked/skills/brain-datafield-exploration-general/SKILL.md

@@ -0,0 +1,45 @@
---
name: brain-datafield-exploration-general
description: >-
  Provides 6 proven methods to evaluate new datasets on the WorldQuant BRAIN platform.
  Includes methods for checking coverage, non-zero values, update frequency, bounds, central tendency, and distribution.
  Use when the user wants to understand a specific datafield (e.g., "what is this field?", "how often does it update?").
---

# 6 Ways to Evaluate a New Dataset

This skill provides 6 methods to quickly evaluate a new datafield on the WorldQuant BRAIN platform.
For the complete guide and detailed examples, see [reference.md](reference.md).

**Important**: Run these simulations with **Neutralization: None**, **Decay: 0**, **Test Period: P0Y0M**.
**Metrics**: Check **Long Count** and **Short Count** in the IS Summary.

## 1. Basic Coverage Analysis
* **Expression**: `datafield` (or `vec_op(datafield)` for vectors)
* **Insight**: % coverage ≈ (Long Count + Short Count) / Universe Size.

## 2. Non-Zero Value Coverage
* **Expression**: `datafield != 0 ? 1 : 0`
* **Insight**: Real coverage (excluding zeros). Distinguishes missing data (NaN) from actual zero values.

## 3. Data Update Frequency Analysis
* **Expression**: `ts_std_dev(datafield, N) != 0 ? 1 : 0`
* **Insight**: Frequency of updates. Vary `N`:
  * `N=5` (week): a low count implies weekly updates.
  * `N=22` (month): monthly updates.
  * `N=66` (quarter): quarterly updates.

## 4. Data Bounds Analysis
* **Expression**: `abs(datafield) > X`
* **Insight**: Check the value range. Vary `X` (e.g., 1, 10, 100) to probe the scale (e.g., is it normalized to -1 to 1?).

## 5. Central Tendency Analysis
* **Expression**: `ts_median(datafield, 1000) > X`
* **Insight**: Typical values over time (roughly a 5-year median). Vary `X` to find the center.

## 6. Data Distribution Analysis
* **Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y`
* **Insight**: Distribution shape. `scale_down` maps values to 0-1. Vary `X` and `Y` (e.g., 0.1-0.2) to check buckets.

## Note on Vector Data
If the datafield is a **VECTOR** type, wrap it in a vector operator first (e.g., `vec_sum(datafield)` or `vec_avg(datafield)`).
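The six probes are easy to script. Below is a minimal sketch that builds the probe expressions for a given field name; the field name `fnd6_hypothetical` is a placeholder, and the `vec_avg` wrapper stands in for whichever vector operator you found via MCP.

```python
# Build the six exploration expressions for one datafield.
# Swap in vec_avg(field) (or another vector operator) for VECTOR-type fields.

def probe_expressions(field: str, is_vector: bool = False) -> dict[str, str]:
    f = f"vec_avg({field})" if is_vector else field
    return {
        "1_coverage": f,
        "2_nonzero": f"{f} != 0 ? 1 : 0",
        "3_freq_week": f"ts_std_dev({f}, 5) != 0 ? 1 : 0",
        "3_freq_month": f"ts_std_dev({f}, 22) != 0 ? 1 : 0",
        "3_freq_quarter": f"ts_std_dev({f}, 66) != 0 ? 1 : 0",
        "4_bounds": f"abs({f}) > 1",
        "5_median": f"ts_median({f}, 1000) > 0",
        "6_bucket": f"0.1 < scale_down({f}) && scale_down({f}) < 0.2",
    }

for name, expr in probe_expressions("fnd6_hypothetical").items():
    print(f"{name}: {expr}")
```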
cnhkmcp/untracked/skills/brain-datafield-exploration-general/reference.md

@@ -0,0 +1,194 @@
# BRAIN TIPS: 6 Ways to Quickly Evaluate a New Dataset
## WorldQuant BRAIN Platform - Datafield Exploration Guide

**Original Post**: [BRAIN TIPS] 6 ways to quickly evaluate a new dataset
**Author**: KA64574
**Date**: 2 years ago
**Followers**: 265 people

---

## 🎯 **Overview**

WorldQuant BRAIN has thousands of datafields you can use to create alphas. But how do you quickly understand a new datafield? Here are 6 proven methods to evaluate and understand new datasets efficiently.

**Important**: Simulate the expressions below with **neutralization "None"**, **decay 0**, and **test period P0Y0M**. Read off the results using the **Long Count** and **Short Count** in the **IS Summary** section.

**Watch Out**: Check the data type (matrix or vector) first. These are platform-specific definitions, not the usual mathematical ones, and each type has its own usage rules: a matrix datafield can be used directly, while a vector datafield must first be converted to a matrix with a vector operator. So for vector data, find a suitable vector operator via MCP before plugging the field into the tests below.

---

## 📊 **The 6 Exploration Methods**

### **1. Basic Coverage Analysis**
**Expression**: `datafield`. For vector data, use `vector_operator(datafield)`, where `vector_operator` is the operator you found via MCP.
**Insight**: % coverage ≈ (Long Count + Short Count in the IS Summary) / (Universe Size in the settings)

**Purpose**: Understand the basic availability of data across the universe
**What it tells you**: How many instruments have data for this field on average
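As a quick worked example, the coverage ratio is a one-liner; the counts here are hypothetical:

```python
# Hypothetical IS Summary readings for a TOP3000 universe simulation.
long_count, short_count = 1450, 950
universe_size = 3000

coverage = (long_count + short_count) / universe_size
print(f"approx. coverage: {coverage:.0%}")  # -> approx. coverage: 80%
```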
---

### **2. Non-Zero Value Coverage**
**Expression**: `datafield != 0 ? 1 : 0`. For vector data, use `vector_operator(datafield) != 0 ? 1 : 0`.
**Insight**: Real coverage. The Long Count indicates the average number of non-zero values on a daily basis

**Purpose**: Distinguish between missing data and actual zero values
**What it tells you**: Whether the field has meaningful data vs. just coverage gaps

---

### **3. Data Update Frequency Analysis**
**Expression**: `ts_std_dev(datafield, N) != 0 ? 1 : 0`. For vector data, use `ts_std_dev(vector_operator(datafield), N) != 0 ? 1 : 0`.
**Insight**: Frequency of unique data (daily, weekly, monthly, etc.)

**Key Points** (see the classification sketch below):
- Some datasets have data backfilled for missing values, while some do not
- This expression finds the frequency of unique datafield updates by varying N (number of days)
- A datafield with quarterly unique data would show a Long Count + Short Count close to its actual coverage at N = 66 (quarter)
- At N = 22 (month), Long Count + Short Count would be lower (approx. one third of coverage)
- At N = 5 (week), Long Count + Short Count would be even lower

**Purpose**: Understand how often the data actually changes vs. being backfilled
**What it tells you**: Data freshness and update patterns
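A small helper can turn the three counts into a rough frequency label. The 0.9 thresholds and all counts below are illustrative assumptions, not platform constants:

```python
# Classify update frequency from Long+Short counts at N = 5, 22, 66,
# relative to the field's baseline coverage count (methods 1/2).
def classify_frequency(count_week: int, count_month: int,
                       count_quarter: int, coverage_count: int) -> str:
    ratio = lambda c: c / coverage_count if coverage_count else 0.0
    if ratio(count_week) > 0.9:
        return "daily (changes in almost every week-long window)"
    if ratio(count_month) > 0.9:
        return "weekly-to-monthly"
    if ratio(count_quarter) > 0.9:
        return "quarterly"
    return "slower than quarterly, or heavily backfilled"

# Hypothetical readings for a quarterly-looking field.
print(classify_frequency(count_week=310, count_month=820,
                         count_quarter=2350, coverage_count=2400))
```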
---

### **4. Data Bounds Analysis**
**Expression**: `abs(datafield) > X`. For vector data, use `abs(vector_operator(datafield)) > X`.
**Insight**: Bounds of the datafield. Vary the values of X and watch the Long Count

**Example**: X = 1 will indicate whether the field is normalized to values between -1 and +1

**Purpose**: Understand the range and scale of the data values
**What it tells you**: Whether the data is normalized and what the typical value ranges are
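To bracket the scale, sweep X over several orders of magnitude. The grid below is just a convenient starting point, and the field name is a placeholder:

```python
# Generate a threshold sweep for the bounds probe.
field = "fnd6_hypothetical"  # placeholder datafield name
for x in (0.01, 0.1, 1, 10, 100, 1000):
    print(f"abs({field}) > {x}")
# If the Long Count drops to ~0 at X = 1, the field is likely
# normalized to [-1, 1]; if it only drops near X = 1000, it is raw-scaled.
```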
---

### **5. Central Tendency Analysis**
**Expression**: `ts_median(datafield, 1000) > X`. For vector data, use `ts_median(vector_operator(datafield), 1000) > X`.
**Insight**: Median of the datafield over roughly 5 years. Vary the values of X and watch the Long Count

**Note**: The same process can be applied to check the mean of the datafield

**Purpose**: Understand the typical values and central tendency of the data
**What it tells you**: Whether the data is skewed and what typical values look like

---

### **6. Data Distribution Analysis**
**Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y`. For vector data, apply `scale_down` to `vector_operator(datafield)` instead.
**Insight**: Distribution of the datafield

**Key Points** (see the bucket-scan sketch below):
- `scale_down` acts as a MinMaxScaler that preserves the original distribution of the data
- X and Y vary between 0 and 1, letting you check how the datafield is distributed across its range

**Purpose**: Understand how data is distributed across its range
**What it tells you**: Whether data is evenly distributed, clustered, or has specific patterns
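Scanning adjacent [X, Y) buckets gives a crude histogram of the field. The 10-bucket grid is an arbitrary choice, and the field name is a placeholder:

```python
# Generate ten bucket probes over the scale_down range [0, 1].
field = "fnd6_hypothetical"  # placeholder datafield name
edges = [i / 10 for i in range(11)]
for lo, hi in zip(edges, edges[1:]):
    print(f"{lo} < scale_down({field}) && scale_down({field}) < {hi}")
# Run each probe and note the Long Count: roughly equal counts per bucket
# suggest a uniform spread; one dominant bucket suggests clustering.
```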
---

## 🔍 **Practical Example**

**Example**: If you simulate `close <= 0`, you will see Long and Short Counts of 0. This implies that the closing price always has a positive value (as expected!)

**What this demonstrates**: Validation that your understanding of the data is correct

---

## 📋 **Implementation Workflow**

### **Step 1: Setup**
1. Set neutralization to "None"
2. Set decay to 0
3. Choose an appropriate universe and time period

### **Step 2: Run Basic Tests**
1. Start with expression 1 (`datafield`) to get baseline coverage
2. Run expression 2 (`datafield != 0 ? 1 : 0`) to understand non-zero coverage

### **Step 3: Analyze Update Frequency**
1. Test with N = 5 (weekly)
2. Test with N = 22 (monthly)
3. Test with N = 66 (quarterly)
4. Compare results to understand update patterns

### **Step 4: Explore Value Ranges**
1. Test various thresholds for bounds analysis
2. Test various thresholds for central tendency
3. Test various ranges for distribution analysis

### **Step 5: Document Insights**
1. Record Long Count and Short Count for each test
2. Calculate coverage ratios
3. Note patterns in update frequency
4. Document value ranges and distributions
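Step 5 is easiest if the readings go straight into a structured record. A minimal sketch, with all counts hypothetical:

```python
import csv

# One row per probe: (probe name, long count, short count).
readings = [
    ("coverage", 1450, 950),
    ("nonzero", 1400, 900),
    ("freq_N5", 200, 110),
    ("freq_N22", 520, 300),
    ("freq_N66", 1380, 970),
]
universe_size = 3000  # assumed TOP3000

with open("datafield_report.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["probe", "long", "short", "ratio_of_universe"])
    for name, lo, sh in readings:
        writer.writerow([name, lo, sh, round((lo + sh) / universe_size, 3)])
```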
---

## 🎯 **When to Use Each Method**

| Method | Best For | When to Use |
|--------|----------|-------------|
| **1. Basic Coverage** | Initial assessment | First exploration of any new field |
| **2. Non-Zero Coverage** | Data quality check | After basic coverage, to understand meaningful data |
| **3. Update Frequency** | Data freshness | When you need to understand how often data changes |
| **4. Data Bounds** | Value ranges | When you need to understand data scale and normalization |
| **5. Central Tendency** | Typical values | When you need to understand what "normal" looks like |
| **6. Distribution** | Data patterns | When you need to understand how data is spread |

---

## ⚠️ **Important Considerations**

### **Neutralization Setting**
- **Use "None"** for these exploration tests
- This ensures you're seeing the raw data behavior
- Other neutralization settings may mask important patterns

### **Decay Setting**
- **Use 0** for these exploration tests
- This ensures you're seeing the actual data values
- Decay can smooth out important variations

### **Universe Selection**
- Choose a universe that represents your target use case
- Consider both coverage and representativeness
- Large universes may have different patterns than smaller ones

### **Time Period**
- Use sufficient history to see patterns
- Consider seasonal or cyclical effects
- Ensure you have enough data for statistical significance

---

## 🚀 **Advanced Applications**

### **Combining Methods**
- Use multiple methods together for comprehensive understanding
- Cross-reference results to validate insights
- Look for inconsistencies that might indicate data quality issues

### **Custom Variations**
- Modify expressions to test specific hypotheses
- Combine with other operators for deeper insights
- Create custom metrics based on your findings

### **Automation**
- These tests can be automated for systematic dataset evaluation (see the sketch below)
- Create standardized evaluation reports
- Track changes in data quality over time
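As a starting point for automation, the sketch below submits one probe expression through the BRAIN REST API with `requests`. The endpoint paths, payload fields, and setting values follow publicly circulated BRAIN API examples but should be treated as assumptions to verify against the official platform documentation; the credentials are placeholders.

```python
# Sketch: submit a single probe simulation via the BRAIN REST API.
# Endpoint paths and payload schema are assumptions based on public
# BRAIN API examples; verify against the official documentation.
import requests

API = "https://api.worldquantbrain.com"

session = requests.Session()
session.auth = ("YOUR_EMAIL", "YOUR_PASSWORD")  # placeholder credentials
session.post(f"{API}/authentication").raise_for_status()

payload = {
    "type": "REGULAR",
    "settings": {
        "instrumentType": "EQUITY",
        "region": "USA",
        "universe": "TOP3000",
        "delay": 1,
        "decay": 0,                # exploration setting
        "neutralization": "NONE",  # exploration setting
        "truncation": 0.08,
        "testPeriod": "P0Y0M",     # exploration setting
        "language": "FASTEXPR",
        "visualization": False,
    },
    "regular": "datafield != 0 ? 1 : 0",  # probe expression
}
resp = session.post(f"{API}/simulations", json=payload)
resp.raise_for_status()
# Simulations run asynchronously; poll the returned Location header
# until it resolves, then read the Long/Short counts from the result.
print("poll:", resp.headers.get("Location"))
```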
---

## 📚 **Related Resources**

- **BRAIN Platform Documentation**: Understanding Data concepts
- **Data Explorer Tool**: Visual exploration of data fields
- **Simulation Results**: Detailed analysis of field behavior
- **Community Forums**: User experiences and best practices

---

*This guide provides a systematic approach to understanding new datafields on the WorldQuant BRAIN platform. Use these methods to quickly assess data quality, coverage, and characteristics before incorporating fields into your alpha strategies.*
cnhkmcp/untracked/skills/brain-dataset-exploration-general/SKILL.md

@@ -0,0 +1,39 @@
---
name: brain-dataset-exploration-general
description: >-
  Provides a comprehensive workflow for deep-diving into entire WorldQuant BRAIN datasets.
  Includes steps for dataset selection, field categorization, detailed description generation, and cross-platform research.
  Use when the user wants to "audit a dataset", "categorize fields", or "explore a new dataset".
---

# Dataset Exploration Expert Workflow

This workflow guides the deep analysis and categorization of datasets.
For the detailed job duty manual and specific MCP tool strategies, see [reference.md](reference.md).

## Phase 1: Dataset Selection & Initial Assessment
1. **Identify Dataset**: Select based on strategic importance or user needs.
2. **Initial Exploration** (see the sketch below):
   - Use `get_datasets` to find datasets.
   - Use `get_datafields` to count fields and check coverage.
   - Use `get_documentations` to find related docs.
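If you want to hit the REST API directly instead of the MCP tools, a sketch along these lines pages through a dataset's fields. The `data-fields` endpoint, its query parameters, and the `results`/`count` response keys follow publicly shared BRAIN API examples and are assumptions to verify; `fundamental6` is just an example dataset id.

```python
# Sketch: page through a dataset's datafields via the BRAIN REST API.
# Endpoint and parameters are assumptions from public API examples.
import requests

API = "https://api.worldquantbrain.com"
session = requests.Session()
session.auth = ("YOUR_EMAIL", "YOUR_PASSWORD")  # placeholder credentials
session.post(f"{API}/authentication").raise_for_status()

params = {
    "dataset.id": "fundamental6",  # example dataset id
    "instrumentType": "EQUITY",
    "region": "USA",
    "delay": 1,
    "universe": "TOP3000",
    "limit": 50,
    "offset": 0,
}
fields = []
while True:
    page = session.get(f"{API}/data-fields", params=params).json()
    fields.extend(page.get("results", []))
    params["offset"] += params["limit"]
    if params["offset"] >= page.get("count", 0):
        break

print(f"{len(fields)} fields; first few ids:",
      [f.get("id") for f in fields[:5]])
```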
## Phase 2: Field Categorization
Group data fields into logical categories:
- **Business Function**: Financials, Market Data, Estimates, etc.
- **Data Type**: Matrix, Vector.
- **Update Frequency**: Daily, Quarterly.
- **Hierarchy**: Primary -> Secondary -> Tertiary (e.g., Financials -> Income Statement -> Revenue).

## Phase 3: Enhanced Description & Analysis
1. **Describe**: Write detailed descriptions (business context, methodology, typical values).
2. **Analyze**: Use `brain-datafield-exploration` techniques on key fields to understand distributions and patterns.

## Phase 4: Integration
1. **Research**: Check forum posts for community insights.
2. **Alpha Ideas**: Brainstorm alpha concepts based on the dataset characteristics.

## Core Responsibilities
- **Deep Dive**: Focus on one dataset at a time.
- **Inventory**: Catalog all fields.
- **Documentation**: Improve descriptions.
|