cnhkmcp 2.1.7__py3-none-any.whl → 2.1.9__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (26)
  1. cnhkmcp/__init__.py +1 -1
  2. cnhkmcp/untracked/AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221/BRAIN_AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221Mac_Linux/321/207/320/231/320/230/321/206/320/254/320/274.zip +0 -0
  3. cnhkmcp/untracked/AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221//321/205/320/237/320/234/321/205/320/227/342/225/227/321/205/320/276/320/231/321/210/320/263/320/225AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221_Windows/321/207/320/231/320/230/321/206/320/254/320/274.exe +0 -0
  4. cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/config.json +1 -1
  5. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/SKILL.md +25 -0
  6. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/reference.md +59 -0
  7. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/requirements.txt +4 -0
  8. cnhkmcp/untracked/skills/brain-calculate-alpha-selfcorrQuick/scripts/skill.py +734 -0
  9. cnhkmcp/untracked/skills/brain-datafield-exploration-general/SKILL.md +45 -0
  10. cnhkmcp/untracked/skills/brain-datafield-exploration-general/reference.md +194 -0
  11. cnhkmcp/untracked/skills/brain-dataset-exploration-general/SKILL.md +39 -0
  12. cnhkmcp/untracked/skills/brain-dataset-exploration-general/reference.md +436 -0
  13. cnhkmcp/untracked/skills/brain-explain-alphas/SKILL.md +39 -0
  14. cnhkmcp/untracked/skills/brain-explain-alphas/reference.md +56 -0
  15. cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/SKILL.md +72 -0
  16. cnhkmcp/untracked/skills/brain-how-to-pass-AlphaTest/reference.md +202 -0
  17. cnhkmcp/untracked/skills/brain-improve-alpha-performance/SKILL.md +44 -0
  18. cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +101 -0
  19. cnhkmcp/untracked/skills/brain-nextMove-analysis/SKILL.md +37 -0
  20. cnhkmcp/untracked/skills/brain-nextMove-analysis/reference.md +128 -0
  21. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/METADATA +1 -1
  22. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/RECORD +26 -10
  23. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/WHEEL +0 -0
  24. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/entry_points.txt +0 -0
  25. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/licenses/LICENSE +0 -0
  26. {cnhkmcp-2.1.7.dist-info → cnhkmcp-2.1.9.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,45 @@
+ ---
+ name: brain-datafield-exploration-general
+ description: >-
+   Provides 6 proven methods to evaluate new datasets on the WorldQuant BRAIN platform.
+   Includes methods for checking coverage, non-zero values, update frequency, bounds, central tendency, and distribution.
+   Use when the user wants to understand a specific datafield (e.g., "what is this field?", "how often does it update?").
+ ---
+ 
+ # 6 Ways to Evaluate a New Dataset
+ 
+ This skill provides 6 methods to quickly evaluate a new datafield on the WorldQuant BRAIN platform.
+ For the complete guide and detailed examples, see [reference.md](reference.md).
+ 
+ **Important**: Run these simulations with **Neutralization: None**, **Decay: 0**, **Test Period: P0Y0M**.
+ **Metrics**: Check **Long Count** and **Short Count** in the IS Summary.
+ 
+ ## 1. Basic Coverage Analysis
+ * **Expression**: `datafield` (or `vec_op(datafield)` for vectors)
+ * **Insight**: % coverage ≈ (Long Count + Short Count) / universe size.
+ 
+ ## 2. Non-Zero Value Coverage
+ * **Expression**: `datafield != 0 ? 1 : 0`
+ * **Insight**: Real coverage (excluding zeros). Distinguishes missing data (NaN) from actual zero values.
+ 
+ ## 3. Data Update Frequency Analysis
+ * **Expression**: `ts_std_dev(datafield, N) != 0 ? 1 : 0`
+ * **Insight**: Frequency of updates. Vary `N` and compare Long Count + Short Count to the coverage from method 1:
+   * `N=5` (week): a count near coverage implies weekly (or more frequent) updates.
+   * `N=22` (month): a count near coverage implies monthly updates.
+   * `N=66` (quarter): a count near coverage implies quarterly updates; infrequently updated fields show low counts at smaller `N`.
+ 
+ ## 4. Data Bounds Analysis
+ * **Expression**: `abs(datafield) > X`
+ * **Insight**: Value range. Vary `X` (e.g., 1, 10, 100) to probe the scale (e.g., is the field normalized to values between -1 and 1?).
+ 
+ ## 5. Central Tendency Analysis
+ * **Expression**: `ts_median(datafield, 1000) > X`
+ * **Insight**: Typical values over time (1000-day median, roughly four years). Vary `X` to find the center.
+ 
+ ## 6. Data Distribution Analysis
+ * **Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y`
+ * **Insight**: Distribution shape. `scale_down` min-max scales values to [0, 1]. Vary `X` and `Y` (e.g., 0.1 to 0.2) to check how much data falls in each bucket.
+ 
+ ## Note on Vector Data
+ If the datafield is a **VECTOR** type, wrap it in a vector operator first (e.g., `vec_sum(datafield)` or `vec_mean(datafield)`).
@@ -0,0 +1,194 @@
+ # BRAIN TIPS: 6 Ways to Quickly Evaluate a New Dataset
+ ## WorldQuant BRAIN Platform - Datafield Exploration Guide
+ 
+ **Original Post**: [BRAIN TIPS] 6 ways to quickly evaluate a new dataset
+ **Author**: KA64574
+ **Date**: 2 years ago
+ **Followers**: 265 people
+ 
+ ---
+ 
+ ## 🎯 **Overview**
+ 
+ WorldQuant BRAIN has thousands of datafields for you to create alphas. But how do you quickly understand a new datafield? Here are 6 proven methods to evaluate and understand new datasets efficiently.
+ 
+ **Important**: Simulate the expressions below with **neutralization "None"**, **decay 0**, and **test period P0Y0M**. Read the results from the **Long Count** and **Short Count** in the **IS Summary** section.
+ **Watch Out**: Data type (matrix/vector). These are platform-specific definitions, not the usual mathematical ones, and each type has its own characteristics and usage rules. A matrix datafield can be used directly, but a vector datafield must first be converted to a matrix by wrapping it in a vector operator. For vector datafields, find a suitable vector operator via MCP and substitute it into the tests below.
+ 
+ ---
+ 
+ ## 📊 **The 6 Exploration Methods**
+ 
+ ### **1. Basic Coverage Analysis**
+ **Expression**: `datafield`. For vector datafields, use `vector_operator(datafield)`, where `vector_operator` is the operator you found via MCP.
+ **Insight**: % coverage ≈ (Long Count + Short Count in the IS Summary) / (universe size in the settings)
+ 
+ **Purpose**: Understand the basic availability of data across the universe
+ **What it tells you**: How many instruments have data for this field on average
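+ 
+ As a minimal illustration of the coverage arithmetic above (a Python sketch; the counts and universe size are hypothetical placeholders, not real results):
+ 
+ ```python
+ # Sketch: approximate % coverage from IS Summary counts (illustrative numbers only).
+ long_count = 1800      # hypothetical average Long Count from the IS Summary
+ short_count = 600      # hypothetical average Short Count
+ universe_size = 3000   # hypothetical universe size from the simulation settings
+ 
+ coverage = (long_count + short_count) / universe_size
+ print(f"Approximate coverage: {coverage:.0%}")  # -> Approximate coverage: 80%
+ ```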
+ 
+ ---
+ 
+ ### **2. Non-Zero Value Coverage**
+ **Expression**: `datafield != 0 ? 1 : 0`. For vector datafields, use `vector_operator(datafield) != 0 ? 1 : 0`, where `vector_operator` is the operator you found via MCP.
+ **Insight**: Coverage excluding zeros. The Long Count indicates the average number of instruments with non-zero values on a given day
+ 
+ **Purpose**: Distinguish between missing data and actual zero values
+ **What it tells you**: Whether the field has meaningful data vs. just coverage gaps
+ 
+ ---
+ 
+ ### **3. Data Update Frequency Analysis**
+ **Expression**: `ts_std_dev(datafield, N) != 0 ? 1 : 0`. For vector datafields, use `ts_std_dev(vector_operator(datafield), N) != 0 ? 1 : 0`.
+ **Insight**: Frequency of unique data (daily, weekly, monthly, etc.)
+ 
+ **Key Points**:
+ - Some datasets have missing values backfilled, while others do not
+ - This expression can be used to find the frequency of unique datafield updates by varying N (the number of days)
+ - Datafields with quarterly unique data would see a Long Count + Short Count close to their actual coverage when N = 66 (quarter)
+ - When N = 22 (month), Long Count + Short Count would be lower (approx. 1/3rd of coverage)
+ - When N = 5 (week), Long Count + Short Count would be even lower
+ 
+ **Purpose**: Understand how often the data actually changes vs. being backfilled
+ **What it tells you**: Data freshness and update patterns
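+ 
+ A small Python sketch of the comparison logic described in the Key Points above, assuming you have already recorded Long Count + Short Count for each window length (the 0.9 threshold and all numbers are illustrative, not platform-defined):
+ 
+ ```python
+ # Sketch: infer a datafield's update frequency from ts_std_dev(...) != 0 counts.
+ # counts_by_window maps N (days) to Long Count + Short Count for that test;
+ # baseline is the count from the basic coverage test.
+ def infer_update_frequency(counts_by_window, baseline):
+     ratios = {n: counts_by_window[n] / baseline for n in (5, 22, 66)}
+     if ratios[5] > 0.9:
+         return "weekly or more frequent updates"   # field changes within almost every week
+     if ratios[22] > 0.9:
+         return "monthly updates"                   # changes appear over a month, not a week
+     if ratios[66] > 0.9:
+         return "quarterly updates"                 # changes only appear over a quarter
+     return "less frequent than quarterly"
+ 
+ print(infer_update_frequency({5: 400, 22: 900, 66: 2400}, baseline=2500))
+ # -> quarterly updates
+ ```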
+ 
+ ---
+ 
+ ### **4. Data Bounds Analysis**
+ **Expression**: `abs(datafield) > X`. For vector datafields, use `abs(vector_operator(datafield)) > X`.
+ **Insight**: Bounds of the datafield. Vary the value of X and watch the Long Count
+ 
+ **Example**: X=1 indicates whether the field is normalized to values between -1 and +1
+ 
+ **Purpose**: Understand the range and scale of the data values
+ **What it tells you**: Whether data is normalized, what the typical value ranges are
+ 
+ ---
+ 
+ ### **5. Central Tendency Analysis**
+ **Expression**: `ts_median(datafield, 1000) > X`. For vector datafields, use `ts_median(vector_operator(datafield), 1000) > X`.
+ **Insight**: Median of the datafield over the last 1000 days (roughly four years). Vary the value of X and watch the Long Count
+ 
+ **Note**: A similar process can be applied to check the mean of the datafield
+ 
+ **Purpose**: Understand the typical values and central tendency of the data
+ **What it tells you**: Whether the data is skewed, what typical values look like
+ 
+ ---
+ 
+ ### **6. Data Distribution Analysis**
+ **Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y`. For vector datafields, substitute `vector_operator(datafield)` for `datafield`.
+ **Insight**: Distribution of the datafield
+ 
+ **Key Points**:
+ - `scale_down` acts as a MinMaxScaler that preserves the original distribution of the data
+ - X and Y are values between 0 and 1; varying them lets you check how the datafield is distributed across its range
+ 
+ **Purpose**: Understand how data is distributed across its range
+ **What it tells you**: Whether data is evenly distributed, clustered, or has specific patterns
+ 
+ ---
+ 
+ ## 🔍 **Practical Example**
+ 
+ **Example**: If you simulate `close <= 0`, you will see Long and Short Counts of 0. This implies that the closing price always has a positive value (as expected!)
+ 
+ **What this demonstrates**: How to validate that your understanding of the data is correct
+ 
+ ---
+ 
+ ## 📋 **Implementation Workflow**
+ 
+ ### **Step 1: Setup**
+ 1. Set neutralization to "None"
+ 2. Set decay to 0
+ 3. Set the test period to P0Y0M
+ 4. Choose an appropriate universe and time period
+ 
+ ### **Step 2: Run Basic Tests**
+ 1. Start with expression 1 (`datafield`) to get baseline coverage
+ 2. Run expression 2 (`datafield != 0 ? 1 : 0`) to understand non-zero coverage
+ 
+ ### **Step 3: Analyze Update Frequency**
+ 1. Test with N = 5 (weekly)
+ 2. Test with N = 22 (monthly)
+ 3. Test with N = 66 (quarterly)
+ 4. Compare results to understand update patterns
+ 
+ ### **Step 4: Explore Value Ranges**
+ 1. Test various thresholds for bounds analysis
+ 2. Test various thresholds for central tendency
+ 3. Test various ranges for distribution analysis
+ 
+ ### **Step 5: Document Insights**
+ 1. Record Long Count and Short Count for each test
+ 2. Calculate coverage ratios
+ 3. Note patterns in update frequency
+ 4. Document value ranges and distributions
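+ 
+ The steps above can be scripted. Below is a minimal Python sketch that assembles the six exploration expressions for a given datafield; the helper name, parameter choices, and settings dictionary are illustrative assumptions, not part of the platform API:
+ 
+ ```python
+ # Sketch: build the six exploration expressions for one datafield.
+ # Recommended settings from this guide: neutralization "None", decay 0, test period P0Y0M.
+ SETTINGS = {"neutralization": "None", "decay": 0, "test_period": "P0Y0M"}
+ 
+ def exploration_expressions(field, vector_op=None):
+     # For VECTOR datafields, wrap the field in the vector operator found via MCP.
+     f = f"{vector_op}({field})" if vector_op else field
+     return {
+         "coverage":         f,
+         "non_zero":         f"{f} != 0 ? 1 : 0",
+         "update_freq_N66":  f"ts_std_dev({f}, 66) != 0 ? 1 : 0",
+         "bounds_X1":        f"abs({f}) > 1",
+         "central_tendency": f"ts_median({f}, 1000) > 0",
+         "distribution":     f"0.1 < scale_down({f}) && scale_down({f}) < 0.2",
+     }
+ 
+ for name, expr in exploration_expressions("close").items():
+     print(f"{name}: {expr}")
+ ```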
+ 
+ ---
+ 
+ ## 🎯 **When to Use Each Method**
+ 
+ | Method | Best For | When to Use |
+ |--------|----------|-------------|
+ | **1. Basic Coverage** | Initial assessment | First exploration of any new field |
+ | **2. Non-Zero Coverage** | Data quality check | After basic coverage to understand meaningful data |
+ | **3. Update Frequency** | Data freshness | When you need to understand how often data changes |
+ | **4. Data Bounds** | Value ranges | When you need to understand data scale and normalization |
+ | **5. Central Tendency** | Typical values | When you need to understand what "normal" looks like |
+ | **6. Distribution** | Data patterns | When you need to understand how data is spread |
+ 
+ ---
+ 
+ ## ⚠️ **Important Considerations**
+ 
+ ### **Neutralization Setting**
+ - **Use "None"** for these exploration tests
+ - This ensures you're seeing the raw data behavior
+ - Other neutralization settings may mask important patterns
+ 
+ ### **Decay Setting**
+ - **Use 0** for these exploration tests
+ - This ensures you're seeing the actual data values
+ - Decay can smooth out important variations
+ 
+ ### **Universe Selection**
+ - Choose a universe that represents your target use case
+ - Consider both coverage and representativeness
+ - Large universes may have different patterns than smaller ones
+ 
+ ### **Time Period**
+ - Use sufficient history to see patterns
+ - Consider seasonal or cyclical effects
+ - Ensure you have enough data for statistical significance
+ 
+ ---
+ 
+ ## 🚀 **Advanced Applications**
+ 
+ ### **Combining Methods**
+ - Use multiple methods together for comprehensive understanding
+ - Cross-reference results to validate insights
+ - Look for inconsistencies that might indicate data quality issues
+ 
+ ### **Custom Variations**
+ - Modify expressions to test specific hypotheses
+ - Combine with other operators for deeper insights
+ - Create custom metrics based on your findings
+ 
+ ### **Automation**
+ - These tests can be automated for systematic dataset evaluation
+ - Create standardized evaluation reports (see the sketch below)
+ - Track changes in data quality over time
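+ 
+ One way such a standardized evaluation report could be produced, as a rough Python sketch (the result dictionary, its keys, and all numbers are hypothetical, not a platform schema):
+ 
+ ```python
+ # Sketch: turn recorded test results into a simple Markdown report, one row per test.
+ def report_rows(results, universe_size):
+     # results maps test name -> (long_count, short_count); numbers are hypothetical.
+     lines = ["| Test | Long Count | Short Count | (L+S)/Universe |",
+              "|------|------------|-------------|----------------|"]
+     for name, (long_c, short_c) in results.items():
+         ratio = (long_c + short_c) / universe_size
+         lines.append(f"| {name} | {long_c} | {short_c} | {ratio:.1%} |")
+     return "\n".join(lines)
+ 
+ print(report_rows({"coverage": (1800, 600), "non_zero": (1500, 500)}, universe_size=3000))
+ ```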
+ 
+ ---
+ 
+ ## 📚 **Related Resources**
+ 
+ - **BRAIN Platform Documentation**: Understanding Data concepts
+ - **Data Explorer Tool**: Visual exploration of data fields
+ - **Simulation Results**: Detailed analysis of field behavior
+ - **Community Forums**: User experiences and best practices
+ 
+ ---
+ 
+ *This guide provides a systematic approach to understanding new datafields on the WorldQuant BRAIN platform. Use these methods to quickly assess data quality, coverage, and characteristics before incorporating fields into your alpha strategies.*
@@ -0,0 +1,39 @@
+ ---
+ name: brain-dataset-exploration-general
+ description: >-
+   Provides a comprehensive workflow for deep-diving into entire WorldQuant BRAIN datasets.
+   Includes steps for dataset selection, field categorization, detailed description generation, and cross-platform research.
+   Use when the user wants to "audit a dataset", "categorize fields", or "explore a new dataset".
+ ---
+ 
+ # Dataset Exploration Expert Workflow
+ 
+ This workflow guides the deep analysis and categorization of datasets.
+ For the detailed job duty manual and specific MCP tool strategies, see [reference.md](reference.md).
+ 
+ ## Phase 1: Dataset Selection & Initial Assessment
+ 1. **Identify Dataset**: Select based on strategic importance or user needs.
+ 2. **Initial Exploration**:
+    - Use `get_datasets` to find datasets.
+    - Use `get_datafields` to count fields and check coverage.
+    - Use `get_documentations` to find related docs.
+ 
+ ## Phase 2: Field Categorization
+ Group data fields into logical categories (a sketch of one possible record structure follows this list):
+ - **Business Function**: Financials, Market Data, Estimates, etc.
+ - **Data Type**: Matrix, Vector.
+ - **Update Frequency**: Daily, Quarterly.
+ - **Hierarchy**: Primary -> Secondary -> Tertiary (e.g., Financials -> Income Statement -> Revenue).
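+ 
+ A minimal Python sketch of one possible categorization record, purely illustrative (the field name, categories, and class are assumptions, not MCP output):
+ 
+ ```python
+ # Sketch: a simple record grouping a datafield along the four category axes above.
+ from dataclasses import dataclass
+ 
+ @dataclass
+ class FieldCategory:
+     field_id: str            # e.g., a hypothetical "fnd_revenue" datafield
+     business_function: str   # Financials, Market Data, Estimates, ...
+     data_type: str           # "MATRIX" or "VECTOR"
+     update_frequency: str    # "Daily", "Quarterly", ...
+     hierarchy: tuple         # (primary, secondary, tertiary)
+ 
+ example = FieldCategory(
+     field_id="fnd_revenue",
+     business_function="Financials",
+     data_type="MATRIX",
+     update_frequency="Quarterly",
+     hierarchy=("Financials", "Income Statement", "Revenue"),
+ )
+ print(example)
+ ```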
+ 
+ ## Phase 3: Enhanced Description & Analysis
+ 1. **Describe**: Write detailed descriptions (Business context, Methodology, Typical values).
+ 2. **Analyze**: Use `brain-datafield-exploration` techniques on key fields to understand distributions and patterns.
+ 
+ ## Phase 4: Integration
+ 1. **Research**: Check forum posts for community insights.
+ 2. **Alpha Ideas**: Brainstorm alpha concepts based on the dataset characteristics.
+ 
+ ## Core Responsibilities
+ - **Deep Dive**: Focus on one dataset at a time.
+ - **Inventory**: Catalog all fields.
+ - **Documentation**: Improve descriptions.