PyPI - cnhkmcp - Versions diffs - 2.1.3__py3-none-any.whl → 2.1.5__py3-none-any.whl - Mend

cnhkmcp: 2.1.3 (py3-none-any.whl) → 2.1.5 (py3-none-any.whl)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (194)

cnhkmcp/untracked/sample_mcp_config.json ADDED

@@ -0,0 +1,11 @@
+{
+    "mcpServers": {
+      "worldquant-brain-platform": {
+        "command": "C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python313\\python.exe",
+        "args": [
+          "C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\cnhkmcp\\untracked\\platform_functions.py"
+        ],
+        "description": "WorldQuant BRAIN Platform MCP Server - Comprehensive trading platform integration with simulation management, alpha operations, and authentication. Credentials are stored in user_config.json in the same directory. Provides tools for creating simulations, checking status, managing alphas, and accessing platform features. We also have a forum MCP here, WorldQuant BRAIN Forum MCP Server - Forum interaction and knowledge extraction tools. Provides glossary access, forum post reading, and community features. Credentials are stored in user_config.json in the same directory. Supports headless browser automation for forum scraping and content extraction."
+      }
+    }
+  }
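The sample config above hard-codes an `Administrator` Python path that will not match most machines. The following is a minimal sketch of building the same structure for the local interpreter. It assumes the package lives in the running interpreter's site-packages under `cnhkmcp/untracked/platform_functions.py`, as the path above suggests; the function name `build_mcp_config` is hypothetical and the long `description` field is omitted for brevity.

```python
import json
import sys
import sysconfig
from pathlib import Path


def build_mcp_config(python_exe=None, site_packages=None):
    """Return a dict shaped like sample_mcp_config.json (hypothetical helper)."""
    # Default to the interpreter running this script and its site-packages.
    python_exe = python_exe or sys.executable
    site_packages = Path(site_packages or sysconfig.get_paths()["purelib"])
    # Assumption: the server script sits at the location shown in the sample.
    script = site_packages / "cnhkmcp" / "untracked" / "platform_functions.py"
    return {
        "mcpServers": {
            "worldquant-brain-platform": {
                "command": str(python_exe),
                "args": [str(script)],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```

Running the script prints JSON that can be pasted into an MCP client config; verify that the printed script path actually exists before using it.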

cnhkmcp/untracked/user_config.json ADDED

@@ -0,0 +1,31 @@
+{
+  "credentials": {
+    "email": "youremail@mai.com",
+      "password": "yourpassword"
+    },
+  "api_settings": {
+    "base_url": "https://api.worldquantbrain.com",
+    "timeout": 30,
+    "retry_attempts": 3
+  },
+  "forum_settings": {
+    "base_url": "https://support.worldquantbrain.com",
+    "headless": true,
+    "timeout": 30
+  },
+  "simulation_defaults": {
+    "type": "REGULAR",
+    "instrument_type": "EQUITY",
+    "region": "USA",
+    "universe": "TOP3000",
+    "delay": 1,
+    "decay": 0,
+    "neutralization": "NONE",
+    "truncation": 0,
+    "test_period": "P0Y0M",
+    "unit_handling": "NONE",
+    "nan_handling": "NONE",
+    "language": "FASTEXPR",
+    "visualization": true
+  }
+}
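Because the template ships with placeholder credentials, a quick check before starting the server can catch an unedited file. This is a sketch only: the section names and placeholder strings are taken from the file above, while `validate_user_config` and `load_user_config` are hypothetical helpers, not part of the package.

```python
import json

# Placeholder values copied from the template above.
PLACEHOLDERS = {"youremail@mai.com", "yourpassword"}
REQUIRED_SECTIONS = (
    "credentials",
    "api_settings",
    "forum_settings",
    "simulation_defaults",
)


def validate_user_config(config):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")
    creds = config.get("credentials", {})
    for key in ("email", "password"):
        value = creds.get(key)
        if not value:
            problems.append(f"credentials.{key} is empty")
        elif value in PLACEHOLDERS:
            problems.append(f"credentials.{key} still holds the template placeholder")
    return problems


def load_user_config(path="user_config.json"):
    """Read the JSON config from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A typical use would be `problems = validate_user_config(load_user_config())`, followed by printing the problems and exiting if the list is non-empty.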

cnhkmcp/untracked//321/207/320/264/342/225/221/321/204/342/225/233/320/233/321/205/320/237/320/222/321/210/320/220/320/223/321/206/320/246/320/227/321/206/320/261/320/263_BRAIN_Alpha_Test_Requirements_and_Tips.md ADDED

@@ -0,0 +1,202 @@
+# BRAIN Alpha Submission Tests: Requirements and Improvement Tips
+This document compiles the key requirements for passing alpha submission tests on the WorldQuant BRAIN platform, based on official documentation and community experiences from the forum. I've focused on the main tests (Fitness, Sharpe, Turnover, Weight, Sub-universe, and Self-Correlation). For each, I'll outline the thresholds, explanations, and strategies to improve or pass them, drawing from doc pages like "Clear these tests before submitting an Alpha" and forum searches on specific topics.
+## Overview
+## What is an Alpha?
+An alpha is a mathematical model or signal designed to predict the future movements of financial instruments (e.g., stocks). On BRAIN, alphas are expressed using the platform's FASTEXPR language and simulated against historical data to evaluate performance. Successful alphas can earn payments and contribute to production strategies.
+## What Are Alpha Tests?
+Alphas must pass a series of pre-submission checks (e.g., via the `get_submission_check` tool) to ensure they meet quality thresholds. Key tests include:
+- **Fitness and Sharpe Ratio**: Measures risk-adjusted returns. Must be above cutoffs (e.g., IS Sharpe > 1.25 for some universes).
+- **Correlation Checks**: Against self-alphas and production alphas (threshold ~0.7) to avoid redundancy.
+- **Turnover and Drawdown**: Ensures stability (e.g., turnover between 1% and 70%; see the Turnover section below).
+- **Regional/Universe-Specific**: Vary by settings like USA TOP3000 (D1) or GLB TOP3000.
+- **Other Metrics**: PnL, yearly stats, and risk-neutralized metrics (e.g., RAM, Crowding Risk-Neutralized).
+Failing tests result in errors like "Sub-universe Sharpe NaN is not above cutoff" or low fitness.
+## General Guidance on Passing Tests
+- **Start Simple**: Use basic operators like `ts_rank`, `ts_corr`, or `neutralize` on price-volume data.
+- **Optimize Settings**: Choose universes like TOP3000 (USA, D1) for easier testing. Neutralize against MARKET or SUBINDUSTRY to reduce correlation.
+- **Improve Metrics**: Apply `ts_decay_linear` for stability, `scale` for normalization, and check with `check_correlation`.
+- **Common Pitfalls**: Avoid high correlation (use `check_correlation`), ensure non-NaN data (e.g., via `ts_backfill`), and target high IR/Fitness.
+- **Resources**: Review the operator list (102 operators available, e.g., `ts_zscore`), the documentation (e.g., the "Interpret Results" section), and forum posts.
+Alphas must pass these in-sample (IS) performance tests to be submitted for out-of-sample (OS) testing. Only submitted alphas contribute to scoring and payments. Tests are run in sequence, and failure messages guide improvements (e.g., "Improve fitness" or "Reduce max correlation").
+## Generating and Improving Alpha Ideas: The Conceptual Foundation
+Before diving into metrics and optimizations, strong alphas start with solid ideas rooted in financial theory, market behaviors, or data insights. Improving from an "idea angle" means iterating on the core concept rather than just tweaking parameters—this often leads to more robust alphas that pass tests naturally. Use resources like BRAIN's "Alpha Examples for Beginners" (from Discover BRAIN category) or forum-shared ideas.
+### Key Principles
+- **Idea Sources**: Draw from academic papers, economic indicators, or datasets (e.g., sentiment, earnings surprises). Validate ideas with backtests to ensure they generalize.
+- **Iteration**: Start simple, then refine: Add neutralization for correlation, decay for stability, or grouping for diversification.
+- **Avoid Overfitting**: Test ideas across universes/regions; use train/test splits.
+- **Tools**: Explore datasets via Data Explorer; use operators like `ts_rank` for signals.
+### Using arXiv for Idea Discovery
+A powerful way to source fresh ideas is through academic papers on arXiv. Use the provided `arxiv_api.py` script (detailed in `arXiv_API_Tool_Manual.md`) to search and download relevant research.
+- **Search Example**: Run `python arxiv_api.py "quantitative finance momentum strategies"` to find papers on momentum ideas. Download top results for detailed study.
+- **Integration Tip**: Extract concepts like "earnings surprises" from abstracts, then implement in BRAIN (e.g., using sentiment datasets). This helps generate diverse alphas that pass correlation tests.
+- **Why It Helps**: Papers often provide theoretical backing, reducing overfitting risks when adapting to BRAIN simulations.
+Refer to the manual for interactive mode and advanced queries to streamline your research workflow.
+### Avoid Mixing Datasets: The ATOM Principle
+When improving an alpha, prioritize modifications that stay within the same dataset as the original. ATOM (Atomic) alphas are those built from a single dataset (excluding permitted grouping fields like country, sector, etc.), which qualify for relaxed submission criteria—focusing on last 2Y Sharpe instead of full IS Ladder tests.
+**Why It's Important**:
+- **Robustness**: Mixing datasets can introduce conflicting signals, leading to overfitting and poor out-of-sample performance (forum insights on ATOM alphas).
+- **Submission Benefits**: Single-dataset alphas have easier thresholds (e.g., Delay-1: >1 for last 2Y Sharpe in USA) and may align with themes offering multipliers (up to x1.1 for low-utilization pyramids).
+- **Correlation Control**: ATOM alphas often have lower self-correlation, helping pass tests and diversify your portfolio.
+**How to Apply**:
+- Check the alpha's data fields via simulation results or code.
+- Search for improvements in the same dataset first (use Data Explorer).
+- If mixing is needed, verify it doesn't disqualify ATOM status and retest thoroughly.
+This principle, highlighted in BRAIN docs and forums, ensures alphas remain "atomic" and competitive.
+### Understanding Datafields Before Improvements
+Before optimizing alphas, evaluate the datafields involved to catch issues such as unit mismatches or unexpected update frequencies. This prevents common test failures (e.g., NaN errors, poor sub-universe performance) and helps you choose appropriate operators. Use these 6 methods from the BRAIN exploration guide, simulating each expression with neutralization "None", decay 0, and test_period P0Y0M:
+1. **Basic Coverage**: Simulate `datafield` (or `vec_op(datafield)` for vectors). Insight: % coverage = (Long Count + Short Count) / Universe Size.
+2. **Non-Zero Coverage**: Simulate `datafield != 0 ? 1 : 0`. Insight: Actual meaningful data points.
+3. **Update Frequency**: Simulate `ts_std_dev(datafield, N) != 0 ? 1 : 0` (vary N=5,22,66). Insight: Daily/weekly/monthly/quarterly updates.
+4. **Data Bounds**: Simulate `abs(datafield) > X` (vary X). Insight: Value ranges and normalization.
+5. **Central Tendency**: Simulate `ts_median(datafield, 1000) > X` (vary X). Insight: Typical values over time.
+6. **Distribution**: Simulate `X < scale_down(datafield) && scale_down(datafield) < Y` (vary X/Y between 0-1). Insight: Data spread patterns.
+Apply insights to choose operators (e.g., ts_backfill for sparse data, scale for unit issues) and fix problems before improvements.
+### Examples from Community and Docs (From Alpha Template Sharing Post)
+These examples are sourced from the forum post on sharing unique alpha ideas and implementations, emphasizing templates that generate robust signals for passing submission tests.
+- **Multi-Smoothing Ranking Signal** (User: JB71859): For earnings data, apply double smoothing with ranking and statistical ops. Example: `ts_mean(ts_rank(earnings_field, decay1), decay2)`. First ts_rank normalizes values over time (pre-processing), then ts_mean smooths for stable signals (main signal). Helps improve fitness and reduce turnover by lowering noise; produced 3 ATOM alphas after 2000 simulations.
+- **Momentum Divergence Factor** (User: YK49234): Capture divergence between short and long-term momentum on the same field. Example: `ts_delta(ts_zscore(field, short_window), short_window) - ts_delta(ts_zscore(field, long_window), long_window)`. Processes data with z-scoring for normalization, then delta/mean for change detection (main signal). Boosts Sharpe by highlighting momentum shifts; yielded 4 submittable alphas from 20k tests with ~5% signal rate.
+- **Network Factor Difference Momentum** (User: JR23144): Compute differences in oth455 PCA factors for 'imbalance' signals, then apply time series ops. Example: `ts_sum(oth455_fact2 - oth455_fact1, 240)`. Math op creates difference (pre-processing), ts op captures persistence (main signal). Enhances correlation passing via unique network insights; effective in EUR for low-fitness but high-margin alphas.
+These community-shared templates promote diverse, ATOM-friendly ideas that align with test requirements like low correlation and high robustness.
+### Official BRAIN Examples
+Draw from BRAIN's structured tutorials for foundational ideas:
+- **Beginner Level** ([19 Alpha Examples](https://platform.worldquantbrain.com/learn/documentation/create-alphas/19-alpha-examples)): Start with simple price-based signals. Example: `ts_rank(close, 20)` – Ranks closing prices over 20 days to capture momentum. Improve by adding neutralization: `neutralize(ts_rank(close, 20), "MARKET")` to reduce market bias and pass correlation tests.
+- **Bronze Level** ([Sample Alpha Concepts](https://platform.worldquantbrain.com/learn/documentation/create-alphas/sample-alpha-concepts)): Incorporate multiple data fields. Example: `ts_corr(close, volume, 10)` – Correlation between price and volume over 10 days. Enhance fitness by decaying: `ts_decay_linear(ts_corr(close, volume, 10), 5)` for smoother signals.
+- **Silver Level** ([Example Expression Alphas](https://platform.worldquantbrain.com/learn/documentation/create-alphas/example-expression-alphas)): Advanced combinations. Example: `scale(ts_rank(ts_delay(vwap, 1) / vwap, 252))` – Normalized 1-year price change. Iterate by adding groups: `group_zscore(scale(ts_rank(ts_delay(vwap, 1) / vwap, 252)), "INDUSTRY")` to improve sub-universe robustness.
+These examples show how starting with a core idea (e.g., momentum) and layering improvements (e.g., neutralization, decay) can help pass tests like fitness and sub-universe.
+## 1. Fitness
+### Requirements
+- At least "Average": Greater than 1.3 for Delay-0 or Greater than 1 for Delay-1.
+- Fitness = Sharpe * sqrt(abs(Returns) / max(Turnover, 0.125)).
+- Ratings: Spectacular (>2.5 Delay-1 or >3.25 Delay-0), Excellent (>2 or >2.6), etc.
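The fitness formula above can be checked by hand before simulating. A minimal sketch, assuming Returns and Turnover are given as fractions (0.10 means 10%), which is consistent with the 0.125 floor in the formula; the function names are illustrative only.

```python
import math


def fitness(sharpe, returns, turnover):
    """Fitness = Sharpe * sqrt(abs(Returns) / max(Turnover, 0.125)).

    Assumption: `returns` and `turnover` are fractions, not percentages.
    """
    return sharpe * math.sqrt(abs(returns) / max(turnover, 0.125))


def passes_fitness(value, delay):
    """Apply the 'Average' cutoff: > 1.3 for Delay-0, > 1 for Delay-1."""
    cutoff = 1.3 if delay == 0 else 1.0
    return value > cutoff


if __name__ == "__main__":
    score = fitness(sharpe=1.5, returns=0.08, turnover=0.32)
    print(round(score, 3), passes_fitness(score, delay=1))
```

Note how the floor works: any turnover below 12.5% is treated as 12.5%, so cutting turnover further no longer raises fitness.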
+### Explanation
+Fitness balances Sharpe, Returns, and Turnover. High fitness indicates a robust alpha. It's a key metric for alpha quality.
+### Tips to Improve
+- **From Docs**: Increase Sharpe/Returns and reduce Turnover. Optimize by balancing these—improving one may hurt another. Aim for upward PnL trends with minimal drawdown.
+- **Forum Experiences** (from searches on "increase fitness alpha"):
+  - Use group operators (e.g., with pv13) to boost fitness without overcomplicating expressions.
+  - Screen alphas with author_fitness >=2 or similar in competitions like Super Alpha.
+  - Manage alphas via databases or tags; query for high-fitness ones (e.g., via API with fitness filters).
+  - In hand-crafting alphas, iteratively add operators like left_tail and group to push fitness over thresholds, but watch for overfitting.
+  - Community shares: High-fitness alphas (e.g., >2) often come from multi-factor fusions or careful data field selection.
+## 2. Sharpe Ratio
+### Requirements
+- Greater than 2 for Delay-0 or Greater than 1.25 for Delay-1.
+- Sharpe = sqrt(252) * IR, where IR = mean(PnL) / stdev(PnL).
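For local pre-screening, the Sharpe definition above translates directly into a few lines. This sketch assumes `daily_pnl` is a list of daily PnL values and uses the sample standard deviation; the source does not say whether the platform uses the sample or population form, so small differences from platform numbers are possible.

```python
import math
import statistics


def sharpe_ratio(daily_pnl):
    """Annualized Sharpe = sqrt(252) * mean(PnL) / stdev(PnL)."""
    spread = statistics.stdev(daily_pnl)  # sample standard deviation (assumption)
    if spread == 0:
        raise ValueError("PnL has zero variance; Sharpe is undefined")
    return math.sqrt(252) * statistics.mean(daily_pnl) / spread
```

Negating every PnL value negates the result, which is why prefixing an expression with a minus sign turns a consistently negative Sharpe into a positive one of the same magnitude.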
+### Explanation
+Measures risk-adjusted returns. Higher Sharpe means more consistent performance. GLB alphas must also meet sub-geography Sharpe cutoffs (>=1 for each of AMER, APAC, and EMEA).
+### Tips to Improve
+- **From Docs**: Focus on consistent PnL with low volatility. Use visualization to ensure upward trends. For sub-geography, incorporate region-specific signals (e.g., earnings for AMER, microstructure for APAC).
+- **Forum Experiences** (from searches on "improve Sharpe ratio alpha"):
+  - Decay signals separately for liquid/non-liquid stocks (e.g., ts_decay_linear with rank(volume*close)).
+  - Avoid size-related multipliers (e.g., rank(-assets)) that shift weights to illiquid stocks.
+  - Check yearly Sharpe data via API and store in databases for analysis.
+  - In templates like CCI-based, combine with z-score and delay to stabilize Sharpe.
+  - Community tip: Prune low-Sharpe alphas in pools using weighted methods to retain high-Sharpe ones.
+  - **Flipping Negative Sharpe**: For non-CHN regions, if an alpha shows negative Sharpe (e.g., -1 to -2), add a minus sign to the expression (e.g., `-original_expression`) to flip it positive. This preserves the signal while improving metrics; verify it doesn't introduce correlation issues.
+## 3. Turnover
+### Requirements
+- 1% < Turnover < 70%.
+- Turnover = Dollar trading volume / Book size.
+### Explanation
+Indicates trading frequency. Low turnover reduces costs; extremes fail submission.
+### Tips to Improve
+- **From Docs**: Aim for balanced trading—too low means inactive, too high means over-trading.
+- **Forum Experiences**: (Note: Specific turnover searches weren't direct, but tied to fitness/Sharpe improvements)
+  - Use decay functions to smooth signals, reducing unnecessary trades.
+  - In multi-alpha simulations, filter by turnover thresholds in code to pre-select candidates.
+## 4. Weight Test
+### Requirements
+- Max weight in any stock <10%.
+- Sufficient instruments assigned weight (varies by universe, e.g., TOP3000).
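A rough local version of the weight check is sketched below. It assumes weights are normalized by total absolute exposure, and `min_instruments` is a hypothetical parameter because the source only says the required count varies by universe.

```python
def max_weight_share(weights):
    """Largest absolute weight as a share of total absolute exposure."""
    gross = sum(abs(w) for w in weights.values())
    if gross == 0:
        raise ValueError("no instrument has a non-zero weight")
    return max(abs(w) for w in weights.values()) / gross


def passes_weight_test(weights, max_share=0.10, min_instruments=1):
    """True if enough instruments are held and none reaches `max_share`."""
    held = sum(1 for w in weights.values() if w != 0)
    return held >= min_instruments and max_weight_share(weights) < max_share
```

For example, twenty equally weighted stocks give a 5% maximum share and pass, while a book dominated by one name fails regardless of how many small positions surround it.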
+### Explanation
+Ensures diversification; fails if concentrated or too few stocks weighted.
+### Tips to Improve
+- **From Docs**: Avoid expressions that overly concentrate weights. Assign weights broadly after simulation start.
+- **Forum Experiences**: (Limited direct posts; inferred from general submission tips)
+  - Use neutralization (e.g., market) to distribute weights evenly.
+  - Check via simulation stats; adjust with rank or scale operators.
+## 5. Sub-universe Test
+### Requirements
+- Sub-universe Sharpe >= 0.75 * sqrt(subuniverse_size / alpha_universe_size) * alpha_sharpe.
+- Ensures robustness in more liquid sub-universes (e.g., TOP1000 for TOP3000).
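The cutoff above scales with both the universe sizes and the alpha's own Sharpe, so it helps to compute it explicitly. A small sketch with illustrative function names; treating a NaN sub-universe Sharpe as a failure is an assumption based on the "Sub-universe Sharpe NaN is not above cutoff" message quoted earlier.

```python
import math


def subuniverse_cutoff(alpha_sharpe, subuniverse_size, alpha_universe_size):
    """0.75 * sqrt(subuniverse_size / alpha_universe_size) * alpha_sharpe."""
    return 0.75 * math.sqrt(subuniverse_size / alpha_universe_size) * alpha_sharpe


def passes_subuniverse(sub_sharpe, alpha_sharpe, subuniverse_size, alpha_universe_size):
    """True if the sub-universe Sharpe meets the cutoff; NaN always fails."""
    if math.isnan(sub_sharpe):
        return False
    cutoff = subuniverse_cutoff(alpha_sharpe, subuniverse_size, alpha_universe_size)
    return sub_sharpe >= cutoff


if __name__ == "__main__":
    # TOP1000 inside TOP3000 with an alpha Sharpe of 2.0.
    print(round(subuniverse_cutoff(2.0, 1000, 3000), 3))
```

With a TOP1000 sub-universe inside TOP3000 and an alpha Sharpe of 2.0, the cutoff is roughly 0.87, so raising the overall Sharpe also raises the bar the sub-universe must clear.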
+### Explanation
+Tests if alpha performs in liquid stocks, avoiding over-reliance on illiquid ones.
+### Tips to Improve
+- **From Docs**: Avoid size-related multipliers. Decay liquid/non-liquid parts separately (e.g., ts_decay_linear(signal,5)*rank(volume*close) + ts_decay_linear(signal,10)*(1-rank(volume*close))). From this example, we can see that the signal can be inflated by different weights for different parts of a datafield.
+  - Step-by-step improvements; discard non-robust signals.
+- **Forum Experiences**: (From "how to pass submission tests")
+  - Improve overall Sharpe first, as it scales the threshold.
+  - Use pasteurize to handle NaNs and ensure even distribution.
+## 6. Self-Correlation
+### Requirements
+- <0.7 PnL correlation with own submitted alphas.
+- Or Sharpe at least 10% greater than correlated alphas.
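The two conditions above combine into one rule: a new alpha fails only if it is highly correlated with a submitted alpha and is not at least 10% better on Sharpe. The sketch below assumes daily PnL series that are already aligned by date, and it does not reproduce the platform's PnL window or NaN handling, so results may differ from the platform's figures. All names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def passes_self_correlation(new_pnl, new_sharpe, submitted, cutoff=0.7, edge=1.10):
    """`submitted` is a list of (daily_pnl, sharpe) pairs for existing alphas."""
    for old_pnl, old_sharpe in submitted:
        too_similar = pearson(new_pnl, old_pnl) >= cutoff
        not_better = new_sharpe < edge * old_sharpe
        if too_similar and not_better:
            return False
    return True
```

Pre-filtering candidates this way before submission matches the forum advice on local self-correlation checks listed under the tips for this test.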
+### Explanation
+Promotes diversity; based on 4-year PnL window. Allows improvements if new alpha is significantly better.
+### Tips to Improve
+- **From Docs**: Submit diverse ideas. Use correlation table in results to identify issues.
+- **Forum Experiences** (from searches on "reduce correlation self alphas"):
+  - Local computation of self-correlation (e.g., via PnL matrices) to pre-filter before submission.
+  - Code optimizations: Prune high-correlation alphas, use clustering or weighted pruning (e.g., Sharpe-weighted) to retain diverse sets.
+  - Handle negatives: Transform negatively correlated alphas (e.g., in China market) by inversion or adjustments.
+  - Scripts for batch checking: Use machine_lib modifications to print correlations and pyramid info.
+  - Community shares: Differences between local and platform calculations (e.g., due to NaN handling); align by using full PnL data.
+### Evaluating Whole Alpha Quality
+Before final submission, perform these checks on simulation results:
+- **Yearly Stats Quality Check**: Review yearly statistics. If records are missing for >5 years, it indicates low data quality (e.g., sparse coverage). Fix with ts_backfill, data selection, or alternative fields to ensure robust performance across tests.
+This complements per-test improvements by validating overall alpha reliability.
+## General Advice
+- Start with broad simulations, narrow based on stats.
+- Use tools like check_submission API for pre-checks.
+- Forum consensus: Automate with Python scripts for efficiency (e.g., threading for simulations, databases for alpha management).
+- Risks: Overfitting in manual tweaks; validate with train/test splits.
+This guide is based on tool-gathered data. For updates, check BRAIN docs or forum.

cnhkmcp/untracked//321/207/320/264/342/225/221/321/204/342/225/233/320/233/321/205/342/225/226/320/265/321/204/342/225/234/320/254/321/206/342/225/241/320/221_Alpha_explaination_workflow.md ADDED

@@ -0,0 +1,56 @@
+Alpha Explanation Workflow
+This manual provides a step-by-step workflow for analyzing and explaining a WorldQuant BRAIN alpha expression. By following this guide, you can efficiently gather the necessary information to understand the logic and potential strategy behind any alpha.
+Step 1: Deconstruct the Alpha Expression
+The first step is to break down the alpha expression into its fundamental components: data fields and operators.
+For example, given the expression quantile(ts_regression(oth423_find,group_mean(oth423_find,vec_max(shrt3_bar),country),90)):
+Data Fields: oth423_find, shrt3_bar
+Operators: quantile, ts_regression, group_mean, vec_max
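The split into operators and data fields can be approximated mechanically: an identifier followed by `(` is treated as an operator, and any other identifier as a data field. This is only a sketch. The `group_fields` tuple is an assumed list of grouping names to skip (the example above leaves `country` out of the data fields), and named arguments such as `rate=0` would be misread as fields.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def deconstruct(expression,
                group_fields=("country", "sector", "industry", "subindustry", "market")):
    """Return (operators, data_fields) in order of first appearance."""
    operators, fields = [], []
    for match in IDENTIFIER.finditer(expression):
        name = match.group()
        rest = expression[match.end():].lstrip()
        if rest.startswith("("):
            bucket = operators          # identifier followed by "(" is a call
        elif name.lower() in group_fields:
            continue                    # grouping names are not data fields
        else:
            bucket = fields
        if name not in bucket:
            bucket.append(name)
    return operators, fields


if __name__ == "__main__":
    expr = ("quantile(ts_regression(oth423_find,"
            "group_mean(oth423_find,vec_max(shrt3_bar),country),90))")
    print(deconstruct(expr))
```

Applied to the example expression, this yields the same four operators and two data fields listed above, which can then be fed into the lookups in the following steps.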
+Step 2: Analyze Data Fields
+Use the brain-platform-mcp tool get_datafields to get detailed information about each data field.
+Tool Usage:
+    <use_mcp_tool>
+      <server_name>brain-platform-mcp</server_name>
+      <tool_name>get_datafields</tool_name>
+      <arguments>
+      {
+        "instrument_type": "EQUITY",
+        "region": "ASI",
+        "delay": 1,
+        "universe": "MINVOL1M",
+        "data_type": "VECTOR",
+        "search": "shrt3_bar"
+      }
+      </arguments>
+    </use_mcp_tool>
+Tips for effective searching:
+Specify Parameters: Always provide as much information as you know, including instrument_type, region, delay, universe, and data_type (MATRIX or VECTOR).
+Iterate: If you don't find the data field on your first try, try different combinations of parameters. The ASI region, for example, has two universes: MINVOL1M and ILLIQUID_MINVOL1M.
+Check Data Type: Be sure to check if the data is a MATRIX (one value per stock per day) or a VECTOR (multiple values per stock per day). This is crucial for understanding how the data is used.
+Example Data Field Information:
+oth423_find: A matrix data field from the "Fundamental Income and Dividend Model" dataset in the ASI region. It represents a "Find score," likely indicating fundamental attractiveness.
+shrt3_bar: A vector data field from the "Securities Lending Files Data" dataset in the ASI region. It provides a vector of ratings (1-10) indicating the demand to borrow a stock, which is a proxy for short-selling interest.
+Step 3: Understand the Operators
+Use the brain-platform-mcp tool get_operators to get a list of all available operators and their descriptions.
+Tool Usage:
+    <use_mcp_tool>
+      <server_name>brain-platform-mcp</server_name>
+      <tool_name>get_operators</tool_name>
+      <arguments>
+      {}
+      </arguments>
+    </use_mcp_tool>
+The output of this command contains a wealth of information. For your convenience, a table of the most common operators is included in the Appendix of this manual.
+Step 4: Consult Official Documentation
+For more complex topics, the official BRAIN documentation is an invaluable resource. Use the get_documentations tool to see a list of available documents, and get_documentation_page to read a specific page.
+Example: To understand vector data fields better, I consulted the "Vector Data Fields 🥉" document (vector-datafields). This revealed that vector data contains multiple values per instrument per day and must be aggregated by a vector operator before being used with other operators.
+Step 5: Broaden Understanding with External Research (Must Call the arxiv_api.py script to get the latest research papers)
+For cutting-edge ideas and inspiration, you can search for academic papers on arXiv using the provided arxiv_api.py script.
+Workflow:
+Identify Keywords: Based on your analysis of the alpha, identify relevant keywords. For our example, these were: "short interest", "fundamental analysis", "relative value", and "news sentiment".
+Run the Script: Use the with-wrappers script to avoid SSL errors.
+python arxiv_api.py "your keywords here" -n 10
+Step 6: Synthesize and Explain
+Once you have gathered all the necessary information, structure your explanation in a clear and concise format. The following template is recommended:
+Idea: A high-level summary of the alpha's strategy.
+Rationale for data used: An explanation of why each data field was chosen and what it represents.
+Rationale for operators used: A step-by-step explanation of how the operators transform the data to generate the final signal.
+Further Inspiration: Ideas for new alphas based on your research.
+Troubleshooting
+SSL Errors: If you encounter a CERTIFICATE_VERIFY_FAILED error when running Python scripts that access the internet, ask the AI assistant to modify the script, or to write a wrapper script, that runs your command.
+Appendix A: Understanding Vector Data
+Vector Data is a distinct type of data field where the number of events recorded per day, per instrument, can vary. This is in contrast to standard matrix data, which has a single value for each instrument per day.
+For example, news sentiment data is often a vector because a stock can have multiple news articles on a single day. To use this data in most BRAIN operators, it must first be aggregated into a single value using a vector operator.

@@ -0,0 +1,194 @@
+# BRAIN TIPS: 6 Ways to Quickly Evaluate a New Dataset
+## WorldQuant BRAIN Platform - Datafield Exploration Guide
+**Original Post**: [BRAIN TIPS] 6 ways to quickly evaluate a new dataset
+**Author**: KA64574
+**Date**: 2 years ago
+**Followers**: 265 people
+---
+## 🎯 **Overview**
+WorldQuant BRAIN has thousands of datafields for you to create alphas. But how do you quickly understand a new datafield? Here are 6 proven methods to evaluate and understand new datasets efficiently.
+**Important**: Simulate the below expressions in **"None" neutralization** and **decay 0 setting** and **test_period P0Y0M**. Obtain insights of specific parameters using the **Long Count** and **Short Count** in the **IS Summary section** of the results.
+**Watch Out**: Check the data type (matrix or vector). These terms have platform-specific definitions that differ from their mathematical meanings, and each type has its own usage rules. A matrix datafield can be used directly. A vector datafield must first be converted to a matrix with a vector operator, so look up a suitable vector operator via MCP and wrap the datafield in it in each of the tests below.
+---
+## 📊 **The 6 Exploration Methods**
+### **1. Basic Coverage Analysis**
+**Expression**: `datafield`. For a vector datafield, use `vector_operator(datafield)`, where `vector_operator` is the vector operator you found via MCP.
+**Insight**: % coverage is approximately (Long Count + Short Count in the IS Summary) / (Universe Size in the settings).
+**Purpose**: Understand the basic availability of data across the universe
+**What it tells you**: How many instruments have data for this field on average
+---
+### **2. Non-Zero Value Coverage**
+**Expression**: `datafield != 0 ? 1 : 0`. For a vector datafield, use `vector_operator(datafield) != 0 ? 1 : 0`.
+**Insight**: Coverage. Long Count indicates average non-zero values on a daily basis
+**Purpose**: Distinguish between missing data and actual zero values
+**What it tells you**: Whether the field has meaningful data vs. just coverage gaps
+---
+### **3. Data Update Frequency Analysis**
+**Expression**: `ts_std_dev(datafield,N) != 0 ? 1 : 0`. For a vector datafield, use `ts_std_dev(vector_operator(datafield),N) != 0 ? 1 : 0`.
+**Insight**: Frequency of unique data (daily, weekly, monthly etc.)
+**Key Points**:
+- Some datasets have data backfilled for missing values, while some do not
+- This expression can be used to find the frequency of unique datafield updates by varying N (no. of days)
+- Datafields with quarterly unique data frequency would see a Long Count + Short Count value close to its actual coverage when N = 66 (quarter)
+- When N = 22 (month) Long Count + Short Count would be lower (approx. 1/3rd of coverage)
+- When N = 5 (week), Long Count + Short Count would be even lower
+**Purpose**: Understand how often the data actually changes vs. being backfilled
+**What it tells you**: Data freshness and update patterns
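The reading of Long Count + Short Count across windows can be turned into a rough classifier. In this sketch, `baseline_count` is the Long Count + Short Count from the basic coverage test, and `near_full=0.8` is a hypothetical threshold for "close to actual coverage"; neither the threshold nor the labels come from the original post.

```python
def coverage_ratio(long_count, short_count, universe_size):
    """Approximate % coverage = (Long Count + Short Count) / Universe Size."""
    if universe_size <= 0:
        raise ValueError("universe_size must be positive")
    return (long_count + short_count) / universe_size


def active_share(counts_by_window, baseline_count):
    """Share of baseline coverage whose value changed within each window N."""
    return {n: count / baseline_count
            for n, count in sorted(counts_by_window.items())}


def guess_update_frequency(counts_by_window, baseline_count, near_full=0.8):
    """Label the smallest window whose count is close to the baseline coverage."""
    labels = {5: "weekly or faster", 22: "monthly", 66: "quarterly"}
    for n, share in active_share(counts_by_window, baseline_count).items():
        if share >= near_full:
            return labels.get(n, f"about every {n} days")
    return "slower than the largest window tested"
```

For a quarterly field with baseline coverage of 2000, counts such as 150 (N=5), 660 (N=22), and 1900 (N=66) follow the pattern described in the key points and would be labeled "quarterly".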
+---
+### **4. Data Bounds Analysis**
+**Expression**: `abs(datafield) > X`. For a vector datafield, use `abs(vector_operator(datafield)) > X`.
+**Insight**: Bounds of the datafield. Vary the values of X and see the Long Count
+**Example**: X=1 will indicate if the field is normalized to values between -1 and +1
+**Purpose**: Understand the range and scale of the data values
+**What it tells you**: Whether data is normalized, what the typical value ranges are
+---
+### **5. Central Tendency Analysis**
+**Expression**: `ts_median(datafield, 1000) > X`. For a vector datafield, use `ts_median(vector_operator(datafield), 1000) > X`.
+**Insight**: Median of the datafield over 5 years. Vary the values of X and see the Long Count
+**Note**: Similar process can be applied to check the mean of the datafield
+**Purpose**: Understand the typical values and central tendency of the data
+**What it tells you**: Whether the data is skewed, what typical values look like
+---
+### **6. Data Distribution Analysis**
+**Expression**: `X < scale_down(datafield) && scale_down(datafield) < Y`. For a vector datafield, use `X < scale_down(vector_operator(datafield)) && scale_down(vector_operator(datafield)) < Y`.
+**Insight**: Distribution of the datafield
+**Key Points**:
+- `scale_down` acts as a MinMaxScaler that can preserve the original distribution of the data
+- X and Y are values that vary between 0 and 1 that allow us to check how the datafield distributes across its range
+**Purpose**: Understand how data is distributed across its range
+**What it tells you**: Whether data is evenly distributed, clustered, or has specific patterns
+---
+## 🔍 **Practical Example**
+**Example**: If you simulate `[close <= 0]`, you will see Long and Short Counts as 0. This implies that closing price always has a positive value (as expected!)
+**What this demonstrates**: The validation that your understanding of the data is correct
+---
+## 📋 **Implementation Workflow**
+### **Step 1: Setup**
+1. Set neutralization to "None"
+2. Set decay to 0
+3. Choose appropriate universe and time period
+### **Step 2: Run Basic Tests**
+1. Start with expression 1 (`datafield`) to get baseline coverage
+2. Run expression 2 (`datafield != 0 ? 1 : 0`) to understand non-zero coverage
+### **Step 3: Analyze Update Frequency**
+1. Test with N = 5 (weekly)
+2. Test with N = 22 (monthly)
+3. Test with N = 66 (quarterly)
+4. Compare results to understand update patterns
+### **Step 4: Explore Value Ranges**
+1. Test various thresholds for bounds analysis
+2. Test various thresholds for central tendency
+3. Test various ranges for distribution analysis
+### **Step 5: Document Insights**
+1. Record Long Count and Short Count for each test
+2. Calculate coverage ratios
+3. Note patterns in update frequency
+4. Document value ranges and distributions
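The workflow above runs the same six expressions for every field, so generating them from a template avoids copy-paste errors. The sketch below only builds expression strings and the recommended settings; it does not call any BRAIN API. The function name, dictionary keys, and default thresholds are illustrative, and the settings keys mirror those in `user_config.json`.

```python
# Recommended settings for exploration runs, per the guide.
EXPLORATION_SETTINGS = {
    "neutralization": "NONE",
    "decay": 0,
    "test_period": "P0Y0M",
}


def exploration_expressions(datafield, vector_operator=None,
                            n=22, x=1, low=0.25, high=0.75):
    """Build the six exploration expressions for one datafield.

    Pass `vector_operator` (e.g. "vec_max") for vector datafields so the
    field is aggregated before any other operator is applied.
    """
    base = f"{vector_operator}({datafield})" if vector_operator else datafield
    return {
        "basic_coverage": base,
        "non_zero_coverage": f"{base} != 0 ? 1 : 0",
        "update_frequency": f"ts_std_dev({base}, {n}) != 0 ? 1 : 0",
        "data_bounds": f"abs({base}) > {x}",
        "central_tendency": f"ts_median({base}, 1000) > {x}",
        "distribution": f"{low} < scale_down({base}) && scale_down({base}) < {high}",
    }


if __name__ == "__main__":
    for name, expr in exploration_expressions("shrt3_bar", "vec_max", n=66).items():
        print(f"{name}: {expr}")
```

Looping over N in (5, 22, 66) and over several X values gives the full set of simulations for Steps 3 and 4, and the Long and Short Counts from each run can be recorded against the dictionary keys.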
+---
+## 🎯 **When to Use Each Method**
+| Method | Best For | When to Use |
+|--------|----------|-------------|
+| **1. Basic Coverage** | Initial assessment | First exploration of any new field |
+| **2. Non-Zero Coverage** | Data quality check | After basic coverage to understand meaningful data |
+| **3. Update Frequency** | Data freshness | When you need to understand how often data changes |
+| **4. Data Bounds** | Value ranges | When you need to understand data scale and normalization |
+| **5. Central Tendency** | Typical values | When you need to understand what "normal" looks like |
+| **6. Distribution** | Data patterns | When you need to understand how data is spread |
+---
+## ⚠️ **Important Considerations**
+### **Neutralization Setting**
+- **Use "None"** for these exploration tests
+- This ensures you're seeing the raw data behavior
+- Other neutralization settings may mask important patterns
+### **Decay Setting**
+- **Use 0** for these exploration tests
+- This ensures you're seeing the actual data values
+- Decay can smooth out important variations
+### **Universe Selection**
+- Choose a universe that represents your target use case
+- Consider both coverage and representativeness
+- Large universes may have different patterns than smaller ones
+### **Time Period**
+- Use sufficient history to see patterns
+- Consider seasonal or cyclical effects
+- Ensure you have enough data for statistical significance
+---
+## 🚀 **Advanced Applications**
+### **Combining Methods**
+- Use multiple methods together for comprehensive understanding
+- Cross-reference results to validate insights
+- Look for inconsistencies that might indicate data quality issues
+### **Custom Variations**
+- Modify expressions to test specific hypotheses
+- Combine with other operators for deeper insights
+- Create custom metrics based on your findings
+### **Automation**
+- These tests can be automated for systematic dataset evaluation
+- Create standardized evaluation reports
+- Track changes in data quality over time
+---
+## 📚 **Related Resources**
+- **BRAIN Platform Documentation**: Understanding Data concepts
+- **Data Explorer Tool**: Visual exploration of data fields
+- **Simulation Results**: Detailed analysis of field behavior
+- **Community Forums**: User experiences and best practices
+---
+*This guide provides a systematic approach to understanding new datafields on the WorldQuant BRAIN platform. Use these methods to quickly assess data quality, coverage, and characteristics before incorporating fields into your alpha strategies.*

@@ -0,0 +1,101 @@
+# Repeatable Workflow for Improving BRAIN Alphas: A Step-by-Step Guide
+This document outlines a systematic, repeatable workflow for improving alphas on the WorldQuant BRAIN platform. It favors refinements to the core idea (e.g., incorporating financial concepts from research) over mechanical tweaks, following the guidelines in `BRAIN_Alpha_Test_Requirements_and_Tips.md`. The process is tool-agnostic but assumes access to the BRAIN API (via MCP), arXiv search scripts, and basic analysis tools. Each cycle takes ~30-60 minutes; repeat until the submission thresholds are met (e.g., Sharpe >1.25, Fitness >1 for Delay-1 ATOM alphas).
+## Prerequisites
+- Authenticate with BRAIN (e.g., via API tool).
+- Have the alpha ID and expression ready.
+- Access to arXiv script (e.g., `arxiv_api.py`) for idea sourcing.
+- Track progress in a log (e.g., metrics table per iteration).
+## Step 1: Gather Alpha Information (5-10 minutes)
+**Goal**: Collect baseline data to identify weaknesses (e.g., low Sharpe, high correlation, inconsistent yearly stats).
+**Steps**:
+- Authenticate if needed.
+- Fetch alpha details (expression, settings, metrics like PnL, Sharpe, Fitness, Turnover, Drawdown, and checks).
+- Retrieve PnL trends and yearly stats.
+- Run submission and correlation checks (self/production, threshold 0.7).
+**Analysis**:
+- Note failing tests (e.g., a low sub-universe result suggests reliance on illiquid stocks).
+- For ATOM alphas (single-dataset), confirm relaxed thresholds.
+**Output**: Summary of metrics and issues (e.g., "Sharpe 1.11, fails sub-universe").
+**Tips for Repeatability**: Automate with a script template for batch alphas.
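The baseline summary in Step 1 could be produced by a small checker like the one below. It is a sketch only: the thresholds are the ones quoted in this guide (Sharpe >1.25, Fitness >1, correlation 0.7), while the metric key names, the direction of the correlation comparison, and the sample values are assumptions and do not reflect the BRAIN API response format.

```python
# Thresholds quoted in this guide; adjust them for non-ATOM alphas.
SHARPE_MIN = 1.25
FITNESS_MIN = 1.0
CORRELATION_MAX = 0.7


def find_issues(metrics: dict) -> list:
    """Return human-readable weaknesses found in a metrics dict (hypothetical keys)."""
    issues = []
    if metrics["sharpe"] <= SHARPE_MIN:
        issues.append(f"Sharpe {metrics['sharpe']} is not above {SHARPE_MIN}")
    if metrics["fitness"] <= FITNESS_MIN:
        issues.append(f"Fitness {metrics['fitness']} is not above {FITNESS_MIN}")
    # Assumption: a correlation strictly above the threshold counts as a failure.
    if metrics["max_correlation"] > CORRELATION_MAX:
        issues.append(f"Correlation {metrics['max_correlation']} exceeds {CORRELATION_MAX}")
    # Flag inconsistent yearly stats: any year with a negative Sharpe.
    negative_years = sorted(
        year for year, sharpe in metrics.get("yearly_sharpe", {}).items() if sharpe < 0
    )
    if negative_years:
        issues.append(f"Negative Sharpe in years: {negative_years}")
    return issues


# Hypothetical baseline for one alpha.
baseline = {
    "sharpe": 1.11,
    "fitness": 1.05,
    "max_correlation": 0.55,
    "yearly_sharpe": {2019: 1.4, 2020: -0.2, 2021: 1.0},
}
issues = find_issues(baseline)
for issue in issues:
    print(issue)
```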
+## Step 2: Evaluate the Core Datafield(s) (5-10 minutes)
+**Goal**: Understand data properties (sparsity, frequency) to guide refinements.
+**Steps**:
+- Confirm field details (type, coverage).
+- Simulate 6 evaluation expressions in neutral settings (neutralization="NONE", decay=0, short test period):
+  1. Basic Coverage: `datafield`.
+  2. Non-Zero Coverage: `datafield != 0 ? 1 : 0`.
+  3. Update Frequency: `ts_std_dev(datafield, N) != 0 ? 1 : 0` (N=5,22,66).
+  4. Bounds: `abs(datafield) > X` (vary X).
+  5. Central Tendency: `ts_median(datafield, 1000) > X` (vary X).
+  6. Distribution: `low < scale_down(datafield) < high` (e.g., 0.25-0.75).
+- Use multi-simulation; fall back to single simulations if it fails.
+**Analysis**:
+- Identify patterns (e.g., quarterly updates → use long windows).
+**Output**: Insights (e.g., "Sparse quarterly data → prioritize persistence ideas").
+**Tips for Repeatability**: Template the 6 expressions in a script; run for any field.
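One way to template the six evaluation expressions is sketched below. The expression strings mirror the list in Step 2; the function name, the dictionary keys, and the default thresholds (`bound`, `median_threshold`, `low`, `high`) are illustrative assumptions, and the generated strings still need to be submitted through whatever simulation tool you use.

```python
def evaluation_expressions(
    field: str,
    windows=(5, 22, 66),
    bound: float = 1.0,
    median_threshold: float = 0.0,
    low: float = 0.25,
    high: float = 0.75,
) -> dict:
    """Build the six evaluation expressions from Step 2 for one datafield."""
    exprs = {
        "basic_coverage": field,
        "non_zero_coverage": f"{field} != 0 ? 1 : 0",
    }
    # One update-frequency test per window (weekly, monthly, quarterly by default).
    for n in windows:
        exprs[f"update_frequency_{n}"] = f"ts_std_dev({field}, {n}) != 0 ? 1 : 0"
    exprs["bounds"] = f"abs({field}) > {bound}"
    exprs["central_tendency"] = f"ts_median({field}, 1000) > {median_threshold}"
    exprs["distribution"] = f"{low} < scale_down({field}) < {high}"
    return exprs


# "close" is used only as a familiar example field.
expressions = evaluation_expressions("close")
for name, expr in expressions.items():
    print(f"{name}: {expr}")
```

Varying `bound`, `median_threshold`, `low`, and `high` across calls covers the "vary X" instructions without rewriting the expressions by hand.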
+## Step 3: Propose Idea-Focused Improvements (10-15 minutes)
+**Goal**: Evolve the core signal with theory-backed concepts (e.g., momentum, persistence) for sustainability.
+**Steps**:
+- Review platform docs/community examples for tips (e.g., ATOM, flipping negatives).
+- Source ideas: Query arXiv with targeted terms (e.g., "return on assets momentum analyst estimates"). Extract 3-5 relevant papers' concepts (e.g., precision weighting = divide by std_dev).
+- Brainstorm 4-6 variants: Modify original with 1-2 concepts (e.g., add revision delta).
+- Validate operators against platform list; replace if needed (e.g., custom momentum formula).
+**Analysis**:
+- Prioritize fixes for baselines (e.g., negative years → cycle-sensitive grouping).
+**Output**: List of expressions with rationale (e.g., "Variant 1: Weighted persistence from Paper X").
+**Tips for Repeatability**: Use a template (e.g., "Search terms: [field] + momentum/revision"; limit to recent finance papers).
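The operator validation step above can be approximated offline with a simple name check before any simulation is run. This is a rough sketch: the regular expression only finds names followed by `(`, the `allowed` set here is a tiny hypothetical subset rather than the real platform operator list, and `roa` and `ts_delta` appear purely as illustrative names.

```python
import re


def unknown_operators(expression: str, allowed: set) -> list:
    """Return operator names used in the expression that are not in the allowed set."""
    # Treat any identifier immediately followed by "(" as an operator call.
    used = set(re.findall(r"([A-Za-z_]\w*)\s*\(", expression))
    return sorted(used - allowed)


# Hypothetical subset of permitted operators; load the real list from the platform.
allowed = {"ts_std_dev", "ts_median", "abs", "scale_down"}

# Illustrative variant: a revision delta divided by its standard deviation.
variant = "ts_delta(roa, 66) / ts_std_dev(roa, 66)"
missing = unknown_operators(variant, allowed)
print(missing)
```

An empty result means every operator in the variant is on the list; anything returned needs a replacement formula before simulation.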
+## Step 4: Simulate and Test Variants (10-20 minutes, including wait)
+**Goal**: Efficiently compare ideas via metrics.
+**Steps**:
+- Run multi-simulation (2-8 expressions) with original settings + targeted tweaks (e.g., neutralization for grouping).
+- If multi fails, use parallel single simulations.
+- Fetch results (details, PnL, yearly stats).
+**Analysis**:
+- Rank by Fitness/Sharpe; check sub-universe, consistency.
+- Flip the sign of variants with consistently negative performance, if applicable.
+**Output**: Ranked results (e.g., "Top ID: XYZ, Fitness improved 13%").
+**Tips for Repeatability**: Parallelize calls; log in a table (e.g., CSV with metrics).
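The ranking and CSV logging described in Step 4 might look like the following. It is a minimal sketch using only the standard library; the result records, their field names, and the placeholder alpha IDs are hypothetical and would normally come from your simulation results.

```python
import csv
import io

FIELDS = ["alpha_id", "expression", "sharpe", "fitness"]


def rank_variants(results: list) -> list:
    """Sort variants by Fitness, then Sharpe, best first."""
    return sorted(results, key=lambda r: (r["fitness"], r["sharpe"]), reverse=True)


def to_csv(rows: list) -> str:
    """Serialize ranked rows to CSV text suitable for appending to a log file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical simulation results for three variants.
results = [
    {"alpha_id": "A", "expression": "variant_a", "sharpe": 1.11, "fitness": 0.90},
    {"alpha_id": "B", "expression": "variant_b", "sharpe": 1.30, "fitness": 1.02},
    {"alpha_id": "C", "expression": "variant_c", "sharpe": 1.20, "fitness": 1.02},
]

ranked = rank_variants(results)
log_text = to_csv(ranked)
print(log_text)
```

Sorting on the `(fitness, sharpe)` tuple uses Sharpe only as a tie-breaker, which matches ranking "by Fitness/Sharpe" in that order.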
+## Step 5: Validate and Iterate or Finalize (5-10 minutes)
+**Goal**: Confirm submittability; loop if needed.
+**Steps**:
+- Run submission/correlation checks on top variants.
+- Analyze PnL/yearly for trends.
+- If failing, tweak (e.g., universe change) and return to Step 3.
+- If passing, submit.
+**Analysis**:
+- Ensure sustainability (e.g., consistent positives).
+**Output**: Final recommendation or next cycle plan.
+## Iteration and Best Practices
+- **Cycle Limit**: 3-5 per alpha; pivot if stuck (e.g., new datafield).
+- **Tracking**: Maintain a log (e.g., MD file with iterations, metrics deltas).
+- **Efficiency**: Use parallel tools; focus 70% on ideas, 30% on tweaks.
+- **Success Criteria**: Passing checks + stable yearly stats.
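The tracking and cycle-limit practices above can be sketched as two small helpers. The function names, the metric keys, and the choice of 5 as the default cycle limit (the upper end of the 3-5 range) are assumptions made for illustration.

```python
def metric_deltas(previous: dict, current: dict) -> dict:
    """Percent change per metric between two iterations (skips zero baselines)."""
    return {
        name: (current[name] - previous[name]) / abs(previous[name]) * 100
        for name in previous
        if name in current and previous[name] != 0
    }


def next_action(cycle: int, passing: bool, max_cycles: int = 5) -> str:
    """Decide what to do after a cycle: submit, iterate again, or pivot."""
    if passing:
        return "submit"
    if cycle >= max_cycles:
        return "pivot"  # e.g., try a new datafield
    return "iterate"


# Hypothetical metrics from two consecutive iterations.
before = {"sharpe": 1.0, "fitness": 0.8}
after = {"sharpe": 1.1, "fitness": 0.9}

deltas = metric_deltas(before, after)
for name, change in deltas.items():
    print(f"{name}: {change:+.1f}%")
print(next_action(cycle=2, passing=False))
```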
+In tests, this workflow improved alpha metrics by roughly 10-20% per cycle. Adapt it as needed!
