cnhkmcp 2.2.0__py3-none-any.whl → 2.3.0__py3-none-any.whl
- cnhkmcp/__init__.py +1 -1
- cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/README.md +1 -1
- cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/config.json +2 -2
- cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/main.py +1 -1
- cnhkmcp/untracked/AI/321/206/320/261/320/234/321/211/320/255/320/262/321/206/320/237/320/242/321/204/342/225/227/342/225/242/vector_db/chroma.sqlite3 +0 -0
- cnhkmcp/untracked/APP/Tranformer/Transformer.py +2 -2
- cnhkmcp/untracked/APP/Tranformer/transformer_config.json +1 -1
- cnhkmcp/untracked/APP/blueprints/feature_engineering.py +2 -2
- cnhkmcp/untracked/APP/blueprints/inspiration_house.py +4 -4
- cnhkmcp/untracked/APP/blueprints/paper_analysis.py +3 -3
- cnhkmcp/untracked/APP/give_me_idea/BRAIN_Alpha_Template_Expert_SystemPrompt.md +34 -73
- cnhkmcp/untracked/APP/give_me_idea/alpha_data_specific_template_master.py +2 -2
- cnhkmcp/untracked/APP/give_me_idea/what_is_Alpha_template.md +366 -1
- cnhkmcp/untracked/APP/static/inspiration.js +345 -13
- cnhkmcp/untracked/APP/templates/index.html +11 -3
- cnhkmcp/untracked/APP/templates/transformer_web.html +1 -1
- cnhkmcp/untracked/APP/trailSomeAlphas/README.md +38 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/ace.log +66 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/enhance_template.py +588 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/requirements.txt +3 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/run_pipeline.py +1001 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/run_pipeline_step_by_step.ipynb +5258 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/OUTPUT_TEMPLATE.md +325 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/SKILL.md +503 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/examples.md +244 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/output_report/ASI_delay1_analyst11_ideas.md +285 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/reference.md +399 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/SKILL.md +40 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/config.json +6 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709385783386000.json +388 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709386274840400.json +131 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709386838244700.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709387369198500.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709387908905800.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709388486243600.json +240 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709389024058600.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709389549608700.json +41 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709390068714000.json +110 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709390591996900.json +36 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709391129137100.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709391691643500.json +41 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709392192099200.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709392703423500.json +46 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769709393213729400.json +246 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710186683932500.json +388 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710187165414300.json +131 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710187665211700.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710188149193400.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710188667627400.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710189220822000.json +240 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710189726189500.json +1926 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710190248066100.json +41 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710190768298700.json +110 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710191282588100.json +36 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710191838960900.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710192396688000.json +41 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710192941922400.json +31 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710193473524600.json +46 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710194001961200.json +246 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710420975888800.json +46 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710421647590100.json +196 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710422131378500.json +5 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710422644184400.json +196 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710423702350600.json +196 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_1_idea_1769710424244661800.json +5 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/analyst11_ASI_delay1.csv +211 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/analyst11_ASI_delay1/final_expressions.json +7062 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/ace.log +3 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/ace_lib.py +1514 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/fetch_dataset.py +113 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/helpful_functions.py +180 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/implement_idea.py +236 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/merge_expression_list.py +90 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/parsetab.py +60 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance/op/321/206/320/220/342/225/227/321/207/342/225/227/320/243.md +434 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance/sample_prompt.md +62 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/template_final_enhance//321/205/320/235/320/245/321/205/320/253/320/260/321/205/320/275/320/240/321/206/320/220/320/255/321/210/320/220/320/223/321/211/320/220/342/225/227/321/210/342/225/233/320/241/321/211/320/243/342/225/233.md +354 -0
- cnhkmcp/untracked/APP/usage.md +2 -2
- cnhkmcp/untracked/APP//321/210/342/224/220/320/240/321/210/320/261/320/234/321/206/320/231/320/243/321/205/342/225/235/320/220/321/206/320/230/320/241.py +388 -8
- cnhkmcp/untracked/skills/alpha-expression-verifier/scripts/validator.py +889 -0
- cnhkmcp/untracked/skills/brain-feature-implementation/scripts/implement_idea.py +4 -3
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/arXiv_API_Tool_Manual.md +490 -0
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/reference.md +1 -1
- cnhkmcp/untracked/skills/brain-improve-alpha-performance/scripts/arxiv_api.py +229 -0
- cnhkmcp/untracked//321/211/320/225/320/235/321/207/342/225/234/320/276/321/205/320/231/320/235/321/210/342/224/220/320/240/321/210/320/261/320/234/321/206/320/230/320/241_/321/205/320/276/320/231/321/210/320/263/320/225/321/205/342/224/220/320/225/321/210/320/266/320/221/321/204/342/225/233/320/255/321/210/342/225/241/320/246/321/205/320/234/320/225.py +35 -11
- cnhkmcp/vector_db/_manifest.json +1 -0
- cnhkmcp/vector_db/_meta.json +1 -0
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/METADATA +1 -1
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/RECORD +96 -30
- /cnhkmcp/untracked/{skills/expression_verifier → APP/trailSomeAlphas/skills/brain-feature-implementation}/scripts/validator.py +0 -0
- /cnhkmcp/untracked/skills/{expression_verifier → alpha-expression-verifier}/SKILL.md +0 -0
- /cnhkmcp/untracked/skills/{expression_verifier → alpha-expression-verifier}/scripts/verify_expr.py +0 -0
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/WHEEL +0 -0
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/entry_points.txt +0 -0
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/licenses/LICENSE +0 -0
- {cnhkmcp-2.2.0.dist-info → cnhkmcp-2.3.0.dist-info}/top_level.txt +0 -0
|
@@ -0,0 +1,503 @@
|
|
|
1
|
+
---
|
|
2
|
+
brain-data-feature-engineering methodology
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# BRAIN Data Feature Engineering Workflow
|
|
6
|
+
|
|
7
|
+
**Purpose**: Automatically transform BRAIN dataset fields into deep, meaningful feature engineering ideas.
|
|
8
|
+
|
|
9
|
+
## Input Requirements
|
|
10
|
+
|
|
11
|
+
### Required Parameters:
|
|
12
|
+
- **data_category**: Dataset category (e.g., "fundamental", "analyst", "news", "model")
|
|
13
|
+
- **delay**: Data delay setting (0 or 1)
|
|
14
|
+
- **region**: Market region (e.g., "USA", "EUR", "ASI")
|
|
15
|
+
|
|
16
|
+
### Optional Parameters:
|
|
17
|
+
- **universe**: Trading universe (default: "TOP3000")
|
|
18
|
+
- **dataset_id**: Specific dataset ID (if known, skips discovery phase)
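A minimal sketch of how these inputs could be validated before the workflow starts. The class name `WorkflowInput` and the `needs_discovery` property are hypothetical illustrations, not part of the package; only the parameter names, the allowed `delay` values, and the `"TOP3000"` default come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowInput:
    """Hypothetical container for the skill's input parameters."""

    # Required parameters
    data_category: str          # e.g. "fundamental", "analyst", "news", "model"
    delay: int                  # 0 or 1
    region: str                 # e.g. "USA", "EUR", "ASI"
    # Optional parameters
    universe: str = "TOP3000"
    dataset_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.delay not in (0, 1):
            raise ValueError(f"delay must be 0 or 1, got {self.delay!r}")
        if not self.data_category or not self.region:
            raise ValueError("data_category and region are required")

    @property
    def needs_discovery(self) -> bool:
        # A known dataset_id lets the workflow skip the discovery phase.
        return self.dataset_id is None


params = WorkflowInput(data_category="analyst", delay=1, region="ASI")
```

Freezing the dataclass keeps the parameters immutable once validated, so later steps can rely on them.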
|
|
19
|
+
|
|
20
|
+
## Workflow Overview
|
|
21
|
+
|
|
22
|
+
|
|
23
|
+
### Step 2: Field Extraction and Deconstruction
|
|
24
|
+
- **Deconstruct each field's meaning**:
|
|
25
|
+
* What is being measured? (the entity/concept)
|
|
26
|
+
* How is it measured? (collection/calculation method)
|
|
27
|
+
* Time dimension? (instantaneous, cumulative, rate of change)
|
|
28
|
+
* Business context? (why does this field exist?)
|
|
29
|
+
* Generation logic? (reliability considerations)
|
|
30
|
+
- **Build field profiles**: Structured understanding of each field's essence
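One possible shape for such a field profile, sketched as a plain dataclass. The class `FieldProfile`, its attribute names, and `open_questions` are assumptions made for illustration; they simply mirror the five deconstruction questions listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FieldProfile:
    """Hypothetical structured record of one field's deconstruction."""

    field_id: str
    measured_entity: str = ""      # What is being measured?
    measurement_method: str = ""   # How is it measured?
    time_dimension: str = ""       # Instantaneous, cumulative, rate of change?
    business_context: str = ""     # Why does this field exist?
    generation_logic: str = ""     # How is it produced, and how reliable is it?

    def open_questions(self) -> List[str]:
        """Return the names of the deconstruction answers still left blank."""
        return [
            f.name
            for f in fields(self)
            if f.name != "field_id" and not getattr(self, f.name).strip()
        ]


profile = FieldProfile(
    field_id="book_value",
    measured_entity="net asset value",
    time_dimension="quarterly",
)
```

Tracking the unanswered questions makes it easy to see which fields were only skimmed rather than fully deconstructed.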
|
|
31
|
+
|
|
32
|
+
### Step 3: Reasoning and Analysis
|
|
33
|
+
**Perform deep analysis based on the collected information:**
|
|
34
|
+
|
|
35
|
+
**A. Field Relationship Mapping**
|
|
36
|
+
- Analyze logical connections between fields
|
|
37
|
+
- Identify: independent fields, related fields, complementary fields
|
|
38
|
+
- Map the "story" the dataset tells
|
|
39
|
+
- **Key question**: What relationships are implied by these fields?
|
|
40
|
+
|
|
41
|
+
**B. Question-Driven Feature Generation (Internal Process)**
|
|
42
|
+
The skill asks itself these questions and generates feature concepts:
|
|
43
|
+
|
|
44
|
+
1. **"What is stable?"** → Look for invariants
|
|
45
|
+
- Which fields or combinations remain relatively constant?
|
|
46
|
+
- What stability measures make sense?
|
|
47
|
+
|
|
48
|
+
2. **"What is changing?"** → Analyze change patterns
|
|
49
|
+
- Rate of change, acceleration, volatility
|
|
50
|
+
- Trend vs. noise separation
|
|
51
|
+
|
|
52
|
+
3. **"What is anomalous?"** → Identify deviations
|
|
53
|
+
- Outliers, unusual patterns, breaks from normal
|
|
54
|
+
- Deviation magnitude and significance
|
|
55
|
+
|
|
56
|
+
4. **"What is combined?"** → Examine interactions
|
|
57
|
+
- How fields interact, amplify, or offset each other
|
|
58
|
+
- Synthesis creates new meaning
|
|
59
|
+
|
|
60
|
+
5. **"What is structural?"** → Study compositions
|
|
61
|
+
- Constituent parts, proportional relationships
|
|
62
|
+
- Structural changes over time
|
|
63
|
+
|
|
64
|
+
6. **"What is cumulative?"** → Explore accumulation effects
|
|
65
|
+
- Building up over time, decay effects
|
|
66
|
+
- Memory and persistence in data
|
|
67
|
+
|
|
68
|
+
7. **"What is relative?"** → Make comparisons
|
|
69
|
+
- Relative positioning, ranking, normalization
|
|
70
|
+
- Context within dataset
|
|
71
|
+
|
|
72
|
+
8. **"What is essential?"** → Distill to core meaning
|
|
73
|
+
- First principles thinking
|
|
74
|
+
- Strip away assumptions, get to essence
|
|
75
|
+
|
|
76
|
+
**C. Feature Concept Generation**
|
|
77
|
+
For each relevant question-field combination:
|
|
78
|
+
- Formulate feature concept that answers the question
|
|
79
|
+
- Define the concept clearly
|
|
80
|
+
- Identify the logical meaning
|
|
81
|
+
- Consider directionality (what high/low values mean)
|
|
82
|
+
- Identify boundary conditions
|
|
83
|
+
- Note potential issues/limitations
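The question-field pairing in Step 3 can be sketched as a small enumeration. The names `QUESTION_BANK`, `candidate_pairs`, and the `is_relevant` callback are hypothetical; the source describes the eight questions but not how relevance is decided, so the filter defaults to keeping every pair.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

# The eight questions from section B, keyed by a short label.
QUESTION_BANK = {
    "stable": "What is stable?",
    "changing": "What is changing?",
    "anomalous": "What is anomalous?",
    "combined": "What is combined?",
    "structural": "What is structural?",
    "cumulative": "What is cumulative?",
    "relative": "What is relative?",
    "essential": "What is essential?",
}


def candidate_pairs(
    field_ids: Iterable[str],
    is_relevant: Callable[[str, str], bool] = lambda question, field_id: True,
) -> List[Tuple[str, str]]:
    """Enumerate (question, field) pairs, keeping those judged relevant."""
    return [
        (question, field_id)
        for question, field_id in product(QUESTION_BANK, list(field_ids))
        if is_relevant(question, field_id)
    ]


pairs = candidate_pairs(["book_value", "market_cap"])
```

Each surviving pair is then a prompt for one feature concept, to be defined and documented as described in section C.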
|
|
84
|
+
|
|
85
|
+
### Step 4: Feature Documentation
|
|
86
|
+
**For each generated feature concept, document:**
|
|
87
|
+
- **Concept Name**: Clear, descriptive name
|
|
88
|
+
- **Definition**: One-sentence definition
|
|
89
|
+
- **Logical Meaning**: What phenomenon/concept does it represent?
|
|
90
|
+
- **Why It's Meaningful**: Why does this feature make sense?
|
|
91
|
+
- **Directionality**: Interpretation of high vs. low values
|
|
92
|
+
- **Boundary Conditions**: What extremes indicate
|
|
93
|
+
- **Data Requirements**: What fields are used and any constraints
|
|
94
|
+
- **Potential Issues**: Known limitations or concerns
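As an illustration, the eight documentation items could be held in one record and rendered as a bullet list. `FeatureConcept` and `to_markdown` are hypothetical names, and the rendered layout is only an approximation of the report style, not the package's actual output code.

```python
from dataclasses import dataclass


@dataclass
class FeatureConcept:
    """Hypothetical record holding the Step 4 documentation items."""

    concept_name: str
    definition: str
    logical_meaning: str
    why_meaningful: str
    directionality: str
    boundary_conditions: str
    data_requirements: str
    potential_issues: str

    def to_markdown(self) -> str:
        """Render the concept as a heading line followed by one bullet per item."""
        labelled = [
            ("Definition", self.definition),
            ("Logical Meaning", self.logical_meaning),
            ("Why It's Meaningful", self.why_meaningful),
            ("Directionality", self.directionality),
            ("Boundary Conditions", self.boundary_conditions),
            ("Data Requirements", self.data_requirements),
            ("Potential Issues", self.potential_issues),
        ]
        lines = [f"**Concept**: {self.concept_name}"]
        lines += [f"- **{label}**: {value}" for label, value in labelled]
        return "\n".join(lines)


concept = FeatureConcept(
    concept_name="Market reevaluation stability",
    definition="Rolling coefficient of variation of book_to_market",
    logical_meaning="Whether market opinion is stable or volatile",
    why_meaningful="Stable values suggest consensus",
    directionality="Higher means more disagreement",
    boundary_conditions="Undefined when the rolling mean is zero",
    data_requirements="book_to_market",
    potential_issues="Sensitive to window length",
)
```

Making every item a required attribute means an incomplete concept fails at construction time instead of producing a report with silent gaps.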
|
|
95
|
+
|
|
96
|
+
### Step 5: Output Generation
|
|
97
|
+
**Generate structured markdown report including:**
|
|
98
|
+
|
|
99
|
+
0. **Output the report as markdown** in the following format:
|
|
100
|
+
|
|
101
|
+
# {dataset_name} Feature Engineering Analysis Report
|
|
102
|
+
|
|
103
|
+
**Dataset**: {dataset_id}
|
|
104
|
+
**Category**: {category}
|
|
105
|
+
**Region**: {region}
|
|
106
|
+
**Analysis Date**: {analysis_date}
|
|
107
|
+
**Fields Analyzed**: {field_count}
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## Executive Summary
|
|
112
|
+
|
|
113
|
+
**Primary Question Answered by Dataset**: What does this dataset fundamentally measure?
|
|
114
|
+
|
|
115
|
+
**Key Insights from Analysis**:
|
|
116
|
+
- {insight_1}
|
|
117
|
+
- {insight_2}
|
|
118
|
+
- {insight_3}
|
|
119
|
+
|
|
120
|
+
**Critical Field Relationships Identified**:
|
|
121
|
+
- {relationship_1}
|
|
122
|
+
- {relationship_2}
|
|
123
|
+
|
|
124
|
+
**Most Promising Feature Concepts**:
|
|
125
|
+
1. {top_feature_1} - because {reason_1}
|
|
126
|
+
2. {top_feature_2} - because {reason_2}
|
|
127
|
+
3. {top_feature_3} - because {reason_3}
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
## Dataset Deep Understanding
|
|
132
|
+
|
|
133
|
+
### Dataset Description
|
|
134
|
+
{dataset_description}
|
|
135
|
+
|
|
136
|
+
### Field Inventory
|
|
137
|
+
| Field ID | Description | Data Type | Update Frequency | Coverage |
|
|
138
|
+
|----------|-------------|-----------|------------------|----------|
|
|
139
|
+
| {field_1_id} | {field_1_desc} | {type_1} | {freq_1} | {coverage_1}% |
|
|
140
|
+
| {field_2_id} | {field_2_desc} | {type_2} | {freq_2} | {coverage_2}% |
|
|
141
|
+
| {field_3_id} | {field_3_desc} | {type_3} | {freq_3} | {coverage_3}% |
|
|
142
|
+
|
|
143
|
+
*(Additional fields as needed)*
|
|
144
|
+
|
|
145
|
+
### Field Deconstruction Analysis
|
|
146
|
+
|
|
147
|
+
#### {field_1_id}: {field_1_name}
|
|
148
|
+
- **What is being measured?**: {measurement_object_1}
|
|
149
|
+
- **How is it measured?**: {measurement_method_1}
|
|
150
|
+
- **Time dimension**: {time_dimension_1}
|
|
151
|
+
- **Business context**: {business_context_1}
|
|
152
|
+
- **Generation logic**: {generation_logic_1}
|
|
153
|
+
- **Reliability considerations**: {reliability_1}
|
|
154
|
+
|
|
155
|
+
#### {field_2_id}: {field_2_name}
|
|
156
|
+
- **What is being measured?**: {measurement_object_2}
|
|
157
|
+
- **How is it measured?**: {measurement_method_2}
|
|
158
|
+
- **Time dimension**: {time_dimension_2}
|
|
159
|
+
- **Business context**: {business_context_2}
|
|
160
|
+
- **Generation logic**: {generation_logic_2}
|
|
161
|
+
- **Reliability considerations**: {reliability_2}
|
|
162
|
+
|
|
163
|
+
*(Additional fields as needed)*
|
|
164
|
+
|
|
165
|
+
### Field Relationship Mapping
|
|
166
|
+
|
|
167
|
+
**The Story This Data Tells**:
|
|
168
|
+
{story_description}
|
|
169
|
+
|
|
170
|
+
**Key Relationships Identified**:
|
|
171
|
+
1. {relationship_1_desc}
|
|
172
|
+
2. {relationship_2_desc}
|
|
173
|
+
3. {relationship_3_desc}
|
|
174
|
+
|
|
175
|
+
**Missing Pieces That Would Complete the Picture**:
|
|
176
|
+
- {missing_1}
|
|
177
|
+
- {missing_2}
|
|
178
|
+
|
|
179
|
+
---
|
|
180
|
+
|
|
181
|
+
## Feature Concepts by Question Type
|
|
182
|
+
|
|
183
|
+
### Q1: "What is stable?" (Invariance Features)
|
|
184
|
+
|
|
185
|
+
**Concept**: {stability_feature_1_name}
|
|
186
|
+
- **Sample Fields Used**: {fields_used_1}
|
|
187
|
+
- **Definition**: {definition_1}
|
|
188
|
+
- **Why This Feature**: {why_1}
|
|
189
|
+
- **Logical Meaning**: {logical_meaning_1}
|
|
190
|
+
- **Directionality**: {directionality_1}
|
|
191
|
+
- **Boundary Conditions**: {boundaries_1}
|
|
192
|
+
- **Implementation Example**: `{implementation_1}`
|
|
193
|
+
|
|
194
|
+
**Concept**: {stability_feature_2_name}
|
|
195
|
+
- **Sample Fields Used**: {fields_used_2}
|
|
196
|
+
- **Definition**: {definition_2}
|
|
197
|
+
- **Why This Feature**: {why_2}
|
|
198
|
+
- **Logical Meaning**: {logical_meaning_2}
|
|
199
|
+
- **Directionality**: {directionality_2}
|
|
200
|
+
- **Boundary Conditions**: {boundaries_2}
|
|
201
|
+
- **Implementation Example**: `{implementation_2}`
|
|
202
|
+
|
|
203
|
+
---
|
|
204
|
+
|
|
205
|
+
### Q2: "What is changing?" (Dynamics Features)
|
|
206
|
+
|
|
207
|
+
**Concept**: {dynamics_feature_1_name}
|
|
208
|
+
- **Sample Fields Used**: {fields_used_3}
|
|
209
|
+
- **Definition**: {definition_3}
|
|
210
|
+
- **Why This Feature**: {why_3}
|
|
211
|
+
- **Logical Meaning**: {logical_meaning_3}
|
|
212
|
+
- **Directionality**: {directionality_3}
|
|
213
|
+
- **Boundary Conditions**: {boundaries_3}
|
|
214
|
+
- **Implementation Example**: `{implementation_3}`
|
|
215
|
+
|
|
216
|
+
**Concept**: {dynamics_feature_2_name}
|
|
217
|
+
- **Sample Fields Used**: {fields_used_4}
|
|
218
|
+
- **Definition**: {definition_4}
|
|
219
|
+
- **Why This Feature**: {why_4}
|
|
220
|
+
- **Logical Meaning**: {logical_meaning_4}
|
|
221
|
+
- **Directionality**: {directionality_4}
|
|
222
|
+
- **Boundary Conditions**: {boundaries_4}
|
|
223
|
+
- **Implementation Example**: `{implementation_4}`
|
|
224
|
+
|
|
225
|
+
---
|
|
226
|
+
|
|
227
|
+
### Q3: "What is anomalous?" (Deviation Features)
|
|
228
|
+
|
|
229
|
+
**Concept**: {anomaly_feature_1_name}
|
|
230
|
+
- **Sample Fields Used**: {fields_used_5}
|
|
231
|
+
- **Definition**: {definition_5}
|
|
232
|
+
- **Why This Feature**: {why_5}
|
|
233
|
+
- **Logical Meaning**: {logical_meaning_5}
|
|
234
|
+
- **Directionality**: {directionality_5}
|
|
235
|
+
- **Boundary Conditions**: {boundaries_5}
|
|
236
|
+
- **Implementation Example**: `{implementation_5}`
|
|
237
|
+
|
|
238
|
+
**Concept**: {anomaly_feature_2_name}
|
|
239
|
+
- **Sample Fields Used**: {fields_used_6}
|
|
240
|
+
- **Definition**: {definition_6}
|
|
241
|
+
- **Why This Feature**: {why_6}
|
|
242
|
+
- **Logical Meaning**: {logical_meaning_6}
|
|
243
|
+
- **Directionality**: {directionality_6}
|
|
244
|
+
- **Boundary Conditions**: {boundaries_6}
|
|
245
|
+
- **Implementation Example**: `{implementation_6}`
|
|
246
|
+
|
|
247
|
+
---
|
|
248
|
+
|
|
249
|
+
### Q4: "What is combined?" (Interaction Features)
|
|
250
|
+
|
|
251
|
+
**Concept**: {interaction_feature_1_name}
|
|
252
|
+
- **Sample Fields Used**: {fields_used_7}
|
|
253
|
+
- **Definition**: {definition_7}
|
|
254
|
+
- **Why This Feature**: {why_7}
|
|
255
|
+
- **Logical Meaning**: {logical_meaning_7}
|
|
256
|
+
- **Directionality**: {directionality_7}
|
|
257
|
+
- **Boundary Conditions**: {boundaries_7}
|
|
258
|
+
- **Implementation Example**: `{implementation_7}`
|
|
259
|
+
|
|
260
|
+
**Concept**: {interaction_feature_2_name}
|
|
261
|
+
- **Sample Fields Used**: {fields_used_8}
|
|
262
|
+
- **Definition**: {definition_8}
|
|
263
|
+
- **Why This Feature**: {why_8}
|
|
264
|
+
- **Logical Meaning**: {logical_meaning_8}
|
|
265
|
+
- **Directionality**: {directionality_8}
|
|
266
|
+
- **Boundary Conditions**: {boundaries_8}
|
|
267
|
+
- **Implementation Example**: `{implementation_8}`
|
|
268
|
+
|
|
269
|
+
---
|
|
270
|
+
|
|
271
|
+
### Q5: "What is structural?" (Composition Features)
|
|
272
|
+
|
|
273
|
+
**Concept**: {structure_feature_1_name}
|
|
274
|
+
- **Sample Fields Used**: {fields_used_9}
|
|
275
|
+
- **Definition**: {definition_9}
|
|
276
|
+
- **Why This Feature**: {why_9}
|
|
277
|
+
- **Logical Meaning**: {logical_meaning_9}
|
|
278
|
+
- **Directionality**: {directionality_9}
|
|
279
|
+
- **Boundary Conditions**: {boundaries_9}
|
|
280
|
+
- **Implementation Example**: `{implementation_9}`
|
|
281
|
+
|
|
282
|
+
**Concept**: {structure_feature_2_name}
|
|
283
|
+
- **Sample Fields Used**: {fields_used_10}
|
|
284
|
+
- **Definition**: {definition_10}
|
|
285
|
+
- **Why This Feature**: {why_10}
|
|
286
|
+
- **Logical Meaning**: {logical_meaning_10}
|
|
287
|
+
- **Directionality**: {directionality_10}
|
|
288
|
+
- **Boundary Conditions**: {boundaries_10}
|
|
289
|
+
- **Implementation Example**: `{implementation_10}`
|
|
290
|
+
|
|
291
|
+
---
|
|
292
|
+
|
|
293
|
+
### Q6: "What is cumulative?" (Accumulation Features)
|
|
294
|
+
|
|
295
|
+
**Concept**: {accumulation_feature_1_name}
|
|
296
|
+
- **Sample Fields Used**: {fields_used_11}
|
|
297
|
+
- **Definition**: {definition_11}
|
|
298
|
+
- **Why This Feature**: {why_11}
|
|
299
|
+
- **Logical Meaning**: {logical_meaning_11}
|
|
300
|
+
- **Directionality**: {directionality_11}
|
|
301
|
+
- **Boundary Conditions**: {boundaries_11}
|
|
302
|
+
- **Implementation Example**: `{implementation_11}`
|
|
303
|
+
|
|
304
|
+
**Concept**: {accumulation_feature_2_name}
|
|
305
|
+
- **Sample Fields Used**: {fields_used_12}
|
|
306
|
+
- **Definition**: {definition_12}
|
|
307
|
+
- **Why This Feature**: {why_12}
|
|
308
|
+
- **Logical Meaning**: {logical_meaning_12}
|
|
309
|
+
- **Directionality**: {directionality_12}
|
|
310
|
+
- **Boundary Conditions**: {boundaries_12}
|
|
311
|
+
- **Implementation Example**: `{implementation_12}`
|
|
312
|
+
|
|
313
|
+
---
|
|
314
|
+
|
|
315
|
+
### Q7: "What is relative?" (Comparison Features)
|
|
316
|
+
|
|
317
|
+
**Concept**: {relative_feature_1_name}
|
|
318
|
+
- **Sample Fields Used**: {fields_used_13}
|
|
319
|
+
- **Definition**: {definition_13}
|
|
320
|
+
- **Why This Feature**: {why_13}
|
|
321
|
+
- **Logical Meaning**: {logical_meaning_13}
|
|
322
|
+
- **Directionality**: {directionality_13}
|
|
323
|
+
- **Boundary Conditions**: {boundaries_13}
|
|
324
|
+
- **Implementation Example**: `{implementation_13}`
|
|
325
|
+
|
|
326
|
+
**Concept**: {relative_feature_2_name}
|
|
327
|
+
- **Sample Fields Used**: {fields_used_14}
|
|
328
|
+
- **Definition**: {definition_14}
|
|
329
|
+
- **Why This Feature**: {why_14}
|
|
330
|
+
- **Logical Meaning**: {logical_meaning_14}
|
|
331
|
+
- **Directionality**: {directionality_14}
|
|
332
|
+
- **Boundary Conditions**: {boundaries_14}
|
|
333
|
+
- **Implementation Example**: `{implementation_14}`
|
|
334
|
+
|
|
335
|
+
---
|
|
336
|
+
|
|
337
|
+
### Q8: "What is essential?" (Essence Features)
|
|
338
|
+
|
|
339
|
+
**Concept**: {essence_feature_1_name}
|
|
340
|
+
- **Sample Fields Used**: {fields_used_15}
|
|
341
|
+
- **Definition**: {definition_15}
|
|
342
|
+
- **Why This Feature**: {why_15}
|
|
343
|
+
- **Logical Meaning**: {logical_meaning_15}
|
|
344
|
+
- **Directionality**: {directionality_15}
|
|
345
|
+
- **Boundary Conditions**: {boundaries_15}
|
|
346
|
+
- **Implementation Example**: `{implementation_15}`
|
|
347
|
+
|
|
348
|
+
**Concept**: {essence_feature_2_name}
|
|
349
|
+
- **Sample Fields Used**: {fields_used_16}
|
|
350
|
+
- **Definition**: {definition_16}
|
|
351
|
+
- **Why This Feature**: {why_16}
|
|
352
|
+
- **Logical Meaning**: {logical_meaning_16}
|
|
353
|
+
- **Directionality**: {directionality_16}
|
|
354
|
+
- **Boundary Conditions**: {boundaries_16}
|
|
355
|
+
- **Implementation Example**: `{implementation_16}`
|
|
356
|
+
|
|
357
|
+
---
|
|
358
|
+
|
|
359
|
+
## Implementation Considerations
|
|
360
|
+
|
|
361
|
+
### Data Quality Notes
|
|
362
|
+
- **Coverage**: {coverage_note}
|
|
363
|
+
- **Timeliness**: {timeliness_note}
|
|
364
|
+
- **Accuracy**: {accuracy_note}
|
|
365
|
+
- **Potential Biases**: {bias_note}
|
|
366
|
+
|
|
367
|
+
### Computational Complexity
|
|
368
|
+
- **Lightweight features**: {simple_features}
|
|
369
|
+
- **Medium complexity**: {medium_features}
|
|
370
|
+
- **Heavy computation**: {complex_features}
|
|
371
|
+
|
|
372
|
+
### Recommended Prioritization
|
|
373
|
+
|
|
374
|
+
**Tier 1 (Immediate Implementation)**:
|
|
375
|
+
1. {priority_1_feature} - {priority_1_reason}
|
|
376
|
+
2. {priority_2_feature} - {priority_2_reason}
|
|
377
|
+
3. {priority_3_feature} - {priority_3_reason}
|
|
378
|
+
|
|
379
|
+
**Tier 2 (Secondary Priority)**:
|
|
380
|
+
1. {priority_4_feature} - {priority_4_reason}
|
|
381
|
+
2. {priority_5_feature} - {priority_5_reason}
|
|
382
|
+
|
|
383
|
+
**Tier 3 (Requires Further Validation)**:
|
|
384
|
+
1. {priority_6_feature} - {priority_6_reason}
|
|
385
|
+
|
|
386
|
+
---
|
|
387
|
+
|
|
388
|
+
## Critical Questions for Further Exploration
|
|
389
|
+
|
|
390
|
+
### Unanswered Questions:
|
|
391
|
+
1. {unanswered_question_1}
|
|
392
|
+
2. {unanswered_question_2}
|
|
393
|
+
3. {unanswered_question_3}
|
|
394
|
+
|
|
395
|
+
### Recommended Additional Data:
|
|
396
|
+
- {additional_data_1}
|
|
397
|
+
- {additional_data_2}
|
|
398
|
+
- {additional_data_3}
|
|
399
|
+
|
|
400
|
+
### Assumptions to Challenge:
|
|
401
|
+
- {assumption_1}
|
|
402
|
+
- {assumption_2}
|
|
403
|
+
- {assumption_3}
|
|
404
|
+
|
|
405
|
+
---
|
|
406
|
+
|
|
407
|
+
## Methodology Notes
|
|
408
|
+
|
|
409
|
+
**Analysis Approach**: This report was generated by:
|
|
410
|
+
1. Deep field deconstruction to understand data essence
|
|
411
|
+
2. Question-driven feature generation (8 fundamental questions)
|
|
412
|
+
3. Logical validation of each feature concept
|
|
413
|
+
4. Transparent documentation of reasoning
|
|
414
|
+
|
|
415
|
+
**Design Principles**:
|
|
416
|
+
- Focus on logical meaning over conventional patterns
|
|
417
|
+
- Every feature must answer a specific question
|
|
418
|
+
- Clear documentation of "why" for each suggestion
|
|
419
|
+
- Emphasis on data understanding over prediction
|
|
420
|
+
|
|
421
|
+
---
|
|
422
|
+
|
|
423
|
+
*Report generated: {generation_timestamp}*
|
|
424
|
+
*Analysis depth: Comprehensive field deconstruction + 8-question framework*
|
|
425
|
+
*Next steps: Implement Tier 1 features, validate assumptions, gather additional data as needed*
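The `{placeholder}` fields in the report format above lend themselves to standard Python string formatting. The following is a sketch only: the helper names `placeholders`, `fill`, and `KeepMissing` are hypothetical, and the source does not say how the skill actually fills the template. It lists the placeholders a template contains and fills the ones that are known, leaving the rest visible.

```python
import string
from typing import Dict, List


def placeholders(template: str) -> List[str]:
    """Return the distinct {name} fields in a template, in order of appearance."""
    seen: List[str] = []
    for _literal, field_name, _spec, _conv in string.Formatter().parse(template):
        if field_name and field_name not in seen:
            seen.append(field_name)
    return seen


class KeepMissing(dict):
    """Mapping that leaves unknown placeholders untouched instead of failing."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def fill(template: str, values: Dict[str, str]) -> str:
    """Substitute known values; unknown placeholders stay visible in the output."""
    return template.format_map(KeepMissing(values))


snippet = "# {dataset_name} Report\n**Region**: {region}\n- {insight_1}"
partial = fill(snippet, {"region": "ASI"})
```

Leaving unfilled placeholders in place makes gaps in a generated report easy to spot during the self-check.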
|
|
426
|
+
|
|
427
|
+
|
|
428
|
+
|
|
429
|
+
## Core Analysis Principles
|
|
430
|
+
|
|
431
|
+
1. **From Data Essence**: Start with what data truly means, not what it's traditionally used for
|
|
432
|
+
2. **Autonomous Reasoning**: The skill performs all thinking; no user input is required
|
|
433
|
+
3. **Question-Driven**: Internal question bank guides feature generation
|
|
434
|
+
4. **Meaning Over Patterns**: Prioritize logical meaning over conventional combinations
|
|
435
|
+
5. **Transparency**: Show reasoning process in output
|
|
436
|
+
|
|
437
|
+
## Example Output Structure
|
|
438
|
+
|
|
439
|
+
When analyzing dataset 'BEME' (Balance Sheet and Market Data), the output would include:
|
|
440
|
+
|
|
441
|
+
### Dataset Understanding
|
|
442
|
+
**Fields Analyzed**: book_value, market_cap, book_to_market, etc.
|
|
443
|
+
**Key Observations**: Dataset compares accounting values with market valuations
|
|
444
|
+
|
|
445
|
+
### Field Deconstruction
|
|
446
|
+
- **book_value**: Accountant's calculation of net asset value (quarterly, audited, historical cost-based)
|
|
447
|
+
- **market_cap**: Market participants' valuation (continuous, forward-looking, sentiment-influenced)
|
|
448
|
+
- **book_to_market**: Ratio comparing these two valuation perspectives
|
|
449
|
+
|
|
450
|
+
### Feature Concepts Generated
|
|
451
|
+
|
|
452
|
+
**From "What is stable?"**
|
|
453
|
+
- "Market reevaluation stability": Rolling coefficient of variation of book_to_market
|
|
454
|
+
- **Logic**: Measures whether market opinion is stable or volatile
|
|
455
|
+
- **Meaning**: Stable values suggest consensus, volatile values suggest disagreement/uncertainty
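A rolling coefficient of variation can be sketched in plain Python as below. This is an illustration, not BRAIN expression syntax: the function name `rolling_cv`, the window handling, and the choice of sample standard deviation are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def rolling_cv(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Rolling coefficient of variation: sample std / |mean| over a trailing window.

    Positions without a full window, or with a zero mean, yield None.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample standard deviation")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
            continue
        chunk = values[i + 1 - window : i + 1]
        centre = mean(chunk)
        result.append(None if centre == 0 else stdev(chunk) / abs(centre))
    return result


flat = rolling_cv([1.0, 1.0, 1.0, 1.0], window=3)
```

A perfectly flat series gives a coefficient of zero, which matches the "consensus" reading above; larger values point toward disagreement or uncertainty.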
|
|
456
|
+
|
|
457
|
+
**From "What is changing?"**
|
|
458
|
+
- "Value creation vs. market reevaluation decomposition": Separate book_value growth from market_cap growth
|
|
459
|
+
- **Logic**: Distinguish fundamental value creation from market sentiment changes
|
|
460
|
+
- **Meaning**: Which component drives changes in book_to_market?
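The source does not give a formula for this decomposition. One possible reading, assuming book_to_market is book_value divided by market_cap and both are positive, uses log changes: the log change of the ratio equals book_value log growth minus market_cap log growth. The function name and the returned keys are hypothetical.

```python
import math
from typing import Dict


def decompose_btm_change(
    book_prev: float, book_now: float, mcap_prev: float, mcap_now: float
) -> Dict[str, float]:
    """Split the log change in book_to_market into its two drivers.

    log(B1/M1) - log(B0/M0) = log(B1/B0) - log(M1/M0)
    """
    if min(book_prev, book_now, mcap_prev, mcap_now) <= 0:
        raise ValueError("log decomposition requires strictly positive inputs")
    book_part = math.log(book_now / book_prev)      # fundamental value creation
    market_part = -math.log(mcap_now / mcap_prev)   # market reevaluation (sign flipped)
    return {
        "book_growth": book_part,
        "market_reevaluation": market_part,
        "total": book_part + market_part,
    }


parts = decompose_btm_change(book_prev=100.0, book_now=110.0, mcap_prev=200.0, mcap_now=200.0)
```

Because the two components add up exactly to the total log change, comparing their magnitudes answers which one drove the move.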
|
|
461
|
+
|
|
462
|
+
**From "What is combined?"**
|
|
463
|
+
- "Intangible value proportion": (market_cap - book_value) / enterprise_value
|
|
464
|
+
- **Logic**: Quantify proportion of value from intangibles (brand, growth, etc.)
|
|
465
|
+
- **Meaning**: What percentage of valuation isn't captured on the balance sheet?
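The ratio above translates directly into code. In this sketch the function name is hypothetical, and `enterprise_value` is assumed to be available as an additional input, since it is not among the fields listed for the example dataset.

```python
from typing import Optional


def intangible_value_proportion(
    market_cap: float, book_value: float, enterprise_value: Optional[float]
) -> Optional[float]:
    """(market_cap - book_value) / enterprise_value, or None when undefined."""
    if enterprise_value is None or enterprise_value == 0:
        # Boundary condition: the ratio has no meaning without a denominator.
        return None
    return (market_cap - book_value) / enterprise_value


share = intangible_value_proportion(market_cap=150.0, book_value=100.0, enterprise_value=200.0)
```

Negative results are possible when book value exceeds market cap, which is worth recording as a boundary condition rather than clipping away.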
|
|
466
|
+
|
|
467
|
+
**(Additional question-based features would follow...)**
|
|
468
|
+
|
|
469
|
+
## Implementation Notes
|
|
470
|
+
|
|
471
|
+
### The skill should:
|
|
472
|
+
1. **Analyze first, then generate**: Fully understand dataset before proposing features
|
|
473
|
+
2. **Show reasoning**: Explain why each feature concept makes sense
|
|
474
|
+
3. **Be specific**: Reference actual field names and their characteristics
|
|
475
|
+
4. **Be critical**: Question assumptions and identify limitations
|
|
476
|
+
5. **Be creative**: Look beyond traditional financial metrics
|
|
477
|
+
|
|
478
|
+
### The skill should NOT:
|
|
479
|
+
1. **Ask users to think**: All thinking is internal to the skill
|
|
480
|
+
2. **Provide generic templates**: Each analysis should be specific to the dataset
|
|
481
|
+
3. **Rely on conventional wisdom**: Challenge traditional approaches
|
|
482
|
+
4. **Output patterns without meaning**: Every suggestion must have clear logic
|
|
483
|
+
|
|
484
|
+
## Quality Assurance
|
|
485
|
+
|
|
486
|
+
**Self-Check Process:**
|
|
487
|
+
- [ ] All fields analyzed, not just skimmed
|
|
488
|
+
- [ ] Field meanings understood beyond descriptions
|
|
489
|
+
- [ ] Multiple question types explored
|
|
490
|
+
- [ ] Each feature has clear logical meaning
|
|
491
|
+
- [ ] Reasoning is explicit, not implicit
|
|
492
|
+
- [ ] Limitations are acknowledged
|
|
493
|
+
- [ ] Output is dataset-specific, not generic
|
|
494
|
+
|
|
495
|
+
**Validation Questions:**
|
|
496
|
+
- Would this analysis help someone truly understand the data?
|
|
497
|
+
- Are feature concepts novel yet meaningful?
|
|
498
|
+
- Is the reasoning process transparent?
|
|
499
|
+
- Does it avoid conventional thinking traps?
|
|
500
|
+
|
|
501
|
+
---
|
|
502
|
+
|
|
503
|
+
*This skill performs deep analysis of BRAIN datasets, generating meaningful feature engineering concepts based on data essence and logical reasoning.*
|