cnhkmcp 2.3.2__py3-none-any.whl → 2.3.3__py3-none-any.whl
- cnhkmcp/__init__.py +1 -1
- cnhkmcp/untracked/AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221/BRAIN_AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221Mac_Linux/321/207/320/231/320/230/321/206/320/254/320/274.zip +0 -0
- cnhkmcp/untracked/AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221//321/205/320/237/320/234/321/205/320/227/342/225/227/321/205/320/276/320/231/321/210/320/263/320/225AI/321/206/320/231/320/243/321/205/342/225/226/320/265/321/204/342/225/221/342/225/221_Windows/321/207/320/231/320/230/321/206/320/254/320/274.exe +0 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/ace.log +1 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-data-feature-engineering/output_report/GLB_delay1_fundamental28_ideas.md +384 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/final_expressions.json +41 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874844124598400.json +7 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874844589448700.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874845048996700.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874845510819100.json +12 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874845978315000.json +10 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874846459411100.json +10 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874846924915700.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874847399137200.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874847858960800.json +10 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874848327921300.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874848810818000.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874849327754300.json +7 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874849795807500.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874850272279500.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874850757124200.json +7 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_1_idea_1769874851224506800.json +8 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/data/fundamental28_GLB_delay1/fundamental28_GLB_delay1.csv +930 -0
- cnhkmcp/untracked/APP/trailSomeAlphas/skills/brain-feature-implementation/scripts/ace.log +1 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/.gitignore +14 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/SKILL.md +76 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/ace.log +0 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/ace_lib.py +1512 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/config.json +6 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/fundamental28_GLB_1_idea_1769874845978315000.json +10 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/helpful_functions.py +180 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/__init__.py +0 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/build_alpha_list.py +86 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/fetch_sim_options.py +51 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/load_credentials.py +93 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/parse_idea_file.py +85 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/process_template.py +80 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/scripts/resolve_settings.py +94 -0
- cnhkmcp/untracked/skills/brain-inspectTemplate-create-Setting/sim_options_snapshot.json +414 -0
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/METADATA +1 -1
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/RECORD +45 -11
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/WHEEL +0 -0
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/entry_points.txt +0 -0
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/licenses/LICENSE +0 -0
- {cnhkmcp-2.3.2.dist-info → cnhkmcp-2.3.3.dist-info}/top_level.txt +0 -0
cnhkmcp/__init__.py
CHANGED
|
@@ -0,0 +1,384 @@
**Dataset**: fundamental28
**Region**: GLB
**Delay**: 1

# Global Fundamental Data Feature Engineering Analysis Report

**Dataset**: fundamental28
**Category**: Fundamental
**Region**: GLB
**Analysis Date**: 2024-01-15
**Fields Analyzed**: 929

---

## Executive Summary

**Primary Question Answered by Dataset**: How do fundamental financial characteristics (profitability, growth, capital structure, and cash flow quality) drive relative valuation and risk assessment across global equities?

**Key Insights from Analysis**:
- The dataset provides a comprehensive view of value creation by combining quarterly operational metrics (coverage ratios, margins) with annual growth rates and long-term averages
- Combining growth metrics (e.g., equity growth) with stability indicators (e.g., fixed charge coverage) helps distinguish sustainable expansion from leveraged speculation
- Cash flow data includes non-operational noise (FX effects) that can be removed to reveal core operational performance
- The mix of quarterly (q) and annual (a) frequencies requires temporal alignment strategies to avoid look-ahead bias

**Critical Field Relationships Identified**:
- `value_02300q` (Total Assets) serves as the scaling denominator for `value_04001q` (Net Income) and `value_03501a` (Common Equity), giving ROA and the equity-to-assets ratio; ROE follows as the quotient of the two
- `value_08251q` (Fixed Charge Coverage) mediates between earnings power (`value_18191q`) and financial risk (`value_03051q`)
- `cfsourceusea_value_04840a` (FX Effect) provides information orthogonal to operational cash flows, enabling noise reduction

**Most Promising Feature Concepts**:
1. **Sustainable Growth Score** - because it combines growth magnitude with coverage quality, filtering out leveraged growth stories
2. **Operating Persistence** - because autocorrelation of margins reveals competitive advantage durability beyond current profitability
3. **FX-Purified Cash** - because removing translation effects reveals true operational cash generation capacity

---

## Dataset Deep Understanding

### Dataset Description

This is a global fundamental dataset providing detailed annual and quarterly values for various items from financial statements. It has good content quality and extensive coverage, and includes more than 1,500 data fields. Apart from financial statement content, it also provides per-share data, calculated ratios, pricing, and other textual information. The dataset captures the full accounting equation (Assets = Liabilities + Equity) alongside flow measures (Income, Cash Flow) and derived growth metrics.

### Field Inventory

| Field ID | Description | Data Type | Update Frequency | Coverage |
|----------|-------------|-----------|------------------|----------|
| `value_08579` | Market Capitalization Growth (year ago) | Numeric | Annual | 85% |
| `value_08251q` | Fixed Charge Coverage Ratio | Numeric | Quarterly | 78% |
| `value_02300q` | Total Assets - As Reported | Numeric | Quarterly | 95% |
| `growthratesa_value_08816a` | Earnings Per Share - Fiscal - 1 Yr Annual Growth | Numeric | Annual | 82% |
| `cfsourceusea_value_04840a` | Effect of Exchange Rate on Cash | Numeric | Annual | 65% |
| `value_04001q` | Net Income/Starting Line | Numeric | Quarterly | 94% |
| `value_08316q` | Operating Profit Margin | Numeric | Quarterly | 88% |
| `value_18191q` | Earnings before Interest and Taxes (EBIT) | Numeric | Quarterly | 89% |
| `statisticsa_value_05260a` | Earnings Per Share - 5 Yr Avg | Numeric | Annual | 80% |
| `growthratesa_value_08616a` | Equity Growth (year ago) | Numeric | Annual | 84% |
| `value_03501a` | Common Equity | Numeric | Annual | 96% |
| `value_03051q` | Short Term Debt & Current Portion of Long Term Debt | Numeric | Quarterly | 92% |
| `value_03999q` | Total Liabilities & Shareholders' Equity | Numeric | Quarterly | 95% |
| `value_08301q` | Return on Equity Total (%) | Numeric | Quarterly | 87% |

*(Additional fields analyzed but not listed)*

### Field Deconstruction Analysis

#### `value_08579`: Market Capitalization Growth (year ago)
- **What is being measured?**: Year-over-year percentage change in market capitalization, capturing investor revaluation and share issuance/buyback effects
- **How is it measured?**: Calculated as (Current Market Cap / Market Cap 1 year ago) - 1, using point-in-time market data
- **Time dimension**: Annual comparison with 1-year lookback (point-in-time relative change)
- **Business context**: Reflects market sentiment shifts, growth expectations, and capital structure changes (dilution/concentration)
- **Generation logic**: Derived from market price and shares outstanding; susceptible to volatility and non-fundamental factors
- **Reliability considerations**: High values may reflect small-cap illiquidity or merger events rather than organic growth; check for outliers

#### `value_08251q`: Fixed Charge Coverage Ratio
- **What is being measured?**: Ability to cover fixed financial charges (interest, lease payments) from earnings
- **How is it measured?**: Ratio of earnings before fixed charges and taxes to fixed charges
- **Time dimension**: Quarterly snapshot based on trailing 12-month or quarter-specific earnings
- **Business context**: Critical credit risk indicator; used by lenders to assess debt servicing capacity
- **Generation logic**: Standardized calculation across companies, but definitions of "fixed charges" may vary by industry (e.g., airlines vs tech)
- **Reliability considerations**: Highly cyclical industries show volatile coverage; single-quarter spikes may not indicate sustained improvement

#### `value_02300q`: Total Assets - As Reported
- **What is being measured?**: Total economic resources controlled by the entity (balance sheet size)
- **How is it measured?**: Sum of current and non-current assets as reported in quarterly filings
- **Time dimension**: Quarterly balance sheet snapshot (cumulative stock measure)
- **Business context**: Scale indicator; base for calculating efficiency ratios (ROA, asset turnover)
- **Generation logic**: Accounting-based; includes goodwill, intangibles, and write-downs that may not reflect economic reality
- **Reliability considerations**: Subject to accounting policy choices (depreciation methods, inventory valuation); acquisitions cause step changes

#### `growthratesa_value_08816a`: EPS Fiscal 1 Yr Annual Growth
- **What is being measured?**: Momentum in earnings per share over fiscal year periods
- **How is it measured?**: Percentage change in fully diluted EPS from fiscal year t-1 to t
- **Time dimension**: Annual growth rate (flow change measure)
- **Business context**: Key metric for growth investors; drives PEG ratios and momentum strategies
- **Generation logic**: Dependent on share count methodology (diluted vs basic) and extraordinary item treatment
- **Reliability considerations**: Extreme values when base year EPS near zero; does not distinguish quality of earnings (cash vs accrual)

#### `cfsourceusea_value_04840a`: Effect of Exchange Rate on Cash
- **What is being measured?**: Non-operational cash flow impact from currency translation on foreign operations
- **How is it measured?**: Translation adjustment captured in cash flow statement reconciliation
- **Time dimension**: Annual or cumulative period measure (depends on reporting frequency)
- **Business context**: Captures translational risk (not transactional); indicates exposure to currency volatility
- **Generation logic**: Accounting translation difference between functional and reporting currency; non-cash in nature but affects cash position
- **Reliability considerations**: Can mask true operational performance; large values indicate significant international exposure or currency volatility

### Field Relationship Mapping

**The Story This Data Tells**:
The dataset narrates the enterprise value creation process: starting with asset bases (`value_02300q`) financed by equity (`value_03501a`) and debt (`value_03051q`), generating returns measured by earnings (`value_18191q`, `value_04001q`) and margins (`value_08316q`), growing over time (`growthratesa_value_08816a`, `growthratesa_value_08616a`), while managing financial obligations (`value_08251q`) and external shocks (`cfsourceusea_value_04840a`). The market's assessment of this story is reflected in valuation changes (`value_08579`).

**Key Relationships Identified**:
1. **Scale vs Efficiency**: `value_02300q` (Assets) provides the denominator for `value_04001q` (Income) and `value_03501a` (Equity), giving ROA and the equity-to-assets ratio, both independent of size; dividing ROA by the equity-to-assets ratio gives ROE
2. **Growth vs Safety**: `growthratesa_value_08616a` (Equity Growth) and `value_08251q` (Coverage) interact to determine whether growth is fueled by retained earnings (sustainable) or debt (risky)
3. **Accounting vs Cash**: `value_04001q` (Net Income starting line) and `cfsourceusea_value_04840a` (FX Effect) capture different kinds of cash flow information: operational versus non-operational
4. **Short-term vs Long-term**: `value_08316q` (Quarterly Margin) vs `statisticsa_value_05260a` (5Yr EPS Avg) compares current performance against a historical baseline

**Missing Pieces That Would Complete the Picture**:
- Industry classification codes to enable sector-relative comparisons (e.g., tech vs utility coverage ratios differ)
- Price data to combine fundamentals with valuation multiples (P/E, P/B)
- Insider ownership data to assess alignment between management and shareholders regarding equity growth decisions

---

## Feature Concepts by Question Type

### Q1: "What is stable?" (Invariance Features)

**Concept**: Coverage Stability Score
- **Sample Fields Used**: `value_08251q`
- **Definition**: Standard deviation of fixed charge coverage ratio over 20 days to identify companies with predictable debt servicing capacity
- **Why This Feature**: Stable coverage indicates predictable cash generation and disciplined capital structure management, reducing refinancing risk
- **Logical Meaning**: Measures the volatility of the safety margin for fixed obligations; low volatility suggests business model stability
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. For coverage ratios, NaN often indicates missing data rather than meaningful absence, so `ts_backfill` may be appropriate for short gaps.
- **Directionality**: Lower values indicate more stable coverage (positive for credit quality)
- **Boundary Conditions**: Values near 0 indicate constant coverage; extremely high values indicate earnings volatility or near-zero denominators
- **Implementation Example**: `ts_std_dev({value_08251q}, 20)`
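
As a rough illustration of what a trailing standard deviation such as `ts_std_dev({value_08251q}, 20)` computes, here is a minimal pure-Python sketch. The helper `rolling_std` is a hypothetical stand-in, not the platform operator: it assumes a population standard deviation over the last 20 daily values and returns NaN until a full, NaN-free window is available. The sample series assumes the quarterly ratio is held flat between reports, which the report does not confirm.

```python
import math
from statistics import pstdev

def rolling_std(values, window):
    """Trailing population std dev; NaN until a full, NaN-free window exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(math.isnan(v) for v in chunk):
            out.append(math.nan)
        else:
            out.append(pstdev(chunk))
    return out

# Hypothetical daily series: a coverage ratio held at 3.0, then updated to 4.0.
coverage = [3.0] * 25 + [4.0] * 25
stability = rolling_std(coverage, 20)
```

Under that assumed layout the score is zero on most days and becomes positive only while an update sits inside the window, so a window spanning several reporting periods may be more informative than 20 days.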

**Concept**: Asset Growth Consistency
- **Sample Fields Used**: `value_02300q`
- **Definition**: Standard deviation of year-over-year asset changes measured over 63 days (quarterly window)
- **Why This Feature**: Distinguishes between steady organic expansion and lumpy acquisition-driven growth or asset sales
- **Logical Meaning**: Captures the volatility of the company's investment policy; consistent growth suggests predictable capital allocation
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Asset values are typically reported quarterly; interpolation between quarters may introduce false stability.
- **Directionality**: Lower values indicate more stable asset base evolution (typically positive for forecasting)
- **Boundary Conditions**: Zero indicates no asset changes; spikes indicate M&A activity or write-downs
- **Implementation Example**: `ts_std_dev(ts_delta({value_02300q}, 252), 63)`

---

### Q2: "What is changing?" (Dynamics Features)

**Concept**: Earnings Growth Acceleration
- **Sample Fields Used**: `growthratesa_value_08816a`
- **Definition**: Change in annual EPS growth rate over a 63-day window to capture inflection points in momentum
- **Why This Feature**: Markets price changes in growth rates, not just growth levels; acceleration signals improving business trends
- **Logical Meaning**: Second derivative of earnings; positive values indicate growth is speeding up (positive momentum)
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Annual growth rates update infrequently; filling NaNs carries stale values forward and can make old information look current.
- **Directionality**: Positive values indicate accelerating growth (bullish); negative values indicate deceleration
- **Boundary Conditions**: Extreme values occur near earnings turning points (negative to positive growth)
- **Implementation Example**: `ts_delta({growthratesa_value_08816a}, 63)`

**Concept**: Operating Margin Momentum
- **Sample Fields Used**: `value_08316q`
- **Definition**: Recent change in operating margin normalized by the 1-year average margin level
- **Why This Feature**: Identifies operational inflections (expansion/contraction) relative to the company's historical norm
- **Logical Meaning**: Normalized velocity of profitability changes; indicates pricing power or cost control shifts
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Quarterly reporting gaps should not be filled, to avoid assuming constant margins.
- **Directionality**: Positive values indicate margin expansion (operational improvement)
- **Boundary Conditions**: Values near zero indicate stable margins; spikes indicate one-time items or structural changes
- **Implementation Example**: `divide(ts_delta({value_08316q}, 63), ts_mean({value_08316q}, 252))`
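
The normalized-change expression above can be sketched in plain Python as follows. The functions `ts_delta`, `ts_mean`, and `margin_momentum` are hypothetical re-implementations written for illustration, not the platform operators, and the guard against a near-zero average (`eps`) is an added assumption. The example uses short windows (2 and 4 days) in place of 63 and 252 to keep the data small.

```python
import math

def ts_delta(values, d):
    """Value today minus value d days ago; NaN while history is too short."""
    return [math.nan if i < d else values[i] - values[i - d]
            for i in range(len(values))]

def ts_mean(values, d):
    """Trailing mean over d days; NaN until d observations exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < d:
            out.append(math.nan)
        else:
            out.append(sum(values[i - d + 1: i + 1]) / d)
    return out

def margin_momentum(margin, delta_days, mean_days, eps=1e-9):
    """Recent margin change divided by the trailing average margin."""
    delta = ts_delta(margin, delta_days)
    mean = ts_mean(margin, mean_days)
    out = []
    for change, level in zip(delta, mean):
        if math.isnan(change) or math.isnan(level) or abs(level) < eps:
            out.append(math.nan)  # undefined when the average margin is ~0
        else:
            out.append(change / level)
    return out

# Hypothetical margin series (in percent) with one step up.
margin = [10.0, 10.0, 10.0, 10.0, 12.0, 12.0]
momentum = margin_momentum(margin, delta_days=2, mean_days=4)
```

One design point worth noting: when the trailing average margin is negative, the ratio flips sign, so a margin improvement at a loss-making company shows up as a negative value unless the denominator is replaced by its absolute value.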

---

### Q3: "What is anomalous?" (Deviation Features)

**Concept**: EBIT Z-Score Deviation
- **Sample Fields Used**: `value_18191q`
- **Definition**: Standardized deviation of current EBIT from its 1-year historical mean
- **Why This Feature**: Identifies earnings surprises or shocks that deviate significantly from the company's normal operating range
- **Logical Meaning**: Statistical measure of earnings unusualness; extreme values suggest non-recurring items or inflection points
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. NaN handling should preserve the distinction between missing data and zero earnings.
- **Directionality**: High absolute values indicate anomalies (potential mean reversion candidates)
- **Boundary Conditions**: Values beyond 2-3 standard deviations indicate significant outliers
- **Implementation Example**: `divide(subtract({value_18191q}, ts_mean({value_18191q}, 252)), ts_std_dev({value_18191q}, 252))`
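
A small Python sketch of the trailing z-score used above, under the assumption that the mean and standard deviation are taken over a window that includes the current value and that the population formula is used. The function `ts_zscore` is a hypothetical helper for illustration, and a 5-day window replaces the 252-day one.

```python
import math
from statistics import mean, pstdev

def ts_zscore(values, window):
    """(current - trailing mean) / trailing population std; NaN if undefined."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
            continue
        chunk = values[i - window + 1: i + 1]
        sd = pstdev(chunk)
        # A flat window has zero dispersion, so the z-score is undefined.
        out.append(math.nan if sd == 0 else (values[i] - mean(chunk)) / sd)
    return out

# Hypothetical EBIT series: flat at 100, then a jump to 140.
ebit = [100.0, 100.0, 100.0, 100.0, 140.0]
z = ts_zscore(ebit, window=5)
flat = ts_zscore([5.0] * 5, window=5)
```

Because the window contains the current observation, the score is bounded: with a population standard deviation over n points, a single value cannot lie more than sqrt(n - 1) standard deviations from the mean, which is why the jump in this 5-point example scores exactly 2.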

**Concept**: FX Impact Anomaly
- **Sample Fields Used**: `cfsourceusea_value_04840a`
- **Definition**: Magnitude of current FX effect relative to historical average absolute impact
- **Why This Feature**: Flags unusual currency translation effects that may distort underlying operational performance
- **Logical Meaning**: Identifies when currency headwinds/tailwinds are unusually severe compared to the company's historical FX exposure
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. FX effects are often zero for domestic companies; the NaN vs zero distinction matters for identifying international exposure.
- **Directionality**: High values indicate unusual FX impact (may require operational adjustment)
- **Boundary Conditions**: Values near 1 indicate normal FX impact; high values indicate currency crises or extreme rate movements
- **Implementation Example**: `divide(abs({cfsourceusea_value_04840a}), ts_mean(abs({cfsourceusea_value_04840a}), 252))`

---

### Q4: "What is combined?" (Interaction Features)

**Concept**: Sustainable Growth Quality
- **Sample Fields Used**: `growthratesa_value_08616a`, `value_08251q`
- **Definition**: Product of equity growth rate and fixed charge coverage ratio
- **Why This Feature**: High growth with low coverage suggests leveraged, risky expansion; high coverage supports sustainable growth
- **Logical Meaning**: Quality-adjusted growth metric; scales growth magnitude by financial stability
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Different frequencies (annual growth vs quarterly coverage) require alignment; do not fill across frequency mismatches.
- **Directionality**: Higher values indicate high growth with strong coverage (optimal); negative values arise when exactly one input is negative, e.g. growth during coverage distress (risky). Shrinking equity combined with negative coverage also yields a positive product, so the signs of the inputs should be checked separately
- **Boundary Conditions**: Very high coverage (for example, from near-zero fixed charges) combined with high growth creates extreme values; winsorization recommended
- **Implementation Example**: `multiply({growthratesa_value_08616a}, {value_08251q})`
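
The winsorization recommended above can be illustrated with a short sketch. The `winsorize` helper is hypothetical and uses a simple nearest-rank percentile rule on one cross-section (one value per company on a given day); the percentile cut-offs and the sample numbers are made up for the example.

```python
def winsorize(values, lower_pct, upper_pct):
    """Clip values to empirical percentiles chosen by a nearest-rank rule."""
    ranked = sorted(values)
    n = len(ranked)
    lo = ranked[int(lower_pct * (n - 1))]
    hi = ranked[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

# Hypothetical cross-section of five companies.
equity_growth = [0.05, 0.10, 0.08, 0.50, -0.02]
coverage = [4.0, 3.0, 5.0, 40.0, 2.0]

# Raw interaction feature: growth times coverage.
raw = [g * c for g, c in zip(equity_growth, coverage)]

# Clip only the upper tail at the 75th percentile for this tiny sample.
clipped = winsorize(raw, 0.0, 0.75)
```

Here the fourth company's product (0.50 times 40.0) dominates the raw feature, and clipping pulls it down to the next-highest value while leaving the ordering of the other companies unchanged.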

**Concept**: Cash-to-Assets Efficiency
- **Sample Fields Used**: `value_04001q`, `value_02300q`
- **Definition**: Ratio of net income starting line to total assets (ROA proxy using cash flow statement starting point)
- **Why This Feature**: Measures asset efficiency using net income as it appears at the top of the cash flow statement, before that statement's non-cash adjustments; the figure itself is still accrual-based
- **Logical Meaning**: Return on assets; how effectively the company converts its asset base into earnings
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Assets are a quarterly stock measure while income is a flow; ensure both are available for the same period.
- **Directionality**: Higher values indicate more efficient asset utilization (positive for returns)
- **Boundary Conditions**: Capital-intensive industries naturally have lower values; financials have different asset definitions
- **Implementation Example**: `divide({value_04001q}, {value_02300q})`

---

### Q5: "What is structural?" (Composition Features)

**Concept**: Equity Capital Structure Ratio
- **Sample Fields Used**: `value_03501a`, `value_02300q`
- **Definition**: Common equity as a proportion of total assets (Equity/Assets ratio)
- **Why This Feature**: Measures financial leverage and capital structure conservatism; higher equity indicates lower leverage risk
- **Logical Meaning**: Ownership cushion against asset value declines; inverse of the equity multiplier (Assets/Equity)
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Annual equity vs quarterly assets creates a frequency mismatch; do not interpolate annual data to quarterly.
- **Directionality**: Higher values indicate less leveraged, more conservative capital structure (typically lower risk)
- **Boundary Conditions**: Values near 1 indicate almost no liabilities; values near 0 indicate high leverage; negative values indicate negative equity
- **Implementation Example**: `divide({value_03501a}, {value_02300q})`
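
One way to handle the annual/quarterly mismatch without interpolating is an "as-of" lookup: for each quarterly asset observation, use the most recent annual equity value already known on that date, and leave the ratio undefined if none exists yet. The sketch below is an assumption about how such alignment could be done, not the report's method; `as_of`, the dates, and the values are all hypothetical.

```python
from bisect import bisect_right

def as_of(dates, values, query_date):
    """Latest value whose date is <= query_date, or None if none is known yet."""
    idx = bisect_right(dates, query_date)
    return values[idx - 1] if idx else None

# Hypothetical availability dates (ISO strings sort chronologically).
equity_dates = ["2022-03-01", "2023-03-01"]
equity_vals = [400.0, 450.0]
asset_obs = [("2022-02-15", 1000.0), ("2022-08-15", 1100.0), ("2023-05-15", 1200.0)]

ratios = []
for date, assets in asset_obs:
    equity = as_of(equity_dates, equity_vals, date)
    # No equity figure published yet, or zero assets: leave the ratio undefined.
    ratios.append(None if equity is None or assets == 0 else equity / assets)
```

The lookup only ever reaches backward in time, so the first observation stays undefined instead of borrowing an equity figure that was published later.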

**Concept**: Short-Term Liquidity Exposure
- **Sample Fields Used**: `value_03051q`, `value_03999q`
- **Definition**: Short-term debt as a proportion of total liabilities and shareholders' equity
- **Why This Feature**: Captures refinancing risk and liquidity pressure; high values indicate near-term obligations
- **Logical Meaning**: Share of the balance sheet funded by short-term debt; indicates reliance on short-term funding vs long-term capital
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN may itself carry meaning, and filling it blindly can introduce bias. Before filling, consider whether the NaN is meaningful in this scenario. Zero short-term debt is meaningful (long-term only financing); distinguish it from missing data.
- **Directionality**: Higher values indicate greater near-term refinancing risk (negative for stability)
- **Boundary Conditions**: Values approaching 1 indicate that short-term debt funds nearly the entire balance sheet; zero indicates no current maturities
- **Implementation Example**: `divide({value_03051q}, {value_03999q})`
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
### Q6: "What is cumulative?" (Accumulation Features)
|
|
245
|
+
|
|
246
|
+
**Concept**: Annual Earnings Accumulation
|
|
247
|
+
- **Sample Fields Used**: `value_04001q`
|
|
248
|
+
- **Definition**: Rolling 252-day (1-year) sum of net income starting line
|
|
249
|
+
- **Why This Feature**: Captures cumulative earnings power over a fiscal period, smoothing quarterly volatility
|
|
250
|
+
- **Logical Meaning**: Trailing twelve-month earnings proxy using cash flow statement starting point; measures sustained profitability
|
|
251
|
+
- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Summing over time requires handling missing quarters; gaps should not be filled to avoid overstating cumulative earnings.
|
|
252
|
+
- **Directionality**: Higher values indicate stronger cumulative earnings performance (positive)
|
|
253
|
+
- **Boundary Conditions**: Negative values indicate cumulative losses; sharp changes indicate earnings inflections
|
|
254
|
+
- **Implementation Example**: `ts_sum({value_04001q}, 252)`
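
A minimal sketch of what the rolling sum computes, written in plain Python rather than the platform's expression language. It assumes, without confirmation from the dataset documentation, that the quarterly field is carried forward to every trading day between reports. Under that assumption a 252-day sum counts each quarter roughly 63 times, so the result is proportional to trailing-twelve-month earnings rather than equal to it. `rolling_sum` is a hypothetical stand-in for `ts_sum`, not its actual implementation.

```python
def rolling_sum(values, window):
    """Trailing sum over `window` observations; None until the window is full."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running if i >= window - 1 else None)
    return out


# Assumption: four quarterly net-income values of 10, each repeated on 63 trading days.
DAYS_PER_QUARTER = 63
daily = [10.0] * (4 * DAYS_PER_QUARTER)

rolled = rolling_sum(daily, 252)
ttm = 4 * 10.0  # sum of the four distinct quarterly values

print(rolled[-1], ttm, rolled[-1] / ttm)
```

If the carry-forward assumption holds, the scale factor is harmless for cross-sectional ranking but matters whenever the sum is compared with an unscaled quantity.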
|
|
255
|
+
|
|
256
|
+
**Concept**: Cumulative FX Drag
|
|
257
|
+
- **Sample Fields Used**: `cfsourceusea_value_04840a`
|
|
258
|
+
- **Definition**: Rolling 63-day (quarterly) sum of FX effects on cash
|
|
259
|
+
- **Why This Feature**: Distinguishes persistent currency headwinds from one-time translation adjustments
|
|
260
|
+
- **Logical Meaning**: Sustained currency impact over a reporting period; indicates structural FX exposure
|
|
261
|
+
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN that carries meaning of its own should not be filled blindly, because doing so may introduce bias. Decide first whether the NaN is informative in this scenario. Here, a cumulative sum of zero suggests natural hedging; filling NaNs with zero would make missing data indistinguishable from that case.
|
|
262
|
+
- **Directionality**: Negative values indicate cumulative FX headwinds (reducing cash); positive indicates tailwinds
|
|
263
|
+
- **Boundary Conditions**: Large negative sums indicate sustained currency depreciation impact on foreign operations
|
|
264
|
+
- **Implementation Example**: `ts_sum({cfsourceusea_value_04840a}, 63)`
|
|
265
|
+
|
|
266
|
+
---
|
|
267
|
+
|
|
268
|
+
### Q7: "What is relative?" (Comparison Features)
|
|
269
|
+
|
|
270
|
+
**Concept**: ROE Cross-Sectional Percentile
|
|
271
|
+
- **Sample Fields Used**: `value_08301q`
|
|
272
|
+
- **Definition**: Gaussian-quantile rank of Return on Equity within the cross-sectional universe
|
|
273
|
+
- **Why This Feature**: Relative profitability positioning independent of market-wide ROE shifts; identifies top-tier operators
|
|
274
|
+
- **Logical Meaning**: Standardized position within the profit distribution; robust to inflation/period effects that raise all boats
|
|
275
|
+
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN that carries meaning of its own should not be filled blindly, because doing so may introduce bias. Decide first whether the NaN is informative in this scenario. Here, the quantile is computed across the cross-section; NaN values should be excluded from the ranking rather than filled.
|
|
276
|
+
- **Directionality**: Higher values indicate stronger profitability relative to peers (positive for selection)
|
|
277
|
+
- **Boundary Conditions**: Gaussian transformation caps extreme tails; values beyond +/- 2 sigma are rare
|
|
278
|
+
- **Implementation Example**: `quantile({value_08301q}, driver="gaussian")`
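
The rank-to-Gaussian idea can be sketched in plain Python as below. This is an illustration under assumptions, not the platform's `quantile` operator: the `(rank + 0.5) / n` shift, the tie handling (ties broken by position), and the function name `gaussian_quantile` are all choices made for this sketch. It does show the NaN policy described above, where missing values are left out of the ranking and stay NaN.

```python
import math
from statistics import NormalDist


def gaussian_quantile(values):
    """Map each non-NaN value to a standard-normal score based on its cross-sectional rank."""
    valid = [(v, i) for i, v in enumerate(values) if not math.isnan(v)]
    n = len(valid)
    out = [math.nan] * len(values)
    # Rank ascending; (rank + 0.5) / n keeps probabilities strictly inside (0, 1).
    for rank, (_, i) in enumerate(sorted(valid)):
        out[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return out


# Hypothetical ROE values for four companies, one of them missing.
roe = [0.05, math.nan, 0.20, 0.12]
scores = gaussian_quantile(roe)
print(scores)
```

Because only ranks enter the transform, a market-wide shift in ROE that preserves the ordering leaves every score unchanged.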
|
|
279
|
+
|
|
280
|
+
**Concept**: Coverage Neutralized for Size
|
|
281
|
+
- **Sample Fields Used**: `value_08251q`, `value_02300q`
|
|
282
|
+
- **Definition**: Residual of fixed charge coverage after regressing on total assets (size)
|
|
283
|
+
- **Why This Feature**: Distinguishes coverage due to operational efficiency from coverage due to scale economies or diversification
|
|
284
|
+
- **Logical Meaning**: Coverage ratio independent of company size; identifies efficiently managed small caps vs bloated large caps
|
|
285
|
+
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN that carries meaning of its own should not be filled blindly, because doing so may introduce bias. Decide first whether the NaN is informative in this scenario. Here, the regression needs paired observations; if either variable is missing, the residual should be NaN.
|
|
286
|
+
- **Directionality**: Positive residuals indicate better coverage than size predicts (operational alpha)
|
|
287
|
+
- **Boundary Conditions**: Extreme residuals indicate outliers in coverage-to-size relationship (niche business models)
|
|
288
|
+
- **Implementation Example**: `regression_neut({value_08251q}, {value_02300q})`
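
As a rough illustration of size neutralization, the sketch below fits a one-factor ordinary least squares regression across a single cross-section and returns the residuals. It assumes an intercept term and a single date; whether `regression_neut` does the same is not stated in this report, so `regression_residuals` should be read as a hypothetical stand-in. It does not guard against fewer than two paired observations or a constant regressor.

```python
import math


def regression_residuals(y, x):
    """Residuals of a one-factor OLS fit (with intercept) of y on x across one cross-section."""
    pairs = [(yi, xi) for yi, xi in zip(y, x) if not (math.isnan(yi) or math.isnan(xi))]
    n = len(pairs)
    mean_y = sum(p[0] for p in pairs) / n
    mean_x = sum(p[1] for p in pairs) / n
    var_x = sum((xi - mean_x) ** 2 for _, xi in pairs)
    cov_xy = sum((yi - mean_y) * (xi - mean_x) for yi, xi in pairs)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    # A missing value in either input yields a NaN residual rather than a filled one.
    return [
        math.nan if math.isnan(yi) or math.isnan(xi) else yi - (alpha + beta * xi)
        for yi, xi in zip(y, x)
    ]


# Hypothetical coverage ratios and total assets for five companies.
coverage = [3.0, 5.0, 9.0, math.nan, 8.0]
assets = [1.0, 2.0, 3.0, 4.0, math.nan]
residuals = regression_residuals(coverage, assets)
print(residuals)
```

The residuals of the paired observations sum to zero by construction, which is what makes the result comparable across size buckets.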
|
|
289
|
+
|
|
290
|
+
---
|
|
291
|
+
|
|
292
|
+
### Q8: "What is essential?" (Essence Features)
|
|
293
|
+
|
|
294
|
+
**Concept**: Operating Margin Persistence
|
|
295
|
+
- **Sample Fields Used**: `value_08316q`
|
|
296
|
+
- **Definition**: Correlation between current operating margin and margin 252 days (1 year) prior, measured over 504 days (2 years)
|
|
297
|
+
- **Why This Feature**: Measures the durability of competitive advantages; persistent margins indicate moats, volatile margins indicate commodity exposure
|
|
298
|
+
- **Logical Meaning**: Autocorrelation of profitability; high values suggest structural industry position, low values suggest cyclical or competitive pressure
|
|
299
|
+
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN that carries meaning of its own should not be filled blindly, because doing so may introduce bias. Decide first whether the NaN is informative in this scenario. Here, the correlation needs aligned time series; filling gaps creates spurious persistence.
|
|
300
|
+
- **Directionality**: Higher values indicate persistent margins (quality); low values indicate unstable margins (risk)
|
|
301
|
+
- **Boundary Conditions**: Values near 1 indicate highly predictable margins; values near 0 indicate margins with no linear relationship to their year-ago level; negative values indicate mean-reverting margins
|
|
302
|
+
- **Implementation Example**: `ts_corr({value_08316q}, ts_delay({value_08316q}, 252), 504)`
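
The lagged correlation can be sketched in plain Python as follows. The toy series uses a lag of 4 and a window of 8 in place of 252 and 504, and the helper names `pearson` and `lagged_autocorr` are hypothetical; the sketch only illustrates the arithmetic, not how `ts_corr` or `ts_delay` are implemented on the platform.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists; NaN if either has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def lagged_autocorr(series, lag, window):
    """Correlation between the last `window` values and the same values `lag` steps earlier."""
    if len(series) < window + lag:
        return math.nan
    current = series[-window:]
    lagged = series[-window - lag:-lag]
    return pearson(current, lagged)


# Toy margins with a 4-step repeating pattern, standing in for lag=252 and window=504.
margins = [0.10, 0.12, 0.15, 0.11] * 3
persistence = lagged_autocorr(margins, lag=4, window=8)
print(persistence)
```

A series that repeats exactly at the chosen lag scores close to 1, and a series too short to cover both the window and the lag returns NaN instead of a partial estimate.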
|
|
303
|
+
|
|
304
|
+
**Concept**: FX-Adjusted Cash Generation
|
|
305
|
+
- **Sample Fields Used**: `value_04001q`, `cfsourceusea_value_04840a`
|
|
306
|
+
- **Definition**: Net income (the starting line of the cash flow statement) minus FX translation effects, to isolate operational cash generation
|
|
307
|
+
- **Why This Feature**: Removes non-operational currency noise to reveal underlying business performance
|
|
308
|
+
- **Logical Meaning**: Core operational cash flow before translational accounting adjustments; pure operational signal
|
|
309
|
+
- **is filling nan necessary**: Operators such as `ts_backfill()` or `group_mean()` can fill NaN values, but a NaN that carries meaning of its own should not be filled blindly, because doing so may introduce bias. Decide first whether the NaN is informative in this scenario. Here, a NaN FX effect (for example, for a domestic company) should be treated as a zero adjustment (no effect), not filled from other companies.
|
|
310
|
+
- **Directionality**: Higher values indicate stronger core operational generation independent of currency effects
|
|
311
|
+
- **Boundary Conditions**: Large differences between adjusted and unadjusted indicate high FX volatility or international exposure
|
|
312
|
+
- **Implementation Example**: `subtract({value_04001q}, {cfsourceusea_value_04840a})`
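
The NaN policy stated for this concept (a missing FX effect means no adjustment) is not expressed by a bare subtraction if the platform's `subtract` propagates NaN, which is an assumption here. A minimal Python sketch of the intended behaviour, with the hypothetical helper `fx_adjusted`:

```python
import math


def fx_adjusted(net_income, fx_effect):
    """Subtract the FX effect, treating a missing FX effect as zero; missing income stays NaN."""
    if math.isnan(net_income):
        return math.nan
    adjustment = 0.0 if math.isnan(fx_effect) else fx_effect
    return net_income - adjustment


print(fx_adjusted(100.0, -5.0))      # FX headwind removed from the figure
print(fx_adjusted(100.0, math.nan))  # domestic company: no adjustment applied
```

Only the FX term is defaulted to zero; a missing net income figure still produces NaN, so the distinction between missing earnings and zero earnings is preserved.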
|
|
313
|
+
|
|
314
|
+
---
|
|
315
|
+
|
|
316
|
+
## Implementation Considerations
|
|
317
|
+
|
|
318
|
+
### Data Quality Notes
|
|
319
|
+
- **Coverage**: Annual fields (suffix 'a') have lower update frequency; mixing with quarterly fields requires careful temporal alignment to avoid stale data
|
|
320
|
+
- **Timeliness**: Delay=1 ensures no look-ahead bias, but some annual metrics may not update for 90+ days after fiscal year end
|
|
321
|
+
- **Accuracy**: Growth rates (`value_08579`, `growthratesa_value_08816a`) can produce extreme outliers when base values approach zero; winsorization at 4 sigma recommended
|
|
322
|
+
- **Potential Biases**: Survivorship bias in 5-year averages (`statisticsa_value_05260a`); companies with volatile earnings histories may have incomplete long-term records
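
The 4-sigma winsorization suggested for growth rates can be sketched as below. This is a plain-Python illustration with a hypothetical `winsorize` helper that clips to the mean plus or minus four population standard deviations; it is not a description of any platform operator.

```python
import math


def winsorize(values, n_sigma=4.0):
    """Clip values to mean +/- n_sigma population standard deviations."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    lower, upper = mean - n_sigma * std, mean + n_sigma * std
    return [min(max(v, lower), upper) for v in values]


# 29 ordinary growth rates and one blow-up caused by a near-zero base value.
growth = [0.0] * 29 + [100.0]
clipped = winsorize(growth)
print(clipped[-1])
```

With the population standard deviation, no point in a sample of n values can lie more than sqrt(n - 1) deviations from the mean, so a 4-sigma clip never changes anything in a sample of fewer than 18 observations.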
|
|
323
|
+
|
|
324
|
+
### Computational Complexity
|
|
325
|
+
- **Lightweight features**: `divide({value_03501a}, {value_02300q})`, `subtract({value_04001q}, {cfsourceusea_value_04840a})` - single operations
|
|
326
|
+
- **Medium complexity**: `ts_std_dev({value_08251q}, 20)`, `ts_sum({value_04001q}, 252)` - time series windows
|
|
327
|
+
- **Heavy computation**: `ts_corr({value_08316q}, ts_delay({value_08316q}, 252), 504)` - dual time series with lag and correlation; `quantile({value_08301q}, driver="gaussian")` - cross-sectional ranking
|
|
328
|
+
|
|
329
|
+
### Recommended Prioritization
|
|
330
|
+
|
|
331
|
+
**Tier 1 (Immediate Implementation)**:
|
|
332
|
+
1. **Sustainable Growth Quality** - Combines momentum with quality, directly addresses leverage risk in growth stories
|
|
333
|
+
2. **EBIT Z-Score Deviation** - Captures earnings anomalies with clear mean-reversion interpretation
|
|
334
|
+
3. **Cash-to-Assets Efficiency** - Fundamental efficiency metric with strong theoretical basis
|
|
335
|
+
|
|
336
|
+
**Tier 2 (Secondary Priority)**:
|
|
337
|
+
1. **Operating Margin Persistence** - Quality factor with academic support for moat identification
|
|
338
|
+
2. **Coverage Neutralized for Size** - Removes size bias from credit metrics for cross-cap comparisons
|
|
339
|
+
3. **Equity Capital Structure Ratio** - Classic leverage measure with risk management applications
|
|
340
|
+
|
|
341
|
+
**Tier 3 (Requires Further Validation)**:
|
|
342
|
+
1. **FX-Adjusted Cash Generation** - Requires validation that FX effects are indeed noise rather than signal for international companies
|
|
343
|
+
2. **Cumulative FX Drag** - Sign convention must be verified (positive/negative directionality) before use in production
|
|
344
|
+
|
|
345
|
+
---
|
|
346
|
+
|
|
347
|
+
## Critical Questions for Further Exploration
|
|
348
|
+
|
|
349
|
+
### Unanswered Questions:
|
|
350
|
+
1. How do the quarterly vs annual frequency mismatches affect correlation structures between `growthratesa_value_08816a` (annual) and `value_08316q` (quarterly)?
|
|
351
|
+
2. Does `cfsourceusea_value_04840a` capture transactional FX exposure or only translational consolidation effects?
|
|
352
|
+
3. How does the dataset treat extraordinary items in `value_04001q` vs `value_18191q`?
|
|
353
|
+
|
|
354
|
+
### Recommended Additional Data:
|
|
355
|
+
- Industry sector classifications to enable `group_mean` neutralizations within sectors
|
|
356
|
+
- Daily price data to construct valuation multiples (P/E, EV/EBIT) for convergence analysis
|
|
357
|
+
- Short interest data to combine with `value_08579` (Market Cap Growth) for squeeze potential identification
|
|
358
|
+
|
|
359
|
+
### Assumptions to Challenge:
|
|
360
|
+
- **Stable is always better**: Is low volatility in `value_08251q` always positive, or does it indicate complacency in low-growth industries?
|
|
361
|
+
- **Growth is good**: Does `growthratesa_value_08616a` account for acquisition quality, or does it reward dilutive M&A?
|
|
362
|
+
- **FX is noise**: For pure exporters, is `cfsourceusea_value_04840a` truly non-operational, or does it reflect competitive positioning?
|
|
363
|
+
|
|
364
|
+
---
|
|
365
|
+
|
|
366
|
+
## Methodology Notes
|
|
367
|
+
|
|
368
|
+
**Analysis Approach**: This report was generated by:
|
|
369
|
+
1. Deep field deconstruction to understand data essence (accounting relationships, frequency differences, business logic)
|
|
370
|
+
2. Question-driven feature generation (8 fundamental questions applied to financial statement logic)
|
|
371
|
+
3. Logical validation of each feature concept against financial theory and data constraints
|
|
372
|
+
4. Transparent documentation of reasoning and implementation templates
|
|
373
|
+
|
|
374
|
+
**Design Principles**:
|
|
375
|
+
- Focus on logical meaning over conventional patterns (e.g., combining growth with coverage rather than just using P/E)
|
|
376
|
+
- Every feature must answer a specific question about the underlying economic reality
|
|
377
|
+
- Clear documentation of "why" for each suggestion to enable validation
|
|
378
|
+
- Emphasis on data understanding (quarterly vs annual, operational vs non-operational) over prediction
|
|
379
|
+
|
|
380
|
+
---
|
|
381
|
+
|
|
382
|
+
*Report generated: 2024-01-15*
|
|
383
|
+
*Analysis depth: Comprehensive field deconstruction + 8-question framework*
|
|
384
|
+
*Next steps: Implement Tier 1 features, validate FX sign conventions, gather sector data for relative features*
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
[
|
|
2
|
+
"divide(abs(fnd28_cfsourceusea_value_04840a), ts_mean(abs(fnd28_cfsourceusea_value_04840a), 252))",
|
|
3
|
+
"divide(subtract(fnd28_ishtq_value_18191q, ts_mean(fnd28_ishtq_value_18191q, 252)), ts_std_dev(fnd28_ishtq_value_18191q, 252))",
|
|
4
|
+
"divide(subtract(fnd28_newq_value_18191q, ts_mean(fnd28_newq_value_18191q, 252)), ts_std_dev(fnd28_newq_value_18191q, 252))",
|
|
5
|
+
"divide(ts_delta(fnd28_newq_value_08316q, 63), ts_mean(fnd28_newq_value_08316q, 252))",
|
|
6
|
+
"divide(ts_delta(fnd28_ratesq_value_08316q, 63), ts_mean(fnd28_ratesq_value_08316q, 252))",
|
|
7
|
+
"divide(fnd28_bdeq_value_03051q, fnd28_bdeq_value_03999q)",
|
|
8
|
+
"divide(fnd28_bdeq_value_03051q, fnd28_newq_value_03999q)",
|
|
9
|
+
"divide(fnd28_fsq1_value_03051q, fnd28_bdeq_value_03999q)",
|
|
10
|
+
"divide(fnd28_fsq1_value_03051q, fnd28_newq_value_03999q)",
|
|
11
|
+
"divide(fnd28_nddq1_value_03051q, fnd28_newq_value_03999q)",
|
|
12
|
+
"divide(fnd28_nddq1_value_03051q, fnd28_bdeq_value_03999q)",
|
|
13
|
+
"divide(fnd28_bdea_value_03501a, fnd28_bsassetq_value_02300q)",
|
|
14
|
+
"divide(fnd28_bdea_value_03501a, fnd28_nddq1_value_02300q)",
|
|
15
|
+
"divide(fnd28_fsa1_value_03501a, fnd28_bsassetq_value_02300q)",
|
|
16
|
+
"divide(fnd28_fsa1_value_03501a, fnd28_nddq1_value_02300q)",
|
|
17
|
+
"divide(fnd28_cfq_value_04001q, fnd28_bsassetq_value_02300q)",
|
|
18
|
+
"divide(fnd28_cfq_value_04001q, fnd28_nddq1_value_02300q)",
|
|
19
|
+
"divide(fnd28_nddq1_value_04001q, fnd28_nddq1_value_02300q)",
|
|
20
|
+
"divide(fnd28_nddq1_value_04001q, fnd28_bsassetq_value_02300q)",
|
|
21
|
+
"multiply(fnd28_growthratesa_value_08616a, fnd28_newq_value_08251q)",
|
|
22
|
+
"multiply(fnd28_growthratesa_value_08616a, fnd28_ratesq_value_08251q)",
|
|
23
|
+
"quantile(fnd28_newq_value_08301q, driver=\"gaussian\")",
|
|
24
|
+
"quantile(fnd28_ratesq_value_08301q, driver=\"gaussian\")",
|
|
25
|
+
"regression_neut(fnd28_newq_value_08251q, fnd28_nddq1_value_02300q)",
|
|
26
|
+
"regression_neut(fnd28_newq_value_08251q, fnd28_bsassetq_value_02300q)",
|
|
27
|
+
"regression_neut(fnd28_ratesq_value_08251q, fnd28_bsassetq_value_02300q)",
|
|
28
|
+
"regression_neut(fnd28_ratesq_value_08251q, fnd28_nddq1_value_02300q)",
|
|
29
|
+
"subtract(fnd28_cfq_value_04001q, fnd28_cfsourceusea_value_04840a)",
|
|
30
|
+
"subtract(fnd28_nddq1_value_04001q, fnd28_cfsourceusea_value_04840a)",
|
|
31
|
+
"ts_corr(fnd28_newq_value_08316q, ts_delay(fnd28_newq_value_08316q, 252), 504)",
|
|
32
|
+
"ts_corr(fnd28_ratesq_value_08316q, ts_delay(fnd28_ratesq_value_08316q, 252), 504)",
|
|
33
|
+
"ts_delta(fnd28_growthratesa_value_08816a, 63)",
|
|
34
|
+
"ts_std_dev(ts_delta(fnd28_bsassetq_value_02300q, 252), 63)",
|
|
35
|
+
"ts_std_dev(ts_delta(fnd28_nddq1_value_02300q, 252), 63)",
|
|
36
|
+
"ts_std_dev(fnd28_newq_value_08251q, 20)",
|
|
37
|
+
"ts_std_dev(fnd28_ratesq_value_08251q, 20)",
|
|
38
|
+
"ts_sum(fnd28_cfsourceusea_value_04840a, 63)",
|
|
39
|
+
"ts_sum(fnd28_cfq_value_04001q, 252)",
|
|
40
|
+
"ts_sum(fnd28_nddq1_value_04001q, 252)"
|
|
41
|
+
]
|
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide(abs({cfsourceusea_value_04840a}), ts_mean(abs({cfsourceusea_value_04840a}), 252))",
|
|
3
|
+
"idea": "**Concept**: FX Impact Anomaly\n- **Sample Fields Used**: `cfsourceusea_value_04840a`\n- **Definition**: Magnitude of current FX effect relative to historical average absolute impact\n- **Why This Feature**: Flags unusual currency translation effects that may distort underlying operational performance\n- **Logical Meaning**: Identifies when currency headwinds/tailwinds are unusually severe compared to the company's historical FX exposure\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. FX effects are often zero for domestic companies; NaN vs zero distinction matters for international exposure identification.\n- **Directionality**: High values indicate unusual FX impact (may require operational adjustment)\n- **Boundary Conditions**: Values near 1 indicate normal FX impact; high values indicate currency crises or extreme rate movements",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(abs(fnd28_cfsourceusea_value_04840a), ts_mean(abs(fnd28_cfsourceusea_value_04840a), 252))"
|
|
6
|
+
]
|
|
7
|
+
}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide(subtract({value_18191q}, ts_mean({value_18191q}, 252)), ts_std_dev({value_18191q}, 252))",
|
|
3
|
+
"idea": "**Concept**: EBIT Z-Score Deviation\n- **Sample Fields Used**: `value_18191q`\n- **Definition**: Standardized deviation of current EBIT from its 1-year historical mean\n- **Why This Feature**: Identifies earnings surprises or shocks that deviate significantly from the company's normal operating range\n- **Logical Meaning**: Statistical measure of earnings unusualness; extreme values suggest non-recurring items or inflection points\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. NaN handling should preserve the distinction between missing data and zero earnings.\n- **Directionality**: High absolute values indicate anomalies (potential mean reversion candidates)\n- **Boundary Conditions**: Values beyond 2-3 standard deviations indicate significant outliers",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(subtract(fnd28_ishtq_value_18191q, ts_mean(fnd28_ishtq_value_18191q, 252)), ts_std_dev(fnd28_ishtq_value_18191q, 252))",
|
|
6
|
+
"divide(subtract(fnd28_newq_value_18191q, ts_mean(fnd28_newq_value_18191q, 252)), ts_std_dev(fnd28_newq_value_18191q, 252))"
|
|
7
|
+
]
|
|
8
|
+
}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide(ts_delta({value_08316q}, 63), ts_mean({value_08316q}, 252))",
|
|
3
|
+
"idea": "**Concept**: Operating Margin Momentum\n- **Sample Fields Used**: `value_08316q`\n- **Definition**: Recent change in operating margin normalized by the 1-year average margin level\n- **Why This Feature**: Identifies operational inflections (expansion/contraction) relative to the company's historical norm\n- **Logical Meaning**: Normalized velocity of profitability changes; indicates pricing power or cost control shifts\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Quarterly reporting gaps should not be filled to avoid assuming constant margins.\n- **Directionality**: Positive values indicate margin expansion (operational improvement)\n- **Boundary Conditions**: Values near zero indicate stable margins; spikes indicate one-time items or structural changes",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(ts_delta(fnd28_newq_value_08316q, 63), ts_mean(fnd28_newq_value_08316q, 252))",
|
|
6
|
+
"divide(ts_delta(fnd28_ratesq_value_08316q, 63), ts_mean(fnd28_ratesq_value_08316q, 252))"
|
|
7
|
+
]
|
|
8
|
+
}
|
|
@@ -0,0 +1,12 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide({value_03051q}, {value_03999q})",
|
|
3
|
+
"idea": "**Concept**: Short-Term Liquidity Exposure\n- **Sample Fields Used**: `value_03051q`, `value_03999q`\n- **Definition**: Short-term debt as a proportion of total liabilities and shareholders' equity\n- **Why This Feature**: Captures refinancing risk and liquidity pressure; high values indicate near-term obligations\n- **Logical Meaning**: Maturity structure of liabilities; indicates reliance on short-term funding vs long-term capital\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Zero short-term debt is meaningful (long-term only financing); distinguish from missing data.\n- **Directionality**: Higher values indicate greater near-term refinancing risk (negative for stability)\n- **Boundary Conditions**: Values approaching 1 indicate all debt is short-term; zero indicates no current maturities",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(fnd28_bdeq_value_03051q, fnd28_bdeq_value_03999q)",
|
|
6
|
+
"divide(fnd28_bdeq_value_03051q, fnd28_newq_value_03999q)",
|
|
7
|
+
"divide(fnd28_fsq1_value_03051q, fnd28_bdeq_value_03999q)",
|
|
8
|
+
"divide(fnd28_fsq1_value_03051q, fnd28_newq_value_03999q)",
|
|
9
|
+
"divide(fnd28_nddq1_value_03051q, fnd28_newq_value_03999q)",
|
|
10
|
+
"divide(fnd28_nddq1_value_03051q, fnd28_bdeq_value_03999q)"
|
|
11
|
+
]
|
|
12
|
+
}
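
The relationship between `template` and `expression_list` in these files looks like a placeholder expansion: each `{placeholder}` is replaced by every dataset-qualified field id that matches it, one expression per combination. The Python sketch below reproduces that pattern for the template above. How the package actually resolves placeholders is not shown in this diff, so `expand_template` and the `candidates` mapping are assumptions made for illustration.

```python
import itertools
import re


def expand_template(template, candidates):
    """Fill each {placeholder} with every candidate field id, one expression per combination."""
    names = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    expressions = []
    for combo in itertools.product(*(candidates[name] for name in names)):
        expr = template
        for name, field in zip(names, combo):
            # A placeholder that appears twice gets the same field in both places.
            expr = expr.replace("{" + name + "}", field)
        expressions.append(expr)
    return expressions


# Candidate ids copied from the expression_list above; the mapping itself is hypothetical.
candidates = {
    "value_03051q": [
        "fnd28_bdeq_value_03051q",
        "fnd28_fsq1_value_03051q",
        "fnd28_nddq1_value_03051q",
    ],
    "value_03999q": ["fnd28_bdeq_value_03999q", "fnd28_newq_value_03999q"],
}
expressions = expand_template("divide({value_03051q}, {value_03999q})", candidates)
print(len(expressions))
```

Three candidates for the numerator and two for the denominator give the same six expressions listed above, although not necessarily in the same order.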
|
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide({value_03501a}, {value_02300q})",
|
|
3
|
+
"idea": "**Concept**: Equity Capital Structure Ratio\n- **Sample Fields Used**: `value_03501a`, `value_02300q`\n- **Definition**: Common equity as a proportion of total assets (Equity/Assets ratio)\n- **Why This Feature**: Measures financial leverage and capital structure conservatism; higher equity indicates lower leverage risk\n- **Logical Meaning**: Ownership cushion against asset value declines; inverse of leverage ratio\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Annual equity vs quarterly assets creates frequency mismatch; do not interpolate annual data to quarterly.\n- **Directionality**: Higher values indicate less leveraged, more conservative capital structure (typically lower risk)\n- **Boundary Conditions**: Values near 1 indicate no debt; near 0 indicate highly leveraged or negative equity situations",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(fnd28_bdea_value_03501a, fnd28_bsassetq_value_02300q)",
|
|
6
|
+
"divide(fnd28_bdea_value_03501a, fnd28_nddq1_value_02300q)",
|
|
7
|
+
"divide(fnd28_fsa1_value_03501a, fnd28_bsassetq_value_02300q)",
|
|
8
|
+
"divide(fnd28_fsa1_value_03501a, fnd28_nddq1_value_02300q)"
|
|
9
|
+
]
|
|
10
|
+
}
|
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "divide({value_04001q}, {value_02300q})",
|
|
3
|
+
"idea": "**Concept**: Cash-to-Assets Efficiency\n- **Sample Fields Used**: `value_04001q`, `value_02300q`\n- **Definition**: Ratio of net income starting line to total assets (ROA proxy using cash flow statement starting point)\n- **Why This Feature**: Measures fundamental asset efficiency independent of accrual accounting adjustments\n- **Logical Meaning**: Asset turnover intensity; how effectively the company converts its asset base into earnings\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Asset values are quarterly; income is flow-based. Ensure both are available for the same period.\n- **Directionality**: Higher values indicate more efficient asset utilization (positive for returns)\n- **Boundary Conditions**: Capital-intensive industries naturally have lower values; financials have different asset definitions",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"divide(fnd28_cfq_value_04001q, fnd28_bsassetq_value_02300q)",
|
|
6
|
+
"divide(fnd28_cfq_value_04001q, fnd28_nddq1_value_02300q)",
|
|
7
|
+
"divide(fnd28_nddq1_value_04001q, fnd28_nddq1_value_02300q)",
|
|
8
|
+
"divide(fnd28_nddq1_value_04001q, fnd28_bsassetq_value_02300q)"
|
|
9
|
+
]
|
|
10
|
+
}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "multiply({growthratesa_value_08616a}, {value_08251q})",
|
|
3
|
+
"idea": "**Concept**: Sustainable Growth Quality\n- **Sample Fields Used**: `growthratesa_value_08616a`, `value_08251q`\n- **Definition**: Product of equity growth rate and fixed charge coverage ratio\n- **Why This Feature**: High growth with low coverage suggests leveraged, risky expansion; high coverage supports sustainable growth\n- **Logical Meaning**: Quality-adjusted growth metric; scales growth magnitude by financial stability\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Different frequencies (annual growth vs quarterly coverage) require alignment; do not fill across frequency mismatches.\n- **Directionality**: Higher values indicate high growth with strong coverage (optimal); negative values indicate growth during coverage distress (risky)\n- **Boundary Conditions**: Near-zero coverage with high growth creates extreme values; winsorization recommended",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"multiply(fnd28_growthratesa_value_08616a, fnd28_newq_value_08251q)",
|
|
6
|
+
"multiply(fnd28_growthratesa_value_08616a, fnd28_ratesq_value_08251q)"
|
|
7
|
+
]
|
|
8
|
+
}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "quantile({value_08301q}, driver=\"gaussian\")",
|
|
3
|
+
"idea": "**Concept**: ROE Cross-Sectional Percentile\n- **Sample Fields Used**: `value_08301q`\n- **Definition**: Gaussian-quantile rank of Return on Equity within the cross-sectional universe\n- **Why This Feature**: Relative profitability positioning independent of market-wide ROE shifts; identifies top-tier operators\n- **Logical Meaning**: Standardized position within the profit distribution; robust to inflation/period effects that raise all boats\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Quantile calculation requires complete cross-section; NaN values should be excluded from ranking, not filled.\n- **Directionality**: Higher values indicate top-quartile profitability relative to peers (positive for selection)\n- **Boundary Conditions**: Gaussian transformation caps extreme tails; values beyond +/- 2 sigma are rare",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"quantile(fnd28_newq_value_08301q, driver=\"gaussian\")",
|
|
6
|
+
"quantile(fnd28_ratesq_value_08301q, driver=\"gaussian\")"
|
|
7
|
+
]
|
|
8
|
+
}
|
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "regression_neut({value_08251q}, {value_02300q})",
|
|
3
|
+
"idea": "**Concept**: Coverage Neutralized for Size\n- **Sample Fields Used**: `value_08251q`, `value_02300q`\n- **Definition**: Residual of fixed charge coverage after regressing on total assets (size)\n- **Why This Feature**: Distinguishes coverage due to operational efficiency from coverage due to scale economies or diversification\n- **Logical Meaning**: Coverage ratio independent of company size; identifies efficiently managed small caps vs bloated large caps\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Regression requires paired observations; missing either variable should result in NaN residual.\n- **Directionality**: Positive residuals indicate better coverage than size predicts (operational alpha)\n- **Boundary Conditions**: Extreme residuals indicate outliers in coverage-to-size relationship (niche business models)",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"regression_neut(fnd28_newq_value_08251q, fnd28_nddq1_value_02300q)",
|
|
6
|
+
"regression_neut(fnd28_newq_value_08251q, fnd28_bsassetq_value_02300q)",
|
|
7
|
+
"regression_neut(fnd28_ratesq_value_08251q, fnd28_bsassetq_value_02300q)",
|
|
8
|
+
"regression_neut(fnd28_ratesq_value_08251q, fnd28_nddq1_value_02300q)"
|
|
9
|
+
]
|
|
10
|
+
}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
{
|
|
2
|
+
"template": "subtract({value_04001q}, {cfsourceusea_value_04840a})",
|
|
3
|
+
"idea": "**Concept**: FX-Adjusted Cash Generation\n- **Sample Fields Used**: `value_04001q`, `cfsourceusea_value_04840a`\n- **Definition**: Net income starting line minus FX translation effects to isolate operational cash generation\n- **Why This Feature**: Removes non-operational currency noise to reveal underlying business performance\n- **Logical Meaning**: Core operational cash flow before translational accounting adjustments; pure operational signal\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. If FX effect is NaN (domestic company), the adjustment should be zero (no effect), not filled from other companies.\n- **Directionality**: Higher values indicate stronger core operational generation independent of currency games\n- **Boundary Conditions**: Large differences between adjusted and unadjusted indicate high FX volatility or international exposure",
|
|
4
|
+
"expression_list": [
|
|
5
|
+
"subtract(fnd28_cfq_value_04001q, fnd28_cfsourceusea_value_04840a)",
|
|
6
|
+
"subtract(fnd28_nddq1_value_04001q, fnd28_cfsourceusea_value_04840a)"
|
|
7
|
+
]
|
|
8
|
+
}
|
|
@@ -0,0 +1,8 @@
+{
+"template": "ts_corr({value_08316q}, ts_delay({value_08316q}, 252), 504)",
+"idea": "**Concept**: Operating Margin Persistence\n- **Sample Fields Used**: `value_08316q`\n- **Definition**: Correlation between current operating margin and margin 252 days (1 year) prior, measured over 504 days (2 years)\n- **Why This Feature**: Measures the durability of competitive advantages; persistent margins indicate moats, volatile margins indicate commodity exposure\n- **Logical Meaning**: Autocorrelation of profitability; high values suggest structural industry position, low values suggest cyclical or competitive pressure\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Correlation requires aligned time series; filling gaps creates spurious persistence.\n- **Directionality**: Higher values indicate persistent margins (quality); low values indicate unstable margins (risk)\n- **Boundary Conditions**: Values near 1 indicate highly predictable margins; near 0 indicate random walk margins; negative indicate mean-reverting margins",
+"expression_list": [
+"ts_corr(fnd28_newq_value_08316q, ts_delay(fnd28_newq_value_08316q, 252), 504)",
+"ts_corr(fnd28_ratesq_value_08316q, ts_delay(fnd28_ratesq_value_08316q, 252), 504)"
+]
+}
@@ -0,0 +1,7 @@
+{
+"template": "ts_delta({growthratesa_value_08816a}, 63)",
+"idea": "**Concept**: Earnings Growth Acceleration\n- **Sample Fields Used**: `growthratesa_value_08816a`\n- **Definition**: Change in annual EPS growth rate over a 63-day window to capture inflection points in momentum\n- **Why This Feature**: Markets price changes in growth rates, not just growth levels; acceleration signals improving business trends\n- **Logical Meaning**: Second derivative of earnings; positive values indicate growth is speeding up (positive momentum)\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Annual growth rates update infrequently; filling NaNs with stale data creates look-ahead bias.\n- **Directionality**: Positive values indicate accelerating growth (bullish); negative indicates deceleration\n- **Boundary Conditions**: Extreme values occur near earnings turning points (negative to positive growth)",
+"expression_list": [
+"ts_delta(fnd28_growthratesa_value_08816a, 63)"
+]
+}
@@ -0,0 +1,8 @@
+{
+"template": "ts_std_dev(ts_delta({value_02300q}, 252), 63)",
+"idea": "**Concept**: Asset Growth Consistency\n- **Sample Fields Used**: `value_02300q`\n- **Definition**: Standard deviation of year-over-year asset changes measured over 63 days (quarterly window)\n- **Why This Feature**: Distinguishes between steady organic expansion and lumpy acquisition-driven growth or asset sales\n- **Logical Meaning**: Captures the volatility of the company's investment policy; consistent growth suggests predictable capital allocation\n- **is filling nan necessary**: we have some operators to fill nan value like ts_backfill() or group_mean() etc. however, in some cases, if the nan value itself has some meaning, then we should not fill it blindly since it may introduce some bias. so before filling nan value, we should think about whether the nan value has some meaning in the specific scenario. Asset values are typically reported quarterly; interpolation between quarters may introduce false stability.\n- **Directionality**: Lower values indicate more stable asset base evolution (typically positive for forecasting)\n- **Boundary Conditions**: Zero indicates no asset changes; spikes indicate M&A activity or write-downs",
+"expression_list": [
+"ts_std_dev(ts_delta(fnd28_bsassetq_value_02300q, 252), 63)",
+"ts_std_dev(ts_delta(fnd28_nddq1_value_02300q, 252), 63)"
+]
+}