@zigrivers/scaffold 3.13.0 → 3.15.0

Files changed (180)
  1. package/README.md +32 -10
  2. package/content/knowledge/research/research-architecture.md +385 -0
  3. package/content/knowledge/research/research-conventions.md +248 -0
  4. package/content/knowledge/research/research-dev-environment.md +303 -0
  5. package/content/knowledge/research/research-experiment-loop.md +429 -0
  6. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  7. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  8. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  9. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  10. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  11. package/content/knowledge/research/research-observability.md +395 -0
  12. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  13. package/content/knowledge/research/research-project-structure.md +264 -0
  14. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  15. package/content/knowledge/research/research-quant-market-data.md +366 -0
  16. package/content/knowledge/research/research-quant-metrics.md +335 -0
  17. package/content/knowledge/research/research-quant-requirements.md +223 -0
  18. package/content/knowledge/research/research-quant-risk.md +469 -0
  19. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  20. package/content/knowledge/research/research-requirements.md +201 -0
  21. package/content/knowledge/research/research-security.md +374 -0
  22. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  23. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  24. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  25. package/content/knowledge/research/research-sim-validation.md +456 -0
  26. package/content/knowledge/research/research-testing.md +334 -0
  27. package/content/methodology/research-ml-research.yml +23 -0
  28. package/content/methodology/research-overlay.yml +65 -0
  29. package/content/methodology/research-quant-finance.yml +29 -0
  30. package/content/methodology/research-simulation.yml +23 -0
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +30 -8
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/adopt.serialization.test.js +49 -0
  35. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  36. package/dist/cli/commands/adopt.test.js +8 -0
  37. package/dist/cli/commands/adopt.test.js.map +1 -1
  38. package/dist/cli/commands/build.d.ts.map +1 -1
  39. package/dist/cli/commands/build.js +191 -180
  40. package/dist/cli/commands/build.js.map +1 -1
  41. package/dist/cli/commands/complete.d.ts.map +1 -1
  42. package/dist/cli/commands/complete.js +16 -12
  43. package/dist/cli/commands/complete.js.map +1 -1
  44. package/dist/cli/commands/complete.test.js +14 -5
  45. package/dist/cli/commands/complete.test.js.map +1 -1
  46. package/dist/cli/commands/init.d.ts +4 -0
  47. package/dist/cli/commands/init.d.ts.map +1 -1
  48. package/dist/cli/commands/init.js +75 -51
  49. package/dist/cli/commands/init.js.map +1 -1
  50. package/dist/cli/commands/init.test.js +33 -27
  51. package/dist/cli/commands/init.test.js.map +1 -1
  52. package/dist/cli/commands/reset.d.ts.map +1 -1
  53. package/dist/cli/commands/reset.js +44 -40
  54. package/dist/cli/commands/reset.js.map +1 -1
  55. package/dist/cli/commands/reset.test.js +42 -20
  56. package/dist/cli/commands/reset.test.js.map +1 -1
  57. package/dist/cli/commands/rework.d.ts.map +1 -1
  58. package/dist/cli/commands/rework.js +16 -12
  59. package/dist/cli/commands/rework.js.map +1 -1
  60. package/dist/cli/commands/rework.test.js +12 -3
  61. package/dist/cli/commands/rework.test.js.map +1 -1
  62. package/dist/cli/commands/run.d.ts.map +1 -1
  63. package/dist/cli/commands/run.js +318 -298
  64. package/dist/cli/commands/run.js.map +1 -1
  65. package/dist/cli/commands/run.test.js +92 -120
  66. package/dist/cli/commands/run.test.js.map +1 -1
  67. package/dist/cli/commands/skip.d.ts.map +1 -1
  68. package/dist/cli/commands/skip.js +19 -15
  69. package/dist/cli/commands/skip.js.map +1 -1
  70. package/dist/cli/commands/skip.test.js +22 -11
  71. package/dist/cli/commands/skip.test.js.map +1 -1
  72. package/dist/cli/commands/update.d.ts.map +1 -1
  73. package/dist/cli/commands/update.js +3 -1
  74. package/dist/cli/commands/update.js.map +1 -1
  75. package/dist/cli/commands/update.test.js +8 -4
  76. package/dist/cli/commands/update.test.js.map +1 -1
  77. package/dist/cli/commands/version.d.ts.map +1 -1
  78. package/dist/cli/commands/version.js +3 -1
  79. package/dist/cli/commands/version.js.map +1 -1
  80. package/dist/cli/commands/version.test.js +9 -5
  81. package/dist/cli/commands/version.test.js.map +1 -1
  82. package/dist/cli/index.d.ts.map +1 -1
  83. package/dist/cli/index.js +2 -0
  84. package/dist/cli/index.js.map +1 -1
  85. package/dist/cli/init-flag-families.d.ts +6 -1
  86. package/dist/cli/init-flag-families.d.ts.map +1 -1
  87. package/dist/cli/init-flag-families.js +32 -1
  88. package/dist/cli/init-flag-families.js.map +1 -1
  89. package/dist/cli/init-flag-families.test.js +47 -0
  90. package/dist/cli/init-flag-families.test.js.map +1 -1
  91. package/dist/cli/output/interactive.d.ts +1 -0
  92. package/dist/cli/output/interactive.d.ts.map +1 -1
  93. package/dist/cli/output/interactive.js +5 -0
  94. package/dist/cli/output/interactive.js.map +1 -1
  95. package/dist/cli/shutdown.d.ts +51 -0
  96. package/dist/cli/shutdown.d.ts.map +1 -0
  97. package/dist/cli/shutdown.js +199 -0
  98. package/dist/cli/shutdown.js.map +1 -0
  99. package/dist/cli/shutdown.test.d.ts +2 -0
  100. package/dist/cli/shutdown.test.d.ts.map +1 -0
  101. package/dist/cli/shutdown.test.js +316 -0
  102. package/dist/cli/shutdown.test.js.map +1 -0
  103. package/dist/config/schema.d.ts +272 -16
  104. package/dist/config/schema.d.ts.map +1 -1
  105. package/dist/config/schema.js +25 -1
  106. package/dist/config/schema.js.map +1 -1
  107. package/dist/config/schema.test.js +103 -3
  108. package/dist/config/schema.test.js.map +1 -1
  109. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  110. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  111. package/dist/core/assembly/overlay-loader.js +30 -0
  112. package/dist/core/assembly/overlay-loader.js.map +1 -1
  113. package/dist/core/assembly/overlay-loader.test.js +66 -1
  114. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  115. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  116. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  117. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  118. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  119. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  120. package/dist/e2e/init.test.js +5 -4
  121. package/dist/e2e/init.test.js.map +1 -1
  122. package/dist/e2e/project-type-overlays.test.js +119 -0
  123. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  124. package/dist/project/adopt.d.ts.map +1 -1
  125. package/dist/project/adopt.js +3 -1
  126. package/dist/project/adopt.js.map +1 -1
  127. package/dist/project/detectors/disambiguate.js +1 -1
  128. package/dist/project/detectors/disambiguate.js.map +1 -1
  129. package/dist/project/detectors/index.d.ts.map +1 -1
  130. package/dist/project/detectors/index.js +2 -1
  131. package/dist/project/detectors/index.js.map +1 -1
  132. package/dist/project/detectors/ml.d.ts.map +1 -1
  133. package/dist/project/detectors/ml.js +2 -6
  134. package/dist/project/detectors/ml.js.map +1 -1
  135. package/dist/project/detectors/research.d.ts +4 -0
  136. package/dist/project/detectors/research.d.ts.map +1 -0
  137. package/dist/project/detectors/research.js +141 -0
  138. package/dist/project/detectors/research.js.map +1 -0
  139. package/dist/project/detectors/research.test.d.ts +2 -0
  140. package/dist/project/detectors/research.test.d.ts.map +1 -0
  141. package/dist/project/detectors/research.test.js +235 -0
  142. package/dist/project/detectors/research.test.js.map +1 -0
  143. package/dist/project/detectors/shared-signals.d.ts +3 -0
  144. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  145. package/dist/project/detectors/shared-signals.js +9 -0
  146. package/dist/project/detectors/shared-signals.js.map +1 -0
  147. package/dist/project/detectors/types.d.ts +6 -2
  148. package/dist/project/detectors/types.d.ts.map +1 -1
  149. package/dist/project/detectors/types.js.map +1 -1
  150. package/dist/state/lock-manager.d.ts +1 -0
  151. package/dist/state/lock-manager.d.ts.map +1 -1
  152. package/dist/state/lock-manager.js +1 -1
  153. package/dist/state/lock-manager.js.map +1 -1
  154. package/dist/types/config.d.ts +7 -1
  155. package/dist/types/config.d.ts.map +1 -1
  156. package/dist/wizard/copy/core.d.ts.map +1 -1
  157. package/dist/wizard/copy/core.js +4 -0
  158. package/dist/wizard/copy/core.js.map +1 -1
  159. package/dist/wizard/copy/index.d.ts.map +1 -1
  160. package/dist/wizard/copy/index.js +2 -0
  161. package/dist/wizard/copy/index.js.map +1 -1
  162. package/dist/wizard/copy/research.d.ts +3 -0
  163. package/dist/wizard/copy/research.d.ts.map +1 -0
  164. package/dist/wizard/copy/research.js +27 -0
  165. package/dist/wizard/copy/research.js.map +1 -0
  166. package/dist/wizard/copy/types.d.ts +5 -1
  167. package/dist/wizard/copy/types.d.ts.map +1 -1
  168. package/dist/wizard/flags.d.ts +7 -1
  169. package/dist/wizard/flags.d.ts.map +1 -1
  170. package/dist/wizard/questions.d.ts +4 -2
  171. package/dist/wizard/questions.d.ts.map +1 -1
  172. package/dist/wizard/questions.js +27 -1
  173. package/dist/wizard/questions.js.map +1 -1
  174. package/dist/wizard/questions.test.js +51 -0
  175. package/dist/wizard/questions.test.js.map +1 -1
  176. package/dist/wizard/wizard.d.ts +3 -2
  177. package/dist/wizard/wizard.d.ts.map +1 -1
  178. package/dist/wizard/wizard.js +3 -1
  179. package/dist/wizard/wizard.js.map +1 -1
  180. package/package.json +1 -1
@@ -0,0 +1,366 @@
1
+ ---
2
+ name: research-quant-market-data
3
+ description: Market data sourcing including OHLCV providers, tick data, corporate actions handling, data quality checks, gap handling, and timezone normalization
4
+ topics: [research, quant-finance, market-data, ohlcv, tick-data, corporate-actions, data-quality, alternative-data]
5
+ ---
6
+
7
+ Market data is the foundation of every quantitative strategy. Bad data produces bad backtests, and bad backtests produce strategies that fail in production. The data pipeline must handle multiple source providers with different conventions, normalize timestamps across timezones, adjust prices for corporate actions (splits, dividends, mergers), detect and fill gaps, and validate quality before any data reaches the strategy. Data quality issues are the most common source of phantom alpha -- apparent returns that disappear when data errors are fixed.
8
+
9
+ ## Summary
10
+
11
+ Build a data pipeline that sources OHLCV data from multiple providers (Yahoo Finance for screening, Polygon/Alpha Vantage for research-grade data), handles corporate actions correctly (use fully adjusted prices or apply point-in-time adjustment factors), validates quality (gap detection, outlier filtering, volume anomalies), normalizes timezones to a single reference (UTC or exchange local time), and stores data in a local cache to avoid repeated API calls. For tick data, use dedicated providers (Polygon, TickData) and downsample to desired frequency with proper bar construction.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Data Source Hierarchy
16
+
17
+ Use different data sources for different stages of research. Free sources are fine for initial screening; paid sources are necessary for final validation:
18
+
19
+ | Provider | Cost | Quality | Frequency | Best For |
20
+ |----------|------|---------|-----------|----------|
21
+ | Yahoo Finance (yfinance) | Free | Medium | Daily | Initial screening, idea generation |
22
+ | Alpha Vantage | Free tier + paid | Medium-High | 1min - Daily | Intraday research, small universes |
23
+ | Polygon.io | $29-199/mo | High | Tick - Daily | Production research, large universes |
24
+ | Tiingo | $10-30/mo | High | Daily + IEX tick | EOD research, news data |
25
+ | Quandl/Nasdaq Data Link | Varies | High | Daily | Fundamentals, alternative data |
26
+ | Interactive Brokers | Trading account | High | Tick - Daily | Live data, historical backfill |
27
+
28
+ ```python
29
+ # data/providers/base.py
30
+ from abc import ABC, abstractmethod
31
+ from datetime import date
32
+ import pandas as pd
33
+
34
+ class DataProvider(ABC):
35
+ """Base interface for market data providers."""
36
+
37
+ @abstractmethod
38
+ def fetch_ohlcv(
39
+ self,
40
+ symbol: str,
41
+ start: date,
42
+ end: date,
43
+ frequency: str = "1D",
44
+ ) -> pd.DataFrame:
45
+ """
46
+ Fetch OHLCV data for a single symbol.
47
+
48
+ Returns DataFrame with columns: open, high, low, close, volume
49
+ Index: DatetimeIndex (timezone-aware UTC)
50
+ """
51
+ ...
52
+
53
+ @abstractmethod
54
+ def fetch_splits(self, symbol: str, start: date, end: date) -> list[dict]:
55
+ """Fetch stock split history."""
56
+ ...
57
+
58
+ @abstractmethod
59
+ def fetch_dividends(self, symbol: str, start: date, end: date) -> list[dict]:
60
+ """Fetch dividend history."""
61
+ ...
62
+ ```
63
+
64
+ ### Corporate Actions Handling
65
+
66
+ Corporate actions (splits, dividends, mergers, spinoffs) change the price series in ways that must be accounted for to avoid phantom signals:
67
+
68
+ ```python
69
+ # data/adjustments.py
70
+ import pandas as pd
71
+ import numpy as np
72
+
73
+ def adjust_for_splits(
74
+ prices: pd.DataFrame,
75
+ splits: list[dict],
76
+ ) -> pd.DataFrame:
77
+ """
78
+ Apply split adjustments to historical prices.
79
+
80
+ Adjusts prices backward from most recent split to preserve
81
+ current price levels. This is the standard convention.
82
+ """
83
+ adjusted = prices.copy()
84
+
85
+ # Sort splits in reverse chronological order
86
+ for split in sorted(splits, key=lambda s: s["date"], reverse=True):
87
+ split_date = pd.Timestamp(split["date"])
88
+ ratio = split["ratio"] # e.g., 4.0 for a 4:1 split
89
+
90
+ mask = adjusted.index < split_date
91
+ for col in ["open", "high", "low", "close"]:
92
+ adjusted.loc[mask, col] /= ratio
93
+ adjusted.loc[mask, "volume"] *= ratio
94
+
95
+ return adjusted
96
+
97
+
98
+ def adjust_for_dividends(
99
+ prices: pd.DataFrame,
100
+ dividends: list[dict],
101
+ method: str = "proportional",
102
+ ) -> pd.DataFrame:
103
+ """
104
+ Apply dividend adjustments to historical prices.
105
+
106
+ Args:
107
+ method: "proportional" (standard) or "subtractive" (simple).
108
+ """
109
+ adjusted = prices.copy()
110
+
111
+ for div in sorted(dividends, key=lambda d: d["ex_date"], reverse=True):
112
+ ex_date = pd.Timestamp(div["ex_date"])
113
+ amount = div["amount"]
114
+
115
+ mask = adjusted.index < ex_date
116
+ if method == "proportional":
117
+ # Standard: adjust by the ratio (close - dividend) / close
118
+ close_before = prices.loc[mask, "close"].iloc[-1] if mask.any() else 0  # raw close: `adjusted` already carries later dividend factors
119
+ if close_before > 0:
120
+ factor = (close_before - amount) / close_before
121
+ for col in ["open", "high", "low", "close"]:
122
+ adjusted.loc[mask, col] *= factor
123
+ elif method == "subtractive":
124
+ for col in ["open", "high", "low", "close"]:
125
+ adjusted.loc[mask, col] -= amount
126
+
127
+ return adjusted
128
+ ```
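To see why the backward adjustment matters, the sketch below applies the same mask-and-divide step as `adjust_for_splits` to a four-bar toy series around a hypothetical 4:1 split. The prices, dates, and variable names are invented for illustration; only the technique comes from the code above.

```python
import pandas as pd

# Hypothetical raw closes around a 4:1 split effective 2024-01-04.
prices = pd.DataFrame(
    {
        "close": [400.0, 404.0, 101.0, 102.0],
        "volume": [1000.0, 1200.0, 4400.0, 4000.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
split_date = pd.Timestamp("2024-01-04")
ratio = 4.0

# Unadjusted returns show a spurious -75% move on the split date.
raw_returns = prices["close"].pct_change()

# Backward adjustment: scale every bar strictly before the split date.
adjusted = prices.copy()
mask = adjusted.index < split_date
adjusted.loc[mask, "close"] /= ratio
adjusted.loc[mask, "volume"] *= ratio

adjusted_returns = adjusted["close"].pct_change()
split_day_raw = float(raw_returns.loc[split_date])
split_day_adjusted = float(adjusted_returns.loc[split_date])
```

In the raw series the split looks like a crash, which a momentum or stop-loss rule would act on. After adjustment the split-day return is zero and the most recent prices are unchanged.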
129
+
130
+ ### Data Quality Checks
131
+
132
+ Run quality checks on every dataset before it enters the backtest. Catch problems early rather than debugging phantom strategy behavior:
133
+
134
+ ```python
135
+ # data/quality.py
136
+ import pandas as pd
137
+ import numpy as np
138
+ from dataclasses import dataclass, field
139
+
140
+ @dataclass
141
+ class QualityReport:
142
+ """Results of data quality validation."""
143
+ symbol: str
144
+ total_bars: int
145
+ issues: list[str] = field(default_factory=list)
146
+ warnings: list[str] = field(default_factory=list)
147
+
148
+ @property
149
+ def passed(self) -> bool:
150
+ return len(self.issues) == 0
151
+
152
+ def validate_ohlcv(df: pd.DataFrame, symbol: str) -> QualityReport:
153
+ """Comprehensive OHLCV data quality validation."""
154
+ report = QualityReport(symbol=symbol, total_bars=len(df))
155
+
156
+ # Check required columns
157
+ required = ["open", "high", "low", "close", "volume"]
158
+ for col in required:
159
+ if col not in df.columns:
160
+ report.issues.append(f"Missing column: {col}")
161
+ if report.issues:
162
+ return report
163
+
164
+ # OHLC consistency: high >= open, close, low; low <= open, close, high
165
+ bad_high = df[df["high"] < df[["open", "close"]].max(axis=1)]
166
+ if len(bad_high) > 0:
167
+ report.issues.append(f"{len(bad_high)} bars where high < max(open, close)")
168
+
169
+ bad_low = df[df["low"] > df[["open", "close"]].min(axis=1)]
170
+ if len(bad_low) > 0:
171
+ report.issues.append(f"{len(bad_low)} bars where low > min(open, close)")
172
+
173
+ # Zero or negative prices
174
+ for col in ["open", "high", "low", "close"]:
175
+ zeros = (df[col] <= 0).sum()
176
+ if zeros > 0:
177
+ report.issues.append(f"{zeros} bars with {col} <= 0")
178
+
179
+ # Null values
180
+ null_pct = df[required].isnull().sum().sum() / (len(df) * len(required)) * 100
181
+ if null_pct > 0:
182
+ report.warnings.append(f"Null values: {null_pct:.2f}%")
183
+ if null_pct > 1:
184
+ report.issues.append(f"Null values exceed 1%: {null_pct:.2f}%")
185
+
186
+ # Extreme returns (potential data errors)
187
+ returns = df["close"].pct_change().dropna()
188
+ extreme = returns[returns.abs() > 0.5] # >50% daily move
189
+ if len(extreme) > 0:
190
+ report.warnings.append(
191
+ f"{len(extreme)} extreme daily returns (>50%): check for split errors"
192
+ )
193
+
194
+ # Volume anomalies
195
+ zero_vol = (df["volume"] == 0).sum()
196
+ if zero_vol > len(df) * 0.05:
197
+ report.warnings.append(f"{zero_vol} zero-volume bars ({zero_vol/len(df)*100:.1f}%)")
198
+
199
+ # Duplicate timestamps
200
+ dupes = df.index.duplicated().sum()
201
+ if dupes > 0:
202
+ report.issues.append(f"{dupes} duplicate timestamps")
203
+
204
+ return report
205
+ ```
206
+
207
+ ### Gap Handling
208
+
209
+ Missing bars occur because of holidays, trading halts, data provider outages, or illiquid instruments. Handle gaps explicitly:
210
+
211
+ ```python
212
+ # data/gap_handling.py
213
+ import pandas as pd
214
+
215
+ def detect_gaps(
216
+ df: pd.DataFrame,
217
+ frequency: str = "1D",
218
+ max_gap_periods: int = 5,
219
+ ) -> list[dict]:
220
+ """Detect data gaps exceeding the maximum allowed threshold."""
221
+ expected_freq = pd.tseries.frequencies.to_offset(frequency)
222
+ gaps = []
223
+
224
+ for i in range(1, len(df)):
225
+ delta = df.index[i] - df.index[i - 1]
226
+ expected_delta = expected_freq * 1
227
+
228
+ # For daily data, tolerate weekend gaps (Friday to Monday is a 3-day delta)
229
+ if frequency == "1D" and delta.days <= 3:
230
+ continue
231
+
232
+ gap_periods = delta / expected_delta
233
+ if gap_periods > max_gap_periods:
234
+ gaps.append({
235
+ "start": df.index[i - 1],
236
+ "end": df.index[i],
237
+ "gap_periods": int(gap_periods),
238
+ })
239
+
240
+ return gaps
241
+
242
+
243
+ def fill_gaps(
244
+ df: pd.DataFrame,
245
+ method: str = "ffill",
246
+ max_fill: int = 5,
247
+ ) -> pd.DataFrame:
248
+ """
249
+ Fill data gaps using specified method.
250
+
251
+ Args:
252
+ method: "ffill" (forward fill), "interpolate", or "drop".
253
+ max_fill: Maximum consecutive bars to fill.
254
+ """
255
+ if method == "ffill":
256
+ return df.ffill(limit=max_fill)
257
+ elif method == "interpolate":
258
+ return df.interpolate(method="time", limit=max_fill)
259
+ elif method == "drop":
260
+ return df.dropna()
261
+ else:
262
+ raise ValueError(f"Unknown fill method: {method}")
263
+ ```
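The delta-based detector above skips any delta of three days or fewer and only reports gaps longer than `max_gap_periods`, so a single missing weekday is never flagged. One alternative is to compare the index against a weekday calendar built with `pd.bdate_range`. This is a sketch that assumes a plain weekday calendar is an acceptable proxy for the trading calendar. The function name `missing_business_days` is hypothetical. `bdate_range` knows nothing about exchange holidays, so those appear as false positives unless a holiday calendar is supplied.

```python
import pandas as pd

def missing_business_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return weekdays between the first and last bar that have no bar."""
    if len(index) == 0:
        return pd.DatetimeIndex([])
    # Every Monday-Friday date in the covered range.
    expected = pd.bdate_range(index.min(), index.max())
    # Dates that should exist but are absent from the data.
    return expected.difference(index)

# Hypothetical daily index with Friday 2024-01-05 missing.
bars = pd.to_datetime(
    ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-08", "2024-01-09"]
)
missing = missing_business_days(bars)
```

Here the jump from Thursday to Monday is a four-day delta, below the default `max_gap_periods` of 5, so `detect_gaps` would not report it, while the calendar comparison returns the missing Friday.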
264
+
265
+ ### Timezone Normalization
266
+
267
+ All timestamps must be normalized to a single reference timezone before any analysis. Mixed timezones cause subtle alignment bugs:
268
+
269
+ ```python
270
+ # data/timezone.py
271
+ import pandas as pd
272
+
273
+ # Exchange timezone mapping
274
+ EXCHANGE_TIMEZONES = {
275
+ "NYSE": "America/New_York",
276
+ "NASDAQ": "America/New_York",
277
+ "LSE": "Europe/London",
278
+ "TSE": "Asia/Tokyo",
279
+ "HKEX": "Asia/Hong_Kong",
280
+ "ASX": "Australia/Sydney",
281
+ }
282
+
283
+ def normalize_to_utc(
284
+ df: pd.DataFrame,
285
+ source_tz: str,
286
+ ) -> pd.DataFrame:
287
+ """
288
+ Normalize timestamps to UTC.
289
+
290
+ If timestamps are timezone-naive, localize to source_tz first.
291
+ """
292
+ if df.index.tz is None:
293
+ df = df.tz_localize(source_tz)  # returns a new frame; the caller's index is left untouched
294
+ return df.tz_convert("UTC")
295
+
296
+
297
+ def align_multi_exchange(
298
+ datasets: dict[str, pd.DataFrame],
299
+ reference_tz: str = "UTC",
300
+ ) -> dict[str, pd.DataFrame]:
301
+ """Align datasets from multiple exchanges to a common timezone."""
302
+ aligned = {}
303
+ for symbol, df in datasets.items():
304
+ if df.index.tz is None:
305
+ raise ValueError(f"{symbol}: timestamps must be timezone-aware")
306
+ aligned[symbol] = df.tz_convert(reference_tz)
307
+ return aligned
308
+ ```
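A concrete case of the alignment problem is daylight saving time: the same local session close maps to different UTC hours in winter and summer. The sketch below uses two hypothetical dates and the `America/New_York` zone from the mapping above. The variable names are illustrative.

```python
import pandas as pd

# Naive 16:00 timestamps for a winter and a summer session (hypothetical dates).
closes_local = pd.DatetimeIndex(["2024-01-15 16:00", "2024-07-15 16:00"])

# Localize to the exchange timezone first, then convert to UTC.
closes_utc = closes_local.tz_localize("America/New_York").tz_convert("UTC")

# 21:00 UTC in winter (EST, UTC-5) and 20:00 UTC in summer (EDT, UTC-4).
utc_hours = [int(h) for h in closes_utc.hour]
```

Applying a fixed five-hour offset instead of localizing would misplace every summer bar by one hour, which is enough to misalign intraday data across exchanges.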
309
+
310
+ ### Local Data Cache
311
+
312
+ Cache data locally to avoid repeated API calls and ensure reproducibility:
313
+
314
+ ```python
315
+ # data/cache.py
316
+ import hashlib
317
+ import json
318
+ from pathlib import Path
319
+ from datetime import date
320
+ import pandas as pd
321
+
322
+ class DataCache:
323
+ """File-based cache for market data with invalidation."""
324
+
325
+ def __init__(self, cache_dir: str = "data/cache"):
326
+ self.cache_dir = Path(cache_dir)
327
+ self.cache_dir.mkdir(parents=True, exist_ok=True)
328
+
329
+ def _cache_key(self, symbol: str, start: date, end: date, freq: str) -> str:
330
+ raw = f"{symbol}_{start}_{end}_{freq}"
331
+ return hashlib.md5(raw.encode()).hexdigest()
332
+
333
+ def get(self, symbol: str, start: date, end: date, freq: str) -> pd.DataFrame | None:
334
+ key = self._cache_key(symbol, start, end, freq)
335
+ path = self.cache_dir / f"{key}.parquet"
336
+ if path.exists():
337
+ return pd.read_parquet(path)
338
+ return None
339
+
340
+ def put(self, symbol: str, start: date, end: date, freq: str,
341
+ data: pd.DataFrame) -> None:
342
+ key = self._cache_key(symbol, start, end, freq)
343
+ path = self.cache_dir / f"{key}.parquet"
344
+ data.to_parquet(path)
345
+
346
+ def invalidate(self, symbol: str, start: date, end: date, freq: str) -> None:
347
+ key = self._cache_key(symbol, start, end, freq)
348
+ path = self.cache_dir / f"{key}.parquet"
349
+ if path.exists():
350
+ path.unlink()
351
+ ```
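The cache key is a digest of the exact request parameters, which has one consequence worth seeing directly: identical requests always hit the same file, but any change to the date range produces a different key. The sketch below extracts the same key scheme into a free function named `cache_key` (a name chosen here for illustration) using only the standard library.

```python
import hashlib
from datetime import date

def cache_key(symbol: str, start: date, end: date, freq: str) -> str:
    """Same scheme as DataCache._cache_key, extracted as a free function."""
    raw = f"{symbol}_{start}_{end}_{freq}"
    return hashlib.md5(raw.encode()).hexdigest()

# Two identical requests and one with a later end date (hypothetical values).
k1 = cache_key("AAPL", date(2020, 1, 1), date(2020, 12, 31), "1D")
k2 = cache_key("AAPL", date(2020, 1, 1), date(2020, 12, 31), "1D")
k3 = cache_key("AAPL", date(2020, 1, 1), date(2021, 1, 4), "1D")
```

Because `k3` differs from `k1`, extending a window by a few days re-fetches the whole range. If that becomes costly, one possible design is to cache per symbol and frequency and slice by date after loading, at the price of more complex invalidation.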
352
+
353
+ ### Alternative Data Sources
354
+
355
+ Beyond price and volume, alternative data can provide unique signals:
356
+
357
+ | Data Type | Sources | Use Case |
358
+ |-----------|---------|----------|
359
+ | Sentiment | News APIs, social media | Contrarian/momentum signals |
360
+ | Fundamentals | SEC EDGAR, Quandl | Value-based strategies |
361
+ | Options flow | CBOE, OCC | Implied volatility signals |
362
+ | Insider trading | SEC Form 4 | Informed trading signals |
363
+ | Short interest | FINRA, exchanges | Crowding/squeeze signals |
364
+ | Macro indicators | FRED, World Bank | Regime detection |
365
+
366
+ Always validate alternative data for coverage, timeliness, and look-ahead bias before incorporating it into a strategy.
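One common way to guard against look-ahead bias is to join alternative data on the date it became public rather than the period it describes. The sketch below uses `pd.merge_asof` for that. The column names (`release_date`, `period_end`, `eps`) and all values are hypothetical. Excluding same-day releases is a conservative assumption, not a rule from the text above.

```python
import pandas as pd

# Daily bars (hypothetical dates).
bars = pd.DataFrame(
    {"date": pd.to_datetime(["2024-02-01", "2024-02-02", "2024-02-05"])}
)

# One fundamentals record: the fiscal period ended 2023-12-31 but the figure
# was only published on 2024-02-02 (hypothetical values).
fundamentals = pd.DataFrame(
    {
        "release_date": pd.to_datetime(["2024-02-02"]),
        "period_end": pd.to_datetime(["2023-12-31"]),
        "eps": [1.5],
    }
)

# Join on the release date, not the period end, and require the release to be
# strictly earlier than the bar so a same-day announcement is not used.
joined = pd.merge_asof(
    bars,
    fundamentals,
    left_on="date",
    right_on="release_date",
    direction="backward",
    allow_exact_matches=False,
)
```

Joining on `period_end` instead would make the figure visible from early January, a month before anyone could have traded on it.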
@@ -0,0 +1,335 @@
1
+ ---
2
+ name: research-quant-metrics
3
+ description: Quantitative performance metrics including Sharpe ratio, Sortino ratio, Calmar ratio, maximum drawdown, profit factor, win rate, expectancy, and alpha/beta decomposition
4
+ topics: [research, quant-finance, metrics, sharpe, sortino, calmar, drawdown, profit-factor, information-ratio, alpha, beta]
5
+ ---
6
+
7
+ Performance metrics are the lens through which every trading strategy is evaluated. A single metric is never sufficient -- strategies must be assessed across multiple dimensions including risk-adjusted return, tail risk, consistency, and independence from market direction. The choice of which metric to optimize (primary target) versus which to constrain (guardrails) is a fundamental research design decision. Optimizing the wrong metric or ignoring important dimensions leads to strategies that look good on paper but blow up in practice.
8
+
9
+ ## Summary
10
+
11
+ Implement a comprehensive metrics library covering risk-adjusted returns (Sharpe, Sortino, Calmar, information ratio), drawdown analysis (maximum drawdown, drawdown duration, recovery time), trade-level statistics (win rate, profit factor, expectancy, payoff ratio), and factor decomposition (alpha, beta, R-squared). Use annualized metrics with consistent conventions (252 trading days, risk-free rate from T-bills). Always compute confidence intervals via bootstrap resampling rather than relying on point estimates.
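As an illustration of the bootstrap recommendation, the sketch below computes a percentile confidence interval for the annualized Sharpe ratio. It makes several assumptions: the function name `bootstrap_sharpe_ci` is hypothetical, the risk-free rate is taken as zero for brevity, and returns are resampled independently, which ignores autocorrelation. A block bootstrap may suit serially dependent returns better.

```python
import numpy as np

def bootstrap_sharpe_ci(returns, n_resamples=2000, confidence=0.95,
                        periods_per_year=252, seed=0):
    """Percentile bootstrap CI for the annualized Sharpe ratio (risk-free rate = 0)."""
    r = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample whole periods with replacement; each row is one bootstrap sample.
    idx = rng.integers(0, len(r), size=(n_resamples, len(r)))
    samples = r[idx]
    std = samples.std(axis=1, ddof=1)
    # Drop degenerate resamples with zero variance to avoid division by zero.
    valid = std > 0
    sharpes = samples[valid].mean(axis=1) / std[valid] * np.sqrt(periods_per_year)
    tail = (1.0 - confidence) / 2.0
    lower, upper = np.quantile(sharpes, [tail, 1.0 - tail])
    return float(lower), float(upper)

# Synthetic daily returns, for illustration only.
demo = np.random.default_rng(42).normal(0.0005, 0.01, size=500)
lower, upper = bootstrap_sharpe_ci(demo)
```

With only a couple of years of daily data the interval is typically wide, which is the point: two strategies whose intervals overlap heavily cannot be ranked on their point estimates alone.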
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Risk-Adjusted Return Metrics
16
+
17
+ ```python
18
+ # metrics/risk_adjusted.py
19
+ import numpy as np
20
+ import pandas as pd
21
+
22
+ def sharpe_ratio(
23
+ returns: pd.Series,
24
+ risk_free_rate: float = 0.04,
25
+ periods_per_year: int = 252,
26
+ ) -> float:
27
+ """
28
+ Annualized Sharpe ratio.
29
+
30
+ Sharpe = (mean_return - risk_free_rate) / std_return
31
+ Annualized by multiplying by sqrt(periods_per_year).
32
+ """
33
+ excess = returns - risk_free_rate / periods_per_year
34
+ if excess.std() == 0:
35
+ return 0.0
36
+ return float(excess.mean() / excess.std() * np.sqrt(periods_per_year))
37
+
38
+
39
+ def sortino_ratio(
40
+ returns: pd.Series,
41
+ risk_free_rate: float = 0.04,
42
+ periods_per_year: int = 252,
43
+ ) -> float:
44
+ """
45
+ Sortino ratio — like Sharpe but uses downside deviation only.
46
+
47
+ Penalises only downside volatility, which is often more
48
+ relevant since upside volatility is desirable.
49
+ """
50
+ excess = returns - risk_free_rate / periods_per_year
51
+ downside = excess.clip(upper=0)  # zero out gains but keep every period, so the mean divides by N
52
+ downside_std = float(np.sqrt((downside**2).mean())) if len(downside) > 0 else 0.0
53
+ if downside_std == 0:
54
+ return 0.0
55
+ return float(excess.mean() / downside_std * np.sqrt(periods_per_year))
56
+
57
+
58
+ def calmar_ratio(
59
+ returns: pd.Series,
60
+ periods_per_year: int = 252,
61
+ ) -> float:
62
+ """
63
+ Calmar ratio — annualized return divided by maximum drawdown.
64
+
65
+ Measures return per unit of worst-case risk.
66
+ """
67
+ ann_return = returns.mean() * periods_per_year
68
+ max_dd = maximum_drawdown(returns)
69
+ if max_dd == 0:
70
+ return 0.0
71
+ return float(ann_return / abs(max_dd))
72
+
73
+
74
+ def information_ratio(
75
+ returns: pd.Series,
76
+ benchmark_returns: pd.Series,
77
+ periods_per_year: int = 252,
78
+ ) -> float:
79
+ """
80
+ Information ratio — excess return over benchmark per unit of tracking error.
81
+
82
+ Measures the consistency of active returns relative to a benchmark.
83
+ """
84
+ active_returns = returns - benchmark_returns
85
+ tracking_error = active_returns.std()
86
+ if tracking_error == 0:
87
+ return 0.0
88
+ return float(
89
+ active_returns.mean() / tracking_error * np.sqrt(periods_per_year)
90
+ )
91
+ ```
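The annualization convention is easy to get wrong, so here is a standard-library sketch of the same Sharpe calculation. It assumes the annual risk-free rate is converted to a per-period rate by simple division, as in the code above, and uses `statistics.stdev`, which is the sample standard deviation and matches the pandas `.std()` default. The return values are hypothetical.

```python
import math
import statistics

def sharpe(returns, risk_free_rate=0.04, periods_per_year=252):
    """Annualized Sharpe ratio using the sample standard deviation (ddof=1)."""
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0
    # Mean scales with time, volatility with sqrt(time), hence the sqrt factor.
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)

# Hypothetical daily returns.
daily = [0.012, -0.008, 0.005, -0.003, 0.009, -0.006, 0.004, 0.001]
with_rf = sharpe(daily, risk_free_rate=0.04)
without_rf = sharpe(daily, risk_free_rate=0.0)
```

Subtracting a constant leaves the standard deviation unchanged, so the risk-free rate lowers the ratio by exactly `rf_per_period / sd * sqrt(periods_per_year)`. Reusing a 252-period default on weekly or monthly returns would overstate the ratio, so `periods_per_year` should always match the data frequency.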
92
+
93
+ ### Drawdown Analysis
94
+
95
+ ```python
96
+ # metrics/drawdown.py
97
+ import numpy as np
98
+ import pandas as pd
99
+ from dataclasses import dataclass
100
+
101
+ @dataclass
102
+ class DrawdownAnalysis:
103
+ """Complete drawdown analysis results."""
104
+ max_drawdown: float # Worst peak-to-trough decline (negative)
105
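+ max_drawdown_duration: int # Bars from the prior peak to the trough of the worst drawdown, inclusive
106
+ max_recovery_time: int # Bars from the trough until the prior peak is regained (-1 if not yet recovered)
107
+ avg_drawdown: float # Mean drawdown depth over all bars spent below a prior peak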
108
+ drawdown_series: pd.Series # Full drawdown time series
109
+
110
+ def maximum_drawdown(returns: pd.Series) -> float:
111
+ """Maximum peak-to-trough decline as a negative fraction."""
112
+ equity = (1 + returns).cumprod()
113
+ running_max = equity.cummax()
114
+ drawdown = (equity - running_max) / running_max
115
+ return float(drawdown.min())
116
+
117
+ def analyze_drawdowns(returns: pd.Series) -> DrawdownAnalysis:
118
+ """Comprehensive drawdown analysis."""
119
+ equity = (1 + returns).cumprod()
120
+ running_max = equity.cummax()
121
+ drawdown = (equity - running_max) / running_max
122
+
123
+ # Find drawdown periods (contiguous sequences where drawdown < 0)
124
+ in_drawdown = drawdown < 0
125
+ dd_starts = in_drawdown & ~in_drawdown.shift(1, fill_value=False)
126
+ dd_ends = ~in_drawdown & in_drawdown.shift(1, fill_value=False)
127
+
128
+ # Calculate duration of worst drawdown
129
+ max_dd = drawdown.min()
130
+ max_dd_idx = drawdown.idxmin()
131
+
132
+ # Find the peak before the max drawdown
133
+ peak_idx = equity[:max_dd_idx].idxmax()
134
+ dd_duration = len(equity[peak_idx:max_dd_idx])
135
+
136
+ # Find recovery point after max drawdown
137
+ post_dd = equity[max_dd_idx:]
138
+ peak_val = equity[peak_idx]
139
+ recovered = post_dd[post_dd >= peak_val]
140
+ recovery_time = len(post_dd[:recovered.index[0]]) if len(recovered) > 0 else -1
141
+
142
+ return DrawdownAnalysis(
143
+ max_drawdown=max_dd,
144
+ max_drawdown_duration=dd_duration,
145
+ max_recovery_time=recovery_time,
146
+ avg_drawdown=float(drawdown[drawdown < 0].mean()) if (drawdown < 0).any() else 0.0,
147
+ drawdown_series=drawdown,
148
+ )
149
+ ```
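A small worked example makes the drawdown definition concrete. The sketch below is a pure-Python loop, not the pandas implementation above, and differs in one deliberate assumption: the running peak starts at the initial capital of 1.0, so a loss on the very first bar counts as a drawdown. The function name and return values are hypothetical.

```python
def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve (negative fraction)."""
    equity = 1.0
    peak = 1.0  # assumption: the peak starts at initial capital
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

# Equity path: 1.10 -> 0.88 -> 0.924 -> 1.2012; the trough is 20% below the 1.10 peak.
example = [0.10, -0.20, 0.05, 0.30]
mdd = max_drawdown(example)

# A loss on the first bar is measured against the starting capital.
first_bar_loss = max_drawdown([-0.10, 0.05])
```

The second call returns a 10% drawdown. A running maximum that starts at the first equity value would treat 0.90 as the initial peak and report a smaller figure for the same series.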
150
+
151
+ ### Trade-Level Statistics
152
+
+ ```python
+ # metrics/trade_stats.py
+ import numpy as np
+ from dataclasses import dataclass
+
+ @dataclass
+ class TradeStatistics:
+     """Statistics computed from individual trade P&L records."""
+     total_trades: int
+     winning_trades: int
+     losing_trades: int
+     win_rate: float       # Fraction of winning trades
+     avg_win: float        # Average winning trade P&L
+     avg_loss: float       # Average losing trade P&L (negative)
+     profit_factor: float  # Gross profit / gross loss
+     expectancy: float     # Expected P&L per trade
+     payoff_ratio: float   # avg_win / abs(avg_loss)
+     max_consecutive_wins: int
+     max_consecutive_losses: int
+
+ def compute_trade_statistics(trade_pnls: list[float]) -> TradeStatistics:
+     """Compute comprehensive trade-level statistics from P&L list."""
+     pnls = np.array(trade_pnls, dtype=float)
+     # Break-even trades (P&L == 0) count toward the total but are neither wins nor losses
+     wins = pnls[pnls > 0]
+     losses = pnls[pnls < 0]
+
+     total = len(pnls)
+     win_count = len(wins)
+     loss_count = len(losses)
+     win_rate = win_count / total if total > 0 else 0.0
+
+     avg_win = float(wins.mean()) if win_count > 0 else 0.0
+     avg_loss = float(losses.mean()) if loss_count > 0 else 0.0
+
+     gross_profit = float(wins.sum()) if win_count > 0 else 0.0
+     gross_loss = float(abs(losses.sum())) if loss_count > 0 else 0.0
+     # Unbounded when there are profits but no losses; 0.0 when there is nothing to divide
+     if gross_loss > 0:
+         profit_factor = gross_profit / gross_loss
+     else:
+         profit_factor = float("inf") if gross_profit > 0 else 0.0
+
+     expectancy = float(pnls.mean()) if total > 0 else 0.0
+     if avg_loss != 0:
+         payoff_ratio = avg_win / abs(avg_loss)
+     else:
+         payoff_ratio = float("inf") if avg_win > 0 else 0.0
+
+     # Consecutive wins/losses
+     max_consec_wins = _max_consecutive(pnls > 0)
+     max_consec_losses = _max_consecutive(pnls < 0)
+
+     return TradeStatistics(
+         total_trades=total,
+         winning_trades=win_count,
+         losing_trades=loss_count,
+         win_rate=win_rate,
+         avg_win=avg_win,
+         avg_loss=avg_loss,
+         profit_factor=profit_factor,
+         expectancy=expectancy,
+         payoff_ratio=payoff_ratio,
+         max_consecutive_wins=max_consec_wins,
+         max_consecutive_losses=max_consec_losses,
+     )
+
+ def _max_consecutive(mask: np.ndarray) -> int:
+     """Count the longest consecutive True run in a boolean array."""
+     if len(mask) == 0:
+         return 0
+     max_run = 0
+     current_run = 0
+     for val in mask:
+         if val:
+             current_run += 1
+             max_run = max(max_run, current_run)
+         else:
+             current_run = 0
+     return max_run
+ ```
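As a rough worked example of how these statistics relate to each other, the sketch below checks that expectancy equals `win_rate * avg_win + loss_rate * avg_loss` and derives the break-even win rate implied by the payoff ratio. The P&L values are hypothetical and chosen only for illustration, and the identity assumes there are no break-even trades.

```python
import numpy as np

# Hypothetical per-trade P&L values (illustration only, not real results)
trade_pnls = np.array([120.0, -50.0, 80.0, -30.0, 200.0, -70.0, 40.0, -20.0])

wins = trade_pnls[trade_pnls > 0]
losses = trade_pnls[trade_pnls < 0]

win_rate = len(wins) / len(trade_pnls)
loss_rate = len(losses) / len(trade_pnls)
avg_win = wins.mean()
avg_loss = losses.mean()  # negative by construction

# Expectancy computed two ways: directly, and from win rate and average sizes
expectancy_direct = trade_pnls.mean()
expectancy_decomposed = win_rate * avg_win + loss_rate * avg_loss

profit_factor = wins.sum() / abs(losses.sum())
payoff_ratio = avg_win / abs(avg_loss)

# Win rate at which expectancy is exactly zero for this payoff ratio
breakeven_win_rate = 1.0 / (1.0 + payoff_ratio)

print(f"expectancy: {expectancy_direct:.2f}")
print(f"profit factor: {profit_factor:.2f}, payoff ratio: {payoff_ratio:.2f}")
print(f"break-even win rate: {breakeven_win_rate:.1%}")
```

The break-even win rate is a useful companion to the raw win rate: a strategy with a high payoff ratio can be profitable with well under half of its trades winning.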
+
+ ### Alpha/Beta Decomposition
+
+ ```python
+ # metrics/factor.py
+ import numpy as np
+ import pandas as pd
+ from dataclasses import dataclass
+
+ @dataclass
+ class FactorDecomposition:
+     """Alpha/beta decomposition against a benchmark."""
+     alpha: float         # Annualized excess return not explained by benchmark
+     beta: float          # Sensitivity to benchmark returns
+     r_squared: float     # Fraction of variance explained by benchmark
+     residual_vol: float  # Annualized volatility of unexplained returns
+
+ def decompose_returns(
+     strategy_returns: pd.Series,
+     benchmark_returns: pd.Series,
+     risk_free_rate: float = 0.04,
+     periods_per_year: int = 252,
+ ) -> FactorDecomposition:
+     """
+     Decompose strategy returns into alpha and beta components.
+
+     Uses OLS regression on excess returns:
+     (R_strategy - R_f) = alpha + beta * (R_benchmark - R_f) + epsilon
+     """
+     aligned = pd.concat(
+         [strategy_returns, benchmark_returns], axis=1, keys=["strat", "bench"]
+     ).dropna()
+
+     # Per-period risk-free rate (daily when periods_per_year=252)
+     rf_daily = risk_free_rate / periods_per_year
+     excess_strat = aligned["strat"] - rf_daily
+     excess_bench = aligned["bench"] - rf_daily
+
+     # np.cov defaults to ddof=1, so the variance must use ddof=1 as well
+     beta = float(
+         np.cov(excess_strat, excess_bench)[0, 1]
+         / excess_bench.var(ddof=1)
+     )
+
+     alpha_daily = float(excess_strat.mean() - beta * excess_bench.mean())
+     alpha_annual = alpha_daily * periods_per_year
+
+     # Residuals of the full fit: subtract the intercept as well as the beta term
+     residuals = excess_strat - (alpha_daily + beta * excess_bench)
+     ss_res = float((residuals**2).sum())
+     ss_tot = float(((excess_strat - excess_strat.mean()) ** 2).sum())
+     r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
+
+     residual_vol = float(residuals.std() * np.sqrt(periods_per_year))
+
+     return FactorDecomposition(
+         alpha=alpha_annual,
+         beta=beta,
+         r_squared=r_squared,
+         residual_vol=residual_vol,
+     )
+ ```
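One way to sanity-check the covariance-based estimate is to compare it with an ordinary least-squares line fit on synthetic data where the true alpha and beta are known. The sketch below is illustrative only: the return series are simulated, the parameter values (`true_alpha`, `true_beta`, the volatilities) are assumptions, and the risk-free rate is omitted for brevity since subtracting a constant from both series does not change beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000

# Simulated per-period returns with a known linear relationship (assumed values)
true_alpha, true_beta = 0.0002, 0.8
bench = rng.normal(0.0004, 0.01, size=n)
strat = true_alpha + true_beta * bench + rng.normal(0.0, 0.005, size=n)

# Covariance / variance form; both use ddof=1 so the normalizations cancel
beta_cov = np.cov(strat, bench)[0, 1] / np.var(bench, ddof=1)
alpha_cov = strat.mean() - beta_cov * bench.mean()

# Reference: degree-1 least-squares fit returns [slope, intercept]
slope, intercept = np.polyfit(bench, strat, deg=1)

# R-squared from residuals of the full fit (intercept included)
residuals = strat - (alpha_cov + beta_cov * bench)
r_squared = 1 - (residuals**2).sum() / ((strat - strat.mean()) ** 2).sum()
corr = np.corrcoef(strat, bench)[0, 1]

print(f"beta: cov/var={beta_cov:.4f}, polyfit={slope:.4f}")
print(f"alpha: cov/var={alpha_cov:.6f}, polyfit={intercept:.6f}")
print(f"R^2={r_squared:.4f}, corr^2={corr**2:.4f}")
```

For a single-factor regression with an intercept, R-squared equals the squared correlation between the two series, which gives a second independent check on the residual calculation.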
+
+ ### Bootstrap Confidence Intervals
+
+ A point estimate alone hides its sampling error. Always report a confidence interval alongside it:
+
+ ```python
+ # metrics/bootstrap.py
+ import numpy as np
+
+ def bootstrap_sharpe_ci(
+     returns: np.ndarray,
+     n_bootstrap: int = 10_000,
+     confidence: float = 0.95,
+     periods_per_year: int = 252,
+ ) -> tuple[float, float, float]:
+     """
+     Bootstrap confidence interval for the Sharpe ratio.
+
+     Returns are used as given (no risk-free rate is subtracted here).
+     Resampling single observations treats returns as i.i.d.; strongly
+     autocorrelated returns call for a block bootstrap instead.
+
+     Returns:
+         Tuple of (point_estimate, ci_lower, ci_upper).
+     """
+     returns = np.asarray(returns, dtype=float)
+     n = len(returns)
+     if n < 2:
+         raise ValueError("need at least two returns to bootstrap a Sharpe ratio")
+
+     rng = np.random.default_rng()
+     annualize = np.sqrt(periods_per_year)
+     sharpes = np.empty(n_bootstrap)
+
+     for i in range(n_bootstrap):
+         sample = rng.choice(returns, size=n, replace=True)
+         if sample.std() > 0:
+             sharpes[i] = sample.mean() / sample.std() * annualize
+         else:
+             sharpes[i] = 0.0
+
+     alpha = (1 - confidence) / 2
+     ci_lower = float(np.percentile(sharpes, alpha * 100))
+     ci_upper = float(np.percentile(sharpes, (1 - alpha) * 100))
+     # Point estimate is the Sharpe ratio of the observed sample,
+     # not the mean of the bootstrap replicates
+     std = returns.std()
+     point = float(returns.mean() / std * annualize) if std > 0 else 0.0
+
+     return point, ci_lower, ci_upper
+ ```
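The percentile bootstrap above can also be written without the Python loop by drawing all resample indices at once. The following is a minimal sketch under stated assumptions: `bootstrap_sharpe_ci_vectorized` and its `seed` parameter are hypothetical names introduced here for reproducibility, and the returns are simulated rather than taken from any real strategy.

```python
import numpy as np

def bootstrap_sharpe_ci_vectorized(
    returns,
    n_bootstrap: int = 2_000,
    confidence: float = 0.95,
    periods_per_year: int = 252,
    seed: int = 0,  # hypothetical parameter, added so runs are repeatable
) -> tuple[float, float, float]:
    """Percentile-bootstrap CI for the Sharpe ratio (i.i.d. resampling)."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    rng = np.random.default_rng(seed)
    annualize = np.sqrt(periods_per_year)

    # One row of resampled indices per bootstrap replicate
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    samples = returns[idx]
    means = samples.mean(axis=1)
    stds = samples.std(axis=1)

    # Replicates with zero volatility keep a Sharpe of 0.0
    sharpes = np.zeros(n_bootstrap)
    valid = stds > 0
    sharpes[valid] = means[valid] / stds[valid] * annualize

    tail = (1 - confidence) / 2
    lower, upper = np.percentile(sharpes, [tail * 100, (1 - tail) * 100])

    std = returns.std()
    point = returns.mean() / std * annualize if std > 0 else 0.0
    return float(point), float(lower), float(upper)

# Two years of simulated daily returns (assumed mean and volatility)
sim = np.random.default_rng(42).normal(0.0005, 0.01, size=504)
point, lo, hi = bootstrap_sharpe_ci_vectorized(sim)
print(f"Sharpe {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The index matrix holds `n_bootstrap * n` integers, so very long return series or very large replicate counts may need to be processed in batches. With only a couple of years of daily data the interval is typically wide, which is the practical argument for reporting it at all.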
+
+ ### Metrics Interpretation Guide
+
+ | Metric | Excellent | Good | Marginal | Poor |
+ |--------|-----------|------|----------|------|
+ | Sharpe ratio | > 2.0 | 1.0 - 2.0 | 0.5 - 1.0 | < 0.5 |
+ | Sortino ratio | > 3.0 | 1.5 - 3.0 | 0.7 - 1.5 | < 0.7 |
+ | Calmar ratio | > 3.0 | 1.0 - 3.0 | 0.5 - 1.0 | < 0.5 |
+ | Max drawdown | < 10% | 10% - 20% | 20% - 30% | > 30% |
+ | Profit factor | > 2.0 | 1.5 - 2.0 | 1.0 - 1.5 | < 1.0 |
+ | Win rate | > 55% | 45% - 55% | 35% - 45% | < 35% |
+ | Information ratio | > 1.0 | 0.5 - 1.0 | 0.2 - 0.5 | < 0.2 |
+
+ Caution: These thresholds assume daily-frequency strategies. Higher-frequency strategies typically achieve higher Sharpe ratios at lower capacity, while lower-frequency strategies typically achieve lower Sharpe ratios at higher capacity.
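The bands in the table can be applied mechanically, for example when annotating a report. The sketch below is one possible encoding, not part of the package: `classify_metric` and the dictionary names are hypothetical, values that fall exactly on a shared boundary are assigned to the better band (the table does not say which side owns the edge), and max drawdown is assumed to be passed as a positive fraction.

```python
# Band edges copied from the table: (marginal_floor, good_floor, excellent_floor)
HIGHER_IS_BETTER = {
    "sharpe": (0.5, 1.0, 2.0),
    "sortino": (0.7, 1.5, 3.0),
    "calmar": (0.5, 1.0, 3.0),
    "profit_factor": (1.0, 1.5, 2.0),
    "win_rate": (0.35, 0.45, 0.55),
    "information_ratio": (0.2, 0.5, 1.0),
}

# Max drawdown as a positive fraction (assumption); lower is better
MAX_DRAWDOWN_EDGES = (0.10, 0.20, 0.30)

def classify_metric(metric: str, value: float) -> str:
    """Map a metric value to a band from the interpretation table."""
    if metric == "max_drawdown":
        excellent, good, marginal = MAX_DRAWDOWN_EDGES
        if value < excellent:
            return "Excellent"
        if value <= good:
            return "Good"
        if value <= marginal:
            return "Marginal"
        return "Poor"

    marginal, good, excellent = HIGHER_IS_BETTER[metric]
    if value > excellent:
        return "Excellent"
    if value >= good:
        return "Good"
    if value >= marginal:
        return "Marginal"
    return "Poor"

for name, val in [("sharpe", 1.4), ("max_drawdown", 0.25), ("win_rate", 0.5)]:
    print(f"{name}={val}: {classify_metric(name, val)}")
```

Because the thresholds assume daily-frequency strategies, a label from this lookup is a starting point for discussion rather than a verdict, especially for strategies trading at other frequencies.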