PyPI - alpha-engine-lib - Versions diffs - 0.46.0__tar.gz → 0.47.0__tar.gz - Mend

alpha-engine-lib 0.46.0 (tar.gz) → 0.47.0 (tar.gz)

Files changed (95)

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: alpha-engine-lib
-Version: 0.46.0
+Version: 0.47.0
 Summary: Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README.
 Author: Brian McMahon
 License: Proprietary
@@ -16,6 +16,10 @@ Requires-Dist: arcticdb>=6.11; extra == "arcticdb"
 Requires-Dist: pandas>=2.0; extra == "arcticdb"
 Provides-Extra: quant
 Requires-Dist: numpy>=1.24; extra == "quant"
+Provides-Extra: quant-xs
+Requires-Dist: numpy>=1.24; extra == "quant-xs"
+Requires-Dist: pandas>=2.0; extra == "quant-xs"
+Requires-Dist: scikit-learn>=1.0; extra == "quant-xs"
 Provides-Extra: flow-doctor
 Requires-Dist: flow-doctor[diagnosis,s3]<0.5.0,>=0.4.0; extra == "flow-doctor"
 Provides-Extra: rag
@@ -254,7 +258,8 @@ Rotates across `(instance_type × subnet)` combinations on `InsufficientInstance
 The shared institutional-analytics engine: pure, front-end- and data-source-agnostic functions that *describe and measure* a portfolio (performance, risk, attribution) with **no advisory logic** — it sits on the "analytics, not advice" side of the line. Lifted from robodashboard's `analytics/` after the 2026-06-03 cross-repo leverage audit, so both the alpha-engine fleet and robodashboard consume one engine instead of parallel reimplementations. Import the submodule you need (the package keeps no eager imports, so the stdlib-only modules import without numpy):
-- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`: `estimate_factor_model` (time-series factor-ETF / Fama-MacBeth loadings), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`, **Option B** (time-series factor-ETF estimator): `estimate_factor_model` (regress holdings on given factor return series), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk_xs`** — same `Σ = B·F·Bᵀ + D` model, **Option A** (universe-wide cross-sectional Fama-MacBeth estimator): take *exogenous* per-ticker loadings `B` and infer factor returns `f_t` via a cross-sectional OLS at each date → `F`/`D` (`build_factor_risk_model`, `cross_sectional_factor_returns`, `estimate_factor_covariance`, `estimate_idiosyncratic_variance`). **Needs pandas + scikit-learn** — `pip install "alpha-engine-lib[quant-xs]"` (kept separate so numpy-only consumers stay light).
 - **`quant.risk_measures`** — parametric (Gaussian, Acklam inverse-normal, no scipy) + historical VaR & CVaR, as positive loss fractions at a horizon (stdlib).
 - **`quant.riskstats`** — `volatility`, `sharpe_ratio`, `sortino_ratio`, `max_drawdown` (stdlib).
 - **`quant.returns`** — `xirr` (money-weighted, Newton + bisection), `time_weighted_return` (GIPS), `cumulative_return`, `annualize` (stdlib).
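As a rough illustration of the `Σ = B·F·Bᵀ + D` structure that both estimators feed, the sketch below computes a toy portfolio's ex-ante variance and its factor/idiosyncratic split. It is a stdlib-only sketch under stated assumptions, not the library's `portfolio_risk`: the function name `portfolio_variance_split` and every number in it are hypothetical.

```python
import math


def portfolio_variance_split(weights, B, F, D):
    """Return (total_var, factor_var, idio_var) for w' (B F B' + D) w.

    weights: length-N list; B: N x K nested list; F: K x K nested list;
    D: length-N list holding the diagonal of the idiosyncratic matrix.
    (Hypothetical helper for illustration only.)
    """
    n, k = len(B), len(F)
    # Portfolio factor exposure x = B' w (length K).
    x = [sum(B[i][j] * weights[i] for i in range(n)) for j in range(k)]
    # Factor variance x' F x.
    factor_var = sum(x[a] * F[a][b] * x[b] for a in range(k) for b in range(k))
    # Idiosyncratic variance sum_i w_i^2 * D_ii.
    idio_var = sum(weights[i] ** 2 * D[i] for i in range(n))
    return factor_var + idio_var, factor_var, idio_var


# Two assets, one factor (all values made up).
weights = [0.5, 0.5]
B = [[1.0], [0.5]]
F = [[0.04]]
D = [0.01, 0.02]

total, factor, idio = portfolio_variance_split(weights, B, F, D)
vol = math.sqrt(total)  # ex-ante volatility in the same units as the returns
```

Because `D` is diagonal, the idiosyncratic term needs only the squared weights, which is why the split into factor and idiosyncratic variance is cheap once `B`, `F`, and `D` are known.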

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/README.md RENAMED

@@ -223,7 +223,8 @@ Rotates across `(instance_type × subnet)` combinations on `InsufficientInstance
 The shared institutional-analytics engine: pure, front-end- and data-source-agnostic functions that *describe and measure* a portfolio (performance, risk, attribution) with **no advisory logic** — it sits on the "analytics, not advice" side of the line. Lifted from robodashboard's `analytics/` after the 2026-06-03 cross-repo leverage audit, so both the alpha-engine fleet and robodashboard consume one engine instead of parallel reimplementations. Import the submodule you need (the package keeps no eager imports, so the stdlib-only modules import without numpy):
-- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`: `estimate_factor_model` (time-series factor-ETF / Fama-MacBeth loadings), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`, **Option B** (time-series factor-ETF estimator): `estimate_factor_model` (regress holdings on given factor return series), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk_xs`** — same `Σ = B·F·Bᵀ + D` model, **Option A** (universe-wide cross-sectional Fama-MacBeth estimator): take *exogenous* per-ticker loadings `B` and infer factor returns `f_t` via a cross-sectional OLS at each date → `F`/`D` (`build_factor_risk_model`, `cross_sectional_factor_returns`, `estimate_factor_covariance`, `estimate_idiosyncratic_variance`). **Needs pandas + scikit-learn** — `pip install "alpha-engine-lib[quant-xs]"` (kept separate so numpy-only consumers stay light).
 - **`quant.risk_measures`** — parametric (Gaussian, Acklam inverse-normal, no scipy) + historical VaR & CVaR, as positive loss fractions at a horizon (stdlib).
 - **`quant.riskstats`** — `volatility`, `sharpe_ratio`, `sortino_ratio`, `max_drawdown` (stdlib).
 - **`quant.returns`** — `xirr` (money-weighted, Newton + bisection), `time_weighted_return` (GIPS), `cumulative_return`, `annualize` (stdlib).
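The `D` term in the `quant.factor_risk_xs` entry is a per-ticker residual variance. The added module's docstrings describe it as a population variance (`ddof=0`) that is emitted as NaN when a ticker has fewer than `min_obs` usable residuals. A minimal stdlib sketch of that rule follows; `idiosyncratic_variance` is a hypothetical stand-in for one ticker, not the library function, and the data are made up.

```python
import math
import statistics


def idiosyncratic_variance(residuals, min_obs=30):
    """Population variance of the non-NaN residuals, or NaN below the floor.

    Hypothetical single-ticker helper for illustration only.
    """
    clean = [e for e in residuals if not math.isnan(e)]
    if len(clean) < min_obs:
        return math.nan  # surfaced as NaN rather than a silent zero
    return statistics.pvariance(clean)  # N divisor, i.e. ddof=0


short_history = [0.01, -0.02, math.nan, 0.005]  # 3 usable observations
long_history = [0.01, -0.01] * 20               # 40 observations, mean 0

d_short = idiosyncratic_variance(short_history)  # below the floor
d_long = idiosyncratic_variance(long_history)    # enough history
```

Returning NaN instead of zero keeps a thin-history ticker from looking riskless to whatever consumes `D` downstream.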

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "alpha-engine-lib"
-version = "0.46.0"
+version = "0.47.0"
 description = "Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README."
 readme = "README.md"
 # EC2 still runs Python 3.9 on the always-on micro instance (boto3 drops
@@ -34,6 +34,11 @@ arcticdb = ["arcticdb>=6.11", "pandas>=2.0"]
 # factor-risk module needs numpy; the VaR/CVaR, riskstats, returns, and
 # attribution modules are pure stdlib and import without this extra.
 quant = ["numpy>=1.24"]
+# Cross-sectional (Fama-MacBeth) factor risk model — quant.factor_risk_xs.
+# Needs pandas (always) + scikit-learn (LedoitWolf/OAS shrinkage). Kept
+# separate from [quant] so the numpy-only consumers (e.g. robodashboard)
+# don't pull pandas+sklearn.
+quant-xs = ["numpy>=1.24", "pandas>=2.0", "scikit-learn>=1.0"]
 flow_doctor = ["flow-doctor[diagnosis,s3]>=0.4.0,<0.5.0"]
 rag = [
     "psycopg2-binary>=2.9",

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/src/alpha_engine_lib/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """alpha-engine-lib — shared utilities for Alpha Engine modules."""
-__version__ = "0.46.0"
+__version__ = "0.47.0"

alpha_engine_lib-0.47.0/src/alpha_engine_lib/quant/factor_risk_xs.py ADDED

@@ -0,0 +1,332 @@
+"""Cross-sectional (Fama-MacBeth) factor risk model — the "Option A" estimator.
+Complements ``quant.factor_risk`` (the "Option B" time-series factor-ETF
+estimator). Both produce the inputs to the same Σ = B·F·Bᵀ + D structural
+covariance consumed by ``quant.factor_risk.portfolio_risk`` / ``tracking_error``;
+they differ only in how the factor returns ``f_t`` and the factor covariance
+``F`` are estimated:
+  - **Option B** (``factor_risk.estimate_factor_model``) — regress each holding's
+    return series on a small set of *given* factor return series (market +
+    style-ETF spreads). Loadings ``B`` are the regression betas. numpy-only.
+  - **Option A** (here) — take *exogenous* per-ticker factor loadings ``B`` (e.g.
+    fundamentals-derived style exposures) and infer the factor returns ``f_t`` by
+    a cross-sectional OLS at each date (Fama-MacBeth 1973):
+        r_t = B_{t-1} · f_t + ε_t
+    Stacking ``f_t`` over a rolling window gives a (T × K) factor-return panel;
+    ``F`` is its (Ledoit-Wolf-shrunk) covariance and ``D`` the per-ticker
+    time-series variance of the residuals ε. This is the universe-wide Barra-lite
+    build.
+**Dependencies:** pandas (always) + scikit-learn (lazy, only for the
+``ledoit_wolf``/``oas`` shrinkage estimators). Install ``alpha-engine-lib[quant-xs]``.
+Kept in its own module so the numpy-only ``factor_risk``/``risk_measures``/etc.
+consumers don't pull pandas+sklearn.
+References:
+  - Fama & MacBeth 1973 "Risk, Return, and Equilibrium: Empirical Tests"
+    (JPE 81(3)) — cross-sectional-regression construction of factor returns
+  - Grinold & Kahn 2000, _Active Portfolio Management_, Ch. 3 — canonical
+    structural factor risk model
+  - Menchero, Orr & Wang 2011 "The Barra US Equity Model (USE4)
+    Methodology Notes" — operational reference
+"""
+from __future__ import annotations
+import logging
+from typing import Iterable
+import numpy as np
+import pandas as pd
+log = logging.getLogger(__name__)
+_MIN_OBS_OVER_K = 5  # require ≥ K + 5 valid observations for a stable regression
+def cross_sectional_factor_returns(
+    returns_t: np.ndarray,
+    loadings_prev: np.ndarray,
+    *,
+    include_intercept: bool = True,
+) -> tuple[np.ndarray, np.ndarray]:
+    """Solve r_t = B_{t-1} · f_t + ε_t for one date via OLS.
+    Args:
+        returns_t: (N,) realized returns at time t.
+        loadings_prev: (N, K) factor loadings at time t-1.
+        include_intercept: if True, prepends a column of 1s to the
+            loadings (the "market" factor return). f_t[0] becomes the
+            cross-sectional mean return; f_t[1:] are the per-factor
+            slopes. Default True.
+    Returns:
+        (f_t, residuals):
+          • f_t: (K_eff,) factor return vector — length K+1 with
+            intercept, K without.
+          • residuals: (N,) per-ticker ε_t. NaN for rows where the
+            inputs had NaN (preserved positionally so the caller can
+            keep aligning with the universe).
+    Rows with NaN in either r_t or any column of B_{t-1} are excluded
+    from the regression. If fewer than K_eff + 5 valid rows remain
+    (the regression is unstable), returns all-NaN for both outputs.
+    """
+    returns_t = np.asarray(returns_t, dtype=np.float64).ravel()
+    loadings_prev = np.asarray(loadings_prev, dtype=np.float64)
+    if loadings_prev.ndim != 2:
+        raise ValueError(
+            f"loadings_prev must be 2-D (N × K); got shape {loadings_prev.shape}"
+        )
+    N, K = loadings_prev.shape
+    if returns_t.shape != (N,):
+        raise ValueError(
+            f"returns_t shape {returns_t.shape} != ({N},) matching loadings rows"
+        )
+    if include_intercept:
+        B = np.column_stack([np.ones(N), loadings_prev])
+        K_eff = K + 1
+    else:
+        B = loadings_prev
+        K_eff = K
+    valid = np.isfinite(returns_t) & np.all(np.isfinite(B), axis=1)
+    n_valid = int(valid.sum())
+    if n_valid < K_eff + _MIN_OBS_OVER_K:
+        return np.full(K_eff, np.nan), np.full(N, np.nan)
+    r_valid = returns_t[valid]
+    B_valid = B[valid]
+    # OLS via lstsq is rank-robust (returns the minimum-norm solution if B
+    # is rank-deficient). Rank-deficient B is tolerated silently rather than
+    # raised as an error; the caller decides whether to drop low-rank dates.
+    f_t, *_ = np.linalg.lstsq(B_valid, r_valid, rcond=None)
+    residuals = np.full(N, np.nan)
+    residuals[valid] = r_valid - B_valid @ f_t
+    return f_t, residuals
+def build_factor_returns_series(
+    returns_panel: pd.DataFrame,
+    loadings_by_date: dict[pd.Timestamp, pd.DataFrame],
+    *,
+    include_intercept: bool = True,
+    factor_names: Iterable[str] | None = None,
+) -> tuple[pd.DataFrame, pd.DataFrame]:
+    """Loop over dates in ``returns_panel``; for each date t, run the
+    cross-sectional regression r_t = B_{t-1} · f_t + ε_t.
+    Args:
+        returns_panel: (T × N) DataFrame indexed by date, columns are
+            ticker names. r_t is the t-th row.
+        loadings_by_date: mapping date_t-1 → (N × K) DataFrame of
+            factor loadings for that date. Indexed by ticker, columns
+            are factor names. The driver looks up loadings at the
+            previous available date for each t (most recent ≤ t-1).
+        include_intercept: prepends a market-factor column. See
+            cross_sectional_factor_returns. Default True.
+        factor_names: optional explicit order for the K factor columns.
+            If provided, loadings_by_date entries are reindexed to this
+            order. Default: use the order of the first loadings frame.
+    Returns:
+        (factor_returns_df, residuals_df):
+          • factor_returns_df: (T × K_eff) — index matches returns_panel
+            dates; columns are ["market", *factor_names] when intercept
+            is on, [*factor_names] when off.
+          • residuals_df: (T × N) — same shape as returns_panel; NaN
+            where the regression was skipped or input was missing.
+    """
+    if returns_panel.empty:
+        return pd.DataFrame(), pd.DataFrame()
+    dates = list(returns_panel.index)
+    tickers = list(returns_panel.columns)
+    N = len(tickers)
+    # Resolve canonical factor name list from the first usable loadings frame
+    if factor_names is None:
+        sample = next(iter(loadings_by_date.values()), None)
+        if sample is None:
+            raise ValueError("loadings_by_date is empty — nothing to regress against")
+        factor_names = list(sample.columns)
+    factor_names = list(factor_names)
+    K = len(factor_names)
+    col_names = (["market"] + factor_names) if include_intercept else factor_names
+    f_panel = np.full((len(dates), len(col_names)), np.nan)
+    eps_panel = np.full((len(dates), N), np.nan)
+    sorted_loading_dates = sorted(loadings_by_date.keys())
+    for i, date_t in enumerate(dates):
+        prev_date = _latest_loading_date_at_or_before(sorted_loading_dates, date_t)
+        if prev_date is None:
+            continue
+        B_df = loadings_by_date[prev_date].reindex(index=tickers, columns=factor_names)
+        if B_df.empty:
+            continue
+        B = B_df.to_numpy(dtype=np.float64)
+        r = returns_panel.iloc[i].to_numpy(dtype=np.float64)
+        f_t, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=include_intercept,
+        )
+        f_panel[i] = f_t
+        eps_panel[i] = residuals
+    factor_returns_df = pd.DataFrame(f_panel, index=dates, columns=col_names)
+    residuals_df = pd.DataFrame(eps_panel, index=dates, columns=tickers)
+    return factor_returns_df, residuals_df
+def _latest_loading_date_at_or_before(
+    sorted_dates: list[pd.Timestamp], cutoff: pd.Timestamp,
+) -> pd.Timestamp | None:
+    """Bisect for the latest loading date strictly < cutoff, i.e. a loading dated
+    exactly at cutoff is excluded despite the name (at date t we only know t-1 loadings)."""
+    import bisect
+    idx = bisect.bisect_left(sorted_dates, cutoff)
+    if idx == 0:
+        return None
+    return sorted_dates[idx - 1]
+def estimate_factor_covariance(
+    factor_returns_df: pd.DataFrame,
+    *,
+    shrinkage: str = "ledoit_wolf",
+    min_obs: int = 30,
+) -> pd.DataFrame:
+    """Estimate F = Cov(f_t) over the factor-return panel.
+    Drops rows with any NaN (incomplete regressions). Default LW shrinkage
+    mirrors the executor's portfolio_optimizer default; "sample" and "oas"
+    also supported. Reuses sklearn estimators.
+    Args:
+        factor_returns_df: (T × K_eff) factor-return panel from
+            build_factor_returns_series.
+        shrinkage: estimator name. "ledoit_wolf" (default), "sample", "oas".
+        min_obs: minimum clean rows required. Below floor returns an
+            all-NaN F so the caller knows the build was insufficient
+            (per no-silent-fails — would-be downstream consumers of F
+            see NaN, not silently zero).
+    Returns:
+        F: (K_eff × K_eff) DataFrame, index + columns are factor names.
+    """
+    clean = factor_returns_df.dropna()
+    K = factor_returns_df.shape[1]
+    cols = list(factor_returns_df.columns)
+    if len(clean) < min_obs:
+        log.warning(
+            "estimate_factor_covariance: only %d clean rows (need ≥%d) — "
+            "returning all-NaN F", len(clean), min_obs,
+        )
+        return pd.DataFrame(np.full((K, K), np.nan), index=cols, columns=cols)
+    if shrinkage == "ledoit_wolf":
+        from sklearn.covariance import LedoitWolf
+        F = LedoitWolf().fit(clean.to_numpy()).covariance_
+    elif shrinkage == "oas":
+        from sklearn.covariance import OAS
+        F = OAS().fit(clean.to_numpy()).covariance_
+    elif shrinkage == "sample":
+        F = np.cov(clean.to_numpy(), rowvar=False)
+    else:
+        raise ValueError(f"Unknown shrinkage: {shrinkage!r}")
+    return pd.DataFrame(F, index=cols, columns=cols)
+def estimate_idiosyncratic_variance(
+    residuals_df: pd.DataFrame,
+    *,
+    min_obs: int = 30,
+) -> pd.Series:
+    """Per-ticker D_{ii} = Var(ε_{i,t}) — diagonal of the residual cov.
+    Tickers with fewer than ``min_obs`` non-NaN residual rows are
+    emitted as NaN per no-silent-fails (downstream Σ = B·F·Bᵀ + D
+    construction treats NaN D as "skip this name" or falls back to a
+    safe default).
+    Args:
+        residuals_df: (T × N) residual panel from
+            build_factor_returns_series.
+        min_obs: minimum non-NaN observations per ticker.
+    Returns:
+        D: (N,) Series indexed by ticker.
+    """
+    out = pd.Series(np.nan, index=residuals_df.columns, dtype=np.float64)
+    for ticker in residuals_df.columns:
+        eps = residuals_df[ticker].dropna()
+        if len(eps) < min_obs:
+            continue
+        # Population variance (N divisor — universe is the population for
+        # cross-sectional regressions) to match the F estimator convention.
+        out[ticker] = float(eps.var(ddof=0))
+    return out
+def build_factor_risk_model(
+    returns_panel: pd.DataFrame,
+    loadings_by_date: dict[pd.Timestamp, pd.DataFrame],
+    *,
+    include_intercept: bool = True,
+    cov_shrinkage: str = "ledoit_wolf",
+    min_cov_obs: int = 30,
+    min_idio_obs: int = 30,
+) -> dict:
+    """End-to-end builder: cross-sectional regressions → F + D.
+    Returns a dict with keys:
+      • "factor_returns": (T × K_eff) DataFrame
+      • "residuals": (T × N) DataFrame
+      • "F": (K_eff × K_eff) DataFrame
+      • "D": (N,) Series
+      • "metadata": dict with n_dates, n_clean_dates, K_eff, n_tickers, cov_shrinkage, include_intercept
+    """
+    factor_returns, residuals = build_factor_returns_series(
+        returns_panel, loadings_by_date,
+        include_intercept=include_intercept,
+    )
+    F = estimate_factor_covariance(
+        factor_returns, shrinkage=cov_shrinkage, min_obs=min_cov_obs,
+    )
+    D = estimate_idiosyncratic_variance(residuals, min_obs=min_idio_obs)
+    n_clean = int(factor_returns.dropna().shape[0])
+    metadata = {
+        "n_dates": int(factor_returns.shape[0]),
+        "n_clean_dates": n_clean,
+        "K_eff": int(factor_returns.shape[1]),
+        "n_tickers": int(returns_panel.shape[1]),
+        "cov_shrinkage": cov_shrinkage,
+        "include_intercept": bool(include_intercept),
+    }
+    return {
+        "factor_returns": factor_returns,
+        "residuals": residuals,
+        "F": F,
+        "D": D,
+        "metadata": metadata,
+    }
+__all__ = [
+    "cross_sectional_factor_returns",
+    "build_factor_returns_series",
+    "estimate_factor_covariance",
+    "estimate_idiosyncratic_variance",
+    "build_factor_risk_model",
+]
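To make the per-date step of the added module concrete, the sketch below pairs the strictly-before loading lookup with a cross-sectional regression for a single factor plus intercept. It is a stdlib-only approximation under stated assumptions: the closed-form one-factor OLS stands in for the module's `np.linalg.lstsq` solve, the names `latest_strictly_before` and `one_factor_ols` are hypothetical, and the data are synthetic and noise-free.

```python
import bisect
import math
from datetime import date

MIN_OBS_OVER_K = 5  # mirrors the module's "K_eff + 5 valid rows" floor


def latest_strictly_before(sorted_dates, cutoff):
    """Latest date strictly earlier than cutoff, or None if there is none."""
    idx = bisect.bisect_left(sorted_dates, cutoff)
    return sorted_dates[idx - 1] if idx > 0 else None


def one_factor_ols(returns, loadings):
    """Cross-sectional OLS of r_i = m + b_i * f + e_i over finite rows.

    Returns (m, f), or (nan, nan) when too few valid rows remain or the
    loadings have no cross-sectional variation.
    """
    rows = [(r, b) for r, b in zip(returns, loadings)
            if math.isfinite(r) and math.isfinite(b)]
    k_eff = 2  # intercept + one factor
    if len(rows) < k_eff + MIN_OBS_OVER_K:
        return math.nan, math.nan
    n = len(rows)
    r_bar = sum(r for r, _ in rows) / n
    b_bar = sum(b for _, b in rows) / n
    s_bb = sum((b - b_bar) ** 2 for _, b in rows)
    if s_bb == 0.0:
        return math.nan, math.nan
    s_br = sum((b - b_bar) * (r - r_bar) for r, b in rows)
    f = s_br / s_bb
    return r_bar - f * b_bar, f


# Loadings known as of 2024-01-01; the last ticker has a missing loading.
loadings_by_date = {
    date(2024, 1, 1): [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, math.nan],
}
sorted_dates = sorted(loadings_by_date)

# Returns on 2024-01-02 generated with m = 0.005 and f = 0.01 (no noise).
returns = [0.005 + 0.01 * b for b in loadings_by_date[date(2024, 1, 1)]]

prev = latest_strictly_before(sorted_dates, date(2024, 1, 2))
m_hat, f_hat = one_factor_ols(returns, loadings_by_date[prev])

# A loading dated on the return date itself is not eligible.
same_day = latest_strictly_before(sorted_dates, date(2024, 1, 1))
```

The NaN row is dropped before the fit, leaving exactly seven valid rows, which just meets the floor of `k_eff + 5`. With one fewer valid row the sketch would return NaN for both coefficients, matching the all-NaN behaviour the module documents for unstable regressions.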

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/src/alpha_engine_lib.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: alpha-engine-lib
-Version: 0.46.0
+Version: 0.47.0
 Summary: Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README.
 Author: Brian McMahon
 License: Proprietary
@@ -16,6 +16,10 @@ Requires-Dist: arcticdb>=6.11; extra == "arcticdb"
 Requires-Dist: pandas>=2.0; extra == "arcticdb"
 Provides-Extra: quant
 Requires-Dist: numpy>=1.24; extra == "quant"
+Provides-Extra: quant-xs
+Requires-Dist: numpy>=1.24; extra == "quant-xs"
+Requires-Dist: pandas>=2.0; extra == "quant-xs"
+Requires-Dist: scikit-learn>=1.0; extra == "quant-xs"
 Provides-Extra: flow-doctor
 Requires-Dist: flow-doctor[diagnosis,s3]<0.5.0,>=0.4.0; extra == "flow-doctor"
 Provides-Extra: rag
@@ -254,7 +258,8 @@ Rotates across `(instance_type × subnet)` combinations on `InsufficientInstance
 The shared institutional-analytics engine: pure, front-end- and data-source-agnostic functions that *describe and measure* a portfolio (performance, risk, attribution) with **no advisory logic** — it sits on the "analytics, not advice" side of the line. Lifted from robodashboard's `analytics/` after the 2026-06-03 cross-repo leverage audit, so both the alpha-engine fleet and robodashboard consume one engine instead of parallel reimplementations. Import the submodule you need (the package keeps no eager imports, so the stdlib-only modules import without numpy):
-- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`: `estimate_factor_model` (time-series factor-ETF / Fama-MacBeth loadings), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`, **Option B** (time-series factor-ETF estimator): `estimate_factor_model` (regress holdings on given factor return series), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk_xs`** — same `Σ = B·F·Bᵀ + D` model, **Option A** (universe-wide cross-sectional Fama-MacBeth estimator): take *exogenous* per-ticker loadings `B` and infer factor returns `f_t` via a cross-sectional OLS at each date → `F`/`D` (`build_factor_risk_model`, `cross_sectional_factor_returns`, `estimate_factor_covariance`, `estimate_idiosyncratic_variance`). **Needs pandas + scikit-learn** — `pip install "alpha-engine-lib[quant-xs]"` (kept separate so numpy-only consumers stay light).
 - **`quant.risk_measures`** — parametric (Gaussian, Acklam inverse-normal, no scipy) + historical VaR & CVaR, as positive loss fractions at a horizon (stdlib).
 - **`quant.riskstats`** — `volatility`, `sharpe_ratio`, `sortino_ratio`, `max_drawdown` (stdlib).
 - **`quant.returns`** — `xirr` (money-weighted, Newton + bisection), `time_weighted_return` (GIPS), `cumulative_return`, `annualize` (stdlib).

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/src/alpha_engine_lib.egg-info/SOURCES.txt RENAMED

@@ -39,6 +39,7 @@ src/alpha_engine_lib/pipeline_status/templates.py
 src/alpha_engine_lib/quant/__init__.py
 src/alpha_engine_lib/quant/attribution.py
 src/alpha_engine_lib/quant/factor_risk.py
+src/alpha_engine_lib/quant/factor_risk_xs.py
 src/alpha_engine_lib/quant/returns.py
 src/alpha_engine_lib/quant/risk_measures.py
 src/alpha_engine_lib/quant/riskstats.py
@@ -72,6 +73,7 @@ tests/test_pipeline_status_templates.py
 tests/test_preflight.py
 tests/test_quant_attribution.py
 tests/test_quant_factor_risk.py
+tests/test_quant_factor_risk_xs.py
 tests/test_quant_returns.py
 tests/test_quant_risk_measures.py
 tests/test_quant_riskstats.py

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.47.0}/src/alpha_engine_lib.egg-info/requires.txt RENAMED

@@ -21,6 +21,11 @@ flow-doctor[diagnosis,s3]<0.5.0,>=0.4.0
 [quant]
 numpy>=1.24
+[quant-xs]
+numpy>=1.24
+pandas>=2.0
+scikit-learn>=1.0
 [rag]
 psycopg2-binary>=2.9
 pgvector>=0.2
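Because `[quant-xs]` is an optional extra, code that depends on it may want to fail with an install hint rather than a bare `ImportError`. The sketch below shows one way a consumer could do that. `require_extra` is a hypothetical helper and not part of alpha-engine-lib; the stdlib `json` module stands in for an installed dependency so the example runs anywhere.

```python
import importlib


def require_extra(module_name, extra):
    """Import an optional dependency, or raise with an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required here; try "
            f'pip install "alpha-engine-lib[{extra}]"'
        ) from exc


# An importable module is returned as-is (stdlib stand-in).
json_mod = require_extra("json", "quant-xs")

# A missing module produces an error that names the extra to install.
try:
    require_extra("definitely_not_installed_pkg_xyz", "quant-xs")
except ImportError as exc:
    hint = str(exc)
```

Deferring the import to the call site is what lets numpy-only consumers import the rest of the package without pandas or scikit-learn installed.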

alpha_engine_lib-0.47.0/tests/test_quant_factor_risk_xs.py ADDED

@@ -0,0 +1,413 @@
+"""Cross-sectional (Fama-MacBeth) factor-risk model — Barra-style F + D.
+Tests the cross-sectional-regression primitives that turn an exogenous
+factor-loading matrix B into F (factor-return covariance) and D (per-ticker
+idiosyncratic variance) — the inputs to a Σ = B·F·Bᵀ + D risk decomposition.
+Load-bearing property: when synthetic data is generated from a known
+true F, the estimator should recover it within sampling error. The
+recovery test is the institutional gate — without it, a silent
+miscalibration would propagate into a downstream risk estimate.
+"""
+from __future__ import annotations
+import pytest
+# factor_risk_xs is the [quant-xs] extra (pandas always; sklearn for the default
+# LedoitWolf/OAS shrinkage). Skip the module cleanly when they're absent.
+np = pytest.importorskip("numpy")
+pd = pytest.importorskip("pandas")
+pytest.importorskip("sklearn")
+from alpha_engine_lib.quant.factor_risk_xs import (  # noqa: E402  (after importorskip guard)
+    build_factor_returns_series,
+    build_factor_risk_model,
+    cross_sectional_factor_returns,
+    estimate_factor_covariance,
+    estimate_idiosyncratic_variance,
+)
+# ─── Helpers ────────────────────────────────────────────────────────────────
+def _synthetic_panel(
+    N: int = 30, K: int = 4, T: int = 250, seed: int = 0,
+    true_F_diag: float = 0.0004, true_D_scale: float = 0.0009,
+    market_factor_var: float = 0.0001,
+):
+    """Generate a synthetic factor-model panel with known true F + D.
+    True model: r_t = market_t + B · f_t + ε_t, where market_t ~ N(0, market_factor_var),
+    f_t ~ N(0, diag(true_F_diag)), ε_t ~ N(0, D), D_i ~ uniform.
+    """
+    rng = np.random.default_rng(seed)
+    dates = pd.date_range("2024-01-01", periods=T, freq="B")
+    tickers = [f"T{i:02d}" for i in range(N)]
+    factor_names = [f"f{k}" for k in range(K)]
+    # Stationary z-scored loadings (mean ≈ 0, std ≈ 1 per factor)
+    B_raw = rng.normal(0, 1, size=(N, K))
+    B_raw = (B_raw - B_raw.mean(axis=0)) / B_raw.std(axis=0)
+    true_F = np.eye(K) * true_F_diag
+    true_D = rng.uniform(0.5 * true_D_scale, 1.5 * true_D_scale, N)
+    loadings_by_date = {d: pd.DataFrame(B_raw, index=tickers, columns=factor_names)
+                        for d in dates}
+    returns_panel = np.zeros((T, N))
+    for i in range(T):
+        m_t = float(rng.normal(0, np.sqrt(market_factor_var)))
+        f_t = rng.multivariate_normal(np.zeros(K), true_F)
+        eps_t = rng.normal(0, np.sqrt(true_D), N)
+        returns_panel[i] = m_t + B_raw @ f_t + eps_t
+    returns_df = pd.DataFrame(returns_panel, index=dates, columns=tickers)
+    return {
+        "returns_df": returns_df,
+        "loadings_by_date": loadings_by_date,
+        "B_true": B_raw,
+        "true_F_diag": true_F_diag,
+        "true_D": true_D,
+        "factor_names": factor_names,
+        "tickers": tickers,
+    }
+# ─── cross_sectional_factor_returns ─────────────────────────────────────────
+class TestCrossSectionalFactorReturns:
+    def test_recovers_known_factor_returns_no_intercept(self):
+        """Exact construction: r = B·f_true → OLS must recover f_true exactly
+        (zero residuals when no noise + no intercept needed)."""
+        rng = np.random.default_rng(1)
+        N, K = 50, 5
+        B = rng.normal(0, 1, size=(N, K))
+        f_true = np.array([0.01, -0.02, 0.005, 0.015, -0.008])
+        r = B @ f_true
+        f_hat, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=False,
+        )
+        np.testing.assert_allclose(f_hat, f_true, atol=1e-10)
+        # Residuals are zero up to numerical noise
+        assert np.max(np.abs(residuals)) < 1e-9
+    def test_with_intercept_recovers_market_plus_factors(self):
+        """r = m + B·f → (K+1)-element solution (5 here) with intercept first."""
+        rng = np.random.default_rng(2)
+        N, K = 50, 4
+        B = rng.normal(0, 1, size=(N, K))
+        B = B - B.mean(axis=0)  # z-scored loadings have mean 0
+        market = 0.005
+        f_true = np.array([0.01, -0.02, 0.005, 0.015])
+        r = market + B @ f_true
+        f_hat, _ = cross_sectional_factor_returns(
+            r, B, include_intercept=True,
+        )
+        assert f_hat.shape == (K + 1,)
+        assert f_hat[0] == pytest.approx(market, abs=1e-10)
+        np.testing.assert_allclose(f_hat[1:], f_true, atol=1e-10)
+    def test_handles_noise_with_finite_error(self):
+        """Adding noise → OLS finds the right *direction* but residuals
+        absorb the noise. Sanity: f_hat is close to f_true; residual std
+        is close to the noise std."""
+        rng = np.random.default_rng(3)
+        N, K = 100, 4
+        B = rng.normal(0, 1, size=(N, K))
+        f_true = np.array([0.01, -0.02, 0.005, 0.015])
+        noise = rng.normal(0, 0.02, N)
+        r = B @ f_true + noise
+        f_hat, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=False,
+        )
+        # Each estimated coefficient within ~3 standard errors of the truth
+        np.testing.assert_allclose(f_hat, f_true, atol=0.008)
+        # Residual std should be close to the input noise std
+        assert abs(float(np.std(residuals)) - 0.02) < 0.005
+    def test_nan_rows_dropped(self):
+        rng = np.random.default_rng(4)
+        N, K = 50, 3
+        B = rng.normal(0, 1, size=(N, K))
+        f_true = np.array([0.01, -0.01, 0.005])
+        r = B @ f_true
+        # Inject NaN
+        r_with_nan = r.copy()
+        r_with_nan[0:5] = np.nan
+        B_with_nan = B.copy()
+        B_with_nan[10:12, 0] = np.nan
+        f_hat, residuals = cross_sectional_factor_returns(
+            r_with_nan, B_with_nan, include_intercept=False,
+        )
+        np.testing.assert_allclose(f_hat, f_true, atol=1e-9)
+        # Residuals for NaN-input rows must be NaN
+        assert np.all(np.isnan(residuals[0:5]))
+        assert np.all(np.isnan(residuals[10:12]))
+    def test_too_few_observations_returns_nan(self):
+        """K + 5 observation buffer prevents unstable solves."""
+        rng = np.random.default_rng(5)
+        N, K = 6, 4  # 6 rows; 4 factors + intercept = 5 regressors, so 5 + 5 = 10 rows are needed
+        B = rng.normal(0, 1, size=(N, K))
+        r = rng.normal(0, 0.01, N)
+        f_hat, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=True,
+        )
+        assert np.all(np.isnan(f_hat))
+        assert np.all(np.isnan(residuals))
+    def test_wrong_shape_raises(self):
+        rng = np.random.default_rng(6)
+        with pytest.raises(ValueError, match="loadings_prev must be 2-D"):
+            cross_sectional_factor_returns(np.zeros(10), np.zeros(10))
+        with pytest.raises(ValueError, match="returns_t shape"):
+            cross_sectional_factor_returns(np.zeros(11), rng.normal(0, 1, (10, 3)))
+    def test_rank_deficient_loadings_returns_minimum_norm_solution(self):
+        """A perfectly collinear factor column shouldn't crash — lstsq
+        returns the minimum-norm solution. Verifies the no-crash contract."""
+        N, K = 30, 3
+        rng = np.random.default_rng(7)
+        B = rng.normal(0, 1, size=(N, K))
+        B[:, 2] = B[:, 0]  # Column 2 == Column 0 → rank 2, not 3
+        r = rng.normal(0, 0.01, N)
+        # Should not raise
+        f_hat, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=False,
+        )
+        # All-finite — solver succeeded
+        assert np.all(np.isfinite(f_hat))
+# ─── build_factor_returns_series ────────────────────────────────────────────
+class TestBuildFactorReturnsSeries:
+    def test_emits_factor_returns_and_residuals_panels(self):
+        data = _synthetic_panel(N=30, K=4, T=100)
+        f_df, eps_df = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        # T rows; K + 1 columns (intercept on by default)
+        assert f_df.shape == (100, 5)
+        assert eps_df.shape == (100, 30)
+        # First date has no prior loadings → all NaN
+        assert f_df.iloc[0].isna().all()
+        # Subsequent dates have factor returns
+        assert not f_df.iloc[10].isna().any()
+    def test_first_date_has_no_prior_loadings(self):
+        """Informational safety: at date t we may only use loadings at
+        strictly earlier dates (t-1 or older)."""
+        data = _synthetic_panel(T=10)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        assert f_df.iloc[0].isna().all()
+    def test_factor_names_argument_pins_order(self):
+        data = _synthetic_panel(K=4)
+        custom_order = ["f3", "f1", "f0", "f2"]
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+            factor_names=custom_order,
+        )
+        # market column first (intercept on), then custom order
+        assert list(f_df.columns) == ["market"] + custom_order
+    def test_include_intercept_false_skips_market_column(self):
+        data = _synthetic_panel(K=4)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+            include_intercept=False,
+        )
+        # Only K columns
+        assert f_df.shape[1] == 4
+        assert "market" not in f_df.columns
+    def test_empty_returns_panel_returns_empty(self):
+        f_df, eps_df = build_factor_returns_series(pd.DataFrame(), {})
+        assert f_df.empty
+        assert eps_df.empty
+    def test_empty_loadings_raises(self):
+        returns_df = pd.DataFrame(np.zeros((5, 3)), columns=["A", "B", "C"])
+        with pytest.raises(ValueError, match="loadings_by_date is empty"):
+            build_factor_returns_series(returns_df, {})
+# ─── estimate_factor_covariance ─────────────────────────────────────────────
+class TestEstimateFactorCovariance:
+    def test_recovers_known_diagonal_F(self):
+        """The load-bearing recovery test: when the synthetic data is
+        generated with diagonal F = 0.0004 · I, the estimator (with
+        plenty of samples) should produce a roughly-diagonal F with
+        diagonal values in the ballpark of 0.0004."""
+        data = _synthetic_panel(N=50, K=4, T=500, seed=11, true_F_diag=0.0004)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+            include_intercept=False,
+        )
+        # f_df's first row is NaN (no prior loadings); it is passed through
+        # as-is, so the estimator is expected to skip it.
+        # LW would compress diag toward mean; use sample for the recovery test.
+        F = estimate_factor_covariance(f_df, shrinkage="sample")
+        diag = np.diag(F.values)
+        # Allow a wide band (0.25x to 2.5x the true 0.0004) for finite-sample noise.
+        for d in diag:
+            assert 0.0001 < d < 0.001, (
+                f"Diagonal entry {d:.6f} outside plausible range [1e-4, 1e-3] "
+                f"around true 0.0004"
+            )
+    def test_ledoit_wolf_returns_psd_matrix(self):
+        data = _synthetic_panel(N=30, K=4, T=200, seed=12)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        F = estimate_factor_covariance(f_df, shrinkage="ledoit_wolf")
+        eigvals = np.linalg.eigvalsh(F.values)
+        assert eigvals.min() >= -1e-10, (
+            f"LW F must be PSD; got min eigval={eigvals.min()}"
+        )
+    def test_oas_estimator_works(self):
+        data = _synthetic_panel(N=30, K=4, T=200, seed=13)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        F = estimate_factor_covariance(f_df, shrinkage="oas")
+        assert F.shape == (5, 5)  # K + intercept
+        eigvals = np.linalg.eigvalsh(F.values)
+        assert eigvals.min() >= -1e-10
+    def test_insufficient_data_returns_nan_F(self):
+        """Below min_obs → all-NaN F so caller knows the build is bad."""
+        f_df = pd.DataFrame(np.random.normal(0, 0.01, (10, 4)),
+                            columns=["a", "b", "c", "d"])
+        F = estimate_factor_covariance(f_df, min_obs=30)
+        assert F.shape == (4, 4)
+        assert F.isna().all().all()
+    def test_unknown_shrinkage_raises(self):
+        data = _synthetic_panel(T=100)
+        f_df, _ = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        with pytest.raises(ValueError, match="Unknown shrinkage"):
+            estimate_factor_covariance(f_df, shrinkage="not-a-real-estimator")
+# ─── estimate_idiosyncratic_variance ────────────────────────────────────────
+class TestEstimateIdiosyncraticVariance:
+    def test_recovers_per_ticker_idio_variance(self):
+        """Recovery: synthetic D is uniform between 0.5*scale and 1.5*scale;
+        the estimator's mean across tickers should match the true mean
+        within sampling error."""
+        data = _synthetic_panel(N=40, K=4, T=400, seed=21, true_D_scale=0.0009)
+        _, eps_df = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        D = estimate_idiosyncratic_variance(eps_df)
+        # Mean of recovered D close to mean of true D
+        true_mean = float(data["true_D"].mean())
+        rec_mean = float(D.dropna().mean())
+        # Within 30% relative — finite-sample + finite-K-factor noise
+        assert abs(rec_mean - true_mean) / true_mean < 0.3, (
+            f"Mean idio variance: recovered {rec_mean:.6f} vs true {true_mean:.6f}"
+        )
+    def test_all_positive_or_nan(self):
+        data = _synthetic_panel(N=30, K=4, T=200, seed=22)
+        _, eps_df = build_factor_returns_series(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        D = estimate_idiosyncratic_variance(eps_df)
+        finite = D.dropna()
+        assert len(finite) > 0
+        assert (finite > 0).all()
+    def test_min_obs_skips_thin_tickers(self):
+        """A ticker with <min_obs non-NaN residuals → NaN in D, not 0."""
+        rng = np.random.default_rng(23)
+        T, N = 200, 4
+        eps = rng.normal(0, 0.01, size=(T, N))
+        eps[:190, 0] = np.nan  # ticker 0 has only 10 non-NaN obs
+        eps_df = pd.DataFrame(eps, columns=[f"T{i}" for i in range(N)])
+        D = estimate_idiosyncratic_variance(eps_df, min_obs=30)
+        assert np.isnan(D.iloc[0])
+        for i in range(1, N):
+            assert np.isfinite(D.iloc[i])
+# ─── build_factor_risk_model (end-to-end) ────────────────────────────────────
+class TestBuildFactorRiskModel:
+    def test_end_to_end_produces_F_and_D_with_metadata(self):
+        data = _synthetic_panel(N=30, K=4, T=200, seed=31)
+        out = build_factor_risk_model(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        assert "F" in out and "D" in out and "metadata" in out
+        meta = out["metadata"]
+        assert meta["n_dates"] == 200
+        assert meta["n_clean_dates"] == 199  # first date NaN
+        assert meta["K_eff"] == 5  # 4 factors + intercept
+        assert meta["n_tickers"] == 30
+    def test_F_is_K_eff_x_K_eff_dataframe(self):
+        data = _synthetic_panel(N=20, K=3, T=150, seed=32)
+        out = build_factor_risk_model(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        assert out["F"].shape == (4, 4)
+        # Indexed by factor names with "market" first
+        assert list(out["F"].columns) == ["market", "f0", "f1", "f2"]
+        assert list(out["F"].index) == ["market", "f0", "f1", "f2"]
+    def test_D_indexed_by_ticker(self):
+        data = _synthetic_panel(N=20, K=3, T=150, seed=33)
+        out = build_factor_risk_model(
+            data["returns_df"], data["loadings_by_date"],
+        )
+        assert list(out["D"].index) == data["tickers"]
+        assert (out["D"].dropna() > 0).all()
+    def test_can_disable_intercept(self):
+        data = _synthetic_panel(N=20, K=3, T=150, seed=34)
+        out = build_factor_risk_model(
+            data["returns_df"], data["loadings_by_date"],
+            include_intercept=False,
+        )
+        assert out["metadata"]["K_eff"] == 3
+        assert "market" not in out["F"].columns
+    def test_reconstructed_Sigma_is_PSD(self):
+        """The whole point: Σ = B·F·Bᵀ + D must be PSD so the executor's
+        cvxpy solver can ingest it. Verify on the synthetic recovery case
+        (no intercept — caller assembles a B that matches the F shape)."""
+        data = _synthetic_panel(N=25, K=4, T=300, seed=35)
+        out = build_factor_risk_model(
+            data["returns_df"], data["loadings_by_date"],
+            include_intercept=False,
+        )
+        B = data["B_true"]  # (N, K)
+        F = out["F"].values  # (K, K)
+        D = out["D"].fillna(out["D"].dropna().mean()).values  # (N,)
+        Sigma = B @ F @ B.T + np.diag(D)
+        eigvals = np.linalg.eigvalsh(Sigma)
+        assert eigvals.min() >= -1e-10, (
+            f"Reconstructed Σ must be PSD; got min eigval={eigvals.min()}"
+        )