PyPI - alpha-engine-lib - Versions diffs - 0.46.0__tar.gz → 0.48.0__tar.gz - Mend

alpha-engine-lib 0.46.0.tar.gz → 0.48.0.tar.gz

Files changed (97)

{alpha_engine_lib-0.46.0/src/alpha_engine_lib.egg-info → alpha_engine_lib-0.48.0}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: alpha-engine-lib
-Version: 0.46.0
-Summary: Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README.
+Version: 0.48.0
+Summary: Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, S3-conditional-PUT writer locks, and bounded-backoff HTTP retry. Full surface documented in README.
 Author: Brian McMahon
 License: Proprietary
 Requires-Python: >=3.9
@@ -16,6 +16,10 @@ Requires-Dist: arcticdb>=6.11; extra == "arcticdb"
 Requires-Dist: pandas>=2.0; extra == "arcticdb"
 Provides-Extra: quant
 Requires-Dist: numpy>=1.24; extra == "quant"
+Provides-Extra: quant-xs
+Requires-Dist: numpy>=1.24; extra == "quant-xs"
+Requires-Dist: pandas>=2.0; extra == "quant-xs"
+Requires-Dist: scikit-learn>=1.0; extra == "quant-xs"
 Provides-Extra: flow-doctor
 Requires-Dist: flow-doctor[diagnosis,s3]<0.5.0,>=0.4.0; extra == "flow-doctor"
 Provides-Extra: rag
@@ -254,12 +258,17 @@ Rotates across `(instance_type × subnet)` combinations on `InsufficientInstance
 The shared institutional-analytics engine: pure, front-end- and data-source-agnostic functions that *describe and measure* a portfolio (performance, risk, attribution) with **no advisory logic** — it sits on the "analytics, not advice" side of the line. Lifted from robodashboard's `analytics/` after the 2026-06-03 cross-repo leverage audit, so both the alpha-engine fleet and robodashboard consume one engine instead of parallel reimplementations. Import the submodule you need (the package keeps no eager imports, so the stdlib-only modules import without numpy):
-- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`: `estimate_factor_model` (time-series factor-ETF / Fama-MacBeth loadings), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`, **Option B** (time-series factor-ETF estimator): `estimate_factor_model` (regress holdings on given factor return series), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk_xs`** — same `Σ = B·F·Bᵀ + D` model, **Option A** (universe-wide cross-sectional Fama-MacBeth estimator): take *exogenous* per-ticker loadings `B` and infer factor returns `f_t` via a cross-sectional OLS at each date → `F`/`D` (`build_factor_risk_model`, `cross_sectional_factor_returns`, `estimate_factor_covariance`, `estimate_idiosyncratic_variance`). **Needs pandas + scikit-learn** — `pip install "alpha-engine-lib[quant-xs]"` (kept separate so numpy-only consumers stay light).
 - **`quant.risk_measures`** — parametric (Gaussian, Acklam inverse-normal, no scipy) + historical VaR & CVaR, as positive loss fractions at a horizon (stdlib).
 - **`quant.riskstats`** — `volatility`, `sharpe_ratio`, `sortino_ratio`, `max_drawdown` (stdlib).
 - **`quant.returns`** — `xirr` (money-weighted, Newton + bisection), `time_weighted_return` (GIPS), `cumulative_return`, `annualize` (stdlib).
 - **`quant.attribution`** — single-period Brinson-Fachler decomposition (`brinson_fachler`) + multi-period Cariño linking (`link_periods`) (stdlib).
+### `http_retry` — bounded-backoff transient-API retry chokepoint
+`request_with_retry(url, *, params, session, transient_status, ...)` returns the final `requests.Response` after retrying the transient class — 429 + 5xx responses (honoring `Retry-After`) and `Timeout`/`ConnectionError` network errors — with exponential backoff + full jitter; an exhausted network error raises `HttpRetryError` (api-key-scrubbed), while a persistent transient-status response is returned for the caller to interpret (so a 403, not in the transient set, is handed back for e.g. polygon's `PolygonForbiddenError` conversion). Also exposes the low-level `backoff_delay(attempt, *, base, cap, retry_after)` and `scrub_api_keys(msg)` (masks `api_key=`/`apiKey=` querystring values) for consumers with bespoke loops (the rate-limited `polygon_client` keeps its own loop + 403 + JSON parse and reuses just the delay math + scrubber). Consolidates the four mirrored alpha-engine-data retry sites (FRED fetch, polygon client, preflight reachability, FRED repair) into one policy so they stop drifting (L4499). Stdlib + `requests` only.
 ```python
 from alpha_engine_lib.quant.risk_measures import historical_cvar
 from alpha_engine_lib.quant.factor_risk import estimate_factor_model, portfolio_risk

alpha_engine_lib-0.46.0/PKG-INFO → alpha_engine_lib-0.48.0/README.md RENAMED

@@ -1,34 +1,3 @@
-Metadata-Version: 2.4
-Name: alpha-engine-lib
-Version: 0.46.0
-Summary: Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README.
-Author: Brian McMahon
-License: Proprietary
-Requires-Python: >=3.9
-Description-Content-Type: text/markdown
-Requires-Dist: boto3>=1.34
-Requires-Dist: pydantic>=2.0
-Requires-Dist: pyyaml>=6.0
-Requires-Dist: requests>=2.31
-Requires-Dist: eval_type_backport>=0.2.0; python_version < "3.10"
-Provides-Extra: arcticdb
-Requires-Dist: arcticdb>=6.11; extra == "arcticdb"
-Requires-Dist: pandas>=2.0; extra == "arcticdb"
-Provides-Extra: quant
-Requires-Dist: numpy>=1.24; extra == "quant"
-Provides-Extra: flow-doctor
-Requires-Dist: flow-doctor[diagnosis,s3]<0.5.0,>=0.4.0; extra == "flow-doctor"
-Provides-Extra: rag
-Requires-Dist: psycopg2-binary>=2.9; extra == "rag"
-Requires-Dist: pgvector>=0.2; extra == "rag"
-Requires-Dist: numpy>=1.24; extra == "rag"
-Provides-Extra: rerank
-Requires-Dist: sentence-transformers>=3.0; extra == "rerank"
-Provides-Extra: dev
-Requires-Dist: pytest>=7.0; extra == "dev"
-Requires-Dist: pytest-cov>=4.0; extra == "dev"
-Requires-Dist: moto>=5.0; extra == "dev"
 # alpha-engine-lib
 > Part of [**Nous Ergon**](https://nousergon.ai) — Autonomous Multi-Agent Trading System. Repo and S3 names use the underlying project name `alpha-engine`.
@@ -254,12 +223,17 @@ Rotates across `(instance_type × subnet)` combinations on `InsufficientInstance
 The shared institutional-analytics engine: pure, front-end- and data-source-agnostic functions that *describe and measure* a portfolio (performance, risk, attribution) with **no advisory logic** — it sits on the "analytics, not advice" side of the line. Lifted from robodashboard's `analytics/` after the 2026-06-03 cross-repo leverage audit, so both the alpha-engine fleet and robodashboard consume one engine instead of parallel reimplementations. Import the submodule you need (the package keeps no eager imports, so the stdlib-only modules import without numpy):
-- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`: `estimate_factor_model` (time-series factor-ETF / Fama-MacBeth loadings), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk`** — statistical factor risk model `Σ = B·F·Bᵀ + D`, **Option B** (time-series factor-ETF estimator): `estimate_factor_model` (regress holdings on given factor return series), `portfolio_risk` (ex-ante vol + factor/idio split + per-factor variance contribution), `tracking_error`, `benchmark_exposure`, and a numpy-only `ledoit_wolf_cov` (no sklearn). The estimator-agnostic consumption core (`portfolio_risk`/`tracking_error`) consumes any `FactorRiskModel` (B, F, D). **Needs numpy** — `pip install "alpha-engine-lib[quant]"`.
+- **`quant.factor_risk_xs`** — same `Σ = B·F·Bᵀ + D` model, **Option A** (universe-wide cross-sectional Fama-MacBeth estimator): take *exogenous* per-ticker loadings `B` and infer factor returns `f_t` via a cross-sectional OLS at each date → `F`/`D` (`build_factor_risk_model`, `cross_sectional_factor_returns`, `estimate_factor_covariance`, `estimate_idiosyncratic_variance`). **Needs pandas + scikit-learn** — `pip install "alpha-engine-lib[quant-xs]"` (kept separate so numpy-only consumers stay light).
 - **`quant.risk_measures`** — parametric (Gaussian, Acklam inverse-normal, no scipy) + historical VaR & CVaR, as positive loss fractions at a horizon (stdlib).
 - **`quant.riskstats`** — `volatility`, `sharpe_ratio`, `sortino_ratio`, `max_drawdown` (stdlib).
 - **`quant.returns`** — `xirr` (money-weighted, Newton + bisection), `time_weighted_return` (GIPS), `cumulative_return`, `annualize` (stdlib).
 - **`quant.attribution`** — single-period Brinson-Fachler decomposition (`brinson_fachler`) + multi-period Cariño linking (`link_periods`) (stdlib).
+### `http_retry` — bounded-backoff transient-API retry chokepoint
+`request_with_retry(url, *, params, session, transient_status, ...)` returns the final `requests.Response` after retrying the transient class — 429 + 5xx responses (honoring `Retry-After`) and `Timeout`/`ConnectionError` network errors — with exponential backoff + full jitter; an exhausted network error raises `HttpRetryError` (api-key-scrubbed), while a persistent transient-status response is returned for the caller to interpret (so a 403, not in the transient set, is handed back for e.g. polygon's `PolygonForbiddenError` conversion). Also exposes the low-level `backoff_delay(attempt, *, base, cap, retry_after)` and `scrub_api_keys(msg)` (masks `api_key=`/`apiKey=` querystring values) for consumers with bespoke loops (the rate-limited `polygon_client` keeps its own loop + 403 + JSON parse and reuses just the delay math + scrubber). Consolidates the four mirrored alpha-engine-data retry sites (FRED fetch, polygon client, preflight reachability, FRED repair) into one policy so they stop drifting (L4499). Stdlib + `requests` only.
 ```python
 from alpha_engine_lib.quant.risk_measures import historical_cvar
 from alpha_engine_lib.quant.factor_risk import estimate_factor_model, portfolio_risk

{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.48.0}/pyproject.toml RENAMED

@@ -4,8 +4,8 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "alpha-engine-lib"
-version = "0.46.0"
-description = "Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, and S3-conditional-PUT writer locks. Full surface documented in README."
+version = "0.48.0"
+description = "Shared utilities for the Alpha Engine modules: preflight, logging, ArcticDB, dates, decision capture, cost telemetry, Anthropic payload chokepoint, artifact freshness, RAG, agent schemas, SSM secrets, Telegram + SNS alerts, EC2 spot resilience, SSM log-capture, SSM dispatcher, Step-Functions execution-state projection, S3-conditional-PUT writer locks, and bounded-backoff HTTP retry. Full surface documented in README."
 readme = "README.md"
 # EC2 still runs Python 3.9 on the always-on micro instance (boto3 drops
 # 3.9 support 2026-04-29, so upgrade is on the near-term roadmap). All
@@ -34,6 +34,11 @@ arcticdb = ["arcticdb>=6.11", "pandas>=2.0"]
 # factor-risk module needs numpy; the VaR/CVaR, riskstats, returns, and
 # attribution modules are pure stdlib and import without this extra.
 quant = ["numpy>=1.24"]
+# Cross-sectional (Fama-MacBeth) factor risk model — quant.factor_risk_xs.
+# Needs pandas (always) + scikit-learn (LedoitWolf/OAS shrinkage). Kept
+# separate from [quant] so the numpy-only consumers (e.g. robodashboard)
+# don't pull pandas+sklearn.
+quant-xs = ["numpy>=1.24", "pandas>=2.0", "scikit-learn>=1.0"]
 flow_doctor = ["flow-doctor[diagnosis,s3]>=0.4.0,<0.5.0"]
 rag = [
     "psycopg2-binary>=2.9",

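The new `quant-xs` extra keeps pandas and scikit-learn out of numpy-only installs. A consumer that wants to fail early with a clear message, rather than hit an `ImportError` deep inside a covariance call, could probe for the import names first. The following is a minimal sketch under that assumption; the helper name `missing_modules` is hypothetical and not part of the package. Note that the scikit-learn distribution is imported as `sklearn`.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level import names from `required` that cannot be found.

    Uses importlib.util.find_spec, which locates a top-level module
    without importing it.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]


# Import names that the quant-xs extra is expected to provide.
QUANT_XS_IMPORTS = ["numpy", "pandas", "sklearn"]

if __name__ == "__main__":
    absent = missing_modules(QUANT_XS_IMPORTS)
    if absent:
        print("missing:", ", ".join(absent), '(try: pip install "alpha-engine-lib[quant-xs]")')
    else:
        print("quant-xs dependencies are importable")
```
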
{alpha_engine_lib-0.46.0 → alpha_engine_lib-0.48.0}/src/alpha_engine_lib/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """alpha-engine-lib — shared utilities for Alpha Engine modules."""
-__version__ = "0.46.0"
+__version__ = "0.48.0"

alpha_engine_lib-0.48.0/src/alpha_engine_lib/http_retry.py ADDED

@@ -0,0 +1,199 @@
+"""Bounded-backoff HTTP retry primitive — the transient external-API
+resilience chokepoint (L4499).
+Consolidates the backoff + full-jitter + ``Retry-After`` + api-key-scrub
+retry idiom that was mirrored across four alpha-engine-data sites:
+  * ``collectors/daily_closes.py::_fred_get_with_retry``     (L4480)
+  * ``polygon_client.py::_get`` / ``_backoff``               (L4496)
+  * ``preflight.py::_reachability_get``                      (L4494)
+  * ``collectors/daily_closes_fred_repair.py::_fetch_fred_range``
+Each had its own copy of "exponential backoff + full jitter, honor
+``Retry-After``, retry the transient class, scrub the api-key from the
+error before logging/raising, then fail loud." This module is the single
+source of truth for that policy so the four callsites stop drifting.
+Two layers are exported:
+  * :func:`request_with_retry` — the full GET-with-retry for the plain
+    callsites (FRED fetch, preflight probe, FRED repair). Returns the final
+    ``requests.Response``; the caller still owns status interpretation
+    (``raise_for_status`` / special-casing a 403), so genuinely different
+    consumers compose it without a leaky mega-config.
+  * :func:`backoff_delay` + :func:`scrub_api_keys` — the low-level pieces for
+    a consumer with bespoke control flow (the rate-limited ``polygon_client``
+    keeps its own loop + 403 handling + JSON parse + rate limiter, but shares
+    the delay math and the scrubber).
+Design note (anti-over-engineering): this is deliberately NOT a
+pluggable-everything HTTP framework. It captures the one invariant the four
+sites share; consumers whose semantics diverge (polygon's 403 + rate limiter)
+reuse the primitives rather than being forced through a generic loop.
+"""
+from __future__ import annotations
+import logging as _logging
+import random as _random
+import re
+import time as _time
+from typing import Callable, Iterable
+import requests
+_DEFAULT_LOGGER = _logging.getLogger(__name__)
+# Transient HTTP status class: 429 (rate limit) + the retryable 5xx. A 4xx
+# other than 429 is a deterministic client error — retrying it is pointless,
+# so it is NOT in the default set and is returned to the caller as-is.
+DEFAULT_TRANSIENT_STATUS: "frozenset[int]" = frozenset({429, 500, 502, 503, 504})
+# Mask FRED ``api_key=`` (snake) and polygon ``apiKey=`` (camel) querystring
+# VALUES — both leak via ``requests`` exception ``str()`` (the effective URL)
+# and via hand-built error strings. Mirrors the per-repo scrubbers this module
+# replaces; complements ``alpha_engine_lib.logging.SecretsRedactingFilter``
+# (which catches token-shaped secrets, not query-param api keys).
+_API_KEY_RE = re.compile(r"(?:api_key|apiKey)=[^&\s]+")
+def scrub_api_keys(msg: object) -> str:
+    """Mask ``api_key=...`` / ``apiKey=...`` querystring values in a string.
+    Preserves the key NAME (so logs still show *which* param) and the value
+    delimiter, replacing only the secret value with ``***``. Idempotent.
+    """
+    return _API_KEY_RE.sub(lambda m: m.group(0).split("=", 1)[0] + "=***", str(msg))
+class HttpRetryError(RuntimeError):
+    """Raised when all attempts are exhausted on a transient NETWORK error
+    (``requests.Timeout`` / ``requests.ConnectionError``) or a non-transient
+    ``RequestException``.
+    The message is api-key-scrubbed. The originating exception is preserved
+    as ``__cause__`` (and on ``.last_exc``); ``.label`` / ``.attempts`` carry
+    context for callers that want to re-wrap (e.g. preflight's
+    ``RuntimeError(... unreachable ...)``).
+    """
+    def __init__(self, label: str, attempts: int, last_exc: BaseException) -> None:
+        self.label = label
+        self.attempts = attempts
+        self.last_exc = last_exc
+        super().__init__(
+            scrub_api_keys(
+                f"{label or 'request'} failed after {attempts} attempt(s): {last_exc}"
+            )
+        )
+def backoff_delay(
+    attempt: int,
+    *,
+    base: float = 1.0,
+    cap: float = 30.0,
+    retry_after: "str | float | None" = None,
+    rng: "_random.Random | None" = None,
+) -> float:
+    """Full-jitter exponential backoff: ``min(base*2**attempt + U(0, base), cap)``.
+    ``attempt`` is 0-indexed. Honors a server ``Retry-After`` (seconds, str or
+    float) when supplied — a numeric value replaces the exponential term (still
+    + jitter, still capped); a non-numeric ``Retry-After`` (HTTP-date form)
+    falls back to the exponential term. ``rng`` is injectable for deterministic
+    tests.
+    """
+    wait: "float | None" = None
+    if retry_after is not None:
+        try:
+            wait = float(retry_after)
+        except (TypeError, ValueError):
+            wait = None
+    if wait is None:
+        wait = base * (2 ** attempt)
+    jitter = (rng or _random).uniform(0, base)
+    return min(wait + jitter, cap)
+def request_with_retry(
+    url: str,
+    *,
+    method: str = "GET",
+    params: "dict | None" = None,
+    session: "requests.Session | None" = None,
+    timeout: float = 15.0,
+    max_attempts: int = 3,
+    backoff_base: float = 1.0,
+    backoff_cap: float = 30.0,
+    transient_status: Iterable[int] = DEFAULT_TRANSIENT_STATUS,
+    retry_network: bool = True,
+    honor_retry_after: bool = True,
+    scrub: Callable[[object], str] = scrub_api_keys,
+    logger: "_logging.Logger | None" = None,
+    label: str = "",
+    sleep: Callable[[float], None] = _time.sleep,
+) -> requests.Response:
+    """``method`` ``url`` with bounded backoff + full jitter on the transient
+    class, returning the final :class:`requests.Response`.
+    Retries:
+      * responses whose status is in ``transient_status`` (default 429 + 5xx),
+        honoring ``Retry-After`` when ``honor_retry_after``; and
+      * (when ``retry_network``) ``requests.Timeout`` / ``ConnectionError``.
+    Terminal behavior:
+      * a transient-status response that survives ``max_attempts`` is
+        **returned** — the caller decides whether to ``raise_for_status`` or
+        special-case it (e.g. a 403, which is NOT in the transient set, is
+        returned immediately for the caller to convert); and
+      * an exhausted NETWORK error (or a non-transient ``RequestException``
+        such as a bad URL) raises :class:`HttpRetryError` (scrubbed).
+    ``scrub`` is applied to every error string logged or raised. ``session``
+    lets a caller reuse a session (e.g. one carrying auth query params).
+    ``sleep`` is injectable for tests. ``max_attempts`` must be >= 1.
+    """
+    if max_attempts < 1:
+        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
+    log = logger or _DEFAULT_LOGGER
+    transient = frozenset(transient_status)
+    requester = (session or requests).request
+    resp: "requests.Response | None" = None
+    for attempt in range(max_attempts):
+        last = attempt == max_attempts - 1
+        try:
+            resp = requester(method, url, params=params or {}, timeout=timeout)
+        except (requests.Timeout, requests.ConnectionError) as exc:
+            if not retry_network or last:
+                raise HttpRetryError(label, attempt + 1, exc) from exc
+            delay = backoff_delay(attempt, base=backoff_base, cap=backoff_cap)
+            log.warning(
+                "%s transient %s — backing off %.1fs (attempt %d/%d)",
+                label or url, type(exc).__name__, delay, attempt + 1, max_attempts,
+            )
+            sleep(delay)
+            continue
+        except requests.RequestException as exc:
+            # Non-transient (bad URL / too many redirects / invalid schema) —
+            # retrying a deterministic error is pointless; fail loud now.
+            raise HttpRetryError(label, attempt + 1, exc) from exc
+        if resp.status_code in transient and not last:
+            retry_after = resp.headers.get("Retry-After") if honor_retry_after else None
+            delay = backoff_delay(
+                attempt, base=backoff_base, cap=backoff_cap, retry_after=retry_after,
+            )
+            log.warning(
+                "%s HTTP %d — backing off %.1fs (attempt %d/%d)",
+                label or url, resp.status_code, delay, attempt + 1, max_attempts,
+            )
+            sleep(delay)
+            continue
+        return resp
+    # Loop exhausted on transient-status responses: return the last one for the
+    # caller to interpret (network exhaustion already raised above). resp is
+    # non-None because max_attempts >= 1 guarantees at least one assignment.
+    assert resp is not None
+    return resp
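
To see what delay schedule the added `backoff_delay` produces, and what `scrub_api_keys` masks, the two helpers can be restated outside the package. The sketch below copies their logic from the diff above so it runs without installing `alpha-engine-lib`; the `_ZeroJitter` stub and the example URL are hypothetical, added only to make the output deterministic. As the docstring formula says, the jitter term is `U(0, base)` added to the exponential term (or to a numeric `Retry-After`) before the cap is applied.

```python
import random
import re

_API_KEY_RE = re.compile(r"(?:api_key|apiKey)=[^&\s]+")


def scrub_api_keys(msg):
    """Mask api_key=... / apiKey=... querystring values, keeping the key name."""
    return _API_KEY_RE.sub(lambda m: m.group(0).split("=", 1)[0] + "=***", str(msg))


def backoff_delay(attempt, *, base=1.0, cap=30.0, retry_after=None, rng=None):
    """min(wait + U(0, base), cap), where wait is a numeric Retry-After
    if one is supplied, else base * 2**attempt (attempt is 0-indexed)."""
    wait = None
    if retry_after is not None:
        try:
            wait = float(retry_after)
        except (TypeError, ValueError):
            wait = None  # HTTP-date form: fall back to the exponential term
    if wait is None:
        wait = base * (2 ** attempt)
    jitter = (rng or random).uniform(0, base)
    return min(wait + jitter, cap)


class _ZeroJitter:
    """Hypothetical test stub: always returns the lower bound, so jitter is 0."""

    def uniform(self, a, b):
        return a


no_jitter = _ZeroJitter()

# Exponential schedule with jitter removed; the sixth attempt hits the 30 s cap.
schedule = [backoff_delay(a, rng=no_jitter) for a in range(6)]

# A numeric Retry-After replaces the exponential term.
honored = backoff_delay(0, retry_after="7", rng=no_jitter)

# An HTTP-date Retry-After is not numeric, so the exponential term is used.
fallback = backoff_delay(2, retry_after="Wed, 21 Oct 2015 07:28:00 GMT", rng=no_jitter)

scrubbed = scrub_api_keys("GET https://example.test/obs?api_key=abc123&series_id=DGS10")
```

With the default `base=1.0` and `cap=30.0`, a caller using the default `max_attempts=3` sleeps at most twice, for roughly 1 to 2 s and then 2 to 3 s, unless the server sends a larger `Retry-After`.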

alpha_engine_lib-0.48.0/src/alpha_engine_lib/quant/factor_risk_xs.py ADDED

@@ -0,0 +1,332 @@
+"""Cross-sectional (Fama-MacBeth) factor risk model — the "Option A" estimator.
+Complements ``quant.factor_risk`` (the "Option B" time-series factor-ETF
+estimator). Both produce the inputs to the same Σ = B·F·Bᵀ + D structural
+covariance consumed by ``quant.factor_risk.portfolio_risk`` / ``tracking_error``;
+they differ only in how the factor returns ``f_t`` and the factor covariance
+``F`` are estimated:
+  - **Option B** (``factor_risk.estimate_factor_model``) — regress each holding's
+    return series on a small set of *given* factor return series (market +
+    style-ETF spreads). Loadings ``B`` are the regression betas. numpy-only.
+  - **Option A** (here) — take *exogenous* per-ticker factor loadings ``B`` (e.g.
+    fundamentals-derived style exposures) and infer the factor returns ``f_t`` by
+    a cross-sectional OLS at each date (Fama-MacBeth 1973):
+        r_t = B_{t-1} · f_t + ε_t
+    Stacking ``f_t`` over a rolling window gives a (T × K) factor-return panel;
+    ``F`` is its (Ledoit-Wolf-shrunk) covariance and ``D`` the per-ticker
+    time-series variance of the residuals ε. This is the universe-wide Barra-lite
+    build.
+**Dependencies:** pandas (always) + scikit-learn (lazy, only for the
+``ledoit_wolf``/``oas`` shrinkage estimators). Install ``alpha-engine-lib[quant-xs]``.
+Kept in its own module so the numpy-only ``factor_risk``/``risk_measures``/etc.
+consumers don't pull pandas+sklearn.
+References:
+  - Fama & MacBeth 1973 "Risk, Return, and Equilibrium: Empirical Tests"
+    (JPE 81(3)) — cross-sectional-regression construction of factor returns
+  - Grinold & Kahn 2000, _Active Portfolio Management_, Ch. 3 — canonical
+    structural factor risk model
+  - Menchero, Orr & Wang 2011 "The Barra US Equity Model (USE4)
+    Methodology Notes" — operational reference
+"""
+from __future__ import annotations
+import logging
+from typing import Iterable
+import numpy as np
+import pandas as pd
+log = logging.getLogger(__name__)
+_MIN_OBS_OVER_K = 5  # require ≥ K + 5 valid observations for a stable regression
+def cross_sectional_factor_returns(
+    returns_t: np.ndarray,
+    loadings_prev: np.ndarray,
+    *,
+    include_intercept: bool = True,
+) -> tuple[np.ndarray, np.ndarray]:
+    """Solve r_t = B_{t-1} · f_t + ε_t for one date via OLS.
+    Args:
+        returns_t: (N,) realized returns at time t.
+        loadings_prev: (N, K) factor loadings at time t-1.
+        include_intercept: if True, prepends a column of 1s to the
+            loadings (the "market" factor return). f_t[0] becomes the
+            cross-sectional mean return; f_t[1:] are the per-factor
+            slopes. Default True.
+    Returns:
+        (f_t, residuals):
+          • f_t: (K_eff,) factor return vector — length K+1 with
+            intercept, K without.
+          • residuals: (N,) per-ticker ε_t. NaN for rows where the
+            inputs had NaN (preserved positionally so the caller can
+            keep aligning with the universe).
+    Rows with NaN in either r_t or any column of B_{t-1} are excluded
+    from the regression. If fewer than K_eff + 5 valid rows remain
+    (the regression is unstable), returns all-NaN for both outputs.
+    """
+    returns_t = np.asarray(returns_t, dtype=np.float64).ravel()
+    loadings_prev = np.asarray(loadings_prev, dtype=np.float64)
+    if loadings_prev.ndim != 2:
+        raise ValueError(
+            f"loadings_prev must be 2-D (N × K); got shape {loadings_prev.shape}"
+        )
+    N, K = loadings_prev.shape
+    if returns_t.shape != (N,):
+        raise ValueError(
+            f"returns_t shape {returns_t.shape} != ({N},) matching loadings rows"
+        )
+    if include_intercept:
+        B = np.column_stack([np.ones(N), loadings_prev])
+        K_eff = K + 1
+    else:
+        B = loadings_prev
+        K_eff = K
+    valid = np.isfinite(returns_t) & np.all(np.isfinite(B), axis=1)
+    n_valid = int(valid.sum())
+    if n_valid < K_eff + _MIN_OBS_OVER_K:
+        return np.full(K_eff, np.nan), np.full(N, np.nan)
+    r_valid = returns_t[valid]
+    B_valid = B[valid]
+    # OLS via lstsq is rank-robust (returns minimum-norm solution if B
+    # is rank-deficient). Rank-deficient B is a soft warning, not an
+    # error — caller decides whether to drop low-rank dates.
+    f_t, *_ = np.linalg.lstsq(B_valid, r_valid, rcond=None)
+    residuals = np.full(N, np.nan)
+    residuals[valid] = r_valid - B_valid @ f_t
+    return f_t, residuals
+def build_factor_returns_series(
+    returns_panel: pd.DataFrame,
+    loadings_by_date: dict[pd.Timestamp, pd.DataFrame],
+    *,
+    include_intercept: bool = True,
+    factor_names: Iterable[str] | None = None,
+) -> tuple[pd.DataFrame, pd.DataFrame]:
+    """Loop over dates in ``returns_panel``; for each date t, run the
+    cross-sectional regression r_t = B_{t-1} · f_t + ε_t.
+    Args:
+        returns_panel: (T × N) DataFrame indexed by date, columns are
+            ticker names. r_t is the t-th row.
+        loadings_by_date: mapping date_t-1 → (N × K) DataFrame of
+            factor loadings for that date. Indexed by ticker, columns
+            are factor names. The driver looks up loadings at the
+            previous available date for each t (most recent ≤ t-1).
+        include_intercept: prepends a market-factor column. See
+            cross_sectional_factor_returns. Default True.
+        factor_names: optional explicit order for the K factor columns.
+            If provided, loadings_by_date entries are reindexed to this
+            order. Default: use the order of the first loadings frame.
+    Returns:
+        (factor_returns_df, residuals_df):
+          • factor_returns_df: (T × K_eff) — index matches returns_panel
+            dates; columns are ["market", *factor_names] when intercept
+            is on, [*factor_names] when off.
+          • residuals_df: (T × N) — same shape as returns_panel; NaN
+            where the regression was skipped or input was missing.
+    """
+    if returns_panel.empty:
+        return pd.DataFrame(), pd.DataFrame()
+    dates = list(returns_panel.index)
+    tickers = list(returns_panel.columns)
+    N = len(tickers)
+    # Resolve canonical factor name list from the first usable loadings frame
+    if factor_names is None:
+        sample = next(iter(loadings_by_date.values()), None)
+        if sample is None:
+            raise ValueError("loadings_by_date is empty — nothing to regress against")
+        factor_names = list(sample.columns)
+    factor_names = list(factor_names)
+    K = len(factor_names)
+    col_names = (["market"] + factor_names) if include_intercept else factor_names
+    f_panel = np.full((len(dates), len(col_names)), np.nan)
+    eps_panel = np.full((len(dates), N), np.nan)
+    sorted_loading_dates = sorted(loadings_by_date.keys())
+    for i, date_t in enumerate(dates):
+        prev_date = _latest_loading_date_at_or_before(sorted_loading_dates, date_t)
+        if prev_date is None:
+            continue
+        B_df = loadings_by_date[prev_date].reindex(index=tickers, columns=factor_names)
+        if B_df.empty:
+            continue
+        B = B_df.to_numpy(dtype=np.float64)
+        r = returns_panel.iloc[i].to_numpy(dtype=np.float64)
+        f_t, residuals = cross_sectional_factor_returns(
+            r, B, include_intercept=include_intercept,
+        )
+        f_panel[i] = f_t
+        eps_panel[i] = residuals
+    factor_returns_df = pd.DataFrame(f_panel, index=dates, columns=col_names)
+    residuals_df = pd.DataFrame(eps_panel, index=dates, columns=tickers)
+    return factor_returns_df, residuals_df
+def _latest_loading_date_at_or_before(
+    sorted_dates: list[pd.Timestamp], cutoff: pd.Timestamp,
+) -> pd.Timestamp | None:
+    """Bisect for the latest loading-date strictly < cutoff (informationally
+    safe: at date t we only know loadings as of date t-1)."""
+    import bisect
+    idx = bisect.bisect_left(sorted_dates, cutoff)
+    if idx == 0:
+        return None
+    return sorted_dates[idx - 1]
+def estimate_factor_covariance(
+    factor_returns_df: pd.DataFrame,
+    *,
+    shrinkage: str = "ledoit_wolf",
+    min_obs: int = 30,
+) -> pd.DataFrame:
+    """Estimate F = Cov(f_t) over the factor-return panel.
+    Drops rows with any NaN (incomplete regressions). Default LW shrinkage
+    mirrors the executor's portfolio_optimizer default; "sample" and "oas"
+    also supported. Reuses sklearn estimators.
+    Args:
+        factor_returns_df: (T × K_eff) factor-return panel from
+            build_factor_returns_series.
+        shrinkage: estimator name. "ledoit_wolf" (default), "sample", "oas".
+        min_obs: minimum clean rows required. Below floor returns an
+            all-NaN F so the caller knows the build was insufficient
+            (per no-silent-fails — would-be downstream consumers of F
+            see NaN, not silently zero).
+    Returns:
+        F: (K_eff × K_eff) DataFrame, index + columns are factor names.
+    """
+    clean = factor_returns_df.dropna()
+    K = factor_returns_df.shape[1]
+    cols = list(factor_returns_df.columns)
+    if len(clean) < min_obs:
+        log.warning(
+            "estimate_factor_covariance: only %d clean rows (need ≥%d) — "
+            "returning all-NaN F", len(clean), min_obs,
+        )
+        return pd.DataFrame(np.full((K, K), np.nan), index=cols, columns=cols)
+    if shrinkage == "ledoit_wolf":
+        from sklearn.covariance import LedoitWolf
+        F = LedoitWolf().fit(clean.to_numpy()).covariance_
+    elif shrinkage == "oas":
+        from sklearn.covariance import OAS
+        F = OAS().fit(clean.to_numpy()).covariance_
+    elif shrinkage == "sample":
+        F = np.cov(clean.to_numpy(), rowvar=False)
+    else:
+        raise ValueError(f"Unknown shrinkage: {shrinkage!r}")
+    return pd.DataFrame(F, index=cols, columns=cols)
+def estimate_idiosyncratic_variance(
+    residuals_df: pd.DataFrame,
+    *,
+    min_obs: int = 30,
+) -> pd.Series:
+    """Per-ticker D_{ii} = Var(ε_{i,t}) — diagonal of the residual cov.
+    Tickers with fewer than ``min_obs`` non-NaN residual rows are
+    emitted as NaN per no-silent-fails (downstream Σ = B·F·Bᵀ + D
+    construction treats NaN D as "skip this name" or falls back to a
+    safe default).
+    Args:
+        residuals_df: (T × N) residual panel from
+            build_factor_returns_series.
+        min_obs: minimum non-NaN observations per ticker.
+    Returns:
+        D: (N,) Series indexed by ticker.
+    """
+    out = pd.Series(np.nan, index=residuals_df.columns, dtype=np.float64)
+    for ticker in residuals_df.columns:
+        eps = residuals_df[ticker].dropna()
+        if len(eps) < min_obs:
+            continue
+        # Population variance (N divisor — universe is the population for
+        # cross-sectional regressions) to match the F estimator convention.
+        out[ticker] = float(eps.var(ddof=0))
+    return out
+def build_factor_risk_model(
+    returns_panel: pd.DataFrame,
+    loadings_by_date: dict[pd.Timestamp, pd.DataFrame],
+    *,
+    include_intercept: bool = True,
+    cov_shrinkage: str = "ledoit_wolf",
+    min_cov_obs: int = 30,
+    min_idio_obs: int = 30,
+) -> dict:
+    """End-to-end builder: cross-sectional regressions → F + D.
+    Returns a dict with keys:
+      • "factor_returns": (T × K_eff) DataFrame
+      • "residuals": (T × N) DataFrame
+      • "F": (K_eff × K_eff) DataFrame
+      • "D": (N,) Series
+      • "metadata": dict with n_dates, n_clean_dates, K_eff, n_tickers
+    """
+    factor_returns, residuals = build_factor_returns_series(
+        returns_panel, loadings_by_date,
+        include_intercept=include_intercept,
+    )
+    F = estimate_factor_covariance(
+        factor_returns, shrinkage=cov_shrinkage, min_obs=min_cov_obs,
+    )
+    D = estimate_idiosyncratic_variance(residuals, min_obs=min_idio_obs)
+    n_clean = int(factor_returns.dropna().shape[0])
+    metadata = {
+        "n_dates": int(factor_returns.shape[0]),
+        "n_clean_dates": n_clean,
+        "K_eff": int(factor_returns.shape[1]),
+        "n_tickers": int(returns_panel.shape[1]),
+        "cov_shrinkage": cov_shrinkage,
+        "include_intercept": bool(include_intercept),
+    }
+    return {
+        "factor_returns": factor_returns,
+        "residuals": residuals,
+        "F": F,
+        "D": D,
+        "metadata": metadata,
+    }
+__all__ = [
+    "cross_sectional_factor_returns",
+    "build_factor_returns_series",
+    "estimate_factor_covariance",
+    "estimate_idiosyncratic_variance",
+    "build_factor_risk_model",
+]
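
The pipeline in `factor_risk_xs.py` (per-date cross-sectional OLS, then `F` from the factor-return panel and `D` from the residuals) can be illustrated end to end on synthetic data. The sketch below is a numpy-only toy under stated assumptions, not the module itself: loadings are held constant over time, the panel sizes and noise scales are made up, `F` uses the plain sample covariance instead of the module's default Ledoit-Wolf shrinkage, and the minimum-observation guards are omitted. It finishes by evaluating the portfolio variance from `Σ = B·F·Bᵀ + D` for an equal-weight portfolio.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 60, 200, 3                      # dates, tickers, style factors (all synthetic)

B = rng.normal(size=(N, K))               # exogenous loadings, fixed over time in this toy
X = np.column_stack([np.ones(N), B])      # intercept column plays the "market" factor
f_true = rng.normal(scale=0.01, size=(T, K + 1))
eps = rng.normal(scale=0.02, size=(T, N))
returns = f_true @ X.T + eps              # (T, N) panel: r_t = X f_t + eps_t
returns[0, :5] = np.nan                   # a few missing observations on the first date

f_hat = np.full((T, K + 1), np.nan)
resid = np.full((T, N), np.nan)
for t in range(T):
    valid = np.isfinite(returns[t])       # loadings are all finite in this toy panel
    # Cross-sectional OLS for date t over the rows with valid returns.
    f_hat[t], *_ = np.linalg.lstsq(X[valid], returns[t, valid], rcond=None)
    # Residuals stay NaN where the input was missing (positional alignment).
    resid[t, valid] = returns[t, valid] - X[valid] @ f_hat[t]

F = np.cov(f_hat, rowvar=False)           # (K+1, K+1) sample factor covariance
D = np.nanvar(resid, axis=0)              # (N,) population variance (ddof=0) per ticker

w = np.full(N, 1.0 / N)                   # equal-weight portfolio
exposure = X.T @ w                        # portfolio factor exposures, shape (K+1,)
factor_var = float(exposure @ F @ exposure)
idio_var = float(np.sum(w ** 2 * D))
total_vol = float(np.sqrt(factor_var + idio_var))
```

Splitting the variance into `factor_var` and `idio_var` avoids materializing the full N × N matrix; it is algebraically the same as `w @ (X @ F @ X.T + np.diag(D)) @ w`.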
