pathforge 0.1.1__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pathforge
3
- Version: 0.1.1
3
+ Version: 0.2.0
4
4
  Summary: Simulate realistic financial markets from historical price data
5
5
  Description-Content-Type: text/markdown
6
6
 
@@ -21,6 +21,7 @@ Testing a trading strategy on a single historical price series tells you how it
21
21
  ## Installation
22
22
  ```bash
23
23
  pip install pathforge
24
+ pip install numba # required for markov_egarch model
24
25
  ```
25
26
 
26
27
  To use the built-in plot functionality:
@@ -56,6 +57,7 @@ df = sim.to_dataframe() # shape: (253, 100)
56
57
 
57
58
  | Model | `model=` | Best for |
58
59
  |---|---|---|
60
+ | Markov-switching EGARCH | `"markov_egarch"` | Research-grade: hidden regimes + volatility clustering + fat tails |
59
61
  | Geometric Brownian Motion | `"gbm"` | Fast baseline, simple assumptions |
60
62
  | GARCH(1,1) | `"garch"` | Realistic volatility clustering |
61
63
  | Block Bootstrap | `"bootstrap"` | Non-parametric, no distributional assumptions |
@@ -67,6 +69,30 @@ df = sim.to_dataframe() # shape: (253, 100)
67
69
  - **GARCH** — best for most use cases, captures the volatility clustering seen in real markets
68
70
  - **Bootstrap** — most honest for strategy testing, resamples real historical behaviour directly
69
71
  - **Jump Diffusion** — best when your data contains sudden large moves you want to preserve
72
+ - **Markov-switching EGARCH** — the most sophisticated model. Identifies hidden market regimes (calm, stressed, crisis), each with its own EGARCH volatility dynamics and Student-t innovations. Captures regime persistence, volatility clustering, leverage effects, and fat tails simultaneously. Requires a minimum of 2 years of daily data, plus Numba for JIT acceleration.
73
+
74
+ ## Usage Notes
75
+
76
+ ### Markov-switching EGARCH
77
+ The `markov_egarch` model has specific requirements and options:
78
+
79
+ - **Minimum data**: 2 years of daily prices (500+ observations recommended)
80
+ - **Fitting time**: ~1 minute on a modern machine (first call longer due to Numba JIT warmup)
81
+ - **Dependencies**: requires `numba` — `pip install numba`
82
+ ```python
83
+ forge = pf.PathForge(prices)
84
+ forge.fit(
85
+ model="markov_egarch",
86
+ n_states=3, # number of hidden regimes
87
+ n_starts=3, # random restarts for EM algorithm
88
+ verbose=True, # print fitting progress
89
+ random_state=42, # for reproducibility
90
+ min_persistence=0.7 # minimum regime persistence (set to None to disable)
91
+ )
92
+ sim = forge.simulate(days=252, n_paths=100)
93
+ ```
94
+
95
+ > **Note:** This model uses a generalised EM algorithm rather than an exact closed-form M-step. Volatility dynamics are modelled using state-specific, uncentred EGARCH filters, resulting in an approximate likelihood. This approach is designed for practical simulation and backtesting rather than exact state-space inference. See the [GitHub repository](https://github.com/franmanz/pathforge) for full technical details.
70
96
 
71
97
  ## API Reference
72
98
 
@@ -92,10 +118,11 @@ Returned by `.simulate()`.
92
118
 
93
119
  ## Roadmap
94
120
 
95
- - [ ] Poisson jump diffusion ✅
121
+ - [x] Merton Jump Diffusion
122
+ - [x] Markov-switching EGARCH with Student-t innovations
96
123
  - [ ] Intraday timeframes (1m, 5m, 15m, 1h)
97
124
  - [ ] Multi-asset correlated simulation
98
- - [ ] Regime switching model
125
+ - [ ] Centred EGARCH specification
99
126
  - [ ] CLI: `pathforge simulate AAPL --days 252 --paths 500`
100
127
 
101
128
  ## Contributing
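The usage notes above mention regime persistence; the sampler added in this release seeds each simulated path's starting regime from the stationary distribution of the transition matrix, computed from the leading eigenvector of its transpose. A minimal standalone sketch of that computation, with toy values rather than fitted parameters:

```python
import numpy as np

# Stationary distribution of a persistent 3-state transition matrix,
# taken from the eigenvector of A.T for its largest eigenvalue (which is
# 1 for a stochastic matrix). Toy values, not fitted parameters.
A = np.array([[0.95, 0.04, 0.01],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])

eigvals, eigvecs = np.linalg.eig(A.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = np.abs(stationary)
stationary /= stationary.sum()  # normalise to a probability vector
```

By construction the result is invariant under the chain: `stationary @ A` reproduces `stationary`.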
@@ -15,6 +15,7 @@ Testing a trading strategy on a single historical price series tells you how it
15
15
  ## Installation
16
16
  ```bash
17
17
  pip install pathforge
18
+ pip install numba # required for markov_egarch model
18
19
  ```
19
20
 
20
21
  To use the built-in plot functionality:
@@ -50,6 +51,7 @@ df = sim.to_dataframe() # shape: (253, 100)
50
51
 
51
52
  | Model | `model=` | Best for |
52
53
  |---|---|---|
54
+ | Markov-switching EGARCH | `"markov_egarch"` | Research-grade: hidden regimes + volatility clustering + fat tails |
53
55
  | Geometric Brownian Motion | `"gbm"` | Fast baseline, simple assumptions |
54
56
  | GARCH(1,1) | `"garch"` | Realistic volatility clustering |
55
57
  | Block Bootstrap | `"bootstrap"` | Non-parametric, no distributional assumptions |
@@ -61,6 +63,30 @@ df = sim.to_dataframe() # shape: (253, 100)
61
63
  - **GARCH** — best for most use cases, captures the volatility clustering seen in real markets
62
64
  - **Bootstrap** — most honest for strategy testing, resamples real historical behaviour directly
63
65
  - **Jump Diffusion** — best when your data contains sudden large moves you want to preserve
66
+ - **Markov-switching EGARCH** — the most sophisticated model. Identifies hidden market regimes (calm, stressed, crisis), each with its own EGARCH volatility dynamics and Student-t innovations. Captures regime persistence, volatility clustering, leverage effects, and fat tails simultaneously. Requires a minimum of 2 years of daily data, plus Numba for JIT acceleration.
67
+
68
+ ## Usage Notes
69
+
70
+ ### Markov-switching EGARCH
71
+ The `markov_egarch` model has specific requirements and options:
72
+
73
+ - **Minimum data**: 2 years of daily prices (500+ observations recommended)
74
+ - **Fitting time**: ~1 minute on a modern machine (first call longer due to Numba JIT warmup)
75
+ - **Dependencies**: requires `numba` — `pip install numba`
76
+ ```python
77
+ forge = pf.PathForge(prices)
78
+ forge.fit(
79
+ model="markov_egarch",
80
+ n_states=3, # number of hidden regimes
81
+ n_starts=3, # random restarts for EM algorithm
82
+ verbose=True, # print fitting progress
83
+ random_state=42, # for reproducibility
84
+ min_persistence=0.7 # minimum regime persistence (set to None to disable)
85
+ )
86
+ sim = forge.simulate(days=252, n_paths=100)
87
+ ```
88
+
89
+ > **Note:** This model uses a generalised EM algorithm rather than an exact closed-form M-step. Volatility dynamics are modelled using state-specific, uncentred EGARCH filters, resulting in an approximate likelihood. This approach is designed for practical simulation and backtesting rather than exact state-space inference. See the [GitHub repository](https://github.com/franmanz/pathforge) for full technical details.
64
90
 
65
91
  ## API Reference
66
92
 
@@ -86,10 +112,11 @@ Returned by `.simulate()`.
86
112
 
87
113
  ## Roadmap
88
114
 
89
- - [ ] Poisson jump diffusion ✅
115
+ - [x] Merton Jump Diffusion
116
+ - [x] Markov-switching EGARCH with Student-t innovations
90
117
  - [ ] Intraday timeframes (1m, 5m, 15m, 1h)
91
118
  - [ ] Multi-asset correlated simulation
92
- - [ ] Regime switching model
119
+ - [ ] Centred EGARCH specification
93
120
  - [ ] CLI: `pathforge simulate AAPL --days 252 --paths 500`
94
121
 
95
122
  ## Contributing
@@ -1,6 +1,6 @@
1
1
  from pathforge.forge import PathForge
2
2
  from pathforge.result import SimulationResult
3
3
 
4
- __version__ = "0.1.0"
4
+ __version__ = "0.2.0"
5
5
 
6
6
  __all__ = ["PathForge", "SimulationResult"]
@@ -1,3 +1,4 @@
1
+ import inspect
1
2
  import numpy as np
2
3
  import pandas as pd
3
4
  from pathforge.result import SimulationResult
@@ -21,34 +22,62 @@ class PathForge:
21
22
  return data.iloc[:, 0].dropna()
22
23
  raise TypeError("data must be a pandas Series or DataFrame")
23
24
 
24
- def fit(self, model="garch"):
25
+ def fit(self, model="garch", **kwargs):
25
26
  """
26
27
  Fit a simulation model to the historical price data.
27
28
 
28
29
  Parameters
29
30
  ----------
30
31
  model : str
31
- "gbm", "garch", or "bootstrap"
32
+ "gbm", "garch", "bootstrap", "jump_diffusion", "markov_egarch"
33
+ **kwargs
34
+ Model-specific keyword arguments. These are routed to the model
35
+ constructor or the model's ``fit()`` method based on signature.
32
36
  """
33
37
  if model == "gbm":
34
38
  from pathforge.models.gbm import GBMModel
35
- self._model = GBMModel(self._returns)
39
+ model_cls = GBMModel
36
40
  elif model == "garch":
37
41
  from pathforge.models.garch import GARCHModel
38
- self._model = GARCHModel(self._returns)
42
+ model_cls = GARCHModel
39
43
  elif model == "bootstrap":
40
44
  from pathforge.models.bootstrap import BlockBootstrapModel
41
- self._model = BlockBootstrapModel(self._returns)
45
+ model_cls = BlockBootstrapModel
42
46
  elif model == "jump_diffusion":
43
47
  from pathforge.models.jump_diffusion import JumpDiffusionModel
44
- self._model = JumpDiffusionModel(self._returns)
48
+ model_cls = JumpDiffusionModel
49
+ elif model == "markov_egarch":
50
+ from pathforge.models.markov_egarch import MarkovEGARCHModel
51
+ self._model = MarkovEGARCHModel(self._returns, **kwargs)
52
+ self._model.fit()
53
+ self._fitted = True
54
+ return self
45
55
  else:
46
- raise ValueError(f"Unknown model '{model}'. Choose from: gbm, garch, bootstrap")
56
+ raise ValueError(f"Unknown model '{model}'. Choose from: gbm, garch, bootstrap, jump_diffusion, markov_egarch")
47
57
 
48
- self._model.fit()
58
+ init_params = self._accepted_kwargs(model_cls, kwargs)
59
+ fit_params = self._accepted_kwargs(model_cls.fit, kwargs)
60
+ unknown = set(kwargs) - set(init_params) - set(fit_params)
61
+ if unknown:
62
+ unknown_list = ", ".join(sorted(unknown))
63
+ raise TypeError(f"Unexpected keyword argument(s) for model '{model}': {unknown_list}")
64
+
65
+ self._model = model_cls(self._returns, **init_params)
66
+ self._model.fit(**fit_params)
49
67
  self._fitted = True
50
68
  return self  # allows chaining, e.g. forge.fit("garch").simulate(...)
51
69
 
70
+ def _accepted_kwargs(self, callable_obj, kwargs):
71
+ """Return kwargs accepted by a callable, excluding common non-user args."""
72
+ sig = inspect.signature(callable_obj)
73
+ accepted = {}
74
+ for name, param in sig.parameters.items():
75
+ if name in {"self", "returns"}:
76
+ continue
77
+ if name in kwargs:
78
+ accepted[name] = kwargs[name]
79
+ return accepted
80
+
52
81
  # Simulation method
53
82
  def simulate(self, days=252, n_paths=100, start_price=None, seed=None):
54
83
  """
@@ -2,5 +2,6 @@ from pathforge.models.gbm import GBMModel
2
2
  from pathforge.models.garch import GARCHModel
3
3
  from pathforge.models.bootstrap import BlockBootstrapModel
4
4
  from pathforge.models.jump_diffusion import JumpDiffusionModel
5
+ from pathforge.models.markov_egarch import MarkovEGARCHModel
5
6
 
6
- __all__ = ["GBMModel", "GARCHModel", "BlockBootstrapModel", "JumpDiffusionModel"]
7
+ __all__ = ["GBMModel", "GARCHModel", "BlockBootstrapModel", "JumpDiffusionModel","MarkovEGARCHModel"]
@@ -0,0 +1,548 @@
1
+ import numpy as np
2
+ from scipy import optimize
3
+ from pathforge.models.base import BaseModel
4
+ from scipy.special import gammaln
5
+ from numba import njit
6
+
7
+ @njit
8
+ def _egarch_loop(returns, omega, alpha, gamma, beta, K, T):
9
+ """JIT-compiled EGARCH recursion for all states."""
10
+ log_sigma2 = np.empty((K, T))
11
+ var0 = np.var(returns)
12
+ for k in range(K):
13
+ log_sigma2[k, 0] = np.log(var0 + 1e-8)
14
+
15
+ for t in range(1, T):
16
+ for k in range(K):
17
+ ls = log_sigma2[k, t-1]
18
+ ls = min(max(ls, -10.0), 10.0)
19
+ s = np.exp(0.5 * ls)
20
+ s = max(s, 1e-8)
21
+ z = returns[t-1] / s
22
+ z = min(max(z, -10.0), 10.0)
23
+ log_sigma2[k, t] = omega[k] + alpha[k]*z + gamma[k]*abs(z) + beta[k]*log_sigma2[k, t-1]
24
+ log_sigma2[k, t] = min(max(log_sigma2[k, t], -10.0), 10.0)
25
+
26
+ return np.exp(0.5 * log_sigma2)
27
+
28
+
29
+ @njit
30
+ def _egarch_loop_single(returns, om, al, ga, be):
31
+ """JIT-compiled EGARCH recursion for a single state."""
32
+ T = len(returns)
33
+ log_s2 = np.empty(T)
34
+ log_s2[0] = np.log(np.var(returns) + 1e-8)
35
+
36
+ for t in range(1, T):
37
+ s = np.exp(0.5 * min(max(log_s2[t-1], -10.0), 10.0))
38
+ s = max(s, 1e-8)
39
+ z = min(max(returns[t-1] / s, -10.0), 10.0)
40
+ log_s2[t] = min(max(om + al*z + ga*abs(z) + be*log_s2[t-1], -10.0), 10.0)
41
+
42
+ return np.exp(0.5 * log_s2)
43
+
44
+ class MarkovEGARCHModel(BaseModel):
45
+ """
46
+ Markov-switching EGARCH(1,1) with Student-t innovations.
47
+
48
+ The model assumes financial returns are generated by ``n_states`` hidden
49
+ regimes (three by default: calm, stressed, crisis), each with its own
50
+ EGARCH volatility dynamics and Student-t innovation distribution.
51
+
52
+ Parameters
53
+ ----------
54
+ returns : pd.Series
55
+ Daily simple returns.
56
+ n_states : int
57
+ Number of hidden regimes. Defaults to 3.
58
+ n_starts : int
59
+ Number of random starting points for MLE. Defaults to 3.
60
+ """
61
+
62
+ def __init__(self, returns, n_states=3, n_starts=3, verbose=True, random_state=None, min_persistence=0.7):
63
+ super().__init__(returns)
64
+ self.n_states = n_states
65
+ self.n_starts = n_starts
66
+ self.verbose = verbose
67
+ self.random_state = random_state
68
+ self.min_persistence = min_persistence
69
+
70
+ def _egarch_volatility_all(self, omega, alpha, gamma, beta):
71
+ T = len(self.returns)
72
+ K = self.n_states
73
+ return _egarch_loop(self.returns, omega, alpha, gamma, beta, K, T)
74
+
75
+ def _student_t_loglik(self, r, sigma, nu):
76
+ """
77
+ Log likelihood of observation r under standardised Student-t.
78
+ Numerically stable implementation.
79
+ """
80
+ # Clip sigma to prevent division by zero
81
+ sigma = np.clip(sigma, 1e-8, np.inf)
82
+
83
+ # Clip nu to prevent issues near boundary
84
+ nu = np.clip(nu, 2.001, 100.0)
85
+
86
+ c = (gammaln((nu + 1) / 2)
87
+ - gammaln(nu / 2)
88
+ - 0.5 * np.log(np.pi * (nu - 2)))
89
+
90
+ # Compute the log term stably
91
+ z2 = (r / sigma) ** 2
92
+ z2 = np.clip(z2, 0, 1e10)
93
+ log_term = np.log1p(z2 / (nu - 2))
94
+
95
+ return c - np.log(sigma) - (nu + 1) / 2 * log_term
96
+
97
+ def _forward(self, omega, alpha, gamma, beta, nu, A, pi):
98
+ """
99
+ Forward algorithm with log-sum-exp for numerical stability.
100
+
101
+ Returns
102
+ -------
103
+ log_likelihood : float
104
+ Total log likelihood of the observed returns.
105
+ log_alpha : np.ndarray, shape (T, K)
106
+ Filter probabilities in log space.
107
+ """
108
+ T = len(self.returns)
109
+ K = self.n_states
110
+ log_alpha = np.empty((T, K))
111
+
112
+ # Compute all state volatilities in one pass
113
+ sigmas = self._egarch_volatility_all(omega, alpha, gamma, beta)
114
+
115
+ # Vectorised log emissions across all states and time steps
116
+ log_emit = np.array([
117
+ self._student_t_loglik(self.returns, sigmas[k], nu[k])
118
+ for k in range(K)
119
+ ]).T # shape (T, K)
120
+
121
+ # Initialise at t=0
122
+ log_alpha[0] = np.log(pi + 1e-300) + log_emit[0]
123
+
124
+ # Recurse forward — vectorised
125
+ log_A = np.log(A + 1e-300)
126
+ for t in range(1, T):
127
+ v = log_alpha[t - 1][:, np.newaxis] + log_A
128
+ v_max = v.max(axis=0)
129
+ log_alpha[t] = v_max + np.log(np.sum(np.exp(v - v_max), axis=0)) + log_emit[t]
130
+
131
+ # Total log likelihood — log sum exp over final states
132
+ v = log_alpha[-1]
133
+ v_max = v.max()
134
+ log_likelihood = v_max + np.log(np.sum(np.exp(v - v_max)))
135
+
136
+ return log_likelihood, log_alpha, log_emit
137
+
138
+ def _backward(self, log_emit, A):
139
+ """
140
+ Backward algorithm with log-sum-exp for numerical stability.
141
+
142
+ Parameters
143
+ ----------
144
+ log_emit : np.ndarray, shape (T, K)
145
+ Log emission probabilities from the forward pass.
146
+ A : np.ndarray, shape (K, K)
147
+ Transition matrix.
148
+
149
+ Returns
150
+ -------
151
+ log_beta : np.ndarray, shape (T, K)
152
+ Backward probabilities in log space.
153
+ """
154
+ T, K = log_emit.shape
155
+ log_beta = np.zeros((T, K))
156
+ log_A = np.log(A + 1e-300)
157
+
158
+ # Initialise — beta_T = 1, so log_beta_T = 0
159
+ log_beta[-1] = 0.0
160
+
161
+ # Recurse backward
162
+ for t in range(T - 2, -1, -1):
163
+ for k in range(K):
164
+ v = log_A[k, :] + log_emit[t + 1] + log_beta[t + 1]
165
+ v_max = v.max()
166
+ log_beta[t, k] = v_max + np.log(np.sum(np.exp(v - v_max)))
167
+
168
+ return log_beta
169
+
170
+ def _compute_smoothed_probs(self, log_alpha, log_beta, log_emit, A):
171
+ """
172
+ Compute smoothed state probabilities and pairwise transition probabilities.
173
+
174
+ Returns
175
+ -------
176
+ gamma : np.ndarray, shape (T, K)
177
+ Smoothed state probabilities.
178
+ xi : np.ndarray, shape (T-1, K, K)
179
+ Pairwise transition probabilities.
180
+ """
181
+ T, K = log_alpha.shape
182
+ log_A = np.log(A + 1e-300)
183
+
184
+ # Smoothed state probabilities
185
+ log_gamma = log_alpha + log_beta
186
+ # Normalise each row using log-sum-exp
187
+ for t in range(T):
188
+ v = log_gamma[t]
189
+ v_max = v.max()
190
+ log_norm = v_max + np.log(np.sum(np.exp(v - v_max)))
191
+ log_gamma[t] -= log_norm
192
+ gamma = np.exp(log_gamma)
193
+
194
+ # Pairwise transition probabilities
195
+ xi = np.zeros((T - 1, K, K))
196
+ for t in range(T - 1):
197
+ log_xi_t = np.empty((K, K))
198
+ for j in range(K):
199
+ for k in range(K):
200
+ log_xi_t[j, k] = (
201
+ log_alpha[t, j]
202
+ + log_A[j, k]
203
+ + log_emit[t + 1, k]
204
+ + log_beta[t + 1, k]
205
+ )
206
+
207
+ # Normalise in log space to avoid overflow/underflow.
208
+ v_max = np.max(log_xi_t)
209
+ xi_t = np.exp(log_xi_t - v_max)
210
+ xi_sum = xi_t.sum()
211
+ if not np.isfinite(xi_sum) or xi_sum <= 0:
212
+ xi[t] = np.full((K, K), 1.0 / (K * K))
213
+ else:
214
+ xi[t] = xi_t / xi_sum
215
+
216
+ return gamma, xi
217
+
218
+ def _m_step(self, gamma, xi, omega, alpha, gamma_param, beta, nu, A):
219
+ """
220
+ M-step — update parameters given smoothed state probabilities.
221
+
222
+ Parameters
223
+ ----------
224
+ gamma : np.ndarray, shape (T, K)
225
+ Smoothed state probabilities.
226
+ xi : np.ndarray, shape (T-1, K, K)
227
+ Pairwise transition probabilities.
228
+
229
+ Returns
230
+ -------
231
+ Updated parameters.
232
+ """
233
+ K = self.n_states
234
+
235
+ # Update transition matrix — enforce minimum persistence on diagonal
236
+ A_new = xi.sum(axis=0)
237
+ A_new /= A_new.sum(axis=1, keepdims=True) + 1e-300
238
+
239
+ # Optionally enforce minimum diagonal (regime persistence)
240
+ if self.min_persistence is not None:
241
+ for k in range(K):
242
+ if A_new[k, k] < self.min_persistence:
243
+ deficit = self.min_persistence - A_new[k, k]
244
+ off_diag = [j for j in range(K) if j != k]
245
+ A_new[k, k] = self.min_persistence
246
+ for j in off_diag:
247
+ A_new[k, j] = max(A_new[k, j] - deficit / len(off_diag), 1e-4)
248
+ A_new[k] /= A_new[k].sum()
249
+
250
+ # Ensure no transition probability is exactly zero
251
+ A_new = np.maximum(A_new, 1e-4)
252
+ A_new /= A_new.sum(axis=1, keepdims=True)
253
+
254
+ # Update initial state distribution — add small floor to prevent collapse
255
+ pi_new = gamma[0] + 0.05
256
+ pi_new /= pi_new.sum()
257
+
258
+ # Update EGARCH parameters — small optimisation per state
259
+ omega_new = omega.copy()
260
+ alpha_new = alpha.copy()
261
+ gamma_new = gamma_param.copy()
262
+ beta_new = beta.copy()
263
+ nu_new = nu.copy()
264
+
265
+ for k in range(K):
266
+ weights = gamma[:, k]
267
+
268
+ def neg_weighted_ll(params):
269
+ om, al, ga, be_raw, nu_raw = params
270
+ be = np.tanh(be_raw)
271
+ be = np.clip(be, 0.5, 0.97)
272
+ nu_k = np.exp(np.clip(nu_raw, -5, 50)) + 2
273
+ nu_k = np.clip(nu_k, 4.0, 100.0)
274
+ om = np.clip(om, -6, 2)
275
+
276
+ sig = _egarch_loop_single(self.returns, om, al, ga, be)
277
+ ll = self._student_t_loglik(self.returns, sig, nu_k)
278
+ return -np.sum(weights * ll)
279
+
280
+ x0 = np.array([
281
+ omega[k],
282
+ alpha[k],
283
+ gamma_param[k],
284
+ np.arctanh(np.clip(beta[k], -0.999, 0.999)),
285
+ np.log(max(nu[k] - 2, 0.001))
286
+ ])
287
+
288
+ result = optimize.minimize(
289
+ neg_weighted_ll, x0,
290
+ method="Nelder-Mead",
291
+ options={"maxiter": 500, "xatol": 1e-4, "fatol": 1e-4}
292
+ )
293
+
294
+ if result.success or result.fun < neg_weighted_ll(x0):
295
+ omega_new[k] = result.x[0]
296
+ alpha_new[k] = result.x[1]
297
+ gamma_new[k] = result.x[2]
298
+ beta_new[k] = np.clip(np.tanh(result.x[3]), 0.5, 0.97)
299
+ nu_new[k] = np.clip(np.exp(np.clip(result.x[4], -5, 50)) + 2, 4.0, 100.0)
300
+ else:
301
+ if self.verbose:
302
+ print(f" Warning: optimizer failed for state {k}, keeping previous parameters")
303
+ # Keep previous parameters
304
+ omega_new[k] = omega[k]
305
+ alpha_new[k] = alpha[k]
306
+ gamma_new[k] = gamma_param[k]
307
+ beta_new[k] = beta[k]
308
+ nu_new[k] = nu[k]
309
+
310
+ return omega_new, alpha_new, gamma_new, beta_new, nu_new, A_new, pi_new
311
+
312
+
313
+ def fit(self, max_iter=100, tol=1e-3):
314
+ """
315
+ Fit the model using the EM algorithm.
316
+
317
+ Parameters
318
+ ----------
319
+ max_iter : int
320
+ Maximum number of EM iterations.
321
+ tol : float
322
+ Convergence tolerance on log likelihood improvement.
323
+ """
324
+ K = self.n_states
325
+ best_ll = -np.inf
326
+ best_params = None
327
+
328
+ if self.random_state is not None:
329
+ np.random.seed(self.random_state)
330
+
331
+ for start in range(self.n_starts):
332
+ if self.verbose:
333
+ print(f"Starting run {start + 1}/{self.n_starts}...")
334
+
335
+ # Random initialisation
336
+ omega = np.random.normal(-5, 0.5, K)
337
+ alpha = np.random.normal(0, 0.1, K)
338
+ gamma_param = np.random.normal(0, 0.1, K)
339
+ beta = np.random.uniform(0.7, 0.97, K)
340
+ nu = np.random.uniform(4, 10, K)
341
+
342
+ # Initialise transition matrix with strong diagonal
343
+ A = np.ones((K, K)) * 0.05
344
+ np.fill_diagonal(A, 0.90)
345
+ A /= A.sum(axis=1, keepdims=True)
346
+
347
+ # Random initial state distribution
348
+ pi = np.random.dirichlet(np.ones(K))
349
+
350
+ prev_ll = -np.inf
351
+ ll = -np.inf
352
+
353
+ for iteration in range(max_iter):
354
+ # E-step
355
+ try:
356
+ ll, log_alpha, log_emit = self._forward(
357
+ omega, alpha, gamma_param, beta, nu, A, pi
358
+ )
359
+ log_beta = self._backward(log_emit, A)
360
+ gamma, xi = self._compute_smoothed_probs(
361
+ log_alpha, log_beta, log_emit, A
362
+ )
363
+ except Exception as e:
364
+ if self.verbose:
365
+ print(f" E-step failed: {e}")
366
+ break
367
+
368
+ # Check convergence
369
+ improvement = ll - prev_ll
370
+ if self.verbose:
371
+ print(f" Iter {iteration + 1}: ll = {ll:.4f}, improvement = {improvement:.6f}")
372
+
373
+ if iteration > 0 and ll < prev_ll - 1.0:
374
+ if self.verbose:
375
+ print(f" Likelihood crashed, stopping")
376
+ ll = prev_ll
377
+ break
378
+
379
+ if iteration > 0 and improvement < 0:
380
+ if self.verbose:
381
+ print(f" Warning: likelihood decreased by {-improvement:.6f}")
382
+
383
+ if iteration > 0 and 0 <= improvement < tol:
384
+ if self.verbose:
385
+ print(f" Converged at iteration {iteration + 1}")
386
+ break
387
+
388
+ prev_ll = ll
389
+
390
+ # M-step
391
+ try:
392
+ omega, alpha, gamma_param, beta, nu, A, pi = self._m_step(
393
+ gamma, xi, omega, alpha, gamma_param, beta, nu, A
394
+ )
395
+ except Exception as e:
396
+ if self.verbose:
397
+ print(f" M-step failed: {e}")
398
+ break
399
+
400
+ # Keep best result across starts
401
+ if ll > best_ll:
402
+ best_ll = ll
403
+ best_params = (omega, alpha, gamma_param, beta, nu, A, pi)
404
+
405
+ if best_params is None:
406
+ raise RuntimeError("MarkovEGARCH fit failed for all random starts")
407
+
408
+ # Unpack best parameters
409
+ omega, alpha, gamma_param, beta, nu, A, pi = best_params
410
+
411
+ # Reorder states by increasing empirical fitted volatility
412
+ try:
413
+ _, log_alpha_final, log_emit_final = self._forward(
414
+ omega, alpha, gamma_param, beta, nu, A, pi
415
+ )
416
+ log_beta_final = self._backward(log_emit_final, A)
417
+ gamma_final, _ = self._compute_smoothed_probs(
418
+ log_alpha_final, log_beta_final, log_emit_final, A
419
+ )
420
+ sigmas_final = self._egarch_volatility_all(omega, alpha, gamma_param, beta)
421
+ avg_vol = np.array([
422
+ np.average(sigmas_final[k], weights=gamma_final[:, k])
423
+ for k in range(K)
424
+ ])
425
+ order = np.argsort(avg_vol)
426
+ except Exception:
427
+ order = np.argsort(np.abs(omega))
428
+
429
+ self.params_["omega"] = omega[order]
430
+ self.params_["alpha"] = alpha[order]
431
+ self.params_["gamma"] = gamma_param[order]
432
+ self.params_["beta"] = beta[order]
433
+ self.params_["nu"] = nu[order]
434
+ self.params_["A"] = A[np.ix_(order, order)]
435
+ self.params_["pi"] = pi[order]
436
+ self.params_["log_likelihood"] = best_ll
437
+
438
+ # Re-enforce minimum persistence after reordering
439
+ if self.min_persistence is not None:
440
+ A_final = self.params_["A"]
441
+ for k in range(self.n_states):
442
+ if A_final[k, k] < self.min_persistence:
443
+ deficit = self.min_persistence - A_final[k, k]
444
+ off_diag = [j for j in range(self.n_states) if j != k]
445
+ A_final[k, k] = self.min_persistence
446
+ for j in off_diag:
447
+ A_final[k, j] = max(A_final[k, j] - deficit / len(off_diag), 1e-4)
448
+ A_final[k] /= A_final[k].sum()
449
+ # Ensure no transition probability is exactly zero
450
+ A_final = np.maximum(A_final, 1e-4)
451
+ A_final /= A_final.sum(axis=1, keepdims=True)
452
+ self.params_["A"] = A_final
453
+
454
+
455
+
456
+ def sample(self, days, n_paths, burn_in=100):
457
+ """
458
+ Generate simulated return paths from the fitted model.
459
+
460
+ Parameters
461
+ ----------
462
+ days : int
463
+ Number of days to simulate.
464
+ n_paths : int
465
+ Number of independent paths.
466
+ burn_in : int
467
+ Number of initial steps to discard to reduce initialisation
468
+ bias. Defaults to 100.
469
+
470
+ Returns
471
+ -------
472
+ np.ndarray, shape (days, n_paths)
473
+ Simulated simple returns.
474
+ """
475
+
476
+ omega = self.params_["omega"]
477
+ alpha = self.params_["alpha"]
478
+ gamma = self.params_["gamma"]
479
+ beta = self.params_["beta"]
480
+ nu = self.params_["nu"]
481
+ A = self.params_["A"]
482
+ pi = self.params_["pi"]
483
+
484
+ # Stationary distribution of A for initial state sampling
485
+ try:
486
+ eigvals, eigvecs = np.linalg.eig(A.T)
487
+ stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
488
+ stationary = np.abs(stationary)
489
+ stationary /= stationary.sum()
490
+ except Exception:
491
+ stationary = pi
492
+
493
+ simulated = np.empty((days, n_paths))
494
+
495
+ for path in range(n_paths):
496
+ # Sample initial state from stationary distribution
497
+ state = np.random.choice(self.n_states, p=stationary)
498
+
499
+ # State-dependent initial log variance
500
+ log_sigma2 = np.log(np.var(self.returns) + 1e-8)
501
+ prev_return = 0.0
502
+
503
+ # Burn-in period — discard these steps
504
+ for _ in range(burn_in):
505
+ sigma = np.exp(0.5 * np.clip(log_sigma2, -10, 10))
506
+ z = prev_return / (sigma + 1e-300)
507
+ log_sigma2 = np.clip(
508
+ omega[state] + alpha[state] * z
509
+ + gamma[state] * np.abs(z)
510
+ + beta[state] * log_sigma2,
511
+ -10, 10
512
+ )
513
+ sigma = np.exp(0.5 * log_sigma2)
514
+ prev_return = self._sample_student_t(sigma, nu[state])
515
+ state = np.random.choice(self.n_states, p=A[state])
516
+
517
+ # Actual simulation
518
+ for t in range(days):
519
+ sigma = np.exp(0.5 * np.clip(log_sigma2, -10, 10))
520
+ z = prev_return / (sigma + 1e-300)
521
+ log_sigma2 = np.clip(
522
+ omega[state] + alpha[state] * z
523
+ + gamma[state] * np.abs(z)
524
+ + beta[state] * log_sigma2,
525
+ -10, 10
526
+ )
527
+ sigma = np.exp(0.5 * log_sigma2)
528
+ r = self._sample_student_t(sigma, nu[state])
529
+ simulated[t, path] = r
530
+ prev_return = r
531
+ state = np.random.choice(self.n_states, p=A[state])
532
+
533
+ return simulated
534
+
535
+
536
+ def _sample_student_t(self, sigma, nu):
537
+ """Sample one return from a standardised Student-t scaled by sigma."""
538
+ nu = np.clip(nu, 4.0, 100.0)
539
+ x = np.random.standard_t(nu)
540
+ # Rescale to match standardised Student-t variance
541
+ x = x * np.sqrt((nu - 2) / nu)
542
+ return x * sigma
543
+
544
+ @property
545
+ def name(self):
546
+ return "markov_egarch"
547
+
548
+
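The `_forward` pass in the new model relies on the log-sum-exp trick so that tiny emission probabilities never underflow. As a toy check of that recursion (illustrative only: two states and Gaussian emissions rather than Student-t), the log-space forward pass can be verified against a brute-force sum over all state paths, which is feasible for a very short series:

```python
import numpy as np

# Toy 2-state HMM parameters (illustrative, not fitted values)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # transition matrix
pi = np.array([0.5, 0.5])            # initial distribution
sigma = np.array([0.5, 2.0])         # calm vs volatile emission scales
x = np.array([0.1, -0.3, 2.5, 0.2])  # observed returns
T, K = len(x), 2

def log_gauss(v, s):
    # Log density of N(0, s^2) at v
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * (v / s) ** 2

log_emit = np.array([[log_gauss(x[t], sigma[k]) for k in range(K)]
                     for t in range(T)])  # shape (T, K)

# Forward recursion in log space with log-sum-exp
log_alpha = np.empty((T, K))
log_alpha[0] = np.log(pi) + log_emit[0]
log_A = np.log(A)
for t in range(1, T):
    v = log_alpha[t - 1][:, None] + log_A  # [j, k] = prev j -> next k
    m = v.max(axis=0)
    log_alpha[t] = m + np.log(np.exp(v - m).sum(axis=0)) + log_emit[t]
m = log_alpha[-1].max()
ll = m + np.log(np.exp(log_alpha[-1] - m).sum())  # total log likelihood

# Brute force: sum P(path) * P(x | path) over all K**T state paths
total = 0.0
for path in np.ndindex(*([K] * T)):
    p = pi[path[0]] * np.exp(log_emit[0, path[0]])
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * np.exp(log_emit[t, path[t]])
    total += p
```

The two computations agree to machine precision on this short series; the log-space version is the one that survives when `T` is in the thousands and the per-step probabilities are vanishingly small.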
@@ -85,7 +85,7 @@ class SimulationResult:
85
85
  zorder=5
86
86
  )
87
87
 
88
- ax.set_title(f"PathForge Simulation {self.model_name} ({self.paths.shape[1]} paths)")
88
+ ax.set_title(f"PathForge Simulation, {self.model_name} ({self.paths.shape[1]} paths)")
89
89
  ax.set_xlabel("Days")
90
90
  ax.set_ylabel("Price")
91
91
  ax.grid()
@@ -13,4 +13,5 @@ pathforge/models/bootstrap.py
13
13
  pathforge/models/garch.py
14
14
  pathforge/models/gbm.py
15
15
  pathforge/models/jump_diffusion.py
16
+ pathforge/models/markov_egarch.py
16
17
  tests/test_pathforge.py
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "pathforge"
7
- version = "0.1.1"
7
+ version = "0.2.0"
8
8
  description = "Simulate realistic financial markets from historical price data"
9
9
  readme = "README.md"
10
10
 
@@ -66,12 +66,28 @@ def test_start_price(price_series):
66
66
  assert np.all(sim.paths[0] == 500.0)
67
67
 
68
68
 
69
- def test_to_dataframe(price_series):
70
- forge = pf.PathForge(price_series)
71
- forge.fit(model="gbm")
72
- sim = forge.simulate(days=50, n_paths=5, seed=0)
73
- df = sim.to_dataframe()
74
- assert df.shape == (51, 5)
75
- assert list(df.columns) == [f"path_{i}" for i in range(5)]
76
-
77
-
69
+ def test_to_dataframe(price_series):
70
+ forge = pf.PathForge(price_series)
71
+ forge.fit(model="gbm")
72
+ sim = forge.simulate(days=50, n_paths=5, seed=0)
73
+ df = sim.to_dataframe()
74
+ assert df.shape == (51, 5)
75
+ assert list(df.columns) == [f"path_{i}" for i in range(5)]
76
+
77
+
78
+ def test_fit_routes_model_constructor_kwargs(price_series):
79
+ forge = pf.PathForge(price_series)
80
+ forge.fit(model="jump_diffusion", jump_threshold=2.5)
81
+ assert forge._model.jump_threshold == 2.5
82
+
83
+
84
+ def test_fit_routes_model_fit_kwargs_to_markov_egarch(price_series):
85
+ short_prices = price_series.iloc[:200]
86
+ forge = pf.PathForge(short_prices)
87
+ forge.fit(model="markov_egarch", n_starts=1, max_iter=1)
88
+
89
+ transition_matrix = forge._model.params_["A"]
90
+ assert transition_matrix.shape == (3, 3)
91
+ assert np.isfinite(transition_matrix).all()
92
+
93
+
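The test above checks that the fitted transition matrix is well formed; the persistence floor that shapes it (applied in the M-step and re-applied after state reordering) can be sketched standalone. The helper name below is illustrative, but the redistribution logic mirrors the diff:

```python
import numpy as np

def enforce_persistence(A, min_persistence=0.7):
    # Raise any diagonal entry below the floor, subtract the deficit
    # evenly from the off-diagonal entries (with a small positive floor),
    # then renormalise each row to keep the matrix stochastic.
    A = A.copy()
    K = A.shape[0]
    for k in range(K):
        if A[k, k] < min_persistence:
            deficit = min_persistence - A[k, k]
            off = [j for j in range(K) if j != k]
            A[k, k] = min_persistence
            for j in off:
                A[k, j] = max(A[k, j] - deficit / len(off), 1e-4)
            A[k] /= A[k].sum()
    A = np.maximum(A, 1e-4)  # never exactly zero
    return A / A.sum(axis=1, keepdims=True)

A = np.array([[0.50, 0.30, 0.20],
              [0.10, 0.80, 0.10],
              [0.25, 0.25, 0.50]])
A_floored = enforce_persistence(A)
```

Rows whose diagonal already meets the floor (the middle row here) pass through untouched, so the floor only intervenes when EM drifts toward rapidly switching regimes.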