pathforge 0.1.1__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pathforge
3
- Version: 0.1.1
3
+ Version: 0.2.0
4
4
  Summary: Simulate realistic financial markets from historical price data
5
5
  Description-Content-Type: text/markdown
6
6
 
@@ -21,6 +21,7 @@ Testing a trading strategy on a single historical price series tells you how it
21
21
  ## Installation
22
22
  ```bash
23
23
  pip install pathforge
24
+ pip install numba # required for markov_egarch model
24
25
  ```
25
26
 
26
27
  To use the built-in plot functionality:
@@ -56,6 +57,7 @@ df = sim.to_dataframe() # shape: (253, 100)
56
57
 
57
58
  | Model | `model=` | Best for |
58
59
  |---|---|---|
60
+ | Markov-switching EGARCH | `"markov_egarch"` | Research-grade: hidden regimes + volatility clustering + fat tails |
59
61
  | Geometric Brownian Motion | `"gbm"` | Fast baseline, simple assumptions |
60
62
  | GARCH(1,1) | `"garch"` | Realistic volatility clustering |
61
63
  | Block Bootstrap | `"bootstrap"` | Non-parametric, no distributional assumptions |
@@ -67,6 +69,30 @@ df = sim.to_dataframe() # shape: (253, 100)
67
69
  - **GARCH** — best for most use cases, captures the volatility clustering seen in real markets
68
70
  - **Bootstrap** — most honest for strategy testing, resamples real historical behaviour directly
69
71
  - **Jump Diffusion** — best when your data contains sudden large moves you want to preserve
72
+ - **Markov-switching EGARCH** — the most sophisticated model. Identifies hidden market regimes (calm, stressed, crisis), each with its own EGARCH volatility dynamics and Student-t innovations. Captures regime persistence, volatility clustering, leverage effects, and fat tails simultaneously. Requires a minimum of 2 years of daily data, plus Numba for JIT acceleration.
73
+
74
+ ## Usage Notes
75
+
76
+ ### Markov-switching EGARCH
77
+ The `markov_egarch` model has specific requirements and options:
78
+
79
+ - **Minimum data**: 2 years of daily prices (500+ observations recommended)
80
+ - **Fitting time**: ~1 minute on a modern machine (first call longer due to Numba JIT warmup)
81
+ - **Dependencies**: requires `numba` — `pip install numba`
82
+ ```python
83
+ forge = pf.PathForge(prices)
84
+ forge.fit(
85
+ model="markov_egarch",
86
+ n_states=3, # number of hidden regimes
87
+ n_starts=3, # random restarts for EM algorithm
88
+ verbose=True, # print fitting progress
89
+ random_state=42, # for reproducibility
90
+ min_persistence=0.7 # minimum regime persistence (set to None to disable)
91
+ )
92
+ sim = forge.simulate(days=252, n_paths=100)
93
+ ```
94
+
95
+ > **Note:** This model uses a generalised EM algorithm rather than an exact closed-form M-step. Volatility dynamics are modelled using state-specific, uncentred EGARCH filters, resulting in an approximate likelihood. This approach is designed for practical simulation and backtesting rather than exact state-space inference. See the [GitHub repository](https://github.com/franmanz/pathforge) for full technical details.
70
96
 
71
97
  ## API Reference
72
98
 
@@ -92,10 +118,11 @@ Returned by `.simulate()`.
92
118
 
93
119
  ## Roadmap
94
120
 
95
- - [ ] Poisson jump diffusion ✅
121
+ - [x] Merton Jump Diffusion
122
+ - [x] Markov-switching EGARCH with Student-t innovations
96
123
  - [ ] Intraday timeframes (1m, 5m, 15m, 1h)
97
124
  - [ ] Multi-asset correlated simulation
98
- - [ ] Regime switching model
125
+ - [ ] Centred EGARCH specification
99
126
  - [ ] CLI: `pathforge simulate AAPL --days 252 --paths 500`
100
127
 
101
128
  ## Contributing
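The usage notes above mention regime persistence; the sampler added in this release seeds each simulated path's starting regime from the stationary distribution of the transition matrix, computed from the leading eigenvector of its transpose. A minimal standalone sketch of that computation, with toy values rather than fitted parameters:

```python
import numpy as np

# Stationary distribution of a persistent 3-state transition matrix,
# taken from the eigenvector of A.T for its largest eigenvalue (which is
# 1 for a stochastic matrix). Toy values, not fitted parameters.
A = np.array([[0.95, 0.04, 0.01],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])

eigvals, eigvecs = np.linalg.eig(A.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = np.abs(stationary)
stationary /= stationary.sum()  # normalise to a probability vector
```

By construction the result is invariant under the chain: `stationary @ A` reproduces `stationary`.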
@@ -15,6 +15,7 @@ Testing a trading strategy on a single historical price series tells you how it
15
15
  ## Installation
16
16
  ```bash
17
17
  pip install pathforge
18
+ pip install numba # required for markov_egarch model
18
19
  ```
19
20
 
20
21
  To use the built-in plot functionality:
@@ -50,6 +51,7 @@ df = sim.to_dataframe() # shape: (253, 100)
50
51
 
51
52
  | Model | `model=` | Best for |
52
53
  |---|---|---|
54
+ | Markov-switching EGARCH | `"markov_egarch"` | Research-grade: hidden regimes + volatility clustering + fat tails |
53
55
  | Geometric Brownian Motion | `"gbm"` | Fast baseline, simple assumptions |
54
56
  | GARCH(1,1) | `"garch"` | Realistic volatility clustering |
55
57
  | Block Bootstrap | `"bootstrap"` | Non-parametric, no distributional assumptions |
@@ -61,6 +63,30 @@ df = sim.to_dataframe() # shape: (253, 100)
61
63
  - **GARCH** — best for most use cases, captures the volatility clustering seen in real markets
62
64
  - **Bootstrap** — most honest for strategy testing, resamples real historical behaviour directly
63
65
  - **Jump Diffusion** — best when your data contains sudden large moves you want to preserve
66
+ - **Markov-switching EGARCH** — the most sophisticated model. Identifies hidden market regimes (calm, stressed, crisis), each with its own EGARCH volatility dynamics and Student-t innovations. Captures regime persistence, volatility clustering, leverage effects, and fat tails simultaneously. Requires a minimum of 2 years of daily data, plus Numba for JIT acceleration.
67
+
68
+ ## Usage Notes
69
+
70
+ ### Markov-switching EGARCH
71
+ The `markov_egarch` model has specific requirements and options:
72
+
73
+ - **Minimum data**: 2 years of daily prices (500+ observations recommended)
74
+ - **Fitting time**: ~1 minute on a modern machine (first call longer due to Numba JIT warmup)
75
+ - **Dependencies**: requires `numba` — `pip install numba`
76
+ ```python
77
+ forge = pf.PathForge(prices)
78
+ forge.fit(
79
+ model="markov_egarch",
80
+ n_states=3, # number of hidden regimes
81
+ n_starts=3, # random restarts for EM algorithm
82
+ verbose=True, # print fitting progress
83
+ random_state=42, # for reproducibility
84
+ min_persistence=0.7 # minimum regime persistence (set to None to disable)
85
+ )
86
+ sim = forge.simulate(days=252, n_paths=100)
87
+ ```
88
+
89
+ > **Note:** This model uses a generalised EM algorithm rather than an exact closed-form M-step. Volatility dynamics are modelled using state-specific, uncentred EGARCH filters, resulting in an approximate likelihood. This approach is designed for practical simulation and backtesting rather than exact state-space inference. See the [GitHub repository](https://github.com/franmanz/pathforge) for full technical details.
64
90
 
65
91
  ## API Reference
66
92
 
@@ -86,10 +112,11 @@ Returned by `.simulate()`.
86
112
 
87
113
  ## Roadmap
88
114
 
89
- - [ ] Poisson jump diffusion ✅
115
+ - [x] Merton Jump Diffusion
116
+ - [x] Markov-switching EGARCH with Student-t innovations
90
117
  - [ ] Intraday timeframes (1m, 5m, 15m, 1h)
91
118
  - [ ] Multi-asset correlated simulation
92
- - [ ] Regime switching model
119
+ - [ ] Centred EGARCH specification
93
120
  - [ ] CLI: `pathforge simulate AAPL --days 252 --paths 500`
94
121
 
95
122
  ## Contributing
@@ -1,6 +1,6 @@
1
1
  from pathforge.forge import PathForge
2
2
  from pathforge.result import SimulationResult
3
3
 
4
- __version__ = "0.1.0"
4
+ __version__ = "0.2.0"
5
5
 
6
6
  __all__ = ["PathForge", "SimulationResult"]
@@ -1,3 +1,4 @@
1
+ import inspect
1
2
  import numpy as np
2
3
  import pandas as pd
3
4
  from pathforge.result import SimulationResult
@@ -21,34 +22,62 @@ class PathForge:
21
22
  return data.iloc[:, 0].dropna()
22
23
  raise TypeError("data must be a pandas Series or DataFrame")
23
24
 
24
- def fit(self, model="garch"):
25
+ def fit(self, model="garch", **kwargs):
25
26
  """
26
27
  Fit a simulation model to the historical price data.
27
28
 
28
29
  Parameters
29
30
  ----------
30
31
  model : str
31
- "gbm", "garch", or "bootstrap"
32
+ "gbm", "garch", "bootstrap", "jump_diffusion", "markov_egarch"
33
+ **kwargs
34
+ Model-specific keyword arguments. These are routed to the model
35
+ constructor or the model's ``fit()`` method based on signature.
32
36
  """
33
37
  if model == "gbm":
34
38
  from pathforge.models.gbm import GBMModel
35
- self._model = GBMModel(self._returns)
39
+ model_cls = GBMModel
36
40
  elif model == "garch":
37
41
  from pathforge.models.garch import GARCHModel
38
- self._model = GARCHModel(self._returns)
42
+ model_cls = GARCHModel
39
43
  elif model == "bootstrap":
40
44
  from pathforge.models.bootstrap import BlockBootstrapModel
41
- self._model = BlockBootstrapModel(self._returns)
45
+ model_cls = BlockBootstrapModel
42
46
  elif model == "jump_diffusion":
43
47
  from pathforge.models.jump_diffusion import JumpDiffusionModel
44
- self._model = JumpDiffusionModel(self._returns)
48
+ model_cls = JumpDiffusionModel
49
+ elif model == "markov_egarch":
50
+ from pathforge.models.markov_egarch import MarkovEGARCHModel
51
+ self._model = MarkovEGARCHModel(self._returns, **kwargs)
52
+ self._model.fit()
53
+ self._fitted = True
54
+ return self
45
55
  else:
46
- raise ValueError(f"Unknown model '{model}'. Choose from: gbm, garch, bootstrap")
56
+ raise ValueError(f"Unknown model '{model}'. Choose from: gbm, garch, bootstrap, jump_diffusion, markov_egarch")
47
57
 
48
- self._model.fit()
58
+ init_params = self._accepted_kwargs(model_cls, kwargs)
59
+ fit_params = self._accepted_kwargs(model_cls.fit, kwargs)
60
+ unknown = set(kwargs) - set(init_params) - set(fit_params)
61
+ if unknown:
62
+ unknown_list = ", ".join(sorted(unknown))
63
+ raise TypeError(f"Unexpected keyword argument(s) for model '{model}': {unknown_list}")
64
+
65
+ self._model = model_cls(self._returns, **init_params)
66
+ self._model.fit(**fit_params)
49
67
  self._fitted = True
50
68
  return self  # allows chaining, e.g. forge.fit("garch").simulate(...)
51
69
 
70
+ def _accepted_kwargs(self, callable_obj, kwargs):
71
+ """Return kwargs accepted by a callable, excluding common non-user args."""
72
+ sig = inspect.signature(callable_obj)
73
+ accepted = {}
74
+ for name, param in sig.parameters.items():
75
+ if name in {"self", "returns"}:
76
+ continue
77
+ if name in kwargs:
78
+ accepted[name] = kwargs[name]
79
+ return accepted
80
+
52
81
  # Simulation method
53
82
  def simulate(self, days=252, n_paths=100, start_price=None, seed=None):
54
83
  """
@@ -2,5 +2,6 @@ from pathforge.models.gbm import GBMModel
2
2
  from pathforge.models.garch import GARCHModel
3
3
  from pathforge.models.bootstrap import BlockBootstrapModel
4
4
  from pathforge.models.jump_diffusion import JumpDiffusionModel
5
+ from pathforge.models.markov_egarch import MarkovEGARCHModel
5
6
 
6
- __all__ = ["GBMModel", "GARCHModel", "BlockBootstrapModel", "JumpDiffusionModel"]
7
+ __all__ = ["GBMModel", "GARCHModel", "BlockBootstrapModel", "JumpDiffusionModel","MarkovEGARCHModel"]
@@ -0,0 +1,548 @@
1
+ import numpy as np
2
+ from scipy import optimize
3
+ from pathforge.models.base import BaseModel
4
+ from scipy.special import gammaln
5
+ from numba import njit
6
+
7
+ @njit
8
+ def _egarch_loop(returns, omega, alpha, gamma, beta, K, T):
9
+ """JIT-compiled EGARCH recursion for all states."""
10
+ log_sigma2 = np.empty((K, T))
11
+ var0 = np.var(returns)
12
+ for k in range(K):
13
+ log_sigma2[k, 0] = np.log(var0 + 1e-8)
14
+
15
+ for t in range(1, T):
16
+ for k in range(K):
17
+ ls = log_sigma2[k, t-1]
18
+ ls = min(max(ls, -10.0), 10.0)
19
+ s = np.exp(0.5 * ls)
20
+ s = max(s, 1e-8)
21
+ z = returns[t-1] / s
22
+ z = min(max(z, -10.0), 10.0)
23
+ log_sigma2[k, t] = omega[k] + alpha[k]*z + gamma[k]*abs(z) + beta[k]*log_sigma2[k, t-1]
24
+ log_sigma2[k, t] = min(max(log_sigma2[k, t], -10.0), 10.0)
25
+
26
+ return np.exp(0.5 * log_sigma2)
27
+
28
+
29
+ @njit
30
+ def _egarch_loop_single(returns, om, al, ga, be):
31
+ """JIT-compiled EGARCH recursion for a single state."""
32
+ T = len(returns)
33
+ log_s2 = np.empty(T)
34
+ log_s2[0] = np.log(np.var(returns) + 1e-8)
35
+
36
+ for t in range(1, T):
37
+ s = np.exp(0.5 * min(max(log_s2[t-1], -10.0), 10.0))
38
+ s = max(s, 1e-8)
39
+ z = min(max(returns[t-1] / s, -10.0), 10.0)
40
+ log_s2[t] = min(max(om + al*z + ga*abs(z) + be*log_s2[t-1], -10.0), 10.0)
41
+
42
+ return np.exp(0.5 * log_s2)
43
+
44
+ class MarkovEGARCHModel(BaseModel):
45
+ """
46
+ Markov-switching EGARCH(1,1) with Student-t innovations.
47
+
48
+ The model assumes financial returns are generated by ``n_states`` hidden
49
+ regimes (three by default: calm, stressed, crisis), each with its own
50
+ EGARCH volatility dynamics and Student-t innovation distribution.
51
+
52
+ Parameters
53
+ ----------
54
+ returns : pd.Series
55
+ Daily simple returns.
56
+ n_states : int
57
+ Number of hidden regimes. Defaults to 3.
58
+ n_starts : int
59
+ Number of random starting points for MLE. Defaults to 3.
60
+ """
61
+
62
+ def __init__(self, returns, n_states=3, n_starts=3, verbose=True, random_state=None, min_persistence=0.7):
63
+ super().__init__(returns)
64
+ self.n_states = n_states
65
+ self.n_starts = n_starts
66
+ self.verbose = verbose
67
+ self.random_state = random_state
68
+ self.min_persistence = min_persistence
69
+
70
+ def _egarch_volatility_all(self, omega, alpha, gamma, beta):
71
+ T = len(self.returns)
72
+ K = self.n_states
73
+ return _egarch_loop(self.returns, omega, alpha, gamma, beta, K, T)
74
+
75
+ def _student_t_loglik(self, r, sigma, nu):
76
+ """
77
+ Log likelihood of observation r under standardised Student-t.
78
+ Numerically stable implementation.
79
+ """
80
+ # Clip sigma to prevent division by zero
81
+ sigma = np.clip(sigma, 1e-8, np.inf)
82
+
83
+ # Clip nu to prevent issues near boundary
84
+ nu = np.clip(nu, 2.001, 100.0)
85
+
86
+ c = (gammaln((nu + 1) / 2)
87
+ - gammaln(nu / 2)
88
+ - 0.5 * np.log(np.pi * (nu - 2)))
89
+
90
+ # Compute the log term stably
91
+ z2 = (r / sigma) ** 2
92
+ z2 = np.clip(z2, 0, 1e10)
93
+ log_term = np.log1p(z2 / (nu - 2))
94
+
95
+ return c - np.log(sigma) - (nu + 1) / 2 * log_term
96
+
97
+ def _forward(self, omega, alpha, gamma, beta, nu, A, pi):
98
+ """
99
+ Forward algorithm with log-sum-exp for numerical stability.
100
+
101
+ Returns
102
+ -------
103
+ log_likelihood : float
104
+ Total log likelihood of the observed returns.
105
+ log_alpha : np.ndarray, shape (T, K)
106
+ Filter probabilities in log space.
107
+ """
108
+ T = len(self.returns)
109
+ K = self.n_states
110
+ log_alpha = np.empty((T, K))
111
+
112
+ # Compute all state volatilities in one pass
113
+ sigmas = self._egarch_volatility_all(omega, alpha, gamma, beta)
114
+
115
+ # Vectorised log emissions across all states and time steps
116
+ log_emit = np.array([
117
+ self._student_t_loglik(self.returns, sigmas[k], nu[k])
118
+ for k in range(K)
119
+ ]).T # shape (T, K)
120
+
121
+ # Initialise at t=0
122
+ log_alpha[0] = np.log(pi + 1e-300) + log_emit[0]
123
+
124
+ # Recurse forward — vectorised
125
+ log_A = np.log(A + 1e-300)
126
+ for t in range(1, T):
127
+ v = log_alpha[t - 1][:, np.newaxis] + log_A
128
+ v_max = v.max(axis=0)
129
+ log_alpha[t] = v_max + np.log(np.sum(np.exp(v - v_max), axis=0)) + log_emit[t]
130
+
131
+ # Total log likelihood — log sum exp over final states
132
+ v = log_alpha[-1]
133
+ v_max = v.max()
134
+ log_likelihood = v_max + np.log(np.sum(np.exp(v - v_max)))
135
+
136
+ return log_likelihood, log_alpha, log_emit
137
+
138
+ def _backward(self, log_emit, A):
139
+ """
140
+ Backward algorithm with log-sum-exp for numerical stability.
141
+
142
+ Parameters
143
+ ----------
144
+ log_emit : np.ndarray, shape (T, K)
145
+ Log emission probabilities from the forward pass.
146
+ A : np.ndarray, shape (K, K)
147
+ Transition matrix.
148
+
149
+ Returns
150
+ -------
151
+ log_beta : np.ndarray, shape (T, K)
152
+ Backward probabilities in log space.
153
+ """
154
+ T, K = log_emit.shape
155
+ log_beta = np.zeros((T, K))
156
+ log_A = np.log(A + 1e-300)
157
+
158
+ # Initialise — beta_T = 1, so log_beta_T = 0
159
+ log_beta[-1] = 0.0
160
+
161
+ # Recurse backward
162
+ for t in range(T - 2, -1, -1):
163
+ for k in range(K):
164
+ v = log_A[k, :] + log_emit[t + 1] + log_beta[t + 1]
165
+ v_max = v.max()
166
+ log_beta[t, k] = v_max + np.log(np.sum(np.exp(v - v_max)))
167
+
168
+ return log_beta
169
+
170
+ def _compute_smoothed_probs(self, log_alpha, log_beta, log_emit, A):
171
+ """
172
+ Compute smoothed state probabilities and pairwise transition probabilities.
173
+
174
+ Returns
175
+ -------
176
+ gamma : np.ndarray, shape (T, K)
177
+ Smoothed state probabilities.
178
+ xi : np.ndarray, shape (T-1, K, K)
179
+ Pairwise transition probabilities.
180
+ """
181
+ T, K = log_alpha.shape
182
+ log_A = np.log(A + 1e-300)
183
+
184
+ # Smoothed state probabilities
185
+ log_gamma = log_alpha + log_beta
186
+ # Normalise each row using log-sum-exp
187
+ for t in range(T):
188
+ v = log_gamma[t]
189
+ v_max = v.max()
190
+ log_norm = v_max + np.log(np.sum(np.exp(v - v_max)))
191
+ log_gamma[t] -= log_norm
192
+ gamma = np.exp(log_gamma)
193
+
194
+ # Pairwise transition probabilities
195
+ xi = np.zeros((T - 1, K, K))
196
+ for t in range(T - 1):
197
+ log_xi_t = np.empty((K, K))
198
+ for j in range(K):
199
+ for k in range(K):
200
+ log_xi_t[j, k] = (
201
+ log_alpha[t, j]
202
+ + log_A[j, k]
203
+ + log_emit[t + 1, k]
204
+ + log_beta[t + 1, k]
205
+ )
206
+
207
+ # Normalise in log space to avoid overflow/underflow.
208
+ v_max = np.max(log_xi_t)
209
+ xi_t = np.exp(log_xi_t - v_max)
210
+ xi_sum = xi_t.sum()
211
+ if not np.isfinite(xi_sum) or xi_sum <= 0:
212
+ xi[t] = np.full((K, K), 1.0 / (K * K))
213
+ else:
214
+ xi[t] = xi_t / xi_sum
215
+
216
+ return gamma, xi
217
+
218
+ def _m_step(self, gamma, xi, omega, alpha, gamma_param, beta, nu, A):
219
+ """
220
+ M-step — update parameters given smoothed state probabilities.
221
+
222
+ Parameters
223
+ ----------
224
+ gamma : np.ndarray, shape (T, K)
225
+ Smoothed state probabilities.
226
+ xi : np.ndarray, shape (T-1, K, K)
227
+ Pairwise transition probabilities.
228
+
229
+ Returns
230
+ -------
231
+ Updated parameters.
232
+ """
233
+ K = self.n_states
234
+
235
+ # Update transition matrix — enforce minimum persistence on diagonal
236
+ A_new = xi.sum(axis=0)
237
+ A_new /= A_new.sum(axis=1, keepdims=True) + 1e-300
238
+
239
+ # Optionally enforce minimum diagonal (regime persistence)
240
+ if self.min_persistence is not None:
241
+ for k in range(K):
242
+ if A_new[k, k] < self.min_persistence:
243
+ deficit = self.min_persistence - A_new[k, k]
244
+ off_diag = [j for j in range(K) if j != k]
245
+ A_new[k, k] = self.min_persistence
246
+ for j in off_diag:
247
+ A_new[k, j] = max(A_new[k, j] - deficit / len(off_diag), 1e-4)
248
+ A_new[k] /= A_new[k].sum()
249
+
250
+ # Ensure no transition probability is exactly zero
251
+ A_new = np.maximum(A_new, 1e-4)
252
+ A_new /= A_new.sum(axis=1, keepdims=True)
253
+
254
+ # Update initial state distribution — add small floor to prevent collapse
255
+ pi_new = gamma[0] + 0.05
256
+ pi_new /= pi_new.sum()
257
+
258
+ # Update EGARCH parameters — small optimisation per state
259
+ omega_new = omega.copy()
260
+ alpha_new = alpha.copy()
261
+ gamma_new = gamma_param.copy()
262
+ beta_new = beta.copy()
263
+ nu_new = nu.copy()
264
+
265
+ for k in range(K):
266
+ weights = gamma[:, k]
267
+
268
+ def neg_weighted_ll(params):
269
+ om, al, ga, be_raw, nu_raw = params
270
+ be = np.tanh(be_raw)
271
+ be = np.clip(be, 0.5, 0.97)
272
+ nu_k = np.exp(np.clip(nu_raw, -5, 50)) + 2
273
+ nu_k = np.clip(nu_k, 4.0, 100.0)
274
+ om = np.clip(om, -6, 2)
275
+
276
+ sig = _egarch_loop_single(self.returns, om, al, ga, be)
277
+ ll = self._student_t_loglik(self.returns, sig, nu_k)
278
+ return -np.sum(weights * ll)
279
+
280
+ x0 = np.array([
281
+ omega[k],
282
+ alpha[k],
283
+ gamma_param[k],
284
+ np.arctanh(np.clip(beta[k], -0.999, 0.999)),
285
+ np.log(max(nu[k] - 2, 0.001))
286
+ ])
287
+
288
+ result = optimize.minimize(
289
+ neg_weighted_ll, x0,
290
+ method="Nelder-Mead",
291
+ options={"maxiter": 500, "xatol": 1e-4, "fatol": 1e-4}
292
+ )
293
+
294
+ if result.success or result.fun < neg_weighted_ll(x0):
295
+ omega_new[k] = result.x[0]
296
+ alpha_new[k] = result.x[1]
297
+ gamma_new[k] = result.x[2]
298
+ beta_new[k] = np.clip(np.tanh(result.x[3]), 0.5, 0.97)
299
+ nu_new[k] = np.clip(np.exp(np.clip(result.x[4], -5, 50)) + 2, 4.0, 100.0)
300
+ else:
301
+ if self.verbose:
302
+ print(f" Warning: optimizer failed for state {k}, keeping previous parameters")
303
+ # Keep previous parameters
304
+ omega_new[k] = omega[k]
305
+ alpha_new[k] = alpha[k]
306
+ gamma_new[k] = gamma_param[k]
307
+ beta_new[k] = beta[k]
308
+ nu_new[k] = nu[k]
309
+
310
+ return omega_new, alpha_new, gamma_new, beta_new, nu_new, A_new, pi_new
311
+
312
+
313
+ def fit(self, max_iter=100, tol=1e-3):
314
+ """
315
+ Fit the model using the EM algorithm.
316
+
317
+ Parameters
318
+ ----------
319
+ max_iter : int
320
+ Maximum number of EM iterations.
321
+ tol : float
322
+ Convergence tolerance on log likelihood improvement.
323
+ """
324
+ K = self.n_states
325
+ best_ll = -np.inf
326
+ best_params = None
327
+
328
+ if self.random_state is not None:
329
+ np.random.seed(self.random_state)
330
+
331
+ for start in range(self.n_starts):
332
+ if self.verbose:
333
+ print(f"Starting run {start + 1}/{self.n_starts}...")
334
+
335
+ # Random initialisation
336
+ omega = np.random.normal(-5, 0.5, K)
337
+ alpha = np.random.normal(0, 0.1, K)
338
+ gamma_param = np.random.normal(0, 0.1, K)
339
+ beta = np.random.uniform(0.7, 0.97, K)
340
+ nu = np.random.uniform(4, 10, K)
341
+
342
+ # Initialise transition matrix with strong diagonal
343
+ A = np.ones((K, K)) * 0.05
344
+ np.fill_diagonal(A, 0.90)
345
+ A /= A.sum(axis=1, keepdims=True)
346
+
347
+ # Random initial state distribution
348
+ pi = np.random.dirichlet(np.ones(K))
349
+
350
+ prev_ll = -np.inf
351
+ ll = -np.inf
352
+
353
+ for iteration in range(max_iter):
354
+ # E-step
355
+ try:
356
+ ll, log_alpha, log_emit = self._forward(
357
+ omega, alpha, gamma_param, beta, nu, A, pi
358
+ )
359
+ log_beta = self._backward(log_emit, A)
360
+ gamma, xi = self._compute_smoothed_probs(
361
+ log_alpha, log_beta, log_emit, A
362
+ )
363
+ except Exception as e:
364
+ if self.verbose:
365
+ print(f" E-step failed: {e}")
366
+ break
367
+
368
+ # Check convergence
369
+ improvement = ll - prev_ll
370
+ if self.verbose:
371
+ print(f" Iter {iteration + 1}: ll = {ll:.4f}, improvement = {improvement:.6f}")
372
+
373
+ if iteration > 0 and ll < prev_ll - 1.0:
374
+ if self.verbose:
375
+ print(f" Likelihood crashed, stopping")
376
+ ll = prev_ll
377
+ break
378
+
379
+ if iteration > 0 and improvement < 0:
380
+ if self.verbose:
381
+ print(f" Warning: likelihood decreased by {-improvement:.6f}")
382
+
383
+ if iteration > 0 and 0 <= improvement < tol:
384
+ if self.verbose:
385
+ print(f" Converged at iteration {iteration + 1}")
386
+ break
387
+
388
+ prev_ll = ll
389
+
390
+ # M-step
391
+ try:
392
+ omega, alpha, gamma_param, beta, nu, A, pi = self._m_step(
393
+ gamma, xi, omega, alpha, gamma_param, beta, nu, A
394
+ )
395
+ except Exception as e:
396
+ if self.verbose:
397
+ print(f" M-step failed: {e}")
398
+ break
399
+
400
+ # Keep best result across starts
401
+ if ll > best_ll:
402
+ best_ll = ll
403
+ best_params = (omega, alpha, gamma_param, beta, nu, A, pi)
404
+
405
+ if best_params is None:
406
+ raise RuntimeError("MarkovEGARCH fit failed for all random starts")
407
+
408
+ # Unpack best parameters
409
+ omega, alpha, gamma_param, beta, nu, A, pi = best_params
410
+
411
+ # Reorder states by increasing empirical fitted volatility
412
+ try:
413
+ _, log_alpha_final, log_emit_final = self._forward(
414
+ omega, alpha, gamma_param, beta, nu, A, pi
415
+ )
416
+ log_beta_final = self._backward(log_emit_final, A)
417
+ gamma_final, _ = self._compute_smoothed_probs(
418
+ log_alpha_final, log_beta_final, log_emit_final, A
419
+ )
420
+ sigmas_final = self._egarch_volatility_all(omega, alpha, gamma_param, beta)
421
+ avg_vol = np.array([
422
+ np.average(sigmas_final[k], weights=gamma_final[:, k])
423
+ for k in range(K)
424
+ ])
425
+ order = np.argsort(avg_vol)
426
+ except Exception:
427
+ order = np.argsort(np.abs(omega))
428
+
429
+ self.params_["omega"] = omega[order]
430
+ self.params_["alpha"] = alpha[order]
431
+ self.params_["gamma"] = gamma_param[order]
432
+ self.params_["beta"] = beta[order]
433
+ self.params_["nu"] = nu[order]
434
+ self.params_["A"] = A[np.ix_(order, order)]
435
+ self.params_["pi"] = pi[order]
436
+ self.params_["log_likelihood"] = best_ll
437
+
438
+ # Re-enforce minimum persistence after reordering
439
+ if self.min_persistence is not None:
440
+ A_final = self.params_["A"]
441
+ for k in range(self.n_states):
442
+ if A_final[k, k] < self.min_persistence:
443
+ deficit = self.min_persistence - A_final[k, k]
444
+ off_diag = [j for j in range(self.n_states) if j != k]
445
+ A_final[k, k] = self.min_persistence
446
+ for j in off_diag:
447
+ A_final[k, j] = max(A_final[k, j] - deficit / len(off_diag), 1e-4)
448
+ A_final[k] /= A_final[k].sum()
449
+ # Ensure no transition probability is exactly zero
450
+ A_final = np.maximum(A_final, 1e-4)
451
+ A_final /= A_final.sum(axis=1, keepdims=True)
452
+ self.params_["A"] = A_final
453
+
454
+
455
+
456
+ def sample(self, days, n_paths, burn_in=100):
457
+ """
458
+ Generate simulated return paths from the fitted model.
459
+
460
+ Parameters
461
+ ----------
462
+ days : int
463
+ Number of days to simulate.
464
+ n_paths : int
465
+ Number of independent paths.
466
+ burn_in : int
467
+ Number of initial steps to discard to reduce initialisation
468
+ bias. Defaults to 100.
469
+
470
+ Returns
471
+ -------
472
+ np.ndarray, shape (days, n_paths)
473
+ Simulated simple returns.
474
+ """
475
+
476
+ omega = self.params_["omega"]
477
+ alpha = self.params_["alpha"]
478
+ gamma = self.params_["gamma"]
479
+ beta = self.params_["beta"]
480
+ nu = self.params_["nu"]
481
+ A = self.params_["A"]
482
+ pi = self.params_["pi"]
483
+
484
+ # Stationary distribution of A for initial state sampling
485
+ try:
486
+ eigvals, eigvecs = np.linalg.eig(A.T)
487
+ stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
488
+ stationary = np.abs(stationary)
489
+ stationary /= stationary.sum()
490
+ except Exception:
491
+ stationary = pi
492
+
493
+ simulated = np.empty((days, n_paths))
494
+
495
+ for path in range(n_paths):
496
+ # Sample initial state from stationary distribution
497
+ state = np.random.choice(self.n_states, p=stationary)
498
+
499
+ # State-dependent initial log variance
500
+ log_sigma2 = np.log(np.var(self.returns) + 1e-8)
501
+ prev_return = 0.0
502
+
503
+ # Burn-in period — discard these steps
504
+ for _ in range(burn_in):
505
+ sigma = np.exp(0.5 * np.clip(log_sigma2, -10, 10))
506
+ z = prev_return / (sigma + 1e-300)
507
+ log_sigma2 = np.clip(
508
+ omega[state] + alpha[state] * z
509
+ + gamma[state] * np.abs(z)
510
+ + beta[state] * log_sigma2,
511
+ -10, 10
512
+ )
513
+ sigma = np.exp(0.5 * log_sigma2)
514
+ prev_return = self._sample_student_t(sigma, nu[state])
515
+ state = np.random.choice(self.n_states, p=A[state])
516
+
517
+ # Actual simulation
518
+ for t in range(days):
519
+ sigma = np.exp(0.5 * np.clip(log_sigma2, -10, 10))
520
+ z = prev_return / (sigma + 1e-300)
521
+ log_sigma2 = np.clip(
522
+ omega[state] + alpha[state] * z
523
+ + gamma[state] * np.abs(z)
524
+ + beta[state] * log_sigma2,
525
+ -10, 10
526
+ )
527
+ sigma = np.exp(0.5 * log_sigma2)
528
+ r = self._sample_student_t(sigma, nu[state])
529
+ simulated[t, path] = r
530
+ prev_return = r
531
+ state = np.random.choice(self.n_states, p=A[state])
532
+
533
+ return simulated
534
+
535
+
536
+ def _sample_student_t(self, sigma, nu):
537
+ """Sample one return from a standardised Student-t scaled by sigma."""
538
+ nu = np.clip(nu, 4.0, 100.0)
539
+ x = np.random.standard_t(nu)
540
+ # Rescale to match standardised Student-t variance
541
+ x = x * np.sqrt((nu - 2) / nu)
542
+ return x * sigma
543
+
544
+ @property
545
+ def name(self):
546
+ return "markov_egarch"
547
+
548
+
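The `_forward` pass in the new model relies on the log-sum-exp trick so that tiny emission probabilities never underflow. As a toy check of that recursion (illustrative only: two states and Gaussian emissions rather than Student-t), the log-space forward pass can be verified against a brute-force sum over all state paths, which is feasible for a very short series:

```python
import numpy as np

# Toy 2-state HMM parameters (illustrative, not fitted values)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # transition matrix
pi = np.array([0.5, 0.5])            # initial distribution
sigma = np.array([0.5, 2.0])         # calm vs volatile emission scales
x = np.array([0.1, -0.3, 2.5, 0.2])  # observed returns
T, K = len(x), 2

def log_gauss(v, s):
    # Log density of N(0, s^2) at v
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * (v / s) ** 2

log_emit = np.array([[log_gauss(x[t], sigma[k]) for k in range(K)]
                     for t in range(T)])  # shape (T, K)

# Forward recursion in log space with log-sum-exp
log_alpha = np.empty((T, K))
log_alpha[0] = np.log(pi) + log_emit[0]
log_A = np.log(A)
for t in range(1, T):
    v = log_alpha[t - 1][:, None] + log_A  # [j, k] = prev j -> next k
    m = v.max(axis=0)
    log_alpha[t] = m + np.log(np.exp(v - m).sum(axis=0)) + log_emit[t]
m = log_alpha[-1].max()
ll = m + np.log(np.exp(log_alpha[-1] - m).sum())  # total log likelihood

# Brute force: sum P(path) * P(x | path) over all K**T state paths
total = 0.0
for path in np.ndindex(*([K] * T)):
    p = pi[path[0]] * np.exp(log_emit[0, path[0]])
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * np.exp(log_emit[t, path[t]])
    total += p
```

The two computations agree to machine precision on this short series; the log-space version is the one that survives when `T` is in the thousands and the per-step probabilities are vanishingly small.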
@@ -85,7 +85,7 @@ class SimulationResult:
85
85
  zorder=5
86
86
  )
87
87
 
88
- ax.set_title(f"PathForge Simulation {self.model_name} ({self.paths.shape[1]} paths)")
88
+ ax.set_title(f"PathForge Simulation, {self.model_name} ({self.paths.shape[1]} paths)")
89
89
  ax.set_xlabel("Days")
90
90
  ax.set_ylabel("Price")
91
91
  ax.grid()
@@ -13,4 +13,5 @@ pathforge/models/bootstrap.py
13
13
  pathforge/models/garch.py
14
14
  pathforge/models/gbm.py
15
15
  pathforge/models/jump_diffusion.py
16
+ pathforge/models/markov_egarch.py
16
17
  tests/test_pathforge.py
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "pathforge"
7
- version = "0.1.1"
7
+ version = "0.2.0"
8
8
  description = "Simulate realistic financial markets from historical price data"
9
9
  readme = "README.md"
10
10
 
@@ -66,12 +66,28 @@ def test_start_price(price_series):
66
66
  assert np.all(sim.paths[0] == 500.0)
67
67
 
68
68
 
69
- def test_to_dataframe(price_series):
70
- forge = pf.PathForge(price_series)
71
- forge.fit(model="gbm")
72
- sim = forge.simulate(days=50, n_paths=5, seed=0)
73
- df = sim.to_dataframe()
74
- assert df.shape == (51, 5)
75
- assert list(df.columns) == [f"path_{i}" for i in range(5)]
76
-
77
-
69
+ def test_to_dataframe(price_series):
70
+ forge = pf.PathForge(price_series)
71
+ forge.fit(model="gbm")
72
+ sim = forge.simulate(days=50, n_paths=5, seed=0)
73
+ df = sim.to_dataframe()
74
+ assert df.shape == (51, 5)
75
+ assert list(df.columns) == [f"path_{i}" for i in range(5)]
76
+
77
+
78
+ def test_fit_routes_model_constructor_kwargs(price_series):
79
+ forge = pf.PathForge(price_series)
80
+ forge.fit(model="jump_diffusion", jump_threshold=2.5)
81
+ assert forge._model.jump_threshold == 2.5
82
+
83
+
84
+ def test_fit_routes_model_fit_kwargs_to_markov_egarch(price_series):
85
+ short_prices = price_series.iloc[:200]
86
+ forge = pf.PathForge(short_prices)
87
+ forge.fit(model="markov_egarch", n_starts=1, max_iter=1)
88
+
89
+ transition_matrix = forge._model.params_["A"]
90
+ assert transition_matrix.shape == (3, 3)
91
+ assert np.isfinite(transition_matrix).all()
92
+
93
+
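The test above checks that the fitted transition matrix is well formed; the persistence floor that shapes it (applied in the M-step and re-applied after state reordering) can be sketched standalone. The helper name below is illustrative, but the redistribution logic mirrors the diff:

```python
import numpy as np

def enforce_persistence(A, min_persistence=0.7):
    # Raise any diagonal entry below the floor, subtract the deficit
    # evenly from the off-diagonal entries (with a small positive floor),
    # then renormalise each row to keep the matrix stochastic.
    A = A.copy()
    K = A.shape[0]
    for k in range(K):
        if A[k, k] < min_persistence:
            deficit = min_persistence - A[k, k]
            off = [j for j in range(K) if j != k]
            A[k, k] = min_persistence
            for j in off:
                A[k, j] = max(A[k, j] - deficit / len(off), 1e-4)
            A[k] /= A[k].sum()
    A = np.maximum(A, 1e-4)  # never exactly zero
    return A / A.sum(axis=1, keepdims=True)

A = np.array([[0.50, 0.30, 0.20],
              [0.10, 0.80, 0.10],
              [0.25, 0.25, 0.50]])
A_floored = enforce_persistence(A)
```

Rows whose diagonal already meets the floor (the middle row here) pass through untouched, so the floor only intervenes when EM drifts toward rapidly switching regimes.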