pyacm 0.4.tar.gz → 1.0.tar.gz

@@ -1,6 +1,6 @@
- Metadata-Version: 2.1
+ Metadata-Version: 2.2
  Name: pyacm
- Version: 0.4
+ Version: 1.0
  Summary: ACM Term Premium
  Author: Tobias Adrian, Richard K. Crump, Emanuel Moench
  Maintainer: Gustavo Amarante
@@ -12,11 +12,19 @@ Requires-Dist: matplotlib
  Requires-Dist: numpy
  Requires-Dist: pandas
  Requires-Dist: scikit-learn
- Requires-Dist: tqdm
+ Requires-Dist: statsmodels
+ Dynamic: author
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: keywords
+ Dynamic: maintainer
+ Dynamic: maintainer-email
+ Dynamic: requires-dist
+ Dynamic: summary


  [paper_website]: https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr340.pdf
- [inference_atribute]: https://github.com/gusamarante/pyacm/blob/ba641c14e450fc83d22db4ef5e60eadbd489b351/pyacm/acm.py#L203
+

  # pyacm
  Implementation of ["Pricing the Term Structure with Linear Regressions" from
@@ -35,7 +43,6 @@ carries all the relevant variables as attributes:
  - Term premium
  - Historical in-sample expected returns
  - Expected return loadings
- - Hypothesis testing (not sure if correct; more info in the observations below)


  # Installation
@@ -43,6 +50,7 @@ carries all the relevant variables as attributes:
  pip install pyacm
  ```

+
  # Usage
  ```python
  from pyacm import NominalACM
@@ -59,13 +67,9 @@ The tricky part of using this model is getting the correct data format. The
  - Maturities (columns) must be equally spaced in **monthly** frequency and start
  at month 1. This means that you need to construct a bootstrapped curve for every
  date and interpolate it at fixed monthly maturities
- - Whichever maturity you want to be the longest, your input data should have one
- extra column. For example, if you want the term premium estimate up to the 10-year
- yield (120 months), your input data should include maturities up to 121 months.
- This is needed to properly compute the returns.

- # Examples

+ # Examples
  The estimates for the US are available on the [NY FED website](https://www.newyorkfed.org/research/data_indicators/term-premia-tabs#/overview).

  The jupyter notebook [`example_br`](https://github.com/gusamarante/pyacm/blob/main/example_br.ipynb)
@@ -82,14 +86,5 @@ contains an example application to the Brazilian DI futures curve that showcases
  > FRB of New York Staff Report No. 340,
  > Available at SSRN: https://ssrn.com/abstract=1362586 or http://dx.doi.org/10.2139/ssrn.1362586

- The version of the article that was published by the NY FED is not 100% explicit about how the data is manipulated,
- but I found an earlier version of the paper on SSRN where the authors go deeper into the details of how everything is estimated:
- - Data for zero yields uses monthly maturities starting from month 1
- - All principal components and model parameters are estimated with data resampled to a monthly frequency, averaging observations in each month
- - To get daily / real-time estimates, the factor loadings estimated from the monthly frequency are used to transform the daily data
-
-
- # Observations
- I am not completely sure that the computations in the [inference attributes][inference_atribute]
- are correct. If you find any mistakes, please open a pull request following the contributing
- guidelines.
+ I would like to thank Emanuel Moench for sending me his original MATLAB code,
+ which allowed me to replicate these results exactly.
@@ -1,5 +1,5 @@
  [paper_website]: https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr340.pdf
- [inference_atribute]: https://github.com/gusamarante/pyacm/blob/ba641c14e450fc83d22db4ef5e60eadbd489b351/pyacm/acm.py#L203
+

  # pyacm
  Implementation of ["Pricing the Term Structure with Linear Regressions" from
@@ -18,7 +18,6 @@ carries all the relevant variables as attributes:
  - Term premium
  - Historical in-sample expected returns
  - Expected return loadings
- - Hypothesis testing (not sure if correct; more info in the observations below)


  # Installation
@@ -26,6 +25,7 @@ carries all the relevant variables as attributes:
  pip install pyacm
  ```

+
  # Usage
  ```python
  from pyacm import NominalACM
@@ -42,13 +42,9 @@ The tricky part of using this model is getting the correct data format. The
  - Maturities (columns) must be equally spaced in **monthly** frequency and start
  at month 1. This means that you need to construct a bootstrapped curve for every
  date and interpolate it at fixed monthly maturities
- - Whichever maturity you want to be the longest, your input data should have one
- extra column. For example, if you want the term premium estimate up to the 10-year
- yield (120 months), your input data should include maturities up to 121 months.
- This is needed to properly compute the returns.

- # Examples

+ # Examples
  The estimates for the US are available on the [NY FED website](https://www.newyorkfed.org/research/data_indicators/term-premia-tabs#/overview).

  The jupyter notebook [`example_br`](https://github.com/gusamarante/pyacm/blob/main/example_br.ipynb)
@@ -65,14 +61,5 @@ contains an example application to the Brazilian DI futures curve that showcases
  > FRB of New York Staff Report No. 340,
  > Available at SSRN: https://ssrn.com/abstract=1362586 or http://dx.doi.org/10.2139/ssrn.1362586

- The version of the article that was published by the NY FED is not 100% explicit about how the data is manipulated,
- but I found an earlier version of the paper on SSRN where the authors go deeper into the details of how everything is estimated:
- - Data for zero yields uses monthly maturities starting from month 1
- - All principal components and model parameters are estimated with data resampled to a monthly frequency, averaging observations in each month
- - To get daily / real-time estimates, the factor loadings estimated from the monthly frequency are used to transform the daily data
-
-
- # Observations
- I am not completely sure that the computations in the [inference attributes][inference_atribute]
- are correct. If you find any mistakes, please open a pull request following the contributing
- guidelines.
+ I would like to thank Emanuel Moench for sending me his original MATLAB code,
+ which allowed me to replicate these results exactly.
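For context, a minimal sketch of how the 1.0 API described in the README above is meant to be called. The input built here is a hypothetical placeholder (random yields, integer monthly maturities 1 to 121, daily `DatetimeIndex`), purely to illustrate the expected format; a real application would use a bootstrapped curve interpolated at fixed monthly maturities:

```python
import numpy as np
import pandas as pd

from pyacm import NominalACM

# Placeholder input: business days in the index, monthly maturities 1..121
# as integer column labels. The yields are random, for illustration only.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", "2024-12-31", freq="B")
curve = pd.DataFrame(
    data=rng.uniform(0.02, 0.12, (len(dates), 121)),
    index=dates,
    columns=range(1, 122),
)

acm = NominalACM(curve, n_factors=5)

print(acm.tp.tail())   # term premium estimates
print(acm.rny.tail())  # risk-neutral yields
```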
pyacm-1.0/pyacm/acm.py ADDED
@@ -0,0 +1,454 @@
+ import numpy as np
+ import pandas as pd
+
+ from numpy.linalg import inv
+ from sklearn.decomposition import PCA
+ from statsmodels.tools.tools import add_constant
+
+
+ class NominalACM:
+ """
+ This class implements the model from the article:
+
+ Adrian, Tobias, Richard K. Crump, and Emanuel Moench. “Pricing the
+ Term Structure with Linear Regressions.” SSRN Electronic Journal,
+ 2012. https://doi.org/10.2139/ssrn.1362586.
+
+ It handles data transformation, estimates parameters and generates the
+ relevant outputs. The version of the article that was published by the NY
+ FED is not 100% explicit about how the data is manipulated, but I found
+ an earlier version of the paper on SSRN where the authors go deeper into
+ the details of how everything is estimated:
+ - Data for zero yields uses monthly maturities starting from month 1
+ - All principal components and model parameters are estimated with data
+ resampled to a monthly frequency, averaging observations in each
+ month.
+ - To get daily / real-time estimates, the factor loadings estimated
+ from the monthly frequency are used to transform the daily data.
+
+ Attributes
+ ----------
+ n_factors: int
+ number of principal components used
+
+ curve: pandas.DataFrame
+ Raw data of the yield curve
+
+ curve_monthly: pandas.DataFrame
+ Yield curve data resampled to a monthly frequency by averaging
+ the observations
+
+ t_m: int
+ Number of observations in the monthly time series dimension
+
+ t_d: int
+ Number of observations in the daily time series dimension
+
+ n: int
+ Number of observations in the cross-sectional dimension, the number of
+ maturities available
+
+ rx_m: pd.DataFrame
+ Excess returns in monthly frequency
+
+ pc_factors_m: pandas.DataFrame
+ Principal components in monthly frequency
+
+ pc_loadings_m: pandas.DataFrame
+ Factor loadings of the monthly PCs
+
+ pc_explained_m: pandas.Series
+ Percent of total variance explained by each monthly principal component
+
+ pc_factors_d: pandas.DataFrame
+ Principal components in daily frequency
+
+ mu, phi, Sigma, v: numpy.array
+ Estimates of the VAR(1) parameters, the first stage of estimation.
+ The names are the same as in the original paper
+
+ beta: numpy.array
+ Estimates of the risk premium equation, the second stage of estimation.
+ The name is the same as in the original paper
+
+ lambda0, lambda1: numpy.array
+ Estimates of the price of risk parameters, the third stage of
+ estimation.
+
+ delta0, delta1: numpy.array
+ Estimates of the short rate equation coefficients.
+
+ A, B: numpy.array
+ Affine coefficients for the fitted yields of different maturities
+
+ Arn, Brn: numpy.array
+ Affine coefficients for the risk-neutral yields of different maturities
+
+ miy: pandas.DataFrame
+ Model implied / fitted yields
+
+ rny: pandas.DataFrame
+ Risk-neutral yields
+
+ tp: pandas.DataFrame
+ Term premium estimates
+
+ er_loadings: pandas.DataFrame
+ Loadings of the expected returns on the principal components
+
+ er_hist: pandas.DataFrame
+ Historical estimates of expected returns, computed in-sample.
+ """
+
+ def __init__(
+ self,
+ curve,
+ curve_m=None,
+ n_factors=5,
+ selected_maturities=None,
+ ):
+ """
+ Runs the baseline version of the ACM term premium model. Works for data
+ with monthly frequency or higher.
+
+ Parameters
+ ----------
+ curve : pandas.DataFrame
+ Annualized log-yields. Maturities (columns) must start at month 1
+ and be equally spaced in monthly frequency. Column labels must be
+ integers from 1 to n. Observations (index) must be a pandas
+ DatetimeIndex with daily frequency.
+
+ curve_m: pandas.DataFrame
+ Annualized log-yields in monthly frequency to be used for the
+ parameter estimates. This is here in case the user wants to use a
+ different curve for the parameter estimation. If None is passed,
+ the input `curve` is resampled to monthly frequency. If something
+ is passed, maturities (columns) must start at month 1 and be
+ equally spaced in monthly frequency. Column labels must be
+ integers from 1 to n. Observations (index) must be a pandas
+ DatetimeIndex with monthly frequency.
+
+ n_factors : int
+ number of principal components to use as state variables.
+
+ selected_maturities: list of int
+ the maturities to be considered in the parameter estimation steps.
+ If None is passed, all the maturities are considered. The user may
+ choose a smaller set of yields to consider due to, for example,
+ liquidity and representativeness of certain maturities.
+ """
+
+ self._assertions(curve, curve_m, selected_maturities)
+
+ self.n_factors = n_factors
+ self.curve = curve
+
+ if selected_maturities is None:
+ self.selected_maturities = curve.columns
+ else:
+ self.selected_maturities = selected_maturities
+
+ if curve_m is None:
+ self.curve_monthly = curve.resample('M').mean()
+ else:
+ self.curve_monthly = curve_m
+
+ self.t_d = self.curve.shape[0]
+ self.t_m = self.curve_monthly.shape[0] - 1
+ self.n = self.curve.shape[1]
+ self.pc_factors_m, self.pc_factors_d, self.pc_loadings_m, self.pc_explained_m = self._get_pcs(self.curve_monthly, self.curve)
+
+ self.rx_m = self._get_excess_returns()
+
+ # ===== ACM Three-Step Regression =====
+ # 1st Step - Factor VAR
+ self.mu, self.phi, self.Sigma, self.v, self.s0 = self._estimate_var()
+
+ # 2nd Step - Excess Returns
+ self.beta, self.omega, self.beta_star = self._excess_return_regression()
+
+ # 3rd Step - Convexity-adjusted price of risk
+ self.lambda0, self.lambda1, self.mu_star, self.phi_star = self._retrieve_lambda()
+
+ # Short Rate Equation
+ self.delta0, self.delta1 = self._short_rate_equation(
+ r1=self.curve_monthly.iloc[:, 0],
+ X=self.pc_factors_m,
+ )
+
+ # Affine Yield Coefficients
+ self.A, self.B = self._affine_coefficients(
+ lambda0=self.lambda0,
+ lambda1=self.lambda1,
+ )
+
+ # Risk-Neutral Coefficients
+ self.Arn, self.Brn = self._affine_coefficients(
+ lambda0=np.zeros(self.lambda0.shape),
+ lambda1=np.zeros(self.lambda1.shape),
+ )
+
+ # Model Implied Yield
+ self.miy = self._compute_yields(self.A, self.B)
+
+ # Risk Neutral Yield
+ self.rny = self._compute_yields(self.Arn, self.Brn)
+
+ # Term Premium
+ self.tp = self.miy - self.rny
+
+ # Expected Return
+ self.er_loadings, self.er_hist = self._expected_return()
+
+ def fwd_curve(self, date=None):
+ """
+ Compute the forward curves for a given date.
+
+ Parameters
+ ----------
+ date : date-like
+ date in any format that can be interpreted by pandas.to_datetime()
+ """
+
+ if date is None:
+ date = self.curve.index[-1]
+
+ date = pd.to_datetime(date)
+ fwd_mkt = self._compute_fwd_curve(self.curve.loc[date])
+ fwd_miy = self._compute_fwd_curve(self.miy.loc[date])
+ fwd_rny = self._compute_fwd_curve(self.rny.loc[date])
+ df = pd.concat(
+ [
+ fwd_mkt.rename("Observed"),
+ fwd_miy.rename("Fitted"),
+ fwd_rny.rename("Risk-Neutral"),
+ ],
+ axis=1,
+ )
+ return df
+
+ @staticmethod
+ def _compute_fwd_curve(curve):
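+ # Build compounding factors (1 + y)^(n/12) for each monthly maturity n;
+ # the ratio of consecutive factors is a one-month forward factor, which
+ # is annualized below by raising it to the 12th power.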
+ aux_curve = curve.reset_index(drop=True)
+ aux_curve.index = aux_curve.index + 1
+ factor = (1 + aux_curve) ** (aux_curve.index / 12)
+ fwd_factor = factor / factor.shift(1).fillna(1)
+ fwds = (fwd_factor ** 12) - 1
+ fwds = pd.Series(fwds.values, index=curve.index)
+ return fwds
+
+ @staticmethod
+ def _assertions(curve, curve_m, selected_maturities):
+ # Selected maturities are available
+ if selected_maturities is not None:
+ assert all([col in curve.columns for col in selected_maturities]), \
+ "not all `selected_maturities` are available in `curve`"
+
+ # Consecutive monthly maturities
+ cond1 = curve.columns[0] != 1
+ cond2 = not all(np.diff(curve.columns.values) == 1)
+ if cond1 or cond2:
+ msg = "`curve` columns must be consecutive integers starting from 1"
+ raise AssertionError(msg)
+
+ # Only if `curve_m` is passed
+ if curve_m is not None:
+
+ # Same columns
+ assert curve_m.columns.equals(curve.columns), \
+ "columns of `curve` and `curve_m` must be the same"
+
+ # Monthly frequency
+ assert pd.infer_freq(curve_m.index) == 'M', \
+ "`curve_m` must have a DatetimeIndex with monthly frequency"
+
+ def _get_excess_returns(self):
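+ # Log prices are p_t^(n) = -(n/12) * y_t^(n). The one-month log excess
+ # return is rx_{t+1}^(n) = p_{t+1}^(n-1) - p_t^(n) - r_t, with r_t the
+ # one-month log yield; the final column shift re-labels returns by the
+ # maturity held, and the one-month bond gets zero excess return.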
+ ttm = np.arange(1, self.n + 1) / 12
+ log_prices = - self.curve_monthly * ttm
+ rf = - log_prices.iloc[:, 0].shift(1)
+ rx = (log_prices - log_prices.shift(1, axis=0).shift(-1, axis=1)).subtract(rf, axis=0)
+ rx = rx.shift(1, axis=1)
+
+ rx = rx.dropna(how='all', axis=0)
+ rx[1] = 0
+ return rx
+
+ def _get_pcs(self, curve_m, curve_d):
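+ # PCs are estimated on the demeaned monthly curve; the same means and
+ # loadings are then applied to the daily curve, so the daily factors
+ # are consistent with the monthly parameter estimates.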
+
+ # The authors' code shows that they ignore the first 2 maturities for
+ # the PC estimation.
+ curve_m_cut = curve_m.iloc[:, 2:]
+ curve_d_cut = curve_d.iloc[:, 2:]
+
+ mean_yields = curve_m_cut.mean()
+ curve_m_cut = curve_m_cut - mean_yields
+ curve_d_cut = curve_d_cut - mean_yields
+
+ pca = PCA(n_components=self.n_factors)
+ pca.fit(curve_m_cut)
+ col_names = [f'PC {i + 1}' for i in range(self.n_factors)]
+ df_loadings = pd.DataFrame(
+ data=pca.components_.T,
+ columns=col_names,
+ index=curve_m_cut.columns,
+ )
+
+ df_pc_m = curve_m_cut @ df_loadings
+ sigma_factor = df_pc_m.std()
+ df_pc_m = df_pc_m / df_pc_m.std()
+ df_loadings = df_loadings / sigma_factor
+
+ # Enforce average positive loadings
+ sign_changes = np.sign(df_loadings.mean())
+ df_loadings = sign_changes * df_loadings
+ df_pc_m = sign_changes * df_pc_m
+
+ # Daily frequency
+ df_pc_d = curve_d_cut @ df_loadings
+
+ # Percent Explained
+ df_explained = pd.Series(
+ data=pca.explained_variance_ratio_,
+ name='Explained Variance',
+ index=col_names,
+ )
+
+ return df_pc_m, df_pc_d, df_loadings, df_explained
+
+ def _estimate_var(self):
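+ # Step 1: VAR(1) on the pricing factors, X_{t+1} = mu + phi X_t + v_{t+1},
+ # estimated by OLS through the pseudo-inverse. The constant is then
+ # forced to zero (the PCs are standardized), and Sigma is the
+ # innovation covariance.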
+ X = self.pc_factors_m.copy().T
+ X_lhs = X.values[:, 1:]  # X_t+1. Left hand side of VAR
+ X_rhs = np.vstack((np.ones((1, self.t_m)), X.values[:, 0:-1]))  # X_t and a constant.
+
+ var_coeffs = (X_lhs @ np.linalg.pinv(X_rhs))
+
+ phi = var_coeffs[:, 1:]
+
+ # Leave the estimated constant
+ # mu = var_coeffs[:, [0]]
+
+ # Force constant to zero
+ mu = np.zeros((self.n_factors, 1))
+ var_coeffs[:, [0]] = 0
+
+ # Residuals
+ v = X_lhs - var_coeffs @ X_rhs
+ Sigma = v @ v.T / (self.t_m - 1)
+
+ s0 = np.cov(v).reshape((-1, 1))
+
+ return mu, phi, Sigma, v, s0
+
+ def _excess_return_regression(self):
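+ # Step 2: regress excess returns on a constant, the lagged factors and
+ # the VAR innovations, rx_{t+1} = a + c X_t + beta' v_{t+1} + e. beta
+ # captures the exposures to the innovations; omega is the pooled
+ # residual variance times the identity.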
+
+ if self.selected_maturities is not None:
+ rx = self.rx_m[self.selected_maturities].values
+ else:
+ rx = self.rx_m.values
+
+ X = self.pc_factors_m.copy().T.values[:, :-1]
+ Z = np.vstack((np.ones((1, self.t_m)), X, self.v)).T  # Lagged X and Innovations
+ abc = inv(Z.T @ Z) @ (Z.T @ rx)
+ E = rx - Z @ abc
+ omega = np.var(E.reshape(-1, 1)) * np.eye(len(self.selected_maturities))
+
+ abc = abc.T
+ beta = abc[:, -self.n_factors:]
+
+ beta_star = np.zeros((len(self.selected_maturities), self.n_factors**2))
+
+ for i in range(len(self.selected_maturities)):
+ beta_star[i, :] = np.kron(beta[i, :], beta[i, :]).T
+
+ return beta, omega, beta_star
+
+ def _retrieve_lambda(self):
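+ # Step 3: convexity-adjust the expected excess returns (using beta_star,
+ # s0 and omega) and project them on beta to recover the price-of-risk
+ # parameters lambda0 (constant) and lambda1 (loadings on the factors).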
369
+ rx = self.rx_m[self.selected_maturities]
370
+ factors = np.hstack([np.ones((self.t_m, 1)), self.pc_factors_m.iloc[:-1].values])
371
+
372
+ # Orthogonalize factors with respect to v
373
+ v_proj = self.v.T @ np.linalg.pinv(self.v @ self.v.T) @ self.v
374
+ factors = factors - v_proj @ factors
375
+
376
+ adjustment = self.beta_star @ self.s0 + np.diag(self.omega).reshape(-1, 1)
377
+ rx_adjusted = rx.values + (1 / 2) * np.tile(adjustment, (1, self.t_m)).T
378
+ Y = (inv(factors.T @ factors) @ factors.T @ rx_adjusted).T
379
+
380
+ # Compute Lambda
381
+ X = self.beta
382
+ Lambda = inv(X.T @ X) @ X.T @ Y
383
+ lambda0 = Lambda[:, 0]
384
+ lambda1 = Lambda[:, 1:]
385
+
386
+ muStar = self.mu.reshape(-1) - lambda0
387
+ phiStar = self.phi - lambda1
388
+
389
+ return lambda0, lambda1, muStar, phiStar
390
+
391
+ @staticmethod
392
+ def _short_rate_equation(r1, X):
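+ # OLS of the one-month rate (converted to a monthly log yield) on the
+ # factors: r_t = delta0 + delta1' X_t.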
+ r1 = r1 / 12
+ X = add_constant(X)
+ Delta = inv(X.T @ X) @ X.T @ r1
+ delta0 = Delta.iloc[0]
+ delta1 = Delta.iloc[1:].values
+ return delta0, delta1
+
+ def _affine_coefficients(self, lambda0, lambda1):
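+ # Bond-pricing recursions: starting from the one-month bond,
+ # A_n = A_{n-1} + B_{n-1}'(mu - lambda0) + 0.5*(B_{n-1} (x) B_{n-1}) s0
+ # + 0.5*omega + A_1 and B_n = B_{n-1}(phi - lambda1) + B_1, where (x)
+ # is the Kronecker product. Passing lambda0 = lambda1 = 0 gives the
+ # risk-neutral coefficients.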
+ lambda0 = lambda0.reshape(-1, 1)
+
+ A = np.zeros(self.n)
+ B = np.zeros((self.n, self.n_factors))
+
+ A[0] = - self.delta0
+ B[0, :] = - self.delta1
+
+ for n in range(1, self.n):
+ Bpb = np.kron(B[n - 1, :], B[n - 1, :])
+ s0term = 0.5 * (Bpb @ self.s0 + self.omega[0, 0])
+
+ A[n] = A[n - 1] + B[n - 1, :] @ (self.mu - lambda0) + s0term + A[0]
+ B[n, :] = B[n - 1, :] @ (self.phi - lambda1) + B[0, :]
+
+ return A, B
+
+ def _compute_yields(self, A, B):
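+ # Convert the affine log-price coefficients into annualized yields,
+ # y_t^(n) = -(A_n + B_n' X_t) / (n / 12), applied to the daily factors.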
+ A = A.reshape(-1, 1)
+ multiplier = np.tile(self.curve.columns / 12, (self.t_d, 1)).T
+ yields = (- ((np.tile(A, (1, self.t_d)) + B @ self.pc_factors_d.T) / multiplier).T).values
+ yields = pd.DataFrame(
+ data=yields,
+ index=self.curve.index,
+ columns=self.curve.columns,
+ )
+ return yields
+
+ def _expected_return(self):
+ """
+ Compute the "expected return" and "convexity adjustment" terms to get
+ the expected return loadings and the historical estimate.
+
+ Loadings are interpreted as the effect of 1 standard deviation of the
+ PCs on the expected returns
+ """
+ stds = self.pc_factors_m.std().values[:, None].T
+ er_loadings = (self.B @ self.lambda1) * stds
+ er_loadings = pd.DataFrame(
+ data=er_loadings,
+ columns=self.pc_factors_m.columns,
+ index=range(1, self.n + 1),
+ )
+
+ # Historical estimate
+ exp_ret = (self.B @ (self.lambda1 @ self.pc_factors_d.T + self.lambda0.reshape(-1, 1))).values
+ conv_adj = np.diag(self.B @ self.Sigma @ self.B.T) + self.omega[0, 0]
+ er_hist = (exp_ret + conv_adj[:, None]).T
+ er_hist_d = pd.DataFrame(
+ data=er_hist,
+ index=self.pc_factors_d.index,
+ columns=self.curve.columns,
+ )
+ return er_loadings, er_hist_d
@@ -1,6 +1,6 @@
- Metadata-Version: 2.1
+ Metadata-Version: 2.2
  Name: pyacm
- Version: 0.4
+ Version: 1.0
  Summary: ACM Term Premium
  Author: Tobias Adrian, Richard K. Crump, Emanuel Moench
  Maintainer: Gustavo Amarante
@@ -12,11 +12,19 @@ Requires-Dist: matplotlib
  Requires-Dist: numpy
  Requires-Dist: pandas
  Requires-Dist: scikit-learn
- Requires-Dist: tqdm
+ Requires-Dist: statsmodels
+ Dynamic: author
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: keywords
+ Dynamic: maintainer
+ Dynamic: maintainer-email
+ Dynamic: requires-dist
+ Dynamic: summary


  [paper_website]: https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr340.pdf
- [inference_atribute]: https://github.com/gusamarante/pyacm/blob/ba641c14e450fc83d22db4ef5e60eadbd489b351/pyacm/acm.py#L203
+

  # pyacm
  Implementation of ["Pricing the Term Structure with Linear Regressions" from
@@ -35,7 +43,6 @@ carries all the relevant variables as attributes:
  - Term premium
  - Historical in-sample expected returns
  - Expected return loadings
- - Hypothesis testing (not sure if correct; more info in the observations below)


  # Installation
@@ -43,6 +50,7 @@ carries all the relevant variables as attributes:
  pip install pyacm
  ```

+
  # Usage
  ```python
  from pyacm import NominalACM
@@ -59,13 +67,9 @@ The tricky part of using this model is getting the correct data format. The
  - Maturities (columns) must be equally spaced in **monthly** frequency and start
  at month 1. This means that you need to construct a bootstrapped curve for every
  date and interpolate it at fixed monthly maturities
- - Whichever maturity you want to be the longest, your input data should have one
- extra column. For example, if you want the term premium estimate up to the 10-year
- yield (120 months), your input data should include maturities up to 121 months.
- This is needed to properly compute the returns.

- # Examples

+ # Examples
  The estimates for the US are available on the [NY FED website](https://www.newyorkfed.org/research/data_indicators/term-premia-tabs#/overview).

  The jupyter notebook [`example_br`](https://github.com/gusamarante/pyacm/blob/main/example_br.ipynb)
@@ -82,14 +86,5 @@ contains an example application to the Brazilian DI futures curve that showcases
  > FRB of New York Staff Report No. 340,
  > Available at SSRN: https://ssrn.com/abstract=1362586 or http://dx.doi.org/10.2139/ssrn.1362586

- The version of the article that was published by the NY FED is not 100% explicit about how the data is manipulated,
- but I found an earlier version of the paper on SSRN where the authors go deeper into the details of how everything is estimated:
- - Data for zero yields uses monthly maturities starting from month 1
- - All principal components and model parameters are estimated with data resampled to a monthly frequency, averaging observations in each month
- - To get daily / real-time estimates, the factor loadings estimated from the monthly frequency are used to transform the daily data
-
-
- # Observations
- I am not completely sure that the computations in the [inference attributes][inference_atribute]
- are correct. If you find any mistakes, please open a pull request following the contributing
- guidelines.
+ I would like to thank Emanuel Moench for sending me his original MATLAB code,
+ which allowed me to replicate these results exactly.
@@ -3,7 +3,6 @@ README.md
  setup.py
  pyacm/__init__.py
  pyacm/acm.py
- pyacm/utils.py
  pyacm.egg-info/PKG-INFO
  pyacm.egg-info/SOURCES.txt
  pyacm.egg-info/dependency_links.txt
@@ -2,4 +2,4 @@ matplotlib
  numpy
  pandas
  scikit-learn
- tqdm
+ statsmodels
@@ -7,7 +7,7 @@ here = os.path.abspath(os.path.dirname(__file__))
  with codecs.open(os.path.join(here, "README.md"), encoding="utf-8") as fh:
  long_description = "\n" + fh.read()

- VERSION = '0.4'
+ VERSION = '1.0'
  DESCRIPTION = 'ACM Term Premium'

  # Setting up
@@ -26,7 +26,7 @@ setup(
  'numpy',
  'pandas',
  'scikit-learn',
- 'tqdm',
+ 'statsmodels',
  ],
  keywords=[
  'asset pricing',
pyacm-0.4/pyacm/acm.py DELETED
@@ -1,383 +0,0 @@
- import numpy as np
- import pandas as pd
-
- from numpy.linalg import inv
- from sklearn.decomposition import PCA
-
- from pyacm.utils import vec, vec_quad_form, commutation_matrix
-
-
- class NominalACM:
- """
- This class implements the model from the article:
-
- Adrian, Tobias, Richard K. Crump, and Emanuel Moench. “Pricing the
- Term Structure with Linear Regressions.” SSRN Electronic Journal,
- 2012. https://doi.org/10.2139/ssrn.1362586.
-
- It handles data transformation, estimates parameters and generates the
- relevant outputs. The version of the article that was published by the NY
- FED is not 100% explicit about how the data is manipulated, but I found
- an earlier version of the paper on SSRN where the authors go deeper into
- the details of how everything is estimated:
- - Data for zero yields uses monthly maturities starting from month 1
- - All principal components and model parameters are estimated with data
- resampled to a monthly frequency, averaging observations in each
- month.
- - To get daily / real-time estimates, the factor loadings estimated
- from the monthly frequency are used to transform the daily data.
-
- Attributes
- ----------
- n_factors: int
- number of principal components used
-
- curve: pandas.DataFrame
- Raw data of the yield curve
-
- curve_monthly: pandas.DataFrame
- Yield curve data resampled to a monthly frequency by averaging
- the observations
-
- t: int
- Number of observations in the time series dimension
-
- n: int
- Number of observations in the cross-sectional dimension. Same
- as the number of maturities available after returns are computed
-
- rx_m: pd.DataFrame
- Excess returns in monthly frequency
-
- rf_m: pandas.Series
- Risk-free rate in monthly frequency
-
- rf_d: pandas.Series
- Risk-free rate in daily frequency
-
- pc_factors_m: pandas.DataFrame
- Principal components in monthly frequency
-
- pc_loadings_m: pandas.DataFrame
- Factor loadings of the monthly PCs
-
- pc_explained_m: pandas.Series
- Percent of total variance explained by each monthly principal component
-
- pc_factors_d: pandas.DataFrame
- Principal components in daily frequency
-
- pc_loadings_d: pandas.DataFrame
- Factor loadings of the daily PCs
-
- pc_explained_d: pandas.Series
- Percent of total variance explained by each daily principal component
-
- mu, phi, Sigma, v: numpy.array
- Estimates of the VAR(1) parameters, the first stage of estimation.
- The names are the same as in the original paper
-
- a, beta, c, sigma2: numpy.array
- Estimates of the risk premium equation, the second stage of estimation.
- The names are the same as in the original paper
-
- lambda0, lambda1: numpy.array
- Estimates of the price of risk parameters, the third stage of estimation.
- The names are the same as in the original paper
-
- miy: pandas.DataFrame
- Model implied / fitted yields
-
- rny: pandas.DataFrame
- Risk neutral yields
-
- tp: pandas.DataFrame
- Term premium estimates
-
- er_loadings: pandas.DataFrame
- Loadings of the expected returns on the principal components
-
- er_hist_m: pandas.DataFrame
- Historical estimates of expected returns, computed in-sample, in monthly frequency
-
- er_hist_d: pandas.DataFrame
- Historical estimates of expected returns, computed in-sample, in daily frequency
-
- z_lambda: pandas.DataFrame
- Z-stat for inference on the price of risk parameters
-
- z_beta: pandas.DataFrame
- Z-stat for inference on the loadings of expected returns
- """
-
- def __init__(self, curve, n_factors=5):
- """
- Runs the baseline version of the ACM term premium model. Works for data
- with monthly frequency or higher.
-
- Parameters
- ----------
- curve : pandas.DataFrame
- Annualized log-yields. Maturities (columns) must start at month 1
- and be equally spaced in monthly frequency. The labels of the
- columns do not matter, they will be kept the same. Observations
- (index) must be of monthly frequency or higher. The index must be
- a pandas.DatetimeIndex.
-
- n_factors : int
- number of principal components to use as state variables.
- """
-
- self.n_factors = n_factors
- self.curve = curve
- self.curve_monthly = curve.resample('M').mean()
- self.t = self.curve_monthly.shape[0] - 1
- self.n = self.curve_monthly.shape[1]
- self.rx_m, self.rf_m = self._get_excess_returns()
- self.rf_d = self.curve.iloc[:, 0] * (1 / 12)
- self.pc_factors_m, self.pc_loadings_m, self.pc_explained_m = self._get_pcs(self.curve_monthly)
- self.pc_factors_d, self.pc_loadings_d, self.pc_explained_d = self._get_pcs(self.curve)
- self.mu, self.phi, self.Sigma, self.v = self._estimate_var()
- self.a, self.beta, self.c, self.sigma2 = self._excess_return_regression()
- self.lambda0, self.lambda1 = self._retrieve_lambda()
-
- if self.curve.index.freqstr == 'M':
- X = self.pc_factors_m
- r1 = self.rf_m
- else:
- X = self.pc_factors_d
- r1 = self.rf_d
-
- self.miy = self._affine_recursions(self.lambda0, self.lambda1, X, r1)
- self.rny = self._affine_recursions(0, 0, X, r1)
- self.tp = self.miy - self.rny
- self.er_loadings, self.er_hist_m, self.er_hist_d = self._expected_return()
- self.z_lambda, self.z_beta = self._inference()
-
- def fwd_curve(self, date=None):
- """
- Compute the forward curves for a given date.
-
- Parameters
- ----------
- date : date-like
- date in any format that can be interpreted by pandas.to_datetime()
- """
-
- if date is None:
- date = self.curve.index[-1]
-
- date = pd.to_datetime(date)
- fwd_mkt = self._compute_fwd_curve(self.curve.loc[date])
- fwd_miy = self._compute_fwd_curve(self.miy.loc[date])
- fwd_rny = self._compute_fwd_curve(self.rny.loc[date])
- df = pd.concat(
- [
- fwd_mkt.rename("Observed"),
- fwd_miy.rename("Model Implied"),
- fwd_rny.rename("Risk-Neutral"),
- ],
- axis=1,
- )
- return df
-
- @staticmethod
- def _compute_fwd_curve(curve):
- aux_curve = curve.reset_index(drop=True)
- aux_curve.index = aux_curve.index + 1
- factor = (1 + aux_curve) ** (aux_curve.index / 12)
- fwd_factor = factor / factor.shift(1).fillna(1)
- fwds = (fwd_factor ** 12) - 1
- fwds = pd.Series(fwds.values, index=curve.index)
- return fwds
-
- def _get_excess_returns(self):
- ttm = np.arange(1, self.n + 1) / 12
- log_prices = - self.curve_monthly * ttm
- rf = - log_prices.iloc[:, 0].shift(1)
- rx = (log_prices - log_prices.shift(1, axis=0).shift(-1, axis=1)).subtract(rf, axis=0)
- rx = rx.dropna(how='all', axis=0).dropna(how='all', axis=1)
- return rx, rf.dropna()
-
- def _get_pcs(self, curve):
- pca = PCA(n_components=self.n_factors)
- pca.fit(curve)
- col_names = [f'PC {i + 1}' for i in range(self.n_factors)]
- df_loadings = pd.DataFrame(data=pca.components_.T,
- columns=col_names,
- index=curve.columns)
-
- # Normalize the direction of the eigenvectors
- signal = np.sign(df_loadings.iloc[-1])
- df_loadings = df_loadings * signal
- df_pc = (curve - curve.mean()) @ df_loadings
-
- # Percent Explained
- df_explained = pd.Series(data=pca.explained_variance_ratio_,
- name='Explained Variance',
- index=col_names)
-
- return df_pc, df_loadings, df_explained
-
- def _estimate_var(self):
- X = self.pc_factors_m.copy().T
- X_lhs = X.values[:, 1:]  # X_t+1. Left hand side of VAR
- X_rhs = np.vstack((np.ones((1, self.t)), X.values[:, 0:-1]))  # X_t and a constant.
-
- var_coeffs = (X_lhs @ np.linalg.pinv(X_rhs))
- mu = var_coeffs[:, [0]]
- phi = var_coeffs[:, 1:]
-
- v = X_lhs - var_coeffs @ X_rhs
- Sigma = v @ v.T / self.t
-
- return mu, phi, Sigma, v
-
- def _excess_return_regression(self):
- X = self.pc_factors_m.copy().T.values[:, :-1]
- Z = np.vstack((np.ones((1, self.t)), self.v, X))  # Innovations and lagged X
- abc = self.rx_m.values.T @ np.linalg.pinv(Z)
- E = self.rx_m.values.T - abc @ Z
- sigma2 = np.trace(E @ E.T) / (self.n * self.t)
-
- a = abc[:, [0]]
- beta = abc[:, 1:self.n_factors + 1].T
- c = abc[:, self.n_factors + 1:]
-
- return a, beta, c, sigma2
-
- def _retrieve_lambda(self):
- BStar = np.squeeze(np.apply_along_axis(vec_quad_form, 1, self.beta.T))
- lambda1 = np.linalg.pinv(self.beta.T) @ self.c
- lambda0 = np.linalg.pinv(self.beta.T) @ (self.a + 0.5 * (BStar @ vec(self.Sigma) + self.sigma2))
- return lambda0, lambda1
-
- def _affine_recursions(self, lambda0, lambda1, X_in, r1):
- X = X_in.T.values[:, 1:]
- r1 = vec(r1.values)[-X.shape[1]:, :]
-
- A = np.zeros((1, self.n))
- B = np.zeros((self.n_factors, self.n))
-
- delta = r1.T @ np.linalg.pinv(np.vstack((np.ones((1, X.shape[1])), X)))
- delta0 = delta[[0], [0]]
- delta1 = delta[[0], 1:]
-
- A[0, 0] = - delta0
- B[:, 0] = - delta1
-
- for i in range(self.n - 1):
- A[0, i + 1] = A[0, i] + B[:, i].T @ (self.mu - lambda0) + 1 / 2 * (B[:, i].T @ self.Sigma @ B[:, i] + 0 * self.sigma2) - delta0
- B[:, i + 1] = B[:, i] @ (self.phi - lambda1) - delta1
-
- # Construct fitted yields
- ttm = np.arange(1, self.n + 1) / 12
- fitted_log_prices = (A.T + B.T @ X).T
- fitted_yields = - fitted_log_prices / ttm
- fitted_yields = pd.DataFrame(
- data=fitted_yields,
- index=self.curve.index[1:],
- columns=self.curve.columns,
- )
- return fitted_yields
-
- def _expected_return(self):
- """
- Compute the "expected return" and "convexity adjustment" terms to get
- the expected return loadings and the historical estimate.
-
- Loadings are interpreted as the effect of 1 standard deviation of the
- PCs on the expected returns
- """
- stds = self.pc_factors_m.std().values[:, None].T
- er_loadings = (self.beta.T @ self.lambda1) * stds
- er_loadings = pd.DataFrame(
- data=er_loadings,
- columns=self.pc_factors_m.columns,
- index=self.curve.columns[:-1],
- )
-
- # Monthly
- exp_ret = (self.beta.T @ (self.lambda1 @ self.pc_factors_m.T + self.lambda0)).values
- conv_adj = np.diag(self.beta.T @ self.Sigma @ self.beta) + self.sigma2
- er_hist = (exp_ret + conv_adj[:, None]).T
- er_hist_m = pd.DataFrame(
- data=er_hist,
- index=self.pc_factors_m.index,
- columns=self.curve.columns[:er_hist.shape[1]]
- )
-
- # Higher frequency
- exp_ret = (self.beta.T @ (self.lambda1 @ self.pc_factors_d.T + self.lambda0)).values
- conv_adj = np.diag(self.beta.T @ self.Sigma @ self.beta) + self.sigma2
- er_hist = (exp_ret + conv_adj[:, None]).T
- er_hist_d = pd.DataFrame(
- data=er_hist,
- index=self.pc_factors_d.index,
- columns=self.curve.columns[:er_hist.shape[1]]
- )
-
- return er_loadings, er_hist_m, er_hist_d
-
- def _inference(self):
- # TODO I AM NOT SURE THAT THIS SECTION IS CORRECT
-
- # Auxiliary matrices
- Z = self.pc_factors_m.copy().T
- Z = Z.values[:, 1:]
- Z = np.vstack((np.ones((1, self.t)), Z))
-
- Lamb = np.hstack((self.lambda0, self.lambda1))
-
- rho1 = np.zeros((self.n_factors + 1, 1))
- rho1[0, 0] = 1
-
- A_beta = np.zeros((self.n_factors * self.beta.shape[1], self.beta.shape[1]))
-
- for ii in range(self.beta.shape[1]):
- A_beta[ii * self.beta.shape[0]:(ii + 1) * self.beta.shape[0], ii] = self.beta[:, ii]
-
- BStar = np.squeeze(np.apply_along_axis(vec_quad_form, 1, self.beta.T))
-
- comm_kk = commutation_matrix(shape=(self.n_factors, self.n_factors))
- comm_kn = commutation_matrix(shape=(self.n_factors, self.beta.shape[1]))
-
- # Asymptotic variance of the betas
- v_beta = self.sigma2 * np.kron(np.eye(self.beta.shape[1]), inv(self.Sigma))
-
- # Asymptotic variance of the lambdas
- upsilon_zz = (1 / self.t) * Z @ Z.T
- v1 = np.kron(inv(upsilon_zz), self.Sigma)
- v2 = self.sigma2 * np.kron(inv(upsilon_zz), inv(self.beta @ self.beta.T))
- v3 = self.sigma2 * np.kron(Lamb.T @ self.Sigma @ Lamb, inv(self.beta @ self.beta.T))
-
- v4_sim = inv(self.beta @ self.beta.T) @ self.beta @ A_beta.T
- v4_mid = np.kron(np.eye(self.beta.shape[1]), self.Sigma)
- v4 = self.sigma2 * np.kron(rho1 @ rho1.T, v4_sim @ v4_mid @ v4_sim.T)
-
- v5_sim = inv(self.beta @ self.beta.T) @ self.beta @ BStar
- v5_mid = (np.eye(self.n_factors ** 2) + comm_kk) @ np.kron(self.Sigma, self.Sigma)
- v5 = 0.25 * np.kron(rho1 @ rho1.T, v5_sim @ v5_mid @ v5_sim.T)
-
- v6_sim = inv(self.beta @ self.beta.T) @ self.beta @ np.ones((self.beta.shape[1], 1))
- v6 = 0.5 * (self.sigma2 ** 2) * np.kron(rho1 @ rho1.T, v6_sim @ v6_sim.T)
-
- v_lambda_tau = v1 + v2 + v3 + v4 + v5 + v6
-
- c_lambda_tau_1 = np.kron(Lamb.T, inv(self.beta @ self.beta.T) @ self.beta)
- c_lambda_tau_2 = np.kron(rho1, inv(self.beta @ self.beta.T) @ self.beta @ A_beta.T @ np.kron(np.eye(self.beta.shape[1]), self.Sigma))
- c_lambda_tau = - c_lambda_tau_1 @ comm_kn @ v_beta @ c_lambda_tau_2.T
-
- v_lambda = v_lambda_tau + c_lambda_tau + c_lambda_tau.T
-
- # Extract the z-tests
- sd_lambda = np.sqrt(np.diag(v_lambda).reshape(Lamb.shape, order='F'))
- sd_beta = np.sqrt(np.diag(v_beta).reshape(self.beta.shape, order='F'))
-
- z_beta = pd.DataFrame(self.beta / sd_beta, index=self.pc_factors_m.columns, columns=self.curve.columns[:-1]).T
- z_lambda = pd.DataFrame(Lamb / sd_lambda, index=self.pc_factors_m.columns, columns=[f"lambda {i}" for i in range(Lamb.shape[1])])
-
- return z_lambda, z_beta
pyacm-0.4/pyacm/utils.py DELETED
@@ -1,43 +0,0 @@
- import numpy as np
-
-
- def vec(mat):
- """
- Stack the columns of `mat` into a column vector. If `mat` is an M x N
- matrix, then vec(mat) is an MN x 1 vector.
-
- Parameters
- ----------
- mat: numpy.array
- """
- vec_mat = mat.reshape((-1, 1), order='F')
- return vec_mat
-
-
- def vec_quad_form(mat):
- """
- `vec` operation for quadratic forms
-
- Parameters
- ----------
- mat: numpy.array
- """
- return vec(np.outer(mat, mat))
-
-
- def commutation_matrix(shape):
- """
- Generates the commutation matrix for a matrix with shape equal to `shape`.
-
- The definition of a commutation matrix `k` is:
- k @ vec(mat) = vec(mat.T)
-
- Parameters
- ----------
- shape : tuple
- 2-d tuple (m, n) with the shape of `mat`
- """
- m, n = shape
- w = np.arange(m * n).reshape((m, n), order="F").T.ravel(order="F")
- k = np.eye(m * n)[w, :]
- return k