pyhrp 0.0.0__tar.gz

@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2020 Thomas Schmelzer
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
pyhrp-0.0.0/PKG-INFO ADDED
@@ -0,0 +1,65 @@
+ Metadata-Version: 2.1
+ Name: pyhrp
+ Version: 0.0.0
+ Summary: ...
+ Home-page: https://github.com/tschm/pyhrp
+ Author: Thomas Schmelzer
+ Requires-Python: >=3.9.0
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Dist: matplotlib (>=3.3.3)
+ Requires-Dist: pandas (>=1.2.0)
+ Requires-Dist: scikit-learn (>=0.24.1)
+ Requires-Dist: scipy (>=1.6.0)
+ Project-URL: Repository, https://github.com/tschm/pyhrp
+ Description-Content-Type: text/markdown
+
+ # pyhrp
+
+ [![DeepSource](https://deepsource.io/gh/tschm/hrp.svg/?label=active+issues&show_trend=true&token=qjT_aLQgo_1Xbe2Z9ZNdH3Cx)](https://deepsource.io/gh/tschm/hrp/?ref=repository-badge)
+
+ A recursive implementation of the Hierarchical Risk Parity (HRP) approach by Marcos Lopez de Prado.
+ We rely heavily on the scipy.cluster.hierarchy package.
+
+ Here's a simple example:
+
+ ```python
+ import pandas as pd
+ from pyhrp.hrp import dist, linkage, tree, build_cluster
+
+ prices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)
+
+ returns = prices.pct_change().dropna(axis=0, how="all")
+ cov, cor = returns.cov(), returns.corr()
+ links = linkage(dist(cor.values), method='ward')
+ node = tree(links)
+
+ rootcluster = build_cluster(node, cov)
+
+ ax = dendrogram(links, orientation="left")  # dendrogram() is pyhrp's matplotlib helper; its import is omitted here
+ ax.get_figure().savefig("dendrogram.png")
+ ```
+ For your convenience, you can bypass the construction of the covariance and correlation matrices, the links and the node, i.e. the root of the tree (dendrogram).
+ ```python
+ import pandas as pd
+ from pyhrp.hrp import hrp
+
+ prices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)
+ root = hrp(prices=prices)
+ ```
+ You may expect a weight series here, but the `hrp` function instead returns a `Cluster` object. The `Cluster` simplifies all further post-analysis.
+ ```python
+ print(root.weights)
+ print(root.variance)
+ # You can drill into the graph by going downstream
+ print(root.left)
+ print(root.right)
+ ```
+
+ ## Installation:
+ ```
+ pip install pyhrp
+ ```
+
pyhrp-0.0.0/README.md ADDED
@@ -0,0 +1,46 @@
+ # pyhrp
+
+ [![DeepSource](https://deepsource.io/gh/tschm/hrp.svg/?label=active+issues&show_trend=true&token=qjT_aLQgo_1Xbe2Z9ZNdH3Cx)](https://deepsource.io/gh/tschm/hrp/?ref=repository-badge)
+
+ A recursive implementation of the Hierarchical Risk Parity (HRP) approach by Marcos Lopez de Prado.
+ We rely heavily on the scipy.cluster.hierarchy package.
+
+ Here's a simple example:
+
+ ```python
+ import pandas as pd
+ from pyhrp.hrp import dist, linkage, tree, build_cluster
+
+ prices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)
+
+ returns = prices.pct_change().dropna(axis=0, how="all")
+ cov, cor = returns.cov(), returns.corr()
+ links = linkage(dist(cor.values), method='ward')
+ node = tree(links)
+
+ rootcluster = build_cluster(node, cov)
+
+ ax = dendrogram(links, orientation="left")  # dendrogram() is pyhrp's matplotlib helper; its import is omitted here
+ ax.get_figure().savefig("dendrogram.png")
+ ```
+ For your convenience, you can bypass the construction of the covariance and correlation matrices, the links and the node, i.e. the root of the tree (dendrogram).
+ ```python
+ import pandas as pd
+ from pyhrp.hrp import hrp
+
+ prices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)
+ root = hrp(prices=prices)
+ ```
+ You may expect a weight series here, but the `hrp` function instead returns a `Cluster` object. The `Cluster` simplifies all further post-analysis.
+ ```python
+ print(root.weights)
+ print(root.variance)
+ # You can drill into the graph by going downstream
+ print(root.left)
+ print(root.right)
+ ```
+
+ ## Installation:
+ ```
+ pip install pyhrp
+ ```
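
The README mentions drilling into the graph through `left` and `right`. A minimal sketch of such a traversal follows. It assumes a stand-in `Node` dataclass (hypothetical, mirroring only the `assets`, `left` and `right` fields of `Cluster`) so that it runs without pyhrp installed; the tree and its weights are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for pyhrp's Cluster (only the fields used here)."""

    assets: Dict[str, float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves(node):
    """Yield the leaf nodes below `node`, from left to right."""
    if node.left is None and node.right is None:
        yield node
        return
    yield from leaves(node.left)
    yield from leaves(node.right)


# Made-up three-asset tree: A and B form one cluster, C joins at the root.
a, b, c = Node({"A": 1.0}), Node({"B": 1.0}), Node({"C": 1.0})
ab = Node({"A": 0.5, "B": 0.5}, left=a, right=b)
root = Node({"A": 0.25, "B": 0.25, "C": 0.5}, left=ab, right=c)

leaf_names = [name for leaf in leaves(root) for name in leaf.assets]
print(leaf_names)
```

The same generator should work on a real `Cluster`, since it only relies on `left`, `right` and `assets`.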
File without changes
@@ -0,0 +1,97 @@
+ """risk parity for clusters"""
+ from dataclasses import dataclass
+ from typing import Dict
+
+ import numpy as np
+ import pandas as pd
+
+
+ def risk_parity(cluster_left, cluster_right, cov):
+     """
+     Given two clusters, compute their parent in a bottom-up approach.
+
+     :param cluster_left: left cluster
+     :param cluster_right: right cluster
+     :param cov: (global) covariance matrix. Will pick the correct sub-matrix
+
+     """
+
+     # combine two clusters
+
+     def parity(v_left, v_right):
+         """
+         Compute the weights for a risk parity portfolio of two assets
+         :param v_left: Variance of the "left" portfolio
+         :param v_right: Variance of the "right" portfolio
+         :return: w, 1-w the weights for the left and the right portfolio.
+                   It is w*v_left == (1-w)*v_right hence w = v_right / (v_right + v_left)
+         """
+         return v_right / (v_left + v_right), v_left / (v_left + v_right)
+
+     if not set(cluster_left.assets).isdisjoint(set(cluster_right.assets)):
+         raise AssertionError
+
+     # split is s.t. v_left * alpha_left == v_right * alpha_right and alpha_left + alpha_right = 1
+     alpha_left, alpha_right = parity(cluster_left.variance, cluster_right.variance)
+
+     # assets in the cluster are the assets of the left and right cluster
+     # further downstream
+     assets = {
+         **(alpha_left * cluster_left.weights).to_dict(),
+         **(alpha_right * cluster_right.weights).to_dict(),
+     }
+
+     weights = np.array(list(assets.values()))
+     covariance = cov.loc[list(assets), list(assets)]
+
+     var = np.linalg.multi_dot((weights, covariance, weights))
+
+     return Cluster(
+         assets=assets,
+         variance=var,
+         left=cluster_left,
+         right=cluster_right,
+     )
+
+
+ @dataclass(frozen=True)
+ class Cluster:
+     """
+     Clusters are the nodes of the graphs we build.
+     Each cluster is aware of the left and the right cluster
+     it is connecting to.
+     """
+
+     assets: Dict[str, float]
+     variance: float
+     left: object = None
+     right: object = None
+
+     def __post_init__(self):
+         """check input"""
+
+         if self.variance <= 0:
+             raise AssertionError
+         if self.left is None:
+             # if there is no left, there can't be a right
+             if self.right is not None:
+                 raise AssertionError
+         else:
+             # left is not None, hence both left and right have to be clusters
+             if not isinstance(self.left, Cluster):
+                 raise AssertionError
+             if not isinstance(self.right, Cluster):
+                 raise AssertionError
+             if not set(self.left.assets.keys()).isdisjoint(
+                 set(self.right.assets.keys())
+             ):
+                 raise AssertionError
+
+     def is_leaf(self):
+         """true if this cluster is a leaf, i.e. no clusters follow downstream"""
+         return self.left is None and self.right is None
+
+     @property
+     def weights(self):
+         """weight series"""
+         return pd.Series(self.assets, name="Weights").sort_index()
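
The two-cluster split used by `risk_parity` can be checked by hand. The following is a minimal, dependency-free sketch of that arithmetic, not the package's code path: the variances and the covariance are illustrative numbers, and the parent variance is the quadratic form w' Σ w written out for two assets.

```python
def parity(v_left, v_right):
    """Return (w_left, w_right) such that w_left * v_left == w_right * v_right."""
    total = v_left + v_right
    return v_right / total, v_left / total


# Hypothetical variances of two leaf clusters (illustrative numbers only)
v_a, v_b = 0.04, 0.01
w_a, w_b = parity(v_a, v_b)

# Hypothetical covariance between the two assets
cov_ab = 0.005

# Variance of the combined cluster: w' * Cov * w for two assets
var_parent = w_a**2 * v_a + 2 * w_a * w_b * cov_ab + w_b**2 * v_b
print(w_a, w_b, var_parent)
```

The lower-variance cluster receives the larger weight, and the two weights sum to one, which is what lets the recursion scale child weights by `alpha_left` and `alpha_right`.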
@@ -0,0 +1,11 @@
+ """display a dendrogram"""
+ import matplotlib.pyplot as plt
+ import scipy.cluster.hierarchy as sch
+
+
+ def dendrogram(links, ax=None, **kwargs):
+     """Plot a dendrogram using matplotlib"""
+     if ax is None:
+         _, ax = plt.subplots(figsize=(25, 20))
+     sch.dendrogram(links, ax=ax, **kwargs)
+     return ax
@@ -0,0 +1,72 @@
+ """the hrp algorithm"""
+ import numpy as np
+ import scipy.cluster.hierarchy as sch
+ import scipy.spatial.distance as ssd
+
+ from pyhrp.cluster import Cluster, risk_parity
+
+
+ def dist(cor):
+     """
+     Compute the correlation based distance matrix d,
+     compare with page 239 of the first book by Marcos Lopez de Prado
+     :param cor: the n x n correlation matrix
+     :return: The distances d between columns i and j, in condensed (vector) form.
+              Before condensing, all diagonal entries are set to zero.
+
+     """
+     # https://stackoverflow.com/questions/18952587/
+     matrix = np.sqrt(np.clip((1.0 - cor) / 2.0, a_min=0.0, a_max=1.0))
+     np.fill_diagonal(matrix, val=0.0)
+     return ssd.squareform(matrix)
+
+
+ def linkage(dist_vec, method="ward", **kwargs):
+     """
+     Compute the linkage matrix from the condensed distance vector
+     :param dist_vec: The distance vector based on the correlation matrix
+     :param method: "single", "ward", etc.
+     :return: links The links describing the graph (useful to draw the dendrogram)
+              and basis for constructing the tree object
+     """
+     # compute the linkage matrix encoding the hierarchical clustering
+     return sch.linkage(dist_vec, method=method, **kwargs)
+
+
+ def tree(links):
+     """
+     Compute the root ClusterNode.
+     :param links: The Linkage matrix compiled by the linkage function above
+     :return: The root node. From there it's possible to reach the entire graph
+     """
+     return sch.to_tree(links, rd=False)
+
+
+ def build_cluster(node, cov):
+     """compute a cluster"""
+     if node.is_leaf():
+         # a node is a leaf if it has no further relatives downstream.
+         # no leaves, no branches, ...
+         asset = cov.keys().to_list()[node.id]
+         return Cluster(assets={asset: 1.0}, variance=cov[asset][asset])
+
+     # drill down on the left
+     cluster_left = build_cluster(node.left, cov)
+     # drill down on the right
+     cluster_right = build_cluster(node.right, cov)
+     # combine left and right into a new cluster
+     return risk_parity(cluster_left, cluster_right, cov=cov)
+
+
+ def hrp(prices, node=None, method="single"):
+     """
+     Computes the root node for the hierarchical risk parity portfolio
+     :param prices: DataFrame of prices; returns, covariance and correlation are derived from it
+     :param node: Optional root node of the dendrogram; if omitted it is built with the given linkage method
+     :return: the root cluster of the risk parity portfolio
+     """
+     returns = prices.pct_change().dropna(axis=0, how="all")
+     cov, cor = returns.cov(), returns.corr()
+     node = node or tree(linkage(dist(cor.values), method=method))
+
+     return build_cluster(node, cov)
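
The `dist` function maps each correlation to d = sqrt((1 - rho) / 2) and returns the result in condensed form. Below is a dependency-free sketch of that mapping, assuming plain nested lists instead of a NumPy array; the helper name `condensed_distance` and the correlation values are made up. It lists the entries above the diagonal row by row, which is the order I understand `squareform` to produce.

```python
import math


def condensed_distance(cor):
    """Map a correlation matrix (list of lists) to a condensed distance vector.

    Uses d_ij = sqrt((1 - rho_ij) / 2), clipped to [0, 1], and lists the
    entries above the diagonal row by row.
    """
    n = len(cor)
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            value = min(max((1.0 - cor[i][j]) / 2.0, 0.0), 1.0)
            out.append(math.sqrt(value))
    return out


# Made-up correlations for three assets
cor = [
    [1.0, 0.5, -1.0],
    [0.5, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
]
d = condensed_distance(cor)
print(d)
```

Perfectly anti-correlated assets end up at distance 1, uncorrelated ones at sqrt(0.5), and the diagonal (distance 0) never appears in the condensed vector.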
@@ -0,0 +1,67 @@
+ """Replicate the implementation of HRP by Marcos Lopez de Prado using this package
+
+ The original implementation by Marcos Lopez de Prado uses recursive bisection
+ on a ranked list of columns of the covariance matrix.
+ To get to this list, Lopez de Prado uses a matrix quasi-diagonalization
+ induced by the order (from left to right) of the dendrogram.
+ Based on that we build a tree reflecting the recursive bisection.
+ With that tree and the covariance matrix we go back to the hrp algorithm."""
+
+ import pandas as pd
+ import scipy.cluster.hierarchy as sch
+
+ from pyhrp.hrp import build_cluster, dist, linkage, tree
+
+
+ def bisection(ids):
+     """
+     Compute the graph underlying the recursive bisection of Marcos Lopez de Prado
+
+     :param ids: A (ranked) set of indices
+     :return: The root ClusterNode of this tree
+     """
+
+     def split(ids):
+         """split the vector ids in two parts, split in the middle"""
+         if len(ids) < 2:
+             raise AssertionError
+         num = len(ids)
+         return ids[: num // 2], ids[num // 2 :]
+
+     if len(ids) < 1:
+         raise AssertionError
+     if len(ids) != len(set(ids)):
+         raise AssertionError
+
+     if len(ids) == 1:
+         return sch.ClusterNode(id=ids[0])
+
+     left, right = split(ids)
+     return sch.ClusterNode(id=0, left=bisection(ids=left), right=bisection(ids=right))
+
+
+ def marcos(prices, node=None, method=None):
+     """The algorithm as implemented in the book by Marcos Lopez de Prado"""
+
+     # make sure the prices are a DataFrame
+     if not isinstance(prices, pd.DataFrame):
+         raise AssertionError
+
+     # convert into returns
+     returns = prices.pct_change().dropna(axis=0, how="all")
+
+     # compute covariance matrix and correlation matrices (both as DataFrames)
+     cov, cor = returns.cov(), returns.corr()
+
+     # Compute the root node of the tree
+     method = method or "ward"
+     node = node or tree(linkage(dist(cor.values), method=method))
+
+     # leaf ids in dendrogram order (the quasi-diagonalization step)
+     ids = node.pre_order()
+     # apply bisection, root is now a ClusterNode of the graph
+     root = bisection(ids=ids)
+
+     # It's not clear to me why Marcos goes down this route
+     # rather than sticking with the graph computed above.
+     return build_cluster(node=root, cov=cov)
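
The bisection step keeps only the left-to-right order of the leaves and discards the rest of the dendrogram. A small sketch of that idea follows, using nested tuples instead of `sch.ClusterNode` so it runs without SciPy; the leaf order is a made-up example, and raising `ValueError` is a choice of this sketch (the module above raises `AssertionError`).

```python
def bisection(ids):
    """Return a nested (left, right) tuple tree; leaves are the ids themselves."""
    if len(ids) < 1 or len(ids) != len(set(ids)):
        raise ValueError("ids must be non-empty and unique")
    if len(ids) == 1:
        return ids[0]
    mid = len(ids) // 2
    return (bisection(ids[:mid]), bisection(ids[mid:]))


# Hypothetical leaf order, as a dendrogram traversal might return it
order = [3, 1, 0, 4, 2]
tree = bisection(order)
print(tree)
```

With an odd number of ids the right half is the larger one, matching the `ids[: num // 2], ids[num // 2 :]` split in the module.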
@@ -0,0 +1,23 @@
1
+ [tool.poetry]
2
+ name = "pyhrp"
3
+ version = "0.0.0"
4
+ description = "..."
5
+ authors = ["Thomas Schmelzer"]
6
+ readme = "README.md"
7
+ repository = "https://github.com/tschm/pyhrp"
8
+ packages = [{include = "pyhrp"}]
9
+
10
+ [tool.poetry.dependencies]
11
+ python = ">=3.9.0"
12
+ pandas = ">=1.2.0"
13
+ scipy = ">=1.6.0"
14
+ matplotlib = ">=3.3.3"
15
+ scikit-learn = ">=0.24.1"
16
+
17
+ [tool.poetry.dev-dependencies]
18
+ pytest = "7.2.0"
19
+ pytest-cov = "*"
20
+
21
+ [build-system]
22
+ requires = ["poetry>=1.0.2"]
23
+ build-backend = "poetry.masonry.api"
pyhrp-0.0.0/setup.py ADDED
@@ -0,0 +1,30 @@
1
+ # -*- coding: utf-8 -*-
2
+ from setuptools import setup
3
+
4
+ packages = \
5
+ ['pyhrp']
6
+
7
+ package_data = \
8
+ {'': ['*']}
9
+
10
+ install_requires = \
11
+ ['matplotlib>=3.3.3', 'pandas>=1.2.0', 'scikit-learn>=0.24.1', 'scipy>=1.6.0']
12
+
13
+ setup_kwargs = {
14
+ 'name': 'pyhrp',
15
+ 'version': '0.0.0',
16
+ 'description': '...',
17
+ 'long_description': '# pyhrp\n\n[![DeepSource](https://deepsource.io/gh/tschm/hrp.svg/?label=active+issues&show_trend=true&token=qjT_aLQgo_1Xbe2Z9ZNdH3Cx)](https://deepsource.io/gh/tschm/hrp/?ref=repository-badge)\n\nA recursive implementation of the Hierarchical Risk Parity (hrp) approach by Marcos Lopez de Prado.\nWe take heavily advantage of the scipy.cluster.hierarchy package. \n\nHere\'s a simple example\n\n```python\nimport pandas as pd\nfrom pyhrp.hrp import dist, linkage, tree, _hrp\n\nprices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)\n\nreturns = prices.pct_change().dropna(axis=0, how="all")\ncov, cor = returns.cov(), returns.corr()\nlinks = linkage(dist(cor.values), method=\'ward\')\nnode = tree(links)\n\nrootcluster = _hrp(node, cov)\n\nax = dendrogram(links, orientation="left")\nax.get_figure().savefig("dendrogram.png")\n```\nFor your convenience you can bypass the construction of the covariance and correlation matrix, the links and the node, e.g. the root of the tree (dendrogram).\n```python\nimport pandas as pd\nfrom pyhrp.hrp import hrp\n\nprices = pd.read_csv("test/resources/stock_prices.csv", index_col=0, parse_dates=True)\nroot = hrp(prices=prices)\n```\nYou may expect a weight series here but instead the `hrp` function returns a `Cluster` object. The `Cluster` simplifies all further post-analysis.\n```python\nprint(cluster.weights)\nprint(cluster.variance)\n# You can drill into the graph by going downstream\nprint(cluster.left)\nprint(cluster.right)\n```\n\n## Installation:\n```\npip install pyhpr\n```\n',
18
+ 'author': 'Thomas Schmelzer',
19
+ 'author_email': 'None',
20
+ 'maintainer': 'None',
21
+ 'maintainer_email': 'None',
22
+ 'url': 'https://github.com/tschm/pyhrp',
23
+ 'packages': packages,
24
+ 'package_data': package_data,
25
+ 'install_requires': install_requires,
26
+ 'python_requires': '>=3.9.0',
27
+ }
28
+
29
+
30
+ setup(**setup_kwargs)