causing-2.4.6.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
causing-2.4.6/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ # MIT License
2
+
3
+ **Copyright (c) 2020 Dr. Holger Bartel, RealRate GmbH**
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
causing-2.4.6/PKG-INFO ADDED
@@ -0,0 +1,172 @@
1
+ Metadata-Version: 2.4
2
+ Name: causing
3
+ Version: 2.4.6
4
+ Summary: Causing: CAUSal INterpretation using Graphs
5
+ Home-page: https://github.com/realrate/Causing
6
+ Author: Dr. Holger Bartel
7
+ Author-email: holger.bartel@realrate.ai
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.9
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE.md
14
+ Requires-Dist: numpy~=1.23
15
+ Requires-Dist: pandas~=1.3
16
+ Requires-Dist: scipy~=1.9
17
+ Requires-Dist: sympy~=1.5
18
+ Requires-Dist: networkx~=2.7
19
+ Requires-Dist: pre-commit
20
+ Dynamic: author
21
+ Dynamic: author-email
22
+ Dynamic: classifier
23
+ Dynamic: description
24
+ Dynamic: description-content-type
25
+ Dynamic: home-page
26
+ Dynamic: license-file
27
+ Dynamic: requires-dist
28
+ Dynamic: requires-python
29
+ Dynamic: summary
30
+
31
+ # Causing: CAUSal INterpretation using Graphs
32
+
33
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
34
+ [![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/)
35
+
36
+ _Causing is a multivariate graphical analysis tool helping you to interpret the causal
37
+ effects of a given equation system._
38
+
39
+ Get a nice colored graph and immediately understand the causal effects between the variables.
40
+
41
+ **Input:** You simply provide a dataset and an equation system in the form of a
42
+ Python function. The endogenous variables on the left-hand side are assumed to be caused by
43
+ the variables on the right-hand side of the equation. Thus, you provide the causal structure
44
+ in the form of a directed acyclic graph (DAG).
45
+
46
+ **Output:** As an output, you will get a colored graph of quantified effects acting between
47
+ the model variables. You can immediately interpret mediation chains for every
48
+ individual observation - even for highly complex nonlinear systems.
49
+
50
+ Here is a table relating Causing to other approaches:
51
+
52
+ Causing is | Causing is NOT
53
+ --- | ---
54
+ ✅ causal model given | ❌ causal search
55
+ ✅ DAG directed acyclic graph | ❌ cyclic, undirected, or bidirected graph
56
+ ✅ latent variables | ❌ just observed / manifest variables
57
+ ✅ individual effects | ❌ just average effects
58
+ ✅ direct, total, and mediation effects | ❌ just total effects
59
+ ✅ structural model | ❌ reduced model
60
+ ✅ small and big data | ❌ big data requirement
61
+ ✅ graphical results | ❌ just numerical results
62
+ ✅ XAI explainable AI | ❌ black box neural network
63
+
64
+ The Causing approach is quite flexible. It can be applied to highly latent models with many of the modeled endogenous variables being unobserved. Exogenous variables are assumed to be observed and deterministic. The most severe restriction certainly is that you need to specify the causal model / causal ordering.
65
+
66
+ ## Causal Effects
67
+
68
+ Causing combines total effects and mediation effects in one single graph that is easy to explain.
69
+
70
+ The total effect of a variable on the final variable is shown in the corresponding node of the graph. This total effect is split up over the variable's outgoing edges, yielding the mediation effects shown on the edges. In the education example below, only the education variable has more than one outgoing edge to be interpreted in this way.
71
+
72
+ The effects differ from individual to individual. To emphasize this, we talk about individual effects. The corresponding graph, combining total and mediation effects, is called the Individual Mediation Effects (IME) graph.
73
+
74
+ ## Software
75
+
76
+ Causing is free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
77
+
78
+ The software is developed by RealRate, an AI rating agency aiming to reinvent the rating market through AI and interpretability while avoiding any conflict of interest. See www.realrate.ai.
79
+
80
+ After cloning / downloading the Causing repository, run `python -m causing.examples example`; you will find the results in the _output_ folder, saved as SVG files. The IME files show the individual mediation effects graphs for the respective individual.
81
+
82
+ See `causing/examples` for the code generating some examples.
83
+
84
+ ## Start your Model
85
+
86
+ To start your model, you have to provide the following information, as done in the example code below:
87
+
88
+ - Define all your model variables as SymPy symbols.
89
+ - Note that in SymPy some operators are special, e.g. `Max()` instead of `max()`.
90
+ - Provide the model equations in topological order, that is, in order of computation.
91
+ - Then the model is specified with:
92
+ - _xvars_: exogenous variables
93
+ - _yvars_: endogenous variables in topological order
94
+ - _equations_: previously defined equations
95
+ - _final_var_: the final variable of interest used for mediation effects
96
+
97
+ ## 1. A Simple Example
98
+
99
+ Assume a model defined by the equation system:
100
+
101
+ Y<sub>1</sub> = X<sub>1</sub>
102
+
103
+ Y<sub>2</sub> = X<sub>2</sub> + 2 * Y<sub>1</sub><sup>2</sup>
104
+
105
+ Y<sub>3</sub> = Y<sub>1</sub> + Y<sub>2</sub>.
106
+
107
+ This gives the following graphs. Some notes to understand them:
108
+
109
+ - The data used consists of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
110
+
111
+ - To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.
112
+
113
+ - Nodes and edges are colored, showing positive (_green_) and negative (_red_) effects they have on the final variable Y<sub>3</sub>.
114
+
115
+ - Individual effects are based on the given model. For each individual, however, its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
116
+
117
+ - Total effects are shown in the nodes below, and they are split up over the outgoing edges, yielding the mediation effects shown on the edges. Note, however, that only outgoing edges sum up to the node value; incoming edges do not. All effects are effects on the final variable of interest only, assumed here to be Y<sub>3</sub>.
118
+
119
+ ![Individual Mediation Effects (IME)](https://github.com/realrate/Causing/raw/develop/images_readme/IME_1.svg)
120
+
121
+ As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub>, and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
122
+
123
+ You can find the full source code for this example [here](https://github.com/realrate/Causing/blob/develop/causing/examples/models.py#L16-L45).
124
+
125
+ ## 2. Application to Education and Wages
126
+
127
+ To dig a bit deeper, here is a real-world example from the social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
128
+
129
+ This 5-minute introductory video gives a short overview of Causing and includes this real-data example: [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
130
+
131
+ See here for a detailed analysis of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
132
+
133
+ ## 3. Application to Insurance Ratings
134
+
135
+ The Causing approach and its formulas together with an application are given in:
136
+
137
+ > Bartel, Holger (2020), "Causal Analysis - With an Application to Insurance Ratings"
138
+ DOI: 10.13140/RG.2.2.31524.83848
139
+ https://www.researchgate.net/publication/339091133
140
+
141
+ Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version uses numerically computed effects, that paper uses closed-form formulas.
142
+
143
+ The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
144
+
145
+ The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes correspond to the model variables, and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.
146
+
147
+ > **Keywords:** total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating
148
+
149
+ ## Award
150
+
151
+ RealRate's AI software _Causing_ is a winner of the PyTorch AI Hackathon.
152
+
153
+ <img src="https://github.com/realrate/Causing/raw/develop/images_readme/RealRate_AI_Software_Winner.png">
154
+
155
+ We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
156
+
157
+ [devpost.com/software/realrate-explainable-ai-for-company-ratings](https://devpost.com/software/realrate-explainable-ai-for-company-ratings "devpost.com/software/realrate-explainable-ai-for-company-ratings").
158
+
159
+ ## GitHub Star History
160
+
161
+ [star-history.com](https://www.star-history.com/#realrate/Causing&Date)
162
+ ![star-history-2025327](https://github.com/user-attachments/assets/67271706-0534-4b97-b9da-7fe502f1d94a)
163
+
164
+ ## Contact
165
+
166
+ Dr. Holger Bartel
167
+ RealRate
168
+ Cecilienstr. 14, D-12307 Berlin
169
+ [holger.bartel@realrate.ai](mailto:holger.bartel@realrate.ai?subject=[Causing])
170
+ Phone: +49 160 957 90 844
171
+ [realrate.ai](https://realrate.ai)
172
+ [drbartel.com](https://drbartel.com)
causing-2.4.6/README.md ADDED
@@ -0,0 +1,142 @@
1
+ # Causing: CAUSal INterpretation using Graphs
2
+
3
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4
+ [![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/)
5
+
6
+ _Causing is a multivariate graphical analysis tool helping you to interpret the causal
7
+ effects of a given equation system._
8
+
9
+ Get a nice colored graph and immediately understand the causal effects between the variables.
10
+
11
+ **Input:** You simply provide a dataset and an equation system in the form of a
12
+ Python function. The endogenous variables on the left-hand side are assumed to be caused by
13
+ the variables on the right-hand side of the equation. Thus, you provide the causal structure
14
+ in the form of a directed acyclic graph (DAG).
15
+
16
+ **Output:** As an output, you will get a colored graph of quantified effects acting between
17
+ the model variables. You can immediately interpret mediation chains for every
18
+ individual observation - even for highly complex nonlinear systems.
19
+
20
+ Here is a table relating Causing to other approaches:
21
+
22
+ Causing is | Causing is NOT
23
+ --- | ---
24
+ ✅ causal model given | ❌ causal search
25
+ ✅ DAG directed acyclic graph | ❌ cyclic, undirected, or bidirected graph
26
+ ✅ latent variables | ❌ just observed / manifest variables
27
+ ✅ individual effects | ❌ just average effects
28
+ ✅ direct, total, and mediation effects | ❌ just total effects
29
+ ✅ structural model | ❌ reduced model
30
+ ✅ small and big data | ❌ big data requirement
31
+ ✅ graphical results | ❌ just numerical results
32
+ ✅ XAI explainable AI | ❌ black box neural network
33
+
34
+ The Causing approach is quite flexible. It can be applied to highly latent models with many of the modeled endogenous variables being unobserved. Exogenous variables are assumed to be observed and deterministic. The most severe restriction certainly is that you need to specify the causal model / causal ordering.
35
+
36
+ ## Causal Effects
37
+
38
+ Causing combines total effects and mediation effects in one single graph that is easy to explain.
39
+
40
+ The total effect of a variable on the final variable is shown in the corresponding node of the graph. This total effect is split up over the variable's outgoing edges, yielding the mediation effects shown on the edges. In the education example below, only the education variable has more than one outgoing edge to be interpreted in this way.
41
+
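As a purely hypothetical illustration (the numbers here are made up, not taken from the examples below): if a variable's total effect on the final variable is +10 and it has two outgoing edges, those edges might carry mediation effects of +6 and +4. The mediation effects on a variable's outgoing edges always add up to its total effect.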
42
+ The effects differ from individual to individual. To emphasize this, we talk about individual effects. The corresponding graph, combining total and mediation effects, is called the Individual Mediation Effects (IME) graph.
43
+
44
+ ## Software
45
+
46
+ Causing is free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
47
+
48
+ The software is developed by RealRate, an AI rating agency aiming to reinvent the rating market through AI and interpretability while avoiding any conflict of interest. See www.realrate.ai.
49
+
50
+ After cloning / downloading the Causing repository, run `python -m causing.examples example`; you will find the results in the _output_ folder, saved as SVG files. The IME files show the individual mediation effects graphs for the respective individual.
51
+
52
+ See `causing/examples` for the code generating some examples.
53
+
54
+ ## Start your Model
55
+
56
+ To start your model, you have to provide the following information, as done in the example code below:
57
+
58
+ - Define all your model variables as SymPy symbols.
59
+ - Note that in SymPy some operators are special, e.g. `Max()` instead of `max()`.
60
+ - Provide the model equations in topological order, that is, in order of computation.
61
+ - Then the model is specified with:
62
+ - _xvars_: exogenous variables
63
+ - _yvars_: endogenous variables in topological order
64
+ - _equations_: previously defined equations
65
+ - _final_var_: the final variable of interest used for mediation effects
66
+
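For orientation, here is a minimal sketch of such a specification, using the variables of the simple example in the next section. The exact `Model` constructor signature is not shown in this diff (it lives in `causing/model.py`), so the keyword arguments below are an assumption based on the list above:

```python
import sympy

from causing.model import Model

# Define all model variables as SymPy symbols.
X1, X2 = sympy.symbols("X1 X2")         # exogenous variables
Y1, Y2, Y3 = sympy.symbols("Y1 Y2 Y3")  # endogenous variables, topological order

# Equations in order of computation, one right-hand side per endogenous variable.
equations = (
    X1,              # Y1 = X1
    X2 + 2 * Y1**2,  # Y2 = X2 + 2 * Y1**2
    Y1 + Y2,         # Y3 = Y1 + Y2
)

# Assumed keyword arguments, mirroring the four items listed above.
m = Model(
    xvars=[X1, X2],
    yvars=[Y1, Y2, Y3],
    equations=equations,
    final_var=Y3,
)
```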
67
+ ## 1. A Simple Example
68
+
69
+ Assume a model defined by the equation system:
70
+
71
+ Y<sub>1</sub> = X<sub>1</sub>
72
+
73
+ Y<sub>2</sub> = X<sub>2</sub> + 2 * Y<sub>1</sub><sup>2</sup>
74
+
75
+ Y<sub>3</sub> = Y<sub>1</sub> + Y<sub>2</sub>.
76
+
77
+ This gives the following graphs. Some notes to understand them:
78
+
79
+ - The data used consists of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
80
+
81
+ - To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.
82
+
83
+ - Nodes and edges are colored, showing positive (_green_) and negative (_red_) effects they have on the final variable Y<sub>3</sub>.
84
+
85
+ - Individual effects are based on the given model. For each individual, however, its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
86
+
87
+ - Total effects are shown in the nodes below, and they are split up over the outgoing edges, yielding the mediation effects shown on the edges. Note, however, that only outgoing edges sum up to the node value; incoming edges do not. All effects are effects on the final variable of interest only, assumed here to be Y<sub>3</sub>.
88
+
89
+ ![Individual Mediation Effects (IME)](https://github.com/realrate/Causing/raw/develop/images_readme/IME_1.svg)
90
+
91
+ As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub>, and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
92
+
93
+ You can find the full source code for this example [here](https://github.com/realrate/Causing/blob/develop/causing/examples/models.py#L16-L45).
94
+
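Continuing the sketch from the previous section, the effects and the IME graphs could then be produced with the modules added in this package (`causing/__init__.py` and the Graphviz helper module). The shape convention for `xdat` (exogenous variables in rows, individuals in columns) is inferred from the slicing in that code and should be treated as an assumption:

```python
from pathlib import Path

import numpy as np

import causing
import causing.graph

# 200 simulated observations of the two exogenous variables
# (rows: xvars, columns: individuals).
rng = np.random.default_rng(0)
xdat = np.vstack([
    rng.normal(3, 1, 200),  # X1, mean 3
    rng.normal(2, 1, 200),  # X2, mean 2
])

# Effects restricted to the first individual (see causing/__init__.py).
effects = causing.create_indiv(m, xdat, show_nr_indiv=1)

# One annotated DiGraph per individual, rendered as output/IME_<id>.svg
# (Graphviz's `dot` must be installed).
graphs = causing.graph.annotated_graphs(m, effects)
causing.graph.create_graphs(graphs, Path("output"))
```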
95
+ ## 2. Application to Education and Wages
96
+
97
+ To dig a bit deeper, here is a real-world example from the social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
98
+
99
+ This 5-minute introductory video gives a short overview of Causing and includes this real-data example: [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
100
+
101
+ See here for a detailed analysis of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
102
+
103
+ ## 3. Application to Insurance Ratings
104
+
105
+ The Causing approach and its formulas together with an application are given in:
106
+
107
+ > Bartel, Holger (2020), "Causal Analysis - With an Application to Insurance Ratings"
108
+ DOI: 10.13140/RG.2.2.31524.83848
109
+ https://www.researchgate.net/publication/339091133
110
+
111
+ Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version uses numerically computed effects, that paper uses closed-form formulas.
112
+
113
+ The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
114
+
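To make the total-derivative idea concrete, take the simple example from section 1 (this derivation is added here for illustration and is not taken from the paper). The total effect of X<sub>1</sub> on Y<sub>3</sub> follows from the chain rule:

dY<sub>3</sub>/dX<sub>1</sub> = ∂Y<sub>3</sub>/∂Y<sub>1</sub> · dY<sub>1</sub>/dX<sub>1</sub> + ∂Y<sub>3</sub>/∂Y<sub>2</sub> · dY<sub>2</sub>/dX<sub>1</sub> = 1 · 1 + 1 · 4Y<sub>1</sub> = 1 + 4X<sub>1</sub>.

The total effect of Y<sub>1</sub> on Y<sub>3</sub>, 1 + 4Y<sub>1</sub>, splits over Y<sub>1</sub>'s outgoing edges into a direct part (edge Y<sub>1</sub> → Y<sub>3</sub>, effect 1) and a mediated part (edge Y<sub>1</sub> → Y<sub>2</sub>, effect 4Y<sub>1</sub>), which is exactly the decomposition of total effects into mediation effects described above.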
115
+ The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes correspond to the model variables, and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.
116
+
117
+ > **Keywords:** total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating
118
+
119
+ ## Award
120
+
121
+ RealRate's AI software _Causing_ is a winner of the PyTorch AI Hackathon.
122
+
123
+ <img src="https://github.com/realrate/Causing/raw/develop/images_readme/RealRate_AI_Software_Winner.png">
124
+
125
+ We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
126
+
127
+ [devpost.com/software/realrate-explainable-ai-for-company-ratings](https://devpost.com/software/realrate-explainable-ai-for-company-ratings "devpost.com/software/realrate-explainable-ai-for-company-ratings").
128
+
129
+ ## GitHub Star History
130
+
131
+ [star-history.com](https://www.star-history.com/#realrate/Causing&Date)
132
+ ![star-history-2025327](https://github.com/user-attachments/assets/67271706-0534-4b97-b9da-7fe502f1d94a)
133
+
134
+ ## Contact
135
+
136
+ Dr. Holger Bartel
137
+ RealRate
138
+ Cecilienstr. 14, D-12307 Berlin
139
+ [holger.bartel@realrate.ai](mailto:holger.bartel@realrate.ai?subject=[Causing])
140
+ Phone: +49 160 957 90 844
141
+ [realrate.ai](https://realrate.ai)
142
+ [drbartel.com](https://drbartel.com)
causing-2.4.6/causing/__init__.py ADDED
@@ -0,0 +1,22 @@
1
+ # -*- coding: utf-8 -*-
2
+ """causing - causal interpretation using graphs."""
3
+
4
+ # flake8: noqa
5
+
6
+ # public Causing API
7
+ from causing.model import Model
8
+
9
+
10
+ def create_indiv(m: Model, xdat, show_nr_indiv: int) -> dict:
11
+ """Calculate effects and limit result set to `show_nr_indiv`
12
+
13
+ This is mostly for backwards compatibility and to shrink the amount of
14
+ data for test cases. Otherwise, use `Model.calc_effects` directly.
15
+ """
16
+ eff = m.calc_effects(xdat)
17
+ for key in ["xnodeeffects", "ynodeeffects", "xedgeeffects", "yedgeeffects"]:
18
+ if key in ["xnodeeffects", "ynodeeffects"]:
19
+ eff[key] = eff[key][:, :show_nr_indiv]
20
+ else:
21
+ eff[key] = eff[key][:show_nr_indiv]
22
+ return eff
causing-2.4.6/causing/graph.py ADDED
@@ -0,0 +1,272 @@
1
+ # -*- coding: utf-8 -*-
2
+ """Create direct, total and mediation Graphviz graph from dot_str"""
3
+ import re
4
+ import subprocess
5
+ from typing import Iterable
6
+ from itertools import chain
7
+ from pathlib import Path
8
+ from functools import cache
9
+
10
+ import numpy as np
11
+ import networkx
12
+
13
+ from causing.model import Model
14
+ from causing import utils
15
+
16
+
17
+ DOT_COMMAND = "dot"
18
+
19
+
20
+ @cache
21
+ def dot_version() -> list[int]:
22
+ full_output = subprocess.check_output(
23
+ [DOT_COMMAND, "-V"], encoding="utf-8", stderr=subprocess.STDOUT
24
+ )
25
+ match = re.search(r"(\d+[.])+\d+", full_output)
26
+ assert match
27
+ version = match.group(0)
28
+ return [int(x) for x in version.split(".")]
29
+
30
+
31
+ def fix_svg_scale(svg_code):
32
+ """Work around graphviz SVG generation bug
33
+
34
+ Graphviz divides the numbers in the viewBox attribute by the scale instead
35
+ of multiplying it. We work around it my multiplying with the scale twice.
36
+ See https://github.com/realrate/RealRate-Private/issues/631
37
+ and https://gitlab.com/graphviz/graphviz/-/issues/1406
38
+ """
39
+ # find scaling factor
40
+ scale_match = re.search(r"scale\(([0-9.]+)", svg_code)
41
+ assert scale_match
42
+ factor = float(scale_match.group(1)) ** 2
43
+
44
+ # edit SVG tag
45
+ orig_svg_tag = next(re.finditer(r"<svg.*?>", svg_code, flags=re.DOTALL)).group(0)
46
+ attrs = {
47
+ match.group(1): match.group(2)
48
+ for match in re.finditer(r'(\w+)="(.*?)"', orig_svg_tag)
49
+ }
50
+ new_svg_tag = "<svg "
51
+ for attr, val in attrs.items():
52
+ if attr == "viewBox":
53
+ val = " ".join(str(float(x) * factor) for x in val.split(" "))
54
+ new_svg_tag += f'{attr}="{val}" '
55
+ new_svg_tag += ">"
56
+
57
+ # return with changed SVG tag
58
+ return svg_code.replace(orig_svg_tag, new_svg_tag)
59
+
60
+
61
+ def save_graph(path: Path, graph_dot):
62
+ """Render graph as SVG"""
63
+
64
+ path.parent.mkdir(parents=True, exist_ok=True)
65
+
66
+ # with open(path.stem + ".txt", "w") as file:
67
+ # file.write(graph_dot)
68
+
69
+ svg_code = subprocess.check_output(
70
+ [DOT_COMMAND, "-Tsvg"], input=graph_dot, encoding="utf-8"
71
+ )
72
+ # if dot_version()[0] < 3:
73
+ # svg_code = fix_svg_scale(svg_code)
74
+ with open(path, "w") as f:
75
+ f.write(svg_code)
76
+
77
+
78
+ def annotated_graphs(
79
+ m: Model,
80
+ graph_json,
81
+ ids: Iterable[str] = None,
82
+ node_labels: dict[str, str] = {},
83
+ ) -> Iterable[networkx.DiGraph]:
84
+ """Return DiGraphs with all information required to draw IME graphs"""
85
+ if ids is None:
86
+ ids = [str(i + 1) for i in range(len(graph_json["xedgeeffects"]))]
87
+ for graph_id, exj, eyj, eyx, eyy in zip(
88
+ ids,
89
+ np.array(graph_json["xnodeeffects"]).T,
90
+ np.array(graph_json["ynodeeffects"]).T,
91
+ graph_json["xedgeeffects"],
92
+ graph_json["yedgeeffects"],
93
+ ):
94
+ g = m.graph.copy()
95
+ g.graph["id"] = graph_id
96
+
97
+ # nodes
98
+ for var, effect in chain(zip(m.xvars, exj), zip(m.yvars, eyj)):
99
+ if np.isnan(effect):
100
+ g.remove_node(var)
101
+ continue
102
+ data = g.nodes[var]
103
+ data["effect"] = effect
104
+ data["label"] = node_labels.get(var, var)
105
+
106
+ # edges
107
+ for to_node, x_effects, y_effects in zip(m.yvars, eyx, eyy):
108
+ for from_node, eff in chain(
109
+ zip(m.xvars, x_effects), zip(m.yvars, y_effects)
110
+ ):
111
+ if not np.isnan(eff):
112
+ g[from_node][to_node]["effect"] = eff
113
+
114
+ yield g
115
+
116
+
117
+ NODE_PALETTE = "#ff7973 #FFC7AD #EEEEEE #BDE7BD #75cf73".split(" ")
118
+ EDGE_PALETTE = "#ff7973 #FFC7AD #BBBBBB #aad3aa #75cf73".split(" ")
119
+ PEN_WIDTH_PALETTE = [12, 8, 4, 8, 12]
120
+
121
+
122
+ def color(val, max_val, palette):
123
+ """Choose element of palette based on `val`
124
+
125
+ val == -max_val will return palette[0]
126
+ val == +max_val will return palette[-1]
127
+ """
128
+ zero_one_scale = (val + max_val) / (2 * max_val)
129
+ ind = round(zero_one_scale * len(palette) - 0.5)
130
+ clipped_ind = np.clip(round(ind), 0, len(palette) - 1)
131
+ return palette[clipped_ind]
132
+
133
+
134
+ GRAPH_OPTIONS_STR = """
135
+ node [style="filled,rounded"]
136
+ node [shape=box]
137
+ node [color="#444444"]
138
+ ratio="compress"
139
+ size="8,10"
140
+ """
141
+
142
+
143
+ def graph_to_dot(
144
+ g: networkx.DiGraph,
145
+ invisible_edges={},
146
+ node_palette=NODE_PALETTE,
147
+ edge_palette=EDGE_PALETTE,
148
+ pen_width_palette=PEN_WIDTH_PALETTE,
149
+ graph_options_str=GRAPH_OPTIONS_STR,
150
+ in_percent=False,
151
+ min_sig_figures=3,
152
+ cutoff=0.0001,
153
+ ):
154
+ dot_str = "digraph {" + graph_options_str
155
+ max_val = max(
156
+ [abs(data["effect"]) for _, data in g.nodes(data=True)]
157
+ + [abs(data["effect"]) for _, _, data in g.edges(data=True)]
158
+ )
159
+
160
+ for node, data in g.nodes(data=True):
161
+ eff_str = utils.fmt_min_sig(
162
+ data["effect"] if abs(data["effect"]) > cutoff else 0,
163
+ min_sig_figures,
164
+ percent=in_percent,
165
+ )
166
+ label = data.get("label", node).replace("\n", r"\n") + r"\n" + eff_str
167
+ col_str = color(data["effect"], max_val, palette=node_palette)
168
+ dot_str += f' "{node}"[label = "{label}" fillcolor="{col_str}"]\n'
169
+
170
+ for from_node, to_node, data in g.edges(data=True):
171
+ eff_str = utils.fmt_min_sig(
172
+ data["effect"] if abs(data["effect"]) > cutoff else 0,
173
+ min_sig_figures,
174
+ percent=in_percent,
175
+ )
176
+ col_str = color(data["effect"], max_val, palette=edge_palette)
177
+ penwidth = color(data["effect"], max_val, palette=pen_width_palette)
178
+ dot_str += (
179
+ f' "{from_node}" -> "{to_node}" [label="{eff_str}" color="{col_str}" penwidth="{penwidth}", '
180
+ f"arrowsize=0.5]\n"
181
+ )
182
+
183
+ for from_node, to_node in invisible_edges:
184
+ dot_str += f' "{from_node}" -> "{to_node}" [style = "invisible", arrowhead="none"]\n'
185
+
186
+ dot_str += "}"
187
+ return dot_str
188
+
189
+
190
+ def create_graphs(graphs: Iterable[networkx.DiGraph], output_dir: Path, **kwargs):
191
+ for g in graphs:
192
+ filename = f"IME_{g.graph['id']}.svg"
193
+ print("Create", filename)
194
+ dot_str = graph_to_dot(g, **kwargs)
195
+ save_graph(output_dir / filename, dot_str)
196
+ # save the dot string
197
+ dot_path = output_dir / f"IME_{g.graph['id']}.dot"
198
+ with open(dot_path, "w") as f:
199
+ f.write(dot_str)
200
+
201
+
202
+ def remove_node_keep_edges(graph, node):
203
+ """Keep transitive connections when removing node.
204
+
205
+ Removing B from A->B->C will result in A->C.
206
+
207
+ WARNING: This function only approximates the edge effects. To get accurate
208
+ results, you must shrink the model and recalculate the effects instead.
209
+ """
210
+
211
+ total_out_effect = sum(
212
+ out_data["effect"] for _, _, out_data in graph.out_edges(node, data=True)
213
+ )
214
+ num_out_edges = len(graph.out_edges(node))
215
+ for a, _, in_data in graph.in_edges(node, data=True):
216
+ for _, b, out_data in graph.out_edges(node, data=True):
217
+ if total_out_effect == 0:
218
+ # If all outgoing edges have no effect, distribute the incoming effects
219
+ # evenly across all outgoing edges.
220
+ new_edge_effect = in_data["effect"] / num_out_edges
221
+ else:
222
+ new_edge_effect = (
223
+ in_data["effect"] * out_data["effect"] / total_out_effect
224
+ )
225
+ if graph.has_edge(a, b):
226
+ graph[a][b]["effect"] += new_edge_effect
227
+ else:
228
+ graph.add_edge(a, b, effect=new_edge_effect)
229
+
230
+ graph.remove_node(node)
231
+
232
+
233
+ def recalc_graphs(graphs, model, xdat) -> Iterable[networkx.DiGraph]:
234
+ """Recalculate node and edge effects in graph.
235
+
236
+ Do this after modifying the graphs (typically with `remove_node_keep_edges`)
237
+ to calculate exact effects.
238
+ `graphs` must be in the format generated by `annotated_graphs` and in the
239
+ same order as individuals within `xdat`.
240
+ """
241
+ yhat = model.compute(xdat)
242
+ yhat_mean = np.mean(yhat, axis=1)
243
+ xdat_mean = np.mean(xdat, axis=1)
244
+
245
+ for i, approx_graph in enumerate(graphs):
246
+ individual_xdat = xdat[:, i : i + 1]
247
+ removed_nodes = set(model.graph.nodes) - set(approx_graph.nodes)
248
+
249
+ # Calc effects on shrunken model
250
+ individual_model = model.shrink(removed_nodes)
251
+ effects = individual_model.calc_effects(
252
+ individual_xdat,
253
+ xdat_mean=xdat_mean,
254
+ yhat_mean=yhat_mean[[yvar not in removed_nodes for yvar in model.yvars]],
255
+ )
256
+
257
+ # Get graph for shrunken model
258
+ [g] = annotated_graphs(
259
+ individual_model,
260
+ effects,
261
+ node_labels={
262
+ n: data["label"]
263
+ for n, data in approx_graph.nodes(data=True)
264
+ if "label" in data
265
+ },
266
+ )
267
+ for xvar in set(model.xvars) & removed_nodes & g.nodes():
268
+ g.remove_node(xvar)
269
+
270
+ # Preserve graph attributes
271
+ g.graph = approx_graph.graph
272
+ yield g
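A hedged usage sketch for `remove_node_keep_edges` and `recalc_graphs` above (assuming `m` is a `causing.model.Model` and `xdat` holds the exogenous data with variables in rows and individuals in columns, as elsewhere in this module):

```python
from pathlib import Path

import causing.graph as cg

# Annotated IME graphs for all individuals.
effects = m.calc_effects(xdat)
graphs = list(cg.annotated_graphs(m, effects))

# Collapse one latent endogenous variable out of every graph. This only
# approximates the edge effects (see the warning in the docstring).
node_to_drop = m.yvars[0]  # must not be the final variable of interest
for g in graphs:
    cg.remove_node_keep_edges(g, node_to_drop)

# Recompute exact effects on the shrunken model, then render the graphs.
exact_graphs = cg.recalc_graphs(graphs, m, xdat)
cg.create_graphs(exact_graphs, Path("output"))
```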