causing 2.3.0__tar.gz → 2.4.1__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: causing
- Version: 2.3.0
+ Version: 2.4.1
  Summary: Causing: CAUSal INterpretation using Graphs
  Home-page: https://github.com/realrate/Causing
  Author: Dr. Holger Bartel
@@ -11,6 +11,12 @@ Classifier: Operating System :: OS Independent
  Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE.md
+ Requires-Dist: numpy~=1.23
+ Requires-Dist: pandas~=1.3
+ Requires-Dist: scipy~=1.9
+ Requires-Dist: sympy~=1.5
+ Requires-Dist: networkx~=2.7
+ Requires-Dist: pre-commit
 
  # Causing: CAUSal INterpretation using Graphs
 
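The new `Requires-Dist` entries use PEP 440 compatible-release specifiers: `~=1.23` means `>=1.23` together with `==1.*`, so `numpy~=1.23` accepts later 1.x releases but rejects NumPy 2, and `pandas~=1.3` and `networkx~=2.7` likewise exclude the next major versions. The sketch below checks that rule for plain numeric versions only. `parse_release` and `is_compatible_release` are hypothetical helpers, not part of Causing, and pre-, post-, and dev-releases are deliberately ignored.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a purely numeric release string such as "1.26.4" into (1, 26, 4)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(version: str, spec: str) -> bool:
    """Simplified PEP 440 "~=" check for numeric versions (hypothetical helper).

    "~=X.Y" means ">=X.Y" and "==X.*": the version must be at least the
    specifier and share every release segment except the last one.
    """
    v = parse_release(version)
    s = parse_release(spec)
    if len(s) < 2:
        raise ValueError("~= requires at least two release segments")
    prefix_len = len(s) - 1
    return v >= s and v[:prefix_len] == s[:prefix_len]


# Specifiers taken from the 2.4.1 metadata (pre-commit is unpinned).
requirements = {"numpy": "1.23", "pandas": "1.3", "scipy": "1.9", "sympy": "1.5", "networkx": "2.7"}

for name, candidate in [("numpy", "1.26.4"), ("numpy", "2.0.0"), ("pandas", "2.1.0"), ("networkx", "2.8.8")]:
    print(name, candidate, is_compatible_release(candidate, requirements[name]))
```

With these inputs the loop reports numpy 1.26.4 and networkx 2.8.8 as compatible, and numpy 2.0.0 and pandas 2.1.0 as incompatible.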
@@ -23,12 +29,12 @@ effects of a given equation system._
  Get a nice colored graph and immediately understand the causal effects between the variables.
 
  **Input:** You simply have to put in a dataset and provide an equation system in form of a
- python function. The endogenous variable on the left-hand side are assumed being caused by
+ python function. The endogenous variables on the left-hand side are assumed to be caused by
  the variables on the right-hand side of the equation. Thus, you provide the causal structure
  in form of a directed acyclic graph (DAG).
 
- **Output:** As an output you will get a colored graph of quantified effects acting between
- the model variables. You are able to immediately interpret mediation chains for every
+ **Output:** As an output, you will get a colored graph of quantified effects acting between
+ the model variables. You can immediately interpret mediation chains for every
  individual observation - even for highly complex nonlinear systems.
 
  Here is a table relating Causing to other approaches:
@@ -36,10 +42,10 @@ Here is a table relating Causing to other approaches:
  Causing is | Causing is NOT
  --- | ---
  ✅ causal model given | ❌ causal search
- ✅ DAG directed acyclic graph | ❌ cyclic, undirected or bidirected graph
+ ✅ DAG directed acyclic graph | ❌ cyclic, undirected, or bidirected graph
  ✅ latent variables | ❌ just observed / manifest variables
  ✅ individual effects | ❌ just average effects
- ✅ direct, total and mediation effects | ❌ just total effects
+ ✅ direct, total, and mediation effects | ❌ just total effects
  ✅ structural model | ❌ reduced model
  ✅ small and big data | ❌ big data requirement
  ✅ graphical results | ❌ just numerical results
@@ -53,21 +59,21 @@ Causing combines total effects and mediation effects in one single graph that is
 
  The total effects of a variable on the final variable are shown in the corresponding nodes of the graph. The total effects are split up over their outgoing edges, yielding the mediation effects shown on the edges. Just education has more than one outgoing edge to be interpreted in this way.
 
- The effects differ from individual to individual. To emphsize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Imdividual Mediation Effects (IME) graph.
+ The effects differ from individual to individual. To emphasize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Individual Mediation Effects (IME) graph.
 
  ## Software
 
- Causing is a free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
+ Causing is free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
 
- The software is developed by RealRate, an AI rating agency aiming to re-invent the ratings market by using AI, interpretability and avoiding any conflict of interest. See www.realrate.ai.
+ The software is developed by RealRate, an AI rating agency aiming to re-invent the rating market by using AI, interpretability, and avoiding any conflict of interest. See www.realrate.ai.
 
  When starting `python -m causing.examples example` after cloning / downloading the Causing repository you will find the results in the _output_ folder. The results are saved in SVG files. The IME files show the individual mediation effects graphs for the respective individual.
 
  See `causing/examples` for the code generating some examples.
 
- ## Start your own Model
+ ## Start your Model
 
- To start your own model, you have to provide the following information, as done in the example code below:
+ To start your model, you have to provide the following information, as done in the example code below:
 
  - Define all your model variables as SymPy symbols.
  - Note that in Sympy some operators are special, e.g. Max() instead of max().
@@ -88,31 +94,31 @@ Y<sub>2</sub> = X<sub>2</sub> + 2 * Y<sub>1</sub><sup>2</sup>
 
  Y<sub>3</sub> = Y<sub>1</sub> + Y<sub>2</sub>.
 
- This gives the following graphs. Some notes are in order to understand them:
+ This gives the following graphs. Some notes to understand them:
 
- - The data used consist of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
+ - The data used consists of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
 
  - To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.
 
  - Nodes and edges are colored, showing positive (_green_) and negative (_red_) effects they have on the final variable Y<sub>3</sub>.
 
- - Individual effects are based on the given model. For each individual, however its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
+ - Individual effects are based on the given model. For each individual, however, its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
 
- - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note however, that just outgoining edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
+ - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note, however, that just outgoing edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
 
  ![Individual Mediation Effects (IME)](https://github.com/realrate/Causing/raw/develop/images_readme/IME_1.svg)
 
- As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub> and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
+ As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub>, and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
 
  You can find the full source code for this example [here](https://github.com/realrate/Causing/blob/develop/causing/examples/models.py#L16-L45).
 
  ## 2. Application to Education and Wages
 
- To dig a bit deeper, here we have a real world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
+ To dig a bit deeper, here we have a real-world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
 
- This 5 minute introductory video gives a short overview over Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
+ This 5-minute introductory video gives a short overview of Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
 
- See here for a detailed analys of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
+ See here for a detailed analysis of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
 
  ## 3. Application to Insurance Ratings
 
@@ -122,21 +128,21 @@ The Causing approach and its formulas together with an application are given in:
  DOI: 10.13140/RG.2.2.31524.83848
  https://www.researchgate.net/publication/339091133
 
- Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effect, that paper uses closed formulas.
+ Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effects, that paper uses closed formulas.
 
  The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
 
- The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to financial strength ratings of insurance companies.
+ The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.
 
  > **Keywords:** total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating
 
  ## Award
 
- RealRate's AI software _Causing_ is a winner of PyTorch AI Hackathon.
+ RealRate's AI software _Causing_ is a winner of the PyTorch AI Hackathon.
 
  <img src="https://github.com/realrate/Causing/raw/develop/images_readme/RealRate_AI_Software_Winner.png">
 
- We are exited being a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
+ We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
 
  [devpost.com/software/realrate-explainable-ai-for-company-ratings](https://devpost.com/software/realrate-explainable-ai-for-company-ratings "devpost.com/software/realrate-explainable-ai-for-company-ratings").
 
@@ -9,12 +9,12 @@ effects of a given equation system._
  Get a nice colored graph and immediately understand the causal effects between the variables.
 
  **Input:** You simply have to put in a dataset and provide an equation system in form of a
- python function. The endogenous variable on the left-hand side are assumed being caused by
+ python function. The endogenous variables on the left-hand side are assumed to be caused by
  the variables on the right-hand side of the equation. Thus, you provide the causal structure
  in form of a directed acyclic graph (DAG).
 
- **Output:** As an output you will get a colored graph of quantified effects acting between
- the model variables. You are able to immediately interpret mediation chains for every
+ **Output:** As an output, you will get a colored graph of quantified effects acting between
+ the model variables. You can immediately interpret mediation chains for every
  individual observation - even for highly complex nonlinear systems.
 
  Here is a table relating Causing to other approaches:
@@ -22,10 +22,10 @@ Here is a table relating Causing to other approaches:
  Causing is | Causing is NOT
  --- | ---
  ✅ causal model given | ❌ causal search
- ✅ DAG directed acyclic graph | ❌ cyclic, undirected or bidirected graph
+ ✅ DAG directed acyclic graph | ❌ cyclic, undirected, or bidirected graph
  ✅ latent variables | ❌ just observed / manifest variables
  ✅ individual effects | ❌ just average effects
- ✅ direct, total and mediation effects | ❌ just total effects
+ ✅ direct, total, and mediation effects | ❌ just total effects
  ✅ structural model | ❌ reduced model
  ✅ small and big data | ❌ big data requirement
  ✅ graphical results | ❌ just numerical results
@@ -39,21 +39,21 @@ Causing combines total effects and mediation effects in one single graph that is
 
  The total effects of a variable on the final variable are shown in the corresponding nodes of the graph. The total effects are split up over their outgoing edges, yielding the mediation effects shown on the edges. Just education has more than one outgoing edge to be interpreted in this way.
 
- The effects differ from individual to individual. To emphsize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Imdividual Mediation Effects (IME) graph.
+ The effects differ from individual to individual. To emphasize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Individual Mediation Effects (IME) graph.
 
  ## Software
 
- Causing is a free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
+ Causing is free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
 
- The software is developed by RealRate, an AI rating agency aiming to re-invent the ratings market by using AI, interpretability and avoiding any conflict of interest. See www.realrate.ai.
+ The software is developed by RealRate, an AI rating agency aiming to re-invent the rating market by using AI, interpretability, and avoiding any conflict of interest. See www.realrate.ai.
 
  When starting `python -m causing.examples example` after cloning / downloading the Causing repository you will find the results in the _output_ folder. The results are saved in SVG files. The IME files show the individual mediation effects graphs for the respective individual.
 
  See `causing/examples` for the code generating some examples.
 
- ## Start your own Model
+ ## Start your Model
 
- To start your own model, you have to provide the following information, as done in the example code below:
+ To start your model, you have to provide the following information, as done in the example code below:
 
  - Define all your model variables as SymPy symbols.
  - Note that in Sympy some operators are special, e.g. Max() instead of max().
@@ -74,31 +74,31 @@ Y<sub>2</sub> = X<sub>2</sub> + 2 * Y<sub>1</sub><sup>2</sup>
 
  Y<sub>3</sub> = Y<sub>1</sub> + Y<sub>2</sub>.
 
- This gives the following graphs. Some notes are in order to understand them:
+ This gives the following graphs. Some notes to understand them:
 
- - The data used consist of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
+ - The data used consists of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
 
  - To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.
 
  - Nodes and edges are colored, showing positive (_green_) and negative (_red_) effects they have on the final variable Y<sub>3</sub>.
 
- - Individual effects are based on the given model. For each individual, however its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
+ - Individual effects are based on the given model. For each individual, however, its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
 
- - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note however, that just outgoining edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
+ - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note, however, that just outgoing edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
 
  ![Individual Mediation Effects (IME)](https://github.com/realrate/Causing/raw/develop/images_readme/IME_1.svg)
 
- As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub> and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
+ As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub>, and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
 
  You can find the full source code for this example [here](https://github.com/realrate/Causing/blob/develop/causing/examples/models.py#L16-L45).
 
  ## 2. Application to Education and Wages
 
- To dig a bit deeper, here we have a real world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
+ To dig a bit deeper, here we have a real-world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
 
- This 5 minute introductory video gives a short overview over Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
+ This 5-minute introductory video gives a short overview of Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
 
- See here for a detailed analys of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
+ See here for a detailed analysis of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
 
  ## 3. Application to Insurance Ratings
 
@@ -108,21 +108,21 @@ The Causing approach and its formulas together with an application are given in:
  DOI: 10.13140/RG.2.2.31524.83848
  https://www.researchgate.net/publication/339091133
 
- Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effect, that paper uses closed formulas.
+ Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effects, that paper uses closed formulas.
 
  The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
 
- The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to financial strength ratings of insurance companies.
+ The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.
 
  > **Keywords:** total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating
 
  ## Award
 
- RealRate's AI software _Causing_ is a winner of PyTorch AI Hackathon.
+ RealRate's AI software _Causing_ is a winner of the PyTorch AI Hackathon.
 
  <img src="https://github.com/realrate/Causing/raw/develop/images_readme/RealRate_AI_Software_Winner.png">
 
- We are exited being a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
+ We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
 
  [devpost.com/software/realrate-explainable-ai-for-company-ratings](https://devpost.com/software/realrate-explainable-ai-for-company-ratings "devpost.com/software/realrate-explainable-ai-for-company-ratings").
 
@@ -69,8 +69,8 @@ def save_graph(path: Path, graph_dot):
  svg_code = subprocess.check_output(
  [DOT_COMMAND, "-Tsvg"], input=graph_dot, encoding="utf-8"
  )
- if dot_version()[0] < 3:
- svg_code = fix_svg_scale(svg_code)
+ # if dot_version()[0] < 3:
+ # svg_code = fix_svg_scale(svg_code)
  with open(path, "w") as f:
  f.write(svg_code)
 
@@ -149,6 +149,7 @@ def graph_to_dot(
  graph_options_str=GRAPH_OPTIONS_STR,
  in_percent=False,
  min_sig_figures=3,
+ cutoff=0.0001,
  ):
  dot_str = "digraph {" + graph_options_str
  max_val = max(
@@ -157,13 +158,21 @@ def graph_to_dot(
  )
 
  for node, data in g.nodes(data=True):
- eff_str = utils.fmt_min_sig(data["effect"], min_sig_figures, percent=in_percent)
+ eff_str = utils.fmt_min_sig(
+ data["effect"] if abs(data["effect"]) > cutoff else 0,
+ min_sig_figures,
+ percent=in_percent,
+ )
  label = data.get("label", node).replace("\n", r"\n") + r"\n" + eff_str
  col_str = color(data["effect"], max_val, palette=node_palette)
  dot_str += f' "{node}"[label = "{label}" fillcolor="{col_str}"]\n'
 
  for from_node, to_node, data in g.edges(data=True):
- eff_str = utils.fmt_min_sig(data["effect"], min_sig_figures, percent=in_percent)
+ eff_str = utils.fmt_min_sig(
+ data["effect"] if abs(data["effect"]) > cutoff else 0,
+ min_sig_figures,
+ percent=in_percent,
+ )
  col_str = color(data["effect"], max_val, palette=edge_palette)
  penwidth = color(data["effect"], max_val, palette=pen_width_palette)
  dot_str += f' "{from_node}" -> "{to_node}" [label="{eff_str}" color="{col_str}" penwidth="{penwidth}"]\n'
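In 2.4.1, `graph_to_dot()` gains a `cutoff` parameter (default `0.0001`). Before formatting, any node or edge effect whose absolute value does not exceed `cutoff` is replaced by `0`, while `color()` and the pen width still receive the raw `data["effect"]`. The following is a minimal sketch of that labeling rule. It assumes a plain dict of edge effects instead of a networkx graph and uses Python's `.3g` formatting instead of `utils.fmt_min_sig`. `display_effect` and `edge_lines` are hypothetical names.

```python
def display_effect(effect: float, cutoff: float = 0.0001) -> float:
    """Return the value shown in a label: effects with |effect| <= cutoff become 0."""
    return effect if abs(effect) > cutoff else 0


def edge_lines(effects: dict[tuple[str, str], float], cutoff: float = 0.0001) -> list[str]:
    """Build simplified DOT edge statements with cutoff-adjusted labels.

    Only the label text uses the cutoff; any coloring would still be
    based on the raw effect, as in graph_to_dot().
    """
    lines = []
    for (from_node, to_node), effect in effects.items():
        label = f"{display_effect(effect, cutoff):.3g}"
        lines.append(f'"{from_node}" -> "{to_node}" [label="{label}"]')
    return lines


effects = {("X1", "Y1"): 2.5, ("X2", "Y2"): 0.00004, ("Y1", "Y3"): -0.00009}
for line in edge_lines(effects):
    print(line)
```

Here the two tiny effects are labeled `0`. Without the cutoff, `f"{0.00004:.3g}"` would print `4e-05`. The comparison is strict, so an effect of exactly `0.0001` is also shown as `0`.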
@@ -1,4 +1,5 @@
  from __future__ import annotations
+
  from dataclasses import dataclass, field
  from typing import Iterable, Callable
  from functools import cached_property
@@ -8,6 +9,10 @@ import sympy
  import numpy as np
 
 
+ class NumericModelError(Exception):
+ pass
+
+
  @dataclass
  class Model:
 
@@ -42,6 +47,7 @@ class Model:
  self.graph.add_node(var)
  self.trans_graph = networkx.transitive_closure(self.graph, reflexive=True)
 
+ @np.errstate(all="raise")
  def compute(
  self,
  xdat: np.array,
@@ -77,15 +83,36 @@ class Model:
  eq_inputs[:, fixed_from_ind] = fixed_vals
 
  try:
+ # print(f"Comuting variable: {self.yvars[i]}")
+ # yhat[i] = np.array(
+ # [eq(*eq_in, *parameters.values()) for eq_in in eq_inputs],
+ # dtype=np.float64,
+ # )
+ computed_yvars = []
+ for eq_in in eq_inputs:
+ try:
+ computed_yvars.append(eq(*eq_in, *parameters.values()))
+ except FloatingPointError:
+ # Floating Point Error for self.yvars[i]
+ # Adding 0.0 to overcome this.
+ computed_yvars.append(0.0)
+
  yhat[i] = np.array(
- [eq(*eq_in, *parameters.values()) for eq_in in eq_inputs],
+ computed_yvars,
  dtype=np.float64,
  )
  except Exception as e:
- print(
+ # for eq_in in eq_inputs:
+ # print("--", self.yvars[i])
+ # for var, val in zip(
+ # self.vars + list(parameters.keys()),
+ # list(eq_in) + list(parameters.values()),
+ # ):
+ # print(var, "=", val)
+ # eq(*eq_in, *parameters.values())
+ raise NumericModelError(
  f"Failed to compute model value for yvar {self.yvars[i]}: {e}"
- )
- raise
+ ) from e
  assert yhat.shape == (self.ndim, tau)
  return yhat
 
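These hunks change how `Model.compute()` reacts to numerical problems. The method is decorated with `@np.errstate(all="raise")`, so NumPy floating-point warnings (division by zero, overflow, invalid values, underflow) become `FloatingPointError`. Such an error on a single observation is replaced by `0.0`, and any other exception is re-raised as `NumericModelError`, chained to the original via `from e`. The standalone sketch below reproduces only that control flow. `evaluate_equation` is a hypothetical simplification, not the package's method, and the equations are plain Python callables rather than SymPy lambdas.

```python
import numpy as np


class NumericModelError(Exception):
    pass


@np.errstate(all="raise")  # NumPy FP warnings raise FloatingPointError inside this function
def evaluate_equation(eq, eq_inputs, params, yvar="y"):
    """Evaluate `eq` row by row, mirroring the 2.4.1 error handling (hypothetical helper)."""
    try:
        computed = []
        for eq_in in eq_inputs:
            try:
                computed.append(eq(*eq_in, *params))
            except FloatingPointError:
                # Same fallback as in the diff: substitute 0.0 for this observation
                computed.append(0.0)
        return np.array(computed, dtype=np.float64)
    except Exception as e:
        # Any other failure is wrapped, keeping the original exception as __cause__
        raise NumericModelError(
            f"Failed to compute model value for yvar {yvar}: {e}"
        ) from e


inputs = np.array([[2.0], [0.0], [4.0]])
print(evaluate_equation(lambda x: np.divide(1.0, x), inputs, []))
```

The three rows yield 0.5, 0.0, and 0.25. The division by zero in the second row is turned into `0.0` rather than `inf`, so in this sketch a failed observation cannot be told apart from a genuine zero in the returned array.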
@@ -11,7 +11,7 @@ def round_sig(x, sig=2) -> float:
  """Round x to the given number of significant figures"""
  if x == 0 or not np.isfinite(x):
  return x
- return round(x, sig - int(floor(log10(abs(x)))) - 1)
+ return round(x, sig - int(floor(log10(abs(x)))) + 1)
 
 
  def round_sig_recursive(x, sig=2):
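The only code change in `round_sig()` flips the last term of the `ndigits` argument from `- 1` to `+ 1`. For a non-zero finite `x`, `round(x, sig - floor(log10(|x|)) - 1)` keeps `sig` significant figures. The 2.4.1 expression therefore keeps two more (`sig + 2`), which no longer matches the unchanged docstring. The comparison below evaluates both expressions side by side. The two function names are labels for this sketch only, and `math.isfinite` stands in for `np.isfinite`.

```python
from math import floor, isfinite, log10


def round_sig_2_3_0(x: float, sig: int = 2) -> float:
    """Expression from 2.3.0: keeps `sig` significant figures."""
    if x == 0 or not isfinite(x):
        return x
    return round(x, sig - int(floor(log10(abs(x)))) - 1)


def round_sig_2_4_1(x: float, sig: int = 2) -> float:
    """Expression from 2.4.1: the `+ 1` keeps `sig + 2` significant figures."""
    if x == 0 or not isfinite(x):
        return x
    return round(x, sig - int(floor(log10(abs(x)))) + 1)


x = 1234.5678
print(round_sig_2_3_0(x, 2))  # ndigits = 2 - 3 - 1 = -2  -> 1200.0
print(round_sig_2_4_1(x, 2))  # ndigits = 2 - 3 + 1 = 0   -> 1235.0
```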
@@ -37,6 +37,11 @@ def round_sig_recursive(x, sig=2):
 
  class MatrixEncoder(json.JSONEncoder):
  def default(self, obj):
+ # allow serialization of numpy scalars
+ if isinstance(obj, np.integer):
+ return int(obj)
+ if isinstance(obj, np.floating):
+ return float(obj)
  # avoid importing pytorch for isinstance check
  if isinstance(obj, np.ndarray) or type(obj).__name__ == "Tensor":
  return obj.tolist()
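`MatrixEncoder` now converts NumPy integer and floating scalars with `int()` and `float()` before its existing array/`Tensor` branch. This matters because the standard `json` encoder cannot serialize types such as `np.int64` or `np.float32` on its own (`np.float64` already works, since it subclasses Python's `float`). The sketch rebuilds only the branches visible in the hunk. As an assumption, it falls back to `json.JSONEncoder.default` for everything else, since the remainder of the real method is not shown here.

```python
import json

import numpy as np


class MatrixEncoder(json.JSONEncoder):
    def default(self, obj):
        # allow serialization of numpy scalars (the branches added in 2.4.1)
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        # arrays (and, matched by class name, torch tensors) become nested lists
        if isinstance(obj, np.ndarray) or type(obj).__name__ == "Tensor":
            return obj.tolist()
        # assumption: defer to the base class, which raises TypeError
        return super().default(obj)


payload = {"n": np.int64(200), "effect": np.float32(0.5), "matrix": np.array([[1, 2], [3, 4]])}
print(json.dumps(payload, cls=MatrixEncoder))
```

This prints `{"n": 200, "effect": 0.5, "matrix": [[1, 2], [3, 4]]}`, whereas `json.dumps(payload)` without the encoder raises `TypeError` on the `int64` value.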
@@ -61,11 +66,12 @@ def fmt_min_sig(x, min_sig_figures=3, percent=False, percent_spacer=""):
  if not math.isfinite(x):
  return str(x)
  if x == 0:
- return "0"
- if percent:
- x *= 100
- show_dec = max(-math.floor(math.log10(abs(x)) + 1) + min_sig_figures, 0)
- num = locale.format_string("%." + str(show_dec) + "f", x, grouping=True)
+ num = "0"
+ else:
+ if percent:
+ x *= 100
+ show_dec = max(-math.floor(math.log10(abs(x)) + 1) + min_sig_figures, 0)
+ num = locale.format_string("%." + str(show_dec) + "f", x, grouping=True)
  if percent:
  num += percent_spacer + "%"
  return num
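Besides the restructuring, this hunk changes one visible result: with `percent=True`, zero used to be returned as `"0"` without a percent sign, and now falls through to the suffix and becomes `"0%"` (plus any `percent_spacer`). The sketch below reassembles the 2.4.1 body from the lines shown. The start of the function is outside the hunk, so any docstring or earlier statements are omitted. `locale.format_string` follows the active `LC_NUMERIC` locale, so the expected strings assume Python's default `"C"` numeric locale.

```python
import locale
import math


def fmt_min_sig(x, min_sig_figures=3, percent=False, percent_spacer=""):
    """Format x with at least `min_sig_figures` significant figures (2.4.1 logic)."""
    if not math.isfinite(x):
        return str(x)
    if x == 0:
        num = "0"
    else:
        if percent:
            x *= 100
        # number of decimals needed to show at least min_sig_figures digits
        show_dec = max(-math.floor(math.log10(abs(x)) + 1) + min_sig_figures, 0)
        num = locale.format_string("%." + str(show_dec) + "f", x, grouping=True)
    if percent:
        num += percent_spacer + "%"
    return num


print(fmt_min_sig(0, percent=True))       # 0%
print(fmt_min_sig(0.1234, percent=True))  # 12.3%
print(fmt_min_sig(0.5))                   # 0.500
```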
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: causing
- Version: 2.3.0
+ Version: 2.4.1
  Summary: Causing: CAUSal INterpretation using Graphs
  Home-page: https://github.com/realrate/Causing
  Author: Dr. Holger Bartel
@@ -11,6 +11,12 @@ Classifier: Operating System :: OS Independent
  Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE.md
+ Requires-Dist: numpy~=1.23
+ Requires-Dist: pandas~=1.3
+ Requires-Dist: scipy~=1.9
+ Requires-Dist: sympy~=1.5
+ Requires-Dist: networkx~=2.7
+ Requires-Dist: pre-commit
 
  # Causing: CAUSal INterpretation using Graphs
 
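The new `Requires-Dist` lines use the PEP 440 compatible-release operator `~=`. For example, `numpy~=1.23` means `>=1.23, ==1.*`, so pandas 2.x and networkx 3.x fall outside these pins. The check below uses the third-party `packaging` library to evaluate the specifiers as listed. This library is an assumption about the reader's environment, not a Causing dependency, and the candidate versions are chosen only for illustration.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Specifiers copied from the Requires-Dist lines above
requirements = {
    "numpy": "~=1.23",
    "pandas": "~=1.3",
    "scipy": "~=1.9",
    "sympy": "~=1.5",
    "networkx": "~=2.7",
}

# Candidate versions chosen for illustration only
candidates = {"numpy": "1.26.4", "pandas": "2.0.0", "networkx": "3.1"}

for name, version in candidates.items():
    allowed = Version(version) in SpecifierSet(requirements[name])
    print(f"{name} {version}: {'allowed' if allowed else 'excluded'}")
# numpy 1.26.4: allowed
# pandas 2.0.0: excluded
# networkx 3.1: excluded
```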
@@ -23,12 +29,12 @@ effects of a given equation system._
  Get a nice colored graph and immediately understand the causal effects between the variables.
 
  **Input:** You simply have to put in a dataset and provide an equation system in form of a
- python function. The endogenous variable on the left-hand side are assumed being caused by
+ python function. The endogenous variables on the left-hand side are assumed to be caused by
  the variables on the right-hand side of the equation. Thus, you provide the causal structure
  in form of a directed acyclic graph (DAG).
 
- **Output:** As an output you will get a colored graph of quantified effects acting between
- the model variables. You are able to immediately interpret mediation chains for every
+ **Output:** As an output, you will get a colored graph of quantified effects acting between
+ the model variables. You can immediately interpret mediation chains for every
  individual observation - even for highly complex nonlinear systems.
 
  Here is a table relating Causing to other approaches:
@@ -36,10 +42,10 @@ Here is a table relating Causing to other approaches:
  Causing is | Causing is NOT
  --- | ---
  ✅ causal model given | ❌ causal search
- ✅ DAG directed acyclic graph | ❌ cyclic, undirected or bidirected graph
+ ✅ DAG directed acyclic graph | ❌ cyclic, undirected, or bidirected graph
  ✅ latent variables | ❌ just observed / manifest variables
  ✅ individual effects | ❌ just average effects
- ✅ direct, total and mediation effects | ❌ just total effects
+ ✅ direct, total, and mediation effects | ❌ just total effects
  ✅ structural model | ❌ reduced model
  ✅ small and big data | ❌ big data requirement
  ✅ graphical results | ❌ just numerical results
@@ -53,21 +59,21 @@ Causing combines total effects and mediation effects in one single graph that is
 
  The total effects of a variable on the final variable are shown in the corresponding nodes of the graph. The total effects are split up over their outgoing edges, yielding the mediation effects shown on the edges. Just education has more than one outgoing edge to be interpreted in this way.
 
- The effects differ from individual to individual. To emphsize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Imdividual Mediation Effects (IME) graph.
+ The effects differ from individual to individual. To emphasize this, we talk about individual effects. And the corresponding graph, combining total and mediation effects is called the Individual Mediation Effects (IME) graph.
 
  ## Software
 
- Causing is a free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
+ Causing is free software written in _Python 3_. Graphs are generated using _Graphviz_. See dependencies in [setup.py](setup.py). Causing is available under MIT license. See [LICENSE](LICENSE.md "LICENSE").
 
- The software is developed by RealRate, an AI rating agency aiming to re-invent the ratings market by using AI, interpretability and avoiding any conflict of interest. See www.realrate.ai.
+ The software is developed by RealRate, an AI rating agency aiming to re-invent the rating market by using AI, interpretability, and avoiding any conflict of interest. See www.realrate.ai.
 
  When starting `python -m causing.examples example` after cloning / downloading the Causing repository you will find the results in the _output_ folder. The results are saved in SVG files. The IME files show the individual mediation effects graphs for the respective individual.
 
  See `causing/examples` for the code generating some examples.
 
- ## Start your own Model
+ ## Start your Model
 
- To start your own model, you have to provide the following information, as done in the example code below:
+ To start your model, you have to provide the following information, as done in the example code below:
 
  - Define all your model variables as SymPy symbols.
  - Note that in Sympy some operators are special, e.g. Max() instead of max().
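The README note about `Max()` reflects how SymPy works. Python's built-in `max()` has to decide whether one symbol is larger than another, which SymPy refuses to do for symbols without values. `sympy.Max`, by contrast, builds an expression that is evaluated once numbers are substituted. A small illustration with two placeholder symbols (the names are arbitrary):

```python
import sympy as sp

X1, X2 = sp.symbols("X1 X2")

# Symbolic maximum: stays unevaluated until numbers are substituted
y = sp.Max(X1, X2)
print(y.subs({X1: 3, X2: 2}))  # 3

# Turn the expression into a numeric function
f = sp.lambdify((X1, X2), y)
print(f(1.5, 4.0))  # 4.0

# Python's built-in max() needs a definite comparison and fails on symbols
try:
    max(X1, X2)
except TypeError as err:
    print("built-in max() failed:", err)
```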
@@ -88,31 +94,31 @@ Y<sub>2</sub> = X<sub>2</sub> + 2 * Y<sub>1</sub><sup>2</sup>
 
  Y<sub>3</sub> = Y<sub>1</sub> + Y<sub>2</sub>.
 
- This gives the following graphs. Some notes are in order to understand them:
+ This gives the following graphs. Some notes to understand them:
 
- - The data used consist of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
+ - The data used consists of 200 observations. They are available for the x variables X<sub>1</sub> and X<sub>2</sub> with mean(X<sub>1</sub>) = 3 and mean(X<sub>2</sub>) = 2. Variables Y<sub>1</sub> and Y<sub>2</sub> are assumed to be latent / unobserved. Y<sub>3</sub> is assumed to be manifest / observed. Therefore, 200 observations are available for Y<sub>3</sub>.
 
  - To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.
 
  - Nodes and edges are colored, showing positive (_green_) and negative (_red_) effects they have on the final variable Y<sub>3</sub>.
 
- - Individual effects are based on the given model. For each individual, however its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
+ - Individual effects are based on the given model. For each individual, however, its _own_ exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.
 
- - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note however, that just outgoining edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
+ - Total effects are shown below in the nodes and they are split up over the outgoing edges yielding the Mediation effects shown on the edges. Note, however, that just outgoing edges sum up to the node value, incoming edges do not. All effects are effects just on the final variable of interest, assumed here to be Y<sub>3</sub>.
 
  ![Individual Mediation Effects (IME)](https://github.com/realrate/Causing/raw/develop/images_readme/IME_1.svg)
 
- As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub> and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
+ As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X<sub>1</sub> passing through Y<sub>1</sub>, Y<sub>2</sub>, and finally ending in Y<sub>3</sub>. This means that X<sub>1</sub> is the main cause for Y<sub>3</sub> taking on a value above average with its effect on Y<sub>3</sub> being +29.81. However, this positive effect is slightly reduced by X<sub>2</sub>. In total, accounting for all exogenous and endogenous effects, Y<sub>3</sub> is +27.07 above average. You can understand at one glance why Y<sub>3</sub> is above average for individual no. 1.
 
  You can find the full source code for this example [here](https://github.com/realrate/Causing/blob/develop/causing/examples/models.py#L16-L45).
 
  ## 2. Application to Education and Wages
 
- To dig a bit deeper, here we have a real world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
+ To dig a bit deeper, here we have a real-world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.
 
- This 5 minute introductory video gives a short overview over Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
+ This 5-minute introductory video gives a short overview of Causing and includes this real data example: See [Causing Introduction Video](https://youtu.be/GJLsjSZOk2w "Causing_Introduction_Video").
 
- See here for a detailed analys of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
+ See here for a detailed analysis of the Education and Wages example: [An Application of Causing: Education and Wages](docs/education.md).
 
  ## 3. Application to Insurance Ratings
 
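The README notes on total and mediation effects can be reproduced symbolically for the small example system, using the total-derivative definition from the paper the README cites. This illustrates the definitions only. It is not Causing's own code path, since the README states that the current version computes effects numerically. It also does not reproduce the graph values such as +29.81, which are measured relative to the mean. The equation for Y<sub>1</sub> is outside this excerpt, so `Y1 = X1` below is an assumption. Y<sub>2</sub> and Y<sub>3</sub> follow the equations shown in the README.

```python
import sympy as sp

X1, X2, Y1, Y2 = sp.symbols("X1 X2 Y1 Y2")

# Structural equations (Y1 = X1 is assumed; its equation is not in this excerpt)
y1_eq = X1
y2_eq = X2 + 2 * Y1**2
y3_eq = Y1 + Y2

# Total effect of Y1 on the final variable Y3: substitute Y2, then differentiate
total_y1 = sp.diff(y3_eq.subs(Y2, y2_eq), Y1)  # 4*Y1 + 1

# Total effects on Y3 of Y1's direct successors
te_y3 = sp.Integer(1)                    # Y3 on itself
te_y2 = sp.diff(y3_eq, Y2) * te_y3       # Y2 -> Y3, equals 1

# Mediation effects on Y1's two outgoing edges
edge_y1_y3 = sp.diff(y3_eq, Y1) * te_y3  # direct edge, equals 1
edge_y1_y2 = sp.diff(y2_eq, Y1) * te_y2  # via Y2, equals 4*Y1

# Outgoing edges of Y1 sum to its node value (its total effect)
assert sp.simplify(edge_y1_y3 + edge_y1_y2 - total_y1) == 0

# Total effect of X1 on Y3 via the reduced form X1 + X2 + 2*X1**2
y3_reduced = y3_eq.subs(Y2, y2_eq).subs(Y1, y1_eq)
total_x1 = sp.diff(y3_reduced, X1)       # 4*X1 + 1
print(total_x1.subs(X1, 3))              # 13, evaluated at mean(X1) = 3
```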
@@ -122,21 +128,21 @@ The Causing approach and its formulas together with an application are given in:
  DOI: 10.13140/RG.2.2.31524.83848
  https://www.researchgate.net/publication/339091133
 
- Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effect, that paper uses closed formulas.
+ Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effects, that paper uses closed formulas.
 
  The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
 
- The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to financial strength ratings of insurance companies.
+ The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.
 
  > **Keywords:** total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating
 
  ## Award
 
- RealRate's AI software _Causing_ is a winner of PyTorch AI Hackathon.
+ RealRate's AI software _Causing_ is a winner of the PyTorch AI Hackathon.
 
  <img src="https://github.com/realrate/Causing/raw/develop/images_readme/RealRate_AI_Software_Winner.png">
 
- We are exited being a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
+ We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.
 
  [devpost.com/software/realrate-explainable-ai-for-company-ratings](https://devpost.com/software/realrate-explainable-ai-for-company-ratings "devpost.com/software/realrate-explainable-ai-for-company-ratings").
 
@@ -5,7 +5,7 @@ with open("README.md", "r", encoding="utf-8") as fh:
 
  setuptools.setup(
  name="causing",
- version="2.3.0",
+ version="2.4.1",
  author="Dr. Holger Bartel",
  author_email="holger.bartel@realrate.ai",
  description="Causing: CAUSal INterpretation using Graphs",
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes