hammock-plot 0.2__tar.gz → 0.4__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2023 Tiancheng Yang
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,205 @@
1
+ Metadata-Version: 2.4
2
+ Name: hammock_plot
3
+ Version: 0.4
4
+ Summary: Hammock - visualization of categorical or mixed categorical/continuous data
5
+ Home-page: https://github.com/TianchengY/hammock_plot
6
+ Author: Tiancheng Yang
7
+ Author-email: t77yang@uwaterloo.ca
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Intended Audience :: Science/Research
12
+ Requires-Python: >=3.6
13
+ Description-Content-Type: text/markdown
14
+ License-File: LICENSE
15
+ Requires-Dist: matplotlib
16
+ Requires-Dist: numpy
17
+ Requires-Dist: pandas
18
+ Dynamic: author
19
+ Dynamic: author-email
20
+ Dynamic: classifier
21
+ Dynamic: description
22
+ Dynamic: description-content-type
23
+ Dynamic: home-page
24
+ Dynamic: license-file
25
+ Dynamic: requires-dist
26
+ Dynamic: requires-python
27
+ Dynamic: summary
28
+
29
+ # Hammock plot
30
+
31
+
32
+ ## Description
33
+
34
+ The hammock plot draws a graph to visualize categorical or mixed categorical / continuous data.
35
+ Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
36
+ vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
37
+ use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
38
+ to that box (i.e. have the same values/categories for the two variables). The "width" of a box refers to the
39
+ distance between the longer set of parallel lines rather than the vertical distance.
40
+
41
+ If the boxes degenerate to a single line, and no labels or missing values are used the hammock plot
42
+ corresponds to a parallel coordinate plot. Boxes degenerate into a single line if barwidth is so small that
43
+ the boxes for categorical variables appear to be a single line. For continuous variables boxes will usually
44
+ appear to be a single line because each category typically only contains one observation.
45
+
46
+ The order of variables in varlist determines the order of variables in the graph. All variables in varlist
47
+ must be numerical. String variables should be converted to numerical variables first, e.g. using encode or
48
+ destring.
49
+
50
+
51
+
52
+
53
+ ## Getting started
54
+
55
+ You can install hammock from `pip`:
56
+
57
+ ```shell
58
+ pip install hammock_plot
59
+ ```
60
+
61
+
62
+ ### Example: Asthma data
63
+
64
+ We import the asthma dataset:
65
+
66
+ ```python
67
+ import hammock_plot
68
+ import pandas as pd
69
+ df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
70
+ ```
71
+
72
+ Minimal example of a hammock plot:
73
+ ```python
74
+ var = ["hospitalizations","group","gender","comorbidities"]
75
+ hammock = hammock_plot.Hammock(data_df = df)
76
+ ax = hammock.plot(var=var)
77
+ ```
78
+ <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
79
+
80
+ The categories of the child-adolescent-adult variable (`group`) do not appear in the desired order; adult should not be in the middle. We now specify the order explicitly: child-adolescent-adult.
81
+
82
+ ```python
83
+ var = ["hospitalizations","group","gender","comorbidities"]
84
+ group_dict= {1: "child", 2: "adolescent",3: "adult"}
85
+ value_order = {"group": group_dict}
86
+ hammock = hammock_plot.Hammock(data_df = df)
87
+ ax = hammock.plot(var=var, value_order=value_order )
88
+ ```
89
+
90
+ <!--- to restrict image size, I am using an html command, rather than the standard ![](image.png) --->
91
+ <!--- ![Hammock plot ](image/asthma1.png) --->
92
+ <img src="image/asthma_value_order.png" alt="Hammock plot" width="600"/>
93
+
94
+ We highlight observations with comorbidities=0 in red:
95
+
96
+ ```python
97
+ ax = hammock.plot(var=var, value_order=value_order ,hi_var="comorbidities", hi_value=[0], color=["red"])
98
+ ```
99
+
100
+ <!--- ![Hammock plot with highlighting](image/asthma_highlighting.png) --->
101
+ <img src="image/asthma_highlighting.png" alt="Hammock plot with highlighting" width="600"/>
102
+
103
+
104
+ ### Example: Satisfaction scales for the diabetes data
105
+
106
+ We import the diabetes dataset:
107
+
108
+ ```python
109
+ import hammock_plot
110
+ import pandas as pd
111
+ df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
112
+ ```
113
+
114
+ The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
115
+ ```python
116
+ var = ["sataces","satcomm","satrate"]
117
+ hammock = hammock_plot.Hammock(data_df = df)
118
+ ax = hammock.plot(var=var, missing=True)
119
+ ```
120
+
121
+ <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
122
+
123
+ The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
124
+ satisfied respondents simply choose the highest value.
125
+
126
+
127
+
128
+ ## API Reference
129
+
130
+ ```
131
+ hammock()
132
+ ```
133
+
134
+ | Category | Parameter | Type | Description |
135
+ | --- | :-------- | :------- | :------------------------- |
136
+ | General | `var` | `List[str]` | List of variables to display. |
137
+ | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. A specific value order is useful, for example, for ordered variables. The integer values affect spacing: for example, the values 4,5,6 imply equal spacing between 4,5 and 5,6. The values 4,5,7 imply twice as much space between 5,7 as between 4,5. |
138
+ | | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
139
+ | | `label` | `bool` | Whether or not to display labels between the plotting segments |
140
+ | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is none. |
141
+ | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or multiple values. |
142
+ | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical".|
143
+ | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
144
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
145
+ | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue" |
146
+ | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
147
+ | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5 |
148
+ | | `label_options` | `Dict[str, Dict[str, Any]]` | Manipulates the size and look of the labels. Args following the options in the website: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example:{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}} Default is None. |
149
+ | | `height` | `float` | Height of the plot in inches. Default is 10. |
150
+ | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
151
+ | | `min_bar_width` | `float` | Minimal bar width. Bars representing only a tiny fraction of the data may be so narrow that they are invisible in a plot. The default value tries to ensure this does not happen. Default is 0.07. |
152
+ | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
153
+ | | `same_scale` | `List[str]` | List of variables that have the same scale. Default is None. |
154
+ | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is 'True'. |
155
+ | | `save_path` | `str` | If it is not None, the figure will be saved to the given path with given name and format. Default is None. |
156
+
157
+
158
+ ## Historical context
159
+
160
+ In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
161
+
162
+ In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
163
+
164
+ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical continuous data) on parallel axes.
165
+
166
+ In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
167
+
168
+ There are several additional variations that also visualize categorical data including Parallel Set plots (Bendix et al., 2005), Common Angle plots (Hofmann and Vendettuoli, 2013),
169
+ and generalized parallel coordinate plots (GPCPs) (popularized by VanderPlas et al., 2023).
170
+
171
+ ### References
172
+ Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
173
+
174
+ Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE transactions on visualization and computer graphics, 19(12), 2297-2305.
175
+
176
+ Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
177
+
178
+ Rosvall, Martin, & Bergstrom, C.T. (2010) "Mapping change in large networks." PloS one 5.1: e8694.
179
+
180
+ Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. report of
181
+ the committee appointed on the 31st march, 1896, to consider and report to the council
182
+ upon the subject of the definition of a standard or standards of thermal efficiency for
183
+ steam-engines: With an introductory note. In Minutes of proceedings of the institution
184
+ of civil engineers, Volume 134, pp. 278–283.
185
+
186
+ Schonlau M.
187
+ *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
188
+ In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
189
+
190
+ VanderPlas, S., Ge, Y., Unwin, A., & Hofmann, H. (2023).
191
+ Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots.
192
+ Journal of Computational and Graphical Statistics, 1-16. (online first)
193
+
194
+ ### Other implementations of the hammock plot
195
+ There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
196
+
197
+ [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
198
+
199
+
200
+
201
+ ## Authors
202
+
203
+ - Tiancheng Yang t77yang@uwaterloo.ca
204
+
205
+
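The `color` row of the API table in the metadata above describes two accepted spellings for a highlight color: a plain name such as `"red"`, or `"color=alpha"` such as `"red=0.5"`. The following is a minimal sketch of how such a string could be split into a name and an alpha value. The helper `parse_highlight_color` is hypothetical and not part of `hammock_plot`, and the fallback alpha of 1.0 for plain names is an assumption, not documented package behavior.

```python
def parse_highlight_color(spec, default_alpha=1.0):
    """Split a 'name=alpha' string into (name, alpha).

    Hypothetical helper for illustration only; a plain color name
    is assumed to be fully opaque (default_alpha).
    """
    name, sep, alpha_text = spec.partition("=")
    # Only convert the alpha part when an '=' was actually present.
    alpha = float(alpha_text) if sep else default_alpha
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha}")
    return name.strip(), alpha


print(parse_highlight_color("red"))      # ('red', 1.0)
print(parse_highlight_color("red=0.5"))  # ('red', 0.5)
```

Validating the range up front mirrors the table's statement that `alpha` is a decimal between 0 and 1; how the package itself reacts to out-of-range values is not stated in the diff.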
@@ -1,162 +1,177 @@
1
- # Hammock plot
2
-
3
-
4
- ## Description
5
-
6
- hammock draws a graph to visualize categorical data - though it also does fine with continuous data.
7
- Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
8
- vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
9
- use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
10
- to that box (i.e. have the same values/categories for the two variables). The "width" of a box refers to the
11
- distance between the longer set of parallel lines rather than the vertical distance.
12
-
13
- If the boxes degenerate to a single line, and no labels or missing values are used the hammock plot
14
- corresponds to a parallel coordinate plot. Boxes degenerate into a single line if barwidth is so small that
15
- the boxes for categorical variables appear to be a single line. For continuous variables boxes will usually
16
- appear to be a single line because each category typically only contains one observation.
17
-
18
- The order of variables in varlist determines the order of variables in the graph. All variables in varlist
19
- must be numerical. String variables should be converted to numerical variables first, e.g. using encode or
20
- destring.
21
-
22
-
23
-
24
-
25
- ## Getting started
26
-
27
- You can install hammock from `pip`:
28
-
29
- ```shell
30
- pip install hammock_plot
31
- ```
32
-
33
-
34
- ### Example: Asthma data
35
-
36
- We import the diabetes dataset:
37
-
38
- ```python
39
- import hammock_plot
40
- import pandas as pd
41
- df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
42
- ```
43
-
44
- Minimal example of a hammock plot:
45
- ```python
46
- var = ["hospitalizations","group","gender","comorbidities"]
47
- hammock = hammock_plot.Hammock(data_df = df)
48
- ax = hammock.plot(var=var)
49
- ```
50
-
51
- The ordering of the child-adolescent-adult variable is not in the desired order; adult should not be in the middle. We now specify a specific order, child-adolescent-adult.
52
-
53
- ```python
54
- var = ["hospitalizations","group","gender","comorbidities"]
55
- group_dict= {1: "child", 2: "adolescent",3: "adult"}
56
- value_order = {"group": group_dict}
57
- hammock = hammock_plot.Hammock(data_df = df)
58
- ax = hammock.plot(var=var, value_order=value_order )
59
- ```
60
-
61
- We highlight observations with comorbidities=0 in red:
62
-
63
- ```python
64
- ax = hammock.plot(var=var, value_order=value_order ,hi_var="comorbidities", hi_value=[0], color=["red"])
65
- ```
66
-
67
-
68
- ### Example Satisfaction scales for the diabetes data
69
-
70
- We import the diabetes dataset:
71
-
72
- ```python
73
- import hammock_plot
74
- import pandas as pd
75
- df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
76
- ```
77
-
78
- The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
79
- ```python
80
- var = ["sataces","satcomm","satrate"]
81
- hammock = hammock_plot.Hammock(data_df = df)
82
- ax = hammock.plot(var=var, default_color="blue", missing=True)
83
- ```
84
-
85
- The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
86
- satisfied respondents simply choose the highest value.
87
-
88
-
89
-
90
- ## API Reference
91
-
92
- ```
93
- hammock()
94
- ```
95
-
96
- | Category | Parameter | Type | Description |
97
- | --- | :-------- | :------- | :------------------------- |
98
- | General | `var` | `List[str]` | List of variables to display. |
99
- | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. A specific value order is useful, for example, for ordered variables. The integer values affect spacing: for example the values 4,5,6 imply equal spacing between 4,5 and 5,6. The values 4,5,7 implies twice as much space between 5,7 as between 4,5.
100
- | | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
101
- | | `label` | `bool` | Whether or not to display labels between the plotting segments |
102
- | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is none. |
103
- | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlighted one or multiple values. |
104
- | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
105
- | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"] |
106
- | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue" |
107
- | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
108
- | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5 |
109
- | | `label_options` | `Dict[str, Dict[str, Any]]` | Manipulates the size and look of the labels. Args following the options in the website: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example:{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}} Default is None. |
110
- | | `height` | `float` | Height of the plot in inches. Default is 10. |
111
- | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
112
- | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
113
- | | `same_scale` | `List[str]` | List of variables that have the same scale. Default is None. |
114
- | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is 'True'. |
115
- | | `save_path` | `str` | If it is not None, the figure will be saved to the given path with given name and format. Default is None. |
116
-
117
-
118
- ## Historical context
119
-
120
- In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
121
-
122
- In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
123
-
124
- In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical continuous data) on parallel axes.
125
-
126
- In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
127
-
128
- There are several additional variations that also visualize categorical data including Parallel Set plots (Bendix et al, 2005) and Right Angle plots (Hofmann and Vendettuoli, 2013).
129
-
130
- ### References
131
- Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
132
-
133
- Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE transactions on visualization and computer graphics, 19(12), 2297-2305.
134
-
135
- Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
136
-
137
- Rosvall, Martin, & Bergstrom, C.T. (2010) "Mapping change in large networks." PloS one 5.1: e8694.
138
-
139
- Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. report of
140
- the committee appointed on the 31st march, 1896, to consider and report to the council
141
- upon the subject of the definition of a standard or standards of thermal efficiency for
142
- steam-engines: With an introductory note. In Minutes of proceedings of the institution
143
- of civil engineers, Volume 134, pp. 278–283.
144
-
145
- Schonlau M.
146
- *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
147
- In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
148
-
149
-
150
-
151
- ### Other implementations of the hammock plot
152
- There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
153
-
154
- [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
155
-
156
-
157
-
158
- ## Authors
159
-
160
- - Tiancheng Yang t77yang@uwaterloo.ca
161
-
162
-
1
+ # Hammock plot
2
+
3
+
4
+ ## Description
5
+
6
+ The hammock plot draws a graph to visualize categorical or mixed categorical / continuous data.
7
+ Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
8
+ vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
9
+ use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
10
+ to that box (i.e. have the same values/categories for the two variables). The "width" of a box refers to the
11
+ distance between the longer set of parallel lines rather than the vertical distance.
12
+
13
+ If the boxes degenerate to a single line, and no labels or missing values are used the hammock plot
14
+ corresponds to a parallel coordinate plot. Boxes degenerate into a single line if barwidth is so small that
15
+ the boxes for categorical variables appear to be a single line. For continuous variables boxes will usually
16
+ appear to be a single line because each category typically only contains one observation.
17
+
18
+ The order of variables in varlist determines the order of variables in the graph. All variables in varlist
19
+ must be numerical. String variables should be converted to numerical variables first, e.g. using encode or
20
+ destring.
21
+
22
+
23
+
24
+
25
+ ## Getting started
26
+
27
+ You can install hammock from `pip`:
28
+
29
+ ```shell
30
+ pip install hammock_plot
31
+ ```
32
+
33
+
34
+ ### Example: Asthma data
35
+
36
+ We import the asthma dataset:
37
+
38
+ ```python
39
+ import hammock_plot
40
+ import pandas as pd
41
+ df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
42
+ ```
43
+
44
+ Minimal example of a hammock plot:
45
+ ```python
46
+ var = ["hospitalizations","group","gender","comorbidities"]
47
+ hammock = hammock_plot.Hammock(data_df = df)
48
+ ax = hammock.plot(var=var)
49
+ ```
50
+ <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
51
+
52
+ The categories of the child-adolescent-adult variable (`group`) do not appear in the desired order; adult should not be in the middle. We now specify the order explicitly: child-adolescent-adult.
53
+
54
+ ```python
55
+ var = ["hospitalizations","group","gender","comorbidities"]
56
+ group_dict= {1: "child", 2: "adolescent",3: "adult"}
57
+ value_order = {"group": group_dict}
58
+ hammock = hammock_plot.Hammock(data_df = df)
59
+ ax = hammock.plot(var=var, value_order=value_order )
60
+ ```
61
+
62
+ <!--- to restrict image size, I am using an html command, rather than the standard ![](image.png) --->
63
+ <!--- ![Hammock plot ](image/asthma1.png) --->
64
+ <img src="image/asthma_value_order.png" alt="Hammock plot" width="600"/>
65
+
66
+ We highlight observations with comorbidities=0 in red:
67
+
68
+ ```python
69
+ ax = hammock.plot(var=var, value_order=value_order ,hi_var="comorbidities", hi_value=[0], color=["red"])
70
+ ```
71
+
72
+ <!--- ![Hammock plot with highlighting](image/asthma_highlighting.png) --->
73
+ <img src="image/asthma_highlighting.png" alt="Hammock plot with highlighting" width="600"/>
74
+
75
+
76
+ ### Example: Satisfaction scales for the diabetes data
77
+
78
+ We import the diabetes dataset:
79
+
80
+ ```python
81
+ import hammock_plot
82
+ import pandas as pd
83
+ df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
84
+ ```
85
+
86
+ The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
87
+ ```python
88
+ var = ["sataces","satcomm","satrate"]
89
+ hammock = hammock_plot.Hammock(data_df = df)
90
+ ax = hammock.plot(var=var, missing=True)
91
+ ```
92
+
93
+ <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
94
+
95
+ The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
96
+ satisfied respondents simply choose the highest value.
97
+
98
+
99
+
100
+ ## API Reference
101
+
102
+ ```
103
+ hammock()
104
+ ```
105
+
106
+ | Category | Parameter | Type | Description |
107
+ | --- | :-------- | :------- | :------------------------- |
108
+ | General | `var` | `List[str]` | List of variables to display. |
109
+ | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. A specific value order is useful, for example, for ordered variables. The integer values affect spacing: for example, the values 4,5,6 imply equal spacing between 4,5 and 5,6. The values 4,5,7 imply twice as much space between 5,7 as between 4,5. |
110
+ | | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
111
+ | | `label` | `bool` | Whether or not to display labels between the plotting segments |
112
+ | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is none. |
113
+ | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or multiple values. |
114
+ | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical".|
115
+ | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
116
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
117
+ | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue" |
118
+ | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
119
+ | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5 |
120
+ | | `label_options` | `Dict[str, Dict[str, Any]]` | Manipulates the size and look of the labels. Args following the options in the website: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example:{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}} Default is None. |
121
+ | | `height` | `float` | Height of the plot in inches. Default is 10. |
122
+ | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
123
+ | | `min_bar_width` | `float` | Minimal bar width. Bars representing only a tiny fraction of the data may be so narrow that they are invisible in a plot. The default value tries to ensure this does not happen. Default is 0.07. |
124
+ | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
125
+ | | `same_scale` | `List[str]` | List of variables that have the same scale. Default is None. |
126
+ | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is 'True'. |
127
+ | | `save_path` | `str` | If it is not None, the figure will be saved to the given path with given name and format. Default is None. |
128
+
129
+
130
+ ## Historical context
131
+
132
+ In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
133
+
134
+ In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
135
+
136
+ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical continuous data) on parallel axes.
137
+
138
+ In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
139
+
140
+ There are several additional variations that also visualize categorical data including Parallel Set plots (Bendix et al., 2005), Common Angle plots (Hofmann and Vendettuoli, 2013),
141
+ and generalized parallel coordinate plots (GPCPs) (popularized by VanderPlas et al., 2023).
142
+
143
+ ### References
144
+ Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
145
+
146
+ Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE transactions on visualization and computer graphics, 19(12), 2297-2305.
147
+
148
+ Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
149
+
150
+ Rosvall, Martin, & Bergstrom, C.T. (2010) "Mapping change in large networks." PloS one 5.1: e8694.
151
+
152
+ Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. report of
153
+ the committee appointed on the 31st march, 1896, to consider and report to the council
154
+ upon the subject of the definition of a standard or standards of thermal efficiency for
155
+ steam-engines: With an introductory note. In Minutes of proceedings of the institution
156
+ of civil engineers, Volume 134, pp. 278–283.
157
+
158
+ Schonlau M.
159
+ *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
160
+ In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
161
+
162
+ VanderPlas, S., Ge, Y., Unwin, A., & Hofmann, H. (2023).
163
+ Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots.
164
+ Journal of Computational and Graphical Statistics, 1-16. (online first)
165
+
166
+ ### Other implementations of the hammock plot
167
+ There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
168
+
169
+ [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
170
+
171
+
172
+
173
+ ## Authors
174
+
175
+ - Tiancheng Yang t77yang@uwaterloo.ca
176
+
177
+
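The `value_order` row in the README above states that the integer keys affect spacing: 4,5,6 gives equal spacing, while 4,5,7 gives twice as much space between 5 and 7 as between 4 and 5. The arithmetic can be sketched in plain Python, assuming category positions are simply proportional to the integer keys. The helper `relative_gaps` is hypothetical and not part of `hammock_plot`.

```python
def relative_gaps(value_map):
    """Return the gaps between consecutive integer keys, relative to the first gap.

    Hypothetical helper for illustration only; assumes at least two
    distinct integer keys and positions proportional to those keys.
    """
    ordered = sorted(value_map)  # sorting a dict yields its keys
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return [gap / gaps[0] for gap in gaps]


print(relative_gaps({4: "low", 5: "medium", 6: "high"}))  # [1.0, 1.0]
print(relative_gaps({4: "low", 5: "medium", 7: "high"}))  # [1.0, 2.0]
```

With the asthma example's `{1: "child", 2: "adolescent", 3: "adult"}`, the same calculation yields equal gaps, which matches the evenly spaced categories the README describes for that ordering.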
@@ -1,4 +1,4 @@
1
- from .hammock_plot import Hammock
2
-
3
- __author__ = "Tiancheng Yang"
4
- __author_email__ = "t77yang@uwaterloo.ca"
1
+ from .hammock_plot import Hammock
2
+
3
+ __author__ = "Tiancheng Yang"
4
+ __author_email__ = "t77yang@uwaterloo.ca"