hammock-plot 1.0__tar.gz → 1.2.0__tar.gz
- {hammock_plot-1.0 → hammock_plot-1.2.0}/LICENSE +21 -21
- hammock_plot-1.2.0/PKG-INFO +25 -0
- hammock_plot-1.2.0/README.md +326 -0
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot/__init__.py +5 -5
- hammock_plot-1.2.0/hammock_plot/figure.py +562 -0
- hammock_plot-1.2.0/hammock_plot/main.py +477 -0
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot/shapes.py +216 -167
- hammock_plot-1.2.0/hammock_plot/unibar.py +942 -0
- hammock_plot-1.2.0/hammock_plot/utils.py +302 -0
- hammock_plot-1.2.0/hammock_plot/value.py +41 -0
- hammock_plot-1.2.0/hammock_plot.egg-info/PKG-INFO +25 -0
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot.egg-info/SOURCES.txt +1 -2
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot.egg-info/requires.txt +1 -0
- hammock_plot-1.2.0/pyproject.toml +38 -0
- hammock_plot-1.2.0/setup.cfg +4 -0
- hammock_plot-1.0/PKG-INFO +0 -272
- hammock_plot-1.0/README.md +0 -257
- hammock_plot-1.0/hammock_plot/figure.py +0 -536
- hammock_plot-1.0/hammock_plot/main.py +0 -331
- hammock_plot-1.0/hammock_plot/unibar.py +0 -573
- hammock_plot-1.0/hammock_plot/utils.py +0 -140
- hammock_plot-1.0/hammock_plot/value.py +0 -35
- hammock_plot-1.0/hammock_plot.egg-info/PKG-INFO +0 -272
- hammock_plot-1.0/setup.cfg +0 -15
- hammock_plot-1.0/setup.py +0 -26
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot.egg-info/dependency_links.txt +0 -0
- {hammock_plot-1.0 → hammock_plot-1.2.0}/hammock_plot.egg-info/top_level.txt +0 -0
@@ -1,21 +1,21 @@
-MIT License
-
-Copyright (c) 2023 Tiancheng Yang
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+MIT License
+
+Copyright (c) 2023 Tiancheng Yang
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,25 @@
+Metadata-Version: 2.4
+Name: hammock-plot
+Version: 1.2.0
+Summary: Hammock plot visualization for categorical and mixed categorical-continuous data
+Author-email: Tiancheng Yang <t77yang@uwaterloo.ca>, Sandra Huang <sandra.huang@uwaterloo.ca>, Matthias Schonlau <schonlau@uwaterloo.ca>
+License: MIT
+Project-URL: Homepage, https://github.com/TianchengY/hammock_plot
+Project-URL: Repository, https://github.com/TianchengY/hammock_plot
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Science/Research
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Scientific/Engineering :: Visualization
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Requires-Python: >=3.6
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: matplotlib
+Requires-Dist: numpy
+Requires-Dist: pandas
+Requires-Dist: scipy
+Dynamic: license-file
+
+For the current project description, documentation, and examples, please see the GitHub repository: https://github.com/TianchengY/hammock_plot
@@ -0,0 +1,326 @@
+# Hammock plot
+
+## Table of Contents
+
+- [Description](#description)
+- [Gallery](#gallery)
+- [Installation](#installation)
+- [Example: Asthma data](#example-asthma-data)
+- [Example Satisfaction scales for the diabetes data](#example-satisfaction-scales-for-the-diabetes-data)
+- [Example value_order for the Shakespeare data](#example-value_order-for-the-shakespeare-data)
+- [Example same_scale using Shakespeare data](#example-same_scale-using-shakespeare-data)
+- [Example display_type using penguin data](#example-display_type-using-penguin-data)
+- [API Reference](#api-reference)
+- [Historical context](#historical-context)
+- [References](#references)
+- [Other implementations of the hammock plot](#other-implementations-of-the-hammock-plot)
+- [Authors](#authors)
+
+
+## Description
+
+The hammock plot visualizes categorical or mixed categorical and numerical data. The hammock plot uses parallel coordinates, which means that the variable axes are parallel to one another. Categories within a variable are spread out along a vertical line. Categories of adjacent variables are connected by boxes (rectangles or parallelograms). The width of a box is proportional to the number of observations that the box represents (i.e., the observations that have the same values/categories for the two variables). The "width" of a box refers to the distance between the longer set of parallel lines rather than the vertical distance.
+
+If boxes are very thin (e.g., they represent just one observation), they look like lines. In that case, if no labels or missing values are used, the hammock plot corresponds to a parallel coordinate plot.
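The quantity behind a box width can be illustrated with a small standalone sketch. This is not code from `hammock_plot`; the toy records and the helper `pair_counts` are hypothetical. It only counts how many observations share each pair of categories for two adjacent variables, which is the number a box width is proportional to.

```python
from collections import Counter

# Hypothetical toy records; the column names are illustrative only.
records = [
    {"group": "child", "gender": "female"},
    {"group": "child", "gender": "male"},
    {"group": "adult", "gender": "female"},
    {"group": "child", "gender": "female"},
]


def pair_counts(rows, left, right):
    """Count observations for each (left category, right category) pair."""
    return Counter((row[left], row[right]) for row in rows)


counts = pair_counts(records, "group", "gender")
total = sum(counts.values())

# Relative width of each connecting box: its share of all observations.
relative_widths = {pair: n / total for pair, n in counts.items()}

for pair, share in sorted(relative_widths.items()):
    print(pair, round(share, 2))
```

A pair that occurs only once yields the thinnest box, which is why a data set with one observation per pair degenerates into lines.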
+
+## Gallery
+Click any image to explore the whole gallery or use the direct link: [Gallery](https://tianchengy.github.io/hammock_plot/gallery).
+
+<p>
+<a href="https://tianchengy.github.io/hammock_plot/gallery">
+<img src="image/gallery/penguins_numerical_no_highlight.png" alt="Penguins numerical gallery example" width="180"/>
+</a>
+<a href="https://tianchengy.github.io/hammock_plot/gallery">
+<img src="image/gallery/shakespeare_box.png" alt="Shakespeare box gallery example" width="180"/>
+</a>
+<a href="https://tianchengy.github.io/hammock_plot/gallery">
+<img src="image/gallery/penguins_mixed_displays.png" alt="Penguins mixed displays gallery example" width="180"/>
+</a>
+<a href="https://tianchengy.github.io/hammock_plot/gallery">
+<img src="image/gallery/penguins_highlight.png" alt="Penguins highlight gallery example" width="180"/>
+</a>
+</p>
+
+
+## Installation
+The hammock plot package is accessible for use from [this website](https://hammock-plot.streamlit.app/).
+
+You can also install hammock from `pip`:
+
+```shell
+pip install hammock_plot
+```
+
+
+### Example: Asthma data
+
+We import the asthma dataset (Schonlau 2024):
+
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_asthma.csv')
+```
+
+Minimal example of a hammock plot:
+```python
+var = ["hospitalizations","group","gender","comorbidities"]
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var)
+```
+
+<img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
+
+
+The labels for the numerical variables aren't as desired; we would like the labels directly drawn on the data. For our numerical variables, we ignore the level management and instead label each value or level that occurs in the variable.
+
+```python
+numeric_levels = {"comorbidities": None, "hospitalizations": None}
+ax = hammock.plot(var=var, numerical_var_levels=numeric_levels)
+```
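As a rough illustration of the two labeling modes (not the package's implementation), the hypothetical helper `axis_labels` below returns evenly spaced labels between the minimum and maximum when given a count, and one label per distinct observed value when given `None`. The sample values are made up; the default of 7 mirrors the documented default of `numerical_var_levels`.

```python
def axis_labels(values, levels=7):
    """Return label positions for a numerical axis.

    levels=None: one label per distinct observed value.
    levels=int:  that many evenly spaced labels from min to max.
    """
    if levels is None:
        return sorted(set(values))
    lo, hi = min(values), max(values)
    if levels < 2 or lo == hi:
        return [lo]
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]


hospitalizations = [0, 0, 1, 3, 6, 1, 0]  # hypothetical values

per_value = axis_labels(hospitalizations, levels=None)
evenly_spaced = axis_labels(hospitalizations, levels=4)

print(per_value)      # labels sit exactly on the observed values
print(evenly_spaced)  # labels may fall on values that never occur
```

With few distinct values, per-value labels land on the data; with many distinct values they would overplot, which is the case evenly spaced labels are meant for.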
+
+<img src="image/asthma_levels.png" alt="Hammock plot" width="600"/>
+
+The categories of the child-adolescent-adult variable are not in the desired order; adult should not be in the middle. We now specify the order explicitly: child, adolescent, adult.
+
+```python
+group_order = ["child", "adolescent", "adult"]
+value_order = {"group": group_order}
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var, value_order=value_order, numerical_var_levels=numeric_levels)
+```
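The effect of an explicit order can be sketched without the package. The helper `order_categories` below is hypothetical, and two things are assumptions rather than documented behavior: that the fallback order is alphabetical, and that observed categories missing from the preferred list are appended at the end.

```python
def order_categories(observed, preferred=None):
    """Order category labels for one axis.

    Labels in `preferred` come first, in the given order; any other
    observed labels follow alphabetically (an assumption of this sketch).
    """
    observed = set(observed)
    if preferred is None:
        return sorted(observed)
    head = [c for c in preferred if c in observed]
    tail = sorted(observed - set(preferred))
    return head + tail


group = ["adult", "child", "adolescent", "child"]  # hypothetical column

default_order = order_categories(group)
explicit_order = order_categories(group, ["child", "adolescent", "adult"])

print(default_order)   # alphabetical fallback puts "adult" in the middle
print(explicit_order)  # the order supplied by the caller
```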
+
+<!--- to restrict image size, I am using an HTML tag rather than the standard Markdown image syntax --->
+<!---  --->
+<img src="image/asthma_value_order.png" alt="Hammock plot" width="600"/>
+
+We highlight observations with comorbidities=0:
+
+```python
+ax = hammock.plot(var=var, hi_var="comorbidities", hi_value=[0], colors=["orange"], numerical_var_levels=numeric_levels)
+```
+
+<!---  --->
+<img src="image/asthma_highlighting.png" alt="Hammock plot with highlighting" width="600"/>
+
+
+### Example Satisfaction scales for the diabetes data
+
+We import the diabetes dataset:
+
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_diabetes.csv')
+```
+
+The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
+```python
+var = ["sataces","satcomm","satrate"]
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var,
+                  missing=True,
+                  numerical_var_levels={"sataces": None, "satcomm": None, "satrate": None},
+                  min_bar_height_unibar=0.2,
+                  uni_vfill=0.3)
+```
+
+<img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
+
+The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
+satisfied respondents simply choose the highest value.
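The two ways of handling missing values that the `missing` option chooses between can be sketched with plain Python. The rows and both helpers are hypothetical and independent of the package: `missing_counts` gives what a missing-value category would display per variable, and `drop_incomplete` mimics removing every observation that has any missing value.

```python
# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"sataces": 5, "satcomm": 4, "satrate": 5},
    {"sataces": None, "satcomm": 3, "satrate": 4},
    {"sataces": 2, "satcomm": None, "satrate": 5},
]


def missing_counts(rows, variables):
    """Number of missing answers per variable."""
    return {v: sum(1 for r in rows if r[v] is None) for v in variables}


def drop_incomplete(rows):
    """Keep only rows without any missing value (listwise deletion)."""
    return [r for r in rows if all(value is not None for value in r.values())]


per_variable = missing_counts(rows, ["sataces", "satcomm", "satrate"])
complete = drop_incomplete(rows)

print(per_variable)
print(len(complete), "of", len(rows), "rows remain after listwise deletion")
```

Listwise deletion can discard a large share of the data when several variables each have a few gaps, which is one reason to inspect the missing category first.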
+
+### Example value_order for the Shakespeare data
+
+We import the Shakespeare dataset (Schonlau, 2024):
+
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_shakespeare_v5.csv')
+```
+
+We use a dictionary to map the values of the variables `speaker1` and `speaker2` according to the social class hierarchy. We also choose different colors.
+```python
+var_lst = ["type","speaker1","speaker2","sex1"]
+color_lst = ["#fdc086", "#386cb0", "#7fc97f"]
+hi_value = ["Beggars","Citizens","Gentry"]
+
+speaker_order=["Royalty", "Nobility", "Gentry", "Citizens", "Yeomanry", "Beggars"]
+
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,
+                  uni_vfill=0.6,
+                  connector_fraction=0.1,
+                  hi_var="speaker1", hi_value=hi_value, colors=color_lst,
+                  missing=True,
+                  value_order={"speaker1": speaker_order, "speaker2": speaker_order})
+```
+
+<img src="image/shakespeare_order.png" alt="Hammock plot for the Shakespeare data, with value_order specified" width="600"/>
+
+### Example same_scale using Shakespeare data
+`speaker1` and `speaker2` should have the same layout, but only `speaker1` has the category Beggars. We can force the same layout using `same_scale`.
+```python
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,
+                  uni_vfill=0.6,
+                  connector_fraction=0.1,
+                  hi_var="speaker1", hi_value=hi_value, colors=color_lst,
+                  missing=True,
+                  value_order={"speaker1": speaker_order},
+                  same_scale=["speaker1", "speaker2"])
+```
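One way to picture a shared layout is as the union of the categories of all variables on the same scale. The sketch below is an assumption about the idea, not the package's algorithm; `shared_categories` and the toy columns are hypothetical.

```python
def shared_categories(columns, order=None):
    """Union of categories over several variables, honoring an optional order."""
    seen = set()
    for values in columns.values():
        seen.update(values)
    if order is None:
        return sorted(seen)
    # Ordered labels first; unlisted labels follow alphabetically.
    return [c for c in order if c in seen] + sorted(seen - set(order))


# Hypothetical excerpts: only speaker1 contains "Beggars".
columns = {
    "speaker1": ["Royalty", "Beggars", "Gentry"],
    "speaker2": ["Royalty", "Gentry", "Citizens"],
}
speaker_order = ["Royalty", "Nobility", "Gentry", "Citizens", "Yeomanry", "Beggars"]

layout = shared_categories(columns, speaker_order)

# Both variables would be drawn on this one layout, so "Beggars" keeps a
# slot on speaker2 even though no speaker2 observation has that value.
print(layout)
```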
+<img src="image/shakespeare_scale.png" alt="Hammock plot for the Shakespeare data, with same_scale specified" width="600"/>
+
+### Example display_type using penguin data
+
+We import the penguin dataset (Horst et al., 2020):
+
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_penguins.csv')
+```
+
+We use `display_type` to control how we want to display our data.
+
+#### Numerical display types
+Numerical data have three display options: "box", "rug", and "violin".
+```python
+hammock = hammock_plot.Hammock(df)
+ax = hammock.plot(
+    var=["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+    uni_vfill=0.7,
+    connector_fraction=0.1,
+    hi_var="island",
+    hi_value=["Torgersen"],
+    missing=True,
+    display_type={"bill_length_mm":"box", "bill_depth_mm": "rug", "flipper_length_mm": "violin", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_numerical.png" alt="Hammock plot for the penguin data, demonstrating display_type for numerical data" width="600"/>
+
+There is some overlap among the boxes in the lumpy rug plot. This could be reduced by setting `uni_vfill` lower (this would also affect the categorical variables). Box plots support multiple highlighted values. Violin plots support only one highlighted value (highlighted value vs. remainder).
+```python
+ax = hammock.plot(
+    var=["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+    uni_vfill=0.99,
+    connector_fraction=0.1,
+    hi_var="island",
+    hi_value=["Torgersen", "Biscoe"],
+    missing=True,
+    display_type={"bill_length_mm":"box", "bill_depth_mm": "box", "flipper_length_mm": "box", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_types_mult_highlight.png" alt="Hammock plot for the penguin data, demonstrating display_type with multiple highlighting" width="600"/>
+
+#### Categorical display types
+Categorical data have two display options: "stacked_bar" and "bar" (horizontal bar chart). Default is "stacked_bar".
+
+For horizontal bar charts, set `uni_vfill` to a higher value for better visuals. When `uni_vfill` is high, lower the `connector_fraction`.
+```python
+hammock = hammock_plot.Hammock(df)
+ax = hammock.plot(
+    var=["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+    uni_vfill=0.7,
+    connector_fraction=0.1,
+    hi_var="island",
+    hi_value=["Torgersen", "Biscoe"],
+    missing=True,
+    display_type={"species": "bar", "island": "bar", "bill_length_mm":"box", "bill_depth_mm": "box", "flipper_length_mm": "box", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_horizontal_barchart.png" alt="Hammock plot for the penguin data, demonstrating display_type for categorical data" width="600"/>
+
+## API Reference
+
+```
+hammock()
+```
+
+| Category | Parameter | Type | Description |
+| --- | :-------- | :------- | :------------------------- |
+| General | `var` | `List[str]` | List of variables to display. The order determines the variable order in the graph. |
+| | `value_order` | `Dict[str, List[int]]` | If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. Only applicable to categorical variables. If a `value_order` is given for a numerical variable, it will behave like categorical data instead. |
+| | `display_type` | `Dict[str, str]` | Specifies the type of plot. "rug", "box", and "violin" are the options for numerical data, and "stacked_bar" and "bar" are the options for categorical data. Example: {"NumericalVarname": "rug", "NumericalVarname2": "violin", "NumericalVarname3": "box"}. Default is "rug" for numerical data and "stacked_bar" for categorical data. |
+| | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If `False`, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is `False`. |
+| | `weights` | `str` | Weight variable. Must be a numeric variable with only positive, nonmissing values and cannot be a member of `var`. |
+| Labeling | `label` | `bool` | Whether or not to display labels between the plotting segments. |
+| | `label_options` | `Dict[str, Dict[str, Any]]` | Manipulates the size and look of the labels. The arguments follow the options documented at https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example: {"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}} Default is `None`. |
+| | `numerical_var_levels` | `Dict[str, int \| None]` | Specifies the number of (evenly spaced) labels on the axis for numerical variables. `None` means that the label management is ignored and instead each numeric value gets a label (beware of overplotting). Example: {"NumericalVarname": 9, "NumericalVarname2": None}. Default is 7. |
+| Highlighting and Color | `hi_var` | `str` | Variable to be highlighted. Default is `None`. |
+| | `hi_value` | `List[str or int] or str or int` | Value(s) of `hi_var` to be highlighted. You can highlight one or multiple values. You can also pass an expression as a string (e.g. "x>1 and (x>5 or x<4)") when you want to specify a range for a numeric `hi_var`. |
+| | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "side-by-side" for side-by-side color segments or "stacked" for horizontally split color segments. Default is "side-by-side". |
+| | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
+| | `colors` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. |
+| | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. |
+| | `connector_color` | `str` | The color of the connectors. Default matches the default color + highlight colors. Specifying a connector color removes highlighting from the connectors. |
+| | `alpha` | `float` | Alpha value for the colours in the plot. Float from 0-1. Default is 0.7. |
+| Manipulating Spacing and Layout | `unibar`| `bool` | Whether or not to display unibars between the plotting segments. |
+| | `uni_vfill` | `float` | Fraction of vertical space that should be populated by data. Adjusts the height of the data points. Default is 0.08. |
+| | `connector_fraction` | `float` | Fraction of the `uni_vfill` height used for drawing connectors between unibars. Controls how tall the connectors are relative to the bar height. Default is 1. |
+| | `uni_hfill` | `float` | Fraction of horizontal space allocated to labels/univariate bars rather than to connecting boxes. Default is 0.3. |
+| | `height` | `float` | Height of the plot in inches. Default is 10. |
+| | `width` | `float` | Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
+| Other options | `shape` | `str` | Shape of the boxes. "rectangle" or "parallelogram". Default is "rectangle". |
+| | `same_scale` | `List[str]` | List of variables that have the same scale. Default is `None`. |
+| | `min_bar_height_unibar` | `float` | Minimal drawn height of a unibar. Bars representing only a tiny fraction of the data may be so narrow that they are invisible in a plot; this sets an absolute floor on their thickness. The default value tries to ensure this does not happen. With `hi_box="stacked"`, each colour segment within a unibar is also kept at least this tall (by trading height with the larger segments, so the bar height and layout are unchanged), keeping a colour visible even when it is a tiny share of the bar. Default is 0.15 (0.15% of the entire plot height). |
+| | `min_bar_height_connectors` | `float` | Minimal drawn thickness of a connector (independent of `connector_fraction`). Like `min_bar_height_unibar` but for the connectors between unibars. Default is 0.12 (0.12% of the entire plot height). |
+| | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is `True`. |
+| | `save_path` | `str` | If it is not `None`, the figure will be saved to the given path with given name and format. Default is `None`. |
+| | `violin_bw_method` | `str` or `float` | Specifies the bw method used to plot a violin plot. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.violinplot.html for more details. |
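The `"color=alpha"` format described for `colors` can be illustrated with a small parser. This is a sketch of the format only, not the package's own parsing code; `parse_color` is a hypothetical helper, and letting plain color names fall back to the documented `alpha` default of 0.7 is an assumption.

```python
def parse_color(spec, default_alpha=0.7):
    """Split "name=alpha" into (name, alpha).

    Plain names such as "red" or "#fdc086" get default_alpha
    (an assumption of this sketch).
    """
    name, sep, alpha_text = spec.partition("=")
    name = name.strip()
    if not sep:
        return name, default_alpha
    alpha = float(alpha_text)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha}")
    return name, alpha


parsed = [parse_color(c) for c in ["red=0.5", "yellow", "#fdc086"]]
print(parsed)
```

Rejecting values outside the 0 to 1 range early gives a clearer error than a failure deep inside the drawing code.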
+
+
+## Historical context
+
+In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
+
+In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
+
+In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical and numerical data) on parallel axes.
+
+In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
+
+There are several additional variations that also visualize categorical data, including Parallel Set plots (Bendix et al., 2005), Common Angle plots (Hofmann and Vendettuoli, 2013),
+and generalized parallel coordinate plots (GPCPs) (popularized by VanderPlas et al., 2023).
+
+### References
+Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
+
+Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE transactions on visualization and computer graphics, 19(12), 2297-2305.
+
+Horst, A. M., Hill, A. P., & Gorman, K. B. (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data (Version 0.1.0) [R package]. https://doi.org/10.5281/zenodo.3960218
+
+Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
+
+Rosvall, Martin, & Bergstrom, C.T. (2010) "Mapping change in large networks." PloS one 5.1: e8694.
+
+Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. report of
+the committee appointed on the 31st march, 1896, to consider and report to the council
+upon the subject of the definition of a standard or standards of thermal efficiency for
+steam-engines: With an introductory note. In Minutes of proceedings of the institution
+of civil engineers, Volume 134, 278–283.
+
+Schonlau M. Hammock plots: visualizing categorical and numerical variables. Journal of Computational and Graphical Statistics, November 2024. 33(4), 1475-1487.
+
+Schonlau M.
+*[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
+In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
+
+VanderPlas, S., Ge, Y., Unwin, A., & Hofmann, H. (2023).
+Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots.
+Journal of Computational and Graphical Statistics, 32(4), 1572-1587.
+
+### Other implementations of the hammock plot
+There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation (without numerical variables) as part of the package `ggparallel`.
+
+[MIT License](https://choosealicense.com/licenses/mit/)
+
+
+
+## Authors
+
+- Tiancheng Yang t77yang@uwaterloo.ca
+- Sandra Huang sandra.huang@uwaterloo.ca
+- Matthias Schonlau schonlau@uwaterloo.ca
@@ -1,5 +1,5 @@
-from .main import Hammock
-
-__author__ = "Tiancheng Yang"
-__author_email__ = "t77yang@uwaterloo.ca"
-__all__ = ["Hammock"]
+from .main import Hammock
+
+__author__ = "Tiancheng Yang"
+__author_email__ = "t77yang@uwaterloo.ca"
+__all__ = ["Hammock"]