hammock-plot 0.3__tar.gz → 0.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 Tiancheng Yang
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -1,6 +1,6 @@
- Metadata-Version: 2.1
- Name: hammock-plot
- Version: 0.3
+ Metadata-Version: 2.4
+ Name: hammock_plot
+ Version: 0.4
  Summary: Hammock - visualization of categorical or mixed categorical/continuous data
  Home-page: https://github.com/TianchengY/hammock_plot
  Author: Tiancheng Yang
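The `Name` field changes from `hammock-plot` to `hammock_plot` in this release. Under the project-name normalization rule from PEP 503 (any run of `-`, `_`, or `.` becomes a single `-`, and the result is lowercased), both spellings map to the same PyPI project, so installing either name should fetch the same package. A minimal sketch of that rule; the function name `normalize` is ours for illustration and is not part of hammock_plot:

```python
import re


def normalize(name: str) -> str:
    """Normalize a distribution name following the PEP 503 rule."""
    # Collapse every run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("hammock-plot"))  # hammock-plot
print(normalize("hammock_plot"))  # hammock-plot
```

The importable module name is unaffected either way; the README examples keep using `hammock_plot.Hammock`.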
@@ -11,16 +11,27 @@ Classifier: Operating System :: OS Independent
  Classifier: Intended Audience :: Science/Research
  Requires-Python: >=3.6
  Description-Content-Type: text/markdown
+ License-File: LICENSE
  Requires-Dist: matplotlib
  Requires-Dist: numpy
  Requires-Dist: pandas
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
 
  # Hammock plot
 
 
  ## Description
 
- hammock draws a graph to visualize categorical data - though it also does fine with continuous data.
+ The hammock plot draws a graph to visualize categorical or mixed categorical / continuous data.
  Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
  vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
  use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
@@ -64,6 +75,7 @@ var = ["hospitalizations","group","gender","comorbidities"]
  hammock = hammock_plot.Hammock(data_df=df)
  ax = hammock.plot(var=var)
  ```
+ <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
 
  The ordering of the child-adolescent-adult variable is not in the desired order; adult should not be in the middle. We now specify the order explicitly: child, adolescent, adult.
 
@@ -75,12 +87,19 @@ hammock = hammock_plot.Hammock(data_df = df)
  ax = hammock.plot(var=var, value_order=value_order)
  ```
 
+ <!--- to restrict image size, I am using an HTML command rather than the standard ![](image.png) --->
+ <!--- ![Hammock plot ](image/asthma1.png) --->
+ <img src="image/asthma_value_order.png" alt="Hammock plot" width="600"/>
+
  We highlight observations with comorbidities=0 in red:
 
  ```python
  ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
  ```
 
+ <!--- ![Hammock plot with highlighting](image/asthma_highlighting.png) --->
+ <img src="image/asthma_highlighting.png" alt="Hammock plot with highlighting" width="600"/>
+
 
  ### Example Satisfaction scales for the diabetes data
 
@@ -96,9 +115,11 @@ The three variables represent different ordinal scales for satisfaction. We are
  ```python
  var = ["sataces","satcomm","satrate"]
  hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, default_color="blue", missing=True)
+ ax = hammock.plot(var=var, missing=True)
  ```
 
+ <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
+
  The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
  satisfied respondents simply choose the highest value.
 
@@ -118,14 +139,16 @@ satisfied respondents simply choose the highest value.
  | | `label` | `bool` | Whether or not to display labels between the plotting segments. |
  | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is None. |
  | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or multiple values. |
+ | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical". |
  | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
- | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"] |
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
  | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue". |
  | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
  | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
  | | `label_options` | `Dict[str, Dict[str, Any]]` | Per-variable options for the size and look of the labels, using the keyword arguments documented for `matplotlib.pyplot.text` (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: `{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}`. Default is None. |
  | | `height` | `float` | Height of the plot in inches. Default is 10. |
  | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
+ | | `min_bar_width` | `float` | Minimum bar width. Bars representing only a tiny fraction of the data may be so narrow that they are invisible in a plot; the default value is meant to prevent this. Default is 0.07. |
  | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
  | | `same_scale` | `List[str]` | List of variables that have the same scale. Default is None. |
  | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is True. |
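The new `color` entry accepts either a plain color name or a `"color=alpha"` string. The sketch below shows one way such a spec could be split into a name and an alpha value. `parse_color_spec` is a hypothetical helper, not the package's internal code, and treating a plain name as fully opaque (alpha 1.0) is an assumption.

```python
from typing import Tuple


def parse_color_spec(spec: str, default_alpha: float = 1.0) -> Tuple[str, float]:
    """Split a highlight color spec such as "red" or "red=0.5" into (name, alpha).

    Hypothetical helper for illustration only; the default alpha used for a
    plain color name is an assumption, not documented package behavior.
    """
    name, sep, alpha_text = spec.partition("=")
    name = name.strip()
    if not sep:
        # Plain color name such as "red": no transparency requested.
        return name, default_alpha
    alpha = float(alpha_text)
    # The documented range for alpha is a decimal between 0 and 1.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha} in {spec!r}")
    return name, alpha


colors = ["red=0.5", "yellow"]
print([parse_color_spec(c) for c in colors])  # [('red', 0.5), ('yellow', 1.0)]
```

With the package itself, the documented form is passed straight through the `color` argument, e.g. `hammock.plot(var=var, hi_var="comorbidities", hi_value=[0], color=["red=0.5"])`.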
@@ -142,7 +165,8 @@ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualiz
 
  In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
 
- There are several additional variations that also visualize categorical data including Parallel Set plots (Bendix et al, 2005) and Right Angle plots (Hofmann and Vendettuoli, 2013).
+ There are several additional variations that also visualize categorical data, including Parallel Set plots (Bendix et al., 2005), Right Angle plots (Hofmann and Vendettuoli, 2013),
+ and generalized parallel coordinate plots (GPCPs; popularized by VanderPlas et al., 2023).
 
  ### References
  Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
@@ -163,7 +187,9 @@ Schonlau M.
  *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
  In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
 
-
+ VanderPlas, S., Ge, Y., Unwin, A., & Hofmann, H. (2023).
+ Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots.
+ Journal of Computational and Graphical Statistics, 1-16. (online first)
 
  ### Other implementations of the hammock plot
  There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
@@ -1,26 +1,9 @@
- Metadata-Version: 2.1
- Name: hammock_plot
- Version: 0.3
- Summary: Hammock - visualization of categorical or mixed categorical/continuous data
- Home-page: https://github.com/TianchengY/hammock_plot
- Author: Tiancheng Yang
- Author-email: t77yang@uwaterloo.ca
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Intended Audience :: Science/Research
- Requires-Python: >=3.6
- Description-Content-Type: text/markdown
- Requires-Dist: matplotlib
- Requires-Dist: numpy
- Requires-Dist: pandas
-
  # Hammock plot
 
 
  ## Description
 
- hammock draws a graph to visualize categorical data - though it also does fine with continuous data.
+ The hammock plot draws a graph to visualize categorical or mixed categorical / continuous data.
  Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
  vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
  use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
@@ -64,6 +47,7 @@ var = ["hospitalizations","group","gender","comorbidities"]
  hammock = hammock_plot.Hammock(data_df=df)
  ax = hammock.plot(var=var)
  ```
+ <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
 
  The ordering of the child-adolescent-adult variable is not in the desired order; adult should not be in the middle. We now specify the order explicitly: child, adolescent, adult.
 
@@ -75,12 +59,19 @@ hammock = hammock_plot.Hammock(data_df = df)
  ax = hammock.plot(var=var, value_order=value_order)
  ```
 
+ <!--- to restrict image size, I am using an HTML command rather than the standard ![](image.png) --->
+ <!--- ![Hammock plot ](image/asthma1.png) --->
+ <img src="image/asthma_value_order.png" alt="Hammock plot" width="600"/>
+
  We highlight observations with comorbidities=0 in red:
 
  ```python
  ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
  ```
 
+ <!--- ![Hammock plot with highlighting](image/asthma_highlighting.png) --->
+ <img src="image/asthma_highlighting.png" alt="Hammock plot with highlighting" width="600"/>
+
 
  ### Example Satisfaction scales for the diabetes data
 
@@ -96,9 +87,11 @@ The three variables represent different ordinal scales for satisfaction. We are
  ```python
  var = ["sataces","satcomm","satrate"]
  hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, default_color="blue", missing=True)
+ ax = hammock.plot(var=var, missing=True)
  ```
 
+ <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
+
  The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
  satisfied respondents simply choose the highest value.
 
@@ -118,14 +111,16 @@ satisfied respondents simply choose the highest value.
  | | `label` | `bool` | Whether or not to display labels between the plotting segments. |
  | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is None. |
  | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or multiple values. |
+ | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical". |
  | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
- | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"] |
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
  | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue". |
  | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
  | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
  | | `label_options` | `Dict[str, Dict[str, Any]]` | Per-variable options for the size and look of the labels, using the keyword arguments documented for `matplotlib.pyplot.text` (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: `{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}`. Default is None. |
  | | `height` | `float` | Height of the plot in inches. Default is 10. |
  | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
+ | | `min_bar_width` | `float` | Minimum bar width. Bars representing only a tiny fraction of the data may be so narrow that they are invisible in a plot; the default value is meant to prevent this. Default is 0.07. |
  | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
  | | `same_scale` | `List[str]` | List of variables that have the same scale. Default is None. |
  | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is True. |
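The description says a box's width is proportional to the number of observations linking two categories of adjacent variables; `bar_width` scales that width and `min_bar_width` (default 0.07) keeps very small boxes visible. A minimal sketch of that arithmetic on toy data, assuming each width is the fraction of all observations times a hypothetical `base_width`, then floored at `min_bar_width`. The function `box_widths` and `base_width` are illustrative names, and the package's actual geometry may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def box_widths(
    left: List[str],
    right: List[str],
    bar_width: float = 1.0,
    min_bar_width: float = 0.07,
    base_width: float = 1.0,  # hypothetical: width of a box holding every observation
) -> Dict[Tuple[str, str], float]:
    """Width of each box connecting a category of `left` to a category of `right`.

    Illustrative sketch only; the scaling scheme is an assumption.
    """
    if len(left) != len(right):
        raise ValueError("both variables need one value per observation")
    n = len(left)
    # Count observations for each (left category, right category) pair.
    counts = Counter(zip(left, right))
    # Proportional width, scaled by bar_width, never thinner than min_bar_width.
    return {
        pair: max(bar_width * base_width * count / n, min_bar_width)
        for pair, count in counts.items()
    }


# Toy data: 100 observations of two adjacent variables.
group = ["child"] * 90 + ["adult"] * 10
gender = ["f"] * 90 + ["m"] * 9 + ["f"]
print(box_widths(group, gender))
```

Flooring rather than rescaling is a deliberate trade-off in this sketch: a box holding 1% of the observations is drawn as wide as one holding 7%, so thin boxes stay visible at the cost of exact proportionality.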
@@ -142,7 +137,8 @@ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualiz
 
  In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
 
- There are several additional variations that also visualize categorical data including Parallel Set plots (Bendix et al, 2005) and Right Angle plots (Hofmann and Vendettuoli, 2013).
+ There are several additional variations that also visualize categorical data, including Parallel Set plots (Bendix et al., 2005), Right Angle plots (Hofmann and Vendettuoli, 2013),
+ and generalized parallel coordinate plots (GPCPs; popularized by VanderPlas et al., 2023).
 
  ### References
  Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
@@ -163,7 +159,9 @@ Schonlau M.
  *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
  In Proceedings of the Section on Statistical Graphics, American Statistical Association; 2003
 
-
+ VanderPlas, S., Ge, Y., Unwin, A., & Hofmann, H. (2023).
+ Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots.
+ Journal of Computational and Graphical Statistics, 1-16. (online first)
 
  ### Other implementations of the hammock plot
  There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
@@ -1,4 +1,4 @@
- from .hammock_plot import Hammock
-
- __author__ = "Tiancheng Yang"
- __author_email__ = "t77yang@uwaterloo.ca"
+ from .hammock_plot import Hammock
+
+ __author__ = "Tiancheng Yang"
+ __author_email__ = "t77yang@uwaterloo.ca"