hammock-plot 0.2__tar.gz → 0.3__tar.gz

@@ -0,0 +1,179 @@
1
+ Metadata-Version: 2.1
2
+ Name: hammock_plot
3
+ Version: 0.3
4
+ Summary: Hammock - visualization of categorical or mixed categorical/continuous data
5
+ Home-page: https://github.com/TianchengY/hammock_plot
6
+ Author: Tiancheng Yang
7
+ Author-email: t77yang@uwaterloo.ca
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Intended Audience :: Science/Research
12
+ Requires-Python: >=3.6
13
+ Description-Content-Type: text/markdown
14
+ Requires-Dist: matplotlib
15
+ Requires-Dist: numpy
16
+ Requires-Dist: pandas
17
+
18
+ # Hammock plot
19
+
20
+
21
+ ## Description
22
+
23
+ hammock draws a graph to visualize categorical data - though it also does fine with continuous data.
24
+ Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
25
+ vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
26
+ use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
27
+ to that box (i.e. have the same values/categories for the two variables). The "width" of a box refers to the
28
+ distance between the longer set of parallel lines rather than the vertical distance.
29
+
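A minimal sketch of the width rule described above, assuming a pandas DataFrame and two adjacent variables. It is modeled on the `bar = self.bar_width * 3.5 / max(data_point_numbers)` line and the new `min_bar_width` clamp visible in this diff; `box_widths` is a hypothetical helper, not part of hammock_plot's API, and the package may normalize by the largest count over all axis pairs rather than over one pair as done here.

```python
from typing import Dict, Hashable, Tuple

import pandas as pd


def box_widths(
    df: pd.DataFrame,
    left: str,
    right: str,
    bar_width: float = 1.0,
    min_bar_width: float = 0.05,
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Width of every box connecting two adjacent axes (hypothetical helper).

    Each box corresponds to one (left value, right value) combination; its
    width is proportional to the number of observations with that combination.
    """
    # Number of observations for each pair of categories (rows with NaN are dropped).
    counts = df.groupby([left, right]).size()
    # Scale so that the most frequent pair gets width bar_width * 3.5.
    scale = bar_width * 3.5 / counts.max()
    widths = {}
    for pair, n in counts.items():
        width = scale * n
        # Keep rare combinations visible, like the min_bar_width clamp in 0.3.
        if min_bar_width and width <= min_bar_width:
            width = min_bar_width
        widths[pair] = width
    return widths
```

For a toy frame with `a = [1, 1, 1, 2]` and `b = ["x", "x", "y", "y"]`, the `(1, "x")` box comes out twice as wide as the two single-observation boxes; with a tiny `bar_width`, all boxes are lifted to `min_bar_width`.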
30
+ If the boxes degenerate to a single line and no labels or missing values are used, the hammock plot
31
+ corresponds to a parallel coordinate plot. Boxes degenerate into a single line if `bar_width` is so small that
32
+ the boxes for categorical variables appear to be a single line. For continuous variables boxes will usually
33
+ appear to be a single line because each category typically only contains one observation.
34
+
35
+ The order of variables in `var` determines the order of variables in the graph. Variables can be numeric or
36
+ categorical (e.g. strings): numeric values are spaced in proportion to their magnitude, while other values are
37
+ spaced evenly.
38
+
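The following is a sketch of how the distinct values of one variable might be placed along its axis, loosely modeled on the coordinate computation in `_list_labels` in this diff (proportional placement `temp_value_range * (x_val - min_val) / (max_val - min_val)` for numbers, equal steps otherwise). `axis_positions` is a hypothetical helper; the package additionally reserves edge space and a slot for the missing-value category, which this sketch omits.

```python
from numbers import Real
from typing import Dict, Hashable, Iterable


def axis_positions(values: Iterable[Hashable], axis_length: float = 1.0) -> Dict[Hashable, float]:
    """Map each distinct value of one variable to a height on its axis (sketch).

    Numeric values keep their relative spacing; any other values are sorted
    and spaced evenly. Missing values should be removed beforehand.
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        # With a single category there is no range to scale; this sketch places it at 0.
        return {distinct[0]: 0.0}
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct):
        low, high = distinct[0], distinct[-1]
        # Proportional placement: gaps on the axis mirror gaps between values.
        return {v: axis_length * (v - low) / (high - low) for v in distinct}
    # Categorical placement: equal steps between consecutive categories.
    step = axis_length / (len(distinct) - 1)
    return {v: i * step for i, v in enumerate(distinct)}
```

With values 4, 5 and 7, the value 5 lands one third of the way up the axis; with strings "b", "a", "c", the categories are sorted and placed at 0, 0.5 and 1.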
39
+
40
+
41
+
42
+ ## Getting started
43
+
44
+ You can install hammock from `pip`:
45
+
46
+ ```shell
47
+ pip install hammock_plot
48
+ ```
49
+
50
+
51
+ ### Example: Asthma data
52
+
53
+ We import the asthma dataset:
54
+
55
+ ```python
56
+ import hammock_plot
57
+ import pandas as pd
58
+ df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
59
+ ```
60
+
61
+ Minimal example of a hammock plot:
62
+ ```python
63
+ var = ["hospitalizations", "group", "gender", "comorbidities"]
64
+ hammock = hammock_plot.Hammock(data_df=df)
65
+ ax = hammock.plot(var=var)
66
+ ```
67
+
68
+ The categories of the `group` variable (child, adolescent, adult) are not in the desired order: "adult" should not appear in the middle. We therefore specify the order child, adolescent, adult:
69
+
70
+ ```python
71
+ var = ["hospitalizations", "group", "gender", "comorbidities"]
72
+ group_dict = {1: "child", 2: "adolescent", 3: "adult"}
73
+ value_order = {"group": group_dict}
74
+ hammock = hammock_plot.Hammock(data_df=df)
75
+ ax = hammock.plot(var=var, value_order=value_order)
76
+ ```
77
+
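A standalone sketch of the ordering step that `value_order` triggers. In this diff, the package sorts each variable's dictionary by its integer key and raises a `ValueError` when the data contain values that the dictionary does not cover; `ordered_categories` below is a hypothetical helper that mimics this and is not part of hammock_plot's API.

```python
from typing import Dict, Hashable, Iterable, List


def ordered_categories(observed: Iterable[Hashable], order: Dict[int, Hashable]) -> List[Hashable]:
    """Arrange the categories of one variable following a value_order-style dict.

    `order` maps an integer position to a category label, e.g.
    {1: "child", 2: "adolescent", 3: "adult"}.
    """
    present = set(observed)
    # Every category that occurs in the data needs a position.
    unknown = present - set(order.values())
    if unknown:
        raise ValueError(f"values {unknown} do not appear in the order mapping")
    # Sort by integer key and keep only labels that actually occur in the data.
    return [label for _, label in sorted(order.items()) if label in present]
```

Labels listed in the dictionary but absent from the data are skipped, so the same dictionary can be reused for subsets of the data.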
78
+ We highlight observations with comorbidities=0 in red:
79
+
80
+ ```python
81
+ ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
82
+ ```
83
+
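A sketch of how highlight colors appear to be chosen, based on the color-handling code in this diff: the default palette is used when `color` is omitted, a too-short `color` list is extended with unused palette colors (with a warning), and a highlight color equal to `default_color` is rejected. `highlight_colors` is a hypothetical helper written for illustration only.

```python
import warnings
from typing import Hashable, List, Optional, Sequence

# Default highlight palette listed in the API reference for `color`.
DEFAULT_HIGHLIGHT_COLORS = ["red", "green", "yellow", "lightblue", "orange",
                            "gray", "brown", "olive", "pink", "cyan", "magenta"]


def highlight_colors(hi_value: Sequence[Hashable],
                     color: Optional[List[str]] = None,
                     default_color: str = "blue") -> List[str]:
    """Pick one color per highlighted value (hypothetical helper)."""
    colors = list(color) if color else DEFAULT_HIGHLIGHT_COLORS[:len(hi_value)]
    # Too few colors supplied: append unused palette colors until each value has one.
    if len(colors) < len(hi_value):
        for c in DEFAULT_HIGHLIGHT_COLORS:
            if len(colors) == len(hi_value):
                break
            if c not in colors:
                colors.append(c)
        warnings.warn(f"color was automatically extended to {colors}")
    # Highlighted boxes must stay distinguishable from the non-highlighted ones.
    if default_color in colors:
        raise ValueError(f"highlight colors {colors} conflict with default color {default_color!r}")
    return colors
```

For example, highlighting two values with `color=["red"]` yields `["red", "green"]` plus a warning, while `color=["blue"]` with the default `default_color="blue"` raises an error.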
84
+
85
+ ### Example: Satisfaction scales for the diabetes data
86
+
87
+ We import the diabetes dataset:
88
+
89
+ ```python
90
+ import hammock_plot
91
+ import pandas as pd
92
+ df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
93
+ ```
94
+
95
+ The three variables represent different ordinal scales for satisfaction. We check for missing values:
96
+ ```python
97
+ var = ["sataces", "satcomm", "satrate"]
98
+ hammock = hammock_plot.Hammock(data_df=df)
99
+ ax = hammock.plot(var=var, default_color="blue", missing=True)
100
+ ```
101
+
102
+ The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
103
+ satisfied respondents simply choose the highest value.
104
+
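A sketch of the two behaviors of the `missing` option as described in the API reference: with `missing=False`, rows that have a missing value in any column are dropped (even columns that are not plotted); with `missing=True`, missing values in the plotted variables become their own category. `prepare_missing` and the `"missing"` label are hypothetical; the package uses its own internal placeholder and also positions that category at the bottom of each axis.

```python
import pandas as pd

MISSING_LABEL = "missing"  # hypothetical placeholder; the package uses its own internal label


def prepare_missing(df: pd.DataFrame, var, missing: bool) -> pd.DataFrame:
    """Sketch of the two `missing` modes described in the API reference."""
    if not missing:
        # missing=False: drop rows with a missing value in ANY column,
        # including columns that are not plotted.
        return df.dropna()
    out = df.copy()
    for col in var:
        # missing=True: turn NaN into an explicit category for the plotted columns.
        out[col] = out[col].astype(object).where(out[col].notna(), MISSING_LABEL)
    return out
```

Note that with `missing=False` an unplotted column with gaps can silently shrink the sample, which is one reason to inspect the plot with `missing=True` first.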
105
+
106
+
107
+ ## API Reference
108
+
109
+ ```
110
+ hammock_plot.Hammock(data_df).plot()
111
+ ```
112
+
113
+ | Category | Parameter | Type | Description |
114
+ | --- | :-------- | :------- | :------------------------- |
115
+ | General | `var` | `List[str]` | List of variables to display. |
116
+ | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, each listed variable's values are plotted in the order of the integer keys of its inner dictionary. A specific value order is useful, for example, for ordinal variables. The integer values affect spacing: for example, the values 4, 5, 6 imply equal spacing between 4 and 5 and between 5 and 6, whereas the values 4, 5, 7 imply twice as much space between 5 and 7 as between 4 and 5. Default is None. |
117
+ | | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
118
+ | | `label` | `bool` | Whether or not to display labels between the plotting segments |
119
+ | Highlighting | `hi_var` | `str` | Variable used for highlighting. Default is None. |
120
+ | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. One or more values can be highlighted. |
121
+ | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
122
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"] |
123
+ | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue". |
124
+ | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
125
+ | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
126
+ | | `label_options` | `Dict[str, Dict[str, Any]]` | Controls the size and look of the labels for each variable. The options are keyword arguments of `matplotlib.pyplot.text` (see https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: `{"ExampleVarname": {"fontsize": 12, "fontstyle": "italic", "fontweight": "black", "color": "b"}}`. Default is None. |
127
+ | | `height` | `float` | Height of the plot in inches. Default is 10. |
128
+ | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
129
+ | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
130
+ | | `same_scale` | `List[str]` | List of variables to be plotted on a common scale. Default is None. |
131
+ | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you only want to save the plots. Default is True. |
132
+ | | `save_path` | `str` | If not None, the figure is saved to this path; the file name and extension in the path determine the name and format. Default is None. |
133
+
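A sketch of what `same_scale` appears to do, based on `_get_same_scale_minmax` in this diff: numeric variables contribute their smallest and largest values, other variables contribute the range 1 to the number of categories, and all listed variables are then positioned against the shared minimum and maximum. `shared_scale` and `scaled_position` are hypothetical helpers, and the inputs in the usage note are toy values.

```python
from numbers import Real
from typing import Dict, Sequence, Tuple


def shared_scale(columns: Dict[str, Sequence]) -> Tuple[float, float]:
    """Common (min, max) for the variables listed in `same_scale` (sketch).

    Numeric variables contribute their smallest and largest values; other
    variables contribute 1 .. number of categories.
    """
    low, high = None, None
    for values in columns.values():
        distinct = sorted(set(values))
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct):
            v_low, v_high = distinct[0], distinct[-1]
        else:
            v_low, v_high = 1, len(distinct)
        low = v_low if low is None else min(low, v_low)
        high = v_high if high is None else max(high, v_high)
    return low, high


def scaled_position(x: float, low: float, high: float, axis_length: float = 1.0) -> float:
    """Height of value x on an axis whose range [low, high] is shared across variables."""
    return axis_length * (x - low) / (high - low)
```

For toy variables with values 1 to 3 and 2 to 5, the shared range is (1, 5), so the value 3 sits halfway up both axes rather than at the top of the first one.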
134
+
135
+ ## Historical context
136
+
137
+ In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
138
+
139
+ In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
140
+
141
+ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical/continuous data) on parallel axes.
142
+
143
+ In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
144
+
145
+ There are several additional variations that also visualize categorical data, including Parallel Set plots (Bendix et al., 2005) and Common Angle plots (Hofmann and Vendettuoli, 2013).
146
+
147
+ ### References
148
+ Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
149
+
150
+ Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2297-2305.
151
+
152
+ Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
153
+
154
+ Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.
155
+
156
+ Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. Report of
157
+ the Committee appointed on the 31st March, 1896, to consider and report to the Council
158
+ upon the subject of the definition of a standard or standards of thermal efficiency for
159
+ steam-engines: With an introductory note. In Minutes of Proceedings of the Institution
160
+ of Civil Engineers, Volume 134, pp. 278–283.
161
+
162
+ Schonlau, M. (2003).
163
+ *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
164
+ In Proceedings of the Section on Statistical Graphics, American Statistical Association.
165
+
166
+
167
+
168
+ ### Other implementations of the hammock plot
169
+ There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
170
+
171
+ [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
172
+
173
+
174
+
175
+ ## Authors
176
+
177
+ - Tiancheng Yang t77yang@uwaterloo.ca
178
+
179
+
@@ -2,7 +2,7 @@ import pandas as pd
2
2
  import numpy as np
3
3
  import matplotlib.pyplot as plt
4
4
  from abc import ABC, abstractmethod
5
- from typing import List,Dict
5
+ from typing import List, Dict
6
6
  import warnings
7
7
 
8
8
 
@@ -121,7 +121,7 @@ class Hammock:
121
121
 
122
122
  def plot(self,
123
123
  var: List[str] = None,
124
- value_order: Dict[str, Dict[int,str]] = None,
124
+ value_order: Dict[str, Dict[int, str]] = None,
125
125
  missing: bool = False,
126
126
  hi_missing: bool = False,
127
127
  missing_label_space: float = 1.,
@@ -133,6 +133,7 @@ class Hammock:
133
133
  default_color="blue",
134
134
  # Manipulating Spacing and Layout
135
135
  bar_width: float = 1.,
136
+ min_bar_width: float = .05,
136
137
  space: float = .5,
137
138
  label_options: Dict = None,
138
139
  height: float = 10.,
@@ -151,7 +152,7 @@ class Hammock:
151
152
  if self.data_df[col].dtype.name == "category":
152
153
  self.data_df[col] = self.data_df[col].cat.add_categories(self.missing_data_placeholder)
153
154
  elif "float" in self.data_df[col].dtype.name:
154
- self.data_df[col] = self.data_df[col].apply(lambda x: np.round(x,2))
155
+ self.data_df[col] = self.data_df[col].apply(lambda x: np.round(x, 2))
155
156
  self.data_df_columns = self.data_df.columns.tolist()
156
157
 
157
158
  if not var_lst:
@@ -160,7 +161,6 @@ class Hammock:
160
161
  )
161
162
 
162
163
  if color and type(color) != type([]):
163
-
164
164
  raise ValueError(
165
165
  'Argument "color" must be a list of str.'
166
166
  )
@@ -178,9 +178,9 @@ class Hammock:
178
178
  )
179
179
 
180
180
  if value_order:
181
- for k,v_ori in value_order.items():
181
+ for k, v_ori in value_order.items():
182
182
  uni_val_set = set(self.data_df[k].dropna().unique())
183
- v = [value_name for order,value_name in v_ori.items()]
183
+ v = [value_name for order, value_name in v_ori.items()]
184
184
  if not set(v) >= uni_val_set:
185
185
  error_values = (set(v) ^ uni_val_set) & set(v)
186
186
  raise ValueError(
@@ -218,23 +218,25 @@ class Hammock:
218
218
  self.hi_value.append(self.missing_data_placeholder)
219
219
  else:
220
220
  self.hi_value = [self.missing_data_placeholder]
221
- colors = ["red", "green", "yellow", "lightblue","orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]
221
+ colors = ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]
222
222
  self.color_lst = [color for color in color_lst] if color_lst else (
223
223
  colors[:len(self.hi_value)] if hi_var else None)
224
224
  if hi_var:
225
225
  if hi_value and len(self.color_lst) < len(hi_value):
226
- for i in range(len(hi_value)-len(self.color_lst)):
226
+ for i in range(len(hi_value) - len(self.color_lst)):
227
227
  for c in colors:
228
228
  if c not in self.color_lst:
229
229
  self.color_lst.append(c)
230
230
  break
231
- warnings.warn(f"Warning: The length of color is less than the total number of (high values and missing), color was automatically extended to {self.color_lst}")
231
+ warnings.warn(
232
+ f"The length of color is less than the number of highlighted values (including missing values); color was automatically extended to {self.color_lst}")
232
233
  if hi_var and default_color in self.color_lst:
233
234
  raise ValueError(
234
235
  f'The current highlight colors {self.color_lst} conflict with the default color {default_color}. Please choose another default color or other highlight colors'
235
236
  )
236
237
  # Manipulating Spacing and Layout
237
238
  self.bar_width = bar_width
239
+ self.min_bar_width = min_bar_width
238
240
  self.space = space
239
241
  self.label_options = label_options
240
242
  self.height = height
@@ -257,7 +259,7 @@ class Hammock:
257
259
  raise ValueError(
258
260
  f'The values {error_values} in hi_value are not in the data.'
259
261
  )
260
-
262
+
261
263
  value_color_dict = dict(zip(self.hi_value, self.color_lst))
262
264
 
263
265
  self.data_df[self.color_coloumn_placeholder] = self.data_df[hi_var].apply(
@@ -301,7 +303,7 @@ class Hammock:
301
303
  ax, coordinates_dict = self._list_labels(ax, self.height, self.width, self.label)
302
304
 
303
305
  space = self.space * 10 if label else 0
304
- bar = self.bar_width*3.5/max(data_point_numbers)
306
+ bar = self.bar_width * 3.5 / max(data_point_numbers)
305
307
 
306
308
  if self.shape == "parallelogram":
307
309
  figure_type = Parallelogram()
@@ -314,6 +316,8 @@ class Hammock:
314
316
  left_label = k[0]
315
317
  right_label = k[1]
316
318
  width = bar * v
319
+ if self.min_bar_width and width <= self.min_bar_width:
320
+ width = self.min_bar_width
317
321
  left_coordinate = (coordinates_dict[left_label][0] + space, coordinates_dict[left_label][1])
318
322
  right_coordinate = (coordinates_dict[right_label][0] - space, coordinates_dict[right_label][1])
319
323
  widths.append(width)
@@ -335,6 +339,8 @@ class Hammock:
335
339
  right_label = k[1]
336
340
  if v.get(color):
337
341
  width_temp = bar * v.get(color)
342
+ if self.min_bar_width and width_temp <= self.min_bar_width:
343
+ width_temp = self.min_bar_width
338
344
  else:
339
345
  width_temp = 0
340
346
 
@@ -359,7 +365,7 @@ class Hammock:
359
365
 
360
366
  def _get_varname(self, x):
361
367
  return x.split(self.same_var_placeholder)[:-1][0]
362
-
368
+
363
369
  def is_float(self, element: any) -> bool:
364
370
  if element is None:
365
371
  return False
@@ -390,10 +396,10 @@ class Hammock:
390
396
 
391
397
  return var_pair_lst
392
398
 
393
- def _gen_coordinate(self, start, n, edge, spacing, total_range,val_type="str"):
399
+ def _gen_coordinate(self, start, n, edge, spacing, total_range, val_type="str"):
394
400
  coor_lst = []
395
-
396
- if val_type=="str":
401
+
402
+ if val_type == "str":
397
403
  for i in range(n):
398
404
  coor_lst.append(start + i * spacing)
399
405
 
@@ -405,17 +411,17 @@ class Hammock:
405
411
  coor_lst.append(total_range + (start - edge) - edge)
406
412
  return coor_lst
407
413
 
408
- def _get_same_scale_minmax(self,original_unique_value):
409
- min,max = 0,0
410
- for i,varname in enumerate(self.same_scale):
414
+ def _get_same_scale_minmax(self, original_unique_value):
415
+ min, max = 0, 0
416
+ for i, varname in enumerate(self.same_scale):
411
417
  var_type = str(self.data_df_origin[varname].dtype.name)
412
418
  if "int" in var_type or "float" in var_type:
413
419
  min_val, max_val = original_unique_value[varname][0], original_unique_value[varname][-1]
414
420
  if i == 0:
415
- min,max = min_val, max_val
421
+ min, max = min_val, max_val
416
422
  else:
417
- min = min_val if min_val<min else min
418
- max = max_val if max_val>max else max
423
+ min = min_val if min_val < min else min
424
+ max = max_val if max_val > max else max
419
425
 
420
426
  else:
421
427
  min_val, max_val = 1, len(original_unique_value[varname])
@@ -424,7 +430,7 @@ class Hammock:
424
430
  else:
425
431
  min = min_val if min_val < min else min
426
432
  max = max_val if max_val > max else max
427
- return (min,max)
433
+ return (min, max)
428
434
 
429
435
  def _list_labels(self, ax, figsize_y, figsize_x, label):
430
436
 
@@ -440,24 +446,26 @@ class Hammock:
440
446
  unique_value = []
441
447
  original_unique_value = {}
442
448
  varname_lst = [self._get_varname(var) for var in self.var_lst]
443
-
449
+
444
450
  for var, varname in zip(self.var_lst, varname_lst):
445
451
  unique_valnames = self.data_df[varname].dropna().unique().tolist()
446
452
  sorted_unique_valnames = []
447
453
  if self.value_order and varname in self.value_order:
448
454
  varname_value_order_dict = self.value_order[varname]
449
455
  sorted_unique_valnames_temp = [v for k, v in
450
- sorted(varname_value_order_dict.items(), key=lambda item: item[0])]
456
+ sorted(varname_value_order_dict.items(), key=lambda item: item[0])]
451
457
  for v in sorted_unique_valnames_temp:
452
458
  if v in unique_valnames:
453
459
  sorted_unique_valnames.append(v)
454
460
  if self.missing_data_placeholder in unique_valnames:
455
461
  unique_valnames.remove(self.missing_data_placeholder)
456
- sorted_unique_valnames = sorted(unique_valnames) if not sorted_unique_valnames else sorted_unique_valnames
462
+ sorted_unique_valnames = sorted(
463
+ unique_valnames) if not sorted_unique_valnames else sorted_unique_valnames
457
464
  original_unique_value[varname] = sorted_unique_valnames.copy()
458
465
  sorted_unique_valnames.append(self.missing_data_placeholder)
459
466
  else:
460
- sorted_unique_valnames = sorted(unique_valnames) if not sorted_unique_valnames else sorted_unique_valnames
467
+ sorted_unique_valnames = sorted(
468
+ unique_valnames) if not sorted_unique_valnames else sorted_unique_valnames
461
469
  original_unique_value[varname] = sorted_unique_valnames.copy()
462
470
  unique_value.append([(var, x) for x in sorted_unique_valnames])
463
471
 
@@ -472,11 +480,11 @@ class Hammock:
472
480
 
473
481
  # prepare for same_scale variables
474
482
  if self.same_scale:
475
- same_scale_min,same_scale_max = self._get_same_scale_minmax(original_unique_value)
476
- same_scale_range = same_scale_max-same_scale_min
483
+ same_scale_min, same_scale_max = self._get_same_scale_minmax(original_unique_value)
484
+ same_scale_range = same_scale_max - same_scale_min
477
485
 
478
486
  # plot labels for each variables
479
- for var_i,(x, uni_val) in enumerate(zip(label_coordinates, unique_value)):
487
+ for var_i, (x, uni_val) in enumerate(zip(label_coordinates, unique_value)):
480
488
  label_num = len(uni_val) - 2 if (uni_val[0][0], self.missing_data_placeholder) in uni_val else len(
481
489
  uni_val) - 1
482
490
  varname = varname_lst[var_i]
@@ -487,21 +495,22 @@ class Hammock:
487
495
  temp_value_range = (y_range - 2 * edge_y_range)
488
496
  # handle the variables in same_scale
489
497
  if self.same_scale and varname in self.same_scale:
490
- min_val, max_val = same_scale_min,same_scale_max
498
+ min_val, max_val = same_scale_min, same_scale_max
491
499
  else:
492
- min_val,max_val = original_unique_value[varname][0],original_unique_value[varname][-1]
493
- value_interval = [temp_value_range*(x_val-min_val)/(max_val-min_val) for x_val in original_unique_value[varname]]
500
+ min_val, max_val = original_unique_value[varname][0], original_unique_value[varname][-1]
501
+ value_interval = [temp_value_range * (x_val - min_val) / (max_val - min_val) for x_val in
502
+ original_unique_value[varname]]
494
503
  uni_val_coordinates = self._gen_coordinate(y_start, label_num, edge_y_range,
495
- value_interval, y_range,val_type = "number")
504
+ value_interval, y_range, val_type="number")
496
505
  else:
497
506
  # handle the variables in same_scale
498
507
  if self.same_scale and varname in self.same_scale:
499
508
  temp_value_range = (y_range - 2 * edge_y_range)
500
- quant_val = list(range(1,len(original_unique_value[varname])+1))
509
+ quant_val = list(range(1, len(original_unique_value[varname]) + 1))
501
510
  min_val, max_val = same_scale_min, same_scale_max
502
511
  value_interval = [temp_value_range * (x_val - min_val) / (max_val - min_val) for x_val in quant_val]
503
512
  uni_val_coordinates = self._gen_coordinate(y_start, label_num, edge_y_range,
504
- value_interval, y_range, val_type = "number")
513
+ value_interval, y_range, val_type="number")
505
514
  else:
506
515
  value_interval = (y_range - 2 * edge_y_range) / (label_num)
507
516
  uni_val_coordinates = self._gen_coordinate(y_start, label_num, edge_y_range,
@@ -525,8 +534,6 @@ class Hammock:
525
534
  ax.text(x, y, val[1], ha='center', va='center')
526
535
  coordinates_dict[val] = (x, y)
527
536
 
528
-
529
537
  return ax, coordinates_dict
530
538
 
531
539
 
532
-
@@ -0,0 +1,179 @@
1
+ Metadata-Version: 2.1
2
+ Name: hammock-plot
3
+ Version: 0.3
4
+ Summary: Hammock - visualization of categorical or mixed categorical/continuous data
5
+ Home-page: https://github.com/TianchengY/hammock_plot
6
+ Author: Tiancheng Yang
7
+ Author-email: t77yang@uwaterloo.ca
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Intended Audience :: Science/Research
12
+ Requires-Python: >=3.6
13
+ Description-Content-Type: text/markdown
14
+ Requires-Dist: matplotlib
15
+ Requires-Dist: numpy
16
+ Requires-Dist: pandas
17
+
18
+ # Hammock plot
19
+
20
+
21
+ ## Description
22
+
23
+ hammock draws a graph to visualize categorical data - though it also does fine with continuous data.
24
+ Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
25
+ vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
26
+ use boxes for brevity). The "width" of a box is proportional to the number of observations that correspond
27
+ to that box (i.e. have the same values/categories for the two variables). The "width" of a box refers to the
28
+ distance between the longer set of parallel lines rather than the vertical distance.
29
+
30
+ If the boxes degenerate to a single line and no labels or missing values are used, the hammock plot
31
+ corresponds to a parallel coordinate plot. Boxes degenerate into a single line if `bar_width` is so small that
32
+ the boxes for categorical variables appear to be a single line. For continuous variables boxes will usually
33
+ appear to be a single line because each category typically only contains one observation.
34
+
35
+ The order of variables in `var` determines the order of variables in the graph. Variables can be numeric or
36
+ categorical (e.g. strings): numeric values are spaced in proportion to their magnitude, while other values are
37
+ spaced evenly.
38
+
39
+
40
+
41
+
42
+ ## Getting started
43
+
44
+ You can install hammock from `pip`:
45
+
46
+ ```shell
47
+ pip install hammock_plot
48
+ ```
49
+
50
+
51
+ ### Example: Asthma data
52
+
53
+ We import the asthma dataset:
54
+
55
+ ```python
56
+ import hammock_plot
57
+ import pandas as pd
58
+ df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
59
+ ```
60
+
61
+ Minimal example of a hammock plot:
62
+ ```python
63
+ var = ["hospitalizations", "group", "gender", "comorbidities"]
64
+ hammock = hammock_plot.Hammock(data_df=df)
65
+ ax = hammock.plot(var=var)
66
+ ```
67
+
68
+ The categories of the `group` variable (child, adolescent, adult) are not in the desired order: "adult" should not appear in the middle. We therefore specify the order child, adolescent, adult:
69
+
70
+ ```python
71
+ var = ["hospitalizations", "group", "gender", "comorbidities"]
72
+ group_dict = {1: "child", 2: "adolescent", 3: "adult"}
73
+ value_order = {"group": group_dict}
74
+ hammock = hammock_plot.Hammock(data_df=df)
75
+ ax = hammock.plot(var=var, value_order=value_order)
76
+ ```
77
+
78
+ We highlight observations with comorbidities=0 in red:
79
+
80
+ ```python
81
+ ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
82
+ ```
83
+
84
+
85
+ ### Example: Satisfaction scales for the diabetes data
86
+
87
+ We import the diabetes dataset:
88
+
89
+ ```python
90
+ import hammock_plot
91
+ import pandas as pd
92
+ df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
93
+ ```
94
+
95
+ The three variables represent different ordinal scales for satisfaction. We check for missing values:
96
+ ```python
97
+ var = ["sataces", "satcomm", "satrate"]
98
+ hammock = hammock_plot.Hammock(data_df=df)
99
+ ax = hammock.plot(var=var, default_color="blue", missing=True)
100
+ ```
101
+
102
+ The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
103
+ satisfied respondents simply choose the highest value.
104
+
105
+
106
+
107
+ ## API Reference
108
+
109
+ ```
110
+ hammock_plot.Hammock(data_df).plot()
111
+ ```
112
+
113
+ | Category | Parameter | Type | Description |
114
+ | --- | :-------- | :------- | :------------------------- |
115
+ | General | `var` | `List[str]` | List of variables to display. |
116
+ | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, each listed variable's values are plotted in the order of the integer keys of its inner dictionary. A specific value order is useful, for example, for ordinal variables. The integer values affect spacing: for example, the values 4, 5, 6 imply equal spacing between 4 and 5 and between 5 and 6, whereas the values 4, 5, 7 imply twice as much space between 5 and 7 as between 4 and 5. Default is None. |
117
+ | | `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
118
+ | | `label` | `bool` | Whether or not to display labels between the plotting segments |
119
+ | Highlighting | `hi_var` | `str` | Variable used for highlighting. Default is None. |
120
+ | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. One or more values can be highlighted. |
121
+ | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
122
+ | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"] |
123
+ | | `default_color` | `str` | Default color of plotting elements for boxes that are not highlighted. Default is "blue". |
124
+ | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default width is increased or reduced. This allows reducing visual clutter. Default is 1.0. |
125
+ | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
126
+ | | `label_options` | `Dict[str, Dict[str, Any]]` | Controls the size and look of the labels for each variable. The options are keyword arguments of `matplotlib.pyplot.text` (see https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: `{"ExampleVarname": {"fontsize": 12, "fontstyle": "italic", "fontweight": "black", "color": "b"}}`. Default is None. |
127
+ | | `height` | `float` | Height of the plot in inches. Default is 10. |
128
+ | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
129
+ | Other options | `shape` | `str` | Shape of the boxes. "rectangle" (default) or "parallelogram". |
130
+ | | `same_scale` | `List[str]` | List of variables to be plotted on a common scale. Default is None. |
131
+ | | `display_figure` | `bool` | Whether or not to display the figure. This can be useful if you only want to save the plots. Default is True. |
132
+ | | `save_path` | `str` | If not None, the figure is saved to this path; the file name and extension in the path determine the name and format. Default is None. |
133
+
134
+
135
+ ## Historical context
136
+
137
+ In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
138
+
139
+ In 1985, Inselberg popularized parallel coordinates to visualize continuous variables only. The central contribution is the use of parallel axes.
140
+
141
+ In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical/continuous data) on parallel axes.
142
+
143
+ In 2010, Rosvall proposed alluvial plots to visualize network variables over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
144
+
145
+ There are several additional variations that also visualize categorical data, including Parallel Set plots (Bendix et al., 2005) and Common Angle plots (Hofmann and Vendettuoli, 2013).
146
+
147
+ ### References
148
+ Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. 133-140.
149
+
150
+ Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2297-2305.
151
+
152
+ Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199-233.
153
+
154
+ Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.
155
+
156
+ Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. Report of
157
+ the Committee appointed on the 31st March, 1896, to consider and report to the Council
158
+ upon the subject of the definition of a standard or standards of thermal efficiency for
159
+ steam-engines: With an introductory note. In Minutes of Proceedings of the Institution
160
+ of Civil Engineers, Volume 134, pp. 278–283.
161
+
162
+ Schonlau, M. (2003).
163
+ *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
164
+ In Proceedings of the Section on Statistical Graphics, American Statistical Association.
165
+
166
+
167
+
168
+ ### Other implementations of the hammock plot
169
+ There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
170
+
171
+ [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
172
+
173
+
174
+
175
+ ## Authors
176
+
177
+ - Tiancheng Yang t77yang@uwaterloo.ca
178
+
179
+
@@ -1,4 +1,5 @@
  README.md
+ setup.cfg
  setup.py
  hammock_plot/__init__.py
  hammock_plot/hammock_plot.py
@@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf8") as fh:
 
  setuptools.setup(
  name="hammock_plot",
- version='0.2',
+ version='0.3',
  author="Tiancheng Yang",
  author_email="t77yang@uwaterloo.ca",
  description="Hammock - visualization of categorical or mixed categorical/continuous data",
hammock_plot-0.2/PKG-INFO DELETED
@@ -1,178 +0,0 @@
- Metadata-Version: 2.1
- Name: hammock_plot
- Version: 0.2
- Summary: Hammock - visualization of categorical or mixed categorical/continuous data
- Home-page: https://github.com/TianchengY/hammock_plot
- Author: Tiancheng Yang
- Author-email: t77yang@uwaterloo.ca
- License: UNKNOWN
- Description: # Hammock plot
-
-
- ## Description
-
- Hammock draws a graph to visualize categorical data, though it also works well with continuous data.
- Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
- vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
- call them boxes for brevity.) The "width" of a box is proportional to the number of observations that correspond
- to that box (i.e., observations that have the same values/categories for both variables). The "width" of a box refers to the
- distance between the longer set of parallel lines rather than the vertical distance.
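To make the width rule concrete, the sketch below counts how many observations share each pair of categories for two adjacent variables; under the rule just described, a box's width is proportional to that count. It is a minimal stdlib-only illustration, not hammock_plot's actual implementation: the helpers `box_counts` and `relative_widths` and the toy rows are hypothetical.

```python
from collections import Counter


def box_counts(rows, left, right):
    """Count observations for each (left category, right category) pair.

    Hypothetical helper: each pair corresponds to one box between two
    adjacent axes, and the box width would be proportional to its count.
    """
    return Counter((row[left], row[right]) for row in rows)


def relative_widths(counts):
    """Scale the counts so that the widest box has width 1.0."""
    largest = max(counts.values())
    return {pair: n / largest for pair, n in counts.items()}


# Toy data with integer-coded categories (the README requires numeric variables).
rows = [
    {"group": 1, "gender": 1},
    {"group": 1, "gender": 1},
    {"group": 1, "gender": 2},
    {"group": 3, "gender": 2},
]
counts = box_counts(rows, "group", "gender")
print(counts[(1, 1)], counts[(1, 2)], counts[(3, 2)])  # 2 1 1
print(relative_widths(counts)[(1, 2)])  # 0.5
```

For a continuous variable almost every pair occurs only once, so every count is 1; this is why such boxes tend to look like single lines.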
-
- If the boxes degenerate to single lines and no labels or missing values are used, the hammock plot
- corresponds to a parallel coordinate plot. Boxes degenerate into single lines if `bar_width` is so small that
- the boxes for categorical variables appear to be single lines. For continuous variables, boxes usually
- appear as single lines because each category typically contains only one observation.
-
- The order of variables in `var` determines the order of variables in the graph. All variables in `var`
- must be numerical. String variables should be converted to numerical variables first, e.g. with pandas
- `pd.factorize` or `pd.to_numeric` (the Stata version uses `encode` or `destring`).
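Because the variables must be numeric, string columns need integer codes first. The stdlib-only sketch below uses a hypothetical helper, `encode_categories` (not part of hammock_plot), that assigns codes in a caller-chosen order and also returns a `{code: label}` mapping, which has the same `Dict[int, str]` shape the API table lists for a `value_order` entry. In pandas, `pd.factorize` performs the encoding step, but its codes follow the order of first appearance unless `sort=True`.

```python
def encode_categories(values, order=None):
    """Map string categories to integer codes starting at 1.

    Hypothetical helper. `order` fixes the code assignment; otherwise codes
    follow the order of first appearance. Values missing from `order`
    raise KeyError. Returns (codes, {code: label}).
    """
    if order is None:
        order = list(dict.fromkeys(values))  # unique values, first-seen order
    code_of = {label: i for i, label in enumerate(order, start=1)}
    codes = [code_of[v] for v in values]
    labels = {i: label for label, i in code_of.items()}
    return codes, labels


raw = ["adult", "child", "adolescent", "child"]
codes, labels = encode_categories(raw, order=["child", "adolescent", "adult"])
print(codes)   # [3, 1, 2, 1]
print(labels)  # {1: 'child', 2: 'adolescent', 3: 'adult'}
```

The returned `labels` has the same form as `group_dict` in the asthma example, so it could be passed as, e.g., `value_order={"group": labels}`.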
-
-
-
-
- ## Getting started
-
- You can install hammock_plot with `pip`:
-
- ```shell
- pip install hammock_plot
- ```
-
-
- ### Example: Asthma data
-
- We import the asthma dataset:
-
- ```python
- import hammock_plot
- import pandas as pd
- df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
- ```
-
- Minimal example of a hammock plot:
- ```python
- var = ["hospitalizations", "group", "gender", "comorbidities"]
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var)
- ```
-
- The categories of `group` (child, adolescent, adult) are not in the desired order; adult should not be in the middle. We therefore specify the order explicitly: child, adolescent, adult.
-
- ```python
- var = ["hospitalizations", "group", "gender", "comorbidities"]
- group_dict = {1: "child", 2: "adolescent", 3: "adult"}
- value_order = {"group": group_dict}
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, value_order=value_order)
- ```
-
- We highlight observations with `comorbidities` = 0 in red:
-
- ```python
- ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
- ```
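The `color` parameter is documented as a list of colors corresponding to the values in `hi_value`, with a default list used otherwise. The sketch below shows one plausible reading of that pairing; the helper `highlight_colors` is hypothetical, and positional pairing plus the error for too few colors are assumptions, not behavior confirmed for hammock_plot.

```python
# Default highlight colors, copied from the `color` row of the API table.
DEFAULT_HIGHLIGHT_COLORS = ["red", "green", "yellow", "lightblue", "orange",
                            "gray", "brown", "olive", "pink", "cyan", "magenta"]


def highlight_colors(hi_value, color=None, default_color="blue"):
    """Return ({highlighted value: color}, color for all other boxes).

    Hypothetical helper: the i-th highlighted value gets the i-th color,
    falling back to the default list when `color` is not given.
    """
    palette = color if color is not None else DEFAULT_HIGHLIGHT_COLORS
    if len(palette) < len(hi_value):
        raise ValueError("need at least one color per highlighted value")
    return dict(zip(hi_value, palette)), default_color


mapping, other = highlight_colors([0], color=["red"])
print(mapping, other)  # {0: 'red'} blue
```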
-
-
- ### Example: Satisfaction scales for the diabetes data
-
- We import the diabetes dataset:
-
- ```python
- import hammock_plot
- import pandas as pd
- df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
- ```
-
- The three variables represent different ordinal scales for satisfaction. We check for missing values:
- ```python
- var = ["sataces", "satcomm", "satrate"]
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, default_color="blue", missing=True)
- ```
-
- The missing-value category is shown at the bottom of each variable. All three variables have missing values, with the fewest for the last one. We also see a phenomenon called "top coding", where
- satisfied respondents simply choose the highest value.
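The API table notes that with `missing=False`, observations with a missing value in any column of the data frame are removed, even in columns that are not plotted. The stdlib-only sketch below illustrates the consequence with hypothetical rows (`None` marks a missing value, and the `notes` column is invented): filtering on all columns keeps fewer rows than filtering on the plotted ones. With pandas, passing only the plotted columns, e.g. `hammock_plot.Hammock(data_df=df[var])`, should presumably retain more observations.

```python
def complete_rows(rows, columns):
    """Keep rows that have no missing value (None) in the given columns."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


rows = [
    {"sataces": 5, "satcomm": 4, "notes": None},    # missing only in an unplotted column
    {"sataces": 3, "satcomm": None, "notes": "x"},  # missing in a plotted column
    {"sataces": 4, "satcomm": 4, "notes": "y"},
]
var = ["sataces", "satcomm"]
all_columns = ["sataces", "satcomm", "notes"]

# Dropping on every column (the documented missing=False behavior) keeps 1 row;
# dropping on the plotted columns only keeps 2.
print(len(complete_rows(rows, all_columns)), len(complete_rows(rows, var)))  # 1 2
```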
-
-
-
- ## API Reference
-
- ```
- hammock_plot.Hammock(data_df=df).plot(...)
- ```
-
- | Category | Parameter | Type | Description |
- | --- | :-------- | :------- | :------------------------- |
- | General | `var` | `List[str]` | List of variables to display. |
- | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, the values of each listed variable are plotted in the order given by its dictionary. A specific value order is useful, for example, for ordered variables. The integer keys also determine spacing: for example, the values 4, 5, 6 imply equal spacing between 4 and 5 and between 5 and 6, while the values 4, 5, 7 imply twice as much space between 5 and 7 as between 4 and 5. |
- | | `missing` | `bool` | Whether to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
- | | `label` | `bool` | Whether to display labels between the plotting segments. |
- | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is None. |
- | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or more values. |
- | | `hi_missing` | `bool` | Whether missing values of `hi_var` should be highlighted. |
- | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]. |
- | | `default_color` | `str` | Color of plotting elements for boxes that are not highlighted. Default is "blue". |
- | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default box width is increased or reduced. Reducing it can lessen visual clutter. Default is 1.0. |
- | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
- | | `label_options` | `Dict[str, Dict[str, Any]]` | Controls the size and look of each variable's labels. Accepts the options of `matplotlib.pyplot.text` (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: {"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}. Default is None. |
- | | `height` | `float` | Height of the plot in inches. Default is 10. |
- | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
- | Other options | `shape` | `str` | Shape of the boxes: "rectangle" (default) or "parallelogram". |
- | | `same_scale` | `List[str]` | List of variables that share the same scale. Default is None. |
- | | `display_figure` | `bool` | Whether to display the figure. Setting it to False is useful if you only want to save the plots. Default is True. |
- | | `save_path` | `str` | If not None, the figure is saved to this path (including file name and format). Default is None. |
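The spacing rule in the `value_order` row can be checked with simple arithmetic. The hypothetical helper below assumes that categories are placed linearly in their integer keys and rescaled to the range 0 to 1; the README states only the relative spacing, so actual plot coordinates may differ.

```python
def axis_positions(value_order):
    """Place each label linearly on [0, 1] by its integer key (hypothetical helper)."""
    keys = sorted(value_order)
    lo, hi = keys[0], keys[-1]
    span = (hi - lo) or 1  # avoid division by zero for a single category
    return {value_order[k]: (k - lo) / span for k in keys}


even = axis_positions({4: "low", 5: "mid", 6: "high"})
uneven = axis_positions({4: "low", 5: "mid", 7: "high"})
print(even)  # {'low': 0.0, 'mid': 0.5, 'high': 1.0}

# The gap from 5 to 7 is twice the gap from 4 to 5 (up to float rounding).
ratio = (uneven["high"] - uneven["mid"]) / (uneven["mid"] - uneven["low"])
print(round(ratio, 9))  # 2.0
```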
-
-
- ## Historical context
-
- In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
-
- In 1985, Inselberg popularized parallel coordinates for visualizing continuous variables only. The central contribution is the use of parallel axes.
-
- In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical/continuous data) on parallel axes.
-
- In 2010, Rosvall and Bergstrom proposed alluvial plots to visualize how network structure changes over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
-
- There are several additional variations that also visualize categorical data, including Parallel Sets (Bendix et al., 2005) and Common Angle plots (Hofmann and Vendettuoli, 2013).
-
- ### References
- Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: Visual analysis of categorical data. In IEEE Symposium on Information Visualization (INFOVIS 2005), pp. 133–140.
-
- Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2297–2305.
-
- Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199–233.
-
- Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.
-
- Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. Report of
- the committee appointed on the 31st March, 1896, to consider and report to the council
- upon the subject of the definition of a standard or standards of thermal efficiency for
- steam-engines: With an introductory note. In Minutes of Proceedings of the Institution
- of Civil Engineers, Volume 134, pp. 278–283.
-
- Schonlau, M. (2003).
- *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
- In Proceedings of the Section on Statistical Graphics, American Statistical Association.
-
-
-
- ### Other implementations of the hammock plot
- There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
-
- [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
-
-
-
- ## Authors
-
- - Tiancheng Yang t77yang@uwaterloo.ca
-
-
-
- Platform: UNKNOWN
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Intended Audience :: Science/Research
- Requires-Python: >=3.6
- Description-Content-Type: text/markdown
@@ -1,178 +0,0 @@
- Metadata-Version: 2.1
- Name: hammock-plot
- Version: 0.2
- Summary: Hammock - visualization of categorical or mixed categorical/continuous data
- Home-page: https://github.com/TianchengY/hammock_plot
- Author: Tiancheng Yang
- Author-email: t77yang@uwaterloo.ca
- License: UNKNOWN
- Description: # Hammock plot
-
-
- ## Description
-
- Hammock draws a graph to visualize categorical data, though it also works well with continuous data.
- Variables are lined up parallel to the vertical axis. Categories within a variable are spread out along a
- vertical line. Categories of adjacent variables are connected by boxes. (The boxes are parallelograms; we
- call them boxes for brevity.) The "width" of a box is proportional to the number of observations that correspond
- to that box (i.e., observations that have the same values/categories for both variables). The "width" of a box refers to the
- distance between the longer set of parallel lines rather than the vertical distance.
-
- If the boxes degenerate to single lines and no labels or missing values are used, the hammock plot
- corresponds to a parallel coordinate plot. Boxes degenerate into single lines if `bar_width` is so small that
- the boxes for categorical variables appear to be single lines. For continuous variables, boxes usually
- appear as single lines because each category typically contains only one observation.
-
- The order of variables in `var` determines the order of variables in the graph. All variables in `var`
- must be numerical. String variables should be converted to numerical variables first, e.g. with pandas
- `pd.factorize` or `pd.to_numeric` (the Stata version uses `encode` or `destring`).
-
-
-
-
- ## Getting started
-
- You can install hammock_plot with `pip`:
-
- ```shell
- pip install hammock_plot
- ```
-
-
- ### Example: Asthma data
-
- We import the asthma dataset:
-
- ```python
- import hammock_plot
- import pandas as pd
- df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
- ```
-
- Minimal example of a hammock plot:
- ```python
- var = ["hospitalizations", "group", "gender", "comorbidities"]
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var)
- ```
-
- The categories of `group` (child, adolescent, adult) are not in the desired order; adult should not be in the middle. We therefore specify the order explicitly: child, adolescent, adult.
-
- ```python
- var = ["hospitalizations", "group", "gender", "comorbidities"]
- group_dict = {1: "child", 2: "adolescent", 3: "adult"}
- value_order = {"group": group_dict}
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, value_order=value_order)
- ```
-
- We highlight observations with `comorbidities` = 0 in red:
-
- ```python
- ax = hammock.plot(var=var, value_order=value_order, hi_var="comorbidities", hi_value=[0], color=["red"])
- ```
-
-
- ### Example: Satisfaction scales for the diabetes data
-
- We import the diabetes dataset:
-
- ```python
- import hammock_plot
- import pandas as pd
- df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
- ```
-
- The three variables represent different ordinal scales for satisfaction. We check for missing values:
- ```python
- var = ["sataces", "satcomm", "satrate"]
- hammock = hammock_plot.Hammock(data_df=df)
- ax = hammock.plot(var=var, default_color="blue", missing=True)
- ```
-
- The missing-value category is shown at the bottom of each variable. All three variables have missing values, with the fewest for the last one. We also see a phenomenon called "top coding", where
- satisfied respondents simply choose the highest value.
-
-
-
- ## API Reference
-
- ```
- hammock_plot.Hammock(data_df=df).plot(...)
- ```
-
- | Category | Parameter | Type | Description |
- | --- | :-------- | :------- | :------------------------- |
- | General | `var` | `List[str]` | List of variables to display. |
- | | `value_order` | `Dict[str, Dict[int, str]]` | If specified, the values of each listed variable are plotted in the order given by its dictionary. A specific value order is useful, for example, for ordered variables. The integer keys also determine spacing: for example, the values 4, 5, 6 imply equal spacing between 4 and 5 and between 5 and 6, while the values 4, 5, 7 imply twice as much space between 5 and 7 as between 4 and 5. |
- | | `missing` | `bool` | Whether to add a category for missing values at the bottom of the plot. If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed. Default is False. |
- | | `label` | `bool` | Whether to display labels between the plotting segments. |
- | Highlighting | `hi_var` | `str` | Variable to be highlighted. Default is None. |
- | | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlight one or more values. |
- | | `hi_missing` | `bool` | Whether missing values of `hi_var` should be highlighted. |
- | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Default highlight color list is ["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]. |
- | | `default_color` | `str` | Color of plotting elements for boxes that are not highlighted. Default is "blue". |
- | Manipulating Spacing and Layout | `bar_width` | `float` | Factor by which the default box width is increased or reduced. Reducing it can lessen visual clutter. Default is 1.0. |
- | | `space` | `float` | Space left for the labels between the plotting elements. Default is 0.5. |
- | | `label_options` | `Dict[str, Dict[str, Any]]` | Controls the size and look of each variable's labels. Accepts the options of `matplotlib.pyplot.text` (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html). Example: {"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}. Default is None. |
- | | `height` | `float` | Height of the plot in inches. Default is 10. |
- | | `width` | `float` | Width of the plot in inches. Default is 15. Caution: a width that is too narrow may distort the plot. |
- | Other options | `shape` | `str` | Shape of the boxes: "rectangle" (default) or "parallelogram". |
- | | `same_scale` | `List[str]` | List of variables that share the same scale. Default is None. |
- | | `display_figure` | `bool` | Whether to display the figure. Setting it to False is useful if you only want to save the plots. Default is True. |
- | | `save_path` | `str` | If not None, the figure is saved to this path (including file name and format). Default is None. |
-
-
- ## Historical context
-
- In 1898, Sankey diagrams were developed to visualize flows of energy and materials.
-
- In 1985, Inselberg popularized parallel coordinates for visualizing continuous variables only. The central contribution is the use of parallel axes.
-
- In 2003, Schonlau proposed the hammock plot. This was the first plot to visualize categorical data (or mixed categorical/continuous data) on parallel axes.
-
- In 2010, Rosvall and Bergstrom proposed alluvial plots to visualize how network structure changes over time. Rather than using bars to connect axes, alluvial plots use rounded curves. Alluvial plots are now also used to visualize categorical data.
-
- There are several additional variations that also visualize categorical data, including Parallel Sets (Bendix et al., 2005) and Common Angle plots (Hofmann and Vendettuoli, 2013).
-
- ### References
- Bendix, F., Kosara, R., & Hauser, H. (2005). Parallel sets: Visual analysis of categorical data. In IEEE Symposium on Information Visualization (INFOVIS 2005), pp. 133–140.
-
- Hofmann, H., & Vendettuoli, M. (2013). Common angle plots as perception-true visualizations of categorical associations. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2297–2305.
-
- Inselberg, A., & Dimsdale, B. (2009). Parallel coordinates. Human-Machine Interactive Systems, 199–233.
-
- Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.
-
- Sankey, H. (1898). Introductory note on the thermal efficiency of steam-engines. Report of
- the committee appointed on the 31st March, 1896, to consider and report to the council
- upon the subject of the definition of a standard or standards of thermal efficiency for
- steam-engines: With an introductory note. In Minutes of Proceedings of the Institution
- of Civil Engineers, Volume 134, pp. 278–283.
-
- Schonlau, M. (2003).
- *[Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots.](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)*
- In Proceedings of the Section on Statistical Graphics, American Statistical Association.
-
-
-
- ### Other implementations of the hammock plot
- There is also a Stata implementation `hammock` (available from the Stata archive SSC) and an R implementation as part of the package `ggparallel`.
-
- [![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
-
-
-
- ## Authors
-
- - Tiancheng Yang t77yang@uwaterloo.ca
-
-
-
- Platform: UNKNOWN
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Intended Audience :: Science/Research
- Requires-Python: >=3.6
- Description-Content-Type: text/markdown
File without changes
File without changes