PyPI - hammock-plot - Versions diffs - 0.4__tar.gz → 1.0__tar.gz - Mend

hammock-plot 0.4__tar.gz → 1.0__tar.gz

Files changed (18)

{hammock_plot-0.4 → hammock_plot-1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.4
+Metadata-Version: 2.1
 Name: hammock_plot
-Version: 0.4
+Version: 1.0
 Summary: Hammock - visualization of categorical or mixed categorical/continuous data
 Home-page: https://github.com/TianchengY/hammock_plot
 Author: Tiancheng Yang
@@ -12,19 +12,6 @@ Classifier: Intended Audience :: Science/Research
 Requires-Python: >=3.6
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: matplotlib
-Requires-Dist: numpy
-Requires-Dist: pandas
-Dynamic: author
-Dynamic: author-email
-Dynamic: classifier
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: home-page
-Dynamic: license-file
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
 # Hammock plot
@@ -66,7 +53,7 @@ We import the diabetes dataset:
 ```python
 import hammock_plot
 import pandas as pd
-df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
+df = pd.read_csv('./data/data_asthma.csv')
 ```
 Minimal example of a hammock plot:
@@ -77,14 +64,22 @@ ax = hammock.plot(var=var)
 ```
 <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
+The labels for the numerical variables aren't as desired; we would like the labels directly drawn on the data. We specify that we want no levels for our numerical variables.
+```python
+numeric_levels = {"comorbidities": None, "hospitalizations": None}
+ax = hammock.plot(var=var, numerical_var_levels=numeric_levels)
+```
+<img src="image/asthma_levels.png" alt="Hammock plot" width="600"/>
 The ordering of the child-adolescent-adult variable is not in the desired order; adult should not be in the middle. We now specify a specific order, child-adolescent-adult.
 ```python
-var = ["hospitalizations","group","gender","comorbidities"]
-group_dict= {1: "child", 2: "adolescent",3: "adult"}
-value_order = {"group": group_dict}
+group_order = ["child", "adolescent", "adult"]
+value_order = {"group": group_order}
 hammock = hammock_plot.Hammock(data_df = df)
-ax = hammock.plot(var=var, value_order=value_order )
+ax = hammock.plot(var=var, value_order=value_order, numerical_var_levels=numeric_levels)
 ```
 <!--- to restrict image size, I am using a an html command, rather than the standard ![](image.png) --->
@@ -94,7 +89,7 @@ ax = hammock.plot(var=var, value_order=value_order )
 We highlight observations with comorbidities=0  in red:
 ```python
-ax = hammock.plot(var=var, value_order=value_order ,hi_var="comorbidities", hi_value=[0], color=["red"])
+ax = hammock.plot(var=var ,hi_var="comorbidities", hi_value=[0], colors=["red"], numerical_var_levels=numeric_levels)
 ```
 <!---   ![Hammock plot with highlighting](image/asthma_highlighting.png)    --->
@@ -108,14 +103,14 @@ We import the diabetes dataset:
 ```python
 import hammock_plot
 import pandas as pd
-df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
+df = pd.read_csv('./data/data_diabetes.csv')
 ```
 The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
 ```python
 var = ["sataces","satcomm","satrate"]
 hammock = hammock_plot.Hammock(data_df = df)
-ax = hammock.plot(var=var, missing=True)
+ax = hammock.plot(var=var, missing=True, min_bar_height=0.2,numerical_var_levels={"sataces": None, "satcomm": None, "satrate": None})
 ```
 <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
@@ -123,7 +118,75 @@ ax = hammock.plot(var=var, missing=True)
 The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
 satisfied respondents simply choose the highest value.
+### Example value_order for the Shakespeare data
+We import the Shakespeare dataset:
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_shakespeare.csv')
+```
+We use `speaker_dict` to map the values of the variables `speaker1` and `speaker2` according to the social class hierarchy.
+```python
+var_lst = ["type","speaker1","speaker2","sex1"]
+color_lst = ["red","yellow","green"]
+hi_value = ["Beggars","Citizens","Gentry"]
+speaker_order=["Beggars", "Royalty", "Nobility", "Gentry", "Citizens", "Yeomanry"]
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,hi_var = "speaker1", hi_value=hi_value,color=color_lst, bar_width=0.6,missing=True,
+                value_order ={"speaker1":speaker_order,"speaker2":speaker_order} )
+```
+<img src="image/shakespeare_order.png" alt="Hammock plot for the Shakespeare data, with value_order specified" width="600"/>
+### Example same_scale using Shakespeare data
+We can accomplish similar results using `same_scale`.
+```python
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,hi_var = "speaker1", hi_value=hi_value,color=color_lst, bar_width=0.6,missing=True,
+                value_order ={"speaker1":speaker_order}, same_scale=["speaker1", "speaker2"] )
+```
+<img src="image/shakespeare_scale.png" alt="Hammock plot for the Shakespeare data, with same_scale specified" width="600"/>
+### Example numerical_display_type using penguin data
+We import the Shakespeare dataset:
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_penguins.csv')
+```
+We use `numerical_display_type` to control how we want to display our numerical data.
+```python
+hammock = hammock_plot.Hammock(df)
+ax = hammock.plot(
+    var= ["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+    hi_var="island",
+    hi_value=["Torgersen"],
+    missing=True,
+    numerical_display_type={"bill_length_mm":"box", "bill_depth_mm": "rugplot", "flipper_length_mm": "violin", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_violin.png" alt="Hammock plot for the penguin data, demonstrating numerical_display_type" width="600"/>
+Box plots support multiple highlight values. Violin plots only support one highlight value.
+```python
+ax = hammock.plot(
+  var= ["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+  hi_var="island",
+  hi_value=["Torgersen", "Biscoe"],
+  missing=True,
+  numerical_display_type={"bill_length_mm":"box", "bill_depth_mm": "box", "flipper_length_mm": "box", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_types.png" alt="Hammock plot for the penguin data, demonstrating numerical_display_type with multiple highlighting" width="600"/>
 ## API Reference
@@ -134,21 +197,25 @@ satisfied respondents simply choose the highest value.
 | Category | Parameter | Type     | Description                |
 | --- | :-------- | :------- | :-------------------------  |
 | General |     `var` | `List[str]` | List of variables to display. |
-| |             `value_order` | `Dict[str, Dict[int, str]]`  |  If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. A specific value order is useful, for example, for ordered variables. The integer values affect spacing: for example the values 4,5,6 imply equal spacing between 4,5 and 5,6. The values 4,5,7 implies twice as much space between 5,7 as between 4,5.
+| |             `value_order` | `Dict[str, List[int]]`  |  If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. Only applicable to categorical variables |
+| |            `numerical_var_levels` | `Dict[str, int \| None]` | Specifies the number of subdivisions in the y-axis for numerical variables. Example: {"NumericalVarname": 9, "NumericalVarname2": None}. Default is 7. |
+| |            `numerical_display_type` | `Dict[str, str]` | Specifies the type of plot (rugplot, box plot, violin plot) for numerical variable display. Example: {"NumericalVarname": "rugplot", "NumericalVarname2": "violin", "NumericalVarname3": "box"}. Default is "rugplot". |
 | |             `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot.  If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed.  Default is False. |
 | |             `label` | `bool` | Whether or not to display labels between the plotting segments |
+| |             `unibar`| `bool` | Whether or not to display unibars between the plotting segments |
 | Highlighting | `hi_var` | `str` |  Variable to be highlighted. Default is none. |
-| | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlighted one or multiple values. |
+| | `hi_value` | `List[str or int] or str or int` | Value(s) of `hi_var` to be highlighted. You can highlighted one or multiple values. You can also pass an expression (e.g. "x>1 and (x>5 or x<4)") in string when you want to specify a range for a numeric hi_var.|
 | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical".|
 | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
 | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
 | | `default_color` | `str` |  Default color of plotting elements for boxes that are not highlighted. Default is "blue" |
-| Manipulating Spacing and Layout |   `bar_width` | `float`  | Factor by which the default width is  increased or reduced. This allows reducing visual clutter. Default is 1.0. |
-| |              `space` |  `float`  | Space left for the labels between the plotting elements. Default is 0.5 |
+| Manipulating Spacing and Layout |   `uni_fraction` | `float`  | Fraction of vertical space that should be populated by data. Adjusts the height of the data points. Defaults is 0.08. |
+| |              `space` |  `float`  |Fraction of horizontal space allocated to labels/univ. bars rather than to connecting boxes. Default is 0.3 |
 | |              `label_options` |  `Dict[str, Dict[str, Any]]`  | Manipulates the size and look of the labels. Args following the options in the website: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example:{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}  Default is None. |
 | |              `height` |  `float`  | Height of the plot in inches. Default is 10. |
 | |              `width` |  `float`  |  Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
-| |              `min_bar_width` | `float` | Minimal bar width. Bars representing only a tiny fraction of the data may be so narrow, that they are invivisible in a plot. The default value tries to ensure this does not happen.  Default is 0.07.
+| |               `alpha` | `float` | Alpha value for the colours in the plot. Float from 0-1. Default is 0.7. |
+| |              `min_bar_height` | `float` | Minimal bar height. Bars representing only a tiny fraction of the data may be so narrow, that they are invivisible in a plot. The default value tries to ensure this does not happen.  Default is 0.1.
 | Other options |              `shape` |  `str`  | Shape of the boxes. "rectangle" (default) or "parallelogram". |
 | |              `same_scale` |  `List[str]`  | List of variables that have the same scale. Default is None. |
 | |              `display_figure` |  `bool`  | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is 'True'. |
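
Note on the `value_order` change above: the documented type moves from a per-variable dictionary keyed by integers (0.4) to a per-variable list (1.0). A minimal sketch of converting an old-style argument follows. The helper name `value_order_to_list` is hypothetical and not part of `hammock_plot`. The sketch assumes the integer keys only encode order; the spacing effect that the 0.4 description attributes to those keys is dropped, and the diff does not show whether the list entries must match the raw values in the data column.

```python
def value_order_to_list(old_value_order):
    """Convert {variable: {position: label}} into {variable: [label, ...]}.

    Labels are emitted in ascending order of their integer keys.
    This is an illustrative helper, not a hammock_plot function.
    """
    return {
        variable: [label for _, label in sorted(mapping.items())]
        for variable, mapping in old_value_order.items()
    }


# 0.4-style argument taken from the removed README lines
old_style = {"group": {1: "child", 2: "adolescent", 3: "adult"}}
new_style = value_order_to_list(old_style)
print(new_style)  # {'group': ['child', 'adolescent', 'adult']}
```

The result has the same shape as the `value_order = {"group": group_order}` argument used in the added 1.0 example.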

{hammock_plot-0.4 → hammock_plot-1.0}/README.md RENAMED

@@ -38,7 +38,7 @@ We import the diabetes dataset:
 ```python
 import hammock_plot
 import pandas as pd
-df = pd.read_csv('../examples/asthma/asth_all3_for_python.csv')
+df = pd.read_csv('./data/data_asthma.csv')
 ```
 Minimal example of a hammock plot:
@@ -49,14 +49,22 @@ ax = hammock.plot(var=var)
 ```
 <img src="image/asthma_minimal.png" alt="Minimal example for a Hammock plot" width="600"/>
+The labels for the numerical variables aren't as desired; we would like the labels directly drawn on the data. We specify that we want no levels for our numerical variables.
+```python
+numeric_levels = {"comorbidities": None, "hospitalizations": None}
+ax = hammock.plot(var=var, numerical_var_levels=numeric_levels)
+```
+<img src="image/asthma_levels.png" alt="Hammock plot" width="600"/>
 The ordering of the child-adolescent-adult variable is not in the desired order; adult should not be in the middle. We now specify a specific order, child-adolescent-adult.
 ```python
-var = ["hospitalizations","group","gender","comorbidities"]
-group_dict= {1: "child", 2: "adolescent",3: "adult"}
-value_order = {"group": group_dict}
+group_order = ["child", "adolescent", "adult"]
+value_order = {"group": group_order}
 hammock = hammock_plot.Hammock(data_df = df)
-ax = hammock.plot(var=var, value_order=value_order )
+ax = hammock.plot(var=var, value_order=value_order, numerical_var_levels=numeric_levels)
 ```
 <!--- to restrict image size, I am using a an html command, rather than the standard ![](image.png) --->
@@ -66,7 +74,7 @@ ax = hammock.plot(var=var, value_order=value_order )
 We highlight observations with comorbidities=0  in red:
 ```python
-ax = hammock.plot(var=var, value_order=value_order ,hi_var="comorbidities", hi_value=[0], color=["red"])
+ax = hammock.plot(var=var ,hi_var="comorbidities", hi_value=[0], colors=["red"], numerical_var_levels=numeric_levels)
 ```
 <!---   ![Hammock plot with highlighting](image/asthma_highlighting.png)    --->
@@ -80,14 +88,14 @@ We import the diabetes dataset:
 ```python
 import hammock_plot
 import pandas as pd
-df = pd.read_csv('../examples/diabetes_outlier/diabetes_for_python.csv')
+df = pd.read_csv('./data/data_diabetes.csv')
 ```
 The three variables represent different ordinal scales for satisfaction. We are checking for missing values:
 ```python
 var = ["sataces","satcomm","satrate"]
 hammock = hammock_plot.Hammock(data_df = df)
-ax = hammock.plot(var=var, missing=True)
+ax = hammock.plot(var=var, missing=True, min_bar_height=0.2,numerical_var_levels={"sataces": None, "satcomm": None, "satrate": None})
 ```
 <img src="image/diabetes.png" alt="Hammock plot for the Diabetes Data" width="600"/>
@@ -95,7 +103,75 @@ ax = hammock.plot(var=var, missing=True)
 The missing value category is shown at the bottom for each variable. We find missing values for all 3 variables, but fewest for the last one. We also see a phenomenon called "top coding", where
 satisfied respondents simply choose the highest value.
+### Example value_order for the Shakespeare data
+We import the Shakespeare dataset:
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_shakespeare.csv')
+```
+We use `speaker_dict` to map the values of the variables `speaker1` and `speaker2` according to the social class hierarchy.
+```python
+var_lst = ["type","speaker1","speaker2","sex1"]
+color_lst = ["red","yellow","green"]
+hi_value = ["Beggars","Citizens","Gentry"]
+speaker_order=["Beggars", "Royalty", "Nobility", "Gentry", "Citizens", "Yeomanry"]
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,hi_var = "speaker1", hi_value=hi_value,color=color_lst, bar_width=0.6,missing=True,
+                value_order ={"speaker1":speaker_order,"speaker2":speaker_order} )
+```
+<img src="image/shakespeare_order.png" alt="Hammock plot for the Shakespeare data, with value_order specified" width="600"/>
+### Example same_scale using Shakespeare data
+We can accomplish similar results using `same_scale`.
+```python
+hammock = hammock_plot.Hammock(data_df = df)
+ax = hammock.plot(var=var_lst,hi_var = "speaker1", hi_value=hi_value,color=color_lst, bar_width=0.6,missing=True,
+                value_order ={"speaker1":speaker_order}, same_scale=["speaker1", "speaker2"] )
+```
+<img src="image/shakespeare_scale.png" alt="Hammock plot for the Shakespeare data, with same_scale specified" width="600"/>
+### Example numerical_display_type using penguin data
+We import the Shakespeare dataset:
+```python
+import hammock_plot
+import pandas as pd
+df = pd.read_csv('./data/data_penguins.csv')
+```
+We use `numerical_display_type` to control how we want to display our numerical data.
+```python
+hammock = hammock_plot.Hammock(df)
+ax = hammock.plot(
+    var= ["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+    hi_var="island",
+    hi_value=["Torgersen"],
+    missing=True,
+    numerical_display_type={"bill_length_mm":"box", "bill_depth_mm": "rugplot", "flipper_length_mm": "violin", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_violin.png" alt="Hammock plot for the penguin data, demonstrating numerical_display_type" width="600"/>
+Box plots support multiple highlight values. Violin plots only support one highlight value.
+```python
+ax = hammock.plot(
+  var= ["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
+  hi_var="island",
+  hi_value=["Torgersen", "Biscoe"],
+  missing=True,
+  numerical_display_type={"bill_length_mm":"box", "bill_depth_mm": "box", "flipper_length_mm": "box", "body_mass_g":"box"},
+)
+```
+<img src="image/penguin_display_types.png" alt="Hammock plot for the penguin data, demonstrating numerical_display_type with multiple highlighting" width="600"/>
 ## API Reference
@@ -106,21 +182,25 @@ satisfied respondents simply choose the highest value.
 | Category | Parameter | Type     | Description                |
 | --- | :-------- | :------- | :-------------------------  |
 | General |     `var` | `List[str]` | List of variables to display. |
-| |             `value_order` | `Dict[str, Dict[int, str]]`  |  If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. A specific value order is useful, for example, for ordered variables. The integer values affect spacing: for example the values 4,5,6 imply equal spacing between 4,5 and 5,6. The values 4,5,7 implies twice as much space between 5,7 as between 4,5.
+| |             `value_order` | `Dict[str, List[int]]`  |  If specified, the order of the values in the plot follows the order of values in the list supplied in the dictionary. Only applicable to categorical variables |
+| |            `numerical_var_levels` | `Dict[str, int \| None]` | Specifies the number of subdivisions in the y-axis for numerical variables. Example: {"NumericalVarname": 9, "NumericalVarname2": None}. Default is 7. |
+| |            `numerical_display_type` | `Dict[str, str]` | Specifies the type of plot (rugplot, box plot, violin plot) for numerical variable display. Example: {"NumericalVarname": "rugplot", "NumericalVarname2": "violin", "NumericalVarname3": "box"}. Default is "rugplot". |
 | |             `missing` | `bool` | Whether or not to add a category for missing values at the bottom of the plot.  If False, observations that have a missing value for any variable in the data frame (even those not used in the hammock plot) are removed.  Default is False. |
 | |             `label` | `bool` | Whether or not to display labels between the plotting segments |
+| |             `unibar`| `bool` | Whether or not to display unibars between the plotting segments |
 | Highlighting | `hi_var` | `str` |  Variable to be highlighted. Default is none. |
-| | `hi_value` | `List[str or int]` | List of values of `hi_var` to be highlighted. You can highlighted one or multiple values. |
+| | `hi_value` | `List[str or int] or str or int` | Value(s) of `hi_var` to be highlighted. You can highlighted one or multiple values. You can also pass an expression (e.g. "x>1 and (x>5 or x<4)") in string when you want to specify a range for a numeric hi_var.|
 | | `hi_box` | `str` | Controls how highlighted values are displayed within category labels. Options are "vertical" for vertically stacked color segments or "horizontal" for horizontally split color segments. Default is "vertical".|
 | | `hi_missing` | `bool` | Whether or not missing values for `hi_var` should be highlighted. |
 | | `color` | `List[str]` | List of colors corresponding to the list of values to be highlighted. Each color can be specified as a plain color name (e.g., `"red"`, `"yellow"`) or in the format `"color=alpha"` (e.g., `"red=0.5"`) to control transparency/intensity, where `alpha` is a decimal between 0 and 1. The default highlight color list is `["red", "green", "yellow", "lightblue", "orange", "gray", "brown", "olive", "pink", "cyan", "magenta"]`. |
 | | `default_color` | `str` |  Default color of plotting elements for boxes that are not highlighted. Default is "blue" |
-| Manipulating Spacing and Layout |   `bar_width` | `float`  | Factor by which the default width is  increased or reduced. This allows reducing visual clutter. Default is 1.0. |
-| |              `space` |  `float`  | Space left for the labels between the plotting elements. Default is 0.5 |
+| Manipulating Spacing and Layout |   `uni_fraction` | `float`  | Fraction of vertical space that should be populated by data. Adjusts the height of the data points. Defaults is 0.08. |
+| |              `space` |  `float`  |Fraction of horizontal space allocated to labels/univ. bars rather than to connecting boxes. Default is 0.3 |
 | |              `label_options` |  `Dict[str, Dict[str, Any]]`  | Manipulates the size and look of the labels. Args following the options in the website: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html Example:{"ExampleVarname":{"fontsize":12,"fontstyle":"italic","fontweight":"black","color":"b"}}  Default is None. |
 | |              `height` |  `float`  | Height of the plot in inches. Default is 10. |
 | |              `width` |  `float`  |  Width of the plot in inches. Default is 15. Caution: Width too narrow may distort the plot. |
-| |              `min_bar_width` | `float` | Minimal bar width. Bars representing only a tiny fraction of the data may be so narrow, that they are invivisible in a plot. The default value tries to ensure this does not happen.  Default is 0.07.
+| |               `alpha` | `float` | Alpha value for the colours in the plot. Float from 0-1. Default is 0.7. |
+| |              `min_bar_height` | `float` | Minimal bar height. Bars representing only a tiny fraction of the data may be so narrow, that they are invivisible in a plot. The default value tries to ensure this does not happen.  Default is 0.1.
 | Other options |              `shape` |  `str`  | Shape of the boxes. "rectangle" (default) or "parallelogram". |
 | |              `same_scale` |  `List[str]`  | List of variables that have the same scale. Default is None. |
 | |              `display_figure` |  `bool`  | Whether or not to display the figure. This can be useful if you just want to save the plots. Default is 'True'. |
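
Note on the `color` row above: each entry is either a plain color name or a `"color=alpha"` string. A minimal sketch of how such a string could be split is shown here. The function `parse_color_spec` is hypothetical and is not the library's own parser; falling back to 0.7 (the documented default of the `alpha` parameter) for plain names is an assumption made for illustration.

```python
def parse_color_spec(spec, default_alpha=0.7):
    """Split 'red' or 'red=0.5' into a (color_name, alpha) pair.

    Illustrative only: hammock_plot may parse these strings differently.
    """
    name, separator, alpha_text = spec.partition("=")
    if not separator:
        # Plain color name: use the assumed fallback alpha
        return name.strip(), default_alpha
    alpha = float(alpha_text)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha}")
    return name.strip(), alpha


print(parse_color_spec("red=0.5"))
print(parse_color_spec("yellow"))
```

Rejecting values outside 0 to 1 mirrors the range stated in the table; how the package itself reacts to out-of-range input is not shown in this diff.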

{hammock_plot-0.4 → hammock_plot-1.0}/hammock_plot/__init__.py RENAMED

@@ -1,4 +1,5 @@
-from .hammock_plot import Hammock
+from .main import Hammock
 __author__ = "Tiancheng Yang"
 __author_email__ = "t77yang@uwaterloo.ca"
+__all__ = ["Hammock"]
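
Note on the added `__all__`: with `__all__ = ["Hammock"]`, a star import of the package exposes only `Hammock`. The sketch below demonstrates that standard Python behavior with a stand-in module named `demo_pkg` (hypothetical, built in memory) so it runs without `hammock_plot` installed; the extra names `other` and `_helper` are invented for the demonstration.

```python
import sys
import types

# Build an in-memory stand-in for a package whose __init__ defines __all__
demo_pkg = types.ModuleType("demo_pkg")
exec(
    "class Hammock:\n"
    "    pass\n"
    "other = 2\n"
    "_helper = 1\n"
    "__all__ = ['Hammock']\n",
    demo_pkg.__dict__,
)
sys.modules["demo_pkg"] = demo_pkg

# Run a star import in a fresh namespace and list what it brought in
namespace = {}
exec("from demo_pkg import *", namespace)
public = sorted(name for name in namespace if not name.startswith("__"))
print(public)  # ['Hammock']
```

Explicit imports such as `from hammock_plot import Hammock` are unaffected by `__all__`; it only limits what `import *` exports.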
