pandas-plots 0.9.9__tar.gz → 0.10.1__tar.gz
- {pandas-plots-0.9.9/src/pandas_plots.egg-info → pandas-plots-0.10.1}/PKG-INFO +9 -7
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/README.md +8 -6
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/setup.cfg +1 -1
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots/pls.py +239 -38
- {pandas-plots-0.9.9 → pandas-plots-0.10.1/src/pandas_plots.egg-info}/PKG-INFO +9 -7
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/LICENSE +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/pyproject.toml +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots/hlp.py +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots/tbl.py +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots/ven.py +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots.egg-info/SOURCES.txt +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots.egg-info/dependency_links.txt +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots.egg-info/requires.txt +0 -0
- {pandas-plots-0.9.9 → pandas-plots-0.10.1}/src/pandas_plots.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: pandas-plots
-Version: 0.
+Version: 0.10.1
 Summary: A collection of helper for table handling and vizualization
 Home-page: https://github.com/smeisegeier/pandas-plots
 Author: smeisegeier
@@ -71,22 +71,22 @@ tbl.show_num_df(
 
 ## why use pandas-plots
 
-`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions
-
-It is subdivided into:
+`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions and predefined options:
 
 - `tbl` utilities for table descriptions
   - 🌟`show_num_df()` displays a table as styled version with additional information
   - `describe_df()` an alternative version of pandas `describe()` function
   - `pivot_df()` gets a pivot table of a 3 column dataframe
-    - _⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
+    - >_⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
 
 - `pls` for plotly visualizations
   - `plot_box()` auto annotated boxplot w/ violin option
   - `plot_boxes()` multiple boxplots _(annotation is experimental)_
-  - `plots_bars()` a standardized bar plot
-    - 🆕 now features convidence intervals via `use_ci` option
   - `plot_stacked_bars()` shortcut to stacked bars 😄
+  - `plots_bars()` a standardized bar plot for a **categorical** column
+    - features convidence intervals via `use_ci` option
+  - 🆕 `plot_histogram()` histogram for one or more **numerical** columns
+  - 🆕 `plot_joints()` a joint plot for **exactly two numerical** columns
   - `plot_quadrants()` quickly shows a 2x2 heatmap
 
 - `ven` offers functions for _venn diagrams_
@@ -98,6 +98,8 @@ It is subdivided into:
   - `mean_confidence_interval()` calculates mean and confidence interval for a series
   - `wrap_text()` formats strings or lists to a given width to fit nicely on the screen
   - `replace_delimiter_outside_quotes()` when manual import of csv files is needed: replaces delimiters only outside of quotes
+  - 🆕 `create_barcode_from_url()` creates a barcode from a given URL
+  - 🆕 `add_datetime_col()` adds a datetime columns to a dataframe
 
 > note: theme setting can be controlled through all functions by setting the environment variable `THEME` to either light or dark
 
@@ -42,22 +42,22 @@ tbl.show_num_df(
 
 ## why use pandas-plots
 
-`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions
-
-It is subdivided into:
+`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions and predefined options:
 
 - `tbl` utilities for table descriptions
   - 🌟`show_num_df()` displays a table as styled version with additional information
   - `describe_df()` an alternative version of pandas `describe()` function
   - `pivot_df()` gets a pivot table of a 3 column dataframe
-    - _⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
+    - >_⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
 
 - `pls` for plotly visualizations
   - `plot_box()` auto annotated boxplot w/ violin option
   - `plot_boxes()` multiple boxplots _(annotation is experimental)_
-  - `plots_bars()` a standardized bar plot
-    - 🆕 now features convidence intervals via `use_ci` option
   - `plot_stacked_bars()` shortcut to stacked bars 😄
+  - `plots_bars()` a standardized bar plot for a **categorical** column
+    - features convidence intervals via `use_ci` option
+  - 🆕 `plot_histogram()` histogram for one or more **numerical** columns
+  - 🆕 `plot_joints()` a joint plot for **exactly two numerical** columns
   - `plot_quadrants()` quickly shows a 2x2 heatmap
 
 - `ven` offers functions for _venn diagrams_
@@ -69,6 +69,8 @@ It is subdivided into:
   - `mean_confidence_interval()` calculates mean and confidence interval for a series
   - `wrap_text()` formats strings or lists to a given width to fit nicely on the screen
   - `replace_delimiter_outside_quotes()` when manual import of csv files is needed: replaces delimiters only outside of quotes
+  - 🆕 `create_barcode_from_url()` creates a barcode from a given URL
+  - 🆕 `add_datetime_col()` adds a datetime columns to a dataframe
 
 > note: theme setting can be controlled through all functions by setting the environment variable `THEME` to either light or dark
 
@@ -4,10 +4,12 @@ warnings.filterwarnings("ignore")
 
 import os
 from typing import Literal
-
-from plotly import express as px
+
 import pandas as pd
 import seaborn as sb
+from matplotlib import pyplot as plt
+from plotly import express as px
+
 from .hlp import *
 
 
@@ -168,9 +170,9 @@ def plot_stacked_bars(
         df.iloc[:, 0] = df.iloc[:, 0].str.strip()
     if df.iloc[:, 1].dtype.kind == "O":
         df.iloc[:, 1] = df.iloc[:, 1].str.strip()
-
+
     # * apply precision
-    df.iloc[:,2] = df.iloc[:,2].round(precision)
+    df.iloc[:, 2] = df.iloc[:, 2].round(precision)
 
     # * set index + color col
     col_index = df.columns[0] if not swap else df.columns[1]
@@ -323,34 +325,36 @@ def plot_stacked_bars(
 
 
 def plot_bars(
-
+    df_in: pd.Series | pd.DataFrame,
     caption: str = None,
     top_n_index: int = 0,
     top_n_minvalue: int = 0,
     dropna: bool = False,
     orientation: Literal["h", "v"] = "v",
     sort_values: bool = False,
-    normalize: bool =
+    normalize: bool = True,
     height: int = 500,
     width: int = 1600,
     title: str = None,
     use_ci: bool = False,
-    precision: int =
+    precision: int = 0,
     renderer: Literal["png", "svg", None] = "png",
 ) -> None:
     """
-    A function to plot a bar chart based on a
-    Accepts
+    A function to plot a bar chart based on a *categorical* column (must be string or bool) and a numerical value.
+    Accepts:
+    - a dataframe w/ exactly 2 columns: string and numerical OR
+    - a series, then value_counts() is applied upon to form the numercal, and use_ci is set to false
 
     Parameters:
-    -
+    - df_in: df or series.
     - caption: An optional string indicating the caption for the chart.
     - top_n_index: An optional integer indicating the number of top indexes to include in the chart. Default is 0, which includes all indexes.
     - top_n_minvalue: An optional integer indicating the minimum value to be included in the chart. Default is 0, which includes all values.
     - dropna: A boolean indicating whether to drop NaN values from the chart. Default is False.
     - orientation: A string indicating the orientation of the chart. It can be either "h" for horizontal or "v" for vertical. Default is "v".
     - sort_values: A boolean indicating whether to sort the values in the chart. Default is False.
-    -
+    - normalize: A boolean indicating whether to show pct values in the chart. Default is False.
     - height: An optional integer indicating the height of the chart. Default is 500.
     - width: An optional integer indicating the width of the chart. Default is 2000.
     - title: An optional string indicating the title of the chart. If not provided, the title will be the name of the index column.
@@ -359,41 +363,50 @@ def plot_bars(
         - enforces vertical orientation.
         - enforces nomalize=False
         - enforces dropna=True
+    - precision: An integer indicating the number of decimal places to round the values to. Default is 0.
     - renderer: A string indicating the renderer to use for displaying the chart. It can be "png", "svg", or None. Default is "png".
 
     Returns:
     - None
     """
+    # * if series, apply value_counts, deselect use_ci
+    if isinstance(df_in, pd.Series):
+        if df_in.dtype.kind not in ["O", "b"]:
+            print("❌ for numeric series use plot_histogram().")
+            return
+        else:
+            df_in = df_in.value_counts(dropna=dropna).to_frame().reset_index()
+            use_ci = False
 
-    # * if
-    if isinstance(
-        if len(
+    # * if df, check if valid
+    if isinstance(df_in, pd.DataFrame):
+        if len(df_in.columns) != 2:
             print("❌ df must have exactly 2 columns")
             return
-
-
-
-
-
-
-
-
-
-
+        elif not (df_in.iloc[:, 0].dtype.kind in ["O", "b"]) or not (
+            df_in.iloc[:, 1].dtype.kind in ["i", "f"]
+        ):
+            print("❌ df must have string and numeric columns (in that order).")
+            return
+    else:
+        print("❌ input must be series or dataframe.")
+        return
+
+    col_index = df_in.columns[0]
+    col_name = df_in.columns[1]
 
     # * ensure df is grouped to prevent false aggregations, reset index to return df
     if use_ci:
         # * grouping is smoother on df than on series
-        df = ser.reset_index()
         df = (
-
-
+            df_in.groupby(
+                col_index,
                 dropna=False,
             )
             .agg(
-                mean=(
+                mean=(col_name, "mean"),
                 # * retrieve margin from custom func
-                margin=(
+                margin=(col_name, lambda x: mean_confidence_interval(x)[1]),
             )
             .reset_index()
         )
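The `margin` aggregation in this hunk takes element `[1]` of whatever `mean_confidence_interval()` returns. That helper lives in `hlp.py`, which is unchanged and not shown here, so the following is only a sketch of how such a (mean, margin) pair could be computed. The name `mean_and_margin()` is a hypothetical stand-in, and the normal approximation is an assumption; the package's helper may use a t-distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_and_margin(values, confidence=0.95):
    """Hypothetical stand-in: return (mean, margin) for a sample.

    Assumes a normal approximation; this is not the package's own helper.
    """
    n = len(values)
    if n < 2:
        # the sample standard deviation needs at least 2 points
        return (mean(values), 0.0)
    # two-sided critical value, roughly 1.96 for 95 %
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(values) / sqrt(n)
    return (mean(values), margin)


m, margin = mean_and_margin([10, 12, 11, 13, 9])
print(f"[{m - margin:.2f};{m + margin:.2f}]")
```

The interval is then `mean - margin` to `mean + margin`, which is the same shape the bar labels use when `use_ci` is set.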
@@ -402,7 +415,9 @@ def plot_bars(
         normalize = False
         dropna = True
     else:
-        df =
+        df = df_in.groupby(col_index, dropna=dropna)[col_name].sum().reset_index()
+
+    # return df
 
     # * nulls are hidden by default in plotly etc, so give them a proper category
     if dropna:
@@ -410,8 +425,9 @@ def plot_bars(
     else:
         df = df.fillna("<NA>")
 
-    # * get n
+    # * get n, col1 now is always numeric
     n = df[df.columns[1]].sum()
+    n_len = len(df_in)
 
     # * after grouping add cols for pct and formatting
     df["pct"] = df[df.columns[1]] / n
@@ -425,14 +441,15 @@ def plot_bars(
         None
         if not use_ci
         else df.apply(
-            lambda row: f"{row['cnt_str']}{divider}[{row['mean']-row['margin']:_.{precision}f};{row['mean']+row['margin']:_.{precision}f}]",
+            lambda row: f"{row['cnt_str']}{divider}[{row['mean']-row['margin']:_.{precision}f};{row['mean']+row['margin']:_.{precision}f}]",
+            axis=1,
         )
     )
 
     # * set col vars according to config
-    col_index = df.columns[0]
     col_value = "pct" if not use_ci else df.columns[1]
     col_value_str = "ci_str" if use_ci else "cnt_pct_str" if normalize else "cnt_str"
+    # return df
 
     # * if top n selected
     if top_n_index > 0:
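The `ci_str` lambda in this hunk nests the `precision` argument inside the format specification and uses `_` as a thousands separator. A small standalone sketch of that formatting, with `ci_label()` as a hypothetical name that does not exist in the package:

```python
def ci_label(mean: float, margin: float, precision: int = 0) -> str:
    """Format a confidence interval as "[lower;upper]".

    "_" groups thousands with underscores; the nested {precision}
    sets the number of decimal places at call time.
    """
    lower = mean - margin
    upper = mean + margin
    return f"[{lower:_.{precision}f};{upper:_.{precision}f}]"


print(ci_label(12345.678, 100.5, precision=1))
print(ci_label(1000, 250))
```

With the new default `precision=0`, the bounds are rounded to whole numbers, which keeps the bar labels short.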
@@ -467,7 +484,9 @@ def plot_bars(
     _title_str_minval = f"ALL >{top_n_minvalue}, " if top_n_minvalue > 0 else ""
 
     # * title str n
-    _title_str_n =
+    _title_str_n = (
+        f", n={n:_}" if not use_ci else f", n={n_len:_}<br><sub>ci(95) on means<sub>"
+    )
 
     # * title str na
     _title_str_null = f", NULL excluded" if dropna else ""
@@ -475,7 +494,7 @@ def plot_bars(
     # * layot caption if provided
     caption = _set_caption(caption)
 
-    #
+    # ! plot
     _fig = px.bar(
         df.sort_values(
             col_value if sort_values else col_index,
@@ -486,7 +505,8 @@ def plot_bars(
         text=col_value_str,
         orientation=orientation,
         # * retrieve the original columns from series
-        title=title
+        title=title
+        or f"{caption}{_title_str_minval}{_title_str_top}[{df.columns[1]}] by [{col_index}]{_title_str_null}{_title_str_n}",
         # * retrieve theme from env (intro.set_theme) or default
         template="plotly_dark" if os.getenv("THEME") == "dark" else "plotly",
         width=width,
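The `template=` line in this hunk is where the `THEME` environment variable takes effect. A minimal sketch of that selection, pulled out into a hypothetical function `pick_template()` (the package inlines the expression instead):

```python
import os


def pick_template() -> str:
    """Return the plotly template name implied by the THEME variable."""
    # anything other than the exact string "dark" falls back to the default
    return "plotly_dark" if os.getenv("THEME") == "dark" else "plotly"


os.environ["THEME"] = "dark"
print(pick_template())  # plotly_dark

os.environ["THEME"] = "Dark"
print(pick_template())  # plotly, because the comparison is case-sensitive
```

Because the check is an exact string comparison, an unset variable, `light`, and any misspelling all select the default template.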
@@ -529,7 +549,6 @@ def plot_bars(
         },
         showlegend=False,
         # uniformtext_minsize=14, uniformtext_mode='hide'
-
     )
     # * sorting
     if orientation == "v":
@@ -544,11 +563,193 @@ def plot_bars(
         _fig.update_layout(yaxis={"categoryorder": "category descending"})
 
     # * looks better on single bars
-    _fig.update_traces(
+    _fig.update_traces(
+        textposition="outside" if not use_ci else "auto", error_y=dict(thickness=5)
+    )
     _fig.show(renderer)
     return
 
 
+def plot_histogram(
+    df_ser: pd.DataFrame | pd.Series,
+    histnorm: Literal[
+        "probability", "probability density", "density", "percent", None
+    ] = None,
+    nbins: int = 0,
+    orientation: Literal["h", "v"] = "v",
+    precision: int = 2,
+    height: int = 500,
+    width: int = 1600,
+    text_auto: bool = True,
+    barmode: Literal["group", "overlay", "relative"] = "relative",
+    renderer: Literal["png", "svg", None] = "png",
+    caption: str = None,
+    title: str = None,
+) -> None:
+    """
+    A function to plot a histogram based on *numeric* columns in a DataFrame.
+    Accepts:
+    - a numeric series
+    - a dataframe with only numeric columns
+
+    Parameters:
+        df_ser (pd.DataFrame | pd.Series): The input containing the data to be plotted.
+        histnorm (Literal["probability", "probability density", "density", "percent", None]): The normalization mode for the histogram. Default is None.
+        nbins (int): The number of bins in the histogram. Default is 0.
+        orientation (Literal["h", "v"]): The orientation of the histogram. Default is "v".
+        precision (int): The precision for rounding the data. Default is 2.
+        height (int): The height of the plot. Default is 500.
+        width (int): The width of the plot. Default is 1600.
+        text_auto (bool): Whether to automatically display text on the plot. Default is True.
+        barmode (Literal["group", "overlay", "relative"]): The mode for the bars in the histogram. Default is "relative".
+        renderer (Literal["png", "svg", None]): The renderer for displaying the plot. Default is "png".
+        caption (str): The caption for the plot. Default is None.
+        title (str): The title of the plot. Default is None.
+
+    Returns:
+        None
+    """
+
+    # * convert to df if series
+    if isinstance(df_ser, pd.Series):
+        df = df_ser.to_frame()
+    else:
+        df=df_ser
+
+    col_not_num = df.select_dtypes(exclude="number").columns
+    if any(col_not_num):
+        print(
+            f"❌ all columns must be numeric, but the following are not: [{', '.join(col_not_num)}]. consider using plot_bars()."
+        )
+        return
+
+    # * rounding
+    df = df.applymap(lambda x: round(x, precision))
+
+    # ! plot
+    _caption=_set_caption(caption)
+    fig = px.histogram(
+        data_frame=df,
+        histnorm=histnorm,
+        nbins=nbins,
+        marginal="box",
+        barmode=barmode,
+        text_auto=text_auto,
+        height=height,
+        width=width,
+        orientation=orientation,
+        title=title or f"{_caption}[{', '.join(df.columns)}], n={df.shape[0]:_}",
+        template="plotly_dark" if os.getenv("THEME") == "dark" else "plotly",
+    )
+    # * set title properties
+    fig.update_layout(
+        title={
+            # 'x': 0.1,
+            "y": 0.95,
+            "xanchor": "left",
+            "yanchor": "top",
+            "font": {
+                "size": 24,
+            },
+        },
+        showlegend=False if df.shape[1]==1 else True,
+    )
+
+    fig.show(renderer)
+    return
+
+
+def plot_joint(
+    df: pd.DataFrame,
+    kind: Literal["reg", "hist", "hex", "kde"] = "hex",
+    precision: int = 2,
+    size: int = 5,
+    dropna: bool = False,
+    caption: str = "",
+    title: str = "",
+) -> None:
+    """
+    Generate a seaborn joint plot for *two numeric* columns of a given DataFrame.
+
+    Parameters:
+        - df: The DataFrame containing the data to be plotted.
+        - kind: The type of plot to generate (default is "hex").
+        - precision: The number of decimal places to round the data to (default is 2).
+        - size: The size of the plot (default is 5).
+        - dropna: Whether to drop NA values before plotting (default is False).
+        - caption: A caption for the plot.
+        - title: The title of the plot.
+
+    Returns:
+        None
+    """
+
+    if df.shape[1] != 2:
+        print("❌ df must have 2 columns")
+        return
+
+    col_not_num = df.select_dtypes(exclude="number").columns
+    if any(col_not_num):
+        print(
+            f"❌ both columns must be numeric, but the following are not: [{', '.join(col_not_num)}]. consider using plot_bars()."
+        )
+        return
+
+    df = df.applymap(lambda x: round(x, precision))
+
+    # ! plot
+    # * set theme and palette
+    sb.set_theme(style="darkgrid", palette="tab10")
+    if os.getenv("THEME") == "dark":
+        _style = "dark_background"
+        _cmap = "rocket"
+    else:
+        _style = "bmh"
+        _cmap = "bone_r"
+    plt.style.use(_style)
+
+    dict_base = {
+        "x": df.columns[0],
+        "y": df.columns[1],
+        "data": df,
+        "height": size,
+        "kind": kind,
+        "ratio": 10,
+        "marginal_ticks": False,
+        "dropna": dropna,
+        # "title": f"{caption}[{ser.name}], n = {len(ser):_}" if not title else title,
+    }
+    dict_hex={"cmap": _cmap}
+    dict_kde={"fill": True, "cmap": _cmap}
+
+    if kind=="hex":
+        fig = sb.jointplot(**dict_base, **dict_hex)
+    elif kind=="kde":
+        fig = sb.jointplot(**dict_base, **dict_kde)
+    else:
+        fig = sb.jointplot(**dict_base)
+
+    # * emojis dont work in good ol seaborn
+    _caption="" if not caption else f"#{caption}, "
+    fig.figure.suptitle(title or f"{_caption}[{df.columns[0]}] vs [{df.columns[1]}], n = {len(df):_}")
+    # * leave some room for the title
+    fig.figure.tight_layout()
+    fig.figure.subplots_adjust(top=0.90)
+
+    # sb.jointplot(
+    # x=df.columns[0],
+    # y=df.columns[1],
+    # data=df,
+    # height=size,
+    # kind=kind,
+    # ratio=10,
+    # marginal_ticks=False,
+    # dropna=dropna,
+    # cmap=_cmap,
+    # )
+    return
+
+
 def plot_box(
     ser: pd.Series,
     points: Literal["all", "outliers", "suspectedoutlieres", None] = None,
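`plot_joint()` in this hunk builds one shared keyword dictionary and merges kind-specific options into it with `**` unpacking, branching on `kind`. The same merge can be sketched with a lookup table instead of an `if`/`elif` chain. This is an illustrative variant, not the package's code: `build_joint_kwargs()` is a hypothetical name, only a few of the shared keys are reproduced, and no seaborn call is made.

```python
def build_joint_kwargs(kind: str, cmap: str) -> dict:
    """Merge shared options with the extras for one plot kind."""
    base = {"kind": kind, "ratio": 10, "marginal_ticks": False}
    # only "hex" and "kde" receive extra options, as in the hunk above
    extras = {
        "hex": {"cmap": cmap},
        "kde": {"fill": True, "cmap": cmap},
    }
    # later keys would override earlier ones if names collided
    return {**base, **extras.get(kind, {})}


print(build_joint_kwargs("kde", "rocket"))
print(build_joint_kwargs("reg", "rocket"))
```

Kinds without an entry in the table, such as `reg` or `hist`, fall through to the shared options only, which matches the final `else` branch in the diff.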
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: pandas-plots
-Version: 0.
+Version: 0.10.1
 Summary: A collection of helper for table handling and vizualization
 Home-page: https://github.com/smeisegeier/pandas-plots
 Author: smeisegeier
@@ -71,22 +71,22 @@ tbl.show_num_df(
 
 ## why use pandas-plots
 
-`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions
-
-It is subdivided into:
+`pandas-plots` is a package to help you examine and visualize data that are organized in a pandas DataFrame. It provides a high level api to pandas / plotly with some selected functions and predefined options:
 
 - `tbl` utilities for table descriptions
   - 🌟`show_num_df()` displays a table as styled version with additional information
   - `describe_df()` an alternative version of pandas `describe()` function
   - `pivot_df()` gets a pivot table of a 3 column dataframe
-    - _⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
+    - >_⚠️ `pivot_df()` is depricated and wont get further updates. Its features are well covered in standard `pd.pivot_table()`_
 
 - `pls` for plotly visualizations
   - `plot_box()` auto annotated boxplot w/ violin option
   - `plot_boxes()` multiple boxplots _(annotation is experimental)_
-  - `plots_bars()` a standardized bar plot
-    - 🆕 now features convidence intervals via `use_ci` option
   - `plot_stacked_bars()` shortcut to stacked bars 😄
+  - `plots_bars()` a standardized bar plot for a **categorical** column
+    - features convidence intervals via `use_ci` option
+  - 🆕 `plot_histogram()` histogram for one or more **numerical** columns
+  - 🆕 `plot_joints()` a joint plot for **exactly two numerical** columns
   - `plot_quadrants()` quickly shows a 2x2 heatmap
 
 - `ven` offers functions for _venn diagrams_
@@ -98,6 +98,8 @@ It is subdivided into:
   - `mean_confidence_interval()` calculates mean and confidence interval for a series
   - `wrap_text()` formats strings or lists to a given width to fit nicely on the screen
   - `replace_delimiter_outside_quotes()` when manual import of csv files is needed: replaces delimiters only outside of quotes
+  - 🆕 `create_barcode_from_url()` creates a barcode from a given URL
+  - 🆕 `add_datetime_col()` adds a datetime columns to a dataframe
 
 > note: theme setting can be controlled through all functions by setting the environment variable `THEME` to either light or dark
 
File without changes