qdesc 0.1.9.1__tar.gz → 0.1.9.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of qdesc might be problematic.

qdesc-0.1.9.3/PKG-INFO ADDED
@@ -0,0 +1,175 @@
1
+ Metadata-Version: 2.4
2
+ Name: qdesc
3
+ Version: 0.1.9.3
4
+ Summary: Quick and Easy way to do descriptive analysis.
5
+ Author: Paolo Hilado
6
+ Author-email: datasciencepgh@proton.me
7
+ Description-Content-Type: text/markdown
8
+ License-File: LICENCE.txt
9
+ Requires-Dist: pandas
10
+ Requires-Dist: numpy
11
+ Requires-Dist: scipy
12
+ Requires-Dist: seaborn
13
+ Requires-Dist: matplotlib
14
+ Requires-Dist: statsmodels
15
+ Dynamic: author
16
+ Dynamic: author-email
17
+ Dynamic: description
18
+ Dynamic: description-content-type
19
+ Dynamic: license-file
20
+ Dynamic: requires-dist
21
+ Dynamic: summary
22
+
23
+ # qdesc - Quick and Easy Descriptive Analysis
24
+ ![Package Version](https://img.shields.io/badge/version-0.1.9.2-pink)
25
+ ![Downloads](https://pepy.tech/badge/qdesc)
26
+ ![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)
27
+ ![License: GPL v3.0](https://img.shields.io/badge/license-GPL%20v3.0-blue)
28
+
29
+ ## Installation
30
+ ```sh
31
+ pip install qdesc
32
+ ```
33
+
34
+ ## Overview
35
+ Qdesc is a package for quick and easy descriptive analysis. It is a powerful Python package designed for quick and easy descriptive analysis of quantitative data. It provides essential statistics like mean and standard deviation for normal distribution and median and raw median absolute deviation for skewed data. With built-in functions for frequency distributions, users can effortlessly analyze categorical variables and export results to a spreadsheet. The package also includes a normality check dashboard, featuring Anderson-Darling statistics and visualizations like histograms and Q-Q plots. Whether you're handling structured datasets or exploring statistical trends, qdesc streamlines the process with efficiency and clarity.
36
+
37
+ ## Creating a sample dataframe
38
+ ```python
39
+ import pandas as pd
40
+ import numpy as np
41
+
42
+ # Create sample data
43
+ data = {
44
+ "Age": np.random.randint(18, 60, size=15), # Continuous variable
45
+ "Salary": np.random.randint(30000, 120000, size=15), # Continuous variable
46
+ "Department": np.random.choice(["HR", "Finance", "IT", "Marketing"], size=15), # Categorical variable
47
+ "Gender": np.random.choice(["Male", "Female"], size=15), # Categorical variable
48
+ }
49
+ # Create DataFrame
50
+ df = pd.DataFrame(data)
51
+ ```
52
+
53
+ ## qd.desc Function
54
+ The function qd.desc(df) generates the following statistics:
55
+ * count - number of observations
56
+ * mean - measure of central tendency for normal distribution
57
+ * std - measure of spread for normal distribution
58
+ * median - measure of central tendency for skewed distributions or those with outliers
59
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
60
+ * min - lowest observed value
61
+ * max - highest observed value
62
+ * AD_stat - Anderson - Darling Statistic
63
+ * 5% crit_value - critical value for a 5% Significance Level
64
+ * 1% crit_value - critical value for a 1% Significance Level
65
+
66
+ ```python
67
+ import qdesc as qd
68
+ qd.desc(df)
69
+
70
+ | Variable | Count | Mean | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
71
+ |----------|-------|-------|---------|--------|-------|-------|--------|---------|---------------|
72
+ | Age | 15.0 | 37.87 | 13.51 | 38.0 | 12.0 | 20.0 | 59.0 | 0.41 | 0.68 |
73
+ | Salary | 15.0 | 72724 | 29483 | 67660 | 26311 | 34168 | 119590 | 0.40 | 0.68 |
74
+ ```
75
+
76
+
77
+
78
+ ## qd.grp_desc Function
79
+ This function, qd.grp_desc(df, "Continuous Var", "Group Var") creates a table for descriptive statistics similar to the qd.desc function but has the measures
80
+ presented for each level of the grouping variable. It allows one to check whether these measures, for each group, are approximately normal or not. Combining it
81
+ with qd.normcheck_dashboard allows one to decide on the appropriate measure of central tendency and spread.
82
+
83
+ ```python
84
+ import qdesc as qd
85
+ qd.grp_desc(df, "Salary", "Gender")
86
+
87
+ | Gender | Count | Mean - | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
88
+ |---------|-------|-----------|-----------|----------|----------|--------|---------|---------|---------------|
89
+ | Female | 7 | 84,871.14 | 32,350.37 | 93,971.0 | 25,619.0 | 40,476 | 119,590 | 0.36 | 0.74 |
90
+ | Male | 8 | 62,096.12 | 23,766.82 | 60,347.0 | 14,278.5 | 34,168 | 106,281 | 0.24 | 0.71 |
91
+ ```
92
+
93
+
94
+ ## qd.freqdist Function
95
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
96
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
97
+ * Counts - the number of observations
98
+ * Percentage - percentage of observations from total.
99
+
100
+ ```python
101
+ import qdesc as qd
102
+ qd.freqdist(df, "Department")
103
+
104
+ | Department | Count | Percentage |
105
+ |------------|-------|------------|
106
+ | IT | 5 | 33.33 |
107
+ | HR | 5 | 33.33 |
108
+ | Marketing | 3 | 20.00 |
109
+ | Finance | 2 | 13.33 |
110
+ ```
111
+
112
+
113
+
114
+ ## qd.freqdist_a Function
115
+ Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame. The resulting table will include columns such as:
116
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
117
+ * Counts - the number of observations
118
+ * Percentage - percentage of observations from total.
119
+
120
+ ```python
121
+ import qdesc as qd
122
+ qd.freqdist_a(df)
123
+
124
+ | Column | Value | Count | Percentage |
125
+ |------------|----------|-------|------------|
126
+ | Department | IT | 5 | 33.33% |
127
+ | Department | HR | 5 | 33.33% |
128
+ | Department | Marketing| 3 | 20.00% |
129
+ | Department | Finance | 2 | 13.33% |
130
+ | Gender | Male | 8 | 53.33% |
131
+ | Gender | Female | 7 | 46.67% |
132
+ ```
133
+
134
+
135
+
136
+ ## qd.freqdist_to_excel Function
137
+ Run the function qd.freqdist_to_excel(df, "Filename.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
138
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
139
+ * Counts - the number of observations
140
+ * Percentage - percentage of observations from total.
141
+
142
+ ```python
143
+ import qdesc as qd
144
+ qd.freqdist_to_excel(df, "Results.xlsx")
145
+
146
+ Frequency distributions written to Results.xlsx
147
+ ```
148
+
149
+ ## qd.normcheck_dashboard Function
150
+ Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
151
+
152
+ ```python
153
+ import qdesc as qd
154
+ qd.normcheck_dashboard(df)
155
+ ```
156
+ ![Descriptive Statistics](https://raw.githubusercontent.com/Dcroix/qdesc/refs/heads/main/qd.normcheck_dashboard.png)
157
+
158
+
159
+ ## License
160
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
161
+
162
+ ## Acknowledgements
163
+ Acknowledgement of the libraries used by this package...
164
+
165
+ ### Pandas
166
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
167
+ ### NumPy
168
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
169
+ ### SciPy
170
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
171
+
172
+
173
+
174
+
175
+
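The MAD entry in the `qd.desc` statistics list is described only as a "manual" (and, in the Overview, "raw") median absolute deviation. As a minimal sketch, assuming this means the unscaled median of absolute deviations from the median (no 1.4826 consistency factor), the statistic could be computed as below. The name `raw_mad` is hypothetical and is not part of qdesc, and the sketch uses only the standard library rather than the package's own code.

```python
from statistics import median


def raw_mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|).

    Hypothetical helper for illustration; not taken from qdesc.
    """
    data = list(values)
    if not data:
        raise ValueError("raw_mad() requires at least one value")
    center = median(data)
    # Median of the absolute distances from the central value.
    return median(abs(x - center) for x in data)


sample = [1, 2, 3, 4, 100]
print(raw_mad(sample))
```

For `[1, 2, 3, 4, 100]` the median is 3 and the absolute deviations are 2, 1, 0, 1, 97, so the result is 1: the single outlier barely moves the statistic, which is why it is listed as the spread measure for skewed data.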
@@ -0,0 +1,153 @@
1
+ # qdesc - Quick and Easy Descriptive Analysis
2
+ ![Package Version](https://img.shields.io/badge/version-0.1.9.2-pink)
3
+ ![Downloads](https://pepy.tech/badge/qdesc)
4
+ ![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)
5
+ ![License: GPL v3.0](https://img.shields.io/badge/license-GPL%20v3.0-blue)
6
+
7
+ ## Installation
8
+ ```sh
9
+ pip install qdesc
10
+ ```
11
+
12
+ ## Overview
13
+ Qdesc is a package for quick and easy descriptive analysis. It is a powerful Python package designed for quick and easy descriptive analysis of quantitative data. It provides essential statistics like mean and standard deviation for normal distribution and median and raw median absolute deviation for skewed data. With built-in functions for frequency distributions, users can effortlessly analyze categorical variables and export results to a spreadsheet. The package also includes a normality check dashboard, featuring Anderson-Darling statistics and visualizations like histograms and Q-Q plots. Whether you're handling structured datasets or exploring statistical trends, qdesc streamlines the process with efficiency and clarity.
14
+
15
+ ## Creating a sample dataframe
16
+ ```python
17
+ import pandas as pd
18
+ import numpy as np
19
+
20
+ # Create sample data
21
+ data = {
22
+ "Age": np.random.randint(18, 60, size=15), # Continuous variable
23
+ "Salary": np.random.randint(30000, 120000, size=15), # Continuous variable
24
+ "Department": np.random.choice(["HR", "Finance", "IT", "Marketing"], size=15), # Categorical variable
25
+ "Gender": np.random.choice(["Male", "Female"], size=15), # Categorical variable
26
+ }
27
+ # Create DataFrame
28
+ df = pd.DataFrame(data)
29
+ ```
30
+
31
+ ## qd.desc Function
32
+ The function qd.desc(df) generates the following statistics:
33
+ * count - number of observations
34
+ * mean - measure of central tendency for normal distribution
35
+ * std - measure of spread for normal distribution
36
+ * median - measure of central tendency for skewed distributions or those with outliers
37
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
38
+ * min - lowest observed value
39
+ * max - highest observed value
40
+ * AD_stat - Anderson - Darling Statistic
41
+ * 5% crit_value - critical value for a 5% Significance Level
42
+ * 1% crit_value - critical value for a 1% Significance Level
43
+
44
+ ```python
45
+ import qdesc as qd
46
+ qd.desc(df)
47
+
48
+ | Variable | Count | Mean | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
49
+ |----------|-------|-------|---------|--------|-------|-------|--------|---------|---------------|
50
+ | Age | 15.0 | 37.87 | 13.51 | 38.0 | 12.0 | 20.0 | 59.0 | 0.41 | 0.68 |
51
+ | Salary | 15.0 | 72724 | 29483 | 67660 | 26311 | 34168 | 119590 | 0.40 | 0.68 |
52
+ ```
53
+
54
+
55
+
56
+ ## qd.grp_desc Function
57
+ This function, qd.grp_desc(df, "Continuous Var", "Group Var") creates a table for descriptive statistics similar to the qd.desc function but has the measures
58
+ presented for each level of the grouping variable. It allows one to check whether these measures, for each group, are approximately normal or not. Combining it
59
+ with qd.normcheck_dashboard allows one to decide on the appropriate measure of central tendency and spread.
60
+
61
+ ```python
62
+ import qdesc as qd
63
+ qd.grp_desc(df, "Salary", "Gender")
64
+
65
+ | Gender | Count | Mean - | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
66
+ |---------|-------|-----------|-----------|----------|----------|--------|---------|---------|---------------|
67
+ | Female | 7 | 84,871.14 | 32,350.37 | 93,971.0 | 25,619.0 | 40,476 | 119,590 | 0.36 | 0.74 |
68
+ | Male | 8 | 62,096.12 | 23,766.82 | 60,347.0 | 14,278.5 | 34,168 | 106,281 | 0.24 | 0.71 |
69
+ ```
70
+
71
+
72
+ ## qd.freqdist Function
73
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
74
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
75
+ * Counts - the number of observations
76
+ * Percentage - percentage of observations from total.
77
+
78
+ ```python
79
+ import qdesc as qd
80
+ qd.freqdist(df, "Department")
81
+
82
+ | Department | Count | Percentage |
83
+ |------------|-------|------------|
84
+ | IT | 5 | 33.33 |
85
+ | HR | 5 | 33.33 |
86
+ | Marketing | 3 | 20.00 |
87
+ | Finance | 2 | 13.33 |
88
+ ```
89
+
90
+
91
+
92
+ ## qd.freqdist_a Function
93
+ Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame. The resulting table will include columns such as:
94
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
95
+ * Counts - the number of observations
96
+ * Percentage - percentage of observations from total.
97
+
98
+ ```python
99
+ import qdesc as qd
100
+ qd.freqdist_a(df)
101
+
102
+ | Column | Value | Count | Percentage |
103
+ |------------|----------|-------|------------|
104
+ | Department | IT | 5 | 33.33% |
105
+ | Department | HR | 5 | 33.33% |
106
+ | Department | Marketing| 3 | 20.00% |
107
+ | Department | Finance | 2 | 13.33% |
108
+ | Gender | Male | 8 | 53.33% |
109
+ | Gender | Female | 7 | 46.67% |
110
+ ```
111
+
112
+
113
+
114
+ ## qd.freqdist_to_excel Function
115
+ Run the function qd.freqdist_to_excel(df, "Filename.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
116
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
117
+ * Counts - the number of observations
118
+ * Percentage - percentage of observations from total.
119
+
120
+ ```python
121
+ import qdesc as qd
122
+ qd.freqdist_to_excel(df, "Results.xlsx")
123
+
124
+ Frequency distributions written to Results.xlsx
125
+ ```
126
+
127
+ ## qd.normcheck_dashboard Function
128
+ Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
129
+
130
+ ```python
131
+ import qdesc as qd
132
+ qd.normcheck_dashboard(df)
133
+ ```
134
+ ![Descriptive Statistics](https://raw.githubusercontent.com/Dcroix/qdesc/refs/heads/main/qd.normcheck_dashboard.png)
135
+
136
+
137
+ ## License
138
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
139
+
140
+ ## Acknowledgements
141
+ Acknowledgement of the libraries used by this package...
142
+
143
+ ### Pandas
144
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
145
+ ### NumPy
146
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
147
+ ### SciPy
148
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
149
+
150
+
151
+
152
+
153
+
@@ -79,6 +79,7 @@ def grp_desc(df, numeric_col, group_col):
 
 def freqdist(df, column_name):
     import pandas as pd
+    import numpy as np
     if column_name not in df.columns:
         raise ValueError(f"Column '{column_name}' not found in DataFrame.")
 
@@ -87,16 +88,17 @@ def freqdist(df, column_name):
 
     freq_dist = df[column_name].value_counts().reset_index()
     freq_dist.columns = [column_name, 'Count']
-    freq_dist['Percentage'] = (freq_dist['Count'] / len(df)) * 100
+    freq_dist['Percentage'] = np.round((freq_dist['Count'] / len(df)) * 100,2)
     return freq_dist
 
 
 def freqdist_a(df, ascending=False):
     import pandas as pd
+    import numpy as np
     results = []
     for column in df.select_dtypes(include=['object', 'category']).columns:
         frequency_table = df[column].value_counts()
-        percentage_table = df[column].value_counts(normalize=True) * 100
+        percentage_table = np.round(df[column].value_counts(normalize=True) * 100,2)
 
         distribution = pd.DataFrame({
             'Column': column,
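The functional change in this hunk is that percentage shares are now rounded to two decimals before being returned. A minimal sketch of that idea follows, using only the standard library (`collections.Counter` and the built-in `round`) instead of pandas and `np.round`; the helper name `freq_table` and its `ndigits` parameter are hypothetical and not part of qdesc.

```python
from collections import Counter


def freq_table(values, ndigits=2):
    """Return (level, count, percentage) rows, most frequent level first.

    Hypothetical helper for illustration; qdesc itself uses pandas.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        # Percentage of all observations, rounded as in the 0.1.9.3 change.
        (level, count, round(count * 100 / total, ndigits))
        for level, count in counts.most_common()
    ]


# Same level counts as the Department example in the README.
departments = ["IT"] * 5 + ["HR"] * 5 + ["Marketing"] * 3 + ["Finance"] * 2
for row in freq_table(departments):
    print(row)
```

With these counts the shares come out as 33.33, 33.33, 20.0 and 13.33, matching the README's `qd.freqdist` table; without the rounding step the first share would print as 33.333333333333336.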
@@ -0,0 +1,175 @@
1
+ Metadata-Version: 2.4
2
+ Name: qdesc
3
+ Version: 0.1.9.3
4
+ Summary: Quick and Easy way to do descriptive analysis.
5
+ Author: Paolo Hilado
6
+ Author-email: datasciencepgh@proton.me
7
+ Description-Content-Type: text/markdown
8
+ License-File: LICENCE.txt
9
+ Requires-Dist: pandas
10
+ Requires-Dist: numpy
11
+ Requires-Dist: scipy
12
+ Requires-Dist: seaborn
13
+ Requires-Dist: matplotlib
14
+ Requires-Dist: statsmodels
15
+ Dynamic: author
16
+ Dynamic: author-email
17
+ Dynamic: description
18
+ Dynamic: description-content-type
19
+ Dynamic: license-file
20
+ Dynamic: requires-dist
21
+ Dynamic: summary
22
+
23
+ # qdesc - Quick and Easy Descriptive Analysis
24
+ ![Package Version](https://img.shields.io/badge/version-0.1.9.2-pink)
25
+ ![Downloads](https://pepy.tech/badge/qdesc)
26
+ ![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)
27
+ ![License: GPL v3.0](https://img.shields.io/badge/license-GPL%20v3.0-blue)
28
+
29
+ ## Installation
30
+ ```sh
31
+ pip install qdesc
32
+ ```
33
+
34
+ ## Overview
35
+ Qdesc is a package for quick and easy descriptive analysis. It is a powerful Python package designed for quick and easy descriptive analysis of quantitative data. It provides essential statistics like mean and standard deviation for normal distribution and median and raw median absolute deviation for skewed data. With built-in functions for frequency distributions, users can effortlessly analyze categorical variables and export results to a spreadsheet. The package also includes a normality check dashboard, featuring Anderson-Darling statistics and visualizations like histograms and Q-Q plots. Whether you're handling structured datasets or exploring statistical trends, qdesc streamlines the process with efficiency and clarity.
36
+
37
+ ## Creating a sample dataframe
38
+ ```python
39
+ import pandas as pd
40
+ import numpy as np
41
+
42
+ # Create sample data
43
+ data = {
44
+ "Age": np.random.randint(18, 60, size=15), # Continuous variable
45
+ "Salary": np.random.randint(30000, 120000, size=15), # Continuous variable
46
+ "Department": np.random.choice(["HR", "Finance", "IT", "Marketing"], size=15), # Categorical variable
47
+ "Gender": np.random.choice(["Male", "Female"], size=15), # Categorical variable
48
+ }
49
+ # Create DataFrame
50
+ df = pd.DataFrame(data)
51
+ ```
52
+
53
+ ## qd.desc Function
54
+ The function qd.desc(df) generates the following statistics:
55
+ * count - number of observations
56
+ * mean - measure of central tendency for normal distribution
57
+ * std - measure of spread for normal distribution
58
+ * median - measure of central tendency for skewed distributions or those with outliers
59
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
60
+ * min - lowest observed value
61
+ * max - highest observed value
62
+ * AD_stat - Anderson - Darling Statistic
63
+ * 5% crit_value - critical value for a 5% Significance Level
64
+ * 1% crit_value - critical value for a 1% Significance Level
65
+
66
+ ```python
67
+ import qdesc as qd
68
+ qd.desc(df)
69
+
70
+ | Variable | Count | Mean | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
71
+ |----------|-------|-------|---------|--------|-------|-------|--------|---------|---------------|
72
+ | Age | 15.0 | 37.87 | 13.51 | 38.0 | 12.0 | 20.0 | 59.0 | 0.41 | 0.68 |
73
+ | Salary | 15.0 | 72724 | 29483 | 67660 | 26311 | 34168 | 119590 | 0.40 | 0.68 |
74
+ ```
75
+
76
+
77
+
78
+ ## qd.grp_desc Function
79
+ This function, qd.grp_desc(df, "Continuous Var", "Group Var") creates a table for descriptive statistics similar to the qd.desc function but has the measures
80
+ presented for each level of the grouping variable. It allows one to check whether these measures, for each group, are approximately normal or not. Combining it
81
+ with qd.normcheck_dashboard allows one to decide on the appropriate measure of central tendency and spread.
82
+
83
+ ```python
84
+ import qdesc as qd
85
+ qd.grp_desc(df, "Salary", "Gender")
86
+
87
+ | Gender | Count | Mean - | Std Dev | Median | MAD | Min | Max | AD Stat | 5% Crit Value |
88
+ |---------|-------|-----------|-----------|----------|----------|--------|---------|---------|---------------|
89
+ | Female | 7 | 84,871.14 | 32,350.37 | 93,971.0 | 25,619.0 | 40,476 | 119,590 | 0.36 | 0.74 |
90
+ | Male | 8 | 62,096.12 | 23,766.82 | 60,347.0 | 14,278.5 | 34,168 | 106,281 | 0.24 | 0.71 |
91
+ ```
92
+
93
+
94
+ ## qd.freqdist Function
95
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
96
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
97
+ * Counts - the number of observations
98
+ * Percentage - percentage of observations from total.
99
+
100
+ ```python
101
+ import qdesc as qd
102
+ qd.freqdist(df, "Department")
103
+
104
+ | Department | Count | Percentage |
105
+ |------------|-------|------------|
106
+ | IT | 5 | 33.33 |
107
+ | HR | 5 | 33.33 |
108
+ | Marketing | 3 | 20.00 |
109
+ | Finance | 2 | 13.33 |
110
+ ```
111
+
112
+
113
+
114
+ ## qd.freqdist_a Function
115
+ Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame. The resulting table will include columns such as:
116
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
117
+ * Counts - the number of observations
118
+ * Percentage - percentage of observations from total.
119
+
120
+ ```python
121
+ import qdesc as qd
122
+ qd.freqdist_a(df)
123
+
124
+ | Column | Value | Count | Percentage |
125
+ |------------|----------|-------|------------|
126
+ | Department | IT | 5 | 33.33% |
127
+ | Department | HR | 5 | 33.33% |
128
+ | Department | Marketing| 3 | 20.00% |
129
+ | Department | Finance | 2 | 13.33% |
130
+ | Gender | Male | 8 | 53.33% |
131
+ | Gender | Female | 7 | 46.67% |
132
+ ```
133
+
134
+
135
+
136
+ ## qd.freqdist_to_excel Function
137
+ Run the function qd.freqdist_to_excel(df, "Filename.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
138
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
139
+ * Counts - the number of observations
140
+ * Percentage - percentage of observations from total.
141
+
142
+ ```python
143
+ import qdesc as qd
144
+ qd.freqdist_to_excel(df, "Results.xlsx")
145
+
146
+ Frequency distributions written to Results.xlsx
147
+ ```
148
+
149
+ ## qd.normcheck_dashboard Function
150
+ Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
151
+
152
+ ```python
153
+ import qdesc as qd
154
+ qd.normcheck_dashboard(df)
155
+ ```
156
+ ![Descriptive Statistics](https://raw.githubusercontent.com/Dcroix/qdesc/refs/heads/main/qd.normcheck_dashboard.png)
157
+
158
+
159
+ ## License
160
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
161
+
162
+ ## Acknowledgements
163
+ Acknowledgement of the libraries used by this package...
164
+
165
+ ### Pandas
166
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
167
+ ### NumPy
168
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
169
+ ### SciPy
170
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
171
+
172
+
173
+
174
+
175
+
@@ -5,4 +5,5 @@ qdesc/__init__.py
 qdesc.egg-info/PKG-INFO
 qdesc.egg-info/SOURCES.txt
 qdesc.egg-info/dependency_links.txt
+qdesc.egg-info/requires.txt
 qdesc.egg-info/top_level.txt
@@ -0,0 +1,6 @@
+pandas
+numpy
+scipy
+seaborn
+matplotlib
+statsmodels
@@ -7,10 +7,15 @@ long_description = (this_directory / "README.md").read_text()
 
 setup(
     name='qdesc',
-    version='0.1.9.1',
+    version='0.1.9.3',
     packages=find_packages(),
     install_requires=[
-        # List your dependencies here, e.g., pandas if your function requires it
+        'pandas',
+        'numpy',
+        'scipy',
+        'seaborn',
+        'matplotlib',
+        'statsmodels'
     ],
     author='Paolo Hilado',
     author_email='datasciencepgh@proton.me',
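The `install_requires` entries added in this hunk are what appear as the `Requires-Dist` fields in the 0.1.9.3 PKG-INFO. As a sketch of how that metadata could be read back, the standard library's `email.parser.HeaderParser` can parse the header block, since PKG-INFO uses an email-header-style layout. The `PKG_INFO` string below is an abbreviated copy of the headers shown in this diff, and `declared_requirements` is a hypothetical helper name.

```python
from email.parser import HeaderParser

# Abbreviated copy of the 0.1.9.3 PKG-INFO headers from this diff.
PKG_INFO = """\
Metadata-Version: 2.4
Name: qdesc
Version: 0.1.9.3
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: seaborn
Requires-Dist: matplotlib
Requires-Dist: statsmodels
"""


def declared_requirements(pkg_info_text):
    """Return the Requires-Dist values from PKG-INFO text, in file order."""
    headers = HeaderParser().parsestr(pkg_info_text)
    # get_all() returns None when the field is absent, so fall back to [].
    return headers.get_all("Requires-Dist") or []


print(declared_requirements(PKG_INFO))
```

Run against headers with no `Requires-Dist` lines, as in the deleted 0.1.9.1 PKG-INFO, the helper returns an empty list, which reflects that the earlier release declared no dependencies to the installer.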
qdesc-0.1.9.1/PKG-INFO DELETED
@@ -1,110 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: qdesc
3
- Version: 0.1.9.1
4
- Summary: Quick and Easy way to do descriptive analysis.
5
- Author: Paolo Hilado
6
- Author-email: datasciencepgh@proton.me
7
- Description-Content-Type: text/markdown
8
- License-File: LICENCE.txt
9
- Dynamic: author
10
- Dynamic: author-email
11
- Dynamic: description
12
- Dynamic: description-content-type
13
- Dynamic: license-file
14
- Dynamic: summary
15
-
16
- # qdesc - Quick and Easy Descriptive Analysis
17
-
18
- ## Overview
19
- This is a package for quick and easy descriptive analysis.
20
- Required packages include: pandas, numpy, and SciPy version 1.14.1
21
- Be sure to run the following prior to using the "qd.desc" function:
22
-
23
- - import pandas as pd
24
- - import numpy as np
25
- - from scipy.stats import anderson
26
- - import qdesc as qd
27
-
28
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
29
-
30
- ## qd.desc Function
31
- Run the function qd.desc(df) to get the following statistics:
32
- * count - number of observations
33
- * mean - measure of central tendency for normal distribution
34
- * std - measure of spread for normal distribution
35
- * median - measure of central tendency for skewed distributions or those with outliers
36
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
37
- * min - lowest observed value
38
- * max - highest observed value
39
- * AD_stat - Anderson - Darling Statistic
40
- * 5% crit_value - critical value for a 5% Significance Level
41
- * 1% crit_value - critical value for a 1% Significance Level
42
-
43
- ## qd.freqdist Function
44
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
45
- * Variable Levels (i.e., for Sex Variable: Male and Female)
46
- * Counts - the number of observations
47
- * Percentage - percentage of observations from total.
48
-
49
- ## qd.freqdist_a Function
50
- Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
51
- the categorical variables in your data frame. The resulting table will include columns such as:
52
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
53
- * Counts - the number of observations
54
- * Percentage - percentage of observations from total.
55
-
56
- ## qd.freqdist_to_excel Function
57
- Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
58
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
59
- * Counts - the number of observations
60
- * Percentage - percentage of observations from total.
61
-
62
- ## qd.normcheck_dashboard Function
63
- Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and
64
- create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
65
-
66
-
67
- ## Installation
68
- pip install qdesc
69
-
70
- ## Sample use of qdesc functions
71
-
72
- ## Creating a sample dataframe
73
- import pandas as pd
74
- import numpy as np
75
-
76
- ## Set seed for reproducibility
77
- np.random.seed(21)
78
-
79
- ## Create two continuous variables
80
- var1 = np.random.normal(loc=0, scale=1, size=1000)
81
- var2 = np.random.uniform(low=10, high=50, size=1000)
82
-
83
- ## Create DataFrame
84
- df = pd.DataFrame({
85
- 'Normal_Variable': var1,
86
- 'Uniform_Variable': var2
87
- })
88
-
89
- ## Using the qdesc function
90
- import qdesc as qd
91
-
92
- qd.desc(df)
93
-
94
- ## License
95
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
96
-
97
- ## Acknowledgements
98
- Acknowledgement of the libraries used by this package...
99
-
100
- ### Pandas
101
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
102
- ### NumPy
103
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
104
- ### SciPy
105
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
106
-
107
-
108
-
109
-
110
-
qdesc-0.1.9.1/README.md DELETED
@@ -1,95 +0,0 @@
1
- # qdesc - Quick and Easy Descriptive Analysis
2
-
3
- ## Overview
4
- This is a package for quick and easy descriptive analysis.
5
- Required packages include: pandas, numpy, and SciPy version 1.14.1
6
- Be sure to run the following prior to using the "qd.desc" function:
7
-
8
- - import pandas as pd
9
- - import numpy as np
10
- - from scipy.stats import anderson
11
- - import qdesc as qd
12
-
13
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
14
-
15
- ## qd.desc Function
16
- Run the function qd.desc(df) to get the following statistics:
17
- * count - number of observations
18
- * mean - measure of central tendency for normal distribution
19
- * std - measure of spread for normal distribution
20
- * median - measure of central tendency for skewed distributions or those with outliers
21
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
22
- * min - lowest observed value
23
- * max - highest observed value
24
- * AD_stat - Anderson - Darling Statistic
25
- * 5% crit_value - critical value for a 5% Significance Level
26
- * 1% crit_value - critical value for a 1% Significance Level
27
-
28
- ## qd.freqdist Function
29
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
30
- * Variable Levels (i.e., for Sex Variable: Male and Female)
31
- * Counts - the number of observations
32
- * Percentage - percentage of observations from total.
33
-
34
- ## qd.freqdist_a Function
35
- Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
36
- the categorical variables in your data frame. The resulting table will include columns such as:
37
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
38
- * Counts - the number of observations
39
- * Percentage - percentage of observations from total.
40
-
41
- ## qd.freqdist_to_excel Function
42
- Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
43
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
44
- * Counts - the number of observations
45
- * Percentage - percentage of observations from total.
46
-
47
- ## qd.normcheck_dashboard Function
48
- Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and
49
- create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
50
-
51
-
52
- ## Installation
53
- pip install qdesc
54
-
55
- ## Sample use of qdesc functions
56
-
57
- ## Creating a sample dataframe
58
- import pandas as pd
59
- import numpy as np
60
-
61
- ## Set seed for reproducibility
62
- np.random.seed(21)
63
-
64
- ## Create two continuous variables
65
- var1 = np.random.normal(loc=0, scale=1, size=1000)
66
- var2 = np.random.uniform(low=10, high=50, size=1000)
67
-
68
- ## Create DataFrame
69
- df = pd.DataFrame({
70
- 'Normal_Variable': var1,
71
- 'Uniform_Variable': var2
72
- })
73
-
74
- ## Using the qdesc function
75
- import qdesc as qd
76
-
77
- qd.desc(df)
78
-
79
- ## License
80
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
81
-
82
- ## Acknowledgements
83
- Acknowledgement of the libraries used by this package...
84
-
85
- ### Pandas
86
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
87
- ### NumPy
88
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
89
- ### SciPy
90
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
91
-
92
-
93
-
94
-
95
-
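Note on the removed README above: the frequency-distribution sections describe a table of levels, counts, and percentages, sorted descending by default. The signatures are written as `ascending = FALSE`, but Python's boolean literals are `False` and `True`, so the sketch below assumes a Python boolean was intended. `freqdist_sketch` is a hypothetical standard-library helper for illustration only; it is not qdesc's code, and the output keys are assumptions modelled on the column names listed in the README.

```python
from collections import Counter


def freqdist_sketch(values, ascending=False):
    """Hypothetical helper (not part of qdesc): counts and percentages per level."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sort by count; Python's sort is stable, so ties keep first-seen order.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=not ascending)
    return [
        {"level": level, "Counts": n, "Percentage": round(100 * n / total, 2)}
        for level, n in ordered
    ]


responses = ["Low", "High", "High", "Moderate", "High", "Low"]
for row in freqdist_sketch(responses):
    print(row)
```

Passing `ascending=True` reverses the ordering so the least frequent level comes first.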
@@ -1,110 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: qdesc
3
- Version: 0.1.9.1
4
- Summary: Quick and Easy way to do descriptive analysis.
5
- Author: Paolo Hilado
6
- Author-email: datasciencepgh@proton.me
7
- Description-Content-Type: text/markdown
8
- License-File: LICENCE.txt
9
- Dynamic: author
10
- Dynamic: author-email
11
- Dynamic: description
12
- Dynamic: description-content-type
13
- Dynamic: license-file
14
- Dynamic: summary
15
-
16
- # qdesc - Quick and Easy Descriptive Analysis
17
-
18
- ## Overview
19
- This is a package for quick and easy descriptive analysis.
20
- Required packages include: pandas, numpy, and SciPy version 1.14.1
21
- Be sure to run the following prior to using the "qd.desc" function:
22
-
23
- - import pandas as pd
24
- - import numpy as np
25
- - from scipy.stats import anderson
26
- - import qdesc as qd
27
-
28
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
29
-
30
- ## qd.desc Function
31
- Run the function qd.desc(df) to get the following statistics:
32
- * count - number of observations
33
- * mean - measure of central tendency for normal distribution
34
- * std - measure of spread for normal distribution
35
- * median - measure of central tendency for skewed distributions or those with outliers
36
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
37
- * min - lowest observed value
38
- * max - highest observed value
39
- * AD_stat - Anderson - Darling Statistic
40
- * 5% crit_value - critical value for a 5% Significance Level
41
- * 1% crit_value - critical value for a 1% Significance Level
42
-
43
- ## qd.freqdist Function
44
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
45
- * Variable Levels (i.e., for Sex Variable: Male and Female)
46
- * Counts - the number of observations
47
- * Percentage - percentage of observations from total.
48
-
49
- ## qd.freqdist_a Function
50
- Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
51
- the categorical variables in your data frame. The resulting table will include columns such as:
52
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
53
- * Counts - the number of observations
54
- * Percentage - percentage of observations from total.
55
-
56
- ## qd.freqdist_to_excel Function
57
- Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
58
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
59
- * Counts - the number of observations
60
- * Percentage - percentage of observations from total.
61
-
62
- ## qd.normcheck_dashboard Function
63
- Run the function qd.normcheck_dashboard(df) to efficiently check each numeric variable for normality of its distribution. It will compute the Anderson-Darling statistic and
64
- create visualizations (i.e., qq-plot, histogram, and boxplots) for checking whether the distribution is approximately normal.
65
-
66
-
67
- ## Installation
68
- pip install qdesc
69
-
70
- ## Sample use of qdesc functions
71
-
72
- ## Creating a sample dataframe
73
- import pandas as pd
74
- import numpy as np
75
-
76
- ## Set seed for reproducibility
77
- np.random.seed(21)
78
-
79
- ## Create two continuous variables
80
- var1 = np.random.normal(loc=0, scale=1, size=1000)
81
- var2 = np.random.uniform(low=10, high=50, size=1000)
82
-
83
- ## Create DataFrame
84
- df = pd.DataFrame({
85
- 'Normal_Variable': var1,
86
- 'Uniform_Variable': var2
87
- })
88
-
89
- ## Using the qdesc function
90
- import qdesc as qd
91
-
92
- qd.desc(df)
93
-
94
- ## License
95
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
96
-
97
- ## Acknowledgements
98
- Acknowledgement of the libraries used by this package...
99
-
100
- ### Pandas
101
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
102
- ### NumPy
103
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
104
- ### SciPy
105
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
106
-
107
-
108
-
109
-
110
-
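Note on the removed metadata above: the description lists an Anderson-Darling statistic (`AD_stat`) and asks users to import `anderson` from `scipy.stats`, which suggests the package relies on SciPy for this test. The sketch below only illustrates how the statistic itself is defined for a normal distribution with estimated mean and standard deviation. `anderson_darling_normal` is a hypothetical standard-library helper, not qdesc's or SciPy's implementation, and it does not compute the 5% and 1% critical values mentioned in the README.

```python
import math
import statistics


def anderson_darling_normal(values):
    """Hypothetical helper (not part of qdesc): A^2 statistic against a normal
    distribution whose mean and standard deviation are estimated from the data."""
    data = sorted(float(v) for v in values)
    n = len(data)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)
    std_normal = statistics.NormalDist()
    z = [(x - mean) / std for x in data]
    total = 0.0
    for i in range(1, n + 1):
        log_cdf = math.log(std_normal.cdf(z[i - 1]))
        # Survival function 1 - cdf(z) computed as cdf(-z) for numerical stability.
        log_sf = math.log(std_normal.cdf(-z[n - i]))
        total += (2 * i - 1) * (log_cdf + log_sf)
    return -n - total / n


# Evenly spaced normal quantiles versus the same points pushed through exp(),
# which produces a right-skewed (lognormal-shaped) sample.
grid = [statistics.NormalDist().inv_cdf((k - 0.5) / 20) for k in range(1, 21)]
skewed = [math.exp(v) for v in grid]
print(anderson_darling_normal(grid), anderson_darling_normal(skewed))
```

A larger statistic indicates a poorer fit to the normal distribution; in practice it is compared against tabulated critical values such as those returned by `scipy.stats.anderson`.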
File without changes
File without changes