qdesc 0.1.2__tar.gz → 0.1.4__tar.gz



@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: qdesc
- Version: 0.1.2
+ Version: 0.1.4
  Summary: Quick and Easy way to do descriptive analysis.
  Author: Paolo Hilado
  Author-email: datasciencepgh@proton.me
@@ -38,6 +38,12 @@ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency d
  * Counts - the number of observations
  * Percentage - percentage of observations from total.
 
+ Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+ table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
  Later versions will include data visualizations handy for exploring the distribution of the data set.
 
  ## Installation
@@ -1,66 +1,63 @@
- Metadata-Version: 2.1
- Name: qdesc
- Version: 0.1.2
- Summary: Quick and Easy way to do descriptive analysis.
- Author: Paolo Hilado
- Author-email: datasciencepgh@proton.me
- Description-Content-Type: text/markdown
- License-File: LICENCE.txt
-
- # qdesc - Quick and Easy Descriptive Analysis
-
- ## Overview
- This is a package for quick and easy descriptive analysis.
- Required packages include: pandas, numpy, and SciPy version 1.14.1
- Be sure to run the following prior to using the "qd.desc" function:
-
- - import pandas as pd
- - import numpy as np
- - from scipy.stats import anderson
- - import qdesc as qd
-
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-
- Run the function qd.desc(df) to get the following statistics:
- * count - number of observations
- * mean - measure of central tendency for normal distribution
- * std - measure of spread for normal distribution
- * median - measure of central tendency for skewed distributions or those with outliers
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
- * min - lowest observed value
- * max - highest observed value
- * AD_stat - Anderson - Darling Statistic
- * 5% crit_value - critical value for a 5% Significance Level
- * 1% crit_value - critical value for a 1% Significance Level
-
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
- * Variable Levels (i.e., for Sex Variable: Male and Female)
- * Counts - the number of observations
- * Percentage - percentage of observations from total.
-
- Later versions will include data visualizations handy for exploring the distribution of the data set.
-
- ## Installation
- pip install qdesc
-
- ## Usage - doing descriptive analysis using qdesc
- ### import qdesc as qd
- ### qd.desc(df)
-
- ## License
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
-
- ## Acknowledgements
- Acknowledgement of the libraries used by this package...
-
- ### Pandas
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
- ### NumPy
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
- ### SciPy
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
-
-
-
-
-
+ # qdesc - Quick and Easy Descriptive Analysis
+
+ ## Overview
+ This is a package for quick and easy descriptive analysis.
+ Required packages include: pandas, numpy, and SciPy version 1.14.1
+ Be sure to run the following prior to using the "qd.desc" function:
+
+ - import pandas as pd
+ - import numpy as np
+ - from scipy.stats import anderson
+ - import qdesc as qd
+
+ The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+
+ Run the function qd.desc(df) to get the following statistics:
+ * count - number of observations
+ * mean - measure of central tendency for normal distribution
+ * std - measure of spread for normal distribution
+ * median - measure of central tendency for skewed distributions or those with outliers
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+ * min - lowest observed value
+ * max - highest observed value
+ * AD_stat - Anderson - Darling Statistic
+ * 5% crit_value - critical value for a 5% Significance Level
+ * 1% crit_value - critical value for a 1% Significance Level
+
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+ table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ Later versions will include data visualizations handy for exploring the distribution of the data set.
+
+ ## Installation
+ pip install qdesc
+
+ ## Usage - doing descriptive analysis using qdesc
+ ### import qdesc as qd
+ ### qd.desc(df)
+
+ ## License
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+
+ ## Acknowledgements
+ Acknowledgement of the libraries used by this package...
+
+ ### Pandas
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+ ### NumPy
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+ ### SciPy
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
+
+
+
+
+
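The MAD entry in the README above describes a manually computed median absolute deviation. A minimal sketch of that idea follows, assuming an unscaled MAD (no normal-consistency constant). The helper name `median_abs_deviation` and the sample values are hypothetical and not part of qdesc.

```python
import numpy as np


def median_abs_deviation(values):
    """Return the unscaled median absolute deviation of a 1-D sequence."""
    arr = np.asarray(values, dtype=float)
    center = np.median(arr)
    # Median of the absolute distances from the median
    return float(np.median(np.abs(arr - center)))


# Made-up sample with one extreme value
sample = [2, 4, 4, 5, 7, 9, 100]
mad = median_abs_deviation(sample)
std = float(np.std(sample, ddof=1))
print(f"MAD={mad:.2f}  std={std:.2f}")
```

With one extreme value in the sample, the standard deviation is inflated while the MAD stays small, which is the robustness the README refers to.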
@@ -0,0 +1,70 @@
+ def desc(df):
+     import pandas as pd
+     import numpy as np
+     from scipy.stats import anderson
+     x = np.round(df.describe().T,2)
+     x = x.iloc[:, [0,1,2,5,3,7]]
+     x.rename(columns={'50%': 'median'}, inplace=True)
+     mad_values = {}
+     # computes the manual mad which is more robust to outliers and non-normal distributions
+     for column in df.select_dtypes(include=[np.number]):
+         median = np.median(df[column])
+         abs_deviation = np.abs(df[column] - median)
+         mad = np.median(abs_deviation)
+         mad_values[column] = mad
+     mad_df = pd.DataFrame(list(mad_values.items()), columns=['Variable', 'MAD'])
+     mad_df.set_index('Variable', inplace=True)
+     results = {}
+     # Loop through each column to test only continuous variables (numeric columns)
+     for column in df.select_dtypes(include=[np.number]): # Only continuous variables
+         result = anderson(df[column])
+         statistic = result.statistic
+         critical_values = result.critical_values
+         # Only select the 5% and 1% significance levels
+         selected_critical_values = {
+             '5% crit_value': critical_values[2], # 5% critical value
+             '1% crit_value': critical_values[4] # 1% critical value
+         }
+         # Store the results in a dictionary
+         results[column] = {
+             'AD_stat': statistic,
+             **selected_critical_values # Add critical values for 5% and 1% levels
+         }
+     # Convert the results dictionary into a DataFrame
+     anderson_df = pd.DataFrame.from_dict(results, orient='index')
+
+     xl = x.iloc[:, :4]
+     xr = x.iloc[:, 4:]
+     x_df = np.round(pd.concat([xl, mad_df, xr, anderson_df], axis=1),2)
+     return x_df
+
+ def freqdist(df, column_name):
+     import pandas as pd
+     if column_name not in df.columns:
+         raise ValueError(f"Column '{column_name}' not found in DataFrame.")
+
+     if df[column_name].dtype not in ['object', 'category']:
+         raise ValueError(f"Column '{column_name}' is not a categorical column.")
+
+     freq_dist = df[column_name].value_counts().reset_index()
+     freq_dist.columns = [column_name, 'Count']
+     freq_dist['Percentage'] = (freq_dist['Count'] / len(df)) * 100
+     return freq_dist
+
+
+ def freqdist_a(df):
+     results = [] # List to store distributions
+     for column in df.select_dtypes(include=['object', 'category']).columns:
+         frequency_table = df[column].value_counts()
+         percentage_table = df[column].value_counts(normalize=True) * 100
+         distribution = pd.DataFrame({
+             'Column': column,
+             'Value': frequency_table.index,
+             'Count': frequency_table.values,
+             'Percentage': percentage_table.values
+         })
+         results.append(distribution)
+
+     # Combine all distributions into a single DataFrame
+     final_df = pd.concat(results, ignore_index=True)
+     return final_df
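In the hunk above, `freqdist` divides counts by `len(df)`, while `freqdist_a` uses `value_counts(normalize=True)`. The two can disagree when a column has missing values, because `value_counts` drops them by default. `freqdist_a` as shown also refers to `pd` without importing it in its own scope, since the imports are local to `desc` and `freqdist`. The sketch below is not the package's code: it imports pandas at module level and uses a hypothetical helper, `freq_table`, with made-up data to show both percentages side by side.

```python
import pandas as pd


def freq_table(df, column):
    """Count each level of `column` and report two percentage conventions."""
    counts = df[column].value_counts()  # missing values are dropped by default
    return pd.DataFrame({
        "Value": counts.index,
        "Count": counts.values,
        # Denominator is every row, including rows where the value is missing
        "Pct_of_rows": counts.values / len(df) * 100,
        # Denominator is only the non-missing observations
        "Pct_of_non_missing": counts.values / counts.sum() * 100,
    })


# Made-up data: four rows, one of them missing
df = pd.DataFrame({"Sex": ["Male", "Female", "Female", None]})
table = freq_table(df, "Sex")
print(table)
```

When the column has no missing values, the two percentage columns coincide.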
@@ -0,0 +1,72 @@
+ Metadata-Version: 2.1
+ Name: qdesc
+ Version: 0.1.4
+ Summary: Quick and Easy way to do descriptive analysis.
+ Author: Paolo Hilado
+ Author-email: datasciencepgh@proton.me
+ Description-Content-Type: text/markdown
+ License-File: LICENCE.txt
+
+ # qdesc - Quick and Easy Descriptive Analysis
+
+ ## Overview
+ This is a package for quick and easy descriptive analysis.
+ Required packages include: pandas, numpy, and SciPy version 1.14.1
+ Be sure to run the following prior to using the "qd.desc" function:
+
+ - import pandas as pd
+ - import numpy as np
+ - from scipy.stats import anderson
+ - import qdesc as qd
+
+ The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+
+ Run the function qd.desc(df) to get the following statistics:
+ * count - number of observations
+ * mean - measure of central tendency for normal distribution
+ * std - measure of spread for normal distribution
+ * median - measure of central tendency for skewed distributions or those with outliers
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+ * min - lowest observed value
+ * max - highest observed value
+ * AD_stat - Anderson - Darling Statistic
+ * 5% crit_value - critical value for a 5% Significance Level
+ * 1% crit_value - critical value for a 1% Significance Level
+
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+ table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ Later versions will include data visualizations handy for exploring the distribution of the data set.
+
+ ## Installation
+ pip install qdesc
+
+ ## Usage - doing descriptive analysis using qdesc
+ ### import qdesc as qd
+ ### qd.desc(df)
+
+ ## License
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+
+ ## Acknowledgements
+ Acknowledgement of the libraries used by this package...
+
+ ### Pandas
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+ ### NumPy
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+ ### SciPy
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
+
+
+
+
+
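The count, mean, std, median, min and max entries listed in the README above correspond to columns of pandas' `DataFrame.describe()`. One possible way to pick them by name rather than by position is sketched below on made-up data. The column names `height` and `weight` are illustrative only, and this is not presented as the package's implementation.

```python
import pandas as pd

# Made-up numeric data for illustration
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, 80.0],
})

# describe() yields: count, mean, std, min, 25%, 50%, 75%, max (one row per statistic)
summary = df.describe().T

# Select by label and rename the 50th percentile to "median";
# rename() returns a new frame, so no in-place edit of a slice is needed
picked = summary[["count", "mean", "std", "50%", "min", "max"]].rename(
    columns={"50%": "median"}
)
print(picked.round(2))
```

Selecting by label keeps the result stable even if the set of percentiles passed to `describe()` changes.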
@@ -1,6 +1,7 @@
  LICENCE.txt
- README.txt
+ README.md
  setup.py
+ qdesc/__init__.py
  qdesc.egg-info/PKG-INFO
  qdesc.egg-info/SOURCES.txt
  qdesc.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+ qdesc
@@ -7,7 +7,7 @@ long_description = (this_directory / "README.md").read_text()
 
  setup(
      name='qdesc',
-     version='0.1.2',
+     version='0.1.4',
      packages=find_packages(),
      install_requires=[
          # List your dependencies here, e.g., pandas if your function requires it
qdesc-0.1.2/README.txt DELETED
@@ -1,22 +0,0 @@
- This is a package for quick and easy descriptive analysis.
- Required packages include: pandas, numpy, and SciPy version 1.14.1
- Be sure to run the following prior to using the "qd.desc" function:
-
- import pandas as pd
- import numpy as np
- from scipy.stats import anderson
- import qdesc as qd
-
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-
- run the function qd.desc(df) to get the following statistics:
- count - number of observations
- mean - measure of central tendency for normal distribution
- std - measure of spread for normal distribution
- median - measure of central tendency for skewed distributions or those with outliers
- MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
- min - lowest observed value
- max - highest observed value
- AD_stat - Anderson - Darling Statistic
- 5% crit_value - critical value for a 5% Significance Level
- 1% crit_value - critical value for a 1% Significance Level
@@ -1 +0,0 @@
-