qdesc 0.1.5__tar.gz → 0.1.6__tar.gz


Potentially problematic release.



@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: qdesc
- Version: 0.1.5
+ Version: 0.1.6
  Summary: Quick and Easy way to do descriptive analysis.
  Author: Paolo Hilado
  Author-email: datasciencepgh@proton.me
@@ -21,6 +21,7 @@ Be sure to run the following prior to using the "qd.desc" function:
 
  The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
 
+ ## qd.desc Function
  Run the function qd.desc(df) to get the following statistics:
  * count - number of observations
  * mean - measure of central tendency for normal distribution
@@ -33,17 +34,25 @@ Run the function qd.desc(df) to get the following statistics:
  * 5% crit_value - critical value for a 5% Significance Level
  * 1% crit_value - critical value for a 1% Significance Level
 
+ ## qd.freqdist Function
  Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
  * Variable Levels (i.e., for Sex Variable: Male and Female)
  * Counts - the number of observations
  * Percentage - percentage of observations from total.
 
+ ## qd.freqdist_a Function
  Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
  the categorical variables in your data frame. The resulting table will include columns such as:
  * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
  * Counts - the number of observations
  * Percentage - percentage of observations from total.
 
+ ## qd.freqdist_to_excel Function
+ Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
 
  Later versions will include data visualizations handy for exploring the distribution of the data set.
 
@@ -1,73 +1,73 @@
- Metadata-Version: 2.1
- Name: qdesc
- Version: 0.1.5
- Summary: Quick and Easy way to do descriptive analysis.
- Author: Paolo Hilado
- Author-email: datasciencepgh@proton.me
- Description-Content-Type: text/markdown
- License-File: LICENCE.txt
-
- # qdesc - Quick and Easy Descriptive Analysis
-
- ## Overview
- This is a package for quick and easy descriptive analysis.
- Required packages include: pandas, numpy, and SciPy version 1.14.1
- Be sure to run the following prior to using the "qd.desc" function:
-
- - import pandas as pd
- - import numpy as np
- - from scipy.stats import anderson
- - import qdesc as qd
-
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-
- Run the function qd.desc(df) to get the following statistics:
- * count - number of observations
- * mean - measure of central tendency for normal distribution
- * std - measure of spread for normal distribution
- * median - measure of central tendency for skewed distributions or those with outliers
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
- * min - lowest observed value
- * max - highest observed value
- * AD_stat - Anderson - Darling Statistic
- * 5% crit_value - critical value for a 5% Significance Level
- * 1% crit_value - critical value for a 1% Significance Level
-
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
- * Variable Levels (i.e., for Sex Variable: Male and Female)
- * Counts - the number of observations
- * Percentage - percentage of observations from total.
-
- Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
- the categorical variables in your data frame. The resulting table will include columns such as:
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
- * Counts - the number of observations
- * Percentage - percentage of observations from total.
-
-
- Later versions will include data visualizations handy for exploring the distribution of the data set.
-
- ## Installation
- pip install qdesc
-
- ## Usage - doing descriptive analysis using qdesc
- ### import qdesc as qd
- ### qd.desc(df)
-
- ## License
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
-
- ## Acknowledgements
- Acknowledgement of the libraries used by this package...
-
- ### Pandas
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
- ### NumPy
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
- ### SciPy
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
-
-
-
-
-
+ # qdesc - Quick and Easy Descriptive Analysis
+
+ ## Overview
+ This is a package for quick and easy descriptive analysis.
+ Required packages include: pandas, numpy, and SciPy version 1.14.1
+ Be sure to run the following prior to using the "qd.desc" function:
+
+ - import pandas as pd
+ - import numpy as np
+ - from scipy.stats import anderson
+ - import qdesc as qd
+
+ The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+
+ ## qd.desc Function
+ Run the function qd.desc(df) to get the following statistics:
+ * count - number of observations
+ * mean - measure of central tendency for normal distribution
+ * std - measure of spread for normal distribution
+ * median - measure of central tendency for skewed distributions or those with outliers
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+ * min - lowest observed value
+ * max - highest observed value
+ * AD_stat - Anderson - Darling Statistic
+ * 5% crit_value - critical value for a 5% Significance Level
+ * 1% crit_value - critical value for a 1% Significance Level
+
+ ## qd.freqdist Function
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ ## qd.freqdist_a Function
+ Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
+ the categorical variables in your data frame. The resulting table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ ## qd.freqdist_to_excel Function
+ Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+
+ Later versions will include data visualizations handy for exploring the distribution of the data set.
+
+ ## Installation
+ pip install qdesc
+
+ ## Usage - doing descriptive analysis using qdesc
+ ### import qdesc as qd
+ ### qd.desc(df)
+
+ ## License
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+
+ ## Acknowledgements
+ Acknowledgement of the libraries used by this package...
+
+ ### Pandas
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+ ### NumPy
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+ ### SciPy
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
+
+
+
+
+
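
Reviewer note on the `MAD` entry in the README above: the text contrasts `std` with a "manual" Median Absolute Deviation but does not show the computation. The following is a minimal standard-library sketch of the unscaled MAD, median(|x - median(x)|). It is not qdesc's implementation; the helper name `median_abs_deviation` and the sample data are illustrative assumptions, and whether qdesc applies a scaling constant is not visible in this diff.

```python
from statistics import mean, median, stdev


def median_abs_deviation(values):
    # Unscaled MAD: the median of absolute deviations from the median.
    # Assumption: no consistency constant (such as 1.4826) is applied.
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Illustrative data with a single large outlier (40).
data = [2, 3, 3, 4, 5, 6, 40]

print("mean  :", mean(data))                  # pulled upward by the outlier
print("std   :", round(stdev(data), 2))       # inflated by the outlier
print("median:", median(data))                # barely affected
print("MAD   :", median_abs_deviation(data))  # barely affected
```

On this sample the outlier moves the mean and standard deviation substantially while the median and MAD stay close to the bulk of the data, which is the robustness property the README refers to.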
@@ -68,3 +68,36 @@ def freqdist_a(df, ascending=False):
          results.append(distribution)
      final_df = pd.concat(results, ignore_index=True)
      return final_df
+
+ def clean_sheet_name(name):
+     # Remove invalid characters
+     name = re.sub(r'[:\\/?*\[\]]', '', name)
+     # Limit to 31 characters
+     name = name.strip()[:31]
+     return name
+
+ def freqdist_to_excel(df, output_path, sort_by='Percentage', ascending=False, top_n=None):
+     used_names = set()
+     with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
+         for column in df.select_dtypes(include=['object', 'category']).columns:
+             frequency_table = df[column].value_counts()
+             percentage_table = df[column].value_counts(normalize=True) * 100
+
+             distribution = pd.DataFrame({
+                 'Value': frequency_table.index,
+                 'Count': frequency_table.values,
+                 'Percentage': percentage_table.values
+             })
+             distribution = distribution.sort_values(by=sort_by, ascending=ascending)
+             if top_n is not None:
+                 distribution = distribution.head(top_n)
+             # Generate safe sheet name
+             base_name = clean_sheet_name(column)
+             sheet_name = base_name
+             count = 1
+             while sheet_name.lower() in used_names:
+                 sheet_name = f"{base_name[:28]}_{count}"  # stay within 31 char limit
+                 count += 1
+             used_names.add(sheet_name.lower())
+             distribution.to_excel(writer, sheet_name=sheet_name, index=False)
+     print(f"Frequency distributions written to {output_path}")
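
Reviewer note on the sheet-naming logic added above: the sketch below isolates it so it can be run without pandas or an Excel writer. `clean_sheet_name` follows the hunk; `unique_sheet_name` is a hypothetical helper that wraps the loop body from `freqdist_to_excel`, and the column names are made up. The hunk does not show an `import re`, so the sketch assumes the module imports it elsewhere.

```python
import re


def clean_sheet_name(name):
    # Same steps as in the diff: drop characters that Excel rejects in
    # sheet names, trim whitespace, and cap the result at 31 characters.
    name = re.sub(r'[:\\/?*\[\]]', '', name)
    return name.strip()[:31]


def unique_sheet_name(column, used_names):
    # Hypothetical helper: the collision-handling loop from
    # freqdist_to_excel, extracted so it can be exercised on its own.
    base_name = clean_sheet_name(column)
    sheet_name = base_name
    count = 1
    # Comparison is case-insensitive, as in the diff.
    while sheet_name.lower() in used_names:
        sheet_name = f"{base_name[:28]}_{count}"
        count += 1
    used_names.add(sheet_name.lower())
    return sheet_name


used = set()
columns = ["Region/Area", "region:area", "Q1 [2024]?"]  # illustrative names
names = [unique_sheet_name(c, used) for c in columns]
print(names)
```

Two things a reviewer may want to confirm against the full module: the `[:28]` slice leaves room for an underscore plus at most two digits, so a base name of 28 or more characters would exceed 31 characters once the counter reaches 100; and `re.sub` expects a string, so non-string column labels would need converting first.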
@@ -1,64 +1,82 @@
- # qdesc - Quick and Easy Descriptive Analysis
-
- ## Overview
- This is a package for quick and easy descriptive analysis.
- Required packages include: pandas, numpy, and SciPy version 1.14.1
- Be sure to run the following prior to using the "qd.desc" function:
-
- - import pandas as pd
- - import numpy as np
- - from scipy.stats import anderson
- - import qdesc as qd
-
- The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-
- Run the function qd.desc(df) to get the following statistics:
- * count - number of observations
- * mean - measure of central tendency for normal distribution
- * std - measure of spread for normal distribution
- * median - measure of central tendency for skewed distributions or those with outliers
- * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
- * min - lowest observed value
- * max - highest observed value
- * AD_stat - Anderson - Darling Statistic
- * 5% crit_value - critical value for a 5% Significance Level
- * 1% crit_value - critical value for a 1% Significance Level
-
- Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
- * Variable Levels (i.e., for Sex Variable: Male and Female)
- * Counts - the number of observations
- * Percentage - percentage of observations from total.
-
- Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
- the categorical variables in your data frame. The resulting table will include columns such as:
- * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
- * Counts - the number of observations
- * Percentage - percentage of observations from total.
-
-
- Later versions will include data visualizations handy for exploring the distribution of the data set.
-
- ## Installation
- pip install qdesc
-
- ## Usage - doing descriptive analysis using qdesc
- ### import qdesc as qd
- ### qd.desc(df)
-
- ## License
- This project is licensed under the GPL-3 License. See the LICENSE file for more details.
-
- ## Acknowledgements
- Acknowledgement of the libraries used by this package...
-
- ### Pandas
- Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
- ### NumPy
- NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
- ### SciPy
- SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
-
-
-
-
-
+ Metadata-Version: 2.1
+ Name: qdesc
+ Version: 0.1.6
+ Summary: Quick and Easy way to do descriptive analysis.
+ Author: Paolo Hilado
+ Author-email: datasciencepgh@proton.me
+ Description-Content-Type: text/markdown
+ License-File: LICENCE.txt
+
+ # qdesc - Quick and Easy Descriptive Analysis
+
+ ## Overview
+ This is a package for quick and easy descriptive analysis.
+ Required packages include: pandas, numpy, and SciPy version 1.14.1
+ Be sure to run the following prior to using the "qd.desc" function:
+
+ - import pandas as pd
+ - import numpy as np
+ - from scipy.stats import anderson
+ - import qdesc as qd
+
+ The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+
+ ## qd.desc Function
+ Run the function qd.desc(df) to get the following statistics:
+ * count - number of observations
+ * mean - measure of central tendency for normal distribution
+ * std - measure of spread for normal distribution
+ * median - measure of central tendency for skewed distributions or those with outliers
+ * MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+ * min - lowest observed value
+ * max - highest observed value
+ * AD_stat - Anderson - Darling Statistic
+ * 5% crit_value - critical value for a 5% Significance Level
+ * 1% crit_value - critical value for a 1% Significance Level
+
+ ## qd.freqdist Function
+ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+ * Variable Levels (i.e., for Sex Variable: Male and Female)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ ## qd.freqdist_a Function
+ Run the function qd.freqdist_a(df, ascending = FALSE) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all
+ the categorical variables in your data frame. The resulting table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+ ## qd.freqdist_to_excel Function
+ Run the function qd.freqdist_to_excel(df, "Name of file.xlsx", ascending = FALSE ) to easily create frequency distribution tables, arranged in descending manner (default) or ascending (TRUE), for all the categorical variables in your data frame and SAVED as separate sheets in the .xlsx File. The resulting table will include columns such as:
+ * Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+ * Counts - the number of observations
+ * Percentage - percentage of observations from total.
+
+
+ Later versions will include data visualizations handy for exploring the distribution of the data set.
+
+ ## Installation
+ pip install qdesc
+
+ ## Usage - doing descriptive analysis using qdesc
+ ### import qdesc as qd
+ ### qd.desc(df)
+
+ ## License
+ This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+
+ ## Acknowledgements
+ Acknowledgement of the libraries used by this package...
+
+ ### Pandas
+ Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+ ### NumPy
+ NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+ ### SciPy
+ SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
+
+
+
+
+
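
Reviewer note on the frequency-distribution sections in the metadata above: the three documented columns (level, count, percentage of total) can be illustrated with a dependency-free sketch. This is not qdesc's code, which builds the table with pandas `value_counts`; the function name `freqdist_sketch`, the sort key, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def freqdist_sketch(values, ascending=False):
    # Count each level, then express each count as a percentage of the total.
    counts = Counter(values)
    total = sum(counts.values())
    rows = [
        {"Value": level, "Count": n, "Percentage": n / total * 100}
        for level, n in counts.items()
    ]
    # Descending by count unless ascending=True is requested
    # (assumed sort key; the package sorts by 'Percentage' by default).
    rows.sort(key=lambda row: row["Count"], reverse=not ascending)
    return rows


satisfaction = ["High", "Low", "High", "Moderate", "High", "Low"]
rows = freqdist_sketch(satisfaction)
for row in rows:
    print(row)
```

The README writes the flag as `ascending = FALSE` and `TRUE`, but the signatures in this diff are Python, so the values to pass are `False` and `True`.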
@@ -7,7 +7,7 @@ long_description = (this_directory / "README.md").read_text()
 
  setup(
      name='qdesc',
-     version='0.1.5',
+     version='0.1.6',
      packages=find_packages(),
      install_requires=[
          # List your dependencies here, e.g., pandas if your function requires it
File without changes