PyPI - qdesc - Versions diffs - 0.1.2__tar.gz → 0.1.4__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of qdesc might be problematic.

Files changed (12)

{qdesc-0.1.2/qdesc.egg-info → qdesc-0.1.4}/PKG-INFO +7 -1
qdesc-0.1.2/PKG-INFO → qdesc-0.1.4/README.md +63 -66
qdesc-0.1.4/qdesc/__init__.py +70 -0
qdesc-0.1.4/qdesc.egg-info/PKG-INFO +72 -0
{qdesc-0.1.2 → qdesc-0.1.4}/qdesc.egg-info/SOURCES.txt +2 -1
qdesc-0.1.4/qdesc.egg-info/top_level.txt +1 -0
{qdesc-0.1.2 → qdesc-0.1.4}/setup.py +1 -1
qdesc-0.1.2/README.txt +0 -22
qdesc-0.1.2/qdesc.egg-info/top_level.txt +0 -1
{qdesc-0.1.2 → qdesc-0.1.4}/LICENCE.txt +0 -0
{qdesc-0.1.2 → qdesc-0.1.4}/qdesc.egg-info/dependency_links.txt +0 -0
{qdesc-0.1.2 → qdesc-0.1.4}/setup.cfg +0 -0

{qdesc-0.1.2/qdesc.egg-info → qdesc-0.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: qdesc
-Version: 0.1.2
+Version: 0.1.4
 Summary: Quick and Easy way to do descriptive analysis.
 Author: Paolo Hilado
 Author-email: datasciencepgh@proton.me
@@ -38,6 +38,12 @@ Run the function qd.freqdist(df, "Variable Name") to easily create a frequency d
 * Counts - the number of observations
 * Percentage - percentage of observations from total.
+Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+table will include columns such as:
+* Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+* Counts - the number of observations
+* Percentage - percentage of observations from total.
 Later versions will include data visualizations handy for exploring the distribution of the data set.
 ## Installation
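Both `freqdist` and `freqdist_a` report a Percentage column, but the `qdesc/__init__.py` added in this release computes it differently. `freqdist` divides the (non-missing) counts by `len(df)`, while `freqdist_a` uses `value_counts(normalize=True)`, whose denominator excludes missing values. A minimal pandas sketch on hypothetical toy data shows the two denominators diverging once a column contains `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one of four responses is missing.
df = pd.DataFrame({"Sex": ["Male", "Female", "Female", np.nan]})

# value_counts() drops NaN by default, so the counts cover 3 of the 4 rows.
counts = df["Sex"].value_counts()

# freqdist-style: the denominator is len(df), which still counts the NaN row.
pct_of_all_rows = counts / len(df) * 100

# freqdist_a-style: normalize=True divides by the non-missing total instead.
pct_of_non_missing = df["Sex"].value_counts(normalize=True) * 100

# Female: 50.0 vs 66.67; Male: 25.0 vs 33.33 (the first column sums to 75).
print(pd.DataFrame({"freqdist": pct_of_all_rows,
                    "freqdist_a": pct_of_non_missing}).round(2))
```

With no missing values the two approaches agree; with missing values, `freqdist`-style percentages sum to less than 100.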

qdesc-0.1.2/PKG-INFO → qdesc-0.1.4/README.md RENAMED

@@ -1,66 +1,63 @@
-Metadata-Version: 2.1
-Name: qdesc
-Version: 0.1.2
-Summary: Quick and Easy way to do descriptive analysis.
-Author: Paolo Hilado
-Author-email: datasciencepgh@proton.me
-Description-Content-Type: text/markdown
-License-File: LICENCE.txt
-# qdesc - Quick and Easy Descriptive Analysis
-## Overview
-This is a package for quick and easy descriptive analysis.
-Required packages include: pandas, numpy, and SciPy version 1.14.1
-Be sure to run the following prior to using the "qd.desc" function:
-- import pandas as pd
-- import numpy as np
-- from scipy.stats import anderson
-- import qdesc as qd
-The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-Run the function qd.desc(df) to get the following statistics:
-* count - number of observations
-* mean - measure of central tendency for normal distribution
-* std - measure of spread for normal distribution
-* median - measure of central tendency for skewed distributions or those with outliers
-* MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
-* min - lowest observed value
-* max - highest observed value
-* AD_stat	- Anderson - Darling Statistic
-* 5% crit_value - critical value for a 5% Significance Level
-* 1% crit_value - critical value for a 1% Significance Level
-Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
-* Variable Levels (i.e., for Sex Variable: Male and Female)
-* Counts - the number of observations
-* Percentage - percentage of observations from total.
-Later versions will include data visualizations handy for exploring the distribution of the data set.
-## Installation
-pip install qdesc
-## Usage - doing descriptive analysis using qdesc
-### import qdesc as qd
-### qd.desc(df)
-## License
-This project is licensed under the GPL-3 License. See the LICENSE file for more details.
-## Acknowledgements
-Acknowledgement of the libraries used by this package...
-### Pandas
-Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
-### NumPy
-NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
-### SciPy
-SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
+# qdesc - Quick and Easy Descriptive Analysis
+## Overview
+This is a package for quick and easy descriptive analysis.
+Required packages include: pandas, numpy, and SciPy version 1.14.1
+Be sure to run the following prior to using the "qd.desc" function:
+- import pandas as pd
+- import numpy as np
+- from scipy.stats import anderson
+- import qdesc as qd
+The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+Run the function qd.desc(df) to get the following statistics:
+* count - number of observations
+* mean - measure of central tendency for normal distribution
+* std - measure of spread for normal distribution
+* median - measure of central tendency for skewed distributions or those with outliers
+* MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+* min - lowest observed value
+* max - highest observed value
+* AD_stat	- Anderson - Darling Statistic
+* 5% crit_value - critical value for a 5% Significance Level
+* 1% crit_value - critical value for a 1% Significance Level
+Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+* Variable Levels (i.e., for Sex Variable: Male and Female)
+* Counts - the number of observations
+* Percentage - percentage of observations from total.
+Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+table will include columns such as:
+* Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+* Counts - the number of observations
+* Percentage - percentage of observations from total.
+Later versions will include data visualizations handy for exploring the distribution of the data set.
+## Installation
+pip install qdesc
+## Usage - doing descriptive analysis using qdesc
+### import qdesc as qd
+### qd.desc(df)
+## License
+This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+## Acknowledgements
+Acknowledgement of the libraries used by this package...
+### Pandas
+Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+### NumPy
+NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+### SciPy
+SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
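The README describes MAD as a "manual" Median Absolute Deviation. Going by `desc` in `qdesc/__init__.py`, it is the median of the absolute deviations from the median, with no consistency constant (such as 1.4826) applied. A small numpy sketch with one hypothetical outlier illustrates why it is called more robust than the standard deviation:

```python
import numpy as np

# Hypothetical sample: five typical values plus one extreme outlier.
values = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 100.0])

# MAD as desc computes it: the median of absolute deviations from the median,
# with no consistency constant applied.
median = np.median(values)
mad = np.median(np.abs(values - median))

# Sample standard deviation (ddof=1), the convention pandas' describe() uses.
std = np.std(values, ddof=1)

# median = 3.5 and MAD = 1.0, while std is about 39.45 because of the outlier.
print(f"median={median}, MAD={mad}, std={std:.2f}")
```

The single outlier inflates the standard deviation by more than an order of magnitude, while the MAD stays on the scale of the typical values.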

qdesc-0.1.4/qdesc/__init__.py ADDED

@@ -0,0 +1,70 @@
+def desc(df):
+    import pandas as pd
+    import numpy as np
+    from scipy.stats import anderson
+    x = np.round(df.describe().T,2)
+    x = x.iloc[:, [0,1,2,5,3,7]]
+    x.rename(columns={'50%': 'median'}, inplace=True)
+    mad_values = {}
+    # computes the manual mad which is more robust to outliers and non-normal distributions
+    for column in df.select_dtypes(include=[np.number]):
+        median = np.median(df[column])
+        abs_deviation = np.abs(df[column] - median)
+        mad = np.median(abs_deviation)
+        mad_values[column] = mad
+    mad_df = pd.DataFrame(list(mad_values.items()), columns=['Variable', 'MAD'])
+    mad_df.set_index('Variable', inplace=True)
+    results = {}
+    # Loop through each column to test only continuous variables (numeric columns)
+    for column in df.select_dtypes(include=[np.number]):  # Only continuous variables
+        result = anderson(df[column])
+        statistic = result.statistic
+        critical_values = result.critical_values
+        # Only select the 5% and 1% significance levels
+        selected_critical_values = {
+            '5% crit_value': critical_values[2],  # 5% critical value
+            '1% crit_value': critical_values[4]   # 1% critical value
+        }
+        # Store the results in a dictionary
+        results[column] = {
+            'AD_stat': statistic,
+            **selected_critical_values  # Add critical values for 5% and 1% levels
+        }
+    # Convert the results dictionary into a DataFrame
+    anderson_df = pd.DataFrame.from_dict(results, orient='index')
+    xl = x.iloc[:, :4]
+    xr = x.iloc[:, 4:]
+    x_df = np.round(pd.concat([xl, mad_df, xr, anderson_df], axis=1),2)
+    return x_df
+def freqdist(df, column_name):
+    import pandas as pd
+    if column_name not in df.columns:
+        raise ValueError(f"Column '{column_name}' not found in DataFrame.")
+    if df[column_name].dtype not in ['object', 'category']:
+        raise ValueError(f"Column '{column_name}' is not a categorical column.")
+    freq_dist = df[column_name].value_counts().reset_index()
+    freq_dist.columns = [column_name, 'Count']
+    freq_dist['Percentage'] = (freq_dist['Count'] / len(df)) * 100
+    return freq_dist
+def freqdist_a(df):
+    results = []  # List to store distributions
+    for column in df.select_dtypes(include=['object', 'category']).columns:
+        frequency_table = df[column].value_counts()
+        percentage_table = df[column].value_counts(normalize=True) * 100
+        distribution = pd.DataFrame({
+            'Column': column,
+            'Value': frequency_table.index,
+            'Count': frequency_table.values,
+            'Percentage': percentage_table.values
+        })
+        results.append(distribution)
+    # Combine all distributions into a single DataFrame
+    final_df = pd.concat(results, ignore_index=True)
+    return final_df
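As released, `freqdist_a` calls `pd.DataFrame` and `pd.concat`, but pandas is imported only locally inside `desc` and `freqdist` and the module has no top-level import, so calling `qd.freqdist_a(df)` as written should raise `NameError: name 'pd' is not defined`. Below is a corrected sketch, not the author's code, that keeps the same output columns. It derives percentages from the same counts so the rows always align, and the empty-result guard is an addition of this sketch (`pd.concat([])` raises `ValueError`):

```python
import pandas as pd


def freqdist_a(df: pd.DataFrame) -> pd.DataFrame:
    """Stack frequency tables for every object/category column into one long table."""
    results = []
    for column in df.select_dtypes(include=["object", "category"]).columns:
        counts = df[column].value_counts()  # NaN excluded by default
        # Deriving percentages from the same counts keeps rows aligned and is
        # equivalent to value_counts(normalize=True).
        percentages = counts / counts.sum() * 100
        results.append(pd.DataFrame({
            "Column": column,
            "Value": counts.index,
            "Count": counts.values,
            "Percentage": percentages.values,
        }))
    # Guard added in this sketch: pd.concat([]) raises ValueError when the
    # frame has no object/category columns.
    if not results:
        return pd.DataFrame(columns=["Column", "Value", "Count", "Percentage"])
    return pd.concat(results, ignore_index=True)


if __name__ == "__main__":
    # Hypothetical survey data; categorical columns are stored as "category".
    survey = pd.DataFrame({
        "Sex": pd.Series(["Male", "Female", "Female"], dtype="category"),
        "Satisfaction": pd.Series(["High", "Low", "High"], dtype="category"),
        "Age": [21, 34, 29],
    })
    # One row per level of Sex and Satisfaction; numeric Age is skipped.
    print(freqdist_a(survey))
```

Alternatively, a single `import pandas as pd` at the top of `qdesc/__init__.py` would fix all three functions at once.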

qdesc-0.1.4/qdesc.egg-info/PKG-INFO ADDED

@@ -0,0 +1,72 @@
+Metadata-Version: 2.1
+Name: qdesc
+Version: 0.1.4
+Summary: Quick and Easy way to do descriptive analysis.
+Author: Paolo Hilado
+Author-email: datasciencepgh@proton.me
+Description-Content-Type: text/markdown
+License-File: LICENCE.txt
+# qdesc - Quick and Easy Descriptive Analysis
+## Overview
+This is a package for quick and easy descriptive analysis.
+Required packages include: pandas, numpy, and SciPy version 1.14.1
+Be sure to run the following prior to using the "qd.desc" function:
+- import pandas as pd
+- import numpy as np
+- from scipy.stats import anderson
+- import qdesc as qd
+The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
+Run the function qd.desc(df) to get the following statistics:
+* count - number of observations
+* mean - measure of central tendency for normal distribution
+* std - measure of spread for normal distribution
+* median - measure of central tendency for skewed distributions or those with outliers
+* MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
+* min - lowest observed value
+* max - highest observed value
+* AD_stat	- Anderson - Darling Statistic
+* 5% crit_value - critical value for a 5% Significance Level
+* 1% crit_value - critical value for a 1% Significance Level
+Run the function qd.freqdist(df, "Variable Name") to easily create a frequency distribution for your chosen categorical variable with the following:
+* Variable Levels (i.e., for Sex Variable: Male and Female)
+* Counts - the number of observations
+* Percentage - percentage of observations from total.
+Run the function qd.freqdist_a(df) to easily create frequency distribution tables for all the categorical variables in your data frame. The resulting
+table will include columns such as:
+* Variable levels (i.e., for Satisfaction: Very Low, Low, Moderate, High, Very High)
+* Counts - the number of observations
+* Percentage - percentage of observations from total.
+Later versions will include data visualizations handy for exploring the distribution of the data set.
+## Installation
+pip install qdesc
+## Usage - doing descriptive analysis using qdesc
+### import qdesc as qd
+### qd.desc(df)
+## License
+This project is licensed under the GPL-3 License. See the LICENSE file for more details.
+## Acknowledgements
+Acknowledgement of the libraries used by this package...
+### Pandas
+Pandas is distributed under the BSD 3-Clause License, pandas is developed by Pandas contributors. Copyright (c) 2008-2024, the pandas development team All rights reserved.
+### NumPy
+NumPy is distributed under the BSD 3-Clause License, numpy is developed by NumPy contributors. Copyright (c) 2005-2024, NumPy Developers. All rights reserved.
+### SciPy
+SciPy is distributed under the BSD License, scipy is developed by SciPy contributors. Copyright (c) 2001-2024, SciPy Developers. All rights reserved.
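The AD_stat, 5% crit_value and 1% crit_value columns come from `scipy.stats.anderson`, and `desc` picks the critical values by position as `critical_values[2]` and `critical_values[4]`. For the default `dist="norm"`, SciPy reports critical values at the 15%, 10%, 5%, 2.5% and 1% significance levels, so those positions do correspond to 5% and 1%. The sketch below, on hypothetical simulated data, looks the levels up by value through `significance_level` instead, which makes the mapping explicit:

```python
import numpy as np
from scipy.stats import anderson

# Hypothetical data: 200 draws from a normal distribution, seeded for reproducibility.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)

result = anderson(sample, dist="norm")

# Look critical values up by significance level (in percent) rather than by
# position; for dist="norm" the levels are 15, 10, 5, 2.5 and 1.
levels = [float(level) for level in result.significance_level]
crit_5 = result.critical_values[levels.index(5.0)]
crit_1 = result.critical_values[levels.index(1.0)]

print(f"AD_stat={result.statistic:.3f}, 5% crit={crit_5:.3f}, 1% crit={crit_1:.3f}")
# Normality is rejected at a level when AD_stat exceeds that level's critical value.
print("reject normality at 5%:", result.statistic > crit_5)
```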

{qdesc-0.1.2 → qdesc-0.1.4}/qdesc.egg-info/SOURCES.txt RENAMED

@@ -1,6 +1,7 @@
 LICENCE.txt
-README.txt
+README.md
 setup.py
+qdesc/__init__.py
 qdesc.egg-info/PKG-INFO
 qdesc.egg-info/SOURCES.txt
 qdesc.egg-info/dependency_links.txt

qdesc-0.1.4/qdesc.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+qdesc

{qdesc-0.1.2 → qdesc-0.1.4}/setup.py RENAMED

@@ -7,7 +7,7 @@ long_description = (this_directory / "README.md").read_text()
 setup(
     name='qdesc',
-    version='0.1.2',
+    version='0.1.4',
     packages=find_packages(),
     install_requires=[
         # List your dependencies here, e.g., pandas if your function requires it

qdesc-0.1.2/README.txt DELETED

@@ -1,22 +0,0 @@
-This is a package for quick and easy descriptive analysis.
-Required packages include: pandas, numpy, and SciPy version 1.14.1
-Be sure to run the following prior to using the "qd.desc" function:
-import pandas as pd
-import numpy as np
-from scipy.stats import anderson
-import qdesc as qd
-The qdesc package provides a quick and easy approach to do descriptive analysis for quantitative data.
-run the function qd.desc(df) to get the following statistics:
-count - number of observations
-mean - measure of central tendency for normal distribution
-std - measure of spread for normal distribution
-median - measure of central tendency for skewed distributions or those with outliers
-MAD - measure of spread for skewed distributions or those with outliers; this is manual Median Absolute Deviation (MAD) which is more robust when dealing with non-normal distributions.
-min - lowest observed value
-max - highest observed value
-AD_stat	- Anderson - Darling Statistic
-5% crit_value - critical value for a 5% Significance Level
-1% crit_value - critical value for a 1% Significance Level

qdesc-0.1.2/qdesc.egg-info/top_level.txt DELETED

@@ -1 +0,0 @@
-

{qdesc-0.1.2 → qdesc-0.1.4}/LICENCE.txt RENAMED

File without changes

{qdesc-0.1.2 → qdesc-0.1.4}/qdesc.egg-info/dependency_links.txt RENAMED

File without changes

{qdesc-0.1.2 → qdesc-0.1.4}/setup.cfg RENAMED

File without changes
