PyPI - rgwfuncs - Versions diff: rgwfuncs-0.0.21-py3-none-any.whl → rgwfuncs-0.0.54-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

rgwfuncs/__init__.py +5 -2
rgwfuncs/algebra_lib.py +901 -0
rgwfuncs/df_lib.py +111 -61
rgwfuncs/docs_lib.py +51 -0
rgwfuncs/interactive_shell_lib.py +32 -0
rgwfuncs/str_lib.py +8 -44
{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/METADATA +517 -92
rgwfuncs-0.0.54.dist-info/RECORD +12 -0
rgwfuncs-0.0.21.dist-info/RECORD +0 -9
{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/LICENSE +0 -0
{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/WHEEL +0 -0
{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/entry_points.txt +0 -0
{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/top_level.txt +0 -0

{rgwfuncs-0.0.21.dist-info → rgwfuncs-0.0.54.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: rgwfuncs
-Version: 0.0.21
+Version: 0.0.54
 Summary: A functional programming paradigm for mathematical modelling and data science
 Home-page: https://github.com/ryangerardwilson/rgwfunc
 Author: Ryan Gerard Wilson
@@ -23,12 +23,16 @@ Requires-Dist: xgboost
 Requires-Dist: requests
 Requires-Dist: slack-sdk
 Requires-Dist: google-api-python-client
+Requires-Dist: boto3
 # RGWFUNCS
 ***By Ryan Gerard Wilson (https://ryangerardwilson.com)***
-This library is meant to make ML/ Data Science pipelines more readable. It assumes a linux environment, and the existence of a `.rgwfuncsrc` file for certain features (like db querying, sending data to slack, etc.)
+This library is meant to protect your eyes (and brain) from OOP syntax. It is unbelievably sad that some of the best work done in creating math and data science libraries in Python has been corrupted by the OOP mind-virus.
+By creating a functional-programming wrapper around these libraries, we aim to soothe. This library assumes a Linux environment and the existence of a `.rgwfuncsrc` file for certain features (like database querying, sending data to Slack, etc.).
 --------------------------------------------------------------------------------
@@ -75,6 +79,15 @@ A `.rgwfuncsrc` file (located at `vi ~/.rgwfuncsrc) is required for MSSQL, CLICK
           "db_type": "google_big_query",
           "json_file_path": "",
           "project_id": ""
+        },
+        {
+          "name": "athena_db1",
+          "db_type": "aws_athena",
+          "aws_access_key": "",
+          "aws_secret_key": "",
+          "aws_region": "",
+          "database": "logs",
+          "output_bucket": "s3://bucket-name"
         }
       ],
       "vm_presets": [
@@ -135,22 +148,415 @@ To display all docstrings, use:
 --------------------------------------------------------------------------------
-## String Based Functions
+## Documentation Access
-### 1. str_docs
-Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
+### 1. docs
+Print a list of available function names in alphabetical order. If a filter is provided, print the docstrings of functions containing the term.
 • Parameters:
   - `method_type_filter` (str): Optional, comma-separated to select docstring types, or '*' for all.
 • Example:
-    import rgwfuncs
-    rgwfuncs.str_docs(method_type_filter='numeric_clean,limit_dataframe')
+    from rgwfuncs import docs
+    docs(method_type_filter='numeric_clean,limit_dataframe')
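The listing-and-filter behaviour described for `docs` can be illustrated with a minimal sketch. This is not the library's implementation: `list_docs` and the two stand-in functions are hypothetical names, and the exact matching rule used by `docs` is an assumption (here a filter term must equal a function name).

```python
import inspect


def list_docs(namespace, method_type_filter=None):
    """Return sorted function names and, if a filter is given, matching docstrings."""
    names = sorted(name for name, obj in namespace.items() if inspect.isfunction(obj))
    if method_type_filter is None:
        return names, {}
    if method_type_filter.strip() == "*":
        wanted = set(names)  # '*' selects every function
    else:
        wanted = {term.strip() for term in method_type_filter.split(",")}
    docstrings = {name: inspect.getdoc(namespace[name]) for name in names if name in wanted}
    return names, docstrings


# Hypothetical stand-ins for library functions.
def numeric_clean(df):
    """Clean numeric columns."""


def limit_dataframe(df):
    """Limit the DataFrame to n rows."""


names, docstrings = list_docs(
    {"numeric_clean": numeric_clean, "limit_dataframe": limit_dataframe, "version": "0.0.54"},
    method_type_filter="numeric_clean",
)
print(names)       # ['limit_dataframe', 'numeric_clean']
print(docstrings)  # {'numeric_clean': 'Clean numeric columns.'}
```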
 --------------------------------------------------------------------------------
-### 2. send_telegram_message
+## Interactive Shell
+This section includes functions that facilitate launching an interactive Python shell to inspect and modify local variables within the user's environment.
+### 1. `interactive_shell`
+Launches an interactive prompt for inspecting and modifying local variables, making all methods in the rgwfuncs library available by default. This REPL (Read-Eval-Print Loop) environment supports command history and autocompletion, making it easier to interact with your Python code. This function is particularly useful for debugging purposes when you want real-time interaction with your program's execution environment.
+• Parameters:
+  - `local_vars` (dict, optional): A dictionary of local variables to be accessible within the interactive shell. If not provided, defaults to an empty dictionary.
+• Usage:
+  - You can call this function to enter an interactive shell where you can view and modify the variables in the given local scope.
+• Example:
+    from rgwfuncs import interactive_shell
+    import pandas as pd
+    import numpy as np
+    # Example DataFrame
+    df = pd.DataFrame({
+        'id': [1, 2, 3, 4, 5],
+        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
+        'age': [30, 25, 35, 28, 22],
+        'city': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', 'Boston']
+    })
+    # Launch the interactive shell with local variables
+    interactive_shell(locals())
+In the interactive shell you can then use any library imported in your Python file, as well as all rgwfuncs methods (even those you did not import). Note that pandas and numpy are available in the shell because the script above imports them, whereas the rgwfuncs method `first_n_rows` was never imported yet is still available for use.
+    Welcome to the rgwfuncs interactive shell.
+    >>> pirst_n_rows(df, 2)
+    Traceback (most recent call last):
+      File "<console>", line 1, in <module>
+    NameError: name 'pirst_n_rows' is not defined. Did you mean: 'first_n_rows'?
+    >>> first_n_rows(df, 2)
+    {'age': '30', 'city': 'New York', 'id': '1', 'name': 'Alice'}
+    {'age': '25', 'city': 'Los Angeles', 'id': '2', 'name': 'Bob'}
+    >>> print(df)
+      id     name age           city
+    0  1    Alice  30       New York
+    1  2      Bob  25    Los Angeles
+    2  3  Charlie  35        Chicago
+    3  4    David  28  San Francisco
+    4  5      Eva  22         Boston
+    >>> arr = np.array([1, 2, 3, 4, 5])
+    >>> arr
+    array([1, 2, 3, 4, 5])
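One way such a shell can expose both the caller's variables and library helpers is to merge them into a single namespace and hand it to the standard library's `code.InteractiveConsole`. The sketch below assumes that approach; the diff does not show how `interactive_shell` is actually built, and `make_console` and `first_two` are hypothetical names.

```python
import code


def make_console(local_vars=None, extra_helpers=None):
    """Build a console whose namespace merges helper functions with caller variables."""
    namespace = dict(extra_helpers or {})  # helpers go in first ...
    namespace.update(local_vars or {})     # ... so caller variables win on name clashes
    return code.InteractiveConsole(locals=namespace), namespace


def first_two(items):
    """Hypothetical helper standing in for a library function."""
    return items[:2]


console, namespace = make_console({"values": [1, 2, 3]}, {"first_two": first_two})

# push() runs one line of source as if it had been typed at the prompt.
console.push("head = first_two(values)")
print(namespace["head"])  # [1, 2]

# console.interact(banner="Welcome") would start the real interactive prompt.
```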
+--------------------------------------------------------------------------------
+## Algebra Based Functions
+This section provides comprehensive functions for handling algebraic expressions, performing tasks such as computation, simplification, solving equations, and prime factorization, all outputted in LaTeX format.
+### 1. `compute_prime_factors`
+Computes prime factors of a number and presents them in LaTeX format.
+• Parameters:
+  - `n` (int): The integer to factorize.
+• Returns:
+  - `str`: Prime factorization in LaTeX.
+• Example:
+    from rgwfuncs import compute_prime_factors
+    factors_1 = compute_prime_factors(100)
+    print(factors_1)  # Output: "2^{2} \cdot 5^{2}"
+    factors_2 = compute_prime_factors(60)
+    print(factors_2)  # Output: "2^{2} \cdot 3 \cdot 5"
+    factors_3 = compute_prime_factors(17)
+    print(factors_3)  # Output: "17"
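For reference, the LaTeX strings shown in the example can be reproduced with plain trial division. This is a minimal sketch under that assumption, not the library's code; `prime_factors_latex` is a hypothetical name.

```python
def prime_factors_latex(n: int) -> str:
    """Factorize n by trial division and format the result as LaTeX."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = {}
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors[divisor] = factors.get(divisor, 0) + 1
            n //= divisor
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    parts = [f"{p}^{{{e}}}" if e > 1 else str(p) for p, e in sorted(factors.items())]
    return r" \cdot ".join(parts)


print(prime_factors_latex(100))  # 2^{2} \cdot 5^{2}
print(prime_factors_latex(60))   # 2^{2} \cdot 3 \cdot 5
print(prime_factors_latex(17))   # 17
```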
+--------------------------------------------------------------------------------
+### 2. `compute_constant_expression`
+Evaluates a constant expression provided as a string and returns the computed result as a float. Supports arithmetic operations, including addition, subtraction, multiplication, division, and modulo, as well as mathematical functions from Python's math module.
+• Parameters:
+  - `expression` (str): The constant expression to compute. This should be a string consisting of arithmetic operations and Python's math module functions.
+• Returns:
+  - `float`: The computed numerical result.
+• Example:
+    from rgwfuncs import compute_constant_expression
+    result1 = compute_constant_expression("2 + 2")
+    print(result1)  # Output: 4.0
+    result2 = compute_constant_expression("10 % 3")
+    print(result2)  # Output: 1.0
+    result3 = compute_constant_expression("math.gcd(36, 60) * math.sin(math.radians(45)) * 10000")
+    print(result3)  # Output: 84852.8137423857
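A string expression of this kind can be evaluated by exposing only the `math` module to `eval`. The sketch below assumes that approach and is not taken from the library; `evaluate_constant` is a hypothetical name, and a restricted `eval` is not a security sandbox, so it should only be given trusted input.

```python
import math


def evaluate_constant(expression: str) -> float:
    """Evaluate an arithmetic expression that may reference the math module."""
    # Only `math` is visible; an empty __builtins__ hides names such as open().
    allowed_names = {"__builtins__": {}, "math": math}
    return float(eval(expression, allowed_names, {}))


print(evaluate_constant("2 + 2"))   # 4.0
print(evaluate_constant("10 % 3"))  # 1.0
print(evaluate_constant("math.gcd(36, 60) * math.sin(math.radians(45)) * 10000"))
```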
+--------------------------------------------------------------------------------
+### 3. `compute_constant_expression_involving_matrices`
+Computes the result of a constant expression involving matrices and returns it as a LaTeX string.
+• Parameters:
+  - `expression` (str): The constant expression involving matrices. Example format includes operations such as "+", "-", "*", "/".
+• Returns:
+  - `str`: The LaTeX-formatted string representation of the computed matrix, or an error message if the operations cannot be performed due to dimensional mismatches.
+• Example:
+    from rgwfuncs import compute_constant_expression_involving_matrices
+    # Example with addition of 2D matrices
+    result = compute_constant_expression_involving_matrices("[[2, 6, 9], [1, 3, 5]] + [[1, 2, 3], [4, 5, 6]]")
+    print(result)  # Output: \begin{bmatrix}3 & 8 & 12\\5 & 8 & 11\end{bmatrix}
+    # Example of mixed operations with 1D matrices treated as 2D
+    result = compute_constant_expression_involving_matrices("[3, 6, 9] + [1, 2, 3] - [2, 2, 2]")
+    print(result)  # Output: \begin{bmatrix}2 & 6 & 10\end{bmatrix}
+    # Example with dimension mismatch
+    result = compute_constant_expression_involving_matrices("[[4, 3, 51]] + [[1, 1]]")
+    print(result)  # Output: Operations between matrices must involve matrices of the same dimension
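The dimension check and the `bmatrix` output format can be sketched in plain Python for the addition case. This is an illustration only: `add_matrices_latex` is a hypothetical helper that takes already-parsed lists rather than an expression string.

```python
def add_matrices_latex(a, b):
    """Add two 2D matrices elementwise and format the result as a LaTeX bmatrix."""
    same_shape = len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    if not same_shape:
        return "Operations between matrices must involve matrices of the same dimension"
    rows = [" & ".join(str(x + y) for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]
    # Rows are separated by a LaTeX line break (two backslashes).
    return r"\begin{bmatrix}" + r"\\".join(rows) + r"\end{bmatrix}"


print(add_matrices_latex([[2, 6, 9], [1, 3, 5]], [[1, 2, 3], [4, 5, 6]]))
# \begin{bmatrix}3 & 8 & 12\\5 & 8 & 11\end{bmatrix}
print(add_matrices_latex([[4, 3, 51]], [[1, 1]]))
# Operations between matrices must involve matrices of the same dimension
```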
+--------------------------------------------------------------------------------
+### 4. `compute_constant_expression_involving_ordered_series`
+Computes the result of a constant expression involving ordered series and returns it as a LaTeX string.
+• Parameters:
+  - `expression` (str): A series operation expression. Supports operations such as "+", "-", "*", "/", and `dd()` for discrete differences.
+• Returns:
+  - `str`: The string representation of the resultant series after performing operations, or an error message if series lengths do not match.
+• Example:
+    from rgwfuncs import compute_constant_expression_involving_ordered_series
+    # Example with addition and discrete differences
+    result = compute_constant_expression_involving_ordered_series("dd([2, 6, 9, 60]) + dd([78, 79, 80])")
+    print(result)  # Output: [4, 3, 51] + [1, 1]
+    # Example with elementwise subtraction
+    result = compute_constant_expression_involving_ordered_series("[10, 15, 21] - [5, 5, 5]")
+    print(result)  # Output: [5, 10, 16]
+    # Example with length mismatch
+    result = compute_constant_expression_involving_ordered_series("[4, 3, 51] + [1, 1]")
+    print(result)  # Output: Operations between ordered series must involve series of equal length
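To make the `dd()` step concrete: a discrete difference replaces a series with the gaps between consecutive elements, so the result is one element shorter than the input. The sketch below is plain Python with hypothetical helper names, not the library's parser. It shows why `dd([2, 6, 9, 60])` and `dd([78, 79, 80])` reduce to `[4, 3, 51]` and `[1, 1]`, which differ in length and therefore cannot be added.

```python
def dd(series):
    """Discrete difference: each element minus its predecessor."""
    return [b - a for a, b in zip(series, series[1:])]


def add_series(left, right):
    """Elementwise addition, refusing series of unequal length."""
    if len(left) != len(right):
        return "Operations between ordered series must involve series of equal length"
    return [x + y for x, y in zip(left, right)]


print(dd([2, 6, 9, 60]))  # [4, 3, 51]
print(dd([78, 79, 80]))   # [1, 1]
print(add_series(dd([2, 6, 9, 60]), dd([78, 79, 80])))
# Operations between ordered series must involve series of equal length
```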
+--------------------------------------------------------------------------------
+### 5. `python_polynomial_expression_to_latex`
+Converts a polynomial expression written in Python syntax to a LaTeX formatted string. This function parses algebraic expressions provided as strings using Python’s syntax and translates them into equivalent LaTeX representations, making them suitable for academic or professional documentation. The function supports inclusion of named variables, with an option to substitute specific values into the expression.
+• Parameters:
+  - `expression` (str): The algebraic expression to convert to LaTeX. This should be a string formatted with Python syntax acceptable by SymPy.
+  - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables.
+• Returns:
+  - `str`: The LaTeX formatted string equivalent to the provided expression.
+• Raises:
+  - `ValueError`: If the expression cannot be parsed due to syntax errors.
+• Example:
+    from rgwfuncs import python_polynomial_expression_to_latex
+    # Convert a simple polynomial expression to LaTeX format
+    latex_result1 = python_polynomial_expression_to_latex("x**2 + y**2")
+    print(latex_result1)  # Output: "x^{2} + y^{2}"
+    # Convert polynomial expression with substituted values
+    latex_result2 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3, "y": 4})
+    print(latex_result2)  # Output: "25"
+    # Another example with partial substitution
+    latex_result3 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3})
+    print(latex_result3)  # Output: "y^{2} + 9"
+    # Trigonometric functions included with symbolic variables
+    latex_result4 = python_polynomial_expression_to_latex("sin(x+z**2) + cos(y)", {"x": 55})
+    print(latex_result4)  # Output: "cos y + sin \\left(z^{2} + 55\\right)"
+    # Simplified trigonometric functions example with substitution
+    latex_result5 = python_polynomial_expression_to_latex("sin(x) + cos(y)", {"x": 0})
+    print(latex_result5)  # Output: "cos y"
+--------------------------------------------------------------------------------
+### 6. `expand_polynomial_expression`
+Expands a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function takes algebraic expressions provided as strings using Python's syntax, applies polynomial expansion through SymPy, and translates them into LaTeX representations, suitable for academic or professional documentation. It supports expressions with named variables and provides an option to substitute specific values into the expression before expansion.
+• Parameters:
+  - `expression` (str): The algebraic expression to expand and convert to LaTeX. This string should be formatted using Python syntax acceptable by SymPy.
+  - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables before expanding.
+• Returns:
+  - `str`: The LaTeX formatted string of the expanded expression.
+• Raises:
+  - `ValueError`: If the expression cannot be parsed due to syntax errors.
+• Example:
+    from rgwfuncs import expand_polynomial_expression
+    # Expand a simple polynomial expression and convert to LaTeX
+    latex_result1 = expand_polynomial_expression("(x + y)**2")
+    print(latex_result1)  # Output: "x^{2} + 2 x y + y^{2}"
+    # Expand polynomial expression with substituted values
+    latex_result2 = expand_polynomial_expression("(x + y)**2", {"x": 3, "y": 4})
+    print(latex_result2)  # Output: "49"
+    # Another example with partial substitution
+    latex_result3 = expand_polynomial_expression("(x + y)**2", {"x": 3})
+    print(latex_result3)  # Output: "y^{2} + 6 y + 9"
+    # Handling trigonometric functions with symbolic variables
+    latex_result4 = expand_polynomial_expression("sin(x + z**2) + cos(y)", {"x": 55})
+    print(latex_result4)  # Output: "cos y + sin \\left(z^{2} + 55\\right)"
+--------------------------------------------------------------------------------
+### 7. `factor_polynomial_expression`
+Factors a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function parses an algebraic expression, performs polynomial factoring using SymPy, and converts the factored expression into a LaTeX representation, ideal for academic or professional use. Optional substitutions can be made before factoring.
+• Parameters:
+  - `expression` (str): The polynomial expression to factor and convert to LaTeX. This should be a valid expression formatted using Python syntax.
+  - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions. The keys are variable names in the expression, and the values are numbers that replace these variables.
+• Returns:
+  - `str`: The LaTeX formatted string representing the factored expression.
+• Raises:
+  - `ValueError`: If the expression cannot be parsed due to syntax errors.
+• Example:
+    from rgwfuncs import factor_polynomial_expression
+    # Factor a polynomial expression and convert to LaTeX
+    latex_result1 = factor_polynomial_expression("x**2 - 4")
+    print(latex_result1)  # Output: "\left(x - 2\right) \left(x + 2\right)"
+    # Factor with substituted values
+    latex_result2 = factor_polynomial_expression("x**2 - y**2", {"y": 3})
+    print(latex_result2)  # Output: "\left(x - 3\right) \left(x + 3\right)"
+--------------------------------------------------------------------------------
+### 8. `simplify_polynomial_expression`
+Simplifies an algebraic expression in polynomial form and returns it in LaTeX format. Takes an algebraic expression, in polynomial form, written in Python syntax and simplifies it. The result is returned as a LaTeX formatted string, suitable for academic or professional documentation.
+• Parameters:
+  - `expression` (str): The algebraic expression, in polynomial form, to simplify. For instance, 'np.diff(8*x**30)' is a polynomial expression, whereas 'np.diff([2,5,9,11])' is not.
+  - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where keys are variable names and values are the numbers to substitute them with.
+• Returns:
+  - `str`: The simplified expression formatted as a LaTeX string.
+• Example:
+    from rgwfuncs import simplify_polynomial_expression
+    # Example 1: Simplifying a polynomial expression without substitutions
+    simplified_expr1 = simplify_polynomial_expression("2*x + 3*x")
+    print(simplified_expr1)  # Output: "5 x"
+    # Example 2: Simplifying a complex expression involving derivatives
+    simplified_expr2 = simplify_polynomial_expression("(np.diff(3*x**8)) / (np.diff(8*x**30) * 11*y**3)")
+    print(simplified_expr2)  # Output: r"\frac{1}{110 x^{22} y^{3}}"
+    # Example 3: Simplifying with substitutions
+    simplified_expr3 = simplify_polynomial_expression("x**2 + y**2", subs={"x": 3, "y": 4})
+    print(simplified_expr3)  # Output: "25"
+    # Example 4: Simplifying with partial substitution
+    simplified_expr4 = simplify_polynomial_expression("a*b + b", subs={"b": 2})
+    print(simplified_expr4)  # Output: "2 a + 2"
+--------------------------------------------------------------------------------
+### 9. `cancel_polynomial_expression`
+Cancels common factors within a polynomial expression written in Python syntax and converts it to a LaTeX formatted string. This function parses an algebraic expression, cancels common factors using SymPy, and translates the reduced expression into a LaTeX representation. It can also accommodate optional substitutions to be made prior to simplification.
+• Parameters:
+  - `expression` (str): The algebraic expression to simplify and convert to LaTeX. This string should be formatted using Python syntax.
+  - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers to substitute.
+• Returns:
+  - `str`: The LaTeX formatted string of the simplified expression. If the expression involves indeterminate forms due to operations like division by zero, a descriptive error message is returned instead.
+• Raises:
+  - `ValueError`: If the expression cannot be parsed due to syntax errors or involves undefined operations, such as division by zero.
+• Example:
+    from rgwfuncs import cancel_polynomial_expression
+    # Cancel common factors within a polynomial expression
+    latex_result1 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)")
+    print(latex_result1)  # Output: "x + 2"
+    # Cancel with substituted values
+    latex_result2 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)", {"x": 2})
+    print(latex_result2)  # Output: "Undefined result. This could be a division by zero error."
+--------------------------------------------------------------------------------
+### 10. `solve_homogeneous_polynomial_expression`
+Solves a homogeneous polynomial expression for a specified variable and returns the solutions as a LaTeX-formatted string. The expression is assumed to be homogeneous (i.e. equal to zero). Substitutions for the other variables in the equation may optionally be supplied.
+• Parameters:
+  - `expression` (str): A string of the homogeneous polynomial expression to solve.
+  - `variable` (str): The variable to solve for.
+  - `subs` (Optional[Dict[str, float]]): Substitutions for variables.
+• Returns:
+  - `str`: Solutions formatted in LaTeX.
+• Example:
+    from rgwfuncs import solve_homogeneous_polynomial_expression
+    solutions1 = solve_homogeneous_polynomial_expression("a*x**2 + b*x + c", "x", {"a": 3, "b": 7, "c": 5})
+    print(solutions1)  # Output: "\left[-7/6 - sqrt(11)*I/6, -7/6 + sqrt(11)*I/6\right]"
+    solutions2 = solve_homogeneous_polynomial_expression("x**2 - 4", "x")
+    print(solutions2)  # Output: "\left[-2, 2\right]"
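The first example's complex roots can be checked numerically with the quadratic formula. The sketch below uses only the standard library and is a verification aid, not how the function itself solves equations; `quadratic_roots` is a hypothetical name.

```python
import cmath
import math


def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, using a complex square root."""
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b - root_of_discriminant) / (2 * a),
            (-b + root_of_discriminant) / (2 * a))


# 3x^2 + 7x + 5 = 0 has discriminant 49 - 60 = -11, hence complex roots.
low, high = quadratic_roots(3, 7, 5)
expected_low = complex(-7 / 6, -math.sqrt(11) / 6)
print(abs(low - expected_low) < 1e-12)  # True

# x^2 - 4 = 0 has the real roots -2 and 2.
print(quadratic_roots(1, 0, -4))
```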
+--------------------------------------------------------------------------------
+### 11. `plot_polynomial_functions`
+This function plots polynomial functions described by a list of expressions and their corresponding substitution dictionaries. It generates SVG markup of the plots, with options for specifying the domain, axis zoom, and legend display.
+• Parameters:
+- `functions` (`List[Dict[str, Dict[str, Any]]]`): A list of dictionaries, each containing:
+  - A key which is a string representing a Python/NumPy expression (e.g., `"x**2"`, `"np.diff(x,2)"`).
+  - A value which is a dictionary containing substitutions for the expression. Must include an `"x"` key, either as `"*"` for default domain or a NumPy array.
+- `zoom` (`float`): Determines the numeric axis range from `-zoom` to `+zoom` for both x and y axes (default is `10.0`).
+- `show_legend` (`bool`): Specifies whether to include a legend in the plot (default is `True`).
+- `open_file` (`bool`): If `True`, opens the SVG in the system's default viewer: from `save_path` when one is specified, otherwise from a temporary file (defaults to `False`).
+- `save_path` (`Optional[str]`): If specified, saves the output string as a .svg at the indicated path (defaults to None).
+• Returns:
+- `str`: The raw SVG markup of the resulting plot.
+• Example:
+    from rgwfuncs import plot_polynomial_functions
+    import numpy as np
+    # Generate the SVG
+    plot_svg_string = plot_polynomial_functions(
+        functions=[
+            {"x**2": {"x": "*"}},  # Single expression, "*" means plot all discernable points
+            {"x**2/(2 + a) + a": {"x": np.linspace(-3, 4, 101), "a": 1.23}},
+            {"np.diff(x**3, 2)": {"x": np.linspace(-2, 2, 10)}}
+        ],
+        zoom=2
+    )
+    # Write the SVG to an actual file
+    with open("plot.svg", "w", encoding="utf-8") as file:
+        file.write(plot_svg_string)
+• Displaying the SVG:
+![Plot](./media/plot_polynomial_functions_example_1.svg)
+--------------------------------------------------------------------------------
+## String Based Functions
+### 1. send_telegram_message
 Send a message to a Telegram chat using a specified preset from your configuration file.
@@ -176,20 +582,7 @@ Send a message to a Telegram chat using a specified preset from your configurati
 Below is a quick reference of available functions, their purpose, and basic usage examples.
-### 1. df_docs
-Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
-• Parameters:
-  - `method_type_filter` (str): Optional, comma-separated to select docstring types, or '*' for all.
-• Example:
-    import rgwfuncs
-    rgwfuncs.df_docs(method_type_filter='numeric_clean,limit_dataframe')
---------------------------------------------------------------------------------
-### 2. `numeric_clean`
+### 1. `numeric_clean`
 Cleans the numeric columns in a DataFrame according to specified treatments.
 • Parameters:
@@ -218,7 +611,7 @@ Cleans the numeric columns in a DataFrame according to specified treatments.
 --------------------------------------------------------------------------------
-### 3. `limit_dataframe`
+### 2. `limit_dataframe`
 Limit the DataFrame to a specified number of rows.
 • Parameters:
@@ -239,7 +632,7 @@ Limit the DataFrame to a specified number of rows.
 --------------------------------------------------------------------------------
-### 4. `from_raw_data`
+### 3. `from_raw_data`
 Create a DataFrame from raw data.
 • Parameters:
@@ -265,7 +658,7 @@ Create a DataFrame from raw data.
 --------------------------------------------------------------------------------
-### 5. `append_rows`
+### 4. `append_rows`
 Append rows to the DataFrame.
 • Parameters:
@@ -290,7 +683,7 @@ Append rows to the DataFrame.
 --------------------------------------------------------------------------------
-### 6. `append_columns`
+### 5. `append_columns`
 Append new columns to the DataFrame with None values.
 • Parameters:
@@ -311,7 +704,7 @@ Append new columns to the DataFrame with None values.
 --------------------------------------------------------------------------------
-### 7. `update_rows`
+### 6. `update_rows`
 Update specific rows in the DataFrame based on a condition.
 • Parameters:
@@ -333,7 +726,7 @@ Update specific rows in the DataFrame based on a condition.
 --------------------------------------------------------------------------------
-### 8. `delete_rows`
+### 7. `delete_rows`
 Delete rows from the DataFrame based on a condition.
 • Parameters:
@@ -354,7 +747,7 @@ Delete rows from the DataFrame based on a condition.
 --------------------------------------------------------------------------------
-### 9. `drop_duplicates`
+### 8. `drop_duplicates`
 Drop duplicate rows in the DataFrame, retaining the first occurrence.
 • Parameters:
@@ -374,7 +767,7 @@ Drop duplicate rows in the DataFrame, retaining the first occurrence.
 --------------------------------------------------------------------------------
-### 10. `drop_duplicates_retain_first`
+### 9. `drop_duplicates_retain_first`
 Drop duplicate rows based on specified columns, retaining the first occurrence.
 • Parameters:
@@ -395,7 +788,7 @@ Drop duplicate rows based on specified columns, retaining the first occurrence.
 --------------------------------------------------------------------------------
-### 11. `drop_duplicates_retain_last`
+### 10. `drop_duplicates_retain_last`
 Drop duplicate rows based on specified columns, retaining the last occurrence.
 • Parameters:
@@ -417,34 +810,55 @@ Drop duplicate rows based on specified columns, retaining the last occurrence.
 --------------------------------------------------------------------------------
-### 12. `load_data_from_query`
+### 11. `load_data_from_query`
-Load data from a database query into a DataFrame based on a configuration preset.
+Load data from a specified database using a SQL query and return the results in a Pandas DataFrame. The database connection configurations are determined by a preset name specified in a configuration file.
-- **Parameters:**
-  - `db_preset_name` (str): Name of the database preset in the configuration file.
-  - `query` (str): The SQL query to execute.
+#### Features
-- **Returns:**
-  - `pd.DataFrame`: A DataFrame containing the query result.
+- Multi-Database Support: This function supports different database types, including MSSQL, MySQL, ClickHouse, Google BigQuery, and AWS Athena, based on the configuration preset selected.
+- Configuration-Based: It utilizes a configuration file to store database connection details securely, avoiding hardcoding sensitive information directly into the script.
+- Dynamic Query Execution: Capable of executing custom user-defined SQL queries against the specified database.
+- Automatic Result Loading: Fetches query results and loads them directly into a Pandas DataFrame for further manipulation and analysis.
-- **Notes:**
-  - The configuration file is assumed to be located at `~/.rgwfuncsrc`.
+#### Parameters
-- **Example:**
+- `db_preset_name` (str): The name of the database preset found in the configuration file. This preset determines which database connection details to use.
+- `query` (str): The SQL query string to be executed on the database.
-  from rgwfuncs import load_data_from_query
+#### Returns
+- `pd.DataFrame`: Returns a DataFrame that contains the results from the executed SQL query.
+#### Configuration Details
+- The configuration file is expected to be in JSON format and located at `~/.rgwfuncsrc`.
+- Each preset within the configuration file must include:
+  - `name`: Name of the database preset.
+  - `db_type`: Type of the database (`mssql`, `mysql`, `clickhouse`, `google_big_query`, `aws_athena`).
+  - `credentials`: Necessary credentials such as host, username, password, and potentially others depending on the database type.
+#### Example
+    from rgwfuncs import load_data_from_query
+    # Load data using a preset configuration
+    df = load_data_from_query(
+        db_preset_name="MyDBPreset",
+        query="SELECT * FROM my_table"
+    )
+    print(df)
-  df = load_data_from_query(
-      db_preset_name="MyDBPreset",
-      query="SELECT * FROM my_table"
-  )
-  print(df)
+#### Notes
+- Security: Ensure that the configuration file (`~/.rgwfuncsrc`) is secure and accessible only to authorized users, as it contains sensitive information.
+- Pre-requisites: Ensure the necessary Python packages are installed for each database type you wish to query. For example, `pymssql` for MSSQL, `mysql-connector-python` for MySQL, and so on.
+- Error Handling: The function raises a `ValueError` if the specified preset name does not exist or if the database type is unsupported. Additional exceptions may arise from network issues or database errors.
+- Environment: For AWS Athena, ensure that AWS credentials are configured properly for the boto3 library to authenticate successfully. Consider using AWS IAM roles or AWS Secrets Manager for better security management.
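The preset lookup and the `ValueError` cases described in the notes can be sketched as follows. This is a minimal illustration, not the library's code: `find_db_preset` is a hypothetical helper, and the top-level key `db_presets` is an assumption, since the diff does not show the name of the list that holds the database presets.

```python
import json

SUPPORTED_DB_TYPES = {"mssql", "mysql", "clickhouse", "google_big_query", "aws_athena"}


def find_db_preset(config_text, db_preset_name):
    """Return the preset with the given name, or raise ValueError."""
    config = json.loads(config_text)
    for preset in config.get("db_presets", []):  # "db_presets" is an assumed key name
        if preset.get("name") == db_preset_name:
            if preset.get("db_type") not in SUPPORTED_DB_TYPES:
                raise ValueError(f"Unsupported db_type: {preset.get('db_type')}")
            return preset
    raise ValueError(f"No database preset named '{db_preset_name}'")


# In practice the text would be read from ~/.rgwfuncsrc; it is inlined here.
CONFIG_TEXT = json.dumps({
    "db_presets": [
        {
            "name": "athena_db1",
            "db_type": "aws_athena",
            "aws_access_key": "",
            "aws_secret_key": "",
            "aws_region": "",
            "database": "logs",
            "output_bucket": "s3://bucket-name",
        }
    ]
})

preset = find_db_preset(CONFIG_TEXT, "athena_db1")
print(preset["db_type"])  # aws_athena
```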
 --------------------------------------------------------------------------------
-### 13. `load_data_from_path`
+### 12. `load_data_from_path`
 Load data from a file into a DataFrame based on the file extension.
 • Parameters:
@@ -463,7 +877,7 @@ Load data from a file into a DataFrame based on the file extension.
 --------------------------------------------------------------------------------
-### 14. `load_data_from_sqlite_path`
+### 13. `load_data_from_sqlite_path`
 Execute a query on a SQLite database file and return the results as a DataFrame.
 • Parameters:
@@ -483,7 +897,7 @@ Execute a query on a SQLite database file and return the results as a DataFrame.
 --------------------------------------------------------------------------------
-### 15. `first_n_rows`
+### 14. `first_n_rows`
 Display the first n rows of the DataFrame (prints out in dictionary format).
 • Parameters:
@@ -501,7 +915,7 @@ Display the first n rows of the DataFrame (prints out in dictionary format).
 --------------------------------------------------------------------------------
-### 16. `last_n_rows`
+### 15. `last_n_rows`
 Display the last n rows of the DataFrame (prints out in dictionary format).
 • Parameters:
@@ -519,7 +933,7 @@ Display the last n rows of the DataFrame (prints out in dictionary format).
 --------------------------------------------------------------------------------
-### 17. `top_n_unique_values`
+### 16. `top_n_unique_values`
 Print the top n unique values for specified columns in the DataFrame.
 • Parameters:
@@ -538,7 +952,7 @@ Print the top n unique values for specified columns in the DataFrame.
 --------------------------------------------------------------------------------
-### 18. `bottom_n_unique_values`
+### 17. `bottom_n_unique_values`
 Print the bottom n unique values for specified columns in the DataFrame.
 • Parameters:
@@ -557,7 +971,7 @@ Print the bottom n unique values for specified columns in the DataFrame.
 --------------------------------------------------------------------------------
-### 19. `print_correlation`
+### 18. `print_correlation`
 Print correlation for multiple pairs of columns in the DataFrame.
 • Parameters:
@@ -582,7 +996,7 @@ Print correlation for multiple pairs of columns in the DataFrame.
 --------------------------------------------------------------------------------
-### 20. `print_memory_usage`
+### 19. `print_memory_usage`
 Print the memory usage of the DataFrame in megabytes.
 • Parameters:
@@ -599,7 +1013,7 @@ Print the memory usage of the DataFrame in megabytes.
 --------------------------------------------------------------------------------
-### 21. `filter_dataframe`
+### 20. `filter_dataframe`
 Return a new DataFrame filtered by a given query expression.
 • Parameters:
@@ -625,7 +1039,7 @@ Return a new DataFrame filtered by a given query expression.
 --------------------------------------------------------------------------------
-### 22. `filter_indian_mobiles`
+### 21. `filter_indian_mobiles`
 Filter and return rows containing valid Indian mobile numbers in the specified column.
 • Parameters:
@@ -647,7 +1061,7 @@ Filter and return rows containing valid Indian mobile numbers in the specified c
 --------------------------------------------------------------------------------
-### 23. `print_dataframe`
+### 22. `print_dataframe`
 Print the entire DataFrame and its column types. Optionally print a source path.
 • Parameters:
@@ -665,7 +1079,7 @@ Print the entire DataFrame and its column types. Optionally print a source path.
 --------------------------------------------------------------------------------
-### 24. `send_dataframe_via_telegram`
+### 23. `send_dataframe_via_telegram`
 Send a DataFrame via Telegram using a specified bot configuration.
 • Parameters:
@@ -692,7 +1106,7 @@ Send a DataFrame via Telegram using a specified bot configuration.
 --------------------------------------------------------------------------------
-### 25. `send_data_to_email`
+### 24. `send_data_to_email`
 Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.
 • Parameters:
@@ -722,7 +1136,7 @@ Send an email with an optional DataFrame attachment using the Gmail API via a sp
 --------------------------------------------------------------------------------
-### 26. `send_data_to_slack`
+### 25. `send_data_to_slack`
 Send a DataFrame or message to Slack using a specified bot configuration.
 • Parameters:
@@ -748,7 +1162,7 @@ Send a DataFrame or message to Slack using a specified bot configuration.
 --------------------------------------------------------------------------------
-### 27. `order_columns`
+### 26. `order_columns`
 Reorder the columns of a DataFrame based on a string input.
 • Parameters:
@@ -770,7 +1184,7 @@ Reorder the columns of a DataFrame based on a string input.
 --------------------------------------------------------------------------------
-### 28. `append_ranged_classification_column`
+### 27. `append_ranged_classification_column`
 Append a ranged classification column to the DataFrame.
 • Parameters:
@@ -794,7 +1208,7 @@ Append a ranged classification column to the DataFrame.
 --------------------------------------------------------------------------------
-### 29. `append_percentile_classification_column`
+### 28. `append_percentile_classification_column`
 Append a percentile classification column to the DataFrame.
 • Parameters:
@@ -818,7 +1232,7 @@ Append a percentile classification column to the DataFrame.
 --------------------------------------------------------------------------------
-### 30. `append_ranged_date_classification_column`
+### 29. `append_ranged_date_classification_column`
 Append a ranged date classification column to the DataFrame.
 • Parameters:
@@ -847,7 +1261,7 @@ Append a ranged date classification column to the DataFrame.
 --------------------------------------------------------------------------------
-### 31. `rename_columns`
+### 30. `rename_columns`
 Rename columns in the DataFrame.
 • Parameters:
@@ -869,7 +1283,7 @@ Rename columns in the DataFrame.
 --------------------------------------------------------------------------------
-### 32. `cascade_sort`
+### 31. `cascade_sort`
 Cascade sort the DataFrame by specified columns and order.
 • Parameters:
@@ -895,7 +1309,7 @@ Cascade sort the DataFrame by specified columns and order.
 --------------------------------------------------------------------------------
-### 33. `append_xgb_labels`
+### 32. `append_xgb_labels`
 Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
 • Parameters:
@@ -917,7 +1331,7 @@ Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
 --------------------------------------------------------------------------------
-### 34. `append_xgb_regression_predictions`
+### 33. `append_xgb_regression_predictions`
 Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
 • Parameters:
@@ -949,7 +1363,7 @@ Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` colum
 --------------------------------------------------------------------------------
-### 35. `append_xgb_logistic_regression_predictions`
+### 34. `append_xgb_logistic_regression_predictions`
 Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
 • Parameters:
@@ -981,7 +1395,7 @@ Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TY
 --------------------------------------------------------------------------------
-### 36. `print_n_frequency_cascading`
+### 35. `print_n_frequency_cascading`
 Print the cascading frequency of top n values for specified columns.
 • Parameters:
@@ -1001,27 +1415,36 @@ Print the cascading frequency of top n values for specified columns.
 --------------------------------------------------------------------------------
-### 37. `print_n_frequency_linear`
-Print the linear frequency of top n values for specified columns.
+### 36. `print_n_frequency_linear`
-• Parameters:
-  - df (pd.DataFrame)
-  - n (int)
-  - columns (str): Comma-separated columns.
-  - `order_by` (str)
+Prints the linear frequency of the top `n` values for specified columns.
+#### Parameters:
+- **df** (`pd.DataFrame`): The DataFrame to analyze.
+- **n** (`int`): The number of top values to print for each column.
+- **columns** (`list`): A list of column names to be analyzed.
+- **order_by** (`str`): The order of frequency. The available options are:
+  - `"ASC"`: Sort keys in ascending lexicographical order.
+  - `"DESC"`: Sort keys in descending lexicographical order.
+  - `"FREQ_ASC"`: Sort the frequencies in ascending order (least frequent first).
+  - `"FREQ_DESC"`: Sort the frequencies in descending order (most frequent first).
+  - `"BY_KEYS_ASC"`: Sort keys in ascending order, numerically if possible, handling special strings like 'NaN' as typical entries.
+  - `"BY_KEYS_DESC"`: Sort keys in descending order, numerically if possible, handling special strings like 'NaN' as typical entries.
+#### Example:
-• Example:
     from rgwfuncs import print_n_frequency_linear
     import pandas as pd
-    df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
-    print_n_frequency_linear(df, 2, 'City', 'FREQ_DESC')
+    df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
+    print_n_frequency_linear(df, 2, ['City'], 'FREQ_DESC')
+This example analyzes the `City` column, printing the top 2 most frequent values in descending order of frequency.
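The six `order_by` options listed above can be sketched in plain pandas. This is an illustration only, not the library's implementation: the helper name `linear_frequency` is hypothetical, and it assumes the top `n` values are chosen by count first and then reordered, and that keys are compared as strings.

```python
import math

import pandas as pd


def linear_frequency(series: pd.Series, n: int, order_by: str) -> dict:
    """Hypothetical helper: counts of the n most frequent values, reordered per order_by."""
    # Assumption: pick the n most frequent values first, then reorder them.
    counts = series.value_counts(dropna=False).head(n)
    items = [(str(key), int(count)) for key, count in counts.items()]

    def numeric_key(item):
        # Numeric-looking keys sort first by value; everything else
        # (including the string 'NaN') sorts afterwards as ordinary text.
        key = item[0]
        try:
            number = float(key)
        except ValueError:
            return (1, 0.0, key)
        if math.isnan(number):
            return (1, 0.0, key)
        return (0, number, key)

    sorters = {
        "ASC": dict(key=lambda kv: kv[0], reverse=False),
        "DESC": dict(key=lambda kv: kv[0], reverse=True),
        "FREQ_ASC": dict(key=lambda kv: kv[1], reverse=False),
        "FREQ_DESC": dict(key=lambda kv: kv[1], reverse=True),
        "BY_KEYS_ASC": dict(key=numeric_key, reverse=False),
        "BY_KEYS_DESC": dict(key=numeric_key, reverse=True),
    }
    if order_by not in sorters:
        raise ValueError(f"Unsupported order_by: {order_by}")
    return dict(sorted(items, **sorters[order_by]))


cities = pd.Series(['NY', 'LA', 'NY', 'SF', 'LA', 'LA'])
print(linear_frequency(cities, 2, 'FREQ_DESC'))
```

The difference between `"ASC"` and `"BY_KEYS_ASC"` shows up with numeric-looking keys: lexicographic order places `'10'` before `'9'`, while the numeric variant places `'9'` first.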
 --------------------------------------------------------------------------------
-### 38. `retain_columns`
+### 37. `retain_columns`
 Retain specified columns in the DataFrame and drop the others.
 • Parameters:
@@ -1043,7 +1466,7 @@ Retain specified columns in the DataFrame and drop the others.
 --------------------------------------------------------------------------------
-### 39. `mask_against_dataframe`
+### 38. `mask_against_dataframe`
 Retain only rows with common column values between two DataFrames.
 • Parameters:
@@ -1068,7 +1491,7 @@ Retain only rows with common column values between two DataFrames.
 --------------------------------------------------------------------------------
-### 40. `mask_against_dataframe_converse`
+### 39. `mask_against_dataframe_converse`
 Retain only rows with uncommon column values between two DataFrames.
 • Parameters:
@@ -1093,7 +1516,7 @@ Retain only rows with uncommon column values between two DataFrames.
 --------------------------------------------------------------------------------
-### 41. `union_join`
+### 40. `union_join`
 Perform a union join, concatenating two DataFrames and dropping duplicates.
 • Parameters:
@@ -1116,7 +1539,7 @@ Perform a union join, concatenating two DataFrames and dropping duplicates.
 --------------------------------------------------------------------------------
-### 42. `bag_union_join`
+### 41. `bag_union_join`
 Perform a bag union join, concatenating two DataFrames without dropping duplicates.
 • Parameters:
@@ -1139,7 +1562,7 @@ Perform a bag union join, concatenating two DataFrames without dropping duplicat
 --------------------------------------------------------------------------------
-### 43. `left_join`
+### 42. `left_join`
 Perform a left join on two DataFrames.
 • Parameters:
@@ -1164,7 +1587,7 @@ Perform a left join on two DataFrames.
 --------------------------------------------------------------------------------
-### 44. `right_join`
+### 43. `right_join`
 Perform a right join on two DataFrames.
 • Parameters:
@@ -1189,7 +1612,7 @@ Perform a right join on two DataFrames.
 --------------------------------------------------------------------------------
-### 45. `insert_dataframe_in_sqlite_database`
+### 44. `insert_dataframe_in_sqlite_database`
 Inserts a Pandas DataFrame into a SQLite database table. If the specified table does not exist, it will be created with column types automatically inferred from the DataFrame's data types.
@@ -1227,7 +1650,7 @@ Inserts a Pandas DataFrame into a SQLite database table. If the specified table
 --------------------------------------------------------------------------------
-### 46. `sync_dataframe_to_sqlite_database`
+### 45. `sync_dataframe_to_sqlite_database`
 Processes and saves a DataFrame to an SQLite database, adding a timestamp column and replacing the existing table if needed. Creates the table if it does not exist.
 • Parameters:
@@ -1251,6 +1674,8 @@ Processes and saves a DataFrame to an SQLite database, adding a timestamp column
 --------------------------------------------------------------------------------
 ## Additional Info
 For more information, refer to each function’s docstring by calling:
