rgwfuncs 0.0.21__py3-none-any.whl → 0.0.54__py3-none-any.whl

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.2
2
2
  Name: rgwfuncs
3
- Version: 0.0.21
3
+ Version: 0.0.54
4
4
  Summary: A functional programming paradigm for mathematical modelling and data science
5
5
  Home-page: https://github.com/ryangerardwilson/rgwfunc
6
6
  Author: Ryan Gerard Wilson
@@ -23,12 +23,16 @@ Requires-Dist: xgboost
23
23
  Requires-Dist: requests
24
24
  Requires-Dist: slack-sdk
25
25
  Requires-Dist: google-api-python-client
26
+ Requires-Dist: boto3
26
27
 
27
28
  # RGWFUNCS
28
29
 
29
30
  ***By Ryan Gerard Wilson (https://ryangerardwilson.com)***
30
31
 
31
- This library is meant to make ML/ Data Science pipelines more readable. It assumes a linux environment, and the existence of a `.rgwfuncsrc` file for certain features (like db querying, sending data to slack, etc.)
32
+
33
+ This library is meant to protect your eyes (and brain) from OOP syntax. It is unbelievably sad that some of the best work done in creating math and data science libraries in Python has been corrupted by the OOP mind-virus.
34
+
35
+ By creating a functional-programming wrapper around these libraries, we aim to soothe. This library assumes a Linux environment and the existence of a `.rgwfuncsrc` file for certain features (like database querying, sending data to Slack, etc.).
32
36
 
33
37
  --------------------------------------------------------------------------------
34
38
 
@@ -75,6 +79,15 @@ A `.rgwfuncsrc` file (located at `vi ~/.rgwfuncsrc) is required for MSSQL, CLICK
75
79
  "db_type": "google_big_query",
76
80
  "json_file_path": "",
77
81
  "project_id": ""
82
+ },
83
+ {
84
+ "name": "athena_db1",
85
+ "db_type": "aws_athena",
86
+ "aws_access_key": "",
87
+ "aws_secret_key": "",
88
+ "aws_region": "",
89
+ "database": "logs",
90
+ "output_bucket": "s3://bucket-name"
78
91
  }
79
92
  ],
80
93
  "vm_presets": [
@@ -135,22 +148,415 @@ To display all docstrings, use:
135
148
 
136
149
  --------------------------------------------------------------------------------
137
150
 
138
- ## String Based Functions
151
+ ## Documentation Access
139
152
 
140
- ### 1. str_docs
141
- Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
153
+ ### 1. docs
154
+ Print a list of available function names in alphabetical order. If a filter is provided, print the docstrings of functions containing the term.
142
155
 
143
156
  • Parameters:
144
157
  - `method_type_filter` (str): Optional, comma-separated to select docstring types, or '*' for all.
145
158
 
146
159
  • Example:
147
160
 
148
- import rgwfuncs
149
- rgwfuncs.str_docs(method_type_filter='numeric_clean,limit_dataframe')
161
+ from rgwfuncs import docs
162
+ docs(method_type_filter='numeric_clean,limit_dataframe')
150
163
 
151
164
  --------------------------------------------------------------------------------
152
165
 
153
- ### 2. send_telegram_message
166
+ ## Interactive Shell
167
+
168
+ This section includes functions that facilitate launching an interactive Python shell to inspect and modify local variables within the user's environment.
169
+
170
+ ### 1. `interactive_shell`
171
+
172
+ Launches an interactive prompt for inspecting and modifying local variables, making all methods in the rgwfuncs library available by default. This REPL (Read-Eval-Print Loop) environment supports command history and autocompletion, making it easier to interact with your Python code. This function is particularly useful for debugging purposes when you want real-time interaction with your program's execution environment.
173
+
174
+ • Parameters:
175
+ - `local_vars` (dict, optional): A dictionary of local variables to be accessible within the interactive shell. If not provided, defaults to an empty dictionary.
176
+
177
+ • Usage:
178
+ - You can call this function to enter an interactive shell where you can view and modify the variables in the given local scope.
179
+
180
+ • Example:
181
+
182
+ from rgwfuncs import interactive_shell
183
+ import pandas as pd
184
+ import numpy as np
185
+
186
+ # Example DataFrame
187
+ df = pd.DataFrame({
188
+ 'id': [1, 2, 3, 4, 5],
189
+ 'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
190
+ 'age': [30, 25, 35, 28, 22],
191
+ 'city': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', 'Boston']
192
+ })
193
+
194
+ # Launch the interactive shell with local variables
195
+ interactive_shell(locals())
196
+
197
+ In the interactive shell you can then use any library imported in your Python file, as well as all rgwfuncs methods (even if they were not imported). Note that pandas and numpy are available in the shell because the script above imports them, whereas the rgwfuncs method `first_n_rows` was never imported and is still available for use.
198
+
199
+ Welcome to the rgwfuncs interactive shell.
200
+ >>> pirst_n_rows(df, 2)
201
+ Traceback (most recent call last):
202
+ File "<console>", line 1, in <module>
203
+ NameError: name 'pirst_n_rows' is not defined. Did you mean: 'first_n_rows'?
204
+ >>> first_n_rows(df, 2)
205
+ {'age': '30', 'city': 'New York', 'id': '1', 'name': 'Alice'}
206
+ {'age': '25', 'city': 'Los Angeles', 'id': '2', 'name': 'Bob'}
207
+ >>> print(df)
208
+ id name age city
209
+ 0 1 Alice 30 New York
210
+ 1 2 Bob 25 Los Angeles
211
+ 2 3 Charlie 35 Chicago
212
+ 3 4 David 28 San Francisco
213
+ 4 5 Eva 22 Boston
214
+ >>> arr = np.array([1, 2, 3, 4, 5])
215
+ >>> arr
216
+ array([1, 2, 3, 4, 5])
217
+
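The behaviour described above (caller variables plus extra helpers available in one REPL) can be approximated with the standard library's `code` module. This is a minimal sketch, not the rgwfuncs implementation: the names `build_shell_namespace` and `launch_shell` are hypothetical, and command history and autocompletion are left out.

```python
import code


def build_shell_namespace(local_vars=None, extra=None):
    """Merge helper names with the caller's variables (hypothetical helper).

    Caller variables are applied last, so they win on a name clash.
    """
    namespace = dict(extra or {})
    namespace.update(local_vars or {})
    return namespace


def launch_shell(local_vars=None, extra=None):
    """Start a blocking REPL over the merged namespace (hypothetical helper)."""
    namespace = build_shell_namespace(local_vars, extra)
    code.interact(banner="Welcome to the sketch shell.", local=namespace)


# Build the namespace only; calling launch_shell(...) would block waiting for input.
namespace = build_shell_namespace({"x": 1}, extra={"helper": len})
print(sorted(namespace))
```

Calling `launch_shell(locals(), extra={...})` from a script would open the prompt with both sets of names in scope.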
218
+ --------------------------------------------------------------------------------
219
+
220
+ ## Algebra Based Functions
221
+
222
+ This section provides comprehensive functions for handling algebraic expressions, performing tasks such as computation, simplification, solving equations, and prime factorization, all outputted in LaTeX format.
223
+
224
+ ### 1. `compute_prime_factors`
225
+
226
+ Computes prime factors of a number and presents them in LaTeX format.
227
+
228
+ • Parameters:
229
+ - `n` (int): The integer to factorize.
230
+
231
+ • Returns:
232
+ - `str`: Prime factorization in LaTeX.
233
+
234
+ • Example:
235
+
236
+ from rgwfuncs import compute_prime_factors
237
+ factors_1 = compute_prime_factors(100)
238
+ print(factors_1) # Output: "2^{2} \cdot 5^{2}"
239
+
240
+ factors_2 = compute_prime_factors(60)
241
+ print(factors_2) # Output: "2^{2} \cdot 3 \cdot 5"
242
+
243
+ factors_3 = compute_prime_factors(17)
244
+ print(factors_3) # Output: "17"
245
+
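For readers curious how such output can be produced, here is a minimal sketch using trial division. It is an illustration only and not necessarily how rgwfuncs computes the result; the function name `prime_factors_latex` is hypothetical.

```python
from collections import Counter


def prime_factors_latex(n: int) -> str:
    """Return the prime factorization of n (n >= 2) as a LaTeX string."""
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    counts = Counter()
    divisor = 2
    # Trial division: strip each divisor out completely before moving on.
    while divisor * divisor <= n:
        while n % divisor == 0:
            counts[divisor] += 1
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains is itself a prime factor.
        counts[n] += 1
    terms = [
        f"{p}^{{{e}}}" if e > 1 else str(p)
        for p, e in sorted(counts.items())
    ]
    return r" \cdot ".join(terms)


print(prime_factors_latex(100))  # 2^{2} \cdot 5^{2}
print(prime_factors_latex(60))   # 2^{2} \cdot 3 \cdot 5
print(prime_factors_latex(17))   # 17
```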
246
+ --------------------------------------------------------------------------------
247
+
248
+ ### 2. `compute_constant_expression`
249
+
250
+ Evaluates a constant expression provided as a string and returns the result as a float. Supports arithmetic operations, including addition, subtraction, multiplication, division, and modulo, as well as functions from Python's `math` module.
251
+
252
+ • Parameters:
253
+ - `expression` (str): The constant expression to compute. This should be a string consisting of arithmetic operations and Python's math module functions.
254
+
255
+ • Returns:
256
+ - `float`: The computed numerical result.
257
+
258
+ • Example:
259
+
260
+ from rgwfuncs import compute_constant_expression
261
+ result1 = compute_constant_expression("2 + 2")
262
+ print(result1) # Output: 4.0
263
+
264
+ result2 = compute_constant_expression("10 % 3")
265
+ print(result2) # Output: 1.0
266
+
267
+ result3 = compute_constant_expression("math.gcd(36, 60) * math.sin(math.radians(45)) * 10000")
268
+ print(result3) # Output: 84852.8137423857
269
+
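The idea of evaluating a string that may only reference the `math` module can be sketched as follows. This is an assumption about the general approach rather than the library's actual code, the name `evaluate_constant_expression` is hypothetical, and emptying the builtins is not a real security sandbox.

```python
import math


def evaluate_constant_expression(expression: str) -> float:
    """Evaluate an arithmetic string with access to `math` only (sketch)."""
    # Only the name `math` is exposed; builtins are emptied to narrow
    # what the expression can reach. Do not feed this untrusted input.
    allowed_names = {"math": math, "__builtins__": {}}
    return float(eval(expression, allowed_names, {}))


print(evaluate_constant_expression("2 + 2"))   # 4.0
print(evaluate_constant_expression("10 % 3"))  # 1.0
print(evaluate_constant_expression(
    "math.gcd(36, 60) * math.sin(math.radians(45)) * 10000"
))
```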
270
+ --------------------------------------------------------------------------------
271
+
272
+ ### 3. `compute_constant_expression_involving_matrices`
273
+
274
+ Computes the result of a constant expression involving matrices and returns it as a LaTeX string.
275
+
276
+ • Parameters:
277
+ - `expression` (str): The constant expression involving matrices. Example format includes operations such as "+", "-", "*", "/".
278
+
279
+ • Returns:
280
+ - `str`: The LaTeX-formatted string representation of the computed matrix, or an error message if the operations cannot be performed due to dimensional mismatches.
281
+
282
+ • Example:
283
+
284
+ from rgwfuncs import compute_constant_expression_involving_matrices
285
+
286
+ # Example with addition of 2D matrices
287
+ result = compute_constant_expression_involving_matrices("[[2, 6, 9], [1, 3, 5]] + [[1, 2, 3], [4, 5, 6]]")
288
+ print(result) # Output: \begin{bmatrix}3 & 8 & 12\\5 & 8 & 11\end{bmatrix}
289
+
290
+ # Example of mixed operations with 1D matrices treated as 2D
291
+ result = compute_constant_expression_involving_matrices("[3, 6, 9] + [1, 2, 3] - [2, 2, 2]")
292
+ print(result) # Output: \begin{bmatrix}2 & 6 & 10\end{bmatrix}
293
+
294
+ # Example with dimension mismatch
295
+ result = compute_constant_expression_involving_matrices("[[4, 3, 51]] + [[1, 1]]")
296
+ print(result) # Output: Operations between matrices must involve matrices of the same dimension
297
+
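The elementwise addition and the `bmatrix` formatting shown above can be reproduced by hand. The sketch below uses plain nested lists; `add_matrices` and `to_bmatrix` are hypothetical helper names, not part of rgwfuncs, and a mismatch raises `ValueError` here instead of returning a message.

```python
from typing import List

Matrix = List[List[float]]


def add_matrices(a: Matrix, b: Matrix) -> Matrix:
    """Add two matrices elementwise; dimensions must match exactly."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_bmatrix(m: Matrix) -> str:
    """Format a matrix as a LaTeX bmatrix: '&' between cells, '\\\\' between rows."""
    rows = r"\\".join(" & ".join(str(v) for v in row) for row in m)
    return r"\begin{bmatrix}" + rows + r"\end{bmatrix}"


total = add_matrices([[2, 6, 9], [1, 3, 5]], [[1, 2, 3], [4, 5, 6]])
print(to_bmatrix(total))  # \begin{bmatrix}3 & 8 & 12\\5 & 8 & 11\end{bmatrix}
```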
298
+ --------------------------------------------------------------------------------
299
+
300
+ ### 4. `compute_constant_expression_involving_ordered_series`
301
+
302
+ Computes the result of a constant expression involving ordered series and returns it as a LaTeX string.
303
+
304
+
305
+ • Parameters:
306
+ - `expression` (str): A series operation expression. Supports operations such as "+", "-", "*", "/", and `dd()` for discrete differences.
307
+
308
+ • Returns:
309
+ - `str`: The string representation of the resultant series after performing operations, or an error message if series lengths do not match.
310
+
311
+ • Example:
312
+
313
+ from rgwfuncs import compute_constant_expression_involving_ordered_series
314
+
315
+ # Example with addition and discrete differences
316
+ result = compute_constant_expression_involving_ordered_series("dd([2, 6, 9, 60]) + dd([78, 79, 80])")
317
+ print(result) # Output: [4, 3, 51] + [1, 1]
318
+
319
+ # Example with elementwise subtraction
320
+ result = compute_constant_expression_involving_ordered_series("[10, 15, 21] - [5, 5, 5]")
321
+ print(result) # Output: [5, 10, 16]
322
+
323
+ # Example with length mismatch
324
+ result = compute_constant_expression_involving_ordered_series("[4, 3, 51] + [1, 1]")
325
+ print(result) # Output: Operations between ordered series must involve series of equal length
326
+
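To make the `dd()` operation concrete: a discrete difference replaces a series with the gaps between consecutive terms, so the result is one element shorter. The sketch below is a plain-Python illustration with hypothetical helper names and does not reproduce the library's parser or its error strings.

```python
def discrete_difference(series):
    """Return the differences between consecutive terms of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def subtract_series(a, b):
    """Subtract two ordered series elementwise; lengths must match."""
    if len(a) != len(b):
        raise ValueError("series must be of equal length")
    return [x - y for x, y in zip(a, b)]


print(discrete_difference([2, 6, 9, 60]))            # [4, 3, 51]
print(discrete_difference([78, 79, 80]))             # [1, 1]
print(subtract_series([10, 15, 21], [5, 5, 5]))      # [5, 10, 16]
```

The two differenced series above have lengths 3 and 2, which is why they cannot be combined elementwise.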
327
+ --------------------------------------------------------------------------------
328
+
329
+ ### 5. `python_polynomial_expression_to_latex`
330
+
331
+ Converts a polynomial expression written in Python syntax to a LaTeX formatted string. This function parses algebraic expressions provided as strings using Python’s syntax and translates them into equivalent LaTeX representations, making them suitable for academic or professional documentation. The function supports inclusion of named variables, with an option to substitute specific values into the expression.
332
+
333
+ • Parameters:
334
+ - `expression` (str): The algebraic expression to convert to LaTeX. This should be a string formatted with Python syntax acceptable by SymPy.
335
+ - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables.
336
+
337
+ • Returns:
338
+ - `str`: The LaTeX formatted string equivalent to the provided expression.
339
+
340
+ • Raises:
341
+ - `ValueError`: If the expression cannot be parsed due to syntax errors.
342
+
343
+ • Example:
344
+
345
+ from rgwfuncs import python_polynomial_expression_to_latex
346
+
347
+ # Convert a simple polynomial expression to LaTeX format
348
+ latex_result1 = python_polynomial_expression_to_latex("x**2 + y**2")
349
+ print(latex_result1) # Output: "x^{2} + y^{2}"
350
+
351
+ # Convert polynomial expression with substituted values
352
+ latex_result2 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3, "y": 4})
353
+ print(latex_result2) # Output: "25"
354
+
355
+ # Another example with partial substitution
356
+ latex_result3 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3})
357
+ print(latex_result3) # Output: "y^{2} + 9"
358
+
359
+ # Trigonometric functions included with symbolic variables
360
+ latex_result4 = python_polynomial_expression_to_latex("sin(x+z**2) + cos(y)", {"x": 55})
361
+ print(latex_result4) # Output: "cos y + sin \\left(z^{2} + 55\\right)"
362
+
363
+ # Simplified trigonometric functions example with substitution
364
+ latex_result5 = python_polynomial_expression_to_latex("sin(x) + cos(y)", {"x": 0})
365
+ print(latex_result5) # Output: "cos y"
366
+
367
+ --------------------------------------------------------------------------------
368
+
369
+ ### 6. `expand_polynomial_expression`
370
+
371
+ Expands a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function takes algebraic expressions provided as strings using Python's syntax, applies polynomial expansion through SymPy, and translates them into LaTeX representations, suitable for academic or professional documentation. It supports expressions with named variables and provides an option to substitute specific values into the expression before expansion.
372
+
373
+ • Parameters:
374
+ - `expression` (str): The algebraic expression to expand and convert to LaTeX. This string should be formatted using Python syntax acceptable by SymPy.
375
+ - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables before expanding.
376
+
377
+ • Returns:
378
+ - `str`: The LaTeX formatted string of the expanded expression.
379
+
380
+ • Raises:
381
+ - `ValueError`: If the expression cannot be parsed due to syntax errors.
382
+
383
+ • Example:
384
+
385
+ from rgwfuncs import expand_polynomial_expression
386
+
387
+ # Expand a simple polynomial expression and convert to LaTeX
388
+ latex_result1 = expand_polynomial_expression("(x + y)**2")
389
+ print(latex_result1) # Output: "x^{2} + 2 x y + y^{2}"
390
+
391
+ # Expand polynomial expression with substituted values
392
+ latex_result2 = expand_polynomial_expression("(x + y)**2", {"x": 3, "y": 4})
393
+ print(latex_result2) # Output: "49"
394
+
395
+ # Another example with partial substitution
396
+ latex_result3 = expand_polynomial_expression("(x + y)**2", {"x": 3})
397
+ print(latex_result3) # Output: "y^{2} + 6 y + 9"
398
+
399
+ # Handling trigonometric functions with symbolic variables
400
+ latex_result4 = expand_polynomial_expression("sin(x + z**2) + cos(y)", {"x": 55})
401
+ print(latex_result4) # Output: "cos y + sin \\left(z^{2} + 55\\right)"
402
+
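The partial-substitution result above, `(y + 3)**2` expanding to `y^{2} + 6 y + 9`, can be checked without SymPy by multiplying coefficient lists. This is a hand-rolled sketch for single-variable polynomials; `multiply_polynomials` is a hypothetical name, and coefficients are stored in ascending order of power.

```python
def multiply_polynomials(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # The term a*x^i times b*x^j contributes to the x^(i+j) coefficient.
            result[i + j] += a * b
    return result


y_plus_3 = [3, 1]  # represents 3 + y
print(multiply_polynomials(y_plus_3, y_plus_3))  # [9, 6, 1], i.e. y^2 + 6y + 9
```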
403
+ --------------------------------------------------------------------------------
404
+
405
+ ### 7. `factor_polynomial_expression`
406
+
407
+ Factors a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function parses an algebraic expression, performs polynomial factoring using SymPy, and converts the factored expression into a LaTeX representation, ideal for academic or professional use. Optional substitutions can be made before factoring.
408
+
409
+ • Parameters:
410
+ - `expression` (str): The polynomial expression to factor and convert to LaTeX. This should be a valid expression formatted using Python syntax.
411
+ - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions. The keys are variable names in the expression, and the values are numbers that replace these variables.
412
+
413
+ • Returns:
414
+ - `str`: The LaTeX formatted string representing the factored expression.
415
+
416
+ • Raises:
417
+ - `ValueError`: If the expression cannot be parsed due to syntax errors.
418
+
419
+ • Example:
420
+
421
+ from rgwfuncs import factor_polynomial_expression
422
+
423
+ # Factor a polynomial expression and convert to LaTeX
424
+ latex_result1 = factor_polynomial_expression("x**2 - 4")
425
+ print(latex_result1) # Output: "\left(x - 2\right) \left(x + 2\right)"
426
+
427
+ # Factor with substituted values
428
+ latex_result2 = factor_polynomial_expression("x**2 - y**2", {"y": 3})
429
+ print(latex_result2) # Output: "\left(x - 3\right) \left(x + 3\right)"
430
+
431
+ --------------------------------------------------------------------------------
432
+
433
+ ### 8. `simplify_polynomial_expression`
434
+
435
+ Simplifies an algebraic expression in polynomial form and returns it in LaTeX format. Takes an algebraic expression, in polynomial form, written in Python syntax and simplifies it. The result is returned as a LaTeX formatted string, suitable for academic or professional documentation.
436
+
437
+ • Parameters:
438
+ - `expression` (str): The algebraic expression, in polynomial form, to simplify. For instance, `np.diff(8*x**30)` is a polynomial expression, whereas `np.diff([2,5,9,11])` is not.
439
+ - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where keys are variable names and values are the numbers to substitute them with.
440
+
441
+ • Returns:
442
+ - `str`: The simplified expression formatted as a LaTeX string.
443
+
444
+ • Example:
445
+
446
+ from rgwfuncs import simplify_polynomial_expression
447
+
448
+ # Example 1: Simplifying a polynomial expression without substitutions
449
+ simplified_expr1 = simplify_polynomial_expression("2*x + 3*x")
450
+ print(simplified_expr1) # Output: "5 x"
451
+
452
+ # Example 2: Simplifying a complex expression involving derivatives
453
+ simplified_expr2 = simplify_polynomial_expression("(np.diff(3*x**8)) / (np.diff(8*x**30) * 11*y**3)")
454
+ print(simplified_expr2) # Output: r"\frac{1}{110 x^{22} y^{3}}"
455
+
456
+ # Example 3: Simplifying with substitutions
457
+ simplified_expr3 = simplify_polynomial_expression("x**2 + y**2", subs={"x": 3, "y": 4})
458
+ print(simplified_expr3) # Output: "25"
459
+
460
+ # Example 4: Simplifying with partial substitution
461
+ simplified_expr4 = simplify_polynomial_expression("a*b + b", subs={"b": 2})
462
+ print(simplified_expr4) # Output: "2 a + 2"
463
+
464
+ --------------------------------------------------------------------------------
465
+
466
+ ### 9. `cancel_polynomial_expression`
467
+
468
+ Cancels common factors within a polynomial expression written in Python syntax and converts it to a LaTeX formatted string. This function parses an algebraic expression, cancels common factors using SymPy, and translates the reduced expression into a LaTeX representation. It can also accommodate optional substitutions to be made prior to simplification.
469
+
470
+ • Parameters:
471
+ - `expression` (str): The algebraic expression to simplify and convert to LaTeX. This string should be formatted using Python syntax.
472
+ - `subs` (Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers to substitute.
473
+
474
+ • Returns:
475
+ - `str`: The LaTeX formatted string of the simplified expression. If the expression involves indeterminate forms due to operations like division by zero, a descriptive error message is returned instead.
476
+
477
+ • Raises:
478
+ - `ValueError`: If the expression cannot be parsed due to syntax errors or involves undefined operations, such as division by zero.
479
+
480
+ • Example:
481
+
482
+ from rgwfuncs import cancel_polynomial_expression
483
+
484
+ # Cancel common factors within a polynomial expression
485
+ latex_result1 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)")
486
+ print(latex_result1) # Output: "x + 2"
487
+
488
+ # Cancel with substituted values
489
+ latex_result2 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)", {"x": 2})
490
+ print(latex_result2) # Output: "Undefined result. This could be a division by zero error."
491
+
492
+ --------------------------------------------------------------------------------
493
+
494
+ ### 10. `solve_homogeneous_polynomial_expression`
495
+
496
+ Solves a homogeneous polynomial expression for a specified variable and returns the solutions as a LaTeX formatted string. The expression is assumed to be homogeneous (i.e. equal to zero). Substitutions for the other variables in the equation may optionally be supplied.
497
+
498
+ • Parameters:
499
+ - `expression` (str): A string of the homogeneous polynomial expression to solve.
500
+ - `variable` (str): The variable to solve for.
501
+ - `subs` (Optional[Dict[str, float]]): Substitutions for variables.
502
+
503
+ • Returns:
504
+ - `str`: Solutions formatted in LaTeX.
505
+
506
+ • Example:
507
+
508
+ from rgwfuncs import solve_homogeneous_polynomial_expression
509
+ solutions1 = solve_homogeneous_polynomial_expression("a*x**2 + b*x + c", "x", {"a": 3, "b": 7, "c": 5})
510
+ print(solutions1) # Output: "\left[-7/6 - sqrt(11)*I/6, -7/6 + sqrt(11)*I/6\right]"
511
+
512
+ solutions2 = solve_homogeneous_polynomial_expression("x**2 - 4", "x")
513
+ print(solutions2) # Output: "\left[-2, 2\right]"
514
+
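The first example above can be verified with the quadratic formula. The sketch below uses only the standard library's `cmath` so that complex roots are handled; it covers quadratics only, and `solve_quadratic` is a hypothetical helper, not an rgwfuncs function.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    # cmath.sqrt handles a negative discriminant by returning an imaginary value.
    root = cmath.sqrt(b * b - 4 * a * c)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


# 3x^2 + 7x + 5 = 0 has discriminant 49 - 60 = -11,
# so the roots are -7/6 -/+ sqrt(11)*i/6.
print(solve_quadratic(3, 7, 5))

# x^2 - 4 = 0 has the real roots -2 and 2.
print(solve_quadratic(1, 0, -4))
```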
515
+ --------------------------------------------------------------------------------
516
+
517
+ ### 11. `plot_polynomial_functions`
518
+
519
+ This function plots polynomial functions described by a list of expressions and their corresponding substitution dictionaries. It generates SVG markup of the plots, with options for specifying the domain, axis zoom, and legend display.
520
+
521
+ • Parameters:
522
+ - `functions` (`List[Dict[str, Dict[str, Any]]]`): A list of dictionaries, each containing:
523
+ - A key which is a string representing a Python/NumPy expression (e.g., `"x**2"`, `"np.diff(x,2)"`).
524
+ - A value which is a dictionary containing substitutions for the expression. Must include an `"x"` key, either as `"*"` for default domain or a NumPy array.
525
+ - `zoom` (`float`): Determines the numeric axis range from `-zoom` to `+zoom` for both x and y axes (default is `10.0`).
526
+ - `show_legend` (`bool`): Specifies whether to include a legend in the plot (default is `True`).
527
+ - `open_file` (`bool`): If `True`, opens the SVG in the system's default viewer: from `save_path` when one is given, otherwise from a temporary file (defaults to `False`).
528
+ - `save_path` (`Optional[str]`): If specified, saves the output string as a .svg at the indicated path (defaults to None).
529
+
530
+ • Returns:
531
+ - `str`: The raw SVG markup of the resulting plot.
532
+
533
+ • Example:
534
+
535
+ import numpy as np
+ from rgwfuncs import plot_polynomial_functions
536
+
537
+ # Generate the SVG
538
+ plot_svg_string = plot_polynomial_functions(
539
+ functions=[
540
+ {"x**2": {"x": "*"}}, # Single expression, "*" means plot all discernable points
541
+ {"x**2/(2 + a) + a": {"x": np.linspace(-3, 4, 101), "a": 1.23}},
542
+ {"np.diff(x**3, 2)": {"x": np.linspace(-2, 2, 10)}}
543
+ ],
544
+ zoom=2
545
+ )
546
+
547
+ # Write the SVG to an actual file
548
+ with open("plot.svg", "w", encoding="utf-8") as file:
549
+ file.write(plot_svg_string)
550
+
551
+ • Displaying the SVG:
552
+
553
+ ![Plot](./media/plot_polynomial_functions_example_1.svg)
554
+
555
+ --------------------------------------------------------------------------------
556
+
557
+ ## String Based Functions
558
+
559
+ ### 1. send_telegram_message
154
560
 
155
561
  Send a message to a Telegram chat using a specified preset from your configuration file.
156
562
 
@@ -176,20 +582,7 @@ Send a message to a Telegram chat using a specified preset from your configurati
176
582
 
177
583
  Below is a quick reference of available functions, their purpose, and basic usage examples.
178
584
 
179
- ### 1. df_docs
180
- Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
181
-
182
- • Parameters:
183
- - `method_type_filter` (str): Optional, comma-separated to select docstring types, or '*' for all.
184
-
185
- • Example:
186
-
187
- import rgwfuncs
188
- rgwfuncs.df_docs(method_type_filter='numeric_clean,limit_dataframe')
189
-
190
- --------------------------------------------------------------------------------
191
-
192
- ### 2. `numeric_clean`
585
+ ### 1. `numeric_clean`
193
586
  Cleans the numeric columns in a DataFrame according to specified treatments.
194
587
 
195
588
  • Parameters:
@@ -218,7 +611,7 @@ Cleans the numeric columns in a DataFrame according to specified treatments.
218
611
 
219
612
  --------------------------------------------------------------------------------
220
613
 
221
- ### 3. `limit_dataframe`
614
+ ### 2. `limit_dataframe`
222
615
  Limit the DataFrame to a specified number of rows.
223
616
 
224
617
  • Parameters:
@@ -239,7 +632,7 @@ Limit the DataFrame to a specified number of rows.
239
632
 
240
633
  --------------------------------------------------------------------------------
241
634
 
242
- ### 4. `from_raw_data`
635
+ ### 3. `from_raw_data`
243
636
  Create a DataFrame from raw data.
244
637
 
245
638
  • Parameters:
@@ -265,7 +658,7 @@ Create a DataFrame from raw data.
265
658
 
266
659
  --------------------------------------------------------------------------------
267
660
 
268
- ### 5. `append_rows`
661
+ ### 4. `append_rows`
269
662
  Append rows to the DataFrame.
270
663
 
271
664
  • Parameters:
@@ -290,7 +683,7 @@ Append rows to the DataFrame.
290
683
 
291
684
  --------------------------------------------------------------------------------
292
685
 
293
- ### 6. `append_columns`
686
+ ### 5. `append_columns`
294
687
  Append new columns to the DataFrame with None values.
295
688
 
296
689
  • Parameters:
@@ -311,7 +704,7 @@ Append new columns to the DataFrame with None values.
311
704
 
312
705
  --------------------------------------------------------------------------------
313
706
 
314
- ### 7. `update_rows`
707
+ ### 6. `update_rows`
315
708
  Update specific rows in the DataFrame based on a condition.
316
709
 
317
710
  • Parameters:
@@ -333,7 +726,7 @@ Update specific rows in the DataFrame based on a condition.
333
726
 
334
727
  --------------------------------------------------------------------------------
335
728
 
336
- ### 8. `delete_rows`
729
+ ### 7. `delete_rows`
337
730
  Delete rows from the DataFrame based on a condition.
338
731
 
339
732
  • Parameters:
@@ -354,7 +747,7 @@ Delete rows from the DataFrame based on a condition.
354
747
 
355
748
  --------------------------------------------------------------------------------
356
749
 
357
- ### 9. `drop_duplicates`
750
+ ### 8. `drop_duplicates`
358
751
  Drop duplicate rows in the DataFrame, retaining the first occurrence.
359
752
 
360
753
  • Parameters:
@@ -374,7 +767,7 @@ Drop duplicate rows in the DataFrame, retaining the first occurrence.
374
767
 
375
768
  --------------------------------------------------------------------------------
376
769
 
377
- ### 10. `drop_duplicates_retain_first`
770
+ ### 9. `drop_duplicates_retain_first`
378
771
  Drop duplicate rows based on specified columns, retaining the first occurrence.
379
772
 
380
773
  • Parameters:
@@ -395,7 +788,7 @@ Drop duplicate rows based on specified columns, retaining the first occurrence.
395
788
 
396
789
  --------------------------------------------------------------------------------
397
790
 
398
- ### 11. `drop_duplicates_retain_last`
791
+ ### 10. `drop_duplicates_retain_last`
399
792
  Drop duplicate rows based on specified columns, retaining the last occurrence.
400
793
 
401
794
  • Parameters:
@@ -417,34 +810,55 @@ Drop duplicate rows based on specified columns, retaining the last occurrence.
417
810
 
418
811
  --------------------------------------------------------------------------------
419
812
 
420
- ### 12. `load_data_from_query`
813
+ ### 11. `load_data_from_query`
421
814
 
422
- Load data from a database query into a DataFrame based on a configuration preset.
815
+ Load data from a specified database using a SQL query and return the results in a Pandas DataFrame. The database connection configurations are determined by a preset name specified in a configuration file.
423
816
 
424
- - **Parameters:**
425
- - `db_preset_name` (str): Name of the database preset in the configuration file.
426
- - `query` (str): The SQL query to execute.
817
+ #### Features
427
818
 
428
- - **Returns:**
429
- - `pd.DataFrame`: A DataFrame containing the query result.
819
+ - Multi-Database Support: This function supports different database types, including MSSQL, MySQL, ClickHouse, Google BigQuery, and AWS Athena, based on the configuration preset selected.
820
+ - Configuration-Based: It utilizes a configuration file to store database connection details securely, avoiding hardcoding sensitive information directly into the script.
821
+ - Dynamic Query Execution: Capable of executing custom user-defined SQL queries against the specified database.
822
+ - Automatic Result Loading: Fetches query results and loads them directly into a Pandas DataFrame for further manipulation and analysis.
430
823
 
431
- - **Notes:**
432
- - The configuration file is assumed to be located at `~/.rgwfuncsrc`.
824
+ #### Parameters
433
825
 
434
- - **Example:**
826
+ - `db_preset_name` (str): The name of the database preset found in the configuration file. This preset determines which database connection details to use.
827
+ - `query` (str): The SQL query string to be executed on the database.
435
828
 
436
- from rgwfuncs import load_data_from_query
829
+ #### Returns
830
+
831
+ - `pd.DataFrame`: Returns a DataFrame that contains the results from the executed SQL query.
832
+
833
+ #### Configuration Details
834
+
835
+ - The configuration file is expected to be in JSON format and located at `~/.rgwfuncsrc`.
836
+ - Each preset within the configuration file must include:
837
+ - `name`: Name of the database preset.
838
+ - `db_type`: Type of the database (`mssql`, `mysql`, `clickhouse`, `google_big_query`, `aws_athena`).
839
+ - `credentials`: Necessary credentials such as host, username, password, and potentially others depending on the database type.
840
+
841
+ #### Example
842
+
843
+ from rgwfuncs import load_data_from_query
844
+
845
+ # Load data using a preset configuration
846
+ df = load_data_from_query(
847
+ db_preset_name="MyDBPreset",
848
+ query="SELECT * FROM my_table"
849
+ )
850
+ print(df)
437
851
 
438
- df = load_data_from_query(
439
- db_preset_name="MyDBPreset",
440
- query="SELECT * FROM my_table"
441
- )
442
- print(df)
852
+ #### Notes
443
853
 
854
+ - Security: Ensure that the configuration file (`~/.rgwfuncsrc`) is secure and accessible only to authorized users, as it contains sensitive information.
855
+ - Pre-requisites: Ensure the necessary Python packages are installed for each database type you wish to query. For example, `pymssql` for MSSQL, `mysql-connector-python` for MySQL, and so on.
856
+ - Error Handling: The function raises a `ValueError` if the specified preset name does not exist or if the database type is unsupported. Additional exceptions may arise from network issues or database errors.
857
+ - Environment: For AWS Athena, ensure that AWS credentials are configured properly for the boto3 library to authenticate successfully. Consider using AWS IAM roles or AWS Secrets Manager for better security management.
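The preset lookup and the `ValueError` behaviour described in these notes can be sketched as below. This is an illustration under stated assumptions, not the library's code: the top-level key `db_presets` and the helper names `load_config` and `find_db_preset` are hypothetical, and the sample preset reuses the Athena fields from the configuration example with placeholder values.

```python
import json
from pathlib import Path

SUPPORTED_DB_TYPES = {
    "mssql", "mysql", "clickhouse", "google_big_query", "aws_athena",
}


def load_config(path=None) -> dict:
    """Read the JSON configuration file (defaults to ~/.rgwfuncsrc)."""
    config_path = Path(path) if path else Path.home() / ".rgwfuncsrc"
    return json.loads(config_path.read_text())


def find_db_preset(config: dict, db_preset_name: str) -> dict:
    """Return the named preset, or raise ValueError if missing or unsupported."""
    # "db_presets" is an assumed key name for the list of database presets.
    for preset in config.get("db_presets", []):
        if preset.get("name") == db_preset_name:
            if preset.get("db_type") not in SUPPORTED_DB_TYPES:
                raise ValueError(f"Unsupported db_type: {preset.get('db_type')!r}")
            return preset
    raise ValueError(f"No database preset named {db_preset_name!r}")


example_config = {
    "db_presets": [
        {
            "name": "athena_db1",
            "db_type": "aws_athena",
            "aws_access_key": "",
            "aws_secret_key": "",
            "aws_region": "",
            "database": "logs",
            "output_bucket": "s3://bucket-name",
        }
    ]
}

print(find_db_preset(example_config, "athena_db1")["db_type"])  # aws_athena
```

Validating the preset before opening any connection keeps configuration mistakes separate from network or database errors.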
444
858
 
445
859
  --------------------------------------------------------------------------------
446
860
 
447
- ### 13. `load_data_from_path`
861
+ ### 12. `load_data_from_path`
448
862
  Load data from a file into a DataFrame based on the file extension.
449
863
 
450
864
  • Parameters:
@@ -463,7 +877,7 @@ Load data from a file into a DataFrame based on the file extension.
463
877
 
464
878
  --------------------------------------------------------------------------------
465
879
 
466
- ### 14. `load_data_from_sqlite_path`
880
+ ### 13. `load_data_from_sqlite_path`
467
881
  Execute a query on a SQLite database file and return the results as a DataFrame.
468
882
 
469
883
  • Parameters:
@@ -483,7 +897,7 @@ Execute a query on a SQLite database file and return the results as a DataFrame.
483
897
 
484
898
  --------------------------------------------------------------------------------
485
899
 
486
- ### 15. `first_n_rows`
900
+ ### 14. `first_n_rows`
487
901
  Display the first n rows of the DataFrame (prints out in dictionary format).
488
902
 
489
903
  • Parameters:
@@ -501,7 +915,7 @@ Display the first n rows of the DataFrame (prints out in dictionary format).
501
915
 
502
916
  --------------------------------------------------------------------------------
503
917
 
504
- ### 16. `last_n_rows`
918
+ ### 15. `last_n_rows`
505
919
  Display the last n rows of the DataFrame (prints out in dictionary format).
506
920
 
507
921
  • Parameters:
@@ -519,7 +933,7 @@ Display the last n rows of the DataFrame (prints out in dictionary format).
519
933
 
520
934
  --------------------------------------------------------------------------------
521
935
 
522
- ### 17. `top_n_unique_values`
936
+ ### 16. `top_n_unique_values`
523
937
  Print the top n unique values for specified columns in the DataFrame.
524
938
 
525
939
  • Parameters:
@@ -538,7 +952,7 @@ Print the top n unique values for specified columns in the DataFrame.
538
952
 
539
953
  --------------------------------------------------------------------------------
540
954
 
541
- ### 18. `bottom_n_unique_values`
955
+ ### 17. `bottom_n_unique_values`
542
956
  Print the bottom n unique values for specified columns in the DataFrame.
543
957
 
544
958
  • Parameters:
@@ -557,7 +971,7 @@ Print the bottom n unique values for specified columns in the DataFrame.
557
971
 
558
972
  --------------------------------------------------------------------------------
559
973
 
560
- ### 19. `print_correlation`
974
+ ### 18. `print_correlation`
561
975
  Print correlation for multiple pairs of columns in the DataFrame.
562
976
 
563
977
  • Parameters:
@@ -582,7 +996,7 @@ Print correlation for multiple pairs of columns in the DataFrame.
582
996
 
583
997
  --------------------------------------------------------------------------------
584
998
 
585
- ### 20. `print_memory_usage`
999
+ ### 19. `print_memory_usage`
586
1000
  Print the memory usage of the DataFrame in megabytes.
587
1001
 
588
1002
  • Parameters:
@@ -599,7 +1013,7 @@ Print the memory usage of the DataFrame in megabytes.
599
1013
 
600
1014
  --------------------------------------------------------------------------------
601
1015
 
602
- ### 21. `filter_dataframe`
1016
+ ### 20. `filter_dataframe`
603
1017
  Return a new DataFrame filtered by a given query expression.
604
1018
 
605
1019
  • Parameters:
@@ -625,7 +1039,7 @@ Return a new DataFrame filtered by a given query expression.
625
1039
 
626
1040
  --------------------------------------------------------------------------------
627
1041
 
628
- ### 22. `filter_indian_mobiles`
1042
+ ### 21. `filter_indian_mobiles`
629
1043
  Filter and return rows containing valid Indian mobile numbers in the specified column.
630
1044
 
631
1045
  • Parameters:
@@ -647,7 +1061,7 @@ Filter and return rows containing valid Indian mobile numbers in the specified c
647
1061
 
648
1062
  --------------------------------------------------------------------------------
649
1063
 
650
- ### 23. `print_dataframe`
1064
+ ### 22. `print_dataframe`
651
1065
  Print the entire DataFrame and its column types. Optionally print a source path.
652
1066
 
653
1067
  • Parameters:
@@ -665,7 +1079,7 @@ Print the entire DataFrame and its column types. Optionally print a source path.
665
1079
 
666
1080
  --------------------------------------------------------------------------------
667
1081
 
668
- ### 24. `send_dataframe_via_telegram`
1082
+ ### 23. `send_dataframe_via_telegram`
669
1083
  Send a DataFrame via Telegram using a specified bot configuration.
670
1084
 
671
1085
  • Parameters:
@@ -692,7 +1106,7 @@ Send a DataFrame via Telegram using a specified bot configuration.
692
1106
 
693
1107
  --------------------------------------------------------------------------------
694
1108
 
695
- ### 25. `send_data_to_email`
1109
+ ### 24. `send_data_to_email`
696
1110
  Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.
697
1111
 
698
1112
  • Parameters:
@@ -722,7 +1136,7 @@ Send an email with an optional DataFrame attachment using the Gmail API via a sp
722
1136
 
723
1137
  --------------------------------------------------------------------------------
724
1138
 
725
- ### 26. `send_data_to_slack`
1139
+ ### 25. `send_data_to_slack`
726
1140
  Send a DataFrame or message to Slack using a specified bot configuration.
727
1141
 
728
1142
  • Parameters:
@@ -748,7 +1162,7 @@ Send a DataFrame or message to Slack using a specified bot configuration.
748
1162
 
749
1163
  --------------------------------------------------------------------------------
750
1164
 
751
- ### 27. `order_columns`
1165
+ ### 26. `order_columns`
752
1166
  Reorder the columns of a DataFrame based on a string input.
753
1167
 
754
1168
  • Parameters:
@@ -770,7 +1184,7 @@ Reorder the columns of a DataFrame based on a string input.
770
1184
 
771
1185
  --------------------------------------------------------------------------------
772
1186
 
773
- ### 28. `append_ranged_classification_column`
1187
+ ### 27. `append_ranged_classification_column`
774
1188
  Append a ranged classification column to the DataFrame.
775
1189
 
776
1190
  • Parameters:
@@ -794,7 +1208,7 @@ Append a ranged classification column to the DataFrame.
794
1208
 
795
1209
  --------------------------------------------------------------------------------
796
1210
 
797
- ### 29. `append_percentile_classification_column`
1211
+ ### 28. `append_percentile_classification_column`
798
1212
  Append a percentile classification column to the DataFrame.
799
1213
 
800
1214
  • Parameters:
@@ -818,7 +1232,7 @@ Append a percentile classification column to the DataFrame.
818
1232
 
819
1233
  --------------------------------------------------------------------------------
820
1234
 
821
- ### 30. `append_ranged_date_classification_column`
1235
+ ### 29. `append_ranged_date_classification_column`
822
1236
  Append a ranged date classification column to the DataFrame.
823
1237
 
824
1238
  • Parameters:
@@ -847,7 +1261,7 @@ Append a ranged date classification column to the DataFrame.
847
1261
 
848
1262
  --------------------------------------------------------------------------------
849
1263
 
850
- ### 31. `rename_columns`
1264
+ ### 30. `rename_columns`
851
1265
  Rename columns in the DataFrame.
852
1266
 
853
1267
  • Parameters:
@@ -869,7 +1283,7 @@ Rename columns in the DataFrame.
869
1283
 
870
1284
  --------------------------------------------------------------------------------
871
1285
 
872
- ### 32. `cascade_sort`
1286
+ ### 31. `cascade_sort`
873
1287
  Cascade sort the DataFrame by specified columns and order.
874
1288
 
875
1289
  • Parameters:
@@ -895,7 +1309,7 @@ Cascade sort the DataFrame by specified columns and order.
895
1309
 
896
1310
  --------------------------------------------------------------------------------
897
1311
 
898
- ### 33. `append_xgb_labels`
1312
+ ### 32. `append_xgb_labels`
899
1313
  Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
900
1314
 
901
1315
  • Parameters:
@@ -917,7 +1331,7 @@ Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
917
1331
 
918
1332
  --------------------------------------------------------------------------------
919
1333
 
920
- ### 34. `append_xgb_regression_predictions`
1334
+ ### 33. `append_xgb_regression_predictions`
921
1335
  Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
922
1336
 
923
1337
  • Parameters:
@@ -949,7 +1363,7 @@ Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` colum
949
1363
 
950
1364
  --------------------------------------------------------------------------------
951
1365
 
952
- ### 35. `append_xgb_logistic_regression_predictions`
1366
+ ### 34. `append_xgb_logistic_regression_predictions`
953
1367
  Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
954
1368
 
955
1369
  • Parameters:
@@ -981,7 +1395,7 @@ Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TY
981
1395
 
982
1396
  --------------------------------------------------------------------------------
983
1397
 
984
- ### 36. `print_n_frequency_cascading`
1398
+ ### 35. `print_n_frequency_cascading`
985
1399
  Print the cascading frequency of top n values for specified columns.
986
1400
 
987
1401
  • Parameters:
@@ -1001,27 +1415,36 @@ Print the cascading frequency of top n values for specified columns.
1001
1415
 
1002
1416
  --------------------------------------------------------------------------------
1003
1417
 
1004
- ### 37. `print_n_frequency_linear`
1005
- Print the linear frequency of top n values for specified columns.
1418
+ ### 36. `print_n_frequency_linear`
1006
1419
 
1007
- Parameters:
1008
- - df (pd.DataFrame)
1009
- - n (int)
1010
- - columns (str): Comma-separated columns.
1011
- - `order_by` (str)
1420
+ Prints the linear frequency of the top `n` values for specified columns.
1421
+
1422
+ #### Parameters:
1423
+ - **df** (`pd.DataFrame`): The DataFrame to analyze.
1424
+ - **n** (`int`): The number of top values to print for each column.
1425
+ - **columns** (`list`): A list of column names to be analyzed.
1426
+ - **order_by** (`str`): How to order the printed values. The available options are:
1427
+ - `"ASC"`: Sort keys in ascending lexicographical order.
1428
+ - `"DESC"`: Sort keys in descending lexicographical order.
1429
+ - `"FREQ_ASC"`: Sort the frequencies in ascending order (least frequent first).
1430
+ - `"FREQ_DESC"`: Sort the frequencies in descending order (most frequent first).
1431
+ - `"BY_KEYS_ASC"`: Sort keys in ascending order, numerically where possible, treating special strings such as 'NaN' as ordinary entries.
1432
+ - `"BY_KEYS_DESC"`: Sort keys in descending order, numerically where possible, treating special strings such as 'NaN' as ordinary entries.
1433
+
1434
+ #### Example:
1012
1435
 
1013
- • Example:
1014
-
1015
1436
  from rgwfuncs import print_n_frequency_linear
1016
1437
  import pandas as pd
1017
1438
 
1018
- df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
1019
- print_n_frequency_linear(df, 2, 'City', 'FREQ_DESC')
1020
-
1439
+ df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
1440
+ print_n_frequency_linear(df, 2, ['City'], 'FREQ_DESC')
1441
+
1442
+ This example analyzes the `City` column, printing the top 2 most frequent values in descending order of frequency.
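To make the `order_by` options more concrete, here is a standard-library sketch of one plausible reading of them. It is not the library's code: the helper `linear_frequency` is hypothetical, it covers only four of the six options, and it assumes the `n` most frequent values are selected first and then ordered.

```python
from collections import Counter


def linear_frequency(values, n, order_by="FREQ_DESC"):
    """Return up to n (value, count) pairs, ordered according to order_by.

    Assumption: the n most frequent values are picked first, then sorted.
    """
    top = Counter(str(v) for v in values).most_common(n)
    if order_by == "FREQ_DESC":
        return sorted(top, key=lambda kv: (-kv[1], kv[0]))
    if order_by == "FREQ_ASC":
        return sorted(top, key=lambda kv: (kv[1], kv[0]))
    if order_by == "ASC":
        return sorted(top, key=lambda kv: kv[0])
    if order_by == "DESC":
        return sorted(top, key=lambda kv: kv[0], reverse=True)
    raise ValueError(f"Unsupported order_by: {order_by!r}")


cities = ["NY", "LA", "NY", "SF", "LA", "LA"]
print(linear_frequency(cities, 2, "FREQ_DESC"))  # [('LA', 3), ('NY', 2)]
print(linear_frequency(cities, 2, "DESC"))       # [('NY', 2), ('LA', 3)]
```

With the `City` data from the example, `SF` is dropped because it is not among the two most frequent values.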
1443
+
1021
1444
 
1022
1445
  --------------------------------------------------------------------------------
1023
1446
 
1024
- ### 38. `retain_columns`
1447
+ ### 37. `retain_columns`
1025
1448
  Retain specified columns in the DataFrame and drop the others.
1026
1449
 
1027
1450
  • Parameters:
@@ -1043,7 +1466,7 @@ Retain specified columns in the DataFrame and drop the others.
1043
1466
 
1044
1467
  --------------------------------------------------------------------------------
1045
1468
 
1046
- ### 39. `mask_against_dataframe`
1469
+ ### 38. `mask_against_dataframe`
1047
1470
  Retain only rows with common column values between two DataFrames.
1048
1471
 
1049
1472
  • Parameters:
@@ -1068,7 +1491,7 @@ Retain only rows with common column values between two DataFrames.
1068
1491
 
1069
1492
  --------------------------------------------------------------------------------
1070
1493
 
1071
- ### 40. `mask_against_dataframe_converse`
1494
+ ### 39. `mask_against_dataframe_converse`
1072
1495
  Retain only rows with uncommon column values between two DataFrames.
1073
1496
 
1074
1497
  • Parameters:
@@ -1093,7 +1516,7 @@ Retain only rows with uncommon column values between two DataFrames.
1093
1516
 
1094
1517
  --------------------------------------------------------------------------------
1095
1518
 
1096
- ### 41. `union_join`
1519
+ ### 40. `union_join`
1097
1520
  Perform a union join, concatenating two DataFrames and dropping duplicates.
1098
1521
 
1099
1522
  • Parameters:
@@ -1116,7 +1539,7 @@ Perform a union join, concatenating two DataFrames and dropping duplicates.
1116
1539
 
1117
1540
  --------------------------------------------------------------------------------
1118
1541
 
1119
- ### 42. `bag_union_join`
1542
+ ### 41. `bag_union_join`
1120
1543
  Perform a bag union join, concatenating two DataFrames without dropping duplicates.
1121
1544
 
1122
1545
  • Parameters:
@@ -1139,7 +1562,7 @@ Perform a bag union join, concatenating two DataFrames without dropping duplicat
1139
1562
 
1140
1563
  --------------------------------------------------------------------------------
1141
1564
 
1142
- ### 43. `left_join`
1565
+ ### 42. `left_join`
1143
1566
  Perform a left join on two DataFrames.
1144
1567
 
1145
1568
  • Parameters:
@@ -1164,7 +1587,7 @@ Perform a left join on two DataFrames.
1164
1587
 
1165
1588
  --------------------------------------------------------------------------------
1166
1589
 
1167
- ### 44. `right_join`
1590
+ ### 43. `right_join`
1168
1591
  Perform a right join on two DataFrames.
1169
1592
 
1170
1593
  • Parameters:
@@ -1189,7 +1612,7 @@ Perform a right join on two DataFrames.
1189
1612
 
1190
1613
  --------------------------------------------------------------------------------
1191
1614
 
1192
- ### 45. `insert_dataframe_in_sqlite_database`
1615
+ ### 44. `insert_dataframe_in_sqlite_database`
1193
1616
 
1194
1617
  Inserts a Pandas DataFrame into a SQLite database table. If the specified table does not exist, it will be created with column types automatically inferred from the DataFrame's data types.
1195
1618
 
@@ -1227,7 +1650,7 @@ Inserts a Pandas DataFrame into a SQLite database table. If the specified table
1227
1650
 
1228
1651
  --------------------------------------------------------------------------------
1229
1652
 
1230
- ### 46. `sync_dataframe_to_sqlite_database`
1653
+ ### 45. `sync_dataframe_to_sqlite_database`
1231
1654
  Processes and saves a DataFrame to an SQLite database, adding a timestamp column and replacing the existing table if needed. Creates the table if it does not exist.
1232
1655
 
1233
1656
  • Parameters:
@@ -1251,6 +1674,8 @@ Processes and saves a DataFrame to an SQLite database, adding a timestamp column
1251
1674
 
1252
1675
  --------------------------------------------------------------------------------
1253
1676
 
1677
+
1678
+
1254
1679
  ## Additional Info
1255
1680
 
1256
1681
  For more information, refer to each function’s docstring by calling: