rgwfuncs 0.0.2__tar.gz → 0.0.3__tar.gz

@@ -0,0 +1,1003 @@
1
+ Metadata-Version: 2.2
2
+ Name: rgwfuncs
3
+ Version: 0.0.3
4
+ Summary: A functional programming paradigm for mathematical modelling and data science
5
+ Home-page: https://github.com/ryangerardwilson/rgwfunc
6
+ Author: Ryan Gerard Wilson
7
+ Author-email: Ryan Gerard Wilson <ryangerardwilson@gmail.com>
8
+ Project-URL: Homepage, https://github.com/ryangerardwilson/rgwfuncs
9
+ Project-URL: Issues, https://github.com/ryangerardwilson/rgwfuncs
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Operating System :: OS Independent
13
+ Requires-Python: >=3.12
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+ Requires-Dist: pandas
17
+ Requires-Dist: pymssql
18
+ Requires-Dist: mysql-connector-python
19
+ Requires-Dist: clickhouse-connect
20
+ Requires-Dist: google-cloud-bigquery
21
+ Requires-Dist: google-auth
22
+ Requires-Dist: xgboost
23
+ Requires-Dist: requests
24
+ Requires-Dist: slack-sdk
25
+ Requires-Dist: google-api-python-client
26
+
27
+ RGWML
28
+
29
+ ***By Ryan Gerard Wilson (https://ryangerardwilson.com)***
30
+
31
+ # RGWFuncs
32
+
33
+ This library provides a variety of functions for manipulating and analyzing pandas DataFrames.
34
+
35
+ --------------------------------------------------------------------------------
36
+
37
+ ## Installation
38
+
39
+ Install the package using:
40
+ ```bash
41
+ pip install rgwfuncs
42
+ ```
43
+
44
+ --------------------------------------------------------------------------------
45
+
46
+ ## Basic Usage
47
+
48
+ Import the library:
49
+ ```python
50
+ import rgwfuncs
51
+ ```
52
+
53
+ List the available functions in alphabetical order:
54
+ ```python
55
+ rgwfuncs.docs()
56
+ ```
57
+
58
+ View specific docstrings by providing a filter (comma-separated). For example, to display docstrings about "numeric_clean":
59
+ ```python
60
+ rgwfuncs.docs(method_type_filter='numeric_clean')
61
+ ```
62
+
63
+ To display all docstrings, use:
64
+ ```python
65
+ rgwfuncs.docs(method_type_filter='*')
66
+ ```
67
+
68
+ --------------------------------------------------------------------------------
69
+
70
+ ## Function References and Syntax Examples
71
+
72
+ Below is a quick reference of available functions, their purpose, and basic usage examples.
73
+
74
+ ### 1. `docs`
75
+ Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
76
+
77
+ • Parameters:
78
+ - `method_type_filter` (str): Optional. A comma-separated list of function names whose docstrings should be printed, or '*' for all.
79
+
80
+ • Example:
81
+
82
+     import rgwfuncs
83
+     rgwfuncs.docs(method_type_filter='numeric_clean,limit_dataframe')
84
+
85
+ --------------------------------------------------------------------------------
86
+
87
+ ### 2. `numeric_clean`
88
+ Cleans the numeric columns in a DataFrame according to specified treatments.
89
+
90
+ • Parameters:
91
+ - df (pd.DataFrame): The DataFrame to clean.
92
+ - `column_names` (str): A comma-separated string containing the names of the columns to clean.
93
+ - `column_type` (str): The type to convert the column to (`INTEGER` or `FLOAT`).
94
+ - `irregular_value_treatment` (str): How to treat irregular values (`NAN`, `TO_ZERO`, `MEAN`).
95
+
96
+ • Returns:
97
+ - pd.DataFrame: A new DataFrame with cleaned numeric columns.
98
+
99
+ • Example:
100
+
101
+     from rgwfuncs import numeric_clean
102
+     import pandas as pd
103
+
104
+     # Sample DataFrame
105
+     df = pd.DataFrame({
106
+         'col1': [1, 2, 3, 'x', 4],
107
+         'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8]
108
+     })
109
+
110
+     # Clean numeric columns
111
+     df_cleaned = numeric_clean(df, 'col1,col2', 'FLOAT', 'MEAN')
112
+     print(df_cleaned)
113
+
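To make the `MEAN` treatment concrete, here is a minimal pure-Python sketch of one plausible reading: values that cannot be parsed as numbers are replaced by the mean of the values that can. The helper `clean_floats_with_mean` is hypothetical and not part of rgwfuncs; the library's actual handling may differ.

```python
def clean_floats_with_mean(values):
    """Hypothetical helper: coerce values to float, filling failures with the mean."""
    parsed = []
    for value in values:
        try:
            parsed.append(float(value))
        except (TypeError, ValueError):
            parsed.append(None)  # mark irregular values for later replacement
    valid = [v for v in parsed if v is not None]
    # Assumption: the mean is taken over the successfully parsed values only.
    mean = sum(valid) / len(valid) if valid else float("nan")
    return [mean if v is None else v for v in parsed]


col1 = clean_floats_with_mean([1, 2, 3, 'x', 4])
print(col1)
```

Under this reading, the `'x'` entry in `col1` becomes 2.5, the mean of 1, 2, 3, and 4.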
114
+ --------------------------------------------------------------------------------
115
+
116
+ ### 3. `limit_dataframe`
117
+ Limit the DataFrame to a specified number of rows.
118
+
119
+ • Parameters:
120
+ - df (pd.DataFrame): The DataFrame to limit.
121
+ - `num_rows` (int): The number of rows to retain.
122
+
123
+ • Returns:
124
+ - pd.DataFrame: A new DataFrame limited to the specified number of rows.
125
+
126
+ • Example:
127
+ ```
128
+ from rgwfuncs import limit_dataframe
129
+ import pandas as pd
130
+
131
+ df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})
132
+ df_limited = limit_dataframe(df, 5)
133
+ print(df_limited)
134
+ ```
135
+ --------------------------------------------------------------------------------
136
+
137
+ ### 4. `from_raw_data`
138
+ Create a DataFrame from raw data.
139
+
140
+ • Parameters:
141
+ - headers (list): A list of column headers.
142
+ - data (list of lists): A two-dimensional list of data.
143
+
144
+ • Returns:
145
+ - pd.DataFrame: A DataFrame created from the raw data.
146
+
147
+ • Example:
148
+ ```
149
+ from rgwfuncs import from_raw_data
150
+
151
+ headers = ["Name", "Age"]
152
+ data = [
153
+ ["Alice", 30],
154
+ ["Bob", 25],
155
+ ["Charlie", 35]
156
+ ]
157
+
158
+ df = from_raw_data(headers, data)
159
+ print(df)
160
+ ```
161
+ --------------------------------------------------------------------------------
162
+
163
+ ### 5. `append_rows`
164
+ Append rows to the DataFrame.
165
+
166
+ • Parameters:
167
+ - df (pd.DataFrame): The original DataFrame.
168
+ - rows (list of lists): Each inner list represents a row to be appended.
169
+
170
+ • Returns:
171
+ - pd.DataFrame: A new DataFrame with appended rows.
172
+
173
+ • Example:
174
+ ```
175
+ from rgwfuncs import append_rows
176
+ import pandas as pd
177
+
178
+ df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
179
+ new_rows = [
180
+ ['Bob', 25],
181
+ ['Charlie', 35]
182
+ ]
183
+ df_appended = append_rows(df, new_rows)
184
+ print(df_appended)
185
+ ```
186
+ --------------------------------------------------------------------------------
187
+
188
+ ### 6. `append_columns`
189
+ Append new columns to the DataFrame with None values.
190
+
191
+ • Parameters:
192
+ - df (pd.DataFrame): The original DataFrame.
193
+ - `col_names` (list or comma-separated string): The names of the columns to add.
194
+
195
+ • Returns:
196
+ - pd.DataFrame: A new DataFrame with the new columns appended.
197
+
198
+ • Example:
199
+ ```
200
+ from rgwfuncs import append_columns
201
+ import pandas as pd
202
+
203
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
204
+ df_new = append_columns(df, ['Salary', 'Department'])
205
+ print(df_new)
206
+ ```
207
+ --------------------------------------------------------------------------------
208
+
209
+ ### 7. `update_rows`
210
+ Update specific rows in the DataFrame based on a condition.
211
+
212
+ • Parameters:
213
+ - df (pd.DataFrame): The original DataFrame.
214
+ - condition (str): A query condition to identify rows for updating.
215
+ - updates (dict): A dictionary with column names as keys and new values as values.
216
+
217
+ • Returns:
218
+ - pd.DataFrame: A new DataFrame with updated rows.
219
+
220
+ • Example:
221
+ ```
222
+ from rgwfuncs import update_rows
223
+ import pandas as pd
224
+
225
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
226
+ df_updated = update_rows(df, "Name == 'Alice'", {'Age': 31})
227
+ print(df_updated)
228
+ ```
229
+ --------------------------------------------------------------------------------
230
+
231
+ ### 8. `delete_rows`
232
+ Delete rows from the DataFrame based on a condition.
233
+
234
+ • Parameters:
235
+ - df (pd.DataFrame): The original DataFrame.
236
+ - condition (str): A query condition to identify rows for deletion.
237
+
238
+ • Returns:
239
+ - pd.DataFrame: The DataFrame with specified rows deleted.
240
+
241
+ • Example:
242
+ ```
243
+ from rgwfuncs import delete_rows
244
+ import pandas as pd
245
+
246
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
247
+ df_deleted = delete_rows(df, "Age < 28")
248
+ print(df_deleted)
249
+ ```
250
+ --------------------------------------------------------------------------------
251
+
252
+ ### 9. `drop_duplicates`
253
+ Drop duplicate rows in the DataFrame, retaining the first occurrence.
254
+
255
+ • Parameters:
256
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
257
+
258
+ • Returns:
259
+ - pd.DataFrame: A new DataFrame with duplicates removed.
260
+
261
+ • Example:
262
+ ```
263
+ from rgwfuncs import drop_duplicates
264
+ import pandas as pd
265
+
266
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
267
+ df_no_dupes = drop_duplicates(df)
268
+ print(df_no_dupes)
269
+ ```
270
+ --------------------------------------------------------------------------------
271
+
272
+ ### 10. `drop_duplicates_retain_first`
273
+ Drop duplicate rows based on specified columns, retaining the first occurrence.
274
+
275
+ • Parameters:
276
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
277
+ - columns (str): Comma-separated string with column names used to identify duplicates.
278
+
279
+ • Returns:
280
+ - pd.DataFrame: A new DataFrame with duplicates removed.
281
+
282
+ • Example:
283
+ ```
284
+ from rgwfuncs import drop_duplicates_retain_first
285
+ import pandas as pd
286
+
287
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
288
+ df_no_dupes = drop_duplicates_retain_first(df, 'A')
289
+ print(df_no_dupes)
290
+ ```
291
+ --------------------------------------------------------------------------------
292
+
293
+ ### 11. `drop_duplicates_retain_last`
294
+ Drop duplicate rows based on specified columns, retaining the last occurrence.
295
+
296
+ • Parameters:
297
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
298
+ - columns (str): Comma-separated string with column names used to identify duplicates.
299
+
300
+ • Returns:
301
+ - pd.DataFrame: A new DataFrame with duplicates removed.
302
+
303
+ • Example:
304
+ ```
305
+ from rgwfuncs import drop_duplicates_retain_last
306
+ import pandas as pd
307
+
308
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
309
+ df_no_dupes = drop_duplicates_retain_last(df, 'A')
310
+ print(df_no_dupes)
311
+ ```
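The sample data above has identical values in `B` for each duplicate, so retaining the first or the last occurrence gives the same output. The following pure-Python sketch uses rows where the choice matters. The helper `dedupe` and its `keep` argument are hypothetical, written only to illustrate the semantics described here, not taken from rgwfuncs.

```python
def dedupe(rows, key_columns, keep="first"):
    """Hypothetical helper: drop rows whose key columns repeat, keeping first or last."""
    # Walking the rows backwards turns "keep last" into "keep first".
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original row order when we walked backwards.
    return kept if keep == "first" else list(reversed(kept))


rows = [
    {"A": 1, "B": 3},
    {"A": 1, "B": 5},
    {"A": 2, "B": 4},
]
first = dedupe(rows, ["A"], keep="first")
last = dedupe(rows, ["A"], keep="last")
print(first)
print(last)
```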
312
+
313
+ --------------------------------------------------------------------------------
314
+
315
+ ### 12. `load_data_from_query`
316
+ Load data from a database query into a DataFrame based on a configuration preset.
317
+
318
+ • Parameters:
319
+ - `db_preset_name` (str): Name of the database preset in the config file.
320
+ - query (str): The SQL query to execute.
321
+ - `config_file_name` (str): Name of the configuration file (default: "rgwml.config").
322
+
323
+ • Returns:
324
+ - pd.DataFrame: A DataFrame containing the query result.
325
+
326
+ • Example:
327
+ ```
328
+ from rgwfuncs import load_data_from_query
329
+
330
+ df = load_data_from_query(
331
+ db_preset_name="MyDBPreset",
332
+ query="SELECT * FROM my_table",
333
+ config_file_name="rgwml.config"
334
+ )
335
+ print(df)
336
+ ```
337
+
338
+ --------------------------------------------------------------------------------
339
+
340
+ ### 13. `load_data_from_path`
341
+ Load data from a file into a DataFrame based on the file extension.
342
+
343
+ • Parameters:
344
+ - `file_path` (str): The absolute path to the data file.
345
+
346
+ • Returns:
347
+ - pd.DataFrame: A DataFrame containing the loaded data.
348
+
349
+ • Example:
350
+ ```
351
+ from rgwfuncs import load_data_from_path
352
+
353
+ df = load_data_from_path("/absolute/path/to/data.csv")
354
+ print(df)
355
+ ```
356
+
357
+ --------------------------------------------------------------------------------
358
+
359
+ ### 14. `load_data_from_sqlite_path`
360
+ Execute a query on a SQLite database file and return the results as a DataFrame.
361
+
362
+ • Parameters:
363
+ - `sqlite_path` (str): The absolute path to the SQLite database file.
364
+ - query (str): The SQL query to execute.
365
+
366
+ • Returns:
367
+ - pd.DataFrame: A DataFrame containing the query results.
368
+
369
+ • Example:
370
+ ```
371
+ from rgwfuncs import load_data_from_sqlite_path
372
+
373
+ df = load_data_from_sqlite_path("/path/to/database.db", "SELECT * FROM my_table")
374
+ print(df)
375
+ ```
376
+
377
+ --------------------------------------------------------------------------------
378
+
379
+ ### 15. `first_n_rows`
380
+ Print the first n rows of the DataFrame in dictionary format.
381
+
382
+ • Parameters:
383
+ - df (pd.DataFrame)
384
+ - n (int): Number of rows to display.
385
+
386
+ • Example:
387
+ ```
388
+ from rgwfuncs import first_n_rows
389
+ import pandas as pd
390
+
391
+ df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6]})
392
+ first_n_rows(df, 2)
393
+ ```
394
+
395
+ --------------------------------------------------------------------------------
396
+
397
+ ### 16. `last_n_rows`
398
+ Print the last n rows of the DataFrame in dictionary format.
399
+
400
+ • Parameters:
401
+ - df (pd.DataFrame)
402
+ - n (int): Number of rows to display.
403
+
404
+ • Example:
405
+ ```
406
+ from rgwfuncs import last_n_rows
407
+ import pandas as pd
408
+
409
+ df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [6,7,8,9,10]})
410
+ last_n_rows(df, 2)
411
+ ```
412
+
413
+ --------------------------------------------------------------------------------
414
+
415
+ ### 17. `top_n_unique_values`
416
+ Print the top n unique values for specified columns in the DataFrame.
417
+
418
+ • Parameters:
419
+ - df (pd.DataFrame): The DataFrame to evaluate.
420
+ - n (int): Number of top values to display.
421
+ - columns (list): List of columns for which to display top unique values.
422
+
423
+ • Example:
424
+ ```
425
+ from rgwfuncs import top_n_unique_values
426
+ import pandas as pd
427
+
428
+ df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
429
+ top_n_unique_values(df, 2, ['Cities'])
430
+ ```
431
+
432
+ --------------------------------------------------------------------------------
433
+
434
+ ### 18. `bottom_n_unique_values`
435
+ Print the bottom n unique values for specified columns in the DataFrame.
436
+
437
+ • Parameters:
438
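+ - df (pd.DataFrame): The DataFrame to evaluate.
439
+ - n (int): Number of bottom values to display.
440
+ - columns (list): List of columns for which to display bottom unique values.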
441
+
442
+ • Example:
443
+ ```
444
+ from rgwfuncs import bottom_n_unique_values
445
+ import pandas as pd
446
+
447
+ df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
448
+ bottom_n_unique_values(df, 1, ['Cities'])
449
+ ```
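Assuming "top" and "bottom" rank a column's unique values by how often they occur (the text does not spell this out), the idea can be sketched with `collections.Counter` on the same `Cities` data. This is an illustration of the ranking, not the library's code or its output format.

```python
from collections import Counter

cities = ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']
counts = Counter(cities)

# most_common() sorts by descending frequency.
top_two = counts.most_common(2)
# The least frequent value sits at the end of the full ranking.
bottom_one = counts.most_common()[-1:]

print(top_two)
print(bottom_one)
```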
450
+
451
+ --------------------------------------------------------------------------------
452
+
453
+ ### 19. `print_correlation`
454
+ Print correlation for multiple pairs of columns in the DataFrame.
455
+
456
+ • Parameters:
457
+ - df (pd.DataFrame)
458
+ - `column_pairs` (list of tuples): E.g., `[('col1','col2'), ('colA','colB')]`.
459
+
460
+ • Example:
461
+ ```
462
+ from rgwfuncs import print_correlation
463
+ import pandas as pd
464
+
465
+ df = pd.DataFrame({
466
+ 'col1': [1,2,3,4,5],
467
+ 'col2': [2,4,6,8,10],
468
+ 'colA': [10,9,8,7,6],
469
+ 'colB': [5,4,3,2,1]
470
+ })
471
+
472
+ pairs = [('col1','col2'), ('colA','colB')]
473
+ print_correlation(df, pairs)
474
+ ```
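As a sanity check on what to expect from the sample data, the sketch below computes a Pearson correlation by hand. That the function reports Pearson coefficients is an assumption; the helper `pearson` is hypothetical and exists only for this illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


r_col = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_letter = pearson([10, 9, 8, 7, 6], [5, 4, 3, 2, 1])
print(r_col, r_letter)
```

Both pairs in the sample are perfectly linearly related, so each coefficient comes out at 1 (up to floating-point rounding).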
475
+
476
+ --------------------------------------------------------------------------------
477
+
478
+ ### 20. `print_memory_usage`
479
+ Print the memory usage of the DataFrame in megabytes.
480
+
481
+ • Parameters:
482
+ - df (pd.DataFrame)
483
+
484
+ • Example:
485
+ ```
486
+ from rgwfuncs import print_memory_usage
487
+ import pandas as pd
488
+
489
+ df = pd.DataFrame({'A': range(1000)})
490
+ print_memory_usage(df)
491
+ ```
492
+
493
+ --------------------------------------------------------------------------------
494
+
495
+ ### 21. `filter_dataframe`
496
+ Return a new DataFrame filtered by a given query expression.
497
+
498
+ • Parameters:
499
+ - df (pd.DataFrame)
500
+ - `filter_expr` (str)
501
+
502
+ • Returns:
503
+ - pd.DataFrame
504
+
505
+ • Example:
506
+ ```
507
+ from rgwfuncs import filter_dataframe
508
+ import pandas as pd
509
+
510
+ df = pd.DataFrame({
511
+ 'Name': ['Alice', 'Bob', 'Charlie'],
512
+ 'Age': [30, 20, 25]
513
+ })
514
+
515
+ df_filtered = filter_dataframe(df, "Age > 23")
516
+ print(df_filtered)
517
+ ```
518
+
519
+ --------------------------------------------------------------------------------
520
+
521
+ ### 22. `filter_indian_mobiles`
522
+ Filter and return rows containing valid Indian mobile numbers in the specified column.
523
+
524
+ • Parameters:
525
+ - df (pd.DataFrame)
526
+ - `mobile_col` (str): The column name with mobile numbers.
527
+
528
+ • Returns:
529
+ - pd.DataFrame
530
+
531
+ • Example:
532
+ ```
533
+ from rgwfuncs import filter_indian_mobiles
534
+ import pandas as pd
535
+
536
+ df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})
537
+ df_indian = filter_indian_mobiles(df, 'Phone')
538
+ print(df_indian)
539
+ ```
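The text does not define what counts as a valid Indian mobile number. The sketch below assumes one common convention, exactly ten digits with the first digit between 6 and 9, and applies it to the sample values. Both the rule and the helper `looks_like_indian_mobile` are assumptions for illustration; the library may validate differently.

```python
import re

# Assumption: exactly 10 digits, the first one in the range 6-9.
MOBILE_PATTERN = re.compile(r"[6-9][0-9]{9}")


def looks_like_indian_mobile(value):
    """Hypothetical check based on the assumed pattern above."""
    return MOBILE_PATTERN.fullmatch(str(value)) is not None


phones = ['9876543210', '12345', '7000012345']
valid = [phone for phone in phones if looks_like_indian_mobile(phone)]
print(valid)
```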
540
+
541
+ --------------------------------------------------------------------------------
542
+
543
+ ### 23. `print_dataframe`
544
+ Print the entire DataFrame and its column types. Optionally print a source path.
545
+
546
+ • Parameters:
547
+ - df (pd.DataFrame)
548
+ - source (str, optional)
549
+
550
+ • Example:
551
+ ```
552
+ from rgwfuncs import print_dataframe
553
+ import pandas as pd
554
+
555
+ df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
556
+ print_dataframe(df, source='SampleData.csv')
557
+ ```
558
+
559
+ --------------------------------------------------------------------------------
560
+
561
+ ### 24. `send_dataframe_via_telegram`
562
+ Send a DataFrame via Telegram using a specified bot configuration.
563
+
564
+ • Parameters:
565
+ - df (pd.DataFrame)
566
+ - `bot_name` (str)
567
+ - message (str)
568
+ - `as_file` (bool)
569
+ - `remove_after_send` (bool)
570
+
571
+ • Example:
572
+ ```
573
+ from rgwfuncs import send_dataframe_via_telegram
574
+
575
+ # Suppose your bot config is in "rgwml.config" under [TelegramBots] section
576
+ df = ... # Some DataFrame
577
+ send_dataframe_via_telegram(
578
+ df,
579
+ bot_name='MyTelegramBot',
580
+ message='Hello from RGWFuncs!',
581
+ as_file=True,
582
+ remove_after_send=True
583
+ )
584
+ ```
585
+
586
+ --------------------------------------------------------------------------------
587
+
588
+ ### 25. `send_data_to_email`
589
+ Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.
590
+
591
+ • Parameters:
592
+ - df (pd.DataFrame)
593
+ - `preset_name` (str)
594
+ - `to_email` (str)
595
+ - subject (str, optional)
596
+ - body (str, optional)
597
+ - `as_file` (bool)
598
+ - `remove_after_send` (bool)
599
+
600
+ • Example:
601
+ ```
602
+ from rgwfuncs import send_data_to_email
603
+
604
+ df = ... # Some DataFrame
605
+ send_data_to_email(
606
+ df,
607
+ preset_name='MyEmailPreset',
608
+ to_email='recipient@example.com',
609
+ subject='Hello from RGWFuncs',
610
+ body='Here is the data you requested.',
611
+ as_file=True,
612
+ remove_after_send=True
613
+ )
614
+ ```
615
+
616
+ --------------------------------------------------------------------------------
617
+
618
+ ### 26. `send_data_to_slack`
619
+ Send a DataFrame or message to Slack using a specified bot configuration.
620
+
621
+ • Parameters:
622
+ - df (pd.DataFrame)
623
+ - `bot_name` (str)
624
+ - message (str)
625
+ - `as_file` (bool)
626
+ - `remove_after_send` (bool)
627
+
628
+ • Example:
629
+ ```
630
+ from rgwfuncs import send_data_to_slack
631
+
632
+ df = ... # Some DataFrame
633
+ send_data_to_slack(
634
+ df,
635
+ bot_name='MySlackBot',
636
+ message='Hello Slack!',
637
+ as_file=True,
638
+ remove_after_send=True
639
+ )
640
+ ```
641
+
642
+ --------------------------------------------------------------------------------
643
+
644
+ ### 27. `order_columns`
645
+ Reorder the columns of a DataFrame based on a string input.
646
+
647
+ • Parameters:
648
+ - df (pd.DataFrame)
649
+ - `column_order_str` (str): Comma-separated column order.
650
+
651
+ • Returns:
652
+ - pd.DataFrame
653
+
654
+ • Example:
655
+ ```
656
+ from rgwfuncs import order_columns
657
+ import pandas as pd
658
+
659
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25], 'Salary': [1000, 1200]})
660
+ df_reordered = order_columns(df, 'Salary,Name,Age')
661
+ print(df_reordered)
662
+ ```
663
+
664
+ --------------------------------------------------------------------------------
665
+
666
+ ### 28. `append_ranged_classification_column`
667
+ Append a ranged classification column to the DataFrame.
668
+
669
+ • Parameters:
670
+ - df (pd.DataFrame)
671
+ - ranges (str): Ranges separated by commas (e.g., "0-10,11-20,21-30").
672
+ - `target_col` (str): The column to classify.
673
+ - `new_col_name` (str): Name of the new classification column.
674
+
675
+ • Returns:
676
+ - pd.DataFrame
677
+
678
+ • Example:
679
+ ```
680
+ from rgwfuncs import append_ranged_classification_column
681
+ import pandas as pd
682
+
683
+ df = pd.DataFrame({'Scores': [5, 12, 25]})
684
+ df_classified = append_ranged_classification_column(df, '0-10,11-20,21-30', 'Scores', 'ScoreRange')
685
+ print(df_classified)
686
+ ```
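To show how a range string such as `"0-10,11-20,21-30"` can be read, here is a small pure-Python sketch. It assumes non-negative, inclusive bounds and uses the range text itself as the label; the helpers `parse_ranges` and `classify` are hypothetical, and the labels rgwfuncs actually writes may look different.

```python
def parse_ranges(ranges):
    """Turn "0-10,11-20" into [(0.0, 10.0, "0-10"), (11.0, 20.0, "11-20")]."""
    parsed = []
    for label in ranges.split(","):
        label = label.strip()
        low, high = label.split("-")  # assumes non-negative bounds
        parsed.append((float(low), float(high), label))
    return parsed


def classify(value, parsed_ranges):
    """Return the label of the first range containing value, else None."""
    for low, high, label in parsed_ranges:
        if low <= value <= high:  # assumption: both ends are inclusive
            return label
    return None  # value falls outside every range


parsed = parse_ranges("0-10,11-20,21-30")
labels = [classify(score, parsed) for score in [5, 12, 25]]
print(labels)
```

Note that with integer-style ranges like these, a value such as 10.5 falls in the gap between two ranges and gets no label in this sketch.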
687
+
688
+ --------------------------------------------------------------------------------
689
+
690
+ ### 29. `append_percentile_classification_column`
691
+ Append a percentile classification column to the DataFrame.
692
+
693
+ • Parameters:
694
+ - df (pd.DataFrame)
695
+ - percentiles (str): Percentile values separated by commas (e.g., "25,50,75").
696
+ - `target_col` (str)
697
+ - `new_col_name` (str)
698
+
699
+ • Returns:
700
+ - pd.DataFrame
701
+
702
+ • Example:
703
+ ```
704
+ from rgwfuncs import append_percentile_classification_column
705
+ import pandas as pd
706
+
707
+ df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
708
+ df_classified = append_percentile_classification_column(df, '25,50,75', 'Values', 'ValuePercentile')
709
+ print(df_classified)
710
+ ```
711
+
712
+ --------------------------------------------------------------------------------
713
+
714
+ ### 30. `append_ranged_date_classification_column`
715
+ Append a ranged date classification column to the DataFrame.
716
+
717
+ • Parameters:
718
+ - df (pd.DataFrame)
719
+ - `date_ranges` (str): Date ranges separated by commas, e.g., `2020-01-01_2020-06-30,2020-07-01_2020-12-31`
720
+ - `target_col` (str)
721
+ - `new_col_name` (str)
722
+
723
+ • Returns:
724
+ - pd.DataFrame
725
+
726
+ • Example:
727
+ ```
728
+ from rgwfuncs import append_ranged_date_classification_column
729
+ import pandas as pd
730
+
731
+ df = pd.DataFrame({'EventDate': pd.to_datetime(['2020-03-15','2020-08-10'])})
732
+ df_classified = append_ranged_date_classification_column(
733
+ df,
734
+ '2020-01-01_2020-06-30,2020-07-01_2020-12-31',
735
+ 'EventDate',
736
+ 'DateRange'
737
+ )
738
+ print(df_classified)
739
+ ```
740
+
741
+ --------------------------------------------------------------------------------
742
+
743
+ ### 31. `rename_columns`
744
+ Rename columns in the DataFrame.
745
+
746
+ • Parameters:
747
+ - df (pd.DataFrame)
748
+ - `rename_pairs` (dict): Mapping old column names to new ones.
749
+
750
+ • Returns:
751
+ - pd.DataFrame
752
+
753
+ • Example:
754
+ ```
755
+ from rgwfuncs import rename_columns
756
+ import pandas as pd
757
+
758
+ df = pd.DataFrame({'OldName': [1,2,3]})
759
+ df_renamed = rename_columns(df, {'OldName': 'NewName'})
760
+ print(df_renamed)
761
+ ```
762
+
763
+ --------------------------------------------------------------------------------
764
+
765
+ ### 32. `cascade_sort`
766
+ Cascade sort the DataFrame by specified columns and order.
767
+
768
+ • Parameters:
769
+ - df (pd.DataFrame)
770
+ - columns (list): e.g. ["Column1::ASC", "Column2::DESC"].
771
+
772
+ • Returns:
773
+ - pd.DataFrame
774
+
775
+ • Example:
776
+ ```
777
+ from rgwfuncs import cascade_sort
778
+ import pandas as pd
779
+
780
+ df = pd.DataFrame({
781
+ 'Name': ['Charlie', 'Alice', 'Bob'],
782
+ 'Age': [25, 30, 22]
783
+ })
784
+
785
+ sorted_df = cascade_sort(df, ["Name::ASC", "Age::DESC"])
786
+ print(sorted_df)
787
+ ```
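A cascade sort only shows its effect when the first column has ties. The sketch below parses `"Column::ASC"` style specs and sorts plain dictionaries, relying on the stability of Python's sort. The helper `cascade_sort_rows` is hypothetical and is not the library's implementation.

```python
def cascade_sort_rows(rows, columns):
    """Hypothetical helper: sort dicts by specs like "Name::ASC"."""
    result = list(rows)
    # Apply the lowest-priority spec first; because list.sort is stable,
    # specs listed earlier end up taking precedence.
    for spec in reversed(columns):
        name, direction = spec.split("::")
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"Unknown sort direction: {direction}")
        result.sort(key=lambda row: row[name], reverse=(direction == "DESC"))
    return result


rows = [
    {"Name": "Bob", "Age": 22},
    {"Name": "Alice", "Age": 25},
    {"Name": "Alice", "Age": 30},
]
sorted_rows = cascade_sort_rows(rows, ["Name::ASC", "Age::DESC"])
print(sorted_rows)
```

Here the two `Alice` rows tie on `Name`, so the second spec decides their order and the older Alice comes first.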
788
+
789
+ --------------------------------------------------------------------------------
790
+
791
+ ### 33. `append_xgb_labels`
792
+ Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
793
+
794
+ • Parameters:
795
+ - df (pd.DataFrame)
796
+ - `ratio_str` (str): e.g. "8:2", "7:2:1".
797
+
798
+ • Returns:
799
+ - pd.DataFrame
800
+
801
+ • Example:
802
+ ```
803
+ from rgwfuncs import append_xgb_labels
804
+ import pandas as pd
805
+
806
+ df = pd.DataFrame({'A': range(10)})
807
+ df_labeled = append_xgb_labels(df, "7:2:1")
808
+ print(df_labeled)
809
+ ```
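The ratio string determines how many rows receive each label. The sketch below works out those counts, assuming a two-part ratio maps to TRAIN and TEST and a three-part ratio to TRAIN, VALIDATE, and TEST. The helper `split_counts` is hypothetical, and whether the library assigns labels in row order or at random is not stated in the text.

```python
def split_counts(ratio_str, n_rows):
    """Hypothetical helper: number of rows per label for a ratio like "7:2:1"."""
    parts = [int(part) for part in ratio_str.split(":")]
    if len(parts) == 2:
        labels = ["TRAIN", "TEST"]  # assumption for two-part ratios
    elif len(parts) == 3:
        labels = ["TRAIN", "VALIDATE", "TEST"]
    else:
        raise ValueError("expected a ratio with two or three parts")
    total = sum(parts)
    counts = [n_rows * part // total for part in parts]
    # Give any rounding remainder to the last label so counts sum to n_rows.
    counts[-1] = n_rows - sum(counts[:-1])
    return dict(zip(labels, counts))


three_way = split_counts("7:2:1", 10)
two_way = split_counts("8:2", 10)
print(three_way, two_way)
```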
810
+
811
+ --------------------------------------------------------------------------------
812
+
813
+ ### 34. `append_xgb_regression_predictions`
814
+ Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
815
+
816
+ • Parameters:
817
+ - df (pd.DataFrame)
818
+ - `target_col` (str)
819
+ - `feature_cols` (str): Comma-separated feature columns.
820
+ - `pred_col` (str)
821
+ - `boosting_rounds` (int, optional)
822
+ - `model_path` (str, optional)
823
+
824
+ • Returns:
825
+ - pd.DataFrame
826
+
827
+ • Example:
828
+ ```
829
+ from rgwfuncs import append_xgb_regression_predictions
830
+ import pandas as pd
831
+
832
+ df = pd.DataFrame({
833
+ 'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
834
+ 'Feature1': [1.2, 2.3, 3.4, 4.5],
835
+ 'Feature2': [5.6, 6.7, 7.8, 8.9],
836
+ 'Target': [10, 20, 30, 40]
837
+ })
838
+
839
+ df_pred = append_xgb_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
840
+ print(df_pred)
841
+ ```
842
+
843
+ --------------------------------------------------------------------------------
844
+
845
+ ### 35. `append_xgb_logistic_regression_predictions`
846
+ Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
847
+
848
+ • Parameters:
849
+ - df (pd.DataFrame)
850
+ - `target_col` (str)
851
+ - `feature_cols` (str)
852
+ - `pred_col` (str)
853
+ - `boosting_rounds` (int, optional)
854
+ - `model_path` (str, optional)
855
+
856
+ • Returns:
857
+ - pd.DataFrame
858
+
859
+ • Example:
860
+ ```
861
+ from rgwfuncs import append_xgb_logistic_regression_predictions
862
+ import pandas as pd
863
+
864
+ df = pd.DataFrame({
865
+ 'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
866
+ 'Feature1': [1, 0, 1, 0],
867
+ 'Feature2': [0.5, 0.2, 0.8, 0.1],
868
+ 'Target': [1, 0, 1, 0]
869
+ })
870
+
871
+ df_pred = append_xgb_logistic_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
872
+ print(df_pred)
873
+ ```
874
+
875
+ --------------------------------------------------------------------------------
876
+
877
+ ### 36. `print_n_frequency_cascading`
878
+ Print the cascading frequency of top n values for specified columns.
879
+
880
+ • Parameters:
881
+ - df (pd.DataFrame)
882
+ - n (int)
883
+ - columns (str): Comma-separated column names.
884
+ - `order_by` (str): `ASC`, `DESC`, `FREQ_ASC`, `FREQ_DESC`.
885
+
886
+ • Example:
887
+ ```
888
+ from rgwfuncs import print_n_frequency_cascading
889
+ import pandas as pd
890
+
891
+ df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
892
+ print_n_frequency_cascading(df, 2, 'City', 'FREQ_DESC')
893
+ ```
894
+
895
+ --------------------------------------------------------------------------------
896
+
897
+ ### 37. `print_n_frequency_linear`
898
+ Print the linear frequency of top n values for specified columns.
899
+
900
+ • Parameters:
901
+ - df (pd.DataFrame)
902
+ - n (int)
903
+ - columns (str): Comma-separated columns.
904
+ - `order_by` (str)
905
+
906
+ • Example:
907
+ ```
908
+ from rgwfuncs import print_n_frequency_linear
909
+ import pandas as pd
910
+
911
+ df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
912
+ print_n_frequency_linear(df, 2, 'City', 'FREQ_DESC')
913
+ ```
914
+
915
+ --------------------------------------------------------------------------------
916
+
917
+ ### 38. `retain_columns`
918
+ Retain specified columns in the DataFrame and drop the others.
919
+
920
+ • Parameters:
921
+ - df (pd.DataFrame)
922
+ - `columns_to_retain` (list or str)
923
+
924
+ • Returns:
925
+ - pd.DataFrame
926
+
927
+ • Example:
928
+ ```
929
+ from rgwfuncs import retain_columns
930
+ import pandas as pd
931
+
932
+ df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6]})
933
+ df_reduced = retain_columns(df, ['A','C'])
934
+ print(df_reduced)
935
+ ```
936
+
937
+ --------------------------------------------------------------------------------
938
+
939
+ ### 39. `mask_against_dataframe`
940
+ Retain only the rows of `df` whose `column_name` value also appears in `other_df`.
941
+
942
+ • Parameters:
943
+ - df (pd.DataFrame)
944
+ - `other_df` (pd.DataFrame)
945
+ - `column_name` (str)
946
+
947
+ • Returns:
948
+ - pd.DataFrame
949
+
950
+ • Example:
951
+ ```
952
+ from rgwfuncs import mask_against_dataframe
953
+ import pandas as pd
954
+
955
+ df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
956
+ df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
957
+
958
+ df_masked = mask_against_dataframe(df1, df2, 'ID')
959
+ print(df_masked)
960
+ ```
961
+
962
+ --------------------------------------------------------------------------------
963
+
964
+ ### 40. `mask_against_dataframe_converse`
965
+ Retain only the rows of `df` whose `column_name` value does not appear in `other_df`.
966
+
967
+ • Parameters:
968
+ - df (pd.DataFrame)
969
+ - `other_df` (pd.DataFrame)
970
+ - `column_name` (str)
971
+
972
+ • Returns:
973
+ - pd.DataFrame
974
+
975
+ • Example:
976
+ ```
977
+ from rgwfuncs import mask_against_dataframe_converse
978
+ import pandas as pd
979
+
980
+ df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
981
+ df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
982
+
983
+ df_uncommon = mask_against_dataframe_converse(df1, df2, 'ID')
984
+ print(df_uncommon)
985
+ ```
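The two masking functions are complements of each other. A minimal pure-Python sketch, using lists of dictionaries in place of DataFrames, makes the relationship explicit. The helper `mask_rows` and its `converse` flag are hypothetical and only illustrate the described behaviour.

```python
def mask_rows(rows, other_rows, column_name, converse=False):
    """Hypothetical helper: keep rows whose key is (or is not) present in other_rows."""
    other_values = {row[column_name] for row in other_rows}
    # With converse=False keep matches; with converse=True keep non-matches.
    return [row for row in rows if (row[column_name] in other_values) != converse]


df1_rows = [{'ID': 1, 'Value': 10}, {'ID': 2, 'Value': 20}, {'ID': 3, 'Value': 30}]
df2_rows = [{'ID': 2, 'Extra': 'X'}, {'ID': 3, 'Extra': 'Y'}, {'ID': 4, 'Extra': 'Z'}]

common = mask_rows(df1_rows, df2_rows, 'ID')
uncommon = mask_rows(df1_rows, df2_rows, 'ID', converse=True)
print(common)
print(uncommon)
```

Every row of the first input lands in exactly one of the two results.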
986
+
987
+ --------------------------------------------------------------------------------
988
+
989
+ ## Additional Info
990
+
991
+ For more information, refer to each function’s docstring by calling:
992
+ ```python
993
+ rgwfuncs.docs(method_type_filter='function_name')
994
+ ```
995
+ or display all docstrings with:
996
+ ```python
997
+ rgwfuncs.docs(method_type_filter='*')
998
+ ```
999
+
1000
+ --------------------------------------------------------------------------------
1001
+
1002
+ © 2025 Ryan Gerard Wilson. All rights reserved.
1003
+