rgwfuncs 0.0.2__py3-none-any.whl → 0.0.4__py3-none-any.whl

@@ -0,0 +1,999 @@
1
+ Metadata-Version: 2.2
2
+ Name: rgwfuncs
3
+ Version: 0.0.4
4
+ Summary: A functional programming paradigm for mathematical modelling and data science
5
+ Home-page: https://github.com/ryangerardwilson/rgwfunc
6
+ Author: Ryan Gerard Wilson
7
+ Author-email: Ryan Gerard Wilson <ryangerardwilson@gmail.com>
8
+ Project-URL: Homepage, https://github.com/ryangerardwilson/rgwfuncs
9
+ Project-URL: Issues, https://github.com/ryangerardwilson/rgwfuncs
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Operating System :: OS Independent
13
+ Requires-Python: >=3.12
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+ Requires-Dist: pandas
17
+ Requires-Dist: pymssql
18
+ Requires-Dist: mysql-connector-python
19
+ Requires-Dist: clickhouse-connect
20
+ Requires-Dist: google-cloud-bigquery
21
+ Requires-Dist: google-auth
22
+ Requires-Dist: xgboost
23
+ Requires-Dist: requests
24
+ Requires-Dist: slack-sdk
25
+ Requires-Dist: google-api-python-client
26
+
27
+ RGWML
28
+
29
+ ***By Ryan Gerard Wilson (https://ryangerardwilson.com)***
30
+
31
+ # RGWFuncs
32
+
33
+ This library provides a variety of functions for manipulating and analyzing pandas DataFrames.
34
+
35
+ --------------------------------------------------------------------------------
36
+
37
+ ## Installation
38
+
39
+ Install the package using:
40
+
41
+ pip install rgwfuncs
42
+
43
+
44
+ --------------------------------------------------------------------------------
45
+
46
+ ## Basic Usage
47
+
48
+ Import the library:
49
+
50
+ import rgwfuncs
51
+
52
+ List the available function names in alphabetical order:
53
+
54
+ rgwfuncs.docs()
55
+
56
+ To view specific docstrings, pass a comma-separated filter. For example, to display the docstring for `numeric_clean`:
57
+
58
+ rgwfuncs.docs(method_type_filter='numeric_clean')
59
+
60
+ To display all docstrings, use:
61
+
62
+ rgwfuncs.docs(method_type_filter='*')
63
+
64
+ --------------------------------------------------------------------------------
65
+
66
+ ## Function References and Syntax Examples
67
+
68
+ Below is a quick reference of available functions, their purpose, and basic usage examples.
69
+
70
+ ### 1. docs
71
+ Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.
72
+
73
+ • Parameters:
74
+ - `method_type_filter` (str): Optional. A comma-separated list selecting which docstrings to print, or '*' for all.
75
+
76
+ • Example:
77
+
78
+ import rgwfuncs
79
+ rgwfuncs.docs(method_type_filter='numeric_clean,limit_dataframe')
80
+
81
+ --------------------------------------------------------------------------------
82
+
83
+ ### 2. `numeric_clean`
84
+ Clean the numeric columns in a DataFrame according to the specified treatment.
85
+
86
+ • Parameters:
87
+ - df (pd.DataFrame): The DataFrame to clean.
88
+ - `column_names` (str): A comma-separated string containing the names of the columns to clean.
89
+ - `column_type` (str): The type to convert the column to (`INTEGER` or `FLOAT`).
90
+ - `irregular_value_treatment` (str): How to treat irregular values (`NAN`, `TO_ZERO`, `MEAN`).
91
+
92
+ • Returns:
93
+ - pd.DataFrame: A new DataFrame with cleaned numeric columns.
94
+
95
+ • Example:
96
+
97
+ from rgwfuncs import numeric_clean
98
+ import pandas as pd
99
+
100
+ # Sample DataFrame
101
+ df = pd.DataFrame({
102
+ 'col1': [1, 2, 3, 'x', 4],
103
+ 'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8]
104
+ })
105
+
106
+ # Clean numeric columns
107
+ df_cleaned = numeric_clean(df, 'col1,col2', 'FLOAT', 'MEAN')
108
+ print(df_cleaned)
109
+
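To make the three `irregular_value_treatment` options concrete, here is a rough pandas-only sketch of what such a cleaning step could look like. The helper `numeric_clean_sketch` is hypothetical and is not the rgwfuncs implementation; in particular, rounding before the `INTEGER` conversion is an assumption.

```python
import pandas as pd


def numeric_clean_sketch(df, column_names, column_type, irregular_value_treatment):
    """Hypothetical stand-in for numeric_clean; not the library's own code."""
    out = df.copy()
    for name in [c.strip() for c in column_names.split(',')]:
        # Anything that cannot be parsed as a number becomes NaN.
        col = pd.to_numeric(out[name], errors='coerce')
        if irregular_value_treatment == 'TO_ZERO':
            col = col.fillna(0)
        elif irregular_value_treatment == 'MEAN':
            col = col.fillna(col.mean())
        # 'NAN' leaves the coerced NaN values in place.
        if column_type == 'INTEGER':
            # Assumption: round first; the nullable Int64 dtype can still hold missing values.
            col = col.round().astype('Int64')
        else:
            col = col.astype(float)
        out[name] = col
    return out


df = pd.DataFrame({
    'col1': [1, 2, 3, 'x', 4],
    'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8]
})
cleaned = numeric_clean_sketch(df, 'col1,col2', 'FLOAT', 'MEAN')
print(cleaned)
```

With `MEAN`, the irregular entry in each column is replaced by the mean of the values that did parse, and the input DataFrame is left unchanged.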
110
+ --------------------------------------------------------------------------------
111
+
112
+ ### 3. `limit_dataframe`
113
+ Limit the DataFrame to a specified number of rows.
114
+
115
+ • Parameters:
116
+ - df (pd.DataFrame): The DataFrame to limit.
117
+ - `num_rows` (int): The number of rows to retain.
118
+
119
+ • Returns:
120
+ - pd.DataFrame: A new DataFrame limited to the specified number of rows.
121
+
122
+ • Example:
123
+
124
+ from rgwfuncs import limit_dataframe
125
+ import pandas as pd
126
+
127
+ df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})
128
+ df_limited = limit_dataframe(df, 5)
129
+ print(df_limited)
130
+
131
+ --------------------------------------------------------------------------------
132
+
133
+ ### 4. `from_raw_data`
134
+ Create a DataFrame from raw data.
135
+
136
+ • Parameters:
137
+ - headers (list): A list of column headers.
138
+ - data (list of lists): A two-dimensional list of data.
139
+
140
+ • Returns:
141
+ - pd.DataFrame: A DataFrame created from the raw data.
142
+
143
+ • Example:
144
+
145
+ from rgwfuncs import from_raw_data
146
+
147
+ headers = ["Name", "Age"]
148
+ data = [
149
+ ["Alice", 30],
150
+ ["Bob", 25],
151
+ ["Charlie", 35]
152
+ ]
153
+
154
+ df = from_raw_data(headers, data)
155
+ print(df)
156
+
157
+ --------------------------------------------------------------------------------
158
+
159
+ ### 5. `append_rows`
160
+ Append rows to the DataFrame.
161
+
162
+ • Parameters:
163
+ - df (pd.DataFrame): The original DataFrame.
164
+ - rows (list of lists): Each inner list represents a row to be appended.
165
+
166
+ • Returns:
167
+ - pd.DataFrame: A new DataFrame with appended rows.
168
+
169
+ • Example:
170
+
171
+ from rgwfuncs import append_rows
172
+ import pandas as pd
173
+
174
+ df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
175
+ new_rows = [
176
+ ['Bob', 25],
177
+ ['Charlie', 35]
178
+ ]
179
+ df_appended = append_rows(df, new_rows)
180
+ print(df_appended)
181
+
182
+ --------------------------------------------------------------------------------
183
+
184
+ ### 6. `append_columns`
185
+ Append new columns to the DataFrame with None values.
186
+
187
+ • Parameters:
188
+ - df (pd.DataFrame): The original DataFrame.
189
+ - `col_names` (list or comma-separated string): The names of the columns to add.
190
+
191
+ • Returns:
192
+ - pd.DataFrame: A new DataFrame with the new columns appended.
193
+
194
+ • Example:
195
+
196
+ from rgwfuncs import append_columns
197
+ import pandas as pd
198
+
199
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
200
+ df_new = append_columns(df, ['Salary', 'Department'])
201
+ print(df_new)
202
+
203
+ --------------------------------------------------------------------------------
204
+
205
+ ### 7. `update_rows`
206
+ Update specific rows in the DataFrame based on a condition.
207
+
208
+ • Parameters:
209
+ - df (pd.DataFrame): The original DataFrame.
210
+ - condition (str): A query condition to identify rows for updating.
211
+ - updates (dict): A dictionary with column names as keys and new values as values.
212
+
213
+ • Returns:
214
+ - pd.DataFrame: A new DataFrame with updated rows.
215
+
216
+ • Example:
217
+
218
+ from rgwfuncs import update_rows
219
+ import pandas as pd
220
+
221
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
222
+ df_updated = update_rows(df, "Name == 'Alice'", {'Age': 31})
223
+ print(df_updated)
224
+
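The `condition` string reads like a pandas query expression. Assuming that is the case, a conditional update can be sketched with `DataFrame.query` and `.loc`. The helper name `update_rows_sketch` is hypothetical and only illustrates the idea.

```python
import pandas as pd


def update_rows_sketch(df, condition, updates):
    """Hypothetical equivalent of update_rows built on DataFrame.query."""
    out = df.copy()
    # Index labels of the rows that satisfy the condition.
    matching = out.query(condition).index
    for column, value in updates.items():
        out.loc[matching, column] = value
    return out


df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
updated = update_rows_sketch(df, "Name == 'Alice'", {'Age': 31})
print(updated)
```

Working on a copy matches the documented behaviour of returning a new DataFrame rather than modifying the input.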
225
+ --------------------------------------------------------------------------------
226
+
227
+ ### 8. `delete_rows`
228
+ Delete rows from the DataFrame based on a condition.
229
+
230
+ • Parameters:
231
+ - df (pd.DataFrame): The original DataFrame.
232
+ - condition (str): A query condition to identify rows for deletion.
233
+
234
+ • Returns:
235
+ - pd.DataFrame: The DataFrame with specified rows deleted.
236
+
237
+ • Example:
238
+
239
+ from rgwfuncs import delete_rows
240
+ import pandas as pd
241
+
242
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
243
+ df_deleted = delete_rows(df, "Age < 28")
244
+ print(df_deleted)
245
+
246
+ --------------------------------------------------------------------------------
247
+
248
+ ### 9. `drop_duplicates`
249
+ Drop duplicate rows in the DataFrame, retaining the first occurrence.
250
+
251
+ • Parameters:
252
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
253
+
254
+ • Returns:
255
+ - pd.DataFrame: A new DataFrame with duplicates removed.
256
+
257
+ • Example:
258
+
259
+ from rgwfuncs import drop_duplicates
260
+ import pandas as pd
261
+
262
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
263
+ df_no_dupes = drop_duplicates(df)
264
+ print(df_no_dupes)
265
+
266
+ --------------------------------------------------------------------------------
267
+
268
+ ### 10. `drop_duplicates_retain_first`
269
+ Drop duplicate rows based on specified columns, retaining the first occurrence.
270
+
271
+ • Parameters:
272
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
273
+ - columns (str): Comma-separated string with column names used to identify duplicates.
274
+
275
+ • Returns:
276
+ - pd.DataFrame: A new DataFrame with duplicates removed.
277
+
278
+ • Example:
279
+
280
+ from rgwfuncs import drop_duplicates_retain_first
281
+ import pandas as pd
282
+
283
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
284
+ df_no_dupes = drop_duplicates_retain_first(df, 'A')
285
+ print(df_no_dupes)
286
+
287
+ --------------------------------------------------------------------------------
288
+
289
+ ### 11. `drop_duplicates_retain_last`
290
+ Drop duplicate rows based on specified columns, retaining the last occurrence.
291
+
292
+ • Parameters:
293
+ - df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
294
+ - columns (str): Comma-separated string with column names used to identify duplicates.
295
+
296
+ • Returns:
297
+ - pd.DataFrame: A new DataFrame with duplicates removed.
298
+
299
+ • Example:
300
+
301
+ from rgwfuncs import drop_duplicates_retain_last
302
+ import pandas as pd
303
+
304
+ df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
305
+ df_no_dupes = drop_duplicates_retain_last(df, 'A')
306
+ print(df_no_dupes)
307
+
308
+
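In the sample data above every duplicate row is identical, so retaining the first or the last occurrence gives the same values. The difference shows up when the non-key columns differ. The following sketch uses plain pandas `drop_duplicates` with `keep='first'` and `keep='last'`, on the assumption that the two rgwfuncs functions behave the same way.

```python
import pandas as pd

# Column B differs within each duplicate group of column A.
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 5, 4, 6]})

# Parse the comma-separated column string into a list.
subset = [c.strip() for c in 'A'.split(',')]

first = df.drop_duplicates(subset=subset, keep='first')
last = df.drop_duplicates(subset=subset, keep='last')

print(first)
print(last)
```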
309
+ --------------------------------------------------------------------------------
310
+
311
+ ### 12. `load_data_from_query`
312
+ Load data from a database query into a DataFrame based on a configuration preset.
313
+
314
+ • Parameters:
315
+ - `db_preset_name` (str): Name of the database preset in the config file.
316
+ - query (str): The SQL query to execute.
317
+ - `config_file_name` (str): Name of the configuration file (default: "rgwml.config").
318
+
319
+ • Returns:
320
+ - pd.DataFrame: A DataFrame containing the query result.
321
+
322
+ • Example:
323
+
324
+ from rgwfuncs import load_data_from_query
325
+
326
+ df = load_data_from_query(
327
+ db_preset_name="MyDBPreset",
328
+ query="SELECT * FROM my_table",
329
+ config_file_name="rgwml.config"
330
+ )
331
+ print(df)
332
+
333
+
334
+ --------------------------------------------------------------------------------
335
+
336
+ ### 13. `load_data_from_path`
337
+ Load data from a file into a DataFrame based on the file extension.
338
+
339
+ • Parameters:
340
+ - `file_path` (str): The absolute path to the data file.
341
+
342
+ • Returns:
343
+ - pd.DataFrame: A DataFrame containing the loaded data.
344
+
345
+ • Example:
346
+
347
+ from rgwfuncs import load_data_from_path
348
+
349
+ df = load_data_from_path("/absolute/path/to/data.csv")
350
+ print(df)
351
+
352
+
353
+ --------------------------------------------------------------------------------
354
+
355
+ ### 14. `load_data_from_sqlite_path`
356
+ Execute a query on a SQLite database file and return the results as a DataFrame.
357
+
358
+ • Parameters:
359
+ - `sqlite_path` (str): The absolute path to the SQLite database file.
360
+ - query (str): The SQL query to execute.
361
+
362
+ • Returns:
363
+ - pd.DataFrame: A DataFrame containing the query results.
364
+
365
+ • Example:
366
+
367
+ from rgwfuncs import load_data_from_sqlite_path
368
+
369
+ df = load_data_from_sqlite_path("/path/to/database.db", "SELECT * FROM my_table")
370
+ print(df)
371
+
372
+
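For readers without a database file at hand, the same idea can be tried with the standard-library `sqlite3` module and `pandas.read_sql_query`. This is a sketch using an in-memory database and does not show how rgwfuncs opens the file internally; the table and column names are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database with a small, made-up table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE my_table (name TEXT, age INTEGER)')
conn.executemany(
    'INSERT INTO my_table VALUES (?, ?)',
    [('Alice', 30), ('Bob', 25)]
)
conn.commit()

# Run the query and collect the result as a DataFrame.
df = pd.read_sql_query('SELECT * FROM my_table ORDER BY age', conn)
conn.close()
print(df)
```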
373
+ --------------------------------------------------------------------------------
374
+
375
+ ### 15. `first_n_rows`
376
+ Print the first n rows of the DataFrame in dictionary format.
377
+
378
+ • Parameters:
379
+ - df (pd.DataFrame)
380
+ - n (int): Number of rows to display.
381
+
382
+ • Example:
383
+
384
+ from rgwfuncs import first_n_rows
385
+ import pandas as pd
386
+
387
+ df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6]})
388
+ first_n_rows(df, 2)
389
+
390
+
391
+ --------------------------------------------------------------------------------
392
+
393
+ ### 16. `last_n_rows`
394
+ Print the last n rows of the DataFrame in dictionary format.
395
+
396
+ • Parameters:
397
+ - df (pd.DataFrame)
398
+ - n (int): Number of rows to display.
399
+
400
+ • Example:
401
+
402
+ from rgwfuncs import last_n_rows
403
+ import pandas as pd
404
+
405
+ df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [6,7,8,9,10]})
406
+ last_n_rows(df, 2)
407
+
408
+
409
+ --------------------------------------------------------------------------------
410
+
411
+ ### 17. `top_n_unique_values`
412
+ Print the top n unique values for specified columns in the DataFrame.
413
+
414
+ • Parameters:
415
+ - df (pd.DataFrame): The DataFrame to evaluate.
416
+ - n (int): Number of top values to display.
417
+ - columns (list): List of columns for which to display top unique values.
418
+
419
+ • Example:
420
+
421
+ from rgwfuncs import top_n_unique_values
422
+ import pandas as pd
423
+
424
+ df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
425
+ top_n_unique_values(df, 2, ['Cities'])
426
+
427
+
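Assuming "top" means most frequent and "bottom" means least frequent, the counts behind this function and its `bottom_n_unique_values` counterpart can be reproduced with `Series.value_counts`. This is only a sketch; the printed layout of the rgwfuncs functions may differ.

```python
import pandas as pd

df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})

# value_counts sorts by frequency, most frequent first.
counts = df['Cities'].value_counts()

top_2 = counts.head(2)      # two most frequent values
bottom_1 = counts.tail(1)   # least frequent value

print(top_2)
print(bottom_1)
```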
428
+ --------------------------------------------------------------------------------
429
+
430
+ ### 18. `bottom_n_unique_values`
431
+ Print the bottom n unique values for specified columns in the DataFrame.
432
+
433
+ • Parameters:
434
+ - df (pd.DataFrame)
435
+ - n (int)
436
+ - columns (list)
437
+
438
+ • Example:
439
+
440
+ from rgwfuncs import bottom_n_unique_values
441
+ import pandas as pd
442
+
443
+ df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
444
+ bottom_n_unique_values(df, 1, ['Cities'])
445
+
446
+
447
+ --------------------------------------------------------------------------------
448
+
449
+ ### 19. `print_correlation`
450
+ Print correlation for multiple pairs of columns in the DataFrame.
451
+
452
+ • Parameters:
453
+ - df (pd.DataFrame)
454
+ - `column_pairs` (list of tuples): E.g., `[('col1','col2'), ('colA','colB')]`.
455
+
456
+ • Example:
457
+
458
+ from rgwfuncs import print_correlation
459
+ import pandas as pd
460
+
461
+ df = pd.DataFrame({
462
+ 'col1': [1,2,3,4,5],
463
+ 'col2': [2,4,6,8,10],
464
+ 'colA': [10,9,8,7,6],
465
+ 'colB': [5,4,3,2,1]
466
+ })
467
+
468
+ pairs = [('col1','col2'), ('colA','colB')]
469
+ print_correlation(df, pairs)
470
+
471
+
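The per-pair values can be cross-checked with `Series.corr`, which computes the Pearson coefficient by default. Whether `print_correlation` uses the same method is an assumption here. The extra pair `('col1', 'colA')` is added only to show a negative correlation.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [2, 4, 6, 8, 10],
    'colA': [10, 9, 8, 7, 6],
    'colB': [5, 4, 3, 2, 1]
})

pairs = [('col1', 'col2'), ('colA', 'colB'), ('col1', 'colA')]

# Pearson correlation for each requested pair of columns.
results = {(a, b): df[a].corr(df[b]) for a, b in pairs}

for (a, b), r in results.items():
    print(f"{a} vs {b}: {r:.4f}")
```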
472
+ --------------------------------------------------------------------------------
473
+
474
+ ### 20. `print_memory_usage`
475
+ Print the memory usage of the DataFrame in megabytes.
476
+
477
+ • Parameters:
478
+ - df (pd.DataFrame)
479
+
480
+ • Example:
481
+
482
+ from rgwfuncs import print_memory_usage
483
+ import pandas as pd
484
+
485
+ df = pd.DataFrame({'A': range(1000)})
486
+ print_memory_usage(df)
487
+
488
+
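A comparable figure can be computed directly with `DataFrame.memory_usage`. The sketch below assumes deep introspection and a divisor of 1024², so the number may differ slightly from what `print_memory_usage` reports.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1000)})

# Per-column byte counts (index included), summed and converted to MB.
total_bytes = df.memory_usage(deep=True).sum()
megabytes = total_bytes / (1024 ** 2)

print(f"Memory usage: {megabytes:.4f} MB")
```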
489
+ --------------------------------------------------------------------------------
490
+
491
+ ### 21. `filter_dataframe`
492
+ Return a new DataFrame filtered by a given query expression.
493
+
494
+ • Parameters:
495
+ - df (pd.DataFrame)
496
+ - `filter_expr` (str)
497
+
498
+ • Returns:
499
+ - pd.DataFrame
500
+
501
+ • Example:
502
+
503
+ from rgwfuncs import filter_dataframe
504
+ import pandas as pd
505
+
506
+ df = pd.DataFrame({
507
+ 'Name': ['Alice', 'Bob', 'Charlie'],
508
+ 'Age': [30, 20, 25]
509
+ })
510
+
511
+ df_filtered = filter_dataframe(df, "Age > 23")
512
+ print(df_filtered)
513
+
514
+
515
+ --------------------------------------------------------------------------------
516
+
517
+ ### 22. `filter_indian_mobiles`
518
+ Filter and return rows containing valid Indian mobile numbers in the specified column.
519
+
520
+ • Parameters:
521
+ - df (pd.DataFrame)
522
+ - `mobile_col` (str): The column name with mobile numbers.
523
+
524
+ • Returns:
525
+ - pd.DataFrame
526
+
527
+ • Example:
528
+
529
+ from rgwfuncs import filter_indian_mobiles
530
+ import pandas as pd
531
+
532
+ df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})
533
+ df_indian = filter_indian_mobiles(df, 'Phone')
534
+ print(df_indian)
535
+
536
+
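The README does not spell out what counts as a valid number. One common rule, assumed here purely for illustration, is a 10-digit string whose first digit is 6 to 9. Under that assumption the filter can be sketched with a regular expression.

```python
import pandas as pd

df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})

# Assumed rule: exactly 10 digits, starting with 6, 7, 8 or 9.
is_valid = df['Phone'].astype(str).str.fullmatch(r'[6-9]\d{9}')
valid = df[is_valid]

print(valid)
```

The actual validation in `filter_indian_mobiles` may be stricter or looser, for example in how it treats country-code prefixes.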
537
+ --------------------------------------------------------------------------------
538
+
539
+ ### 23. `print_dataframe`
540
+ Print the entire DataFrame and its column types. Optionally print a source path.
541
+
542
+ • Parameters:
543
+ - df (pd.DataFrame)
544
+ - source (str, optional)
545
+
546
+ • Example:
547
+
548
+ from rgwfuncs import print_dataframe
549
+ import pandas as pd
550
+
551
+ df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
552
+ print_dataframe(df, source='SampleData.csv')
553
+
554
+
555
+ --------------------------------------------------------------------------------
556
+
557
+ ### 24. `send_dataframe_via_telegram`
558
+ Send a DataFrame via Telegram using a specified bot configuration.
559
+
560
+ • Parameters:
561
+ - df (pd.DataFrame)
562
+ - `bot_name` (str)
563
+ - message (str)
564
+ - `as_file` (bool)
565
+ - `remove_after_send` (bool)
566
+
567
+ • Example:
568
+
569
+ from rgwfuncs import send_dataframe_via_telegram
570
+ import pandas as pd
571
+ # Suppose your bot config is in "rgwml.config" under [TelegramBots] section
572
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})  # Any DataFrame
573
+ send_dataframe_via_telegram(
574
+ df,
575
+ bot_name='MyTelegramBot',
576
+ message='Hello from RGWFuncs!',
577
+ as_file=True,
578
+ remove_after_send=True
579
+ )
580
+
581
+
582
+ --------------------------------------------------------------------------------
583
+
584
+ ### 25. `send_data_to_email`
585
+ Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.
586
+
587
+ • Parameters:
588
+ - df (pd.DataFrame)
589
+ - `preset_name` (str)
590
+ - `to_email` (str)
591
+ - subject (str, optional)
592
+ - body (str, optional)
593
+ - `as_file` (bool)
594
+ - `remove_after_send` (bool)
595
+
596
+ • Example:
597
+
598
+ from rgwfuncs import send_data_to_email
599
+ import pandas as pd
600
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})  # Any DataFrame
601
+ send_data_to_email(
602
+ df,
603
+ preset_name='MyEmailPreset',
604
+ to_email='recipient@example.com',
605
+ subject='Hello from RGWFuncs',
606
+ body='Here is the data you requested.',
607
+ as_file=True,
608
+ remove_after_send=True
609
+ )
610
+
611
+
612
+ --------------------------------------------------------------------------------
613
+
614
+ ### 26. `send_data_to_slack`
615
+ Send a DataFrame or message to Slack using a specified bot configuration.
616
+
617
+ • Parameters:
618
+ - df (pd.DataFrame)
619
+ - `bot_name` (str)
620
+ - message (str)
621
+ - `as_file` (bool)
622
+ - `remove_after_send` (bool)
623
+
624
+ • Example:
625
+
626
+ from rgwfuncs import send_data_to_slack
627
+ import pandas as pd
628
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})  # Any DataFrame
629
+ send_data_to_slack(
630
+ df,
631
+ bot_name='MySlackBot',
632
+ message='Hello Slack!',
633
+ as_file=True,
634
+ remove_after_send=True
635
+ )
636
+
637
+
638
+ --------------------------------------------------------------------------------
639
+
640
+ ### 27. `order_columns`
641
+ Reorder the columns of a DataFrame based on a string input.
642
+
643
+ • Parameters:
644
+ - df (pd.DataFrame)
645
+ - `column_order_str` (str): Comma-separated column order.
646
+
647
+ • Returns:
648
+ - pd.DataFrame
649
+
650
+ • Example:
651
+
652
+ from rgwfuncs import order_columns
653
+ import pandas as pd
654
+
655
+ df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25], 'Salary': [1000, 1200]})
656
+ df_reordered = order_columns(df, 'Salary,Name,Age')
657
+ print(df_reordered)
658
+
659
+
660
+ --------------------------------------------------------------------------------
661
+
662
+ ### 28. `append_ranged_classification_column`
663
+ Append a ranged classification column to the DataFrame.
664
+
665
+ • Parameters:
666
+ - df (pd.DataFrame)
667
+ - ranges (str): Ranges separated by commas (e.g., "0-10,11-20,21-30").
668
+ - `target_col` (str): The column to classify.
669
+ - `new_col_name` (str): Name of the new classification column.
670
+
671
+ • Returns:
672
+ - pd.DataFrame
673
+
674
+ • Example:
675
+
676
+ from rgwfuncs import append_ranged_classification_column
677
+ import pandas as pd
678
+
679
+ df = pd.DataFrame({'Scores': [5, 12, 25]})
680
+ df_classified = append_ranged_classification_column(df, '0-10,11-20,21-30', 'Scores', 'ScoreRange')
681
+ print(df_classified)
682
+
683
+
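The `ranges` string packs several inclusive intervals into one argument. A minimal sketch of how such a string might be parsed and applied follows. The helper `classify_by_range` is hypothetical, and using the range text itself as the label is an assumption about the output format.

```python
import pandas as pd


def classify_by_range(value, ranges):
    """Return the first 'low-high' range containing value, or None."""
    for label in ranges.split(','):
        # Note: this simple split does not handle negative bounds.
        low, high = (float(part) for part in label.split('-'))
        if low <= value <= high:
            return label
    return None


df = pd.DataFrame({'Scores': [5, 12, 25]})
df['ScoreRange'] = df['Scores'].apply(
    lambda v: classify_by_range(v, '0-10,11-20,21-30')
)
print(df)
```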
684
+ --------------------------------------------------------------------------------
685
+
686
+ ### 29. `append_percentile_classification_column`
687
+ Append a percentile classification column to the DataFrame.
688
+
689
+ • Parameters:
690
+ - df (pd.DataFrame)
691
+ - percentiles (str): Percentile values separated by commas (e.g., "25,50,75").
692
+ - `target_col` (str)
693
+ - `new_col_name` (str)
694
+
695
+ • Returns:
696
+ - pd.DataFrame
697
+
698
+ • Example:
699
+
700
+ from rgwfuncs import append_percentile_classification_column
701
+ import pandas as pd
702
+
703
+ df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
704
+ df_classified = append_percentile_classification_column(df, '25,50,75', 'Values', 'ValuePercentile')
705
+ print(df_classified)
706
+
707
+
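One way to picture percentile classification is to turn the percentile list into value cutoffs with `Series.quantile` and then bin the column with `pd.cut`. The label format (`0-25`, `25-50`, ...) and the right-inclusive bins are assumptions in this sketch and may not match what rgwfuncs produces.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

percentiles = [float(p) for p in '25,50,75'.split(',')]

# Value cutoffs at each requested percentile.
cutoffs = [df['Values'].quantile(p / 100) for p in percentiles]
edges = [float('-inf')] + cutoffs + [float('inf')]

# Assumed label format: "lower-upper" percentile bounds.
bounds = [0.0] + percentiles + [100.0]
labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(bounds[:-1], bounds[1:])]

df['ValuePercentile'] = pd.cut(df['Values'], bins=edges, labels=labels).astype(str)
print(df)
```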
708
+ --------------------------------------------------------------------------------
709
+
710
+ ### 30. `append_ranged_date_classification_column`
711
+ Append a ranged date classification column to the DataFrame.
712
+
713
+ • Parameters:
714
+ - df (pd.DataFrame)
715
+ - `date_ranges` (str): Date ranges separated by commas, e.g., `2020-01-01_2020-06-30,2020-07-01_2020-12-31`
716
+ - `target_col` (str)
717
+ - `new_col_name` (str)
718
+
719
+ • Returns:
720
+ - pd.DataFrame
721
+
722
+ • Example:
723
+
724
+ from rgwfuncs import append_ranged_date_classification_column
725
+ import pandas as pd
726
+
727
+ df = pd.DataFrame({'EventDate': pd.to_datetime(['2020-03-15','2020-08-10'])})
728
+ df_classified = append_ranged_date_classification_column(
729
+ df,
730
+ '2020-01-01_2020-06-30,2020-07-01_2020-12-31',
731
+ 'EventDate',
732
+ 'DateRange'
733
+ )
734
+ print(df_classified)
735
+
736
+
737
+ --------------------------------------------------------------------------------
738
+
739
+ ### 31. `rename_columns`
740
+ Rename columns in the DataFrame.
741
+
742
+ • Parameters:
743
+ - df (pd.DataFrame)
744
+ - `rename_pairs` (dict): Mapping old column names to new ones.
745
+
746
+ • Returns:
747
+ - pd.DataFrame
748
+
749
+ • Example:
750
+
751
+ from rgwfuncs import rename_columns
752
+ import pandas as pd
753
+
754
+ df = pd.DataFrame({'OldName': [1,2,3]})
755
+ df_renamed = rename_columns(df, {'OldName': 'NewName'})
756
+ print(df_renamed)
757
+
758
+
759
+ --------------------------------------------------------------------------------
760
+
761
+ ### 32. `cascade_sort`
762
+ Cascade sort the DataFrame by specified columns and order.
763
+
764
+ • Parameters:
765
+ - df (pd.DataFrame)
766
+ - columns (list): e.g. ["Column1::ASC", "Column2::DESC"].
767
+
768
+ • Returns:
769
+ - pd.DataFrame
770
+
771
+ • Example:
772
+
773
+ from rgwfuncs import cascade_sort
774
+ import pandas as pd
775
+
776
+ df = pd.DataFrame({
777
+ 'Name': ['Charlie', 'Alice', 'Bob'],
778
+ 'Age': [25, 30, 22]
779
+ })
780
+
781
+ sorted_df = cascade_sort(df, ["Name::ASC", "Age::DESC"])
782
+ print(sorted_df)
783
+
784
+
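In the example above all names are unique, so the second sort key never takes effect. The sketch below parses the `Column::ORDER` strings and passes them to `DataFrame.sort_values`, using data with a tie on `Name` so the cascade is visible. `cascade_sort_sketch` is a hypothetical helper, and defaulting to ascending when no order is given is an assumption.

```python
import pandas as pd


def cascade_sort_sketch(df, columns):
    """Hypothetical equivalent of cascade_sort built on sort_values."""
    names, ascending = [], []
    for spec in columns:
        name, _, order = spec.partition('::')
        names.append(name)
        # Assumption: anything other than DESC sorts ascending.
        ascending.append(order.upper() != 'DESC')
    return df.sort_values(by=names, ascending=ascending)


df = pd.DataFrame({
    'Name': ['Bob', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})
sorted_df = cascade_sort_sketch(df, ['Name::ASC', 'Age::DESC'])
print(sorted_df)
```

The two `Bob` rows are ordered by `Age` descending, which is the second key breaking the tie left by the first.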
785
+ --------------------------------------------------------------------------------
786
+
787
+ ### 33. `append_xgb_labels`
788
+ Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
789
+
790
+ • Parameters:
791
+ - df (pd.DataFrame)
792
+ - `ratio_str` (str): e.g. "8:2", "7:2:1".
793
+
794
+ • Returns:
795
+ - pd.DataFrame
796
+
797
+ • Example:
798
+
799
+ from rgwfuncs import append_xgb_labels
800
+ import pandas as pd
801
+
802
+ df = pd.DataFrame({'A': range(10)})
803
+ df_labeled = append_xgb_labels(df, "7:2:1")
804
+ print(df_labeled)
805
+
806
+
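The ratio string can be read as relative sizes of consecutive blocks of rows. The following sketch assigns labels in row order without shuffling and writes them to an `XGB_TYPE` column. Both the ordering and the mapping of a two-part ratio to TRAIN/TEST are assumptions, and `append_split_labels` is a hypothetical name.

```python
import pandas as pd


def append_split_labels(df, ratio_str):
    """Hypothetical sketch: label rows TRAIN/VALIDATE/TEST by ratio, in order."""
    parts = [int(p) for p in ratio_str.split(':')]
    if len(parts) == 2:
        names = ['TRAIN', 'TEST']
    elif len(parts) == 3:
        names = ['TRAIN', 'VALIDATE', 'TEST']
    else:
        raise ValueError("ratio_str must have two or three parts")

    total, n = sum(parts), len(df)
    labels, start, cumulative = [], 0, 0
    for name, part in zip(names, parts):
        cumulative += part
        # Cumulative boundaries guarantee the last block ends at row n.
        end = round(n * cumulative / total)
        labels.extend([name] * (end - start))
        start = end

    out = df.copy()
    out['XGB_TYPE'] = labels
    return out


df = pd.DataFrame({'A': range(10)})
labeled = append_split_labels(df, "7:2:1")
print(labeled)
```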
807
+ --------------------------------------------------------------------------------
808
+
809
+ ### 34. `append_xgb_regression_predictions`
810
+ Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
811
+
812
+ • Parameters:
813
+ - df (pd.DataFrame)
814
+ - `target_col` (str)
815
+ - `feature_cols` (str): Comma-separated feature columns.
816
+ - `pred_col` (str)
817
+ - `boosting_rounds` (int, optional)
818
+ - `model_path` (str, optional)
819
+
820
+ • Returns:
821
+ - pd.DataFrame
822
+
823
+ • Example:
824
+
825
+ from rgwfuncs import append_xgb_regression_predictions
826
+ import pandas as pd
827
+
828
+ df = pd.DataFrame({
829
+ 'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
830
+ 'Feature1': [1.2, 2.3, 3.4, 4.5],
831
+ 'Feature2': [5.6, 6.7, 7.8, 8.9],
832
+ 'Target': [10, 20, 30, 40]
833
+ })
834
+
835
+ df_pred = append_xgb_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
836
+ print(df_pred)
837
+
838
+
839
+ --------------------------------------------------------------------------------
840
+
841
+ ### 35. `append_xgb_logistic_regression_predictions`
842
+ Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.
843
+
844
+ • Parameters:
845
+ - df (pd.DataFrame)
846
+ - `target_col` (str)
847
+ - `feature_cols` (str)
848
+ - `pred_col` (str)
849
+ - `boosting_rounds` (int, optional)
850
+ - `model_path` (str, optional)
851
+
852
+ • Returns:
853
+ - pd.DataFrame
854
+
855
+ • Example:
856
+
857
+ from rgwfuncs import append_xgb_logistic_regression_predictions
858
+ import pandas as pd
859
+
860
+ df = pd.DataFrame({
861
+ 'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
862
+ 'Feature1': [1, 0, 1, 0],
863
+ 'Feature2': [0.5, 0.2, 0.8, 0.1],
864
+ 'Target': [1, 0, 1, 0]
865
+ })
866
+
867
+ df_pred = append_xgb_logistic_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
868
+ print(df_pred)
869
+
870
+
871
+ --------------------------------------------------------------------------------
872
+
873
+ ### 36. `print_n_frequency_cascading`
874
+ Print the cascading frequency of top n values for specified columns.
875
+
876
+ • Parameters:
877
+ - df (pd.DataFrame)
878
+ - n (int)
879
+ - columns (str): Comma-separated column names.
880
+ - `order_by` (str): `ASC`, `DESC`, `FREQ_ASC`, `FREQ_DESC`.
881
+
882
+ • Example:
883
+
884
+ from rgwfuncs import print_n_frequency_cascading
885
+ import pandas as pd
886
+
887
+ df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
888
+ print_n_frequency_cascading(df, 2, 'City', 'FREQ_DESC')
889
+
890
+
891
+ --------------------------------------------------------------------------------
892
+
893
+ ### 37. `print_n_frequency_linear`
894
+ Print the linear frequency of top n values for specified columns.
895
+
896
+ • Parameters:
897
+ - df (pd.DataFrame)
898
+ - n (int)
899
+ - columns (str): Comma-separated columns.
900
+ - `order_by` (str)
901
+
902
+ • Example:
903
+
904
+ from rgwfuncs import print_n_frequency_linear
905
+ import pandas as pd
906
+
907
+ df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
908
+ print_n_frequency_linear(df, 2, 'City', 'FREQ_DESC')
909
+
910
+
911
+ --------------------------------------------------------------------------------
912
+
913
+ ### 38. `retain_columns`
914
+ Retain specified columns in the DataFrame and drop the others.
915
+
916
+ • Parameters:
917
+ - df (pd.DataFrame)
918
+ - `columns_to_retain` (list or str)
919
+
920
+ • Returns:
921
+ - pd.DataFrame
922
+
923
+ • Example:
924
+
925
+ from rgwfuncs import retain_columns
926
+ import pandas as pd
927
+
928
+ df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6]})
929
+ df_reduced = retain_columns(df, ['A','C'])
930
+ print(df_reduced)
931
+
932
+
933
+ --------------------------------------------------------------------------------
934
+
935
+ ### 39. `mask_against_dataframe`
936
+ Retain only the rows of `df` whose `column_name` value also appears in `other_df`.
937
+
938
+ • Parameters:
939
+ - df (pd.DataFrame)
940
+ - `other_df` (pd.DataFrame)
941
+ - `column_name` (str)
942
+
943
+ • Returns:
944
+ - pd.DataFrame
945
+
946
+ • Example:
947
+
948
+ from rgwfuncs import mask_against_dataframe
949
+ import pandas as pd
950
+
951
+ df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
952
+ df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
953
+
954
+ df_masked = mask_against_dataframe(df1, df2, 'ID')
955
+ print(df_masked)
956
+
957
+
958
+ --------------------------------------------------------------------------------
959
+
960
+ ### 40. `mask_against_dataframe_converse`
961
+ Retain only the rows of `df` whose `column_name` value does not appear in `other_df`.
962
+
963
+ • Parameters:
964
+ - df (pd.DataFrame)
965
+ - `other_df` (pd.DataFrame)
966
+ - `column_name` (str)
967
+
968
+ • Returns:
969
+ - pd.DataFrame
970
+
971
+ • Example:
972
+
973
+ from rgwfuncs import mask_against_dataframe_converse
974
+ import pandas as pd
975
+
976
+ df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
977
+ df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
978
+
979
+ df_uncommon = mask_against_dataframe_converse(df1, df2, 'ID')
980
+ print(df_uncommon)
981
+
982
+
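Both masking functions can be pictured as a membership test on one column. The sketch below uses `Series.isin` and its negation, on the assumption that `mask_against_dataframe` and `mask_against_dataframe_converse` filter the first DataFrame only and do not merge in columns from the second.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['X', 'Y', 'Z']})

# True where df1's ID also occurs in df2.
in_other = df1['ID'].isin(df2['ID'])

common = df1[in_other]      # rows with IDs present in both
uncommon = df1[~in_other]   # rows with IDs missing from df2

print(common)
print(uncommon)
```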
983
+ --------------------------------------------------------------------------------
984
+
985
+ ## Additional Info
986
+
987
+ For more information, refer to each function’s docstring by calling:
988
+
989
+ rgwfuncs.docs(method_type_filter='function_name')
990
+
991
+ or display all docstrings with:
992
+
993
+ rgwfuncs.docs(method_type_filter='*')
994
+
995
+
996
+ --------------------------------------------------------------------------------
997
+
998
+ © 2025 Ryan Gerard Wilson. All rights reserved.
999
+