rgwfuncs 0.0.2__tar.gz → 0.0.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- rgwfuncs-0.0.3/PKG-INFO +1003 -0
- rgwfuncs-0.0.3/README.md +977 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/pyproject.toml +1 -1
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/setup.cfg +1 -1
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs/df_lib.py +730 -217
- rgwfuncs-0.0.3/src/rgwfuncs.egg-info/PKG-INFO +1003 -0
- rgwfuncs-0.0.2/PKG-INFO +0 -325
- rgwfuncs-0.0.2/README.md +0 -299
- rgwfuncs-0.0.2/src/rgwfuncs.egg-info/PKG-INFO +0 -325
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/LICENSE +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs/__init__.py +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs.egg-info/SOURCES.txt +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs.egg-info/dependency_links.txt +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs.egg-info/entry_points.txt +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs.egg-info/requires.txt +0 -0
- {rgwfuncs-0.0.2 → rgwfuncs-0.0.3}/src/rgwfuncs.egg-info/top_level.txt +0 -0
rgwfuncs-0.0.3/PKG-INFO
ADDED
@@ -0,0 +1,1003 @@
Metadata-Version: 2.2
Name: rgwfuncs
Version: 0.0.3
Summary: A functional programming paradigm for mathematical modelling and data science
Home-page: https://github.com/ryangerardwilson/rgwfunc
Author: Ryan Gerard Wilson
Author-email: Ryan Gerard Wilson <ryangerardwilson@gmail.com>
Project-URL: Homepage, https://github.com/ryangerardwilson/rgwfuncs
Project-URL: Issues, https://github.com/ryangerardwilson/rgwfuncs
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: pymssql
Requires-Dist: mysql-connector-python
Requires-Dist: clickhouse-connect
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-auth
Requires-Dist: xgboost
Requires-Dist: requests
Requires-Dist: slack-sdk
Requires-Dist: google-api-python-client

RGWML

***By Ryan Gerard Wilson (https://ryangerardwilson.com)***

# RGWFuncs

This library provides a variety of functions for manipulating and analyzing pandas DataFrames.

--------------------------------------------------------------------------------

## Installation

Install the package using:
```bash
pip install rgwfuncs
```

--------------------------------------------------------------------------------

## Basic Usage

Import the library:
```
import rgwfuncs
```

List the available function names in alphabetical order:
```
rgwfuncs.docs()
```

View specific docstrings by passing a comma-separated filter. For example, to display the docstring for `numeric_clean`:
```
rgwfuncs.docs(method_type_filter='numeric_clean')
```

To display all docstrings, use:
```
rgwfuncs.docs(method_type_filter='*')
```

--------------------------------------------------------------------------------

## Function References and Syntax Examples

Below is a quick reference of the available functions, their purpose, and basic usage examples.

### 1. `docs`
Print a list of available function names in alphabetical order. If a filter is provided, print the matching docstrings.

• Parameters:
- `method_type_filter` (str): Optional. A comma-separated list of function names whose docstrings to print, or `'*'` for all.

• Example:
```
import rgwfuncs
rgwfuncs.docs(method_type_filter='numeric_clean,limit_dataframe')
```

--------------------------------------------------------------------------------

### 2. `numeric_clean`
Clean the numeric columns in a DataFrame according to the specified treatments.

• Parameters:
- df (pd.DataFrame): The DataFrame to clean.
- `column_names` (str): A comma-separated string containing the names of the columns to clean.
- `column_type` (str): The type to convert the columns to (`INTEGER` or `FLOAT`).
- `irregular_value_treatment` (str): How to treat irregular values (`NAN`, `TO_ZERO`, `MEAN`).

• Returns:
- pd.DataFrame: A new DataFrame with cleaned numeric columns.

• Example:
```
from rgwfuncs import numeric_clean
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3, 'x', 4],
    'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8]
})

# Clean numeric columns
df_cleaned = numeric_clean(df, 'col1,col2', 'FLOAT', 'MEAN')
print(df_cleaned)
```
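
To show what the three treatments mean in plain pandas, here is a minimal sketch. It is not the library's implementation, and it makes three assumptions. First, an "irregular value" is anything `pd.to_numeric` cannot parse. Second, `MEAN` fills gaps with the mean of the valid entries in the same column. Third, `INTEGER` rounds to pandas' nullable `Int64` type. The helper name `numeric_clean_sketch` is hypothetical.

```python
import pandas as pd


def numeric_clean_sketch(df, column_names, column_type, irregular_value_treatment):
    """Hypothetical plain-pandas stand-in for numeric_clean."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in (name.strip() for name in column_names.split(",")):
        # Anything pandas cannot parse as a number ('x', 'not_a_number') becomes NaN
        values = pd.to_numeric(out[col], errors="coerce")
        if irregular_value_treatment == "TO_ZERO":
            values = values.fillna(0)
        elif irregular_value_treatment == "MEAN":
            values = values.fillna(values.mean())  # mean of the valid entries only
        elif irregular_value_treatment != "NAN":
            raise ValueError(f"Unknown treatment: {irregular_value_treatment}")
        if column_type == "INTEGER":
            # Nullable Int64 can still hold missing values left by the NAN treatment
            values = values.round().astype("Int64")
        elif column_type == "FLOAT":
            values = values.astype(float)
        else:
            raise ValueError(f"Unknown column type: {column_type}")
        out[col] = values
    return out


df = pd.DataFrame({
    'col1': [1, 2, 3, 'x', 4],
    'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8],
})
print(numeric_clean_sketch(df, 'col1,col2', 'FLOAT', 'MEAN'))
```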

--------------------------------------------------------------------------------

### 3. `limit_dataframe`
Limit the DataFrame to a specified number of rows.

• Parameters:
- df (pd.DataFrame): The DataFrame to limit.
- `num_rows` (int): The number of rows to retain.

• Returns:
- pd.DataFrame: A new DataFrame limited to the specified number of rows.

• Example:
```
from rgwfuncs import limit_dataframe
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})
df_limited = limit_dataframe(df, 5)
print(df_limited)
```

--------------------------------------------------------------------------------

### 4. `from_raw_data`
Create a DataFrame from raw data.

• Parameters:
- headers (list): A list of column headers.
- data (list of lists): A two-dimensional list of data, with one inner list per row.

• Returns:
- pd.DataFrame: A DataFrame created from the raw data.

• Example:
```
from rgwfuncs import from_raw_data

headers = ["Name", "Age"]
data = [
    ["Alice", 30],
    ["Bob", 25],
    ["Charlie", 35]
]

df = from_raw_data(headers, data)
print(df)
```

--------------------------------------------------------------------------------

### 5. `append_rows`
Append rows to the DataFrame.

• Parameters:
- df (pd.DataFrame): The original DataFrame.
- rows (list of lists): Each inner list represents a row to be appended.

• Returns:
- pd.DataFrame: A new DataFrame with the rows appended.

• Example:
```
from rgwfuncs import append_rows
import pandas as pd

df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
new_rows = [
    ['Bob', 25],
    ['Charlie', 35]
]
df_appended = append_rows(df, new_rows)
print(df_appended)
```

--------------------------------------------------------------------------------

### 6. `append_columns`
Append new columns to the DataFrame, filled with None values.

• Parameters:
- df (pd.DataFrame): The original DataFrame.
- `col_names` (list or comma-separated string): The names of the columns to add.

• Returns:
- pd.DataFrame: A new DataFrame with the new columns appended.

• Example:
```
from rgwfuncs import append_columns
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_new = append_columns(df, ['Salary', 'Department'])
print(df_new)
```

--------------------------------------------------------------------------------

### 7. `update_rows`
Update specific rows in the DataFrame based on a condition.

• Parameters:
- df (pd.DataFrame): The original DataFrame.
- condition (str): A query condition to identify rows for updating.
- updates (dict): A dictionary with column names as keys and new values as values.

• Returns:
- pd.DataFrame: A new DataFrame with updated rows.

• Example:
```
from rgwfuncs import update_rows
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_updated = update_rows(df, "Name == 'Alice'", {'Age': 31})
print(df_updated)
```
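
The sketch below shows one way a condition-based update can work in plain pandas. It assumes the condition string uses `DataFrame.query` syntax, which the `"Name == 'Alice'"` example suggests, and evaluates it with `DataFrame.eval` into a boolean mask. `update_rows_sketch` is a hypothetical name, not the library's code.

```python
import pandas as pd


def update_rows_sketch(df, condition, updates):
    """Hypothetical stand-in for update_rows built on DataFrame.eval."""
    out = df.copy()  # work on a copy so the input is not modified
    mask = out.eval(condition)  # boolean Series; same expression syntax as DataFrame.query
    for column, value in updates.items():
        out.loc[mask, column] = value  # assign only where the condition holds
    return out


df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(update_rows_sketch(df, "Name == 'Alice'", {'Age': 31}))
```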

--------------------------------------------------------------------------------

### 8. `delete_rows`
Delete rows from the DataFrame based on a condition.

• Parameters:
- df (pd.DataFrame): The original DataFrame.
- condition (str): A query condition to identify rows for deletion.

• Returns:
- pd.DataFrame: The DataFrame with the matching rows deleted.

• Example:
```
from rgwfuncs import delete_rows
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_deleted = delete_rows(df, "Age < 28")
print(df_deleted)
```
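
Deleting by condition amounts to keeping the complement of the rows that match the query. The sketch below assumes `DataFrame.query`-style condition strings and keeps the original index. Whether the library resets the index is not documented here. `delete_rows_sketch` is a hypothetical name.

```python
import pandas as pd


def delete_rows_sketch(df, condition):
    """Hypothetical stand-in for delete_rows: drop rows where condition is True."""
    mask = df.eval(condition)  # True for rows that match the condition
    return df[~mask]  # keep the complement; the original index is preserved


df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(delete_rows_sketch(df, "Age < 28"))
```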

--------------------------------------------------------------------------------

### 9. `drop_duplicates`
Drop duplicate rows in the DataFrame, retaining the first occurrence.

• Parameters:
- df (pd.DataFrame): The DataFrame from which duplicates will be dropped.

• Returns:
- pd.DataFrame: A new DataFrame with duplicates removed.

• Example:
```
from rgwfuncs import drop_duplicates
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 3, 4, 4]})
df_no_dupes = drop_duplicates(df)
print(df_no_dupes)
```

--------------------------------------------------------------------------------

### 10. `drop_duplicates_retain_first`
Drop duplicate rows based on the specified columns, retaining the first occurrence.

• Parameters:
- df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
- columns (str): Comma-separated string with the column names used to identify duplicates.

• Returns:
- pd.DataFrame: A new DataFrame with duplicates removed.

• Example:
```
from rgwfuncs import drop_duplicates_retain_first
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 3, 4, 4]})
df_no_dupes = drop_duplicates_retain_first(df, 'A')
print(df_no_dupes)
```

--------------------------------------------------------------------------------

### 11. `drop_duplicates_retain_last`
Drop duplicate rows based on the specified columns, retaining the last occurrence.

• Parameters:
- df (pd.DataFrame): The DataFrame from which duplicates will be dropped.
- columns (str): Comma-separated string with the column names used to identify duplicates.

• Returns:
- pd.DataFrame: A new DataFrame with duplicates removed.

• Example:
```
from rgwfuncs import drop_duplicates_retain_last
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 3, 4, 4]})
df_no_dupes = drop_duplicates_retain_last(df, 'A')
print(df_no_dupes)
```
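
The "retain first" and "retain last" variants correspond naturally to the `keep` argument of `pandas.DataFrame.drop_duplicates`. The sketch below assumes that correspondence; it is not confirmed to be how `rgwfuncs` implements them. It splits the comma-separated column string into a `subset` list. With the example data, `keep='first'` keeps index labels 0 and 2, and `keep='last'` keeps 1 and 3. `drop_duplicates_by_columns` is a hypothetical helper.

```python
import pandas as pd


def drop_duplicates_by_columns(df, columns, keep):
    """Hypothetical helper: keep='first' or keep='last' among rows sharing `columns`."""
    subset = [c.strip() for c in columns.split(",")]  # 'A, B' -> ['A', 'B']
    return df.drop_duplicates(subset=subset, keep=keep)


df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 3, 4, 4]})
print(drop_duplicates_by_columns(df, 'A', keep='first'))
print(drop_duplicates_by_columns(df, 'A', keep='last'))
```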

--------------------------------------------------------------------------------

### 12. `load_data_from_query`
Load data from a database query into a DataFrame based on a configuration preset.

• Parameters:
- `db_preset_name` (str): Name of the database preset in the config file.
- query (str): The SQL query to execute.
- `config_file_name` (str): Name of the configuration file (default: "rgwml.config").

• Returns:
- pd.DataFrame: A DataFrame containing the query result.

• Example:
```
from rgwfuncs import load_data_from_query

df = load_data_from_query(
    db_preset_name="MyDBPreset",
    query="SELECT * FROM my_table",
    config_file_name="rgwml.config"
)
print(df)
```

--------------------------------------------------------------------------------

### 13. `load_data_from_path`
Load data from a file into a DataFrame, choosing the reader based on the file extension.

• Parameters:
- `file_path` (str): The absolute path to the data file.

• Returns:
- pd.DataFrame: A DataFrame containing the loaded data.

• Example:
```
from rgwfuncs import load_data_from_path

df = load_data_from_path("/absolute/path/to/data.csv")
print(df)
```

--------------------------------------------------------------------------------

### 14. `load_data_from_sqlite_path`
Execute a query on a SQLite database file and return the results as a DataFrame.

• Parameters:
- `sqlite_path` (str): The absolute path to the SQLite database file.
- query (str): The SQL query to execute.

• Returns:
- pd.DataFrame: A DataFrame containing the query results.

• Example:
```
from rgwfuncs import load_data_from_sqlite_path

df = load_data_from_sqlite_path("/path/to/database.db", "SELECT * FROM my_table")
print(df)
```

--------------------------------------------------------------------------------

### 15. `first_n_rows`
Print the first n rows of the DataFrame in dictionary format.

• Parameters:
- df (pd.DataFrame): The DataFrame to display.
- n (int): Number of rows to display.

• Example:
```
from rgwfuncs import first_n_rows
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
first_n_rows(df, 2)
```

--------------------------------------------------------------------------------

### 16. `last_n_rows`
Print the last n rows of the DataFrame in dictionary format.

• Parameters:
- df (pd.DataFrame): The DataFrame to display.
- n (int): Number of rows to display.

• Example:
```
from rgwfuncs import last_n_rows
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
last_n_rows(df, 2)
```

--------------------------------------------------------------------------------

### 17. `top_n_unique_values`
Print the top n unique values for the specified columns in the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to evaluate.
- n (int): Number of top values to display.
- columns (list): List of columns for which to display the top unique values.

• Example:
```
from rgwfuncs import top_n_unique_values
import pandas as pd

df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
top_n_unique_values(df, 2, ['Cities'])
```

--------------------------------------------------------------------------------

### 18. `bottom_n_unique_values`
Print the bottom n unique values for the specified columns in the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to evaluate.
- n (int): Number of bottom values to display.
- columns (list): List of columns for which to display the bottom unique values.

• Example:
```
from rgwfuncs import bottom_n_unique_values
import pandas as pd

df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
bottom_n_unique_values(df, 1, ['Cities'])
```
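
Assuming "top" and "bottom" refer to how often each value occurs, both functions can be approximated with `Series.value_counts`, reading it from either end. The sketch returns `{column: {value: count}}` dictionaries instead of printing, so the results are easy to inspect. Both helper names are hypothetical. For the sample data, `LA` (3) and `NY` (2) come out on top, and `SF` (1) is at the bottom.

```python
import pandas as pd


def top_n_unique_values_sketch(df, n, columns):
    """Most frequent n values per column, as {column: {value: count}}."""
    return {col: df[col].value_counts().head(n).to_dict() for col in columns}


def bottom_n_unique_values_sketch(df, n, columns):
    """Least frequent n values per column, as {column: {value: count}}."""
    return {col: df[col].value_counts(ascending=True).head(n).to_dict() for col in columns}


df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
print(top_n_unique_values_sketch(df, 2, ['Cities']))
print(bottom_n_unique_values_sketch(df, 1, ['Cities']))
```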

--------------------------------------------------------------------------------

### 19. `print_correlation`
Print the correlation for multiple pairs of columns in the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to evaluate.
- `column_pairs` (list of tuples): E.g., `[('col1','col2'), ('colA','colB')]`.

• Example:
```
from rgwfuncs import print_correlation
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [2, 4, 6, 8, 10],
    'colA': [10, 9, 8, 7, 6],
    'colB': [5, 4, 3, 2, 1]
})

pairs = [('col1', 'col2'), ('colA', 'colB')]
print_correlation(df, pairs)
```
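
The per-pair computation can be sketched with `Series.corr`, which defaults to the Pearson coefficient. The library's choice of method is an assumption here, and so is the hypothetical helper name. In the sample data both pairs are exact linear relationships (`col2 = 2 * col1`, `colA = colB + 5`), so each coefficient should be 1.

```python
import pandas as pd


def correlations_sketch(df, column_pairs):
    """Pearson correlation for each (col_a, col_b) pair, keyed by the pair."""
    return {(a, b): df[a].corr(df[b]) for a, b in column_pairs}


df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [2, 4, 6, 8, 10],
    'colA': [10, 9, 8, 7, 6],
    'colB': [5, 4, 3, 2, 1],
})
for (a, b), r in correlations_sketch(df, [('col1', 'col2'), ('colA', 'colB')]).items():
    print(f"{a} vs {b}: {r:.3f}")
```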

--------------------------------------------------------------------------------

### 20. `print_memory_usage`
Print the memory usage of the DataFrame in megabytes.

• Parameters:
- df (pd.DataFrame): The DataFrame to measure.

• Example:
```
from rgwfuncs import print_memory_usage
import pandas as pd

df = pd.DataFrame({'A': range(1000)})
print_memory_usage(df)
```

--------------------------------------------------------------------------------

### 21. `filter_dataframe`
Return a new DataFrame filtered by a given query expression.

• Parameters:
- df (pd.DataFrame): The DataFrame to filter.
- `filter_expr` (str): The query expression to apply.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import filter_dataframe
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 20, 25]
})

df_filtered = filter_dataframe(df, "Age > 23")
print(df_filtered)
```

--------------------------------------------------------------------------------

### 22. `filter_indian_mobiles`
Filter and return the rows containing valid Indian mobile numbers in the specified column.

• Parameters:
- df (pd.DataFrame): The DataFrame to filter.
- `mobile_col` (str): The name of the column containing mobile numbers.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import filter_indian_mobiles
import pandas as pd

df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})
df_indian = filter_indian_mobiles(df, 'Phone')
print(df_indian)
```
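
The sketch below shows what a validity check could look like. It assumes a "valid Indian mobile" is exactly ten digits starting with 6, 7, 8, or 9. That is a common rule of thumb, but the library's actual criteria may be stricter or looser (for example, around country-code prefixes). `filter_indian_mobiles_sketch` is a hypothetical name.

```python
import pandas as pd


def filter_indian_mobiles_sketch(df, mobile_col):
    """Keep rows whose mobile_col is 10 digits beginning with 6-9 (assumed rule)."""
    numbers = df[mobile_col].astype(str).str.strip()
    mask = numbers.str.fullmatch(r"[6-9]\d{9}")  # whole-string match, no partial hits
    return df[mask]


df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})
print(filter_indian_mobiles_sketch(df, 'Phone'))
```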

--------------------------------------------------------------------------------

### 23. `print_dataframe`
Print the entire DataFrame and its column types. Optionally print a source path.

• Parameters:
- df (pd.DataFrame): The DataFrame to print.
- source (str, optional): A source path to print alongside the DataFrame.

• Example:
```
from rgwfuncs import print_dataframe
import pandas as pd

df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print_dataframe(df, source='SampleData.csv')
```

--------------------------------------------------------------------------------

### 24. `send_dataframe_via_telegram`
Send a DataFrame via Telegram using a specified bot configuration.

• Parameters:
- df (pd.DataFrame)
- `bot_name` (str)
- message (str)
- `as_file` (bool)
- `remove_after_send` (bool)

• Example:
```
from rgwfuncs import send_dataframe_via_telegram
import pandas as pd

# Assumes your bot config is in "rgwml.config" under the [TelegramBots] section
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
send_dataframe_via_telegram(
    df,
    bot_name='MyTelegramBot',
    message='Hello from RGWFuncs!',
    as_file=True,
    remove_after_send=True
)
```

--------------------------------------------------------------------------------

### 25. `send_data_to_email`
Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.

• Parameters:
- df (pd.DataFrame)
- `preset_name` (str)
- `to_email` (str)
- subject (str, optional)
- body (str, optional)
- `as_file` (bool)
- `remove_after_send` (bool)

• Example:
```
from rgwfuncs import send_data_to_email
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
send_data_to_email(
    df,
    preset_name='MyEmailPreset',
    to_email='recipient@example.com',
    subject='Hello from RGWFuncs',
    body='Here is the data you requested.',
    as_file=True,
    remove_after_send=True
)
```

--------------------------------------------------------------------------------

### 26. `send_data_to_slack`
Send a DataFrame or message to Slack using a specified bot configuration.

• Parameters:
- df (pd.DataFrame)
- `bot_name` (str)
- message (str)
- `as_file` (bool)
- `remove_after_send` (bool)

• Example:
```
from rgwfuncs import send_data_to_slack
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
send_data_to_slack(
    df,
    bot_name='MySlackBot',
    message='Hello Slack!',
    as_file=True,
    remove_after_send=True
)
```

--------------------------------------------------------------------------------

### 27. `order_columns`
Reorder the columns of a DataFrame based on a string input.

• Parameters:
- df (pd.DataFrame): The DataFrame to reorder.
- `column_order_str` (str): Comma-separated column order.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import order_columns
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25], 'Salary': [1000, 1200]})
df_reordered = order_columns(df, 'Salary,Name,Age')
print(df_reordered)
```

--------------------------------------------------------------------------------

### 28. `append_ranged_classification_column`
Append a ranged classification column to the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to classify.
- ranges (str): Ranges separated by commas (e.g., "0-10,11-20,21-30").
- `target_col` (str): The column to classify.
- `new_col_name` (str): Name of the new classification column.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_ranged_classification_column
import pandas as pd

df = pd.DataFrame({'Scores': [5, 12, 25]})
df_classified = append_ranged_classification_column(df, '0-10,11-20,21-30', 'Scores', 'ScoreRange')
print(df_classified)
```
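
A minimal sketch of range classification follows. It assumes each `lo-hi` range is inclusive at both ends and that the label is the range string itself. Values that fall in no range (such as 10.5 with the example ranges) get `None`. Negative bounds are not supported because the parser splits on `-`. The library's labelling may differ, and `append_ranged_classification_sketch` is a hypothetical name.

```python
import pandas as pd


def append_ranged_classification_sketch(df, ranges, target_col, new_col_name):
    """Label each value with the first 'lo-hi' range (inclusive) that contains it."""
    bounds = []
    for part in ranges.split(","):
        label = part.strip()
        lo, hi = (float(x) for x in label.split("-"))  # non-negative bounds only
        bounds.append((lo, hi, label))

    def classify(value):
        for lo, hi, label in bounds:
            if lo <= value <= hi:
                return label
        return None  # value falls outside every range

    out = df.copy()
    out[new_col_name] = out[target_col].apply(classify)
    return out


df = pd.DataFrame({'Scores': [5, 12, 25]})
print(append_ranged_classification_sketch(df, '0-10,11-20,21-30', 'Scores', 'ScoreRange'))
```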

--------------------------------------------------------------------------------

### 29. `append_percentile_classification_column`
Append a percentile classification column to the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to classify.
- percentiles (str): Percentile values separated by commas (e.g., "25,50,75").
- `target_col` (str): The column to classify.
- `new_col_name` (str): Name of the new classification column.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_percentile_classification_column
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
df_classified = append_percentile_classification_column(df, '25,50,75', 'Values', 'ValuePercentile')
print(df_classified)
```

--------------------------------------------------------------------------------

### 30. `append_ranged_date_classification_column`
Append a ranged date classification column to the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame to classify.
- `date_ranges` (str): Date ranges separated by commas, e.g., `2020-01-01_2020-06-30,2020-07-01_2020-12-31`.
- `target_col` (str): The date column to classify.
- `new_col_name` (str): Name of the new classification column.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_ranged_date_classification_column
import pandas as pd

df = pd.DataFrame({'EventDate': pd.to_datetime(['2020-03-15', '2020-08-10'])})
df_classified = append_ranged_date_classification_column(
    df,
    '2020-01-01_2020-06-30,2020-07-01_2020-12-31',
    'EventDate',
    'DateRange'
)
print(df_classified)
```

--------------------------------------------------------------------------------

### 31. `rename_columns`
Rename columns in the DataFrame.

• Parameters:
- df (pd.DataFrame): The DataFrame whose columns to rename.
- `rename_pairs` (dict): A mapping from old column names to new ones.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import rename_columns
import pandas as pd

df = pd.DataFrame({'OldName': [1, 2, 3]})
df_renamed = rename_columns(df, {'OldName': 'NewName'})
print(df_renamed)
```

--------------------------------------------------------------------------------

### 32. `cascade_sort`
Cascade sort the DataFrame by the specified columns and sort directions.

• Parameters:
- df (pd.DataFrame): The DataFrame to sort.
- columns (list): Column names with a `::ASC` or `::DESC` suffix, in priority order, e.g. `["Column1::ASC", "Column2::DESC"]`.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import cascade_sort
import pandas as pd

df = pd.DataFrame({
    'Name': ['Charlie', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

sorted_df = cascade_sort(df, ["Name::ASC", "Age::DESC"])
print(sorted_df)
```
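
A cascade sort maps directly onto `DataFrame.sort_values` with parallel `by` and `ascending` lists. The first column is the primary key, and later columns only break ties. The sketch parses the `Column::DIRECTION` strings and treats a missing suffix as ascending, which is an assumption. It uses data with repeated keys so that the tie-breaking is visible. `cascade_sort_sketch` is a hypothetical name.

```python
import pandas as pd


def cascade_sort_sketch(df, columns):
    """Sort by 'Column::ASC' / 'Column::DESC' specs, first spec = highest priority."""
    names, ascending = [], []
    for spec in columns:
        name, _, direction = spec.partition("::")
        names.append(name)
        ascending.append(direction.upper() != "DESC")  # default to ascending
    return df.sort_values(by=names, ascending=ascending)


df = pd.DataFrame({'Team': ['B', 'A', 'A', 'B'], 'Score': [1, 5, 9, 3]})
print(cascade_sort_sketch(df, ["Team::ASC", "Score::DESC"]))
```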

--------------------------------------------------------------------------------

### 33. `append_xgb_labels`
Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.

• Parameters:
- df (pd.DataFrame): The DataFrame to label.
- `ratio_str` (str): e.g. "8:2", "7:2:1".

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_xgb_labels
import pandas as pd

df = pd.DataFrame({'A': range(10)})
df_labeled = append_xgb_labels(df, "7:2:1")
print(df_labeled)
```
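
One way to turn a ratio string into split labels is sketched below, under several assumptions that the library does not confirm. Two parts map to TRAIN/TEST and three parts to TRAIN/VALIDATE/TEST. Labels are assigned in row order without shuffling. The last split absorbs any rounding remainder. The labels go into an `XGB_TYPE` column, the name the prediction functions expect. `append_xgb_labels_sketch` is a hypothetical name.

```python
import pandas as pd


def append_xgb_labels_sketch(df, ratio_str):
    """Label rows TRAIN/TEST or TRAIN/VALIDATE/TEST in row order (no shuffling)."""
    ratios = [int(r) for r in ratio_str.split(":")]
    if len(ratios) == 2:
        labels = ["TRAIN", "TEST"]
    elif len(ratios) == 3:
        labels = ["TRAIN", "VALIDATE", "TEST"]
    else:
        raise ValueError("ratio_str must have two or three parts, e.g. '8:2' or '7:2:1'")
    total = sum(ratios)
    n = len(df)
    counts = [n * r // total for r in ratios[:-1]]
    counts.append(n - sum(counts))  # last split absorbs the rounding remainder
    column = []
    for label, count in zip(labels, counts):
        column.extend([label] * count)
    out = df.copy()
    out["XGB_TYPE"] = column
    return out


df = pd.DataFrame({'A': range(10)})
print(append_xgb_labels_sketch(df, "7:2:1"))
```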

--------------------------------------------------------------------------------

### 34. `append_xgb_regression_predictions`
Append XGB regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.

• Parameters:
- df (pd.DataFrame)
- `target_col` (str)
- `feature_cols` (str): Comma-separated feature columns.
- `pred_col` (str)
- `boosting_rounds` (int, optional)
- `model_path` (str, optional)

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_xgb_regression_predictions
import pandas as pd

df = pd.DataFrame({
    'XGB_TYPE': ['TRAIN', 'TRAIN', 'TEST', 'TEST'],
    'Feature1': [1.2, 2.3, 3.4, 4.5],
    'Feature2': [5.6, 6.7, 7.8, 8.9],
    'Target': [10, 20, 30, 40]
})

df_pred = append_xgb_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
print(df_pred)
```

--------------------------------------------------------------------------------

### 35. `append_xgb_logistic_regression_predictions`
Append XGB logistic regression predictions to the DataFrame. Requires an `XGB_TYPE` column for TRAIN/TEST splits.

• Parameters:
- df (pd.DataFrame)
- `target_col` (str)
- `feature_cols` (str): Comma-separated feature columns.
- `pred_col` (str)
- `boosting_rounds` (int, optional)
- `model_path` (str, optional)

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import append_xgb_logistic_regression_predictions
import pandas as pd

df = pd.DataFrame({
    'XGB_TYPE': ['TRAIN', 'TRAIN', 'TEST', 'TEST'],
    'Feature1': [1, 0, 1, 0],
    'Feature2': [0.5, 0.2, 0.8, 0.1],
    'Target': [1, 0, 1, 0]
})

df_pred = append_xgb_logistic_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
print(df_pred)
```

--------------------------------------------------------------------------------

### 36. `print_n_frequency_cascading`
Print the cascading frequency of the top n values for the specified columns.

• Parameters:
- df (pd.DataFrame)
- n (int)
- columns (str): Comma-separated column names.
- `order_by` (str): `ASC`, `DESC`, `FREQ_ASC`, or `FREQ_DESC`.

• Example:
```
from rgwfuncs import print_n_frequency_cascading
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
print_n_frequency_cascading(df, 2, 'City', 'FREQ_DESC')
```

--------------------------------------------------------------------------------

### 37. `print_n_frequency_linear`
Print the linear frequency of the top n values for the specified columns.

• Parameters:
- df (pd.DataFrame)
- n (int)
- columns (str): Comma-separated column names.
- `order_by` (str)

• Example:
```
from rgwfuncs import print_n_frequency_linear
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
print_n_frequency_linear(df, 2, 'City', 'FREQ_DESC')
```

--------------------------------------------------------------------------------

### 38. `retain_columns`
Retain the specified columns in the DataFrame and drop the others.

• Parameters:
- df (pd.DataFrame)
- `columns_to_retain` (list or str): The columns to keep.

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import retain_columns
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
df_reduced = retain_columns(df, ['A', 'C'])
print(df_reduced)
```

--------------------------------------------------------------------------------

### 39. `mask_against_dataframe`
Retain only the rows with column values common to both DataFrames.

• Parameters:
- df (pd.DataFrame)
- `other_df` (pd.DataFrame)
- `column_name` (str)

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import mask_against_dataframe
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['X', 'Y', 'Z']})

df_masked = mask_against_dataframe(df1, df2, 'ID')
print(df_masked)
```

--------------------------------------------------------------------------------

### 40. `mask_against_dataframe_converse`
Retain only the rows with column values not shared between the two DataFrames.

• Parameters:
- df (pd.DataFrame)
- `other_df` (pd.DataFrame)
- `column_name` (str)

• Returns:
- pd.DataFrame

• Example:
```
from rgwfuncs import mask_against_dataframe_converse
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['X', 'Y', 'Z']})

df_uncommon = mask_against_dataframe_converse(df1, df2, 'ID')
print(df_uncommon)
```
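
Both masking functions can be sketched with `Series.isin`. The sketch assumes that both return rows of the first DataFrame only, keeping its columns. Under that reading, the common mask keeps IDs 2 and 3 from `df1`, and the converse keeps ID 1. Whether the library's converse also returns rows from `other_df` (such as ID 4) is not stated above. `mask_against_sketch` and its `converse` flag are hypothetical.

```python
import pandas as pd


def mask_against_sketch(df, other_df, column_name, converse=False):
    """Keep rows of df whose column_name value is (or, if converse, is not) in other_df."""
    in_other = df[column_name].isin(other_df[column_name])
    return df[~in_other] if converse else df[in_other]


df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['X', 'Y', 'Z']})
print(mask_against_sketch(df1, df2, 'ID'))
print(mask_against_sketch(df1, df2, 'ID', converse=True))
```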

--------------------------------------------------------------------------------

## Additional Info

For more information, refer to each function’s docstring by calling:
```
rgwfuncs.docs(method_type_filter='function_name')
```
or display all docstrings with:
```
rgwfuncs.docs(method_type_filter='*')
```

--------------------------------------------------------------------------------

© 2025 Ryan Gerard Wilson. All rights reserved.