imsciences 0.6.3.1__tar.gz → 0.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Metadata-Version: 2.1
Name: imsciences
Version: 0.8
Summary: IMS Data Processing Package
Author: IMS
Author-email: cam@im-sciences.com
Keywords: python,data processing,apis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: plotly
Requires-Dist: numpy
Requires-Dist: fredapi
Requires-Dist: requests_cache
Requires-Dist: geopy
Requires-Dist: bs4
Requires-Dist: yfinance
Requires-Dist: holidays

# IMS Package Documentation

The **IMSciences package** is a Python library for processing incoming data into the format required by econometrics projects, particularly those utilising weekly time series data. It offers a suite of functions for efficient data manipulation and analysis.

---

## Key Features
- Data processing tailored to econometrics workflows.
- Aggregation, filtering, and transformation of time series data.
- Integration with external data sources such as FRED, the Bank of England, the ONS and the OECD.

---

## Table of Contents

1. [Data Processing](#data-processing)
2. [Data Pulling](#data-pulling)
3. [Installation](#installation)
4. [Usage](#usage)
5. [License](#license)

---

## Data Processing

## 1. `get_wd_levels`
- **Description**: Gets the current working directory, optionally moving up a given number of parent directories.
- **Usage**: `get_wd_levels(levels)`
- **Example**: `get_wd_levels(0)`

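For illustration, the behaviour described above can be approximated with the standard library's `pathlib`. This is a minimal sketch, not the package's implementation: `wd_up_levels` is a hypothetical name, and it assumes `levels` counts how many parent directories to move up and that a `Path` (rather than a string) is an acceptable return type.

```python
from pathlib import Path


def wd_up_levels(levels: int = 0) -> Path:
    """Hypothetical helper: return the cwd, moved up `levels` parent directories."""
    if levels < 0:
        raise ValueError("levels must be non-negative")
    path = Path.cwd()
    for _ in range(levels):
        path = path.parent  # step up one directory per level
    return path


# Usage: the current directory, then its parent
print(wd_up_levels(0))
print(wd_up_levels(1))
```
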
---

## 2. `remove_rows`
- **Description**: Removes a specified number of rows from a pandas DataFrame.
- **Usage**: `remove_rows(data_frame, num_rows_to_remove)`
- **Example**: `remove_rows(df, 2)`

---

## 3. `aggregate_daily_to_wc_long`
- **Description**: Aggregates daily data into weekly data, grouping by the specified columns and aggregating the specified value columns (summed by default), with each week starting on a specified day.
- **Usage**: `aggregate_daily_to_wc_long(df, date_column, group_columns, sum_columns, wc, aggregation='sum')`
- **Example**: `aggregate_daily_to_wc_long(df, 'date', ['platform'], ['cost', 'impressions', 'clicks'], 'mon', 'average')`

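The weekly roll-up described above can be sketched in plain pandas: shift each date back to the most recent week-start day, then group and aggregate. This is a hypothetical re-implementation (`daily_to_weekly_long` and the `WEEKDAY_INDEX` mapping are assumptions, not the package's code), and it assumes `'average'` means the mean while any other value is passed straight to pandas `agg`.

```python
import pandas as pd

# Assumed mapping from the `wc` strings used in this README to pandas weekday numbers
WEEKDAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def daily_to_weekly_long(df, date_column, group_columns, sum_columns, wc="mon", aggregation="sum"):
    """Hypothetical sketch: roll daily rows up to week-commencing rows (long format)."""
    dates = pd.to_datetime(df[date_column]).dt.normalize()
    # Days since the most recent week-start day (0 if the date is the start day itself)
    days_back = (dates.dt.weekday - WEEKDAY_INDEX[wc]) % 7
    week_start = dates - pd.to_timedelta(days_back, unit="D")
    how = "mean" if aggregation == "average" else aggregation
    return (
        df.assign(**{date_column: week_start})
        .groupby([date_column, *group_columns], as_index=False)[sum_columns]
        .agg(how)
    )


daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", "2024-01-10"),  # 2024-01-01 is a Monday
    "platform": "facebook",
    "cost": 1,
})
print(daily_to_weekly_long(daily, "date", ["platform"], ["cost"], "mon"))
```
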
---

## 4. `convert_monthly_to_daily`
- **Description**: Converts monthly data in a DataFrame to daily data by expanding each month into its individual days and dividing the numeric values across them.
- **Usage**: `convert_monthly_to_daily(df, date_column, divide)`
- **Example**: `convert_monthly_to_daily(df, 'date')`

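A minimal pandas sketch of the monthly-to-daily expansion, assuming each row represents one month, `divide=True` spreads each monthly value evenly over that month's days, and non-numeric columns other than the date are dropped. `monthly_to_daily` is a hypothetical helper, not the package's function.

```python
import pandas as pd


def monthly_to_daily(df, date_column, divide=True):
    """Hypothetical sketch: expand each monthly row into one row per day of that month."""
    numeric_cols = df.select_dtypes("number").columns
    frames = []
    for _, row in df.iterrows():
        month_start = pd.Timestamp(row[date_column]).to_period("M").to_timestamp()
        days = pd.date_range(month_start, periods=month_start.days_in_month, freq="D")
        expanded = pd.DataFrame({date_column: days})
        for col in numeric_cols:
            # Spread the monthly total evenly across the days, or repeat it as-is
            expanded[col] = row[col] / len(days) if divide else row[col]
        frames.append(expanded)
    return pd.concat(frames, ignore_index=True)


monthly = pd.DataFrame({"date": ["2024-02-01"], "sales": [290]})
daily = monthly_to_daily(monthly, "date")  # February 2024 has 29 days
print(daily.head())
```
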
---

## 5. `plot_two`
- **Description**: Plots specified columns from two different DataFrames using a shared date column. Useful for comparing data.
- **Usage**: `plot_two(df1, col1, df2, col2, date_column, same_axis=True)`
- **Example**: `plot_two(df1, 'cost', df2, 'cost', 'obs', True)`

---

## 6. `remove_nan_rows`
- **Description**: Removes rows from a DataFrame where the specified column has NaN values.
- **Usage**: `remove_nan_rows(df, col_to_remove_rows)`
- **Example**: `remove_nan_rows(df, 'date')`

---

## 7. `filter_rows`
- **Description**: Keeps only the rows whose values in a specified column are in a provided list.
- **Usage**: `filter_rows(df, col_to_filter, list_of_filters)`
- **Example**: `filter_rows(df, 'country', ['UK', 'IE'])`

---

## 8. `plot_one`
- **Description**: Plots a specified column from a DataFrame.
- **Usage**: `plot_one(df1, col1, date_column)`
- **Example**: `plot_one(df, 'Spend', 'OBS')`

---

## 9. `week_of_year_mapping`
- **Description**: Converts a week column in `yyyy-Www` or `yyyy-ww` format to a week-commencing date.
- **Usage**: `week_of_year_mapping(df, week_col, start_day_str)`
- **Example**: `week_of_year_mapping(df, 'week', 'mon')`

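The core of this mapping can be sketched with the standard library's `date.fromisocalendar` (Python 3.8+), which returns the date for a given ISO year, week and weekday. This sketch assumes the labels are ISO week numbers; `iso_week_to_start_date` is hypothetical, and stepping back to the most recent `start_day_str` for non-Monday starts is an assumption about how other start days are handled.

```python
from datetime import date, timedelta

WEEKDAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def iso_week_to_start_date(week_label: str, start_day_str: str = "mon") -> date:
    """Hypothetical sketch: map '2022-W20' or '2022-20' to a week-commencing date."""
    year, week = week_label.upper().replace("W", "").split("-")
    monday = date.fromisocalendar(int(year), int(week), 1)  # ISO weeks start on Monday
    # Assumption: for other start days, step back to the most recent such day
    return monday - timedelta(days=(0 - WEEKDAY_INDEX[start_day_str]) % 7)


print(iso_week_to_start_date("2022-W20"))        # Monday of ISO week 20
print(iso_week_to_start_date("2022-20", "sun"))  # the Sunday before it
```
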
---

## 10. `exclude_rows`
- **Description**: Removes rows whose values in a specified column are in a provided list, keeping only the rows whose values are not in it.
- **Usage**: `exclude_rows(df, col_to_filter, list_of_filters)`
- **Example**: `exclude_rows(df, 'week', ['2022-W20', '2022-W21'])`

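`filter_rows` and `exclude_rows` are mirror images of each other. Assuming they behave as described, each reduces to a boolean mask built with pandas `Series.isin`, negated with `~` for exclusion. The helper names below are hypothetical.

```python
import pandas as pd


def keep_rows(df, col, values):
    """Hypothetical sketch of filter_rows: keep rows whose value is in `values`."""
    return df[df[col].isin(values)]


def drop_rows(df, col, values):
    """Hypothetical sketch of exclude_rows: keep rows whose value is NOT in `values`."""
    return df[~df[col].isin(values)]


weeks = pd.DataFrame({"week": ["2022-W19", "2022-W20", "2022-W21"], "spend": [10, 20, 30]})
print(drop_rows(weeks, "week", ["2022-W20", "2022-W21"]))
```
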
---

## 11. `rename_cols`
- **Description**: Renames columns in a pandas DataFrame.
- **Usage**: `rename_cols(df, name)`
- **Example**: `rename_cols(df, 'ame_facebook')`

---

## 12. `merge_new_and_old`
- **Description**: Creates a new DataFrame with two columns, one for dates and one for merged numeric values. The numeric values are taken from the specified columns of the old and new DataFrames, split at a given cutoff date.
- **Usage**: `merge_new_and_old(old_df, old_col, new_df, new_col, cutoff_date, date_col_name='OBS')`
- **Example**: `merge_new_and_old(df1, 'old_col', df2, 'new_col', '2023-01-15')`

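A minimal sketch of splicing two series at a cutoff, assuming old values are used strictly before the cutoff and new values from the cutoff onwards, and that both inputs share a date column named by `date_col`. `splice_at_cutoff` and the output column name `Value` are hypothetical, not taken from the package.

```python
import pandas as pd


def splice_at_cutoff(old_df, old_col, new_df, new_col, cutoff_date, date_col="OBS"):
    """Hypothetical sketch: old values before the cutoff, new values from the cutoff on."""
    cutoff = pd.Timestamp(cutoff_date)
    old_dates = pd.to_datetime(old_df[date_col])
    new_dates = pd.to_datetime(new_df[date_col])
    old_part = pd.DataFrame({date_col: old_dates, "Value": old_df[old_col]})[old_dates < cutoff]
    new_part = pd.DataFrame({date_col: new_dates, "Value": new_df[new_col]})[new_dates >= cutoff]
    return pd.concat([old_part, new_part]).sort_values(date_col, ignore_index=True)


dates = ["2023-01-01", "2023-01-08", "2023-01-15", "2023-01-22"]
old = pd.DataFrame({"OBS": dates, "old_col": [1, 2, 3, 4]})
new = pd.DataFrame({"OBS": dates, "new_col": [10, 20, 30, 40]})
print(splice_at_cutoff(old, "old_col", new, "new_col", "2023-01-15"))
```
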
---

## 13. `merge_dataframes_on_date`
- **Description**: Merges a list of DataFrames on a common column.
- **Usage**: `merge_dataframes_on_date(dataframes, common_column='OBS', merge_how='outer')`
- **Example**: `merge_dataframes_on_date([df1, df2, df3], common_column='OBS', merge_how='outer')`

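Merging a list of DataFrames on one column is commonly done by folding `pandas.merge` over the list with `functools.reduce`. The sketch below is a hypothetical equivalent (`merge_on_common_column` is not the package's function) and assumes no value columns other than the key are shared between frames.

```python
from functools import reduce

import pandas as pd


def merge_on_common_column(dataframes, common_column="OBS", merge_how="outer"):
    """Hypothetical sketch: fold pd.merge over a list of DataFrames."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=common_column, how=merge_how),
        dataframes,
    )


df1 = pd.DataFrame({"OBS": ["2024-01-01", "2024-01-08"], "tv": [1, 2]})
df2 = pd.DataFrame({"OBS": ["2024-01-08", "2024-01-15"], "radio": [3, 4]})
print(merge_on_common_column([df1, df2]))
```
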
---

## 14. `merge_and_update_dfs`
- **Description**: Merges two DataFrames on a key column, updates the first DataFrame's columns with values from the second where available, and returns the result sorted by the key column.
- **Usage**: `merge_and_update_dfs(df1, df2, key_column)`
- **Example**: `merge_and_update_dfs(processed_facebook, finalised_meta, 'OBS')`

---

## 15. `convert_us_to_uk_dates`
- **Description**: Converts a DataFrame column with mixed date formats to datetime.
- **Usage**: `convert_us_to_uk_dates(df, date_col)`
- **Example**: `convert_us_to_uk_dates(df, 'date')`

---

## 16. `combine_sheets`
- **Description**: Combines multiple DataFrames from a dictionary into a single DataFrame.
- **Usage**: `combine_sheets(all_sheets)`
- **Example**: `combine_sheets({'Sheet1': df1, 'Sheet2': df2})`

---

## 17. `pivot_table`
- **Description**: Dynamically pivots a DataFrame based on specified columns.
- **Usage**: `pivot_table(df, index_col, columns, values_col, filters_dict=None, fill_value=0, aggfunc='sum', margins=False, margins_name='Total', datetime_trans_needed=True, reverse_header_order=False, fill_missing_weekly_dates=False, week_commencing='W-MON')`
- **Example**: `pivot_table(df, 'OBS', 'Channel Short Names', 'Value', filters_dict={'Master Include': ' == 1', 'OBS': ' >= datetime(2019,9,9)', 'Metric Short Names': ' == spd'}, fill_value=0, aggfunc='sum', margins=False, margins_name='Total', datetime_trans_needed=True, reverse_header_order=True, fill_missing_weekly_dates=True, week_commencing='W-MON')`

---

## 18. `apply_lookup_table_for_columns`
- **Description**: The equivalent of Excel's XLOOKUP: maps substrings found in the specified columns to new values using a dictionary.
- **Usage**: `apply_lookup_table_for_columns(df, col_names, to_find_dict, if_not_in_dict='Other', new_column_name='Mapping')`
- **Example**: `apply_lookup_table_for_columns(df, col_names, {'spend': 'spd', 'clicks': 'clk'}, if_not_in_dict='Other', new_column_name='Metrics Short')`

---

## 19. `aggregate_daily_to_wc_wide`
- **Description**: Like `aggregate_daily_to_wc_long`, aggregates daily data into weekly data starting on a specified day of the week, but returns it in wide format with the grouped values spread across columns. `include_totals` optionally adds total columns.
- **Usage**: `aggregate_daily_to_wc_wide(df, date_column, group_columns, sum_columns, wc, aggregation='sum', include_totals=False)`
- **Example**: `aggregate_daily_to_wc_wide(df, 'date', ['platform'], ['cost', 'impressions', 'clicks'], 'mon', 'average', True)`

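A sketch of the wide-format variant, assuming "wide" means one column per (metric, group value) pair named `metric_group`, summation as the aggregation, and one `metric_total` column per metric when totals are requested. The column-naming scheme and `daily_to_weekly_wide` are hypothetical; the package may label columns differently.

```python
import pandas as pd

WEEKDAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def daily_to_weekly_wide(df, date_column, group_columns, sum_columns, wc="mon", include_totals=False):
    """Hypothetical sketch: weekly sums with one column per (metric, group) pair."""
    dates = pd.to_datetime(df[date_column]).dt.normalize()
    week_start = dates - pd.to_timedelta((dates.dt.weekday - WEEKDAY_INDEX[wc]) % 7, unit="D")
    weekly = (
        df.assign(**{date_column: week_start})
        .groupby([date_column, *group_columns])[sum_columns]
        .sum()
    )
    # Move the group levels from the row index into the column headers
    wide = weekly.unstack(group_columns, fill_value=0)
    wide.columns = ["_".join(map(str, col)) for col in wide.columns]
    if include_totals:
        # One total per metric, summed across all groups in each week
        wide = wide.join(weekly.groupby(level=date_column).sum().add_suffix("_total"))
    return wide.reset_index()


daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=3),  # Monday to Wednesday
    "platform": ["fb", "google", "fb"],
    "cost": [1, 2, 3],
})
print(daily_to_weekly_wide(daily, "date", ["platform"], ["cost"], "mon", include_totals=True))
```
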
---

## 20. `merge_cols_with_seperator`
- **Description**: Merges multiple columns in a DataFrame into one column using a separator (`_` by default). Useful for building lookup tables.
- **Usage**: `merge_cols_with_seperator(df, col_names, seperator='_', output_column_name='Merged', starting_prefix_str=None, ending_prefix_str=None)`
- **Example**: `merge_cols_with_seperator(df, ['Campaign', 'Product'], seperator='|', output_column_name='Merged Columns', starting_prefix_str='start_', ending_prefix_str='_end')`

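The column merge can be sketched by casting the chosen columns to strings and joining each row with the separator. This is a hypothetical equivalent (`merge_columns` is not the package's function); it assumes the start and end strings are simply prepended and appended to the merged value.

```python
import pandas as pd


def merge_columns(df, col_names, separator="_", output_column_name="Merged",
                  starting_prefix_str=None, ending_prefix_str=None):
    """Hypothetical sketch: join several columns row-wise into one string column."""
    merged = df[col_names].astype(str).apply(separator.join, axis=1)
    if starting_prefix_str:
        merged = starting_prefix_str + merged
    if ending_prefix_str:
        merged = merged + ending_prefix_str
    out = df.copy()
    out[output_column_name] = merged
    return out


campaigns = pd.DataFrame({"Campaign": ["brand", "generic"], "Product": ["loans", "cards"]})
print(merge_columns(campaigns, ["Campaign", "Product"], separator="|",
                    output_column_name="Merged Columns",
                    starting_prefix_str="start_", ending_prefix_str="_end"))
```
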
---

## 21. `check_sum_of_df_cols_are_equal`
- **Description**: Checks whether the sums of a specified column in each of two DataFrames are equal, and reports both sums and their difference.
- **Usage**: `check_sum_of_df_cols_are_equal(df_1, df_2, cols_1, cols_2)`
- **Example**: `check_sum_of_df_cols_are_equal(df_1, df_2, 'Media Cost', 'Spend')`

---

## 22. `convert_2_df_cols_to_dict`
- **Description**: Creates a dictionary using two columns in a DataFrame.
- **Usage**: `convert_2_df_cols_to_dict(df, key_col, value_col)`
- **Example**: `convert_2_df_cols_to_dict(df, 'Campaign', 'Channel')`

---

## 23. `create_FY_and_H_columns`
- **Description**: Creates financial year, half-year, and financial half-year columns.
- **Usage**: `create_FY_and_H_columns(df, index_col, start_date, starting_FY, short_format='No', half_years='No', combined_FY_and_H='No')`
- **Example**: `create_FY_and_H_columns(df, 'Week (M-S)', '2022-10-03', 'FY2023', short_format='Yes', half_years='Yes', combined_FY_and_H='Yes')`

---

## 24. `keyword_lookup_replacement`
- **Description**: Updates chosen values in a specified column of the DataFrame based on a lookup dictionary.
- **Usage**: `keyword_lookup_replacement(df, col, replacement_rows, cols_to_merge, replacement_lookup_dict, output_column_name='Updated Column')`
- **Example**: `keyword_lookup_replacement(df, 'channel', 'Paid Search Generic', ['channel', 'segment', 'product'], qlik_dict_for_channel, output_column_name='Channel New')`

---

## 25. `create_new_version_of_col_using_LUT`
- **Description**: Creates a new column in a DataFrame by mapping values from an old column using a lookup table.
- **Usage**: `create_new_version_of_col_using_LUT(df, keys_col, value_col, dict_for_specific_changes, new_col_name='New Version of Old Col')`
- **Example**: `create_new_version_of_col_using_LUT(df, 'Campaign Name', 'Campaign Type', search_campaign_name_retag_lut, 'Campaign Name New')`

---

## 26. `convert_df_wide_2_long`
- **Description**: Converts a DataFrame from wide to long format.
- **Usage**: `convert_df_wide_2_long(df, value_cols, variable_col_name='Stacked', value_col_name='Value')`
- **Example**: `convert_df_wide_2_long(df, ['Media Cost', 'Impressions', 'Clicks'], variable_col_name='Metric')`

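Wide-to-long conversion maps directly onto pandas `DataFrame.melt`. The sketch below assumes every column not listed in `value_cols` is kept as an identifier column; `wide_to_long` is a hypothetical name, not the package's function.

```python
import pandas as pd


def wide_to_long(df, value_cols, variable_col_name="Stacked", value_col_name="Value"):
    """Hypothetical sketch: stack `value_cols` into (variable, value) pairs."""
    id_cols = [c for c in df.columns if c not in value_cols]
    return df.melt(id_vars=id_cols, value_vars=value_cols,
                   var_name=variable_col_name, value_name=value_col_name)


wide = pd.DataFrame({"OBS": ["2024-01-01", "2024-01-08"], "Media Cost": [1, 2], "Clicks": [3, 4]})
print(wide_to_long(wide, ["Media Cost", "Clicks"], variable_col_name="Metric"))
```
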
---

## 27. `manually_edit_data`
- **Description**: Enables manual updates to DataFrame cells by applying filters and editing a column.
- **Usage**: `manually_edit_data(df, filters_dict, col_to_change, new_value, change_in_existing_df_col='No', new_col_to_change_name='New', manual_edit_col_name=None, add_notes='No', existing_note_col_name=None, note=None)`
- **Example**: `manually_edit_data(df, {'OBS': ' <= datetime(2023,1,23)', 'File_Name': ' == France media'}, 'Master Include', 1, change_in_existing_df_col='Yes', new_col_to_change_name='Master Include', manual_edit_col_name='Manual Changes')`

---

## 28. `format_numbers_with_commas`
- **Description**: Formats numeric data with commas as thousands separators and a specified number of decimal places.
- **Usage**: `format_numbers_with_commas(df, decimal_length_chosen=2)`
- **Example**: `format_numbers_with_commas(df, 1)`

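Python's format specification already supports a thousands separator (`,`) combined with a precision, so the formatting can be sketched per numeric column. This is a hypothetical helper (`format_with_commas`), assuming only numeric columns are converted to strings and all others are left untouched.

```python
import pandas as pd


def format_with_commas(df, decimal_length_chosen=2):
    """Hypothetical sketch: render numeric columns as comma-separated strings."""
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        out[col] = out[col].map(lambda x: f"{x:,.{decimal_length_chosen}f}")
    return out


spend = pd.DataFrame({"Channel": ["TV", "Radio"], "Spend": [1234567.891, 1000]})
print(format_with_commas(spend, 1))
```
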
---

## 29. `filter_df_on_multiple_conditions`
- **Description**: Filters a DataFrame based on multiple conditions from a dictionary.
- **Usage**: `filter_df_on_multiple_conditions(df, filters_dict)`
- **Example**: `filter_df_on_multiple_conditions(df, {'OBS': ' <= datetime(2023,1,23)', 'File_Name': ' == France media'})`

---

## 30. `read_and_concatenate_files`
- **Description**: Reads and concatenates all files of a specified type in a folder.
- **Usage**: `read_and_concatenate_files(folder_path, file_type='csv')`
- **Example**: `read_and_concatenate_files(folder_path, file_type='csv')`

---

## 31. `remove_zero_values`
- **Description**: Removes rows with zero values in a specified column.
- **Usage**: `remove_zero_values(data_frame, column_to_filter)`
- **Example**: `remove_zero_values(df, 'Funeral_Delivery')`

---

## 32. `upgrade_outdated_packages`
- **Description**: Upgrades all outdated packages in the environment.
- **Usage**: `upgrade_outdated_packages()`
- **Example**: `upgrade_outdated_packages()`

---

## 33. `convert_mixed_formats_dates`
- **Description**: Converts a mix of US and UK date formats to datetime.
- **Usage**: `convert_mixed_formats_dates(df, date_col)`
- **Example**: `convert_mixed_formats_dates(df, 'OBS')`

---

## 34. `fill_weekly_date_range`
- **Description**: Fills in missing weeks in a date column, adding rows with zero values for them.
- **Usage**: `fill_weekly_date_range(df, date_column, freq)`
- **Example**: `fill_weekly_date_range(df, 'OBS', 'W-MON')`

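Filling gaps in a weekly series can be sketched by building the complete range with `pandas.date_range` and reindexing onto it with `fill_value=0`. This hypothetical helper (`fill_missing_weeks`) assumes one row per week, so the date column has no duplicates, and that the existing dates fall on the `freq` anchor day.

```python
import pandas as pd


def fill_missing_weeks(df, date_column, freq="W-MON"):
    """Hypothetical sketch: insert zero-valued rows for weeks absent from `date_column`."""
    out = df.copy()
    out[date_column] = pd.to_datetime(out[date_column])
    full_range = pd.date_range(out[date_column].min(), out[date_column].max(), freq=freq)
    out = out.set_index(date_column).reindex(full_range, fill_value=0)
    return out.rename_axis(date_column).reset_index()


gappy = pd.DataFrame({"OBS": ["2024-01-01", "2024-01-15"], "spend": [5, 7]})  # 2024-01-08 missing
print(fill_missing_weeks(gappy, "OBS", "W-MON"))
```
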
---

## 35. `add_prefix_and_suffix`
- **Description**: Adds prefixes and/or suffixes to column headers.
- **Usage**: `add_prefix_and_suffix(df, prefix='', suffix='', date_col=None)`
- **Example**: `add_prefix_and_suffix(df, prefix='media_', suffix='_spd', date_col='obs')`

---

## 36. `create_dummies`
- **Description**: Converts time series into binary indicators based on a threshold.
- **Usage**: `create_dummies(df, date_col=None, dummy_threshold=0, add_total_dummy_col='No', total_col_name='total')`
- **Example**: `create_dummies(df, date_col='obs', dummy_threshold=100, add_total_dummy_col='Yes', total_col_name='med_total_dum')`

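A minimal sketch of threshold-based dummies, assuming a value counts as "on" only when it is strictly greater than `dummy_threshold`, and that the optional total dummy applies the same test to each row's total of the raw values. Both rules, and the name `to_dummies`, are assumptions rather than the package's documented behaviour.

```python
import pandas as pd


def to_dummies(df, date_col=None, dummy_threshold=0, add_total_dummy_col="No", total_col_name="total"):
    """Hypothetical sketch: flag values above a threshold as 1, everything else as 0."""
    value_cols = [c for c in df.columns if c != date_col]
    out = df.copy()
    out[value_cols] = (df[value_cols] > dummy_threshold).astype(int)
    if add_total_dummy_col == "Yes":
        # Assumption: the total dummy tests the row total of the raw values
        out[total_col_name] = (df[value_cols].sum(axis=1) > dummy_threshold).astype(int)
    return out


media = pd.DataFrame({"obs": ["w1", "w2", "w3", "w4"], "tv": [0, 150, 50, 10], "radio": [120, 0, 60, 20]})
print(to_dummies(media, date_col="obs", dummy_threshold=100,
                 add_total_dummy_col="Yes", total_col_name="med_total_dum"))
```
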
---

## 37. `replace_substrings`
- **Description**: Replaces substrings in a column of strings using a dictionary and can change column values to lowercase.
- **Usage**: `replace_substrings(df, column, replacements, to_lower=False, new_column=None)`
- **Example**: `replace_substrings(df, 'Influencer Handle', replacement_dict, to_lower=True, new_column='Short Version')`

---

## 38. `add_total_column`
- **Description**: Sums all columns (excluding a specified column) to create a total column.
- **Usage**: `add_total_column(df, exclude_col=None, total_col_name='Total')`
- **Example**: `add_total_column(df, exclude_col='obs', total_col_name='total_media_spd')`

---

## 39. `apply_lookup_table_based_on_substring`
- **Description**: Maps substrings in a column to values using a lookup dictionary.
- **Usage**: `apply_lookup_table_based_on_substring(df, column_name, category_dict, new_col_name='Category', other_label='Other')`
- **Example**: `apply_lookup_table_based_on_substring(df, 'Campaign Name', campaign_dict, new_col_name='Campaign Name Short', other_label='Full Funnel')`

---

## 40. `compare_overlap`
- **Description**: Compares matching rows and columns in two DataFrames and outputs the differences.
- **Usage**: `compare_overlap(df1, df2, date_col)`
- **Example**: `compare_overlap(df_1, df_2, 'obs')`

---

## 41. `week_commencing_2_week_commencing_conversion`
- **Description**: Converts a week-commencing column to a different start day.
- **Usage**: `week_commencing_2_week_commencing_conversion(df, date_col, week_commencing='sun')`
- **Example**: `week_commencing_2_week_commencing_conversion(df, 'obs', week_commencing='mon')`

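One way to re-anchor week-commencing dates is to move each date back to the most recent occurrence of the new start day. This sketch assumes that "move back" rule (rather than moving forward); `convert_week_commencing` and `WEEKDAY_INDEX` are hypothetical names.

```python
import pandas as pd

WEEKDAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def convert_week_commencing(df, date_col, week_commencing="sun"):
    """Hypothetical sketch: move each date back to the most recent `week_commencing` day."""
    dates = pd.to_datetime(df[date_col])
    days_back = (dates.dt.weekday - WEEKDAY_INDEX[week_commencing]) % 7
    out = df.copy()
    out[date_col] = dates - pd.to_timedelta(days_back, unit="D")
    return out


weeks = pd.DataFrame({"obs": ["2023-12-31", "2024-01-07"], "spend": [100, 200]})  # both Sundays
print(convert_week_commencing(weeks, "obs", week_commencing="mon"))
```
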
---

## 42. `plot_chart`
- **Description**: Plots various chart types, including line, area, scatter, and bar charts.
- **Usage**: `plot_chart(df, date_col, value_cols, chart_type='line', title='Chart', x_title='Date', y_title='Values', **kwargs)`
- **Example**: `plot_chart(df, 'obs', df.columns, chart_type='line', title='Spend Over Time', x_title='Date', y_title='Spend')`

---

## 43. `plot_two_with_common_cols`
- **Description**: Plots charts for two DataFrames based on common column names.
- **Usage**: `plot_two_with_common_cols(df1, df2, date_column, same_axis=True)`
- **Example**: `plot_two_with_common_cols(df_1, df_2, date_column='obs')`

---

## Data Pulling

## 1. `pull_fred_data`
- **Description**: Fetches data from FRED using a list of series IDs.
- **Usage**: `pull_fred_data(week_commencing, series_id_list)`
- **Example**: `pull_fred_data('mon', ['GPDIC1', 'Y057RX1Q020SBEA', 'GCEC1', 'ND000333Q', 'Y006RX1Q020SBEA'])`

---

## 2. `pull_boe_data`
- **Description**: Fetches and processes Bank of England interest rate data.
- **Usage**: `pull_boe_data(week_commencing)`
- **Example**: `pull_boe_data('mon')`

---

## 3. `pull_oecd`
- **Description**: Fetches macroeconomic data from the OECD for a specified country.
- **Usage**: `pull_oecd(country='GBR', week_commencing='mon', start_date='2020-01-01')`
- **Example**: `pull_oecd('GBR', 'mon', '2000-01-01')`

---

## 4. `get_google_mobility_data`
- **Description**: Fetches Google Mobility data for the specified country.
- **Usage**: `get_google_mobility_data(country, wc)`
- **Example**: `get_google_mobility_data('United Kingdom', 'mon')`

---

## 5. `pull_seasonality`
- **Description**: Generates combined dummy variables for seasonality, trends, and COVID lockdowns.
- **Usage**: `pull_seasonality(week_commencing, start_date, countries)`
- **Example**: `pull_seasonality('mon', '2020-01-01', ['US', 'GB'])`

---

## 6. `pull_weather`
- **Description**: Fetches and processes historical weather data for the specified country.
- **Usage**: `pull_weather(week_commencing, country)`
- **Example**: `pull_weather('mon', 'GBR')`

---

## 7. `pull_macro_ons_uk`
- **Description**: Fetches and processes time series data from the Beta ONS API.
- **Usage**: `pull_macro_ons_uk(additional_list, week_commencing, sector)`
- **Example**: `pull_macro_ons_uk(['HBOI'], 'mon', 'fast_food')`

---

## 8. `pull_yfinance`
- **Description**: Fetches and processes time series data from Yahoo Finance.
- **Usage**: `pull_yfinance(tickers, week_start_day)`
- **Example**: `pull_yfinance(['^FTMC', '^IXIC'], 'mon')`

---

## Installation

Install the IMS package via pip:

```bash
pip install imsciences
```

---

## Usage

```python
# Import the package, then create the data-processing and data-pulling objects
from imsciences import *
ims = dataprocessing()
ims_pull = datapull()
```

---

## License

This project is licensed under the MIT License.

---