imsciences 0.5.4.8__tar.gz → 0.9.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of imsciences might be problematic.

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Independent Marketing Sciences
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,330 @@
+ Metadata-Version: 2.1
+ Name: imsciences
+ Version: 0.9.3
+ Summary: IMS Data Processing Package
+ Author: IMS
+ Author-email: cam@im-sciences.com
+ Keywords: python,data processing,apis
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: Unix
+ Classifier: Operating System :: MacOS :: MacOS X
+ Classifier: Operating System :: Microsoft :: Windows
+ Description-Content-Type: text/markdown
+ License-File: LICENSE.txt
+ Requires-Dist: pandas
+ Requires-Dist: plotly
+ Requires-Dist: numpy
+ Requires-Dist: fredapi
+ Requires-Dist: bs4
+ Requires-Dist: yfinance
+ Requires-Dist: holidays
+ Requires-Dist: google-analytics-data
+ Requires-Dist: geopandas
+
+ # IMS Package Documentation
+
+ The **Independent Marketing Sciences** package is a Python library designed to process incoming data into a format tailored for projects, particularly those utilising weekly time series data. This package offers a suite of functions for efficient data collection, manipulation, visualisation and analysis.
+
+ ---
+
+ ## Key Features
+ - Seamless data processing for time series workflows.
+ - Aggregation, filtering, and transformation of time series data.
+ - Visualising data with Plotly-based charting helpers.
+ - Integration with external data sources such as FRED, the Bank of England, the ONS and the OECD.
+
+ ---
+
+ Table of Contents
+ =================
+
+ 1. [Data Processing for Time Series](#data-processing-for-time-series)
+ 2. [Data Processing for Incrementality Testing](#data-processing-for-incrementality-testing)
+ 3. [Data Visualisations](#data-visualisations)
+ 4. [Data Pulling](#data-pulling)
+ 5. [Installation](#installation)
+ 6. [Usage](#usage)
+ 7. [License](#license)
+
+ ---
+
+ ## Data Processing for Time Series
+
+ ## 1. `get_wd_levels`
+ - **Description**: Gets the working directory, with the option of moving up a number of parent levels.
+ - **Usage**: `get_wd_levels(levels)`
+ - **Example**: `get_wd_levels(0)`
+
+ ## 2. `aggregate_daily_to_wc_long`
+ - **Description**: Aggregates daily data into weekly data, grouping by the specified columns and aggregating the metric columns, with weeks starting on a specified day.
+ - **Usage**: `aggregate_daily_to_wc_long(df, date_column, group_columns, sum_columns, wc, aggregation='sum')`
+ - **Example**: `aggregate_daily_to_wc_long(df, 'date', ['platform'], ['cost', 'impressions', 'clicks'], 'mon', 'average')`
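For illustration, a minimal pandas sketch of the kind of weekly roll-up `aggregate_daily_to_wc_long` performs; the sample frame and column names are assumed, not taken from the package:

```python
import pandas as pd

# Assumed sample daily data: one platform across two weeks
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-09"]),
    "platform": ["meta", "meta", "meta"],
    "cost": [10.0, 20.0, 5.0],
})

# Map each date back to its Monday week-commencing date ('mon')
df["OBS"] = df["date"] - pd.to_timedelta(df["date"].dt.weekday, unit="D")

# Group by week commencing and platform, summing the metric columns
weekly = df.groupby(["OBS", "platform"], as_index=False)["cost"].sum()
print(weekly)
```

Swapping `.sum()` for `.mean()` corresponds to the `aggregation='average'` option shown in the example above.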
+
+ ## 3. `convert_monthly_to_daily`
+ - **Description**: Converts monthly data in a DataFrame to daily data by expanding each month and, optionally, dividing the numeric values across its days.
+ - **Usage**: `convert_monthly_to_daily(df, date_column, divide=True)`
+ - **Example**: `convert_monthly_to_daily(df, 'date')`
+
+ ## 4. `week_of_year_mapping`
+ - **Description**: Converts a week column in 'yyyy-Www' or 'yyyy-ww' format to a week-commencing date.
+ - **Usage**: `week_of_year_mapping(df, week_col, start_day_str)`
+ - **Example**: `week_of_year_mapping(df, 'week', 'mon')`
+
+ ## 5. `rename_cols`
+ - **Description**: Renames columns in a pandas DataFrame with a specified prefix or format.
+ - **Usage**: `rename_cols(df, name='ame_')`
+ - **Example**: `rename_cols(df, 'ame_facebook')`
+
+ ## 6. `merge_new_and_old`
+ - **Description**: Creates a new DataFrame by merging old and new DataFrames based on a cutoff date.
+ - **Usage**: `merge_new_and_old(old_df, old_col, new_df, new_col, cutoff_date, date_col_name='OBS')`
+ - **Example**: `merge_new_and_old(df1, 'old_col', df2, 'new_col', '2023-01-15')`
+
+ ## 7. `merge_dataframes_on_column`
+ - **Description**: Merges a list of DataFrames on a common column.
+ - **Usage**: `merge_dataframes_on_column(dataframes, common_column='OBS', merge_how='outer')`
+ - **Example**: `merge_dataframes_on_column([df1, df2, df3], common_column='OBS', merge_how='outer')`
+
+ ## 8. `merge_and_update_dfs`
+ - **Description**: Merges two DataFrames, updating columns from the second DataFrame where values are available.
+ - **Usage**: `merge_and_update_dfs(df1, df2, key_column)`
+ - **Example**: `merge_and_update_dfs(processed_facebook, finalised_meta, 'OBS')`
+
+ ## 9. `convert_us_to_uk_dates`
+ - **Description**: Converts a DataFrame column with mixed US and UK date formats to datetime.
+ - **Usage**: `convert_us_to_uk_dates(df, date_col)`
+ - **Example**: `convert_us_to_uk_dates(df, 'date')`
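One plausible way to reconcile the mixed formats is a two-pass parse, sketched below with pandas; this is an assumed mechanism, not the package's actual implementation, and ambiguous strings such as `01/02/2023` resolve to whichever pass runs first (US-style here):

```python
import pandas as pd

df = pd.DataFrame({"date": ["12/31/2023", "31/12/2023", "01/02/2023"]})

# First try US month-first parsing, then fall back to UK day-first
us = pd.to_datetime(df["date"], format="%m/%d/%Y", errors="coerce")
uk = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
df["date"] = us.fillna(uk)
print(df)
```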
+
+ ## 10. `combine_sheets`
+ - **Description**: Combines multiple DataFrames from a dictionary into a single DataFrame.
+ - **Usage**: `combine_sheets(all_sheets)`
+ - **Example**: `combine_sheets({'Sheet1': df1, 'Sheet2': df2})`
+
+ ## 11. `pivot_table`
+ - **Description**: Dynamically pivots a DataFrame based on specified columns.
+ - **Usage**: `pivot_table(df, index_col, columns, values_col, filters_dict=None, fill_value=0, aggfunc='sum', margins=False, margins_name='Total', datetime_trans_needed=True, reverse_header_order=False, fill_missing_weekly_dates=False, week_commencing='W-MON')`
+ - **Example**: `pivot_table(df, 'OBS', 'Channel Short Names', 'Value', filters_dict={'Master Include': ' == 1'}, fill_value=0)`
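The core of such a pivot is pandas' own `DataFrame.pivot_table`; a minimal sketch with assumed sample data:

```python
import pandas as pd

# Assumed long-format sample: one row per week/channel observation
df = pd.DataFrame({
    "OBS": ["2023-01-02", "2023-01-02", "2023-01-09"],
    "Channel": ["TV", "Radio", "TV"],
    "Value": [100, 50, 80],
})

# Pivot to one column per channel, filling missing combinations with 0
wide = df.pivot_table(index="OBS", columns="Channel", values="Value",
                      aggfunc="sum", fill_value=0).reset_index()
print(wide)
```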
+
+ ## 12. `apply_lookup_table_for_columns`
+ - **Description**: Maps substrings in columns to new values based on a dictionary.
+ - **Usage**: `apply_lookup_table_for_columns(df, col_names, to_find_dict, if_not_in_dict='Other', new_column_name='Mapping')`
+ - **Example**: `apply_lookup_table_for_columns(df, col_names, {'spend': 'spd'}, if_not_in_dict='Other', new_column_name='Metrics Short')`
+
+ ## 13. `aggregate_daily_to_wc_wide`
+ - **Description**: Aggregates daily data into weekly data and pivots it to wide format.
+ - **Usage**: `aggregate_daily_to_wc_wide(df, date_column, group_columns, sum_columns, wc='sun', aggregation='sum', include_totals=False)`
+ - **Example**: `aggregate_daily_to_wc_wide(df, 'date', ['platform'], ['cost', 'impressions'], 'mon', 'average', True)`
+
+ ## 14. `merge_cols_with_seperator`
+ - **Description**: Merges multiple columns in a DataFrame into one column with a specified separator.
+ - **Usage**: `merge_cols_with_seperator(df, col_names, separator='_', output_column_name='Merged')`
+ - **Example**: `merge_cols_with_seperator(df, ['Campaign', 'Product'], separator='|', output_column_name='Merged Columns')`
+
+ ## 15. `check_sum_of_df_cols_are_equal`
+ - **Description**: Checks whether the sums of two sets of columns in two DataFrames are equal, and reports the difference.
+ - **Usage**: `check_sum_of_df_cols_are_equal(df_1, df_2, cols_1, cols_2)`
+ - **Example**: `check_sum_of_df_cols_are_equal(df_1, df_2, 'Media Cost', 'Spend')`
+
+ ## 16. `convert_2_df_cols_to_dict`
+ - **Description**: Creates a dictionary from two DataFrame columns.
+ - **Usage**: `convert_2_df_cols_to_dict(df, key_col, value_col)`
+ - **Example**: `convert_2_df_cols_to_dict(df, 'Campaign', 'Channel')`
+
+ ## 17. `create_FY_and_H_columns`
+ - **Description**: Adds financial-year and half-year columns to a DataFrame based on a start date.
+ - **Usage**: `create_FY_and_H_columns(df, index_col, start_date, starting_FY, short_format='No', half_years='No', combined_FY_and_H='No')`
+ - **Example**: `create_FY_and_H_columns(df, 'Week', '2022-10-03', 'FY2023', short_format='Yes')`
+
+ ## 18. `keyword_lookup_replacement`
+ - **Description**: Updates values in a column based on a lookup dictionary with conditional logic.
+ - **Usage**: `keyword_lookup_replacement(df, col, replacement_rows, cols_to_merge, replacement_lookup_dict, output_column_name='Updated Column')`
+ - **Example**: `keyword_lookup_replacement(df, 'channel', 'Paid Search Generic', ['channel', 'segment'], lookup_dict, output_column_name='Channel New')`
+
+ ## 19. `create_new_version_of_col_using_LUT`
+ - **Description**: Creates a new column based on a lookup table applied to an existing column.
+ - **Usage**: `create_new_version_of_col_using_LUT(df, keys_col, value_col, dict_for_specific_changes, new_col_name='New Version of Old Col')`
+ - **Example**: `create_new_version_of_col_using_LUT(df, 'Campaign Name', 'Campaign Type', lookup_dict)`
+
+ ## 20. `convert_df_wide_2_long`
+ - **Description**: Converts a wide-format DataFrame into a long-format DataFrame.
+ - **Usage**: `convert_df_wide_2_long(df, value_cols, variable_col_name='Stacked', value_col_name='Value')`
+ - **Example**: `convert_df_wide_2_long(df, ['col1', 'col2'], variable_col_name='Var', value_col_name='Val')`
+
+ ## 21. `manually_edit_data`
+ - **Description**: Manually updates specified cells in a DataFrame based on filters.
+ - **Usage**: `manually_edit_data(df, filters_dict, col_to_change, new_value, change_in_existing_df_col='No', new_col_to_change_name='New', manual_edit_col_name=None, add_notes='No', existing_note_col_name=None, note=None)`
+ - **Example**: `manually_edit_data(df, {'col1': '== 1'}, 'col2', 'new_val', add_notes='Yes', note='Manual Update')`
+
+ ## 22. `format_numbers_with_commas`
+ - **Description**: Formats numerical columns with thousands separators and a specified number of decimal places.
+ - **Usage**: `format_numbers_with_commas(df, decimal_length_chosen=2)`
+ - **Example**: `format_numbers_with_commas(df, decimal_length_chosen=1)`
+
+ ## 23. `filter_df_on_multiple_conditions`
+ - **Description**: Filters a DataFrame based on multiple column conditions.
+ - **Usage**: `filter_df_on_multiple_conditions(df, filters_dict)`
+ - **Example**: `filter_df_on_multiple_conditions(df, {'col1': '>= 5', 'col2': "== 'val'"})`
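Condition strings of this shape can be applied with `DataFrame.eval`, ANDing one boolean mask per column; this is a sketch of assumed mechanics, not necessarily how the package implements it:

```python
import pandas as pd

df = pd.DataFrame({"col1": [3, 7, 9], "col2": ["val", "val", "other"]})

# Build one boolean mask per condition, then AND them together
filters = {"col1": ">= 5", "col2": "== 'val'"}
mask = pd.Series(True, index=df.index)
for col, cond in filters.items():
    mask &= df.eval(f"`{col}` {cond}")

filtered = df[mask]
print(filtered)
```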
+
+ ## 24. `read_and_concatenate_files`
+ - **Description**: Reads and concatenates files from a specified folder into a single DataFrame.
+ - **Usage**: `read_and_concatenate_files(folder_path, file_type='csv')`
+ - **Example**: `read_and_concatenate_files('/path/to/files', file_type='xlsx')`
+
+ ## 25. `upgrade_outdated_packages`
+ - **Description**: Upgrades all outdated Python packages except those specified for exclusion.
+ - **Usage**: `upgrade_outdated_packages(exclude_packages=['twine'])`
+ - **Example**: `upgrade_outdated_packages(exclude_packages=['pip', 'setuptools'])`
+
+ ## 26. `convert_mixed_formats_dates`
+ - **Description**: Converts mixed-format date columns into a standardised datetime format.
+ - **Usage**: `convert_mixed_formats_dates(df, column_name)`
+ - **Example**: `convert_mixed_formats_dates(df, 'date_col')`
+
+ ## 27. `fill_weekly_date_range`
+ - **Description**: Fills in missing weekly dates in a DataFrame at a specified frequency.
+ - **Usage**: `fill_weekly_date_range(df, date_column, freq='W-MON')`
+ - **Example**: `fill_weekly_date_range(df, 'date_col')`
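The underlying idea can be sketched with `pd.date_range` plus `reindex`; the sample data is assumed, and `fill_weekly_date_range` itself may fill gaps differently:

```python
import pandas as pd

# Assumed sample with a missing week (2023-01-09)
df = pd.DataFrame({
    "date_col": pd.to_datetime(["2023-01-02", "2023-01-16"]),
    "value": [1.0, 3.0],
})

# Reindex onto a complete Monday-anchored weekly range, filling gaps with 0
full_range = pd.date_range(df["date_col"].min(), df["date_col"].max(), freq="W-MON")
filled = (df.set_index("date_col")
            .reindex(full_range, fill_value=0)
            .rename_axis("date_col")
            .reset_index())
print(filled)
```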
+
+ ## 28. `add_prefix_and_suffix`
+ - **Description**: Adds prefixes and/or suffixes to column names, with an option to exclude a date column.
+ - **Usage**: `add_prefix_and_suffix(df, prefix='', suffix='', date_col=None)`
+ - **Example**: `add_prefix_and_suffix(df, prefix='pre_', suffix='_suf', date_col='date_col')`
+
+ ## 29. `create_dummies`
+ - **Description**: Creates dummy variables for columns, with an option to add a total dummy column.
+ - **Usage**: `create_dummies(df, date_col=None, dummy_threshold=0, add_total_dummy_col='No', total_col_name='total')`
+ - **Example**: `create_dummies(df, date_col='date_col', dummy_threshold=1)`
+
+ ## 30. `replace_substrings`
+ - **Description**: Replaces substrings in a column based on a dictionary, with options for case conversion and creating a new column.
+ - **Usage**: `replace_substrings(df, column, replacements, to_lower=False, new_column=None)`
+ - **Example**: `replace_substrings(df, 'text_col', {'old': 'new'}, to_lower=True, new_column='updated_text')`
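The behaviour can be approximated with `str.replace` applied once per dictionary entry; this is a sketch under assumed semantics for `to_lower` and `new_column`:

```python
import pandas as pd

df = pd.DataFrame({"text_col": ["OLD campaign", "brand OLD", "keep"]})

# Lower-case first (to_lower=True), then apply each replacement in turn
replacements = {"OLD": "new"}
series = df["text_col"].str.lower()
for old, new in replacements.items():
    series = series.str.replace(old.lower(), new, regex=False)

# Write to a new column rather than overwriting (new_column='updated_text')
df["updated_text"] = series
print(df)
```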
+
+ ## 31. `add_total_column`
+ - **Description**: Adds a total column to a DataFrame by summing values across columns, optionally excluding one.
+ - **Usage**: `add_total_column(df, exclude_col=None, total_col_name='Total')`
+ - **Example**: `add_total_column(df, exclude_col='date_col')`
+
+ ## 32. `apply_lookup_table_based_on_substring`
+ - **Description**: Categorizes text in a column using a lookup table based on substrings.
+ - **Usage**: `apply_lookup_table_based_on_substring(df, column_name, category_dict, new_col_name='Category', other_label='Other')`
+ - **Example**: `apply_lookup_table_based_on_substring(df, 'text_col', {'sub1': 'cat1', 'sub2': 'cat2'})`
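A first-match substring lookup of this kind can be sketched as follows, with assumed sample data and an assumed first-match rule:

```python
import pandas as pd

df = pd.DataFrame({"text_col": ["sub1_campaign", "xx_sub2", "nothing"]})
category_dict = {"sub1": "cat1", "sub2": "cat2"}

def categorise(text, lookup, other="Other"):
    # Return the category of the first matching substring, else the fallback
    for substring, category in lookup.items():
        if substring in text:
            return category
    return other

df["Category"] = df["text_col"].apply(categorise, args=(category_dict,))
print(df)
```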
+
+ ## 33. `compare_overlap`
+ - **Description**: Compares overlapping periods between two DataFrames and summarizes differences.
+ - **Usage**: `compare_overlap(df1, df2, date_col)`
+ - **Example**: `compare_overlap(df1, df2, 'date_col')`
+
+ ## 34. `week_commencing_2_week_commencing_conversion_isoweekday`
+ - **Description**: Maps each date to the start of its ISO week, based on a specified week-commencing day.
+ - **Usage**: `week_commencing_2_week_commencing_conversion_isoweekday(df, date_col, week_commencing='mon')`
+ - **Example**: `week_commencing_2_week_commencing_conversion_isoweekday(df, 'date_col', week_commencing='fri')`
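The ISO-weekday arithmetic behind such a conversion can be sketched as below; the helper name is an illustrative abbreviation of the package's longer one, and `week_commencing='fri'` maps each date back to its most recent Friday:

```python
import pandas as pd

# ISO weekday numbers: Monday=1 ... Sunday=7
TARGET = {"mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6, "sun": 7}

def week_commencing(dates, day="mon"):
    # dayofweek is Mon=0, so +1 gives the ISO weekday; the modulo offset
    # shifts each date back to the most recent occurrence of the target day
    offset = (dates.dt.dayofweek + 1 - TARGET[day]) % 7
    return dates - pd.to_timedelta(offset, unit="D")

dates = pd.Series(pd.to_datetime(["2023-01-04", "2023-01-06"]))  # a Wed, a Fri
print(week_commencing(dates, "fri"))
```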
+
+ ---
+
+ ## Data Processing for Incrementality Testing
+
+ ## 1. `pull_ga`
+ - **Description**: Pulls in GA4 data for geo experiments.
+ - **Usage**: `pull_ga(credentials_file, property_id, start_date, country, metrics)`
+ - **Example**: `pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])`
+
+ ## 2. `process_itv_analysis`
+ - **Description**: Processes geo-test data for ITV regional analysis, combining the raw DataFrame with regional mappings, city coordinates, and media spend, and writing the result to an output file.
+ - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)`
+ - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])`
+
+ ---
+
+ ## Data Visualisations
+
+ ## 1. `plot_one`
+ - **Description**: Plots a specified column from a DataFrame with a white background and black axes.
+ - **Usage**: `plot_one(df1, col1, date_column)`
+ - **Example**: `plot_one(df, 'sales', 'date')`
+
+ ## 2. `plot_two`
+ - **Description**: Plots specified columns from two DataFrames, either on the same y-axis or on separate y-axes.
+ - **Usage**: `plot_two(df1, col1, df2, col2, date_column, same_axis=True)`
+ - **Example**: `plot_two(df1, 'sales', df2, 'revenue', 'date', same_axis=False)`
+
+ ## 3. `plot_chart`
+ - **Description**: Plots various chart types using Plotly, including line, bar, scatter, area, and pie charts.
+ - **Usage**: `plot_chart(df, date_col, value_cols, chart_type='line', title='Chart', x_title='Date', y_title='Values')`
+ - **Example**: `plot_chart(df, 'date', ['sales', 'revenue'], chart_type='line', title='Sales and Revenue')`
+
+ ---
+
+ ## Data Pulling
+
+ ## 1. `pull_fred_data`
+ - **Description**: Fetches data from FRED using series ID tokens.
+ - **Usage**: `pull_fred_data(week_commencing, series_id_list)`
+ - **Example**: `pull_fred_data('mon', ['GPDIC1', 'Y057RX1Q020SBEA', 'GCEC1', 'ND000333Q', 'Y006RX1Q020SBEA'])`
+
+ ## 2. `pull_boe_data`
+ - **Description**: Fetches and processes Bank of England interest rate data.
+ - **Usage**: `pull_boe_data(week_commencing)`
+ - **Example**: `pull_boe_data('mon')`
+
+ ## 3. `pull_oecd`
+ - **Description**: Fetches macroeconomic data from the OECD for a specified country.
+ - **Usage**: `pull_oecd(country='GBR', week_commencing='mon', start_date='2020-01-01')`
+ - **Example**: `pull_oecd('GBR', 'mon', '2000-01-01')`
+
+ ## 4. `get_google_mobility_data`
+ - **Description**: Fetches Google Mobility data for the specified country.
+ - **Usage**: `get_google_mobility_data(country, wc)`
+ - **Example**: `get_google_mobility_data('United Kingdom', 'mon')`
+
+ ## 5. `pull_seasonality`
+ - **Description**: Generates combined dummy variables for seasonality, trends, and COVID lockdowns.
+ - **Usage**: `pull_seasonality(week_commencing, start_date, countries)`
+ - **Example**: `pull_seasonality('mon', '2020-01-01', ['US', 'GB'])`
+
+ ## 6. `pull_weather`
+ - **Description**: Fetches and processes historical weather data for the specified country.
+ - **Usage**: `pull_weather(week_commencing, country)`
+ - **Example**: `pull_weather('mon', 'GBR')`
+
+ ## 7. `pull_macro_ons_uk`
+ - **Description**: Fetches and processes time series data from the Beta ONS API.
+ - **Usage**: `pull_macro_ons_uk(additional_list, week_commencing, sector)`
+ - **Example**: `pull_macro_ons_uk(['HBOI'], 'mon', 'fast_food')`
+
+ ## 8. `pull_yfinance`
+ - **Description**: Fetches and processes time series data from Yahoo Finance.
+ - **Usage**: `pull_yfinance(tickers, week_start_day)`
+ - **Example**: `pull_yfinance(['^FTMC', '^IXIC'], 'mon')`
+
+ ---
+
+ ## Installation
+
+ Install the IMS package via pip:
+
+ ```bash
+ pip install imsciences
+ ```
+
+ ---
+
+ ## Usage
+
+ ```python
+ from imsciences import *
+ ims_proc = dataprocessing()
+ ims_geo = geoprocessing()
+ ims_pull = datapull()
+ ims_vis = datavis()
+ ```
+
+ ---
+
+ ## License
+
+ This project is licensed under the MIT License. ![License](https://img.shields.io/badge/license-MIT-blue.svg)
+
+ ---
@@ -0,0 +1,305 @@
1
+ # IMS Package Documentation
2
+
3
+ The **Independent Marketing Sciences** package is a Python library designed to process incoming data into a format tailored for projects, particularly those utilising weekly time series data. This package offers a suite of functions for efficient data collection, manipulation, visualisation and analysis.
4
+
5
+ ---
6
+
7
+ ## Key Features
8
+ - Seamless data processing for time series workflows.
9
+ - Aggregation, filtering, and transformation of time series data.
10
+ - Visualising Data
11
+ - Integration with external data sources like FRED, Bank of England, ONS and OECD.
12
+
13
+ ---
14
+
15
+ Table of Contents
16
+ =================
17
+
18
+ 1. [Data Processing for Time Series](#data-processing-for-time-series)
19
+ 2. [Data Processing for Incrementality Testing](#data-processing-for-incrementality-testing)
20
+ 3. [Data Visualisations](#data-visualisations)
21
+ 4. [Data Pulling](#data-pulling)
22
+ 5. [Installation](#installation)
23
+ 6. [Usage](#usage)
24
+ 7. [License](#license)
25
+
26
+ ---
27
+
28
+ ## Data Processing for Time Series
29
+
30
+ ## 1. `get_wd_levels`
31
+ - **Description**: Get the working directory with the option of moving up parents.
32
+ - **Usage**: `get_wd_levels(levels)`
33
+ - **Example**: `get_wd_levels(0)`
34
+
35
+ ## 2. `aggregate_daily_to_wc_long`
36
+ - **Description**: Aggregates daily data into weekly data, grouping and summing specified columns, starting on a specified day of the week.
37
+ - **Usage**: `aggregate_daily_to_wc_long(df, date_column, group_columns, sum_columns, wc, aggregation='sum')`
38
+ - **Example**: `aggregate_daily_to_wc_long(df, 'date', ['platform'], ['cost', 'impressions', 'clicks'], 'mon', 'average')`
39
+
40
+ ## 3. `convert_monthly_to_daily`
41
+ - **Description**: Converts monthly data in a DataFrame to daily data by expanding and dividing the numeric values.
42
+ - **Usage**: `convert_monthly_to_daily(df, date_column, divide=True)`
43
+ - **Example**: `convert_monthly_to_daily(df, 'date')`
44
+
45
+ ## 4. `week_of_year_mapping`
46
+ - **Description**: Converts a week column in 'yyyy-Www' or 'yyyy-ww' format to week commencing date.
47
+ - **Usage**: `week_of_year_mapping(df, week_col, start_day_str)`
48
+ - **Example**: `week_of_year_mapping(df, 'week', 'mon')`
49
+
50
+ ## 5. `rename_cols`
51
+ - **Description**: Renames columns in a pandas DataFrame with a specified prefix or format.
52
+ - **Usage**: `rename_cols(df, name='ame_')`
53
+ - **Example**: `rename_cols(df, 'ame_facebook')`
54
+
55
+ ## 6. `merge_new_and_old`
56
+ - **Description**: Creates a new DataFrame by merging old and new dataframes based on a cutoff date.
57
+ - **Usage**: `merge_new_and_old(old_df, old_col, new_df, new_col, cutoff_date, date_col_name='OBS')`
58
+ - **Example**: `merge_new_and_old(df1, 'old_col', df2, 'new_col', '2023-01-15')`
59
+
60
+ ## 7. `merge_dataframes_on_column`
61
+ - **Description**: Merge a list of DataFrames on a common column.
62
+ - **Usage**: `merge_dataframes_on_column(dataframes, common_column='OBS', merge_how='outer')`
63
+ - **Example**: `merge_dataframes_on_column([df1, df2, df3], common_column='OBS', merge_how='outer')`
64
+
65
+ ## 8. `merge_and_update_dfs`
66
+ - **Description**: Merges two dataframes, updating columns from the second dataframe where values are available.
67
+ - **Usage**: `merge_and_update_dfs(df1, df2, key_column)`
68
+ - **Example**: `merge_and_update_dfs(processed_facebook, finalised_meta, 'OBS')`
69
+
70
+ ## 9. `convert_us_to_uk_dates`
71
+ - **Description**: Convert a DataFrame column with mixed US and UK date formats to datetime.
72
+ - **Usage**: `convert_us_to_uk_dates(df, date_col)`
73
+ - **Example**: `convert_us_to_uk_dates(df, 'date')`
74
+
75
+ ## 10. `combine_sheets`
76
+ - **Description**: Combines multiple DataFrames from a dictionary into a single DataFrame.
77
+ - **Usage**: `combine_sheets(all_sheets)`
78
+ - **Example**: `combine_sheets({'Sheet1': df1, 'Sheet2': df2})`
79
+
80
+ ## 11. `pivot_table`
81
+ - **Description**: Dynamically pivots a DataFrame based on specified columns.
82
+ - **Usage**: `pivot_table(df, index_col, columns, values_col, filters_dict=None, fill_value=0, aggfunc='sum', margins=False, margins_name='Total', datetime_trans_needed=True, reverse_header_order=False, fill_missing_weekly_dates=False, week_commencing='W-MON')`
83
+ - **Example**: `pivot_table(df, 'OBS', 'Channel Short Names', 'Value', filters_dict={'Master Include': ' == 1'}, fill_value=0)`
84
+
85
+ ## 12. `apply_lookup_table_for_columns`
86
+ - **Description**: Maps substrings in columns to new values based on a dictionary.
87
+ - **Usage**: `apply_lookup_table_for_columns(df, col_names, to_find_dict, if_not_in_dict='Other', new_column_name='Mapping')`
88
+ - **Example**: `apply_lookup_table_for_columns(df, col_names, {'spend': 'spd'}, if_not_in_dict='Other', new_column_name='Metrics Short')`
89
+
90
+ ## 13. `aggregate_daily_to_wc_wide`
91
+ - **Description**: Aggregates daily data into weekly data and pivots it to wide format.
92
+ - **Usage**: `aggregate_daily_to_wc_wide(df, date_column, group_columns, sum_columns, wc='sun', aggregation='sum', include_totals=False)`
93
+ - **Example**: `aggregate_daily_to_wc_wide(df, 'date', ['platform'], ['cost', 'impressions'], 'mon', 'average', True)`
94
+
95
+ ## 14. `merge_cols_with_seperator`
96
+ - **Description**: Merges multiple columns in a DataFrame into one column with a specified separator.
97
+ - **Usage**: `merge_cols_with_seperator(df, col_names, separator='_', output_column_name='Merged')`
98
+ - **Example**: `merge_cols_with_seperator(df, ['Campaign', 'Product'], separator='|', output_column_name='Merged Columns')`
99
+
100
+ ## 15. `check_sum_of_df_cols_are_equal`
101
+ - **Description**: Checks if the sum of two columns in two DataFrames are equal and provides the difference.
102
+ - **Usage**: `check_sum_of_df_cols_are_equal(df_1, df_2, cols_1, cols_2)`
103
+ - **Example**: `check_sum_of_df_cols_are_equal(df_1, df_2, 'Media Cost', 'Spend')`
104
+
105
+ ## 16. `convert_2_df_cols_to_dict`
106
+ - **Description**: Creates a dictionary from two DataFrame columns.
107
+ - **Usage**: `convert_2_df_cols_to_dict(df, key_col, value_col)`
108
+ - **Example**: `convert_2_df_cols_to_dict(df, 'Campaign', 'Channel')`
109
+
110
+ ## 17. `create_FY_and_H_columns`
111
+ - **Description**: Adds financial year and half-year columns to a DataFrame based on a start date.
112
+ - **Usage**: `create_FY_and_H_columns(df, index_col, start_date, starting_FY, short_format='No', half_years='No', combined_FY_and_H='No')`
113
+ - **Example**: `create_FY_and_H_columns(df, 'Week', '2022-10-03', 'FY2023', short_format='Yes')`
114
+
115
+ ## 18. `keyword_lookup_replacement`
116
+ - **Description**: Updates values in a column based on a lookup dictionary with conditional logic.
117
+ - **Usage**: `keyword_lookup_replacement(df, col, replacement_rows, cols_to_merge, replacement_lookup_dict, output_column_name='Updated Column')`
118
+ - **Example**: `keyword_lookup_replacement(df, 'channel', 'Paid Search Generic', ['channel', 'segment'], lookup_dict, output_column_name='Channel New')`
119
+
120
+ ## 19. `create_new_version_of_col_using_LUT`
121
+ - **Description**: Creates a new column based on a lookup table applied to an existing column.
122
+ - **Usage**: `create_new_version_of_col_using_LUT(df, keys_col, value_col, dict_for_specific_changes, new_col_name='New Version of Old Col')`
123
+ - **Example**: `create_new_version_of_col_using_LUT(df, 'Campaign Name', 'Campaign Type', lookup_dict)`
124
+
125
+ ## 20. `convert_df_wide_2_long`
126
+ - **Description**: Converts a wide-format DataFrame into a long-format DataFrame.
127
+ - **Usage**: `convert_df_wide_2_long(df, value_cols, variable_col_name='Stacked', value_col_name='Value')`
128
+ - **Example**: `convert_df_wide_2_long(df, ['col1', 'col2'], variable_col_name='Var', value_col_name='Val')`
129
+
130
+ ## 21. `manually_edit_data`
131
+ - **Description**: Manually updates specified cells in a DataFrame based on filters.
132
+ - **Usage**: `manually_edit_data(df, filters_dict, col_to_change, new_value, change_in_existing_df_col='No', new_col_to_change_name='New', manual_edit_col_name=None, add_notes='No', existing_note_col_name=None, note=None)`
133
+ - **Example**: `manually_edit_data(df, {'col1': '== 1'}, 'col2', 'new_val', add_notes='Yes', note='Manual Update')`
134
+
135
+ ## 22. `format_numbers_with_commas`
136
+ - **Description**: Formats numerical columns with commas and a specified number of decimal places.
137
+ - **Usage**: `format_numbers_with_commas(df, decimal_length_chosen=2)`
138
+ - **Example**: `format_numbers_with_commas(df, decimal_length_chosen=1)`
139
+
140
+ ## 23. `filter_df_on_multiple_conditions`
141
+ - **Description**: Filters a DataFrame based on multiple column conditions.
142
+ - **Usage**: `filter_df_on_multiple_conditions(df, filters_dict)`
143
+ - **Example**: `filter_df_on_multiple_conditions(df, {'col1': '>= 5', 'col2': '== 'val''})`
144
+
145
+ ## 24. `read_and_concatenate_files`
146
+ - **Description**: Reads and concatenates files from a specified folder into a single DataFrame.
147
+ - **Usage**: `read_and_concatenate_files(folder_path, file_type='csv')`
148
+ - **Example**: `read_and_concatenate_files('/path/to/files', file_type='xlsx')`
149
+
150
+ ## 25. `upgrade_outdated_packages`
151
+ - **Description**: Upgrades all outdated Python packages except specified ones.
152
+ - **Usage**: `upgrade_outdated_packages(exclude_packages=['twine'])`
153
+ - **Example**: `upgrade_outdated_packages(exclude_packages=['pip', 'setuptools'])`
154
+
155
+ ## 26. `convert_mixed_formats_dates`
156
+ - **Description**: Converts mixed-format date columns into standardized datetime format.
157
+ - **Usage**: `convert_mixed_formats_dates(df, column_name)`
158
+ - **Example**: `convert_mixed_formats_dates(df, 'date_col')`
159
+
160
+ ## 27. `fill_weekly_date_range`
161
+ - **Description**: Fills in missing weekly dates in a DataFrame with a specified frequency.
162
+ - **Usage**: `fill_weekly_date_range(df, date_column, freq='W-MON')`
163
+ - **Example**: `fill_weekly_date_range(df, 'date_col')`
164
+
165
+ ## 28. `add_prefix_and_suffix`
166
+ - **Description**: Adds prefixes and/or suffixes to column names, with an option to exclude a date column.
167
+ - **Usage**: `add_prefix_and_suffix(df, prefix='', suffix='', date_col=None)`
168
+ - **Example**: `add_prefix_and_suffix(df, prefix='pre_', suffix='_suf', date_col='date_col')`
169
+
170
+ ## 29. `create_dummies`
171
+ - **Description**: Creates dummy variables for columns, with an option to add a total dummy column.
172
+ - **Usage**: `create_dummies(df, date_col=None, dummy_threshold=0, add_total_dummy_col='No', total_col_name='total')`
173
+ - **Example**: `create_dummies(df, date_col='date_col', dummy_threshold=1)`
174
+
175
+ ## 30. `replace_substrings`
176
+ - **Description**: Replaces substrings in a column based on a dictionary, with options for case conversion and new column creation.
177
+ - **Usage**: `replace_substrings(df, column, replacements, to_lower=False, new_column=None)`
178
+ - **Example**: `replace_substrings(df, 'text_col', {'old': 'new'}, to_lower=True, new_column='updated_text')`
179
+
180
+ ## 31. `add_total_column`
181
+ - **Description**: Adds a total column to a DataFrame by summing values across columns, optionally excluding one.
182
+ - **Usage**: `add_total_column(df, exclude_col=None, total_col_name='Total')`
183
+ - **Example**: `add_total_column(df, exclude_col='date_col')`
184
+
185
+ ## 32. `apply_lookup_table_based_on_substring`
186
+ - **Description**: Categorizes text in a column using a lookup table based on substrings.
187
+ - **Usage**: `apply_lookup_table_based_on_substring(df, column_name, category_dict, new_col_name='Category', other_label='Other')`
188
+ - **Example**: `apply_lookup_table_based_on_substring(df, 'text_col', {'sub1': 'cat1', 'sub2': 'cat2'})`
189
+
190
+ ## 33. `compare_overlap`
191
+ - **Description**: Compares overlapping periods between two DataFrames and summarizes differences.
192
+ - **Usage**: `compare_overlap(df1, df2, date_col)`
193
+ - **Example**: `compare_overlap(df1, df2, 'date_col')`
194
+
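One plausible shape for this comparison, sketched with a pandas inner join (the column naming and the choice of difference columns are illustrative assumptions):

```python
import pandas as pd

def compare_overlap(df1, df2, date_col):
    """Inner-join on the shared dates and report per-column differences."""
    overlap = df1.merge(df2, on=date_col, suffixes=("_df1", "_df2"))
    for col in df1.columns:
        if col != date_col:
            overlap[f"{col}_diff"] = overlap[f"{col}_df1"] - overlap[f"{col}_df2"]
    return overlap

df1 = pd.DataFrame({"date_col": ["2024-01-01", "2024-01-08"], "sales": [10, 20]})
df2 = pd.DataFrame({"date_col": ["2024-01-08", "2024-01-15"], "sales": [15, 25]})
overlap = compare_overlap(df1, df2, "date_col")  # only 2024-01-08 overlaps
```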
195
+ ## 34. `week_commencing_2_week_commencing_conversion_isoweekday`
196
+ - **Description**: Maps dates to the start of the current ISO week based on a specified weekday.
197
+ - **Usage**: `week_commencing_2_week_commencing_conversion_isoweekday(df, date_col, week_commencing='mon')`
198
+ - **Example**: `week_commencing_2_week_commencing_conversion_isoweekday(df, 'date_col', week_commencing='fri')`
199
+
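Snapping each date back to the most recent occurrence of a given weekday can be done with ISO weekday arithmetic; a hedged sketch of what this conversion might look like (function name shortened here for illustration):

```python
import pandas as pd

def week_commencing_conversion(df, date_col, week_commencing="mon"):
    """Snap each date back to the most recent given weekday (ISO: mon=1 .. sun=7)."""
    iso = {"mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6, "sun": 7}
    target = iso[week_commencing]
    out = df.copy()
    # days elapsed since the target weekday; dt.dayofweek is 0-based, hence +1
    offset = (out[date_col].dt.dayofweek + 1 - target) % 7
    out[date_col] = out[date_col] - pd.to_timedelta(offset, unit="D")
    return out

df = pd.DataFrame({"date_col": pd.to_datetime(["2024-01-10"])})  # a Wednesday
out = week_commencing_conversion(df, "date_col", week_commencing="fri")
# snaps back to Friday 2024-01-05
```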
200
+ ---
201
+
202
+ ## Data Processing for Incrementality Testing
203
+
204
+ ## 1. `pull_ga`
205
+ - **Description**: Pull in GA4 data for geo experiments.
206
+ - **Usage**: `pull_ga(credentials_file, property_id, start_date, country, metrics)`
207
+ - **Example**: `pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])`
208
+
209
+ ## 2. `process_itv_analysis`
210
+ - **Description**: Process a geo-test (ITV regional) analysis: maps regions and cities, joins media spend, and writes out data comparing the two region groups.
211
+ - **Usage**: `process_itv_analysis(raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)`
212
+ - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])`
213
+
214
+ ---
215
+
216
+ ## Data Visualisations
217
+
218
+ ## 1. `plot_one`
219
+ - **Description**: Plots a specified column from a DataFrame with white background and black axes.
220
+ - **Usage**: `plot_one(df1, col1, date_column)`
221
+ - **Example**: `plot_one(df, 'sales', 'date')`
222
+
223
+ ## 2. `plot_two`
224
+ - **Description**: Plots specified columns from two DataFrames, optionally on the same or separate y-axes.
225
+ - **Usage**: `plot_two(df1, col1, df2, col2, date_column, same_axis=True)`
226
+ - **Example**: `plot_two(df1, 'sales', df2, 'revenue', 'date', same_axis=False)`
227
+
228
+ ## 3. `plot_chart`
229
+ - **Description**: Plots various chart types using Plotly, including line, bar, scatter, area, pie, etc.
230
+ - **Usage**: `plot_chart(df, date_col, value_cols, chart_type='line', title='Chart', x_title='Date', y_title='Values')`
231
+ - **Example**: `plot_chart(df, 'date', ['sales', 'revenue'], chart_type='line', title='Sales and Revenue')`
232
+
233
+ ---
234
+
235
+ ## Data Pulling
236
+
237
+ ## 1. `pull_fred_data`
238
+ - **Description**: Fetch economic data from FRED for a list of series IDs.
239
+ - **Usage**: `pull_fred_data(week_commencing, series_id_list)`
240
+ - **Example**: `pull_fred_data('mon', ['GPDIC1', 'Y057RX1Q020SBEA', 'GCEC1', 'ND000333Q', 'Y006RX1Q020SBEA'])`
241
+
242
+ ## 2. `pull_boe_data`
243
+ - **Description**: Fetch and process Bank of England interest rate data.
244
+ - **Usage**: `pull_boe_data(week_commencing)`
245
+ - **Example**: `pull_boe_data('mon')`
246
+
247
+ ## 3. `pull_oecd`
248
+ - **Description**: Fetch macroeconomic data from OECD for a specified country.
249
+ - **Usage**: `pull_oecd(country='GBR', week_commencing='mon', start_date='2020-01-01')`
250
+ - **Example**: `pull_oecd('GBR', 'mon', '2000-01-01')`
251
+
252
+ ## 4. `get_google_mobility_data`
253
+ - **Description**: Fetch Google Mobility data for the specified country.
254
+ - **Usage**: `get_google_mobility_data(country, wc)`
255
+ - **Example**: `get_google_mobility_data('United Kingdom', 'mon')`
256
+
257
+ ## 5. `pull_seasonality`
258
+ - **Description**: Generate combined dummy variables for seasonality, trends, and COVID lockdowns.
259
+ - **Usage**: `pull_seasonality(week_commencing, start_date, countries)`
260
+ - **Example**: `pull_seasonality('mon', '2020-01-01', ['US', 'GB'])`
261
+
262
+ ## 6. `pull_weather`
263
+ - **Description**: Fetch and process historical weather data for the specified country.
264
+ - **Usage**: `pull_weather(week_commencing, country)`
265
+ - **Example**: `pull_weather('mon', 'GBR')`
266
+
267
+ ## 7. `pull_macro_ons_uk`
268
+ - **Description**: Fetch and process time series data from the Beta ONS API.
269
+ - **Usage**: `pull_macro_ons_uk(additional_list, week_commencing, sector)`
270
+ - **Example**: `pull_macro_ons_uk(['HBOI'], 'mon', 'fast_food')`
271
+
272
+ ## 8. `pull_yfinance`
273
+ - **Description**: Fetch and process time series data from Yahoo Finance.
274
+ - **Usage**: `pull_yfinance(tickers, week_start_day)`
275
+ - **Example**: `pull_yfinance(['^FTMC', '^IXIC'], 'mon')`
276
+
277
+ ---
278
+
279
+ ## Installation
280
+
281
+ Install the IMS package via pip:
282
+
283
+ ```bash
284
+ pip install imsciences
285
+ ```
286
+
287
+ ---
288
+
289
+ ## Usage
290
+
291
+ ```python
292
+ from imsciences import *
293
+ ims_proc = dataprocessing()
294
+ ims_geo = geoprocessing()
295
+ ims_pull = datapull()
296
+ ims_vis = datavis()
297
+ ```
298
+
299
+ ---
300
+
301
+ ## License
302
+
303
+ This project is licensed under the MIT License. ![License](https://img.shields.io/badge/license-MIT-blue.svg)
304
+
305
+ ---
@@ -0,0 +1,4 @@
1
+ from .mmm import dataprocessing
2
+ from .pull import datapull
3
+ from .geo import geoprocessing
4
+ from .vis import datavis